A New Hybrid Forecasting Model Based on Dual Series Decomposition with Long-Term Short-Term Memory

In recent years


Introduction
Ozone (O3) is one of the six major air pollutants; when its atmospheric concentration is too high, the ecological environment deteriorates and human health is adversely affected [1,2]. Ozone is a trace gas in the earth's atmosphere. It is formed when oxygen molecules in the atmosphere are decomposed into oxygen atoms by solar radiation and these atoms combine with surrounding oxygen molecules. Each ozone molecule contains three oxygen atoms, and its chemical formula is O3. Ozone pollution forms under specific conditions: under high temperatures, sufficient sunshine, and dry air, VOCs and NOx in the air "meet" and undergo photochemical reactions that readily generate ozone pollution [3,4]. In recent years, the concentration of ozone and other air pollutants has been changing continuously [5-7]. This study considers two main reasons for the changes in ozone. One is the increase in pollutant emissions caused by frequent human activities, and the other is the weather [8]. The stronger the sunlight, the more ozone is produced [9]. As people pay more and more attention to ozone pollution, it is very important for researchers to forecast ozone concentration in a timely and effective manner [10].
In recent years, China's ozone pollution problems have become increasingly apparent. Ozone concentrations in Beijing-Tianjin-Hebei and surrounding areas, the Yangtze River Delta region [11], and other regions show an upward trend. Especially in the summer and autumn, ozone has become the primary pollutant in some cities. Ozone, nitrogen oxides (NOx), volatile organic compounds (VOCs), and other pollutants in the atmosphere can undergo photochemical reactions involving secondary pollutants [12], which strongly irritate the human cardiovascular and respiratory systems and lead to a variety of diseases. In addition, ozone can also cause serious harm to the environment [13,14]. Advance predictions of ozone pollution help governments make and implement environmental management decisions.
At present, there are many studies on ozone, and many scholars are also committed to forecasting ozone concentrations. Research on ozone at this stage falls mainly into two aspects. The first aspect is the study of the connection between changes in ozone concentrations and human health, the ecological environment, crops, etc. For example, Jiang et al. [15] studied the effect of ozone concentrations in Fuzhou on the risk of death from circulatory diseases, and the results showed that short-term exposure to ozone increased the risk of death from these diseases. Zhao et al. [16] discussed how excessive ground-level ozone concentrations damage the ecological environment, harm human health, reduce crop yields, and cause economic losses. Chen et al. [17] explored the link between short-term exposure to ozone and lung function and airway inflammation. Zhang et al. [18] studied the association between ozone concentrations in Yangzhou and the daily deaths of residents. These studies have shown that ozone concentrations exceeding the standard negatively affect human health, the ecological environment, crops, and so on [18].
The second aspect is the forecast and early warning analysis of ozone, mainly the establishment of statistical prediction models. One approach is to establish a regression prediction model by analyzing indicators related to ozone concentration. For example, Shams et al. [19] selected NO2, SO2, air temperature, water pressure, and other indicators as forecasting factors to establish a regression model that could better reflect the average daily change in ozone concentration. Gong et al. [20] selected meteorological factors, such as humidity and wind speed, to establish regression models, predict ozone concentration in Xiamen, and establish an evaluation system. Zhang and Ma [21] used meteorological factors, such as wind direction, temperature, and humidity, as input variables to predict ozone concentration through a back propagation neural network, and the results showed that including meteorological factors helped to improve the prediction performance of the model. The artificial intelligence (AI) method uses machine learning technology to train on historical data, has higher prediction accuracy on nonlinear time series data [22], and has been successfully applied to nonlinear regression estimation problems. Typical models include artificial neural networks [23,24], genetic algorithms, support vector machines, random forests, and the AdaBoost model [25,26].
However, traditional AI technology cannot describe the interdependence between time series data, and its prediction accuracy on time series data is limited [27]. The deep recurrent neural network (RNN) can handle the interdependence between time series data due to its embedded feedback and cyclic structure [28,29]. Tsai et al. [30] achieved good results in predicting different air pollutants, such as PM2.5, PM10, SO2, and NO2, based on the RNN model. However, RNN cannot solve the long-term dependency problem. As a variant of the RNN, long short-term memory (LSTM) can effectively describe time series data by introducing memory units into the network structure [31,32]. LSTM not only focuses on event-related semantic information but also considers the temporal effects of important events in the past. Xayasouk et al. [33] applied LSTM with an autoencoder to predict air quality concentration. To anticipate PM2.5 concentrations while taking into account the impact of wind direction and speed on spatial-temporal variations in PM2.5, Liu et al. [34] presented a novel wind-sensitive attention mechanism with an LSTM neural network model. Comparing LSTM with other forecasting methods, Liu et al. [25] used LSTM for stock price forecasting of the CSI 300 Index, and the results showed that the LSTM model forecast better than the support vector regression and AdaBoost models. Bathla [35] used the LSTM network to predict a data series, and the results showed that the LSTM network performed better than the traditional GARCH and SVR models in longer-range volatility prediction. The above literature shows that the LSTM model has certain advantages in predicting complex time series data. Therefore, this paper selects LSTM as the main component of the model.
To overcome the limitations of traditional AI models, another type of forecasting method that has developed well is the hybrid model. In most hybrid models, signal processing methods are used to decompose the time series, and AI methods are used to predict the decomposed components. Typical sequence preprocessing methods include wavelet decomposition and empirical mode decomposition (EMD) [36]. The EMD algorithm does not depend on any basis function and is essentially different from wavelet decomposition. It has significant advantages in dealing with nonstationary and nonlinear complex signals. For example, Jin et al. [37] used the EMD algorithm to decompose a trend and analyze the periodic fluctuations of air quality parameters. To a certain extent, it reflects the various cyclical variations in time series data. However, since EMD is prone to mode aliasing, Amanollahi and Ausati [38] used an ensemble empirical mode decomposition (EEMD) algorithm for air quality prediction. EEMD has been widely used in air quality forecasting. Du et al. [39] used the EEMD method to study the impact of tourists on air pollution in Zhangjiajie, China. Due to the lack of a mathematical foundation, the inability to separate components with similar frequencies, and the over-envelope and under-envelope problems of the EEMD method, its decomposition effect is limited [40]. As an improved decomposition technology, variational mode decomposition (VMD) can adaptively decompose the effective components corresponding to each center frequency in the frequency domain, and its decomposition accuracy is higher. The VMD decomposition method is more effective for feature selection in prediction models and has been successfully applied to the decomposition of air quality series [41]. Therefore, this paper also adopts VMD as the main decomposition technique for modeling.
Previous studies show that, among single AI models, the LSTM model has achieved excellent prediction results; at the same time, the combined models predict better than the single AI methods. In some studies that use VMD for combined model prediction, the important component information contained in the residual term left after the original sequence is decomposed by VMD is ignored. Therefore, this paper applies a secondary decomposition to the complex signal contained in the residual term after VMD to improve the prediction accuracy of the residual term, and proposes a new fused VMD-EEMD dual decomposition method whose output is used as input to LSTM, yielding a VMD-EEMD-LSTM-based ozone prediction model. The rest of this paper is structured as follows: Section 2 briefly introduces the construction of the hybrid model. Section 3 presents the results and analysis for Nanjing city. Section 4 is a discussion. Section 5 presents the conclusions.

Method (Proposed Algorithm)
Before constructing the VMD-EEMD-LSTM combined model to predict ozone changes, it is necessary to briefly describe the components of the model: EEMD, VMD, and the LSTM neural network.

EEMD.
Wu and Huang [42] added a very small-amplitude white noise sequence to the original time series, extending EMD into the EEMD technique. The decomposition algorithm makes full use of the frequency-balanced distribution characteristics of white noise. The obtained intrinsic mode functions (IMFs) are averaged to cancel the added white noise, thereby mitigating the mode aliasing problem. The decomposition steps are as follows:
Step 1: A white noise sequence $n_i(t)$ of equal length that satisfies the normal distribution is added to the original time series $x(t)$ multiple times, i.e.,

$$x_i(t) = x(t) + n_i(t). \tag{1}$$

In the formula, $x_i(t)$ is the time series after adding white noise for the $i$th time.
Step 2: Perform EMD on the time series after adding white noise to obtain the IMF components $C_{i,j}(t)$ and the residual term $r_i(t)$:

$$x_i(t) = \sum_{j=1}^{J} C_{i,j}(t) + r_i(t), \tag{2}$$

where $C_{i,j}(t)$ is the $j$th IMF component obtained by EMD after adding white noise for the $i$th time, and $J$ denotes the number of IMF components.
Step 3: Average each component $C_{i,j}(t)$ over the trials, taking advantage of the zero mean of uncorrelated random sequences to cancel the influence of the repeatedly added white noise on the real IMF components, and obtain the final IMF components:

$$C_j(t) = \frac{1}{N} \sum_{i=1}^{N} C_{i,j}(t). \tag{3}$$

In the formula, $C_j(t)$ is the $j$th IMF component obtained after EEMD, and $N$ is the number of white noise sequences.
Step 4: Obtain the final decomposition result of EEMD, namely,

$$x(t) = \sum_{j=1}^{J} C_j(t) + r(t). \tag{4}$$

The IMF components $C_j(t)$ carry the information trends of different frequency bands, from high to low, in the time series, and $r(t)$ is the overall residual term.
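The ensemble-averaging logic of the four EEMD steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: `toy_decompose` is a hypothetical moving-average stand-in for EMD sifting, and the names `eemd_sketch`, `n_trials`, and `noise_std` are introduced here for illustration only.

```python
import numpy as np

def toy_decompose(x, windows=(5, 25)):
    """Hypothetical stand-in for EMD (NOT real sifting): split x into
    band-like components with successive moving averages.
    Returns (components, residual) with components.sum(0) + residual == x."""
    components, current = [], np.asarray(x, dtype=float)
    for w in windows:
        smooth = np.convolve(current, np.ones(w) / w, mode="same")
        components.append(current - smooth)  # faster fluctuations
        current = smooth                     # slower remainder
    return np.array(components), current

def eemd_sketch(x, n_trials=200, noise_std=0.1, decompose=toy_decompose, seed=0):
    """Ensemble averaging as in Steps 1-4; the decomposer is pluggable."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    comp_sum, resid_sum = None, np.zeros_like(x)
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)  # Step 1: add white noise
        comps, resid = decompose(noisy)                        # Step 2: decompose
        comp_sum = comps if comp_sum is None else comp_sum + comps
        resid_sum += resid
    # Steps 3-4: averaging over trials cancels most of the added noise
    return comp_sum / n_trials, resid_sum / n_trials

t = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residual = eemd_sketch(signal)
```

Because the added noise has zero mean, the averaged components plus the averaged residual reconstruct the original series up to a small error that shrinks as the number of trials grows.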

VMD.
The core principle of VMD is to use an adaptive, quasi-orthogonal decomposition to split the original input signal into $K$ modal components $u_k$, each with its own center frequency and limited bandwidth, such that the sum of the bandwidth estimates of all modes is minimized [43]. The VMD signal decomposition process is therefore the solution process of a constrained variational problem, expressed by the following equation:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t). \tag{5}$$

In the formula, $\{u_k\} = \{u_1, \ldots, u_K\}$ are the modal components (VMFs) obtained after decomposition and $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are the corresponding center frequencies; $*$ is the convolution symbol; $\partial_t$ is the partial derivative with respect to $t$; $\delta(t)$ is the impulse function; and $f$ is the original input signal. The analytic signal of each $u_k(t)$ is obtained by the Hilbert transform, which yields its unilateral spectrum; the spectrum of each mode is then shifted to the baseband by multiplying by the exponential term $e^{-j\omega_k t}$ tuned to the estimated center frequency of that mode. To obtain the optimal solution, the constrained variational problem is transformed into an unconstrained one. By introducing the Lagrangian multiplier $\lambda(t)$ and the quadratic penalty factor $\alpha$, the constrained variational problem takes the following unconstrained form:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle. \tag{6}$$

In the formula, the quadratic penalty factor $\alpha$ ensures the accuracy of signal reconstruction in the presence of Gaussian noise, and the Lagrangian multiplier enforces the constraint strictly. The alternating direction method of multipliers is then used to search iteratively for the saddle point of the above Lagrangian function, that is, the optimal solution of the constrained variational problem, giving the VMFs $u_k$ and center frequencies $\omega_k$. Their update expressions are as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}, \tag{7}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k(\omega)|^2 \, d\omega}. \tag{8}$$

The specific implementation steps of the VMD method are as follows:
Step 1: Initialize the modal components $\{\hat{u}_k^1\}$, the center frequencies $\{\omega_k^1\}$, and the multiplier $\hat{\lambda}^1$, set $n = 0$, and select an appropriate number $K$ of modal components.
Step 2: Update the values of $u_k$ and $\omega_k$ according to formulas (7) and (8), respectively.
Step 3: Update the value of $\lambda$ according to

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right),$$

where $\tau$ is the update step of the multiplier.
Step 4: Given the judgment accuracy $\varepsilon > 0$, if the following condition is met:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon,$$

then stop the iteration; otherwise, go back to Step 2.
In the above formulas, $\hat{u}_k^n(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}^n(\omega)$ are the Fourier transforms of $u_k^n(t)$, $f(t)$, and $\lambda^n(t)$, respectively.
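The iteration in Steps 1 to 4 can be sketched directly in the frequency domain with NumPy. This is a simplified sketch under stated assumptions, not the reference VMD implementation: it omits the boundary mirroring used in practice, works with normalized frequencies in cycles per sample, and the function name `vmd_sketch`, the default parameter values, and the uniform initialization of the center frequencies are choices made here for illustration.

```python
import numpy as np

def vmd_sketch(f, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD iteration (no boundary mirroring).

    Returns (modes, omega): real modes of shape (K, len(f)) and centre
    frequencies in cycles per sample."""
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.fftfreq(N)        # normalized frequencies in [-0.5, 0.5)
    pos = freqs >= 0                 # non-negative half of the spectrum
    f_hat = np.fft.fft(f)
    f_hat[~pos] = 0.0                # one-sided (analytic-signal) spectrum
    u_hat = np.zeros((K, N), dtype=complex)
    omega = 0.5 * np.arange(K) / K   # Step 1: assumed uniform initialization
    lam_hat = np.zeros(N, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Step 2a: Wiener-filter-like mode update
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k, ~pos] = 0.0
            # Step 2b: centre frequency = power-weighted mean frequency
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Step 3: dual ascent on the multiplier (inactive when tau == 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Step 4: relative-change stopping rule (small constant avoids 0/0)
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-12
        if np.sum(num / den) < tol:
            break

    spec = u_hat.copy()
    spec[:, 0] *= 0.5                # do not double the DC bin
    modes = 2.0 * np.real(np.fft.ifft(spec, axis=1))
    return modes, omega

n = np.arange(1000)
two_tones = np.cos(2 * np.pi * 0.05 * n) + 0.5 * np.cos(2 * np.pi * 0.2 * n)
modes, centre_freqs = vmd_sketch(two_tones, K=2)
```

With `tau = 0` the multiplier stays at zero, so the sum of the modes only approximates the input; the part left over is the residual term that the proposed model passes on to EEMD.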

LSTM Neural Network.
The traditional RNN has achieved good results in processing time series because it considers the autocorrelation characteristics of time series, but the back propagation algorithm used by the RNN suffers from exploding or vanishing gradients, so it cannot capture long-term dependencies [22,44]. A descriptive implementation of the LSTM model is shown in Annexure A.
LSTM models have been successfully applied in sequence generation, machine translation, speech, video analysis, language modeling, handwriting recognition, and other fields. LSTM models more realistically represent or imitate human behavior, logical development, and neural organization with cognitive processes.
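For reference, the gate equations of a standard LSTM cell are sketched below. This is the commonly used textbook formulation and may differ in notation or detail from the description in Annexure A; the symbols $W$, $b$, $\sigma$ (logistic sigmoid), and $\odot$ (element-wise product) are introduced here as assumptions.

```latex
\begin{align}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{align}
```

The additive cell state update is what lets information, and gradients, persist across many time steps, which is the long-term dependency property that the plain RNN lacks.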

Proposed VMD-EEMD-LSTM Model.
Ozone series are typically nonstationary, nonlinear, and otherwise complex, so the accuracy of a single prediction method is limited. Since VMD can decompose a complex signal into several regular modal components with lower complexity, prediction accuracy improves greatly when common prediction methods are used to model each modal component after VMD. However, previous studies only modeled the estimated modal components after VMD and directly discarded the complex information contained in the residual term after modal decomposition. Unlike the regular residuals in EEMD, the residuals after VMD are highly complex. If this part of the information is directly discarded, the overall prediction accuracy of the model is reduced. Therefore, this paper proposes a decomposition technique for the residual term of VMD; that is, the residual term is decomposed by EEMD to improve the prediction accuracy of the residual term and, in turn, the prediction accuracy of the model as a whole. Combined with the excellent performance of the LSTM neural network in characterizing the autocorrelation and long memory of time series, the detailed modeling steps are as follows.
Step 1: Use VMD to decompose the original sequence into its VMF modal components, and subtract the sum of the VMFs from the original time series to obtain the residual term of VMD.
Step 2: Normalize the decomposed VMFs and select appropriate training and test samples. Train an LSTM on each VMF and obtain the prediction result for each VMF component subsequence.
Step 3: Use EEMD to perform a secondary decomposition of the residual term remaining after VMD, use LSTM to predict each IMF subsequence after EEMD separately, and superimpose the predictions of the subsequences to obtain the final prediction result for the residual term.
Step 4: Superimpose the prediction results of each VMF component and of the residual term to obtain the final prediction result for the original sequence. The complete flow of the implementation is shown in Figure 1.
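The data flow of Steps 1 to 4 can be sketched as a small function that accepts the decomposers and the forecaster as arguments. Everything below is illustrative and assumed rather than taken from this paper: `smooth_modes` and `split_residual` are hypothetical moving-average stand-ins for VMD and EEMD, `persistence_forecast` is a placeholder for a trained LSTM, and normalization (Step 2) is omitted for brevity.

```python
import numpy as np

def smooth_modes(x, windows=(31, 7)):
    """Hypothetical stand-in for VMD: returns modes that do not sum exactly
    to x, so a residual term remains."""
    modes, current = [], np.asarray(x, dtype=float)
    for w in windows:
        smooth = np.convolve(current, np.ones(w) / w, mode="same")
        modes.append(smooth)
        current = current - smooth
    return np.array(modes)

def split_residual(r, window=3):
    """Hypothetical stand-in for EEMD applied to the VMD residual."""
    smooth = np.convolve(r, np.ones(window) / window, mode="same")
    return np.array([r - smooth, smooth])

def persistence_forecast(component, horizon):
    """Placeholder for a trained LSTM: repeat the last observed value."""
    return np.full(horizon, component[-1])

def dual_decomposition_forecast(series, primary, secondary, forecast, horizon):
    series = np.asarray(series, dtype=float)
    modes = primary(series)                              # Step 1: primary decomposition
    residual = series - modes.sum(axis=0)                # Step 1: residual term
    mode_preds = [forecast(m, horizon) for m in modes]   # Step 2: one model per mode
    sub_series = secondary(residual)                     # Step 3: secondary decomposition
    residual_pred = sum(forecast(s, horizon) for s in sub_series)
    return sum(mode_preds) + residual_pred               # Step 4: superimpose

rng = np.random.default_rng(0)
demo = np.sin(np.linspace(0, 12, 200)) + 0.1 * rng.normal(size=200)  # synthetic data
prediction = dual_decomposition_forecast(
    demo, smooth_modes, split_residual, persistence_forecast, horizon=3)
```

Passing the decomposers and forecaster as parameters keeps the superposition logic separate from any particular VMD, EEMD, or LSTM library, so each placeholder can be swapped for a real implementation without changing the pipeline.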

Study Area
This section describes the study area used for data collection and the implementation results of the proposed method.

Monitoring Stations.
Nanjing is a subprovincial city and the capital of Jiangsu Province. As of 2019, Nanjing has jurisdiction over 11 municipal districts, namely Gulou, Xuanwu, Jianye, Qinhuai, Qixia, Yuhuatai, Pukou, Liuhe, Jiangning, Lishui, and Gaochun Districts, with a total of 95 streets. There are six towns with a total area of 6,587 square kilometers. According to the results of the seventh national census, the resident population at the end of 2020 was 9,314,685. Nanjing has a subtropical monsoon climate and abundant rainfall, with an annual precipitation of 1,200 mm and four distinct seasons. Nanjing is sunny in the spring, rainy in the rainy season, hot in the summer, dry and cool in the autumn, and cold and dry in the winter [45].
Nanjing has a short spring and autumn, a long winter and summer, and a significant temperature difference between winter and summer. The four seasons have their own characteristics and are suitable for tourism. There are nine air quality monitoring stations in Nanjing; the details of each station, including their coordinates and names, are shown in Table 1. Figure 2 shows the locations of all Nanjing monitoring stations, with the covered areas marked with black dots.

Ozone Data.
This paper primarily uses the daily average ozone data sets of nine stations. Data were taken from January 2018 until December 2021 for each station in Nanjing. Concentrations of ozone at all stations were normally distributed; the minimum and maximum average values of each station, together with the mean, median, and standard deviation, were used to describe the concentration of air pollutants. Furthermore, to show regional variation in air pollution levels, graphic maps were developed with a geographic information system using ArcGIS (version 10.5). A statistical description of the data is shown in Figure 3 for each station in each year (i.e., from 2018 to 2021). Data from 2018 to 2020 are used for training, and the remaining data are used for testing and validation. Correlation results among stations are shown in Annexure B.
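A chronological split of the kind described above, followed by the sliding-window framing usually needed to feed a series to an LSTM, might look like the following sketch. The series here is synthetic (no real station data is reproduced), and the 7-day `lookback` and the helper names are assumptions made for illustration.

```python
import numpy as np

def minmax_fit(train):
    """Scaling bounds are taken from the training period only."""
    return float(np.min(train)), float(np.max(train))

def minmax_apply(x, lo, hi):
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# Daily dates from January 2018 to December 2021, matching the study period
dates = np.arange("2018-01-01", "2022-01-01", dtype="datetime64[D]")
rng = np.random.default_rng(0)
ozone = 80 + 30 * np.sin(2 * np.pi * np.arange(dates.size) / 365.25) \
        + rng.normal(0, 10, dates.size)          # synthetic stand-in values

train_mask = dates < np.datetime64("2021-01-01")  # 2018-2020 for training
train, test = ozone[train_mask], ozone[~train_mask]

lo, hi = minmax_fit(train)
X_train, y_train = make_windows(minmax_apply(train, lo, hi), lookback=7)
X_test, y_test = make_windows(minmax_apply(test, lo, hi), lookback=7)
```

Fitting the scaling bounds on the training years alone avoids leaking information from the test year into the model inputs.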

Validation Methods and Comparative Algorithms.
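The evaluation indicators for the prediction results are the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). These three indicators are used to test the prediction performance of the model. They are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.$$

In the formulas, $y_i$ and $\hat{y}_i$ are the actual and predicted values of the station ozone, respectively; $n$ is the test sample size and $i$ is the index of the test sample point. $R^2$ and MAPE are reported as percentages, while RMSE and MAE are expressed in the same units as the measured values.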
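The indicators above are straightforward to compute with NumPy. The sketch below uses the standard definitions, with $R^2$ taken as one minus the ratio of residual to total sum of squares (an assumption, since its formula is not given in this section); the function names and sample numbers are illustrative, not results from this study.

```python
import numpy as np

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Percentage error; undefined when any actual value is zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def r2(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = [100.0, 200.0, 300.0, 400.0]      # illustrative values only
predicted = [110.0, 190.0, 310.0, 390.0]
scores = {"RMSE": rmse(actual, predicted), "MAE": mae(actual, predicted),
          "MAPE": mape(actual, predicted), "R2": r2(actual, predicted)}
```

Note that MAPE weights errors on small actual values more heavily, which matters for ozone series with low winter concentrations.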
To verify the advantages of the proposed model, it is compared with four direct prediction models (LSTM, gated recurrent units (GRU), BILSTM, and BIGRU), three time series models (ARIMA, SARIMA, and Prophet), and one prediction model that serves as an ablation study of the proposed approach, EEMD-LSTM.

Results and Discussion
First, the data is decomposed. The value of $R^2$ is a reliability coefficient between zero and one hundred (or 0 and 1.0). A higher $R^2$ indicates a more reliable model. Because both model stability and flexibility matter, optimizing $R^2$ is not the goal in itself. When comparing the adjusted $R^2$ with the original $R^2$ value, it is ideal for the two numbers to be quite similar. When comparing the $R^2$ values of all prediction models, it is clear that the VMD-EEMD-LSTM method produced the highest value ($R^2$ = 0.98) (Figure 5).
A visual comparison of the results from 150 days of observation is shown in Figure 6, with two spots highlighted where the predictions overlap the actual values. It is also important to observe that, although the data is nonlinear and constantly changing, the predictions closely approach the observed values. In other respects, the outcomes showed that LSTM could memorize over long periods of time and had a high degree of accuracy when making predictions. When dealing with complex ozone data, however, a single LSTM model rarely provides optimal results. By breaking complex time series data into series with different frequencies, EEMD enhanced the prediction accuracy, as seen in an increase in prediction accuracy across all stations. Similarly, the station comparison experiment revealed that LSTM performed worse than the GRU when incorrect settings were used. To further enhance the model's prediction accuracy, VMD was employed to locate the denoising pattern of the data for LSTM. Specifically, when compared with other models predicting short-term ozone levels, the VMD-EEMD-LSTM model performed better and was useful in other contexts.
Some researchers predict ozone series after a single decomposition, and because of the high complexity of ozone series, the prediction accuracy is higher than that of direct prediction models. EMD, EEMD, and VMD are all decomposition techniques, yet the earlier methods suffer from modal aliasing and inefficiency. VMD was developed to address these problems, which had plagued the original decomposition method, and is therefore well-suited to the decomposition of ozone series. The complexity of the ozone series is reduced further by decomposing again the components that remain highly complex after the initial decomposition. The complexity of the IMFs can be efficiently reduced through secondary decomposition; however, it is still unclear how to choose high-complexity IMF components.
After doing a simplex decomposition of the IMF, Wang et al. [46] concluded that the initial component has the most complexity. In this study, we use VMD to quantify the difficulty of each IMF component, and we provide quantitative criteria for selecting complex parts. Modal aliasing and inefficient performance are two issues that VMD can successfully address. However, the decomposition effect will differ if the VMD's decomposition level and penalty factor are not appropriately specified in advance.
It has been shown that the predictive performance of the models given by Wu and Lin [47], which use several series decomposition-integrated frameworks, is greatly enhanced. Using pollution data from the city of Anyang as an example, EEMD-LSTM shows improvements of 50.8%, 51.81%, and 52.96% over LSTM in terms of MAE, RMSE, and MAPE, respectively. Good prediction performance, early warning accuracy, and prediction stability were also observed using the VMD-SE-LSTM and the EEMD-LSTM across many data sets. These results are similar to those of our study, which shows that EEMD-LSTM is better than LSTM after series decomposition.
In another approach, Huang et al. [48] removed noise from air quality data using an EMD model and extracted the IMF components from the resulting data. Each IMF component was then modeled using an EMD-IPSO-LSTM air quality prediction model, and values for each were retrieved. The validation analyses of the algorithm provided theoretical and technological support for air pollution prediction and management, and revealed that the revised model had higher prediction accuracy and a better model fitting effect than LSTM and EMD-LSTM. A similar approach is proposed in our study using EEMD, and our method also produced better results than LSTM and EEMD-LSTM.
The experimental data provided was insufficient because of the experimental settings; however, this strategy was successful in predicting ozone. More work has to be done to refine this study's findings. We did not have information about meteorological factors close to the monitoring stations because of the limits of the air quality monitoring station data. It is conceivable that including information about these aspects in future studies would significantly improve the performance of our model. The spread of air contaminants, for instance, could be influenced by factors such as temperature and wind speed. Better air quality forecasts could be achieved with additional study of climatic conditions, automobile emissions, and interactions between citywide monitoring stations. Furthermore, cubic spline interpolation in EEMD could be swapped out for more modern data-interpolation technology to increase the quality of signal decomposition by minimizing the error introduced by fitting the envelope of each extreme point of the signal.

Conclusion
To improve prediction accuracy, various prediction models based on soft computing have been proposed. However, some existing models emphasize only the classifier of the model and pay little attention to data preprocessing. Due to the presence of noise and redundant information in high-dimensional raw data, data preprocessing is a crucial step in predictive models. In this study, a decomposition algorithm is introduced as a preprocessing tool to reduce the dimensionality and extract the intrinsic features of the raw input data. Decomposition algorithms and recent deep learning approaches, including graph-based and transformer-based methods [49-51], have many achievements in natural language processing, computer vision, and other fields. In the environmental domain, however, and especially in air quality time series forecasting, there has been little progress recently.
The aim of this paper was to present a new type of prediction model that combines the strengths of EEMD, VMD, and LSTM. The following findings are based on a study of ozone data from nine stations in Nanjing:
(i) The accuracy of ozone prediction was significantly boosted by decomposing the data using VMD with EEMD into many components of different frequencies and then feeding these components into the LSTM model.
(ii) In many cases, LSTM's hidden layer neural units were chosen automatically based on past data. LSTM helps to predict more results for short-term and long-term data.
(iii) The VMD-EEMD-LSTM hybrid model described here was found to have the best prediction performance in experimental comparisons, with a high degree of fit between the true and predicted values.
These results demonstrate the efficacy of the hybrid prediction strategy suggested here for making accurate forecasts in the future, an approach with real-world implications. The fluctuations of ozone data are irregular and complex, and the time series is affected by multidimensional and complex factors. For example, humidity and weather factors affect air pollutants. It is very difficult to predict the future trend of ozone values based on several factors. Therefore, in future research, the model proposed in this paper can be combined with multidimensional complex influencing factors to further improve the overall forecasting effect.
In the future, optimal ensemble models for the decomposed modes can be explored rather than a simple addition approach, and an intelligent forecasting system and smart decision system for ozone monitoring can be developed so that appropriate policies for management can be formulated in light of forecasting results.Future work will be focused on exploring the relationship among pollutants and ozone and using some data fusion approaches.In addition, the suggested method can be applied to other areas of energy forecasting, such as crude oil price forecasting and wind speed forecasting.

Figure 2: Study area of Nanjing with monitoring stations.

The objective of this study is to develop a new time series-based machine learning model which is good for
(iii) The proposed VMD-EEMD-LSTM model is compared with other machine learning models. In addition, results among different stations of Nanjing city are compared to verify the effectiveness of the model at different locations. The proposed model performs better than other methods, as validated by different indicators, such as mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE).

Table 1: Nanjing air quality monitoring station details and geographical locations.
The results of the EEMD on the ozone time series data are shown in Annexure C. The VMD method is used to decompose the original sequence in advance, yielding the different VMF components and the residual term u. Then, the residual series u is decomposed by EEMD and combined with the LSTM model for combined prediction analysis. The EEMD-LSTM model is a combined prediction model that uses EEMD as sequence preprocessing for the LSTM method and serves as a comparison for the proposed method. Next, the prediction effects of the different combination models are compared and analyzed. Figure 4 shows the average results over all Nanjing stations for the different models. MAE is 45.06 for ARIMA, 50.31 for SARIMA, 20.21 for BIGRU, 25.4 for GRU, 25.83 for BILSTM, 30.51 for LSTM, 31.33 for Prophet, and 8.36 for EEMD-LSTM; the lowest value, 4.54, is recorded for VMD-EEMD-LSTM, which shows that the prediction error is lowest after two series decompositions. The results of each station for all the validation criteria are shown in Annexure D. Similarly, MSE is 55.07 for ARIMA, 60.96 for SARIMA, 10 for BIGRU, 14.92 for GRU, 14.48 for BILSTM, 16.27 for LSTM, 40.71 for Prophet, and 10.74 for EEMD-LSTM, and the lowest is VMD-EEMD-LSTM at 5.38. MAPE is 61.3 for ARIMA, 68.44 for SARIMA, 30.56 for BIGRU, 30.99 for GRU, 29.86 for BILSTM, 29.83 for LSTM, 28.07 for Prophet, and 3.24 for EEMD-LSTM, and the lowest is VMD-EEMD-LSTM at 3.1.
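To put the averaged MAE values above on a common scale, the relative error reduction achieved by VMD-EEMD-LSTM over each baseline can be computed as in the short sketch below. The MAE numbers are those quoted in the text; the helper name `relative_reduction` and the choice of a percentage-reduction measure are assumptions made for illustration.

```python
# Average MAE over all Nanjing stations, as quoted in the text
mae_by_model = {
    "ARIMA": 45.06, "SARIMA": 50.31, "BIGRU": 20.21, "GRU": 25.4,
    "BILSTM": 25.83, "LSTM": 30.51, "Prophet": 31.33,
    "EEMD-LSTM": 8.36, "VMD-EEMD-LSTM": 4.54,
}

def relative_reduction(baseline, proposed):
    """Percentage by which the proposed model lowers the baseline error."""
    return 100.0 * (baseline - proposed) / baseline

proposed_mae = mae_by_model["VMD-EEMD-LSTM"]
reductions = {
    name: round(relative_reduction(value, proposed_mae), 1)
    for name, value in mae_by_model.items()
    if name != "VMD-EEMD-LSTM"
}
```

The entry for EEMD-LSTM isolates the contribution of the additional VMD stage, since the two models differ only in the primary decomposition.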