Hybrid Time-Series Prediction Method Based on Entropy Fusion Feature

High-precision time-series forecasting is a complicated cyber-physical system (CPS) task. Due to the diversity of data scales and types, classic time-series prediction models struggle to deliver accurate results for many forms of time-series data. This work proposes a hybrid model combining long short-term memory (LSTM) and ensemble empirical mode decomposition (EEMD) based on an entropy fusion feature. First, we apply EEMD in entropy fusion feature long short-term memory (ELSTM) to lessen the mode mixing and edge effects of traditional empirical mode decomposition (EMD). The sequence is then divided into intrinsic mode functions (IMFs) by using EEMD. Then, feature vectors are constructed between the IMFs and their respective information entropy for feature merging. LSTM is used to build a fully connected network for each entropy fusion feature IMF subsequence, with each type of IMF subsequence serving as a feature dimension, to obtain its prediction results. Finally, the output results of all IMF subsequences are reconstructed to obtain the final prediction result. Compared with the LSTM method, the performance of the proposed method improved by 64.33% on the evaluation metric MAPE. The proposed model also delivered the best prediction outcomes across four different time-series datasets. The experimental results conclusively show that the proposed method outperforms the other models compared.


Research Motivation and Problem Statement
1.1.1. Motivation. Time-series forecasting (TSF) is a recent dynamic technique for managing CPS, intelligent processing, financial analysis [1], and equipment fault diagnosis [2, 3]. Hrabia et al. [4] pointed out that if an effective analysis and prediction model for COVID-19 can be developed, the epidemic prevention policy can be adjusted accordingly; TSF will play a constructive role in such predictive decision-making. In addition, the CPS sensor network will collect different types of data, so in order to organize system resources more logically, it is necessary to develop a reliable regression prediction analysis model. Huang et al. [5] proposed the EMD method to perform modal decomposition on the original complex signal data and, finally, obtain multiple intrinsic mode function (IMF) components with gentle characteristics. EMD helps to decompose and extract the inherent information of data sequences; since it shares the same purpose as neural networks in solving TSF problems, we consider combining existing deep learning methods with EMD-derived methods to achieve more reliable prediction performance. Lu et al. [6] proposed a new hybrid approach, Prophet-EEMD-LSTM, based on decomposition methods and the prediction model. Such a fusion model can economically improve the prediction ability of the algorithm.

Problem Statement.
Although a fusion model using the EMD method can improve the prediction ability of the model to a certain extent, the inherent problems of the EMD method, such as mode mixing and the boundary effect, will affect the prediction ability of the model. Therefore, it is necessary to develop a model with better decomposition ability to extract the hidden features of time series [7]. Moreover, the mode decomposition method itself cannot achieve orthogonal decomposition in the time domain, and the decomposition is incomplete. Therefore, there must be correlations between the IMFs, and how to make full use of this property to further extract data features is also a focus of our consideration.

Research Challenge and Gaps
1.2.1. Challenges. In common application scenarios, the sequence characteristics are typically non-strictly stationary and nonlinear data types with highly discrete features [8].
EMD is usually used in traditional signal processing to analyze complex signals with non-strictly stationary characteristics and a high degree of nonlinear dispersion. Zhao et al. [9] used a hybrid process based on the EMD and LSTM neural network method. By using the original EMD methodology, the feature extraction ability was improved and the prediction error was decreased. However, there is mode aliasing in signal decomposition, and this problem is particularly prominent when the original signal contains more high-frequency components. It can be dealt with by the VMD-LSTM method proposed by Niu et al. [10]. The intrinsic mode functions (IMFs) in signal decomposition have boundary effects between their components, which affect the prediction accuracy [11]. Ensemble empirical mode decomposition (EEMD) aims to prevent the mode overlap problem that results from the EMD method's insufficient breakdown of data properties [12]. The method enhances the extraction of periodic sequence characteristics and trends using the conventional EMD approach as a foundation. In an attempt to lessen aliasing across IMF components, EEMD-LSTM superimposes Gaussian noise on the original signal [13, 14]. Based on the EEMD components [14], the data are divided into high-frequency and low-frequency sequences using the enhanced Bi-LSTM.

Research Gap.
Existing research usually focuses on reducing the bias that modal aliasing and boundary effects introduce into network models through more optimized decomposition models. However, current mainstream modal decomposition fusion models ignore the correlation between the IMF components obtained from mode decomposition. At the same time, this kind of decomposition method cannot achieve complete (orthogonal) decomposition. Moreover, in the framework of the current EMD-LSTM method, each IMF is predicted independently before reconstruction. This step places too much emphasis on the characteristics of the modal component itself, which causes the prediction result to be skewed toward offset or special modes under incomplete modal decomposition. In general, the abovementioned models inevitably have the defect of ignoring the interaction between IMFs. Furthermore, the hidden features between the time-series features of the IMFs are also ignored in the construction of most fusion models. The main contributions of this work are as follows:

(1) The EEMD algorithm can decompose hidden informative features from time-series data into multiple characterized IMF components.

(2) The ELSTM phase of the algorithm can map the hidden characteristics of IMF components and pass on the shared characteristics among components using entropy fusion, relying on the learning characteristics of the LSTM network.

(3) In the LSTM stage, the algorithm is able to map the hidden characteristics between IMF components through entropy fusion according to the learning characteristics of the LSTM network and transfer the commonality between the components.

(4) The proposed optimization algorithm can be applied to different datasets and has general applicability.

Related Work
The current research in the time-series prediction field can be divided into three main categories [15, 16].

Statistical Model-Based Method
Prajapati and Kanojia [17] examined the index that has the largest influence on the fluctuation of COVID-19 in India by using autoregressive integrated moving average (ARIMA) and autoregressive (AR) models. Behzadi et al. [18] proposed a general information theory framework based on the generalized linear model (GLM), which was applied to causal inference on heterogeneous datasets and verified on celestial data. Kap et al. [19] proposed the additive noise model (ANM) on noise level for time-series analysis, which confirmed the effect and significance of the noise overlaid on the extraction sequence in the time-series prediction model. Hanapi et al. [20] proposed the fuzzy sliding window autoregressive conditional heteroskedasticity model for time-series prediction and applied it to aerial data. Studies based on time-series decomposition [21] show that the Prophet model can handle time-series data with default values. The authors of [25] used wavelet transform to process the signal for regression analysis of user traffic. Zhang et al. [26] decomposed several IMF sequences through variational mode decomposition (VMD) in a study of nonstationary wind speed series. They conducted secondary denoising for each IMF and then trained the improved residual neural network (PCA-BP-RBF), which significantly increased the prediction performance. Although the VMD method can effectively eliminate the problem of modal aliasing, there are significant differences between the decomposition results obtained by redefining the IMF function and those of EMD and its variants. In addition, before performing VMD decomposition on a sequence, the number of modal components K needs to be set first. This step requires a great deal of prior experience, and in some applications, specifying the number of IMFs is an advantage. However, for scenarios where the number of hidden modes of the signal cannot be known in advance, calculating and setting this optimal K value will actually increase the time cost of modeling. Moreover, this kind of model has a high degree of complexity and is even unsuitable for systems with low time-delay requirements.

Machine Learning-Based Method
The least squares support vector machine (LSSVM) was created by Gong et al. [27]. To optimize parameters on the basis of the particle swarm optimization (PSO) algorithm, Montesinos López et al. [28] addressed the issue of high algorithm time complexity by using the sequential minimal optimization (SMO) algorithm to optimize the objective of the quadratic programming problem of the support vector machine (SVM) and support vector regression (SVR). Pekel [29] used decision tree regression (DTR) to predict soil moisture. Jumin et al. [30] conducted a regression study of solar radiation using the boosted decision tree regression (BDTR) model. Qiu et al. [31] optimized the extreme gradient boosting (XGBoost) model and improved the prediction accuracy compared with other XGBoost-based models. This kind of method does not require a massive amount of data, but the model structures are complex. For the three categories of methods described above, we draw the following conclusions. First, machine learning and statistical learning have a large overlap; arguably, machine learning is based on statistical learning. Second, statistical learning is theory-driven: it makes assumptions about the data distribution, explains cause and effect with strong mathematical theory support, and focuses on parameter inference. Third, machine learning is data-driven: it relies on big data to predict the future, weakens the convergence problem, and focuses on model prediction. Fourth, deep learning is a subfield of machine learning; that is, feature extraction relies more on hidden-layer models, has weak explanatory properties, and tends toward black boxes.
From the three main research categories mentioned above, it is not difficult to draw the following conclusions. Machine learning and deep learning methods have shown great advantages in related research. Moreover, research on fusion algorithms based on modal decomposition optimization and neural networks also reflects their excellent reliability. Therefore, in this paper, we try to combine the related methods of modal decomposition and LSTM.
EMD-based decomposition for prediction models of sequence data with temporal characteristics has been developed in recent years [32-34]. Zhang et al.'s [35] EMD-derived model improved the gated recurrent unit (GRU) and combined EMD with a regression prediction model for PM2.5. Ali et al. [36] proposed a new version of EMD based on the Akima spline interpolation technique and an LSTM network, and at the same time, this method enhances the effectiveness of the improved model. Dedovic et al. [37] used EMD and ARIMA to predict air quality. Liu et al. [38] stated that combining the EMD model and an ANN can improve the prediction effect. However, most works are based on a single type of data, and modal aliasing and boundary effects still remain. In addition, the relevant algorithms did not fully use the neural network's characteristics to mine data features. Therefore, this paper proposes EEMD-ELSTM to improve the prediction accuracy and universality of the model.

Method
The global EEMD-ELSTM framework is described as follows:

(1) EEMD with Gaussian white noise is used to decompose the source series data.

(2) The entropy value of each decomposed IMF is analyzed, and the IMFs are classified according to their entropy values.

(3) The LSTM network is then used to conduct fully connected prediction for the IMF subseries of each category to obtain prediction results.

(4) All IMF subseries are then combined and reconstructed to obtain the final forecast result.

The abovementioned process is summarized in Algorithm 1.
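The four-step flow above can be sketched end to end. The decomposition and per-component predictor below are deliberately simplified stand-ins chosen for illustration only (a moving-average band split instead of EEMD, a persistence forecast instead of the LSTM), so only the pipeline shape matches the proposed method:

```python
import numpy as np

def toy_decompose(x, window=5):
    """Stand-in for EEMD: split x into a smooth low-frequency component
    and a residual high-frequency component (NOT a real EEMD)."""
    kernel = np.ones(window) / window
    low = np.convolve(x, kernel, mode="same")
    high = x - low
    return [high, low]                   # components sum back to x exactly

def shannon_entropy(c, bins=16):
    """Histogram-based Shannon entropy of one component (step 2)."""
    p, _ = np.histogram(c, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def persistence_forecast(c):
    """Stand-in for the per-component LSTM predictor: repeat last value."""
    return c[-1]

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.standard_normal(200)

comps = toy_decompose(x)                          # step 1: decompose
scores = [shannon_entropy(c) for c in comps]      # step 2: entropy analysis
preds = [persistence_forecast(c) for c in comps]  # step 3: per-IMF forecast
forecast = sum(preds)                             # step 4: reconstruct
```

The key structural point is step 4: the final forecast is the sum of the per-component forecasts, mirroring how the IMF subseries are recombined.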

LSTM and EMD Relational Model
3.1.1. LSTM. Lobo Neto et al. [39] pointed out that LSTM further optimizes the network performance of the RNN, and impressive results have been obtained on various time-series problems. Generally, an LSTM network is composed of storage units. Three special cell structures are responsible for updating network data: the output gate, the input gate, and the forget gate. Its structure is displayed in Figure 1, where ⊙ indicates the Hadamard product and + is matrix addition.
The inputs of an LSTM cell typically consist of the state parameters c_{t-1}, h_{t-1}, and x_t. The three outputs are the state parameters c_t, h_t, and y_t, where x_t represents the current round's input, h_{t-1} represents the previous round's state output, and c_{t-1} represents the previous round's global information carrier. Then, y_t represents the current round's output, h_t represents the current round's state output, and c_t represents the current round's global information carrier.
The LSTM calculation proceeds as follows. First, the four states for splicing training are obtained from the current input x_t and the h_{t-1} passed from the preceding step; these four state parameters are expressed as z_f, z_i, z_o, and z. After being multiplied by the weight matrix, the splicing vector is turned into a gating state through a sigmoid activation function, while z is mapped to a number between -1 and 1 using the tanh activation function.
Second is the forgetting stage, that is, the forget gate. Selectively forgetting the information from the preceding node is the primary task of this step. The state parameter z_f (f for forget) is used as the control function of the forget gate to decide whether the data in the previous state c_{t-1} need to be forgotten, outputting a value between 0 and 1, where 1 means completely retained and 0 means completely forgotten. The z_f function is expressed as z_f = σ(W_f · [h_{t-1}, x_t] + b_f). The third stage is the selective memory stage, that is, the input gate. This stage selectively memorizes the input time-series data, mainly the input x_t. The gating signal is represented by the state parameter z_i (i for information), and the candidate input is represented by the state parameter z. The next state c_t to be transmitted is obtained by combining the two steps above: c_t = z_f ⊙ c_{t-1} + z_i ⊙ z.
The final stage is the output stage, which decides whether the data will be output as the current state. The decision is mainly made through the state parameter z_o. In addition, the c_t obtained in the previous stage is scaled through the tanh activation function mentioned above, giving h_t = z_o ⊙ tanh(c_t). As in the traditional RNN, the output y_t is then obtained from h_t.
On the basis of the traditional RNN chain structure, LSTM uses special gated memory units to replace the original hidden nodes, which enhances the overall network's ability to retain time-series information and extends the network's long-term memory ability.
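The three-gate update described above can be written as a single numpy step; the stacked-weight layout, dimensions, and random initialization here are our own illustrative choices, not the paper's configuration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step. W has shape (4*H, H+D): the four stacked gate
    pre-activations z_f, z_i, z_o, z; b has shape (4*H,)."""
    H = h_prev.shape[0]
    concat = np.concatenate([h_prev, x_t])   # splice h_{t-1} with x_t
    z = W @ concat + b
    z_f = sigmoid(z[0:H])                    # forget gate, in (0, 1)
    z_i = sigmoid(z[H:2 * H])                # input gate, in (0, 1)
    z_o = sigmoid(z[2 * H:3 * H])            # output gate, in (0, 1)
    z_g = np.tanh(z[3 * H:4 * H])            # candidate, in (-1, 1)
    c_t = z_f * c_prev + z_i * z_g           # forget old state, memorize input
    h_t = z_o * np.tanh(c_t)                 # output stage
    return h_t, c_t

rng = np.random.default_rng(1)
D, H = 3, 4                                  # input and hidden sizes (arbitrary)
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((10, D)):     # run ten time steps
    h, c = lstm_step(x_t, h, c, W, b)
```

Because h_t = z_o ⊙ tanh(c_t) multiplies a (0, 1) gate by a (-1, 1) squashed state, every entry of h stays strictly inside (-1, 1).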

EMD.
EMD decomposes the sequence data into multiple subseries according to the characteristics of the time scale itself and needs no additional basis functions. In the structure of the decomposition algorithm, the main part is empirical decomposition.
Usually, for a data sequence with temporal characteristics, M = {m_1, m_2, ..., m_n}, EMD can decompose it into multiple IMF subseries and a residual component, and the original sequence x(t) can be reconstructed by superimposing the multiple IMF subseries:

x(t) = Σ_{i=1}^{n} imf_i(t) + r_n(t),

where r_n(t) is the residual component, reflecting the overall trend of the sequence.
The aim of the EMD method is to extract from the raw signal its high- and low-frequency sequences and its various scale components, arranging them in order of frequency from high to low so as to obtain the ordered sequence of IMFs.
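The reconstruction identity and the high-to-low frequency ordering can be checked on a synthetic signal whose "IMFs" we construct by hand (a real EMD would produce such components by sifting; these are illustrative stand-ins):

```python
import numpy as np

t = np.linspace(0, 1, 400)
imf1 = 0.5 * np.sin(2 * np.pi * 25 * t)   # high-frequency component (first IMF)
imf2 = np.sin(2 * np.pi * 3 * t)          # low-frequency component (second IMF)
r = 0.8 * t                               # residual trend r_n(t)
x = imf1 + imf2 + r                       # x(t) = sum of IMFs plus residual

def zero_crossings(c):
    """Crude frequency proxy: count sign changes along the series."""
    return int(np.sum(np.sign(c[:-1]) != np.sign(c[1:])))
```

Summing the components recovers x(t) exactly, and the first component crosses zero far more often than the second, matching the high-to-low frequency ordering the text describes.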
The main problems of the EMD algorithm mentioned in Section 2 can be described as follows [40-42]: (1) Boundary effects: the endpoints of the entire time-series data are usually not extreme points. Therefore, the constructed envelope function diverges at the beginning and at the end of the sequence due to the uneven distribution of the endpoints. The deviation caused by this phenomenon continues to be superimposed during the decomposition course, which eventually interferes with the decomposition reliability.
(2) Mode aliasing: the IMFs obtained after decomposing the overall sequence data are incompletely decomposed, and components of different scales and frequencies are mixed within one subseries. When this phenomenon occurs in multiple subseries at the same time, the EMD algorithm loses its physical meaning.

EEMD.
To address the abovementioned issues, we use superimposed noise to extend the decomposition of the sequence into EEMD. The EEMD used in this paper superimposes Gaussian white noise. The precise decomposition process is as follows. Given the input time-series data M = {m_1, m_2, ..., m_n}, zero-mean Gaussian white noise W_i, i = 1, 2, ..., N, is added to the sequence M,
where i indexes the added Gaussian white noise realizations. Comparisons between EMD and EEMD based on the UCI power network dataset are shown in Figure 2. The EEMD method extends the extremes of the original EMD process during decomposition to alleviate the extreme trailing effect and mode overlap at the endpoints.
In addition, comparing the IMF subseries obtained by the two decomposition methods, the EEMD method clearly obtains more subseries. At the same time, the IMF subsequences obtained by a modal decomposition algorithm are generally different and random.

Figure 2: (a) EEMD model; (b) EMD model.
It can be seen from Figure 2 that the boundary effect and modal aliasing problems that arise during the decomposition of the EMD algorithm are effectively alleviated in the EEMD model after superimposing Gaussian white noise.
The abovementioned steps are then repeated until the entire time-series data have been decomposed N times. Each time, new Gaussian white noise is added to the series, and finally, the entire set of IMF subseries is obtained.
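The reason the ensemble average works can be seen directly: zero-mean white noise added independently in each trial cancels at a rate of roughly sigma/sqrt(N). A minimal numpy check (the signal, noise level, and ensemble size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
m = np.sin(np.linspace(0, 4 * np.pi, 256))   # original sequence M
N, sigma = 200, 0.2                          # ensemble size, noise std

# superimpose a fresh zero-mean Gaussian white noise W_i in each trial
noisy = [m + sigma * rng.standard_normal(m.size) for _ in range(N)]

# ensemble average: the injected noise cancels at rate sigma / sqrt(N)
avg = np.mean(noisy, axis=0)
residual_noise = float(np.abs(avg - m).max())
single_trial_noise = float(np.abs(noisy[0] - m).max())
```

A single noisy trial deviates from M by up to a few multiples of sigma, while the 200-trial average deviates by only a few multiples of sigma/sqrt(200) — this is exactly why the per-trial noise does not survive into the averaged IMFs.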
Ensemble averaging over imf_ij is then performed as described in the steps above to obtain the IMFs of the overall sequence M. The flow of the EEMD-ELSTM model algorithm presented in this paper is depicted in Figure 3. As in Section 3.1, we employ the EEMD approach to break down the data series m = {m_1, m_2, ..., m_n} and obtain imf = {imf_j}, j = 1, ..., n. Then, we use the ELSTM algorithm to predict each imf_j, j = 1, ..., n, providing each IMF series' forecast outcome.
After the calculation iteration, the L_j component of each imf_j subsequence is obtained. We then perform entropy analysis on the abovementioned imf subsequences, take the two subsequences with the highest entropy scores as the feature dimensions, and further construct the feature vector of the ELSTM model. The specific implementation steps of the entropy feature fusion model defined in this section are as follows. First, calculate the entropy value of each IMF component and then screen out the two maximum entropy values from the entropy scores. Second, based on modal decomposition, the model yields a residual; under ideal decomposition conditions, a good residual is usually close to a linear function of one variable, and, based on entropy theory, the residual will then be a maximum entropy sequence. However, in order to fully exploit the other hidden data features after decomposition, we further select the IMF component with the second-highest score as an auxiliary dimension. Third, the specific step of constructing the feature vector is to add the two high-entropy components as the feature vector, forming an LSTM sequence structure with three inputs and one output for low-entropy sequence training; when constructing the LSTM sequence structure for the high-entropy components, all IMF components are used as their auxiliary dimensions. For the low-entropy sequences, the feature trend of the high-entropy sequences is added, while the high-entropy sequences acquire the high-frequency features of the rest of the sequences. In this way, we achieve the intended model optimization and improve the robustness of cyber-physical system modeling.
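The screening step can be illustrated with permutation entropy as the complexity measure (the paper does not specify which entropy definition is used, so this choice, like the synthetic components, is our assumption): a strongly periodic component scores low, a noise-like component scores high, and the two top-scoring components become the auxiliary feature dimensions.

```python
from math import factorial
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy in [0, 1]: low for regular series,
    high for noise-like series (illustrative stand-in measure)."""
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pattern = tuple(np.argsort(x[i:i + m * tau:tau]))  # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-(p * np.log2(p)).sum() / np.log2(factorial(m)))

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 512, endpoint=False)
imfs = [
    np.sin(2 * np.pi * 40 * t),    # strongly periodic component
    rng.standard_normal(512),      # noise-like component
    0.8 * t,                       # monotone residual-like trend
]
scores = [permutation_entropy(c) for c in imfs]
top2 = list(np.argsort(scores)[-2:][::-1])  # indices of the two feature dims
```

Here the noise-like series ranks first and the periodic sine second, so `top2` picks them as the two auxiliary dimensions, mirroring the "screen the two maximum entropy values" step.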
The value of the final predicted sequence can be expressed as m̂(t) = Σ_{j=1}^{n} L_j(t). The prediction formula for the M = (imf_j) series is as described above, so the abovementioned L_j can be regarded as the prediction value of each IMF subseries. The amount of input data n determines the algorithmic complexity of the traditional EMD model, which is O(n²). Our EEMD model performs multiple iterations and cycles of the original EMD model with Gaussian white noise added and subtracted.
As a result, the EEMD method has O(k·n² + k) complexity, where k is the number of cycles. In the entropy fusion model, calculating the entropy once and dividing the threshold according to the entropy value is similar to the K-means classification algorithm; consequently, the complexity of this step is O(n). Since the complexity of the LSTM depends on the input size n and the hidden size m, its complexity is approximately 4(m·n + m² + m), and each prediction of the ELSTM network adds n − 1 feature dimensions; thus, ELSTM has a complexity of approximately 4(m·n + m·(n − 1) + m² + m). Since m·n and m·(n − 1) are of the same order, the complexity of the method remains of the same order as that of the LSTM.

Experiment
4.1. Experimental Settings. The datasets used in this work are provided by the University of California, Irvine (UCI). The experimental results show that the proposed approach overcomes the drawbacks of the conventional EEMD-LSTM method and further improves its accuracy compared with state-of-the-art (SOTA) algorithms. Four different types of datasets of Tetouan, Morocco, from the UCI database (2017) are used in this work: (1) power grid power consumption, (2) temperature, (3) wind speed, and (4) humidity. The datasets are listed in Table 1, and the schematic design is shown in Figure 4. The dataset used in this comparative study relates to the power distribution networks of Tetouan, a city located in northern Morocco. The historical data were taken from the Supervisory Control and Data Acquisition (SCADA) system every 10 minutes for the period between 2017-01-01 and 2017-12-31. We used 1000 sample sequences from this dataset in our study.
It can be seen from Figure 4 that the characteristics of non-strict stationarity and nonlinearity are in line with the problem we expect to solve. In order to better visualize the distribution of the aforementioned datasets, we also conducted box plot analysis on the data samples. The results are shown in Figure 5; the box plot analysis shows that the wind speed data exhibit strongly aperiodic and discrete characteristics within the data distribution. In summary, the temporal characteristics of the four research objects are consistent with the problems that our proposed research model expects to solve in cyber-physical systems.
From the decomposition results in Figure 6, we know that the power consumption data have the most typical periodic characteristics, followed by the local periodic characteristics of temperature and humidity. Wind speed data have strong discreteness and nonperiodicity, and there are outliers in the data sequence.
Based on the decomposition results in Figure 6, we perform the entropy calculation on the IMF components in Table 2. Combined with our earlier description of the entropy fusion feature model, we know from entropy theory that when an IMF sequence has strong periodicity, its entropy will inevitably be low, and the amount of hidden information it contains is also low. In addition, the entropy scores reveal shared local periodic characteristics among the power consumption, temperature, and humidity data, although their frequency bands differ, while the wind speed data have a modal gradient completely different from the other three. The four datasets can therefore represent, to a certain extent, the complex and changeable real environment of a cyber-physical system. This is one of the reasons why we chose the abovementioned four datasets as the model test objects.
To evaluate the reliability of the proposed model, the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Pearson correlation (CORR) are used as the indicators [43-45]:

RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²),
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,
MAPE = (100%/n) Σ_{i=1}^{n} |(ŷ_i − y_i)/ŷ_i|,
CORR = cov(y_i, ŷ_i) / sqrt(var(y_i) · var(ŷ_i)),

where y_i is the forecasting outcome, ŷ_i is the true value, cov(y_i, ŷ_i) is the covariance, and var(y_i) and var(ŷ_i) represent the variances.
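The four indicators are standard and can be computed in a few lines; here we use the common convention of `y` for true values and `y_hat` for forecasts, and the sample arrays are illustrative only:

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # percentage error relative to the true values
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def corr(y, y_hat):
    # Pearson correlation: covariance over the product of std deviations
    c = np.cov(y, y_hat)[0, 1]
    return float(c / np.sqrt(np.var(y, ddof=1) * np.var(y_hat, ddof=1)))

y = np.array([100.0, 102.0, 98.0, 105.0])      # true values (illustrative)
y_hat = np.array([101.0, 101.0, 99.0, 104.0])  # forecasts (illustrative)
```

With these sample arrays, RMSE and MAE are both exactly 1.0 (each forecast is off by one unit), MAPE is just under 1%, and CORR is close to 1 because the forecasts track the true series.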

Parameter Setting.
The noise added in the experiments is zero-mean Gaussian noise. In addition, an unlimited number of decompositions is allowed in pursuit of total decomposition. The EEMD-ELSTM model in this paper is validated by shuffle-split cross-validation, and all reported result parameters are the optimal values from 10-fold validation.
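The shuffle-split scheme with the paper's 50/50 train/test split (Section 4.2) and 10 repetitions can be expressed with scikit-learn; the feature array here is a placeholder standing in for the 1000 sample sequences:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(1000).reshape(-1, 1)   # placeholder for 1000 sample sequences

# 10 random splits, each holding out fifty percent for testing
ss = ShuffleSplit(n_splits=10, test_size=0.5, random_state=0)
folds = [(train, test) for train, test in ss.split(X)]
```

Each of the 10 folds draws a fresh random half of the data for training and the disjoint half for testing, which is what distinguishes shuffle-split from ordinary contiguous k-fold partitioning.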

In our comparative experiments and related ablation validation experiments, we set the following parameters.
The batch size is set to 25, the number of epochs to 50, and the validation split to 0.1; the loss function is the mean square error (MSE), the optimizer is Adam, and the Dense output size is 1. There are 40 neurons in the LSTM and ELSTM variable parameter settings. The SVR parameters are C = 100, degree = 3, gamma = 1, and an RBF kernel. The ARIMA parameters are p = 2, q = 2, and d = 0. For the Prophet model and the XGBoost model, we also give their hyperparameter definitions in Table 3. The data are divided into two halves for training (fifty percent) and testing (fifty percent). All experiments are run in Anaconda on the TensorFlow 2.1 platform.

Ablation Experiment.
The ablation experiments compare and analyze the LSTM, ELSTM, EEMD-LSTM [46], and EEMD-ELSTM algorithms on the power consumption dataset. The results are listed in Table 4.
The results show that the entropy fusion model and the EEMD approach both enhance the LSTM optimization. The prediction precision of the LSTM model can be improved by introducing the EEMD-ELSTM algorithm. In addition, from the indicator parameters in Table 4, the EEMD-ELSTM method achieves the best results among the abovementioned methods, and the ELSTM using the entropy fusion method is also optimized to a certain extent. As shown in Figure 7, the prediction results show that the EEMD-ELSTM model gives a regression closer to the ground truth than the LSTM model. Combined with the comparison of the iteration results in Table 4, the proposed approach outperforms the LSTM model in terms of prediction accuracy. Figure 8 shows the comparison with the statistical method ARIMA, the machine learning method SVR, the deep learning methods LSTM and Prophet, and the decision tree approach XGBoost. Among these models, the prediction results of EEMD-ELSTM are the closest to the truth. The prediction error of the EEMD-ELSTM model in Table 5 is also the smallest. Six models are compared in terms of RMSE, MAE, and MAPE in Table 5.
In order to further discuss the influence of LSTM network parameters on the model results, we adjust the batch size and the number of neurons to verify the effectiveness of the model. In addition, to prove that the baseline LSTM model has certain advantages over the traditional ARIMA model, we add the following experiments, performed with a batch size of 5 and 50 neurons. We have also added a 2023 SOTA model, LSTM-TCN [47], as a comparison model.
From Figure 9 and Table 6, comparing the indicators, it is not difficult to find that the LSTM-TCN model has certain advantages over XGBoost in predicting strongly periodic time-series data, and that after parameter adjustment, the LSTM model has certain advantages over the traditional ARIMA model. However, it can also be seen that the baseline models always have some disadvantages compared with the fusion model and ensemble algorithms in terms of versatility and validity across different data.
It can be seen from Table 5 that the EEMD-ELSTM model outperforms the other compared models on all datasets in terms of accuracy. Moreover, compared with the other models, the EEMD-ELSTM method has good universality: the best evaluation values were obtained on four different datasets. At the same time, EEMD-ELSTM also has certain advantages over the evaluation values of the advanced XGBoost model.
From the comparison of evaluation indicators in Tables 5 and 6, it can be seen that, compared with the LSTM-TCN model, the proposed model has better applicability in practical CPS applications. In addition, the results of the ARIMA model are not reliable in the presence of complex stationarity of the data.
The visual comparison is presented as histograms in Figure 10. The information is formed into histograms to more effectively represent the prediction outcomes of the approaches in Tables 5 and 6. It can be seen from the histograms that the EEMD-ELSTM method attains the best values of RMSE, MAE, and MAPE.

Friedman Validation and Post Hoc Nemenyi Test
We conducted the Friedman test [48] to investigate the advantages and disadvantages of the proposed model. Table 7 presents the Friedman validation results for the six models on four different datasets; the Friedman test verification gives stat = 22.691, p = 0.001.
On the basis of the rank values in Table 7, we calculate the critical difference by the post hoc Nemenyi algorithm, with the result shown in Figure 11.
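The Friedman statistic and the mean ranks that feed the Nemenyi critical-difference diagram can be computed with scipy; the score matrix below is purely hypothetical and does not reproduce the paper's actual results:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical error scores (lower is better) for 6 models on 4 datasets.
# Columns: EEMD-ELSTM, XGBoost, ARIMA, LSTM, Prophet, SVR (illustrative).
scores = np.array([
    [0.10, 0.20, 0.30, 0.35, 0.40, 0.50],   # power consumption
    [0.12, 0.22, 0.28, 0.33, 0.45, 0.55],   # temperature
    [0.15, 0.25, 0.35, 0.30, 0.50, 0.60],   # wind speed
    [0.11, 0.21, 0.31, 0.36, 0.41, 0.51],   # humidity
])

# one sample set per model: unpack the six columns
stat, p = friedmanchisquare(*scores.T)

# per-dataset ranks (1 = best), averaged over datasets for the Nemenyi step
mean_ranks = scores.argsort(axis=1).argsort(axis=1).mean(axis=0) + 1
```

With such consistent rankings the statistic clears the chi-square threshold (11.07 at df = 5, α = 0.05), so the null hypothesis of equal performance is rejected and the pairwise Nemenyi comparison over `mean_ranks` is justified.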
Each algorithm's mean rank is different, and the EEMD-ELSTM model has the lowest mean rank. The mean ranks of the other methods, from low to high, are the XGBoost decision tree approach, the ARIMA method (statistical), the LSTM and LSTM-TCN models, the Prophet model (deep learning), and the SVR model (machine learning).


Conclusion
The entropy fusion feature-based hybrid time-series prediction approach combining EEMD and LSTM is proposed in this paper. First, a modal decomposition method, EEMD based on Gaussian white noise, is constructed. Then, the IMF subseries components of the input data are obtained by using this algorithm, and the IMF components generated by decomposition are analyzed by entropy. The IMF component with the largest entropy value is extracted as the auxiliary dimension, the eigenvector of the ELSTM model is constructed, and the final prediction result is obtained by model prediction and reconstruction. In addition, four time-series stream datasets with varying data sizes and six SOTA prediction models are chosen for comparison in the experimental verification section. The experimental results show that the proposed model outperforms the others in reliability and validity. Moreover, based on the MAPE index, we calculated the optimization percentage of the results under each dataset and different parameter configurations by comparing the experimental results of the baseline LSTM model with those of our improved model; ultimately, we improved the MAPE metric by 66.43% compared with the baseline model.
The abovementioned experimental verification shows that the EEMD-ELSTM model has the following characteristics:

(1) Validity: from the analysis of the evaluation indices in Tables 5 and 6 and Figures 10 and 11, the optimized EEMD-ELSTM method clearly outperforms generic non-fused network prediction models in forecasting validity.

(2) Universality: from the evaluation index results and the Friedman test results of the abovementioned comparison experiments, it is clear that, across the four different types of datasets, the EEMD-ELSTM algorithm always has a good prediction effect, has good universality, and can be applied to datasets of different scales.

(3) Causality: the effectiveness of the optimization part and the improved strategy mechanism proposed in the article can be seen from the comparison of the ablation experiments. Ablation research is the most direct way to understand the causality in the system and generate reliable predictions. The causality of each part of the model was demonstrated by ablation.

Further Study
(1) The time complexity of the LSTM method is generally large, and improving the operation effect depends on machine computing power, increasing the number of network neurons, and increasing the iteration count. In future research, we will consider further optimizing the operation process of the relevant machine learning models to improve the model effect, or replacing the traditional neural network part of the current common hybrid models. (2) This article only addresses one-dimensional time-series data flow, which has certain limitations. Predictive studies on rich multidimensional data are among the hot topics of current time-series research; as a result, we are considering multidimensional data decomposition prediction as the next research topic.

Figure 6: Decomposition results of the EEMD method for four different datasets: (a) temperature analysis, (b) humidity analysis, (c) wind speed analysis, and (d) power consumption analysis.

Figure 10: Comparison of prediction results of different models.


Table 1: Information from four categories of statistical data sources.

Table 2: IMF component entropy scores obtained from time-series data decomposition.

Table 3: Experiment settings of each control model.

Table 4: Prediction evaluation at different iteration times.

Table 5: Comparative experimental results of different models.

Table 6: Improved comparative experiment with tuned parameters.

Table 7: Friedman validation results.

Figure 11: Friedman and post hoc Nemenyi test.