A New Hybrid Forecasting Model Based on SW-LSTM and Wavelet Packet Decomposition: A Case Study of Oil Futures Prices

Forecasting crude oil futures prices is a significant research topic for the management of the energy futures market. To improve the accuracy of energy futures price prediction, this paper establishes a new hybrid model (WPD-SW-LSTM) that combines wavelet packet decomposition (WPD) with a long short-term memory network (LSTM) improved by the stochastic time effective weight (SW) function method. In the proposed framework, WPD is a signal processing method employed to decompose the original series into subseries with different frequencies, and the SW-LSTM model is constructed based on random theory and the principle of the LSTM network. To investigate the prediction performance of the new forecasting approach, SVM, BPNN, LSTM, WPD-BPNN, WPD-LSTM, CEEMDAN-LSTM, VMD-LSTM, and ST-GRU are considered as comparison models. Moreover, a new error measurement method (multiorder multiscale complexity invariant distance, MMCID) is developed to evaluate the forecasting results from different models, and the numerical results demonstrate that high-accuracy forecasting of oil futures prices is realized.


Introduction
Crude oil is a natural and nonrenewable resource that has an irreplaceable effect on the development of the global economy and international financial markets. Since oil is the main source of energy production, it is often considered the single most important commodity in the world. The price fluctuations of crude oil may affect the economic situation, social stability, and even national security around the world [1]. Meanwhile, international crude oil price series are regarded as nonlinear and nonstationary time series. Hence, accurate forecasting of the crude oil price is a challenging task in the energy market and has increasingly become an active research field.
In recent years, numerous methods for time series predictions have been proposed [2][3][4][5][6][7][8][9][10][11][12][13]. These methods can be classified into the following three categories: traditional econometric models, machine learning approaches, and deep learning models. The autoregressive integrated moving average model (ARIMA) is a popular statistical model applied to time series prediction. Liu et al. [3] proposed two novel forecasting models based on ARIMA, which were employed to forecast two sections of actual wind speed series. Abdollahi and Ebrahimi [4] established a new composite model to predict Brent crude oil prices by integrating the adaptive neuro fuzzy inference system (ANFIS), autoregressive fractionally integrated moving average (ARFIMA), and Markov-switching models. However, the traditional econometric models have evident shortcomings. For instance, the time series data must be stationary when these models are used for forecasting, and it is difficult to capture the characteristics of the data if the datasets are nonstationary. Therefore, these models are less effective when applied to time series forecasting during periods of sharp fluctuations [14]. With the development of artificial intelligence, machine learning models, such as support vector machine (SVM) and artificial neural networks (ANNs), have attracted a lot of attention because of their capability to learn nonlinear kernel mappings between input and output vectors. For instance, Huang et al. [7] explored the forecasting ability of SVM for financial movement direction and proposed a combining model based on SVM and classification methods. Ghiassi et al. [15] presented a dynamic neural network model for time series event prediction, and compared with the ARIMA model, the prediction results of the proposed model have higher accuracy.
Liao and Wang [6] established an improved neural network, the stochastic time-effective neural network model, and analyzed the statistical characteristics of the volatility of the Chinese stock price indices. Wang and Wang [8] established a hybrid model by combining the principal component analysis (PCA) algorithm and random time-effective neural networks (STNN) and explored its predictive performance on financial time series. Although machine learning techniques have considerable prediction capacity, their precision in exploring the correlations between data is still limited. Meanwhile, these methods are extremely time-consuming for big data, and the predictions often fall short of expectations [16]. With the establishment of hidden layer units, the transmission of historical information can be realized by recurrent neural networks (RNNs). Wang and Wang [9] proposed a new forecasting model to raise the prediction accuracy of crude oil price fluctuations, which is based on multilayer perceptrons (MLP) and Elman recurrent neural networks (ERNN) with a stochastic time effective function. Berradi and Lazaara [17] combined principal component analysis and RNNs to predict stock prices from the Casablanca Stock Exchange, and the results improved on the accuracy of the original method and gave a desirable prediction of the stock price. Deep learning methods are a broader family of machine learning methods, which try to learn advanced features from the given data. Compared with traditional neural network models, deep learning methods contain multiple hidden layers of multilayer perceptrons, and they perform better in managing strong nonlinear characteristics. Long short-term memory network (LSTM) is a type of deep learning method devised specifically to deal with long-term dependence problems [18].
The network structure of LSTM is much more complex than that of RNNs: it utilizes memory cell states to maintain essential historical information and discard the unimportant. Due to this superior algorithmic mechanism, LSTM is widely applied to natural language processing (NLP) and sentiment analysis [19,20], time series forecasting [10,21,22], and music synthesis [23]. However, individual forecasting models cannot precisely reveal the complicated connections existing in nonlinear and nonstationary datasets.
To obtain more accurate and reliable time series prediction, different kinds of hybrid forecasting models have been proposed, which can take advantage of different single models [24][25][26]. Among them, the hybrid models based on decomposition and prediction have been widely recognized, and such models are usually composed of a nonlinear decomposition method and a forecasting model. Liu et al. [27] presented an improved hybrid forecasting model for wind speed, which includes the empirical wavelet transform method and three types of deep learning networks. A comparison of the results of different methods showed that the proposed reinforcement learning based hybrid model is effective in combining three types of deep learning networks and performs better than conventional optimization-based hybrid models. Wang and Wang [28] combined the empirical mode decomposition (EMD) method with a random time strength neural network to predict global stock indices, and the empirical results showed that the proposed approach is effective in predicting stock market fluctuations. Wang et al. [29] established a two-layer decomposition model and then developed an ensemble approach by integrating the fast ensemble empirical mode decomposition method (FEEMD), variational mode decomposition (VMD), and a backpropagation neural network optimized by the firefly algorithm (FA-BPNN). The empirical results indicated that the developed model has exceptional forecasting performance on electricity price series. The first key point of hybrid models is to break down the original data series into several independent subseries, which makes it possible for models to adaptively learn the nonlinear characteristics of fluctuations in each subseries. Then, by using the inverse transformation algorithm, the forecasts of each subseries are integrated to acquire the final forecasting results.
These hybrid models can raise the efficiency and precision of modelling by overcoming the nonlinearity and nonstationarity of the original series [30][31][32]. The wavelet transform (WT) is a time-frequency localization analysis method in which the window area is fixed but its shape can be changed. Because it only redecomposes low-frequency signals during the decomposition process and no longer breaks down high-frequency signals, its frequency resolution decreases as the frequency increases. The EMD, FEEMD, and VMD methods also have certain limitations, for example, inadequate mathematical explanations, boundary effects, noise oversensitivity, and pattern overlap. These may cause excessive decomposition of the original data and adversely affect the prediction results [33,34]. On the other hand, well-known deep learning models are prone to overfitting and rely solely on historical information without considering the statistical regularity of behavior in the financial market, which leads to deficient precision [10,32].
To overcome the disadvantages of the above widely recognized decomposition methods and the traditional deep learning methods, this paper proposes a novel ensemble energy forecasting framework, WPD-SW-LSTM, which combines wavelet packet decomposition (WPD), the stochastic time strength weights (SW) method, and LSTM. WPD was proposed to address the inferior frequency resolution of wavelet decomposition in the high-frequency range and its poor time resolution in the low-frequency range. It is a more sophisticated method of signal analysis that improves the temporal resolution of the signal. Moreover, WPD runs faster than the traditional WT, and by selecting the appropriate wavelet basis function and mother function, the mixing-frequency problem can be alleviated. Therefore, WPD is adopted in this research to explore the complex nonlinear characteristics of the original energy futures time series. In fact, complicated factors affect energy futures prices as market transactions fluctuate. SW is based on a stochastic process, which conforms with both the real trading market and the gating mechanism in the forecasting model [6,8,10]. The mechanism of SW is to weigh historical information according to its time of occurrence. The more recent the historical data, the more valuable its information is for predicting the future, so that historical prices can be employed to better capture the statistical behavior of fluctuations in the energy futures series. In addition, this research is the first to employ the WPD method to decompose the original crude oil series and the first to improve the conventional LSTM model with stochastic time strength weights for crude oil price forecasting. With the method of WPD, the original energy futures price series can be decomposed into several subseries ($SS_i$), which lie in different frequency bands.
Then, a separate SW-LSTM model is built for each corresponding $SS_i$. Finally, the ensemble forecasting result of the original energy futures series is produced by integrating all the predicted $SS_i$ components. To estimate the predictive power of the proposed model WPD-SW-LSTM, conventional models and the latest hybrid models (SVM, BPNN, LSTM, WPD-BPNN, WPD-LSTM, CEEMDAN-LSTM, VMD-LSTM, and ST-GRU) are introduced for comparative analysis. In order to reveal the predictive capabilities of different forecasting models, quantitative analysis is performed through different error measures. At the same time, this research proposes a new error measurement method called multiorder multiscale complexity invariant distance (MMCID) [9,35]. The main contributions of this paper are summarized as follows:

The structure of this article is as follows. Section 2 explains the price datasets from the energy futures markets. Section 3 introduces the WPD and SW-LSTM methodologies and provides the main framework of this paper. Section 4 demonstrates the experimental forecasting results in detail. Section 5 compares the proposed hybrid method with other models, which are SVM, BPNN, LSTM, WPD-BPNN, WPD-LSTM, CEEMDAN-LSTM, VMD-LSTM, and ST-GRU. Moreover, error measurement methods are applied to estimate the prediction performance of each model in this section. Finally, Section 6 summarizes the main conclusion of this study.

Datasets
Crude oil is an international bulk financial commodity, which can be traded in markets around the world either through spot oil or through financial derivative contracts.
This research mainly focuses on the oil futures market, and four representative oil futures indices are selected for the case study: West Texas Intermediate (WTI) futures prices series, Brent crude oil futures prices series, RBOB gasoline, and heating oil. These four datasets are from the New York Mercantile Exchange (NYMEX) energy futures market and can be downloaded from https://www.wind.com.cn/. The WTI crude oil price is widely applied in the pricing of US domestic crudes. Brent is the theoretical international oil benchmark: prices of most oil use Brent crude as the criterion, which is connected with two-thirds of all the world's oil contracts. Brent crude and WTI dominate the oil market, and both determine pricing in their corresponding markets.
They are known as light sweet oil because they contain little sulfur, making them "sweet," and have low density, making them "light." Gasoline and heating oil are refined from crude oil and are usually traded as futures contracts in financial markets. Figure 1 shows the similar dynamic changes of the four oil futures series over a period of more than 10 years, from January 2, 2009, to October 23, 2019. Over this period, the price fluctuation trends of these four futures series are almost the same, which indicates that there is a certain correlation between them.

Wavelet Packet Decomposition.
Wavelet transform is a mathematical method developed to solve the problem of decomposing nonstationary signals. Compared with wavelet analysis, wavelet packet decomposition (WPD) can analyze the signal more meticulously. Wavelet packet analysis divides the time-frequency plane in more detail, and its resolution of the high-frequency part of the signal is better than that of wavelet analysis [36]. It can also adaptively select the best wavelet basis function according to the characteristics of the signal in order to better analyze the signal. The theory of the WPD analysis is as follows [37][38][39]. The wavelet packet function is a time-frequency function; it can be defined as

$$W_{j,k}^{n}(t) = 2^{j/2}\, W^{n}\left(2^{j} t - k\right),$$

where the integers $j$ and $k$ are the index scale and translation operations. The index $n$ is an operation modulation parameter or oscillation parameter. The first two wavelet packet functions are the scaling and mother wavelet functions:

$$W^{0}(t) = \phi(t), \qquad W^{1}(t) = \psi(t).$$

When $n = 2, 3, \ldots$, the function has the following recursive relationship:

$$W^{2n}(t) = \sqrt{2} \sum_{k} h(k)\, W^{n}(2t - k), \qquad W^{2n+1}(t) = \sqrt{2} \sum_{k} g(k)\, W^{n}(2t - k),$$

where $h(k)$ and $g(k)$ are the quadrature filter functions related to the previously defined scaling function and mother wavelet function. The wavelet packet coefficients $w_{j,k}^{n}$ are calculated by the inner product $\langle f(t), W_{j,k}^{n} \rangle$, which is defined as

$$w_{j,k}^{n} = \langle f, W_{j,k}^{n} \rangle = \int f(t)\, W_{j,k}^{n}(t)\, dt.$$

According to the literature [40], the number of decomposition levels in forecasting models is often in the range from 2 to 4. In the present work, the 3-level framework of the WPD algorithm is applied, which is schematically shown in Figure 2(a). Additionally, the Daubechies wavelets of order 4 are employed as the mother wavelet in this research [41], and the corresponding decomposition result of the WTI crude oil is demonstrated in Figure 2(b). Each subseries with a different frequency band represents a sort of oscillatory factor embedded in the futures price indices. In Figure 2(b), the decomposed subseries "DDD3," "DDA3," "DAD3," "DAA3," "ADD3," "ADA3," "AAD3," and "AAA3" are recorded as the series $SS_i$ ($i = 1, 2, \ldots, 8$), in that order.
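To make the three-level packet tree concrete, the following is a minimal sketch in plain Python. It deliberately swaps in the Haar filter pair in place of the order-4 Daubechies wavelet used in this paper, so that the filters fit in two lines; the function names (`haar_split`, `wpd`, `subseries`) and the root-to-leaf path labels are hypothetical and need not match the "AAA3" to "DDD3" ordering of Figure 2(b). The series length is assumed to be divisible by $2^3$.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One analysis step: low-pass (A) and high-pass (D) halves of x."""
    pairs = range(len(x) // 2)
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in pairs]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_merge(approx, detail):
    """Exact inverse of haar_split."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def wpd(x, level=3):
    """Full packet tree: unlike the plain wavelet transform, both the
    A branch and the D branch are split again at every level."""
    nodes = {"": list(x)}
    for _ in range(level):
        nodes = {
            path + band: coeffs
            for path, parent in nodes.items()
            for band, coeffs in zip("AD", haar_split(parent))
        }
    return nodes


def inverse_wpd(nodes):
    """Merge sibling nodes level by level back into a single series."""
    while len(nodes) > 1:
        nodes = {
            path[:-1]: haar_merge(nodes[path], nodes[path[:-1] + "D"])
            for path in nodes
            if path.endswith("A")
        }
    return nodes[""]


def subseries(x, level=3):
    """Reconstruct each terminal node on its own (all others zeroed),
    giving 2**level full-length subseries that sum back to x."""
    leaves = wpd(x, level)
    out = {}
    for keep in leaves:
        masked = {p: (c if p == keep else [0.0] * len(c)) for p, c in leaves.items()}
        out[keep] = inverse_wpd(masked)
    return out
```

Because the inverse transform is linear, the eight single-node reconstructions add up to the original series, which is the property the ensemble step of a decomposition-based forecaster relies on.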

Long Short-Term Memory Network.
Long short-term memory networks are a particular form of RNNs that can handle both long-term and short-term dependencies. They were introduced in 1997 by Hochreiter and Schmidhuber [18] and were improved and promoted in subsequent work. Although the structure of traditional RNNs is entirely capable of handling long-term memory dependencies in theory, the effect is limited in actual applications [42]. Therefore, the memory storage capacity of RNNs is more suitable for short-term sequences. In LSTM, cell states and a gate mechanism are added to the hidden layer of conventional RNNs, so that the gradient vanishing problem can be largely mitigated through the control gates. In addition, each time the historical message is dispatched to the neurons of the hidden layer, several control gates with different functions are employed to regulate past and latest information. The principle of the control gate is as follows. It is mainly composed of a sigmoid neural net layer and a pointwise multiplication operation. The output values of the sigmoid function lie between 0 and 1 and indicate how much information can be delivered to the next step: a value of 0 means that nothing can be transmitted, while a value of 1 means that everything can be transmitted. The LSTM involves three control gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. The forget gate determines how much historical information from the last moment is kept at the current moment. The input gate decides which information is saved in the cell state, and the output gate decides the output data based on the cell state. The architecture of LSTM network is shown in Figure 3. The description of LSTM networks follows Fischer and Krauss [43], Sainath et al. [44], and He et al. [45].
The specific algorithm steps of LSTM are as follows:
(i) The memory cell reads in the input $x_t$ and the previous hidden state $h_{t-1}$, which can reveal long-term dynamic trends and abandon redundant, useless information. The forget gate is determined by the following equation:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).$$
(ii) The first part of the input gate in the model determines how much current information should be retained in the cell state:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right).$$
(iii) The second part generates a new candidate vector $\tilde{C}_t$ to update the state, according to the following equation:
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right).$$
(iv) After that, the new cell state $C_t$ is constructed on the basis of the outcomes of the last steps, with $\otimes$ denoting the Hadamard (element-wise) product:
$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t.$$
(v) Finally, the output gate $o_t$ is updated, and the final output $h_t$ is decided based on the updated state and the output gate state:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \otimes \tanh(C_t).$$
In the previous equations, the following notation is used:
(i) $x_t$ is the input vector at time $t$.
(ii) $W_f$, $W_i$, $W_C$, and $W_o$ are weight matrices.
(iii) $b_f$, $b_i$, $b_C$, and $b_o$ are bias vectors.
(iv) $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate vectors.
(v) $C_t$ and $\tilde{C}_t$ are vectors for the cell states and candidate values.
(vi) $h_t$ is a vector for the output of the LSTM layer.
(vii) $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid function and hyperbolic tangent function, respectively.
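The five steps above amount to one forward pass of a single LSTM cell, which can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in this paper: it assumes the common formulation in which each gate acts on the concatenation $[h_{t-1}, x_t]$, and the bias names `b_f`, `b_i`, `b_C`, `b_o`, the helper `affine`, and the parameter dictionary `p` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    # (i) forget gate: how much of the previous cell state is kept
    f_t = [sigmoid(u) for u in affine(p["W_f"], z, p["b_f"])]
    # (ii) input gate: how much of the candidate is written
    i_t = [sigmoid(u) for u in affine(p["W_i"], z, p["b_i"])]
    # (iii) candidate values for the cell state
    c_hat = [math.tanh(u) for u in affine(p["W_C"], z, p["b_C"])]
    # (iv) element-wise (Hadamard) update of the cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # (v) output gate and new hidden state
    o_t = [sigmoid(u) for u in affine(p["W_o"], z, p["b_o"])]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes a convenient sanity check before training.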

LSTM with Stochastic Time Effective Weight Function (SW-LSTM).
Dufresne and Gatheral et al. [46,47] demonstrate that the prediction of financial market price series should integrate a great amount of historical data, because the information represented in different periods has different impacts on future results. In other words, the closer the data are to the current time, the stronger the impact of their information at that moment, and, on the contrary, the further away the data are, the weaker the influence is [48]. Therefore, to improve the accuracy of forecasting in actual applications, this paper combines the SW function with LSTM theory in the predictive modelling process. During the stage of model training, the SW function is integrated into the LSTM model to construct a novel forecasting model, which is referred to as the long short-term memory with stochastic time strength weight function model (SW-LSTM). The expression of the SW function derives from a stochastic process [6]. It can assign different weights to different data according to their time of occurrence. The mathematical expression is as follows:

where $\beta$ ($> 0$) is the depth of market parameter, $t_0$ is the moment of the latest time point in the dataset, and $t_n$ is an arbitrary time point in the dataset. $B(t)$ is the standard Brownian motion, which is commonly considered as the random movement of a particle in liquid [49]. $\mu(t)$ is the drift function, which mainly directs trend changes. $\omega(t)$ is the wave function, which is applied to model uncertain events during the forecasting process. The mathematical expressions of $\mu(t)$ and $\omega(t)$ are as follows:

In the training process of the conventional LSTM network, the parameter matrices $W_f$, $W_i$, $W_C$, and $W_o$ are modified in each iteration following the backpropagation through time procedure of typical RNNs [17].
The model training error of the sample point $n$ is defined as

For the SW-LSTM model, a new description of the model training error $E_{t_n}$ can be obtained:

Then, the corresponding global error of model training is defined as

In the modelling process, based on the newly defined global error $E$, the model parameters are updated through the gradient descent method [10,50,51]. First, the partial derivative of the global error function with respect to each model parameter is calculated. Then, the principle of parameter update is as follows:

where $net_{f,t}$, $net_{i,t}$, $net_{C,t}$, and $net_{o,t}$ denote the inputs of the corresponding functions. The above is the algorithm of the SW-LSTM model, which corrects the model parameters in accordance with the gradient descent method. Figure 4 illustrates the training algorithm procedures of the proposed model, which involve six steps. For the different subseries of the different crude oil series, different hyperparameters, which include the training steps, the number of hidden layer units, the learning rate, the number of iterations, and the batch size, should be tuned for the proposed model. The specific modelling and empirical prediction are given in Section 4.

Figure 3: The architecture of LSTM network.
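As an illustration of how a time-effective weight can enter the training loss, the sketch below assumes the form commonly used in the stochastic time effective neural network literature cited here, $\phi(t_n) = \frac{1}{\beta} \exp\left(\int_{t_0}^{t_n} \mu(t)\,dt + \int_{t_0}^{t_n} \omega(t)\,dB(t)\right)$, together with a weighted squared error $E_{t_n} = \frac{1}{2}\phi(t_n)(d_{t_n} - y_{t_n})^2$ averaged over the $N$ samples. Both forms are assumptions, as are the Euler discretization, the callables `mu` and `omega`, and the function names; none of them is taken from this paper's equations.

```python
import math
import random


def sw_weights(times, mu, omega, beta=1.0, seed=0):
    """Discretized time-effective weights for ascending `times`.

    The last entry of `times` plays the role of t_0 (the newest point);
    older points accumulate drift and a simulated Brownian term.
    """
    rng = random.Random(seed)
    drift = diffusion = 0.0
    weights = [1.0 / beta] * len(times)  # phi(t_0) = 1 / beta
    for n in range(len(times) - 2, -1, -1):
        dt = times[n + 1] - times[n]  # positive step length
        drift -= mu(times[n + 1]) * dt  # integrate from t_0 back to t_n
        diffusion += omega(times[n + 1]) * rng.gauss(0.0, math.sqrt(dt))
        weights[n] = math.exp(drift + diffusion) / beta
    return weights


def weighted_global_error(targets, outputs, weights):
    """Mean over samples of 0.5 * phi(t_n) * (d_n - y_n)**2 (assumed form)."""
    n = len(targets)
    return sum(0.5 * w * (d - y) ** 2 for d, y, w in zip(targets, outputs, weights)) / n
```

With a positive drift and `omega` set to zero, the weights decay smoothly for older samples, so recent errors dominate the gradient; a nonzero `omega` adds random perturbations around that trend.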

Forecasting Process of the Hybrid WPD-SW-LSTM Model.
In this study, the proposed hybrid forecasting model, WPD-SW-LSTM, is applied to the fluctuations of energy futures prices. The procedure of the WPD-SW-LSTM approach is described briefly below, and the flowchart of this research is shown in Figure 5. The main process of the proposed model is displayed on the upper left of Figure 5 and includes three steps. The first step is data decomposition, where the original preprocessed data are decomposed by the WPD method. The second step is subseries forecasting, where the improved SW-LSTM method is applied, and the third step is ensemble forecasting, where the final forecasting results are obtained by aggregating the subseries forecasting results with the inverse wavelet packet transform. The specific description of each step is as follows:
Step 1: the WPD technique is employed to analyze the original energy futures series $X(t)$ ($t = 1, 2, \ldots, N$). Eight subseries $SS_i$, $i = 1, 2, \ldots, 8$, are derived from the three-layer WPD method, which indicate the local oscillations in different frequency bands. The details of the WPD algorithm are given in Section 3.1.
Step 2: each subseries $SS_i$ derived from the WPD method is separated into training and testing datasets. The SW-LSTM network is utilized to train and establish the forecasting model on the basis of the training dataset. Model parameters need to be set in advance, which include the learning rate, the number of hidden layer units, the number of iterations, and the batch size. They are essential for the predictive precision of the model. The training algorithm procedures of the SW-LSTM model are given in Sections 3.2 and 3.3.
Step 3: the predictions of each $SS_i$ are combined to obtain the final forecasting results by employing the inverse wavelet packet transform. Moreover, linear regression and relative error are applied to investigate the correlation between predicted points and actual values.
Step 4: multiple evaluation indicators are adopted to estimate the prediction ability of WPD-SW-LSTM, which involve MAE, RMSE, MAPE, SMAPE, and TIC and a novel method, multiorder multiscale complexity-invariant distance (MMCID), based on information theory. In addition, other models like SVM, BPNN, LSTM, WPD-BPNN, and WPD-LSTM are taken into account for prediction comparison.

Forecasting and Statistical Analysis
Each series is divided into training and testing sets to examine the effectiveness of the proposed model. Table 1 provides the selection and division of the four selected oil futures indices. Generally, to minimize the influence of noise and enhance the accuracy of forecasting, each subseries $SS_i$ derived from WPD is normalized to the range $[0, 1]$ by the following standardized method [52,53]:

$$S'(t) = \frac{S(t) - \min S(t)}{\max S(t) - \min S(t)}. \quad (17)$$

After that, to acquire the true predictive value and intuitively compare the numerical results with the actual value, the normalized output variables $S'(t)$ should be reverted to $S(t)$ as follows:

$$S(t) = S'(t)\left(\max S(t) - \min S(t)\right) + \min S(t).$$
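The normalization to $[0, 1]$ and its inverse can be sketched as a pair of small helpers. The function names are hypothetical, and fitting the minimum and maximum on the training portion only is a design choice assumed here to avoid look-ahead, not something stated in the text.

```python
def minmax_fit(series):
    """Return the (min, max) pair used for scaling."""
    return min(series), max(series)


def normalize(series, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(s - lo) / (hi - lo) for s in series]


def denormalize(scaled, lo, hi):
    """Invert normalize() to recover values on the original price scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Usage: scale a short series and map it back.
prices = [50.0, 75.0, 100.0]
lo, hi = minmax_fit(prices)
scaled = normalize(prices, lo, hi)
restored = denormalize(scaled, lo, hi)
```

If the bounds come from the training set alone, test values outside that range map slightly outside $[0, 1]$, which is usually harmless but worth keeping in mind.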

Training and Forecasting by the Hybrid WPD-SW-LSTM Model.
In this section, four different energy futures price series are used to validate the proposed hybrid WPD-SW-LSTM model. The decomposition capability of WPD makes it exceptional in the extraction of feature sequences. The model parameters are trained by calculating the root mean square error between the predicted value and actual value. The global error between the predicted value and the actual target is reduced through weight modification. The training enters the next step when the global error is less than the preset value. For all prediction models involved in this article, the input units are set to 4, and the output units are set to 1. In the WPD-SW-LSTM model, the batch size is set to 32, the hidden size is 30, and the number of epochs is 400.
Afterwards, the normalized subseries $SS_i$ obtained from WPD are trained and predicted by the SW-LSTM model. The number of input samples is set to 4, and the number of outputs is set to 1; that is, the 4th order historical data are used to predict the data of the next period. Figure 6 shows the forecasting results of each subseries from the futures series of WTI crude oil. It is shown visually that the predicted value of each subseries $SS_i$ is almost consistent with the actual values.
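The "4 inputs, 1 output" setting corresponds to a sliding window over each subseries. A minimal sketch, with the hypothetical helper name `make_windows`:

```python
def make_windows(series, n_inputs=4):
    """Pair each run of n_inputs consecutive values with the value that follows."""
    count = len(series) - n_inputs
    inputs = [series[i:i + n_inputs] for i in range(count)]
    targets = [series[i + n_inputs] for i in range(count)]
    return inputs, targets


# Usage: seven observations yield three (window, next value) samples.
X, y = make_windows([1, 2, 3, 4, 5, 6, 7])
```

Each row of `X` holds four consecutive observations and the matching entry of `y` is the next one, so a series of length $N$ yields $N - 4$ training samples.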
To further illustrate the predictions of the SW-LSTM forecasting model, Figure 7 shows the empirical results of each subseries from RBOB gasoline. Figures 6 and 7 present the decomposed forecasting results of WTI crude oil and RBOB gasoline as examples; subseries forecasting is a critical component that captures the fluctuations of the prediction, especially in forecasting the direction of fluctuations accurately. The subseries $SS_i$ jointly represent the whole trend of the futures price series, and they are well predicted by the proposed forecasting model: the curves of the actual data and the predicted data are visibly very close. Then, the final predictive results of the four sample datasets can be calculated by employing the inverse wavelet packet transform. Figure 8 shows the final predictive results for the four indices, WTI, Brent, heating oil, and RBOB, with the proposed WPD-SW-LSTM model. In this figure, the fluctuation trends of the predicted data are extremely close to those of the actual data. In addition, the absolute relative error results of the empirical analysis are also shown in Figure 7, which can be calculated by $RE(t) = |y_t - \hat{y}_t| / y_t$. It can be concluded that the predicted results have nearly the same trends as the fluctuations of the actual data. The values of RE are concentrated in (0, 0.01), and only a few data points exceed 0.01 while remaining smaller than 0.015. This means that, with repeated experiments, the energy futures series have been trained well, and the forecasting performance of the WPD-SW-LSTM model is improved.
The relationship between the predicted results and the actual values can be fitted by the linear regression method, where the predicted points are regarded as the dependent variable $Y$, and the actual data are considered as the independent variable $X$. Through linear regression analysis between the predicted value of the WPD-SW-LSTM model and the actual data, the prediction accuracy can be judged by the goodness of fit. The closer the goodness of fit value is to 1, the closer the predicted value is to the true value. An effective numerical indicator of the relationship between the two variables is the correlation coefficient $R$. The linear regression curves for the series WTI, Brent, heating oil, and RBOB are shown in Figure 9, and the numerical results are given in Table 2. In detail, the values of $R$ for these four series are all above 0.98, and the regression coefficients $a$ of the linear equations are near 1, which indicates that the predicted values are close to the actual values. The regression equation parameters of the proposed model for WTI are $a = 0.9934$, $b = 0.6931$, which approaches the ideal situation $y = x$, followed by the Brent indices, $a = 0.9217$, $b = 4.864$. For heating oil, $a = 0.9441$, $b = 0.0823$, and for RBOB gasoline, $a = 0.9930$, $b = 0.0007$.
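The relative error and the regression diagnostics discussed above can be computed with ordinary least squares. The sketch below uses hypothetical function names and fits $Y = aX + b$ with the actual values as $X$ and the predictions as $Y$, returning the slope $a$, the intercept $b$, and the correlation coefficient $R$.

```python
import math


def relative_error(actual, predicted):
    """RE(t) = |y_t - yhat_t| / y_t for each time point."""
    return [abs(y - p) / y for y, p in zip(actual, predicted)]


def fit_line(x, y):
    """Least-squares slope a, intercept b, and correlation coefficient R."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

A perfect forecast gives $a = 1$, $b = 0$, and $R = 1$, which is the ideal line $y = x$ referred to in the text.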

Performance Evaluation Criteria.
While the established model WPD-SW-LSTM is applied to the forecasting experiments, it is also indispensable to validate the forecasting effects of different models. Therefore, five models (SVM, BPNN, LSTM, WPD-BPNN, and WPD-LSTM) are employed for the forecasting evaluations in this part. The support vector machine (SVM) technique is included, as it is regarded as a state-of-the-art machine learning theory for binary classification [54][55][56]. Additionally, to fully demonstrate the effectiveness of the proposed model, BPNN, LSTM, and WPD-BPNN are selected for comparison because the proposed model is constructed based on the LSTM network, and the backpropagation neural network (BPNN) is the most typical neural network. To estimate the forecasting error of the new hybrid model and compare it with the other five models, the errors between actual data points and predicted values for the different models are investigated. Mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), symmetric mean absolute percent error (SMAPE), and Theil inequality coefficient (TIC) are selected as the error evaluation criteria, which indicate the forecasting performance of each model. Generally, the smaller the error values (MAE, RMSE, MAPE, SMAPE, and TIC) are, the more accurate the forecasting model is [52]. The evaluation definitions are expressed as follows:

where $y_t$ and $\hat{y}_t$ are the actual value and the predicted value at time $t$, respectively, and $N$ is the total number of data points. Figure 10 illustrates the forecasting results of WTI, Brent, RBOB, and heating oil for the six forecasting models in comparison. Additionally, the inset plots of Figure 10 show the local predictions on the training sets and testing sets from the proposed WPD-SW-LSTM model.
For WTI and Brent in Figure 11, the left y-axis represents the values of MAE, RMSE, MAPE, and SMAPE, and the right y-axis is the TIC value. For RBOB and heating oil, however, the left y-axis represents the values of MAE, RMSE, and TIC, and the right y-axis is the value of MAPE and SMAPE. In Figure 11, MAPE and SMAPE have similar numerical results in all the case studies. The MAPE, SMAPE, and TIC values of RBOB and heating oil indicate that there is no obvious difference between the WPD-LSTM model and the WPD-BPNN model, but according to the results of MAE and RMSE, the former is slightly better than the latter.
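For reference, the five error measures can be computed as in the sketch below. It uses their commonly cited definitions, which are assumed rather than taken from this paper's equations; in particular, expressing MAPE and SMAPE as percentages and the exact SMAPE variant are assumptions, and the function name `forecast_errors` is hypothetical.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, MAPE (%), SMAPE (%), and TIC for two equal-length series."""
    n = len(actual)
    diffs = [y - p for y, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mape = 100.0 * sum(abs(d / y) for d, y in zip(diffs, actual)) / n
    smape = 100.0 * sum(
        2.0 * abs(d) / (abs(y) + abs(p)) for d, y, p in zip(diffs, actual, predicted)
    ) / n
    # Theil inequality coefficient: RMSE scaled by the root mean squares of both series
    tic = rmse / (
        math.sqrt(sum(y * y for y in actual) / n)
        + math.sqrt(sum(p * p for p in predicted) / n)
    )
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SMAPE": smape, "TIC": tic}
```

MAE and RMSE are in price units, MAPE and SMAPE are scale-free percentages, and TIC lies between 0 and 1, which is one reason a plot of all five may need two y-axes.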
In order to verify whether the proposed model is significantly different from the other forecasting models (WPD-LSTM, WPD-BPNN, LSTM, BPNN, and SVM), the nonparametric Wilcoxon signed rank test is applied to the absolute errors of each pair of compared models [57][58][59]. The corresponding statistical test results for the four indexes are presented in Table 7. The results show that the proposed model differs from the other models with statistical significance. Besides, in Tables 3-6, the MAE, RMSE, MAPE, SMAPE, and TIC values of WPD-SW-LSTM are all smaller than those of the other five models for the WTI, Brent, RBOB, and heating oil indexes. It can be inferred that the WPD-SW-LSTM model is significantly superior to the other models for the four indexes.

Evaluation of Multiorder Multiscale CID Analysis (MMCID)

In this section, a novel error evaluation method is proposed to assess the prediction performance. The new analysis method is based on complexity-invariant distance (CID), which generally brings about major improvements in time series classification and clustering accuracy [35]. Complexity invariance uses knowledge about the complexity discrepancy between two datasets as a correction factor for existing distance measurement methods [35,60]. By improving the CID method, multiorder multiscale complexity invariant distance (MMCID) is derived to evaluate the predictions of the energy futures prices from different forecasting models. In practical applications, complexity is not limited to a single scale. The MMCID measurement considers multiple time scales when validating and quantifying the connection between different futures series. The MMCID measurement consists of the following two procedures: (i) Considering a one-dimensional discrete time series $x_1, x_2, \ldots, x_i, \ldots, x_N$, the consecutive coarse-grained vector $y^{(\tau)}$ is calculated with the scale parameter $\tau$.
The specific mathematical expressions follow [61]. In particular, when $\tau = 1$, the coarse-grained time series $y^{(1)}$ is simply the primitive sequence. The length of each coarse-grained time series equals the length of the primitive series divided by the scale parameter $\tau$. (ii) According to the principle of CID, we compute the multiorder value of CID for each coarse-grained time series and then obtain the MMCID as a function of the scale parameter $\tau$. Assume that there are two time series, $R$ and $S$, with length $n$:

$$R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}, \qquad S = \{s_1, s_2, \ldots, s_i, \ldots, s_n\}. \tag{21}$$

The multiorder distance expression is built from the following quantities: $ED_q(R, S)$ is the distance between the two time series $R$ and $S$, which is made complexity invariant by introducing a correction index; $CF_q$ is the complexity correction index; and $CE_q(T)$ is a complexity evaluation of time series $T$. Moreover, $CF_q$ accounts for the complexity differences between the datasets being compared. It pushes time series with distinctly different complexities further apart. The multiorder parameter $q$ is applied to amplify the effect of large changes in the process of error evaluation. When evaluating with the MMCID method, the actual values are regarded as series $R$ and the predicted results as series $S$. According to the theory of MMCID, the smaller the MMCID value, the better the prediction; a small value also indicates that the fluctuation trends of the prediction are almost consistent with the actual data. In this study, the parameter $q$ is set to 2 and $\tau$ ranges from 1 to 20. Table 8 shows the MMCID values between the forecasting results and the actual values for the six models when the scale parameter $\tau = 1$. The empirical results from the four types of experimental data demonstrate that the proposed hybrid model performs much better than the other five forecasting models.
Figure 12 shows the MMCID results between the actual futures price series and the corresponding predictions from each predictive model. The MMCID value between the actual data and the predictions of the WPD-SW-LSTM model is clearly the smallest of all, and the results from the hybrid models are much better than those from the single models for all four futures indices considered. With this novel evaluation method, the forecasting merits of the proposed WPD-SW-LSTM model are further demonstrated, and the benefit of adding the SW method to the WPD-LSTM model is also clearly revealed. In view of the above empirical analysis, the established hybrid forecasting approach is effective for improving the forecasting accuracy of energy futures prices.
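The two-step MMCID procedure of this section can be illustrated with a short Python sketch. Several points are assumptions rather than the source's exact expressions: coarse-graining is taken to be the average over consecutive non-overlapping windows of length $\tau$ (which matches the stated properties that $y^{(1)}$ is the original series and that the length shrinks by a factor of $\tau$), and $ED_q$ and $CE_q$ are taken to be $q$-order (Minkowski-type) generalizations that reduce to the standard CID of [35] when $q = 2$. The function names are hypothetical.

```python
def coarse_grain(series, tau):
    """Step (i): average consecutive, non-overlapping windows of length tau."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    n_windows = len(series) // tau
    return [sum(series[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]


def complexity_estimate(series, q=2):
    """Assumed CE_q(T): q-norm of the first differences of the series."""
    return sum(abs(b - a) ** q for a, b in zip(series, series[1:])) ** (1.0 / q)


def cid(r, s, q=2):
    """Assumed multiorder CID: ED_q(R, S) times the correction factor CF_q."""
    ed = sum(abs(a - b) ** q for a, b in zip(r, s)) ** (1.0 / q)
    ce_r, ce_s = complexity_estimate(r, q), complexity_estimate(s, q)
    low, high = min(ce_r, ce_s), max(ce_r, ce_s)
    if low == 0.0:
        # Two flat series need no correction; one flat series is treated
        # as infinitely far in complexity from a non-flat one.
        cf = 1.0 if high == 0.0 else float("inf")
    else:
        cf = high / low
    return ed * cf


def mmcid(actual, predicted, q=2, max_tau=20):
    """Step (ii): CID_q of the coarse-grained series for tau = 1..max_tau."""
    return {
        tau: cid(coarse_grain(actual, tau), coarse_grain(predicted, tau), q)
        for tau in range(1, max_tau + 1)
    }
```

Calling `mmcid(actual, predicted)` with the defaults mirrors the setting of this study ($q = 2$, $\tau$ from 1 to 20); the entry for $\tau = 1$ corresponds to the single-scale comparison reported in Table 8.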

Comparative Analysis with Existing Hybrid Models
In this section, the latest hybrid models are considered as benchmark models for predicting the four selected energy futures indexes. Recently, many researchers have combined decomposition methods with machine learning algorithms to establish hybrid forecasting models. Lin et al. [34] proposed the CEEMDAN-LSTM model for forecasting exchange rates. Niu et al. [32] and He et al. [45] applied the VMD-LSTM model to forecasting stock prices and exchange rate movements. Li and Wang [62] developed a novel model, ST-GRU, by embedding a stochastic time intensity function into the gated recurrent unit (GRU) model. Therefore, this section compares the WPD-SW-LSTM model with the CEEMDAN-LSTM, VMD-LSTM, and ST-GRU models. Table 9 lists the error evaluation results of the four hybrid forecasting models. Table 10 gives the results of the Wilcoxon signed rank test for the different model pairs. The p values are all close to 0 and the H values are all 1, indicating that the test rejects the null hypothesis. Hence, the prediction error of the WPD-SW-LSTM model is significantly different (at the 0.05 significance level) from the errors of the other three hybrid models. Furthermore, the error evaluations of all models in Table 9 are very close, but those of the proposed model are smaller than those of the other models. Combined with the statistical test results in Table 10, it can be deduced that the prediction performance of the proposed model is superior to that of the latest three hybrid models for energy futures price forecasting.
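The paired comparison behind Tables 7 and 10 can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it implements the two-sided Wilcoxon signed rank test with the large-sample normal approximation (zero differences discarded, tied ranks averaged), which is an assumption that is reasonable only when the number of paired errors is large. The function name is hypothetical, and the H value is taken here to mean 1 when the null hypothesis is rejected at the 0.05 level.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed rank test (normal approximation).

    Returns (w_plus, p_value), where w_plus is the sum of the ranks of
    the positive paired differences errors_a[i] - errors_b[i].
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return w_plus, p_value


def reject_null(p_value, alpha=0.05):
    """Return 1 if the null hypothesis is rejected at level alpha, else 0."""
    return 1 if p_value < alpha else 0
```

Here `errors_a` and `errors_b` would be the absolute forecasting errors of two models on the same test points. A p value close to 0 with an H value of 1 then indicates that the two error series differ significantly.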

Conclusion
In this research, a new hybrid forecasting model, WPD-SW-LSTM, has been established by integrating wavelet packet decomposition and an LSTM network with the stochastic time strength weight function method. After decomposing the primitive futures series into several subseries, a forecasting model for each subseries $SS_i$ has been established according to its own frequency band properties. The correlation coefficient values (R) for the four energy futures series are all above 0.98 and very close to 1, which implies that the proposed model achieves strong predictive performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.