A Multilevel Wavelet Decomposition Network Hybrid Model Utilizing Cyclic Patterns for Stock Price Prediction



Introduction
Stock price prediction is a very important and complex problem in the field of financial time-series prediction [1]. Stock price fluctuations are influenced by corporate fundamentals, business cycles, stock market trading rules, international political events, investor sentiment, and various other factors. For these reasons, stock price prediction is a challenging problem that has attracted increasing attention from researchers.
The main methods of stock price prediction can be classified into two major classes: traditional statistical methods and machine learning methods [2]. The traditional statistical methods have the advantage of solid statistical theory as support. Various statistical methods for stock price prediction, such as the exponential smoothing model (ESM) [3], vector autoregression (VAR) [4], the autoregressive integrated moving average (ARIMA) model [5], the generalized autoregressive conditional heteroskedasticity (GARCH) model [6], and radial basis functions (RBF) [3], were proposed and widely adopted in econometrics. However, most of these methods are linear models based on hand-crafted factors and are limited by the statistical assumptions that the data are stationary and normally distributed. Such methods may therefore face challenges when analyzing financial series data that are large in volume, highly noisy, nonlinear, and nonstationary [7].
Among machine learning methods, deep learning methods based on neural networks are the most popular and achieve better performance [8]. The main advantage of deep learning models is their ability to learn representations from raw data without feature engineering by experienced practitioners, which makes them especially suitable for complex systems such as the stock market. Moreover, deep learning models can act as general function approximators for complex, nonlinear, and nonsmooth processes [9]. Deep learning models are therefore well worth exploring for financial time-series data.
In addition to the two main classes of methods, some researchers have proposed stock prediction methods based on fuzzy system theory. Wu et al. presented a fuzzy momentum contrarian uncertain characteristic system for the classification and quantification of stock characteristics [10]. Based on the suitability index (SI) derived from fuzzy-set theory, Syu et al. presented a stock selection system called TripleS [11].
According to the literature [12], a time series consists of four components: trend, cycle, seasonal, and irregular components. The trend is a long-run tendency characterizing the time series. It may be a linear increase or decrease in level over time, and it may be stochastic, the result of a random process, or deterministic, the result of a prescribed mathematical function of time. Seasonal components or signals, by contrast, are distinguishable patterns of regular annual variation in a series; these may be due to changes in precipitation, temperature, and so on. Cycles are recurrent data movement patterns over periods: more or less regular long-range fluctuations above or below some equilibrium level or trend line. They have upswings, peaks, downswings, and troughs, and they are studied for their turning points, duration, frequencies, depths, phases, and effects on related phenomena. For example, business cycles are postulated recurrent patterns of prosperity, recession, depression, and recovery. What is left over after these components are extracted from the series is the irregular, or error, component.
In this paper, we refer to the cycles (or similar concepts) in stock price time series generally as cyclic patterns. As is well known, signal decomposition methods from the signal processing field can generate features of different frequencies from series data and perform time-frequency analysis on them. These features, which contain upswings, peaks, downswings, troughs, and cycle (or frequency) information, can be considered cyclic patterns. To capture cyclic patterns, we turned to signal decomposition methods, among which discrete wavelet transform (DWT, or discrete wavelet decomposition) and empirical mode decomposition (EMD) are considered effective methods for obtaining cyclic patterns [13].
However, we found that inappropriate procedures for applying wavelet decomposition to time-series data easily lead to data leakage [14], in which unobserved future data are used: the resulting forecasts appear extremely precise, but predictions produced this way are unreliable. We also found that a sliding window mechanism has been proposed to curb data leakage; however, the wavelet coefficients near the endpoint of the transformation window vary as the window shifts, causing boundary problems. The boundary problem distorts the generated subseries, and hybrid models built on them turn out to be less effective in prediction than simple methods. In this study, we carefully investigate the calculation mechanisms of wavelet decomposition and multilevel wavelet decomposition network methods to resolve these two problems.
Currently, most hybrid neural network models that utilize cyclic pattern information do not take the data leakage and boundary problems of signal decomposition techniques into consideration. We therefore propose a hybrid neural network model that utilizes cyclic patterns to predict stock prices while avoiding data leakage and alleviating boundary problems: the mWDN-LSTM hybrid model, which uses the mWDN network to generate cyclic patterns and then uses the LSTM model to make time-series predictions.
The rest of this paper is organized as follows. Section 2 introduces the related research work. Section 3 introduces the proposed mWDN-LSTM in detail. Then, Sections 4 and 5 present the experimental setup and discuss the experimental results. Finally, Section 6 concludes this paper.
The main contributions of this paper are as follows: (1) We propose a solution that avoids data leakage while alleviating the boundary problem; a multilevel wavelet decomposition neural network and its variants, which can adaptively adjust the wavelet coefficients, are investigated. (2) A new hybrid model combining the wavelet decomposition network and LSTM is proposed, which can effectively utilize the cyclic patterns, and experimental results demonstrate the effectiveness of our proposed model.

Related Work
Due to the success of deep learning in recent years, models based on neural networks have gained more and more attention for stock price prediction problems [15]. In 2010, Naeini et al. applied two neural networks, a feedforward multilayer perceptron (MLP) and an Elman recurrent network, to predict a company's stock value based on its historical stock price [16]. In 2013, Ticknor proposed a feedforward neural network model with Bayesian regularization to predict stock prices, thereby reducing the possibility of model overfitting [17]. In 2015, Rather et al. achieved high accuracy with an RNN model for the prediction of 6 stock prices from the NSE [18]. In 2016, Di Persio and Honchar employed a CNN to predict S&P 500 price movement; the results showed that the CNN achieved better results for financial time series than MLP and RNN models [19]. In 2017, Selvin et al. constructed several deep learning models to predict stock prices in the Indian stock market, employing a deep recurrent neural network (RNN), a long short-term memory (LSTM) neural network, and a convolutional neural network (CNN); the empirical analysis showed that these models achieved reasonable prediction accuracy for stock prices, with LSTM performing best [20]. Further deep learning work on stock prediction followed in 2020 [21]. In 2021, Wu and Ming-Tai proposed the SACLSTM stock price prediction algorithm, which constructs a sequence array of historical data and its leading indicators and uses the array as the input image of a CNN framework; this algorithm achieved excellent forecasting results for Taiwanese and American stocks [22] and is similar to the work proposed by the authors in reference [23]. An LSTM-GA stock trading suggestion system in an IoT context, based on historical data and leading indicators, was also proposed [24]. In 2022, Zhang et al. proposed the novel transformer encoder-based attention network (TEANet) framework, which realizes effective processing and analysis of stock prices to improve the accuracy of stock movement prediction [25]. Some researchers have constructed hybrid models based on signal decomposition techniques and neural networks to exploit the cyclic patterns in the stock market. However, the vast majority did not take into account the data leakage and boundary problems implicit in applying signal decomposition techniques such as DWT to time-series prediction tasks. For example, in 2019, Qiu et al. decomposed the historical stock price time series using DWT and EMD, then analyzed the obtained subseries and generated predictions with an RVFL model [26]. Chandar decomposed the financial time series using DWT and subsequently fed the decomposed subseries into ANFIS to predict closing prices [27]. In 2020, Li and Tang proposed the WT-FCD-MLGRU model and chose four major stock indices, S&P 500, IXIC, DJI, and SSE, to test its performance [28]. In 2021, Wu et al. proposed a combination of ELM and DWT-based models to predict the stock price movements of 400 stocks in China [29].
In the abovementioned studies that employed signal decomposition techniques, the decomposition of the whole data series, including both the training and test sets, was performed before the model was trained. This decomposition operation leaks future data, so the final results are unrealistic, and similar performance cannot be achieved in practical applications. In addition, in 2018, Hasumi and Kajita found that, due to boundary problems, wavelet-based time-series predictions cannot even outperform a simple prediction when the time series is properly processed [30]. Since data leakage and boundary problems may lead to unreliable results, we explain them in detail in Section 3.2.
There have also been research works utilizing cyclic patterns (or similar concepts) in other time-series tasks. In 2018, Wang et al. designed the mWDN network, which implements a multilevel discrete wavelet decomposition process within a neural network; this model achieved better prediction performance than SAE, RNN, and LSTM on cellphone-user-number and ECG time-series prediction tasks [31]. In 2020, Zhang proposed a hybrid neural network model based on mWDN for an industrial productivity prediction task that effectively improved the accuracy and granularity of the prediction [32].

Model
In this section, we first introduce how to generate cyclic patterns in the stock market using discrete wavelet decomposition techniques. Second, we explain the two major problems in the wavelet decomposition procedure that need to be overcome. Third, we introduce the proposed model and each of its components in detail. Finally, the training and prediction of our model are introduced.

MDWD and Cyclic Patterns
3.1.1. MDWD. Multilevel discrete wavelet decomposition (MDWD), a typical discrete signal analysis method, is commonly applied to numerical analysis, time-frequency analysis, denoising, and so on. The process of multilevel discrete wavelet decomposition mainly comprises a convolution operation and downsampling. The convolution operation decomposes the series into low-frequency and high-frequency subseries. Downsampling is designed to reduce the redundancy of the data while keeping the total amount of decomposed data consistent with the original data. However, if translation invariance of the decomposition needs to be maintained (i.e., the length of each decomposed subseries equals the length of the original series), this step can be omitted.
The multilevel discrete wavelet decomposition process is shown in Figure 1, and the related parameters are shown in Table 1. The implementation steps are as follows: (1) In the 1st level of decomposition, the input series x is convolved with the low-pass filter l and the high-pass filter h, generating the intermediate variable series a^l(1) and a^h(1), respectively. This step can also be represented as a matrix operation.
The formula for the i-th level convolution operation is as follows:

$$a_n^l(i) = \sum_{k=1}^{K} x_{n+k-1}^l(i-1) \cdot l_k, \qquad a_n^h(i) = \sum_{k=1}^{K} x_{n+k-1}^l(i-1) \cdot h_k,$$

where $x_n^l(i)$ is the $n$-th element of the low-frequency subseries in the $i$-th level, $x^l(0)$ is set as the input series $x$, and $K$ is the filter length.
(2) 1/2 downsampling of the intermediate variable series a^l(1) and a^h(1) is performed to obtain the low-frequency and high-frequency subseries x^l(1) and x^h(1) of the 1st-level decomposition. (3) The low-frequency subseries x^l(1) is set as the input series for the next level of decomposition. (4) After repeating steps (1) to (3) i times, the decomposition result χ(i) of the i-th level decomposition is obtained.
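To make the procedure concrete, the following minimal numpy sketch implements steps (1) to (4) above; it assumes the Haar filter pair (the text does not fix a specific wavelet) and the circular extension discussed in Section 3.2:

```python
import numpy as np

# Haar filter pair as a concrete assumption; the paper leaves the wavelet open.
l = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass filter
h = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass filter

def mdwd(x, levels):
    """Multilevel discrete wavelet decomposition of a 1-D series x."""
    subseries, low = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        n, K = len(low), len(l)
        idx = (np.arange(n)[:, None] + np.arange(K)) % n  # circular extension
        a_l = low[idx] @ l              # step (1): intermediate low-freq series
        a_h = low[idx] @ h              # step (1): intermediate high-freq series
        x_l, x_h = a_l[::2], a_h[::2]   # step (2): 1/2 downsampling
        subseries.append(x_h)           # keep the high-frequency subseries
        low = x_l                       # step (3): x^l feeds the next level
    subseries.append(low)               # final low-frequency subseries
    return subseries                    # step (4): [x^h(1), ..., x^h(i), x^l(i)]
```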

Cyclic Patterns.
In order to utilize the cyclic characteristics of the stock market, the first step is to generate cyclic patterns from the raw dataset. Discrete wavelet decomposition methods, for example, can be employed to generate cyclic pattern information. The subseries obtained by discrete wavelet decomposition contain cyclic information (or frequency information), such as cycle fluctuation depth, fluctuation duration, and fluctuation turning points, which is consistent with the definition of a cyclic pattern in a time series.

For example, we choose the series with a length of 200 and a decomposition level of 2 in Figure 2 to illustrate this. As shown in Figure 3, first, the generated series fluctuates more or less regularly around the value 0, showing a cyclic fluctuation pattern. Although this series is not a strictly cyclic series, it is a discrete combination of several cyclic series. Second, the series contains upward and downward fluctuations, with the largest upward fluctuation from t = 5 to t = 18 and the smallest downward fluctuation from t = 95 to t = 100. These unidirectional fluctuations are each half of a cyclic fluctuation; therefore, the series contains cyclic fluctuations with a minimum cycle of 10 days and a maximum cycle of 26 days. In addition, the series contains peaks and troughs: the highest peak in this series is at (195, 50.69) and the lowest trough at (5, −54.75). Therefore, the depth of the cyclic fluctuations in this series is between −54.75 and 50.69.
In conclusion, from the subseries produced by discrete wavelet decomposition we can obtain the cycle (or frequency) information, cycle fluctuation depth, fluctuation duration, and fluctuation turning points of the stock series data. This is consistent with the definition of a cycle in a time series, so the subseries obtained by discrete wavelet decomposition are exactly the cyclic patterns of the stock market that we need. In our model, the low-frequency subseries are long-term cyclic patterns and the high-frequency subseries are short-term cyclic patterns. We argue that cyclic patterns can enhance stock market prediction.

Data Leakage and Boundary Problem.
We find that data leakage and boundary problems are the two major problems when applying discrete wavelet decomposition in real stock price prediction applications. In the following, we describe these two problems in detail and introduce our method.

Data Leakage.
Data leakage is the use of information during model training that would not be available at prediction time, causing the predictive scores (metrics) to overestimate the model's utility when run in a production environment. We include the results of a method with data leakage in our experiments to demonstrate how easily its performance is overestimated.
When employing DWT with the translation-invariance property, the length of each subseries equals the length of the original series, which leads many researchers to mistakenly believe that they can decompose the original series in one pass and then split the subseries into training and test sets, performing model training and prediction on that basis. This process is shown in Figure 4, and it contains data leakage. The reason is that the wavelet transform works by convolving the time series with the selected wavelet function: when calculating the output at a point of the time series, the wavelet function must be convolved with that point and several points before and after it. In the case of Figure 5, x_0 to x_9 are time-series data arranged in chronological order. The output of data point x_1 is obtained by convolution with x_0, x_2, x_3, x_8, and x_9, and the output of data point x_3 is obtained by convolution with x_0, x_1, x_2, x_4, and x_5. The output of the convolution operation is thus a local combination of data points, and the decomposed components involve both historical and future data. This is a typical data leakage problem, and it warns us that data decomposition must not involve prediction points or the data after them.
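As a schematic contrast, the sketch below sets the leaky one-time pipeline of Figure 4 against a leak-free windowed one. PyWavelets (`pywt`) stands in for any DWT routine here, and the `db4` wavelet, `window`, and `level` values are illustrative assumptions, not the authors' configuration:

```python
import pywt  # PyWavelets

def leaky_features(series, level=2):
    # One-time decomposition of the WHOLE series (Figure 4): coefficients in
    # the training portion already mix in values from the test portion.
    return pywt.wavedec(series, 'db4', mode='periodization', level=level)

def leak_free_features(series, window=32, level=2):
    # Decompose only the window ending just before each prediction point t,
    # so no coefficient ever depends on future data.
    return [pywt.wavedec(series[t - window:t], 'db4',
                         mode='periodization', level=level)
            for t in range(window, len(series))]
```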

Boundary Problem.
When we take measures, such as sliding windows, to precisely control the decomposed series and avoid data leakage, the prediction results are significantly affected by the boundary problem, and the model cannot generate accurate predictions.
To illustrate the boundary problem, we plot Figure 2, which shows the difference in decomposing time series of different lengths, as well as the difference in output for the same time point depending on whether it lies at the boundary. The data are SSE Composite Index data, part of the experimental dataset. We apply discrete wavelet decomposition to decompose the data into three components while expanding the number of data points from 50 to 200.
As can be seen from Figure 2, the results of the discrete wavelet decomposition of series with lengths of 50, 100, 150, and 200 differ greatly at the boundaries of the series. The subseries at the boundary are off-track and distorted, as in the four areas A, B, C, and D in Figure 2. This is caused by the circularity assumption: the computation at the boundary involves data from the other edge of the window. For example, the output of x_9 in Figure 5 must be calculated together with x_0 and x_1. The prediction of future data should be performed with the most recent data possible, rather than with data from the other end of the sliding window, which would introduce large biases into the prediction results.
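The boundary effect is easy to reproduce. In the hedged check below, synthetic data stand in for the SSE series, and decomposing prefixes of different lengths shows that detail coefficients covering the same time points change once those points sit near the window boundary:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # synthetic stand-in for the index

for n in (50, 100, 150, 200):
    _, cD = pywt.wavedec(series[:n], 'db4', mode='periodization', level=1)
    # Indices 20..24 cover roughly t = 40..49: near the boundary when n = 50,
    # interior afterwards, so the n = 50 row deviates from the other three.
    print(n, np.round(cD[20:25], 4))
```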
We avoid data leakage by applying a sliding window mechanism with mWDN in place of one-time wavelet decomposition. By establishing a new wavelet convolutional operation matrix and incorporating an adaptive adjustment mechanism for the mWDN parameters, we mitigate the impact of the boundary problem on prediction accuracy. Both are elaborated in the descriptions of the input layer and the mWDN component.

mWDN-LSTM Model.
Our model mWDN-LSTM can be divided into four components: the input layer, the mWDN component, the LSTM component, and the output component. The model structure diagram is shown in Figure 6. The input layer sets up the sliding window and normalizes the data. The mWDN component implements wavelet decomposition, decomposing the series data to generate cyclic patterns. The LSTM component learns and memorizes long-term and short-term information and makes predictions. The output component is a fully connected network that converts the output vector into the final prediction.

Input Layer.
In order to avoid data leakage and keep the solution practically feasible, we can only decompose in real time and predict while decomposing, so we use the sliding window mechanism shown in Figure 7. The window is set up immediately before the prediction point and moves forward one unit at a time until all data points are covered. mWDN decomposes only the data within the window. This mechanism ensures that the decomposition process is real-time and never includes future data, which makes the prediction results realistic and reliable and allows deployment in real investment scenarios.
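A minimal sketch of this mechanism, assuming a window size of 32 (the time_step used in Section 4.3):

```python
def sliding_windows(series, window=32):
    """Yield (input window, prediction target) pairs with no future data.

    The window [t - window, t) sits strictly before the prediction point t
    and moves forward one unit at a time, as in Figure 7; only this window
    is handed to mWDN for decomposition.
    """
    for t in range(window, len(series)):
        yield series[t - window:t], series[t]
```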

mWDN Component.
In order to obtain cyclic patterns using discrete wavelet decomposition while alleviating the influence of boundary problems on prediction, we set up a new convolutional operation matrix and utilize mWDN, with its adaptive parameter adjustment capability, to implement the discrete wavelet decomposition process.
(1) Redesign of the Convolutional Operation Matrix. When the regular convolutional operation matrix (similar to Figure 5) is adopted, the convolutional calculation at the boundary of the window involves data on the other side of the sliding window, distorting the calculation results. We therefore alleviate the impact of the boundary problem by shifting the wavelet parameters in the convolutional operation matrix so that the calculation results at the boundary near the prediction point are distorted as little as possible. The redesigned matrix is shown in Figure 8 and is applied in mWDN.
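Figure 8 gives the exact layout; since the text describes only its general idea, the numpy sketch below shows one plausible reading, in which rows whose filter taps would wrap past the window edge are shifted back instead, and all off-tap entries are small random ε values:

```python
import numpy as np

def init_weight_matrix(filt, P, eps_scale=1e-3):
    """Filter taps on staggered rows, off-tap entries epsilon (|eps| << |taps|)."""
    K = len(filt)
    W = np.random.randn(P, P) * eps_scale   # the epsilon background of Figure 8
    for row in range(P):
        start = min(row, P - K)             # shift back instead of wrapping, so the
        W[row, start:start + K] = filt      # newest points never mix with the oldest
    return W
```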
The mWDN approximately implements MDWD within a deep neural network framework consisting mainly of a perceptron model and an average pooling layer. mWDN implements the convolution operation of MDWD by replacing the weight parameter matrix of the perceptron model with the wavelet-function matrix of the convolution operation. This distinguishes mWDN from MDWD with constant parameters: it can fine-tune parameters such as the convolutional operation matrix and the bias vector to fit different learning tasks. The downsampling process of MDWD is then implemented by the average pooling layer. We aim to alleviate the impact of the wavelet decomposition boundary problem on the prediction results by exploiting mWDN's capability to fine-tune the convolution calculation matrix and bias vector. The schematic diagram of mWDN implementing the i-th level of the MDWD process is shown in Figure 9. The steps of the process are as follows: (1) We set up the weight matrices W^l(i) and W^h(i) according to the parameters of the low-pass filter l and the high-pass filter h; the values of the low-pass and high-pass filters depend on the selected wavelet function. We initialize the bias vectors b^l(i) and b^h(i) as close-to-zero random values, and we set the initial values of W^l(i) and W^h(i) at the i-th level decomposition as shown in Figure 8. (2) We multiply the weight matrix with the input series to implement the convolution operation described in Section 3.1.1. (3) The result of the previous step is added to the bias vectors b^l(i) and b^h(i), and the sum is passed through the activation function to obtain the intermediate variable series a^l(i) and a^h(i).
The calculation process of steps (2) and (3) is shown in the following equations:

$$a^l(i) = \sigma\!\left(W^l(i)\, x^l(i-1) + b^l(i)\right), \qquad a^h(i) = \sigma\!\left(W^h(i)\, x^l(i-1) + b^h(i)\right),$$

where $\sigma(\cdot)$ is the activation function. (4) The intermediate variables $a^l(i)$ and $a^h(i)$ are then downsampled by the average pooling layer:

$$x_n^l(i) = \frac{a_{2n-1}^l(i) + a_{2n}^l(i)}{2}, \qquad x_n^h(i) = \frac{a_{2n-1}^h(i) + a_{2n}^h(i)}{2}.$$
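A minimal PyTorch sketch of one such decomposition level follows; the sigmoid activation and the exact initialization details beyond steps (1)-(3) are our assumptions:

```python
import torch
import torch.nn as nn

class MWDNLevel(nn.Module):
    def __init__(self, P, W_l, W_h):            # W_l, W_h: (P, P) initial matrices
        super().__init__()
        self.fc_l = nn.Linear(P, P)             # trainable W^l(i), b^l(i)
        self.fc_h = nn.Linear(P, P)             # trainable W^h(i), b^h(i)
        with torch.no_grad():
            self.fc_l.weight.copy_(torch.as_tensor(W_l, dtype=torch.float32))
            self.fc_h.weight.copy_(torch.as_tensor(W_h, dtype=torch.float32))
            self.fc_l.bias.normal_(0.0, 1e-3)   # close-to-zero random bias
            self.fc_h.bias.normal_(0.0, 1e-3)
        self.pool = nn.AvgPool1d(kernel_size=2) # 1/2 downsampling, step (4)

    def forward(self, x_low):                   # x_low: (batch, P) = x^l(i-1)
        a_l = torch.sigmoid(self.fc_l(x_low))   # a^l(i), steps (2)-(3)
        a_h = torch.sigmoid(self.fc_h(x_low))   # a^h(i)
        x_l = self.pool(a_l.unsqueeze(1)).squeeze(1)  # x^l(i): (batch, P // 2)
        x_h = self.pool(a_h.unsqueeze(1)).squeeze(1)  # x^h(i)
        return x_l, x_h
```

Because the weight matrices and biases are ordinary `nn.Linear` parameters, backpropagation can fine-tune them away from the pure wavelet initialization, which is exactly the adaptive adjustment described above.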

LSTM Component.
In this component, we employ LSTM to model the time-series data. LSTM is a special kind of recurrent neural network (RNN) proposed by Hochreiter and Schmidhuber [33]. Although RNN models can store historical information in their hidden states and effectively utilize it for prediction, they can only learn short-term dependencies between features and suffer from exploding and vanishing gradients. LSTM addresses these problems by adding three gate structures and a memory cell on top of the RNN. The three gates are the input gate, the forget gate, and the output gate; their role is to control the flow of information in the hidden state, learning long-term and short-term dependencies, which works well on time-series datasets.
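For illustration, one decomposed subseries can feed a standard LSTM subnetwork as follows; the hidden size of 16 per branch (three branches together giving the 48-unit output vector of Section 4.3) is our assumption, not a stated setting:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
subseries = torch.randn(4, 16, 8)      # (batch, subseries length 16, 8 features)
output, (h_n, c_n) = lstm(subseries)   # h_n[-1]: a (batch, 16) summary vector that
                                       # can be concatenated across the branches
```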

mWDN-LSTM Training and Prediction Process.
To train the mWDN-LSTM network, the training set first needs to be standardized. After standardization, the data are fed into the mWDN component and processed based on the sliding window: the input data are decomposed into subseries of different frequencies by the high-pass and low-pass filters, generating cyclic patterns. The data are then fed into the LSTM component, where the input of each LSTM subnetwork is the output of the mWDN component; the LSTM component produces an output vector. The output vector is fed into a fully connected neural network to obtain the final prediction. After each forward pass, the error function computes the error between the predicted value and the real value. Finally, the network is trained by propagating the calculated error back through the network and using the optimizer to update its weights and biases. After training is completed, the model is saved. Similarly, the test set is first standardized and then fed into the saved model to obtain predicted values. Since the obtained predictions are standardized, they must be restored to the original scale. Finally, the evaluation criteria are calculated from the predicted and real values, and the predictions and evaluation criteria are given as output.
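The whole procedure condenses to a conventional training loop. The sketch below assumes an MSE error function and the Adam optimizer, neither of which is pinned down in the text, and `model` stands for the combined mWDN, LSTM, and output components:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=100, lr=1e-3):
    criterion = nn.MSELoss()                     # error function (assumed MSE)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for windows, targets in train_loader:    # standardized sliding windows
            preds = model(windows)               # mWDN -> LSTM -> fully connected
            loss = criterion(preds, targets)     # error vs. real values
            optimizer.zero_grad()
            loss.backward()                      # propagate the error back
            optimizer.step()                     # update weights and biases
    return model
```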
The process of mWDN-LSTM training and prediction is shown in Figure 10.

Experiment
To demonstrate the effectiveness of mWDN-LSTM, we compare the model with MLP, CNN, RNN, LSTM, and CNN-LSTM. The features included in each piece of data are the opening price, highest price, lowest price, closing price, volume, turnover, ups and downs, and change; a sample of the data is shown in Table 2. The dataset features can be described as follows: (i) Opening price is the first price of a listed stock at the beginning of a trading day. (ii) High and low prices are the highest and lowest prices of the stock on that day; traders generally use these to measure the volatility of a stock. (iii) Closing price is the price of the stock at the end of a trading day. (iv) Volume is the total number of shares or contracts traded in the market during the day.
(v) Turnover is the total value of the stocks or contracts traded in the market on that day. (vi) Ups and downs is the value of the increase or decrease of the day's closing price relative to the previous day's closing price. (vii) Change is the ratio of the increase or decrease of the day's closing price relative to the previous day's closing price.
Prediction target: the prediction target is the closing price of the next day. Train and test set splits: we take the data of the first 6,627 trading days as the training set and the data of the last 500 trading days as the test set.

Experimental Setup.
The data are standardized and restored by the z-score method, using the following equations:

$$y_i = \frac{x_i - \bar{x}}{s}, \qquad x_i = y_i \cdot s + \bar{x},$$

where $x_i$ is the input data, $\bar{x}$ is the average of the input data, $s$ is the standard deviation of the input data, and $y_i$ is the standardized value. For evaluation criteria, the mean absolute error (MAE), root mean square error (RMSE), and R-squared ($R^2$) are applied to evaluate effectiveness:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the real value, and $\bar{y}$ is the average of the real values. The closer the MAE and RMSE values are to zero, the smaller the difference between the predicted and real values and the higher the prediction accuracy; the closer $R^2$ is to 1, the better the model fits.
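Written out in numpy for concreteness, the standardization, restoration, and evaluation criteria above are:

```python
import numpy as np

def zscore(x):                 # y_i = (x_i - mean) / std
    return (x - x.mean()) / x.std(), x.mean(), x.std()

def restore(y, mean, std):     # x_i = y_i * std + mean
    return y * std + mean

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```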

Implementation of mWDN-LSTM.
The parameter settings of our proposed model mWDN-LSTM are tuned one by one via cross-validation. The parameters of this experiment are shown in Table 3.
Given the parameter settings of the mWDN-LSTM network, the data dimensions of the input and output of each component of mWDN-LSTM are shown in Figure 11. The model structure is as follows: according to the time_step size and the dimension of the input data, the data at the input layer form a three-dimensional tensor (none, 32, 8). After the data enter the mWDN component, they are decomposed into subseries of different frequencies, generating the cyclic patterns in the data. After the 2-level decomposition, data of length 32 are decomposed into one subseries of length 16 and two subseries of length 8, for a total of three subseries. The output of the mWDN component is therefore two four-dimensional tensors: (none, 16, 1, 8) and (none, 8, 2, 8). Each subseries feeds an LSTM subnetwork. After the LSTM component is trained, an output vector (none, 48) is produced, where 48 is the number of hidden units in the LSTM component. Finally, this vector is fed into the output component to get the final predicted value.
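The subseries lengths in Figure 11 follow directly from the halving at each decomposition level, as this short check shows:

```python
time_step, levels = 32, 2
length = time_step
for i in range(1, levels + 1):
    length //= 2             # each level halves the series via 1/2 downsampling
    print(f"level {i}: x^h({i}) and x^l({i}) have length {length}")
# level 1: length 16 (x^h(1) is kept); level 2: length 8 (x^h(2) and x^l(2) kept)
# Three subseries in total -- one of length 16 and two of length 8 -- matching
# the (none, 16, 1, 8) and (none, 8, 2, 8) outputs of the mWDN component.
```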

Experiment Results
In this section, we discuss our model's effectiveness compared with other benchmarks. With regard to benchmarks, to the best of our knowledge, there is no research that utilizes cyclic patterns correctly in the stock price prediction task, so we choose the MLP, CNN, RNN, LSTM, and CNN-LSTM models as benchmarks. DWT-LSTM is used as a case study to describe the results of hybrid models in the presence of data leakage.
Our experiments use the training set to train mWDN-LSTM, MLP, CNN, RNN, LSTM, and CNN-LSTM, respectively, and then use the test set to generate predictions. Based on the experimental results, we plot the comparison figures of predicted and real values (Figures 12-19), the table of evaluation criteria (Table 4), and the comparison charts of evaluation criteria performance (Figures 20 and 21).
The results of DWT-LSTM, shown in Figure 22 and Table 4, would usually cause researchers to overestimate the performance of signal decomposition techniques such as wavelet decomposition, but similar hybrid models with data leakage are unreliable in application scenarios. This is one of the motivations of our paper. Furthermore, in order to clearly display and compare the constructed mWDN-LSTM stock index prediction model against the cutting-edge benchmark CNN-LSTM model, two time periods were selected from the test set results for enlarged display and comparison. If the first point of the test set is marked as t_1, the second point as t_2, and so on, then the two time periods are t_301 to t_400 and t_401 to t_500. The results are as follows.

Results Demonstration.
The comparison figures of predicted and real values visually demonstrate the error between predicted and real values at the turning points and during trend duration phases, as well as the degree of model fitting. From Figures 12-19, we can see that mWDN-LSTM has the lowest error between predicted and real values at the turning points and during the trend duration phases compared to the other models, so the predicted value series of mWDN-LSTM fits the real value series most closely. Based on the diagrams at the turning points and trend duration phases, the descending order of fitting degree across all models is mWDN-LSTM, CNN-LSTM, LSTM, RNN, CNN, and MLP.
From the diagrams above, we find that most models predict badly especially around turning points; our model mWDN-LSTM alleviates this problem by being guided by cyclic pattern information.

Result Analysis.
The diagrams above demonstrate the prediction results visually. In this section, we calculate the evaluation criteria (MAE, RMSE, and R^2) from experiments carried out on the various models under the same experimental setup, so that we can evaluate the prediction error and model fitting degree more precisely. From the results presented in Table 4 and Figures 20 and 21, we can draw three major conclusions.
First, LSTM-based models outperform non-LSTM-based models, which means that, in general, LSTM-based models are more suitable for time-series prediction tasks.
Among the non-LSTM-based models (CNN, RNN, and MLP), CNN and RNN have close prediction results with little difference between them, but both are significantly better than MLP. For example, compared to MLP, the MAE of CNN decreases from 37.757 to 30.397 (by 19.5%), the RMSE decreases from 49.371 to 41.492 (by 16%), and R^2 improves by 1.59%. Therefore, the CNN and RNN models outperform the MLP.
Among the LSTM-based models (mWDN-LSTM, CNN-LSTM, and LSTM), LSTM performs worst but still significantly improves the prediction results compared to CNN and RNN; for example, compared to CNN, the MAE of LSTM decreases from 30.397 to 28.675 and the RMSE from 41.492 to 40.793. Second, the hybrid models outperform the nonhybrid models. This demonstrates that hybrid models designed for a specific task generally outperform general-purpose models.
Among the hybrid models (CNN-LSTM and mWDN-LSTM), the CNN-LSTM model performs the worst; among the nonhybrid models (LSTM, RNN, CNN, and MLP), LSTM performs the best. Compared with LSTM, the MAE of CNN-LSTM decreases from 28.675 to 27.559 (by 3.9%), the RMSE decreases from 40.793 to 39.522 (by 3.1%), and R^2 improves by 0.23%.
Finally, of all the hybrid models, our model mWDN-LSTM performs the best. This demonstrates that correctly utilizing cyclic patterns in a hybrid model can improve the prediction results.
We compare mWDN-LSTM with CNN-LSTM, which already achieves excellent prediction results among the benchmarks. Compared with CNN-LSTM, the MAE of the mWDN-LSTM model decreases by 4.8%, the RMSE decreases by 3.1%, and R^2 improves by 0.48%. Based on the experimental results, the comparison figures of predicted and real values (Figures 23 and 24) are plotted, as well as the table of evaluation criteria (Table 5) and the comparison charts of evaluation criteria performance (Figures 25 and 26).
From Table 5, in the experimental validation on the HSI dataset, we can see that the evaluation criteria MAE and RMSE of the mWDN-LSTM model are the best and its R^2 is closest to 1; the mWDN-LSTM model again obtains excellent prediction results and has the highest degree of fitting compared to the other benchmark models. It can be concluded that mWDN-LSTM has generalizability.

Summary.
Our proposed mWDN-LSTM outperforms all the other baseline models and is more effective for predicting the next day's closing price of stocks.
Meanwhile, our experiments demonstrate the effectiveness of utilizing cyclic patterns while avoiding data leakage and alleviating the impact of boundary problems.

Conclusions
In this paper, we study the problem of stock price prediction, which aims to predict the next-day closing price of a stock using historical information. We have noticed that cyclic patterns are important characteristics of the stock market.
From this motivation, we propose the mWDN-LSTM model based on deep neural networks, which can effectively and correctly utilize the cyclic patterns in the stock market. Unlike other DWT-based hybrid models, our mWDN-LSTM model avoids data leakage through a sliding window mechanism, and through the adaptive parameter adjustment mechanism of mWDN and the redesign of the convolution matrix, the impact of the wavelet decomposition boundary problem on prediction performance is alleviated. The model is therefore both theoretically sound and practically feasible for stock price time-series prediction.
In addition, the model generates cyclic patterns of different frequencies from stock data by applying the mWDN network and then employs the LSTM model to learn the cyclic patterns and predict the next day's closing price. We compare mWDN-LSTM with baseline models to verify its effectiveness on the SSE Composite Index and Hang Seng Index datasets. The experimental results show that the evaluation criteria MAE and RMSE of our model are the best and its R^2 is closest to 1. This means that our model mWDN-LSTM outperforms the benchmarks and demonstrates the effectiveness of utilizing cyclic patterns in stock price prediction tasks while avoiding data leakage and alleviating the impact of boundary problems.

Figure 2: Comparison of series of different lengths after decomposition. (a) is the original data, (b) is the high-frequency subseries of the level-3 decomposition, (c) is the high-frequency subseries of the level-2 decomposition, and (d) is the high-frequency subseries of the level-1 decomposition. The above decompositions have not been downsampled.

Figure 4: Wavelet decomposition-based prediction process in the presence of data leakage.

Figure 8: Redesigned convolutional operation matrix, where W^l(i), W^h(i) ∈ R^{P×P}, P is the size of the input series at the i-th level decomposition, and ε are random values satisfying |ε| ≪ |l| for all l ∈ l and |ε| ≪ |h| for all h ∈ h.

Figure 11: Data dimensions of input and output in each component of mWDN-LSTM.

Figure 14: Comparison of the predicted value of the CNN-LSTM and the real value.

Figure 23: Comparison of the predicted value of the mWDN-LSTM and the real value (in the HSI dataset's experimental validation).

Figure 24: Comparison of the predicted value of the CNN-LSTM and the real value (in the HSI dataset's experimental validation).

Table 1: Parameters in the MDWD implementation process. x = x_1, ..., x_t, ..., x_T is the input series; x^l(i) and x^h(i) are the low- and high-frequency subseries in the i-th level.

Table 2: Head data in the SSE experimental data.


Table 4: Comparison table of evaluation criteria. Note: the DWT-LSTM model achieves high precision due to data leakage, which is impractical in real applications.

Table 5: Performance comparison table on the HSI dataset.