Prediction of Financial Time Series Based on LSTM Using Wavelet Transform and Singular Spectrum Analysis

To overcome the difficulties that existing models face with the nonstationary and nonlinear characteristics of high-frequency financial time series, especially their weak generalization ability, this paper proposes an ensemble method that combines data denoising methods, namely the wavelet transform (WT) and singular spectrum analysis (SSA), with a long short-term memory (LSTM) neural network to build a prediction model. The financial time series is decomposed and reconstructed by WT and SSA to remove noise, yielding a smooth sequence that retains the effective information. The smoothed sequence is fed into the LSTM to obtain the predicted value. With the Dow Jones Industrial Average (DJIA) as the research object, the five-minute closing price of the DJIA is predicted over short-term (1 hour), medium-term (3 hours), and long-term (6 hours) horizons. Based on root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the standard deviation of absolute percentage error (SDAPE), the experimental results show that, over all three horizons, data denoising greatly improves the stability of the prediction and effectively improves the generalization ability of the LSTM prediction model. Because WT and SSA extract useful information from the original sequence and help avoid overfitting, the hybrid model better captures the sequence pattern of the DJIA closing price.


Introduction
As the world's largest economy, the US has advanced statistical institutions and a mature financial supervision system, and its financial data are comprehensive, accurate, and credible. At the same time, the US stock market operates efficiently alongside other markets and plays an important role in the US financial system; these characteristics make the market a good model. On the one hand, global stock markets react quickly to the movements of this market, especially in periods of unusually high volatility. On the other hand, most economic theories and assumptions are based on the study of a developed financial system with a larger and more active stock market, a more mature economy, and a more effective financial supervision system. As a representative of developed markets, the US market is also the most favorable object for testing empirical or theoretical propositions in academic research. The three major stock indexes in the United States are the Dow Jones Index [1], the Standard and Poor's 500 Index, and the Nasdaq Composite Index. The most famous of these is the Dow Jones Index, whose importance is recognized in global markets beyond its role in the domestic market. The 30 companies that make up the Dow, such as Citigroup, Coca-Cola, General Motors, and Intel, are prestigious multinational corporations. They cover a wide range of large industries, and their performance is closely tied to the global economy. Therefore, forecasting the Dow Jones Index is of great significance to the entire financial system.
At present, two categories of prediction models are suitable for financial time series: parametric models and nonparametric models. Autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models are typical parametric models [2]. However, these models can only be used if the time series to be predicted satisfies their statistical assumptions, so parametric models are limited to some degree. In addition, parametric models work best for time series with linear characteristics, whereas the DJIA price series is nonlinear and highly volatile, which requires many estimated parameters and increases the complexity of the model. Given these limitations and the features of the DJIA price, parametric models are unsuitable for forecasting the Dow Jones Index.
Stock price prediction is a pertinent and difficult problem that has attracted the interest of many scholars. To adapt to the characteristics of the DJIA and ensure high accuracy, nonparametric models are used; specifically, machine learning (ML) and deep learning (DL) models are used for DJIA prediction. Ahmed [3] found that machine learning methods can perform better than traditional econometric methods. Sun et al. [4] compared the accuracy of the echo state network (ESN) and the long short-term memory (LSTM) model in predicting the stock price of Kweichow Moutai. Their empirical results indicate that the ESN model can alleviate the low accuracy, slow convergence, and complex network structure of the deep learning model while achieving higher prediction accuracy. Cao et al. [5] used a convolutional neural network (CNN) to predict the stock index and established a CNN-support vector machine (SVM) stock index prediction model. Their empirical results show that neural networks predict financial time series better than traditional econometric methods. Ustali et al. [6] used an artificial neural network (ANN), the random forest (RF) algorithm, and the XGBoost algorithm to estimate the future stock prices of companies in the Istanbul Stock Exchange (BIST) 30 Index. Their empirical results show that, although XGBoost and random forest give similar results, the predictions of XGBoost are slightly better, and both models outperform the ANN.
The LSTM (long short-term memory) network is a type of recurrent neural network (RNN). Sepp Hochreiter and Jürgen Schmidhuber first presented this algorithm in Neural Computation [7]. It performs better than an ordinary RNN in data processing and prediction. Considering the excellent performance of the LSTM network on time series, Jiang et al. [8] took the daily data of the Shanghai Composite Index and the Dow Jones Index as research objects and built models with both an RNN and an LSTM. Their experiments showed that the LSTM model outperforms the RNN model. However, this model is still not well suited to the Dow Jones Index. To ensure the prediction accuracy of the Dow Jones Industrial Average, more potential factors, such as the influence of policy information, should be taken into consideration, but quantifying these factors is exceedingly difficult. An alternative is therefore to mine the information in the sequence itself through its lag terms. To make up for the deficiencies of a single forecasting model, a new kind of forecasting model, the hybrid model, has been proposed. In this category, decomposition methods such as empirical mode decomposition (EMD) [9] and singular spectrum analysis (SSA) [10] are usually combined with ML- and DL-based financial time series prediction models [11][12][13]. Recent works [14,15] have verified the superiority of hybrid models, which stems from their strength in identifying time series patterns. Because there is a nonlinear relationship between the predicted price of agricultural products and the influencing factors, Jia et al. [16] designed the LSTM-DA (Long Short-Term Memory-Double Attention) neural network model, which combines a convolutional attention network, an LSTM network, and an attention mechanism. Compared with the traditional single model, this model improves the prediction accuracy, and the predicted price index accurately describes the overall trend of vegetable products in the next week.
However, hybrid models are rarely used to forecast DJIA prices. Meanwhile, many previous studies have shown that denoising a high-frequency time series can significantly improve the generalization ability of the model and substantially improve the prediction results. At present, the main empirical data decomposition and noise reduction methods are ensemble empirical mode decomposition (EEMD), singular spectrum analysis (SSA), and wavelet transform (WT) decomposition. Although EEMD can suppress the mode mixing problem to some extent, it may increase the complexity of the sequence. Jung et al. [17] integrated wavelet transforms with a recurrent neural network (RNN) based on the artificial bee colony (ABC) algorithm (called ABC-RNN) to build a stock price prediction system, and the presented model performed best on the TAIEX. However, it still has some shortcomings. For example, the system lacks a solution that includes a feature selection function and an additional parameter information provision function to achieve a simplified system organization.
This paper proposes an integrated method to establish a prediction model, which combines data denoising methods, namely the wavelet transform (WT) and singular spectrum analysis (SSA), with a long short-term memory (LSTM) neural network. We also compare the results of different models. The second part of this paper introduces the model formulation, the third part presents the data and compares the prediction accuracy and stability of the models, and the fourth part is a summary.

Model Formulation
In this section, based on the previous work [18], we provide an overview of the main models used in this study, including the LSTM, WT, hybrid WT-LSTM, SSA, and hybrid SSA-LSTM models.

Long Short-Term Memory.
The LSTM neural network was first proposed by Hochreiter and Schmidhuber and is widely used to process sequence information owing to its advantages in discovering long-term dependencies. Therefore, it is theoretically feasible to establish an LSTM neural network model for high-frequency financial time series data. The structure of each neuron in the LSTM is shown in Figure 1; it consists of a cell and three gates. The cell records the state of the neuron, and the input gate and output gate receive, output, and modify parameters. The forget gate controls how much of the previous state of the neuron is forgotten; that is, it determines the information to be removed from the cell. The choice of activation function is an important part of training a neural network, because it allows the network to learn the nonlinear factors in the data. The activation functions used in this paper are the traditional sigmoid and tanh functions. The computation in an LSTM neuron mainly includes the following stages. The first stage is eliminating part of the information from the cell through the forget gate:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
where $x_t$ represents the input of the current cell, $h_{t-1}$ represents the output of the previous cell, $W_f$ and $b_f$ are the weights and bias of the forget gate, and $\sigma$ represents the sigmoid activation function, which reads the information of $x_t$ and $h_{t-1}$ and outputs a value between 0 and 1. The second stage is updating the status of information in the cell:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \quad C_t = f_t * C_{t-1} + i_t * \tilde{C}_t,$$
where $C_{t-1}$ indicates the status of old cell information. The third stage is outputting the information controlled by the output gate. Firstly, we run the sigmoid layer to determine which part of the cell state is output. Secondly, we pass the cell state through the tanh activation function (obtaining a value between -1 and 1) and multiply it by the output of the sigmoid layer, so that only the selected part of the information is output:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t * \tanh(C_t).$$
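The three stages described above (forget, update, output) can be sketched as a single LSTM time step. This is a minimal NumPy illustration, not the paper's TensorFlow implementation; the function names, the stacking order of the gate weights in `W`, and the random toy inputs are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Run one LSTM time step and return (h_t, C_t).

    W maps the concatenation [h_prev, x_t] to four stacked gate
    pre-activations in the (assumed) order: forget, input, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:n])              # stage 1: forget gate, values in (0, 1)
    i_t = sigmoid(z[n:2 * n])          # stage 2: input gate
    c_hat = np.tanh(z[2 * n:3 * n])    # stage 2: candidate cell state
    o_t = sigmoid(z[3 * n:4 * n])      # stage 3: output gate
    c_t = f_t * c_prev + i_t * c_hat   # updated cell state
    h_t = o_t * np.tanh(c_t)           # output, values in (-1, 1)
    return h_t, c_t

# Toy run: 3 hidden units, 4 input features (e.g. high, low, close, open).
rng = np.random.default_rng(0)
n_hidden, n_input = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_input))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_input)):  # five synthetic time steps
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every entry of `h` stays strictly inside (-1, 1) regardless of the input scale.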
To verify the effectiveness and versatility of the WT and SSA filtering schemes, this research adopts the most common LSTM neural network structure. The input features are the most basic ones: the highest, lowest, closing, and opening prices of the last five minutes. Specifically, the main structure of the LSTM neural network in this paper includes a 150-node LSTM layer, a 50-node LSTM layer, and a fully connected layer. In addition, a dropout layer is introduced into this model in order to compare the performance of the dropout layer with that of the data denoising methods. The computation graph of the LSTM neural network constructed in this paper is shown in Figure 2, where the dotted box represents the neural network structure.

Wavelet Transform Analysis.
The Dow Jones Index is susceptible to a large number of factors such as economic development, policy changes, and investor sentiment. Its prices usually contain a lot of noise and are characterized by nonlinearity. To improve the generalization ability of the model, the noise should be filtered out when a deep neural network is used to process the nonlinear data. Wavelet analysis can carry out multiscale refined analysis of signals through operations such as dilation and translation, effectively eliminating the noise contained in the data while retaining the characteristics of the original signal [19]. Accordingly, this paper uses wavelet decomposition and reconstruction to preprocess the financial time series, as shown in Figure 3, and adopts "wavelet denoising" to eliminate the high-frequency noise components in the time series, so as to weaken the influence of short-term noise disturbance on the neural network and improve the prediction performance of the model.
Wavelet decomposition splits each input signal into a low-frequency signal and a high-frequency signal, and only the low-frequency part is decomposed further. Assume that $C_0$ is the original financial time series signal, and that $C_1, C_2, \ldots, C_L$ and $D_1, D_2, \ldots, D_L$ are the low-frequency and high-frequency signals of the first, second, ..., and $L$-th layers. Then the decomposition can be mathematically expressed as follows:
$$C_0 = C_1 + D_1 = C_2 + D_2 + D_1 = \cdots = C_L + D_L + D_{L-1} + \cdots + D_1.$$
To denoise, the Mallat algorithm reconstructs the signal from the low-frequency coefficients of layer $N$ and the high-frequency coefficients of layers 1 to $N$ of the wavelet decomposition, with the high-frequency part set to zero. The low-frequency part of the wavelet decomposition of financial time series data reflects the general tendency of the series, and the high-frequency part reflects its short-term stochastic disturbance. Therefore, on the one hand, setting the high-frequency part to zero eliminates the noise and smooths the signal. On the other hand, it yields an approximation of the original financial time series, which prevents the neural network from overlearning short-term stochastic disturbances and improves the extrapolation and generalization ability of the model.
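The decompose, zero, and reconstruct procedure can be illustrated with a short sketch. To stay self-contained it uses the Haar wavelet implemented directly in NumPy as a stand-in for the sym4 basis used in this paper; in practice a wavelet library would supply sym4. The function names and the synthetic series are assumptions for the example.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Return (C_L, [D_1, ..., D_L]) using the orthonormal Haar wavelet.

    Only the low-frequency part is decomposed again at each level.
    The signal length must be divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    if approx.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-frequency part D_j
        approx = (even + odd) / np.sqrt(2.0)         # low-frequency part C_j
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def haar_denoise(signal, levels):
    """Set every high-frequency part to zero, then reconstruct."""
    approx, details = haar_decompose(signal, levels)
    zeroed = [np.zeros_like(d) for d in details]
    return haar_reconstruct(approx, zeroed)

# Synthetic noisy trend (not DJIA data), four decomposition layers.
rng = np.random.default_rng(0)
t = np.arange(64)
noisy = 100.0 + 0.1 * t + rng.normal(scale=0.5, size=t.size)
smooth = haar_denoise(noisy, levels=4)
```

With the Haar basis, zeroing all four detail layers replaces each block of 16 samples by its mean, which makes the smoothing effect easy to inspect. A smoother basis such as sym4 gives a less blocky approximation but follows the same three steps.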

WT-LSTM.
To obtain higher accuracy, LSTM and WT are combined to predict the price of the DJIA. The hybrid WT-LSTM model consists of the following three stages, and the modeling process is shown in Figure 4. The first stage is WT decomposition: the sym wavelet is chosen as the wavelet basis for decomposing the DJIA closing price. The second stage is time series reconstruction: the financial time series is reconstructed to improve the generalization ability of the prediction model. The third stage is LSTM prediction: the smoothed series $x_t$ and the volume $V_t$ are the input features of the LSTM. The time lag of $x_t$ and $V_t$ is determined according to the partial autocorrelation function (PACF) of $x_t$. Then, the final prediction result is obtained. The sym (Symlet) wavelet is a nearly symmetric orthogonal wavelet derived from the db (Daubechies) wavelet, and it has better symmetry [20]. The more layers of wavelet decomposition there are, the more stable the detail and approximation signals become; however, more layers also introduce greater errors in the decomposition process, so the number of layers should be neither too large nor too small. With four layers of decomposition, the denoising effect is remarkable without much valid information being eliminated. Therefore, in this paper, the sym4 wavelet basis is first used to decompose the closing price of the DJIA into four layers, so as to reconstruct the financial time series and improve the generalization ability of the prediction model. On this basis, the general tendency and market volatility information in the original data are extracted by preprocessing. As a result, the proposed hybrid WT-LSTM model can avoid overfitting and outperform the single LSTM model.
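The third stage feeds lagged values of the smoothed series and the volume into the LSTM. The sketch below shows one way to build such lagged input windows once a lag has been chosen (for example from the PACF). The function name, the choice of a one-step-ahead target, and the toy arrays are assumptions for illustration; the paper does not spell out its exact windowing.

```python
import numpy as np

def make_lagged_samples(smoothed, volume, target, lag):
    """Build LSTM inputs of shape (samples, lag, 2) and one-step-ahead targets.

    Sample i contains the `lag` most recent values of the smoothed price
    and the volume; its target is target[i + lag].
    """
    smoothed = np.asarray(smoothed, dtype=float)
    volume = np.asarray(volume, dtype=float)
    target = np.asarray(target, dtype=float)
    features = np.column_stack([smoothed, volume])  # shape (T, 2)
    n_samples = len(smoothed) - lag
    if n_samples <= 0:
        raise ValueError("series must be longer than the lag")
    X = np.stack([features[i:i + lag] for i in range(n_samples)])
    y = target[lag:]
    return X, y

# Toy series (not DJIA data) with a lag of 3 time steps.
smoothed = np.arange(10, dtype=float)
volume = 100.0 * np.arange(10, dtype=float)
X, y = make_lagged_samples(smoothed, volume, target=smoothed, lag=3)
```

Here `X[0]` holds time steps 0 to 2 of both features and `y[0]` is the value at time step 3, so no future information leaks into an input window.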

Figure 1: Structure of an LSTM neuron (legend: neural network layer, pointwise operation, vector transfer, concatenate, copy).

Singular Spectrum Analysis and SSA-LSTM.
To study how decomposing and reconstructing the financial time series by wavelet transform (WT) affects the prediction accuracy on out-of-sample data and the ability to predict the future dynamic trend, we also carry out a controlled experiment in which the financial time series is decomposed and reconstructed using singular spectrum analysis (SSA). SSA constructs a trajectory matrix from the observed financial time series, decomposes and reconstructs the trajectory matrix, and extracts the different components of the signal, thus effectively eliminating the noise in the financial time series while retaining the features of the original signal.
Suppose there is a one-dimensional sequence $x(i)$ ($i = 1, 2, \ldots, n$). Given that the embedding dimension is $m$ ($m < n/2$), a time-delay matrix $X$ can be obtained, and its dimension is $m \times k$ ($k = n - m + 1$):
$$X = \begin{pmatrix} x(1) & x(2) & \cdots & x(k) \\ x(2) & x(3) & \cdots & x(k+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(m) & x(m+1) & \cdots & x(n) \end{pmatrix}.$$
Let $S$ be the $m \times m$ covariance matrix of the delay matrix; then
$$S = \frac{1}{k} X X^{T}.$$
Singular spectrum analysis decomposes the covariance matrix $S$ to obtain $m$ singular values $\lambda_i$ ($i = 1, 2, \ldots, m$).
Then the $m$ singular values are arranged in descending order. The magnitude of a singular value represents the relative relationship between the signal and the noise. The points with larger singular values are regarded as signal points, and the points with smaller values are regarded as noise points.
The eigenvector $E_k$ corresponding to $\lambda_k$ is called the empirical orthogonal function. The orthogonal projection coefficient of the sampled signal $x(i)$ on the eigenvector $E_k$ is the $k$-th principal component:
$$a_k(i) = \sum_{j=1}^{m} x(i + j - 1)\, E_k(j), \quad i = 1, 2, \ldots, n - m + 1,$$
where $E_k(j)$ is the $j$-th entry of $E_k$. If each principal component and the empirical orthogonal functions are known, the original sequence is recovered as follows:
$$x(i + j - 1) = \sum_{k=1}^{m} a_k(i)\, E_k(j), \quad j = 1, 2, \ldots, m.$$
To obtain a model for comparison with the WT-LSTM model, we combine LSTM and SSA to predict the DJIA. The hybrid SSA-LSTM model consists of the following three stages, and the modeling process is shown in Figure 5. The first stage is SSA decomposition: the SSA technique is used to decompose the original series ($x_t$) into tendency, market volatility, and noise. The second stage is time series reconstruction: the smooth sequence ($x_t$) is reconstructed from the tendency and market volatility signals. The third stage is LSTM prediction: the smoothed series $x_t$ and the volume $V_t$ are the input features of the LSTM. The time lag of $x_t$ and $V_t$ is determined according to the partial autocorrelation function (PACF) of $x_t$. Then, the final prediction result is obtained.
SSA is used as a preprocessing method to extract the effective information on overall tendency and market volatility from the original series. This paper chooses $m = 10$; that is, the original series is decomposed into ten layers. According to the singular values shown in Table 1, the first layer already contains more than 99.99% of the sequence information for this high-frequency financial time series. Therefore, the first layer is selected as the reconstruction of the sequence, and the rest are regarded as noise.
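The SSA steps above (build the delay matrix, decompose, keep the leading layer, map back to a series) can be sketched as follows. This is an illustrative NumPy version under stated assumptions: it takes the SVD of the delay matrix directly, which gives the same eigenvectors as decomposing the covariance matrix; it measures each layer's share by its squared singular value; and it converts the truncated matrix back to a series by averaging over anti-diagonals. The function name and the synthetic series are hypothetical.

```python
import numpy as np

def ssa_reconstruct(series, m, n_components=1):
    """Smooth a series by keeping the leading SSA components.

    Returns (reconstructed series, share of each component), where the
    share is the squared singular value divided by the total.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - m + 1
    # Time-delay matrix of shape (m, k): entry (row, col) is x[row + col].
    X = np.column_stack([x[i:i + m] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = n_components
    X_r = (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r approximation of X
    # Diagonal averaging: every x[t] appears on one anti-diagonal of X.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for row in range(m):
        recon[row:row + k] += X_r[row]
        counts[row:row + k] += 1.0
    share = s ** 2 / np.sum(s ** 2)
    return recon / counts, share

# Synthetic price-like series (not DJIA data), m = 10 as in the text.
t = np.arange(200)
prices = 100.0 + 0.05 * t + 0.2 * np.sin(t / 3.0)
smooth, share = ssa_reconstruct(prices, m=10, n_components=1)
```

For a series whose level is large relative to its fluctuations, as price series typically are, `share[0]` is close to 1, which is consistent with keeping only the first layer. Keeping all `m` components reproduces the input exactly.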

Training Method and Optimizer Selection.
The goal of this article is to compare the prediction performance on the closing price of the DJIA, so mean square error (MSE) is chosen as the loss function. As for the optimizer, since the Adam algorithm has advantages over other adaptive learning rate algorithms in convergence speed and learning effect, this article uses the Adam (Adaptive Moment Estimation) optimizer for training. The number of epochs is set to 10. The work is implemented in Python and uses TensorFlow as the deep learning framework for training, prediction, and comparison.
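To make the training setup concrete, the sketch below applies the Adam update rule to an MSE loss. It fits a toy linear model in NumPy rather than the paper's LSTM in TensorFlow, so the model, the learning rate, the step count, and the synthetic data are all assumptions chosen only to show how Adam and MSE interact.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def mse(theta, X, y):
    """Mean square error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def mse_grad(theta, X, y):
    """Gradient of the MSE with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_theta = np.array([0.5, -1.0, 2.0])
y = X @ true_theta

theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
initial_loss = mse(theta, X, y)
for t in range(1, 501):
    theta, m, v = adam_step(theta, mse_grad(theta, X, y), m, v, t)
final_loss = mse(theta, X, y)
```

On the first step the bias-corrected moments reduce to the raw gradient and its square, so each parameter moves by roughly `lr` in the direction opposite to its gradient sign. This per-parameter scaling is what makes Adam insensitive to the scale of the gradient.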

Data.
To study the feasibility and effectiveness of the denoising methods in forecasting actual financial time series, this section compares the prediction results of the RNN and LSTM models with a dropout layer and the LSTM model using data denoising methods [...] Table 2. The data come from the Wind database.

Test Set Prediction Effect Evaluation Index.
To verify the validity of the model, prediction and verification are carried out over three horizons: short term (1 hour), medium term (3 hours), and long term (6 hours). Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used as the prediction accuracy indexes to evaluate the predictive performance on the test set. The smaller the values of these three indicators, the higher the prediction accuracy. The prediction stability is evaluated by the standard deviation of absolute percentage error (SDAPE). The lower the SDAPE value, the higher the reliability of the prediction.
Here, $y_i$ is the actual value and $\hat{y}_i$ is the prediction derived from the forecast model. $N$ is the number of predictions.
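The four evaluation indexes can be computed as in the sketch below. It follows the usual definitions of RMSE, MAE, and MAPE. For SDAPE it assumes the population standard deviation (dividing by N) of the absolute percentage errors, and it reports MAPE and SDAPE as fractions rather than percentages; both conventions are assumptions, as is the function name.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE, and SDAPE for paired actual/predicted values."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y - y_hat
    ape = np.abs(err) / np.abs(y)  # absolute percentage error per prediction
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(ape)),
        "SDAPE": float(np.std(ape)),  # population std (ddof=0), an assumption
    }

# Made-up values for illustration, not results from this study.
metrics = forecast_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

RMSE penalizes large errors more heavily than MAE, MAPE normalizes errors by the price level, and SDAPE measures how much the percentage error varies from one prediction to the next.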

Comparative Analysis of Short-Term Forecasting Effects.
For the short-term prediction effect, according to the results shown in Table 3, in terms of prediction accuracy, the RNN with a dropout layer is superior to the LSTM model, and RMSE, MAE, and MAPE are decreased by 11.63%, 1.96%, and 1.89%, respectively. Besides, the dropout layer [21] [...] Through the analysis of short-term prediction, it is found that filtering prevents overfitting and improves generalization more effectively than the dropout layer improves accuracy. At the same time, the wavelet transform has a good filtering effect. In terms of prediction accuracy, WT-LSTM improves on the SSA-LSTM model, reducing RMSE, MAE, and MAPE by 31.57%, 32.64%, and 32.64%, respectively. In terms of prediction stability, WT-LSTM also improves on the SSA-LSTM model, reducing SDAPE by 27.85%. The prediction results of the four methods are shown in Figures 6-10.

Comparative Analysis of Medium-Term Forecasting Effects.
For the medium-term prediction effect, according to the results shown in Table 4, in terms of prediction accuracy, the RNN with a dropout layer is superior to the LSTM model, and RMSE, MAE, and MAPE are decreased by 10.64%, 4.54%, and 4.53%, respectively. Besides, the dropout layer improves the prediction of the original LSTM model, decreasing RMSE, MAE, and MAPE by 34.54%, 36.15%, and 36.14%, respectively. SSA-LSTM improves the prediction of the original LSTM model, reducing RMSE, MAE, and MAPE by 77.67%, 76.49%, and 76.49%, respectively. WT-LSTM improves the prediction of the original LSTM model, reducing RMSE, MAE, and MAPE by 75.60%, 73.29%, and 73.28%, respectively. In terms of prediction stability, the RNN with a dropout layer is also superior to the LSTM model, and the SDAPE is reduced by 21.23%. Besides, the dropout layer improves the prediction stability of the original LSTM model and reduces the SDAPE by 32.03%. SSA-LSTM improves the prediction stability of the original LSTM model and reduces the SDAPE by 79.20%. WT-LSTM improves the prediction stability of the original LSTM model and reduces the SDAPE by 79.79%. Through the analysis of medium-term prediction, we can see that filtering prevents overfitting and improves generalization more effectively than the dropout layer improves accuracy. At the same time, singular spectrum analysis also has a good filtering effect.
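The percentage reductions quoted in this comparison are relative changes of an error measure against a baseline model. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the example values are made up, not entries from the paper's tables.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of an error measure, in percent of the baseline."""
    if baseline == 0:
        raise ValueError("baseline error must be nonzero")
    return 100.0 * (baseline - improved) / baseline

# Made-up example: a baseline RMSE of 20.0 reduced to 5.0.
reduction = percent_reduction(20.0, 5.0)
```

A positive value means the second model has the lower error; a negative value would mean that it performs worse than the baseline.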

Comparative Analysis of Long-Term Forecasting Effects.
For long-term forecasts, according to the results shown in [...] In summary, both WT-LSTM and SSA-LSTM significantly enhance the prediction ability of the original LSTM and improve its prediction accuracy and stability, and especially its generalization ability, in the short, medium, and long term alike. In the short term and medium term, WT-LSTM yields a larger improvement than SSA-LSTM, while in the long term, SSA-LSTM yields a larger improvement than WT-LSTM. A comprehensive comparison of the prediction results of the four methods is shown in Figures 21-23.

Conclusion and Discussions
This paper has discussed the theoretical basis of deep learning and the practical application of LSTM price prediction, and it has proposed applying denoising methods to high-frequency financial time series to minimize the effect of random noise and improve the generalization of the model on out-of-sample data. The results show that effective denoising methods, especially the wavelet transform and singular spectrum analysis, improve the LSTM prediction model.
In light of the empirical results on the five-minute DJIA closing data, the following conclusions can be drawn. Firstly, denoising the data with the wavelet transform and singular spectrum analysis significantly improves the generalization ability of the LSTM neural network; WT performs better than SSA in the short term and medium term but worse than SSA in the long term. Secondly, as the forecast horizon lengthens, the generalization gain that the LSTM neural network obtains from the sequence filtered by wavelet decomposition and reconstruction weakens, while the gain it obtains from the sequence filtered by singular spectrum analysis increases. Nevertheless, the prediction improvement from both filters remains significant. Thirdly, the WT-LSTM and SSA-LSTM neural networks converge quickly and predict well on high-frequency data, which provides a new idea for financial risk management and monitoring under high-frequency trading. These findings can be widely used in selecting methods for processing time series. Specifically, WT-LSTM is recommended for relatively short-term time series, while SSA-LSTM is more effective for long-term time series.
In view of the high tunability of neural networks, there is still much room for technical improvement in future research, such as adding more heterogeneous information as input to the neural network and optimizing the structure of the neural network itself. It is worth noting that applying the advantages of big data in high-frequency financial time series to the investment field can help investors discover investment opportunities in a timely manner and facilitate the development of intelligent investment in the financial market. In addition, it can strengthen risk management, improve the efficiency of risk identification, and help maintain the stability of the financial market.

Data Availability
The DJIA data used to support the findings of this study were supplied by Wind under license and so cannot be made freely available. Requests for access to these data should be made to Wind, sales@wind.com.cn.