Stock Index Prices Prediction via Temporal Pattern Attention and Long-Short-Term Memory

This study attempts to predict stock index prices using multivariate time series analysis. The study's motivation is based on the notion that datasets of stock index prices involve weak periodic patterns together with long-term and short-term information, which traditional approaches such as autoregressive models and the Support Vector Machine (SVM) may fail to capture. This study applied Temporal Pattern Attention and Long Short-Term Memory (TPA-LSTM) for prediction to overcome the issue. The results show that stock index price prediction through the TPA-LSTM algorithm achieves better performance than traditional deep neural networks, such as the recurrent neural network (RNN), the convolutional neural network (CNN), and the long- and short-term time series network (LSTNet).


Introduction
Stock index prediction is one of the critical topics in financial time series forecasting. However, two characteristics of stock indexes, noise and nonstationarity, make prediction challenging [1,2]. "Noisy" indicates a lack of information for investors to detect past stock index behaviors. "Nonstationary" arises in a situation where the stock index may change dramatically in different periods. These characteristics degrade the prediction results of traditional econometric models such as the linear model, the Autoregressive Integrated Moving Average (ARIMA) model, and Vector Autoregression (VAR) [3][4][5]. All such methods generally rest on several assumptions, such as independent and normally distributed variables, that contradict real market behavior.
Additionally, recent academic literature shows that time series prediction based on neural networks has become widespread. Unlike traditional models, deep neural networks have several distinct advantages: they are nonparametric, self-learning, assumption-free, and noise-tolerant, and they can capture nonlinear interdependencies that traditional models commonly miss [6][7][8]. Hence, deep neural networks are usually more effective in forecasting stock index prices than traditional models [9].
Recent studies recognize deep neural networks as another promising tool in financial time series forecasting [10,11] due to their ability to model nonlinear patterns, comprehend complicated causal relationships, and learn from colossal historical datasets. In this field, prediction through machine learning uses various approaches, such as long short-term memory (LSTM) [12] and the support vector machine (SVM) [13]. The related studies mainly fall into three categories. In the first category, researchers identify significant events through templates such as the stock market bulletin [14,15]. In the second category, researchers seek inherent structure in the time series and predict future patterns [16,17]. In the third category, researchers predict the numerical value of financial time series through technical analysis, which investigates past stock prices and volumes [18].
However, the performance of neural network techniques in the stock index scenario is relatively less explored [19][20][21]. Also, the overall body of literature highlights five problems in stock index prediction through neural networks. First, existing studies mainly focus on predicting a single stock index without considering differences among industries [22]. Second, previous studies mainly focus on univariate time series without considering dynamic dependencies among multiple variables [23]. Third, most studies find that a single neural network model cannot perform well in combining the nonlinear and linear structures inherent in most financial data [19,20]. Fourth, existing models are mainly designed for multivariate time series with strong periodic patterns and do not adapt to datasets with nonperiodic or weakly periodic patterns [24]. Fifth, most of the literature uses RSE, CORR, and the t-statistic to evaluate a model's performance, the last of which rests on strict assumptions such as normality.
To solve the problems mentioned above, this paper extends the Temporal Pattern Attention and Long Short-Term Memory (TPA-LSTM) method [25] to the financial field and comprehensively verifies the effectiveness of TPA-LSTM in this sector. This paper's contributions mainly include the following three aspects. First, this paper considers the differences among industries and predicts eight industry stock index prices in the Hangseng Composite Index simultaneously, with full consideration of the interdependencies among them. Due to the macroeconomic environment and other conjunctural factors, industry stock index prices change collaboratively. Therefore, this paper attempts to predict eight industry stock indexes simultaneously: consumer good manufacturing, consumer service, energy, industry, information technology, integrated industry, raw material, and real estate. Second, this paper models the complicated structure of stock index prices through the temporal pattern attention (TPA) mechanism proposed by the TPA-LSTM method. Datasets of stock index prices exhibit weak periodic patterns with short-term and long-term memory. The TPA mechanism, which includes a long short-term memory (LSTM) module, a CNN module, and a temporal attention module, is adaptable to various datasets, even multivariate time series with nonperiodic or weakly periodic patterns. The LSTM component captures long-term patterns, while the CNN module extracts short-term patterns in the time dimension and local dependencies between variables. Additionally, the temporal attention module selects the variables that are helpful for forecasting and captures temporal information. Therefore, we can discover the short-term and long-term weak repeating patterns of multivariate industry stock index prices and predict prices more accurately.
Third, this paper tests TPA-LSTM's robustness in the financial field through three evaluation metrics: two single-method performance measures and one performance difference test. These evaluation metrics differ from the performance evaluation metrics used in [25] and from the traditional statistical significance test, which rests on strict assumptions. The rest of the paper is structured as follows. Section 2 reviews the mathematical model of stock index price prediction and gives a detailed introduction to the TPA-LSTM method. Section 3 discusses the experimental preliminaries, covering the experimental data and the selection of evaluation criteria. Section 4 presents the application steps of the TPA-LSTM method and the experimental results. Finally, Section 5 concludes.

Mathematical Model of Stock Index Price Prediction.
This study attempts to predict stock index prices in different industries simultaneously. Given the dataset X = {x_t}_{t=1}^{T}, where x_t ∈ R^n (n = 8) represents the eight industry indexes' prices at time t and n is the variable dimension, this paper is interested in forecasting stock index prices in a rolling fashion. Instead of looking at a single stock index's price y_t, this paper predicts all n dimensions of x_{T+h} simultaneously, where h is the desirable horizon ahead of the current timestamp and {x_t}_{t=1}^{T} are available. Similarly, this paper forecasts x_{T+h+1} in the next time step, assuming {x_t}_{t=1}^{T+1} are available. Moreover, this paper uses only {x_t}_{t=T-w+1}^{T} to predict x_{T+h}, where w is the window size. This is based on the assumption that there is no useful information before the window w, which is set to 30 in this paper [24]. Therefore, the input matrix at timestamp T is X = {x_{T-w+1}, ..., x_T} ∈ R^{n×w}, and the output is x_{T+h} ∈ R^n.
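The rolling-forecast setup above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the paper; the names `make_window`, `series`, and `prices` are ours.

```python
def make_window(series, T, w=30, h=1):
    """Return (input window, target) for a forecast made at timestamp T.

    series : list of length-n price vectors x_1, ..., x_T, ...  (n = 8 here)
    T      : current timestamp (1-indexed, as in the paper)
    w      : window size -- only x_{T-w+1}, ..., x_T are used
    h      : horizon -- the target is x_{T+h}
    """
    X = series[T - w:T]        # x_{T-w+1} ... x_T  (w vectors of dimension n)
    y = series[T + h - 1]      # x_{T+h}
    return X, y

# Toy example: 40 days of 8-dimensional "prices".
prices = [[float(t + i) for i in range(8)] for t in range(1, 41)]
X, y = make_window(prices, T=30, w=30, h=3)
assert len(X) == 30 and len(X[0]) == 8
assert y == prices[32]         # x_{T+h} = x_{33}, i.e., list index 32
```

At the next time step the window simply slides forward by one, which is what "rolling forecasting" means here.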

Framework.
Neural networks are widely used for financial time series prediction. The prediction process must address three difficulties. First, many studies prefer univariate time series prediction over multivariate prediction, which is what the method applied in this paper provides. Second, ignoring the weak periodic patterns in financial multivariate time series usually yields unsatisfactory outcomes. Finally, as data size grows, the time needed to predict the time series increases remarkably; an algorithm that reduces the total number of data points and the operation time of time series prediction is therefore crucial. These three difficulties are closely related to each other in the financial time series prediction process.
Shun-Yao Shih et al. proposed TPA-LSTM [25]. Compared with other forecasting methods, TPA-LSTM was the first method to predict n-dimensional time series with a mixture of short-term and long-term weak repeating patterns, which addresses the problems above. In this section, we describe the details of the TPA-LSTM algorithm applied in this paper.
TPA-LSTM consists of a nonlinear part and a linear part. The nonlinear part is a temporal pattern attention mechanism that includes a recurrent layer, a convolutional layer, and a temporal pattern attention layer, while the linear part uses an autoregressive (AR) model to forecast the result.

Recurrent Layer.
The first layer of TPA-LSTM is a long short-term memory (LSTM) network. Given the input matrix X = {x_1, x_2, ..., x_t}, where x_t ∈ R^n (n = 8), this recurrent layer aims at capturing long-term information; its outputs are the hidden states at each timestamp. The hidden state of the recurrent layer's units at time t is given by the standard LSTM equations:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i),
f_t = σ(W_f x_t + U_f h_{t-1} + b_f),
o_t = σ(W_o x_t + U_o h_{t-1} + b_o),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c),
h_t = o_t ⊙ tanh(c_t),

where i_t, f_t, and o_t are the input, forget, and output gates, c_t is the cell state, the W, U, and b terms are learned parameters, ⊙ is the element-wise product, and σ is the sigmoid function.
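As a concreteness check, one step of the standard LSTM update can be written in pure Python. This is a minimal sketch with randomly initialized weights, not the paper's trained model; the parameter layout `P` and function names are illustrative.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def lstm_cell(x, h_prev, c_prev, P):
    """One LSTM step: gates i, f, o, candidate g, new cell c, new hidden h.
    P holds weight matrices W_* (input), U_* (recurrent) and biases b_*."""
    gates = {}
    for g in ("i", "f", "o", "g"):
        pre = add(matvec(P["W" + g], x), matvec(P["U" + g], h_prev), P["b" + g])
        act = math.tanh if g == "g" else sigmoid  # candidate uses tanh
        gates[g] = [act(z) for z in pre]
    c = [f * cp + i * gg
         for f, cp, i, gg in zip(gates["f"], c_prev, gates["i"], gates["g"])]
    h = [o * math.tanh(cc) for o, cc in zip(gates["o"], c)]
    return h, c

random.seed(0)
n, m = 8, 4                       # input dim (8 indexes), hidden size
P = {k + g: [[random.uniform(-0.1, 0.1) for _ in range(n if k == "W" else m)]
             for _ in range(m)] for k in ("W", "U") for g in "ifog"}
P.update({"b" + g: [0.0] * m for g in "ifog"})
h, c = [0.0] * m, [0.0] * m
for x in [[0.1 * j for j in range(n)]] * 5:   # run 5 identical steps
    h, c = lstm_cell(x, h, c, P)
assert len(h) == m and all(-1.0 < v < 1.0 for v in h)  # h = o ⊙ tanh(c)
```

Running the cell over the window of w = 30 inputs yields the hidden-state matrix consumed by the convolutional layer below.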

Convolutional Layer. Given the previous LSTM hidden states H = {h_{t-w}, ..., h_{t-1}}, this section extracts short-term signal patterns and interdependencies among the eight variables by applying k filters along the row vectors of H. The output of this section can be expressed as

H^C_{i,j} = Σ_{l=1}^{w} H_{i,l} × C_{j,T-w+l},

where H^C_{i,j} represents the convolutional value of the ith row vector under the jth filter, C_j denotes one of the k filters, and T is the maximum attention length, which is set to 30 in this paper.

Temporal Pattern Attention Layer.
The traditional attention mechanism selects information relevant to the current time step, which may fail to ignore noisy variables and to detect useful temporal patterns in multivariate time series forecasting. TPA-LSTM develops a new temporal pattern attention mechanism to alleviate this problem, selecting useful variables and capturing temporal information for forecasting. Given the previous convolutional matrix H^C, the recurrent hidden state h_t, and the initial input matrix X, the output of the temporal pattern attention layer is a nonlinear projection

h'_t = W_h h_t + W_v v_t,

where v_t = Σ_i α_i^t H^C_i is the weighted context over the rows of the convolutional matrix, and the attention weights α_i^t score each row against the current hidden state:

α_i^t = sigmoid((H^C_i)^T W_a h_t).
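The scoring-and-weighting step can be sketched as below. Note the sigmoid (rather than a softmax): weights need not sum to one, so several rows can be selected at once, which is the point of the temporal pattern attention mechanism. Names and the toy matrices are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def temporal_attention(HC, h_t, Wa):
    """HC: m x k convolutional matrix (row i = H^C_i); h_t: current hidden
    state (length m); Wa: k x m scoring matrix.  Scores each row H^C_i against
    h_t, squashes with a sigmoid, and returns the attention weights and the
    weighted context v_t = sum_i alpha_i * H^C_i."""
    projected = matvec(Wa, h_t)                    # W_a h_t, length k
    alphas = [sigmoid(dot(row, projected)) for row in HC]
    k = len(HC[0])
    v = [sum(alphas[i] * HC[i][j] for i in range(len(HC))) for j in range(k)]
    return alphas, v

HC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # m = 3 rows, k = 2 filters
h_t = [0.5, -0.5]
Wa = [[1.0, 0.0], [0.0, 1.0]]               # k x m, identity for illustration
alphas, v = temporal_attention(HC, h_t, Wa)
assert len(alphas) == 3 and len(v) == 2
assert all(0.0 < a < 1.0 for a in alphas)   # sigmoid weights, not a softmax
```

The final projection h'_t = W_h h_t + W_v v_t is a plain linear map of h_t and v and is omitted here for brevity.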

Autoregressive Layer.
Due to the nonlinear property of the proposed attention mechanism, the TPA-LSTM method decomposes the prediction into a nonlinear part and a linear part. The nonlinear part's prediction is captured by the recurrent layer, the convolutional layer, and the temporal pattern attention layer. In contrast, the linear part's prediction is produced by the autoregressive (AR) model in this section. Given the initial input X, the forecasting result of the linear part through the AR layer is

h^L_{t,i} = Σ_{k=0}^{q-1} W^{ar}_k x_{t-k,i} + b^{ar},

where q is the AR window size. The final forecasting result of TPA-LSTM is the sum of the two parts:

ŷ_t = h^N_t + h^L_t.

The pseudocode of TPA-LSTM is described in Algorithm 1.
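A minimal sketch of the linear AR part and the final element-wise sum; the weights, bias, and toy nonlinear output here are illustrative values, not learned parameters.

```python
def ar_forecast(series, weights, bias):
    """Linear part: an AR(q) forecast applied to each variable i separately,
    h^L_i = sum_k w_k * x_{t-k, i} + b."""
    q = len(weights)
    n = len(series[0])
    return [sum(weights[k] * series[-1 - k][i] for k in range(q)) + bias
            for i in range(n)]

def tpa_lstm_output(nonlinear_part, linear_part):
    """Final forecast: element-wise sum of the nonlinear (attention) part
    and the linear (AR) part."""
    return [a + b for a, b in zip(nonlinear_part, linear_part)]

series = [[10.0, 20.0], [11.0, 21.0], [12.0, 22.0]]   # 3 steps, n = 2
linear = ar_forecast(series, weights=[0.6, 0.4], bias=0.0)
assert abs(linear[0] - (0.6 * 12.0 + 0.4 * 11.0)) < 1e-9  # most recent first
y_hat = tpa_lstm_output([0.5, -0.5], linear)
assert abs(y_hat[0] - (11.6 + 0.5)) < 1e-9
```

Keeping an explicit linear path alongside the neural one is what lets the model track the strong linear component of price series that a purely nonlinear network tends to miss.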

Data.
Industry stock indexes in the Hangseng Composite Index, collated from the Wind platform, are examined in the experiment. Excluding the financial and utilities industries, the datasets in this article include consumer good manufacturing, consumer service, energy, industry, information technology, integrated industry, raw material, and real estate. The dataset covers the period from 01/09/2006 to 01/02/2019.
More specifically, we used the daily closing prices as the datasets of this study, illustrated in Figure 1. The short-term and long-term repeating patterns are not visible, owing to nonstationary behavior and patterns with flexible periods. Each stock index price series is split into a training set (60%), a validation set (20%), and a test set (20%) in chronological order. The study uses the validation set to tune hyperparameters and the test set to evaluate and compare the forecasting performance of TPA-LSTM and the other models. Null values are dropped due to their small number.
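The chronological 60/20/20 split can be sketched as below (function name is ours); the point is that the data are never shuffled, so the model is always evaluated on periods strictly after the ones it was trained on.

```python
def chronological_split(data, train=0.6, valid=0.2):
    """Split a time-ordered dataset 60/20/20 without shuffling, as in the paper."""
    n = len(data)
    i = int(n * train)
    j = int(n * (train + valid))
    return data[:i], data[i:j], data[j:]

rows = list(range(100))                 # stand-in for 100 daily closing prices
tr, va, te = chronological_split(rows)
assert len(tr) == 60 and len(va) == 20 and len(te) == 20
assert tr[-1] < va[0] < te[0]           # order preserved: no look-ahead leakage
```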

Evaluation Criteria.
The prediction performance of the TPA-LSTM method is compared with five methods: the long- and short-term time series network (LSTNet) with a recurrent-skip layer, LSTNet with an attention layer [24], RNN, CNN, and the Laplacian echo state network (LAESN) [26]. These five methods can handle multivariate input and output. To verify the validity of the TPA-LSTM method proposed in this paper, we select three evaluation metrics: the Root Relative Squared Error (RSE), the Empirical Correlation Coefficient (CORR), and a multistep conditional predictive ability test [27]. The first two evaluation metrics are performance measures, while the last is a performance difference test. The RSE is a scaled error measure, designed to make comparisons more efficient and valid. In the multistep conditional predictive ability test, we sum the RSE and RAE of the eight output variables at each time point and reject the hypothesis that two models have equal out-of-sample performance when the test statistic exceeds the critical value. The definitions of these criteria are found in Table 1.
Here, Y, Ŷ ∈ R^{n×T} are the true and predicted values of industry stock index prices, respectively, n is the number of out-of-sample forecasts, τ is the forecast horizon, T is the total sample size, m is the maximum estimation window size, h_t is a test function, ΔL_t is the out-of-sample forecast loss difference of the two methods, and Z_{m,n} = n^{-1} Σ_t h_t ΔL_{t+τ} is the sample average from which the Wald-type test statistic of [27] is formed and compared with a chi-squared critical value.

Experiment and Results.
Hyperparameters were tuned on the validation set, selecting the configuration with the lowest validation loss value. The study results found that an attention length of 30 produces the best possible results when predicting industry stock indexes' prices with TPA-LSTM. Also, the learning rate, the dropout rate, the horizon h, and the optimization algorithm are chosen to be 0.2, 3 × 10^{-3}, 24, and the Adam algorithm, respectively. The program is implemented in Python 3. Table 2 shows the prediction results for industry stock index prices: the performances of all methods on the test sets (20%) in all metrics, including RSE and CORR, for TPA-LSTM, LSTNet-Skip, LSTNet-Attn, RNN, CNN, and LAESN, with horizon ∈ {3, 6, 9, 12, 15, 18, 21, 24}. The results show that the larger the horizon, the worse the prediction results in most cases. The best results for the methods and metrics are highlighted in boldface in Table 2; the total count of boldfaced results is 14 for TPA-LSTM, 1 for LSTNet-Attn, 1 for RNN, and 0 for LSTNet-Skip, CNN, and LAESN. Moreover, an asterisk (*) indicates that the test rejects equal conditional predictive ability at the 1% level and that the TPA-LSTM method outperforms the other methods through conditional predictive ability tests on average. These results show that even though the periodic patterns of industry stock index prices are not clear, TPA-LSTM still performs much better than the other neural network methods on most datasets. More specifically, TPA-LSTM outperforms the neural baseline methods in most cases.
When the horizon is 24, TPA-LSTM outperforms LSTNet-Skip, LSTNet-Attn, RNN, CNN, and LAESN by 28.71%, 4.44%, 1.93%, 195.26%, and 44.68% in the RSE metric, respectively. Moreover, TPA-LSTM outperforms LSTNet-Skip, LSTNet-Attn, RNN, CNN, and LAESN by 4.07%, 2.15%, 1.39%, 11.75%, and 7.66% in the CORR metric, respectively. The TPA-LSTM method has robust performance across metrics, partly due to its consideration of interdependencies among multiple variables and of weak periodic patterns with complex structures.
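For concreteness, the two performance measures reported throughout Table 2 can be sketched in pure Python, following the usual definitions (RSE scales the squared error by that of predicting the overall mean, so lower is better; CORR averages the per-variable Pearson correlation, so higher is better). Function names and the row layout (variables × timestamps) are our conventions.

```python
import math

def rse(Y, Yhat):
    """Root Relative Squared Error over all variables and timestamps."""
    flat = [v for row in Y for v in row]
    mean = sum(flat) / len(flat)
    num = sum((y - yh) ** 2
              for ry, rh in zip(Y, Yhat) for y, yh in zip(ry, rh))
    den = sum((y - mean) ** 2 for y in flat)
    return math.sqrt(num / den)

def corr(Y, Yhat):
    """Empirical Correlation Coefficient, averaged over variables (rows)."""
    def pearson(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = math.sqrt(sum((x - ma) ** 2 for x in a))
        sb = math.sqrt(sum((y - mb) ** 2 for y in b))
        return cov / (sa * sb)
    return sum(pearson(a, b) for a, b in zip(Y, Yhat)) / len(Y)

Y    = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # 2 variables x 3 timestamps
Yhat = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # perfect forecast
assert rse(Y, Yhat) == 0.0
assert abs(corr(Y, Yhat) - 1.0) < 1e-9
```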

Conclusions
Multivariate time series prediction with neural networks plays a significant role in reducing the risks and uncertainty of financial markets. More specifically, it provides essential support in understanding the trends and behavior of industry stock index prices. The existing literature on time series prediction through neural networks mainly concentrates on forecasting univariate time series without considering interdependencies among different variables, and it mostly fails to capture the linear structure and weak periodic patterns. In this study, we applied a different approach, Temporal Pattern Attention and Long Short-Term Memory (TPA-LSTM), to predict stock index prices in the different industries included in the Hangseng Composite Index. The TPA-LSTM method is a prediction model that forecasts multivariate time series simultaneously, with particular attention to weak periodic patterns and the mixture of linear and nonlinear structures.
Further, the TPA-LSTM method comprises four components: the recurrent (LSTM) component, the convolutional component, the temporal pattern attention component, and the autoregressive component. The experimental results indicate that, by combining the strengths of the convolutional network, the recurrent network, the temporal pattern attention mechanism, and the autoregressive model, TPA-LSTM achieves better forecasting performance than the baseline methods on the industry stock index datasets. There are two possible extensions of multivariate time series prediction in industry stock index prices. The first extension is to investigate the automatic adjustment of hyperparameters, including the window size w and the horizon h, which are tuned manually in TPA-LSTM. The second extension is to investigate the possible profits of different trading strategies based on the TPA-LSTM method.

Data Availability
The data used to support the findings of this study have been deposited in Xiaolu Wei's repository (https://github.com/xiaoluees/TPA-LSTM-data).

Disclosure
This manuscript is the second stage of the three-stage architecture proposed in "Discovery and Prediction of Stock Index Pattern via Three-Stage Architecture of TICC, TPA-LSTM, and Multivariate LSTM-FCNs," published in IEEE Access.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors' Contributions
Binbin Lei was responsible for collecting the data in this manuscript. The overall architecture of this study was proposed by Xiaolu Wei, Hongbing Ouyang, and Qiufeng Wu based on their earlier study.