A Stock Closing Price Prediction Model Based on CNN-BiSLSTM


Introduction
Stock prediction research is an applied research direction of financial big data. With the rapid growth of China's economy and the continuous expansion of the financial market, more and more investors have begun to pay attention to methods that improve return on investment and effectively avoid certain risks. Among these methods, stock price prediction is of great significance in the commercial and financial fields [1,2]. As stock prices rise and fall, investors face unpredictable profits and even losses, so predicting stock prices and selecting stocks worthy of investment have become issues of concern for investors. In view of the complexity and instability of the stock market [3], a large number of variables and information sources need to be considered in the process of stock price prediction, which makes it a very difficult task that remains a focus of discussion in the financial sector [4]. The traditional analysis method is to use the existing stock data and relevant technical charts, combined with the investor's own experience, to predict the stock price. But this method is not applicable in today's increasingly large and complex stock market. In addition to low efficiency and excessive reliance on manual experience, it suffers from a series of problems such as poor integrity of stock content information and feature data redundancy. The utilization rate of stock data is low and the results are poor, so it is difficult to meet the needs of market development.
Many factors affect the changing trend of the stock market, and stock price fluctuations follow a complex, nonlinear pattern, so it is often very difficult to predict the stock market [5]. With the increasing availability of high-frequency trading data and the growing popularity of artificial intelligence, deep learning is favoured as an "upgraded version" of existing models and methods that does not rely on econometric assumptions or expert experience [6,7]. Deep neural networks have a good fitting ability for nonlinear functional relations [8,9]. Building a deep neural network to predict the trend and price of stocks has attracted wide attention, and some scholars have carried out in-depth research on this topic [10-12]. In 2010, Nair et al. built a denoising hybrid stock price prediction model based on a decision tree [13]. Firstly, the model was used to extract the relevant features of stock data, and then the decision tree algorithm was used to select among the extracted features. Then the principal component analysis (PCA) algorithm was used to reduce the dimension. The reduced-dimension data were input into a fuzzy model for stock price prediction. In 2016, Wang et al. used the support vector machine (SVM) to build a model to predict the trend of the CSI 300 index and verified the validity of the support vector machine in stock price index prediction [14]. In 2019, Hoseinzade and Haratizadeh proposed a framework based on CNN and predicted the next-day trend of the S&P 500 index, Nasdaq index, Dow Jones index, New York Stock Exchange index, and Russell index. The results show that the prediction performance is higher than that of the baseline algorithm [15].
To predict the stock closing price more accurately, this paper proposes a stock prediction model based on CNN-BiSLSTM, which uses stock data of the last five trading days to predict the closing price of the next trading day. The model consists of CNN and BiSLSTM. CNN is used to extract the characteristics of stock data, and BiSLSTM is used to predict the stock closing price. BiSLSTM improves the output gate of the BiLSTM model, which makes the output value of the output gate more accurate than in BiLSTM. CNN-BiSLSTM can more accurately predict the stock closing price of the next trading day, which can serve as a reference that helps investors avoid certain risks. The main contributions are as follows:
(1) CNN is proposed to extract the features that affect the stock price. One-dimensional CNN can be well applied to time series analysis. The one-dimensional convolutional layer extracts advanced data features from sample data and makes full use of the feature information of the input data. It adopts local connections, weight sharing, and space- or time-related downsampling to obtain better features, makes the extracted features more distinguishable, and improves the accuracy of the model when the closing price of a stock is predicted.
(2) BiSLSTM is proposed to predict the closing price of stocks. BiSLSTM, which is an improvement on BiLSTM, adds a 1 − tanh(x) function to its output gate, so that the value range of the output gate becomes approximately (0.24, 1). Therefore, BiSLSTM not only has the strong learning ability of BiLSTM but also fits better than BiLSTM during model training. As a result, BiSLSTM is suitable for analyzing the relationships within time series data.

Related Work
Artificial neural networks (ANNs) have been shown to deal well with complex nonlinear problems, but their training and testing are slow [16]. In addition, overfitting and convergence to local minima are disadvantages of neural networks. Huang et al. took LSTM as the main model of stock prediction and adopted the Bayesian optimization method to dynamically select parameters and determine the optimal number of units; the prediction accuracy improved by 25% compared with the traditional LSTM [17]. Gunduz et al. fed relevant technical indicators of each sample into a CNN to improve prediction accuracy [18]. Generalized autoregressive conditional heteroskedasticity (GARCH) is a classic model widely used in time series prediction, and it assumes that the values of a time series come from a linear generation process. However, market features are nonlinear, so the GARCH assumption is unsuitable for many financial time series applications [19]. Wen et al. proposed a new method that simplifies noise-filled financial time series via sequence reconstruction by leveraging motifs (frequent patterns) and then uses a CNN to capture the spatial structure of the time series [20]. Most conventional time series analysis studies, however, rely on a linear relationship between stock prices, which is better suited to sequences with stable trends and regular patterns and is therefore insufficient for more complex nonlinear relationships. Stock prices are uncertain and nonlinear, and the factors influencing stock price volatility are very complex. All of these are ignored in simple time series analysis, and as a result the prediction effect is poor [21].
After the RNN was proposed, scholars found that it would forget earlier state information over time, and the LSTM was then proposed. In deep learning, the LSTM network structure is suitable for learning temporal data and is widely used in various time series analysis tasks [22,23]. LSTM is better than the traditional recurrent neural network [24,25]. It overcomes the problem of vanishing or exploding gradients [26]. Many financial time series studies use LSTM modelling [27]. Zhang et al. used the generative adversarial network (GAN) to predict the stock market [28]. An MLP was used as the discriminator and an LSTM network as the generator to predict the closing price. This is a novel approach that merits further development. Its advantage is that it can capture the time series features of stock data. Akita et al. used the text data of Nikkei News as the input of LSTM, combined with numerical market time series data, to predict the opening prices of 10 companies [29]. Under a simulated trading strategy, a model trained with both numerical and text data obtained a higher profit rate than a model trained with numerical data only. Hyun et al. proposed a stock price prediction model based on CNN. Nine technical indicators were selected as predictors, and the technical indicators were converted into time series graph images to verify the applicability of the new learning method in the stock market [30]. Yang [...] CNN was used to efficiently extract features from the data, and LSTM was used to predict the stock price with the extracted feature data. This forecasting method not only provided a new research idea for stock price forecasting but also provided practical experience for scholars studying financial time series data [32].

Convolutional Neural Network. CNN was proposed by LeCun et al. in 1998 [33]. CNN is a multilayer neural network with a deep supervised learning structure, which is able to process time series data and image data. Since CNN has been successfully applied to the preprocessing of two-dimensional images, the same idea can also be used to process one-dimensional data [34]. CNN uses a small number of parameters to capture the features of input data and combine them to form advanced data features. Finally, these advanced data features are passed to the fully connected layer for further regression or classification prediction. The typical CNN structure consists of the input layer, convolutional layer, pooling layer, fully connected layer, and output layer. Among them, the convolutional layer mainly performs convolution operations on the samples through the convolution kernel to obtain the input of the next layer. The pooling layer is an important part of CNN, which can effectively reduce the number of model parameters and the complexity of operations while retaining the useful information of the feature map. CNN can extract data features through layer-by-layer convolution and pooling operations. The window size and sliding step of the filter can be set according to the size of the input data and the features to be extracted.
In the one-dimensional convolutional neural network, a one-dimensional array is used as the convolution kernel. In the traditional two-dimensional backpropagation algorithm, the dimensions need to be adjusted to match the convolution kernel. In the process of forward propagation, the output of the current convolutional layer $l$ can be expressed as follows:

$$x_j^{l} = f\left(\sum_{i=1}^{M} x_i^{l-1} \otimes k_{ij}^{l} + b_j^{l}\right) \tag{1}$$

where $x_j^{l}$ is the output feature map of the $j$-th neuron of the current layer (layer $l$); $M$ is the number of input features of the $l$-th convolutional layer; $x_i^{l-1}$ is the output feature map of the previous layer (layer $l-1$) and therefore also the input of layer $l$; $\otimes$ represents the convolution operation; $k_{ij}^{l}$ represents the convolution kernel from the $i$-th neuron of layer $l-1$ to the $j$-th neuron of layer $l$; $b_j^{l}$ is the bias of the $j$-th neuron of layer $l$; and $f$ is the activation function, which is given by formula (2).

As a subsampling layer, the pooling layer can ensure the invariance of the mapping, and max-pooling can be expressed as follows:

$$y_i^{l} = \text{max-pooling}\left(x_i^{l-1},\, s_{\text{scale}},\, s_{\text{stride}}\right) \tag{3}$$

where $y_i^{l}$ is the output of the $i$-th neuron of the current layer $l$; max-pooling() is the down-sampling function, which takes the maximum value within a certain range; $s_{\text{scale}}$ is the scale of pooling; and $s_{\text{stride}}$ is the step length of pooling.
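The convolution and max-pooling operations described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a single input channel, no padding, a stride of 1 for the convolution, and ReLU as the activation `f` (the text does not say which activation is used). The function names and the sample values are hypothetical. As in most deep learning libraries, the "convolution" is implemented as a sliding dot product.

```python
from typing import Callable, List


def relu(v: float) -> float:
    """Assumed activation; the text does not fix a particular f."""
    return max(0.0, v)


def conv1d(x: List[float], kernel: List[float], bias: float = 0.0,
           f: Callable[[float], float] = relu) -> List[float]:
    """Valid (no padding), stride-1 sliding dot product followed by f."""
    k = len(kernel)
    return [
        f(sum(x[t + m] * kernel[m] for m in range(k)) + bias)
        for t in range(len(x) - k + 1)
    ]


def max_pool1d(x: List[float], scale: int, stride: int) -> List[float]:
    """Maximum over windows of length `scale`, moved forward by `stride`."""
    return [max(x[t:t + scale]) for t in range(0, len(x) - scale + 1, stride)]


closes = [1.0, 2.0, 4.0, 3.0, 5.0]              # made-up values
features = conv1d(closes, kernel=[-1.0, 1.0])   # day-to-day differences, then ReLU
pooled = max_pool1d(features, scale=2, stride=2)
```

With the kernel `[-1.0, 1.0]` the convolution keeps only the positive day-to-day changes, and pooling then halves the length of the feature sequence while keeping the largest response in each window.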

Long Short-Term Memory. LSTM was first proposed by Hochreiter and Schmidhuber in 1997 [35]. In 2000, Gers et al. improved the LSTM network and proposed the forget gate method, which was suitable for continuous prediction [36]. Later, in 2012, Graves improved and promoted LSTM [37]. LSTM has achieved considerable success on many problems and has been widely used. The predecessor of LSTM is RNN. RNN is a neural network that learns sequence patterns through internal loops. During RNN backpropagation, values are repeatedly propagated back through the activation function, so the gradient can become extremely small or extremely large, and the problem of vanishing or exploding gradients occurs. In 2013, Hochreiter et al. proposed memory cells and gates, and these gate structures could solve the gradient problem of RNN and add or delete cell information [38]. Such gate structures could store information for a long time and forget unnecessary information [39,40]. The LSTM uses memory units instead of neurons. The structure of the LSTM memory cell is shown in Figure 1. The LSTM cell consists of a memory cell ($C_t$) and three gate structures: the input gate ($i_t$), the forget gate ($f_t$), and the output gate ($o_t$). The input gate is used to calculate the input information at the current moment and control the input of new information into the internal memory unit. The forget gate controls how much of the information from the previous time step the internal memory unit retains. The output gate is used to control the amount of information output by the internal memory unit.
In Figure 1, $x$ is the input; $h$ is the hidden state that gives the network memory ability; and the subscripts $t-1$ and $t$ represent different time steps. The connections between its nodes form a directed graph along the sequence, and $h_t$ is calculated from the hidden state of the previous time step and the input of the current moment. The calculation principle of LSTM is as follows.
Firstly, the value of the input gate $i_t$ is calculated by using formula (4), and the candidate state value $\tilde{C}_t$ of the input cell at time $t$ is calculated using formula (5):

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \tag{4}$$

$$\tilde{C}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \tag{5}$$

Secondly, the following formula is used to calculate the activation value of the forget gate $f_t$ at time $t$:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \tag{6}$$

Thirdly, the original information and the newly added information are controlled by the forget gate and the input gate, respectively. The values $i_t$, $\tilde{C}_t$, and $f_t$ calculated in the first two steps are used to calculate the updated value $C_t$ of the cell state at time $t$ using the following formula:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{7}$$

After the new cell state is obtained, formula (8) is used to calculate the output gate value, and formula (9) uses the updated memory cell to calculate the current hidden state $h_t$:

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \tag{8}$$

$$h_t = o_t * \tanh\left(C_t\right) \tag{9}$$

In formulas (4)-(9), $W_i$, $W_c$, $W_f$, and $W_o$ represent four different weight matrices; $b_i$, $b_c$, $b_f$, and $b_o$ represent the biases; $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input; $\sigma$ is the sigmoid function; and the symbol $*$ represents the element-wise product.
Finally, backpropagation is performed to train the LSTM, which is composed of these storage blocks. Through the above calculation, the LSTM can make effective use of the input time series data and thereby acquires long-term memory.
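The forward step referred to as formulas (4)-(9) can be written out as a short sketch. It assumes the standard LSTM gate equations with the previous hidden state and the current input concatenated into one vector; the dictionary keys (`W_i`, `b_i`, and so on) and the toy all-zero weights are illustrative choices, not values from the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: Matrix, z: Vector, b: Vector) -> Vector:
    """Compute W z + b for a small dense matrix."""
    return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One forward step of a standard LSTM cell, formulas (4)-(9)."""
    z = h_prev + x_t                                                  # [h_{t-1}, x_t]
    i_t = [sigmoid(v) for v in affine(p["W_i"], z, p["b_i"])]         # (4) input gate
    c_hat = [math.tanh(v) for v in affine(p["W_c"], z, p["b_c"])]     # (5) candidate
    f_t = [sigmoid(v) for v in affine(p["W_f"], z, p["b_f"])]         # (6) forget gate
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]            # (7) cell state
    o_t = [sigmoid(v) for v in affine(p["W_o"], z, p["b_o"])]         # (8) output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                # (9) hidden state
    return h_t, c_t


# Toy sizes: 1 input feature, 2 hidden units, all-zero weights (made-up values).
zeros_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {name: zeros_W for name in ("W_i", "W_c", "W_f", "W_o")}
params.update({name: [0.0, 0.0] for name in ("b_i", "b_c", "b_f", "b_o")})
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every sigmoid gate equals 0.5 and the candidate state is 0, so the cell state is simply halved. This makes the roles of the forget gate and the output gate easy to check by hand.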

Bidirectional Long Short-Term Memory.
Although LSTM can capture long-distance feature information, the information it obtains precedes the output time, and it does not use information from the reverse direction. In time series prediction, the forward and backward regularities of the data should both be considered, which can effectively improve prediction accuracy. BiLSTM consists of two LSTMs, one forward and one reverse. Compared with the one-way state transmission in the standard LSTM, BiLSTM considers how the data change both before and after a given time step and can make more complete and detailed decisions using past and future information. It has shown superior performance. As the BiLSTM structure diagram in Figure 2 shows, BiLSTM consists of a forward calculation and a backward calculation. In Figure 2, the horizontal arrows indicate the two-way flow of time series information in the model, while the data flow in one direction vertically from the input layer to the hidden layer to the output layer.
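The two-direction pass can be illustrated independently of the cell type. The sketch below runs one recurrent step function over the sequence from left to right and another from right to left, then pairs the two states at each time step. The scalar state and the running-sum step are stand-ins chosen for readability; in BiLSTM each direction would be an LSTM cell with its own weights.

```python
from typing import Callable, List, Sequence, Tuple

Step = Callable[[float, float], float]   # (input, previous state) -> new state


def run_direction(xs: Sequence[float], step: Step, h0: float = 0.0) -> List[float]:
    """Apply a recurrent step along xs and collect the state at every position."""
    states, h = [], h0
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(xs: Sequence[float], fwd: Step,
                  bwd: Step) -> List[Tuple[float, float]]:
    """Pair the forward state with the backward state at each time step."""
    forward = run_direction(xs, fwd)
    backward = run_direction(list(reversed(xs)), bwd)[::-1]  # realign to time order
    return list(zip(forward, backward))


# Toy step (a running sum) standing in for an LSTM cell.
running_sum: Step = lambda x, h: h + x
states = bidirectional([1.0, 2.0, 3.0], running_sum, running_sum)
```

At the middle time step the forward state has seen the inputs 1 and 2, while the backward state has seen 3 and 2, which is the sense in which each position combines past and future information.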

CNN-BiSLSTM.
CNN-BiSLSTM is a hybrid of CNN and BiSLSTM. BiSLSTM is an improvement on BiLSTM in which a 1 − tanh(x) function is added to the output gate, so that the value range of the output gate is about (0.24, 1). Therefore, BiSLSTM not only has the strong learning ability of BiLSTM but also fits better than BiLSTM during model training. As a result, BiSLSTM is suitable for analyzing the relationships within time series data. The SLSTM unit structure is shown in Figure 3, and the CNN-BiSLSTM network structure is shown in Figure 4. Historical stock trading information is time series data. In CNN-BiSLSTM, CNN is used to extract the local features of the data layer by layer. Advanced features with strong expressive ability can be extracted from the data, which avoids the subjectivity and limitations of manual feature extraction. BiSLSTM retains contextual historical information for a long time, which allows it to extract features along the time dimension and from long-distance dependencies. In addition, BiSLSTM can mine the long-term time series relationship between the factors influencing a stock and its closing price. Therefore, the output of the CNN is passed to the BiSLSTM, which models the bidirectional time structure through formulas (10)-(15):

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \tag{10}$$

$$\tilde{C}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \tag{11}$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \tag{12}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{13}$$

where $f_t$ is the forget gate, in which the sigmoid function $\sigma$ judges through formula (12) whether the past memory needs to be retained in the current memory state; $i_t$ is the input gate, which calculates through formula (10) whether the current input data are worth retaining; $\tilde{C}_t$ is the candidate data for the update, calculated by formula (11), with $i_t$ controlling whether the update is applied; and $C_t$ is the updated state at the current moment, calculated by formula (13).
After the new state is obtained, formula (14) is used to calculate the output gate value $O_t$; compared with BiLSTM, BiSLSTM adds the 1 − tanh(x) function here:

$$O_t = 1 - \tanh\left(\sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)\right) \tag{14}$$

Because the sigmoid takes values in (0, 1), $O_t$ lies between $1 - \tanh(1) \approx 0.24$ and 1, which matches the range stated above. The updated memory cell can then be used to calculate the current hidden state $h_t$ through the following formula:

$$h_t = O_t * \tanh\left(C_t\right) \tag{15}$$

Complexity
Since BiSLSTM is composed of two SLSTM networks, one forward and one backward, the above calculation must also be carried out in the reverse direction. Finally, the fully connected layer computes the closing price of the stock, yielding a more accurate forecast.
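The effect of the modified output gate can be checked numerically. The sketch below assumes the gate has the form 1 − tanh(σ(z)), where z is the usual output-gate pre-activation; this form is an inference from the stated value range of about (0.24, 1), not a formula quoted from the text. The function names and the pre-activation values are illustrative.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def lstm_output_gate(pre: float) -> float:
    """Standard LSTM output gate: values in (0, 1)."""
    return sigmoid(pre)


def slstm_output_gate(pre: float) -> float:
    """Assumed SLSTM output gate: 1 - tanh(sigmoid(pre))."""
    return 1.0 - math.tanh(sigmoid(pre))


def hidden_state(gate: float, cell: float) -> float:
    """Hidden state from the gate value and the updated cell state."""
    return gate * math.tanh(cell)


pre_activations = [-50.0, -2.0, 0.0, 2.0, 50.0]   # made-up values
gates = [slstm_output_gate(v) for v in pre_activations]
lower_bound = 1.0 - math.tanh(1.0)                # about 0.238
```

Under this assumed form the gate never drops below roughly 0.24, so some of the cell state always reaches the hidden state, and the gate value decreases as the pre-activation grows, which is the opposite of the standard sigmoid gate.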

Experimental Environment.
To verify the effectiveness of the proposed model, Shenzhen Component Index is used as the experimental data in the experiment. All experiments are implemented on a computer equipped with Intel Core i5-6300HQ 2.30 GHz, 12.0 GB RAM, NVIDIA GeForce GTX 960m, and Windows 10 64-bit operating system. In this experiment, Python 3.7 is used as the programming language, PyCharm and Anaconda3 are used as the development tools, and Keras based on TensorFlow is used to construct the network model structure.

Experimental Data.
The Shenzhen Component Index is used as historical data for stock prediction in the experiment. The Shenzhen Component Index is a constituent stock index compiled by the Shenzhen Stock Exchange. It is a weighted stock index calculated by taking 40 representative listed companies from all listed stocks as the research object and using the outstanding shares as weights, and it comprehensively reflects the stock price trend of the A and B shares listed on the Shenzhen Stock Exchange. The data used in the experiment come from the Wind-Economic database. The software ensures the accuracy of the data from the data source.
The experimental data are the historical data of the Shenzhen Component Index from July 1, 1991, to October 30, 2020. Some experimental data are shown in Table 1.

Experiment Process.
The CNN-BiSLSTM is used to predict the stock closing price, and the experimental process is as follows: (1) Preprocess the experimental data: remove irrelevant items, serialize the time data, standardize the data, and divide them into a training set and a testing set. (2) Input the preprocessed time series data into the CNN-BiSLSTM model for training. The training process is shown in Figure 5.
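The serialization step in (1) can be sketched as a sliding window over the daily records. The sketch assumes that each sample consists of 5 consecutive trading days and that the target is the closing price 1 day after the window, in line with the five-day input described in the Introduction. The helper names, the column layout, and the toy rows are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(rows: Sequence[Sequence[float]], close_col: int,
                 seq_len: int = 5, delay: int = 1
                 ) -> Tuple[List[List[Sequence[float]]], List[float]]:
    """Pair each run of `seq_len` rows with the close `delay` days after it."""
    inputs, targets = [], []
    for start in range(len(rows) - seq_len - delay + 1):
        end = start + seq_len
        inputs.append(list(rows[start:end]))
        targets.append(rows[end + delay - 1][close_col])
    return inputs, targets


def chronological_split(items: list, n_test: int) -> Tuple[list, list]:
    """Keep time order: the last n_test items (n_test >= 1) form the testing set."""
    return items[:-n_test], items[-n_test:]


# Toy data: 8 trading days with two columns, [day index, closing price].
rows = [[float(d), 100.0 + d] for d in range(8)]
inputs, targets = make_windows(rows, close_col=1)
train_y, test_y = chronological_split(targets, n_test=1)
```

Splitting by position rather than at random keeps every testing sample later in time than every training sample, which avoids leaking future prices into training.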

Experimental Data Preprocessing.
Firstly, the original data are checked, and missing data are filled or eliminated to facilitate the training and testing of the model. For some special reasons, some intermittent data are vacant. Because the data are sequential and do not change much from one trading day to the next, the average of the values on the previous trading day and the next trading day is used to fill each gap. Secondly, the Chinese stock market is closed all day on Saturdays, Sundays, and major holidays. Therefore, all data at these time nodes are removed, and only the trading day data are retained.
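The gap-filling rule can be sketched as follows. The sketch handles only an isolated missing value with valid neighbours on both sides; how longer gaps or gaps at the ends of the series are treated is not stated in the text, so they are left untouched here. The function name and the sample values are illustrative.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace an isolated None with the mean of its two neighbours."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        prev_v, next_v = values[i - 1], values[i + 1]
        if values[i] is None and prev_v is not None and next_v is not None:
            filled[i] = (prev_v + next_v) / 2.0
    return filled


closes = [10.0, None, 12.0, 13.0, None]   # made-up values
cleaned = fill_gaps(closes)
```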
Data in the data set that have nothing to do with stock price prediction are excluded. The index opening price, highest price, lowest price, closing price, volume, turnover, ups and downs, and change are selected as the influencing factors of the stock closing price. [...] Table 2. The comparison models use the same values for the parameters they share with the CNN-BiSLSTM model.

Experimental
The training parameters used for CNN-BiSLSTM in this experiment are exactly the same as those of the comparison models. The sequence length is 5, and the delay is 1. The optimizer is Adam, which, like the RMSProp algorithm, adapts the learning rate of each parameter using the second moment (uncentered variance) of the gradient and additionally makes use of the first moment (mean) of the gradient. The learning rate is 0.0001, and the loss function is MAE. MAE is the mean of the absolute values of the differences between the true and predicted values. It measures only the average magnitude of the prediction error, without considering its direction, and is more robust to outliers. Batch_size is 64, and epochs is 50.
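A single Adam update for one scalar parameter can be sketched as below. Only the learning rate of 0.0001 comes from the text; the decay rates `beta1` and `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, as are the function name and the sample gradient.

```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a single scalar parameter at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step from zero moment estimates with a made-up gradient of 2.0.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about one learning rate regardless of the gradient's scale.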

Model Training and Prediction.
The selected 6878 stock data records are divided into a training set and a testing set: the training set is the first 6078 records, and the testing set is the last 800. Since the magnitude of the data in different dimensions is not at the same level, the z-score standardization method is used to bring the data of different orders of magnitude in the training set and testing set to the same level. The standardization is shown in the following formula:

$$y_i = \frac{x_i - \bar{x}}{s} \tag{16}$$

where $y_i$ is the standardized value, $x_i$ is the input data, $\bar{x}$ is the average value of the data, and $s$ is the standard deviation of the data. After the parameters are set, CNN-BiSLSTM is initialized, and the training set data standardized by z-score are put into the model. The forward calculation of the neural network is performed. The model structure is shown in Figure 6. After the calculation is completed, MAE is used to calculate the error between the result of the forward calculation and the true value, and then the Adam algorithm is used in backpropagation to update the weight parameters. The CNN-BiSLSTM stock prediction model is obtained through repeated training on the 6078 training samples.
The testing samples are then put into the trained CNN-BiSLSTM for prediction. Since the data in the testing set are standardized, formula (17) is required to restore them:

$$x_i = y_i\, s + \bar{x} \tag{17}$$

MAE, RMSE, and $R^2$ are used to compare the predicted values with the true values after restoration:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{p}_i - p_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - p_i\right)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{p}_i - p_i\right)^2}{\sum_{i=1}^{n}\left(p_i - \bar{p}\right)^2}$$

where $n$ is the number of testing samples, $p_i$ is the true closing price, $\hat{p}_i$ is the predicted closing price, and $\bar{p}$ is the mean of the true closing prices.
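The standardization, its inverse, and the three evaluation measures can be sketched with the standard library alone. Two choices in the sketch are assumptions rather than details from the text: the standard deviation is the population form (divided by n), and the mean and standard deviation are estimated once and reused for restoration. All function names and sample values are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def zscore_fit(xs: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and the population standard deviation of xs."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return mean, std


def zscore(xs: Sequence[float], mean: float, std: float) -> List[float]:
    return [(x - mean) / std for x in xs]


def restore(ys: Sequence[float], mean: float, std: float) -> List[float]:
    """Invert the standardization to get values back on the price scale."""
    return [y * std + mean for y in ys]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


prices = [2.0, 4.0, 6.0]                       # made-up values
mean, std = zscore_fit(prices)
round_trip = restore(zscore(prices, mean, std), mean, std)
```

An $R^2$ of 1 means the predictions match the true values exactly, while a model that always predicts the mean of the true values scores 0, which is why values closer to 1 indicate a better fit.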

Analysis of Results.
The preprocessed stock data are put into the CNN-BiSLSTM, MLP, RNN, LSTM, BiLSTM, CNN-LSTM, and CNN-BiLSTM models for training. After the training is completed, the testing set is used for prediction. The comparison between the predicted values and the true values over the last 200 days is shown in Figures 7-13. The evaluation indices of the models are compared in Table 3. From [...]

Discussion
According to the data in Table 3 [...] is about 0.986, which is closer to 1. The prediction effect of the more complex CNN-BiSLSTM is better than that of CNN-BiLSTM, and it is more suitable for stock price prediction.

Conclusions
A hybrid stock prediction model based on CNN-BiSLSTM is proposed. The model consists of two parts. First, CNN is used to capture the features of the input data and combine them to form high-level data features. Second, BiSLSTM, which adds a 1 − tanh(x) function to the output gate calculation of BiLSTM, is used to take the change patterns of the historical data in both directions into account, and the past stock data are used to predict the closing price of the stock on the next trading day. CNN-BiSLSTM is compared with the reference models MLP, RNN, LSTM, BiLSTM, CNN-LSTM, and CNN-BiLSTM. The experimental results show that the CNN-BiSLSTM stock prediction model has a better prediction effect than the reference models. There are still some details to be improved in this paper, which need further study. The future work can be divided into two parts: (1) Investors are an indispensable part of the stock market. To some extent, investors also understand and influence the stock market. Therefore, from investors' evaluations of and views on individual stocks, we can analyze the opinions and emotions held by most investors and further infer the future trend of a stock, which can provide guidance for investment strategies. (2) The prediction of the closing price in this paper has limitations. It only predicts the closing price on the next trading day, which has limited reference value for investment. Investors prefer predictions of the price and trend of a stock over a longer period, so more in-depth research on stock changes is needed.

Data Availability
The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest
The authors declare that they have no known conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

References
[1] S. Zhang, Q. He, H. Zhang, and K. Ouyang, "Doppler correction using short-time MUSIC and angle interpolation resampling for wayside acoustic defective bearing diagnosis,"