Stock Price Forecast Based on CNN-BiLSTM-ECA Model

Financial data, as a kind of multimedia data, contains rich information and has been widely used for data analysis tasks. However, how to predict stock prices is still a hot research problem for investors and researchers in the financial field. Forecasting stock prices is an extremely challenging task due to the high noise, nonlinearity, and volatility of stock price time series data. In order to provide better stock price predictions, a new stock price prediction model named CNN-BiLSTM-ECA is proposed, which combines a Convolutional Neural Network (CNN), a Bidirectional Long Short-Term Memory (BiLSTM) network, and an Attention Mechanism (AM). More specifically, the CNN is utilized to extract deep features of the stock data to reduce the influence of high noise and nonlinearity. Then, the BiLSTM network is employed to predict the stock price based on the extracted deep features. Meanwhile, a novel Efficient Channel Attention (ECA) module is introduced into the network model to further improve the sensitivity of the network to important features and key information. Finally, extensive experiments are conducted on three stock datasets: the Shanghai Composite Index, China Unicom, and CSI 300. Compared with existing methods, the experimental results verify the effectiveness and feasibility of the proposed CNN-BiLSTM-ECA network model, which can provide an important reference for investors to make decisions.


Introduction
With the unprecedented development of networks, multimedia data such as text, images, video, and financial data from mobile phones, social networking sites, news, and financial websites are growing at a rapid pace and affecting our daily lives. In the era of big data, how to make full use of these data to provide relevant and valuable information has become a task of great significance [1,2]. For example, investors can employ financial data to predict the future price trend of financial assets and thereby reduce decision-making risk [3,4]. However, investors can hardly acquire useful information for budget allocation in a timely manner. In order to help investors make sound investment decisions, technical or quantitative methods for predicting the fluctuation of asset prices are necessary and important [5].
There are many methods that have been proposed to predict financial stock data and achieve excellent performance [5,6]. For instance, traditional methods based on econometric statistical models aim to find the best estimation for time series prediction, such as the Auto Regressive (AR) model, the Moving Average (MA) model, the Auto Regressive Moving Average (ARMA) model, and the Autoregressive Integrated Moving Average (ARIMA) model [7]. Although the abovementioned approaches can describe and evaluate the relationship between variables through statistical inference, there are still some limitations. On the one hand, since these methods are based on the assumption of a linear model structure, they can hardly capture the nonlinear variation of stock prices [8,9]. On the other hand, these approaches assume that the data have constant variance, while financial time series have high-noise, time-varying, and dynamic properties [10].
In order to overcome the aforementioned shortcomings, many machine learning techniques have been applied to model the nonlinear relationships in financial time series. Among them, the Artificial Neural Network (ANN), owing to its excellent nonlinear mapping and generalization ability, has been widely used for dealing with financial time series [11,12]. Different from econometric statistical models, the ANN model requires neither a strict model structure nor an additional series of assumptions. For example, Hajizadeh et al. [13] put forward a hybrid model, which combined an ANN with the Exponential Generalized Autoregressive Conditional Heteroscedasticity (EGARCH) model to predict the volatility of the S&P 500. The experimental results show that the test error of the hybrid model is lower than that of any single econometric model. Rather et al. [14] realized stock return prediction by combining the ARMA and Exponential Smoothing (ES) linear models with a Recurrent Neural Network (RNN) [15]. The experimental results indicate that the RNN can further improve the prediction performance.
In recent years, the Long Short-Term Memory (LSTM) network has been proposed, which overcomes the vanishing gradient problem of the RNN and uses storage cells and gates to learn long-term correlations in time series data [16]. Owing to these advantages, LSTM has been widely used for forecasting time series [17][18][19]. Nelson et al. [20] proposed a stock price prediction method based on LSTM. From the experimental results in [20], we can find that the LSTM model is more accurate than other machine learning models, such as the Support Vector Machine (SVM) [21], Genetic Algorithm (GA) [22], and BP Neural Network [23,24]. Graves and Schmidhuber [25] proposed the Bidirectional LSTM (BiLSTM) network. BiLSTM contains a forward LSTM and a reverse LSTM, in which the forward LSTM exploits past information and the reverse LSTM exploits future information. Since then, many researchers have applied BiLSTM to time series prediction problems [26][27][28]. Because the BiLSTM network can use information from both the past and the future, its final prediction is more accurate than that of a unidirectional LSTM. In addition, multitask RNN methods have also been proposed for forecasting time series, for example, in EEG-based motion intention recognition and dynamic illness severity prediction [29,30].
The Attention Mechanism (AM) is derived from human vision [31]. By scanning the target area quickly, the human eye pays more attention to key areas to obtain useful information and suppress useless information. Thus, limited resources can be used to quickly screen out the information that is valuable for the target from massive amounts of information [31]. At present, the attention mechanism has been well applied in visual question answering [32,33], natural language processing [34,35], and speech recognition [36,37]. In addition, some researchers have also successfully applied the attention mechanism to time series research [38][39][40][41].
Inspired by the successful applications of deep learning and attention mechanisms to stock data analysis [42][43][44], this paper proposes a time series prediction model named CNN-BiLSTM-ECA, which integrates a Convolutional Neural Network (CNN) [10,12] and BiLSTM to predict the closing price of stock data. First, our model adopts a CNN to extract deep features. Second, the feature vectors are arranged as a time series and regarded as the input of the BiLSTM network. Third, in order to further improve the prediction performance of the BiLSTM network, a novel attention mechanism termed the Efficient Channel Attention (ECA) [45] module is introduced. Compared with other attention mechanisms, ECA is lighter and less complex and can greatly improve network performance. Finally, we compare the proposed CNN-BiLSTM-ECA model with the LSTM, BiLSTM, and CNN-BiLSTM models on three stocks' data, including the Shanghai Composite Index, China Unicom, and CSI 300, to verify its effectiveness. The outline of this paper is as follows: Section 2 reviews the related works, and backgrounds are introduced in Section 3. Section 4 gives the proposed network model structure in detail. Section 5 shows extensive experiments to prove the effectiveness of the proposed approach. Section 6 presents some conclusions.

Related Works
Recently, a large number of stock prediction methods have been proposed. This paper mainly reviews two kinds of methods: those based on machine learning and those based on deep learning.

Machine Learning Methods.
As machine learning techniques become more and more popular, Machine Learning (ML) methods for financial time series forecasting have been studied extensively. Specifically, Nayak et al. [46] proposed a hybrid model based on SVM and K-Nearest Neighbor (KNN) to predict the Indian stock market index. Combining weighted SVM and KNN, Chen and Hao [47] proposed a new model for predicting the trend of the Chinese stock market. In order to find the optimal solution of a neural network, Chiang et al. [48] developed a novel model to predict stock trends by introducing Particle Swarm Optimization (PSO) into the neural network. Furthermore, Zhang et al. [49] proposed a novel ensemble method combining AdaBoost, a genetic algorithm, and probabilistic SVM to acquire better prediction performance. Moreover, several hybrid approaches with excellent performance have been proposed for stock trend prediction. For example, Marković et al. [50] proposed a new hybrid method that integrates the analytic hierarchy process and weighted kernel least squares SVM. Lei [51] developed a hybrid method by combining the rough set and a wavelet neural network. Many researchers have found that fusion models based on different techniques play a vital role in prediction. Based on technical analysis and sentiment embedding, Picasso et al. [52] proposed a fusion model that integrates Random Forest (RF), ANN, and SVM. Similar to [52], Parray et al. [53] applied several machine learning methods, including Logistic Regression (LR), SVM, and ANN, to predict the trend of stocks for the next day. Xu et al. [54] proposed a new fusion method by combining k-means clustering and ensemble methods (i.e., SVM and RF). In order to reduce the influence of parameters, Dash et al. [55] proposed a new stock price prediction method named fine-tuned SVR, which combines the grid search technique and SVR. The grid search technique is used to select the best kernel function and tune the optimized parameters on the training and validation datasets.
Apart from the aforementioned machine learning approaches, a variety of Deep Learning (DL) techniques have emerged in recent years for financial time series forecasting research. In the next section, we introduce some stock price prediction methods based on DL techniques.

Deep Learning Methods.
DL techniques based on ANNs are a branch of ML that can extract high-level abstract features for data representation. DL methods can achieve excellent performance compared to conventional ML methods; thus, they have been widely applied in many fields such as image processing and computer vision. Recently, DL methods have also been proposed for analyzing financial time series data. Specifically, combining stock technical indicators and the historical price of stock data, Nelson et al. [20] applied an LSTM model to predict the movement of the stock market price. In order to forecast short-term stock market trends, Liang et al. [56] constructed a new prediction model by combining a Restricted Boltzmann Machine (RBM) and several classifiers. Kim and Won [57] combined LSTM with the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model to identify stock price volatility. Moews et al. [58] designed a forecasting method based on a Deep Feed-forward Neural Network (DFNN) and exponential smoothing. In order to reduce the training complexity and improve the prediction accuracy, Li et al. [59] constructed a forecasting model by integrating Feature Selection (FS) and the LSTM method. For selecting and focusing on the key information of stock data, Zhao et al. [60] introduced AM into RNNs and proposed three prediction frameworks named AT-RNN, AT-LSTM, and AT-GRU, respectively.
CNN utilizes local perception and weight sharing to greatly reduce the number of parameters. Therefore, many CNN-based methods have recently been proposed to predict stock trends and have achieved good performance. For example, Sezer and Ozbayoglu [61] first transformed stock technical indicators into 2D images and then designed a novel CNN-based method for the stock price prediction task. Wen et al. [62] first exploited a sequence reconstruction method to reduce the noise of financial time series data. Then, they employed a CNN model to extract spatial structure from the time series data for stock prediction. Different from [62], Barra et al. [63] first utilized the Gramian angular field technique to obtain 2D images from the time series data. Then, an ensemble learning framework of CNNs was exploited to forecast the trend of the US market. Long et al. [64] defined three matrices to describe trading behavior patterns, named the transaction number matrix, buying volume matrix, and selling volume matrix. Next, they exploited a CNN to extract deep features. To capture the characteristics of time series, Hao and Gao [65] proposed a novel method by extracting multiscale CNN features of price series. Different from the existing methods, Lu et al. [66] proposed a new network model to predict stock prices by combining CNN and LSTM.

Backgrounds
In this section, some basic knowledge of the proposed method will be reviewed briefly.

Long Short-Term Memory Network.
The RNN model is widely used to analyze and predict time series data [15]. However, the RNN model often suffers from the vanishing gradient problem during training. Therefore, it is difficult for it to remember earlier information, namely, the long-dependence problem [16,67]. To deal with this issue, Greff et al. [67] proposed the LSTM network model, which can retain information over a longer time span. The model utilizes a gate control mechanism to adjust the information flow and systematically determines the amount of incoming information to be retained at each time step. Figure 1 shows the structure of the basic LSTM unit, composed of a storage unit and three control gates (the input gate, output gate, and forgetting gate). x_t and h_t correspond to the input and hidden states at time t, respectively. f_t, i_t, and o_t are the forgetting gate, input gate, and output gate, respectively. C̃_t is the candidate information for the input to be stored, and the amount stored is controlled by the input gate.

The calculation processes of each gate, input candidate, cell state, and hidden state are given by the following formulas:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f),
i_t = σ(W_i · [h_{t−1}, x_t] + b_i),
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c),
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t,
o_t = σ(W_o · [h_{t−1}, x_t] + b_o),
h_t = o_t ⊙ tanh(C_t),

where σ(·) is the sigmoid function and ⊙ denotes element-wise multiplication.
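The gate equations above can be sketched in NumPy. This is a minimal illustration of a single LSTM step; the toy dimensions and random weights are assumptions for demonstration, not the trained model from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: W maps the concatenated [h_{t-1}, x_t] to each gate."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forgetting gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):      # run over a length-5 sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (4,)
```

Note that the output gate and the final tanh bound each component of h_t strictly inside (−1, 1), which is why LSTM hidden states stay numerically well behaved over long sequences.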

Attention Mechanism.
The Squeeze-and-Excitation Network (SENet) is an efficient attention mechanism, which can improve the representation ability of a network by modeling the dependency of each channel and can adjust the features channel by channel, so that the network can select more useful features [68]. The basic structure of the SE block is shown in Figure 2. The first step is the squeeze operation, which takes the global spatial features of each channel as the representation of that channel to obtain a global feature description. The second step is the excitation operation, which learns the degree of dependence of each channel and adjusts it accordingly; the resulting feature map is the output of the SE block.
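The squeeze and excitation steps can be sketched as follows. This is an illustrative NumPy version with random weights; the feature-map size and reduction ratio are assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, W1, W2):
    """SE block over an (H, W, C) feature map.
    Squeeze: global average pooling per channel.
    Excitation: two fully connected layers producing per-channel weights."""
    s = x.mean(axis=(0, 1))                   # squeeze -> (C,)
    e = sigmoid(W2 @ np.maximum(W1 @ s, 0))   # excitation -> weights in (0, 1)
    return x * e                              # rescale each channel

rng = np.random.default_rng(1)
H, W_, C, r = 8, 8, 16, 4                     # r is the reduction ratio
x = rng.standard_normal((H, W_, C))
W1 = rng.standard_normal((C // r, C)) * 0.1   # bottleneck: C -> C/r
W2 = rng.standard_normal((C, C // r)) * 0.1   # restore: C/r -> C
y = se_block(x, W1, W2)
print(y.shape)  # (8, 8, 16)
```

The bottleneck (C → C/r → C) is what makes the excitation step learn cross-channel dependencies rather than a per-channel bias.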

The Proposed Network Model Structure
In order to extract features efficiently and improve prediction accuracy, we combine a CNN, a BiLSTM network, and a lightweight ECA attention module into a unified framework and propose a new time series prediction network model named CNN-BiLSTM-ECA. Our proposed model can automatically learn and extract local features and long-memory features in the time series, making full use of the data while keeping the model complexity low. The network model structure is shown in Figure 3. In this model, first, the CNN is utilized to extract deep feature vectors from the original input time series data. Then, the BiLSTM model is employed to learn temporal features from the new time series constructed from the deep feature vectors. Moreover, the attention mechanism named ECA is further introduced to emphasize the more important features. Finally, the Dense model, consisting of several fully connected layers, is employed to perform the prediction task.

Convolutional Neural Network.
In this paper, a Convolutional Neural Network (CNN) is utilized to extract data features efficiently. Compared with the traditional neural network structure, a CNN uses local connections between neurons, which reduces the number of parameters between the connected layers. In other words, it contains only part of the connections between layer n − 1 and layer n. Figure 4 shows the difference between full connection and local connection, where Figure 4(a) is a schematic diagram of full connection and Figure 4(b) is a diagram of local connection.

Bidirectional LSTM.
In order to build a more accurate prediction model, the Bidirectional LSTM (BiLSTM) network is employed, which runs forward and backward LSTM networks over each training sequence. The two LSTM networks are connected to the same output layer to provide complete context information for each point of the sequence. Figure 5 shows the structure of BiLSTM.
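The bidirectional combination can be sketched as follows. To keep the sketch short, a plain tanh recurrent cell stands in for the LSTM cell (an assumption for illustration); the point is how the forward and backward passes are aligned and concatenated per time step.

```python
import numpy as np

def rnn_pass(xs, Wx, Wh):
    """Run a simple tanh recurrent cell over a sequence and return all
    hidden states. (A plain RNN cell stands in for the LSTM cell here.)"""
    h = np.zeros(Wh.shape[0])
    out = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(2)
T, n_in, n_hid = 6, 3, 4
xs = rng.standard_normal((T, n_in))
Wx_f, Wh_f = rng.standard_normal((n_hid, n_in)) * 0.1, rng.standard_normal((n_hid, n_hid)) * 0.1
Wx_b, Wh_b = rng.standard_normal((n_hid, n_in)) * 0.1, rng.standard_normal((n_hid, n_hid)) * 0.1

h_fwd = rnn_pass(xs, Wx_f, Wh_f)               # past -> future
h_bwd = rnn_pass(xs[::-1], Wx_b, Wh_b)[::-1]   # future -> past, re-aligned
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)  # each step sees both directions
print(h_bi.shape)  # (6, 8)
```

After concatenation, the representation at each time step carries information from both before and after that step, which is exactly what a unidirectional LSTM cannot provide.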

Attention Module ECA.
The Channel Attention (CA) mechanism has great potential for improving the performance of Deep Convolutional Neural Networks (DCNNs). However, most existing methods are committed to designing ever more complex attention modules to achieve better performance, which inevitably increases the complexity and computational burden of the model. In order to avoid overfitting and reduce computation, a lightweight, low-complexity module called Efficient Channel Attention (ECA) [45] is introduced. ECA can not only generate weights for each channel but also learn the correlation among different channels. For time series data, larger weights are assigned to the key features and smaller weights to the irrelevant features. Therefore, ECA focuses on the useful information, which improves the sensitivity of the network to the main features. Figure 6 shows the structure of ECA.
As shown in Figure 6, ECA first carries out channel-wise Global Average Pooling (GAP). Then, for each channel, ECA uses the channel and its k adjacent channels to capture local cross-channel interactions. ECA generates the channel weights by performing a fast 1D convolution:

ω = σ(C1D_k(y)),

where C1D denotes 1D convolution, k is the kernel size of the 1D convolution, y is the result of GAP, and σ is the sigmoid function. In order to avoid adjusting k manually, ECA determines the value of k adaptively from the channel dimension. Since the kernel size k of the 1D convolution is directly related to the channel dimension C, the corresponding relationship is defined as

C = φ(k) = 2^(γk − b).

Therefore, given the channel dimension C, the kernel size k can be adaptively determined by

k = ψ(C) = |log2(C)/γ + b/γ|_odd,

where |·|_odd denotes the nearest odd number. The parameters γ and b are set to 2 and 1, respectively, in this paper. Obviously, through this nonlinear mapping, high-dimensional channels have a longer range of interaction, while low-dimensional channels have a shorter range of interaction.
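The adaptive kernel-size rule and the channel reweighting can be sketched in NumPy as follows. The random feature map and conv weights are illustrative assumptions; a real implementation would use a learned 1D convolution (e.g., with zero padding rather than the edge padding used here for brevity).

```python
import numpy as np

def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive kernel size: nearest odd number to (log2(C) + b) / gamma."""
    t = int(abs((np.log2(C) + b) / gamma))
    return t if t % 2 else t + 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eca(x, conv_w):
    """ECA over an (L, C) feature map: GAP per channel, then a 1D convolution
    across each channel's k neighbours yields one weight per channel."""
    y = x.mean(axis=0)                        # global average pooling -> (C,)
    k = conv_w.size
    pad = np.pad(y, k // 2, mode="edge")      # edge padding for brevity
    w = sigmoid(np.array([conv_w @ pad[i:i + k] for i in range(y.size)]))
    return x * w                              # reweight each channel

rng = np.random.default_rng(3)
C = 16
k = eca_kernel_size(C)                        # gamma=2, b=1 as in the paper
x = rng.standard_normal((10, C))
out = eca(x, rng.standard_normal(k) * 0.1)
print(k, out.shape)  # 3 (10, 16)
```

Unlike the SE block, there is no per-channel fully connected layer: the only parameters are the k weights of the 1D convolution, which is why ECA is so lightweight.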

Dense Model.
The purpose of the Dense model is first to exploit fully connected layers to extract the correlations among the features through nonlinear mapping and then to map them to the output space. In the proposed network structure, three fully connected layers are added to handle the nonlinear problem well, which enables accurate prediction. Figure 7 shows the structure of the Dense model.

Experimental Process.
The experiment consists of six parts: data collecting, data preprocessing, model training, model saving, model testing, and prediction. The schematic diagram of the experimental process is shown in Figure 8.

Data Description and Preprocessing.
The experimental data are collected from NetEase Finance (data collected from the website http://quotes.money.163.com/0000001.html#1b01), which includes the Shanghai Composite Index (SSE Index for short, stock symbol: 000001), China Unicom (stock symbol: 600050), and CSI 300 (stock symbol: 399300). SSE stock data were collected from December 20, 1990, to November 23, 2020 (7304 groups); China Unicom stock data were collected from October 9, 2002, to March 17, 2021 (4340 groups); CSI 300 stock data were collected from January 7, 2002, to March 17, 2021 (4567 groups). Each stock dataset includes the closing price, the highest price, the lowest price, the opening price, the previous day's closing price, the change, the ups and downs, and other time series data. Table 1 gives an overview of the three stocks' data, and Tables 2-4, respectively, show partial data of the three stocks.
As these tables show, the original dataset obtained from the network cannot be directly used for model training and testing, so data preprocessing is required. In view of missing or disordered attribute values in the original dataset, we first use interpolation, sorting by date, and other operations to complete the time series data in our experiment. Then, data normalization is employed to deal with the problem of inconsistent magnitudes in the data. Finally, an effective dataset can be constructed for the experiment. Data normalization scales the data according to a certain ratio and transforms it into a specific interval. In this experiment, the data values are converted to [0, 1], which improves the convergence speed and accuracy of the model. The transformation function is defined as

x′ = (x − min)/(max − min),

where max and min denote the maximum value and minimum value of the sample data, respectively.
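The min-max transformation above, together with its inverse (needed to map predictions back to the price scale), can be written as a short NumPy sketch. The sample price values are made up for illustration.

```python
import numpy as np

def minmax_scale(x):
    """Scale each column of x to [0, 1] using the column min and max."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi

def minmax_inverse(y, lo, hi):
    """Map normalized values back to the original scale."""
    return y * (hi - lo) + lo

# Illustrative data: 4 days x 2 features (e.g., an index level and a price).
prices = np.array([[3100.0, 5.2], [3250.0, 5.6], [3010.0, 4.9], [3180.0, 5.4]])
scaled, lo, hi = minmax_scale(prices)
print(scaled.min(), scaled.max())  # 0.0 1.0
restored = minmax_inverse(scaled, lo, hi)
```

In practice, lo and hi should be computed on the training set only and then reused for the test set, so that no information from the test period leaks into training.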

Experiment Setting and Implementation.
Many hyperparameters need to be predefined, e.g., the numbers of neurons in different layers, the learning rate, and the number of iterations. In our experiment, the numbers of neurons in the three fully connected layers of the Dense model are set to 128, 32, and 1, respectively. The number of hidden neurons in each layer of the BiLSTM is set to 64. The learning rate is 0.001, and the number of iterations is 200. The ReLU function is used as the activation function. A description of the parameter settings of the proposed method is shown in Table 5.
In this experiment, Keras is used as the neural network framework, and the Python programming language is employed to implement the network structure. According to the parameter settings of the proposed network, the specific CNN-BiLSTM-ECA model structure is shown in Figure 9.

Evaluation Criterion.
In order to analyze the performance of the proposed prediction model intuitively and quantitatively, three evaluation criteria including Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) are utilized to evaluate the prediction results comprehensively.
The Mean Square Error (MSE) represents the mean of the sum of squares of the errors between the predicted data and the original data, which is defined as

MSE = (1/n) Σ_{t=1}^{n} (X_t − X_t′)²,

where X_t and X_t′ represent the predicted value and the true value, respectively.
The Root Mean Square Error (RMSE) is the square root of the expected squared error between the predicted value and the true value, and its range is [0, +∞). When the predicted value is completely consistent with the true value, the RMSE equals 0, indicating a perfect model. Conversely, the larger the prediction error, the larger the RMSE. The calculation formula is

RMSE = sqrt((1/n) Σ_{t=1}^{n} (X_t − X_t′)²).

The Mean Absolute Error (MAE) represents the average of the absolute values of the deviations of all individual observations, which avoids the problem of errors cancelling each other out and accurately reflects the size of the actual forecast error:

MAE = (1/n) Σ_{t=1}^{n} |X_t − X_t′|.

Table 4: Some data come from CSI 300 stock.
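The three criteria can be written directly in NumPy; the toy prediction arrays below are illustrative only.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean of the squared errors."""
    return np.mean((y_pred - y_true) ** 2)

def rmse(y_pred, y_true):
    """Square root of the MSE; same units as the price itself."""
    return np.sqrt(mse(y_pred, y_true))

def mae(y_pred, y_true):
    """Mean of the absolute errors; errors cannot cancel each other out."""
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([10.5, 11.5, 11.0, 14.0])
print(mse(y_pred, y_true), rmse(y_pred, y_true), mae(y_pred, y_true))
# 0.375 0.6123... 0.5
```

RMSE and MAE are reported in the same units as the closing price, which makes them easier to interpret than MSE when comparing models on the same stock.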
From these figures, when the time step is set to 5, the prediction results show a large deviation. This means that a small time step ignores the influence of global factors. When the time step is set to 20, the time span is too large; therefore, the data fluctuate considerably, resulting in inaccurate prediction results. When the time step is set to 10, the error of the model is the smallest and the accuracy is the highest. Therefore, the time step is set to 10; in other words, the stock data of the previous 10 days are used as the input of the neural unit, and the closing price of the 11th day is used as the label to train the model.
Table 6 analyzes and describes the forecast error values at each time step. The error trend across time steps is consistent with the gap between the predicted and actual price curves in Figures 10-12. Therefore, according to these results, the prediction model achieves the best performance when the time step is set to 10.
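The 10-day windowing described above can be sketched as follows. The random data, the 7-feature layout, and the assumption that column 0 holds the closing price are illustrative only.

```python
import numpy as np

def make_windows(data, close, step=10):
    """Build (window, label) pairs: 'step' consecutive days of features as
    input, and the next day's closing price as the label."""
    X, y = [], []
    for i in range(len(data) - step):
        X.append(data[i:i + step])
        y.append(close[i + step])
    return np.array(X), np.array(y)

rng = np.random.default_rng(4)
data = rng.standard_normal((100, 7))   # 100 trading days x 7 features
close = data[:, 0]                     # assume column 0 is the closing price
X, y = make_windows(data, close, step=10)
print(X.shape, y.shape)  # (90, 10, 7) (90,)
```

Each sample X[i] covers days i..i+9 and its label y[i] is the day-(i+10) close, matching "the stock data of the previous 10 days" with "the closing price of the 11th day" as the label.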

Experimental Results and Analysis.
In order to verify the effectiveness of the proposed model, different models, including CNN [10], LSTM [15], BiLSTM [25,26], CNN-LSTM [20], BiLSTM-ECA, CNN-LSTM-ECA, and CNN-BiLSTM [27], are compared on the three stock datasets collected from the Shanghai Composite Index, China Unicom, and CSI 300. The prediction results are shown in Figures 13-15. In these figures, the blue curve is the predicted closing price of the stock, and the red curve is the true closing price. The x-axis is the time, and the y-axis is the normalized value of the stock price.
As shown in Figures 13-15, first, we can find that the prediction results of the BiLSTM and LSTM methods are better than those of the CNN model. This indicates that LSTM and BiLSTM take the trend of the stock price into consideration as a time series, which helps to improve the forecast accuracy. Moreover, the performance of BiLSTM is better than that of LSTM because the BiLSTM model can exploit the subsequent information of the stock price time series. Second, by introducing the CNN model to reduce noise and capture the nonlinear structure of the stock data, the CNN-LSTM and CNN-BiLSTM methods outperform CNN, LSTM, and BiLSTM. Finally, by integrating the ECA module into LSTM and BiLSTM to select the important features and key information, the BiLSTM-ECA, CNN-LSTM-ECA, and CNN-BiLSTM-ECA methods achieve better performance than the other methods. To sum up, the proposed CNN-BiLSTM-ECA model obtains the best prediction results.
The prediction errors of the different methods on the three stock datasets are listed in Table 7. As shown in this table, first, the CNN model has the lowest prediction performance among all the compared approaches. Since BiLSTM uses bidirectional information, the BiLSTM model improves the prediction ability beyond that of LSTM. Then, owing to the deep feature extraction of the CNN, the CNN-LSTM and CNN-BiLSTM models have higher predictive power. Finally, among all the compared methods, the CNN-BiLSTM-ECA model achieves the best performance, which verifies the effectiveness of the CNN-BiLSTM-ECA network model.
To show the model performance more clearly, some prediction results of the LSTM model and the CNN-BiLSTM-ECA model on selected days of the three datasets are shown in Figures 16-18. From these figures, we can see that the prediction results of the CNN-BiLSTM-ECA model are closer to the true stock price, with smaller errors. In summary, the experimental results verify the feasibility and effectiveness of the proposed network model.

Conclusions
For the stock market, forecasting future prices is very important for making investment decisions. This paper proposes a new stock price time series forecasting network model (CNN-BiLSTM-ECA), which takes the stock closing price, the highest price, the lowest price, the opening price, the previous day's closing price, the change, the rise and fall, and other time series data as input to predict the next day's closing price.
The proposed network model combines the CNN and BiLSTM network models. First, the CNN is used to extract the deep features of the input data effectively. Second, the feature vectors are arranged as a time series and fed into the BiLSTM network for learning and prediction. At the same time, the ECA attention module is introduced into the model to enhance the importance of the learned features; thus, useless features are suppressed and the accuracy of the model is further improved. The proposed model is compared with the CNN, LSTM, BiLSTM, CNN-LSTM, CNN-BiLSTM, BiLSTM-ECA, and CNN-LSTM-ECA network models on three datasets. The experimental results show that the proposed model has the highest prediction accuracy and the best performance. The MSE, RMSE, and MAE of CNN-BiLSTM-ECA are the smallest among all methods, with values of 1956.036, 44.227, and 28.349, respectively, for the Shanghai Composite Index. Compared with the single LSTM model, these represent reductions of 53.67%, 31.94%, and 43.96%, respectively. It can be seen that it is difficult to achieve high prediction accuracy using only a single network, whereas a suitably combined network structure can improve the prediction accuracy.
The CNN-BiLSTM-ECA model proposed in this paper can effectively predict stock prices and provide relevant references for investors to maximize investment returns.
A remaining limitation is that the input feature parameters of the model are relatively simple. Therefore, the training features of the model could be extended, for example, with the emotional or sentiment features of news events or social media [69], so as to improve the prediction performance from the perspective of feature selection. Furthermore, we will apply our model to more application fields, such as gold price prediction, oil price prediction, foreign exchange price prediction, novelty detection [70], and optic disc detection [71]. In addition, a graph-based embedding technology will be introduced to solve the problem of time series prediction [72].

Figure 1: The structure diagram of the LSTM unit.

Figure 4: The diagram of full connection and local connection.

Figure 9: The model structure of the proposed CNN-BiLSTM-ECA, implemented using Keras.
W_f, W_i, W_o, and W_c represent the weight matrices of the forgetting gate, input gate, output gate, and update state, respectively. b_f, b_i, b_o, and b_c represent the bias vectors of the forgetting gate, input gate, output gate, and update state, respectively. x_t represents the time series data at the current time interval t, and h_{t−1} is the output of the memory unit at the previous time interval t − 1.
Figure 6: The structure diagram of ECA.

Table 1: The detailed information of the three stocks' data.

Table 2: Some data from SSE stock.

Table 3: Some data from China Unicom stock.

Table 5: Parameter settings of the proposed method.
The time step is a key parameter of a time series neural network, which affects the prediction performance of the network. Therefore, we analyze the time step parameter on the three datasets, namely, the Shanghai Composite Index, China Unicom, and CSI 300. First, the first 85% of each dataset is used for the training set, and the remaining 15% is used for the test set. Second, we conduct comparative experiments with the time step set to 5, 10, 15, and 20. The experimental results under the different parameters are shown in Figures 10-12. The prediction error values for each time step are shown in Table 6.

Table 6: Forecast errors at different time steps.

Table 7: Forecast errors of different network models.