Evolutionary Framework with Bidirectional Long Short-Term Memory Network for Stock Price Prediction

As an important part of the social economy, the stock market plays a key role in economic development, and accurate prediction of stock prices matters because it lowers the risk of investment decision-making. However, predicting future stock prices is very difficult, because stock prices behave nonstationarily and follow no explicit form. In this paper, we propose a novel bidirectional Long Short-Term Memory (BiLSTM) framework, called evolutionary BiLSTM (EBiLSTM), for stock price prediction. In the framework, three independent BiLSTMs correspond to different objective functions and act as mutation individuals; their respective losses for evolution are then calculated, and finally the optimal objective function is identified by the minimum loss. Since BiLSTM is effective in time series prediction and the evolutionary framework can find an optimal solution among multiple objectives, their combination adapts well to the nonstationary behavior of stock prices. Experiments on several stock market indexes demonstrate that EBiLSTM achieves better prediction performance than the same models without the evolutionary operator.


Introduction
It is essential for investors to forecast the future price of a stock because the risk of decision-making can be mitigated by appropriately determining its future movement. The topic has attracted many researchers from various academic fields. However, accurate prediction is a challenging task. Early studies mostly used moving averages [1], linear regression [2], hidden Markov models (HMM) [3], autoregressive integrated moving average (ARIMA) [4], and the Prophet model [5] to predict stock prices and their trends.
Currently, neural networks based on deep learning dominate time series prediction, especially Long Short-Term Memory (LSTM) [6]. Recurrent Neural Networks (RNN) are also widely used to predict stock prices; for example, [7] applies a decision-making method based on an estimate of the zero-crossing rate to enhance prediction ability. Relative insensitivity to gap length is an advantage of LSTM over CNN, RNN, HMM, and other learning methods in numerous applications [8][9][10]. In some of the initial studies, scholars used only raw financial data for price prediction; for example, [11] utilized LSTM to predict the high and low prices of soybean futures using a data set from the Dalian Commodity Exchange. Later, some researchers found that preprocessing the original data can improve prediction accuracy; for example, [12] proposed a movement trend-based data prediction method to preprocess the trend indicator.
Deep neural networks (DNN) have also been widely applied in stock price prediction to identify trends and patterns. Go and Hong [13] first trained a DNN on financial time series data and then tested and confirmed the predictability of their model. Fluctuation of the stock price was predicted by a DNN with 715 novel input features [14]. The performance of their model was also compared with that of other models using simple price-based input features. To predict stock market behavior, the performance of a DNN was examined with high-frequency intraday stock returns as the input [15]. In that model, the predictability of principal component analysis (PCA), autoencoder, and restricted Boltzmann machine (RBM) was analyzed. The results showed that the DNN has good predictability with the information received from the residuals of the autoregressive model. Moreover, it has been found that financial news may be one of the key factors producing fluctuations in stock prices. Several sentiment analysis studies have tried to identify the relationship between investor reaction and news; for example, [16] utilized a novel two-stream gated recurrent unit network to predict the direction of stock prices by using Stock2Vec.
However, due to the complexity of stock data, it is difficult to obtain satisfactory accuracy by using only simple preprocessing or a single LSTM model. By combining more complex preprocessing with a hybrid model, the prediction accuracy can be significantly improved [16,17]. In [17], a large-scale deep learning model was proposed to predict price movements from Limit Order Book (LOB) data [18]. The architecture utilized CNN to learn the spatial features of the LOBs and LSTM to capture longer time dependencies. Frameworks with multiple LSTMs have also been studied extensively to improve prediction performance, since different LSTMs can capture different features of the data [19,20].
On the other hand, evolutionary algorithms are inspired by biological evolution mechanisms and simulate evolutionary processes such as reproduction, mutation, genetic recombination, and natural selection to evolve candidate solutions to optimization problems. Since the evolutionary algorithm is a highly robust and widely applicable global optimization algorithm [21], many scholars have begun to use it to optimize various complex models. The combination of evolutionary algorithms and neural networks can further improve network performance, and in recent years there have been many achievements in practical applications in multiple fields [22][23][24][25][26][27][28][29]. To minimize human participation in designing deep learning algorithms and to discover suitable configurations automatically, there have been some attempts to optimize deep learning hyperparameters through evolutionary search [22,23]. For network optimization, Generative Adversarial Network (GAN) [24] can generate attacked images by a one-pixel adversarial perturbation based on differential evolution (DE), that is, a black-box attack that requires only a little adversarial information and can fool many types of networks owing to the inherent features of DE. Moreover, GAN can also make the generated image more stable through an evolutionary algorithm [25], in which the adversarial game is played between a population of generators and the discriminator, thereby improving generative performance. In reinforcement learning, the network topology can also be optimized by combining differential evolution and metaheuristic algorithms [26]. In [27], agents embedded in the neural network were used for transfer learning to determine which parts of the network can be reused for a new task. During learning, a tournament selection genetic algorithm was used to select pathways through the neural network.
LSTM combined with evolutionary algorithms has been reported in time series prediction [28,29]. In [28], the gradient descent method in LSTM was combined with the particle swarm optimization (PSO) algorithm to update the weights of the network. In [29], an evolutionary attention-based LSTM was proposed for multivariate time series prediction, which can avoid being trapped in local optima as traditional gradient-based methods often are.
In this paper, EBiLSTM is proposed for stock price prediction, which treats the BiLSTM training procedure as an evolutionary problem. Specifically, the training process of each BiLSTM is independent and has its own adaptive loss function. A population of BiLSTMs evolves in correspondence with the training process of the multiple BiLSTMs. During each training (or evolutionary) iteration, the BiLSTM is trained to predict the stock price of the next day.
In summary, the contributions of this paper are as follows: (i) A framework named EBiLSTM is proposed, which integrates BiLSTM and an evolutionary algorithm to predict stock prices effectively. As far as we know, it is the first report on this approach. (ii) An evolution strategy is proposed, which uses multiple objective functions (square loss, abs loss, and Huber loss) to optimize BiLSTM. (iii) Performance is evaluated on several stock market indexes, and the results demonstrate that the proposed EBiLSTM achieves higher prediction accuracy than the compared models. The rest of this paper is organized as follows: in Section 2, EBiLSTM together with its training process is proposed. Section 3 provides the experimental validation of the method. Finally, the conclusion is presented in Section 4.

Framework.
In contrast to the conventional approach, which trains a single BiLSTM on real-world stock data, an evolutionary algorithm is used that evolves a population of BiLSTMs, that is, $\{\mathrm{BiLSTM}_\theta\}$, in the training process. In this population, each individual stands for a possible solution in the weight space of the BiLSTM. During the evolutionary process, mutation operations (different objective functions chosen dynamically) are used to generate different offspring individuals (the weights of different BiLSTMs). As shown in Figure 1, there are three substages in each step of evolution:
(i) Mutation: Given an individual $\mathrm{BiLSTM}_\theta$ in the population, the variation operators are used to produce its offspring $\mathrm{BiLSTM}_{\theta_1}$, $\mathrm{BiLSTM}_{\theta_2}$, .... Each individual creates several copies, and mutations are used to modify each of them. The modified copies are taken as children.
(ii) Evaluation: The BiLSTM training process is taken as the fitness function F(·), which is used to evaluate the performance of each child. The result is represented by a fitness score.
(iii) Selection: According to the fitness score, all children are sorted from high to low. Some of the lowest ones, that is, the worst ones, are deleted. The remaining children are kept and act as parents to be further evolved at the next iteration.
After each evolutionary cycle, the BiLSTM is updated to predict the price of the next day; that is,
$$L_{\mathrm{BiLSTM}} = F_{\mathrm{BiLSTM}}(\text{square, abs, or Huber}). \tag{1}$$
Here, we take a simple example to show the process of evolution. The data of stock A is split into batches to train the BiLSTM. The first batch, that is, data_1, is input to the BiLSTM. According to the difference between the outputs and the corresponding labels, various losses, such as square, abs, and Huber, are calculated. Among them, the smallest one is selected and the weights of the BiLSTM are updated accordingly. Then, the second batch, that is, data_2, is input to the BiLSTM. The training process continues until the loss becomes small enough. Thus, the adaptive losses from the evolution of the BiLSTM population can produce optimal solutions.

Mutation.
Asexual reproduction with different mutations is employed to produce the next generation of BiLSTM individuals (i.e., children). Specifically, these mutation operations are taken as different training objectives, each of which tries to narrow the distance between the predicted value and the real stock price. In this section, the mutations are presented in detail. To analyze the properties of these mutations, we assume that the optimal BiLSTM model can be obtained in each evolutionary cycle.
(1) Abs mutation (L1 loss): The abs mutation represents the abs objective function in the original BiLSTM. It aims to minimize the absolute difference $|y - f(x)|$ between the predicted value $f(x)$ and the real closing price $y$ (the label). If the training set contains outliers that may corrupt the training process, it is necessary to use L1 loss. However, L1 loss has a serious problem: its gradient has the same magnitude throughout, so even when the L1 loss is small the gradient remains large, which may impede learning.
(2) Square mutation (L2 loss): The square mutation represents the square objective function in the original BiLSTM, that is, the squared error $e^2$ with $e = y - f(x)$. Gradients of the square mutation can be used for BiLSTM training. When L2 loss approaches zero, the prediction accuracy of the BiLSTM is very high (i.e., $L_{\mathrm{BiLSTM}} \to 0$), whereas when L2 loss is close to infinity, the training of the BiLSTM is not effective. Because L2 loss is the square of the error, it grows quickly when $|e| > 1$. Once there is an outlier in the data, $e$ may be large and $e^2 \gg |e|$. As a result, the weights of a model are affected more seriously by outliers under L2 loss than under L1 loss.
(3) Huber mutation: The Huber mutation represents the Huber objective function in the original BiLSTM. The Huber loss is quadratic for small errors and linear for large ones, so it combines the smooth gradient of L2 loss near zero with the robustness of L1 loss to outliers.
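The behavior of the three mutation objectives can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Huber threshold `delta = 1.0`, and the toy numbers are assumptions chosen only to show how a single outlier affects each loss.

```python
def abs_loss(y, pred):
    """L1 (abs) loss for one sample: |y - f(x)|."""
    return abs(y - pred)


def square_loss(y, pred):
    """L2 (square) loss for one sample: (y - f(x))^2."""
    return (y - pred) ** 2


def huber_loss(y, pred, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it.

    delta is a hypothetical threshold; the paper does not state its value.
    """
    e = abs(y - pred)
    if e <= delta:
        return 0.5 * e * e
    return delta * (e - 0.5 * delta)


def mean_loss(loss_fn, labels, preds):
    """Average a per-sample loss over a batch."""
    return sum(loss_fn(y, p) for y, p in zip(labels, preds)) / len(labels)


labels = [1.0, 2.0, 3.0, 4.0]
clean_preds = [1.1, 1.9, 3.2, 3.8]        # small errors everywhere
outlier_preds = [1.1, 1.9, 3.2, 14.0]     # one very large error

for name, fn in [("abs", abs_loss), ("square", square_loss), ("huber", huber_loss)]:
    clean = mean_loss(fn, labels, clean_preds)
    dirty = mean_loss(fn, labels, outlier_preds)
    print(f"{name:6s} clean={clean:.4f} with_outlier={dirty:.4f}")
```

With the outlier present, the square loss grows far more than the abs or Huber loss, which matches the sensitivity argument made for L2 loss above.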

Evaluation.
In an evolutionary algorithm, the fitness function (i.e., the evaluation operation) is used to measure the quality of an individual. In this paper, the loss function is taken as the fitness function, which focuses on minimal loss. As shown in equation (1), the smaller the loss value is, the better the fitness is. Note that the BiLSTM is constantly updated to approach the optimal solution in the training process. A BiLSTM with better fitness, that is, a lower loss, yields better prediction performance.

Selection.
In an evolutionary algorithm, selection is the counterpart of the mutation operators. In the proposed EBiLSTM, a simple yet useful survivor selection strategy is used to determine the next generation based on the fitness scores of existing individuals.
In particular, the BiLSTMs (i.e., the population) are optimized in a dynamic procedure. Thus, the fitness function changes over time, and the fitness score of each BiLSTM is evaluated from the corresponding training process within the same evolutionary generation. This means that fitness scores evaluated in different generations cannot be compared with each other. Moreover, because the mutation operators of the proposed EBiLSTM actually represent a selection among different BiLSTM training objectives, the desired offspring represent the effective training strategies. Taking into account both the fitness function and the mutation operators, the selection mechanism of EBiLSTM is the comma selection, i.e., $(\mu, \lambda)$-selection. Specifically, after the current offspring population $\{\mathrm{BiLSTM}_i\}_{i=1}^{\lambda}$ is sorted according to the fitness scores $F_i$, the $\mu$ best individuals are selected as the population of the next generation.
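The comma selection described above can be sketched in a few lines. This is a hedged illustration only: the function name `comma_selection`, the dictionary layout of a child, and the loss values are hypothetical, and fitness is taken to be better when the loss is lower, as stated in the Evaluation section.

```python
def comma_selection(offspring, mu):
    """(mu, lambda)-selection: keep the mu best of the lambda offspring.

    Parents are not candidates for survival; only the current offspring are
    ranked, which fits the remark that fitness scores from different
    generations are not comparable. Lower loss means better fitness.
    """
    if not 0 < mu <= len(offspring):
        raise ValueError("mu must be between 1 and the number of offspring")
    ranked = sorted(offspring, key=lambda child: child["loss"])
    return ranked[:mu]


# Three children, one per mutation objective (loss values are made up).
offspring = [
    {"name": "square", "loss": 0.042},
    {"name": "abs", "loss": 0.035},
    {"name": "huber", "loss": 0.038},
]

survivors = comma_selection(offspring, mu=1)
print([child["name"] for child in survivors])
```

With three offspring and `mu=1`, this reduces to picking the child with the minimum loss, which is the case used in the simple example of the Framework section.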

Data Preprocessing.
Stock data have variables with multiple dimensions, such as opening price, closing price, the highest price, the lowest price, and trading rate. Among these prices, the closing price is usually of most concern to investors. Therefore, we use the closing price as the input variable. Our experiments show that the results are basically the same when other prices are used as input variables.
As shown in Figure 2, the overall data is divided into a training set and a test set. A rolling window is used to segment the data. We use the N + 1 scheme (i.e., the closing prices of the previous N days are used to predict the closing price of the next day) to train EBiLSTM continuously. After training finishes, the data of the last N days in the training set are used as input to predict the first day of the test set. Then the rolling window of input data is shifted forward by one day, and the last day of the input data becomes vacant. In this case, the first day of the test data is used as the last day of the input data to predict the second day of the test data, and so forth. Once the last day of the test set is predicted, the test process is over.
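The N + 1 rolling-window scheme can be sketched as follows. The helper names `make_windows` and `rolling_test_inputs` and the toy price series are assumptions for illustration; the sketch only builds the input windows and does not include a model.

```python
def make_windows(series, n):
    """Split a series into (inputs, targets) pairs for the N + 1 scheme.

    Each input holds n consecutive values and its target is the next value.
    """
    inputs, targets = [], []
    for start in range(len(series) - n):
        inputs.append(series[start:start + n])
        targets.append(series[start + n])
    return inputs, targets


def rolling_test_inputs(train, test, n):
    """Build the input window used to predict each day of the test set.

    The first window is the last n training days. After each prediction the
    window shifts forward one day and the vacant last slot is filled with the
    actual test value, as described in the text.
    """
    history = list(train[-n:])
    windows = []
    for actual in test:
        windows.append(list(history))
        history = history[1:] + [actual]
    return windows


prices = [10, 11, 12, 13, 14, 15, 16, 17]   # toy closing prices
train, test = prices[:6], prices[6:]

x_train, y_train = make_windows(train, n=3)
print(x_train, y_train)
print(rolling_test_inputs(train, test, n=3))
```

Each test-day window has the same length N as the training inputs, so the trained model can be applied without any change of input shape.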
To reduce the noise of stock data and benefit the detection of stock price pattern, it is necessary to smooth and normalize the stock price data since every stock may have its specific domain and scale. Data normalization is defined as adjusting values measured on different scales to a uniform scale [30], while data smoothing is defined as transforming stock prices into variations of daily change.
The data smoothing equation defines $x_{(s,t)}$, the result of data smoothing on the $t$-th day; when $t = 1$, we set $x_{(s,t)} = 0$. The learning of BiLSTM in effect extracts stock patterns, which can be magnified by "min-max" normalization of the data set. The "min-max" normalization method is as follows:
$$x_{(n,t)} = \frac{x_{(s,t)} - x_{(s,\min)}}{x_{(s,\max)} - x_{(s,\min)}},$$
where $x_{(n,t)}$ denotes the data after normalization, $x_{(s,t)}$ is the data before normalization, $x_{(s,\min)}$ is the minimum of the data set, and $x_{(s,\max)}$ is the maximum. Accordingly, denormalization and desmoothing are required at the end of the prediction process to recover the original price. In these two steps, $x_{(n,t)}$ denotes the predicted data, $x_{(s,t)}$ denotes the predicted data after denormalization, and $x_t$ denotes the predicted data after both denormalization and desmoothing.
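The preprocessing round trip can be sketched as below. One assumption is made loudly: the smoothing step is taken to be the first difference $x_{(s,t)} = x_t - x_{t-1}$ with $x_{(s,1)} = 0$, which is one reading of "variations of daily change"; the paper's exact smoothing formula is not reproduced here, and a relative (percentage) change would be an equally plausible reading. All function names are hypothetical.

```python
def smooth(prices):
    """ASSUMED smoothing: daily change x_t - x_(t-1), with 0 for the first day."""
    return [0.0] + [prices[t] - prices[t - 1] for t in range(1, len(prices))]


def min_max(values):
    """Min-max normalization to [0, 1]; also returns the min and max used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(values, lo, hi):
    """Inverse of min-max normalization."""
    return [v * (hi - lo) + lo for v in values]


def desmooth(changes, first_price):
    """Inverse of the assumed smoothing: cumulative sum from the first price."""
    prices = [first_price]
    for change in changes[1:]:
        prices.append(prices[-1] + change)
    return prices


prices = [100.0, 102.0, 101.0, 105.0]        # toy closing prices
changes = smooth(prices)
normalized, lo, hi = min_max(changes)
restored = desmooth(denormalize(normalized, lo, hi), prices[0])
print(changes, normalized, restored)
```

In prediction, `denormalize` and `desmooth` would be applied to the model output rather than to the normalized inputs; the round trip here only checks that the two inverse steps undo the two forward steps.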
Mathematical Problems in Engineering
2.6. BiLSTM. As a variant of LSTM [6], BiLSTM can capture context information, and the correlations between contexts, more comprehensively. Two LSTM networks, one with a forward direction and the other with a backward direction, are connected to the same output layer. Both of them are trained with the same sequence of data. There are three gates in an LSTM unit, that is, the input gate, the forget gate, and the output gate. The calculation process is given by equations (9)-(14). Here, $w_i$, $w_f$, and $w_o$ are the weights of the LSTM, and $b_i$, $b_f$, and $b_o$ are the biases. $i_t$ is the input gate, $f_t$ is the forget gate, and $o_t$ is the output gate. The input vector is $x_t$ and the output vector is $h_t$. $c_t$ is the cell state and $\tilde{c}_t$ is the candidate cell state. The forward LSTM applies these equations to the sequence in its original order, and the backward LSTM applies them to the reversed sequence.
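For reference, the gate computations of an LSTM unit are commonly written as the six equations below. This is the standard textbook formulation, given here as an assumption rather than as the paper's own equations: the candidate weights $w_c$ and bias $b_c$, the sigmoid $\sigma$, the elementwise product $\odot$, and the concatenation $[h_{t-1}, x_t]$ do not appear in the surrounding text.

```latex
\begin{align}
i_t &= \sigma\left(w_i\,[h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(w_f\,[h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(w_o\,[h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(w_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```

In a bidirectional network, the same computation is run once over the sequence from the first step to the last and once from the last step to the first, and the two hidden states at each step are combined before the shared output layer.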

2.7. EBiLSTM. The complete process of EBiLSTM is shown in Algorithm 1. At each evolutionary cycle, the BiLSTMs are updated with different mutations (or objectives). Among the children of the next generation, only well-performing ones survive and participate in the next round of training, following the principle of "survival of the fittest." Unlike a single BiLSTM with a fixed and static training objective, EBiLSTM integrates the advantages of different training objectives and selects the best solution. Therefore, during training, EBiLSTM can not only largely suppress the limitations (local optima, etc.) of individual training objectives but also harness their advantages to find a better solution.
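The select-the-minimum-loss loop can be sketched on a toy problem. To stay self-contained, the sketch replaces the BiLSTM with a one-parameter model `pred = w * x` and uses a single gradient step per objective; the names `grad_and_loss`, `train_step`, and `evolve`, the learning rate, and the Huber threshold are all hypothetical. As in the description above, each child's own loss value is compared directly.

```python
KINDS = ("square", "abs", "huber")


def sign(v):
    return (v > 0) - (v < 0)


def grad_and_loss(w, batch, kind, delta=1.0):
    """Mean loss and its derivative w.r.t. w for the toy model pred = w * x."""
    total_loss, total_grad = 0.0, 0.0
    for x, y in batch:
        e = y - w * x
        if kind == "square":
            loss, dloss_de = e * e, 2.0 * e
        elif kind == "abs":
            loss, dloss_de = abs(e), sign(e)
        elif abs(e) <= delta:                      # Huber, quadratic zone
            loss, dloss_de = 0.5 * e * e, e
        else:                                      # Huber, linear zone
            loss, dloss_de = delta * (abs(e) - 0.5 * delta), delta * sign(e)
        total_loss += loss
        total_grad += dloss_de * (-x)              # chain rule: de/dw = -x
    return total_loss / len(batch), total_grad / len(batch)


def train_step(w, batch, kind, lr):
    """One mutation: a gradient step from w under one objective.

    Returns (loss after the step, updated parameter).
    """
    _, grad = grad_and_loss(w, batch, kind)
    new_w = w - lr * grad
    loss, _ = grad_and_loss(new_w, batch, kind)
    return loss, new_w


def evolve(batches, w0=0.0, lr=0.05):
    """For each batch, mutate the saved parameter under every objective and
    keep the child with the minimum loss."""
    w, history = w0, []
    for batch in batches:
        saved = w                                  # all children start here
        children = [(kind,) + train_step(saved, batch, kind, lr) for kind in KINDS]
        best_kind, best_loss, best_w = min(children, key=lambda c: c[1])
        w = best_w
        history.append((best_kind, best_loss))
    return w, history


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]       # toy data generated by y = 2x
w, history = evolve([batch] * 200)
print(round(w, 6), history[0][0], history[-1][0])
```

On this toy data the selected objective changes during training, which is the adaptive behavior the framework relies on; whether that helps on real stock data is what the experiments in Section 3 examine.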

Implementation Details.
To evaluate the proposed EBiLSTM, experiments on several stock prediction tasks are run, and their prediction results are presented in this section. Compared with previous BiLSTM models, the proposed EBiLSTM achieves better stock prediction. The configurations in Table 1 are used in all the following experiments.
We evaluate EBiLSTM with three stock market indexes, as shown in Table 2. Every index includes data of 4750 days, which is large enough to train EBiLSTM effectively.

Evaluation Metrics.
Algorithm 1
Input: population size P = N, the number of mutations n_m, the batch size m, batch data D, and initial weight ω_0
Output: closing price of the next day
(1) ω = ω_0
(2) initialize model parameter ω_0
(3) for i = 1 to m/(N n_m)
(4)   param ← ω (save model parameters)
(5)   for j = 1 to N
(6)     for k = 1 to n_m
(7)       M(param) (assign parameters to the model)
(8)       get a batch D as input x_i of EBiLSTM
(9)       switch (k)
(10)        case 1: loss_square, param_square ← M(x_i, square, param)
(11)        case 2: loss_abs, param_abs ← M(x_i, abs, param)
(12)        case 3: loss_huber, param_huber ← M(x_i, huber, param)
(13)      end switch
(14)      if k = n_m
(15)        loss_min ← min(loss_square, loss_abs, loss_huber)
(16)        param_new ← (loss_min, param_square, param_abs, param_huber)
(17)        ω ← param_new
(18)    end for
(19)  end for
(20) end for

To evaluate the proposed EBiLSTM, we use Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Square Error (MSE) as quantitative metrics. MAE is the average of the absolute differences between the target values and the predictions. It is a linear score and always nonnegative, which means that all individual differences are weighted equally in the average [31]. The closer its value is to 0, the higher the accuracy. MAPE is measured by calculating the absolute error in each period, dividing it by the actual value for that period, and averaging the resulting absolute percentage errors. MSE is the average squared error of the predictions: for each point, it calculates the squared difference between the prediction and the target and then averages those values. The higher this value is, the worse the model is.
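The three metrics follow directly from their verbal definitions and can be sketched as below. The function names and sample values are hypothetical, and expressing MAPE in percent (multiplying by 100) is an assumption, since the scaling is not stated in the text.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |target - prediction|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumed scaling).

    Each absolute error is divided by the actual value, so actual values
    must be nonzero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


def mse(actual, predicted):
    """Mean Square Error: average of (target - prediction)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]       # toy closing prices
predicted = [110.0, 190.0, 400.0]    # toy predictions

print(mae(actual, predicted), mape(actual, predicted), mse(actual, predicted))
```

Because MSE squares each error, a single large miss dominates it much more than it dominates MAE, which is why reporting both gives a fuller picture of prediction quality.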

Effectiveness.
To evaluate the effectiveness of EBiLSTM, it is compared with BiLSTMs under different loss functions, that is, BiLSTM-square, BiLSTM-abs, and BiLSTM-Huber. After training, the closing prices of 50 days are predicted. For each model, simulations are run 5 times independently, and the average values of the metrics are shown in Tables 3-5. The three stock market indexes shown in Table 2 are used in the simulations, respectively. From these tables, it is evident that EBiLSTM achieves the best performance among the four models. The three models with a single objective function always obtain worse results because their objective functions cannot remain optimal throughout training. They may also be more prone to local minima of the parameters.

Training Stability.
To further examine prediction performance over different numbers of days, EBiLSTM as well as BiLSTM-square, which has a single objective, is simulated with different prediction lengths. The results with the Shanghai Securities Composite Index are shown in Table 6. In this table, EBiLSTM always achieves the best performance across the different prediction lengths.

Generality.
The architecture of EBiLSTM is general: it can integrate different deep learning algorithms while keeping good performance. To demonstrate this generality, LSTM, GRU, and RNN are used to replace BiLSTM in the architecture, and the resulting models are named ELSTM, EGRU, and ERNN, respectively. Simulation results with the Shanghai Securities Composite Index are shown in Tables 7-9. In Table 7, ELSTM is compared with its corresponding algorithms with a single objective, that is, LSTM-square, LSTM-abs, and LSTM-Huber. The corresponding comparisons for EGRU and ERNN are shown in Tables 8 and 9, respectively.

Conclusion
Stock market exchanges have become popular, encouraging researchers to make predictions using new technologies and methods. Proper predictive techniques can help investors obtain higher profits from the stock market. However, it is difficult to improve prediction by using neural networks alone, because gradient descent in the training process easily falls into local optima. To overcome this, we propose an evolutionary framework (EBiLSTM) for stock prediction. In the framework, we propose an evolutionary algorithm to evolve a population of BiLSTMs. In contrast to conventional BiLSTM, the evolution of EBiLSTM selects three different BiLSTM objective functions as mutated individuals, then calculates their respective losses for evaluation, and finally selects the optimal objective function through the minimum loss. Experiments show that EBiLSTM improves the training stability of BiLSTM and achieves higher prediction accuracy than the compared models on various stock market indexes. There are still some promising directions for future investigation. For example, a BiLSTM with an attention mechanism might have more potential to achieve better performance. In forecast-based trading, it would also be interesting to design a portfolio allocation to improve the performance of BiLSTM.
Data Availability
The data are available at https://tushare.pro/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.