Financial Time Series Prediction Using Elman Recurrent Random Neural Networks

In recent years, forecasting financial market dynamics has been a focus of economic research. To predict the price indices of stock markets, we developed an architecture that combines Elman recurrent neural networks with a stochastic time effective function. We analyze the proposed model with linear regression, complexity invariant distance (CID), and multiscale CID (MCID) analysis methods, and compare it with the backpropagation neural network (BPNN), the stochastic time effective neural network (STNN), and the Elman recurrent neural network (ERNN). The empirical results show that the proposed neural network displays the best performance among these neural networks in financial time series forecasting. Further, the established model is tested empirically on the SSE, TWSE, KOSPI, and Nikkei225 indices, and the corresponding statistical comparisons of these market indices are also exhibited. The experimental results show that this approach performs well in predicting the values of the stock market indices.


Introduction
Predicting a stock price index is difficult because of the uncertainties involved. In the past decades, stock market prediction has played a vital role for investment brokers and individual investors, and researchers are constantly looking for a reliable method for predicting stock market trends. In recent years, artificial neural networks (ANNs) have been applied to many areas of statistics. One of these areas is time series forecasting. References [1][2][3] present different ANN methods for time series forecasting. ANNs have also been employed independently or as an auxiliary tool to predict time series. ANNs are nonlinear methods which mimic the nervous system. They are self-organizing, data-driven, self-learning, and self-adaptive, and they possess associative memory. ANNs can learn from patterns and capture hidden functional relationships in given data even if those relationships are unknown or difficult to identify. A number of researchers have utilized ANNs to predict financial time series, including backpropagation neural networks, radial basis function neural networks, generalized regression neural networks, wavelet neural networks, and dynamic artificial neural networks [4][5][6][7][8][9]. Statistical theories and methods play an important role in financial time series analysis because both financial theory and its empirical time series contain an element of uncertainty. Some statistical properties of stock market fluctuations have been uncovered in the literature, such as the power law of logarithmic returns and volumes, the heavy-tailed distribution of price changes, volatility clustering, and long-range memory of volatility [1][10][11][12].
The backpropagation neural network (BPNN) is a neural network trained with the backpropagation algorithm; it is widely used for financial forecasting and has powerful problem-solving ability. The multilayer perceptron (MLP) is one of the most prevalent neural networks; its capability of complex mapping between inputs and outputs makes it possible to approximate nonlinear functions. Reference [13] employs MLP in trading and hybrid time-varying leverage effects and [14] in forecasting of time series. The two architectures have at least three layers. The first layer is called the input layer (the number of its nodes corresponds to the number of explanatory variables). The last layer is called the output layer (the number of its nodes corresponds to the number of response variables). An intermediary layer of nodes, the hidden layer, separates the input layer from the output layer. Its number of nodes defines the amount of complexity which the model is capable of fitting. In previous studies, feedforward networks have frequently been used for financial time series prediction. Unlike feedforward networks, a recurrent neural network uses feedback connections to model spatial as well as temporal dependencies between input and output series, so that the initial states and the past states of the neurons can be involved in a series of processing steps. References [15][16][17] show applications of recurrent neural networks in different areas. This ability makes them applicable to time series prediction with satisfactory prediction results [18]. As a special recurrent neural network, the Elman recurrent neural network (ERNN) is used in the present paper for prediction. ERNN is a time-varying predictive control system that was developed with the ability to keep memory of recent events in order to predict future output.
The nonlinear and nonstationary characteristics of the stock market make it difficult and challenging to forecast stock indices in a reliable manner. In particular, in current stock markets, the rapid changes of trading rules and management systems have made it difficult to reflect the markets' development using early data. However, if only recent data are selected, a lot of useful information (which the early data hold) will be lost. In this research, a stochastic time effective neural network (STNN) and the corresponding learning algorithm are presented. References [19][20][21][22] introduce the corresponding stochastic time effective models and use them to predict financial time series. In particular, [23] presents a random data-time effective radial basis function neural network, which is also applied to the prediction of financial price series. The present paper optimizes the ERNN model, which is different from the above models; in addition, at the first step of the procedure we employ input variables different from those of [23]. In the last section of this paper, two new error measures are introduced for the first time to show that the proposed model predicts better than other traditional models. For this improved network model, each historical data point is given a weight depending on the time at which it occurs. The degree of impact of historical data on the market is expressed by a stochastic process, where a drift function and Brownian motion are introduced in the time strength function so that the model has the effect of random movement while maintaining the original trend. In the present work, we combine MLP with ERNN and the stochastic time effective function to develop a stock price forecasting model, called ST-ERNN.
In order to show that the ST-ERNN can provide higher accuracy in financial time series forecasting, we compare its forecasting performance with that of the BPNN model, the STNN model, and the ERNN model.
The Elman recurrent neural network, a simple recurrent neural network, was introduced by Elman in 1990 [24]. As is well known, a recurrent network has some advantages, such as time series and nonlinear prediction capabilities, faster convergence, and more accurate mapping ability. References [25,26] combine the Elman neural network with different areas for their purposes. In this network, the outputs of the hidden layer are allowed to feed back onto themselves through a buffer layer, called the recurrent layer. This feedback allows ERNN to learn, recognize, and generate temporal patterns, as well as spatial patterns. Every hidden neuron is connected to only one recurrent layer neuron through a constant weight of value one. Hence the recurrent layer virtually constitutes a copy of the state of the hidden layer one instant before. The number of recurrent neurons is consequently the same as the number of hidden neurons. To sum up, the ERNN is composed of an input layer, a recurrent layer which provides state information, a hidden layer, and an output layer. Each layer contains one or more neurons which propagate information from one layer to another by computing a nonlinear function of their weighted sum of inputs.

Elman Recurrent Neural Network
In Figure 1, a multi-input ERNN model is exhibited, where the number of neurons in the input layer is $n$, the number in the hidden layer is $m$, and there is one output unit. Let $x_{it}$ ($i = 1, 2, \ldots, n$) denote the input vector of neurons at time $t$, let $y_{t+1}$ denote the output of the network at time $t + 1$, let $z_{jt}$ ($j = 1, 2, \ldots, m$) denote the outputs of the hidden layer neurons at time $t$, and let $r_{jt}$ ($j = 1, 2, \ldots, m$) denote the recurrent layer neurons. $w_{ij}$ is the weight that connects node $i$ in the input layer to node $j$ in the hidden layer. $c_j$ and $v_j$ are the weights that connect node $j$ in the hidden layer to the recurrent layer and to the output, respectively. The hidden layer stage is as follows: the inputs of all neurons in the hidden layer are given by
$$\mathrm{net}_{jt} = \sum_{i=1}^{n} w_{ij} x_{it} + c_j r_{jt}, \qquad r_{jt} = z_{j,t-1}, \qquad j = 1, 2, \ldots, m.$$
The outputs of the hidden neurons are given by
$$z_{jt} = f_H(\mathrm{net}_{jt}) = f_H\Big(\sum_{i=1}^{n} w_{ij} x_{it} + c_j r_{jt}\Big),$$
where the sigmoid function is selected as the activation function in the hidden layer: $f_H(x) = 1/(1 + e^{-x})$. The output of the network is given as follows:
$$y_{t+1} = f_T\Big(\sum_{j=1}^{m} v_j z_{jt}\Big),$$
where $f_T(x)$ is an identity map as the activation function.
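The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ernn_forward`, the nested-list layout of the weights, and the zero initial state of the recurrent layer are all assumptions made for the example.

```python
import math

def sigmoid(u):
    """Hidden-layer activation f_H(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def ernn_forward(X, W, c, v):
    """Run an Elman forward pass over a sequence (illustrative sketch).

    X: list of T input vectors, each of length n
    W: n x m nested list, W[i][j] links input i to hidden neuron j
    c: m recurrent-layer weights, v: m output weights
    Returns (outputs, hidden_states).
    """
    m = len(c)
    r = [0.0] * m                      # assumed initial recurrent state
    outputs, hidden = [], []
    for x in X:
        net = [sum(W[i][j] * x[i] for i in range(len(x))) + c[j] * r[j]
               for j in range(m)]
        z = [sigmoid(u) for u in net]
        outputs.append(sum(v[j] * z[j] for j in range(m)))  # identity output map
        hidden.append(z)
        r = z                          # recurrent layer copies the hidden state
    return outputs, hidden
```

With four price inputs per day, `X` would hold one four-element vector per trading day and each entry of `outputs` would be read as the prediction for the following day.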

Algorithm of ERNN with a Stochastic Time Effective Function (ST-ERNN).
The backpropagation algorithm is a supervised learning algorithm which minimizes the global error by using the gradient descent method [18,21]. For the ST-ERNN model, we assume that the error of the output is given by $\varepsilon_{t_n} = d_{t_n} - y_{t_n}$ and the error of the sample $n$ is defined as
$$E(t_n) = \frac{1}{2} \varphi(t_n) \big(d_{t_n} - y_{t_n}\big)^2,$$
where $t_n$ is the time of the sample $n$ ($n = 1, \ldots, N$), $d_{t_n}$ is the actual value, $y_{t_n}$ is the output at time $t_n$, and $\varphi(t_n)$ is the stochastic time effective function which endows each historical data point with a weight depending on the time at which it occurs. We define $\varphi(t_n)$ as follows:
$$\varphi(t_n) = \frac{1}{\beta} \exp\Big\{ \int_{t_0}^{t_n} \mu(t)\, dt + \int_{t_0}^{t_n} \sigma(t)\, dB(t) \Big\},$$
where $\beta$ ($> 0$) is the time strength coefficient, $t_0$ is the time of the newest data in the training set, and $t_n$ is an arbitrary time point in the training set. $\mu(t)$ is the drift function, $\sigma(t)$ is the volatility function, and $B(t)$ is the standard Brownian motion.
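To make the role of the time effective function concrete, the following sketch approximates $\varphi(t_n)$ on a grid of time points with a crude Euler scheme. The function name `time_effective_weights`, the discretization, and the fixed random seed are assumptions for illustration; the paper does not specify how the integrals are evaluated numerically.

```python
import math
import random

def time_effective_weights(times, mu, sigma, beta, rng=None):
    """Approximate phi(t_n) = (1/beta) * exp(int mu dt + int sigma dB),
    with both integrals running from t_0 (newest sample) to t_n.

    times: ascending time points; the last entry is taken as t_0.
    mu, sigma: callables giving drift and volatility at a time point.
    """
    rng = rng or random.Random(0)       # seeded only to make runs repeatable
    phi = [0.0] * len(times)
    exponent = 0.0                      # both integrals vanish at t_n = t_0
    phi[-1] = 1.0 / beta
    for k in range(len(times) - 2, -1, -1):
        dt = times[k] - times[k + 1]    # negative: stepping back in time
        dB = rng.gauss(0.0, math.sqrt(-dt))   # Brownian increment, variance |dt|
        exponent += mu(times[k + 1]) * dt - sigma(times[k + 1]) * dB
        phi[k] = math.exp(exponent) / beta
    return phi
```

With a positive drift and zero volatility the weights decay deterministically for older samples; a nonzero volatility perturbs that decay randomly around the same trend.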
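Intuitively, the drift function models deterministic trends, the volatility function models the unpredictable events occurring during the motion, and Brownian motion is usually thought of as the random motion of a particle in a liquid (where the future motion of the particle at any given time does not depend on the past). Brownian motion is a continuous-time stochastic process, and it is the limit, or continuous version, of random walks. Since the time derivative of Brownian motion is everywhere infinite, it is an idealised approximation to actual random physical processes, which always have a finite time scale. We begin with an explicit definition. A Brownian motion is a real-valued, continuous stochastic process $\{B(t), t \ge 0\}$ on a probability space $(\Omega, \mathcal{F}, P)$, with independent and stationary increments. In detail, for the standard Brownian motion we have the following: (i) $B(0) = 0$ and the paths $t \mapsto B(t)$ are continuous almost surely; (ii) increments over disjoint time intervals are independent; (iii) for $0 \le s < t$, the increment $B(t) - B(s)$ is normally distributed with mean $0$ and variance $t - s$.
With the stochastic time effective function, the impact of the historical data on the stock market is regarded as a time-variable function; the efficiency of the historical data depends on its time. Then the corresponding global error of all the data in the training set, at each training repetition and in the output layer, is defined as
$$E = \frac{1}{N} \sum_{n=1}^{N} E(t_n) = \frac{1}{2N} \sum_{n=1}^{N} \frac{1}{\beta} \exp\Big\{ \int_{t_0}^{t_n} \mu(t)\, dt + \int_{t_0}^{t_n} \sigma(t)\, dB(t) \Big\} \big(d_{t_n} - y_{t_n}\big)^2.$$
The main objective of the learning algorithm is to minimize the value of the cost function $E$ by repeated learning until it reaches the preset minimum value. On each repetition, the output is calculated and the global error is obtained. The gradient of the cost function with respect to a weight $W$ is given by $\Delta E = \partial E / \partial W$. For the weight nodes in the input layer, the gradient of the connective weight $w_{ij}$ is given by
$$\Delta w_{ij} = -\eta \frac{\partial E(t_n)}{\partial w_{ij}} = \eta\, \varepsilon_{t_n} v_j\, \varphi(t_n)\, f_H'(\mathrm{net}_{jt_n})\, x_{it_n};$$
for the weight nodes in the recurrent layer, the gradient of the connective weight $c_j$ is given by
$$\Delta c_j = -\eta \frac{\partial E(t_n)}{\partial c_j} = \eta\, \varepsilon_{t_n} v_j\, \varphi(t_n)\, f_H'(\mathrm{net}_{jt_n})\, r_{jt_n};$$
and for the weight nodes in the hidden layer, the gradient of the connective weight $v_j$ is given by
$$\Delta v_j = -\eta \frac{\partial E(t_n)}{\partial v_j} = \eta\, \varepsilon_{t_n}\, \varphi(t_n)\, f_H(\mathrm{net}_{jt_n}),$$
where $\eta$ is the learning rate and $f_H'(\mathrm{net}_{jt_n})$ is the derivative of the activation function.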
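The three gradient expressions can be checked with a small sketch of a single training step on one sample. It assumes, as a simplification, that the recurrent state is treated as a constant during differentiation, and it applies each increment additively as in ordinary gradient descent; the helper names `predict` and `st_ernn_step` are hypothetical.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def predict(x, r, W, c, v):
    """Return (y, z): network output and hidden outputs for one time step."""
    m = len(c)
    z = [sigmoid(sum(W[i][j] * x[i] for i in range(len(x))) + c[j] * r[j])
         for j in range(m)]
    return sum(v[j] * z[j] for j in range(m)), z

def st_ernn_step(x, r, d, phi, W, c, v, eta):
    """One time-weighted gradient step on E(t_n) = 0.5 * phi * (d - y)^2."""
    y, z = predict(x, r, W, c, v)
    eps = d - y
    m = len(c)
    # Sigmoid derivative written through its output: f'(net) = z * (1 - z)
    delta = [eta * eps * phi * v[j] * z[j] * (1.0 - z[j]) for j in range(m)]
    new_W = [[W[i][j] + delta[j] * x[i] for j in range(m)] for i in range(len(x))]
    new_c = [c[j] + delta[j] * r[j] for j in range(m)]
    new_v = [v[j] + eta * eps * phi * z[j] for j in range(m)]
    return new_W, new_c, new_v, eps
```

Because `phi` multiplies every increment, samples with a small time effective weight move the parameters less than recent, heavily weighted samples.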
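So the update rules for the weights $w_{ij}$, $c_j$, and $v_j$ are given by
$$w_{ij}^{k+1} = w_{ij}^{k} + \Delta w_{ij}^{k}, \qquad c_j^{k+1} = c_j^{k} + \Delta c_j^{k}, \qquad v_j^{k+1} = v_j^{k} + \Delta v_j^{k},$$
where $k$ indexes the training iteration. Note that the training aim of the stochastic time effective neural network is to modify the weights so as to minimize the error between the network's prediction and the actual target. In Figure 2, the training algorithm procedures of the stochastic time effective neural network are displayed, which are as follows.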
Step 1. Perform input data normalization. In the ST-ERNN model, we choose four kinds of stock prices as the input values in the input layer: daily opening price, daily highest price, daily lowest price, and daily closing price. The output layer is the closing price of the next trading day. Then determine the parameters of the network, such as the learning rate $\eta$, which is between 0 and 1, the maximum number of training iterations $K$, and the initial connective weights. In this paper, the topology of the network architecture is determined by the number of neural nodes in the hidden layer.
Step 2. At the beginning of data processing, the connective weights $w_{ij}$, $v_j$, and $c_j$ follow the uniform distribution on $(-1, 1)$.
Step 3. Introduce the stochastic time effective function $\varphi(t)$ in the error function $E$. Choose the drift function $\mu(t)$ and the volatility function $\sigma(t)$. Give the transfer function from the input layer to the hidden layer and the transfer function from the hidden layer to the output layer.
Step 4. Establish an acceptable-error model and set the preset minimum error $\xi$. Based on the network training objective $E = (1/N) \sum_{n=1}^{N} E(t_n)$, if $E$ is below the preset minimum error, go to Step 6; otherwise go to Step 5.
Step 5. Modify the connective weights: calculate the gradients $\Delta w_{ij}$, $\Delta v_j$, and $\Delta c_j$ of the connective weights $w_{ij}$, $v_j$, and $c_j$. Then modify the weights from each layer back to the previous layer, giving $w_{ij}^{k+1}$, $v_j^{k+1}$, or $c_j^{k+1}$, and return to Step 4.
Step 6. Output the predictive value $y_{t+1}$.
[Figure 2: Training algorithm procedures of the ST-ERNN model (flowchart of Steps 1 to 6).]

Forecasting and Statistical Analysis of Stock Price
To reduce the impact of noise in the financial market and ultimately obtain a better prediction, the collected data should be properly adjusted and normalized at the beginning of the modelling. Different normalization methods have been tested to improve network training [27,28], including normalizing the data into the range $[0, 1]$ by the following equation, which is also adopted in this work:
$$S(t)' = \frac{S(t) - \min S(t)}{\max S(t) - \min S(t)},$$
where the minimum and maximum values are obtained on the training set during the training process. In order to obtain the true value after forecasting, the output variable is reverted by inverting the normalization: $S(t) = S(t)' (\max S(t) - \min S(t)) + \min S(t)$.
For the stochastic time effective function, the drift function $\mu(t)$ and the volatility function $\sigma(t)$ are specified using a parameter equal to the number of samples in the datasets and the mean of the sample data.
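Then the corresponding cost function can be written as
$$E = \frac{1}{N} \sum_{n=1}^{N} E(t_n) = \frac{1}{2N} \sum_{n=1}^{N} \frac{1}{\beta} \exp\Big\{ \int_{t_0}^{t_n} \mu(t)\, dt + \int_{t_0}^{t_n} \sigma(t)\, dB(t) \Big\} \big(d_{t_n} - y_{t_n}\big)^2,$$
with $\beta$ the time strength coefficient, $d_{t_n}$ the actual value, and $y_{t_n}$ the network output at time $t_n$. Figure 3 shows the predicting results on the training and testing data for SSE, TWSE, KOSPI, and Nikkei225 with the ST-ERNN model. The curves of the actual data and the predictive data are visually very close, which indicates that after repeated experiments the network has been trained well on these financial time series and that the ST-ERNN model yields the desired forecasting results.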
The plots of the real and the predictive data for these four price series are shown in Figure 4. Through linear regression analysis, we compare the predictive values of the ST-ERNN model with the real price data. Linear regression can be used to fit a predictive model to an observed data set of paired values. The linear equations of SSE, TWSE, KOSPI, and Nikkei225 are exhibited in Figures 4(a)-4(d), respectively. We can observe that the slopes of all the linear equations are close to 1, which implies that the predictive values do not deviate much from the real values. A valuable numerical measure of association between two variables is the correlation coefficient $R$. Table 2 shows the values of the regression coefficients and of $R$ for the above indices. $R$ is given as follows:
$$R = \frac{\sum_{t=1}^{N} (d_t - \bar{d})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{N} (d_t - \bar{d})^2 \sum_{t=1}^{N} (y_t - \bar{y})^2}},$$
where $d_t$ is the actual value, $y_t$ is the predicting value, $\bar{d}$ is the mean of the actual values, $\bar{y}$ is the mean of the predicting values, and $N$ is the total number of the data.
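As a small worked example of this analysis, the sketch below fits a least-squares line to predicted values against actual values and computes the correlation coefficient. Regressing the prediction on the actual value, and the function name `linear_fit_and_r`, are assumptions for illustration; the paper's figures may use the opposite orientation.

```python
import math

def linear_fit_and_r(d, y):
    """Least-squares line y = slope * d + intercept and Pearson correlation R.

    d: actual values, y: predicted values (equal-length sequences).
    """
    n = len(d)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sdd = sum((a - d_bar) ** 2 for a in d)
    syy = sum((b - y_bar) ** 2 for b in y)
    sdy = sum((a - d_bar) * (b - y_bar) for a, b in zip(d, y))
    slope = sdy / sdd
    intercept = y_bar - slope * d_bar
    r = sdy / math.sqrt(sdd * syy)
    return slope, intercept, r
```

A slope near 1 together with $R$ near 1 corresponds to the situation described above, where the predicted series tracks the real one closely.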

Comparisons of Forecasting Results.
We compare the proposed approach with conventional forecasting approaches (the BPNN, STNN, and ERNN models) on the four indices mentioned above, where STNN is based on the BPNN combined with the stochastic time effective function [19]. For these four models, we use the same network inputs, consisting of four kinds of series: daily opening price, daily closing price, daily highest price, and daily lowest price. The network output is the closing price of the next trading day. In stock markets, practical experience shows that these four kinds of data from the last trading day are very important indicators when predicting the closing price of the next trading day. To choose better parameters, we have carried out many experiments on the four indices. To obtain the optimal network for each forecasting approach, the most appropriate number of neural nodes in the hidden layer differs between models, and the learning rates also vary; see Table 3. In Table 3, "Hidden" stands for the number of neural nodes in the hidden layer, and "L. r" stands for the learning rate. The number of hidden nodes is also chosen by referring to [21,29,30]. The experiments have been repeated to determine the number of hidden nodes and the training cycles in the training process. The principle for choosing the number of hidden nodes is as follows: if the number of neural nodes in the input layer is $n$, the number of neural nodes in the hidden layer is set to be nearly $2n + 1$, and the number of neural nodes in the output layer is 1.
Since the ERNN model and the ST-ERNN model have similar topology structures, in Table 3 [...]
To analyze the forecasting performance of the four considered models in depth, we use the following error evaluation criteria [31][32][33][34][35]: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE); the corresponding definitions are given as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |d_t - y_t|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (d_t - y_t)^2}, \qquad \mathrm{MAPE} = 100 \times \frac{1}{N} \sum_{t=1}^{N} \left| \frac{d_t - y_t}{d_t} \right|,$$
where $d_t$ and $y_t$ are the real value and the predicting value at time $t$, respectively, and $N$ is the total number of the data. Since MAE, RMSE, and MAPE measure the deviation between the prediction values and the actual values, the prediction performance is better when the values of these evaluation criteria are smaller. However, if the results are not consistent among these criteria, we choose the MAPE as the benchmark since MAPE is relatively more stable than the other criteria [16].
From Figure 6, we can see that forecasting is relatively inaccurate for all four models during periods of large fluctuation. When the stock market is relatively stable, the forecasting result is nearer to the actual value. The forecasting results of the ST-ERNN model, compared with those of the BPNN, the STNN, and the ERNN models, are also presented in Table 4, where MAPE(100) stands for the MAPE over the latest 100 days of the testing data. Table 4 shows that the evaluation criteria of the ST-ERNN model are almost always smaller than those of the other models. From Table 4 and Figure 6, we can conclude that the proposed ST-ERNN model is better than the other three models. Table 4 also shows that the time series forecasting of the STNN model is superior to that of the BPNN model, and that the dynamic neural network is more effective, robust, and precise than the original BPNN for these four indices.
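The three criteria are straightforward to compute. The following sketch uses plain Python lists and hypothetical function names; it assumes that no actual value is zero, since MAPE divides by the actual value.

```python
import math

def mae(d, y):
    """Mean absolute error between actual values d and predictions y."""
    return sum(abs(a - b) for a, b in zip(d, y)) / len(d)

def rmse(d, y):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, y)) / len(d))

def mape(d, y):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(d, y)) / len(d)

def mape_last(d, y, days=100):
    """MAPE restricted to the latest `days` observations, as in MAPE(100)."""
    return mape(d[-days:], y[-days:])
```

For instance, actual values `[100.0, 200.0]` and predictions `[110.0, 180.0]` give an MAE of 15 and a MAPE of about 10 percent.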
Besides, most values of MAPE(100) are smaller than those of MAPE for all stock indices. Therefore, the short-term prediction outperforms the long-term prediction. Overall, the training and testing results are consistent with the measured data, which demonstrates that the ST-ERNN predictor has higher forecast accuracy. In Figures 7(a), 7(b), 7(c), and 7(d) we consider the relative errors of the ST-ERNN forecasting results. Figure 7 shows that most of the relative errors of prediction for these four price series are between −0.1 and 0.1. Moreover, there are some points with large relative errors in the forecasting results of the four models, especially on the SSE index, which can be attributed to large market fluctuations. The definition of relative error is given as follows:
$$e(t) = \frac{d_t - y_t}{d_t},$$
where $d_t$ and $y_t$ denote the actual value and the predicting value, respectively, at time $t$, $t = 1, 2, \ldots, N$.

CID and MCID Analysis
The analysis and forecasting of time series have long been a focus of economic research, aimed at a clearer understanding of the mechanisms and characteristics of financial markets [36][37][38][39][40][41][42].
In this section, we employ an efficient complexity invariant distance (CID) for time series. Reference [43] shows that complexity invariant distance measure can produce improvements in classification and clustering in the vast majority of cases.
Complexity invariance uses information about complexity differences between two time series as a correction factor for existing distance measures. We begin by introducing the Euclidean distance and use this as a starting point to bring in the definition of CID. Suppose we have two time series, $P$ and $Q$, of length $N$. Consider
$$P = p_1, p_2, \ldots, p_N, \qquad Q = q_1, q_2, \ldots, q_N.$$
The ubiquitous Euclidean distance is
$$\mathrm{ED}(P, Q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}.$$
The Euclidean distance, $\mathrm{ED}(P, Q)$, between two time series $P$ and $Q$ can be made complexity invariant by introducing a correction factor:
$$\mathrm{CID}(P, Q) = \mathrm{ED}(P, Q) \times \mathrm{CF}(P, Q),$$
where CF is a complexity correction factor defined as
$$\mathrm{CF}(P, Q) = \frac{\max(\mathrm{CE}(P), \mathrm{CE}(Q))}{\min(\mathrm{CE}(P), \mathrm{CE}(Q))}$$
and $\mathrm{CE}(P)$ is a complexity estimate of a time series $P$, which can be computed as follows:
$$\mathrm{CE}(P) = \sqrt{\sum_{i=1}^{N-1} (p_i - p_{i+1})^2}.$$
It is worth noticing that CF accounts for differences in the complexities of the time series being compared. CF forces time series with very different complexities to be further apart. In the case that all time series have the same complexity, CID simply degenerates to the Euclidean distance. The prediction performance is better when the CID distance is smaller; that is to say, the curve of the predictive data is closer to the actual data. The actual values can be seen as the series $P$ and the predicting results as the series $Q$. Table 5 shows the CID distance between the real index values of SSE, TWSE, KOSPI, and Nikkei225 and the corresponding predictions from each network model. It is clear that the CID distance between the real index values and the prediction by the ST-ERNN model is the smallest; moreover, the distances for the STNN model and the ERNN model are smaller than those for the BPNN for all four considered indices.
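The CID definition translates directly into code. In the sketch below, the treatment of constant series is an assumption added for robustness and is not part of the definition: when both series have zero complexity the correction factor is taken as 1, and when only one does the function raises an error because the ratio is undefined.

```python
import math

def euclidean(p, q):
    """Euclidean distance ED(P, Q) between equal-length series."""
    if len(p) != len(q):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def complexity_estimate(p):
    """CE(P): root of the summed squared successive differences."""
    return math.sqrt(sum((p[i] - p[i + 1]) ** 2 for i in range(len(p) - 1)))

def cid(p, q):
    """Complexity invariant distance CID(P, Q) = ED(P, Q) * CF(P, Q)."""
    ce_p, ce_q = complexity_estimate(p), complexity_estimate(q)
    lo, hi = min(ce_p, ce_q), max(ce_p, ce_q)
    if hi == 0.0:
        cf = 1.0    # both series constant: assumed convention
    elif lo == 0.0:
        raise ValueError("correction factor undefined when one series is constant")
    else:
        cf = hi / lo
    return euclidean(p, q) * cf
```

For the series `[0, 1, 0, 1]` and `[0, 2, 0, 2]` the Euclidean distance is $\sqrt{2}$ and the correction factor is 2, so the CID is $2\sqrt{2}$.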
In general, the complexity of a real system is not constrained to a single scale. In this part we consider an extended CID analysis, that is, the multiscale CID (MCID). The MCID analysis takes multiple time scales into account when measuring the predicting results, and in this work it is applied to the actual and predicted stock price data. The MCID analysis comprises two steps. (i) Considering a one-dimensional discrete time series $\{x_1, x_2, \ldots, x_i, \ldots, x_N\}$, we construct consecutive coarse-grained time series $\{y^{(\tau)}\}$, corresponding to the scale factor $\tau$, according to the following formula:
$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \qquad 1 \le j \le \frac{N}{\tau}.$$
For scale one, the time series $\{y^{(1)}\}$ is simply the original time series. The length of each coarse-grained time series is equal to the length of the original time series divided by the scale factor $\tau$. (ii) Calculate the CID for each coarse-grained time series and then plot it as a function of the scale factor. Figure 8 shows the MCID values between the forecasting results and the real market prices for the BPNN, ERNN, STNN, and ST-ERNN models. In Figure 8, the MCID between the ST-ERNN prediction and the actual value is the smallest at every scale; that is, the ST-ERNN (with the stochastic time effective function) is effective for forecasting stock prices.
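The two MCID steps can be sketched as follows. The block repeats a compact CID helper so that it runs on its own; the function names, the decision to drop trailing points that do not fill a complete window, and the convention of a unit correction factor for two constant series are assumptions made for illustration.

```python
import math

def coarse_grain(x, tau):
    """Average consecutive non-overlapping windows of length tau."""
    n = len(x) // tau                   # incomplete trailing window is dropped
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]

def _ce(p):
    return math.sqrt(sum((p[i] - p[i + 1]) ** 2 for i in range(len(p) - 1)))

def cid(p, q):
    """Complexity invariant distance (compact version for this sketch)."""
    ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    lo, hi = sorted((_ce(p), _ce(q)))
    if hi == 0.0:
        return ed                       # both constant: correction factor taken as 1
    if lo == 0.0:
        raise ValueError("correction factor undefined when one series is constant")
    return ed * hi / lo

def mcid(p, q, max_scale):
    """CID between the coarse-grained series for scale factors 1..max_scale."""
    return [cid(coarse_grain(p, s), coarse_grain(q, s))
            for s in range(1, max_scale + 1)]
```

Plotting the returned list against the scale factor gives a curve of the kind described for Figure 8, one curve per forecasting model.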

Conclusion
The aim of this research is to develop a predictive model to forecast financial time series. In this study, we have developed a predictive model by using an Elman recurrent neural network with the stochastic time effective function to forecast the indices of SSE, TWSE, KOSPI, and Nikkei225. The linear regression analysis implies that the predictive values do not deviate much from the real values. We then compare the proposed model with the BPNN, STNN, and ERNN forecasting models. Empirical examinations of predicting precision for the price time series (by comparing the measures MAE, RMSE, MAPE, and MAPE(100)) show that the proposed neural network model improves the precision of forecasting and that its forecasts closely follow the real financial market movements. Furthermore, from the curve of the relative error, it can be concluded that large fluctuations lead to large relative errors. In addition, calculating the CID and MCID distances illustrates this conclusion more clearly.
The study and the proposed model contribute significantly to the time series literature on forecasting.