Advantages of Combining Factorization Machine with Elman Neural Network for Volatility Forecasting of Stock Market

School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Beijing Laboratory of National Economic Security Early-Warning Engineering, Beijing Jiaotong University, Beijing 100044, China
School of Humanities, Social Sciences & Law, Harbin Institute of Technology, Harbin, China
National Academy of Economic Security, Beijing Jiaotong University, Beijing 100044, China
Beijing Center for Industrial Security and Development Research, Beijing Jiaotong University, Beijing 100044, China


Introduction
In recent years, the fluctuation analysis of financial time series has attracted considerable attention, and stock market volatility prediction has become a significant topic in economic research. The study of stock market volatility forecasting can help policy makers take appropriate decisions on asset allocation and risk management. Therefore, predicting the volatility of financial time series with reasonable accuracy deserves much attention. However, the stock market exhibits nonlinear and chaotic properties in nature [1,2]. Statistical models therefore have difficulty dealing with nonlinear and nonstationary time series and in deriving satisfactory forecasting performance under the statistical assumption of normally distributed observations. The prediction task thus becomes more challenging.
Artificial neural networks have the advantage of learning from sample data and capturing the nonlinear relations among interconnected neurons through training [3]. They are capable of dealing with nonlinear high-dimensional data and of approximating any nonlinear function with arbitrary precision [4][5][6][7]. In particular, the simple recurrent network, i.e., the Elman neural network (Elman NN) [8], has shown strong ability because of its time-varying characteristic. The Elman NN is a kind of feedback network in which the added layer connected to the hidden layer can be regarded as a time delay operator capable of memorizing recent events. It is a time-varying predictive control system with faster convergence and more accurate mapping ability. The Elman NN has been utilized for financial prediction and applied to many other types of time series, and most studies on the Elman NN obtained higher accuracy. Zheng [9] used an Elman NN to forecast opening prices of the Shanghai Stock Exchange. Wu and Duan applied the Elman NN to predicting stock [10] and gold future markets [11], respectively. In the area of electricity prediction, Rani and Victoire [12] integrated a decomposition method and the group search optimization algorithm into the Elman NN, showing that the Elman NN outperformed other approaches.
There are also other artificial neural networks, such as the wavelet neural network and the radial basis function neural network [13][14][15][16]. Developed artificial intelligence techniques such as expert systems [17,18], support vector machines (SVMs) [19,20], and hybrid methods [21,22] have also been applied to forecasting stock prices. Recently, some novel models combining random jump or random time effective functions with different neural networks [23,24] have been proposed for forecasting financial markets.
Although models based on artificial intelligence have achieved remarkable results, limitations remain. Few of these models pay attention to the nonlinear interactions among the inputs; for example, the nonlinearities in neural network models are handled only by the activation functions. Such models, which do not consider interactions between features of different scales, have been widely used in applications such as image processing, machine translation, and speech recognition [25][26][27].
The factorization machine (FM) was first introduced by Rendle [28] for collaborative recommendation. FM is a supervised learning method that can model second-order feature interactions even when the data are very sparse and high dimensional. FMs show state-of-the-art performance owing to two main benefits. First, FMs are on a par with polynomial regression but achieve comparable empirical accuracy with smaller models and faster evaluation. Second, unlike linear regression, FMs can infer the weights of feature interactions that were not observed in the training dataset. The low-rank property of the second-order interaction weights has made FMs increasingly popular in recommender systems. Although FM is a general framework for matrix factorization, it shows more flexibility, as the matrix factorization method only models the relation between two entities [29]. FMs are general predictors like SVMs and have many industrial applications; they are applicable to any real-valued feature vector and are not restricted to recommender systems. FM thus gives a promising direction for prediction in regression, classification, and ranking [30][31][32][33].
As far as we know, real-world time series are rarely purely non-time-varying, and linear regression is not always capable of capturing the interactions between features that are common in many applications. Hence, combining FM with the Elman NN addresses both time variation and nonlinear feature interactions. Moreover, it is almost universally agreed in the forecasting literature that no single model is best in every situation, because real-world problems are often complex, and any single model may be unable to capture different patterns equally well [34]. Therefore, in the present paper we propose a forecasting model combining the FM technique with the Elman NN for stock market volatility prediction.
In this paper, we apply the FM-Elman neural network to forecast the behavior of the volatility degree of the Standard & Poor's 500 Composite Stock Price (S&P 500) index, the Dow Jones industrial average (DJIA) index, the Shanghai Stock Exchange Composite (SSEC) index, and the Shenzhen Securities Component index (SZI) from January 2nd, 2000, to December 31st, 2011. Different threshold values are introduced into our model, and the corresponding volatility prediction results are presented. To show the advantages of the proposed FM-Elman model, we compare its predictions with two other neural network models, the BP network and the Elman recurrent network, through three performance evaluation measures: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). The remainder of this paper is organized as follows. In Section 2, the Elman NN and FM are reviewed in preparation for our proposed model. Then, we present the FM-Elman neural network prediction model in Section 3, where we first give the model description, introduce the needed ingredients, and provide the algorithm of the FM-Elman model. Section 4 presents the main forecasting results of the FM-Elman model.
This section gives predicting comparisons among our proposed model, the BP neural network, and the Elman neural network. It not only presents the effects of different parameters, such as the volatility degree and the user-specified dimension, on the FM-Elman model's performance but also considers other evaluation measures. Finally, Section 5 highlights the main conclusions.

Elman Neural Network (Elman NN).
The Elman neural network was proposed by Elman [8] in 1990 and is well known for its recurrent topology. Unlike the BP network, an Elman NN has a set of recurrent nodes. These recurrent nodes act as a buffer: each receives the message of its peered hidden node and then transmits it back to the hidden layer. Every hidden node is connected to exactly one recurrent neuron, and the message remains unchanged during this transmission. Hence, the number of recurrent-layer nodes equals the number of hidden nodes, and the recurrent layer stores the state of the hidden layer. Figure 1 gives the structure of a multi-input Elman NN. The Elman NN is composed of the input layer, the hidden layer, the output layer, and the recurrent layer. There are n nodes in the input layer, and both the hidden layer and the recurrent layer have m nodes. The output layer has a single neuron. The nonlinear state of the Elman NN is computed as

u(t) = f(W_A x(t−1) + W_B z(t)),
z(t) = u(t−1),                                    (1)
y(t) = g(W_C u(t)),

where u(t) is the vector of output values in the hidden layer, z(t) is the state of the recurrent layer, y(t) is the final output of the network, and x(t−1) = (x_{1,t−1}, …, x_{n,t−1}) denotes the input of the network at time t − 1. The weight matrix W_A connects the input layer nodes to the nodes in the hidden layer, W_B connects the nodes in the recurrent layer to the hidden layer neurons, and W_C connects the nodes in the hidden layer to the output node. Functions f(·) and g(·) are the activation functions; in this paper, f(x) is the sigmoid function and g(x) is the identity function.
From equation (1), substituting recursively, we obtain

y(t) = g(W_C f(W_A x(t−1) + W_B z(t))),           (2)

where z(t) = u(t−1) depends on the weight matrices W_A and W_B applied at earlier times, so the output carries information from different time steps. This gives the Elman NN the ability to adapt to time-varying series.
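To make the recursion in equation (1) concrete, one forward step can be sketched in a few lines of NumPy; the function name, shapes, and random weights below are illustrative assumptions, with f the sigmoid and g the identity as stated above.

```python
import numpy as np

def elman_forward(x_prev, z, W_A, W_B, W_C):
    """One forward step of an Elman network.

    x_prev : input vector at time t-1, shape (n,)
    z      : recurrent (context) state, shape (m,)
    W_A    : input-to-hidden weights, shape (m, n)
    W_B    : recurrent-to-hidden weights, shape (m, m)
    W_C    : hidden-to-output weights, shape (1, m)
    Returns (y, u): scalar output and the new hidden state.
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    u = sigmoid(W_A @ x_prev + W_B @ z)   # hidden state, f = sigmoid
    y = float(W_C @ u)                    # output, g = identity
    return y, u

# Unrolling over a sequence: the context z feeds back the previous hidden state.
rng = np.random.default_rng(0)
n, m = 4, 10
W_A = rng.normal(size=(m, n))
W_B = rng.normal(size=(m, m))
W_C = rng.normal(size=(1, m))
z = np.zeros(m)
for x in rng.normal(size=(5, n)):         # five time steps
    y, z = elman_forward(x, z, W_A, W_B, W_C)
```

Because z is the previous hidden state, the output at time t implicitly depends on the whole input history, which is the time-varying property described above.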

Factorization Machine (FM).
FM has the same prediction ability as SVMs but is also capable of estimating reliable parameters under very sparse data. Its modelling of all variable interactions is comparable to a polynomial kernel in an SVM. The model equation of an FM with second-order features is

ŷ(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j,      (3)

where the parameters w_0 ∈ R, w ∈ R^n, and V ∈ R^{n×k} have to be determined, and ⟨·, ·⟩ is the inner product of two vectors of size k:

⟨v_i, v_j⟩ = Σ_{f=1}^{k} v_{i,f} v_{j,f},

which models the interaction between the ith variable and the jth variable, where v_i is the factor vector of the ith variable with k (∈ N_0^+) dimensions.
A direct evaluation of equation (3) costs O(kn^2), because all pairwise interactions have to be computed. However, since no model parameter depends on two variables directly, the pairwise interactions in equation (3) can be reformulated as

Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j = (1/2) Σ_{f=1}^{k} [ (Σ_{i=1}^{n} v_{i,f} x_i)^2 − Σ_{i=1}^{n} v_{i,f}^2 x_i^2 ],      (4)

which only needs linear runtime O(kn) after the reformulation. So, FMs are attractive from a computational point of view.
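The equivalence between the naive O(kn^2) pairwise sum and the linear-time reformulation above can be checked numerically; the helper names below are ours, not from the paper.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """O(k n^2): sum over all pairs <v_i, v_j> x_i x_j."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """O(k n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    s = V.T @ x                                   # shape (k,): sum_i v_if x_i
    return 0.5 * float(s @ s - ((V.T ** 2) @ (x ** 2)).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=6)                            # n = 6 features
V = rng.normal(size=(6, 3))                       # k = 3 factors per feature
naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same value, while the second avoids the double loop over feature pairs.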

Our Proposed Method
3.1. Modelling. We construct an Elman recurrent neural network combined with a factorization machine, i.e., the FM-Elman neural network, to predict the volatility of different stock indexes. The detailed topology of the FM-Elman neural network is presented in Figure 2.
The layers of the FM-Elman neural network are analyzed as follows:

(1) Hidden Layer. The nodes in the hidden layer are partitioned into two parts. One part consists of normal nodes, which capture the linear relations among the input data; the remaining nodes in the other part incorporate all pairwise interactions between the features of the input data. The outputs are computed by

y_j(t) = f( Σ_{i=1}^{n} v_{ij} x_i + Σ_{z=1}^{m} u_{zj} c_z(t) ),
ȳ_j(t) = f( Σ_{i=1}^{n} v̄_{ij} x_i + Σ_{z=1}^{m} ū_{zj} c̄_z(t) + Σ_{i=1}^{n} Σ_{l=i+1}^{n} ⟨v̄_i, v̄_l⟩ x_i x_l ),

[Figure 1: Topology of the Elman neural network.]

where x_i is the input value from input node i, y_j denotes the value of the jth normal node and ȳ_j the value of the jth interaction node in the hidden layer, v_ij is the undetermined weight relating the ith input node to the jth normal node in the hidden layer, v̄_ij is the likewise undetermined weight connecting the ith input node to the jth interaction node, u_zj and ū_zj have the same meaning as v_ij and v̄_ij except that they link the nodes in the recurrent layer to the hidden layer, t is the iteration number, k is the user-specified dimension of the factor vectors v̄_i in the inner product ⟨·, ·⟩, and f is the activation function.

(2) Recurrent Layer. The number of nodes in the recurrent layer equals the number of hidden nodes. Each hidden node is connected to exactly one node in the recurrent layer with a constant weight of one, so the recurrent layer is also partitioned into two parts:

c_j(t) = y_j(t − 1),  c̄_j(t) = ȳ_j(t − 1).

(3) Output Layer. The outputs are

O_j(t) = g( Σ_i w_ij h_i(t) ),

where h_i(t) runs over the outputs of all hidden nodes, w_ij is the undetermined weight, O_j is the output value of the jth node in the output layer, and g is the activation function. Different loss functions can be used to measure the error between the actual and the estimated values of the output layer. We consider the squared loss

E(t) = (1/2) (T(t) − O(t))^2,

where T(t) is the actual value in the tth iteration; this gives the final output error of the FM-Elman model.

[Figure 2: Topology of the FM-Elman neural network, showing the output layer, the hidden layer with FM, and the recurrent layer.]

To optimize the FM-Elman model, we use the stochastic gradient descent (SGD) method to update the weights until convergence.

Algorithm of FM-Elman Model.
The training process of the FM-Elman model is detailed as follows: (1) The gradients of the weights and the update rule in the output layer are computed as

∂E(t)/∂w_j = −(T(t) − O(t)) g′(·) h_j(t),  w_j ← w_j − η ∂E(t)/∂w_j,

where j = 1, 2, …, m and η is the learning rate. (2) The gradients of the weights and the update rules in the hidden layer connected from the recurrent layer are calculated for the following two cases:

∂E(t)/∂u_zj = −(T(t) − O(t)) g′(·) w_j f′(·) c_z(t),  u_zj ← u_zj − η ∂E(t)/∂u_zj,
∂E(t)/∂ū_zj = −(T(t) − O(t)) g′(·) w_j f′(·) c̄_z(t),  ū_zj ← ū_zj − η ∂E(t)/∂ū_zj,

where z = 1, 2, …, m, η is the learning rate, k is the user-specified dimension, and g′ and f′ are the corresponding derivatives of g and f.
(3) The gradients of the weights and the update rules in the hidden layer connected from the input layer are computed for the following two cases:

∂E(t)/∂v_ij = −(T(t) − O(t)) g′(·) w_j f′(·) x_i,  v_ij ← v_ij − η ∂E(t)/∂v_ij,
∂E(t)/∂v̄_{i,f} = −(T(t) − O(t)) g′(·) w_j f′(·) x_i ( Σ_{l=1}^{n} v̄_{l,f} x_l − v̄_{i,f} x_i ),  v̄_{i,f} ← v̄_{i,f} − η ∂E(t)/∂v̄_{i,f},

where i = 1, 2, …, n, f = 1, 2, …, k, η is the learning rate, k is the user-specified dimension, and g′ and f′ are the corresponding derivatives of g and f.
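As an illustration of step (1), a single SGD update of the hidden-to-output weights under the squared loss can be sketched as follows; `sgd_output_step` and all variable names are hypothetical, and the hidden-layer updates follow by the same chain rule. With the identity output activation, g′ = 1, the gradient reduces to −(T − O) y_j.

```python
import numpy as np

def sgd_output_step(T_t, O_t, y_hidden, W_C, eta=0.02):
    """One SGD update of the hidden-to-output weights for the squared loss
    E = 0.5 * (T - O)^2 with identity output activation (g' = 1):
        dE/dw_j = -(T - O) * y_j,  so  w_j <- w_j + eta * (T - O) * y_j.
    """
    delta = T_t - O_t
    return W_C + eta * delta * y_hidden

# Illustrative values: one update step should move the output toward the target.
y_h = np.array([0.2, -0.1, 0.4])      # hidden-layer outputs
W_C = np.zeros(3)                     # initial output weights
T_t = 1.0                             # target value
O_t = float(W_C @ y_h)                # current output (0.0)
W_new = sgd_output_step(T_t, O_t, y_h, W_C, eta=0.5)
O_new = float(W_new @ y_h)
```

A single step reduces the output error, which is the behavior the iteration in the algorithm relies on.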

Data Selection and Processing.
The changing behavior of stock prices and volatility prediction have long been a focus of economic research. We use the logarithmic return to describe the statistical characteristics of stock return volatility. The stock logarithmic return is defined as

r(t) = ln P(t) − ln P(t − 1),

where P(t) denotes the daily closing price at time t. In this paper, a threshold value θ (≥ 0) is introduced as the volatility degree. Let R(θ) denote the set of stock returns whose absolute values are at least θ:

R(θ) = { r(t) : |r(t)| ≥ θ, t = 1, 2, …, T }.

Once θ is set, we obtain the dataset of the corresponding daily closing prices. Figure 3 gives an example of stock returns with different thresholds. For a fixed threshold value θ, the corresponding trading dates are those times t satisfying |r(t)| ≥ θ, t = 1, 2, …, T. The newly formed series are arranged in chronological order. Table 2 gives the numbers of training and testing data for the stock indexes under different threshold values θ.
When the threshold value θ equals 0, the average daily absolute returns of S&P 500, DJIA, SSEC, and SZI are 0.0094, 0.0089, 0.0117, and 0.0130, respectively. We set the volatility degrees 0.003, 0.006, 0.009, and 0.012 to examine how the data are distributed between the training and testing sets. As Table 2 shows, the amount of data in both the training and the testing sets that exceeds the given threshold gradually decreases as θ increases, and for θ larger than 0.012 the corresponding numbers would be fewer still.
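The construction of the threshold-filtered return series described above can be sketched as follows; the function names are ours and the prices are illustrative.

```python
import numpy as np

def log_returns(prices):
    """r(t) = ln P(t) - ln P(t-1) for a series of daily closing prices."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

def filter_by_volatility(returns, theta):
    """Keep, in chronological order, the returns with |r(t)| >= theta."""
    r = np.asarray(returns)
    return r[np.abs(r) >= theta]

# Illustrative prices: four closing prices give three daily log returns.
r = log_returns([100.0, 101.0, 100.5, 103.0])
kept = filter_by_volatility(r, theta=0.01)   # only the large move survives
```

Raising theta shrinks the retained series, mirroring the shrinking counts in Table 2.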
Four input variables are selected according to the newly formed dates: the daily opening price, the daily highest price, the daily lowest price, and the daily closing price. The next daily closing price in the chronologically ordered dataset is chosen as the output variable. To reduce the impact of noise on the stock markets, all input data X are normalized by

X′ = (X − min X) / (max X − min X).

The actual prediction value is then easily recovered through X = X′(max X − min X) + min X.
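The min-max normalization and its inversion can be sketched as follows; `minmax_scale` and `minmax_invert` are illustrative names, not from the paper.

```python
import numpy as np

def minmax_scale(X):
    """X' = (X - min X) / (max X - min X); also return (min, max) for inversion."""
    lo, hi = float(X.min()), float(X.max())
    return (X - lo) / (hi - lo), (lo, hi)

def minmax_invert(Xp, lo, hi):
    """Recover X = X' * (max X - min X) + min X."""
    return Xp * (hi - lo) + lo

X = np.array([2.0, 4.0, 6.0])
Xp, (lo, hi) = minmax_scale(X)
X_back = minmax_invert(Xp, lo, hi)
```

The round trip recovers the original prices exactly, which is what makes the predicted values comparable with the actual ones after training.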

Performance of the FM-Elman Model.
In the proposed FM-Elman neural network, we choose a 4 × 10 × 1 structure: 4 input nodes, 10 hidden nodes, and 1 output node. We set the maximum number of iterations to 5000, the learning rate η = 0.02, and the predefined minimum training threshold ε = 5 × 10^−5.
To analyze and evaluate the predicting performance of the FM-Elman neural network model, we use the following accuracy measures:

MAE = (1/N) Σ_{t=1}^{N} |T_t − O_t|,
RMSE = √( (1/N) Σ_{t=1}^{N} (T_t − O_t)^2 ),
MAPE = (100/N) Σ_{t=1}^{N} |(T_t − O_t)/T_t|,

where T_t and O_t are the actual and the predicted values and N is the number of testing samples. Table 3 gives the performance comparisons among different prediction models for θ = 0, where MAPE(100) denotes the MAPE over the latest 100 days of the testing data. The three prediction models are the BPNN, the Elman neural network, and the FM-Elman neural network with user-specified dimension k = 3. Table 3 shows that the FM-Elman model's evaluation errors are all smaller than those of the other two models. In addition, each MAPE(100) value is smaller than the corresponding stock index's MAPE value, which indicates that the short-term prediction outperforms the long-term prediction.
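The three accuracy measures can be computed directly; the function names below are ours.

```python
import numpy as np

def mae(actual, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - pred)))

def rmse(actual, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs((actual - pred) / actual)))

# Illustrative series: one of three predictions is off by 2.
actual = np.array([1.0, 2.0, 4.0])
pred = np.array([1.0, 2.0, 2.0])
```

Because MAPE is scale-free, it is the natural choice for comparing indexes whose price levels differ, which is why it is used again in Tables 4 and 5.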

The Impact of θ.
When θ varies from 0.003 to 0.012, the FM-Elman neural network produces different prediction results for the S&P 500, DJIA, SSEC, and SZI indexes. Figures 6 and 7 show the prediction analysis of S&P 500 and SSEC by the FM-Elman neural model and illustrate the effectiveness of forecasting under different volatility degrees θ. The empirical results reveal better volatility prediction when θ is small: for θ = 0.003 and θ = 0.006 in Figures 6(a) and 6(b), the predicted values are closer to the actual values than those in Figures 6(c) and 6(d). Figure 7 indicates similar results.
We choose the widely recommended MAPE criterion to measure the prediction performance for the stock indexes S&P 500, DJIA, SSEC, and SZI under the FM-Elman neural model, as presented in Table 4. As the value of θ increases, the MAPE increases gradually. The numerical results show that volatility degree forecasting with the FM-Elman neural network model is feasible.

The Impact of k.
In this subsection, we analyze through numerical experiments how the user-specified dimension k affects the prediction performance of the FM-Elman neural model when θ = 0. From the description in Section 3, as k increases, the number of interaction (red) nodes in both the hidden layer and the recurrent layer grows, meaning that more hidden and recurrent nodes in the proposed FM-Elman neural network carry interaction information from the connected inputs; the computation therefore becomes more complicated as k increases. Interestingly, Table 5 shows that the evaluation values of MAE, MAPE, and RMSE for the indexes S&P 500, DJIA, SSEC, and SZI first increase and then decrease as k grows, so either a low or a high user-specified dimension is a better choice. In either case, the results outperform the predictions of the BPNN and Elman NN reported in Table 3.

Further Predicting Performance Evaluation.
We adopt three trend-type statistical measures, i.e., directional symmetry (DS), correct up-trend (CP), and correct down-trend (CD) [35], to check the practical stock movement direction. The larger the values of these three measures, the more precise the forecasting of the change direction. The first measure is defined as

DS = (100/N_1) Σ_{t=1}^{N_1} a_t,  a_t = 1 if (T_t − T_{t−1})(O_t − O_{t−1}) ≥ 0 and a_t = 0 otherwise,

where N_1 is the number of testing samples.
Similarly,

CP = (100/N_2) Σ_t b_t,  b_t = 1 if O_t − O_{t−1} > 0 and b_t = 0 otherwise,

where N_2 is the number of testing samples which satisfy T_t − T_{t−1} > 0, and

CD = (100/N_3) Σ_t c_t,  c_t = 1 if O_t − O_{t−1} < 0 and c_t = 0 otherwise,

where N_3 is the number of testing samples which satisfy T_t − T_{t−1} < 0. Here, T_t and O_t are the actual value and the predicted value in the tth iteration, respectively.

Table 2: Numbers of data for the stock indexes in the training and testing sets under different threshold values θ.

        S&P 500             DJIA                SSEC                SZI
θ       Training  Testing   Training  Testing   Training  Testing   Training  Testing
0       2010      1009      2018      1009      1925      976       1926      976
0.003   1415      777       1426      756       1485      785       1520      824
0.006   987       585       945       544       1105      643       1190      688
0.009   663       455       622       431       817       520       923       562
0.012   447       350       418       327       615       411       688      462

In Table 6, the trend-type measures DS, CP, and CD for the stock indexes S&P 500, DJIA, SSEC, and SZI under varying volatility degrees are presented through numerical experiments. As θ changes, the measures for all stock indexes change only slightly. The direction forecasting results of SSEC and SZI show better performance than those of S&P 500 and DJIA, since all the DS, CP, and CD values of the former two indexes exceed 50. Moreover, the performance results of S&P 500 and DJIA are more sensitive to the variation of θ.
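The trend measures DS, CP, and CD can be sketched as follows; the implementation adopts one common convention (a tie in direction counts as correct for DS), and all names are ours.

```python
import numpy as np

def trend_measures(T, O):
    """Directional symmetry (DS), correct up-trend (CP), and correct
    down-trend (CD), each in percent. T: actual series, O: predicted series."""
    dT = np.diff(T)                       # actual changes T_t - T_{t-1}
    dO = np.diff(O)                       # predicted changes O_t - O_{t-1}
    hit = dT * dO >= 0                    # predicted change has the same sign
    ds = 100.0 * hit.mean()
    up, down = dT > 0, dT < 0
    cp = 100.0 * hit[up].mean() if up.any() else 0.0
    cd = 100.0 * hit[down].mean() if down.any() else 0.0
    return float(ds), float(cp), float(cd)

# Illustrative series in which every predicted move has the correct direction.
T = np.array([1.0, 2.0, 1.0, 3.0])
O = np.array([1.0, 1.5, 0.5, 2.0])
ds, cp, cd = trend_measures(T, O)
```

Values above 50 mean the model predicts the movement direction better than chance, which is the benchmark used for SSEC and SZI in Table 6.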

Conclusion
In this study, we developed an improved Elman recurrent neural network by introducing the factorization machine. Through extensive numerical experiments on data from the stock indexes S&P 500, DJIA, SSEC, and SZI, we demonstrated the effectiveness of the FM-Elman neural network. The prediction accuracy for all financial time series shows that our proposed FM-Elman model outperforms the BP neural network and the original Elman NN. We selected training and testing datasets under different volatility degrees, i.e., varying threshold values θ, for prediction; the prediction performance of the FM-Elman model degrades as θ becomes larger. We also investigated the effect of the user-specified dimension on the prediction performance of the FM-Elman neural model. The contribution of this work includes two points: (1) the FM technique is combined with the Elman NN to form the FM-Elman neural model for nonstationary analysis, which enjoys benefits from both FM and Elman NN, and (2) we demonstrate the prediction accuracy under various metrics, with the numerical experiments showing significant improvements over the existing methods. However, a limitation of this research is that the proposed model is data dependent and does not guarantee excellent predictions on all datasets. Further study on high-order interactions among the inputs also remains a challenging task. The power of combining FM with neural networks to achieve better performance will likely extend to classification and regression, which will be useful for future studies. We believe that FM can be used in conjunction with other deep learning networks such as LSTM to form high-quality predicting methods, and various combinations of techniques and approaches can be investigated in the future to solve problems occurring in different applications.

Data Availability
The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.