Nonlinear Autoregressive Neural Network and Extended Kalman Filters for Prediction of Financial Time Series

Time series analysis and prediction are major scientific challenges that find applications in fields as diverse as finance, biology, economics, and meteorology. Obtaining the method with the least prediction error is one of the difficult problems facing financial market and investment analysts. State space modelling is an efficient and flexible method for statistical inference on a broad class of time series and other data. The neural network is an important tool for analyzing time series, especially when they are nonlinear and nonstationary. In this work, essential tools from the Box-Jenkins methodology, neural networks, and the extended Kalman filter are brought together. We examine the use of the nonlinear autoregressive neural network method as a prediction technique for financial time series and the application of the extended Kalman filter algorithm to improve the accuracy of the model. As an application to a real example, we analyze the time series of the daily price of steel over a 790-day period to establish the superiority of this method over other existing methods. The simulation results using MATLAB and R software show that the model is capable of producing a reasonable accuracy.


Introduction
Prediction plays an important role in the management of several areas, among them the economic domain. Forecasting financial time series, which are very noisy and nonstationary, is a major problem for financial operators seeking to invest at the best profit. The goal is to model the relationships linking the data in order to predict short- or medium-term values. Several techniques available for time series analysis assume linear relationships between variables, among them the Box and Jenkins [1,2] methodology, one of the most popular methods of time series modelling, which proceeds through five stages: (1) choosing a class of models to represent the series, (2) identifying the type of model, (3) estimating the coefficients of the identified models, (4) validating the chosen model, and (5) forecasting the series over a given horizon.
In reality, financial series present significant irregularities, so we resort to more complex techniques such as the extended Kalman filter (EKF) [3,4], which consists of a set of mathematical equations for modelling nonlinear relations. The Kalman filter has been applied in econometrics for the case where a deterministic system is unknown and must be estimated from the data; see, for example, Engle and Watson (1987). The Kalman filter algorithm has proved to be an additional tool for improving model output [3].
Among the most promising methods are neural networks, which were introduced to solve complex classification problems. They are characterized by their ability to learn (supervised or not) from examples and then generalize to data that have not been presented to them. Neural networks can be seen as a black box that learns to map input patterns to appropriate output patterns. Learning is accomplished by changing the neuron connection weights to improve the desired matching between the input and output patterns.
After McCulloch and Pitts introduced simplified neurons in 1943 [5] as models of biological neurons, Rosenblatt's Perceptron followed in 1957, equipped with a learning procedure called the "Perceptron Rule." The disadvantage of the perceptron is that it can only realize linearly separable functions. In 1969, Minsky and Papert analyzed this limitation in their book "Perceptrons" [6], which led to a pause in research during the following decade.
The 1980s saw an explosion of artificial intelligence techniques, with the two articles of the physicist Hopfield [7]. In recent years, neural networks have found wide use in time series modelling (Chakraborty et al., 1992; Weigend and Gershenfeld, 1993; Gencay, 1993; Hoptroff, 1993). Most neural networks used in economic forecasting are organized in layers, hence the name MLP (multilayer perceptron) [8,9]. In the field of finance, neural networks give good results in prediction.
This article begins with a section that describes time series analysis and introduces neural networks.
The second section describes the extended Kalman filter and the proposed combination of the extended Kalman filter with neural networks. The third section applies the studied models and the proposed model to the daily price of steel and compares them. The results are discussed in the fourth section.

Time Series.
A time series is a quantity that changes over time (price, cost, turnover, stocks, etc.); it is the set of observations of that quantity, ordered according to their time indices. In the following, the series is denoted $(Y_t)_{t \in \mathbb{Z}}$. The study of a time series makes it possible to analyze, describe, and explain a phenomenon over time and to draw consequences for decision-making. One of the main objectives of time series study is prediction, which consists of predicting the future values of the series from its observed values.
The notion of stationarity is indispensable for the analysis of time series. A stationary series $Y_t$ is a series whose properties are unchanged by a shift in time. We are led to the following definition.
Definition 1. A stochastic process $(Y_t, t \in \mathbb{Z})$ is (strictly) stationary if, for any finite sequence of instants $t_1, \dots, t_k$, $k \in \mathbb{N}^*$, and for any integer $t$, the joint law of $(Y_{t_1+t}, \dots, Y_{t_k+t})$ does not depend on $t$. Weak stationarity relaxes this requirement to conditions on the first two moments, beginning with
(1) $\forall t \in \mathbb{Z}$, $E[Y_t] = \mu$ (independent of $t$).
In sum, if the statistical characteristics of the stochastic process studied vary during the measurement period, the process is said to be nonstationary. Stationarity can be summarized as temporal homogeneity.
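For reference, weak (second-order) stationarity is usually stated through three moment conditions. The block below is the standard textbook form, not a reconstruction of the original list; the autocovariance function $\gamma$ and the lag $h$ are notation assumed here.

```latex
\begin{aligned}
&(1)\quad \forall t \in \mathbb{Z},\; E[Y_t] = \mu \quad \text{(independent of } t\text{)},\\
&(2)\quad \forall t \in \mathbb{Z},\; E[Y_t^2] < \infty,\\
&(3)\quad \forall t, h \in \mathbb{Z},\; \operatorname{Cov}(Y_t, Y_{t+h}) = \gamma(h) \quad \text{(independent of } t\text{)}.
\end{aligned}
```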

Neural Networks.
Network learning takes place as the weights are adjusted across the layers, according to the relationship between the inputs and the desired outputs. One of the most basic models is the multilayer perceptron (MLP) network, which is widely used in the approximation of nonlinear functions that describe complex relationships between independent and dependent variables in many applications.
Multilayer perceptrons (MLPs) were first introduced to solve complex classification problems. Because of their universal approximation property [9], however, they were quickly adopted as nonlinear regression models and then for time series modelling and forecasting.
However, the estimation and identification of these models use sophisticated techniques and it is not easy to determine the correct architecture. Indeed, these models are by definition overparametrized, the error functions to be minimized have many local minima, and the implementation is often difficult.
The nonlinear autoregressive neural network (NAR), shown in Figure 1, can be trained to predict a time series from that series' past values $Y(t-1), Y(t-2), \dots, Y(t-d)$, called feedback delays, where $d$ is the time delay parameter.
The network is created and trained in an open loop, using the real target values as the response; this keeps training accurate because the network is fed true values rather than its own estimates. After training, the network is converted into a closed loop, and the predicted values are used to supply new response inputs to the network. A nonlinear autoregressive neural network applied to time series forecasting describes a discrete, nonlinear autoregressive model that can be written in this form:
$$Y(t) = h\big(Y(t-1), Y(t-2), \dots, Y(t-d)\big) + \varepsilon(t).$$
The function $h(\cdot)$ is unknown in advance, and the training of the neural network aims to approximate it by optimizing the network weights and neuron biases; $\varepsilon(t)$ denotes the approximation error at time $t$.
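The open-loop and closed-loop modes described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB implementation; the function names `make_lagged_pairs` and `closed_loop_forecast` are hypothetical, and `h` stands for any already-trained predictor.

```python
from typing import Callable, List, Sequence, Tuple


def make_lagged_pairs(series: Sequence[float], d: int) -> Tuple[List[List[float]], List[float]]:
    """Open-loop data: inputs [Y(t-1), ..., Y(t-d)] paired with the true target Y(t)."""
    if d < 1 or d >= len(series):
        raise ValueError("d must satisfy 1 <= d < len(series)")
    inputs, targets = [], []
    for t in range(d, len(series)):
        # Most recent value first, matching Y(t-1), Y(t-2), ..., Y(t-d).
        inputs.append([series[t - i] for i in range(1, d + 1)])
        targets.append(series[t])
    return inputs, targets


def closed_loop_forecast(
    h: Callable[[List[float]], float],
    history: Sequence[float],
    d: int,
    steps: int,
) -> List[float]:
    """Closed loop: each prediction is fed back as an input for the next step."""
    if d < 1 or d > len(history):
        raise ValueError("d must satisfy 1 <= d <= len(history)")
    window = list(history[-d:])[::-1]  # [Y(t-1), ..., Y(t-d)]
    forecasts = []
    for _ in range(steps):
        y_next = h(window)
        forecasts.append(y_next)
        window = [y_next] + window[:-1]  # shift the delay line by one step
    return forecasts
```

In open loop every input row contains observed values only, whereas in closed loop forecast errors can accumulate because later inputs are themselves predictions.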
So a NAR model is defined precisely by an equation of the type
$$Y(t) = \alpha_0 + \sum_{j=1}^{k} \alpha_j \, \Phi\Big(\sum_{i=1}^{a} \beta_{ij}\, Y(t-i) + \beta_{0j}\Big) + \varepsilon(t),$$
where $a$ is the number of entries, $k$ is the number of hidden units with activation function $\Phi$, $\beta_{ij}$ is the parameter corresponding to the weight of the connection between the input unit $i$ and the hidden unit $j$, $\alpha_j$ is the weight of the connection between the hidden unit $j$ and the output unit, $\beta_{0j}$ and $\alpha_0$ are the constants that correspond, respectively, to the hidden unit $j$ and the output unit, and $\varepsilon(t)$ is the error term. The optimization of the architecture aims to reduce as far as possible the number of synapses (weights) and neurons in order to reduce the complexity of the network, improve computing times, and maintain the generalization capabilities. Concerning the optimization of the network architecture, two main approaches have been proposed in the literature:
• Selection approach: start by constructing a complex network that contains a large number of neurons, then reduce the number of unnecessary neurons and remove redundant connections during or at the end of learning
• Incremental approach: start with the simplest possible network, then add neurons or layers until an optimal architecture is reached
An effective approach is to estimate the prediction error using a set of data that was not used to construct the predictor, i.e., not used for learning. This dataset is called a test set.
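As a rough illustration of how the parameters $\beta_{ij}$, $\beta_{0j}$, $\alpha_j$, and $\alpha_0$ named in this section combine, the sketch below evaluates a single-hidden-layer NAR output. It assumes a linear output unit and a sigmoid hidden activation; the function name `nar_output` and its argument layout are hypothetical.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nar_output(
    lags: Sequence[float],             # [Y(t-1), ..., Y(t-a)]
    beta: Sequence[Sequence[float]],   # beta[i][j]: input i -> hidden unit j
    beta0: Sequence[float],            # hidden-unit constants
    alpha: Sequence[float],            # alpha[j]: hidden unit j -> output
    alpha0: float,                     # output constant
    phi: Callable[[float], float] = sigmoid,
) -> float:
    """alpha0 + sum_j alpha[j] * phi(beta0[j] + sum_i beta[i][j] * lags[i])."""
    a, k = len(lags), len(alpha)
    if len(beta) != a or any(len(row) != k for row in beta) or len(beta0) != k:
        raise ValueError("weight shapes do not match the number of inputs/hidden units")
    total = alpha0
    for j in range(k):
        pre_activation = beta0[j] + sum(beta[i][j] * lags[i] for i in range(a))
        total += alpha[j] * phi(pre_activation)
    return total
```

With all hidden weights set to zero every hidden unit outputs 0.5, which gives a quick sanity check on the wiring.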
Divide the dataset into three kinds of target timesteps as follows:
• Training: these datasets are presented to the network during training and the network is adjusted according to its error
• Validation: these datasets are used to measure network generalization and to halt training when generalization stops improving
• Testing: these datasets have no effect on training and so provide an independent measure of network performance during and after training.
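A three-way division of this kind might be sketched as below. The sketch assumes contiguous blocks taken in time order (training first, testing last), which is one common choice for time series and not necessarily the division used by the authors; `chronological_split` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], n_train: int, n_val: int
) -> Tuple[List[float], List[float], List[float]]:
    """Split a series into contiguous training, validation, and testing blocks."""
    if n_train < 0 or n_val < 0 or n_train + n_val > len(series):
        raise ValueError("n_train + n_val must not exceed the series length")
    train = list(series[:n_train])
    val = list(series[n_train:n_train + n_val])
    test = list(series[n_train + n_val:])  # everything left over
    return train, val, test
```

Keeping the test block at the end of the series avoids evaluating the model on observations that precede its training data.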

Extended Kalman Filter
Kalman filtering (KF) is a technique that gives estimates of unknown variables using a series of measurements containing statistical noise. It is a recursive procedure that processes new data as they arrive, which makes it suited to online, real-time processing. The KF can only work with linear equations. When the system under consideration is nonlinear, the extended Kalman filter (EKF) is applied. A short description of the EKF is given below. Consider a nonlinear system described by the following two equations:
$$y_t = f(y_{t-1}) + u_t, \qquad z_t = h(y_t) + r_t,$$
where $y_t$ is a vector that describes the system state, $z_t$ is the observation vector (values obtained through a direct measurement of the system), $u_t$ is the process noise vector, $r_t$ is the measurement noise vector, $f(\cdot)$ is a nonlinear function that gives the state transition of the system, and $h(\cdot)$ is the (nonlinear) observation function. The Kalman filter gives a method for the recursive estimation of the unknown state $y$ based on all observation values $z$ up to time $t$ [10].
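The Kalman filter can be used to improve neural network forecasts. The evolution of the filter from time $t-1$ to $t$ is described by the following equations:
(1) Initialization: $y^a_0 = y_0$, where $P_0$ is the error covariance matrix
(2) The $t$-th predictor step:
$$y^p_t = f(y^a_{t-1}), \qquad P^p_t = J_f(y^a_{t-1})\, P_{t-1}\, J_f^{T}(y^a_{t-1}) + Q_{t-1}$$
(3) The $t$-th corrector step:
$$K_t = P^p_t\, J_h^{T}(y^p_t)\, \big(J_h(y^p_t)\, P^p_t\, J_h^{T}(y^p_t) + R_t\big)^{-1},$$
$$y^a_t = y^p_t + K_t \big(z_t - h(y^p_t)\big), \qquad P_t = \big(I - K_t\, J_h(y^p_t)\big)\, P^p_t$$
where the superscript $a$ denotes the actual value of the variable and $p$ the predicted value, $J_f$ is the Jacobian matrix of the function $f(\cdot)$, $J_h$ is the Jacobian matrix of the function $h(\cdot)$, $Q_t$ and $R_t$ are the covariance matrices of the process noise and the measurement noise, and $K_t$ is a matrix called the Kalman gain, which governs how readily the filter adjusts to new conditions or changes in the type of data.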
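For a scalar series, the predictor and corrector steps reduce to a few lines. The sketch below is a generic one-dimensional extended Kalman filter in its standard form, under the assumption of additive noise; the noise variances `q` and `r`, the derivative callables `df` and `dh`, and the function names are illustrative choices, not taken from the original.

```python
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def ekf_step(
    y_prev: float, p_prev: float, z: float,
    f: Fn, df: Fn, h: Fn, dh: Fn,
    q: float, r: float,
) -> Tuple[float, float, float]:
    """One predictor/corrector cycle of a scalar extended Kalman filter."""
    # Predictor step: propagate the state estimate and its error variance.
    y_pred = f(y_prev)
    jf = df(y_prev)  # Jacobian of f (a plain derivative in one dimension)
    p_pred = jf * p_prev * jf + q

    # Corrector step: blend the prediction with the new observation z.
    jh = dh(y_pred)  # Jacobian of h evaluated at the predicted state
    innovation_var = jh * p_pred * jh + r
    if innovation_var <= 0.0:
        raise ValueError("innovation variance must be positive")
    gain = p_pred * jh / innovation_var  # Kalman gain
    y_new = y_pred + gain * (z - h(y_pred))
    p_new = (1.0 - gain * jh) * p_pred
    return y_new, p_new, gain


def ekf_filter(
    observations: Sequence[float], y0: float, p0: float,
    f: Fn, df: Fn, h: Fn, dh: Fn, q: float, r: float,
) -> List[float]:
    """Run the filter over a whole series and return the corrected estimates."""
    y, p = y0, p0
    estimates = []
    for z in observations:
        y, p, _ = ekf_step(y, p, z, f, df, h, dh, q, r)
        estimates.append(y)
    return estimates
```

A larger `r` relative to `q` lowers the gain, so the filter trusts its own prediction more and smooths the observations more strongly.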

Proposed Prediction Model.
Financial time series are very noisy and nonstationary. These two constraints led us to propose a prediction model (NAR-EKF) that combines the extended Kalman filter (EKF) with the multilayer perceptron (MLP) nonlinear autoregressive neural network (NAR). The main idea of forecasting time series using the extended Kalman filter and neural networks (NAR-EKF) is to use the $Y_t$ series processed by the extended Kalman filter as the input for the nonlinear autoregressive neural network (described in Section 2.2), according to the following steps:
• Step 1: a set of historical data is collected

Description of Data.
The data used in this study represent the daily price of steel between 2013 and 2016. The data were collected from the https://www.investing.com/ website. Figure 2 shows the price of steel per day between January 2013 and March 2016. A series that generally increases or decreases with time is nonstationary; the figure shows that the curve decreases steadily over time, revealing the presence of a long-term trend. The nonstationarity of this series (Figure 2) is confirmed by the Dickey-Fuller test (p-value = 0.645 > 5%) and the Phillips-Perron test (p-value = 0.649 > 5%) [11,12].
To overcome this problem, the series of daily steel prices is transformed into a daily series of logarithmic differences of the price. The stationarity of the resulting series of daily returns (Figure 3) is confirmed by the Dickey-Fuller test (p-value = 0.01 < 5%).
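The logarithmic-difference transform can be written as $r_t = \ln P_t - \ln P_{t-1}$. A minimal sketch, with a hypothetical helper name and made-up prices in the usage note:

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Return r_t = ln(P_t) - ln(P_{t-1}) for t = 1, ..., len(prices) - 1."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(prices[t]) - math.log(prices[t - 1]) for t in range(1, len(prices))]
```

For example, `log_returns([100.0, 110.0, 121.0])` yields two values, each equal to `math.log(1.1)` up to rounding, and the transformed series is one observation shorter than the price series.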

Application of Box-Jenkins Analysis.
In this section, we model the daily steel price. The data were centered and scaled (standardized) and then used to develop a descriptive model. To find the model that best explains the data, we used the Box-Jenkins methodology to estimate the parameters that characterize the series. The most suitable models for the daily price of steel are an ARMA(1,0) and an ARMA-GARCH(1,1). The parameters of the ARMA(1,0) model can be estimated by several methods. In this study, they were estimated using the maximum likelihood method for the dataset [13,14].
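To make the estimation step concrete, the sketch below standardizes a series and fits an ARMA(1,0), i.e., AR(1), coefficient. It uses the conditional Gaussian likelihood (conditioning on the first observation), whose maximizer coincides with least squares; exact maximum likelihood, as reported by statistical packages, can differ slightly. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def standardize(x: Sequence[float]) -> List[float]:
    """Center and scale a series to zero mean and unit (population) variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        raise ValueError("series is constant; cannot scale")
    return [(v - mean) / std for v in x]


def fit_ar1(x: Sequence[float]) -> Tuple[float, float]:
    """Conditional ML estimate of phi and sigma^2 in x_t = phi * x_{t-1} + e_t.

    Assumes x is already centered (zero mean).
    """
    if len(x) < 2:
        raise ValueError("need at least two observations")
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    if den == 0.0:
        raise ValueError("lagged series is identically zero")
    phi = num / den
    residuals = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return phi, sigma2
```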

Application of Neural Networks and EKF.
We chose to apply a nonlinear autoregressive neural network model to our time series. The concept is to forecast using data preprocessed by the extended Kalman filter as the input for the neural network. Financial time series fluctuate strongly and vary over time, and the extended Kalman filter has good dynamic real-time tracking characteristics. The advantage the EKF provides is smoothing and denoising of the time series. The smoothed time series can then be predicted using the nonlinear autoregressive neural network.
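The preprocessing idea can be sketched end to end as follows. The state model is an assumption made only for illustration: the filter uses identity transition and observation functions (a local-level model, for which the EKF coincides with the linear Kalman filter), and `q`, `r`, and both function names are hypothetical. The source does not specify these choices.

```python
from typing import List, Sequence, Tuple


def local_level_filter(z: Sequence[float], q: float, r: float, p0: float = 1.0) -> List[float]:
    """Denoise a scalar series with identity f and h (assumed state model)."""
    if not z:
        raise ValueError("series must not be empty")
    y, p = z[0], p0  # start from the first observation
    filtered = []
    for obs in z:
        p_pred = p + q                 # predictor: state unchanged, variance grows
        gain = p_pred / (p_pred + r)   # corrector: Kalman gain
        y = y + gain * (obs - y)
        p = (1.0 - gain) * p_pred
        filtered.append(y)
    return filtered


def build_nar_dataset(
    z: Sequence[float], d: int, q: float, r: float
) -> Tuple[List[List[float]], List[float]]:
    """Filter the raw series, then form NAR inputs and targets from the filtered values."""
    s = local_level_filter(z, q, r)
    if d < 1 or d >= len(s):
        raise ValueError("d must satisfy 1 <= d < len(z)")
    inputs = [[s[t - i] for i in range(1, d + 1)] for t in range(d, len(s))]
    targets = [s[t] for t in range(d, len(s))]
    return inputs, targets
```

The resulting input rows and targets would then be passed to whatever training routine is used for the NAR network.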
The dataset is divided as follows:
• Training: 570 observations are used for training the network
• Validation: 114 observations are used to measure network generalization
• Testing: the last 76 observations are used for testing the network
Some preliminary tests made it possible to define the structure of the network. A hidden layer of 25 neurons was used, and the activation function for the hidden layer and the output neuron is the sigmoid function $f(x) = 1/[1 + \exp(-x)]$. We evaluated the effect of the time delay parameter $d$ on the performance of the training process, measured using the mean-squared error (MSE) and the coefficient of determination R, which is a goodness-of-fit measure for linear regression between the target and the predictions. We set $d$ from 1 to 6. The NAR-EKF model presents a very accurate fit and a small MSE regardless of the value of $d$ across all 6 trials.
Training of the network uses Levenberg-Marquardt backpropagation. Training automatically stops when generalization stops improving, as indicated by an increase in the mean-squared error of the validation samples. Other specifications of the NAR-EKF model are given in Table 1 below.
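The stopping rule can be illustrated independently of the optimizer. The sketch below scans a sequence of validation MSE values and reports where training would halt; the patience parameter `max_fail` and its default value are assumptions for illustration, not settings reported by the authors.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_mse: Sequence[float], max_fail: int = 6) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch) for a validation-error history.

    Training stops once the validation MSE has failed to improve on its
    best value for `max_fail` consecutive epochs.
    """
    if not val_mse:
        raise ValueError("validation history must not be empty")
    if max_fail < 1:
        raise ValueError("max_fail must be at least 1")
    best, best_epoch, fails = float("inf"), -1, 0
    for epoch, err in enumerate(val_mse):
        if err < best:
            best, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    return len(val_mse) - 1, best_epoch  # history ended before the rule fired
```

The weights kept after stopping are normally those from `best_epoch`, not from the epoch at which training halted.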
In order to compare the classical methods, the extended Kalman filter, the nonlinear autoregressive neural network, and the combination of the extended Kalman filter with the nonlinear autoregressive neural network, we adopted the same evaluation criteria, RMSE and MAE.
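For completeness, the two criteria can be computed as in the following sketch, using their usual definitions (root of the mean squared error and mean of the absolute errors); the function names are illustrative.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean-squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of forecast quality.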
The results on a real example (daily price of steel), shown in Table 2, show that combining the nonlinear autoregressive neural network model with the extended Kalman filter yields higher prediction accuracy than the other tested methods. This result is consistent with the literature, which states that nonlinear autoregressive neural network models are less sensitive to long-term time dependencies and present better learning capabilities [12].

Conclusions
This paper proposes a combination of a nonlinear autoregressive neural network model with the extended Kalman filter for predicting financial time series. The results were compared using the root mean-squared error (RMSE) and the mean absolute error (MAE) for the six proposed models.
The combination of the two methods (NAR and EKF) for the prediction of financial series sampled at a daily time step appears able to improve the forecasts of the series studied by offering the possibility of continuous correction.