A Novel Multilevel-SVD Method to Improve Multistep Ahead Forecasting in Traffic Accidents Domain

Here is proposed a novel method for decomposing a nonstationary time series in components of low and high frequency. The method is based on Multilevel Singular Value Decomposition (MSVD) of a Hankel matrix. The decomposition is used to improve the forecasting accuracy of Multiple Input Multiple Output (MIMO) linear and nonlinear models. Three time series coming from traffic accidents domain are used. They represent the number of persons with injuries in traffic accidents of Santiago, Chile. The data were continuously collected by the Chilean Police and were weekly sampled from 2000:1 to 2014:12. The performance of MSVD is compared with the decomposition in components of low and high frequency of a commonly accepted method based on Stationary Wavelet Transform (SWT). SWT in conjunction with the Autoregressive model (SWT + MIMO-AR) and SWT in conjunction with an Autoregressive Neural Network (SWT + MIMO-ANN) were evaluated. The empirical results have shown that the best accuracy was achieved by the forecasting model based on the proposed decomposition method MSVD, in comparison with the forecasting models based on SWT.


Introduction
Time series forecasting has reached high significance in planning and management for government institutions, industries, and business. Unfortunately the forecasting implementations are limited due to the data complexity. Environmental conditions, economy variables, risk situations, among others, are originated in highly dynamic systems; consequently their analysis becomes complex and inaccurate results have been often obtained.
From the literature review, several linear and nonlinear models have been proposed. One popular linear model is Autoregressive Integrated Moving Average (ARIMA), which was introduced by Box et al. [1] and widely applied to nonstationary time series such as, electricity consumption [2], rainfall [3], solar radiation [4], and tourists arrivals [5]. Empirical results have shown varied accuracy levels by testing different parameter configuration; therefore the performance of ARIMA is dependent on an effective selection of parameters. Even more ARIMA models are limited to deal with constant variance processes and normally distributed residuals which are rarely satisfied in real life signals.
On the other hand, the Artificial Neural Networks (ANNs) are nonparametric models that have been implemented for modeling nonstationary time series. The nonlinear features of the ANNs sometimes explain the nonlinear relationships among the explaining variables and observed phenomena. By instance, Li and Shi (2010) applied three typical ANN techniques for one-step ahead wind speed forecasting by using different datasets of two representative north American sites [6], by implementation of Feed Forward Back-Propagation (FFBP), Radial Basis Function (RBF), and Adaptive Linear Element (ADALINE). After multiple tests the FFBP model was considered the best model for one site, while the RBF model was the best for other sites; therefore the research concludes that it is not recommended to employ only one type of ANN model in wind speed forecasting. Other representative example is the 2 Computational Intelligence and Neuroscience prices variation range; Laboissiere et al. [7] modeled stock prices of power distribution companies through an ANN based on Levenberg-Marquardt (LM). Different Multilayer Perceptron (MLP) topologies were evaluated iteratively with opening and closing prices and other correlated variables as input, different number of hidden neurons and one output until finding the best configuration for short-term horizon. In general, an ANN implementation implies taking some decisions after several tests, such as, network topology, signal propagation method, activation function, weights updating, hidden levels, and numbers of nodes. Sometimes higher accuracy can be reached by an ANN, but this leads a computational complexity increasing [8,9].
A novel solution is the hybrid models which are based on the combination of techniques. Preprocessing methods combined with conventional linear and nonlinear models are implemented to improve the forecast. The Wavelet Decomposition (WD) was originated in 1984 with the discovery of Grossman and Morlet in the quantum physics context [10]. The conjunction Wavelet Decomposition and artificial intelligence can improve the efficiency of pure models in many areas such as, hydrology [11,12], transportation systems [13], and public health [14].
In this work is proposed a new decomposition method based on Multilevel Singular Value Decomposition (MSVD) for extraction components of low and high frequency of a nonstationary time series in order to improve the accuracy of a linear and a nonlinear forecasting model. Related works about traffic accidents forecasting are scarce; most applications make classification with multivariate methods [15][16][17][18]. There have been found some forecast applications related to transportation areas, such as, traveling time [19], traffic flow parameters as volume, travel speeds and occupancies [20,21], the market demand after transportation disruptions [22], and freight transportation demand [23,24]. This paper is organized as follows. Section 2 describes the proposed methodology based on MSVD + MIMO-AR. Section 3 describes SWT + MIMO-AR and SWT + MIMO-ANN. Section 4 presents the efficiency metrics. Section 5 specifies the study case. Section 6 shows the empirical research results. Finally Section 7 concludes the paper.

Forecasting Methodology Based on MSVD and MIMO-AR
The proposed forecasting methodology is described in two stages; the first stage is presented by Multilevel Singular Value Decomposition to decompose a time series into two components of low and high frequency, whereas the second stage Normalize time series = ./max(abs( )) Set the counter = 0, and signal to be decomposed 0 = while ( < ) = + 1 Embed the signal =hankel( 0 , 2) Decompose the matrix [ , [ 1 2 ] , ] = svd( ) performs the prediction through the MIMO-AR model. The MIMO-AR inputs are the lagged values of the components extracted, and the outputs are the prediction for multiple horizon.

Multilevel Singular Value
Decomposition. MSVD is a method inspired in the pyramidal process implemented in multiresolution analysis of Mallat Algorithm [25] which was defined for wavelet representation. In this method is proposed the multilevel decomposition of a Hankel matrix, at difference of standard HSVD [26]. MSVD implements iterative embedding and pyramidal decomposition with a fixed window length = 2; therefore two components are obtained at each decomposition level.
MSVD algorithm is summarized as the pseudocode shown in Algorithm 1. The input algorithm is the observed time series of length , and at the end, two additive and intrinsic components are obtained as outputs, and , which represent the Low Frequency and the High Frequency Component, respectively, each one of length . MSVD is performed in three steps: embedding through a Hankel matrix of 2 × ( − 1) dimension as (2), decomposition in orthogonal matrices of eigenvectors and and singular values 1 , 2 , and finally extraction from elementary matrices 1 and 2 .
MSVD is processed iteratively until the optimum decomposition level , when the Singular Spectrum Rate Δ reaches the asymptotic point. Equations (1a) and (1b) describe the computation of Δ , which is based on the relative energy of the singular values (for = 1, . . . , decomposition levels).

Multiple Input Multiple Output Autoregressive Prediction.
The AR model is implemented to forecast the time series by using the MIMO strategy. MIMO is used for overcoming the error accumulation problem that is observed in the recursive strategy and the direct strategy and for preserving the random relationships between predicted values [27]. MIMO-AR computes the output for the forecast horizon in a single simulation with unique model, which returns a vector rather than a scalar value as follows: where is the forecast horizon and̂is the matrix of outputs. Each column will contain the prediction related to a specific horizon.
The following equation defines MIMO in matrix form: where is a × 2 matrix of linear coefficients and is the Autoregressive transposed matrix created from the components and that were extracted by means of MSVD. Matrix is × 2 dimension, where is the number of samples. The matrix of coefficients is computed with the Least Square Method (LSM) as below: where † is the Moore-Penrose pseudoinverse matrix of . The implementation of SWT is defined in the algorithm of Shensa [28]. SWT implements filtering but the downsampling procedure is omitted and the filters are upsampled [29,30].

Stationary Wavelet Transform Combined with MIMO-AR and Stationary Wavelet Transform Combined with MIMO-ANN
In SWT the length of the observed signal must be an integer multiple of 2 , where = 1, 2, . . . , is the scale number. The signal is separated in approximation coefficients and detail coefficients at different scales; this hierarchical process is called multiresolution decomposition [25].
The observed signal 0 (which was named in previous section) is decomposed in approximation and detail coefficients through a bank of low pass filters (ℎ 0 , ℎ 1 , . . . , ℎ −1 ) and a bank of high pass filter ( 0 , 1 , . . . , −1 ) one to each level as the scheme of Figure 1. Each level filter is upsampled version of the previous one. The components obtained after the decomposition are never decimated; therefore they have the same length as the observed signal.
At first decomposition level the observed signal 0 is convoluted with the first low pass filter ℎ 0 to obtain the first approximation coefficients 1 and with the first high pass filter 0 to obtain the first detail coefficients 1 . The process is defined as follows: The process follows iteratively, for = 1, . . . , − 1 and it is given as Inverse Stationary Wavelet Transform (iSWT) performs the reconstruction. The implementation of iSWT consists in applying the operations that were performed in SWT but in inverse order and based on equivalent filters of reconstruction. SWT obtains subbands of frequency; the last approximation coefficient reconstructed̃gives rise to the component of low frequency , whereas all reconstructed detail coefficients̃are added to obtain the component of high frequency .

Forecast Based on Components Extracted by SWT.
The forecasting is implemented via both a linear and nonlinear models to evaluate the performance of the decomposition obtained through SWT.
The MIMO-AR model has the same structure that was used in the previous section. Therefore the nonlinear forecast based on an Artificial Neural Network is described here.
A sigmoid Multilayer Perceptron (MLP) of three layers [31] is implemented. The ANN is denoted with ( , , ℎ). The inputs are the lagged terms contained in the regressor matrix . The hidden layer has nodes and the output has ℎ nodes, which are the forecast horizon. The prediction at each forecast horizon̂+ ℎ via the ANN is expressed with, where ℎ = 1, . . . , , ℎ is the weight of the connection between th hidden node and ℎth output, and is th hidden level output, while is the weight of the connection between th input node and th hidden node, and represents th lagged vector.
The sigmoid transfer function is denoted with the following: Parameters and are updated with the application of the learning algorithm, in this case, Levenberg-Marquardt [32,33].

Efficiency Metrics
The performance of the models MSVD + MIMO-AR, SWT + MIMO-AR, and SWT + MIMO-ANN is evaluated with three efficiency criteria. The normalized Root Mean Square Error (nRMSE), the modified Nash-Suctliffe Efficiency (mNSE) [34], and the modified Index of Agreement (mIA) [35]. Metrics mNSE and mIA at difference of nRMSE are not based on square differences; in place they are based on the Sum of Absolute Error (SAE) and the Sum of Absolute Deviation (SAD); the formulas are shown below: where is th observed value,̂is th predicted value, is the mean of all , and is the testing sample size. Scores mNSE and mIA overcome the possible oversensitivity to extreme values induced by metrics based on square computation and increase the possible sensitivity for lower values.
Metrics mNSE and mIA are monotonically and functionally related, but the use of 2SAD balances the number of deviations evaluated within the numerator and within the denominator of the factional part.

Case Study
The Chilean Police and the National Traffic Safety Commission (CONASET) are the official institutions that register the data of traffic accidents in Chile [36]. The data are continuously collected, and in this study they have been sampled with a fixed interval of 7 days (one week). Three discrete time series of injured people in traffic accidents in Santiago from 2000:1 to 2014:12 due to different causes are used. CONASET has defined one hundred causes of traffic accidents; in this study case the series of persons with injuries I-G1 and I-G2 group include 20 causes which are related to improper behavior of drivers, passengers, and pedestrians, with an incidence rate of 75%, whereas the series I-G3 groups include the remaining causes (not related to improper behavior) with an incidence rate of 25%. Table 1 presents the series of persons with injuries in traffic accidents and the causes related to the incidents. On the other hand, I-G2 presents downward trend from week 232 until the end, and I-G3 presents upward trend from week 491 until end. The FPS analysis shows the signal spectrum and the red-noise spectrum; a signal spectrum peak is significative when its value is higher than the red-noise spectrum [37]. Both series, I-G1 and I-G2, present the highest peak at week 26 at 98% of confidence level, whereas I-G3 shows the highest peak at week 17 at 73% of confidence level. The order of each AR model was selected with the number of weeks for which the highest power spectrum was found; consequently = 26 for I-G1 and I-G2, and = 17 for I-G3.

Empirical Research Result
The empirical results obtained by the application of MSVD + MIMO-AR, SWT + MIMO-AR, and SWT + MIMO-ANN are presented in this section in two stages decomposition and prediction.

Decomposition Based on MSVD and SWT.
MSVD implements an iterative process which finishes when Δ reaches the asymptotic value. The Singular Spectrum Rate Δ for each decomposition level is illustrated in Figure 5; the asymptotic value is reached when Δ ≈ 1, and it was observed in 6 Computational Intelligence and Neuroscience Injured-G1 Injured-G2 Injured-G3  repetition 16. Therefore the iterative process finishes when iteration 16 was performed; this condition is used with alltime series.
SWT decomposition is implemented trough Daubechies of order 2 (Db2) (due to the inaccurate results that were obtained with the other types of wavelet functions, they are not presented). Three decomposition levels ( = 3) were selected according to the period fluctuation between 8 and 16 weeks.  both MSVD and SWT show long-memory periodicity features, whereas the components show short-term periodic fluctuations.

Prediction via MIMO-AR and MIMO-ANN Models.
The MIMO strategy is implemented to predict the number of injured people in traffic accidents for multiple horizon through the Autoregressive model and through an Artificial Neural Network. For both linear and nonlinear models the spectral analysis developed by means of FPS informs about the order of the models; it was shown in Figures 2(b), 3(b),  and 4(b). The inputs are the lagged values of and the lagged values of , and the outputs are the number of injured people for the next weeks. The components and were extracted previously via MSVD and SWT. Before prediction, each data set of low and high frequency has been divided into two subsets, training and testing. The training subset ( tr ) involves 70% of the samples, and consequently the testing subset ( ) involves the remaining 30%.
The MIMO-ANN structure denoted with ( , , ℎ) was implemented with lagged values (set previously with the FPS information), = log 2 ( tr ), where tr is the training subset size, and ℎ = 8 forecast horizon.
The prediction performance is evaluated with the efficiency metrics nRMSE, mNSE, and mIA, which are presented in Tables 2, 3, and 4 for I-G1, I-G2, and I-G3, respectively. The ANN results correspond to 500 epochs and 10 runs. The results obtained through the nonlinear model SWT + MIMO-ANN are inferior with respect to the linear models MSVD + MIMO-AR and SWT + MIMO-AR; therefore the rest of comparisons are performed between the linear models. The SWT + MIMO-AR results for 14 weeks' ahead prediction of all-time series are not presented due to the poor results that were obtained, as those results that were obtained with SWT + MIMO-ANN for forecast horizon higher than 8 weeks.
From Table 2 and Figure 9, MSVD + MIMO-AR obtains the best accuracy in the forecast of I-G1. A significative gain was observed in each forecasting horizon of the model based on MSVD with respect to the models based on SWT. The mean gain for 1 to 13 weeks of MSVD + MIMO-AR over SWT + MIMO-AR is 17.7% in mNSE and 8.1% in mIA.  From Table 3 and Figure 10, MSVD + MIMO-AR obtains the best accuracy in the forecasting of I-G2. A significative gain of MSVD + MIMO-AR was observed in each forecasting horizon with respect to the models based on SWT. The mean gain for 1 to 13 weeks of MSVD + MIMO-AR over SWT + MIMO-AR is 20.6% in mNSE and 9.3% in mIA.
From previous analysis that was made for I-G1 and I-G2, from Table 4 and Figure 11, the I-G3 forecast based on 8 Computational Intelligence and Neuroscience  MSVD + MIMO-AR is also more accurate than the prediction obtained with the models based on SWT. The mean gain for 1 to 13 weeks of MSVD + MIMO-AR over SWT + MIMO-AR is 20.9% in mNSE and 9.4% in mIA. The I-G1 Prediction via MSVD + MIMO-AR for 14 weeks' ahead prediction is shown in Figures 12(a) and 12(b); from figures good fit is observed between actual and estimated values. Metrics computation gives nRMSE of 2.9%, mNSE of 83.3%, and mIA of 91.6%. The prediction of the same series via SWT + MIMO-AR for 13 weeks' ahead prediction is shown in Figure 13; lower accuracy is observed with nRMSE of 10.1%, mNSE of 43.7%, and mIA of 71.8%.
Computational Intelligence and Neuroscience  The I-G2 Prediction via MSVD + MIMO-AR for 14 weeks' ahead prediction is shown in Figures 14(a) and 14(b); from figures good fit is observed with nRMSE of 5.8%, mNSE of 81.9%, and mIA of 90.9%. The prediction of the same series via SWT + MIMO-AR for 13 weeks' ahead prediction is shown in Figure 15, lower accuracy is observed with nRMSE of 19.6%, mNSE of 36.4%, and mIA of 68.2%.
The I-G3 Prediction via MSVD + MIMO-AR for 14 weeks' ahead prediction is shown in Figures 16(a) and 16     shown in Figure 17, lower accuracy is observed with nRMSE of 13.0%, mNSE of 35.0%, and mIA of 67.5%.

Conclusions
In this paper was presented a new decomposition method for extracting components of low and high frequency of a nonstationary time series. The method was called MSVD due to the use of Multilevel Singular Value Decomposition of a Hankel matrix. MSVD was evaluated for multistep ahead forecasting based on the Autoregressive model and the MIMO strategy.
The forecasting model MSVD + MIMO-AR was compared with a linear and a nonlinear forecast model based on the commonly decomposition technique Stationary Wavelet Transform. The empirical application was developed through three time series coming from traffic accidents domain. All experiments have shown the superior accuracy of MSVD + MIMO-AR with respect to SWT + MIMO-AR and SWT + MIMO-ANN. MSVD + MIMO-AR in comparison with the second best model SWT + MIMO-AR, achieving mean mNSE-gain of 19.8% and mean mIA-gain of 8.9% for 13 weeks' ahead forecasting of persons with injuries in traffic accidents in Santiago, Chile. It was also observed that an ANN with Levenberg-Marquardt based on SWT decays significantly from 9 weeks' ahead forecasting.
Furthermore, the implementation of MSVD presents simplicity with respect to other techniques based on singular values by the use of a fixed window length in the embedding step, and although the algorithm is iterative, the stopping condition is guaranteed by the convergence of the Singular Spectrum Rate parameter. MSVD also presents more simplicity with respect to SWT; this is because SWT requires taking some decisions to select the wavelet mother function.
Future implementations will consider new application areas to support planning and management tasks of public and private institutions.