Smoothing Strategies Combined with ARIMA and Neural Networks to Improve the Forecasting of Traffic Accidents

Two smoothing strategies combined with autoregressive integrated moving average (ARIMA) and autoregressive neural networks (ANNs) models to improve the forecasting of time series are presented. The strategy of forecasting is implemented using two stages. In the first stage the time series is smoothed using either, 3-point moving average smoothing, or singular value Decomposition of the Hankel matrix (HSVD). In the second stage, an ARIMA model and two ANNs for one-step-ahead time series forecasting are used. The coefficients of the first ANN are estimated through the particle swarm optimization (PSO) learning algorithm, while the coefficients of the second ANN are estimated with the resilient backpropagation (RPROP) learning algorithm. The proposed models are evaluated using a weekly time series of traffic accidents of Valparaíso, Chilean region, from 2003 to 2012. The best result is given by the combination HSVD-ARIMA, with a MAPE of 0 : 26%, followed by MA-ARIMA with a MAPE of 1 : 12%; the worst result is given by the MA-ANN based on PSO with a MAPE of 15 : 51%.


Introduction
The traffic accidents occurrence is a matter of impact in the society, therefore a problem of priority public attention; the Chilean National Traffic Safety Commission (CONASET) periodically reports a high rate of sinister on roads; in Valparaíso from year 2003 to 2012 28595 injured people were registered. The accuracy in the projections enables the intervention by the government agencies in terms of prevention; another demandant of information is the insurance companies, who require this kind of information to determine new market policies.
In order to capture the dynamic of traffic accidents, during the last years some techniques have been applied. For classification, decision rules and trees [1,2], latent class clustering and bayesian networks [3], and the genetic algorithm [4] have been implemented. For traffic accidents forecasting, autoregressive moving average (ARMA) and ARIMA models [5], state-space models [6,7], extrapolation [8], dynamic harmonic regression combined with ARIMA, and dynamic transfer functions [9] have been implemented.
The smoothing strategies Moving Average (MA) and Singular Value Decomposition (SVD) have been used to identify the components in a time series. MA is used to extract the trend [10], while SVD extracts more components [11]; the SVD application is multivariate and in some works is applied for parameter calibration in dynamical systems [12,13], in time series classification [14], or to switched linear systems [15]; typically SVD has been applied over an input data set to reduce the data dimensionality [16] or to noise reduction [17].
ARIMA is a linear conventional model for nonstationary time series; by differentiation the nonstationary time series is transformed in stationary; it is based on past values of the series and on the previous error terms for forecasting. ARIMA has been applied widely to model nonstationary data; some applications are the traffic noise [18], the daily global solar radiation [19], premonsoon rainfall data for western India [20], and aerosols over the Gangetic Himalayan region [21].
The autoregressive neural network (ANN) is a nonlinear method for forecasting that has been shown to be efficient in solving problems of different fields; the capability of learning of the ANN is determined by the algorithm. Particle swarm optimization (PSO) is a population algorithm that has been found to be optimal; it is based on the behaviour of a swarm; this is applied to update the connections weights of the ANN; some modifications of PSO have been evaluated based on variants of the acceleration coefficients [22], others apply the adaptation of the inertia weight [23][24][25][26], also the usage of adaptive mechanisms for both inertia weight and the acceleration coefficients based on the behaviour of the particle at each iteration have been used [27,28]. The combination of ANN-PSO has improved the forecasting over some classical algorithms like backpropagation (BP) [29][30][31] and least mean square (LMS) [32]. Another learning algorithm that has been shown to be better than backpropagation is RPROP and is also analyzed by its robustness, easy implementation, and fast convergence regarding the conventional BP [33,34].
The linear and nonlinear models may be inadequate in some forecasting problems; consequently they are not considered universal models; then the combination of linear and nonlinear models could capture different forms of relationships in the time series data. The Zhang hybrid methodology that combines both ARIMA and ANN models is an effective way to improve forecasting accuracy; ARIMA model is used to analyze the linear part of the problem and the ANN models, the residuals from the ARIMA model [35]; this model has been applied for demand forecasting [36]; however some researchers believe that some assumptions of Zhang can degenerate hybrid methodology when opposite situation occurs; Kashei proposes a methodology that combines the linear and nonlinear models which has no assumptions of traditional Zhang hybrid linear and nonlinear models in order to yield the more general and the more accurate forecasting model [37].
Based on the arguments presented in this work, two smoothing strategies to potentiate the preprocessing stage of time series forecasting are proposed; 3-point MA and HSVD are used to smooth the time series; the smoothed values are forecasted with three models; the first is based on ARIMA model, the second in ANN is based on PSO, and the third in ANN is based on RPROP. The models are evaluated using the time series of injured people in traffic accidents occurring in Valparaíso, Chilean region, from 2003 to 2012 with 531 weekly registers. The smoothing strategies and the forecasting models are combined and six models are obtained and compared to determine the model that gives the major accuracy. The paper is structured as follows. Section 2 describes the smoothing strategies. Section 3 explains the proposed forecasting models. Section 4 presents the forecasting accuracy metrics. Section 5 presents the results and discussions. The conclusions are shown in Section 6.

Moving Average.
Moving average is a smoothing strategy used in linear filtering to identify or extract the trend from a time series. MA is a mean of a constant number of observations that can be used to describe a series that does not exhibit a trend [38]. When 3-point MA is applied over a time series of length , the − 2 elements of the smoothed series are computed with̃= wherẽis the th smoothed signal element, for = 2, . . . , − 1, is each observed element of original time series, and terms̃1 and̃have the same values of 1 and , respectively. The smoothed values given by 3-points MA will be used by the estimation process through the selected technique (ARIMA or ANN); this strategy is illustrated in Figure 1(a).

Hankel Singular Value Decomposition.
The proposed strategy HSVD is implemented during the preprocessing stage in two steps, embedding and decomposition. The time series is embedded in a trajectory matrix; then the structure of the Hankel matrix is applied, the decomposition process The Scientific World Journal 3 extracts the components of low and high frequency of the mentioned matrix by means of SVD, the smoothed values given by HSVD are used by the estimation process, and this strategy is illustrated in Figure 1 The original time series is represented with , is the Hankel matrix, , , are the elements obtained with SVD and will be detailed more ahead, is the component of low frequency, is the component of high frequency,̂is the forecasted time series, and is the error computed between and̂witĥ= −̂. (2)

Embedding the Time Series.
The embedding process is illustrated as follows: where is a real matrix, whose structure is the Hankel matrix, 1 , . . . , are the original values of the time series, is the number of rows of and also is the number of components that will be obtained with SVD, is the number of columns of , and is the length of the time series. The value of is computed with

Singular Value Decomposition.
The SVD process is implemented over the matrix obtained in the last subsection. Let be an × real matrix; then there exist an × orthogonal matrix , an × orthogonal matrix , and an × diagonal matrix with diagonal entries 1 ≥ 2 ≥ ⋅⋅⋅ ≥ , with = min( , ), such that = . Moreover, the numbers 1 , 2 , . . . , are uniquely determined by [39]: The extraction of the components is developed through the singular values , the orthogonal matrix , and the orthogonal matrix , for each singular value is obtained one matrix , with = 1, . . . , : = ( ) × (:, ) × (:, ) .
Therefore the matrix contains the th component; the extraction process is where is the th component and the elements of are located in the first row and last column of .
The energy of the obtained components is computed with where is the energy of the th component and is the th singular value. When > 2, the component is computed with the sum of the components from 2 to , as follows: (9)

Autoregressive Integrated Moving Average
Model. The ARIMA model is the generalization of the ARMA model; ARIMA processes are applied on nonstationary time series to convert them in stationary, in ARIMA( , , ) process; is a nonnegative integer that determines the order and and are the polynomials degrees [40]. The time series transformation process to obtain a stationary time series from a nonstationary is developed by means of differentiation; the time series will be nonstationary of order if = Δ is stationary; the transformation process is where is the time series, is the time instant, and is the number of differentiations obtained, that is, because the process is iterative. Once we obtained the stationary time series, the estimation is computed witĥ where represents the coefficients of the AR terms of order and denotes the coefficients of the MA terms of order , is the input regressor vector, which is defined in Section 3.2, and is a source of randomness and is called white noise. The coefficients and are estimated using the maximum likelihood estimation (MLE) algorithm [40].

Neural Network Forecasting
Model. The ANN has a common structure of three layers [41]; the inputs are the lagged terms contained in the regressor vector ; at hidden layer the sigmoid transfer function is applied, and at output layer the forecasted value is obtained. The ANN output iŝ wherêis the estimated value, is the time instant, is the number of hidden nodes, V and are the linear and nonlinear weights of the ANN connections, respectively, represents the th lagged term, and (⋅) is the sigmoid transfer function denoted by 4 The Scientific World Journal The lagged terms are the input of the ANN and they are contained in the regressor vector , whose representation for MA smoothing is where = lagged terms and and were defined in Section 3.1.
The representation of for HSVD smoothing is where = 2 lagged terms. The ANN is denoted by ANN( , , 1), with inputs, hidden nodes, and 1 output. The parameters V and are updated with the application of two learning algorithms: one based on PSO and the other on RPROP.

Learning Algorithm Based on PSO.
The weight of the ANN connections, and V are adjusted with PSO learning algorithm. In the swarm the particles have a position vector = ( 1 , 2 , . . . , ) and a velocity vector = ( 1 , 2 , . . . , ); each particle is considered a potential solution in a -dimensional search space. During each iteration the particles are accelerated toward the previous best position denoted by and toward the global best position denoted by . The swarm has rows and columns, and it is initialized randomly; is computed with × ℎ + ℎ ; the process finishes when the lowest error is obtained based on the fitness function evaluation or when the maximum number of iterations is reached [42], as follows: where = 1, . . . , , = 1, . . . , ; denotes the inertia weight; 1 and 2 are learning factors, 1 and 2 are positive random numbers in the range [0, 1] under normal distribution, and is the th iteration. Inertia weight has linear decreasing, max is the maximum value of inertia, min is the lowest, and iter max is total of iterations.
The particle represents the optimal solution, in this case the set of weights and for the ANN.

Learning Algorithm Based on Resilient Backpropagation.
RPROP is an efficient learning algorithm that performs a direct adaptation of the weight step based on local gradient information; it is considered a first-order method. The update rule depends only on the sign of the partial derivative of the arbitrary error regarding each weight of the ANN. The individual step size Δ is computed for each weight using this rule [33], as follows: where 0 < − < 1 < + . If the partial derivative / has the same sign for consecutive steps, the step size is slightly increased by the factor + in order to accelerate the convergence, whereas if it changes the sign, the step size is decreased by the factor − . Additionally in the case of a change in the sign, there should be no adaptation in the succeeding step; in the practice this can be done by setting / = 0 in the adaptation rule Δ . Finally the weight update and the adaptation are performed after the gradient information of all the weights is computed.

Forecasting Accuracy Metrics
The forecasting accuracy is evaluated with the metrics root mean squared error (RMSE), generalized cross validation (GCV), mean absolute percentage error (MAPE), and relative error (RE): where V is the validation (testing) sample size, is the th observed value,̂is the th estimated value, and is the length of the input regressor vector.

Results and Discussions
The data used for forecasting is the time series of injured people in traffic accidents occurring in Valparaíso, from 2003 to 2012; they were obtained from CONASET, Chile [43]. The data sampling period is weekly, with 531 registers as shown in Figure 2(a); the series was separated for training and testing, and by trial and error the 85% for training and the 15% for testing were determined.  values are used as input of the forecasting model ARIMA( , , ); this is presented in Figure 1(a). The effective order of the polynomial for the AR terms is found to be = 9 and the differentiation parameter is found to be = 0; those values were obtained from the autocorrelation function (ACF) shown in Figure 2(b); to set the order of MA terms, is evaluated the metric GCV versus the Lagged values. The results of the GCV are presented in Figure 3(a); it shows that the lowest GCV is achieved with 10 lagged values. Therefore the configuration of the model is denoted by AM-ARIMA(9,0,10).
The evaluation executed in the testing stage is presented in Figures 4 and 5(a) and Table 1. The observed values versus the estimated values are illustrated in Figure 4(a), reaching a good accuracy, while the relative error is presented in Figure 4(b), which shows that the 87% of the points present an error lower than ±1.5%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 5(a); it shows that ACF for a lag of 16 is slightly lower than the 95% confidence limit; however the rest of the coefficients are inside the confidence limit; therefore in the errors of the model AM-ARIMA(9,0,10) there is no serial correlation; we can conclude that the proposed model explains efficiently the variability of the process.

HSVD Smoothing.
In this section the forecasting strategy presented in Figure 1(b) is evaluated; to implement this strategy in first instance the time series is mapped using the Hankel matrix, after the SVD process is executed to obtain   the components. The value of is found through the computation of the singular values of the decomposition; this is presented in Figure 6(a); as shown in Figure 6(a), the major quantity of energy is captured by the two first components; therefore in this work only two components have been selected with = 2. The first component extracted represents the long-term trend ( ) of the time series, while the second represents the short-term component of high frequency fluctuation ( ). The components and are shown in Figures 6(b) and 6(c), respectively.
To evaluate the model, in this section = 9 and = 0 are used, and is evaluated using the GCV metric for 1 ≤ ≤ 18; then the effective value = 11 is found, as shown in Figure 3(b); therefore the forecasting model is denoted by HSVD-ARIMA(9,0,11).
Once and are found, the forecasting is executed with the testing data set, and the results of HSVD-ARIMA(9,0,11) are shown in Figures 7(a), 7(b), and 5(b) and Table 1. Figure 7(a) shows the observed values versus the estimates vales, and a good adjusting between them is found. The relative errors are presented in Figure 7(b); it shows that the 95% of the points present an error lower than ±0.5%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 5(b); it shows that all the coefficients are inside the confidence limit; therefore in the model errors there is no serial correlation; we can conclude that the proposed model HSVD-ARIMA(9,0,11) explains efficiently the variability of the process.
The results presented in Table 1 show that the major accuracy is achieved with the model HSVD-ARIMA(9,0,11), with a RMSE of 0.00073 and a MAPE of 0.26%; the 95% of the points have a relative error lower than ±0.5%.

Moving Average Smoothing.
The raw time series is smoothed using the moving average of order 3, whose obtained values are used as input of the forecasting model presented in Figure 1(a). The calibration executed in Section 5.1.1 is used for the neural network and then an ANN( , , 1) is used, with = 9 inputs (lagged values), = 10 hidden nodes, and 1 output.
The evaluation executed in the testing stage is presented in Figures 8 and 9(a) and Table 2. The observed values versus the estimated values are illustrated in Figure 8(a), reaching a good accuracy, while the relative error is presented in Figure 8  which shows that the 85% of the points present an error lower than ±15%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 9(a); it shows that there are values with significative difference from zero to 95% of the confidence limit; by example the three major values are obtained when the lagged value is equal to 3, 4, and 7 weeks. Therefore in the residuals there is serial correlation; this implies that the model MA-ANN-PSO(9,10,1) is not recommended for future usage and probably other explanatory variables should be added in the model.
The process was run 30 times and the best result was reached in the run 22 as shown in Figure 10(a); Figure 10(b) presents the RMSE metric for the best run.

HSVD Smoothing.
In this section the forecasting strategy presented in Figure 1(b) is evaluated; the HSVD smoothing strategy is applied using the same calibration   The evaluation executed in the testing stage is presented in Figures 11 and 9(b) and Table 2. The observed values versus the estimated values are illustrated in Figure 11(a), reaching a good accuracy, while the relative error is presented in Figure 11(b), which shows that the 95% of the points present an error lower than ±4%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 9(b); it shows that all the coefficients are inside the confidence limit of 95% and statistically are equal to zero; therefore in the model errors there is no serial correlation; we can conclude that the proposed model HSVD-ANN-PSO(9,11,1) explains efficiently the variability of the process.
The process was run 30 times and the best result was reached in the run 11 as shown in Figure 12(a); Figure 12(b) presents the RMSE metric for the best run.
The results presented in Table 2 show that the major accuracy is achieved with the model HSVD-ANN-PSO(9,11,1), with a RMSE of 0.0123 and a MAPE of 5.45%; the 95% of the points have a relative error lower than ±4%.   model presented in Figure 1(a). The calibration executed in Section 5.1.1 is used for the neural network; then an ANN( , , 1) is used, with = 9 inputs (lagged values), = 10 hidden nodes, and 1 output.

ANN Forecasting
The evaluation executed in the testing stage is presented in Figures 13 and 14(a) and Table 3. The observed values versus the estimated values are illustrated in Figure 13(a), reaching a good accuracy, while the relative error is presented in Figure 13(b), which shows that the 81% of the points present an error lower than ±15%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 14(a); it shows that there are values with significative difference from zero to 95% of the confidence limit; by example the three major values are obtained when the lagged value is equal to 3, 4, and 7 weeks. Therefore in the residuals there is serial correlation; this implies that the model MA-ANN-RPROP(9,10,1) is not recommended for future usage and probably other explanatory variables should be added in the model.
The process was run 30 times, and the best result was reached in the run 26 as shown in Figure 15(a); Figure 15(b) presents the RMSE metric for the best run.

HSVD Smoothing.
In this section the forecasting strategy presented in Figure 1(b) is evaluated, the HSVD smoothing strategy is applied using the same calibration explained in Section 5.1.2, and then an ANN( , , 1) is used, with = 9 inputs (lagged values), = 11 hidden nodes, and 1 output.
The evaluation executed in the testing stage is presented in Figures 16 and 14(b) and Table 3. The observed values versus the estimated values are illustrated in Figure 16(a), reaching a good accuracy, while the relative error is presented in Figure 16(b), which shows that the 96% of the points present an error lower than ±4%.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure 14(b); it shows that all the coefficients are inside the confidence limit and statistically are equal to zero; therefore in the model errors there is no serial correlation; we can conclude that the proposed model HSVD-ANN-RPROP(9,11,1) explains efficiently the variability of the process. The process was run 30 times and the first best result was reached in the run 21 as shown in Figure 17(a); Figure 17(b) presents the RMSE metric for the best run.
The results presented in Table 3 show that the major accuracy is achieved with the model HSVD-ANN-RPROP(9,11,1), with a RMSE of 0.024 and a MAPE of 8.08%; the 96% of the points have a relative error lower than ±4%.   Finally, Pitman's correlation test [44] is used to compare all forecasting models in a pairwise fashion. Pitman's test is equivalent to testing if the correlation (Corr) between Υ and Ψ is significantly different from zero, where Υ and Ψ are defined by where 1 and 2 represent the one-step-ahead forecast error for model 1 and model 2, respectively. The null hypothesis is significant at the 5% significance level if |Corr| > 1.96/√ V . The evaluated correlations between Υ and Ψ are presented in Table 4.
The results presented in Table 4 show that statistically there is a significant superiority of the HSVD-ARIMA forecasting model, regarding the rest of models. The results are presented from left to right, where the first is the best model and the last is the worst model.

Conclusions
In this paper were proposed two strategies of time series smoothing to improve the forecasting accuracy. The first smoothing strategy is based on moving average of order 3, while the second is based on the Hankel singular value decomposition. The strategies were evaluated with the time Table 4: Pitman's correlation (Corr) for pairwise comparison six models at 5% of significance and the critical value 0.2219. The estimation of the smoothed values was developed through three conventional models, ARIMA, an ANN based on PSO, and an ANN based on RPROP. The comparison of the six models implemented shows that the first best model is HSVD-ARIMA, as it obtained the major accuracy, with a MAPE of 0.26% and a RMSE of 0.00073, while the second best is the model MA-ARIMA, with a MAPE of 1.12% and a RMSE of 0.0034. On the other hand, the model with the lowest accuracy was MA-ANN-PSO with a MAPE of 15.51% and a RMSE of 0.041. Pitman's test was executed to evaluate the difference of the accuracy between the six proposed models and the results show that statistically there is a significant superiority of the forecasting model based on HSVD-ARIMA. Due to the high accuracy reached with the best model, in future works, it will be applied to evaluate new time series of other regions and countries.