Two smoothing strategies combined with autoregressive integrated moving average (ARIMA) and autoregressive neural network (ANN) models are presented to improve time series forecasting. Forecasting is implemented in two stages. In the first stage the time series is smoothed using either 3-point moving average (MA) smoothing or singular value decomposition of the Hankel matrix (HSVD). In the second stage, an ARIMA model and two ANNs are used for one-step-ahead forecasting. The coefficients of the first ANN are estimated with the particle swarm optimization (PSO) learning algorithm, while those of the second ANN are estimated with the resilient backpropagation (RPROP) learning algorithm. The proposed models are evaluated using a weekly time series of traffic accidents in the Valparaíso region of Chile from 2003 to 2012. The best result is given by the combination HSVD-ARIMA, with a MAPE of 0.26%, followed by MA-ARIMA with a MAPE of 1.12%; the worst result is given by MA-ANN based on PSO, with a MAPE of 15.51%.
The occurrence of traffic accidents has a considerable impact on society and is therefore a priority for public attention. The Chilean National Traffic Safety Commission (CONASET) periodically reports a high rate of road accidents; in Valparaíso, 28,595 injured people were registered from 2003 to 2012. Accurate projections enable government agencies to intervene with preventive measures. Insurance companies also demand this kind of information to determine new market policies.
Several techniques have been applied in recent years to capture the dynamics of traffic accidents. For classification, decision rules and trees [
The smoothing strategies Moving Average (MA) and Singular Value Decomposition (SVD) have been used to identify the components in a time series. MA is used to extract the trend [
ARIMA is a conventional linear model for nonstationary time series: differencing transforms the nonstationary series into a stationary one, and forecasts are based on past values of the series and on previous error terms. ARIMA has been widely applied to model nonstationary data; some applications are traffic noise [
The autoregressive neural network (ANN) is a nonlinear forecasting method that has been shown to be efficient in solving problems in different fields; the learning capability of the ANN is determined by its training algorithm. Particle swarm optimization (PSO) is a population-based algorithm that has been found to be effective; it is based on the behaviour of a swarm and is applied here to update the connection weights of the ANN. Some modifications of PSO have been evaluated based on variants of the acceleration coefficients [
Linear and nonlinear models may each be inadequate in some forecasting problems, so neither is considered a universal model; a combination of linear and nonlinear models can capture different forms of relationships in the time series data. The Zhang hybrid methodology, which combines ARIMA and ANN models, is an effective way to improve forecasting accuracy: the ARIMA model is used to analyze the linear part of the problem and the ANN models the residuals from the ARIMA model [
Based on these arguments, this work proposes two smoothing strategies to strengthen the preprocessing stage of time series forecasting. 3-point MA and HSVD are used to smooth the time series, and the smoothed values are forecast with three models: the first is based on ARIMA, the second is an ANN based on PSO, and the third is an ANN based on RPROP. The models are evaluated using the time series of people injured in traffic accidents in the Valparaíso region of Chile from 2003 to 2012, with 531 weekly records. Combining the smoothing strategies with the forecasting models yields six models, which are compared to determine the one with the highest accuracy. The paper is structured as follows. Section
Moving average is a smoothing strategy used in linear filtering to identify or extract the trend from a time series. MA is a mean of a constant number of observations that can be used to describe a series that does not exhibit a trend [
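As an illustration of 3-point smoothing, the following minimal sketch averages every run of three consecutive observations. The function name `moving_average_3` is hypothetical, and the choice to return only complete windows (so the output is two values shorter than the input) is an assumption; the alignment of the window used in this work may differ.

```python
def moving_average_3(series):
    """Return the mean of every run of three consecutive observations.

    Assumption: only complete windows are kept, so the result has
    len(series) - 2 values; edge handling is left to the caller.
    """
    return [(series[i] + series[i + 1] + series[i + 2]) / 3.0
            for i in range(len(series) - 2)]


smoothed = moving_average_3([1, 2, 3, 4, 5])
```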
Smoothing strategies: (a) moving average and (b) Hankel singular value decomposition.
The proposed HSVD strategy is implemented during the preprocessing stage in two steps, embedding and decomposition. The time series is first embedded in a trajectory matrix with Hankel structure. The decomposition step then applies SVD to this matrix to extract its low and high frequency components. The smoothed values given by HSVD are used by the estimation process. This strategy is illustrated in Figure
The original time series is represented with
The embedding process is illustrated as follows:
The SVD process is implemented over the matrix
The energy of the obtained components is computed with
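The embedding and decomposition steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of this work: the window length `window` is a hypothetical parameter, the low frequency component is taken from the first singular triple and recovered by anti-diagonal averaging (a common choice in singular spectrum analysis), the high frequency component is taken as the remainder, and the energy of each component is assumed to be its squared singular value normalized by the total.

```python
import numpy as np


def hankel_embed(x, window):
    """Embed a series of length n in a window x (n - window + 1) Hankel matrix."""
    x = np.asarray(x, dtype=float)
    cols = x.size - window + 1
    return np.array([x[i:i + cols] for i in range(window)])


def diagonal_average(H):
    """Recover a series from a matrix by averaging each anti-diagonal."""
    rows, cols = H.shape
    out = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        out[i:i + cols] += H[i]      # H[i, j] belongs to position i + j
        counts[i:i + cols] += 1
    return out / counts


def hsvd_smooth(x, window=3):
    """Split x into low and high frequency components via SVD of its Hankel matrix."""
    x = np.asarray(x, dtype=float)
    H = hankel_embed(x, window)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    energy = s ** 2 / np.sum(s ** 2)            # assumed energy definition
    H_low = s[0] * np.outer(U[:, 0], Vt[0])     # first elementary matrix
    low = diagonal_average(H_low)
    high = x - low                              # remainder (assumption)
    return low, high, energy


x = np.sin(np.linspace(0, 6, 40)) + 5.0
low, high, energy = hsvd_smooth(x, window=3)
```

Under these assumptions the two components sum to the original series by construction, and the energy vector indicates how much of the signal the first component retains.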
The ARIMA model is a generalization of the ARMA model; ARIMA processes are applied to nonstationary time series to convert them into stationary ones. In ARIMA
The transformation of a nonstationary time series into a stationary one is carried out by differencing; the time series
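The differencing step can be illustrated with a short sketch. The helper name `difference` and the toy series are hypothetical; `d` plays the role of the differencing order of the ARIMA model.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y[t] = x[t] - x[t - 1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


quadratic = [1, 4, 9, 16, 25]       # toy series with a quadratic trend
once = difference(quadratic, d=1)   # still trending
twice = difference(quadratic, d=2)  # constant, trend removed
```

Each pass shortens the series by one observation, and a polynomial trend of degree d is removed after d passes.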
The ANN has a common structure of three layers [
The representation of
The ANN is denoted by
The weight of the ANN connections,
The particle
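To make the particle update concrete, the following is a minimal sketch of a standard PSO loop with an inertia weight. It is an illustration under assumptions, not the configuration used in this work: the values of `w`, `c1`, and `c2`, the initialization range, and the toy fitness function are all hypothetical. In the forecasting setting each particle would hold the ANN connection weights and the fitness would be the forecasting error.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]                      # best fitness per iteration
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive (own best) + social (swarm best) terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


# Toy fitness standing in for the ANN forecasting error (assumption).
best, best_fit, history = pso_minimize(lambda p: sum(v * v for v in p), dim=3)
```

The `history` list corresponds to the kind of fitness-versus-iteration curve reported for the best run.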
RPROP is an efficient learning algorithm that performs a direct adaptation of the weight step based on local gradient information; it is considered a first-order method. The update rule depends only on the sign of the partial derivative of the error with respect to each weight of the ANN. The individual step size
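The sign-based update can be sketched as below. This is a minimal illustration assuming the variant without weight backtracking (often called iRPROP-) and the commonly used increase and decrease factors of 1.2 and 0.5; the initial step, the step bounds, and the toy quadratic error are hypothetical and stand in for the ANN error surface.

```python
def rprop_minimize(grad, w, iters=200, step0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimize an error function given its gradient, using only gradient signs."""
    w = list(w)
    steps = [step0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(iters):
        g = grad(w)
        for i in range(len(w)):
            gi = g[i]
            if gi * prev[i] > 0:        # same sign: accelerate
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif gi * prev[i] < 0:      # sign change: a minimum was jumped over
                steps[i] = max(steps[i] * eta_minus, step_min)
                gi = 0.0                # skip the update for this weight
            if gi > 0:
                w[i] -= steps[i]
            elif gi < 0:
                w[i] += steps[i]
            prev[i] = gi
    return w


target = [3.0, -1.0]
# Gradient of the toy error sum((w_i - target_i)^2).
solution = rprop_minimize(lambda w: [2 * (a - t) for a, t in zip(w, target)],
                          [0.0, 0.0])
```

Because only the sign of each partial derivative is used, the magnitude of the gradient has no effect on the step, which is what makes the method robust to poorly scaled error surfaces.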
The forecasting accuracy is evaluated with the metrics root mean squared error (RMSE), generalized cross validation (GCV), mean absolute percentage error (MAPE), and relative error (RE):
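A minimal sketch of three of these metrics follows, assuming the standard definitions of RMSE and MAPE and assuming that RE is the signed error relative to the observed value. GCV is left out because its penalty term depends on a definition that cannot be inferred from the metric name alone. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def relative_error(observed, predicted):
    """Signed relative error per observation (assumed definition)."""
    return [(o - p) / o for o, p in zip(observed, predicted)]


obs, pred = [100.0, 200.0], [110.0, 190.0]
scores = (rmse(obs, pred), mape(obs, pred))
```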
The data used for forecasting is the time series of injured people in traffic accidents occurring in Valparaíso, from 2003 to 2012; they were obtained from CONASET, Chile [
Accidents time series: (a) raw data and (b) autocorrelation function.
The raw time series is smoothed using 3-point moving average, whose obtained values are used as input of the forecasting model
(a) MA smoothing and (b) HSVD smoothing.
The evaluation executed in the testing stage is presented in Figures
Forecasting with ARIMA.
Metric | MA-ARIMA | HSVD-ARIMA |
---|---|---|
RMSE | 0.0034 | 0.00073 |
MAPE | 1.12% | 0.26% |
GCV | 0.006 | 0.0013 |
MA-ARIMA(9,0,10): (a) observed versus estimated and (b) relative error.
Residual ACF: (a) MA-ARIMA(9,0,10) and (b) HSVD-ARIMA(9,0,11).
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
In this section the forecasting strategy presented in Figure
Accidents time series: (a) components energy, (b) low frequency component, and (c) high frequency component.
To evaluate the model, in this section
Once
HSVD-ARIMA(9,0,11): (a) observed versus estimated and (b) relative error.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
The results presented in Table
The raw time series is smoothed using the moving average of order 3, whose obtained values are used as input of the forecasting model presented in Figure
The evaluation executed in the testing stage is presented in Figures
Forecasting with ANN-PSO.
Metric | MA-ANN-PSO | HSVD-ANN-PSO |
---|---|---|
RMSE | 0.04145 | 0.0123 |
MAPE | 15.51% | |
GCV | 0.053 | 0.022 |
MA-ANN-PSO(9,10,1): (a) observed versus estimated and (b) relative error.
Residual ACF: (a) MA-ANN-PSO(9,10,1) and (b) HSVD-ANN-PSO(9,11,1).
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
The process was run 30 times and the best result was reached in run 22, as shown in Figure
MA-ANN-PSO(9,10,1): (a) run versus fitness for 2500 iterations and (b) iterations number for the best run.
In this section the forecasting strategy presented in Figure
The evaluation executed in the testing stage is presented in Figures
HSVD-ANN-PSO(9,11,1): (a) observed versus estimated and (b) relative error.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
The process was run 30 times and the best result was reached in run 11, as shown in Figure
HSVD-ANN-PSO(9,11,1): (a) run versus fitness for 2500 iterations and (b) iterations number for the best run.
The results presented in Table
The raw time series is smoothed using the moving average of order 3, whose obtained values are used as input of the forecasting model presented in Figure
The evaluation executed in the testing stage is presented in Figures
Forecasting with ANN-RPROP.
Metric | MA-ANN-RPROP | HSVD-ANN-RPROP |
---|---|---|
RMSE | 0.0384 | 0.024 |
MAPE | | |
GCV | 0.0695 | 0.045 |
MA-ANN-RPROP(9,10,1): (a) observed versus estimated and (b) relative error.
Residual ACF: (a) MA-ANN-RPROP(9,10,1) and (b) HSVD-ANN-RPROP(9,11,1).
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
The process was run 30 times and the best result was reached in run 26, as shown in Figure
MA-ANN-RPROP(9,10,1): (a) run versus fitness for 85 iterations and (b) iterations number for the best run.
In this section the forecasting strategy presented in Figure
The evaluation executed in the testing stage is presented in Figures
HSVD-ANN-RPROP(9,11,1): (a) observed versus estimated and (b) relative error.
For the evaluation of the serial correlation of the model errors the ACF is applied, whose values are presented in Figure
HSVD-ANN-RPROP(9,11,1): (a) run versus fitness for 70 iterations and (b) iterations number for the best run.
The results presented in Table
Finally, Pitman’s correlation test [
The evaluated correlations between
Pitman’s correlation (Corr) for pairwise comparison of the six models at the 5% significance level, with critical value 0.2219.
Models | M1 | M2 | M3 | M4 | M5 | M6 |
---|---|---|---|---|---|---|
M1 | — | −0.9146 | −0.9931 | −0.9983 | −0.9994 | −0.9993 |
M2 | — | — | −0.8676 | −0.9648 | −0.9895 | −0.9887 |
M3 | — | — | — | −0.6645 | −0.8521 | −0.8216 |
M4 | — | — | — | — | −0.5129 | −0.4458 |
M5 | — | — | — | — | — | 0.1623 |
M6 | — | — | — | — | — | — |
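The pairwise comparison can be sketched as follows, assuming the usual formulation of Pitman's test: the correlation is computed between the sum and the difference of the error series of two models, and it is significant at the 5% level when its absolute value exceeds 1.96/√n. The function names and the synthetic errors are hypothetical. A test sample of n = 78 reproduces the critical value 0.2219; this sample size is an inference from that value rather than a figure taken from the text.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def pitman_correlation(e1, e2):
    """Correlation between the sum and the difference of two error series."""
    total = [u + v for u, v in zip(e1, e2)]
    diff = [u - v for u, v in zip(e1, e2)]
    return pearson(total, diff)


def pitman_critical(n, z=1.96):
    """Critical value at the 5% significance level for n paired errors."""
    return z / math.sqrt(n)


# Synthetic errors: model 1 is far more accurate than model 2 (assumption).
rng = random.Random(1)
e1 = [0.01 * rng.gauss(0, 1) for _ in range(78)]
e2 = [rng.gauss(0, 1) for _ in range(78)]
corr = pitman_correlation(e1, e2)
significant = abs(corr) > pitman_critical(78)
```

With this sign convention a strongly negative correlation means the first model of the pair has the smaller error variance, which matches the pattern of negative entries in the rows for M1 to M4.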
The results presented in Table
Two time series smoothing strategies were proposed in this paper to improve forecasting accuracy. The first smoothing strategy is based on a moving average of order 3, while the second is based on the Hankel singular value decomposition. The strategies were evaluated with the time series of traffic accidents occurring in Valparaíso, Chile, from 2003 to 2012.
The smoothed values were forecast with three conventional models: ARIMA, an ANN based on PSO, and an ANN based on RPROP. The comparison of the six implemented models shows that the best model is HSVD-ARIMA, which obtained the highest accuracy, with a MAPE of 0.26% and an RMSE of 0.00073; the second best is MA-ARIMA, with a MAPE of 1.12% and an RMSE of 0.0034. The model with the lowest accuracy was MA-ANN-PSO, with a MAPE of 15.51% and an RMSE of 0.041. Pitman's test was applied to evaluate the difference in accuracy between the six proposed models, and the results show that the forecasting model based on HSVD-ARIMA is statistically significantly superior. Given the high accuracy reached with the best model, future work will apply it to new time series from other regions and countries.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported in part by Grant CONICYT/FONDECYT/Regular 1131105 and by the DI-Regular project of the Pontificia Universidad Católica de Valparaíso.