Short-term traffic flow prediction is necessary for advanced traffic management systems (ATMS) and advanced traveler information systems (ATIS). To improve the accuracy of short-term traffic flow prediction, this paper presents a multistep prediction method based on similarity search of time series. First, the landmark model is used to represent the time series of traffic flow data. Then, the input data of the prediction model are determined by searching for similar time series. Finally, an echo state network model is used for multistep traffic flow prediction. The performance of the proposed method is evaluated with expressway traffic flow data collected from loop detectors in Shanghai, China. The experimental results demonstrate that the proposed method achieves better multistep prediction performance than conventional methods.
Accurate and real-time traffic flow forecasting is essential to adaptive traffic control systems and traffic guidance systems, and it is of great significance for alleviating urban traffic congestion. Because of the importance of traffic flow prediction, traffic engineering researchers began in the early 1960s to apply mature prediction models from other fields to short-term traffic flow prediction and developed a variety of forecasting methods. Earlier prediction methods mainly included the autoregressive model, the moving average model, the autoregressive integrated moving average model [
Although a number of methods have been put forward to improve forecasting accuracy, short-term traffic flow forecasting remains a difficult challenge. Existing traffic flow forecasting methods generally have two shortcomings. First, most work has focused on model optimization and has ignored the similarity characteristics of traffic flow data. Specifically, most forecasting models use the traffic flow data at the time instant immediately preceding the prediction moment as input. However, traffic flow fluctuates with strong randomness, so a prediction model whose input relies only on the data of the prior time instant is prone to large prediction errors. Second, the majority of researchers conducted only one-step prediction, which cannot sufficiently describe the future trend of the traffic state. Different applications require different prediction horizons. For example, a traffic control system needs near-term forecasts for real-time traffic control, while a traffic guidance system requires relatively long-term forecasts to capture the trend of the traffic state. Therefore, it is essential to establish a short-term traffic flow multistep forecasting method that makes full use of the similarity characteristics of traffic flow data.
To address the shortcomings of previous traffic flow forecasting methods, this paper presents a short-term traffic flow multistep prediction method based on similarity search of time series. The proposed method has two main parts: first, the input data of the prediction model are determined by searching for similar time series rather than by taking the data of the prior time instant; second, an echo state network model is used for short-term traffic flow multistep forecasting. Figure
The schematic of traffic flow multistep forecasting method.
Original traffic flow data contain many short-term fluctuations and random disturbances. Using the original time series directly for similarity search not only leads to low efficiency but also reduces accuracy and reliability. Therefore, many researchers have put forward pattern representation methods for time series. The existing pattern representation methods mainly include the discrete Fourier transform method [
The landmark model proposed by Perng is consistent with human intuition and episodic memory. Its basic idea is that similarity search operates on a series of landmarks rather than on the original time series. If the
The original time series of traffic flow data is usually noisy. In the landmark model, the minimal distance/percentage principle (MDPP) is used to eliminate noise. It is defined as follows.
For a series of landmarks
Minimal distance/percentage principle.
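As a rough illustration of the noise-elimination step, the following Python sketch extracts landmarks as local extrema and then applies a minimal distance/percentage filter. It assumes the rule from Perng's landmark model in which two adjacent landmarks are both removed when their time gap is below a minimal distance D and their relative amplitude change is below a minimal percentage P. The function names, the inclusion of the series endpoints as landmarks, and the parameter values are illustrative assumptions and are not taken from this paper.

```python
from typing import List, Tuple

Landmark = Tuple[int, float]  # (time index, value)


def first_order_landmarks(series: List[float]) -> List[Landmark]:
    """Return the local extrema of `series` plus its two endpoints (assumption)."""
    n = len(series)
    if n < 3:
        return list(enumerate(series))
    landmarks = [(0, series[0])]
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        # A sign change of the first difference marks a local maximum or minimum.
        if left * right < 0:
            landmarks.append((i, series[i]))
    landmarks.append((n - 1, series[-1]))
    return landmarks


def mdpp(landmarks: List[Landmark], min_dist: int, min_pct: float) -> List[Landmark]:
    """Drop adjacent landmark pairs that are close in time and in relative amplitude."""
    out: List[Landmark] = []
    i = 0
    while i < len(landmarks):
        if i + 1 < len(landmarks):
            (x1, y1), (x2, y2) = landmarks[i], landmarks[i + 1]
            mean_amp = (abs(y1) + abs(y2)) / 2
            small_gap = (x2 - x1) < min_dist
            small_change = mean_amp > 0 and abs(y2 - y1) / mean_amp < min_pct
            if small_gap and small_change:
                i += 2  # remove both landmarks of the pair
                continue
        out.append(landmarks[i])
        i += 1
    return out
```

For example, with the hypothetical series `[10, 12, 11.9, 12.1, 8, 9, 3]`, calling `mdpp(first_order_landmarks(series), min_dist=2, min_pct=0.05)` removes the small wiggle between indices 1 and 2 and keeps the larger turning points.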
In most similarity models, the error tolerance is a single value measured from pointwise differences in amplitude. In the landmark model, however, similarity is measured with the landmarks distance, which is defined below.
Given two series of landmarks
In similarity search of time series, the online computational cost is very high if the pattern representation has to be computed for every search. To reduce the online computation and improve the efficiency of similarity search, it is necessary to build a historical database.
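The idea of searching a precomputed historical database can be sketched as follows. The sketch assumes a landmarks distance of the form used in Perng's landmark model, a pair of relative differences in time gaps and in amplitudes, and it aggregates the per-landmark differences with the maximum norm. The choice of norm, the ranking rule (amplitude distance first, then time distance), and all names are assumptions made for illustration; the definition used in this paper may differ.

```python
from typing import Dict, List, Tuple

Landmark = Tuple[float, float]  # (time, amplitude)


def landmarks_distance(a: List[Landmark], b: List[Landmark]) -> Tuple[float, float]:
    """Return (time distance, amplitude distance) for equal-length landmark series."""
    if len(a) != len(b):
        raise ValueError("landmark series must have the same length")
    d_time, d_amp = 0.0, 0.0
    for k in range(len(a)):
        (xa, ya), (xb, yb) = a[k], b[k]
        if k > 0:
            # Compare the time gaps to the previous landmark, relative to their mean.
            gap_a = xa - a[k - 1][0]
            gap_b = xb - b[k - 1][0]
            mean_gap = (abs(gap_a) + abs(gap_b)) / 2
            if mean_gap > 0:
                d_time = max(d_time, abs(gap_a - gap_b) / mean_gap)
        # Compare amplitudes relative to their mean absolute value.
        mean_amp = (abs(ya) + abs(yb)) / 2
        if ya != yb and mean_amp > 0:
            d_amp = max(d_amp, abs(ya - yb) / mean_amp)
    return d_time, d_amp


def most_similar(query: List[Landmark],
                 database: Dict[str, List[Landmark]],
                 top_k: int = 1) -> List[str]:
    """Rank stored landmark series by distance to `query` and return the best keys."""
    scored = []
    for key, series in database.items():
        if len(series) != len(query):
            continue  # only series with the same number of landmarks are comparable
        d_time, d_amp = landmarks_distance(query, series)
        scored.append((d_amp, d_time, key))
    scored.sort()
    return [key for _, _, key in scored[:top_k]]
```

Because the landmark series in `database` are computed once and stored, only the current series has to be converted to landmarks at query time.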
Neural network methods are popular among traffic flow prediction methods. However, traditional neural network models often suffer from slow convergence and local optima. Both feedforward and recurrent neural network models are limited in practical applications. To address the shortcomings of traditional neural network models, Jaeger and Haas [
As shown in Figure
The structure of echo state network model.
The ESN is a special type of neural network. Its basic idea is to replace the middle layer of a classical neural network with a large-scale, randomly connected recurrent network, which simplifies the network training process. The state equation of the echo state network model is as follows:
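A minimal NumPy sketch of an echo state network is given here for illustration. It assumes the widely used state update x(n+1) = tanh(W_in [1; u(n+1)] + W x(n)) with a linear readout fitted by ridge regression, without output feedback. The class name, the dense reservoir, the reservoir size, the spectral radius, the ridge parameter, and the washout length are all illustrative assumptions rather than the settings used in this paper.

```python
import numpy as np


class EchoStateNetwork:
    """Fixed random reservoir with a trainable linear readout (illustrative)."""

    def __init__(self, n_inputs, n_reservoir=50, spectral_radius=0.9,
                 ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights (with a bias column) and reservoir weights stay fixed.
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs + 1))
        w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        # Rescale so the largest absolute eigenvalue equals spectral_radius.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.ridge = ridge
        self.w_out = None

    def _states(self, inputs):
        """Run the reservoir over `inputs` (shape: steps x n_inputs)."""
        x = np.zeros(self.w.shape[0])
        rows = []
        for u in inputs:
            x = np.tanh(self.w_in @ np.concatenate(([1.0], u)) + self.w @ x)
            rows.append(np.concatenate(([1.0], u, x)))
        return np.array(rows)

    def fit(self, inputs, targets, washout=10):
        """Train only the readout weights by ridge regression."""
        s = self._states(inputs)[washout:]
        y = np.asarray(targets)[washout:]
        a = s.T @ s + self.ridge * np.eye(s.shape[1])
        self.w_out = np.linalg.solve(a, s.T @ y)
        return self

    def predict(self, inputs):
        """Return readout outputs for `inputs`, starting from a zero state."""
        return self._states(inputs) @ self.w_out
```

Only `w_out` is learned; the input and reservoir weights are drawn once and left unchanged, which is what reduces training to a single linear solve.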
The short-term traffic flow multistep prediction method based on similarity search of time series consists mainly of pattern representation of traffic flow time series, similarity search of time series, and the prediction model. The basic process is shown in Figure
Building a complete and typical historical database: the historical traffic flow data that are strongly similar to the traffic flow time series to be predicted are selected to build the historical database. Generally, both temporal and dimensional factors should be considered to improve the quality of the historical database.
Pattern representation of time series: the landmark model is used to represent both the historical and the current traffic flow time series, which improves the efficiency of similarity search.
Similarity search of time series: the landmarks distance between the historical time series and the current time series is calculated to select similar traffic flow time series. The input data of the prediction model are then determined according to the similar time series.
Traffic flow multistep prediction model: short-term traffic flow multistep prediction is carried out using the echo state network model.
The process of traffic flow multistep prediction method.
The traffic flow data come from loop detectors located on a ten-kilometer-long expressway in Shanghai, China. This segment includes 24 mainline detecting sections and 30 ramp detecting sections, equipped with 88 mainline loop detectors and 60 ramp loop detectors, respectively. The experimental data were collected on five consecutive Mondays from September 1, 2008, to September 29, 2008. The time interval of the collected data is 20 s. Figure
The layout of the loop detectors.
Due to the stochastic volatility of traffic flow data collected every 20 s, such data are rarely used in traffic flow prediction, while five-minute traffic flow data are usually used in practical applications. Therefore, the original traffic flow data have been aggregated into five-minute intervals. In addition, some practical applications such as traffic flow guidance systems need not only real-time traffic flow information but also traffic flow information within the next hour. This paper therefore conducts twelve-step prediction for short-term traffic flow data, since twelve five-minute steps cover one hour. Figure
Traffic flow data from loop detectors.
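The aggregation from 20 s samples to five-minute intervals can be sketched as below. The sketch assumes that the detector values are vehicle counts, so that the 15 samples in each five-minute window are summed; if the raw values were flow rates, an average would be the natural choice instead. The function name and the handling of a trailing partial window are illustrative assumptions.

```python
from typing import List

SAMPLE_SECONDS = 20
INTERVAL_SECONDS = 5 * 60
SAMPLES_PER_INTERVAL = INTERVAL_SECONDS // SAMPLE_SECONDS  # 15 samples
STEPS_PER_HOUR = 60 * 60 // INTERVAL_SECONDS  # 12 prediction steps


def aggregate_counts(counts_20s: List[float],
                     factor: int = SAMPLES_PER_INTERVAL) -> List[float]:
    """Sum consecutive groups of `factor` samples; a trailing partial group is dropped."""
    n_full = len(counts_20s) // factor
    return [sum(counts_20s[i * factor:(i + 1) * factor]) for i in range(n_full)]
```

With these constants, one day of 20 s data (4320 samples) becomes 288 five-minute values, and a one-hour horizon corresponds to the twelve prediction steps used in the experiments.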
To evaluate the performance of the proposed traffic flow multistep prediction method, two measures are introduced: the mean absolute percentage error (MAPE) and the proportion in which the MAPE is in the range of
To verify the effectiveness of pattern representation, we take the traffic flow data collected from loop detector NBDX08(1) on September 1, 2008, as an example. The traffic flow data are represented using first-order landmarks. The MDPP (2.15%) is used to smooth the landmarks series. Figure
The effectiveness of pattern representation.
From Figure
Two parameters have to be addressed in the process of similarity search. One is the number of landmarks for similarity search denoted by
The MAPE corresponding to different parameter values.
28.6%  25.3%  21.9%  18.4%  18.6%
23.5%  18.5%  17.4%  15.5%  16.3%
21.8%  18.2%  19.7%  16.8%  17.2%
20.6%  17.8%  18.1%  17.2%  18.5%
As shown in Table
In order to display the predicted effect of the proposed method intuitively, Figure
The one-step prediction results based on the proposed method.
Because of their sound theoretical foundation and effectiveness in prediction, the ARIMA model and the back-propagation neural network (BPNN) model have gradually become standard methods against which newly developed forecasting models are compared. Therefore, this paper uses the ARIMA model and the BPNN model as benchmarks to evaluate the effectiveness of the proposed method. In addition, an ESN model whose input data are the data at the prior time instant of the prediction moments is also selected as a comparison method. The orders of the ARIMA model are determined based on the Akaike information criterion (AIC). The parameters of the BPNN model are selected as follows: the number of input layer units is 5, the number of hidden layer units is 8, the number of output layer units is 1, the activation function of the hidden layer units is the sigmoid function, and the activation function of the output units is a linear function. Figure
The MAPE of different methods from one-step to twelve-step prediction.
From Figure
Figure
The proportion in which the MAPE is less than 5% for different methods.
The proportion in which the MAPE is in a different range of the proposed method.
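The two evaluation measures used in this section can be sketched as follows. MAPE follows its standard definition. For the proportion measure, the sketch assumes that it counts the share of prediction points whose absolute percentage error falls in a given half-open range, which is an interpretation and not a definition confirmed by the text. The function names are illustrative, and the observed values are assumed to be nonzero.

```python
from typing import List, Sequence


def percentage_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute percentage error of each prediction, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes every observed value is nonzero.
    return [100 * abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    errors = percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


def proportion_in_range(actual: Sequence[float], predicted: Sequence[float],
                        low: float, high: float) -> float:
    """Share (in percent) of points whose error lies in [low, high)."""
    errors = percentage_errors(actual, predicted)
    hits = sum(1 for e in errors if low <= e < high)
    return 100 * hits / len(errors)
```

For a twelve-step forecast, these functions would be applied separately to the predictions at each step so that the error growth over the horizon can be compared across methods.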
This paper proposed a short-term traffic flow multistep prediction method based on similarity search of time series. The landmark model was used to represent the original time series of traffic flow data. The input data of the prediction model were then determined by searching for similar time series in a historical database. Finally, the echo state network model was used for short-term traffic flow multistep prediction. Expressway traffic flow data collected in Shanghai were employed to evaluate the prediction performance of the proposed method. The experimental results demonstrated that the proposed method achieves satisfactory accuracy, with a MAPE of about 15.5%. The comparative analysis showed that the multistep prediction performance of the proposed method exceeded not only that of the ARIMA and BPNN models but also that of the ESN model whose input data are the data at the prior time instant of the prediction moments. In addition, the proportion in which the MAPE is less than 20% reached up to 89.5% for the proposed method, which indicates that the proposed method produces high-quality forecasts most of the time.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors express their sincere appreciation to the Chinese National High Technology Research and Development Program Committee for the financial support provided under Grant no. 2014BAG03B03, China Postdoctoral Science Foundation no. 2013T60331, and Specialized Research Fund for the Doctoral Program of Higher Education no. 20120061120046.