Accurate traffic flow prediction is a prerequisite for intelligent traffic control and guidance, and it is also an objective requirement of intelligent traffic management. Because the urban transport system is strongly nonlinear, stochastic, and time-varying, artificial intelligence methods such as the support vector machine (SVM) are receiving increasing attention in this research field. Compared with traditional single-step prediction, multistep prediction can forecast the trend of the traffic state over a certain period in the future. From the perspective of dynamic decision making, this trend is far more important than the current traffic condition alone. In this paper, an accurate multistep traffic flow prediction model based on SVM is therefore proposed. The input vectors are composed of actual traffic volume data, and four different types of input vectors are compared with respect to prediction performance. Finally, the model is verified with actual data in an empirical analysis. The test results show that the proposed SVM model predicts traffic flow well and that the SVM-HPT model outperforms the other three models.
Accurate traffic flow prediction is an important research topic in intelligent transportation systems (ITS). Traffic flow information that is predicted rapidly and accurately is essential for traffic control, guidance, and the provision of traffic information services to the public. In recent years especially, urban traffic congestion has become a major problem for traffic management all over the world, with a series of effects that hinder sustainable social development and people’s daily work. Transit agencies have also realized that rapid and accurate traffic information can help them devise reasonable and effective traffic guidance strategies efficiently. Thus, there is a growing demand for accurate prediction of traffic flow duration and diffusion over a period of time.
The urban transport system is nonlinear, stochastic, and time-varying. To achieve accurate prediction, some scholars have applied traffic flow models to explain the dynamic changes and evolution of traffic flow. However, such models are relatively complex to construct and difficult to apply in practice, so prediction methods based on them run into problems in model construction and solving. In contrast, nonmathematical models such as nonparametric regression, neural networks, and SVM are now widely applied to traffic flow prediction because they are self-learning and do not require the construction of a complicated mathematical model.
On the other hand, most traditional prediction is based on the single-step method, which provides only the current or next-step traffic parameters from the traffic data obtained. This information is not sufficient for decision making by the public or by traffic agencies. Multistep traffic flow prediction is therefore essential for obtaining the trend of the traffic state over a certain period in the future. From the perspective of dynamic decision making, the development of traffic states over a coming period is more important than the current traffic state. For example, traffic congestion can be considered to have occurred when the obtained traffic data differ significantly from the predicted trend. The possible spatial spread and duration of the congestion can then be estimated from the traffic parameters given by the multistep prediction.
However, accurate prediction of traffic flow is very difficult because many stochastic variables are involved (e.g., traffic conditions). Therefore, deploying a traffic flow prediction model is a challenging task.
Accurate real-time traffic flow prediction is a prerequisite for, and the key to, intelligent traffic control and guidance, and it is also an objective requirement of intelligent traffic management. Over the past decades, various sophisticated techniques and algorithms have been developed for traffic flow prediction. These methods can be roughly divided into two categories: prediction methods based on mathematics and physics, including the historical average model, time series model, Kalman filter model, and exponential smoothing model; and methods based on nonmathematical models, such as the artificial neural network (ANN), nonparametric regression (NPR), and SVM. Only a brief introduction to the typical methods is given here; more detailed information can be found in the respective literature.
The Kalman filter, also known as linear quadratic estimation (LQE), is an efficient recursive procedure that estimates the future states of dependent variables. It originates from the state-space representations of modern control theory. Wang et al. [
The support vector machine, a novel supervised learning method used for classification and regression, has recently proved to be a promising tool for both data classification and pattern recognition [
In summary, because the urban traffic state is uncertain, nonlinear, and complex, some researchers use traffic flow models to describe the dynamic changes of the traffic state and to predict the evolution of traffic flow, and thereby obtain short-term traffic flow prediction models. However, the structure of a traffic flow model is relatively complex, which makes practical application difficult. Methods based on mathematical models struggle to meet practical real-time and accuracy requirements because the models are difficult to build and solve. In contrast, methods based on nonmathematical models do not need complex model building, and their prediction accuracy can meet the requirements of intelligent transportation systems. These methods have therefore been widely applied to traffic flow prediction.
Urban road traffic conditions are closely related not only to conditions in historical periods but also to the state of upstream and downstream roads. At present, most traffic flow prediction methods focus on the relationship between traffic conditions and historical traffic data along the time dimension, while the spatial dimension, such as upstream and downstream traffic conditions, and the daily variation of the traffic pattern are ignored. In addition, most current traffic flow prediction concentrates on single-step prediction, and much less on multistep prediction. In other words, prediction concentrates on estimating the traffic flow at a specific upcoming time point and is not concerned with the trend of traffic conditions over the following period. From the perspective of dynamic decision making, however, the trend of traffic conditions is more important to traffic management than the current traffic situation. It is therefore essential to establish a multistep model for traffic flow prediction that can estimate road traffic trends accurately, so that the prediction results can be used to make traffic management and travel decisions more rational.
This paper seeks to make two contributions to the literature. First, it develops models that predict traffic flow over multiple steps using real-world data, which is expected to help transit agencies devise reasonable and effective traffic guidance strategies efficiently. Second, to improve prediction accuracy, not only the historical traffic data but also the daily space-time sequence data are taken into consideration when constructing the input state vector. The performance of the proposed model can provide valuable insight for researchers as well as practitioners.
The structure of this paper is organized as follows: Section
SVM is a type of learning algorithm based on statistical learning theory that can be adjusted to map the complex input-output relationship of a nonlinear system without depending on a specific functional form. Unlike other nonlinear optimization methods, SVM training always reaches the globally optimal solution rather than being trapped at a local minimum, and it shows strong resistance to overfitting and high generalization performance.
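For orientation, the standard ε-insensitive support vector regression problem can be written as below. This is the common textbook formulation and not necessarily the notation used in this paper; the symbols $w$, $b$, $\phi$, $C$, $\varepsilon$, $\xi_i$, and $\xi_i^*$ are assumed here for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \\
\text{s.t.}\quad & y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i, \\
& w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i \ge 0,\quad \xi_i^* \ge 0,\qquad i = 1,\dots,n.
\end{aligned}
```

Because this is a convex quadratic program, any local minimum is also a global minimum, which is the property referred to above.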
Given the samples
Traffic flow data are time-series data characterized by autocorrelation and by similarity in their historical variation. Figure
Traffic volume diurnal variation trends.
Several further observations can be made. First, the traffic volume varies similarly from day to day because of residents’ regular daily travel. The data waveform resembles a saddle, and the peaks and valleys of the wave appear at the same times each day. In this paper, this regular pattern is named the historical change of similarity, and it can be used to establish a traffic flow variation model for predicting the future multistep traffic state. Second, although the daily traffic volume follows the same general trend, the patterns differ from one another. For example, the traffic volume curves are similar from Monday to Friday, but on Saturday and Sunday the curve shows a different pattern from the weekdays, and its peaks and valleys are not clearly distinguishable.
Therefore, a historical data model should be constructed that captures the historical traffic flow pattern for each day of the week (from Monday to Sunday). When the database is too small for this, separate models can instead be established for weekdays and weekends. In addition, the trend of traffic flow data can also be influenced by weather or holidays.
Besides the time-domain properties of traffic flow described above, the data are also correlated in the space domain. For a given road section, the traffic flow data upstream and downstream are correlated: the upstream traffic flow reaches the downstream section after a while, so the future downstream traffic flow condition can be predicted from this correlation.
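One simple way to realize such a historical pattern is to average the observed volumes per time interval, either for each day of the week or pooled into weekdays and weekends. The sketch below is only an illustration under that assumption; the function name `historical_profiles`, the record layout, and the sample values are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def historical_profiles(records, group_weekend=False):
    """Average volume per time interval, keyed by day type.

    records: iterable of (weekday, interval_index, volume),
             with weekday 0 = Monday ... 6 = Sunday (assumed layout).
    group_weekend: if True, pool days into "weekday" / "weekend",
                   which is the fallback when data are scarce.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for weekday, interval, volume in records:
        if group_weekend:
            key = "weekend" if weekday >= 5 else "weekday"
        else:
            key = weekday
        buckets[key][interval].append(volume)
    # Collapse each list of observations to its mean.
    return {
        key: {interval: mean(vols) for interval, vols in by_interval.items()}
        for key, by_interval in buckets.items()
    }


# Synthetic example values (not survey data).
records = [
    (0, 0, 100), (0, 1, 140),  # Monday
    (1, 0, 110), (1, 1, 150),  # Tuesday
    (5, 0, 60), (5, 1, 70),    # Saturday
    (6, 0, 50), (6, 1, 90),    # Sunday
]
per_day = historical_profiles(records)
pooled = historical_profiles(records, group_weekend=True)
```

With more weeks of data, each per-day profile would average several observations per interval; with a single week, as in this toy input, only the pooled profiles smooth across days.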
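The delay between upstream and downstream flow can be estimated, for example, by finding the lag that maximizes the correlation between the two series. The following is a minimal sketch of that idea, not the procedure used in this paper; `pearson`, `best_lag`, and the synthetic series are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_lag(upstream, downstream, max_lag):
    """Return (lag, scores): the lag in intervals at which upstream[t]
    correlates best with downstream[t + lag], plus all lag scores."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = upstream[: len(upstream) - lag]
        ys = downstream[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Synthetic data: downstream is the upstream series delayed by 2 intervals.
upstream = [10, 12, 20, 35, 30, 18, 12, 11, 15, 25]
downstream = [9, 9] + upstream[:-2]
lag, scores = best_lag(upstream, downstream, max_lag=4)
```

On real detector data the correlation peak is broader and noisier, so the estimated lag is best treated as a rough guide to how many upstream intervals are worth including in the input vector.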
Given the traffic volume
Based on the description above and the space-time characteristics of the traffic flow data, in order to predict the traffic flow for the
autocorrelation time feature vector:
historical model feature vector:
space correlation feature vector:
The relationship between each input vector and output vector is shown in Figure
Prediction mechanism based on the traffic flow space-time characteristics.
From a theoretical standpoint, in the single-step mode accurate results can be obtained by considering only the time state and space state vectors, because traffic flow changes gradually. In the multistep mode, by contrast, the trend of traffic flow plays a more important role in prediction, so more accurate results can be achieved by taking the historical data into account.
Based on the above analysis, four combinations of the three state vectors are used as input data for the SVM:
the combination of one-dimensional data for the specified road section in the previous intervals (T):
the combination of multidimensional space-time data of the upstream and the specified road section (PT):
the combination of the time-series data and the historical pattern data for the specified road section in the current interval (HT):
the combination of the time-series data for the upstream and the specified road section in the current interval and the historical pattern data for the target road section (HPT).
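The four combinations listed above can be illustrated with a short sketch. The exact vector definitions are not reproduced here, so the layout below is an assumption: `m` lagged volumes for the target and upstream sections, and a single historical-pattern value taken at the interval being predicted. The function `build_inputs` and all sample numbers are hypothetical.

```python
def build_inputs(target, upstream, history, t, m, k):
    """Assemble the four candidate input vectors at time index t.

    target, upstream: observed volumes per interval (lists of equal length)
    history: historical mean volume per interval (assumed form of the pattern)
    m: number of lagged intervals to include
    k: prediction horizon in steps
    """
    if t - m + 1 < 0 or t + k >= len(history):
        raise IndexError("not enough data around index t")
    own = target[t - m + 1 : t + 1]   # time-series data of the target section
    up = upstream[t - m + 1 : t + 1]  # time-series data of the upstream section
    hist = [history[t + k]]           # historical pattern at the predicted interval
    return {
        "T": own,
        "PT": up + own,
        "HT": own + hist,
        "HPT": up + own + hist,
    }


# Synthetic volumes for six consecutive intervals.
target = [50, 55, 60, 70, 80, 90]
upstream = [40, 45, 52, 63, 75, 85]
history = [48, 54, 61, 69, 78, 88]
vectors = build_inputs(target, upstream, history, t=3, m=3, k=2)
```

Under these assumptions the HPT vector is simply the PT vector extended by the historical value, so the four models differ only in which information sources they see.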
Each of the combination vectors above are used as the input variable of the SVM model, and
As an indicator of the accuracy and usability of a prediction model, error plays an important role in model evaluation. Common error indicators include the absolute prediction error and the relative prediction error, which are calculated as follows, respectively:
To obtain an objective and accurate evaluation of the prediction performance, the mean value of all the errors is used as the evaluation index for multistep prediction. The formulas are as follows:
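As an illustration of these indicators, the sketch below computes per-sample absolute and relative errors and their means. Taking magnitudes (rather than signed differences) and dividing by the actual value are assumptions based on common usage; the function names and sample values are hypothetical.

```python
def absolute_errors(actual, predicted):
    """Magnitude of the prediction error for each sample."""
    return [abs(p - a) for a, p in zip(actual, predicted)]


def relative_errors(actual, predicted):
    """Error magnitude relative to the actual value (assumes actual != 0)."""
    return [abs(p - a) / a for a, p in zip(actual, predicted)]


def mean_absolute_error(actual, predicted):
    errors = absolute_errors(actual, predicted)
    return sum(errors) / len(errors)


def mean_relative_error(actual, predicted):
    errors = relative_errors(actual, predicted)
    return sum(errors) / len(errors)


# Synthetic volumes (vehicles per 5 minutes), for illustration only.
actual = [100, 120, 80, 90]
predicted = [110, 114, 76, 99]
mae = mean_absolute_error(actual, predicted)   # 7.25
mre = mean_relative_error(actual, predicted)   # about 0.075
```

The mean relative error is scale-free, which makes peak and off-peak periods comparable even though their volumes differ.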
The presented SVM model for traffic flow prediction was tested with actual survey data from Gaoerji Road in Dalian, China, collected from May 14 (Monday) to May 20 (Sunday), 2012. Gaoerji Road is a one-way road with several entrances and exits that runs from Zhongshan Road to Yierjiu Street through the city center over a distance of 3.8 km. Traffic is highly congested during the morning and afternoon peaks. In this research, the survey started at 7:00 in the morning. The data were collected with SCOOT (Split, Cycle, and Offset Optimization Technique) and consist of traffic volumes during the peak time (PT; 0700–0830 h) and off-peak time (OT; 1000–1200 h), recorded once every 5 minutes.
The raw datasets obtained with SCOOT contain missing values, erroneous values, and random noise. The data must therefore first be identified, repaired, and smoothed, after which the time-series data are normalized. The main advantage of normalization is that it accelerates the convergence of the iterations during SVM training; another is that it facilitates subsequent data processing. Samples with very large attribute values might cause numerical problems, because kernel values usually depend on the inner products of feature vectors, as in the linear kernel or the polynomial kernel. It is therefore recommended that each attribute be linearly scaled to the range [−1, +1] or [0, +1]. In this research, the datasets were scaled to the range between 0 and 1.
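Linear scaling to the range between 0 and 1 can be sketched as follows. This is a generic min-max scaler rather than the paper's code; the helper names and sample volumes are hypothetical, and reusing the bounds found on one dataset for later data is an assumption about good practice.

```python
def fit_minmax(values):
    """Return the (lo, hi) bounds used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to recover volumes from model outputs."""
    return [s * (hi - lo) + lo for s in scaled]


# Synthetic traffic volumes, for illustration only.
volumes = [120, 180, 240, 300, 360]
lo, hi = fit_minmax(volumes)
scaled = scale(volumes, lo, hi)       # [0.0, 0.25, 0.5, 0.75, 1.0]
restored = unscale(scaled, lo, hi)
```

Predictions made in the scaled space need to be passed through `unscale` before errors are reported in vehicle counts.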
In Section
Four popular kernel functions.
Kernel function  Expression  Comment
Linear kernel
Polynomial kernel
Radial basis function (RBF) kernel
Sigmoid kernel
The previous researches [
The structure of the SVMs model for
Structure of the SVMs model for traffic flow prediction.
To determine the SVM’s input vector for practical application, comparison tests were conducted between models with different input variables (T, PT, HT, and HPT), each calibrated on the collected data. The prediction errors of the four models for the off-peak and peak periods are shown in Figure
Prediction error results of the SVM model with different input vectors.
The data processing was divided into three steps: training, cross-validation, and testing. First, about 10% of the sample data were set aside as testing data. Then, 70% of the remaining samples were assigned to training and the rest to cross-validation. The training and testing processes used the same datasets for all four models so that the comparison has a common basis.
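The partition described above can be sketched as follows. How the samples were selected is not stated, so the seeded random shuffle is an assumption, as are the function name `split_samples` and the sample count of 200.

```python
import random


def split_samples(samples, test_frac=0.10, train_frac=0.70, seed=0):
    """Hold out test_frac of the samples for testing, then divide the
    remainder into training (train_frac) and cross-validation (the rest)."""
    rng = random.Random(seed)      # fixed seed: same partition on every run
    shuffled = list(samples)
    rng.shuffle(shuffled)          # assumption: random selection of samples
    n_test = round(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    rest = shuffled[n_test:]
    n_train = round(len(rest) * train_frac)
    return rest[:n_train], rest[n_train:], test


# 200 hypothetical sample indices.
train, validation, test = split_samples(range(200))
sizes = (len(train), len(validation), len(test))   # (126, 54, 20)
```

Fixing the seed and applying the resulting index partition to all four input-vector variants gives every model the same training and testing samples, which is the common basis of comparison mentioned above.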
From the comparison results of the four models (
First, in the case of single-step prediction, the number of prediction steps
Second, in the case of multistep prediction, the number of prediction steps
It is important to note that the execution performance of
In order to verify the actual performance of the prediction model presented above, a test was conducted with
Example of dynamic multistep prediction results of
The urban transport system is nonlinear, stochastic, and time-varying, and artificial intelligence methods are therefore receiving increasing attention in ITS. To predict traffic flow accurately over multiple steps, this paper proposed an SVM model. Traffic volume data from actual observation surveys in the urban area of Dalian were used to predict traffic flow with SVM models. To assess the prediction performance with different input vectors, comparison tests were conducted with different input variables (T, PT, HT, and HPT). The test results showed that the proposed SVM model had a good ability for traffic flow prediction, and the comparison of different input vectors indicated that the
In this paper, only traffic volume data were used to estimate the current traffic conditions. Further study will analyze more factors, such as the dimensions of the input vectors and the number of prediction steps, so as to enhance the performance of the proposed prediction models.
This work was jointly supported by grants from the Humanities and Social Sciences Foundation of the Ministry of Education in China (Project no. 12YJCZH280), the Ph.D. Programs Foundation of Ministry of Education of China (Project no. 20112125120004), and National Natural Science Foundation of China (Project no. 11272075).