Short-term traffic prediction is a key component of Intelligent Transportation Systems. It uses historical data to construct models for reliably predicting traffic state at specific locations in road networks in the near future. Despite being a mature field, short-term traffic prediction still poses some open problems related to the choice of optimal data resolution, prediction of nonrecurring congestion, and the modelling of relevant spatiotemporal dependencies. As a step towards addressing these problems, this paper investigates the ability of Artificial Neural Networks, Random Forests, and Support Vector Regression algorithms to reliably model traffic flow at different data resolutions and respond to unexpected traffic incidents. We also explore different feature selection methods to identify and better understand the spatiotemporal attributes that most influence the reliability of these models. Experimental results indicate that data aggregation does not necessarily achieve good performance for multivariate spatiotemporal machine learning models. The models learned using high-resolution 30-second input data outperformed the corresponding baseline ARIMA models by
Traffic congestion results in significant monetary losses in countries around the world, with the cost of traffic congestion in 2014 estimated to be
The key contributions of this paper are as follows: (1) Explore the effect of the resolution of multivariate spatiotemporal input data on the accuracy of short-term traffic prediction models; we specifically consider models built using Artificial Neural Networks, Support Vector Regression, and Random Forests. (2) Evaluate the responsiveness of these predictive models to nonrecurring congestion events; specifically, we study the reliability of the predictions provided by these models in the presence of unexpected events such as accidents. (3) Identify the spatiotemporal traffic attributes that most influence the performance of these models and their ability to model the complex dependencies in traffic data.
We illustrate these contributions using historical data of volume and occupancy measurements on a highway in Auckland (New Zealand). We first motivate the need for the proposed study by discussing related work in Section
Many algorithms have been developed for short-term traffic prediction, which is a complex problem influenced by a variety of factors such as the resolution (i.e., the aggregation level) of the input and output data, and spatiotemporal dynamics. We review some of the related work in this section.
Although studies in the existing literature predominantly use data aggregated over 5 min and 15 min intervals, some prior studies have investigated the effect of data resolution on the reliability of the predictions provided by the corresponding models; the results have, however, been inconclusive. For instance, Park et al. [
The use of high-resolution data is challenging for multiple reasons. First, for some statistical models used for short-term traffic state prediction, it is necessary to ensure that the input data and the output data have the same aggregation level, but this constraint can be relaxed when machine learning algorithms are used to build predictive models. Second, research shows that high-resolution data (as expected) includes more accurate measurements; for example, Martin et al. [
There has been considerable research on analyzing the effects of spatiotemporal dynamics. For instance, Kamarianakis and Prastacos [
In addition to the approaches that build on the ARIMA models [
There is no agreement in the literature regarding the number of upstream and downstream links (neighbouring any link of interest) that should be considered while building the predictive models. While some algorithms consider just one upstream or downstream link [
Most existing work on short-term traffic prediction focuses on typical conditions [
In this study, we explore three machine learning algorithms that have demonstrated the ability to incorporate spatiotemporal data in predictive models built for intelligent transportation and other applications. Specifically, we explore (1) Artificial Neural Networks (ANN), (2) Support Vector Regression (SVR), and (3) Random Forests (RF). We chose ANN and SVR because they are the machine learning algorithms most widely used to build predictive models in the literature. We chose Random Forests because it is an ensemble learning algorithm that requires only a small number of parameters to be tuned. Please note that the primary objective of our study was not to introduce new algorithms. Instead, we make three key contributions. First, we examine how the predictive accuracy of models based on these algorithms changes as a function of the aggregation level of the input data. Second, we explore the ability of these models to respond accurately to nonrecurring congestion conditions. Third, we identify the spatiotemporal attributes that most influence the predictive accuracy of these models and their ability to model the complex dependencies in traffic data.
This section introduces the study area and data and provides a mathematical formulation of the short-term traffic prediction problem (Section
This study was carried out in a
Study area with 45 road segments spread over
Traffic can be measured in different ways. The most common sensor used to collect traffic data is the Inductive Loop Detector, which comes in different forms. Dual loop detectors, which have two inductive loops placed a short distance apart, are able to accurately capture the speed of a vehicle going over them, the volume (i.e., count of vehicles passing the detector), and occupancy (i.e., the amount of time a vehicle was over the detector). However, most of the loops in many cities (including Auckland) are single loop detectors, which can measure volume and occupancy but can only estimate vehicle speed as a function of these measured values and the average effective vehicle length. Research shows that measuring speed with a constant effective vehicle length can lead to errors of up to
The fundamental model of traffic flow established by traffic engineers considers the relationship between three key traffic variables: (1) flow (volume), (2) density, and (3) speed. Since density is difficult to measure directly, occupancy is frequently used as a substitute [
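The single-loop speed estimate mentioned above can be sketched as follows. This is a minimal illustration of the standard relation between flow, occupancy, and speed; the constant effective vehicle length used here (6 m) is an illustrative assumption, and using a single constant for a mixed fleet is precisely the source of the estimation error discussed in the text.

```python
def estimate_speed_kmh(volume_veh_per_h, occupancy_fraction,
                       effective_length_m=6.0):
    """Estimate speed from single-loop detector measurements.

    Uses the standard relation occupancy = flow * effective_length / speed,
    rearranged as speed = flow * effective_length / occupancy. The constant
    effective vehicle length (6 m here) is an illustrative assumption.
    """
    if occupancy_fraction <= 0:
        raise ValueError("occupancy must be positive to estimate speed")
    effective_length_km = effective_length_m / 1000.0
    return volume_veh_per_h * effective_length_km / occupancy_fraction
```

For example, a flow of 1800 veh/h with 10% occupancy and a 6 m effective length yields an estimated speed of 108 km/h.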
For each predictive model, the input vector
Data from 30 days of April 2016 was collected for 45 segments
The dataset was preprocessed to remove some extreme values that were highly unlikely. First, we used winsorization [
Volume values from segment
Second, we scaled each attribute in the input data to lie
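The two preprocessing steps can be sketched as follows. The winsorization percentiles (1st/99th) and the [0, 1] target range used here are illustrative assumptions, not necessarily the paper's exact settings.

```python
import numpy as np

def preprocess(X, lower_pct=1.0, upper_pct=99.0):
    """Winsorize each attribute (column), then min-max scale it.

    The percentile cut-offs and the [0, 1] target range are illustrative;
    the paper's exact thresholds are not reproduced here.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        lo, hi = np.percentile(X[:, j], [lower_pct, upper_pct])
        X[:, j] = np.clip(X[:, j], lo, hi)  # winsorize: cap extreme values
        rng = X[:, j].max() - X[:, j].min()
        if rng > 0:
            X[:, j] = (X[:, j] - X[:, j].min()) / rng  # scale to [0, 1]
    return X
```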
Training of the models was accomplished using data from the first 20 days (57,600 samples), and data corresponding to the remaining ten days was used for testing. The parameters of each model were tuned using the training dataset. Next, we briefly discuss the algorithms that we used to build the models for short-term traffic prediction.
In this section, we describe the three machine learning algorithms used to build the predictive models explored in this paper: Artificial Neural Networks (Section
Feedforward neural networks or multilayer perceptrons are the most common Artificial Neural Network (ANN) models. A neural network is composed of neurons arranged in layers with each layer containing one or more neurons. Each neuron is connected to all the neurons in its adjacent layers, and neurons within a layer are not connected. Each neuron takes a linear weighted sum of all its inputs
Each such output
The weights
Although the nonlinear activation function in a neural network has traditionally been the sigmoid function, empirical results have indicated that the rectified linear unit (ReLU) activation function improves the ability to model complex relationships and reduces the time taken to train the model [
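A minimal sketch of such a feedforward network with ReLU activations, using scikit-learn; the hidden-layer sizes, synthetic data, and training settings here are placeholder assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the volume/occupancy feature matrix.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = X @ rng.random(8)  # simple target for illustration only

ann = MLPRegressor(hidden_layer_sizes=(32,),  # placeholder architecture
                   activation="relu",         # ReLU, as discussed in the text
                   solver="adam",
                   max_iter=1000,
                   random_state=0)
ann.fit(X, y)
y_pred = ann.predict(X)
```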
For classification problems, a Support Vector Machine computes a decision boundary that maximizes the margin between this boundary and the closest data sample. Support Vector Regression (SVR) uses a similar approach for regression problems—errors corresponding to estimated values within an
We can also incorporate kernel functions to extend SVR to nonlinear problems. Popular kernels include the linear kernel and the Radial Basis Function (RBF) kernel; the latter implicitly transforms the input sample into a higher-dimensional space, which can result in better separation (for classification) or estimation of values (for regression). We experimentally chose a linear kernel for SVR because it provided better results.
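A minimal sketch of linear-kernel SVR with scikit-learn; the synthetic data and the C and epsilon values are illustrative defaults, not the paper's tuned settings.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((200, 8))
y = X @ rng.random(8)  # simple target for illustration only

# Linear kernel, following the text; C and epsilon are illustrative.
svr = SVR(kernel="linear", C=1.0, epsilon=0.1)
svr.fit(X, y)
y_pred = svr.predict(X)
```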
Random Forest (RF) [ ] is an ensemble learning method built by repeating two steps: pick a random subset of the training data by sampling with replacement, and train a decision tree on that subset.
In other words, each subset created by sampling from the training set with replacement results in a decision tree. The prediction for any test input
This approach ensures that individual trees are not highly correlated because of a small number of strong predictors. RF methods are popular because they provide some robustness to noisy data with outliers. They are also able to focus on the attributes most useful to the regression or classification task under consideration and ignore attributes that are less relevant. In our study, we used an RF with 100 trees.
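A minimal sketch of such an RF regressor with scikit-learn; the synthetic data is a placeholder, and apart from the 100 trees mentioned in the text, all settings are library defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.random((200, 8))
y = X @ rng.random(8)  # simple target for illustration only

# 100 trees, as in the study; other hyperparameters are defaults.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
y_pred = rf.predict(X)
```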
We experimentally evaluated the following hypotheses regarding the predictive models learned using the machine learning algorithms: (1) The learned models are able to disregard the amplification of noise and variations in high-resolution data and provide higher accuracy than models that do not use high-resolution data. (2) The learned models are responsive to nonrecurring congestion events such as accidents, and this ability improves with the increase in the resolution of the data. (3) The learned models are able to capture the complex spatiotemporal evolution of traffic by assigning higher importance to volume and occupancy attributes extracted from segments near the segment of interest.
As baselines for comparison, wherever appropriate, we used two established methods for volume prediction in existing literature (ARIMA, historical average). To experimentally evaluate the hypotheses, we used three measures: accuracy, root mean square error (RMSE), and mean absolute error (MAE), defined as follows:
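The RMSE and MAE follow their standard definitions, shown here for reference for a test set with true values $y_i$ and predictions $\hat{y}_i$ (the exact accuracy measure used in the paper is defined in the original equations and is not reconstructed here):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
$$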
To quantify responsiveness to nonrecurring conditions, we computed these measures over samples that were representative of nonrecurring conditions. Specifically, a sample
This section discusses the results of experimentally evaluating the three hypotheses listed in Section
As stated in Section
The results summarized in Table
Traffic volume prediction under all conditions.
Model | Input resolution (minutes) | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.5 | 5 | 15 | |||||||
Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | |
ANN | 0.906 (0.01) | 34.5 (11.7) | 23.8 (8.6) | 0.889 (0.01) | 44.5 (16.9) | 30.1 (11.5) | 0.865 (0.013) | 53.6 (24.8) | 37.3 (16.9) |
RF | 0.904 (0.01) | 34.0 (11.2) | 23.8 (8.5) | 0.890 (0.013) | 39.9 (13.3) | 28.1 (9.7) | |||
SVR | 0.905 (0.01) | 34.7 (12.2) | 24.4 (8.8) | 0.894 (0.01) | 39.5 (14.5) | 27.9 (10.6) | 0.882 (0.007) | 43.7 (16.3) | 30.9 (11.9) |
Historical avg. | 0.806 (0.01) | 79.7 (35.7) | 43.5 (17.4) | 0.806 (0.01) | 79.7 (35.7) | 43.5 (17.4) | 0.806 (0.01) | 79.7 (35.7) | 43.5 (17.4) |
ARIMA | 0.839 (0.02) | 54.6 (18.3) | 39.1 (13.2) | 0.879 (0.01) | 43.8 (15.6) | 30.6 (11.4) | 0.881 (0.01) | 44.3 (16.3) | 30.1 (11.4) |
Standard deviations across segments are reported in parentheses and numbers in boldface show the best results.
The results in Table
Diebold–Mariano test statistic for each pair of models for predicting volume.
0.5 min | 5 min | 15 min | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
ANN | RF | SVR | ANN | RF | SVR | ANN | RF | SVR | ||
0.5 min | ANN | — | 36.88 | 12.69 | −54.59 | 23.10 | −19.35 | −71.81 | −17.20 | −38.30 |
RF | −36.88 | — | −27.87 | −69.47 | −11.95 | −50.30 | −94.20 | −44.32 | −63.09 | |
SVR | −12.69 | 27.87 | — | −65.29 | 14.55 | −52.00 | −92.40 | −31.63 | −62.36 | |
5 min | ANN | 54.59 | 69.47 | 65.29 | — | 71.18 | 48.64 | −11.23 | 43.96 | 25.76 |
RF | −23.10 | 11.95 | −14.55 | −71.18 | — | −50.28 | −100.1 | −45.49 | −66.02 | |
SVR | 19.35 | 50.30 | 52.00 | −48.64 | 50.28 | — | −79.88 | −57.11 | ||
15 min | ANN | 71.81 | 94.20 | 92.40 | 11.23 | 100.1 | 79.88 | — | 68.62 | 51.22 |
RF | 17.20 | 44.32 | 31.63 | −43.96 | 45.49 | − | −68.62 | — | −29.18 | |
SVR | 38.30 | 63.09 | 62.36 | −25.76 | 66.02 | 57.11 | −51.22 | 29.18 | — |
Critical value:
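The Diebold–Mariano statistic reported above can be sketched as follows. This is a simplified version under squared-error loss that uses the plain sample variance of the loss differential and omits the HAC (autocorrelation-consistent) variance correction commonly applied for multi-step forecasts; the paper's exact variant may differ.

```python
import numpy as np

def dm_statistic(errors_a, errors_b):
    """Diebold-Mariano statistic for two competing forecast error series.

    Positive values indicate model A has larger squared-error loss than
    model B. Simplified: no HAC correction of the variance estimate.
    """
    e_a = np.asarray(errors_a, dtype=float)
    e_b = np.asarray(errors_b, dtype=float)
    d = e_a ** 2 - e_b ** 2          # loss differential series
    n = d.size
    return d.mean() / np.sqrt(d.var(ddof=1) / n)
```

By construction the statistic is antisymmetric: swapping the two error series flips its sign.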
Table
Traffic occupancy prediction under all conditions.
Model | Input resolution (minutes) | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.5 | 5 | 15 | |||||||
Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | |
ANN | 0.859 (0.02) | 1.98 (0.64) | 1.00 (0.37) | 0.838 (0.01) | 2.59 (0.74) | 1.27 (0.44) | 0.780 (0.03) | 3.51 (0.89) | 1.70 (0.57) |
RF | 0.850 (0.01) | 2.17 (0.55) | 1.07 (0.35) | 0.80 (0.03) | 2.80 (0.70) | 1.43 (0.47) | |||
SVR | 0.858 (0.01) | 1.88 (0.46) | 0.95 (0.30) | 0.829 (0.01) | 2.13 (0.52) | 1.12 (0.33) | 0.732 (0.04) | 2.54 (0.59) | 1.45 (0.34) |
Historical avg. | 0.433 (0.02) | 7.49 (4.50) | 3.56 (1.02) | 0.433 (0.02) | 7.49 (4.50) | 3.56 (1.02) | 0.433 (0.02) | 7.49 (4.50) | 3.56 (1.02) |
ARIMA | 0.689 (0.04) | 20.5 (4.71) | 10.1 (2.65) | 0.833 (0.02) | 2.37 (0.70) | 1.17 (0.41) | 0.834 (0.02) | 2.59 (0.80) | 1.22 (0.43) |
Standard deviations across segments are reported in parentheses and numbers in boldface show the best results.
Next, the average accuracy and MAE at different times of the day, for the three different data aggregation levels, are shown in Figure
Accuracy and MAE at different times of the day. Shaded areas are the
The results discussed so far support the first hypothesis that predictive models based on machine learning algorithms are able to disregard the amplification of noise in high-resolution data and provide higher accuracy than models that do not use the high-resolution data. The lower accuracy values during overnight hours can be explained by the accuracy being represented as a percentage of vehicles and the average number of vehicles overnight being significantly lower; this is confirmed by the lower MAE values for the same period.
Next, we evaluated the second hypothesis by examining the responsiveness of the predictive models to nonrecurring congestion events. We did so by only evaluating the trained predictive models on a subset of the test set comprising samples that were significantly different from historical average values. The results are summarized in Tables
Traffic volume prediction under nonrecurring congestion conditions.
Model | Input resolution (minutes) | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.5 | 5 | 15 | |||||||
Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | |
ANN | 0.880 (0.02) | 66.5 (24.4) | 46.2 (17.0) | 0.840 (0.03) | 80.2 (29.7) | 59.3 (22.8) | |||
RF | 0.900 (0.01) | 50.1 (17.4) | 37.4 (13.2) | 0.890 (0.01) | 57.3 (19.6) | 42.0 (15.2) | 0.860 (0.02) | 66.6 (21.1) | 50.9 (16.0) |
SVR | 0.892 (0.02) | 56.0 (18.9) | 41.0 (14.1) | 0.870 (0.02) | 67.2 (21.4) | 49.7 (16.1) | 0.850 (0.03) | 76.1 (22.9) | 56.4 (17.0) |
Historical avg. | 0.139 (0.08) | 232 (109) | 192 (83.6) | 0.139 (0.08) | 232 (109) | 192 (83.6) | 0.139 (0.08) | 232 (109) | 192 (83.6) |
ARIMA | 0.851 (0.02) | 73.8 (20.5) | 54.2 (15.5) | 0.670 (0.02) | 176 (157) | 126 (48) | 0.860 (0.02) | 77.7 (30.4) | 51.6 (19.7) |
Standard deviations across segments are reported in parentheses and numbers in boldface show the best results.
Traffic occupancy prediction under nonrecurring congestion conditions.
Model | Input resolution (minutes) | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.5 | 5 | 15 | |||||||
Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | Accuracy | RMSE | MAE | |
ANN | 0.869 (0.01) | 1.93 (1.27) | 0.94 (0.29) | 0.837 (0.01) | 2.77 (1.72) | 1.32 (0.34) | 0.80 (0.02) | 3.50 (2.19) | 1.63 (0.48) |
RF | 0.850 (0.01) | 2.21 (1.42) | 1.07 (0.32) | 0.796 (0.02) | 2.85 (1.83) | 1.42 (0.43) | |||
SVR | 0.858 (0.01) | 1.92 (1.20) | 0.95 (0.23) | 0.828 (0.01) | 2.18 (1.38) | 1.13 (0.28) | 0.73 (0.03) | 2.58 (1.60) | 1.44 (0.31) |
Historical avg. | −1.57 (0.83) | 18.0 (7.92) | 16.4 (1.76) | −1.57 (0.83) | 18.0 (7.92) | 16.4 (1.76) | −1.57 (0.83) | 18.0 (7.92) | 16.4 (1.76) |
Standard deviations across segments are reported in parentheses, and numbers in boldface show the best results.
To further explore the responsiveness of the learned models, we examined a known (i.e., reported) breakdown along the motorway in more detail. Figure
(a) Traffic volume in segment 23 on April 21, 2016, (Thursday) compared with the historical weekly average and (b) tweets from NZTA accessed from [
Figures
Traffic volume predictions in response to a nonrecurring congestion event for different input data aggregation levels (
For additional examples of the models' predictions during nonrecurring congestion events, see Figure
Additional examples of nonrecurring congestion events. Predictive models based on machine learning methods provide good tracking performance, especially at the high-resolution (
Figure
Traffic volume prediction on April 25, 2016, a public holiday in New Zealand (ANZAC day).
Table
Training and testing time for each of the three learned models for short-term traffic prediction. Results indicate that these learned models will scale well for short-term predictions in large road networks.
Average training time for 57600 samples (seconds) | Average prediction time for one input sample (milliseconds) | |
---|---|---|
ANN | 283.8 | 0.16 |
RF | 1154 | 82.08 |
SVR | 4.743 | 0.0223 |
We did not optimize our algorithms for speed; the training and prediction times could have been reduced by using fewer training samples or by tuning the algorithms' parameters, for example, by using a smaller number of trees in the Random Forest or a smaller neural network. The different algorithms take different amounts of time for training and testing; for example, models based on the (linear) SVR algorithm have the lowest training and testing times, whereas the nonlinear SVR models have a much longer training time (
Overall, we believe that models based on these machine learning methods will scale to large road networks. The retraining of the models can be undertaken as new data comes in over several weeks or months, enabling the system to adapt to changes in the road network.
Next, we evaluate the third hypothesis regarding the ability to model the complex spatiotemporal evolution of traffic. To do so, we first identify the attributes that most influence the performance of the learned predictive models.
One common approach for identifying informative attributes is to compute the Pearson correlation coefficient between the target variable and each of the input attributes [
There are different ways to characterize the importance of attributes in RF-based models. Since any RF is a collection of decision trees, the
Figures
Ranking of attributes in terms of their relative importance to the performance of ANN models, for each of the three different input data aggregation levels (segment 23). The plots for the volume features are on the left and those for the occupancy features are on the right. (a) 30 sec aggregation level. (b) 5 min aggregation level. (c) 15 min aggregation level.
Ranking of attributes in terms of their relative importance to the performance of SVR models, for three different input data aggregation levels (segment 23). The plots for the volume features are on the left and those for the occupancy features are on the right. (a) 30 sec aggregation level. (b) 5 min aggregation level. (c) 15 min aggregation level.
Ranking of attributes in terms of their relative importance to the performance of RF models, for three different input data aggregation levels (segment 23). The plots for the volume features are on the left and those for the occupancy features are on the right. (a) 30 sec aggregation level. (b) 5 min aggregation level. (c) 15 min aggregation level.
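As a concrete illustration of how an attribute ranking is obtained from an RF model, the sketch below uses scikit-learn's impurity-based importances on synthetic data standing in for the traffic attributes; the data-generating coefficients are chosen purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.random((300, 5))
# Feature 0 dominates the target by construction (illustrative only).
y = 3.0 * X[:, 0] + 0.1 * X[:, 4] + rng.normal(0.0, 0.01, 300)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
# Rank attributes from most to least important.
ranking = np.argsort(rf.feature_importances_)[::-1]
```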
A more careful examination of the results indicated that the predictive models based on SVR and RF assign higher importance to volume attributes than to occupancy attributes when making decisions. Also, the same set of attributes does not contribute significantly to the performance of all three models. For all three models, the attributes considered important change when the resolution of the input data changes. For instance, for the models based on the
To further analyze the importance of the attributes, we considered the relative importance of different subsets of these ranked attributes. We observed that the performance, specifically accuracy, flattens out after including
Accuracy of each of the three models for the
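Recursive Feature Elimination (RFE), used for the rankings above, repeatedly fits an estimator and drops the least important attribute until the desired number remains. The sketch below uses scikit-learn's `RFE` with an RF base estimator on synthetic data; the informative features and the target size of the subset are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(4)
X = rng.random((300, 10))
# Only features 1 and 3 carry signal, by construction (illustrative only).
y = 2.0 * X[:, 1] + 2.0 * X[:, 3] + rng.normal(0.0, 0.01, 300)

# RFE: fit, drop the least important feature, repeat until 2 remain.
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=2)
selector.fit(X, y)
selected = [i for i, keep in enumerate(selector.support_) if keep]
```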
Finally, we compared the performance of the RFE approach for ranking attributes with the more common correlation-based approach and an approach that chose important attributes randomly; we considered the performance of the corresponding models under normal conditions and in the presence of nonrecurring congestion events. Tables
Comparison of feature selection methods; traffic volume predictions under all conditions with
Model | Features | Accuracy | RMSE | MAE | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Correlation | Random | RFE | Correlation | Random | RFE | Correlation | Random | RFE | ||
ANN | 10 | 0.864 | 0.853 | 0.863 | 40.5 | 47.0 | 42.2 | 27.9 | 32.7 | 29.1 |
20 | 0.880 | 0.850 | 0.878 | 37.7 | 46.2 | 34.7 | 25.7 | 32.9 | 24.8 | |
40 | 0.891 | 0.866 | 0.896 | 35.4 | 39.5 | 31.2 | 24.3 | 28.0 | 21.7 | |
60 | 0.893 | 0.877 | 0.900 | 35.5 | 36.9 | 29.0 | 23.8 | 26.5 | 20.8 | |
80 | 0.893 | 0.881 | 0.905 | 35.2 | 36.6 | 23.7 | 25.5 | |||
100 | 0.894 | 34.9 | 27.5 | 23.5 | 19.6 | |||||
120 | 0.894 | 0.885 | 0.904 | 34.6 | 33.8 | 29.0 | 23.3 | 23.9 | 20.4 | |
140 | 0.889 | 0.889 | 0.900 | 35.4 | 33.1 | 28.7 | 23.8 | 23.6 | 20.3 | |
160 | 0.897 | 0.879 | 0.902 | 33.1 | 37.6 | 29.6 | 22.5 | 26.3 | 20.5 | |
180 | 0.884 | 0.896 | 34.3 | 31.4 | 24.4 | 21.7 | ||||
RF | 10 | 0.871 | 0.841 | 0.886 | 39.9 | 49.6 | 34.6 | 27.5 | 34.5 | 24.2 |
20 | 0.880 | 0.847 | 0.897 | 37.5 | 45.0 | 30.4 | 25.9 | 32.6 | 21.5 | |
40 | 0.886 | 0.875 | 0.902 | 36.1 | 36.2 | 28.3 | 24.7 | 25.9 | 20.1 | |
60 | 0.891 | 0.882 | 0.903 | 35.1 | 35.0 | 28.3 | 23.7 | 24.9 | 20.1 | |
80 | 0.892 | 0.866 | 0.904 | 35.0 | 36.0 | 28.0 | 23.5 | 25.5 | 19.9 | |
100 | 0.893 | 0.886 | 0.905 | 34.8 | 33.7 | 27.7 | 23.3 | 24.2 | ||
120 | 0.894 | 0.885 | 0.905 | 34.7 | 34.6 | 27.8 | 23.2 | 24.5 | 19.8 | |
140 | 0.894 | 0.905 | 34.7 | 23.2 | 19.7 | |||||
160 | 0.894 | 0.883 | 0.905 | 34.7 | 33.7 | 27.7 | 23.2 | 24.2 | 19.7 | |
180 | 0.890 | 32.9 | 27.7 | 23.4 | 19.7 | |||||
SVR | 10 | 0.859 | 0.758 | 0.867 | 41.6 | 76.2 | 38.3 | 29.2 | 51.3 | 27.5 |
20 | 0.870 | 0.843 | 0.882 | 40.1 | 55.5 | 33.7 | 27.8 | 36.4 | 24.4 | |
40 | 0.877 | 0.850 | 0.895 | 38.4 | 47.8 | 26.5 | 33.1 | 22.2 | ||
60 | 0.881 | 0.854 | 0.895 | 37.8 | 49.7 | 32.1 | 25.7 | 34.2 | 22.7 | |
80 | 0.885 | 0.897 | 37.5 | 31.5 | 25.2 | 22.3 | ||||
100 | 0.886 | 0.876 | 0.896 | 38.2 | 32.1 | 26.9 | 22.6 | |||
120 | 0.887 | 0.869 | 0.896 | 37.6 | 42.4 | 31.8 | 25.1 | 29.6 | 22.4 | |
140 | 0.887 | 0.878 | 0.897 | 37.7 | 38.9 | 31.6 | 25.1 | 26.9 | 22.3 | |
160 | 0.887 | 0.878 | 0.896 | 37.7 | 38.7 | 31.7 | 25.1 | 27.0 | 22.3 | |
180 | 0.878 | 37.8 | 38.8 | 31.4 | 25.1 | 27.1 |
Comparison of feature selection methods; traffic volume predictions under nonrecurring congestion conditions.
Model | Features | Accuracy | RMSE | MAE | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Correlation | Random | RFE | Correlation | Random | RFE | Correlation | Random | RFE | ||
ANN | 10 | 0.833 | 0.800 | 0.855 | 70.1 | 80.7 | 58.3 | 49.8 | 59.1 | 43.4 |
20 | 0.845 | 0.822 | 0.880 | 66.1 | 69.4 | 47.8 | 46.5 | 53.6 | 35.8 | |
40 | 0.861 | 0.843 | 0.891 | 60.2 | 63.8 | 45.5 | 42.6 | 47.6 | 33.3 | |
60 | 0.857 | 0.864 | 0.901 | 62.8 | 55.6 | 41.1 | 43.8 | 41.6 | 30.1 | |
80 | 0.861 | 0.857 | 0.906 | 61.4 | 60.6 | 39.8 | 42.9 | 43.5 | 28.7 | |
100 | 0.862 | 0.906 | 60.4 | 42.3 | ||||||
120 | 0.861 | 0.883 | 61.1 | 49.5 | 40.0 | 42.7 | 35.9 | 28.8 | ||
140 | 0.856 | 0.886 | 0.907 | 63.5 | 48.2 | 40.0 | 44.1 | 35.0 | 28.4 | |
160 | 0.875 | 0.905 | 54.0 | 40.0 | 38.4 | 29.0 | ||||
180 | 0.868 | 0.876 | 0.902 | 57.2 | 52.1 | 43.9 | 40.2 | 38.0 | 31.0 | |
RF | 10 | 0.832 | 0.797 | 0.875 | 69.2 | 81.1 | 52.4 | 50.3 | 61.1 | 38.5 |
20 | 0.842 | 0.836 | 0.886 | 65.9 | 62.9 | 48.0 | 47.8 | 48.9 | 34.7 | |
40 | 0.847 | 0.858 | 64.0 | 55.5 | 46.0 | 42.1 | ||||
60 | 0.852 | 0.856 | 0.892 | 58.7 | 44.6 | 45.1 | 43.7 | 32.8 | ||
80 | 0.851 | 0.865 | 0.893 | 64.0 | 57.2 | 44.5 | 45.2 | 41.1 | 32.6 | |
100 | 0.852 | 0.868 | 0.893 | 63.8 | 54.2 | 44.2 | 45.0 | 40.4 | 32.5 | |
120 | 0.851 | 0.860 | 0.893 | 64.0 | 57.6 | 44.2 | 45.2 | 42.3 | 32.5 | |
140 | 0.852 | 0.868 | 0.893 | 63.8 | 54.0 | 44.0 | 45.0 | 40.4 | 32.4 | |
160 | 0.851 | 0.867 | 0.894 | 64.0 | 53.9 | 43.9 | 45.1 | 40.2 | 32.4 | |
180 | 0.893 | 63.9 | 44.1 | 32.4 | ||||||
SVR | 10 | 0.825 | 0.722 | 0.850 | 72.6 | 114.1 | 59.3 | 53.0 | 84.4 | 45.2 |
20 | 0.822 | 0.751 | 73.5 | 100.8 | 53.4 | 73.1 | ||||
40 | 0.834 | 0.796 | 0.874 | 81.6 | 50.0 | 50.1 | 60.4 | 37.7 | ||
60 | 0.816 | 0.878 | 70.2 | 76.3 | 50.0 | 55.2 | 37.2 | |||
80 | 0.832 | 0.838 | 0.879 | 71.5 | 66.0 | 49.9 | 50.5 | 48.7 | 36.9 | |
100 | 0.832 | 0.879 | 72.0 | 50.3 | 50.5 | 37.1 | ||||
120 | 0.829 | 0.828 | 0.877 | 72.7 | 70.0 | 50.8 | 51.2 | 51.8 | 37.5 | |
140 | 0.830 | 0.832 | 0.876 | 72.6 | 69.3 | 51.3 | 51.1 | 50.7 | 37.8 | |
160 | 0.829 | 0.839 | 0.875 | 72.7 | 65.9 | 51.7 | 51.1 | 48.3 | 38.0 | |
180 | 0.830 | 0.835 | 0.876 | 72.6 | 66.9 | 51.3 | 51.1 | 49.5 | 37.7 |
Traffic volume predictions under nonrecurring conditions with
Performance comparison of RFE, correlation-based and random-selection approaches for selecting important attributes; results correspond to an ANN model for the
Performance comparison of RFE, correlation-based, and random-selection approaches for selecting important attributes; results correspond to an SVR model for the
Traffic congestion results in significant monetary losses in countries around the world. Short-term traffic prediction helps make decisions based on predictions of traffic in the near future and is more useful than just using real-time data on traffic conditions. Despite being a mature field, short-term traffic prediction poses many open problems, such as (a) the choice of the optimal input data resolution; (b) reliable prediction and efficient tracking of nonrecurring congestion events; and (c) accurate modelling of the complex spatiotemporal dependencies influencing traffic estimation. We have explored the construction and use of predictive models based on three established machine learning algorithms for addressing the aforementioned problems. Specifically, we investigated the use of Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forests (RF) and evaluated the predictive performance of these models for three different input data aggregation levels. Our main findings are as follows: (1) Aggregation of high-resolution data to a lower resolution is not required for accurate forecasting with machine learning algorithms; aggregation may actually have a negative effect on the accuracy of these multivariate models. Our results indicate that machine learning algorithms are able to extract useful information from high-resolution data despite the corresponding amplification of noise and variability in the sensor measurements. (2) Although they do not explicitly exploit the periodic characteristics of traffic, the machine learning models studied here perform equally well under both recurring and nonrecurring congestion without requiring any special changes to the models. The corresponding experimental results also indicate that these learned models are able to capture the underlying complex spatiotemporal evolution of traffic. (3) Recursive Feature Elimination provides a good ranking of attributes for short-term traffic prediction.
The more commonly used linear Pearson correlation coefficient-based feature selection [
These results open up multiple directions for further research. First, we will incorporate these findings in more sophisticated machine learning algorithms for short-term traffic prediction. For instance, the complex, nonlinear relationships influencing traffic flow may be modeled well using deep network architectures, especially when high-resolution input data is considered. We will also consider other datasets in order to generalize the findings reported in this paper based on data from a single highway. Second, we will build on the indicated ability to track nonrecurring congestion events in order to consider both accidents and weather conditions. This will require the underlying algorithms to model additional variables and their effect on traffic flow. Furthermore, we will explore network-wide traffic predictions towards the long-term objective of effective use of resources for the smooth flow of traffic under a wide range of circumstances.
The terms of use of the data used in this study do not allow the authors to distribute or publish the data directly. However, these data can be obtained directly from NZTA through APIs on the following web page:
Mr. Rivindu Weerasekera (BE (Hons)) is a doctoral candidate at the University of Auckland, New Zealand. He holds a first class honors degree in Electrical and Electronics Engineering from the University of Auckland. His research interests focus on the intersection of Intelligent Transportation Systems and Machine Learning. Dr. Mohan Sridharan (Ph.D.) is a senior lecturer in the School of Computer Science at the University of Birmingham (UK). He was previously a senior lecturer in the Department of Electrical and Computer Engineering at The University of Auckland (NZ), and a faculty member at Texas Tech University (USA), where he is currently an Adjunct Associate Professor of Mathematics and Statistics. He received his Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin (USA). Dr. Sridharan's primary research interests include knowledge representation and reasoning, interactive machine learning, cognitive systems, and computational vision, in the context of adaptive robots and agents. Dr. Prakash Ranjitkar (Ph.D., MEng, BEng (Civil)) is a senior lecturer in Transportation Engineering in the Department of Civil and Environmental Engineering and a founding member of the Transportation Research Centre (TRC) at the University of Auckland, New Zealand. He has over 19 years of academic, research, and consulting work experience in a range of transport and other infrastructure engineering projects. He has strong research interests in modelling and simulation of traffic, Intelligent Transportation Systems, traffic operations and management, traffic safety, human factors, and applications of advanced technologies in transportation. Prior to joining the University of Auckland in 2007, Prakash worked at the University of Delaware in the USA (2006–2007) and before that at Hokkaido University in Japan (2001–2006). He is a member of the IPENZ Transportation Group and the Institute of Transportation Engineers (USA).
He is an Editorial Board Member for the Open Transportation Journal and a reviewer for the Journal of the Transportation Research Board, the Journal of the Eastern Asia Society for Transportation Studies, the Journal of Intelligent Systems, and IEEE Transactions on Intelligent Transportation Systems.
The authors would like to thank Mike Duke from Auckland’s Joint Transport Operations Centre (JTOC) for helping them obtain access to the data used for experimental evaluation in this paper.