Multistep-Ahead Prediction of Urban Traffic Flow Using GaTS Model

Mathematical models for traffic flow have been widely investigated by using statistics and machine learning methods for many applications, such as transportation planning and easing traffic pressure. However, many challenging problems remain for various reasons. In this research, we mainly focus on three issues: (a) the traffic flow data are nonnegative, and hence finding a proper probability distribution is essential; (b) the complex stochastic property of the traffic flow leads to a nonstationary variance, i.e., heteroscedasticity; and (c) the multistep-ahead prediction of the traffic flow often performs poorly. To this end, we developed a Gamma distribution-based time series (GaTS) model. First, we transformed the original traffic flow observations into nonnegative real-valued data by using the Box-Cox transformation. Then, by specifying the generalized linear model with the Gamma distribution, the mean and variance of the distribution are regressed on the past data and the homochronous terms, respectively. A Bayesian information criterion is used to select the proper Box-Cox transformation coefficients and the optimal model structures. Finally, the proposed model is applied to the urban traffic flow data obtained from Dalian city in China. The results show that the proposed GaTS has an excellent prediction performance and can represent the nonstationary stochastic property well.


Introduction
As the main driving force of development, traffic has significant effects on the flow of production factors and the daily life of the urban system. The intelligent transportation system (ITS) can effectively provide innovative services relating to different modes of transport and traffic management [1,2], such as transportation planning [3], traffic pressure easing [4], and traffic accident evaluation [5]. It enables transport networks to be more informed, more coordinated, and more efficient for various users. ITS requires a reliable prediction of traffic information in real time. Thus, accurate and timely traffic prediction is a challenging task that has gained increasing attention.
The traffic flow is stochastic and full of complex dynamics [6], so its analysis and prediction mainly depend on real-time or historical traffic data. As we illustrate later, the urban traffic flow data form a nonstationary stochastic process with heterogeneous variance. Thus, time series models are preferred for the prediction of traffic flow. On the other hand, control operations, like variable speed limits (VSLs), are always embedded into ITS [7]. This fact suggests that the prediction models in ITS should have concise structures that lend themselves to control and operation design. Thus, our study focuses on developing a data-driven time series model that can predict the nonstationary distribution of the traffic flow and has a concise structure suitable for stochastic control design in ITS.
To construct the statistical model for the nonnegative traffic flow data, we first investigate which probability distribution is feasible for describing the uncertainty of the traffic flow. By detecting the change points of the traffic flow over 24 hours, we divide the traffic flow into four groups, whose distributions are separable from each other. According to the characteristics of the four groups, we propose Gamma distribution-based time series (GaTS) models motivated by the generalized linear model [8].
We take the original observations and their Box-Cox transformation [9,10] as the response variable, which can be considered as a random variable generated by a Gamma distribution-based stochastic process. Moreover, we extract the homochronous term from the historical observations and use it as an explanatory variable.
We use the Bayesian information criterion (BIC) to select the optimal model structure. By these means, the proposed model can predict not only the mathematical expectation but also the nonstationary variance from the past observations. Furthermore, using the homochronous term improves the accuracy of our model in multistep-ahead prediction. Finally, real data collected from Dalian city in China are used to validate the performance of the proposed GaTS. The computational results indicate that the homochronous term helps to enhance the multistep-ahead prediction performance. Meanwhile, GaTS has a linear structure. Thus, GaTS is efficient and convenient for further control design in ITS.
The rest of this paper is organized as follows: In Section 2, we review the studies on short-term traffic flow prediction. In Section 3, we present the GaTS methodology for generating traffic flow data as building blocks for prediction. In Section 4, we discuss the experimental results. Finally, concluding remarks are given in Section 5.

Literature Review
Over the past few decades, many mathematical models have been developed for traffic flow prediction by using statistics and machine learning methods. Regression-type models, including the autoregressive models and support vector regression (SVR), have been used as parametric methods. Nonlinear models, like the artificial neural network (ANN) model, have also been applied to the prediction of the traffic flow. Besides the parametric methods, nonparametric models, including the k-nearest neighbour (KNN) model, have also been constructed.
In the family of the regression-type parametric models, the autoregressive integrated moving average (ARIMA) models have been widely used for predicting the traffic flow [11-17]. Extensions of ARIMA have also been studied for the prediction of traffic flow. The space-time autoregressive integrated moving-average model was proposed to capture the internal relationships of the links [18]. Stathopoulos and Karlaftis [19] designed a model for predicting traffic congestion on the basis of a multivariate time-series state-space model. Meanwhile, SVR has also been used for traffic prediction [20-23]. These regression models mainly focus on predicting the tendency (mathematical expectation) of the traffic flow and ignore the statistics of its dispersion (variance).
The ANN model is one of the most commonly used nonlinear models among artificial intelligence methods. Most researchers have applied ANN to the traffic prediction problem with different ANN architectures or have treated the ANN model as a baseline for comparing a wide variety of classification methods [7, 24-28]. Recently, many researchers in the field have proposed integrating ANN with different preprocessing methods, like fuzzy methods, to improve its performance [29-32]. Wang et al. [33] designed a prediction model for traffic flow by integrating a fuzzy ANN with the Taguchi method. This work employed the Taguchi method to determine the number of sensors along the roadside and demonstrated the benefits of the information collected through the detectors. Along similar lines, Quek et al. [34] applied a fuzzy-based ANN to the estimation of short-term traffic flow. The reported results indicated that the performance of the proposed model was promising in comparison with a feedforward (FF) ANN trained by backpropagation. ANN and deep learning networks have achieved excellent performance on the traffic flow prediction problem [35-37]. However, their complex structures make it relatively difficult to develop further traffic control designs on the basis of them.
Effectively modelling traffic flow variance can produce more accurate confidence intervals for short-term traffic flow forecasts and thus improve prediction reliability. Because the generalized autoregressive conditional heteroscedasticity (GARCH) model can describe the time-varying volatility structure of time series data, it was used by Kamarianakis et al. [38] to predict the conditional variance of speed, with an ARIMA model as the mean equation. Similarly, GARCH was used to predict the conditional variance of 15 min traffic volume on the basis of a seasonal ARIMA model [39,40]. Furthermore, Tsekeris and Stathopoulos used a fractionally integrated asymmetric power GARCH model, with an autoregressive fractionally integrated moving average model as the mean equation, for traffic volatility prediction and found that the combined model outperformed the ARIMA-GARCH model [41]. Because of the stochastic characteristics of traffic flow series, another volatility model, the stochastic volatility model, was proposed by Tsekeris and Stathopoulos [42] for urban traffic variability prediction. The evaluation results showed that the stochastic volatility model could produce a more accurate forecast of the speed variance than GARCH.
KNN is an essential method in the family of nonparametric methods. KNN predicts from the k most similar historical samples without formulating an explicit model [43,44], so it is most beneficial in situations with little prior knowledge. Owing to its simplicity and good performance, the algorithm is increasingly popular in the field of traffic prediction. Yu et al. [45] proposed a KNN regression model for multiple-time-step prediction, in which the parameters are measured every minute by a loop detector. Hou et al. [46] presented a KNN-based model for short-term traffic flow prediction. The major limitation of the algorithm is that it requires tremendous computational resources for a massive amount of historical data. The algorithm is also sensitive to outliers in the archival data.
Besides the works mentioned above, the Kalman filtering method [47,48], advanced techniques for kernel regression [49,50], and mixtures of multivariate Gaussian processes [51] have also been applied to the prediction of the traffic flow.
As shown in Figure 1, the data exhibit apparent periodicity, which suggests that the information in the corresponding period of the past days can be helpful for the prediction. This fact motivates us to improve the multistep-ahead prediction by using the homochronous term. Figure 2 summarizes the boxplots for the corresponding sampling points of 10 days. The ranges of the boxes at each corresponding sampling point show that the variance of traffic flow is small at night and large in the daytime. Thus, the traffic data have a time-varying variance, i.e., heteroscedasticity. Furthermore, we divide the data into four segments by using the change point detection method for periodic time series [52]. Figure 3 illustrates the histograms for the four segments. Figures 3(a)-3(c) suggest that a distribution for positive random variables, like the log-normal distribution or the Gamma distribution, can be used. However, Figure 3(d), the histogram of the data collected from midnight to early morning, has only a single right tail, which cannot be approximated by the density function of the log-normal distribution. Thus, we use the Gamma distribution to build the time series model. Furthermore, we use the Box-Cox transformation with coefficient λ to obtain a proper positive real-valued time series $y_t$, and we use BIC to select the proper λ.
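The two preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `box_cox` and `homochronous_term`, the flat-list data layout, and the `points_per_day` parameter are assumptions introduced here. The transform uses the standard Box-Cox form, $(x^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $\log x$ for $\lambda = 0$, where $x$ is our notation for a raw observation. The five-day window follows the definition of the homochronous term given in the model structure description, read here as the five days before the day containing index `t`.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def homochronous_term(series, t, points_per_day, n_days=5):
    """Mean of the values observed at the same time of day on the
    n_days days before the day containing index t.

    ASSUMPTION: `series` is a flat list sampled at a fixed interval,
    so the same time of day recurs every `points_per_day` samples.
    """
    past = [
        series[t - d * points_per_day]
        for d in range(1, n_days + 1)
        if t - d * points_per_day >= 0
    ]
    if not past:
        raise ValueError("no earlier day available for index t")
    return sum(past) / len(past)


# Toy data: 6 days, 4 samples per day, value = 10 * day + slot.
series = [10 * day + slot for day in range(6) for slot in range(4)]

# Homochronous term for day 5, slot 2: mean of slot 2 on days 0..4.
c_t = homochronous_term(series, t=5 * 4 + 2, points_per_day=4)

# One transformed series per candidate lambda (values shifted by 1
# here only so that the toy data are strictly positive).
candidates = [0.0, 0.5, 1.0]
transformed = {lam: [box_cox(v + 1.0, lam) for v in series] for lam in candidates}
```

In such a setup, each candidate λ yields one transformed series, and the candidate whose fitted model has the smallest BIC would be kept.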

Model Structure.
Because the transformed $y_t$ is nonnegative real-valued, we assume that $y_t$ obeys a Gamma distribution. Let $f(y_t \mid \mu_t, \sigma_t)$ denote the probability density function, with $\mu_t$ and $\sigma_t$ being, respectively, the location and scale parameters. Consequently, the conditional probability density of $y_{t+j}$ is a Gamma density, in which $\Gamma(\cdot)$ denotes the Gamma function. The time-varying $\sigma_t$ implies that the stochastic process generating $y_t$ is nonstationary. To predict such nonstationarity on the basis of historical data, $\mu_t$ and $\sigma_t$ are regressed, as in (3), on the explanatory vectors $u_{it}$ given by $[y_t, \cdots, y_{t-l_{iy}}, c_{t+j}, \cdots, c_{t+j-l_{is}}]^{\top}$, with $l_{iy}$ and $l_{is}$ being the maximum time lags of each variable for $i = 1, 2$. The index $j$ means that (3) is used for $j$-step-ahead prediction. $c_t$ is called the homochronous term, which is the mean of the observations at time $t$ in the past five days up to the day containing $y_t$.
Then, for the data set $\{(y_t, c_t) \mid t = 1, 2, \cdots, T\}$, the likelihood is formulated from the conditional densities together with the initial joint distribution $f(y_0, y_1, \cdots, y_{l-1})$, where $B = \{\beta_{i0}, \beta_i \mid i = 1, 2\}$ is the set of unknown parameters. Note that the initial joint distribution is not a function of $B$. The parameter set $B$ can therefore be estimated by maximum likelihood.

Evaluation Criteria for Prediction Performance.
To comprehensively test the prediction performance of the models, several evaluation criteria are calculated. We use the mean absolute error (MAE) [53] and the root mean square error (RMSE) [54] to show the scale of the prediction error:
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} \left| y_t - \hat{y}_t \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2}.$$
Because the above two criteria cannot be used to compare models across data sets, the coefficient of determination $R^2$, calculated from the observation $y_t$ and the estimated value $\hat{y}_t$, is also used:
$$R^2 = 1 - \frac{\sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2}{\sum_{t=1}^{T} \left( y_t - \bar{y} \right)^2}. \tag{7}$$
Here, $\bar{y}$ is the sample mean of $y_t$. From (7), we can see that $R^2$ does not account for the time-varying variance.
Wireless Communications and Mobile Computing
Thus, $R^2$ is not proper for evaluating GaTS models with a time-varying variance. To solve this problem, we use the adjusted coefficient of determination $R^2_H$ [55], whose definition involves a weighted mean. Note that $R^2 = R^2_H$ if the scale parameter $\sigma_t$ is estimated as constant.
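As a rough sketch, the criteria above can be computed as follows. The functions `mae`, `rmse`, and `r2` follow the standard definitions. The function `weighted_r2` is only one plausible reading of a variance-aware coefficient of determination: the inverse-variance weights $1/\hat{\sigma}_t^2$ are an assumption made here, not the definition from [55]. It does reproduce the property stated above, namely that it coincides with $R^2$ when the scale estimate is constant.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    """Ordinary coefficient of determination."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def weighted_r2(y, y_hat, sigma_hat):
    """Weighted coefficient of determination.

    ASSUMPTION: weights are 1 / sigma_hat[t] ** 2; this is an
    illustrative choice, not the formula used in the paper.
    """
    w = [1.0 / s ** 2 for s in sigma_hat]
    y_w = sum(wi * a for wi, a in zip(w, y)) / sum(w)  # weighted mean
    ss_res = sum(wi * (a - b) ** 2 for wi, a, b in zip(w, y, y_hat))
    ss_tot = sum(wi * (a - y_w) ** 2 for wi, a in zip(w, y))
    return 1.0 - ss_res / ss_tot


# Toy example with made-up numbers.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]
print(mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat))
print(weighted_r2(y, y_hat, [2.0, 2.0, 2.0, 2.0]))  # constant scale
```

With a constant `sigma_hat`, the weights cancel and the weighted and ordinary values agree; with a time-varying `sigma_hat`, errors in high-variance periods are down-weighted.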
The appropriate model structure determined by the time lags is crucial for prediction. Note that both $R^2$ and $R^2_H$ increase monotonically with the complexity of the model. Therefore, they cannot be used for model structure selection. Instead, BIC is used.
Table 1 shows the one- to four-step-ahead (15 × 1 minutes to 15 × 4 minutes) prediction results obtained by three kinds of models $\{M_i \mid i = 1, 2, 3\}$ defined by the autoregressive variable $y_t$ and the homochronous term $c_t$, where $M_1$ has a time-invariant scale parameter $\sigma_t = \sigma$, $M_2$ and $M_3$ have a time-varying scale parameter $\sigma_t$, and the homochronous term is applied to $M_3$. For $M_1$ to $M_3$, the proper λ's for the Box-Cox transformation are selected by BIC. Figure 4 shows the BIC values for the λ candidates. $M_4$ is the normal distribution-based model. $R^2_{Htr}$ is calculated from the training data, and $R^2_{Hte}$ is derived from the testing data.
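To make the BIC-based comparison of structures concrete, the following sketch evaluates a Gamma log-likelihood and the corresponding BIC. It assumes the mean and coefficient-of-variation parameterization, in which $\mu_t$ is the mean and the shape equals $1/\sigma_t^2$; the paper's exact density is not reproduced here, so this parameterization, the function names, and the toy numbers are all illustrative assumptions. BIC is computed in its usual form, $-2 \log L + k \log n$.

```python
import math


def gamma_logpdf(y, mu, sigma):
    """Log-density of a Gamma distribution with mean mu and
    coefficient of variation sigma (ASSUMED parameterization:
    shape = 1 / sigma**2, scale = mu * sigma**2)."""
    shape = 1.0 / sigma ** 2
    scale = mu * sigma ** 2
    return (
        (shape - 1.0) * math.log(y)
        - y / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
    )


def log_likelihood(ys, mus, sigmas):
    """Sum of conditional log-densities over the sample."""
    return sum(gamma_logpdf(y, m, s) for y, m, s in zip(ys, mus, sigmas))


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Made-up observations and fitted means for six time points.
ys = [12.0, 15.0, 40.0, 55.0, 35.0, 14.0]
mus = [13.0, 14.0, 42.0, 50.0, 37.0, 15.0]

# Constant scale (in the spirit of M_1) versus a time-varying scale
# (in the spirit of M_2 and M_3); parameter counts are hypothetical.
ll_const = log_likelihood(ys, mus, [0.2] * len(ys))
ll_vary = log_likelihood(ys, mus, [0.1, 0.1, 0.2, 0.2, 0.2, 0.1])
print(bic(ll_const, n_params=3, n_obs=len(ys)))
print(bic(ll_vary, n_params=4, n_obs=len(ys)))
```

The extra parameters of the time-varying scale are accepted only if the gain in log-likelihood outweighs the $k \log n$ penalty, which is why BIC, unlike $R^2$, can be used for structure selection.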

The difference between the $R^2_{Htr}$ values and the $R^2_{Hte}$ values is minimal, which suggests that all the models have been well estimated without overfitting or underfitting. Bold numbers are the best values for each evaluation criterion. They indicate that the $M_3$'s, with time-varying scale parameters and homochronous terms, are the optimal models for one- to four-step-ahead predictions. Meanwhile, the $R^2_{Hte}$ values show that the $M_3$'s have the best prediction performance. In the $M_3$'s, the selected regression structures for $\sigma_t$ are simpler than those for $\mu_t$. This is similar to the results in [55]. The homochronous terms $c_t$, which are the means of the traffic flow at the same time of day over the 5 successive days before the day containing the prediction time $t$, represent the periodicity of $\mu_t$ rather than that of $\sigma_t$. Therefore, as we expected, BIC has not selected the homochronous terms for the regression of $\sigma_t$. We also construct normal distribution-based models, GaTS without the Box-Cox transformation, and log-normal distribution-based models, denoted by $M_4$'s, $M_5$, and $M_6$ in Table 1. By comparing the $M_i$'s for $i = 3, 4, 5, 6$, we find that the $M_3$'s are the optimal models according to the minimum BIC values. This suggests that GaTS is preferable to the other distribution-based models.
Figures 5 and 6 illustrate the range estimation results for one day, Jan. 14, 2016, in which the 95% confidence intervals (CI) are obtained from the predicted $\hat{\mu}_t$ and $\hat{\sigma}_t$ along with $t$. In Figure 5, the green regions are obtained by the four $M_3$'s in Table 1, and the yellow areas are obtained by the corresponding normal distribution-based models. Figure 5(a) shows that GaTS has a CI prediction performance similar to that of the normal distribution-based models for one-step-ahead prediction. However, Figures 5(b)-5(d) indicate that the GaTS-based models yield a narrower range than the normal distribution-based models for multistep-ahead prediction.
Furthermore, the normal distribution-based models for multistep-ahead prediction even produced negative lower bounds of the CI when the traffic flow values are small. This contradicts the fact that the traffic flow is positive-valued.
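The sign problem noted above is easy to reproduce numerically. The sketch below compares a symmetric normal interval with a Monte Carlo interval from a Gamma distribution having the same mean and standard deviation. The parameterization (mean `mu`, coefficient of variation `sigma`), the number of draws, and the numbers themselves are assumptions chosen for illustration; they are not values from the paper.

```python
import random


def normal_interval(mu, sd, z=1.959963984540054):
    """Symmetric 95% interval of a normal distribution."""
    return mu - z * sd, mu + z * sd


def gamma_interval(mu, sigma, level=0.95, n_draws=20000, seed=0):
    """Equal-tailed interval of a Gamma distribution, estimated by
    sampling (ASSUMED parameterization: mean mu, coefficient of
    variation sigma, so shape = 1/sigma**2, scale = mu*sigma**2)."""
    rng = random.Random(seed)
    shape = 1.0 / sigma ** 2
    scale = mu * sigma ** 2
    draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
    lo_idx = int((1.0 - level) / 2.0 * n_draws)
    hi_idx = int((1.0 + level) / 2.0 * n_draws) - 1
    return draws[lo_idx], draws[hi_idx]


# Low night-time flow with large relative uncertainty (made-up values).
mu, sigma = 10.0, 0.8
normal_lo, normal_hi = normal_interval(mu, sd=mu * sigma)
gamma_lo, gamma_hi = gamma_interval(mu, sigma)

print("normal:", normal_lo, normal_hi)  # lower bound falls below zero
print("gamma: ", gamma_lo, gamma_hi)    # lower bound stays positive
```

Because every Gamma draw is positive, the Gamma interval can never extend below zero, whereas the normal lower bound becomes negative whenever the standard deviation exceeds roughly half of the mean.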

Range Estimates.
In Figure 6, the green regions are also obtained by the four $M_3$'s in Table 1, and the yellow areas are obtained by the corresponding log-normal distribution-based models, whose structures are identical to those of the four $M_3$'s. From Figures 5(a)-5(d), we can see that, from one to four steps, both models with an exponential-type transformation are more stable than using the normal distribution directly. However, the log-normal distribution gives a wider range than the Gamma distribution in the daytime, which is contrary to the real traffic flow state. That would affect the estimation and prediction of the traffic state by the ITS. Furthermore, the CI obtained by GaTS approximates well the heteroscedasticity shown in Figure 2. Thus, GaTS can achieve more rational CI predictions.

Conclusion
This research mainly focused on the prediction of urban traffic flow. By specifying GLM with the Gamma distribution, we proposed GaTS to predict the nonstationary stochastic process of the traffic flow. The objective of GaTS is to predict the probability distribution of the traffic flow in real time. To this end, the Gamma distribution represents the stochastic properties of nonnegative-valued traffic flow.
The traffic flow data in this research were collected from Dalian, which is a large port in northern China, as well as a major destination for Chinese tourists [56]. The aggregation of a large number of different types of crowds not only brings traffic and environmental problems but also makes the use of the region's urban public space complicated and contradictory [57]. Because it is relatively difficult to improve transportation infrastructure, we focus on developing intelligent software control and management to relieve traffic congestion. ITS needs more accurate historical information and future predictions of the road network [2]. Furthermore, to control and regularize the traffic flow, an accurate model should also have a simple structure. Thus, our proposed GaTS is more suitable for this purpose than ANN and deep learning models. Furthermore, a series of GaTS can be extended to model the joint distribution for the joint prediction of multiple sensors.
Several research topics can be further considered. We successfully specified GLM by using the Gamma distribution for the urban zone. However, the probability distributions for other zones, like the highways, should be further characterized. The potential external factors that are related to the traffic flow and can be governed by ITS should be investigated and embedded into GaTS. On the basis of GaTS with the external factors, the cost function for the control and regularization of the traffic flow should be constructed, and the corresponding optimization solver should be developed.

Data Availability
The data that support the findings of this study are available from the ITS database of the traffic police department in Dalian city of China. But restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the traffic police department.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.