Nonlinear Autoregressive Conditional Duration Models for Traffic Congestion Estimation

The considerable impact of congestion on transportation networks is reflected in the large number of research papers dedicated to congestion identification, modeling, and alleviation. Despite this, the statistical characteristics of congestion, and particularly of its duration, have not been systematically studied, even though they can offer significant insights into its formation, effects, and alleviation. We extend previous research by proposing the autoregressive conditional duration (ACD) approach for modeling congestion duration in urban signalized arterials. Results based on data from a signalized arterial indicate that a multiregime nonlinear ACD model best describes the observed congestion duration data; when congestion lasts longer than 18 minutes, traffic exhibits persistence and a slow recovery rate.


Introduction
The impact of congestion on transportation networks is of particular importance for both planners and users. Congestion leads to longer travel times and limited reliability of the transportation system; further, during congestion, users experience delays, inability to forecast travel conditions, increased transportation cost, emissions, and so on [1]. The dynamics of congestion duration may contain useful information about intraday traffic operations and should be further explored.
Regardless of its significance for all aspects of traffic operations, the investigation of the statistical characteristics of congestion duration has attracted only limited attention in the literature. During the last two decades, the focus has been on developing short-term prediction models for accurately forecasting volume, occupancy, and speed, and for determining the anticipated traffic flow conditions in urban road networks; existing literature

The Autoregressive Conditional Duration Models
Duration models have been applied to problems in many scientific fields, such as medicine, economics, and transportation, to investigate phenomena whose temporal characteristics (durations, for example) are of primary importance [9-13]. However, in classical econometric techniques, time-series are often taken to be sequences of data points separated by uniform time intervals; as noted by Tsay [14], "traditional" duration models do not account for the possibility of a time-series dimension in these models and the related phenomena (an excellent example of such a phenomenon is stock transactions, as discussed in [15, 16]). Congestion events are typical of such behavior: congestion occurs at unequally spaced time intervals, so classical duration modeling may not be adequate to model such events.
Using concepts similar to the generalized autoregressive conditional heteroskedastic (GARCH) models, Engle and Russell [15] proposed the autoregressive conditional duration (ACD) model to describe the evolution of data that arrive at unequally spaced time intervals. Let $t_i$ be the time of the $i$th congestion occurrence, with $0 = t_0 < t_1 < \cdots < t_N$, and let $X_i = t_i - t_{i-1}$ be the duration between consecutive congestion events. Associated with the arrival times is the counting function $N(t)$, which is the number of events that have occurred by time $t$. Let $\psi_i$ be the expectation of the $i$th duration, given by $\psi_i \equiv E(X_i \mid X_{i-1}, \ldots, X_1; \theta_1)$ with parameter vector $\theta_1$. The basic assumption of the ACD model is that the standardized durations are independent and identically distributed [15]:
$$\varepsilon_i \equiv \frac{X_i}{\psi_i} \sim \text{i.i.d. } D(\theta_2), \tag{2.1}$$
where $D$ is a general distribution over $(0, \infty)$ with mean equal to one and parameter vector $\theta_2$. From the above, it appears that there are a number of potential ACD models that vary with respect to the specification of the expected durations, as well as the distribution of the $\varepsilon_i$.
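As a small illustration of the notation above, the following sketch computes the durations $X_i$ and the counting function $N(t)$ from a list of arrival times. The function names and the onset times are hypothetical and serve only to make the definitions concrete; they are not data from the study.

```python
from bisect import bisect_right


def durations(arrival_times):
    """Return X_i = t_i - t_{i-1} for arrival times t_0 < t_1 < ... < t_N."""
    return [b - a for a, b in zip(arrival_times, arrival_times[1:])]


def counting_function(arrival_times, t):
    """Return N(t): how many of the events t_1, ..., t_N have occurred by time t.

    The first entry t_0 is treated as the time origin, not as an event.
    """
    return bisect_right(arrival_times[1:], t)


# Hypothetical congestion onset times in minutes, with t_0 = 0 as the origin.
times = [0.0, 12.0, 15.0, 40.5, 46.5]
print(durations(times))
print(counting_function(times, 20.0))
```

The durations are unequally spaced by construction, which is the feature that the ACD family is designed to handle.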
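In order to define the conditional intensity or hazard function, let $p_0$ be the density function of $\varepsilon_i$ and let $S_0$ be the associated survival function; then
$$\lambda_0(t) = \frac{p_0(t)}{S_0(t)} \tag{2.2}$$
is the baseline hazard, which is utilized to express the hazard function as [15]
$$\lambda(t \mid N(t), t_1, \ldots, t_{N(t)}) = \lambda_0\!\left(\frac{t - t_{N(t)}}{\psi_{N(t)+1}}\right)\frac{1}{\psi_{N(t)+1}}. \tag{2.3}$$
This is known as the "accelerated failure time" model, since the past information influences the rate at which time passes [15]. This is in line with most observations in traffic flow theory; sometimes flow changes rapidly and the time between congestion occurrences passes quickly, while in other cases the opposite applies. In this case, the rate of time flow depends on the past event arrival times through the function $\psi_i$. An ACD(p, q) model specifies the conditional mean duration $\psi_i$ as a linear function of p lagged durations and q lagged conditional expectations [14]:
$$\psi_i = \omega_0 + \sum_{j=1}^{p} \gamma_j X_{i-j} + \sum_{j=1}^{q} \omega_j \psi_{i-j}. \tag{2.4}$$
Once a parametric distribution of $\varepsilon_i$ has been specified, maximum likelihood estimates of $\theta$ can be obtained by using different numerical optimization algorithms. When the distribution of $\varepsilon_i$ is exponential, the resulting model is called an EACD model. Similarly, if $\varepsilon_i$ follows a Weibull distribution, the model is referred to as a WACD(p, q) model, and so on. We focus on the WACD(1, 1) model, which (a) is flexible in terms of hazard distribution features, (b) is straightforward to estimate, and (c) can account for serial dependence in high-frequency data [15, 17]. In the case of a Weibull distribution with scale parameter $k$ and shape parameter $\alpha$, the hazard function is
$$\lambda_0(t) = \frac{\alpha}{k}\left(\frac{t}{k}\right)^{\alpha-1}, \tag{2.5}$$
and, with the scale fixed by the unit-mean requirement on $\varepsilon_i$, that is, $k = 1/\Gamma(1 + 1/\alpha)$, the conditional intensity function is [15]
$$\lambda(t \mid N(t), t_1, \ldots, t_{N(t)}) = \alpha \left[\frac{\Gamma(1 + 1/\alpha)}{\psi_{N(t)+1}}\right]^{\alpha} \left(t - t_{N(t)}\right)^{\alpha-1}, \tag{2.6}$$
where $\Gamma(\cdot)$ is the Gamma function. Equation (2.6) shows that the conditional intensity now depends on the Weibull parameters, which means that either increasing or decreasing hazard functions may result; this makes especially long durations more or less likely than for the exponential, depending on whether $\alpha$ is less or greater than unity, respectively [15]. The log likelihood for the Weibull ACD is [14]
$$\ell(X \mid \theta) = \sum_{i=1}^{N} \left\{ \alpha \ln \Gamma\!\left(1 + \frac{1}{\alpha}\right) + \ln\!\left(\frac{\alpha}{X_i}\right) + \alpha \ln\!\left(\frac{X_i}{\psi_i}\right) - \left[\frac{\Gamma(1 + 1/\alpha)\, X_i}{\psi_i}\right]^{\alpha} \right\}. \tag{2.7}$$
The Weibull distribution reduces to the exponential distribution if $\alpha$ equals 1, but allows for an increasing (decreasing) hazard function if $\alpha > 1$ ($\alpha < 1$) [14].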
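The WACD(1, 1) recursion and its log likelihood can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the estimation code used in the study: the parameter values are hypothetical, the recursion is started from the unconditional mean (simulation) or the sample mean (likelihood), which is one common convention and an assumption here, and no optimizer is included.

```python
import math
import random


def simulate_wacd11(n, omega0, gamma1, omega1, alpha, seed=0):
    """Simulate n durations X_i = psi_i * eps_i with unit-mean Weibull errors."""
    rng = random.Random(seed)
    scale = 1.0 / math.gamma(1.0 + 1.0 / alpha)  # makes E[eps_i] = 1
    psi = omega0 / (1.0 - gamma1 - omega1)       # start at the unconditional mean
    x = []
    for _ in range(n):
        eps = rng.weibullvariate(scale, alpha)   # (scale, shape) argument order
        xi = psi * eps
        x.append(xi)
        psi = omega0 + gamma1 * xi + omega1 * psi
    return x


def wacd11_loglik(x, omega0, gamma1, omega1, alpha):
    """Weibull ACD(1, 1) log likelihood of the durations x."""
    g = math.gamma(1.0 + 1.0 / alpha)
    psi = sum(x) / len(x)  # assumed starting value for psi_1
    ll = 0.0
    for xi in x:
        z = g * xi / psi
        # ln(alpha / x) + alpha * ln(Gamma * x / psi) - (Gamma * x / psi) ** alpha
        ll += math.log(alpha / xi) + alpha * math.log(z) - z ** alpha
        psi = omega0 + gamma1 * xi + omega1 * psi
    return ll


# Hypothetical parameter values, chosen only so that gamma1 + omega1 < 1.
sample = simulate_wacd11(500, omega0=0.5, gamma1=0.2, omega1=0.7, alpha=0.99)
print(wacd11_loglik(sample, 0.5, 0.2, 0.7, 0.99))
```

In practice the negative of `wacd11_loglik` would be passed to a numerical optimizer, with positivity constraints on the parameters. Setting `alpha` to 1 recovers the exponential (EACD) likelihood.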
The ACD models can be modified to account for nonlinearities that are quite common in high-resolution datasets. Zhang et al. [18] extend the ACD model to account for nonlinearity and structural breaks in the data. A threshold autoregressive conditional duration (TACD) model allows the expected duration to depend nonlinearly on past information variables. A positive stochastic process $\{X_i\}$ follows a $J$-regime threshold ACD(p, q) model if, when the threshold variable $Z_{i-d} \in R_j$,
$$X_i = \psi_i \varepsilon_i^{(j)}, \qquad \psi_i = \omega_0^{(j)} + \sum_{m=1}^{p} \gamma_m^{(j)} X_{i-m} + \sum_{n=1}^{q} \omega_n^{(j)} \psi_{i-n}, \tag{2.8}$$
where the delay parameter $d$ is a positive integer, $\psi_i$ is the conditional mean of $X_i$, $R_j = [r_{j-1}, r_j)$, $j = 1, 2, \ldots, J$, for a positive integer $J$, and $-\infty = r_0 < r_1 < \cdots < r_J = \infty$ are the threshold values [18]. Based on the TACD formulation, the different regimes of a time-series dataset are allowed to have different duration persistence and error distributions, making modeling more flexible and efficient. However, in such models, the proper selection of the number of regimes is critical, as it significantly affects the estimation process; Zhang et al. [18] underline the computational difficulty in estimating a 3-regime TACD(1, 1) model for financial transaction duration data. Further specifications and details on ACD model development, estimation, and testing can be found in [14, 16, 17].
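The regime-switching recursion of a threshold ACD(1, 1) model can be sketched as follows. The sketch assumes that the threshold variable is the lagged duration itself, $Z_{i-d} = X_{i-d}$, and that the first regime applies while no lagged value is available; both choices, as well as the function names and parameter values, are illustrative assumptions rather than the specification estimated in the study.

```python
from bisect import bisect_right


def regime_index(z, thresholds):
    """Return the 0-based regime of z for interior thresholds r_1 < ... < r_{J-1}.

    Regime j (1-based) covers the half-open interval [r_{j-1}, r_j).
    """
    return bisect_right(thresholds, z)


def tacd11_conditional_means(x, thresholds, params, d=1, psi0=None):
    """Conditional means psi_i of a threshold ACD(1, 1) model.

    params[j] = (omega0, gamma1, omega1) for regime j; the regime used to
    compute psi_{i+1} is selected by the lagged duration X_{i+1-d}.
    """
    psi = psi0 if psi0 is not None else sum(x) / len(x)
    out = []
    for i, xi in enumerate(x):
        out.append(psi)
        k = i + 1 - d
        j = regime_index(x[k], thresholds) if k >= 0 else 0  # assumed default
        omega0, gamma1, omega1 = params[j]
        psi = omega0 + gamma1 * xi + omega1 * psi
    return out


# Hypothetical two-regime example with a single threshold at 18 minutes.
x = [10.0, 20.0, 5.0]
params = [(1.0, 0.1, 0.5), (2.0, 0.3, 0.8)]
print(tacd11_conditional_means(x, [18.0], params, d=1, psi0=10.0))
```

With a single threshold the model has two regimes, so only one additional set of three ACD coefficients (plus the regime-specific error distribution) has to be estimated.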

The Data
Data from Athens, Greece, are used for modeling traffic congestion duration. The urban arterial examined has signalized intersections with link lengths varying from 150 to 500 m, and there are three lanes for through traffic per direction. Traffic data on arterial links are collected using a system of loop detectors located around 90 m from the stop lines; volume and occupancy data are extracted every 90 sec. Figure 1 depicts the time-series of volume and occupancy for a typical day. We examine the temporal dependence of both time-series via the autocorrelation function (Figure 2), where it is apparent that both volume and occupancy possess strong long-memory characteristics, reflected in the hyperbolic decay of their autocorrelation structure [19].
In order to determine the periods of congestion from the available data, it is necessary to identify congestion occurrences. The identification of congestion is based on a methodology developed using an advanced artificial intelligence approach to detect and cluster the transitional characteristics of volume and occupancy [20]; this approach is found to be consistent with a kinematic wave theoretic model and can identify the spillover region (the region where the queues that form are longer than the signalized arterial links). As can be observed in Figure 3, congested conditions (spillover region) and the critical area before congestion are separated by the spillover line, which is defined in terms of the volume $V$ (veh/time interval), the typical vehicle length $L_{\text{eff}}$, the free-flow speed $u_f$ (km/h), the cycle length $C$, and the red phase duration $r$. The spillover line equation is applied to extract the duration of congestion episodes. Table 1 summarizes the parameter values used for congestion spillover detection.
Figure 4 shows the time-series of congestion duration for a typical day in one of the arterial links. Figure 5 shows the distribution of the observed congestion durations and Figure 6 the autocorrelation function of the observed congestion duration time-series. Summary statistics for the congestion data are presented in Table 2; the high Ljung-Box statistics show strong serial correlation. The Ljung-Box statistic is a $\chi^2_L$-distributed statistic given by $Q(L) = N(N+2)\sum_{j=1}^{L} r_j^2/(N-j)$, where $N$ is the series length, $r_j$ is the sample ACF at lag $j$, and $L$ is the degrees of freedom.
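The two computational steps described above, turning a per-interval congestion indicator into episode durations and measuring serial correlation with the Ljung-Box statistic, can be sketched as follows. The sketch assumes that a boolean congested/uncongested flag per 90-second interval is already available from the spillover detection step; the flag values and function names are hypothetical.

```python
def congestion_durations(flags, interval_sec=90):
    """Return the duration, in minutes, of each run of consecutive congested intervals."""
    runs, run = [], 0
    for congested in flags:
        if congested:
            run += 1
        elif run:
            runs.append(run * interval_sec / 60.0)
            run = 0
    if run:  # an episode still open at the end of the record
        runs.append(run * interval_sec / 60.0)
    return runs


def ljung_box(x, max_lag):
    """Ljung-Box statistic Q(L) = N (N + 2) * sum_{j=1..L} r_j^2 / (N - j)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for j in range(1, max_lag + 1):
        r_j = sum(dev[t] * dev[t - j] for t in range(j, n)) / c0  # sample ACF at lag j
        q += r_j * r_j / (n - j)
    return n * (n + 2) * q


# Hypothetical indicator series: 1 = congested (spillover), 0 = uncongested.
flags = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(congestion_durations(flags))
print(ljung_box([1.0, 2.0, 3.0, 4.0], max_lag=1))
```

A large value of `ljung_box` relative to the $\chi^2_L$ critical value indicates serial correlation in the durations, which is the property that motivates an autoregressive duration model.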

Congestion Duration Models: Specifications, Estimation, and Diagnostics
For the estimation of congestion duration, an autoregressive conditional duration model with a Weibull distribution describing the errors (WACD) is considered. The estimated values of the fitted models are presented in Table 3. The model's parameters are significant at the 5% level. The fitted model has a Weibull error distribution with parameter α = 0.991 (0.021); the estimated value of the Weibull parameter is very close to 1, indicating a conditional hazard function that decreases monotonically at a slow rate. The sum of ω_1 and γ_1 is less than 1, pointing to an ergodic process. The Ljung-Box statistics for the residual series show that the standardized innovations are not significantly correlated.
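The two diagnostics read off the fitted parameters, the persistence ω_1 + γ_1 and the direction of the hazard implied by α, can be made explicit with a small helper. Only α = 0.991 is taken from the text; the other coefficient values below are hypothetical placeholders, since Table 3 is not reproduced here. The unconditional mean follows from taking expectations in the ACD(1, 1) recursion, which gives ω_0 / (1 − γ_1 − ω_1) when the persistence is below one.

```python
def wacd11_properties(omega0, gamma1, omega1, alpha):
    """Summarize persistence, stationarity, mean duration, and hazard shape."""
    persistence = gamma1 + omega1
    stationary = persistence < 1.0
    # The unconditional mean exists only when the persistence is below one.
    mean_duration = omega0 / (1.0 - persistence) if stationary else None
    if alpha < 1.0:
        hazard = "decreasing"
    elif alpha > 1.0:
        hazard = "increasing"
    else:
        hazard = "constant"
    return {
        "persistence": persistence,
        "stationary": stationary,
        "mean_duration": mean_duration,
        "hazard": hazard,
    }


# alpha = 0.991 is the estimate reported in the text; the rest are hypothetical.
print(wacd11_properties(omega0=0.5, gamma1=0.2, omega1=0.7, alpha=0.991))
```

A persistence close to one means that shocks to the expected duration die out slowly, even though the process remains stationary.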
The fitted WACD model is contrasted with two parametric duration models previously applied in congestion modeling [8]. Comparisons are based on the adjusted Anderson-Darling test statistic (AD) and the correlation coefficient (COR); the best-fitting model will have the lowest AD value and the highest COR value. Table 4 shows the goodness-of-fit tests for the three models. Although all COR values are relatively high, the AD values differ; the best-fitting model is the WACD. Based on the results presented in Tables 3 and 4, some interesting remarks can be made. First, none of the models presented in Table 4 has an AD value below the critical value at the 95% confidence level (2.492) [21]. Second, the nonlinearity test conducted [22] (the null hypothesis is that the true model is an AR process) suggests that some nonlinearities remain in the residuals (Table 3). Third, Engle's LM ARCH test [23] (the null hypothesis is that there is no ARCH effect in the time-series under study, that is, constant conditional variance) points towards heteroscedastic behavior of the residuals (Table 3). All the above indicate the need to further refine the model.
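For orientation, the plain Anderson-Darling statistic against a fully specified distribution can be computed as in the sketch below. The "adjusted" statistic used in the study may include corrections that are not shown here, so this is an illustration of the basic quantity only; the residual values and function names are hypothetical. The unit-mean Weibull CDF is the one implied by the WACD error assumption.

```python
import math


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    u = sorted(cdf(v) for v in sample)  # CDF is monotone, so this orders the sample
    n = len(u)
    s = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


def unit_mean_weibull_cdf(alpha):
    """CDF of a Weibull distribution with shape alpha, scaled to have mean one."""
    g = math.gamma(1.0 + 1.0 / alpha)
    return lambda e: 1.0 - math.exp(-((g * e) ** alpha))


# Hypothetical standardized residuals eps_i = X_i / psi_i from a fitted model.
residuals = [0.3, 0.8, 1.1, 0.6, 2.4, 1.5, 0.9, 0.2]
a2 = anderson_darling(residuals, unit_mean_weibull_cdf(0.991))
print(a2, a2 < 2.492)  # compare with the 5% critical value quoted in the text
```

Smaller values of the statistic indicate a closer match between the empirical and the assumed distribution, with extra weight on the tails.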
In order to account for the remaining nonlinearities and the ARCH effect in the residuals, a threshold conditional duration model with a Weibull distribution for the error (T-WACD) is developed. A recursive approach is used to identify the number and magnitude (with respect to the congestion duration boundaries) of the regimes that best describe the available congestion duration data. Results for the model are summarized in Table 3. Figure 7 shows the scatter plot of the actual versus estimated congestion durations for WACD(1, 1) and T-WACD(1, 1); both models fit the data well. Moreover, based on the results, the mean absolute percent error (MAPE) is calculated (Table 3). Although the results show the superiority of the T-WACD model over the WACD model, the error levels cannot be fully evaluated, as no MAPE results for duration modeling have been reported in previous research.
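The MAPE used to compare the two models is straightforward to compute from actual and estimated durations. The sketch below uses hypothetical numbers and a hypothetical function name; it is not the output of the fitted models.

```python
def mape(actual, estimated):
    """Mean absolute percent error, in percent, over paired observations."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    errors = [abs(a - e) / abs(a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical congestion durations in minutes and their model estimates.
actual = [10.0, 20.0]
estimated = [9.0, 22.0]
print(mape(actual, estimated))
```

Because each error is divided by the actual duration, short congestion episodes weigh heavily in the MAPE, which is worth keeping in mind when comparing models across regimes.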
A thorough investigation of the results shows that the T-WACD model provides a better fit to the original data than the single-regime WACD model. The T-WACD(1, 1) can explain most of the temporal dependence in the congestion duration periods; additionally, most of the nonlinearities and ARCH effects are efficiently addressed. Some differences may also be identified between the regimes. For example, the estimated Weibull parameter α implies a hazard function that decreases monotonically at a slow rate for congestion episodes that last up to 18 minutes, whereas the hazard decreases monotonically for congestion episodes of 18 minutes and above. Moreover, a significant observation concerns the value of the sum ω_1 + γ_1 across the two identified regimes; for regime 1, the estimated model returns ω_1 + γ_1 < 1, while the opposite applies for the second regime. This suggests that the first regime (congestion events that last up to 18 minutes) describes a stationary process, while congestion durations longer than 18 minutes are governed by nonstationary dynamics.
Results from an urban signalized arterial indicated that congestion duration data are typically nonlinear and volatile. We showed that a multiregime nonlinear ACD model fits the observed data best. The estimated model suggests, for the specific application, the existence of two distinct congestion duration regimes: in congestion incidents that last up to 18 minutes, traffic is most likely to exit congestion quickly, whereas in congestion lasting longer than 18 minutes, traffic exhibits persistence and congestion is expected to last. It is worth noting that, although the results cannot claim transferability with regard to differences in arterial geometry, traffic demand, and signalization plans, the multiregime nature of traffic flow is both mathematically well established in the proposed models and supported by the estimation results. From a methodological perspective, the paper's novelty is that the models applied allow the conditional expected duration to be a nonlinear function of past durations. Additionally, we considered possible nonstationarity that may lie in the microstructure of traffic congestion occurrences. Nevertheless, regardless of the flexibility of the nonlinear ACD models, there is still important information that has to be considered in modeling; for example, other distributional forms for the error should be considered, including the gamma and log-logistic functional forms.

Figure 1: Time-series of volume (vehicles/90 sec) and occupancy (%) for a typical day.

Figure 3: Volume-occupancy relationship for two consecutive signalized arterial links (the thick black line represents the boundary of the congestion area, after which queue spillovers are observed in the arterial link).

Figure 4: Congestion duration data time-series for a typical day in one of the arterial links under investigation.

Figure 8: Hazard function for each of the two regimes of T-WACD(1, 1).

Table 1: Parameter values used for congestion detection.

Figure 2: Autocorrelation function of the time-series of volume and occupancy.

Table 2: Statistical specifications of (a) congestion duration data and (b) the duration of related uncongested traffic periods.

Regime 1: t* ≤ 18 min; Regime 2: t* > 18 min. * AIC = −2(log l − k)/T, where log l is the log-likelihood value, k is the number of parameters, and T is the number of observations.

Table 4: Goodness-of-fit test for different congestion duration models.