Multistate models, that is, models with two or more component distributions, are preferred over single-state probability models for modeling the distribution of travel time. The literature review indicated that finite multistate modeling of travel time using the lognormal distribution is superior to modeling with other probability functions. In this study, we extend the finite multistate lognormal model for estimating the travel time distribution to an unbounded number of lognormal components. In particular, a nonparametric Dirichlet Process Mixture Model (DPMM) with a stick-breaking process representation was used. The strength of the DPMM is that it chooses the number of components dynamically, as part of the algorithm, during parameter estimation. To reduce computational complexity, the modeling process was limited to a maximum of six components. The Markov Chain Monte Carlo (MCMC) sampling technique was then employed to estimate the posterior distributions of the parameters. Speed data from nine links of a freeway corridor, aggregated on a 5-minute basis, were used to calculate the corridor travel time. The results demonstrated that the model offers considerable flexibility to account for complex mixture distributions of the travel time without specifying the number of components. The DPMM modeling further revealed that freeway travel time is characterized by multistate or single-state models depending on whether the onset and offset of congestion periods are included.

Modeling travel time distribution is essential for measuring the consistency of the traffic performance of a highway system. Moreover, the distribution of the travel time is useful in simulation and theoretical derivations regarding different traffic performance measures such as travel time reliability and variability. The accurate estimation and prediction of travel time are essential for traffic operators, planners, and traveler information systems [

This study develops a nonparametric Bayesian model to estimate the travel time distribution for freeways. The model is based on the Dirichlet process, extended with a hierarchical structure to account for the mixture/multistate characteristics of a given dataset. During the modeling process, the proposed model is truncated at an upper bound of six mixture components to reduce computational cost. Unlike a parametric model, this model does not require the true number of components to be specified; instead, the number of components can grow with the dataset and is inferred automatically within the Bayesian posterior inference framework. The posterior distributions of the model parameters are derived using the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) sampler. For this study, an Interstate 295 freeway corridor located in Jacksonville, Florida, was studied using 2015 traffic data.

The next section reviews relevant studies, followed by the methodological framework used in this research. The dataset and the method used to estimate the travel time are then discussed. Next, the results are presented and the model is evaluated using simulated data with known parameters, after which conclusions and recommendations for future research are given.

Literature review indicates that models of estimating the travel time distribution can be divided into two groups, that is, single probability (unimodal) and multistate/mixture models. Unimodal distributions commonly used to estimate the travel time distribution are Gaussian, lognormal, gamma, Weibull, and Burr [

The multistate/mixture models refer to models comprising two or more distributions. In mixture modeling, the component distributions are combined linearly as a weighted sum, where the weights are the mixing probabilities of the model. Studies comparing the performance of mixture models to single models revealed that mixture models provide a superior fit of travel time distribution over single models [
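To make the weighted-sum description concrete, a finite lognormal mixture with $K$ components can be written as below. The symbols $w_k$, $\mu_k$, and $\sigma_k$ (mixing probability, log-scale mean, and log-scale standard deviation of component $k$) are notation assumed here for illustration and may differ from the notation used elsewhere in the paper.

```latex
f(t) = \sum_{k=1}^{K} w_k \, \mathrm{LN}(t \mid \mu_k, \sigma_k),
\qquad
\mathrm{LN}(t \mid \mu, \sigma) = \frac{1}{t \, \sigma \sqrt{2\pi}}
  \exp\!\left( -\frac{(\ln t - \mu)^2}{2\sigma^2} \right),
\qquad
w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1 .
```

With $K = 1$ the expression reduces to a single-state lognormal model.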

In multistate modeling, there are two commonly used methods for finding model parameters, that is, the maximum likelihood estimation-expectation maximization (MLE-EM) and the Bayesian approach (BA) [

Taken together, the probability distributions discussed above are parametric, with either single-model or multistate characteristics, whereby the multistate model consists of a fixed number of mixture components. The number of mixture components is specified as an input to the model. Information criteria, cross-validation, and the Bayes factor are procedures commonly used to select the best model among a set of candidates [

However, there are two methods that can be used in modeling without causing overfitting or underfitting problems. The use of the infinite Dirichlet Process Mixture Model (DPMM) with a truncated number of mixture components overcomes the underfitting problem [

The Bayesian nonparametric mixture models have been implemented in a wide range of applications, including topic modeling, image analysis, and lifetime distribution [

The Dirichlet distribution generalizes the Beta distribution to more than two outcome categories. The distribution is parameterized by a concentration parameter

The Dirichlet process is described as a distribution over distributions, that is, over probability distributions on an infinite sample space [
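The stick-breaking representation with truncation at six components can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the `alpha` argument (the concentration parameter), and the choice to assign all remaining mass to the last component are assumptions of the sketch.

```python
import random

def truncated_stick_breaking(alpha, max_components=6, rng=random):
    """Draw mixing weights from a stick-breaking process truncated at max_components."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(max_components):
        # Assumption: the last component takes whatever is left,
        # so the truncated weights sum to one.
        if k == max_components - 1:
            v = 1.0
        else:
            v = rng.betavariate(1.0, alpha)  # break fraction ~ Beta(1, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

random.seed(0)
weights = truncated_stick_breaking(alpha=1.0)
```

Small values of `alpha` tend to concentrate the mass on the first few components, which is how a truncated model can end up using fewer than six components in practice.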

Graphical Dirichlet Mixture Model.

The model in Figure

For this study, data from the 20.4-mile corridor of the Interstate 295 freeway (Figure

Summary of links.

Link ID | Location | Distance (miles) | Number of detectors |
---|---|---|---|
1 | From I-95 to Old St. Augustine Rd | 2.8 | 4 |
2 | San Jose Blvd | 1.6 | 4 |
3 | Park Ave. | 4.8 | 5 |
4 | Blanding Blvd | 2.0 | 3 |
5 | Collins Rd | 1.1 | 2 |
6 | 103rd St. | 3.2 | 4 |
7 | Wilson Blvd. | 1.5 | 4 |
8 | Normandy Blvd. | 2.0 | 4 |
9 | I-10 | 1.2 | 1 |

The study corridor.

The travel time of each link was estimated from the average of the speeds reported by all MVDs in the segment. In addition, time of day was an important parameter in the analysis. The travel time of a segment

We used the same departure time for all links when estimating the corridor travel time from the individual links' travel times. After aggregating the travel time, the results showed that the morning peak hour for both directions (northbound and southbound) occurred between 7 a.m. and 8 a.m., while the evening peak hour occurred between 5 p.m. and 6 p.m. Figure
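The same-departure-time aggregation can be sketched as below. The link lengths come from the summary-of-links table; the assumption that a link's travel time is its length divided by the arithmetic mean of its detector speeds, the function names, and the example speeds are illustrative and not taken from the paper.

```python
# Link lengths in miles, taken from the summary-of-links table.
LINK_LENGTHS_MI = [2.8, 1.6, 4.8, 2.0, 1.1, 3.2, 1.5, 2.0, 1.2]

def link_travel_time_min(length_mi, detector_speeds_mph):
    """Travel time in minutes from the mean of the detector speeds on a link."""
    mean_speed = sum(detector_speeds_mph) / len(detector_speeds_mph)
    return 60.0 * length_mi / mean_speed

def corridor_travel_time_min(speeds_by_link):
    """Sum link travel times for one 5-minute interval (same departure time)."""
    return sum(
        link_travel_time_min(length, speeds)
        for length, speeds in zip(LINK_LENGTHS_MI, speeds_by_link)
    )

# Hypothetical interval in which every detector reports 60 mph.
example_speeds = [[60.0, 60.0]] * len(LINK_LENGTHS_MI)
example_time = corridor_travel_time_min(example_speeds)
```

Because every link uses the speeds from the same interval, this sketch ignores the time a vehicle needs to reach downstream links.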

Time-of-day corridor travel times (panels: northbound traffic; southbound traffic).

Two chains were drawn; the first 10,000 iterations of each were discarded as burn-in, and the next 10,000 iterations were used for inference. To reduce correlation between the drawn samples, the inference iterations were thinned by keeping every 10th draw. Figure
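The burn-in and thinning step amounts to a simple slice of each chain. The sketch below uses a stand-in list of integers rather than real sampler output, and the function name is hypothetical.

```python
def burn_and_thin(chain, burn_in=10_000, thin=10):
    """Drop the burn-in draws, then keep every `thin`-th remaining draw."""
    return chain[burn_in::thin]

# Stand-in for 20,000 sampler draws; a real chain would hold parameter values.
chain = list(range(20_000))
kept = burn_and_thin(chain)
```

Under these settings each chain contributes 1,000 retained draws.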

Corridor parameters and KS goodness-of-fit.

Time of day | Number of components (mixture probability %) | Mean | Standard deviation | KS test statistic | KS p-value |
---|---|---|---|---|---|
Northbound (from I-95 to I-10) | | | | | |
6:00 a.m. | 2 | (2.89, 3.09) | (0.02, 0.14) | 0.093 | 0.32 |
7:00 a.m. | 2 | (2.87, 2.95) | (0.01, 0.05) | 0.051 | 0.73 |
8:00 a.m. | 2 | (2.87, 2.94) | (0.01, 3.19) | 0.074 | 0.43 |
9:00 a.m. | 1 | (2.87) | (0.01) | 0.079 | 0.99 |
10:00 a.m. | 1 | (2.88) | (0.01) | 0.090 | 0.68 |
15:00 p.m. | 2 | (2.88, 2.92) | (0.01, 0.04) | 0.078 | 0.40 |
16:00 p.m. | 2 | (2.93, 3.01) | (0.03, 0.07) | 0.067 | 0.99 |
17:00 p.m. | 2 | (2.96, 3.09) | (0.10, 0.13) | 0.027 | 0.96 |
18:00 p.m. | 2 | (2.89, 3) | (0.03, 0.08) | 0.049 | 0.52 |
19:00 p.m. | 2 | (2.87, 3) | (0.02, 0.08) | 0.125 | 0.45 |
20:00 p.m. | 2 | (2.87, 2.9) | (0.01, 0.06) | 0.046 | 0.98 |
Southbound (from I-10 to I-95) | | | | | |
6:00 a.m. | 2 | (2.94, 3.09) | (0.02, 0.12) | 0.070 | 0.07 |
7:00 a.m. | 1 | (3.25) | (0.17) | 0.037 | 0.35 |
8:00 a.m. | 2 | (2.96, 3.14) | (0.03, 0.12) | 0.064 | 0.06 |
9:00 a.m. | 2 | (2.94, 3.13) | (0.01, 0.83) | 0.078 | 0.29 |
10:00 a.m. | 1 | (2.94) | (0.02) | 0.108 | 0.17 |
15:00 p.m. | 2 | (2.92, 2.96) | (0.01, 0.03) | 0.083 | 0.31 |
16:00 p.m. | 3 | (2.95, 2.94, 1.38) | (0.04, 0.02, 0.09) | 0.938 | 0.86 |
17:00 p.m. | 2 | (2.91, 3) | (0.01, 0.07) | 0.061 | 0.28 |
18:00 p.m. | 2 | (2.9, 2.98) | (0.01, 0.05) | 0.062 | 0.43 |
19:00 p.m. | 3 | (2.91, 2.72, 2.2) | (0.02, 0.86, 0.23) | 0.081 | 0.27 |
20:00 p.m. | 1 | (2.93) | (0.2) | 0.082 | 0.41 |

Predicted distribution and actual data density (panels: northbound traffic; southbound traffic).

Cumulative predicted distribution and data cumulative density (panels: northbound traffic; southbound traffic).

As can be seen in Table

To understand the performance of the DPMM in estimating the distribution of the travel time, four finite mixture models (i.e., single, two, and three mixture models) were simulated. The simulation was aimed at evaluating the accuracy of the model when the true parameters are known. The true mean and variance were chosen randomly from the links' average and variance of the travel time data. These true parameters were then used to simulate samples of various sizes (100, 1,000, and 10,000) from the lognormal distribution with the predefined finite mixture. Different sample sizes were simulated to evaluate the influence of sample size on the proposed model. The DPMM, truncated at six components, was applied to each simulated sample. After discarding the first 10,000 iterations, the next 10,000 iterations were used for inference of the model parameters. Table
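Drawing synthetic travel times from a finite lognormal mixture with known parameters can be sketched as follows. The function name and the two-component weights, log-scale means, and log-scale standard deviations are hypothetical values chosen for illustration; they are not the true parameters used in the study.

```python
import random

def simulate_lognormal_mixture(n, weights, mus, sigmas, seed=None):
    """Draw n travel times from a finite lognormal mixture with known parameters."""
    rng = random.Random(seed)
    # Pick a component index for each observation according to the weights.
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    # Draw each observation from the lognormal of its assigned component.
    return [rng.lognormvariate(mus[c], sigmas[c]) for c in components]

# Hypothetical two-component parameters on the log scale (not the paper's values).
samples = {
    n: simulate_lognormal_mixture(n, [0.7, 0.3], [2.9, 3.1], [0.02, 0.10], seed=1)
    for n in (100, 1_000, 10_000)
}
```

Each simulated sample can then be passed to the fitting procedure, and the recovered parameters compared with the known inputs.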

Parameters of the study.

ID | True parameters | Predicted parameters |
---|---|---|
a | | |
b | | |
c | | |
d | | |
e | | |
f | | |

Regardless of the number of observations, the number of mixture components was predicted correctly. The true and the predicted distributions are plotted in Figure

Estimated travel time distributions of the simulated data.

In addition, as with the goodness-of-fit test for the travel time distribution, the KS test was conducted to compare the actual cumulative distributions against those predicted by the DPMM. The results provide no evidence to reject the null hypothesis that the observed data follow the predicted distribution.
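A one-sample KS comparison against a fitted lognormal mixture can be sketched with the standard library alone. The parameter values, the synthetic data, and the function names are assumptions of the sketch; the decision rule uses the common large-sample 5% critical value of about 1.36 divided by the square root of the sample size, rather than the p-values reported in the tables.

```python
import math
import random
from statistics import NormalDist

def lognormal_mixture_cdf(t, weights, mus, sigmas):
    """CDF of a finite lognormal mixture evaluated at travel time t > 0."""
    return sum(
        w * NormalDist(mu, sigma).cdf(math.log(t))
        for w, mu, sigma in zip(weights, mus, sigmas)
    )

def ks_statistic(data, cdf):
    """One-sample KS statistic: largest gap between empirical and model CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

# Hypothetical fitted parameters and synthetic data drawn from the same model.
w, mu, sd = [0.7, 0.3], [2.9, 3.1], [0.02, 0.10]
rng = random.Random(7)
data = [
    rng.lognormvariate(*rng.choices(list(zip(mu, sd)), weights=w)[0])
    for _ in range(1_000)
]
d_stat = ks_statistic(data, lambda t: lognormal_mixture_cdf(t, w, mu, sd))
critical_5pct = 1.36 / math.sqrt(len(data))  # large-sample 5% critical value
```

A statistic below the critical value gives no evidence against the fitted distribution at the 5% level.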

As indicated in Table

The Kolmogorov-Smirnov goodness-of-fit test of the cumulative density.

ID | Test statistic | p-value |
---|---|---|
a | 0.015 | 0.999 |
b | 0.083 | 0.624 |
c | 0.026 | 0.999 |
d | 0.035 | 0.999 |
e | 0.028 | 0.999 |
f | 0.046 | 0.999 |

This study evaluated the application of a nonparametric Bayesian mixture model, a truncated DPMM with a lognormal kernel density, to estimate the travel time distribution. The model developed here extends the commonly used mixture models to incorporate uncertainty about the number of mixture components. In the DPMM, the number of components and the parameters of the travel time distribution were treated as random variables. One year of spot speed data collected from a 20.4-mile corridor of the Interstate 295 freeway in Jacksonville, Florida, was used in the study. The peak and nonpeak hour travel times were aggregated at 5-minute intervals using data from MVDs installed on the various links of the corridor.

The findings demonstrate that the developed model is capable of modeling the travel time distribution. Moreover, the results support previous findings that the travel time distribution is characterized by both multistate and single-state models, depending on the time window of the analysis. Furthermore, the results demonstrate that the proposed model offers considerable flexibility to account for complex mixture distributions of the travel time without specifying the number of components. The analysis also incorporated the uncertainty related to the number of mixture components. The KS test comparing the actual and predicted cumulative distributions gave promising results. Moreover, when the proposed model was tested on simulated data, the true number of mixture components, the means, and the standard deviations were correctly predicted.

It is important to note that in this study the travel time for the corridor was aggregated using the same departure time. This process may not represent the actual travel time of the corridor. Future studies may consider a vehicle trajectory-based method, dynamic time slice methods, or other methods to aggregate travel time across links. In addition, future studies could aim at analyzing and comparing the finite mixture and nonparametric mixture models using different sample sizes and other kernel functions such as gamma and normal distributions.

The random probability density function coming from the Dirichlet distribution with parameters

The base measure

Concentration parameter

The random distribution drawn from the Dirichlet process

The parameter of

The nonnegative vector representing a probability mass function of length

The mixing proportion

A Dirac delta function concentrated at

The lognormal kernel distribution function with a parameter

The number of mixture components, usually equal to or less than a total number of realizations

Travel time

The gamma function.

The authors declare that there is no conflict of interest regarding the publication of this paper.