This paper explores the travel time distribution of different types of urban roads, the average travel time of links and paths, and variance estimation methods by analyzing a large-scale travel time dataset collected by automatic number plate readers installed throughout Beijing. The results show that the best-fitting travel time distribution for road links in 15 min time intervals differs across traffic congestion levels. The average travel time for all links on all days can be estimated with acceptable precision by using a normal distribution. However, this distribution is not suitable for estimating travel time variance under some traffic conditions. Path travel time can be estimated with high precision by summing the travel times of the links that constitute the path. In addition, the path travel time variance can be estimated from the travel time variances of the links, provided that the travel times on all links along a given path are generated by statistically independent distributions. These findings can be used to develop and validate microscopic simulations or online travel time estimation and prediction systems.
Traffic congestion during peak hours has become unavoidable in numerous cities worldwide because of the rapid increase in car ownership and the lack of resources for proportionately increasing the capacity of road systems. As a result, travel time has become highly unreliable. Travel time and travel time reliability have become important performance measures in assessing traffic system conditions. A previous study [
Traffic systems are complex and stochastic. For instance, a number of notable sources of traffic congestion (traffic incidents, work zones, bad weather, special events, and traffic demand fluctuations) can disrupt system performance and lengthen travel time. A high variability indicates the unpredictability of travel time and the reduced reliability of traffic service [
Because actual travel time data were scarce, several previous studies analyzed travel time reliability by using either loop detector data or microsimulation techniques. In the past decade, numerous cities have implemented various direct travel time measurement techniques, such as automatic number plate readers (ANPR), automated vehicle identification (AVI) systems, GPS-equipped vehicles, smartphones, and Bluetooth [
The remainder of the paper is organized as follows. Section
Travel time distribution is an important basis for modeling travel time variability and reliability, which can be measured using several travel time distribution properties, such as standard deviation and coefficient of variation. Previous studies on travel time variability and reliability assumed that travel time distribution may follow either normal distribution or lognormal distribution [
Numerous studies have been conducted to investigate the probability distribution of travel time on freeways and signalized arterial roads. Recently collected empirical travel time data exhibit positive skews and long tails. Therefore, normal distribution is not suitable for these data. During the 1950s, Wardrop [
Dandy and McBean suggested a lognormal or gamma distribution [
To obtain a better view of the travel time distribution, Susilawati et al. [
In most of these studies, the data derived from GPS-based probe cars provide travel times only for the probe vehicles. The resulting sampling rate of the traffic flow is small, which prevents the collection of a large dataset covering various routes and times of day. Moreover, such a low sampling rate may not reflect the real distribution of travel time during a short period, such as 15 min. With the rapid development of traffic flow data collection techniques, the ANPR system allows traffic management engineers to record the time at which a vehicle passes a specific location on a road. The time difference between consecutive locations can be used directly as the travel time of that vehicle. The ANPR system samples almost the entire traffic flow, apart from camera identification errors. These accurate travel time data benefit studies of travel time variation in urban environments.
Different travel time monitoring techniques have been developed over the past decades. These techniques include ANPR cameras, Bluetooth scanners, GPS-based in-car devices, smartphones, and speed sensors. The first two have been proven to be promising methods. Various models for estimating or predicting travel time distribution have been proposed on the basis of the data obtained from these techniques, and these models have exhibited excellent performance on freeways.
Approximately 200 ANPR detectors are currently mounted throughout Beijing to record vehicle passing times. The travel time on a target road link is obtained by matching the passing times that the same vehicle records at two consecutive detectors.
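The matching step can be sketched as follows. This is a minimal illustration, not the processing pipeline used in the study: the record layout (a mapping from plate string to passing time), the function name `link_travel_times`, and the sample plates are all assumptions made for the example.

```python
from datetime import datetime


def link_travel_times(upstream, downstream):
    """Return travel times in seconds for plates seen at both detectors.

    `upstream` and `downstream` map a plate string to the datetime at which
    the plate passed that detector (hypothetical record layout).
    """
    times = {}
    for plate, t_up in upstream.items():
        t_down = downstream.get(plate)
        # Keep only vehicles detected at both ends, in the expected order.
        if t_down is not None and t_down > t_up:
            times[plate] = (t_down - t_up).total_seconds()
    return times


# Made-up detections for illustration only.
upstream = {
    "A12345": datetime(2011, 6, 16, 8, 0, 5),
    "B67890": datetime(2011, 6, 16, 8, 0, 40),
    "C24680": datetime(2011, 6, 16, 8, 1, 10),
}
downstream = {
    "A12345": datetime(2011, 6, 16, 8, 0, 50),
    "B67890": datetime(2011, 6, 16, 8, 1, 32),
}
print(link_travel_times(upstream, downstream))
```

A vehicle that is missed by the downstream camera, such as the third plate here, simply yields no travel time observation.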
However, the major problem of ANPR in urban networks lies in the difficulty in determining whether a vehicle has traveled exactly along the route between two locations without making unexpected stops. Thus, a number of invalid travel times from individual vehicles are inevitably observed. These invalid travel times do not represent the average traffic conditions on the link considered at the time the vehicle was detected. For instance, an invalid travel time that is considerably longer than the average travel time of vehicles can be observed from a vehicle making unexpected stops between two detection stations or from a bus that must stop at bus stops to load and unload passengers. Such travel times must be removed from the dataset to avoid bias in the analysis results.
The data must be preprocessed to remove outliers before the data distribution is estimated. A quartile screening method based on the interquartile range is applied in this study to reflect the scale of variation.
Specifically, the data interval is the difference between the upper and lower quartiles. A data point that lies outside the interval is identified as abnormal and deleted:
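A quartile screening of this kind can be sketched as below. The fence multiplier `k = 1.5` is an assumption (the common Tukey convention), not a value taken from this paper, and the sample travel times are made up.

```python
import statistics


def quartile_screen(travel_times, k=1.5):
    """Drop observations outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is an assumed multiplier; the study's own bounds may differ.
    """
    q1, _, q3 = statistics.quantiles(travel_times, n=4)
    iqr = q3 - q1  # interquartile range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [t for t in travel_times if lower <= t <= upper]


# Illustrative travel times in seconds; 300 s mimics a vehicle that stopped en route.
raw = [42, 45, 47, 48, 50, 52, 55, 58, 60, 300]
print(quartile_screen(raw))
```

Because the bounds are built from quartiles rather than from the mean and standard deviation, a few very long travel times do not inflate the bounds that are used to reject them.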
Figure
Travel time distribution of link 6968 before and after preprocessing.
The influence of signal timing and other parameters causes the travel time on arterial road links to show unimodal, bimodal, or multimodal distribution shapes. Therefore, in this study, the distribution patterns of the travel times of different links were analyzed in order of priority. A total of 17 unimodal distribution links were selected: seven signalized arterials with lengths ranging from 417 m to 2028 m and 10 non-signalized urban expressways with lengths ranging from 1600 m to 4100 m.
A 15 min interval is used to study how the travel time distribution on actual roads varies over time. This interval provides sufficient data for travel time distribution estimation at most times of day and is short enough to capture short-term variations in travel time.
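Grouping observations into such intervals can be sketched as follows. The record layout (entry time paired with travel time) and the function name `group_by_interval` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def group_by_interval(records, minutes=15):
    """Group travel times by the index of the interval in which the trip started.

    `records` is an iterable of (entry datetime, travel time in seconds).
    With minutes=15 the indices run from 0 to 95 over one day.
    """
    groups = defaultdict(list)
    for entry_time, travel_time in records:
        minute_of_day = entry_time.hour * 60 + entry_time.minute
        groups[minute_of_day // minutes].append(travel_time)
    return dict(groups)


# Made-up observations for illustration only.
records = [
    (datetime(2011, 6, 16, 8, 3), 46.0),
    (datetime(2011, 6, 16, 8, 14), 51.0),
    (datetime(2011, 6, 16, 8, 15), 48.0),
]
print(group_by_interval(records))
```

Calling the same function with `minutes=30` yields at most 48 groups per day.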
The data collected on June 16, 2011, were selected for use in this paper. For each road link, up to 96 15 min travel time datasets are available for analysis. Normal, lognormal, gamma, and Weibull distributions are fitted to these 15 min travel time datasets, and the chi-square test is used to assess goodness of fit.
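The idea of comparing candidate distributions by a chi-square statistic can be sketched with the standard library for two of the four candidates. This is an illustrative sketch under stated assumptions, not the fitting procedure of the study: parameters are estimated from sample moments, bins are chosen with equal expected probability, the bin count of 5 is arbitrary, and the sample is synthetic. Gamma and Weibull are omitted here because their distribution functions are not available in the standard library.

```python
import math
from statistics import NormalDist, fmean, stdev


def chi_square_normal(sample, bins=5):
    """Chi-square statistic of `sample` against a fitted normal distribution."""
    dist = NormalDist(fmean(sample), stdev(sample))
    # Edges at fitted quantiles, so every bin expects len(sample) / bins points.
    edges = [dist.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in sample:
        observed[sum(x > e for e in edges)] += 1
    expected = len(sample) / bins
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_lognormal(sample, bins=5):
    """A lognormal fit to x is a normal fit to log(x), with the same bin counts."""
    return chi_square_normal([math.log(x) for x in sample], bins)


# Synthetic right-skewed sample (assumed parameters, illustration only).
reference = NormalDist(3.8, 0.4)
sample = [math.exp(reference.inv_cdf((i + 0.5) / 200)) for i in range(200)]
print(chi_square_normal(sample), chi_square_lognormal(sample))
```

The candidate with the smaller statistic fits the binned data better. With two estimated parameters, each statistic would be compared with a chi-square critical value with `bins - 3` degrees of freedom.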
To determine the relationship between the distribution pattern and the congestion degree of the road link, travel speed
The relationship between the travel speed
Distribution form of 15 min travel time under different traffic conditions.
Signalized arterial link      Congested (<10 km/h)    Slow (10 km/h to 30 km/h)    Free flow (>30 km/h)
                              Number   Proportion     Number   Proportion          Number   Proportion
Normal distribution (1)       12       0.136          23       0.155               13       0.169
Lognormal distribution (2)    23       0.261          55       0.372               35       0.455
Gamma distribution (3)        15       0.170          22       0.149               17       0.221
Weibull distribution (4)      38       0.432          48       0.324               12       0.156

Urban expressway              Congested (<20 km/h)    Slow (20 km/h to 50 km/h)    Free flow (>50 km/h)
                              Number   Proportion     Number   Proportion          Number   Proportion
Normal distribution (1)       10       0.222          48       0.142               16       0.050
Lognormal distribution (2)    12       0.267          149      0.442               181      0.569
Gamma distribution (3)        6        0.133          98       0.291               115      0.362
Weibull distribution (4)      17       0.378          42       0.125               6        0.019
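The speed thresholds in the table headers and the way each proportion follows from the counts can be sketched as follows. The dictionary keys, function names, and the handling of speeds exactly on a threshold (assigned to "slow") are assumptions made for the example. The counts are those of the congested signalized arterial column.

```python
# Speed thresholds (km/h) taken from the table headers: (congested below, free flow above).
THRESHOLDS_KMH = {
    "signalized_arterial": (10.0, 30.0),
    "urban_expressway": (20.0, 50.0),
}


def traffic_state(speed_kmh, road_type):
    """Classify a 15 min interval by its travel speed."""
    congested_below, free_above = THRESHOLDS_KMH[road_type]
    if speed_kmh < congested_below:
        return "congested"
    if speed_kmh > free_above:
        return "free flow"
    return "slow"  # boundary speeds fall here (assumed convention)


def shares(counts):
    """Proportion of intervals best fitted by each distribution."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


congested_arterial = {"normal": 12, "lognormal": 23, "gamma": 15, "weibull": 38}
print(traffic_state(8.0, "signalized_arterial"))
print({name: round(p, 3) for name, p in shares(congested_arterial).items()})
```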

As shown in Table
Table
To estimate the 15 min travel time and its variance, the specific distribution of travel time for a given road link during a given period must first be obtained through a distribution-fitting process; the average travel time and the standard deviation of travel time can then be derived from the corresponding distribution function. In practice, however, the parameter estimation of lognormal, gamma, and Weibull distributions is complicated, which severely constrains online real-time application. By contrast, the parameter estimation of the normal distribution is simple. Therefore, the average travel time and variance estimates obtained with the normal distribution were compared with those obtained with the other distributions (i.e., lognormal, gamma, and Weibull). In applications such as travel time estimation for dynamic traffic assignment or travel time reliability estimation, if the difference is within an acceptable range, then the normal distribution can be used as a substitute for the other distributions to achieve higher real-time calculation efficiency and reduce the computational workload.
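The comparison can be sketched for the normal and lognormal cases as follows. This is an illustration under assumptions: the lognormal parameters are maximum-likelihood estimates from the log-transformed sample, the sample is made up, and the relative error is taken with respect to the alternative distribution, a definition that reproduces the errors tabulated for the example link.

```python
import math
from statistics import fmean, pstdev, stdev


def normal_estimates(sample):
    """Mean and standard deviation under a normal assumption."""
    return fmean(sample), stdev(sample)


def lognormal_estimates(sample):
    """Mean and standard deviation implied by a fitted lognormal distribution."""
    logs = [math.log(x) for x in sample]
    mu, sigma = fmean(logs), pstdev(logs)  # maximum-likelihood parameters
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, sd


def relative_error(normal_value, other_value):
    """Absolute error of the normal estimate relative to the alternative estimate."""
    return abs(normal_value - other_value) / other_value


# Made-up travel times in seconds, for illustration only.
sample = [30, 35, 40, 45, 50, 60, 80, 120]
(n_mean, n_sd), (l_mean, l_sd) = normal_estimates(sample), lognormal_estimates(sample)
print(relative_error(n_mean, l_mean), relative_error(n_sd, l_sd))

# Check against the tabulated 8:00-8:15 values (normal versus lognormal).
print(round(100 * relative_error(46.587, 45.593), 2))  # 2.18
print(round(100 * relative_error(31.803, 29.036), 2))  # 9.53
```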
First, we consider the “Nanheyan Intersection > Wangfujing Intersection” as an analysis example. The dataset consists of four 15 min intervals from 8 AM to 9 AM on June 16, 2011. The average travel time and variance estimation results of different distributions in various 15 min intervals are shown in Table
Errors between estimation of normal distribution and other distributions.
                          8:00–8:15                             8:15–8:30
                          Mean     Error    Std. dev.  Error    Mean     Error    Std. dev.  Error
Normal distribution       46.587   -        31.803     -        47.688   -        34.109     -
Lognormal distribution    45.593   2.18%    29.036     9.53%    46.592   2.35%    31.789     7.30%
Gamma distribution        46.587   0        28.03      13.46%   47.688   0        30.241     12.79%
Weibull distribution      47.143   1.18%    29.891     6.40%    48.257   1.18%    31.988     6.63%

                          8:30–8:45                             8:45–9:00
                          Mean     Error    Std. dev.  Error    Mean     Error    Std. dev.  Error
Normal distribution       73.982   -        32.15      -        56.626   -        28.849     -
Lognormal distribution    75.431   1.92%    43.737     26.49%   56.929   0.53%    33.011     12.61%
Gamma distribution        73.982   0        36.347     11.55%   56.626   0        29.279     1.47%
Weibull distribution      74.214   0.31%    30.92      3.98%    56.964   0.59%    28.201     2.30%
The following observations are drawn from Table
Without loss of generality, 17 road links observed on June 16, 2011, were used to determine the estimation error between the best-fitting and normal distributions. A total of 962 valid 15 min travel time datasets were left for analysis after the invalid datasets were removed. The errors of average travel time estimation and standard deviation estimation derived using normal distribution with other distributions are shown in Figures
(a) Average travel time estimation errors using normal distribution with other distributions. (b) Errors of standard deviation estimation using normal distribution with other distributions.
As shown in Figures
These results suggest that the average travel time under most traffic situations can be estimated using normal distribution. For the estimation of the standard deviation of travel time, that is, travel time variability or reliability, the errors are occasionally large. For high accuracy requirements, for example, more than 95%, the accuracy of normal distribution estimation cannot be ensured.
In actual trips, travelers are more concerned with the travel time of a path or trip, which is the travel time from the original point to the destination. For a path
Similarly, the path travel time variance can be computed using the path travel time realizations [
This equation serves as the ground truth used for comparison purposes.
The expected travel time along a link can be computed as follows:
The travel time variance for link
For method 2, three methods were used to compute the path travel time variance from the link travel time variance [
The majority of vehicles do not travel the entire length of a path; thus, the travel time data of a path within a short-term interval are limited. To obtain a sufficient amount of travel time data, the sampling time interval is set to 30 min in this study. Five paths are used for analysis. The main considerations in choosing these five paths are as follows. (1) In the observed period, the datasets for these paths have a sufficient number of valid records to support the analysis. (2) Each path consists of consecutive links. The dataset has a travel time for each link and a travel time for the entire path, thus allowing the correlation analysis of the links and error estimation. The data for the selected paths consist of 24 h of data collected on September 3, 2008. This dataset is then divided into 48 groups, one for each 30 min interval.
The travel time estimation errors for the five paths are illustrated in Figures
(a) Path travel time estimation error by (
As shown in Figures
Figure
MARE of the three methods for path travel time variance estimation of five paths.
MARE                  Method        Path     Path     Path     Path     Path
Whole day             Method 2.1    18.96    9.40     24.49    17.88    28.54
                      Method 2.2    57.04    26.70    22.85    29.67    33.72
                      Method 2.3    53.97    16.47    16.12    17.15    36.96

a.m. (7:00–9:00)      Method 2.1    6.83     12.94    12.23    5.37     31.17
                      Method 2.2    47.60    22.03    24.11    24.18    35.78
                      Method 2.3    43.77    19.79    11.49    6.02     35.88

p.m. (17:00–19:00)    Method 2.1    10.36    #        #        4.39     #
                      Method 2.2    40.33    #        #        21.88    #
                      Method 2.3    39.46    #        #        4.84     #

Nonpeak hours         Method 2.1    21.57    8.22     26.95    20.69    27.99
                      Method 2.2    61.18    28.25    22.60    31.10    38.12
                      Method 2.3    57.89    15.36    17.05    19.68    37.18
# indicates that, during this period, there are no data for the paths.
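The mean absolute relative error (MARE) can be computed as in the following sketch. Expressing the result in percent is an assumption, and the input values are made up for illustration.

```python
def mare(estimates, ground_truth):
    """Mean absolute relative error, in percent (assumed convention)."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - g) / g for e, g in zip(estimates, ground_truth))
    return 100 * total / len(estimates)


# Made-up variance estimates against made-up ground-truth values.
print(mare([110.0, 90.0, 105.0], [100.0, 100.0, 100.0]))
```

Because every error is divided by its own ground-truth value, intervals with a small observed variance can dominate the average.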
Standard deviation of travel time estimation.
The following observations are drawn from Table
These results differ somewhat from those described in [
These findings suggest that the path average travel time can be estimated by summing the travel time along the set of links constituting the path. On the basis of the performance of Method 2.1, the assumption that travel times on all links along a path are generated by statistically independent distributions is reasonable under most traffic conditions, particularly for congested periods. Nevertheless, for a number of paths, Method 2.3 is more suitable than Method 2.1 for calculating the path travel time variance.
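Under the independence assumption, both path-level quantities reduce to sums over the links. The sketch below covers only this independent case; Methods 2.2 and 2.3 are not reproduced, and the link values are illustrative.

```python
import math


def path_mean(link_means):
    """Expected path travel time as the sum of the link means."""
    return sum(link_means)


def path_variance_independent(link_variances):
    """Path variance when link travel times are statistically independent."""
    return sum(link_variances)


# Illustrative link means and standard deviations in seconds.
link_means = [46.6, 74.0, 56.6]
link_sds = [31.8, 32.2, 28.8]

mean = path_mean(link_means)
variance = path_variance_independent([sd ** 2 for sd in link_sds])
print(mean, variance, math.sqrt(variance))
```

Variances add, not standard deviations, so the path standard deviation is smaller than the sum of the link standard deviations. If link travel times are positively correlated, covariance terms make the true path variance larger than this sum.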
This paper discussed three issues concerning travel time variability and reliability estimation on the basis of the ANPR data in the Beijing road network. The first observation involves link travel time distribution estimation through goodness-of-fit tests within short-term time intervals of 15 min. The results show that for congested periods, approximately 40% of the 15 min travel time distributions followed the Weibull distribution, with 43.2% for signalized arterials and 37.8% for urban expressways. As the average travel speed increased, the proportion of 15 min travel time distributions following the lognormal distribution gradually increased, reaching 45.5% for signalized arterials and 56.9% for urban expressways under uncongested traffic conditions.
The second observation involves the quick estimation of the average travel time for 15 min intervals. The comparison between the estimates based on the normal distribution and those based on the better-fitting distributions shows that the normal distribution gives acceptable estimates of the average travel time, with errors of no more than 2% and less than 1% for more than 90% of the estimations. For travel time variance estimation, although the MARE is 6.9%, approximately one-third of the absolute relative errors exceed 50%. Thus, under most traffic conditions, the normal distribution can be used to estimate the average travel time, whereas the best-fitting distribution can be used to estimate the travel time variance.
The third observation involves the estimation process of the path travel time and variance on the basis of the travel time of the links constituting the path. The results show that the path travel time can be computed by summing the average travel time of links constituting the path. In addition, Method 2.1 produces the best results for path travel time variance estimation under most traffic conditions, particularly during congested periods.
In the future, more travel time data should be collected to conduct similar studies based on large-scale datasets and to further understand the travel time distribution pattern and the relationship between path travel time and the travel times of the links constituting a given path, particularly for long-distance paths or trips with more than three links.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (Grant no. 71361130015). The authors are personally grateful to Beijing Traffic Management Bureau for its support.