Data Fusion Based Hybrid Approach for the Estimation of Urban Arterial Travel Time

Travel time estimation in urban arterials is challenging compared to freeways and multilane highways. This becomes more complex under Indian conditions due to the additional issues related to heterogeneity, lack of lane discipline, and difficulties in data availability. Because most urban arterials in India do not employ automatic detectors, an effective yet less data-intensive way of estimating travel time is needed. An attempt has been made in this direction to estimate total travel time in an urban road stretch using location-based flow data and sparse travel time data obtained using GPS-equipped probe vehicles. Three approaches are presented and compared in this study: (1) a combination of input-output analysis for midblocks and Highway Capacity Manual (HCM) based delay calculation at signals, named the base method, (2) a data fusion approach which employs the Kalman filtering technique (non-hybrid method), and (3) a hybrid data fusion-HCM (hybrid DF-HCM) method. Data collected from a stretch of roadway in Chennai, India, were used for the corroboration. Simulated data were also used for further validation. The results showed that when data quality is assured (simulated data), the base method performs better. However, in real field situations, the hybrid DF-HCM method outperformed the other methods.


Introduction
Characterization of traffic systems is complex in nature due to the dynamic interaction between the system components, namely, the vehicles, the road, and the road users. The uncertainties associated with human behavior make the system more complex and its modeling a challenging task. Estimation and prediction of the various parameters associated with this system are also difficult due to these uncertainties. The usual parameters used for characterizing the system include flow, speed, density, and travel time. The present study deals with the estimation of one of these parameters, namely, travel time. Obtaining travel time information of all vehicles in a stream by direct measurement is both time consuming and costly, and it is impractical to collect this information from
Data fusion is a broad area of research in which data from several sensors are combined to provide comprehensive and accurate information [20]. The advantages of using data fusion include increased confidence, reduced ambiguity, improved detection, increased robustness, enhanced spatial and temporal coverage, and decreased costs [20-22]. The basic idea of data fusion is to estimate parameters by using more than one measurement from different sources or sensors. This may be needed because enough data are not available from a single source, or to capture the advantages of different data sources. Some specific applications of data fusion in the field of transportation engineering are discussed below.
Kwon et al. [23] proposed a linear regression model for travel time prediction by combining both loop detector and probe vehicle data. They showed that linear regression on current flow, occupancy measurements, departure time, and day of week is beneficial for short-term travel time prediction, while the historical method is better for long-term travel time prediction. Zhang and Rice [24] used a linear model with varying coefficients to predict the travel time on freeways using loop detector and probe vehicle data. The coefficients vary as smooth functions of departure time. They have to be estimated offline and stored, after which the model can be used in real time. El-Faouzi et al. [25] put forward a model based on the Dempster-Shafer theory. They used travel time from loop detector and toll collection data to estimate travel time. The model required the likelihood that the data sources are giving the correct data. El-Faouzi [22] carried out a similar work using the Bayesian method, with travel time data from loop detectors and probe vehicles, to estimate travel time. The results showed that the travel time estimate using the data fusion approach was better than the estimate obtained if the data sources were used individually. Chu et al. [21] used simulated loop detector and probe vehicle data to estimate travel time using a model based approach with the Kalman filtering technique. Ivan [26] used the ANN technique to detect traffic incidents on signalized arterials using simulated travel time data from loop detectors and probe vehicles.
Another simple analytical model that uses readily available count data from the upstream and downstream ends of a link for the estimation of travel time is the cumulative counts input-output method [9, 10, 12-14, 27]. However, a major drawback of the input-output method is its dependency on the accuracy of flow counts for travel time estimation [9-13, 27, 28]. Other reported approaches include traffic flow theory based models [11, 29, 30]. It can be seen from the above literature review that the majority of the models discussed are limited to freeways, and it may not be feasible to apply them directly on urban networks without further calibration due to differences in the behaviour of traffic on freeway and urban facilities. Moreover, the models developed for freeways generally provide the average travel time for the link as a whole, which may not be a true representation for links with intersections, turning movements, and so forth. Thus, for better performance, intersection delays may have to be dealt with separately. The present study analyses the validity of this assumption by comparing the accuracy of the estimated travel time with and without considering the intersection separately.
The first and one of the most popular methods for intersection delay estimation was developed by Webster [31] from a combination of theoretical and numerical simulation approaches, and it became the basis for all subsequent delay models. Modifications to the above model under varying traffic conditions were reported by Miller [32] and Newell [33]. The delay model suggested in the Highway Capacity Manual, usually known as the HCM model [34], is a modified Webster's model incorporating the effect of progression and platooning. Attempts to overcome the assumption of steady-state conditions by using time-dependent functions are reported in [35]. Other reported studies include the deterministic queueing method [28, 36], a modified input-output technique [18], shock-wave theory based models [37, 38], and the use of Markov chain processes [39, 40].
Overall, it can be seen that most of the reported studies on travel time and delay estimation used data collected from homogeneous and lane-disciplined traffic, either directly from the field or indirectly through simulation models. The traffic conditions existing in India are complex and different, with heterogeneity and lack of lane discipline. Only a limited number of studies [7, 41, 42] have addressed heterogeneous traffic characteristics, and none of them estimated the stream travel time in an urban arterial taking signals into account. Also, the lack of automated data collection methods in India makes it difficult to explore many of the data-driven statistical, time series, and machine learning techniques, since a good database is required for applying such techniques. Thus, a new approach for urban arterial travel time estimation with lower data requirements is of interest for countries like India, requires additional research, and is the subject of this study.
The present study compares the performance of three different travel time estimation methodologies, which use flow data as the main input. The estimation methodologies include two hybrid methods, namely, the base method and the data fusion-HCM method, and a non-hybrid method. The study stretch consists of a midblock and an intersection. The total travel time of the stretch is considered as the sum of the travel time in the midblock section, without being influenced by the intersections, and the travel time at the intersection, taking into account the delays at signals. Mid-block travel time is estimated using two approaches, namely, input-output analysis and the data fusion approach. Input-output analysis is a popular approach and utilizes the cumulative counts at entry and exit to find the travel time of vehicles within the section. The HCM method is the most popular approach for estimating delay at signalized intersections. The method of applying input-output analysis for the midblock and the HCM method for the intersection to obtain the total travel time of vehicles in the study stretch can be considered as a base approach and is termed the base method. The other approach uses a data fusion method for mid-block travel time estimation and HCM analysis for the intersection area and will be called the hybrid DF-HCM method. The data fusion approach utilizes the location-based flow data and the sparse travel time data obtained from probe vehicles for estimating the travel time of the stream. The total travel time of the stretch is then obtained by summing up the mid-block travel time and the delay incurred at the bounding intersection. The necessity for analyzing the intersections separately was validated using a non-hybrid approach, where the total travel time is estimated using data fusion alone for the whole stretch, without separating it into mid-block and intersection areas and without separately analysing the delay at the intersection. The data up to the intersection stop line are used in this case for the data fusion approach, assuming that the delay is implicitly captured by the flow and travel time data up to the stop line. A comparison of these three methodologies is carried out to identify the best method for travel time estimation under heterogeneous traffic conditions. This is one of the first studies under Indian conditions to apply data fusion techniques as well as a hybrid technique for the estimation of travel time. The study illustrates an efficient method to estimate the stream travel time in urban arterials with limited GPS data and location-based flow data. The results of the study stress the necessity of analyzing the intersections separately for more reliable estimates of travel time on urban roads.

Data Collection
Under Indian traffic conditions, a ready-to-use data archive is not available, and hence the study relied upon field data collected manually and on simulated data generated using the VISSIM simulation package for corroborating the estimation methods.

Field Data
The test bed selected for the present study is a busy six-lane arterial road, namely, Rajiv Gandhi Salai in Chennai, India. Traffic in one direction only was considered for the present study. Data requirements based on the selected methodology included flow data from three locations, as shown in Figure 1. The distance between location 1 and location 2, which is before the influence of the intersection, is 1.72 km, and that between location 2 and the intersection (location 3) is 0.1 km, making the total section under consideration 1.82 km long.
Two pedestrian overbridges were identified to mount the cameras for covering locations 1 and 2. The intersection area covering the stop line at location 3 was recorded using a camera mounted on a convenient luminaire support. To ensure that the three video recordings could be synchronized during replay in the lab, the clocks in all three cameras were set to a common time at the start of the data collection. Data were collected over three days for a total of six hours for the complete analysis, and over another two days for a total of three and a half hours from the mid-block alone for comparing the input-output method with the data fusion approach.
The required location-based data, namely, the flow, were collected using the videographic technique. Initial snapshots of the traffic inside the study stretch were taken from elevated points to get the initial count of vehicles in the section required for input-output analysis. Photographs taken from the entry and exit points, along with additional photographs taken from elevated points in between, were required to capture the whole length of the section. Classified flow data at the three data collection points were extracted manually for the two-wheeler, three-wheeler, and four-wheeler categories from the videos at a temporal resolution of one minute. The required flow data were extracted manually due to the lack of automated procedures. Travel time data required for validation were also extracted manually from the videos by reidentifying vehicles at various locations.
The limited travel time data required for the data fusion model were collected using test vehicles equipped with GPS units. The test vehicles comprised two cars, two three-wheelers, and two two-wheelers, which provided travel time data. These vehicles travelled back and forth between the entry and exit points continuously during the data collection period. Moreover, GPS data were available from route number 5C of the public transport bus passing through the stretch. The raw GPS data included time, latitude, and longitude at five-second intervals. From these, travel time data were extracted using the software package ArcGIS [43].
Due to the lack of automated video data extraction, all the above field data collection and extraction were carried out manually, which was laborious and time consuming. The data collection procedure required a lot of coordination for collecting the video simultaneously from three different locations, along with the initial snapshots and GPS data from all the different types of vehicles. An error in video recording at even one location makes the entire data set unusable for analysis. Due to these difficulties, it was decided to simulate the field traffic conditions using the collected field data and to carry out further analysis using the data generated from the calibrated simulation network. The simulation is described below.

Simulated Data
VISSIM 5.3 from PTV Vision [44] was used in the present study to simulate the traffic conditions for testing the accuracy of the travel time estimation model under varying traffic conditions. A road stretch similar to the field test bed was created in VISSIM using a satellite image. For a realistic representation of field conditions, data on intersection geometry, signal timing and phasing, vehicle types, traffic composition, vehicle input, proportion of turning traffic, and speed distribution were entered from the field. Less lane-disciplined traffic movement was achieved by allowing vehicles to be placed anywhere on the lane, by setting the option for overtaking on both the left and right sides of a vehicle, and by allowing diamond-shaped queuing at the intersection. To account for the nonstandard vehicle types, the static and dynamic characteristics of most of the regular vehicle types, in terms of length, width, acceleration, deceleration, and speed ranges, were defined based on field values. Signal timing and phase change data from the field were used in the simulation, with a total cycle time of 145 s, red time of 98 s, green time of 45 s, and amber time of 2 s. Classified flow data for the two-wheeler, three-wheeler, and four-wheeler categories, extracted manually for the three locations, were used for calibration and validation of the simulation. Five-minute aggregated flow and composition at location 1 were used as dynamic inputs for calibration. Data generated at the other two locations were compared with the field values for validation.
During calibration, several parameters in VISSIM were adjusted to match the field scenario. Simulations were performed with different random seeds, with an average of five to ten values for each influencing parameter. Parameters were calibrated such that the error in flow, density, and travel time/speed was reduced. The errors were quantified in terms of mean absolute percentage error and were comparable with other similar studies in VISSIM [45, 46].

Estimation Schemes
As mentioned already, in this study the total travel time is estimated using a hybrid DF-HCM method, which makes use of the data fusion approach for the mid-block and the HCM approach for the intersection. A comparison is carried out using the base method, which employs input-output analysis, a simple deductive principle based on cumulative counts, for estimating the link travel time and HCM analysis for the intersection area. The total travel time of the stretch is then estimated as the sum of the mid-block and intersection travel times. Also, the need for analysing intersection delay separately is verified by comparing with a direct estimation of travel time up to the intersection using the non-hybrid method, which employs the data fusion approach alone for the whole section. The basic approaches of the above methods, namely, the model based approach using data fusion, the input-output method, and the HCM approach, are discussed below.

Data Fusion Method
This section details the methodology adopted for fusing Eulerian video data and Lagrangian GPS data for estimating the travel time. The methodology was motivated by the study of Chu et al. [21]. The estimation scheme is based on the conservation equation and the fundamental traffic flow equation, given in (3.1) and (3.2), respectively:
$$\frac{\partial q}{\partial x} + \frac{\partial k}{\partial t} = 0, \tag{3.1}$$
$$q = k V, \tag{3.2}$$
where $q$ is the flow in PCU/hour, $k$ is the density in PCU/km, and $V$ is the space mean speed in km/hour, with $x$ being the distance and $t$ being the time. By discretising (3.1), the density at time $t$ can be represented as
$$k(t) = k(t-1) + \frac{\Delta t}{\Delta x}\left[q_{\text{entry}}(t-1,t) - q_{\text{exit}}(t-1,t)\right], \tag{3.3}$$
where $q_{\text{entry}}(t-1,t)$ and $q_{\text{exit}}(t-1,t)$ are, respectively, the flows in PCU/h at the entry and exit points during the time interval $t-1$ to $t$, $\Delta x$ is the length of the section, and $\Delta t$ is the data aggregation interval (1 minute in this study).
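A filtering technique is used to estimate the density by assuming a value for the initial density, $k_0$. The average travel time taken by the vehicles to reach the exit point from the entry is then given by
$$tt(t) = \frac{\Delta x \, k(t)}{q(t-1,t)}, \tag{3.4}$$
where $tt(t)$ is the travel time at time $t$ and $q(t-1,t)$ is the flow along the section during $t-1$ to $t$, which is given by
$$q = \begin{cases} q_{\text{entry}}, & \text{if } q_{\text{exit}} - q_{\text{entry}} > q_{\text{critical}}, \\ q_{\text{exit}}, & \text{if } q_{\text{entry}} - q_{\text{exit}} > q_{\text{critical}}, \\ \dfrac{q_{\text{entry}} + q_{\text{exit}}}{2}, & \text{if } \left| q_{\text{entry}} - q_{\text{exit}} \right| < q_{\text{critical}}. \end{cases} \tag{3.5}$$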
The average of the flows at the entry and exit points is used under normal conditions, when the flows at both ends are comparable without any shock-wave propagation. When the flows at the entry and exit are not comparable, the minimum of the two is adopted to capture the density variation within the stretch. In the present study, $q_{\text{critical}}$ was selected as 20 PCU/minute.
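The density update, flow selection, and travel time relations described above can be illustrated with a minimal Python sketch. The function names, the choice of PCU/minute as the working unit (so that $\Delta t$ = 1 minute), the treatment of a flow difference exactly equal to $q_{\text{critical}}$ as "comparable", and the numerical inputs in the usage note are assumptions made for illustration only; they are not taken from the study's implementation.

```python
def section_flow(q_entry, q_exit, q_critical=20.0):
    """Representative section flow in PCU/min, following the piecewise rule (3.5).

    Assumption: a difference exactly equal to q_critical is treated as
    'comparable' and falls through to the averaging branch.
    """
    if q_exit - q_entry > q_critical:
        return q_entry            # entry flow is the smaller of the two
    if q_entry - q_exit > q_critical:
        return q_exit             # exit flow is the smaller of the two
    return 0.5 * (q_entry + q_exit)


def update_density(k_prev, q_entry, q_exit, dt_min=1.0, dx_km=1.72):
    """Discretised conservation equation (3.3); flows in PCU/min, density in PCU/km."""
    return k_prev + (dt_min / dx_km) * (q_entry - q_exit)


def travel_time_min(k, q, dx_km=1.72):
    """Travel time (3.4) in minutes: section length times density over flow."""
    return dx_km * k / q
```

For example, with a hypothetical previous density of 50 PCU/km, an entry flow of 60 PCU/min, and an exit flow of 50 PCU/min, `update_density(50, 60, 50)` adds 10/1.72 PCU/km to the density, `section_flow(60, 50)` returns the average flow of 55 PCU/min, and `travel_time_min` then gives 96/55, or about 1.75 minutes, for the 1.72 km mid-block.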
In the above formulation, since the initial density in the section, $k_0$, is unknown, there is a need for a parameter estimation scheme. The use of techniques such as Kalman filtering or High Gain Observer (HGO) based parameter identification is reported in the literature for similar applications [21, 47, 48]. In the present study, a Kalman filter based estimation scheme is adopted.
The Kalman filter (KF) is a recursive algorithm [49] and is usually applicable to system models which can be written in the state space representation. It is a model based tool for estimation and prediction and incorporates the stochastic nature of parameters. The KF can be of different types, such as the discrete Kalman filter, extended Kalman filter, and adaptive Kalman filter. The selection of the filter depends on the nature of the governing equations. In the present problem, as the state equation (3.3) and the measurement equation (3.4) are linear, the Discrete Kalman Filter (DKF) is used. The present study uses flow data from the video and travel time from a limited number of test vehicles to estimate the average stream travel time. The state variable used is the traffic density, and the travel time is the measurement variable. The state process and measurement observation equations of the DKF can be derived from (3.3) and (3.4) and are given below.
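State equation:
$$k(t) = k(t-1) + u(t-1,t) + w(t-1). \tag{3.6}$$
Measurement equation:
$$tt(t) = H(t)\,k(t) + z(t), \tag{3.7}$$
where $u(t-1,t)$ is the input, which depends on the flow and is given by
$$u(t-1,t) = \frac{\Delta t}{\Delta x}\left[q_{\text{entry}}(t-1,t) - q_{\text{exit}}(t-1,t)\right], \tag{3.8}$$
$H(t)$ is the transition matrix which converts the density to travel time and is given by
$$H(t) = \frac{\Delta x}{q(t-1,t)}, \tag{3.9}$$
and $w(t-1)$ and $z(t)$ are the process disturbance and the measurement noise, respectively. These are assumed to be Gaussian with zero mean and variances $Q$ and $R$, respectively.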
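The Kalman filter algorithm is given by
$$\begin{aligned} \hat{k}^{-}(t) &= \hat{k}^{+}(t-1) + u(t-1,t), \\ P^{-}(t) &= P^{+}(t-1) + Q, \\ G(t) &= P^{-}(t)\,H(t)\left[H(t)\,P^{-}(t)\,H(t) + R\right]^{-1}, \\ \hat{k}^{+}(t) &= \hat{k}^{-}(t) + G(t)\left[tt(t) - H(t)\,\hat{k}^{-}(t)\right], \\ P^{+}(t) &= \left[1 - G(t)\,H(t)\right]P^{-}(t), \end{aligned} \tag{3.10}$$
where $\hat{k}^{-}(t)$ is the a priori estimate of density calculated using the measurements prior to the instant $t$ and $P^{-}(t)$ is the a priori error covariance associated with $\hat{k}^{-}(t)$. $\hat{k}^{+}(t)$ and $P^{+}(t)$ are the a posteriori density estimate and its covariance, respectively, after incorporating the measurements up to time $t$. $G(t)$ is the Kalman gain, which is used in the correction process. The above steps are repeated at every time step, and the correction step is carried out only at the intervals when a measurement of GPS travel time is available.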
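The recursion described above, with a prediction at every interval and a correction only when a GPS travel time is available, can be sketched as a scalar discrete Kalman filter. This is a minimal illustration under stated assumptions rather than the study's implementation: the function name, the argument layout, the use of PCU/minute and minutes as working units, and all numerical values in the usage note are hypothetical.

```python
def kalman_density_filter(k0, p0, flows, gps_tt, Q, R, dt=1.0, dx=1.72):
    """Scalar discrete Kalman filter with density as state and travel time as measurement.

    flows  : list of (q_entry, q_exit, q_section) per interval, in PCU/min
    gps_tt : list of measured travel times in minutes, or None when no GPS
             measurement is available for that interval
    Returns a list of (density estimate, travel time estimate) per interval.
    """
    k_post, p_post = k0, p0
    estimates = []
    for (q_in, q_out, q_sec), tt_meas in zip(flows, gps_tt):
        # Prediction step: propagate density with the flow-based input u.
        u = (dt / dx) * (q_in - q_out)
        k_prior = k_post + u
        p_prior = p_post + Q
        h = dx / q_sec                      # converts density to travel time
        if tt_meas is not None:
            # Correction step, applied only when a measurement exists.
            gain = p_prior * h / (h * p_prior * h + R)
            k_post = k_prior + gain * (tt_meas - h * k_prior)
            p_post = (1.0 - gain * h) * p_prior
        else:
            k_post, p_post = k_prior, p_prior
        estimates.append((k_post, h * k_post))
    return estimates
```

As a usage example, calling the function with `k0=50`, `p0=1.0`, `Q=0.1`, `R=0.01`, one interval of flows `(60, 50, 55)`, and a hypothetical GPS travel time of 2.0 minutes yields a travel time estimate that lies between the model prediction (about 1.75 minutes) and the measurement; how close it moves to the measurement depends on the assumed ratio of the variances $Q$ and $R$.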

Input-Output Method
The input-output cumulative count method, as given by Nam and Drew [50], involves plotting the cumulative vehicle count $N$ on the y-axis against time on the x-axis, as shown in Figure 2.
The classical analytical procedure for travel time estimation considers cumulative flow plots $N(X_1, t)$ and $N(X_2, t)$ at the upstream entrance and downstream exit of the link. The total travel time of vehicles during a given time interval, say between $t_{n-1}$ and $t_n$, is then given by the area between the two curves for that time period, represented by the shaded region in Figure 2. The area can be calculated considering all vehicles that are entering, exiting, or entering and exiting. In this study, all vehicles that are exiting in the time period are considered, and the area is calculated accordingly. The corresponding analytical expression for the total travel time (area of a trapezoid) is given by
$$T(t_n) = \frac{1}{2}\left[\left(t_n - t''\right) + \left(t_{n-1} - t'\right)\right] m(t_n), \tag{3.11}$$
where $t''$ is the time of entry of the last vehicle that exits the link during $t_{n-1}$ to $t_n$, $t'$ is the time of entry of the first vehicle that exits the link during $t_{n-1}$ to $t_n$, and $m(t_n)$ is the total number of vehicles that exit the link during $t_{n-1}$ to $t_n$.
Under the first-in first-out condition, $m(t_n)$ can be given as
$$m(t_n) = Q(X_2, t_n) - Q(X_2, t_{n-1}), \tag{3.12}$$
where $Q(X_2, t_n)$ is the cumulative number of vehicles exiting at $t_n$ and $Q(X_2, t_{n-1})$ is the cumulative number of vehicles exiting at $t_{n-1}$. Interpolating the values of $t'$ and $t''$ and substituting them in (3.11), the total travel time $T(t_n)$ can be calculated. The average travel time $TT$ of vehicles exiting the link during the given interval is then obtained by dividing the total travel time by the number of vehicles exiting the link in the same period, that is, $TT = T(t_n)/m(t_n)$.
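The procedure above can be sketched in Python as follows. The sketch assumes that both cumulative curves are sampled at the same time stamps, that the entry curve is non-decreasing and already includes the vehicles initially present in the section, and that first-in first-out holds; the function names and the sample counts in the usage note are hypothetical and serve only to illustrate the interpolation of $t'$ and $t''$.

```python
from bisect import bisect_left


def entry_time_for_count(times, cum_entry, count):
    """Time at which the cumulative entry curve reaches `count` (linear interpolation)."""
    if not cum_entry[0] <= count <= cum_entry[-1]:
        raise ValueError("count lies outside the recorded entry curve")
    i = bisect_left(cum_entry, count)       # first sample with cum_entry[i] >= count
    if cum_entry[i] == count:
        return times[i]
    frac = (count - cum_entry[i - 1]) / (cum_entry[i] - cum_entry[i - 1])
    return times[i - 1] + frac * (times[i] - times[i - 1])


def average_travel_time(times, cum_entry, cum_exit, n):
    """Average travel time of vehicles exiting between times[n-1] and times[n]."""
    m = cum_exit[n] - cum_exit[n - 1]       # vehicles exiting in the interval
    if m <= 0:
        return None                         # no exits, so no travel time defined
    t_first = entry_time_for_count(times, cum_entry, cum_exit[n - 1])  # t'
    t_last = entry_time_for_count(times, cum_entry, cum_exit[n])       # t''
    # Trapezoid area between the two cumulative curves for the exiting vehicles.
    total = 0.5 * ((times[n] - t_last) + (times[n - 1] - t_first)) * m
    return total / m
```

For instance, with time stamps `[0, 1, 2, 3, 4, 5]` minutes, cumulative entries `[0, 10, 20, 30, 40, 50]`, and cumulative exits `[0, 0, 0, 10, 20, 30]`, every vehicle exits two minutes after entering, and `average_travel_time(..., n=4)` returns 2.0 minutes. Because the estimate is built entirely from the two count series, any counting error shifts the curves relative to each other and propagates directly into the travel time.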

HCM Delay Method
The HCM delay method [34] estimates delay at an intersection over a given time period. Using this method, the average delay per vehicle for a lane group can be calculated using (3.13):
$$\begin{aligned} d &= d_1 \cdot PF + d_2 + d_3, \\ d_1 &= \frac{0.5\,C\left(1 - g/C\right)^2}{1 - \left[\min(1, X)\, g/C\right]}, \\ d_2 &= 900\,T\left[(X - 1) + \sqrt{(X - 1)^2 + \frac{8\,k\,I\,X}{c\,T}}\right], \\ PF &= \frac{(1 - P)\, f_{PF}}{1 - g/C}, \end{aligned} \tag{3.13}$$
where $d$ is the average overall delay per vehicle (seconds/vehicle); $d_1$ is the uniform delay (s/veh); $d_2$ is the incremental or random delay (s/veh); $d_3$ is the residual demand delay or initial queue delay (s/veh); $PF$ is the progression adjustment factor; $X$ is the volume to capacity ratio of the lane group; $C$ is the traffic signal cycle time (seconds); $c$ is the capacity of the lane group (veh/h); $g$ is the effective green time for the through lane group (seconds); $T$ is the duration of the analysis period (hours); $k$ is the incremental delay factor (0.50 for pretimed signals); $I$ is the upstream filtering/metering adjustment factor (1.0 for an isolated intersection); $P$ is the proportion of vehicles arriving during the green interval; and $f_{PF}$ is the progression adjustment factor.
The above HCM delay calculation requires the flow values at location 3, the free flow running time between location 2 and location 3, the cycle timings, the capacity of the lane group, the vehicle arrival type, and the progression adjustment factor as input values. Of these, the flow values and cycle timings were directly obtained from the field. The capacity of the lane group was calculated from the field data using the saturation flow rate and the green to cycle time ratio as $c = s \times g/C$ [34]. A capacity value of 3564 vehicles per hour was obtained, which closely matched the value given in IRC: 106-1990 [51] for a three-lane arterial road. Hence, the standard value of 3600 vehicles per hour as per IRC: 106-1990 for a three-lane arterial road was taken for analysis. HCM Exhibits 16-11 and 16-12 [34] were used to determine the arrival type and progression adjustment factor for the known volume condition and vehicle distribution over green time. The value corresponding to arrival type 4 and a green to cycle time ratio of 0.3 was chosen for the progression adjustment factor.
The free flow running time for the delay stretch (a constant value for a particular link) is added to the estimated delay to obtain the total travel time between the two locations. The free flow running time is obtained by dividing the distance between locations 2 and 3, $L_{a,s}$, by the free flow speed $ff_s$ of the stretch. Thus, the total travel time of the delay stretch is obtained as
$$TT = d + \frac{L_{a,s}}{ff_s}, \tag{3.14}$$
where $TT$ is the total travel time of the delay stretch and $d$ is the estimated delay value. The total link travel time of the 1.82 km stretch is then computed as the sum of the midblock travel time (1.72 km) and the travel time in the delay stretch (0.1 km). Each of the modules of the base, hybrid DF-HCM, and non-hybrid methods is corroborated using field data and simulated data, as detailed in the section below.
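The delay and delay-stretch travel time calculation described above can be sketched in Python using the standard HCM uniform and incremental delay terms. The cycle time (145 s), green time (45 s), capacity (3600 veh/h), and stretch length (0.1 km) are taken from the text; the function names, the 15-minute analysis period, the default progression adjustment factor of 1.0, the zero initial queue delay, and the flow and free flow speed in the usage note are assumptions for illustration, since the source does not report these values.

```python
import math


def hcm_control_delay(v, c, C, g, T=0.25, k=0.5, I=1.0, PF=1.0, d3=0.0):
    """Average delay per vehicle (s/veh) as d1 * PF + d2 + d3.

    v, c : demand flow and lane group capacity (veh/h)
    C, g : cycle time and effective green time (s)
    T    : analysis period (h); 0.25 h is an assumed default
    PF   : progression adjustment factor; 1.0 is a placeholder, not the study's value
    d3   : initial queue delay (s/veh); assumed zero here
    """
    X = v / c                                # volume to capacity ratio
    g_over_C = g / C
    d1 = 0.5 * C * (1 - g_over_C) ** 2 / (1 - min(1.0, X) * g_over_C)
    d2 = 900 * T * ((X - 1) + math.sqrt((X - 1) ** 2 + 8 * k * I * X / (c * T)))
    return d1 * PF + d2 + d3


def delay_stretch_travel_time(delay_s, length_km=0.1, free_flow_kmph=40.0):
    """Total travel time of the delay stretch (s): delay plus free flow running time.

    The 40 km/h free flow speed is a hypothetical value.
    """
    return delay_s + length_km / free_flow_kmph * 3600.0
```

For example, `hcm_control_delay(3000, 3600, 145, 45)` evaluates the delay for a hypothetical flow of 3000 veh/h at location 3, and passing the result to `delay_stretch_travel_time` adds a free flow running time of 9 s for the 0.1 km stretch at the assumed 40 km/h. Adding the mid-block estimate from either the input-output or the data fusion method then gives the total travel time for the 1.82 km stretch.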

Corroboration of the Estimation Scheme
In order to evaluate the performance of the above methods, the mean link travel times estimated using these methods were plotted against the actual travel times using both field data and simulated data. The actual travel time data required for the validation of the results while using field data were obtained through GPS-equipped test vehicles as well as by manually re-identifying vehicles from videos. GPS data were collected using three cars, three auto rickshaws, two motor bikes, and five buses as representative samples of each classification. Throughout the data collection period, these GPS test vehicles, except the buses, were made to travel along the study section repeatedly. Figures 3, 4, and 5 show the plots of travel times predicted by the three methods compared to the actual travel time from the field data and simulated data. It can be seen that the hybrid DF-HCM model is able to capture the variations in the actual travel time better than the other two methods.
The errors in travel time estimation in the above cases were quantified using the mean absolute percentage error (MAPE) and the root mean squared error (RMSE). The mean absolute percentage error is obtained using
$$\text{MAPE} = \frac{1}{N}\sum_{k=1}^{N}\frac{\left|N_{\text{est}}(k) - N_{\text{meas}}(k)\right|}{N_{\text{meas}}(k)} \times 100,$$
where $N_{\text{est}}(k)$ and $N_{\text{meas}}(k)$ are the estimated and the measured average travel time of the study stretch during the $k$th interval of time, with $N$ being the total number of time intervals. MAPE meets most of the criteria required for a summary measure, such as measurement validity, reliability, ease of interpretation, clarity of presentation, and support of statistical evaluation. However, as noted by most researchers [52], the distribution of MAPE is often asymmetrical or right skewed, and MAPE is undefined for zero values. Hence, a scale dependent measure called the root mean square error (RMSE) is also used, which is often helpful when different methods applied to the same set of data are compared. However, there is no absolute criterion for a "good" value of any of the scale dependent measures, as they are on the same scale as the data [53]. The lower the value of RMSE, the better the forecast obtained. RMSE expresses the expected value of the error and has the same unit as the data, which makes the size of a typical error visible. The root mean square error is given by
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left[N_{\text{est}}(k) - N_{\text{meas}}(k)\right]^2},$$
with $N_{\text{est}}(k)$, $N_{\text{meas}}(k)$, and $N$ as defined above. The MAPE and RMSE values obtained are shown in Figures 6 and 7, and it can be seen that the errors are within acceptable ranges [52, 53]. According to Lewis' scale of judgment of forecasting accuracy [49], any forecast with a MAPE value of less than 10% can be considered highly accurate, 11%-20% is good, 21%-50% is reasonable, and 51% or more is inaccurate.
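The two error measures and the Lewis scale described above can be computed with a short Python sketch. The function names and the sample travel times in the usage note are hypothetical, and treating the Lewis bands as contiguous (for example, assigning a MAPE between 10% and 11% to "good") is an assumption, since the scale as quoted leaves small gaps between bands.

```python
import math


def mape(estimated, measured):
    """Mean absolute percentage error; undefined if any measured value is zero."""
    n = len(measured)
    return 100.0 * sum(abs(e - m) / m for e, m in zip(estimated, measured)) / n


def rmse(estimated, measured):
    """Root mean square error, in the same unit as the data."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


def lewis_rating(mape_value):
    """Qualitative rating on Lewis' scale, with bands assumed to be contiguous."""
    if mape_value < 10:
        return "highly accurate"
    if mape_value <= 20:
        return "good"
    if mape_value <= 50:
        return "reasonable"
    return "inaccurate"
```

With hypothetical measured travel times of 100 s and 200 s and estimates of 110 s and 180 s, each interval has a 10% absolute error, so `mape` returns 10.0, while `rmse` returns the square root of 250, about 15.8 s. The example shows why both measures are reported: the MAPE is scale free, whereas the RMSE weights the larger absolute error of the second interval more heavily.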
It can be observed that in the case of field data, the hybrid DF-HCM method performed better than the other two methods. The results using simulated data show the base and hybrid DF-HCM methods performing comparably, and both performing better than the non-hybrid method. Thus, the results clearly show that analysing the delays at intersections separately brings more accuracy to travel time estimation. Also, it can be observed that the input-output method is strongly constrained by the flow data quality and can be used only when the flow data accuracy is guaranteed, such as in simulation. This was further checked by comparing the performance on two more days of field data for the mid-block section; the MAPE and RMSE are shown in Table 1. It can be seen that in these cases the data fusion method outperformed the input-output method, unlike the case of using data from simulation, confirming that with uncertainty in flow values as obtained from the field, the data fusion method outperforms the input-output approach.
Overall, it can be observed that the proposed data fusion method is a better candidate for travel time estimation than the input-output method in the mid-block sections. The input-output method can be considered in cases where the accuracy of flow values is guaranteed. Also, it can be clearly observed that analysing the intersection delay separately brings more accuracy to the travel time estimation. This leads to the conclusion that intersection delay needs to be analysed separately to determine the total travel time of urban arterials. Thus, the proposed data fusion method can be used in mid-block sections along with the HCM method for delay estimation at intersections to determine the total travel time of urban arterials.

Summary and Conclusions
The negative impacts of growth in vehicular population include congestion and delays, which are much debated topics all over the world. India, with its rapid growth in
of the intersections separately for better performance. Among the hybrid models, the data fusion method gave promising results under field conditions. When the accuracy of flow values was guaranteed, such as when using simulated data, both the base and hybrid DF-HCM methods showed comparable performance, with a slightly better performance from the input-output method. Hence, for real-time field implementations such as ATIS for urban arterials, the hybrid approach using data fusion for mid-block sections along with separate delay estimation at signalized intersections gives the highest accuracy of the predicted travel time information, showing its potential for such real-time ITS implementations.

Figure 1: Schematic representation of the study corridor.

Figure 2: Illustration of input output analysis.

Figure 3: Sample travel time results using field data.

Figure 4: Travel time comparison for 2 hours of simulated data.

Table 1: Performance of models in terms of MAPE and RMSE for mid-block.