A finite mixture of logistic regression (FMLR) model was applied to analyze the heterogeneity within the merging driver population. This model can automatically reveal hidden information about the characteristics of the driver population. The EM algorithm and the Newton-Raphson algorithm were used to estimate the parameters. The FMLR model was applied to a trajectory dataset extracted from the NGSIM dataset, and a 2-component FMLR model was identified. The main findings can be summarized as follows. The studied drivers can be classified into two components. The first, called Risk-Rejecting Drivers, behave consistently with previous studies: they primarily merge as soon as possible and have a distinct preference for large gaps. The second, called Risk-Taking Drivers, are much less sensitive to gap size and pay more attention to surrounding traffic conditions, such as the speed of the front vehicle in the auxiliary lane and the lead space gap between the merging vehicle and its leading vehicle in the auxiliary lane. Risk-Taking Drivers use the auxiliary lane to reach a further downstream or less congested area of the main lane. The proposed model also achieves higher prediction accuracy than the logistic regression model.
Congestion has become one of the most serious economic and social problems and has drawn great attention from the public, transportation researchers, and transportation managers. Understanding the causes and mechanisms of traffic congestion can help traffic managers formulate targeted policies and make better use of existing transportation infrastructure.
Merging areas are typical bottlenecks on freeways. Merging is a classic mandatory lane change, in which vehicles have to move from an on-ramp to the main road. Some studies have claimed that merging behavior at merging areas affects traffic operations and may trigger traffic congestion and breakdowns [
Recently, driver heterogeneity has drawn great attention in microscopic traffic flow studies. Several studies investigated driver heterogeneity in the car-following process [
Prove the existence of heterogeneity among merging drivers.
Identify different driving styles and attitudes during merging process.
Model the merging behavior more accurately.
The present study is organized as follows. The next section provides a critical review of the existing relevant literature, followed by Section
Several methods have been adopted to model merging behavior, among which gap acceptance theory was the most widely used method [
Gap acceptance theory has often been criticized because its basic assumption is frequently inconsistent with real-world observations: some lane changes occur when only the lead gap or only the lag gap, or even neither of them, is larger than the critical gap [
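The deterministic rule that this criticism targets can be sketched as follows. The function name and the critical-gap values are hypothetical placeholders chosen for illustration, not quantities estimated in this study; the last lines show the kind of observation the theory cannot explain, a merge into a gap that the rule rejects.

```python
from dataclasses import dataclass

# Hypothetical critical gaps in metres, for illustration only (not estimated in this study).
CRITICAL_LEAD_GAP = 5.0
CRITICAL_LAG_GAP = 8.0


@dataclass
class OfferedGap:
    lead_gap: float  # space gap to the putative leading vehicle on the target lane (m)
    lag_gap: float   # space gap to the putative following vehicle on the target lane (m)


def gap_acceptance_rule(gap: OfferedGap,
                        crit_lead: float = CRITICAL_LEAD_GAP,
                        crit_lag: float = CRITICAL_LAG_GAP) -> bool:
    """Classic deterministic rule: accept only if BOTH gaps exceed their critical values."""
    return gap.lead_gap >= crit_lead and gap.lag_gap >= crit_lag


# A hypothetical observed merge into a gap whose lag gap is below the critical value:
observed_merge = OfferedGap(lead_gap=12.0, lag_gap=4.0)
print(gap_acceptance_rule(observed_merge))  # False, although the driver actually merged
```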
Traffic behaviors are inherently uncertain and variable, so heterogeneity cannot be ignored in traffic studies. Some studies investigated the heterogeneity in macroscopic traffic flow [
Thus, an FMLR model is introduced in this paper to model gap selection behavior during the merging process and to investigate the heterogeneity among merging drivers. The FMLR model combines two techniques: clustering and regression analysis. The model naturally incorporates unobserved heterogeneity into the logistic regression model and automatically segments the drivers into different homogeneous populations. The proposed FMLR model can therefore explain different strategies in merging behavior.
The NGSIM dataset has been widely used in traffic flow and traffic simulation studies and has been shown to have high accuracy. Thus, in this paper, the vehicle trajectory data in the NGSIM dataset collected on a segment of southbound U.S. Highway 101 (Hollywood Freeway) in Los Angeles, CA, are chosen [
The section of US 101 [
In this study, we focus on the behavior of merging vehicles, and only trajectory data in the weaving section were used. However, it has been pointed out that the original trajectory data contain noise and errors caused by system errors and tracking errors [
(1) The velocities and accelerations of vehicles are directly estimated from the longitudinal positions.
(2) The locations (both local lateral and longitudinal coordinates), velocities, and accelerations of vehicles are smoothed by the symmetric exponential moving average filter (sEMA) proposed by Thiemann
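The two processing steps can be sketched as below. This is our reading of the sEMA kernel described by Thiemann et al. (exponential weights with width Δ = T/dt, truncated at 3Δ and shrunk near the trajectory ends so the window stays symmetric); the smoothing width T = 0.5 s and the synthetic trajectory are illustrative assumptions, not the settings used in this study. `np.gradient` stands in for the finite-difference scheme, which the text does not specify.

```python
import numpy as np


def differentiate(x: np.ndarray, dt: float) -> np.ndarray:
    """Finite-difference derivative: central differences inside, one-sided at the ends."""
    return np.gradient(x, dt)


def sema(x: np.ndarray, T: float, dt: float) -> np.ndarray:
    """Symmetric exponential moving average with smoothing width T (seconds)."""
    delta = T / dt                      # smoothing width in samples
    n = len(x)
    out = np.empty(n, dtype=float)
    for i in range(n):
        # Half-window: at most 3*delta and never past either end of the series,
        # so the kernel stays symmetric around sample i.
        half = int(min(3 * delta, i, n - 1 - i))
        k = np.arange(i - half, i + half + 1)
        w = np.exp(-np.abs(k - i) / delta)
        out[i] = np.sum(w * x[k]) / np.sum(w)
    return out


# Usage on a synthetic, noisy longitudinal trajectory sampled at 10 Hz (as in NGSIM).
dt = 0.1
t = np.arange(0.0, 10.0, dt)
rng = np.random.default_rng(0)
y_true = 20.0 * t + 0.5 * t**2                     # hypothetical position (m)
y_obs = y_true + rng.normal(0.0, 0.3, t.size)      # add measurement noise

v_raw = differentiate(y_obs, dt)                   # step (1): velocity from positions
a_raw = differentiate(v_raw, dt)                   # step (1): acceleration from velocity
y_s, v_s, a_s = (sema(s, T=0.5, dt=dt) for s in (y_obs, v_raw, a_raw))  # step (2)
```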
Although smoothing reduces the random errors, some errors remain in the data. Thus, the following heuristic rules are applied to filter the datasets:
Filter out trajectories for which there is no putative leading vehicle or putative following vehicle on the adjacent main lane. Such trajectories are recorded at the beginning or end of the video recording and cannot capture the interactions of merging vehicles with their surrounding vehicles.
Filter out trajectories in which the putative leading or putative following vehicle of a merging vehicle runs along the lane boundary (it keeps touching the lane boundary before the lane change, or it returns to the original lane within about 1 second). These trajectories result from tracking errors.
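The "returns to the original lane within about 1 second" part of the second rule can be sketched as a check on a per-frame lane-ID sequence. The function name and the 10-frame threshold are hypothetical choices; the sketch does not cover the "keeps touching the lane boundary" condition, which would need lateral positions and lane-boundary coordinates.

```python
def returns_to_original_lane(lane_ids: list[int], max_frames: int = 10) -> bool:
    """True if the vehicle leaves a lane and comes back to it within max_frames frames.

    lane_ids holds one lane ID per frame; at NGSIM's 10 Hz sampling,
    max_frames=10 corresponds to roughly 1 second.
    """
    # Compress the per-frame sequence into runs of [lane_id, run_length].
    runs: list[list[int]] = []
    for lane in lane_ids:
        if runs and runs[-1][0] == lane:
            runs[-1][1] += 1
        else:
            runs.append([lane, 1])
    # Look for a short excursion A -> B -> A whose middle run is at most max_frames long.
    for before, middle, after in zip(runs, runs[1:], runs[2:]):
        if before[0] == after[0] and middle[1] <= max_frames:
            return True
    return False


# Usage: flag putative leader/follower trajectories that flicker across the boundary.
flicker = [3] * 40 + [4] * 6 + [3] * 40       # leaves lane 3 for 0.6 s, then returns
genuine = [3] * 40 + [4] * 60                 # a real lane change
print(returns_to_original_lane(flicker), returns_to_original_lane(genuine))  # True False
```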
After filtering, a search was conducted to check the consistency between the local and global coordinates. Linear regression was performed between local and global coordinates for each sub-dataset. Three linear relationships were obtained for each subset:
To further verify the inconsistency of the US-101 dataset, we searched for data points that have the same global coordinates in different subsets. Checking their local coordinates (local x and local y) showed that the three subsets of the US-101 dataset are consistent in local x but inconsistent in local y. Tables
Examples with the same global coordinates in the first and second subsets.
Data Point  Sub-dataset 1  Sub-dataset 2
Vehicle ID  Frame ID  Local x (ft)  Local y (ft)  Vehicle ID  Frame ID  Local x (ft)  Local y (ft)
1  33  424  54.612  1397.746  36  847  54.612  1438.019
2  33  429  54.687  1420.332  1070  4878  54.687  1460.518
3  63  290  67.936  514.811  1472  5857  67.936  550.085
Examples with the same global coordinates in the first and third subsets.
Data Point  Sub-dataset 1  Sub-dataset 3
Vehicle ID  Frame ID  Local x (ft)  Local y (ft)  Vehicle ID  Frame ID  Local x (ft)  Local y (ft)
1  42  446  53.395  1449.048  1721  8609  53.395  1483.814
2  63  483  41.056  1494.004  1280  6744  41.056  1528.773
3  296  967  53.834  1389.247  905  4719  53.834  1424.013
One can see that, for points with the same global coordinates, the three sub-datasets share the same local x but have different local y. In the local longitudinal coordinate, the upstream edge (0 m) of sub-dataset 1 lies at 12.275 m in sub-dataset 2 and at 10.598 m in sub-dataset 3. Thus, the three sub-datasets must be unified to the local coordinates of one of the three subsets.
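The shift for sub-dataset 3 can be recovered from the matched points in the table comparing sub-datasets 1 and 3, as sketched below. This assumes local y is reported in feet (NGSIM's native unit) and that a single constant shift suffices, as the text implies; `to_subset1_frame` is a hypothetical helper name.

```python
import numpy as np

FT_TO_M = 0.3048

# Local y (ft) of points with identical global coordinates, from the table comparing
# sub-dataset 1 and sub-dataset 3: columns are (sub-dataset 1, sub-dataset 3).
matched_local_y = np.array([
    [1449.048, 1483.814],
    [1494.004, 1528.773],
    [1389.247, 1424.013],
])

# Constant longitudinal shift between the two local frames.
offset_ft = float(np.mean(matched_local_y[:, 1] - matched_local_y[:, 0]))
offset_m = offset_ft * FT_TO_M
print(f"offset: {offset_ft:.3f} ft = {offset_m:.3f} m")


def to_subset1_frame(local_y_ft, offset_ft):
    """Express local y values from another sub-dataset in sub-dataset 1's local frame."""
    return np.asarray(local_y_ft, dtype=float) - offset_ft


print(to_subset1_frame(matched_local_y[:, 1], offset_ft))  # close to the sub-dataset 1 column
```

The same procedure applies to the sub-dataset 2 pairs.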
Each time a merging driver is offered a new gap, the driver assesses the traffic conditions to decide whether to accept it. Each merging vehicle can accept only one gap but may reject several. After data processing, trajectories of 374 merging vehicles, comprising 925 observations, were extracted from the dataset. The candidate explanatory variables that may affect a driver's merging decision are shown in Table
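How each merging trajectory becomes several binary observations can be sketched as follows, assuming (as the text implies) that every gap offered before the accepted one was rejected. The names are hypothetical.

```python
from typing import NamedTuple


class GapObservation(NamedTuple):
    vehicle_id: int
    gap_order: int   # 0 = first gap offered to this merging vehicle
    accepted: int    # response variable: 1 = accepted, 0 = rejected


def expand_gaps(vehicle_id: int, n_offered: int) -> list[GapObservation]:
    """One observation per offered gap: all rejected except the last one, which is accepted."""
    return [GapObservation(vehicle_id, g, int(g == n_offered - 1)) for g in range(n_offered)]


# A vehicle that rejects two gaps and merges into the third contributes three observations.
obs = expand_gaps(vehicle_id=7, n_offered=3)
print([o.accepted for o in obs])  # [0, 0, 1]
```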
Descriptions of the explanatory variables.




The size of the
The speed of the merging vehicle
The longitudinal position of the merging vehicle
The speed difference between the putative leading vehicle and the merging vehicle
The speed difference between the putative following vehicle and the merging vehicle
Existence of a lead vehicle in the merge lane. If there is a lead vehicle in the merge lane,
Lead gap of the merging vehicle
The speed of the leading vehicle in the auxiliary lane at the offered gap
The speed difference between the leading vehicle in the auxiliary lane and the merging vehicle
The FMLR model is based on the idea that the observed data come from a population with several subpopulations or components [
Let
where
Several finite mixture models can be extended based on (
The analyst does not directly observe which component,
The constraint on
If individual-specific characteristics are available, the mixing proportions are extended as [
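The equation for these covariate-dependent proportions did not survive in this copy. A common form, used here only as an assumption, is a multinomial logit in the individual-specific covariates z_i with the first component as reference; the names and parameter values are hypothetical.

```python
import numpy as np


def mixing_proportions(Z: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """pi_ik = exp(z_i' gamma_k) / sum_j exp(z_i' gamma_j).

    Z: (n, q) individual-specific covariates (include a column of ones for an intercept).
    gamma: (q, K) coefficients; setting gamma[:, 0] = 0 makes component 1 the reference.
    Returns an (n, K) matrix whose rows sum to 1.
    """
    eta = Z @ gamma
    eta = eta - eta.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)


# Usage with hypothetical values: intercept plus one covariate, K = 2 components.
Z = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
gamma = np.array([[0.0, -0.7],
                  [0.0, 0.5]])
print(mixing_proportions(Z, gamma).round(3))
```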
For the observed random sample,
The maximum likelihood (ML) estimate of
The conditional probability that observation
The conditional probabilities can be used to segment data by assigning each observation to the component with maximum conditional probability [
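For binary merge decisions, the component densities are Bernoulli with logistic success probabilities, and the conditional (posterior) membership probabilities follow from Bayes' rule. The sketch below assumes constant mixing proportions and observation-level membership, as the wording above suggests; it is an illustration, not the Latent GOLD computation, and all names and parameter values are hypothetical.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def component_densities(X, y, betas):
    """f_k(y_i | x_i) = p_ik^y_i * (1 - p_ik)^(1 - y_i), with p_ik = sigmoid(x_i' beta_k).

    X: (n, p) design matrix including a constant column; y: (n,) 0/1; betas: (p, K).
    """
    p = sigmoid(X @ betas)                          # (n, K) acceptance probabilities
    return np.where(y[:, None] == 1, p, 1.0 - p)


def mixture_log_likelihood(X, y, betas, pi):
    """sum_i log sum_k pi_k f_k(y_i | x_i); pi: (K,) mixing proportions."""
    return float(np.sum(np.log(component_densities(X, y, betas) @ pi)))


def posterior_membership(X, y, betas, pi):
    """tau_ik = pi_k f_k(y_i | x_i) / sum_j pi_j f_j(y_i | x_i)."""
    joint = component_densities(X, y, betas) * pi   # (n, K)
    return joint / joint.sum(axis=1, keepdims=True)


def assign_components(X, y, betas, pi):
    """Hard segmentation: each observation goes to the component with the largest tau_ik."""
    return np.argmax(posterior_membership(X, y, betas, pi), axis=1)


# Usage with hypothetical parameters: one covariate (e.g. a gap size) plus a constant.
X = np.array([[1.0, 2.0], [1.0, 10.0], [1.0, 25.0]])
y = np.array([0, 1, 1])
betas = np.array([[-4.0, -1.0],     # constants of components 1 and 2
                  [0.4, 0.05]])     # covariate effects of components 1 and 2
pi = np.array([0.6, 0.4])
print(posterior_membership(X, y, betas, pi).round(3))
print(assign_components(X, y, betas, pi))
```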
Parameters of FMLR models can be efficiently estimated through the EM algorithm [
The EM algorithm alternates between the expectation and maximization steps until the likelihood improvement falls below a prespecified threshold or a maximum number of iterations is reached.
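A bare-bones version of this EM loop for a K-component FMLR is sketched below. It is a generic textbook sketch under simplifying assumptions (constant mixing proportions, observation-level membership), not the Latent GOLD routine used in this study: the E-step computes the posterior weights tau_ik, and the M-step sets the proportions to the mean weights and refits each component's coefficients by weighted logistic regression with a few Newton-Raphson iterations. Function names, starting values, tolerances, and the synthetic data are assumptions.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def weighted_logit_newton(X, y, w, beta, n_iter=25):
    """Maximize sum_i w_i [y_i log p_i + (1 - y_i) log(1 - p_i)] by Newton-Raphson."""
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])   # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def fit_fmlr_em(X, y, K, max_iter=500, tol=1e-6, seed=0):
    """EM for a K-component mixture of logistic regressions; returns (betas, pi, ll history)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = rng.normal(scale=0.5, size=(p, K))   # random start breaks the symmetry
    pi = np.full(K, 1.0 / K)
    history = []
    for _ in range(max_iter):
        # E-step: posterior membership weights tau_ik under the current parameters.
        prob = sigmoid(X @ betas)
        joint = np.where(y[:, None] == 1, prob, 1.0 - prob) * pi
        history.append(float(np.sum(np.log(joint.sum(axis=1)))))
        tau = joint / joint.sum(axis=1, keepdims=True)
        # M-step: update mixing proportions, then each component's coefficients.
        pi = tau.mean(axis=0)
        for k in range(K):
            betas[:, k] = weighted_logit_newton(X, y, tau[:, k], betas[:, k])
        # Stop once the log-likelihood improvement falls below the threshold.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return betas, pi, history


# Usage on synthetic data drawn from two hypothetical components.
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_betas = np.array([[-1.0, 1.0], [2.5, -2.5]])
member = rng.random(n) < 0.6
lin = np.where(member, X @ true_betas[:, 0], X @ true_betas[:, 1])
y = (rng.random(n) < sigmoid(lin)).astype(float)
betas, pi, history = fit_fmlr_em(X, y, K=2)
print(pi.round(3), round(history[-1], 2))
```

Latent GOLD 5.0 differs in that it switches from EM to Newton-Raphson on the full likelihood rather than only inside the M-step.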
However, the EM algorithm may converge slowly and require long computation times. Thus, in this paper, Latent GOLD 5.0 is used to estimate the parameters. Latent GOLD 5.0 combines the advantages of the EM and Newton-Raphson algorithms: it first uses the EM algorithm to get close to the final solution and then switches to Newton-Raphson to finish the estimation [
The most important and difficult step in building an FMLR model is to determine
To select an optimal model, we fit FMLR models with an increasing number of components, from 1 to 4, and use the Bayesian Information Criterion (BIC) as the indicator for selecting the most appropriate number of components. Table
BIC value of FMLR model.
Number of Components  BIC Value
1  790.2955
2  773.3871
3  808.3826
4  843.3061
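The criterion and the selection rule can be sketched as follows. BIC = -2 ln L + d ln n, where d is the number of free parameters; the parameter count below assumes constant mixing proportions and the same p coefficients (including the constant) in every component, and the choice of n (observations or drivers) is an assumption, since the text does not state which one Latent GOLD 5.0 used. The selection itself only needs the reported BIC values.

```python
import math


def bic(log_likelihood: float, n_free_params: int, n: int) -> float:
    """Bayesian Information Criterion: -2 ln L + d ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_free_params * math.log(n)


def n_free_params_fmlr(K: int, p: int) -> int:
    """K coefficient vectors of length p (incl. constant) plus K - 1 free mixing proportions."""
    return K * p + (K - 1)


# BIC values reported in the table above, keyed by the number of components.
reported_bic = {1: 790.2955, 2: 773.3871, 3: 808.3826, 4: 843.3061}
best_K = min(reported_bic, key=reported_bic.get)
print(best_K)  # 2
```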
To select the model variables, the forward-selection method is adopted in this paper. It starts with no variables in the model, tests the addition of each candidate variable using Wald statistics, and adds the variable that gives the most statistically significant improvement of the fit. Variables are added one by one until no remaining variable produces a significant Wald statistic in all components.
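The forward-selection loop can be sketched for a single-component logistic regression as below; in the FMLR setting, the same Wald test would be applied to each component's coefficients. The 5% significance level (chi-square critical value 3.841 with one degree of freedom), the helper names, and the synthetic data are assumptions for illustration.

```python
import numpy as np

CHI2_1DF_95 = 3.841  # 95% critical value of the chi-square distribution with 1 df


def fit_logit(X, y, n_iter=50):
    """Newton-Raphson MLE for logistic regression; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def forward_select(Z, y, crit=CHI2_1DF_95):
    """Greedy forward selection over the columns of Z using the Wald statistic (beta/se)^2."""
    n, m = Z.shape
    selected = []
    while True:
        best_j, best_wald = None, crit
        for j in range(m):
            if j in selected:
                continue
            X = np.column_stack([np.ones(n), Z[:, selected + [j]]])
            beta, se = fit_logit(X, y)
            wald = (beta[-1] / se[-1]) ** 2          # test for the newly added variable
            if wald > best_wald:
                best_j, best_wald = j, wald
        if best_j is None:                            # no candidate is significant
            return selected
        selected.append(best_j)


# Usage on synthetic data: only column 0 truly affects the response.
rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=(n, 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * Z[:, 0])))).astype(float)
print(forward_select(Z, y))
```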
Table
Model estimation results of FMLR model.
Variables  Logistic Regression  FMLR(K = 2)  

Component 1 (0.672)  Component 2 (0.328)  
Parameter  Parameter  Parameter  

  0.1810 
0.1063 

0.40848 
0.3903 
0.2557 

.05490 
0.1895 
0.0158 

.01345 
0.0109 
0.0105 

.07111 
0.0400  0.0568 

.01370 
0.0037  0.0113 
Constant  1.26281 
0.8417 
1.7619 
By using (
Mean values and standard deviations of related variables in each component.
Variables  Component 1  Component 2  

Rejected Gaps (Standard Deviation)  Accepted Gaps (Standard Deviation)  Rejected Gaps (Standard Deviation)  Accepted Gaps (Standard Deviation)  

15.050 (3.196)  13.418 (3.107)  13.505 (2.852)  14.272 (3.466) 



8.611 (3.825)  1.985 (2.766)  5.375 (3.698)  3.187 (2.908) 



10.068 (5.274)  33.14 (22.32)  17.468 (15.175)  27.09 (23.65) 



9.841 (7.114)  11.627 (6.671)  10.529 (6.420)  8.062 (8.100) 



43.42 (47.04)  44.83 (42.99)  33.05 (35.06)  26.79 (35.46) 


Merge Location 
41.66 (57.87)  108.58 (64.19)  


Number of Rejected Gaps  1.05  3.19 
As seen from the significance levels of the Component 1 parameters in Table
It is interesting to find that the parameter of
As illustrated in Table
Figure
Relation between the gap size and location for the rejected and accepted gaps.
Figure
Box plot of the reverse succession of offered gaps.
Comparing the two components, drivers in Component 1 prefer larger gaps and smaller speed differences, while drivers in Component 2 pay more attention to surrounding traffic conditions and may sacrifice larger gaps to save travel time and reach better traffic conditions. Thus, in this paper, Component 1 is named Risk-Rejecting Drivers and Component 2 is named Risk-Taking Drivers.
Tables
Comparison of estimated and observed values of logistic regression model.
Estimated  

Observed  Reject  Accept  Total 


Reject 

68.0  552.0 


Accept  92.0 

373.0 


Total  576.0  349.0  925.0 
Comparison of estimated and observed values of FMLR2 model.
Estimated  

Observed  Reject  Accept  Total 


Reject 

42.0  552.0 


Accept  39.0 

373.0 


Total  547.0  378.0  925.0 
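The overall prediction accuracy behind this comparison can be computed from the misclassified counts and the grand total in the two tables, as sketched below. Reading 68 and 42 as observed rejections predicted as acceptances, and 92 and 39 as observed acceptances predicted as rejections, is our interpretation of the flattened table layout; the function name is hypothetical.

```python
def accuracy_from_errors(false_accept: int, false_reject: int, total: int) -> float:
    """Share of correctly classified gaps = 1 - misclassified / total."""
    return 1.0 - (false_accept + false_reject) / total


# Off-diagonal counts and grand totals as read from the two tables above.
lr = accuracy_from_errors(false_accept=68, false_reject=92, total=925)
fmlr = accuracy_from_errors(false_accept=42, false_reject=39, total=925)
print(f"logistic regression: {lr:.3f}, FMLR-2: {fmlr:.3f}")  # logistic regression: 0.827, FMLR-2: 0.912
```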
To incorporate unobserved heterogeneity into the merging model, the present study builds an FMLR model that uses BIC to determine the proper number of mixing components and estimates its parameters with Latent GOLD 5.0.
Given the U.S. Highway 101 data, the identified optimal model is a 2-component mixture of logistic regression models, which means that the drivers can be divided into two components characterized by heterogeneous driving behavior. The first is the Risk-Rejecting Drivers, whose behavior is consistent with previous studies: they primarily merge as soon as possible. Drivers in this component have a distinct preference for larger gaps. A smaller speed difference between the merging vehicle and the putative leading vehicle, and a gap located further towards the end of the auxiliary lane, also increase the probability of accepting the current gap. In contrast to Component 1, Component 2 consists of drivers who are much less sensitive to gap size and place more emphasis on surrounding traffic conditions, such as the speed of the front vehicle in the auxiliary lane and the space gap between the merging vehicle and its leading vehicle in the auxiliary lane. These drivers use the auxiliary lane to reach a further downstream or less congested area of the main lane. Thus, they are called Risk-Taking Drivers.
In addition, the proposed model achieves higher prediction accuracy than the logistic regression model.
However, more empirical studies applying this method to datasets from other sites with different demographics, climates, and geometric parameters are needed to fully assess the effects of the factors affecting merging behavior and to fully understand the strengths and weaknesses of the proposed model.
The NGSIM data used to support the findings of this study have been deposited at the website:
The author declares that there are no conflicts of interest regarding the publication of this paper.