A Comparative Study on Drivers’ Stop/Go Behavior at Signalized Intersections Based on Decision Tree Classification Model

The stop/go decisions at signalized intersections are closely related to driving speed during signal change intervals. The speed during stop/go decision-making has a significant influence on the dilemma area, resulting in changes of stop/go decisions and high complexity of the decision-making process. Considering that traffic delays and vehicle exhaust pollution are mainly caused by queuing at intersections, the stop-line passing speed during the signal change interval will affect both vehicle operation safety and the atmospheric environment. This paper presents a comparative study on drivers’ stop/go behaviors when facing a transition signal period consisting of 3 s green flashing light (FG) and 3 s yellow light (Y) at rural high-speed intersections and urban intersections. For this study, 1,459 high-quality vehicle trajectories of five intersections in Shanghai during the transition signal period were collected. Of these five intersections, three are high-speed intersections with a speed limit of 80 km/h, and the other two are urban intersections with a speed limit of 50 km/h. Trajectory data of these vehicle samples were statistically analyzed to investigate the general characteristics of potential influencing factors, including the instantaneous speed and the distance to the intersection at the start of FG, the vehicle type, and so on. Decision Tree Classification (DTC) models are developed to reveal the relationship between the drivers’ stop/go decisions and these possible influencing factors. The results indicate that the instantaneous speed of FG onset, the distance to the intersection at the start of FG, and the vehicle type are the most important predictors for both types of intersections. Besides, a DTC model can offer a simple way of modeling drivers’ stopping decision behavior and produce good results for urban intersections.


Introduction
At signalized intersections in most cities of China, a 3 s green flashing light (FG) indicator followed by a 3 s yellow light (Y) indicator is the most common form of transition signal setting [1][2][3].
The current practice shows that it is reasonable to set the yellow light to 3 s for intersections with a speed limit of less than 50 km/h. Once the speed limit is higher than 50 km/h, vehicles will often fall into the dilemma zone (DZ) due to the higher driving speed and insufficient yellow light duration [4][5][6][7][8]. In most Chinese cities, the speed limit of high-speed intersections in rural areas is generally higher than 60 km/h. In comparison, in urban areas, the speed limit of high-speed intersections is usually lower than 60 km/h. Thus, for these two types of intersections in different areas, the setting of the green flashing light (FG) can have different effects on the stop/go decision-making behavior of drivers.
Tremendous research efforts have been made to study the influence of FG on drivers' decision-making behaviors as well as to model such behaviors in response to signal change intervals. However, few studies have compared the impact of FG on the driver decision-making process at different types of intersections. Furthermore, there is no study on the specific combination of a 3-second yellow light (Y) and a 3-second green flashing light (FG).
This kind of signal combination is a unique feature of signalized intersections in China. It provides a long time, i.e., 6 s, for the driver to observe and decide before making a stop/go decision. Therefore, this paper mainly focuses on this research gap.

Table 1: Characteristics and conditions of the investigated intersections.

Intersections
Cao'an Rd. and Jiasongbei Rd.
Cao'an Rd. and Xiangjiang Rd.
Cao'an Rd. and Caofeng Rd.
Siping Rd. and Dalian Rd.
Rende Rd. and Jipu Rd.

In this study, Decision Tree Classification (DTC) models are applied to analyze how drivers' stop decisions relate to potential influencing factors for two different types of intersections. Firstly, vehicle trajectory data reflecting stop/go decision behavior at five intersections are collected during the signal change interval. Three of these are high-speed intersections with a speed limit of 80 km/h in the rural area, and two are intersections with a speed limit of 50 km/h in the urban area. Secondly, using these trajectory data, statistical analysis is carried out to summarize the general characteristics of the potential influencing factors at the two types of intersections, including the instantaneous speed, the vehicle type, and the distance to the intersection at the beginning of the FG signal. Thirdly, the DTC model is built based on the description of the critical design decisions and parameters. Next, the results of the DTC model and the discussion of findings are given accordingly. Finally, we summarize the findings of the study, point out its contribution, and suggest future directions of related research.

Literature Review
Many previous studies have focused on the influence of FG on the driver's decision behavior and DZ. There are both positive and negative conclusions about the effect of FG in the literature. The positive results show that FG can warn the driver that the green phase is coming to an end, and the driver can reduce the incidence of DZ by reducing the driving speed, thereby avoiding red light violations [1,2,9]. The FG signal essentially plays a role in prolonging the duration of the yellow light. Therefore, compared with intersections without FG, the proportion of drivers running the red light at intersections with FG is significantly reduced [10][11][12][13]. On the negative side, FG was shown to cause a significant increase in the proportion of stop decisions [10,11,14]. Besides, although FG can effectively reduce the DZ range caused by the yellow light, it enlarges the indecision zone, enormously increases the number of conservative stops, and slightly encourages aggressive passes [13,15]. Meanwhile, the presentation of an FG indicator before the Y indicator is considered to increase the complexity of the driver's stop/go decision, leading to repeated decision-making [15,16].
Notably, most of the studies listed above focused on comparing intersections with and without FG in terms of DZ occurrence and/or stopping probability. Meanwhile, numerous studies have focused on the modeling of drivers' decision-making behavior at the end of the green light [17][18][19][20], the most typical of which is the GHM model proposed by Gazis, Herman, and Maradudin [21]. A basic assumption of the GHM model is that the driver decides whether to stop or pass the intersection according to the relationship between the maximum passing distance and the minimum stopping distance at the beginning of the yellow light. Several notable variants have also been reported in the literature [22][23][24][25].
The GHM model assumes that all drivers will choose to stop, if possible. But Olson and Rothery [26] found that the yellow light is often used as an extension of the green phase in the decision-making process. Research conducted by May [27] showed that some drivers avoid DZ by accelerating or decelerating. The studies of Liu et al. and Wei et al. [23,28] showed that the theoretical hypothesis could lead to differences in driving behavior. In general, the primary defect of the GHM model is the lack of description of the randomness of driving behavior. Because of this disadvantage, some other researchers have attempted to explain DZ behavior through stochastic approaches [15,29,30].
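The GHM comparison between the minimum stopping distance and the maximum passing distance can be illustrated with a minimal sketch. It assumes a constant-speed simplification (no acceleration during the yellow interval), and every default value (reaction time, deceleration rate, clearance length) is an illustrative assumption, not a value taken from this study or from the original GHM paper.

```python
def ghm_distances(speed_kmh, yellow_s=3.0, reaction_s=1.0,
                  decel_ms2=3.0, clearance_m=25.0):
    """Return (min stopping distance, max passing distance) in metres.

    All default parameter values are illustrative assumptions.
    clearance_m stands for intersection width plus vehicle length.
    """
    v = speed_kmh / 3.6  # convert km/h to m/s
    # Minimum distance needed to stop: reaction travel plus braking distance.
    stop_dist = v * reaction_s + v ** 2 / (2.0 * decel_ms2)
    # Maximum distance from which the vehicle still clears the intersection
    # within the yellow interval while holding its speed.
    pass_dist = v * yellow_s - clearance_m
    return stop_dist, pass_dist


def dilemma_zone_length(speed_kmh, **kwargs):
    """Length of the zone where the driver can neither stop nor clear."""
    stop_dist, pass_dist = ghm_distances(speed_kmh, **kwargs)
    return max(0.0, stop_dist - pass_dist)


for limit_kmh in (50, 80):
    print(limit_kmh, "km/h:", round(dilemma_zone_length(limit_kmh), 1), "m")
```

Under these assumed parameters the dilemma zone grows with approach speed, which is consistent with the concern raised in the Introduction about a 3 s yellow light at higher speed limits.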
Many studies [12,16,17,23,[31][32][33][34] hold that the decision-making behavior of drivers is random and obeys a specific probability distribution. The stopping probability, described as a function of the speed of the vehicle, the distance to the intersection or the travel time to the stop-line at the beginning of the yellow light, the type of vehicle, etc., is expressed as a binary logit model or a Bayesian model. Meanwhile, other researchers, such as Rakha et al. [35], Hurwitz et al. [36], Kuo et al. [37], and Moore et al. [38], used fuzzy logic theory to analyze decision-making behavior. It should further be pointed out that the behavioral parameters closely related to decision-making behavior may vary due to the influence of location conditions, driver behavior characteristics, vehicle performance, etc. Also, various potential influencing factors are often related to each other. Some studies [16,29,30,[39][40][41] carried out in recent years have found that the distribution of decision-making areas may be dynamic, rather than deterministic as described by traditional theories.
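The binary logit form mentioned above can be sketched as follows. The coefficient values are hypothetical placeholders, not fitted values from any cited study. Only their signs are chosen to match the qualitative tendencies reported later in this paper: faster vehicles stop less, vehicles farther from the stop-line stop more, and trucks stop more than passenger cars.

```python
import math


def stop_probability(speed_kmh, distance_m, is_truck,
                     b0=-1.0, b_speed=-0.05, b_dist=0.06, b_truck=0.5):
    """Binary logit: P(stop) = 1 / (1 + exp(-utility)).

    b0, b_speed, b_dist and b_truck are hypothetical coefficients
    used only for illustration.
    """
    utility = (b0 + b_speed * speed_kmh + b_dist * distance_m
               + b_truck * is_truck)
    return 1.0 / (1.0 + math.exp(-utility))


# Example: the same approach speed at two distances from the stop-line.
near = stop_probability(60, 40, is_truck=0)
far = stop_probability(60, 100, is_truck=0)
print(near < far)
```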

Site Descriptions.
Five intersections in Shanghai, all of which use a 3 s FG signal and a 3 s Y signal, were selected to collect the necessary data. These intersections are divided into two categories, one with a speed limit of 80 km/h and the other with a speed limit of 50 km/h. The former are mainly located on roads connecting the urban area and the suburban area, such as Cao'an highway, which have a large traffic flow and a high proportion of large trucks in peak hours. The latter are mainly located in the urban area, and the traffic is composed mostly of cars. The main characteristics and conditions of the investigated intersections are shown in Table 1.
[...] lens. Through residual analysis and t-tests, it was ensured that the accuracy error is not more than 0.15 m and 0.1 s. The software-controlled time interval is 0.1 s. Therefore, by matching the trajectory data with the signal change timing, driving behavior parameters such as the speed, acceleration, and deceleration of vehicles and the position at each time step are obtained.
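How speed and acceleration can be derived from positions sampled every 0.1 s may be sketched with simple finite differences. This is an assumed post-processing step for illustration only; the source does not state the exact method used by the trajectory software, and the function names are hypothetical.

```python
def speeds_from_positions(positions_m, dt_s=0.1):
    """Speed (m/s) between consecutive position samples.

    positions_m: distance travelled along the approach at each time step.
    dt_s: sampling interval; 0.1 s matches the interval stated in the text.
    """
    return [(b - a) / dt_s for a, b in zip(positions_m, positions_m[1:])]


def accelerations_from_speeds(speeds_ms, dt_s=0.1):
    """Acceleration (m/s^2) between consecutive speed samples;
    negative values indicate deceleration."""
    return [(b - a) / dt_s for a, b in zip(speeds_ms, speeds_ms[1:])]


# Made-up positions (m) for a vehicle that starts braking at the last step.
positions = [0.0, 2.0, 4.0, 5.5]
speeds = speeds_from_positions(positions)
accelerations = accelerations_from_speeds(speeds)
print(speeds, accelerations)
```

In practice, raw differences amplify measurement noise, so some smoothing would normally be applied before the acceleration is computed.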
To avoid the influence of preceding vehicles, only the last-to-stop and first-to-go vehicles after the onset of FG are selected for analysis. The last-to-stop vehicle refers to the vehicle that chose to stop in front of the stop-line before the start of the red light; "last" means that it is the last vehicle to make a decision in the study period. The first-to-go vehicle refers to the first vehicle passing through the stop-line during the study period (i.e., from the end of the green light to the end of the yellow light).
Eventually, the trajectories of 1,459 vehicles, including 1,186 vehicles (345 trucks and 841 passenger cars) at the rural intersections and 273 vehicles (37 trucks and 236 passenger cars) at the urban intersections, were obtained for use in subsequent statistical analysis and model development. As shown in Figure 1, the 1,186 vehicle trajectories collected at the rural intersections included 529 vehicles that chose to stop and 657 vehicles that chose to pass. In comparison, the 273 trajectories obtained from the urban intersections included 128 vehicles that chose to stop and 145 vehicles that chose to pass.
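The sample counts above can be cross-checked and turned into shares with a few lines of arithmetic. The counts are taken directly from the text; the dictionary layout and function name are only illustrative.

```python
# Counts as reported in the text.
samples = {
    "rural": {"stop": 529, "go": 657, "trucks": 345, "cars": 841},
    "urban": {"stop": 128, "go": 145, "trucks": 37, "cars": 236},
}


def summarize(counts):
    """Return (total, stop share, truck share) for one area type."""
    total = counts["stop"] + counts["go"]
    if total != counts["trucks"] + counts["cars"]:
        raise ValueError("decision counts and vehicle-type counts disagree")
    return total, counts["stop"] / total, counts["trucks"] / total


for area, counts in samples.items():
    total, stop_share, truck_share = summarize(counts)
    print(area, total, round(stop_share, 3), round(truck_share, 3))
```

The totals add up to the 1,459 trajectories reported. The stop share is similar in both areas (about 45% rural, 47% urban), while the truck share is roughly twice as high at the rural sites.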

Statistical Analysis of Potential Influencing Factors
Past research has indicated that drivers' stopping decisions at signalized intersections may be influenced by the speed and distance to the stop-line immediately before the phase transition period as well as the vehicle type and time of day [14,21,22]. Therefore, statistical analysis was performed to explore the variability of these potential influencing factors as well as their relationships with stop/go decisions in response to the onset of FG.

Instantaneous Speed at the Start of FG. A statistical analysis of vehicles' instantaneous speeds at the observed approach lane at the start of FG is provided in Table 2.
Comparisons between the rural and urban areas indicate that in both areas, the mean speeds of vehicles making go decisions are higher than those of vehicles making stop decisions. Besides, passenger cars typically have higher speeds than trucks in both rural and urban areas. Figure 2 illustrates the distributions of the stop/go decisions relative to various FG-onset speed intervals in rural and urban areas. It is found that in a rural area, if the driver's speed is 60 km/h, the probabilities of stop and go decisions are equal. The same situation occurs in an urban area when the speed is 50 km/h. In addition, more truck drivers than passenger car drivers decide to stop given the same approach speed. In Figure 2(b), the situation is similar, with more truck drivers choosing to cross the intersection at a lower speed, which may be safer for large vehicles.
Table 3 presents a statistical summary of the distance to the intersection at the start of FG. It is found that the mean distance for crossing drivers is shorter than that for stopping drivers. Figure 3 illustrates the distributions of the stop/go decisions relative to various FG-onset distance intervals in rural and urban areas. This figure shows that a driver is more likely to make a go decision if he or she is closer to the stop-line, and vice versa. In rural areas, for drivers located in a distance interval of 60-100 m from the stop-line, the probability of a stop decision or a pass decision is close to 50%. The same situation is found for a distance interval of 60-80 m in urban areas. In these distance intervals, it is difficult for drivers to decide whether to stop or go. Moreover, among all drivers who make stop decisions, more truck drivers than passenger car drivers are inclined to stop when the distance to the intersection at the start of FG is shorter than 100 m.
including stop/go decisions (p < 0.01), vehicle type (p < 0.01), and area type (p < 0.01), exhibit significant effects with respect to the approach speed of the vehicles at the onset of FG, but only the stop/go decisions (p < 0.01) exhibit significant effects with respect to the distance to the stop-line.

Decision Tree Models.
Because of its nonparametric nature and straightforward interpretation, DTC has proved useful in the field of traffic engineering [42]. For example, in traffic safety evaluation, Abellán et al. [43] used DTC to analyze the relationship between stop/go decision, red light violation, and traffic parameters. Some researchers [44,45] have used DTC methods to explore the relationship between the relevant traffic rules and accident severity. In this study, the SPSS software package is used for the classification tree analysis. Based on the CART approach, a classification tree model is established, and the Gini criterion (or index) is used as the measure for splitting decisions. Because the data volume is not large, the minimum number of cases for parent nodes is set to 30, and the minimum number of cases for child nodes is set to 10. Besides, the cross-validation method (with ten folds) is used to evaluate how well the tree structure generalizes to a larger population. Three variables are expected to be closely related to the driver's stop/go decision, i.e., distance, speed, and vehicle type. The distance variable represents the distance from the vehicle to the stop-line at the start of FG, and the speed variable represents the vehicle's speed at the beginning of FG. Vehicle type is divided into two categories: passenger cars and trucks (0 = passenger cars and 1 = trucks).
Table 5 shows the accuracy of the two developed models. For the rural area model, the training and test accuracies are 83.9% and 80.2%, respectively, and the prediction of crossing behavior is more accurate than that of stopping behavior. For the urban area model, the training and test accuracies are 93.8% and 88.2%, respectively. The model is correctly fitted. Figure 4 shows the classification tree diagram used for training the stop/go decision model for rural areas. Figure 5 shows the corresponding partitions, which are much finer-grained than those in Figure 4.
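The Gini-based splitting step of a CART tree, with the minimum parent and child node sizes stated above, can be sketched in plain Python. This is an illustrative sketch, not the SPSS implementation used in the study; it searches a single numeric variable, and the function names and the synthetic data are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = stop, 0 = go)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels, min_parent=30, min_child=10):
    """Find the threshold on one numeric variable that minimises the
    weighted Gini impurity of the two child nodes.

    Returns (threshold, weighted_gini), or None when the node is smaller
    than min_parent or no split leaves min_child cases on each side.
    """
    n = len(values)
    if n < min_parent:
        return None
    pairs = sorted(zip(values, labels))
    best = None
    # i is the size of the left child; both children need min_child cases.
    for i in range(min_child, n - min_child + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (threshold, score)
    return best


# Synthetic example: vehicles within 20 m go (0), the rest stop (1).
distances = list(range(1, 41))
decisions = [0] * 20 + [1] * 20
print(best_split(distances, decisions))
```

A full tree repeats this search over all variables at every node and stops when no admissible split remains, which is what keeps the resulting partitions readable as distance and speed thresholds.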
When the distance to the stop-line is shorter than 44.3 m or longer than 116.4 m, most of the vehicles make the same decision. When the distance is between 44 [...] decision. Trucks are more likely to stop than passenger cars. These behaviors above and below 48.9 km/h correspond to Zones 2 and 4, respectively. (iv) Finally, for vehicles at FG-onset distances between 68.9 m and 116.5 m, the approach speed again plays a critical role. For vehicles with FG-onset speeds higher than 66.8 km/h, most of the drivers (71.4%) will cross the intersection. However, for vehicles with FG-onset speeds below 66.8 km/h, most drivers (72.8%) will stop, as indicated in Zone 5.

Result Analysis at Urban Intersections.
Figure 6 shows the classification tree diagram used to train the stop/go decision model for urban areas. Similar to Figure 5, the corresponding partitions for the tree in Figure 6 are drawn in Figure 7. This graph is divided into six zones: (i) As shown in Figure 7, all vehicle drivers in Zone 1 will choose to cross the intersection, while most [...]

Comparisons of Rural High-Speed Intersections and Urban Intersections.
The percentages of stop decisions are shown through a color scale in Figure 8. This figure illustrates that drivers tend to make stop decisions when the vehicle is farther from the stop-line and the approach speed is higher, whether the intersection is in a rural or urban area.
However, there are some differences between rural high-speed intersections and urban intersections: (1) Truck drivers are more conservative at urban intersections, especially when they are nearer to the stop-line at modestly low speeds (below 39 km/h). Because of the higher speed limit at rural high-speed intersections, such conservative decision behavior emerges at these intersections at greater distances of 44.3∼68.9 m and speeds below 48.9 km/h. (2) Due to the difference between the speed limits, most drivers tend to stop rather than cross at urban intersections when the distance exceeds 57.1 m, while at rural high-speed intersections, this distance threshold increases to 68.9 m. (3) When the vehicles are at a sufficiently far distance, such as 116.5 m from the stop-line at a rural intersection, nearly all drivers choose to stop independent of the approach speed. However, this value is much smaller, specifically 94 m, at an urban intersection.

Conclusions and Future Works
This study generated two models: the first illustrates the conditions affecting stop/go decisions in rural areas, and the other explains the corresponding conditions in urban areas. The data analysis indicates that the vehicle speed and distance to the stop-line at the onset of FG as well as the vehicle type are the most significant factors affecting the driver's stop/go decision in both rural and urban areas. The normalized importance of the distance variable is 100% for both types of sites. In rural areas, the normalized importance of speed is higher than that in urban areas. For vehicles at FG-onset distances between 68.9 m and 116.5 m, the speed becomes the critical factor affecting drivers' behavior. The probability of a stop decision is almost equal to that of a pass decision, both of which are close to 0.5. The corresponding distance interval in urban areas is between 67.4 m and 94 m. An interesting finding of this study is that under the same conditions, regardless of whether the intersection is in a rural or urban area, truck drivers tend to stop more often than car drivers.
This study presents a novel way to analyze stop/go decisions. The tree-based model provides a good verbal explanation, which makes it easier to examine other conditions. The classification tree provides a simple method to model the driver's behavior without any normality assumptions. The DTC-based stop/go decision-making model developed in this study can be used to improve the driver behavior model embedded in microscopic traffic simulation software.

Data Availability
The data used to support the findings of this study are included in the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.