Analysis of Freeway Secondary Crashes in Different Traffic Flow States by Three-Phase Traffic Theory

The objective of this study is to analyse the relationship between secondary crash risk and traffic flow states and to explore the contributing factors of secondary crashes in different traffic flow states. Crash data and traffic data were collected on the I-880 freeway in California from 2006 to 2011. The traffic flow states are categorised by three-phase traffic theory. A Bayesian conditional logit model was established to analyse the statistical relationship between secondary crash probability and the various traffic flow states. The results showed that the free flow (F) state has the best safety performance with respect to secondary crashes and the synchronized flow (S) state has the worst. Traditional logistic regression models were used to analyse the contributing factors of secondary crashes in the different traffic flow states. The results indicated that the contributing factors differ significantly between traffic flow states.


Introduction
Exploring crash mechanisms and contributing factors helps freeway traffic surveillance systems prevent crashes and reduce crash severity. A crash can disturb traffic flow, and this turbulence may lead to further crashes. A secondary crash (SC) occurs within the spatial and temporal impact ranges of the turbulent traffic conditions caused by the primary crash (PC). Previous studies suggested that 2.2% to 3.9% of freeway crashes can result in SC [1][2][3].
With the widespread use of freeway real-time traffic surveillance systems, researchers have started using high-resolution dynamic traffic flow data to identify the traffic conditions before SC occurrences. In general, SC can be affected by various contributing factors, including traffic flow characteristics, geometric design factors, weather conditions, and PC characteristics [4][5][6][7]. In addition, many researchers have paid close attention to methods for identifying SC. The common methods include the static threshold method (STM) [8][9][10] and dynamic methods (DM) [11][12][13][14].
Although many studies have examined the identification methods and crash mechanisms of SC, few have focused on how SC differs across traffic conditions. Traffic flow characteristics differ significantly between traffic conditions, and these differences affect both the spatial-temporal evolution of traffic [15][16][17][18] and safety performance [19,20]. Typical methods for dividing traffic flow into states include three-phase traffic theory [15,16], four-phase traffic theory [17], and the six levels of service [18]. It has been shown that the safety performance of SC differs significantly across traffic conditions [3,21].
In this study, traffic flow is divided into states using three-phase traffic theory. The main purpose is to analyse the difference in the safety performance of SC across traffic flow states and to explore how contributing factors affect the probability of SC in each state. The SC-related data were collected from the I-880 freeway in the United States from 2006 to 2011. Bayesian conditional logit models were established to analyse the statistical relationship between the SC probability and traffic flow states, and traditional logistic regression models were established to quantify the effects of various variables on the SC probability in different traffic flow states.
This research can help traffic management personnel better understand which traffic flow states are more dangerous for the occurrence of SC and recognize the contributing factors of SC in each state. The results can be applied to develop effective countermeasures that reduce the SC probability in different traffic flow states.

Literature Review
In early studies, STM was usually applied to identify SC. The STM defines fixed spatial and temporal influence areas for the traffic disturbance caused by a prior crash. Numerous studies have used the STM to analyse SC, such as Raub [22], Karlaftis et al. [23], Moore et al. [8], Zhan et al. [9], and Hirunyanitiwattana and Mattingly [10]. However, the STM has an obvious limitation: the choice of its spatial and temporal scope is too subjective to provide an objective and reasonable identification method [24]. To overcome this limitation, many subsequent studies adopted DM to identify SC [25][26][27][28]. DM uses a dynamic boundary of the influence area based on the speed contour plot [14], shock waves [27], etc.
In recent studies, many researchers have analysed and predicted SC with statistical methods or machine learning approaches. For example, Wang and Jiang proposed an identification method of SC based on the speed contour plot and the spatiotemporal evolution of shockwaves [4]. The results indicated that the identification method based on an integer programming model can reduce the probability of misidentifying SC. Kitali et al. used random forest to extract the important variables [5], and the results of a Bayesian random-effects complementary log-log model showed that some traffic flow variables and the PC types and severities can significantly affect the probability of SC. In a subsequent study, Kitali et al. used a penalized logistic regression model to improve the predictive accuracy of the SC risk model [6]. The results indicated that traffic flow variables and PC characteristics can significantly affect the probability of SC. Specifically, the traffic flow variables include occupancy, speed, variation of hourly flow, etc., and the PC characteristics include impact duration, type, occurrence time, etc. Yang et al. determined the PC boundary with a clustering method and a metaheuristic optimization algorithm [7] and then introduced a novel method to identify SC. The results showed that the accuracy of identifying SC can rise to 95% as the market penetration rate increases from 5% to 25%. Subsequently, Yang et al. summarized and discussed previous studies from three perspectives: the identification methods of SC, the predictive models of SC risk, and the prevention measures of SC [11]. Goodall predicted the probability of SC using empirical queuing and estimated volumes [12] and found that an SC occurred on average once every 10 crashes and once every 54 disabled vehicles.
In the authors' previous study, a two-step identification method of SC combining the speed contour map and the shock wave was applied, and random-effects logit regression was used to analyse the contributing factors of SC [13]. The results indicated that the number of significant contributing factors increases as the threshold value increases. In addition, the collision type, road surface, speed, and traffic flow can significantly affect the probability of SC.
However, few researchers have analysed SC in combination with traffic flow states. Park et al. applied stochastic gradient boosting and rule extraction techniques to improve the accuracy and comprehensibility of SC predictions [21]. The results indicated that unexpected traffic congestion caused by a crash has a significant effect on the occurrence of SC. Xu et al. used a zero-inflated ordered probit regression model to explore the relationship between SC risk and traffic-related variables, including traffic flow variables, geometric design factors, weather condition factors, and PC characteristics [3]. The results showed that the contributing factors differ significantly between the SC-prone state and the SC-free state.
Although some researchers have incorporated traffic flow states into studies of SC, no researchers have studied how the SC mechanism differs across traffic flow states defined by classical macroscopic traffic flow theory. The common methods of classical macroscopic traffic flow theory, including the six levels of service [18], four-phase traffic theory [17], and three-phase traffic theory [15,16], have been widely used in numerous areas of transportation engineering. Among these theories, three-phase traffic theory is one of the accepted approaches for modelling freeway traffic flow [15,16]. According to three-phase traffic theory, freeway traffic flow can be classified into three phases: free flow (F), synchronized flow (S), and wide moving jams (J) [15,16].
To address this gap in previous studies, which did not consider classical macroscopic traffic flow theory in the analysis of SC, this study applies three-phase traffic theory to explore how the SC mechanism differs across traffic flow states. Specifically, vehicle count, speed, and occupancy were collected at 30 s intervals for each lane. The related crash data included traffic flow variables, environmental factors, geometric design factors, and others. A total of 9,919 crashes were used in this study. There were three types of crashes: SC, PC, and NC. PC are defined as crashes that led to SC, while NC are defined as crashes that did not lead to SC. The identification method of SC and the numbers of SC, PC, and NC are given in Section 2. For each crash, to compensate for possible inaccuracies in the reported collision time and to identify hazardous traffic conditions ahead of the crash, traffic data were collected for the 5-10 min prior to crash occurrence [13,19]. Non-crash cases were extracted based on crash locations and times, with a crash to non-crash ratio of 1:4 [19].

Materials and Methods
As listed in Table 1, the 30 s raw data from the 5-minute interval for each crash were converted into 19 traffic flow variables. Together with 4 environment variables, 4 geometric characteristic variables, and 5 crash characteristic variables, a total of 32 candidate variables were considered.
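Table 1 is not reproduced here, so the following is only a minimal sketch of the kind of conversion described above: one 5-min window of 30 s lane-by-lane detector readings (10 intervals per lane) is summarised into a handful of aggregate variables. The function name and the returned keys are hypothetical illustrations, not the paper's Table 1 symbols, and the average speed is deliberately left unweighted by volume.

```python
import numpy as np


def aggregate_window(count, speed, occ):
    """Summarise one 5-min window of 30 s loop-detector data.

    count, speed, occ: arrays of shape (10, n_lanes), i.e. ten 30 s
    intervals by lane. The keys returned are illustrative names only;
    they are not the variable symbols defined in Table 1.
    """
    count = np.asarray(count, dtype=float)
    speed = np.asarray(speed, dtype=float)
    occ = np.asarray(occ, dtype=float)
    if not (count.shape == speed.shape == occ.shape):
        raise ValueError("count, speed and occ must have the same shape")

    # Vehicles per 30 s interval, summed across lanes.
    volume = count.sum(axis=1)
    mean_volume = volume.mean()

    # Mean absolute occupancy difference between adjacent lanes
    # (one possible proxy for lane-changing pressure; zero for a single lane).
    if occ.shape[1] > 1:
        adj_occ_diff = float(np.abs(np.diff(occ, axis=1)).mean())
    else:
        adj_occ_diff = 0.0

    return {
        "avg_speed": float(speed.mean()),  # unweighted by volume
        "std_speed": float(speed.std(ddof=1)),
        "avg_occ": float(occ.mean()),
        "std_occ": float(occ.std(ddof=1)),
        "total_count": float(count.sum()),
        "cv_volume": float(volume.std(ddof=1) / mean_volume) if mean_volume > 0 else 0.0,
        "adj_lane_occ_diff": adj_occ_diff,
    }
```

In practice each crash (and each matched non-crash case) would yield one such dictionary per 5-min window, and the rows would then be stacked into the candidate-variable table.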

The Identification of Traffic Flow States.
Previous studies have suggested that the traffic states defined by three-phase traffic theory can be identified from the traffic flow characteristics measured at loop detector stations. According to three-phase traffic theory, traffic flow is separated into three steady states: free flow (F), synchronized flow (S), and wide moving jams (J). In addition to the three steady states, four transitional states are considered in this study: the transitional state from free flow to synchronized flow (F⟶S), from synchronized flow to free flow (S⟶F), from synchronized flow to wide moving jams (S⟶J), and from wide moving jams to synchronized flow (J⟶S). Previous studies have demonstrated that wide moving jams generally do not emerge directly from the free flow phase [25]; thus, the transitional state from free flow to wide moving jams was not considered in the present study. The identification method of these traffic flow states is as follows [29-32]:
(1) Free flow (F) is characterized by high vehicle speeds and low traffic density. The free flow phase can be easily distinguished from congested flow using time series plots of speed and occupancy.
(5) The transitional states between synchronized flow (S) and wide moving jams (J) are identified by the correlation between density and flow rate, with a correlation parameter between 0.2 and 0.5. A reduction in speed over time indicates the transitional state from synchronized flow to wide moving jams (S⟶J), and an increase in speed indicates the reverse transition (J⟶S).
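Rule (5) above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: occupancy is used as a stand-in for density, the "correlation parameter" is taken to be the Pearson correlation over a detector time window, and the speed trend is measured with a least-squares slope. The function name is hypothetical, and rules (2)-(4), which are not given here, are not reproduced.

```python
import numpy as np


def classify_s_j_transition(occupancy, flow, speed, low=0.2, high=0.5):
    """Label a detector time window as "S->J", "J->S", or None.

    Sketch of rule (5) only: occupancy stands in for density, and the
    window counts as an S/J transition when the Pearson correlation
    between occupancy and flow lies in [low, high]. The direction comes
    from the least-squares trend of speed over the window.
    """
    occupancy = np.asarray(occupancy, dtype=float)
    flow = np.asarray(flow, dtype=float)
    speed = np.asarray(speed, dtype=float)

    r = np.corrcoef(occupancy, flow)[0, 1]
    if not (low <= r <= high):  # also False when r is NaN (constant input)
        return None

    # Slope of speed vs. interval index: falling speed suggests S->J,
    # rising speed suggests the reverse transition J->S.
    slope = np.polyfit(np.arange(speed.size), speed, 1)[0]
    return "S->J" if slope < 0 else "J->S"
```

For example, occupancy `[1, 2, 3, 4]` and flow `[1, 3, 1, 3]` have a correlation of about 0.447, so a window with falling speeds would be labelled `"S->J"`.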

The Identification of Secondary Crash.
The method based on the speed contour figure was applied to identify SC in this study. This method uses real-time traffic flow data to determine the spatial and temporal influencing range of a prior crash while also taking the effects of recurrent congestion into account. The identification method is as follows [13]:
(1) The 5 min speed data were extracted to produce a speed contour figure for a prior crash. Specifically, the speed data were extracted from the loop detectors within 10 miles upstream and 10 miles downstream of the prior crash, from 6 hours before to 6 hours after the prior crash. Figure 1(a) shows an example of a speed contour figure. The figure clearly shows that congestion and queue formation occur after the prior crash. However, it offers little information about whether the queues resulted from recurrent congestion or from the prior crash. To eliminate the effects of recurrent congestion, the spatial and temporal influencing range of the prior crash is determined in the following two steps.
(2) The 5 min speed data for the same times and locations as in step one, but from crash-free days, were extracted for the whole year. For instance, the prior crash in Figure 1(a) happened at 11:45 am on September 20, 2010, at milepost 3.95. Following this step, the speed data for the same times and locations as in Figure 1(a) were collected from all crash-free days in 2010. Subsequently, the speed data for each time and location were averaged over all the crash-free days.
(3) To eliminate the potential effects of recurrent congestion, the average speed from step two was subtracted from the speed data for each time and location in step one. A new speed contour figure was developed from these speed differences for the various times and locations. The new speed contour figure, shown in Figure 1(b), was then used to determine the spatial and temporal influencing range of the prior crash.
(4) Crashes that happened within the spatial and temporal influencing range of a prior crash were identified as SC. Crashes that did not lead to SC were identified as NC.
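Steps (2) to (4) can be sketched on a location-by-time speed grid. This is a minimal sketch under assumptions that the text does not state: a cell counts as impacted when its speed is at least `drop` (a hypothetical threshold) below the crash-free average, and the influencing range is taken as the set of impacted cells connected to the prior-crash cell. The function names are hypothetical, and the upstream/downstream orientation of the rows is not enforced.

```python
from collections import deque

import numpy as np


def impact_region(crash_day, crash_free_days, origin, drop=10.0):
    """Speed-contour impact region of a prior crash (steps 2-3 above).

    crash_day: speeds on the crash day, shape (n_locations, n_times).
    crash_free_days: speeds on crash-free days, shape (n_days, n_locations, n_times).
    origin: (location_index, time_index) of the prior crash.
    drop: assumed speed reduction (same units as the data) marking a cell as impacted.
    """
    baseline = np.asarray(crash_free_days, dtype=float).mean(axis=0)  # step 2
    diff = np.asarray(crash_day, dtype=float) - baseline              # step 3
    slow = diff <= -drop

    # Grow a connected region from the crash cell through impacted cells.
    region = np.zeros(slow.shape, dtype=bool)
    region[origin] = True
    queue = deque([origin])
    while queue:
        loc, t = queue.popleft()
        for dl, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nl, nt = loc + dl, t + dt
            if (0 <= nl < slow.shape[0] and 0 <= nt < slow.shape[1]
                    and slow[nl, nt] and not region[nl, nt]):
                region[nl, nt] = True
                queue.append((nl, nt))
    return region


def is_secondary(region, origin, later_crash):
    """Step 4: a later crash falling inside the impact region is flagged as SC."""
    return later_crash[1] > origin[1] and bool(region[later_crash])
```

Using a connected region rather than every slow cell keeps unrelated slowdowns elsewhere on the corridor from being attributed to the prior crash. Whether the original method does the same is an assumption.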
Following the above four steps, the numbers of SC, PC, and NC in the different traffic flow states are summarized in Table 2. Unlike the authors' previous study [13], the present study used only the speed contour figure to identify SC, without considering the shockwave, because this study focused on the difference in the safety performance of SC associated with the various traffic flow states.
Journal of Advanced Transportation

Bayesian Conditional Logit Model.
As discussed in the literature review section, although some researchers have incorporated traffic flow states into studies of SC, no researchers have studied the difference in the safety performance of SC across traffic flow states defined by classical macroscopic traffic flow theory. The conditional logit model was applied to quantitatively analyse the relative safety performance of SC in different traffic flow states while controlling for the effects of other traffic-related variables, such as weather conditions, geometric design, and road pavement. The model can be written as [33][34][35][36][37]

$$\operatorname{logit}\left[p_i(\mathbf{x}_{ij})\right] = \alpha_i + \sum_{k=1}^{K} \beta_k x_{ijk},$$

where $x_{ijk}$ is the $k$th unmatched variable for the case ($j = 0$) or the $j$th control in the $i$th matched set. Therefore, $X = \{x_{ijk}\}$ consists of all the cases, and all matched sets are controlled, where $i = 1, 2, \ldots, I$; $j = 0, 1, \ldots, J$; $k = 1, 2, \ldots, K$. $I$ represents the total number of matched sets, $J$ the number of controls in each matched set, and $K$ the number of contributing variables.
$\alpha_i$ is the effect of the matching variables on the probability of SC occurrence for each matched set, $\beta_k$ represents the estimated coefficient of the $k$th explanatory variable, and $x_k$ is the $k$th unmatched contributing variable applied in this study.
A conditional probability is used to account for the selection bias of the matched case-control design. The conditional probability that the first vector of contributing variables $\mathbf{x}_{i0}$ in the $i$th matched set corresponds to the case, conditional on $\mathbf{x}_{i0}, \mathbf{x}_{i1}, \ldots, \mathbf{x}_{iJ}$ being the vectors of contributing variables in the $i$th matched set, is

$$P_i = \frac{1}{1 + \sum_{j=1}^{J} \exp\left[\sum_{k=1}^{K} \beta_k \left(x_{ijk} - x_{i0k}\right)\right]},$$

in which the matched-set effect $\alpha_i$ cancels out. This study applied Bayesian inference based on Markov Chain Monte Carlo (MCMC) to estimate the conditional logit model. Unlike the point estimates of the traditional maximum likelihood estimation (MLE) method, the Bayesian modelling technique treats all unknown parameters as random variables with a prior distribution. The mean, standard deviation, and quantiles of the coefficients are then obtained from the posterior distribution. Based on Bayes' theorem, the posterior distribution of the parameters can be expressed as

$$f(\boldsymbol{\beta} \mid Y) = \frac{f(Y, \boldsymbol{\beta})}{\int f(Y, \boldsymbol{\beta})\, d\boldsymbol{\beta}} = \frac{f(Y \mid \boldsymbol{\beta})\, \pi(\boldsymbol{\beta})}{\int f(Y \mid \boldsymbol{\beta})\, \pi(\boldsymbol{\beta})\, d\boldsymbol{\beta}},$$

where $f(\boldsymbol{\beta} \mid Y)$ is the posterior joint distribution of the parameters $\boldsymbol{\beta}$ conditional on the dataset $Y$, $f(Y, \boldsymbol{\beta})$ is the joint probability distribution of $Y$ and $\boldsymbol{\beta}$, $f(Y \mid \boldsymbol{\beta})$ denotes the likelihood conditional on $\boldsymbol{\beta}$, and $\pi(\boldsymbol{\beta})$ is the prior distribution of $\boldsymbol{\beta}$. Non-informative prior distributions were applied for the model parameters:

$$\boldsymbol{\beta} \sim \operatorname{Normal}\left(\mathbf{0}_K,\ 10^{6}\, \mathbf{I}_K\right),$$

where $\mathbf{0}_K$ represents a $K \times 1$ vector of zeros and $\mathbf{I}_K$ represents the $K \times K$ identity matrix. Given this prior, the posterior joint distribution $f(\boldsymbol{\beta} \mid Y)$ is

$$f(\boldsymbol{\beta} \mid Y) \propto \prod_{i=1}^{I} \frac{1}{1 + \sum_{j=1}^{J} \exp\left[\sum_{k=1}^{K} \beta_k \left(x_{ijk} - x_{i0k}\right)\right]} \times \exp\left(-\frac{\boldsymbol{\beta}^{\top}\boldsymbol{\beta}}{2 \times 10^{6}}\right).$$

The MCMC method was used to generate realizations from this posterior joint distribution by drawing the parameters sequentially. Because the full conditional distributions of this posterior are nonstandard, the Metropolis-Hastings sampling approach was applied to generate the random draws. Inference was based on the draws that remained after discarding those from the burn-in period.
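The estimation described above can be sketched with NumPy alone. This is a hedged illustration, not the authors' software: the conditional log-likelihood sums $\log P_i$ over matched sets, the prior is $\operatorname{Normal}(\mathbf{0}_K, 10^6 \mathbf{I}_K)$, and a *random-walk* Metropolis-Hastings sampler (the proposal type and `step` size are assumptions) keeps the draws after burn-in. The defaults of 10,000 iterations and 4,000 burn-in draws follow the chain settings reported for this study. The function names are hypothetical.

```python
import numpy as np

PRIOR_VAR = 1e6  # variance of the Normal(0, 10^6) prior on each coefficient


def conditional_loglik(beta, X):
    """Conditional logit log-likelihood: sum over matched sets of log P_i.

    X has shape (I, J + 1, K); X[i, 0] is the case, X[i, 1:] are its controls.
    """
    diffs = X[:, 1:, :] - X[:, :1, :]   # x_ijk - x_i0k, shape (I, J, K)
    eta = diffs @ beta                  # shape (I, J)
    # log P_i = -log(1 + sum_j exp(eta_ij)), computed stably with logaddexp.
    padded = np.concatenate([np.zeros((X.shape[0], 1)), eta], axis=1)
    return -np.logaddexp.reduce(padded, axis=1).sum()


def log_posterior(beta, X):
    """Unnormalised log posterior: likelihood plus Normal(0, 10^6 I) log prior."""
    return conditional_loglik(beta, X) - 0.5 * beta @ beta / PRIOR_VAR


def random_walk_mh(X, n_iter=10_000, burn_in=4_000, step=0.1, init=None, seed=0):
    """Random-walk Metropolis-Hastings; returns the draws kept after burn-in."""
    rng = np.random.default_rng(seed)
    k = X.shape[2]
    beta = np.zeros(k) if init is None else np.asarray(init, dtype=float)
    current = log_posterior(beta, X)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(k)
        candidate = log_posterior(proposal, X)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        draws[t] = beta
    return draws[burn_in:]
```

With one case and four controls per matched set (the 1:4 ratio used in this study) and $\boldsymbol{\beta} = \mathbf{0}$, each $P_i = 1/5$. The `init` argument allows chains to start from MLE-based values, as described for this study.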
In the present study, only the Bayesian conditional logit model was used to analyse the difference in the safety performance of SC associated with the various traffic flow states. In future studies, DIC and AUC values can be used to determine which of several competing models is better. DIC is recognized as a Bayesian generalization of the AIC (Akaike information criterion) and is suited to model comparison. It combines a measure of model fit with the effective number of parameters, and a smaller DIC indicates a better model fit. The AUC value, the area under the receiver operating characteristic (ROC) curve, can be used to evaluate and compare models; larger AUC values indicate better goodness-of-fit and classification power [38].
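Both criteria mentioned above are simple to compute once scores or posterior draws are available. The sketch below is illustrative only, and the function names are hypothetical. AUC is computed as the probability that a randomly chosen event outscores a randomly chosen non-event, with ties counted as one half. DIC uses the common definition $\mathrm{DIC} = \bar{D} + p_D$, where $p_D = \bar{D} - D(\bar{\boldsymbol{\beta}})$ and $D = -2 \log L$. The `loglik` argument is any log-likelihood function, for example a conditional logit log-likelihood.

```python
import numpy as np


def auc(scores_events, scores_nonevents):
    """AUC as P(random event outscores random non-event), ties counted as 1/2."""
    e = np.asarray(scores_events, dtype=float)[:, None]
    n = np.asarray(scores_nonevents, dtype=float)[None, :]
    return float((e > n).mean() + 0.5 * (e == n).mean())


def dic(loglik, draws):
    """Return (DIC, p_D), with p_D = mean deviance - deviance at the posterior mean."""
    draws = np.asarray(draws, dtype=float)
    deviances = np.array([-2.0 * loglik(b) for b in draws])
    d_bar = deviances.mean()
    p_d = d_bar - (-2.0 * loglik(draws.mean(axis=0)))
    return d_bar + p_d, p_d
```

For instance, event scores `[0.9, 0.8]` against non-event scores `[0.1, 0.85]` win three of the four pairwise comparisons, giving an AUC of 0.75.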

Safety Performance of Secondary Crash in Different Traffic Flow States.
Unlike other studies that applied the Bayesian conditional logit model, this study used it to quantify the difference in the safety performance of SC across the traffic flow states defined by three-phase traffic theory. The grouping variable separates the case and control samples. In the models, the events (dependent variable equal to 1) are PC that induced SC, and the non-events (dependent variable equal to 0) are NC that did not induce SC. The traffic flow in this study is divided into seven states: free flow (F), synchronized flow (S), wide moving jams (J), and the transitional states F⟶S, S⟶F, S⟶J, and J⟶S. Free flow (F) is therefore treated as the reference level, and the other six traffic flow states enter the model as six indicator variables. The odds ratio can then be used to quantify the difference in the safety performance of SC between free flow (F) and each of the other six traffic flow states. This model did not include other traffic flow variables such as speed and density, because the traffic flow states are highly correlated with these variables [39].
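The reference-level coding described above can be illustrated as follows. This is a minimal sketch in which the function name is hypothetical and ASCII `->` stands in for the arrow. Each observation gets one 0/1 column per non-reference state, so a free flow (F) observation is a row of zeros, and each fitted coefficient is a log odds ratio relative to F.

```python
import numpy as np

# Seven states from the text; ASCII "->" stands in for the arrow.
STATES = ["F", "S", "J", "F->S", "S->F", "S->J", "J->S"]
REFERENCE = "F"


def dummy_code(states):
    """One 0/1 column per non-reference state; free flow (F) rows are all zeros."""
    columns = [s for s in STATES if s != REFERENCE]
    unknown = set(states) - set(STATES)
    if unknown:
        raise ValueError(f"unknown traffic flow states: {sorted(unknown)}")
    matrix = np.array([[int(s == c) for c in columns] for s in states], dtype=int)
    return columns, matrix
```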
Three parallel MCMC chains were constructed for Bayesian inference. Each chain consisted of 10,000 iterations, including an initial burn-in period of 4,000 iterations [39]. The parameter estimates from the MLE method were used as initial values.
The initial values for the multiple chains were dispersed throughout the 90% confidence intervals of the parameters estimated by MLE. Convergence of the posterior samples was checked by visual inspection of the trace plots, posterior density plots, and autocorrelation function plots. In addition, the Gelman-Rubin potential scale reduction (PSR) factor was checked, and the chains were considered converged when the PSR was lower than 1.1 [39]. The estimation results of the Bayesian conditional logit models are given in Table 3. The 95% credible intervals in Table 3 indicate that, apart from wide moving jams (J), the traffic flow states significantly affect the probability of SC occurrence. The odds ratio for each variable was used to quantify the safety performance of SC in the different traffic flow states.
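The PSR check described above can be sketched for a single parameter. This is the basic Gelman-Rubin form, offered as an assumption; the exact variant used in the study is not stated. The between-chain variance $B$ and the mean within-chain variance $W$ are pooled into $\hat{V} = \frac{n-1}{n}W + \frac{B}{n}$, and the returned value is $\sqrt{\hat{V}/W}$. The function name is hypothetical, and `chains` would hold the post-burn-in draws, e.g. shape `(3, 6000)` for three chains.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m, n), m >= 2 chains each with n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)         # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n       # pooled posterior variance estimate
    return float(np.sqrt(var_hat / w))
```

Values close to 1 (below 1.1 under the criterion above) suggest that the chains have mixed. Chains stuck in different regions produce values well above 1.1.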
As shown in Table 3, the Bayesian conditional logit models suggest that the odds ratios of synchronized flow (S) and of the transitional states F⟶S, S⟶F, S⟶J, and J⟶S, relative to free flow (F), are significantly greater than 1, whereas the odds ratio of wide moving jams (J) is not. Accordingly, free flow (F) has the best safety performance in terms of the lowest SC likelihood, while synchronized flow (S) has the highest SC likelihood, followed by the transitional state from synchronized flow to wide moving jams (S⟶J). The SC likelihoods associated with F⟶S, S⟶F, and J⟶S are similar to one another. In detail, the odds of SC in synchronized flow (S) are 8.561 times those in free flow (F). The corresponding odds ratios relative to free flow (F) are 2.488 for the transitional state from free flow to synchronized flow (F⟶S), 3.535 for the transitional state from synchronized flow to free flow (S⟶F), 7.943 for the transitional state from synchronized flow to wide moving jams (S⟶J), and 3.881 for the transitional state from wide moving jams to synchronized flow (J⟶S).
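Each odds ratio above is the exponential of a posterior coefficient. For example, an odds ratio of 8.561 corresponds to a coefficient of about $\ln(8.561) \approx 2.147$. When SC is rare, an odds ratio is approximately a ratio of probabilities. The sketch below, with a hypothetical function name, turns the posterior draws of one coefficient into a median odds ratio, an equal-tailed credible interval, and a flag for whether the interval lies entirely above 1.

```python
import numpy as np


def odds_ratio_summary(beta_draws, level=0.95):
    """Posterior median odds ratio, equal-tailed credible interval, significance flag.

    beta_draws: posterior draws of one coefficient (log odds ratio vs. free flow).
    """
    ors = np.exp(np.asarray(beta_draws, dtype=float))
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(ors, [tail, 100 - tail])
    # An interval entirely above 1 means higher SC odds than the reference state.
    return float(np.median(ors)), float(lower), float(upper), bool(lower > 1)
```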

The Contributing Factors of Secondary Crash in Different Traffic Flow States.
To identify how different contributing factors affect the SC probability in different traffic flow states, traditional logistic regression models were applied [40]. The events (coded 1) are PC that induced SC, and the non-events (coded 0) are NC that did not induce SC. A significance level of 0.1 was used for the parameter estimates in these models, which were estimated with the STATA software package. To avoid biased results caused by multicollinearity, the Pearson correlation coefficients between the candidate variables were calculated, and highly correlated explanatory variables were not included in the same model. The significant variables of the logistic regression models for the different traffic flow states are presented in Table 4, and the symbols in Table 4 are explained in Table 1.
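The multicollinearity screen described above can be sketched as a pairwise Pearson check. The cutoff `threshold=0.7` is a hypothetical choice, since the text does not state the value used, and the function name is also hypothetical. Each flagged pair would then be kept out of the same model.

```python
import numpy as np


def flag_correlated(X, names, threshold=0.7):
    """Return (name_a, name_b, r) for variable pairs with |Pearson r| above threshold.

    X: array of shape (n_observations, n_variables); columns follow `names`.
    """
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if abs(r[a, b]) > threshold:
                pairs.append((names[a], names[b], float(r[a, b])))
    return pairs
```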
In free flow (F), as shown in Table 4, Stdc and Lstdo were positively related to the SC probability, whereas Spd and Ss were negatively related to it. The difference in occupancy between adjacent lanes has been found to be related to lane-change frequency [41]. The results indicate that free flow with low speed and more lane-changing behaviour can increase the SC risk. Moreover, a non-sideswipe prior crash can significantly increase the SC risk in free flow. In summary, the preventive measures for SC in free flow (F) are rapid clearance of congestion and reducing the interaction between vehicles.
In synchronized flow (S), Lw, Os, Occ, Lspd, and Ho were positively related to the SC probability, and Stdo was negatively related to it. In synchronized flow, speeds tend to synchronize within each lane and across lanes [19]. The results showed that congested and synchronized traffic flow can increase the SC likelihood. In addition, larger lane width and outer shoulder width may encourage drivers to use the extra space to pass the congested flow caused by a prior crash, so the SC probability increases with lane width and outer shoulder width. If the prior crash is a hit-object crash, the SC probability also increases. In summary, the preventive measures for SC in synchronized flow (S) after a prior crash, especially a hit-object prior crash, are relieving the traffic congestion as soon as possible, reducing the number of lanes, and reducing the outer shoulder width.
In wide moving jams (J), Stdo was positively related to the SC probability, and Lcnt was negatively related to it. The results indicated that the synchronization of vehicles across adjacent lanes can significantly increase the SC likelihood, as can a large variation of occupancy in wide moving jams. Therefore, making full use of the available lanes and relieving the traffic congestion quickly after a prior crash can help to reduce the SC risk in wide moving jams (J).
In the transitional state from free flow to synchronized flow (F⟶S), Occ and Co were positively related to the SC probability, whereas Is, Dc, and Lstdo were negatively related to it. The results imply that a strong tendency from free flow towards synchronized flow results in more congestion, less space, and more lane changes, so the SC likelihood increases as the traffic flow becomes more congested in F⟶S. In addition, a larger inner shoulder width can help to disperse congested flow and provides more space for drivers to take crash avoidance measures. Thus, a larger inner shoulder width can decrease the SC likelihood.
In the transitional state from synchronized flow to free flow (S⟶F), Ds and Ss were positively related to the SC probability, and Cc was negatively related to it. Similar to the results for F⟶S, the more congested the transitional state, the larger the SC probability. In addition to the traffic flow characteristics, a sideswipe prior crash also increases the SC probability. In summary, relieving the traffic congestion quickly after a prior crash, especially a sideswipe prior crash, can help to reduce the SC risk in the S⟶F state.
In the transitional state from synchronized flow to wide moving jams (S⟶J), Stdo was positively related to the SC probability, whereas Is and Dc were negatively related to it. Synchronized flow and wide moving jams are both congested flow, and in this transitional state more congested flow decreases the SC probability, the opposite of the results for F⟶S and S⟶F. This may be because the reduced space available to drivers leads to fewer dangerous driving behaviours. As in F⟶S, a larger inner shoulder width in this transitional state can provide more space for drivers to take crash avoidance measures.
In the transitional state from wide moving jams to synchronized flow (J⟶S), Tr was positively related to the SC probability, and Cs was negatively related to it. Previous studies have shown that a prior crash involving a truck is a significant factor for SC [11]. In this transitional state, the results likewise show that a prior crash involving a truck significantly affects the SC risk.

Conclusion
In this study, traffic flow is divided into states using three-phase traffic theory. The main purpose is to analyse the difference in the safety performance of SC across traffic flow states and to explore how contributing factors affect the probability of SC in each state. The SC-related data were collected from the I-880 freeway in the United States from 2006 to 2011. Bayesian conditional logit models were established to analyse the statistical relationship between the SC probability and traffic flow states, and traditional logistic regression models were established to quantify the effects of various variables on the SC probability in different traffic flow states.
More specifically, the results of the Bayesian conditional logit model are summarized as follows:
(1) F has the best safety performance in terms of the lowest SC likelihood.
(2) S has the highest SC likelihood, followed by S⟶J.
(3) The SC likelihoods associated with F⟶S, S⟶F, and J⟶S are similar to one another.
(4) The SC likelihood of J is not significantly greater than that of F.
In addition, the results of the traditional logistic regression models are summarized as follows:
(1) In free flow (F), rapid clearance of congestion and reducing the interaction between vehicles can reduce the SC risk after a prior crash, especially a non-sideswipe prior crash.
(2) In synchronized flow (S), relieving the traffic congestion as soon as possible, reducing the number of lanes, and reducing the outer shoulder width can decrease the SC risk after a prior crash, especially a hit-object prior crash.
(3) In wide moving jams (J), making full use of the available lanes and relieving the traffic congestion quickly after a prior crash can help to reduce the SC risk.
(4) In the transitional state from free flow to synchronized flow (F⟶S) and the transitional state from synchronized flow to wide moving jams (S⟶J), a larger inner shoulder width can decrease the SC likelihood.
(5) In the transitional state from synchronized flow to free flow (S⟶F), relieving the traffic congestion quickly can help to reduce the SC risk after a prior crash, especially a sideswipe prior crash.
(6) In the transitional state from wide moving jams to synchronized flow (J⟶S), a prior crash involving a truck significantly affects the SC risk.
This research can help traffic management personnel better understand which traffic flow states are more dangerous for the occurrence of SC and recognize the contributing factors of SC in each state. The results can be applied to develop effective countermeasures that reduce the SC probability in different traffic flow states.
However, several issues remain for future study:
(1) More PC characteristics need to be considered in the analysis of SC risk; only four crash types were included in the models in this study.
(2) In this study, traffic flow is divided only by three-phase traffic theory; other macroscopic traffic flow theories should be studied in the future.

Data Availability
The weather data were obtained from the National Climatic Data Center (NCDC) website, which provides hourly weather information. The geometric and traffic data were collected from the loop detector stations nearest to each collision location and obtained from the Performance Measurement System (PeMS) maintained by the California Department of Transportation (Caltrans). Crash data were obtained from the Statewide Integrated Traffic Records System (SWITRS) of Caltrans.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.