Investigating the Differences of Single-Vehicle and Multivehicle Accident Probability Using Mixed Logit Model

Road traffic accidents are believed to be associated not only with road geometric features and traffic characteristics but also with weather conditions. To address these safety issues, it is of paramount importance to understand how these factors affect the occurrence of crashes. Existing studies have suggested that the mechanisms of single-vehicle (SV) accidents and multivehicle (MV) accidents can be very different. Few studies have examined the difference between SV and MV accident probability while addressing unobserved heterogeneity at the same time. To investigate the different contributing factors of SV and MV accidents, a mixed logit model is employed using disaggregated data with the response variable categorized as no accidents, SV accidents, and MV accidents. The results indicate that, in addition to speed gap, length of segment, and wet road surface, which are significant for both SV and MV accidents, most other variables are significant only for MV accidents. Traffic, road, and surface characteristics are the main factors influencing SV and MV accident possibility. Hourly traffic volume, inside shoulder width, and wet road surface are found to produce statistically significant random parameters. Their effects on the possibility of SV and MV accidents vary across different road segments.


Introduction
Given the economic costs and human casualties motor vehicle crashes continue to claim, traffic safety remains a hot topic among researchers across the world. Over the last decades, traffic safety researchers have spent tremendous effort and time to gain a better understanding of the factors contributing to motor vehicle crashes [1][2][3][4][5][6]. Despite the progress, there are many knowledge gaps yet to be filled in safety-related studies. One of these gaps pertains to the differences between single-vehicle and multivehicle crashes. As shown by previous studies [7][8][9], the mechanisms of single-vehicle (SV) accidents and multivehicle (MV) accidents are inherently different. Knipling [8] pointed out that single-vehicle and multivehicle crashes were related to different kinds of driver errors. Specifically, single-vehicle crashes usually result from loss of vehicle control that is associated with driver misbehavior. Multivehicle crashes, on the other hand, are most often caused by driver errors when interacting with other vehicles. It is therefore important to identify the different contributing factors of single-vehicle and multivehicle crashes, which in turn offers insights for countermeasures to mitigate SV and MV crash risk, respectively. Along this line of research, separate models were at first developed for SV and MV crashes to account for the difference between them [7][9][10][11][12]. However, those models largely ignored the shared unobserved effects between SV and MV crashes, which is problematic and leads to inconsistent estimates [13]. To account for these shared heterogeneities between SV and MV crashes, researchers have proposed advanced models such as bivariate Poisson-gamma/lognormal models to study SV and MV accidents jointly [14][15][16][17].
These previous studies mainly employed count data models and have undoubtedly provided many useful findings that contribute to the overall understanding of the characteristics of SV and MV accidents.
Though many safety studies have already been conducted, investigations of the mechanisms of SV and MV crashes, especially those using disaggregate data, are still lacking. Besides, methods other than Poisson-based frequency models are yet to be explored to bring a new understanding of SV and MV crashes. The main objective of this study is to investigate the difference in contributing factors between SV and MV accidents using disaggregate data. To this end, a comprehensive database is first established which includes road geometric features, traffic status, and environmental conditions that are processed on an hourly basis. Then, mixed logit models where the response variable is categorized as no accidents, SV accidents, and MV accidents are employed to address unobserved heterogeneity. Unlike previous aggregated studies that suffer from loss of time-varying information [4], this paper adopts refined-scale panel data to capture the time-varying information as well as to make short-term predictions. To the authors' knowledge, few studies so far have used the mixed logit model to analyze short-term SV and MV accident risk. By using mixed logit models to examine SV and MV crash risk over short periods, this study can address the unobserved heterogeneity and potentially provide new insights regarding the mechanisms of SV and MV crashes.
The remainder of this paper is organized as follows. Previous studies on SV and MV crashes and mixed logit model are briefly summarized in Section 2. In Section 3, a description of data is presented, followed by Section 4, where a detailed explanation of the mixed logit model structure used in this study is outlined. Section 5 presents the model results. Finally, Section 6 summarizes the conclusions and future research directions.

Previous Research
2.1. Studies on SV and MV Crashes. SV and MV crashes often refer to different types of accidents. To be specific, SV crashes usually involve run-off-road crashes and collisions with objects, while MV crashes usually relate to accidents such as rear-end crashes and sideswipes. Therefore, traffic safety researchers have long established that the etiologies of SV and MV crashes are different. Past studies have examined the different features of SV and MV crashes. For example, Mensah and Hauer [9] were among the first to investigate single-vehicle and multivehicle crashes. They developed separate models for SV and two-vehicle crashes and concluded that the two separate models outperformed the model that aggregated SV and two-vehicle crashes together.
Shankar et al. [18] investigated the effects of road geometric design and environmental and seasonal characteristics on SV and MV crashes. Different types of accident data and the various risk factors such as curve number and precipitation from a 61-km highway section were collected. They found that the separate models for SV and MV crashes can better explain the data than the model that pools all the crash data together.
F. Chen and S. Chen [19] examined the injury severities of truck-involved crashes on rural highways based on the distinct models for different crash types. Mixed logit models were used to investigate different risk factors including the driver, temporal, environmental, and roadway characteristics. Their results showed that SV and MV models have their respective contributing factors. The likelihood ratio test was conducted to verify the significance of separate models over the combined model, and the results indicated that separate models are superior.
These past studies have shown that it is beneficial to develop separate models for SV and MV crashes. However, the models adopted in those studies failed to account for the dependence between SV and MV crashes. By developing separate models, possible unobserved effects shared by SV and MV crashes were typically ignored [4]. To account for the dependence between crash types, researchers have proposed multivariate models to study SV and MV accidents jointly. For instance, Yu and Abdel-Aty [17] employed a Bayesian bivariate Poisson-lognormal model and hierarchical Poisson models to examine the different characteristics of SV and MV crashes.
Geedipally and Lord [15] investigated the difference of confidence intervals between disaggregated models and the combined model of SV and MV crashes. Five-year crash data on multilane undivided highways were used to develop bivariate Poisson-gamma models for crash prediction. They discovered that the univariate models provide narrower confidence intervals than the bivariate model.
Ma et al. [16] proposed a random effect bivariate Poisson-lognormal model to investigate the effect of geometric features, weather, and traffic conditions on crash occurrence. Their results indicated that the proposed model could address the different levels of correlations between SV and MV crashes.
The abovementioned studies have contributed to the general understanding of SV and MV crashes. Most of these studies adopted Poisson-based models such as Poisson-gamma and Poisson-lognormal models to predict crash frequency. In this study, the difference between single-vehicle and multivehicle crashes is reexamined from a different perspective. An advanced discrete choice model, namely the mixed logit model, is developed using real-time crash-related information that is processed into hourly records.

Mixed Logit Model.
Over the past two decades, researchers have developed various methods to analyze the risk factors related to traffic crash frequency. Count data models such as Poisson, Negative Binomial, and Poisson-lognormal models are predominantly employed for such purposes [4]. Discrete choice models, on the other hand, are mainly used to investigate injury severity levels. For example, Barua and Tay [20] developed an ordered logit model to study the injury severities of bus crashes in Bangladesh. Xu et al. [21] used a spatial logit model to examine the impact of possible risk factors on the injury severity of pedestrians in crashes that occurred at signalized intersections.
Among various discrete choice models, the mixed logit model, that is, the random parameter logit model, has become popular in injury severity studies [19,22,23]. It relaxes the independence of irrelevant alternatives assumption of the multinomial logit model and offers great capability to capture unobserved heterogeneity in crash data. For instance, Haleem and Gan [23] developed a mixed logit model to investigate the injury severities of urban freeway crashes in Florida. The roles of vehicle type, driver's age, and side of impact in each injury severity were assessed to reveal their respective effects. Based on the results, two major strategies were suggested to reduce the impacts of adverse factors. Hao and Kamga [24] used ten years of crash data from highway-rail grade crossings to analyze the effect of lighting on driver injury severities based on mixed logit models. The authors established separate models for lighted and unlighted intersections, found that the two situations have both common and different significant attributes, and suggested that it is necessary to focus more on how drivers react to emergencies at unlighted highway-rail intersections. These studies have all demonstrated the great potential of mixed logit models in crash analyses. By allowing the parameters to vary across observations, mixed logit models enable analysts to discover complex relationships between injury severity and its contributing factors.

Real-Time Crash Prediction.
Despite the preponderance of literature in safety research, most studies focus on crash prediction at aggregate levels based on yearly records [1,4,14,15,18,25]. Those studies employed highly aggregated data and are thus unable to provide guidance for proactive intervention. Real-time crash risk prediction, which seeks to identify crash precursors, on the other hand, holds great appeal for proactive traffic management. It has become a hot topic and has been frequently examined by researchers in recent years [17,26,27].
However, the literature on real-time crash estimation is not without limitations. To begin with, real-time safety analysis often requires traffic turbulence measures 5-10 minutes before the crash. Therefore, one key assumption for real-time risk evaluation is that the error in the reported crash time is small. Imprialou and Quddus [28] investigated in detail the quality of police-reported crash data. The results revealed that the share of reported crash times ending at zero or five minutes over the course of 1 hour was disproportionately high, with an even higher spike at the thirtieth minute. It is therefore possible that some crashes occurred earlier than the reported time. They concluded that such inaccuracy in reported crash time might significantly compromise the validity of real-time safety studies. Schlögl and Stütz [29] summarized important issues associated with data uncertainty in road safety studies. They pointed out that it is untenable to use a time unit smaller than 1 hour due to rounding errors in reported crash time and called for more hourly based studies. Moreover, Roshandel et al. [27] summarized the opportunities and challenges facing real-time risk prediction. They reviewed the real-time safety literature and revealed problems such as inconsistent results across studies and poor predictive performance. Another shortcoming relates to the nature of the matched case-control design, which is a major tool for real-time crash prediction. Readers are referred to Roshandel et al. [27] for a detailed discussion.
Given that aggregated models are incapable of guiding proactive traffic management and that real-time safety studies suffer from the abovementioned shortcomings, this paper tries to find a middle ground that balances both sides. By employing crash-related data processed into hourly records, this study can be far less sensitive to the inaccuracy in reported crash time while still being able to provide short-term (1 hour) crash prediction for proactive traffic management.

Data Description
The selected highway stretch is a part of I-25 in Colorado, which starts at Mile Marker 188.49 and ends at Mile Marker 221.03. The overall length of this stretch is 55.93 miles. The data set used in this research mainly consists of the following four sources: (1) one-year detailed crash data obtained from Colorado State Patrol; (2) highway geometric characteristic and pavement condition data obtained from Roadway Characteristic Inventory; (3) refined-scale (in 20-minute intervals) weather and surface condition data from Road Weather Information System; (4) real-time (in 2-minute intervals) traffic data detected by the traffic monitoring stations on I-25.
In previous studies, crash data are usually processed into relatively large time intervals. Such aggregation suffers from loss of the time-varying information and leads to estimation bias. To avoid these problems, the crash-related data are processed into relatively short time intervals (one hour) in the current study. The road segments are divided based on the location of traffic stations and further segmented according to the variation of geometric characteristics. For instance, if one of the main characteristics, such as speed limit, changes, the segment is divided into two new segments. In this way, this study developed 57 road segments, 29 of which are northbound and the others southbound. The crash data were mapped to roadway segments and processed into hourly records. Then they were matched with the traffic and weather data. The response variable resulted in four possible outcomes: (1) no accident; (2) SV accident; (3) MV accident; and (4) SV mixed with MV accident. However, only five out of 328,398 total observations ended with the fourth outcome (SV mixed with MV). Due to its scarcity, the fourth outcome does not warrant a standalone category in the mixed logit model. Besides, the SV mixed with MV accident resembles the MV accident more than the SV accident in terms of etiology. The resulting response variable is therefore defined with three categories: no accident, SV accidents, and MV accidents.
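The response coding described above can be illustrated with a minimal sketch. The function name, the category codes, and the sample records are hypothetical, and folding the rare "SV mixed with MV" outcome into the MV category is an assumption suggested by the etiology argument in the text, not a step the paper states explicitly.

```python
from collections import Counter

# Hypothetical category codes for the three-level response variable
NO_ACCIDENT, SV, MV = 0, 1, 2

def code_response(sv_count, mv_count):
    """Map crash counts in one segment-hour to the three-category response."""
    if mv_count > 0:
        # Assumption: hours with both SV and MV crashes are treated as MV
        return MV
    if sv_count > 0:
        return SV
    return NO_ACCIDENT

# Made-up segment-hour records: (segment_id, hour_index, sv_count, mv_count)
records = [
    (1, 0, 0, 0),
    (1, 1, 1, 0),
    (2, 0, 0, 2),
    (2, 1, 1, 1),  # SV mixed with MV
]
coded = [code_response(sv, mv) for _, _, sv, mv in records]
print(Counter(coded))
```

With real data, the tally of coded outcomes would show how dominant the no-accident category is among the hourly records.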
There are many detailed geometric variables in the dataset, including segment length (miles), number of lanes, number of merging ramps per lane per mile, rutting condition, curvature (degree), and inside shoulder width (feet). Some important traffic control information such as speed limit is also collected. There are five weather stations on the study stretch of I-25, which provide road surface and weather data at twenty-minute intervals. The weather data of each segment are taken from the closest station. There are more than 20 monitoring stations that are almost evenly distributed along the road section. The traffic speed and volume data recorded by these stations at 2-minute intervals are processed into hourly records. More details about how the data were processed can be found in a study by Chen et al. [30]. The descriptive statistics of response and explanatory variables are summarized in Table 1.
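As a rough illustration of the two processing steps mentioned above (hourly aggregation of the 2-minute traffic readings and matching each segment to its closest weather station), consider the following sketch. The function names, the volume-weighted speed average, and all numeric values are assumptions for illustration; the actual procedure is documented in Chen et al. [30].

```python
from collections import defaultdict

def aggregate_hourly(readings):
    """Aggregate (minute_of_day, vehicle_count, mean_speed) readings by hour.

    Returns {hour: (hourly_volume, volume_weighted_mean_speed)}.
    Weighting speed by volume is an assumption of this sketch.
    """
    volume = defaultdict(int)
    weighted_speed = defaultdict(float)
    for minute, count, speed in readings:
        hour = minute // 60
        volume[hour] += count
        weighted_speed[hour] += count * speed
    hourly = {}
    for hour, vol in volume.items():
        avg_speed = weighted_speed[hour] / vol if vol > 0 else None
        hourly[hour] = (vol, avg_speed)
    return hourly

def nearest_station(segment_mile, station_miles):
    """Return the mile marker of the station closest to the segment."""
    return min(station_miles, key=lambda m: abs(m - segment_mile))

# Made-up 2-minute readings and station locations
readings = [(0, 10, 60.0), (2, 30, 70.0), (60, 5, 50.0)]
hourly = aggregate_hourly(readings)
station = nearest_station(200.0, [190.0, 198.0, 205.0, 212.0, 220.0])
print(hourly, station)
```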

Methodology
In traditional crash prediction models, the effects of all the explanatory variables are assumed to be fixed across observations. Therefore, unobserved heterogeneity is ignored. To address this problem, this study adopts mixed logit models to examine the risk factors and their degree of influence on SV and MV accidents. The structure of the mixed logit model is specified in this section.
Since the data set used in this study is processed into a panel data structure with multiple hourly observations from the same roadway segment, the number of all the observations is expressed as $N$:

$$N = \sum_{s=1}^{S} N_s = \sum_{t=1}^{T} N_t, \tag{1}$$

where $N_s$ means the total number of observations in the site of $s$; $N_t$ means the total number of observations in the time period of $t$; $S$ and $T$ mean the number of segments and time periods, respectively. In contrast to previous studies that used cross-sectional data [31], this study adopts panel data structure and specifies the random parameter on road segment level.
Let $P_n(i)$ be the probability of crash category $i$ (no accident, single-vehicle accident, and multivehicle accident) which occurred on observation $n$:

$$P_n(i) = P\left(\beta_i X_n + \varepsilon_{in} \ge \beta_j X_n + \varepsilon_{jn}\right), \quad \forall j \in I,\ j \ne i, \tag{2}$$

where $n = 1, \ldots, N$, which means the observation on the $s$th road segment at the $t$th hour. $I$ is the set of all the possible crash categories which are mutually exclusive. $i$ and $j$ are different crash categories. $\beta_i$ and $\beta_j$ mean the parameter vectors of crash categories $i$ and $j$.
$X_n$ is the vector of all the contribution variables for the observation $n$, which have an influence on the possibilities of crash categories $i$ and $j$. $\varepsilon_{in}$ and $\varepsilon_{jn}$ are random components (also called error terms) that explain the unobserved effects on crash categories of the $s$th road segment at $t$th hour.
Assuming that $\varepsilon_{in}$ follows a type I extreme-value distribution [32], this results in a multinomial logit model which can be defined as

$$P_n(i) = \frac{\exp(\beta_i X_n)}{\sum_{j \in I} \exp(\beta_j X_n)}, \tag{3}$$

where the parameter $\beta_i$ can be estimated using the maximum likelihood method. The mixed logit model is introduced by relaxing the parameters of the multinomial logit model to be variable across hours $t$. The distribution of random parameter is specified as follows:

$$\beta_{ik} \sim N\left(\mu_{ik}, \sigma_{ik}^{2}\right), \quad \text{if } \beta_{ik} \text{ is a random parameter}, \tag{4}$$

where $k$ is the index of explanatory variables for the crash category of level $i$; $\beta_{ik}$ is the $k$th parameter in $\beta_i$ at crash level $i$; $N(\mu_{ik}, \sigma_{ik}^{2})$ means that $\beta_{ik}$ obeys a normal distribution which varies across different hours; $\mu_{ik}$ and $\sigma_{ik}$ are the mean and standard deviation of $\beta_{ik}$. In this case, the mixed logit model is specified on a panel data structure where multiple observations are nested within each segment under different hours. The resulting mixed logit model is given as follows:

$$P_n(i \mid \varphi) = \int \frac{\exp(\beta_i X_n)}{\sum_{j \in I} \exp(\beta_j X_n)} \, f(\beta \mid \varphi) \, d\beta, \tag{5}$$

where $f(\beta \mid \varphi)$ is the density function of $\beta$ with parameter vector $\varphi$. The likelihood function of mixed logit model was programmed using the NLMIXED procedure in SAS software. In previous studies [19,22], the normal distribution was found to best fit the data compared to other distributions, including lognormal, triangular, and uniform distribution. Therefore, only normal distribution is considered herein.
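To make the structure of the mixed logit probability more concrete, the sketch below approximates it by Monte Carlo simulation: normally distributed coefficients are drawn repeatedly, the multinomial logit probabilities are computed for each draw, and the results are averaged. This is a swapped-in illustration, not the SAS NLMIXED implementation used in the study. All function and variable names are hypothetical, the intercepts and the SV wet-surface coefficient are invented, and only the MV wet-surface mean and standard deviation are taken from the results reported later in the paper. For simplicity the sketch draws a coefficient per simulated replicate for a single observation and does not reproduce the panel structure.

```python
import math
import random

def mnl_probs(utilities):
    """Multinomial logit probabilities from a list of systematic utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_logit_probs(x, fixed, random_spec, draws=2000, seed=0):
    """Simulated mixed logit probabilities for one observation.

    x           : {variable: value}
    fixed       : {category: {variable: fixed coefficient}}
    random_spec : {(category, variable): (mean, sd)} for normal coefficients
    """
    rng = random.Random(seed)
    categories = list(fixed)
    sums = [0.0] * len(categories)
    for _ in range(draws):
        utilities = []
        for cat in categories:
            u = sum(b * x[v] for v, b in fixed[cat].items())
            for (c, v), (mu, sd) in random_spec.items():
                if c == cat:
                    u += rng.gauss(mu, sd) * x[v]
            utilities.append(u)
        for k, p in enumerate(mnl_probs(utilities)):
            sums[k] += p
    return [s / draws for s in sums]

# "none" is the base category, so its utility is fixed at zero.
fixed = {
    "none": {},
    "SV": {"const": -8.0, "wet": 0.5},  # invented values
    "MV": {"const": -7.0},              # invented value
}
random_spec = {("MV", "wet"): (-0.3303, 0.6076)}  # reported mean and sd
x = {"const": 1.0, "wet": 1.0}

probs = mixed_logit_probs(x, fixed, random_spec)
print(dict(zip(fixed, probs)))
```

Setting the standard deviation of every random coefficient to zero collapses the simulation to the ordinary multinomial logit model, which is a convenient sanity check.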

Results
In this study, the no accident category is chosen as the base category. Hence, the estimated parameters of the mixed logit model indicate the difference between the base category (no accident) and the corresponding category (SV accidents or MV accidents). To examine whether it is reasonable to divide the crash type into three categories, models with three and four crash categories are both estimated for comparison. Detailed model estimation results are summarized in Tables 2 and 3. Many risk factors from different aspects (road geometry, traffic status, and environmental characteristics) are shown to have a significant influence on SV and MV crash risk. AIC (Akaike information criterion) and BIC (Bayesian information criterion), which weigh model fit against model complexity, are used to compare the performance of the two models. The model with three crash categories has lower AIC and BIC than the model with four categories. This result provides some empirical evidence that three crash categories are preferable to four. Therefore, the following analysis is mainly based on the results shown in Table 2.
According to the model estimation results, three explanatory variables are found to be better treated as random parameters (significant at the 95% confidence level, with t-statistics of 2.29, 5.89, and 3.35, respectively). From Table 2, the parameter associated with the hourly traffic volume of the MV crash category is normally distributed with mean 0.8228 and standard deviation […] consistent with people's perceptions/experience and is in line with previous studies [16,33]. Besides, the random parameter indicates that the impact of hourly traffic volume on MV crash possibility differs across road segments. The same variable is not significant in the SV crash category, which means hourly traffic volume has no significant effect on single-vehicle crashes.
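The AIC/BIC comparison mentioned in this section follows the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to two candidate models. The log-likelihoods and parameter counts are invented placeholders, not the values behind Tables 2 and 3; only the number of observations comes from the data description.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

n_obs = 328_398  # total hourly observations reported in the data description

# Placeholder (log-likelihood, number of parameters) pairs for illustration
models = {
    "three categories": (-5000.0, 30),
    "four categories": (-4995.0, 45),
}

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in models.items()
}
for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.1f}, BIC={b:.1f}")
```

In this made-up case the four-category model fits slightly better in log-likelihood terms, but the penalty on its extra parameters leaves the three-category model with the lower AIC and BIC.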
The difference in significant parameters reveals that some essential differences do exist between single-vehicle crashes and multivehicle crashes. Inside shoulder width is also found to have a random effect across road segments for MV accidents. According to the results, the estimated mean of the parameter is not statistically significant, which may be considered a problem. Nevertheless, a recent study by Behnood and Mannering [34] pointed out that when the standard deviation of a random parameter is significant, the mean of the random parameter does not need to be significant as well. From Table 2, the parameter of the inside shoulder width of the MV crash category is normally distributed with mean 0.0318 and standard deviation 0.0472. The corresponding distribution is shown in Figure 2, which shows that 75% of the distribution is greater than zero and 25% is less than zero. This means that a wider inside shoulder is associated with a higher probability of MV accidents in 75% of the road segments and a lower probability of MV accidents in the remaining 25% of the road segments. This finding can possibly be explained by the tradeoff between forgiving geometric design and risky driving. On the one hand, a wider inside shoulder tolerates more driver errors. On the other hand, it is possible that when the inside shoulder exceeds a certain threshold, drivers may be more likely to take risky actions such as passing and speeding, according to risk compensation theory [35].
The parameter of the wet road surface of the MV crash category is normally distributed with mean −0.3303 and standard deviation 0.6076. As shown in Figure 3, 29% of the distribution is greater than zero and 71% is less than zero. When the road surface gets wet, nearly three-quarters of the hours are related to a lower likelihood of MV accidents, while the other hours are related to a higher risk of MV accidents. Moreover, wet road surface is positively correlated with SV accidents, which means that wet road surface usually leads to more SV accidents. This phenomenon may be caused by some unobserved heterogeneity in driver behavior. On a wet road surface, the skid resistance decreases, which increases crash risk. At the same time, a wet road surface often coincides with rainy conditions, when drivers tend to maintain a longer distance between vehicles. The crash risk is therefore the net result of increased driver attentiveness and reduced skid resistance. As a result, wet road surface has mixed effects on MV crashes.
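The shares quoted for the random parameters follow directly from the normal distribution: for a coefficient with mean μ and standard deviation σ, the proportion above zero is Φ(μ/σ), where Φ is the standard normal cumulative distribution function. A short check using the reported means and standard deviations (the function name is hypothetical):

```python
import math

def share_positive(mean, sd):
    """P(beta > 0) for beta ~ Normal(mean, sd^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

shoulder = share_positive(0.0318, 0.0472)  # inside shoulder width, MV
wet = share_positive(-0.3303, 0.6076)      # wet road surface, MV

print(f"inside shoulder width: {shoulder:.0%} positive")
print(f"wet road surface:      {wet:.0%} positive")
```

The computed shares round to the 75% and 29% figures cited in the text.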

Temporal Characteristics.
It can be seen from Table 2 that the temporal factors do not have a strong impact on the possibility of SV accidents. As for MV accidents, the results indicate that MV crashes are more likely to occur on weekends and less likely to occur between 4 a.m. and 5 a.m. Compared to other months, the likelihood of MV crashes is larger in November. This may be associated with some harmful impacts caused by the sudden storm and temperature change in November 2010 [30].

Traffic Characteristic.
A speed limit indicator is used to evaluate the effect of speed limit on traffic safety. This study uses a dummy indicator to express the speed limit: if the legal speed limit is lower than 55, the value equals 1; otherwise it equals 0. The results show that a low speed limit increases the possibility of multivehicle crashes. Some researchers also found that a low speed limit increases crash possibility [36], but they did not reveal its different effects on SV and MV accidents. The speed gap variable denotes the difference between the average speed and the speed limit, which can represent the congestion level to some extent. As shown in Table 2, the results indicate that both SV and MV crashes are more likely to occur when the speed gap gets larger. This finding is partially similar to those in some previous studies [17]. In addition, an increase in truck ratio decreases the likelihood of multivehicle crashes, which is in accordance with the conclusions drawn by Anastasopoulos and Mannering [37].
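The two traffic variables defined above can be written out as a small sketch. The function names are hypothetical, and the sign convention for the speed gap (speed limit minus average speed, so that heavier congestion gives a larger gap) is an assumption, since the text only describes it as the difference between the two quantities.

```python
def low_speed_limit_indicator(speed_limit):
    """Dummy variable: 1 if the legal speed limit is lower than 55, else 0."""
    return 1 if speed_limit < 55 else 0

def speed_gap(speed_limit, average_speed):
    """Speed gap; assumed here to be speed limit minus hourly average speed."""
    return speed_limit - average_speed

# Made-up hourly records: (speed_limit, average_speed)
hours = [(65, 62.0), (65, 40.0), (45, 30.0)]
features = [
    (low_speed_limit_indicator(limit), speed_gap(limit, avg))
    for limit, avg in hours
]
print(features)
```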

Road and Pavement Characteristic.
Several road characteristics are found to have significant influence on SV and MV crash risks. The length of road segment tends to increase the likelihood of both SV and MV crashes, which is consistent with the research by Venkataraman et al. [38]. More merging ramps per lane per mile will decrease the likelihood of multivehicle crashes, which may be attributed to the careful driving behavior on roads with more merging ramps. The same indicator has also been investigated in previous studies. Pei et al. [39] also found that an increase in the merging and diverging ramp number leads to fewer accidents, while some other researchers [37] reached the opposite conclusion. The difference in their conclusions may be due to their aggregate model structure, which does not consider the different mechanisms of SV and MV accidents. This inconsistency among past findings points to the very need to investigate SV and MV crashes separately and uncover their respective risk factors.
A similar result is also found for the curvature variable. In this study, the results imply that higher curvature is associated with higher multivehicle crash risk. Although some researchers found that the degree of curvature can be beneficial for traffic safety [17,39], other researchers found it positively correlated with the crash likelihood [36,40].
As for pavement conditions, it can be found that the possibility of multivehicle crashes will decrease on segments with longer remaining service life of rut. This is probably because drivers have a tendency to drive carefully on roads with deeper ruts and is consistent with past studies [30].

Environmental and Surface Characteristic.
Turning to surface characteristic, wet road surface and chemical wet road surface are both shown to be associated with increased possibility of single-vehicle crash. Chemical wet road surface leads to more multivehicle crashes, while the effect of wet road surface will change across road segments because of its random nature, which has been discussed above.
From Table 2, higher visibility is related to a decreased possibility of single-vehicle crashes. According to the results, other environmental characteristics, such as cross wind, temperature, and humidity, are not significant. This is plausible because the selected stretch of I-25 is relatively flat and spans the Denver metro area. On highways in mountainous terrain subject to complex weather, the environmental characteristics may have a significant influence on traffic safety.

Discussion and Conclusion
In this study, mixed logit models are developed to examine the difference between SV and MV accident probability using hourly based disaggregated crash data. One-year accident data, detailed traffic data, weather condition, road geometry, and surface condition data on I-25 in the state of Colorado were collected to establish a refined-scale panel data structure. The refined scale is used to capture information that is lost in aggregate data. Many risk factors are found to have varying effects on SV and MV accident probability.
These findings contribute to the literature on the risk factors that are associated with the different mechanisms of SV and MV crashes. Unlike previous research that models SV and MV accident frequencies separately or jointly, this study uses a mixed logit model to study the risk of both SV and MV crashes. Therefore, the results of this paper can provide guidance for developing more rational and effective segment management measures and accident prevention strategies. In addition, some findings are also helpful for evaluating and improving the design of existing transportation infrastructure.
The main conclusions associated with the risk factors and their different effects on SV and MV accident are summarized as follows.
(1) Speed gap, length of segment, and wet road surface are found to have significant effects on both SV and MV accident possibility. In addition to these indicators, most other variables, including weekends, November, low speed limit indicator, hourly traffic volume, truck percentage, and chemical wet road surface, are significant only for MV accidents. The visibility indicator is significant only for SV accidents.
(2) For I-25, the main influence factors of SV and MV accident possibility are traffic, road, and surface characteristics. Other temporal and environment characteristics like weekends, special period, and visibility also have certain effects on the possibility of SV and MV accidents, respectively.
(3) The model results indicate that hourly traffic volume, inside shoulder width, and wet road surface are random parameters with normal distribution for multivehicle crash probability. The impacts of these variables on MV accidents are thus shown to differ across road segments.
Without doubt, there are also some limitations in the present study. The conclusions drawn here are mainly based on data from part of I-25 and may not be applicable to other highways. In order to obtain more precise and general rules on the possibility of SV and MV accidents, further studies should be conducted on different types of highways.

Conflicts of Interest
The authors declare that they have no conflicts of interest.