This paper presents a model system that predicts the severity and duration of traffic accidents using an Ordered Probit model and a hazard model, respectively. The models are estimated using traffic accident data collected in Jilin province, China, in 2010. The developed models predict three severity indicators (the number of fatalities, the number of injuries, and property damage) as well as accident duration, and they identify the important influencing variables. For severity modeling, the results indicate that the Ordered Probit model fits better than the support vector classification (SVC) model. In addition, accident severity is shown to be an important determinant of duration: accidents with more fatalities and injuries last longer. The results can be applied to predicting accident severity and duration, which are two essential steps in the accident management process. By identifying the key influencing factors, this study also offers suggestions for the government to take effective measures to reduce accident impacts and improve traffic safety.
Traffic accidents are a significant source of deaths, injuries, and property damage, and a major concern for public health and traffic safety. Accidents are also a major cause of traffic congestion and delay. Effective accident management is crucial to mitigating accident impacts and to improving traffic safety and transportation system efficiency. As two major steps of the accident response program (shown in Figure
Accident response procedure.
To the authors’ knowledge, most previous studies examined accident severity and duration separately, although the two have been found to be correlated. Moreover, existing research investigated only one or two of the three aspects of accident severity, that is, the number of fatalities, the number of injuries, and property damage. Therefore, the present study aims to develop a model system that estimates both accident severity and duration. Furthermore, three accident severity indicators are defined, representing the number of fatalities, the number of injuries, and property damage, respectively. In doing so, we provide crucial information that emergency responders can use to take effective management measures.
The remainder of this paper is organized as follows. In Section
As two major factors in accident analysis, severity and duration have long been important research topics. Most previous studies examined either severity or duration alone. For example, with respect to severity analysis, Chang and Mannering [
Severity analysis mainly covers three aspects, that is, the number of fatalities, the number of injuries, and property damage. Most existing studies, however, treated severity as a single comprehensive indicator; for example, Mannera and Wünsch-Ziegler [
As mentioned above, most previous studies examined accident severity and duration separately, although the two have been found to be correlated. In addition, existing studies investigated only one or two of the three aspects of accident severity. Therefore, the present work aims to develop a model system that estimates both accident severity and duration, using three severity indicators that represent the number of fatalities, the number of injuries, and property damage, respectively.
The dataset for the study contains police-reported traffic accident records for Jilin province, China, in 2010. After records with missing values were eliminated, the final dataset consists of 3,914 cases. Of these, 1,280 (32.70%) were pedestrian-involved accidents and 387 (9.89%) were non-motor-vehicle-involved accidents. In addition to severity information, the data contain information on accident duration, accident characteristics (vehicle fire, crash type, accident occurrence time, and number of lanes affected), emergency services (police services, fire and rescue services, tow services, and emergency medical services), vehicle characteristics (vehicle type involved, debris involved, hazardous material involved, and disabled vehicles involved), environmental factors (weather conditions and visibility distance), and road conditions (number of lanes, pavement condition, road geometrics, roadway surface condition, etc.).
Based on a preliminary correlation test, 4 dependent variables and 26 candidate independent variables were selected from the dataset, as shown in Table
Variables and statistics based on survey data.
| Factors | Variables | Values | Percentage (%) |
|---|---|---|---|
| Accident severity | Number of fatalities: Nof | 0 | 89.59 |
| | | | 10.38 |
| | | More than 3 | 0.03 |
| | Number of injuries: Noi | 0 | 9.86 |
| | | [1, 3) | 85.89 |
| | | [3, 11) | 4.14 |
| | | Over 11 | 0.11 |
| | Property damage (yuan): Pd | Less than 1000 | 61.18 |
| | | [1001, 30000) | 37.19 |
| | | Over 30000 | 1.63 |
| Duration | Duration | Mean (min) | 192.95 |
| | | Standard deviation (min) | 111.63 |
| Accident characteristics | Motor-vehicle-only accident: Mvoa | Yes | 57.41 |
| | | No | 42.59 |
| | Vehicle fire: Vf | Yes | 8.93 |
| | | No | 91.07 |
| | Head-on type collision: Hotc | Yes | 8.93 |
| | | No | 91.07 |
| | Weekend or festival: Wof | Yes | 38.60 |
| | | No | 61.40 |
| | Rear-end type collision: Retc | Yes | 19.64 |
| | | No | 80.36 |
| | Vehicle rollover: Vr | Yes | 26.79 |
| | | No | 73.21 |
| | Time of day: Tod | [00:00, 6:00) | 6.24 |
| | | [6:00, 18:00) | 69.12 |
| | | [18:00, 24:00) | 24.64 |
| | Number of lanes blocked: Nolb | 0 | 3.57 |
| | | 1 | 62.50 |
| | | Over 1 | 33.93 |
| Vehicle characteristics | Bus involved: Bi | Yes | 16.07 |
| | | No | 83.93 |
| | Truck involved: Ti | Yes | 89.29 |
| | | No | 10.71 |
| | Debris involved: Di | Yes | 53.57 |
| | | No | 46.43 |
| | Hazardous material involved: Hmi | Yes | 1.79 |
| | | No | 98.21 |
| | Disabled vehicles involved: Dvi | Yes | 27.27 |
| | | No | 72.73 |
| Environmental factors | Weather conditions: Wc | Sunny | 89.48 |
| | | Fog | 0.23 |
| | | Sleet | 5.97 |
| | | Other | 4.32 |
| | Visibility distance (m): Vd | Less than 50 | 8.90 |
| | | [50, 100) | 22.70 |
| | | [100, 200) | 19.86 |
| | | Over 200 | 48.54 |
| Road environment factors | Number of lanes in each direction: Nol | 2 | 33.92 |
| | | 3 | 51.79 |
| | | Over 3 | 14.29 |
| | Pavement condition: Pc | Asphalt | 96.95 |
| | | Cement | 2.85 |
| | | Sand and gravel | 0.07 |
| | | Soil | 0.07 |
| | | Other | 0.06 |
| | Roadway surface condition: Rsc | Dry | 85.16 |
| | | Wet | 6.38 |
| | | Slippery (snowy or icy conditions) | 6.76 |
| | | Other | 1.70 |
| | Traffic signal control: Tsc | Yes | 17.46 |
| | | No | 82.54 |
| | Accident location (horizontal): Alh | Motor vehicle lanes | 71.68 |
| | | Bike lane | 6.60 |
| | | Shared motor vehicle and bike lane | 13.71 |
| | | Sidewalk | 2.22 |
| | | Crosswalk | 3.42 |
| | | Other | 2.37 |
| | Accident location (vertical): Alv | Regular road section | 60.01 |
| | | Four-way intersection | 20.43 |
| | | Other road sections (narrow carriageway and tunnel, etc.) | 1.09 |
| | | Other intersections | 18.47 |
| | Road geometrics: Rg | Flat and straight | 98.57 |
| | | Hill or bend | 1.43 |
| Emergency services | Police services: Ps | Yes | 71.43 |
| | | No | 28.57 |
| | Fire and rescue services: Frs | Yes | 16.07 |
| | | No | 83.93 |
| | Tow services: Ts | Yes | 98.21 |
| | | No | 1.79 |
| | Emergency medical services: Ems | Yes | 33.93 |
| | | No | 66.07 |
With
Accident severity and duration modeling framework.
Besides the Ordered Probit model [
As shown in Table
The Ordered multiple choice model assumes the relationship:
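In the usual latent-variable notation, an unobserved propensity is linear in the explanatory variables. The observed severity category is then determined by where that propensity falls relative to a set of ordered thresholds. The symbols $y_i^{*}$, $\mathbf{x}_i$, $\boldsymbol{\beta}$, $\varepsilon_i$, and $\mu_j$ are assumed here for illustration and are not taken from the original:

$$y_i^{*} = \boldsymbol{\beta}^{\top}\mathbf{x}_i + \varepsilon_i, \qquad y_i = j \;\text{ if }\; \mu_{j-1} < y_i^{*} \le \mu_j, \quad j = 1, \dots, J,$$

with $\mu_0 = -\infty$ and $\mu_J = +\infty$. For a severity indicator with $J$ categories, $J-1$ thresholds are estimated alongside $\boldsymbol{\beta}$.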
The Ordered Probit model, which assumes standard normal distribution for
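Under the standard normal assumption of the Ordered Probit model, the probability of category $j$ is $\Phi(\mu_j - \boldsymbol{\beta}^{\top}\mathbf{x}_i) - \Phi(\mu_{j-1} - \boldsymbol{\beta}^{\top}\mathbf{x}_i)$. Here $\boldsymbol{\beta}^{\top}\mathbf{x}_i$ is the linear index and $\mu_{j-1} < \mu_j$ are the estimated cut points (notation assumed here). The sketch below computes these probabilities in plain Python for a single accident, assuming the index and cut points are already known. The function names and the example numbers are hypothetical and are not outputs of the estimated models.

```python
from statistics import NormalDist


def ordered_probit_probs(xb, cut_points):
    """Category probabilities of an Ordered Probit model for one observation.

    xb:         linear index beta'x (no constant; the cut points absorb it).
    cut_points: strictly increasing thresholds mu_1 < ... < mu_{J-1}.
    Returns J probabilities, one per ordered category, summing to 1.
    """
    if any(a >= b for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut points must be strictly increasing")
    phi = NormalDist().cdf  # standard normal CDF
    bounds = [float("-inf"), *cut_points, float("inf")]
    # P(y = j) = Phi(mu_j - xb) - Phi(mu_{j-1} - xb)
    return [phi(upper - xb) - phi(lower - xb) for lower, upper in zip(bounds, bounds[1:])]


def most_likely_category(probs):
    """Index of the most probable category (0 = lowest severity level)."""
    return max(range(len(probs)), key=probs.__getitem__)


# Illustrative numbers only: a hypothetical index and two cut points.
probs = ordered_probit_probs(0.5, [0.63, 2.79])
print([round(p, 3) for p in probs], most_likely_category(probs))
```

Assigning each accident to its most probable category gives the predicted severity level that a hit-ratio comparison would use. A larger linear index shifts probability mass toward the higher severity categories.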
Support vector machine (SVM) is a type of learning algorithm based on statistical learning theory. It can be adjusted to map the input-output relationship of nonlinear systems [
Given a set of input-output data pairs
If the domain of output space
For classification about the training data
The SVC model was originally designed for binary classification. To perform multiclass classification, this paper employs the one-against-one method [
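The one-against-one scheme trains one binary classifier for every pair of classes, which gives $k(k-1)/2$ classifiers for $k$ classes. A new case is then labeled by majority vote. The plain-Python sketch below illustrates only this decomposition. `nearest_centroid` is a hypothetical stand-in for the binary SVC and is not an SVM. The tie-breaking rule (smallest label wins) is an assumption and is not necessarily the rule used by the software in this study.

```python
from collections import Counter
from itertools import combinations


def train_one_vs_one(X, y, train_binary):
    """Train one binary classifier per pair of classes (k(k-1)/2 in total).

    train_binary(X_pair, y_pair) must return a function that maps a feature
    vector to one of the two labels it was trained on.
    """
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, label in enumerate(y) if label in (a, b)]
        models[(a, b)] = train_binary([X[i] for i in idx], [y[i] for i in idx])
    return models


def predict_one_vs_one(models, x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(clf(x) for clf in models.values())
    best = max(votes.values())
    # Assumed tie-break: the smallest class label among the most-voted ones.
    return min(label for label, v in votes.items() if v == best)


def nearest_centroid(X_pair, y_pair):
    """Hypothetical stand-in binary learner (NOT an SVM): closest class mean wins."""
    centroids = {}
    for label in set(y_pair):
        rows = [x for x, t in zip(X_pair, y_pair) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def classify(x):
        return min(centroids, key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])))

    return classify


# Toy data: one feature, three ordered classes.
X = [[0.0], [0.2], [5.0], [5.2], [10.0], [10.3]]
y = [0, 0, 1, 1, 2, 2]
models = train_one_vs_one(X, y, nearest_centroid)
print(len(models), predict_one_vs_one(models, [5.1]))
```

Replacing `nearest_centroid` with any function that trains a binary SVC and returns a classifier leaves the voting logic unchanged.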
The Ordered Probit and SVM severity prediction models are estimated using Stata and Matlab, respectively. The estimation results as well as the prediction accuracies are shown in Table
Estimation results of severity prediction models.
| Variables | Fatality model: SVM | Fatality model: Ordered Probit coef. | Fatality model: Ordered Probit z-stat. | Injury model: SVM | Injury model: Ordered Probit coef. | Injury model: Ordered Probit z-stat. | Property damage model: SVM | Property damage model: Ordered Probit coef. | Property damage model: Ordered Probit z-stat. |
|---|---|---|---|---|---|---|---|---|---|
| Dvi | — | — | — | — | — | — | | 0.23 | 1.99 |
| Bi | | 0.75 | 8.82 | | 0.28 | 4.37 | — | — | — |
| Ti | | 0.64 | 7.93 | | 0.29 | 4.61 | — | — | — |
| Di | — | — | — | — | — | — | | 0.11 | 2.00 |
| Hmi | | 0.04 | 1.34 | | 0.04 | 1.65 | | 0.21 | 1.96 |
| Tod | | 0.12 | 2.94 | | 0.04 | 1.21 | — | — | — |
| Wc | | 0.12 | 2.3 | | −0.05 | −2.3 | | 0.04 | 2.46 |
| Vd | — | — | — | — | — | — | | 0.10 | 5.92 |
| Tsc | | 0.03 | 2.31 | | −0.02 | −1.56 | — | — | — |
| Alh | | −0.03 | −1.44 | | 0.04 | 2.71 | | −0.03 | −2.12 |
| Alv | | −0.11 | −6.88 | | 0.06 | 5 | | −0.13 | −12.16 |
| Rsc | — | — | — | | 0.11 | 4.03 | — | — | — |
| Rg | | −0.26 | −1.71 | — | — | — | | −0.18 | −2.01 |
| Pc | — | — | — | — | — | — | | −0.18 | −2.09 |
| Vf | | 0.73 | 7.70 | — | — | — | | 0.13 | 1.98 |
| Vr | | 0.04 | 1.32 | — | — | — | — | — | — |
| Mvoa | — | — | — | — | — | — | | 0.45 | 11.66 |
| Threshold 1 | — | 0.63 | — | — | −0.68 | — | — | 0.06 | — |
| Threshold 2 | — | 2.79 | — | — | 2.36 | — | — | 1.99 | — |
| Threshold 3 | — | — | — | — | 3.71 | — | — | — | — |
| Hit ratio (%) | 89.21 | 89.59 | | 86.50 | 86.89 | | 59.57 | 62.66 | |
The last row shows each model's hit ratio, that is, the percentage of correctly predicted cases. In general, a higher hit ratio indicates a better goodness-of-fit. The hit ratios of all the Ordered Probit models are higher than those of the corresponding SVM models. The Ordered Probit-based models are therefore chosen as the severity prediction models.
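The hit ratio compares each accident's predicted category with its observed category. A minimal sketch, assuming both are already available as category labels (for the Ordered Probit model, the predicted label would typically be the most probable category). The function name is illustrative.

```python
def hit_ratio(observed, predicted):
    """Percentage of cases whose predicted category equals the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for obs, pred in zip(observed, predicted) if obs == pred)
    return 100.0 * hits / len(observed)


# Toy example: 4 accidents, 3 predicted correctly.
print(hit_ratio([0, 0, 1, 2], [0, 1, 1, 2]))  # prints 75.0
```

A model that always predicts the most frequent category already reaches a hit ratio equal to that category's share of the sample. This makes the share a useful baseline when reading hit ratios for strongly unbalanced indicators.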
The results indicate that hazardous material involvement, weather, and accident location are retained in all three models. The property damage model gives hazardous material involvement a positive coefficient (0.21, z = 1.96), which indicates that it increases the probability of high property damage. A likely reason is that hazardous material raises the probability of fire or even explosion, which causes heavy damage to vehicles and goods.
Some variables affect only one or two indicators. For example, bus involvement, truck involvement, time of day, and traffic signal control are crucial to the number of fatalities and injuries, and accidents involving buses or trucks tend to cause more fatalities and injuries. In addition, road geometrics, vehicle fire, and vehicle rollover are important for the number of fatalities, while roadway surface condition affects the number of injuries. The results also indicate that disabled vehicle involvement, debris involvement, visibility distance, pavement condition, and motor-vehicle-only accidents are significant for property damage. Accidents involving disabled vehicles or debris lead to more property damage. Likewise, accidents involving only motor vehicles cause more property damage than those involving pedestrians or non-motor vehicles.
As suggested by Nam and Mannering [
Let
Let
Thus, the hazard function can be further expressed as
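For reference, the standard definitions underlying hazard-based duration models are given below, in commonly used notation assumed here, with $T$ the accident duration:

$$F(t) = \Pr(T < t), \qquad f(t) = \frac{dF(t)}{dt}, \qquad S(t) = 1 - F(t),$$

$$h(t) = \lim_{\Delta \to 0} \frac{\Pr(t \le T < t + \Delta \mid T \ge t)}{\Delta} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}.$$

The hazard $h(t)$ is thus the rate at which accidents end at time $t$, given that they have lasted until $t$.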
The hazard distribution can be assumed to take one of many parametric forms, or it can be left nonparametric. Because the distribution of accident duration is unknown, a nonparametric method, the Kaplan-Meier (KM) product-limit estimator, is applied to explore covariate effects and the potential distribution.
As a nonparametric method, the KM estimator produces an empirical approximation of the survival and hazard functions but does not take covariate effects into account, so it serves mainly as an exploratory data analysis tool. Denoting the distinct failure times of individuals
The survival curves of accident duration estimated with the KM estimator are shown in Figure
Survival curve of accident duration.
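The product-limit computation can be sketched in a few lines of Python. The sketch assumes that each record carries a duration and a flag indicating whether the end of the accident was observed, with all durations treated as observed by default. The function name is illustrative and the example data are made up.

```python
from collections import Counter


def kaplan_meier(durations, observed=None):
    """Kaplan-Meier product-limit estimate of the survival function S(t).

    durations: accident durations (e.g. in minutes).
    observed:  1 if the end of the accident was observed, 0 if right-censored;
               defaults to all observed.
    Returns (t_j, S(t_j)) pairs at each distinct event time t_j, where
    S(t_j) = product over t_i <= t_j of (1 - d_i / n_i), with d_i the number of
    accidents ending at t_i and n_i the number still ongoing just before t_i.
    """
    if observed is None:
        observed = [1] * len(durations)
    if len(observed) != len(durations):
        raise ValueError("durations and observed must have the same length")
    events = Counter(t for t, e in zip(durations, observed) if e)
    curve, surv = [], 1.0
    for t in sorted(events):
        at_risk = sum(1 for u in durations if u >= t)  # n_j
        surv *= 1.0 - events[t] / at_risk
        curve.append((t, surv))
    return curve


print(kaplan_meier([20, 60, 60, 120, 200]))
```

The returned pairs, plotted as a step function, form an empirical survival curve. Because no distributional form is assumed, the shape of the curve can suggest which parametric hazard to try next.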
The accelerated failure time (AFT) model permits the covariates to affect the duration dependence. Its survival function is given as
The AFT model can be expressed as a log-linear model:
Assuming that the random error
Assuming that the random error in (
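A common way to write the log-linear form and the two error assumptions is shown below. The notation follows standard survival-analysis texts and is assumed here:

$$\ln T_i = \boldsymbol{\beta}^{\top}\mathbf{X}_i + \sigma\varepsilon_i .$$

If $\varepsilon_i$ follows the standard extreme-value (Gumbel minimum) distribution, $T_i$ is Weibull distributed with shape parameter $p = 1/\sigma$ and survival function

$$S(t \mid \mathbf{X}_i) = \exp\!\left[-\left(t\,e^{-\boldsymbol{\beta}^{\top}\mathbf{X}_i}\right)^{p}\right].$$

Fixing $\sigma = 1$ (so $p = 1$) gives the exponential model, whose hazard is constant over time. Because $T_i = e^{\boldsymbol{\beta}^{\top}\mathbf{X}_i}\,e^{\sigma\varepsilon_i}$, switching a 0/1 covariate from 0 to 1 multiplies every duration quantile by $e^{\beta_k}$.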
Estimation results of accident duration model.
| Variables | Weibull distribution: coef. | Weibull distribution: z-stat. | Exponential distribution: coef. | Exponential distribution: z-stat. |
|---|---|---|---|---|
| Constant | 5.12 | 13.99 | 4.71 | 11.76 |
| Nof | 0.51 | 4.14 | 0.51 | 1.43 |
| Noi | 0.33 | 4.45 | 0.34 | 1.28 |
| Pd | −0.13 | −1.01 | — | — |
| Retc | −0.37 | −2.62 | — | — |
| Vr | 0.28 | 2.06 | — | — |
| Nolb | 0.24 | 4.48 | 0.25 | 1.64 |
| Bi | 0.60 | 4.01 | 0.41 | 1.07 |
| Ti | 0.58 | 3.12 | — | — |
| Di | 0.55 | 5.28 | 0.45 | 1.35 |
| Hmi | 0.88 | 2.89 | — | — |
| Wof | −0.14 | −1.49 | — | — |
| Alv | −0.57 | −4.55 | −0.43 | −1.06 |
| Nol | −0.18 | −2.81 | — | — |
| Ts | 0.38 | 1.35 | — | — |
| | 0.26 | — | — | — |
| Prob | 0 | | 0.0067 | |
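The mean absolute percentage error (MAPE) measures the average percentage difference between predicted and observed values. It is adopted here to examine the accuracy of the developed duration prediction model. MAPE is calculated as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|,$$

where $n$ is the number of accidents, $y_i$ is the observed duration of accident $i$, and $\hat{y}_i$ is its predicted duration.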
The MAPE of the Weibull model (0.22) is lower than that of the exponential model (0.23). This indicates that the durations predicted by the AFT model with the Weibull distribution are closer to the actual accident durations [
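The MAPE comparison can be reproduced with a few lines of Python. The following is a minimal sketch that returns the error as a fraction, matching the 0.22 and 0.23 values reported here. The function name is illustrative.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned as a fraction (0.22 means 22%)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    if any(y == 0 for y in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((y_hat - y) / y) for y, y_hat in zip(observed, predicted)) / len(observed)


# Toy durations in minutes: each prediction is 10% off.
print(mape([100, 200], [110, 180]))
```

Each error is divided by the observed duration, so an absolute error of a given size weighs more heavily for a short accident than for a long one.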
Most of the estimation results are consistent with theoretical expectations. The accident severity variables significantly affect accident duration: the more fatalities and injuries an accident causes, the longer it lasts. This supports combining the predictions of accident severity and duration in one model system. Accident type is also crucial to duration. Because the AFT model is log-linear in duration, a 0/1 variable with coefficient β multiplies the expected duration by exp(β). Compared with other types of accidents, rear-end collisions are therefore about 31% shorter (exp(−0.37) ≈ 0.69), while rollovers are about 32% longer (exp(0.28) ≈ 1.32). Likewise, accidents involving a bus, a truck, debris, or hazardous material last about 82%, 79%, 73%, or 141% longer, respectively, than other accidents. Accidents occurring on weekends or festivals are associated with shorter durations, possibly because traffic volume is lower on nonworking days than on working days. As for accident location, accidents occurring on regular road sections or at four-way intersections last longer than those at other locations, which may be because traffic volume is higher at these locations. Regarding emergency services, accidents that require tow services last longer. Moreover, duration increases with the number of lanes blocked by the accident.
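In a log-linear AFT model, a 0/1 covariate with coefficient β multiplies the expected duration by exp(β), so the implied percentage change in duration is 100(exp(β) − 1). The sketch below applies this conversion to the Weibull coefficients of the binary covariates in the duration estimation table (Retc, Vr, Bi, Ti, Di, Hmi). The dictionary and helper names are illustrative.

```python
import math

# Weibull AFT coefficients of the 0/1 covariates, copied from the duration table.
WEIBULL_DUMMY_COEFS = {
    "Retc": -0.37,
    "Vr": 0.28,
    "Bi": 0.60,
    "Ti": 0.58,
    "Di": 0.55,
    "Hmi": 0.88,
}


def time_ratio(beta):
    """Multiplicative effect exp(beta) of a 0/1 covariate on duration."""
    return math.exp(beta)


def percent_change(beta):
    """Percentage change in duration implied by coefficient beta."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in WEIBULL_DUMMY_COEFS.items():
    print(f"{name}: time ratio {time_ratio(beta):.2f}, change {percent_change(beta):+.0f}%")
```

For small coefficients, exp(β) − 1 ≈ β, so reading a coefficient directly as a percentage is only a rough approximation. The gap grows with |β| and is large for a coefficient such as 0.88.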
The survival curve of accident duration estimated with the duration model is shown in Figure
Goodness of fit index and estimated distribution statistics of accident duration model.
Model statistics | Mean (min) | Standard deviation (min) | Maximum (min) | Minimum (min) | MAPE |
---|---|---|---|---|---|
Observed value | 192.95 | 111.63 | 510 | 20 | — |
Predicted value | 188.38 | 84.52 | 327.14 | 53.03 | 0.22 |
The estimated survival curve of accident duration.
In this paper, a severity prediction model system was constructed using Ordered Probit models, and a duration prediction model was established using a hazard model. These models were used to forecast accident severity (the number of fatalities, the number of injuries, and property damage) as well as accident duration.
The results can be applied to severity and duration prediction, which are essential steps in the accident response process. By comparing the SVM and Ordered Probit models, the study also makes a methodological contribution to improving the prediction accuracy of severity estimation. In addition, by identifying the key effects of related factors on accident severity and duration, the results provide useful clues for the government to take effective measures to reduce accident impacts and improve traffic safety.
One limitation of the current study is that it omits some factors that may affect accident severity and duration, because suitable data were lacking. These include the characteristics of drivers, passengers, and pedestrians, as well as traffic conditions. Further studies should collect the related information and investigate the impacts of these factors.
The research is funded by the National Natural Science Foundation of China (50908099) and China Postdoctoral Science Special Foundation (201104526).