This paper compares two modeling techniques, Bayesian networks and Regression models, by applying both to accident severity analysis. Three severity indicators, namely, number of fatalities, number of injuries, and property damage, are investigated with the two methods, and the major contributing factors and their effects are identified. The results indicate that the Bayesian network achieves a higher goodness of fit than the Regression models in accident severity modeling. This finding can improve the accuracy of accident severity prediction, which is one of the essential steps in the accident management process. By identifying the key influences, this research also offers suggestions for government measures to reduce accident impacts and improve traffic safety.
As a significant cause of deaths, injuries, and property loss, traffic accidents are a major concern for public health and traffic safety. According to statistics from the Ministry of Public Security of China, traffic crashes caused an annual average of 65,123 deaths and 255,540 people injured in China between 2009 and 2011 (China Statistical Yearbook of Road Traffic Accidents, 2009–2011). In the United States, the cost of medical care and productivity losses associated with motor vehicle crash injuries was reported to exceed $99 billion, or nearly $500 for each licensed driver (Centers for Disease Control and Prevention, 2010). As one of the major steps of accident management, accident severity prediction provides crucial information that helps emergency responders evaluate the severity level of accidents, estimate the potential impacts, and implement efficient accident management procedures.
In recent years, increased attention has been directed at accident severity prediction, for which Bayesian networks and Regression models are two widely used modeling techniques. However, to the authors’ knowledge, no study has presented a quantitative comparison of the two methods. Therefore, the present work models accident severity with both a Bayesian network and Regression models. The accuracies of the two methods are then compared, and the better one is selected for accident severity prediction. The accident severity analysis also identifies the risk factors and their effects.
The remainder of this paper is organized as follows. In Section
Regression analysis has been widely applied to accident severity prediction and to the identification of contributing factors. The most commonly used Regression models are the Logistic Regression model and the Ordered Probit model [
Some researchers carried out traffic accident analysis by employing Bayesian network. For instance, de Oña et al. [
Although previous works presented the advantages of adopting Bayesian networks in accident severity modeling, no study has conducted a quantitative comparison of Bayesian networks and Regression models. Therefore, both are applied to accident severity modeling in this work, and their accuracies are compared.
The data set for this work contains police-reported traffic accident records for Jilin province, China, in 2010. After records with missing values were eliminated, the final data set consists of 2,246 cases, all of which are accidents involving motor vehicles. In addition to severity information, the data contain information on accident characteristics (accident occurrence time and accident location), vehicle characteristics (vehicle type involved and vehicle condition), environmental factors (weather condition and visibility distance), and road conditions (pavement condition, road geometrics, roadway surface condition, etc.).
Previous studies [
Variables and statistics based on survey data.
| Factors | Variables | Values | Percentage (%) |
|---|---|---|---|
| Accident severity | Number of fatalities: Nof | 0 : 1 | 89.59 |
| | | ≥1 : 2 | 10.41 |
| | Number of injuries: Noi | 0 : 1 | 9.86 |
| | | 1 ≤ Noi < 3 : 2 | 85.89 |
| | | 3 ≤ Noi < 11 : 3 | 4.14 |
| | | ≥11 : 4 | 0.11 |
| | Property damage (Yuan): Pd | <1000 : 1 | 61.18 |
| | | 1000 ≤ Pd < 30000 : 2 | 37.19 |
| | | ≥30000 : 3 | 1.63 |
| Accident characteristics | Time of day: Tod | day | 69.12 |
| | | night | 30.88 |
| | Location-Motor vehicle lanes: L-Mvl | Yes: 1 | 71.68 |
| | | No: 2 | 28.32 |
| | Location-Crosswalk: L-C | Yes: 1 | 3.42 |
| | | No: 2 | 96.58 |
| | Location-Regular road section: L-Rrs | Yes: 1 | 60.01 |
| | | No: 2 | 39.99 |
| | Location-Intersection: L-I | Yes: 1 | 38.90 |
| | | No: 2 | 61.10 |
| Vehicle characteristics | Motorcycle involved: Mi | Yes: 1 | 16.97 |
| | | No: 2 | 83.03 |
| | Bus or truck involved: Bti | Yes: 1 | 95.30 |
| | | No: 2 | 4.70 |
| | Vehicle condition: Vc | Good: 1 | 73.79 |
| | | Poor: 2 | 26.21 |
| Environmental factors | Weather condition: Wc | Sunny: 1 | 89.48 |
| | | Other: 2 | 10.52 |
| | Visibility distance (meter): Vd | <50 : 1 | 8.90 |
| | | | 22.70 |
| | | | 19.86 |
| | | ≥200 : 4 | 48.54 |
| Roadway characteristics | Pavement condition: Pc | Asphalt or cement: 1 | 99.80 |
| | | Other: 2 | 0.20 |
| | Roadway surface condition: Rsc | Dry: 1 | 85.16 |
| | | Other: 2 | 14.84 |
| | Road geometrics: Rg | Flat and straight: 1 | 98.57 |
| | | Hill or bend: 2 | 1.43 |
| | Traffic signal control: Tsc | Yes: 1 | 17.46 |
| | | No: 2 | 82.54 |
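The coding scheme in the table above can be applied mechanically to raw accident records. The following is a minimal Python sketch, not the authors' preprocessing code. The field names `fatalities` and `property_damage_yuan` are hypothetical, and only the two indicators whose cut points are fully stated in this paper (Nof and Pd) are encoded.

```python
from bisect import bisect_right

# Ascending cut points taken from the variable table:
#   Nof: 0 -> level 1, >= 1 -> level 2
#   Pd (Yuan): < 1000 -> 1, 1000 to < 30000 -> 2, >= 30000 -> 3
NOF_CUTS = [1]
PD_CUTS = [1000, 30000]


def level(value, cuts):
    """Return the 1-based level of `value` given ascending cut points."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return bisect_right(cuts, value) + 1


def encode_record(record):
    """Map a raw record (hypothetical field names) to coded severity levels."""
    return {
        "Nof": level(record["fatalities"], NOF_CUTS),
        "Pd": level(record["property_damage_yuan"], PD_CUTS),
    }
```

For example, `encode_record({"fatalities": 0, "property_damage_yuan": 1500})` yields level 1 for Nof and level 2 for Pd.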
Over the last decade, Bayesian network has become a popular representation for encoding uncertain expert knowledge in expert systems. It has been applied to many fields, such as medicine, document classification, information retrieval, image processing, data fusion, and decision support systems [
Bayesian network is a graphical model representing random variables and their conditional dependencies. Figure
An example of Bayesian network.
In most cases, the graphical structure of a Bayesian network needs to be automatically learnt from the data. This learning process can be described as follows. Let a random variable
The posterior probability
In order to fully specify a Bayesian network, it is necessary to specify the conditional probability of each node upon its parent nodes in the network, given the structure
Since the number of possible structures grows exponentially as a function of the number of variables, it is computationally infeasible to find the most probable network structure, given the data, by exhaustively enumerating all possible network structures. Cooper and Herskovits [
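To make the greedy idea behind K2 concrete, the sketch below scores a node with the log of the Cooper-Herskovits metric and adds parents one at a time while the score improves. It is an illustrative Python sketch under stated assumptions, not the Full-BNT implementation used in this work: variables are assumed to be coded as integers starting at 0, a node ordering is supplied by the user, and the names `k2_log_score`, `k2_search`, and `max_parents` are hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log of the Cooper-Herskovits (K2) metric for one node.

    data    : list of dicts mapping variable name -> state in 0..arity-1
    child   : name of the node being scored
    parents : tuple of parent names
    arity   : dict mapping variable name -> number of states
    """
    r = arity[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    for config, total in n_ij.items():
        # log[(r - 1)! / (N_ij + r - 1)!]
        score += lgamma(r) - lgamma(total + r)
        # log of the product over states k of N_ijk!
        score += sum(lgamma(n_ijk[(config, k)] + 1) for k in range(r))
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedy search: parents are chosen only among predecessors in `order`."""
    structure = {}
    for idx, node in enumerate(order):
        parents = ()
        best = k2_log_score(data, node, parents, arity)
        candidates = set(order[:idx])
        while len(parents) < max_parents and candidates:
            scored = [(k2_log_score(data, node, parents + (c,), arity), c)
                      for c in sorted(candidates)]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no candidate improves the score
            parents, best = parents + (top,), top_score
            candidates.remove(top)
        structure[node] = parents
    return structure
```

Because each node only considers earlier nodes in the ordering as parents, the search never creates a cycle, and the cost grows polynomially rather than with the number of possible structures.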
The structure of the severity prediction Bayesian network is learned by employing the K2 algorithm and the Full-BNT toolbox, which is an open-source Matlab package for directed graphical models [
The structure of the Bayesian network.
Based on the developed structure, the parameters are learned by Bayesian estimation. The prior distribution of the parameters of each variable is assumed to be a Dirichlet distribution, a conjugate prior that yields a closed-form posterior distribution for the parameters and a closed-form solution for prediction. The Full-BNT toolbox for Matlab is used to implement the Bayesian estimation.
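The closed form that conjugacy provides can be illustrated with a short Python sketch. It assumes a symmetric Dirichlet prior with a single hyperparameter `alpha` (the value used in this study is not stated, so `alpha=1.0` is an assumption), integer-coded states starting at 0, and a hypothetical function name `estimate_cpt`. It is not the Full-BNT code.

```python
from collections import Counter
from itertools import product


def estimate_cpt(data, child, parents, arity, alpha=1.0):
    """Posterior-mean conditional probability table under a symmetric
    Dirichlet(alpha) prior.

    Returns {parent_config: [P(child = k | config) for k in range(r)]}.
    """
    r = arity[child]
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    cpt = {}
    for config in product(*(range(arity[p]) for p in parents)):
        n = [counts[(config, k)] for k in range(r)]
        total = sum(n)
        # Posterior mean: (N_ijk + alpha) / (N_ij + r * alpha)
        cpt[config] = [(n_k + alpha) / (total + r * alpha) for n_k in n]
    return cpt
```

Under this assumption, a parent configuration with no observations falls back to the prior mean, and estimates for well-populated configurations approach the observed relative frequencies.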
Because the parent nodes gather the impacts of the indirect nodes and deliver them to the child nodes, the analysis focuses on the influence of the parent nodes, that is, the factors that have a direct edge to the severity indicators in this structure. The parameter learning results for number of fatalities (Nof), number of injuries (Noi), and property damage (Pd) under the impact of their parent nodes are shown in Tables
Parameter learning results of the fatality forecasting model.
No. | L-Rrs | Vc | Noi | Nof ≥ 1: Bayesian | Nof ≥ 1: Test | Nof ≥ 1: Absolute error | Nof ≥ 1: Relative error | Nof = 0: Bayesian | Nof = 0: Test | Nof = 0: Absolute error | Nof = 0: Relative error | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0.0052 | 0.0000 | 0.0052 | 1.0000 | 0.9948 | 1.0000 | 0.0052 | 0.0052 | 1123 |
2 | 1 | 1 | 2 | 0.9995 | 1.0000 | 0.0005 | 0.0005 | 0.0005 | 0.0000 | 0.0005 | 1.0000 | |
3 | 1 | 1 | 3 | 0.6622 | 0.6667 | 0.0045 | 0.0068 | 0.3378 | 0.3333 | 0.0045 | 0.0133 | |
4 | 1 | 2 | 1 | 0.2628 | 0.2626 | 0.0002 | 0.0008 | 0.7372 | 0.7374 | 0.0002 | 0.0003 | |
5 | 1 | 2 | 2 | 0.9855 | 0.9855 | 0.0000 | 0.0000 | 0.0145 | 0.0145 | 0.0000 | 0.0000 | |
6 | 1 | 2 | 3 | 0.9994 | 1.0000 | 0.0006 | 0.0006 | 0.0006 | 0.0000 | 0.0006 | 1.0000 | |
7 | 2 | 1 | 1 | 0.2012 | 0.2000 | 0.0012 | 0.0060 | 0.7988 | 0.8000 | 0.0012 | 0.0015 | |
8 | 2 | 1 | 2 | 0.9369 | 0.9375 | 0.0006 | 0.0006 | 0.0631 | 0.0625 | 0.0006 | 0.0095 | |
9 | 2 | 1 | 3 | 0.5000 | 0.0000 | 0.5000 | 1.0000 | 0.5000 | 0.0000 | 0.5000 | 1.0000 | |
10 | 2 | 2 | 1 | 0.2601 | 0.2601 | 0.0000 | 0.0000 | 0.7399 | 0.7399 | 0.0000 | 0.0000 | |
11 | 2 | 2 | 2 | 0.9575 | 0.9575 | 0.0000 | 0.0000 | 0.0425 | 0.0425 | 0.0000 | 0.0000 | |
12 | 2 | 2 | 3 | 0.9087 | 0.9091 | 0.0004 | 0.0004 | 0.0913 | 0.0909 | 0.0004 | 0.0044 |
Parameter learning results of the injury forecasting model.
| | | No. | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| Variables | | Bti | 1 | 1 | 2 | 2 |
| | | Vc | 1 | 2 | 1 | 2 |
| Estimation results | Noi = 0 | Bayesian | 0.1299 | 0.0907 | 0.2444 | 0.1813 |
| | | Test | 0.1295 | 0.0906 | 0.2439 | 0.1812 |
| | | Absolute error | 0.0004 | 0.0001 | 0.0005 | 0.0001 |
| | | Relative error | 0.0031 | 0.0011 | 0.0020 | 0.0006 |
| | 1 ≤ Noi < 3 | Bayesian | 0.8552 | 0.8643 | 0.7293 | 0.7542 |
| | | Test | 0.8561 | 0.8644 | 0.7317 | 0.7544 |
| | | Absolute error | 0.0009 | 0.0001 | 0.0024 | 0.0002 |
| | | Relative error | 0.0011 | 0.0001 | 0.0033 | 0.0003 |
| | Noi ≥ 3 | Bayesian | 0.0150 | 0.0450 | 0.0263 | 0.0646 |
| | | Test | 0.0144 | 0.0450 | 0.0244 | 0.0645 |
| | | Absolute error | 0.0006 | 0.0000 | 0.0019 | 0.0001 |
| | | Relative error | 0.04 | 0.0000 | 0.0722 | 0.0015 |
| | | | 1123 | | | |
Parameter learning results of the property damage forecasting model.
| | | No. | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| Variables | | L-Rrs | 1 | 1 | 2 | 2 |
| | | Vc | 1 | 2 | 1 | 2 |
| Estimation results | Pd < 1000 | Bayesian | 0.7905 | 0.6802 | 0.4402 | 0.4641 |
| | | Test | 0.7917 | 0.6803 | 0.4405 | 0.4641 |
| | | Absolute error | 0.0012 | 0.0001 | 0.0003 | 0.0000 |
| | | Relative error | 0.0015 | 0.0001 | 0.0006 | 0.0000 |
| | 1000 ≤ Pd < 30000 | Bayesian | 0.1983 | 0.3072 | 0.5470 | 0.5093 |
| | | Test | 0.1979 | 0.3072 | 0.5476 | 0.5093 |
| | | Absolute error | 0.0004 | 0.0000 | 0.0006 | 0.0000 |
| | | Relative error | 0.0019 | 0.0000 | 0.0011 | 0.0000 |
| | Pd ≥ 30000 | Bayesian | 0.0113 | 0.0126 | 0.0129 | 0.0266 |
| | | Test | 0.0104 | 0.0125 | 0.0119 | 0.0266 |
| | | Absolute error | 0.0009 | 0.0001 | 0.0010 | 0.0000 |
| | | Relative error | 0.0782 | 0.0048 | 0.0772 | 0.0002 |
| | | | 1123 | | | |
The mean absolute percentage error (MAPE), which measures the average percentage difference between predicted and observed values, is calculated as
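For readers who want to compute this measure themselves, the following Python sketch assumes the standard definition of MAPE: the mean over all cases of the absolute difference between observed and predicted values divided by the observed value. Whether this matches the exact variant used in this study is an assumption, and applying it to the aggregated table entries is not expected to reproduce the reported MAPE values. The function name `mape` is hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        # The standard definition divides by the observed value.
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)
```

The explicit check for zero observed values reflects a known weakness of the measure: it cannot be evaluated for cases whose observed value is zero unless a different denominator is chosen.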
The MAPE value of the fatality forecasting model is 0.0226, and the
According to the developed structure, the parent nodes of Nof are L-Rrs, Vc, and Noi. The estimation results indicate that the probability of a fatal accident increases as the condition of the involved vehicle gets worse. Moreover, a higher number of deaths is associated with a higher number of injuries. Accidents that occur on regular road sections, rather than on irregular sections or at intersections, tend to cause more fatalities. The reason may be that vehicles usually slow down when passing through intersections or irregular road sections.
The MAPE value of the injury forecasting model is 0.0013, and the
Two parent nodes, namely, Bti and Vc, have direct impacts on the number of injuries in an accident. The estimation results show that accidents involving a bus or truck tend to cause more injuries. In addition, the worse the vehicle condition, the more injuries the accident causes.
The MAPE value of the property damage forecasting model is 0.0019, and the
Two parent nodes, that is, L-Rrs and Vc, have a direct impact on property damage. The results indicate that, as with the influence of Vc on Nof and Noi, poor vehicle condition is associated with a large amount of property damage and vice versa. In addition, accidents that occur on irregular road sections or at intersections tend to cause a large amount of property damage. Combining the effects of L-Rrs on Pd and Nof, it can be deduced that accidents on regular road sections tend to result in a high number of deaths but a small amount of property damage.
The most commonly used Regression models in traffic injury analysis are the Logistic Regression model and the Ordered Probit model [
As one of the Binomial choice models, Binary Logit model is commonly used in discrete choice modeling. According to the random utility theory [
Here
Assuming
The Ordered multiple choice model assumes the relationship
The Ordered Probit model, which assumes standard normal distribution for
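As a concrete illustration of the two model families, the Python sketch below evaluates the choice probabilities of a binary logit model and an ordered probit model in their textbook forms. It is not the SAS procedure used in this work, and the function names, the argument layout, and the threshold values in the usage note are hypothetical.

```python
from math import erf, exp, sqrt


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binary_logit_prob(x, beta, intercept=0.0):
    """P(y = 1 | x) for a binary logit model with linear utility."""
    v = intercept + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + exp(-v))


def ordered_probit_probs(x, beta, thresholds):
    """Category probabilities for an ordered probit model.

    `thresholds` must be strictly ascending; K thresholds give K + 1
    ordered categories.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    v = sum(b * xi for b, xi in zip(beta, x))
    # P(y <= k) = Phi(threshold_k - v); pad with 0 and 1 at the ends.
    cdf = [0.0] + [std_normal_cdf(t - v) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]
```

For instance, `ordered_probit_probs([0.0], [1.0], [-1.0, 1.0])` returns three probabilities that sum to one, with roughly 68% of the mass in the middle category.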
By using logistic and probit procedure in SAS [
Estimation results of the Regression models.
Variables | Fatality forecasting model | Injury forecasting model | Property damage forecasting model | |||
---|---|---|---|---|---|---|
Coef. |
|
Coef. |
|
Coef. |
|
|
Constant | −2.57 | −12.53 | ||||
Mi | 0.44 | 3.36 | −0.14 | −2.50 | −0.09 | −1.85 |
Bi | 1.11 | 8.82 | −0.30 | −4.99 | ||
Wc | −0.27 | −1.66 | 0.23 | 2.70 | −0.12 | −1.85 |
Tod | −0.44 | −4.03 | −0.15 | −3.29 | ||
Vd | 0.10 | 4.88 | ||||
Pc | −0.38 | −5.24 | ||||
Tsc | 0.22 | 2.06 | −0.07 | −1.57 | 0.11 | 2.76 |
L-Mvl | −0.08 | −1.57 | ||||
L-C | −0.52 | −1.39 | −0.40 | −3.51 | ||
L-Rrs | 0.73 | 5.98 | 0.72 | 4.07 | ||
L-I | 0.23 | 4.94 | 0.20 | 1.10 | ||
|
−1.50 | 0.80 | ||||
|
1.47 | 2.77 | ||||
MAPE | 0.0530 | 0.0415 | 0.0698 | |||
|
84.65 | 80.20 | 60.23 |
By comparing the test results of MAPE and
Besides goodness of fit, the Bayesian network and the Regression models also differ in how they treat interactions between the variables in the model. In a Bayesian network, indirect nodes (or variables) that are related to the dependent variable affect their own child nodes first, and the impacts are then passed along the related edges and nodes until they reach the dependent variable [
Impact of L-C on accident severity.
| Accident severity | Level | At crosswalk | Not at crosswalk |
|---|---|---|---|
| Noi | Noi = 0 | 0.0578 | 0.0881 |
| | 1 ≤ Noi < 3 | 0.8864 | 0.8883 |
| | Noi ≥ 3 | 0.0558 | 0.0237 |
| Nof | Nof > 0 | 0.9080 | 0.8990 |
| | Nof = 0 | 0.0920 | 0.1010 |
| Pd | Pd < 1000 | 0.7484 | 0.7313 |
| | 1000 ≤ Pd < 30000 | 0.2439 | 0.2614 |
| | Pd ≥ 30000 | 0.0077 | 0.0074 |
Comparison of Bayesian network and Regression models with respect to the interactions between variables.
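The way an indirect node's influence travels through intermediate nodes to a dependent variable can be shown on the smallest possible case, a chain X -> Y -> Z. The Python sketch below sums out the intermediate node Y to obtain P(Z | X). The chain, the function name `marginalize_chain`, and the probabilities in the usage note are hypothetical and are not taken from the network learned in this study.

```python
def marginalize_chain(p_y_given_x, p_z_given_y):
    """P(Z | X) for a chain X -> Y -> Z, obtained by summing out Y.

    p_y_given_x[x][y] = P(Y = y | X = x)
    p_z_given_y[y][z] = P(Z = z | Y = y)
    """
    n_z = len(p_z_given_y[0])
    return [
        [sum(row_x[y] * p_z_given_y[y][z] for y in range(len(row_x)))
         for z in range(n_z)]
        for row_x in p_y_given_x
    ]
```

With the hypothetical tables `[[0.9, 0.1], [0.4, 0.6]]` for P(Y | X) and `[[0.8, 0.2], [0.3, 0.7]]` for P(Z | Y), the distribution of Z shifts from (0.75, 0.25) to (0.50, 0.50) as X changes state, even though X has no direct edge to Z.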
Moreover, in Regression models, two independent variables cannot be included in the same model if they are related to each other, so some influences between variables are lost. Regression models also fail to represent the impact of one dependent variable on another and the interaction between independent variables, such as the impact of Noi on Nof and the effect of Rsc on Bti in this study, respectively, both of which can be represented by the Bayesian network shown in Figure
Furthermore, as mentioned above, most of the Regression models have their own assumptions and predefined underlying relationships between dependent and independent variables (i.e., linear relations between the variables or independence between variables) [
These characteristics of the Bayesian network suggest that, compared with Regression models, it is better suited to accident severity analysis.
In this paper, two modeling techniques, that is, Bayesian networks and Regression models, are investigated in accident severity modeling. The goodness of fit of the two methods is compared according to the test results, and the differences between the two methods are analyzed. The results suggest that, compared with Regression models, the Bayesian network is more suitable for accident severity prediction.
The study results can be applied to predicting traffic accident severity and to identifying the key effects of contributing factors on accident severity. By comparing the Bayesian network and Regression models, this work also makes a methodological contribution toward improving the accuracy of severity estimation.
It should be pointed out that both the structure and the parameters of the proposed Bayesian network will change when a certain number of newly reported cases are added to the data set. According to the study by Zhang [
One limitation of the current work is that some factors with potential effects on accident severity, such as driver characteristics and traffic conditions, are not considered because suitable data are lacking. Further study should examine the impacts of these factors on accident severity.
The research is funded by the National Natural Science Foundation of China (50908099 and 51078167) and the Doctoral Program of Higher Education of China (201104493).