Critical Factors Analysis of Severe Traffic Accidents Based on Bayesian Network in China

The purpose of this study is to minimize the negative influences of severe traffic accidents in China by analyzing in depth the complex coupling relations among the factors contributing to single-vehicle and multivehicle traffic accidents with a Bayesian network (BN) crash severity model. The BN model was established by taking the critical factors identified with the improved grey correlation analysis method as node variables. Severe traffic accident data collected from accident reports published in China were used to validate this model. The model's efficiency was validated objectively by comparing the conditional probability obtained by this model with the actual value. The result shows that the BN model can reflect the real relations among factors and can be seen as the target network for severe traffic accidents in China. Besides, based on BN's junction tree engine, five-factor combination sequences for the number of deaths and three-factor combination sequences for the number of injuries were ranked according to severity degree to reveal the critical causes and reduce the massive damage of traffic accidents.


Introduction
Severe traffic accidents occur randomly, regardless of time and space [1]. Mass casualties and high risk are the two main features that quickly differentiate severe traffic accidents from general accidents [2]. Besides, the enormous negative impacts of severe traffic accidents on public opinion and personal property security also deserve the attention of traffic administrations and traffic safety scholars [3][4][5]. However, studies of severe traffic accidents are lacking, both at home and abroad. Although several relevant studies and policies have been carried out because of the high frequency of severe traffic accidents in foreign countries [6], there are still deficiencies in understanding the critical contributing factors and mechanisms of extraordinarily severe traffic accidents in China [7,8]. Besides, the tools for early accident prevention and emergency rescue are imperfect, which weakens the prevention ability. Therefore, it is necessary to study severe traffic accidents in depth [9][10][11]. Exploring the occurrence law of serious accidents and taking effective prevention measures play an important role in reducing severity and improving road safety in China [12].
Drivers' behaviors were considered the main factors causing traffic accidents in early studies. Some scholars believe that drivers' illegal behaviors significantly impact road traffic safety [13] and that drivers themselves are related to accidents [14][15][16]. For example, Shinar in Israel has verified that the use of seat belts is positively correlated with the age and education level of the drivers [17]. Vehicle conditions, road conditions, environmental conditions, and an increasing number of social and economic factors have gradually been taken into account to study the impacts of fatal traffic accidents [18][19][20]. Researchers from Japan have evaluated the traffic safety of 46 prefectures in Japan and concluded that natural binding forces, such as social rules and social capital, could reduce dangerous driving behaviors [21]. Researchers from several European countries have deeply integrated police investigation data and accident reports and established an accident information collection and analysis system to analyze numerous traffic accidents in Europe [22][23][24][25][26]. It is found that the proportion of accidents caused by "vehicles driving off the roads" is up to 70% [27]. Peng and Boyle et al. have studied traffic accidents based on Washington's accident database and found, through the prediction of the logical model, that speeding, fatigue driving, distraction, and driving without seat belts affect the occurrence rate of accidents [28]. Theofilatos et al. have analyzed the influencing factors of road accidents in urban and suburban areas based on Greece's accident data and found that the influencing factors of road accidents in urban areas are the drivers' age, location of the intersection, and bicycle parking, while the main influencing factor of road accidents in suburban areas is the weather condition [7]. Aidoo et al. have studied the relationship between road condition and accident frequency and found that lighting conditions at night, road alignment, and weather conditions significantly affect traffic accidents [6,29]. Zhao and Deng have studied the characteristics and development trend of expressway accidents based on the annual statistical accident reports from 1995 to 2010. The results show that weather, region, time, and vehicle type contribute to traffic accidents [30].
The Bayesian network (BN) is widely used in sample learning methods, network structure construction, reasoning mechanism learning, and so on because of its powerful reasoning function [31]. In traffic accident safety analysis, BN is widely used to analyze the causes of maritime traffic accidents and road traffic accidents [32][33][34]. In algorithm solving, the genetic algorithm has been introduced into BN's incremental learning, which alleviates the local extremum problem in the searching process [35]. A loop deletion algorithm considering KL spacing is also used to learn the structure of BN, eliminating the dependency on node order in the modeling process [36]. When establishing the Bayesian network model, researchers have comprehensively considered the decision variables of the problem and the relationships among various factors. They have used the reasoning ability of BN to analyze the multiattribute decision-making problem in an uncertain environment [37].
Accident research moved from the initial single-factor analysis to multifactor analysis long ago [38,39]. However, several systematic reviews of the interaction among the influencing factors only consider the polymorphism of the consequences of accidents. Few in-depth discussions of the mechanism have been conducted with objective data [40,41]. This paper aims to identify the critical factors contributing to severe single-vehicle and multivehicle traffic accidents separately and to explore the inherent relationships among different factors based on objective data. Through a comprehensive comparison of these factors, some recommendations can be made for the construction of an active precaution system. Hence, an improved grey correlation analysis method and a BN traffic severity model were constructed in this paper. Firstly, the weighted grey relational degree was used to determine the critical factors contributing to single-vehicle and multivehicle traffic accidents, respectively. Secondly, the BN model was constructed, taking the critical factors as the nodes and the inherent correlations as the links. Thirdly, the sample data were used to train the network with the CH score learning theory, solved with the K2 algorithm. Finally, the conditional probability based on Bayesian estimation was used to validate the model's efficiency.

Data Sources.
The investigation and disposal reports of accidents in the production process have been published in China annually to record accidents accurately and in a timely manner; their transparency has been required since 2014. According to the property loss and casualties, four traffic accident categories are shown in Table 1. The standards and the collected accident data define this paper's research objects, namely, road traffic accidents with ten or more deaths, including serious and extremely serious traffic accidents. The data of 142 investigation reports were collected from the investigation and disposal reports of accidents in the production process from 2010 to 2016, available on the websites of the State and Provincial Work Safety administrations in China. Besides, traffic accidents were divided into two categories, single-vehicle accidents and multivehicle accidents, as their distributions of "occurrence time," "occurrence location," "vehicle type," and "accident characteristics" are quite different [42]. Table 2 shows the raw data of some samples.

Data Virtualization and Discretization for the Sample Set.
It is necessary to select the influencing factors before factor analysis to improve computing efficiency and highlight the correlation degree among factors. According to the 4M systematic theory principle, humans, facilities, environments, and management are regarded as the direct factors that play the dominant role in accident occurrence. In this paper, the surveyed reports are taken as research objects to sort out the critical accident data, which are the basis for the study of fatal traffic accidents in China and mainly include four aspects and 35 items.
According to the BN model construction requirements, classification and coding are needed to virtualize and discretize the nodes' attribute variables. Virtualization assigns a value to each attribute. Discretization maps the values of continuous variables to several mutually disjoint ranges. With reference to the modeling experience from the investigation and disposal reports of accidents in the production process, the assignment result of the node variables identified by the improved grey correlation is shown in Table 3.
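The virtualization and discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut points, the category labels, and the names `PERIOD_EDGES`, `WEATHER_CODES`, `discretize`, and `encode_record` are hypothetical and do not reproduce the coding of Table 3.

```python
from bisect import bisect_right

# Hypothetical cut points (hours of the day); the paper's own coding is in its Table 3.
# They define four disjoint ranges: [0, 6), [6, 12), [12, 18), [18, 24).
PERIOD_EDGES = [6, 12, 18]

# Hypothetical virtualization: an integer code for each category of a qualitative attribute.
WEATHER_CODES = {"good": 1, "bad": 2}


def discretize(value, edges):
    """Return the 1-based index of the disjoint range that contains `value`."""
    return bisect_right(edges, value) + 1


def encode_record(hour, weather):
    """Turn one raw accident record into discrete node states."""
    return {
        "period": discretize(hour, PERIOD_EDGES),
        "weather": WEATHER_CODES[weather],
    }


record = encode_record(13, "bad")
print(record)
```

Because the ranges are half-open, a value that falls exactly on a cut point is assigned to the upper range, so every value maps to exactly one state.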

Methodology
The basic idea of the traffic crash severity analysis model based on BN is as follows: first, the critical factors are determined with the improved grey correlation method for network construction; second, the potential interconnectivity among network nodes is clarified and expressed directly through the network graph, using the structure learning process based on the CH score adopted in this paper; third, node probability learning based on Bayesian estimation (BE) is used to validate the model's efficiency. The flowchart of this BN model is shown in Figure 1.
Journal of Advanced Transportation

The Improved Grey Correlation Analysis Method.
Grey correlation analysis is a comprehensive evaluation method based on grey theory that uses the correlation degree between a comparison sequence and a reference sequence to distinguish the evaluation objects. Traditional grey correlation analysis methods can be divided into three categories: Deng's grey relational analysis, absolute grey correlation analysis, and relative greyness analysis. Deviation maximization theory is applied to enhance the traditional grey correlation method and to overcome the single-perspective limitation of the traditional methods by assigning weights. The application of deviation maximization theory can be described as follows:
(i) Definition of comparison sequence and reference sequence. The accident factor set is defined as the comparison sequence $X_j = \{X_j(1), X_j(2), \ldots, X_j(N)\}$ $(j = 1, 2, \ldots, n)$, where $n$ is the number of factors. The accident frequency, death rate, and injury amount are defined as the accident description set $Y_i = \{Y_i(1), Y_i(2), \ldots, Y_i(N)\}$ $(i = 1, 2, 3)$.
(ii) Calculation of the weighted grey correlation degree:
$$\tau_{ij} = \alpha_{ij} r_{ij} + \beta_{ij} \varepsilon_{ij} + \gamma_{ij} \mu_{ij},$$
where $r_{ij}$, $\varepsilon_{ij}$, and $\mu_{ij}$ are Deng's correlation degree, the absolute greyness degree, and the relative greyness degree, and $\alpha_{ij}$, $\beta_{ij}$, $\gamma_{ij}$ are the weights of Deng's correlation degree, absolute greyness degree, and relative greyness degree, respectively.
(iii) Determination of weight coefficient.
The number of accidents $Y_1$, the number of deaths $Y_2$, and the number of injuries $Y_3$ are usually selected to describe a traffic accident [43]. Hence, these three indexes were used as the reference sequences in the improved grey correlation model for the accidents' features. Moreover, the factors in Table 2 were input into the model as the comparison sequences. The flowchart of critical factor identification is shown in Figure 2.

Table 1: Accident classification standard.

| Traffic accident degree | Division standard |
|---|---|
| Extremely serious accidents | Number of deaths ≥ 30, number of injuries ≥ 100, or direct property loss ≥ 100 million |
| Serious accidents | 10 ≤ number of deaths < 30, 50 ≤ number of injuries < 100, or 50 million ≤ direct property loss < 100 million |
| Larger accidents | 3 ≤ number of deaths < 10, 10 ≤ number of injuries < 50, or 10 million ≤ direct property loss < 50 million |
| General accidents | Number of deaths < 3, number of injuries < 10, or direct property loss < 10 million |
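The grey correlation step of this section can be illustrated with a short Python sketch. It computes Deng's grey relational degree with the commonly used distinguishing coefficient of 0.5 and then combines three degrees with weights. The mean normalization, the sample sequences, the weights, and the function names are assumptions made for illustration; the absolute and relative greyness degrees are taken as given inputs, and the deviation-maximization weighting of the paper is not reproduced.

```python
def mean_normalize(seq):
    """Scale a positive sequence by its mean so sequences become comparable."""
    mean = sum(seq) / len(seq)
    return [v / mean for v in seq]


def deng_degrees(reference, comparisons, rho=0.5):
    """Deng's grey relational degree of each comparison sequence to the reference.

    rho is the distinguishing coefficient (0.5 is the customary choice).
    """
    ref = mean_normalize(reference)
    deltas = [
        [abs(r - c) for r, c in zip(ref, mean_normalize(comp))]
        for comp in comparisons
    ]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every comparison sequence coincides with the reference
        return [1.0] * len(comparisons)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


def weighted_degree(r, eps, mu, alpha, beta, gamma):
    """Weighted grey correlation degree: alpha*r + beta*eps + gamma*mu."""
    return alpha * r + beta * eps + gamma * mu


# Hypothetical data: one reference sequence (e.g., yearly deaths) and two factors.
degrees = deng_degrees([1, 2, 3], [[2, 4, 6], [3, 2, 1]])
print(degrees)
print(weighted_degree(degrees[0], 0.8, 0.7, alpha=0.5, beta=0.3, gamma=0.2))
```

The first factor is proportional to the reference, so its degree is 1 after normalization, while the second factor moves in the opposite direction and receives a lower degree.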

BN Modeling.
The interdependence of multiple factors in severe traffic accidents can be studied graphically with BN based on probability theory. BN is mainly composed of the Directed Acyclic Graph (DAG) and the Conditional Probability Table (CPT). Utilizing the conditional relations among variables, the joint probability distribution can be factorized with BN to reduce the complexity. Supposing that the random variables represented by the nodes are $X = (X_i)_{i \in I}$, the joint probability is
$$P(X) = \prod_{i \in I} P\left(X_i \mid X_{\mathrm{node}(i)}\right),$$
where $X_{\mathrm{node}(i)}$ is the parent node of node $i$. With the probability value of the input variable (evidence variable), the probability distribution of the output variable (query variable) can be calculated according to the existing network structure and CPT. Therefore, the logical relationship between node variables in the network model is manifested in the propagation of conditional probability, which makes inference over the network possible.
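As a minimal sketch of the factorization and of evidence-to-query inference, the Python code below builds a three-node chain and computes a posterior by brute-force enumeration. The network, the states, and all CPT values are hypothetical, and enumeration is used here only because it is short; it is not the junction tree engine applied in this paper.

```python
from itertools import product

# Hypothetical chain: weather -> visibility -> severe. All numbers are illustrative.
PARENTS = {"weather": [], "visibility": ["weather"], "severe": ["visibility"]}
STATES = {
    "weather": ["bad", "good"],
    "visibility": ["low", "high"],
    "severe": ["yes", "no"],
}
CPT = {
    "weather": {(): {"bad": 0.3, "good": 0.7}},
    "visibility": {
        ("bad",): {"low": 0.6, "high": 0.4},
        ("good",): {"low": 0.1, "high": 0.9},
    },
    "severe": {
        ("low",): {"yes": 0.5, "no": 0.5},
        ("high",): {"yes": 0.2, "no": 0.8},
    },
}


def joint_probability(assignment):
    """Product over nodes of P(node | parents), i.e. the BN factorization."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p *= CPT[node][key][assignment[node]]
    return p


def posterior(query, evidence):
    """P(query | evidence), summing the joint over all consistent assignments."""
    nodes = list(PARENTS)
    totals = {state: 0.0 for state in STATES[query]}
    for values in product(*(STATES[n] for n in nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[query]] += joint_probability(assignment)
    norm = sum(totals.values())
    return {state: p / norm for state, p in totals.items()}


print(posterior("severe", {"weather": "bad"}))
```

Enumeration grows exponentially with the number of nodes, which is why engines such as the junction tree are preferred for networks of realistic size.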

Structure Learning Based on the CH Score Method and K2 Algorithm.
Structure learning is a data mining process that aims to clarify the potential interconnectivity among network nodes and express it directly through the network graph. The principle is to construct the network structure according to certain scoring criteria and search strategies. Although the optimal network structure is not always attainable, the accuracy, complexity, and robustness of the model can be evaluated thoroughly. The model is expressed as
$$\max_{N \in \Phi} f(N, D) \quad \text{s.t. } N \models C,$$
where $\Phi$ is the set of possible network structures; $f$ is the evaluation score; $N \models C$ represents that structure $N$ meets the constraint requirements $C$. Since the evaluation function used in this paper is Bayesian, the optimal network structure is
$$N^{*} = \arg\max_{N} P(N \mid D) = \arg\max_{N} \frac{P(D \mid N)\, P(N)}{P(D)},$$
where $P(N \mid D)$ is the posterior probability of structure $N$ under a given training data set $D$, and $P(N)$ is the corresponding prior probability. The iteration steps for network construction are as follows:
Step 1: The factors are selected as the initial network nodes.
Step 2: An empty network is provided, and a node sequence $c = \{x_1, x_2, \ldots, x_n\}$ is supposed.
Step 3: The score function is calculated, and the parent set is updated by connecting the nodes with the larger posterior probability.
Step 4: The number of parent nodes is judged. If $|\mathrm{node}(x_i)| < 2$, the search continues, giving priority to the other nodes without corresponding parent nodes. A candidate $x_j$ must maximize the new CH score function; if the new score improves on the current one, $x_j$ is selected as the new parent node; otherwise, the search stops.
Step 5: The node variables and the parent nodes are connected to form the directed edges of the network.
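The steps above can be sketched in Python as a greedy K2 search driven by the logarithm of the Cooper-Herskovits (CH) score with a uniform prior. This is an illustrative sketch rather than the Matlab implementation used in the paper: the function names, the integer state coding (0 to r-1), and the toy data set are assumptions, while the limit of two parents follows Step 4.

```python
from collections import Counter
from math import lgamma


def log_ch_score(data, child, parents, arities):
    """Log CH score of `child` given `parents`.

    Sums, over observed parent configurations j,
    log[(r-1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!).
    """
    r = arities[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    for config, total in n_ij.items():
        score += lgamma(r) - lgamma(total + r)
        score += sum(lgamma(n_ijk[(config, k)] + 1) for k in range(r))
    return score


def k2_search(data, order, arities, max_parents=2):
    """Greedy K2: add the best-scoring predecessor as a parent while the score improves."""
    parents = {node: [] for node in order}
    for pos, node in enumerate(order):
        best = log_ch_score(data, node, parents[node], arities)
        while len(parents[node]) < max_parents:
            candidates = [c for c in order[:pos] if c not in parents[node]]
            if not candidates:
                break
            new_score, new_parent = max(
                (log_ch_score(data, node, parents[node] + [c], arities), c)
                for c in candidates
            )
            if new_score <= best:  # no improvement: stop searching for this node
                break
            parents[node].append(new_parent)
            best = new_score
    return parents


# Toy data: variable 1 copies variable 0, variable 2 is independent of both.
data = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
print(k2_search(data, order=[0, 1, 2], arities=[2, 2, 2]))
# → {0: [], 1: [0], 2: []}
```

Working with log-gamma values avoids the overflow that the factorials in the CH score would cause for realistic sample sizes. The result depends on the node ordering supplied in `order`, which is the known limitation of K2.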

Node Probability Learning Based on BE.
Node probability learning searches for the parameter values through data mining when the network structure is known. The parameter learning method of this paper is BE, which can combine prior knowledge with the training data set to improve the model's accuracy. The fundamental mechanism is as follows.
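Supposing that the prior probability of the network parameters is $p(\lambda)$, this paper searches for the parameters with maximum posterior probability given the training data set $D = \{x_1, x_2, \ldots, x_n\}$. Then, the posterior probability is calculated as
$$p(\lambda \mid D) = \frac{p(D \mid \lambda)\, p(\lambda)}{p(D)}.$$
According to the law of total probability, $p(D) = \sum_{\lambda} p(D \mid \lambda)\, p(\lambda)$. Supposing that the samples are independent of each other, $p(D \mid \lambda) = \prod_{i=1}^{n} p(x_i \mid \lambda)$; then,
$$p(\lambda \mid D) = \frac{p(\lambda) \prod_{i=1}^{n} p(x_i \mid \lambda)}{\sum_{\lambda} p(\lambda) \prod_{i=1}^{n} p(x_i \mid \lambda)}.$$
Because of the conjugate nature of the Dirichlet distribution, the calculation complexity of this network model can be reduced significantly. Therefore, the Dirichlet distribution is usually used as the prior $p(\lambda)$ to improve efficiency.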
When the network structure is determined, the probability relation among variables can be described by the conditional probability. Supposing that the prior distribution of each node variable is a Dirichlet distribution, the Full-BNT toolbox in Matlab was used to learn the conditional probability under different contributing factors with the Bayesian estimation method. Then, the junction tree engine in Matlab was used to combine the factor links.
The model's effectiveness was validated by comparing the learning results with the actual results.
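To illustrate why the Dirichlet prior keeps the computation simple, the Python sketch below estimates one node's CPT from counts. It assumes a symmetric Dirichlet prior with hyperparameter `alpha` and reports the posterior mean, which is a common form of Bayesian estimation; the function name, the integer state coding, and the toy records are hypothetical, and the paper's own computation was carried out with the BNT toolbox.

```python
from collections import Counter


def estimate_cpt(data, child, parents, arities, alpha=1.0):
    """Posterior-mean CPT of `child` under a symmetric Dirichlet(alpha) prior.

    For each observed parent configuration j and child state k:
        theta_jk = (N_jk + alpha) / (N_j + alpha * r)
    States are assumed to be coded as integers 0 .. r-1.
    """
    r = arities[child]
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    configs = sorted({tuple(row[p] for p in parents) for row in data})
    cpt = {}
    for config in configs:
        total = sum(counts[(config, k)] for k in range(r))
        cpt[config] = [
            (counts[(config, k)] + alpha) / (total + alpha * r) for k in range(r)
        ]
    return cpt


# Toy records (parent state, child state), e.g. (bad weather?, ten or more deaths?).
records = [(0, 0), (0, 0), (0, 1), (1, 1)]
print(estimate_cpt(records, child=1, parents=[0], arities=[2, 2]))
```

Because the Dirichlet prior is conjugate to the multinomial likelihood, the posterior is again a Dirichlet distribution, so the estimate reduces to adding `alpha` pseudo-counts to the observed counts. States that never occur in the sample still receive a nonzero probability.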

Critical Factors Identification Result for Single-Vehicle and Multivehicle Accidents.
The critical factor identification results based on the improved grey correlation analysis are shown in Tables 4 and 5.
According to Pearson's correlation analysis principle, a factor with a coefficient of more than 0.75 is considered to have a significant effect [44]. Hence, the factors shown above were classified and organized according to an average weighted correlation degree of more than 0.75, which is the standard for building the sets of key influencing factors for single-vehicle and multivehicle traffic accidents, respectively. For single-vehicle traffic accidents, the number of accidents is the dominant feature of the system, followed by the number of injuries and deaths, indicating that various factors have a considerable correlation with the number of accidents. Similarly, the number of accidents is the dominant feature of multivehicle traffic accidents, followed by the number of injuries and deaths, indicating that various factors correlate with the number of accidents. The critical factor sets for single-vehicle and multivehicle traffic accidents were established through the weighted grey correlation degree, as shown in Figures 3 and 4.

BN Network for Single-Vehicle and Multivehicle Accidents' Severity.
Based on the analysis result of the improved grey correlation method, two categories of accident variables were obtained as network nodes in this paper. The first category is the primary variables, including drivers' behavior, vehicles, road, and the environment. The second category is the result variables, including the number of deaths and the number of injuries. For single-vehicle accidents, 13 node variables were selected, as shown in the preliminary learning result in Figure 5. Besides, Figure 6 shows the preliminary learning result of the multivehicle traffic accident network with 16 node variables. Figure 5 shows that node variables 4, 6, and 10 are not connected to nodes 1 and 2. Figure 6 shows that the node variables of overspeed, fatigue driving, driving across the line, and low visibility have a low correlation degree with the two accident features in the multivehicle network. Therefore, the node variables of misoperation, driving status, and safety protection facility were deleted from the single-vehicle network. These four factors in the multivehicle preliminary BN were deleted as well. After further training with the K2 algorithm for BN structure searching iteration, the final learning result is split into two networks, shown in Figures 7 and 8, respectively, because of the independence of the number of deaths and the number of injuries.
Figure 2: Flowchart of critical factor identification. (1) Determination of the factor set: reference sequence (traffic accident feature indexes) and comparison sequence (factor set). (2) Calculation of Deng's correlation degree $r_{ij}$, absolute greyness degree $\varepsilon_{ij}$, and relative greyness degree $\mu_{ij}$. (3) Determination of the weight for each degree: $\alpha_{ij}$, $\beta_{ij}$, and $\gamma_{ij}$. (4) Calculation of the weighted greyness correlation degree: $\tau_{ij} = \alpha_{ij} r_{ij} + \beta_{ij} \varepsilon_{ij} + \gamma_{ij} \mu_{ij}$. (5) Construction of the grey incidence matrix. (6) Identification of critical factors for the extremely serious and serious traffic accidents.

In Figure 7, ten factors contribute to the occurrence of single-vehicle traffic accidents with two accident feature [...] pavement condition (wet or dry), accident period, weather condition, and visibility. Indirect acting factors contributing to the number of injuries: road alignment, physical separation, visibility, and weather condition.
Besides, it can be found in Figure 7 that five-factor sequences contribute to the number of deaths, and three-factor sequences contribute to the number of injuries. For example, one of the longest sequences shown in Figure 7(a) is {accident occurring period ⟶ visibility ⟶ pavement alignment ⟶ physical separation ⟶ vehicle type ⟶ number of deaths}. One of the longest sequences shown in Figure 7(b) is {weather condition ⟶ visibility ⟶ pavement alignment ⟶ physical separation ⟶ vehicle type ⟶ number of injuries}. Similarly, there are six factor sequences contributing to the number of deaths and three factor sequences contributing to the number of injuries in multivehicle severe traffic accidents. The longest sequence in Figure 8(a) is {accident occurring period ⟶ physical separation ⟶ vehicle safety status ⟶ mislane use ⟶ the number of deaths}. Besides, the longest sequence in Figure 8(b) is {pavement alignment ⟶ heavy truck involved or not ⟶ misoperation or not ⟶ the number of injuries}. Although multivehicle traffic accidents usually lead to more severe effects, the factor sequences of multivehicle traffic accidents are slightly shorter than those of single-vehicle accidents, suggesting that multivehicle accidents can be prevented more quickly because fewer causes are involved.
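Factor sequences of this kind are the directed paths that end at a result node, so they can be enumerated directly from the edge list of the learned network. The Python sketch below shows one way to do this. The first five edges follow the Figure 7(a) sequence quoted above; the sixth edge and the function name `factor_sequences` are hypothetical and are included only so that more than one path exists.

```python
def factor_sequences(edges, target):
    """List every directed path from a root node (no parents) to `target`."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    def walk(node):
        if node not in parents:  # root node: a path starts here
            return [[node]]
        return [path + [node] for p in parents[node] for path in walk(p)]

    return walk(target)


edges = [
    ("accident period", "visibility"),
    ("visibility", "pavement alignment"),
    ("pavement alignment", "physical separation"),
    ("physical separation", "vehicle type"),
    ("vehicle type", "number of deaths"),
    ("weather condition", "visibility"),  # hypothetical extra edge
]

for path in factor_sequences(edges, "number of deaths"):
    print(" -> ".join(path))
```

The recursion terminates because the network is a DAG. Sorting the returned paths by length gives the longest sequences discussed in the text.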

BN Conditional Probability of Learning Result.
The conditional probabilities of the influencing factors of single-vehicle and multivehicle traffic accidents were calculated and are shown in Tables 6 and 7, respectively.
It can be seen from Table 6 that overspeed, commuter bus involved, straight alignment, physical separation, slippery pavement, night, bad weather, and low visibility have significant impacts on single-vehicle accidents. As shown in Table 7, the significant factors contributing to multivehicle accidents are overload, mislane use, commuter bus involved, heavy truck involved, poor braking, straight alignment, no physical separation, night, weekend, and bad weather. The corresponding variable state is then selected to analyze the risk degree of each factor by the interval sorting theory of the BN network. The risk degree ranking of factors contributing to single-vehicle and multivehicle traffic accidents is shown in Tables 8 and 9.
As shown in Tables 8 and 9, different factors contribute to different degrees of severity in terms of the number of deaths and the number of injuries in single-vehicle and multivehicle traffic accidents. The results show that the risk factors that most significantly influence the number of deaths and injuries in single-vehicle accidents are bad weather and commuter bus. The risk factors that have the most significant influence on the number of deaths and injuries in multivehicle accidents are commuter bus and night.

Severity Ranking Result of Factors Combination.
Since accidents result from multiple factors, it is necessary to study the probability distribution of the number of deaths and injuries under the combination of multiple factors, building on the analysis of a single factor. To analyze each factor's effect on severe traffic accidents, the posterior probability of death and injury was deduced with BN's interval theory. Then, the inherent logical relations among these factors were ranked by the severity degree in terms of the number of deaths and the number of injuries, as shown in Tables 10 and 11. The study of factor sequences can help reduce accident damage and help safety managers propose effective measures. The key reasons for the enormous damage caused by bad weather conditions are overspeed, overload, and mislane use.
Therefore, countermeasures for adverse weather conditions, reasonable control of vehicle speed, and proper lane use should be the focus in order to minimize severe traffic accidents.

Model Validation Test Result.
Using mathematical statistics, the conditional probability accuracy of this BN accident severity model is validated. The model's efficiency can be tested by the MSE and RMSE calculated from the actual and learned conditional probabilities shown in Table 12.
From Table 12, for single-vehicle accidents, the model's accuracy for the number of deaths is slightly lower than that for the number of injuries, since the maximum absolute error is 0.0027 and the mean relative error is 0.3390.
This is because the sample distribution between these two types is unbalanced. Hence, the crash severity for single-vehicle traffic accidents can be analyzed using the BN model with greater prediction accuracy when a more randomly distributed accident data sample is provided. For multivehicle accidents, although the model's accuracy for the number of deaths is slightly lower than that for the number of injuries, the model still meets the requirement of prediction. Hence, the BN model can be used to analyze the crash severity for multivehicle traffic accidents.
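The two error measures used in this test can be computed as in the following Python sketch. The probability values are hypothetical placeholders and are not the entries of Table 12; the function names are likewise chosen for illustration.

```python
from math import sqrt


def mse(actual, learned):
    """Mean squared error between actual and learned conditional probabilities."""
    if len(actual) != len(learned):
        raise ValueError("sequences must have the same length")
    return sum((a - l) ** 2 for a, l in zip(actual, learned)) / len(actual)


def rmse(actual, learned):
    """Root mean squared error, in the same unit as the probabilities."""
    return sqrt(mse(actual, learned))


# Hypothetical values for three node states.
actual = [0.5, 0.2, 0.3]
learned = [0.4, 0.2, 0.5]
print(mse(actual, learned), rmse(actual, learned))
```

RMSE is easier to read than MSE in this setting because it can be compared directly with the probabilities themselves.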

Conclusions and Discussions
Most previous studies have examined the relations between various factors and accident indexes from a particular perspective. This paper studies the critical factors contributing to severe traffic accidents in China, for single-vehicle and multivehicle accidents, with the BN crash severity model. From the case application result, the following conclusions can be drawn: (1) The direct factors contributing to the single-vehicle traffic accidents are commuter bus involved, mislane [...]

Table 10: Factor combination sequence result for the number of deaths.

| Severity ranking | Factor sequences |
|---|---|
| 1 | Night ⟶ no physical separation ⟶ low braking ⟶ mislane use |
| 2 | Commuter bus involved ⟶ poor braking ⟶ mislane use |
| 3 | Poor weather ⟶ mislane use |
| 4 | Night ⟶ low visibility ⟶ straight alignment ⟶ physical separation ⟶ commuter bus involved |
| 5 | Poor weather ⟶ straight alignment ⟶ physical separation ⟶ commuter bus involved |

Table 11: Factor combination sequence result for the number of injuries.

| Severity ranking | Factor sequences |
|---|---|
| 1 | Poor weather ⟶ straight alignment ⟶ physical separation ⟶ commuter bus involved |
| 2 | Commuter bus involved ⟶ overload |
| 3 | Poor weather ⟶ overload |