Causation Analysis of Hazardous Material Road Transportation Accidents by Bayesian Network Using Genie

With the growing demand for and transportation of hazardous materials (Hazmat), frequent Hazmat road transportation accidents have aroused widespread public concern. It is therefore necessary to analyze the implications of the risk factors, so that Hazmat transportation safety can evolve from a “passive type” to an “active type”. To explore the influence of the risk factors that lead to accidents and to predict the occurrence of accidents under combinations of risk factors, 839 accidents that occurred during 2015–2016 were collected and examined. The Bayesian network structure was established from experts’ knowledge using Dempster-Shafer evidence theory, and parameter learning was conducted with the Expectation-Maximization (EM) algorithm in Genie 2.0. Two main results were obtained. (1) By calculating the posterior probability of each risk factor, the Bayesian network model can identify the most probable factor or combination of factors leading to an accident. For example, the involvement of three or more vehicles is more strongly associated with a severe accident than the involvement of fewer vehicles, and, in the absence of other evidence, the most probable causes of an “explosion accident” are vehicles carrying flammable liquids, a larger quantity of Hazmat, vehicle failure, and transportation in autumn. (2) The model can predict the occurrence of accidents by setting the influence degree of specific factors. For example, the probability of rear-end accidents caused by “speeding” is 0.42, and the probability can reach 0.97 when the driver is speeding on low-class roads. Moreover, the model captures the complex logical relationships in Hazmat road transportation accidents and expresses the uncertain relations among the various risk factors. These findings could provide theoretical support for transportation corporations and government departments in taking effective measures to reduce the risk of Hazmat road transportation.


Introduction
In recent years, the demand for hazardous materials (Hazmat) has increased, resulting in a growing transportation requirement. More than 95% of Hazmat requires off-site transportation in China, and 63% is transported by road in Brazil and 90% in the United States [1,2]. Although Hazmat brings great convenience to people's lives, it also poses significant risks to the environment and human health. For instance, a total of 3744 heavy trucks were involved in severe accidents in the United States, of which 3% were carrying Hazmat [3]. On January 11, 2015, a tanker truck carrying gasoline collided with a bus in Pakistan, causing 57 deaths. On March 29, 2005, a tanker truck carrying liquid ammonia collided with a van, resulting in a large-scale spill of liquid ammonia that caused 28 deaths and poisoned up to 350 people, which had negative social impacts in China [4].
Because of the special characteristics of Hazmat, transportation accidents can have catastrophic effects on human health, public safety, the environment, and property, which has drawn increasing attention from the general public and the government to the management of Hazmat road transportation. Improving transportation conditions and reducing transportation risk have thus become important and urgent problems for industrial development. A growing number of studies on Hazmat transportation and production have been conducted [5][6][7]. These highlight the need to investigate the risk factors that contribute to Hazmat accidents, and the relationships among those factors, in order to reduce the risk of Hazmat transportation. To that end, causation analysis is an effective method for describing and evaluating the accident process, and it can be used to determine government priorities for implementing prevention measures [8]. Causation analysis can also give transportation corporations theoretical support, in the form of actionable information, for controlling the risk factors. In addition, identifying the most probable factor or combination of factors leading to accidents, and predicting accidents, are important research topics in the field of Hazmat safety that aim to reduce the frequency and severity of accidents.

Literature Review
The purpose of this study is to explore risk factors in order to reduce the risk of Hazmat road transportation. Many studies have used statistical methods. Haastrup and Brockhoff [9] statistically analyzed cases of Hazmat accidents in Western Europe and found that 39% of accidents occurred during transportation; in 682 accidents the consequences included fatalities. One study of Hazmat transportation accidents divided risk factors into human, vehicle, packing, transportation facilities, road conditions, and environmental conditions [5]. Shen [10] studied 708 Hazmat accidents in China from 2004 to 2011 and found that accidents occurred readily on expressways and that a higher probability of spill accidents was associated with accident type. Fang et al. [11] analyzed accident data from 1999 to 2013 and concluded that speeding was the main cause of Hazmat transportation accidents.
Although statistical methods can analyze the relationships between accidents and risk factors, they cannot account for the interplay among different factors and fail to reflect the fact that an accident is not usually the result of a single factor [12]. Accident causation analysis can extract the accident mechanism and accident models from a large number of typical accidents. For instance, Jason et al. [13] studied the influence of vehicle, occupant, driver, and environmental characteristics on injuries in accidents involving heavy-duty trucks; using heteroskedastic ordered probit models, they found that the likelihood of a severe accident is estimated to rise with the number of vehicles involved. Uddin and Huynh [14] used an ordered probit model to explore the relationship between accident severity and driver, vehicle, roadway, environmental, and temporal characteristics. Another study used a logit model to examine the effect of driver behavior on accidents, and the results indicated that the more significant risk factors were speeding, not using a seatbelt, the driver's age, and driving without a valid license [15]. In addition, Bayesian network and tree-based methods, which are increasingly used in traffic accident analysis, have been applied to explore deeper accident mechanisms. For instance, Oña et al. [16] used Bayesian networks to classify traffic accidents by injury severity; the factors associated with fatal or severe accidents, such as accident type, the driver's age, and lighting, were identified by inference. To simplify the model, Mujalli et al. [17] used Bayesian networks to reduce the number of variables in an analysis of accident severity on rural roads, and the results showed that the number of variables could be reduced by up to 60% (the variables considered were accident type, age, atmospheric factors, gender, lighting, number of injured, and occupants involved) while maintaining good model performance. Zhao et al. [18] applied Bayesian networks and pointed out that the three most significant factors influencing Hazmat transportation were human factors, the transport vehicles and facilities, and the packaging and loading of Hazmat. Chen et al. [19] used a Bayesian network to analyze between-accident variance and within-accident correlations and explored the risk factors influencing accidents and their heterogeneous impacts on accident severity on rural roads. To improve the efficiency of emergency rescue in Hazmat road transportation accidents, one study used a Bayesian network to evaluate accident handling time [20]. The Bayesian network model can also be used to describe the probability and risk of accidents [21][22][23][24].
However, despite the many studies on traffic accidents and Hazmat accidents, most are based on the analysis of specific, isolated, single factors [25,26]. Moreover, the characteristics of Hazmat have not been taken into consideration in accident analysis, which limits the study of risk factors in Hazmat road transportation. Statistical methods can reveal the inherent regularities in the occurrence of accidents, but they do not capture the relationships among risk factors and therefore cannot reflect the accident mechanism. Causal analysis models such as the Bayesian network can explain the correlation between risk factors and further explain the accident mechanism, but a Bayesian network structure built from experts' knowledge may be subjective, leading to an incorrect description of the relationships between nodes. Therefore, Hazmat road transportation accidents in China from 2015 to 2016 are taken as the research object to explore the potential risk factors of accidents based on experts' knowledge. The Bayesian network is used to identify the most probable factor or combination of factors leading to an accident and to determine the correlation between risk factors, providing a decision-making basis for Hazmat transportation corporations and government departments to reduce the risk of Hazmat transportation.

Database
The Hazmat transportation accident data were obtained from the State Work Accident Briefing System and the Chemical Accidents Information Network for two years (2015-2016) in China, and the weather data were obtained from the China Meteorological Administration. The regional distribution of Hazmat transportation accidents is shown in Figure 1. The database considered in the study contains 839 records, each with detailed information including the date, time, location, type of accident, type and number of vehicles involved, driver characteristics, the quantity and categories of Hazmat, accident consequence, causes of the accident, and a detailed description of the accident. Sixteen variables extracted from the database were considered as the significant factors, as shown in Table 1. Accident information comprises accident type (rear-end, sideswipe, rollover, collision, and vehicle failure) and accident consequence (explosion, fire, spill, and nonspill). Previous studies [14,27] divided injury severity into five categories; in this paper, accident severity is classified as no injury, severe injury, or fatality. A simplified classification of accident severity eases the issue of potential relationships among related consequences of an accident and ensures a sufficient sample size for the Bayesian network model [28,29]. In this paper, the simplified classification of accident severity is expected to yield better results.
Hazmat information comprises the Hazmat categories and the quantity of Hazmat transported.
Driver information comprises driver characteristics, such as age and behavior.
Location information comprises road surface condition and accident location (Group one, Group two, Group three, and Group four); special road sections, including intersections, freeway service areas, toll stations, and gas stations, are also considered in the study.
Vehicle information comprises the type and number of vehicles involved in the accident.

Definition of Bayesian Network.
A Bayesian network, also referred to as a belief network, is an effective method for describing the causality between the risk factors and the output of a system. It is a Directed Acyclic Graph (DAG) in which nodes represent variable states and directed edges represent dependencies between variables. The relationship, or confidence coefficient, between variables is described by a Conditional Probability Table (CPT). The Bayesian formula is the basis of the Bayesian network model and can be expressed as

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},$$

where $P(A \mid B)$ is the probability of $A$ under the condition of a known event $B$, and $P(B \mid A)$ is the conditional probability of $B$ at the occurrence of $A$. The joint distribution of two random variables $A$ and $B$ can be expressed as

$$P(A, B) = P(A)\, P(B \mid A),$$

where $P(A)$ is called the prior probability and $P(A \mid B)$ is the posterior probability. By the chain rule, which reduces the complexity of the probability model, the joint distribution of $n$ variables is

$$P(X_1, X_2, \ldots, X_n) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_n \mid X_1, X_2, \ldots, X_{n-1}),$$

and the joint distribution can also be expressed as

$$P(X) = \prod_{i=1}^{n} P\bigl(X_i \mid \pi(X_i)\bigr),$$

where $X = \{X_1, X_2, \ldots, X_n\}$, $G$ is the network structure, $P$ is the set of local probability distributions associated with each variable, $X_i$ denotes a variable node, and $\pi(X_i)$ denotes the parent nodes of $X_i$ in $G$.
The construction of the Bayesian network model consists of the following steps:
(1) Parameter determination: analyze the risk factors of Hazmat road transportation and determine the variables needed for modeling (the nodes of the Bayesian network), as listed in Table 1.
(2) Structure learning: determine the dependence or independence relationships between variables (nodes), so that a directed acyclic network structure is constructed.
(3) Parameter learning: based on the given Bayesian network structure, determine the CPT for each node, so that the dependence relationships between random variables are described quantitatively.
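As a minimal sketch of how the chain-rule factorization and the Bayesian formula work together, consider a two-node network in which a hypothetical "speeding" node is the parent of a hypothetical "rear-end" node. The node names and all probabilities below are illustrative assumptions, not values from the accident database.

```python
# Toy two-node network Speeding -> RearEnd; every number here is hypothetical.
P_SPEEDING = {True: 0.2, False: 0.8}         # prior P(S)
P_REAR_END_GIVEN = {True: 0.4, False: 0.25}  # CPT entry P(R = true | S)


def joint(speeding: bool, rear_end: bool) -> float:
    """Chain-rule factorization P(S, R) = P(S) * P(R | S)."""
    p_r = P_REAR_END_GIVEN[speeding]
    return P_SPEEDING[speeding] * (p_r if rear_end else 1.0 - p_r)


def posterior_speeding(rear_end: bool = True) -> float:
    """Bayes' formula P(S = true | R) = P(R | S = true) P(S = true) / P(R)."""
    evidence = joint(True, rear_end) + joint(False, rear_end)
    return joint(True, rear_end) / evidence


print(round(posterior_speeding(), 4))
```

With these assumed numbers, observing a rear-end accident raises the probability of speeding from the prior of 0.2 to about 0.29, which is the kind of update the network performs for every node at once.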

Structure Learning.
Obtaining a sound network structure requires continuous iteration. At present, there are three methods for constructing a Bayesian network structure [30]: (1) construct the network structure subjectively from experts' knowledge; (2) determine the network structure objectively through data analysis; (3) construct the network structure from both experts' knowledge and data analysis. For accident causation analysis, this paper establishes a preliminary Bayesian network structure based on the model assumptions and then adjusts it using experts' knowledge and data analysis, which avoids both strong subjectivity and an enormous amount of computation. The resulting Bayesian network structure is shown in Figure 2.

Steps for Building a Bayesian Network Structure
(1) Establish a preliminary Bayesian network structure based on the model assumptions.
(2) Use the Delphi method to determine the relationships between risk factors. In general, there are four possible relationships between two variables $X_i$ and $X_j$:
(A) $X_i$ directly leads to $X_j$, represented as $X_i \rightarrow X_j$.
(B) $X_j$ directly leads to $X_i$, represented as $X_i \leftarrow X_j$.
(C) The relationship between the variables cannot be determined, represented as $X_i \leftrightarrow X_j$.
(D) There is no relationship between the variables, represented as $X_i \mid X_j$.
(3) Synthesize the results from multiple experts. D-S evidence theory is used to reduce the subjectivity of experts' knowledge, so that the correlation between variables can be determined. The Dempster synthesis rule can be expressed as

$$m(A) = \frac{1}{K} \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i), \qquad K = \sum_{A_1 \cap A_2 \cap \cdots \cap A_n \neq \emptyset} \; \prod_{i=1}^{n} m_i(A_i),$$

where $A$ represents a possible relationship between variables, $m_i$ represents the mass function corresponding to the opinion of expert $i$, and $n$ represents the number of experts.
(4) If the relationship between variables cannot be determined by the Delphi method and D-S evidence theory, the mutual information of the variables is calculated. The entropy of $X_i$ can be expressed as

$$H(X_i) = -\sum_{x_i} P(x_i) \log P(x_i).$$

Conditional entropy is a measure of the uncertainty of a random variable $X_i$ given $X_j$, which can be expressed as

$$H(X_i \mid X_j) = -\sum_{x_i} \sum_{x_j} P(x_i, x_j) \log P(x_i \mid x_j).$$

Before $X_j$ is obtained, the uncertainty of $X_i$ is $H(X_i)$, and after $X_j$ is obtained, the uncertainty of $X_i$ is $H(X_i \mid X_j)$, so the difference between $H(X_i)$ and $H(X_i \mid X_j)$ is taken as the mutual information, which is expressed as

$$I(X_i; X_j) = H(X_i) - H(X_i \mid X_j).$$
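Steps (3) and (4) can be illustrated with a short sketch. It assumes that each expert's opinion is encoded as a mass function over subsets of the frame {"->", "<-", "none"}, and that an "undetermined" opinion puts mass on the whole frame; this encoding, the two example experts, and all function names are assumptions made for illustration and are not taken from the paper. Dempster's rule is associative, so folding pairwise combinations gives the same result as the n-expert formula.

```python
from functools import reduce
from itertools import product
from math import log2

FRAME = frozenset({"->", "<-", "none"})  # assumed frame of discernment


def dempster_combine(m1, m2):
    """Combine two mass functions whose focal elements are frozensets."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: opinions cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) in bits, from a joint table {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    h_x = -sum(p * log2(p) for p in px.values() if p > 0)
    h_x_given_y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items() if p > 0)
    return h_x - h_x_given_y


# Two hypothetical expert opinions about the edge between X_i and X_j.
EXPERT_1 = {frozenset({"->"}): 0.7, frozenset({"<-"}): 0.1, FRAME: 0.2}
EXPERT_2 = {frozenset({"->"}): 0.6, frozenset({"none"}): 0.2, FRAME: 0.2}
fused = reduce(dempster_combine, [EXPERT_1, EXPERT_2])
print({"/".join(sorted(k)): round(v, 3) for k, v in fused.items()})
```

In this example the fused mass on "->" is 0.68/0.78, or about 0.87, so the combined opinion favors the edge $X_i \rightarrow X_j$ more strongly than either expert alone. The mutual information function returns 0 for independent variables and grows as the dependence strengthens, which is why it can serve as a fallback when the experts' opinions remain inconclusive.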

Parameter Learning.
Because the data on Hazmat road transportation accidents contain missing values, the Expectation-Maximization (EM) algorithm is used as an effective method for maximum likelihood estimation of a set of parameters $\theta$ from the incomplete dataset [31][32][33]. The EM algorithm starts by randomly assigning an initial configuration $\theta^{0}$ to $\theta$. Suppose that $\theta^{t}$ is the outcome after $t$ iterations. Each iteration mainly involves two steps: the Expectation Step (E-Step) and the Maximization Step (M-Step).
Consider a sample $D_l$ with missing values, and let $X_l$ be the set of all variables with missing values in $D_l$. Setting $X_l = x_l$ and adding $x_l$ to $D_l$ gives a complete sample. Because $X_l$ may take several possible values, the EM algorithm considers all of the possible results and assigns a weight $w_{x_l}$ to each, so that the weighted sample is given by

$$(D_l, X_l = x_l)\,[w_{x_l}],$$

where $w_{x_l} = P(X_l = x_l \mid D_l, \theta^{t})$, and the weight ranges from 0 to 1.

E-Step: the log-likelihood function of $\theta$ based on the completed dataset $D^{t}$ is

$$l(\theta \mid D^{t}) = \sum_{l=1}^{m} \sum_{x_l} P(X_l = x_l \mid D_l, \theta^{t}) \log P(D_l, X_l = x_l \mid \theta),$$

where $D = (D_1, D_2, \ldots, D_m)$, and $Q(\theta \mid D, \theta^{t}) = l(\theta \mid D^{t})$ is referred to as the expected log-likelihood function. Within an iteration the current estimate $\theta^{t}$ is fixed, so the formula can be expressed as

$$l(\theta \mid D^{t}) = \sum_{i} \sum_{j} \sum_{k} m^{t}_{ijk} \log \theta_{ijk},$$

where $\theta_{ijk} = P(X_i = k \mid \pi(X_i) = j)$ and $m^{t}_{ijk}$ is the sum of the sample weights in the dataset $D^{t}$ for which $X_i = k$ and $\pi(X_i) = j$.
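The weighting idea can be made concrete with a minimal sketch for a two-node network X -> Y with binary states, in which X is sometimes missing (recorded as `None`). The function name, the starting values, and the data layout are assumptions for illustration; Genie's internal implementation is not described in the paper and may differ. The M-step shown here is the standard one, which normalizes the expected counts.

```python
def em_two_node(records, n_iter=50):
    """EM for X -> Y; records are (x, y) with x in {0, 1, None}, y in {0, 1}."""
    p_x1 = 0.5                # theta^0 for P(X = 1)
    p_y1 = {0: 0.4, 1: 0.6}   # theta^0 for P(Y = 1 | X = x)
    for _ in range(n_iter):
        # E-step: accumulate expected (weighted) counts m[x][y].
        m = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
        for x, y in records:
            if x is not None:
                m[x][y] += 1.0  # complete sample, weight 1
                continue
            # Weight of each completion: P(X = v | Y = y, theta^t).
            lik = {}
            for v in (0, 1):
                prior = p_x1 if v == 1 else 1.0 - p_x1
                cond = p_y1[v] if y == 1 else 1.0 - p_y1[v]
                lik[v] = prior * cond
            total = lik[0] + lik[1]
            for v in (0, 1):
                m[v][y] += lik[v] / total
        # M-step: normalize the expected counts to get theta^{t+1}.
        n = {v: m[v][0] + m[v][1] for v in (0, 1)}
        p_x1 = n[1] / (n[0] + n[1])
        p_y1 = {v: m[v][1] / n[v] for v in (0, 1)}
    return p_x1, p_y1


complete = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
with_missing = complete + [(None, 1), (None, 1), (None, 0)]
print(em_two_node(with_missing))
```

When no values are missing, every weight equals 1 and a single iteration already returns the ordinary frequency estimates; with missing values, each incomplete record contributes fractional counts to both possible completions.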

Results
The selection and classification of variables were guided by the analysis of the accident data and by previous studies [6][34][35][36]. In this paper, sixteen variables are considered as the significant risk factors, as shown in Table 1. Several software packages can be used to build Bayesian networks efficiently, such as Netica, Genie, Bayes Net Toolbox, and Analytica. In this paper, Genie 2.0 (developed by the Decision Systems Laboratory, University of Pittsburgh) was used to carry out the Bayesian network parameter learning with the EM algorithm; it allows the construction, analysis, and visualization of the Bayesian network to be performed efficiently and simplifies the calculation. The network parameters are iterated repeatedly using the accident data, and the calculation terminates when the following conditions are met: (1) the variation of the posterior probability for any single risk factor is less than 1%; (2) the cumulative variation of the posterior probability for the entire network is less than 15%. The results are shown in Figure 3.
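The two termination conditions can be sketched as a simple check between successive iterations. The sketch assumes that "variation" means the absolute change in a state's posterior probability and that both conditions must hold; the paper does not spell out either point, and the function name and data layout are hypothetical.

```python
def has_converged(previous, current, single_tol=0.01, total_tol=0.15):
    """Compare two {state: posterior probability} snapshots of the network.

    Assumes 'variation' is the absolute change in probability per state.
    """
    changes = [abs(current[k] - previous[k]) for k in previous]
    return max(changes) < single_tol and sum(changes) < total_tol


before = {"season=autumn": 0.220, "accident_type=rollover": 0.410}
after = {"season=autumn": 0.224, "accident_type=rollover": 0.409}
print(has_converged(before, after))
```

The cumulative threshold matters for large networks: many states that each move by slightly less than 1% can still add up to more than 15% in total, in which case iteration continues.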

Causal Inference.
The Bayesian network can be used to calculate the posterior probability of risk factors given that an accident has occurred and thereby obtain the most likely factors or combinations of factors that caused it. Take "explosion" in "accident consequence" as an example of causal inference, with "explosion" as the evidence variable. As shown in Figure 4, the probabilities of the risk factors are obtained through the update function of Genie. The probability of "autumn" in "season" increases from 22% to 35%; "vehicle failure" (which refers to tire blowout, spontaneous combustion, and tanker damage) in "accident type" increases from 17% to 37%; the probability of the "more than 40 tons" category of Hazmat quantity increases from 8% to 20%; "flammable liquids" in "Hazmat categories" increases from 51% to 65%; and explosives increase from 3% to 8%. These findings mean that, in the absence of other evidence, the most probable causes of "explosion" are vehicles carrying flammable liquids, a larger quantity of Hazmat, vehicle failure, and transportation in autumn. In addition, if "fatality" in "severity of accident" is taken as the evidence variable, the change in the probabilities of "total vehicle involved accident" can be obtained: the probability of "three" increases from 4% to 11%, and that of "more than three" increases from 3% to 9%. This suggests that accidents involving three or more vehicles are more strongly associated with severe accidents than those involving fewer vehicles. Moreover, as for the accident consequence, the probability of "spill" decreases, while the probabilities of "explosion" (3% to 6%) and "fire" (11% to 18%) increase. Because of the special characteristics of Hazmat, explosion and fire affect a larger area and can easily result in casualties, especially on urban roads and in areas with higher population density [26].
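One way to read the reported prior and posterior pairs is through the ratio of posterior to prior. By the Bayesian formula, P(state | explosion) / P(state) equals P(explosion | state) / P(explosion), so a ratio above 1 means that the state makes an explosion more likely than average. The sketch below only rearranges the percentages quoted above; the dictionary labels and the ranking by ratio are illustrative additions, not part of the original analysis.

```python
# (prior, posterior given "explosion") as reported in the text above.
REPORTED = {
    "season = autumn": (0.22, 0.35),
    "accident type = vehicle failure": (0.17, 0.37),
    "quantity = more than 40 tons": (0.08, 0.20),
    "Hazmat category = flammable liquids": (0.51, 0.65),
    "Hazmat category = explosives": (0.03, 0.08),
}


def lift(prior: float, posterior: float) -> float:
    """Posterior-to-prior ratio, equal to P(evidence | state) / P(evidence)."""
    return posterior / prior


ranked = sorted(REPORTED, key=lambda k: lift(*REPORTED[k]), reverse=True)
for name in ranked:
    print(f"{name}: x{lift(*REPORTED[name]):.2f}")
```

The two views are complementary: flammable liquids remain the most probable state in absolute terms (65%), while rarer states such as explosives or loads above 40 tons show the largest relative increase.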

Accident Prediction.
Based on bidirectional reasoning, the Bayesian network model can not only identify the risk factors or combinations of factors that caused accidents but also calculate the probability of accidents given those risk factors or combinations. For example, in Genie, "speeding" in "driver behavior" is set as an evidence variable, meaning that the state of the evidence variable is taken as 100%. As can be seen from Figure 5, the probability of "rear-end" in "accident type" increases from 27% to 42%, indicating that speeding makes rear-end accidents more likely. This is because a speeding vehicle is difficult to control and needs a longer braking time. Previous studies have also shown that driving behavior can significantly affect the severity of traffic accidents [37][38][39].
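Predictive reasoning of this kind amounts to fixing the states of some parent nodes and summing over the parents that remain unobserved. The following sketch uses a hypothetical three-node fragment in which "speeding" and "low-class road" are independent parents of a generic accident-type state; the CPT values are invented for illustration and do not reproduce the figures of this study.

```python
# Hypothetical fragment: Speeding and LowClassRoad are parents of Accident.
P_LOW_CLASS = 0.3  # assumed prior P(L = true)
P_ACCIDENT = {     # assumed CPT P(A = true | S, L), keyed by (speeding, low_class)
    (True, True): 0.90,
    (True, False): 0.30,
    (False, True): 0.35,
    (False, False): 0.20,
}


def predict(speeding: bool, low_class=None) -> float:
    """P(A = true | evidence); an unobserved road class is summed out."""
    if low_class is not None:
        return P_ACCIDENT[(speeding, low_class)]
    return (P_LOW_CLASS * P_ACCIDENT[(speeding, True)]
            + (1.0 - P_LOW_CLASS) * P_ACCIDENT[(speeding, False)])


print(round(predict(True), 2))        # evidence: speeding only
print(round(predict(True, True), 2))  # evidence: speeding on a low-class road
```

Under these assumed values, adding the second piece of evidence replaces the weighted average over road classes with a single CPT entry, which is why the predicted probability can change sharply when more evidence is set.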
As shown in Figure 6, in addition to "speeding", it is assumed that the transportation route is on low-class roads; that is, "Group four" in "accident location" is also set as an evidence variable, and the probabilities of the entire network are automatically updated. The probability of "rollover" in "accident type" further increases from 42% to 97%. This finding shows that "driver behavior" and "accident location" affect the probability of a "rollover" accident to different degrees. Therefore, when the driver is speeding on low-class roads, more attention should be paid to rollover accidents.

Hazmat Factors.
Flammable liquids have the highest posterior probability (0.51) and would easily result in explosion. This could be explained by the increasing demand for flammable liquids and the decreasing reliability of transporting flammable liquids due to single-mode packaging. The quantity of Hazmat transported significantly affects the severity of an accident. The larger the quantity of Hazmat transported, the larger the inertia of the transportation vehicle, making the vehicle harder to control in an emergency [40]. Moreover, a larger quantity of Hazmat is prone to serious consequences, such as explosion and spill, threatening people's health and the environment [10].

Driver Factors.
Previous studies have shown a relationship between the driver's age and the severity of accidents [27,41,42]. According to the model results, younger drivers (less than 35 years old) are more prone to inappropriate driving behavior, which indicates the need for education programs and training for younger drivers. Tavris et al. [43] also found that younger drivers were much more likely to be involved in severe and fatal accidents. As for driving behavior, speeding is more likely to lead to rollover accidents, especially on low-class roads. This could be ascribed to the small number of lanes and the road condition defects on low-class roads; in addition, speeding makes Hazmat slosh or move around inside the tank, which constantly shifts the vehicle's weight and can cause the unbalanced vehicle to roll over [44,45].

Location Factors.
The model results show that "Group one" (posterior probability 0.43) and "Group two" (posterior probability 0.40) in "accident location" are likely to be associated with severe accidents, which could be attributed to the combination of higher average speed and larger speed dispersion. More importantly, "Group one" and "Group two" roads are the major transport corridors for Hazmat [10,46]. In addition, some special road sections are also considered significant risk factors; this could be explained by the greater number of interference factors (such as line of sight, pedestrians, and signal lights) at intersections and the greater potential explosion risk around gas stations [47].

[...] highest posterior probability (0.41), followed by rainy (0.32). This could be ascribed to the fact that the driver's mood and vision are impaired in cloudy and rainy weather, and rain lowers the friction coefficient of the road because of the thin film of water between the road surface and the tires, which makes the road slippery and increases the braking distance [48,49]. Regarding visibility, daytime has the highest posterior probability (0.49), and dark has 0.33. This is because most transportation corporations in China are more likely to transport Hazmat in the daytime [50]. In addition, poor visibility at night tires drivers, resulting in driver fatigue, especially from 11:00 pm to 3:00 am [51]. In the accident data sample, driver fatigue accounts for 62% of the accidents that occurred from 7:00 pm to 4:59 am.

Vehicle Factors.
As for the total number of vehicles involved in an accident, "more than three" tends to result in higher accident severity. The involvement of a private car also tends to make the accident severe. Two reasons could explain these findings: first, more vehicles mean more people involved in the accident and thus more people injured; second, trucks differ greatly from other vehicles in mass and speed. In an accident, lighter vehicles (such as private cars) usually absorb the greatest part of the kinetic energy, and their occupants suffer more severe injuries.
Accident Factors.
Many studies have shown a significant relationship between accident type and severity, indicating that rollover accidents are associated with higher accident severity [16,44]. The Bayesian network results show that rollover accidents have the highest posterior probability (0.41). The reason could be that Hazmat sloshing or moving around inside the tank constantly shifts the vehicle's weight and unbalances the vehicle, causing it to roll over, especially during abrupt evasive maneuvers or turns [10]. In addition, as for the accident consequence, the posterior probability of spill reaches 0.81, threatening human health and the environment. This could be explained by the fact that a Hazmat release can immediately cause poisoning and suffocation and make it difficult for people on-site to escape quickly, resulting in severe and fatal accidents [50].

In summary, Hazmat road transportation accidents are unexpected, random, dangerous, and latent. The frequency of accidents implies that it is necessary to explore risk factors through the accident mechanism. The Bayesian network is an effective method for dealing with uncertainties and exhibits the underlying hierarchical relations through a Directed Acyclic Graph. In this paper, the Bayesian network was developed from experts' knowledge and modified using Hazmat road transportation accident data (N=839) in China. The Bayesian network was built in Genie 2.0, and the model results reveal the influence of the risk factors that lead to accidents and the relationships among them. The study shows that the posterior probabilities of the Bayesian network provide an effective way to find the important factors and factor combinations behind accidents. These findings could provide theoretical guidance that helps transportation corporations and government departments take the necessary measures to reduce the frequency of Hazmat accidents. It must be noted that the aforementioned results were obtained by analyzing a data sample collected from the State Work Accident Briefing System and Hazardous Chemical Accidents Communications, which may impose limitations. In further studies, the conclusions would be more generalizable if the dataset had a larger sample size and included accidents from multiple states.

Figure 2: The Bayesian network structure for Hazmat road transportation accidents.

Figure 3: The Bayesian network model after parameter learning in Genie 2.0.

Figure 5: Accident prediction when the evidence variable is "speeding".

Figure 6: Accident prediction when the evidence variables are "speeding" and "Group four".

Table 1: Variables of Hazmat road transportation accidents.
