
To predict the probability of roadside accidents on curved highway sections, we selected eight risk factors that may contribute to roadside accidents, conducted simulation tests, and collected a total of 12,800 data records from the PC-Crash software. The chi-squared automatic interaction detection (CHAID) decision tree technique was employed to identify significant risk factors and, from the generated decision rules, to explore the influence of different combinations of these factors on roadside accidents, so that specific countermeasures could be proposed as a reference for revising the Design Specification for Highway Alignment (JTG D20-2017) of China. To account for the interactions among risk factors, path analysis was applied to investigate the importance of the significant risk factors. The results showed that the significant risk factors were, in decreasing order of importance, vehicle speed, horizontal curve radius, vehicle type, adhesion coefficient, hard shoulder width, and longitudinal slope. The five most important factors were chosen as predictors in the Bayesian network analysis to establish a probability prediction model of roadside accidents. Finally, thresholds of the factors for roadside accident blackspot identification were derived from the probabilistic prediction results.

Roadside accidents occur when a vehicle leaves the travel line, crosses an edge line or a centre line, collides with trees, guardrails, utility poles, and other natural or man-made objects located on roadsides, or overturns or falls into deep ditches or rivers. According to the Fatal Accident Reporting System (FARS), these accident types account for more than 39% of fatal accidents in the United States [

A vehicle may depart from the travelled path for several complex reasons, such as an inappropriate avoidance manoeuvre, driver inattention, negotiating a curve segment at high speed, or understeering. A variety of contributing factors to roadside accidents have been identified using different data sources and analysis methods. Numerous studies have confirmed that highway geometric design indexes (i.e., roadway characteristics and roadside characteristics) play a significant role in whether a driver error results in a crash [

Among the environmental factors, most ROR accidents tend to occur on weekends [

In terms of human factors, the National Highway Traffic Safety Administration (NHTSA) suggested that driver distraction, fatigue, driver’s degree of familiarity with the roadway, blood alcohol presence, age, and gender were the most significant factors contributing to roadside accidents [

From a methodological perspective, different methods have been employed to determine these factors. Originally, Zegeer and Deacon [

Although there have been a considerable number of roadside accident frequency studies, few have focused on the quantitative analysis of roadside accident probability. Various approaches (i.e., Poisson model, NB model, ZIP model, and ZINB model) can predict the number or frequency of roadside accidents from large volumes of accident data but cannot precisely calculate probability values under the effects of multiple variables. Moreover, results based on predicted accident frequency or number are often influenced by regional traffic characteristics and therefore do not generalize well. Because accident probability better represents how accident-prone a location is, predicting accident probability is preferable to predicting accident frequency or number. To identify roadside accident blackspots and reduce accident probability, we therefore used one data mining technique (the CHAID decision tree) to identify significant risk factors contributing to roadside accidents and another (Bayesian network analysis) to establish a probability prediction model of roadside accidents. Additionally, we investigated the importance of the variables, accounting for their interactions, by developing a path analysis based on a logistic regression model. To the best of our knowledge, no research has used these three methods together to study the probability of roadside accidents.

Substantial statistical analysis generally relies on historical accident data. However, the constantly changing traffic environment, the high cost of maintaining or collecting roadside accident data, and the long-term lack of detailed data have formed a barrier to developing a study of the relationship between road design and the probability and severity of accidents [

We chose highway geometric design indexes (horizontal curve radius, hard shoulder width, longitudinal slope, superelevation slope, and width value of the curve), pavement condition (adhesion coefficient), and traffic characteristics (vehicle speed and vehicle type) as input variables and the vehicle final state as the output variable. In the present study, the final state of a vehicle is either departing from the roadway or not departing from the roadway. The former state, in which the vehicle rolls over or any of its wheels enters the slope, represents the occurrence of a roadside accident (see Figure

Occurrence of roadside accidents. (a) Vehicle rollover. (b) Wheel of vehicle entering slope.

The values of slope gradient and slope height mainly affect the severity of a roadside accident once the vehicle enters the slope and have little effect on whether the accident occurs. In addition, in combination with the provisions of carriageway width and crown slope in the Design Specification for Highway Alignment (DSHA) (JTG D20-2017) of China [

Vehicle parameters.

Parameter | Car | Truck |
---|---|---|
Length (m) | 4.325 | 6.370 |
Width (m) | 1.765 | 2.500 |
Height (m) | 1.420 | 3.100 |
Wheelbase (m) | 2.690 | 3.700 |
Weight (kg) | 1385 | 7200 |
Height of centre of gravity (m) | 0.450 | 1.200 |
Distance of centre of gravity from front axle (m) | 1.210 | 1.070 |
Tyre pattern | 215/50 R 16 (621 mm) | 7.50 R 16 (719 mm) |
ABS | Yes | Yes |
ESP | Yes | No |

Notably, in the vehicle parameter settings, the steering of the vehicle was preset to match each horizontal curve radius because the simulation software cannot account for driving behaviour factors. For instance, when the horizontal curve radius is 200 m, the steering degree of the car is automatically updated to 1.57° and 1.54° to match this radius by setting the turning radius of the vehicle to 200 m in the simulation software (see Figure

Setting (a) the steering degree for the vehicle and (b) the width value of the curve.

Each variable value is shown in Table

Description of variables.

Category | Variable | Values |
---|---|---|
Highway geometric design indexes | Horizontal curve radius (m) | 200, 300, 400, 500, 600 |
Highway geometric design indexes | Hard shoulder width (m) | 0.75, 1.5, 2.25, 3.00 |
Highway geometric design indexes | Longitudinal slope (%) | 0, 2, 4, 6 |
Highway geometric design indexes | Superelevation slope (%) | 0, 2, 4, 6 |
Highway geometric design indexes | Width value of curve (m) | 0.4, 0.6 |
Pavement condition | Adhesion coefficient | 0.2, 0.4, 0.6, 0.8 |
Traffic characteristics | Vehicle type | “Truck” = 0, “Car” = 1 |
Traffic characteristics | Vehicle speed (km/h) | 40, 60, 80, 100, 120 |
Output variable | Vehicle final state | “No departing from road” = 0, “Departing from road” = 1 |
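As a quick consistency check on the experimental design, the variable levels in the table can be enumerated as a full factorial grid. The sketch below assumes that the width value of the curve is not varied independently of the other variables (it is left out of the grid); under that assumption the grid size agrees with the 12,800 records reported in this study. The dictionary keys are illustrative names, not identifiers from the simulation software.

```python
from itertools import product

# Levels taken from the variable description table.
levels = {
    "horizontal_curve_radius_m": [200, 300, 400, 500, 600],
    "hard_shoulder_width_m": [0.75, 1.50, 2.25, 3.00],
    "longitudinal_slope_pct": [0, 2, 4, 6],
    "superelevation_slope_pct": [0, 2, 4, 6],
    "adhesion_coefficient": [0.2, 0.4, 0.6, 0.8],
    "vehicle_type": [0, 1],  # 0 = truck, 1 = car
    "vehicle_speed_kmh": [40, 60, 80, 100, 120],
}

# Assumption: the width value of the curve is excluded from the grid.
scenarios = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(scenarios))  # number of simulated vehicle runs in the full grid
```

The first five entries alone describe the road scenarios; each road scenario is then driven by both vehicle types at every speed level.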

Similarly, the setting of superelevation slope can be achieved by adjusting the difference in height from the outside to the inside of the test section, and the difference in height

According to the value of each variable (excluding the width value of the curve) from the highway geometric design indexes and pavement condition (see Table

The CHAID decision tree, as a data mining technique, has been widely applied in various fields, such as the airline industry and public transport management. However, few studies have used it to investigate traffic risk, especially for roadside accidents.

The CHAID decision tree is a technique of database segmentation that is capable of extracting significant information from a large quantity of data [

CHAID analysis is generally called tree analysis because a trunk (i.e., the original node) is split into multiple branches, which are in turn split into more branches until no further split is possible; a tree grown this far is prone to overfitting. To identify optimal splits, the chi-square independence test is applied to the cross tabulations between each input variable (i.e., each predictor of the occurrence of roadside accidents) and the outcome variable (i.e., the occurrence of roadside accidents). The CHAID decision tree can therefore identify, through a series of if-then-else rules, the significant factors that result in the highest or lowest risk of roadside accidents.
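The split-selection step can be illustrated with a minimal sketch: for each candidate predictor, a chi-square independence test is run on its cross tabulation with the outcome, and the predictor with the smallest p-value is chosen. This is only an illustration under simplifying assumptions, not the software used in this study: the counts are hypothetical, every predictor is reduced to two levels (so the test has one degree of freedom), and the category-merging step of a full CHAID implementation is omitted.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: rows = predictor level, columns = (no accident, accident).
candidate_splits = {
    "vehicle_speed": [[90, 10], [50, 50]],
    "width_value_of_curve": [[70, 30], [70, 30]],
}

results = {name: chi_square_2x2(t) for name, t in candidate_splits.items()}
best_predictor = min(results, key=lambda name: results[name][1])
print(best_predictor, results[best_predictor])
```

In this toy data the accident share is identical across the two levels of the second predictor, so its statistic is zero and it would not be used for a split.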

Furthermore, to prevent the occurrence of overfitting, CHAID uses

Path analysis is a form of structural equation modelling (SEM) in which all the variables are observed variables. In the present study, SEM was used because it can test mediated and moderated relationships among a set of variables. In other words, SEM can test not only the direct impact of independent variables on dependent variables but also their indirect effects on dependent variables through other variables (mediators). In path analysis, mediation, moderation, moderated mediation, and mediated moderation can all be tested [

A simple mediation model describes a model in which the independent variable _{i} is the independent variable, _{1}, _{1}, and _{2} are partial regression coefficients of the models.

However, the partial regression coefficients in the above models denote the direct effects of the variables but cannot reflect the magnitude of each variable's impact on the outcome variable, because the variables have different units and standard deviations. For this purpose, a binary logistic regression model was fitted to obtain standardized regression coefficients, which allow the magnitudes of the direct effects of the input variables on the outcome variable to be compared, as follows:

$$\beta_i' = b_i \frac{S_i}{S_Z},$$

where $\beta_i'$ is the standardized regression coefficient of the input variable $X_i$; $b_i$ is the partial regression coefficient of $X_i$; $S_i$ is the standard deviation of $X_i$; and $S_Z$ is the standard deviation of the standard logistic distribution ($\pi/\sqrt{3} \approx 1.8138$). The coefficient $\beta_i'$ represents the direct effect of $X_i$ on the outcome variable.

Then, the indirect effect of _{i}) can be estimated using the product-of-coefficient estimator as in [_{i} on _{ij} is the correlation coefficient between _{i} and _{j}. Finally, the overall effects _{i} on

The Bayesian network became popular in the late 1990s and has been increasingly used since 2000. The Bayesian network, also known as the belief network, is regarded as one of the most effective theoretical models applied for representation and reasoning of uncertain knowledge. Bayesian nets and probabilistic directed acyclic graphs are technologies for graphically representing the joint probability distribution of a set of selected variables [_{ij} under the premise of _{j} occurrence, and _{j} under the effects of a set of variables

Compared to other theoretical models, the Bayesian network is suitable for traffic safety studies based on the following advantages: (1) combining data with expert experience and prior knowledge, (2) avoiding overfitting, (3) dealing with missing data, and (4) denoting causality by means of providing an understandable graph [

For cross-validation, we divided the accident data obtained from the simulation into a training dataset (70%) and a test dataset (30%). The training data were used to fit the model and estimate its parameters, while the test data were used to assess the model's ability to generalize and to confirm its applicability to independent data. In the present study, we used exhaustive CHAID because it is superior in checking all possible splits [

CHAID provides the percentage of records with a particular value to the outcome variable, and the given value represents the confidence (accuracy) of the generated rules for the input variables. The overall classification accuracy of both the training set and testing set was 94% using the CHAID decision tree. Moreover, the

CHAID analysis took 3,783 samples from the overall dataset for testing, and the percentage of roadside accident data was 22%. All data involving roadside accidents and nonroadside accident occurrences were divided into 67 subgroups from the parent node to child nodes through different branches. The percentage of roadside accidents varied from 0% to 100%. The decision tree included horizontal curve radius, hard shoulder width, longitudinal slope, adhesion coefficient, vehicle speed, and vehicle type in the final structure, which indicates that these variables are significant risk factors in determining the occurrence of roadside accidents. Other predictors not involved in the tree structure (i.e., superelevation slope and width value of the curve) only play a slight role in improving roadside safety performance.

Figure

Decision tree for the identification of risk factors. (a) Tree 1. (b) Tree 2. (c) Tree 3.

Decision rules.

No. | Classification level | Percentage (%) | |||
---|---|---|---|---|---|

1 | 2 | 3 | 4 | ||

1 | ≤40 | — | — | — | 0 |

2 | 40 < | — | 52.9 | ||

0.2 < | 0.8 | ||||

3 | 200 < | — | — | 0 | |

4 | — | — | 60.5 | ||

300 < | — | — | 5.1 | ||

400 < | — | — | 0 | ||

5 | 60 < | — | 100 | ||

0.6 ≤ | — | 30.3 | |||

6 | 0.6 ≤ | Truck | 59.4 | ||

Car | 9.1 | ||||

7 | 300 < | — | 4.2 | ||

0 | |||||

8 | 80 < | 300 < | — | 66.7 | |

10.7 | |||||

9 | 400 < | Truck | 34.2 | ||

Car | 0 | ||||

— | 0.4 | ||||

10 | 100 < | 300 < | Truck | 76.7 | |

32.5 | |||||

11 | 400 < | Truck | 67.6 | ||

0 | |||||

12 | 200 < | Truck | 80.6 | ||

100 |

Note:

Each decision rule in Table

According to decision rule 1, when

Decision rules 6 and 9 show that the percentage of roadside accidents for trucks was larger than that for cars under the same road conditions. We conclude that trucks have a higher risk of roadside accidents than cars because their higher centre of gravity makes them more likely to roll over.

It can be seen from decision rules 2 and 5 that, in case of 40 km/h <

According to decision rule 7, when 60 km/h <

Decision rule 9 showed that, in case of 80 km/h <

It can be seen from decision rules 10 and 11 that when 100 km/h <

Using decision tree analysis, we discussed the relationship between different combinations of risk predictors and the occurrence of roadside accidents and identified the significant risk factors resulting in roadside accidents. However, the magnitude of the importance of these factors has not been investigated. To obtain a deeper insight into the interactions of factors and their impacts on roadside accidents, a path analysis based on a logistic regression model was built.

We input the risk factors (horizontal curve radius, hard shoulder width, longitudinal slope, adhesion coefficient, vehicle speed, and vehicle type) into the path analysis model and found that these factors were also statistically significant because they were all retained by the model. The coefficient of determination was R^{2} = 0.868, indicating a good model fit. Table

Modelling results.

Variable | Parameter estimate | S.E.^{a} | S.D.^{b} | p-value | Standardized parameter estimate |
---|---|---|---|---|---|
Horizontal curve radius | −0.033 | 0.001 | 141.388 | <0.05 | −2.572 |
Hard shoulder width | −2.527 | 0.108 | 0.583 | <0.05 | −0.812 |
Longitudinal slope | 0.430 | 0.027 | 1.325 | <0.05 | 0.314 |
Adhesion coefficient | −6.699 | 0.279 | 0.224 | <0.05 | −0.827 |
Vehicle speed | 0.213 | 0.006 | 28.282 | <0.05 | 3.321 |
Vehicle type | −3.645 | 0.136 | 0.5 | <0.05 | −1.005 |

^{a}S.E. is standard error. ^{b}S.D. is standard deviation.
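The standardized estimates in the table can be reproduced from the other columns. The sketch below assumes that each parameter estimate is multiplied by the standard deviation of its variable and divided by the standard deviation of the standard logistic distribution, π/√3; the function and variable names are illustrative. With that assumption, the computed values agree with the reported ones to three decimal places.

```python
import math

# Assumption: the outcome scale is the standard logistic distribution,
# whose standard deviation is pi / sqrt(3) (about 1.8138).
S_Z = math.pi / math.sqrt(3)

# (parameter estimate, standard deviation, reported standardized estimate)
rows = {
    "horizontal_curve_radius": (-0.033, 141.388, -2.572),
    "hard_shoulder_width": (-2.527, 0.583, -0.812),
    "longitudinal_slope": (0.430, 1.325, 0.314),
    "adhesion_coefficient": (-6.699, 0.224, -0.827),
    "vehicle_speed": (0.213, 28.282, 3.321),
    "vehicle_type": (-3.645, 0.5, -1.005),
}

def standardize(estimate, std_dev, s_z=S_Z):
    """Rescale a logistic regression coefficient to a unit-free effect size."""
    return estimate * std_dev / s_z

standardized = {name: standardize(b, s) for name, (b, s, _) in rows.items()}

for name, value in standardized.items():
    print(f"{name}: {value:.3f} (reported {rows[name][2]:.3f})")
```

Because the rescaled coefficients are unit-free, vehicle speed (in km/h) and adhesion coefficient (dimensionless) can be compared directly.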

It is important to note that unlike real accident data, there seemed to be no interaction between factors in the present study because the values of all these factors were set artificially in the simulation. However, to investigate the indirect effects caused by the interaction of variables on the occurrence of roadside accidents, we assumed that the correlation coefficient between variables could be regarded as their interaction.

A structural diagram of path analysis is shown as Figure

Structure diagram of path analysis.

Table

The magnitude of the impact of factors on roadside accidents.

Variable | Direct effect | Indirect effect via vehicle speed | Indirect effect via horizontal curve radius | Indirect effect via adhesion coefficient | Indirect effect via hard shoulder width | Indirect effect via longitudinal slope | Indirect effect via vehicle type | Overall effect |
---|---|---|---|---|---|---|---|---|
Vehicle speed | 3.321 | — | 2.292 | 0.508 | 0.484 | 0.461 | 0.684 | 7.749 |
Horizontal curve radius | −2.572 | −2.960 | — | −0.501 | −0.477 | −0.456 | −0.679 | −7.644 |
Adhesion coefficient | −0.827 | −2.040 | −1.559 | — | −0.323 | −0.305 | −0.443 | −5.496 |
Hard shoulder width | −0.812 | −1.979 | −1.510 | −0.329 | — | −0.300 | −0.443 | −5.373 |
Longitudinal slope | 0.314 | 1.066 | 1.007 | 0.124 | 0.013 | — | 0.083 | 2.607 |
Vehicle type | −1.005 | −2.272 | −1.746 | −0.366 | −0.360 | −0.341 | — | −6.086 |
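The overall effect of each factor can be checked as the direct effect plus the sum of its indirect effects. The short sketch below recomputes the totals from the tabulated values and ranks the factors by absolute overall effect; the dictionary layout is illustrative. The recomputed totals differ from the reported ones by at most 0.004, which is consistent with rounding of the individual entries.

```python
# (direct effect, indirect effects through the other five factors, reported overall effect)
effects = {
    "vehicle_speed": (3.321, [2.292, 0.508, 0.484, 0.461, 0.684], 7.749),
    "horizontal_curve_radius": (-2.572, [-2.960, -0.501, -0.477, -0.456, -0.679], -7.644),
    "adhesion_coefficient": (-0.827, [-2.040, -1.559, -0.323, -0.305, -0.443], -5.496),
    "hard_shoulder_width": (-0.812, [-1.979, -1.510, -0.329, -0.300, -0.443], -5.373),
    "longitudinal_slope": (0.314, [1.066, 1.007, 0.124, 0.013, 0.083], 2.607),
    "vehicle_type": (-1.005, [-2.272, -1.746, -0.366, -0.360, -0.341], -6.086),
}

# Overall effect = direct effect + sum of indirect effects.
overall = {
    name: direct + sum(indirect)
    for name, (direct, indirect, _) in effects.items()
}

# Rank factors by the magnitude of their overall effect.
ranking = sorted(overall, key=lambda name: abs(overall[name]), reverse=True)

for name in ranking:
    print(f"{name}: {overall[name]:.3f} (reported {effects[name][2]:.3f})")
```

The sign of each overall effect indicates direction: positive values increase the likelihood of a roadside accident, and negative values decrease it.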

According to the overall effect of each risk factor on roadside accidents shown in Table

Given that Bayesian network performs best with a small set of variables [

In the present study, the Bayesian network structure was developed based on the results of path analysis, and the Bayesian network parameter learning of roadside accidents was performed using the GD algorithm in Netica software, in which the prior and conditional probability distribution of each node could be obtained. In addition, according to the sensitivity analysis (see Table

Sensitivity analysis result of the node “roadside accident.”

Node | Mutual info | Percent | Variance in beliefs |
---|---|---|---|
Roadside accident | 0.84054 | 100 | 0.1968046 |
Vehicle speed | 0.13101 | 15.6 | 0.0358728 |
Horizontal curve radius | 0.07927 | 9.43 | 0.0220364 |
Vehicle type | 0.01063 | 1.27 | 0.0028862 |
Adhesion coefficient | 0.00959 | 1.14 | 0.0026972 |
Hard shoulder width | 0.00731 | 0.87 | 0.0020842 |
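The columns of the sensitivity table are related to one another, which the following sketch illustrates. It assumes that "Percent" is the mutual information of a node expressed as a share of the entropy of the query node (the first row), and that the variance in beliefs of the binary query node itself equals p(1 − p), where p is its prior accident probability. Both are assumptions about how the software reports these quantities; the value of p is derived here and is not stated in the table.

```python
import math

target_entropy = 0.84054      # mutual info of "roadside accident" with itself (bits)
target_variance = 0.1968046   # variance in beliefs of "roadside accident"

mutual_info = {
    "vehicle_speed": 0.13101,
    "horizontal_curve_radius": 0.07927,
    "vehicle_type": 0.01063,
    "adhesion_coefficient": 0.00959,
    "hard_shoulder_width": 0.00731,
}
reported_percent = {
    "vehicle_speed": 15.6,
    "horizontal_curve_radius": 9.43,
    "vehicle_type": 1.27,
    "adhesion_coefficient": 1.14,
    "hard_shoulder_width": 0.87,
}

# Assumption: Percent = 100 * mutual information / entropy of the query node.
percent = {name: 100 * mi / target_entropy for name, mi in mutual_info.items()}

# Assumption: for a binary node, variance = p * (1 - p); take the smaller root.
p_accident = (1 - math.sqrt(1 - 4 * target_variance)) / 2
implied_entropy = -(
    p_accident * math.log2(p_accident)
    + (1 - p_accident) * math.log2(1 - p_accident)
)

for name, value in percent.items():
    print(f"{name}: {value:.2f}% (reported {reported_percent[name]}%)")
print(f"implied prior accident probability: {p_accident:.4f}")
print(f"implied entropy: {implied_entropy:.5f} bits")
```

Under these assumptions the entropy implied by the variance agrees with the tabulated mutual information of the query node, which suggests the two columns describe the same prior distribution.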

Bayesian network prediction model of roadside accidents. Note: 0 represents “truck” and 1 represents “car” in vehicle type; 0 denotes “no roadside accident occurrence” and 1 denotes “roadside accidents occurrence” in roadside accidents.

The probability of roadside accidents (i.e., the posterior probability) under different combinations of variables can be obtained from this prediction model. For instance, suppose a road section has dry asphalt pavement, a speed limit of 80 km/h, a horizontal curve radius of 235 m, and a hard shoulder width of 0.75 m, and the probability of a roadside accident for a truck passing through this section is to be predicted. First, the state of 60 km/h <

Probability calculation of roadside accidents.

Furthermore, the developed prediction model can also predict probabilities under the effects of any number (from 1 to 5) of factors (i.e., when some factors are unknown). For example, if the speed limit of a road section was 80 km/h and the hard shoulder width was 0.75 m, but the other indicators were unavailable, the probability of roadside accidents could still be calculated for a car with a speed of 60 km/h <

Probability calculation of roadside accidents. (a) Result 1. (b) Result 2. (c) Result 3. (d) Result 4. (e) Result 5. (f) Result 6.
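Predicting with only some of the factors known amounts to summing over the states of the unknown ones. The sketch below shows this marginalization on a deliberately reduced network with two parent nodes. All states, priors, and conditional probabilities are hypothetical placeholders rather than the learned parameters of the model, and the parents are assumed to be independent root nodes with evidence entered only on them.

```python
from itertools import product

# Hypothetical priors of the parent nodes (not the learned values).
priors = {
    "speed": {"60-80": 0.5, "80-100": 0.5},
    "radius": {"<=300": 0.5, ">300": 0.5},
}

# Hypothetical conditional probability table: P(accident | speed, radius).
cpt = {
    ("60-80", "<=300"): 0.10,
    ("60-80", ">300"): 0.00,
    ("80-100", "<=300"): 0.60,
    ("80-100", ">300"): 0.20,
}

def accident_probability(evidence):
    """P(accident | evidence), summing over parents that are not observed."""
    names = list(priors)
    total = 0.0
    for combo in product(*(priors[name] for name in names)):
        # Skip state combinations that contradict the evidence.
        if any(evidence.get(n, s) != s for n, s in zip(names, combo)):
            continue
        # Weight by the prior of every unobserved parent.
        weight = 1.0
        for n, s in zip(names, combo):
            if n not in evidence:
                weight *= priors[n][s]
        total += weight * cpt[combo]
    return total

def is_blackspot(probability, threshold=0.5):
    """Flag a section when an accident is more likely than not."""
    return probability > threshold

p_all_known = accident_probability({"speed": "80-100", "radius": "<=300"})
p_radius_unknown = accident_probability({"speed": "80-100"})
p_nothing_known = accident_probability({})
print(p_all_known, p_radius_unknown, p_nothing_known)
```

With no evidence at all, the function returns the prior accident probability of the reduced network; adding evidence narrows the sum to the matching states.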

As another example, assume a road section has dry asphalt pavement and that its horizontal curve radius and hard shoulder width are unknown. If the speed limit of this section was 80 km/h, the probabilities of a roadside accident were 3.52% for cars and 14.9% for trucks (see Figures

It is important to note that when the other variables were in the extreme states most favourable to avoiding roadside accidents, the probability of roadside accidents was, contrary to expectation, only 1.31% even at a vehicle speed of 120 km/h, for both cars and trucks (see Figure

Threshold of significant factors leading to frequent roadside accidents.

No. | Vehicle speed (km/h) | Vehicle type | Horizontal curve radius (m) | Hard shoulder width (m) | Adhesion coefficient | Probability (>50%) |
---|---|---|---|---|---|---|

1 | 80 ≥ | Truck | 300 | ≥55.9 | ||

2 | 200 | ≥54.3 | ||||

3 | ≥54.9 | |||||

4 | 100 ≥ | 500 | ≥54.2 | |||

5 | 400 | ≥55.0 | ||||

6 | 300 | ≥54.8 | ||||

7 | ≥55.3 | |||||

8 | 120 ≥ | 500 | 1.50 | ≥55.3 | ||

9 | ≥54.3 | |||||

10 | ≥54.2 | |||||

11 | 80 ≥ | Car | ≥54.2 | |||

12 | 100 ≥ | 300 | ≥55.3 | |||

13 | 200 | ≥55.3 | ||||

14 | ≥54.2 | |||||

15 | 120 ≥ | 300 | ≥54.5 | |||

16 | 200 | 0.4 | ≥54.6 | |||

17 | ≥55.9 | |||||

18 | ≥54.3 |

We considered a location to have a high frequency of roadside accidents (i.e., to be an accident blackspot) when the probability of a roadside accident occurring was greater than the probability of it not occurring (i.e., greater than 50%). According to the results from Table

In this paper, a section (K2639 + 498.02 to K2679 + 170) from G105 was selected to confirm the effectiveness of the proposed method of identification. The G105 is a first-class road with a design speed of 80 km/h. By collecting road design documents and data of annual operating speed, the location of K2669 + 256.378 is determined to be the road section with frequent accidents according to the risk factor threshold, as shown in Table

Operating speed distributions. (a) Car. (b) Truck.

The importance of such a study lies in the fact that it can help authorities identify significant risk factors that result in frequent roadside accidents in small curve segments to implement effective countermeasures or optimize alignment design in the process of future road construction and reconstruction. For instance, most of the thresholds for trucks were larger than those for cars at the same vehicle speed in Table

The issue of roadside safety is crucial, especially for curve sections. In the present study, we employed CHAID decision tree analysis to identify significant risk factors resulting in roadside accidents, explored the impact of different combinations of risk factors on roadside accidents, and then used path analysis to determine the importance of these significant risk factors by investigating their direct and indirect effects on roadside accident occurrence. According to the results of the CHAID technique and path analysis, the significant predictors were, in decreasing order of importance, vehicle speed, horizontal curve radius, vehicle type, adhesion coefficient, hard shoulder width, and longitudinal slope. The five most important factors were included as predictors in the Bayesian network analysis to establish a probability prediction model of roadside accidents. Based on the predicted probabilities of roadside accidents, thresholds of horizontal curve radius, adhesion coefficient, and hard shoulder width were given for accident blackspot identification in curve sections, corresponding to different vehicle speeds and vehicle types.

These findings contribute to improving roadside safety in curve sections with a small radius. For instance, we confirmed again that vehicle speed and horizontal curve radius are still the most critical factors leading to roadside accidents, whether in this study or other previous literature [

For the highway with an operating speed of 60 km/h and a horizontal curve radius ≤200 m or an operating speed of 80 km/h and a horizontal curve radius ≤300 m, antislip measures should be strengthened

For the highway with an operating speed of 100 km/h and a horizontal curve radius of 300 m <

For the freeway with an operating speed of 120 km/h and a horizontal curve radius of 300 m <

Another important finding is that the width of the hard shoulder has a more significant influence on roadside accidents involving trucks than on those involving cars, and that trucks are more likely to have roadside accidents, especially at vehicle speeds >80 km/h. To ensure truck driving safety, the design standards for horizontal curve radius, adhesion coefficient, and hard shoulder width should be further improved by decision makers in future highway construction. Additionally, limiting the load and running speed may be the most effective measures to mitigate the risk resulting from a higher centre of gravity. In recent years, a real-time monitoring system that transmits warning messages to truck drivers in cases of overload or overspeed has been designed by combining embedded technology and GPRS technology [

The most remarkable result in this paper is that the developed Bayesian network prediction model enables quantitative analysis of the probability of roadside accidents under the effects of any number (from 1 to 5) of factors. The resulting thresholds of the factors leading to accident blackspots can guide authorities in identifying and checking roadside accident-prone areas located in small curve sections. In fact, if there are obstacles to promoting safe design standards for the horizontal curve radius, the adhesion coefficient, and the hard shoulder width due to high construction cost or unrealistic issues, many other effective countermeasures, such as setting deceleration strips in the pavement or related warning signs to control running speeds, widening the road in curve sections to provide a fault-tolerant space for drivers [

Despite these promising results, this paper has some limitations. For example, it mainly predicts the roadside accident probability for two-way two-lane roads or for the outer lanes of roads with more than two lanes. Whether the prediction model is applicable to other road types (e.g., the inner lanes of roads with more than two lanes) therefore remains to be studied. Given the important impact of vehicle speed on roadside accidents, the maximum safe speed corresponding to different road geometric designs will be an additional direction for future research.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

This study was supported by the National Key Research and Development Program of China (no. 2018YFB1600902), MOE Layout Foundation of Humanities and Social Sciences (no. 18YJAZH009), National Natural Science Foundation of China (no. 51778063), and Fundamental Research Funds for the Central Universities (no. 2572019AB26).