
Crash severity prediction is a key problem in traffic accident studies. To advance this area, this study developed and tested an artificial neural network combined with an improved metaheuristic algorithm, examining its structure, training function, factor analysis, and comparative performance. Data from I5, an interstate highway in Washington State, covering the period 2011–2015 were used for fitting and prediction. After the theoretical three-layer neural network (NN) was specified, an improved Particle Swarm Optimization (PSO) method with adaptive inertial weight was proposed to optimize the NN, and different adaptive strategies were compared. The results showed that, although the algorithms produced almost the same prediction accuracy, a backpropagation method combined with a nonlinear inertial weight setting in PSO produced fast global and accurate local searching and best fit the model. Finally, the factor analysis showed that non-road-related factors, particularly vehicle-related factors, are more important than road-related variables. The method developed in this study can be applied to big data analyses of traffic accidents and serve as a fast, practical tool for policy makers and traffic safety researchers.

Improving traffic safety is a challenging task, and crash hotspots have been identified around the world. The total number of fatal crashes in the U.S. rose to around 35,000 in 2016. In addition, according to the Washington State Collision Summary report, a total of 117,053 crashes were recorded in Washington State, including 499 fatal collisions, 36,531 injury collisions, and 77,358 property-damage-only collisions, indicating that a crash occurred every 4.5 min and a person died in a crash every 16 hours [

Traffic fatality rates in the U.S. vs. Washington State (per 100M vehicle miles traveled). Source: CLAS (WSDOT) and FARS (WTSC).

To explore the numerous factors that trigger crashes, crash severity is often used in crash analyses to represent the degree of injury. The KABCO scale was proposed by WSDOT to represent the level of injury: K—fatal injury; A—incapacitating injury; B—nonincapacitating injury; C—minor injury; and O—property-damage-only injury. The KABCO scale has been widely adopted and adapted by many scholars (e.g., [

Crash prediction has long been a popular area of study around the world. Numerous studies have conducted prediction analyses based on classic statistical models, e.g., linear, nonlinear, generalized linear model (GLM), generalized estimating equation (GEE), negative binomial (NB), and Poisson regression models, which represent a good attempt at thoroughly formulating the relationship among tens or hundreds of explanatory variables. However, it should be noted that traditional statistical methods have their limitations. The artificial intelligence (AI) technique, particularly the deep learning methodology [

Safety performance functions (SPFs) are frequently utilized to demonstrate the relationships between different crashes and crash impact parameters. Such functions usually use the crash frequency as the targeting variable. The Highway Safety Manual [

Many researchers have applied statistical analysis to traffic safety research, using linear, nonlinear, GLM, GEE, NB, or Poisson regression models. Such models perform well when the number of explanatory variables is constrained. Debrabant et al. [

To address the shortcomings of traditional statistical models, scholars in the crash analysis field are now more inclined to use AI methods. Karlaftis and Vlahogianni [

In short, however, both the statistical and AI methods in previous studies still face challenges. In particular, traditional NNs easily become stuck in local optima owing to their random weight initialization. Although some studies [

To solve the aforementioned problems, the purpose of this paper is as follows: (1) to provide a BPNN algorithm integrating PSO with adaptive inertial weights to establish a crash severity prediction model; (2) to conduct a detailed factor analysis (FA) based on the refined model to quantify the internal relationships and heterogeneity of the different variables that trigger crashes of different severities. The prediction target in the first phase is the crash severity. The novelty of this paper lies in providing an integrated method, incorporating an emerging AI technique and a traditional statistical model, for crash severity analysis. In addition, using FA and PSO to calculate the parameters of crash triggers is relatively original.

This paper is organized as follows: in the first part, the background and severity levels are discussed to demonstrate the reason for conducting this research. In the second part, classic statistical models and NN models dealing with crash analysis are reviewed to illustrate their advantages and limitations. The following section demonstrates the processing of the dataset, including its description and simplification. In the fourth part, the entire methodology for the developed model and data incorporation are presented to highlight the process of conducting a severity prediction process. Finally, the results are presented with the conclusion.

The dataset used in this study was acquired from the Highway Safety Information System (HSIS), which provides data from several states in the U.S., including California, Washington, Minnesota, Michigan, Maine, Ohio, North Carolina, and Illinois. Considering the author's time studying in Washington State from 2016 to 2017, the crash data from this state were selected as the target data.

The HSIS data contain, roughly, two tables of variables. The first is related to the Accident, Vehicle, and Occupant files, which involve TIME, ENVIRONMENT, ACCIDENT-RELATED INFORMATION, VEHICLE INFORMATION, DRIVER INFORMATION, OCCUPANT, ROADWAY ELEMENTS, and PEDESTRIAN/BICYCLIST INFORMATION, whereas the second is more concerned with the roadway, containing LOCATION/LINKAGE ELEMENTS, ROADWAY CLASSIFICATION, ROAD ALIGNMENT, CROSS SECTION, ROAD FEATURES, TRAFFIC CONTROL/OPERATIONS, and TRAFFIC DATA. In addition, both tables contain tens of subvariables.

The crash data from I5 in Washington State covering the years 2011–2015 were extracted. The data from the first four years, 2011–2014, were used to fit the model, and the data from the final year were used as the prediction validation set. The total numbers of crashes were 9,926, 10,083, 10,127, 11,628, and 12,804 for 2011–2015, respectively. Based on these raw samples, the following steps were conducted before the model input procedure:

Exclude apparently irrelevant variables. More than 40 features were requested from the HSIS database system, and some features, such as “CASENO” (accident case number), “MILEPOST” (milepost), “RD_INV” (a linkage variable on the Accident file which is used in the merging operation), and “RTE_NBR” (route number) are not related to the crash severity and were omitted for simplicity.

Samples with features such as “LIGHT,” “WEATHER,” and “ACCTYPE” (accident type) which have values such as “UNKNOWN,” “NAN,” “UNSTATED,” and “NULL” were also omitted for simplicity.

Some nominal variables which cannot be denoted by continuous numbers, such as “DIR_CURV” (the horizontal curve direction) and “DIR_GRAD” (the vertical curve grade direction), both representing the relative direction of left or right, were transformed into discrete scale values (“1” or “0”).

The vehicle-related and driver-related variables such as “DRV_AGE” (driver age) and “DRV_SEX” (driver sex) were incorporated with accident-related data files through the “CASENO” label, whereas the grad/curve-related variables were incorporated with accident-related data through “MILEPOST”; here, a data process computer program written in MATLAB was developed to locate the “MILEPOST” between “BEGPOST” and “ENDPOST” in the grad/curve files.

“VEHYR,” which indicates the vehicle model year, was transformed into the vehicle operation year (the age of the vehicle at the time of the crash), computed as the year in which the crash occurred minus the vehicle model year.

The output file contains vectors derived from the “SEVERITY” variable in the raw dataset. Specifically, noninjury was derived from “1, No Injury”; injury was derived from “6, Nondisabling Injury” and “7, Possible Injury”; and incapacitating injury was derived from the remaining codes. Each sample's severity class was then encoded as a vector over these three levels.
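The severity-to-vector mapping described above can be sketched as follows. Since the original encoding formula was not preserved, the one-hot layout and names such as `SEVERITY_MAP` and `encode` are illustrative assumptions, not the paper's exact notation:

```python
# Mapping of raw HSIS SEVERITY codes to three target classes,
# per the definitions cited in the text (illustrative sketch).
SEVERITY_MAP = {
    "1": "noninjury",   # 1, No Injury
    "6": "injury",      # 6, Nondisabling Injury
    "7": "injury",      # 7, Possible Injury
}
CLASSES = ["noninjury", "injury", "incapacitating"]

def encode(code):
    """Return a one-hot vector over the three severity levels."""
    label = SEVERITY_MAP.get(code, "incapacitating")  # remaining codes
    return [1 if c == label else 0 for c in CLASSES]
```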

After processing through the abovementioned steps, a total of 4310, 4494, 4436, 4666, and 4984 samples from 2011 to 2015 were used for model fitting and validation. A glance at the data suggests that crashes were slightly more likely to occur during the winter (cold season) and on workdays, whereas younger drivers contributed a significant number of accidents (Figure

Crash distribution by (a) month, (b) weekday, and (c) driver age for 2011–2014.

After the selection of 20 features (Table

Summary of the selected variables.

Category | Variables | Value definition
---|---|---
Non-road-related | (1) Accident type, (2) month, and (3) weekday | Categorical, categorical, and categorical
Non-road-related | (4) Location type, (5) light, and (6) driver sex | Categorical, categorical, and categorical
Non-road-related | (7) Driver age, (8) driver restraint, and (9) vehicle year | Continuous, categorical, and categorical
Non-road-related | (10) Vehicle type and (11) weather | Categorical and categorical
Road-related | (12) Road characteristics and (13) road surface | Categorical and categorical
Road-related | (14) Road functional class and (15) curve angle | Categorical and continuous
Road-related | (16) Curve direction and (17) gradient direction | Categorical and categorical
Road-related | (18) Gradient percentage and (19) curve radius | Continuous and continuous
Road-related | (20) Curve degree | Continuous

From Table

An artificial neural network mimics human neurons using information technology and can process complicated connections between the input, hidden, and output layers. Among multiple-layer neural networks, the simple three-layer NN has proven to be the most widely adopted and effective in previous research [

In the forward propagation three-layer NN, the input variables can be defined as an input vector

Similarly, the expectation of the crash severity level output vectors

In addition, the weight matrix between the input and hidden layers,

The weight matrix between the hidden and output layers,

The structure of a general three-layer neural network is shown in Figure

General structure of a typical NN used in this study.

Using the forward calculation method [

Similarly, the outputs calculated from the hidden layer

Generally, the activation functions such as sigmoid or tan
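The forward calculation with a sigmoid activation can be sketched in NumPy. The 20-12-3 layout below matches the 20 selected features, the hidden-node count adopted later, and the three severity classes; the weight values and names are random placeholders for illustration, not the trained model:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of a three-layer (input-hidden-output) network."""
    h = sigmoid(W1 @ x + b1)  # hidden-layer activations
    y = sigmoid(W2 @ h + b2)  # output scores for the 3 severity classes
    return h, y

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 20, 12, 3          # 20 features, 12 hidden nodes, 3 classes
W1 = rng.normal(scale=0.5, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_out, n_hid)); b2 = np.zeros(n_out)
x = rng.normal(size=n_in)               # one standardized crash sample (toy)
h, y = forward(x, W1, b1, W2, b2)
```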

Another aspect used for building an NN is the definition of the number of nodes in the hidden layer, and there is no easy and complete mathematical way of defining this number; however, based on the experience from former research [

Based on the BP method [

The weights in the network can be updated as

Other than the general BP method, there are some other modified training functions, including resilient backpropagation (RPROP), conjugate gradient backpropagation, gradient descent/momentum, and adaptive backpropagation. These functions have different levels of accuracy and training speeds, and thus, an attempt should be made to find a better solution.
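Before comparing the modified training functions, the plain gradient-descent BP update can be sketched as follows; the delta terms assume sigmoid activations and an MSE loss, and the learning rate and initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One backpropagation step: forward pass, deltas, in-place weight update."""
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    d_out = (y - t) * y * (1.0 - y)          # output delta: (y - t) * sigmoid'
    d_hid = (W2.T @ d_out) * h * (1.0 - h)   # hidden delta, backpropagated
    W2 -= lr * np.outer(d_out, h); b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x); b1 -= lr * d_hid
    return 0.5 * np.sum((y - t) ** 2)        # squared error for this sample

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(12, 20)), np.zeros(12)
W2, b2 = rng.normal(scale=0.5, size=(3, 12)), np.zeros(3)
x, t = rng.normal(size=20), np.array([0.0, 1.0, 0.0])  # toy sample, "injury" target
errors = [bp_step(x, t, W1, b1, W2, b2) for _ in range(50)]
```

Repeating the step on the sample drives the error down, which is all the modified training functions (RPROP, conjugate gradient, momentum, adaptive rates) aim to do faster or more robustly.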

In the traditional BP method, the initialized weights in formulas (

The particle swarm optimization method is a well-known metaheuristic computation method proposed in 1995 [

In this paper, the initial weights from BPNN are treated as the particles in the PSO algorithm, and the optimization problem can be described as mapping a decision space

The standard (original) PSO can easily solve nonlinear or nondifferentiable problems, but the search space for a particular particle is almost fixed during each generation, which means the model quickly finds a solution that may lie near a local optimum. This brings forward the tradeoff between local search ability and global search ability [

Thus, in this paper, different inertial weight setting methods [

The function graph for formulas (

Weight function (
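The linear and nonlinear adaptive inertial weight strategies can be sketched as schedules over the iteration counter. The bounds (0.9, 0.4) and the exponential form below are common choices in the PSO literature and are assumptions for illustration, not necessarily the paper's exact formulas:

```python
import math

W_MAX, W_MIN = 0.9, 0.4  # typical inertia bounds (assumed)

def w_linear(t, T):
    """Linearly decreasing inertia: global search early, local search late."""
    return W_MAX - (W_MAX - W_MIN) * t / T

def w_nonlinear(t, T, k=4.0):
    """Nonlinear (exponential) decay: drops quickly from W_MAX toward W_MIN,
    so the swarm switches to fine local search earlier than the linear schedule."""
    return W_MIN + (W_MAX - W_MIN) * math.exp(-k * t / T)
```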

In conclusion, the pseudocode for the whole procedure in Sections 3.1 and 3.2 is formulated as follows:

FOR

FOR

PARTICLE INITIALIZE;

CALCULATE FITNESS (MSE of BP NN);

UPDATE PARTICLE VELOCITY (WITH ADAPTIVE INERTIAL WEIGHTS);

UPDATE PARTICLE POSITION;

END

END

ASSIGN SOLUTION TO BP NN;

BP NN TRAINING, TESTING, VALIDATION;

NN PREDICTION;
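The pseudocode above can be turned into a runnable sketch. To keep it self-contained, the fitness below is the MSE of a tiny 2-4-1 sigmoid network on toy XOR data standing in for the full BPNN; each particle is the flattened weight vector, the inertia follows the nonlinear schedule, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the BPNN fitness: MSE of a tiny 2-4-1 sigmoid net on XOR data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
DIM = 2 * 4 + 4 + 4 * 1 + 1  # flattened weights: W1 (4x2), b1 (4), W2 (1x4), b2 (1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fitness(p):
    """MSE of the network encoded by particle p (lower is better)."""
    W1, b1 = p[:8].reshape(4, 2), p[8:12]
    W2, b2 = p[12:16].reshape(1, 4), p[16:]
    H = sigmoid(X @ W1.T + b1)
    Y = sigmoid(H @ W2.T + b2)
    return float(np.mean((Y - T) ** 2))

def pso(n_particles=30, iters=200, c1=2.0, c2=2.0, w_max=0.9, w_min=0.4):
    pos = rng.uniform(-1, 1, (n_particles, DIM))            # PARTICLE INITIALIZE
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    g = pbest[pbest_f.argmin()].copy()
    g_f = float(pbest_f.min())
    for t in range(iters):
        # adaptive (nonlinear) inertial weight, shrinking over the iterations
        w = w_min + (w_max - w_min) * np.exp(-4.0 * t / iters)
        r1, r2 = rng.random((2, n_particles, DIM))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)  # UPDATE VELOCITY
        pos = pos + vel                                                # UPDATE POSITION
        f = np.array([fitness(p) for p in pos])                        # CALCULATE FITNESS
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        if f.min() < g_f:
            g, g_f = pos[f.argmin()].copy(), float(f.min())
    return g, g_f  # best weight vector is then assigned to the BP NN

best_w, best_mse = pso()
```

In the paper's setting the returned weight vector would seed the BPNN before the usual training/testing/validation passes.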

The calculation process is given in Figure

Calculation process for the method used in this study.

To carry out a factor analysis (FA), the factor importance index (FII) is introduced in this paper. According to the nonlinear and classification function in practice, the

Considering formula (

Through formula (

Finally, in order to ease the simulation variance of the model training process, the FII expectation is introduced by running the model for a certain
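The paper's exact FII formula did not survive extraction, so as an illustration of a connection-weight importance measure in the same spirit, the sketch below implements Garson's algorithm: each hidden node's absolute input weights are normalized and weighted by its hidden-to-output weights, then summed per input. This is a stand-in, not necessarily the paper's FII:

```python
import numpy as np

def garson_importance(W1, W2):
    """Relative input importance from a trained three-layer net (Garson's algorithm).
    W1: (hidden, inputs) input-to-hidden weights; W2: (outputs, hidden)."""
    # contribution of input j through hidden node i, scaled by |hidden->output|
    c = np.abs(W1) * np.abs(W2).sum(axis=0)[:, None]
    r = c / c.sum(axis=1, keepdims=True)   # normalize within each hidden node
    imp = r.sum(axis=0)                    # sum shares over hidden nodes
    return imp / imp.sum()                 # relative importance, sums to 1

rng = np.random.default_rng(0)
W1 = rng.normal(size=(12, 20))   # toy trained weights (20 features, 12 hidden)
W2 = rng.normal(size=(3, 12))    # 3 severity classes
imp = garson_importance(W1, W2)
```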

Based on the theory discussed in the previous section, the first step in building an NN is to determine a good number of hidden-layer nodes and a suitable training function. Usually, the model performance (mean square error, MSE) combined with the total number of iterations to convergence is used to test the structure. For the number of hidden-layer nodes, based on formula (

Summary of training functions.

No. | Function name | Abbreviation |
---|---|---|

1 | BFGS quasi-Newton backpropagation | BFGS |

2 | Conjugate gradient backpropagation with Powell-Beale restarts | CGB |

3 | Conjugate gradient backpropagation with Fletcher–Reeves updates | CGF |

4 | Conjugate gradient backpropagation with Polak–Ribiere updates | CGP |

5 | Gradient descent backpropagation | GD |

6 | Gradient descent with adaptive learning rate backpropagation | GDA |

7 | Gradient descent with momentum | GDM |

8 | Gradient descent w/momentum and adaptive learning rate backpropagation | GDX |

9 | One-step secant backpropagation | OSS |

10 | RPROP backpropagation | RP |

11 | Scaled conjugate gradient backpropagation | SCG |

12 | Levenberg–Marquardt backpropagation | LM |

13 | Bayesian regulation backpropagation | BR |

The last two methods in Table

Theoretically, the number of nodes in the hidden layers should be within the range of

We randomly separated the sample data from 2011 to 2014 into training, testing, and validation sets at proportions of 70%, 15%, and 15%, respectively. In detail, a total of 17,839 samples were divided into 12,487, 2,676, and 2,676 for training, testing, and validation, respectively. The outcome is shown in Table
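The random 70/15/15 split described above can be sketched as follows; the function name is illustrative, and rounding reproduces the stated counts of 12,487/2,676/2,676 for 17,839 samples:

```python
import numpy as np

def split_indices(n, train=0.70, test=0.15, seed=0):
    """Random 70/15/15 split into training/testing/validation index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = round(n * train)
    n_test = round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

tr, te, va = split_indices(17839)  # sizes 12487 / 2676 / 2676
```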

Test on the number of neural network hidden layer nodes (best validation performance in terms of MSE).

10 | 11 | 12 | 13 | 14 | Average | |
---|---|---|---|---|---|---|

BFGS | 0.260 | 0.252 | 0.250 | 0.250 | 0.208 | 0.242 |

CGB | 0.202 | 0.206 | 0.254 | 0.244 | 0.246 | 0.230 |

CGF | 0.208 | 0.240 | 0.204 | 0.242 | 0.232 | 0.224 |

CGP | 0.220 | 0.222 | 0.202 | 0.240 | 0.202 | 0.218 |

GD | 0.230 | 0.232 | 0.236 | 0.224 | 0.232 | 0.230 |

GDA | 0.204 | 0.206 | 0.208 | 0.204 | 0.206 | 0.204 |

GDM | 0.466 | 0.472 | 0.228 | 0.216 | 0.240 | 0.324 |

GDX | 0.206 | 0.202 | 0.204 | 0.198 | 0.204 | 0.202 |

OSS | 0.200 | 0.202 | 0.204 | 0.244 | 0.206 | 0.212 |

RP | 0.206 | 0.202 | 0.198 | 0.206 | 0.210 | 0.204 |

SCG | 0.204 | 0.206 | 0.198 | 0.206 | 0.204 | 0.204 |

LM | 0.204 | 0.204 | 0.198 | 0.208 | 0.202 | 0.202 |

BR | 0.199 | 0.199 | 0.199 | 0.198 | 0.198 | 0.198 |

Average | 0.230 | 0.234 | 0.214 | 0.220 | 0.214 |

It can be seen from Table

After setting the adaptive inertial weights for the PSO optimizer, the performance through each iteration can be plotted as follows.

To eliminate the variance from random data separation, the simulation was run 100 times, and the average performance of each method is reported in Table

Summary of the performance relating to different models.

Category | Performance | Training accuracy (percentage) | Prediction accuracy (percentage) |
---|---|---|---|

NN | 0.235 | 78.5 | 71.6 |

NN with std. PSO | 0.232 | 78.6 | 72.4 |

NN with PSO ( | 0.298 | 73.2 | 70.5 |

NN with PSO ( | 0.194 | 80.4 | 73.1 |

NN with PSO ( | 0.196 | 79.3 | 73.6 |

From Figure

NN performance (MSE) adopting different PSO optimizations. (a) PSO without inertial weight; (b) PSO with

As shown in Table

The final results for two PSO with nonlinear adaptive inertial weight are described in Table

Summary of FII expectation (

Variables | Category | NN with PSO ( | NN with PSO ( | ||
---|---|---|---|---|---|

FII | Relative percentage (%) | FII | Relative percentage (%) | ||

Acctype | Nonroad | 0.108 | 77.3 | 0.080 | 63.4 |

Month | Nonroad | 0.088 | 63.1 | 0.102 | 81.0 |

Weekday | Nonroad | 0.061 | 43.7 | 0.054 | 43.5 |

loc_type | Nonroad | 0.080 | 57.1 | 0.076 | 60.1 |

rd_char1 | Road | 0.078 | 55.6 | 0.063 | 50.2 |

Rdsurf | Road | 0.052 | 37.4 | 0.046 | 36.2 |

Light | Nonroad | 0.069 | 49.1 | 0.071 | 56.1 |

Weather | Nonroad | 0.067 | 47.7 | 0.059 | 47.2 |

func_cls | Road | 0.063 | 45.1 | 0.058 | 46.1 |

drv_sex | Nonroad | 0.024 | 17.4 | 0.017 | 13.1 |

drv_rest | Nonroad | 0.043 | 30.5 | 0.039 | 31.0 |

Vehtype | Nonroad | 0.092 | 65.6 | 0.090 | 72.0 |

dir_curv | Road | 0.032 | 23.2 | 0.028 | 22.1 |

dir_grad | Road | 0.020 | 14.2 | 0.019 | 15.1 |

drv_age | Nonroad | 0.117 | 83.4 | 0.113 | 90.0 |

Vehyr | Nonroad | 0.140 | 100.0 | 0.126 | 100.0 |

curv_ang | Road | 0.047 | 33.5 | 0.045 | 35.4 |

pct_grad | Road | 0.056 | 40.1 | 0.052 | 41.3 |

deg_curv | Road | 0.035 | 25.1 | 0.032 | 25.4 |

curv_rad | Road | 0.040 | 28.4 | 0.034 | 26.8 |

From Figure

Summary of results for factor analysis. (a) FII and its relative percentage for each variable (NN with PSO (

Driver age and month are two other important factors in predicting crash severity. From the sample, the most severe crashes in Washington State occur during the winter (December, January, and February), and drivers below the age of 25 and above the age of 60 are more prone to severe injury crashes. The month effect may reflect the rainy season in the mountainous Seattle area, whereas the age effect may derive from younger and older people being more prone to severe mistakes.

In this study, a thorough artificial neural network (ANN) was developed to address the problems of crash severity modeling and factor analysis (FA). Besides testing different training structures and methods, more importantly, a nonlinear adaptive PSO optimization method was proposed to resolve the tradeoff between global and local search ability noted in previous studies. Detailed tests of the different algorithms confirmed our hypothesis. The additional contributing-factor analysis also offers a different point of view compared with former statistical analyses. The main conclusions are as follows:

Twelve hidden-layer nodes fit the model developed in this paper well, and the BP method (Levenberg–Marquardt) can be better utilized when aided by fast hardware

The simulation result showed that the PSO optimizer with nonlinear adaptive inertial weight outperforms the standard PSO and PSO with linear adaptive inertial weight

Through the factor analysis (FA), it was found that, among all 20 variables, non-road-related variables account for most of the severity prediction variance; the rainy season in the mountainous Seattle area may explain the importance of the month, and the impact of driver age reflects that younger and older people are more prone to severe crashes

The main innovations are as follows:

Traditional studies often used statistical methods such as Poisson regression, negative binomial regression, and generalized logit or probit models to identify and mathematically quantify the internal triggers and their impact on crash severity. This paper instead utilized FA as the analytical tool, which is unusual in current crash severity research; we believe this attempt extends the methods of crash severity analysis, and more research could be conducted in future work.

FA, as a traditional statistical tool, can also serve as a powerful explanatory instrument in the last stage of the model, as our work has demonstrated. The application of FA in this paper indicates that basic statistical methods remain useful and efficient, whereas AI methods sometimes do not offer an agreeable explanation of the inner mechanism of the data.

The method developed in this study can be applied to big data analyses of traffic accidents and serve as a fast, practical tool for policy makers and traffic safety researchers. The authors recognize that much can be further investigated. In this paper, only crash severity was discussed; further research could be conducted from the perspective of collision type (e.g., head-on and rear-end collisions). In addition, the dataset could be enlarged in future research to improve accuracy.

The dataset used in this study was made up of data requested from the Highway Safety Information System (HSIS), and requests for access to these data should be made by filling out the form at the following link:

The authors declare no conflicts of interest.

The authors would like to thank the Highway Safety Information System, U.S.A., the Smart Transportation Application and Research Laboratory (STAR Lab) at the University of Washington, U.S.A., and the Pacific Northwest Transportation Consortium Region 10, U.S.A. They also thank the National Natural Science Foundation of China (Grant nos. 51778141 and 71871078), the China Scholarship Council, and the Jiangsu Creative PhD Student Sponsored Project (KYLX15_0157) for providing essential data and support.