Inclement weather affects traffic safety in various ways. Crashes on rainy days not only cause fatalities and injuries but also significantly increase travel time. Accurately predicting crash risk under inclement weather conditions is helpful and informative to both roadway agencies and roadway users. Safety researchers have proposed various analytic methods to predict crashes; however, most of them require complete roadway inventory, traffic, and crash data. Data incompleteness is a challenge in many developing countries. It is common for safety researchers to have access only to data on sites where a crash has occurred (i.e., zero-truncated data). Conventional crash models are not applicable to zero-truncated safety data. This paper proposes a finite-mixture zero-truncated negative binomial (FMZTNB) model structure. The model is applied to three years of wet-road crash data on 395 divided roadway segments (586 km in total), and the parameters are estimated using the Markov chain Monte Carlo (MCMC) method. The comparison indicates that the proposed FMZTNB model has better fitting performance and is more accurate in predicting the number of wet-road crashes. The model is capable of capturing the heterogeneity within the sample crash data. In addition, lane width showed mixed effects on wet-road crashes across the model components, effects that are not observed with conventional modeling approaches. Practitioners are encouraged to consider the finite-mixture zero-truncated modeling approach when a complete safety dataset is not available.

According to the World Health Organization, more than 1.3 million roadway users die each year as a result of traffic crashes, and the cost of traffic crashes accounts for about 3% of the gross domestic product in most countries [

A traffic crash is usually caused by one or several factors, including humans (i.e., vehicle drivers, motorists, bicyclists, and pedestrians), vehicles, roadway facilities, and the environment. Weather affects traffic safety, traffic demand, transportation mode choice, driving capabilities, vehicle performance (e.g., stability, maneuverability, and traction), and roadway infrastructure (e.g., pavement friction) through visibility impairment, precipitation, and temperature. Inclement weather not only increases crash risk but also significantly affects users’ travel time. According to crash statistics, more than 20 percent of crashes and more than 15 percent of traffic fatalities are weather-related [

Statistical modeling approaches have been used extensively over the past two to three decades to predict crash counts quantitatively. Specifically, safety researchers have proposed various models for crash counts, e.g., Poisson, negative binomial (NB), Sichel [

This study is an extension of a recent study on modeling zero-truncated crash data [

The rest of this paper is organized as follows: Section

Because of the great influence of weather on roadway safety, transportation researchers have made continuous efforts to understand the relationship between different weather conditions and traffic crashes.

Shankar et al. [

Maze et al. [

Qiu and Nixon [

Jung et al. [

Recently, Das et al. [

To summarize, extensive studies have been conducted on the relationship between weather and safety. Overall, crash rates increase significantly during inclement weather conditions. Almost all previous studies include weather data as explanatory factors in their regression models, and none of them has focused specifically on developing a safety performance function for wet-weather crashes. In addition, previous studies have used common count models (e.g., negative binomial), which require complete safety data. Zero-truncated data are common in developing countries, and zero-truncated models have been proposed by researchers to analyze crash data in recent years [

This section discusses three crash modeling approaches: (1) the commonly used negative binomial model; (2) zero-truncated NB model; and (3) finite-mixture zero-truncated NB model.

As has been mentioned in Section

The commonly used NB model assumes that the number of crashes occurring at a given site (a segment or an intersection) during a certain period follows a Poisson distribution, as follows:

The probability mass function (PMF) of crash count is shown as follows:

Furthermore, assume that the Poisson rate

Assuming that the mean

Interpreting

The PMF of
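A standard Poisson-gamma (NB-2) formulation consistent with this description is sketched below for reference. The symbols ($y_i$ for the crash count at site $i$, $\lambda_i$ for the Poisson rate, $\mu_i$ for its mean, $\alpha$ for the dispersion parameter, $\mathbf{x}_i$ and $\boldsymbol{\beta}$ for covariates and coefficients) are assumed notation and may differ from the paper's own equations. The gamma term has shape and rate both equal to $1/\alpha$, so it has mean 1 and variance $\alpha$.

```latex
\begin{aligned}
y_i \mid \lambda_i &\sim \mathrm{Poisson}(\lambda_i), &
P(y_i \mid \lambda_i) &= \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}, \\
\lambda_i &= \mu_i \exp(\varepsilon_i), &
\exp(\varepsilon_i) &\sim \mathrm{Gamma}\!\left(\tfrac{1}{\alpha}, \tfrac{1}{\alpha}\right), \qquad
\mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}), \\
P(y_i) &= \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!}
\left(\frac{1}{1 + \alpha\mu_i}\right)^{1/\alpha}
\left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}, &
\mathrm{E}(y_i) &= \mu_i, \qquad \mathrm{Var}(y_i) = \mu_i + \alpha\mu_i^{2}.
\end{aligned}
```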

The NB model has been widely used in analyzing overdispersed count data; however, it requires complete observed data. When the zeros are truncated, the assumptions of the NB model are not satisfied, and the estimated parameters are biased. Statisticians have proposed truncated models [
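The bias described above can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's procedure: it assumes an intercept-only NB-2 model with a known dispersion parameter `alpha`, draws counts by Poisson-gamma mixing, discards the zeros, and compares two estimates of the mean `mu`. Ignoring the truncation, an intercept-only NB fit estimates `mu` by the sample mean of the remaining counts; accounting for it, `mu` is chosen to maximize the zero-truncated likelihood $P(y)/[1 - P(0)]$. The function names, the golden-section optimizer, and the parameter values are illustrative choices.

```python
import math
import random
from collections import Counter


def nb_log_pmf(y, mu, alpha):
    """Log PMF of NB-2 with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # gamma shape (inverse dispersion)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def ztnb_log_lik(counts, mu, alpha):
    """Zero-truncated NB log-likelihood of a Counter {count: frequency}."""
    log_norm = math.log1p(-math.exp(nb_log_pmf(0, mu, alpha)))  # log(1 - P(0))
    return sum(f * (nb_log_pmf(y, mu, alpha) - log_norm) for y, f in counts.items())


def sample_nb(mu, alpha, rng):
    """Poisson-gamma draw: lambda = mu * Gamma(shape 1/alpha, scale alpha)."""
    lam = mu * rng.gammavariate(1.0 / alpha, alpha)
    limit, k, p = math.exp(-lam), 0, 1.0  # Knuth's Poisson sampler (small rates)
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


def demo(mu_true=1.0, alpha=1.0, n=20000, seed=7):
    rng = random.Random(seed)
    draws = [sample_nb(mu_true, alpha, rng) for _ in range(n)]
    kept = [y for y in draws if y > 0]  # sites with zero crashes are never observed
    naive_mu = sum(kept) / len(kept)    # intercept-only NB MLE ignoring truncation
    counts = Counter(kept)
    log_mu = golden_max(lambda t: ztnb_log_lik(counts, math.exp(t), alpha),
                        math.log(0.01), math.log(100.0))
    return naive_mu, math.exp(log_mu)


if __name__ == "__main__":
    naive_mu, zt_mu = demo()
    print(f"true mu = 1.0, NB (ignores truncation) = {naive_mu:.3f}, ZTNB = {zt_mu:.3f}")
```

With `mu_true = 1` and `alpha = 1`, half of the simulated sites have no crash ($P(0) = (1+\alpha\mu)^{-1/\alpha} = 0.5$), so the naive estimate should land near the truncated mean $\mu/[1-P(0)] = 2$, while the truncation-aware estimate should land near 1.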

From equation (

Substituting equation (
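The zero-truncated PMF follows from dividing the NB PMF by the probability of a nonzero count, $P(y \mid y > 0) = P(y)/[1 - P(0)]$, where $P(0) = (1+\alpha\mu)^{-1/\alpha}$ under the NB-2 parameterization. The minimal sketch below assumes that parameterization (mean `mu`, dispersion `alpha`, variance $\mu + \alpha\mu^2$ before truncation); the function names are illustrative and not taken from the paper.

```python
import math


def nb_pmf(y, mu, alpha):
    """NB-2 PMF with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    r = 1.0 / alpha  # inverse dispersion (gamma shape)
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def nb_zero_prob(mu, alpha):
    """P(Y = 0) = (1 + alpha * mu) ** (-1 / alpha) under NB-2."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def ztnb_pmf(y, mu, alpha):
    """Zero-truncated NB: P(Y = y | Y > 0) = NB(y) / (1 - NB(0)) for y >= 1."""
    if y < 1:
        return 0.0
    return nb_pmf(y, mu, alpha) / (1.0 - nb_zero_prob(mu, alpha))


def ztnb_mean(mu, alpha):
    """E[Y | Y > 0] = mu / (1 - P(0)), which always exceeds mu."""
    return mu / (1.0 - nb_zero_prob(mu, alpha))


if __name__ == "__main__":
    mu, alpha = 1.5, 0.8  # hypothetical site mean and dispersion
    print([round(ztnb_pmf(y, mu, alpha), 4) for y in range(5)])
    print(round(ztnb_mean(mu, alpha), 3), "vs untruncated mean", mu)
```

The truncated mean exceeding `mu` is the same effect that biases an NB fit applied to data without zeros.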

Compared to the conventional NB model, the zero-truncated NB model can be viewed as an NB distribution conditional on the response variable taking nonzero values. This conditional distribution (i.e., the positive NB) complicates parameter estimation. A few software packages are available for estimating the ZTNB model, for example, VGAM with R [

In both the conventional NB and zero-truncated NB models, the distribution of the response variable has only one component, i.e., there is only one Poisson mean. Finite-mixture models assume that the response variable arises from two or more unobserved components with unknown proportions. This provides significantly more modeling flexibility than conventional single-component models [
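A finite-mixture PMF is a weighted sum of component PMFs. The sketch below assumes that each component is itself a zero-truncated NB-2 distribution with its own mean `mu_k` and dispersion `alpha_k`, and that the weights sum to 1; the names and parameter values are hypothetical. Truncating a whole NB mixture instead gives the same family with reweighted components, $w_k' = w_k[1-P_k(0)] / \sum_j w_j[1-P_j(0)]$, so the two readings differ only in how the weights are defined.

```python
import math


def ztnb_pmf(y, mu, alpha):
    """Zero-truncated NB-2 PMF (pre-truncation mean mu, variance mu + alpha*mu**2)."""
    if y < 1:
        return 0.0
    r = 1.0 / alpha
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    p0 = (1.0 + alpha * mu) ** (-r)  # NB probability of a zero count
    return math.exp(log_nb) / (1.0 - p0)


def fmztnb_pmf(y, weights, mus, alphas):
    """K-component finite mixture: sum over k of w_k * ZTNB(y; mu_k, alpha_k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1")
    return sum(w * ztnb_pmf(y, m, a) for w, m, a in zip(weights, mus, alphas))


if __name__ == "__main__":
    w, mus, alphas = [0.6, 0.4], [0.8, 3.0], [0.5, 1.2]  # hypothetical parameters
    print([round(fmztnb_pmf(y, w, mus, alphas), 4) for y in range(1, 6)])
    # The probabilities over y >= 1 sum to (numerically) 1.
    print(sum(fmztnb_pmf(y, w, mus, alphas) for y in range(1, 1000)))
```

With a single component (`weights=[1.0]`), the mixture reduces to the ZTNB model.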

Analogous to equation (

In both the FMNB-

It can be seen that when

It is important to note that, as the number of components

In terms of parameter estimation, the commonly used maximum likelihood estimation (MLE) method will not generate reliable results because of the complicated likelihood function of the FMZTNB-2 model. An alternative is Gibbs sampling, a Markov chain Monte Carlo (MCMC) method that has been frequently used in estimating the parameters of finite-mixture models [
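A Gibbs sampler for a finite mixture alternates between latent component labels and model parameters. The sketch below shows only the two conjugate steps, under simplifying assumptions of our own: the component means `mus` are constants rather than functions of the segment covariates, the component parameters are held fixed, and the weights have a symmetric Dirichlet prior with parameter `prior`. Each label $z_i$ is drawn with probability proportional to $w_k\,\mathrm{ZTNB}(y_i;\mu_k,\alpha_k)$, and the weights are then drawn from $\mathrm{Dirichlet}(\text{prior} + n_1, \ldots, \text{prior} + n_K)$, where $n_k$ counts the sites assigned to component $k$. Function names are hypothetical.

```python
import math
import random


def ztnb_pmf(y, mu, alpha):
    """Zero-truncated NB-2 PMF (pre-truncation mean mu, dispersion alpha)."""
    if y < 1:
        return 0.0
    r = 1.0 / alpha
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_nb) / (1.0 - (1.0 + alpha * mu) ** (-r))


def allocation_probs(y, weights, mus, alphas):
    """Full conditional of a label: P(z = k | y, ...) proportional to w_k * ZTNB_k(y)."""
    joint = [w * ztnb_pmf(y, m, a) for w, m, a in zip(weights, mus, alphas)]
    total = sum(joint)
    return [j / total for j in joint]


def gibbs_labels_and_weights(ys, weights, mus, alphas, rng, prior=1.0):
    """One Gibbs sweep over latent labels z_i and mixing weights w.

    Component parameters (mus, alphas) are held fixed here; a full sampler
    would update them in a separate step (e.g., Metropolis), since their full
    conditionals have no closed form.
    """
    K = len(weights)
    labels = [rng.choices(range(K), weights=allocation_probs(y, weights, mus, alphas))[0]
              for y in ys]
    counts = [labels.count(k) for k in range(K)]
    # w | z ~ Dirichlet(prior + counts), sampled as normalized gamma variates.
    g = [rng.gammavariate(prior + c, 1.0) for c in counts]
    total = sum(g)
    return labels, [x / total for x in g]


if __name__ == "__main__":
    rng = random.Random(1)
    ys = [1, 1, 2, 1, 5, 8, 1, 3, 12, 2]  # hypothetical zero-truncated counts
    w, mus, alphas = [0.5, 0.5], [0.5, 4.0], [0.5, 1.0]
    for sweep in range(3):
        labels, w = gibbs_labels_and_weights(ys, w, mus, alphas, rng)
        print(sweep, labels, [round(x, 3) for x in w])
```

Large counts are pulled toward the high-mean component and single crashes toward the low-mean one, which is how the mixture separates heterogeneous sites.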

This study collected data on 395 rural multilane divided roadway segments, including traffic volume, lane width, average shoulder width, and median width. Three years of wet-road crash data were collected. A wet-road crash is defined as a crash for which the weather condition was rain, snow, or hail, or the surface condition was wet, snowy, ice, or standing water at the time of the crash. In terms of independent variables, this paper mainly considered data availability and the potential effects on the occurrence of crashes during rainy weather conditions reported in the published literature [
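The wet-road crash definition above is a disjunction of two conditions. A minimal sketch follows; the category strings (`"rain"`, `"standing water"`, etc.) and the `(segment_id, weather, surface)` record layout are hypothetical, since actual crash-report codes vary by agency. Counting qualifying crashes per segment also shows why such data are zero-truncated: a segment with no wet-road crash never appears in the tally.

```python
from collections import Counter

# Hypothetical category labels; real crash-report codes differ by agency.
WET_WEATHER = {"rain", "snow", "hail"}
WET_SURFACE = {"wet", "snowy", "ice", "standing water"}


def is_wet_road_crash(weather, surface):
    """True if the weather OR the surface condition meets the wet-road definition."""
    return (weather.strip().lower() in WET_WEATHER
            or surface.strip().lower() in WET_SURFACE)


def wet_road_counts(records):
    """Count wet-road crashes per segment from (segment_id, weather, surface) records.

    Segments without any wet-road crash are absent from the result, so the
    resulting counts are zero-truncated by construction.
    """
    return Counter(seg for seg, weather, surface in records
                   if is_wet_road_crash(weather, surface))


if __name__ == "__main__":
    crashes = [("A", "Rain", "wet"), ("A", "clear", "dry"),
               ("B", "clear", "Ice"), ("C", "clear", "dry")]
    print(wet_road_counts(crashes))
```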

Descriptive statistics of data (sample size: 395).

Variable | Mean | Minimum | Maximum | Standard deviation |
---|---|---|---|---|
Segment length (km) | 1.483 | 0.172 | 3.080 | 0.969 |
Traffic volume (veh/day) | 11569.41 | 3192 | 26935 | 5505.763 |
Lane width (m) | 3.826 | 2.9 | 5.0 | 0.513 |
Average outside shoulder width (m) | 2.515 | 0.61 | 3.05 | 0.758 |
Average inside shoulder width (m) | 1.536 | 0 | 3.05 | 0.705 |
Median width (m) | 6.352 | 3.05 | 14.64 | 3.107 |
Wet-road crash count | 2.42 | 1 | 15 | 2.439 |

It is worth mentioning that the minimum crash count of the sample segments is 1 (see the last row in Table

Previous studies have revealed that the commonly used NB model is not applicable to modeling zero-truncated crash data [

The authors developed the ZTNB model using the data described in Section 4, with the following functional form:

Although studies have pointed out that varying dispersion parameter (i.e.,

The modeling results of the ZTNB model are shown in Table

Estimation results of the ZTNB model.

Variable | Estimate | Std. err. | Significance level | |
---|---|---|---|---|
Intercept | −3.638 | 1.698 | — | Not significant |
Log (ADT) | 0.169 | <0.001 | 99.9% | |
Lane width | −0.100 | 0.159 | 0.530 | Not significant |
Ave. out. SHD | 0.082 | 0.054 | 90.0% | |
Ave. in. SHD | 0.080 | <0.001 | 99.9% | |
Median width | 0.034 | <0.001 | 99.9% | |
Disp. par. | 1.615 | 5.465 | 0.409 | Not significant |
AIC | 1142.29 | — | — | — |
BIC | 1170.14 | — | — | — |
MAE | 0.64 | — | — | — |
RMSE | 2.52 | — | — | — |

This study used four goodness-of-fit (GOF) measures to evaluate model performance: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the mean absolute error (MAE), and the root mean square error (RMSE). The AIC, BIC, MAE, and RMSE for the ZTNB model are 1142.29, 1170.14, 0.64, and 2.52, respectively (see the last four rows in Table
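The four measures can be computed as follows, with $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$ for $k$ estimated parameters, $n$ observations, and maximized log-likelihood $\ln L$. As a consistency check, assuming $k = 7$ for the ZTNB model (six regression coefficients plus one dispersion parameter, our count from the table) and $n = 395$, the reported AIC of 1142.29 implies $\ln L = -564.145$, which reproduces the reported BIC of 1170.14. The helper names are illustrative.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def mae(observed, predicted):
    """Mean absolute error between observed and predicted crash counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted crash counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


if __name__ == "__main__":
    k, n = 7, 395                        # assumed parameter count; sample size of the dataset
    log_lik = (2 * k - 1142.29) / 2      # log-likelihood implied by the reported ZTNB AIC
    print(round(bic(log_lik, k, n), 2))  # prints 1170.14
```

Because BIC penalizes each parameter by $\ln 395 \approx 5.98$ rather than 2, it favors a richer mixture model only when the likelihood gain outweighs that larger penalty.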

As has been mentioned in Section

The modeling results of the FMZTNB-2 model are documented in Table

Estimation results of the FMZTNB-2 model.

Variable | Component 1 estimate | Component 1 std. err. | Component 1 p-value | Component 2 estimate | Component 2 std. err. | Component 2 p-value |
---|---|---|---|---|---|---|
Intercept | −4.047 | 2.819 | 0.151 | − | | <0.001 |
Log (ADT) | | | <0.001 | | | <0.001 |
Lane width | − | | <0.001 | | | <0.001 |
Ave. Out. SHD | − | | <0.001 | − | | <0.001 |
Ave. In. SHD | −0.241 | 0.177 | 0.174 | | | <0.001 |
Median width | − | | <0.001 | −0.494 | 0.251 | 0.049 |
Disp. par. | | | <0.001 | 4.619 | 1.856 | 0.013 |
Weight | | | <0.001 | – | – | |
AIC | 1020.54 | – | – | – | – | – |
BIC | 1088.32 | – | – | – | – | – |
MAE | 0.22 | – | – | – | – | – |
RMSE | 2.18 | – | – | – | – | – |

Finally, the AIC, BIC, MAE, and RMSE for the FMZTNB-2 model are 1020.54, 1088.32, 0.22, and 2.14, respectively (see the last four rows in Table

Prediction comparison between ZTNB and FMZTNB models (three example sites).

Site number (level) | ZTNB prediction | ZTNB std. err. | ZTNB 90% PI | FMZTNB-2 prediction | FMZTNB-2 std. err. | FMZTNB-2 90% PI |
---|---|---|---|---|---|---|
138 (low) | 0.0040 | 0.0636 | [0.0036–0.0046] | 0.0005 | 0.0007 | [0.0005–0.0005] |
90 (moderate) | 0.0654 | 0.2608 | [0.0392–0.109] | 0.0627 | 0.0449 | [0.0574–0.0684] |
65 (high) | 0.7350 | 1.0341 | [0.0968–5.5788] | 0.7625 | 0.5903 | [0.2398–2.425] |

The FMZTNB-2 model shows superiority in modeling the wet-weather crash data. First, the FMZTNB-2 model fits the dataset better than the ZTNB model on all four GOF measures (AIC, BIC, MAE, and RMSE). Second, the predictions from the FMZTNB-2 model have lower standard errors and narrower prediction intervals, indicating that they are more precise. Finally, a few interesting relationships between variables and crashes are observed in the FMZTNB-2 model. For example, the lane width parameters have opposite signs in the two components, indicating that this factor has mixed effects at different locations. These results indicate that the FMZTNB-2 model captures the heterogeneity of the crash data better than the ZTNB model.
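How the 90% prediction intervals were computed is not stated here. One common choice with MCMC output, assumed for this sketch, is an equal-tailed percentile interval of the posterior draws for a site's predicted crash count, reported alongside the mean and standard deviation of those draws. Under that summary, a narrower interval means a more precise prediction. The function names are illustrative, and the draws in the usage example are simulated placeholders, not study output.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_draws(draws, level=0.90):
    """Posterior mean, standard deviation, and equal-tailed interval of MCMC draws."""
    vals = sorted(draws)
    n = len(vals)
    mean = sum(vals) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
    tail = (1.0 - level) / 2.0
    return mean, sd, (percentile(vals, tail), percentile(vals, 1.0 - tail))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical posterior draws of one site's expected crash count (not study output).
    draws = [rng.lognormvariate(math.log(0.07), 0.3) for _ in range(4000)]
    mean, sd, (lo, hi) = summarize_draws(draws)
    print(f"mean={mean:.4f} sd={sd:.4f} 90% interval=[{lo:.4f}, {hi:.4f}]")
```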

Inclement weather increases both crash risk and travel time. Efforts have been made in the past decades to predict the occurrence of traffic crashes; however, very few previous studies have focused on predicting wet-road crashes. Most of the commonly used crash prediction models require complete roadway inventory, traffic, and crash data, yet missing data are relatively common in developing countries. The primary objective of this study is to analyze zero-truncated crash data and predict the number of wet-road crashes. To better capture the heterogeneity of wet-road crash data, this study developed a two-component finite-mixture zero-truncated negative binomial model. The model was applied to three years of wet-road crash data on 395 rural divided roadway segments, and the results were compared with those of the zero-truncated negative binomial model. The comparison indicates that the proposed FMZTNB-2 model fits the wet-road crash data better than the ZTNB model. It is worth mentioning that the wet-weather crash data were not modeled with the conventional NB model, since previous studies have recommended against applying the NB model to truncated data. There are trade-offs in using ZTNB or FMZTNB models in crash analyses: with zero-truncated data, the sample size is smaller than that of the full data, and the reduced sample size might increase the uncertainty of parameter estimates.

There are some limitations to this study. First, only a limited number of roadway characteristics (i.e., segment length, lane width, shoulder width, and median width) and traffic data were available to the authors. Other factors affecting the occurrence of wet-road crashes (e.g., precipitation, number of rainy days per year, and surface skid number) were not accessible to the authors. Second, previous studies have shown that varying forms of the dispersion parameter and of the component weights in finite-mixture models improve both crash prediction and hotspot identification [

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

This research was sponsored jointly by the National Natural Science Foundation of China (project no. 51978082); the Outstanding Youth Foundation of Hunan Education Department (project no. 19B022); and the Young Teacher Development Foundation of Changsha University of Science & Technology (project no. 2019QJCZ056).