Predicting Wet-Road Crashes Using the Finite-Mixture Zero-Truncated Negative Binomial Model

Inclement weather affects traffic safety in various ways. Crashes on rainy days not only cause fatalities and injuries but also significantly increase travel time. Accurately predicting crash risk under inclement weather conditions is helpful and informative to both roadway agencies and roadway users. Safety researchers have proposed various analytic methods to predict crashes. However, most of them require complete roadway inventory, traffic, and crash data. Data incompleteness is a challenge in many developing countries. Safety researchers commonly have access only to data on sites where a crash has occurred (i.e., zero-truncated data). The conventional crash models are not applicable to zero-truncated safety data. This paper proposes a finite-mixture zero-truncated negative binomial (FMZTNB) model structure. The model is applied to three-year wet-road crash data on 395 divided roadway segments (586 km in total), and the parameters are estimated using the Markov chain Monte Carlo (MCMC) method. Comparison indicates that the proposed FMZTNB model has better fitting performance and is more accurate in predicting the number of wet-road crashes. The model is capable of capturing the heterogeneity within the sample crash data. In addition, lane width showed mixed effects on wet-road crashes across the components, which are not observed in conventional modeling approaches. Practitioners are encouraged to consider the finite-mixture zero-truncated modeling approach when a complete safety dataset is not available.


Introduction
According to the World Health Organization, more than 1.3 million roadway users die each year as a result of traffic crashes, and the cost of traffic crashes accounts for about 3% of the gross domestic product in most countries [1]. Road traffic injuries and deaths are a global problem, and traffic crashes are a leading cause of nonnatural death. Safety researchers and practitioners have made continuous efforts to reduce the number and severity of crashes.
A traffic crash is usually caused by one or several factors, including humans (i.e., vehicle driver, motorist, bicyclist, and pedestrians), vehicles, roadway facilities, and environment. Weather affects traffic safety, demand, selection of transportation mode, driving capabilities, vehicle performance (i.e., stability, maneuverability, and traction), and roadway infrastructure (i.e., pavement friction) through visibility impairments, precipitation, and temperature. Inclement weather not only increases crash risk but also significantly affects users' travel time. According to crash statistics, more than 20 percent of crashes and more than 15 percent of traffic fatalities are weather-related [2]. It is necessary to accurately predict the occurrence of crashes under inclement weather conditions.
Statistical modeling approaches have been used extensively over the past two to three decades to quantitatively predict the number of crashes. Specifically, safety researchers have proposed various models for crash counts, e.g., Poisson, negative binomial (NB), Sichel [3,4], Conway-Maxwell-Poisson [5,6], zero-inflated Poisson [7], Poisson-Tweedie [8], Tobit [9][10][11][12], machine learning techniques [13,14], etc. For a detailed review of the crash regression techniques, readers can refer to the article by Lord and Mannering [15]. In all these models, crash counts are treated as response variables, and more importantly, all of these models require complete roadway inventory, traffic, and crash data. Safety data (e.g., roadway inventory, traffic, operation, and crash information) play a critical role in crash prediction model development, hotspot identification, and safety effectiveness evaluation. Using inaccurate or incomplete crash records with the conventional crash prediction models may lead to misleading results. These errors not only result in inefficient use of limited resources for safety improvements but also cause additional loss of lives. However, data incompleteness is a challenge in many developing countries. For example, often only information on segments or intersections where a crash has occurred is collected. This type of data is known as zero-truncated data. Previous studies have shown that the conventional count models are not adequate to model zero-truncated crash data [16,17]. How to develop reliable crash prediction models using zero-truncated data is an important topic for safety analysts.
This study is an extension of a recent study on modeling zero-truncated crash data [16]. The primary objective is to develop safety performance functions for wet-road crashes when zeros are truncated in the safety data, while accounting for heterogeneity. In particular, this paper proposes a finite-mixture zero-truncated negative binomial (FMZTNB) model structure and examines whether the FMZTNB model provides better modeling results than the commonly used models. The rest of this paper is organized as follows: Section 2 reviews the literature pertaining to the influence of weather on safety. Section 3 describes the details of the zero-truncated models. Section 4 briefly documents the zero-truncated data. Section 5 presents the modeling results, and Section 6 summarizes the study.

Literature Review
Because of the great influence of weather on roadway safety, transportation researchers have made continuous efforts to understand the relationship between different weather conditions and traffic crashes.
Shankar et al. [18] conducted one of the earliest studies on the effect of weather conditions on roadway crashes. The researchers developed a negative binomial crash model with roadway geometric and environmental factors. The modeling results suggest that both maximum rainfall and the number of rainy days have a significant and positive effect on the number of total crashes.
Maze et al. [19] studied how inclement weather affects traffic demand, traffic safety, and traffic flow relationships. The researchers pointed out that certain types of severe weather conditions (e.g., winter storms) increase the risk of being involved in a crash by 13 to 25 times. Weather conditions also affect crash severity, but the effect varies with the specific weather condition and crash location.
Qiu and Nixon [20] conducted a systematic review on the effect of adverse weather on the occurrence of roadway crashes. The researchers reviewed 112 studies conducted between 1967 and 2005 that had examined the association between weather and traffic crashes. Crash rates from each study were combined through a meta-analysis method. The researchers concluded that the crash rate usually increases during precipitation. Snow has a greater effect than rain on crash occurrence. Specifically, snow can increase the crash rate by 84% and the injury rate by 75%.
Jung et al. [21] analyzed the influence of four weather factors (i.e., rainfall intensity, water film depth, temperature, and wind speed and direction) on the injury severity of rainy day multivehicle crashes. The study found that wind speed is associated with the outcome of crashes.
Recently, Das et al. [22] developed safety performance functions for two types of roadways (i.e., rural two-lane highway and rural multilane highway) in two states (i.e., Ohio and Washington). The researchers included speed measures and weather conditions in the models. Modeling results revealed that precipitation is negatively associated with the number of crashes. This result is inconsistent with most previous studies, and the researchers noted that vehicle speeds might decrease during wet-weather conditions, resulting in fewer crashes.
To summarize, extensive studies have been conducted to analyze the relationship between weather and safety. Overall, crash rates increase significantly during inclement weather conditions. Almost all previous studies include weather data as factors in the regression models, and none has focused specifically on developing a safety performance function for wet-weather crashes. In addition, previous studies have used the common count models (e.g., negative binomial), which require complete safety data. Zero-truncated data are common in developing countries, and zero-truncated models have been proposed by researchers to analyze crash data in recent years [16,17]. To the best of the authors' knowledge, no efforts have been made to analyze zero-truncated wet-road crashes. This study aims to fill this gap.

Conventional NB Model.
As has been mentioned in Section 2, safety researchers have developed various statistical methods to predict the number of crashes. The NB model is still the most commonly used approach and is recommended by the first edition of the Highway Safety Manual (HSM) [23]. This section briefly introduces the structure of the NB model. The commonly used NB model assumes that the number of crashes occurring at a given site (a segment or an intersection) during a certain period follows a Poisson distribution:
$$y_{i,t} \mid \lambda_{i,t} \sim \text{Poisson}(\lambda_{i,t}). \quad (1)$$
The probability mass function (PMF) of the crash count is
$$P(y_{i,t} \mid \lambda_{i,t}) = \frac{\lambda_{i,t}^{y_{i,t}} e^{-\lambda_{i,t}}}{y_{i,t}!}, \quad (2)$$
where $y$ denotes the crash count, the subscripts $i$ and $t$ represent the site index and study period, respectively, and $\lambda_{i,t}$ is the Poisson rate for the site during the period. For ease of reading, the subscripts $i$ and $t$ are omitted in the rest of this paper.
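Furthermore, assume that the Poisson rate $\lambda$ follows a gamma distribution:
$$\lambda \sim \text{Gamma}(\alpha, \alpha/\mu), \quad (3)$$
where $\mu$ is the mean of $\lambda$ and $\alpha$ is the (positive) shape parameter, so that $\alpha/\mu$ is the rate parameter.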
Assume further that the mean $\mu$ is associated with roadway features (e.g., traffic volume, segment length, and geometric characteristics) through a link function. Integrating $\lambda$ out of equation (2), the PMF of the NB distribution is obtained as
$$P(y \mid \mu, \alpha) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\, y!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{y}, \quad (6)$$
where $y$ is the response variable (i.e., crash count), $\mu$ indicates the mean response of the observation, and $\alpha$ is the dispersion parameter (i.e., the shape parameter in the gamma distribution). For the detailed derivation of the NB model, readers can refer to [24]. It is important to note that, in the conventional NB model structure, the response variable $y$ takes all nonnegative integer values (i.e., 0, 1, 2, 3, ...). In other words, all the observed crash counts, including zeros, should be included in the model development. Since the NB model has a closed form, the parameters can be easily estimated. Many software packages have been developed to estimate the unknown parameters, for example, the MASS package of R [25,26].
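The NB structure described above can be checked numerically. The following is a minimal Python sketch, assuming the parameterization in which α is the gamma shape parameter and μ is the NB mean, so that the variance is μ + μ²/α. The function name `nb_pmf` and the example values are illustrative only and are not taken from the paper, whose estimation relies on R packages.

```python
import math


def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB probability of count y with mean mu and gamma shape alpha (assumed parameterization)."""
    if y < 0:
        return 0.0
    # Work on the log scale to avoid overflow in the gamma functions.
    log_p = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + mu))
        + y * math.log(mu / (alpha + mu))
    )
    return math.exp(log_p)


# Illustrative values only (not estimates from the paper).
mu, alpha = 2.0, 1.5
probs = [nb_pmf(y, mu, alpha) for y in range(2000)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var)
```

Under this assumed parameterization, the probabilities should sum to one, the mean should equal μ, and the variance should equal μ + μ²/α, which is the overdispersion the NB model is meant to capture.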

Zero-Truncated NB Model.
The NB model has been widely used in analyzing overdispersed count data; however, it requires completely observed data. When the zeros are truncated, the assumption of the NB model cannot be satisfied, and the estimated parameters are biased. Statisticians have therefore proposed truncated models [27]. In the truncated count model, the response variable $y$ is also considered to follow a Poisson distribution, but it takes only positive values (i.e., it is conditional on $y > 0$):
$$P(y \mid y > 0) = \frac{P(y)}{1 - P(y = 0)}, \quad y = 1, 2, 3, \ldots \quad (7)$$
From equation (2), it can be derived that
$$P(y = 0) = e^{-\lambda}. \quad (8)$$
Substituting equation (8) into (7), the zero-truncated Poisson distribution is obtained as
$$P(y \mid y > 0, \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!\,\left(1 - e^{-\lambda}\right)}, \quad (9)$$
where $y$ is the (truncated) response variable and $\lambda$ is the Poisson rate. Similarly, assuming that the Poisson rate $\lambda$ follows a gamma distribution, the zero-truncated NB model is obtained as
$$P(y \mid y > 0, \mu, \alpha) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\, y!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{y} \left[1 - \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha}\right]^{-1},$$
where $\mu$ is the mean response of the observation and $\alpha$ is the dispersion parameter.
The finite-mixture models assume that the response variables arise from two or more unobserved components with unknown proportions. This provides significantly more modeling flexibility than the conventional single-component models [29]. Statisticians have proposed the K-component finite mixture of negative binomial regression models (i.e., FMNB-K) as follows [29,30]:
$$P(y \mid w, \mu, \alpha) = \sum_{k=1}^{K} w_k \, \frac{\Gamma(y+\alpha_k)}{\Gamma(\alpha_k)\, y!} \left(\frac{\alpha_k}{\alpha_k+\mu_k}\right)^{\alpha_k} \left(\frac{\mu_k}{\alpha_k+\mu_k}\right)^{y}, \quad (12)$$
where $y$ is the response variable ($y = 0, 1, 2, 3, 4, \ldots$); $w_k$ is the weight factor of component $k$, with the weights summing to 1 ($\sum_{k=1}^{K} w_k = 1$); $\mu_k$ is the Poisson mean of component $k$; and $\alpha_k$ is the dispersion parameter of component $k$.
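Analogous to equation (12), the K-component finite mixture of zero-truncated NB models (FMZTNB-K) can be constructed as
$$P(y \mid y > 0, w, \mu, \alpha) = \sum_{k=1}^{K} w_k \, \frac{\Gamma(y+\alpha_k)}{\Gamma(\alpha_k)\, y!} \left(\frac{\alpha_k}{\alpha_k+\mu_k}\right)^{\alpha_k} \left(\frac{\mu_k}{\alpha_k+\mu_k}\right)^{y} \left[1 - \left(\frac{\alpha_k}{\alpha_k+\mu_k}\right)^{\alpha_k}\right]^{-1}, \quad (13)$$
where $y$ is the zero-truncated response variable (i.e., crash counts; $y = 1, 2, 3, 4, \ldots$); $w_k$ is the weight factor of component $k$, with the weights summing to 1 ($\sum_{k=1}^{K} w_k = 1$); $\mu_k$ is the Poisson mean of component $k$; and $\alpha_k$ is the dispersion parameter of component $k$.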
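The FMZTNB-K construction described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: it assumes that each component is an NB distribution (with α as the gamma shape parameter) renormalized over y ≥ 1, and that the mixture is the weighted sum of these truncated components. The function names and the component means and dispersion values are hypothetical; only the weight 0.712 echoes the component-1 weight reported later in the paper.

```python
import math


def nb_pmf(y, mu, alpha):
    """Untruncated NB probability (mean mu, gamma shape alpha; assumed parameterization)."""
    if y < 0:
        return 0.0
    log_p = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + mu))
        + y * math.log(mu / (alpha + mu))
    )
    return math.exp(log_p)


def ztnb_pmf(y, mu, alpha):
    """Zero-truncated NB: the NB probability renormalized over y >= 1."""
    if y < 1:
        return 0.0
    p_zero = (alpha / (alpha + mu)) ** alpha
    return nb_pmf(y, mu, alpha) / (1.0 - p_zero)


def fmztnb_pmf(y, weights, mus, alphas):
    """K-component finite mixture of zero-truncated NB distributions."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1")
    return sum(w * ztnb_pmf(y, m, a) for w, m, a in zip(weights, mus, alphas))


# Hypothetical two-component example (means and dispersions are made up).
weights = [0.712, 0.288]
mus = [0.8, 3.5]
alphas = [2.0, 1.2]
total = sum(fmztnb_pmf(y, weights, mus, alphas) for y in range(1, 3000))
print(total)
```

With a single component and weight 1, `fmztnb_pmf` returns the same value as `ztnb_pmf`, and the probabilities over y = 1, 2, 3, ... sum to one.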
In both the FMNB-K and FMZTNB-K models, a link function relates the Poisson mean $\mu$ to the roadway features. It can be seen that when $K = 1$, the FMNB-K and FMZTNB-K models reduce to the NB and ZTNB models, respectively. The FMNB models allow for additional heterogeneity within components that is not captured by the independent variables.
It is important to note that, as the number of components K increases, the FMNB model becomes more flexible. However, it also adds complexity to the parameter estimation. Previous studies have indicated that a two-component finite mixture of NB regression models (FMNB-2) is sufficient to characterize crash data [31][32][33]. Thus, this study considers the two-component finite mixture of zero-truncated NB models (FMZTNB-2) in the analyses.
In terms of parameter estimation, the commonly used maximum likelihood estimation (MLE) algorithm will not generate reliable results because of the complicated likelihood function of the FMZTNB-2 model. An alternative is Gibbs sampling, a Markov chain Monte Carlo (MCMC) method, which has been frequently used in estimating the parameters of finite-mixture models [29,34]. The "rjags" package is used to draw the samples [35], and the FMZTNB-2 MCMC model is developed using JAGS (Just Another Gibbs Sampler) [36]. The truncation is represented using the T(,) construct in JAGS.

Data
This study collected data on 395 rural multilane divided roadway segments, including traffic volume, lane width, average shoulder width, and median width. Three years of wet-road crash data were collected. A wet-road crash is defined as a crash for which the weather condition was rain, snow, or hail, or the surface condition was wet, snowy, icy, or standing water at the time of the crash. The independent variables were chosen based on data availability and on their potential effects on crash occurrence during rainy weather conditions, as reported in the published literature [19][20][21]. Finally, the following six variables were selected from the dataset: segment length, traffic volume, lane width, average outside shoulder width, average inside shoulder width, and median width. Descriptive statistics of the roadway and crash data are presented in Table 1.
It is worth mentioning that the minimum crash count of the sample segments is 1 (see the last row in Table 1), rather than 0. This is because, when the roadway data were collected, only information on segments where at least one crash had occurred was available to the authors. In other words, the safety data are zero-truncated.

Modeling Results
Previous studies have revealed that the commonly used NB model is not applicable for modeling zero-truncated crash data [16,17]. The parameters can be heavily biased, and the results are not reliable. Thus, the conventional NB model is not applied to the data collected in this study. This section presents the results of the ZTNB model and the FMZTNB-2 model separately.

Modeling Result of ZTNB.
The authors developed the ZTNB model with the data described in Section 4. In the functional form linking the mean to the roadway features, μ is the mean of the observed crash data; ADT is traffic volume; LW is lane width (m); OSH is average outside shoulder width (m); ISH is average inside shoulder width (m); MW is median width (m); and β 0 , β 1 , . . . , β 5 are unknown parameters to be estimated. It is important to note that the length of a segment is considered as an offset variable, meaning that the number of crashes is proportional to the segment length. This assumption is consistent with the HSM.
The modeling results of the ZTNB model are shown in Table 2. As can be seen, the parameters for traffic volume, average outside shoulder width, and average inside shoulder width are all statistically significant at the level of 90 percent or higher. Specifically, as the traffic volume increases, the predicted number of wet-road crashes also increases. The parameters for the other three roadway features are all negative, indicating that, with the increase of shoulder width or median width, the predicted number of wet-road crashes will decrease. For example, with a one-meter increase in average outside shoulder width, the predicted number of wet-road crashes will decrease by 14.6 percent (i.e., $1 - e^{-0.158}$). This is expected: as outside shoulders become wider, they provide additional recovery space for vehicles that slide out of the travel lane because of the reduced skid number on rainy days. The results are in line with several previous studies [21,43]. On the other hand, the parameter for lane width is −0.1, and the result is not statistically significant. The dispersion parameter, α, is estimated as 1.615, which is also insignificant.
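The 14.6 percent figure quoted above follows from exponentiating the coefficient. The short Python sketch below reproduces that arithmetic, assuming a log link (exponential mean function), which is what the expression 1 − e^(−0.158) implies; the helper name `percent_change` is illustrative.

```python
import math


def percent_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in the predicted mean for a `delta`-unit increase, assuming a log link."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# Coefficient for average outside shoulder width, as quoted in the text.
beta_osh = -0.158
print(round(percent_change(beta_osh), 1))  # -14.6
```

A negative value indicates a reduction, so a one-meter increase in outside shoulder width corresponds to roughly a 14.6 percent decrease in predicted wet-road crashes, matching the text.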
This study used four goodness-of-fit (GOF) measures to evaluate the model performance: Akaike information criterion (AIC), Bayesian information criterion (BIC), mean absolute error (MAE), and root mean square error (RMSE). The AIC, BIC, MAE, and RMSE for the ZTNB model are 1142.29, 1170.14, 0.64, and 2.52, respectively (see the last four rows in Table 2).
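For readers less familiar with these measures, the following Python sketch computes them using their standard textbook definitions. Several inputs are assumptions rather than values from the paper: the parameter count k = 7 (six regression coefficients plus the dispersion parameter) is inferred from the model description, and the log-likelihood of −564.145 is back-calculated from the reported AIC, not reported by the authors. The small observed and predicted vectors are made up.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def mae(observed, predicted) -> float:
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted) -> float:
    """Root mean square error between observed and predicted counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Assumed inputs: k and log_lik are inferred, not reported in the paper.
log_lik, k, n = -564.145, 7, 395
print(round(aic(log_lik, k), 2), round(bic(log_lik, k, n), 2))

# Toy data for the error measures (illustrative only).
obs = [1, 2, 4]
pred = [1.5, 2.0, 3.0]
print(mae(obs, pred), rmse(obs, pred))
```

Under these assumptions, the computed AIC and BIC agree with the reported ZTNB values of 1142.29 and 1170.14 for the 395 segments. Lower values of all four measures indicate a better fit.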

Modeling Result of FMZTNB-2.
As has been mentioned in Section 3, this study used the MCMC approach to estimate the parameters of the FMZTNB-2 model. Noninformative priors were used for hyperparameters. This study performed 1,000,000 MCMC iterations with two different chains, and the first 20,000 samples of each chain were discarded as burn-in samples from the MCMC outputs. Gelman-Rubin (G-R) convergence statistics and visual history plots were used to verify the MCMC process [44,45]. The functional forms linking the Poisson mean and the roadway features are similar to those of the ZTNB model, except that there is one form for each of the two components, as shown in the following equations.
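The Gelman-Rubin check mentioned above compares the variance between chains with the variance within chains. The Python sketch below implements the basic (non-split) form of the statistic for a single parameter; it is an illustration under that assumption, not the diagnostic code used by the authors, and the simulated chains are synthetic draws rather than MCMC output from the paper.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


rng = random.Random(0)
# Two synthetic chains drawn from the same distribution (should look converged).
same = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
# Two synthetic chains centered at different values (should look unconverged).
shifted = [
    [rng.gauss(0.0, 1.0) for _ in range(5000)],
    [rng.gauss(3.0, 1.0) for _ in range(5000)],
]
r_same = gelman_rubin(same)
r_shifted = gelman_rubin(shifted)
print(r_same, r_shifted)
```

Values close to 1 suggest that the chains have mixed; a common rule of thumb treats values above roughly 1.1 as a sign that more iterations or a longer burn-in are needed.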

Journal of Advanced Transportation
where μ is the mean of the observed crash data; $\mu_{c1}$ and $\mu_{c2}$ are the means of the observations in the two components, respectively; ADT is traffic volume; LW is lane width (m); OSH is average outside shoulder width (m); ISH is average inside shoulder width (m); MW is median width (m); and the β's are parameters to be estimated. The modeling results of the FMZTNB-2 model are documented in Table 3. First, the estimated weight factor for component 1 is 0.712, with a standard error of 0.082. This result is statistically significant, indicating that the sample data include two components. [46][47][48].
Finally, the AIC, BIC, MAE, and RMSE for the FMZTNB-2 model are 1020.54, 1088.32, 0.22, and 2.14, respectively (see the last four rows in Table 3). In addition to model goodness-of-fit, this paper also analyzed the prediction performance of the two models using three sites. The three sites represent relatively low, moderate, and high crash levels, respectively. The predicted crash mean, standard deviation, and 90 percent confidence intervals of the three sites for the two models are tabulated in Table 4. The results indicate that, for the three sites, the predicted crash means (i.e., numbers of wet-weather crashes) of the two models are fairly close (except for the first site, which has a very small crash mean). For site 90, the predicted numbers of crashes from the ZTNB and FMZTNB-2 models are 0.0645 and 0.0627, respectively. Their standard deviation values are 0.2606 and 0.0449, respectively. The crash predictions with the FMZTNB-2 model have significantly lower standard deviation values and narrower intervals, indicating that the model has higher prediction accuracy.
The FMZTNB-2 model shows superiority in modeling the wet-weather crash data. First, the FMZTNB-2 model fits the dataset better than the ZTNB model in terms of GOF measures (e.g., AIC, BIC, MAE, and RMSE). Second, the predictions using the FMZTNB-2 model have lower standard deviations and narrower prediction intervals.

Conclusions
Inclement weather increases both crash risk and travel time. Efforts have been made in the past decades to predict the occurrence of traffic crashes. However, very few of the previous studies have focused on predicting wet-road crashes. Most of the commonly used crash prediction models require complete roadway inventory, traffic, and crash data. Missing data are relatively common in developing countries. The primary objective of this study is to analyze zero-truncated crash data and predict the number of wet-road crashes. To better capture the heterogeneity of wet-road crash data, this study developed the two-component finite-mixture zero-truncated negative binomial model. The model is applied to three-year wet-road crash data on 395 rural divided roadway segments. The model results are compared with those based on the zero-truncated negative binomial model. Comparison indicates that the proposed FMZTNB-2 model fits the wet-road crash data better than the ZTNB model. It is worth mentioning that the wet-weather crash data were not modeled with the conventional NB model, since previous studies have demonstrated that applying the NB model to truncated data is not recommended. There are trade-offs in using ZTNB or FMZTNB models in crash analyses. With zero-truncated data, the sample size is smaller than that of the full data. The reduced sample size might increase the uncertainty of parameter estimates.
There are some limitations to this study. First, only a limited number of roadway characteristics (i.e., segment length, lane width, shoulder width, and median width) and traffic data are available to the authors. There are other factors affecting the occurrence of wet-road crashes (e.g., precipitation, number of rainy days per year, and surface skid number). Unfortunately, they are not accessible to the authors. Second, previous studies have shown that varying forms of the dispersion parameter and weight factor for the components in finite-mixture models improve both crash prediction and hotspot identification [33,37][49][50][51]. In this study, a fixed dispersion parameter and weight factor were used to simplify the parameter estimation process. In the future, it is necessary to collect more data, especially those closely related to wet-road crashes, and to examine whether varying forms of the dispersion parameter and weight factor will further improve the model performance. Finally, the finite-mixture model provides better results than the previously proposed zero-truncated model (e.g., in goodness-of-fit and prediction). However, parameter estimation with the FMZTNB-2 model requires MCMC, which increases the computational time and may be challenging for practitioners. The parameter estimation method for the FMZTNB-2 model needs to be further simplified in the future.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.