Distinguishing between Rural and Urban Road Segment Traffic Safety Based on Zero-Inflated Negative Binomial Regression Models

In this study, the traffic crash rate, total crash frequency, and injury and fatal crash frequency were considered in distinguishing between rural and urban road segment safety. GIS-based crash data covering four and a half years in the Pikes Peak Area, US, were used for the analyses. The comparative statistical results show that crash rates in rural segments are consistently lower than those in urban segments. Further, the regression results based on Zero-Inflated Negative Binomial (ZINB) regression models indicate that urban areas have a higher crash risk than rural areas in terms of both total crash frequency and injury and fatal crash frequency. Additionally, it is found that crash frequencies increase as traffic volume and segment length increase, though higher traffic volume lowers the likelihood of severe crash occurrence; compared to 2-lane roads, 4-lane roads have lower crash frequencies but a higher probability of severe crash occurrence; and better road facilities with higher free flow speed benefit from high-standard design features, resulting in a lower total crash frequency, but they do not mitigate the severe crash risk.


Introduction
Previous studies have focused on distinguishing between rural and urban traffic safety using traffic crash data, but the influence of rural or urban settings on segment safety remains controversial. Fatal traffic crash research indicated that fatality rates in rural areas are higher than in urban areas [1-3]. The higher fatality and injury rates in rural road facilities segments.

Before analyzing segment crashes, the crashes at intersections were separated from the databases. Thus, 200-ft intersection buffers were first created, and the crashes within these intersection buffers were deleted from the segment crash analyses. Then, with a road segment layer separated from the road network geodatabase, the crashes associated with segments needed to be further separated from all other crashes. Because these segments may have wide cross-sections, a 150-ft buffer on both sides of an arterial centerline was adopted to capture most crashes associated with the segments only. After the 150-ft buffers were created, the crashes within these buffers were selected and aggregated in their corresponding segments.
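The two buffering steps described above can be illustrated with a small planar-geometry sketch. It is not the GIS workflow used in the study: the coordinates are hypothetical and expressed in feet, the function names are invented for illustration, and assigning each crash to the single nearest centerline is an assumption made here to avoid double counting.

```python
import math

INTERSECTION_BUFFER_FT = 200.0  # buffer radius around intersections (from the text)
SEGMENT_BUFFER_FT = 150.0       # buffer on each side of a centerline (from the text)


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (planar coordinates in feet)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_crashes(crashes, segments, intersections):
    """Return {segment_id: crash_count} after removing intersection crashes."""
    counts = {seg_id: 0 for seg_id in segments}
    for crash in crashes:
        # Step 1: drop crashes inside any 200-ft intersection buffer.
        if any(math.dist(crash, node) <= INTERSECTION_BUFFER_FT for node in intersections):
            continue
        # Step 2: attach the crash to the nearest centerline if it lies within 150 ft.
        seg_id, dist = min(
            ((sid, point_segment_distance(crash, a, b)) for sid, (a, b) in segments.items()),
            key=lambda item: item[1],
        )
        if dist <= SEGMENT_BUFFER_FT:
            counts[seg_id] += 1
    return counts


# Hypothetical toy network: two segments meeting at one intersection.
segments = {"S1": ((0, 0), (1000, 0)), "S2": ((0, 0), (0, 1000))}
intersections = [(0, 0)]
crashes = [(50, 10), (500, 100), (500, 400), (20, 600)]
counts = assign_crashes(crashes, segments, intersections)
print(counts)  # {'S1': 1, 'S2': 1}
```

In this toy case the first crash falls inside the intersection buffer and the third lies farther than 150 ft from either centerline, so only two crashes are aggregated to segments.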
Because different categories of road facilities vary in highway design, traffic operation, and environment, the crash data associated with a specific type of highway needed to be separated from those of other types. In this study, the crash risk was calculated and analyzed not only for the overall segment network but also for interstates, expressways, principal arterials, and minor arterials, respectively. Segments belonging to other road types were excluded. The combined data set was further organized according to the following criteria.
(i) The accidents were divided into three categories by severity: fatal, injury, and property-damage-only (PDO).
(ii) Road segments with 2 and 4 lanes were selected, because 6-lane segments exist in urban areas only.
(iii) ADT was expressed in thousands of vehicles, because the change in crash frequency per increment of one vehicle is too small to be meaningful.
The cleaned accident data were overlaid with the GIS-based network and distributed into each segment in rural and urban areas. The segments were first analyzed and compared in terms of crash rate based on the comparative statistics of the four types of road segments. Then, ZINB models for segment crash frequency analyses and predictions were developed; the variables of these models are described in Table 1.

Zero-Inflated Negative Binomial Regression
A Poisson crash frequency model assumes that the observed crash count $y_i$, given the vector of covariates $x_i$, follows a Poisson distribution. The density function of $y_i$ can be expressed as follows:

$$P(y_i \mid x_i) = \frac{\exp(-u_i)\, u_i^{y_i}}{y_i!},$$

where the parameter $u_i$, the conditional mean number of events given the covariates $x_i$, is given by

$$u_i = E(y_i \mid x_i) = \exp(x_i \beta),$$

where $\beta$ is a $(k+1) \times 1$ parameter vector ($\beta_0$ is the coefficient for the intercept, and $\beta_1, \beta_2, \ldots, \beta_k$ are for the $k$ regressors).

In the Poisson regression, the conditional variance of the count variable is equal to the conditional mean as follows:

$$\operatorname{Var}(y_i \mid x_i) = E(y_i \mid x_i) = u_i = \exp(x_i \beta),$$

where $x_i$ is the vector of covariates describing the road segment geometric and traffic features in each record, including the intercept, and $u_i$ is the conditional mean of the crash frequency $y_i$. Since this assumption contradicts the fact that vehicle accident data are typically significantly overdispersed relative to their mean, the NB regression model was developed with a heterogeneity component accounting for unobserved heterogeneity in the crash count data as follows:

$$E(y_i \mid x_i, \varepsilon_i) = \exp(x_i \beta + \varepsilon_i) = u_i \exp(\varepsilon_i),$$

where $\beta$ is the vector of parameter coefficients to be estimated for the independent variables, including the intercept, and $\exp(\varepsilon_i)$ is a heterogeneity component accounting for unobserved heterogeneity in the crash count data, which is independent of $x_i$.

However, there is often a large density of zeros in crash count data, which cannot be accurately predicted by traditional NB models. For this situation, zero-inflated regression models were developed in the crash frequency-related research area. Zero-inflated count models provide a way of modeling the excess zeros in addition to allowing for overdispersion. For each road segment, there are two possible data generation processes. Process 1 is chosen with probability $\omega_i$ and process 2 with probability $1 - \omega_i$. Process 1 generates only zero counts, whereas process 2 generates counts from either a Poisson or a negative binomial model. In this paper, the probability $\omega_i$ depends on the geometric and traffic features of segment $i$ and is obtained from the logistic function $F$ as follows:

$$\omega_i = F(z_i \gamma) = \frac{\exp(z_i \gamma)}{1 + \exp(z_i \gamma)},$$

where $z_i$ is the vector of independent variables specified in the logistic regression model (road facility and traffic features and the intercept), and $\gamma$ is the vector of zero-inflated coefficients to be estimated. The probability of crash frequency for segment $i$ can be expressed as follows:

$$P(y_i \mid x_i, z_i) = \begin{cases} \omega_i + (1 - \omega_i)\, g(0 \mid x_i), & y_i = 0, \\ (1 - \omega_i)\, g(y_i \mid x_i), & y_i > 0, \end{cases}$$

where $g(y_i \mid x_i)$ follows either the Poisson distribution or the NB distribution, and $x_i$ is the vector of covariates of observation $i$ specified in the model. In this study, ZINB models were used for regression efforts because zero-crash segments account for more than 40% of the total data.
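The two-process mixture described in this section can be sketched numerically as follows. The sketch assumes the common NB2 parameterization, in which the variance equals the mean plus alpha times the squared mean; the text does not state which parameterization was used, and the function names and input values are hypothetical.

```python
import math


def nb_pmf(y, mean, alpha):
    """NB2 probability of count y, assuming Var = mean + alpha * mean**2."""
    r = 1.0 / alpha  # inverse dispersion
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mean))
        + y * math.log(mean / (r + mean))
    )
    return math.exp(log_p)


def logistic(v):
    """Logistic function F used for the zero-inflation probability."""
    return 1.0 / (1.0 + math.exp(-v))


def zinb_pmf(y, mean, alpha, omega):
    """Zero-inflated NB: always-zero process with prob. omega, NB process otherwise."""
    base = (1.0 - omega) * nb_pmf(y, mean, alpha)
    return omega + base if y == 0 else base


# Hypothetical values: NB mean of 2 crashes, alpha = 1.4, zero-inflation prob. 0.3.
p_zero_nb = nb_pmf(0, 2.0, 1.4)
p_zero_zinb = zinb_pmf(0, 2.0, 1.4, 0.3)
total = sum(zinb_pmf(y, 2.0, 1.4, 0.3) for y in range(500))
print(p_zero_nb < p_zero_zinb, round(total, 6))
```

The zero probability under the mixture is always at least as large as under the plain NB model, which is how the zero-inflated form absorbs excess zeros.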

Comparative Statistical Analyses of Rural and Urban Traffic Safety
During the observation period of four and a half years, 9651 crashes occurred in the study areas, consisting of 1057 records in rural segments and 8594 records in urban segments. Among the crashes in the rural segments, there were 15 fatal and 176 injury crashes. In urban areas, there were 46 fatal and 1038 injury crashes. Table 2 shows the descriptive statistics for rural and urban segment lengths, which indicate that the average length of rural segments (0.968 mile) is greater than that of urban segments (0.293 mile) because of a lower density of intersections in rural networks. Figure 1 displays the road segment crash rate distribution, calculated as the number of crashes per 100 million VMT, where the double line is the boundary between rural and urban areas. It shows that the percentage of segments with higher crash rates is greater in the urban region than in rural areas.
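As a worked illustration of the two exposure-based rates used in this section, the sketch below applies the conventional definitions: crashes per 100 million VMT, with VMT taken as ADT times 365 days times years times segment length, and crashes per lane-mile-year. These formulas are assumed here because the text does not spell them out, and the segment values are hypothetical.

```python
def crashes_per_100m_vmt(crashes, adt, length_mi, years):
    """Crash rate per 100 million vehicle miles traveled (VMT)."""
    vmt = adt * 365.0 * years * length_mi
    return crashes * 1e8 / vmt


def crashes_per_lane_mile_year(crashes, lanes, length_mi, years):
    """Crash rate per lane-mile per year."""
    return crashes / (lanes * length_mi * years)


# Hypothetical segment: 9 crashes in 4.5 years, ADT 10,000, 1 mile long, 2 lanes.
rate_vmt = crashes_per_100m_vmt(9, 10_000, 1.0, 4.5)
rate_lane = crashes_per_lane_mile_year(9, 2, 1.0, 4.5)
print(round(rate_vmt, 2), rate_lane)  # 54.79 1.0
```

Because both rates divide by segment length, very short segments with a single crash can produce large values, which is worth keeping in mind when comparing short urban segments with longer rural ones.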
Table 3 displays the t-test statistics of the rural and urban segment comparison for different types of facilities. It shows that, in 2-lane segments, there is a significant difference between rural and urban crash rates measured both as crashes per lane-mile-year and as crashes per 100 million VMT. The crash rates in rural segments are consistently lower than those in urban segments. The 2-lane expressway is an exception, mainly because of the small sample size of 2-lane rural expressways. However, there is no statistically significant difference between rural and urban 4-lane arterial segments.
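A two-sample comparison of this kind can be sketched as below. The text does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption, and the crash-rate samples are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical crash rates for a few rural and urban segments.
rural_rates = [1.0, 2.0, 3.0]
urban_rates = [4.0, 6.0, 8.0]
t_stat, dof = welch_t(rural_rates, urban_rates)
print(round(t_stat, 3), round(dof, 3))
```

A negative statistic here means the rural mean is below the urban mean; judging significance still requires comparing it with a t distribution with the computed degrees of freedom.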

ZINB Regression Analyses
The crash frequency distribution histogram (Figure 2) clearly illustrates that there are excessive zeros (over 40%) in the crash data. The P values in the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling normality tests are all less than 0.05. Therefore, the null hypothesis of normality is rejected, and the crash data do not follow the normal distribution. The ZINB models are thus suitable for the regression analyses of the crash count data. ZINB models were developed using the software SAS 9.2. We chose the crash frequency in a segment (Num crsh) as the dependent variable, and the regressors included segment length (length), number of lanes (Numberofla), average annual daily traffic in thousands (ADT 1000), free flow speed (FFS), and rural or urban setting (RoU). The segment type was not considered in this model since it was highly correlated with FFS and RoU.
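Before fitting a zero-inflated model, the share of zeros and the degree of overdispersion can be checked with simple descriptive statistics. The sketch below is an illustrative check and not the SAS procedure used in the study; the counts are hypothetical, and the method-of-moments estimate of alpha assumes the NB2 variance form, mean plus alpha times the squared mean.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Summarize zero share and overdispersion of a list of crash counts."""
    m = mean(counts)
    v = variance(counts)
    return {
        "zero_share": sum(1 for c in counts if c == 0) / len(counts),
        "mean": m,
        "variance": v,
        # Share of zeros a Poisson model with the same mean would predict.
        "poisson_zero_share": math.exp(-m),
        # Method-of-moments dispersion under Var = mean + alpha * mean**2.
        "alpha_moments": max(0.0, (v - m) / (m * m)),
    }


# Hypothetical segment crash counts with half of the segments at zero.
diag = count_diagnostics([0, 0, 0, 0, 0, 1, 2, 3, 6, 8])
print(diag["zero_share"], round(diag["poisson_zero_share"], 3), round(diag["alpha_moments"], 3))
```

An observed zero share far above the Poisson prediction, together with a variance well above the mean, is the pattern that motivates a zero-inflated negative binomial specification.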
Table 4 shows the parameter estimates of the ZINB model for total crash frequency in a segment; only significant variables (P < 0.05) were included in the model. The ZINB model parameter estimates comprise two parts: NB regression and logistic regression. In the NB regression part, the number of lanes, rural or urban setting, ADT, length, and FFS are all significantly correlated with the number of crashes. Further, the dispersion parameter Alpha in Table 4 is 1.435, with a P value less than 0.001, displaying a very strong overdispersion effect and indicating the superiority of the ZINB model over the zero-inflated Poisson (ZIP) model. ADT 1000 and LENGTH are positively associated with crash frequency, suggesting that crash frequencies increase with traffic volume and segment length. These results are consistent with many previous research conclusions [7, 9, 27]. FFS is negatively associated with crash frequency, indicating that crash frequencies decrease as roadway free flow speed increases. Since FFS is correlated with the design standard of road facilities, this is more appropriately interpreted as showing that a better road facility with higher FFS has a lower crash rate than facilities with lower FFS. In this study, FFS can be treated as a surrogate for the speed limit, but it reflects the actual traffic operation status in road segments more accurately than the speed limit does. Previous research findings are less conclusive about the impact of speed limit on crash frequency [28]. In addition, 4-lane roadways were found to be associated with a lower number of crashes than 2-lane roadways in this model. This is reasonable because the comparison assumes the same traffic exposure, so that segments with 4 lanes have a lower traffic volume per lane. More importantly, urban regions appear to have a higher crash frequency than rural areas, which is consistent with the results of the crash rate analyses.

The logistic regression part of the model predicts the likelihood of zero crash occurrences. The modeling results reveal that the variables ADT 1000 and LENGTH are significant in estimating the probability of segments belonging to the zero crash occurrence group. According to the estimated parameter coefficients, the higher the traffic exposure (AADT in thousands and segment length), the lower the probability of zero crash occurrences, which is consistent with previous study conclusions.

Furthermore, Table 5 shows the parameter estimates of the ZINB model for injury and fatal crash frequency in a segment (Alpha is 1.074, with a P value less than 0.001). The NB regression indicates that Numberofla, RoU, ADT 1000, and LENGTH are significant variables for predicting injury and fatal crash frequency, a result very similar to that for total crash frequency except for FFS. It implies that although better road facilities with higher FFS benefit from high-standard design features, resulting in a lower total crash frequency (as shown in Table 4), they would not mitigate the severe crash risk. A previous study reported that, controlling for the other factors, increasing the operating speed in road segments by 1% alone would result in approximately a 2% increase in the injury crash rate and a 4% increase in the fatal crash rate [29]. On the other hand, compared to the total crash frequency model, the logistic regression results for the injury and fatal crash frequency model are quite different, though the effect of LENGTH remains similar. First, the number of lanes is a significant variable for estimating the probability of zero injury and fatal crash occurrence in a segment. Compared to 2-lane roads, 4-lane roads have a lower severe crash frequency but also a lower probability of zero crashes. A possible explanation is that lane-changing maneuvers in 4-lane segments increase the severe crash risk. Second, the effect of ADT 1000 in the logistic regression of the injury and fatal crash model is the reverse of that in the total crash model. It shows that as traffic volume increases, the likelihood of zero severe crashes increases. This interesting finding is consistent with the previous conclusion of a crash severity study, which explains that lower ADT could mean higher speeds that more often lead to severe or fatal crashes [30].
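To make the interpretation of the two model parts concrete, the sketch below computes the expected crash count of a segment from a ZINB specification, using the standard identity that the expected count equals one minus the zero-inflation probability, times the NB mean. All coefficient values are hypothetical placeholders whose signs merely mirror the directions reported for the total crash model; they are not the estimates in Tables 4 and 5, and the feature names are invented for this example.

```python
import math

# Hypothetical coefficients for illustration only; NOT the estimates in Tables 4 and 5.
NB_COEF = {"intercept": -1.0, "adt_1000": 0.05, "length": 0.8,
           "ffs": -0.02, "four_lane": -0.3, "urban": 0.5}
ZI_COEF = {"intercept": 1.0, "adt_1000": -0.1, "length": -1.5}


def linear_predictor(coef, features):
    """Intercept plus the weighted sum of the features named in coef."""
    return coef["intercept"] + sum(
        b * features[name] for name, b in coef.items() if name != "intercept"
    )


def expected_crashes(features):
    """ZINB expected count: (1 - omega) * exp(x * beta)."""
    nb_mean = math.exp(linear_predictor(NB_COEF, features))
    omega = 1.0 / (1.0 + math.exp(-linear_predictor(ZI_COEF, features)))
    return (1.0 - omega) * nb_mean


# Two hypothetical segments that differ only in the rural/urban indicator.
rural = {"adt_1000": 12.0, "length": 0.5, "ffs": 45.0, "four_lane": 0, "urban": 0}
urban = dict(rural, urban=1)
ratio = expected_crashes(urban) / expected_crashes(rural)
print(round(ratio, 4), round(math.exp(NB_COEF["urban"]), 4))
```

Because the rural/urban indicator enters only the count part in this sketch, the ratio of expected crashes equals the exponentiated coefficient, which is the usual incidence-rate-ratio reading of an NB coefficient.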

Conclusion and Discussions
Numerous studies have sought to clarify the role of rural or urban settings in segment safety, but no consensus has been reached. Before reaching common agreement on the difference between rural and urban traffic safety, it is important to clarify the definition of "rural." Generally, to distinguish them from urban environments, rural areas are characterized by attributes associated with demographic features (e.g., low population size and density, location outside the boundary of an urban area), economic status (low economic indicators, farming, and agriculture), social structure (e.g., intimate, informal, and homogeneous forms of social interaction, limited social resources), cultural characteristics (e.g., traditional, conservative, provincial, slow to change), and so forth. The above features are often used to explain the statistical fact that the death rate from many common causes in the US is significantly higher in rural than in urban areas [1, 6], as is also the case in other countries [31-33]. However, these thresholds should not be universally applied in local transportation safety analyses. In many developed regions, although districts are clearly separated into rural and urban regions according to their demographic, economic, or social attributes, the transportation facilities are well connected to each other and form more standardized road networks. Thus, it was reported that there are relatively high numbers of crashes in urban regions because the heavy traffic volume and complex driving environments in urban areas lead to more conflicts between vehicles [34]. Therefore, for a specific safety evaluation project, this study supports the argument that more detailed crash risk comparisons between rural and urban road segments should be performed at a comparable level. In this paper, the crash rate comparison and ZINB regression for both total crash frequency and injury and fatal crash frequency in road segments were conducted to discriminate between rural and urban traffic safety. It was found that, compared to urban areas, rural areas show lower crash rates, total crash frequencies, and injury and fatal crash frequencies. The results based on the ZINB regression models also showed the following.
(i) Segment crash frequencies increase as traffic volume and segment length increase. However, higher traffic volume will lower the likelihood of severe crash occurrence.
(ii) Compared to 2-lane roads, the 4-lane roads have a lower crash frequency but have a higher probability of severe crash occurrence.
(iii) Better road facilities with higher free flow speed benefit from high-standard design features, resulting in a lower total crash frequency, but would not mitigate the severe crash risk.
Finally, it can be concluded that, in the research area, the traffic safety of rural segments is better than that of urban segments, which implies that priority for traffic safety improvement should be given to urban highway segments.

Figure 1: Road segment crash rate distribution in terms of the number of crashes per 100 million VMT.

Table 1: ZINB models for segment crash frequency analyses and predictions.

Table 2: Original statistics for the length and mileage.

Table 3: t-test statistics for rural and urban segment comparison.

Table 4: Parameter estimates of ZINB model for total crash frequency.

Table 5: Parameter estimates of ZINB model for injury and fatal crash frequency.