Evaluating Effect of Operating Speed on Crashes of Rural Two-Lane Highways

Crashes on a roadway are influenced by


Background
According to a study conducted in the United States, rural two-lane highways account for 76% of the overall paved road mileage [1]. In Kentucky, a substantial proportion of roadway crashes occurs on these roadways: they account for 40% of all crashes, 47% of injury crashes, and 66% of fatal crashes on state-maintained roads [2,3]. Various factors lead to roadway crashes, including roadway geometric conditions, traffic volume, environmental conditions, and speed characteristics. Of these factors, speed is often cited as a primary cause of crashes [4].
The traditional approach in the Highway Safety Manual (HSM) incorporates annual average daily traffic (AADT) and segment length as the base conditions for crash prediction, which can be further adjusted for different road geometric attributes [5]. Multiple studies have provided empirical evidence supporting the relationship between speed and traffic crashes [4,6–11]. Furthermore, these studies have recommended the incorporation of speed as one of the variables in crash prediction models [11–17]. The speed considered in the analysis may represent either individual driver speed [4,6,7,9,10] or aggregated roadway speed, depending on the purpose of the analysis [8,11–13,15]. However, such analyses are predominantly carried out on routes with high levels of traffic, such as interstates and arterials. The correlation between speed and safety on rural two-lane highways has primarily been examined within the framework of geometric design consistency [18–22]. Geometric design consistency refers to the uniformity and predictability of road features, such as curves, slopes, and intersections, which can affect driver behavior and safety. In particular, the 85th percentile speed serves as a metric for assessing design consistency across different segments. Because measured data are often limited, speed is frequently estimated through models.
In recent times, there has been a notable increase in the availability of measured speed data, especially on higher functional class roads. This has led to many studies examining the association between measured speed and crash frequency on these roads [23–34]. Some studies observed that considering speed variables in crash prediction models can lead to improved performance when compared to traditional approaches [26]. One recent study utilized measured speeds from Ohio and Washington while developing crash prediction models for a wide range of roadways and found certain operating speed-related measures to be significant when modeling total crashes, fatal and injury crashes, and property damage crashes [24]. Another study by Das et al. developed crash modification factors (CMFs) using several speed metrics for evaluating the safety effectiveness of a countermeasure specific to speed. They considered different levels of data aggregation in their analysis [25]. Further studies were carried out to examine how the speed and crash relationship varies depending on the data aggregation approach used [33,34].
Previous literature has extensively investigated the role of speed among the contributing factors of crash occurrence. This highlights the importance of considering speed when assessing the crashes at a particular location. Nevertheless, the significance of speed is yet to be systematically explored in relation to crashes that occur on rural two-lane highways, which are less traveled and have limited speed data available while constituting a significant portion of the nation's roadways. Neglecting the importance of speed in rural two-lane crash studies may result in incorrect decision-making during the selection of safety countermeasures and roadway design processes. This, in turn, can have a significant impact on the investment made by Departments of Transportation (DOTs) in highway projects aimed at reducing crashes.
The objective of this study is to investigate the significance of speed in relation to crashes on rural two-lane highways. To achieve this, the authors develop a model to predict crashes on these roads. This model incorporates speed as a factor, utilizing aggregated speed metrics at the segment level, including average speed and the 85th percentile speed. The significance of speed is evaluated across different speed ranges. Such analyses offer insights into ways to enhance the model's performance. The subsequent sections of the paper are structured in the following manner: In Section 2, an overview of the data sources is provided. Section 3 presents a zero-inflated negative binomial model to estimate the crash frequency as a function of AADT, length, and speed. In Section 4, the performance of the model is analyzed and ways to enhance its performance are discussed. In Section 5, a summary of findings and future research direction concludes the paper.

Data Collection and Preparation
The study utilized rural two-lane highway segments in Kentucky. Datasets on roadway, speed, and crashes were collected for these roads. The crash datasets used in this study were obtained from the Kentucky State Police collision database, covering the time frame from 2013 to 2017. In addition, the roadway geometry data and traffic counts were collected from the Highway Information System (HIS) maintained by the Kentucky Transportation Cabinet (KYTC). The crashes were further linked to the homogeneous segments of roads based on attributes such as traffic counts, functional classes, horizontal curves, shoulders, and grades [35].
Following the study by Ng, crashes that occurred within a distance of one hundred feet of intersections were classified as intersection crashes [22]. These crashes were excluded from the dataset since it is more likely that they were caused by a different combination of contributing factors. While the HSM recommends 250 ft for intersection-related crashes, this value can be too restrictive for this study considering the low-volume condition on most of the segments. Furthermore, as suggested by Hauer and Bamfo, we also excluded the segments that were shorter than 0.1 miles [36].
Speed data from GPS-based probes were collected for the years 2015–2017. These data were obtained from a third-party data vendor, HERE Technologies [37]. The data were available in 5-minute epochs for each day and in both directions of the study segments, whenever probes were observed. These speeds were referenced to the HERE road network, which was then conflated with the homogeneous segments to create a spatial linkage among the speed, roadway attribute, and crash datasets. Details on the conflation process are documented by Zhang and Chen [38]. Subsequently, a screening process was conducted to assess the adequacy of the speed data, ensuring that only segments containing enough data were included in the analysis. To identify the minimum required sample size of the speed data for each segment, this study used equation (1) by Li et al. [39]. Such a method is commonly used to estimate a reasonable sample size for collected traffic data to be within an allowable error range by incorporating data dispersion [39,40]:

$$n = \left(\frac{Z\sigma}{\varepsilon}\right)^{2} \tag{1}$$

where $n$ is the minimum sample size, the value of $Z$ is 1.96 for a 95% confidence interval, $\sigma$ is the standard deviation from the speed data, and the allowable error value, $\varepsilon$, is set to 5 units. The estimated minimum sample sizes for each segment were compared with the available speed data, and only the segments meeting the minimum sample sizes of speed data were included in this study. Note that daytime speed data from 6 am to 8 pm were used, as nighttime data could be sparse in some rural areas. For each segment, we calculated aggregated speed metrics, specifically the average speed and the 85th percentile speed, by utilizing the 5-minute epoch speed data available during the daytime period of 2015–2017. After all preprocessing steps, the final dataset contained 44,008 segments with 93,820 crashes recorded over a 5-year period in both directions of the road. The segments collectively encompass 21,240 centerline miles of rural two-lane segments in Kentucky, as depicted in Figure 1.
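The screening and aggregation steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the use of the population standard deviation, and the linear-interpolation percentile are all assumptions, and only the constants (Z = 1.96, allowable error of 5 units) come from the text.

```python
import math
from statistics import mean, pstdev

Z_95 = 1.96             # Z value for a 95% confidence interval (from the text)
ALLOWABLE_ERROR = 5.0   # allowable error, in speed units (from the text)


def min_sample_size(speeds, z=Z_95, err=ALLOWABLE_ERROR):
    """Minimum number of speed observations, n = (Z * sigma / err)^2, rounded up."""
    sigma = pstdev(speeds)  # assumption: population standard deviation
    return math.ceil((z * sigma / err) ** 2)


def percentile(values, q):
    """q-th percentile using linear interpolation between order statistics (assumed rule)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def segment_speed_metrics(speeds):
    """Return average and 85th percentile speed, or None if the segment fails screening."""
    n_min = min_sample_size(speeds)
    if len(speeds) < n_min:
        return None  # not enough epochs to meet the allowable error
    return {"v_avg": mean(speeds), "v_85": percentile(speeds, 85), "n_min": n_min}
```

A segment with widely dispersed speeds needs more epochs to pass the screen than one with tightly clustered speeds, which is why the standard deviation enters the threshold.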

Methodology
This section outlines the methodology employed in the development of the model for predicting crashes on rural two-lane highways in this study. Multiple models were explored with separate speed measures to identify the most reasonable measure for explaining how speed affects the crashes on these roads.
3.1. Zero-Inflated Negative Binomial Model.
Since crashes are infrequent, it is likely that a significant proportion of instances in the dataset contain zero observed crashes. The threshold for determining the percentage of zero observations that warrants the use of zero-inflated (ZI) models remains debatable [41–45]. Existing literature has employed such models with zero observations ranging from 11% to 62% [41–45]. In our dataset, approximately 40% of rural two-lane segments had no observed crashes, making it necessary to address the overdispersion issue caused by excess zeros. To tackle this, we utilized the zero-inflated negative binomial (ZINB) model, a statistical approach that has demonstrated a good statistical fit in previous studies [46]. It is important to note that certain studies argue against the use of zero-inflated models, claiming that the high percentage of zero-crash sites is not due to inherently safe and unsafe sites but rather results from specific conditions such as a mix of low exposure, high heterogeneity, and high-risk crash sites [47–49]. In addition, issues such as short time or small spatial scales of analysis, missing or misreported crash data, or omitted key variables in the model are cited as potential factors contributing to the high percentage of zero crashes [48]. However, there are studies that advocate for considering ZI models for crash count modeling. These studies suggest that ZI models do not make assumptions about roads being inherently safe or unsafe but instead take into account the possibility of observing zero crashes [46,50]. Furthermore, it is important to highlight that the main goal of model selection is to determine a model that effectively fulfills the research objectives, rather than seeking the ultimate "true" model [46]. Given the objectives of our study as well as the long time period and large spatial scale of the data collected, the ZINB model is considered a reasonable choice to effectively model crashes in this study.
ZINB is formed by integrating a logit model and a negative binomial (NB) model [51]. The logit model is associated with excess zero crash occurrences, whereas the NB model generates the crash frequency in a segment, including instances of zero crash occurrences, based on a binomial process. If we indicate the likelihood of a crash frequency generated by the logit model as $p_i$, then the likelihood of the crash frequency produced by the NB model can be represented as $(1 - p_i)$. In ZINB, the parameter $p_i$ is commonly estimated by employing a logistic regression model that incorporates explanatory variables [52]. In this study, we considered AADT and length of the segment ($L$) as the independent variables in addition to the speed measure ($V$), following existing practices [52,53]. Here is the equation showing the logistic regression model: In equation (2), the term $p_i/(1 - p_i)$ denotes the odds associated with the crash frequency resulting from the logit model. In particular, it represents the ratio between the likelihood of the crash frequency from the logit model and the likelihood of the crash frequency from the NB model. The equation also includes an intercept term, $c_0$, along with regression coefficients $c_1$, $c_2$, and $c_3$. The calculation of the likelihood of the zero crash frequency from the logit model can be adjusted as follows (equation (3)). A $p_i$ value close to 1 indicates that segment $i$ is unlikely to experience any crashes and is hence considered a safe segment: Now, the distribution of ZINB can be used to express the likelihood of the crash frequency, $Y_i$, on segment $i$ [54]:

Journal of Advanced Transportation
where $\Gamma$ is the gamma function, $\alpha$ is the overdispersion parameter estimated using equation (5), and $\mu_i$ refers to the mean of the underlying distribution of NB, which can be expressed as a function of the independent variables, as shown in equation (6): Here, $\mathrm{Var}(Y_i)$ is the variance of $Y_i$, and $\mu_i$ is calculated from the following equation: Here, $\mu_i$ represents the expected crash frequency in 5 years. Besides speed measures, we included AADT and length as the independent variables, similar to previous studies [19,26,27,32,55,56]. The equation also includes an error term $\varepsilon_i$ that follows a gamma distribution, as well as regression coefficients $\beta_1$, $\beta_2$, and $\beta_3$, which are to be estimated.
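To make the structure of the ZINB model concrete, the following Python sketch evaluates a zero-inflated negative binomial probability using only the standard library. It is offered as an illustration under stated assumptions, not as the authors' exact specification: the NB2 parameterization (variance $\mu + \alpha\mu^2$), the intercept terms `c[0]` and `beta[0]`, the log transforms inside the logit part, and all function names are assumptions, and the random error term $\varepsilon_i$ is omitted.

```python
import math


def zero_inflation_prob(c, aadt, length, speed):
    """Logit part: ln(p / (1 - p)) = c0 + c1*ln(AADT) + c2*ln(L) + c3*V (assumed form)."""
    eta = c[0] + c[1] * math.log(aadt) + c[2] * math.log(length) + c[3] * speed
    return 1.0 / (1.0 + math.exp(-eta))


def nb_mean(beta, aadt, length, speed):
    """NB mean: mu = exp(b0 + b1*ln(AADT) + b2*ln(L) + b3*V) (assumed log-link form)."""
    return math.exp(
        beta[0] + beta[1] * math.log(aadt) + beta[2] * math.log(length) + beta[3] * speed
    )


def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and overdispersion alpha (NB2)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, p, mu, alpha):
    """ZINB probability: extra mass p at zero, NB counts with weight (1 - p)."""
    nb_part = (1.0 - p) * nb_pmf(y, mu, alpha)
    return p + nb_part if y == 0 else nb_part


def expected_crashes(p, mu):
    """Mean of the ZINB distribution."""
    return (1.0 - p) * mu
```

Under this form, a zero count can arise from either component, which is why the zero probability is the sum of the logit mass and the NB probability of zero.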

Variable Selection.
By utilizing the 5-minute epoch speed data collected over a span of 3 years during the daytime, several speed metrics were computed for each direction of the segments. These include average speed ($V_{avg}$), the 85th percentile speed ($V_{85}$), the difference between average speed and speed limit ($V_{avg} - V_{sl}$), and the difference between the 85th percentile speed and speed limit ($V_{85} - V_{sl}$). The metrics were aggregated from both directions of a segment, and crashes were summed up. The ZINB model was utilized to examine each of the speed variables, together with AADT and length, as provided in equation (7). However, it should be noted that the model did not include geometric attributes such as lane width and shoulder width, as these variables indicated a high correlation with AADT based on the Pearson correlation coefficient: In equation (7), we applied a natural logarithm transformation to AADT and $L$, as they exhibited a skewed distribution. No transformations were considered necessary for the speed measures due to their normal distribution.
Table 1 displays the descriptive information for the independent variables (i.e., AADT, length, and speed metrics) and the dependent variable, which is the crash frequency observed over a period of 5 years. It is noteworthy that the dataset includes segments with low average speeds, which can be attributed to highly restrictive geometric conditions, such as narrow lanes and sharp curvature. Furthermore, the study data contain 14 segments with very low speed limits, such as 10 mph, primarily located in mountainous areas. Moreover, many study segments exhibited average speeds or 85th percentile speeds well below the default speed limit of 55 mph for rural two-lane roads in Kentucky. This is largely due to the limiting geometrics of these roads.
To assess the relative performance of models employing alternative speed metrics, we utilized the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which were computed using equations (8) and (9), respectively:

$$\mathrm{AIC} = 2K - 2\ln(Q) \tag{8}$$

and

$$\mathrm{BIC} = K\ln(i) - 2\ln(Q) \tag{9}$$

where $Q$ represents the maximized likelihood function for the model, $K$ denotes the number of parameters included in the model, and $i$ is the total number of observations. According to previous research, models with lower values of AIC and BIC are considered to be more favorable [57].
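As a small worked illustration of the two criteria, the sketch below computes AIC and BIC from a maximized log-likelihood using their standard definitions. The function names and the numbers in the usage note are hypothetical and are not taken from the study's results.

```python
import math


def aic(log_q, k):
    """Akaike information criterion: 2K - 2 ln(Q)."""
    return 2 * k - 2 * log_q


def bic(log_q, k, n_obs):
    """Bayesian information criterion: K ln(n) - 2 ln(Q)."""
    return k * math.log(n_obs) - 2 * log_q


def prefer_lower(score_a, score_b):
    """Return 'a' if model a has the lower (more favorable) score, else 'b'."""
    return "a" if score_a < score_b else "b"
```

For example, with a hypothetical log-likelihood of -100 and three parameters, `aic(-100, 3)` evaluates to 206. Adding a speed term raises K by one, so the extra parameter is only favored when it improves the log-likelihood enough to offset the penalty.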
To evaluate the prediction accuracy of the models, we examined various goodness-of-fit metrics using data that were not previously observed by the model. These metrics include the root mean squared error (RMSE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), and generalized $R^2$ value. RMSE is calculated by taking the square root of the mean squared error (MSE), which is obtained by averaging the squared errors of predicted crash frequencies across all segments. MAPE expresses the absolute error as a proportion of the actual crash frequency while excluding segments with no crashes [58]. MAD quantifies the average absolute difference between the crash frequency predicted by the model and the actual crash frequency. Generalized $R^2$ is derived from the likelihood function $Q$, wherein an upper limit of 1 is applied to the scale. This approach offers a generalization of the traditional $R^2$ metric, eliminating the need for assumptions regarding the distribution of the dependent variable, such as a normal distribution. Generalized $R^2$ is estimated with the following equation: where $\log Q(\hat{\beta})$ and $\log Q(0)$ indicate the log-likelihoods of the fitted model and the null model with only the intercept, respectively.
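The error metrics described above can be sketched as follows. RMSE, MAD, and MAPE follow the verbal definitions in the text; the generalized $R^2$ function uses the Nagelkerke-type scaling, which is an assumption on our part, chosen because it is a common likelihood-based $R^2$ whose scale is capped at 1. All function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error across all segments."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mad(observed, predicted):
    """Mean absolute deviation between predicted and observed crash frequency."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, excluding segments with zero observed crashes."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    return 100.0 * sum(abs(o - p) / o for o, p in pairs) / len(pairs)


def generalized_r2(loglik_fitted, loglik_null, n_obs):
    """Likelihood-based R^2 rescaled to a maximum of 1 (Nagelkerke form, assumed)."""
    unscaled = 1.0 - math.exp(2.0 * (loglik_null - loglik_fitted) / n_obs)
    upper = 1.0 - math.exp(2.0 * loglik_null / n_obs)
    return unscaled / upper
```

Excluding zero-crash segments from MAPE avoids division by zero, but it also means MAPE says nothing about how well the model predicts the roughly 40% of segments with no crashes, which is one reason to report it alongside RMSE and MAD.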
We evaluated five models with rural two-lane segments in this study. The conventional model form, which consists of only AADT and length of the segment, was used as a benchmark for evaluating the performance of other models, each of which contained at least one of the speed metrics. The goal was to assess the impact of incorporating speed as a variable in the crash prediction model and determine the extent to which it improved the accuracy of predictions. Table 2 provides a summary of all the tested models, including coefficients, AIC, BIC, generalized $R^2$, RMSE, MAPE, and MAD values. It is interesting to note that models that include speed metrics tend to match the data better than the conventional model, as evidenced by lower values of AIC and BIC. In addition, each model shows that all of the speed metrics are significant at a significance level of 5%. Among all the models, the one utilizing the 85th percentile speed appears to exhibit the least amount of error, closely followed by the average speed model. Given that the 85th percentile speed is frequently employed in highway planning to evaluate safety [57], it is plausible that this model would be more appropriate for such purposes. However, it is necessary to collect a substantial amount of data to achieve an accurate estimate of the 85th percentile speed. Since average speed provides a better representation of actual operating conditions, the model with AADT, length, and average speed, as shown in equation (11), was ultimately selected for further analysis.

Integration of Speed for Better Performance
We have observed that speed is a significant contributor to crashes. In this section, we discuss how speed and other independent variables are correlated with crashes using the average speed-based model shown in equation (11). We also evaluated how well the model fits the data, which further helped us adopt a refined approach of incorporating speed and ultimately improving model performance.
In equation (11), it is observed that both AADT and length exhibit a significant positive association with the crash frequency, as anticipated. The model also reveals a negative correlation between average speed and crash frequency, which suggests that more crashes tend to take place at lower speeds. This observation aligns with a recent investigation conducted by Dutta and Fontaine, which specifically examined interstates [26]. The negative relationship can also be seen in marginal model plots, which illustrate how responses align with an independent variable while holding all other variables constant at their average values [59]. The marginal model plots in Figure 2 illustrate that segments with lower average speeds tend to have a higher crash frequency, while the crash frequency increases with AADT and length.
We further supported the negative relationship between the average speed and crash frequency by normalizing the crash data in proportion to the vehicle miles traveled (VMT), computed from AADT and length. A clear decreasing trend was noticed in the normalized crash frequency with a higher average speed. To be more specific, when other factors, such as AADT and length, remain constant, the crash frequency in the region with a higher average speed is actually lower, despite the fact that the total crash frequency may be higher due to high traffic volume.
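One way to carry out the VMT normalization described above is sketched below. The per-million-VMT scaling, the 365-day year, and the function name are assumptions made for illustration; the text states only that crashes were normalized by VMT derived from AADT and length.

```python
def crashes_per_million_vmt(crashes, aadt, length_miles, years=5):
    """Crash rate per million vehicle miles traveled over the analysis period.

    VMT is approximated as AADT * segment length * 365 days * number of years.
    """
    vmt = aadt * length_miles * 365 * years
    return crashes / vmt * 1_000_000
```

With this normalization, two segments with the same crash count but different traffic volumes are no longer treated alike: the busier segment has the lower rate, which separates the effect of exposure from the effect of speed.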
Further analysis of the performance of the model was carried out utilizing cumulative residual (CURE) plots. The construction of CURE plots followed the methodology outlined by Hauer and Bamfo [36]. These plots display the cumulative residual, which represents the difference between the observed crash frequency and the predicted crash frequency derived from the model. The observations are arranged in ascending order of the independent variable in the plot. The purpose of such a plot was to get a visual representation of how well the model matched the dataset. An acceptable cumulative residual curve is defined as one that remains within a range of two standard deviations (±2σ) [23]. Figure 3 presents the CURE plots for the three independent variables employed in the average speed-based model. Evidently, the model exhibits inadequate fit to the data, as a substantial proportion of the CURE curve extends beyond the ±2σ limit for all independent variables. Furthermore, it is apparent that the model consistently overestimates or underestimates the crash frequency where the speed and AADT are higher. The average speed plot shows that the model consistently overestimates or underestimates at three speed intervals, deviating from the expected ranges. These observations prompted us to explore a different approach, outlined in the following section, which involved utilizing speed as a categorizer.
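The construction of a CURE curve can be sketched as follows. The ±2σ envelope uses the commonly cited form $\sigma^*(n) = \sqrt{\sigma^2(n)\,(1 - \sigma^2(n)/\sigma^2(N))}$, where $\sigma^2(n)$ is the running sum of squared residuals; treating this as the limit used in the study is an assumption, as are the function names.

```python
import math


def cure_curve(x, observed, predicted):
    """Return sorted x values, cumulative residuals, and the 2-sigma limit at each point."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # ascending order of the variable
    cumulative, cumulative_sq = [], []
    running, running_sq = 0.0, 0.0
    for i in order:
        residual = observed[i] - predicted[i]  # observed minus predicted
        running += residual
        running_sq += residual * residual
        cumulative.append(running)
        cumulative_sq.append(running_sq)
    total_sq = cumulative_sq[-1]
    limits = [
        2.0 * math.sqrt(max(0.0, v * (1.0 - v / total_sq))) if total_sq > 0 else 0.0
        for v in cumulative_sq
    ]
    return [x[i] for i in order], cumulative, limits


def share_outside_limits(cumulative, limits):
    """Fraction of points where the cumulative residual leaves the +/- 2-sigma band."""
    outside = sum(1 for c, lim in zip(cumulative, limits) if abs(c) > lim)
    return outside / len(cumulative)
```

A long upward run in the cumulative residual marks a range of the variable where the model underpredicts, and a long downward run marks overprediction. Such sustained runs are the kind of pattern that motivates splitting the data at the points where the curve changes direction.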

Speed as a Categorizer for Model Development.
In this section, we investigate the most effective means by which speed can be incorporated into crash models. Based on Figure 3, it is clear that the current model exhibits a steady tendency to overestimate the crash frequency as the average speed increases up to approximately 30 mph. Subsequently, there is a shift towards underestimation until the average speed reaches roughly 50 mph. After this point, the model goes back to overestimating the crash frequency.
Considering these transitions in the CURE plot in terms of average speed, the study dataset was divided into three speed ranges based on average speed, and three distinct models were developed. The three speed ranges were categorized as follows: low speed, which encompassed speeds below 30 mph; medium speed, which included speeds ranging from 30 mph to 50 mph; and high speed, which referred to speeds over 50 mph. The respective proportions of total segments were approximately 21%, 61%, and 18%.
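The three-way split can be expressed as a small helper. The thresholds of 30 mph and 50 mph come from the text; the function name and the treatment of a segment at exactly 30 mph or 50 mph as medium speed are assumptions.

```python
LOW_SPEED_LIMIT_MPH = 30.0   # below this: low speed (from the text)
HIGH_SPEED_LIMIT_MPH = 50.0  # above this: high speed (from the text)


def speed_category(v_avg_mph):
    """Assign a segment to the low-, medium-, or high-speed group by average speed."""
    if v_avg_mph < LOW_SPEED_LIMIT_MPH:
        return "low"
    if v_avg_mph <= HIGH_SPEED_LIMIT_MPH:
        return "medium"
    return "high"
```

Each group then receives its own model, so the coefficients for AADT, length, and speed are free to differ between groups instead of being forced to share one functional relationship.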
For each individual speed range, we developed crash prediction models with the ZINB form. Similar to the overall model, 75% of the segments within each speed range were used to train the model, and the remaining 25% were used for testing after model calibration. The influence of speed was analyzed across all speed levels. In the next subsections, we explain the importance of including speed as a variable in the model, in addition to how the crash frequency is affected by speed in different speed ranges.

Low-Speed Roads.
The dataset for low-speed roads had 9,371 individual segments, all of which had an average speed of less than 30 mph. These segments had a total of 8,158 crashes in 5 years. Of the three independent variables considered, AADT and length exhibited statistical significance (p value < 0.0001) at a significance level of 5%. However, the average speed was found to be insignificant and, therefore, was not included in the model. The final model specification is presented in Table 3.
Quantifying the contribution of each variable is one method for determining the relative significance of each independent variable in the model. Equation (12) provides a method for quantifying the significance of an independent variable:

$$\text{Importance}(X) = \frac{\mathrm{Var}(E(y \mid X))}{\mathrm{Var}(y)} \tag{12}$$

Here, $\mathrm{Var}(E(y \mid X))$ is the variance of the expected crash frequency, $y$, given the independent variable, $X$. The expectation is calculated from the predicted crash frequency, $y$, in relation to the conditional distribution of the variables under consideration, and its variance is subsequently calculated over the probability distribution of variable $X$. $\mathrm{Var}(y)$ is calculated as the variance of $y$. Based on the results, the relative importance of AADT and length on low-speed roads is 68% and 32%, respectively.
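A rough numerical reading of this ratio is sketched below: predictions are grouped by a binned value of X, the variance of the group means is taken as an estimate of Var(E(y|X)), and the result is divided by Var(y). The binning approach, the function names, and the final normalization step (suggested by the reported shares summing to 100%) are all assumptions and may differ from the software procedure the authors used.

```python
from collections import defaultdict
from statistics import mean, pvariance


def relative_importance(x_bins, y_pred):
    """Estimate Var(E(y|X)) / Var(y) using group means within each bin of X."""
    groups = defaultdict(list)
    for b, y in zip(x_bins, y_pred):
        groups[b].append(y)
    grand_mean = mean(y_pred)
    n = len(y_pred)
    # Variance of the conditional means, weighted by the share of points in each bin
    var_cond_mean = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()) / n
    return var_cond_mean / pvariance(y_pred)


def normalize_shares(importances):
    """Rescale raw importance values so they sum to 100% (assumed reporting convention)."""
    total = sum(importances.values())
    return {name: 100.0 * value / total for name, value in importances.items()}
```

A variable whose bins have nearly identical mean predictions gets a ratio close to zero, while a variable that alone determines the prediction gets a ratio of one.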

Medium-Speed Roads.
Within the medium-speed group, a total of 27,075 distinct segments were identified, each characterized by an average speed ranging from 30 to 50 mph. Based on the calibrated model, all three variables, i.e., AADT, length, and average speed, exhibited statistical significance (p value < 0.0001) at a significance level of 5%. For comparison purposes, a traditional model with only AADT and length was also fitted with the same dataset. Table 4 presents the specifications and performance of the two models.
While average speed is statistically significant within the medium-speed group, its relative importance is only about 1%. In contrast, AADT and $L$ exhibit much higher levels of importance, accounting for 59% and 40%, respectively. The influence of speed therefore appears to be minor for this group, which is supported by the marginal model plots in Figure 4. In the figure, the line remains relatively flat, suggesting that there is no substantial change in the crash frequency with average speed. However, the plot does indicate that other factors play an important role in influencing the crash frequency. Based on this finding, it appears that removing the average speed from the model does not change the accuracy of the model very much.
Based on the above finding, we proceeded with the conventional model form and developed CURE plots for AADT and length, as illustrated in Figure 5. The plots indicate the possibility of further partitioning the data to enhance the accuracy of the model. The plot shows a noticeable pattern of consistent underprediction, which then shifts to consistent overprediction when Ln(AADT) reaches a value of approximately 8, which corresponds to an AADT value of around 3000. The medium-speed dataset was further separated into low-volume and high-volume subsets using this value as a cutoff.
In order to assess the potential improvement in prediction accuracy, we conducted calibration and testing on two separate submodels: one developed for low-volume roads and another for high-volume roads. The purpose was to determine if incorporating AADT as an additional categorizer could enhance the predictive capabilities of the models. The ZINB formulation was utilized in both submodels, and AADT and length were used as the independent variables. Table 5 shows the specifications and prediction performance of these models. We then combined the predicted crash frequency from the two submodels to compare their overall performance with that of the single model.
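The routing of medium-speed segments to the two volume-based submodels can be sketched as below. The cutoff of 3000 vehicles per day reflects the "around 3000" stated in the text; the exact cutoff, the assignment of a segment at exactly 3000 to the high-volume subset, and the function names are assumptions.

```python
import math

# ln(AADT) of about 8 corresponds to an AADT of roughly exp(8), i.e. about 2981
AADT_FROM_LOG_8 = math.exp(8)
AADT_CUTOFF = 3000  # rounded cutoff (assumed; the text says "around 3000")


def medium_speed_submodel(aadt, cutoff=AADT_CUTOFF):
    """Pick the submodel for a medium-speed segment based on traffic volume."""
    return "low_volume" if aadt < cutoff else "high_volume"


def combined_predictions(segments, submodels):
    """Predict each segment with its own submodel and return one combined list.

    `segments` is a list of dicts with an 'aadt' key; `submodels` maps a submodel
    name to a prediction function (both are hypothetical structures).
    """
    return [submodels[medium_speed_submodel(s["aadt"])](s) for s in segments]
```

Combining the predictions into one list before computing error metrics keeps the comparison with the single model fair, because both are then scored on the same set of test segments.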
From the table, it can be observed that the performance of the two submodels, when combined, shows a marginal improvement compared to the performance of the single model. Furthermore, Figure 6 shows that the corresponding CURE plots for both submodels fit better, demonstrating the effectiveness of considering AADT as an additional categorizer for medium-speed roads.

High-Speed Roads.
High-speed roads included a total of 7,561 segments, each of which had an average speed higher than 50 mph. These segments had a total of 27,648 crashes in 5 years. Upon calibration, average speed was found to be statistically significant (p value < 0.0001) for crashes on high-speed roads. As expected, AADT and length are also significant. Table 6 shows variable coefficients and error metrics for the speed-based model. The estimated coefficient of average speed indicates a negative correlation between the crash frequency and speed on these roads. Further investigation revealed that these roads are characterized by high geometric standards. Compared to low- and medium-speed roads, lanes and shoulders are wider and sections are straighter. Within this particular category, the model gives 8% weight to average speed, while AADT and length account for 52% and 40%, respectively. This indicates that, as compared to its effect on other roads, speed has a greater impact on crash predictions on high-speed roads.
In addition, the traditional model was developed and is included in Table 6 for comparison purposes. As expected, integrating speed in the crash frequency prediction model results in enhanced performance over the traditional approach. The inclusion of average speed in the model leads to improved performance measures, as displayed in the table.
Further evaluation of CURE plots for the speed-based model showed that overprediction occurs after an AADT of nearly 5,000.However, due to the relatively small number of samples available in the high-speed range, we decided not to further subdivide the dataset on the basis of AADT.As more data become accessible in subsequent periods, it will be possible to reexamine this analysis.

Overall Performance Result.
We compared the combined performance of the models that were based on speed and AADT categorizers with the performance of the initial model in equation (11). The goal was to illustrate how utilizing separate models based on speed and volume enhances the overall accuracy of crash prediction for rural two-lane roadways. To achieve this, all of the predictions made by the low-speed, medium-speed, and high-speed road models, which are based on speed and AADT, were aggregated. Subsequently, error metrics were computed to assess prediction accuracy. The performance of the combined model was also compared to that of the conventional model (Table 7), which incorporates only AADT and length variables. Table 7 demonstrates that when speed is utilized as a categorizer, and the model is then subdivided based on AADT within the medium-speed group, there is a notable reduction in the prediction error of up to 11.3%.
To further evaluate the performance of our models across different crash ranges, Figure 7 displays the confusion matrices for both the single average speed-based model (left) and the combined models (right). These matrices depict the accuracy of predictions for each range, with the diagonal showing the percentage of correct predictions. Although both models perform similarly in terms of accurately predicting crashes, the combined models exhibit fewer predictions that deviate significantly from the actual values. For instance, the combined models predict only 0.14% and 0.62% of locations with zero and 1-3 crashes, respectively, to have more than 10 crashes, as opposed to 0.27% and 1.5% predicted by the single model. Moreover, for locations with more than 10 crashes, the combined models mistakenly predict only 6.7% to have 1-3 crashes, whereas the single model erroneously predicts 8.1% to have 1-3 crashes. These findings demonstrate the advantage of the combined models over the single model in practical applications that aim to identify high-risk segments and inform improvement decisions.
Overall, the findings of this study indicated that the performance of the crash prediction model for rural two-lane roadways can be improved. This improvement was accomplished by using the actual dataset to estimate speed metrics and by taking speed and AADT into consideration as the categorizers.

Discussion and Summary
The objective of this study was to examine how speed contributes to the crashes on rural two-lane highways. This was achieved by integrating measured speed data into the crash prediction model. We examined the impact of four distinct speed metrics on crashes. The findings revealed that all four speed metrics exhibited statistical significance in their respective models. Subsequently, we opted to conduct a more comprehensive examination of average speed in conjunction with AADT and segment length, as average speed more accurately depicts the prevailing operating conditions encountered by drivers on these roadways.
Upon conducting a more thorough investigation, it was discovered that there exists a negative correlation between the average speed and frequency of crashes on rural two-lane roadways.
Tis negative correlation aligns with prior research fndings that crashes tend to occur less when average speed is higher [8,26,30,60,61].One possible justifcation for this observed relationship is that rural two-lane highways with higher speeds are typically the primary routes in the area, often with improved geometric characteristics [30].
In addition, the importance of speed in crash prediction seems to increase with speed. This observation prompted us to categorize the entire dataset based on speed into three subsets: below 30 mph, between 30 mph and 50 mph, and above 50 mph. The analysis showed that speed was not significant for roads in the low-speed category but was significant for roads in both the medium- and high-speed categories. While the effect of speed on crash prediction was shown to be statistically significant within the medium-speed group, its overall influence was not particularly pronounced. In contrast, speed had a greater impact on crashes occurring on high-speed roads. According to the study dataset, high-speed roads exhibited better geometric characteristics (such as wider lanes and shoulders) than low- and medium-speed roads. This observation implies that speed can serve as an indicator of the geometric condition of rural two-lane highways.
Furthermore, our study has revealed that incorporating an additional categorizer based on AADT in conjunction with speed, and developing submodels under each speed category, leads to improved predictions compared with a single model. When developing models for predicting crashes on rural two-lane highways, it is important to consider both speed and AADT as categorizers, provided that the available data are sufficient for separate models.
Overall, the findings of this study suggest that the effect of speed in predicting crash frequency can differ based on the speed ranges of rural two-lane highway sections. Such an analysis of speed on rural two-lane highways can provide valuable insights into the geometric and operational features of the roadway. This information can be effectively utilized to assess the safety performance of these highways under different circumstances. Consequently, appropriate countermeasures can be implemented to improve safety on these roads. Moreover, the developed submodels can be a valuable tool for transportation planners and policymakers to locate high-risk segments and allocate budgets to improve safety on those roads.
Currently, the sample size for roadways with higher operating speeds is relatively limited. We will continue to collect data on these roads to further test the performance of the model. In addition, better speed data coverage on rural low-volume roads is necessary to obtain a reliable estimate of the 85th percentile speed for assessing safety from a design consistency perspective. To further understand the role of speed specific to rural two-lane highways, it would be interesting to incorporate additional geometric variables and possibly crash severity into the model. Furthermore, in light of the concerns raised regarding the ZINB model, it is important to explore alternative statistical approaches that can handle the issue of excess zeros, such as the random parameters negative binomial, random parameters negative binomial-generalized exponential, random parameters negative binomial-Lindley, and extreme value models [47][48][49]. In addition, considering more advanced techniques, such as machine learning models, could potentially enhance the overall performance and predictive capability of the model.

Data Availability
The crash data used in this study are available from the Kentucky State Police (https://crashinformationky.org/). Road attributes are available from the Highway Information System (https://transportation.ky.gov/Planning/Pages/HIS-Extracts.aspx). Lastly, HERE speed data are proprietary and will not be made available due to the restrictions of the data use agreement.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

Figure 1: Map of study segments.

(1) AADT and length-only model
(2) AADT, length, and V_avg-based model
(3) AADT, length, and V_85-based model
(4) AADT, length, and (V_avg − V_sl)-based model
(5) AADT, length, and (V_85 − V_sp)-based model
For model development, we utilized 75% of the dataset for training and the remaining 25% for testing the model.

Figure 2: Marginal model plots for the average speed-based model.

Figure 3: CURE plots for the model based on average speed.

Figure 4: Marginal model plots for a medium-speed model.

Figure 6: CURE plots of the models with an AADT categorizer. (a) Low-volume roads. (b) High-volume roads.

Figure 7: Confusion matrix to compare performances between single and combined models: (a) single average speed-based model; (b) combined models.

Table 1: Descriptive information of the study segments.

Table 2: Model parameter estimates and goodness-of-fit.

Table 3: Model specification for low-speed roads.

Table 5: Models based on an AADT categorizer and performance comparison.

Table 6: Comparison of models for high-speed roads.