The Effects of Traffic Composition on Freeway Crash Frequency by Injury Severity: A Bayesian Multivariate Spatial Modeling Approach

This study investigates the effects of traffic composition on freeway crash frequency by injury severity. A crash dataset collected from Kaiyang Freeway, China, is used for the empirical analysis. In this dataset, vehicles are divided into five categories and crashes are classified into no-injury and injury levels. To account for correlated spatial effects between adjacent segments, a Bayesian multivariate conditional autoregressive (CAR) model is proposed. The model links no-injury and injury crash frequencies to risk factors, including the percentages of different vehicle categories, daily vehicle kilometers traveled (DVKT), and roadway geometry. The model estimation results show that, compared to Category 5 vehicles (e.g., heavy trucks), larger percentages of Category 1 (e.g., passenger cars) and Category 3 (e.g., medium trucks) vehicles lead to fewer no-injury crashes and more injury crashes. DVKT, horizontal curvature, and vertical grade are also found to be associated with no-injury and/or injury crash frequencies. The significant heterogeneous and spatial effects for no-injury and injury crashes justify the applicability of the proposed model. The findings help in understanding the relationship between traffic composition and freeway safety and provide suggestions for designing vehicle safety improvement strategies.


Introduction
As a primary component of the roadway transportation system, freeways and their safety performance have attracted much attention in traffic safety research in recent years [1][2][3]. Crash prediction models, also called "safety performance functions", are usually developed for three purposes: to reveal crash occurrence mechanisms, to rank hotspots for safety improvement, and to evaluate the safety benefits of countermeasures. These models link the crash frequencies on freeway segments to several groups of factors [2,4,5]:
- traffic volume;
- roadway attributes (e.g., length, speed limit, number of lanes, median and shoulder width, vertical grade, and horizontal curvature);
- weather conditions (e.g., visibility, temperature, and precipitation).
In addition to reducing crash occurrence, freeway agencies also place major emphasis on mitigating the severity of injuries sustained by crash-involved road users. Probably because of high speeds, freeway crashes tend to result in more severe injury outcomes than crashes on other types of roadways (e.g., urban roadways). For example, the Annual Statistical Report on Roadway Traffic Accidents [6] shows that the fatality rate of freeways ranks first among all roadway types in China. In 2015, one third of the roadway crashes nationwide that killed more than 10 people occurred on freeways. A more comprehensive evaluation of freeway safety performance therefore requires modeling crash frequency by injury severity. Such modeling may reveal deficiencies that total crash frequency prediction models fail to detect, and it may provide a more cost-effective approach for identifying crash hotspots [7,8].
Several studies have analyzed crash frequency by severity, but only a few of them focus on freeway safety. Ye et al. [9] propose a simultaneous equations model for analyzing the frequencies of no-injury, possible injury, and injury and fatal crashes on 275 multilane freeway segments in the State of Washington. Ma et al. [1] use multivariate spatiotemporal approaches to model daily no-injury and injury crash counts on a mountainous freeway, I-70, in the State of Colorado. These studies use roadway geometry and environmental factors as the safety predictors. None of them investigates the effects of traffic composition on crash frequency by injury severity.

As indicated by Dinu and Veeraragavan [10], traffic composition can be defined as the vehicle types involved and the proportion of each vehicle type in the mixed traffic flow. Compared to urban traffic, freeway traffic generally includes more motor vehicle types, ranging from motorcycles, passenger cars, coaches, and vans to light, medium, and heavy trucks and trailers. These vehicle types may differ substantially in driving behavior, maneuverability, crash aggressivity, and crashworthiness, which would lead to considerable variability in crash risk and injury severity [11][12][13]. A number of previous studies have demonstrated that traffic composition has statistically significant effects on crash occurrence [3,10,14]. Therefore, incorporating the proportion of each vehicle type in the mixed traffic as explanatory variables would benefit the modeling of freeway crash frequency by injury severity.
From a methodological perspective, crash frequencies at different injury levels can be modeled either separately or jointly [15]. Because these frequencies are often found to be significantly correlated, joint modeling approaches are widely used for analyzing crash frequency by severity. These approaches include multivariate regression [16,17], simultaneous equations [8,9], the joint probability approach [18], the two-stage multivariate model [19], and multinomial-generalized Poisson models [20][21][22]. Among them, multivariate regression can provide a fully general covariance structure for capturing the underlying correlation. It can also accommodate other characteristics of crash data, such as overdispersion and spatiotemporal correlation [1].

For freeway crash frequency modeling, spatial correlation is an important issue, because common omitted factors may affect the safety performance of adjacent segments [23]. The multivariate conditional autoregressive (CAR) model is one of the most advanced methods for multivariate spatial modeling under the Bayesian framework. It has been successfully applied to analyze crash frequency by severity or transportation mode at the macro level (e.g., canton and census tract) [24][25][26] and at the micro level (e.g., roadway segment and intersection) [1,27,28,29]. The multivariate CAR model can account for heterogeneous and spatial effects and for their correlations among crash severities. It is therefore expected to identify the factors contributing to freeway crash frequency by severity more accurately. See the reviews by Lord and Mannering [30] and Mannering and Bhat [31] for further descriptions and assessments of novel approaches to modeling crash frequency.
The key goal of this research is to investigate the effects of traffic composition on freeway crash frequencies at different injury levels. To achieve this goal, one year of crash frequency data by severity is collected from Kaiyang Freeway in China. A multivariate CAR model is then developed to predict crash frequency by severity simultaneously. Its predictors are the proportions of different vehicle categories and freeway-specific attributes.
The remainder of the paper is organized as follows. The next section describes the data collected for the empirical analysis. Section 3 specifies the formulation of the multivariate CAR model and introduces its estimation process. Section 4 discusses the parameter estimation results. Finally, Section 5 presents concluding remarks and directions for future research.

Data Preparation and Preliminary Analysis
The crash, traffic, and roadway data for Kaiyang Freeway in Guangdong Province, China, in 2014 are collected for the current analysis. Kaiyang Freeway is about 125 km long, with a median barrier, four lanes, and a 120 km/h speed limit along its whole length. The roadway geometry data are extracted from the Horizontal and Longitudinal Profile provided by Guangdong Province Communication Planning and Design Institute Co., Ltd. These data include detailed information on vertical grade, horizontal curves, and the layout of bridges and ramps. The other roadway attributes are constant along the freeway and are therefore not considered as safety predictors. These attributes are median width, lane width, shoulder type and width, and pavement type. The freeway is split into 154 segments based on two criteria: homogeneity in horizontal and vertical alignments, and a minimum segment length of 150 m. A more detailed description of the segmentation method can be found in Zeng et al. [3].
The crash data are obtained from the Highway Maintenance and Administration Management Platform of the Guangdong Transportation Group. Individual crashes are mapped to the segments according to their locations, which the crash reports record as freeway mileages. In the original records, crash injury severities are categorized into four levels: no injury, slight injury, severe injury, and fatality. Severe injury and fatality crashes are rare, accounting for only 5.9% of the total observed crashes. Two severity levels are therefore considered in this analysis:
(1) no injury;
(2) injury, which covers slight injury, severe injury, and fatality.
For each freeway segment, the aggregated crashes are divided into these two groups, which yields the no-injury and injury crash counts on each segment in 2014.
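The mapping and aggregation steps above can be illustrated with a minimal Python sketch. It assumes that each crash record has been reduced to a (mileage in km, severity label) pair and that segments are given by their sorted start mileages. The function names, the severity label strings, and the example mileages are hypothetical and are not taken from the study's data.

```python
from bisect import bisect_right

# Hypothetical labels for the three original injury levels; together they
# form the single "injury" level used in the analysis.
INJURY_LEVELS = {"slight injury", "severe injury", "fatality"}


def assign_segment(mileage_km, starts_km, end_km):
    """Index of the segment containing a crash mileage, or None if outside.

    Segment j covers [starts_km[j], starts_km[j + 1]); the last segment ends
    at end_km. starts_km must be sorted in ascending order.
    """
    if mileage_km < starts_km[0] or mileage_km >= end_km:
        return None
    return bisect_right(starts_km, mileage_km) - 1


def count_by_severity(crashes, starts_km, end_km):
    """Return a list of (no_injury, injury) counts, one tuple per segment."""
    counts = [[0, 0] for _ in starts_km]
    for mileage_km, severity in crashes:
        j = assign_segment(mileage_km, starts_km, end_km)
        if j is None:
            continue  # crash located outside the study section
        if severity == "no injury":
            counts[j][0] += 1
        elif severity in INJURY_LEVELS:
            counts[j][1] += 1
        else:
            raise ValueError(f"unknown severity label: {severity!r}")
    return [tuple(c) for c in counts]


# Three hypothetical segments: [0, 0.5), [0.5, 1.2), [1.2, 2.0) km
crashes = [(0.1, "no injury"), (0.7, "fatality"), (0.9, "slight injury"),
           (1.5, "no injury"), (2.5, "no injury")]
print(count_by_severity(crashes, [0.0, 0.5, 1.2], 2.0))
# [(1, 0), (0, 2), (1, 0)]
```

`bisect_right` makes a crash recorded exactly at a boundary mileage belong to the downstream segment. This is a convention chosen for the sketch only.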
The traffic data are acquired from the Guangdong Freeway Networked Toll System, which records traffic volumes by vehicle category. In the Networked Toll System, vehicles are classified into five categories based on four vehicle-size parameters: head height, axle number, wheel number, and wheelbase. The specific classification criteria are presented in Table 1. For freeway segment $i$, the annual average daily traffic (AADT) is calculated with the weights 1, 1.5, 2, 3, and 3.5 for vehicle categories 1-5, respectively:

$$\mathrm{AADT}_i = V_i^{(1)} + 1.5\,V_i^{(2)} + 2\,V_i^{(3)} + 3\,V_i^{(4)} + 3.5\,V_i^{(5)},$$

where $V_i^{(1)}, \ldots, V_i^{(5)}$ are the traffic volumes for vehicle categories 1-5, respectively, on freeway segment $i$ in 2014. The Transportation Department of Guangdong officially recommends these weights according to the average sizes of the vehicle types in each category. The weights are widely used in transportation engineering, for example in freeway tolling and traffic volume prediction. As in previous studies [1,2], the daily vehicle kilometers traveled (DVKT) serve as the crash exposure variable. DVKT is computed as the product of AADT and segment length.
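The weighted AADT and DVKT calculations can be sketched in Python as follows. The sketch assumes that the five category volumes are already daily averages for the segment. If they are annual totals, they would first need to be divided by the number of days. The function names and example numbers are illustrative only.

```python
# Weights for vehicle categories 1-5, as given in the text
CATEGORY_WEIGHTS = (1.0, 1.5, 2.0, 3.0, 3.5)


def weighted_aadt(volumes):
    """Weighted AADT of one segment from its five category volumes."""
    if len(volumes) != len(CATEGORY_WEIGHTS):
        raise ValueError("expected one volume per vehicle category (5 values)")
    return sum(w * v for w, v in zip(CATEGORY_WEIGHTS, volumes))


def dvkt(volumes, length_km):
    """Daily vehicle kilometers traveled: weighted AADT times segment length."""
    return weighted_aadt(volumes) * length_km


# Hypothetical 0.5 km segment with these daily category volumes
volumes = (10000, 2000, 1000, 500, 3000)
print(weighted_aadt(volumes))  # 27000.0
print(dvkt(volumes, 0.5))      # 13500.0
```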
The percentage of Category 5 vehicles, $h_i^{(5)}$, is set as the reference case.
Table 2 presents the definitions and descriptive statistics of the variables used in model development. Pearson correlation tests among the risk factors are conducted in SPSS. The results show that five variable pairs are significantly correlated, with absolute correlation coefficients above 0.6:
- $h_i^{(2)}$ and $h_i^{(3)}$;
- $h_i^{(2)}$ and $h_i^{(4)}$;
- $h_i^{(3)}$ and $h_i^{(4)}$;
- $h_i^{(2)}$ and DVKT;
- $h_i^{(4)}$ and DVKT.

To avoid the adverse effects of this correlation, $h_i^{(2)}$ and $h_i^{(4)}$ are excluded from the analysis. The multicollinearity diagnostics suggest that there is no significant collinearity among the remaining factors.
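The screening rule used here flags any pair of risk factors whose absolute Pearson correlation exceeds 0.6. It can be sketched in plain Python as below. This is an illustration rather than the SPSS procedure: it checks only the magnitude of each coefficient, not its significance. The variable names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_correlation_pairs(variables, threshold=0.6):
    """Pairs of variables whose |r| exceeds the threshold, with r rounded."""
    flagged = []
    for (name_x, x), (name_y, y) in combinations(variables.items(), 2):
        r = pearson(x, y)
        if abs(r) > threshold:
            flagged.append((name_x, name_y, round(r, 3)))
    return flagged


# Hypothetical values of three risk factors on five segments
data = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5], "c": [3, 1, 4, 1, 5]}
print(high_correlation_pairs(data))  # [('a', 'b', 0.999)]
```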

Model Specification
The present study uses two injury severity levels for the empirical analysis. A bivariate CAR model is therefore developed, which can be extended to a multivariate version for simultaneously predicting $K$ ($K > 2$) response variables [27]. In the bivariate CAR model, the observed crash count $y_{k,i}$ on freeway segment $i$, for no-injury ($k = 1$) or injury ($k = 2$) crashes, is assumed to follow a Poisson distribution:

$$y_{k,i} \sim \mathrm{Poisson}(\lambda_{k,i}).$$

A generalized linear relationship is assumed between the expected crash count $\lambda_{k,i}$ and the observed risk factors $\mathbf{X}_{k,i}$ (including traffic composition):

$$\ln \lambda_{k,i} = \ln \mu_{k,i} + \mathbf{X}_{k,i}\boldsymbol{\beta}_k + \theta_{k,i} + \phi_{k,i},$$

where $\boldsymbol{\beta}_k$ are the regression coefficients corresponding to the risk factors for severity level $k$. To capture a possibly nonlinear relationship between exposure and crash frequency, the crash exposure $\mu_{k,i}$ is formulated as a power of DVKT:

$$\mu_{k,i} = \mathrm{DVKT}_i^{\alpha_k},$$

where $\alpha_k$ is an estimable parameter.
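The log-link model described above can be evaluated numerically for one segment and severity level. Here it is written as $\ln\lambda = \alpha\ln\mathrm{DVKT} + \mathbf{x}\boldsymbol{\beta} + \theta + \phi$, where these symbol names are labels chosen for the sketch. The Python sketch below also returns the Poisson log-probability of an observed count. It assumes that any intercept enters through a constant column in `x`. All function names and numbers are hypothetical.

```python
from math import exp, lgamma, log


def expected_crash_count(dvkt, x, beta, alpha, theta=0.0, phi=0.0):
    """Expected count under ln(lam) = alpha*ln(dvkt) + x.beta + theta + phi."""
    if dvkt <= 0:
        raise ValueError("DVKT must be positive")
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    eta = alpha * log(dvkt) + sum(xi * bi for xi, bi in zip(x, beta)) + theta + phi
    return exp(eta)


def poisson_log_pmf(y, lam):
    """log P(Y = y) for Y ~ Poisson(lam)."""
    return y * log(lam) - lam - lgamma(y + 1)


# Hypothetical inputs: a constant column (intercept) plus one covariate
lam = expected_crash_count(dvkt=20000.0, x=[1.0, 0.5], beta=[-8.0, 0.2],
                           alpha=0.9, theta=0.1, phi=-0.05)
print(round(lam, 3), round(poisson_log_pmf(3, lam), 3))
```

The random effects `theta` and `phi` default to zero here. In the fitted model they are segment-specific quantities drawn from the priors specified in the following paragraph.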
The residual terms $\theta_{k,i}$ and $\phi_{k,i}$ in the link function capture the unstructured heterogeneous effects and the structured spatial effects, respectively, on the expected crash frequency of freeway segment $i$ at severity level $k$. These two terms also account for the overdispersion commonly found in crash frequency data [1]. To accommodate the correlation between the heterogeneous effects of no-injury and injury crashes, $\theta_{1,i}$ and $\theta_{2,i}$ are assumed to follow a bivariate normal distribution with zero means:

$$\begin{pmatrix}\theta_{1,i}\\ \theta_{2,i}\end{pmatrix} \sim N_2\!\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \Sigma\right), \qquad \Sigma = \begin{pmatrix}\sigma_{1,1} & \sigma_{1,2}\\ \sigma_{2,1} & \sigma_{2,2}\end{pmatrix}.$$

In the variance-covariance matrix $\Sigma$, the diagonal elements $\sigma_{1,1}$ and $\sigma_{2,2}$ represent the variances of the heterogeneous effects for no-injury and injury crashes, respectively. The off-diagonal elements $\sigma_{1,2}$ and $\sigma_{2,1}$ ($\sigma_{1,2} = \sigma_{2,1}$) represent the covariance between the heterogeneous effects. The correlation between them is measured by the coefficient $\rho_{\theta} = \sigma_{1,2}/\sqrt{\sigma_{1,1}\sigma_{2,2}}$.

A bivariate two-dimensional CAR prior is adopted to capture the spatial effects correlated between crash severities. The proximity structure is a critical element of the CAR prior. Various proximity structures have been investigated [32]. This study uses the most prevalent one, the 0-1 first-order neighbor structure, to define the proximity matrix: if segments $i$ and $j$ are directly connected, $w_{i,j} = 1$; otherwise, $w_{i,j} = 0$. Given this structure, the bivariate CAR prior is expressed as

$$\begin{pmatrix}\phi_{1,i}\\ \phi_{2,i}\end{pmatrix} \Big|\; \boldsymbol{\phi}_{-i} \sim N_2\!\left(\begin{pmatrix}\bar{\phi}_{1,i}\\ \bar{\phi}_{2,i}\end{pmatrix}, \frac{\Omega}{n_i}\right), \qquad \Omega = \begin{pmatrix}\omega_{1,1} & \omega_{1,2}\\ \omega_{2,1} & \omega_{2,2}\end{pmatrix},$$

where:
- $\boldsymbol{\phi}_{-i}$ denotes the spatial effects of all segments other than $i$;
- $n_i = \sum_{j \neq i} w_{i,j}$ is the number of segments adjacent to segment $i$;
- $\bar{\phi}_{k,i} = \sum_{j \neq i} w_{i,j}\,\phi_{k,j}/n_i$.

$\Omega$ is the variance-covariance matrix for spatial correlation. Its diagonal elements $\omega_{1,1}$ and $\omega_{2,2}$ reflect the spatial variances of no-injury and injury crash frequencies, respectively, and $\omega_{1,2}$ ($=\omega_{2,1}$) reflects the spatial covariance between them. The correlation coefficient $\rho_{\phi} = \omega_{1,2}/\sqrt{\omega_{1,1}\omega_{2,2}}$ describes the correlation between the spatial effects.
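The sketch below assumes that the segments are numbered consecutively along a single mainline, so that each interior segment touches exactly two others. Under that assumption, the 0-1 first-order neighbor matrix is a simple chain. The Python code builds this matrix and, for one severity level, computes the number of neighbors $n_i$ and the neighbor average of the spatial effects that enters the CAR prior. It also converts a 2 × 2 covariance matrix into the corresponding correlation coefficient. The function names and inputs are illustrative. A network with branches or gaps would need a different adjacency matrix.

```python
from math import sqrt


def chain_adjacency(n_segments):
    """0-1 first-order neighbor matrix for consecutively numbered segments."""
    w = [[0] * n_segments for _ in range(n_segments)]
    for i in range(n_segments - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def neighbor_summary(w, phi_k):
    """For each segment i, return (n_i, mean of phi_k over neighbors of i)."""
    summary = []
    for i, row in enumerate(w):
        neighbors = [j for j, w_ij in enumerate(row) if j != i and w_ij == 1]
        if not neighbors:
            raise ValueError(f"segment {i} has no neighbors")
        n_i = len(neighbors)
        summary.append((n_i, sum(phi_k[j] for j in neighbors) / n_i))
    return summary


def correlation_from_cov(m):
    """Correlation coefficient implied by a 2 x 2 covariance matrix."""
    return m[0][1] / sqrt(m[0][0] * m[1][1])


w = chain_adjacency(4)
print(neighbor_summary(w, [1.0, 2.0, 3.0, 4.0]))
# [(1, 2.0), (2, 2.0), (2, 3.0), (1, 3.0)]
print(correlation_from_cov([[4.0, 1.0], [1.0, 9.0]]))  # about 0.167
```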
The posterior proportion of extra-Poisson variation explained by the spatial correlation for each crash severity level is also of interest and is defined as follows [24]:

$$\psi_k = \frac{\mathrm{sd}(\boldsymbol{\phi}_k)}{\mathrm{sd}(\boldsymbol{\theta}_k) + \mathrm{sd}(\boldsymbol{\phi}_k)},$$

where $\mathrm{sd}(\cdot)$ denotes the empirical standard deviation, across segments, of the heterogeneous effects $\boldsymbol{\theta}_k$ or the spatial effects $\boldsymbol{\phi}_k$ at severity level $k$.

Model Estimation

Because the bivariate CAR model is complex, the parameters and hyperparameters are estimated by Bayesian methods using Markov chain Monte Carlo (MCMC) simulation in the freeware WinBUGS. Bayesian estimation requires prior distributions for the (hyper)parameters, which reflect prior knowledge about them. In the absence of sufficient knowledge, noninformative (vague) priors are usually used [33]. The priors are specified as follows:
- Regression coefficients (i.e., the elements of $\boldsymbol{\beta}_k$): a diffuse normal distribution, $N(0, 10^4)$.
- $\Sigma^{-1}$ and $\Omega^{-1}$: a Wishart prior, $\mathrm{Wishart}(\mathbf{P}, r)$, where the scale matrix $\mathbf{P}$ is an identity matrix and $r = 2$ is the degrees of freedom [27].

A chain of 150,000 MCMC iterations is run, and the first 100,000 iterations are discarded as burn-in. MCMC convergence is evaluated in two ways. The first is visual inspection of the trace plots of the model parameters. The second is monitoring the ratios of the Monte Carlo errors to the respective posterior standard deviations of the estimates. The estimation results for the parameters and hyperparameters of the bivariate CAR model are presented in Tables 3 and 4, respectively.
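Post-processing of the MCMC draws can be sketched as follows. The sketch assumes that the draws for each monitored quantity have been exported from WinBUGS as plain Python lists. The batch-means estimate of the Monte Carlo error is one standard approach; the paper does not state the exact method. A ratio of MC error to posterior standard deviation below about 5% is a commonly quoted rule of thumb, not a value from this study. The spatial-proportion function takes the usual form $\mathrm{sd}(\phi_k)/(\mathrm{sd}(\theta_k) + \mathrm{sd}(\phi_k))$. It computes this ratio per retained draw and averages it over draws. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def discard_burn_in(chain, n_burn=100_000):
    """Drop the first n_burn iterations (100,000 of 150,000 in the text)."""
    return chain[n_burn:]


def mc_error_batch_means(draws, n_batches=50):
    """Monte Carlo standard error of the posterior mean via batch means."""
    batch_size = len(draws) // n_batches
    if batch_size < 2:
        raise ValueError("too few draws for the requested number of batches")
    batch_means = [mean(draws[b * batch_size:(b + 1) * batch_size])
                   for b in range(n_batches)]
    return stdev(batch_means) / sqrt(n_batches)


def mc_error_ratio(draws, n_batches=50):
    """Ratio of the MC error to the posterior standard deviation."""
    return mc_error_batch_means(draws, n_batches) / stdev(draws)


def spatial_proportion(theta_draws, phi_draws):
    """Posterior mean of sd(phi_k) / (sd(theta_k) + sd(phi_k)).

    theta_draws[t] and phi_draws[t] hold the effects of all segments for one
    severity level at retained iteration t.
    """
    ratios = []
    for theta_t, phi_t in zip(theta_draws, phi_draws):
        sd_theta, sd_phi = stdev(theta_t), stdev(phi_t)
        ratios.append(sd_phi / (sd_theta + sd_phi))
    return mean(ratios)


# Synthetic inputs (not study output)
chain = [float(i % 2) for i in range(1_000)]
print(mc_error_ratio(discard_burn_in(chain, n_burn=0)))          # 0.0
print(spatial_proportion([[0.0, 1.0, 2.0]], [[0.0, 3.0, 6.0]]))  # 0.75
```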

Result Analysis
According to the results in Table 3, a larger proportion of Category 1 or 3 vehicles in the mixed freeway traffic significantly decreases the expected frequency of no-injury crashes while significantly increasing that of injury crashes. The coefficients of $h_i^{(1)}$ indicate that a 1% increase in Category 1 vehicles brings a 19.8% decrease in the expected no-injury crash frequency and a 22.1% increase in the expected injury crash frequency. Similarly, a 1% increase in Category 3 vehicles brings a 46.1% decrease in the expected no-injury crash frequency and a 4.75 times increase in the expected injury crash frequency. These results may be attributed to differences among the vehicle categories in crash aggressivity and crashworthiness, and in the driving behavior and maneuverability associated with each category [10]. For example, Huang et al. [11] found that passenger cars (Category 1) and medium trucks (Category 3) have much lower crashworthiness than heavy trucks (Category 5). This lower crashworthiness would increase the degree of injury sustained by occupants of passenger cars or medium trucks. For freeway safety improvement, the findings suggest that freeway safety education should focus more intensively on drivers of small-size vehicles. Stricter enforcement should also target these drivers' common risky behaviors, such as speeding and not wearing seat belts. Moreover, the findings indicate that driving safety assistance systems are warranted for vehicles in Categories 1 and 3 to help avoid injury (especially fatal) crashes.
Turning to the estimated safety effects of the other explanatory variables, the natural logarithm of DVKT is statistically significant at the 95% credibility level and positively associated with both no-injury and injury crash frequencies. Specifically, a 1% increase in DVKT is expected to result in a 0.9% increase in no-injury crashes and a 0.8% increase in injury crashes. These results are reasonable and consistent with the findings of previous studies [1,2], because more DVKT provides more opportunities for crashes at each severity level.
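Because the exposure term enters the model as a power of DVKT, the percentage effects quoted above follow directly from the exponent. Write this exponent for severity level $k$ as $\alpha_k$; the symbol is a label chosen here for illustration. A 1% increase in DVKT then multiplies the expected crash frequency $\lambda_k$ by

$$\frac{\lambda_k(1.01\,\mathrm{DVKT})}{\lambda_k(\mathrm{DVKT})} = 1.01^{\alpha_k} = e^{\alpha_k \ln 1.01} \approx 1 + 0.01\,\alpha_k.$$

The reported 0.9% and 0.8% increases therefore correspond to exponents of roughly 0.9 and 0.8. For example, $1.01^{0.9} \approx 1.0090$.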
The parameter estimates show that horizontal curvature has a significant effect (at the 80% credibility level) on no-injury crashes only. The positive coefficient (mean = 0.09) implies that a 0.1 km$^{-1}$ increase in curvature is associated with a 9.4% increase in no-injury crashes. This result is in line with existing findings [34,35]. Freeway segments with greater curvature limit drivers' sight distance and subject vehicles negotiating the curve to strong centrifugal forces, which may lead to rear-end or sideswipe crashes. These two crash types usually result in property damage only.
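The percentage interpretations in this section follow from the log-linear link. A change $\Delta x$ in a covariate with coefficient $\beta$ multiplies the expected crash frequency by $e^{\beta\Delta x}$. The small Python helper below (its name is hypothetical) reproduces the two geometry figures from their reported mean coefficients. As implied by the text, one model unit corresponds to 0.1 km$^{-1}$ of curvature and to 0.01 of grade.

```python
from math import exp


def percent_change(beta, delta=1.0):
    """Percent change in expected crash frequency when a covariate with
    log-linear coefficient beta increases by delta (in model units)."""
    return 100.0 * (exp(beta * delta) - 1.0)


# Reported mean coefficients: curvature 0.09 (no-injury), grade 0.30 (injury)
print(round(percent_change(0.09), 1))  # 9.4
print(round(percent_change(0.30), 1))  # 35.0
```

The same conversion applies to the traffic composition coefficients in Table 3.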
Vertical grade is found to have a significantly positive effect (at the 80% credibility level) on injury crash frequency. The coefficient (mean = 0.30) indicates that a 0.01 increase in freeway grade may contribute to a 35% increase in injury crashes. This result conforms to engineering intuition. It also aligns with many previous studies, which argue that a steep grade reduces stopping sight distance and thereby increases crash risk [3,34]. Moreover, some previous studies have demonstrated that vertical grade increases crash injury severity [36,37]. Because of this poor safety performance, almost all design manuals discourage steep grades in freeway vertical design [4].

With respect to the hyperparameters, the estimation results in Table 4 indicate significant (at the 95% credibility level) heterogeneous and spatial effects for both no-injury and injury crashes:
- heterogeneous effects, diagonal elements of $\Sigma$: 0.12 for no-injury and 0.25 for injury crashes;
- spatial effects, diagonal elements of $\Omega$: 0.17 for no-injury and 0.32 for injury crashes.

The estimated proportions of extra-Poisson variation explained by spatial correlation show that spatial correlation accounts for 64% and 68% of the extra-Poisson variation in no-injury and injury crash frequencies, respectively. These results demonstrate the value of incorporating spatial correlation into freeway crash modeling. The heterogeneous and spatial covariances and correlation coefficients between the two crash severities are positive. However, their magnitudes are very low, and none of them is statistically significant (all fall below the 80% credibility level). The results therefore indicate that the residual effects of no-injury and injury crash frequencies are not significantly correlated. This finding merits further investigation. A plausible cause is that the correlation between the two severities may be largely explained by the safety effects of the observed factors.

Nevertheless, to further verify the performance of the proposed model, we compare it with univariate CAR and (aspatial) bivariate Poisson lognormal models in terms of goodness of fit. The results in Table 5 show that the bivariate CAR model yields a lower deviance information criterion (DIC) value and a higher $R^2$ value than the other two models. This suggests that the multivariate CAR model outperforms its univariate and aspatial counterparts. The proposed multivariate CAR model is therefore a good alternative for analyzing crash frequency by severity.
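The DIC used in this comparison can be computed from posterior draws as $\mathrm{DIC} = \bar{D} + p_D$, with $p_D = \bar{D} - D(\hat{\lambda})$. Here $D$ is the deviance ($-2$ times the log-likelihood), $\bar{D}$ is its posterior mean, and lower DIC values indicate better fit after penalizing model complexity. The Python sketch below computes DIC for Poisson counts and plugs in the posterior mean of each expected count $\lambda$. This plug-in choice is an assumption: WinBUGS evaluates the plug-in deviance at the posterior means of the stochastic parents of the data, so its figures can differ. The function names and data are hypothetical.

```python
from math import lgamma, log
from statistics import mean


def poisson_deviance(y, lam):
    """Deviance: -2 times the summed Poisson log-likelihood."""
    return -2.0 * sum(yi * log(li) - li - lgamma(yi + 1) for yi, li in zip(y, lam))


def dic(y, lam_draws):
    """Return (DIC, pD) given observed counts y and posterior draws of lambda.

    lam_draws[t][i] is the expected count of observation i at draw t.
    """
    d_bar = mean(poisson_deviance(y, lam_t) for lam_t in lam_draws)
    lam_hat = [mean(column) for column in zip(*lam_draws)]
    p_d = d_bar - poisson_deviance(y, lam_hat)
    return d_bar + p_d, p_d


# Synthetic example: two observations, two posterior draws
print(dic([1, 2], [[0.5, 1.0], [1.5, 3.0]]))
```

For the bivariate models, the no-injury and injury counts of all segments can be concatenated into a single `y` list.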

Conclusions and Future Research
This study investigates the effects of traffic composition on freeway crash frequency by injury severity, based on a crash dataset collected from Kaiyang Freeway in Guangdong Province, China. The empirical analysis considers five vehicle categories and two severity levels (i.e., no injury and injury). To quantitatively estimate the safety effects of the percentages of different vehicle categories, a Bayesian multivariate CAR model is developed. The model controls for the effects of DVKT and freeway geometry. It also accommodates the heterogeneous and spatial effects and the (aspatial and spatial) correlations between crash severities.
According to the results of Bayesian estimation conducted in WinBUGS, increasing the percentages of Category 1 (e.g., passenger cars) and Category 3 (e.g., medium buses and medium trucks) vehicles would decrease the risk of no-injury crashes but increase the risk of injury crashes, compared to Category 5 (e.g., heavy trucks) vehicles. Moreover, more no-injury and injury crashes are expected on freeway segments with more vehicle kilometers traveled. Horizontal curvature is positively associated with no-injury crashes, and vertical grade is positively associated with injury crashes. These results are consistent with the findings of previous studies and conform to engineering experience. Significant heterogeneous and spatial effects are also found for both no-injury and injury crashes, which justifies the applicability of the proposed model.
Although the study is empirical in nature, it provides a good overall understanding of the safety effects of freeway traffic composition and can inform strategies for freeway vehicle safety improvement. However, the current study has some limitations, which future research can address.
- First, each of the grouped vehicle categories contains several vehicle types, and the crash risks of these types may differ. More detailed traffic data may provide deeper insight into the relationship between traffic composition and freeway safety.
- Second, weather conditions are not considered because of data collection limitations. Controlling for weather factors may help further reveal the occurrence mechanism of freeway crashes.
- Last, the 0-1 first-order neighbor structure used here is the most prevalent method for defining the proximity matrix in CAR models. Other neighbor structures, such as distance-based ones, can be explored to find the most promising one for spatially modeling freeway crash frequency by severity.

Table 2: Descriptive statistics of variables in the model.
a pcu: passenger car units

Table 3: Parameter estimates in the bivariate CAR model.
a, b BCI: Bayesian credible interval.
c Boldface indicates statistical significance at the corresponding credibility level.

Table 4: Hyperparameter estimates in the bivariate CAR model.
a BCI: Bayesian credible interval.
b Boldface indicates statistical significance at the corresponding credibility level.

Table 5: Comparison of the bivariate CAR, univariate CAR, and (aspatial) bivariate Poisson lognormal models.