The Model of Severity Prediction of Traffic Crash on the Curve

This study focuses on traffic crashes on curved road segments. A logistic regression model for predicting crash severity on curves was established from a sample crash database of 20000 entries collected from 4 regions of China and 15 evaluation indicators covering driver, driving environment, and traffic environment factors. Maximum Likelihood Estimation and backward stepwise selection (the step-back technique) were used for the analysis, which identified weather, roadside protection facility, and pavement structure as the three main contributory factors to crash severity on curves. The Hosmer-Lemeshow test was used to verify the reliability of the model, and the model variables are discussed.


Introduction
As a major component of road geometric design, curved road segments are, owing to their alignment characteristics, the most crash-prone of all road geometric elements. In 2010, crashes on curved segments accounted for 10.5% of all traffic crashes in China. Correspondingly, deaths in these crashes accounted for 12.89% of all traffic crash deaths, as shown in Figure 1.
Many studies have examined the characteristics and causes of crashes on curved segments. They range from studies based on macro-level statistics, which compare the crash rates of straight and curved segments [1,2], through studies based on vehicle dynamics, which focus on improving vehicle safety and passing performance on curves [3,4], to studies of human factors, which concentrate on drivers and driver behavior [5]. Vehicle speed and geometric design have also been studied in connection with crashes on curves [6,7]. Regarding crash severity, several mathematical models have been established, such as the Bayesian method [8], the ordered probit model [9], and the neural network approach [10]. However, few of these studies aimed specifically at investigating crash severity on curved road segments.
Analysis and prediction models are based on crash data. In the past, the data used in studies of traffic crashes on curves were mostly gathered from specific segments, and neither the crash samples nor the crash-related factors were duly considered. For these reasons, crash analysis often failed to explain the real causes of crashes, and the models were often questionable in terms of reliability and adaptability.
In this study, the crash data consist of 500 samples randomly chosen, according to the data gathering and management standards of China's "National Road Traffic Crash Information System," from a crash database of 20000 valid entries that covers 4 regions of China from 2004 to 2008. The database comprises 60 items of crash information, which is comprehensive enough to recreate the process of a crash and provides an important basis for the analysis of causes. In terms of analysis method, the crash data are the foundation for analyzing the causes of crashes, and the structure and form of the crash data determine the model for crash causation analysis. Since the causes attributed to each crash cannot be isolated from a single point of interpretation [11], this study conducts a comprehensive analysis of crash severity on curves from three aspects: driver, vehicle type, and road environment.
In recent years, owing to its innate suitability, the logistic regression model has been widely used in practical problems in fields such as banking [12], genomics [13], and psychopathology [14]. Many researchers have used the logistic regression model to analyze road safety, for example in studies of single-vehicle motorcycle crashes [15] and crash prevention [16,17]. Descriptive factors are coded as natural numbers starting from 0, which transforms them into discrete variables. With this transformation, traffic crash data meet the requirements of the logistic regression model, so the severity prediction model for traffic crashes on curved segments can be built on these data. The model [18] can overcome the deficiencies of the traditional Mantel-Haenszel analysis method and of linear regression analysis, because it can accommodate multiple influencing factors, both discrete and continuous. The model can effectively analyze mixed influences and interactions among external variables and therefore provides a methodological basis for quantitatively describing the relationship between multiple influencing factors and crash severity on curves.
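The coding step described above can be illustrated with a minimal sketch. The category labels and the `WEATHER_CODES` mapping are hypothetical and are not the coding used in Table 1; the expansion into indicator variables is one common way to feed a multi-level factor to a logistic regression, shown here as an assumption rather than the procedure of this study.

```python
# Hypothetical coding of one descriptive factor (not the values of Table 1).
WEATHER_CODES = {"clear": 0, "rain": 1, "fog": 2, "snow": 3}


def encode(label, codes):
    """Map a descriptive label to its natural-number code (starting from 0)."""
    return codes[label]


def dummy_variables(code, n_levels):
    """Expand a code into n_levels - 1 indicator (dummy) variables.

    Level 0 is treated as the reference category and maps to all zeros.
    """
    return [1 if code == level else 0 for level in range(1, n_levels)]


records = ["clear", "snow", "rain"]  # invented sample records
coded = [encode(r, WEATHER_CODES) for r in records]
dummies = [dummy_variables(c, len(WEATHER_CODES)) for c in coded]
```

With this coding, a clear-weather record becomes the all-zero reference row, and every other level switches on exactly one indicator.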

Binary Logistic Regression Probability Formula.
The severity of a traffic crash on a curved segment is regarded as the dependent variable $y$: when the $i$th crash involves any number of deaths, $y_i = 1$; otherwise, $y_i = 0$.
Assuming there are $k$ influencing factors related to the dependent variable $y$, denoted $x = (x_1, x_2, \ldots, x_k)$, the probability of a fatal crash under the impact of the influencing factors is

$$P(y_i = 1 \mid x) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)},$$

where $x_j$ ($j = 1, 2, \ldots, k$) are the influencing factors of crash severity, each of which can be a continuous, categorical, or dummy variable; $\beta_j$ ($j = 1, 2, \ldots, k$) are the regression coefficients, and $\beta_0$ is the constant term.
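The logistic probability described above can be evaluated directly once coefficients are available. The following is a minimal sketch; the function name `fatal_crash_probability` and the coefficient values in the usage line are illustrative assumptions, not estimates from this study.

```python
import math


def fatal_crash_probability(beta0, betas, x):
    """Return P(y = 1 | x) for a binary logistic model.

    beta0: constant term; betas: regression coefficients;
    x: values of the influencing factors, in the same order as betas.
    """
    z = beta0 + sum(b * xj for b, xj in zip(betas, x))
    # Two algebraically equivalent forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative (invented) coefficients: the linear predictor is 0 here.
example = fatal_crash_probability(0.0, [1.0, -1.0], [2.0, 2.0])
```

When the linear predictor is zero, as in the usage line, the model assigns equal odds to a fatal and a non-fatal outcome.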

Binary Logistic Regression Model Parameter Estimation.
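Considering the severity of the crash to be a dichotomous dependent variable $y$, let $x_1, \ldots, x_k$ be its corresponding independent variables. Let $X_i = (1, x_{i1}, \ldots, x_{ik})^T$ be the variable vector of the $i$th crash, $i = 1, \ldots, n$, and let $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^T$ be the corresponding coefficient vector. Thus, a logistic model may be established as [15]

$$\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = X_i^T \beta,$$

in which $p_i = P(y_i = 1 \mid X_i)$. The model uses the Maximum Likelihood Estimation method for parameter estimation, and the likelihood function of $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^T$ can easily be derived from the binary logistic regression model, which is

$$L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i},$$

where $y_i = 0$ or 1. Therefore, the logarithmic likelihood function is

$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i X_i^T \beta - \ln\left(1 + \exp\left(X_i^T \beta\right)\right) \right].$$

Setting $\partial \ln L(\beta) / \partial \beta = 0$ gives

$$\sum_{i=1}^{n} (y_i - p_i) X_i = 0.$$

In the logistic regression model, this equation is solved with the Newton-Raphson method. Define $e_i = y_i - p_i$, $V_i = p_i (1 - p_i)$, and let $e = (e_1, \ldots, e_n)^T$, $V = \operatorname{diag}(V_i)$, and $X$ be the $n \times (k+1)$ matrix whose $i$th row is $X_i^T$; then

$$\frac{\partial \ln L(\beta)}{\partial \beta} = X^T e, \qquad \frac{\partial^2 \ln L(\beta)}{\partial \beta \, \partial \beta^T} = -X^T V X.$$

If the number of iterations is $t$, then the maximum likelihood estimate of $\beta$ becomes

$$\beta^{(t+1)} = \beta^{(t)} + \left(X^T V X\right)^{-1} X^T e,$$

where $V$ and $e$ are evaluated at $\beta^{(t)}$.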
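The Newton-Raphson estimation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the software used in this study: the function names, the stopping tolerance, and the toy data are all invented for the example, and the linear system is solved with a small hand-written Gaussian elimination.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, max_iter=25, tol=1e-8):
    """Maximum likelihood fit; returns [beta_0, beta_1, ..., beta_k].

    rows: one list of factor values per crash (without the leading 1);
    y: list of 0/1 outcomes.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        # Score vector X^T (y - p).
        score = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p))
                 for j in range(k)]
        # Information matrix X^T V X with V = diag(p_i * (1 - p_i)).
        info = [[sum(xi[r] * xi[c] * pi * (1.0 - pi) for xi, pi in zip(X, p))
                 for c in range(k)] for r in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented toy data with overlapping classes, so a finite estimate exists.
toy_x = [[0], [1], [2], [3], [4], [5]]
toy_y = [0, 0, 1, 0, 1, 1]
beta_hat = fit_logistic(toy_x, toy_y)
```

At convergence the score vector is numerically zero, which is the first-order condition of the likelihood equations. Data in which the two outcomes are perfectly separated by the predictors have no finite maximum likelihood estimate, so the iteration would not settle in that case.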

Selection of Crash Impact Factor
A detailed analysis of the structure of the crash database shows that it is a superdimensional structure with radial, multidimensional, and multilevel characteristics. Each crash record contains multiple data attributes, and each value reflects one aspect of a traffic crash's characteristics. The data attributes are organized under five factors: crash basic information, personnel information, vehicle information, road information, and environmental information. Logically, therefore, the five factors and the data attributes form a two-level structure, with the five factors in the upper layer and the data attributes in the lower layer. In addition, the attributes and their values form sets that follow specific logical relations, and each attribute has its own internal hierarchy. The 500 crash samples, randomly selected from the crash information database of 20000 entries, were analyzed. The causes of crashes are mostly related to poor subjective judgment and human error, vehicle performance issues, changes in the external environment, and changes in road conditions. However, vehicle performance is generally not considered in crash analysis and is neglected in this study.
Therefore, from a three-level analysis system of drivers, driving environment, and road environment, the model selects 15 evaluation items as independent variables. The evaluation of the independent variables is shown in Table 1.

Prediction Model of Traffic Crash Severity on Curved Segments
The model used the stepwise regression method to analyze the independent variables, with backward elimination (the step-back technique) to obtain the results. In step 1, all 15 independent variables were entered into the model, and variables were then removed one at a time based on the probability of the likelihood-ratio test statistic. In step 13, weather $x_6$, roadside protective facility type $x_{12}$, road pavement $x_{14}$, and the constant were retained. Results are shown in Table 2.
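The backward elimination described above can be sketched as follows. This is an illustration under assumptions, not the procedure of the statistical software used in this study: every variable is treated as a single numeric column, so each likelihood-ratio test has one degree of freedom (a multi-level categorical factor would need more), the removal threshold `alpha` is set to 0.05, and all function names are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def max_loglik(rows, y, cols, max_iter=50, tol=1e-10):
    """Fit a logistic model on the chosen columns; return its log-likelihood."""
    X = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [_sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        score = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p))
                 for j in range(k)]
        info = [[sum(xi[r] * xi[c] * pi * (1.0 - pi) for xi, pi in zip(X, p))
                 for c in range(k)] for r in range(k)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [_sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def backward_eliminate(rows, y, alpha=0.05):
    """Drop the least significant column until all remaining ones pass the test."""
    cols = list(range(len(rows[0])))
    while cols:
        full = max_loglik(rows, y, cols)
        p_values = {}
        for c in cols:
            reduced = max_loglik(rows, y, [k for k in cols if k != c])
            lr = max(0.0, 2.0 * (full - reduced))  # likelihood-ratio statistic
            # Upper tail of a chi-square distribution with 1 degree of freedom.
            p_values[c] = math.erfc(math.sqrt(lr / 2.0))
        worst = max(p_values, key=p_values.get)
        if p_values[worst] <= alpha:
            break
        cols.remove(worst)
    return cols
```

The function returns the indices of the columns that survive elimination. Each pass refits the model once per remaining variable, which is affordable for 15 candidates and a few hundred records.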
Using a significance level of 0.05 and backward stepwise selection over 13 screening steps, the model identified weather, roadside protective facility type, and road pavement structure as the factors associated with crash severity on curves. In the Hosmer-Lemeshow test, the results in Table 3 show a significance of 0.674, which is greater than 0.05, so the null hypothesis that the model fits the data is not rejected. In addition, the chi-square value decreased from 7.856 in the first screening step to 0.79 in the 13th, which indicates that the fit of the model improved.
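For readers unfamiliar with the Hosmer-Lemeshow test, the statistic can be computed as in the following minimal sketch. The function name and the grouping rule (equal-sized groups after sorting by fitted probability) are assumptions for illustration; statistical packages may handle ties and group boundaries differently.

```python
def hosmer_lemeshow_statistic(y, p, groups=10):
    """Return the Hosmer-Lemeshow chi-square statistic.

    y: observed 0/1 outcomes; p: fitted probabilities from the model;
    groups: number of risk groups formed after sorting by p.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        mean_p = expected / size
        # Skip degenerate groups whose variance term would be zero.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat
```

The statistic is referred to a chi-square distribution with `groups - 2` degrees of freedom; a large p-value means there is no evidence of poor fit. Obtaining the p-value requires a chi-square survival function, for example `scipy.stats.chi2.sf`, which is left out here to keep the sketch dependency-free.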
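The probability of crash with death from the logistic regression is expressed by

$$P(y = 1) = \frac{\exp\left(\beta_0 + \beta_6 x_6 + \beta_{12} x_{12} + \beta_{14} x_{14}\right)}{1 + \exp\left(\beta_0 + \beta_6 x_6 + \beta_{12} x_{12} + \beta_{14} x_{14}\right)},$$

where the constant $\beta_0$ and the coefficients $\beta_6$, $\beta_{12}$, and $\beta_{14}$ take the values estimated in step 13 (Table 2).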

Conclusion
(1) Not all of the 15 impact factors were retained in the curve road crash severity prediction model. Some were excluded partly because of weak correlations, but this does not mean that they have little impact on the severity of a crash. For example, visibility and road surface conditions are closely related to weather conditions, and their effects may be indirectly reflected in the final model; such factors could be considered further.
(2) In bad weather, the road surface friction coefficient decreases and visibility is reduced, so vehicles driving on curved segments are prone to crossing the median, causing sideswipe, rollover, or rear-end crashes. In rainy or foggy weather, a water film forms on the road surface and reduces tire friction. Weather is therefore one of the main factors that affect crash severity on curved segments, and the severity is even greater under ice and snow conditions.
(3) Highway safety design often focuses on road alignment and profile but gives too little consideration to roadside facilities. In practice, however, the roadside environment is closely related to traffic crashes. Run-off-road crashes occur frequently on curves, mainly because of improper speed control and the roadside environment, especially on mountainous curves where one side of the road is a deep ravine or a cliff. Roadside protection facilities on curved segments can greatly reduce the severity of a crash. Hence, better design and installation of roadside protection facilities should be one of the priorities for preventing severe crashes.
(4) Compared with gravel pavement, asphalt roads generally have higher crash rates. Compared with cement concrete pavement, asphalt roads also have a smoother surface and allow greater involuntary horizontal movement of vehicles. Drivers therefore usually feel more comfortable on an asphalt-paved surface, which often leads to high-speed driving and speed-related crashes. Asphalt pavement is also more sensitive to temperature: its structural strength decreases at high temperatures, and it cracks at low temperatures. However, drivers often neglect the effect of temperature changes on the condition of the road surface and on safe driving, which is another very important factor in causing severe crashes.
(5) The existing road traffic crash database of China was originally designed to determine responsibility for crashes for legal purposes. Although its 60 data items or attributes are very comprehensive, the database still needs improvement in order to better describe the whole process of an actual traffic crash and to serve as the basis for crash prediction and prevention studies.

Figure 1: National crash loss on the curve in 2010.

Table 1: Evaluation of independent variables.