Combined Prediction Model of Death Toll for Road Traffic Accidents Based on Independent and Dependent Variables

To build a combined model that follows the variation pattern of road traffic death toll data, reflects the influence of multiple factors on traffic accidents, and improves prediction accuracy, a Verhulst model was built from the death toll of road traffic accidents in China from 2002 to 2011; car ownership, population, GDP, highway freight volume, highway passenger transportation volume, and highway mileage were chosen as the factors for a multivariate linear regression model of the death toll. The two models were then combined into a weighted prediction model, with the weight coefficients calculated by the Shapley value method, which assesses the contribution of each model. Finally, the combined model was used to recalculate the death toll from 2002 to 2011 and was compared with the Verhulst and multivariate linear regression models. The results showed that the new model could not only characterize the death toll data but also quantify the influence of each factor on the death toll, and that it had high accuracy as well as strong practicability.


Introduction
With the gradual progress of the "Science and Technology Action Plan of Traffic Safety" and the implementation of the "Law of the People's Republic of China on Road Traffic Safety," the number of traffic accidents and the degree of injury have shown a decreasing trend since 2004; however, the death toll was still about 60 000 every year. Among the four traffic accident indicators, death toll has a direct effect on the sense of security and the stability of society, so forecasting the future death toll is of great significance for making subsequent traffic management measures and policies and for guiding the development of traffic safety technology. Therefore, the prediction of death toll has always been a key point of related studies [1][2][3][4][5]. However, traffic accidents are difficult to forecast because of their randomness, and because the related forecasting methods are affected by various factors, their precision is difficult to guarantee.
The commonly used prediction methods include the regression analysis method, exponential smoothing method, fuzzy analysis method, and time series method. In the area of traffic safety and accidents, gray theory, the Markov method, and artificial neural networks are several major prediction methods. For example, on the basis of the gray model GM(1, 1) for traffic accident prediction, Li et al. [6] introduced the Markov chain prediction method and built the gray Markov prediction model. By making use of the advantages of artificial neural networks, such as strong nonlinear approximation, fuzzy reasoning, and self-learning, Dong and Shi [7] built the BP neural network prediction model of traffic accidents. Zhang et al. [8] used the ARIMA model to study the stationarity of the time series of traffic accident mortality per 100 000 people in China from 1970 to 1997 and used SPSS software to fit the model and make forecasts. Their conclusion was that the ARIMA model could improve prediction accuracy and could be applied to different kinds of nonseasonal and seasonal time series. Based on gray system theory and Markov theory, Zhao and Xu [9] used the system cloud gray model SCGM(1, 1)c to fit the general trend of the time series data of road traffic and put forward the gray weighting Markov model SCGM(1, 1)c, which could be used to predict the number of traffic accidents. This model was suitable for dynamic prediction with short time series, little data, and moderate random fluctuations.
The above methods have their own characteristics, but each has its own defects. For example, the GM(1, 1) model in gray theory starts from the traffic accident data, analyzes the characteristics and change law of the data, and predicts the future trend. The model is easy to use and there is no need to consider other factors, but it can only describe a monotonic changing process. Combining it with Markov theory gives the "gray Markov model," which is applicable to the random fluctuation process of traffic accidents; however, the model has no uniform standard for classifying the system states. An artificial neural network simulates the process of information input and decision-making output of the human brain; the specific process of information processing and model building is not shown, which makes the method simple and convenient, but its accuracy is greatly influenced by the data. The multiple regression method can build a mathematical relationship between accident results and related factors and quantify the process and extent of the influence of various factors on accidents. Its accuracy is comparatively poor, however, because the selection of factors is variable and the future trends of the factors must be predicted before the final prediction of accidents, which means that predicted data are used as inputs to the next prediction.
This paper summarized and analyzed the advantages and disadvantages of the above models. To begin with, the Verhulst model, the gray theory model most suitable for traffic accidents, was used to make a preliminary prediction on the basis of analyzing the characteristics of accident data [10,11], which could characterize the changing trend of accidents. Meanwhile, in order to reflect the impact of other factors on traffic accidents, a multiple regression model of the death toll caused by traffic accidents was built to analyze the dependent relationship of traffic accidents. Then, to combine the advantages of the two models, a combined prediction model of traffic death toll, based on independent and dependent variables, was built. This model could reflect not only the fluctuation law of traffic accidents but also the dependence of the traffic death toll on the interaction of multiple factors.

The Verhulst Model of Traffic Death Toll
2.1. Verhulst Model. In recent years, the development of the traffic death toll in China has shown a saturated S-shaped process, so the Verhulst model was suitable for prediction [12]. The fundamentals and procedure of building the Verhulst model are as follows.
Using the least squares method to solve the problem gives $\alpha = (B^{T}B)^{-1}B^{T}Y$.
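As an illustration of this least squares step, the following minimal sketch fits a gray Verhulst model in its common textbook form, $x^{(0)}(k) + a z^{(1)}(k) = b\,(z^{(1)}(k))^2$, and evaluates the time response $\hat{x}^{(1)}(k+1) = a x^{(1)}(1) / [b x^{(1)}(1) + (a - b x^{(1)}(1)) e^{ak}]$. The function names, the treatment of the S-shaped series as $x^{(1)}$, and the synthetic logistic data are assumptions made for the example; they are not the paper's data and may differ from its exact formulation.

```python
import numpy as np

def fit_verhulst(x1):
    """Estimate (a, b) of the gray Verhulst model by least squares.

    x1 is treated as the S-shaped (accumulated) series; this is an
    assumption of the sketch, not a statement of the paper's setup.
    """
    x1 = np.asarray(x1, dtype=float)
    x0 = np.diff(x1)                    # x0(k) = x1(k) - x1(k-1)
    z1 = 0.5 * (x1[1:] + x1[:-1])       # background (mean) values
    B = np.column_stack([-z1, z1 ** 2])
    # Least squares solution, equivalent to (B^T B)^{-1} B^T Y with Y = x0
    (a, b), *_ = np.linalg.lstsq(B, x0, rcond=None)
    return a, b

def predict_verhulst(x1_first, a, b, steps):
    """Time response of the Verhulst model for k = 0, 1, ..., steps - 1."""
    k = np.arange(steps)
    return a * x1_first / (b * x1_first + (a - b * x1_first) * np.exp(a * k))

# Synthetic S-shaped data (hypothetical, for illustration only)
t = np.arange(10)
series = 100.0 / (1.0 + 9.0 * np.exp(-0.2 * t))

a, b = fit_verhulst(series)
fitted = predict_verhulst(series[0], a, b, len(series))
relative_error = np.abs(fitted - series) / series
print(a, b, relative_error.max())
```

Solving with `np.linalg.lstsq` rather than forming the inverse explicitly gives the same estimate and is numerically more stable when the columns of $B$ differ greatly in scale.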

Analysis of Model Influence Factors.
A road traffic system is mainly composed of human, vehicle, road, and environment, and each subsystem contains multiple factors. If one or more factors go wrong, traffic safety will be compromised and the probability of traffic accidents will increase. Therefore, road traffic accident prediction needs to analyze the above four subsystems in both macroscopic and microscopic terms: the features of accidents should be considered, the factors closely related to accidents studied, and the process and causes of accidents quantified. This paper studied the death toll law and the future development tendency of traffic accidents in China, which belonged to macroresearch, and thus this paper planned to select macroindicators as factors, such as population, vehicle population, highway mileage, and passenger and freight turnover volume. The main reason for selecting macroindicators was that these factors could reflect the overall degree of traffic activity. For example, with a large population base, the trip volume was relatively larger; the increase of vehicle population and highway mileage would encourage travelers to travel; passenger and freight turnover volume could directly reflect how frequently passengers and freight moved. More traffic activity would increase the base number of traffic accidents and was therefore related to traffic accidents. Besides, the number of traffic accidents and the death toll were also influenced by policies, security technology, and some other factors; however, these kinds of factors were difficult to quantify, and an unscientific quantification would affect the correctness and precision of the prediction model, so this paper did not select such indicators for the time being.

Death Toll Forecast.
With the traffic accident death tolls from 2002 to 2011 in China as the dependent variable and the above relevant data as independent variables, a model was built. The detailed data were shown in Table 3. SPSS 18.0 was used to build a multiple linear regression model, to calculate the correlation between the above factors and the dependent variable, the death toll, and to calculate the determination coefficient $R^2$ and the $F$ and $t$ test values. The details were shown in Table 4.
In Table 4, the minimum correlation coefficient between the independent variables and the death toll was 0.890, which indicated that the above six factors were significantly correlated with the traffic death toll, so it was feasible to build a multiple linear regression model. In addition, the regression model had $R^2 = 0.996$ and $F = 67.431$, which indicated that the regression equation fitted the data very well. The coefficient of each factor was shown in Table 5.
The equation of the regression model could be obtained from the above coefficient values. In the equation, $y$ was death toll/person, $x_1$ was vehicle population/$10^4$ vehicles, $x_2$ was population/$10^4$ persons, $x_3$ was GDP/$10^8$ yuan, $x_4$ was freight volume by road/$10^4$, $x_5$ was passenger volume by road/$10^4$ persons, and $x_6$ was road mileage/km.
The death tolls from 2002 to 2011 were predicted again by the above equation, and the prediction values as well as the relative errors could be seen in Table 6.
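The regression step can be sketched as follows. This is a minimal illustration using ordinary least squares in NumPy, not the SPSS 18.0 procedure used in the paper; the function name `fit_linear_regression` and the synthetic data (10 observations of 6 factors, mirroring only the shape of the problem) are assumptions of the example.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficient vector (intercept first) and the
    determination coefficient R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((y - fitted) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
    return coef, 1.0 - ss_res / ss_tot

# Hypothetical data: 10 observations of 6 factors with an exact linear law
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
true_coef = np.array([5.0, 1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
y = true_coef[0] + X @ true_coef[1:]

coef, r2 = fit_linear_regression(X, y)
relative_error = np.abs(A_fit := (np.column_stack([np.ones(10), X]) @ coef) - y) / np.abs(y)
print(coef, r2)
```

With only 10 observations and 7 estimated parameters, as in the setting described above, a high $R^2$ is easy to reach, so the fitted equation should be checked on data not used for fitting.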

Combined Prediction Model of Traffic Death Toll
Suppose the combined prediction model is $f(t) = \sum_{i=1}^{n} w_i f_i(t)$, in which $w_i$ is the weighting coefficient of each model, $w_i \ge 0$, and $\sum_{i=1}^{n} w_i = 1$. The model in this paper was the combination of two models, so we could let $f_1(t)$ be the Verhulst model and $f_2(t)$ be the multiple regression model to build the combined prediction model $f(t) = w_1 f_1(t) + w_2 f_2(t)$.
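A weighted combination of this kind can be sketched in a few lines. The function name `combine_forecasts` and the numbers in the usage lines are hypothetical; they only illustrate the constraints that the weights are nonnegative and sum to one.

```python
import numpy as np

def combine_forecasts(forecasts, weights):
    """Weighted sum of single-model forecasts.

    forecasts: array of shape (n_models, n_periods)
    weights:   array of shape (n_models,), nonnegative, summing to 1
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return weights @ forecasts

# Hypothetical forecasts of two models over two periods
combined = combine_forecasts([[100.0, 200.0], [300.0, 400.0]], [0.25, 0.75])
print(combined)  # [250. 350.]
```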
In the above combined prediction model, the weighting coefficients would affect the accuracy of the model directly, so the selection of reasonable weighting coefficients was very important. The selection methods included the arithmetic average method, standard deviation method, mean square inverse method, analytic hierarchy process, and optimal weighting method. The arithmetic average was the simplest method but the least reasonable, because it could not reflect the differences between the models or their contributions to the final prediction results. In the analytic hierarchy process, the values of the weighting coefficients had to be assigned manually by relevant scholars, which made them subject to subjective factors. The accuracy of the optimal weighting method was very high, but the calculation was complicated; besides, the weighting coefficients might be negative, which greatly limited its practical application.
This paper determined the weighting coefficients by the Shapley method, a mathematical method proposed by Professor Shapley in 1953 for multiperson cooperation games, which achieves a fair and efficient allocation of the total revenue of a team among its members [13]. Its greatest advantage was that both the principle and the result were readily accepted as fair by each partner. The total error of the combined prediction, generated by the joint action of the single prediction methods, could be regarded as the result of a "cooperation relationship" among prediction methods serving the same purpose.

Shapley Value Method.
Suppose the combined prediction model contains $n$ kinds of prediction methods, denoted by $I = \{1, 2, \ldots, n\}$; $s$ was any subset of $I$, $E(s)$ was the combined error of this subset, the mean absolute error of the $i$th prediction method was $E_i$, and the total error of the combined prediction was $E$. The values were as follows:

$$E_i = \frac{1}{m}\sum_{j=1}^{m} |e_{ij}|, \qquad E = \frac{1}{n}\sum_{i=1}^{n} E_i.$$

In the above formulas, $m$ was the number of samples and $e_{ij}$ was the prediction error of the $i$th prediction method for the $j$th data point. The distribution formula of the Shapley value was

$$E_i' = \sum_{s \in S_i} w(|s|)\,\bigl[E(s) - E(s - \{i\})\bigr], \qquad w(|s|) = \frac{(n - |s|)!\,(|s| - 1)!}{n!}.$$

In the formula, $S_i$ was the collection of subsets containing prediction model $i$, $|s|$ was the number of prediction models in the combination $s$, $w(|s|)$ was the weight factor reflecting the contribution of model $i$ in the combined model, and $s - \{i\}$ was the combination $s$ with model $i$ removed.
The weight of each prediction method in the combination prediction was

$$w_i = \frac{1}{n - 1} \cdot \frac{E - E_i'}{E}, \qquad i = 1, 2, \ldots, n.$$

Weight Calculation of Prediction Model.
According to the results of Tables 2 and 6, the total error of the combined prediction was $E = \tfrac{1}{2}(1.534 + 2.700) = 2.117$.
Based on the concept of the Shapley value, the members of the "cooperation relationship" that shared the total error of the combined prediction model were $I = \{1, 2\}$, and the combined errors of all of its subsets were $E\{1\}$, $E\{2\}$, and $E\{1, 2\}$, respectively. The value of each combined error was the average of the errors of the models contained in the subset. See Table 7.
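The two-member allocation can be checked numerically with the following sketch. It assumes the usual Shapley formula with weight factor $(n - |s|)!\,(|s| - 1)!/n!$ and the weight rule $w_i = (E - E_i')/[(n - 1)E]$ commonly used in Shapley-based combination forecasting; the function names and the labeling of member 1 as the model with mean error 1.534 and member 2 as the model with mean error 2.700 are assumptions of the example.

```python
from itertools import combinations
from math import factorial

def shapley_errors(subset_error, n):
    """Shapley share of the total error for members 1..n.

    subset_error maps frozenset(members) -> combined error E(s);
    the error of the empty subset is taken as 0.
    """
    members = range(1, n + 1)
    shares = {}
    for i in members:
        others = [j for j in members if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                without_i = frozenset(rest)
                s = without_i | {i}
                weight = factorial(n - len(s)) * factorial(len(s) - 1) / factorial(n)
                total += weight * (subset_error[s] - subset_error.get(without_i, 0.0))
        shares[i] = total
    return shares

def shapley_weights(shares):
    """Assumed weight rule: smaller error share gives larger weight."""
    n = len(shares)
    total = sum(shares.values())
    return {i: (total - share) / ((n - 1) * total) for i, share in shares.items()}

# Subset errors built from the mean errors quoted in the text
subset_error = {
    frozenset({1}): 1.534,
    frozenset({2}): 2.700,
    frozenset({1, 2}): 0.5 * (1.534 + 2.700),
}
shares = shapley_errors(subset_error, 2)
weights = shapley_weights(shares)
print(shares)   # approximately {1: 0.4755, 2: 1.6415}
print(weights)
```

Under these assumptions the two shares sum to the total error 2.117, and the member with the smaller error share receives the larger weight.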
According to the Shapley calculation method, the Shapley value of each member was obtained. The sum over the two members was $E_1' + E_2' = 0.4755 + 1.6415 = 2.117 = E$, which indicated that the total error was fully allocated between the two members. The prediction values of the combined model, shown in Table 8, could be obtained through calculations.
Among the above three models, the Verhulst model could reflect the fluctuation of accident data, so its precision varied considerably; however, its comprehensive error of 2.700% was still very small. The multiple regression model was affected by multiple factors and drew on many original data, which could reflect the variation trends of the practical data, so the accuracy of the model was very high. Its maximum relative error was only 3.977% and its comprehensive error only 1.534%. The relative error of the combined prediction model lay between the errors of the above two models, and its comprehensive error was 2.205%. Though its accuracy was a little lower than that of the multiple regression model, the combined model was applicable to medium- and long-term accident prediction, because its results could reflect the change rule of accident data and it adopted the dependent relationship between the factors and the number of accidents from the multiple regression model. Therefore, it could not only reflect the future tendency qualitatively but also reflect the mathematical relationship between the factors and the death toll quantitatively.

Prediction Value of Traffic Death Tolls in China from 2012 to 2013.
The data from 2012 to 2013 are not used for modeling, so they are fit for verifying the accuracy of the model, and the related statistical data from 2012 to 2013 are shown in Table 9.
The traffic death tolls in China from 2012 to 2013 were calculated by formulas (13), (14), and (20), and the values and relative errors are shown in Table 10.
From Table 10, the death toll predicted by the multiple linear regression model for 2012 is 62402, with a relative error of 4.009%, but for 2013 the prediction is 73962 and the relative error is 32.035%. Passenger volume by road in 2013 fell to 52.107% of its 2012 value, whereas from 2002 to 2011 it rose by close to 10% a year. This may be closely related to the operation of China's high-speed rail in 2013, which explains the big gap between the prediction result and reality in 2013. The results show that the multiple linear regression model gives erratic projections during political and economic instability. The stability of the Verhulst model is higher.

Conclusion
(1) The occurrence of traffic accidents, with great randomness and burstiness, relates to human, vehicle, road, environment, and other factors. It is difficult to predict the change rule of accidents with common models. The Verhulst model is regarded as the prediction model closest to the change rule of the accident data; however, it cannot describe the quantitative influence of other factors on the death toll.
(2) By quantifying the mathematical relationships between the multifactor variables and the dependent variable, the multiple regression model can reflect the objective law that traffic accidents are affected by many factors; however, the factors are difficult to choose, and predicted values of the factors are needed for each new prediction, so the errors are usually very large.
(3) By combining the above two methods and calculating the weight coefficient of each model through the Shapley value method, a combined prediction model can be built based on the Verhulst model and the multiple linear regression model. The combined model can not only describe the features of the accident fatality data but also quantify the impact of the factors on the death toll; in addition, its accuracy is very high and the model is very practical.

Table 1: Traffic accident fatalities in China from 2002 to 2011.

Table 2: Predicted value by Verhulst model of traffic accident fatalities in China from 2002 to 2011.
3.1. Model Overview. The main idea of the multiple linear regression method is to analyze the correlation between a dependent variable and two or more independent variables. The method has been studied extensively and is mature. After the regression model was built, a statistical test of the model was necessary, including the determination coefficient test ($R^2$ test), the significance test of the regression coefficients ($t$-test), and the significance test of the regression equation ($F$ test). If the significance test of the regression equation failed, it was possible that important factors had been missed during the selection of independent variables or that the relationship between the independent and dependent variables was nonlinear, in which case the model should be rebuilt.

Table 3: Related statistical data from 2002 to 2011.

Table 5: The coefficient value of each factor.

Table 6: The prediction value of multiple linear regression model of traffic death tolls from 2002 to 2011 in China.

Table 7: Errors of each subset.

Table 8: The combined model prediction value of traffic death tolls in China from 2002 to 2011.

Table 9: Related statistical data from 2012 to 2013.

Table 10: The combined model prediction value of traffic death tolls in China from 2012 to 2013.

Table 10 shows that, although it is more stable than the multiple linear regression model, the Verhulst model also has large relative errors, 13.112% and 14.471% in 2012 and 2013, respectively. The death toll prediction of the combined model is very good: the relative errors are 9.267% in 2012 and 4.025% in 2013, and the results coincide with the actual values, so the combined model has a high validity.