Traffic Accident Prediction Based on LSTM-GBRT Model

Road traffic accidents are a concrete manifestation of road traffic safety levels, yet current traffic accident prediction suffers from low accuracy. To provide traffic management departments with more accurate forecast data that can be applied in traffic management systems to support scientific decisions, this paper establishes a traffic accident prediction model based on LSTM-GBRT (long short-term memory, gradient boosted regression trees) and predicts traffic accident safety level indicators by training on traffic accident-related data. Compared with various regression models and neural network models, the experimental results show that the LSTM-GBRT model has a good fitting effect and robustness. The LSTM-GBRT model can accurately predict the safety level of traffic accidents, so that the traffic management department can better grasp the situation of traffic safety levels.


Introduction
"By 2020, halve the number of global deaths and injuries from road traffic accidents" is one target of the Sustainable Development Goals (SDGs) published by the United Nations (UN) in 2015 [1]. The country's attention to traffic safety continues to increase. Applying traffic accident prediction results to traffic planning can improve traffic safety. Many experts and scholars have predicted indicators of traffic accidents [2,3]. The research methods fall mainly into three categories: statistical regression methods [4], grey prediction [5], and neural network methods.
Statistical regression methods include time series prediction and many classic empirical traffic accident models (Smid model, I. Agalal model, Japanese model, and Beijing model). Yannis et al. [6] proposed an autoregressive nonlinear time-series model of traffic fatalities in Europe. Kumar and Toshniwal [7] proposed a novel framework for time series data of road traffic accidents, which segments the time series data into different clusters for trend analysis. Ihueze and Onwurah [8] analyzed road traffic crashes in Anambra State, Nigeria, with the intention of developing accurate predictive models for forecasting crash frequency in the state using autoregressive integrated moving average (ARIMA) and autoregressive integrated moving average with explanatory variables (ARIMAX) modelling techniques. The regression model is simple and convenient to calculate, and it can predict short-term data changes. The essence of the regression model is a linear fit to the data. However, the results predicted by the model are one-sided and have weak resistance to interference. Because traffic accidents are inherently random and have many influencing factors, the reliability of its prediction results is not guaranteed.
The grey prediction model can work with a small number of samples; its principle is simple, its computation is fast, and its testability is strong. It can make short-term and medium-term macro-level predictions for data with little fluctuation. The essence of the model is to find the dynamic relationship within the road traffic accident sequence data. However, grey theory is modeled for a class of series that conforms to the condition of a smooth discrete function, and the grey system model describes only a process that monotonically increases or decays exponentially over time. Shi et al. [9] proposed a sequence GM (1, 1) model with strong exponential law to predict traffic accidents, but the model can only describe a monotonic change process. Hosse et al. [10] applied a Grey Systems Theory MGM (1,4) model to predict the development of road traffic accidents in Germany until 2025 based on the market diffusion of the electronic stability program (ESP). Liu and Wu [11] proposed a grey Verhulst prediction model for road traffic accidents, which is suitable for nonmonotonic oscillating sequences or S-shaped sequences with saturation. Zhao et al. [12] proposed a model that weighted and combined a variety of grey prediction methods. Although the prediction accuracy was improved, its essence is a linear combination of the original data, and there are still shortcomings in medium- and long-term prediction.
The neural network prediction method has strong nonlinear mapping ability, high robustness, and powerful self-learning ability and has been widely used in many fields. He and Guo [13] proposed a traffic accident prediction model based on the BP neural network. The model can implement any nonlinear mapping and is especially suitable for problems with complex internal mechanisms. The shortcomings of the BP neural network model include slow training convergence, long training time, and a tendency to become trapped at saddle points. Liwei et al. [14] proposed a grey neural network model. The grey theory compensated for the distortion that data mining suffers on small samples, while the neural network compensated for the limitation that grey theory can only be used for short-term prediction. Although the model improves the training speed, the accuracy of its prediction results is low and the deviation is too large.
This paper proposes an LSTM-GBRT model for traffic accident prediction. The LSTM layer captures time-dependent information in the data, and the GBRT model contributes the high robustness of ensemble learning to model training.

Long Short-Term Memory.
The LSTM model [15] proposed by Hochreiter et al. is a variant of the recurrent neural network (RNN). It builds a specialized memory storage unit that is trained through the backpropagation through time algorithm. It addresses the difficulty that the RNN has in learning long-term dependencies. The schematic diagram of the LSTM structure is shown in Figure 1. The standard LSTM can be expressed as follows. For each step $t$, the corresponding input sequence is $X = \{x_1, x_2, \ldots, x_t\}$, the input gate is $i_t$, the forget gate is $f_t$, and the output gate is $o_t$. The memory cell state $c_t$ controls data memory and forgetting through the different gates. The formulas are as follows:
$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1}), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1}), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1}).
\end{aligned}
\tag{1}
$$
The memory cell state $c_t^j$ of the $j$th LSTM unit at time $t$ is as follows:
$$
c_t^j = f_t^j \circ c_{t-1}^j + i_t^j \circ \tilde{c}_t^j, \qquad \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1}).
$$
After the memory cell state is updated, the current hidden layer $h_t^j$ is calculated:
$$
h_t^j = o_t^j \circ \tanh(c_t^j),
$$
where $W$ is the weight matrix of the input, $U$ is the state transition weight matrix, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, $h_t$ is the hidden state vector of the output, $\tilde{c}_t$ is the candidate cell state, $c_t$ is the new cell state after the adjustment and update, and "$\circ$" indicates element-wise multiplication. The three types of gates jointly control the information entering and leaving the memory cell state: the input gate admits new information into the memory cells, the forget gate controls how much information is retained in the memory cells, and the output gate determines how much information can be output. The gate structure of the LSTM allows the information in the time series to form a balanced long short-term dependency.
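A minimal sketch of one time step of the gate updates described above, written for a single scalar LSTM unit so that it needs only the standard library. The dictionary layout of `W` and `U`, the numeric weights, and the omission of bias terms are illustrative assumptions, not values from this paper.

```python
import math


def sigmoid(z):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U):
    """One step of a single scalar LSTM unit.

    W and U are dicts keyed by "i", "f", "o", "c" that hold the input weight
    and the state transition weight of each gate (hypothetical layout).
    """
    i_t = sigmoid(W["i"] * x_t + U["i"] * h_prev)        # input gate
    f_t = sigmoid(W["f"] * x_t + U["f"] * h_prev)        # forget gate
    o_t = sigmoid(W["o"] * x_t + U["o"] * h_prev)        # output gate
    c_tilde = math.tanh(W["c"] * x_t + U["c"] * h_prev)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                   # cell state update
    h_t = o_t * math.tanh(c_t)                           # hidden state
    return h_t, c_t


# Illustrative weights and a short input sequence (assumed values).
W = {"i": 0.5, "f": 0.5, "o": 0.5, "c": 0.5}
U = {"i": 0.1, "f": 0.1, "o": 0.1, "c": 0.1}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, W, U)
```

Because the forget gate multiplies the previous cell state and the input gate multiplies the candidate state, the cell state is a gated blend of old memory and new information at every step.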

Boosting Ensemble Learning Framework.
The GBRT model is a boosting-type [16] ensemble learning algorithm. Ensemble learning is a technical framework that combines multiple different models to perform a task in order to achieve more efficient and accurate results. Commonly used ensemble learning frameworks include bagging, boosting, and stacking. The training process of the boosting framework is stepwise: the base models are trained in order, and the training set of each base model is transformed according to a certain strategy. Then, the prediction results of all the base models are linearly combined to produce the final prediction result. Figure 2 is a schematic diagram of the boosting ensemble learning framework. The overall model based on the boosting framework can be described by a linear combination:
$$
F(x) = \sum_{i} h_i(x),
$$
where $h_i(x)$ is the product of the $i$th base model and its weight. The training goal of the overall model is to make the predicted value $F(x)$ approximate the true value $y$, that is, to make the prediction of each base model approximate its share of the true value to be predicted. $h_t$ is tested by using training examples, and the weight of misclassified instances is increased. The researchers came

[Figure legend: data vector, dot product, sigmoid σ, activation function]
[Figure annotations: fit the residual; introduce an arbitrary loss function and fit the negative gradient]

Gradient Boosted Regression Trees Model.
For a given data set with $n$ examples and $m$ features, $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model uses $K$ additive functions to predict the output:
$$
\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
$$
where $\mathcal{F} = \{f(x) = w_{q(x)}\}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees. Here, $q$ represents the structure of each tree that maps an example to the corresponding leaf index, $T$ is the number of leaves in the tree, and each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$. Unlike decision trees, each regression tree contains a continuous score on each leaf, and we use $w_i$ to represent the score on the $i$th leaf.
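The additive tree ensemble can be illustrated with a deliberately simplified sketch: least-squares boosting on a single feature with depth-1 trees (stumps), where each new tree is fitted to the residuals of the current ensemble. This is a toy stand-in for the idea, not the GBRT implementation used in the experiments; the data, the function names, and the learning rate are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree to the residuals by least squares.

    Returns (threshold, w_left, w_right): the split is the tree structure q,
    and the two leaf scores are the leaf weights w. Assumes xs has at least
    two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        w_left = sum(left) / len(left)
        w_right = sum(right) / len(right)
        sse = (sum((r - w_left) ** 2 for r in left)
               + sum((r - w_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, w_left, w_right)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Return the leaf weight of the leaf that x falls into."""
    threshold, w_left, w_right = stump
    return w_left if x <= threshold else w_right


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Build K = n_trees additive stumps, each fitted to current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken outputs of all trees."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data with a step between x = 3 and x = 4 (assumed values).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
model = fit_boosted_stumps(xs, ys)
mse_base = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_boosted = mse(ys, [predict(model, x) for x in xs])
```

Each stump plays the role of one $f_k$: its split is the structure $q$ and its two leaf values are the weights $w$.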

Road Safety Impact Factor Data.
Traffic accidents are caused by a combination of factors: people, vehicles, roads, and the environment. People include pedestrians and drivers; vehicles include motor vehicles and nonmotor vehicles on the road; roads refer to road conditions; the environment refers to the natural environment and the social environment, and the social environment includes political, economic, cultural, and other factors. When collecting data, as much accident-related data as possible should be considered. The data used in this paper include gross domestic product (GDP) (100 million yuan), per capita GDP (yuan), gross national income (100 million yuan), road mileage (10,000 kilometers), highway mileage (10,000 kilometers), number of civilian vehicles (10,000 vehicles), number of drivers (10,000 people), passenger traffic (10,000 people), road passenger traffic (10,000 people), total population at the end of the year (10,000 people), male population (10,000 people), female population (10,000 people), urban population (10,000 people), rural population (10,000 people), and the total number of deaths from traffic accidents per year (persons). The data are the 1997-2016 data of the National Bureau of Statistics of China and are shown in Table 1.

Prediction Index for Road Safety Level.
The measures of traffic safety level generally include the number of accidents, deaths, injuries, and property losses. Indicators such as the number of accidents, the number of injured people, and economic losses are subject to subjective influence, and their accuracy is difficult to judge. The statistics on the number of deaths are true and reliable, difficult to falsify, and comparable. Therefore, to ensure the accuracy of the data, this article uses the number of deaths as the indicator of traffic safety level and predicts the number of deaths.

Variable Correlation Analysis.
If the information in the data is uncorrelated or noisy, the quality of the predictions may be affected [17]. In this paper, variables are screened by comparing the chi-square value and the Pearson correlation coefficient. The chi-square value is calculated as follows:
$$
\chi^2(V_k) = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - F_{ij})^2}{F_{ij}},
$$
where $k$ indexes the variable of the $k$th group, $r$ is the variable number, $c$ is the target variable number, $d = (r - 1)(c - 1)$ is the degree of freedom, $f_{ij}$ is the observation frequency of the variable $V_k$, and $F_{ij}$ is the expected frequency of the variable $V_k$. The Pearson correlation coefficient is calculated as follows:
$$
R = \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i} (X_i - \bar{X})^2}\,\sqrt{\sum_{i} (Y_i - \bar{Y})^2}},
$$
where $R$ is the correlation coefficient, $X$ is the independent variable, $Y$ is the dependent variable, $\bar{X}$ is the mean of the independent variable, and $\bar{Y}$ is the mean of the dependent variable. The chi-square test measures the deviation between observed and expected frequencies; the greater the chi-square score, the more evident the association.
The Pearson correlation coefficient gives a rough measure of the linear correlation between variables, and its absolute value indicates the degree of correlation. According to the chi-square scores and the Pearson coefficients in Table 2, we removed the variable with the smallest chi-square score (highway mileage) and the variable with the smallest Pearson coefficient (road passenger traffic). Finally, the 12 remaining independent variables and the death toll were used as input variables, 13 in total.
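The two screening statistics can be sketched with the standard library alone. The chi-square function below assumes the inputs are given as an r x c table of observed frequencies, which matches the definition in the text; how the continuous variables of this study were converted to frequencies is not stated, so that step is left out. All sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def chi_square(table):
    """Chi-square statistic and degrees of freedom of an r x c frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up example values.
r_value = pearson([1, 2, 3, 4], [2, 4, 6, 8])
stat, dof = chi_square([[10, 20], [20, 10]])
```

A variable would then be dropped when its score is the smallest among the candidates, which is the selection rule described above.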

Model Performance Evaluation Index.
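In this paper, the error rate (E) and the root mean square error (RMSE) were used to compare the degree of prediction deviation, and the root mean square logarithmic error (RMSLE) and the coefficient of determination (R-square) were used to measure the fitting capacity of the model. The error rate and root mean square error formulas are as follows:
$$
E = \frac{|Y_p - Y_0|}{Y_0} \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_{p,i} - Y_{0,i})^2}.
$$
The formulas for the root mean square logarithmic error and the coefficient of determination are as follows:
$$
\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl(\log(Y_{p,i} + 1) - \log(Y_{0,i} + 1)\bigr)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_{0,i} - Y_{p,i})^2}{\sum_{i=1}^{n} (Y_{0,i} - Y_{mean})^2},
$$
where $n$ is the number of samples, $Y_0$ is the original value, $Y_p$ is the predicted value, and $Y_{mean}$ is the sample mean.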
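A minimal sketch of the four evaluation indices follows. The exact forms are assumptions based on common conventions: the error rate is taken as the relative absolute deviation of a single prediction, and the RMSLE uses log(1 + y); the function names are hypothetical.

```python
import math


def error_rate(y_true, y_pred):
    """Relative absolute deviation of one prediction (assumed definition of E)."""
    return abs(y_pred - y_true) / y_true


def rmse(y_true, y_pred):
    """Root mean square error over paired sequences."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error, using log(1 + y)."""
    n = len(y_true)
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / n
    )


def r_square(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE and the error rate measure deviation in the original units or relative to the true value, whereas RMSLE compares values on a logarithmic scale and is therefore less dominated by large counts.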

LSTM-GBRT Modelling Methodology
The LSTM neural network is capable of capturing time-dependent information and performs very well in time series prediction, but it is weak at predicting inflection point data. The GBRT model is a typical representative of ensemble learning algorithms and is robust. In this paper, the LSTM-GBRT model is proposed by combining the two methods. The LSTM neural network is used to extract features carrying time-dependent information, and the GBRT model is then trained on these features to predict traffic accidents. The structure of the LSTM-GBRT model is shown in Figure 3.

Normalization.
The raw data are processed using min-max normalization to eliminate dimensional differences. A linear transformation of the original data maps the result into the [0, 1] interval, and the conversion formula is as follows:
$$
X = \frac{x - \min}{\max - \min},
$$
where max represents the maximum value of the feature in the sample data and min represents the minimum value of the feature in the sample data; $x$ represents raw data, and $X$ represents normalized data.
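As a small illustration, min-max normalization of one feature column can be written as follows. The function name and the handling of a constant column (mapped to zeros) are assumptions added for the sketch.

```python
def min_max_normalize(values):
    """Scale one feature column linearly into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros (assumed choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([10, 20, 30])
```

Each feature is scaled separately, so variables measured in yuan, kilometers, or persons end up on the same scale.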

LSTM Layer Hidden Unit Number.
There is no clear theoretical guidance for determining the number of nodes in the hidden layer. In general, the following formula is used to select the number of nodes:
$$
N = \sqrt{n + m} + a, \tag{7}
$$
where $N$ is the number of hidden nodes, $n$ is the number of input nodes, $m$ is the number of output nodes, and $a$ is a constant between 1 and 10. In this paper, there are 13 input nodes and 1 output node. According to formula (7), the number of hidden nodes is 5∼13. Different numbers of hidden nodes were tried with a single LSTM layer, and the degree of deviation was judged by the error rate and the root mean square error in order to select the number of hidden nodes.
The experimental results on the test set show that the LSTM model with 11 hidden nodes has the smallest RMSE value and the best prediction effect. The detailed error rate and root mean square error results on the test set are shown in Table 3.
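The candidate range can be checked with a short calculation, assuming the widely used rule of thumb N = sqrt(n + m) + a with a between 1 and 10, which reproduces the stated range of 5 to 13 for 13 inputs and 1 output. The function name is hypothetical.

```python
import math


def hidden_node_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    """Integer hidden-node counts allowed by N = sqrt(n + m) + a."""
    base = math.sqrt(n_inputs + n_outputs)
    lowest = math.ceil(base + a_min)    # smallest integer at or above the lower bound
    highest = math.floor(base + a_max)  # largest integer at or below the upper bound
    return list(range(lowest, highest + 1))


candidates = hidden_node_candidates(13, 1)
```

Since sqrt(14) is about 3.74, the bounds are about 4.74 and 13.74, so the integer candidates run from 5 to 13, and each candidate is then compared by its test-set error.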

LSTM Layer Depth.
Since there are only 19 records in this example, an excessively deep model would overfit the data. The experiment compares models with 1∼5 layers, and 11 hidden nodes are used for each layer. After training, the model performance is judged by calculating the root mean square logarithmic error and the coefficient of determination over all records. The fitting results are shown in Table 4.
Journal of Control Science and Engineering
The smaller the RMSLE of the model, the better the fitting effect. The closer the R-square is to 1, the stronger the ability of the variables to explain y and the better the model fits the data. According to the results in Table 4, the 2-layer LSTM model has the best fitting ability.

GBRT Layer Regularization.
The regularized objective is as follows:
$$
L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
$$
where $\Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2$. Here, $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$; the second term $\Omega$ penalizes the complexity of the model (i.e., the regression tree functions). The additional regularization term helps to smooth the final learnt weights to avoid overfitting.
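The penalty term can be made concrete with a short sketch that evaluates the complexity penalty of each tree from its leaf weights and adds it to a squared-error loss. The choice of squared error for the loss l, the parameter values, and the function names are assumptions for illustration.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """Complexity penalty of one tree: gamma * T + 0.5 * lambda * ||w||^2."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def regularized_objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Squared-error loss plus the summed penalties of all trees.

    `trees` is a list of leaf-weight lists, one list per regression tree.
    """
    loss = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    return loss + sum(tree_penalty(w, gamma, lam) for w in trees)


penalty = tree_penalty([1.0, -2.0], gamma=0.5, lam=2.0)
objective = regularized_objective([1.0, 2.0], [1.0, 3.0], [[1.0, -2.0]],
                                  gamma=0.5, lam=2.0)
```

A tree with more leaves or larger leaf scores pays a higher penalty, which is how the objective discourages overly complex trees.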

Hyperparameters of LSTM-GBRT Model.
The hyperparameters of the LSTM layer include the number of network layers, the number of hidden cells in each layer, the learning rate, and the optimizer type; the parameter settings are shown in Table 5.
The hyperparameters of the GBRT layer include the learning rate, the number of estimators, the maximum depth of the tree, the minimum number of samples required to split a node, the minimum number of samples required at a leaf node, and the loss function. This paper uses GridSearchCV to automatically find the optimal hyperparameters. The final parameter settings are shown in Table 6.
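The idea behind an exhaustive grid search can be sketched without any library: every combination of candidate values is scored and the best one is kept. This is not the scikit-learn GridSearchCV API; the grid values and the `toy_score` function, which stands in for a cross-validated score, are hypothetical and are not the settings of Table 6.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for three GBRT hyperparameters.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
}


def toy_score(params):
    """Stand-in for a validation score; highest at (0.1, 100, 3) by construction."""
    return -(abs(params["learning_rate"] - 0.1)
             + abs(params["n_estimators"] - 100) / 100
             + abs(params["max_depth"] - 3))


best_params, best_score = grid_search(param_grid, toy_score)
```

With three values for each of three hyperparameters the search evaluates 27 combinations, so the cost grows quickly as more hyperparameters or values are added.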

Experimental Environment.
The experimental environment of this example is a TOSHIBA Satellite S40-A laptop with an Intel(R) Core(TM) i3-3217U CPU at 1.80 GHz and 10 GB of memory, running Windows 10 Enterprise Edition 2016 (long-term servicing version). The development environment is the PyCharm integrated development tool with Python 3.5; neural network models such as LSTM are provided by Keras, and the GBRT model is provided by scikit-learn.

Experimental Design and Analysis of Results.
The experiments cover traditional regression models, neural network models, and ensemble models. The models compared are multivariate nonlinear regression (MUL), the BP neural network model (BP), the long short-term memory neural network model (LSTM), the gradient boosted regression trees model (GBRT), and the LSTM-GBRT model. The 15 records from 1998 to 2012 were used as the training set, and the four records from 2013 to 2016 were used as the test set. The data from the previous year are used as the input sample to predict the number of traffic accident deaths in the following year. Figure 4 is a trend chart of actual traffic accident deaths from 1998 to 2016.
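The sample construction described above can be sketched as follows: the features and death toll of year t - 1 form the input, the death toll of year t is the target, and the split is made by target year. The record values below are synthetic placeholders, not the statistics used in this paper, and the function name is hypothetical.

```python
def make_lagged_samples(records):
    """Pair each year's death toll with the previous year's data.

    `records` maps year -> (feature_list, deaths). Each sample is a tuple
    (target_year, inputs_from_previous_year, deaths_in_target_year).
    """
    samples = []
    for year in sorted(records):
        if year - 1 in records:
            prev_features, prev_deaths = records[year - 1]
            _, deaths = records[year]
            samples.append((year, list(prev_features) + [prev_deaths], deaths))
    return samples


# Synthetic placeholder records for 1997-2016 (one made-up feature per year).
records = {
    year: ([float(year - 1996)], 1000.0 * (year - 1996))
    for year in range(1997, 2017)
}
samples = make_lagged_samples(records)
train = [s for s in samples if s[0] <= 2012]
test = [s for s in samples if s[0] >= 2013]
```

Twenty years of data yield 19 lagged samples, which split into 15 training samples (targets 1998 to 2012) and 4 test samples (targets 2013 to 2016). Splitting by year rather than at random keeps the test years strictly after the training years.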
After experimental training, the prediction results of each model in the test set are shown in Table 7.
The prediction results on the test set show that the BP neural network and MUL regression models show no obvious regularity and that their prediction accuracy is not high. The accuracy of LSTM in 2013, 2014, and 2015 was extremely high, but its deviation in 2016 increased suddenly. An analysis of the trend of the actual number of deaths in Figure 4 shows that 2016 is an inflection point in the time period, whereas the trend of the first three test years is consistent with the trend of the training data set. This indicates that LSTM predicts very well when the trend continues; conversely, when the year to be forecast is an inflection point of the trend, its performance drops suddenly. It also shows that the LSTM model can indeed learn the time-dependent information in the data. Figure 5 shows the actual death toll in 1998-2016 and the fitted prediction results for each model.
After observing the prediction results of each model, we evaluate how well each model fits all the data. In addition to the performance indicators mentioned in Section 3.4, we add the training time of each model for comparison. The performance indicators are shown in Table 8. They show that the LSTM-GBRT model has the smallest RMSLE value and therefore the best fitting effect, and that its R-square value is closest to 1, so its variables have the strongest explanatory power for the predicted value, but its training time is the longest. The GBRT model has the shortest training time. The prediction performance of the GBRT model is good but slightly lower than that of the LSTM-GBRT model. The performance of the LSTM neural network is lower than that of the GBRT model, and the performance of the MUL regression model and the BP neural network model is poor.
In terms of training time, the MUL regression model and the GBRT model have the shortest training time because their essence is a linear combination of mathematical data; the LSTM-GBRT model has the longest training time, and the LSTM model training time is very close to the BP neural network training time. The training time of the neural network models is clearly longer than that of the MUL and GBRT models because training a neural network model requires constructing a complex network structure.

Robustness Analysis.
The occurrence of a traffic accident is influenced by many factors. A predictive model that can predict stably under complex and variable conditions has better robustness.
When analyzing the robustness of the model in this experiment, two aspects should be considered: first, internal factors, namely whether there are abnormal fluctuations in the model training data; second, external factors, namely whether policies introduced at the social level have promoted or inhibited the predicted data. For both internal and external factors, the core lies in the data. External factors act indirectly, by affecting the data required for training, and their effect is then reflected in the prediction. The external factors are the part that is difficult for the model to control.
In this case, the model uses annual data, and policy factors act over a short period and cause little data fluctuation, so the robustness is better. When the model uses finer-grained data, the influence of data fluctuation will increase. Firstly, anomalous data should be analyzed visually, the uniformity of each variable should be observed, and uneven data should be processed, for example with a log function. Secondly, the abnormal variables are divided into two or more types, each with its own processing strategy. After the correlation analysis is completed, two or more models are established for training and prediction. Finally, the prediction results of the multiple models are accumulated. Training models for specific classes of data can also improve the accuracy of prediction, thus enhancing the robustness of the model.

Conclusion
The prediction of traffic accidents is of great significance. Forecasting future traffic accident trends can help the traffic management department grasp trend dynamics in time, discover the patterns of traffic accidents, formulate laws and regulations according to those patterns, make scientific decisions, and construct the traffic system reasonably.
This paper proposes a road traffic accident prediction model based on the LSTM-GBRT model. Compared with the traditional regression model, the traditional BP neural network model, the LSTM neural network model, and the GBRT model, the experimental results show that the LSTM-GBRT model has the strongest ability to fit the data and that its variables have the best explanatory power for the predicted value. The model has a good predictive ability for the trend of road traffic safety level and can provide more accurate forecast data for the traffic management department, so that the department can better grasp the situation of traffic safety levels.
The model proposed in this paper also has defects. (1) In terms of data collection, the model training lacks relevant data on environmental factors. Road traffic accidents are highly random, and their occurrence is affected by many factors. Environmental data are spatiotemporal data, which are difficult to collect and not easy to quantify for annual traffic accident data, so weather and environmental factors are missing from the model training data. (2) In terms of inflection point prediction, since an inflection point of the trend is unlikely to be discovered by the model in advance, the ability to forecast possible future inflection points is poor.
This paper takes China's annual traffic accident data as the research object, so the proposed prediction task is relatively macroscopic, and the predictability of microdata needs further experiments. This paper considered adding more relevant features, but in macrodata forecasting, related features are difficult to obtain or difficult to quantify. In the future, when microscopic traffic accident data are taken as the research object, more features will be considered.
Data Availability
The raw data we used were official open data published by the UK Department of Transportation, and our experimental data were filtered from raw data available online at https://data.gov.uk/dataset/road-accidents-safety-data. The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was funded by the Xinjiang Uygur Autonomous Region Natural Science Fund Project "Research on Highway