Intelligent Method for Identifying Driving Risk Based on V2V Multisource Big Data

Risky driving behavior is a major cause of traffic conflicts, which can develop into road traffic accidents, making the timely and accurate identification of such behavior essential to road safety. A platform was therefore established for analyzing the driving behavior of 20 professional drivers in field tests, in which overclose car following and lane departure were used as typical risky driving behaviors. Characterization parameters for identification were screened and used to determine threshold values and an appropriate time window for identification. A neural network-Bayesian filter identification model was established, and data samples were selected to identify risky driving behavior and evaluate the identification efficiency of the model. The results obtained indicated a successful identification rate of 83.6% when the neural network model was solely used to identify risky driving behavior, but this could be increased to 92.46% once corrected by the Bayesian filter. This has important theoretical and practical significance in relation to evaluating the efficiency of existing driver assist systems, as well as the development of future intelligent driving systems.


Introduction
In recent years, advances in sensing and big data technologies in vehicles have led to the development of collaborative driving. Among them, V2V, V2E, and V2D technologies are key driving factors for enhancing the safety level and operating efficiency of vehicles [1,2]. During the driving process, drivers tend to show particular speed and trajectory control behaviors according to the driving environment and the relative motion relationships (V2V information) between the ego vehicle and surrounding target vehicles [3,4]. Driver behavior is the most active and least predictable factor in the road traffic safety system. Accurately and promptly identifying drivers' intentions (V2D technologies) and dangerous driving behaviors can not only ensure traffic safety but also broaden the scope of collaborative driving [5].
Risky driving behavior is a major cause of traffic accidents, and hence such accidents can be reduced or relieved to some extent by the use of driver assistance systems. The main principle of these systems is to monitor driving behavior in real time, acquire data relating to the behavior of the driver relative to the motion state of the vehicle, identify risky driving behavior, and provide a timely warning to the driver. Currently, assistance systems for risky driving behavior are mainly focused on providing warnings for lane departure and vehicle following distance [6,7].
Kozak et al. [8] have proposed adopting yaw and deviation technologies to control lane departure caused by driver fatigue. Angkititrakul et al. [9], on the other hand, presented a concept of virtual lane crossing in which they determined the geometrical characteristics of the lane and the width of a virtual lane by fuzzy logic deduction. Using this, an alarm is triggered when a vehicle is predicted to cross the virtual lane, which provides a higher warning accuracy and a lower false alarm rate. Hsiao et al. [10] presented an implementation scheme for a vision-based lane departure system and established a lane departure decision mechanism that differed from TLC (time to lane crossing) [11] and CCP (current car position) [12] in using image preprocessing, binarization processing, the selection of dynamic thresholds, and the fitting of linear and parabolic models. This mechanism uses the angle between the lane and the horizontal axis as a standard by which lane departure is judged. To deal with tracking and detecting lane departure under actual road conditions, Yu et al. [13] proposed using monocular visual characteristics to track and detect lane departure, that is, determining the positional relation between the vehicle and lane line in real time with pinhole cameras, and then using this to deduce the functional relation between the lane departure rate and the slope ratio of lane lines on both sides during straight driving. The output of this function can then be used to judge whether the vehicle has departed from its lane. Jin et al. [14] established a vehicle following distance model for drivers based on the desired distance and investigated the variation in this distance between drivers with different driving habits based on full-scale vehicle data. The effectiveness of their model was also verified using the software application PreScan.
All the algorithms outlined above require vehicle system dynamic models to varying degrees. They also do not fully take into account the dominant role of driver behavior when predicting risky driving behavior from multisource big data collected during driving, such as the vehicle's motion state and the relative motion between vehicles, which may affect the classification efficiency. This study therefore aims to identify risky driving behavior using a neural network and Bayesian filter and to verify the efficiency of this identification model in relation to overclose vehicle following distance and lane departure.
The rest of this paper is organized as follows. In Section 2 we introduce the experiment under naturalistic driving conditions. In Section 3 risky driving behaviors are divided into different categories. Neural network and Bayesian filter models are described in Section 4. In Section 5 the risky driving behavior identification methods are presented in detail. Finally, we conclude this work in Section 6.

Participants.
A total of 20 professional drivers, including 12 males and 8 females, were recruited for the field test. All drivers were between 28 and 50 years old, with an average age of 41.1 and a standard deviation of 5.85 years. All participants had held a driving license for over 5 years, and during this time they had driven at least 80,000 kilometers. A physical examination was organized prior to field testing, and all participants qualified with no visual, physical, or emotional impairment. After the test, each participant received appropriate payment to compensate for lost working time.

Test Process.
Most studies into driving behavior have been conducted on a driving simulator for the sake of convenience and ease of implementation. However, unlike real-world driving tests, a driving simulator cannot accurately reflect the influence of the surrounding environment on drivers. As a result, a platform was established for collecting driving behavior data based on full-scale vehicles to determine characteristic parameters such as the behavior of the driver and the state of motion of the vehicle. In field driving tests based on this platform, participants were allowed to complete driving tasks by relying on their own driving habits and expectations based on personal judgments and decisions about the real-time traffic state [15,16].
Before testing, drivers were asked to fill out a personal information form that was used to calibrate the test equipment according to the physiological and psychological characteristics of each participant. The detailed test route was then introduced along with matters needing attention. Each driver was given 15 min to familiarize themselves with the test platform and vehicle, with field testing then beginning after an 8 min break.

Test Platform.
The test platform included a FaceLAB 5 eye tracking system to track the eye and head movements of drivers; a millimeter-wave radar to measure the relative distance, relative speed, and angle between the test vehicle and target vehicle; a VBOX system to ascertain the vehicle's speed, transverse acceleration, longitudinal acceleration, yaw velocity, and so on; a lane line identification system for monitoring the distance between the vehicle and the lane line; and a torque transducer to measure the steering wheel angle and angular velocity of steering. A detailed description of the system is provided in Figure 1. We developed a synchronous acquisition system to collect the driving behavior data, with an overall sampling frequency of about 10 Hz, which meets the requirement of driving risk identification.

Test Route.
The Sigongli to Qijiang section of the Chongqing-Guizhou Expressway was selected as the test route to simplify the system composition and reduce the influence of interference factors on the test results. The test route was a 60-kilometer two-way 6-lane road with a dividing green belt and a speed limit of 110 km/h. Traffic monitoring was conducted for 2 weeks prior to testing, and the test time was determined based on these results to minimize the influence of traffic volume change on driving behavior.

Definition of Risky Driving Behavior.
The first risky driving behavior of interest in this study is an overclose vehicle following distance, which is defined as a distance between the vehicle and the target vehicle ahead that is less than the safe threshold value. When vehicles are too close to each other, any unexpected braking action by the vehicle ahead may cause a rear-end collision, and the close following itself can impose psychological pressure on the driver of the vehicle in front [17][18][19].
The second risky driving behavior is lane departure, which occurs when the vehicle runs on, or too near to, the lane line. Note that such behavior is distinctly different from that necessary for drivers to change lanes actively and safely. Lane departure is more dangerous than an overclose following distance because it is more difficult for the drivers ahead of and behind the departing vehicle to judge its driver's intentions, thus making traffic accidents more likely, especially when travelling at high speed [20,21]. The technical route for identifying these risky driving behaviors is shown in Figure 2.

State Division and Determination of Characterization Parameters.
Driving behavior in this study was classified as one of three states, namely, normal driving behavior, overclose vehicle following, and lane departure. The typical scenes played out during driving are shown in Figure 3. The relative distance, relative speed, and angle between vehicles were acquired using the millimeter-wave radar. Meanwhile, the distance between the target vehicle and lane line was determined by the lane identification system and CCD camera.
Figure 4 provides a sample of lane departure data for when the driver drove on the lane line after changing lanes. The lane line distance in the figure refers to the distance between the left tires and the left lane line after processing through a Kalman filter to ensure data continuity. A vehicle is considered to be running on the lane line when the lane line distance remains at approximately 0 cm. The mean value of the data in a single identification time window serves as the criterion for state division. That is, when the mean lane line distance in a single time window is within ±10 cm, the change in lane line distance is assessed in several consecutive time windows before and after that time. According to Figure 4, the first 60 lane line distance samples change continuously and consistently while gradually approaching the lane line, which can be regarded as a lane change. After the 60th sample, the lane line distance stays at approximately 0 cm for multiple consecutive time windows, and hence the first instance of 0 cm is considered the start time for lane departure. Figure 5 shows the change in the steering wheel angle corresponding to the different stages of driving in Figure 4 and reveals that the angle at the point of lane departure is around 0°.
The characterization parameters for overclose vehicle following include the distance, angle, and relative speed to the vehicle ahead [22][23][24]. During the calibration process for overclose car following, we first consider whether the two vehicles are in the same lane based on their relative angle to each other. If they are, then whether the test vehicle is overly close to the other vehicle is assessed according to the relative distance, relative speed, and time to collision (TTC, the ratio of relative distance to relative speed) [25][26][27]. The settings for relevant parameter thresholds will be introduced in the sections that follow.
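The window-mean criterion for lane departure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 15-sample window (1.5 s at roughly 10 Hz), the requirement of two consecutive windows, and the synthetic trace are all assumptions made for the example.

```python
def window_means(samples, window_len):
    """Mean of each consecutive, non-overlapping window of window_len samples."""
    return [
        sum(samples[i:i + window_len]) / window_len
        for i in range(0, len(samples) - window_len + 1, window_len)
    ]


def lane_departure_start(lane_dist_cm, window_len=15, band_cm=10.0, min_windows=2):
    """Index of the first sample of the first run of at least min_windows
    consecutive windows whose mean lane line distance lies within +/-band_cm.
    Returns None when no such run exists."""
    run = 0
    for w, mean in enumerate(window_means(lane_dist_cm, window_len)):
        run = run + 1 if abs(mean) <= band_cm else 0
        if run >= min_windows:
            return (w - run + 1) * window_len
    return None


# Synthetic trace (not measured data): 60 samples approaching the lane line,
# followed by 45 samples on the line.
trace = [180 - 3 * i for i in range(60)] + [0.0] * 45
print(lane_departure_start(trace))  # 60
```

In this synthetic trace the approach phase never satisfies the ±10 cm criterion, so the reported start index is the first sample at which the distance settles at 0 cm.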
The relative speed and relative angle of the two vehicles changed only slightly during testing, as shown by the data sample in Figure 6. The relative distance, however, changed more significantly over the entire vehicle following process. From the 15th to the 40th sample, the relative speed was positive and the relative distance was small. Under this relative motion state the risk of traffic conflict increases, so this stage was defined as overclose vehicle following.

Neural Network Algorithm and Bayesian Filter
A backpropagation (BP) neural network was used to identify the driving behavior state, and the identification results were then input into a Bayesian filter. The BP algorithm includes the forward propagation of signals and the backpropagation of errors; that is, the actual output is calculated from input to output, while the weights and thresholds are corrected from output to input [28,29]. In Figure 7, $x_j$ represents the input at the $j$th node of the input layer (where $j = 1, 2, \ldots, M$), $w_{ij}$ is the weight from the $i$th node on the hidden layer to the $j$th node on the input layer, $\theta_i$ is the threshold of the $i$th node on the hidden layer, $\phi(x)$ is the excitation function of the hidden layer, $w_{ki}$ is the weight from the $k$th node on the output layer to the $i$th node on the hidden layer (where $i = 1, \ldots, q$), $a_k$ is the threshold of the $k$th node on the output layer (where $k = 1, \ldots, L$), $\psi(x)$ is the excitation function of the output layer, and $o_k$ is the output of the $k$th node on the output layer.

Forward Propagation of Signals.
The input $net_i$ of the $i$th node on the hidden layer is
$$net_i = \sum_{j=1}^{M} w_{ij} x_j + \theta_i.$$
The output $y_i$ of the $i$th node on the hidden layer is
$$y_i = \phi(net_i) = \phi\left(\sum_{j=1}^{M} w_{ij} x_j + \theta_i\right).$$
The input $net_k$ of the $k$th node on the output layer is
$$net_k = \sum_{i=1}^{q} w_{ki} y_i + a_k.$$
The output $o_k$ of the $k$th node on the output layer is
$$o_k = \psi(net_k) = \psi\left(\sum_{i=1}^{q} w_{ki} y_i + a_k\right).$$
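The forward pass through a network with one hidden layer can be sketched in plain Python. This is an illustrative sketch only: the sigmoid excitation function for both layers, the convention of adding the thresholds as biases, and all function and parameter names are assumptions, since the text does not specify them.

```python
import math


def sigmoid(z):
    """Logistic excitation function (assumed; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, theta, w_out, a):
    """One forward pass through a single-hidden-layer network.

    x        : list of M inputs
    w_hidden : q rows of M weights (input layer -> hidden layer)
    theta    : q hidden-layer thresholds, added as biases here
    w_out    : L rows of q weights (hidden layer -> output layer)
    a        : L output-layer thresholds, added as biases here
    """
    hidden = [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) + t)
        for row, t in zip(w_hidden, theta)
    ]
    output = [
        sigmoid(sum(w * yi for w, yi in zip(row, hidden)) + b)
        for row, b in zip(w_out, a)
    ]
    return hidden, output


# With all weights and thresholds at zero, every node outputs sigmoid(0) = 0.5.
hidden, output = forward([1.0, 2.0], [[0.0, 0.0]] * 3, [0.0] * 3, [[0.0] * 3], [0.0])
print(hidden, output)  # [0.5, 0.5, 0.5] [0.5]
```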

Backpropagation of Errors.
The backpropagation of errors consists of calculating the output errors of the neurons on each layer, starting with the output layer, and then adjusting the weights and thresholds of each layer by error gradient descent so that the final output of the corrected network is closer to the expected value. See Figure 8 for a detailed outline of the calculation procedure.
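A single error-backpropagation update by gradient descent can be sketched as below. The sketch assumes sigmoid excitation functions, a squared-error criterion, a learning rate of 0.5, and made-up initial weights; none of these come from the paper, and the study itself reports training with the Levenberg-Marquardt algorithm in MATLAB rather than with plain gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_h, th, w_o, a):
    y = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + t) for row, t in zip(w_h, th)]
    o = [sigmoid(sum(w * yi for w, yi in zip(row, y)) + b) for row, b in zip(w_o, a)]
    return y, o


def backprop_step(x, target, w_h, th, w_o, a, lr=0.5):
    """Update weights and thresholds in place for one sample.
    Returns the squared error measured before the update."""
    y, o = forward(x, w_h, th, w_o, a)
    # Output-layer error terms: dE/dnet_k for E = 0.5 * sum((o_k - t_k)^2)
    delta_o = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]
    # Hidden-layer error terms, propagated back through the output weights
    # before those weights are modified
    delta_h = [
        yi * (1.0 - yi) * sum(d * w_o[k][i] for k, d in enumerate(delta_o))
        for i, yi in enumerate(y)
    ]
    for k, d in enumerate(delta_o):
        for i, yi in enumerate(y):
            w_o[k][i] -= lr * d * yi
        a[k] -= lr * d
    for i, d in enumerate(delta_h):
        for j, xj in enumerate(x):
            w_h[i][j] -= lr * d * xj
        th[i] -= lr * d
    return 0.5 * sum((ok - tk) ** 2 for ok, tk in zip(o, target))


# Hypothetical 2-2-1 network and a single training sample.
w_h = [[0.1, -0.1], [0.2, 0.3]]
th = [0.0, 0.0]
w_o = [[0.1, -0.2]]
a = [0.0]
errors = [backprop_step([0.5, -0.2], [1.0], w_h, th, w_o, a) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on the same sample drives the error down, which is the behavior the paragraph describes: each pass moves the network output closer to the expected value.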

Bayesian Filter.
The Bayesian network is a graphical means of presenting the probabilistic connections between variables [30]. It provides a more natural way of using causal information to find the potential relations among data [31]. Within this network, the variables are represented by nodes and the dependency among variables is represented by directed edges. The Bayes theorem connects the prior probability of an event with its posterior probability. If we assume a random vector $(x, \theta)$, then the joint distribution density of $(x, \theta)$ is given as $p(x, \theta)$, and the marginal densities are $p(x)$ and $p(\theta)$, respectively. It is generally assumed that $x$ is the observation vector and $\theta$ is the unknown parameter vector, and so, for estimating the unknown parameter vector from the observation vector, the Bayes theorem is written as
$$p(\theta \mid x) = \frac{p(x \mid \theta)\,\pi(\theta)}{p(x)} = \frac{p(x \mid \theta)\,\pi(\theta)}{\int p(x \mid \theta)\,\pi(\theta)\,d\theta},$$
where $\pi(\theta)$ is the prior distribution of $\theta$. The general process for estimating the unknown parameter vector by the Bayesian method is as follows.
(1) Consider the unknown parameter as a random vector. Note that this is the biggest difference between the Bayesian method and conventional parameter estimation methods.
(2) Determine the prior distribution $\pi(\theta)$ based on a previous understanding of the parameter $\theta$. This is a controversial part of the Bayesian method and something that has been criticized by proponents of classical statistics.
(3) Calculate the posterior distribution density and infer the unknown parameter.
In step (2), if there is no useful information to help with determining $\pi(\theta)$, Bayes proposed considering its distribution as being uniform; that is, a parameter has the same probability of taking any value within a range, which is called the Bayesian assumption [32,33]. The identification precision of the BP neural network can therefore be improved effectively through proper modification of its output results via a Bayesian network.
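For a parameter that takes a finite number of values, the three steps above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function name and the example numbers are assumptions, and the uniform default prior corresponds to the Bayesian assumption mentioned in the text.

```python
def posterior(likelihood, prior=None):
    """Discrete Bayes rule.

    likelihood[i] is p(x | theta_i) for the observed x.
    prior[i] is the prior probability of theta_i; when omitted, a uniform
    prior (the Bayesian assumption) is used.
    Returns p(theta_i | x) for every i.
    """
    n = len(likelihood)
    if prior is None:
        prior = [1.0 / n] * n
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)  # p(x), the normalizing constant
    return [j / evidence for j in joint]


# Hypothetical likelihoods for three candidate parameter values.
print(posterior([0.2, 0.6, 0.2]))           # uniform prior
print(posterior([0.8, 0.1], [0.5, 0.5]))    # explicit prior
```

With a uniform prior the posterior is simply the normalized likelihood, which is why the choice of prior in step (2) matters only when prior knowledge is available.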

Time Window Calibration.
The selection of a time window has a significant influence on the identification of the driving behavior state. If the time window is too large, then the data in a single time window will contain multiple behavioral characteristics, and so the uniqueness of the state division will be limited. In addition, a time window that is too large will reduce identification efficiency and affect the timeliness of the model. If the time window is too small, on the other hand, then the characteristics of the data in a single time window will be insignificant and the identification accuracy will decrease dramatically. The optimal time window length was determined in this study by comparing the identification accuracy of the neural network for risky driving behavior across different time window lengths, as shown in Figure 9. It can be seen in this figure that the model has the highest identification accuracy with a time window length of 1.5 s, which was selected as the final time window length.
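To make the window length concrete, the sketch below converts a window duration into a sample count and cuts a record into non-overlapping windows. It assumes the roughly 10 Hz sampling rate of the acquisition system and non-overlapping windows with any incomplete tail discarded; the function names are hypothetical.

```python
def samples_per_window(window_s, rate_hz):
    """Number of samples covered by one identification time window."""
    return int(round(window_s * rate_hz))


def split_windows(samples, window_s=1.5, rate_hz=10.0):
    """Cut a record into consecutive, non-overlapping time windows.
    An incomplete window at the end of the record is discarded."""
    n = samples_per_window(window_s, rate_hz)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


# A 5 s record at 10 Hz holds 50 samples, i.e. three full 1.5 s windows.
record = list(range(50))
windows = split_windows(record)
print(samples_per_window(1.5, 10.0), len(windows))  # 15 3
```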

Threshold Determination.
Threshold settings are required for the relative angle, relative speed, and relative distance when defining the risky driving behavior of overclose vehicle following. Figure 10 shows the range of variation in the angle between the test vehicle and other vehicles in the same lane based on extensive observation of millimeter-wave radar data. It is evident from this that, during periods of vehicle following, the relative angle between the vehicles usually lies within ±10° on sections with small road curvature, and thus this range can be considered as the threshold for judging whether vehicles are in the same lane or not. Relevant studies have shown that when the TTC is less than 5 s, drivers tend to be nervous and make many more incorrect operations [34,35]. The vehicle was therefore considered overclose when the following distance reduced the TTC to less than 5 s; that is, a TTC threshold of 5 s was used. Meanwhile, based on an analysis of the running tracks of sample vehicles changing lanes, as well as the duration of any lane change, the time threshold for deciding lane departure was defined as 2 identification time window lengths; that is, a vehicle was considered to be in a dangerous state of lane departure if it ran on the lane line for more than 3 consecutive seconds [36][37][38].
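The overclose-following rule implied by these thresholds can be written as a short check. This is a hedged sketch: the function names are hypothetical, and it assumes that a positive relative speed means the gap to the vehicle ahead is closing, so that the TTC is only finite in that case.

```python
import math


def time_to_collision(rel_distance_m, closing_speed_mps):
    """TTC as the ratio of relative distance to relative (closing) speed.
    Returns infinity when the gap is not closing."""
    if closing_speed_mps <= 0:
        return math.inf
    return rel_distance_m / closing_speed_mps


def is_overclose(rel_distance_m, closing_speed_mps, rel_angle_deg,
                 angle_limit_deg=10.0, ttc_limit_s=5.0):
    """True when the target is in the same lane (|angle| within the limit)
    and the TTC is below the threshold."""
    same_lane = abs(rel_angle_deg) <= angle_limit_deg
    return same_lane and time_to_collision(rel_distance_m, closing_speed_mps) < ttc_limit_s


print(is_overclose(20.0, 5.0, 2.0))   # True: same lane, TTC = 4 s
print(is_overclose(40.0, 5.0, 2.0))   # False: TTC = 8 s
print(is_overclose(20.0, 5.0, 15.0))  # False: target outside the +/-10 degree band
```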

Model Training and Test.
For the naturalistic driving test described above, incidents of overclose vehicle following, lane departure, and normal driving were extracted based on the videos and test number, and all samples were clipped to 5 s for comparison. In all, 2,000 sets of data samples were collected, and of these, 1,200 sets were used for model training. The remaining 800 sets were used to test the model. Each driving behavior type was numbered, as shown in Table 1.
The input for the neural network was the characterization parameters, which included the lane line distance, relative distance, relative speed, relative angle, and TTC value. The output was the serial number of the driving behavior. The dimension of the input layer was therefore 5, the dimension of the output layer was 2, and the number of hidden layers was 1. Neural network training was conducted using the MATLAB software, and the Levenberg-Marquardt algorithm was used to train the network. The weights and thresholds of the network are modified during training to minimize the network output error. After nine rounds of training, the training error converged to 0.00924, which is smaller than the training target of 0.01, indicating that the model achieved good classification performance. After training the model and assessing the performance of the BP network with respect to the training samples, the remaining samples were input into the BP model to identify driving risk. According to an attribute test performed on the test samples, the general identification accuracy of the model was 83.6%.
By taking the output of the neural network model as the input for the Bayesian filter, the filter network first predicts the driving behavior for the next moment. It then outputs the identification results of the neural network-Bayesian filter (BP-BF) model based on an estimated value of the present moment's difference from the previous moment, as well as the output of the neural network model. The output results of the Bayesian filter are probability values, in which it is assumed that $P(s = 0) = a$ and $P(s = 1) = b$, and hence $P(s = 2) = 1 - a - b$. If the output results of the neural network model are consistent with those of the Bayesian filter, it is unnecessary to correct the results. If not, then the filter will correct the output results of the neural network. Table 2 shows the application of the Bayesian filtering process to a select number of samples (partial results), with the predicted value of BF in the table indicating the probability of lane departure. These results indicate that the identification accuracy of the BP-BF model reached 92.46% and that the identification accuracies for overclose vehicle following, lane departure, and normal driving were 88.24%, 91.65%, and 95.40%, respectively. Note that these are all higher accuracies than when a single neural network model is used.
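The predict-then-correct behavior of the filter can be illustrated with a discrete Bayes filter over the three behavior states. This is a sketch under stated assumptions rather than the authors' filter: the transition matrix, the confusion matrix describing how often the network's label matches the true state, the uniform initial belief, and all function names are hypothetical, and the state numbers 0, 1, and 2 are used without assigning them to specific behaviors.

```python
def predict(belief, transition):
    """Prior for the next moment: sum_i belief[i] * transition[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(prior, nn_label, confusion):
    """Posterior after observing the network's label.
    confusion[s][z] = P(network outputs z | true state s)."""
    joint = [prior[s] * confusion[s][nn_label] for s in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]


def filter_labels(nn_labels, transition, confusion, belief):
    """Run the filter over a label sequence; return the most probable state at each step."""
    corrected = []
    for z in nn_labels:
        belief = update(predict(belief, transition), z, confusion)
        corrected.append(max(range(len(belief)), key=belief.__getitem__))
    return corrected


# Hypothetical matrices: states tend to persist, and the network is right 80% of the time.
T = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
C = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(filter_labels([0, 0, 0, 1, 0, 0], T, C, uniform))  # [0, 0, 0, 0, 0, 0]
print(filter_labels([0, 0, 0, 1, 1, 1], T, C, uniform))  # [0, 0, 0, 0, 1, 1]
```

With these made-up matrices an isolated label that disagrees with the recent history is overridden, while a sustained change is accepted after a one-step delay, which mirrors the description of the filter correcting network outputs only when the two disagree.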

Model Evaluation.
The identification model cannot be judged solely on its identification accuracy, as this only indicates whether the model can classify the selected samples well or not. The adaptive capacity of the test accuracy under different diagnostic values therefore needs to be evaluated separately using a receiver operating characteristic (ROC) curve [39]. In this study, a comparative analysis was conducted for the BP-BF model and a neural network model using the ROC curve shown in Figure 11. General engineering practice can accept a misjudgment rate of 5% at most, and according to the figure, the BP-BF model has a higher identification rate than the BP model at a false positive (FP) rate of 5%. As the performance of a model is usually evaluated by the area under the ROC curve, it is also evident from the larger area under the curve of the BP-BF model that its performance is better than that of the BP model. Compared to the traditional driving risk identification methods described in [8,10,13], the method proposed in this work takes into account the integrated factors affecting driving safety and achieves higher identification success rates, and is therefore worth applying more widely.
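The two ROC-based comparisons used above, the identification rate at a 5% false positive rate and the area under the curve, can be computed as in the following sketch. The function names are hypothetical and the scores and labels are synthetic values chosen only to exercise the code; they are not results from the study.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold over the scores.
    labels: 1 for the positive (risky) class, 0 for the negative class."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Tied scores are processed together so they yield a single ROC point
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr=0.05):
    """Highest true positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Synthetic example with three positive and three negative samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts), tpr_at_fpr(pts))
```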

Conclusion
Characterization parameters reflecting the safety characteristics of two typical risky driving behaviors, namely, overclose car following and lane departure, have been determined. An identification model was also established based on a BP neural network and Bayesian filter to identify risky driving behavior. Through this, the following conclusions have been reached:
(1) Overclose car following and lane departure can be identified by introducing and analyzing the lane line distance, vehicle following distance, relative speed, and TTC values within a time window. Whether the selected parameters still work when identifying other unsafe driving behaviors needs further research to verify.
(2) The difference in model identification accuracy with different time windows indicates that the best identification time window for risky driving behavior is 1.5 s. This conclusion is limited to the given subjects and samples; if the variables change, the best identification time window may change correspondingly.
(3) A BP-BF model can effectively identify risky driving behavior with an identification accuracy of 92.46%, which is about 8.9 percentage points higher than the 83.6% achieved by a pure neural network model.
In future work, we will focus on building identification models based on individual drivers' driving behaviors to improve the reliability and practicability of the method. Moreover, we will try to verify whether the model built in this research can identify driver distraction or fatigued driving behavior.

Figure 1: Driving behavior test platform and experimental scenarios.

Figure 6: Relative speed, relative distance, and relative angle changes during car following.

Figure 9: Calibration of the identification time window.

Figure 11: ROC curve of the identification results.

Table 1: Serial number for each driving behavior type.

Table 2: Output modification of the BP neural network.

Figure 10: Relative motion states between the ego vehicle and the target vehicles.