A Support Vector Regression Approach for Investigating Multianticipative Driving Behavior

This paper presents a Support Vector Regression (SVR) approach that can be applied to predict multianticipative driving behavior using vehicle trajectory data. Building upon the SVR approach, a multianticipative car-following model is developed and enhanced in learning speed and prediction accuracy. The model is trained and validated with field trajectory data extracted from the Next Generation Simulation (NGSIM) project. In the training and validation tests, the SVR model performs as well as the Intelligent Driver Model (IDM) with respect to prediction accuracy. In addition, this paper performs a relative importance analysis to quantify multianticipation in terms of the different stimuli to which drivers react in platoon car following. The analysis results confirm that drivers respond to the behavior of not only the immediate leading vehicle but also the second, third, and even fourth leading vehicles. Specifically, in congested traffic conditions, drivers are observed to be more sensitive to the relative speed than to the gap. These findings provide insight into multianticipative driving behavior and illustrate the necessity of incorporating multianticipative car-following models into microscopic traffic simulation.


Introduction
Microscopic traffic simulation tools are playing an increasingly essential role in Intelligent Transportation Systems (ITS). Problems in ITS applications, such as adaptive congestion control, incident management, and traffic flow forecasting, are usually difficult to solve because of the complex dynamics of traffic flow. Microscopic traffic simulation tools allow transportation engineers to study such problems under different scenarios without disrupting real traffic. It is therefore of great importance to ensure that these simulation tools replicate real traffic situations sufficiently well.
Although numerous car-following models have been developed, earlier studies have identified inaccuracy and unreliability in the performance of many of these models [12][13][14][15][16][17]. For instance, Brockfeld et al. [12] reported that minimum calibration errors of about 15% to 25% could probably not be eliminated, regardless of the model used in the experiments. They also inferred that these errors might result from a highly stochastic component in driver behavior. Better modeling approaches are therefore needed to explain dynamic and stochastic driving behavior correctly.

Mathematical Problems in Engineering
One particular area that could lead to increased realism and accuracy in car-following models is multianticipation in the follower's driving behavior. Researchers have indicated that driving behavior cannot be adequately described by considering only the immediate vehicle in front. Rather, drivers anticipate traffic conditions farther downstream by considering the second leading vehicle and even the third and fourth vehicles ahead [18][19][20][21][22][23]. Based on this assumption, several well-known car-following models have been modified to incorporate multileader stimuli. More recently, Hoogendoorn et al. [24] proposed a generalized framework for multianticipative car-following models. Farhi et al. [25] and Zhang [26] developed multianticipative car-following models using piecewise linear and multiple linear regression, respectively. These studies are of great importance because of their potential application in microscopic traffic simulation: most car-following models in commercial simulation programs consider only the first leader, whereas multiple leaders should be considered to describe driving behavior more realistically.
Recently, with the computing capability to process high-resolution vehicle trajectory data, efforts have been made to improve the performance of car-following models by using Machine Learning (ML) methods (e.g., Artificial Neural Networks and Support Vector Regression (SVR)) [27][28][29][30][31][32]. The results of comparative experiments showed that ML-based models outperformed traditional car-following models. In particular, Wei and Liu [32] reported that the complex asymmetric characteristics of car-following behavior could be well interpreted by using the SVR approach.
Motivated by the strong learning and generalization ability of the SVR approach, this study further explores the SVR-based car-following model, with a particular focus on the multianticipation phenomenon in driving behavior, using high-resolution trajectory data (i.e., NGSIM [33]). The objective of this study is twofold: (1) to develop a multianticipative car-following model using the SVR approach and (2) to empirically investigate the number of vehicles and the types of stimuli to which drivers react.

Model Framework
In this section, we propose the multianticipative SVR-based car-following model after a brief discussion of the SVR approach. SVR extends the Support Vector Machine to solve nonlinear regression problems [34]. It has received increasing attention because of its remarkable generalization performance, the absence of local minima, and the sparse representation of its solution. As SVR has the potential to model complex, real-world problems, it has been applied to several transportation topics [35][36][37][38].
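The basic idea of SVR is to solve a nonlinear regression problem in a linear way by mapping the input data from the original input space to a higher-dimensional feature space. Consider a set of training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$, where $\mathbf{x}_i$ is a training sample, $y_i$ is the corresponding target value, and $N$ is the number of training data points. The SVR estimation function takes the following form:

$$f(\mathbf{x}) = \mathbf{w}^{\top}\varphi(\mathbf{x}) + b, \qquad (1)$$

where $\varphi(\cdot)$ denotes a nonlinear space transformation, $\mathbf{w}$ is the weight vector, and $b$ is the bias. Our goal is to find an $f(\mathbf{x})$ that deviates by at most $\varepsilon$ from each target value $y_i$. One feasible way to achieve this goal is to solve the following convex optimization problem:

$$\begin{aligned}\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\;& \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right)\\ \text{s.t.}\;& y_i - \mathbf{w}^{\top}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\\ & \mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*},\\ & \xi_i,\ \xi_i^{*} \ge 0,\quad i = 1,\ldots,N,\end{aligned} \qquad (2)$$

where $C$ is a predefined penalty, $\varepsilon$ is the tolerance, and $\xi_i$, $\xi_i^{*}$ are the positive and negative slack variables, respectively. Applying Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions to (2) yields the following Lagrange dual problem:

$$\begin{aligned}\max_{\alpha,\,\alpha^{*}}\;& -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})\,\varphi(\mathbf{x}_i)^{\top}\varphi(\mathbf{x}_j) - \varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i^{*}) + \sum_{i=1}^{N}y_i(\alpha_i-\alpha_i^{*})\\ \text{s.t.}\;& \sum_{i=1}^{N}(\alpha_i-\alpha_i^{*}) = 0,\quad 0\le\alpha_i,\ \alpha_i^{*}\le C,\end{aligned} \qquad (3)$$

where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers. In deriving (3), the partial derivative of the Lagrange function $L$ with respect to the weight vector $\mathbf{w}$ must vanish at the optimum:

$$\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{N}(\alpha_i-\alpha_i^{*})\,\varphi(\mathbf{x}_i) = 0 \;\Longrightarrow\; \mathbf{w} = \sum_{i=1}^{N}(\alpha_i-\alpha_i^{*})\,\varphi(\mathbf{x}_i). \qquad (4)$$

By further introducing a kernel function, $\varphi$ no longer needs to be given explicitly in (1). The SVR estimation function becomes

$$f(\mathbf{x}) = \sum_{i=1}^{N}(\alpha_i-\alpha_i^{*})\,K(\mathbf{x},\mathbf{x}_i) + b, \qquad (5)$$

where $K(\mathbf{x}_i,\mathbf{x}_j) = \varphi(\mathbf{x}_i)^{\top}\varphi(\mathbf{x}_j)$ is the kernel function. The kernel function allows a linear regression in the feature space to solve a nonlinear problem in the input space in a computationally efficient way. The most commonly used kernel functions are the Linear, Polynomial, Sigmoid, and Radial Basis Function (RBF) kernels. Although no general guideline is available for choosing a kernel function, the RBF kernel is generally preferred because it has fewer parameters and causes fewer numerical difficulties [39]; it is therefore employed in this study. The RBF kernel is defined as follows:

$$K(\mathbf{x}_i,\mathbf{x}_j) = \exp\left(-\gamma\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^{2}\right), \qquad (6)$$

where $\gamma$ is the kernel parameter that controls the kernel width and thus tunes the prediction accuracy. Since (2) is a convex optimization problem, a unique solution for $\mathbf{w}$ and $b$ exists once the kernel function and the parameters $C$, $\varepsilon$, and $\gamma$ are determined. The reader is referred to [40] for further explanation.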
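As an illustration of (5) and (6), the following minimal Python sketch evaluates a trained RBF-kernel SVR at a new input. It assumes that the support vectors, the differences $\alpha_i - \alpha_i^{*}$, the bias $b$, and $\gamma$ are already available from training; the function names `rbf_kernel` and `svr_predict` are hypothetical and only mirror the equations.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), as in (6)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    bias: float,
    gamma: float,
) -> float:
    """Evaluate f(x) = sum_i (alpha_i - alpha_i^*) K(x, x_i) + b, as in (5).

    dual_coefs[i] holds the difference alpha_i - alpha_i^* for support vector i;
    training points with a zero difference contribute nothing and can be omitted.
    """
    return bias + sum(
        coef * rbf_kernel(x, sv, gamma)
        for coef, sv in zip(dual_coefs, support_vectors)
    )
```

Training itself (solving the dual problem) is left to a dedicated solver. For example, scikit-learn's `sklearn.svm.SVR` with `kernel="rbf"` exposes `C`, `epsilon`, and `gamma` parameters that play the roles of $C$, $\varepsilon$, and $\gamma$ here, and its fitted `support_vectors_`, `dual_coef_`, and `intercept_` attributes correspond to the inputs of `svr_predict`.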
To improve learning speed and prediction quality, it is also necessary to perform feature selection, which discards redundant or irrelevant variables and retains only a minimum set of indispensable variables from the input data [41]. Vehicle speed, relative speed, and gap are widely considered the most decisive stimuli in car-following models (e.g., GHR and IDM) and, most importantly, can be observed directly from vehicle trajectory data. Therefore, we define the SVR-based car-following model as the following acceleration function:

$$a(t) = f\big(v(t),\ \Delta v(t-\tau),\ \Delta s(t-\tau)\big), \qquad (7)$$

where $a(t)$ is the dependent variable representing the acceleration of the following vehicle at time $t$, $v(t)$ is the vehicle speed, $\Delta v(t-\tau)$ denotes the relative speed, and $\Delta s(t-\tau)$ denotes the gap (i.e., the distance between the back bumper of the leader and the front bumper of the follower) at the earlier time $t-\tau$, where $\tau$ is the driver's reaction time. Initially, the SVR-based car-following model considers only one leading vehicle. In the literature, multianticipative car-following models are often extensions of the single-leader model [24]. Thus, (7) is further modified to account for multianticipation by incorporating leader-specific stimuli:

$$a(t) = f\big(v(t),\ \Delta v^{(1)}(t-\tau),\ \Delta v^{(2)}(t-\tau),\ \ldots,\ \Delta v^{(m)}(t-\tau),\ \Delta s^{(1)}(t-\tau),\ \Delta s^{(2)}(t-\tau),\ \ldots,\ \Delta s^{(m)}(t-\tau)\big), \qquad (8)$$

where the superscript $(k)$ refers to the $k$th leader and $m \ge 1$ is the number of leaders to whose relative speed and gap a driver reacts. This model is referred to as SVR-$m$ in the rest of the paper. For example, the SVR-3 model assumes that a driver responds to the first, second, and third leaders. This completes the SVR-based car-following model, which is intended to capture multianticipative driving behavior.
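The input vector of (8) can be assembled from sampled trajectories as sketched below. This is a hypothetical helper, not part of the authors' VTAPE tool: it assumes 0.1-s sampling, a reaction time rounded to a whole number of steps (e.g., $\tau = 1.0$ s gives 10 steps), and relative speeds and gaps already computed for each leader.

```python
from typing import List, Sequence


def svr_m_features(
    t: int,
    speed: Sequence[float],
    rel_speed: Sequence[Sequence[float]],
    gap: Sequence[Sequence[float]],
    m: int,
    lag_steps: int,
) -> List[float]:
    """Build the SVR-m input vector of (8) at time index t.

    speed[t]        -- follower speed v(t), taken without delay
    rel_speed[k][t] -- relative speed with respect to leader k + 1
    gap[k][t]       -- gap to leader k + 1
    lag_steps       -- reaction time tau expressed in time steps
    """
    if not 1 <= m <= len(rel_speed) or len(gap) < m:
        raise ValueError("m must be between 1 and the number of available leaders")
    if t - lag_steps < 0:
        raise ValueError("not enough history for the requested reaction time")
    lagged = t - lag_steps
    # Order follows (8): speed, then m relative speeds, then m gaps.
    features = [speed[t]]
    features += [rel_speed[k][lagged] for k in range(m)]
    features += [gap[k][lagged] for k in range(m)]
    return features
```

For instance, `svr_m_features(t, speed, rel_speed, gap, m=3, lag_steps=10)` would return the seven inputs ($1 + 2 \times 3$) of an SVR-3 model with a 1.0-s reaction time.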

Data Preparation
This study uses the Next Generation Simulation (NGSIM) trajectory dataset, which was collected on a 500-m segment of eastbound I-80 in Emeryville, California, from 5:00 to 5:15 p.m. on April 13, 2005, at a resolution of 0.1 s. The trajectory data were collected during the afternoon peak period, when traffic was congested. A more detailed description of this dataset is given in the NGSIM Interstate 80 Freeway Dataset report [33]. To investigate multianticipative driving behavior, only platoons of at least five vehicles are considered in this study. Moreover, the following criteria are used to select appropriate car-following platoons.
(i) Each car-following platoon must be observed continuously for at least 60 s (i.e., 600 observations) to provide sufficient observations for SVR learning.
(ii) Car-following platoons on Lane 1 (the high-occupancy vehicle lane) and Lane 7 (the on-ramp) are excluded.
(iii) Only car-following platoons in which no member changes lanes during the observation period are accepted.
(iv) Both the followers and the leaders in a platoon must be passenger cars. Considering the potentially different driving behavior of other vehicle types is beyond the scope of this study.
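The selection rules above can be expressed as a single filter. The sketch below uses a hypothetical, simplified data model (`Vehicle`, `Platoon`, and the `"car"` class label are illustrative assumptions and do not reflect the NGSIM field names or VTAPE's internals); it checks the minimum platoon size of five vehicles and criteria (i) to (iv).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vehicle:
    vehicle_id: int
    vehicle_class: str   # hypothetical label, e.g. "car" or "truck"
    lanes: List[int]     # lane number at each observation in the platoon window


@dataclass
class Platoon:
    vehicles: List[Vehicle]  # ordered from the follower to the farthest leader
    n_observations: int      # consecutive 0.1-s observations shared by all members


def accept_platoon(p: Platoon, min_size: int = 5, min_obs: int = 600) -> bool:
    """Return True if the platoon satisfies the size limit and criteria (i)-(iv)."""
    if len(p.vehicles) < min_size:
        return False
    if p.n_observations < min_obs:                 # (i) at least 60 s observed
        return False
    for veh in p.vehicles:
        if set(veh.lanes) & {1, 7}:                # (ii) no HOV lane or on-ramp
            return False
        if len(set(veh.lanes)) != 1:               # (iii) no lane changes
            return False
        if veh.vehicle_class != "car":             # (iv) passenger cars only
            return False
    return True
```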
VTAPE [42], a vehicle trajectory analysis tool developed by the authors, is used throughout this study to process the NGSIM dataset. Applying the criteria above yields a total of 229 car-following platoons consisting of 202,598 trajectory points. Because studies have reported quality issues in the NGSIM dataset [43][44][45], the raw trajectory data, particularly the vehicle speed, were denoised with a symmetric exponential moving average filter. For visual inspection, the trajectories and the speed and gap profiles of a sample car-following platoon are presented in Figure 1. The trajectory data of this platoon are also used as the example for the model performance comparison in the next section.
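The paper does not specify the filter settings, so the following is only a sketch of one common formulation of a symmetric exponential moving average: each sample is replaced by a weighted mean of its neighbours with weights $e^{-|i-k|/\Delta}$, and the window is truncated to $3\Delta$ samples on each side and shrunk symmetrically at the ends of the series. The smoothing width `delta_steps` is a free, assumed parameter.

```python
import math
from typing import List, Sequence


def sema_filter(x: Sequence[float], delta_steps: float) -> List[float]:
    """Symmetric exponential moving average smoothing of a sampled series.

    Weights are exp(-|i - k| / delta_steps). The half-width of the window is
    limited to 3 * delta_steps and shrunk near the ends so that it stays
    symmetric around each sample.
    """
    n = len(x)
    smoothed = []
    for i in range(n):
        half_width = min(int(3 * delta_steps), i, n - 1 - i)
        num = 0.0
        den = 0.0
        for k in range(i - half_width, i + half_width + 1):
            w = math.exp(-abs(i - k) / delta_steps)
            num += w * x[k]
            den += w
        smoothed.append(num / den)
    return smoothed
```

Because the weights are symmetric, the filter introduces no time lag, which matters when reaction times are later calibrated from the smoothed data.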
Before model training and validation, the trajectory data were also normalized to improve the convergence rate of SVR training. The normalization maps the raw trajectory data into the range [0, 1]:

$$x_{i,\text{norm}} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \qquad (9)$$

where $x_{i,\text{norm}}$ denotes the normalized value of $x_i$, $x_{\max}$ is the maximum value of the training data, and $x_{\min}$ is the minimum value of the training data. The output value $y_{i,\text{norm}}$ is denormalized before it is compared with its target value.
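The min-max normalization described above and its inverse can be written as below. This minimal sketch assumes that $x_{\min}$ and $x_{\max}$ are taken from the training data only, as the text states; validation values outside the training range then map slightly outside [0, 1], which the SVR can still evaluate. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Return (x_min, x_max) computed on the training data only."""
    return min(train), max(train)


def normalize(x: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map values to [0, 1] via (x - x_min) / (x_max - x_min)."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("constant feature: x_max equals x_min")
    return [(xi - x_min) / span for xi in x]


def denormalize(x_norm: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Invert the normalization so predictions can be compared with observed targets."""
    return [xi * (x_max - x_min) + x_min for xi in x_norm]
```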

Model Training and Validation
The objective of model training is to determine the set of SVR parameters that best fits the given data samples. Recall that for (8) we need to calibrate the penalty $C$, the tolerance $\varepsilon$, and the kernel parameter $\gamma$, given the vehicle speed $v$, relative speed $\Delta v$, gap $\Delta s$, and the driver's reaction time $\tau$. However, the driver's reaction time is not directly observable from trajectory data. Efforts to estimate the reaction time have been made for single-leader car-following models [29,31,46], but a systematic approach to estimating reaction time under multianticipation is still lacking. In this study, we therefore treat the reaction time as one of the model parameters to be calibrated. The feasible range of the reaction time is set from 0.5 to 2 s, discretized in steps of 0.1 s, and the reaction time and other parameters that yield the best estimation result are taken as the optimal parameters. The initial candidate values of the remaining parameters are $C = 2^{-5}, 2^{-3}, 2^{-1}, \ldots, 2^{15}$, $\varepsilon = 0.001, 0.01, 0.1, 0.2, 0.5$, and $\gamma = 2^{-5}, 2^{-4}, 2^{-3}, \ldots, 2^{1}$; these are then optimized by cross-validation, a widely used resampling procedure for assessing the generalization ability of a model. The idea of cross-validation is to train on one subset of the samples and then test the prediction accuracy on the other, separate subsets. Although cross-validation helps select parameters with high prediction accuracy, it can be very time-consuming on large-scale data, since the model must be fully retrained at each iteration. To reduce the computational burden, the Accurate Online SVR (AOSVR) algorithm proposed by Ma et al.
[47] is used in this study. AOSVR updates a trained SVR function whenever a sample is added to or removed from the original training set, without retraining from scratch. Accordingly, during cross-validation, we first construct the SVR function using the entire trajectory dataset, then remove each subset (in our case, the trajectory data of one platoon) from the SVR function using the AOSVR algorithm, and finally calculate the prediction error for that subset. Ultimately, we choose the parameters associated with the minimum error, as measured by an error indicator. The root mean square error (RMSE) is one of the basic error indicators; it measures the deviation of the predicted values from the field data:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}, \qquad (10)$$

where $\hat{y}_i$ represents the estimate of the field data $y_i$. Another error indicator is Theil's inequality coefficient [48], widely used in econometrics, which is defined as

$$U = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\hat{y}_i^{2}} + \sqrt{\frac{1}{N}\sum_{i=1}^{N}y_i^{2}}}. \qquad (11)$$

Theil's inequality coefficient expresses the RMSE in relative terms and falls between 0 and 1; the closer $U$ is to zero, the better the prediction. In principle, either speed or gap can be used as the error term. Ossen [23] reported that a multicriterion objective function containing both speed and gap produced better results than either single-criterion objective function. Thus, the error indicator used in this study, denoted (12), has a multicriterion objective based on Theil's inequality coefficient of both speed and gap, where $\hat{v}_i$ and $\widehat{\Delta s}_i$ are the estimates of the observed speed $v_i$ and the observed gap $\Delta s_i$, respectively.

Because car-following behavior depends heavily on the preceding state, the trajectory data cannot be regarded as independent samples. Therefore, model performance is evaluated in a car-following simulation. During the simulation, the following vehicle starts with the same speed and position as in its first observation, while the leaders move exactly as observed. At each time step, the car-following model under consideration updates the follower's trajectory according to

$$v(t+\Delta t) = v(t) + a(t)\,\Delta t, \qquad x(t+\Delta t) = x(t) + v(t)\,\Delta t + \frac{1}{2}a(t)\,\Delta t^{2}, \qquad (13)$$

where $x(t)$ is the position of the subject vehicle at time $t$ and $\Delta t$ is the time step (0.1 s). The error indicator (12) is calculated at the end of the simulation.
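The evaluation procedure can be sketched in Python as follows. `rmse` and `theil_u` implement the RMSE and Theil's inequality coefficient for a single series (speed or gap); how the two series are combined into the multicriterion indicator (12) is not reproduced here. `simulate_follower` replays an observed leader and advances a single follower with the update (13), implemented as a ballistic (constant-acceleration) step; its conventions ($\Delta v = v - v_{\text{leader}}$, positions referring to front bumpers, gap measured to the leader's back bumper) are assumptions. `parameter_grid` enumerates the initial search grid of the text, which contains $16 \times 11 \times 5 \times 7 = 6160$ combinations of $\tau$, $C$, $\varepsilon$, and $\gamma$. The AOSVR decremental update is not implemented, and all names are hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def theil_u(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Theil's inequality coefficient; 0 means a perfect fit, 1 the worst."""
    rms_pred = math.sqrt(sum(p * p for p in pred) / len(pred))
    rms_obs = math.sqrt(sum(o * o for o in obs) / len(obs))
    return rmse(pred, obs) / (rms_pred + rms_obs)


def simulate_follower(
    accel: Callable[[float, float, float], float],
    x0: float,
    v0: float,
    leader_x: Sequence[float],
    leader_v: Sequence[float],
    leader_length: float,
    dt: float = 0.1,
    lag_steps: int = 0,
) -> Tuple[List[float], List[float], List[float]]:
    """Replay the observed leader and simulate the follower step by step.

    accel(v, dv, gap) returns the follower's acceleration from its current
    speed and the relative speed and gap lagged by lag_steps (reaction time).
    """
    xs, vs, gaps, dvs = [x0], [v0], [], []
    for t in range(len(leader_x)):
        gaps.append(leader_x[t] - xs[t] - leader_length)
        dvs.append(vs[t] - leader_v[t])
        if t == len(leader_x) - 1:
            break
        lagged = max(t - lag_steps, 0)
        a = accel(vs[t], dvs[lagged], gaps[lagged])
        vs.append(vs[t] + a * dt)                           # speed update
        xs.append(xs[t] + vs[t] * dt + 0.5 * a * dt * dt)   # position update
    return xs, vs, gaps


def parameter_grid() -> Iterator[Dict[str, float]]:
    """Enumerate the initial search grid for tau, C, epsilon, and gamma."""
    taus = [round(0.5 + 0.1 * k, 1) for k in range(16)]  # 0.5, 0.6, ..., 2.0 s
    cs = [2.0 ** e for e in range(-5, 16, 2)]            # 2^-5, 2^-3, ..., 2^15
    epsilons = [0.001, 0.01, 0.1, 0.2, 0.5]
    gammas = [2.0 ** e for e in range(-5, 2)]            # 2^-5, 2^-4, ..., 2^1
    for tau, c, eps, gamma in itertools.product(taus, cs, epsilons, gammas):
        yield {"tau": tau, "C": c, "epsilon": eps, "gamma": gamma}
```

In a calibration loop, each dictionary yielded by `parameter_grid()` would be used to fit a model, which is then simulated per platoon and scored with `theil_u`; the combination with the lowest error is kept. An IDM-style acceleration function would be passed with `lag_steps=0`, while an SVR model would use the candidate reaction time expressed in steps.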
In addition, to compare the performance of the SVR model with that of conventional physical models, the Intelligent Driver Model (IDM), which is considered one of the most robust car-following models in the literature, is applied to the trajectory data using the same evaluation and simulation procedure. The IDM is defined as follows:

$$a(t) = f\big(v(t), \Delta v(t), \Delta s(t)\big) = a^{*}\left[1 - \left(\frac{v(t)}{v^{*}}\right)^{4} - \left(\frac{\Delta s^{*}\big(v(t), \Delta v(t)\big)}{\Delta s(t)}\right)^{2}\right], \qquad (14)$$

with the minimum desired gap $\Delta s^{*}$ given by

$$\Delta s^{*}\big(v(t), \Delta v(t)\big) = \Delta s_0 + v(t)\,T_0 + \frac{v(t)\,\Delta v(t)}{2\sqrt{a^{*}b^{*}}}, \qquad (15)$$

where $v^{*}$, $a^{*}$, and $b^{*}$ are the desired speed, maximum desired acceleration, and desired deceleration (absolute value) of the following vehicle, respectively; $\Delta s_0$ is the jam distance; and $T_0$ denotes the desired time gap. By introducing a set of weighting coefficients, the IDM can be extended to a linear combination form that accounts for multianticipative driving behavior [49]:

$$a(t) = \sum_{k=1}^{m} w_k\, f\big(v(t), \Delta v^{(k)}(t), \Delta s^{(k)}(t)\big), \qquad (16)$$

where the weighting coefficients $w_k$ ($k = 1, 2, \ldots, m$) are assumed to decrease monotonically as $k$ increases, reflecting the order of influence of the leading vehicles, which agrees with everyday driving experience. This model is referred to as IDM-$m$ in the rest of the paper. To limit the parameter space for the optimization, the minimum and maximum values of the model parameters are constrained following the study of Kesting and Treiber [50]. The desired speed $v^{*}$ is restricted to [1, 70].

This study uses the first 12 min of trajectory data, which cover 163 car-following platoons, to train the eight car-following models considered (i.e., SVR-1, SVR-2, SVR-3, and SVR-4; IDM-1, IDM-2, IDM-3, and IDM-4). Applying the calibration method described above yields the best fit of each car-following model to the training dataset. Tables 1 and 2 list the optimal parameters and training results for the SVR and IDM models, respectively.
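For reference, the IDM and its weighted multileader extension can be written as the short Python sketch below. It assumes SI units, the standard acceleration exponent of 4, $\Delta v = v - v_{\text{leader}}$ (positive when approaching), and leader weights supplied by the caller; the class and function names are hypothetical. The weights are checked only for being non-increasing with the leader index; the text does not say whether they are normalized.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IDMParams:
    v_des: float        # desired speed v* (m/s)
    a_max: float        # maximum desired acceleration a* (m/s^2)
    b_des: float        # desired deceleration b*, absolute value (m/s^2)
    s0: float           # jam distance (m)
    T0: float           # desired time gap (s)
    delta: float = 4.0  # acceleration exponent (assumed fixed at 4)


def idm_accel(v: float, dv: float, gap: float, p: IDMParams) -> float:
    """Single-leader IDM acceleration; dv = v - v_leader, gap in metres."""
    s_star = p.s0 + v * p.T0 + v * dv / (2.0 * math.sqrt(p.a_max * p.b_des))
    return p.a_max * (1.0 - (v / p.v_des) ** p.delta - (s_star / gap) ** 2)


def idm_m_accel(
    v: float,
    dvs: Sequence[float],
    gaps: Sequence[float],
    weights: Sequence[float],
    p: IDMParams,
) -> float:
    """IDM-m: weighted sum of single-leader IDM accelerations over m leaders."""
    if len(dvs) != len(weights) or len(gaps) != len(weights):
        raise ValueError("one relative speed and one gap per weight are required")
    if any(w_next > w for w, w_next in zip(weights, weights[1:])):
        raise ValueError("weights must not increase with the leader index")
    return sum(w * idm_accel(v, dv, s, p) for w, dv, s in zip(weights, dvs, gaps))
```

Some IDM implementations bound the dynamic term of the desired gap from below by zero to avoid unrealistic behavior when the leader is much faster; the sketch keeps the plain form.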
The Theil's inequality coefficients presented in Tables 1 and 2 are small, ranging from 0.1360 to 0.1430, which indicates that both the SVR and IDM models perform well in the simulation tests and produce trajectories that closely match the field observations. The models that include multiple leading vehicles also achieve higher prediction accuracy, as their Theil's inequality coefficients are smaller than those of the single-leader models. These training results imply that drivers may react not only to the first leader but also to the second, third, and even fourth leaders. In addition, a comparison of the Theil's inequality coefficients of the SVR and IDM models shows that the SVR models perform as well as the IDM models. However, the consistency of model performance must be confirmed by an additional validation test. Therefore, to further verify the models' ability to predict multianticipative behavior, this study uses the remaining 3 min of trajectory data, which include 66 car-following platoons, for validation. As in model training, the SVR and IDM models are evaluated in a microscopic simulation using the optimal parameters obtained from training. Additionally, the number of times each model outperforms all other models in prediction accuracy is counted. Table 3 presents an overview of the validation results.
As might be expected, the prediction errors shown in Table 3 are slightly higher than those in model training, by about 6%. However, the Theil's inequality coefficients still range from 0.1452 to 0.1514, indicating acceptable overall validation results that are consistent with the typical error ranges reported in previous studies [12]. Recall that the eight car-following models are assumed to represent different driving styles according to the number of leaders considered. Thus, the model that outperforms all other models most likely reflects the driving style applied by the driver. Table 3 shows that 83.3% of the best-performing models consider two or more leaders, and 60.6% are three- or four-leader models. This result empirically indicates that drivers react to the behavior of not only the immediate leader but also the second and even the third and fourth leaders ahead. In general, the models that include more leaders perform better. However, as far as prediction accuracy is concerned, no single model consistently outperforms all other models in all conditions. This suggests that different car-following models are still needed to accurately predict different multianticipative driving behaviors. Figure 2 compares the deviations of the speed and gap profiles generated by the SVR and IDM models with different numbers of leading vehicles (the original empirical trajectories are shown in Figure 1).
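The counting behind Table 3 can be sketched as follows: for each validation case, the model with the smallest Theil's inequality coefficient is declared the winner, and the share of wins per model is then aggregated by the number of leaders. The input structure and function names are hypothetical, and ties are broken arbitrarily (by the first model encountered).

```python
from collections import Counter
from typing import Dict, Mapping


def best_model_shares(errors: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """Return the share of validation cases won by each model.

    errors[case_id][model_name] holds the Theil's inequality coefficient of
    that model for that case; the smallest value wins.
    """
    wins = Counter(min(per_model, key=per_model.get) for per_model in errors.values())
    total = len(errors)
    return {model: count / total for model, count in wins.items()}


def leaders_in(model_name: str) -> int:
    """Parse the number of leaders from names such as 'SVR-3' or 'IDM-2'."""
    return int(model_name.rsplit("-", 1)[1])
```

With these helpers, `sum(share for name, share in best_model_shares(errors).items() if leaders_in(name) >= 2)` gives the fraction of cases won by models with two or more leaders.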
In order to quantitatively determine to what extent drivers react to the stimuli during multianticipative car following, a relative importance analysis is performed in the next section.

Relative Importance Analysis
In conventional statistical models (e.g., linear regression), the relative importance of each independent variable can be read directly from its coefficients. For SVR, by contrast, relative variable importance is difficult to determine because the model provides only an input-output mapping. This study therefore applies a derivative-based sensitivity analysis to measure the relative importance of the features (independent variables) of the SVR model. For further explanation, we refer to the original work of Cao et al. [51].
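Recall that only the data points with a nonzero coefficient $(\alpha_i - \alpha_i^{*})$, the so-called support vectors, contribute to (5). Hence, (5) can be rewritten as

$$f(\mathbf{x}) = \sum_{i=1}^{N_{sv}}(\alpha_i-\alpha_i^{*})\,K(\mathbf{x},\mathbf{x}_i) + b, \qquad (17)$$

where $N_{sv}$ is the number of support vectors. The importance associated with the $j$th feature $x^{(j)}$ can be derived by calculating the partial derivative of (17):

$$\frac{\partial f(\mathbf{x})}{\partial x^{(j)}} = \sum_{i=1}^{N_{sv}}(\alpha_i-\alpha_i^{*})\,\frac{\partial K(\mathbf{x},\mathbf{x}_i)}{\partial x^{(j)}}. \qquad (18)$$

Substituting the RBF kernel function (6) into (18) yields

$$\frac{\partial f(\mathbf{x})}{\partial x^{(j)}} = -2\gamma\sum_{i=1}^{N_{sv}}(\alpha_i-\alpha_i^{*})\left(x^{(j)} - x_i^{(j)}\right)\exp\left(-\gamma\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}\right). \qquad (19)$$

The average absolute derivative is one applicable measure of feature importance; evaluated over the support vectors, it takes the following form:

$$S_j = \frac{1}{N_{sv}}\sum_{k=1}^{N_{sv}}\left|\frac{\partial f(\mathbf{x}_k)}{\partial x^{(j)}}\right|, \qquad j = 1, \ldots, N_f, \qquad (20)$$

where $N_f$ is the number of features. To identify the extent to which the different stimuli motivate multianticipative driving behavior, we apply this relative importance analysis to the SVR-based multianticipative car-following models. Table 4 shows the importance scores for each model feature.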
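The derivative of an RBF-kernel SVR with respect to each feature, and an importance score built from it, can be computed as below. The sketch assumes that the importance measure is the mean absolute derivative over the support vectors, one score per feature; other normalizations, for example dividing each score by the sum over all features, would rank the features identically. Because the inputs are normalized to [0, 1], scores of different features are on a comparable scale. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_svr_gradient(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    gamma: float,
) -> List[float]:
    """Partial derivatives of the RBF-kernel SVR output at x, one per feature."""
    grad = [0.0] * len(x)
    for coef, sv in zip(dual_coefs, support_vectors):
        k = math.exp(-gamma * sum((xi - si) ** 2 for xi, si in zip(x, sv)))
        for j in range(len(x)):
            # d/dx_j exp(-gamma * ||x - sv||^2) = -2 * gamma * (x_j - sv_j) * k
            grad[j] += -2.0 * gamma * coef * (x[j] - sv[j]) * k
    return grad


def importance_scores(
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    gamma: float,
) -> List[float]:
    """Mean absolute derivative over the support vectors, one score per feature."""
    n_features = len(support_vectors[0])
    scores = [0.0] * n_features
    for x in support_vectors:
        g = rbf_svr_gradient(x, support_vectors, dual_coefs, gamma)
        for j in range(n_features):
            scores[j] += abs(g[j])
    return [s / len(support_vectors) for s in scores]
```

Since the bias $b$ does not depend on the inputs, it drops out of the derivative and is not needed here.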
The importance scores presented in Table 4 are assumed to represent how sensitively drivers react to each stimulus. Table 4 shows that drivers are, overall, most sensitive to the first leader. Little difference is found between the sensitivity to the third leader and that to the fourth leader. This may explain why the models considering four leaders do not yield much better prediction accuracy than those considering three leaders. This result is consistent with everyday driving experience, as drivers may not be able to pay attention to four or more leading vehicles ahead. Combined with the findings of model training and validation, this leads to the conclusion that considering the second, third, and even fourth leaders improves model performance over a single-leader model. Drivers also appear to be more responsive to the relative speed than to the gap in this analysis. Since the trajectory data used for model training and validation were collected during the afternoon peak period, the multianticipative driving behavior can in part be explained by traffic congestion; intuition suggests that drivers in congestion may react to the stimuli of multiple leaders for the sake of safety.

Summary and Future Research
The motivation for this study was to investigate multianticipative car-following behavior and to provide empirical evidence for its interpretation. An SVR-based car-following model that considers multiple leaders was built. Owing to its strong learning ability, the SVR-based car-following model is able to capture multianticipative driving behavior from the NGSIM trajectory dataset and performs as well as the IDM. Whereas the IDM relies on parameters that are unobservable by nature (e.g., desired speed and desired gap) and therefore difficult to determine, the SVR model offers a data-driven way to study driving behavior empirically, directly from field trajectory data. The estimation results from the training and validation tests of the SVR and IDM models show that incorporating multianticipative behavior improves model prediction accuracy. Although both the SVR and IDM models perform well in the validation test, no single model consistently outperforms all the other models in all conditions. Different car-following models are required to accurately predict the driving behavior of different drivers. The validation results confirm that drivers react to the behavior of not only the immediate leader but also the second, third, and even more distant leaders. Additionally, this study performs a relative importance analysis to quantify the different stimuli to which drivers react in multileader car following. Drivers appear to be more sensitive to the relative speed than to the gap in congested traffic conditions.
These findings are of great importance for current microscopic simulation practice. Most commercial simulation models include only the first leader; however, multiple leaders need to be considered to describe driving behavior more realistically. Future research will extend the current empirical analysis to different traffic flow levels and examine how the results might change. Furthermore, applying various Machine Learning techniques to interpret complex traffic phenomena, and implementing them in traffic simulation, deserve more attention.

Figure 2: Comparison of speed and gap profiles generated by SVR and IDM models.

Table 1: Optimal parameters and training results for SVR models.

Table 2: Optimal parameters and training results for IDM models.

Table 3: Validation results for SVR and IDM models.

Table 4: Results of relative importance analysis.