A Decision-Making Model for Self-Driving Vehicles Based on Overtaking Frequency

The driving state of a self-driving vehicle represents an important component of the self-driving decision system. To ensure the safe and efficient driving state of a self-driving vehicle, the driving state of the self-driving vehicle needs to be evaluated quantitatively. In this paper, a driving state assessment method for the decision system of self-driving vehicles is proposed. First, a self-driving vehicle and surrounding vehicles are compared in terms of the overtaking frequency (OTF), and an OTF-based driving state evaluation algorithm is proposed considering the future driving efficiency. Next, a decision model based on the deep deterministic policy gradient (DDPG) algorithm and the proposed method is designed, and the driving state assessment method is integrated with the existing time-to-collision (TTC) and minimum safe distance. In addition, the reward function and multiple driving scenarios are designed so that the most efficient driving strategy at the current moment can be determined by optimal search under the condition of ensuring safety. Finally, the proposed decision model is verified by simulations in four three-lane highway scenarios. The simulation results show that the proposed decision model that integrates the self-driving vehicle driving state assessment method can help self-driving vehicles to drive safely and to maintain good maneuverability.


Introduction
With the significant increase in the feasibility of self-driving technology, decision systems that guarantee the safe and reliable driving state of self-driving vehicles and provide the information needed for their efficiency optimization have become a key factor affecting the future of the industry. The driving state refers to the state of a vehicle's lateral velocity, longitudinal velocity, lateral acceleration, and longitudinal acceleration while travelling, and the difference between vehicles travelling in adjacent lanes. In traffic flow, vehicles can make different driving-behaviour decisions based on differences in driving status compared with the surrounding vehicles. For human drivers, pre-decisional judgment is mostly based on personal experience. However, self-driving vehicles require a number of judgment criteria before making decisions. To realize the decision-making of self-driving vehicles in multiple lanes, it is necessary to implement evaluation methods that can assist self-driving vehicles in judging the difference in driving status between the ego vehicle and vehicles in other lanes.
There have been many studies on the decision-making of self-driving vehicles, and they have mainly focused on longitudinal and lateral decision-making.
Considering longitudinal driving decision-making, Zhu et al. [1] proposed a deep reinforcement learning-based framework for human-like car-following planning of self-driving vehicles, obtaining an optimal policy from the aspects of speed, the relative speed of the vehicles in front and behind, vehicle spacing, and the acceleration of the following vehicle. Wei et al. [2] proposed a decision-making algorithm to assist self-driving vehicles under single-lane uncertainty, considering the behaviour uncertainty of the vehicles in front and the uncertainty in environment perception accuracy, and achieved a significant improvement in system robustness. Ziegler et al. [3] proposed a method for planning the acceleration and deceleration maneuvers of autonomous vehicles during trajectory planning under the condition of deterministic driving behaviour of the surrounding vehicles. However, these decision-making methods consider only the state relationship between the ego vehicle and the vehicle in front of it and lack comparisons with vehicles in adjacent lanes.
In lateral decision-making, Gao et al. [4] proposed a reinforcement learning-based decision-making method for networked autonomous vehicles, which utilized the advantages of networked information to make autonomous driving decisions more effective. A number of studies have proposed different lane-changing decision-making methods for autonomous vehicles based on game theory [5,6]. For instance, Li et al. [7] proposed a game theory-based traffic model for testing and comparing various autonomous vehicle decision-making systems. Lane-changing decisions and lane-changing trajectories have also been modeled by learning the driving behaviour of human drivers to build a humanoid lane-changing decision-making system [8].
Comprehensive studies have been conducted on decision-making behaviours, covering both longitudinal and lateral decision-making. Khattak [9] preinstalled historical road performance data on the navigation map of self-driving vehicles and fused it with the vehicle's multisensor data to help drivers of self-driving vehicles and vehicles with a low automation level make reasonable driving decisions. Zheng et al. [10] proposed an intelligent vehicle behaviour decision model based on driving risk assessment by analyzing drivers' driving characteristics and selecting safety and high efficiency as the two main factors that drivers pursue when driving. They established a multiobjective optimal cost function for a decision model based on the least-action principle. Hubmann et al. [11] considered the current and future interactions and uncertainties of vehicles and established a multiobjective optimal cost function for a decision-making model that can optimize autonomous driving behaviour under different future scenarios. Bahram et al. [12] proposed a combined optimization prediction-response-based driving strategy selection mechanism, which considered comfort in addition to ensuring the safety of autonomous vehicles. Rauskolb et al. [13] used a hybrid rule-based behavioural modeling approach to model an intelligent vehicle's behaviour decisions. However, this method does not consider differences in driving state between the vehicle's own lane and the surrounding lanes.
Many autonomous driving decisions are based on algorithms designed for the Markov decision process [14,15]. Zuo et al. [16] proposed a continuous reinforcement learning method that combines deep deterministic policy gradients with live demonstrations. This method accelerates the training process while learning more of the demonstrator's preferences. In several studies, Q-learning and deep learning have been combined to design autonomous driving frameworks [17][18][19].
Part of the current research on decision-making for autonomous vehicles has been based on an actor-critic model. The DDPG algorithm has good convergence [20]. Based on the DDPG algorithm, Wang et al. [21] built a personalized autonomous driving system and designed driving decision-making methods according to different driving styles. In a multivehicle scenario, based on an actor-critic learning approach, Xu et al. [22] established an actor-critic model as a decision model for autonomous driving. They used a value network to evaluate the current situation and a strategy network to make the next decision. By combining the two networks, an intelligent control model in line with the human decision process was developed.
In the existing autonomous driving car-following models, the speed of the vehicle in front and the distance to it are considered. The lane-change model considers whether the driving states of the vehicle in front, the vehicle in front in the target lane, and the vehicle behind allow a self-driving vehicle to perform the lane-changing behaviour. However, human drivers decide to change their current state when vehicles in both adjacent lanes keep overtaking them, or when they keep overtaking vehicles in both adjacent lanes, during the driving process. Following this idea, this paper develops efficient and safe integrated lateral and longitudinal decision-making for autonomous vehicles, based on an OTF approach that considers the driving variability between the ego vehicle (EV) and the vehicles in both adjacent lanes.
To overcome the limitations and shortcomings of the existing work, this paper proposes an OTF-based driving state assessment method for autonomous vehicles and, based on this method, designs an autonomous driving decision model. The structure of this paper is shown in Figure 1. The main contribution of this paper is the development of a vehicle state assessment approach based on the OTF parameters to improve the accuracy of self-driving vehicles in judging the driving state, which can quantitatively and objectively measure whether a driving state in a three-lane scenario is appropriate. By using the OTF-based method and the DDPG algorithm, a decision model is established to obtain an optimal action-state by evaluating the combined efficiency of the candidate strategies. The rest of this paper is organized as follows. Section 2 presents a state evaluation method for self-driving vehicles applicable to three-lane traffic scenarios. Section 3 describes the decision process. Section 4 presents the simulation results of four typical scenarios. Section 5 gives the conclusion.

OTF-Based Driving-State Evaluation Approach for Self-Driving Vehicles
In the vehicle driving decision problem in high-speed scenarios, the influencing factors for self-driving vehicles mainly include the speed, position, and driving safety of the vehicle. Before generating a decision, a self-driving vehicle needs to determine whether the traffic conditions in its current lane are consistent with those in the adjacent lanes, and can then decide whether to change its driving status. Therefore, this paper establishes an OTF-based driving state assessment method for self-driving vehicles.

Description of the OTF-Based Driving-State Evaluation Approach.
The consistency of a vehicle's driving state with the surrounding environment has a significant impact on whether the EV intends to change its driving behaviour. In this paper, the term "overtaking frequency" (OTF) is introduced to evaluate the difference between a self-driving vehicle and the surrounding vehicles. The OTF can be used to compare the EV's driving state with the driving states of vehicles in other lanes.
Since the OTF reflects the speed difference between vehicles in other lanes and an EV, it is related to the difference in the numbers of overtaking and overtaken vehicles on the two sides. The net numbers of vehicles overtaken and overtaking in the left and right lanes can be calculated, respectively, as

N_nl(t) = N_al(t) − N_bl(t),  N_nr(t) = N_ar(t) − N_br(t),  (1)

where N_al(t) and N_ar(t) denote the numbers of vehicles overtaken by the EV in the left and right lanes, respectively; N_bl(t) and N_br(t) are the numbers of vehicles overtaking the EV in the left and right lanes, respectively; N_nl(t) is the difference between the numbers of vehicles overtaken and overtaking in the left lane, and N_nr(t) is the corresponding difference in the right lane. In a three-lane scenario, the total numbers of overtaken and overtaking vehicles can be calculated as

N_a(t) = N_al(t) + N_ar(t),  N_b(t) = N_bl(t) + N_br(t),  (2)

where N_a(t) is the total number of vehicles overtaken by the EV and N_b(t) is the total number of vehicles overtaking the EV. In this study, the OTF is defined as the difference between the numbers of vehicles overtaking and being overtaken in the left and right lanes within a unit time interval:

OTF(t) = (N_a(t) − N_b(t)) / δ,  (3)

where δ is the unit time window. According to equation (3), the OTF threshold is set to [−a_o, a_o]. Therefore, in the OTF evaluation function, when a vehicle's OTF lies in [−a_o, a_o], the autonomous vehicle is in the consistent-speed state; when the OTF is larger than a_o, the vehicle is in the excessive-speed state; and when the OTF is less than −a_o, the vehicle is in the insufficient-speed state.
This judgment framework is shown in Figure 2.
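As an illustration, the OTF definition in equation (3) and the three-way state judgment can be sketched as follows. The count values, window length, and threshold used in the example call are illustrative only and are not taken from the paper.

```python
# Sketch of the OTF computation and the three driving-state labels.

def overtaking_frequency(n_al, n_ar, n_bl, n_br, delta):
    """OTF = (vehicles overtaken by the EV - vehicles overtaking the EV)
    across both adjacent lanes, per unit time window delta."""
    n_a = n_al + n_ar          # total vehicles the EV overtook
    n_b = n_bl + n_br          # total vehicles that overtook the EV
    return (n_a - n_b) / delta

def classify_state(otf, a_o):
    """Map an OTF value to the paper's three driving-state labels."""
    if otf > a_o:
        return "excessive speed"
    if otf < -a_o:
        return "insufficient speed"
    return "consistent speed"

# Example: the EV overtook 5 vehicles and was overtaken by 2 in a 10 s window.
otf = overtaking_frequency(3, 2, 1, 1, 10.0)
print(otf, classify_state(otf, 0.05))  # 0.3 -> "excessive speed"
```

A positive OTF means the EV is passing more vehicles than are passing it, so the threshold a_o directly controls how much asymmetry is tolerated before the state is flagged.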

Time-Window Determination.
When the time-window length varies, the OTF value in the corresponding time-window also varies. The time-window length is divided into different ranges to investigate the OTF values under different fixed time-window lengths. For instance, when δ = 5 s, the OTF is calculated as

OTF(t) = (N_a(t) − N_b(t)) / 5.  (4)

When δ = 10 s, the OTF is calculated as

OTF(t) = (N_a(t) − N_b(t)) / 10.  (5)

Therefore, when δ = i′ s, the OTF is calculated as

OTF(t) = (N_a(t) − N_b(t)) / i′.  (6)

In this paper, 1 s is used as the time-window step. When δ_u = 1 s, the OTF value is updated every 1 s, resulting in a dynamic OTF. When δ = i′ s, the OTF is calculated from (N_a(t) − N_b(t)) over the most recent i′ seconds, and the division of the time-window under dynamic changes is given in Table 1.
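The dynamic OTF with a 1 s update step can be sketched as a fixed-length sliding window of per-second net counts. The class name and the sample count sequence below are illustrative, not from the paper.

```python
from collections import deque

# Sliding-window OTF: per-second net counts (overtaken minus overtaking)
# are kept for the last delta seconds, so the OTF refreshes every second.

class SlidingOTF:
    def __init__(self, delta):
        self.delta = delta
        self.window = deque(maxlen=delta)  # one net count per second

    def tick(self, n_a_this_second, n_b_this_second):
        """Record one second of observations and return the current OTF."""
        self.window.append(n_a_this_second - n_b_this_second)
        return sum(self.window) / self.delta

otf = SlidingOTF(delta=5)
for n_a, n_b in [(1, 0), (0, 1), (2, 0), (0, 0), (1, 1)]:
    value = otf.tick(n_a, n_b)
print(value)  # net counts sum to 2 over a 5 s window -> 0.4
```

Because `deque(maxlen=delta)` discards the oldest entry automatically, each 1 s tick both ages out stale observations and produces the latest OTF value.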

Decision-Making Process
In this study, a learning approach is used to obtain an optimal decision. In this section, a decision-making model based on the deterministic policy gradient algorithm is introduced. In this algorithm, the OTF-based vehicle driving state evaluation function is defined, and reward functions for different scenarios are designed, which reflect the driving difference between self-driving vehicles and other vehicles. The optimal action-state is found by optimal search.

DDPG.
The DDPG [24] was developed by the DeepMind research team; it extends the Q-learning algorithm along the lines of the DQN and uses deep neural networks to approximate the state-action value function and the deterministic policy. The DDPG algorithm separately parameterizes the critic function Q(s, a|θ^Q) and the actor function μ(s|θ^μ), where θ^Q and θ^μ are the weight parameters. The critic function is defined by equations (9) and (10) and is updated by minimizing the corresponding loss.
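Two core DDPG ingredients can be sketched in a few lines: the TD target used to train the critic, and the delayed (soft) update that copies online parameters into the target networks. The parameter values and the delay factor tau below are illustrative, not taken from the paper.

```python
# Minimal sketch of the DDPG critic target and soft target-network update.

def td_target(reward, gamma, q_next_target):
    """Critic target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}) | theta^Q')."""
    return reward + gamma * q_next_target

def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

online = [1.0, 1.0, 1.0]
target = soft_update([0.0, 0.0, 0.0], online, tau=0.01)
print(target)                      # each parameter moves 1% toward online
print(td_target(1.0, 0.99, 2.0))  # 1.0 + 0.99 * 2.0 = 2.98
```

The small tau keeps the target networks changing slowly, which is what gives DDPG its training stability.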
The actor function maps the current state to the current best action and is updated by the policy gradient, where i is the number of the training step and N is the total number of training steps; Q and μ denote the critic and actor functions, respectively, and Q′ and μ′ denote the critic and actor functions of the target network, respectively. Finally, the target network copies the original network's parameters according to the delay factor τ:

θ^Q′ ← τθ^Q + (1 − τ)θ^Q′,  θ^μ′ ← τθ^μ + (1 − τ)θ^μ′.

OTF-Based Self-Driving Vehicle Decision-Making Process in Different Scenarios.

When making decisions for autonomous driving, different decisions need to be made according to the scenario. Based on the OTF-based driving state evaluation function presented in Section 2, this paper establishes decision-making methods for four typical scenarios. The following scenarios are discussed in a three-lane highway setting. When OTF > a_o, the EV's driving speed is significantly higher than the driving speeds of vehicles in the adjacent lanes, and the EV is in the excessive-speed state. Since the EV's state can be assessed as too fast, the EV can bring the OTF value within the threshold range by slowing down and performing the other related actions.

No Cars in Front of the EV.
When OTF < −a_o, the EV is in the insufficient-speed state. Since there is no vehicle blocking the EV in front, the EV should accelerate so that its OTF falls within the threshold range.
The OTF-based self-driving vehicle decision-making process in scenario (1) is shown in Figure 4. When OTF > a_o, the vehicle can bring the OTF within the threshold range by slowing down and performing the other actions while ensuring the safety distance.

Sudden Insertion of Other Vehicles from the Adjacent Lane.
When OTF < −a_o, although the EV is in the insufficient-speed state, the OTF threshold cannot be reached by an acceleration maneuver due to the insertion of the OV from the adjacent left lane. Therefore, a change to the right lane needs to be considered to achieve optimal efficiency. The same applies when a vehicle inserts from the adjacent right lane. The OTF-based self-driving vehicle decision-making process in scenario (2) is shown in Figure 5.

Sudden Braking of the Vehicle in Front.

When OTF > a_o, the EV is in the excessive-speed state. Thus, the EV can bring the OTF within the threshold range by decelerating while ensuring the safety distance.
When OTF < −a_o, the EV is in the insufficient-speed state. The OTF threshold cannot be reached by an acceleration action due to the sudden braking of the car in front, so a lane-change action needs to be considered. The OTF-based self-driving vehicle decision-making process in scenario (3) is shown in Figure 6.

The Vehicle in Front Changes Lanes: Consider the Example of the Front Car Changing Lanes to the Adjacent Left Lane.
When OTF ∈ [−a_o, a_o], since the vehicle in front executes a lane-changing maneuver, under normal circumstances the EV would decelerate in its own lane. Therefore, the EV should consider changing to the right lane to achieve optimal efficiency while ensuring a safe distance from the vehicle in front.
When OTF > a_o, the EV can bring the OTF within the threshold range by decelerating while ensuring the safety distance.
When OTF < −a_o, if the lane-changing action of the front car is completed within a unit time-window, the EV's strategy after the lane change of the front car is that of case (1). If the front car cannot complete the lane-changing process within the time-window due to conditions in its target lane and the EV cannot accelerate to within the threshold, a lane change to the right lane should be considered. The OTF-based self-driving vehicle decision-making process in scenario (4) is shown in Figure 7.
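The scenario rules above can be condensed into a single illustrative lookup. Note that this is a hand-written restatement for clarity only: in the paper the mapping from situations to actions is learned with DDPG rather than hard-coded, and the function and flag names here are our own.

```python
# Hand-written condensation of the four scenario rules (illustration only).

def high_level_action(otf, a_o, blocked_ahead, adjacent_lane_free):
    """Pick a high-level maneuver from the OTF state and the road ahead."""
    if -a_o <= otf <= a_o:
        return "keep lane"            # consistent speed: no change needed
    if otf > a_o:
        return "decelerate"           # excessive speed: slow back into range
    # insufficient speed from here on
    if not blocked_ahead:
        return "accelerate"           # scenario (1): free road ahead
    if adjacent_lane_free:
        return "change lane"          # scenarios (2)-(4): blocked, sidestep
    return "keep lane"                # no safe option: hold current state

print(high_level_action(-0.2, 0.05, blocked_ahead=True, adjacent_lane_free=True))
# -> "change lane"
```

The learned policy effectively discovers this structure itself; the reward function in the next section is what makes these choices attractive to the agent.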

Reward Function Design.
The measure of a policy depends on the cumulative reward received by an agent after executing the policy over a long period of time. Since the most important issues for intelligent vehicles are safety and timeliness, both aspects should be considered when designing the reward function. Timeliness is mainly reflected in two aspects: the OTF function and the vehicle speed. Therefore, in this paper, the OTF reward and the vehicle speed reward are established separately.
In the OTF-based speed suitability assessment model, a positive reward value is output for the consistent-speed state, and the reward value is zero in all other cases. The OTF reward r_o is thus a positive constant when OTF ∈ [−a_o, a_o] and zero otherwise. Based on the results in Section 2, a_o was set to 0.05. Outside the OTF constraint, intelligent vehicles travel at faster speeds, which is beneficial for timeliness; within the speed constraint, the vehicle should travel as fast as possible. Therefore, the speed reward can be defined as

r_v = v_E / v_max,

where v_E is the current vehicle speed and v_max is the maximum vehicle speed in the current lane. The safety of an intelligent vehicle is related to the state of the vehicle in front of it, so the safety reward is determined by the time-to-collision (TTC) and the relative distance D between the two vehicles [25]. The TTC value is calculated as

t_T = (x_F − x_E) / (v_E − v_F),

where x_F denotes the longitudinal position of the front vehicle, x_E is the longitudinal position of the EV, v_E is the EV speed, and v_F is the speed of the front vehicle. The TTC reward r_T is expressed as a function of t_T with a minimum threshold t_Tmin [26]. When the calculated value of t_T is infinite, i.e., when the speeds of the two cars are equal, the reward value is one.

However, when the relative distance between the two cars is less than the minimum safe distance, the reward value is set to negative infinity. Therefore, the accumulated total reward value is calculated as the sum of the OTF reward, the speed reward, and the safety reward.
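The reward terms of this section can be sketched as follows. Since the exact positive constants and the sub-threshold TTC penalty are not fully reproduced here, simple placeholder values are assumed (a unit OTF reward and a −1 TTC penalty); only the TTC formula, the infinite-TTC case, and the negative-infinity safety penalty follow the text directly.

```python
import math

# Sketch of the combined reward under stated placeholder assumptions.

def reward(otf, a_o, v_e, v_max, x_f, x_e, v_f, d_min, t_t_min):
    r_o = 1.0 if -a_o <= otf <= a_o else 0.0   # OTF reward: consistent speed
    r_v = v_e / v_max                           # speed reward: faster is better
    gap = x_f - x_e                             # relative distance D
    if gap < d_min:                             # inside minimum safe distance
        return -math.inf
    # TTC is infinite when the EV is not closing in on the front vehicle.
    ttc = gap / (v_e - v_f) if v_e > v_f else math.inf
    r_t = 1.0 if ttc >= t_t_min else -1.0       # placeholder penalty below t_Tmin
    return r_o + r_v + r_t

# Equal speeds, consistent OTF: TTC is infinite, so the TTC reward is one.
print(reward(0.0, 0.05, 25.0, 30.0, 80.0, 0.0, 25.0, 10.0, 3.0))
```

Returning negative infinity for sub-minimum gaps makes any trajectory that violates the safe distance strictly worse than every alternative, which is the intended effect of the safety term.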

Simulation and Validation
To verify the effectiveness of the proposed decision algorithm, the reinforcement learning framework provided by MATLAB was used, and four complex high-speed scenarios were constructed as the experimental environment.
In the experiment, the reinforcement learning elements, including actions, states, and rewards, were implemented. The OTF-based DDPG algorithm was used for vehicle driving behaviour decision-making. The control variables (front-wheel angle and acceleration) were output by the neural network in the DDPG. The three-degree-of-freedom vehicle dynamics model in Simulink responds to the control variables and finally outputs the EV's state variables: lateral velocity v_y, longitudinal velocity v_x, and yaw angle ω. The exact flow of the simulation is shown in Figure 8. The selected high-speed scene was a one-way three-lane scene, and the state space S comprised the location and motion information of the surrounding 10 vehicles, including the vehicle under test. The other vehicles in the scene selected their actions freely and at random. The parameters of the environmental model are shown in Table 2. The vehicle action space included left lane-changing, driving straight ahead, right lane-changing, acceleration, and deceleration. The training process follows the algorithm described in Section 3.1, where the maximum number of epochs in the training phase was set to 10,000. The tests were divided into four scenarios, and the OTF-based driving behaviour decision model for autonomous vehicles presented in Section 3 was validated in each of them.
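The agent-environment wiring described above can be sketched abstractly: the policy (the actor network in the paper) picks an action, the vehicle/environment model advances the state and returns a reward, and the episode's rewards accumulate. The actual experiments use MATLAB/Simulink; the stubs below are Python stand-ins whose names are our own.

```python
import random

# The five-element action space from the experiment description.
ACTIONS = ["left lane-change", "straight", "right lane-change",
           "accelerate", "decelerate"]

def run_episode(policy, env_step, initial_state, max_steps=100):
    """Roll out one episode and return the accumulated reward."""
    state, total = initial_state, 0.0
    for _ in range(max_steps):
        action = policy(state)              # stand-in for the DDPG actor
        state, r, done = env_step(state, action)
        total += r
        if done:
            break
    return total

def toy_env_step(state, action):
    """Toy environment: unit reward per step, episode ends after 10 steps."""
    next_state = state + 1
    return next_state, 1.0, next_state >= 10

random.seed(0)
total = run_episode(lambda s: random.choice(ACTIONS), toy_env_step, 0)
print(total)  # 10 steps of unit reward -> 10.0
```

In the real setup the environment step is the Simulink 3-DOF vehicle model plus the surrounding traffic, and training repeats such rollouts for up to 10,000 epochs.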

Scenario (1): No Cars in Front of the EV.
The results of the self-driving vehicle using the OTF-based driving behaviour decision model in Scenario 1 are displayed in Figure 9. As shown in Figure 9(a), the EV was driving in the middle lane, and the EV's driving decision in this scenario was lane-keeping. Since the current driving lane was not blocked by a car in front, the EV performed the acceleration action, as shown in Figures 9(b) and 9(d), to improve the driving efficiency. As shown in Figure 9(c), the training result finally converged.

Scenario (2): Sudden Insertion of the OV from the Adjacent Lane.
The results of the autonomous vehicle using the OTF-based driving behaviour decision model in Scenario 2 are presented in Figure 10. As shown in Figure 10

Scenario (3): Sudden Braking of the Vehicle in Front.
As shown in Figure 11(a), the decision of the EV was to change lanes to the left. Due to the sudden braking of the vehicle in front, the EV performed the deceleration action first, as shown in Figures 11(b) and 11(d), to ensure driving safety. Then, to drive more efficiently and obtain a larger reward value, the EV performed the lane-changing maneuver. As shown in Figure 11(c), the training results finally converged.

Scenario (4): The Vehicle in Front Changed Lanes to the Adjacent Lane.
As shown in Figure 12(a), the EV's decision in this scenario was lane-keeping. During the simulation, since the lane-changing maneuver of the car in front was completed in 3 s, there was no obstruction in front of the EV after the front car left the lane. Therefore, the EV performed the acceleration action after the front car changed lanes, as shown in Figures 12(b) and 12(d). As shown in Figure 12(c), the training results converged.

Conclusions
In this paper, a method based on the overtaking frequency is proposed for solving the autonomous decision-making problem of self-driving vehicles in highway scenarios. The degree of difference in the driving state between the self-driving vehicle and the surrounding vehicles is quantified by the proposed OTF-based driving state evaluation method. With the assistance of this evaluation method, a decision-making model based on the DDPG is established, and OTF-based driving decision-making methods for different typical scenarios are designed to make self-driving decisions more efficient and reasonable. The proposed model is verified by simulations, and the simulation results prove the applicability and effectiveness of the decision-making model in four typical driving scenarios. The method can provide a theoretical basis for further research on decision-making under uncertainty. However, the broader applicability of the algorithm remains to be studied. In future research, the training amount of the model will be further increased, and the application of the decision model will be expanded.

Data Availability
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.