Decision-making models that can handle complex and dynamic urban intersections are critical for the development of autonomous vehicles. A key challenge in operating autonomous vehicles robustly is to accurately predict the trajectories of other traffic participants and to account for safety and efficiency concurrently in interactions between vehicles. In this work, we propose a tactical decision-making model that predicts the trajectories of incoming vehicles and uses conflict resolution theory to model vehicle interactions, helping autonomous vehicles cross intersections safely. First, Gaussian process regression models are trained to predict the trajectories of incoming vehicles, using data collected at intersections with subgrade sensors and a retrofit autonomous vehicle. Then, a decision-making model is formulated as a multiobjective optimization problem (MOP) based on efficient conflict resolution theory at intersections. Next, a nondominated sorting genetic algorithm (NSGA-II) and deep deterministic policy gradient (DDPG) are each used to select the optimal motions, and their results are compared. Finally, a simulation and verification platform is built on Matlab/Simulink and PreScan, and the reliability and effectiveness of the tactical decision-making model are verified by simulation. The results indicate that DDPG is more reliable and effective than NSGA-II in solving the MOP model, which provides a theoretical basis for in-depth study of decision-making in complex and uncertain intersection environments.

Today’s driving-assistance systems have made traffic safer and more efficient and show considerable progress toward autonomous driving. To develop the next generation of driver-assistance systems, or even self-driving systems, algorithms capable of handling complex situations are required. Many researchers have proposed approaches to perception [

The problems of robust tactical decision-making for autonomous vehicles in a complex and dynamic urban environment have been investigated quite extensively by many organizations and researchers, such as Google [

In recent years, more and more researchers have begun studying decision-making behavior. Chen [

This paper focuses on the decision-making process of autonomous vehicles in an urban environment and develops a vehicle trajectory prediction model based on Gaussian process regression (GPR) [

The remainder of this paper is organized as follows: Section

Gaussian process regression (GPR) is a statistical method that can make full use of raw data by considering its temporal trends and periodic changes to establish a suitable predictive model. This model has been used to predict the trajectories of vehicles and has been proven to be efficient. Compared with LSTM, its main advantage is that it is more robust when dealing with data with noise, making it more suitable for urban intersections.
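To make the prediction step concrete, the following minimal sketch implements zero-mean GPR for a one-dimensional time series in plain Python. The squared-exponential kernel, the hyperparameter values, and the function names (`rbf_kernel`, `gpr_predict`) are assumptions made for illustration and are not taken from the model described here; the posterior mean and variance follow the standard GPR equations.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def solve_cholesky(lower, rhs):
    """Solve (L L^T) x = rhs by forward, then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x

def gpr_predict(train_t, train_y, query_t, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at one query time."""
    n = len(train_t)
    # Training covariance with observation noise on the diagonal.
    cov = [[rbf_kernel(train_t[i], train_t[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    k_star = [rbf_kernel(t, query_t) for t in train_t]
    alpha = solve_cholesky(lower, train_y)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_cholesky(lower, k_star)
    var = rbf_kernel(query_t, query_t) - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

Close to the training samples the predictive variance shrinks toward the noise level, whereas far from them the prediction falls back to the prior (mean 0, variance `signal_var`). This growing uncertainty is what a downstream planner can use to discount long-horizon predictions.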

The log likelihood function of the sample data is shown as follows:

The joint distribution of the model’s observations and training data is shown as follows:

Therefore, the output of the model can be found with (

In 2000, Deb et al. proposed the nondominated sorting genetic algorithm II (NSGA-II), which builds on the original NSGA of Srinivas and Deb and provides a method for approximating the Pareto optima in multiobjective optimization problems. It is one of the most popular multiobjective genetic algorithms (GAs) for complex system analysis and for finding diverse sets of solutions. The structure of the algorithm is shown in Figure

Structure of NSGA-II.
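The core idea behind the ranking step, Pareto dominance, can be illustrated with a short sketch. The code below is a simple, unoptimized front-peeling procedure rather than the fast nondominated sort used in NSGA-II itself; all objectives are assumed to be minimized, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Split objective vectors into successive Pareto fronts (lists of indices)."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

For example, with two objectives to minimize, `nondominated_fronts([(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)])` places the first three points in the first front, because none of them is better than another in both objectives at once.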

The step-by-step procedure shows that the NSGA-II algorithm is simple and straightforward. First, a combined population

The interactive learning process of reinforcement learning is similar to human learning and can be represented as a Markov decision process consisting of

The DDPG algorithm is an improved actor-critic method. In the actor-critic algorithm, the actor function

The DDPG algorithm combines the advantages of the actor-critic and DQN algorithms so that convergence becomes easier. In particular, DDPG borrows from DQN the use of a target network and an estimate network, applied to both the actor and the critic. Moreover, the policy of the DDPG algorithm is deterministic rather than stochastic: the actor network outputs a single concrete action instead of a probability distribution over actions. The critic network is updated based on
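Two ingredients borrowed from DQN can be sketched in a few lines: the bootstrapped target used to train the critic, and the slow tracking of the estimate network by the target network. The sketch follows the commonly published form of DDPG; the function names, the discount factor `gamma`, and the mixing rate `tau` are illustrative assumptions rather than settings reported here.

```python
def td_target(reward, next_q_target, done, gamma=0.99):
    """Bootstrapped critic target: r + gamma * Q'(s', mu'(s')), cut off at episode end.

    next_q_target is the target critic's value for the next state, evaluated at the
    action chosen by the target actor (both computed elsewhere).
    """
    return reward + (0.0 if done else gamma * next_q_target)

def soft_update(target_params, online_params, tau=0.005):
    """Move each target-network parameter a small step toward the estimate network."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

Because `tau` is small, the target networks change slowly, which keeps the critic's regression target from shifting abruptly between updates.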

The data were collected from the intersections of Wei Gong Cun Road using subgrade sensors and a retrofit autonomous vehicle as the training and testing samples of the trajectory prediction model. The details are discussed in the following section.

The camera for subgrade data acquisition was installed on the BIT Science and Technology Building. The vehicles’ locations (

The vehicle data were collected with a BYD line-controlled autonomous vehicle which was retrofitted by the BIT Intelligent Vehicle Research Institute. The retrofit autonomous vehicle “Surui” [

(a) The raw and processed data. (b) The layout of sensors on the BYD retrofit vehicle.

The camera and LIDAR sensor were able to detect, track, and localize dynamic objects. The outputs of the fusion algorithm are the positions of vehicles.

Due to the different driving directions and routes of vehicles at intersections, a collision may occur. As shown in Figure

Scenes of crossing an unsignalized intersection.

Vehicle course angle and azimuth were extracted to determine whether a vehicle was turning, because these two parameters change linearly with time during a turn. Recognizing driving patterns from the vehicles’ motion parameters allowed the trajectories of incoming vehicles to be predicted effectively. Real-time acceleration was used to determine whether a vehicle kept driving or gave way to incoming vehicles, because the accelerations for these two patterns fall in different ranges.

A trajectory prediction model based on the GPR model was used to predict the trajectories of MVs. The training process of GPR models [

(a) The training process of the GPR model. (b) The trajectory prediction model.

In this paper, the data collected from the subgrade sensors were used for training the GPR models and optimizing their hyperparameters.

After training the prediction model, as this paper paid more attention to straight driving MVs, the CA (constant acceleration) [

An appropriate parameter should be selected to analyze traffic conflicts. Time to collision (TTC) is widely used in traffic conflict research, but it is generally applied to scenes such as highways and is ill-suited to evaluating collision risk between vehicles at intersections. We instead use EPET (estimating postencroachment time) as the safety indicator; it describes the time difference between vehicles passing through the center of the conflict zone and can effectively evaluate the collision risk between vehicles approaching at any angle, as shown in Figure
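As a rough illustration of this indicator, the sketch below estimates the time each vehicle needs to reach the center of the conflict zone and takes the absolute difference. It assumes that both vehicles hold their current speed, which is a simplification introduced here; the function names are hypothetical.

```python
import math

def time_to_conflict_center(distance_m, speed_mps):
    """Time to reach the conflict-zone center at constant speed (inf if stopped)."""
    if speed_mps <= 0.0:
        return math.inf
    return distance_m / speed_mps

def estimated_pet(dist_uv_m, speed_uv_mps, dist_mv_m, speed_mv_mps):
    """Absolute difference between the two vehicles' arrival times, in seconds."""
    t_uv = time_to_conflict_center(dist_uv_m, speed_uv_mps)
    t_mv = time_to_conflict_center(dist_mv_m, speed_mv_mps)
    if math.isinf(t_uv) or math.isinf(t_mv):
        # A stopped vehicle never reaches the center, so no conflict is predicted.
        return math.inf
    return abs(t_uv - t_mv)
```

A UV 30 m from the center at 10 m/s and an MV 40 m away at 8 m/s arrive 3 s and 5 s from now, giving a gap of 2 s; smaller gaps indicate a more dangerous interaction regardless of the approach angle.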

While safety is ensured, an appropriate speed is also expected, which represents efficiency while crossing the intersection. Using these criteria, we define the following measure combining safety and efficiency:

As the states and actions of vehicles are continuous, we use acceleration

The mathematical model of MOP is usually expressed as follows:

To ensure safety, a simplified circle model for vehicles is established, as shown in Figure

Simplified circle model.

We set a safety constraint for no overlap between the excircles of vehicles:
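One way to read this constraint, assuming each vehicle is covered by a single circle circumscribing its rectangular footprint, is that the distance between vehicle centers must be at least the sum of the two radii. The sketch below uses hypothetical function names and planar coordinates in meters.

```python
import math

def excircle_radius(length_m, width_m):
    """Radius of the circle circumscribing a length x width rectangle (half its diagonal)."""
    return 0.5 * math.hypot(length_m, width_m)

def circles_separated(center_a, center_b, radius_a, radius_b):
    """True if the two excircles do not overlap."""
    return math.dist(center_a, center_b) >= radius_a + radius_b
```

For a vehicle 4.8 m long and 2.178 m wide, the excircle radius is about 2.64 m, so two such vehicles satisfy the constraint only while their centers are at least about 5.27 m apart.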

The formula for the motion state of vehicles is as follows:
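Since acceleration is the control input, a constant-acceleration update over a short time step is one plausible form of such a motion model. The sketch below is an assumption made for illustration and not necessarily the exact formulation used here; `ca_step` and `ca_rollout` are hypothetical names.

```python
def ca_step(distance_m, speed_mps, accel_mps2, dt_s):
    """Advance one vehicle by dt_s seconds under constant acceleration."""
    new_distance = distance_m + speed_mps * dt_s + 0.5 * accel_mps2 * dt_s ** 2
    new_speed = speed_mps + accel_mps2 * dt_s
    return new_distance, new_speed

def ca_rollout(distance_m, speed_mps, accel_mps2, dt_s, steps):
    """List of (distance, speed) states, starting with the initial state."""
    states = [(distance_m, speed_mps)]
    for _ in range(steps):
        distance_m, speed_mps = ca_step(distance_m, speed_mps, accel_mps2, dt_s)
        states.append((distance_m, speed_mps))
    return states
```

The sketch does not clip the speed, so a caller applying strong braking over a long horizon would need to stop the rollout once the speed reaches zero.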

For the model of MOP, we perform an optimal solution based on NSGA-II, and the process is shown in Figure

Process of the MOP model at intersections by the NSGA-II algorithm.

The solution process has two stages. In the first stage, a decision is made at the initial moment and the action is performed using the known information. In the second stage, the positions and velocities of the vehicles are updated with dynamic information and the optimal motions are regenerated.

If we assume that the process of crossing intersections is a Markov decision process (MDP), it is practical to apply deep reinforcement learning for continuous action spaces. The input state is the speed of vehicles and distance from the center of vehicles to the center of conflict zone, i.e.,

In this section, we trained DDPG in OpenAI Gym and then tested both algorithms in PreScan for comparison. This allowed us to verify the effectiveness and reliability of the proposed algorithms.

The simulation parameters are set as follows. We test the algorithms in single-vehicle and multiple-vehicle scenes, in which one or more MVs drive straight from north to south and a UV, controlled by the algorithms, is expected to cross the intersection without collision. Both the MV and the UV are 4800 mm long and 2178 mm wide, the communication range between vehicles is less than 200 m, and the speed limit at the intersection is 60 km/h.

PreScan is a simulation environment for developing advanced driver-assistance systems (ADASs) and intelligent vehicle (IV) systems. It can be used to build 3D virtual traffic scenes and to generate vehicles, pedestrians, traffic lights, and other control modules, as shown in Figure

Simulation platform. (a) PreScan. (b) OpenAI Gym.

We built a new multiple-vehicle intersection task in OpenAI Gym, as shown in Figure

In this paper, the predictions of steering-vehicle trajectories and straight-vehicle trajectories are verified separately. The trajectories are divided into several pieces to evaluate the prediction performance. The prediction lengths for straight vehicles are 3 s, 4 s, 5 s, and 6 s, and those for steering vehicles are 3 s, 4 s, and 5 s. There are 80 trajectories in each group.

Figure

The trajectory prediction error of straight vehicle and left-turn vehicle. (a) Straight vehicle. (b) Left-turn vehicle.

Scenario 1: single-vehicle scenario

Figure

(a) Scenario 1. (b) Predicted trajectory. (c) Velocity. (d) Distance. (e) TTC.

Figure

Scenario 2: multiple-vehicle scenario

To compare the performances of the DDPG and NSGA-II algorithms, we conducted two groups of experiments on the same scene, in which

Comparison of the performance of the two algorithms. (a) NSGA-II. (b) DDPG.

For group A, the UV adopts a yield strategy wherein it slows down before

Figure ^{2} during the entire process of passing through the intersection, thus achieving a much higher total reward than that in group A.

The comparison data in Table

Comparison data of two algorithms.

Algorithm | Time to cross the conflict zone for UV (s) | Total reward | |
---|---|---|---|
NSGA-II | 3.75 | −44.184 | 5 |
DDPG | 2.25 | −18.743 | 0 |

The stability of the DDPG and NSGA-II algorithms was studied by performing a new task wherein the initial speed of the UV was varied from 30 km/h to 55 km/h.

We built a single-vehicle scene, where there is only one UV, and imported the trained actor policy of the DDPG to output the motions of the UV. We then ran the NSGA-II algorithm as a comparison group and observed the performance of both on the same task 10 times. As shown in Figure

Stability and robustness of two algorithms.

To improve the safety and efficiency of autonomous vehicles, this paper proposed an MOP decision-making model based on efficient conflict resolution for autonomous vehicles at urban intersections, which considers the complexity of urban intersections and the uncertainties of vehicle behavior. The prediction algorithm for incoming vehicles was studied, and the performance of the UV at intersections under the decision-making model was compared between NSGA-II and DDPG. The main conclusions are as follows:

The trajectory prediction model fits the predicted trajectory by learning the probability distribution of a large amount of trajectory data, and the accuracy of the model depends on the quantity and quality of the training data. The incoming vehicle trajectory data collected in this paper was limited and was unable to cover all the incoming vehicle motion patterns.

The MOP decision-making model performs well and can prevent vehicle collisions at intersections. Compared with the traditional NSGA-II algorithm, the DDPG algorithm is more stable and effective in solving the MOP model at intersections, and UVs controlled by DDPG perform more appropriate and efficient motions.

The decision making of autonomous vehicles is influenced by human-vehicle-road (environmental) factors. Due to limits on the length of this article, the impacts of pedestrians, nonmotor vehicles, road structure types, and traffic flow density on decision-making were not considered in this study. In the future, the impacts of these factors will be studied and discussed. The interactions between people and vehicles will be considered to further improve the decision-making model of driving behavior under real road conditions.

The data used to support the findings of this study are provided in the Supplementary Materials section.

The authors declare that they have no conflicts of interest.

This work was supported in part by the Youth Science Fund (no. 51705021), Automobile Industry Joint Fund (no. U1764261) of the National Natural Science Foundation of China, Beijing Municipal Science and Technology Project (no. Z191100007419010), and Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province (no. BM20082061706).

The data collected from the intersections of Wei Gong Cun Road in Beijing using subgrade sensors and a retrofit autonomous vehicle were used as the training and testing samples of the trajectory prediction model. The data were divided into three categories: left-turn vehicles, right-turn vehicles, and straight vehicles. Each category included vehicle information such as location, speed, and acceleration.