Intention-Aware Autonomous Driving Decision-Making in an Uncontrolled Intersection

Autonomous vehicles need to perform socially accepted behaviors in complex urban scenarios that include human-driven vehicles with uncertain intentions. This leads to many difficult decision-making problems, such as deciding on a lane change maneuver and generating policies to pass through intersections. In this paper, we propose an intention-aware decision-making algorithm to solve this challenging problem in an uncontrolled intersection scenario. In order to consider uncertain intentions, we first develop a continuous hidden Markov model to predict both the high-level motion intention (e.g., turn right, turn left, and go straight) and the low-level interaction intentions (e.g., yield status for related vehicles). Then a partially observable Markov decision process (POMDP) is built to model the general decision-making framework. Due to the difficulty in solving POMDPs, we use proper assumptions and approximations to simplify this problem. A human-like policy generation mechanism is used to generate the possible candidates. A future motion model of human-driven vehicles is proposed for the state transition process, and the intention is updated during each prediction time step. The reward function, which considers driving safety, traffic laws, time efficiency, and so forth, is designed to calculate the optimal policy. Finally, our method is evaluated in simulation with PreScan software and a driving simulator. The experiments show that our method could lead an autonomous vehicle to pass through uncontrolled intersections safely and efficiently.


Introduction
Autonomous driving technology has developed rapidly in the last decade. In the DARPA Urban Challenge [1], autonomous vehicles showed their abilities for interacting in some typical scenarios such as Tee intersections and lane driving. In 2011, Google released its autonomous driving platforms. Over 10,000 miles of autonomous driving for each vehicle was completed under various traffic conditions [2]. Besides, many big automobile companies also plan to launch their autonomous driving products in the next several years. With this significant progress, autonomous vehicles have shown their potential to reduce the number of traffic accidents and solve the problem of traffic congestion.
One key challenge for autonomous vehicles driven in the real world is how to deal with uncertainties, such as inaccurate perception and unclear motion intentions. With the development of intelligent transportation systems (ITS), the perception uncertainty could be solved through vehicle2X technology, and the interactions between autonomous vehicles can be solved by centralized or decentralized cooperative control algorithms. However, human-driven vehicles will still be predominant in the short term and the uncertainties of their driving intentions will remain due to the lack of an "intention sensor." Human drivers anticipate potential conflicts, continuously make decisions, and adjust their driving behaviors, which are often not rational. Therefore, autonomous vehicles need to understand human drivers' driving intentions and choose proper actions to behave cooperatively.
In this paper, we focus on solving this problem in an uncontrolled intersection scenario. The uncontrolled intersection is a complex scenario with a high accident rate. In the US, stop signs can be used to normalize the vehicles' passing sequence. However, this kind of sign is rarely used in China and the right-first traffic laws are often broken by some aggressive drivers. Human drivers are prone to perception failures, misunderstandings, and wrong decisions. In such cases, even with stop signs, the "first come, first served" rule is likely to be broken. Besides, human driving behaviors are likely to change as time goes on. Given these uncertain situations, the specific layout, and the traffic rules, autonomous vehicles approaching an intersection should be able to recognize the behavior of other vehicles and respond with suitable behavior that considers the future evolution of the traffic scenario (see Figure 1).

Figure 1: A motivation example. Autonomous vehicle B is going straight, while human-driven vehicle A has three potential driving directions: going straight, turning right, or turning left. If vehicle A turns right, it will not affect the normal driving of autonomous vehicle B. But the other maneuvers, including turning left and going straight, will lead to a passing sequence problem. Besides, if they have a potential conflict, autonomous vehicle B simulates the trajectories of vehicle A over a prediction horizon and gives the best actions in the current scenario. The vehicles drawn with dashed lines are the future predicted positions. The red dashed lines are the virtual lane assumption used in this paper, which means that the vehicles are considered to be driven inside the lane. The dark blue area is the potential collision region for these two cars.

With these requirements, we propose an intention-aware decision-making algorithm for autonomous driving in an uncontrolled intersection in this paper. Specifically, we first use easily observed features (e.g., velocity and position) and a continuous hidden Markov model (HMM) [3] to build the intention prediction model, which outputs the lateral intentions (e.g., turn right, turn left, and go straight) for human-driven vehicles and the longitudinal behavior (e.g., the yielding status) for related vehicles. Then, a generative partially observable Markov decision process (POMDP) framework is built to model the autonomous driving decision-making process. This framework is able to deal with the uncertainties in the environment, including human-driven vehicles' driving intentions. However, it is intractable to compute the optimal policy for a general POMDP due to its complexity. We make reasonable approximations and assumptions to solve this problem at low computational cost. A human-like policy generation mechanism is used to compute the potential policy set. A scenario prediction mechanism is used to simulate the future actions of human-driven vehicles based on their lateral and longitudinal intentions, and proper reward functions are designed to evaluate each strategy. Traffic time, safety, and laws are all considered in the final reward equations. The proposed method has been evaluated in simulation.

The main contributions of this paper are as follows:
(i) Modeling a generative autonomous driving decision-making framework considering uncertainties (e.g., human driver's intention) in the environment.
(ii) Building an intention prediction model using easily observed parameters (e.g., velocity and position) for recognizing the realistic lateral and longitudinal behaviors of human-driven vehicles.
(iii) Using reasonable approximations and assumptions to build an efficient solver based on the specific layout in an uncontrolled intersection area.
The structure of this paper is as follows. Section 2 reviews the related work, and the two-layer HMM-based intention prediction algorithm is discussed in Section 3. Section 4 models the general autonomous driving decision-making process as a POMDP, while the approximations and the simplified solver are described in Section 5. In Section 6, we evaluate our algorithm in a simulated uncontrolled intersection scenario with PreScan software and a driving simulator. Finally, the conclusion and future work are discussed in Section 7.

Related Work
The decision-making module is one of the most important components of autonomous vehicles, connecting environment perception and vehicle control. Thus, numerous research works have addressed the autonomous driving decision-making problem in the last decade. The most common method is to manually define specific driving rules corresponding to situations. Both finite state machines (FSMs) and hierarchical state machines (HSMs) are used to evaluate situations and decide in their framework [4][5][6]. In the DARPA Urban Challenge (DUC), the winner Boss used a rule-based behavior generation mechanism to obey the predefined driving rules based on the obstacle vehicles' metrics [1,6]. Boss was able to check the vehicle's acceleration abilities and the available spaces to decide whether merging into a new lane or passing intersections is safe. Similarly, the decision-making system of "Junior" [7], ranking second in DUC, was based on an HSM with 13 manually defined states. Due to advantages including simple implementation and traceability, this framework is widely used in many autonomous driving platforms. However, these approaches always use constant velocity assumptions and do not consider surrounding vehicles' future reactions to the host vehicle's actions. Without this ability, the driving decisions could have potential risks [8].
In order to consider the evolution of the future scenario, planning and utility-based approaches have been proposed for decision-making. Bahram et al. proposed a prediction based reactive strategy to generate autonomous driving strategies [9]. A Bayesian classifier is used to predict the future motion of obstacle vehicles and a tree-based searching mechanism is designed to find the optimal driving strategy using multilevel cost functions. However, the surrounding vehicles' reactions to autonomous vehicles' actions are not considered in their framework. Wei et al. proposed a comprehensive approach for an autonomous driver model by emulating human driving behavior [10]. The human-driven vehicles are assumed to follow a proper social behavior model and the best velocity profiles are generated in autonomous freeway driving applications. Nonetheless, their method does not consider the motion intention of human-driven vehicles and only targets in-lane driving. In their subsequent work, Wei et al. modeled traffic interactions and realized autonomous vehicle social behavior at a highway entrance ramp [11]. The human-driven vehicles' motion intentions are modeled by a Bayesian model and the human-driven vehicles' future reactions are introduced, based on the yielding/not-yielding intentions at the first prediction step. Autonomous vehicles could perform socially cooperative behavior using their framework. However, they do not consider the intention uncertainty over the prediction time steps.
POMDPs provide a mathematical framework for solving decision-making problems with uncertainties. Bai et al. proposed an intention-aware approach for autonomous driving in scenarios with many pedestrians (e.g., on campus) [12]. In their framework, the hybrid A* algorithm is used to generate the global path, while a POMDP planner is used to control the velocity of the autonomous vehicle, which is solved by the online POMDP solver DESPOT [13]. Brechtel et al. presented a probabilistic decision-making algorithm using a continuous POMDP [14]. They focus on dealing with the uncertainties of incomplete and inaccurate perception in the intersection area, while our goal is to deal with the uncertain intentions of human-driven vehicles. However, online POMDP solvers need large computational resources and consume much time [15,16], which limits their use in real-world autonomous driving platforms. Ulbrich and Maurer designed a two-step decision-making algorithm to reduce the complexity of the POMDP in a lane change scenario [17]. Eight POMDP states are manually defined to simplify the problem in their framework. Cunningham et al. proposed a multipolicy decision-making method in lane changing and merging scenarios [18]. POMDPs are used to model the decision-making problem in their paper, while a multivehicle simulation mechanism is used to generate the optimal high-level policy for the autonomous vehicle to execute. However, the motion intentions are not considered.
Overall, autonomous driving decision-making with uncertain driving intentions is still a challenging problem. It is necessary to build an effective behavior prediction model for human-driven vehicles. Besides, it is essential to incorporate human-driven vehicles' intentions and behaviors into the autonomous vehicle decision-making system and generate suitable actions to ensure autonomous vehicles drive safely and efficiently. This work addresses this problem by first building an HMM-based intention prediction model, then modeling human-driven vehicles' intentions in a POMDP framework, and finally solving it with an approximate method.

HMM-Based Intention Prediction
In order to pass through an uncontrolled intersection, autonomous vehicles should have the ability to predict the driving intentions of human-driven vehicles. Estimating a driver's behavior is very difficult, because the state of a vehicle driver lies in a high-dimensional feature space. Instead of using driver related features (e.g., gas pedal, brake pedal, and drivers' vision), easily observed parameters are used to build the intention prediction model in this paper.
The vehicle motion intention $I$ considered in this paper is divided into two aspects, lateral intention $I_{\mathrm{lat}} \in \{I_{\mathrm{TR}}, I_{\mathrm{TL}}, I_{\mathrm{GS}}, I_{\mathrm{S}}\}$ (i.e., turn right, turn left, go straight, and stop) and longitudinal intention $I_{\mathrm{lon}} \in \{I_{\mathrm{Yield}}, I_{\mathrm{nYield}}\}$ (i.e., yield and not yield). The lateral intention is a high-level driving maneuver, which is determined by the human driver's long-term decision-making process. This intention does not change frequently during driving and determines the future trajectory of human-driven vehicles. In particular, the intention of stop is treated as a lateral intention in our model because it can be predicted using only data from the human-driven vehicle itself. However, the longitudinal intention is a cooperative behavior that occurs only when the vehicle interacts with other vehicles. We will first describe the HMM and then formulate our intention prediction model in this section.

HMM.
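A HMM consists of a set of $N$ finite "hidden" states and a set of $M$ observable symbols per state. The state transition probabilities are defined as $A = \{a_{ij}\}$, where
$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N. \tag{1}$$
The initial state distribution is denoted as $\pi = \{\pi_i\}$, where
$$\pi_i = P(q_1 = S_i), \quad 1 \le i \le N. \tag{2}$$
Because the observation symbols are continuous parameters, we use a Gaussian Mixture Model (GMM) [19] to represent their probability distribution functions (pdf):
$$b_j(o) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o; \mu_{jm}, \Sigma_{jm}), \quad 1 \le j \le N, \tag{3}$$
where $c_{jm}$ represents the mixture coefficient in the $j$th state for the $m$th mixture. $\mathcal{N}$ is the pdf of a Gaussian distribution with mean $\mu$ and covariance $\Sigma$ measured from observation $o$. Mixture coefficient $c$ satisfies the following constraints:
$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N, \tag{4}$$
where
$$c_{jm} \ge 0, \quad 1 \le j \le N, \ 1 \le m \le M. \tag{5}$$
Then a HMM could be completely defined by hidden states $S$ and the probability tuples $\lambda = (\pi, A, C, \mu, \Sigma)$.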
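As an illustration of the GMM emission density described above, the following is a minimal sketch for a single hidden state with a one-dimensional observation. The function names and the restriction to scalar observations are assumptions made for brevity; the paper's model uses multidimensional features.

```python
import math


def gaussian_pdf(o, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at o."""
    return math.exp(-0.5 * (o - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_emission(o, weights, means, variances):
    """Emission density of one hidden state as a weighted sum of Gaussians.

    Hypothetical helper: weights are the mixture coefficients of this state.
    """
    # Mixture coefficients must be non-negative and sum to one.
    if any(c < 0 for c in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must be >= 0 and sum to 1")
    return sum(
        c * gaussian_pdf(o, m, v) for c, m, v in zip(weights, means, variances)
    )
```

The explicit check on the coefficients mirrors the normalization constraint on the mixture weights, so an invalid parameter set fails early instead of silently producing a function that is not a density.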
In the training process, we use the Baum-Welch method [20] to estimate the model parameters for each driver intention $I$. Once the model parameters corresponding to the different driver intentions have been trained, we can estimate the driver's intention in the recognition process. The prediction process for lateral intentions can be seen in Figure 2.
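The recognition step can be sketched as follows: each trained model scores the observed sequence with the forward algorithm, and the intention whose model gives the largest likelihood is selected. This is a simplified sketch, not the paper's implementation: it assumes one scalar feature and a single Gaussian per state instead of a GMM, and `log_likelihood`, `classify`, and the model tuple layout are hypothetical names.

```python
import math


def gaussian_pdf(o, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at o."""
    return math.exp(-0.5 * (o - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(obs, pi, A, means, variances):
    """Scaled forward algorithm: log-probability of obs under one HMM."""
    n = len(pi)
    # Initialization with the initial state distribution.
    alpha = [pi[i] * gaussian_pdf(obs[0], means[i], variances[i]) for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through the transition matrix, then weight
            # by the emission density of the new observation.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n))
                * gaussian_pdf(obs[t], means[j], variances[j])
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid numerical underflow on long sequences.
        alpha = [value / scale for value in alpha]
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Return the label of the model with the largest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

Here `models` maps an intention label to a tuple `(pi, A, means, variances)`. Working with log-likelihoods keeps the comparison between models stable when the observation sequence is long.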

HMM-Based Intention Prediction Process.
Given a continuous HMM, the intention prediction process is divided into two steps. The first step focuses on the lateral intention.
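The training inputs of each vehicle's lateral intention model at time $t$ are defined as $B_{\mathrm{lateral}} = \{L, v, a, yaw\}$, where $L$ is the distance to the intersection, $v$ is the longitudinal velocity, $a$ is the longitudinal acceleration, and $yaw$ is the yaw rate, while the output of this model is the motion intention $I_{\mathrm{lat}} \in \{I_{\mathrm{TR}}, I_{\mathrm{TL}}, I_{\mathrm{GS}}, I_{\mathrm{S}}\}$. The corresponding HMMs can be trained, including $\lambda_{\mathrm{TR}}$, $\lambda_{\mathrm{TL}}$, $\lambda_{\mathrm{GS}}$, and $\lambda_{\mathrm{S}}$.
The next step concerns the longitudinal intention. This probability can be decomposed based on the total probability formula:
$$P(I_{\mathrm{Yield}} \mid B) = \sum_{I_{\mathrm{lat}}} P(I_{\mathrm{Yield}} \mid B, I_{\mathrm{lat}})\, P(I_{\mathrm{lat}} \mid B), \tag{6}$$
where $B$ is the behavior data including $B_{\mathrm{lateral}}$ and $B_{\mathrm{lon}}$.
In this process, we assume that the lateral behavior $I_{\mathrm{lat}}$ is predicted correctly by a deterministic HMM in the first step, and therefore $I_{\mathrm{lat}}$ is determined by the lateral prediction result $I_{\mathrm{latPredict}}$, where $P(I_{\mathrm{lat}} = I_{\mathrm{latPredict}} \mid B) = 1$ and $P(I_{\mathrm{lat}} \ne I_{\mathrm{latPredict}} \mid B) = 0$. Then (6) is reformulated as
$$P(I_{\mathrm{Yield}} \mid B) = P(I_{\mathrm{Yield}} \mid B, I_{\mathrm{latPredict}}). \tag{7}$$
The problem is changed to modeling $P(I_{\mathrm{Yield}} \mid B, I_{\mathrm{latPredict}})$. The features used in longitudinal intention prediction are $B_{\mathrm{lon}} = \{\Delta v, \Delta a, \Delta \mathrm{DTC}\}$, where $\Delta v = v_{\mathrm{social}} - v_{\mathrm{host}}$, $\Delta a = a_{\mathrm{social}} - a_{\mathrm{host}}$, and $\Delta \mathrm{DTC} = \mathrm{DTC}_{\mathrm{social}} - \mathrm{DTC}_{\mathrm{host}}$. DTC means the distance to the potential collision area. The output of the longitudinal intention prediction model is the longitudinal motion intention $I_{\mathrm{lon}} \in \{I_{\mathrm{Yield}}, I_{\mathrm{nYield}}\}$.
Instead of building a generative model, we use a deterministic approach to restrict $P(I_{\mathrm{Yield}} \mid B, I_{\mathrm{latPredict}})$ to 0 or 1. Thus, two types of HMMs named $\lambda_{\mathrm{Yield}, I_{\mathrm{lat}}}$ and $\lambda_{\mathrm{nYield}, I_{\mathrm{lat}}}$ are trained, where $I_{\mathrm{lat}} \in \{I_{\mathrm{TR}}, I_{\mathrm{TL}}, I_{\mathrm{GS}}, I_{\mathrm{S}}\}$. Two test examples for lateral and longitudinal intention prediction are shown in Figures 3 and 4. These two figures show that our approach can recognize the human-driven vehicle's lateral and longitudinal intentions successfully.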
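The longitudinal features are simple differences between the social (human-driven) vehicle and the host vehicle. A minimal sketch of their computation is given below; the `VehicleObs` container and the function name are hypothetical and only illustrate the definitions of the velocity, acceleration, and DTC differences given above.

```python
from dataclasses import dataclass


@dataclass
class VehicleObs:
    """Hypothetical container for the quantities used by the features."""
    v: float    # longitudinal velocity [m/s]
    a: float    # longitudinal acceleration [m/s^2]
    dtc: float  # distance to the potential collision area [m]


def longitudinal_features(social, host):
    """Return (delta_v, delta_a, delta_dtc), each computed as social minus host."""
    return (social.v - host.v, social.a - host.a, social.dtc - host.dtc)
```

For example, a social vehicle that is slower, braking, and closer to the collision area than the host yields three negative features.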

Modeling Autonomous Driving Decision-Making in a POMDP Framework
For the decision-making process, the key problem is how to design a policy that performs the optimal actions under uncertainty. This requires not only obeying traffic laws but also considering the driving uncertainties of human-driven vehicles. Facing potential conflicts, human-driven vehicles yield to autonomous vehicles only with some uncertain probability, and some aggressive drivers may violate the traffic laws. Such elements should be incorporated into a powerful decision-making framework. As a result, we model the autonomous driving decision-making problem in a general POMDP framework in this section.

POMDP Preliminaries.
A POMDP model can be formulated as a tuple $\{\mathcal{S}, \mathcal{A}, T, \mathcal{Z}, O, R, \gamma\}$, where $\mathcal{S}$ is a set of states, $\mathcal{A}$ is the action space, and $\mathcal{Z}$ denotes a set of observations. The conditional function $T(s', s, a) = \Pr(s' \mid s, a)$ models transition probabilities to state $s' \in \mathcal{S}$, when the system takes an action $a \in \mathcal{A}$ in the state $s \in \mathcal{S}$. The observation function $O(z, s', a) = \Pr(z \mid s', a)$ models the probability of observing $z \in \mathcal{Z}$, when an action $a \in \mathcal{A}$ is taken and the end state is $s' \in \mathcal{S}$. The reward function $R(s, a)$ calculates an immediate reward when taking an action $a$ in state $s$. $\gamma \in [0, 1]$ is the discount factor used to balance the immediate and the future rewards.
Because the system contains partially observed state such as intentions, a belief $b \in \mathcal{B}$ is maintained. A belief update function $\tau$ is defined as $b' = \tau(b, a, z)$. If the agent takes action $a$ and gets observation $z$, the new belief $b'$ is obtained through Bayes' rule:
$$b'(s') = \eta\, O(z, s', a) \sum_{s \in \mathcal{S}} T(s', s, a)\, b(s), \tag{8}$$
where $\eta = 1 / \sum_{s' \in \mathcal{S}} O(z, s', a) \sum_{s \in \mathcal{S}} T(s', s, a)\, b(s)$ is a normalizing constant.
A key concept in POMDP planning is a policy, a mapping $\pi$ that specifies the action $a = \pi(b)$ at belief $b$. To solve the POMDP, an optimal policy $\pi^*$ should be designed to maximize the total reward:
$$\pi^* = \arg\max_{\pi} \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(b_t)) \,\Big|\, b_0, \pi \right], \tag{9}$$
where $b_0$ denotes the initial belief.
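The Bayes belief update can be sketched for a finite state set as follows. This is a generic discrete Bayes filter, not code from the paper; the function name and the callable signatures `T(s_next, s, action)` and `O(observation, s_next, action)` are assumptions chosen to match the argument order used in the definitions above.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete Bayes filter over a dictionary mapping state to probability.

    T(s_next, s, action) is the transition probability and
    O(observation, s_next, action) is the observation probability.
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: sum over all predecessor states.
        predicted = sum(T(s_next, s, action) * belief[s] for s in states)
        # Correction step: weight by the observation probability.
        unnormalized[s_next] = O(observation, s_next, action) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Dividing by the total applies the normalizing constant.
    return {s: p / total for s, p in unnormalized.items()}
```

As a usage example, with two intention states that persist over time, a uniform prior, and an observation that is more likely under yielding (0.8 versus 0.3), the updated belief in yielding becomes 0.4 / 0.55, or about 0.73.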
Figure 2: Prediction process for HMM. The observed sequence is evaluated by four HMMs. The forward algorithm is used to calculate the conditional probabilities, and the intention corresponding to the largest value is taken as the vehicle's intention.

State Space.
Because of the Markov property, sufficient information should be contained in the state space $\mathcal{S}$ for the decision-making process [14]. The state space includes the vehicle pose $[x, y, \theta]$, velocity $v$, the average yaw rate $yaw_{\mathrm{ave}}$, and acceleration $a_{\mathrm{ave}}$ in the last planning period for all the vehicles. For the human-driven vehicles, the lateral and longitudinal intentions $[I_{\mathrm{lat}}, I_{\mathrm{lon}}]$ also need to be contained for state transition modeling. However, the road context knowledge is static reference information, so it is not added to the state space.
The joint state $s \in \mathcal{S}$ could be denoted as $s = [s_{\mathrm{host}}, s_1, s_2, \ldots, s_N]^T$, where $s_{\mathrm{host}}$ is the state of the host vehicle (autonomous vehicle), $s_i$, $i \in \{1, 2, 3, \ldots, N\}$, is the state of the $i$th human-driven vehicle, and $N$ is the number of human-driven vehicles involved. Let us define the metric state $\mathbf{x} = [x, y, \theta, v, a_{\mathrm{ave}}, yaw_{\mathrm{ave}}]^T$, including the vehicle position, heading, velocity, acceleration, and yaw rate. Thus, the state of the host vehicle can be defined as $s_{\mathrm{host}} = \mathbf{x}_{\mathrm{host}}$, while the human-driven vehicle state $s_i$ is $s_i = [\mathbf{x}_i, I_{\mathrm{lat},i}, I_{\mathrm{lon},i}]^T$. With the advanced perception system and V2V communication technology, we assume that the metric state $\mathbf{x}$ could be observed. Because the sensor noise is small and hardly affects the decision-making process, we do not model observation noise for the metric state. However, the intention state cannot be directly observed, so it is the partially observable variable in our paper. The intention state should be inferred from observation data and the predictive model over time.
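One way to organize the joint state in code is sketched below. The class and field names are hypothetical; they only mirror the description above, where the host state is a metric state and each human-driven vehicle state adds the two hidden intentions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LateralIntention(Enum):
    TURN_RIGHT = "TR"
    TURN_LEFT = "TL"
    GO_STRAIGHT = "GS"
    STOP = "S"


class LongitudinalIntention(Enum):
    YIELD = "Yield"
    NOT_YIELD = "nYield"


@dataclass
class MetricState:
    """Directly observable part of a vehicle state."""
    x: float        # position [m]
    y: float        # position [m]
    theta: float    # heading [rad]
    v: float        # velocity [m/s]
    a_ave: float    # average acceleration in the last planning period [m/s^2]
    yaw_ave: float  # average yaw rate in the last planning period [rad/s]


@dataclass
class HumanVehicleState:
    """Metric state plus the partially observable intentions."""
    metric: MetricState
    lat: LateralIntention
    lon: LongitudinalIntention


@dataclass
class JointState:
    host: MetricState
    others: List[HumanVehicleState] = field(default_factory=list)
```

Keeping the intentions in a separate wrapper makes the split between observed and hidden variables explicit, which is convenient when only the intention part is updated by the prediction model.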

Action Space.
In our autonomous vehicle system, the decision-making system is used to select suitable tactical maneuvers. Specifically, in the intersection area autonomous vehicles should follow a global reference path generated by the path planning module. The decision-making module only needs to generate acceleration/deceleration commands for the control layer. As the reference path may not be straight, the steering control module can adjust the front wheel angle to follow the reference path. Therefore, the action space $\mathcal{A}$ could be defined as a discrete set $\mathcal{A} = [\mathrm{acc}, \mathrm{dec}, \mathrm{con}]$, which contains the commands acceleration, deceleration, and maintaining the current velocity.
In the decision-making layer, we do not need to consider a complex vehicle dynamic model. Thus, the host vehicle's motion $\Pr(s'_{\mathrm{host}} \mid s_{\mathrm{host}}, a_{\mathrm{host}})$ can be simply represented by the following equations given action $a$:
$$x' = x + \left(v + \tfrac{1}{2} a \Delta t\right) \Delta t \cos(\theta + \Delta\theta),$$
$$y' = y + \left(v + \tfrac{1}{2} a \Delta t\right) \Delta t \sin(\theta + \Delta\theta),$$
$$\theta' = \theta + \Delta\theta,$$
$$v' = v + a \Delta t.$$
Thus, the key problem is converted to computing $\Pr(s'_i \mid s_i)$, the state transition probability of human-driven vehicles. Based on the total probability formula, this probability can be factorized as a sum over the whole action space:
$$\Pr(s'_i \mid s_i) = \sum_{a_i} \Pr(s'_i \mid s_i, a_i)\, \Pr(a_i \mid \mathbf{x}'_{\mathrm{host}}, \mathbf{x}_i, I_i),$$
where the probability of the action $a_i$ depends on the state $\mathbf{x}'_{\mathrm{host}}$ of the host vehicle, the current state of the human-driven vehicle itself, and its intentions. Instead of building a complex probability model, we designed a deterministic mechanism to calculate the most probable action $a_i$ given $\mathbf{x}'_{\mathrm{host}}$, $\mathbf{x}_i$, and $I_i$. In this prediction process, the host vehicle is assumed to maintain its current action in the next time step, and the action $a_i$ leads the human-driven vehicle to pass through the potential collision area either ahead of the host vehicle under the intention $I_{\mathrm{nYield}}$ or behind the host vehicle under the intention $I_{\mathrm{Yield}}$, keeping a safe distance $d_{\mathrm{safe}}$. In the case of the intention $I_{\mathrm{nYield}}$, we can calculate the lower boundary $a_{i,\mathrm{low}}$ of $a_i$ through the above process and determine the upper one using the largest comfort value $a_{i,\mathrm{comfort}}$. If $a_{i,\mathrm{comfort}} < a_{i,\mathrm{low}}$, $a_{i,\mathrm{low}}$ is used as the human-driven vehicle's action. If not, we consider the targeted $a_i$ to follow a normal distribution with mean value $\mu_{a_i}$ between $a_{i,\mathrm{low}}$ and $a_{i,\mathrm{comfort}}$. To simplify our model, we use the mean of these two boundaries to represent the human-driven vehicle's action $a_i$. Similarly, the case of the intention $I_{\mathrm{Yield}}$ can be analyzed by the same process.
After these steps, the transition probability $\Pr(s' \mid s, a)$ is fully formulated, and the autonomous vehicle is able to predict the future motion of the scenario through this model.
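The two deterministic ingredients of this transition model can be sketched as follows. The motion update assumes constant acceleration over one time step along the updated heading, which is one plausible reading of the simplified kinematics and should be treated as an assumption. The action rule follows the text: use the lower boundary when it exceeds the comfort value, otherwise use the mean of the two boundaries. Function and parameter names are hypothetical.

```python
import math


def step_host(x, y, theta, v, a, dtheta, dt):
    """One-step motion update under a constant acceleration a (assumed model).

    dtheta is the heading change imposed by the reference path during dt.
    """
    distance = (v + 0.5 * a * dt) * dt
    theta_new = theta + dtheta
    x_new = x + distance * math.cos(theta_new)
    y_new = y + distance * math.sin(theta_new)
    return x_new, y_new, theta_new, v + a * dt


def predicted_human_accel(a_low, a_comfort):
    """Deterministic action of a human-driven vehicle for one prediction step.

    a_low is the boundary needed to clear the conflict area as intended and
    a_comfort is the largest comfortable acceleration.
    """
    if a_comfort < a_low:
        return a_low
    # Otherwise represent the action by the mean of the two boundaries.
    return 0.5 * (a_low + a_comfort)
```

For instance, `step_host(0.0, 0.0, 0.0, 10.0, 2.0, 0.0, 0.5)` advances the vehicle 5.25 m along the x axis and raises its speed to 11 m/s.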

Observation Model.
The observation model is built to simulate the measurement process. The motion intention is updated in this process. The measurements of human-driven vehicles are modeled with a conditional independence assumption. Thus, the observation model can be calculated as
$$\Pr(z \mid a, s') = \Pr(z_{\mathrm{host}} \mid s'_{\mathrm{host}}) \prod_{i=1}^{N} \Pr(z_i \mid s'_i).$$
The host vehicle's observation function is denoted as
$$\Pr(z_{\mathrm{host}} \mid s'_{\mathrm{host}}) \sim \mathcal{N}(z_{\mathrm{host}} \mid \mathbf{x}'_{\mathrm{host}}, \Sigma_{z_{\mathrm{host}}}).$$
But in this paper, due to the use of the V2V communication sensor, the observation error hardly affects the decision-making result. The variance matrix is therefore set to zero.
The human-driven vehicle's observation follows the vehicle's motion intentions. Because we do not consider the observation error, the values in the metric state are the same as the state transition results. But the longitudinal intention of human-driven vehicles in the state space is updated using the new observations and the HMM described in Section 3. The new observation is determined by the above step.
The reward function considers driving safety, traffic laws, and time efficiency. The detailed information will be discussed in the following subsections. In addition, the factor of comfort will be considered and discussed in the policy generation part (Section 5.1).

Safety Reward.
The safety reward function $R_{\mathrm{safety}}(s, a)$ is based on the potential conflict status. In our strategy, the safety reward is defined as a penalty. If there are no potential conflicts, the safety reward is set to 0. A large penalty is assigned when there is a risk of collision.
In an uncontrolled intersection, the four approaching directions are defined as $D_i \in \{1, 2, 3, 4\}$ (Figure 5). The driver's lateral intentions are defined as $I_{\mathrm{lat}} \in \{I_{\mathrm{TR}}, I_{\mathrm{TL}}, I_{\mathrm{GS}}, I_{\mathrm{S}}\}$. So the driving trajectory for each vehicle in the intersection can be generally represented by $D_i$ and $I_{\mathrm{lat},j}$, and we denote it as $T_{D_i, I_{\mathrm{lat},j}}$, $1 \le i \le 4$, $1 \le j \le 4$. The function $F$ is used to judge the potential collision status, which is denoted as $F(T_x, T_y)$, where $T_x$ and $T_y$ are the vehicles' maneuvers $T_{D_i, I_{\mathrm{lat},j}}$. $F(T_x, T_y)$ can be calculated through the relative direction between the two cars, which is shown in Table 1.
The safety reward is based on the potential collision status of the vehicles' maneuvers. For the traffic law reward, a function $\mathrm{Law}(T_x, T_y)$ is used, where $T_x$ and $T_y$ are the vehicles' maneuvers $T_{D_i, I_{\mathrm{lat},j}}$. This function $\mathrm{Law}(T_x, T_y)$ is formulated as shown in Algorithm 1.
If the behavior will break the law, a large penalty is applied and the behavior of obeying traffic laws will get a zero reward.

Time Reward.
The time cost is based on the time to the destination for the targeted vehicles in the intersection area and is computed from DTG, the distance to the driving goal. In addition, we also need to consider the speed limit, which is discussed in the policy generation part in Section 5.
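A time cost of this kind can be sketched as the estimated travel time to the goal at the current speed. The exact form below (distance divided by speed, returned as a negative reward, with a hypothetical minimum speed `v_min` to avoid division by zero) is an assumption for illustration and is not taken from the paper.

```python
def time_reward(dtg, v, v_min=0.5):
    """Negative estimated time to the driving goal (assumed form).

    dtg is the distance to the driving goal [m] and v the current speed [m/s].
    """
    # Clamp the speed so that a stopped vehicle gets a large finite penalty.
    return -dtg / max(v, v_min)
```

Under this assumed form, a vehicle 30 m from its goal at 10 m/s receives a reward of -3, and slower candidate policies are penalized more strongly.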

Approximations on Solving POMDP Problem
Solving a POMDP is quite difficult. The complexity of searching the total belief space is $O(|\mathcal{A}|^H |\mathcal{Z}|^H)$ [12], where $H$ is the prediction horizon. In this paper, we model the intention recognition process as a deterministic model and use communication sensors so that the perception error can be ignored, and thus the size of $|\mathcal{Z}|$ is reduced to 1 in the simplified problem. To solve this problem, we first generate suitable potential policies according to the properties of the driving task and then select a reasonable prediction time step and total horizon. After that, the approximate optimal policy can be found by searching all candidate policies for the one with maximum total reward. The policy selection process is shown in Algorithm 2 and some detailed explanations are given in the subsections.
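To give a feeling for the size of this search space, the short sketch below evaluates the complexity bound for the three-action set used in this paper and the 8 s horizon with 0.5 s steps reported in the experiment settings. The function name is hypothetical, and the second observation count in the comparison is an arbitrary illustrative value.

```python
def belief_tree_size(num_actions, num_observations, horizon_steps):
    """Number of action-observation sequences up to the given horizon."""
    return (num_actions ** horizon_steps) * (num_observations ** horizon_steps)


# 8 s horizon discretized into 0.5 s steps gives 16 prediction steps.
HORIZON_STEPS = int(8.0 / 0.5)

# Deterministic observations: a single observation per step.
simplified = belief_tree_size(3, 1, HORIZON_STEPS)

# Illustrative comparison with two possible observations per step.
with_two_observations = belief_tree_size(3, 2, HORIZON_STEPS)
```

With a single observation per step the bound is 3^16 = 43,046,721 sequences, and every additional observation symbol multiplies it by a further exponential factor, which motivates both the deterministic observation assumption and the restricted policy set.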

Policy Generation.
For autonomous driving near an intersection, the desired velocity curves need to satisfy several constraints. Firstly, except for emergency braking, acceleration constraints are applied to ensure comfort. Secondly, speed limit constraints should be used in this process. We aim to avoid acceleration commands when the autonomous vehicle is reaching the maximum speed limit. Thirdly, for comfort, the acceleration command should not change frequently. In other words, we need to minimize the jerk. Similar to [11], the candidate policies are divided into three time segments. The first two segments keep constant acceleration/deceleration actions, while the third segment keeps constant velocity. We use $t_1$, $t_2$, and $t_3$ to represent the time periods of these three segments. To guarantee comfort, the acceleration is limited to the range from −4 m/s² to 2 m/s², and we discretize the acceleration action into multiples of [−0.5, 0.5, 0] m/s². Then, the action space can be represented by a discretized acceleration set. Then, we can set the values of $t_1$, $t_2$, and $t_3$ and the prediction period of a single step. An example of policy generation is shown in Figure 6.
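The three-segment candidate generation can be sketched as follows. The acceleration range and the 0.5 m/s² resolution come from the text; the segment durations, the time step, the speed limit `v_max`, and all function names are hypothetical values chosen for illustration.

```python
def accel_levels(a_min=-4.0, a_max=2.0, step=0.5):
    """Discretized comfortable accelerations in m/s^2."""
    count = int(round((a_max - a_min) / step))
    return [a_min + k * step for k in range(count + 1)]


def velocity_profile(v0, a1, a2, t1, t2, t3, dt=0.5, v_max=15.0):
    """Speeds sampled every dt: a1 during t1, a2 during t2, then constant speed."""
    profile = [v0]
    v = v0
    for accel, duration in ((a1, t1), (a2, t2), (0.0, t3)):
        for _ in range(int(round(duration / dt))):
            # Keep the speed between standstill and the speed limit.
            v = min(max(v + accel * dt, 0.0), v_max)
            profile.append(v)
    return profile


def candidate_policies(v0, t1, t2, t3, dt=0.5, v_max=15.0):
    """All combinations of the accelerations of the first two segments."""
    levels = accel_levels()
    return {
        (a1, a2): velocity_profile(v0, a1, a2, t1, t2, t3, dt, v_max)
        for a1 in levels
        for a2 in levels
    }
```

With 13 acceleration levels, this yields 169 candidate velocity profiles per planning cycle, far fewer than the full action tree, and each profile changes acceleration at most twice, which keeps the jerk low.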

Planning Horizon Selection.
After building the policy generation model, the next problem is to select a suitable planning horizon. A longer horizon can lead to a better solution but consumes more computing resources. However, as our purpose is to deal with the interaction problem in the uncontrolled intersection, we only need to consider the situation before the autonomous vehicle gets through. In our algorithm, we set the prediction horizon to 8 seconds. In addition, when updating the future state of each vehicle under each policy, the car-following mode is used after the autonomous vehicle passes through the intersection area.
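The overall selection step, rolling every candidate policy forward over the horizon and keeping the one with the largest total reward, can be sketched generically as below. This is not a reproduction of the paper's Algorithm 2; the function signature and the `transition` and `reward` callables are hypothetical placeholders for the scenario prediction model and the reward function.

```python
def select_policy(policies, initial_state, transition, reward, gamma=1.0):
    """Evaluate each candidate action sequence and return the best one.

    transition(state, action) returns the predicted next state and
    reward(state, action) returns the immediate reward.
    """
    best_policy, best_value = None, float("-inf")
    for policy in policies:
        state, value, discount = initial_state, 0.0, 1.0
        for action in policy:
            value += discount * reward(state, action)
            state = transition(state, action)
            discount *= gamma
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value
```

Because the observation model is deterministic in the simplified problem, each candidate needs only one forward simulation, so the cost grows linearly with the number of candidates and the number of prediction steps.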

Experiment and Results
Settings.
In this paper, we evaluate our approach with PreScan 7.1.0 [21], a simulation tool for autonomous driving and connected vehicles. Using this software, we can build the testing scenarios (Figure 7) and add vehicles with dynamic models. In order to obtain a realistic scenario with social interaction, a driving simulator is added in our experiment (Figure 8). The human-driven vehicle is driven by several people during the experiment and the autonomous vehicle makes decisions based on the human-driven vehicle's driving behavior. The reference trajectory for the autonomous vehicle is generated by the path planning module and the human-driven vehicle's data (e.g., position, velocity, and heading) are transferred through the V2V communication sensor. The decision-making module sends the desired velocity command to the PID controller to follow the reference path. All policies in the experiment use a planning horizon $H = 8$ s, which is discretized into time steps of 0.5 s.

Results.
It is difficult to compare different approaches in the same scenario because the environment is dynamic and not exactly the same. However, we select two typical situations and special settings to make this possible. The same initial conditions, including position, orientation, and velocity for each vehicle, are used in the different tests. Besides, two typical situations, in which the human-driven vehicle gets through before or after the autonomous vehicle, are compared in this section. With the same initial state, different reactions will occur depending on the method. We compare our approach with a reactive-based method [6] in this section. The key difference between these two methods is that our approach considers the human-driven vehicle's driving intention.
In the first experiment, the human-driven vehicle tries to yield to the autonomous vehicle in the interaction process. The results are shown in Figures 9 and 10. Firstly, Figure 9 gives a visual comparison of the different approaches. From almost the same initial state (e.g., position and velocity), our approach leads the autonomous vehicle to pass through the intersection more quickly and reasonably.
Then, let us look at Figure 10 for a detailed explanation. In the first 1.2 s in Figures 10(a) and 10(c), the autonomous vehicle maintains its speed and understands that the human-driven vehicle will not perform yielding actions. Then, the autonomous vehicle recognizes the yielding intention of the human-driven vehicle and understands that the human-driven vehicle's lateral intention is to go straight. Based on the candidate policies, the autonomous vehicle selects the acceleration strategy with maximum reward and finally crosses the intersection. In this process, we can clearly see that the autonomous vehicle understands the human-driven vehicle's yielding intention. Figure 10(c) is an example of understanding the human-driven vehicle's behavior based on the ego vehicle's future actions at a specific time. Our strategy predicts the future actions of the human-driven vehicle. Although the velocity curves after 1 s do not correspond, this does not affect the performance of our method. The reason is that we use a deterministic model in the prediction process and the predicted value is inside two boundaries to ensure safety. Besides, the overall actions of the autonomous vehicle in this process could also help the human-driven vehicle to understand the not-yielding intention of the autonomous vehicle. In this case, cooperative driving behaviors are performed by both vehicles. However, if the intention is not considered in this process, we obtain the results in Figures 10(b), 10(d), and 10(f). After 2 s in Figure 10(b), while the human-driven vehicle shows a yielding intention, the autonomous vehicle cannot understand it and finds a potential collision based on the constant velocity assumption. Then, it decreases its speed, but the human-driven vehicle also slows down. This confused behavior leads both vehicles to slow down near the intersection. Finally, the human-driven vehicle stops at the stop line and then the autonomous vehicle can pass the intersection. In this strategy, the human-driven vehicle's future motion is assumed to be constant (Figure 10(f)). Without understanding the human-driven vehicle's intentions, this strategy can increase congestion.
In the second experiment, the human-driven vehicle tries to get through the intersection first. The results are shown in Figures 11 and 12. This case is quite typical because many real-world traffic accidents happen in this situation: if one vehicle tries to cross an intersection in violation of the law, another vehicle is in great danger if it does not understand the first vehicle's behavior. The visualization in Figure 11 shows that our method is somewhat safer than the other approach, which comes close to a collision in Figure 11(b). In detail, Figure 12(a) shows that our strategy decelerates once it recognizes the not-yielding intention at 0.8 s. Without understanding the human-driven vehicle's motion intention, however, the response is delayed by 1 s, which may be quite dangerous.
In addition, our method predicts the human-driven vehicle's future motion well (Figure 12(e)).
The results of these two cases demonstrate that our algorithm can handle typical scenarios and performs better than a traditional reactive controller. With our strategy, the autonomous vehicle drives more safely, quickly, and comfortably.

Conclusion and Future Work
In this paper, we proposed an autonomous driving decision-making algorithm that considers human-driven vehicles' uncertain intentions in an uncontrolled intersection. The lateral and longitudinal intentions are recognized by a continuous HMM. Based on the HMM and a POMDP, we model the general decision-making process and then use an approximate approach to solve this complex problem. Finally, we use PreScan software and a driving simulator to emulate the social interaction process. The experimental results show that autonomous vehicles using our approach pass through uncontrolled intersections more safely and efficiently than those using a strategy that does not consider human-driven vehicles' driving intentions.
In the near future, we aim to implement our approach on a real autonomous vehicle and perform real-world experiments. In addition, we aim to develop a more precise intention recognition algorithm; methods such as probabilistic graphical models can be used to obtain a distribution over each intention. Finally, designing online POMDP planning algorithms is also valuable.

Figure 3: Lateral intention prediction example. The true intention of the human-driven vehicle is to turn left in this scenario. In the first subfigure, a value of 1 on the $y$-axis means turn left, 2 means turn right, 3 means go straight, and 4 means stop.

Figure 4: One example of predicting longitudinal intentions. This example is based on the scenario of Figure 1, in which both vehicles go straight. A value of 1 on the $y$-axis in the first subfigure denotes the intention of yielding, while 2 denotes not yielding. In the first 2.8 s, the intention is yielding. After that, because of the acceleration action and the smaller relative DTC, the autonomous vehicle recognizes the human-driven vehicle's not-yielding intention.

4.7. Reward Function. The candidate policies have to satisfy several evaluation criteria. Autonomous vehicles should be driven safely and comfortably. At the same time, they should follow the traffic rules and reach the destination as soon as possible. As a result, we design the objective function (17) considering three aspects: safety, time efficiency, and traffic laws, where $\mu_1$, $\mu_2$, and $\mu_3$ are the weight coefficients:

$$R(s, a) = \mu_1 R_{\text{safety}}(s, a) + \mu_2 R_{\text{time}}(s, a) + \mu_3 R_{\text{law}}(s, a). \tag{17}$$
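As a minimal illustration of the weighted sum in (17), the sketch below combines three component rewards and picks the candidate policy with the highest total. The names `RewardWeights`, `total_reward`, and `best_policy`, as well as the numeric weights, are assumptions made for illustration only; the component rewards are passed in as plain numbers because their exact forms are not fixed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weight coefficients for the three reward terms."""
    safety: float = 1.0
    time: float = 0.5
    law: float = 0.5


def total_reward(r_safety, r_time, r_law, weights=RewardWeights()):
    """Weighted sum of the safety, time-efficiency, and traffic-law rewards."""
    return (weights.safety * r_safety
            + weights.time * r_time
            + weights.law * r_law)


def best_policy(candidates, weights=RewardWeights()):
    """Return the name of the candidate with the largest total reward.

    candidates maps a policy name to a (r_safety, r_time, r_law) tuple.
    """
    return max(candidates,
               key=lambda name: total_reward(*candidates[name], weights))
```

With these illustrative weights, a policy that is fast but incurs a large safety penalty loses to a slower, safe one, which is the trade-off the weight coefficients are meant to control.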

Figure 5: One typical scenario for calculating safety reward.

Table 1: Safe condition judgments in the intersection. ‰ indicates potential collision. I indicates no potential collision.

Figure 6: An example of the policy generation process. (a) shows the generated policies and (b) the corresponding speed profiles. The interval of each prediction step is 0.5 s, the current speed is 12 m/s, and the speed limit is 20 m/s. The bold black line is one policy: in the first 3 seconds, the autonomous vehicle decelerates at 3.5 m/s², then accelerates at 2 m/s² for 4 seconds, and finally stops in the last second. In this case, 109 policies were generated, which is few enough for fast replanning.
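To make the bold policy in this caption concrete, the following sketch integrates a piecewise-constant acceleration policy using the 0.5 s prediction step. It reads the caption's speeds as 12 m/s (initial) and 20 m/s (limit), clips the speed to the range from 0 to the limit, and models only the first two segments of the policy. The function name `speed_profile` and the segment representation are illustrative assumptions, not the paper's implementation.

```python
def speed_profile(v0, segments, dt=0.5, v_max=20.0):
    """Integrate piecewise-constant accelerations into a list of speeds.

    v0: initial speed in m/s.
    segments: list of (duration_s, acceleration_mps2) pairs.
    Returns the speed at every prediction step, starting with v0.
    """
    speeds = [v0]
    for duration, accel in segments:
        steps = round(duration / dt)
        for _ in range(steps):
            v = speeds[-1] + accel * dt
            # Keep the speed non-negative and below the speed limit.
            speeds.append(min(max(v, 0.0), v_max))
    return speeds


# Decelerate at 3.5 m/s^2 for 3 s, then accelerate at 2 m/s^2 for 4 s.
profile = speed_profile(12.0, [(3.0, -3.5), (4.0, 2.0)])
```

Under these assumptions the speed falls from 12 m/s to 1.5 m/s after the first 3 s and recovers to 9.5 m/s after the following 4 s. Enumerating a small set of such acceleration sequences is one simple way to obtain a candidate set that remains small enough for fast replanning.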

Figure 10: Case test 1. In this case, the human-driven vehicle passes through the intersection after the autonomous vehicle. (a), (c), and (e) show the performance of our method, while (b), (d), and (f) are from the strategy that does not consider the driving intention. (a) and (b) are the velocity profiles and the corresponding driving intentions. For the longitudinal intention, label 1 means yielding and label 2 means not yielding. For the lateral intention, 1 means turning left, 2 means turning right, 3 means going straight, and 4 means stop. The intentions in (b) are not used by that method and are shown only for detailed analysis. (c) and (d) are the distances to the collision area for the autonomous vehicle and the human-driven vehicle. (e) and (f) are the predicted and true motions of the human-driven vehicle at time 1.5 s with a prediction length of 8 s. The red curves in these subfigures are from the autonomous vehicle, while the blue curves are from the human-driven vehicle. The green curves in (e) and (f) are the predicted velocity curves of the human-driven vehicle.

Figure 12: Case test 2. In this case, the human-driven vehicle passes through the intersection before the autonomous vehicle under the different strategies. The definition of each subfigure is the same as in Figure 10.
4.4. Observation Space. Similar to the joint state space, the observation $z$ is denoted as $z = [z_{\text{host}}, z_1, z_2, \ldots, z_N]^T$, where $z_{\text{host}}$ and $z_i$ are the host vehicle's and the human-driven vehicle's observations, respectively. The acceleration and yaw rate can be approximately calculated from the speed and heading in consecutive states.

4.5. State Transition Model. In the state transition process, we need to model the transition probability $\Pr(s' \mid s, a)$. This probability is determined by each targeted element in the scenario. So the transition model can be calculated by the following probabilistic equation:

$$\Pr(s' \mid s, a) = \Pr(s'_{\text{host}} \mid s_{\text{host}}, a_{\text{host}}) \prod_{i=1}^{N} \Pr(s'_i \mid s_i), \qquad \Pr(s'_i \mid s_i) = \sum_{a_i} \Pr(s'_i \mid s_i, a_i)\, \Pr(a_i \mid s_i).$$

With this equation, we only need to calculate the state transition probability $\Pr(s'_i \mid s_i, a_i)$ given a specific action $a_i$ and the probability $\Pr(a_i \mid s_i)$ of selecting this action under the current state $s_i$. Because the human-driven vehicle's state is $s_i = [x_i, I_i]$, the probability $\Pr(s'_i \mid s_i, a_i)$ can be calculated as

$$\Pr(s'_i \mid s_i, a_i) = \Pr(x'_i, I'_i \mid x_i, I_i, a_i) = \Pr(x'_i \mid x_i, I_i, a_i)\, \Pr(I'_i \mid x'_i, x_i, I_i, a_i).$$

In this factorization, $\Pr(x'_i \mid x_i, I_i, a_i)$ is equal to $\Pr(x'_i \mid x_i, I_{\text{lat},i}, a_i)$. The lateral behavior $I_{\text{lat},i}$ is considered to be a goal-directed driving intention that does not change during the driving process. So $\Pr(x'_i \mid x_i, I_{\text{lat},i}, a_i)$ is equal to $\Pr(x'_i \mid x_i, a_i)$ given a reference path corresponding to the intention $I_{\text{lat},i}$. Using (11), $\Pr(x'_i \mid x_i, a_i)$ can be solved.

The remaining problem in calculating $\Pr(s'_i \mid s_i, a_i)$ is to deal with $\Pr(I'_i \mid x'_i, x_i, I_i, a_i)$. The lateral intention $I'_{\text{lat},i}$ is assumed to be stable, as explained above, and the longitudinal intention $I'_{\text{lon},i}$ is assumed not to be updated in this process; it is updated only with new inputs in the observation space.

Now $\Pr(s'_i \mid s_i, a_i)$ is well modeled, and the remaining problem is to compute the probabilities $\Pr(a_i \mid s_i)$ of the human-driven vehicles' future actions:

$$\Pr(a_i \mid s_i) = \Pr(a_i \mid x_i, I_i) = \sum_{x'_{\text{host}}} \Pr(a_i \mid x'_{\text{host}}, x_i, I_i)\, \Pr(x'_{\text{host}} \mid x_i, I_i). \tag{14}$$

Because $x'_{\text{host}}$ is determined by the designed policy, $\Pr(x'_{\text{host}} \mid x_i, I_i)$ can be calculated by (11) given an action $a_{\text{host}}$. The probability $\Pr(a_i \mid x'_{\text{host}}, x_i, I_i)$ is the distribution of the human-driven vehicle's actions given the new state …

(i) … then the safety reward is equal to 0 due to the noncollision status.
(ii) If a potential collision occurs, there is a large penalty.
(iii) If $|\text{TTC}_i - \text{TTC}_{\text{host}}|$ is below the threshold, there is a penalty depending on $|\text{TTC}_i - \text{TTC}_{\text{host}}|$ and $\text{TTC}_{\text{host}}$.
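The safety-reward rules above can be sketched as a small function. This is only an illustration under stated assumptions: the text says that the penalty in the third rule depends on the TTC difference and on the host vehicle's TTC but does not give its form, so the expression used here (larger when the difference is small and when the host's TTC is short) is hypothetical, as are the function name `safety_reward`, the threshold of 2 s, and the penalty magnitudes.

```python
def safety_reward(ttc_host, ttc_other, potential_collision,
                  threshold=2.0, collision_penalty=-1000.0, scale=-10.0):
    """Illustrative safety reward following the three rules in the text.

    ttc_host, ttc_other: TTC values (s) of the host and the other vehicle.
    potential_collision: True if the policy leads to a potential collision.
    threshold, collision_penalty, scale: hypothetical tuning constants.
    """
    if potential_collision:
        # Potential collision: large fixed penalty.
        return collision_penalty
    gap = abs(ttc_other - ttc_host)
    if gap < threshold:
        # Assumed form: penalty grows as the TTC gap shrinks and as the
        # host vehicle gets closer in time to the collision area.
        return scale * (threshold - gap) / max(ttc_host, 0.1)
    # Noncollision status: no penalty.
    return 0.0
```

For example, with these constants a well-separated pair such as `safety_reward(3.0, 8.0, False)` receives no penalty, while a TTC gap of 1 s at a host TTC of 2 s receives a moderate penalty that is still far smaller than the collision penalty.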