A Multiobjective Cooperative Driving Framework Based on Evolutionary Algorithm and Multitask Learning

Jiangsu Key Laboratory of Urban ITS, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies and Jiangsu Province Collaborative Innovation Center for Technology and Application of Internet of Things, School of Transportation, Southeast University, Nanjing 210096, China; School of Engineering, Tibet University, Lhasa, Tibet 850000, China; Department of Civil & Environmental Engineering, University of Wisconsin-Madison, 1221 Engineering Hall, 1415 Engineering Drive, Madison, WI 53706, USA

methods to solve the problem for different scenarios. Grand cooperative driving challenges were also organized to promote its development in practice [16].
Some of the existing studies use optimization-based methods. Yan et al. [17] proposed a dynamic programming algorithm to evacuate vehicles at the intersection as soon as possible. Zhu and Ukkusuri [18] put forward a linear programming model to dispatch vehicles at autonomous intersections in order to minimize total travel time. Mixed-integer linear programming (MILP) is also widely used to obtain solutions [19][20][21]. In contrast, Li and Wang [13] proposed a framework based on the optimization principle that uses a tree search algorithm to achieve the same purpose. All of the listed studies focus on searching for optimal solutions under different prior hypotheses.
Relevant studies pointed out that the key to solving the problem is determining the right-of-way for CAVs approaching the merging area [22][23][24]. In other words, the vehicles can be formulated as a passing sequence in the form of arrays, and the performance of the schedule strategy hinges on the way to generate the best passing order among a large number of possible solutions.
In terms of generating passing orders, the existing studies can be classified into two categories. One is the rule-based strategy, which uses heuristic rules to determine the passing order of vehicles. Dresner and Stone [25,26] proposed a reservation-based system and assigned right-of-way to vehicles on a first-come-first-served (FCFS) basis. Although the effectiveness of the FCFS method has been demonstrated [26,27], its rule-based nature leads to feasible but generally suboptimal solutions. Moreover, the reservation-based strategy cannot outperform traditional signal control in some cases [28,29]. Because rule-based strategies do not always perform well, a second approach to generating passing orders was introduced, called the "planning-based strategy" [13]. Meng et al. demonstrated through comprehensive simulations that the planning-based strategy consistently outperforms the FCFS method in intersection scenarios [30]. The planning-based strategy is a framework that searches for optimal solutions in a huge solution space. It is essentially a traversal problem with intolerable computational complexity. Therefore, subsequent studies focused on reducing computing time. Xu et al. [31] proposed a grouping-based strategy, which groups CAVs to reduce the number of possible solutions. In another study by the same authors, a Monte Carlo tree is built to balance coordination performance and computation time [32]. Meanwhile, Zhang et al. [33] reported a framework that uses a neural network as a surrogate for the simulation test process in order to reduce computation time. However, the only optimization targets these studies considered are traffic efficiency indexes such as passing time or total delay, while the values of other targets such as energy consumption or queue length are difficult to acquire. This is caused by the weakness of the trajectory interpretation algorithm in their studies.
Therefore, a real-time, multi-objective cooperative driving strategy that is both maneuverable and reliable is still lacking. To this end, we design a multi-objective discrete evolutionary algorithm (MODEA) to search for (near) optimal passing orders, which combines the non-dominated sorting method [34] and the state transition algorithm [35]. A multitask learning model is proposed as a regressor that feeds objective values back to MODEA. The scenarios are simulated with Simulation of Urban MObility (SUMO) [36]. The simulation results indicate that the framework can be applied to different scenarios and performs well even in a high-concurrency environment. The rest of the study is arranged as follows: Section "Problem Statement" gives the general form of cooperative driving problems and the traffic scenarios studied in this paper. Section "Methodology" presents the proposed framework, including the MODEA and the multitask learning method in detail. Section "Simulation and Analysis" provides the simulation results of a series of experiments. Finally, conclusions are given in Section "Conclusion."

Problem Statement
Highway on-ramps and urban unsignalized intersections are two typical scenarios for cooperative driving (see Figure 1). Rios-Torres and Malikopoulos [37] pointed out that the two mainstream frameworks in the cooperative driving field are centralized coordination and decentralized coordination; the method proposed in this study belongs to the former. Centralized frameworks rely on a central controller responsible for computing and sending control commands. The controller has a communication range (CR) that defines the boundary of communication and control.
This article models the CR as a circle, which is widely adopted in previous studies [30,33,38]. Only vehicles within the CR communicate with the controller and are controlled.
The following assumptions are made to simplify the analysis and implementation: (i) Lane-changing behaviors are prohibited in CR for safety reasons. (ii) The system has no interference from pedestrians and non-motor vehicles. (iii) All CAVs can transmit id, position, speed, and other precise information to the controller spontaneously. (iv) The vehicles are homogeneous pure electric CAVs for estimating energy consumption. The energy model can be found in [39].

The general form of the objective function in cooperative driving can be defined as follows: where $F$ is the function that represents queue length, energy consumption, or traffic delay, and $x$ is the independent variable that determines the value of the optimization target. In this study, two objectives are considered: (a) minimizing the time consumption to evacuate all CAVs in CR and (b) minimizing the electricity consumption of the CAVs under a scheduling scheme. The input $x$ of the function denotes a passing sequence, which can be denoted as follows: where $n$ is the number of vehicles in CR. Let $f_1$ be the time consumption to evacuate all CAVs in CR and $f_2$ be the corresponding electricity consumption; then (1) can be transformed as follows: Here, $f_1(x)$ can be denoted as follows: where $t_e$ represents the time when the vehicle $\mathrm{CAV}_n$ exits from CR. $f_2(x)$ can be denoted as follows: where $e$ represents the energy consumption of $\mathrm{CAV}_i$ in discrete time; readers can refer to [39] for the stepwise energy consumption model.

Methodology

Figure 2 illustrates the procedure of the framework proposed in this study. The framework uses MODEA with non-dominated sorting and a multitask neural network to reduce computation time and implement multi-objective optimization. A population-based evolutionary algorithm searches the solution space, while the fitness value of every individual is obtained from a neural network, which plays the role of target regressor. The framework is introduced in detail below.
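The objective functions described in the Problem Statement can be written out as below. This is a sketch reconstructed from the surrounding definitions, not the source's own typesetting: the element names $x_k$, the argument form $t_e(\cdot)$, the time index $t$, and the horizon $T$ are assumptions, and the numbering simply follows the equation references used in the text.

```latex
\begin{align}
  &\min_{x} \; F(x)                                   \tag{1} \\
  &x = [x_1, x_2, \ldots, x_n]                        \tag{2} \\
  &\min_{x} \; F(x) = \bigl[f_1(x),\; f_2(x)\bigr]    \tag{3} \\
  &f_1(x) = t_e(\mathrm{CAV}_n)                       \tag{4} \\
  &f_2(x) = \sum_{i=1}^{n} \sum_{t=0}^{T} e_i(t)      \tag{5}
\end{align}
```

Under this reading, $f_1$ is the exit time of the last vehicle in the passing sequence and $f_2$ sums the stepwise energy consumption of every CAV over the scheduling horizon.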

Multitask Learning Model.
Learning several tasks jointly has been found to improve performance compared with learning them individually [40]. Thus, in this study, a multitask deep learning model is trained to provide the objective values that the evolutionary algorithm needs as feedback. The task of the model is therefore to learn the target values in each traffic state. Here, we consider the time consumption and electricity consumption as the targets defined in equations (4) and (5). To perform the regression task, the input should be appropriately expressed. As in equation (2), a passing sequence can be denoted as an array of CAV ids. We define the encoding of a single CAV as follows: where $p_i$ is the position of $\mathrm{CAV}_i$ from the beginning of the lane, $v_i$ is the speed, and $a_i$ represents the acceleration of $\mathrm{CAV}_i$; $p_i$, $v_i$, and $a_i$ are normalized before being input into the model. In addition, $\mathrm{encode}(lane_i)$ is the encoding of the lane that the vehicle is driving on. The encoding method depends on the traffic scenario. For the on-ramp scenario shown in Figure 1(a), one-hot encoding is applied. In the intersection scenario, however, the spatial relationship is taken into account by combining the approach direction with the driving direction. Figure 3 shows the encoding process, taking the scenarios in Figure 1 as an example. For instance, vehicle D is coming from the west approach and will turn left at the intersection, so the encoding of its lane is (1, 0, 0, 0, 0, 1). Finally, a passing sequence can be formulated as the concatenation of the encodings of its CAVs.
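The encoding scheme described above can be sketched in a few lines. The function names, the normalization by a maximum value, the order of the approaches, and the bit codes for "straight" and "right" are all assumptions made for illustration; only the west/left example (1, 0, 0, 0, 0, 1) comes from the text.

```python
# Hypothetical helpers; only the west/left code (1, 0, 0, 0, 0, 1) is taken from the text.
APPROACHES = ("west", "north", "east", "south")  # index order is an assumption
DIRECTION_BITS = {
    "left": (0, 1),       # matches the example for vehicle D
    "straight": (1, 0),   # assumed code
    "right": (1, 1),      # assumed code
}


def encode_lane_on_ramp(lane_index, lane_count=2):
    """One-hot lane code for the on-ramp scenario (two lanes assumed)."""
    return tuple(1 if k == lane_index else 0 for k in range(lane_count))


def encode_lane_intersection(approach, direction):
    """Approach one-hot followed by a two-bit driving-direction code."""
    one_hot = tuple(1 if name == approach else 0 for name in APPROACHES)
    return one_hot + DIRECTION_BITS[direction]


def encode_cav(p, v, a, lane_code, p_max, v_max, a_max):
    """Normalized position, speed, and acceleration followed by the lane code."""
    return (p / p_max, v / v_max, a / a_max) + tuple(lane_code)


def encode_sequence(cav_encodings):
    """Concatenate the CAV encodings in passing order into one flat vector."""
    return [value for encoding in cav_encodings for value in encoding]
```

With these choices a single CAV encoding has length 5 in the on-ramp scenario and length 9 in the intersection scenario, which matches the encoding lengths quoted in the training setup.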
Once the vectorized representations of passing sequences are constructed, a neural network model can be built to take the vectors as input. Similar to TextCNN [41], we use a convolutional neural network (CNN) to carry out the learning process, because a CNN can extract features from the original data automatically [42]. The structure of the CNN-based multitask learning model is shown in Figure 4. The backbone part takes sequence vectors consisting of several CAV encodings as inputs and extracts latent feature representations from them; the task-specific part then takes the feature representations as input and outputs the time consumption and energy consumption of the sequences in a specific traffic scenario. In the backbone part, one-dimensional convolution layers with different kernel sizes are applied to extract features.
After the basic structure of the neural network is determined, the loss function should be specified to train the learning model towards the optimization goals. If the two targets were trained in two single-task models, the loss function of each would be the mean squared error (MSE), which is as follows: where $n$ is the number of test samples, $\hat{f}_1$ and $\hat{f}_2$ are the predicted values, and $f_1$ and $f_2$ are the ground truth. Generally, the loss function in a multitask learning model can be defined as the naive weighted sum of losses, which is as follows: where the loss weights $\omega_1$ and $\omega_2$ are uniform or manually tuned. The performance of the model depends strongly on the settings of these weight parameters. An approach that learns the weights through the observation noise of each task was given by Cipolla et al. [43]. Following it, let $f^W(x)$ be the output of the neural network with weights $W$; the likelihood as a Gaussian can be defined as follows: where $\sigma$ is a scalar that represents observation noise. Let $f_j(x)$ be the sufficient statistics; then, the multitask likelihood can be derived from the following: Taking the logarithmic form, the new loss function can be defined as follows: Notice that $\sigma_1$ and $\sigma_2$ are the denominators in equation (11). To avoid division-by-zero errors, the logarithmic form is used for the actual training process: Finally, the loss function is given in equation (13), which adapts the task weights during the training process.
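The idea of learning the loss weights from the observation noise can be illustrated with a small sketch. It assumes the commonly used form in which each task loss is scaled by $1/(2\sigma_j^2)$ and a $\log \sigma_j$ term is added, parameterized by $s_j = \log \sigma_j^2$ so that no division occurs. The function names are hypothetical, and the exact expression may differ from equations (11) to (13).

```python
import math


def mse(predicted, truth):
    """Mean squared error between two equally long sequences of numbers."""
    return sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)


def uncertainty_weighted_loss(loss_1, loss_2, s_1, s_2):
    """Combine two task losses with learnable log-variances s_j = log(sigma_j ** 2).

    Assumed form: sum_j [ 0.5 * exp(-s_j) * loss_j + 0.5 * s_j ].
    exp(-s_j) replaces 1 / sigma_j ** 2, so sigma_j never appears as a denominator.
    """
    return (0.5 * math.exp(-s_1) * loss_1
            + 0.5 * math.exp(-s_2) * loss_2
            + 0.5 * (s_1 + s_2))
```

In a training framework, `s_1` and `s_2` would be trainable parameters updated together with the network weights; increasing $s_j$ down-weights a noisy task, while the additive $0.5\,s_j$ term keeps the weights from collapsing to zero.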

Multi-Objective Discrete Evolutionary Algorithm
Generally, the average number of possible passing sequences in cooperative driving grows almost exponentially with the number of CAVs in CR [30]. Thus, searching for the best solution is hard when the number of CAVs is large, and this study proposes a population-based evolutionary algorithm to obtain a (near) optimal passing order.
In multi-objective optimization problems, a solution is selected from the Pareto optimal solutions according to the practical problem [44]. The concept of the Pareto optimal solution set is introduced below. First, in this minimization problem, solution $x_0$ Pareto dominates $x_1$ only if: We use the corresponding symbol to denote the domination relationship: which represents that $x_0$ dominates $x_1$. If no solution dominates $x_0$, then $x_0$ is called a non-dominated solution. Accordingly, the Pareto optimal solution set $P_s$ can be defined as the set consisting of all the non-dominated solutions. Therefore, the primary purpose of the algorithm is to search for the corresponding Pareto optimal solutions. If there is more than one element in the Pareto optimal solution set, two kinds of heuristic strategies can be used: (i) Delay-first strategy (DFS): always choose the solution with minimal time consumption from $P_s$. (ii) Energy-first strategy (EFS): always choose the solution with minimal energy consumption from $P_s$.

The form of the candidate solutions in the algorithm is given by equation (2), and the initialization operation generates $n$ different integers ranging from 1 to $n$. The feasible solutions make up a population in the evolutionary algorithm. Because lane-changing behavior is prohibited in CR, some solutions will be illegal. For example, in Figure 1(a), the passing order [C, A, B] cannot be accepted as a candidate solution because A is supposed to be in front of C. Hence, a repair operation is applied to repair illegal sequences, which is defined as follows: where $\dot{x}$ represents a passing sequence that can be a candidate, and $M_r$ is a matrix that carries out the repair operation. The matrix is constructed according to the order of vehicles on the lanes. For the infeasible sequence [3, 1, 2], which represents "C-A-B" in Figure 1(a), $M_r$ is as follows: Then, [3, 1, 2] is transformed to [1, 3, 2], which represents "A-C-B" and is legal.
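The dominance test, the two selection strategies, and the repair step can be sketched as follows. The dominance rule is the standard one for minimization. The repair function is only one way to obtain the same effect as the repair matrix without constructing $M_r$ explicitly, and the `lane_orders` argument (vehicle ids per lane, front to back) is an assumed input format.

```python
def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (minimization)."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better


def pareto_set(objectives):
    """Indices of the non-dominated objective vectors."""
    return [
        i for i, f_i in enumerate(objectives)
        if not any(dominates(f_j, f_i)
                   for j, f_j in enumerate(objectives) if j != i)
    ]


def choose(objectives, strategy="DFS"):
    """Pick one Pareto-optimal index: DFS minimizes f1 (time), EFS minimizes f2 (energy)."""
    k = 0 if strategy == "DFS" else 1
    return min(pareto_set(objectives), key=lambda i: objectives[i][k])


def repair(sequence, lane_orders):
    """Reorder vehicles of the same lane so they pass in their physical order.

    The slots that a lane's vehicles occupy in the sequence are kept, and the
    vehicles are rewritten into those slots front to back.
    """
    repaired = list(sequence)
    for order in lane_orders:
        members = set(order)
        slots = [pos for pos, vid in enumerate(sequence) if vid in members]
        for pos, vid in zip(slots, order):
            repaired[pos] = vid
    return repaired
```

For the example in the text, with A = 1 and C = 3 on one lane and B = 2 on the other, `repair([3, 1, 2], [[1, 3], [2]])` yields `[1, 3, 2]`.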
The proposed algorithm uses the selection operation, the crossover operation, and state transition with swap, shift, and symmetry operations for population evolution. These operations are described as follows.

Selection Operation.
The non-dominated sorting technique is used to layer individuals. Algorithm 1 shows the process of non-dominated sorting. In the algorithm, $c$ is the non-dominated level, and $X_N$ is the set of all the non-dominated solutions in $P$; fitness represents the virtual value of an individual, which is used in the selection operation. The roulette wheel method is then applied to choose individuals from the population, after which the crossover operation is carried out. In the roulette wheel method, the selection probability of individual $i$ is defined as follows: where $c_f$ is the value of $c$ after the iterations in Algorithm 1.
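A minimal sketch of layering followed by roulette wheel selection is given below. The virtual fitness used here, $c_f - c + 1$ for an individual on level $c$, is an assumption chosen so that earlier fronts receive larger selection probabilities; the fitness assignment in Algorithm 1 and the exact probability formula may differ.

```python
import random


def dominates(f_a, f_b):
    """True if f_a Pareto-dominates f_b (minimization)."""
    return (all(a <= b for a, b in zip(f_a, f_b))
            and any(a < b for a, b in zip(f_a, f_b)))


def non_dominated_levels(objectives):
    """Peel off non-dominated fronts; return (level of each individual, final level c_f)."""
    remaining = set(range(len(objectives)))
    levels = [0] * len(objectives)
    c = 0
    while remaining:
        c += 1
        front = {
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        }
        for i in front:
            levels[i] = c
        remaining -= front
    return levels, c


def selection_probabilities(levels, c_f):
    """Assumed virtual fitness c_f - c + 1, normalized to probabilities."""
    fitness = [c_f - c + 1 for c in levels]
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population, probabilities, k, rng=random):
    """Draw k individuals with replacement, proportionally to their probabilities."""
    return rng.choices(population, weights=probabilities, k=k)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when comparing runs.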

Crossover Operation.
Tie-breaking crossover is introduced in this study [45]. This operation prevents the same order value from appearing twice in a sequence, and the procedure is indicated in Figure 5. The start positions and lengths of the exchanged subsequences are generated randomly, so the results after crossover may contain duplicated items. A crossover map is also generated, which is a random ordering of the integers $0, 1, \ldots, n - 1$. The new sequences after the exchange are then transformed by multiplying each element by the length of the sequence and adding the crossover map. Finally, as shown in Figure 5, the offspring are produced by a sorting operation according to phase 3.
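The three phases can be sketched as follows. The sketch assumes that both parents exchange the subsequence at the same positions and that one crossover map is shared by both children; the figure may use a different arrangement, and the function names are illustrative.

```python
import random


def break_ties(sequence, crossover_map):
    """Turn a sequence that may contain duplicates into a permutation of 1..n.

    Each element v becomes v * n + m, where m is its crossover-map entry, so
    equal values are separated by the map while the order between different
    values is preserved. The rank of each key is the repaired element.
    """
    n = len(sequence)
    keys = [v * n + m for v, m in zip(sequence, crossover_map)]
    order = sorted(range(n), key=lambda i: keys[i])
    result = [0] * n
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def tie_breaking_crossover(parent_a, parent_b, rng=random):
    """Exchange a random subsequence, then repair duplicates by tie breaking."""
    n = len(parent_a)
    start = rng.randrange(n)
    length = rng.randint(1, n - start)
    child_a, child_b = list(parent_a), list(parent_b)
    child_a[start:start + length] = parent_b[start:start + length]
    child_b[start:start + length] = parent_a[start:start + length]
    crossover_map = list(range(n))
    rng.shuffle(crossover_map)  # random ordering of 0, 1, ..., n - 1
    return break_ties(child_a, crossover_map), break_ties(child_b, crossover_map)
```

Because the map entries are smaller than $n$, a sequence that is already a valid permutation passes through `break_ties` unchanged; only duplicated values are reassigned.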

State Transition.
The state transition procedure is probabilistic, governed by a predefined probability value $p$. In this study, the value of $p$ is set to 0.2 to keep the trade-off between exploration and exploitation. The state transition operations include swap, shift, and symmetry [35]. The swap transformation randomly exchanges subsequences in a passing sequence; the shift transformation translates a subsequence; and the symmetry transformation makes two subsequences that are symmetrical about a selected central point exchange their values. These operations can be implemented by several matrices, which can be denoted as follows: where $x_{k+1}$ is a passing sequence after $k$ iterations. $M_{symmetry}$, $M_{shift}$, and $M_{swap}$ represent the matrices that implement the symmetry, shift, and swap operations, respectively. Figure 6 illustrates the three transformations. The length of the subsequences is a hyperparameter for the swap and shift transformations. The values for these two operations are generated randomly according to the number of CAVs. For the symmetry operation, the length of the subsequences and the position of the symmetry center can be generated randomly. Note that boundary conditions are handled here when the indexes of elements would otherwise be out of bounds.
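The three transformations can be illustrated with list operations instead of explicit matrices. This is a sketch under assumptions: the argument conventions, the clipping used for the boundary condition, and the uniform choice among operators in `state_transition` are illustrative and are not taken from the source.

```python
import random


def swap(seq, i, j, length):
    """Exchange seq[i:i+length] with seq[j:j+length]; requires i < j."""
    out = list(seq)
    length = min(length, j - i, len(seq) - j)  # clip so the blocks fit and do not overlap
    out[i:i + length], out[j:j + length] = seq[j:j + length], seq[i:i + length]
    return out


def shift(seq, i, length, dest):
    """Cut seq[i:i+length] out and reinsert it at index dest of the remainder."""
    out = list(seq)
    block = out[i:i + length]
    del out[i:i + length]
    dest = max(0, min(dest, len(out)))  # clip the destination to a valid index
    out[dest:dest] = block
    return out


def symmetry(seq, center, half_width):
    """Mirror the elements on both sides of `center`, i.e. reverse the window."""
    out = list(seq)
    half_width = min(half_width, center, len(seq) - 1 - center)  # stay in bounds
    lo, hi = center - half_width, center + half_width
    out[lo:hi + 1] = reversed(out[lo:hi + 1])
    return out


def state_transition(seq, p=0.2, rng=random):
    """With probability p, apply one randomly chosen, randomly parameterized operator."""
    n = len(seq)
    if n < 2 or rng.random() >= p:
        return list(seq)
    op = rng.choice(("swap", "shift", "symmetry"))
    if op == "swap":
        i = rng.randrange(n - 1)
        j = rng.randrange(i + 1, n)
        return swap(seq, i, j, rng.randint(1, n))
    if op == "shift":
        i = rng.randrange(n)
        return shift(seq, i, rng.randint(1, n - i), rng.randrange(n))
    return symmetry(seq, rng.randrange(n), rng.randint(1, n))
```

Each operator returns a permutation of its input, so a transformed sequence can only become illegal with respect to the lane order, which is what the repair operation handles.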

Vehicle Control
Once a passing order is determined, the CAVs move according to the sequence. First of all, the motion of vehicles needs to be constrained by the speed limit and the acceleration capability: where $v_{max}$ denotes the maximum speed limit on the road, $d_{max}$ is the maximum deceleration, and $a_{max}$ is the maximum acceleration constrained by vehicle dynamics. The virtual vehicle mapping method is used in the framework to ensure safety [46,47]. Taking the case in Figure 1(a) as an example, if the passing order is "A-C-B," then C will be mapped into lane 1−1 . CAV B will then follow a virtual vehicle mapped from CAV C, which means that the motion of CAVs is divided into two modes: free driving and car following. The control process of the CAVs in a sequence is given by Algorithm 2. In it, Conflict is a function that judges whether there are potential conflicts between two CAVs in the sequence, where $t_0$ is the start time and $t_1$ is the time when $x[i]$ arrives at the conflict zone or stop line. In addition, $\Delta s$ is the value of the safe gap between two consecutive CAVs. The gap here represents the distance from the front of the following vehicle to the rear of the leading vehicle. If $x[k]$ is a real CAV, the value is set to $\widetilde{\Delta s}$, and if $x[k]$ is a virtual vehicle, a correction factor should be added to it, which can be denoted as follows: where $b$ is a Boolean variable whose value is 1 if $x[k]$ is virtual, and $L_M$ is the distance for $x[i]$ to cross the conflict zone.
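The motion constraints and the gap correction for virtual leaders can be sketched as follows. Both functions are assumptions consistent with the description: the additive form $\Delta s = \widetilde{\Delta s} + b \cdot L_M$ is inferred from "a correction factor should be added," $d_{max}$ is treated as a positive magnitude, and the numeric values in the usage note are placeholders, not the settings of Table 1.

```python
def constrained_speed(v, a, dt, v_max, a_max, d_max):
    """Advance speed by one step while respecting acceleration and speed limits.

    d_max is taken as a positive magnitude, so acceleration is kept in
    [-d_max, a_max] and speed in [0, v_max].
    """
    a = max(-d_max, min(a, a_max))
    return max(0.0, min(v + a * dt, v_max))


def safe_gap(base_gap, leader_is_virtual, crossing_distance):
    """Safe gap to the leader; a virtual leader adds the conflict-zone crossing distance.

    Assumed form: delta_s = base_gap + b * L_M, with b = 1 for a virtual leader.
    """
    b = 1 if leader_is_virtual else 0
    return base_gap + b * crossing_distance
```

For instance, with a placeholder base gap of 2.0 m and a 10.0 m crossing distance, a follower keeps 2.0 m behind a real leader and 12.0 m behind a virtual one.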
Using Algorithm 2, the first CAV in sequence drives freely, and a CAV with a minimal relative distance with the first CAV in the rest of the sequence is chosen as car following target.
Finally, if a passing sequence is determined, it will not be altered unless the set of CAVs in CR changes.

Simulation Platform and CNN Training.
This study uses the microscopic traffic simulation software SUMO to study the cooperative driving strategy in the two traffic scenarios in Figure 1. The simulation settings, chosen with realistic conditions in mind, are given in Table 1. The simulation step is set to 0.2 s for smoother time-continuous control. The radius of CR in the on-ramp scenario is set to 1000 m in view of the communication capability [38]. Meanwhile, the radius is set to 200 m in the urban intersection scenario because vehicles travel slowly in this case and 200 m is enough for vehicle braking.
First of all, more than 50000 records were collected in SUMO for each traffic scenario to serve as the training data. Each record includes the encoding of a passing sequence and the combination of the two regression targets. We use the message-digest algorithm 5 (MD5) to delete duplicated data and ensure the uniqueness of the records. Because the lengths of the CAV encoding in the two scenarios are 5 and 9, respectively, the convolution kernel sizes are set to [2,3,4] and [2,5,7] to extract features at different scales. The Adam optimizer is used to optimize the weights and biases of the network, and a step decay schedule for the learning rate is implemented in the training process for better performance. The rest of the hyperparameters (e.g., batch size, initial learning rate, and the sizes of the dense layers) were tuned automatically by applying the tree-structured Parzen estimator (TPE), which can find significantly better results than random search methods [48].
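Removing duplicated records by hashing can be sketched with the standard library. The record layout (an encoding paired with its two targets) and the use of `repr` to serialize a record before hashing are assumptions for illustration; any stable serialization would serve the same purpose.

```python
import hashlib


def record_digest(encoding, targets):
    """MD5 hex digest of a record, built from a stable text form of its values."""
    payload = repr((tuple(encoding), tuple(targets))).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of every (encoding, targets) record."""
    seen = set()
    unique = []
    for encoding, targets in records:
        digest = record_digest(encoding, targets)
        if digest not in seen:
            seen.add(digest)
            unique.append((encoding, targets))
    return unique
```

Storing only the fixed-length digests keeps the membership test cheap even when individual encodings are long.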

Simulation Results.
To evaluate the proposed strategy comprehensively, we carried out two kinds of simulations based on the pre-trained CNN model. One is a discrete simulation, which is used to observe the performance of the framework for different static numbers of vehicles to be scheduled. The other is a continuous simulation, which serves to evaluate the framework at different traffic demand levels using the trace data exported from SUMO. We choose the FCFS strategy as the baseline because it is widely used in the domain. The number of iterations and the population size in MODEA are set to 30 and 40, respectively. We generate different numbers of CAVs distributed randomly over the lanes for the two scenarios, and the results of the discrete simulation are shown in Figure 7. The proposed method performs better than the FCFS method in every tested case, and in the on-ramp scenario the gap between the two methods widens as the number of CAVs increases. This supports the global optimization capability of MODEA, whereas the rule-based FCFS method struggles to find satisfactory solutions.
Meanwhile, when there is more than one solution in the Pareto front, the final sequence can be chosen manually according to specific requirements.
For the continuous simulation, different arrival rates of CAVs are deployed for 2000 simulation steps, and the trace data are exported every 4 time steps. The trace datasets include information on the CAVs such as position, speed, and acceleration; we then reload these data in SUMO and carry out the simulations. In other words, the same trace data are used for result comparison so that randomness can be eliminated.
All results presented are averaged over 10 independent runs, and the best results are shown in bold in Table 2. According to Table 2, there is no significant difference between DFS and EFS, which may be caused by the regression error of the neural network. However, as the CAV arrival rate increases, the difference in results between FCFS and the proposed framework becomes more pronounced. This demonstrates that MODEA can optimize the two objectives jointly.

Discussion about Computation Time
In cooperative driving tasks, the computation time of algorithms is vital to ensure safety and efficiency. This part focuses on the time performance of the proposed framework. Only the on-ramp scenario is considered for evaluating computation time because the time complexity of the algorithm is the same in the two scenarios. All experiments were conducted in the Julia programming language on the Windows 10 operating system with an Intel Core i7-10750H CPU. The BenchmarkTools.jl package is used to evaluate the computation time precisely [49].
As Figure 8 shows, the computation time of the proposed method mainly depends on the population size of MODEA, while the number of CAVs in CR has little effect on the computation complexity, which means that we can control the computation time flexibly by setting the population size of the algorithm manually.
Meanwhile, the influence of computation time on the traffic system should be discussed. First, safety is always the primary goal. The impact of computing time on safety is reflected in the safe gap $\widetilde{\Delta s}$.
The value of $\widetilde{\Delta s}$ can be roughly revised with the time consumption $t_d$: where $v_{max}$ is used to ensure safety under any circumstance, so that $\Delta s$ changes in the simulations according to equation (22). Then, we carry out a series of simulations using the same trace data exported from SUMO to compare the performance of the control framework under different computation delays. In the test, the delay caused by computation varies from 0.1 s to 0.4 s, and DFS is chosen to select solutions. Figures 9(a) and 9(b) show the time consumption and energy consumption under the different circumstances. On average, the FCFS rule outperforms the proposed framework in time consumption once the computation delay reaches 0.3 s, and the two have almost identical electricity consumption once the computation delay reaches 0.4 s.
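The effect of computation delay on the safe gap can be illustrated with a short sketch. It assumes the revision adds the distance covered at $v_{max}$ during the delay $t_d$, which is one conservative reading of "v max is used for ensuring safety under any circumstance"; the exact form of equation (22) may differ, and the numeric values below are placeholders.

```python
def revised_base_gap(base_gap, v_max, t_d):
    """Base safe gap enlarged by the worst-case distance travelled during the delay.

    Assumed form: base_gap + v_max * t_d, where t_d is the computation delay.
    """
    return base_gap + v_max * t_d


# Placeholder values: 2.0 m base gap, 15.0 m/s speed limit, delays from 0.1 s to 0.4 s.
gaps = [revised_base_gap(2.0, 15.0, t_d) for t_d in (0.1, 0.2, 0.3, 0.4)]
```

Under this assumption the gap grows linearly with the delay, which is in line with the reported loss of advantage over FCFS at larger delays: wider gaps mean fewer vehicles can pass per unit time.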

Conclusions
Over the last few years, many methods have been put forward in the cooperative driving field, but the controllability of optimization objectives and the efficiency of algorithms remain difficult to achieve. By combining an evolutionary algorithm with a machine learning technique, this study proposes an intelligent framework that considers both the delay and the energy consumption of vehicles. An encoding approach for CAVs is implemented, and a passing sequence of CAVs is treated approximately as a sentence in natural language so that TextCNN can be applied to extract features. Compared with other frameworks, it has some significant advantages: (i) Controllability and flexibility: the optimization objectives and computation time can be adjusted manually, so the framework can be adapted to different design requirements. (ii) General applicability: similar to the FCFS protocol, the framework can be applied in different cooperative driving scenarios such as intersections and on-ramps.
In future research, a more concrete vehicle control method should be studied to improve practicability. Moreover, the neural network implemented in this study can only deal with a finite number of cases because the input length of the network is fixed. Therefore, the maximum number of CAVs must be specified, and zero padding is used if the number of CAVs is less than the predefined maximum length. Hence, the form of the neural network and the CAV encodings can be further studied for better performance; for example, an encoder-decoder structure could be applied to handle different numbers of CAVs. Finally, the lane-changing behavior of vehicles and pedestrian crossing rules could be considered in the system, although this would lead to a more complex, if more realistic, system to study.
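The fixed input length and zero padding mentioned above can be sketched as follows. The flat layout (all CAV encodings concatenated, then zeros appended) and the function name are assumptions; a two-dimensional layout with one row per CAV would work the same way.

```python
def pad_sequence(cav_encodings, max_cavs, encoding_length):
    """Flatten the CAV encodings and zero-pad them to max_cavs * encoding_length values."""
    if len(cav_encodings) > max_cavs:
        raise ValueError("more CAVs than the predefined maximum")
    for encoding in cav_encodings:
        if len(encoding) != encoding_length:
            raise ValueError("unexpected CAV encoding length")
    flat = [float(value) for encoding in cav_encodings for value in encoding]
    missing = max_cavs - len(cav_encodings)
    flat.extend([0.0] * (missing * encoding_length))
    return flat
```

A sequence with one on-ramp CAV (encoding length 5) and a predefined maximum of three CAVs therefore becomes a vector of 15 values whose last 10 entries are zero.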

Data Availability
The data used to support the findings of this study are produced by simulations.