An Online Distributed Game Optimal Control for Heavy Haul Trains with Limited Communication

For heavy haul trains, global information is difficult to obtain because of the limited communication range. This paper proposes a novel distributed optimal control based on a game strategy, in which global optimization is achieved by balancing the subsystems' performance using only local information. To solve the game online, an efficient multivariable extremum seeking algorithm is adapted to approximate the solution of the partial differential equation derived from the optimality condition. The convergence of the proposed approximation algorithm is proved by constructing a fictitious Lie bracket system and using a Lyapunov function. Finally, the proposed distributed optimal control is evaluated in a case study based on the configuration of the Daqin railway in China.


Introduction
Heavy haul trains are widely used in countries with high demand for transporting minerals, petroleum, coal, and similar bulk goods. Essentially, a heavy haul train is a distributed-power networked system made up of many locomotives and wagons. The basic control problem for heavy haul trains is to track a target speed profile while accounting for performance indices such as traveling time, energy consumption, and in-train forces. An optimal controller should then be designed to achieve this performance.
Optimal control of train operation has received much attention recently [1][2][3][4][5]. These works studied optimal control focused on energy consumption and traveling time for either passenger trains or ordinary freight trains. Heavy haul trains, however, experience much larger dynamic in-train forces because of poorly coordinated control of the distributed power, undulating grades, and longer train lengths. The couplings used in heavy haul trains wear out under large in-train forces [1]. The optimization techniques used for passenger trains or ordinary freight trains may therefore not be directly applicable to heavy haul trains.
Research on optimal control of heavy haul trains has addressed traveling time, in-train forces, and energy consumption. An optimal open-loop offline controller was designed for cruise control of heavy haul trains in [6], which showed that a distributed controller is more effective at minimizing the in-train forces. However, a static error always exists because the controller is open loop. Building on this, the authors studied a closed-loop optimal controller for heavy haul trains that considers in-train forces, speed tracking error, and energy consumption. They concluded that the in-train force performance of the 2-2 mode (fully distributed) controller is much better than that of the 1-1 mode (centralized). A similar conclusion was derived in [7]. In these works, an LQR (linear quadratic regulator) method is used to optimize the feedback gain using global information [6,7]. The global information is gathered by communication networks installed on the train, for example, Lonworks (used in electronically controlled pneumatic (ECP) brake systems), GSM-R, or LTE-R. However, global information is not always available during travel because of limited communication range, disturbance, delay, and communication failure. It is therefore more practical to design a distributed optimal controller for each powered locomotive and wagon that uses only the locally available information about the whole system.
A local performance index should be constructed first when designing distributed optimal controllers for heavy haul trains. The subsystems in the train work as a team to track a desired speed profile. At the same time, each of them tries to minimize its own energy consumption and in-train forces at the cost of increasing those of the others, which makes the problem complex and difficult. Game strategy is an efficient method for solving this type of problem with conflicting interests [8]. Distributed optimal control problems for multiagent systems have been solved by game strategies in several works [9][10][11].
In [9], a noncooperative game was designed for distributed optimization with fixed local communication. The authors design estimators of the other players' states, which makes the controller computationally heavy. When information is exchanged among the players, so many estimators are not needed. The most closely related work is [10], which brings together cooperative control, reinforcement learning, and game theory to solve multiplayer differential games on communication graph topologies. It formulates graphical games for dynamic systems and provides policy iteration and learning algorithms along with a proof of convergence to the Nash equilibrium. Since communication is available, the subsystems can cooperate to track the target speed.
A cooperative differential game is constructed in [12], and the Pareto equilibrium of cooperative games is further studied in [13]. Convergence to the Pareto equilibrium was analyzed and necessary and sufficient conditions were derived. It was shown that if the dynamic system is controllable, then all Pareto candidates can be obtained by solving the necessary conditions of a weighted-sum optimal control problem. However, this is done offline and requires global information.
In this paper, we improve on these works. We propose a novel distributed optimal control based on a game strategy, in which global optimization is achieved by balancing the subsystems' performance using only local information. To solve the game online, an efficient multivariable extremum seeking algorithm is adapted to approximate the solution of the partial differential equation derived from the optimality condition. The proposed controller is applied to heavy haul trains with a second-order interconnected model.
The main contributions of this paper are fourfold. First, a distributed optimal controller that uses local information with conflicting performance indices is proposed for heavy haul trains with a second-order interconnected model. A system-level global target is attained by optimizing the local performance under a proper communication topology. Second, a cooperative differential game is constructed to solve the distributed optimal problem with conflicting individual objectives. We prove that a Pareto equilibrium is reached when the communication topology of the subsystems in the heavy haul train is strongly connected. Third, we prove that the cooperative game can be solved using only local information, whereas most of the existing literature requires global information to solve the game. Last, we develop an efficient multivariable extremum seeking algorithm to approximate the solution of the partial differential equation that arises from solving the cooperative game.
The remainder of this paper is organized as follows. In Section 2, the model of heavy haul trains is given, including the kinetic model and the communication connection model. In Section 3, the distributed optimal control for heavy haul trains is formulated and a cooperative game is constructed. In Section 4, the stability and convergence of the formulated cooperative game are analyzed. In Section 5, a multivariable extremum seeking algorithm is designed to approximate the solution of the partial differential equation that arises from solving the cooperative game. In Section 6, a simulation scenario based on a field application is set up. Finally, the conclusion is given.

Model of Heavy Haul Trains
In this section, the train model used in this paper and the formulation of the distributed optimal control problem are summarized.
Two types of models of heavy haul trains are discussed in the existing literature, namely, the point-mass model used in [3,14] and the spring-mass model used in [7,15]. The spring-mass model is mostly used when the complex in-train dynamics are considered, and its effectiveness has been validated against experimental data collected on a train operated by Spoornet on its COALlink line in South Africa [16]. With the development of communication-based train control systems, wired Lonworks communication or wireless GSM-R and LTE-R communication is used to operate heavy haul trains more efficiently. In this paper, the spring-mass model of heavy haul trains is used together with the communication connections among the cars, which is more realistic (for ease of presentation, in what follows both locomotives and wagons in heavy haul trains are referred to as cars).

Kinetic Model of Heavy Haul Trains.
The longitudinal dynamics of heavy haul trains can be described by the set of equations (1) [7,16], where $i = 2, \ldots, n-1$ and $n$ (a positive integer) is the number of cars in the train. Here $x_i$, $v_i$, and $m_i$ denote the displacement with respect to an inertial frame, the speed, and the mass of the $i$th car, respectively, and $u_i$ is the traction or braking force applied to the $i$th car. $f_{in_i}$ is the in-train force between the $i$th and $(i+1)$th cars and is formulated in (2), where $k_i$ is the spring constant and $d_i$ is the damping constant of the coupler of the $i$th car. The in-train forces can be viewed as coupling forces between the cars, so (1) can be rewritten as (3), where an arrow over a letter denotes a vector or matrix.
In (3), $\vec{x}_i = [x_i \; v_i]^T \in \mathbb{R}^2$ is the state vector of the $i$th car, $\vec{A} = [0 \; 1; \; 0 \; 0]$, $\vec{B} = [0 \; 1]^T$, and $c_{ij}$ is the coupling coefficient, with $c_{ij} = 1$ when $i \neq j$ and $c_{ij} = -2$ when $i = j$. $\Theta = [0 \; 0; \; k/m \; d/m]$ is the coupling matrix, where we suppose that all the $k_i$ and $d_i$ are equal, as is the case when the couplers in the train are all of the same type, and $u_i$ is the control input.
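The spring-mass description above can be illustrated numerically. The following is a minimal sketch, not the paper's equations (1)-(3): it assumes that each coupler acts as a linear spring plus damper, that displacements are measured as offsets from each car's nominal position in the train, and that running resistance is neglected. The function names, the integrator, and all parameter values are hypothetical.

```python
def in_train_forces(x, v, k, d):
    """Coupler force between car i and car i+1, modeled as spring plus damper.

    x holds each car's displacement offset from its nominal position, so a
    coupler is relaxed when the offsets of the two cars it joins are equal.
    """
    return [k * (x[i] - x[i + 1]) + d * (v[i] - v[i + 1])
            for i in range(len(x) - 1)]


def step(x, v, u, m, k, d, dt):
    """Advance all cars by one semi-implicit Euler step of length dt."""
    f = in_train_forces(x, v, k, d)
    n = len(x)
    v_new = []
    for i in range(n):
        from_front = f[i - 1] if i > 0 else 0.0   # coupler shared with car i-1
        to_rear = f[i] if i < n - 1 else 0.0      # coupler shared with car i+1
        v_new.append(v[i] + dt * (u[i] + from_front - to_rear) / m[i])
    # Positions are updated with the new speeds (semi-implicit Euler).
    x_new = [x[i] + dt * v_new[i] for i in range(n)]
    return x_new, v_new


def simulate(x, v, u, m, k, d, dt, steps):
    """Run the assumed model for a fixed number of steps."""
    for _ in range(steps):
        x, v = step(x, v, u, m, k, d, dt)
    return x, v


# Three cars with unequal initial speeds and no traction or braking.
x_end, v_end = simulate([0.0, 0.0, 0.0], [11.0, 10.0, 9.0], [0.0, 0.0, 0.0],
                        [100.0, 100.0, 100.0], k=1000.0, d=200.0,
                        dt=0.01, steps=5000)
print(v_end)
```

With zero inputs the couplers only exchange momentum between cars, so in this sketch the speeds settle to a common value while the average speed is preserved. This is the coupling effect that the distributed controller has to manage.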

Model of Communication Connections among the Cars.
The communication network equipment installed on heavy haul trains allows more flexible control strategies to be designed. Lonworks is the most popular network protocol used in heavy haul trains equipped with ECP/iDP systems, as shown in Figure 1(a). However, heavy haul trains need to be marshalled when loading and unloading freight, so a wireless network on the train, such as LTE-R (Long Term Evolution-Railway), is more flexible, as shown in Figure 1(b).
Remark 1. The configuration shown in Figure 1(a) has some drawbacks. If one of the nodes fails, the whole network is broken. Moreover, frequent marshalling of the train makes it difficult and expensive to maintain. Configuration (b) is much more flexible and can provide larger bandwidth. LTE-R has already been deployed on the Shuohuang Railway in China.
A directed graph is used to describe the communication among subsystems. According to the train model (1), there are $n$ vertices $V = \{v_1, v_2, \ldots, v_n\}$ in the graph $G(V, E, A)$, which describes the communication among the cars (each car is a node in the graph), similarly to [17]. The set of edges or arcs is $E \subseteq V \times V$. An edge from node $j$ to node $i$ is denoted by $(v_j, v_i)$, which means that node $i$ receives information from node $j$. The associated adjacency matrix is $A = [a_{ij}] \in \mathbb{R}^{n \times n}$, where $a_{ij} = 1$ if $(v_j, v_i) \in E$, meaning that node $i$ can receive information from node $j$, and $a_{ij} = 0$ otherwise. In particular, $a_{ii} = 0$; that is, there is no self-loop in the graph. Node $j$ is called a neighbor of node $i$ if $(v_j, v_i) \in E$. The set of neighbors of node $i$ is denoted as $N_i = \{j : (v_j, v_i) \in E\}$. A directed graph is said to have a spanning tree if there is a node (called the root) such that there is a directed path from the root to every other node in the graph. In the communication graph $G$, if there is a path from node $j$ to node $i$, we say that $i$ is reachable from $j$. If every node is reachable from every other node, the graph is strongly connected [18].
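The graph properties defined above can be checked directly from an adjacency matrix. The sketch below assumes the convention that entry `adj[i][j] == 1` means node `i` receives information from node `j`. The two example topologies are hypothetical and are not the topology of Figure 4.

```python
from collections import deque


def reachable_from(adj, start, reverse=False):
    """Return the set of nodes reachable from `start` along directed edges.

    adj[i][j] == 1 encodes the edge j -> i (node i receives from node j).
    With reverse=True the edges are followed backwards.
    """
    n = len(adj)
    seen = {start}
    queue = deque([start])
    while queue:
        j = queue.popleft()
        for i in range(n):
            edge = adj[j][i] if reverse else adj[i][j]
            if edge and i not in seen:
                seen.add(i)
                queue.append(i)
    return seen


def is_strongly_connected(adj):
    """True if every node is reachable from every other node."""
    n = len(adj)
    forward = reachable_from(adj, 0)
    backward = reachable_from(adj, 0, reverse=True)
    return len(forward) == n and len(backward) == n


def has_spanning_tree(adj):
    """True if some root reaches every other node by a directed path."""
    n = len(adj)
    return any(len(reachable_from(adj, root)) == n for root in range(n))


# Hypothetical four-node ring 0 -> 1 -> 2 -> 3 -> 0.
ring = [[0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]
# Hypothetical one-way chain 0 -> 1 -> 2 -> 3.
chain = [[0, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]

print(is_strongly_connected(ring), is_strongly_connected(chain))
print(has_spanning_tree(chain))
```

The one-way chain has a spanning tree rooted at its first node but is not strongly connected, which shows that the two properties are distinct.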

Desired Speed Profile.
The heavy haul trains travel according to a desired speed profile, which is expressed in (4), where $x_0$, $v_0$, and $u_0$ are the desired displacement, speed, and input, respectively. The desired speed profile can usually be viewed as the operator's command. It can be scheduled offline from experience, or online by heuristics or by closed-loop global optimization of energy consumption, traveling time, and so on. How to design the desired speed profile is not the focus of this paper. If the desired speed profile is scheduled appropriately and the heavy haul trains track the profile well under the proposed controller, the trains travel with favorable performance.

Distributed Optimal Control with Neighbor's Information.
Because of the different control requirements caused by undulating grades of the railway and unpredictable communication failures among the cars in the train, the distributed controllers are designed using neighbors' information (the information available from other cars in the train; global information is not necessary), and they account for a performance index while tracking the desired speed profile. The disagreement of car $i$ with its neighbors and with the desired speed profile is given in (5), where $a_{ij}$ is defined in the section on the communication model and $a_{i0} = 1$ if car $i$ can directly receive the operator's command.
The controllers in the train cooperate to achieve a system-level objective. The objective of our work is therefore to design $u_i$ in (1) by optimizing the local performance index $J_i$ given in (6). The performance index contains three weight coefficients: one on tracking the desired speed and on the in-train forces, one on the car's own energy consumption, and one on the energy consumption of neighboring cars. Minimizing $J_i$ amounts to optimizing the train's travel time, in-train forces, and energy consumption [7]. The significant difference is that $J_i$ defined in (6) depends only on local neighbors' information, so the performance indices may be coupled and may conflict. Decreasing the input of one car results in increasing the inputs of others, which makes the optimal control problem very difficult to solve. The distributed optimal control problem discussed in this paper is therefore formulated as a cooperative game, which is capable of handling many individuals with cooperative and/or conflicting interests.
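The local quantities described above can be sketched in code. The example assumes a form that is common in graphical games: the disagreement of car $i$ is the sum of its state differences to its neighbors plus a pinned difference to the reference, and the running cost is quadratic in the disagreement, the car's own input, and the neighbors' inputs. The exact expressions (5) and (6) are not reproduced here; the function names, scalar weights `q`, `r_own`, `r_nbr`, and the numbers are hypothetical.

```python
def disagreement(i, states, adj, pinned, ref):
    """Local disagreement of car i as a (position, speed) pair.

    states[j] = (x_j, v_j); adj[i][j] == 1 if car i receives from car j;
    pinned[i] == 1 if car i receives the reference (x_0, v_0) directly.
    """
    dx = dv = 0.0
    for j, a in enumerate(adj[i]):
        dx += a * (states[i][0] - states[j][0])
        dv += a * (states[i][1] - states[j][1])
    dx += pinned[i] * (states[i][0] - ref[0])
    dv += pinned[i] * (states[i][1] - ref[1])
    return dx, dv


def running_cost(i, states, inputs, adj, pinned, ref, q, r_own, r_nbr):
    """Assumed quadratic integrand of the local performance index of car i."""
    dx, dv = disagreement(i, states, adj, pinned, ref)
    cost = q * (dx * dx + dv * dv)          # tracking and coupling term
    cost += r_own * inputs[i] ** 2          # own control effort
    cost += sum(a * r_nbr * inputs[j] ** 2  # neighbors' control effort
                for j, a in enumerate(adj[i]))
    return 0.5 * cost


# Two cars that hear each other; only car 0 hears the reference.
states = [(0.0, 10.0), (0.0, 11.0)]
inputs = [2.0, 3.0]
adj = [[0, 1], [1, 0]]
pinned = [1, 0]
ref = (0.0, 10.0)

print(disagreement(0, states, adj, pinned, ref))
print(running_cost(0, states, inputs, adj, pinned, ref, 1.0, 1.0, 0.5))
```

Each car evaluates its cost from its own state, its neighbors' states and inputs, and the reference if it is pinned, so no global information is required.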

Applying Cooperative Game Strategy to Heavy Haul Trains.
As discussed in the section on the distributed optimal control problem, the problem is converted to a cooperative game. The detailed description and definitions are given as follows.
A four-tuple is used to describe the state of a car in the train; its elements are introduced below. $u_i$ is a controller's decision value, namely, the control input, which takes the neighbors' states into account. $u_{-i}$ denotes the decision values of the neighbors of car $i$. $U_i$ denotes the set of available decisions for car $i$. A tuple $\vec{u} = (u_1, u_2, \ldots, u_n) \in U = \prod_i U_i$ is called a joint decision. A function $V_i(\vec{\delta}_i(t))$ is used to denote the cost of taking decision $u_i$. The cost function is defined in (7), which coincides with the performance index $J_i$. Assumption 2. Assume that $V_i(\vec{\delta}_i(t))$ is continuous and first-order differentiable. From expression (7), it is an integral of a quadratic form, so $V_i$ is nonnegative.
The neighbors' states are available to car $i$ when it makes a decision. It is generally impossible to minimize the performance index of a single car without increasing that of other cars. This implies that an equilibrium exists for the whole system, which coincides with the notion of Pareto optimality. The Pareto optimal equilibrium for a cooperative game is defined in [12] using global information about the system. Lemma 3 (global Pareto optimality). $u^*$ is a Pareto optimal solution of the formulated cooperative game if and only if, for every car $i$, $u^*$ minimizes $J_i(u)$ on the associated constrained set. However, when only partial information is available for making decisions during the game, Lemma 3 needs to be extended in the sense of a local Pareto optimal equilibrium. If the communication topology of the system meets certain conditions, the global Pareto optimal equilibrium is reached on the basis of local Pareto optimal equilibria. The local Pareto optimal equilibrium is defined in Definition 4. The objective of the paper is to design a decision-making algorithm for playing the cooperative game. The game converges to global Pareto optimality when every player achieves local Pareto optimality.
For the distributed optimal problem formulated in this paper, if the decision-making rule $u_i$ designed for every car $i$ is continuous with $u_i(0) = 0$, locally stabilizes the disagreement $\vec{\delta}_i$ of every car while minimizing $J_i$, and the value function $V_i(\vec{\delta}_i(t))$ is finite, we say that global optimization is reached under the distributed optimal controllers through the cooperative game.
Design $u_i$ to minimize the value function $V_i(\vec{\delta}_i(t))$ subject to the state transition function of the system (3) and the initial states $\vec{\delta}_i(0)$, with bounded states. By Lemma 3 and the conclusions of [13], this game can be solved through a group of constrained optimality equations. A Hamiltonian function is constructed together with its boundary conditions, where the dynamics term is the state derivative given in (3), the cost term is defined in (6), and the two remaining coefficients are positive real numbers [19]. To minimize the local performance index (6) under these constraints, Pontryagin's minimum principle and the Karush-Kuhn-Tucker conditions require that (12) be satisfied; under the equality condition stated there, (12) can be written as (13). Choose any car labeled $i$; its neighbors form the set $N_i$, and not all the cars are included in this group. For this group $Gr_i$, Pareto optimality is reached. Because the graph is strongly connected, the group $Gr_i$ contains at least one car $j$ that is connected by a path to the cars outside $Gr_i$. The same holds for a group $Gr_j$ that is different from $Gr_i$. Pareto optimality is thus propagated from $Gr_i$ to $Gr_j$, and onward to the other cars until no car is left in the graph. Global Pareto optimality is then reached.

Stability and Convergence of Cooperative Game
Necessity. If the graph is not strongly connected, there is at least one pair of cars $i$ and $j$ with no path between them. The Pareto optimality associated with $i$ then cannot propagate to $j$, and global Pareto optimality is not achieved.
Sufficiency. Suppose that the graph is strongly connected but global Pareto optimality is not achieved, so that only local Pareto optimality is achieved, for example, in the groups $Gr_i$, $Gr_j$, and $Gr_k$. That is to say, car $i$ belongs to its own group but not to the groups of $j$ and $k$. From the viewpoint of Pareto optimality propagation, we deduce that there is no path from $i$ to $j$ or $k$, which contradicts the assumption that the graph is strongly connected.
Theorem 6 (stability of cooperative distributed optimal controller). Let Assumption 2 hold. To design a distributed cooperative optimal controller through the game according to (13), with the aim of acting on the state disagreement and the deviation from the desired profile, the costate should be related to the variation of the value function, that is, to its partial derivative with respect to the disagreement. With the costate chosen as this derivative for every car $i$, under the Pareto optimal $u_i^*$, the heavy haul train is asymptotically stable and eventually converges to the desired speed profile. Proof. To analyze the stability of the cooperative distributed optimal control, we use a Lyapunov function. Choose the value function $V_i$ as the Lyapunov function, with $V_i \geq 0$. The time derivative of $V_i$ is expressed in terms of the running cost defined in (6). The terms in $\dot{V}_i$ are quadratic forms, from which we derive that $\dot{V}_i < 0$. The system is therefore asymptotically stable and $\vec{\delta}_i \to 0$. That is to say, $\vec{x}_i \to \vec{x}_j \to \vec{x}_0$, and the states converge to the desired profile.

Multivariable Extremum Seeking Algorithm for Cooperative Game
To drive the cooperative game to a Pareto optimal equilibrium, the decision rule (13) must be computable. According to Theorem 6, $u_i$ has a known form; substituting it into (9) yields a partial differential equation, for which an analytical solution is hard to obtain. Extremum seeking provides a numerical, perturbation-based method for steering an unknown dynamical system to its optimum. Motivated by [20,21], a multivariable extremum seeking estimator is designed to solve for the Pareto optimum. A basic multivariable extremum seeking algorithm is given in [22]; it is based on perturbation, and periodic (sinusoidal) excitation signals are primarily used to probe the nonlinearity and couplings. According to the equations for $u_i$ and the costate, the block diagram of the multivariable extremum seeking algorithm for solving the Pareto optimum is given in Figure 2, where $h_i$ is a selection vector whose $i$th element is 1 and whose other elements are 0. Theorem 8. Consider a distributed optimal problem aimed at tracking the desired profile, solved as a game that minimizes the performance index $J_i$ of every car $i$ as defined in (6), under Lemma 3. Through the multivariable extremum seeking algorithm described in Figure 2 and (16), the Pareto optimum can be solved online. If the perturbation frequency $\omega_i$ is sufficiently large, the Pareto optimal solution, consisting of $\hat{u}_i^*$ and the corresponding costate estimate for every car $i$, is globally uniformly asymptotically stable for the system described in (16).
Proof. Motivated by [21], a Lie bracket based analysis method is developed. First, we consider the stability of the control input estimate. According to (16), the overall states can be written in a form in which $\nu_1 = \sin \omega_i t$ and $\nu_2 = \cos \omega_i t$ act as fictitious control inputs. An input-affine system is thus constructed.
The Lie bracket of two vector fields is defined in (18) in terms of the gradient of each field along the direction of the other. A Lie bracket system is then defined according to (18), where the coefficient $\nu_{12}$ is given by an integral of the sine and cosine inputs and evaluates to a constant, which yields the Lie bracket system (20). Construct a global Lyapunov function as the sum over all cars of the cost minus the sum of the corresponding optimal costs. It is nonnegative because the optimal cost is the minimum. Its time derivative is given in (21). Substituting (20) into (21) shows that, when this derivative is zero, the cost of every car $i$ has converged to its Pareto optimal equilibrium value, and the optimal control $u_i^*$ is achieved at the same time. The system is therefore globally uniformly asymptotically stable.
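The idea behind the Lie bracket analysis can be illustrated on a toy problem. The sketch below is not the estimator (16) of this paper, whose gains are not reproduced here. It is a generic Lie-bracket-style extremum seeking law in the spirit of [21]: each variable is driven by $\sqrt{\omega_i}\,(\alpha \cos \omega_i t - k J \sin \omega_i t)$, whose averaged behavior approximates gradient descent on $J$ with rate $\alpha k / 2$. The cost function, the gains `alpha` and `gain`, the frequencies, and the step size are all hypothetical.

```python
import math


def cost(x1, x2):
    """Hypothetical shared cost with its minimum at (2, -1)."""
    return (x1 - 2.0) ** 2 + (x2 + 1.0) ** 2


def extremum_seek(x0, omegas, alpha=0.5, gain=2.0, dt=1e-4, t_end=10.0):
    """Forward Euler simulation of a multivariable extremum seeking law.

    Each variable uses its own perturbation frequency. Returns the mean of
    each variable over the last tenth of the run, which filters out the
    residual sinusoidal perturbation.
    """
    x = list(x0)
    steps = int(round(t_end / dt))
    tail = steps // 10
    tail_sum = [0.0] * len(x)
    for n in range(steps):
        t = n * dt
        j = cost(*x)  # only the cost value is measured, never its gradient
        for i, w in enumerate(omegas):
            rate = math.sqrt(w) * (alpha * math.cos(w * t)
                                   - gain * j * math.sin(w * t))
            x[i] += dt * rate
        if n >= steps - tail:
            for i in range(len(x)):
                tail_sum[i] += x[i]
    return [s / tail for s in tail_sum]


estimate = extremum_seek([1.5, -0.5], omegas=[400.0, 500.0])
print(estimate)
```

Distinct frequencies keep the two perturbation channels from interfering on average, and a larger frequency shrinks the residual oscillation around the optimum. This matches the role of the sufficiently large perturbation frequency in Theorem 8.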

Results and Discussion
To validate the performance of the proposed distributed cooperative optimal controller, a simulation scenario based on a field application is set up. A segment of track and the desired speed profile are shown in Figure 3, and the simulation parameters of the trains are given in Table 1; these come from heavy haul trains running on the Daqin railway in China. Without loss of generality, the number of locomotives in the heavy haul train is set to four in the simulation, which corresponds to the 1+2+1 mode (namely, one locomotive + some wagons + two locomotives + some wagons + one locomotive) commonly used on the Daqin railway. Traction and braking of the train act mainly on the locomotives; for simplicity, the dynamics of the wagons are not considered and the wagons are treated as rigid bodies in the simulation. The method can nevertheless be easily applied to scenarios with more cars.
Communication is limited because of the long distances among the cars, so each car can obtain only part of the information about the whole train. By Theorem 6, a strongly connected communication scenario is set as in Figure 4, where the locomotives in the heavy haul train are labeled 1, 2, 3, and 4. The node labeled 0 represents the command imposed on the heavy haul train, which may come from the operator or from an automatic driving system. It is easy to verify that the graph is strongly connected according to the definition in Section 2. The desired speed profile $v_0(t)$ is given in Figure 3, and the initial speeds of the locomotives are set to $v_1(0) = 59.40$ km/h, $v_2(0) = 60.12$ km/h, $v_3(0) = 59.44$ km/h, and $v_4(0) = 59.29$ km/h. When the distributed optimal controller is used, the velocities of the locomotives respond as shown in Figure 5. The horizontal axis shows the train's traveling distance during the simulation, and the vertical axis shows the speed response of the locomotives under the controller designed in this paper. The locomotives in the heavy haul train track the desired speed profile very well under the distributed controller with local performance indices. This is because the controller aims to minimize a performance index $J_i$ in which the disagreement term includes tracking of the desired speed. When the desired speed changes, there is a transient before the locomotives reach the new desired speed. The enlarged inset in Figure 5 (from 7.5 km to 8.5 km) shows the detailed speed dynamics.
The cars' velocity deviation from the desired profile under the distributed optimal controller is shown in Figure 6. The horizontal axis shows the train's traveling distance during the simulation. The vertical axis shows the velocity deviation from the desired profile under the controller designed in our

Figure 1: Two types of network used on heavy haul trains. (a) Wired configuration: the $i$th controller is installed on the $i$th car ($i = 1, \ldots, n$, where $n$ is the number of cars in the train); the controllers communicate with each other through the wired Lonworks protocol, and junction boxes connect adjacent cars. (b) Wireless network configuration of heavy haul trains; the physical connections are omitted from the figure.

Figure 4: Communication topology of simulation system.

Figure 5: Velocity response under distributed optimal controller.
(6), then the estimator of the control input $u_i$ and the costate can be written as $\hat{u}_i$ and the corresponding costate estimate, with real-valued gains. According to the extremum seeking algorithm shown in Figure 2, the estimates are updated by the perturbation law (16), which combines terms in $\sin \omega_i t$ and $\cos \omega_i t$. In the designed extremum seeking algorithm, the performance index is not measured directly but is derived from other measured states according to (6), which depends only on locally available states. $\omega_i$ sets the perturbation frequency and helps avoid convergence to a local optimum for nonconvex systems. For simplicity, $\omega_i$ can be set uniformly to $\omega$ without affecting the result, and the remaining gain is set similarly. By the multivariable extremum seeking algorithm, we aim at forcing the solutions $\hat{u}_i$ and the costate estimates, for every car $i$, finally

Table 1: The simulation parameters.