Distributed Model Predictive Control of the Multi-Agent Systems with Improving Control Performance

This paper addresses a distributed model predictive control (DMPC) scheme for multiagent systems with improved control performance. To penalize the deviation of the computed state trajectory from the assumed state trajectory, a deviation punishment term is included in the local cost function of each agent. Closed-loop stability is guaranteed with a large weight on the deviation punishment. However, such a large weight causes a considerable loss of control performance. Hence, time-varying compatibility constraints are designed for each agent to balance closed-loop stability and control performance, so that closed-loop stability is achieved with a small weight on the deviation punishment. A numerical example is given to illustrate the effectiveness of the proposed scheme.


Introduction
Interest in the cooperative control of multiagent systems has grown significantly in recent years. The main motivation is the wide range of military and civilian applications, including formation flight of UAVs and automated traffic systems. Compared with traditional approaches, model predictive control (MPC), or receding horizon control (RHC), has the ability to redefine cost functions and constraints as needed to reflect changes in the system and/or the environment. Therefore, MPC is extensively applied to the cooperative control of multiagent systems, which lets the agents operate close to the constraint boundaries and obtain better performance than traditional approaches [1][2][3]. Moreover, due to its computational advantages and the convenience of communication, distributed MPC (DMPC) is recognized as a natural technique for addressing trajectory optimization problems for multiagent systems.
One of the challenges for distributed control is to ensure that local control actions remain consistent with the actions of other agents [4,5]. For coupled systems, the local optimization problem is solved based on the states of the neighbors at each sample time instant using a Nash-optimization technique in [6]. As the local controllers lack communication and cooperation, the local control actions cannot remain consistent. [7,8] require each local controller to exchange information with all other local controllers to improve optimality and consistency based on sufficient communication. For decoupled systems, [9] exploits an estimate of the predicted state trajectories of the neighbors; [10] treats the predicted state trajectories of the neighbor agents as a bounded disturbance, where a min-max optimization problem is solved for each agent with respect to the worst-case disturbance. In [11,12], the decision variables of the local optimization problem contain the control actions of the agent itself and of its neighbors, which are coupled in the collision avoidance constraints and the cost function. Obviously, the deviation between what an agent is actually doing and what its neighbor estimates for it affects the control performance. Sometimes consistency and collision avoidance cannot be achieved, and the feasibility and stability of this scheme cannot be guaranteed. [13] proposes a distributed MPC with a fixed compatibility constraint to restrict the deviation. When the bound of this constraint is sufficiently small, the closed-loop system state enters a neighborhood of the objective state. [14,15] improve on [13] by adding a deviation punishment term that penalizes the deviation of the computed state trajectory from the assumed state trajectory. Closed-loop exponential stability follows if the weight on the deviation punishment term is large enough. However, a large weight leads to a loss of control performance.
The contribution of this paper is an idea for reducing the adverse effect of the deviation punishment on the control performance. At each sample time, the value of the compatibility constraint is set to the maximum value of the deviation at the previous sample time. We give a stability condition that guarantees exponential stability of the global closed-loop system with a small weight on the deviation punishment term; the condition is obtained by dividing the centralized stability constraint in the manner of [16,17]. The effectiveness of the scheme is also demonstrated by a numerical example.

Notations. $x^i_k$ is the value of vector $x^i$ at time $k$. $x^i_{k,t}$ is the value of vector $x^i$ at a future time $k+t$, predicted at time $k$. $|x|$ is the absolute value for each component of $x$. For a vector $x$ and positive-definite matrix $Q$, $\|x\|_Q^2 = x^T Q x$.

Problem Statement
Let us consider a system composed of $N_a$ agents. The dynamics of agent $i$ [11] is where T . The sets of feasible inputs and states of agent $i$ are denoted as $U^i \subset \mathbb{R}^{m_i}$ and $X^i \subset \mathbb{R}^{n_i}$, respectively, that is, At each time $k$, the control objective is [18] to minimize with respect to $u_{k,t}$, $t \ge 0$, where where is the equilibrium point of agent $i$, and $(x_e, u_e)$ is the corresponding equilibrium point of all agents. The models for all agents are completely decoupled. The coupling between agents arises because they operate in the same environment and because the "cooperative" objective is imposed on each agent by the cost function. Hence, there are a coupling cost function and coupling constraints [19]. The coupling constraints can be transformed into a coupling cost function term directly or handled as decoupled constraints using the technique of [15]. In the present paper we will not consider this issue.
The control objective for the whole system is to cooperatively and asymptotically stabilize all agents to an equilibrium point $(x_e, u_e)$ of (4). In this paper we assume that $(x_e, u_e) = (0, 0)$ and $f(x_e, u_e) = 0$. The corresponding equilibrium point for each agent is $(x^i_e, u^i_e) = (0, 0)$, with $f^i(x^i_e, u^i_e) = 0$. The assumption $f^i(0, 0) = 0$ is not restrictive, since if $(x^i_e, u^i_e) \neq (0, 0)$, one can always shift the origin of the system to it.
The resultant control law for the minimization of (3) can be implemented in a centralized way. However, the existing methods for centralized MPC are only computationally tractable for small-scale systems. Furthermore, the communication cost of implementing a centralized receding horizon control law may be high. Hence, by means of decomposition, $J_k$ is divided into local costs $J^i_k$ such that the minimization of (3) is implemented in a distributed manner, with where [19]. For agents that have decoupled dynamics, the couplings of control moves across the whole system are not considered. $R$ is a diagonal matrix and $R_i$ is directly obtained.
In a networked environment, bandwidth limitations can restrict the amount of information exchange [17]. It is thus appropriate to allow agents to exchange information only once in each sampling interval. We assume that the connectivity of the interagent communication network is sufficient for agents to obtain information regarding all the variables that appear in their local problems.
In receding horizon control, a finite-horizon cost function is used to approximate $J^i_k$. According to (5), the evolution of the control moves over the prediction horizon for agent $i$ is based on an estimate of the neighbors' state trajectories $x^{-i}_{k,t}$, $t \le N$, which are replaced by the assumed state trajectories as in [11]. In each control interval, the information transmitted between agents is the assumed state trajectories. Since the cooperative consistency and efficiency of the distributed control moves are affected by the deviation of the computed state trajectory from the assumed state trajectory, it is appropriate to penalize this deviation by adding a deviation punishment term to the local cost function. Define where $x^{-i}_{k,t}$ includes the assumed states of the neighbors.
Obviously, $Q_i$ is designed to stabilize agent $i$ to the local equilibrium point independently, and the weight on the coupling term is designed to stabilize agent $i$ to the local equilibrium point cooperatively with its neighbor agents. $T_i$ is the weight on the deviation punishment term, which penalizes the deviation of the computed state trajectory from the assumed state trajectory. At each time $k$, the optimization problem for distributed MPC is transformed into problem (9), subject to (1), (2), (6), (7). The control move (10) is implemented, and problem (9) is solved again at time $k+1$.
Remark 1. The local deviation punishment imposed by each agent affects the control performance; that is, it incurs a loss of optimality.
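The trade-off described in Remark 1 can be illustrated with a minimal Python sketch of a local cost that includes a deviation punishment term. This is an illustration only, not the paper's cost function: the diagonal weights `q_diag` and `r_diag`, the scalar weight `t_weight` (standing in for $T_i$), and the function names are assumptions, and the coupling terms with neighbors and the terminal cost are omitted.

```python
def weighted_sq_norm(x, w):
    """Return sum_j w[j] * x[j]**2, a squared norm with a diagonal weight."""
    return sum(wj * xj * xj for wj, xj in zip(w, x))


def local_cost(pred_states, inputs, assumed_states, q_diag, r_diag, t_weight):
    """Finite-horizon cost of one agent with a deviation punishment term.

    Assumed structure (hypothetical, for illustration): state cost + input
    cost + t_weight * ||predicted state - assumed state||^2 at every step.
    """
    total = 0.0
    for x, u, x_assumed in zip(pred_states, inputs, assumed_states):
        deviation = [a - b for a, b in zip(x, x_assumed)]
        total += weighted_sq_norm(x, q_diag)          # state cost
        total += weighted_sq_norm(u, r_diag)          # input cost
        total += t_weight * sum(d * d for d in deviation)  # deviation punishment
    return total
```

With `t_weight = 0` the sketch reduces to a standard quadratic cost; increasing `t_weight` makes any departure from the assumed trajectory more expensive, which is the source of the loss of optimality noted above.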

Stability of Distributed MPC
This section discusses the stability of distributed MPC obtained by simply applying the procedure used in centralized MPC. The compact and convex terminal set $\Omega_i$ is defined where $P_i > 0$, $\alpha_i > 0$ are specified such that $\Omega_i$ is a control invariant set. Using the idea of [20,21], one simultaneously determines a linear feedback such that $\Omega_i$ is positively invariant under this feedback. Define the local linearization at the equilibrium point and assume that $(A_i, B_i)$ is stabilizable. When $x^i_{k,N+t}$, $t \ge 0$, enters the terminal set $\Omega_i$, the local linear feedback control law is assumed as is a constant which is calculated off line as follows.

Design of the Local Control Law.
The following condition is imposed to achieve closed-loop stability:
Lemma 1. Suppose that there exist $Q_i > 0$, $R_i > 0$, $P_i > 0$ that satisfy the Lyapunov equation: for some $\kappa_i > 0$. Then, there exists a constant $\alpha_i > 0$ such that $\Omega_i$ defined in (11) satisfies (13).
Remark 2. Lemma 1 is directly obtained by referring to "Lemma 1" in [21]. For MPC, the stability margin can be adjusted by tuning the value of $\kappa_i$ according to Lemma 1.
With regard to DMPC, [11] adjusts the stability margin by tuning the weight in the local cost function. The control objective is to asymptotically stabilize the closed-loop system, so that $x^i_{k,\infty} = 0$ and $u^i_{k,\infty} = 0$. For $t = 0, \ldots, \infty$, summing (13) obtains Considering both (7) and (15), yields where $J^i_k$ is a finite-horizon cost function, which consists of a finite-horizon standard cost, specifying the desired control performance, and a terminal cost, penalizing the states at the end of the finite horizon.
The terminal region $\Omega_i$ for agent $i$ is designed so that it is invariant for the nonlinear system controlled by a local linear state feedback. The quadratic terminal cost $\|x^i_{k,N}\|^2_{P_i}$ bounds the infinite-horizon cost of the nonlinear system starting from $\Omega_i$ and controlled by the local linear state feedback.
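The role of the terminal set and the local linear feedback can be sketched in Python. The sketch assumes the common ellipsoidal form $\Omega_i = \{x : x^T P_i x \le \alpha_i\}$ and a feedback of the form $u = K_i x$; both forms, and all function names, are assumptions made for illustration rather than definitions taken from the paper.

```python
def quad_form(x, p):
    """Return x^T P x for a vector x (list) and a square matrix p (list of rows)."""
    n = len(x)
    return sum(x[r] * p[r][c] * x[c] for r in range(n) for c in range(n))


def in_terminal_set(x, p, alpha):
    """Check membership in the assumed ellipsoid {x : x^T P x <= alpha}."""
    return quad_form(x, p) <= alpha


def terminal_control(x, k_gain):
    """Assumed local linear feedback u = K x, used once x is inside the set."""
    return [sum(k_row[c] * x[c] for c in range(len(x))) for k_row in k_gain]
```

For example, with `p = [[2.0, 0.0], [0.0, 1.0]]` and `alpha = 1.0`, the state `[0.5, 0.5]` lies inside the set, while `[1.0, 1.0]` does not.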

Compatibility Constraint for Stability.
As in [18], we define two terms, ξ
Lemma 2. Suppose that (9) holds and there exists $\rho(k)$ such that, for all $k > 0$, Then, by solving the receding-horizon optimization problem min subject to (1), (2), (14), (16), and implementing $u^{*i}_{k,0}$, the stability of the global closed-loop system is guaranteed once a feasible solution at time $k = 0$ is found.
Hence, it leads to where $\lambda_{\min}(Q)$ is the minimum eigenvalue of $Q$. This indicates that the closed-loop system is exponentially stable. Satisfaction of (18) indicates that all $x^i_{k,t}$ should not deviate too far from their assumed values [13]. Hence, (18) can be taken as a new version of the compatibility condition. This compatibility condition is derived from a single compatibility condition that collects all the states (whether predicted or assumed) within the switching horizon and is disassembled to each agent in a distributed manner, which results in a local compatibility constraint for each agent.

Synthesis Approach of Distributed MPC.
In the synthesis approach, the local optimization problem incorporates the above compatibility condition. Since $x^{*}_{k,t}$ of each agent $i$ is coupled with other agents through (18), it is necessary to assign the constraint to each agent so as to satisfy (18) during the optimization. The continued discussion on stability depends on the handling of (18).
Remark 3. By adding the deviation punishment term to the local cost function, closed-loop stability follows with a large weight. A larger weight means a greater loss of performance [14,19]. For a small value of $T_i$, we can adjust the value of $\rho_i(k)$ to obtain exponential stability. As $\rho_i(k)$ is set by optimization, this scheme has more freedom in tuning parameters to balance closed-loop stability and control performance.
Remark 4. According to (31), the maximum and minimum values of $T_i$ can be calculated by considering the range of each variable. We choose the middle value for $T_i$.
Obviously, $T_i$ is time varying and is denoted as $T_i(k)$.

Control Strategy
For practical implementation, distributed MPC is formulated in the following algorithm.

Algorithm. Off-line stage:
(i) Set the value of the prediction horizon N.
(iv) Calculate the terminal weight P i , local linear feedback control gain K i and the terminal set Ω i .
On-line stage: For agent $i$, perform the following steps at $k = 0$:
(i) Take the measurement of $x^i_0$. Set $T_i = 0$.
(ii) Send $x^i_0$ to each neighbor $j$, $j \in N_i$, of agent $i$. Receive $x^j_0$.
(iii) Set $x^j_{0,t} = x^j_{0,0}$, $j \in N_i$, $t = 0, \ldots, N-1$, and $x^i_{0,t} = x^i_0$.
(iv) Solve problem (19).
(v) Implement $u^i_0 = u^{*i}_{0,0}$.
(vi) Get $x^i_{0,t}$ and the value of the compatibility constraint $E_i(1)$.
(vii) Send $x^i_{0,t}$ and $E_i(1)$ to each neighbor $j$, $j \in N_i$. Receive $x^j_{0,t}$ and $E_j(1)$. Calculate $T_i(k)$.
For agent $i$, perform the following steps at $k > 0$:
(i) Take the measurement of $x^i_k$.
(ii) Solve problem (19).
(iii) Implement $u^i_k = u^{*i}_{k,0}$.
(iv) Get $x^i_{k,t}$ and the new value of the compatibility constraint $E_i(k+1)$.
(v) Send $x^i_{k,t}$ and $E_i(k+1)$ to each neighbor $j$, $j \in N_i$. Receive $x^j_{k,t}$ and $E_j(k+1)$.
(vi) Calculate $T_i(k)$.
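The update of the compatibility bound in the on-line stage can be sketched in Python as follows. The sketch follows the statement in the Introduction that the bound is set to the maximum deviation at the previous sample time; measuring the deviation as the largest componentwise absolute difference over the horizon is an assumption, and the function names are hypothetical.

```python
def max_deviation(pred_states, assumed_states):
    """Largest componentwise |predicted - assumed| over the whole horizon.

    Assumption: the deviation is measured componentwise in absolute value.
    """
    return max(
        abs(p - a)
        for x_pred, x_assumed in zip(pred_states, assumed_states)
        for p, a in zip(x_pred, x_assumed)
    )


def is_compatible(pred_states, assumed_states, bound):
    """Check a compatibility constraint of the assumed form deviation <= bound."""
    return max_deviation(pred_states, assumed_states) <= bound


def next_bound(pred_states, assumed_states):
    """Bound for time k + 1, taken as the maximum deviation observed at time k."""
    return max_deviation(pred_states, assumed_states)
```

In this sketch the bound shrinks automatically whenever the computed trajectories move closer to the assumed ones, which is what distinguishes a time-varying bound from a fixed one.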

Numerical Example
We consider the model of agent $i$ [22] as which is obtained by discretizing the continuous-time model T (here $q^{i,x}_k$ and $q^{i,y}_k$ are positions in the horizontal and vertical directions, respectively, and $v^{i,x}_k$ and $v^{i,y}_k$ are velocities in the horizontal and vertical directions, respectively) with a sampling time interval of 0.5 second. There are four agents. A set of positions of the four agents constitutes a formation. The initial positions of the four agents are Linear constraints on states and input are The agents $i$, $i = 1, 2, 3$, are selected as the core agents of the formation. $A_0$ is designed as If all systems achieve the desired formation and the core agents cooperatively cover the virtual leader, then $u$ The global cost function is obtained as They cooperatively track the virtual leader whose reference is $q_c = (0.5k, 0)$. The distance between agents is defined as $c_{12} = (-2, 1)$, $c_{13} = (-2, -1)$, $c_{24} = (-2, 1)$. Choose $N_1 = \{2\}$, $N_2 = \{1\}$, $N_3 = \{1\}$, $N_4 = \{2\}$. Then, and The above choice of model, cost, and constraints allows us to rewrite problem (19) as a quadratic program with quadratic constraints. To solve the optimal control problems numerically, the package NPSOL 5.02 is used.
From top to bottom, the first sub-graph of Figure 1 shows the evolution of the formation with centralized MPC; the second sub-graph shows the evolution of the formation with distributed MPC with the time-varying compatibility constraint; the third sub-graph shows the evolution of the formation with distributed MPC with a fixed compatibility constraint. With all three control schemes, the formation of all agents is achieved. The obtained values of $J_{\text{true}}$ are $2.5779 \times 10^6$, $4.8725 \times 10^6$, and $5.654 \times 10^6$, respectively. Compared with the second sub-graph, the third sub-graph has a large overshoot at the time instant $k = 9$ (near the position (3, 0)). The distributed MPC with the time-varying compatibility constraint has a better control process than the one with the fixed compatibility constraint. The value of $\rho_i(k)$ is shown in Figure 2. "*" for agent 1; "O" for agent 2; ">" for agent 3; "<" for agent 4.
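A model of this kind can be sketched in Python. The sketch assumes that the continuous-time model is a planar double integrator (position driven by velocity, velocity driven by the input) discretized with a zero-order hold at the stated sampling interval of 0.5 second; this form is an assumption, since only the state components and the sampling interval are given in the text. The function names are hypothetical.

```python
def step(state, u, dt=0.5):
    """One step of an assumed discretized planar double integrator.

    state = [qx, qy, vx, vy] (positions and velocities), u = [ux, uy].
    Zero-order-hold discretization: q+ = q + dt*v + 0.5*dt^2*u, v+ = v + dt*u.
    """
    qx, qy, vx, vy = state
    ux, uy = u
    return [
        qx + dt * vx + 0.5 * dt * dt * ux,
        qy + dt * vy + 0.5 * dt * dt * uy,
        vx + dt * ux,
        vy + dt * uy,
    ]


def leader_position(k):
    """Virtual leader reference q_c = (0.5 * k, 0) from the example."""
    return (0.5 * k, 0.0)
```

Under these assumptions, an agent moving at unit horizontal velocity with zero input advances 0.5 in the horizontal direction per step, the same distance the virtual leader reference advances per step.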
Remark 5. In the simulation with the fixed compatibility constraint, the value of the constraint is 0.2. In the simulation with the time-varying compatibility constraint, the values of the constraint are calculated according to the state deviation of the previous horizon.

Conclusions
In this paper, we have proposed an improved distributed MPC scheme for multiagent systems based on deviation punishment. One feature of the proposed scheme is that the cost function of each agent penalizes the deviation between the predicted state trajectory and the assumed state trajectory, which improves the consistency and the optimality of the control trajectory. At each sample time, the value of the compatibility constraint is set by the deviation at the previous sample time instant. Closed-loop stability is guaranteed with a small value for the weight on the deviation punishment term. Furthermore, the effectiveness of the scheme has been investigated by a numerical example. Future work will focus on the feasibility of the optimization.

Figure 1: Evolutions of the formation with different control schemes.