Event-Triggered Adaptive Dynamic Programming Consensus Tracking Control for Discrete-Time Multiagent Systems

This paper proposes a novel adaptive dynamic programming (ADP) approach to address the optimal consensus control problem for discrete-time multiagent systems (MASs). Compared with traditional optimal control algorithms for MASs, the proposed algorithm is designed on the basis of an event-triggered scheme, which saves communication and computation resources. First, the consensus tracking problem is transformed into an input-to-state stability (ISS) problem. Based on this, the event-triggered condition for each agent is designed and the event-triggered ADP algorithm is presented. Second, neural networks are introduced to simplify the application of the proposed algorithm. Third, the stability analysis of the MASs under the event-triggered conditions is provided, and the estimation errors of the neural networks' weights are proved to be uniformly ultimately bounded. Finally, simulation results demonstrate the effectiveness of the event-triggered ADP consensus control method.


Introduction
Because of their wide applications in the control field [1][2][3][4][5][6], the consensus control of MASs has gained increasing attention. In recent years, quite a few methods have been reported to solve the consensus control problem of MASs, such as adaptive control [7,8] and sliding mode control [9,10]. It is worth mentioning that these methods focus on the stability of the MASs. However, optimality is also worth considering in the consensus control problem. The optimal consensus control problem aims to find the optimal control policies which guarantee the stability of the MASs and minimize the energy cost. As one of the core methods for obtaining optimal control policies, ADP approaches address this issue by approximating the solutions of the Hamilton-Jacobi-Bellman (HJB) equation [11][12][13].
To date, ADP approaches have been applied to the optimal consensus control of MASs [14][15][16][17][18][19][20]. In [14], an optimal coordination control algorithm was designed to address the consensus problem of multiagent differential games through fuzzy ADP. The optimal output regulation of heterogeneous MASs was considered in [15]. Based on this work, Gao et al. [16] considered dynamic uncertainties in the cooperative output regulation problem. Zhang et al. [17,18] considered the optimal consensus tracking control for discrete-time/continuous-time MASs. To address the optimal consensus problem for unknown MASs with input delay, the authors of [19] proposed a data-driven distributed adaptive controller based on the ADP technique. In [20], the problem of data-based optimal consensus control was studied for MASs with multiple time delays. All the above results are based on the assumption that the communication and computing resources are sufficient to transmit system data and update the control policy at every time step. However, this assumption is difficult to satisfy in practice.
Event-triggered control (ETC) is a well-recognized technique to address the above issue [21][22][23][24]. Different from time-triggered control, whether the systems sample the signals depends only on the event-triggered condition. If it is satisfied at some time instant, then the data are transmitted and the control policy is updated. Therefore, compared with control algorithms based on the time-triggered scheme, event-triggered control algorithms can efficiently save computation resources [25]. In past years, ETC has been introduced to solve the optimal control problem under limited computing resources [26][27][28][29]. In [26], an ETC method based on ADP was developed for continuous-time MASs. The authors of [27] considered unknown internal states in the event-triggered optimal control of continuous-time MASs. Multiplayer zero-sum differential games were considered in [28], and an event-triggered optimal consensus tracking controller was designed to solve this problem. In [29], an event-triggered optimal control algorithm was designed for unmatched uncertain nonlinear continuous-time systems. In [30], to save limited network resources, an event-triggered mechanism was introduced to address the consensus problem of linear discrete-time MASs. The authors of [31] considered the event-triggered consensus problem of discrete-time multiagent networks. It is worth noting that all the results in [26][27][28][29] studied event-triggered optimal control for continuous-time MASs, while only a few works [30,31] considered discrete-time MASs.
Motivated by the above discussions, an event-triggered ADP control algorithm is designed to address the optimal consensus tracking problem for discrete-time MASs. The major contributions of this paper are emphasized as follows: (1) Compared with the existing event-triggered ADP consensus control methods [27][28][29], we design an adaptive event-triggered (ET) condition for every agent in the MASs. Each agent samples the data and communicates with its neighbors only when its event-triggered condition is satisfied. This means the agents in the MASs may not communicate with their neighbors or update their control policies at the same time instant, and thus communication resources are saved.
(2) In this paper, we give the stability analysis for the MASs under the event-triggered condition. It shows that all agents in the discrete-time MASs achieve consensus under the ET condition. We also prove that the weight estimation errors of the critic neural networks (NNs) and actor NNs are uniformly ultimately bounded (UUB) during the learning process. The rest of this paper is organized as follows. In Section 2, the discrete-time MASs are described and the consensus problem is formulated. The event-triggered conditions for each agent in the system are introduced and the stability analysis is given in Section 3. Then, the NN-based event-triggered ADP algorithm is introduced in Section 4, and the simulation results of this algorithm are given in Section 5. Finally, the conclusions are drawn in Section 6.

Problem Formulation
Consider the discrete-time MASs

x_i(k + 1) = A x_i(k) + B_i u_i(k),  (1)

where x_i(k) ∈ R^{n×1} and u_i(k) ∈ R^{m_i×1} denote the state and the coordination control of agent i, i ∈ {1, 2, . . . , N}, respectively, and A ∈ R^{n×n} and B_i ∈ R^{n×m_i} are constant matrices. The leader's dynamics are defined as

x_0(k + 1) = A x_0(k),  (2)

where x_0(k) ∈ R^n denotes the state of the leader. The local neighbor consensus tracking error ξ_i is defined as

ξ_i(k) = Σ_{j∈N(i)} α_ij (x_i(k) − x_j(k)) + β_i (x_i(k) − x_0(k)),  (3)

where α_ij denotes the adjacency elements, with α_ij > 0 if agent i can communicate with agent j and α_ij = 0 otherwise, and β_i denotes the pinning gain, with β_i > 0 if agent i can communicate with the leader and β_i = 0 otherwise. We assume that at least one agent can receive information from the leader.

Under the event-triggered scheme, the discrete-time MASs transmit the system data only when an event is triggered. We define the event-triggered instants of agent i as the sequence k_{i,1}, k_{i,2}, . . . , k_{i,p−1}, k_{i,p}, for i = 1, 2, . . . , N and p = 1, 2, . . . , ∞. At the pth event-triggered instant of agent i, the consensus error of agent i is denoted as ξ_i(k_{i,p}). The event-triggered error is defined as

δ_i(k) = ξ_i(k_{i,p}) − ξ_i(k),  k ∈ [k_{i,p}, k_{i,p+1}),

which is the difference between the consensus tracking error at the pth event-triggered instant and the current local neighbor consensus tracking error. Then, the consensus problem of the discrete-time MASs is to find the distributed feedback control law u_i(k) = χ(ξ_i(k_{i,p})), which becomes a continuous signal through a zero-order hold (ZOH) device for k ∈ [k_{i,p}, k_{i,p+1}). The local cost function is defined as

J_i(ξ_i(k)) = Σ_{l=k}^{∞} ρ^{l−k} U_i(ξ_i(l), u_i(l), u_j(l)),

where ρ ∈ (0, 1] is the discount factor,

(i) U_i(ξ_i(k), u_i(k), u_j(k)) is the utility function for agent i, and
(ii) u_j(k) is the control of the neighbors of agent i.
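As an illustrative sketch (not part of the original formulation), the local neighbor consensus tracking error (3) can be computed as follows, assuming scalar agent states and a hypothetical two-agent topology; the function and variable names are ours:

```python
# Sketch of the local neighbor consensus tracking error (3), with scalar
# states and illustrative topology values (not the paper's simulation).
def tracking_error(i, x, x0, alpha, beta):
    """xi_i(k) = sum_j alpha[i][j]*(x[i]-x[j]) + beta[i]*(x[i]-x0)."""
    err = sum(alpha[i][j] * (x[i] - x[j]) for j in range(len(x)))
    err += beta[i] * (x[i] - x0)
    return err

# Example: agent 0 pinned to the leader, agent 1 connected to agent 0.
alpha = [[0, 0], [1, 0]]   # adjacency elements alpha_ij
beta = [1, 0]              # pinning gains beta_i
x, x0 = [1.0, 3.0], 0.5
print(tracking_error(0, x, x0, alpha, beta))  # beta_0*(x[0]-x0) = 0.5
print(tracking_error(1, x, x0, alpha, beta))  # alpha_10*(x[1]-x[0]) = 2.0
```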
According to Bellman's principle, the optimal local cost function J*_i(ξ_i(k)) satisfies

J*_i(ξ_i(k)) = min_{u_i(k)} { U_i(ξ_i(k), u_i(k), u_j(k)) + ρ J*_i(ξ_i(k + 1)) },  (8)

which is also called the discrete-time HJB equation. The optimal distributed control law u*_i(ξ_i(k_{i,p})) is defined as

u*_i(ξ_i(k_{i,p})) = arg min_{u_i(k)} { U_i(ξ_i(k), u_i(k), u_j(k)) + ρ J*_i(ξ_i(k + 1)) }.  (9)
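The Bellman recursion above can be illustrated with a one-step backup over a discretized control grid. This is a hedged scalar sketch: the dynamics, utility weights, and control grid are illustrative placeholders, not the paper's system:

```python
# Minimal sketch of one Bellman backup for the HJB recursion
# J*(xi) = min_u [ U(xi, u) + rho * J*(xi') ], with scalar stand-in
# dynamics xi' = a*xi + b*u and quadratic utility U = q*xi^2 + r*u^2.
def bellman_backup(xi, J_next, a=1.0, b=1.0, q=1.0, r=1.0, rho=0.9,
                   u_grid=None):
    if u_grid is None:
        u_grid = [i / 100.0 for i in range(-300, 301)]  # candidate controls
    best = float("inf")
    for u in u_grid:
        cost = q * xi**2 + r * u**2 + rho * J_next(a * xi + b * u)
        best = min(best, cost)
    return best

# With zero future cost J_next = 0, the minimizer is u = 0:
print(bellman_backup(2.0, lambda s: 0.0))  # -> q*xi^2 = 4.0
```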

Stability Analysis
Assumption 1 (see [32]). There exist positive constants L, L_1, ϕ, and ψ, a C^1 function V: R^n → R_{≥0}, and class K_∞ functions c_1 and c_2 such that conditions (10) and (11) hold. If (10) and (11) are satisfied, the function V is called an ISS-Lyapunov function for the discrete-time MAS.
Let us consider the situation k ∈ [k_{i,p}, k_{i,p+1}), which means that the ET condition was satisfied at the sampling instant k_{i,p}. In this situation, it is obvious that (12) holds. Substituting (1) and (2) into (3), we have (13). Then, we obtain (14). Substituting (9) into (14), we have (15). Therefore, (16) holds. Then, we can rewrite the ET condition as (17) for every k ∈ [k_{i,p}, k_{i,p+1}).

To better illustrate the control process, a flowchart is displayed in Figure 1. The transmitted data and control policies are updated at the instant k_{i,p}, and the event-triggered error is reset to zero. Once the event-triggered condition is satisfied, the current instant becomes the next triggering instant k_{i,p+1}, and the system data are transmitted.
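The triggering logic described above (hold the last transmitted error in the ZOH, reset the event-triggered error at each triggering instant, transmit when the condition is met) can be sketched as follows; the absolute-value threshold rule is a simplification of the paper's condition (17), and the class and parameter names are ours:

```python
# Sketch of the per-agent event-trigger logic (threshold rule is a
# placeholder; the paper's condition (17) depends on system data).
class EventTrigger:
    def __init__(self, threshold):
        self.threshold = threshold
        self.last_sample = None  # xi_i(k_{i,p}), held by the ZOH

    def step(self, xi_current):
        if self.last_sample is None:
            triggered = True  # first instant always transmits
        else:
            delta = self.last_sample - xi_current  # event-triggered error
            triggered = abs(delta) >= self.threshold
        if triggered:
            self.last_sample = xi_current  # transmit data, reset error to 0
        return triggered, self.last_sample

trig = EventTrigger(threshold=0.5)
print(trig.step(1.0))   # first step always triggers  -> (True, 1.0)
print(trig.step(0.8))   # |1.0-0.8| < 0.5, no trigger -> (False, 1.0)
print(trig.step(0.3))   # |1.0-0.3| >= 0.5, trigger   -> (True, 0.3)
```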
Otherwise, keep the transmitted data and control policies unchanged.
Next, we prove that the discrete-time MAS is stable under the proposed event-triggered conditions.

Theorem 1. Consider a discrete-time MAS satisfying Assumption 1. If condition (22) holds for every k ∈ [k_{i,p}, k_{i,p+1}), where φ ∈ (0, 1), then the system is asymptotically stable.
Proof. From (22), we obtain (23) and (24). Applying (9) to (24), we have (25). Since (23) and (25) hold, the stability of the discrete-time MAS is proved. □

Remark 1. We give the event-triggered condition for each agent in the discrete-time MASs. Moreover, the stability of the systems is also proved.

Event-Triggered Controller Design
In this section, considering the good fitting characteristics of neural networks (NNs) [33,34], an actor-critic neural network structure is introduced to approximate the local cost function J_i(ξ_i(k), u_i(k), u_j(k)) and the distributed feedback control law u_i(x). The actor-critic NNs take the general form ω^T Ψ(w^T z), where z denotes the input data, Ψ(·) denotes the activation functions, and w and ω denote the weight matrices of the NNs.

Formulation of the Critic Networks.
The critic NN approximates the local cost function J_i(ξ_i(k), u_i(k), u_j(k)) as

V̂_i(k) = ω_ci^T Ψ_ci(w_ci^T z_ci(k)),

where z_ci(k) denotes the input vector of the critic NN, which is constituted by ξ_i(k), u_i(k), and u_{N(i)}(k); Ψ_ci(·) denotes the activation function of the critic NN; and w_ci and ω_ci are the weight matrices of the critic NN. We define the difference between the current cost value and the estimated value as the error function of the critic NN. Then, the loss function for the critic NN is the squared error function. Our objective is to minimize the loss function during critic NN training. The weights of the critic NN are updated according to the gradient-based rule (30), where K_ci denotes the learning rate.
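A minimal sketch of the gradient-based critic training described above, assuming a scalar input, a single hidden layer with tanh activations, and a fixed input-layer weight; only the output weights are updated here for brevity, and the target value and all dimensions are illustrative:

```python
# Hedged sketch of a gradient-based critic update: the critic
# V(z) = w2 . tanh(w1 * z) is trained by gradient descent on the
# squared error (illustrative stand-in for the paper's rule (30)).
import math

def critic_forward(z, w1, w2):
    h = [math.tanh(wi * z) for wi in w1]          # hidden activations
    return sum(w2j * hj for w2j, hj in zip(w2, h))

def critic_update(z, target, w1, w2, lr=0.05):
    h = [math.tanh(wi * z) for wi in w1]
    e = critic_forward(z, w1, w2) - target        # critic error function
    # gradient of the loss 0.5*e^2 w.r.t. the output weights w2
    w2 = [w2j - lr * e * hj for w2j, hj in zip(w2, h)]
    return w2, 0.5 * e * e

w1, w2 = [0.5, -0.3], [0.1, 0.2]                  # illustrative weights
w2, first_loss = critic_update(1.0, 2.0, w1, w2)
for _ in range(199):
    w2, loss = critic_update(1.0, 2.0, w1, w2)
print(loss < first_loss)  # -> True: the loss shrinks during training
```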

Formulation of the Actor Networks.
The actor NN approximates the distributed control law u_i(k), which can be formulated as

û_i(k) = ω_ai^T Ψ_ai(w_ai^T z_ai(k)),

where z_ai(k) is the input vector of the actor NN, Ψ_ai(·) is the activation function of the actor NN, and ω_ai and w_ai are the weight matrices of the actor NN. We define the difference between the current local cost value V_i(k) and the target cost value P_i(k) as the error function of the actor NN.

In this paper, the target cost value is defined as 0. Then, the loss function for the actor NN is the squared error function. Our objective is to minimize the loss function during actor NN training. The weights of the actor NN are updated according to the gradient-based rule (34), where Ω(k) = ∂Ψ_ci(w_ci^T z_ci)/∂z_ci, C_i = ∂z_ci/∂u_i, and K_ai is the learning rate of the actor NN. The procedure of the NN-based event-triggered optimal consensus control algorithm for discrete-time MASs is shown in Algorithm 1.

Theorem 2. Suppose the weights of the critic NN and the actor NN are updated according to (30) and (34), respectively, under condition (17). Then the state x_i, the critic NN weight estimation error ω̃_ci = ω̂_ci − ω*_ci, and the actor NN weight estimation error ω̃_ai = ω̂_ai − ω*_ai of the closed-loop system are UUB.

Proof.
Case 1: the ET condition is satisfied at iteration index k.
The Lyapunov function for agent i can be defined as

L_i(k) = L_{i,1}(k) + L_{i,2}(k) + L_{i,3}(k),  (35)

where L_{i,1}(k) depends on the system state, L_{i,2}(k) = (1/K_ci) · tr[ω̃_ci^T(k) ω̃_ci(k)], and L_{i,3}(k) = (1/K_ai) · tr[ω̃_ai^T(k) ω̃_ai(k)]. The difference between L_{i,1}(k + 1) and L_{i,1}(k) can be given as (36). The difference between L_{i,2}(k + 1) and L_{i,2}(k) can be given as (37). According to the update rule for the weight matrix of the critic NN (30), we have (38), where η(k) = −ρΨ_ci(w_ci^T z_ci(k + 1)) + Ψ_ci(w_ci^T z_ci(k)).

Substituting (38) into (37), we have (39).
The difference between L_{i,3}(k + 1) and L_{i,3}(k) can be given as (40). According to the update rule for the weight matrix of the actor NN (34), we have (41). Substituting (41) into (40), we have (42). Combining (36), (39), and (42), the difference ΔL(k) = L(k + 1) − L(k) is given as (43). When the corresponding norm is no less than the bound D_mi, the difference satisfies ΔL(k) < 0. This means the states of the system and the errors of the weight matrices of the critic NN and actor NN are UUB. Case 2: if the ET condition is not satisfied at iteration instant k, consider the Lyapunov function (35) in Case 1.
This means that when the ET condition is not satisfied at the time index k, the states of the system and the errors of the weight matrices of the critic NN and actor NN are still UUB.
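For illustration, the gradient-based actor update discussed in Section 4 can be mimicked with a scalar linear-quadratic stand-in; the dynamics, utility weights, and learning rate below are illustrative placeholders, not the paper's system:

```python
# Hedged sketch of an actor update in the spirit of (34): the actor gain w
# in u = -w*xi is nudged along the gradient that drives the local cost
# V = q*xi^2 + r*u^2 + rho*q*xi'^2 toward the target value P_i = 0, with
# stand-in dynamics xi' = a*xi + b*u (all values illustrative).
def actor_update(w, xi, a=1.0, b=1.0, q=1.0, r=0.1, rho=0.9, lr=0.01):
    u = -w * xi
    xi_next = a * xi + b * u
    # dV/du, then dV/dw via the chain rule with du/dw = -xi
    dV_du = 2 * r * u + rho * 2 * q * xi_next * b
    return w - lr * dV_du * (-xi)

w = 0.0
for _ in range(500):
    w = actor_update(w, xi=1.0)
print(round(w, 3))  # approaches the cost-minimizing gain 0.9
```

With these stand-in parameters the analytic minimizer is u = −0.9 ξ, and the gradient iteration converges to that gain.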

Simulation Analysis
To test the effectiveness of the proposed algorithm, we apply it to a numerical example. Consider a discrete-time leader-follower MAS consisting of 4 agents with the network topology shown in Figure 2. In the topology, agent 0 denotes the leader and the followers are labeled agent 1 to agent 4. The adjacency elements α_21, α_31, and α_42 are set to 1; the other adjacency elements are set to 0. In this numerical example, only agent 1 can communicate with the leader, which means β_1 = 1 and β_2 = β_3 = β_4 = 0. The weight matrices of the utility function are selected as positive definite matrices Q_ii, R_ii, and S_ij. The parameters for the critic NN and the actor NN are set to ρ = 0.9, K_c1 = K_c2 = K_c3 = 0.01, K_c4 = 0.001, and K_a1 = K_a2 = K_a3 = K_a4 = 0.01. The state trajectories of the agents are shown in Figure 3. From Figure 3, we can observe that all the agents in the system reach the same state as the leader, and thus they achieve synchronization. The driving errors of the agents in the system are shown in Figure 4. The driving errors are not updated at every instant k; that is, the agents are driven only when the ET condition is satisfied. Figure 5 shows the comparisons of the event-triggered errors and the thresholds for every agent in the system. In Figure 5, we can observe that the event-triggered errors are always smaller than the thresholds during the tracking process, and the data are sampled only when needed.
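The simulation topology described above can be encoded directly from the stated adjacency and pinning values; the 0-indexed encoding and the reachability check below are ours, added only to illustrate the standing assumption that every follower has a path to the leader:

```python
# Topology from the text: alpha_21 = alpha_31 = alpha_42 = 1, beta_1 = 1.
# Followers 1..4 are encoded as indices 0..3 (encoding choice is ours).
N = 4
alpha = [[0] * N for _ in range(N)]
alpha[1][0] = 1   # alpha_21: agent 2 receives from agent 1
alpha[2][0] = 1   # alpha_31: agent 3 receives from agent 1
alpha[3][1] = 1   # alpha_42: agent 4 receives from agent 2
beta = [1, 0, 0, 0]  # only agent 1 is pinned to the leader

# Check that every follower can reach the leader's information.
reachable = {i for i in range(N) if beta[i] > 0}
changed = True
while changed:
    changed = False
    for i in range(N):
        if i not in reachable and any(alpha[i][j] and j in reachable
                                      for j in range(N)):
            reachable.add(i)
            changed = True
print(sorted(reachable))  # -> [0, 1, 2, 3]: all followers are covered
```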

Initialization:
Give the computation precision τ and the initial state x_i(0) for agent i;
Give the initial state x_0(0) for the leader;
Select the learning rates K_ai and K_ci;
Give the positive matrices Q_ii, R_ii, and S_ij;
Initialize the event-triggered error condition δ_iT(0) = 0;
Select the positive constant L;
Iteration:
Let the iteration index k = 0;
repeat:
  Calculate the tracking error ξ_i(k) and the event-triggered error δ_i(k);
  IF the event-triggered condition is satisfied:
    Compute the control law u_i(k);
    Compute the local cost function V_i(k);
    Compute the next state x_i(k + 1) of agent i and the next state x_0(k + 1) of the leader agent;
    Calculate the next tracking error ξ_i(k + 1);
    Compute the control law u_i(k + 1);
    Compute the local cost function V_i(k + 1);
    Update the weight matrix of the critic NN;
    Update the weight matrix of the actor NN;
  ELSE:
    Keep the control law u_i(k) = u_i(k − 1);
    Compute the next state x_i(k + 1) of agent i and the next state x_0(k + 1) of the leader agent according to the model NN;
until the stopping criterion with the computation precision τ is met.

We sample the data only when the event-triggered errors are larger than or equal to the thresholds, so less data are sampled and computing resources are saved with our algorithm. Figure 6 shows the comparisons of the required number of transmitted data under the time-triggered and event-triggered ADP algorithms for every agent in the system. We can observe that the required number for the event-triggered algorithm is much less than that for the time-triggered algorithm.
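Algorithm 1 can be condensed into an executable sketch for a single follower tracking a scalar leader; a fixed feedback gain replaces the NN updates, and all numerical values are illustrative placeholders, not the paper's simulation parameters:

```python
# End-to-end sketch of the event-triggered loop in Algorithm 1 for one
# follower pinned to a scalar leader (gain, threshold, and dynamics are
# illustrative; the NN-based policy update is replaced by a fixed gain).
a, b = 1.0, 1.0            # agent dynamics x' = a*x + b*u
x, x0 = 5.0, 0.0           # follower and leader states
gain, threshold = 0.5, 0.1
u, xi_held = 0.0, None     # ZOH holds the last transmitted error
triggers = 0
for k in range(60):
    xi = x - x0                          # pinned tracking error (beta = 1)
    delta = 0.0 if xi_held is None else xi_held - xi
    if xi_held is None or abs(delta) >= threshold:
        xi_held = xi                     # transmit data, update control law
        u = -gain * xi_held
        triggers += 1
    # ELSE branch: keep the control law unchanged (no communication)
    x = a * x + b * u                    # agent state update
    x0 = a * x0                          # leader state update
print(abs(x - x0) < 0.2, triggers < 60)  # consensus, fewer transmissions
```

The loop reaches the leader's state while transmitting far fewer than 60 times, mirroring the resource savings reported for the event-triggered scheme.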

Conclusion
An event-triggered optimal consensus tracking control algorithm based on the ADP structure is proposed in this paper. To save communication and computation resources, we introduce the event-triggered scheme into the optimal consensus tracking control algorithm. Neural network technology is introduced to simplify the application of the proposed algorithm. It is proved that the discrete-time MASs are stable with the proposed algorithm and that the estimation errors of the NN weights are UUB. The simulation results illustrate the efficiency of the proposed method.

Data Availability
All data included in this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.