Optimal Control for Networked Control Systems with Markovian Packet Losses

This paper is concerned with the optimal output feedback control problem for networked control systems (NCSs) with Markovian packet losses. Packet losses occur both between the sensor and the controller and between the controller and the actuator, and both packet loss channels are described by two-state Markov chains. Since the precise state information cannot be obtained, an optimal recursive estimator is designed. Furthermore, by adopting the dynamic programming approach, we derive the optimal output feedback control, which is based on the solution to a modified Riccati equation. The obtained results can be seen as an important contribution to the control theory for NCSs with unreliable communication channels.


Introduction
As is well known, NCSs are spatially distributed systems in which the communication between sensors, actuators, and controllers occurs through a shared communication network (see, e.g., [1,2] and the references therein). NCSs connect cyberspace to physical space and can perform many tasks over long distances, which gives them practical application value. NCSs also have the advantages of low cost, light weight, and simple structure, and they can improve the reliability of the system, so they are widely used in various fields such as remote surgery [3], unmanned aerial vehicles [4], and artificial intelligence [5]. It is worth noting that the study of networked control has become an important direction of control science in recent years (see, e.g., [6][7][8][9]).
In communication networks, limited bandwidth and load capacity, network congestion, and connection interruptions can result in data losses, retransmissions, and blocked or severed communication links (see, e.g., [10,11] and the references therein). In particular, data packet losses may occur in NCSs both from sensors to controllers and from controllers to actuators. In addition, the communication between the sensor nodes and the remote state estimator is implemented via a shared network, where only one sensor node is permitted to transmit data at each time instant in order to prevent data collisions [12]. To evaluate the estimation accuracy, various state estimation performance requirements have been introduced to quantify engineering specifications, such as the H∞ constraint, the minimum mean squared error (MMSE) index, the ellipsoidal bound constraint, the linear quadratic performance index, and the ultimate boundedness requirement [13]. For example, Zou et al. [14] designed a moving horizon estimator for linear systems with unknown inputs and quantization effects.
Although a closed-loop control system is robust to changes of structure and parameters, packet losses inevitably lead to performance degradation and may even destabilize the system. Thus, handling packet losses is a key problem in the analysis and design of NCSs, which is the motivation of this paper. Many researchers have shown interest in networked control for NCSs with unreliable packet loss channels, and great contributions have been made in this research field (see, e.g., [15][16][17][18]). In [15], a switched system approach is adopted to deal with the problem of fault detection (FD) for uncertain delta operator NCSs with packet losses and time-varying delays, where two independent Bernoulli distributed white sequences are introduced to account for the packet losses. Qi and Zhang [18] considered the optimal measurement feedback control and stabilization for NCSs with packet losses. However, they only considered packet losses in one channel, and the packet loss process there is a Bernoulli process.
In general, the process of data packet losses in communication networks is modeled as either an independent and identically distributed (i.i.d.) Bernoulli process or a Markov chain. Obviously, the latter case of Markovian packet losses is more general and realistic. For example, Wang et al. [19] considered data packet reception between the controller and the actuator at each time instant of the network transmission, described by Markovian packet losses. Zhou and Zhang [20] investigated H-infinity fault detection for time-delay delta operator systems with random two-channel packet losses and limited communication, where the random two-channel packet losses are described by a Markov chain. Moreover, the stability of NCSs with simultaneous input delay and Markovian packet losses was studied in [21]; however, the problem of two-channel packet losses was not considered there. Overall, whether a network transmission succeeds at the next time instant may depend on the current time instant, and a Markov process can describe exactly this dependency between events. Hence, it is more suitable to model NCSs with this kind of packet losses than with i.i.d. stochastic processes. In order to study NCSs with packet losses, it is necessary to combine NCSs modeled by Markov jump systems (MJSs) with optimal control. Furthermore, due to the existence of unreliable packet loss channels, the precise state information may not be available to the controller. Therefore, an observer or estimator is needed to provide the feedback for regulating the system performance, i.e., an output feedback controller should be designed. It is noted that, over the past several decades, many papers have been published on output feedback control, especially for NCSs (see, e.g., [22][23][24][25][26][27][28][29]). For example, Wu and Chen [22] studied the design of networked control systems with packet losses, while Tan et al.
[25] investigated the stabilization of networked control systems with network-induced delay and packet losses. Then, Shah and Mehta [26] proposed a design method for a discrete-time sliding mode controller based on Thiran's delay approximation, which considered the real-time situation of the networked medium and packet losses. Shah et al. [29] presented a discrete-time higher-order sliding mode controller for NCSs using an event-triggered approach and time delay compensation to overcome network abnormalities, such as communication delay, congestion, and network utilization, that degrade the performance of NCSs.
Building on the previous work [30], in this paper we address the optimal control of NCSs with two kinds of Markovian packet loss channels. The model of the NCS under consideration is shown in Figure 1. θ_k indicates whether the state signal is lost when data is transmitted from the sensor to the controller, and β_k indicates the packet loss when data is transmitted from the controller to the actuator. Song et al. [30] addressed the modeling and guaranteed cost control for a class of NCSs with packet dropouts.
The packet dropout processes in the forward channel and the feedback channel are modeled as two Markov chains, and the overall closed-loop NCS is modeled as a Markovian switched system with two modes. The differences between this paper and [30] are as follows. Firstly, the system model of this paper differs from that considered in [30]: the system considered in this paper contains additive noise, whereas the system in [30] has none. Moreover, Song et al. [30] assumed the controller form v(k) = Kw(k) in their design, while we do not assume any form of the controller when considering the optimal control problem. Finally, Song et al. [30] studied the stochastic stability problem and gave a sufficient condition for stochastic stability with an upper bound on the quadratic cost function, whereas in this paper we investigate the optimal control problem without considering stability.
For the considered NCS model, we solve a linear quadratic (LQ) optimal control problem, whereas previous works mainly focused on the i.i.d. Bernoulli packet loss case and the single packet loss channel case. In addition, it should be pointed out that an estimator must be designed in the presence of θ_k, which may cause the separation principle to fail. As pointed out in [31], the controller was designed with the linear minimum mean square error (LMMSE) estimator, but it was shown that the LMMSE was not ideal and the popular separation principle was invalid. Finally, it is noted that the Kalman filter [32][33][34] cannot be directly adopted in designing the optimal estimator. In this paper, by adopting the optimality principle of the dynamic programming approach, we solve the basic optimal LQ control problem for NCSs with Markovian packet losses, and the optimal estimator is also proposed. The main contributions can be summarized as follows. First of all, some preliminary results are presented and an optimal estimator (the conditional expectation) is given for the NCSs with Markovian packet losses. Accordingly, the error covariance matrices associated with the optimal estimator can also be calculated recursively. Consequently, the value function is defined. Then, an induction method is used to derive the optimal control strategies and the associated value function by the optimality principle. We show that the optimal controls are given in terms of a modified Riccati equation, which is well defined and can be calculated backward under the basic assumption made in this paper. The obtained results are new to the best of our knowledge and can be regarded as an important development for NCSs with Markovian packet losses. The rest of this paper is organized as follows. Section 2 introduces the system model and states the problems. Section 3 gives the main results. In Section 4, some numerical examples are given to verify the obtained results.
Finally, we conclude this paper in Section 5.
Throughout this paper, the following notations will be used. R^n denotes the n-dimensional Euclidean space, A^T denotes the transpose of matrix A, E[·] is the mathematical expectation, and E[·|F_k] is the conditional expectation with respect to F_k. P(X) denotes the probability that the event X occurs, and P(X|Y) is the conditional probability. For a real symmetric matrix A, A > 0 or A ≥ 0 signifies that A is positive definite or positive semidefinite, respectively. Tr(B) denotes the trace of matrix B, and N(μ, Σ) denotes the normal distribution with mean μ and covariance Σ.

Problem Statement
In this paper, we consider the following NCS, which is a linear time-invariant system: where x_k ∈ R^n is the system state, u_k ∈ R^m is the control input, and ω_k ∈ R^p denotes the system noise with zero mean and covariance Σ_ω. A, B, and M are given coefficient matrices with appropriate dimensions. β_k ∈ {0, 1} is a two-state Markov chain used to denote the Markovian packet losses when data is transmitted from the controller to the actuator. The transition probability matrix of β_k is given by Moreover, x_0 is a random vector with mean μ and covariance Σ_0.
Due to the packet losses from the sensor to the controller, the state information cannot be precisely obtained; the 'raw' information available to the controller is given by where θ_k is a two-state Markov chain with θ_k ∈ {0, 1}, and its transition probability matrix is given by To facilitate the discussion, we assume that β_k, θ_k, ω_k, and x_0 are mutually independent.
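Since equations (1)-(4) are not reproduced in this text, the following is only a hedged sketch of how such a plant can be simulated, assuming the standard dynamics x_{k+1} = A x_k + β_k B u_k + M ω_k and measurement y_k = θ_k x_k; the matrices A, B, M and the two transition probability matrices below are illustrative values, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_markov_chain(P, s0, N, rng):
    """Sample a two-state {0, 1} Markov chain with transition matrix P,
    where P[i, j] = Prob(next state = j | current state = i)."""
    s = np.empty(N + 1, dtype=int)
    s[0] = s0
    for k in range(N):
        s[k + 1] = rng.choice(2, p=P[s[k]])
    return s

# Illustrative 2x2 transition probability matrices (assumed values).
P_beta = np.array([[0.3, 0.7],   # controller -> actuator channel
                   [0.2, 0.8]])
P_theta = np.array([[0.4, 0.6],  # sensor -> controller channel
                    [0.1, 0.9]])

N = 50
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)
Sigma_w = 0.01 * np.eye(2)

beta = sample_markov_chain(P_beta, 1, N, rng)
theta = sample_markov_chain(P_theta, 1, N, rng)

x = rng.multivariate_normal(np.zeros(2), np.eye(2))  # x_0 ~ N(mu, Sigma_0)
for k in range(N):
    u = np.zeros(1)                        # placeholder control input
    w = rng.multivariate_normal(np.zeros(2), Sigma_w)
    y = theta[k] * x                       # state is received only when theta_k = 1
    x = A @ x + beta[k] * (B @ u) + M @ w  # input is applied only when beta_k = 1
```

The independence assumption of β_k, θ_k, ω_k, and x_0 corresponds to the separate random streams drawn above.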
Associated with (1)-(4), the quadratic cost function to be minimized is given as follows: where Q ∈ R^{n×n}, R ∈ R^{m×m}, and S_{N+1} ∈ R^{n×n} are given symmetric weighting matrices. In this paper, it is assumed that u_k is F_k-measurable, and the information set F_k is given by Therefore, the optimal control problem to be solved in this paper can be described as below.
Problem 1. Find the F_k-adapted controller u_k to minimize cost function (5).

Remark 1.
According to previous works [30,35], the packet losses occur both from the sensor to the controller and from the controller to the actuator, and both packet loss channels are described by two-state Markov chains. The difficulties and challenges of this paper can be summarized as follows: the precise state information is not available to the controller, so deriving the optimal control and verifying the 'separation principle' remain challenging.

Optimal Estimation.
In this section, some preliminary results will be introduced. The proof can be found in [36] and is omitted here.

Lemma 2. For NCS (1) and measurement (3), the optimal estimator x̂_k = E[x_k | F_k] can be recursively calculated as follows:

with the initial estimator Furthermore, the error covariance matrix

Proof. The results can be deduced from Theorem 2 in [18], and the detailed proof is omitted here to avoid repetition.
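The recursion of Lemma 2 is not reproduced in this text, so the following is a minimal sketch of what such a conditional-expectation estimator looks like under the assumed measurement y_k = θ_k x_k: when θ_k = 1 the exact state arrives and the error covariance resets to zero, and otherwise the estimate is the one-step prediction through the (assumed) dynamics x_{k+1} = A x_k + β_k B u_k + M ω_k.

```python
import numpy as np

def estimator_step(x_hat, P, u_prev, beta_prev, theta_k, y_k, A, B, M, Sigma_w):
    """One step of the recursive estimator x_hat_k = E[x_k | F_k] (sketch).
    x_hat, P   : previous estimate and error covariance
    u_prev     : previous control input, applied only if beta_prev = 1
    theta_k    : 1 if the state packet arrived, 0 if it was lost
    y_k        : received measurement (equals x_k when theta_k = 1)."""
    # One-step prediction through the assumed dynamics.
    x_pred = A @ x_hat + beta_prev * (B @ u_prev)
    P_pred = A @ P @ A.T + M @ Sigma_w @ M.T
    if theta_k == 1:
        # Exact state received: estimate is the measurement, covariance resets.
        return y_k.copy(), np.zeros_like(P_pred)
    return x_pred, P_pred
```

Between arrivals the error covariance grows by accumulating A P A^T + M Σ_ω M^T at each step, which is what makes the recursive covariance computation of Lemma 2 possible without any measurement-update gain.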

The Optimal Control Law.
In order to guarantee the solvability of Problem 1, we make the following standard assumption on the weighting matrices of (5).
For the sake of discussion, we denote the value function V_k, k = 0, . . . , N, as follows: Next, we introduce the main results on the optimal control design for Problem 1.

Theorem 1. Under Assumption 1, the output feedback optimal controller can be given by
where (15) in which x̂_k can be calculated from Lemma 2, and S_k satisfies the modified Riccati equation: with terminal condition S^0_{N+1} = S^1_{N+1} = S_{N+1} given in (5). In this case, the value function V_k is given by where S_k can be calculated from (16)-(18), and the error covariance P_k can be calculated from Lemma 2. Moreover, Λ_k is given by with Furthermore, the optimal cost function can be obtained by in which Π_0 = E(x_0 x_0^T), and S_k, Λ_k are given by with the terminal condition S_{N+1} given by (5), and π^0_k, π^1_k, τ^0_k, and τ^1_k can be calculated from (7) and (8).
Proof. Firstly, we show that S_k given by (16) is positive semidefinite for k = 0, . . . , N. In fact, from Assumption 1, we know S_{N+1} ≥ 0, R + λB^T S_{N+1} B > 0, and R + (1 − λ)B^T S_{N+1} B > 0. It can be easily verified that both S^0_N and S^1_N in (17) and (18) are positive semidefinite. Hence, S_N ≥ 0 can be derived. By repeating the above procedure, we conclude that S_k ≥ 0 for k = 0, . . . , N. Moreover, it can also be obtained that R + λB^T S_k B > 0 and R + (1 − λ)B^T S_k B > 0. Therefore, the Riccati equations (16)-(18) are well defined under Assumption 1.
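Since equations (16)-(18) are not reproduced in this text, the following is only a hedged sketch of a backward recursion of this type: one Riccati sequence per mode of β_{k−1}, with assumed per-mode arrival probabilities p = [λ, 1 − λ] scaling the gain term, and without the cross-mode coupling terms that the paper's exact coupled equations would carry. It illustrates the positive-semidefiniteness argument of the proof: as long as R + p_i B^T S B stays positive definite, each backward step preserves S ≥ 0.

```python
import numpy as np

def riccati_backward(A, B, Q, R, S_terminal, lam, N):
    """Backward recursion for two per-mode Riccati sequences S^0_k, S^1_k
    (sketch of the modified Riccati equations; the exact coupled form
    (16)-(18) is not reproduced here).  p[i] is the assumed probability
    that the control packet arrives given mode i of beta_{k-1}."""
    p = [lam, 1.0 - lam]
    S = [S_terminal.copy(), S_terminal.copy()]
    history = [tuple(S)]
    for _ in range(N + 1):                       # steps k = N, ..., 0
        S_new = []
        for i in range(2):
            G = R + p[i] * B.T @ S[i] @ B        # must stay positive definite
            K = np.linalg.solve(G, B.T @ S[i] @ A)
            S_i = Q + A.T @ S[i] @ A - p[i] * A.T @ S[i] @ B @ K
            S_new.append((S_i + S_i.T) / 2)      # symmetrize against round-off
        S = S_new
        history.append(tuple(S))
    return S, history
```

The induction in the proof mirrors this loop: positive semidefiniteness at step k + 1 implies G > 0, which makes the step-k update well defined and again positive semidefinite.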
By following the well-known Bellman optimality principle from dynamic programming approach, it can be derived that the following relationship holds:

From (1), we know that V_N can be calculated as The following two cases are considered: (1) When β_{N−1} = 0, we can obtain that Furthermore, since we have shown that R + λB^T S_{N+1} B is positive definite, it holds that By completing the square, we know that the optimal control can be presented as Then, V_N can be calculated as follows: where S^0_N and P_N satisfy (17) and (11) for k = N, respectively, and Λ^0_N is given by (21).
Also noting that R + (1 − λ)B^T S_{N+1} B > 0, we have Hence, the optimal controller minimizing V^1_N can be given by From u_N = u^{1,*}_N, we get that where S^1_N, Λ^1_N, and P_N satisfy (18), (22), and (11), respectively.
In this case, V_N can be written as follows: To apply the induction method, we assume the optimal controls u^*_k are given by (13) for k = l + 1, . . . , N, and that relationships (14)-(18) hold. Moreover, it is assumed that V_k, k = l + 1, . . . , N, can be presented as (19), i.e., So, we know that Again, two cases will be studied. If β_{l−1} = 0, then it holds that Since the positive definiteness of R + λB^T S_{l+1} B has been shown before, V^0_l can be written as follows: Hence, the optimal control minimizing V^0_l can be given as follows: In the case u_l = u^{0,*}_l, we obtain where S^0_l satisfies (17) for k = l. Besides, for the case β_{l−1} = 1, since R + (1 − λ)B^T S_{l+1} B is strictly positive definite, we have And we can obtain in which S^1_l is given by (18). Thus, the optimal controller u^*_k is valid for k = l. Therefore, we can prove that V_l can be calculated as below: The induction is now complete, and we have shown that (13)-(22) hold.
Finally, we will prove that the optimal cost function J^*_N is given by (23).
In fact, noting that J^*_N = E[V_0], and that the following assertions hold for k = 0, . . . , N, it follows that in which Π_0 = E(x_0 x_0^T). Thus, the optimal cost function (23) has been verified. This ends the proof. □ Remark 2. We now analyze the advantages and disadvantages of the proposed strategy. For Problem 1, the main technique adopted in deriving the main results is the optimality principle of the dynamic programming approach. Furthermore, when the controller cannot obtain accurate state information, we give the optimal estimate (the conditional expectation) for the NCSs with Markovian packet losses and calculate the error covariance matrix recursively. It is verified that the 'separation principle' holds, i.e., the control gain matrix and the estimation gain can be calculated separately. The obtained results are new to the best of our knowledge.
Remark 3. It can be easily seen from the derivations that the obtained results in Theorem 1 include previous works as special cases (see, e.g., [8,18]). For example, if the packet loss processes β_k and θ_k are independent and identically distributed (i.i.d.) Bernoulli processes, then the main results reduce to the cases in [8,18].
Remark 4. In this paper, we mainly investigate the finite horizon optimal control for NCSs with two kinds of Markovian packet loss channels.
The infinite-horizon case will be reported in future work; thus, the convergence analysis of the proposed recursive estimator is not given in this paper. Actually, by defining the Lyapunov function candidate with the optimal cost function of the infinite horizon, the stabilization problem can be solved. In this case, the finite horizon Riccati equations converge to algebraic Riccati equations.

Remark 5.
The differences between this paper and [17] are as follows. Firstly, different from the measurement equation considered in this paper, the measurement equation in [17] is y_k = Cx_k + υ_k, where C is the measurement matrix and υ_k is the measurement noise. Secondly, the design of the estimator is different: in [17], parameters similar to those of the standard Kalman filter are used to obtain the estimation equation, whereas in this paper the optimal estimator and its error covariance matrix are obtained recursively. Finally, [17] considers only the loss of the control signal sent from the controller, while this paper considers not only the packet loss from the controller to the actuator but also the packet loss from the sensor to the controller. There are also differences in the information sets available to the two controllers at any given time.
Remark 6. By adopting the dynamic programming approach, we derive the optimal output feedback control, which can be regarded as a special case of the output feedback control problem. As for the general output feedback control problem, i.e., y_k = Hx_k + v_k or y_k = c_k(Hx_k + v_k), we will study this case in the future. Obviously, the general case is more challenging and complicated, and we will try to extend the obtained results and methods to it.

Remark 7.
Pseudocode for obtaining the control signal of this paper is shown in Figure 2.

Optimal Estimation and LMMSE.
Without loss of generality, we consider a higher-order system with the following coefficients: , and the time horizon N = 50. It is noted that the controller u_k is F_k-measurable, which does not affect the performance of the estimators. Furthermore, under Assumption 1, the output feedback optimal controller can be calculated by Theorem 1.
In order to show the optimality of the estimator given in Lemma 2, we will provide numerical examples to compare the proposed estimator in Lemma 2 and the linear minimum mean square error estimator (LMMSE).
As shown in Figures 3 and 4, it can be seen that, as expected, the proposed estimator given in Lemma 2 achieves better performance than the LMMSE estimator given in [34].

Probability Display.
In this simulation, we see the process of probability change more clearly. θ_k denotes the packet loss process, which obeys a two-state Markov chain with θ_k ∈ {0, 1}. The initial distribution is P(θ_0 = 0) = 1 − b = 0.6. As shown in Figure 5, the dotted line represents the change of P(θ_k = 0), and the solid line indicates the change of P(θ_k = 1).
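The transition probability matrix of θ_k is not reproduced in this text, so the entries below are illustrative; the sketch propagates the marginal distribution [P(θ_k = 0), P(θ_k = 1)] with the Chapman-Kolmogorov one-step update, which is the quantity plotted in Figure 5.

```python
import numpy as np

# Assumed 2x2 transition probability matrix of theta_k, with
# P[i, j] = Prob(theta_{k+1} = j | theta_k = i).  The initial distribution
# satisfies P(theta_0 = 0) = 1 - b = 0.6 as in the simulation.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi0 = np.array([0.6, 0.4])    # [P(theta_0 = 0), P(theta_0 = 1)]

N = 50
probs = np.empty((N + 1, 2))
probs[0] = pi0
for k in range(N):
    probs[k + 1] = probs[k] @ P   # Chapman-Kolmogorov one-step update

# probs[k] converges geometrically to the stationary distribution of P.
```

For an ergodic two-state chain the marginals converge to the stationary distribution regardless of the initial distribution, which explains why the two curves in Figure 5 flatten out as k grows.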

The Effect of the TPM on System Performance. In order to show the effect of the transition probability matrix (TPM) on the system performance, numerical results are carried out for different cases.
We consider the higher-order system with the following coefficients. As shown in Figure 6, we can see that different transition probability matrices have different effects on the performance of the system: when the transition probability matrix changes, the cost function of the system also changes.

Conclusion
In this paper, we have investigated the optimal control problem for NCSs with two kinds of Markovian packet losses. By adopting the optimality principle of the dynamic programming approach, for the first time, we derive the optimal control strategy, which is based on the given modified Riccati equations. Moreover, we have also shown that the separation principle is applicable for the considered problem, so the optimal control gain and the optimal estimator can be calculated separately. For future research, we plan to extend the obtained results to the infinite-horizon stabilization problem.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.