Modelling and Prediction of Random Delays in NCSs Using Double-Chain HMMs

This paper is concerned with the modelling and prediction of random delays in networked control systems. The stochastic distribution of the random delay in the current sampling period is assumed to be aﬀected by the network state in the current sampling period as well as the random delay in the previous sampling period. Based on this assumption, the double-chain hidden Markov model (DCHMM) is proposed in this paper to model the delays. There are two Markov chains in this model. One is the hidden Markov chain which consists of the network states and the other is the observable Markov chain which consists of the delays. Moreover, the delays are also aﬀected by the hidden network states, which constructs the DCHMM-based delay model. The initialization and optimization problems of the model parameters are solved by using the segmental K-mean clustering algorithm and the expectation maximization algorithm, respectively. Based on the model, the prediction of the controller-to-actuator (CA) delay in the current sampling period is obtained. The prediction can be used to design a controller to compensate the CA delay in the future research. Some comparative experiments are carried out to demonstrate the eﬀectiveness and superiority of the proposed method.


Introduction
In traditional control systems, system nodes (such as sensors, controllers, and actuators) are usually connected by the port to port wiring, which may cause many problems such as the difficult wiring and maintenance and the low flexibility and reliability. Such drawbacks appear in many automation systems due to the increasing complexity of controlled plants. With the advent and development of networks, traditional point-to-point control systems are being reshaped and redefined, which gives birth to networked control systems (NCSs) wherein feedback control loops are closed through communication networks [1]. e utilization of a multipurpose shared network to connect spatially distributed components endows control systems with several advantages such as decreased volume of wiring, low installation and maintenance costs, increased system flexibility, and high reliability [2]. Nowadays, NCSs have been extensively applied in many practical systems such as unmanned marine vehicles [3], intelligent manufacture [4], transportation network [5], and haptics collaboration over the Internet [6].
Compared with traditional control systems, the integration of communication network in the control loops leads to network-induced delays, data dropouts, packet disorder, and congestion. Among them, the network-induced delay is the most significant phenomenon that degrades the system performance or even destabilizes the control systems. See [3] for an example of dealing with random delays in networked unmanned marine vehicles. erefore, it is very necessary and urgent to investigate the random delays while implementing networked control. is is the initial research motivation of this paper. Over the past decade, considerable attention has been paid to the studying of NCSs with delays, and many significant results have been reported in the literature (see [7,8] and references therein).
Based on the networked structure for the UMV (Figure 1 in [3]), we can draw a typical structure of NCS as shown in Figure 2. e sensor-to-controller delay (called SC delay) in the k th sampling period (denoted as τ sc k ) is exposed when the sensor measurement is transmitted from the sensor to the controller. Similarly, the controller-to-actuator delay (called CA delay) in the k th sampling period (denoted as τ ca k ) is exposed when the control law is transmitted from the controller to the actuator. Regardless of the type of network (wired or wireless communication network), the performance of NCSs is always affected by these delays, which can be constant or time-varying according to their distribution characteristics. To compensate the delays, it is often necessary to establish the mathematical model of delays first. e last two decades have witnessed the flourishing evolution of this area in NCSs. e constant delay occurs in NCSs when a buffer is used on the controller (or actuator) side, and the data in the buffer are read periodically by the controller (or actuator) [9]. For this case, the delay is simply modelled as a constant. As a result, the NCSs can be transformed into deterministic systems, and some conventional deterministic control methodologies have been applied to NCSs in [10,11]. Nevertheless, the buffer makes the delay artificially increased because the data packet can be used only at some fixed time instant although it has already arrived for a while. e main limitation of the constant delay model leads to much design conservatism, or, even worse, the system stability margin decreases so much that the system becomes unstable.
On the other hand, the controller and actuator in NCSs are usually event-driven, which results in the time-varying delay in NCSs. For this case, two divisions can be found in the literature, i.e., the deterministically varying delay and the stochastically varying delay. In the former division, the variation of delays is not known a priori, but the instantaneous value is available to designers in real time. In the latter division, the variation of the delays is associated with some statistical descriptions: either the case of the current delay being independent of the previous ones or the case of some correlation existing between the current delay and the previous ones.
For the deterministically varying delay, there are three main approaches to analyze the NCSs although the statistics of time-varying delay is not available. e first approach is to transform the network-induced delay into the input delay, and then the recent advances on the delay system approaches can be applied for the analysis and control of NCSs [12,13]. e second approach is to treat the network-induced delay as a variation parameter of the system, and, more specifically, it is usually called as the exponential uncertainty [14,15]. en, the stability and stabilization problems can be solved based on the robust control approach. e third approach is to introduce a new working mode of the actuator, by which the delay can take finite values only, and ultimately the NCS is modelled as a discrete-time switched system with a finite number of subsystems [16,17]. e main difference between them is that different approaches result in a different system model. e first one models the NCS as a delay system, the second one models the NCS as a parameter-varying system, and the third one models the NCS as a discrete-time switched system.
For the stochastically varying case, the statistics of delays are available for system design, and the stochastic system approach can be applied. To model the random delays, two methods are widely used: one is the independent identically distributed (i.i.d.) model and the other is the Markov chain model. e i.i.d. model is similar to the modelling of stochastic sampling problem, i.e., the delay is partitioned into multiple different time-varying delays and each delay has a certain bound. en, a delay distribution-based analysis and synthesis approach can be applied for the NCSs with nonuniform distribution characteristics of delays [18,19]. e same delay distribution-based approach has been extended to fuzzy NCSs in [20,21]. Different from the i.i.d. model, the Markov chain model is proposed under the assumption that the delay is correlated and the transition of different delays has Markovian property. For the past few years, a lot of effort has been put on the research about Markov chain-based analysis and synthesis methods for random delays in NCSs. ese methods can be generally divided into two categories: one considering only SC delays [22] (or CA delays [23]) and the other considering both  Figure 1: Refined diagram of Figure 2 with some other data.   Discrete Dynamics in Nature and Society Markov models (HMMs) have been successfully used to model random delays of NCSs [28,29]. In the HMM-based delay model, the stochastic distribution of current delay is only governed by the current network state. is is the main difference between the HMM-based delay model and the Markov chain-based delay model. Along with each packet transmission over the network, the network state will jump from one mode to another following a Markov chain. e network state determines the distribution of random delays. is kind of relationship between the network state and the random delay is referred to as an HMM. According to the delay characteristics, there are mainly three kinds of HMMs to model the random delays of NCSs: discrete-time HMM (DTHMM) [28,30], continuous-time HMM (CTHMM) [29,31], and semicontinuous HMM (SCHMM) [32]. Moreover, how to optimally initialize the parameters of HMM-based delay models has been discussed in [33]. Compared with the Markov chain-based delay model which describes the probabilistic relationship among random delays themselves, the HMM-based delay model reveals the essential generation mechanism of random delays and describes the probabilistic relationship between random delays and hidden network states. To a certain extent, the HMM-based delay model is closer to the real distribution of random delays. However, the probabilistic relationship objectively existing among random delays themselves is neglected in the HMM-based delay model. erefore, to get a more accurate delay model, we prefer to believe that the probabilistic relationship exists not only between random delays and network states, but also between random delays themselves, which is the main motivation of this paper.
at is to say, the current delay is governed by the current network state as well as the previous delay. erefore, the socalled double-chain hidden Markov model (denoted as DCHMM) is introduced in this paper to model the random delays in NCSs. e contribution of this paper lies in the novel modelling method (i.e., DCHMM) that considers the relationship between the random delay and the network state as well as the relationship between the random delay and its previous value. is is the first time in the literature that both the network status and the delay itself are considered simultaneously in the delay modelling. Based on the model, the segmental K-mean clustering algorithm and the expectation maximization algorithm are presented to solve the initialization and optimization problems of the model parameters. en, the optimized model is used to predict the CA delay and get more accurate delay prediction result than the discrete HMM.
Generally, the CA delay can arbitrarily take values from its acceptable interval, although only a certain discrete value is observed during each sampling period. is is the reason the continuous HMM [29,31] and semicontinuous HMM [32] have been used to model the CA delays. In these two models, the mixture Gaussian density functions are used to describe the distribution of the CA delays, which improves the modelling accuracy compared with the discrete HMM. However, at the same time, the model parameters to be estimated are augmented and then the computational complexity is increased in these two models. erefore, for simplicity, the CA delays are limited to a few discrete intervals by using a scalar quantizer, such as the uniform quantizer in [28,34] and the K-means clustering quantizer in [30], and then the discrete HMM can be used to model the CA delays. Similarly, in this paper, the CA delays will be also quantized into several finite different discrete observations, and more specifically, the observation process will be assumed to possess the Markovian property, which is much different from the discrete HMM in [28,30] where the observations are independent of each other. erefore, there are two Markov chains to be considered in this paper, one is the hidden network states and the other is the observable delay observations, which are modelled as the double-chain hidden Markov model (DCHMM). Since the DCHMM concerns the dependency relationship objectively existing among random CA delays, it will present higher precision of modelling and prediction than the discrete HMM, which will be demonstrated through the contrastive experiments in this paper. Nevertheless, the superiority of DCHMM over the continuous or semicontinuous HMM is not so clear unless some continuous stochastic processes other than the Markov chain are used to describe the dependency relationship among random CA delays, which will be investigated in our future work. e rest of this paper is organized as follows. e problem formulation is given in Section 2. e DCHMMbased delay model and its parameter estimation are presented in Section 3. How to predict the random CA delays based on the DCHMM delay model is proposed in Section 4. e effectiveness and superiority of the proposed methods are illustrated by experimental examples in Section 5. Finally, some concluding remarks are given in Section 6.

Problem Formulation
To better present the problem, the block diagram of a typical NCS in Figure 2 is further refined in Figure 1, where some other data (e.g., sensor measurement, control law, and historical CA delays) are annotated. In Figure 1, the sensor is time-driven and samples the plant every h seconds to get the plant state (called sensor measurement and denoted by x k ). Both the controller and the actuator are event-driven, which means that the controller calculates the control law as soon as the sensor measurement arrives at the controller node and the actuator acts as soon as the control law arrives at the actuator node. If the network nodes (i.e., sensor, controller, and actuator) are clock-synchronous (this can be implemented using the method in [35]), the timestamp technology can be used to calculate the SC and CA delays. e sensor measurement is time-stamped before it is transmitted through the backward network, and when it arrives at the controller node, the SC delay can be calculated by comparing the timestamp of the sensor measurement with the local time of the controller node. Similarly, the control law is timestamped before it is transmitted through the forward network, and when it arrives at the actuator node, the CA delay can be calculated by comparing the timestamp of the control law with the local time of the actuator node.
Both the SC and CA delays may degrade the system performance or even destabilize the control system. erefore, many kinds of controllers are designed to compensate these delays. When designing the current control law u k , the current SC delay τ sc k has occurred and can be measured by using the timestamp Discrete Dynamics in Nature and Society technology (in this paper, the word "current" means the k th sampling period). However, at the same time, the current CA delay τ ca k has not occurred. In order to compensate the current CA delay τ ca k by the control law u k , a feasible way is to predict the current CA delay before designing the control law u k . us, both the predicted CA delay (denoted as τ ca k ) and the measured SC delay τ sc k are known to the controller and can be compensated only if they are considered into the design of the control law u k . In this paper, the DCHMM is proposed to model the CA delay and obtain its current predicted value τ ca k , which aims to improve the prediction accuracy compared with the DTHMM.
ere are several parameters (the number of hidden states N, the number of observations M, the probability distribution of the first hidden state π, the transition matrix between hidden states A, and the set of transition matrices between successive outputs given a particular hidden state B) in the DCHMM-based CA delay model (denoted as λ, and λ � (N, M, π, A, B)). To get the optimal estimation of these parameters, the historical CA delays are needed as the inputs of an iterative optimization process to train these parameters. erefore, a delay buffer is set at the controller node to collect all the past CA delay data (i.e., τ ca k−1 , . . . , τ ca 1 ) as shown in Figure 1. As described above, the adjacent previous CA delay τ ca k−1 is calculated at the actuator node and then can be packaged into the current senor measurement x k to generate a single packet. is packet will be transmitted to the controller through the backward network. Once receiving this packet, the controller will extract the previous CA delay data τ ca k−1 from this packet and put it into the delay buffer. In this way, all the past CA delay data are collected in the delay buffer and then can be used to optimally estimate the parameters of the DCHMM-based CA delay model and predict the current CA delay (τ ca k ). Based on the predicted value (τ ca k ) and the real measured value (τ sc k ), some control law can be designed to compensate both the CA delay and the SC delay in the current sampling period. is paper is focused on the modelling and prediction problem, leaving the compensation problem as the future work.
For simplicity, the sum of the SC and CA delay is assumed to be not more than one sampling period (i.e., τ sc k + τ ca k ≤ h) in this paper, which means that there is no data packet dropout and disorder concerned in this paper.

Model Derivation.
It should be noted that this paper will focus on the discrete-time, discrete-state, and discrete-observation DCHMM, where the state transition probability matrix and the observation transition probability matrix for each state are assumed time homogeneous. Similar to DTHMM, the forward network states are modelled as a time-homogeneous Markov chain with N different states that constitute a discrete finite state set S � 1, 2, . . . , N { }. Generally, the network state (denoted as s) defines the whole working status of the network and is an abstract variable that comprehensively reflects the stochastic factors (e.g., network load, nodes competition, and network congestion) of the network. e network state in the k th sampling period is represented as s k . Obviously, s k ∈ S(s) holds. Along with each packet transmission in the forward network, the network state may transfer from one mode to another through Markovian property described as the following equation: (1) e Markovian property tells us that the conditional probability distribution of the network state in the next sampling period (denoted as s k+1 ) depends only upon the current network state s k , not on the past network states s k−1 , . . . , s 1 . In equation (1), a ij (∈ [0, 1]) denotes the onestep transition probability of going from the network state i at time k to the network state j at time k + 1 , satisfying the constraint: a ij ≥ 0 and N j�1 a ij � 1. e Markovian property of network states is illustrated in Figure 3(a). All one-step transition probabilities construct a one-step transition matrix A � a ij 1 ≤ i,j ≤ N as shown in Figure 3(b). For the special case of time k � 1, a ij is simplified into an initial state probability of the first hidden network state that is defined by π j as follows: Obviously, π j ≥ 0 and N j�1 π j � 1 hold. All possible initial state probabilities construct an initial vector π � π j 1 ≤ j ≤ N . Now, these parameters (N, π, and A) define a Markov chain of network states. As is known, the DCHMM describes a two-stage stochastic process. e defined Markov chain is exactly the first stage of the DCHMM-based delay model. In the second stage, for every state at time k additionally a CA delay τ ca k is generated. As mentioned before, the DCHMM proposed in this paper has finite discrete observations and the observations for each state construct a time-homogeneous Markov chain. Similar to the DTHMM [28,30], the observations are also obtained by quantizing the CA delays, and then they can be used to estimate the parameters of the DCHMM-based delay model. e delay interval ((0, h]) is assumed to consist of a complete set such as (0, After k − 1 sampling periods, one can get a set of network states: s � s 1 , s 2 , . . . , s k−1 , a set of the CA delays: Under the previously mentioned time homogeneous assumption, the conditional probability of obtaining the observation o k for the DCHMM is defined as follows: In equation (3) lm . at is, if the DCHMM is currently in the state i, B (i) will be the observation transition probability matrix used to determine the observation at the current time point given the observation which was obtained at the previous time point. From this, it can be seen that the observation of the DCHMM-based delay model can be viewed as a time inhomogeneous Markov chain, where the transition probability matrix used for the observations is dependent on the hidden network state of the DCHMM.
Based on the definitions in equations (1), (2), and (3), the DCHMM-based delay model, which is usually denoted as λ, can be completely described as follows: (3) A vector π of initial state distribution: (4) A set of the transition matrix between successive observations given a particular state i (i ∈ S): From the above, λ � (N, M, π, A, B) . In general, N and M are known in advance, so λ can be rewritten as follows for simplicity: Equation (7) gives the definition of the DCHMM-based delay model which is represented in Figure 4.

Remark 1.
e DCHMM is a generalization of both a visible Markov chain and a hidden Markov chain. When there is only one hidden state (N � 1), the DCHMM reduces to an homogeneous Markov chain with transition matrix B (1) . On the other hand, when there are N > 1 hidden states but each matrix B (i) (i ∈ S) has identical rows, the model reduces to a DTHMM.
In Figure 4, the observed process is governed by a hidden Markov chain in which successive observations are directly correlated through the Markov property. e DCHMM presented in this paper combines characteristics of both visible and hidden models. It is called "double" since it can be viewed as the superposition of two interlinked Markov chains: the hidden Markov chain governing the relation between states of a nonobservable variable (i.e., network state) and the visible Markov chain governing (together with the hidden state process) the relation between outputs of an observed variable (i.e., CA delay). at is, the DCHMM has a similar stochastic framework to the HMM, but now it is assumed that, for a given time point k, the current CA delay τ ca k is not only dependent on the current hidden state s k but also dependent (through the Markov property) on the previous CA delay τ ca k−1 . In Figure 4, since the value of τ ca 1 depends on the past, we consider an initial CA delay value (τ ca 0 ) at time 0 with no corresponding hidden state, and its corresponding observation is o 0 . Based on this consideration, the set of the past CA delays τ ca is redefined as τ ca � τ ca 0 , τ ca 1 , τ ca 2 , . . . , τ ca k−1 , and the corresponding set of the observations o is redefined

Remark 2.
e dependence of the current CA delay on both the current network state and the previous CA delay can be explained as follows. e CA delay process can be considered as a Markov chain, where the transition probability matrix is dependent on the current network state occupied.
at is, the transition probability matrix of the CA delay process is associated with each state in the network state space, and each time the DCHMM enters a new state, the transition probability matrix of the CA delay process for that network state is used to determine which observation (given the previous CA delay) will be emitted for that time point. e CA delay process can thus be viewed as a time inhomogeneous Markov chain, where the transition probability matrix used for the observation process is driven by the hidden state process of the DCHMM.
One of the benefits to introduce the DCHMM-based delay model is that the advantages of both the Markov chain and the HMM are conserved. at is, the system is driven by an unobserved process, and the successive observations are also directly correlated.

Model Estimation.
In order to get the accurate prediction of the current CA delay (τ ca k ) by using the DCHMMbased delay model as described in equation (7), three different estimation problems have to be considered as follows: Q1: the estimation of the likelihood of a sequence of observations o 0 , o 1 , . . . , o k given a model λ Q2: the estimation of the parameters π , A and B given a sequence of observations Q3: e estimation of the optimal sequence of hidden network states given a model and a sequence of outputs ese three problems are similar to the problems occurring in DTHMM theory and can be solved using similar methods. Q1 is solved using a forward iterative algorithm. Q2 is achieved with an Expectation Maximization (EM) algorithm. Q3 is obtained through the Viterbi algorithm.
Remark 3. In this section, only the resulting formulas for each algorithm are given. e complete derivation is provided in Appendix A. Moreover, as presented here, the three algorithms can lead to numerical problems since they involve the calculation of infinitesimal values. A good solution is to normalize the intermediary results at each step of the calculation. e practical implementation of this method is discussed in Appendix B.

Likelihood of the Observation.
e likelihood of the observation o (o � o 0 , o 1 , . . . , o k−1 ) given the DCHMMbased delay model λ is defined as follows: An iterative procedure similar to the forward procedure developed in [32] for the estimation of the SCHMM is proposed to get the likelihood in equation (8). A forward variable is defined as follows: For r � 1, equation (9) becomes In the general case, for r � 2, . . . , k − 1, By summing α k−1 (j) over j, one can obtain the likelihood of the entire sequence of observation o: e iterative computation of α r (j) is sufficient to obtain the likelihood. Moreover, another iterative algorithm similar to the backward procedure appearing in [32] is defined here. It will be used later for the estimation of the parameters of the DCHMM-based delay model. e backward variable is defined as follows.

6
Discrete Dynamics in Nature and Society

Estimation of π, A, and B.
e complete identification of the DCHMM-based delay model requires the estimation of three sets of probabilities: π, A, and B. We use an expectation maximization (EM) algorithm for the parameter estimation as we have done in [32].
First, we define the joint probability of two successive hidden network states as follows: en, we define the marginal distribution of the hidden network states as follows: Using ε r and ξ r , we can write the reestimation formula for π, A, and B as follows: In practice, the parameter estimation of the DCHMMbased delay model is achieved using iteratively the forwardbackward procedure and the reestimation formulas in (19)∼ (21). Generally, after dozens of iteration, one can obtain the optimum model parameters λ * (λ * � (π * , A * , B * ), where π * � π * i , A * � a * ij , and B * � b (i) * lm .

Optimal Sequence of Hidden States.
Once we have an estimation of the DCHMM-based delay model, we can search the optimal sequence of hidden network states which maximizes the following conditional probability: or the joint probability is problem can be solved through an iterative dynamic procedure called the Viterbi algorithm. For r � 1 and j � 1, . . . , N, we define and for r � 2, . . . , k − 1, e optimal network state at time k − 1 is then predicted as and we obtain recursively for r � k − 2, . . . , 1: Finally, the joint probability of the sequence of hidden network states and the sequence of observed CA delay outputs is equal to As a dynamic programming method, every decision in the computation of the δ r (j) is only locally optimal in the Viterbi algorithm. e globally optimum of the state sequence s is only known after the evaluation of the δ k−1 (j), i.e., after the CA delay sequence has been considered in its entire length. e state s k−1 maximizing δ k−1 (j) denotes the end of the optimal state sequence. e predecessor states can be determined by means of backtracking in (27).

Delay Prediction
Based on the optimal prediction of the network state s k−1 at time k − 1 and the optimal estimation of the parameters A and B, the method to predict the current CA delay (τ ca k ) is proposed as follows.
Step 1. Predict the optimal network state in the current (i.e., k th) sampling period, which is denoted as s k : Step 2. Predict the optimal observation of the current CA delay, which is denoted as o k : Discrete Dynamics in Nature and Society Step 3. Predict the optimal CA delay in the current (i.e., k th) sampling period, which is denoted as τ ca k : Remark 4. Once we get the optimal network state in the previous sampling period (s k−1 ) by using (26), the probability of different network state j (j ∈ S) in the current sampling period can be calculated based on the Markovian property of network states in (1), and the state that maximizes the probability is the optimal prediction of the current network state (s k ). Based on the prediction (s k ) and the previous observation of CA delay (o k−1 ), the probability of different delay observation m (m ∈ O) in the current sampling period can be calculated according to (3), and the observation that maximizes the probability is the optimal prediction of the current CA delay observation (o k ). For the predicted observation (o k ), how to calculate the prediction of the current CA delay depends on the delay quantization method. As discussed in [30], there are two kinds of delay quantization methods to obtain the observations of historical CA delays, namely, uniform quantization and K-means clustering quantization. For the uniform quantization, the prediction of the current CA delay (τ ca k ) is set to the midpoint of the subinterval which the current observation (o k ) falls in. Based on the definition of the complete subintervals, the subinterval which the observation o k falls in is (h o k −1 , h o k ], and then the prediction of the current CA delay is given in (31). For the K-means clustering quantization, the prediction of the current CA delay (τ ca k ) is set to the centroid of the cluster which the current observation (o k ) falls in, and the corresponding calculating method is given as following step 3.
Step 4. Predict the optimal CA delay in the current (i.e., k th) sampling period, which is denoted as τ ca k : In (32), c o k is the centroid of the cluster which the current observation (o k ) falls in. e detailed K-means clustering quantization method and the definition of the cluster centroid have been given in [30]. Actually, we have proved that the K-means clustering quantization is better than the uniform quantization in improving the prediction accuracy of CA delays. erefore, we will only use equation (32) to predict the current CA delay in the experiments of this paper.

Illustrative Example
Some comparative experiments are carried out in this section to demonstrate the effectiveness and superiority of the methods proposed in this paper. For the sake of comparison, the experimental context is designed as the same as that in [32], and the CA delay values shown in Figure 4 of [32] are used again here to derive the DCHMM-based delay model and validate the prediction of CA delays. For the convenience of reading, Figure 4 of [32] is redrawn as follows. In Figure 5, the first 200 CA delay values (from 1 to 200 on the x-coordinate) are used for the parameter estimation of the DCHMM-based delay model, and the last 200 CA delay values (from 201 to 400 on the x-coordinate) are used to evaluate the predictive effect based on the model. e main difference between the experiments in this paper and that in [32] is that this paper not only considers the relationship between the CA delays and the network states but also considers the relationship between the adjacent CA delays.
at is to say, the distribution of the current CA delay in this paper is not only dependent on the current network states but also related to the previous CA delay.
Firstly, we need to initialize the DCHMM-based delay model λ 0 (λ 0 � (N, M, π 0 , A 0 , B 0 )). By using the optimal stabilization methods in [33], we can obtain that the optimal initial values of N and M are 4 and 6, respectively, which means that there are four different network states Compared with π 0 and A 0 , the initial value of the observation matrix (B 0 ) is very crucial to the parameter Obviously, the matrix B is a three-dimensional matrix, in which i denotes the layer of the matrix, l denotes the row of each layer, and m denotes the column of each layer. e detailed structure of the matrix B is given in Figure 6. ere are 4 layers in the matrix B, and every layer has 36 elements, which makes the matrix B have a total of 144 elements. In Figure 6, the black number corresponds to the number of layers, the red number to the number of rows, and the blue number to the number of columns. For simplicity, the matrix elements of the second and third layers are omitted here since they are easy to figure out.
Based on the definition in equation (3), all the elements of the matrix B satisfy the following constraint relation. Each element is greater than or equal to 0 and less than or equal to 1, and the sum of each row in each layer is equal to 1: Based on the first 200 CA delay values in Figure 5, the optimal initial values (B 0 ) of the observation matrix can be obtained as shown in Figure 7 by using the segmental K-means clustering algorithm proposed in [33].
In general, the network status is good and the large numeric elements have relatively small row and column numbers in Figure 7. In order to illustrate this point, we transform layer 1 in Figure 7 into Figure 8 that is a matrix plot with M rows and M columns. In the same way, the other three layers (layer 2 to 4) in Figure 7 can be transformed into their corresponding matrix plots, and the transformed result is omitted here for simplicity. In the matrix plot of Figure 8, each element is a colored dial. e color value (referring to the right color bar) and the size of the colored sector in one rectangle block (i, j) (1 ≤ i, j ≤ M) are determined by the initial value corresponding to the same (i, j) entry of layer 1 in Figure 7. In Figure 8, the region on the left shows the relatively large color values as well as the size of the colored sectors, while the region on the right shows the relatively small color values as well as the size of the colored sectors. Figure 8 illustrates, from another perspective, that the network always shifts to the state with short delay and presents a good status.
Secondly, we can use the EM algorithm to train the DCHMM-based delay model based on the initialized model λ 0 and the first 200 delay values in Figure 5. During this experiment, the maximum iterative number of the training procedure is set to 40 and the threshold ε to terminate the procedure is set to 5 × 10 − 4 . Under these conditions, it is found that the training procedure can always converge after dozens of iterations. At the end of this procedure, the optimized parameters (π * , A * , B * ) of the DCHMM-based delay model are obtained as follows. Considering that the matrix B * contains too many elements (144 in total), which makes it unrepresentable, we present it in the form of a one-dimensional curve as shown in Figure 9 where every 36 coordinates on the horizontal axis corresponds to one layer of B * :   Discrete Dynamics in Nature and Society irdly, we can use the Viterbi algorithm to get the optimal previous network state and then predict the current CA delay through three steps. e first step uses equation (29) to estimate the optimal current network state. e second step uses equation (30) to estimate the optimal observation of the current CA delay. e third step uses equation (32) to predict the current CA delay. Taking the three steps, the predicted values of CA delays can be obtained. For example, during the 201st period, the predicted value is 0.8591 ms that is close to the real value 0.8432 ms in Figure 5, and the relative error is 1.89%. By recursively executing the three steps in the following periods, we can get other predicted CA delay values from the 202nd to the 400th period as shown in Figure 10.
To judge the predictive precision of the DCHMM-based delay model, the mean-squared error (MSE) is defined as follows in this experiment:   According to the predicted values in Figure 10 and the real values in Figure 5, we can calculate that the MSE is 0.0032 by using equation (37), which is denoted as MSE DCHMM (i.e., MSE DCHMM � 0.0032). Generally, such prediction accuracy has been very high and is acceptable in NCS.
To illustrate the superiority of the DCHMM-based delay model over the DTHMM-based delay model in the prediction accuracy, we carry out again the experiment under the DTHMM-based delay model in [33]. Without considering the dependence between adjacent CA delays, the observation matrix B 0 degenerates into a two-dimensional matrix, in which the rows represent the network states and the columns represent the intervals of CA delays. Under the same experiment condition, the predicted value is 0.8706 ms during the 201st period for the DTHMM-based delay model, and the relative error is 3.25% (>1.89% for the DCHMMbased delay model). erefore, the DCHMM-based delay model proposed in this paper gives a better prediction of the CA delay during the 201st period than the DTHMM-based delay model. Moreover, by the end of this experiment, we got that the MSE for the DTHMM-based delay model is 0.0043 ( >0.0032 � MSE DCHMM ), which is denoted as MSE DTHMM (i.e., MSE DTHMM � 0.0043). Obviously, we have MSE DTHMM > MSE DCHMM , which demonstrates the superiority of the DCHMM-based delay model over the DTHMMbased delay model in the prediction of CA delays.
Besides, we try to compare the DCHMM-based delay model over the SCHMM-based delay model proposed in [32] and get some interesting results. In the experiment of [33], the predicted value is 0.8615 ms during the 201st period for the SCHMM-based delay model, and the relative error is 2.17% (>1.89% for the DCHMM-based delay model). However, we cannot simply assume that the DCHMM-based delay model is superior to the SCHMMbased delay model. Actually, by the end of the experiment, the MSE for the SCHMM-based delay model is 0.0029 (<0.0032 � MSE DCHMM ), which is denoted here as MSE SCHMM (i.e., MSE SCHMM � 0.0029). It is easy to find that MSE SCHMM < MSE DCHMM < MSE DTHMM . So, from this point of view, the DCHMM-based delay model is not superior to the SCHMM-based delay model in the prediction of CA delays. is is mainly because that the delays are treated as discrete values and quantized into some limited subintervals in the DCHMM-based delay model. If the delays are treated as continuous values and can take any value in its allowable interval or if the dependency relationship among random CA delays is taken into account in the SCHMM-based delay model, the derived DCHMM-based delay model will give more accurate prediction and show the superiority over the SCHMMbased delay model, which will be demonstrated in our future work.

Conclusions
In networked control systems, random delays are the main cause that degrades the system performance and even causes the system instability. A feasible method to compensate the random delays is to take the random delays into the design of the controller. Considering the CA delay has not occurred when the controller is designed, we need to predict the CA delay before designing the controller. To get a high-precision prediction value, it is necessary to establish a high-precision delay model first. Different from the existing modelling methods, this paper considers the dependency between the network states and the CA delays as well as the interdependence between the CA delays. e dependency between the network states and the CA delays is modelled as a hidden Markov model, and the interdependence between the CA delays is modelled as a Markov chain. As a result, the double-chain hidden Markov model (DCHMM) is proposed in this paper to model the random CA delays for the first time in the networked control systems. As the name implies, there are two Markov chains in this model. One is the hidden Markov chain which consists of the network states, and the other is the observable Markov chain which consists of the random CA delays. Moreover, the random CA delays are also affected by the hidden network states, which constructs the DCHMM-based delay model. e DCHMM-based delay model is highly consistent with the distribution of the real network delays. erefore, the DCHMM, compared with the DTHMM, gets much closer to the real network, which renders the DCHMM-based delay model with higher accuracy than the DTHMM-based delay model. However, it is worth noting that the DCHMM is more complex than the DTHMM in model parameter estimation and will consume more computing time. As a result, the DCHMM has a relatively poor real-time performance when directly applied to actual systems. A feasible solution is to determine the parameter reestimation frequency according to the prediction accuracy. A threshold is defined as in [32], and the event-triggering mechanism proposed in [36,37] can be  Discrete Dynamics in Nature and Society used to trigger the parameter reestimation when the prediction error exceeds the threshold. e expectation maximization algorithm is presented to obtain the optimal estimation of the model parameters after they are initialized by using the K-mean clustering algorithm.
e Viterbi algorithm is presented to obtain the optimal estimation of the network states, and then the prediction of the CA delay in the current sampling period is obtained. Finally, some comparative experiments are carried out to demonstrate the superiority of the DCHMM-based delay model over the DTHMM-based delay model. Nevertheless, the random CA delays considered in this paper are discrete, and the DCHMM-based delay model is derived through quantizing the delays. If the CA delays are treated as continuous values and can take any value in its allowable interval or if the dependency relationship among random CA delays is taken into account in the SCHMM-based delay model, the derived DCHMM-based delay model will give more accurate prediction and show the superiority over the SCHMMbased delay model. In our future research, the predicted CA delay will be used to design the controller in the current sampling period to compensate the imminent real CA delay. Furthermore, we will investigate the adaption of these methods to other systems such as the cyber-physical systems and the multiagent systems.

A. Derivation of the Algorithm
In this appendix, we provide the complete derivation of the algorithms of Section 3.
A.1. Likelihood of the Observed CA Delay Sequence. For r � 1, equation (9) becomes Since o 0 and s 1 are independent and the value of o 0 is known, we have For r > 1, equation (11) A.2. Estimation of π, A, and B. Equation (17) for the calculation of ε r (i, j) is and, for ξ r (i), we have

B. Practical Computation of the Algorithms
Some algorithms in this paper may lead to numerical errors. A good solution is to normalize the intermediary results at each step of the calculation. A computable version of these algorithms is provided in this appendix.

B.1. Computation of the Forward
Procedure. e forward variable α r (j) may easily take values too small to be handled by a computer. To avoid this problem, one solution is to normalize α r (j) at each step r.
For r � 1 and j � 1, . . . , N, we define o 0 o 1 π j /N is the average value of the α 1 (j). en, for r � 2, . . . , k − 1, we have where α r � b (j) o r−1 o r N i�1 a ij α r−1 (i)/N is the average value of the α r (j).
As a result, the log-likelihood of the sequence of observed CA delays can be obtained as follows:

B.2. Computation of the Backward Procedure.
e computation of the backward procedure leads to the same kind of problems as the forward procedure and requires the same type of solution.

B.4. Computation of the Optimal Sequence of Hidden Network
States. In order to avoid numerical errors, it is necessary to scale the quantity δ used in the Viterbi algorithm.

Data Availability
All data sets used in this study are provided within the article.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.