A GMM-Based Secure State Estimation Approach against Dynamic Malicious Adversaries

We consider the secure state estimation of linear time-invariant Gaussian systems subject to dynamic malicious attacks. An error compensator is proposed to reduce the impact of erroneous local data on state estimation. Building on it, a new estimation algorithm based on the Gaussian mixture model (GMM) is proposed for dynamic attacks; it clusters the local state estimates autonomously and effectively improves the remote estimation accuracy. The superiority of the proposed algorithm is verified by numerical simulations.


Introduction
Cyberphysical systems (CPSs), such as transportation networks and smart grids, integrate sensing, computing, and control technologies with a communication infrastructure. Tight integration and cooperation between cyber and physical components are the defining features of CPSs [1]. However, CPSs are vulnerable to attacks, especially network attacks on data and communication channels, which can cause serious harm to the national economy and social security; examples include the Stuxnet storm reported in [2], the StuxNet malware [3], power blackouts in Brazil [4], and the Maroochy water breach [5]. Because CPSs are widely deployed in many critical real-life infrastructures [6], their security has become an increasingly important issue that has attracted attention from many researchers over the past decades.
In the recent literature, secure state estimation has become an important research direction in CPS security. In [7], a distributed state estimation method based on parallelized stream computing is proposed, which not only significantly speeds up the state estimation computation but also reduces the interregional convergence correlation and residual pollution. In [8], a new sequential estimation method is proposed to improve the estimation accuracy; it estimates the states with a particle filter (PF) and the parameters with the separable natural evolution strategy (SNES) in sequence. The state estimation of three-phase power system models is studied in [9]. In [10], a state estimation algorithm for wireless power transfer (WPT) systems based on a Bayesian network is proposed, which estimates the WPT system states in a distributed way using a Bayesian tree structure. In [11], a robust generalized maximum-likelihood (GM) estimator that leverages modified projection statistics and a Huber convex score function is designed to bound the influence of observation outliers while maintaining high statistical efficiency. In [12], a distributed dynamic state estimation method for microgrids incorporating distributed energy resources is presented. In [13], a robust generalized maximum-likelihood Koopman operator-based Kalman filter (GM-KKF) is designed to estimate the rotor angle and speed of synchronous generators. In [14], a correlation-aided robust adaptive unscented Kalman filter (UKF) is presented for decentralized dynamic state estimation of power systems with unknown inputs; it requires fewer measurements while achieving better robustness against bad data. In [15,16], state estimation based on undamaged sensors is studied. In [17,18], state estimation for different systems is studied using convex optimization methods.
In [19], by adopting a variety of models, a random Bayesian approach is proposed to solve the state estimation problem under switching patterns and signal attacks. In [20], state estimation under fixed-target attacks, switched-target attacks with disturbances, and sparse sensor attacks is considered, and a sufficient condition for the existence of a switched observer is given. In [21], a fusion algorithm based on the Gaussian mixture model is presented for the estimation of a linear time-invariant Gaussian system under stealthy attacks; however, dynamic attacks are not considered. In [22], a dynamic combination strategy and a distributed Kalman filter are proposed, which improve the robustness of the system against random erroneous-data injection and replay attacks.
Most of the studies mentioned above focus on static attacks. However, dynamic attacks are very common in real systems. Therefore, this paper considers state estimation for a networked system subject to dynamic adversaries, as shown in Figure 1. Different sensors are attacked randomly at each time instant, and it is assumed that the number of attacked sensors does not exceed half of the total number of sensors.
Inspired by [21], we design an error compensator to reduce the impact of incorrect data on state estimation. Based on it, a new GMM-based state estimation algorithm is presented, which effectively improves the state estimation accuracy against dynamic adversaries. The contributions of this article are as follows: (1) A new error compensator is proposed to alleviate the influence of erroneous data on state estimation; based on the observability of the system, it judges whether the beliefs generated by the expectation-maximization (EM) algorithm are accurate and corrects the doubtful beliefs. (2) By introducing the error compensator, a new GMM-based estimation algorithm is presented, which effectively improves the estimation accuracy. The proposed algorithm fuses the local data with the centralized Kalman filter, using the modified beliefs as the weights of the local data.
The rest of the paper is organized as follows. Section II formulates the model of the considered system and the problem of interest. Section III proposes the error compensator and the new GMM-based state estimation algorithm against dynamic adversaries. In Section IV, the effectiveness of the proposed algorithm is demonstrated by numerical simulations. Conclusions are given in Section V.
Notation: $\mathbb{N}$ and $\mathbb{R}$ are the sets of positive integers and real numbers, respectively. $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean space. $\mathbb{S}^n_{+}$ ($\mathbb{S}^n_{++}$) is the set of $n \times n$ positive semidefinite (definite) matrices. We write $X \ge 0$ ($X > 0$) when $X \in \mathbb{S}^n_{+}$ ($\mathbb{S}^n_{++}$). $X'$ denotes the transpose of matrix $X$. $\mathbb{E}[\cdot]$ is the expectation of a random variable. $\mathcal{N}(\mu, \Sigma)$ is the Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$, and $X \sim \mathcal{N}(\mu, \Sigma)$ denotes that $X$ follows the Gaussian distribution $\mathcal{N}(\mu, \Sigma)$. $\mathrm{Diag}\{\cdot\}$ denotes a block diagonal matrix.

Problem Formulation
Consider the following networked system under attacks:
$$x_{k+1} = A x_k + w_k,$$
$$y_{i,k} = C_i x_k + v_{i,k} + a_{i,k}, \quad i = 1, 2, \dots, N,$$
where $x_k \in \mathbb{R}^n$ denotes the system state, $y_{i,k} \in \mathbb{R}^{m_i}$ represents the measurement of sensor $i$ at time $k$, and $a_{i,k} \in \mathbb{R}^{m_i}$ is the attack signal. The number of sensors is denoted by $N$. $w_k \in \mathbb{R}^n$ is the process noise with $w_k \sim \mathcal{N}(0, Q)$, and $v_{i,k} \in \mathbb{R}^{m_i}$ is the measurement noise with $v_{i,k} \sim \mathcal{N}(0, R_i)$. Meanwhile, it is assumed that $\mathbb{E}[w_k w_l'] = \delta_{kl} Q$ ($Q \ge 0$) and $\mathbb{E}[v_{i,k} v_{j,l}'] = \delta_{ij} \delta_{kl} R_i$ ($R_i > 0$), where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$, and $\mathbb{E}[w_k v_{i,l}'] = 0$ for all $k, l \in \mathbb{N}$ and $i, j = 1, 2, \dots, N$. The initial state $x_0$ is independent of $w_k$ and $v_{i,k}$ for all $k \ge 0$, and $x_0 \sim \mathcal{N}(0, \Pi_0)$. The pairs $(A, C_i)$ and $(A, \sqrt{Q})$ are detectable and controllable, respectively. The malicious attack $a_{i,k} \in \mathbb{R}^{m_i}$ satisfies the following assumptions:

Assumption 1. Any $s$ ($s \le N/2$) sensors can be corrupted by the adversary, which changes the output values of these sensors. $a_{i,k} = 0$ only when sensor $i$ is unattacked.

Assumption 4. $a_{i,k}$ is statistically independent of $\{w_K\}_{K>k}$ and $\{v_{i,K}\}_{K>k}$.

Remark 1. According to [23,24], the state of a system cannot be accurately reconstructed when more than half of the sensors are attacked. Thus, we assume in this paper that the number of compromised sensors does not exceed $N/2$, i.e., the upper limit of $s$ is $N/2$.

Journal of Sensors

When the system is not attacked, the measurements at time instant $k$ can be stacked as
$$y_k = C x_k + v_k,$$
where $y_k = [y_{1,k}', \dots, y_{N,k}']'$, $C = [C_1', \dots, C_N']'$, $v_k = [v_{1,k}', \dots, v_{N,k}']'$, and $v_k \sim \mathcal{N}(0, R)$ with $R = \mathrm{Diag}\{R_1, \dots, R_N\}$. Then, we adopt a centralized Kalman filter as the remote estimator:
$$\hat{x}_k^- = A \hat{x}_{k-1}, \quad P_k^- = A P_{k-1} A' + Q,$$
$$K_k = P_k^- C' (C P_k^- C' + R)^{-1},$$
$$\hat{x}_k = \hat{x}_k^- + K_k (y_k - C \hat{x}_k^-), \quad P_k = (I - K_k C) P_k^-,$$
where $\hat{x}_k^-$ and $\hat{x}_k$ are the a priori and a posteriori estimates of the system state $x_k$, respectively, $P_k^-$ and $P_k$ are the a priori and a posteriori estimation error covariances, respectively, and $K_k$ is the Kalman filter gain.
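To make the stacked model and the centralized filter concrete, the following Python sketch performs one centralized Kalman filter step. It assumes a scalar state and scalar sensors ($n = m_i = 1$), matching the scalar parameter ranges used in the numerical simulation; the function name `centralized_kf_step` is illustrative, not from the paper. Because $C P_k^- C'$ has rank one in this case, the Sherman-Morrison formula yields the gain without forming an $N \times N$ inverse.

```python
def centralized_kf_step(x_post, p_post, y, a, q, c, r):
    """One centralized Kalman filter step for a scalar state (sketch).

    Assumes n = 1 and m_i = 1, so C = [c_1, ..., c_N]' and R = Diag{r_1, ..., r_N}.
    Returns the a posteriori estimate x_k and covariance P_k.
    """
    # Time update: x^-_k = A x_{k-1},  P^-_k = A P_{k-1} A' + Q
    x_prior = a * x_post
    p_prior = a * p_post * a + q
    # s = C' R^{-1} C, a scalar because the state is scalar
    s = sum(ci * ci / ri for ci, ri in zip(c, r))
    # K_k = P^-_k C' (C P^-_k C' + R)^{-1}; since C P^-_k C' has rank one,
    # Sherman-Morrison gives the j-th entry P^-_k c_j / (r_j (1 + P^-_k s)).
    gain = [p_prior * cj / (rj * (1.0 + p_prior * s)) for cj, rj in zip(c, r)]
    # Measurement update: x_k = x^-_k + K_k (y_k - C x^-_k),  P_k = (I - K_k C) P^-_k
    x_new = x_prior + sum(kj * (yj - cj * x_prior) for kj, yj, cj in zip(gain, y, c))
    p_new = (1.0 - sum(kj * cj for kj, cj in zip(gain, c))) * p_prior
    return x_new, p_new
```

With a single sensor this reduces to the familiar scalar Kalman filter, and in general $P_k = \big((P_k^-)^{-1} + C'R^{-1}C\big)^{-1}$, which is the information form used next.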
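From [21], the information form of the Kalman filter can be expressed as
$$P_k^{-1} = (P_k^-)^{-1} + \sum_{i=1}^{N} C_i' R_i^{-1} C_i,$$
$$\hat{x}_k = P_k \Big[ (P_k^-)^{-1} \hat{x}_k^- + \sum_{i=1}^{N} C_i' R_i^{-1} y_{i,k} \Big].$$
Similarly, the local Kalman filter of sensor $i$ can be written as
$$\hat{x}_{i,k}^- = A \hat{x}_{i,k-1}, \quad P_{i,k}^- = A P_{i,k-1} A' + Q,$$
$$P_{i,k}^{-1} = (P_{i,k}^-)^{-1} + C_i' R_i^{-1} C_i, \quad \hat{x}_{i,k} = P_{i,k} \big[ (P_{i,k}^-)^{-1} \hat{x}_{i,k}^- + C_i' R_i^{-1} y_{i,k} \big].$$
Note that $P_k$ and $P_{i,k}$ can be calculated offline. According to [25], the Kalman filter converges exponentially from any initial condition when $(A, C_i)$ and $(A, \sqrt{Q})$ are detectable and controllable, respectively. The steady-state values of the local and centralized Kalman filters are defined as
$$P_i^- = \lim_{k \to \infty} P_{i,k}^-, \quad P_i = \lim_{k \to \infty} P_{i,k}, \quad P^- = \lim_{k \to \infty} P_k^-, \quad P = \lim_{k \to \infty} P_k.$$
It is assumed that the system starts from the steady state with $P_{i,0} = P_i$ and $P_0 = P$, so the fixed gains of the local and centralized Kalman filters can be represented as
$$K_i = P_i^- C_i' (C_i P_i^- C_i' + R_i)^{-1}, \quad K = P^- C' (C P^- C' + R)^{-1}.$$
The objective of this paper is to design a new GMM-based estimation method for systems subject to dynamic adversaries.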
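Since $P_i$ and $P$ can be computed offline, a simple way to obtain them is to iterate the information-form covariance recursion until it reaches its fixed point. The sketch below does this for the scalar case ($n = m_i = 1$, $Q > 0$); `steady_state_kf` and `local_fixed_gain_step` are hypothetical helper names, and the tolerance and iteration cap are arbitrary choices.

```python
def steady_state_kf(a, q, c, r, tol=1e-12, max_iter=10_000):
    """Steady-state covariances and fixed gain for a scalar state (sketch, q > 0).

    Pass one sensor (c=[c_i], r=[r_i]) for the local filter of sensor i, or all N
    sensors for the centralized filter. Returns (P^-, P, K) with K as a list.
    """
    info = sum(ci * ci / ri for ci, ri in zip(c, r))  # C' R^{-1} C
    p = q  # convergence from any P >= 0 relies on detectability/controllability
    for _ in range(max_iter):
        # P_k^{-1} = (A P_{k-1} A' + Q)^{-1} + C' R^{-1} C
        p_next = 1.0 / (1.0 / (a * p * a + q) + info)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    p_prior = a * p * a + q
    # Fixed gain K = P C' R^{-1}, which equals P^- C' (C P^- C' + R)^{-1}
    gain = [p * ci / ri for ci, ri in zip(c, r)]
    return p_prior, p, gain


def local_fixed_gain_step(x_prev, y_i, a, c_i, k_i):
    """Local estimator of sensor i running with its steady-state gain K_i."""
    x_prior = a * x_prev
    return x_prior + k_i * (y_i - c_i * x_prior)
```

Passing a single sensor gives $(P_i^-, P_i, K_i)$; passing all sensors gives the centralized $(P^-, P, K)$, whose error covariance is never larger than any local one.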

The GMM-Based State Estimation
In this section, an error compensator and the GMM-based state estimation algorithm against dynamic adversaries are proposed.

3.1. Modeling and the EM Algorithm.
For a Gaussian mixture model with $\mathbb{Q}$ components [21], the mean and covariance of the $q$-th component $Q_q$ ($q \in \{1, 2, \dots, \mathbb{Q}\}$) are denoted by $\mu^{(q)}$ and $\Sigma^{(q)}$, respectively. $\pi^{(q)}$ is the mixture weight of $Q_q$, and $\sum_{q=1}^{\mathbb{Q}} \pi^{(q)} = 1$. In this case, the mixture density of a Gaussian mixture model can be expressed as
$$p(x) = \sum_{q=1}^{\mathbb{Q}} p(x \mid Q_q) \Pr(Q_q) = \sum_{q=1}^{\mathbb{Q}} \pi^{(q)} f(x; \mu^{(q)}, \Sigma^{(q)}),$$
where $p(x \mid Q_q)$ and $\Pr(Q_q)$ are the Gaussian density and the weight of the $q$-th component, respectively. The function $f(x; \mu, \Sigma)$ is the probability density function (pdf) of a Gaussian random variable:
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Big( -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \Big).$$
At time instant $k$, we denote the means of the state variables for sensor $i$ as $\mu$ …
When sensor $i$ is attacked (defined as the second component), the exact distribution of $\hat{x}_{i,k}$ is unknown, since the specific type and the starting time of the attacks are unknown. In this case, similar to [21], we adopt a Gaussian distribution with matching first and second moments, i.e., $p(\hat{x}_{i,k} \mid Q_2) \sim \mathcal{N}(\mu_k^{(2)}, \Sigma_k^{(2)})$, $\forall i \in \{1, 2, \dots, N\}$, to approximate the distribution of all local estimates in the second component. Then, $\hat{x}_{i,k}$ can be described by the following 2-component Gaussian mixture model:
$$p(\hat{x}_{i,k}) = \pi_k^{(1)} f(\hat{x}_{i,k}; \mu_k^{(1)}, \Sigma_k^{(1)}) + \pi_k^{(2)} f(\hat{x}_{i,k}; \mu_k^{(2)}, \Sigma_k^{(2)}),$$
where $\pi_k^{(1)}$ and $\pi_k^{(2)}$ are the weights of the first and second components at time $k$, respectively.
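For a scalar state ($n = 1$), $|\Sigma|^{1/2}$ reduces to the standard deviation, and the mixture density can be evaluated directly. The following sketch (hypothetical helper names, scalar case assumed) evaluates $p(x)$; a quick numerical check is that a valid mixture density integrates to one.

```python
import math


def gaussian_pdf(x, mu, var):
    """f(x; mu, Sigma) for a scalar state, where Sigma = var > 0."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """p(x) = sum_q pi^(q) f(x; mu^(q), Sigma^(q)) for a scalar GMM."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights pi^(q) must sum to 1")
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
```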
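The observation data set is defined as $\mathcal{X}_k = \{\hat{x}_{1,k}, \hat{x}_{2,k}, \dots, \hat{x}_{N,k}\}$. According to [26,27], the expectation-maximization (EM) algorithm can be adopted to find the maximum likelihood estimates of the parameter $\Phi_k = \{\pi_k^{(q)}, \mu_k^{(q)}, \Sigma_k^{(q)}\}_{q=1,2}$. The log likelihood is
$$\mathcal{L}(\Phi_k) = \sum_{i=1}^{N} \log \sum_{q=1}^{2} \pi_k^{(q)} f(\hat{x}_{i,k}; \mu_k^{(q)}, \Sigma_k^{(q)}). \tag{14}$$
Generally, the EM algorithm consists of two steps: the expectation step and the maximization step. First, the parameter $\Phi_k$ is initialized at each time $k$; then, the expectation step generates a belief $\gamma_{i,k}^{(q)}$ ($q = 1, 2$) for each sensor based on $\Phi_k$ and $\hat{x}_{i,k}$:
$$\gamma_{i,k}^{(q)} = \frac{\pi_k^{(q)} f(\hat{x}_{i,k}; \mu_k^{(q)}, \Sigma_k^{(q)})}{\sum_{l=1}^{2} \pi_k^{(l)} f(\hat{x}_{i,k}; \mu_k^{(l)}, \Sigma_k^{(l)})}, \tag{15}$$
and the maximization step updates $\Phi_k$ using these beliefs:
$$N_k^{(q)} = \sum_{i=1}^{N} \gamma_{i,k}^{(q)}, \tag{16}$$
$$\pi_k^{(q)} = \frac{N_k^{(q)}}{N}, \tag{17}$$
$$\mu_k^{(q)} = \frac{1}{N_k^{(q)}} \sum_{i=1}^{N} \gamma_{i,k}^{(q)} \hat{x}_{i,k}, \tag{18}$$
$$\Sigma_k^{(q)} = \frac{1}{N_k^{(q)}} \sum_{i=1}^{N} \gamma_{i,k}^{(q)} (\hat{x}_{i,k} - \mu_k^{(q)})(\hat{x}_{i,k} - \mu_k^{(q)})'. \tag{19}$$
The expectation and maximization steps iterate until they converge. This iterative procedure maximizes a concave lower bound of the log likelihood in (14).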
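A minimal scalar sketch of the EM iteration (15)-(19) follows. It assumes $n = 1$, runs the expectation step in the log domain to avoid underflow, and adds a variance floor `var_floor` and a floor on $N_k^{(q)}$; these safeguards are implementation choices of the sketch, not part of the algorithm above. The caller supplies the initial parameters; for example, the second component can be initialized with the sample mean and variance of the local estimates, in line with how the second cluster is initialized in this paper.

```python
import math


def _log_pdf(x, mu, var):
    """log f(x; mu, Sigma) for a scalar Gaussian with variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def em_two_component(xs, init, max_iter=200, tol=1e-10, var_floor=1e-6):
    """EM for a 2-component scalar GMM fitted to local estimates x_hat_{i,k}.

    init = ([pi1, pi2], [mu1, mu2], [var1, var2]). Returns (gamma, (pi, mu, var)),
    where gamma[i][q] is the belief that x_hat_{i,k} belongs to component q + 1.
    """
    pi, mu, var = (list(v) for v in init)
    n = len(xs)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step (15), computed with log-sum-exp; ll accumulates the log likelihood (14)
        gamma, ll = [], 0.0
        for x in xs:
            logs = [math.log(pi[q]) + _log_pdf(x, mu[q], var[q]) for q in (0, 1)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += log_total
            gamma.append([math.exp(v - log_total) for v in logs])
        # M-step (16)-(19)
        for q in (0, 1):
            nq = max(sum(g[q] for g in gamma), 1e-12)
            pi[q] = nq / n
            mu[q] = sum(g[q] * x for g, x in zip(gamma, xs)) / nq
            var[q] = max(var_floor, sum(g[q] * (x - mu[q]) ** 2 for g, x in zip(gamma, xs)) / nq)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return gamma, (pi, mu, var)
```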

3.2. The Error Compensator.
In this subsection, an error compensator is proposed to reduce the influence of incorrect data on state estimation.
According to Section 3.1, the EM algorithm can be used to compute the GMM parameters and find the maximum likelihood estimates. However, the convergence and clustering results of the EM algorithm depend on the initial parameters. In this paper, the first and second moments are adopted as the initial parameters of the second cluster. Because the dynamic adversary is random and its specific type is unknown, the outputs of some attacked sensors may resemble those of normal sensors at some moments. In this case, $\gamma_{i,k}^{(1)}$ may be miscalculated as $\gamma_{i,k}^{(2)}$ in the iterative process (15)-(19), since the EM algorithm considers the observed data to be closer to the second cluster. When this occurs, the estimation accuracy degrades seriously because fewer than $N/2$ local estimates remain available for fusion. On the other hand, measurements that resemble the true measurements can still provide useful information for remote state estimation, which means that the data assigned to the second cluster can also be used to estimate the system state. Hence, a compensator is designed to solve the above problem.
… at time instant $k$, which can be calculated as follows: … According to the EM algorithm, $\gamma_{i,k}^{(2)}$ tends to 1 if and only if sensor $i$ is attacked and the expectation step is accurate, in which case the total belief of the second cluster approaches $s$. When the expectation step is miscalculated, $\gamma_{i,k}^{(2)}$ approaches 0 for the attacked sensor $i$. According to Assumptions 1-4, the number of compromised sensors does not exceed $N/2$ (namely, $s \le N/2$), which means $N - s \ge N/2$. Hence, when the expectation step is miscalculated, the total belief of the second cluster exceeds $N/2$. Based on the above analysis, the compensator is designed as follows: … where $\hat{\gamma}_{i,k}^{(q)}$ ($q = 1, 2$) are the modified beliefs, and $\varepsilon \ge s/N$ is a threshold that can be adjusted according to the performance requirements of the actual system.
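The sketch below encodes one plausible compensator rule, which is an assumption for illustration rather than the exact design above: if the average belief assigned to the second (attacked) cluster exceeds $\varepsilon$, EM is assumed to have swapped the clusters, and each sensor's two beliefs are exchanged. This reading is consistent with the analysis above (the second-cluster belief totals about $s \le N/2$ when EM is correct and more than $N/2$ when it is not) and with the requirement $\varepsilon \ge s/N$. The name `compensate_beliefs` is hypothetical.

```python
def compensate_beliefs(gamma, eps):
    """Hypothetical error compensator (assumed swap rule, not the paper's exact equation).

    gamma[i] = [gamma1, gamma2] from the EM step for sensor i; eps is the threshold.
    Returns (modified beliefs, whether a swap was applied).
    """
    n = len(gamma)
    # Average belief that a sensor belongs to the "attacked" (second) cluster
    mass_attacked = sum(g[1] for g in gamma) / n
    if mass_attacked > eps:
        # Too much mass in cluster 2 for s <= N/2 attacked sensors: assume a swap
        return [[g[1], g[0]] for g in gamma], True
    return [list(g) for g in gamma], False
```

Under this reading, a very small $\varepsilon$ triggers swaps even when EM is correct, while a very large $\varepsilon$ never corrects a swap, which matches the qualitative threshold behavior reported in the simulations.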

3.3. The GMM-Based State Estimation Approach against Dynamic Attacks.
In this subsection, a GMM-based estimation algorithm is proposed to deal with dynamic attacks, which effectively improves the estimation accuracy.
… where the initial values $\hat{x}_0$ and $P_0$ are the steady-state values of the remote estimator for $k \le 0$.

Proof. According to Definition 2 in [16,28], if $s$ ($s \le N/2$) sensors are attacked, the following system is still observable in the absence of attacks:
$$x_{k+1} = A x_k + w_k, \quad y_{\mathcal{S},k} = C_{\mathcal{S}} x_k + v_{\mathcal{S},k},$$
where $\mathcal{S} \subseteq \{1, 2, \dots, N\}$ is the set of unattacked sensors and $y_{\mathcal{S},k}$ is the measurement stacked over $\mathcal{S}$. Similarly, $C_{\mathcal{S}}$ and $v_{\mathcal{S},k}$ are the output matrix and the measurement noise stacked over $\mathcal{S}$, respectively, and the pair $(A, C_{\mathcal{S}})$ is observable. According to Section II, Equation (6) can be expanded as …, where the default weight of each sensor is equal to 1 when the sensor is not attacked.
Based on the above analysis, the remote state estimate $\hat{x}_k$ can be calculated using the undamaged sensors. The belief $\hat{\gamma}_{i,k}^{(1)}$ represents the probability that sensor $i$ is undamaged. Then, the local data can be fused by adopting $\hat{\gamma}_{i,k}^{(1)}$ as the new weight of the local data, which yields Equations (22a)-(22d).
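As an illustration (not necessarily identical to (22a)-(22d)), the following sketch weights local estimates by $\hat{\gamma}_{i,k}^{(1)}$ in the information form for a scalar state. It relies on the steady-state local-filter identities $C_i' R_i^{-1} C_i = P_i^{-1} - (P_i^-)^{-1}$ and $C_i' R_i^{-1} y_{i,k} = P_i^{-1} \hat{x}_{i,k} - (P_i^-)^{-1} \hat{x}_{i,k}^-$, so that all weights equal to 1 recover the centralized information-form filter and a weight of 0 discards a sensor. The name `weighted_fusion_step` and its argument layout are hypothetical.

```python
def weighted_fusion_step(x_prev, p_prev, a, q, local, weights):
    """Belief-weighted information fusion of local estimates (scalar-state sketch).

    local[i] = (x_i, x_i_prior, p_i, p_i_prior): posterior/prior local estimates of
    sensor i and its (steady-state) posterior/prior covariances.
    weights[i] = modified belief hat_gamma^(1)_{i,k} in [0, 1].
    """
    # Remote time update
    x_prior = a * x_prev
    p_prior = a * p_prev * a + q
    info = 1.0 / p_prior              # information matrix P_k^{-1}
    info_state = x_prior / p_prior    # information vector P_k^{-1} x_k
    for (x_i, x_i_prior, p_i, p_i_prior), w in zip(local, weights):
        # New information from sensor i: C_i' R_i^{-1} C_i = P_i^{-1} - (P_i^-)^{-1}
        info += w * (1.0 / p_i - 1.0 / p_i_prior)
        # and C_i' R_i^{-1} y_{i,k} = P_i^{-1} x_i - (P_i^-)^{-1} x_i^-
        info_state += w * (x_i / p_i - x_i_prior / p_i_prior)
    p_new = 1.0 / info
    return p_new * info_state, p_new
```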
The system is assumed to reach steady state before time $k = 0$, and the adversary can launch dynamic attacks at any time $k \ge 1$. Starting from $k = 1$, the local state estimate $\hat{x}_{i,k}$ is calculated from the measurement of sensor $i$ at each time instant $k$. Based on these estimates, the remote estimator clusters the local state estimates and calculates the parameter $\Phi_k$ by the EM algorithm according to Equations (15)-(19). Then, the error compensator is used to correct erroneous beliefs. Finally, based on the modified beliefs $\hat{\gamma}_{i,k}^{(1)}$, the local data are fused by Theorem 2 to obtain the state estimate $\hat{x}_k$. The whole process is summarized in Algorithm 1.
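Putting the steps of this procedure together, the following self-contained sketch runs the loop on a scalar system with 15 sensors and returns the mean squared estimation error after the attack starts. Several ingredients are assumptions made only for illustration: the attack is a uniform offset on a fresh random set of 1 to 6 sensors, the first EM component is initialized at the remote prediction while the second starts from the sample moments, the compensator exchanges the two beliefs whenever the average second-cluster belief exceeds $\varepsilon$, and fusion weights each sensor's information contribution by $\hat{\gamma}_{i,k}^{(1)}$. The function `algorithm1` and its parameters are hypothetical, and the estimates start at zero rather than exactly in steady state.

```python
import math
import random


def _log_pdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _em_beliefs(xs, mu1, var1, iters=50):
    """2-component scalar EM; component 2 starts from the sample moments."""
    n = len(xs)
    m = sum(xs) / n
    pi, mu = [0.5, 0.5], [mu1, m]
    var = [var1, max(1e-6, sum((x - m) ** 2 for x in xs) / n)]
    gam = []
    for _ in range(iters):
        gam = []
        for x in xs:  # E-step in the log domain
            logs = [math.log(pi[q]) + _log_pdf(x, mu[q], var[q]) for q in (0, 1)]
            top = max(logs)
            tot = sum(math.exp(v - top) for v in logs)
            gam.append([math.exp(v - top) / tot for v in logs])
        for q in (0, 1):  # M-step
            nq = max(1e-12, sum(g[q] for g in gam))
            pi[q] = nq / n
            mu[q] = sum(g[q] * x for g, x in zip(gam, xs)) / nq
            var[q] = max(1e-6, sum(g[q] * (x - mu[q]) ** 2 for g, x in zip(gam, xs)) / nq)
    return gam


def algorithm1(a=0.9, q=1.0, n=15, steps=60, t_attack=31, eps=0.45, seed=0):
    """Hypothetical end-to-end run; returns the MSE of x_hat_k for k >= t_attack."""
    rng = random.Random(seed)
    c = [rng.uniform(1.0, 2.0) for _ in range(n)]
    r = [rng.uniform(1.0, 2.0) for _ in range(n)]

    def steady(info):  # fixed point of P = ((A P A' + Q)^{-1} + info)^{-1}
        p = q
        for _ in range(1000):
            p = 1.0 / (1.0 / (a * p * a + q) + info)
        return a * p * a + q, p

    local_cov = [steady(ci * ci / ri) for ci, ri in zip(c, r)]
    _, pr = steady(sum(ci * ci / ri for ci, ri in zip(c, r)))
    x, xr, xl = 0.0, 0.0, [0.0] * n
    sq_err = []
    for k in range(1, steps + 1):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        # Assumed dynamic attack: a fresh random set of 1..6 sensors gets an offset
        attacked = set(rng.sample(range(n), rng.randint(1, 6))) if k >= t_attack else set()
        terms = []
        for i in range(n):
            y = c[i] * x + rng.gauss(0.0, math.sqrt(r[i]))
            if i in attacked:
                y += rng.uniform(-10.0, 10.0)
            pm_i, p_i = local_cov[i]
            xp_i = a * xl[i]
            xl[i] = xp_i + p_i * c[i] / r[i] * (y - c[i] * xp_i)  # fixed gain K_i
            terms.append((1.0 / p_i - 1.0 / pm_i, xl[i] / p_i - xp_i / pm_i))
        x_pred, p_pred = a * xr, a * pr * a + q
        gam = _em_beliefs(xl, x_pred, p_pred)  # component 1 starts at the prediction
        if sum(g[1] for g in gam) / n > eps:   # assumed compensator: swap beliefs
            gam = [[g[1], g[0]] for g in gam]
        info = 1.0 / p_pred + sum(g[0] * t[0] for g, t in zip(gam, terms))
        info_x = x_pred / p_pred + sum(g[0] * t[1] for g, t in zip(gam, terms))
        pr = 1.0 / info
        xr = pr * info_x
        sq_err.append((xr - x) ** 2)
    tail = sq_err[t_attack - 1:]
    return sum(tail) / len(tail)
```

Calling `algorithm1(eps=1.0)` disables the assumed compensator, which gives a simple way to compare runs with and without it under the same random seed.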

Numerical Simulation
In this section, the effectiveness of the GMM-based estimation algorithm is verified through numerical simulations. Similar to [21], we consider a linear time-invariant dynamic process measured by 15 sensors. The system parameters $A$ and $Q$ are randomly generated from the intervals [0.4, 0.99] and [0.5, 2], respectively, and $C_i$ and $R_i$, $i \in \{1, 2, \dots, N\}$, are randomly generated from the interval [1, 2]. The system reaches steady state before $t = 30$, and the attack starts at $t = 31$; at each time instant $t \ge 31$, $s$ ($1 \le s \le 6$) sensors are attacked by $a_{i,k}$.

4.1. Example 1.
In this example, the estimation accuracy of the GMM-based method with and without the compensator is compared under dynamic attacks. Similar to [15], the attack signal $a_{i,k}$ is assumed to be a linear function of the measurement noise: … where $\beta$ and $\Theta$ are real numbers drawn from the intervals [-5, 5] and [-10, 10], respectively. Meanwhile, $a_{i,k}$ satisfies Assumptions 1-4. The threshold is set to $\varepsilon = 0.45$ in this example. Figure 2 plots the trajectories of the actual state and the states estimated by the GMM-based method with and without the compensator. The estimates of the GMM-based method with the compensator (dotted line) are closer to the actual state than those without the compensator (red line). Figure 3 shows the estimation error covariance of the GMM-based method with and without the compensator. The error covariance of the method without the compensator (red line) is larger than that with the compensator (black line), which indicates that the proposed error compensator effectively reduces the impact of faulty data on state estimation. According to Figures 2 and 3, the GMM-based method with the compensator achieves higher estimation accuracy than that without the compensator under dynamic attacks.
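The dynamic adversary of this simulation setup can be sketched as follows: at every time instant a fresh random set of $s \in \{1, \dots, 6\}$ of the 15 sensors is attacked. The affine form $a_{i,k} = \beta v_{i,k} + \Theta$, with $\beta$ and $\Theta$ drawn from [-5, 5] and [-10, 10], is an assumption about how $\beta$ and $\Theta$ enter the attack model of Example 1, and `dynamic_attack` is a hypothetical helper.

```python
import random


def dynamic_attack(noise, s_max=6, beta_range=(-5.0, 5.0), theta_range=(-10.0, 10.0), rng=None):
    """Draw one time step of a hypothetical dynamic attack.

    noise[i] is the measurement noise v_{i,k} of scalar sensor i. A new random set
    of s in [1, s_max] sensors is attacked at every call, and each attacked sensor
    receives a_{i,k} = beta * v_{i,k} + theta (an assumed affine form).
    Returns (attack values, set of attacked sensor indices).
    """
    if rng is None:
        rng = random.Random()
    n = len(noise)
    if not 1 <= s_max <= n / 2:
        raise ValueError("the number of attacked sensors must not exceed N/2")
    attacked = set(rng.sample(range(n), rng.randint(1, s_max)))
    attack = [0.0] * n
    for i in attacked:
        beta = rng.uniform(*beta_range)
        theta = rng.uniform(*theta_range)
        attack[i] = beta * noise[i] + theta
    return attack, attacked
```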
The number of attacked sensors at each time instant $t \ge 31$ is plotted in Figure 4, and the state estimates and the corresponding error covariances of the GMM-based algorithm for different compensator thresholds are shown in Figures 5 and 6, respectively. The estimation accuracy is higher for $\varepsilon = 0.45$ and $0.65$ than for $\varepsilon = 0.15$ and $0.95$, which indicates that the performance of the remote estimator deteriorates when $\varepsilon$ is too large or too small. Hence, the threshold can be tuned according to the performance requirements of the real system.

Distributed and centralized $\chi^2$ false-data detectors are common; they determine whether an attack exists based on the statistical characteristics of the innovations $y_{i,k} - C_i \hat{x}_{i,k}^-$ and $y_k - C \hat{x}_k^-$, respectively. From [21], a well-designed dynamic attack can bypass the distributed $\chi^2$ detector but cannot remain stealthy against the centralized $\chi^2$ false-data detector. In this subsection, the proposed approach is compared with estimation methods based on these $\chi^2$ false-data detectors.
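For reference, the two $\chi^2$ detectors can be sketched for a scalar state as below. The detection statistic is $z' S^{-1} z$, with innovation covariance $S = C_i P_i^- C_i' + R_i$ (distributed) or $S = C P^- C' + R$ (centralized). The default thresholds are the 95% chi-square quantiles for 1 and 15 degrees of freedom, which assumes scalar sensors, $N = 15$, and a 5% false-alarm level; the function names are hypothetical.

```python
def chi2_distributed(y_i, x_prior_i, c_i, r_i, p_prior_i, threshold=3.841):
    """Local chi-square test on the innovation z = y_i - c_i x^-_i (scalar sensor).

    S = c_i P^-_i c_i + r_i; an alarm is raised when z^2 / S exceeds the threshold
    (3.841 is the 95% quantile of chi-square with 1 degree of freedom).
    """
    z = y_i - c_i * x_prior_i
    g = z * z / (c_i * p_prior_i * c_i + r_i)
    return g > threshold, g


def chi2_centralized(y, x_prior, c, r, p_prior, threshold=24.996):
    """Centralized test with z = y - C x^- and S = C P^- C' + R (scalar state).

    S^{-1} is applied through the Sherman-Morrison formula; 24.996 is the 95%
    quantile of chi-square with 15 degrees of freedom (N = 15 scalar sensors).
    """
    z = [yi - ci * x_prior for yi, ci in zip(y, c)]
    zrz = sum(zi * zi / ri for zi, ri in zip(z, r))          # z' R^{-1} z
    crz = sum(ci * zi / ri for ci, zi, ri in zip(c, z, r))   # c' R^{-1} z
    crc = sum(ci * ci / ri for ci, ri in zip(c, r))          # c' R^{-1} c
    g = zrz - p_prior * crz * crz / (1.0 + p_prior * crc)    # z' (R + P^- c c')^{-1} z
    return g > threshold, g
```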
Similar to [21], the attack signal $a_{i,k}$ is set as … where $a_{i,k}$ satisfies Assumptions 1-4. Figure 7 plots the trajectories of the actual state and of the states estimated by the methods based on the different detectors. The GMM-based estimate (black line) is closer to the actual state than the estimates based on the distributed and centralized $\chi^2$ detectors (red and green lines). Figure 8 shows the estimation error covariances of the corresponding methods: the error covariance of the GMM-based estimator is much smaller than those of the detector-based estimators. These results indicate that the proposed GMM-based approach effectively improves estimation performance.

Conclusion
This paper studies the state estimation problem under dynamic malicious attacks. An error compensator is presented, which effectively reduces the influence of erroneous local data on state estimation. Building on it, a new GMM-based state estimation algorithm is proposed to improve the estimation accuracy for systems subject to dynamic attacks. Finally, the effectiveness of the proposed algorithm is verified by numerical simulations. In future work, we will extend the GMM-based approach to systems with parametric uncertainties.

Data Availability
Some or all data, models, or code generated or used during the study are available from the corresponding author (Cui Zhu) upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest related to this work.