F-DDIA: A Framework for Detecting Data Injection Attacks in Nonlinear Cyber-Physical Systems

Data injection attacks on a cyber-physical system manipulate a number of measurements in order to alter the estimated real-time system states. Much recent research has focused on detecting such attacks; however, most detection methods do not work well for nonlinear systems. In this paper, we present a compressive sampling methodology that identifies the attack by exploiting its sparsity and that determines how many and which measurement signals have been manipulated. The methodology can be applied to both linear and nonlinear systems. Experimental testing on the IEEE 14-bus system, using realistic load patterns from NYISO under various attack scenarios, confirms that our detector identifies attacks with high precision and recall.


Introduction
A cyber-physical system (CPS) is a dynamical system that integrates computational components (i.e., real-time operations) with physical components (i.e., hardware facilities). Examples of CPS include large-scale distributed systems such as smart grids, transportation networks, railway control systems, and medical monitoring systems. The design of a CPS involves various disciplines, such as control engineering, software engineering, mechanics, and networking. In particular, control engineering relies on a communication network that transmits sensor data (measurements) so that the system operator can monitor the production process in real time. Within the control layer, a scheme called the bad data detector (BDD) is applied to detect whether sensor data have been disrupted by generic malfunctions or malicious attacks. The classical BDD technique uses the "residual principle," which calculates the difference between the observed readings and the readings computed from the estimated system states. When bad data are present, the BDD removes those sensor readings whose residuals are larger than a threshold.
As recent discoveries of system malware have exposed more vulnerabilities, concerns about the security of CPS are growing. The malware known as Stuxnet [1], discovered in 2010, successfully penetrated the networks of Iran's uranium enrichment infrastructure via programmable logic controllers. This instance shows that it is possible for an attacker to introduce errors into physical readings. Inspired by this attack strategy, a class of attacks named data injection attacks has been proposed in recent years; these attacks can affect the system control algorithms and thus lead to abnormal operations [2,3]. Hence, sufficient attention should be paid to detection techniques against this attack, which is easy to implement for strong adversaries who are knowledgeable about the targeted systems.
To counter this attack, existing works focus on the detection of data injection attacks and the protection of nonlinear measurements [4,5]. Detectors utilizing the sparsity and low rank of the system topology are proposed in [6][7][8]. Greedy and game-theoretic methods have been used to optimize the placement of devices [9] so as to lower the possibility of constructing data injection attacks. The application of machine learning techniques to classification is proposed in [10], whose authors present a "first difference aware" machine learning (FDML) classifier to detect cyber attacks. A graph theory-based algorithm is proposed in [11] to determine which measurement signals an attacker will alter. However, we notice that all detection models except [11,12] are built in a constrained setting that assumes the functions from system states to measurements are linear. This assumption is too stringent for some nonlinear systems, for example, the alternating current (AC) model in power grids.
This paper investigates an alternative approach to detecting data injection attacks in nonlinear systems. We propose a detector framework named F-DDIA that reconstructs the initial states of the plant from the corrupted observations, which we formulate as an error correction problem. In particular, we notice that, due to the nature of data injection attacks, only a small fraction of the observations are expected to be attacked at a given time instance. Thus, we formulate the error correction problem as a sparse optimization problem that can be solved with the general ℓ1-minimization technique; among the available minimization techniques, we apply the Douglas-Rachford method [13]. Furthermore, we employ the "divide-and-conquer" principle to construct a compressive sensing model of a linear subspace, which is of interest in general mathematical settings.
To validate and illustrate our algorithm, we use real-world CPS power grids as a case study. In particular, we use the data injection attack model proposed in [2], where the attacks are mounted by injecting false data into the sensors. Simulations based on the IEEE 14-bus test system validate the effectiveness of our methodology. The results show that the proposed algorithm can efficiently identify data injection attacks (i.e., with high precision and recall values) and recover the initial system states (i.e., with small average phase error).
The rest of this paper is organized as follows. Section 2 presents the system model of a nonlinear system, including preliminaries on a broad class of attacks. Section 3 states the problem and derives a theoretical justification of the efficacy of the security algorithm in a general cyber-physical system model. Section 4 analyzes the performance of the proposed approach through simulations. Section 5 gives concluding remarks.

System Model and Bad Data Detector.
A cyber-physical system is usually described by the following widely adopted discrete-time nonlinear dynamical model:

$$x[k+1] = f(x[k]) + B\,u[k] + w[k], \qquad y[k] = h(x[k]) + v[k],$$

where, at time $k \in \mathcal{I} \triangleq \{0, \ldots, T-1\}$, $x[k] \in \mathbb{R}^{n}$ is the system state; $u[k] \in \mathbb{R}^{m}$ is the bounded input vector; $y[k] \in \mathbb{R}^{p}$ is the measurement vector (data collected by the sensors); $w[k]$ denotes the state noise (i.e., Gaussian with known statistics); and $v[k]$ denotes the measurement errors. Here $B$ is a constant matrix, $f : \mathbb{R}^{n} \to \mathbb{R}^{n}$ denotes the state transition function, and $h : \mathbb{R}^{n} \to \mathbb{R}^{p}$ denotes the topology of the system; both $f$ and $h$ are nonlinear functions of the states. The process of estimating system states from the measurements is called state estimation.
In traditional weighted least squares (WLS) state estimation, the system states are considered valid only if the norm of the measurement residual vector $r[k]$ is less than a threshold [14],

$$r[k] = y[k] - h(\hat{x}[k]),$$

where $\hat{x}[k]$ is the system state estimated by the state estimation process. Specifically, the presence of bad measurements is inferred if $\|r[k]\| > \tau$, where $\tau$ is a chosen identification threshold. Upon detection of bad data, two methods, the largest normalized residual test ($r^{N}_{\max}$) and the hypothesis testing identification (HTI) method, are widely used to identify which measurements contain bad data.
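As a rough illustration of the residual test described above, the following sketch applies it to a linear measurement model $y = Hx + v$. The linear model, the matrix `H`, the identity weighting `R_inv`, and the threshold value `tau` are assumptions made for this example only; they are not taken from the experiments in this paper.

```python
import numpy as np

def wls_estimate(H, y, R_inv):
    """Weighted least squares state estimate for the linear model y = H x + v."""
    gain = H.T @ R_inv @ H                      # gain matrix H^T R^-1 H
    return np.linalg.solve(gain, H.T @ R_inv @ y)

def residual_test(H, y, R_inv, tau):
    """Return (flag, residual_norm); flag is True when bad data is inferred."""
    x_hat = wls_estimate(H, y, R_inv)
    residual = y - H @ x_hat                    # r = y - h(x_hat), with h linear here
    norm_r = float(np.linalg.norm(residual))
    return norm_r > tau, norm_r

# Hypothetical system: 8 sensors observing 3 states.
H = np.vstack([np.eye(3), np.eye(3), [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]])
x_true = np.array([1.0, -0.5, 2.0])
rng = np.random.default_rng(0)
y_clean = H @ x_true + 0.01 * rng.standard_normal(8)

y_bad = y_clean.copy()
y_bad[2] += 5.0                                 # one gross error on sensor 2

R_inv = np.eye(8)                               # assumed equal sensor weights
tau = 0.5                                       # illustrative threshold

clean_flag, clean_norm = residual_test(H, y_clean, R_inv, tau)
bad_flag, bad_norm = residual_test(H, y_bad, R_inv, tau)
print(clean_flag, bad_flag)
```

In this toy setting the gross error raises the residual norm well above the threshold, whereas the noise-only measurements stay below it.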

Data Injection Attack.
Data injection attacks are also commonly known as false data injection attacks [2] or data framing attacks [3,15], in the sense of the following definition.
Definition 1. A vector $a[k]$ is called a (, )-data injection attack if there exists an index set  ∈ A, where A is the set of manipulated measurements and A ⊂ P ≜ {1, . . ., }, such that
Implementing this class of attack requires the attacker to know either the measurement information () or the topology configuration ($h(\cdot)$). Specifically, a data injection attack can be written in the form

$$\tilde{y}[k] = y[k] + a[k],$$

where $\tilde{y}[k]$ is the corrupted measurement and $a[k]$ is the injected false measurement data. There are many ways to generate this type of attack. For example, if $h(\cdot)$ is available to the attacker, the attack $a$ can be constructed in the following form (namely, the false data injection attack in a linear system):

$$a = Hc,$$

where $c$ is the error injected on the system state and $H = \partial h(x)/\partial x$ is the Jacobian matrix. However, to implement this attack, the attacker needs to take control of at least  sensors, where  ≤ .
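To see why an attack of the form $a = Hc$ is difficult to catch with a residual test, the sketch below compares the residual norm without an attack, with a structured attack, and with a naive single-sensor attack. It assumes a linear model and ordinary least squares; the matrix `H`, the state `x_true`, and the injected vector `c` are hypothetical values chosen for illustration.

```python
import numpy as np

def residual_norm(H, y):
    """Norm of the least-squares residual y - H x_hat."""
    x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
    return float(np.linalg.norm(y - H @ x_hat))

# Hypothetical linearized system: 8 sensors, 3 states.
H = np.vstack([np.eye(3), np.eye(3), [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]])
x_true = np.array([1.0, -0.5, 2.0])
rng = np.random.default_rng(1)
y = H @ x_true + 0.01 * rng.standard_normal(8)

c = np.array([0.3, 0.0, -0.2])        # error the attacker wants on the state
a_structured = H @ c                  # attack lying in the column space of H
a_naive = np.zeros(8)
a_naive[0] = 1.0                      # arbitrary change to a single sensor

r_clean = residual_norm(H, y)
r_structured = residual_norm(H, y + a_structured)
r_naive = residual_norm(H, y + a_naive)
print(r_clean, r_structured, r_naive)
```

Because $Hc$ lies in the column space of $H$, the estimator absorbs it into the state estimate and the residual is unchanged, while the naive attack leaves a visible residual.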

Measurement Dynamics.
We can use a polynomial regression approach to fit the measurement dynamics, where   : R  → R  denotes the dynamics of the measurements. Furthermore, we define   [] as the $i$th corrupted measurement at time $k$. That is, a polynomial regression model that expresses the dynamics of the $i$th measurement can be given as follows: where  is called the degree of the polynomial and $i \in \mathcal{P}$.
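The following sketch shows one way such a polynomial regression could be set up. It assumes, purely for illustration, that the increment of a single measurement is modeled as a polynomial in its current value, $y[k+1] - y[k] \approx \sum_{j=0}^{d} \gamma_j\, y[k]^j$; this specific form, the function names, and the synthetic trace are assumptions and may differ from the exact model used in this paper.

```python
import numpy as np

def fit_increment_model(y, degree):
    """Least-squares fit of y[k+1] - y[k] as a polynomial in y[k].

    Returns coefficients gamma_0, ..., gamma_degree (lowest order first).
    """
    y = np.asarray(y, dtype=float)
    V = np.vander(y[:-1], degree + 1, increasing=True)   # columns 1, y, y^2, ...
    gamma, *_ = np.linalg.lstsq(V, np.diff(y), rcond=None)
    return gamma

def predict_next(y_k, gamma):
    """One-step prediction y[k+1] = y[k] + sum_j gamma_j * y[k]**j."""
    return y_k + sum(g * y_k**j for j, g in enumerate(gamma))

# Synthetic trace generated from known (hypothetical) coefficients.
gamma_true = np.array([0.3, -0.2, 0.01])
trace = [0.0]
for _ in range(25):
    trace.append(predict_next(trace[-1], gamma_true))
trace = np.array(trace)

gamma_hat = fit_increment_model(trace, degree=2)
print(gamma_hat)
```

On noise-free synthetic data the fit recovers the generating coefficients; with real load traces the residual of the fit would indicate whether the chosen degree is adequate.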

Our Methodologies
In this section, we formulate the detection problem as an error correction problem. We then explain why the ℓ1-norm minimization technique (including Douglas-Rachford) can be used to solve the detection problem.

Sparse Optimization Problem Formulation.
In this paper, we consider the scenario in which an attacker is limited to the resources of  sensors and possesses knowledge of the system topology ℎ, as well as the historical measurements  = ([0]; . . .; [ − 1]) ∈ R  . Denote by  = ([0]; . . .; [ − 1]) ∈ R  the initial measurements (without attacks) over time. The obtained temporal observations  can be expressed as where A = ([0]; . . .; [ − 1]) ∈ R  . Recall that, due to the nature of data injection attacks, only a small fraction of the observations are expected to be attacked at a given time instance. Hence, noting the sparsity of the vector A, the detection problem can be converted to minimize where  is the maximum number of meters that can be compromised. Under certain conditions, we focus on the problem of recovering the sparse vector A from . We denote the optimal solution of problem (10) by A * .

Subproblem Formulation. In the rest of this paper, we define the matrices
and  = [ 1 , . . .,  + ]. We further define the matrices , , and  in the following forms: We can further obtain the following formulation among   ∈ R  ,   ∈ R  , and   ∈ R  : We denote by col ∈I () ∈ R  the columns of the matrix . Hence, problem (10) Note that ‖‖ ℓ 0 = ∑  =1 ‖  ‖ ℓ 0 ; we can further solve problem (13) by seeking the locally optimal choice for each  *  in the hope of finding a globally optimal solution ( * ): minimize The solution of this subproblem (14) will be given in Section 3.4. After the  optimization problems above are solved, the optimal solution  * is checked against the following constraints: For any  ∈ I, if (col  ( * )) = 1, an attack exists; otherwise, no data injection attack exists.

Solving Subproblem by
Then we use the notation W as follows: where the matrices Γ  ∈ R  and Ψ  ∈ R × are In this paper, we use the approximation   ([]) ≐   ([]). The reason for this approximation is that the difference of [] and [] is Since the values of  ,1 are small ( = 1, . . ., ),   ([]) ≐   ([]). Our experiments support this approximation. Then, W in (17) can be updated as We can further take the QR decomposition of Γ  ∈ R  [16]: where By using the second block row, we can solve the following problem to obtain the sparse solution , instead of A: Hence, the problem is reduced to reconstructing a sparse vector   from the observations   2 W . Problem (14) is equivalent to the following problem: minimize where   ∈ R  . As discussed above, solving problem (24) is in general NP-hard, since it requires a search over all subsets of columns of   2 Ψ  , a procedure with exponential complexity. To overcome this problem, a frequently used approach considers a similar program in the ℓ1-norm: minimize This operation is common and can be found in [13,17,18]. Throughout this paper, we use the Douglas-Rachford splitting algorithm [13] for the above ℓ1-minimization.
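The two steps described above, eliminating the smooth part of the signal through a QR decomposition and then recovering the sparse error by ℓ1-minimization with Douglas-Rachford splitting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it assumes a single-sensor model $w = \Gamma\theta + e$ in which $\Gamma$ is a polynomial basis over time and the attack $e$ enters directly (i.e., the matrix multiplying $e$ is the identity); the names `douglas_rachford_bp`, `Gamma`, `theta`, and `e_true`, as well as the step size and iteration count, are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def douglas_rachford_bp(A, b, gamma=1.0, iters=3000):
    """Approximately solve: minimize ||e||_1 subject to A e = b."""
    A_pinv = np.linalg.pinv(A)

    def project(z):
        # Euclidean projection onto the affine set {e : A e = b}.
        return z - A_pinv @ (A @ z - b)

    z = np.zeros(A.shape[1])
    for _ in range(iters):
        e = project(z)
        z = z + soft_threshold(2.0 * e - z, gamma) - e
    return project(z)

# Hypothetical single-sensor example over T time instants.
T, degree = 30, 2
t = np.linspace(0.0, 1.0, T)
Gamma = np.vander(t, degree + 1, increasing=True)    # smooth (polynomial) part
theta = np.array([1.0, -2.0, 0.5])
e_true = np.zeros(T)
e_true[[7, 19]] = [0.8, -1.2]                        # two attacked time instants
w = Gamma @ theta + e_true

# Full QR of Gamma: the trailing columns of Q span the orthogonal
# complement of range(Gamma), so Q2.T annihilates Gamma @ theta.
Q, _ = np.linalg.qr(Gamma, mode="complete")
Q2 = Q[:, degree + 1:]
A = Q2.T
b = Q2.T @ w                                         # equals Q2.T @ e_true

e_hat = douglas_rachford_bp(A, b)
support = np.flatnonzero(np.abs(e_hat) > 0.1)
print(support)
```

The design choice worth noting is that the projection step never needs the unknown `theta`: after multiplying by `Q2.T`, only the sparse error remains in the constraint.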

Theoretical Guarantee.
In this paper, we are also interested in the theoretical conditions under which a solution of the problem is guaranteed. It is well known that the inverse problem of finding the solution to a compressive sensing problem involves mathematical questions on the existence, uniqueness, and stability of the solution. Moreover, the equivalence of the solutions of (13) and (25) is not obvious and requires proof. We therefore consider two questions for a given   2 Ψ  and signal   2 W ( ∈ P): (i) uniqueness: under which conditions is a sparsest solution to problem (13)/(25) necessarily unique? and (ii) equivalence: under which conditions is a sparse solution to problem (13) also the solution of problem (25)?

Uniqueness.
As described in Section 3.3, solving problem (24) requires an exhaustive search over all subsets of columns of   2 Ψ  , which is a combinatorial procedure with exponential complexity. Inspired by [7,17], Theorem 3 provides a sufficient condition for a unique solution to problem (24). It guarantees that a unique sparse vector (i.e., ) is obtained from the corrupted observations (i.e., ) for the ℓ0-minimization. We denote by row ∈P () ∈ R  the rows of the matrix . Before giving the theorem, we first introduce the following definition [17].
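Definition 2 (see [17, Definition 1.1]). Let   2 Ψ  be the matrix with the finite collection of vectors col(  2 Ψ  ) ∈I ∈ R  as columns, written $v_k$, $k \in \mathcal{I}$, for short. For every integer $1 \le \nu \le |\mathcal{I}|$, we define the $\nu$-restricted isometry constant $\delta_\nu$ to be the smallest quantity such that

$$(1 - \delta_\nu)\,\|c\|_{\ell_2}^2 \;\le\; \Big\|\sum_{k \in \mathcal{T}} c_k v_k\Big\|_{\ell_2}^2 \;\le\; (1 + \delta_\nu)\,\|c\|_{\ell_2}^2$$

holds for all subsets $\mathcal{T} \subset \mathcal{I}$ of cardinality at most $\nu$ and all real coefficients $(c_k)_{k \in \mathcal{T}}$.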
The number $\delta_\nu$ measures how close the vectors row  (  2 Ψ  ) are to behaving like an orthonormal system. In particular, for $\nu = 1$, we can have To see the relevance of $\delta_\nu$ to the error recovery problem, we consider the following theorem.
Given the condition that ‖col() ∈I ‖ ℓ 0 ≤ , we can conclude that  is also unique to problem (13).
In the literature, considerable effort has been devoted to determining how sparse the error must be for equivalence to hold. Because we use ℓ1-minimization instead of ℓ0-minimization (to obtain the desired error), the conditions in the above result may not be guaranteed. Thus, Theorem 4 gives a general condition that guarantees a unique solution   for the ℓ1-minimization problem.
In conclusion, Theorems 3 and 4 show that the hypothesis of our theorem holds provided that the sparse error can be uniquely corrected. Naturally, if the assumption does not hold, then neither does (13) or (31).
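For small matrices, the restricted isometry constant of Definition 2 can be evaluated by brute force, which may help build intuition for the uniqueness conditions. The sketch below is illustrative only: the function name and the example matrix are hypothetical, and the enumeration over column subsets grows exponentially, so it is not practical for the matrix sizes used in the experiments.

```python
import itertools
import numpy as np

def restricted_isometry_constant(F, nu):
    """Smallest delta such that, for every set T of at most nu columns,
    (1 - delta)||c||^2 <= ||F_T c||^2 <= (1 + delta)||c||^2 for all c."""
    n_cols = F.shape[1]
    delta = 0.0
    for size in range(1, nu + 1):
        for subset in itertools.combinations(range(n_cols), size):
            cols = F[:, list(subset)]
            eigenvalues = np.linalg.eigvalsh(cols.T @ cols)   # ascending order
            delta = max(delta,
                        abs(eigenvalues[0] - 1.0),
                        abs(eigenvalues[-1] - 1.0))
    return delta

# Two unit-norm columns with inner product 0.6 (hypothetical example).
F = np.array([[1.0, 0.6],
              [0.0, 0.8]])

delta_1 = restricted_isometry_constant(F, 1)   # depends only on column norms
delta_2 = restricted_isometry_constant(F, 2)   # also sees column correlation
print(delta_1, delta_2)
```

For single columns the constant reduces to the largest deviation of a squared column norm from one, so unit-norm columns give a value of zero; the correlation between columns only shows up for larger subsets.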

Equivalence.
Next, we discuss the conditions under which it is theoretically possible to use ℓ1-minimization instead of ℓ0-minimization to obtain the sparse solution  (or A). We derive an algorithm for precisely verifying ℓ0-ℓ1 equivalence, using the following definition and proposition from [19].
Note that implication (35) is the condition that we want to verify. As we need to deal with high-dimensional matrices (e.g.,  ∈ R × ), we need asymptotic guarantees of equivalence, which are described in Proposition 6. Our experiments confirm that we can benefit from this equivalence even when the matrices are high-dimensional.

Case Study: Power Network.
We employ a real-world power grid as the test system. A state-space control model of a smart grid consists of buses connected by transmission lines. We use the IEEE 14-bus system as the test system [20], together with the real load data for the year 2016 from the New York Independent System Operator (NYISO). The NYISO load data cover 11 regions (namely, A-K). Similar to [12], the following procedure is used to estimate the 5-minute system state () using load patterns from NYISO.
(1) Link each load bus of the IEEE 14-bus system to one region of NYISO using the following matrix: ( 2 3 4 5 6 9 10 11 12 13 14 The first row of the matrix is the bus number in the IEEE 14-bus system and the second row is the corresponding NYISO region index.
(2) Normalize the load data collected from NYISO to the initial real and reactive load of the corresponding IEEE 14-bus system. Owing to the lack of reactive load information in the NYISO database, we use the direct current (DC) power flow model to estimate system states. This condition can be relaxed when reactive load data are available.
(3) Add the normalized load data on the IEEE 14-bus system.
(4) Estimate the system state ( x) from the solution of power flow analysis for benchmarking purposes. In this paper, we apply the Newton-Raphson algorithm for estimating x.
Similar to [12], we estimate  operating points of the system state () by adding the normalized 5-minute load data to the MATPOWER IEEE 14-bus case file [21]. In this paper, we use one day of NYISO data as the testing set, which yields 288 operating points. We therefore set  = 288 to construct the F-DDIA method.
Second, we prepare the attacked samples as follows. We let the parameter  range from 1 to  = 54 in the IEEE 14-bus test system. For each , we simulate -specific meters to attempt the attack construction ( = ) with a randomly injected error . Thus, at most, a total of 6564 labeled samples, comprising 6017 attack samples and 547 initial samples (without attacks), are prepared.

Parameters in Load Fitting.
According to Section 2.3, the γ in (6) is the parameter of the measurement (load) dynamical model for the power grid system. We estimate γ by polynomial regression using data traces of $y[k+1] - y[k]$. The historical load data from NYISO and the attack samples prepared in the previous section are used to construct the matrix γ (i.e., polynomial regression of order ) in (6). The measurement dynamics at each time  are estimated from the data of the 24 hours prior to that time. For example, to estimate the load dynamics at 0:05 am on Jun 30 in Zone F, the load data samples (which may contain attacks) from 0:05 am on Jun 29 to 0:00 am on Jun 30 are used.
We are concerned with which regression order  is appropriate for fitting the dynamics of the system. The experimental results show that  = 2 is a suitable regression order. Although increasing  improves the load fitting accuracy, it does so at the cost of computation time; we therefore use  = 2 in the rest of our experiments. Table 1 gives the regression results for predicting the dynamical model by using the load data on Jun 30, 2016.
Taking Zone F as an example, Figure 1 shows a quadratic polynomial fit of the load in Zone F with 95% confidence bounds (the 95% interval indicates a 95% chance that a new observation will fall within the bounds). We use hourly data to fit the model; the blue "+" markers represent the actual hourly load, and the green curve is the fitted model.

Performance Metrics.
When A is calculated by our detector, we set the following rule to identify whether the system is attacked: where $\tau_{ob}$ is the observation threshold used when detecting data injection attacks. The parameter $\tau_{ob}$ will be discussed later in this section. We set the indicator $D_i[k] = 1$ when the $i$th measurement at time $k$ is identified as attacked. Then, we identify whether the sample at time $k$ is attacked by aggregating the values of $D_i[k]$ ($i \in \mathcal{P}$). We predict the sample as attacked (denoted as Label[$k$] = 1) if the sum of the $D_i[k]$ is larger than the user-defined threshold N  , and as secure (denoted as Label[$k$] = 0) otherwise. In smart grid networks, the major concern is not only the detection of attack cases but also that of the secure cases. In other words, after applying rule (38), the detector must achieve high precision and recall on both classes in order to avoid false alarms. Therefore, we utilize precision and recall metrics, which are commonly used for classification tasks [10]. Specifically, as Table 2 defines, we denote by CA the number of attacked samples that we identified as attacked, by WA the number of secure samples that we identified as attacked, by CS the number of secure samples that we identified as secure, and by WS the number of attacked samples that we identified as secure. The performance of the proposed detector can then be measured by the precision and recall metrics:

$$P_A = \frac{CA}{CA + WA}, \qquad R_A = \frac{CA}{CA + WS}, \qquad P_S = \frac{CS}{CS + WS}, \qquad R_S = \frac{CS}{CS + WA},$$

where $P_A$ ($P_S$) and $R_A$ ($R_S$) indicate the precision and recall values for the class attacked (secure), respectively. Precision measures the fraction of samples assigned to a class that truly belong to it, and recall measures the fraction of samples of a class that are retrieved.
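The labeling rule and the metrics above can be sketched in a few lines. The sketch assumes, as an illustration only, that a measurement is flagged when the magnitude of its recovered attack entry exceeds the observation threshold; the exact per-measurement rule, the function names, and the toy numbers are assumptions rather than the paper's implementation.

```python
import numpy as np

def label_samples(A_hat, tau_ob, n_min):
    """A_hat has shape (p, T): recovered attack values per measurement and time.

    Returns one label per time index: 1 (attacked) or 0 (secure).
    """
    D = (np.abs(A_hat) > tau_ob).astype(int)    # assumed per-measurement rule
    return (D.sum(axis=0) > n_min).astype(int)  # aggregate over measurements

def precision_recall(true_labels, predicted_labels):
    """Precision and recall for the attacked (A) and secure (S) classes."""
    t = np.asarray(true_labels)
    p = np.asarray(predicted_labels)
    ca = int(np.sum((t == 1) & (p == 1)))       # attacked, identified as attacked
    wa = int(np.sum((t == 0) & (p == 1)))       # secure, identified as attacked
    cs = int(np.sum((t == 0) & (p == 0)))       # secure, identified as secure
    ws = int(np.sum((t == 1) & (p == 0)))       # attacked, identified as secure

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {"P_A": ratio(ca, ca + wa), "R_A": ratio(ca, ca + ws),
            "P_S": ratio(cs, cs + ws), "R_S": ratio(cs, cs + wa)}

# Toy recovered attack matrix: 2 measurements, 3 time instants.
A_hat = np.array([[0.0, 0.5, 0.01],
                  [0.0, 0.0, 0.30]])
labels = label_samples(A_hat, tau_ob=0.05, n_min=0)

metrics = precision_recall([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0])
print(labels, metrics)
```

Raising `tau_ob` in this sketch can only turn flags off, which is the mechanism behind the trade-off between false alarms and missed attacks discussed in the experiments.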

Performance on Detecting Attacks.
We first analyze the performance of the proposed algorithm against attacks constructed from a set of false data injection attacks when  = 1. In the experiments, we observe that the selection of the threshold parameter $\tau_{ob}$ does affect the precision and recall performance. The performance of identifying attacked samples is compared in Figures 2 and 3, and the performance of identifying secure samples is compared in Figures 4 and 5. Both values (precision and recall) of the secure class are high (i.e., near 100%). In summary, the above experimental results show that if we choose the parameter $\tau_{ob} \in [0.04, 0.06]$, our methodology can efficiently detect data injection attacks.

Performance on Recovering System States.
In this part, we compare our detector with the residual-based approach in terms of recovering the initial system states. We first introduce how we evaluate the performance of an algorithm. In the IEEE 14-bus system, the state vector $x$ has 14 bus voltage magnitudes and 13 phase angles, where the phase angle of one bus is set as the reference. If the system is observable [14], the state vector $x$ can be represented as follows:   works. The residual-based fault detector takes around 50 min (0.043 s per sample), while the proposed approach takes only 12 min (0.011 s per sample). The 12 min of our approach includes the load dynamics fitting and the Douglas-Rachford iterations. The main computational burden of our approach lies in the Douglas-Rachford iterations for the basis pursuit process. Our approach does not involve the state estimation process, which is why it is faster than the residual-based detector.

Conclusions
This paper examines the problem of detecting data injection attacks in smart grid networks. We propose a detection framework named F-DDIA, which can recover the initial system state as well as the real measurement readings. Owing to the sparse nature of data injection attacks, the ℓ1-minimization technique (including Douglas-Rachford) can be applied. The proposed detection algorithm is validated using load data from NYISO. Our detector works well in both linear and nonlinear systems.

Figure 1: The quadratic polynomial fit of the load data in Zone F with 95% confidence bounds on Jun 30, 2016, when  = 2.

Figure 2: Precision of attacked samples for the IEEE 14-bus system.

Figure 3: Recall of attacked samples for the IEEE 14-bus system.

Figure 5: Recall of secure samples for the IEEE 14-bus system.

Table 2: Notation for defining evaluation metrics.
Table 3 shows the comparison for different $\tau_{ob}$ values. The precision of the attacked class and the recall of the secure class increase as $\tau_{ob}$ increases and remain at 100% when $\tau_{ob} \ge 0.04$. In addition, the recall of the attacked class and the precision of the secure class decrease as $\tau_{ob}$ increases. Note that the precision value at $\tau_{ob} = 0.01$ is 7.14% and the recall value at $\tau_{ob} > 0.06$ is lower than 50% for the attacked class. Thus, the optimal $\tau_{ob}$ value should be in the range [0.02, 0.06]. Note that the performance at $\tau_{ob} = 0.05$ is quite similar to that at $\tau_{ob} = 0.06$; thus we do not draw the performance at $\tau_{ob} = 0.06$ in Figures 2, 3, 4, and 5, to keep the figures readable. The performance of different $\tau_{ob}$ values for identifying attacked samples is compared in Figures 2 and 3.

Table 3: Performance of the proposed detector against multiperiod attacks for the IEEE 14-bus system,  = 1.
Although the proposed algorithm at $\tau_{ob} = 0.02$ and $\tau_{ob} = 0.03$ may correctly detect the attacked samples as / increases, the secure variables are incorrectly labeled as attacked and therefore give more false alarms.
/ ∈ [0, 1].We observe that   increases and   decreases when  ob increases.The precision value of attacked class is approximately 100% when  ob = 0.04 and  ob = 0.05.The recall value of the attacked class increases with rising / values and is approximately 100% when / is larger than 54.55%.