A Novel Sparse False Data Injection Attack Method in Smart Grids with Incomplete Power Network Information

The paper investigates a novel sparse false data injection attack method in a smart grid (SG) with incomplete power network information. Most existing methods require complete knowledge of the power network information of the SG. The main objective of this paper is to propose an effective sparse false data injection attack strategy for a more practical situation, in which attackers have only incomplete power network information and limited attack resources to access the measurements. Firstly, according to the obtained measurements and power network information, some of the missing power network information is compensated by using the power flow equation approach. Then, the fault tolerance range of bad data detection (BDD) for the attack residual increment is estimated by calculating the detection threshold of the residual L2-norm test. Finally, an effective sparse imperfect attack strategy is proposed by converting the choice of measurements into a subset selection problem, which is solved by the locally regularized fast recursive (LRFR) algorithm to effectively improve the sparsity of attack vectors. Simulation results on an IEEE 30-bus system and a real distribution network system confirm the feasibility and effectiveness of the proposed attack construction method.


Introduction
Traditional power systems operate in an isolated physical environment, where security mainly focuses on random failures of system components [1]. With the deep integration of electricity infrastructure and modern information and communication technology, a smart grid (SG) uses two-way flows of electricity and information to create a widely distributed automated energy delivery network [2–10], greatly improving the overall level of automation and management. However, SGs have been found vulnerable to cyberattacks because a large number of smart devices are deployed over unencrypted communication environments [11–15]. Malicious cyberattacks are among the most common cyberattacks and may cause catastrophic damage to power supplies and widespread power outages [16, 17]. For example, during the Christmas period of 2015, a synchronized and coordinated cyberattack compromised three Ukrainian regional electric power distribution companies, causing power outages that affected approximately 225,000 customers for several hours [18]. Moreover, the US PJM system received 4090 cyberattacks in one month in 2015, equivalent to about 5.5 attacks per hour [19]. In 2016, the Israeli power supply system was hit by a major cyberattack that forced a large number of its computers offline [20]. Therefore, the cyber security of SGs is an important open problem that has attracted great interest from government, industry, and academia. Cyber security can be studied from two perspectives to improve system reliability. From the defender's perspective, remote state estimation under possible false data injection attacks was investigated in [21], where complete knowledge of the system model is required. In contrast, this paper aims to reveal the vulnerability of power systems with incomplete power grid information by developing an effective sparse false data injection attack strategy from the attacker's perspective.
State estimation is usually employed to estimate or predict the system operating states, providing real-time information and effective supervision of the SG. Traditional state estimation based on the least squares (LS) method, together with the fast decoupled method derived from it, has been applied for many years [22]. As the scale of power systems continues to increase, dispatch centers place ever higher requirements on the accuracy and stability of state estimation. Some power grids use a weighted least squares method based on a fixed Jacobian matrix with orthogonalization [23], which offers better numerical stability and faster computation. Others use a two-level distributed state estimation method [24] that makes full use of the large amount of redundant measurement information in substations: the first level performs high-precision local estimation, and the second performs global coordination, yielding a more reliable real-time state estimate of the whole network. Moreover, distributed state estimation has also been employed in large-scale power systems to support system operation [25].
False data injection attacks (FDIAs), a typical type of malicious cyberattack, purposely manipulate measurements to perturb the results of state estimation without triggering any anomaly in bad data detection (BDD), while posing a serious threat to SG operations [26, 27]. A common assumption in most works on FDIAs is that the attacker has complete knowledge of the power network information [8, 26–28], i.e., the topology information and transmission line parameters of the power grid. However, a practical attack situation usually needs to be considered from two aspects: (1) it is difficult for an attacker to know all the network information of a power grid, owing to the strict protection of the control center and the lack of knowledge of real-time grid parameters, such as the positions of circuit breakers and transformer tap changers; and (2) the attacker may access only a subset of smart meters, owing to limited attack resources and the physical protection of some important instruments.
For the first case, an attacker cannot gain complete network information; i.e., the Jacobian matrix $H$ is only partially known, although it is critical for constructing a perfect FDIA strategy [26]. To relax the strong requirement of knowing the full topology and parameter information of a power grid, the first successful attempt to design false data injection attacks with incomplete power network information was made in [29]. There, the limited parameter information obtained by the attacker is expressed as $\tilde{H} = H + \delta$, where $\delta$ represents the difference between the complete parameter information $H$ and the partial parameter information $\tilde{H}$ obtained by the attacker. Two cases, perfect attacks and imperfect attacks, are then studied; the residual increments they cause are zero and nonzero, respectively. Furthermore, the range of residual increments caused by undetectable imperfect FDIAs is given in [30] as $0 \le \tau_a \le \|a\|_2 \cos\gamma$, where $\tau_a$ denotes the residual increment caused by the attack and $\gamma$ is the angle between the null space of the transpose $H^{T}$ of the real Jacobian matrix and the image space of the inaccurate Jacobian matrix $\tilde{H}$. In [31], attackers only need the power network information of a local attacking region to inject false data into smart meters in that region without being detected, and [32] designs a strategy that determines the optimal attacking region of a single load bus using less power network information. In [33], intermittent faults in nonuniformly sampled multirate systems occur randomly and are therefore described by a Bernoulli distribution. By contrast, the incomplete power network information in this paper refers to incomplete knowledge of the system parameters (e.g., some unknown line admittances) rather than randomly occurring information loss. The above works do not consider compensating for the incomplete information in the measurement Jacobian matrix to reduce the estimation error of the predesigned false data to be injected into certain measurements. Furthermore, the fault tolerance range of the BDD unit for the residual increment caused by imperfect false data is not analyzed in detail, so a high success rate for bypassing the BDD cannot be ensured.
For the second case, attackers tend to compromise as few measurements as possible to implement successful attacks, namely, by constructing sparse attack vectors. This has stimulated several research works [34–37]. However, these sparse attack models still require full power network information. Moreover, to the best of our knowledge, there is no feasible algorithm that can efficiently construct highly sparse undetectable attack vectors with incomplete power network information.
Launching an undetectable sparse attack appears much more difficult when both aspects of the practical attack situation are considered. Nevertheless, to improve the robustness of SGs, it is necessary to reveal system vulnerabilities by developing a new and practical FDIA strategy. The following challenges need to be addressed: (1) how to compensate for unknown power network information in the measurement Jacobian matrix and, after the compensation, distinguish the secure measurement set from the attackable measurement set; (2) how to estimate the fault tolerance range of the BDD unit for the attack residual increment; and (3) how to design and solve a sparse imperfect attack model to obtain an effective sparse imperfect strategy. To address these difficulties, this paper investigates a novel sparse imperfect FDIA construction method that modifies only a small number of measurements. The main contributions of the paper are as follows: (1) according to the obtained measurements and power network information, some unknown information in the measurement Jacobian matrix is compensated by solving the power flow equation, and the secure measurement set and the attackable measurement set are constructed by determining whether the attackers can inject false data into each measurement; (2) to ensure that the attack can bypass the BDD with a high success rate, the fault tolerance range of the BDD unit for the attack residual increment is estimated by calculating the detection threshold of the residual L2-norm test.
The rest of the paper is organized as follows. Section 2 describes the problem formulation of the sparse imperfect attack strategy. In Section 3, the LRFR algorithm is used for the smallest subset selection of attack vector elements. Simulation results are provided in Section 4, followed by concluding remarks in Section 5.

Problem Formulation
Considering the practical attack situation, the schematic block diagram of a power network control system under FDIAs is shown in Figure 1. The attacker can only inject false data into certain measurements; that is, the system contains an attackable measurement set $z_F$ and a secure measurement set $z_S$, which are defined in detail later. The contaminated measurements $z_a = [z_S^{T}, (z_F^{a})^{T}]^{T}$ are then transmitted to the state estimator to identify the state variables. The bad data detector then identifies anomalous data based on the state estimation results. If the attack is not detected by the BDD, the misleading state estimates are passed to the control system, which may seriously threaten system security and economic operation. Therefore, the remainder of this section addresses how to design a new sparse imperfect attack strategy for an SG with unknown power network information, which lays the foundation for finding system vulnerabilities and designing corresponding protection strategies.
2.1. State Estimation in a DC System Model. We focus on a steady-state, lossless power transmission system with a set $N = \{0, 1, 2, \ldots, n\}$ of buses and a set $L = \{1, 2, \ldots, l\}$ of transmission lines. Each bus $i \in N$ corresponds to an active power injection $p_i$ (generator active power minus load) and a bus phase angle $\theta_i$. Each branch $k = (i, j) \in L$ connects two buses and carries an active power flow $f_{ij}$. The branch active power flow is defined as positive if it is in the direction of the branch and negative if it is in the opposite direction; therefore, $f_{ji} = -f_{ij}$ for all $(i, j) \in L$ [38]. To describe the network topology and transmission line parameters of the grid, let $A \in \{-1, 0, 1\}^{l \times n}$ denote the branch-bus connection matrix, i.e.,
$$A_{ki} = \begin{cases} 0, & \text{if branch } k \text{ is not connected to bus } i, \\ 1, & \text{if branch } k \text{ begins at bus } i, \\ -1, & \text{if branch } k \text{ ends at bus } i. \end{cases} \qquad (1)$$
Then, let the diagonal matrix $D \in \Re^{l \times l}$ describe the physical properties of the transmission lines, where the $k$th diagonal element of $D$ (i.e., $D_k$) is the negative admittance of branch $k = (i, j)$, i.e., $D_k = -b_{ij}$. The power network information matrix $H$ is then constructed from $A$ and $D$. According to the DC power flow model, the relationship between the measurements $z$ and the state variables $x$ can be expressed as
$$z = Hx + v,$$
where $z \in \Re^{m \times 1}$ is the measurement vector consisting of branch active power flow measurements and bus active power injection measurements, $m$ is the total number of measurements, $x \in \Re^{n \times 1}$ is the state vector of bus phase angles excluding the reference angle, which is fixed as $\theta_o = 0$, $n + 1$ is the total number of buses, and $v \sim \mathcal{N}(0, R)$ is the Gaussian measurement noise vector with a diagonal covariance matrix $R$ [39]. Real-time state information is important for operational support, such as ensuring system stability. Here, we consider the power system to be static and the measurement equation to be linear. For such a system, the weighted least squares (WLS) method [39] is
used for state estimation because it can handle regression situations in which data points are of varying quality. The state estimate $\hat{x}$ obtained by the WLS method is
$$\hat{x} = (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} z.$$
Remark 1. A recursive filter algorithm for the dynamic state estimation problem of power systems with quantized nonlinear measurements is proposed in [40], which targets dynamic nonlinear systems. However, the main objective of this paper is to design an effective sparse attack strategy in which the attackers compromise as few measurements as possible to destroy the accuracy of the measurement information. This paper considers the power system to be static and the measurement equation to be linear, and the WLS method is employed to estimate the system state. Thus, an attack regression model can be obtained, and the choice of measurements can be treated as a subset selection problem, which can be solved by the locally regularized fast recursive (LRFR) algorithm to effectively improve the sparsity of attack vectors.
Further, the measurement estimates can be obtained as
$$\hat{z} = H\hat{x}.$$
When the system is attacked, false data in the measurements may mislead the results of state estimation. Traditional BDD methods identify bad data by testing the measurement residual, defined as the difference between the observed measurements $z$ and the estimated measurements $\hat{z}$, i.e., $r \triangleq z - \hat{z} = z - H\hat{x}$. Generally, if the L2-norm of the residual vector $r$ exceeds a certain threshold (i.e., $\|r\|_2 > \tau$), bad data may exist in the measurements $z$.
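To make the DC model and the residual test concrete, the following NumPy sketch builds $H$ for a toy 3-bus network, computes the WLS estimate, and applies the L2-norm test. Because the exact form of $H$ is not spelled out here, the hypothetical helper `build_dc_jacobian` assumes the common stacking $H = [A^{T}DA;\ DA;\ -DA]$ (injections at the non-reference buses, "from" flows, "to" flows) with the reference column of $A$ removed. The network data, noise level, and threshold are illustrative only.

```python
import numpy as np

def build_dc_jacobian(branches, susceptances, n_bus, ref=0):
    """Hypothetical helper: H = [A^T D A; D A; -D A] (injections at the
    non-reference buses, 'from' flows, 'to' flows), with D_k = -b_ij."""
    A_full = np.zeros((len(branches), n_bus))
    for k, (i, j) in enumerate(branches):
        A_full[k, i] = 1.0    # branch k begins at bus i
        A_full[k, j] = -1.0   # branch k ends at bus j
    A = np.delete(A_full, ref, axis=1)                   # drop reference angle
    D = np.diag(-np.asarray(susceptances, dtype=float))
    flows = D @ A                                         # f = D A x
    return np.vstack([A.T @ flows, flows, -flows])

def wls_estimate(H, z, R):
    """x_hat = (H^T R^-1 H)^-1 H^T R^-1 z."""
    HtW = H.T @ np.linalg.inv(R)
    return np.linalg.solve(HtW @ H, HtW @ z)

def bdd_alarm(H, z, R, tau):
    """Residual L2-norm test: raise an alarm if ||z - H x_hat||_2 > tau."""
    r = z - H @ wls_estimate(H, z, R)
    return np.linalg.norm(r) > tau, r

# Toy 3-bus network; all numbers are illustrative.
H = build_dc_jacobian([(0, 1), (1, 2), (0, 2)], [-10.0, -5.0, -8.0], n_bus=3)
R = 0.05**2 * np.eye(H.shape[0])
x_true = np.array([-0.02, -0.05])
rng = np.random.default_rng(0)
z = H @ x_true + rng.normal(0.0, 0.05, H.shape[0])
alarm, r = bdd_alarm(H, z, R, tau=1.0)
```

With noise-free data the WLS estimate recovers the true angles exactly, and the residual is always orthogonal to the columns of $H$ (for $R = \sigma^2 I$).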
Remark 2. The selection of the threshold $\tau$ is a key issue for BDD based on the residual L2-norm, and it can be determined according to the largest normalized residual (LNR) test [39]. The process is as follows. The normalized residuals [41] are defined as
$$r_{N,i} = \frac{r_i}{\sqrt{\Sigma_{ii}}}, \quad i = 1, \ldots, m,$$
where $\Sigma_{ii}$ is the $i$th diagonal element of the residual covariance matrix $\Sigma = R - H(H^{T}R^{-1}H)^{-1}H^{T}$. Referring to [39], since the normalized residual generally follows the standard normal distribution, i.e., $r_{N,i} \sim \mathcal{N}(0, 1)$, the normalized residual threshold $\tau_N$ can be determined from the standard normal distribution table. For example, if the probability of false detection is set as $P_e = 0.005$, i.e., $P(|r_{N,i}| < \tau_N) = 1 - 0.005$, the range of the normalized residuals is
$$|r_{N,i}| < 2.81, \quad i = 1, \ldots, m.$$
Then, the range of the residuals is $|r_i| < 2.81\sqrt{\Sigma_{ii}}$, $i = 1, \ldots, m$, and the threshold $\tau$ of the residual L2-norm test follows as
$$\tau = \sqrt{\sum_{i=1}^{m} 2.81^{2}\,\Sigma_{ii}} = 2.81\sqrt{\operatorname{tr}(\Sigma)}.$$
With the wide application of communication technology in SGs, attackers can access the SCADA system through the communication network. If the attackers can obtain the full network information of the power grid and enough measurements, they can purposely inject predesigned false data into some measurements without triggering any anomaly in the traditional BDD based on the residual L2-norm test, which poses a serious security threat to SGs.
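A minimal sketch of the threshold computation in Remark 2, assuming $H$ and $R$ are known and that the false-detection probability of 0.005 is split over both tails, which reproduces $\tau_N \approx 2.81$. The stand-in Jacobian is random and purely illustrative.

```python
import numpy as np
from statistics import NormalDist

def residual_covariance(H, R):
    """Sigma = R - H (H^T R^-1 H)^-1 H^T, the covariance of WLS residuals."""
    G = H.T @ np.linalg.solve(R, H)
    return R - H @ np.linalg.solve(G, H.T)

def l2_threshold(H, R, p_false=0.005):
    """Two-sided normal quantile tau_N; |r_i| < tau_N sqrt(Sigma_ii) for all i
    implies ||r||_2 < tau = tau_N * sqrt(trace(Sigma))."""
    tau_n = NormalDist().inv_cdf(1.0 - p_false / 2.0)
    return tau_n * np.sqrt(np.trace(residual_covariance(H, R))), tau_n

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 5))        # stand-in Jacobian: m = 20, n = 5
R = 0.05**2 * np.eye(20)
tau, tau_n = l2_threshold(H, R)
```

For $R = \sigma^2 I$ the trace reduces to $\sigma^2(m - n)$, so here $\tau = \tau_N \cdot 0.05\sqrt{15}$.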

2.2. Undetectable Attacks with Complete Power Network Information. Generally, attackers intrude into the power system by intentionally compromising the readings of certain smart devices. That is, the original measurements $z$ are manipulated by the false data $a$, i.e., $z_a = z + a$. The state estimate under attack is denoted as $\hat{x}_a = \hat{x} + c$, where $c \in \Re^{n \times 1}$ is the arbitrary error vector injected into the state estimate. Thus, the attack residual L2-norm satisfies
$$\|r_a\|_2 = \|z_a - H\hat{x}_a\|_2 = \|z - H\hat{x} + a - Hc\|_2 = \|r + \Delta_a\|_2 \le \|r\|_2 + \|\Delta_a\|_2,$$
where $\Delta_a = a - Hc$ is defined as the attack residual increment. There are two cases in which an attack bypasses the BDD. In the first case, the attack vector is carefully constructed to satisfy $a = Hc$, namely, $\Delta_a = 0$. The attack is then not detected by the traditional BDD because the injected false data do not change the original residual L2-norm, i.e., $\|r_a\|_2 = \|r\|_2$. Such attacks are called perfect FDIAs. In the second case, $a \ne Hc$, namely, $\Delta_a \ne 0$, but the condition $\|\Delta_a\|_2 < \tau - \|r\|_2 \triangleq \tau_a$ is met. Such attacks also bypass the traditional BDD (since then $\|r_a\|_2 < \tau$) and are called imperfect FDIAs.
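The two cases can be checked numerically with a short sketch: a perfect attack $a = Hc$ leaves $\|r\|_2$ unchanged, while an imperfect attack $a = Hc + \varepsilon$ stays within the triangle-inequality bound $\|r\|_2 + \|\Delta_a\|_2$. The random $H$, the choice $R = \sigma^2 I$ (so that ordinary least squares coincides with WLS), and the values of $c$ and $\varepsilon$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 12, 4
H = rng.normal(size=(m, n))              # stand-in Jacobian (illustrative)
z = H @ rng.normal(size=n) + rng.normal(0.0, 0.05, m)

def residual_norm(z):
    """||z - H x_hat||_2 with a least-squares estimate (R = sigma^2 I)."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return np.linalg.norm(z - H @ x_hat)

c = rng.normal(size=n)                   # desired state error
a_perfect = H @ c                        # Delta_a = a - Hc = 0
eps = 0.01 * rng.normal(size=m)          # attack deviation
a_imperfect = H @ c + eps                # Delta_a = eps

r0 = residual_norm(z)
r_perfect = residual_norm(z + a_perfect)
r_imperfect = residual_norm(z + a_imperfect)
bound = r0 + np.linalg.norm(eps)         # ||r|| + ||Delta_a||
```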
Remark 3. For the first case, perfect FDIAs depend on the strong assumption that the attackers have complete knowledge of the network information of the power grid and can access all measurements. In practice, however, attackers are more likely to obtain only incomplete power network information and limited measurements, which prevents them from launching perfect FDIAs. Therefore, how to design a new attack strategy with limited network information and attack resources is an interesting open problem.
Remark 4. For the second case, the threshold $\tau_a$ represents the fault tolerance ability of the BDD, which depends on the measurement noise contained in the original measurements $z$. If the measurement noise is smaller, the original residual L2-norm $\|r\|_2$ is smaller and the threshold $\tau_a$ is larger; conversely, if the measurement noise is larger, $\|r\|_2$ is larger and $\tau_a$ is smaller. Once the measurement noise is given, the fault tolerance ability $\tau_a$ of the BDD is determined. Therefore, if the attack residual increment is kept within $\tau_a$, the attack can bypass the BDD with a high success rate.
Remark 5. Recent research has shown that an undetectable false data injection attack can still be accomplished with incomplete power network information. However, practical FDIA strategies that consider both incomplete power network information and limited attack resources are still lacking.

2.3. Sparse Imperfect FDIA Strategy with Incomplete Power Network Information and Measurements. As analyzed above, attackers cannot obtain complete power network information and measurements in practice. However, some unknown information, namely transmission line admittances $D_k$, can be calculated from the obtained measurements and power network information, so more information about the measurement Jacobian matrix $H$ can be known indirectly. For an unknown $D_k$, we have the following theorem.
Theorem 1. When the phase angle difference $\theta_i - \theta_j$ between the two buses of branch $k = (i, j)$ can be calculated indirectly and the branch power flow $f_{ij}$ is known, the unknown element $D_k = -b_{ij}$ of the branch admittance matrix $D$ can be calculated by using the branch active power flow equation
$$f_{ij} = D_k(\theta_i - \theta_j).$$
Proof 1. Firstly, a generator bus is selected as the reference bus $o$, and the adversary needs partial knowledge of the network topology to find a path from bus $o$ to bus $i$ and a path from bus $o$ to bus $j$ (at least one such path exists because the network is connected). Assume that the path $o \to i$ passes through a sequence of intermediate buses, $o \to i_1 \to i_2 \to \cdots \to i_p \to i$, along whose branches the power flows and admittances are known. Then, applying the flow equation to each branch of the path, the phase angle $\theta_i$ can be calculated as
$$\theta_i = \theta_o - \frac{f_{o i_1}}{D_{(o, i_1)}} - \sum_{q=1}^{p-1} \frac{f_{i_q i_{q+1}}}{D_{(i_q, i_{q+1})}} - \frac{f_{i_p i}}{D_{(i_p, i)}}.$$
Next, the phase angle $\theta_j$ can be calculated in a similar way. With the branch power flow $f_{ij}$ also known, the unknown element $D_k$ of $D$ is finally obtained as
$$D_k = \frac{f_{ij}}{\theta_i - \theta_j}.$$
Therefore, the proof is completed.
Remark 6. According to Theorem 1, some unknown information in $H$ can be compensated mathematically, but some unknown power network information still cannot be compensated. According to the sufficient condition $a = Hc$ for constructing false data injection attacks, namely, $a_i = \sum_{j=1}^{n} h_{ij} c_j$, when the $i$th row of $H$ still contains unknown elements after the compensation, we set $a_i = 0$. Similarly, when the measurement $z_i$ cannot be obtained, we set $a_i = 0$. Thus, all measurements can be divided into the secure set $S$ (or $z_S$) and the attackable set $F$ (or $z_F$), where $a_i = 0$, $\forall i \in S$.
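Theorem 1 can be illustrated with a small sketch, assuming the attacker knows the flows of some branches and the admittances of all but one. The hypothetical helpers `propagate_angles` and `recover_D` walk outward from the reference bus using $\theta_j = \theta_i - f_{ij}/D_k$ (a breadth-first search picks one path per bus) and then apply $D_k = f_{ij}/(\theta_i - \theta_j)$. The 4-bus data are made up so that the unknown value is 6.

```python
from collections import deque

def propagate_angles(ref, flows, D_known):
    """Breadth-first walk from the reference bus over branches whose flow
    f_ij and D_k are both known, using theta_j = theta_i - f_ij / D_k."""
    theta = {ref: 0.0}
    queue = deque([ref])
    while queue:
        u = queue.popleft()
        for (i, j), D in D_known.items():
            f = flows.get((i, j))
            if f is None:
                continue
            if i == u and j not in theta:
                theta[j] = theta[i] - f / D
                queue.append(j)
            elif j == u and i not in theta:
                theta[i] = theta[j] + f / D
                queue.append(i)
    return theta

def recover_D(branch, flows, D_known, ref=0):
    """Theorem 1: D_k = f_ij / (theta_i - theta_j) for an unknown branch k."""
    theta = propagate_angles(ref, flows, D_known)
    i, j = branch
    return flows[branch] / (theta[i] - theta[j])

# Illustrative 4-bus example: D for branch (1, 3) is unknown (true value 6).
flows = {(0, 1): 0.5, (1, 2): 0.15, (0, 2): 0.64, (2, 3): 0.12, (1, 3): 0.36}
D_known = {(0, 1): 10.0, (1, 2): 5.0, (0, 2): 8.0, (2, 3): 4.0}
D_13 = recover_D((1, 3), flows, D_known)
```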
Assume that the number of elements in the set F is M; i.e., an attacker can tamper with M smart devices at most.

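Thus, an attack vector $a$ with at most $M$ nonzero elements can be expressed as
$$a = [0, \ldots, 0, a_{i_1}, 0, \ldots, 0, a_{i_M}, 0, \ldots, 0]^{T}, \quad i_1, \ldots, i_M \in F.$$
By removing the zero elements from $a$, we obtain $\bar{a} = [a_{i_1}, a_{i_2}, \ldots, a_{i_M}]^{T}$ with $M$ elements. Then, after reordering the rows, we have
$$a = \begin{bmatrix} a_S \\ \bar{a} \end{bmatrix}, \qquad H = \begin{bmatrix} H_S \\ \bar{H} \end{bmatrix},$$
where $a_{S,i} = 0$, $\forall i \in S$.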
Remark 7. The Jacobian matrix $H$ can be divided into the two parts $H_S$ and $\bar{H}$ for the following reason. After some unknown information in $H$ is compensated, if the $i$th row of $H$ still contains unknown elements, we set $a_i = 0$; likewise, if the measurement $z_i$ cannot be obtained, we set $a_i = 0$. Thus, all measurements can be divided into the secure set $S$ and the attackable set $F$, where $a_i = 0$, $\forall i \in S$. Accordingly, the attack vector $a$ can be divided into the secure section $a_S$ and the attack section $\bar{a}$, and the Jacobian matrix $H$ can be divided by rows into $H_S$ and $\bar{H}$.
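Suppose that there exists a vector $c$ satisfying $H_S c = 0 = a_S$; there is still no guarantee that $\bar{a} = \bar{H}c$ can be met. According to the partition above, a perfect attack on the attackable set would require
$$\bar{a} = \bar{H}c = [h_{i_1} c, h_{i_2} c, \ldots, h_{i_M} c]^{T},$$
where $\bar{H} = [h_{i_1}^{T}, h_{i_2}^{T}, \ldots, h_{i_M}^{T}]^{T}$ is known completely by the attacker and $h_{i_j} = [h_{i_j,1}, h_{i_j,2}, \ldots, h_{i_j,n}]$ $(1 \le j \le M)$ is the $i_j$th row of $H$. Since this equality cannot be guaranteed, the construction of an imperfect attack needs to be considered. Therefore, considering the attackable set $F$, we formulate the imperfect attack subvector as
$$\bar{a} = \bar{H}c + \varepsilon,$$
where $\varepsilon$ is defined as the attack deviation vector. Because $a_S = H_S c$, the attack residual increment is $\Delta_a = [0^{T}, \varepsilon^{T}]^{T}$. According to the BDD detection mechanism, if $\|\varepsilon\|_2 = \|\Delta_a\|_2 < \tau_a$ is satisfied, the attack can bypass the BDD.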
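A sketch of the set partition in Remark 6 and of the imperfect subvector $\bar{a} = \bar{H}c + \varepsilon$, assuming that Jacobian entries still unknown after compensation are marked with NaN and that measurement accessibility is given as a Boolean mask. The helper names and the toy matrix are hypothetical, and $c$ is simply chosen rather than constrained to satisfy $H_S c = 0$.

```python
import numpy as np

def split_sets(H_known, available):
    """Secure set S / attackable set F per Remark 6: row i is attackable only
    if every entry of row i is known (NaN = unknown) and z_i is accessible."""
    row_known = ~np.isnan(H_known).any(axis=1)
    F = np.flatnonzero(row_known & available)
    S = np.setdiff1d(np.arange(H_known.shape[0]), F)
    return S, F

def imperfect_attack(H_known, F, c, eps):
    """Full attack vector with a_S = 0 and a_F = H_F c + eps."""
    a = np.zeros(H_known.shape[0])
    a[F] = H_known[F] @ c + eps
    return a

H_known = np.array([[1.0, 0.0],
                    [np.nan, 2.0],   # row still has an unknown element
                    [0.5, -1.0],
                    [3.0, 1.0],      # known row, but z_3 is not accessible
                    [0.0, 4.0]])
available = np.array([True, True, True, False, True])
S, F = split_sets(H_known, available)
c = np.array([0.1, -0.2])
eps = np.array([0.01, -0.02, 0.005])
a = imperfect_attack(H_known, F, c, eps)
```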
Remark 8. Calculating the test threshold $\tau$ and the fault tolerance range $\tau_a$ of the BDD requires the complete Jacobian matrix $H$ and all the measurements $z$, which the attackers cannot fully obtain in practice. Thus, the obtained measurements and parameter information are used to calculate approximate thresholds $\tilde{\tau}$ and $\tilde{\tau}_a$. Using the known part of $H$ and the noise covariance $R$, $\tilde{\tau}$ is calculated by (8). Furthermore, with the residual $\tilde{r} = z_F - \bar{H}\hat{x}$ of the obtainable measurements, the approximate fault tolerance range of the BDD, $\tilde{\tau}_a = \tilde{\tau} - \|\tilde{r}\|_2$, is calculated by (9). If $\|\varepsilon\|_2 < \tilde{\tau}_a$ is satisfied, the attack can bypass the BDD.
For the convenience of subsequent calculation, re-index the attack subvector as $\bar{a} = [a_1, a_2, \ldots, a_M]^{T}$. The state error vector $c$ is unknown, so $\bar{a}$ cannot be calculated directly. However, the attackers can find an attack vector as follows. Let $\Psi = \bar{H}(\bar{H}^{T}\bar{H})^{-1}\bar{H}^{T}$ and $B = \Psi - I$. It is easy to see that $\Psi\bar{H} = \bar{H}$ [28]. Left-multiplying both sides of $\bar{a} = \bar{H}c + \varepsilon$ by $\Psi$ yields the sequence of equivalent equations
$$\Psi\bar{a} = \bar{H}c + \Psi\varepsilon = \bar{a} - \varepsilon + \Psi\varepsilon \;\Longleftrightarrow\; (\Psi - I)\bar{a} = (\Psi - I)\varepsilon \;\Longleftrightarrow\; B\bar{a} - B\varepsilon = 0,$$
where $B = [b_1, b_2, \ldots, b_M]$ and $b_i$ is the $i$th column of $B$. To construct an attack vector, the first step is to choose an exploitable measurement, such as the $i$th measurement in the attackable set $F$, and inject the predesigned false data $a_i$ into it. From $B\bar{a} - B\varepsilon = b_1 a_1 + \cdots + b_M a_M - B\varepsilon = 0$, moving the term $b_i a_i$ to the right-hand side gives
$$b_1 a_1 + \cdots + b_{i-1}a_{i-1} + b_{i+1}a_{i+1} + \cdots + b_M a_M - B\varepsilon = -b_i a_i.$$
Next, the attack vector can be constructed by an identification (estimation) method. Letting $Y = -b_i a_i$, the attack regression model is obtained as
$$Y = B'a' + \Xi, \qquad (16)$$
where $B' = [b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_M] \in \Re^{M \times (M-1)}$, $a' = [a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_M]^{T}$, and $\Xi = -B\varepsilon$ is defined as the residual error vector. The system residual increment $\varepsilon$ caused by an attack is time-varying and is usually regarded as Gaussian white noise; thus, $\Xi$ is treated as Gaussian noise. Since $\|\Xi\|_2 = \|B\varepsilon\|_2 \le \|B\|\,\|\varepsilon\|_2$, the condition $\|\Xi\|_2 < \|B\| \cdot \tau_a$ is used as the criterion for the attack to bypass the BDD. For convenience, the columns and elements are re-indexed as $B' = [b'_1, b'_2, \ldots, b'_{M-1}]$ and $a' = [a'_1, a'_2, \ldots, a'_{M-1}]^{T}$.
Remark 9. Based on the attack regression model with noise, fast stepwise forward algorithms such as the fast recursive algorithm (FRA) [42] can be used to choose the nonzero attack vector elements one by one, each time maximizing the model error reduction ratio. However, the residual error vector $\Xi$ can lead to overfitting of the attack model, with more measurements selected, which conflicts with the attacker's goal of manipulating as few measurements as possible for an undetectable attack with limited resources. To improve the sparsity of attack vectors and the generalization performance of the attack model, the locally regularized fast recursive (LRFR) algorithm [43] is used next to choose significant attack vector elements by associating each candidate attack vector element with an individually regularized parameter, which is optimized within the Bayesian evidence framework.
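The construction of $\Psi$, $B$, and the regression model can be checked with the following sketch. The helper `attack_regression` is hypothetical, $\bar{H}$ is random, and the chosen index $i$ is 0-based. The sketch confirms numerically that $\Psi\bar{H} = \bar{H}$, $B\bar{a} = B\varepsilon$, and $Y = B'a' + \Xi$.

```python
import numpy as np

def attack_regression(H_F, a_bar, eps, i):
    """Hypothetical helper: B = Psi - I with Psi = H_F (H_F^T H_F)^-1 H_F^T;
    returns B, B' (column i removed), a', Y = -b_i a_i and Xi = -B eps."""
    M = H_F.shape[0]
    Psi = H_F @ np.linalg.solve(H_F.T @ H_F, H_F.T)
    B = Psi - np.eye(M)
    keep = [j for j in range(M) if j != i]
    Y = -B[:, i] * a_bar[i]
    return B, B[:, keep], a_bar[keep], Y, -B @ eps

rng = np.random.default_rng(4)
H_F = rng.normal(size=(8, 3))        # rows of the attackable set (illustrative)
c = rng.normal(size=3)
eps = 0.01 * rng.normal(size=8)
a_bar = H_F @ c + eps                # imperfect attack subvector
B, Bp, a_p, Y, Xi = attack_regression(H_F, a_bar, eps, i=0)
```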

The Locally Regularized Fast Recursive Algorithm
When the LRFR algorithm is used to construct the attack vector, each attack vector element corresponds to a candidate (i.e., a column vector $b'_i$) in the regression matrix $B'$. Identifying the smallest set of attacked measurements is therefore equivalent to selecting the significant model candidates in $B'$. In this paper, a regularization technique binds a regularization parameter to each candidate, and the Bayesian evidence framework is used to optimize the regularization parameters. The significant candidates are then selected directly according to the model error reduction contribution of each candidate term with its regularization parameter, leading to a compact regression model. To reduce the computational complexity, suitable recursive regression quantities are further defined, which allows a fast implementation of the proposed method.
3.1. Generalization of Sparse Attack Vectors. The regularization technique, which introduces a decay term into the cost function, has been proposed to overcome the overfitting problem. A regularized cost function $J$ based on the attack regression model (16) is
$$J = (Y - B'a')^{T}(Y - B'a') + a'^{T}\Lambda a', \qquad (17)$$
where $\lambda = [\lambda_1, \ldots, \lambda_{M-1}]^{T}$ is the regularization parameter vector, with one entry per column of the regression matrix $B'$, and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{M-1})$. To ensure that the attack can bypass the BDD, the value of $J$ needs to satisfy a corresponding condition (e.g., $J < (\|B\| \cdot \tau_a)^2$), derived from $\|\Xi\|_2 = \|B\varepsilon\|_2 < \|B\| \cdot \tau_a$; an appropriate termination criterion therefore needs to be set to guarantee this condition. The regularized least-squares estimate of the attack vector elements that minimizes (17) is
$$\hat{a}' = (B'^{T}B' + \Lambda)^{-1}B'^{T}Y. \qquad (18)$$
Remark 10. As can be seen from (18), each attack vector element $\hat{a}'_i$ is bound to a regularization parameter $\lambda_i$. It has been demonstrated that Bayesian evidence inference can be used to optimize the regularization parameter vector $\lambda$. During the optimization, if some regularization parameters keep growing, the corresponding attack vector elements become smaller and approach zero; that is, the corresponding measurements are not selected for attack, as the false data injected into them approach zero. This provides an effective way to guarantee the sparsity of attack vectors.
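As a minimal sketch of the regularized estimate and of Remark 10, the code below solves $(B'^{T}B' + \Lambda)\,a' = B'^{T}Y$ for random data (an assumption made for illustration) and shows that a very large $\lambda_i$ drives the corresponding element toward zero.

```python
import numpy as np

def regularized_ls(Bp, Y, lam):
    """a' = (B'^T B' + Lambda)^-1 B'^T Y with Lambda = diag(lam)."""
    return np.linalg.solve(Bp.T @ Bp + np.diag(lam), Bp.T @ Y)

rng = np.random.default_rng(5)
Bp = rng.normal(size=(12, 4))
Y = rng.normal(size=12)

a_small = regularized_ls(Bp, Y, np.full(4, 1e-8))        # ~ ordinary LS
a_pruned = regularized_ls(Bp, Y, np.array([1e-8, 1e-8, 1e8, 1e-8]))
# The huge lambda on the third candidate drives its element towards zero,
# i.e. the corresponding measurement would not be attacked.
```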
In forward subset selection, suppose that $k$ of the $M - 1$ attack regression vectors, $p_1, p_2, \ldots, p_k$, have already been selected. The remaining vectors (corresponding to the candidate attack vector elements) in the regression matrix $B'$ are $p_{k+1}, p_{k+2}, \ldots, p_{M-1}$. For an attack regression model with $k$ elements, it follows that
$$Y = P_k a'_k + \Xi, \qquad J_k = (Y - P_k a'_k)^{T}(Y - P_k a'_k) + a_k'^{T}\Lambda_k a'_k,$$
where $P_k = [p_1, \ldots, p_k]$, $a'_k$ contains the corresponding $k$ attack vector elements, and $\Lambda_k$ is the $k \times k$ diagonal matrix of their regularization parameters. The latter contains the corresponding $k$ regularization parameters from the full vector $\lambda$, taken according to the original indexing of the selected regressors. If one more candidate $b'_j$ is selected, the selected regression matrix increases by one column, becoming $P_{k+1} = [P_k, b'_j]$. The regularized cost function (evaluated at its minimizer) is then updated to $J_{k+1}(b'_j)$, and the contribution of $b'_j$ as the $(k+1)$th attack vector element is given as
$$\Delta J_{k+1}(b'_j) = J_k - J_{k+1}(b'_j).$$
To select a new attack vector element, this contribution has to be computed for each of the $M - k - 1$ remaining candidates, $b'_j \in \{p_{k+1}, \ldots, p_{M-1}\}$. The one that produces the largest reduction of the cost function is then chosen as the $(k+1)$th attack vector element. In this way, the compact attack regression model is constructed by forward selection; i.e., attack vector elements are selected one at a time according to the size of their contributions, until some attack vector construction criterion is satisfied.
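The forward-selection step can be sketched as follows, assuming the residual matrix $R_k = I - P_k(P_k^{T}P_k + \Lambda_k)^{-1}P_k^{T}$ used in the LRFR derivation. The sketch checks that the minimized cost equals $Y^{T}R_kY$ and that the candidate contribution $(Y^{T}R_k b'_j)^2/(b_j'^{T}R_k b'_j + \lambda_j)$ equals $J_k - J_{k+1}(b'_j)$. Helper names and data are hypothetical.

```python
import numpy as np

def regularized_cost(P, Y, lam):
    """Minimum over a of J = ||Y - P a||^2 + a^T diag(lam) a."""
    a = np.linalg.solve(P.T @ P + np.diag(lam), P.T @ Y)
    e = Y - P @ a
    return e @ e + a @ (lam * a)

def contribution(R, Y, b, lam_b):
    """Delta J for appending candidate b: (Y^T R b)^2 / (b^T R b + lam_b)."""
    Rb = R @ b
    return (Y @ Rb) ** 2 / (b @ Rb + lam_b)

def grow_R(R, p, lam_p):
    """R_{k+1} = R_k - R_k p p^T R_k / (p^T R_k p + lam_p); R_k is symmetric."""
    Rp = R @ p
    return R - np.outer(Rp, Rp) / (p @ Rp + lam_p)

rng = np.random.default_rng(3)
Bp = rng.normal(size=(10, 5))
Y = rng.normal(size=10)
lam = np.full(5, 0.1)

R = np.eye(10)                       # R_0 = I
for j in (0, 2):                     # suppose columns 0 and 2 were selected
    R = grow_R(R, Bp[:, j], lam[j])
J_k = Y @ R @ Y                      # minimised J_k for P_k = [b'_0, b'_2]
dJ = contribution(R, Y, Bp[:, 4], lam[4])
```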
The LRFR algorithm generates an initial subset of measurements to be attacked. To further reduce the size of this measurement subset, level 2 inference of the Bayesian evidence framework is used to optimize the regularization parameters.

3.2. Optimization of the Regularization Parameters. Define $h = [h_1, \ldots, h_{M-1}]^{T}$ as the vector of hyperparameters and $\beta$ as the noise parameter, i.e., the inverse of the variance of the noise $\Xi$. As explained by level 1 inference of the Bayesian evidence framework [41], the regularization parameters equal the ratios of the hyperparameters to the noise parameter, i.e.,
$$\lambda_i = \frac{h_i}{\beta}, \quad i = 1, \ldots, M - 1.$$
The second level of inference determines the values of $h$ and $\beta$ by maximizing the posterior distribution. To optimize the regularization parameters by level 2 inference, the Hessian matrix $G$ is given as
$$G = \operatorname{diag}(h) + \beta P^{T}P,$$
where $P$ is the regression matrix of the current model ($B'$ itself when all $M - 1$ candidates are included). With $S = P^{T}P + \Lambda$, it is clear that $G = \beta S$. Defining the quantities
$$\gamma_i = 1 - h_i (G^{-1})_{ii} = 1 - \lambda_i (S^{-1})_{ii}$$
and $\gamma = \sum_{i} \gamma_i$ over the columns of $P$, it follows that
$$h_i = \frac{\gamma_i}{\hat{a}_i'^{2}}, \qquad \beta = \frac{M - \gamma}{e^{T}e},$$
where $e = Y - P\hat{a}'$ is the model residual. Substituting these expressions into $\lambda_i = h_i/\beta$, the updating formula for the regularization parameters is
$$\lambda_i = \frac{\gamma_i}{M - \gamma}\,\frac{e^{T}e}{\hat{a}_i'^{2}}.$$
3.3. Reduction of the Computational Complexity. To reduce the computational complexity, two steps are taken: first, the updating of the regularization parameters $\lambda_i$ ($1 \le i \le M - 1$) is accelerated by a recursive formula derived below; second, a residual matrix series is defined to significantly reduce the computational effort of evaluating the contribution $\Delta J_{k+1}$. The two steps are detailed as follows.
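For the first step, partition the inverse $S_{k+1}^{-1}$ as
$$S_{k+1}^{-1} = \begin{bmatrix} F_k & g_k \\ g_k^{T} & u_k \end{bmatrix},$$
where $F_k \in \Re^{k \times k}$, $g_k \in \Re^{k \times 1}$, and $u_k \in \Re$. Two further quantities are defined: $q_k = P_k^{T}p_{k+1}$ and $s_k = p_{k+1}^{T}p_{k+1} + \lambda_{k+1} - q_k^{T}S_k^{-1}q_k = p_{k+1}^{T}R_k p_{k+1} + \lambda_{k+1}$. Then, the partitioned inverse can be computed by the following recursive formulas:
$$u_k = \frac{1}{s_k}, \qquad g_k = -u_k S_k^{-1}q_k, \qquad F_k = S_k^{-1} + u_k S_k^{-1}q_k q_k^{T}S_k^{-1}.$$
Since $G = \beta S$, the inverse of the Hessian matrix can be updated recursively as $G_{k+1}^{-1} = \beta^{-1}S_{k+1}^{-1}$. For the second step, define the residual matrix series
$$R_k = I - P_k S_k^{-1} P_k^{T} = I - P_k(P_k^{T}P_k + \Lambda_k)^{-1}P_k^{T}, \quad 1 \le k \le M - 1,$$
where $P_k$ is of full column rank and $R_0 \triangleq I$. Then, $R_{k+1}$ can be expressed using the recursive formula
$$R_{k+1} = R_k - \frac{R_k p_{k+1} p_{k+1}^{T} R_k}{p_{k+1}^{T} R_k p_{k+1} + \lambda_{k+1}}.$$
Accordingly, the contribution of a candidate can be computed as
$$\Delta J_{k+1}(b'_j) = \frac{(Y^{T}R_k b'_j)^{2}}{b_j'^{T}R_k b'_j + \lambda_j}.$$
To further simplify this computation, another quantity involving $b'_j \in \{p_{k+1}, \ldots, p_{M-1}\}$ is introduced: $b_j'^{(k)} \triangleq R_k b'_j$. From the recursion for $R_{k+1}$, it can be updated as
$$b_j'^{(k+1)} = b_j'^{(k)} - \frac{p_{k+1}^{(k)}\,(p_{k+1}^{(k)})^{T} b'_j}{(p_{k+1}^{(k)})^{T}p_{k+1} + \lambda_{k+1}}, \qquad p_{k+1}^{(k)} = R_k p_{k+1}.$$
Substituting this quantity, the reduction in the cost function contributed by $b'_j$ can be expressed explicitly as
$$\Delta J_{k+1}(b'_j) = \frac{\big((b_j'^{(k)})^{T}Y\big)^{2}}{(b_j'^{(k)})^{T}b'_j + \lambda_j}.$$
3.4. Complete Algorithm. The procedure of the proposed attack strategy is summarized in Figure 2.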
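A compact end-to-end sketch of the procedure, assuming a fixed termination threshold $\rho$, an initial $\lambda_i = 10^{-3}$, and five evidence iterations (all hypothetical choices; $\rho$ is kept fixed here rather than adapted). Each pass runs the regularized forward selection and then re-estimates the selected $\lambda_i$ with $\gamma_i = 1 - \lambda_i(S^{-1})_{ii}$ and $\lambda_i \leftarrow \gamma_i\, e^{T}e / ((M - \gamma)\hat{a}_i'^{2})$, a MacKay-style evidence update. For readability it keeps the full matrix $R_k$ rather than the cheaper per-candidate vectors $b_j'^{(k)}$.

```python
import numpy as np

def select(Bp, Y, lam, rho):
    """Forward selection: add the candidate with the largest
    (Y^T R b)^2 / (b^T R b + lam_b) until that gain drops below rho."""
    R, chosen, rest = np.eye(Bp.shape[0]), [], list(range(Bp.shape[1]))
    while rest:
        gains = [(Y @ R @ Bp[:, j]) ** 2 / (Bp[:, j] @ R @ Bp[:, j] + lam[j])
                 for j in rest]
        best = int(np.argmax(gains))
        if gains[best] < rho:
            break
        j = rest.pop(best)
        Rp = R @ Bp[:, j]
        R = R - np.outer(Rp, Rp) / (Bp[:, j] @ Rp + lam[j])
        chosen.append(j)
    return chosen

def evidence_update(P, Y, lam_sel):
    """gamma_i = 1 - lam_i (S^-1)_ii; lam_i <- gamma_i/(M - gamma) * e^T e / a_i^2."""
    S_inv = np.linalg.inv(P.T @ P + np.diag(lam_sel))
    a = S_inv @ P.T @ Y
    e = Y - P @ a
    gamma = 1.0 - lam_sel * np.diag(S_inv)
    return gamma / (P.shape[0] - gamma.sum()) * (e @ e) / a**2, a

def lrfr(Bp, Y, rho, lam0=1e-3, n_iter=5):
    """Alternate selection and evidence updates (hypothetical defaults)."""
    lam = np.full(Bp.shape[1], lam0)
    chosen, a = [], np.zeros(0)
    for _ in range(n_iter):
        picked = select(Bp, Y, lam, rho)
        if not picked:
            break
        chosen = picked
        new_lam, a = evidence_update(Bp[:, chosen], Y, lam[chosen])
        lam[chosen] = new_lam
    return chosen, a

rng = np.random.default_rng(6)
Bp = rng.normal(size=(30, 10))
a_true = np.zeros(10)
a_true[[1, 6]] = [2.0, -1.5]        # only two candidates really matter
Y = Bp @ a_true + 0.01 * rng.normal(size=30)
chosen, a = lrfr(Bp, Y, rho=0.05)
```

In this toy setting the loop keeps only the two informative candidates; the remaining ones contribute cost reductions on the order of the noise power, far below $\rho$.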

Simulation and Results
To verify its effectiveness and feasibility, the proposed sparse imperfect attack strategy is tested on the IEEE 30-bus system shown in Figure 3. Firstly, the sparsity of the imperfect attack vectors constructed by FRA and LRFR is compared. Then, from the operator's viewpoint, the probability that attackers can successfully construct an attack vector bypassing the BDD is calculated. Finally, a practical coastal-area distribution network system is tested to further demonstrate the effectiveness of the proposed approach in practical systems.
4.1. Case 1. The IEEE 30-bus system consists of 30 buses and 41 transmission lines. Bus 1 is selected as the reference bus with the reference phase angle $\theta_1 = 0$. The admittances and measurements unknown to the attackers are summarized in Table 1.

The Feasibility of the New Attack Strategy.
There are a total of 112 measurements in the IEEE 30-bus system: the 1st–30th measurements are bus active power injections, the 31st–71st are power flows at the "from" buses, and the 72nd–112th are power flows at the "to" buses. The 15 measurements listed in Table 1 are assumed to be unavailable to the attackers, so only the remaining 97 measurements can be accessed, and the noise of each measurement follows $v_i \sim N(0, 0.05^2)$. It is also assumed that the attackers can obtain the topology information but cannot acquire the 3 admittances shown in Table 1. According to Theorem 1, an attacker can use the line flow measurements to calculate the phase angle difference between buses 2 and 6. Since the 15 measurements in Table 1 cannot be obtained due to the attacker's limited resources or the physical protection of some smart meters, we set $a_i = 0$ for $i \in \{2, 8, 17, 19, 24, 42, 48, 57, 58, 66\}$ and, similarly, for $i \in \{83, 89, 98, 99, 107\}$. According to (8) and (9), the approximation threshold is $\tau = 1.1063$ and $\tau_a = 1.1063 - 0.0198 = 1.0865$.
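For intuition about the residual L2-norm test behind τ, the following sketch assumes equal measurement weights, a random hypothetical Jacobian H (not the IEEE 30-bus matrix), and a threshold estimated as a Monte Carlo quantile of attack-free residual norms rather than by the paper's (8) and (9). It shows the classic property that false data lying in the column space of H (a = Hc) leaves the residual unchanged, while arbitrary false data on two meters is flagged. The helper names are hypothetical.

```python
import numpy as np


def residual_norm(H, z):
    """||z - H x_hat||_2 for equal-weight least-squares DC state estimation."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return float(np.linalg.norm(z - H @ x_hat))


def mc_threshold(H, sigma, alpha=0.05, trials=2000, seed=0):
    """Empirical (1 - alpha) quantile of the attack-free residual norm."""
    rng = np.random.default_rng(seed)
    m, n = H.shape
    norms = [
        residual_norm(H, H @ rng.standard_normal(n) + sigma * rng.standard_normal(m))
        for _ in range(trials)
    ]
    return float(np.quantile(norms, 1.0 - alpha))


rng = np.random.default_rng(42)
m, n, sigma = 40, 10, 0.05
H = rng.standard_normal((m, n))          # hypothetical Jacobian, not the IEEE 30-bus one
z = H @ rng.standard_normal(n) + sigma * rng.standard_normal(m)
tau = mc_threshold(H, sigma)

r_clean = residual_norm(H, z)
r_stealthy = residual_norm(H, z + H @ rng.standard_normal(n))  # a = H c: residual unchanged
a_naive = np.zeros(m)
a_naive[[3, 7]] = 1.0                    # arbitrary false data on two meters
r_naive = residual_norm(H, z + a_naive)
```

When some meters are protected (a_i = 0 on those rows), an exact a = Hc is generally unavailable, which motivates the imperfect attack with a bounded residual increment considered in this paper.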
The proposed sparse imperfect attack strategy based on LRFR is used to construct the sparse attack vector. The 10th measurement in the attackable set F is set as the initial attacked measurement, and the corresponding attack vector element (i.e., injected false data) is set to $a_{10} = 0.0970$. Each element of the residual error vector Ξ is set to follow $\Xi_i \sim N(0, 0.25^2)$. The initial value of ρ for terminating the attack vector element selection is chosen as small as possible, so that a large measurement subset is produced in the first iteration of the regularization parameter optimization. This ensures that significant measurements are not missed while λ is still far from its optimal value, so that the attack can bypass the BDD.
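The role of ρ can be seen in a generic greedy selection loop. The sketch below assumes a fixed vector of regularization parameters, a hypothetical candidate matrix B, and a stopping rule J_k ≤ ρ·YᵀY (an assumed form of the ρ-based termination); it selects columns by the regularized cost reduction (YᵀR_k b)²/(bᵀR_k b + λ) and updates R_k recursively. It shows a single selection pass only; the paper's procedure alternates selection with λ re-estimation and starts from a predesigned element such as a_10, neither of which is modeled here.

```python
import numpy as np


def forward_select(B, y, lam, rho):
    """Greedy regularized forward selection with recursive R_k updates.

    Assumed stopping rule: stop once J_k = y^T R_k y <= rho * y^T y.
    """
    N, M = B.shape
    R = np.eye(N)                         # R_0 = I
    J0 = J = float(y @ y)
    selected, remaining = [], list(range(M))
    while remaining and J > rho * J0:
        # Cost reduction of each remaining candidate column.
        gains = []
        for j in remaining:
            Rb = R @ B[:, j]
            gains.append((y @ Rb) ** 2 / (B[:, j] @ Rb + lam[j]))
        k = int(np.argmax(gains))
        best = remaining.pop(k)
        Rb = R @ B[:, best]
        R = R - np.outer(Rb, Rb) / (B[:, best] @ Rb + lam[best])  # R_{k+1}
        J -= gains[k]
        selected.append(best)
    return selected, J


rng = np.random.default_rng(3)
B = rng.standard_normal((100, 8))
y = 3.0 * B[:, 1] - 2.0 * B[:, 4] + 0.1 * rng.standard_normal(100)
selected, J = forward_select(B, y, lam=np.full(8, 1e-3), rho=0.05)
```

A smaller ρ lets more columns enter before the loop stops, which mirrors the choice of a deliberately small initial ρ described above.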
Figure 2 (flowchart, partially recovered): compensate some unknown information in H by (10); divide all measurements into $z_F$ and $z_S$; determine a predesigned false data $a_i$ and obtain the attack model $Y = B'a' + \Xi$ by (16).

Complexity
After the first iteration, there are 40 candidate attack vector elements. The attack model is then refined until the regularization parameters λ converge at the 114th iteration. Some values of the regularization parameters and false data are listed in Table 2, which shows that all attack vector elements added after the 25th selected measurement have very large regularization parameters and values (i.e., false data injected into the measurements) very close to zero. Therefore, the final attack vector constructed by the proposed method has only 25 nonzero elements.
The change of the resultant attack model error is shown in Figure 4 and listed in Table 3. The parameter estimation process when 25 measurements are selected is shown in Figure 5. Table 4 gives the final selection order of the measurements and the estimated values of the attack vector elements.

The Sparsity Comparison of the Attack Vectors Constructed by LRFR and FRA.
The sparse attack strategy based on FRA is first used to construct a sparse attack vector, whose sparsity is then compared with that of the vector constructed by LRFR. Figure 6 shows that the FRA-based strategy selected 55 measurements, whereas the proposed LRFR-based strategy selected only 25. Thus, the proposed attack strategy produces a sparser attack vector.

The attack residuals calculated in the 500 simulations are shown in Figure 7, which indicates that 15 residuals exceed the threshold τ. Thus, the success rate of constructing attack vectors is calculated as
$$P_s = \frac{500 - N_F}{500} \times 100\% = \frac{500 - 15}{500} \times 100\% = 97\%,$$
where $N_F$ represents the number of detected attacks. This shows that the proposed attack strategy constructs attack vectors with a high success rate while ensuring their sparsity.
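Assuming P_s is the fraction of the N = 500 trials that the BDD does not detect, i.e., P_s = (N − N_F)/N, the computation reduces to a count over the attack residuals. The residual values below are placeholders chosen only to reproduce the reported count of 15 detections; they are not the data of Figure 7, and `success_rate` is a hypothetical helper.

```python
def success_rate(residuals, tau):
    """P_s = (N - N_F) / N, where N_F counts residuals exceeding tau (detected)."""
    n_detected = sum(1 for r in residuals if r > tau)  # N_F
    return (len(residuals) - n_detected) / len(residuals)


# Placeholder residuals: 15 of 500 above the Case 1 threshold tau = 1.1063.
residuals = [1.5] * 15 + [0.8] * 485
p_s = success_rate(residuals, tau=1.1063)
```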

Case 2.
The proposed sparse attack strategy is next tested on a practical coastal area distribution network system [44], shown in Figure 8. The system consists of 23 buses, 12 transformers, and 15 transmission lines (including overhead lines and underground cables). It covers 4 voltage levels, with one 110 kV/35 kV substation, two 35 kV/10 kV substations, and five 10 kV/0.4 kV substations. All branch impedance parameters (in actual values rather than per-unit values) are given in Table 5, and Table 6 shows the active power demands of all 15 loads C1–C15.
There are a total of 67 measurements in the coastal area distribution network system. The measurements used in this simulation are obtained directly from "Matpower" [45], using a case modified according to [44]. First, the topology and parameters are taken from [44] (i.e., all branch impedance parameters, the active demands of the loads, and the voltage levels); the distribution network system has 23 nodes and 22 branches. Then, the settings of the "Matpower case" (e.g., system MVA base, bus data, generator data, and branch data) are configured according to the information of the distribution system. Finally, the number of measurements is set to m = 23 + 22 × 2 = 67, and the measurements are obtained by a power flow calculation on the modified "Matpower case." This provides sufficient measurements for the simulation. Even if only a small number of measurements were available, state estimation could still be used; for example, the "mean squared estimator" (MSE) proposed in [46] is accurate with a limited number of measurements and has guaranteed convergence.
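The counting rule m = n_bus + 2·n_branch and the ordering described for the IEEE 30-bus system in Section 4.1 (injections, then "from" flows, then "to" flows) can be written as a small helper. The helper names are hypothetical, and applying the same ordering to the coastal system is an assumption. For the IEEE 30-bus case, the helper maps the 12th branch to measurements 42 and 83, consistent with the paired zero entries a_42 and a_83 set in Section 4.1.

```python
def measurement_layout(n_bus, n_branch):
    """1-based index ranges of [injections, 'from' flows, 'to' flows] and total m."""
    injections = (1, n_bus)
    from_flows = (n_bus + 1, n_bus + n_branch)
    to_flows = (n_bus + n_branch + 1, n_bus + 2 * n_branch)
    return injections, from_flows, to_flows, n_bus + 2 * n_branch


def flow_indices(n_bus, n_branch, k):
    """1-based measurement indices of the 'from' and 'to' flows of branch k."""
    return n_bus + k, n_bus + n_branch + k


ieee30 = measurement_layout(30, 41)      # ordering stated in Section 4.1
coastal = measurement_layout(23, 22)     # assumed to follow the same ordering
```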
However, considering the practical attack situation in which only partial topology and parameter information of the power grid is available and the attackers can access only a limited number of smart meters, 54 measurements are assumed to be accessible, the remaining 13 being secure measurements: $p_6$, $p_9$, $p_{19}$ and $\pm f_{2-4}$, $\pm f_{8-11}$, $\pm f_{8-16}$, $\pm f_{12-14}$, $\pm f_{20-21}$. Thus, we set $a_i = 0$ for $i \in \{6, 9, 19\}$ and for $i \in \{26, 32, 34, 39, 44, 48, 54, 56, 62, 66\}$. According to (10), the approximation threshold is then $\tau = 0.7948$ and $\tau_a = 0.7948 - 0.0160 = 0.7788$. The distributions of the measurement noise vector v and of the residual error vector Ξ are the same as in Section 4.1.
The proposed sparse attack strategy is then used to construct the sparse attack vector against the practical coastal area distribution network system. The 40th measurement is selected from the attackable set F as the initial attacked measurement, and the corresponding attack vector element (i.e., injected false data) is set to $a_{40} = 0.1873$. After the first iteration, the candidate set contains 36 measurements. The regularization parameters and the attack vector elements (injected false data) after λ converges at the 155th iteration are listed in Table 7. According to Table 7, the regularization parameters associated with the 23rd to 36th false data are all very large, and the corresponding attack vector elements are effectively zero. Thus, the final attack vector has 22 nonzero elements. The parameter estimation process when 22 measurements are selected is shown in Figure 9. Table 8 gives the final selection order of the measurements and the estimated values of the attack vector elements.
Compared with the FRA-based attack strategy, Figure 10 shows that the LRFR algorithm produces a sparser attack vector, with 8 fewer nonzero elements. The results again confirm that the proposed method requires fewer measurements to be attacked.
The full Jacobian matrix H and the full measurements z are known to the operator. The detection threshold can then be calculated as τ = 0.9425. The LRFR method is tested on a separate set of 500 imperfect attack regression models generated in the same way as in Section 4.1.3. The attack residuals calculated in the 500 simulations are shown in Figure 11, which indicates that 30 residuals exceed the threshold τ. Thus, the success rate of constructing attack vectors is calculated as
$$P_s = \frac{500 - N_F}{500} \times 100\% = \frac{500 - 30}{500} \times 100\% = 94\%,$$
where $N_F$ represents the number of detected attacks. This again shows that the proposed attack strategy constructs attack vectors with a high success rate while ensuring their sparsity.

Conclusion
This paper has proposed a novel sparse false data injection attack method in SG with incomplete power network information. First, according to the obtained measurements and network information, some incomplete network information is compensated by the power flow equation. Then, the fault tolerance range of the BDD for the attack residual increment is estimated by calculating the detection threshold of the residual L2-norm test. Finally, an effective sparse imperfect attack strategy is proposed by treating the choice of measurements as a subset selection problem, which is solved by the LRFR algorithm to improve the sparsity of the attack vector. The effectiveness of the proposed attack strategy is verified by the two case studies. For future work, an attack that does not rely on any prior power network information but uses only the measurements would save attack resources and could have more complex effects on the smart grid. A blind sparse attack construction strategy is therefore important and meaningful for finding the vulnerabilities of the power system.

Figure 1: Power network control system under false data injection attacks.

Figure 2: The procedure for the proposed attack strategy.

Figure 4: The change of the attack model error by the LRFR.

Figure 5: Parameter estimation process when 25 measurements are selected.

Figure 6: The change of the attack model error based on the FRA and LRFR.

Figure 7: The attack residuals calculated in 500 detection experiments.

Figure 8: Topology diagram of the generic distribution network (N: node, T: transformer, L: line, and C: customer).

Figure 9: The parameter estimation process when 22 measurements are selected.

Figure 10: The change of the attack model error based on the FRA and LRFR.

Figure 11: The attack residuals calculated in 500 detection experiments.

Table 2: Regularization parameters and false data for selected measurements.

The distributions of the measurement noise vector v and of the residual error vector Ξ are the same as in Section 4.1. First, a measurement $z_i \in F$ is randomly selected, and random false data $a_i$ is injected into it to construct an imperfect attack regression model; 500 such models are generated in the same way. For each model, the LRFR method is used to solve for the attack vector, and the attack residual is calculated to determine whether the attack vector can bypass the BDD.

Table 3: The change value of the attack model error by the LRFR.

Table 4: Final selection order and estimation values of attack vector elements.

Table 5: Impedance parameters of network lines.

Table 6: The active demand of loads.

Table 7: The values of attack vector elements when converged at the 155th iteration.

Table 8: Final selection order and estimation values of attack vector elements.