Efficient Privacy-Preserving Data Aggregation Scheme with Fault Tolerance in Smart Grid

As the traditional grid produces a large amount of greenhouse gas and cannot adapt to new demands such as dynamic electricity prices, data analysis, and early warning, the smart grid, with its high efficiency and reliability, is increasingly valued. It plays a key role in achieving carbon neutrality. Nonetheless, the smart grid requires the collection of real-time power data, and personal privacy may be leaked through frequent electricity measurement reports. Data aggregation schemes have emerged to support data analysis and prediction while preserving users' personal privacy. However, existing schemes do not resolve all of these problems: some do not consider smart meter failures, and most incur expensive computation cost. In view of this, an efficient privacy-preserving data aggregation scheme with fault tolerance in smart grid is put forward in this paper. Specifically, the proposed scheme is lightweight because it applies symmetric homomorphic encryption and elliptic curve cryptography. Even if some smart meters fail, the proposed scheme can still obtain the aggregated data. Moreover, the proposed scheme is proved to be secure, and all security requirements are satisfied. Performance evaluation shows that the proposed scheme has relatively low computation cost and communication overhead compared with related schemes.


Introduction
In recent years, the negative effects of global warming have become increasingly significant, as can be observed from rising sea levels and the destruction of biodiversity. All countries are looking for ways to achieve carbon neutrality [1]. The application of smart grid (SG) can effectively accelerate the realization of this goal, and SG is included in long-term development plans [2-6]. Compared with the traditional grid, SG conforms to low-carbon sustainable development, adopts a two-way communication mode, and allows for diversified gradient electricity prices and early warning based on status analysis [7-9]. These features compensate for various shortcomings of the traditional power grid; therefore, SG is considered an excellent next-generation power system. Figure 1 illustrates the framework of SG, which consists of the markets, control center, service provider, energy generation, transmission, distribution, and customers [10].
For information communication in SG, a large number of sensors are employed, especially smart meters (SM), which collect real-time household power measurement data every 10-15 minutes and send them to the control center (CC) for electricity data analysis and dispatch [11]. Transmitting such a large amount of data is time-consuming; at the same time, real-time data transmission raises privacy concerns. According to the surveys [12, 13], individual real-time electricity consumption data expose sensitive information about users; for example, the lifestyle and living habits of family members might be exploited by a malicious adversary.
To deal with both issues simultaneously, data aggregation technology has been proposed, where SM use homomorphic encryption to protect real-time power data and upload the ciphertext to the gateway (GW); the ciphertexts are then aggregated by GW and sent to CC. Finally, CC can use its private key to decrypt the aggregated ciphertext, but it is unable to obtain the individual power measurement data of any single SM. In this way, users' personal privacy is protected, and meanwhile, CC can analyze power measurement data and allocate and adjust power supply in a timely and reasonable manner.
Although there have been many data aggregation schemes, many issues are still worthy of further improvement.
First, SM exposed to an unprotected environment may malfunction and fail to send reports; the lack of some power data reports may prevent the system from recovering the aggregated power data. Fault tolerance enables the recovery of the aggregated data despite SM malfunction. Some existing aggregation schemes [14, 16-20] do not support fault tolerance, so they cannot obtain the aggregated data when a meter fails, and the entire system is paralyzed. Although some other schemes [22, 24, 29] achieve fault tolerance, their trusted authority (TA) or CC needs to perform special operations, such as generating dummy ciphertexts; this is not very practical because the number of GW and SM under the control of the TA is enormous, which brings unbearable computation costs to the TA.
Second, since the GW and CC are semitrusted, they may launch collusion attacks to obtain the private data of a single SM. In the existing schemes [22, 28, 30, 37], the data reports are encrypted directly with the public key of CC. If GW sends the ciphertext of a certain SM to CC, or CC accidentally obtains a single ciphertext, the individual report can be decrypted by CC with its private key.
Third, identity privacy is also secret information that should be protected; meanwhile, when a malicious SM appears, its real identity should be revealed by the TA. Some existing schemes [18, 23, 25, 26] fail to consider identity privacy. Some other schemes [16, 17, 20, 27] achieve identity anonymity, but only by omitting identity information from the data reports, which makes it impossible to trace the identity of a malicious SM when it appears.
In order to settle the above problems and realize further optimization, an efficient data aggregation scheme that supports fault tolerance is proposed. The primary contributions are as follows:
(i) The proposed scheme applies lightweight symmetric homomorphic encryption and elliptic curve signatures to achieve efficiency, instead of the commonly used time-consuming public key homomorphic encryption technologies such as Paillier [38] and BGN [39]. It also provides fault tolerance, thus being able to run normally even if some SM fail to upload data reports.
(ii) The security analysis formally proves that the proposed aggregation scheme is secure based on the (L, p)-based decision problem and the elliptic curve discrete logarithm problem. Moreover, the proposed scheme meets the required security requirements and, in particular, resists collusion attacks.
(iii) Performance evaluation carries out a quantitative analysis, and the results show that the proposed scheme involves less computation cost and communication overhead than other related data aggregation schemes.
The rest of this paper is organized as follows. Section 2 reviews the related work on data aggregation schemes. The background and preliminaries are given in Sections 3 and 4, respectively. In Section 5, the proposed scheme is introduced in detail. Sections 6 and 7 present the security analysis and the performance evaluation, respectively. Finally, the conclusion is given in Section 8.

Related Work
In the long-term research on data aggregation technology, many problems have been considered in order to satisfy the security requirements, for instance, collusion attacks, fault tolerance, and identity privacy protection. Aggregation schemes are vulnerable to various attacks, especially collusion attacks. Compared with external adversaries, internal attackers are more likely to damage the SG system because they hold more private information. Fan et al. [14] first considered collusion attacks and resisted them by virtue of blinding factors assigned by a trusted third party. Regrettably, Bao and Lu [15] pointed out an integrity drawback of the scheme [14]: the private key of the user was easily recovered, so data pollution could be caused. He et al. [16] created a certificateless data aggregation scheme based on elliptic curve cryptography, which speeds up the process and withstands collusion attacks. He et al. [17] improved the BGN scheme to realize a data aggregation scheme against collusion attacks. Zhang et al. [18] considered false data injection attacks and prevented them with blinding factors. Li et al. [19] applied BGN encryption and blinding factors to build a data aggregation scheme that prevents collusion attacks. Shen et al. [20] put forward an aggregation scheme that counteracts new malicious data mining attacks and internal attacks with the BLS short signature. Based on elliptic curve cryptography, a scalable data aggregation scheme was designed by Chen et al. [21], where the encryption key, instead of the public key of CC, was generated independently, so that even CC cannot decrypt a single ciphertext.
Once an SM malfunctions and cannot submit its electricity reports, most of the above schemes [14, 16-20] would be paralyzed. Therefore, fault tolerance needs to be taken into consideration. In the scheme [22] of Chen et al., a trusted third party, which held the private keys of all users, additionally generated a dummy ciphertext for the damaged SM to ensure the smooth running of the protocol. Bao and Lu [23] proposed a differentially private aggregation scheme with fault tolerance, where CC was still able to obtain the aggregated data from the remaining reports. Pan et al. [24] used Lagrangian interpolation to propose a two-dimensional and fault-tolerant privacy-preserving aggregation scheme. Ge et al. [25] put forward a fine-grained data analysis scheme which could still run even if a meter failed, and which could obtain a variety of statistics. Xue et al. [26] proposed a privacy-preserving service outsourcing scheme, which supported a fault tolerance mechanism and flexible electricity prices. Guan et al. [27] utilized the Shamir sharing method and the RSA signature to implement a fault-tolerant aggregation protocol, yet the aggregated result of this protocol cannot be decrypted correctly. In [28], Ding et al. presented an identity-based secure data aggregation scheme, which supported fault tolerance thanks to its particular ciphertext structure. Wang et al. [29] improved the Paillier encryption to obtain a fault-tolerant data aggregation scheme through collaboration between users; unfortunately, integrity was not considered. In general, when facing SM malfunction, the TA performs additional operations to achieve fault tolerance, but it becomes overloaded because it manages many GW and many SM under each GW.
In addition, user identity privacy is an important security problem. Liu et al. [30] utilized blind signatures to realize an anonymous data aggregation scheme, where the token was unlinkable to any valid signature. Using ring signatures, Badra and Zeadally [31] blocked the connection between the content of the report and the identity of the SM. Tan et al. [32] designed a privacy-preserving pseudonym-based collection scheme, where the SM adopted the group key to generate a pseudoidentity so that the adversary was unable to get its real identity. Gong et al. [33] achieved anonymity by separating the data reports from the identity of the SM.
Although the above schemes achieve different functions and features, most of them are time-consuming, which burdens SM with limited computing resources. He et al. [34] applied batch verification to accelerate the execution of the aggregation scheme. Combining elliptic curve cryptography and the super-increasing sequence technique, Ming et al. [35] came up with an efficient privacy-preserving multidimensional data aggregation scheme, which can classify power measurement data and achieve fine-grained analysis. In the scheme [36] of Shen et al., the XOR operation with a pseudorandom function was employed to encrypt power data and realize confusion so that the adversary could not identify the source of the reports. Zhang et al. [37] adopted online/offline signatures to create a lightweight aggregation scheme, which helps to speed up the signature verification process.

Background
This section describes the background of the proposed scheme, including the system model, security requirements, and design goal.

System Model.
In the proposed scheme, the system model consists of four entities: trusted authority (TA), control center (CC), gateway (GW), and $n$ smart meters (SM), as shown in Figure 2. For ease of description, we consider only one GW, to which $n$ ($n > 1$) smart meters are linked.
(1) TA: it is a fully trusted entity that produces the blinding factors and the secret value for SM. TA can recover the exact identity of an SM with malicious behavior.
(2) CC: it is a semitrusted entity that generates the system parameters. CC is also in charge of the registration of SM and GW. In addition, after receiving aggregated encrypted data from GW, CC decrypts and analyzes them.
The $n$ smart meters are grouped into $\omega$ domains, each of which contains $t$ members, that is, $\omega \cdot t = n$. In the $i$th domain $D_i$, a random member is selected as the domain header. Without loss of generality, assume that the first member $SM_{i1}$ is appointed as the domain header $SM^H_{i1}$ by GW; the domain header $SM^H_{i1}$ is itself also a member of $D_i$. The domain member $SM_{ij}$ ($j = 1, 2, \ldots, t$) is in charge of collecting the electricity measurement data of each user's household and sending it to $SM^H_{i1}$. The domain header $SM^H_{i1}$ is responsible for preaggregating the data reports in the domain $D_i$ and then uploading the result to GW. Besides, $SM_{ij}$ is not allowed to send electricity reports directly to GW. It is worth noting that no SM can collude with GW or CC.
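The domain structure described above can be illustrated with a minimal sketch. It assumes that meters are identified by plain strings and that domains are formed from consecutive meters; both choices, as well as the function name `partition_into_domains`, are hypothetical and only serve to show the relation $\omega \cdot t = n$ and the role of the first member as domain header.

```python
def partition_into_domains(meter_ids, t):
    """Split n meters into omega = n / t domains of t members each.

    The first member of every domain is taken as the domain header,
    mirroring the "without loss of generality" choice in the text.
    """
    n = len(meter_ids)
    if t <= 0 or n % t != 0:
        raise ValueError("n must be a positive multiple of t")
    domains = []
    for start in range(0, n, t):
        members = meter_ids[start:start + t]
        # The header is itself a member of the domain.
        domains.append({"header": members[0], "members": members})
    return domains


meters = [f"SM{i}" for i in range(30)]
domains = partition_into_domains(meters, t=10)   # omega = 3 domains
```

With $t = 10$ and $n = 30$ this yields $\omega = 3$ domains, each reporting to GW only through its header.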

Security Requirements.
The security requirements that the proposed scheme should satisfy are as follows:
(1) Confidentiality: the electricity data are closely related to users' private information. Therefore, the adversary should learn nothing useful even if it obtains the transmitted ciphertext.

Security and Communication Networks
(2) Authentication: any two entities exchanging a report must verify each other to ensure that their identities are legal.
(3) Integrity: a report transmitted over an open channel may be tampered with, so that a wrong message is conveyed. The proposed scheme should therefore detect whether a report has been altered.
(4) Anonymity and traceability: no entity other than TA can determine or distinguish an identity by analyzing transmitted reports. On the other hand, when a malicious SM uploads fake data, its true identity should be revealed by TA so that its behavior can be supervised.
(5) Resistance against common attacks: the proposed scheme should resist common types of attacks, including but not limited to the collusion attack, modification attack, and replay attack.

Design Goal.
The proposed scheme pursues the following objectives.
(1) Privacy-preserving: nobody should be able to obtain the actual data or identity of a single SM. CC is only allowed to decrypt the aggregated ciphertext. In addition, the abovementioned attacks should be resisted.
(2) Fault tolerance: it is unacceptable that the aggregated data cannot be recovered when a few SM are damaged. Therefore, even if some SM fail to submit reports, the system should continue to run normally.
(3) High efficiency: on the premise of fulfilling the above security requirements, the proposed scheme tries to reduce the computation cost and communication overhead. In a practical smart grid, an efficient scheme is more suitable for SM with limited resources.

Preliminaries
Two preliminaries are briefly stated in this section: elliptic curve cryptography and symmetric homomorphic encryption.

Elliptic Curve Cryptography.
The definition of elliptic curve cryptography (ECC) comes from Miller [40]. Let $G$ be an additive cyclic group of prime order $q$ with generator $P$. The security problem and assumption are described as follows.
Elliptic curve discrete logarithm problem (ECDL problem) [41]: given two elements $P, Q \in G$ on the elliptic curve $E$ as input, output an integer $a \in Z_q^*$ such that $Q = aP$.
Elliptic curve discrete logarithm assumption (ECDL assumption) [41]: it is difficult for any probabilistic polynomial time algorithm to solve the ECDL problem with a nonnegligible advantage.

Symmetric Homomorphic Encryption.
Mahdikhani et al. [42] designed a new symmetric homomorphic encryption; the algorithm is described as follows.
Enc(K, m, r, r'): on input of the symmetric homomorphic encryption key $K$ and the plaintext $m \in M$, where the message space is $M = \{0, 1\}^{k_1}$, the encryption algorithm selects two random numbers $r \in \{0, 1\}^{k_2}$ and $r' \in \{0, 1\}^{k_0}$ and encrypts the plaintext:
$$c = \mathrm{Enc}(K, m, r, r') = (rL + m)(1 + r'p) \bmod N. \quad (1)$$
Dec(K, c): on input of the ciphertext $c$ and the symmetric homomorphic encryption key $K$, the decryption algorithm decrypts the ciphertext:
$$m = (c \bmod p) \bmod L. \quad (2)$$
The security of the symmetric homomorphic encryption [42] is based on the following security assumption.
(L, p)-based decision problem [43]: given $(k_0, k_2, N)$, the (L, p)-based decision problem is to determine whether an integer $x \in Z_N$ belongs to $S$ or $\bar{S}$ without knowing $(p, q, L)$, where the elements of both sets have the form $(\alpha L + \beta p) \bmod N$, with $\alpha L < p$ for $S$ and $\alpha L \geq p$ for $\bar{S}$.
(L, p)-based decision assumption [43]: it is difficult for any probabilistic polynomial time algorithm to solve the (L, p)-based decision problem with a nonnegligible advantage in $k_0$ and $k_2$.
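The following minimal sketch illustrates how such a symmetric homomorphic encryption behaves. It assumes that encryption takes the form $c = (rL + m)(1 + r'p) \bmod N$, which is consistent with the decryption rule $m = (c \bmod p) \bmod L$ stated above. The concrete primes, the bit lengths, and the function names `enc`, `dec`, and `aggregate` are assumptions chosen for illustration; the parameters are far too small for real use.

```python
import secrets

# Toy parameters (assumptions, not the paper's sizes); they respect k1 << k2 < k0 / 2.
K1, K2 = 16, 40                  # plaintext bit length k1; bit length k2 of L and r
P_SECRET = 2**127 - 1            # secret prime p (a Mersenne prime, for convenience)
Q_SECRET = 2**107 - 1            # second secret prime q (also a Mersenne prime)
N = P_SECRET * Q_SECRET          # public modulus N = pq
K0 = P_SECRET.bit_length()       # bit length k0 used for r'
L_SECRET = secrets.randbits(K2) | (1 << (K2 - 1))   # secret k2-bit value L


def enc(m: int) -> int:
    """Encrypt m as (r*L + m) * (1 + r'*p) mod N (assumed form of Enc)."""
    if not 0 <= m < 2**K1:
        raise ValueError("plaintext outside the message space")
    r = secrets.randbits(K2)
    r_prime = secrets.randbits(K0)
    return (r * L_SECRET + m) * (1 + r_prime * P_SECRET) % N


def dec(c: int) -> int:
    """Decrypt with m = (c mod p) mod L."""
    return (c % P_SECRET) % L_SECRET


def aggregate(ciphertexts):
    """Add ciphertexts modulo N; the result decrypts to the sum of the plaintexts."""
    total = 0
    for c in ciphertexts:
        total = (total + c) % N
    return total


readings = [120, 75, 301]
aggregated = aggregate([enc(m) for m in readings])
recovered_sum = dec(aggregated)   # equals sum(readings) while the sum stays below L
```

Decryption of a sum is correct only while the accumulated value $\sum (r_i L + m_i)$ stays below $p$ and $\sum m_i$ stays below $L$, which is why the parameters satisfy $k_1 \ll k_2 < k_0/2$.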

The Proposed Scheme
In this section, a privacy-preserving data aggregation scheme that supports fault tolerance in the smart grid is proposed in detail. It consists of six phases: initialization, registration, report generation, report aggregation, report reading, and fault tolerance. All the notations used in this paper are described in Table 1.
The general picture of the proposed scheme is depicted in Figure 3.

Initialization.
In this phase, CC produces the system parameters, and TA generates the blinding factors and the secret value for smart meters.

Control Center
(1) Given the security parameter $(k_0, k_1, k_2, K)$, CC produces an additive cyclic group $G$ of prime order $q$ satisfying $|q| = K$; $G$ is based on a nonsingular elliptic curve $E$ defined over a finite field $F_p$ with $p > q$. CC chooses the generator $P$ of $G$.
(2) CC randomly chooses two large prime numbers $p, q$ satisfying $|p| = |q| = k_0$ and computes the public parameter $N = pq$. CC chooses arbitrary

Trusted Authority
(1) TA selects a random number $s_{TA} \in Z_q^*$ as the master secret key and calculates the corresponding public key $P_{pub} = s_{TA}P$.

Registration.
$SM_{ij}$ and GW register with CC, respectively, in this phase.

Smart Meters' Registration
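(1) $SM_{ij}$ randomly chooses $s_{ij}, u_{ij} \in Z_q^*$ and calculates the public key $S_{ij} = s_{ij}P$ and a knowledge signature. CC randomly selects $\pi_{ij} \in Z_q^*$ and calculates the pseudoidentity $ID_{ij} = (ID_{ij,1}, ID_{ij,2})$, where $ID_{ij,1} = \pi_{ij}P$ and $ID_{ij,2} = RID_{ij} \oplus H_1(\pi_{ij}P_{pub})$.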

Gateway's Registration
(1) GW randomly selects $s_G, u_G \in Z_q^*$ and computes the public key $S_G = s_GP$ and a knowledge signature.

Report Generation.
In this phase, $SM_{ij}$ collects electricity data and transmits it toward GW.

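$SM_{ij}$ Submits the Data Report to $SM^H_{i1}$
(1) $SM_{ij}$ collects electricity measurement data $m_{ij}$, randomly selects $r''_{ij} \in \{0, 1\}^{k_2}$, and computes the ciphertext $C_{ij}$, where $T$ is the current timestamp.
(2) $SM_{ij}$ randomly chooses $e_{ij} \in Z_q^*$ and calculates $E_{ij} = e_{ij}P$ and the signature $\sigma_{ij}$ on $(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T)$.

(1) On receiving the data reports, $SM^H_{i1}$ examines the timestamp $T$ and verifies whether $\sigma_{ij}P = S_{ij}H_3(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T) + E_{ij}$ holds. In order to speed up the verification, $SM^H_{i1}$ uses the small exponent test [44] to achieve batch verification: $SM^H_{i1}$ randomly selects a set of small random values and checks the correspondingly weighted verification equation.
(2) Given the $t - 1$ data reports and its own data report $C_{i1}$, $SM^H_{i1}$ randomly selects $e_{iH} \in Z_q^*$ and calculates the preaggregated ciphertext $C_i$ together with $E_{iH}$ and the signature $\sigma_{iH}$, where $T_{iH}$ is the current timestamp.
(3) Finally, $SM^H_{i1}$ uploads the preaggregated report $\langle ID_{i1}, C_i, E_{iH}, \sigma_{iH}, T_{iH} \rangle$ to GW.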

Report Aggregation.
In this phase, GW verifies and aggregates the preaggregated reports from $SM^H_{i1}$. Afterward, GW uploads the aggregated report to CC.

Report Reading.
CC verifies and decrypts the aggregated report from GW in this phase.
(1) On receiving $\langle RID_G, C, E_G, \sigma_G, T_G \rangle$ from GW, CC checks the timestamp $T_G$ and verifies the signature $\sigma_G$.
(2) CC decrypts the aggregated ciphertext $C$ to obtain the aggregated electricity measurement data.
(3) CC analyzes and processes the aggregated data and makes an optimal allocation.

Correctness:

Fault Tolerant.
This section describes how to obtain the aggregated data when some smart meters fail to work normally.
(2) Given $\langle ID_{i1}, C_i, E_{iH}, \sigma_{iH}, T_{iH}, \lambda \rangle$, $SM_{ij}$ examines the timestamp $T_{iH}$ and verifies the signature $\sigma_{iH}$. If it is valid, $SM_{ij}$ randomly selects $e_{ij} \in Z_q^*$ and $r_{ij} \in \{0, 1\}^{k_2}$ and computes its response, where $T_{ij}$ is the current timestamp.

Indistinguishability.
The proposed scheme is proved to be indistinguishable under the chosen plaintext attack (IND-CPA). The adversary $\mathcal{A}$ can execute the queries below:
Hash query: given a hash query, output a random value.
Encryption query: given an encryption query on the message $m_{ij}$, output the ciphertext $C_{ij}$.
The security model is defined by the interactive game played between the adversary $\mathcal{A}$ and the challenger $\mathcal{C}$.
Setup: $\mathcal{C}$ produces the system parameters and sends them to $\mathcal{A}$.
Phase 1: $\mathcal{A}$ adaptively executes the hash queries and the encryption queries polynomially many times.
Challenge: after completing phase 1, $\mathcal{A}$ selects two messages $m^0_{ij}, m^1_{ij} \in M$ and submits them to $\mathcal{C}$. Next, $\mathcal{C}$ randomly chooses $b \in \{0, 1\}$, calculates the ciphertext $C^b_{ij}$ corresponding to $m^b_{ij}$, and returns it to $\mathcal{A}$.
The advantage of the adversary $\mathcal{A}$ in winning the game measures how much better than random guessing it can determine $b$.
Definition 1. The proposed scheme is IND-CPA secure if the advantage of any adversary in the above game is negligible.

Theorem 1.
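The proposed scheme is IND-CPA secure under the (L, p)-based decision assumption.
Proof. Assume that the adversary $\mathcal{A}$ wins the game in Definition 1 with a nonnegligible advantage $\varepsilon$; then an algorithm $\mathcal{B}$ can be constructed to break the (L, p)-based decision problem with advantage $\varepsilon'$. Let $z$ be an arbitrary bit, where $z = 0$ means $x \in S$ and $z = 1$ means $x \in \bar{S}$ for the given instance $x$.
Setup: $\mathcal{B}$ sets the security parameter $(k_0, k_1, k_2)$ satisfying $k_1 \ll k_2 < k_0/2$ and chooses two large prime numbers $p, q$ which satisfy $|p| = |q| = k_0$. $\mathcal{B}$ calculates $N = pq$. Next, $\mathcal{B}$ randomly chooses $L \in \{0, 1\}^{k_2}$ and sets up the message space $M = \{m \mid m \in \{0, 1\}^{k_1}\}$. $\mathcal{B}$ secretly keeps $(p, q, L)$ and returns $(k_0, k_1, k_2, N)$ to $\mathcal{A}$.
To answer queries quickly and consistently, $\mathcal{B}$ maintains a list of its responses.
When $z = 0$, which means $x \in S$ and $\alpha L < p$, the ciphertext $C^b_{ij} = m^b_{ij} + H(T, k)\theta + x = (m^b_{ij} + H(T, k)\theta + \alpha L + \beta p) \bmod N$ is a valid ciphertext. The probability of $\mathcal{A}$ correctly guessing $b$ is $1/2 + \varepsilon$. Therefore, the probability that $\mathcal{B}$ can successfully guess is $\Pr[\text{Success of } \mathcal{B} \mid z = 0] = 1/2 + \varepsilon$.
When $z = 1$, which means $x \in \bar{S}$ and $\alpha L \geq p$, the ciphertext $C^b_{ij} = m^b_{ij} + H(T, k)\theta + x = (m^b_{ij} + H(T, k)\theta + \alpha L + \beta p) \bmod N$ is an invalid ciphertext. The probability of $\mathcal{A}$ correctly guessing $b$ is $1/2$. Therefore, the probability that $\mathcal{B}$ can successfully guess is $\Pr[\text{Success of } \mathcal{B} \mid z = 1] = 1/2$.
Based on the above two cases, each of which occurs with probability $1/2$, the probability that $\mathcal{B}$ breaks the (L, p)-based decision problem is
$$\Pr[\text{Success of } \mathcal{B}] = \frac{1}{2}\left(\frac{1}{2} + \varepsilon\right) + \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{2} + \frac{\varepsilon}{2}.$$
Consequently, $\mathcal{B}$ can break the (L, p)-based decision problem with nonnegligible probability, $\varepsilon' = 1/2 + \varepsilon/2$. This contradicts the (L, p)-based decision assumption; therefore, the proposed scheme is IND-CPA secure.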

Unforgeability.
The proposed scheme satisfies existential unforgeability under the adaptively chosen message attack (EUF-CMA). The adversary $\mathcal{A}$ can execute the following queries:
Hash query: given a hash query, output a random value.
Create user query: given a create user query on $ID_{ij}$ of $SM_{ij}$, output the public key $(S_{ij}, U_{ij})$.
Corrupt user query: given a corrupt user query on $ID_{ij}$ of $SM_{ij}$, output the private key $s_{ij}$.
Signature query: given a signature query on the ciphertext $C_{ij}$ under $ID_{ij}$ of $SM_{ij}$, output the signature $\sigma_{ij}$.
The security model is defined by the interactive game between the adversary $\mathcal{A}$ and the challenger $\mathcal{C}$.
Initialization: $\mathcal{A}$ chooses a challenging identity $ID^*_{ij}$ and submits it to $\mathcal{C}$.
Setup: $\mathcal{C}$ produces the system parameters and sends them to $\mathcal{A}$.
Query: $\mathcal{A}$ adaptively executes the hash queries, the create user queries, the corrupt user queries, and the signature queries polynomially many times, except for the corrupt user query on $ID^*_{ij}$.
Forgery: $\mathcal{A}$ produces a forged signature $\sigma^*_{ij}$ on the ciphertext $C^*_{ij}$ and the challenging identity $ID^*_{ij}$, such that
(1) $\sigma^*_{ij}$ is a valid signature
(2) $ID^*_{ij}$ has never been queried in the corrupt user queries
The advantage of the adversary $\mathcal{A}$ is the probability that it wins the game.
Definition 2. The proposed scheme is EUF-CMA secure if the advantage of any adversary in the above game is negligible.

Theorem 2.
The proposed scheme is EUF-CMA secure under the ECDL assumption.
Proof. Assume that the adversary $\mathcal{A}$ wins the game in Definition 2 with a nonnegligible advantage $\varepsilon$; then an algorithm $\mathcal{B}$ can be constructed to break the ECDL problem with advantage $\varepsilon'$. Given an instance $(P, Q = aP)$ of the ECDL problem, the ultimate target of $\mathcal{B}$ is to find $a \in Z_q^*$.
Initialization: $\mathcal{A}$ selects a challenging identity $ID^*_{ij}$ and submits it to $\mathcal{B}$.
Setup: $\mathcal{B}$ selects the security parameter $(k_0, k_1, k_2, K)$ and the cyclic group $G$.
$\mathcal{B}$ maintains the following three lists: $L_{H_2}$, $L_{H_3}$, and $L_{SM_{ij}}$.
Query: $\mathcal{A}$ adaptively executes the following queries polynomially many times.
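Hash $H_2$ query: $\mathcal{A}$ makes an $H_2$ query on $(RID_{ij}, S_{ij}, U_{ij})$ and $\mathcal{B}$ responds as follows: if $(RID_{ij}, S_{ij}, U_{ij})$ is already included in the list $L_{H_2}$, $\mathcal{B}$ responds with the stored $h_{2,ij}$; otherwise, $\mathcal{B}$ randomly selects $h_{2,ij} \in Z_q^*$, inserts $(RID_{ij}, S_{ij}, U_{ij}, h_{2,ij})$ into the list $L_{H_2}$, and responds $h_{2,ij}$ to $\mathcal{A}$.
Hash $H_3$ query: $\mathcal{A}$ executes an $H_3$ query for $(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T)$ and $\mathcal{B}$ responds as follows: if $(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T)$ is already included in the list $L_{H_3}$, $\mathcal{B}$ responds with the stored $h_{3,ij}$; if it is not included in the list $L_{H_3}$, $\mathcal{B}$ randomly selects $h_{3,ij} \in Z_q^*$, inserts $(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T, h_{3,ij})$ into the list $L_{H_3}$, and responds $h_{3,ij}$ to $\mathcal{A}$.
Create user query: this query is issued by $\mathcal{A}$ on the identity $ID_{ij}$ of $SM_{ij}$ and $\mathcal{B}$ responds as follows: if $ID_{ij}$ is already included in the list $L_{SM_{ij}}$, $\mathcal{B}$ responds with the stored $(S_{ij}, U_{ij})$; if it is not included in the list $L_{SM_{ij}}$, $\mathcal{B}$ executes the following steps:
(i) If $ID_{ij} = ID^*_{ij}$, $\mathcal{B}$ randomly chooses $v_{ij}, h_{2,ij} \in Z_q^*$ and sets $S_{ij} = aP = Q$ and $U_{ij} = v_{ij}P - h_{2,ij}S_{ij}$; if $h_{2,ij}$ already appears in the list $L_{H_2}$, $\mathcal{B}$ randomly selects another $v_{ij} \in Z_q^*$ and tries again. Then, $\mathcal{B}$ inserts $(RID_{ij}, S_{ij}, U_{ij}, h_{2,ij})$ into the list $L_{H_2}$ and $(ID_{ij}, \perp, S_{ij}, v_{ij}, U_{ij})$ into the list $L_{SM_{ij}}$. Finally, $\mathcal{B}$ responds $(S_{ij}, U_{ij})$ to $\mathcal{A}$.
(ii) If $ID_{ij} \neq ID^*_{ij}$, $\mathcal{B}$ executes the smart meters' registration algorithm to produce $(S_{ij}, U_{ij})$ and responds with them to $\mathcal{A}$.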
Corrupt user query: this query is performed by $\mathcal{A}$ on the identity $ID_{ij}$ of $SM_{ij}$ and $\mathcal{B}$ responds as follows:
(1) If $ID_{ij} = ID^*_{ij}$, $\mathcal{B}$ aborts the game.
(2) If $ID_{ij} \neq ID^*_{ij}$, $\mathcal{B}$ executes the create user query on $ID_{ij}$ if necessary and responds $(s_{ij}, v_{ij})$ to $\mathcal{A}$.
Signature query: after receiving a ciphertext $C_{ij}$ and $ID_{ij}$ for a signature query, $\mathcal{B}$ responds as follows:
(1) If $ID_{ij} = ID^*_{ij}$, $\mathcal{B}$ randomly chooses $\sigma_{ij}, T, h_{3,ij} \in Z_q^*$ and calculates $E_{ij} = \sigma_{ij}P - h_{3,ij}S_{ij}$. If $h_{3,ij}$ already appears in the list $L_{H_3}$, $\mathcal{B}$ randomly chooses another $\sigma_{ij} \in Z_q^*$ and tries again. Afterward, $\mathcal{B}$ inserts $(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T, h_{3,ij})$ into the list $L_{H_3}$ and responds $(ID_{ij}, C_{ij}, E_{ij}, \sigma_{ij}, T)$ to $\mathcal{A}$.
(2) If $ID_{ij} \neq ID^*_{ij}$, $\mathcal{B}$ executes the report generation algorithm to produce $(ID_{ij}, C_{ij}, E_{ij}, \sigma_{ij}, T)$ and responds with them to $\mathcal{A}$.
Forgery: $\mathcal{A}$ produces a valid forged signature $\sigma_{ij}$ on the ciphertext $C_{ij}$ under the identity $ID_{ij}$ of $SM_{ij}$. According to the forking lemma [45], $\mathcal{B}$ can produce an additional valid signature $\sigma'_{ij}$ with a different hash value $h'_{3,ij}$ of $H_3$. The following two equations can be obtained:
$$\sigma_{ij}P = h_{3,ij}S_{ij} + E_{ij}, \qquad \sigma'_{ij}P = h'_{3,ij}S_{ij} + E_{ij}.$$
Since $S_{ij} = aP$, we can calculate $(\sigma_{ij} - \sigma'_{ij})P = (h_{3,ij} - h'_{3,ij})aP$, and the solution of the ECDL problem is obtained by $\mathcal{B}$:
$$a = (\sigma_{ij} - \sigma'_{ij})(h_{3,ij} - h'_{3,ij})^{-1} \bmod q.$$
Probability analysis: suppose that $\mathcal{A}$ is allowed to execute at most $q_{H_2}$ $H_2$ queries, $q_{H_3}$ $H_3$ queries, $q_{cre}$ create user queries, $q_{cor}$ corrupt user queries, and $q_s$ signature queries. The situation in which $\mathcal{B}$ breaks the ECDL problem is described by three events:
(1) $E_1$: $\mathcal{B}$ never aborts the game for the corrupt user queries
(2) $E_2$: $\mathcal{B}$ can produce a valid signature
(3) $E_3$: $ID_{ij} = ID^*_{ij}$
According to the above simulation, the probability of each event can be bounded, and hence $\mathcal{B}$ can break the ECDL problem with nonnegligible probability. This contradicts the ECDL assumption; consequently, the proposed scheme satisfies unforgeability.
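The key-extraction step of the forking argument can be checked numerically. The sketch below assumes signatures of the form $\sigma = e + s \cdot h \bmod q$, which is consistent with the verification equation $\sigma P = S \cdot H_3(\cdot) + E$ with $S = sP$ and $E = eP$. It works on scalars only, so no curve arithmetic is needed; the modulus `Q_ORDER` and the function names are hypothetical choices for illustration.

```python
# A prime standing in for the group order q (assumption: any prime works here).
Q_ORDER = 2**127 - 1


def sign_scalar(s: int, e: int, h: int) -> int:
    """Scalar part of the signature: sigma = e + s*h mod q."""
    return (e + s * h) % Q_ORDER


def extract_key(sigma1: int, h1: int, sigma2: int, h2: int) -> int:
    """Recover s from two signatures that share e but use different hashes.

    sigma1 - sigma2 = s * (h1 - h2) mod q, so s = (sigma1 - sigma2) / (h1 - h2).
    """
    if (h1 - h2) % Q_ORDER == 0:
        raise ValueError("the two hash values must differ")
    inv = pow((h1 - h2) % Q_ORDER, -1, Q_ORDER)   # modular inverse (Python 3.8+)
    return (sigma1 - sigma2) * inv % Q_ORDER


secret_key = 123456789123456789
nonce = 987654321987654321
sig_a = sign_scalar(secret_key, nonce, 1111)
sig_b = sign_scalar(secret_key, nonce, 2222)
recovered = extract_key(sig_a, 1111, sig_b, 2222)   # equals secret_key
```

This is the same computation that lets $\mathcal{B}$ output the discrete logarithm once the challenge public key is set to the ECDL instance.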

Analysis of Security Requirement.
The security requirements are analyzed comprehensively in this section.

Confidentiality.
On the basis of Theorem 1, the adversary cannot decrypt the ciphertexts $C_{ij}$, $C_i$, and $C$ to obtain electricity data without the symmetric homomorphic encryption key $K$. Consequently, confidentiality is satisfied.

Authentication.
A legal smart meter $SM_{ij}$ registers its identity information with CC in advance. After receiving the reports of $SM_{ij}$, $SM^H_{i1}$ verifies whether $\sigma_{ij}P = S_{ij}H_3(C_{ij}, S_{ij}, ID_{ij}, E_{ij}, T) + E_{ij}$ holds. Based on Theorem 2, the adversary cannot create a valid signature without the private key $s_{ij}$. Thus, authentication is met.
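The verification equation and its batch form can be sketched as follows. The sketch assumes that the signature is computed as $\sigma_{ij} = e_{ij} + s_{ij}H_3(\cdot) \bmod q$, which is consistent with the equation above, and that batch verification weights each report with a small random value $\delta_j$, in the spirit of the small exponent test. The curve (secp256k1 rather than a 160-bit curve), the hash construction, the report layout, and all function names are assumptions for illustration.

```python
import hashlib
import secrets

# secp256k1 parameters, used only as a convenient well-known curve (assumption).
FP = 2**256 - 2**32 - 977
Q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(p1, p2):
    """Add two curve points; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FP == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % FP, -1, FP) % FP   # curve has a = 0
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FP, -1, FP) % FP
    x3 = (lam * lam - x1 - x2) % FP
    return (x3, (lam * (x1 - x3) - y1) % FP)


def ec_mul(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


def h3(rep):
    """Hypothetical H_3: SHA-256 over (C, S, ID, E, T), reduced mod q."""
    data = repr((rep["C"], rep["S"], rep["ID"], rep["E"], rep["T"])).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    s = secrets.randbelow(Q - 1) + 1
    return s, ec_mul(s, G)


def sign(s, pub, ident, c, ts):
    e = secrets.randbelow(Q - 1) + 1
    rep = {"ID": ident, "C": c, "E": ec_mul(e, G), "T": ts, "S": pub}
    rep["sigma"] = (e + s * h3(rep)) % Q      # sigma = e + s * H_3(...)
    return rep


def verify(rep):
    """Check sigma * P == H_3(...) * S + E."""
    return ec_mul(rep["sigma"], G) == ec_add(ec_mul(h3(rep), rep["S"]), rep["E"])


def batch_verify(reports, bits=16):
    """Check sum(delta_j * sigma_j) * P == sum(delta_j * (h_j * S_j + E_j))."""
    lhs_scalar, rhs = 0, None
    for rep in reports:
        delta = secrets.randbelow(2**bits - 1) + 1     # small nonzero weight
        lhs_scalar = (lhs_scalar + delta * rep["sigma"]) % Q
        rhs = ec_add(rhs, ec_mul(delta * h3(rep) % Q, rep["S"]))
        rhs = ec_add(rhs, ec_mul(delta, rep["E"]))
    return ec_mul(lhs_scalar, G) == rhs
```

The batch check replaces one full-size scalar multiplication of $P$ per report with a single one for the whole domain, at the price of multiplications by the short weights $\delta_j$.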

Integrity.
The ciphertext $C_{ij}$ is signed to generate the signature $(\sigma_{ij}, E_{ij})$. On the basis of Theorem 2, the adversary cannot generate a legal signature without the private key $s_{ij}$, and only valid reports are accepted. Hence, integrity is achieved.

Anonymity.
Every $SM_{ij}$ is assigned a pseudoidentity $ID_{ij} = (ID_{ij,1}, ID_{ij,2})$ in the registration phase, corresponding to the real identity $RID_{ij}$, where $ID_{ij,1} = \pi_{ij}P$ and $ID_{ij,2} = RID_{ij} \oplus H_1(\pi_{ij}P_{pub})$. The adversary cannot get the real identity $RID_{ij}$ without $\pi_{ij}$ or $s_{TA}$. Thus, anonymity is guaranteed in the proposed scheme.

Traceability.
When $SM_{ij}$ behaves maliciously, only TA can calculate $RID_{ij} = ID_{ij,2} \oplus H_1(s_{TA}ID_{ij,1})$ by using the private key $s_{TA}$ to uncover the true identity $RID_{ij}$. In this way, the proposed scheme realizes traceability.
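The pseudoidentity construction and the tracing step can be sketched as follows. Tracing works because $s_{TA} \cdot ID_{ij,1} = s_{TA}\pi_{ij}P = \pi_{ij}P_{pub}$, so both sides feed the same point into $H_1$. The curve (secp256k1), the SHA-256 based stand-in for $H_1$, the encoding of $RID_{ij}$ as an integer, and the function names are assumptions made for illustration.

```python
import hashlib
import secrets

# secp256k1 parameters, used only as a convenient well-known curve (assumption).
FP = 2**256 - 2**32 - 977
Q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(p1, p2):
    """Add two curve points; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FP == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % FP, -1, FP) % FP   # curve has a = 0
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FP, -1, FP) % FP
    x3 = (lam * lam - x1 - x2) % FP
    return (x3, (lam * (x1 - x3) - y1) % FP)


def ec_mul(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


def h1(point):
    """Hypothetical H_1: hash a curve point to a 256-bit integer mask."""
    data = f"{point[0]},{point[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def ta_keygen():
    """TA master secret s_TA and public key P_pub = s_TA * P."""
    s_ta = secrets.randbelow(Q - 1) + 1
    return s_ta, ec_mul(s_ta, G)


def make_pseudo_identity(rid, p_pub):
    """ID_1 = pi * P and ID_2 = RID xor H_1(pi * P_pub)."""
    pi = secrets.randbelow(Q - 1) + 1
    return ec_mul(pi, G), rid ^ h1(ec_mul(pi, p_pub))


def trace(pseudo_id, s_ta):
    """TA recovers RID = ID_2 xor H_1(s_TA * ID_1)."""
    id1, id2 = pseudo_id
    return id2 ^ h1(ec_mul(s_ta, id1))


s_ta, p_pub = ta_keygen()
rid = int.from_bytes(b"meter-0042", "big")    # real identity encoded as an integer
pseudo_id = make_pseudo_identity(rid, p_pub)
```

Calling `trace(pseudo_id, s_ta)` returns `rid`, while a party without $s_{TA}$ or $\pi_{ij}$ sees only a point and a masked value.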

Resistance against Collusion Attack.
GW can disclose an extra ciphertext $C_i$ to CC. Next, CC can obtain $\sum_{j=1}^{t}(m_{ij} + H(T, k)\theta_{ij})$ by calculating $(C_i \bmod p) \bmod L$. However, without $\theta_{ij}$ and $k$, they cannot recover the plaintext $\sum_{j=1}^{t} m_{ij}$; the same holds if CC obtains the ciphertext $C_i$ by accident. CC is therefore still unable to obtain the real electricity data. Hence, the proposed scheme withstands the collusion attack.
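The effect of the blinding term can be illustrated at the plaintext level. The sketch below assumes that each meter's masked value is $m_{ij} + H(T, k)\theta_{ij}$ reduced modulo a value `L_MOD`, and, as a further assumption not stated in the text above, that TA chooses the blinding factors of all $n$ meters so that they sum to zero modulo `L_MOD`. Under these assumptions the full aggregate reveals $\sum m_{ij}$, while the partial sum of a single domain stays masked. The hash construction and all names are hypothetical.

```python
import hashlib
import secrets

L_MOD = 2**61 - 1    # hypothetical prime modulus standing in for L


def h_tk(timestamp: int, k: int) -> int:
    """Hypothetical H(T, k): SHA-256 over the timestamp and the secret value k."""
    data = f"{timestamp}|{k}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % L_MOD


def make_blinding_factors(n: int):
    """Assumption: TA picks n factors theta that sum to zero modulo L_MOD."""
    thetas = [secrets.randbelow(L_MOD) for _ in range(n - 1)]
    thetas.append(-sum(thetas) % L_MOD)
    return thetas


def mask(m: int, theta: int, h: int) -> int:
    """Masked reading m + H(T, k) * theta (mod L_MOD)."""
    return (m + h * theta) % L_MOD


readings = [5, 7, 9, 11, 13, 15]              # n = 6 meters, two domains with t = 3
thetas = make_blinding_factors(len(readings))
h = h_tk(1700000000, 42)
masked = [mask(m, th, h) for m, th in zip(readings, thetas)]

full_sum = sum(masked) % L_MOD                # blinding cancels over all meters
domain_sum = sum(masked[:3]) % L_MOD          # one domain alone stays blinded
```

Here `full_sum` equals `sum(readings)`, whereas `domain_sum` differs from the true domain total except with negligible probability, which mirrors the argument that a leaked $C_i$ does not expose a domain's consumption.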

Resistance against Modification Attack.
By Theorem 2, any modification of a data report by a polynomial-time adversary will be detected. Hence, the proposed scheme resists the modification attack.

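Resistance against Replay Attack.
Since the reports $\langle ID_{ij}, C_{ij}, E_{ij}, \sigma_{ij}, T \rangle$, $\langle ID_{i1}, C_i, E_{iH}, \sigma_{iH}, T_{iH} \rangle$, and $\langle RID_G, C, E_G, \sigma_G, T_G \rangle$ contain timestamps, the receiver can check the freshness of each timestamp. Therefore, replay attacks can be withstood.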
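A freshness check of this kind can be sketched in a few lines. The acceptance window of 5 seconds and the function name `is_fresh` are assumptions; the text above does not specify how the receiver bounds the admissible delay.

```python
import time


def is_fresh(report_ts: float, now: float = None, window: float = 5.0) -> bool:
    """Accept a report only if its timestamp lies within `window` seconds of now.

    Timestamps too far in the past (replays) and too far in the future
    (clock manipulation) are both rejected.
    """
    if now is None:
        now = time.time()
    return abs(now - report_ts) <= window
```

Because the timestamp is covered by the signature, an adversary cannot simply refresh the timestamp of a captured report without invalidating it.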

Functionality Comparison.
The functionality comparison with the related schemes [19-21, 27, 28] is shown in Table 2. Confidentiality, authentication, integrity, and fault tolerance are denoted by F1, F2, F3, and F4, respectively. In addition, anonymity, traceability, and resistance against the collusion attack, modification attack, and replay attack are represented by F5, F6, F7, F8, and F9, respectively. The schemes [19, 20] do not support fault tolerance. Furthermore, the schemes [19, 21, 28] do not protect users' identity privacy, and the schemes [20, 27] cannot trace malicious behaviors. Besides, the schemes [27, 28] may be subject to collusion attacks, and the scheme [27] may be subject to replay attacks. The other related schemes each fail to meet several requirements, whereas the proposed scheme fulfills all the security requirements simultaneously.

Performance Evaluation
In this section, the computation cost and the communication overhead are compared and analyzed in a quantitative way between the related schemes [19-21, 27, 28] and the proposed scheme.

Computation Cost.
To ensure a fair comparison, the proposed scheme is compared with the existing data aggregation schemes [19-21, 27, 28] at the same 80-bit security level. For the schemes [20,27] based on Paillier encryption, two large prime numbers $u, v$ of 512 bits are selected, so $N = uv$ is 1024 bits. For the schemes [19-21, 27, 28] based on bilinear pairing, the symmetric bilinear pairing $e: G_1 \times G_1 \to G_T$ is used, where $G_1$ is an additive group with generator $P$ of order $q$ defined on the supersingular elliptic curve $E: y^2 = x^3 + x \bmod p$ with embedding degree 2, $q$ is a 160-bit Solinas prime, and $p$ is a 512-bit prime satisfying $q \cdot 12 \cdot r = p + 1$. For the ECC-based schemes [20,21] and the proposed scheme, an additive group $G$ of prime order $q$ is established on the nonsingular elliptic curve $E: y^2 = x^3 + ax + b \bmod p$, in which $p, q$ are both 160-bit primes, $a = -3$, and $b$ is a random 160-bit prime. For the symmetric homomorphic encryption in this paper, $p$ and $q$ are two 512-bit primes, and the length of $L$ is 160 bits.
To make the comparison more accurate, the running time of each cryptographic operation is estimated with the MIRACL Crypto SDK [46]. The hardware is a PC with a 2.90 GHz i5-10400 CPU and 16 GB of memory, running 64-bit Windows 10. Table 3 reports the mean time over 10000 executions of each cryptographic operation.
For simplicity, some lightweight operations, such as general hash functions and point addition, are ignored. The specific details are described in Table 4, in which $n$ represents the number of smart meters. Assume $t = 10$ in the proposed scheme, that is, $\omega = 0.1n$. The computation cost is divided into three phases: the report generation phase, the report aggregation phase, and the report reading phase.
First of all, the computation cost of the report generation phase is considered.
Li et al.'s scheme [19] employs $3n$ exponentiation operations in $Z_N$, $2n$ multiplication operations in $Z_N$, $n$ exponentiation operations in $G_1$, and $n$ map-to-point hash operations. As a result, the running time is $3nT_{e-N} + 2nT_{m-N} + nT_{e-G_1} + nT_{mtp} = 2.7585n$ ms.
Shen et al.'s scheme [20] applies $n$ Paillier public key encryption operations, $2n$ exponentiation operations in $Z_q$, $n$ scalar multiplication operations in ECC, and $n$ map-to-point hash operations. In this way, the running time is $nT_{Enc-P} + 2nT_{e-q} + nT_{m-ECC} + nT_{mtp} = 2.6266n$ ms.
Chen et al.'s scheme [21] utilizes $n$ bilinear pairing operations, $2n$ map-to-point hash operations, $n$ exponentiation operations in $G_T$, and $n$ scalar multiplication operations in ECC. Consequently, the running time is $nT_{bp} + 2nT_{mtp} + nT_{e-G_T} + nT_{m-ECC} = 5.0429n$ ms.
Guan et al.'s scheme [27] demands $n$ exponentiation operations in $Z_{N^2}$, $n$ Paillier public key encryption operations, $n$ exponentiation operations in $G_1$, and $n$ exponentiation operations in $Z_N$. As a consequence, the running time is $nT_{e-N^2} + nT_{Enc-P} + nT_{e-G_1} + nT_{e-N} = 2.237n$ ms.
Ding et al.'s scheme [28] needs $2n$ exponentiation operations in $G_T$ and $n$ exponentiation operations in $G_1$. Hence, the running time is $2nT_{e-G_T} + nT_{e-G_1} = 0.9108n$ ms.
In the report generation phase of the proposed aggregation scheme, $SM_{ij}$ executes $n$ scalar multiplication operations in ECC and $3n$ multiplication operations in $Z_N$, and $SM^{H}_{i1}$ executes $1.2n$ scalar multiplication operations in ECC. The running time is therefore $2.2nT_{m-ECC} + 3nT_{m-N} = 0.3198n$ ms.
Afterward, the computation cost of the report aggregation phase is analyzed.
Li et al.'s scheme [19] employs $n + 1$ bilinear pairing operations, $n + 1$ map-to-point hash operations, $n$ multiplication operations in $Z_N$, one exponentiation operation in $G_1$, and one exponentiation operation in $Z_N$. As a result, the running time is $(n + 1)T_{bp} + (n + 1)T_{mtp} + nT_{m-N} + T_{e-G_1} + T_{e-N} = 3.2532n + 4.0169$ ms.
Shen et al.'s scheme [20] applies $n + 2$ bilinear pairing operations, $n + 1$ map-to-point hash operations, and one scalar multiplication operation in ECC. In this way, the running time is $(n + 2)T_{bp} + (n + 1)T_{mtp} + T_{m-ECC} = 3.2264n + 5.003$ ms.
Guan et al.'s scheme [27] demands $n + 2$ exponentiation operations in $Z_N$ and $n$ exponentiation operations in $G_1$. As a consequence, the running time is $(n + 2)T_{e-N} + nT_{e-G_1} = 0.7905n + 0.3558$ ms.
In the report aggregation phase, the proposed scheme executes $0.1n + 2$ scalar multiplication operations in ECC, so the running time is $(0.1n + 2)T_{m-ECC} = 0.0109n + 0.2176$ ms.
Finally, the computation cost of the report reading phase is summarized.
Li et al.'s scheme [19] employs two bilinear pairing operations, one map-to-point hash operation, one exponentiation operation in $Z_N$, and one discrete logarithm computation. As a result, the running time is $2T_{bp} + T_{mtp} + T_{e-N} + T_{log} = 5.2566$ ms.
Shen et al.'s scheme [20] applies two bilinear pairing operations, one map-to-point hash operation, one Paillier decryption operation, and two exponentiation operations in $Z_q$. In this way, the running time is $2T_{bp} + T_{mtp} + T_{Dec-P} + 2T_{e-q} = 6.1197$ ms.
Chen et al.'s scheme [21] utilizes three bilinear pairing operations, two map-to-point hash operations, and one discrete logarithm computation. Consequently, the running time is $3T_{bp} + 2T_{mtp} + T_{log} = 8.3051$ ms.
Guan et al.'s scheme [27] demands one exponentiation operation in $Z_N$ and one Paillier decryption operation. As a consequence, the running time is $T_{e-N} + T_{Dec-P} = 1.3908$ ms.
Ding et al.'s scheme [28] needs three exponentiation operations in $G_1$, one bilinear pairing operation, and one discrete logarithm computation. Hence, the running time is $3T_{e-G_1} + T_{bp} + T_{log} = 3.6901$ ms.
In the report reading phase, the proposed scheme executes two scalar multiplication operations in ECC, one modulo-$p$ operation, and one modulo-$L$ operation, so the running time is $2T_{m-ECC} + T_{modp} + T_{modL} = 0.2342$ ms.
The total running times of the other schemes [19-21, 27, 28] and the proposed scheme are $6.0117n + 9.2735$, $5.853n + 11.1227$, $9.9371n + 9.9725$, $3.0275n + 1.7466$, $1.5234n + 5.5279$, and $0.3307n + 0.4518$ ms, respectively. Figure 4 shows how the overall computation cost varies with the number of smart meters. The overall computation cost of every scheme grows linearly with the number of smart meters. The proposed scheme requires the minimum overall computation cost and grows more slowly than the other schemes. Specifically, the proposed scheme reduces the cost by 94.5%, 94.3%, 96.7%, 89.1%, and 78.3%, respectively, compared with the other schemes [19-21, 27, 28]. Consequently, the proposed data aggregation scheme is more appropriate for smart meters with limited computation resources because it involves no time-consuming operations such as map-to-point hashing and bilinear pairing.
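The totals above can be checked with a short script. The coefficients are copied from the text; the observation that the quoted percentages match the ratio of the per-meter coefficients (the slopes) is an inference from the numbers rather than something the text states, and the function names are illustrative.

```python
# (a, b) such that the total running time is a*n + b milliseconds,
# copied from the totals quoted in the text.
TOTAL_COST_MS = {
    "[19]": (6.0117, 9.2735),
    "[20]": (5.853, 11.1227),
    "[21]": (9.9371, 9.9725),
    "[27]": (3.0275, 1.7466),
    "[28]": (1.5234, 5.5279),
    "proposed": (0.3307, 0.4518),
}


def total_ms(scheme, n):
    """Total running time in ms for n smart meters."""
    a, b = TOTAL_COST_MS[scheme]
    return a * n + b


def per_meter_reduction(scheme):
    """Percentage reduction of the per-meter coefficient (assumed basis of the quoted figures)."""
    a_other, _ = TOTAL_COST_MS[scheme]
    a_proposed, _ = TOTAL_COST_MS["proposed"]
    return round(100 * (1 - a_proposed / a_other), 1)


for scheme in ("[19]", "[20]", "[21]", "[27]", "[28]"):
    print(scheme, per_meter_reduction(scheme), "% lower per meter;",
          round(total_ms(scheme, 100), 2), "ms at n = 100")
```

The proposed scheme's total is also the sum of its three phases: $0.3198n + (0.0109n + 0.2176) + 0.2342 = 0.3307n + 0.4518$ ms.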

Communication Overhead.
The communication overhead is compared with the schemes [19-21, 27, 28]; the details are shown in Table 5, where $|x|$ denotes the bit size of $x$. The size of the transmitted report is analyzed in two parts: the communication overhead from SM to GW and from GW to CC. As before, the lengths of elements of $G$, $G_1$, $G_T$, $Z_q$, $Z_N$, and $Z_{N^2}$ are 160 bits, 512 bits, 1024 bits, 160 bits, 1024 bits, and 2048 bits, respectively. Furthermore, the identity and the timestamp are both assumed to be 32 bits.
For the first part, the communication process from SM to GW is analyzed.
In Chen et al.'s scheme [21], the electricity transmission data are $\langle c_i, S_i, t_i, id_i \rangle$, where $c_i \in G_T$, $S_i \in G_1$, $t_i$ is a 32-bit timestamp, and $id_i$ is a 32-bit identity. Therefore, the communication overhead is $(|c_i| + |S_i| + |t_i| + |id_i|)n = (1024 + 512 + 32 + 32)n = 1600n$ bits.
In Guan et al.'s scheme [27], the electricity transmission data are […].
In Ding et al.'s scheme [28], the electricity transmission data are $\langle CT_{i,j}, S_{i,j}, R_i, T_{i,j}, ID_i \rangle$, where $CT_{i,j} = (c_{i,j,1} \in G_1, c_{i,j,2} \in G_T)$, $S_{i,j} \in Z_q$, $R_i \in G_1$, $T_{i,j}$ is a 32-bit timestamp, and $ID_i$ is a 32-bit identity. Hence, the communication overhead is $(|CT_{i,j}| + |S_{i,j}| + |R_i| + |T_{i,j}| + |ID_i|)n = (1536 + 160 + 512 + 32 + 32)n = 2272n$ bits.
In the proposed scheme, $SM_{ij}$ submits the data report $\langle ID_{ij}, C_{ij}, E_{ij}, \sigma_{ij}, T \rangle$, where $T$ is 32 bits, and the size is $(|ID_{ij}| + |C_{ij}| + |E_{ij}| + |\sigma_{ij}| + |T|)n = (192 + 1024 + 160 + 160 + 32)n = 1568n$ bits. $SM^{H}_{i1}$ submits the data report $\langle ID_{i1}, C_i, E_{iH}, \sigma_{iH}, T_{iH} \rangle$ to GW, and the size is $(|ID_{i1}| + |C_i| + |E_{iH}| + |\sigma_{iH}| + |T_{iH}|) \cdot 0.1n = (192 + 1024 + 160 + 160 + 32) \cdot 0.1n \approx 157n$ bits. The total communication overhead is therefore $1568n + 157n = 1725n$ bits. Figure 5 shows the relationship between the communication overhead from SM to GW and the number of smart meters. The communication overhead rises linearly with the number of smart meters. The proposed scheme requires $1725n$ bits, which reduces the communication overhead by 47.2%, 52.7%, and 24.1%, respectively, compared with the schemes [20,27,28]. Although the related schemes [19,21] are slightly better than our work in terms of the communication overhead from SM to GW, this difference is acceptable because the proposed scheme offers the fault tolerance and anonymity that they lack. In general, the proposed scheme consumes fewer communication resources.
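The report sizes from SM to GW can be recomputed from the element lengths stated earlier. This is a minimal check; the variable names are illustrative, and the group membership of $E_{ij}$ and $\sigma_{ij}$ is inferred from the 160-bit terms in the size formula.

```python
# Element sizes in bits, as stated in the text.
BITS = {"G": 160, "G1": 512, "GT": 1024, "Zq": 160, "ZN": 1024, "id": 32, "ts": 32}

# Per-report sizes from SM to GW (bits per smart meter).
chen_21 = BITS["GT"] + BITS["G1"] + BITS["ts"] + BITS["id"]
ding_28 = (BITS["G1"] + BITS["GT"]) + BITS["Zq"] + BITS["G1"] + BITS["ts"] + BITS["id"]

# Proposed scheme: the pseudo-identity is 192 bits according to the size formula;
# E_ij and sigma_ij are taken as 160-bit elements (inferred from the formula).
proposed_report = 192 + BITS["ZN"] + BITS["G"] + BITS["Zq"] + BITS["ts"]

# Group heads forward 0.1n reports of the same size; the text rounds 156.8 to 157.
proposed_per_meter = proposed_report + round(0.1 * proposed_report)

reduction_vs_ding = round(100 * (1 - proposed_per_meter / ding_28), 1)
print(chen_21, ding_28, proposed_per_meter, reduction_vs_ding)
```

The script reproduces the $1600n$, $2272n$, and $1725n$ figures and the 24.1% reduction relative to [28].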
For the second part, the communication process from GW to CC is analyzed.
In Li et al.'s scheme [19], the electricity transmission data are $\langle ID_{ES_i}, t_i, C_i, \sigma_i \rangle$, where $ID_{ES_i}$ is a 32-bit identity, $t_i$ is a 32-bit timestamp, $C_i \in Z_N$, and $\sigma_i \in G_1$. Consequently, the communication overhead is $|ID_{ES_i}| + |t_i| + |C_i| + |\sigma_i| = 32 + 32 + 1024 + 512 = 1600$ bits.
In Chen et al.'s scheme [21], the electricity transmission data are $\langle c_j, S_j, t_i, id_j \rangle$, where $c_j \in G_T$, $S_j \in G_1$, $t_i$ is a 32-bit timestamp, and $id_j$ is a 32-bit identity. Therefore, the communication overhead is $|c_j| + |S_j| + |t_i| + |id_j| = 1024 + 512 + 32 + 32 = 1600$ bits.
In the proposed scheme, the electricity report is $\langle RID_G, C, E_G, \sigma_G, T_G \rangle$, where $RID_G$ is a 32-bit identity, $C \in Z_N$, $E_G \in G$, $\sigma_G \in Z_q$, and $T_G$ is a 32-bit timestamp. The communication overhead is therefore $|RID_G| + |C| + |E_G| + |\sigma_G| + |T_G| = 32 + 1024 + 160 + 160 + 32 = 1408$ bits. The last column of Table 5 lists the communication overhead from GW to CC of the related schemes. The proposed scheme uses 1408 bits, a reduction of 12.0%, 56.9%, 12.0%, 54.6%, and 49.4%, respectively, compared with the other schemes [19-21, 27, 28]. Consequently, the proposed scheme achieves lower communication overhead from GW to CC, which benefits a GW with limited communication resources.

Conclusion
In this paper, we have employed symmetric homomorphic encryption and an elliptic curve signature to design a lightweight, privacy-preserving data aggregation scheme for smart grid. In the proposed scheme, even if some smart meters malfunction, the system can still run normally and obtain the aggregated data. Besides, it does not restrict the space of electricity data. The security analysis has demonstrated that the proposed scheme is IND-CPA and EUF-CMA secure and satisfies all security requirements. Finally, the performance analysis has shown that the proposed scheme is lightweight in terms of computation cost and communication overhead. Judging from the results, the proposed scheme is more practical for a smart grid with limited computation and communication capabilities.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.