FPETD: Fault-Tolerant and Privacy-Preserving Electricity Theft Detection

Electricity theft occurs from time to time in the smart grid and can cause great losses to the power supplier, so it is necessary to detect and prevent it. Using machine learning as an electricity theft detection tool can quickly identify participants suspected of electricity theft; however, directly publishing user data to the detector for machine learning-based detection may expose user privacy. In this paper, we propose a real-time fault-tolerant and privacy-preserving electricity theft detection (FPETD) scheme that combines n-source anonymity and a convolutional neural network (CNN). In our scheme, we design a fault-tolerant raw data collection protocol to collect electricity data and cut off the correspondence between users and their data, thereby ensuring fault tolerance and data privacy during the electricity theft detection process. Experiments show that, with our dimensionality reduction method, our model achieves an accuracy of 92.86% for detecting electricity theft, which is higher than that of the compared methods.


Introduction
Electricity theft is widespread in the smart grid [1]; illegal users may be trying to reduce their bills by stealing electricity. Electricity theft will cause great economic losses to the power company and potential safety hazards such as fire and electric shock [2,3]. Therefore, it is vital to take measures to detect electricity theft behaviors in real time. The Internet of Things (IoT) is popular nowadays; in the smart grid, users' electricity consumption data are collected from smart meters to the data center in real time, which can be published or outsourced for data analysis to identify the theft users [4,5].
Many works have been conducted to solve the electricity theft detection problem in the smart grid. The existing electricity theft detection methods are mainly divided into three categories: state estimation, game theory, and machine learning. Among them, the machine learning-based methods are now widely used in electricity theft detection since they are more efficient and accurate.
However, most of the existing machine learning-based schemes for electricity theft detection consider the detection server to be credible [6][7][8][9]. For example, in [8], the authors use decision tree and support vector machine (SVM) to conduct electricity theft detection. The detector directly obtains the raw data corresponding to the user, which will leak the user's privacy. In [6], although the data center can detect the number of stealers, the detector can decrypt users' encrypted data and directly obtain users' raw data. In reality, the detection server is often untrusted. Directly publishing users' data to the detection server may reveal user privacy [10,11]. Now is the era of big data, and data contains sensitive information of people. For example, an attacker may analyze the user's electricity consumption data to find out whether the user is at home or not on a certain day, which may cause security issues such as burglary, so it is important to protect user data privacy.
To protect user privacy, it is necessary to perform privacy processing on user data before publishing it to the detection server [12]. Common data privacy processing technologies such as data aggregation [13,14] achieve privacy-preserving data collection well, but they cannot be combined with the actual application of electricity theft detection [15,16]. For example, in [13], each user encrypts his/her own electricity consumption data into a ciphertext using a lifted ElGamal homomorphic cryptosystem before sending it to the aggregator [17]. The aggregator aggregates the ciphertexts from all users and sends the aggregated data to the operating center [18]. In this way, the operating center obtains the sum of all users' data. The aggregated data reflects the overall electricity consumption level, and the published data facilitates electricity distribution decisions. However, the data aggregation scheme cannot achieve accurate electricity theft detection in a specific area because the sum may not retain the features of each single user's raw data [19]. Electricity theft detection needs to extract the features of a single user for analysis, but the aggregation result may mask these features, so it is not applicable to electricity theft detection.
Compared with the maximum value, average value, and sum value obtained by data aggregation, the raw data obtained by the n-source anonymity method often has greater use-value. The n-source anonymity is a data privacy processing method which was first proposed by Zhang et al. [20]; cryptographic tools are used to guarantee the rawness and unlinkability of the data. In Liu et al.'s scheme [21], "Shuffle" is introduced to allocate the participants' slots, to achieve n-source anonymity without a trusted third party. After that, Chen et al. [22] improve Liu et al.'s scheme [21] to improve storage efficiency and make the scheme more lightweight. The anonymized raw data collected by the n-source anonymity method can realize the detection of electricity theft under the premise of protecting user privacy. However, the existing n-source anonymity methods [20][21][22] do not take into account the fault tolerance issues that may occur during the data collection stage in real data application scenarios. In the data collection stage of a real electricity theft detection scenario, if a device fails, it is likely that the entire detection system cannot work normally; therefore, fault tolerance is important. In this paper, a fault-tolerant and privacy-preserving electricity theft detection (FPETD) scheme is proposed, and three contributions are achieved as follows.
(1) We perform privacy processing on the users' electricity consumption data by n-source anonymity before it is published, to complete real-time electricity theft detection without the need of a trusted third party while ensuring user privacy.
(2) We propose a fault-tolerant n-source anonymity data collection scheme, so that users' electricity consumption data can still be collected privately in the event that some smart meters fail, thereby ensuring that electricity theft detection can still be performed normally in the case of device failure.
(3) Sufficient experiments show that the data normalization and dimensionality reduction preprocessing we apply to the dataset can speed up model training and improve the detection accuracy. Our preprocessing of the dataset makes our CNN model perform better than other existing methods.
The rest of the paper is organized as follows. Preliminaries are introduced in Section 2. System model and security threats are discussed in Section 3. The proposed FPETD scheme is presented in Section 4. Privacy analysis is introduced in Section 5. Experiments and analysis are described in Section 6. Finally, the paper is concluded in Section 7.

Preliminaries
2.1. n-Source Anonymity. n-source anonymity is a raw data collection method under which the source of the data cannot be traced. The steps of n-source anonymity in Liu et al.'s scheme [21] are as follows (Figure 1 shows a process of 4-source anonymity data collection):
(1) Each participant $p_i$ ($i \in [1, n]$) obtains a slot which is only known by himself/herself through the slot generation phase (for more details, the reader can refer to [21]).
(2) Each participant $p_i$ ($i \in [1, n]$) uses the session keys to construct the masking data $e_i^j$ ($i \in [1, n]$, $j \in [1, n]$) and adds raw data $m_i$ to the $slot(i)$-th slot. Then, each $p_i$ obtains a ciphertext:
$$c_i = e_i^1 \,|\, e_i^2 \,|\, \cdots \,|\, e_i^{slot(i)} \oplus m_i \,|\, \cdots \,|\, e_i^n$$
(3) The data collector executes the XOR operation on the ciphertexts from the participants and finally obtains the raw data.
2.2. z-Score Standardization. z-score standardization is often used to map all features of the data to the same scale, to prevent some features from dominating due to differences in numerical magnitude. The equation of the z-score standardization is as below:
$$x'_i = \frac{x_i - \bar{x}}{\sigma}$$
where $\bar{x}$ and $\sigma$ represent the mean and standard deviation of the training samples, $x_i \in R^n$, $i = 1, \cdots, N$, represent a training set of points, and $x'_i \in R^n$, $i = 1, \cdots, N$, represent the normalized training samples.
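As a minimal sketch of the z-score standardization above, the following Python snippet fits the mean and standard deviation on one feature column of the training samples and applies them. The function names and the toy readings are hypothetical, and using the population standard deviation for σ is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def zscore_fit(train_column):
    """Return the mean and (population) standard deviation of a training column."""
    return mean(train_column), pstdev(train_column)


def zscore_apply(column, mu, sigma):
    """Scale a column with statistics that were fitted on the training set."""
    if sigma == 0:
        # A constant feature carries no scale information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Toy daily readings for one feature (hypothetical values).
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = zscore_fit(train)
scaled = zscore_apply(train, mu, sigma)
```

The same fitted `mu` and `sigma` would then be reused for the test set, so that both sets share one scale.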

Principal Component Analysis.
Principal component analysis (PCA) is often used to reduce the dimensionality of a dataset while maintaining the features that contribute the largest variance in the dataset. PCA is the most commonly used linear dimensionality reduction method [23]. Dimensionality reduction plays an essential role in machine learning, especially when the dataset has more than a thousand features. In addition to making feature processing easier, the algorithm can also improve the results of the classifier and speed up the training of the classifier. In dimensionality reduction, the information measurement indicator used by PCA is the sample variance, also known as the explainable variance; the greater the variance, the more information the feature carries. The feature variance equation is as below:
$$Var = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $Var$ represents the variance of a feature, $n$ represents the number of samples, $x_i$ represents the value of each sample in a feature, and $\bar{x}$ represents the mean of this list of samples.
Wireless Communications and Mobile Computing
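The variance measure can be illustrated with a short Python sketch. It computes the sample variance of each feature column and, given a list of component variances, finds the smallest number of components whose cumulative variance ratio reaches a threshold. The function names and toy numbers are hypothetical, and the component variances are assumed to come from an existing PCA fit; the sketch does not perform the decomposition itself.

```python
from statistics import variance


def feature_variances(samples):
    """Sample variance (n - 1 denominator) of every feature column."""
    return [variance(column) for column in zip(*samples)]


def components_needed(component_variances, threshold):
    """Smallest k whose top-k variances reach `threshold` of the total."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Four samples with two features each (hypothetical values).
samples = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
per_feature = feature_variances(samples)

# Hypothetical variances of four principal components.
k = components_needed([50.0, 30.0, 15.0, 5.0], threshold=0.9)
```

Here the second feature is constant and has zero variance, which is the kind of direction a variance-based reduction discards first.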

Convolutional Neural Network.
Convolutional neural network (CNN) is a common model in the field of deep learning, which is often used to deal with large-scale image problems [24]. CNN includes convolutional layers, pooling layers, and full connection layers [25]. The common CNN model structure is shown in Figure 2.
The convolutional layer includes multiple convolution kernels. These kernels slide over the input matrix and, at each position, compute the dot product with the covered entries of the matrix to obtain a new matrix, which is the feature map. These feature maps are used as the input of the next layer. The ReLU activation function is used to introduce nonlinearity into the neural network. The equation of ReLU [26] is as follows:
$$f(x) = \max(0, x)$$
Pooling layer: maximum pooling selects the maximum value in the sliding window as the result. The role of the pooling layer is to reduce the dimension of the feature map and reduce the amount of calculation.
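To make the ReLU activation and maximum pooling concrete, here is a small pure-Python sketch on a single feature map. The function names and the 4 x 4 input are hypothetical, and the pooling assumes a (2, 2) window with stride 2 and even dimensions.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every entry of a 2D feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Maximum pooling with a (2, 2) window and stride 2 (even dimensions)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -6.0, 2.0],
    [0.0, -3.0, 4.0, -4.0],
    [7.0, -8.0, -1.0, 1.0],
]
activated = relu(feature_map)
pooled = max_pool_2x2(activated)
```

The pooled map has half the height and width of the input, which is the reduction in calculation that the pooling layer provides.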
Full connection layer: each node of this layer is connected to all nodes of the previous layer, which combines the features obtained from all previous layers and outputs them to the softmax function [27] for classification.
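The softmax classification step, together with the cross-entropy loss used to train the network, can be sketched as follows. The function names and the two-class logits are hypothetical; subtracting the largest logit before exponentiation is a common numerical-stability choice and not something the text prescribes.

```python
import math


def softmax(logits):
    """Map a K-dimensional vector to probabilities that sum to 1."""
    shift = max(logits)  # stabilizes exp() without changing the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(u, v):
    """H(u, v) = -sum_x u(x) * log v(x) for discrete distributions u and v."""
    return -sum(ui * math.log(vi) for ui, vi in zip(u, v) if ui > 0)


# Two output nodes; the first class is the true one in this toy example.
probs = softmax([2.0, 0.0])
loss = cross_entropy([1.0, 0.0], probs)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what training by backpropagation drives toward.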
The softmax function is used to obtain the probability of the categories. It compresses the elements of the $K$-dimensional vector output by the full connection layer to the range (0,1], and the result of their addition is 1:
$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \cdots, K$$
The final predicted categories are compared with the real categories, and through backpropagation [28], the parameters in the convolutional neural network are updated iteration after iteration, gradually approaching the optimal parameters. To train the CNN, we use cross entropy as a loss function; the equation of cross entropy for the distributions $u$ and $v$ over a given discrete set is as below:
$$H(u, v) = -\sum_{x} u(x) \log v(x)$$

System Model and Security Threats
3.1. System Model. As shown in Figure 3, the system model includes participants $p_i$ ($i \in [1, n]$), fog nodes (FN), the cloud server (CS), and the data consumer.
(1) The data consumer serves as a detection server in the system.
(2) Each participant equipped with a smart meter collects the real-time electricity consumption data and sends the data to FN.
(3) FN is a server located near the participants for processing data, which performs fog computing and reduces the computational burden of CS [29]. FN communicates with the participants and forwards participants' real-time electricity consumption data to CS according to a set of protocols [30,31].
(4) CS stores the electricity consumption data sent by the FN. Data collected by CS can be outsourced to a data consumer for further processing, such as electricity theft detection.
3.2. Security Threats. The smart grid suffers from a variety of security issues, such as the data injection attack, the denial of service attack, and some other physical threats [32]. Security goals require that the data is only shared between the two authenticated entities [33]. However, these security goals are not enough in the real environment; the privacy issues are also very important. For example, the data consumers are often untrusted in reality; they often leak user data for profit. The data privacy processing system in the ellipse in Figure 3 processes the data to achieve a balance between participants' data privacy and the needs of the data consumer under the premise that the data consumer is untrusted. The privacy goals require that the data source cannot be known by others and that the data are not totally shared between two entities. In the proposed fault-tolerant and privacy-preserving electricity theft detection (FPETD) scheme, we assume that there exists a secure communication channel between the authenticated entities, and hence, we only consider privacy issues in our system model.
Therefore, the security threats of the proposed FPETD scheme are as follows:
(1) The data consumer is untrusted and is curious about users' privacy.
(2) CS and FN are honest-but-curious. That is to say, CS and FN will not tamper with the data but will try to infer the source of the data.
(3) Participants are also honest-but-curious. Each participant will honestly follow the protocol but try to snoop on others' data.

Our Proposed FPETD Scheme
This section presents the detailed content of the proposed FPETD scheme. It mainly includes the data collection phase with fault tolerance, the method for faulty participants to reconnect, an example of fault tolerance and reconnection, and the detector design and training phase. In the data collection phase, fault tolerance is introduced to ensure that the data collection process can still be completed normally in the case of device failure, and the method for faulty participants to reconnect is introduced to ensure that the faulty participants can reconnect normally. In the detector design and training phase, we propose an efficient method to train a CNN-based detection model and use the trained model to detect electricity theft. The overall system flow chart is shown in Figure 4.

Data Collection Phase with Fault Tolerance.
Here, we mainly introduce data collection when a failure occurs. In a real data collection setting, even if the network conditions are good, some participants will occasionally be offline. This section describes the data collection process when some participants fail to work, so that the cloud server can still collect the data of the normal participants.
Session key sharing: each participant $p_i$ ($i \in [1, n]$) selects $\beta$ ($1 \le \beta \le n - 1$) key-shared partners in the group and shares a session key $k_{ij}$ ($i, j \in [1, n]$, $i \ne j$) with each selected partner $p_j$. $p_i$ stores all the session keys $\{k_{i1}, k_{i2}, \cdots, k_{i\beta}\}$ ($1 \le \beta \le n - 1$) locally.
CS sends data collection requests to the FN. FN initiates a task request. Assuming that only $b$ ($1 < b \le n - 1$) participants respond, FN notifies the normal participants $p_i$ to check whether their key-shared partners are functional; if not, $p_i$ removes the corresponding session keys from its list. After that, assuming $p_i$ possesses $\alpha$ ($1 \le \alpha \le b - 1$) session keys, where $\alpha$ is the number of session keys of the normal participant $p_i$, $p_i$ performs the following steps:
(1) Each $p_i$ reconstructs new masking data $e_i^j$ ($j \in [1, n]$) with time $t$ (time $t$ is the timestamp in every time period).
(2) Each $p_i$ adds his/her raw data $m_i$ to the $slot(i)$-th slot, and the ciphertext $c_i$ is reconstructed as follows:
$$c_i = e_i^1 \,|\, e_i^2 \,|\, \cdots \,|\, e_i^{slot(i)} \oplus m_i \,|\, \cdots \,|\, e_i^n$$
(3) FN eventually receives $c_i$ ($i = 1, 2, \cdots, b$) from the $b$ participants. FN executes the XOR operation on all the $c_i$ ($i = 1, 2, \cdots, b$) and obtains a collected data list $ML = \{m_{\pi_n(1)}, m_{\pi_n(2)}, \cdots, m_{\pi_n(n)}\}$. It is worth mentioning ...
(2) $p_i$ adds $m_i$ to the $slot(i)$-th slot, and $c_i$ is reconstructed as follows:
$$c_i = e_i^1 \,|\, e_i^2 \,|\, \cdots \,|\, e_i^{slot(i)} \oplus m_i \,|\, \cdots \,|\, e_i^n$$
(3) FN eventually receives $c_i$ ($i = 1, 2, \cdots, n$) from the $n$ participants. FN executes the XOR operation on all the $c_i$ ($i = 1, 2, \cdots, n$) and obtains a collected data list $ML = \{m_{\pi_n(1)}, m_{\pi_n(2)}, \cdots, m_{\pi_n(n)}\}$. Then, FN sends the collected electricity consumption data list $ML$ to CS.
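A fault-tolerant collection round can be illustrated with a minimal Python sketch. The exact mask construction is not given in this section, so the sketch assumes that each mask is the XOR of SHA-256 digests over (session key, slot index, timestamp) for the keys a participant still holds; the function names, the 4-byte slot width, and the toy keys and readings are all hypothetical. Because both holders of a session key contribute the same digest, the masks cancel under XOR once the keys shared with offline participants are removed.

```python
import hashlib
from functools import reduce

SLOT_BYTES = 4  # hypothetical slot width, large enough for the toy readings


def mask(key: bytes, slot: int, t: int) -> int:
    """Assumed mask: hash of (session key, slot index, timestamp)."""
    material = key + slot.to_bytes(2, "big") + t.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(material).digest()[:SLOT_BYTES], "big")


def build_ciphertext(pid, reading, slot_of, keys, alive, n_slots, t):
    """Mask every slot, then XOR the reading into the participant's own slot."""
    slots = []
    for j in range(1, n_slots + 1):
        e = 0
        for pair, key in keys.items():
            # Use only keys whose two holders are both online.
            if pid in pair and pair <= alive:
                e ^= mask(key, j, t)
        slots.append(e)
    slots[slot_of[pid] - 1] ^= reading
    return slots


def collect(ciphertexts):
    """FN side: XOR all ciphertexts slot by slot."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*ciphertexts)]


# Four participants; p2 is offline and p3 submits 0 as a decoy (toy values).
keys = {
    frozenset({1, 2}): b"k12", frozenset({1, 3}): b"k13",
    frozenset({1, 4}): b"k14", frozenset({2, 3}): b"k23",
}
slot_of = {1: 3, 2: 1, 3: 4, 4: 2}
alive = frozenset({1, 3, 4})
readings = {1: 17, 3: 0, 4: 23}
t = 1700000000

round_ciphertexts = [
    build_ciphertext(p, readings[p], slot_of, keys, alive, 4, t)
    for p in sorted(alive)
]
collected = collect(round_ciphertexts)
```

With these toy values the collected list is `[0, 23, 17, 0]`: the slot of the offline participant and the decoy slot both read 0, so FN cannot tell which of the two belongs to the faulty device.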

4.3. An Example of Fault Tolerance and Reconnection. We use 4 participants as an example to describe the data collection with fault tolerance and the method for faulty devices to reconnect. $p_1$ shares $k_{12}$, $k_{13}$, and $k_{14}$ with $p_2$, $p_3$, and $p_4$, respectively, while $p_2$ shares $k_{23}$ with $p_3$; after the slot generation, their slots are 3, 1, 4, and 2, respectively.
(1) Assume that each data collection is performed on one floor of a building and that there are 4 participants on each floor. CS sends task collection requests to the FN. FN initiates a task request. Assuming that $p_1$, $p_3$, and $p_4$ respond, and $p_2$ does not respond, it is considered that $p_2$ is faulty. Data on the corresponding slot of the faulty $p_2$ is 0 after the XOR operation. In order to prevent the slot of the faulty participant from being exposed, FN randomly selects a participant from the normal participants and informs him/her to submit data 0, thereby effectively preventing the leakage of the slot of the faulty participant. Assume that the FN randomly selects the normal $p_3$ and informs him/her to fill in the data with 0.
(2) FN notifies the normal participants $p_1$, $p_3$, and $p_4$ to check whether their key-shared partners are functional; those who share session keys with the faulty $p_2$ delete these session keys, that is, delete $k_{12}$ and $k_{23}$, as shown in Figure 5.
(3) Then, each normal $p_i$ reconstructs the masking data. $p_1$ reconstructs new masking data.
Because the slot of $p_1$ is 3, the data that $p_1$ fills in the third slot is $e_1^3 \oplus m_1$; the masking data is then filled into each slot in order, and the ciphertext $c_1 = e_1^1 \,|\, e_1^2 \,|\, e_1^3 \oplus m_1 \,|\, e_1^4$ of the data collection phase of $p_1$ is obtained. $p_2$ is faulty at this time and does not do anything.
The ciphertext of the data collection phase of $p_3$ is $c_3 = e_3^1 \,|\, e_3^2 \,|\, e_3^3 \,|\, e_3^4 \oplus m_3$. At this time, $p_3$ is selected by the FN, and its data is filled with 0 for disguise to protect the slot of the faulty participant, that is, $m_3 = 0$.
In the end, each participant publishes his/her data collection phase ciphertext to FN, and FN performs the XOR operation on all the ciphertexts, that is, $(c_1 \oplus c_3 \oplus c_4)$, to obtain $0 \,|\, m_4 \,|\, m_1 \,|\, 0$. In the next data collection task, the faulty $p_2$ applies to reconnect. At this time, the faulty $p_2$ applies for new session keys, as shown in Figure 6.
Then, all the participants reconstruct the masking data; the raw data can then be obtained by executing the XOR operation on all the ciphertexts normally, and the faulty $p_2$ returns to normal. $p_1$ reconstructs the masking data: Ciphertext $c_1 = e_1^1 \,|\, e_1^2 \,|\, e_1^3 \oplus m_1 \,|\, e_1^4$. The faulty participant $p_2$, who applied for online restoration, uses the new session keys to reconstruct the masking data: Ciphertext $c_2 = e_2^1 \oplus m_2 \,|\, e_2^2 \,|\, e_2^3 \,|\, e_2^4$. $p_3$ reconstructs the masking data: Ciphertext $c_3 = e_3^1 \,|\, e_3^2 \,|\, e_3^3 \,|\, e_3^4 \oplus m_3$. $p_4$ reconstructs the masking data: Ciphertext $c_4 = e_4^1 \,|\, e_4^2 \oplus m_4 \,|\, e_4^3 \,|\, e_4^4$.
We can use a labeled dataset which contains the electricity usage data of the customers within $n$ days to train a model by analyzing the electricity consumption pattern of the customers over a certain period of time. The n-source anonymity data collection process can be performed $n$ times in total to complete the collection of the $n$ days of historical electricity consumption data of these participants. For a missing value caused by an occasional failure, we use the average value of all electricity consumption data of the participant to fill the missing value at the time of the failure.
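The mean-based filling of values lost to an occasional failure can be sketched in a few lines of Python. The function name is hypothetical, and representing a missed collection round as `None` is an assumption made for the illustration.

```python
def fill_missing_with_mean(readings):
    """Replace None entries with the mean of the participant's known readings."""
    known = [r for r in readings if r is not None]
    if not known:
        # Nothing to average over; return the series unchanged.
        return list(readings)
    average = sum(known) / len(known)
    return [average if r is None else r for r in readings]


# One participant's daily readings; the second day was lost to a failure.
history = [3.0, None, 5.0, 4.0]
filled = fill_missing_with_mean(history)
```

The input list is left unmodified, so the positions of the original gaps remain available if they are needed later.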

Detector Design and Training Phase
4.4.1. Data Preprocessing. Electricity consumption data often contains missing or erroneous values. This is mainly caused by factors such as the unreliable transmission of measurement data and the failure of smart meters. We use the forward interpolation method to recover the missing values, where $x_i$ stands for the value in the electricity consumption data over a period (e.g., a day). If $x_i$ is null or a nonnumeric character, we treat it as a member of NaN (NaN is a set). An electricity consumption dataset contains $a$ samples, and each sample contains the electricity consumption data of the sample within $n$ days. We randomly divide the data into 80% as the training set and 20% as the test set. In the electricity consumption dataset, there are often large differences in the electricity consumption of some users on certain days. In order to eliminate the impact of these differences on the prediction results, we use the z-score standardization to keep the values of the training samples on the same scale. In addition, the electricity consumption dataset contains the users' electricity consumption data for many days, which causes a heavy training burden. We use PCA to reduce the dimensionality of the training set while maintaining the amount of information carried in the training set, thus reducing the training burden while preserving accuracy.
To meet the input matrix format of CNN, we transform the 1D vector obtained after performing PCA into a matrix $C$ of shape $(p, q, d)$, where $p$ represents the number of rows in the matrix, $q$ represents the number of columns in the matrix, and $d$ represents the number of matrices. So, for a single user, we get a matrix whose shape is $(p, q, 1)$.
4.4.2. Our CNN Model. We use 2D convolutional layers, pooling layers, and a full connection layer to build our CNN framework, as shown in Figure 7, which includes 3 stages.
(1) The shape of the input data should be $(j, c, 1)$ since the target is a single user.
(2) We stack the convolution layer and the pooling layer alternately to extract more features and reduce computation. We use the padding method during the convolution and pooling process. The convolution layer doubles the number of features, and the pooling layer changes the shape. For example, assume the current shape is $(j, c, r)$; after the convolution layer, it becomes $(j, c, \alpha)$; after the pooling layer, it becomes $(j/2, c/2, \alpha)$.
(3) We change the shape to one dimension before the full connection layer. Then, we use a full connection layer whose length is $\lambda$ to change the shape to $(\lambda)$. Through the softmax function, the final output shape is (2). One output is the probability of theft, the other is the probability of normal use, and the sum of the two probabilities is 1. If the probability of theft is greater than the probability of normal use, we consider the electricity consumption data abnormal; otherwise, we consider it normal. It is worth mentioning that the above variables are adjustable in order to improve the performance of our CNN model.
4.4.3. Data Detection Scheme. Different from the model training process, using a trained model for data detection requires only one forward propagation, as shown in Figure 8, which includes 4 stages.
(1) Through the convolutional layer, the model extracts the preliminary features of the user's electricity consumption data (2)

Privacy Analysis
In this section, we analyze the privacy of user data during the detection process. In the process of data detection, the detector requires raw data to ensure the accuracy of detection. From the perspective of protecting user privacy, the detector should not be able to track the source of the data. Before publishing the data to the detector, according to the needs of the detector, we use n-source anonymity to make the data private. The n-source anonymity method guarantees the rawness and unlinkability of the data. Therefore, collecting data through the n-source anonymity method can not only enable the detector to detect the data normally but also ensure the privacy of user data.
In a word, realizing n-source anonymity is equivalent to realizing the privacy of the data detection process. For details about the rawness and unlinkability of the n-source anonymity, the readers can refer to [21]. The data collection process in our data detection scheme is based on the n-source anonymity method, so the privacy of users is guaranteed.

Experiment and Analysis
In this section, we evaluate the proposed FPETD scheme by conducting experiments on a 64-bit computer with Intel (R) Core (TM) i5-6500 CPU, 3.2 GHz, 8 GB RAM, using Python, TensorFlow, and Keras framework.
6.1. Experimental Data. We use the labeled database from the State Grid Corporation of China (SGCC) [34] to conduct experiments. The SGCC dataset contains the energy usage data of 42372 customers within 1035 days, and the last column of the dataset is the label corresponding to the user, which is a single value (0 or 1): 0 represents the normal user and 1 represents the suspected electricity theft user. We randomly divide 80% as the training set and 20% as the test set. Then, we use the z-score standardization to keep the values of the training samples on the same scale. Since the samples' features have more than a thousand dimensions, which are high dimensional features, we use the PCA algorithm to reduce the samples' dimensions.
As shown in Figure 9, the abscissa of the curve represents the feature dimension. When the abscissa is 256, the ordinate of the corresponding curve, the cumulative explainable variance ratio, is 99%. That is, the dataset still maintains 99% of the feature information when it is reduced to 256 dimensions. Then, we reshape the 256-dimensional features into a matrix whose shape is (16,16,1).
6.2. Model Training Phase. Our convolutional neural network contains 2 convolutional layers, 2 pooling layers, and 1 full connection layer. Our first convolutional layer uses 32 convolution kernels with a size of (5, 5) and a sliding step size of (1, 1), with padding applied to the input matrix during the convolution operation. The output of the first convolutional layer is a matrix whose shape is (16,16,32), and this matrix is passed to the first pooling layer. We adopt the maximum pooling method; the sliding window size of the pooling layer is set to (2, 2), and the sliding step size is set to (2,2). The output of the first pooling layer is a matrix whose shape is (8,8,32), and this matrix is passed to the second convolution layer, which has 64 convolution kernels with a size of (5, 5) and a sliding step size of (1, 1). With padding applied during the convolution operation, the output of the second convolution layer is a matrix whose shape is (8,8,64), and this matrix is passed to the second pooling layer. Using the maximum pooling method with a sliding window size of (2, 2) and a sliding step size of (2,2), the output of the second pooling layer is a matrix whose shape is (4,4,64). Then, the matrix is expanded into a (1, 1024) vector. After that, it is passed to the full connection layer to synthesize the previous features, and the category probabilities are output through the softmax function.
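The layer shapes stated above can be checked with a short Python sketch that traces only the shape arithmetic; it is not the trained model. The helper names are hypothetical, the convolution is assumed to use zero padding that preserves height and width at stride 1, and the row-major reshape of the 256 features is an assumption, since the text does not state the ordering.

```python
import math


def conv_same(shape, filters, stride=1):
    """Output shape of a padded convolution: size preserved at stride 1."""
    height, width, _ = shape
    return (math.ceil(height / stride), math.ceil(width / stride), filters)


def max_pool(shape, window=2, stride=2):
    """Output shape of maximum pooling with the given window and stride."""
    height, width, channels = shape
    return ((height - window) // stride + 1,
            (width - window) // stride + 1,
            channels)


def reshape_row_major(vector, rows, cols):
    """Turn a flat feature vector into a rows x cols grid (assumed ordering)."""
    if len(vector) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [vector[r * cols:(r + 1) * cols] for r in range(rows)]


grid = reshape_row_major(list(range(256)), 16, 16)

shapes = [(16, 16, 1)]                      # one user's input matrix
shapes.append(conv_same(shapes[-1], 32))    # first convolution, 32 kernels
shapes.append(max_pool(shapes[-1]))         # first (2, 2) max pooling
shapes.append(conv_same(shapes[-1], 64))    # second convolution, 64 kernels
shapes.append(max_pool(shapes[-1]))         # second (2, 2) max pooling
flat_length = math.prod(shapes[-1])         # length after flattening
```

The trace ends at (4, 4, 64), and 4 x 4 x 64 = 1024 matches the length of the vector passed to the full connection layer.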
The final predicted categories are compared with the true categories. Through backpropagation, the parameters in the convolutional neural network are updated iteration after iteration, gradually approaching the optimal parameters. This is the model training process.

Model Evaluation.
We use the accuracy score as a performance score to evaluate the performance of the trained PCA-based CNN model. The equation of the accuracy score is as below:
$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \delta(y_i, \hat{y}_i)$$
where
$$\delta(y_i, \hat{y}_i) = \begin{cases} 1, & y_i = \hat{y}_i, \\ 0, & \text{else.} \end{cases}$$
Here $\hat{y}$ represents the predicted value and $y$ represents the true value; $\hat{y}_i$ is the predicted value of the $i$-th sample, $y_i$ is the corresponding true value, and $n$ represents the number of samples.
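The accuracy score translates directly into code. The following sketch uses a hypothetical function name and toy labels, with 0 for a normal user and 1 for a suspected theft user as in the dataset description.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat)
    return matches / len(y_true)


# Five toy samples; the third prediction is wrong.
true_labels = [0, 1, 0, 0, 1]
predicted_labels = [0, 1, 1, 0, 1]
score = accuracy_score(true_labels, predicted_labels)
```

Four of the five predictions match here, so the score is 0.8.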
We use the z-score standardization and PCA to preprocess the test set. We train the model for 100 epochs to update the model parameters. After training the model, we evaluate the model on the test set; the model accuracy score is 0.9286.

Model Comparison.
We compared the designed PCA-based CNN model with the single CNN model and other traditional machine learning methods, such as linear SVC [35], logistic regression (LR) [36], and random forest (RF) [37]. The experiment shows that the accuracy score of our PCA-based CNN deep learning model is better than that of the single CNN model and the other traditional machine learning models, which indicates that our dimensionality reduction method improves the accuracy of electricity theft detection, as shown in Figure 10.
6.5. Comparison with Existing Schemes. This section mainly describes the comparison with existing electricity theft detection schemes. In the scheme of Yao et al. [6], it is assumed that the SG detector is trusted, but in reality, the detector is often untrusted, so it may leak user privacy. The schemes of Zheng et al. [7] and Jindal et al. [8] also have the problem of leaking user privacy. Our FPETD scheme treats the detector as completely untrusted. In addition, after collecting the data, CS can directly sum these raw data; therefore, CS can learn the overall power demand of the building and then make decisions about the power distribution of the building.
As shown in Table 1, our FPETD scheme can protect user privacy and detect big data and also obtain the sum power consumption data to make power distribution decisions.

Conclusion
In this paper, we propose the FPETD scheme to realize realtime electricity theft detection in the smart grid. In our scheme, we designed a fault-tolerant raw data collection protocol to collect electricity data and cut off the correspondence between users and their data, thereby ensuring the fault tolerance and data privacy during the electricity theft detection process. Experiments have proven that our dimensionality reduction method before training the model makes the accuracy of our model better than others. However, the computational burden of our data collection process is a bit heavy. In our future work, we consider reducing the computational burden in the data collection process.

Data Availability
The labeled data from the State Grid Corporation of China (SGCC) were used to support this study.

Conflicts of Interest
The authors declare that there is no conflict of interest.