A Federated Learning-Based Fault Detection Algorithm for Power Terminals

Power terminals are an important part of the power grid, and fault detection of power terminals is essential for the safety of the power grid. Existing fault detection of power terminals is usually based on artificial intelligence or deep learning models deployed in the cloud or on edge servers to achieve high accuracy and low latency. However, these methods can neither protect the privacy of the terminals nor update the detection model incrementally. A terminal-edge-server collaborative fault detection model based on federated learning is proposed in this study to improve the accuracy of fault detection, reduce data transmission, and protect the privacy of the terminals. The fault detection model is initially trained in the server using historical data and updated using the parameters of local models from edge servers according to different updating strategies; the parameters are then sent to each edge server and further to all terminals. Each edge server updates the local model using the compressed system logs from terminals in its coverage region, and each terminal uses the model to detect faults according to the system behavior in the log. Experimental results show that this fault detection algorithm has high accuracy and low latency, and that the accuracy increases as the model is updated more often.


Introduction
With the development of the power grid and the application of information and communication technologies, the smart grid has been widely deployed in most countries, and an automated, distributed, and advanced energy delivery network has been constructed. Nowadays, power terminals, such as smart energy meters, concentrators, special transformers, and energy controllers, have become more and more intelligent and important in the smart grid, and their reliability, security, and stability have become challenges for the smart grid. Once a power terminal fails, it leads to inaccurate power data, confused power scheduling, and damage to power equipment and even to part of the power grid. Hence, highly accurate fault detection for power terminals is needed to find faults quickly and avoid damage to the power grid.
Fault detection is the foundation of the security of the power grid. At present, there are many fault detection methods for power terminals, most of which use artificial intelligence or deep learning models to improve the accuracy of fault detection. In these methods, the detection operation is executed on servers, and the state data generated on the terminal or collected by extra devices are transmitted to the server. For example, a drone and an auto-tracking camera were used to detect defects in power lines in reference [1]. Bouazza et al. [2] proposed artificial intelligence-based methods to detect faults of the power switches in a wind energy conversion system. Shi et al. [3] proposed a fault detection method based on the LSTM model to predict the faults of DC-DC power supplies. Nguyen [4] used a microphone to detect the overload of large power transformers by sound analysis. Hong et al. [5] detected open conductor faults in power distribution networks using multiple measurement factors of feeder RTUs with DGs. Visible, infrared, and ultraviolet images of power equipment were fused to train a deep learning-based fault detection model in a power system [6]. An improved random forest was used to detect power outage accidents of power terminals in reference [7], and the continuous wavelet transform and a convolutional neural network were adopted to detect faults in power electronic converters in reference [8].
The above fault detection methods achieve high accuracy mainly through complex models and massive data. However, a complex model places high requirements on computing and storage resources, and massive data may cause congestion and delay in network transmission.
Therefore, fault detection should be performed at the terminal and model training should be performed in the cloud, so that cloud computing resources and local data are both fully used and the accuracy, efficiency, and privacy of fault detection are all taken into consideration.
Edge computing is a new framework that provides services at the edge of the network [9,10]. It has been used in many applications, especially for fault detection of IoT (Internet of Things) devices. An IoT fault detection method based on edge computing and blockchain was proposed in reference [11], in which a weighted random forest was adopted. Huong et al. developed the LocKedge framework [12] for low-complexity cyberattack detection in IoT edge computing. Mishra et al. [13] proposed a data anomaly detection method at the edge of pervasive IoT systems. A device-edge split architecture for intrusion detection for IoT devices was proposed in reference [14] to reduce the overhead of the IoT devices. The architecture of the power grid is similar to the framework of edge computing; hence, researchers have attempted to apply edge computing in the power grid for fault detection. Huo et al. [15] proposed a fault detection method based on edge computing for distributed power distribution, and a method based on an edge computing architecture was proposed in reference [16] to identify unsafe actions in electric power operations in time. Yang et al. [17] proposed a semisupervised cloud-edge collaborative framework for unsafe action detection, and Zhang et al. [18] combined a cloud-edge fusion framework with deep learning techniques for abnormal object detection in the power grid. These studies meet the accuracy requirement of fault detection in the power grid by offloading fault detection to the edge server or the cloud to reduce the overhead of the power terminals; however, the data privacy of the terminals and the data transmission in the network are not well considered.
Federated learning [19], which was first proposed in 2016, is a machine learning method that takes both model accuracy and data privacy into account. It has been used in the power grid to protect the privacy of the data of consumers or terminals. A privacy-preserving federated learning framework, Fed-Detect [20], was developed for energy theft detection in the smart grid, and a federated learning-based method was proposed for privacy-preserving household characteristic identification in reference [21]. Su et al. [22] proposed a secure and efficient federated-learning-enabled AIoT scheme for private energy data sharing in smart grids with edge-cloud collaboration. Wang et al. [23] developed a distributed electricity consumer characteristics identification method based on federated learning to preserve the privacy of retailers. Liu et al. [24] used an asynchronous decentralized federated learning model for collaborative fault diagnosis of PV stations.
These methods protect the privacy of the power terminals but cannot be applied to fault detection, which simultaneously requires high accuracy, low latency, data privacy, and incremental updating.
To overcome the above problems, we propose a three-tier fault detection model based on federated learning for power terminals, together with three different model updating strategies. The training, updating, and testing of the fault detection model are assigned to the edge servers, the cloud, and the terminals of the power grid, respectively. The cloud is responsible for constructing the initial fault detection model and for the subsequent model updating that improves the accuracy of fault detection; the edge server is responsible for training the local model, which ensures the data privacy of the terminals; and the terminal only uses the trained model for fault detection, which improves detection efficiency, and compresses the raw system log to reduce data transmission. In the data interaction between terminal, edge server, and cloud, the amount of data transmitted and the system delay are reduced by means of log compression and parameter transmission. The experimental results show that our algorithm can reduce the amount of data transmission and achieve higher accuracy than traditional fault detection methods.

Data Model of Power Terminal.
The functions and manufacturers of power terminals in the power grid differ, and so do their hardware resource configurations. The configurations of some typical power terminals are listed in Table 1. A complex fault detection model cannot be trained on power terminals, but fault detection using a trained model can be performed on most terminals. The faults of power terminals involve various functional modules of the embedded operating system, such as hardware drivers, system security, the file system, system applications, and memory management. When a system fault occurs, the system log records the fault-related information, including the fault occurrence time and the abnormal system behaviors. Therefore, fault detection can be realized through the analysis of the operating system log.
Though the log formats and the descriptions of system behaviors differ between embedded operating systems, the system log usually includes the system time, the system component, and a behavior description. Figure 1 shows the log format of an independently designed embedded operating system. Each record of the system log is in the form of (time, device, detail), where time and device represent the system time and the system component of the system behavior, respectively, and detail is the description of the system behavior. As shown in the first line in Figure 1, "3.567604" is the relative time after system startup, "USB usb3" is the system component, and "Manufacture: Linux 3.10.108 ohci_hcd" is the description of the system behavior.
System fault detection for a power terminal needs to find a fault and determine its type as soon as possible after it happens, so that users can quickly deal with the fault and avoid serious accidents, such as terminal hardware damage, data loss, and network intrusion. When the system fails, the device and detail attributes of the system log record the abnormal behaviors or operations of the system devices. Some common system faults can be found through keywords in the detail attribute of the system log, which usually contains fault-sensitive words such as failed, error, and warning, as listed in Table 2. However, log records of system faults caused by hidden bugs or premeditated external attacks may not contain these fault-sensitive words, and the specific type of a system fault is difficult to determine from keywords alone. Therefore, it is necessary to analyze the system log using natural language processing methods to accurately identify and classify the faults.
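The keyword check described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the `LogRecord` type, the `tokenize` and `is_sensitive` helpers, and the keyword set are assumptions built from the (time, device, detail) record form and the example words failed, error, and warning. The timestamp of the second record is made up for the example.

```python
from dataclasses import dataclass

# Assumed keyword set, taken from the example fault-sensitive words in the text.
KEYWORDS = {"failed", "error", "warning"}


@dataclass
class LogRecord:
    """One system log record in the (time, device, detail) form."""
    time: float
    device: str
    detail: str


def tokenize(detail: str) -> list[str]:
    """Lowercase the detail text and split it into alphabetic words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in detail.lower())
    return cleaned.split()


def is_sensitive(record: LogRecord) -> bool:
    """Return True if the detail contains at least one fault-sensitive word."""
    return any(word in KEYWORDS for word in tokenize(record.detail))


records = [
    # First line of Figure 1 as quoted in the text.
    LogRecord(3.567604, "USB usb3", "Manufacture: Linux 3.10.108 ohci_hcd"),
    # Detail taken from Table 2; the timestamp is hypothetical.
    LogRecord(5.0, "Sunxi-ahci", "Probe of sata failed with error -1"),
]
flags = [is_sensitive(r) for r in records]
```

A check of this kind only flags records that happen to contain a listed word, which is why the text turns to natural language processing for faults that leave no such keyword.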
Though the system log records the system behaviors and operations when a system failure occurs, the content of the system log is generally simplified in an embedded operating system. Therefore, it is necessary to take the language characteristics of the system log into consideration when using natural language processing methods:
(1) Short Sentence. Each record of the system log has little content: the fault location is usually a specific word or number identifying the system component, and the fault description is usually a short sentence that contains only a few words. The syntactic structure of the sentence is simple.
(2) Small Vocabulary. The whole system log contains few distinct words. Most words in the system log occur with high frequency, while the words with lower frequency are usually related to the identification of the terminal, such as specific IP addresses, and have nothing to do with the type of the fault.
(3) High Redundancy. The system log generated by each terminal contains a large number of repeated records, and the system logs generated by similar terminals also have a large amount of redundancy. The redundant records typically represent normal operations.
Therefore, in order to improve the efficiency of fault detection by natural language processing, the redundant records in the system log should be removed to reduce the number of records to be processed. In the recognition process, the characteristics of short sentences and small vocabulary should be taken into consideration to reduce the complexity of the detection model.

Fault Detection Framework.
Considering the data model and hardware configuration of power terminals, the requirements on fault detection accuracy and delay, and the network architecture of the power grid, a three-tier fault detection architecture for power terminals, namely, end-edge-cloud, is proposed in this study, as shown in Figure 2. The fault detection model is trained initially and updated in the cloud, and the terminals detect faults using the detection model. Each edge server trains the local model using data from the terminals in its coverage region, which protects the privacy of the terminals, and generates the parameters of the updated model for the cloud.
(1) Power Terminals. They are deployed to perform measuring, monitoring, controlling, and other functions in the power grid; examples are energy controllers, fusion terminals, intelligent meters, and special transformer terminals. The power terminals are designed, manufactured, and used by different companies or end users; they typically have low hardware configurations and perform different tasks by running different application software. The faults of a power terminal and the abnormal behaviors resulting from the faults are recorded in the system log; hence, fault detection should be performed on the power terminal to detect faults as soon as possible. Power terminals are the main participants in fault detection and are responsible for the collection and preprocessing of the system log and for the detection of faults.
(2) Edge Servers. They can be either dedicated servers or powerful terminals, which have abundant computing and storage resources and can collect and process large datasets. Each edge server is responsible for collecting the system logs and training the local fault detection model for multiple terminals in a distinct region. The edge server acts as the connector between the terminals and the cloud in the fault detection architecture: it exchanges the parameters of the fault detection model with the cloud, sends the updated model parameters to the terminals, and receives the compressed system logs from the terminals.
The end-edge-cloud three-tier architecture is the common architecture of practical power grids; hence, the three-tier fault detection model can be easily deployed without extra hardware support. Because of the limited network transmission capacity of the terminals, the bandwidth constraints of the edge server, and the long transmission distance between the edge server and the cloud, the size and delay of the data transmitted between the cloud, edge servers, and terminals greatly affect the training and detection performance of the fault detection model. Therefore, the amount of data transmitted between terminals, edge servers, and the cloud should be minimized to improve fault detection performance.
From the perspective of model training, the cloud trains the initial fault detection model and updates the model. The edge server is responsible for collecting the compressed system logs of the terminals and training the local fault detection model. The terminal uses the trained model to detect system faults. The powerful computing resources of the cloud can quickly complete model training and updating. Local model training can also be accomplished quickly with the small dataset on the edge server. Fault detection on the terminals demands only a small amount of computing resources.
From the perspective of data transmission, the system log generated by the terminal is compressed and transmitted to the adjacent edge server, which greatly reduces the amount of data and occupies less network bandwidth. The edge server transmits the parameters of the local model to the cloud and the parameters of the updated model to the terminals. Although the distance between the edge server and the cloud is long, the transmission delay is very small because the model parameters are far smaller than the raw system logs.

Federated Learning-Based Fault Detection Algorithm
The idea of federated learning is used in the three-tier fault detection framework to reduce the data transmission between the cloud and edge servers and to protect the privacy of power terminals. The fault detection algorithm consists of three steps: pretraining, local training, and model updating. The pretraining step is processed in the cloud, the local training is completed on the edge server, and the model updating is finished in the cloud.

Table 2: Examples of system log records containing fault-sensitive words.
No.  Log record
1    Warning: get ephy clock is failed
2    Warning: mountpoint for pids not found
3    Console-setup.service: failed with result "exit-code"
4    Failed to start: set console font and keymap
5    Sunxi-ahci: Probe of sata failed with error −1
6    Error while tracing: no such file or directory

Pretraining.
The cloud uses historical system logs to train the initial fault detection model; the system logs are labeled and provided by the manufacturers and managers. Each record in the system log is in the form of (type, time, device, detail, result), where type is the type of the terminal; time, device, and detail are the system time, system component, and system behavior of the fault, respectively; and result is a nonnegative integer that represents the type of the fault. A natural language processing method is adopted to recognize the fault from the system behavior in the system log. The description of the system behavior is recorded in the system log in the form $detail = (w_1, w_2, w_3, \ldots, w_m)$, where $m$ is the number of words and $w_i$ is the $i$th word, $1 \le i \le m$.
The LSTM (long short-term memory) network model is used in our fault detection model to improve the accuracy of fault detection. LSTM is an improvement of the RNN (recurrent neural network): it uses an input gate, a forget gate, and an output gate to selectively retain part of the previous cell state and transfer it to the next cell, which overcomes the problem of long-term dependence in RNNs. Each LSTM cell consists of one forget gate, one input gate, and one output gate, and each gate is composed of a sigmoid neural network layer and a pointwise operation, as shown in Figure 3. In the following, $\sigma$ is the sigmoid function, $x_t$ is the current input, $h_{t-1}$ is the previous output, and $W$ and $b$ denote the weights and bias of the corresponding layer.
The forget gate is used to forget part of the information of the previous cell state $C_{t-1}$, and only $f_t * C_{t-1}$ remains in the current cell state, where $f_t$ is the output coefficient of a sigmoid layer:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f). \qquad (1)$$
The input gate is used to determine which new information will be kept in the current cell state. The new information is defined as $i_t * \tilde{C}_t$, where $i_t$ is the output of a sigmoid layer and $\tilde{C}_t$ is the output of a tanh layer:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C). \qquad (2)$$
The current cell state $C_t$ is obtained after the forget gate and the input gate:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t. \qquad (3)$$
The output gate is used to generate the output $h_t$ of the current cell state $C_t$. The output coefficient $o_t$ is combined with the tanh of the cell state to obtain the output $h_t$:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t * \tanh(C_t). \qquad (4)$$
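The gate computations described above can be traced in a few lines of plain Python. This is a sketch of a single standard LSTM step, not the trained model of the study: the helper names `init_params`, `affine`, and `lstm_step`, the layer sizes, the random initialisation, and the random input vectors standing in for word embeddings are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, z):
    """Compute W z + b for a list-of-rows matrix W and vectors b, z."""
    return [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]


def init_params(input_size, hidden_size, seed=0):
    """Randomly initialise weights and zero biases for the four layers."""
    rng = random.Random(seed)
    concat = hidden_size + input_size
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = [[rng.gauss(0.0, 0.1) for _ in range(concat)]
                               for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    return params


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: returns the new output h_t and cell state C_t."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]        # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]        # input gate
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Feed a 5-word sequence of hypothetical 3-dimensional word vectors.
params = init_params(input_size=3, hidden_size=2)
h, c = [0.0, 0.0], [0.0, 0.0]
rng = random.Random(1)
sequence = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
```

In a classifier of the kind described here, the final output `h` would typically be passed to an additional layer that maps it to the fault type `result`; that layer is omitted from the sketch.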

Local Training.
After completing the LSTM model training, the cloud sends the parameters of the fault detection model to each edge server, which then sends the parameters to all terminals in its region. While the terminal is running, the system log is preprocessed and input into the fault detection model to obtain the fault detection results, and the terminal then carries out the corresponding response, such as restart, shutdown, or alarm. Because only fault detection is performed on the terminal, while model training and updating are performed on the edge server and in the cloud, the terminal can quickly detect and respond to faults.
Because of the high reliability and security requirements of the power grid, the failure probability of each terminal is very low: most records in the system log are normal operations, and only a few records over a long time describe abnormal system behavior. Hence, the system log on each terminal contains a lot of redundant information, and it should be compressed before being transmitted to the edge server to reduce the amount of data transmission. Log compression is divided into three steps:
① Variable Replacement. Some variables in the system log identify a device or network and do not carry the semantics of the fault. These variables are usually long words or strings and can be replaced with their category; for example, an IP address, a web page address, and a device name can be replaced with IP, URL, and local, respectively. This reduces the differences between records in the system log and the data transmission between the terminal and the edge server.
② Similarity Computing. Each record of the system log contains the system component and the system behavior of the fault, and records with high similarity are compressed into one record to reduce the data transmission.
Hence, the similarity of two records consists of the similarity of the system components, $S_{device}$, and that of the system behaviors, $S_{details}$. The similarity of two system components depends on the locations of the components in the fault tree: $S_{device}(device_1, device_2)$ is determined by $p(Root, device_1)$ and $p(Root, device_2)$, where $Root$ is the nearest common ancestor node of $device_1$ and $device_2$ in the fault tree, and $p(Root, device_1)$ and $p(Root, device_2)$ are the numbers of nodes on the paths from $Root$ to $device_1$ and $device_2$, respectively. If the detail of a record in the system log contains one or more words from the set of sensitive words $Keywords$, the record is marked as a sensitive record, and its similarity is not computed, because sensitive records must never be compressed in order to preserve the accuracy of fault detection.
Mathematical Problems in Engineering
For nonsensitive records, the similarity of the system behavior depends on the words in detail. Given $details_1 = (w_1, w_2, \ldots, w_m)$ and $details_2 = (w'_1, w'_2, \ldots, w'_n)$, the similarity $S_{details}$ of $details_1$ and $details_2$ is obtained from $|details_1 \cap details_2|$ and $|details_1 \cup details_2|$, where $m$ and $n$ are the numbers of words in $details_1$ and $details_2$, respectively, $|details_1 \cap details_2|$ is the number of words in their intersection, and $|details_1 \cup details_2|$ is the number of words in their union. The similarity $S(r_1, r_2)$ of two records $r_1 = (time_1, device_1, details_1)$ and $r_2 = (time_2, device_2, details_2)$ in the system log is the weighted sum of $S_{device}$ and $S_{details}$:
$$S(r_1, r_2) = \alpha S_{device} + \beta S_{details},$$
where $\alpha$ and $\beta$ are the coefficients of $S_{device}$ and $S_{details}$, respectively, and can be set by domain experts, such as system designers, maintainers, and testers.
③ Redundancy Filtering. Log records with high similarity generated within a short time are called redundant records. The redundant records should be discarded on the terminal, and only the remaining log records are transmitted to the edge server to reduce the data transmission.
The filtering process is as follows. First, all sensitive log records are retained; that is, the record $r_i = (time_i, device_i, details_i)$ with $details_i = (w_1, w_2, \ldots, w_m)$ is transmitted to the edge server if some word $w_j \in details_i$ also satisfies $w_j \in Keywords$. If $r_i$ is not a sensitive record, its similarity $S(r_i, r_j)$ with every nonsensitive record $r_j$ generated in the subsequent short time interval is calculated. Each log record $r_j$ with $S(r_i, r_j) \ge S_{th}$ is marked as a redundant record and discarded on the terminal, where $S_{th}$ is the similarity threshold set by the system manager or other experts.
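The three compression steps can be sketched as follows, under several loudly stated assumptions. The detail similarity is taken to be the Jaccard ratio $|details_1 \cap details_2| / |details_1 \cup details_2|$, which matches the quantities named in the text but is not confirmed by it. The component similarity is a placeholder (1 for identical components, 0 otherwise) because no fault tree is modelled here. The threshold `S_TH`, the time window `WINDOW`, the regular expressions, and the sample records are all hypothetical.

```python
import re

KEYWORDS = {"failed", "error", "warning"}  # example sensitive words from the text
ALPHA, BETA = 0.5, 0.5                     # weights used in the evaluation
S_TH = 0.8                                 # hypothetical similarity threshold
WINDOW = 1.0                               # hypothetical "short time" in seconds

URL_RE = re.compile(r"https?://\S+")
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def replace_variables(detail: str) -> str:
    """Step 1: replace identifiers (URLs, IP addresses) with their category."""
    return IP_RE.sub("IP", URL_RE.sub("URL", detail))


def words(detail: str) -> set[str]:
    return set(re.findall(r"[a-z]+", detail.lower()))


def is_sensitive(detail: str) -> bool:
    return bool(words(detail) & KEYWORDS)


def s_details(d1: str, d2: str) -> float:
    """Assumed Jaccard ratio over the word sets of two details."""
    w1, w2 = words(d1), words(d2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 1.0


def s_device(dev1: str, dev2: str) -> float:
    """Placeholder for the fault-tree based component similarity."""
    return 1.0 if dev1 == dev2 else 0.0


def similarity(r1, r2) -> float:
    """Step 2: weighted sum of component and detail similarity."""
    return ALPHA * s_device(r1[1], r2[1]) + BETA * s_details(r1[2], r2[2])


def compress(records):
    """Step 3 on (time, device, detail) tuples sorted by time."""
    kept = []
    for t, dev, detail in records:
        rec = (t, dev, replace_variables(detail))
        if is_sensitive(rec[2]):
            kept.append(rec)  # sensitive records are never discarded
            continue
        redundant = any(
            not is_sensitive(k[2])
            and t - k[0] <= WINDOW
            and similarity(k, rec) >= S_TH
            for k in kept
        )
        if not redundant:
            kept.append(rec)
    return kept


records = [
    (1.0, "eth0", "link up at 192.168.1.10"),
    (1.2, "eth0", "link up at 192.168.1.11"),
    (1.5, "sata", "Probe of sata failed with error -1"),
    (9.0, "eth0", "link up at 192.168.1.12"),
]
compressed = compress(records)
```

In this example the second record is dropped as redundant, the sensitive record is always kept, and the fourth record survives because it falls outside the time window. The sketch compares each new record only against records that were kept, which is one possible reading of the filtering rule.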
After receiving the compressed system logs from the terminals in its region, the edge server performs local training of the fault detection model using the received data. Then, the parameters of the updated local model are transmitted to the cloud to update the global fault detection model. Because the probability of system failure of each terminal is low, local training should not be performed frequently. For example, according to pilot testing of an energy controller since April 2021, a manufacturer of power terminals found that each terminal fails about once every two months during on-site use. If the fault detection model is trained frequently on datasets without fault records, it may overfit. In order to ensure the detection accuracy of the fault detection model, three strategies of local training are proposed in this study.
① Periodic Update (PUpdate). Each edge server trains its local model once per specified period. The parameter period is set according to the reliability of the terminals in the coverage region of the edge server, and by default it is set to the average failure time of the terminals. Given the edge server $server_i$ and the set of terminals in its coverage region $P_i = (p_{i,1}, p_{i,2}, \ldots, p_{i,k})$, where the failure time of the terminal $p_{i,j}$ is $ft_j$, the update period of $server_i$ is
$$period_i = \frac{1}{k} \sum_{j=1}^{k} ft_j,$$
where $k$ is the number of terminals.
The update periods of different edge servers may differ and can be adjusted when terminals in the region are removed or newly deployed.
② Incremental Update (AUpdate). Each edge server starts local training when the number of abnormal log records is no less than a threshold. Since training the local model using only normal records may lead to overfitting, the training dataset should contain some abnormal records. Suppose $Dataset_i = NDataset_i \cup ADataset_i$ is the training dataset on edge server $server_i$ and $D_{th}$ is the threshold on the number of abnormal records; then $server_i$ performs local training if $|ADataset_i| \ge D_{th}$, where $NDataset_i$ is the set of normal records in $Dataset_i$ and $ADataset_i$ is the set of abnormal records in $Dataset_i$. The threshold $D_{th}$ is generally set as a linear function of the number of terminals in the coverage region of the edge server, and the threshold on each edge server may be different.
③ Triggered Update (TUpdate). The edge server performs local training whenever it receives an abnormal log record from any terminal in its coverage region. With this strategy the local fault detection model is updated as soon as possible, and the updated parameters are sent to the cloud. It may lead to a large amount of data transmission between the edge servers and the cloud when there are many terminals in the power grid. This strategy can be seen as a special case of AUpdate with threshold $D_{th} = 1$.
The above three strategies suit different situations: PUpdate and AUpdate can reduce the data transmission between edge servers and the cloud, while TUpdate can quickly update the fault detection model. A strategy can be chosen according to the practical application, and different strategies can be adopted on different edge servers simultaneously. The main difference between the three updating strategies is the updating frequency of the fault detection model: TUpdate may lead to a high and unpredictable updating frequency, while the updating frequency of PUpdate and AUpdate can be controlled by changing the parameters period and $D_{th}$.
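The trigger conditions of the three strategies reduce to simple comparisons, sketched below. The function names `default_period` and `should_train`, the argument names, and the sample failure times are hypothetical; only the conditions themselves (elapsed time against the period, $|ADataset_i| \ge D_{th}$, and the special case $D_{th} = 1$) come from the text.

```python
from statistics import mean


def default_period(failure_times):
    """PUpdate default: average failure time of the k terminals in the region."""
    return mean(failure_times)


def should_train(strategy, *, elapsed=0.0, period=0.0, n_abnormal=0, d_th=1):
    """Decide whether an edge server should start local training now."""
    if strategy == "PUpdate":
        return elapsed >= period      # one training run per period
    if strategy == "AUpdate":
        return n_abnormal >= d_th     # |ADataset_i| >= D_th
    if strategy == "TUpdate":
        return n_abnormal >= 1        # AUpdate with D_th = 1
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical failure times (in days) of three terminals in one region.
period = default_period([55, 60, 65])
decisions = {
    "PUpdate": should_train("PUpdate", elapsed=61, period=period),
    "AUpdate": should_train("AUpdate", n_abnormal=3, d_th=10),
    "TUpdate": should_train("TUpdate", n_abnormal=1),
}
```

With three abnormal records and a threshold of 10, AUpdate waits, while TUpdate would already have started training after the first abnormal record.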

Model Updating.
After receiving the parameters of the local models from the edge servers, the cloud aggregates the parameters to update the global detection model and then sends the updated parameters to all edge servers. Each edge server further sends the updated parameters to all terminals in its coverage region. Suppose the parameters of the local models from $k$ edge servers form the set $V = \{V_1, V_2, \ldots, V_k\}$, and the parameters of each local model are a vector $V_i = \langle v^i_1, v^i_2, \ldots, v^i_p \rangle$, where $1 \le i \le k$ and $p$ is the number of parameters of the fault detection model. The set $V$ is aggregated in the cloud to obtain the new parameter vector $V_u = \langle v^u_1, v^u_2, \ldots, v^u_p \rangle$, in which each parameter $v^u_i$ is computed from the corresponding local parameters and $v'^u_i$, the original value of the $i$th global parameter, using the weighting coefficient $\eta$, which the cloud can obtain in the pretraining step.
Given the architecture of the power grid and the method used to process the system log, the accuracy of the fault detection model on the terminals depends on the frequency of model updating in the cloud. When the cloud updates the global model frequently, each update aggregates fewer local parameter vectors, and more data must be transmitted to send the updated parameters to all edge servers and terminals. If the frequency of model updating is low, some edge servers with a high frequency of local training will send their local parameters to the cloud several times between two updates, and only the latest version of their local parameters is aggregated to update the global model, so that the influence of some faults is ignored and the accuracy of the fault detection model suffers. Therefore, in order to improve the accuracy of the fault detection model, the cloud can adopt different strategies to update the model, such as PUpdate, AUpdate, and TUpdate.
These update strategies for model updating in the cloud are similar to those for local training on the edge server; the only difference is how the thresholds in each strategy are measured. The update period of PUpdate in the cloud is typically set to the shortest local training period among all edge servers. The threshold of AUpdate in the cloud is set according to the number of edge servers that have finished local training, that is, the number of parameter vectors received by the cloud. When TUpdate is used in the cloud, the cloud updates the model whenever it receives the parameters of a local model from any edge server. The update strategy in the cloud is determined by experts according to the requirements and configurations of the power grid and the strategies adopted on the edge servers.
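To make the aggregation step concrete, the sketch below combines the old global parameters with the mean of the received local parameters. The exact update rule of the study is not reproduced here; the convex combination $v^u_i = \eta \, v'^u_i + (1 - \eta) \cdot \frac{1}{k} \sum_{j=1}^{k} v^j_i$ is an assumption chosen only because it uses the quantities the text names ($v'^u_i$, the local vectors, and $\eta$). The function name `aggregate` and the numbers are hypothetical.

```python
def aggregate(global_params, local_params, eta):
    """Blend old global parameters with the mean of the local vectors.

    ASSUMED rule: v_u[i] = eta * v_old[i] + (1 - eta) * mean_j(v_j[i]).
    """
    k = len(local_params)
    if k == 0:
        return list(global_params)  # nothing received, keep the old model
    updated = []
    for i, old in enumerate(global_params):
        local_mean = sum(v[i] for v in local_params) / k
        updated.append(eta * old + (1.0 - eta) * local_mean)
    return updated


# Two edge servers report local parameter vectors with p = 2 parameters.
v_old = [1.0, 2.0]
v_locals = [[3.0, 2.0], [5.0, 4.0]]
v_new = aggregate(v_old, v_locals, eta=0.5)
```

Under this assumed rule, a larger $\eta$ keeps the global model closer to its previous state, and a smaller $\eta$ lets the latest local models dominate.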

Fault Detection Algorithm.
Suppose there are one cloud $Cloud$ and $k$ edge servers $Server_1, Server_2, \ldots, Server_k$ in the power grid, and the terminals in the coverage region of edge server $Server_i$ are $dev_{i,1}, dev_{i,2}, \ldots, dev_{i,n}$; then the fault detection process is as follows:
Step 1. The historical system logs $Records$ are used to pretrain the initial LSTM fault detection model in $Cloud$. Then, $Cloud$ sends the parameter vector $V_{init}$ to all edge servers $Server_1, Server_2, \ldots, Server_k$, and each edge server $Server_i$ further sends $V_{init}$ to the $n$ terminals $dev_{i,1}, dev_{i,2}, \ldots, dev_{i,n}$ in its coverage region.
Step 2. When a log record $r_i$ is generated on a terminal $dev_{i,j}$, it is first preprocessed by variable replacement and input into the fault detection model to check whether a fault has occurred on $dev_{i,j}$ and to determine the type of the fault. Then, the log records within a time period are compressed and sent to the edge server $Server_i$.
Step 3. Each edge server $Server_i$ collects the compressed system logs $R_i = \{R_{i,1}, R_{i,2}, \ldots, R_{i,n}\}$ from the terminals $dev_{i,1}, dev_{i,2}, \ldots, dev_{i,n}$ in its coverage region and trains the local LSTM model using dataset $R_i$ according to the updating strategy, such as PUpdate, AUpdate, or TUpdate. Once the local training is complete, $Server_i$ transmits the parameters $V_i$ of the local model to $Cloud$.
Step 4. After receiving the parameters $V = \{V_1, V_2, \ldots, V_k\}$ from the edge servers $Server_1, Server_2, \ldots, Server_k$, $Cloud$ aggregates $V$ to generate the new parameters $V_{update} = \langle v^u_1, v^u_2, \ldots, v^u_p \rangle$ using the PUpdate, AUpdate, or TUpdate strategy. Then, $V_{update}$ is transmitted to each edge server $Server_i$ and finally to all terminals $dev_{i,1}, dev_{i,2}, \ldots, dev_{i,n}$. Steps 2-4 are repeated until the fault detection is interrupted.
In the above algorithm, each terminal continuously generates a large amount of system log data, and each record in the system log is input into the fault detection model to find as many faults as possible. The system log is compressed before being transmitted to the edge server, and only the parameters of the local model are sent to the cloud, which reduces the transmission delay. Meanwhile, fault detection on the terminal, local model training on the edge server, and model updating in the cloud fully utilize the computing resources of the different hardware. Therefore, high accuracy, low latency, less data transmission, and privacy protection are all taken into consideration in this federated learning-based fault detection algorithm.
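The message flow of Steps 1-4 can be illustrated with a small simulation. It is only a sketch of who sends what to whom: the classes `Terminal` and `EdgeServer`, the function `run_round`, the stand-in for local LSTM training, and the plain averaging in the cloud are all hypothetical simplifications, and log records are reduced to (detail, is_fault) pairs.

```python
class Terminal:
    def __init__(self):
        self.params = None
        self.pending = []  # compressed records waiting to be sent to the edge

    def receive(self, params):
        self.params = list(params)


class EdgeServer:
    def __init__(self, terminals):
        self.terminals = terminals
        self.params = None
        self.dataset = []

    def broadcast(self, params):
        """Steps 1 and 4: forward parameters to every terminal in the region."""
        self.params = list(params)
        for t in self.terminals:
            t.receive(params)

    def collect(self):
        """Step 3, first half: gather compressed logs from the terminals."""
        for t in self.terminals:
            self.dataset.extend(t.pending)
            t.pending = []

    def local_train(self):
        """Stand-in for local LSTM training (purely illustrative): shifts each
        parameter by the fraction of abnormal records in the dataset."""
        abnormal = sum(1 for _, is_fault in self.dataset if is_fault)
        shift = abnormal / max(len(self.dataset), 1)
        return [p + shift for p in self.params]


def run_round(cloud_params, servers):
    """One pass of Steps 2-4; only parameter vectors reach the cloud."""
    local_vectors = []
    for s in servers:
        s.collect()
        local_vectors.append(s.local_train())
    k = len(local_vectors)
    new_params = [sum(v[i] for v in local_vectors) / k
                  for i in range(len(cloud_params))]  # assumed plain average
    for s in servers:
        s.broadcast(new_params)
    return new_params


servers = [EdgeServer([Terminal(), Terminal()]) for _ in range(2)]
v_init = [0.0, 0.0]
for s in servers:
    s.broadcast(v_init)  # Step 1
servers[0].terminals[0].pending = [("link up at IP", False),
                                   ("Probe of sata failed", True)]
v_update = run_round(v_init, servers)
```

The point of the sketch is structural: raw or compressed logs stop at the edge server, and the cloud only ever sees parameter vectors.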

Performance Evaluation
The fault detection algorithm proposed in this study is evaluated in a simulated power grid with a cloud, 10 edge servers, and a number of terminals. Each terminal is configured as an energy controller of a particular brand, the system log is collected from practical applications, and a summary of the system log is listed in Table 3. The model update time, the amount of data transmitted by each terminal, and the fault detection accuracy are measured to evaluate the algorithm. The weights in the similarity of records are set as $\alpha = \beta = 0.5$.
The average update time of the model under different update strategies, as the number of terminals in the coverage region of the edge server changes, is shown in Figure 4. The update time of the PUpdate strategy remains stable if the update period is fixed, while the update time of the AUpdate and TUpdate strategies decreases as the number of terminals increases. The reason is that more terminals cause the edge server to receive abnormal log records more frequently, so the local update is performed more often if the other conditions are fixed. Since the threshold $D_{th}$ in AUpdate is set to 10, and TUpdate can be seen as a special case of AUpdate with $D_{th} = 1$, the average update time of TUpdate is significantly lower than that of AUpdate for the same number of terminals.
The average data transmission of each terminal for local training on the edge server under different update strategies is shown in Figure 5. The legends "X" and "X-C" denote the cases in which the terminal transmits the raw system log and the compressed system log, respectively, to the edge server under update strategy X. Figure 5 shows that log compression significantly reduces the data transmission under all update strategies; PUpdate generates the largest data transmission, followed by AUpdate, and TUpdate generates the smallest.
The main reason for this ordering is that PUpdate takes the longest period for each local training, while the average time of local training in TUpdate is the shortest. Meanwhile, as the number of terminals increases, the model update times of AUpdate and TUpdate gradually converge, and with log compression the data transmission of AUpdate and TUpdate becomes approximately equal. In summary, PUpdate has the longest update period and the largest amount of data transmission for each local training, TUpdate has the shortest update period and the least data transmission, and the update period and data transmission of AUpdate lie between those of PUpdate and TUpdate.
In practical applications, TUpdate is suitable for newly deployed terminals and terminals with high security requirements, because it quickly collects the abnormal system records of the terminal and trains a more accurate fault detection model. Terminals with limited network bandwidth can also use the TUpdate strategy to avoid the network congestion and packet loss caused by a large amount of data transmission. PUpdate is applicable to terminals with stable operation, less strict security requirements, and large network bandwidth; the update period can be set according to the actual application and is usually initialized as the average failure time of the terminal. AUpdate offers the most flexibility: the update period can be changed easily by modifying the threshold $D_{th}$, and the update period grows as $D_{th}$ becomes larger.
The accuracy of fault detection using different fault detection methods is shown in Figure 6. In the legend, "KeyWords" is the keyword matching-based method; "Global" is the global LSTM model, that is, the fault detection model is trained in the cloud and cannot be updated; and "EdgeCloud" is our LSTM model with edge and cloud cooperation.
The results show that the accuracy of the KeyWords method is very low because the abnormal behavior of some faults does not contain the keywords. The accuracy of both the Global and EdgeCloud methods is clearly higher than that of KeyWords because the faults are recognized by an LSTM model, a technique widely used in natural language processing. As the system log accumulates, the Global method does not update the fault detection model, which leads to a decrease in fault detection accuracy. In EdgeCloud, by contrast, the new system log is used to train the local model and further update the global fault detection model, which increases its accuracy. As the number of model updates in the cloud grows, the accuracy of EdgeCloud becomes higher than that of the Global method.
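The relationship between the threshold $D_{th}$ and the update frequency discussed for Figure 4 can be illustrated with a short sketch. It assumes that a local update is triggered once the number of abnormal records received since the last update reaches $D_{th}$; the class `ThresholdTrigger`, the helper `count_updates`, and the workload of 5 abnormal records per terminal are hypothetical and are not taken from the experiments.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTrigger:
    """Fires a local update after d_th abnormal records (assumed trigger rule)."""
    d_th: int
    pending: int = 0

    def on_abnormal_record(self) -> bool:
        self.pending += 1
        if self.pending >= self.d_th:
            self.pending = 0  # start counting toward the next update
            return True
        return False


def count_updates(d_th: int, abnormal_records: int) -> int:
    """Number of local updates triggered by a stream of abnormal records."""
    trigger = ThresholdTrigger(d_th)
    return sum(trigger.on_abnormal_record() for _ in range(abnormal_records))


# Hypothetical workload: each terminal contributes 5 abnormal records
# to its edge server within the same observation window.
for n_terminals in (2, 4, 8):
    records = 5 * n_terminals
    a_updates = count_updates(10, records)  # AUpdate with D_th = 10
    t_updates = count_updates(1, records)   # TUpdate as AUpdate with D_th = 1
    print(n_terminals, a_updates, t_updates)
```

Under this assumption, doubling the number of terminals doubles the number of updates in a fixed window for both strategies, so the average time between updates shrinks, and TUpdate always updates at least as often as AUpdate.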

Conclusions
In this study, we analyze the characteristics of power terminal fault detection in the power grid and propose a three-tier fault detection model based on federated learning. Both the accuracy of fault identification and the privacy protection of terminals are taken into consideration in this model, and model training, model updating, and fault detection are performed at different levels. Log compression and the transmission of model parameters are used to reduce data transmission and protect privacy, the LSTM model is used to improve fault detection accuracy, and three different model update strategies are used to further improve the accuracy of fault detection. The experimental results show that the log compression method effectively reduces the amount of data transmission. The three model update strategies are suitable for different application scenarios and terminals, and the detection accuracy of the proposed fault detection model is higher than that of the traditional keyword-based models and the global models based on historical data. In future work, we plan to integrate the proposed fault detection model into various embedded operating systems and deploy it on different devices.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Mathematical Problems in Engineering