LstFcFedLear: An LSTM-FC Network with Vertical Federated Learning for Fault Prediction

The firefighting IoT platform links multiple firefighting subsystems, and the data of each subsystem is sensitive business data. Failure prediction is a crucial topic for firefighting IoT platforms, because failures may cause equipment damage and personal injury. Currently, fault prediction based on equipment time series has not been incorporated into the maintenance of fire IoT terminal equipment. Using intelligent technology to continuously predict the failure of firefighting IoT equipment can not only eliminate the need for scheduled maintenance but also provide early warning of upcoming failures. To solve this problem, we propose a vertical federated learning framework based on an LSTM fault classification network (LstFcFedLear). The advantage of this framework is that it can encrypt and integrate the data on the entire firefighting IoT platform to form a new dataset. After the synthesized data is trained through each model, the optimal model parameters can finally be updated. At the same time, it ensures that the data of each business system is not leaked. The framework can predict when IoT equipment will fail in the future and then suggest what measures should be taken. The experimental results show that the LstFcFedLear model provides an effective method for fault prediction, and its results are comparable to the baseline.


Introduction
The firefighting IoT platform is one of the key safeguards for enterprise fire safety. However, the current firefighting IoT platform has low accuracy in identifying various types of alarm information, so effectively identifying false alarms on the platform is very important. The current firefighting IoT platform is linked to multiple firefighting subsystems, such as smoke and sprinkler sensors in the office area and power environment monitoring in the substation. Since the data belongs to different business departments, each department expects its data to run on its own independent system, which means the datasets cannot interact with one another. However, to improve the accuracy of false alarm prediction on the firefighting IoT platform, all the business data is needed to train the network model. Based on this, we introduce a federated learning framework and propose the LstFcFedLear network. The accuracy of fault prediction is one of the keys to ensuring the normal operation of the fire IoT. Predictive science is a discipline that analyzes large amounts of data and discovers potential correlations among them, which provides an important basis for industrial equipment failure prediction [1,2]. To further improve the accuracy of various failure predictions, many new technologies (for example, artificial intelligence, big data, and blockchain) have gradually been applied in factories [3,4].
In essence, failure prediction correlates past failure events with failures that may occur in the future. Failure prediction made important advances as early as 1979. Mathematically, fault prediction uses inputs to predict outputs. Due to widespread nonlinearity and uncertainty, it is difficult to establish an efficient mathematical system model, which is one of the objective reasons for missed detections and false alarms. Later, Box et al. proposed applying time series to forecasting, which greatly improved accuracy. The characteristic of a neural network is that it can perform nonlinear mapping, so it is widely used in the field of prediction. Unfortunately, neural networks are data-driven, and their biggest drawback is that the network model parameters must be set manually.
From current enterprise monitoring systems, it is relatively easy to obtain large amounts of historical equipment and operating data. Therefore, it is feasible to use historical data to predict failures. With the rapid development of new technologies, it is also feasible to use artificial intelligence, big data, and other technologies to assist in forecasting. At present, in research on using artificial intelligence for fault prediction, supervised and unsupervised methods are the two most common approaches. For the design of the network structure, the designer can only rely on experience to subjectively choose the depth of the network, the number of neurons, and other parameters. Designers with different experience design different networks, so the same problem may have different solutions. Among the many algorithm models, the support vector machine algorithm is the most widely used.
If the condition of all equipment in a factory can be monitored and failures can be alerted in advance, the reliability and stability of the entire factory can be greatly increased. In recent years, much research on these fault topics has been carried out in the field of PHM, which greatly reduces the cost of fault maintenance of factory equipment and also improves factory efficiency [5]. The key function of PHM is to diagnose equipment failures and discover their causes [6]. Equipment failure prediction is very challenging, mainly because both the maintenance plan and the type of failure must be considered. Over the years, forecasting models have been continuously developed and improved, but so far, complex algorithm models [7,8] still have many limitations. To overcome these shortcomings, some studies have adopted machine learning algorithms, such as neural networks [4] and support vector machines (SVM) [5], to predict failure types. These studies have promoted the development of probabilistic models to a certain extent [9,10], but probabilistic models lack clear physical meaning in fault prediction.
Compared with traditional machine learning algorithms that cannot process time series data, the advantage of the LstFcFedLear method is that it can predict the sequence of future data by learning from historical experience. The main contributions of this article are summarized as follows: (1) We propose a vertical federated learning framework based on an LSTM fault classification network (LstFcFedLear). The advantage of this framework is that it can encrypt and integrate the data on the entire firefighting IoT platform to form a new dataset. (2) The LstFcFedLear model can ensure that the data of each business system is not leaked. This framework can predict the probability of future failure of the fire IoT and can provide corresponding measures to resolve the failure. The structure of this article is as follows: Section 2 introduces related research. Section 3 introduces the new framework method. Section 4 shows the experimental verification results. Section 5 presents the conclusion and future work.

Related Works
VSC and MMC lack the ability to regulate DC short-circuit current during DC faults. For multiterminal DC system fault detection, the calculation of short-circuit current during the discharge phase of the DC fault capacitor is crucial. Li et al. proposed a transient equivalent model suitable for fault analysis of multiterminal DC systems. This model only retained the high-frequency components in the original fault network, which greatly simplified the circuit analysis at the initial stage of the fault [11]. Since the current waveform when the arc fault occurs is very similar to the current waveform of some loads, it is difficult to detect arc faults through simple current characteristics. Aiming at this problem, Lin et al. proposed an arc fault detection method combining a self-organizing feature mapping network and a sliding window method [12]. On the basis of autonomously mining the inherent characteristics of current data, the current signal is continuously detected by using the correlation and continuity between adjacent periodic current samples. The proposed method can effectively realize arc fault detection, and the accuracy of arc fault detection can reach 99%.
The existing bearing fault alarm system is mainly based on the rule diagnosis of a single shaft temperature variable, and the alarm is not timely. In response to the above problems, Liu et al. combined the correlation of the multiaxis axle temperature of the same car and proposed a data-driven method for detecting and positioning train bearing faults [13]. The proposed DiCCA modeling method is verified by using the axle temperature data of a train in actual operation, and the results show the effectiveness of the proposed method. Based on a data-driven approach, Xiong et al. proposed an edge-assisted privacy protection original data sharing framework, which ensures that the data connected to autonomous vehicles will not be destroyed [14].
Using the sensor data of the traction system, Chen et al. proposed an optimal data-driven fault detection method to solve the fault problem of the dynamic traction system [15], and studied the optimal data-driven fault diagnosis problem based on an improved SVM. Finally, the rationality and effectiveness of the proposed method were verified on an actual high-speed train experimental platform. Yang et al. proposed a data-driven soft closed-loop fault-tolerant control strategy for the voltage sensor failure of the DC side capacitor in the H-bridge structure of STATCOM [16]. This method selected capacitor voltage and system output current as original signals and established the MLS-SVM prediction model based on historical operating data [17]. The predictive output of the MLS-SVM model and the residual signal output by the actual sensor are used to establish a sensor fault detection and judgment mechanism.

Wireless Communications and Mobile Computing
The test results showed that the method has good accuracy and real-time performance [18]. Condition monitoring and fault diagnosis are necessary means to ensure the safe and stable operation of mechanical equipment. Wang et al. proposed a deep learning framework based on ABiLSTM for intelligent fault diagnosis of mechanical equipment [19]. The framework first preprocesses the raw data collected by the sensors and divides it into a training sample set and a test sample set. Second, as in Oramas and Tuytelaars [20], features of the original time-domain signal are extracted by training multiple bidirectional LSTM networks of different scales, yielding multiscale features of equipment failures. The experimental results show that the ABiLSTM model can achieve multiscale feature extraction of the original signal. Compared with methods such as CNN, DAE, and SVM, the fault recognition performance of the ABiLSTM model is better than that of these common models [21,22]. Generalization experiments on the ABiLSTM model show that the fault recognition accuracy of samples under varying operating conditions can still reach more than 95%. The LSTM network architecture is shown in Figure 1.

LSTM Network.
A Recurrent Neural Network (RNN) is a type of neural network specially designed to process time series data samples. Each layer of an RNN not only outputs to the next layer but also outputs a hidden state. Just as convolutional neural networks can easily be extended to images with large width and height, and some can handle images of different sizes, an RNN can be extended to longer sequence data, and most RNNs can handle data with different sequence lengths. An RNN can be seen as a fully connected neural network with self-loop feedback. In forward propagation, the hyperbolic tangent activation function is generally used from the input layer to the hidden layer [23], while the hidden layer to the output layer uses softmax to map the output to a probability distribution over (0, 1). The output value of the hidden layer at the current moment is affected not only by the input at the current moment but also by the inputs at all past moments. In this way, the output value of the hidden layer can be regarded as the memory of the network, which makes it very suitable for processing data samples with dependencies between earlier and later elements [24,25]. An important feature of RNNs is that the parameters of the model are shared across time steps, which allows statistical strength to be shared across positions in time. When some patterns of the sequence data appear in multiple positions, this parameter sharing mechanism becomes particularly important [26,27]. LSTM is a type of RNN. The backpropagation-through-time algorithm transmits the error information step by step in the reverse order of time. When each time series in the training data is long, the gradient of the loss function with respect to the hidden layer variable at a given time is likely to vanish or explode.
The input vector of a standard RNN network is x = (x_1, ⋯, x_T). The RNN network uses equations (1) and (2) to compute the hidden vector h = (h_1, ⋯, h_T) and the output vector y = (y_1, ⋯, y_T):

h_t = σ(W_ih x_t + W_hh h_{t−1} + b_h), (1)

y_t = W_ho h_t + b_o. (2)

Among them, W_ih refers to the input weight matrix, W_hh refers to the weight matrix of the hidden layer, and W_ho refers to the output matrix computed from the hidden layer. b_h and b_o refer to the bias vectors, and σ is usually set as the sigmoid function σ(x) = 1/(1 + exp(−x)).
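Equations (1) and (2) can be sketched in NumPy as follows (an illustrative forward pass only; the toy dimensions and random weights are assumptions for the demo, not the paper's actual configuration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(x_seq, W_ih, W_hh, W_ho, b_h, b_o):
    """Compute h_t and y_t for every step, per equations (1) and (2)."""
    h = np.zeros(W_hh.shape[0])
    hs, ys = [], []
    for x_t in x_seq:
        h = sigmoid(W_ih @ x_t + W_hh @ h + b_h)  # eq. (1): new hidden state
        ys.append(W_ho @ h + b_o)                 # eq. (2): output at time t
        hs.append(h)
    return np.array(hs), np.array(ys)

# toy sizes: T = 5 steps, 3 input features, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 3))
W_ih, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
W_ho, b_h, b_o = rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2)
hs, ys = rnn_forward(x_seq, W_ih, W_hh, W_ho, b_h, b_o)
```

Because h_t feeds back into equation (1), each hidden state depends on the entire input history, which is the "memory" property discussed above.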
The biggest problems encountered by RNNs are gradient vanishing and gradient explosion. LSTM is one of the RNN architectures; it essentially uses memory cells and gate units to solve these problems. The gates control what the memory cell stores and what it exposes:

i_t = σ(W_xi x_t + W_hi h_{t−1} + b_i),

f_t = σ(W_xf x_t + W_hf h_{t−1} + b_f),

o_t = σ(W_xo x_t + W_ho h_{t−1} + b_o),

c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t−1} + b_c),

h_t = o_t ⊙ tanh(c_t).

Among them, i_t, f_t, o_t, and c_t correspond to the vectors of the input gate, forget gate, output gate, and memory cell at time t; each has the same size as the hidden vector h_t. Each weight matrix W refers to the connection coefficient matrix between the two units indicated by its subscripts.
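A single time step of the gate mechanism described above can be sketched in NumPy (a minimal illustration of the standard LSTM formulation; the packed weight layout and toy sizes are assumptions for the demo):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W stacks the four gate matrices over [x_t, h_prev]."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    c = f * c_prev + i * np.tanh(g)               # memory cell: keep old + write new
    h = o * np.tanh(c)                            # hidden state exposed downstream
    return h, c

# toy sizes: 3 input features, 4 hidden units -> 4 gates of 4 rows over 7 columns
rng = np.random.default_rng(1)
W, b = rng.normal(size=(16, 7)), np.zeros(16)
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```

The additive cell update c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(·) is what lets gradients flow over long spans without vanishing as quickly as in a plain RNN.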
The LSTM network is mainly composed of 32 bidirectional units, followed by a 50% dropout layer and a sigmoid activation function, and uses L2 regularization to prevent network overfitting. In model training, the cross-entropy loss function and the Adam optimizer are used to train each LSTM network. When a fault is evaluated, the LSTM divides the output fault into 5 levels, from small to large (0-4).
The performance of the pure LSTM model is better than that of the hybrid model [28]. Figure 2 shows the structure of the proposed model. By extracting the characteristics of the sequence data and using them as the input of the convolutional neural network model, the spatial characteristics of the data can be obtained. To prevent overfitting, a dropout layer is added, as shown in Figure 3.

Data Preprocessing.
In this article, the premise is that we predict daily failures. To this end, we focus on the cluster location and the date of occurrence, which turns the research question into determining when a failure occurs. Since we are solving two different prediction problems here, namely, binary classification and regression, we consider the production process of each type of prediction input dataset.
To address the problem of fault type classification for fire facilities, we used the Fault Type of Fire Facility (FTFF) dataset from the Firefighting Internet of Things platform database of China State Grid Gansu Electric Power Company. This dataset contains two subdatasets, namely, FTFF1 and FTFF2.
We study a real dataset of 15084 alarm records of power firefighting equipment, recorded between 2019 and 2020, from faults in power fire facility maintenance. The original fault dataset is first transformed into a fault vector v_R = (v_R1, ⋯, v_RT). In the regression algorithm model, the fault at a certain time t is expressed as v_Rt = (v_t1, ⋯, v_tS), where v_ts refers to the fault in the s-th space at time t. In the simplified neural network, the sigmoid and hyperbolic tangent functions are usually integrated into the gates and used as activation functions. The purpose of the sigmoid is to map the input value to between 0 and 1, where 1 means worth attending to and 0 means no attention needed. The role of the tanh function, on the other hand, is to regulate the network by compressing values to between -1 and 1. LSTM-FC is very sensitive to causality.
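The transformation of raw alarm records into fault vectors can be sketched as follows (the record layout, daily granularity, and binary encoding are assumptions for illustration, not the FTFF dataset's actual schema):

```python
from collections import defaultdict

def build_fault_vectors(records, num_spaces):
    """Bin (day, space_id) alarm records into daily fault vectors v_Rt."""
    by_day = defaultdict(lambda: [0] * num_spaces)
    for day, space_id in records:
        by_day[day][space_id] = 1          # 1 = fault observed in space s on day t
    # chronologically ordered sequence v_R = (v_R1, ..., v_RT)
    return [by_day[d] for d in sorted(by_day)]

records = [(1, 0), (1, 2), (2, 1), (3, 0)]  # hypothetical alarm log
v_R = build_fault_vectors(records, num_spaces=3)
```

Each v_Rt is then one time step of the LSTM-FC input sequence.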

Prediction Models.
In the design of this article, we develop two types of failure prediction models: binary classification and regression (Figure 2 shows the process of the LstFcFedLear network). In addition, we also focus on evaluating the relationship between spatial clusters to judge their impact on the prediction results. In this article, we have designed four types of failure prediction models, as shown in Table 1. In Section 2, we introduced that the RNN model can receive delayed information and can judge whether this information affects the storage unit. The proposed LSTM-FC model is shown in Figures 1 and 4. It can be seen that these two models use different types of input data, and the activation functions in the output layer are also different. It is worth noting that faults are expressed through weighting factors, and these weights are determined by the following equations:

e_t = w_a h_t,

a_t = exp(e_t) / Σ_k exp(e_k),

v = Σ_t a_t h_t.

In these equations, h_t refers to the fault that occurs at time t, w_a is the weight matrix of the attention layer, a_t refers to the probability of a possible failure at time t, and v is the weighted summation of the probabilities over all times t.
The input is converted into a fault sequence X = {x_1, x_2, ⋯, x_N}, where x_t refers to the fault that occurs at time t as computed by the LSTM model. At each time step, we first use the forward LSTM to predict the probability of the next failure. The overall goal is to minimize the following objective function:

L_f = −Σ_{t=1}^{N−1} log Pr(x_{t+1} | x_1, ⋯, x_t).

Among them, L_f is the forward-prediction loss over all the parameters of the model, and the Pr(·) function in the LSTM model gives the probability of x_{t+1}, which mainly depends on the preceding sequence.
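Treating the forward objective as a negative log-likelihood, it can be computed as follows (a minimal sketch; the per-step probabilities here are hypothetical stand-ins for the forward LSTM's softmax outputs):

```python
import math

def forward_nll(step_probs):
    """L_f = -sum_t log Pr(x_{t+1} | x_1..x_t); step_probs[t] is the model's
    probability assigned to the fault that actually occurred at step t+1."""
    return -sum(math.log(p) for p in step_probs)

# hypothetical per-step probabilities emitted by the forward LSTM
loss = forward_nll([0.9, 0.8, 0.95])
```

The backward LSTM described next minimizes the analogous loss over the reversed sequence, conditioning each fault on the faults that follow it.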
After getting a set of fault history data, the probability of the next fault can also be predicted through the reverse sequence. Therefore, we have also established a backward LSTM, the purpose of which is to predict the previous failure probability based on the later occurrence probability.
Finally, we analyze and classify the fault types and incorporate the key information into the whole process of LSTM training.
In summary, by using the optimized model, that is, equation (13), the parameters of the algorithm model can be obtained. After that, the corresponding topological quantities can be calculated.

LstFcFedLear
Model. LSTM-FC can calculate the topological value of the genetic network. Using this advantage, the LSTM-FC network algorithm can be iterated repeatedly to obtain the most reasonable parameter matrix. Algorithm 1 presents the LSTM-FC method step by step.
In order to encrypt and integrate the data on the entire fire IoT platform to form a new dataset, and to ensure that the data of each business system is not leaked, we have designed the framework shown in Figure 5. The new dataset is then fed into the LstFcFedLear models.

Step 1. The central server sends the public key to the LstFcFedLear1, LstFcFedLear2, LstFcFedLear3,…, LstFcFedLearn models and uses the Paillier partially homomorphic encryption algorithm to align the encrypted samples. The Paillier encryption algorithm is mainly divided into three steps. First, a key is generated according to the Paillier encryption algorithm. Then, the generated key is used to encrypt each part of the data. Finally, after the model training is completed, the model is decrypted [29].
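The key generation, encryption, and decryption steps of Paillier can be illustrated with a toy pure-Python sketch (tiny demo primes and a simplified generator g = n + 1 are assumptions for readability; a real deployment would use a vetted library and keys of at least 2048 bits):

```python
import math
import random

def generate_keypair(p=293, q=433):
    """Toy Paillier keypair from tiny demo primes (real keys are >= 2048 bits)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                           # common simplification
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)      # L(g^lam mod n^2)^-1 mod n
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                          # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pub, priv = generate_keypair()
c1, c2 = encrypt(pub, 42), encrypt(pub, 17)
# homomorphic property: the product of ciphertexts decrypts to the sum 42 + 17
total = decrypt(priv, (c1 * c2) % (pub[0] ** 2))
```

It is this additive homomorphism that lets the central server aggregate encrypted gradients in the later steps without seeing any party's raw values.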
Step 2. The encrypted samples are fed to the LstFcFedLear1, LstFcFedLear2, LstFcFedLear3,…, LstFcFedLearn models for iterative training, and the local parameter gradients of the models are calculated, respectively.
Step 3. LstFcFedLear1, LstFcFedLear2, LstFcFedLear3,…, LstFcFedLearn models push the gradient and loss calculated by each to the central server. The central server uses the private key to decrypt.
Step 6. LstFcFedLear1, LstFcFedLear2, LstFcFedLear3,…, LstFcFedLearn models are iteratively trained to generate a joint model.
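Steps 2 through 6 amount to each party computing a local gradient and the server averaging the decrypted gradients into one joint update. A plaintext sketch of that loop (the encryption layer from Step 1 is omitted, and the simple averaging rule plus the quadratic toy losses are assumptions for illustration):

```python
import numpy as np

def aggregate_gradients(local_grads):
    """Server side: average decrypted per-party gradients into one joint update."""
    return np.mean(np.stack(local_grads), axis=0)

def federated_round(theta, parties, lr=0.1):
    """One round of Steps 2-6: parties compute local gradients, server averages."""
    grads = [grad_fn(theta) for grad_fn in parties]
    return theta - lr * aggregate_gradients(grads)

# two hypothetical parties whose local losses are (theta - 1)^2 and (theta - 3)^2
parties = [lambda t: 2.0 * (t - 1.0), lambda t: 2.0 * (t - 3.0)]
theta = np.array(0.0)
for _ in range(100):
    theta = federated_round(theta, parties)
# theta converges toward 2.0, the minimizer of the joint objective
```

In the actual framework the gradients pushed in Step 3 are Paillier-encrypted, so the server can sum them without learning any individual party's gradient.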

Prediction by LstFcFedLear.
It is easy to see that presenting a certain vector in a sequence, or a certain sequence within a sequence, is essentially the same as the training process of the LstFcFedLear algorithm. The figure below illustrates the training progress curves of the LstFcFedLear and SVM algorithms in detail. It can be clearly seen from the figure that the loss and root mean square error of the two algorithm models during training are very similar. The loss curve shows that the learning speed of SVM is relatively slow at the beginning of training, but as time goes by, the learning curve of SVM approaches a certain value and gradually stabilizes. It can be inferred from these data that LstFcFedLear did not learn enough knowledge at the beginning, but over time, this problem was resolved. In general, the performance of LstFcFedLear is slightly better than that of SVM. In order to further demonstrate the accuracy and generalization ability of LstFcFedLear, we compare its accuracy with the other three methods in [7][8][9]. To put the experimental results in context, we use the method in [10]. The simulated test data was obtained using SynTReN, an environment frequently used in the industry. Figure 6(a) illustrates the receiver operating characteristic (ROC) curves of the new data generated by LstFcFedLear and CNN. Obviously, on the synthesized test data, the accuracy of LstFcFedLear is better than that of the CNN method. As shown in Figure 6(b), compared with the FDR performance of the SVM, KNN, and CNN methods, the error rate of LstFcFedLear is the lowest. This result clearly shows that for the genetic disease dataset, LstFcFedLear is better than SVM, KNN, and CNN.
As shown in Figure 6(c), the comparison between the positive predictive value (PPV) curve of LstFcFedLear and those of SVM, KNN, and CNN shows that the LstFcFedLear method is optimal. This further shows that LstFcFedLear is also superior to SVM, KNN, and CNN on the synthesized genetic test data.
In this experiment, the LstFcFedLear model has 16 to 100 storage units, and the training time interval of the model is [16, 100] s. Throughout the experiment, the mean square error is the only indicator that measures the performance of the binary classification and regression models in the learning phase. In order to reduce the loss and increase the learning rate, the Adam optimizer is used in the LstFcFedLear model with the parameters beta1 = 0.9 and beta2 = 0.999. In order to prevent overfitting in the learning phase, 3-fold cross-validation was used in this experiment. The quality of a model's hyperparameters is key to whether it can achieve its best performance. In this experiment, we repeatedly test the hyperparameters of the LstFcFedLear model, such as the model's lookback setting, the number of storage units, and the dropout rate. The lookback rate indicates how large a time interval to consider [33]. Table 2 shows the different dropout probability values in the experiment. We note that the loss function in this article is the mean square error between the training value and the predicted value. In addition, it can be clearly seen from Table 2 that the accuracy of LstFcFedLear is as high as 94.6%, which is 8.03% higher than the average of KNN, SVM, and CNN. The sensitivity of LstFcFedLear is as high as 93.4%, which is 7.77% higher than the average of KNN, SVM, and CNN. The specificity of LstFcFedLear is as high as 95.1%, which is 8.37% higher than the average of KNN, SVM, and CNN. These three indicators also show that LstFcFedLear is the best.
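The three indicators reported above can be computed from a binary confusion matrix as follows (a small helper sketch with made-up labels, not tied to the paper's actual predictions):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on faults), and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    return accuracy, sensitivity, specificity

acc, sens, spec = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

Sensitivity measures how many real faults are caught, while specificity measures how many normal states are correctly left alone; for a fire alarm system both matter, since low specificity means false alarms.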

Performance Comparison.
Here, we compare the performance of SVM, KNN, and CNN in detail with the performance of the LstFcFedLear model proposed in this paper. The experiment uses the scikit-learn package in Python for testing. In order to make a thorough comparison with the performance of the LstFcFedLear model, 3-fold cross-validation was used in the experiment. Grid search is also used in the experiment to find the best training parameters. It can be seen from the experimental results that for SVM, the penalty parameter is set to C = 0.1. In Tables 1-3, we compare the fault classification performance of KNN, SVM, CNN, and LstFcFedLear in detail. The comparison results show that the LstFcFedLear model has the best performance; in terms of accuracy and recall, LstFcFedLear is the best among the four, with the CNN model in second place. As shown in Figure 6, the area enclosed by the LstFcFedLear curve is larger than that of the other three models, which shows that in terms of accuracy, LstFcFedLear is clearly better than the other algorithms. It can be seen from Figure 6 that LstFcFedLear reaches the maximum average AUC, about 0.84, while the AUC values of SVM, KNN, and CNN are 0.47, 0.78, and 0.59, respectively. The ranking is therefore LstFcFedLear, KNN, CNN, and SVM. In addition, we calculated and visualized the area enclosed under the curves in Figure 6 in order to highlight the accuracy of all compared methods. The AUC value obtained by the LstFcFedLear model is about 0.84, which again indicates that the model is the best. More importantly, the average AUC of GlobalMIT is about 0.47, which is obviously much lower than that of LstFcFedLear.
As shown in Figure 7, the LstFcFedLear model has the best performance in classifying all faults into positive probability, because the area under the ROC curve corresponding to LstFcFedLear is the largest. According to the ranking of the areas under the ROC curves, SVM is second only to LstFcFedLear, followed in turn by CNN and KNN. It is worth noting that the ROC area of the LstFcFedLear model is 10 times that of KNN, 6 times that of SVM, and 3 times that of CNN. This large area difference once again illustrates the excellent accuracy of the LstFcFedLear model. Table 1 characterizes the accuracy of these algorithm models from another angle. The MAE value of LstFcFedLear is 0.226, which is 0.075 lower than the average of KNN, SVM, and CNN. The test results show that the MAE value of KNN is the largest, indicating that its performance is the worst; the other models rank, from good to bad, as CNN, SVM, and KNN. In terms of RMSE, the RMSE value of LstFcFedLear is 0.619, which is 0.123 lower than the average of KNN, SVM, and CNN. The test results show that the RMSE value of KNN is the largest, indicating that its performance is the worst, which is consistent with its MAE value; the other models again rank, from good to bad, as CNN, SVM, and KNN. In terms of SDE, the SDE value of LstFcFedLear is 0.634, which is 0.072 lower than the average of KNN, SVM, and CNN. The SDE value of KNN is the largest, 0.763, again indicating the worst performance, consistent with the MAE and RMSE values; the other models rank, from good to bad, as CNN, SVM, and KNN.
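For reference, the three error measures can be computed as follows (taking SDE as the standard deviation of the prediction errors, which is an assumption about the paper's definition; the sample values are made up):

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and SDE (standard deviation of the prediction errors)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean square error
    mean_e = sum(errors) / n
    sde = math.sqrt(sum((e - mean_e) ** 2 for e in errors) / n)
    return mae, rmse, sde

mae, rmse, sde = regression_metrics([0, 0, 0, 0], [1, -1, 1, -1])
```

RMSE penalizes large errors more heavily than MAE, while SDE separates the spread of the errors from any systematic bias; together they give a fuller picture than any single number.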
In Table 3, we compare the running time of the LstFcFedLear, KNN, SVM, and CNN models. It can be seen that although LstFcFedLear is indeed superior across the performance measures, it pays a relatively high cost in training and running time. As can be seen in Table 3, in terms of training time, the order from least to most time-consuming is KNN, SVM, CNN, and LstFcFedLear. KNN is indeed the least time-consuming, but its performance is too poor for promotion and application in the industry. LstFcFedLear takes somewhat longer, but it is relatively stable in terms of performance.

Conclusion
In short, we propose a vertical federated learning framework based on an LSTM fault classification network to predict failures of the fire IoT platform. The advantage of this framework is that it can encrypt and integrate the data on the entire firefighting IoT platform to form a new dataset. After the synthesized data is trained through each model, the optimal model parameters can finally be updated. At the same time, it ensures that the data of each business system is not leaked. The experimental results showed that the LstFcFedLear model provides an effective method for fault prediction, and its results are comparable to the baseline. The comparison among the LstFcFedLear, SVM, KNN, and CNN methods showed that LstFcFedLear performs better than all the other methods in RMSE prediction, with improvements of 9.8% and 24.3%, respectively. In the future, we plan to apply the LstFcFedLear model to power production application scenarios and then further optimize the robustness and other aspects of the model.

Data Availability
We used the Fault Type of Fire Facility (FTFF) dataset from the Firefighting Internet of Things platform database of China State Grid Gansu Electric Power Company. This dataset contains two subdatasets, namely, FTFF1 and FTFF2.

Conflicts of Interest
All authors declare no conflict of interest over this article.