Online Fault Prediction Based on Collaborative Filtering in Smart Grid

The smart grid upgrades traditional power networks by integrating them with cutting-edge information and communication networks. The result is a pair of coupled networks that is exposed to fault cascades. In coupled networks, fault prediction is significant because the tight interaction between power nodes and communication nodes makes the smart grid more vulnerable. Unfortunately, most existing work on fault prediction is specific to a single network and does not consider the correlation of coupled elements. To address these limitations, in this paper, we highlight the interdependence of networks and define fault correlation. Further, we propose a probabilistic prediction model using collaborative filtering from machine learning. We finally present an online prediction algorithm. We conduct experiments to illustrate the effectiveness of our prediction algorithm with different parameters and give some observations that may provide more insight into interdependent networks.


Introduction
With advanced information and communication technologies, the smart grid represents an unprecedented opportunity to move power systems into a new era that will contribute to our economic and environmental health [1]. The smart grid integrates renewable energy (e.g., solar and wind) into the grid [2], which requires flexible control as well as communication technologies [3]. Because it tightly couples the power network with the communication network to control flexible energy dispatch, the smart grid forms interdependent networks. In interdependent networks, communication equipment needs a power supply, while power control commands must be transmitted through the communication network. These reciprocal networks facilitate power transmission and bring convenience to users. However, the tight correlation also weakens the robustness of the networks [4]. In particular, faults can easily spread across the two networks and cascade, with devastating impact on network functionality. Such a cascade can cause network collapse and heavy economic loss, as in the large-scale power outages in the northeastern United States and in Italy in 2003. With fault prediction, we can take measures in advance to minimize the loss caused by potential failures. Fault prediction across the two networks of the smart grid is therefore of great importance. Compared with a single network, fault prediction in coupled networks remains a challenging problem for the academic community [5].
The complexity of the smart grid and the correlation of faults in the coupled networks make fault prediction difficult. We review related work from two perspectives.
There are several research works on fault prediction in smart grid systems. Sun et al. [6] have proposed a long short-term memory (LSTM)-based prediction of the confidence interval bound of wireless link quality in smart grids. In their work, the lower bound of the confidence interval represents the worst-case reliability of the random wireless link quality, and this lower bound determines whether the link quality meets the reliability requirement of the next transmission. Giampieri et al. [7] have proposed an evolutionary supervised classification system for fault prediction in smart grids, using the supervised algorithm to process large data sets. Muralitharan et al. [8] have given a neural network-based optimization for energy demand forecasting in smart grids for power generation. However, these works neglect the characteristics of coupled networks. Jafari et al. [9] have recognized the interdependence and have used feedforward neural networks to predict instability in smart cyber-physical grids; Malbasa et al. [10] have utilized active machine learning for voltage stability prediction to ensure the reliability of the grid system. An optimal operation scheme for energy storage systems, which uses prediction intervals to reduce the peak load of the distribution network, has been proposed to balance supply and demand between power generation and consumption [11]. Some of these works refer to coupled networks, but they do not explore the networks in detail.
There are also research works on fault prediction with machine-learning technologies. For fault prediction in power grids, Andresen et al. [12] have shown that machine learning can predict faults in smart grids and have provided ideas for fault prediction in smart grids. Gupta et al. [13] have proposed a support vector machine for active outage prediction. A voting scheme based on the random forest algorithm has been proposed for fault prediction in the distribution networks of smart grids [14]. Other scholars have predicted node loads in the smart grid and regard a power node with an abnormal load as a faulty node [15,16]. These authors have studied fault prediction and have obtained effective results in the smart grid. Nevertheless, they do not take the coupled networks into account, since they focus only on voltage or power load. To predict the remaining useful life of rolling bearings, Mao et al. [17] have proposed a data-driven prediction method based on deep feature representation and transfer learning, which includes an offline stage and an online stage. Recognizing the imbalance between robustness and portability, Pan et al. [18] present LightWarner, an easily applicable predictor based on model-free reinforcement learning that learns fault characteristics dynamically without pre-training. Other researchers have studied prediction methods for maintaining a zero-downtime guarantee in mobile scenarios [19]. In 5G networks, Vasilakos et al. [20] provide a software development kit that facilitates the handling of online real-time data.
In this study, we focus on coupled networks and fault prediction. We first provide a coupled model that considers fault correlation; then, inspired by the collaborative filtering algorithm and by the sparse features of the smart grid, we leverage matrix decomposition for fault prediction. Specifically, our contributions are as follows: (i) On the basis of coupled heterogeneous networks, we define a metric of fault correlation that considers both the temporal and the spatial features of the interdependent smart grid. (ii) We transform the coupled networks into four matrices and, using collaborative filtering, propose a matrix decomposition model for predicting the fault probability of nodes. (iii) We propose an online algorithm for fault prediction and determine the key parameters of the prediction model using K-means clustering.
The remainder of this paper is structured as follows. In Section 2, we provide a system model. In Section 3, we explain the prediction algorithms of fault probability and determine the key parameters. In Section 4, we conduct experiments on open data and simulated data. The conclusion is drawn in Section 5.

System Model
In this section, we present a system model from the following aspects: coupled networks, fault correlation, collaborative filtering, and matrix decomposition. Based on this model, we state the specific problem.
2.1. Coupled Networks. How should a power network and a communication network be coupled in a smart grid? Lin et al. [21] investigate a dynamic coupling strategy for interdependent network systems that considers current redistribution and loads. Kong [22] has provided a practical link to the end nodes, which are sensors and actuators that generate data and execute commands from the control center. The authors of [3] consider terminal users and discuss energy balance problems in a power-heat-coupling system. Zhang et al. [23] study the robustness of interdependent networks with multiple-dependence relations.
In our previous work [24], a many-to-many coupling model was proposed to couple a power network with a communication network, because the smart grid contains many intelligent electronic devices. We also demonstrated experimentally that a smart grid is more robust with many-to-many coupling than with one-to-one coupling. Therefore, we use the many-to-many coupling model in this study, as shown in Figure 1.
In this coupling model, power nodes and communication nodes are interdependent and reciprocal, but faults can easily spread across the model. A node failure in one network may lead to the failure of the nodes coupled with it in the other network. These failures propagate continuously through the interdependent networks and may lead to fault cascades [25]. In studies of fault cascades, researchers often use the size of the giant component of the system (the largest connected subgraph of a network) when the cascade stops as the metric of robustness [26].
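To make the cascade and the giant-component metric concrete, the following is a minimal plain-Python sketch. The survival rule is an assumption for illustration only (a node stays functional if it lies in the largest connected component of its own network and keeps at least one functional coupled partner); the function names and the toy four-node networks are hypothetical and are not the model of [24].

```python
from collections import deque


def connected_components(adj, alive):
    """Connected components of the subgraph induced by the `alive` nodes (BFS)."""
    seen, comps = set(), []
    for start in sorted(alive):  # sorted so that ties are broken deterministically
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def giant_component(adj, alive):
    """Largest connected component among the alive nodes (empty set if none)."""
    comps = connected_components(adj, alive)
    return max(comps, key=len) if comps else set()


def cascade(power_adj, comm_adj, coupling, initial_failures):
    """Propagate failures between two coupled networks until nothing changes.

    coupling maps every node to the set of its partners in the other network
    (many-to-many). Assumed survival rule: a node stays functional only if it is
    in the giant component of its own network and at least one partner is functional.
    """
    alive_p = set(power_adj) - set(initial_failures)
    alive_c = set(comm_adj) - set(initial_failures)
    while True:
        giant_p = giant_component(power_adj, alive_p)
        giant_c = giant_component(comm_adj, alive_c)
        new_p = {p for p in giant_p if coupling.get(p, set()) & giant_c}
        new_c = {c for c in giant_c if coupling.get(c, set()) & new_p}
        if new_p == alive_p and new_c == alive_c:
            return alive_p, alive_c
        alive_p, alive_c = new_p, new_c


# Toy example: two 4-node chains with one-to-one coupling p_i <-> c_i.
power = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}
comm = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"], "c4": ["c3"]}
coupling = {f"p{i}": {f"c{i}"} for i in range(1, 5)}
coupling.update({f"c{i}": {f"p{i}"} for i in range(1, 5)})

surv_p, surv_c = cascade(power, comm, coupling, {"p4"})
robustness = (len(surv_p) + len(surv_c)) / (len(power) + len(comm))
print(sorted(surv_p), sorted(surv_c), robustness)
```

Here the failure of p4 removes c4 (its only partner is gone), and the cascade stops with 6 of 8 nodes functional. Because `coupling` maps each node to a set, the same function also runs with many-to-many coupling, where a node survives as long as any one of its partners does.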

Fault Correlation.
The cascade of faults in coupled networks is a time-series event. How do we determine exactly whether two faults at different times and locations are correlated? Answering this question is a prerequisite of fault prediction in coupled networks.
A power node and a communication node that are directly coupled are correlated by definition: if $p_i$ is a power node, $c_i$ is a communication node, and $c_i$ and $p_i$ are a directly coupled pair, then $\mathrm{cor}(c_i, p_i) = \mathrm{cor}(p_i, c_i) = 1$.
For homogeneous nodes, whether two faults are correlated depends on three factors: time interval, spatial distance, and relative location. For instance, two faults at communication nodes $c_1$ and $c_2$ are correlated when they satisfy three conditions, (1a)-(1c). The first is

$$\left|T_{\text{fault1}} - T_{\text{fault2}}\right| < T_l, \qquad (1a)$$

where $T_{\text{fault1}}$ and $T_{\text{fault2}}$ are the failure times of $c_1$ and $c_2$, respectively, so the time interval between the two faults must be less than $T_l$. Equation (1b) limits the spatial distance $\lVert c_1, c_2 \rVert$ between $c_1$ and $c_2$, which can be obtained by Dijkstra's algorithm. Equation (1c) concerns $\mathrm{loc}(c_1, c_2)$, the relative position of $c_1$ and $c_2$, with parameter $L$, which is obtained by breadth-first search: we traverse breadth-first from $c_1$ until $c_2$ is reached, and the number of nodes passed is the relative position. If all three conditions hold, the two faults are correlated, and we write $\mathrm{cor}(c_i, c_j) = \mathrm{cor}(c_j, c_i) = 1$. Similarly, two faults at power nodes are correlated if the three conditions hold, written $\mathrm{cor}(p_i, p_j) = \mathrm{cor}(p_j, p_i) = 1$. Otherwise, the two faults are uncorrelated, and $\mathrm{cor}(p_i, p_j) = \mathrm{cor}(p_j, p_i) = 0$. For heterogeneous faulty nodes, which are on different layers and not directly coupled, $c_i$ and $p_j$ are correlated if there exists a node $p_i$ coupled with $c_i$ such that $\mathrm{cor}(p_i, p_j) = 1$, or there exists a node $c_j$ coupled with $p_j$ such that $\mathrm{cor}(c_i, c_j) = 1$. We then write $\mathrm{cor}(c_i, p_j) = \mathrm{cor}(p_j, c_i) = 1$; that is,

$$\mathrm{cor}(c_i, p_j) = \mathrm{cor}(p_j, c_i) = 1 \iff \exists\, p_i:\ \mathrm{cor}(c_i, p_i) = \mathrm{cor}(p_i, p_j) = 1 \ \ \text{or}\ \ \exists\, c_j:\ \mathrm{cor}(c_j, p_j) = \mathrm{cor}(c_i, c_j) = 1. \qquad (2)$$

With these definitions of fault correlation, we can calculate and analyze fault data from a smart grid and obtain the fault samples that support the subsequent fault prediction.
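The correlation test for two faults in the same network, plus the heterogeneous rule, can be sketched as follows. This is a minimal sketch under stated assumptions: `max_dist` and `max_loc` are hypothetical names for the bounds in (1b) and (1c), the reading of "relative position" as the number of nodes a breadth-first search visits before reaching the target is an interpretation, and the graph is a plain dict of weighted adjacency rather than the authors' data structure.

```python
import heapq
from collections import deque


def dijkstra_distance(adj, src, dst):
    """Shortest-path length src -> dst; adj maps node -> {neighbour: weight}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def bfs_position(adj, src, dst):
    """Number of nodes a BFS from src visits before reaching dst (None if unreachable)."""
    seen, queue, passed = {src}, deque([src]), 0
    while queue:
        u = queue.popleft()
        if u == dst:
            return passed
        passed += 1
        for v in sorted(adj[u]):  # sorted so the visiting order is deterministic
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return None


def homogeneous_cor(adj, a, b, t_a, t_b, T_l, max_dist, max_loc):
    """cor(a, b) for two faulty nodes of the same network, conditions (1a)-(1c)."""
    if abs(t_a - t_b) >= T_l:                    # (1a) time interval
        return 0
    if dijkstra_distance(adj, a, b) > max_dist:  # (1b) spatial distance (assumed bound)
        return 0
    pos = bfs_position(adj, a, b)
    if pos is None or pos > max_loc:             # (1c) relative position (assumed bound)
        return 0
    return 1


def heterogeneous_cor(c_i, p_j, coupling, cor_pp, cor_cc):
    """cor(c_i, p_j) across layers; cor_pp / cor_cc hold correlated pairs as frozensets."""
    if p_j in coupling.get(c_i, set()):          # directly coupled pair
        return 1
    via_power = any(frozenset((p_i, p_j)) in cor_pp for p_i in coupling.get(c_i, set()))
    via_comm = any(frozenset((c_i, c_j)) in cor_cc for c_j in coupling.get(p_j, set()))
    return int(via_power or via_comm)


comm = {"c1": {"c2": 1.0}, "c2": {"c1": 1.0, "c3": 1.0}, "c3": {"c2": 1.0}}
print(homogeneous_cor(comm, "c1", "c3", 0.0, 10.0, T_l=30.0, max_dist=2.0, max_loc=3))  # 1
print(homogeneous_cor(comm, "c1", "c3", 0.0, 45.0, T_l=30.0, max_dist=2.0, max_loc=3))  # 0

coupling = {"c1": {"p1"}, "p1": {"c1"}, "c3": {"p3"}, "p3": {"c3"}}
cor_pp = {frozenset(("p1", "p2"))}
print(heterogeneous_cor("c1", "p2", coupling, cor_pp, set()))  # 1
```

Time values use the same unit as `T_l` (minutes in the toy call, matching the 30 min used later in the experiments). Storing correlated pairs as `frozenset`s makes the symmetric definition $\mathrm{cor}(x, y) = \mathrm{cor}(y, x)$ automatic.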

Collaborative Filtering.
Collaborative filtering is an effective algorithm used in recommendation systems [27,28]. It extracts features from partially known data to predict unknown data with data mining methods such as clustering and classification. In this study, we leverage this method to predict faults while considering interdependence. First, for each node, we model the probability that it causes each other node to fail as an N-dimensional vector with values between 0 and 1. Since large-scale failures are uncommon, historical data fill only a few entries of this vector, so the vector is sparse.
Combining fault correlation with known historical data, we can obtain the number of times a faulty node has affected each of its correlated nodes. We take the ratio of this number to the total number of faults as the probability that the node leads to the fault of the correlated node. Because faults are infrequent in practice, many node pairs have no recorded correlated faults. However, this does not mean that the probability that two nodes cause each other to fail is 0; it only indicates that such faults have not been observed, and the corresponding values need to be predicted. For a smart grid system with N power nodes and M communication nodes, these probabilities form four matrices: power to power (N × N), power to communication (N × M), communication to power (M × N), and communication to communication (M × M). We then perform fault prediction for the unknown entries of these four probability matrices.
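Building the four empirical probability matrices from counts can be sketched as below. The sketch assumes the denominator is the total number of recorded faults of the source node, and all function names and the toy counts are hypothetical; unobserved pairs are kept as `None` with indicator 0, so that they are predicted later instead of being treated as probability 0.

```python
def probability_matrix(fault_counts, caused_counts, rows, cols):
    """Empirical failure-probability matrix R and observation indicator e.

    fault_counts[r]: number of recorded faults of node r.
    caused_counts[(r, q)]: how many of those faults were followed by a correlated
    fault of q. Pairs with no recorded correlated fault stay None (unknown, to be
    predicted) and get e = 0, rather than being treated as probability 0.
    """
    R = [[None] * len(cols) for _ in rows]
    e = [[0] * len(cols) for _ in rows]
    for a, r in enumerate(rows):
        total = fault_counts.get(r, 0)
        for b, q in enumerate(cols):
            n = caused_counts.get((r, q), 0)
            if total > 0 and n > 0:
                R[a][b] = n / total
                e[a][b] = 1
    return R, e


def four_matrices(power, comm, fault_counts, caused_counts):
    """Power->power (N x N), power->comm (N x M), comm->power (M x N), comm->comm (M x M)."""
    layers = (("p", power), ("c", comm))
    return {
        (src, dst): probability_matrix(fault_counts, caused_counts, rows, cols)
        for src, rows in layers
        for dst, cols in layers
    }


power, comm = ["p1", "p2"], ["c1"]
fault_counts = {"p1": 4, "c1": 2}
caused_counts = {("p1", "p2"): 1, ("p1", "c1"): 3, ("c1", "p1"): 1}
mats = four_matrices(power, comm, fault_counts, caused_counts)
print(mats[("p", "c")])  # ([[0.75], [None]], [[1], [0]])
```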

Matrix Decomposition.
Matrix decomposition is one of the tools for collaborative filtering. It approximates a matrix by a product of low-rank factors that capture shared features. Take movie recommendation as an example: some users have the same movie preferences and can be grouped into one category, and different movies have similar styles and characteristics that can be classified in the same way. A large matrix of movie preferences therefore has a low rank, because many of its feature vectors are shared. Let $R_{n\times m}$ be the true matrix and $\hat{R}_{n\times m}$ the predicted matrix. Assuming that the matrix rank is $d$, we get

$$\hat{R}_{n\times m} = \Omega^{T}\Theta, \qquad \Omega \in \mathbb{R}^{d\times n},\ \Theta \in \mathbb{R}^{d\times m}. \qquad (3)$$

The decomposition is shown in Figure 3.
For fault prediction, we use the same decomposition. Many communication nodes and power nodes have similar features, such as electrical load, degree, and centrality. Consequently, we assume that each fault probability matrix has many identical or similar feature vectors, and we rewrite the four original probability matrices as products of eight low-rank matrices:

$$\hat{R}_i = \Omega_i^{T}\Theta_i, \qquad i = 1, 2, 3, 4. \qquad (4)$$

Here $R_i$ denotes the true probability matrix; $\Omega_i$ and $\Theta_i$ are the two latent spaces of failure probability; $d_i$ is the rank of $R_i$; and $\hat{R}_i$ is the predicted probability matrix. To learn from the known samples, we minimize the following loss function:

$$L = \sum_{i=1}^{4}\left[\frac{1}{2}\sum_{r,q} e_{irq}\left(R_{irq} - \Omega_{ir}^{T}\Theta_{iq}\right)^{2} + \frac{\lambda_i}{2}\left(\lVert\Omega_i\rVert_F^{2} + \lVert\Theta_i\rVert_F^{2}\right)\right], \qquad (5)$$

where $i$ indexes the probability matrices; $r$ and $q$ index the rows and columns of a matrix; and $e_{irq}$ indicates whether a failure probability is observed in row $r$ and column $q$ of the $i$th probability matrix ($e_{irq} = 1$ if it is observed and $e_{irq} = 0$ otherwise). The first term is the sum of squared errors between the $i$th failure probability matrix and its prediction, and the second term is the regularization term.
This penalty term prevents overfitting, and $\lambda_i$ is its weight. Thus, Equation (5) measures the error between the four probability matrices and their predictions. We use gradient descent with random initialization to obtain the optimal $\Omega_i$ and $\Theta_i$, iterating until convergence [29].
2.5. Problem Statement. With the coupled model, we formulate the interdependent networks as four matrices and define fault correlation. Because these matrices are similar and sparse, we apply matrix decomposition to them. Our goal is to predict faults in the smart grid from the known samples by collaborative filtering.

Fault Prediction
In this section, we focus on the specific process of fault prediction and detailed implementation, including the selection of rank in a probability matrix and online prediction algorithms.

Rank of Probability Matrix.
To reduce the rank of the probability matrix, we use the traditional K-means algorithm to cluster the nodes of the smart grid, because different power nodes or communication nodes share similar properties. We assume that the number of clusters is related to the rank of the matrix [30]. In the K-means algorithm, the choice of k is important, so how should we choose an appropriate k for a network? We use the elbow rule on the sum of squared errors (SSE) to determine k [31]. The nodes in a smart grid have many features, which we represent as a set $S = \{S_1, S_2, \ldots, S_n\}$. After clustering, the squared error of each node is $S^2 = \sum_{j=1}^{n}\left(S_j - \bar{S}_j\right)^2$, where $\bar{S}_j$ denotes the average value of feature $j$ in the node's cluster. With N power nodes and M communication nodes in the system, the SSE after clustering is

$$\mathrm{SSE} = \sum_{i=1}^{k}\sum_{S \in C_i}\sum_{j=1}^{n}\left(S_j - \bar{S}_{i,j}\right)^{2}, \qquad \sum_{i=1}^{k}\lvert C_i\rvert = N + M, \qquad (6)$$

where $C_i$ is the $i$th cluster and $\bar{S}_{i,j}$ denotes the average value of feature $j$ in $C_i$. In general, SSE decreases as the number of clusters increases, but the slope gradually flattens, as shown in Figure 4. We use the elbow rule to choose an appropriate k by clustering multiple times in the experiments. While the number of clusters is below the true number, each additional cluster reduces SSE substantially; once the number reaches and exceeds the true number, SSE keeps declining, but the decline flattens rapidly. We therefore plot the K-SSE curve and take its inflection point (the elbow), as shown in Figure 4. After obtaining the optimal k for clustering, we set k as the lower limit and $k^2$ as the upper limit of the rank of the probability matrix because, with k clusters in a single network, the coupled networks have at most $k^2$ clusters.
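The elbow procedure can be sketched with a small Lloyd's K-means in plain Python. Several choices here are assumptions rather than the paper's: deterministic farthest-first seeding, and an automatic elbow criterion (the k with the largest ratio SSE(k−1)/SSE(k)) standing in for reading the inflection point off the K-SSE plot. The toy feature points are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans_sse(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first seeding; returns the SSE."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(sq_dist(x, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: sq_dist(x, centers[j]))
            clusters[nearest].append(x)
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def elbow_k(sse):
    """Elbow heuristic: the k with the largest relative drop SSE(k-1) / SSE(k).

    sse[0] is the SSE for k = 1, sse[1] for k = 2, and so on.
    """
    return max(range(2, len(sse) + 1), key=lambda k: sse[k - 2] / max(sse[k - 1], 1e-12))


# Toy features: four well-separated groups of five nodes each (2 features per node).
blobs = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

sse = [kmeans_sse(points, k) for k in range(1, 7)]
k = elbow_k(sse)
rank_range = (k, k * k)  # lower and upper limit for the rank of the probability matrix
print(k, rank_range)     # 4 (4, 16)
```

In practice one would more likely use a library implementation such as scikit-learn's `KMeans`, whose `inertia_` attribute is this within-cluster SSE; the hand-written version only keeps the sketch dependency-free.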

Probability Matrix Decomposition.
In a smart grid, fault propagation, in which the failure of node r leads to the failure of node q, depends on three factors: the state of r, the state of q, and the joint involvement of r and q. Additionally, because of the cascade process, a node fails over a period of time rather than at once. We therefore formulate the prediction as

$$\hat{R}_{rq}(t) = M_r(t) + M_q(t) + \Omega_r(t)^{T}\Theta_q(t), \qquad (7)$$

where $\hat{R}_{rq}(t)$ denotes the probability that q fails because of r during time slot t, and $M_r(t)$ and $M_q(t)$ are the state information of r and q, respectively. Note that r and q can each be either a power node or a communication node. The term $\Omega_r(t)^{T}\Theta_q(t)$ is the matrix decomposition and represents the joint action of r and q. Thus $\hat{R}_{rq}(t)$ is a linear combination of the three factors; because all three are unknown parameters, we weight $M_r(t)$, $M_q(t)$, and $\Omega_r(t)^{T}\Theta_q(t)$ equally. We then update the loss function as

$$L(t) = \sum_{i=1}^{4}\left[\frac{1}{2}\sum_{r,q} e_{irq}(t)\left(R_{irq}(t) - \hat{R}_{irq}(t)\right)^{2} + \frac{\lambda_i}{2}\left(\lVert\Omega_i(t)\rVert_F^{2} + \lVert\Theta_i(t)\rVert_F^{2}\right)\right], \qquad (8)$$

where $L(t)$ is the loss of the four probability matrices at time t. In Equation (8), the first part is the squared error between the predicted and actual results, and the second part is the penalty that prevents overfitting.

Mathematical Problems in Engineering
Summing over all time periods in the historical data gives the global loss function $L = \sum_{t=1}^{T} L(t)$, where each period t spans one fault cascade from its beginning to its end in the historical data.

Online Prediction of Failure Probability.
Conventionally, because the data are sparse, the loss function L is optimized by gradient descent. However, batch gradient descent works best on static data (i.e., a fixed training set). In the smart grid, node data are not static, and the states of nodes must be obtained promptly. For fault prediction, we need to identify the nodes that may fail as quickly as possible to prevent loss. Batch gradient descent is therefore unsuitable for online prediction, so we use stochastic gradient descent to train the matrix decomposition model. At a given time, the loss for nodes r and q is

$$\Delta L = \frac{1}{2}\left(R_{rq} - M_r - M_q - \Omega_r^{T}\Theta_q\right)^{2} + \frac{\lambda}{2}\left(\lVert\Omega_r\rVert^{2} + \lVert\Theta_q\rVert^{2}\right). \qquad (9)$$

Because the time slot t is the same for all samples within one period, we omit t in Equation (9).
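The loss function is then $\sum_{r,q} e_{irq}(t)\,\Delta L$, which we minimize by stochastic gradient descent. That is, we train the parameters $\Omega_r$, $\Theta_q$, $M_r$, and $M_q$ on known data, starting from random initial values and updating them continuously. With the prediction error $\varepsilon_{rq} = R_{rq} - M_r - M_q - \Omega_r^{T}\Theta_q$, the gradient descent steps are

$$M_r \leftarrow M_r + \eta\,\varepsilon_{rq}, \quad M_q \leftarrow M_q + \eta\,\varepsilon_{rq}, \quad \Omega_r \leftarrow \Omega_r + \eta\left(\varepsilon_{rq}\Theta_q - \lambda\Omega_r\right), \quad \Theta_q \leftarrow \Theta_q + \eta\left(\varepsilon_{rq}\Omega_r - \lambda\Theta_q\right), \qquad (10)$$

where η is the learning rate that controls the step size of each iteration. The iteration stops when the loss function converges. In any time period t, we feed in fault data to build the fault probability matrix when nodes fail. If a node satisfies Equations (1b) and (1c) and its fault probability exceeds a threshold, we predict that it will fail. Specifically, we judge that a node will fail when

$$\text{(1b) and (1c) hold for } (c_1, c_2) \quad \text{and} \quad P(c_1 \rightarrow c_2) > \Psi_{c_2}, \qquad (11)$$

where $c_1$ denotes a failed node, $c_2$ is the node to be predicted, $P(c_1 \rightarrow c_2)$ is the probability, computed by the model, that the failure of $c_1$ causes the failure of $c_2$, and $\Psi_{c_2}$ is a threshold.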
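The prediction of Equation (7), the per-pair loss of Equation (9), its stochastic gradient steps, and the threshold rule can be sketched together. This is a minimal sketch under assumptions: each node's state term is modeled as one learned scalar `M[v]`, only the latent vectors are regularized, and the class name, method names, hyperparameters, and toy samples are hypothetical.

```python
import random


class FaultProbabilityModel:
    """R_hat[r][q] = M[r] + M[q] + dot(Omega[r], Theta[q]), as in Eq. (7).

    M[v] is a learned scalar standing in for the state term of node v;
    Omega[v] / Theta[v] are d-dimensional latent vectors used when v is the
    failing node (row) or the affected node (column), respectively.
    """

    def __init__(self, nodes, d, lam=0.01, eta=0.05, seed=0):
        rng = random.Random(seed)
        self.lam, self.eta = lam, eta
        self.M = {v: 0.0 for v in nodes}
        self.Omega = {v: [rng.uniform(0, 0.1) for _ in range(d)] for v in nodes}
        self.Theta = {v: [rng.uniform(0, 0.1) for _ in range(d)] for v in nodes}

    def predict(self, r, q):
        joint = sum(a * b for a, b in zip(self.Omega[r], self.Theta[q]))
        return self.M[r] + self.M[q] + joint

    def sgd_step(self, r, q, target):
        """One stochastic gradient step on a single observed pair (r, q)."""
        err = target - self.predict(r, q)
        om, th = self.Omega[r], self.Theta[q]
        # Both latent vectors are updated from their pre-step values.
        self.Omega[r] = [o + self.eta * (err * t - self.lam * o) for o, t in zip(om, th)]
        self.Theta[q] = [t + self.eta * (err * o - self.lam * t) for o, t in zip(om, th)]
        self.M[r] += self.eta * err
        self.M[q] += self.eta * err
        return err

    def will_fail(self, failed, candidate, threshold, structural_ok):
        """Threshold rule: conditions (1b)-(1c) hold and P(failed -> candidate) > threshold."""
        return structural_ok and self.predict(failed, candidate) > threshold


model = FaultProbabilityModel(["a", "b", "c"], d=2)
samples = [("a", "b", 0.8), ("a", "c", 0.1), ("b", "c", 0.5)]
for epoch in range(2000):
    for r, q, target in samples:
        model.sgd_step(r, q, target)

print(round(model.predict("a", "b"), 2), model.will_fail("a", "b", 0.5, True))
```

`structural_ok` stands for the outcome of the (1b)-(1c) checks, for example from a correlation test like the one sketched in the fault-correlation subsection; keeping it as a plain boolean argument lets the probability model stay independent of the graph code.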
In a smart grid, the states of power nodes and communication nodes are likely to change, so the interdependencies between nodes also change over time. Consequently, we cannot rely on a fixed prediction model and must update it promptly. In this study, we introduce an expiration time for data and discard fault samples older than the expiration time.

Algorithm Description.
We propose a Failure Probability Prediction (FPP) algorithm to predict faults and an Online Model uPdate (OMP) algorithm to update the model. The FPP algorithm (Algorithm 1) analyzes the historical data and trains the parameters. First, we calculate the fault correlation and initialize the fault probability between nodes (lines 1 and 2). Then we obtain the low rank of the fault probability matrix by cluster analysis of the networks (line 3). We randomly initialize the low-rank matrices $\Omega_r$ and $\Theta_q$ (line 4). After these steps, we use gradient descent to train the parameters of the model (lines 5-7). The trained parameters are the input of Algorithm 2.
What is the complexity of the FPP algorithm? Given the sparsity, the complexity of Dijkstra's algorithm is $O(n^2)$ and that of breadth-first search is $O(n)$ in the worst case, where n is the number of nodes. In step 2, we traverse the four matrices, which takes $O(n^2)$. In step 3, K-means takes $O(t \times n \times k)$, where t is the number of iterations and k is the number of clusters. In steps 5-7, stochastic gradient descent takes $O(1)$ because of its quick convergence. Matrix multiplication takes $O(n^{\log_2 7})$ with the Strassen algorithm. Overall, the time complexity of FPP is $O(n^{\log_2 7})$. As for space complexity, an adjacency list is used to represent the matrix, which takes $O(n + e)$, where e is the number of edges. K-means consumes $O(m \times k \times n)$, where m is the size of the data. Overall, the space complexity is $O(n)$.
We update the online prediction model in Algorithm 2. When new fault data are found, we first calculate the correlation of the faulty nodes (lines 3 and 4). If the two nodes are correlated, we renew the fault probability matrix, and the corresponding parameters of the model are then updated by stochastic gradient descent (lines 5-11). If they are not correlated, we select a historical sample instead and check whether it has expired (line 14). If the sample has not expired, we use it to train the parameters; if it has expired, we set $e_{ij}(t)$ to 0 to indicate that the data are discarded. We then wait for the next fault data sample (lines 12-20). The complexity of the OMP algorithm is as follows. In step 4, fault correlation takes $O(n^2)$ according to the previous analysis. Initializing and updating the matrix takes $O(n)$ in steps 8 and 10. In step 16, Equation (6) takes $O(1)$. Therefore, the time complexity is $O(n^2)$. With regard to space complexity, the fresh data must be stored and the matrix updated, which takes $O(n + e)$, where e is the number of edges. Since e is not larger than n owing to sparsity, the space complexity is $O(n)$.
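The control flow of the online update, including sample expiration, can be sketched as follows. This is a minimal sketch: the class and method names are hypothetical, the training step is passed in as a callable (for instance, one stochastic gradient step of a model like the one sketched above), and random replay of a single historical sample per uncorrelated input is an assumption about how "select a historical sample" is realized.

```python
import random


class OnlineUpdater:
    """Sketch of the online-update loop with expiring samples.

    sgd_step: callable (r, q, target) -> error, e.g. one SGD step of the model.
    ttl: expiration time of a stored fault sample (same time unit as `now`).
    """

    def __init__(self, sgd_step, ttl, seed=0):
        self.sgd_step = sgd_step
        self.ttl = ttl
        self.history = []              # stored samples: (timestamp, r, q, target)
        self.rng = random.Random(seed)

    def on_sample(self, now, r, q, target, correlated):
        """Handle one incoming fault sample observed at time `now`."""
        if correlated:                 # renew R_rq and train on the fresh sample
            self.history.append((now, r, q, target))
            return self.sgd_step(r, q, target)
        return self.replay(now)        # otherwise fall back to a historical sample

    def replay(self, now):
        """Train on a random unexpired historical sample; drop expired ones."""
        while self.history:
            i = self.rng.randrange(len(self.history))
            stamp, r, q, target = self.history[i]
            if now - stamp > self.ttl:  # expired: discard it (its indicator e becomes 0)
                self.history.pop(i)
                continue
            return self.sgd_step(r, q, target)
        return None                    # nothing usable; wait for new fault data


trained = []


def record_step(r, q, target):
    """Stand-in for a real SGD step: just records what it was asked to train on."""
    trained.append((r, q, target))
    return 0.0


updater = OnlineUpdater(record_step, ttl=60)
updater.on_sample(0, "c1", "c2", 0.7, correlated=True)     # fresh correlated sample
updater.on_sample(30, "c3", "c4", 0.0, correlated=False)   # replays the t=0 sample
result = updater.on_sample(100, "c5", "c6", 0.0, correlated=False)  # t=0 sample expired
```

Dropping expired samples at replay time keeps the stored history bounded by the expiration window, which matches the text's point that stale interdependencies should stop influencing the model.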

Experiments and Results
In this section, we present the evaluation metrics and the experimental results.

Evaluation Metrics.
Accuracy rate and recall rate [14] are widely used metrics in forecasting, and we also use them in our experiments. The accuracy rate is the ratio of the number of correctly predicted faults to the total number of predictions. The recall rate is the ratio of the number of correctly predicted faults to the total number of faults. A higher accuracy rate means a higher percentage of correct predictions, while a higher recall rate means that more target faults are found. Conversely, a low accuracy rate indicates many false alarms, and a low recall rate means that many faults are not predicted. A high-quality predictor has both a high accuracy rate and a high recall rate; in practice, however, a compromise between the two metrics is often necessary [32].
The prediction results are described by four terms: TN (true normal), a normal node correctly predicted as normal; FN (false normal), a faulty node incorrectly predicted as normal; TF (true fault), a faulty node correctly predicted as faulty; and FF (false fault), a normal node incorrectly predicted as faulty. These terms are shown in Table 1.
Based on these four terms, we obtain the accuracy rate (P) and recall rate (R) for fault nodes and normal nodes as follows. $P_F$, the accuracy rate of fault nodes, is the ratio of the number of correctly predicted faults to the total number of predicted fault nodes:

$$P_F = \frac{TF}{TF + FF}. \qquad (12)$$

$P_N$, the accuracy rate of normal nodes, is the ratio of the number of correctly predicted normal nodes to the total number of predicted normal nodes:

$$P_N = \frac{TN}{TN + FN}. \qquad (13)$$

With Equations (12) and (13), we get the average accuracy rate:

$$P = \frac{P_F + P_N}{2}. \qquad (14)$$

$R_F$, the recall rate of fault nodes, is the ratio of the number of correctly predicted faulty nodes to the total number of faulty nodes:

$$R_F = \frac{TF}{TF + FN}. \qquad (15)$$

$R_N$, the recall rate of normal nodes, is the ratio of the number of correctly predicted normal nodes to the total number of normal nodes:

$$R_N = \frac{TN}{TN + FF}. \qquad (16)$$

Similar to Equation (14), we obtain the average recall rate:

$$R = \frac{R_F + R_N}{2}. \qquad (17)$$

In addition, we use SSE and time metrics to decide the parameters in machine learning.

Algorithm 2: Online Model uPdate (OMP).
Input: $\Omega_r$, $\Theta_q$, $M_r$, $M_q$ from Algorithm 1; model parameters λ, η; the continuous fault data $(t, \{c_i, c_j\})$ from actual networks.
Output: Real-time, continuously updated prediction.
1. Repeat
2. Collect fresh fault data
3. If a new data sample $(t, c_i, c_j)$ is received then
4. Calculate fault correlation $\mathrm{cor}(c_i, c_j)$ ⊳ by Equations (1a)-(2)
…
7. Update $R_{ij}$ in the matrix of failure probability
…
10. Update the values of $\Omega_r$, $\Theta_q$, $M_r$, $M_q$ ⊳ by Equation 6
…
13. Select a historical fault sample $(t, c_i, c_j)$ randomly
…
17. until converged
18. Else
…
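The accuracy and recall rates defined above can be computed from per-node outcomes as in the following sketch. It assumes the averages are arithmetic means of the fault-node and normal-node rates, and the function name and toy labels are hypothetical.

```python
def prediction_metrics(actual, predicted):
    """Accuracy and recall rates for fault and normal nodes, plus their averages.

    actual, predicted: equal-length sequences of booleans, True meaning "fault".
    """
    pairs = list(zip(actual, predicted))
    tf = sum(1 for a, p in pairs if a and p)          # fault predicted as fault
    ff = sum(1 for a, p in pairs if not a and p)      # normal predicted as fault
    tn = sum(1 for a, p in pairs if not a and not p)  # normal predicted as normal
    fn = sum(1 for a, p in pairs if a and not p)      # fault predicted as normal

    def ratio(num, den):
        return num / den if den else 0.0

    p_f, p_n = ratio(tf, tf + ff), ratio(tn, tn + fn)
    r_f, r_n = ratio(tf, tf + fn), ratio(tn, tn + ff)
    return {"P_F": p_f, "P_N": p_n, "P": (p_f + p_n) / 2,
            "R_F": r_f, "R_N": r_n, "R": (r_f + r_n) / 2}


actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
m = prediction_metrics(actual, predicted)
print({name: round(value, 3) for name, value in m.items()})
```

With three real faults, two of which are caught, and one false alarm among five normal nodes, both fault-node rates are 2/3 and both normal-node rates are 4/5.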

Experimental Results.
We conduct the experiments on Windows 10 with Python 3.7 and an Intel(R) Core(TM) i5-9600K CPU. Data are available from academic projects on state estimation, visualization, disturbance detection, and prediction in power systems, including EPFL Smart Grid (http://smartgrid.epfl.ch/), openPDC (https://github.com/GridProtectionAlliance/openPDC), and FNET/GridEye (http://fnetpublic.utk.edu/). Although the data from these projects are valid, fault data in the smart grid are scarce, so we also use simulated data to train the model parameters. We collect 2,815 records of operation data from an electric power company. For fault correlation, the time parameter $T_l$ is 30 min. We then use Python and NetworkX to set up the simulations, and each simulation result is the average of over 50 runs. Table 2 lists the fault correlation. Of the 345 fault events, 158 are independent faults when D = 2, which is less than half of the events. Two-correlative faults account for a relatively large proportion, 32.46%, while three-correlative and four-correlative faults account for 10.72% and 5.50%, respectively. Table 2 thus shows that more than half of the faults are correlated. Figure 5 shows the SSE with respect to the number of clusters for different numbers of nodes. Generally, SSE decreases as the number of clusters increases, regardless of node type (power node or communication node). Nevertheless, for a specific network, there is an optimal number of clusters. For example, the optimal number of clusters is 4 in Figure 5(a), where the number of nodes is 1,000; we judge it optimal because it is the inflection point. With 8,000 nodes, the optimal number is 8, as shown in Figure 5(d).
Figure 6 shows the optimal number of clusters for different numbers of nodes. The optimal number of clusters grows with the number of nodes, but its growth rate steadily falls. We also fit a curve to this relationship. The curve rises with the number of nodes until the optimal number of clusters reaches 7 and then levels off; with 8,000 nodes, the optimal number of clusters is 8, and the growth rate is close to 0. We therefore conclude that 4-8 clusters suit most smart grid systems, with the specific number decided by the number of nodes.
Since the optimal number of clusters in a single network is k, the coupled networks have at most $k^2$ clusters. We provide fault prediction results for different numbers of nodes, with the rank of the probability matrix selected between k and $k^2$. For brevity, $P_{F1}$, $P_{N1}$, $P_{F2}$, and $P_{N2}$ denote the prediction accuracy rate of fault nodes in Algorithm 1, of normal nodes in Algorithm 1, of fault nodes in Algorithm 2, and of normal nodes in Algorithm 2, respectively. Likewise, $R_{F1}$, $R_{N1}$, $R_{F2}$, and $R_{N2}$ denote the corresponding prediction recall rates. Finally, P1, P2, F1, and F2 denote the average prediction accuracy rate in Algorithm 1, the average prediction accuracy rate in Algorithm 2, the average prediction recall rate in Algorithm 1, and the average prediction recall rate in Algorithm 2, respectively.
Figures 7-9 show the relationship between the metrics (accuracy rate, recall rate, and average recall or accuracy rate) and the rank of the probability matrix for 1,000, 3,000, and 6,000 nodes, respectively. In each figure, panel (a) shows the accuracy rate, panel (b) the recall rate, and panel (c) the average recall rate or average accuracy rate, each as a function of the rank of the probability matrix.
Figures 7-9 show that Algorithm 2 outperforms Algorithm 1 in accuracy rate and recall rate for both faulty nodes and normal nodes. The reason is that Algorithm 2 updates outdated information, which increases the accuracy rate and recall rate of the prediction. In addition, the metrics show an overall increasing trend as the rank of the fault probability matrix increases from k to $k^2$. When the rank approaches $k^2$, however, the metrics decrease, owing to overfitting caused by the small penalty factor in the algorithm. Interestingly, Algorithm 1 and Algorithm 2 perform best when the rank of the fault probability matrix is near $k + \frac{2}{3}\left(k^2 - k\right)$. For different numbers of nodes, the exact rank for a specific matrix must be determined experimentally.
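As a worked example of this empirical rule, the sketch below computes the search range and the suggested starting rank for a few cluster counts. The function names are hypothetical, rounding to the nearest integer is an assumption, and the result is only a starting point for the experimental tuning the text calls for.

```python
def rank_bounds(k):
    """Search range for the rank of the probability matrix: from k up to k^2."""
    return k, k * k


def suggested_rank(k):
    """Empirical starting point k + (2/3)(k^2 - k), rounded to the nearest integer."""
    return round(k + 2 * (k * k - k) / 3)


for k in (4, 6, 8):
    print(k, rank_bounds(k), suggested_rank(k))
# 4 (4, 16) 12
# 6 (6, 36) 26
# 8 (8, 64) 45
```

For k = 4, for instance, the range is 4 to 16 and the rule suggests starting near 4 + (2/3)(16 - 4) = 12.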
For online fault prediction, the training time (in seconds) until the model converges matters. Figure 10 shows that the training time until convergence increases with the rank of the initial probability matrix. Figures 10(a) and 10(b) show the required training time of Algorithm 1 and Algorithm 2, respectively; in Figure 10(b), the model of Algorithm 2 converges in less than 1 min. As concluded from Figures 7-9, an appropriate increase in the rank of the probability matrix improves the accuracy rate and recall rate, but the training time increases accordingly, so there is a tradeoff between the requirements. Specifically, if real-time prediction is in high demand, a lower rank has to be used, at the cost of some accuracy rate and recall rate. In practice, the recall rate is more important than the accuracy rate for fault prediction in smart grid systems: if we miss a node fault, the fault may cascade through the coupled networks and interrupt the energy supply on a large scale, which is much more serious than incorrectly predicting that a normal node will fail. To favor the recall rate, we lower the probability threshold for critical node failures below the default of 50%. Figures 11(a) and 11(b) show the accuracy rate and recall rate for different probability thresholds in Algorithm 1 and Algorithm 2, respectively, with 3,000 nodes and a probability matrix of rank 16. Figure 11 shows that reducing the probability threshold of fault occurrence significantly improves the recall rate for faulty nodes, but it decreases the accuracy rate at the same time. For normal nodes, reducing the threshold increases the accuracy rate and decreases the recall rate.
The experimental results demonstrate that, when recall is the main concern, reducing the probability threshold of fault occurrence is effective.
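The threshold tradeoff for faulty nodes can be reproduced on toy numbers with the sketch below. The scores and labels are hypothetical, not the paper's data; the strict comparison `>` mirrors the threshold rule for predicting a failure.

```python
def fault_precision_recall(scores, labels, threshold):
    """Accuracy rate (precision) and recall rate of the fault class at one threshold.

    scores: predicted fault probabilities; labels: True if the node actually failed.
    """
    predicted = [s > threshold for s in scores]
    tf = sum(p and y for p, y in zip(predicted, labels))
    ff = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tf / (tf + ff) if tf + ff else 1.0
    recall = tf / (tf + fn) if tf + fn else 1.0
    return precision, recall


scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.1]  # first four nodes actually failed
labels = [True, True, True, True, False, False, False, False]

for t in (0.5, 0.25):
    p, r = fault_precision_recall(scores, labels, t)
    print(t, round(p, 3), r)
```

Lowering the threshold from 0.5 to 0.25 raises the recall rate from 0.5 to 1.0 while the accuracy rate drops from 2/3 to 4/7, the same direction of change reported for faulty nodes in Figure 11.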
We experimentally compare our FPP algorithm with OMP, and we also contrast them with other algorithms, namely random forests (RF) [33], artificial neural networks (ANN) [34], and support vector machines (SVC) [35], in Table 3. Table 3 shows that FPP outperforms SVC in accuracy rate and recall rate but does not dominate RF and ANN, while OMP outperforms all the other algorithms in accuracy rate and recall rate. In addition, our model has a low training time, so it fits smart grid systems well, where faults usually require a timely response.
Figure 7: Metrics and rank of the probability matrix with 1,000 nodes. (a) Accuracy rate with 1,000 nodes; (b) recall rate with 1,000 nodes; (c) average recall rate or accuracy rate with 1,000 nodes.

Conclusion
The smart grid, which interconnects power grids with communication networks, supports advanced functionality for delivering electricity efficiently. However, such interdependencies weaken robustness and make the system more susceptible to faults, so fault prediction is significant in coupled networks. In this paper, inspired by collaborative filtering in recommender systems, we use matrix decomposition to predict node faults in smart grid systems. Based on fault correlation and coupled networks, we propose a fault probability model and algorithms to predict faults online. We also carry out experiments to validate the model and algorithms. In addition to some interesting observations, we provide the relationship between the rank of the probability matrix and the accuracy rate and recall rate.
In future work, we will focus on fault tolerance across the life cycle of faults. Fault tolerance is of great importance for interdependent networks in building a strong, flexible, and reliable smart grid.
Figure 9: Metrics and rank of the probability matrix with 6,000 nodes. (a) Accuracy rate with 6,000 nodes; (b) recall rate with 6,000 nodes; (c) average recall or accuracy rate with 6,000 nodes.

Data Availability
The data used to support the findings of this study are listed in the Experimental Results section of this article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.