Modeling of Failure Prediction Bayesian Network with Divide-and-Conquer Principle

For system failure prediction, automatically modeling from a historical failure dataset is one of the challenges in practical engineering fields. In this paper, an effective algorithm is proposed to build the failure prediction Bayesian network (FPBN) model with data mining technology. First, the conception of FPBN is introduced to describe the states of components and the system and the cause-effect relationships among them. The types of network nodes, the directions of network edges, and the conditional probability distributions (CPDs) of nodes in FPBN are discussed in detail. According to the characteristics of nodes and edges in FPBN, a divide-and-conquer principle based algorithm (FPBN-DC) is introduced to build the best FPBN network structures of the different types of nodes separately. Then, the CPDs of the nodes in FPBN are calculated by the maximum likelihood estimation method based on the built network. Finally, a simulation study of a helicopter convertor model is carried out to demonstrate the application of FPBN-DC. According to the simulation results, the FPBN-DC algorithm can reach better fitness values with fewer iterations, which verifies its effectiveness and efficiency compared with the traditional algorithm.


Introduction
With the development of information and computer technologies, modern systems have become more complex, and the relationships among systems have also become more complicated. To fulfil the high demands of system safety, operational efficiency, and life cycle cost, the key objective is to predict the system state and warn of potential failures with the help of advanced methods, which could avoid great losses before a failure happens [1].
Failure prediction approaches can be divided into three types: experience-based, condition-based, and model-based methods [2]. Li et al. [3] performed a reliability analysis with an emphasis on predicting the lifetime of a diesel engine's turbocharger, in which the failure mode and criticality information are fully utilized. Wang and Jiang [4] evaluated the degradation of complex system performance using condition monitoring information based on the support vector machine (SVM). Zhang et al. [5] proposed a particle swarm optimization based SVM model for software reliability prediction. Although many interesting methods have been proposed for failure prediction, the model-based method has played a more important role in engineering fields for its advantages in effectiveness and efficiency.
Data mining, also referred to as knowledge discovery, is defined as the process of extracting nontrivial, implicit, previously unknown, and potentially useful information from databases [6]. With the wide application of maintenance information management systems, operation data can be collected easily. Hence, many scientists and engineers have applied artificial intelligence or statistical methods to establish failure prediction models. For instance, Chen et al. [7] proposed a manufacturing defect detection method using the technique of association rule mining. Han et al. [8] used sequential association rule mining to extract the failure patterns and forecast failure sequences of Republic of Korea Air Force aircraft with various combination states of aircraft type, location, mission, and season. Dong [9] described the concepts, models, algorithms, and applications of hidden Markov models and hidden semi-Markov models in engineering asset health prognosis.
A Bayesian network (BN) is a directed acyclic graph which can represent uncertain knowledge by describing relationships and influences among variables [10]. Built upon Bayes' theorem, BN is designed to obtain posterior probabilities of unknown variables from known probabilistic relationships. Moreover, with the help of graphical diagrams consisting of nodes and edges, a BN can be understood more easily than many other techniques. Owing to these advantages, BN has gained great popularity for solving system modeling problems in broad engineering fields [11,12]. Particularly for system reliability prediction, Muller et al. [13] formulated a dynamic prognosis BN model with knowledge from functional and dynamic modeling. Langseth and Portinale [14] proposed a BN modeling framework which could translate a standard fault tree to a BN. Doguc and Ramirez-Marquez [15] studied a BN construction method for system reliability estimation and provided a step-by-step illustration of the method. Weber and Jouffe [16] developed the dynamic object-oriented BN by integrating system functioning and malfunctioning knowledge. Mahadevan et al. [17] applied BN to structural system reliability reassessment and validated it by analytical comparison.
Generally, it is not easy to build and quantify a BN's relationships for practical cases based only on expert opinions, especially for uncertain reliability prediction problems. Because system operation data are abundant in quantity and various in characteristics, this paper introduces an expanded BN model to describe the failure prediction process for complex systems under uncertainty and proposes a divide-and-conquer principle based data mining algorithm to build the corresponding model.
The rest of this paper is organized as follows. Section 2 describes the failure prediction Bayesian network (FPBN), including node types, edge directions, and conditional probability distributions (CPDs). In order to facilitate FPBN modeling with failure data, a divide-and-conquer principle based modeling method is proposed in Section 3. With the helicopter convertor case, Section 4 illustrates the application of the proposed FPBN modeling method. Section 5 concludes this study and gives several possible future research topics.

Failure Prediction Bayesian Network
By inheriting the advantages of BN, the FPBN is introduced to describe the states of components and the system and the cause-effect relationships among them for system failure prediction [18]. An FPBN is also described by ⟨X, A, P⟩, where X represents nodes, A represents edges, and P represents CPDs. However, some practical assumptions are imposed on the nodes and edges in FPBN according to the characteristics of failure prediction tasks. Consequently, the FPBN can perform more efficiently than a traditional BN in the field of system failure prediction. A simple example is shown in Figure 1.

Types of Nodes in FPBN.
In a BN, all nodes represent variables which have equal status. In an FPBN, nodes reflect the states of components or the system in practical engineering systems. According to their roles in the system failure prediction process, the node set X is divided into three subsets as X = C ∪ M ∪ E, including the failure cause subset C, the failure mode subset M, and the failure detection subset E. As in a BN, the values of all nodes in FPBN are discrete and mutually exclusive. For binary systems, each node has two states: functioning as 0 and failure as 1. For multistate systems, there are more failure states within a node, which are represented as {0, 1, 2, . . . , k}.
2.1.1. Failure Cause Nodes. These nodes describe the root causes of a certain failure mode. In the failure prediction process, the possible states of a failure cause node can be derived from detected information, reliability estimation, or expert initialization.
2.1.2. Failure Mode Nodes. The failure mode nodes represent the operational state of the system, which is the final object of the failure prediction task. There is usually only one failure mode node, which is the objective of the failure prediction.
2.1.3. Failure Detection Nodes. Failure detection nodes describe the detectable states of certain sensors, lights, or alarms. They are affected by the failure cause nodes or the failure mode node in the FPBN model.

Directions of Edges in FPBN.
When dealing with system failure prediction problems in practice, maintenance engineers usually use the failure detection information to diagnose the possible states of the corresponding failure causes and integrate the failure cause states to estimate the probability of the failure mode.
In a traditional BN, an edge represents the relationship between any two nodes, while in an FPBN, each edge a_ij in A indicates a cause-effect relationship between x_i and x_j: x_i is the cause of x_j, and x_j is the effect of x_i. In particular, the directions of edges between different types of node subsets are initialized. As shown in Figure 1, the failure mode node VL can only be affected by the failure cause nodes (HP, HV); the failure cause nodes (HP, HV) and the failure mode node VL can be revealed by the failure detection node HT. But within the same node subset, such as (HP, HV), there is no restriction on the direction of the edge between the nodes. Such edges can only be determined by the operation dataset or expert knowledge. The directions of edges between different node subsets in FPBN are consistent with the reasoning process of failure prediction tasks, so the practical FPBN is easy for maintenance engineers to understand.
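As an illustration of these direction restrictions, the following minimal Python sketch (function and set names are our own, not part of the paper; node names are taken from Figure 1) checks whether a candidate edge is admissible between the three subsets:

```python
# Admissible edge directions between subsets: causes may point to other
# causes, to the failure mode, or to detections; the mode may point to
# detections; detection nodes have no children.
ALLOWED = {("C", "C"), ("C", "M"), ("C", "E"), ("M", "E")}

def edge_allowed(parent_subset: str, child_subset: str) -> bool:
    """Return True if an edge parent -> child is admissible in an FPBN."""
    return (parent_subset, child_subset) in ALLOWED

# Examples matching Figure 1: HP, HV are causes (C), VL is the mode (M),
# HT is a detection node (E).
assert edge_allowed("C", "M")      # HP -> VL is admissible
assert edge_allowed("M", "E")      # VL -> HT is admissible
assert not edge_allowed("E", "C")  # a detection node cannot point back to a cause
```

Within a single subset only C permits internal edges, which is why only the cause subset requires a general structure search later on.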

Conditional Probability Distributions in FPBN.
The FPBN has the same meaning of P as a traditional BN. The set P = {P(x_i | pa(x_i)), x_i ∈ X} represents the CPD of each node, which expresses the intensity of relevance between x_i and its father nodes pa(x_i) ⊆ {x_1, . . . , x_{i-1}, x_{i+1}, . . . , x_n}. For root nodes, which have no father node, the CPDs are replaced with the corresponding prior probability distributions.
As in a BN, a node in an FPBN is conditionally independent of its nondescendant nodes if the states of all its father nodes are known. When the actual states of the failure detection nodes are input, the FPBN model is operated with the CPDs to estimate the states of the failure cause nodes, which in turn determine the state of the failure mode node.
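To make this reasoning direction concrete, here is a small Python sketch of exact inference by enumeration over the joint distribution: it computes the posterior probability of a cause node given an observed detection state. All names and probability values are illustrative assumptions, not parameters from the paper.

```python
from itertools import product

def posterior(model, states, query, evidence):
    """Exact inference by enumeration: P(query | evidence).
    model maps node -> (father_tuple, {father_states: {state: prob}})."""
    nodes = list(model)
    weights = {}
    for assign in product(*[states[n] for n in nodes]):
        rec = dict(zip(nodes, assign))
        # Skip assignments inconsistent with the observed detection states.
        if any(rec[k] != v for k, v in evidence.items()):
            continue
        # Joint probability: product of every node's CPD entry.
        p = 1.0
        for n, (pa, table) in model.items():
            p *= table[tuple(rec[q] for q in pa)][rec[n]]
        weights[rec[query]] = weights.get(rec[query], 0.0) + p
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

# Toy two-node FPBN: cause C with prior P(C=1)=0.2, detection E with
# P(E=1|C=0)=0.1 and P(E=1|C=1)=0.8 (numbers are illustrative only).
model = {
    "C": ((), {(): {0: 0.8, 1: 0.2}}),
    "E": (("C",), {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}),
}
states = {"C": [0, 1], "E": [0, 1]}
post = posterior(model, states, query="C", evidence={"E": 1})
assert abs(post[1] - 2 / 3) < 1e-9  # P(C=1 | E=1) = 0.16 / 0.24
```

Enumeration is exponential in the number of nodes and is shown only for clarity; practical FPBN inference would use a standard BN inference engine.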

BN Modeling Method Based on Data Mining.
Since building an objective BN model from expert experience alone is not easy, learning a practical model from datasets with data mining methods has attracted considerable attention recently [19].
The BN modeling process usually consists of two parts: learning the BN structure, which is represented by nodes and edges, and learning the BN parameters, which specify the CPDs of the BN.
The key problem of learning a BN structure from a dataset is to find the most proper network structure which accurately represents the potential relationships in the dataset. Because learning the BN structure from a dataset is an NP-hard problem for large networks, conditional independence test based algorithms and score and search based algorithms have been proposed separately to settle this challenge [20]. The former method discovers the potential conditional independence relationships of nodes from the dataset with conditional independence test equations and builds the BN based on such relationships [21]. In the score and search based methods, a score function is used as the criterion to represent how well a candidate network structure fits the dataset, while a searching algorithm is applied to find the best structure with the highest score among all candidate network structures. Some popular score functions include the Cooper-Herskovits function [22], the Bayesian information criterion (BIC) function [23], and the minimum description length [24]. In the searching algorithms, the BN structure is usually encoded as an ordered string or a connection matrix, while different operators have been designed and employed to find the structure with the highest score. Some common algorithms include the genetic algorithm [25], evolutionary programming [26], ant colony optimization [27], integer linear programming [28], globally parallel learning [29], and heuristic equivalent learning [30].
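As an illustration of a score function, the following Python sketch (data layout and function names are our own assumptions) computes a BIC-style score of a candidate structure over a dataset of discrete records, in the usual log-likelihood-minus-complexity-penalty form:

```python
import math
from collections import Counter

def bic_score(data, fathers, states):
    """BIC score of a candidate structure.
    data: list of dicts {node: state}; fathers: dict node -> tuple of
    father names; states: dict node -> list of possible states."""
    m = len(data)
    score = 0.0
    for node, pa in fathers.items():
        # m_ijk: records with the node in state k and fathers in configuration j.
        m_ijk = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
        # m_ij: records with the fathers in configuration j.
        m_ij = Counter(tuple(r[p] for p in pa) for r in data)
        score += sum(c * math.log(c / m_ij[j]) for (j, _k), c in m_ijk.items())
        q_i = math.prod(len(states[p]) for p in pa)  # father configurations
        score -= (math.log(m) / 2) * q_i * (len(states[node]) - 1)
    return score

# On data where B always copies A, the structure with edge A -> B fits better.
data = [{"A": a, "B": a} for a in (0, 1)] * 4
states = {"A": [0, 1], "B": [0, 1]}
assert bic_score(data, {"A": (), "B": ("A",)}, states) > \
       bic_score(data, {"A": (), "B": ()}, states)
```

Note that the score decomposes node by node, which is the property the FPBN-DC algorithm of Section 3 relies on.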
Such algorithms are mainly proposed for general BN structure learning, where no restriction is applied to the directions of edges. There is also another kind of BN structure learning where the sequence of all nodes is known and a latter node can only be a child node of a former node. The famous K2 algorithm deals with this kind of BN structure learning problem well with deterministic searching [22]. The FPBN actually poses a new kind of BN structure learning problem, where the sequence of the node subsets is known and a node in a latter subset can only be a child of a node in a former subset. The K2 algorithm cannot be applied to FPBN structure learning because it requires a complete node sequence rather than a subset sequence. The general BN structure learning algorithms usually consume a lot of time when the number of nodes is large and the edges between nodes are complex. Therefore, the characteristics of the nodes and edges in FPBN should be exploited to decrease the number of possible candidate network structures and limit the searching space.

Structure Learning of FPBN with Divide-and-Conquer
Principle. In computer science, the divide-and-conquer principle is an important algorithm design paradigm based on multibranched recursion [31]. A divide-and-conquer algorithm breaks down a problem into two or more subproblems of the same type which are simple enough to be solved directly. The solutions to the subproblems are then combined to give a solution to the original problem. The correctness of a divide-and-conquer algorithm can be proved by mathematical induction, and its computational cost is often determined by solving recurrence relations.
According to the directions of edges in FPBN, it is clear that (1) the father nodes of failure detection nodes may belong to the failure cause nodes and the failure mode node; (2) the father nodes of the failure mode node can only be failure cause nodes; (3) for a node in the failure cause subset, every other node in that subset could be its father node. With these restrictions, a divide-and-conquer principle based algorithm (FPBN-DC) is introduced to learn the FPBN network structure. It builds the network structures for the failure detection, failure mode, and failure cause nodes separately. The modeling process of the FPBN-DC algorithm is listed as follows.
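Restrictions (1)-(3) can be sketched as a small helper that returns each node's candidate father set; because the answer depends only on a node's subset, the three subsets can be searched independently. Function and node names below are illustrative:

```python
def candidate_fathers(node, C, M, E):
    """Return the admissible father nodes of `node` under restrictions (1)-(3)."""
    if node in E:            # (1) detection nodes: fathers from C or M
        return C | M
    if node in M:            # (2) failure mode node: fathers from C only
        return set(C)
    return C - {node}        # (3) cause nodes: any other cause node

# Node names from the Figure 1 example.
C, M, E = {"HP", "HV"}, {"VL"}, {"HT"}
assert candidate_fathers("HT", C, M, E) == {"HP", "HV", "VL"}
assert candidate_fathers("VL", C, M, E) == {"HP", "HV"}
assert candidate_fathers("HP", C, M, E) == {"HV"}
```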
Step 1. Initialize the node set X = {x_1, x_2, x_3, . . . , x_n} in FPBN and classify the nodes in X into the three subsets C, M, E according to their types. The three subsets are ordered as C, M, E, which means that a node in a latter subset can only be a child of a node in a former subset.
Step 2. Choose the score function to evaluate candidate FPBN network structures. The BIC score [23] is used as the score function:

V_BIC = Σ_{i=1}^{n} Σ_{j=1}^{q_i} Σ_{k=1}^{r_i} m_ijk log(m_ijk / m_ij) − (log m / 2) Σ_{i=1}^{n} q_i (r_i − 1). (1)

Because the BIC score of the whole network can be decomposed as the sum of single-node scores, (1) can be rewritten as

V_BIC = Σ_{i=1}^{n} V_BIC^{x_i}, where V_BIC^{x_i} = Σ_{j=1}^{q_i} Σ_{k=1}^{r_i} m_ijk log(m_ijk / m_ij) − (log m / 2) q_i (r_i − 1), (2)

where n represents the number of nodes in FPBN; q_i represents the number of candidate combination states of the father nodes of the ith node; r_i represents the number of candidate states of the ith node; m_ijk represents the number of failure records in which the ith node is in the kth state and its father node set is in the jth state; m_ij represents the number of failure records in which the father node set of the ith node is in the jth state; and m represents the whole number of failure data records.
To search for the best network structure with the highest V_BIC, it is clear that every part V_BIC^{x_i} of V_BIC should get as high a score as possible. Since FPBN is an extension of BN, it still has to satisfy the requirement that there is no loop in the network structure. So, the key point is to decompose V_BIC logically while guaranteeing that no loop appears in the corresponding network structure.
Theorem 1. If there is no loop inside the interior structure of each of the node subsets C, M, E in FPBN, then no loop exists in the whole FPBN model.
Proof. Suppose there is a loop in an FPBN model with no loop inside the three subsets C, M, E. Then there must be at least one edge pointing out of one of the subsets C, M, E and another edge pointing back into this subset. But according to the directions of FPBN edges, an edge can only point to a latter subset and cannot point back to a former subset. So, such a loop in the FPBN model does not exist.

Lemma 2. The maximal BIC score of the FPBN model can be broken up into the sum of the maximal BIC scores of the three subsets, as max(V_BIC) = max(V_BIC^C) + max(V_BIC^M) + max(V_BIC^E).
Proof. According to Theorem 1, there will be no loop between the subsets. When every subset has the highest score with no loop inside the subset, the whole FPBN structure satisfies the no-loop limitation and the score of the structure is the highest.
With Lemma 2, the FPBN structure searching problem for the highest score is divided into three small-scale structure searching problems.
Step 3. Randomly select a node e_i which belongs to subset E and remove it from X. Its candidate father node set is pa_c(e_i) ⊆ (C ∪ M).

Theorem 3. Adding a father node to a node which belongs to the failure detection subset E will not form a loop inside the subset.
Proof. Because pa_c(e_i) ⊆ (C ∪ M), adding a father node to any node e_i belonging to subset E will not connect any two nodes inside the subset with an edge. This means that there will be no edge between failure detection nodes.

Lemma 4. The maximal BIC score of subset E can be broken up into the sum of the maximal BIC scores of the nodes inside the subset, as max(V_BIC^E) = Σ_i max(V_BIC^{e_i}).
Proof. According to Theorem 3, there will not be any loop between the nodes inside subset E. When each node e_i has the highest score, the subset structure satisfies the no-loop limitation and the subset score V_BIC^E is the highest.
Lemma 4 reduces the searching complexity to calculating only the highest score of each single node inside subset E.
Step 4. With the selected node e_i, for every node in the candidate father node set pa_c(e_i), compute the updated structure score V_BIC^{e_i} supposing that the node is added to the actual father node set pa(e_i) of e_i. According to Theorem 3, there is no need to verify the structure when a node is added to e_i's father node set.
Step 5. Select the node in pa_c(e_i) that leads to the highest score V_BIC^{e_i} and name this score V_BIC^{e_i}-new. If V_BIC^{e_i}-new is higher than the old structure score V_BIC^{e_i}-old, move this node from pa_c(e_i) to the actual father node set pa(e_i) of e_i, update the score as V_BIC^{e_i}-old = V_BIC^{e_i}-new, and turn to Step 4 to search for other father nodes of e_i in the remaining candidate father node set pa_c(e_i). If V_BIC^{e_i}-new < V_BIC^{e_i}-old, there is no father node in pa_c(e_i) which could lead to a higher score; turn to Step 6.
Step 6. Check whether there is still a node in set X which belongs to the failure detection subset E. If yes, turn to Step 3 and search for its maximal score. If no, the highest score of every node in subset E has been found; turn to Step 7 to search for the maximal score of the failure mode node.
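Steps 3-6 amount to a greedy, K2-style search over each detection node's candidate fathers using the decomposed single-node score. A minimal Python sketch follows; the data layout and all names are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def node_score(data, node, pa, states):
    """Single-node score V_BIC^{x_i} under a given father tuple pa."""
    m = len(data)
    m_ijk = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
    m_ij = Counter(tuple(r[p] for p in pa) for r in data)
    ll = sum(c * math.log(c / m_ij[j]) for (j, _k), c in m_ijk.items())
    q_i = math.prod(len(states[p]) for p in pa)  # father configurations
    return ll - (math.log(m) / 2) * q_i * (len(states[node]) - 1)

def greedy_fathers(data, node, candidates, states):
    """Steps 4-5: add the single best father until no candidate improves the score."""
    fathers, best = [], node_score(data, node, (), states)
    candidates = set(candidates)
    while candidates:
        scores = {c: node_score(data, node, tuple(fathers) + (c,), states)
                  for c in candidates}
        c_best = max(scores, key=scores.get)
        if scores[c_best] <= best:
            break  # no candidate raises the score: stop (Step 5 -> Step 6)
        fathers.append(c_best)
        best = scores[c_best]
        candidates.remove(c_best)
    return fathers, best

# Toy data: detection node "D" copies cause "A" and ignores cause "B".
data = [{"A": a, "B": b, "D": a} for a in (0, 1) for b in (0, 1)] * 2
states = {"A": [0, 1], "B": [0, 1], "D": [0, 1]}
fathers, _ = greedy_fathers(data, "D", {"A", "B"}, states)
assert fathers == ["A"]
```

Because Theorem 3 guarantees acyclicity for detection nodes, the loop never needs a cycle check, which is exactly the saving the divide step buys.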
Step 7. According to the FPBN description, there is usually only one failure mode node m_1 in subset M. Select the node m_1 and remove it from X. Its candidate father node set is pa_c(m_1) ⊆ C.
Theorem 5. Adding a father node to the node which belongs to failure mode subset M will not form a loop inside the subset.
Proof. Because there is only one node in the failure mode subset and its candidate father nodes belong to the failure cause subset, adding a father node to the node m_1 belonging to subset M will not connect the node to itself with an edge. This means that there will be no loop in subset M.
Step 8. With the selected node m_1, for every node in its candidate father node set pa_c(m_1), compute the updated structure score V_BIC^{m_1} supposing that the node is added to the actual father node set pa(m_1) of m_1. According to Theorem 5, there is no need to verify the structure when a node is added to m_1's father node set.
Step 9. Select the node in pa_c(m_1) that leads to the highest score V_BIC^{m_1} and name this score V_BIC^{m_1}-new. If V_BIC^{m_1}-new is higher than the old structure score V_BIC^{m_1}-old, move this node from pa_c(m_1) to the actual father node set pa(m_1) of m_1, update the score as V_BIC^{m_1}-old = V_BIC^{m_1}-new, and turn to Step 8 to search for other father nodes of m_1 in the remaining candidate father node set pa_c(m_1). If V_BIC^{m_1}-new < V_BIC^{m_1}-old, there is no father node in pa_c(m_1) which could lead to a higher score; turn to Step 10.
Step 10. The father nodes of a failure cause node c_i in subset C can be any other nodes in C, as pa_c(c_i) ⊆ (C − {c_i}). So the problem transfers to learning a general BN structure inside subset C with the highest score V_BIC^C. An immune algorithm based structure learning method for BN (BN-IA) [32] is applied to deal with this problem.
Step 11. The maximal scores of all three subsets have now been calculated, and max(V_BIC) is their sum.
Using the FPBN-DC algorithm, the original n-node FPBN structure learning problem is broken down into three BN structure learning problems with fewer nodes. These three smaller-scale searching problems can then be solved easily with general BN structure learning methods.

Parameter Learning of FPBN.
For a BN, parameter learning is to find the P that maximizes the objective likelihood function L(P | X, A, D) once the best network structure ⟨X, A⟩ has been learned from the dataset D. The calculation of P is a parameter estimation problem in statistics and is usually solved by the maximum likelihood estimation (MLE) method [33].
In the MLE based BN parameter learning method, the CPDs of the nodes {x_1, x_2, . . . , x_n} are estimated by counting the state distribution of each node under every state combination of its father nodes in the dataset; in this way the MLE method finds the best probability distributions P* for all nodes. Each parameter θ*_ijk in P* is calculated as

θ*_ijk = m_ijk / m_ij, (3)

whose practical meaning is

θ*_ijk = (the number of records with x_i = k and pa(x_i) = j in dataset D) / (the number of records with pa(x_i) = j in dataset D). (4)

When all θ_ijk in P reach θ*_ijk, the objective likelihood function L(P* | X, A, D) attains its largest value. Because the CPDs in FPBN are the same as the CPDs in BN, the parameter learning of FPBN also uses the effective MLE method.
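The counting in (3) and (4) can be sketched in a few lines of Python. The data layout and names are illustrative; each CPD entry maps a (father-configuration, state) pair to its estimated probability:

```python
from collections import Counter

def mle_cpds(data, fathers):
    """MLE estimate theta*_ijk = m_ijk / m_ij for every node: the relative
    frequency of state k under each father-node configuration j."""
    cpds = {}
    for node, pa in fathers.items():
        m_ijk = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
        m_ij = Counter(tuple(r[p] for p in pa) for r in data)
        cpds[node] = {(j, k): c / m_ij[j] for (j, k), c in m_ijk.items()}
    return cpds

# Toy dataset over a cause C and a detection E with the learned edge C -> E.
data = [{"C": 0, "E": 0}, {"C": 1, "E": 1}, {"C": 1, "E": 1}, {"C": 1, "E": 0}]
cpds = mle_cpds(data, {"C": (), "E": ("C",)})
assert cpds["C"][((), 1)] == 0.75     # prior P(C=1) = 3/4
assert cpds["E"][((1,), 1)] == 2 / 3  # P(E=1 | C=1) = 2/3
```

Note that this plain relative-frequency estimate assigns zero probability to unseen (configuration, state) pairs; practical implementations often add a smoothing prior.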

Simulation Dataset.
For the simulation study, we introduced a practical helicopter convertor FPBN model [34] as the original model. The nodes in the helicopter convertor FPBN model belong to three subsets. The failure cause subset includes "Power part," "Voltage adjustor," "Transformation filter," "Output filter," and "Fan." The failure mode node is "No output," and the failure detection nodes are "Voltage output," "Filter output," and "Fan sound." The details of these nodes are shown in Table 1.
The cause-effect relationships among the nodes are shown in Figure 2. The node "Power part" affects the node "Voltage adjustor," while the nodes "Voltage adjustor," "Transformation filter," and "Output filter" result in the convertor failure "No output." The node "Voltage output" is an outer representation of "Power part," and the node "Filter output" is likewise an outer representation of "Transformation filter." The node "Fan sound" is the result of both "Output filter" and "Fan." From this practical model, 3000, 5000, and 7000 operation records are generated separately with a random sampling method. Each record represents the corresponding states of all variables in the helicopter convertor at one time. These failure record datasets of different scales are named dataset 1, dataset 2, and dataset 3 to demonstrate the application of FPBN-DC and verify its performance independently.
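Such record generation can be sketched as ancestral (forward) sampling: each node is drawn from its CPD given the already-sampled states of its father nodes. The model structure below uses two node names from Figure 2, but the probability values are illustrative assumptions, not the parameters of the paper's convertor model:

```python
import random

def forward_sample(model, order, n, seed=0):
    """Ancestral sampling: draw each node given its already-sampled fathers.
    model maps node -> (father_tuple, {father_states: {state: prob}});
    `order` must list fathers before children."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        rec = {}
        for node in order:
            pa, table = model[node]
            dist = table[tuple(rec[p] for p in pa)]
            vals, probs = zip(*sorted(dist.items()))
            rec[node] = rng.choices(vals, weights=probs)[0]
        records.append(rec)
    return records

# Two-node fragment: "Fan" (cause) -> "Fan sound" (detection), toy numbers.
model = {
    "Fan": ((), {(): {0: 0.9, 1: 0.1}}),
    "Fan sound": (("Fan",), {(0,): {0: 0.95, 1: 0.05},
                             (1,): {0: 0.1, 1: 0.9}}),
}
records = forward_sample(model, ["Fan", "Fan sound"], 3000)
assert len(records) == 3000
assert all(r["Fan"] in (0, 1) and r["Fan sound"] in (0, 1) for r in records)
```

Repeating this with n = 3000, 5000, and 7000 would produce datasets of the three scales used in the study.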

Simulation Results.
To verify the effectiveness and efficiency of the proposed FPBN-DC algorithm, the BN-IA algorithm, which ignores the assumptions on node types and edge directions in FPBN, is also introduced to learn the network structure from the datasets.
First of all, we discuss the coding scale of the network structure, which determines the searching space of each algorithm. For both FPBN-DC and BN-IA, an adjacency matrix is used to describe the network structure, as shown in Table 2. Then, the two algorithms learn each of the 3 generated datasets 10 times with the same parameters. The highest and average fitness values (an equivalent of the BIC score) of each algorithm for every dataset are listed in Table 3, together with the convergence iterations of the corresponding fitness values. Since the fitness value represents the similarity between the network structure and the corresponding dataset, the highest fitness of the 10 runs is the main criterion for the effectiveness of the algorithms. The corresponding convergence iteration of the highest fitness is a reasonable criterion for algorithm efficiency.
According to the highest fitness (in bold type) of each algorithm on each dataset during the 10 runs in Table 3, the FPBN-DC and BN-IA algorithms reach the same highest fitness values on all 3 datasets. However, the iterations needed to reach the highest fitness with FPBN-DC are far fewer than the iterations with BN-IA. Furthermore, for FPBN-DC the average fitness values of the 10 runs on the 3 datasets equal the highest fitness values, which means that FPBN-DC obtains the best network structure in every searching run. For BN-IA, the average fitness values of the 10 runs on the 3 datasets are less than the highest fitness values, which reflects the stochastic error of this algorithm. The objective of FPBN structure learning is to search for the best network which represents the dataset comprehensively. So, comparing the best network of each algorithm with the original one is another useful criterion for algorithm effectiveness. The edge differences between the best network of each algorithm in the 10 runs and the original FPBN in Figure 2 are listed in Table 4, in terms of the numbers of added edges, missing edges, and reversed edges. The best network structures learned by the FPBN-DC and BN-IA algorithms are exactly the same as the original one on all 3 datasets. This result also shows the ability of FPBN-DC to recover structure from data.
Finally, according to the comparison results, the FPBN-DC algorithm can reach the best fitness values with fewer iterations, which verifies its effectiveness and efficiency compared with the BN-IA algorithm.

Conclusion
This paper proposed an effective algorithm to build the FPBN model from a system operation dataset. The types of network nodes, the directions of network edges, and the CPDs of nodes in FPBN are discussed in detail first. Then, the FPBN-DC algorithm is introduced into the FPBN modeling process to learn the network structures of the failure detection, failure mode, and failure cause nodes separately according to their assumptions on edge directions. Finally, a simulation study of a helicopter convertor FPBN model is carried out. The proposed FPBN-DC and the BN-IA algorithms learn the same 3 generated datasets 10 times each with the same parameters. Taking advantage of the divide-and-conquer principle, FPBN-DC has a smaller coding scale than BN-IA, which means a smaller searching space and less searching time. The comparison results also show that FPBN-DC can reach the best fitness values with fewer iterations. The network structures learned by FPBN-DC on the 3 datasets are exactly the same as the original one, which also verifies its effectiveness. For future research, with the application of sensors in practical engineering systems, we plan to introduce real-time detection nodes into the FPBN model, which may provide more precise failure prediction.

Figure 2 :
Figure 2: Network structure of a helicopter convertor FPBN model.

Table 1 :
Nodes in the helicopter convertor FPBN model.

Table 2 :
Coding scale in the helicopter convertor FPBN model.

In BN-IA, the number of bits of the structure code is 72 because there are 9 nodes in the model and each node needs 8 bits to represent its father nodes. In FPBN-DC, the structure code has only 20 bits because the structure learning scale is reduced according to Lemma 2; actually, only the network structure of the nodes in the failure cause subset needs to be learned. Generally, for score and search based algorithms, the searching time is mainly consumed in the score calculation process. Because the BIC score has to go through the dataset (q_i × r_i) times to count m_ijk and m_ij according to (1), the searching time depends deeply on the parameters q_i and r_i. Since q_i represents the number of candidate combination states of the father nodes of the ith node, it relates directly to the maximal number of a node's father nodes. It is obvious that the FPBN-DC algorithm has a smaller searching space and less searching time.

Table 3 :
Fitness values and corresponding iterations of each algorithm for every dataset.

Table 4 :
Edge differences between the learned FPBN structures with the original one.