A Novel Method of Failure Sample Selection for Electrical Systems Using Ant Colony Optimization

Traditional testability demonstration experiments ignore the influence of failure propagation when selecting failure samples. As a result, some failures are omitted from the selection, and because these failures can trigger serious propagation failures, their omission creates a significant risk in use. This paper proposes a new failure sample selection method to solve the problem. First, the method uses a directed graph and ant colony optimization (ACO) to obtain a subsequent failure propagation set (SFPS) based on a failure propagation model; a new failure sample selection method is then proposed on the basis of the SFPS number. Compared with the traditional sampling plan, this method improves the coverage of the tested failure samples, increases diagnostic capability, and decreases the risk of use.


Introduction
In the industrial manufacturing of electrical systems and equipment, testability plays a crucial role in improving the reliability of large-scale electrical equipment [1]. Systems and equipment with good testability allow failures to be detected and isolated quickly, which reduces maintenance time and increases system availability. Testability has therefore attracted growing attention from researchers.
Testability concerns how well a system supports failure diagnosis, fault prognosis, and fault isolation. Numerous models and methodologies have been developed to diagnose, prognose, and prevent failures or faults. In 1983, Huang et al. introduced the diagnostic concept of K-node fault diagnosis [2]. They pointed out that testability depends only on the structure of a circuit and not on the values of its elements. In [3], Maeda et al. discussed testability and distinguishability for nonlinear systems using analysis and graph theory. Yang et al. [4] proposed the slope fault mode on the complex field; the method is applicable to the diagnosis of linear and nonlinear analog circuits. In [5], a fault diagnostic method under tolerance conditions is proposed that uses fuzzy mathematics to detect faults. To detect and isolate faulty components and to predict the remaining useful performance of analog circuits, Vasan et al. [6] proposed using a kernel method for diagnosis and a particle filtering method for prognosis. One significant stage of design for testability is the testability demonstration experiment, which tests the ability to detect and isolate failures by injecting failures [1]. In traditional methods, the injected failure samples are either selected randomly or selected according to the highest failure probability. However, the traditional testability demonstration experiment ignores the failure propagation modes in a system, which commonly leads to serious fault omission in failure sample selection. In other words, failures with a very low failure rate are unlikely to be selected in the traditional experiment, even when they cause propagation failures. When such failures occur in a system, they spread to other components and cause severe faults.
This phenomenon can be a serious problem: if propagation failures are not considered, the low-failure-rate faults that cause them can be missed. The established test failure set is then incomplete and cannot detect and isolate failures correctly. To solve the problem, a number of related studies have been carried out [7][8][9]. Reference [7] proposed an approach to analyze failure propagation in aircraft engine systems using small-world network theory. Li et al. used a fuzzy probability Petri net model to describe fault propagation and then introduced a sample selection method based on propagation intensity [8]. This method provides a better fault coverage rate.
Our work employs the failure propagation probability to deduce the intensity of failure propagation and then uses ant colony optimization (ACO) to find the maximum probability failure propagation path according to that intensity. Finally, subsequent failure propagation sets (SFPS) are built and a new failure sample selection plan is proposed. The proposed method can effectively reduce the risk of omitting propagation failures, and it increases the accuracy of failure diagnosis.
The rest of this paper is organized as follows. Section 2 briefly introduces the principles of failure propagation modeling and ant colony optimization and presents a new failure sample selection optimization method. In Section 3, a case study verifies the diagnostic effect of our method by comparing it with traditional failure sample selection. Finally, brief conclusions are presented in Section 4.

Methodology
The proposed method involves three major stages: (1) failure sample selection plan design and analysis on the basis of the failure propagation model; (2) path optimization with ACO; and (3) failure sample selection optimization based on the subsequent propagation failure set. The block diagram of the optimization procedure for electrical systems is shown in Figure 1. Once a sampling plan is confirmed, failure samples are assigned to the different modules. To detect and identify failures correctly and to avoid omitting propagation failures, failure propagation should be taken into account. The ant colony optimization algorithm is used to search for the maximum probability propagation path. A new failure sample selection is then proposed according to the intensity of the edges on this path. This section first presents the fundamentals of the testability demonstration experiment procedure and of ACO. Figure 2 shows the process of the traditional failure sample selection plan. The primary aim of the scheme is to determine the values of the sample size $n$ and the acceptance threshold $c$.
For the testability demonstration experiment, we assume that the test outcomes follow a binomial distribution. Assume that each test succeeds with probability $q$ and that $n$ independent tests are performed. The probability of $F$ failures can be expressed as

$$P(F) = C_n^F (1 - q)^F q^{n - F}, \quad (1)$$

where $C_n^F = \frac{n!}{F!\,(n - F)!}$ is the combinatorial number, that is, the number of unordered selections of $F$ distinct elements from a given set of $n$ elements. For a successful testability demonstration experiment, the number of failed tests should be less than or equal to the threshold value $c$. Therefore, the probability of a successful testability demonstration experiment is equal to the sum of the probabilities of $0, 1, \ldots, c$ failed tests:

$$P(F \le c) = \sum_{F=0}^{c} C_n^F (1 - q)^F q^{n - F}. \quad (2)$$

Through consultation between suppliers and customers, the design value $q_0$ of the fault detection rate (FDR) is determined. The design value is the probability of success for one test. The minimum acceptable value of the FDR is $q_1$. When $q_0 \ge q_1$, we consider the test to have reached the design standard. The suppliers' risk is $\alpha$, which denotes the minimum accepted probability of success for an experiment by suppliers. The risk of use is $\beta$, which is the maximum probability of failure for the experiment. Under these conditions, we can use formula (3) to determine the values of $n$ and $c$.
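As an illustration of how a plan $(n, c)$ could be derived from the binomial model, the following minimal sketch searches for the smallest plan that satisfies two risk constraints. The constraint form used here (the suppliers' risk bounded at the design value $q_0$, the risk of use bounded at the minimum acceptable value $q_1$) is the usual one for binomial demonstration plans and is an assumption of this sketch, as are the function names and the numerical inputs, which are not the values of the case study.

```python
from math import comb


def acceptance_probability(n, c, q):
    """Probability of at most c failed tests in n independent tests,
    where each test succeeds with probability q (binomial model)."""
    return sum(comb(n, f) * (1 - q) ** f * q ** (n - f) for f in range(c + 1))


def find_plan(q0, q1, alpha, beta, n_max=500):
    """Return the smallest (n, c) meeting both assumed risk constraints, or None.

    Assumed constraints (not taken verbatim from the paper):
      1 - acceptance_probability(n, c, q0) <= alpha   (suppliers' risk)
          acceptance_probability(n, c, q1) <= beta    (risk of use)
    """
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            if acceptance_probability(n, c, q1) > beta:
                break  # a larger c only raises the acceptance probability
            if 1 - acceptance_probability(n, c, q0) <= alpha:
                return n, c
    return None


# Hypothetical inputs chosen only for illustration.
plan = find_plan(q0=0.95, q1=0.85, alpha=0.10, beta=0.10)
print(plan)
```

The search stops at the first $n$ for which some $c$ satisfies both inequalities, so the returned plan uses the fewest tests under the assumed constraints.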
When the plan $(n, c)$ is confirmed, the samples are assigned to the different modules in the system according to the layered design pattern and proportions. An injected failure set is then built by extracting failure modes in each module. Only $n$ failure modes need to be acquired, and this number is less than the total number of failure modes in the system. To guarantee that the injected failure set has larger failure coverage, a hierarchical distribution of the failure sample size is used:

$$n_i = n\,C_i, \qquad C_i = \frac{Q_i \lambda_i T_i}{\sum_{i} Q_i \lambda_i T_i}, \quad (4)$$

where $n_i$ denotes the number of samples assigned to module $i$, $C_i$ is the assignment weight of the $i$th module, $Q_i$ is the number of failure modes of the $i$th module and indicates the complexity of the equipment, $T_i$ is the operation time coefficient of the $i$th module and is equal to the ratio of the operation time to the work life, and $(\mathrm{MTBF})_i$ represents the mean time between failures of module $i$. Thus, $\lambda_i = 1/(\mathrm{MTBF})_i$ is the failure rate of the $i$th module, expressed in failures per unit of time.
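The allocation rule can be sketched in a few lines. The sketch assumes the weight form $C_i \propto Q_i \lambda_i T_i$ with $\lambda_i = 1/(\mathrm{MTBF})_i$, and the module data, dictionary keys, and function names are hypothetical.

```python
def assignment_weights(modules):
    """Weights C_i proportional to Q_i * lambda_i * T_i, with lambda_i = 1 / MTBF_i."""
    scores = [m["Q"] * (1.0 / m["MTBF"]) * m["T"] for m in modules]
    total = sum(scores)
    return [s / total for s in scores]


def allocate(n, weights):
    """Number of samples per module, n_i = n * C_i, rounded to the nearest integer."""
    return [round(n * w) for w in weights]


# Hypothetical modules: Q = number of failure modes, MTBF in hours,
# T = operation time coefficient (operation time / work life).
modules = [
    {"Q": 12, "MTBF": 2000.0, "T": 0.5},
    {"Q": 8, "MTBF": 4000.0, "T": 1.0},
    {"Q": 20, "MTBF": 5000.0, "T": 0.25},
]
weights = assignment_weights(modules)
print(weights, allocate(50, weights))
```

Because each $n_i$ is rounded independently, the rounded counts do not always sum to $n$; a real allocation would need a rule for distributing the remainder.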

Stage 2: Failure Propagation Modeling.
In this section, a failure propagation model is built on propagation probability, using a directed graph (DG) of failure propagation and an adjacency matrix. In graph theory, a DG is a set of nodes connected by directed edges. It can describe the failure propagation relationships among the components of an electrical system. Formally, the directed graph is represented as DG = $\{V, F, E\}$, as shown in Figure 3. In the diagram, $V$ is the set of nodes (components); $F$ is a failure set that includes 5 failure modes, $f_1, f_2, \ldots, f_5$; and $E$ is a set of directed edges that describes the link between any two circuit components or modules, together with the intensity of the failure propagation.
The propagation intensities and the relationships between nodes (components) may be heterogeneous.
Assuming the system has $N$ nodes, we introduce an $N \times N$ adjacency matrix $A = [a_{ij}]$ to describe the links between components, with zeros on the main diagonal and the edge weights as off-diagonal elements. It is given as follows:

$$a_{ij} = \begin{cases} w_{ij}, & \text{there is a directed edge from node } i \text{ to node } j \text{ (with probability } p_{ij}\text{)}, \\ 0, & \text{there is no directed edge from node } i \text{ to node } j \text{ (with probability } 1 - p_{ij}\text{)}, \end{cases}$$

where $w_{ij}$ is the directed weight between node $i$ and node $j$, $p_{ij} \in [0, 1]$, $w_{ij} \ne 0$ for $i \ne j$, $w_{ij} = 0$ for $i = j$, and $i, j \in \{1, 2, \ldots, N\}$.
The existence of an edge from node $i$ to node $j$ is determined by the probability $p_{ij}$, independently of the other edges. This probability is computed from $\mu(s_k)$, the membership degree of the fuzzy (ambiguity) set of failure states; $s_k$, the symptom signals of the various failures; and $P(s_k)$, the probability of the $k$th symptom signal $s_k$. The probabilities are collected in the probability matrix $P = [p_{ij}]$.
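The adjacency model can be made concrete with a small sketch that draws one realization of the matrix from given weights and edge probabilities. The function name, the matrices `W` and `P`, and their values are hypothetical; the sketch only shows independent edge sampling with a zero diagonal.

```python
import random


def sample_adjacency(W, P, rng):
    """Draw one adjacency matrix A from weights W and edge probabilities P.

    a_ij = w_ij with probability p_ij, otherwise 0; the diagonal stays 0.
    Each edge is drawn independently of the others.
    """
    n = len(W)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < P[i][j]:
                A[i][j] = W[i][j]
    return A


# Hypothetical 3-node example.
W = [[0.0, 0.7, 0.2],
     [0.3, 0.0, 0.9],
     [0.4, 0.6, 0.0]]
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0]]
A = sample_adjacency(W, P, random.Random(0))
print(A)
```

Edges with probability 1 always appear and edges with probability 0 never do, so only the entry for the edge from node 3 to node 1 varies between draws in this example.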

Stage 3: Analysis of Failure Propagation.
When a failure occurs in a node of the circuit system, it spreads to the connected neighbor nodes and can cause them to fail as well. The intensity of failure spread serves as the directed link weight between nodes: the greater the intensity of an edge, the more likely it is that failure propagation happens along that edge, and the more likely it is to cause cascading failures in the connected neighbor nodes. The intensity of an edge is defined in terms of the following quantities: the crossing-clustering coefficient; the weight of the failure propagation probability; the weight of the node degree; the propagation probability from node $i$ to node $j$ in the $k$th propagation step; the subsequent node set after $k$ propagation steps; and the node degree of node $j$ in that set. The node degree is the number of edges associated with a node.
To compare intensities easily and to simplify calculations, we use the $z$-score of the intensity $I$, because the $z$-score expresses how far a datum lies above or below the mean as a signed number of standard deviations. It is defined as

$$z = \frac{I - E[I]}{\sigma}, \quad (8)$$

$$\sigma = \sqrt{\mathrm{Var}(I)}, \quad (9)$$

where $E[I]$ is the expected value of $I$ and $\sigma$ is the standard deviation of the population of $I$. For instance, given the intensities of the edges of Figure 3, their $z$-scores are calculated using (8) and (9), as shown in Table 1. Take $f_2$ as an example: according to the structure of the DG, there are two propagation edges from $f_2$, namely edge $(f_2, f_4)$ and edge $(f_2, f_3)$. The intensity (1.3908) of edge $(f_2, f_4)$ is greater than the intensity (−0.9934) of edge $(f_2, f_3)$. As a result, failure $f_2$ is more likely to propagate along edge $(f_2, f_4)$ than along edge $(f_2, f_3)$.
According to the above analysis, failure propagation follows the path with the greatest intensity. As shown in Figure 3, the bold line is the propagation path of $f_2$ with the maximum intensity.
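The standardization and the comparison of outgoing edges can be sketched as follows. The raw intensities are hypothetical (they are not the values behind Table 1), and the function name is illustrative; the population standard deviation is used, as in the definition above.

```python
from statistics import fmean, pstdev


def z_scores(intensities):
    """Map each edge to the z-score of its intensity (population standard deviation)."""
    values = list(intensities.values())
    mean, sigma = fmean(values), pstdev(values)
    return {edge: (value - mean) / sigma for edge, value in intensities.items()}


# Hypothetical raw intensities for a small graph.
intensities = {
    ("f1", "f2"): 0.30,
    ("f2", "f3"): 0.10,
    ("f2", "f4"): 0.50,
    ("f4", "f5"): 0.30,
}
z = z_scores(intensities)

# Among the edges leaving f2, pick the one with the greatest z-score.
strongest = max((edge for edge in z if edge[0] == "f2"), key=z.get)
print(strongest)
```

The sketch does not guard against the degenerate case in which all intensities are equal, where the standard deviation is zero and the $z$-score is undefined.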

Path Optimization with ACO.
In general, the structure of Very Large Scale Integration (VLSI) circuits is very complex, and failure propagation is hard to analyze manually. Hence, intelligent algorithms are used. To obtain the maximum probability failure propagation path, ACO is adopted in this paper.
The algorithm was proposed by M. Dorigo in his doctoral thesis in 1991. Inspired by the behavior of ants, it was originally aimed at the travelling salesman problem, in which the goal is to find the shortest round trip linking a series of cities [10]. More details about this technique can be found in [10]. ACO is highly robust and well suited to parallel implementation [11]. Therefore, we use ACO to search for the maximum probability failure propagation path.
The mathematical model of the maximum probability failure propagation path in circuits can be represented as follows. At a given time $t$, ants use the pheromone deposited between nodes to search for the subsequent path from node $i$. For ant $k$, the probability of selecting the next edge $(i, j)$ is

$$p_{ij}^{k}(t) = \frac{[\tau_{ij}(t)]^{a}\,[\eta_{ij}]^{b}}{\sum_{s \in J_i} [\tau_{is}(t)]^{a}\,[\eta_{is}]^{b}}, \quad j \in J_i,$$

where $\eta_{ij}$ is equal to the intensity of failure propagation from node $i$ to node $j$, $\tau_{ij}$ is the amount of pheromone deposited for the transition from node $i$ to node $j$, $a \ge 0$ and $b \ge 1$ are parameters that control the influence of $\tau_{ij}$ and $\eta_{ij}$, respectively, and $J_i$ is the set of nodes connected to node $i$.
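The pheromones are updated by

$$\tau_{ij}(t + 1) = (1 - \rho)\,\tau_{ij}(t) + \sum_{k=1}^{m} \Delta\tau_{ij}^{k},$$

where $\rho$ represents the pheromone evaporation coefficient; $\tau_{ij}(0)$ is the initial pheromone value; $\sum_{k=1}^{m} \Delta\tau_{ij}^{k}$ is the total amount of pheromone deposited; $m$ is the number of ants; and $\Delta\tau_{ij}^{k}$ is the pheromone deposited by ant $k$.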
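To make the search concrete, the following is a minimal sketch of an ant-system style search for a high-intensity path from a given source node. It is not the authors' implementation. It assumes that the heuristic values are strictly positive (standardized intensities would first have to be shifted, since a negative value cannot be used as a sampling weight), that the quality of a path is the sum of its edge intensities, and that each ant deposits that quality on every edge it used. The function name, parameter defaults, and graph are hypothetical.

```python
import random


def aco_max_path(eta, source, n_ants=20, n_iters=30, a=1.0, b=2.0,
                 rho=0.1, tau0=1.0, seed=0):
    """Search for a high-intensity propagation path starting at `source`.

    eta maps a directed edge (i, j) to a strictly positive intensity.
    Path quality is the sum of the intensities along the path
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    tau = {edge: tau0 for edge in eta}            # initial pheromone tau_ij(0)
    successors = {}
    for i, j in eta:
        successors.setdefault(i, []).append(j)

    best_path, best_score = [source], 0.0
    for _ in range(n_iters):
        deposit = {edge: 0.0 for edge in eta}     # sum over ants of delta tau
        for _ in range(n_ants):
            path, score, node = [source], 0.0, source
            while True:
                options = [j for j in successors.get(node, []) if j not in path]
                if not options:
                    break
                # Transition rule: weight tau^a * eta^b; choices() normalises.
                weights = [tau[(node, j)] ** a * eta[(node, j)] ** b
                           for j in options]
                nxt = rng.choices(options, weights=weights)[0]
                score += eta[(node, nxt)]
                path.append(nxt)
                node = nxt
            for edge in zip(path, path[1:]):
                deposit[edge] += score
            if score > best_score:
                best_path, best_score = path, score
        # Evaporation plus the pheromone deposited in this iteration.
        for edge in tau:
            tau[edge] = (1 - rho) * tau[edge] + deposit[edge]
    return best_path, best_score


# Hypothetical positive intensities on a four-node graph.
eta = {(1, 2): 0.9, (1, 3): 0.2, (2, 4): 0.8, (3, 4): 0.1}
path, score = aco_max_path(eta, source=1)
print(path, score)
```

On a graph this small an exhaustive search would do equally well; the pheromone mechanism matters when the number of candidate paths is too large to enumerate.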

Failure Sample Selection Optimization Based on Subsequent Propagation Failure Set.
In the testability demonstration experiment, the Unit Under Test (UUT) contains a replaceable unit $U_i$ ($1 \le i \le M$, where $M$ is the number of replaceable units), which has $Q_i$ failure modes. Following the allocation rule of stratified sampling, $n_i$ failure modes are assigned to the replaceable unit $U_i$. Thus, we need to select $n_i$ suitable failure modes from the total of $Q_i$ failure modes to establish a failure sample set. As discussed above, the influence of failure propagation must also be taken into account. Here, subsequent failure propagation sets (SFPS) are used to optimize the failure sample set. An SFPS is defined as the set of failure modes that occur along a failure propagation path; it indicates the range of failure spread.
We assume that the failure mode set of the replaceable unit $U_i$ is $F_i = \{f_1, f_2, f_3, \ldots, f_{Q_i}\}$. The steps of failure sample selection optimization based on SFPS are described as follows.
Step 1. Count the SFPS number of every element (failure mode) of the failure mode set $F_i$ to construct a set $S = \{s_1, s_2, s_3, \ldots, s_{Q_i}\}$. Then count the number of elements of $S$ that are greater than 1, denoted $r$. After that, select the corresponding $r$ failure modes from $F_i$ to construct a new failure mode set $F_i' = \{f_1, f_2, f_3, \ldots, f_r\}$. These selected failure modes have an SFPS number greater than 1. Next, let $F_i'' = F_i - F_i'$, where $F_i''$ is the set of the remaining failure modes.
Step 4. Obtain the complete failure sample set by adding up the failure sample sets from Steps 2 and 3.

Table 2: Failure modes of the refrigeration module.
Component   Failure   Failure mode
…           …         Check valve failure
3           3         Pipeline failure
4           4         Exhaust port failure
5           5         Seal B failure
6           6         Adapter defect
7           7         Electromagnetic switch failure
8           8         Charging valve leakage
9           9         Refrigeration pipe deformation
10          10        Gauges failure
11          11        Electromagnetic switch leakage
12          12        Refrigeration elements failure
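Step 1 of the procedure above amounts to partitioning the failure modes by their SFPS number. The following minimal sketch shows only that step; the SFPS numbers and the function name are hypothetical.

```python
def split_by_sfps_number(sfps_number):
    """Step 1: separate failure modes whose SFPS number exceeds 1 from the rest."""
    selected = [f for f, count in sfps_number.items() if count > 1]
    remaining = [f for f in sfps_number if f not in selected]
    return selected, remaining


# Hypothetical SFPS numbers for a unit with five failure modes.
sfps_number = {"f1": 2, "f2": 0, "f3": 3, "f4": 1, "f5": 0}
selected, remaining = split_by_sfps_number(sfps_number)
print(selected, remaining)
```

Here `selected` plays the role of the set of failure modes with more than one subsequent failure, and `remaining` is its complement within the unit.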

Case Study
A certain type of air-to-air missile system consists of six modules, namely, the refrigeration module, vibration control device, rectifier, shear stents, lock system, and circuit box. Here, we take only the refrigeration module as an example. Table 2 shows the failure modes of the refrigeration module. Suppose that the values of $q_1$, $q_0$, $\alpha$, and $\beta$ are known from the contract agreed between suppliers and customers. Using formula (3), the failure sampling plan (50, 6) is confirmed. The 50 failure samples are assigned to the 6 modules by proportional stratified sampling. The assignment weight $C_1$ of the refrigeration module in the system is 0.121; thus, based on expression (4), the number of failure samples assigned to it is 6. Therefore, we should select 6 suitable failure samples from a total of 12 failure modes to establish the failure sample set for the refrigeration module.
In accordance with the circuit connections of the system, the directed graph of failure propagation for the refrigeration module is obtained, as shown in Figure 4. From the graph, the component set of the refrigeration module is $V = \{v_1, v_2, v_3, \ldots, v_{12}\}$ and its failure mode set is $F = \{f_1, f_2, f_3, \ldots, f_{12}\}$. Using formulas (6), (7), and (8), the propagation intensity of each directed edge in the system is obtained, as shown in Table 3. In the table, the $z$-score values range from −1.1389 to 1.6663.
Analyzing the failure propagation with the data from Table 3, it is clear that the greater the $z$-score of an edge, the more likely it is that failure propagation happens along that edge. For instance, the $z$-score of the intensity of edge $(f_7, f_{11})$ is −1.1389, the smallest of all the $z$-score values. This means that when a failure occurs in $f_7$, it either does not spread to $f_{11}$ or spreads to $f_{11}$ with very small probability. In contrast, edge $(f_3, f_9)$ has the greatest $z$-score value, 1.6663, which indicates that failure propagation along this edge is highly likely. Thus, by searching for the maximum $z$-score for each failure with ACO, we can easily find the maximum probability propagation path of a failure. Take $f_1$ as an example: failure $f_1$ spreads along edge $(f_1, f_8)$ and edge $(f_8, f_{12})$, which have the greatest intensities (0.8528 and 1.6663, respectively). As a result, the maximum probability propagation path ($f_1 \to f_8 \to f_{12}$) of failure $f_1$ is obtained with ACO. Next, we use the method described in Section 2.3 to obtain the subsequent failure propagation set PossibilitySet($f_1$) of $f_1$: PossibilitySet($f_1$) = $\{f_8, f_{12}\}$. The failure spread number of $f_1$ is therefore 2. Using the same method, the subsequent failure propagation sets and failure spread numbers of the other failure modes can be obtained as well. Finally, the optimized sample set is established based on the failure spread paths. Table 4 compares the proposed method with the traditional failure sampling plan. The symbol √ in the table marks the selected failure samples. Under the traditional sampling plan, 6 samples are selected randomly from the 12 failure modes without taking failure propagation into account. In contrast, the proposed method chooses the 6 samples with failure propagation taken into account. In the experiment, the proposed method achieves better failure coverage than the traditional one.
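The derivation of PossibilitySet($f_1$) can be retraced with a short sketch. It uses only the edges and $z$-scores quoted in the text (the rest of Table 3 is not reproduced), and it replaces the ACO search with a greedy walk along the outgoing edge with the greatest $z$-score, which is a simplification assumed here. The function name is hypothetical.

```python
def sfps(z, start):
    """Follow the outgoing edge with the greatest z-score until none is left.

    Returns the failure modes reached after `start`, in propagation order.
    This greedy walk is a simplified stand-in for the ACO search.
    """
    visited, node, reached = {start}, start, []
    while True:
        options = {j: s for (i, j), s in z.items()
                   if i == node and j not in visited}
        if not options:
            return reached
        node = max(options, key=options.get)
        visited.add(node)
        reached.append(node)


# Only the edges quoted in the text, with their z-scores.
z = {("f1", "f8"): 0.8528, ("f8", "f12"): 1.6663, ("f7", "f11"): -1.1389}
possibility_set = sfps(z, "f1")
print(possibility_set, len(possibility_set))
```

The walk applies no lower bound on the $z-score$ of an edge, so a weak edge such as $(f_7, f_{11})$ would still be followed when it is the only one available; whether such edges should count toward the spread number is a modeling decision the sketch leaves open.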

Conclusion
This paper proposes a new failure sample selection method to overcome the shortcomings of traditional sample selection. First, we use the DG and ACO to obtain the maximum probability failure propagation path based on edge intensity. Then we propose a new failure sample selection method on the basis of the subsequent failure propagation set. Compared with the traditional sampling plan, this method increases failure coverage by establishing a relatively complete fault sample set that accounts for propagation failures, and a case study demonstrates that it can decrease the risk of use.