Neural Networks and Fault Probability Evaluation for Diagnosis Issues

This paper presents a new FDI technique for fault detection and isolation in unknown nonlinear systems. The objective of the research is to construct and analyze residuals by means of artificial intelligence and probabilistic methods. Artificial neural networks are first used for modeling: neural network models are designed to learn the fault-free and faulty behaviors of the considered systems. Once the residuals are generated, an evaluation using probabilistic criteria is applied to them to determine the most likely fault among a set of candidate faults. The study also compares the contributions of these tools and their limitations, in particular through quantitative indicators that assess their performance. Through the computation of a confidence factor, the proposed method can evaluate the reliability of the FDI decision. The approach is applied to detect and isolate 19 fault candidates in the DAMADICS benchmark. The results obtained with the proposed scheme are compared with those obtained with a usual thresholding method.


Introduction
Complex automated industrial systems are vulnerable to many types of faults (due to sensors, actuators, components, etc.). In order to maintain normal operating conditions, the human operator plays the role of supervisor, relying on several plant parameters, measurements, and observations. These faults may be abrupt or incipient. Due to the growing complexity of modern engineering systems and the ever-increasing demand for safety and reliability, there is great interest in the development of fault detection and isolation (FDI) methods. These techniques are important in process engineering because plant faults may cause abnormal operations and, if not detected early, can cause emergency shutdowns and even permanent damage. Moreover, the quality of production will not be maintained in abnormal situations (i.e., when process variables deviate significantly from their nominal values). Therefore, designing robust FDI systems has received considerable attention both from industry and academia [1]. The robustness of a method depends mainly on the reliable discrimination between the effects of uncertainties in the model behavior, noise in the signal measurements, and faults that may occur [2].
FDI methods are generally separated into model-based and data-based approaches. The advantage of model-based approaches is that they lead easily to residual signals, by comparing the behavior of the system with that of the model, and provide a mathematical framework that can be used to evaluate the performance of the method [3][4][5]. For nonlinear systems, the standard approach is to linearize the model around the operating point and to make use of the usual contributions derived from linear system theory. However, linearization does not always provide a good model of the process, in particular when strongly nonlinear behaviors are observed. Moreover, complex processes often operate in multiple operating regimes in industrial applications (e.g., mining, chemical treatment, and water treatment), so it is often not possible to obtain linear models that accurately describe the plant in all regimes. One solution is to use nonlinear methods such as nonlinear observers with analytical and geometric approaches, which require perfect knowledge of the nonlinear system [6][7][8]. However, nonlinear observers are limited to a few types of standard nonlinearities. Furthermore, the nonlinear observer approach can be used only when the nonlinear system dynamics are known with sufficient confidence; this is rarely the case for real system applications [2,4,9,10]. To handle nonlinearity in observed data, nonlinear PCA (principal component analysis) and PLS (partial least squares) approaches have been developed [11,12]. However, PCA and PLS rely on a linearity assumption, limiting their application.
An attractive alternative to nonlinear techniques is to use linear multimodel strategies. The multimodel approach has often been used in recent years for the modeling and control of nonlinear systems [13]. Multimodel methods for FDI are based on the partition of the operating range into separate regions [14]. Local linear models are applied in each region. The multimodel approach has also been associated with Kalman filters in order to detect, isolate, and estimate the state of a system in the presence of faults [3,15,16]. In addition, Lane et al. proposed a multigroup model to monitor batch processes with multiple modes [17]. Hwang and Han assumed that different operating modes have the same number of retained principal components and proposed a super PCA model to monitor multimode batch processes [18]. More recently, the effectiveness of multimodel approaches for FDI of real industrial systems has been discussed [14,19-21] and Baniardalani et al. proposed a qualitative model-based fault diagnosis using a threshold level [22].
The main motivation for this research is to explore the potential of computational intelligence (CI) approaches to design models of faulty behaviors and to generate residuals for nonlinear systems [23][24][25][26]. Diagnosis is a complex reasoning activity, which is currently one of the domains where artificial intelligence techniques have been successfully applied as these techniques use association rules, reasoning, and decision making processes as would the human brain in solving diagnostic problems. The proposed method combines the benefits of model-based method (to easily generate residuals) with those of data-based methods (probabilistic methods for isolation). Some methods have been developed based on neural networks (NNs) [27]. Kramer developed a nonlinear PCA based on autoassociative neural networks having five layers [28]. Chen and Liao proposed dynamic process fault monitoring based on neural network and PCA [29]. The NN approaches are regarded as multivariate nonlinear analytical tools capable of recognizing patterns from noisy complex data. Their major advantages include learning, noise suppression, and parallel data processing [10].
Intelligent systems have found broad application in fault diagnosis from their early stages, because an expert system simulates human reasoning about a problem domain, performs reasoning over representations of human knowledge, and solves problems using heuristic knowledge rather than precisely formulated relationships, in forms that reflect more accurately the nature of most human knowledge. Neural networks are able to learn diagnostic knowledge from process operation data. However, the learned knowledge is in the form of weights, which are difficult to comprehend. In this work, an FDI method is proposed that generates a large number of residuals computed according to the set of fault candidates. For each fault candidate, a model of the faulty behavior is worked out and residuals are obtained with this model. The advantage of using models for both fault-free and faulty behaviors lies in the fact that, in addition to estimating the state of the system, the faulty models provide the probability of occurrence or activation of each model in case of dysfunction. These probabilities are used for diagnosis issues. The residuals are analyzed according to their magnitude and signature, and a confidence factor evaluates the performance of the decision. The method is validated with the DAMADICS benchmark process [30]. This benchmark is well defined for FDI purposes. The paper is organized as follows: In Section 2, the FDI problem is presented for the DAMADICS valve actuator. In Section 3, the design of NN models for faulty and fault-free behaviors is put forward, and FDI based on those models is developed in Section 4. Section 5 presents the application of our contributions to the DAMADICS benchmark problem. Finally, in Section 6, some concluding remarks are provided.

FDI for Electropneumatic Actuator
The DAMADICS benchmark is an engineering research case study that can be used to evaluate FDI methods. The benchmark is an electropneumatic valve actuator in the Lublin sugar factory in Poland [30]. DAMADICS has been used as a test bed for the fault detection and diagnosis approach proposed in this paper. Its main characteristics are as follows: (a) the DAMADICS benchmark is based on the physical phenomena that give origin to faults in the system; (b) the DAMADICS benchmark clearly defines the process and data sets, and the fault scenarios are standardized. This is done in view of the industrial applicability of the tested FDI solutions, to rule out methods that have no practical feasibility.

Electropneumatic Actuator Description.
The actuator consists of three main parts: control valve (V), pneumatic servomotor (S), and positioner (P). It is depicted in Figure 1. Furthermore, each of the three main parts consists of other components shown in Figure 1: PSP: positioner supply air pressure; PT: air pressure transmitter; FT: volume flow rate transmitter; TT: temperature transmitter; ZT: rod position transmitter; E/P: electropneumatic converter; V1, V2: cut-off valves; V3: bypass valve; Ps: pneumatic servomotor chamber pressure; and CVI: controller output (PC output). In this actuator, faults can appear in the control valve, servomotor, electropneumatic transducer, piston rod travel transducer, pressure transmitter, or microprocessor control unit. A total of 19 different fault types is considered (p = 19, Table 1). The faults are emulated under carefully monitored conditions, keeping the process operation within acceptable limits. Five available measurements and one control signal have been considered for benchmarking purposes: process control external signal (CV), liquid pressures at the valve inlet (P_1) and outlet (P_2), liquid flow rate (F), liquid temperature (T_1), and servomotor rod displacement (X) (Table 2).
To test the robustness of the proposed fault detection and diagnosis method, several tests have been performed with the set of 19 different types of abrupt and incipient faults with several severities, according to the benchmark rules defined in the actuator benchmark library (DABLib) [31]. The simulations have been conducted considering the physical variables both free of noise and affected by noise. Furthermore, all simulation tests have been performed considering the simulator input variables. A sampling time of 1 s has been used by the fault detection system, while the simulator uses a fourth-order Runge-Kutta method with a fixed step size of 0.0025 s. The results achieved during the tests are summarized in Table 1. The white cells in Table 1 indicate that the corresponding faulty scenarios were not considered for benchmark purposes.
Within the DAMADICS project the actuator simulator was developed under MATLAB Simulink. This tool makes it possible to generate data for the normal operating mode and also for the 19 faulty modes. The considered faults are presented in Table 1. They can be considered either as abrupt or incipient. Abrupt faults may have small (S), medium (M), or big (B) magnitude. The mark " * " denotes the faults that are specified for benchmark. In this study, results are provided in case of big magnitude.

FDI Issues for Electropneumatic
Actuator. The conditions for testing and validating the FDI algorithms on the actuator benchmark are given in [32,33]. The system has already been used to experiment with several FDI methods [34][35][36]. In [36], binary-valued evaluation of the fault symptoms is explored and the authors focus on the optimization of the neural network architecture according to the Akaike Information Criterion and the Final Prediction Error. Both criteria include the learning error and also a term that depends on the complexity (size of the network in number of nodes) and on the dimension of the learning set, in order to optimize the complexity/performance ratio. The authors obtain interesting performances with small networks for detection, but some faults are not isolable. In comparison, our approach requires a larger number of networks and the networks have more nodes, but all faults are detected and isolated. In [34], multiple-valued evaluation of the fault symptoms is introduced to improve the isolation of faults. Such a method requires heuristic knowledge about the influence of faults on residuals. In comparison, our approach uses three-valued evaluation of the residuals for fault-free behaviors and binary-valued evaluation of the residuals for faulty behaviors.

Model of Fault-Free Behaviors.
Physical processes are very often complex dynamic systems with strong nonlinearities. As a consequence, knowledge-based models are not easy to obtain. Simplifications are essential to formulate an exploitable model but degrade the accuracy of the mathematical model. Other problems remain with some model parameters that are not easy to measure or estimate and that may vary in time. Another approach lies in the systematic processing of data collected by sensors. At this stage, unknown nonlinear systems are considered with input vector u(t) = (u_i(t)), i = 1, ..., m, and output vector y(t) = (y_j(t)), j = 1, ..., n. The state variables are not measurable. NNs are introduced to generate accurate models of the system in normal operating conditions [37,38]. The comparison between the output of the system and the output y_0(t) = (y_j0(t)), j = 1, ..., n, of the NN model gives the error vector e(t) = (e_j(t)), j = 1, ..., n, with e_j(t) = y_j(t) - y_j0(t). The learning of the ANN is obtained according to the Levenberg-Marquardt algorithm with early stopping. This algorithm is known for its rapid convergence. During the learning stage, the NN is trained with data collected during the normal functioning of the system. The NN is then validated with another set of data. In order to get the best model, several configurations are tested according to a trial-and-error process that uses pruning methods to eliminate useless nodes. Finally, the resulting NN is used as a fault-free model of the system.
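The fault-free modeling stage described above can be sketched as follows. The snippet is a minimal illustration, not the paper's implementation: the plant, its dynamics, and all names are hypothetical, and scikit-learn's MLPRegressor with the L-BFGS solver stands in for the Levenberg-Marquardt training used by the authors (Levenberg-Marquardt is not available in scikit-learn).

```python
# Sketch of the fault-free modeling stage: a small MLP learns the nominal
# input/output mapping, and the residual e(t) = y(t) - y0(t) measures the
# discrepancy between the plant and the model.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical static "plant": one input u, one nonlinear output y.
def plant(u, fault_bias=0.0):
    return np.sin(u) + 0.5 * u + fault_bias

u_train = rng.uniform(-3, 3, 500)
y_train = plant(u_train)

# Fault-free model NNFM(0); the architecture is chosen by trial and error,
# as in the paper's configuration search.
nnfm0 = MLPRegressor(hidden_layer_sizes=(10, 10), solver="lbfgs",
                     max_iter=2000, random_state=0)
nnfm0.fit(u_train.reshape(-1, 1), y_train)

def residual(u, y):
    """e(t) = y(t) - y0(t), the fault indicator analyzed later."""
    return y - nnfm0.predict(np.asarray(u).reshape(-1, 1))

u_test = rng.uniform(-3, 3, 100)
e_nominal = residual(u_test, plant(u_test))                  # stays near zero
e_faulty = residual(u_test, plant(u_test, fault_bias=1.0))   # offset by the fault
print(np.abs(e_nominal).mean(), np.abs(e_faulty).mean())
```

On fault-free data the residual remains small, while a fault (here a simple output offset) pushes it away from zero, which is exactly the property exploited for detection.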

Model of Fault-Free Behaviors for Actuator.
We have constructed a multilayer perceptron (MLP) NN to model the coupled outputs y_1(t) = F(t) and y_2(t) = X(t) of the DAMADICS actuator system in case of fault-free behaviors. We denote by y_10(t) and y_20(t) the estimated values of F(t) and X(t) processed by the NNs, where NNFM(0) stands for the double MLP structure with inputs CV, P_1, P_2, T_1, F, and X. To select the structure of NNFM(0), several tests have been carried out to obtain the best architecture (with a minimal number of hidden layers and of neurons per layer) for modeling the operation of the actuator. Table 3 provides some results obtained during this stage. The training and test data were generated by the simulation of the Matlab Simulink actuator model. Validation is done with the measured data provided by the Lublin Sugar Factory. From Table 3, the structure NNFM(0) = NNFM(6, 3, 2) is selected to avoid overfitting: adding more nodes in the hidden layers does not improve the performance of NNFM(0). The modeling results are very satisfactory because no noise was considered and the modeling errors are less than 10^-5 for the first output and about 10^-4 for the second output.

Models of Faulty Behaviors.
When multiple faults are considered, the isolation of the detected faults is no longer trivial and early diagnosis becomes a difficult task. One can multiply the measurements and use analysis tools (residual analysis) in order to isolate the faults, but the number of sensors limits the use of such an approach. Another approach is to use a history of collected data to improve the knowledge about the faulty behaviors and then to use this knowledge to design models of faulty behaviors and additional residuals. Such models are used to provide estimations for each fault candidate, and the decision then results from the comparison of the estimations with the measurements collected during system operation. The systematic design of models for the fault-free behaviors is the first component of the proposed approach. The design of models for faulty behaviors is similar to the method described in Section 3.1. The learning of faulty behaviors is obtained according to the Levenberg-Marquardt algorithm with early stopping. Each model is built for a specific fault candidate that is considered as an additional input.
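The idea of a bank of faulty models can be sketched as below. Everything here is illustrative: the surrogate plant, the additive fault effects, and the names are assumptions, and the diagnosis rule (pick the model with the smallest cumulative residual) is a simplified stand-in for the probabilistic evaluation developed in Section 4.

```python
# Sketch of the faulty-model bank: one network NNFM(k) per fault candidate,
# each trained on data recorded under fault f_k (k = 0 is fault-free). At run
# time, the model whose output best matches the measurements points to the
# most likely fault.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical additive fault effects on a surrogate plant output.
fault_effects = {0: 0.0, 1: 1.0, 2: -1.5}   # 0 = fault-free

def plant(u, k):
    return np.cos(u) + fault_effects[k]

u_train = rng.uniform(-3, 3, 400).reshape(-1, 1)
bank = {}
for k in fault_effects:                      # NNFM(0), NNFM(1), NNFM(2)
    m = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                     max_iter=2000, random_state=0)
    m.fit(u_train, plant(u_train.ravel(), k))
    bank[k] = m

# Diagnosis: the model with the smallest cumulative residual wins.
u_obs = rng.uniform(-3, 3, 50).reshape(-1, 1)
y_obs = plant(u_obs.ravel(), 2)              # plant currently under fault f_2
scores = {k: np.abs(y_obs - m.predict(u_obs)).sum() for k, m in bank.items()}
diagnosed = min(scores, key=scores.get)
print("most likely fault:", diagnosed)
```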

Models of Faulty Behaviors for DAMADICS.
The preceding method is applied to build the NN models corresponding to the 19 fault candidates considered in the DAMADICS benchmark. For that purpose, it is necessary to create a database that contains samples for all faults the DAMADICS system is exposed to [39]. The method is illustrated in Figure 4 for fault f_3. The network NNFM(3) learns the mapping from m = 6 inputs to n = 2 outputs when fault f_3 is assumed to affect the system from time t = 0 (equation (3)), where NNFM(3) denotes the double MLP structure associated with fault f_3.

Principle.
The proposed approach is based on the analysis of the outputs obtained after applying the input u(t) to the real system and, in parallel, to the fault-free and faulty NN models (Figure 5). Detection and diagnosis result from the comparison of these outputs. The residual r_0(t) provides information about faults for further processing. Fault detection is based on the evaluation of the residual magnitudes. It is assumed that each residual r_j0(t), j = 1, ..., n, should normally be close to zero in the fault-free case and far from zero in the case of a fault. Thus, faults are detected by setting thresholds on the residual signals (Figure 6). The main difficulty with this evaluation is that the measurement of the system outputs y(t) is usually corrupted by disturbances (e.g., measurement noise). In practice, due to modeling uncertainties and disturbances, it is necessary to assign large thresholds in order to avoid false alarms. Such thresholds usually imply a reduction of the fault detection sensitivity and can lead to missed detections. In order to avoid such problems, one can also run the models of faulty behaviors from t = 0 and use the method described below. The idea is to evaluate the probability of the fault candidates at each time. A fault is detected when the probability of one model of faulty behaviors NNFM(k), k = 1, ..., p, becomes larger than the probability of the fault-free model NNFM(0).

Proposed Method for Fault Diagnosis.
The diagnosis results either from the usual thresholding technique or from the online determination of fault probabilities and confidence factors [39]. In the second method, the faulty models run simultaneously from time t = t_d, where t_d is the fault detection time. Each model behaves according to a single fault candidate, and the resulting behaviors are compared with the collected data to provide a rapid diagnosis. In case of numerous fault candidates f_k, k = 1, ..., p, the outputs of all models must be evaluated jointly. The introduction of probabilities to evaluate the significance of each residual and the reliability of the decision is another component of our approach. The proposed method uses a time window of size w that can be sized according to the time requirement. Diagnosis with a large time window includes a diagnosis delay but leads to a decision with a high confidence index. On the contrary, diagnosis with a small time window leads to early diagnosis but with a lower confidence index. To evaluate the probability of each fault, a distance D(k, w, t) is used to determine which is the most probable fault according to delayed or early diagnosis. Two particular cases are considered: a large window covering the whole interval since detection (delayed diagnosis) and a small fixed window (early diagnosis).
The most probable fault at time t is given according to the a posteriori analysis of D(k, w, t) computed over the considered time interval. The probability p(k, w, t) that the current fault is f_k is derived from these distances: the smaller the distance D(k, w, t), the larger the probability p(k, w, t). The window size w is selected in order to satisfy real-time requirements for rapid diagnosis. Let us mention that a confidence factor CF for the diagnosis can also be worked out. The preceding method can also be combined with a thresholding technique to avoid the multiplication of residuals and to provide a reliable decision according to a hierarchical scheme. In a first stage, a small number of residuals are evaluated and analyzed. This stage leads to the determination of a subgroup of possible faults that have the same signature. Then, the fault probabilities are used within this subgroup in order to select the most probable fault candidate.
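One plausible instantiation of this window-based evaluation is sketched below. The paper's exact formulas (8)-(15) are not reproduced: the inverse-distance probabilities and the choice of the L1 norm are assumptions made for illustration, and the residual streams are synthetic.

```python
# Sliding-window fault evaluation: each model k gets a distance D(k, w, t)
# (cumulative residual magnitude over the window [t - w, t]); probabilities
# are taken inversely proportional to the distances, and the confidence
# factor is the probability of the winning candidate.
import numpy as np

def distances(residual_bank, w, t):
    """D(k, w, t): L1 norm of model k's residuals over the window [t - w, t]."""
    return {k: np.abs(r[max(0, t - w):t]).sum() for k, r in residual_bank.items()}

def fault_probabilities(residual_bank, w, t, eps=1e-9):
    d = distances(residual_bank, w, t)
    inv = {k: 1.0 / (v + eps) for k, v in d.items()}
    s = sum(inv.values())
    return {k: v / s for k, v in inv.items()}

# Toy residual streams: model 3 tracks the data (small residuals), the
# other candidates do not.
rng = np.random.default_rng(2)
residual_bank = {k: rng.normal(0.0 if k == 3 else 1.0, 0.05, 1000)
                 for k in range(5)}

p = fault_probabilities(residual_bank, w=50, t=1000)   # early-diagnosis window
best = max(p, key=p.get)
confidence = p[best]                                    # stand-in for CF
print(best, round(confidence, 2))
```

A large window (w close to t - t_d) averages out disturbances and raises the confidence at the price of a diagnosis delay; a small window reacts earlier but with a lower confidence, matching the trade-off described above.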

Fault Detection.
The residual vector r_0(t) = (r_j0(t)), j = 1, 2, with r_j0(t) = y_j(t) - y_j0(t), is first considered for fault detection, where y_10 and y_20 are the outputs of the NN model of fault-free behaviors. The detection is obtained by comparing the residuals with appropriate thresholds. Three-valued signals are obtained (positive, negative, and zero). The thresholds were calculated according to the standard deviation of the residuals in the fault-free case [39]. Let us notice that the choice of constant or adaptive thresholds strongly influences the performance of the FDI system; the thresholds must be thoroughly selected. For the continuation of our work, the thresholds δ_10 = 10·σ_1 and δ_20 = 10·σ_2 are selected, where σ_1 and σ_2 are the standard deviations obtained from the learning process. Table 4 sums up the detection performances for the 19 types of faults according to the sign of the residual vector r_0.
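The three-valued evaluation can be sketched as follows; the residual streams are synthetic and the variable names are illustrative, but the 10·σ threshold rule is the one stated above.

```python
# Three-valued residual evaluation for detection: each residual sample is
# quantized to -1, 0, or +1 against a threshold delta = 10 * sigma, where
# sigma is the residual standard deviation estimated on fault-free data.
import numpy as np

def three_valued(r, sigma):
    """Return -1 / 0 / +1 per sample for residual r with threshold 10*sigma."""
    delta = 10.0 * sigma
    return np.where(r > delta, 1, np.where(r < -delta, -1, 0))

rng = np.random.default_rng(3)
sigma = 0.01                                  # from the learning stage
r_nominal = rng.normal(0.0, sigma, 200)       # fault-free residual
r_faulty = r_nominal + 0.5                    # an abrupt fault offsets the residual

sig_nominal = three_valued(r_nominal, sigma)  # all zeros: no alarm
sig_faulty = three_valued(r_faulty, sigma)    # all +1: positive signature
print(sig_nominal.max(), sig_faulty.min())
```

The pattern of signs across the residuals forms the fault signature used in Table 4; the wide 10·σ margin keeps false alarms rare but, as discussed below, leaves some small faults undetected.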
The evaluation of the residual vector r_0 leads to a first stage in detection and isolation (Table 4). The residual also exceeds the threshold at some points, but these points can be interpreted as outliers, and the faults f_3 and f_9 are difficult to separate.
The second method leads to better results. Let us define the cumulative residuals R_1(k, w, t) and R_2(k, w, t) and the distance D(k, w, t) according to (8) and (9). The application of the method described in Section 4.3 leads to the results in Table 5. Delayed diagnosis with a large time window is obtained according to (10).
The diagnosis results are reported in Table 5 for t = 1000 s. Column 5 of Table 5 shows that the probability of fault f_3 is about 52% and the confidence factor for the diagnosis is about 51% according to (15). To conclude, f_3 is the most probable fault when the residuals are analyzed within the time interval [0, 1000 s].
Early diagnosis for fault f_3 is also illustrated by selecting a small time interval with w = 50 s. For any t ∈ [0, 1000 s], the model with minimal distance to the origin (i.e., minimal value of D(k, 50, t)) corresponds to the most probable fault. Figure 9 reports the probabilities of the fault candidates from the instant of detection versus time and also the confidence factor of the FDI decision. One can notice that the probability signals and CF exhibit a specific frequency of 0.01 Hz that corresponds to the frequency of the input.
In Figure 9(a), the upper curve, in red, corresponds to the probability of fault f_3. This probability increases with time and reaches the value 1 at time t = t_d + 290 = 791 s. It varies quickly during the decision phase [500 s, 550 s]. This illustrates the robustness of our method. Figure 9(b) shows the variations of the confidence factor calculated by (15). The proposed FDI method is also applied to isolate f_15. The application of the method described in Section 4.3 leads to the results in Table 6. Table 6 reports the location of each model NNFM(k) in the plane (R_1, R_2) and the distance D(k, w, t) at time t = 1000 s (detection occurs at t_d = 458 s). Column 5 of Table 6 also reports the probability of each fault candidate according to (14). From this column one can conclude that the most probable fault is f_15: the fault probability for f_15 is about 96%. At the same time, the probabilities of the other faults do not exceed 3%. Such indicators provide a confidence factor for the diagnosis of about 96% according to (15). Early diagnosis of fault f_15 is illustrated by selecting a small time interval with w = 50 s. For any t ∈ [0, 1000 s], the model with minimal distance to the origin corresponds to the most probable fault. In Figure 11(a), all trajectories are reported; the trajectory for model NNFM(15) is highlighted. Figure 11(b) plots details of the trajectory for model NNFM(15).
The trajectory corresponding to NNFM(15) remains near the origin in comparison with the other trajectories. One can conclude that fault candidate f_15 is the most probable fault. The repartition of the cumulative residuals in the plane (R_1, R_2) confirms the significance of both outputs F(t) and X(t) for designing residuals (one can notice that the cumulative residuals R_1(k, w, t) and R_2(k, w, t) cover the positive part of the plane (R_1, R_2)). Figure 12 reports the probabilities of the fault candidates from the instant of detection versus time and also the confidence factor of the FDI decision.
In Figure 12(a), the upper curve corresponds to the probability of fault f_15. This probability increases very quickly and reaches the value 1 at time t = t_d + 100 = 558 s. Figure 12(b) shows the variations of the confidence factor calculated by (15) and confirms that f_15 is the most probable fault. One can notice that the confidence factor for the isolation of fault f_15 reaches the value 1 quickly in comparison with fault f_3: the reason is that f_15 is an abrupt fault whereas f_3 is an incipient one.
Fault f_5 is also simulated during the time interval [302 s, 1000 s]. This fault cannot be detected with the thresholding technique: the residuals in Figure 13 are obtained and one can notice that no residual from group 3 exceeds the thresholds previously defined.
In this case, detection and isolation are obtained in a single stage by simultaneously considering all residuals for the models in group 3 (i.e., r_0, r_5, r_8, and r_14). The probabilities of the models NNFM(0), NNFM(5), NNFM(8), and NNFM(14) are reported in Figure 14(a): the probability of model NNFM(5) increases from time t = 500 s (curve with blue circles). The confidence factor reported in Figure 14(b) illustrates that the decisions provided by the FDI system are reliable in the intervals [100 s, 300 s] and [500 s, 1000 s]. Table 7 reports some conclusions concerning the detection and diagnosis of faults for the DAMADICS benchmark according to the considered methods. Results are detailed (1) for fault detection with thresholds (according to the evaluation of residual r_0); (2) for fault isolation with thresholds (according to the evaluation of residuals r_0 to r_19); (3) for fault detection and isolation with probability and confidence factor computation (according to the evaluation of residuals r_0 to r_19). 84% of the fault candidates are detected with the thresholding method. The delay to detection never exceeds 30 s. However, the faults in group 3 are not detectable with the considered thresholds. Decreasing the detection thresholds improves the detection results but also leads to false alarms, and fault f_14 remains undetectable. Some faults are detected but cannot be isolated with thresholds (e.g., f_3 and f_9): isolation succeeds for 63% of the fault candidates with the thresholding method. In comparison, the computation of fault probabilities and the confidence factor leads to the detection and isolation of all faults (for the considered example). In a few cases, the confidence factor is near 0.5 and the decision is not considered reliable. The computational effort of the proposed method is to run several (up to 6) models in parallel.
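The hierarchical scheme used for this scenario can be sketched as below. The signatures, distances, and fault set are hypothetical values chosen only to mimic the group-3 situation; the two-stage logic (signature filtering, then probabilities within the subgroup) is the point being illustrated.

```python
# Sketch of the hierarchical scheme: thresholded residual signatures first
# narrow the candidates to a subgroup, then inverse-distance probabilities
# are computed only within that subgroup.

# Hypothetical 3-valued signatures (signs of two residuals) per candidate.
signatures = {0: (0, 0), 3: (1, -1), 5: (0, 0), 8: (0, 0), 9: (1, -1), 14: (0, 0)}

def diagnose(observed_signature, distance):
    # Stage 1: keep the candidates whose signature matches the observation.
    subgroup = [k for k, s in signatures.items() if s == observed_signature]
    # Stage 2: inverse-distance probabilities restricted to the subgroup.
    inv = {k: 1.0 / (distance[k] + 1e-9) for k in subgroup}
    s = sum(inv.values())
    return {k: v / s for k, v in inv.items()}

# A fault-f5-like scenario: the signature stays at (0, 0), i.e. no threshold
# is crossed, but the cumulative distance of model NNFM(5) is the smallest.
distance = {0: 40.0, 3: 55.0, 5: 2.0, 8: 35.0, 9: 60.0, 14: 30.0}
p = diagnose((0, 0), distance)
print(max(p, key=p.get), round(max(p.values()), 2))
```

This mirrors the result above: even though no residual from group 3 crosses its threshold, the probabilities computed within the subgroup single out the correct candidate.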

Conclusion
The proposed FDI scheme combines the design of neural networks to model the fault-free and faulty behaviors of industrial systems (residual generation, with a thresholding method for isolation) with a probabilistic method (evaluating the fault probability and the confidence in the decision). The results are compared with those of a usual thresholding method. Both techniques give correct decisions in many cases. However, the results obtained with the method based on the computation of probabilities are better, and the reliability of the decision is also explicitly evaluated. In particular, the proposed method does not require computing thresholds for detection and isolation and is consequently easier to use for incipient faults.
The systematic design of fault-free and faulty models based on NNs has been proved to be suitable for early detection and diagnosis issues in case of nonlinear systems. The application of the proposed method on the DAMADICS benchmark illustrates also the performance of the proposed FDI approach.
From our point of view, the main limitation of the proposed method is the rapid increase of the computational effort when numerous fault candidates and numerous outputs are considered. To reduce this effort, one can notice that some residuals contain useful information for FDI whereas others are quite useless. Based on the evaluation of a confidence factor for each residual, we will study a method to select the most reliable residuals. Another drawback is that the proposed method requires the design of models that include the influence of faults. The strength and size of the faults can also influence the model behavior. For these reasons, the method must be carefully applied depending on the system and its operating conditions. Our future work is also to validate this technique by applying it to other systems with various operating conditions and various faults.