A Hybrid Process Monitoring and Fault Diagnosis Approach for Chemical Plants

Because chemical plants carry potentially enormous risks, process monitoring and fault diagnosis for these plants have recently been the focus of many studies. In this study, a hybrid process monitoring and fault diagnosis approach is proposed based on hazard and operability (HAZOP) analysis, kernel principal component analysis (KPCA), wavelet neural network (WNN), and fault tree analysis (FTA). HAZOP analysis helps identify the fault modes and determine the process variables to be monitored. A KPCA model is then constructed to reduce the dimensionality of the monitoring variables. At the same time, the fault features of the monitoring variables are extracted, so that process monitoring can be performed with the squared prediction error (SPE) statistic of KPCA. Then, multiple WNN models are designed to detect the fault mode online, using the low-dimensional sample data preprocessed by KPCA as training and test samples. Finally, the FTA approach is introduced to further locate the root causes of the fault mode. The proposed approach is applied to process monitoring and fault diagnosis in a depropanizer unit. Case study results indicate that the approach is applicable to process monitoring and diagnosis in large-scale chemical plants. Accordingly, it can serve as an early and reliable basis for safety management decisions by technicians and operators.


Introduction
The chemical industry is one of the most important economic forces in world development [1]. Driven by increasing demands for product quantity and production efficiency, the industry is adopting increasingly large-scale, highly automated, and complex systems. However, errors in chemical plants can have serious consequences, such as major production losses, human injury, and environmental damage. Consequently, considerable attention has been directed to the reliability and safety of chemical industry systems [2,3].
The state of a chemical process can be broadly classified as a normal condition, an abnormal condition, or a fault condition. An abnormal condition is a range of abnormal operating states that lie beyond the normal state but do not trigger automated shutdowns [4]. It can arise when actual operating conditions deviate from the original design conditions because of slight fluctuations in variables or disturbances. If an abnormal condition is not monitored and handled in a timely manner, it may develop into a fault condition. Early monitoring and diagnosis of abnormal conditions are therefore critical, so that appropriate actions can be taken to prevent faults.
In the past, process fault diagnosis relied entirely on the domain knowledge of experts because advanced monitoring devices and diagnosis approaches were lacking. As a result, faults could not be monitored and diagnosed in a timely and accurate manner because of the limitations of human ability. Over the past few years, many researchers have focused on process monitoring and fault diagnosis approaches to ensure process system safety [5][6][7]. The techniques developed can be grouped into mathematical model-based, knowledge-based, and data-driven techniques. Mathematical model-based approaches were proposed first. However, their application is limited because an accurate mathematical model is difficult to obtain, or may even be unavailable, for some complex industrial plants. With advances in computer control and artificial intelligence, data-driven and knowledge-based approaches have been developed in recent years. These approaches are built on historical information about process variables and on a priori knowledge, respectively. Unlike the mathematical model-based approach, neither requires an accurate process model to be established. Therefore, data-driven and knowledge-based approaches are widely applied in process monitoring and fault diagnosis in industrial plants [8].
An artificial neural network (ANN) is a widely used knowledge-based approach for pattern recognition and classification in nonlinear complex systems because of its strong self-learning and nonlinear modeling abilities. Among ANN techniques, the wavelet neural network (WNN) is a newer class of neural network that has been used successfully in many studies [9]. Compared with other ANN techniques, WNN offers strong universal approximation capability and fast convergence. However, its fault diagnosis accuracy can decrease as the network becomes large, which happens when the input dimensionality is very high because large-scale chemical plants have a great many monitoring variables.
Various data-driven approaches have been used, for example, principal component analysis (PCA), independent component analysis (ICA), partial least squares (PLS), and techniques aided by subspace identification. Among these, PCA is the most popular method in chemical process diagnosis. It is a multivariate statistical approach that extracts the fault features of variables and reduces their dimensionality by analyzing the correlations among them [10]. PCA can then be used to track changes in the state of the system, that is, to perform process monitoring. Nevertheless, accurately identifying fault root causes with the conventional PCA contribution plot is difficult because complicated process controls and recycle loops are common in industrial processes [8].
In recent years, some researchers have integrated improved PCA algorithms with ANNs for process monitoring and fault diagnosis. Chen and Liao proposed a fault monitoring method based on a neural network and the PCA algorithm for chemical dynamic processes [11]. Kulkarni et al. developed a monitoring model for batch processes using a PCA-assisted generalized regression neural network [12]. Rusinov et al. built hierarchical neural networks integrated with PCA for fault diagnosis in chemical processes [13]. Jiang and Yan proposed a PCA model integrated with support vector data description for chemical process monitoring [14]. Although PCA and ANN have been applied successfully in process monitoring and fault diagnosis, PCA assumes linear relationships among variables and therefore may not efficiently capture the nonlinear features of actual complex industrial processes [15]. Kernel principal component analysis (KPCA), based on kernel functions, was first proposed by Schölkopf et al. to handle nonlinear data [16]. This approach maps the input space into a high-dimensional feature space and then computes the principal components (PCs) in that space [17]. KPCA has proven more effective than PCA in process monitoring and fault diagnosis [18].
Although the approaches discussed above have been employed in the fault diagnosis of some processes, in practice it is almost impossible for any single method to succeed in the fault diagnosis of a large-scale chemical process system. For instance, it is extremely difficult to find the root causes of a fault using PCA or KPCA alone. Root causes can instead be identified from a predefined knowledge base and previous experience; however, ANN or WNN used alone generally leads to a large network with long training time and low diagnosis accuracy. The combination of PCA and ANN also has a drawback: complex chemical plants usually have a large number of process variables, so it is difficult to determine appropriate monitoring variables for a specific fault mode with these two approaches.
To address these problems, this study proposes a hybrid process monitoring and fault diagnosis approach based on hazard and operability (HAZOP) analysis, KPCA, WNN, and fault tree analysis (FTA). HAZOP analysis is used as the first step to determine the fault modes and the process variables to be monitored under fault conditions. A KPCA model is constructed from normal historical data of the process variables, and the squared prediction error (SPE) statistic is applied for process monitoring. The low-dimensional fault data preprocessed by KPCA are then used as the training and test samples of the WNN. Finally, FTA models serve as a predefined knowledge base to further locate the causes of the fault modes. The proposed approach can quickly detect abnormal and fault conditions and effectively identify fault root causes, and the information it generates can serve as a reliable basis for decision-making by technicians and operators.

Process Monitoring and Fault Diagnosis Approach
The flowchart of the proposed process monitoring and fault diagnosis approach is shown in Figure 1. The approach involves two phases: (a) establishment of the process monitoring and fault diagnosis model and (b) online application of that model.
Phase 1. First, the fault modes and the process variables to be monitored are determined by HAZOP analysis. A KPCA model is then constructed from historical monitoring data under normal conditions to reduce data dimensionality and obtain the SPE statistic. Next, multiple WNN models are established to identify fault modes, using historical fault data preprocessed by KPCA as training and test samples. Finally, the FTA method is introduced to identify fault root causes.
Phase 2. The preestablished KPCA model transforms high-dimensional online monitoring data into lower-dimensional data and calculates the SPE statistic to monitor for abnormal and fault conditions. If the SPE value exceeds the control limit, the data are fed to the WNN models to pinpoint the fault mode. FTA is then used to identify the fault root causes, completing the diagnosis.
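The online phase can be summarized as a short decision routine. The sketch below is a minimal Python illustration under stated assumptions: `kpca_project`, `wnn_classify`, and the `fta_causes` lookup table are hypothetical placeholders for the trained KPCA model, the WNN classifiers, and the ranked minimal cut sets; they are not an interface defined in this study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Diagnosis:
    spe: float
    abnormal: bool
    fault_mode: Optional[int] = None
    ranked_causes: Tuple[str, ...] = ()


def diagnose_sample(
    x: Vector,
    kpca_project: Callable[[Vector], Tuple[Vector, float]],  # hypothetical: returns (PC scores, SPE)
    spe_limit: float,
    wnn_classify: Callable[[Vector], int],                    # hypothetical: PC scores -> fault-mode number
    fta_causes: Dict[int, Tuple[str, ...]],                   # hypothetical: mode -> ranked minimal cut sets
) -> Diagnosis:
    # Step 1: project the online sample with the pre-built KPCA model and compute SPE.
    scores, spe = kpca_project(x)
    if spe <= spe_limit:
        # Within the normal-condition control limit: no alarm, no diagnosis.
        return Diagnosis(spe=spe, abnormal=False)
    # Step 2: the WNN models identify the fault mode from the low-dimensional scores.
    mode = wnn_classify(scores)
    # Step 3: look up the root causes ranked by FTA for that fault mode.
    return Diagnosis(spe=spe, abnormal=True, fault_mode=mode,
                     ranked_causes=tuple(fta_causes.get(mode, ())))
```

Only samples whose SPE exceeds the limit reach the WNN and FTA steps, which mirrors the gating role of the SPE statistic in the online phase.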
The procedure of the proposed approach is described as follows.

HAZOP.
HAZOP analysis is currently recognized as the most widely used and preferred approach for identifying hazards in the chemical process industry [19]. The plant under study is usually divided into several independent nodes to facilitate the analysis. For each node, a set of deviations is defined; each deviation combines a guide word with a process variable, such as "higher temperature" or "lower temperature." The HAZOP team then discusses the causes and consequences of the potential hazards arising from these deviations. In this study, the deviations obtained from HAZOP analysis are used to build the fault modes that make up the knowledge base of the fault diagnosis system.
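To make the link between HAZOP deviations and fault modes concrete, the sketch below stores each deviation as a (node, guide word, process variable) triple and numbers the selected deviations as fault modes in ascending order. The fixed-width binary code attached to each mode is an assumption, chosen to be consistent with the six-digit pattern "0 0 0 0 1 1" reported for fault mode 3 in the case study; the class, the function, and any example deviations are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Deviation:
    node: str        # HAZOP node where the deviation is studied
    guide_word: str  # e.g. "no", "more", "less"
    parameter: str   # process variable, e.g. "flow", "temperature"

    def label(self) -> str:
        return f"{self.guide_word} {self.parameter} ({self.node})"


def build_fault_modes(deviations: Sequence[Deviation],
                      code_width: int = 6) -> Dict[int, Tuple[Deviation, List[int]]]:
    """Number the selected deviations 1, 2, ... and attach a fixed-width binary code (assumed encoding)."""
    if len(deviations) >= 2 ** code_width:
        raise ValueError("code_width too small for the number of fault modes")
    modes = {}
    for number, dev in enumerate(deviations, start=1):
        # format(3, "06b") -> "000011": most significant digit first
        bits = [int(b) for b in format(number, f"0{code_width}b")]
        modes[number] = (dev, bits)
    return modes
```

Keeping the deviation objects alongside their codes lets the same table serve as the WNN target definition and as the list of top events for the fault trees.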

Process Monitoring Based on KPCA.
The basic idea of KPCA is to first map the input space into a high-dimensional feature space through a nonlinear mapping and then compute the PCs in that feature space [17]. In other words, PCA is performed on the data in the kernel feature space. Compared with PCA, the main advantage of KPCA is that it can extract more of the statistical features of the original nonlinear data. In the present study, KPCA is used to reduce dimensionality and extract fault features. Meanwhile, the SPE statistic is used for online monitoring of incipient abnormal conditions, and its control limit is determined from data under normal conditions. When the SPE value exceeds this control limit, an incipient abnormal condition is detected, and fault diagnosis proceeds as described below. The detailed KPCA procedure is presented in [15].
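A compact NumPy sketch of KPCA-based monitoring is given below. It assumes a Gaussian (RBF) kernel with the width heuristic 2σ² = 5d for d autoscaled variables, selects the number of PCs by an 85% cumulative contribution rate, and computes SPE as the squared distance between the centered feature-space image of a sample and its projection onto the retained PCs. The kernel, the width heuristic, and the use of the empirical 99th percentile of training SPE as the control limit are assumptions of this sketch (a weighted chi-square approximation is a common alternative for the limit); the class name `KPCAMonitor` is hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


class KPCAMonitor:
    def __init__(self, sigma=None, cum_rate=0.85, confidence=0.99):
        self.sigma, self.cum_rate, self.confidence = sigma, cum_rate, confidence

    def fit(self, X):
        """Build the KPCA model and SPE control limit from normal data X (samples x variables)."""
        X = np.asarray(X, dtype=float)
        self.mean, self.std = X.mean(0), X.std(0) + 1e-12  # autoscaling statistics
        self.Z = (X - self.mean) / self.std
        if self.sigma is None:                             # assumed width heuristic: 2 sigma^2 = 5 d
            self.sigma = np.sqrt(2.5 * X.shape[1])
        K = rbf_kernel(self.Z, self.Z, self.sigma)
        self.K_colmean, self.K_mean = K.mean(0), K.mean()
        # Center the kernel matrix in feature space: K - 1K - K1 + 1K1.
        Kc = K - self.K_colmean[None, :] - self.K_colmean[:, None] + self.K_mean
        lam, V = np.linalg.eigh(Kc)
        lam, V = lam[::-1], V[:, ::-1]                     # descending eigenvalues
        keep = lam > 1e-10 * lam[0]
        lam, V = lam[keep], V[:, keep]
        ratio = np.cumsum(lam) / lam.sum()
        self.p = int(np.searchsorted(ratio, self.cum_rate) + 1)  # smallest p reaching cum_rate
        # Scale eigenvectors so that each feature-space loading vector has unit norm.
        self.alphas = V[:, :self.p] / np.sqrt(lam[:self.p])
        self.limit = float(np.quantile(self.spe(X), self.confidence))
        return self

    def _test_kernel(self, X):
        Zt = (np.atleast_2d(np.asarray(X, dtype=float)) - self.mean) / self.std
        Kt = rbf_kernel(Zt, self.Z, self.sigma)
        # Center the test kernel rows with the training statistics.
        Kt_c = Kt - self.K_colmean[None, :] - Kt.mean(1)[:, None] + self.K_mean
        return Kt, Kt_c

    def transform(self, X):
        """Scores on the retained nonlinear PCs (the reduced data passed on to the WNNs)."""
        return self._test_kernel(X)[1] @ self.alphas

    def spe(self, X):
        """SPE = squared norm of the centered image minus its squared projection on the retained PCs."""
        Kt, Kt_c = self._test_kernel(X)
        T = Kt_c @ self.alphas
        norm_sq = 1.0 - 2.0 * Kt.mean(1) + self.K_mean    # uses k(x, x) = 1 for the Gaussian kernel
        return norm_sq - (T**2).sum(1)
```

In use, `transform` supplies the low-dimensional scores for the WNNs, and a sample is flagged as abnormal when `spe(x)` exceeds `limit`.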

Fault Diagnosis Based on WNN.
WNN is employed to quickly identify the fault mode underlying an abnormal condition.
WNN is a newer type of feedforward neural network that combines the wavelet transform with an ANN. It differs from a conventional ANN in that a wavelet function, instead of the sigmoid function, serves as the activation function. WNN thus combines the multiscale time-frequency localization of wavelets with the self-learning ability of neural networks, so it converges faster and has stronger universal approximation performance than a conventional ANN. Because industrial processes have multiple fault modes, a single WNN covering all of them would require a very large network topology, and its diagnosis accuracy and convergence speed could then be lower than those of several smaller networks. For this reason, multiple WNNs are constructed separately for different fault modes in this study.
A three-layer WNN consisting of an input layer, a hidden layer, and an output layer was constructed, as shown in Figure 2.
After dimensionality reduction through KPCA, the historical fault data were used as the inputs of the WNN training and test samples, and the codes of the fault modes obtained by HAZOP analysis were used as the WNN outputs. The numbers of input and output nodes are therefore determined by the number of PCs retained by KPCA and the number of digits in the fault-mode code, respectively. In this work, the fault modes are numbered in ascending order, and each is represented by a binary code, such as "0, 0, . . ., 1, 0, . . ., 0." The number of hidden nodes can be calculated according to

$$l = \sqrt{m + n} + \alpha \qquad (1)$$

where $l$, $m$, and $n$ denote the numbers of nodes in the hidden, input, and output layers, respectively, and $\alpha$ is a specified parameter within the range (0, 10). A Morlet wavelet function is selected as the activation function of the hidden layer; that is,

$$\psi(x) = \cos(1.75x)\, e^{-x^{2}/2} \qquad (2)$$

The outputs of the hidden layer $h(j)$ and the output layer $y(k)$ are obtained as

$$h(j) = \psi_j\!\left(\frac{\sum_{i=1}^{m} \omega_{ij} x_i - b_j}{a_j}\right), \qquad j = 1, 2, \ldots, l \qquad (3)$$

$$y(k) = \sum_{j=1}^{l} \omega_{jk}\, h(j), \qquad k = 1, 2, \ldots, n \qquad (4)$$

where $x_i$ is the $i$th input; $b_j$ and $a_j$ are the translation and dilation factors of the wavelet function, respectively; $\omega_{ij}$ are the weights between the input and hidden layer nodes; and $\omega_{jk}$ are the weights between the hidden and output layer nodes. Detailed descriptions of these equations can be found in [20]. The historical data under abnormal and fault conditions are used to train the WNN. The four sets of parameters above are updated and optimized continuously during training until the absolute error meets the desired goal. Finally, the binary code predicted by the WNN is converted into the decimal number of the corresponding fault mode.
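The WNN described above can be sketched in NumPy as follows. The hidden layer applies the Morlet function cos(1.75u)·exp(−u²/2) to u = (Σ ω_ij x_i − b_j)/a_j, the output layer is a weighted sum of the hidden outputs, and the four parameter sets are updated by plain batch gradient descent on the mean squared error until the maximum absolute error falls below a goal (0.05 here). The training rule, learning rate, initialization, the positivity clamp on the dilation factors, and the empirical form l = ⌈√(m + n) + α⌉ with rounding up in `hidden_nodes` are assumptions of this sketch; with m = 7, n = 6, and α = 9.5 it gives the 14 hidden nodes used in the case study.

```python
import numpy as np


def hidden_nodes(m, n, alpha):
    """Assumed empirical rule l = ceil(sqrt(m + n) + alpha), with 0 < alpha < 10."""
    return int(np.ceil(np.sqrt(m + n) + alpha))


def morlet(u):
    return np.cos(1.75 * u) * np.exp(-u**2 / 2)


def morlet_grad(u):
    return -np.exp(-u**2 / 2) * (1.75 * np.sin(1.75 * u) + u * np.cos(1.75 * u))


class WNN:
    """Three-layer wavelet neural network: Morlet hidden layer, linear output layer."""

    def __init__(self, m, l, n, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / np.sqrt(m), (m, l))  # input -> hidden weights (omega_ij)
        self.W2 = rng.normal(0, 1 / np.sqrt(l), (l, n))  # hidden -> output weights (omega_jk)
        self.a = np.ones(l)                              # dilation factors a_j
        self.b = rng.normal(0, 0.5, l)                   # translation factors b_j

    def _hidden(self, X):
        u = (X @ self.W1 - self.b) / self.a
        return u, morlet(u)

    def predict(self, X):
        return self._hidden(X)[1] @ self.W2

    def loss_and_grads(self, X, D):
        """Mean squared error (halved) and its gradients with respect to all four parameter sets."""
        u, h = self._hidden(X)
        err = h @ self.W2 - D
        loss = 0.5 * np.sum(err**2) / len(X)
        dy = err / len(X)
        du = (dy @ self.W2.T) * morlet_grad(u)
        grads = {
            "W2": h.T @ dy,
            "W1": X.T @ (du / self.a),
            "b": -(du / self.a).sum(0),
            "a": -(du * u / self.a).sum(0),
        }
        return loss, grads

    def fit(self, X, D, lr=0.05, epochs=2000, goal=0.05):
        for _ in range(epochs):
            if np.max(np.abs(self.predict(X) - D)) < goal:  # stop once the absolute error meets the goal
                break
            _, g = self.loss_and_grads(X, D)
            for name, grad in g.items():
                setattr(self, name, getattr(self, name) - lr * grad)
            self.a = np.maximum(self.a, 0.05)               # keep dilation factors positive
        return self
```

Training one such network per group of fault modes, as the text describes, keeps each output layer small.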

FTA.
With the preceding procedure, the WNN detects the fault mode under abnormal and fault conditions. In many situations, however, technicians and operators find it considerably difficult to locate the fault cause accurately, because a fault mode usually relates to several fault causes. In this study, the FTA approach is therefore introduced to locate fault causes under abnormal and fault conditions. FTA is a popular digraph-based approach for the quantitative risk assessment of a defined industrial process; it combines primary events with Boolean operators represented by logic gates [21]. The output of the WNN, that is, the fault mode, is taken as the top event (TE) of the fault tree. The intermediate events (IEs) at each level are then determined from the top down until all possible basic events (BEs) are identified. The resulting fault tree models serve as a predefined knowledge base. The fault root causes in a fault tree are often represented by its minimal cut sets [22], and the higher the probability of occurrence of a minimal cut set, the more likely that cut set is the actual cause of the TE. The probability of occurrence of each minimal cut set is therefore evaluated quantitatively, and the cut sets are ranked in descending order, with the highest-ranked set considered the most probable fault cause. Based on the FTA results, technicians and operators can pinpoint the fault cause under abnormal and fault conditions and take appropriate preventive actions.
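The cut-set step can be illustrated with a small Python sketch. It expands a fault tree from the top down into cut sets (an OR gate takes the union of its children's cut sets; an AND gate combines one cut set from each child), removes non-minimal sets by Boolean absorption, and ranks the minimal cut sets by occurrence probability. The tree, the basic-event failure rates, the exponential model P = 1 − exp(−λt), and the independence of basic events (so a cut set's probability is the product of its members' probabilities) are illustrative assumptions; they do not reproduce Figure 6 or Table 5. The 8000 h horizon matches the one used in the case study.

```python
import math
from itertools import product

# Hypothetical fault tree (not Figure 6): gate -> (gate type, children).
# Names that are not keys of the dict are basic events.
TREE = {
    "TE": ("OR", ["IE1", "IE2"]),
    "IE1": ("AND", ["BE1", "BE2"]),
    "IE2": ("OR", ["BE3", "BE4", "BE5"]),
}


def minimize(sets):
    """Boolean absorption: discard any cut set that strictly contains another one."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(event, tree):
    """Top-down expansion of an event into its minimal cut sets."""
    if event not in tree:                      # basic event: a cut set on its own
        return [frozenset([event])]
    gate, children = tree[event]
    child_sets = [cut_sets(child, tree) for child in children]
    if gate == "OR":                           # any child's cut set triggers the gate
        combined = [s for sets in child_sets for s in sets]
    elif gate == "AND":                        # one cut set from every child is needed
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate type: {gate}")
    return minimize(combined)


def rank_cut_sets(mcs, failure_rates, hours=8000.0):
    """Rank minimal cut sets by occurrence probability over the mission time, highest first."""
    # Assumed model: P(BE) = 1 - exp(-rate * hours), basic events independent.
    p_be = {be: 1.0 - math.exp(-rate * hours) for be, rate in failure_rates.items()}
    scored = [(sorted(s), math.prod(p_be[be] for be in s)) for s in mcs]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The first entry returned by `rank_cut_sets` plays the role of the "most probable fault cause" in the procedure above.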

Case Study
In this section, the proposed process monitoring and fault diagnosis approach is applied to a depropanizer unit, whose schematic diagram is shown in Figure 3. Liquefied petroleum gas (LPG) from another unit enters the depropanizer feed tank and is then pumped by the feed pump to the feed preheater, where it is heated to its bubble point temperature. The LPG is fed to the 27th tray of the depropanizer. The C2 and C3 fractions from the top of the depropanizer are condensed and collected in the depropanizer reflux tank. Part of the condensate is returned to the depropanizer as reflux, and the remaining condensate is pumped to the deethanizer by the deethanizer feed pump. The C4 and C5 fractions from the bottom of the depropanizer are fed to the deisobutanizer.

Data Sample Collection.
Based on the HAZOP analysis results, 27 process monitoring variables (Table 1) were used, and six types of deviations (Table 2) were selected as the fault modes for the case study. A dynamic process simulation model of the unit was built in UniSim software to simulate normal, abnormal, and fault conditions. After the simulation model had run for 50 minutes under normal operating conditions, six types of disturbance signals were superimposed on the system to simulate the abnormal and fault conditions generated by the fault modes. Fault samples were collected from the beginning of each disturbance. In the simulation, each process variable was sampled at 20 s intervals under normal conditions and at 4 s intervals under fault conditions. In total, 150 samples were recorded under normal operating conditions, and 1125 samples covering the abnormal and fault conditions were recorded for each fault mode. The 1125 simulated samples of each fault mode were randomly partitioned into a training set of 900 samples and a test set of the remaining 225 samples. In addition, the sample data of a given fault mode were used as an online validation set to verify the performance of the process monitoring and fault diagnosis model.
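The sample bookkeeping above can be checked with a short sketch. It randomly splits each fault mode's 1125 samples into 900 training and 225 test samples and stacks the modes handled by each WNN; with three fault modes per network, as in the case study, this gives 2700 training and 675 test samples per WNN. The function names, the fixed random seed, and the grouping of modes 1 to 3 and 4 to 6 are assumptions. As a consistency check on the sampling rates, 150 normal samples at 20 s intervals correspond exactly to the 50 min normal run, and 1125 samples at 4 s intervals span 75 min.

```python
import numpy as np


def split_fault_samples(samples_by_mode, n_train=900, seed=0):
    """Randomly split each fault mode's samples into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    splits = {}
    for mode, data in samples_by_mode.items():
        idx = rng.permutation(len(data))
        splits[mode] = (data[idx[:n_train]], data[idx[n_train:]])
    return splits


def group_for_wnns(splits, groups):
    """Stack the training and test sets of the fault modes assigned to each WNN."""
    datasets = []
    for modes in groups:
        X_tr = np.vstack([splits[m][0] for m in modes])
        X_te = np.vstack([splits[m][1] for m in modes])
        # Labels record which fault mode each row came from.
        y_tr = np.concatenate([np.full(len(splits[m][0]), m) for m in modes])
        y_te = np.concatenate([np.full(len(splits[m][1]), m) for m in modes])
        datasets.append((X_tr, y_tr, X_te, y_te))
    return datasets
```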

Online Process Monitoring and Fault Diagnosis.
The normal-condition sample data were input into the KPCA model. The eigenvalues and cumulative contribution rates of the first seven PCs of the kernel matrix generated by KPCA are shown in Table 3. The cumulative contribution rate of the first seven PCs exceeds the 85% threshold, so the number of PCs was set to seven; that is, KPCA reduced the dimensionality of the sample data from the original 27 to 7. This indicates that KPCA substantially reduces the dimensionality of the nonlinear data. The SPE control limit was also computed for process monitoring. To evaluate the online process monitoring model, 2000 online validation samples of fault mode 3, collected at 3 s intervals, were input into the KPCA model. The resulting SPE chart (99% confidence limit) is shown in Figure 4. The SPE value rises rapidly and exceeds the control limit at the 500th sample, and the system enters a fault condition at the 904th sample. The 500th sample therefore marks the transition from the normal condition to the abnormal condition. These results agree with the predefined scenario, indicating that KPCA monitoring effectively extracts the fault features of the monitoring variables.
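The PC-selection rule applied here can be written as follows, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ are the nonzero eigenvalues of the centered kernel matrix. This is the standard cumulative contribution criterion; the notation $\mathrm{CCR}$ and $p^{*}$ is introduced here for illustration.

```latex
\mathrm{CCR}(p) = \frac{\sum_{j=1}^{p} \lambda_j}{\sum_{j=1}^{N} \lambda_j},
\qquad
p^{*} = \min\{\, p : \mathrm{CCR}(p) \ge 0.85 \,\}
```

In the case study $p^{*} = 7$, so each 27-dimensional sample is represented by seven nonlinear PC scores.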
Next, the fault modes responsible for the abnormal and fault conditions were identified. The six fault modes were divided into two groups, so that each WNN handles three fault modes. The two sets of 7-dimensional fault sample data preprocessed by KPCA were input into the WNN models. The fault mode code corresponding to the WNN output is a six-digit binary code (Table 2). The input and output layers therefore have 7 and 6 nodes, respectively, and the hidden layer has 14 nodes according to (1), giving a 7-14-6 WNN structure. For each WNN, 2700 training samples and 675 test samples were used, and the absolute prediction error of the WNN was 0.05. Figure 5 shows the prediction results for the 900 validation samples of fault mode 3. The numbers 1 to 6 in Figure 5 denote the sequence numbers of the nodes in the WNN output layer; for example, subplot 1 of Figure 5(a) shows the outputs of the first output node of the first WNN for the validation samples. Figure 5 shows that more than 90% of the outputs of the first WNN clearly match the pattern "0 0 0 0 1 1" within a tolerance of ±0.2, which corresponds to fault mode 3, whereas the outputs of the second WNN follow no regular pattern. These case study results show that the fault diagnosis accuracy of the proposed approach is high. After the fault mode under the abnormal condition is identified, further diagnosis through FTA is required to pinpoint the root cause. The fault tree for fault mode 3 and its events are shown in Figure 6 and Table 4, respectively. Fault mode 3, that is, no flow of overhead product to the deethanizer, is defined as the TE of the fault tree, and FTA yields two IEs and five BEs. The minimal cut sets of the fault tree and the probability of occurrence of each over 8000 h are shown in Table 5.
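The decoding rule described above, in which each output is read as 0 or 1 when it lies within ±0.2 of that value, can be sketched as follows. The sketch assumes that the six-digit code is the binary representation of the fault-mode number, most significant digit first, which is consistent with "0 0 0 0 1 1" denoting fault mode 3; rows with any output outside both tolerance bands are marked −1 (undecided). The function names are hypothetical.

```python
import numpy as np


def decode_outputs(Y, tol=0.2):
    """Decode each row of WNN outputs into a fault-mode number; -1 marks an undecided row."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    ones = np.abs(Y - 1.0) <= tol      # outputs read as digit 1
    zeros = np.abs(Y) <= tol           # outputs read as digit 0
    decided = np.all(ones | zeros, axis=1)
    weights = 2 ** np.arange(Y.shape[1] - 1, -1, -1)  # most significant digit first
    modes = ones.astype(int) @ weights
    return np.where(decided, modes, -1)


def match_rate(Y, mode, tol=0.2):
    """Fraction of samples whose decoded output equals the expected fault mode."""
    return float(np.mean(decode_outputs(Y, tol) == mode))
```

Applied to a WNN's outputs on the validation samples, `match_rate(Y, 3)` gives the fraction of samples decoded as fault mode 3.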

Conclusions
In this study, a hybrid process monitoring and fault diagnosis approach is presented. To effectively identify potential abnormal and fault conditions, HAZOP analysis is used to define deviations as fault modes and to select the process variables to be monitored. The KPCA method is used to reduce data dimensionality and to build the process monitoring statistic by extracting fault features. The low-dimensional KPCA data corresponding to the fault modes identified by HAZOP analysis serve as the training and test samples of the WNN, which is then used to detect the fault mode under online abnormal and fault conditions.
For locating the root causes of a fault mode, FTA is particularly useful in helping technicians and operators quickly identify potential risks. A case study of process monitoring and fault diagnosis in a depropanizer unit shows that the proposed hybrid approach is effective in ensuring process safety in large-scale chemical plants.
In real plants, process complexity may result in a large number of monitoring variables and hence high-dimensional samples in the KPCA model. The resulting kernel matrix may then become very large, that is, the curse of dimensionality can occur, which significantly slows down the KPCA computation. Data precompression techniques, such as an immune algorithm, can help address this problem by substantially reducing the number of monitoring variables.
In addition, if new fault modes or fault causes appear during process monitoring, the newly collected data samples should be added to the WNN and fault tree models, so that the system can detect the fault mode and pinpoint its causes if the same fault recurs.

Figure 1: Flowchart of the process monitoring and fault diagnosis approach (black line: establishment procedure of the process monitoring and fault diagnosis model; red dotted line: online application of the process monitoring and fault diagnosis model; HAZOP: hazard and operability; KPCA: kernel principal component analysis; WNN: wavelet neural network; FTA: fault tree analysis).

Figure 2: Network structure of the WNN.

Table 2: Fault mode versus the output code in WNN.

Figure 4: SPE chart of the online validation data set obtained with KPCA.

Figure 5: WNN outputs for fault mode 3. (a) Predicted outputs of the first WNN; (b) predicted outputs of the second WNN (black line: predicted outputs of the WNN; red line: desired outputs of the WNN).

Table 3: Cumulative contribution rate of the eigenvalues.

Table 5 .
The table shows that BE2 and BE1 are the primary factors leading to the occurrence of the TE. When the SPE value exceeds the control limit in the fault diagnosis system, the FTA knowledge base helps technicians and operators effectively locate the root cause and then take appropriate measures to eliminate the fault.

Figure 6: Fault tree for fault mode 3 (TE: top event; IE: intermediate event; BE: basic event).

Table 5: Probability of occurrence of the minimal cut sets.