A Fault Prognosis Strategy Based on Time-Delayed Digraph Model and Principal Component Analysis

Because of the interlinking of process equipment in process industry, event information may propagate through the plant and affect many downstream process variables. Specifying the causality and estimating the time-delays among process variables are critically important for data-driven fault prognosis. They are not only helpful for finding the root cause when a plant-wide disturbance occurs, but also for revealing the evolution of an abnormal event propagating through the plant. This paper addresses the information flow directionality and time-delay estimation problems in process industry and presents an information synchronization technique to assist fault prognosis. Time-delayed mutual information (TDMI) is used for both causality analysis and time-delay estimation. To represent the causality structure of high-dimensional process variables, a time-delayed signed digraph (TD-SDG) model is developed. A general fault prognosis strategy is then developed based on the TD-SDG model and principal component analysis (PCA). The proposed method is applied to an air separation unit and achieves satisfying results in predicting the frequently occurring "nitrogen-block" fault.


Introduction
The desire and need for accurate diagnostic and truly predictive prognostic capabilities are apparent in process industry. Detecting potential problems quickly and diagnosing them accurately before they become serious can significantly increase process safety, reduce production costs, and guarantee product quality. In terms of methodology and technology, this involves fault detection, fault diagnosis, and fault prognosis, the three major tasks of prognostics and health management (PHM) systems. Fault prognosis is the most difficult of the three, since it requires the ability to acquire knowledge about events before they actually occur [1]. Much more progress has been made in fault detection and diagnosis than in prognosis [2-5]. Despite the difficulties, some impressive achievements have been made in fault prognosis, which has been approached via a variety of techniques, including model-based methods such as time-series prediction, Kalman filtering, and physics- or empirical-based models; probabilistic/statistical methods such as Bayesian estimation and the Weibull model; and data-driven prediction techniques such as neural networks. References [1, 4, 5] give comprehensive surveys of these fault prognosis methods.
For process industry, quantitative data-driven methods are more attractive, because accurate analytical models are usually unavailable due to process complexity, while abundant process measurements provide a wealth of information on process safety and product quality. For almost all data-based process modeling, monitoring, fault detection, diagnosis, or prognosis methods, the process measurements collected by the distributed control systems (DCS) adopted in many industrial plants are synchronized by sampling time. But in many industrial processes, such as oil refining, petrochemicals, water and sewage treatment, food processing, and pharmaceuticals, raw materials are processed sequentially by a series of interlinked units along the production line. Flowing material generates flowing information. Synchronizing process measurements by sampling time therefore implies that information delays may exist among correlated process variables located in different operation units.
For convenience, consider a process with two units A and B, where x_A and x_B are correlated variables measured at each unit and the production material flows from A to B, as shown in Figure 1.
Suppose that an abnormal event occurs in unit A at time k and is not serious enough to cause any alarm in unit A; the event will appear in unit B at time k + τ, where τ is the time-delay determined by the process characteristics. The process measurement x_A(k) may be affected by the event immediately, but x_B(k) remains in a normal state until time k + τ. The early-stage event information may be obscured by the downstream measurements if we treat the process measurements in the routine form {x_A(k), x_B(k)} instead of the time-series form {x_A(k), x_B(k + τ)}. Synchronizing process measurements by event information instead of sampling time can highlight early-stage process abnormalities, which is important for realizing earlier fault detection and diagnosis in industrial processes.
Information synchronization has received extensive attention in many scientific fields, such as physics, medicine and biology, computer science, and even economics and ergonomics [6, 7]. Causality (i.e., a cause-effect relationship or dynamical dependence) can be detected by synchronizing the temporal evolutions of two coupled systems [8]. Clearly, fault prognosis in an industrial process becomes easier once the dynamical interdependences among process variables are retrieved.
To realize information synchronization and further benefit fault prognosis, two basic issues must first be solved: identifying the causality and estimating the time-delays among variables as information flows through the subsystems. Specifying the causality and estimating the time-delays among process variables are critically important for data-driven fault prognosis in process industry. They are not only helpful for finding the root cause when a plant-wide disturbance occurs in a complex industrial process, but also for revealing the evolution of an abnormal event propagating through the plant.
There is an extensive literature on causality modeling that applies and combines mathematical logic, graph theory, Markov models, Bayesian probability, and so forth [10]. Recently, information-theoretic approaches have attracted more attention, since they allow causality to be quantified and measured computationally. The linear framework for measuring and testing causality was developed by Granger, who proposed the definition of Granger causality (GC). In the causal analysis of two variables in an industrial process, time-delay estimation naturally arises. Because of the interlinking of process equipment, event information may propagate through the plant and affect many downstream process variables. If the information flow direction can be determined, then a time-delayed correlation can be taken as evidence of causality due to physical causation. Several methods have been published for determining the time-delay. A practical method was proposed that uses the cross-correlation function to estimate the time-delay between process measurements and derives causal maps for identifying the propagation path of plant-wide disturbances [18]. The cross-correlation technique has the benefits of conceptual simplicity and fast computation, but it requires the correlations between measurements to be linear. Auto-mutual information (AMI) has also been adopted to deal with the time-delay in a single time series [19] and can be extended to multivariate time series. Compared with the cross-correlation technique, entropy-based methods such as AMI or TDMI are more general, as they can handle nonlinear correlations.
This paper addresses the information flow directionality and time-delay estimation problems in process industry and presents an information synchronization technique to assist fault prognosis. TDMI is used in this paper because it can be easily modified for both causality detection and time-delay estimation. To represent the causality structure of high-dimensional process variables, a time-delayed signed digraph (TD-SDG) is developed as a process model. A general fault prognosis strategy is then developed, consisting of two phases: offline modeling (phase I) and online fault prognosis (phase II). In phase I, process measurements collected from the historical database are rearranged into time-series form, and a widely used statistical projection method, principal component analysis (PCA), is used for data modeling. Data-based prediction models are also developed offline for all nonroot nodes in the TD-SDG process model, because in phase II the first task is to predict future process measurements so that the process data can be arranged in time-series form for online information synchronization.
The proposed fault prognosis strategy is applied to an air separation unit (ASU) that suffers from a frequent nitrogen-block fault in its argon production subsystem. The application results show that the strategy achieves early and accurate detection of the nitrogen-block fault and meets the application needs.

Information Directionality and Time-Delay Estimation
Entropy, or information entropy, is the most popular measure for quantifying the information in a random variable. To quantify the dependency between two or more random variables, mutual information is widely used. Let us consider two time series X and Y. Each time series can be thought of as a random variable with an underlying probability density function (PDF), p(x) or p(y). The mutual information (MI) between X and Y is defined as

MI(X, Y) = Σ_{i,j} p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ],   (2.1)

where p(x, y) is the joint PDF of X and Y. The mutual information function is strictly nonnegative and attains its maximum when the two variables are completely identical. Note that MI(X, Y) is symmetric under the exchange of X and Y, so it quantifies the amount of dependency but cannot measure its directionality or causality. However, an asymmetric MI, called time-delayed mutual information (TDMI), is easily obtained by adding a time-delay τ to one of the variables:

TDMI_XY(τ) = Σ_{i,j} p(x_i, y_{j+τ}) log [ p(x_i, y_{j+τ}) / (p(x_i) p(y_{j+τ})) ],
TDMI_YX(τ) = Σ_{i,j} p(y_j, x_{i+τ}) log [ p(y_j, x_{i+τ}) / (p(y_j) p(x_{i+τ})) ].   (2.2)
TDMI was first suggested by Fraser and Swinney [20] as a tool to determine a reasonable delay between two series: if the time-delayed mutual information exhibits a marked minimum at a certain value τ*, then τ* is a good candidate for a reasonable time-delay. In the field of neurophysiology [21], TDMI was extended to indicate the direction of information flow. Since TDMI_XY and TDMI_YX are not symmetric, their difference, NI_XY = TDMI_XY − TDMI_YX, gives the net flux of information, which may be interpreted as the information flow between the two variables. If NI_XY is positive, the information flows from X to Y, and vice versa [21].
This idea is very similar to transfer entropy, but it is more attractive for the following reason. Although transfer entropy is effective in determining directionality and has been applied to specify the directionality of fault propagation paths in some industrial processes [22, 23], it is difficult to determine the time-delay with it, compared with the TDMI-based method, according to our experiments. Returning to the motivation of synchronizing process measurements in terms of event information, the time-delay is as important as the information directionality. Therefore, a TDMI-based causality analysis and time-delay estimation method is proposed as follows.
According to [20], the TDMI method estimates the time-delay τ* at which the TDMI function shows its first local optimum. From (2.2), the time-delays τ*_XY and τ*_YX can be defined as

τ*_XY = arg opt_{0 ≤ τ ≤ N} TDMI_XY(τ),   τ*_YX = arg opt_{0 ≤ τ ≤ N} TDMI_YX(τ),   (2.3)

where "arg opt" denotes the first local optimum of the TDMI curve and N is the length of the estimation window. To specify the information directionality, an index is introduced:

D_XY = TDMI_XY(τ*_XY) − TDMI_YX(τ*_YX).   (2.4)

If D_XY is positive, the information flows from X to Y with the time-delay τ*_XY; if D_XY is negative, the information flows from Y to X with the time-delay τ*_YX. Compared with the method in [21], the above method estimates information directionality and time-delay simultaneously.
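The procedure above can be sketched in code. This is a minimal illustration: the probability densities are estimated with a simple 2-D histogram, and the global maximum of the TDMI curve is used in place of the first local optimum for robustness. The bin count and window length are assumptions, not the paper's exact implementation.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram-based estimate of MI(X, Y) in nats, per (2.1)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                           # skip empty bins
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def tdmi(x, y, tau):
    """TDMI_XY(tau): MI between x(k) and y(k + tau), tau >= 0, per (2.2)."""
    if tau == 0:
        return mutual_info(x, y)
    return mutual_info(x[:-tau], y[tau:])

def causality_and_delay(x, y, max_tau=50):
    """Return (direction, delay): direction +1 means X -> Y, -1 means Y -> X.
    The dominant TDMI curve decides the sign of D_XY; its peak lag is tau*."""
    taus = range(1, max_tau + 1)
    mi_xy = [tdmi(x, y, t) for t in taus]  # information flow X -> Y
    mi_yx = [tdmi(y, x, t) for t in taus]  # information flow Y -> X
    if max(mi_xy) >= max(mi_yx):           # sign of D_XY
        return +1, 1 + int(np.argmax(mi_xy))
    return -1, 1 + int(np.argmax(mi_yx))
```

For example, if y is a noisy copy of x delayed by 7 samples, `causality_and_delay(x, y)` recovers the direction X → Y and the delay 7.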

Offline Information Synchronization and Online Information Prediction
Once the information flow directions and time-delays among process variables are quantitatively determined by the above TDMI method, it is easy to synchronize the process measurements by rearranging the process data in time-series form, as illustrated in Figure 2.
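Concretely, the rearrangement in Figure 2 amounts to shifting each downstream variable forward by its accumulated delay relative to the root variable; a minimal sketch (the samples-by-variables array layout is an assumption for illustration):

```python
import numpy as np

def synchronize(data, delays):
    """Rearrange columns of `data` (samples x variables) into time-series form.
    delays[j] is the accumulated delay of variable j relative to the first
    (root) variable, with delays[0] == 0. Row k of the result holds
    [x_1(k), x_2(k + tau_2), ..., x_m(k + tau_m)]."""
    delays = np.asarray(delays)
    n = data.shape[0] - delays.max()          # usable window length
    cols = [data[d:d + n, j] for j, d in enumerate(delays)]
    return np.column_stack(cols)
```

After this rearrangement, measurements that describe the same underlying event sit in the same row, which is what allows PCA to be applied on event-synchronized data later.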
The offline information synchronization is simple and effective in analyzing the causal relationships among process variables, which benefits post hoc fault diagnosis. For earlier fault detection, however, future measurements must be predicted. Taking the process in Figures 1 and 2 as an example again, for online application the measurement x_B(k + τ) is not available at time k, so a model is needed to estimate x_B(k + τ) for prognostic purposes. There are plenty of data-based prediction methods, such as time-series models [24, 25], the Kalman filter [26], and artificial neural networks [27], and all of them can be easily embedded into the proposed method. Taking neural networks as an example, there are many configurations and types of networks for data prediction. In general, multilayer perceptron (MLP) and radial basis function (RBF) networks offer fast training, which is useful for adaptive prediction systems. When the system shows a significantly time-varying relationship between its inputs and outputs, dynamic or recurrent neural networks are required to model the time evolution of the dynamic system. To present the panorama of the proposed fault prognosis strategy, a classical multilayer neural network with the back-propagation (BP) learning algorithm is used in Section 4 for predicting future process measurements:

x̂_B(k + τ) = f_NN(x_A(k), x_B(k), ...).   (2.6)

Figure 2: Rearrangement of process data for the illustrative process in Figure 1.
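As an illustration of the kind of BP-trained network referred to above, the following is a minimal one-hidden-layer perceptron trained by plain batch gradient descent. It is a sketch, not the paper's actual network: the hidden-layer size, learning rate, and epoch count are arbitrary choices.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.1, epochs=5000, seed=0):
    """Train a one-hidden-layer perceptron by back-propagation
    (batch gradient descent on squared error).
    Returns a predict(Xq) -> y_hat closure."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.standard_normal((d, hidden)) * 0.5
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal(hidden) * 0.5
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)               # hidden activations
        y_hat = h @ W2 + b2                    # linear output layer
        err = y_hat - y                        # residual
        # back-propagate gradients of the mean squared error
        gW2 = h.T @ err / n
        gb2 = err.mean()
        gh = np.outer(err, W2) * (1 - h ** 2)  # tanh derivative
        gW1 = X.T @ gh / n
        gb1 = gh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ W2 + b2
```

In the role of f_NN in (2.6), the input rows would hold the current measurements (e.g., x_A(k), x_B(k)) and the target would be the delayed measurement x_B(k + τ) taken from historical data.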

Time-Delayed SDG Process Model
In Section 2.1, a pair-wise causality detection and time-delay estimation method was given for two random process variables. To deal with the high dimensionality of an industrial process, it is better to develop a signed digraph (SDG) model to represent the causality structure.
In a standard SDG model, nodes correspond to process variables and arcs represent the immediate influence between nodes; positive or negative influence is distinguished by the sign (+ or −) assigned to the arc. In the SDG developed here, called the time-delayed SDG (TD-SDG) model, each arc is assigned the time-delayed mutual information {MI_{i,j}, τ_{i,j}}, and the arrows on the arcs indicate the directionality of information flow. Solid arcs represent positive correlation, while dashed arcs represent negative correlation. Figure 3 gives an illustrative example of a TD-SDG model.
The TD-SDG model can be derived quantitatively from historical data following the work of [18]. The process topology is extracted automatically from the causality matrix. A consistency check is necessary to ensure the correctness of the derived TD-SDG model. In addition, since the proposed entropy-based information synchronization is statistical, significance testing and threshold setting are also necessary. The major steps for developing the TD-SDG model are summarized in Figure 4.
Step 1: select process variables and do data pretreatment.
Step 2: pair-wisely calculate the TDMI to determine the information directions and time-delays.
Step 3: perform significance testing and set the thresholds.
Step 4: construct the causality matrix and extract the process topology.
Step 5: check the consistency of the extracted topology.
Step 6: construct the TD-SDG model.

Variable selection is an important issue in constructing a TD-SDG model. In practice, there are two ways to do it. On the one hand, one can specify the key process variables according to process knowledge; field engineers usually have rich experience in identifying the critical-to-performance process variables. On the other hand, many data-based variable selection methods from multivariate statistical analysis are available, for example, the recently popular Lasso technique (least absolute shrinkage and selection operator) [28]. In those methods, a performance indicator variable is first specified, a regression model is developed between the process variables and the indicator variable, and the process variables with significant correlation to the indicator variable according to certain criteria are finally selected. With the selected process variables, a TD-SDG model is then developed following the steps in Figure 4. According to the TD-SDG model (e.g., Figure 3), the first variable x_1, from which possible abnormal events may originate, is chosen as the standard variable for calibrating the remaining process variables x_j (j > 1). After synchronization, the process measurements take the form {x_1(k), x_2(k + τ_{1,2}), x_3(k + τ_{1,3}), x_4(k + τ_{1,2} + τ_{2,4}), x_5(k + τ_{1,2} + τ_{2,4} + τ_{4,5})}. In Figure 3, there is a shortcut between the nodes x_2 and x_5. This is possible in real processes because mutual information measures dependency: if variable x_A affects x_B and x_B affects x_C, the dependency between x_A and x_C may also be significant and detectable. To simplify the graph model, shortcuts can be removed without any information loss. Another issue raised by Figure 3 is that there may be multiple paths between two nodes, such as x_1 and x_4. In our experiments on a real industrial process, however, τ_{1,2} + τ_{2,4} basically equals τ_{1,3} + τ_{3,4}; a theoretical study of this property is left for future work.
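The arc bookkeeping and shortcut removal described above can be sketched with a plain adjacency dictionary. The data layout is an assumption for illustration; significance thresholds are assumed to have been applied already, and the shortcut rule drops an arc a → c whenever a two-arc path a → b → c exists.

```python
def build_td_sdg(variables, causality):
    """Build a TD-SDG as an adjacency dict from pair-wise TDMI results.
    `causality` maps (src, dst) -> (mi, tau, sign) for significant arcs only."""
    graph = {v: {} for v in variables}
    for (src, dst), (mi, tau, sign) in causality.items():
        graph[src][dst] = {"mi": mi, "tau": tau, "sign": sign}
    # remove shortcuts: drop arc a -> c when a path a -> b -> c exists,
    # since mutual information also detects indirect dependencies
    for a in variables:
        for b in list(graph[a]):
            for c in list(graph[b]):
                if c in graph[a] and c != a:
                    del graph[a][c]
    return graph
```

For the Figure 3 example, the arc x_2 → x_5 would be removed because the path x_2 → x_4 → x_5 carries the same information.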
Online implementation involves a multistep prediction problem, that is, predicting the future values of x_2, x_3, x_4, and x_5 at time k. Multistep-ahead prediction is a difficult task due to the growing uncertainties arising from various sources, such as accumulated errors. Three strategies are frequently used for multistep prediction: recursive prediction, DirRec prediction, and direct prediction [29]. The direct prediction strategy usually provides higher accuracy because it avoids accumulated errors, and it is therefore used in this paper. Thus, it is necessary to calculate the accumulated time-delays between the first variable and the downstream variables. In the example of Figure 3, τ_2 = τ_{1,2}, τ_4 = τ_{1,2} + τ_{2,4}, and so on. The future value of a downstream variable x_j at time k + τ_j is calculated by the model developed in Section 2.2 as

x̂_j(k + τ_j) = f_NN(x_1(k), x_2(k), ..., x_j(k)).

PCA
PCA is one of the most popular tools in data-driven fault detection. By performing PCA, the original data set is decomposed into a principal component (PC, or score) subspace and a residual subspace:

X = TP^T + E = Σ_{j=1}^{A} t_j p_j^T + E = X̂ + E,

where X (n×m) is the data matrix, n is the number of samples, m is the number of process variables, A is the number of PCs retained in the score subspace, t_j is the score vector, p_j is the loading vector by which the original variables are projected into the score subspace, T and P are the score and loading matrices, respectively, X̂ = X P_A P_A^T is the reconstructed data matrix, P_A consists of the first A columns of the loading matrix P, and E is the residual matrix.
For a data sample x(k) = [x_1(k), ..., x_m(k)], Hotelling's T² and the squared prediction error (SPE) statistics are calculated in the score and residual subspaces, respectively:

T² = t Λ^{-1} t^T,   SPE = ||x(k) − x̂(k)||²,

where t (1×A) is the score vector of the data sample x(k), x̂(k) is the reconstruction of x(k), and Λ is the diagonal matrix of the eigenvalues of the covariance matrix X^T X. SPE measures the distance of an observation from the space defined by the PCA model, while T² measures the shift of an observation within the mean of the scores. For process monitoring and fault detection, SPE is the main criterion of process abnormality; in exceptional situations where a fault does not alter the correlation structure of the process variables, T² is used to assist fault detection. The control limits of SPE and T² can be calculated from normal operating data under certain statistical assumptions. If either statistic exceeds its control limit, the measurements cannot be described by the PCA model and an abnormality may be present [30].
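A minimal sketch of the PCA monitoring computation follows: autoscaling, eigendecomposition of the covariance matrix, and the T² and SPE statistics. Control-limit formulas are omitted, and the number of retained PCs is a user choice here rather than a selection criterion.

```python
import numpy as np

def fit_pca_monitor(X, n_pc):
    """Fit a PCA monitoring model on normal data X (samples x variables).
    Returns a function mapping a new sample to its (T2, SPE) statistics."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd                           # autoscale
    eigval, eigvec = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigval)[::-1]             # descending eigenvalues
    lam = eigval[order][:n_pc]                   # retained eigenvalues (Lambda)
    P = eigvec[:, order][:, :n_pc]               # loading matrix P_A

    def statistics(x):
        xs = (np.asarray(x) - mu) / sd
        t = xs @ P                               # score vector
        t2 = float(t @ (t / lam))                # Hotelling's T^2
        resid = xs - t @ P.T                     # residual e = x - x_hat
        spe = float(resid @ resid)               # squared prediction error
        return t2, spe

    return statistics
```

A sample that obeys the training-data correlation structure yields a small SPE, while a sample that breaks it yields a large SPE, which is exactly how the fault alarm is raised.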

PCA-Based Fault Prognosis
PCA is performed on the synchronized process data X = [x_1, x_2, ..., x_m], where x_j (j = 2, ..., m) are the variables after information synchronization. In the modeling phase, or in offline analysis, x_j (j = 2, ..., m) can be the true process measurements. For online application, the future measurements x_j(k + τ_j) (j = 2, ..., m) are obtained from the neural network prediction model given by (2.6).
Once the online-synchronized sample x(k) = [x_1(k), x_2(k + τ_2), ..., x_m(k + τ_m)] is obtained, SPE can be calculated with the PCA model for online fault detection. The proposed fault prognosis strategy is summarized in Figure 5, which contains the key steps of the offline modeling phase (phase I) and the online process monitoring and fault prognosis phase (phase II).
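One online step of phase II can be sketched as follows. The function names and interfaces are hypothetical, chosen only to show how the prediction models, the synchronized sample, and the SPE control limit fit together.

```python
def online_prognosis_step(history, predictors, delays, statistics, spe_limit):
    """One online step of the proposed strategy (sketch).
    history     -- recent measurements of the root variable x_1
    predictors  -- predictors[j](history, tau) estimating x_j(k + tau_j);
                   hypothetical interface standing in for the BP networks
    delays      -- accumulated delays [0, tau_2, ..., tau_m] from the TD-SDG
    statistics  -- PCA monitor returning (T2, SPE) for a sample
    Returns True when SPE exceeds its control limit (alarm)."""
    sample = [history[-1]]                           # x_1(k)
    for predict, tau in zip(predictors, delays[1:]):
        sample.append(predict(history, tau))         # x_j(k + tau_j)
    _, spe = statistics(sample)
    return spe > spe_limit
```

Each scan, the synchronized sample is assembled from the current root measurement and the predicted future values of the downstream variables, and the alarm decision reduces to a single SPE comparison.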
It should be noted that PCA is only applicable to stationary processes. For nonstationary processes, independent component analysis (ICA) can be used as the fault detection tool in place of PCA within the proposed fault prognosis strategy.

Air Separation Unit
A cryogenic air separation unit (ASU) is always connected to a manufacturing process such as the production of primary metals, chemicals, or gasification. In our application project, an internally compressed cryogenic air separation plant with a nominal capacity of 20,000 Nm³/h gaseous oxygen is studied [31, 32]. In the plant, the compressed and cooled air streams are distilled in an integrated four-column distillation system consisting of a high-pressure main column, a low-pressure main column, a crude argon sidearm column, and an argon distillation column. Figure 6 shows the key components and process variables of the argon production subsystem.
The air separation unit suffers from frequent nitrogen-block faults in the argon production subsystem. The field engineers hope to detect the nitrogen-block fault at least 10-15 minutes before the variable AI 705 (the argon content of the crude argon gas) dramatically exceeds its control limit. Although a dramatic drop of AI 705 is the most obvious symptom of the nitrogen-block fault, it leaves a very narrow time window for regulating the process back to its normal state. The air separation unit therefore has a clear demand for earlier detection and diagnosis of the nitrogen-block fault.

Application Results
The key process variables are described in Table 1 (the sampling period is 1 minute); they mainly involve the main column (MC) and the crude argon columns (CAC). Two data sets, X1 and X2, were collected while the process was under normal operating-condition changeover; X1 is used for causality analysis and time-delay estimation, while X2 is used for validation. Some interim results in constructing the TD-SDG model are given in Figures 7 and 8, where "1" means that information flows from the row variable to the column variable with a detectable time-delay; "e" means a nondetectable time-delay but significant mutual information between the two variables; "0" means nonsignificant mutual information; and "-" means not applicable. The final TD-SDG models derived from the different data sets are identical, as shown in Figure 9.
Detecting the nitrogen-block fault 10-15 minutes earlier than AI 705 means that the fault prognosis method must detect the incipient symptoms of the "nitrogen-block" fault at least from the variable AI 701, according to the developed TD-SDG model (Figure 9). This requirement can be met because AI 701 is indeed a key process variable that is often influenced by the "nitrogen-block" fault. Theoretically, the fault can be predicted 22 minutes in advance of AI 705, because the total time-delay between PDI 1 and AI 705 in the TD-SDG model is 22 minutes.
The data set X3 for training and testing the neural-network prediction models covers both normal and faulty operation periods. The training set X3_train contains 5000 samples randomly selected from X3, while the testing set X3_test has 2000 samples focusing mainly on the faulty operation periods. Figures 10 and 11 show the performance of the neural network prediction models for the process variables AI 701 and AI 705. The prediction model for AI 701 (i.e., x_3(k)) takes the form x_3(k + 8) = f_NN(x_1(k), x_2(k), x_3(k), x_4(k), ∇x_3(k)). Note that, in particular, TI 16 (x_4(k)) is included as an input variable because there is strong cross-correlation between AI 701 and TI 16, as shown in Figure 9. Details of the PCA model and the prediction models for the other variables are omitted here. As Figures 10 and 11 show, the neural network prediction models achieve satisfying prediction performance, although they involve 8-step-ahead prediction for AI 701 and 22-step-ahead prediction for AI 705, respectively. Note that, to show the accuracy of the prediction, the predicted values are shifted 8 steps and 22 steps forward in Figures 10 and 11, respectively. The appended time-lagged process measurements may slow down fault detection, and the prediction ability is limited because such a model is built only on past information.
(4) When PCA is applied to the offline-synchronized data, it alarms the faults at samples 8848 and 19638, respectively; that is, it can predict the faults 22 minutes earlier than AI 705, which is consistent with the theoretical analysis. (5) Applying PCA to the online-synchronized data alarms the two faults at samples 8858 and 19638. It still achieves 10-15 minutes earlier fault prediction than AI 705, although this method involves predicting the future process measurements.

Conclusion
Many industrial processes are confronted with an information-delay problem when process measurements are sampled and synchronized by sampling time. Synchronizing process measurements by event information instead of sampling time can highlight early-stage process abnormalities, which is vital for realizing earlier fault detection and diagnosis. An information synchronization technique based on time-delayed mutual information is proposed in this paper. A TD-SDG model is then developed to represent both the information directions and the information delays among process variables. A fault prognosis method is finally obtained by applying PCA to the synchronized process measurements. The application of the proposed method to an air separation process shows that it achieves early and accurate prediction of the "nitrogen-block" fault and meets the field engineers' requirements for the air separation process.

Figure 1: Illustration of information flow in process industry.

Figure 3: Illustration of the developed TD-SDG model.

Figure 4: Key steps in constructing the TD-SDG model.

Figure 5: Diagram of the proposed fault prognosis strategy.

Figure 8: The simplified causality matrix for the ASU process.

Table 1: Description of process variables.

Table 3: Fault detection time comparison results.