Application of a Fault Detection and Isolation System on a Rotary Machine

The paper illustrates the design and implementation of a Fault Detection and Isolation (FDI) system for a rotary machine, namely a multishaft centrifugal compressor. A model-free approach, Principal Component Analysis (PCA), has been employed for fault detection. For fault isolation, structured residuals have been adopted, while an adaptive threshold has been designed in order to detect and isolate the faults. To assess the proposed FDI system, historical data of a nitrogen centrifugal compressor employed in a refinery plant are considered. Test results show that detection and isolation of single as well as multiple faults are successfully achieved.


Introduction
In oil and gas plants many processes, such as refining, natural gas extraction/compression, energy production, and gasification, involve the use of rotary machines like centrifugal pumps, turbines, and centrifugal compressors. Given the importance and the crucial role played by these machines, several approaches for their control and supervision have been proposed in the literature. For control purposes, compressor regulation away from the surge point has been discussed in [1-5], while the influence of compressor working conditions, which can greatly affect the overall efficiency of the plant, is considered in [6]. In recent years attention has focused on the prevention of recurrent malfunctions and potential faults which may cause equipment downtime or even complete breakdown. In fact, a prompt identification of perturbations of the working conditions by a diagnostic system may increase the availability of the machines and improve plant safety while also reducing operating costs. Several studies have been carried out on fault detection for pumps: Dalton and Patton in [7] have proposed a model-based fault diagnosis system for a two-pump system. Nold and Isermann in [8] have developed a fault detection system for centrifugal pumps and AC drives, also derived from a model-based approach. As for other rotary machines, various works on diagnosing faults in centrifugal compressors can be found in the literature: in [9] the authors suggest using time series analysis with a neural network to realize a fault diagnosis system for monitoring a centrifugal compressor; in [10] monitoring of the temperature of the bearings employed in a centrifugal compressor has been achieved. Another study concerning the detection of faults occurring on the compressor bearings has been discussed in [11]. A comprehensive discussion of fault diagnosis systems applied to rotary machines can be found in [12-14].
In the present work a multivariable, data-driven, and model-free approach, that is, Principal Component Analysis (PCA) [15], has been adopted. The focus is the design of a Fault Detection and Isolation (FDI) system to be applied to a multishaft centrifugal compressor. The main reason for choosing a model-free technique rather than a model-based one is that, as is often the case for large scale chemical processes, it was difficult to develop a detailed physical model of the compression process. The presented FDI system deals with the functioning of the complete centrifugal compressor system, and this feature differentiates it from typical works on centrifugal compressors, which focus on particular aspects like surge control or bearing temperature monitoring, as in [1-5, 9-11].
In the literature, other authors have suggested using PCA for industrial plant applications: in [16] the authors use PCA to detect possible faults on the Tennessee Eastman (TE) Process, while Bezergianni and Kalogianni in [17] apply the PCA approach to a hydrotreating process.
The contribution of the paper is the adoption of an adaptive threshold approach, together with a suitably implemented method to compute the gains associated with the inputs of the thresholds.
The paper is organized as follows. In Section 2 the background of Principal Component Analysis is briefly summarized. In Section 3, the developed adaptive threshold approach is described. In Section 4 the rotary machine considered for the actual implementation of the FDI system is briefly described. Results on single and multiple fault detection and isolation are reported in Section 5. Section 6 gives conclusions and future developments.

Theoretical Background on PCA
A brief overview of Principal Component Analysis and its use in fault detection and isolation is given here. The main steps of the procedure that are useful for understanding the proposed FDI system are reported. Further detailed descriptions of the method are available in published articles (see, among others, [15, 17-19]).
Principal Component Analysis can be considered a subspace decomposition technique by which the process measurement space is divided into two orthogonal subspaces, that is, the principal component (PC) subspace and the residual subspace. The PC subspace contains the components that account for a maximal amount of total variance in the observed variables. In practice, given a data matrix $X$ of $N$ samples of $n$ variables, an optimal linear transformation of $X$ is performed by which the process measurement space is divided into the two orthogonal subspaces mentioned above. In this way possibly correlated variables of the process are mapped into a smaller number $\ell < n$ of uncorrelated variables called Principal Components. The main steps of the PCA procedure can be summarized as follows [18].
Consider the transformation of the data matrix $X$ of normalized variables (zero mean and unit variance) given by
$$T = X P. \qquad (1)$$
Matrix $T$ in (1) is named the score matrix while $P$ is the loading matrix. The columns of matrix $P$ are the eigenvectors associated with the eigenvalues of the following matrix $A$, proportional to the data correlation matrix:
$$A = X^{T} X. \qquad (2)$$
Once the number $\ell$ of the most significant components is determined, the loading matrix is partitioned as follows:
$$P = (\hat{P} \;\; \tilde{P}), \qquad \hat{P} \in \mathbb{R}^{n \times \ell}, \qquad (3)$$
and, considering the linear transformation (1), the data matrix $X$ is partitioned into two parts: $\hat{X}$, the principal part of the data explained by the first $\ell$ eigenvectors, and the residual part $\tilde{X}$ explained by the remaining components:
$$X = \hat{X} + \tilde{X}, \qquad \hat{X} = X \hat{P} \hat{P}^{T}, \qquad \tilde{X} = X (I - \hat{P} \hat{P}^{T}). \qquad (4)$$
Equation (4) is sometimes called the back-transformation process; by this transformation it is possible to obtain the original variables with only the significant variances; that is, if the number of Principal Components is correctly chosen, insignificant noise effects are removed.
When approaching a PCA problem, the number of Principal Components retained in the model is an essential parameter that strongly determines its performance. When too few components are retained, the model will not capture all of the information in the data and a poor representation of the process will result. On the other hand, if too many components are chosen, the model will be overparameterized and it will include noise. Several approaches for selecting the optimal number of PCs have been developed. The approach followed in this paper is based on the ANalysis Of VAriance (ANOVA) test (see [20,21]), which has proven to be a reliable and objective method.
The stored dataset is processed offline by applying the ANOVA-based test so that the principal components are determined; the loading matrix $P$ is then computed and partitioned according to the selected PCs as in (3). Hence, the back-transformation (4) is calculated online for each new sampled data vector $x$ of dimension $[1 \times n]$. The reconstructed vector $x^{*}$ is thus obtained:
$$x^{*} = x \hat{P} \hat{P}^{T}.$$
Then the signal of the original variables $x$ is compared with the reconstructed one $x^{*}$; the difference between the two signals is called the residual. By comparing the actual residual with its trend in standard operating conditions, detection of faults can be achieved. However, based on this residual quantity alone, isolation of the fault cannot be performed.
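The offline and online steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names `fit_pca` and `reconstruct`, the synthetic training data, and the fixed number of retained components (passed in directly rather than chosen by the ANOVA test) are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X_raw, n_pc):
    """Fit a PCA model on fault-free data (rows = samples, columns = variables)."""
    mean = X_raw.mean(axis=0)
    std = X_raw.std(axis=0, ddof=1)
    X = (X_raw - mean) / std                  # zero mean, unit variance
    A = X.T @ X                               # proportional to the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(A)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort by decreasing explained variance
    P = eigvecs[:, order]                     # loading matrix
    P_hat = P[:, :n_pc]                       # retained loadings (n_pc is assumed given)
    return mean, std, P_hat

def reconstruct(x_raw, mean, std, P_hat):
    """Return the scaled sample, its reconstruction, and the residual."""
    x = (x_raw - mean) / std
    x_star = x @ P_hat @ P_hat.T              # back-transformation of a row vector
    return x, x_star, x - x_star

# Synthetic, purely illustrative data: 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_train = latent @ mixing + 0.05 * rng.normal(size=(500, 6))
mean, std, P_hat = fit_pca(X_train, n_pc=2)

x_ok = X_train[0]
x_bad = x_ok.copy()
x_bad[3] += 5.0 * std[3]                      # hypothetical sensor offset on variable 3
_, _, r_ok = reconstruct(x_ok, mean, std, P_hat)
_, _, r_bad = reconstruct(x_bad, mean, std, P_hat)
spe_ok = float(r_ok @ r_ok)                   # squared norm of the residual
spe_bad = float(r_bad @ r_bad)
```

In this sketch the squared residual norm of the perturbed sample is larger than that of the unperturbed one, which is the basis for detection; as noted in the text, this single quantity does not indicate which variable is faulty.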

Fault Isolation
When developing a diagnostic system, the interest in general is not just to detect faults but rather to specify the kind of fault that has occurred, thus realizing the primary function of isolation. In this regard various methods have been developed for the generation of particular structured residuals. The authors have chosen to apply the so-called structured residual approach as described by Tharrault et al. in [18]. These structured residuals are based on the reconstruction principle; the founding idea of this approach is to reconstruct variables using the PCA model with the remaining variables of the model. Different subsets $R$ of variables to be reconstructed can be considered. The reconstruction of variables consists in estimating, for each new sampled data vector, the reconstructed vector $\hat{x}_{R}$ by eliminating the effect of possible faults that have occurred on the selected variables.
To achieve isolation the reconstructed variables $\hat{x}_{R}$ are projected onto the residual subspace by a new projection matrix, generating $\tilde{x}_{R}$. Given the property of the projection matrix which states that its product with the matrix that identifies the directions of reconstruction is null, it is possible to isolate potentially faulty components. In fact, from the above property it can be deduced that the components of $\tilde{x}_{R}$ are not sensitive to the components of $x$ belonging to the subset $R$.
This property can be used to identify which components of $x$ are disturbed by faults. Through the analysis of all the residual trends, computed considering different (structured) variable subsets, fault directions are identified.
Finally, considering the large dimensions of the problem, instead of checking all the components of the computed residual vectors separately, it may be convenient to group them into a single index called the square prediction error (SPE), calculated as the squared Euclidean norm of the computed residuals:
$$\mathrm{SPE}_{R} = \| \tilde{x}_{R} \|^{2}.$$
Fault isolation is based on the threshold violation of one or a few SPE variables.
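The insensitivity property described above can be checked numerically. The sketch below uses one common reconstruction-based formulation, written with column vectors; it is not necessarily the exact form given in [18]. The helper names `residual_projector` and `structured_spe` and the random loading matrix are assumptions made for illustration.

```python
import numpy as np

def residual_projector(P_hat):
    """Projector onto the residual subspace, I - P_hat P_hat^T."""
    n = P_hat.shape[0]
    return np.eye(n) - P_hat @ P_hat.T

def structured_spe(x, C_res, subset):
    """SPE of sample x after reconstructing the variables listed in `subset`."""
    n = x.shape[0]
    Xi = np.eye(n)[:, subset]                     # reconstruction directions
    M = Xi.T @ C_res @ Xi                         # assumed invertible
    G = np.eye(n) - Xi @ np.linalg.solve(M, Xi.T @ C_res)
    x_rec = G @ x                                 # effect of faults on `subset` removed
    r = C_res @ x_rec                             # projection onto the residual subspace
    return float(r @ r)

# Illustrative setup: 6 variables, 2 retained components, random orthonormal loadings.
rng = np.random.default_rng(1)
P_hat, _ = np.linalg.qr(rng.normal(size=(6, 2)))
C_res = residual_projector(P_hat)

x = rng.normal(size=6)
x_fault = x.copy()
x_fault[2] += 3.0                                 # hypothetical fault on variable 2

spe_same = (structured_spe(x, C_res, [2]), structured_spe(x_fault, C_res, [2]))
spe_other = (structured_spe(x, C_res, [0]), structured_spe(x_fault, C_res, [0]))
```

Here the SPE computed with variable 2 reconstructed is unchanged by the fault on variable 2, because the residual projector applied after reconstruction annihilates that direction, while the SPE computed with variable 0 reconstructed does change. Comparing the SPEs across all candidate subsets is what allows the faulty direction to be singled out.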

Adaptive Thresholds.
To perform fault detection and isolation, it is necessary to compare the SPE values to threshold values, which can be fixed or adaptive. Values for a static threshold are typically computed by analyzing the SPE signals in normal operating conditions. This solution works well if the process stays in a steady state or if the operating point does not change. Furthermore, when operating with real process data, it is highly likely that, due to the presence of noisy data and/or the influence of external conditions, the generated residuals exceed the fixed threshold even without faults. A possible solution is to enlarge the threshold values. In setting the tolerances, a compromise has to be made between missed detections caused by threshold values that are too large and false alarms caused by normal fluctuations of the system.
In order to improve the isolation feature, the authors have chosen to employ an adaptive threshold approach. In fact, the process variability is such that the adoption of a fixed threshold would imply many false alarms.
In the literature, different approaches to the design of adaptive thresholds have been introduced. Isermann in [15,19] observed that deviations of the generated residuals frequently depend on the amplitude and frequencies of the input excitation and proposed an adaptive threshold scheme based on a first-order high-pass filter of the input signal with a possible additional proportional enlargement. A low-pass filter is then introduced to smooth the dynamics of the threshold. Conversely, Clark in [22] proposes a scheme where the high-pass filter is replaced by static gains. Following Clark's approach, the adaptive threshold scheme proposed by the authors consists of a term proportional to the amplitude of the input signals and a constant term for fine tuning (see Figure 1). The proportional term is obtained by multiplying the input data signal by a diagonal gain matrix.
For the actual computation of the gains no clear suggestions are given in [22], so a new method based on the frequency analysis of the input signals has been proposed and implemented by the authors. The individual gain elements have been computed by taking into account the input variables that actually contribute to the generation of each SPE. The significance of each input signal has been associated with its energy in terms of its power spectrum in the frequency domain. A variable threshold approach has thus been adopted, which proved effective and more efficient than applying fixed thresholds. In particular, it has been chosen to assign low gain weights to signals "poor" in frequency content, while high weights have been assigned to signals with the highest energy content.
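A possible reading of this scheme is sketched below in Python with NumPy. The text does not give the exact gain formula, so the choices here are assumptions: the energy of each input is taken as the sum of its squared FFT magnitudes after mean removal, the gains are normalized to a chosen total, and the function names `spectral_gains` and `adaptive_threshold` are hypothetical.

```python
import numpy as np

def spectral_gains(U, total_gain=1.0):
    """Gains proportional to each input's spectral energy (mean removed, an assumption)."""
    U0 = U - U.mean(axis=0)
    spectrum = np.fft.rfft(U0, axis=0)
    energy = np.sum(np.abs(spectrum) ** 2, axis=0)
    return total_gain * energy / energy.sum()

def adaptive_threshold(U, gains, offset):
    """Threshold per sample: constant offset plus gain-weighted input amplitudes."""
    K = np.diag(gains)                            # diagonal gain matrix
    return offset + np.abs(U) @ K @ np.ones(U.shape[1])

# Two illustrative inputs: one almost constant, one with rich frequency content.
t = np.arange(256)
u_flat = 1.0 + 0.01 * np.sin(2 * np.pi * t / 64)
u_rich = 1.0 + 0.5 * np.sin(2 * np.pi * t / 16) + 0.3 * np.sin(2 * np.pi * t / 5)
U = np.column_stack([u_flat, u_rich])

gains = spectral_gains(U)
threshold = adaptive_threshold(U, gains, offset=0.2)
```

With this weighting the nearly constant input receives a much smaller gain than the input with higher spectral energy, so the threshold follows the signals that are expected to excite the residuals while the constant offset sets the baseline tolerance.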

The Multishaft Centrifugal Compressor
In the present paper, a multishaft centrifugal compressor is considered. Centrifugal machines are critical pieces of equipment; their essential characteristics are the large pressure rises and flow rates involved. Given the importance of and the crucial role played by compressors, in recent years increasing attention has been given to the prevention of frequent malfunctions and potential faults which may cause compressor downtime or even complete breakdown.
The machine, called BLNC (base load nitrogen compressor), is located in the air separation unit (ASU) of a refinery plant and is employed for nitrogen compression in the dilution of a particular gas, the Syngas, which is forwarded to a gas turbine. It is a complex machine consisting of two sections: the first section includes two compression stages while the second comprises three compression stages.
In order to decrease the nitrogen temperature at the exit of each compression stage, bringing it back to its input value before compression, a heat exchanger is positioned at the end of each stage. In these conditions, the compression is nearly equivalent to an isothermal process, which requires less mechanical work for the compression.
Variables considered in the construction of the data matrix $X$ employed in the PCA diagnoser consist of sensor and actuator (positioner) variables. Their tag names used in the refinery are summarized in Table 1 together with their description. As can be observed, the variables considered for the detection and isolation of faults are the nitrogen (N2) flow through the multishaft compressor measured at different points, the commanded actuator signals, and their actual values used to regulate the position of the IGVs. All the variables included in the PCA data matrix $X$ are process variables; no variables related to thermodynamic parameters were considered for the implementation of the FDI system in the case considered here, since it has been verified that these variables were not particularly affected by the analyzed faults.

Fault Detection and Isolation System
The first step of the PCA method concerns the selection of the Principal Components retained in the model. In this regard, for the computation of the matrix $A$ in (2) to be effective, the absence of faults in the system data is required. Furthermore, the absence of measurement noise has to be assumed. In fact, if the calculation of matrix $A$ is performed from a noisy dataset, errors in variable reconstruction may influence the PCA model and, consequently, the overall fault identification process. For the application to the multishaft centrifugal compressor, to avoid measurement noise effects, PCA training data have been gathered just after instrument calibration.
The resulting PCA model together with the application of adaptive thresholds guarantees good performance in terms of fault detection and isolation even in the presence of perturbations due to altered ambient conditions like, for example, the ones experienced daily due to the alternation of day and night. At the same time, the succession of the seasons may call for an update of the FDI system, and to improve system sensitivity, different PCA training datasets could be considered. In accordance with the PCA model, the measurement space has been partitioned into two orthogonal spaces: the principal component subspace, which includes data variations according to the principal component model, and the residual subspace, which includes data variation not explained by the model. Applying the ANOVA procedure developed by the authors (see [20,21]), the dimension of the PC subspace is set to four. In order to verify whether the selected number of PCs is adequate to correctly explain the original system's variables, the signals of the original variables are compared with the reconstructed ones. The plots of Figure 2 show the measured and reconstructed signals. With the exception of Figure 2(f) (positioner of the IGV) and Figure 2(i) (N2 mass flow at the head of the high pressure column), where a slight mismatch can be noticed, all the reconstructions are in good agreement with the measurements.
After having trained the model on system data in the absence of faults, that is, after having chosen the PCs that make up the loading matrix $P$ used for the generation of the structured residuals (see (1)), the diagnoser has been tested on both single and multiple faults.
Faults that may occur in the centrifugal compressor concern errors in the sensor readings and/or in the actuators. By inspection of historical data of the compressor at issue, the most common faults were found to be faults of the actuators and mass flow sensors as specified in Table 2. To test the FDI system both single and multiple faults are simulated. The simulation is necessary to test the system on critical faults that are not easy to detect, such as drift faults and multiple faults.

Single Fault Case: Third Stage IGV Positioner Fault.
The diagnoser has been tested on the detection and isolation of faults of the Inlet Guide Vanes (IGV) of the third stage of the compressor. An abrupt failure of the actuator, which caused its complete breakdown, was documented in the available historical data; this failure was correctly detected by the Fault Diagnosis module, but since the detection in this case is quite trivial, we have chosen to test the diagnoser performance on an intermittent fault by modifying the real data by the addition of step or ramp variations. These kinds of faults are likely caused by a temporary malfunction of the leverage used for IGV handling. A drift on the IGV positioner has been simulated with the addition of a ramp signal up to 10% of the variable amplitude starting from the 50th sample, as shown in Figure 3. When the drift of the actuator causes the measurement to rise by 5%-7% of its standard value, the square prediction error exceeds the assigned threshold. Figures 4, 5, and 6 show the SPE computed on the first variable together with its difference from the adaptive threshold (see [11,15]); it can be noticed that, at sample 62, the SPE value exceeds the threshold. Since each SPE takes into account all the variables but the one or the ones associated with the direction(s) under inspection, it is necessary to check the temporal trends of all the other residuals. As a matter of fact, all the residuals are influenced by the IGV positioner fault with the exception of the residual associated with the faulty variable.
This situation allows the correct isolation of the fault. Figure 7 shows the deviations of the SPE from the threshold; it can be noticed that the difference remains negative for the whole duration of the experiment only for the 7th variable (ZT 89704).
These results show that the fault is detected at the 12th sample after its onset; given a sampling time of five minutes, the fault detection time is on the order of 60 minutes. The promptness of the diagnoser is linked to the large sampling time adopted; given that the actual process dynamics are rather slow, the result can be considered fully satisfactory. Given the interest in investigating the performance of the proposed system on an operational dataset covering a long period of time (about two years), the adoption of a sampling time of five minutes was dictated by the available dataset. It is clear that the proposed FDI system, when implemented online, can process data at a faster sampling rate (typically one minute). No significant limitations are imposed by the computational load of the proposed approach; the lower limit is determined at the I/O acquisition level: the DCS employed for controlling the machinery under study does not handle sampling periods lower than 0.1 seconds; moreover, for many of the considered variables in Table 2, a sampling time not lower than one minute is generally set.
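The drift injection and the detection-time arithmetic used in this test can be illustrated with a short Python sketch. The helper names are hypothetical, zero-based sample indices are assumed, the "variable amplitude" is interpreted here as the peak absolute value of the signal, and the SPE trace is synthetic rather than taken from the compressor data.

```python
import numpy as np

def add_ramp_fault(signal, start, final_fraction):
    """Add a ramp from index `start` that reaches `final_fraction` of the peak amplitude at the last sample."""
    faulty = np.asarray(signal, dtype=float).copy()
    amplitude = np.max(np.abs(faulty))            # assumed meaning of "variable amplitude"
    n_fault = faulty.size - start
    faulty[start:] += np.linspace(0.0, final_fraction * amplitude, n_fault)
    return faulty

def first_violation(spe, threshold):
    """Index of the first sample where the SPE exceeds its threshold, or None."""
    idx = np.flatnonzero(np.asarray(spe) > np.asarray(threshold))
    return int(idx[0]) if idx.size else None

def detection_delay_minutes(onset, detected, sampling_minutes):
    """Delay between fault onset and detection, converted to minutes."""
    return (detected - onset) * sampling_minutes

# Illustrative positioner signal held at 60 (percent opening), drifting from sample 50.
signal = np.full(100, 60.0)
faulty = add_ramp_fault(signal, start=50, final_fraction=0.10)

# Synthetic SPE trace that crosses a unit threshold at sample 62, as reported in the text.
spe = np.zeros(100)
spe[62:] = 2.0
detected = first_violation(spe, np.ones(100))
delay = detection_delay_minutes(onset=50, detected=detected, sampling_minutes=5)
```

With onset at sample 50 and the first violation at sample 62, the delay is 12 samples, that is, 60 minutes at a five-minute sampling time; at a one-minute sampling time the same number of samples would correspond to 12 minutes.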

Multiple Faults Case: First and Third Stage IGV Positioners Fault.
To check the validity of the proposed system on the detection of multiple faults, faults on the two IGV positioners of the first and third stages, respectively, have been simulated.
The simulated faults are supposed to be simultaneous, and they have been constructed with the following characteristics: (i) Positioner ZT 89700: a trend of the signal is simulated at the time instants where the IGV position is opened around 60% of its total value; (ii) Positioner ZT 89704: a bias of 10% of its average value is simulated.
In Figure 8, the differences between the single-direction SPEs and the relative thresholds are shown. As expected, all the SPEs exceed the threshold.
The system correctly isolates the faults that occurred, indicating the presence of a multiple fault on the 3rd and 7th variables. Computed residuals concerning some of the variable pairs are reported in Figures 9 and 10. As expected, the only SPE that remains under the threshold is the one associated with the faulty variables, that is, variable 3 and variable 7.
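The isolation rule applied here, selecting the candidate subset whose SPE stays below its threshold while all others violate theirs, can be written as a short Python sketch. The function name `isolate` and the numerical values are illustrative assumptions and are not taken from the compressor data.

```python
import numpy as np

def isolate(spe_by_subset, thr_by_subset):
    """Return the candidate subsets whose SPE never exceeds its threshold."""
    isolated = []
    for subset, spe in spe_by_subset.items():
        diff = np.asarray(spe) - np.asarray(thr_by_subset[subset])
        if np.all(diff <= 0.0):                   # difference stays non-positive
            isolated.append(subset)
    return isolated

# Illustrative SPE traces for three candidate subsets of reconstructed variables.
spe = {
    (3,): np.array([0.4, 1.6, 2.1]),              # single direction: violates
    (7,): np.array([0.5, 1.3, 1.9]),              # single direction: violates
    (3, 7): np.array([0.3, 0.4, 0.5]),            # pair: stays under threshold
}
thr = {subset: np.ones(3) for subset in spe}
isolated = isolate(spe, thr)
```

In this toy case only the pair (3, 7) is returned, mirroring the situation described in the text in which every single-direction SPE exceeds its threshold and the pair of faulty variables is the only one that does not.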

Conclusions
A Fault Diagnosis system for the detection and isolation of expected faults of a rotary machine, based on the Principal Component Analysis technique, has been developed. The considered machine is a multishaft centrifugal compressor located in an integrated gasification and combined cycle of a refinery plant. The adoption of a model-free technique is justified by the fact that in the process industry rich process data are available while, conversely, the development of a physical model is a demanding task that may not assure suitable results.
For the practical implementation of the PCA, the choice of the number of principal components to be retained in the model has been based on an approach centered on the ANalysis Of VAriance (ANOVA) test, and as concerns the detection and isolation issues, a structured residual approach has been applied and an adaptive threshold has

Figure 1: The scheme employed by the authors to realize the adaptive threshold.

Figure 2: Figures (a)-(i) show the trends of the real variables (blue) and their reconstruction with PCA (red).
prediction error (SPE ) relative to the first variable

Figure 4: SPE relative to the N2 mass flow sensor and its threshold.

Figure 5: Differences between the SPE relative to the N2 mass flow sensor and its threshold.

Figure 10: SPE calculated from the faulty variable pair (variable 3 and variable 7) remains under the threshold.

Table 2: Most common compressor faults.