Multimode Process Monitoring Based on Sparse Principal Component Selection and Bayesian Inference-Based Probability

According to the demand for diversified products, modern industrial processes typically have multiple operating modes. At the same time, variables within the same mode often follow a mixture of Gaussian distributions. In this paper, a novel algorithm based on sparse principal component selection (SPCS) and Bayesian inference-based probability (BIP) is proposed formultimode process monitoring. SPCS can be formulated as a just-in-time regression between all PCs and each sample. SPCS selects PCs according to the nonzero regression coefficients which indicate the compact expression of the sample. This expression is necessarily discriminative: amongst all subset of PCs, SPCS selects the PCs which most compactly express the sample and rejects all other possible but less compact expressions. BIP is utilized to compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes. Finally, to verify its superiority, the SPCS-BIP algorithm is applied to the Tennessee Eastman (TE) benchmark process and a continuous stirred-tank reactor (CSTR) process.


Introduction
Over the past two decades, with the development of complex chemical processes and the growing demand of plant safety and stable product quality, timely process monitoring is gaining importance.Because large amounts of data can be gathered by the use of distributed control systems (DCSs), multivariate statistical process monitoring (MSPM) algorithms have received great attention.Among these algorithms, principal component analysis (PCA) and partial least squares (PLS) are the most widely used algorithms [1][2][3][4][5][6][7][8].Both algorithms project high-dimensional data onto lower dimensional subspaces.Process normal and abnormal conditions can be isolated by the use of Hotelling's  2 or squared predicted error (SPE) [9][10][11][12][13].Other complementary MSPM algorithms, including independent components analysis (ICA), Fisher discriminant analysis (FDA), and canonical variate analysis (CVA), are used to overcome some limitations in PCA/PLS-based monitoring schemes [14][15][16][17].However, most of MSPM algorithms rely on the assumption that the system is in a single operating region and that the data follow a Gaussian distribution.In chemical processes, operating condition shifts are often encountered due to the changes of various factors such as feedstock, product specification, set points, and manufacturing strategy.When a process is running under substantially different operating conditions, only a small number of variables actually follow Gaussian distribution [18].As a result, the multimodality of data distribution might lead to unseemliness for the monitoring of conventional MSPM algorithms.To address these problems, it is necessary to develop new algorithms.
In literature, multiple models can be built to fit each individual operating mode, but these are two essential issues that need to be addressed.One is how to divide the training data into multiple subsets correctly, corresponding to different operating modes.In order to solve this issue, many clustering algorithms are applied.In terms of the traditional approaches, Ge and Song [19] used fuzzy C-means clustering algorithm to separate the training data set according to the unique characteristics of each mode.He et al. [20] applied 2 Mathematical Problems in Engineering the -nearest-neighbor method.Srinivasan et al. [21,22] identified the different operating modes by evaluating the Euclidean distances between samples in a constructed data window and then applied dynamic PCA-based similarity measures to cluster the samples.Liu and Chen [23] developed a method using Bayesian classification for selecting multiple regions from a training data set.Zhao et al. [24] presented a multiple principal component analysis (MPCA) algorithm that selects one suitable model to monitor multimode processes.The other issue is how to determine the final results.A proper measurement should be employed to determine which model is the most suitable one for monitoring at the current moment.Ng and Srinivasan [22] exploited the most suitable PCA model through a minimized distance reflecting both the  2 and SPE values.Zhao et al. [24] close the local PCA model with the minimum SPE value.Natarajan and Srinivasan [21] used the distance between the sample and the center of local models as a criterion.Yu and Qin [25] performed Bayesian inference on the postprobabilities calculated by the Gaussian mixture model (GMM) or the nonlinear kernel GMM.Meanwhile, Ge and coworkers [26,27] took advantage of Bayesian inference to softly combine the monitoring results computed by local models built by means of probabilistic PCA (PPCA), factor analysis (FA), or subspace algorithms.
To date, the problem of how to correctly divide the training data into multiple subset can successfully be solved by many algorithms mentioned in the previous paragraph.However, there are still some issues that need to be resolved; the most important one is how to select the key principal components (PCs) when using one suitable model for process monitoring.Many algorithms for selecting PCs have been proposed, such as cumulative percent variance (CPV) [28], variance of reconstruction error (VRE) [29], and cross validation (CV) [30].Generally, most of the classical algorithms just take normal operational observations into account and select the first several PCs with larger variance.While PCs with larger variance of normal data cannot guarantee the capture of the largest variations in fault data online.Jolliffe [31] suggested that the last PCs may be as important as those with large variance.Togkalidou et al. [32] noted that the PCs with larger variance do not always contain much information for prediction.However, this issue is insufficiently discussed in PCA-based process monitoring, and the standard PC selection is still not established.
Fortunately, many researches have been aware of the inherent defects of classical PCA algorithm.A lot of workers tried to seek a subspace spanned by key PCs, which contains the most important information for process monitoring.Peng et al. [33] suggested a new feature selection algorithm, named minimal-redundancy-maximal-relevance criterion (mRMR).It is based on mutual information and selects the features with highest relevance to the target class.Jiang et al. [34] put forward the sensitive principal component analysis for fault detection and diagnosis in chemical processes.They pointed out that PCs selected by PCA algorithm are not always the key PCs for fault detection.Their task was to find the sensitive PCs which have relationship with fault information.Arbel et al. [35] proposed that the process variables that are preponderant in achieving specific objectives need to be selected.
In this paper, a process monitoring algorithm using multisubspace sparse principal component analysis with the BIP algorithm is put forward.First, variables are divided into different subblocks corresponding to different units or pieces of equipment to reduce the complexity of process analysis.By using BIP algorithm, multimode data in each subblock are divided into multiple subgroups.BIP can compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes.The PCs selected by PCA algorithm with larger variances do not always have relationship with fault information.Sparse principal component selection (SPCS) takes the information of both normal and abnormal observations into account.The algorithm is formulated as a just-in-time form that constructs an elastic net regression between all PCs and each sample.SPCS selects PCs corresponding to the nonzero regression coefficients which indicate the compact expression of the sample.This expression is necessarily discriminative: amongst all subset of PCs, SPCS selects the PCs which most compactly express the sample and rejects all other possible but less compact expression.Third, the key PCs are selected by SPCS in each subgroup to solve the problem of fault information loss.It needs to be stressed that the subspace spanned by the key PCs selected is the feature subspace.Finally, in order to verify the superiority of the SPCS-BIP algorithm, it is applied to the Tennessee Eastman (TE) benchmark problem and a continuous stirred-tank reactor (CSTR) process.

Preliminaries
2.1.Principal Component Analysis.Principal component analysis is a multivariate statistical analysis which is widely used in chemical process monitoring, fault detection, and so forth [36][37][38].Let x ∈ R  represent an -dimensional sample vector and X ∈ R × denote a data matrix with zero mean and unit variance, where  is the number of samples and  is the number of variables in the process.From the statistical viewpoint, the PCA algorithm could be obtained by singular value decomposition (SVD) [28,34]: where T ∈ R × and P ∈ R × are the score matrix and the loading matrix, respectively. is the principal components retained number.The loading matrix P can be obtained by eigenvalue decomposition on the covariance matrix cov(x) as follows: where Λ = diag{ 1 ,  2 , . . .,   } denotes the eigenvalue matrix and P = [ P P] contains the loading matrices of component subspace and residual subspace, respectively.

Construction of Finite Gaussian Mixture Model Based on EM.
For the process running at multiple operating condition, owing to the mean shifts or covariance changes, the assumption of multivariate Gaussian distribution becomes invalid [21,22].In this situation, the local Gaussian distribution is still appropriate to characterize each subset of measurement data from the same operating conditions.Therefore, the finite Gaussian mixture model is prime suited to represent the data sources driven by different operating modes [13,24,25].
To construct a FGMM, given a set of training samples X ∈ R × , the log-likelihood function can be expressed as and the parameter estimation problem is formulated as where is the prior probabilities, and  is the number of Gaussian components included in FGMM.  is the mean vector and Σ  is the covariance matrix.
There are a lot of learning algorithms, such as maximum likelihood estimation (MLE), EM, and the F-J algorithm, that have been put forward for mixture model estimation [39,40].As a more tractable numerical strategy, the EM algorithm has been well used in practice to estimate the maximum likelihood distribution parameter [39].EM algorithm is implemented iteratively by means of repeating the expectation step (E-step) and maximization step (M-step) to calculate the posterior probabilities and then the corresponding distribution parameters until a convergence criterion of the log-likelihood function is satisfied.Given the training data X and an initial estimate }}, the iterative E-step and M-step are expressed as follows: (i) E-step: where  () (  | x  ) denotes the posterior probability of the th training sample within the th Gaussian component at the th iteration; (ii) M-step: where

𝑙
, and  (+1)  are the mean, covariance, and prior probability of the th Gaussian component at the (+1)th iteration, respectively.

Fault Detection with Sparse Principal Component Selection and Bayesian Inference-Based Probability
In this section, the idea of SPCS-BIP algorithm for multimode process monitoring is demonstrated in detail.We first introduce the Bayesian inference-based probability which can derive the confidence boundary around the normal operating regions for process monitoring and fault detection.Then, the sparse principal component selection was introduced for selecting the key Pcs related with fault information.Finally, the steps of this algorithm were given.

Bayesian Inference-Based Probability.
In the previous section, the FGMM has been constructed, and it is essential to further derive the confidence boundary around the normal operating regions for process monitoring and fault detection.Due to the multimodality of mixture distribution, it is really difficult to capture the analytical boundary of the density function ( | Θ) in a certain confidence level.
In the proposed monitoring approach, given an arbitrary monitored sample   belonging to each Gaussian component, Bayesian inference strategy is used to calculate the posterior probability as follows: which can also be formulated as Given that each component   follows a unimodal Gaussian distribution, the squared Mahalanobis distance of   from the center of   follows  2 distribution, provided that Under the assumption that   ∈   and  2  has  degree of freedom, ((  ,   ) |   ∈   ) denotes the squared Mahalanobis distance between   and the mean center of   .Owing to colinearity, Σ  is usually ill-conditioned, and the following regularized Mahalanobis distance is utilized instead to avoid too wide confidence regions: where the function of  is to remove the ill condition of covariance matrix Σ  by adding a positive number to all the diagonal entries.
For the monitored sample   , a local Mahalanobis distance-based probability index relative to each Gaussian component   can be defined as or Given the appropriate degree of freedom,  ()  (  ) can be computed by integrating the  2 probability density function.Under a given confidence level, this index has the function of indicating whether the monitored sample is normal or abnormal provided that it belongs to the corresponding Gaussian component.A global BIP index is proposed to combine the local probability metrics across all the Gaussian clusters because the random characteristic of each monitored sample may come from multiple Gaussian components with the corresponding posterior probabilities.The formulation of BIP index for the monitored sample   is given by where the posterior probability (  |   ) is used to incorporate the contribution of each local Gaussian component to the overall probabilistic index.As 0 ≤  ()  (  ) ≤ 1, we have Under the preset confidence level (1−) 100%, the process is determined within normal operation if Otherwise, the process operation is treated out of control.

Sparse Principal Component Selection.
Sparse representation has proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional data [41][42][43].This success is mainly because of the fact that the important reconstruction information of data such as process data and time series data has naturally sparse representations with respect to fixed bases, or concatenations of such bases.Qiao et al. [44] proposed that the graphs constructed by the where β = [ β1 , β2 , . . ., β ] are the sparse representation coefficients and Card() denotes the number of nonzero elements of .From the perspective of statistics, formula ( 16) can be named the Lasso criterion.Lasso is a penalized least squares algorithm which was originally by quadratic programming imposing a constraint on the  1 norm of the regression coefficients.Thus, the Lasso estimates β are obtained by minimizing the Lasso criterion: where  is nonnegative.However, only using the  1 -norm penalty in Lasso has its limitation.Zou et al. [45] proposed that if there is a group of variables among which the pairwise correlations are very high, lasso tends to select any variable from the group and does not consider which one is selected.Fortunately, elastic net was put forward by Zou et In brief, it is expected that the elastic net is used to group a set of sparse coefficients to construct the sparse alignment matrices, in which the sparse representation information or the potential discriminative information is encoded to enhance the discriminative ability in an unsupervised manner.

Fault Detection with SPCS and BIP.
The key problem for monitoring the multimode process is to select a suitable model and choose the subspace spanned by key PCs.In the Introduction, we had put forward the fact that the subspace spanned by the first several PCs with largest explained variance does not always have fault information.
In the following part, a novel multimode process monitoring approach based on SPCS and BIP is proposed.This approach is in a just-in-time form.For each sample, an elastic net regression between all PCs and the sample is constructed and solved.The PCs which have nonzero regression coefficients are retained, while other PCs are rejected.That means, for each sample, we can pick out the most discriminative bases and the others are set to zero.Its concrete calculating steps are summarized in Figure 1.

Offline Modeling
(1) Collect a set of historical training data under all possible operating conditions.
(3) For each submodel, get a normal operational observation set X =∈ R × , where  is the number of samples and  is the number of variables.This set is denoted as the training set for threshold determining.A testing set Y ∈ R × with both normal and abnormal observations is given for testing.
(4) Normalize the training data through the mean value and variance of each variable.

Online Monitoring
(1) Normalize the current time point data by using mean values and variance of the training data.
(2) Obtain the loading vector P from offline modeling. (

Case Studies on the TE and CSTR Process
In this case study, the TE benchmark and CSTR process are introduced to verify the effectiveness of the SPCS-BIP algorithm.PCA-GMM is the classic algorithm for multimode processing monitoring.And the fault detection index (FDI) is similar to Bayesian inference probability (BIP).So here, a comparison was made between SPCS-BIP and PCA-GMM.
In addition, to verify the improvements of SPCS algorithm, which can select sparse PCs, a comparison was performed between the SPCS-BIP algorithm and the MPPCA algorithm.

Tennessee Eastman Process.
As a well-known benchmark process, the Tennessee Eastman process, which was presented by Downs and Vogel, has been widely applied to evaluate and compare the efficiency of process monitoring techniques [46,47].The schematic diagram of the process is illustrated in Figure 2.This process consists of five major unit operations: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper.In addition, there are six modes of process operation, as listed in Table 1.The variables can be divided into three categories: composition variables, continuous process variables, and manipulated variables.In our study, only modes 1 and 3 were simulated through the Simulink programs developed on the basis of the decentralized control strategy designed by Ricker [48].The Simulink programs can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html.The 31 selected monitoring variables contained 9 manipulated variables and 22 continuous process variables.Thus, these variables were divided into five subblocks according to five units.However, given that only two variables were allocated to each the compressor unit and the condenser unit, there were four variables assigned to the other three related subblocks.As a result, the total of 31 variables was divided into three subblocks.There are 20 faults in the multimode TE process, which are listed in Table 2.Among these faults, the root causes of the faults 16-20 are unknown [46,47].What is more, to simplify interpretation, the amplitudes of faults 3, 9, and 15 are so small.It is difficult to detect, so only the remaining 12 faults were considered in this study.In the modeling stage, 2000 normal samples, which include 1000 mode 1 samples and 1000 mode 3 samples, were collected as the training data set.In the testing stage, 1000 samples of mode 1 were tested first, and then the process switches to mode 3.As a result, the test data set consists of 1000 samples of mode 1 and 1000 samples of mode 3.And faults occurred from the 1200th sample.A set of 20 faults in multimode TE process, which are listed In the MPPCA algorithm and PCA-GMM algorithm, when the variance contribution was selected as 85%, the dimension of feature space in MPPCA and the number of PCs in PCA-GMM were each selected as 18.In order to compare the monitoring performances of these algorithms in the same situation, the selected sparse PCs of each mode in SPCS-BIP were selected as 18.The 99% control limit was assigned to all three algorithms.
First, Figure 3 shows that the different submodes can be successfully divided by the EM algorithm used in this paper.And by using other algorithms, the modes also can be divided correctly.In other words, how to correctly divide the training data into multiple subset is not a problem by many related algorithms.
The normal process was tested by different algorithm, and the results are shown in Figure 4.In this figure, it is hard to figure out which algorithm's FR is lower.In the figure, most samples of each algorithm are lower than the control limit.And by calculation, the FR of these algorithms are 0.333%, 0.08%, 0.25%, and 1.08%, respectively, corresponding to Figures 4(a), 4(b), 4(c), and 4(d).The monitoring performances of these three algorithms suggest that the FR are acceptable.Next, the data sets of 12 faults in mode 3 were tested, and the MR of these three algorithms are listed in Table 3, with the smallest MR shown in bold.
From Table 3, we observe that the monitoring performance of SPCS-BIP is the best, compared to the MPPCA and PCA-GMM algorithms for all 12 faults.Here, we take the further analysis.In comparison with MPPCA and PCA-GMM algorithms, the SPCS-BIP algorithm can exactly divide the process data into subgroups corresponding to different modes by using the E-M algorithm, and, in each submode, SPCS-BIP can select the most important PCs that have most relation with the fault.Due to the fact that the subspace spanned by the PCs was monitored by BIP, most of the PCs are related to the main process of chemical industrial process, and only little PCs are related to the fault process.SPCS are discriminative by constructing an elastic net regression between all PCs and each sample.So, in Table 3, we observe Step IDV (4) Reactor cooling water inlet temperature Step IDV (5) Condenser cooling water inlet temperature Step IDV (6) A feed loss (Stream 1) Step IDV (7) C header pressure loss reduced availability (Stream 4) Step IDV (8) A, B, and C feed composition (Stream 4) Random variation IDV (9) D feed temperature (Stream 2) Random variation IDV (10) C feed temperature (Stream 4) Random variation IDV (11) Reactor cooling water inlet temperature Random variation IDV (12) Condenser cooling water inlet temperature Random variation IDV (13) Reaction kinetics Slow drift IDV (14) Reactor cooling water valve Sticking IDV (15) Condenser cooling water valve Sticking IDV (16) Unknown Unknown IDV (17) Unknown Unknown IDV (18) Unknown Unknown IDV (19) Unknown Unknown IDV (20) Unknown Unknown  that the results of SPCS-BIP are better than the results of MPPCA and PCA-GMM.
Figure 5 shows the monitoring performances of fault 10.It is easy to see that the FDI of MPPCA- 2 and the BIP of PCA-GMM cannot detect the fault effectively in Figures 4(a) and 4(c).In the figure, more than half of the fault samples were regarded as the normal samples, while, compared to the performances of MPPCA- 2 and PCA-GMM, the FDI of MPPCA-SPE shows some improvements.However, the monitoring performance of MPPCA-SPE does not match the performance of SPCS-BIP.We can find this point both in Figure 4 and Table 3 4.2.CSTR.This study simulated the CSTR process described by Yoon and MacGregor [49].The diagram of the process is presented in Figure 6.Due to the fact that the CSTR process consisted of only one operating unit, the number of subblocks was selected as 1.In the modeling stage, 1000 samples which include 500 mode 1 samples and 500 mode 2 samples were collected as the training data set.In the testing stage, 1000 samples of mode 2 were tested, and two faults were introduced to the process as follows.
Case 1.A step of 1 K was added in the cooling water temperature   from the 500th sample.
Case 2. A 2 kmol/(m 3 ⋅min) step was added in the inlet solute concentration   from the 500th sample.
In the MMPCA algorithm, when the variance contribution was selected as 85%, the dimension of feature space in MPPCA is 10.So, in order to compare the monitoring performances of these algorithms in the same situation, the number of PCs in PCA-GMM and the selected sparse PCs in SPCA-BIP were both selected as 10.The 99% control limit was assigned to all three algorithms.
The same as TEP, the FR of these algorithms are 0.4%, 0, 1.2%, and 1%, respectively.In an industry process, FR lower than 0.05 is acceptable [28].
The data sets of two faults in mode 2 were tested, and the MR were listed in Table 4.In the table, the smallest missed detection rates are shown in bold.
As shown in Table 4, the SPCS-BIP algorithm has shown the best performance for these two faults compared with other algorithms listed in the table.It is obvious that neither MPPCA nor PCA-GMM algorithms can detect the fault because their missed detection rates were high.In those four algorithms, only the SPCS-BIP was based on the selection PCs, so the improvements in the proposed sparse principal components selection can be demonstrated through the better monitoring performance of the SPCS-BIP algorithm.
Fault 1 is a bias in cooling water temperature   .Due to the control loop in the CSTR process, these would be a bias in outlet temperature , and then the cooling water flow rate   would increase.In Figure 7, both the MMPCA and PCA-GMM algorithms could not detect fault 1 effectively according to the performances of those shown in Figures 7(a), 7(b), and 7(c).In Figure 7(d), it is obvious that the monitoring performance of SPCS-BIP is much better than the others.The reason is that the correct classification for each subgroup by using E-M algorithm and the PCs selected by SPCS are discriminative and could construct the subspace that contains the important fault information for abnormal data.
Fault 2 is a bias in inlet solute concentration   .Then, due to the control loop in the CSTR process, there would be biases in outlet concentration  and outlet temperature .According to the performances shown in Figure 8(b), the FDI of MPPCA-SPE could not detect fault 2. Compared to the MPPCA-SPE, the FDI of MPPCA- 2 showed some  improvements.Both PCA-GMM and the proposed SPCS are all using BIP.In Figures 8(c) and 8(d), we could hardly see which algorithm is better.However, in Table 4, we could obviously find that the SPCS-BIP is better.Even, compared to MPPCA- 2 , the proposed algorithm has a little advantage than MPPCA- 2 .

Conclusions
An algorithm using sparse principal component selection and Bayesian inference-based probability (SPCS-BIP) was proposed in this study.Given that the modern industrial processes typically have multiple operating modes, BIP is utilized to compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes.In each submode, we use the sparse principal component selection to select the key PCs that have the best relation with fault.This algorithm constructs an elastic net regression between all PCs and each sample and then selects PCs according to the nonzero regression coefficients which indicate the discriminative expression of the sample.Finally, the TE and CSTR processes were employed to verify the superiority of the SPCS-BIP algorithm.The monitoring performances of MPPCA, PCA-GMM, and SPCS-BIP methods are discussed compared to those of the MPPCA and PCA-GMM algorithms, and the monitoring performances of the SPCS-BIP algorithm were found to be the best ones among the three algorithms.

Figure 1 :
Figure 1: The steps of SPCS-BIP algorithm for process monitoring.

Figure 4 :Figure 5 :
Figure 4: Monitoring performance of the normal process.
on the samples in the same class as the represented sample.Given the training sample L = [ 1 ,  2 , . . .,   ] ∈ R × , a test sample H ∈ R  , the solution to the sparse representation problem can be obtained by solving the following ℓ 1minimization problem: 1 -norm have the advantage of greater robustness to data noise, automatic sparsity, and adaptive neighborhood for individual datum.What is more another important advantage is that sparse representation has the potential discriminative ability since most nonzero elements are located =1 p    ‖ 2 +  1 ∑  =1 |  | +  2 ∑  =1 ‖‖ 2 subject to Card() ≤ .(7) Corresponding to the nonzero representation coefficients { β 1 , β 2 , . . ., β  }, construct a new loading vector P = [p  1 , p  2 , . . ., p   ].
The training data X is reconstructed by X = ∑  =1 t  p   , where t  is the score vector and p  is the loading vector.(6) For training sample x  ( = 1, 2, . . ., ), construct an elastic net regression between each observation value of training data and loading vector P made of PCs in step (5), according to β = arg min  ‖x  −∑ Corresponding to the nonzero representation coefficients { β 1 , β 2 , . . ., β  }, construct a new loading vector P = [p  1 , p  2 , . . ., p   ]. (5) Generate the BIP control chart with the calculated BIP index values for all the monitored samples.If the BIP index of a test sample is lower than the control limit, which means the sample is normal, go to step (1).Else, there is a fault in the process.

Table 1 :
Six process operation modes of TE process.

Table 2 ,
are simulated and the corresponding process data are collected for testing.The following simulations are run in MATLAB 8.3.0 (2014a) environment.Here, two indicators, which are FR (FR) and MR (MR), are often introduced to measure the result of process monitoring.FR is the rate of normal data classified as fault data.MR is the rate of fault data classified as normal rate.

Table 2 :
Process faults for the multimode TE process.