To meet the demand for diversified products, modern industrial
processes typically have multiple operating modes. At the same time,
variables within the same mode often follow a mixture of Gaussian
distributions. In this paper, a novel algorithm based on sparse
principal component selection (SPCS) and Bayesian inference-based
probability (BIP) is proposed for multimode process monitoring. SPCS
can be formulated as a just-in-time regression between all principal
components (PCs) and each sample. SPCS selects PCs according to the nonzero regression
coefficients, which indicate the compact expression of the sample.
This expression is necessarily
Over the past two decades, with the development of complex chemical processes and the growing demand for plant safety and stable product quality, timely process monitoring has gained importance. Because large amounts of data can be gathered through distributed control systems (DCSs), multivariate statistical process monitoring (MSPM) algorithms have received great attention. Among these algorithms, principal component analysis (PCA) and partial least squares (PLS) are the most widely used [
In the literature, multiple models can be built to fit each individual operating mode, but there are two essential issues that need to be addressed. One is how to divide the training data correctly into multiple subsets corresponding to the different operating modes. To solve this issue, many clustering algorithms have been applied. In terms of the traditional approaches, Ge and Song [
To date, the problem of how to correctly divide the training data into multiple subsets can be solved successfully by the algorithms mentioned in the previous paragraph. However, some issues remain unresolved; the most important one is how to select the key principal components (PCs) when using one suitable model for process monitoring. Many algorithms for selecting PCs have been proposed, such as cumulative percent variance (CPV) [
Fortunately, many researchers have become aware of the inherent defects of the classical PCA algorithm and have sought a subspace spanned by key PCs that contains the most important information for process monitoring. Peng et al. [
In this paper, a process monitoring algorithm using multisubspace sparse principal component analysis with the BIP algorithm is put forward. First, variables are divided into different subblocks corresponding to different units or pieces of equipment to reduce the complexity of process analysis. Using the BIP algorithm, the multimode data in each subblock are divided into multiple subgroups. BIP can compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes. The PCs with larger variances selected by the PCA algorithm are not always related to fault information. Sparse principal component selection (SPCS) takes the information of both normal and abnormal observations into account. The algorithm is formulated in a just-in-time form that constructs an elastic net regression between all PCs and each sample. SPCS selects the PCs corresponding to the nonzero regression coefficients, which indicate the compact expression of the sample. This expression is necessarily
Principal component analysis is a multivariate statistical analysis technique that is widely used in chemical process monitoring, fault detection, and so forth [
For a process running under multiple operating conditions, owing to mean shifts or covariance changes, the assumption of a multivariate Gaussian distribution becomes invalid [
To construct a FGMM, given a set of training samples
Many learning algorithms, such as maximum likelihood estimation (MLE), expectation-maximization (EM), and the FJ algorithm, have been put forward for mixture model estimation [
E-step:
M-step:
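The two steps can be illustrated with a minimal sketch of the standard EM iteration for a Gaussian mixture. This is the textbook form, which may differ in detail from the update equations intended here; the function names `gaussian_pdf` and `em_gmm`, the initialization from user-supplied means, and the small regularization term `reg` are assumptions made for illustration only.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    m = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance of every row, without inverting cov explicitly.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + logdet + m * np.log(2.0 * np.pi)))

def em_gmm(X, means, n_iter=100, reg=1e-6):
    """Fit a K-component Gaussian mixture by plain EM, starting from `means`.

    `reg` is a hypothetical ridge added to each covariance for numerical safety.
    """
    n, m = X.shape
    means = np.array(means, dtype=float)
    K = means.shape[0]
    weights = np.full(K, 1.0 / K)
    # Every component starts from the overall sample covariance.
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(m) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        dens = np.column_stack(
            [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(K)]
        )
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: re-estimate weights, means, and covariances from the posteriors.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(m)
    return weights, means, covs
```

A fixed number of iterations is used here for brevity; a practical implementation would normally stop when the log-likelihood no longer improves.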
In this section, the SPCS-BIP algorithm for multimode process monitoring is described in detail. We first introduce the Bayesian inference-based probability, which can derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Then, sparse principal component selection is introduced for selecting the key PCs related to fault information. Finally, the steps of the algorithm are given.
In the previous section, the FGMM was constructed; it is now essential to derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Owing to the multimodality of the mixture distribution, it is difficult to capture the analytical boundary of the density function
In the proposed monitoring approach, given an arbitrary monitored sample
Given that each component
Under the assumption that
For the monitored sample
Given the appropriate degree of freedom,
Under the preset confidence level
Otherwise, the process operation is treated as out of control.
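The decision rule can be sketched in code under explicit assumptions, since the formulas themselves are not reproduced in this text. The sketch assumes that the squared Mahalanobis distance of a sample to each Gaussian component follows a chi-square distribution whose degrees of freedom equal the number of variables, that the local probability of a component is the chi-square CDF at that distance, and that the BIP index is the posterior-weighted sum of the local probabilities. The names `bip_index` and `is_normal` and the default `alpha` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def bip_index(x, weights, means, covs):
    """BIP index of sample x under a Gaussian mixture (assumed formulation)."""
    x = np.asarray(x, dtype=float)
    m = x.size
    log_dens = np.empty(len(weights))
    local_prob = np.empty(len(weights))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = x - np.asarray(mu, dtype=float)
        d2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        # Log of (prior weight * Gaussian density) for component k.
        log_dens[k] = np.log(w) - 0.5 * (d2 + logdet + m * np.log(2.0 * np.pi))
        # Assumed local probability: chi-square CDF with m degrees of freedom.
        local_prob[k] = chi2.cdf(d2, df=m)
    # Posterior probabilities, normalized in a numerically stable way.
    post = np.exp(log_dens - log_dens.max())
    post /= post.sum()
    return float(post @ local_prob)

def is_normal(x, weights, means, covs, alpha=0.01):
    """Treat the sample as in control when BIP <= 1 - alpha."""
    return bip_index(x, weights, means, covs) <= 1.0 - alpha
```

With `alpha=0.01` the threshold corresponds to a 99% confidence level; any other level can be passed in.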
Sparse representation has proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional data [
Given the training sample
In brief, it is expected that the elastic net is used to group a set of sparse coefficients to construct the sparse alignment matrices, in which the sparse representation information or the potential discriminative information is encoded to enhance the discriminative ability in an unsupervised manner.
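The elastic net regression of one sample on all PCs can be sketched with a small coordinate descent solver. The objective scaling, 0.5*||y - A b||^2 + l1*||b||_1 + 0.5*l2*||b||^2, the penalty values, and the names `elastic_net_cd` and `select_pcs` are assumptions for illustration; the exact parameterization used in this work is not recoverable from the text.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(A, y, l1, l2, n_iter=200, tol=1e-10):
    """Minimize 0.5*||y - A b||^2 + l1*||b||_1 + 0.5*l2*||b||^2 by coordinate descent."""
    n_feat = A.shape[1]
    b = np.zeros(n_feat)
    col_sq = (A ** 2).sum(axis=0)
    resid = np.array(y, dtype=float)  # residual y - A b, with b = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n_feat):
            # Correlation of column j with the residual that excludes column j.
            rho = A[:, j] @ resid + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new_bj - b[j]
            if delta != 0.0:
                resid -= A[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def select_pcs(loadings, sample, l1=0.5, l2=0.1):
    """Return indices of PCs with nonzero coefficients, plus the coefficients.

    `loadings` holds one loading vector per column; `sample` is a normalized sample.
    """
    coef = elastic_net_cd(loadings, sample, l1, l2)
    return np.flatnonzero(coef), coef
```

Because the L1 term drives small coefficients exactly to zero, the indices returned by `select_pcs` identify the PCs retained for that sample, while the L2 term keeps the solution stable.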
The key problem in monitoring a multimode process is to select a suitable model and choose the subspace spanned by the key PCs. As noted in the Introduction, the subspace spanned by the first several PCs with the largest explained variance does not always contain fault information.
In the following part, a novel multimode process monitoring approach based on SPCS and BIP is proposed. This approach takes a just-in-time form. For each sample, an elastic net regression between all PCs and the sample is constructed and solved. The PCs that have nonzero regression coefficients are retained, while the other PCs are rejected. That is, for each sample, we pick out the most discriminative bases and set the coefficients of the others to zero. Its concrete calculating steps are summarized in Figure
The steps of the SPCS-BIP algorithm for process monitoring.
Collect a set of historical training data under all possible operating conditions.
Use the EM algorithm to learn the Gaussian mixture model and estimate the model parameter set
For each submodel, get a normal operational observation set
Normalize the training data through the mean value and variance of each variable.
Obtain all principal components using singular value decomposition (SVD). The training data
For training sample
Corresponding to the nonzero representation coefficients
Specify a confidence
Normalize the current time point data by using mean values and variance of the training data.
Obtain the loading vector
When a test sample
Corresponding to the nonzero representation coefficients
Generate the BIP control chart with the calculated BIP index values for all the monitored samples. If the BIP index of a test sample is lower than the control limit, which means the sample is normal, go to step
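The normalization and SVD steps listed above can be sketched as follows. The function names `fit_scaler_and_pcs` and `project` are hypothetical, and the sketch assumes one sample per row and scaling by the sample standard deviation; it covers only these two steps, not the full procedure.

```python
import numpy as np

def fit_scaler_and_pcs(X_train):
    """Normalize training data and obtain all PCs by SVD.

    Returns the training mean and standard deviation, the loading matrix
    (one loading vector per column), and the variance of each PC.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # Rows of Vt are the loading vectors, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt.T
    variances = s ** 2 / (Z.shape[0] - 1)
    return mean, std, loadings, variances

def project(x_new, mean, std, loadings):
    """Normalize new data with the TRAINING statistics, then compute PC scores."""
    return ((x_new - mean) / std) @ loadings
```

Reusing the training mean and standard deviation for online data keeps the test samples on the same scale as the model.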
In this case study, the TE benchmark and the CSTR process are used to verify the effectiveness of the SPCS-BIP algorithm. PCA-GMM is the classic algorithm for multimode process monitoring, and the fault detection index (FDI) is similar to the Bayesian inference probability (BIP); therefore, SPCS-BIP is compared with PCA-GMM. In addition, to verify the improvement brought by the SPCS algorithm, which selects sparse PCs, SPCS-BIP is also compared with the MPPCA algorithm.
As a well-known benchmark process, the Tennessee Eastman process, which was presented by Downs and Vogel, has been widely applied to evaluate and compare the efficiency of process monitoring techniques [
Six process operation modes of TE process.
Mode  G/H mass ratio  Production rate 

1  50/50  7038 kg/h G and 7038 kg/h H 
2  10/90  1048 kg/h G and 12669 kg/h H 
3  90/10  10000 kg/h G and 1111 kg/h H 
4  50/50  Maximum 
5  10/90  Maximum 
6  90/10  Maximum 
Control scheme for the TE process.
There are 20 faults in the multimode TE process, which are listed in Table
Process faults for the multimode TE process.
Fault number  Disturbance state  Type 

IDV(1)  A/C feed ratio, B composition constant (Stream 4)  Step 
IDV(2)  B composition, A/C ratio constant (Stream 4)  Step 
IDV(3)  D feed temperature (Stream 2)  Step 
IDV(4)  Reactor cooling water inlet temperature  Step 
IDV(5)  Condenser cooling water inlet temperature  Step 
IDV(6)  A feed loss (Stream 1)  Step 
IDV(7)  C header pressure loss, reduced availability (Stream 4)  Step 
IDV(8)  A, B, and C feed composition (Stream 4)  Random variation 
IDV(9)  D feed temperature (Stream 2)  Random variation 
IDV(10)  C feed temperature (Stream 4)  Random variation 
IDV(11)  Reactor cooling water inlet temperature  Random variation 
IDV(12)  Condenser cooling water inlet temperature  Random variation 
IDV(13)  Reaction kinetics  Slow drift 
IDV(14)  Reactor cooling water valve  Sticking 
IDV(15)  Condenser cooling water valve  Sticking 
IDV(16)  Unknown  Unknown 
IDV(17)  Unknown  Unknown 
IDV(18)  Unknown  Unknown 
IDV(19)  Unknown  Unknown 
IDV(20)  Unknown  Unknown 
In the MPPCA and PCA-GMM algorithms, with the variance contribution set to 85%, the dimension of the feature space in MPPCA and the number of PCs in PCA-GMM were both 18. To compare the monitoring performance of the algorithms under the same conditions, the number of sparse PCs selected for each mode in SPCS-BIP was also set to 18. The 99% control limit was assigned to all three algorithms.
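Choosing the number of PCs from a variance contribution threshold can be sketched as below, assuming the threshold is applied as a cumulative percent variance (CPV) over the PC variances. The function name `n_components_by_cpv` is hypothetical, and the example variances in the usage note are made up for illustration rather than taken from the TE data.

```python
import numpy as np

def n_components_by_cpv(variances, threshold=0.85):
    """Smallest number of PCs whose cumulative variance share reaches `threshold`."""
    variances = np.sort(np.asarray(variances, dtype=float))[::-1]
    cpv = np.cumsum(variances) / variances.sum()
    # Index of the first PC at which the cumulative share reaches the threshold.
    count = int(np.searchsorted(cpv, threshold)) + 1
    return min(count, variances.size)
```

For example, with PC variances `[5, 3, 1, 1]` the cumulative shares are 0.5, 0.8, 0.9, and 1.0, so an 85% threshold keeps three PCs.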
First, Figure
Different modes of the training data.
The normal process was tested by the different algorithms, and the results are shown in Figure
Missed detection rates (%) of 12 faults.
Fault number  MPPCA  MPPCA-SPE  PCA-GMM  SPCS-BIP 

1  0.75  0.25  1 

2  3.5  8.5  5.5 

4  0  0  0  0 
5  0.375  0  1.125 

6  0  0  0  0 
7  0  0  0  0 
8  3.375  3.875  4.25 

10  82.375  16  89.875 

11  2.25  8.5  4.5 

12  1.375  1.625  1.5 

13  16.125  26.5  1.9625  11.625 
14  0  3.5  0.125 

Monitoring performance of the normal process. Panels: MPPCA, MPPCA-SPE, PCA-GMM, SPCS-BIP.
From Table
Figure
Monitoring performance for fault 10 in the TEP. Panels: MPPCA, MPPCA-SPE, PCA-GMM, SPCS-BIP.
This study simulated the CSTR process described by Yoon and MacGregor [
Diagram of the CSTR process.
In the modeling stage, 1000 samples which include 500 mode 1 samples and 500 mode 2 samples were collected as the training data set. In the testing stage, 1000 samples of mode 2 were tested, and two faults were introduced to the process as follows.
A step of 1 K was added in the cooling water temperature
A 2 kmol/
In the MPPCA algorithm, with the variance contribution set to 85%, the dimension of the feature space was 10. To compare the monitoring performance of the algorithms under the same conditions, the number of PCs in PCA-GMM and the number of selected sparse PCs in SPCS-BIP were both set to 10. The 99% control limit was assigned to all three algorithms.
As with the TEP, the FRs of these algorithms are 0.4%, 0, 1.2%, and 1%, respectively. In an industrial process, an FR lower than 0.05 is acceptable [
The data sets of the two faults in mode 2 were tested, and the MRs are listed in Table
Missed detection rates (%) of two faults.
Fault number  MPPCA  MPPCA-SPE  PCA-GMM  SPCS-BIP 

1  90  100  98.8 

2  28.8  99.6  31.8 

As shown in Table
Fault 1 is a bias in cooling water temperature
Monitoring performance for fault 1 in the CSTR. Panels: MPPCA, MPPCA-SPE, PCA-GMM, SPCS-BIP.
Fault 2 is a bias in inlet solute concentration
Monitoring performance for fault 2 in the CSTR. Panels: MPPCA, MPPCA-SPE, PCA-GMM, SPCS-BIP.
An algorithm using sparse principal component selection and Bayesian inference-based probability (SPCS-BIP) was proposed in this study. Given that modern industrial processes typically have multiple operating modes, BIP is used to compute the posterior probabilities of each monitored sample belonging to the multiple components and to derive an integrated global probabilistic index for fault detection of multimode processes. In each submode, sparse principal component selection is used to select the key PCs that are most closely related to the fault. This algorithm constructs an elastic net regression between all PCs and each sample and then selects PCs according to the nonzero regression coefficients, which indicate the discriminative expression of the sample. Finally, the TE and CSTR processes were employed to verify the SPCS-BIP algorithm. Its monitoring performance was compared with that of the MPPCA and PCA-GMM algorithms, and SPCS-BIP was found to perform best among the three algorithms.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China (Grant no. 61375007) and Shanghai Science and Research Projects (Grant nos. 15JC1400600, 15JC1401700).