Driven by the demand for diversified products, modern industrial
processes typically have multiple operating modes. At the same time,
variables within the same mode often follow a mixture of Gaussian
distributions. In this paper, a novel algorithm based on sparse
principal component selection (SPCS) and Bayesian inference-based
probability (BIP) is proposed for multimode process monitoring. SPCS
can be formulated as a just-in-time regression between all PCs and
each sample. SPCS selects PCs according to the nonzero regression
coefficients which indicate the compact expression of the sample.
This expression is necessarily
Over the past two decades, with the development of complex chemical processes and the growing demand for plant safety and stable product quality, timely process monitoring has gained importance. Because large amounts of data can be gathered through distributed control systems (DCSs), multivariate statistical process monitoring (MSPM) algorithms have received great attention. Among these algorithms, principal component analysis (PCA) and partial least squares (PLS) are the most widely used [
In the literature, multiple models can be built to fit each individual operating mode, but there are two essential issues that need to be addressed. One is how to divide the training data correctly into multiple subsets corresponding to the different operating modes. To solve this issue, many clustering algorithms have been applied. Among the traditional approaches, Ge and Song [
To date, the problem of how to correctly divide the training data into multiple subsets can be solved successfully by many of the algorithms mentioned in the previous paragraph. However, some issues remain to be resolved; the most important one is how to select the key principal components (PCs) when using one suitable model for process monitoring. Many algorithms for selecting PCs have been proposed, such as cumulative percent variance (CPV) [
Fortunately, many researchers have become aware of the inherent defects of the classical PCA algorithm. Several authors have sought a subspace spanned by key PCs that contains the most important information for process monitoring. Peng et al. [
In this paper, a process monitoring algorithm using multisubspace sparse principal component analysis with the BIP algorithm is put forward. First, variables are divided into different subblocks corresponding to different units or pieces of equipment to reduce the complexity of process analysis. Using the BIP algorithm, the multimode data in each subblock are divided into multiple subgroups. BIP computes the posterior probabilities of each monitored sample belonging to the multiple components and derives an integrated global probabilistic index for fault detection of multimode processes. The PCs with larger variances selected by the PCA algorithm are not always related to fault information. Sparse principal component selection (SPCS) takes the information of both normal and abnormal observations into account. The algorithm is formulated in a just-in-time form that constructs an elastic net regression between all PCs and each sample. SPCS selects the PCs corresponding to the nonzero regression coefficients, which indicate the compact expression of the sample. This expression is necessarily
Principal component analysis is a multivariate statistical analysis which is widely used in chemical process monitoring, fault detection, and so forth [
For a process running under multiple operating conditions, the assumption of a multivariate Gaussian distribution becomes invalid owing to mean shifts or covariance changes [
To construct a FGMM, given a set of training samples
There are a lot of learning algorithms, such as maximum likelihood estimation (MLE), EM, and the F-J algorithm, that have been put forward for mixture model estimation [ E-step: M-step:
where
where
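The E-step and M-step mentioned above can be illustrated with a minimal NumPy sketch. It assumes that the number of components is known in advance and that rough initial means are supplied, which differs from the F-J algorithm; the function names `gaussian_pdf` and `em_gmm`, the regularization term `reg`, and the synthetic two-mode data are illustrative assumptions rather than part of the original method.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    m = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance of every row, without an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + logdet + m * np.log(2.0 * np.pi)))

def em_gmm(X, init_means, n_iter=100, reg=1e-6):
    """Fit a Gaussian mixture by EM, starting from the given component means."""
    n, m = X.shape
    means = np.array(init_means, dtype=float)
    K = len(means)
    weights = np.full(K, 1.0 / K)
    # Assumption: start from isotropic covariances with the average variance.
    covs = np.array([np.eye(m) * X.var(axis=0).mean() for _ in range(K)])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        dens = np.column_stack(
            [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(K)]
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update priors, means, and covariances from the posteriors.
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(m)
    return weights, means, covs

# Synthetic data with two operating modes (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(6.0, 1.0, (200, 2))])
weights, means, covs = em_gmm(X, init_means=[[1.0, 1.0], [5.0, 5.0]])
```

The sketch runs a fixed number of iterations for brevity; a practical implementation would monitor the log-likelihood and stop when its change falls below a tolerance.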
In this section, the SPCS-BIP algorithm for multimode process monitoring is described in detail. We first introduce the Bayesian inference-based probability, which can derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Then, sparse principal component selection is introduced for selecting the key PCs related to fault information. Finally, the steps of the algorithm are given.
In the previous section, the FGMM has been constructed, and it is essential to further derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Due to the multimodality of mixture distribution, it is really difficult to capture the analytical boundary of the density function
In the proposed monitoring approach, given an arbitrary monitored sample
Given that each component
Under the assumption that
For the monitored sample
Given the appropriate degree of freedom,
Under the preset confidence level
Otherwise, the process operation is treated as out of control.
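A minimal sketch of how such an index could be computed is given below. Because the defining equations are not reproduced here, the sketch assumes a commonly used form: the local index of each Gaussian component is the chi-square CDF of the squared Mahalanobis distance, with the number of variables as the degree of freedom, and the global index is the posterior-weighted sum of the local indices. The names `chi2_cdf` and `bip_index`, the toy mixture, and the 0.99 limit are illustrative assumptions.

```python
import math
import numpy as np

def chi2_cdf(x, dof):
    """Chi-square CDF via the series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * dof, 0.5 * x
    if z > 600.0:
        # Assumption: for moderate dof the CDF is numerically 1 this far out.
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def bip_index(x, weights, means, covs):
    """Posterior-weighted sum of the local probability indices of one sample."""
    x = np.asarray(x, dtype=float)
    m = x.size
    log_dens, local = [], []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        d2 = float(diff @ np.linalg.solve(cov, diff))  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        log_dens.append(math.log(w) - 0.5 * (d2 + logdet + m * math.log(2.0 * math.pi)))
        local.append(chi2_cdf(d2, m))
    log_dens = np.array(log_dens)
    post = np.exp(log_dens - log_dens.max())  # posteriors, computed stably
    post /= post.sum()
    return float(post @ np.array(local))

# Toy two-mode mixture (illustrative values only).
weights = [0.5, 0.5]
means = np.array([[0.0, 0.0], [6.0, 6.0]])
covs = np.array([np.eye(2), np.eye(2)])
limit = 0.99  # 99% confidence level

bip_normal = bip_index([0.1, -0.2], weights, means, covs)
bip_fault = bip_index([3.0, -3.0], weights, means, covs)
print(bip_normal <= limit, bip_fault <= limit)
```

A sample whose index stays at or below the limit is treated as normal under this assumed form; a sample lying far from every component pushes the index toward 1 and is flagged.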
Sparse representation has proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional data [
Given the training sample
In brief, it is expected that the elastic net is used to group a set of sparse coefficients to construct the sparse alignment matrices, in which the sparse representation information or the potential discriminative information is encoded to enhance the discriminative ability in an unsupervised manner.
The key problem in monitoring a multimode process is to select a suitable model and choose the subspace spanned by the key PCs. As noted in the Introduction, the subspace spanned by the first several PCs with the largest explained variance does not always contain fault information.
In the following part, a novel multimode process monitoring approach based on SPCS and BIP is proposed. This approach is in a just-in-time form. For each sample, an elastic net regression between all PCs and the sample is constructed and solved. The PCs which have nonzero regression coefficients are retained, while other PCs are rejected. That means, for each sample, we can pick out the most discriminative bases and the others are set to zero. Its concrete calculating steps are summarized in Figure
The steps of SPCS-BIP algorithm for process monitoring.
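The just-in-time selection described above, in which each sample is regressed on all PCs and only the PCs with nonzero coefficients are retained, can be sketched as follows. The sketch assumes the elastic net objective 0.5·||x − Pb||² + λ₁||b||₁ + 0.5·λ₂||b||² and solves it by plain coordinate descent; the exact objective scaling, the penalty values, and the names `elastic_net_cd` and `select_pcs` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def elastic_net_cd(P, x, lam1, lam2, n_sweeps=200, tol=1e-10):
    """Minimize 0.5*||x - P b||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2."""
    k = P.shape[1]
    b = np.zeros(k)
    col_sq = (P ** 2).sum(axis=0)
    residual = np.asarray(x, dtype=float).copy()  # equals x - P @ b for b = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(k):
            # Correlation of column j with the residual that excludes column j.
            rho = P[:, j] @ residual + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_bj - b[j]
            if delta != 0.0:
                residual -= P[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def select_pcs(P, x, lam1, lam2):
    """Return the indices of PCs with nonzero coefficients, and the coefficients."""
    b = elastic_net_cd(P, x, lam1, lam2)
    return np.flatnonzero(b), b

# Loadings from the SVD of normalized synthetic training data (illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
P = np.linalg.svd(Z, full_matrices=False)[2].T  # columns are the PCs

# A sample that lies in the span of the first and fourth PCs only.
x = 3.0 * P[:, 0] - 2.0 * P[:, 3]
idx, b = select_pcs(P, x, lam1=0.5, lam2=0.1)
print(idx, b[idx])
```

Because the loading vectors are orthonormal, the descent converges after a single sweep in this example; the general loop is kept so that the sketch also applies to non-orthogonal dictionaries.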
- Collect a set of historical training data under all possible operating conditions.
- Use the EM algorithm to learn the Gaussian mixture model and estimate the model parameter set
- For each submodel, get a normal operational observation set
- Normalize the training data through the mean value and variance of each variable.
- Obtain all principal components using SVD decomposition.
- The training data
- For training sample
- Corresponding to the nonzero representation coefficients
- Specify a confidence

- Normalize the current time point data by using mean values and variance of the training data.
- Obtain the loading vector
- When a test sample
- Corresponding to the nonzero representation coefficients
- Generate the BIP control chart with the calculated BIP index values for all the monitored samples.
- If the BIP index of a test sample is lower than the control limit, which means the sample is normal, go to step
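The normalization and SVD steps listed above can be sketched in a few lines of NumPy. The helper names `fit_pca` and `project` and the synthetic data are illustrative assumptions; the sketch only shows that online samples are scaled with the training statistics before being projected onto the loading vectors.

```python
import numpy as np

def fit_pca(X_train):
    """Normalize the training data and obtain all PCs by SVD."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt.T                               # columns are loading vectors
    variances = s ** 2 / (len(X_train) - 1)       # variance captured by each PC
    return mean, std, loadings, variances

def project(x_new, mean, std, loadings):
    """Scale new data with the training statistics and compute its scores."""
    return ((x_new - mean) / std) @ loadings

# Synthetic training data with four variables (illustrative only).
rng = np.random.default_rng(2)
X_train = rng.normal(size=(100, 4)) * [1.0, 2.0, 0.5, 3.0] + [10.0, -5.0, 0.0, 2.0]
mean, std, loadings, variances = fit_pca(X_train)
scores = project(X_train, mean, std, loadings)
```

Since every variable is scaled to unit variance, the PC variances sum to the number of variables, which gives a quick sanity check on the decomposition.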
In this case study, the TE benchmark and a CSTR process are used to verify the effectiveness of the SPCS-BIP algorithm. PCA-GMM is the classic algorithm for multimode process monitoring, and the fault detection index (FDI) is similar to the Bayesian inference-based probability (BIP), so SPCS-BIP is compared with PCA-GMM. In addition, to verify the improvement brought by the SPCS algorithm, which selects sparse PCs, SPCS-BIP is also compared with the MPPCA algorithm.
As a well-known benchmark process, the Tennessee Eastman process, which was presented by Downs and Vogel, has been widely applied to evaluate and compare the efficiency of process monitoring techniques [
Six process operation modes of TE process.
Mode | G/H mass ratio | Production rate |
---|---|---|
1 | 50/50 | 7038 kg/h G and 7038 kg/h H |
2 | 10/90 | 1048 kg/h G and 12669 kg/h H |
3 | 90/10 | 10000 kg/h G and 1111 kg/h H |
4 | 50/50 | Maximum |
5 | 10/90 | Maximum |
6 | 90/10 | Maximum |
Control scheme for the TE process.
There are 20 faults in the multimode TE process, which are listed in Table
Process faults for the multimode TE process.
Faults number | Disturbance state | Type |
---|---|---|
IDV(1) | A/C feed ratio, B composition constant (Stream 4) | Step |
IDV(2) | B composition, A/C ratio constant (Stream 4) | Step |
IDV(3) | D feed temperature (Stream 2) | Step |
IDV(4) | Reactor cooling water inlet temperature | Step |
IDV(5) | Condenser cooling water inlet temperature | Step |
IDV(6) | A feed loss (Stream 1) | Step |
IDV(7) | C header pressure loss reduced availability (Stream 4) | Step |
IDV(8) | A, B, and C feed composition (Stream 4) | Random variation |
IDV(9) | D feed temperature (Stream 2) | Random variation |
IDV(10) | C feed temperature (Stream 4) | Random variation |
IDV(11) | Reactor cooling water inlet temperature | Random variation |
IDV(12) | Condenser cooling water inlet temperature | Random variation |
IDV(13) | Reaction kinetics | Slow drift |
IDV(14) | Reactor cooling water valve | Sticking |
IDV(15) | Condenser cooling water valve | Sticking |
IDV(16) | Unknown | Unknown |
IDV(17) | Unknown | Unknown |
IDV(18) | Unknown | Unknown |
IDV(19) | Unknown | Unknown |
IDV(20) | Unknown | Unknown |
In the MPPCA and PCA-GMM algorithms, with the variance contribution set to 85%, the dimension of the feature space in MPPCA and the number of PCs in PCA-GMM were both 18. To compare the monitoring performances of these algorithms under the same conditions, the number of sparse PCs selected for each mode in SPCS-BIP was also set to 18. The 99% control limit was assigned to all three algorithms.
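As an illustration of how a variance contribution threshold such as 85% translates into a number of retained PCs, the following sketch applies the cumulative percent variance rule to a vector of PC variances. The function name `n_components_by_cpv` and the example variances are illustrative assumptions and are not taken from the TE data.

```python
import numpy as np

def n_components_by_cpv(variances, threshold=0.85):
    """Smallest number of leading PCs whose cumulative variance share
    reaches the threshold."""
    variances = np.sort(np.asarray(variances, dtype=float))[::-1]
    cpv = np.cumsum(variances) / variances.sum()
    # First position where the cumulative share is at least the threshold.
    count = int(np.searchsorted(cpv, threshold)) + 1
    return min(count, len(variances))

# Illustrative PC variances; cumulative shares are 0.5, 0.8, 0.9, 0.95, 1.0.
example_variances = [5.0, 3.0, 1.0, 0.5, 0.5]
n_pcs = n_components_by_cpv(example_variances, threshold=0.85)
print(n_pcs)
```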
First, Figure
Different modes of the training data.
The normal process was tested with the different algorithms, and the results are shown in Figure
Missed detection rates (%) of 12 faults.
Faults number | MPPCA-T² | MPPCA-SPE | PCA-GMM | SPCS-BIP |
---|---|---|---|---|
1 | 0.75 | 0.25 | 1 | |
2 | 3.5 | 8.5 | 5.5 | |
4 | 0 | 0 | 0 | 0 |
5 | 0.375 | 0 | 1.125 | |
6 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 |
8 | 3.375 | 3.875 | 4.25 | |
10 | 82.375 | 16 | 89.875 | |
11 | 2.25 | 8.5 | 4.5 | |
12 | 1.375 | 1.625 | 1.5 | |
13 | 16.125 | 26.5 | 1.9625 | 11.625 |
14 | 0 | 3.5 | 0.125 | |
Monitoring performance of the normal process. Panels: MPPCA-T², MPPCA-SPE, PCA-GMM, SPCS-BIP.
From Table
Figure
Monitoring performances of fault 10 in TEP. Panels: MPPCA-T², MPPCA-SPE, PCA-GMM, SPCS-BIP.
This study simulated the CSTR process described by Yoon and MacGregor [
Diagram of the CSTR process.
In the modeling stage, 1000 samples, comprising 500 mode 1 samples and 500 mode 2 samples, were collected as the training data set. In the testing stage, 1000 samples of mode 2 were tested, and two faults were introduced to the process as follows.
A step of 1 K was added in the cooling water temperature
A 2 kmol/
In the MPPCA algorithm, with the variance contribution set to 85%, the dimension of the feature space is 10. To compare the monitoring performances of these algorithms under the same conditions, the number of PCs in PCA-GMM and the number of sparse PCs selected in SPCS-BIP were both set to 10. The 99% control limit was assigned to all three algorithms.
As in the TEP case, the FRs of these algorithms are 0.4%, 0%, 1.2%, and 1%, respectively. In an industrial process, an FR lower than 0.05 is acceptable [
The data sets of the two faults in mode 2 were tested, and the MRs are listed in Table
Missed detection rates (%) of two faults.
Faults number | MPPCA-T² | MPPCA-SPE | PCA-GMM | SPCS-BIP |
---|---|---|---|---|
1 | 90 | 100 | 98.8 | |
2 | 28.8 | 99.6 | 31.8 | |
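For reference, rates of the kind reported in this table can be computed from a monitoring index and its control limit as in the sketch below. It assumes that FR denotes the percentage of normal samples whose index exceeds the limit and that MR denotes the percentage of faulty samples whose index stays at or below it; the function names and the index values are illustrative and are not results from the CSTR study.

```python
import numpy as np

def false_alarm_rate(index_normal, limit):
    """Percentage of normal samples flagged as faulty."""
    return 100.0 * float(np.mean(np.asarray(index_normal) > limit))

def missed_detection_rate(index_fault, limit):
    """Percentage of faulty samples that are not flagged."""
    return 100.0 * float(np.mean(np.asarray(index_fault) <= limit))

# Illustrative index values against a 0.99 control limit.
limit = 0.99
index_normal = [0.20, 0.50, 0.995, 0.30]
index_fault = [0.999, 0.50, 1.0, 0.995, 0.98]

fr = false_alarm_rate(index_normal, limit)
mr = missed_detection_rate(index_fault, limit)
print(fr, mr)
```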
As shown in Table
Fault 1 is a bias in cooling water temperature
Monitoring performance of fault 1 in CSTR. Panels: MPPCA-T², MPPCA-SPE, PCA-GMM, SPCS-BIP.
Fault 2 is a bias in inlet solute concentration
Monitoring performance of fault 2 in CSTR. Panels: MPPCA-T², MPPCA-SPE, PCA-GMM, SPCS-BIP.
An algorithm using sparse principal component selection and Bayesian inference-based probability (SPCS-BIP) was proposed in this study. Because modern industrial processes typically have multiple operating modes, BIP is used to compute the posterior probabilities of each monitored sample belonging to the multiple components and to derive an integrated global probabilistic index for fault detection of multimode processes. In each submode, sparse principal component selection is used to select the key PCs that are most closely related to the fault. The algorithm constructs an elastic net regression between all PCs and each sample and then selects PCs according to the nonzero regression coefficients, which indicate the discriminative expression of the sample. Finally, the TE and CSTR processes were employed to verify the SPCS-BIP algorithm. Its monitoring performance was compared with that of the MPPCA and PCA-GMM algorithms and was found to be the best among the three.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China (Grant no. 61375007) and Shanghai Science and Research Projects (Grant nos. 15JC1400600, 15JC1401700).