Comprehensive Monitoring of Complex Industrial Processes with Multiple Characteristics



Introduction
Amid rapid advances in artificial intelligence and sensor technology, process industry systems have grown increasingly complicated in structure and level of automation, while expectations for production quality, system performance, and economic efficiency keep rising. The reliability and safety of complex industrial processes therefore attract increasing attention and urgently need to be safeguarded. Industrial process fault detection is effective in improving product quality, safe operation, and continuous production. Condition monitoring can be performed by knowledge-based, model-based, or data-driven approaches [1][2][3][4][5]. The first two kinds face great challenges in dealing with complex process industry systems, as they require a large number of model parameters and extensive prior knowledge. Operation data, by contrast, can be recorded, transmitted, and stored; these features lay the foundation for data-driven fault detection techniques [6][7][8].
Data-driven fault detection methods have been intensively studied over the past years, and great progress has been made in multivariate statistical process monitoring (MSPM). For example, principal component analysis (PCA), partial least squares (PLS), canonical correlation analysis (CCA), independent component analysis (ICA), and other methods have been widely studied and applied [3][9][10][11][12]. MSPM projects high-dimensional process variables collected under normal conditions onto a low-dimensional space, thereby yielding a process monitoring model. For early fault detection, because memory-type monitoring charts are sensitive to incipient anomalies in the process mean, PCA has been combined with multivariate memory monitoring schemes, and the results showed that the combined scheme could detect early anomalies in multivariate data [13]. For complex industrial process systems, various improvements on the basic MSPM schemes have been proposed. For example, to reduce computational cost and time while ensuring that important information is not lost, a dynamic simplification method that enhances the nonlinear process monitoring capability of KPLS was proposed [14]. In addition, an iterative robust kernel principal component analysis (KPCA) was put forward to improve the robustness of the fault detection model by iteratively optimizing the function index and the kernel method [15]. KPCA was also revised to deal with nonlinear optimization problems, and the improved approach greatly enhanced the fault detection performance of the original KPCA [16]. In practice, however, not all variables of complex industrial processes obey the Gaussian distribution. ICA outperforms PLS as it saves the trouble of determining the input/output relationship of the system. Therefore, a new kernel independent component analysis (KICA) method was developed for non-Gaussian nonlinear distributions; it effectively captures the nonlinear relationships among the process variables [17]. The above methods have improved fault detection performance and model stability to a certain extent. However, they all rest on certain assumptions, such as variables obeying a Gaussian or non-Gaussian distribution, or a linear or nonlinear correlation between variables. Such assumptions are unrealistic in complex industrial processes, which therefore require further research.
Because process variables do not fully obey the Gaussian distribution, an approach integrating ICA and PCA was proposed in [18], which monitors non-Gaussian and Gaussian distributed variables using ICA and PCA, respectively. Reference [19] proposed Gaussian and non-Gaussian dual-subspace statistical process monitoring: the D test first assessed the normality of the process variables, the variables were then divided into Gaussian and non-Gaussian subspaces, and PCA and ICA models were finally built for fault detection in the Gaussian and non-Gaussian subspaces, respectively. Reference [20] proposed a dynamic non-Gaussian mixture serial modeling method for industrial process monitoring, using a multivariate non-Gaussian evaluation method to divide the industrial process variables into a Gaussian variable subspace and a non-Gaussian variable subspace. Then, the two subspaces were monitored using DICA and DPCA, and the monitoring performance was improved to a certain extent. To handle process variables that are not fully linearly correlated, [21] proposed a new hybrid linear-nonlinear statistical model (SPCA) for nonlinear process monitoring, which uses PCA to extract linear features for monitoring the linear subspace and KPCA to extract nonlinear features for monitoring the nonlinear subspace, thereby making better use of the underlying process to improve the monitoring performance. Reference [22] proposed a parallel PCA-KPCA (P-PCA-KPCA) model and monitoring scheme combining a stochastic algorithm (RA) and a genetic algorithm (GA). The GA-based optimization method was used to determine which variables the parallel PCA (P-PCA) and parallel KPCA (P-KPCA) models contain, and the proposed method can effectively handle nonlinear processes.
In addition, some studies have adopted layering, subspace division, and Bayesian decision fusion mechanisms for process monitoring, and the monitoring performance has been improved to a certain extent [23][24][25].
Although the aforementioned research has been successful, detecting faults in complex industrial processes where multiple characteristics coexist remains a challenge. Complex industrial process variables are typically high-dimensional, with linear, nonlinear, Gaussian, and non-Gaussian characteristics coexisting. The hybrid models proposed so far provide a new direction for complex industrial process fault detection; however, the lack of in-depth research on nonlinear evaluation methods for process variables has led to poor performance of existing hybrid models when linear, nonlinear, Gaussian, and non-Gaussian features coexist. For example, the PCA-ICA model performs well in industrial process monitoring where linear, Gaussian, and non-Gaussian features coexist, but it consistently produces false positives and false negatives in nonlinear industrial processes. Therefore, it is important for both theoretical research and engineering practice to carry out nonlinear evaluation of process variables and to build models for monitoring complex industrial processes.
In this study, a PCA-KPCA-ICA-KICA-BI hybrid model is proposed for the coexistence of high-dimensional, linear, nonlinear, Gaussian, and non-Gaussian variables in complex industrial processes. First, a multivariate feature evaluation method is proposed, in which the original variables are classified using the Jarque-Bera test and the nonlinear discriminant method. The Jarque-Bera test calculates the kurtosis and skewness statistics of the variables, and according to the normality test results, the process variables are divided into Gaussian blocks and non-Gaussian blocks. The Pearson correlation coefficient, the maximum mutual information coefficient, and a nonlinear evaluation function are then used to divide these blocks into Gaussian linear blocks, Gaussian nonlinear blocks, non-Gaussian linear blocks, and non-Gaussian nonlinear blocks. Next, the PCA-KPCA-ICA-KICA model is applied to monitor each block, and finally a Bayesian inference (BI) fusion strategy is proposed to combine the detection results of all blocks into a comprehensive decision. Section 2 briefly reviews PCA, ICA, kernel methods, and the Jarque-Bera test. Section 3 proposes the nonlinear discriminant method and the Bayesian inference fusion strategy. Section 4 describes in detail the PCA-KPCA-ICA-KICA-BI method and its construction. Section 5 presents the application of PCA-KPCA-ICA-KICA-BI to the TE process and the CSTR process and analyzes the corresponding monitoring performance. Section 6 concludes the paper.

International Journal of Chemical Engineering

Principal Component Analysis and Kernel Principal Component Analysis.
Principal component analysis is one of the most common dimensionality reduction methods for monitoring linearly correlated, Gaussian, high-dimensional processes. It maps industrial process data to a low-dimensional space and then analyzes the data in the resulting principal subspace and residual subspace. For a given process dataset $X = [x_1, x_2, \ldots, x_m] \in R^{n \times m}$, n represents the number of samples and m represents the number of variables. After mapping, X is decomposed into

$$X = TP^{T} + E, \quad (1)$$

in which X is the normalized data matrix, $P \in R^{m \times k}$ is the loading matrix obtained by eigenvalue decomposition of the covariance matrix C, $T \in R^{n \times k}$ is the score matrix, E is the residual subspace, and k is the number of principal components obtained from the cumulative percent variance (CPV):

$$\mathrm{CPV} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%, \quad (2)$$

in which $\lambda_i$ are the eigenvalues of the covariance matrix C.
When a new sample $x_{new} = [x_1, x_2, \ldots, x_m]^{T}$ is obtained, according to the aforementioned decomposition model, the score vector $t_{new}$ and the residual vector e of the new sample are as follows:

$$t_{new} = P^{T} x_{new}, \qquad e = (I - PP^{T})\, x_{new}. \quad (3)$$

In process monitoring, PCA algorithms typically adopt the squared prediction error (SPE) and the $T^2$ statistic to monitor operational status. The $T^2$ statistic measures the PCA principal subspace, and the SPE statistic measures the residual subspace. The loading matrix P captures most of the variance information. Because the $T^2$ statistic is built on the loading matrix P associated with the large singular values, it is sensitive to inaccuracies in the small singular values; this weakness is addressed by the SPE statistic [26,27]. Similarly, the $T^2$ and SPE statistics of the new sample $x_{new}$ are calculated as follows [26]:

$$T^2 = t_{new}^{T} \Lambda^{-1} t_{new}, \qquad \mathrm{SPE} = e^{T} e, \quad (4)$$

in which $\Lambda$ is the diagonal matrix containing the eigenvalues of the data covariance matrix. The $T^2$ and SPE control limits are calculated from normal operating data. Since the $T^2$ statistic follows the F distribution, given a certain confidence level α (usually 99%), the statistical control limits $T^2_{\alpha}$ and $\mathrm{SPE}_{\alpha}$ are as follows: in which the degrees of freedom of $F(k, m - k, \alpha)$ are k and m − k; $\theta_i$ is the sum of the residual eigenvalues raised to the power i (i = 1, 2, 3); and $h_0$ is an intermediate variable, $h_0 = 1 - (2\theta_1\theta_3 / 3\theta_2^2)$.
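The PCA monitoring statistics described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fit_pca_monitor` and `t2_spe`, the 85% CPV target, and the choice to normalize with the sample standard deviation are all assumptions made for illustration, and the control limits are deliberately left out.

```python
import numpy as np

def fit_pca_monitor(X, cpv_target=0.85):
    """Fit a PCA monitoring model on normal data X (n samples x m variables).

    cpv_target is an assumed cumulative percent variance threshold.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                    # normalized data matrix
    C = np.cov(Z, rowvar=False)             # m x m covariance matrix
    lam, vec = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]           # reorder to descending
    lam, vec = lam[order], vec[:, order]
    cpv = np.cumsum(lam) / lam.sum()
    # smallest k whose cumulative variance share reaches the target
    k = min(int(np.searchsorted(cpv, cpv_target)) + 1, lam.size)
    return {"mean": mean, "std": std, "P": vec[:, :k], "lam": lam[:k]}

def t2_spe(model, x_new):
    """Return (T^2, SPE) for one new sample x_new of length m."""
    z = (x_new - model["mean"]) / model["std"]
    t = model["P"].T @ z                    # score vector
    e = z - model["P"] @ t                  # residual vector
    t2 = float(t @ (t / model["lam"]))      # t^T Lambda^{-1} t
    spe = float(e @ e)                      # squared prediction error
    return t2, spe
```

A sample would be flagged when either statistic exceeds its control limit, which has to be estimated separately from the normal operating data.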

Independent Component Analysis.
Unlike PCA, ICA can monitor non-Gaussian processes by extracting independent non-Gaussian features from industrial process variables. The given process data vector $X = [x_1, x_2, \ldots, x_m] \in R^{1 \times m}$ can be represented as a linear combination of d statistically independent non-Gaussian sources $s = [s_1, s_2, s_3, \ldots, s_d]$, in which d ≤ m. After normalization and preprocessing, the relationship between the original industrial process data and the independent components (ICs) is expressed as follows: in which $A \in R^{m \times d}$ is the mixing matrix and $e \in R^{m \times 1}$ is the residual vector. The basic problem of ICA is to estimate the mixing matrix A and the independent component vector s.
Because the FastICA algorithm is computationally fast, this study uses the FastICA method [28] to calculate the decomposition matrix W and reconstruct s as follows: To eliminate the correlation between the process data and facilitate the calculation, normalization and whitening are required. This paper adopts the commonly used PCA whitening method to whiten the process data. Through singular value decomposition, the obtained covariance matrix and whitened data are as follows: in which E represents the expectation, C is the covariance matrix, Q is the whitening matrix, and z is the whitened data. The orthogonal matrix B can be calculated by the following formula: From the above formula, s can be expressed as follows: The relationship between the decomposition matrix W and B can be expressed by the following formula: Once the ICs are determined, process monitoring can be performed by establishing the $I^2$ and SPE statistics. The specific process is as follows: Kernel density estimation (KDE) is used to calculate the thresholds of the above two statistics [28].
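The whitening step mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: the whitening matrix is taken as $Q = \Lambda^{-1/2} U^{T}$ from the eigendecomposition $C = U \Lambda U^{T}$ of the sample covariance (equivalent to the singular value decomposition for a symmetric matrix), and the function name `pca_whiten` is hypothetical.

```python
import numpy as np

def pca_whiten(X):
    """Whiten data X (n samples x m variables) so its covariance is the identity.

    Assumes the covariance matrix is full rank (all eigenvalues > 0).
    Returns the whitened data Z and the whitening matrix Q.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    C = Xc.T @ Xc / (Xc.shape[0] - 1)       # sample covariance matrix
    lam, U = np.linalg.eigh(C)              # C = U diag(lam) U^T
    Q = np.diag(lam ** -0.5) @ U.T          # whitening matrix
    Z = Xc @ Q.T                            # each row is z = Q x
    return Z, Q
```

After whitening, the ICA problem reduces to finding an orthogonal rotation of z, which is what FastICA iterates on.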
2.3. Kernel Method.
KPCA and KICA models map nonlinear process variables into a high-dimensional space for processing through kernel methods. This section introduces the kernel method in detail, since the monitoring schemes of the PCA and ICA models have been explained in Sections 2.1 and 2.2.
Assume the process data $X = [x_1, x_2, \ldots, x_m] \in R^{n \times m}$ is a normalized zero-mean dataset. Through the nonlinear mapping Φ(·), the covariance matrix $C^{F}$ of the mapped data in the feature space is as follows [28]: $C^{F}$ can be diagonalized by eigenvalue decomposition as follows: in which λ represents the eigenvalue and satisfies λ ≥ 0, and v represents the eigenvector. Formulas (13) and (14) suggest the following: Since the eigenvectors of the covariance matrix are spanned by the samples, there must be coefficients α such that Formulas (15) and (16) suggest the following: The inner product of the mapped variables in the feature space can be described by a kernel function. This study adopts a Gaussian kernel function, which is defined as follows: , and (17) can be further simplified as follows: in which λ and α are the eigenvalues and eigenvectors of the kernel matrix K. According to the above formula, λ and v can be calculated.
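A minimal sketch of the kernel computation may help make the steps above concrete. Several details are assumptions, not taken from the paper: the Gaussian kernel is parameterized as exp(−‖x − y‖² / width), the kernel matrix is centered in feature space before the eigendecomposition, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, width):
    """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / width) for the rows of X.

    The single 'width' parameter is an assumed parameterization.
    """
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    return np.exp(-np.maximum(d2, 0.0) / width)

def center_kernel(K):
    """Center the kernel matrix, i.e. remove the feature-space mean."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

def kernel_pca(K):
    """Solve n * lam * alpha = Kc * alpha; return lam (descending) and alpha."""
    Kc = center_kernel(K)
    mu, alpha = np.linalg.eigh(Kc)          # eigenvalues of Kc equal n * lam
    order = np.argsort(mu)[::-1]
    return mu[order] / K.shape[0], alpha[:, order]
```

The kernel width is a tuning parameter; its value strongly affects how many components carry meaningful variance.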

Jarque-Bera Test.
MSPM methods usually perform parameter estimation and feature extraction under the assumption that variables obey a normal distribution. Although it is reasonable to assume normality for some industrial processes, this assumption is often questionable for complex industrial processes. Therefore, the Jarque-Bera (JB) test is adopted to test the normality of complex process data. In statistics, the Jarque-Bera test is a goodness-of-fit test of whether the sample data have the skewness and kurtosis of a normal distribution. Assume $X = [x_1, x_2, x_3, \ldots, x_n]$ to be a dataset of n independent random variables; the sample skewness S and kurtosis K of $x_1, x_2, x_3, \ldots, x_n$ are calculated as follows:

$$S = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}, \qquad K = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}}, \quad (20)$$

in which $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$; if the process variable follows a Gaussian distribution, the skewness and kurtosis of X are close to 0 and 3, respectively. The normality of the sample can be tested by the deviation of the skewness and kurtosis from these expected values. The JB statistic is calculated as follows:

$$JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right). \quad (21)$$

The threshold $JB_{\alpha}$ of the JB statistic is calculated from the significance level α and the number of samples n. If $JB < JB_{\alpha}$, the process variable obeys the normality assumption.
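As an illustration of how the JB test can split variables into Gaussian and non-Gaussian blocks, the sketch below computes the JB statistic per column. One assumption should be noted: the threshold uses the asymptotic chi-square distribution with 2 degrees of freedom, whose critical value is −2 ln α, whereas the text describes a threshold that also depends on the sample size n. The function names are hypothetical.

```python
import numpy as np

def jarque_bera(x):
    """JB statistic from the sample skewness S and kurtosis K."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    S = np.mean(d ** 3) / m2 ** 1.5         # skewness, 0 for a Gaussian
    K = np.mean(d ** 4) / m2 ** 2           # kurtosis, 3 for a Gaussian
    return n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)

def jb_threshold(alpha=0.05):
    """Asymptotic critical value: chi-square with 2 dof has tail exp(-x/2)."""
    return -2.0 * np.log(alpha)

def split_by_normality(X, alpha=0.05):
    """Return (gaussian, non_gaussian) column indices of X (n x m)."""
    limit = jb_threshold(alpha)
    gaussian, non_gaussian = [], []
    for j in range(X.shape[1]):
        (gaussian if jarque_bera(X[:, j]) < limit else non_gaussian).append(j)
    return gaussian, non_gaussian
```

For small samples the asymptotic threshold can be inaccurate, so a sample-size-dependent critical value is preferable in practice.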

Nonlinear Discriminant Method.
Because complex industrial process variables may exhibit both linear and nonlinear correlations, it is important to divide the linearly correlated variables and the nonlinearly correlated variables into two subblocks for separate monitoring. This section proposes a nonlinear discriminant method for this purpose, which is described in detail as follows.
Mutual information (MI), a nonlinear evaluation method based on information entropy theory, quantitatively describes the nonlinear correlation between two random variables [29,30]. If the two variables are strongly correlated, their MI value is relatively large; conversely, if the two variables are approximately independent, the MI value is very small. Mutual information makes it convenient to characterize the relationship between two variables [30], but estimating the joint probability it requires is computationally expensive. Therefore, this study selects the maximum mutual information coefficient (MIC), which has low computational complexity, to indicate the degree of correlation between variables.
Although the MIC has a strong ability to represent both linear and nonlinear correlations, it cannot indicate whether the relationship between the variables is linear or nonlinear. The Pearson correlation coefficient makes up for this shortcoming. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. The Pearson correlation coefficient lies in [−1, 1]: a value of 1 indicates that the two variables are perfectly positively linearly correlated, a value of −1 indicates that they are perfectly negatively linearly correlated, and a value of 0 indicates that there is no linear relationship between them. In the following, MIC(x, y) denotes the maximum mutual information coefficient of the two variables and $P_{x,y}$ denotes their Pearson correlation coefficient, which is calculated as

$$P_{x,y} = \frac{E(xy) - E(x)E(y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}}, \quad (22)$$

in which E(·) is the expected value and var(·) is the variance.
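A short numerical example shows why the Pearson coefficient alone cannot serve as a dependence measure and has to be paired with a measure such as the MIC. The helper name `pearson` and the test signals are chosen here purely for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    return cov / np.sqrt(np.var(x) * np.var(y))

x = np.linspace(-1.0, 1.0, 201)
r_linear = pearson(x, 2.0 * x + 1.0)    # perfectly linear relationship
r_quadratic = pearson(x, x ** 2)        # fully dependent, but nonlinear
```

Here `r_linear` is 1 up to rounding, while `r_quadratic` is essentially 0 even though y is completely determined by x. A large dependence score combined with a small Pearson coefficient is therefore the signature of a nonlinear relationship.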
Although the MIC and the Pearson correlation coefficient each characterize the correlation between variables well, neither alone can determine whether the variables are linearly or nonlinearly correlated. Therefore, we introduce a nonlinear discriminant coefficient, which combines the strengths of the MIC and the Pearson correlation coefficient to determine whether a variable is nonlinear. It is calculated as follows: in which NLDV represents the nonlinear discriminant value, that is, the degree of nonlinear correlation of the process variables. NLDV ∈ [0, 1] can be inferred from (23). NLDV = 1 represents a completely nonlinear correlation between variables, and NLDV = 0 represents a completely linear relationship between variables. To handle intermediate values of NLDV, a threshold c ∈ [0.3, 0.5] is set: NLDV > c means that the variables are nonlinearly correlated. This paper takes c = 0.4; α represents the weight coefficient, and α = 1/2 in this paper [30].

Bayesian Inference Fusion Strategy.
Since the process variables are divided into multiple subspaces for separate monitoring, the monitoring results must be integrated. Reference [22] proposed a decision logic to determine the process running state. To improve the robustness of the model, this study adopts a Bayesian inference fusion strategy to determine the final monitoring result of the process. Bayesian inference is a probabilistic approach, similar to that used in recent studies [31,32]. The Jarque-Bera test and the nonlinear discriminant method divide the process dataset into a Gaussian linear subspace, a Gaussian nonlinear subspace, a non-Gaussian linear subspace, and a non-Gaussian nonlinear subspace, and the corresponding monitoring statistics of each subspace are established. The failure probability of $T^2$ in the Gaussian linear subspace is defined as follows: in which N and F represent normal operating conditions and abnormal operating conditions, respectively. Given a confidence level α, $P_{T^2}(F) = \alpha$ and $P_{T^2}(N) = 1 - \alpha$. $P_{T^2}(X_{PCA}|N)$ and $P_{T^2}(X_{PCA}|F)$ are calculated as follows: in which $T^2_{lim}$ is the $T^2$ control limit. The failure probabilities of the remaining three subspaces are defined in the same way as for the Gaussian linear subspace. The $T^2$ and SPE statistics of the Gaussian linear subspace and the Gaussian nonlinear subspace and those of the non-Gaussian linear subspace and the non-Gaussian nonlinear subspace are combined as follows: The following weighted statistic is established according to the above formula to facilitate monitoring: If the result of the above formula exceeds the confidence level α, the industrial process may have failed; otherwise, the industrial process is in a normal operating state.
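The fusion step can be sketched with the form of Bayesian inference fusion that is common in the multiblock monitoring literature. This is an assumption and not necessarily identical to the paper's formulas: the likelihoods are taken as exp(−stat/limit) under normal operation and exp(−limit/stat) under fault, each block's posterior is weighted by its fault likelihood, and α is read as a significance level (0.01 for a 99% confidence limit). The function names are hypothetical.

```python
import numpy as np

def fault_posterior(stat, limit, alpha=0.01):
    """Return (P(F | x), P(x | F)) for one monitoring statistic and its limit."""
    stat = max(float(stat), 1e-12)              # avoid division by zero
    p_x_n = np.exp(-stat / limit)               # likelihood, normal operation
    p_x_f = np.exp(-limit / stat)               # likelihood, faulty operation
    p_f, p_n = alpha, 1.0 - alpha               # prior probabilities
    posterior = p_x_f * p_f / (p_x_n * p_n + p_x_f * p_f)
    return posterior, p_x_f

def bic(stats, limits, alpha=0.01):
    """Fuse block-wise statistics into one index; values above alpha signal a fault."""
    num, den = 0.0, 0.0
    for stat, limit in zip(stats, limits):
        posterior, weight = fault_posterior(stat, limit, alpha)
        num += weight * posterior
        den += weight
    return num / den
```

With this form, a statistic sitting exactly on its control limit yields a posterior equal to α, which is why α also serves as the decision threshold for the fused index.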

Process Monitoring Program
The flowchart of PCA-KPCA-ICA-KICA-BI is shown in Figure 1, and the procedure is described as follows.

Spatial Decomposition
Step 1: Normalize the historical dataset X collected under normal operating conditions.
Step 2: Calculate the JB value of each column of the dataset X.
Step 3: Divide the dataset X into a Gaussian subspace and a non-Gaussian subspace by comparing the JB value with $JB_{\alpha}$.
Step 4: Divide the Gaussian subspace and the non-Gaussian subspace into a Gaussian linear subspace, a Gaussian nonlinear subspace, a non-Gaussian linear subspace, and a non-Gaussian nonlinear subspace using the nonlinear discriminant method proposed in Section 3.1.

Offline Model.
Step 1: Take the historical dataset X under normal operating conditions as training data.
Step 2: Perform spatial decomposition.
Step 3: Monitor the Gaussian linear subspace, Gaussian nonlinear subspace, non-Gaussian linear subspace, and non-Gaussian nonlinear subspace using the PCA, KPCA, ICA, and KICA models, respectively.
Step 4: Calculate the confidence limits for each subspace.

Online Monitoring
Step 1: Collect the dataset $X_{New}$ of the industrial process system currently being monitored and normalize it.
Step 2: Divide $X_{New}$ into the Gaussian linear subspace, Gaussian nonlinear subspace, non-Gaussian linear subspace, and non-Gaussian nonlinear subspace using spatial decomposition.
Step 3: Calculate the monitoring statistics of each subspace.
Step 4: Using the statistics from Step 3 and the confidence limits from Step 4 of the offline model, calculate the final statistic BIC with the Bayesian inference fusion strategy of (27) in Section 3.2. BIC > α indicates a fault; otherwise, no fault has occurred.

Case Study
To verify the effectiveness of the PCA-KPCA-ICA-KICA-BI hybrid model proposed in this study, two cases were adopted to verify the performance of the model.

Tennessee Eastman (TE) Chemical Process.
The Tennessee Eastman (TE) chemical process described by Downs and Vogel consists of five main units: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. The reactor includes four reactants (A, C, D, and E) and an inert substance (B). After the reactants are fed into the reactor, products G and H and byproduct F are generated through a series of chemical reactions. The product stream is condensed in the condenser and then separated by the vapor-liquid separator. The uncondensed product is sent back to the reactor by the centrifugal compressor for re-reaction, and the condensed product is sent to the stripper for stripping [33]. The flowchart of the TE chemical process is shown in Figure 2. The TE process contains 41 measured variables (22 continuous process variables and 19 composition measurements) and 12 manipulated variables. The dataset adopted for this study was downloaded from https://web.mit.edu/braatzgroup/links.html. Table 1 presents all the variables included in the TE process; the dataset contains 52 variables and 960 samples. For the convenience of subsequent experimental analysis, Table 2 lists the 21 different faults (faults are introduced after the 161st sample). In the following, the PCA-KPCA-ICA-KICA-BI hybrid model proposed in this study is compared with the traditional single models on the datasets of faults 11, 16, and 19 to verify the effectiveness of the hybrid model.

Subspace Division.
Based on the offline modeling in Section 4.2, the Jarque-Bera test divided the variable space into Gaussian subspaces and non-Gaussian subspaces, and the nonlinear discriminant method then classified it into Gaussian linear subspaces, Gaussian nonlinear subspaces, non-Gaussian linear subspaces, and non-Gaussian nonlinear subspaces. The results are shown in Table 3.

TE Process Fault Detection.
The traditional single model considers only one distribution case and can therefore detect only TE process faults caused by linearly distributed, nonlinearly distributed, or non-Gaussian distributed process variables, so its fault detection accuracy is relatively low. The hybrid model proposed in this study fully considers the multicharacteristic distribution of complex industrial process variables: it divides the process variables into multiple subspaces using the JB test and the nonlinear evaluation method proposed in this study, and then applies a model suited to the distribution of each subspace for fault detection. Finally, the fault detection results are combined using the Bayesian inference fusion strategy. The results show that the proposed hybrid PCA-KPCA-ICA-KICA-BI model has higher fault detection accuracy for complex industrial processes with multiple characteristics. By using the JB test and the nonlinear evaluation method, the proposed method gives full play to the fault detection advantages of each single model.
According to Table 4, the existing single fault detection models, the PCA-ICA hybrid model, and the hybrid model proposed in this study all have a fault detection accuracy close to 1 for faults 1, 2, 6, 7, 12, 14, 17, and 18 of the TE process; however, the fault detection accuracy for faults 3, 9, and 15 is not very satisfactory. This confirms that the magnitude and obviousness of faults can lead to relatively large differences in fault detection accuracy. To address the problem that small-magnitude faults are not easily detected, a strategy that combines the exponentially weighted moving average (EWMA) control scheme, which is able to detect small changes, with the PCA model has been proposed to detect small faults in industrial processes [34]; how to detect incipient faults or faults with a low signal-to-noise ratio remains a very interesting problem.
Figures 3 and 4 show that a single model, despite its ability to detect certain fault samples, is inferior to the hybrid model in fault detection performance, which reflects its deficiency of considering only one distribution of variables. Simply put, the hybrid model is well suited to increasingly complex industrial processes. The TE process fault simulation experiments demonstrate the considerable advantages of the hybrid model proposed in this study. To further validate the performance of the model, the continuous stirred tank reactor (CSTR) process is used in Section 5.2.

CSTR Case.
Figure 5 shows the simple flowchart of the CSTR process, which consists of 9 variables. The vector form of the dataset is

$$X = [T_C, T_0, C_{AA}, C_{AS}, F_S, F_C, C_A, T, F_A],$$

where $T_C$ denotes the cooling water temperature, $T_0$ denotes the inlet temperature, $C_{AA}$ and $C_{AS}$ denote the inlet concentrations, $F_S$ denotes the solvent flow, $F_C$ denotes the cooling water flow, $C_A$ denotes the outlet concentration, T denotes the temperature, and $F_A$ denotes the reactant flow [22,33]. In this study, using the same simulation conditions as Yoon and MacGregor [33], 960 normal operating CSTR data samples were collected through the established CSTR process simulation platform as the training set, and another 160 normal samples and 800 fault samples were collected as the test set. The test set was used to verify the fault detection performance of the hybrid model proposed in this study.
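For a test set laid out as described, with normal samples followed by fault samples, the detection metrics can be computed as in the sketch below. The definitions used are the conventional ones and are stated here as assumptions: the fault detection rate is the share of fault samples that raise an alarm, and the false positive rate is the share of normal samples that raise an alarm. The function name `detection_rates` is hypothetical.

```python
import numpy as np

def detection_rates(alarms, n_normal):
    """Return (fault detection rate, false positive rate).

    alarms   : boolean sequence, True where the monitoring index exceeds its limit
    n_normal : number of leading normal samples; the rest are fault samples
    """
    alarms = np.asarray(alarms, dtype=bool)
    normal, faulty = alarms[:n_normal], alarms[n_normal:]
    fdr = float(faulty.mean())      # alarms raised on fault samples
    fpr = float(normal.mean())      # alarms raised on normal samples
    return fdr, fpr

# Example with the test-set layout from the text: 160 normal, then 800 fault samples.
alarms = np.zeros(960, dtype=bool)
alarms[160:880] = True              # 720 of the 800 fault samples detected
alarms[:8] = True                   # 8 false alarms among the 160 normal samples
fdr, fpr = detection_rates(alarms, n_normal=160)
```

In this synthetic example the fault detection rate is 0.9 and the false positive rate is 0.05; the alarm pattern is made up and does not reproduce any result from the paper.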

Subspace Division.
Following the offline modeling in Section 4.2, the Jarque-Bera test first divided the CSTR process variable space into Gaussian subspaces and non-Gaussian subspaces, and the nonlinear discriminant method then classified it into Gaussian linear subspaces, Gaussian nonlinear subspaces, non-Gaussian linear subspaces, and non-Gaussian nonlinear subspaces, as shown in Table 5.

Table 1 (fragment): Reactor cooling water flow; 52, Condenser cooling water flow.
Table 3 (fragment): 1, 9, 11, 12, 15, 17, 20, 22, 29, 44, 48, 49, 51, 52; Gaussian nonlinear subspace: 4, 5, 6, 8, 14, 21, 23, 26, 27, 24, 25, 28, 30, [...]

CSTR Process Fault Detection.
Figure 6 shows the fault detection results of the CSTR process. Figures 6(a)-6(d) show that the hybrid model proposed in this paper integrates the distribution characteristics of the process variables and uses the BI method to make a comprehensive decision on the fault detection results, with significant advantages: the fault detection rate of the PCA-KPCA-ICA-KICA-BI hybrid model is 1, and its FPR is lower than that of the single models.
Since the CSTR process is a typical nonlinear process [22,33], the PCA model and the ICA model cannot handle the nonlinearly distributed variables well, resulting in lower fault detection rates: 0.83 for the PCA model and 0.08 for the ICA model. KPCA maps the nonlinear data to a high-dimensional space in which they become linear, so the KPCA model is well suited to nonlinear process fault detection, which is verified in Figure 6(b). A comparison of Figures 6(b) and 6(d) shows that the fault detection rate of the KPCA model is comparable to that of the hybrid model proposed in this paper, which indicates that full consideration of the distribution of process [...]

[...] single model has a fault detection accuracy of 0.68, and the hybrid PCA-ICA model has a detection accuracy of 0.72, which means that more faults are considered normal, while the hybrid model proposed in this paper has a fault detection accuracy of 0.86, which is a significant improvement. Complex industrial processes are characterized by strong coupling of variables, and any small fault can cause huge safety hazards, which makes it difficult to collect data on actual industrial process faults. Therefore, this study used the TE and CSTR processes, which have complex characteristics such as high-dimensional, linear, nonlinear, Gaussian, and non-Gaussian features, to verify the fault detection performance of the hybrid model proposed in this paper. The analysis of the fault detection results shows that the PCA-KPCA-ICA-KICA-BI hybrid model fully considers the variable distribution characteristics of the TE and CSTR processes and has a higher fault detection accuracy than the traditional single models and the PCA-ICA hybrid model. The application of the hybrid model to the TE process and the CSTR process shows that it has good prospects for securing complex industrial processes.
To further optimize and improve the PCA-KPCA-ICA-KICA-BI hybrid model and to enhance its applicability, further research could address the following areas:
(1) Detecting incipient failures that cause only insignificant changes in complex industrial process data.
(2) Detecting compound failures.
(3) Most current fault detection models require a large number of data samples. How to realize fault detection based on a small number of samples, in the pursuit of a safe and sound industrial process, is an intriguing question.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.