Monitoring of Distillation Column Based on Indiscernibility Dynamic Kernel PCA

Fault detection in the distillation column industrial process faces a grave challenge because the faults are complicated. In this paper, a new indiscernibility dynamic kernel principal component analysis (I-DKPCA) method is presented and applied to a distillation column. Compared with traditional statistical techniques, I-DKPCA not only captures the nonlinear and dynamic characteristics of processes but also extracts the relevant variables from all the variables. The new method is applied to a distillation column process (a hardware-in-the-loop simulation system), and the results show that, compared with KPCA and DPCA, the proposed method achieves a lower missed detection rate and a higher detection rate for the faults.


Introduction
With the emergence and development of Industry 4.0, modern industrial processes have become more complicated in both structure and degree of automation. The safety and reliability of industrial processes have become the most critical problems in system design [1][2][3][4][5][6][7][8][9][10][11][12][13][14]. To avoid abnormal accidents and losses, process monitoring has become an important topic in the area of process control. Among the different approaches, multivariate statistical process monitoring provides a data-driven framework for monitoring the industrial process. With the wide use of smart sensors, a large amount of data is collected in industrial processes, and data-driven methods can extract process information directly from these data without considering complicated system models; as a result, data-driven methods have attracted much attention in recent research. Principal component analysis (PCA) is one of the most widely used models in statistical process monitoring [15][16][17][18][19][20].
PCA is a basic multivariate statistical method which extracts useful information from a large amount of process data by reducing dimensions. The process data are divided into a systematic part that reflects normal data change and a noisy part that reflects the variation of noise. Hotelling's $T^2$ statistic and the SPE statistic are used for chemical process monitoring to detect changes of process variation in the principal component subspace and the residual subspace, respectively. PCA is widely applied in the petroleum and chemical industries. Conventional PCA performs well only for steady-state linear processes. However, dynamic and nonlinear characteristics are widespread in many industrial processes.
To handle nonlinearity, many methods have been proposed (Kramer, 1991; Dong and McAvoy, 1996; Chiang et al., 2001), such as neural network PCA and kernel PCA (KPCA) [23][24][25][26][27][28][29]. Neural network PCA needs training to determine the number of principal components; KPCA was developed to overcome this problem. The basic idea of KPCA is that the mapped data are analyzed by the conventional PCA method in the feature space. Traditional PCA is a static method, so it has difficulty capturing the serial correlation of the data [30]. Industrial processes, however, are dynamic, and a static model can therefore miss faults. To handle this problem, the dynamic characteristics should be taken into consideration when developing a monitoring model [31]. Ku et al. developed dynamic PCA (DPCA), which takes into account the serial correlation in the data by augmenting each observation vector with the previous $l$ observations. After years of research, DPCA has been applied to many fields [30,32,33].
In this paper, to improve PCA, we propose a new nonlinear dynamic process monitoring method based on indiscernibility dynamic kernel PCA (I-DKPCA). The proposed method not only captures the nonlinear and dynamic characteristics of industrial processes but also simplifies the process data by extracting the valid data. We compare the results of DPCA, KPCA, and I-DKPCA for detecting various faults in the distillation column industrial process.
The remaining sections of this paper are organized as follows. Section 2 explains the new I-DKPCA algorithm in detail. In Section 3, we apply DPCA, KPCA, and I-DKPCA to the distillation column. Finally, conclusions are drawn in Section 4.

Algorithm of New I-DKPCA
Industrial processes have strong dynamic and nonlinear characteristics; I-DKPCA is a nonlinear dynamic method proposed to address these two characteristics. In addition, a fault may not affect all the operating and process variables: for a given fault, some variables are not influenced. The proposed indiscernibility dynamic kernel PCA finds the variables which are severely affected by the fault, and these variables are extracted to form a new sample matrix and test matrix. Therefore the proposed method has higher sensitivity and accuracy for process monitoring. The new method consists of three parts.

Indiscernibility and the Cross-Degree (𝜇 and 𝜂).
In the process industry, a complex system has many process and operation variables, all of which are collected and stored. In traditional multivariate statistical process monitoring, the selection of monitored variables often considers all the process variables, which causes a lot of inconvenience for process monitoring, although an actual fault affects only a few variables in the process. So a new dynamic kernel principal component analysis method is put forward in this paper. Two parameters are proposed: the indiscernibility degree and the degree of cross. The new method can get rid of irrelevant variables, reduce the data dimension, simplify the calculation, and improve the efficiency and accuracy of fault diagnosis.
The train samples $Z = [z_1, z_2, \ldots, z_N]$, $z_i \in R^M$, are obtained from the normal process; the average of each variable is
$$\bar{z}_j = \frac{1}{N}\sum_{i=1}^{N} z_{ij}, \quad j = 1, \ldots, M.$$
The test samples $Z^f = [z^f_1, z^f_2, \ldots, z^f_N]$, $z^f_i \in R^M$, are obtained from the abnormal process; the average of each variable is
$$\bar{z}^f_j = \frac{1}{N}\sum_{i=1}^{N} z^f_{ij}, \quad j = 1, \ldots, M,$$
where $N$ is the number of samples and $M$ is the number of variables.
A threshold is then determined for each variable to separate the train data from the fault data. Train data that lie beyond (below) the threshold are considered abnormal data, and fault data that lie below (beyond) the threshold are also considered abnormal data. All the abnormal data of the train data and the fault data are collected, variable by variable, into a set of fault samples; these wrong points are called samples of fault point. Let $k_j$ be the number of wrong points of variable $j$ ($j = 1, \ldots, M$, where $M$ is the number of variables); $k_j$ differs from variable to variable. The indiscernibility degree proposed in this paper is denoted by $\mu_j$. The degree of cross is the ratio of the number of wrong points to the number of all samples in each variable and is denoted by $\eta_j$:
$$\eta_j = \frac{k_j}{n_j},$$
where $n_j$ is the number of all samples of variable $j$.
For each known fault, limits on $\mu$ and $\eta$ must be set to choose the variables; in this paper, the authors obtained the best values of $\mu$ and $\eta$ from many simulation results. In general, the smaller the values of $\mu$ and $\eta$, the better the effect: irrelevant variables are removed and the monitored data become more concise. Because the irrelevant variables are removed and the computation is simplified, monitoring the production process with the selected variables gives a better diagnosis.
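The counting behind the degree of cross can be illustrated with a minimal Python sketch. The text does not state how the threshold is chosen, so the sketch assumes, purely for illustration, that the threshold of each variable is the midpoint between its normal and fault averages, and that "all samples" means the train and fault samples together. The function names `cross_degree` and `select_variables` are hypothetical, and the indiscernibility degree $\mu$ is not computed here.

```python
import numpy as np

def cross_degree(z_train, z_fault):
    """Degree of cross per variable: wrong points / all samples.

    z_train, z_fault: arrays of shape (n_samples, n_variables).
    ASSUMPTION: the threshold is the midpoint of the two variable means,
    and "all samples" counts train and fault samples together.
    """
    z_train = np.asarray(z_train, dtype=float)
    z_fault = np.asarray(z_fault, dtype=float)
    mean_train = z_train.mean(axis=0)
    mean_fault = z_fault.mean(axis=0)
    threshold = 0.5 * (mean_train + mean_fault)  # assumed threshold rule
    fault_is_higher = mean_fault >= mean_train
    # Train points lying on the fault side of the threshold are wrong points.
    train_wrong = np.where(fault_is_higher, z_train > threshold, z_train < threshold)
    # Fault points lying on the normal side of the threshold are wrong points.
    fault_wrong = np.where(fault_is_higher, z_fault < threshold, z_fault > threshold)
    k = train_wrong.sum(axis=0) + fault_wrong.sum(axis=0)
    n_all = z_train.shape[0] + z_fault.shape[0]
    return k / n_all

def select_variables(eta, eta_limit):
    """Indices of variables whose degree of cross does not exceed the limit."""
    return np.flatnonzero(np.asarray(eta) <= eta_limit)
```

Under these assumptions, a variable whose normal and fault data barely overlap gets a degree of cross near zero and is kept, while a variable whose two data sets are mixed gets a large value and is dropped.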

Dynamic Characteristic Analysis.
To consider the dynamics of the new data $X = [x_1, x_2, \ldots, x_N]$, $x_i \in R^m$, the PCA methods can be extended to take the serial correlations into account by augmenting each observation vector with the previous $l$ observations and stacking the data matrix in the following manner [3,34]:
$$X(l) = \begin{bmatrix} x_t^T & x_{t-1}^T & \cdots & x_{t-l}^T \\ x_{t-1}^T & x_{t-2}^T & \cdots & x_{t-l-1}^T \\ \vdots & \vdots & \ddots & \vdots \\ x_{t+l-N}^T & x_{t+l-N-1}^T & \cdots & x_{t-N}^T \end{bmatrix},$$
where $x_t^T$ is the $m$-dimensional observation vector in the training set at time instance $t$, as shown in Figure 1. The number of lags $l$ is selected as in [32,35,36]. DPCA can remove the correlation of the data to some degree and improve the accuracy of diagnosis.
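The stacking described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `augment_with_lags` and the convention that rows of the input are ordered from oldest to newest are assumptions. The rows of the result appear oldest first, which is the reverse of the row order usually written for the augmented matrix, but the row order does not affect a PCA or KPCA model.

```python
import numpy as np

def augment_with_lags(X, l):
    """Stack each observation with its previous l observations.

    X: array of shape (N, m), rows ordered from oldest to newest (assumed).
    Returns an array of shape (N - l, m * (l + 1)) whose rows are
    [x_t, x_{t-1}, ..., x_{t-l}].
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if not 0 <= l < N:
        raise ValueError("the number of lags must satisfy 0 <= l < N")
    # Block s holds the observations delayed by s steps.
    blocks = [X[l - s:N - s] for s in range(l + 1)]
    return np.hstack(blocks)
```

With $l = 0$ the function returns the original data, so static PCA or KPCA is recovered as a special case.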

Kernel Principal Component Analysis.
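Assume that the new augmented matrix $X(l)$ is mapped nonlinearly into a high dimensional feature space, $\Phi : R^m \to F$. The original data $x_i$ become $\Phi(x_i)$ after mapping. Suppose the data mapped into the feature space are centered, that is, $\sum_{i=1}^{N}\Phi(x_i) = 0$; the covariance matrix $C^F$ in the feature space is defined as
$$C^F = \frac{1}{N}\sum_{j=1}^{N}\Phi(x_j)\Phi(x_j)^T. \tag{6}$$
To perform PCA in the feature space we find eigenvalues ($\lambda$) and eigenvectors ($v$) that satisfy
$$\lambda v = C^F v, \tag{7}$$
where $\lambda > 0$ and $v \in F \setminus \{0\}$ is the eigenvector corresponding to $\lambda$. Therefore, for $\lambda \neq 0$, the solution $v$ can be regarded as a linear combination of $\Phi(x_1), \ldots, \Phi(x_N)$; that is,
$$v = \sum_{i=1}^{N}\alpha_i\Phi(x_i). \tag{9}$$
Then, by multiplying both sides of (7) from the left by $\Phi(x_k)$, we obtain
$$\lambda\langle\Phi(x_k), v\rangle = \langle\Phi(x_k), C^F v\rangle, \quad k = 1, \ldots, N. \tag{10}$$
Substituting (6) and (9) into (10), we obtain
$$\lambda\sum_{i=1}^{N}\alpha_i\langle\Phi(x_k), \Phi(x_i)\rangle = \frac{1}{N}\sum_{i=1}^{N}\alpha_i\left\langle\Phi(x_k), \sum_{j=1}^{N}\Phi(x_j)\langle\Phi(x_j), \Phi(x_i)\rangle\right\rangle. \tag{11}$$
Defining a kernel matrix $K \in R^{N \times N}$,
$$[K]_{ij} = K_{ij} = \langle\Phi(x_i), \Phi(x_j)\rangle. \tag{12}$$
Inserting (12) into (11), (7) can be represented by the following simple form:
$$N\lambda\alpha = K\alpha \tag{13}$$
for all $\lambda > 0$, where $\alpha = [\alpha_1, \ldots, \alpha_N]^T$. The kernel matrix $K$ is centered in the feature space using the formula
$$\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N, \tag{14}$$
where $1_N$ is an $(N \times N)$ matrix in which every element is equal to $1/N$. To normalize $\alpha^1, \ldots, \alpha^p$, the corresponding eigenvectors in the feature space are required to have unit length:
$$\langle v_k, v_k\rangle = 1, \quad k = 1, \ldots, p. \tag{15}$$
Taking $v_k = \sum_{i=1}^{N}\alpha_i^k\Phi(x_i)$ into (15),
$$1 = \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i^k\alpha_j^k\langle\Phi(x_i), \Phi(x_j)\rangle = \langle\alpha^k, K\alpha^k\rangle = N\lambda_k\langle\alpha^k, \alpha^k\rangle. \tag{16}$$
By projecting $\Phi(x)$ onto the eigenvectors $v_k$ in the feature space, the principal components are extracted:
$$t_k = \langle v_k, \Phi(x)\rangle = \sum_{i=1}^{N}\alpha_i^k\langle\Phi(x_i), \Phi(x)\rangle \tag{17}$$
for $k = 1, \ldots, p$.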
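The derivation above maps onto a short NumPy sketch: build the kernel matrix, center it, solve the eigenvalue problem, rescale the coefficient vectors so that each feature-space eigenvector has unit length, and project new samples. The names `gaussian_kernel`, `fit_kpca`, and `kpca_scores` are hypothetical, and the Gaussian kernel is written as $\exp(-\|x - y\|^2/c)$ with an assumed width parameter $c$; this is a sketch under those assumptions rather than the authors' code.

```python
import numpy as np

def gaussian_kernel(A, B, width):
    """Gaussian kernel exp(-||a - b||^2 / width); the parameterization is assumed."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / width)

def fit_kpca(X, width, p):
    """Fit KPCA on training data X of shape (N, m), keeping p components."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    K = gaussian_kernel(X, X, width)
    one_n = np.full((N, N), 1.0 / N)
    # Center the kernel matrix in the feature space.
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]        # indices of the p largest
    kappa = eigvals[order]                       # eigenvalues of centered K, equal to N * lambda
    alpha = eigvecs[:, order] / np.sqrt(kappa)   # scaled so that alpha^T K_c alpha = 1
    return {"X": X, "K": K, "alpha": alpha, "lambdas": kappa / N, "width": width}

def kpca_scores(model, X_new):
    """Project new samples of shape (n_new, m) onto the kept components."""
    X, K, alpha = model["X"], model["K"], model["alpha"]
    X_new = np.asarray(X_new, dtype=float)
    N = X.shape[0]
    k_new = gaussian_kernel(X_new, X, model["width"])
    one_n = np.full((N, N), 1.0 / N)
    one_t = np.full((X_new.shape[0], N), 1.0 / N)
    # Center the test kernel vectors consistently with the training kernel.
    k_c = k_new - one_t @ K - k_new @ one_n + one_t @ K @ one_n
    return k_c @ alpha
```

In this sketch the mean square of each training score equals the corresponding value in `lambdas`, which is why those values are a natural scaling for a $T^2$ type statistic.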
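After obtaining $t_k$, two complementary multivariate control charts can be applied to process fault diagnosis. The first is Hotelling's $T^2$ chart, which monitors the variation in the space of the principal components; it is defined as
$$T^2 = [t_1, \ldots, t_p]\,\Lambda^{-1}\,[t_1, \ldots, t_p]^T,$$
where $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$ is the diagonal matrix consisting of the eigenvalues associated with the $p$ retained principal components. The upper control limit based on $T^2$ is obtained using the $F$ distribution and is given by
$$T^2_{p,N,\alpha} \sim \frac{p(N-1)}{N-p}F_{p,N-p;\alpha},$$
where $F_{p,N-p;\alpha}$ is the upper $100\alpha\%$ critical point of the $F$ distribution with $(p, N-p)$ degrees of freedom.
The SPE chart represents the Euclidean distance from the model space and it is defined as
$$\mathrm{SPE} = \left\|\Phi(x) - \hat{\Phi}_p(x)\right\|^2 = \sum_{j=1}^{n}t_j^2 - \sum_{j=1}^{p}t_j^2, \tag{21}$$
where $\hat{\Phi}_p(x)$ is the reconstructed feature vector with $p$ principal components in the feature space and $n$ is the number of nonzero eigenvalues; $\Phi(x)$ is identical to $\hat{\Phi}_p(x)$ if $p$ equals $n$. The details of formula (21) can be found in [27].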
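Given the scores of a sample, both statistics reduce to sums of squares. The following Python sketch shows the arithmetic only; the function name `t2_and_spe` is hypothetical, and it assumes that the scores on all $n$ components with nonzero eigenvalues are available and sorted by decreasing eigenvalue.

```python
import numpy as np

def t2_and_spe(scores, lambdas, p):
    """Compute T^2 and SPE for each sample.

    scores: array of shape (n_samples, n), scores on all n nonzero-eigenvalue
            components, sorted by decreasing eigenvalue (assumed).
    lambdas: array of shape (n,), the matching eigenvalues.
    p: number of retained principal components, p <= n.
    """
    scores = np.atleast_2d(np.asarray(scores, dtype=float))
    lambdas = np.asarray(lambdas, dtype=float)
    kept = scores[:, :p]
    # T^2: squared retained scores scaled by their eigenvalues.
    t2 = (kept ** 2 / lambdas[:p]).sum(axis=1)
    # SPE: energy of all scores minus energy of the retained scores.
    spe = (scores ** 2).sum(axis=1) - (kept ** 2).sum(axis=1)
    return t2, spe
```

When $p$ equals $n$ the SPE is zero for every sample, which matches the remark that the feature vector is then reconstructed exactly.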
Assuming that the prediction errors are normally distributed, the upper control limit on the SPE at significance level $\alpha$ is obtained using the $\chi^2$ distribution and is given by
$$\mathrm{SPE}_\alpha \sim g\chi^2_h,$$
where $g = b/(2a)$, $h = 2a^2/b$, and $a$ and $b$ are the mean and variance of the SPE at each time interval.
(1) Acquire train data $Z = [z_1, z_2, \ldots, z_N]$, $z_i \in R^M$, and known fault data $Z^f$, where $N$ is the number of samples and $M$ is the number of variables.
(3) Determine the number of kept variables.
The second step is on-line monitoring:
(1) Acquire the new test data $x_{new}$ by the first step.

Fault Diagnosis of Distillation Column
Distillation columns are very important in the chemical industry and are widely used in chemical and oil refining enterprises. A mixed liquid is separated into its components by the distillation column. The distillation process separates a mixture according to the different volatilities (boiling points) of its components so as to achieve purification.
The distillation column is an indispensable device in chemical processes and oil refining enterprises; once faults appear, they bring great losses to the enterprise, so fault detection and diagnosis of the distillation column are important to chemical production. Many factors affect the operation and product quality of a distillation column, which is a complex multiparameter system. Physical phenomena such as flow and heat transfer occur in the distillation process, and there are many practical difficulties in truly understanding the actual process. In the early development of fault diagnosis, many scholars put forward mathematical models to describe the process and solve the fault diagnosis problem of the distillation column; these are mainly divided into static and dynamic mathematical simulation models. However, these models have many defects and do not meet the needs of diagnosis.
A mathematical model of a distillation column cannot really replace the actual process. Before modeling, researchers have to make assumptions to simplify the model, because a complicated mathematical model is difficult to solve, and this leads to vast differences between the model and the actual process. Considering these problems, multivariate statistical process monitoring provides a data-driven framework for monitoring the industrial process without accurate physical models, which is convenient for implementation.
Figure 2: The process industry integrated automation hardware-in-the-loop simulation system.
In industrial production, because of the huge scale of the plant, the complexity of the raw materials, the difficulty of equipment maintenance, and the risk of high temperature and high pressure in the production processes, it is difficult for theory researchers to carry out experimental studies of industrial fault monitoring. So a hardware-in-the-loop simulation system is developed in this paper, using a real hardware controller and an industrial control network to build a distributed control system (DCS), with a computer simulation model developed as the controlled object. In this paper, the distillation column system (PCS 7 Unit Template) of Siemens is studied as one of the controlled objects, as shown in Figure 2.
Based on the hardware-in-the-loop simulation system, valve faults, concentration faults, flow faults, and so on can be set. For the same fault, different fault parameters can be set; faults and disturbances in the distillation column can also be set according to the actual situation. For example, the feed concentration can be reduced or increased by 0.01; such faults cannot be simulated in the Tennessee Eastman process and other simulation processes. This satisfies the needs of our research on fault diagnosis, and there is no need to worry about the limitations of the factory.
Traditional fault diagnosis methods cannot detect faults accurately and quickly. With the development of the Internet and the wide use of smart and wireless sensors, wireless communications, and mobile devices, a large amount of process data is produced; how to use the process data to determine whether the industrial process is normal is a hot topic. In this paper, data-driven methods (especially dynamic kernel PCA) are applied to the monitoring of the distillation column. A framework for the process data hierarchy, based on the data of the distillation column system, is shown in Figure 3.
A flowchart of the distillation column process is given in Figure 4. The distillation column is an indispensable device in the chemical industry and in oil refining enterprises; once the system shows valve faults, concentration faults, or flow faults, the enterprise suffers a great loss. Thus the fault detection and diagnosis of the distillation column become an important link in chemical production. This section studies the application of data-driven algorithms to the distillation column. The authors selected the 14 monitoring variables shown in Table 1 and collected 960 groups of sample data under normal conditions for off-line training of the model. The faults were set as shown in Table 2 and fall into three categories: valve faults, concentration faults, and flow faults. The authors collected 960 observations from the distillation column process for on-line testing. The faults were introduced to the process at t = 5 h and their effect persisted until 16 hours. Faults 4, 5, and 6 are valve faults which are large changes, so these faults could be detected both by the data-driven methods and by the alarm system of the DCS. Because the valve faults were similar, only the results of fault 6 are given. As shown in Figure 5, the $T^2$ index and SPE index of DPCA gave poor results, with significantly high false alarm rates. The $T^2$ index and SPE index of KPCA performed better than those of DPCA. In this case, however, the $T^2$ index and SPE index of I-DKPCA were clearly better than those of KPCA and DPCA.
Fault 1 is a big deviation of the feed concentration. As shown in Figure 6, KPCA and I-DKPCA could detect the fault more quickly than DPCA. Early detection can help the industry avoid greater losses. Besides, the false alarm rates of I-DKPCA were lower than those of DPCA and KPCA. In a word, compared with DPCA and KPCA, I-DKPCA shows the best performance.
Fault 2 was also a fault of the feed concentration and fault 3 was a flow fault. Compared with fault 1, fault 2 was a small deviation in which the concentration value changes by only 0.01; fault 3 was also a small deviation, in which the flow value changes by 0.1. These small deviations could not be detected by the alarm system of the DCS. Some monitoring charts for faults 2 and 3 are shown in Figures 7 and 8. The SPE chart of DPCA could detect the faults, but the $T^2$ index of DPCA had large missed detection rates. Both the $T^2$ index and the SPE index of KPCA gave very high detection rates, but the false alarm rates were also high. Only the SPE index and $T^2$ index of I-DKPCA gave the best results, with the lowest false alarm rates and the highest detection rates.
In conclusion, the data-driven methods can be applied well to the distillation column, and I-DKPCA based monitoring gives the highest detection rates and the smallest missed detection rates for all faults, in particular when a small deviation happens.
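The comparison above is expressed through detection, missed detection, and false alarm rates. The text does not give their formulas, so the Python sketch below uses commonly adopted definitions as an assumption: the false alarm rate is the fraction of alarms among samples before the fault is introduced, the detection rate is the fraction of alarms among samples after it, and the missed detection rate is one minus the detection rate. The function name `alarm_rates` and the argument `fault_start` (the index of the first faulty sample) are hypothetical.

```python
import numpy as np

def alarm_rates(alarms, fault_start):
    """Summarize a boolean alarm sequence around a known fault onset.

    alarms: sequence of booleans, True where a statistic exceeds its limit.
    fault_start: index of the first faulty sample (assumed to be known).
    """
    alarms = np.asarray(alarms, dtype=bool)
    if not 0 < fault_start < alarms.size:
        raise ValueError("fault_start must split the sequence into two nonempty parts")
    false_alarm_rate = float(alarms[:fault_start].mean())  # alarms under normal operation
    detection_rate = float(alarms[fault_start:].mean())    # alarms while the fault is active
    return {
        "false_alarm_rate": false_alarm_rate,
        "detection_rate": detection_rate,
        "missed_detection_rate": 1.0 - detection_rate,
    }
```

The same function can be applied separately to the $T^2$ alarms and the SPE alarms of each method so that the charts can be compared on equal terms.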

Conclusion
In this paper, a process industry integrated automation hardware-in-the-loop simulation system has been developed, using a real hardware controller and an industrial control network to build a distributed control system, with the distillation column system (PCS 7 Unit Template) of Siemens as one of the controlled objects for fault diagnosis research; this allows the required data to be obtained easily without risking production losses. The work moves the simulation of fault diagnosis from a purely simulated object to a hardware-in-the-loop simulation system and promotes the application of data-driven techniques in actual industrial systems. A new indiscernibility dynamic kernel principal component analysis (I-DKPCA) method was proposed; the new method not only considers the dynamic and nonlinear characteristics of industrial processes but also reduces the data dimension by means of the indiscernibility degree and the degree of cross, gets rid of irrelevant variables, simplifies the calculation, and improves the efficiency and accuracy of fault diagnosis. Through the application of DPCA, KPCA, and I-DKPCA to the distillation column, the results showed that the proposed I-DKPCA performed better than DPCA and KPCA for all faults, especially for small faults.

I-DKPCA Based Monitoring.
Figure 1 shows the flowchart of the calculations required for monitoring based on I-DKPCA. The first step is setting up the model by off-line training, using the train data $Z$ and the known fault data $Z^f$ from the server.
In (17), $k = 1, \ldots, p$, where $p$ is the number of principal components in KPCA. By (17), we obtain a score vector $t_{new} = [t_{new,1}, t_{new,2}, \ldots, t_{new,p}]^T$ for $x_{new}$. At present, a number of kernel functions are used in KPCA. In our study, only the Gaussian function is selected because it is widely used:
$$k(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{\sigma^2}\right).$$
The remaining on-line monitoring steps are as follows:
(2) Calculate $T^2_{new}$ and $\mathrm{SPE}_{new}$.
(3) If $T^2_{new} < T^2_\alpha$ and $\mathrm{SPE}_{new} < \mathrm{SPE}_\alpha$, the process is normal; otherwise the process is abnormal.
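The decision rule in the last step can be written as a short vectorized check in Python. The function name `flag_abnormal` is hypothetical, and the control limits are assumed to have been computed beforehand during off-line training.

```python
import numpy as np

def flag_abnormal(t2_new, spe_new, t2_limit, spe_limit):
    """Return True for each sample that is judged abnormal.

    A sample is normal only if both statistics are below their limits;
    otherwise it is flagged as abnormal.
    """
    t2_new = np.asarray(t2_new, dtype=float)
    spe_new = np.asarray(spe_new, dtype=float)
    normal = (t2_new < t2_limit) & (spe_new < spe_limit)
    return ~normal
```

For example, with limits 3 and 4, the samples with ($T^2$, SPE) equal to (1, 1), (5, 1), and (1, 9) are flagged as normal, abnormal, and abnormal, respectively.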

Table 1: Fourteen variables used for monitoring.