Nonlinear Fault Separation for Redundancy Process Variables Based on FNN in MKFDA Subspace

1 College of Automation, Chongqing University, Chongqing 400044, China
2 School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
3 School of Safety Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
4 College of Mechanical and Power Engineering, Chongqing University of Science and Technology, Chongqing 401331, China


Introduction
With the development of the modern process industry, multivariate monitoring data from sensors exhibit multicollinearity, nonlinear correlative coupling, time delay, and redundancy. As a result, the complexity of fault separation and diagnosis grows exponentially with the number of variables, which is known as the "curse of dimensionality" [1,2]. In addition, the right ratio (accuracy) of fault classification decreases when multivariate and redundant process variables are included. Therefore, much attention has been paid to two approaches: variable selection and dimension reduction [3,4].
Existing variable selection methods can be broadly classified into three categories: random search techniques, measure-based methods, and intelligent computation. In random search, each process variable is deleted from or added to the classification model one at a time, in turn, to search for the most suitable input set under a certain criterion. Forward selection, backward selection, and stepwise selection are simple and easily implemented examples [5].
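The forward selection idea mentioned above can be sketched as a greedy loop. This is a minimal illustration, not the procedure of [5]: the function name `forward_selection`, the stopping rule (stop when no candidate improves the criterion), and the toy criterion `toy_score` are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Greedy forward selection: repeatedly add the variable that most improves `score`."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current set.
        trial_scores = {v: score(selected + [v]) for v in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break  # no remaining variable improves the criterion
        best = trial_scores[winner]
        selected.append(winner)
        remaining.remove(winner)
    return selected


# Hypothetical criterion: each variable has a usefulness value,
# and every selected variable costs a fixed penalty of 0.1.
USEFULNESS = {"x1": 0.6, "x2": 0.3, "x3": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(USEFULNESS[v] for v in subset) - 0.1 * len(subset)


chosen = forward_selection(["x1", "x2", "x3"], toy_score)
print(chosen)
```

In practice the criterion would be a validation accuracy or an information measure; the greedy loop itself does not guard against multicollinearity, which is the weakness discussed next.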
However, Masion and Gunst [6] showed that these methods give misleading results when the variable set exhibits multicollinearity. Measure-based methods select variables by computing the relevance among all variables, as well as the relevance between variables and labels; the variables with the most similar characteristics are grouped together. Depending on the definition used, the K-L information measure, minimum description length, and mutual information are applied [7][8][9]. Intelligent computation addresses the nonlinear variable selection problem. Neural networks, for example, have been used for nonlinear models, although their selection criterion is uncertain [10].
Dimension reduction differs from variable selection in that it relies mainly on transformation of the original variable matrix and extraction of its information. It projects the original variables through a certain mapping into a new subspace and extracts information in a lower dimension, as in principal component analysis (PCA) [11] and partial least squares (PLS) [12]. Linearly related process variables are projected linearly along the directions of maximum variance of the covariance matrix, so that as much of the original information as possible is kept. For PCA, the contribution chart method calculates the contribution of each variable to a certain fault with the $T^2$ statistic and SPE [13,14]. These linear methods have been extended to nonlinear ones since the kernel method was presented [15][16][17][18][19][20], for example kernel principal component analysis (KPCA), kernel partial least squares (KPLS), and kernel Fisher discriminant analysis (KFDA). The kernel method converts a linear classification learning algorithm into a nonlinear one by mapping the original observations into a higher-dimensional space, so that a linear classifier in the new space is equivalent to a nonlinear classifier in the original space. However, the nonlinear information projected into the new feature space has a higher dimension, and the data matrix loses the physical meaning it had in the original sample space. If nonlinear faults that overlap in the original space are separated with a kernel method, the dimension of the classifier becomes huge, while the right ratio decreases because of redundant and multicollinear variables.
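To make the kernel idea concrete, the following sketch checks numerically that a kernel value computed in the original space equals an inner product in a higher-dimensional feature space. The homogeneous degree-2 polynomial kernel and its explicit feature map are chosen here only because the map can be written down by hand; they are an illustrative assumption, not the kernel used in this paper.

```python
import math
from typing import Sequence, Tuple


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_feature_map(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)

# Kernel evaluated directly in the 2-D original space.
implicit = poly2_kernel(x, y)

# Same quantity as an inner product in the 3-D feature space.
explicit = sum(a * b for a, b in zip(poly2_feature_map(x), poly2_feature_map(y)))

print(implicit, explicit)
```

The two printed values agree up to floating-point rounding, which is why a linear method written purely in terms of inner products can be run in the feature space without ever forming the mapped vectors.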
The objective of this paper is to extend dimension reduction with a measure-based approach from variable selection, called MKFDA-FNN, in order to address the above problems. Nonlinear process variables are projected into a higher-dimensional space with multikernel Fisher discriminant analysis (MKFDA). The discriminant vector and its corresponding feature vector with maximum separation are computed to cluster the original variables with the highest similarity. As the embedding dimension increases, false nearest neighbors (FNN) with high similarity can be removed in turn. Thus, nonlinear redundant and multicollinear process variables can be removed from the input set of the nonlinear classifier. Finally, an actual fault separation problem in the classical Tennessee Eastman (TE) chemical process is studied.

Problem Description
In the fault separation problem presented above, the task is to screen, as far as possible, the original process variables that are related to certain faults. The multivariate data matrix considered initially, with normal and fault information, is described in Figure 1, where the symbols denote the maximum delay orders of the process/control variables $x_1$, $x_2$, and $x_n$ (the last written $h$), the current sample time, and the sample length.

Multivariate Fault Separation Based on MKFDA-FNN
For the fault separation problem with nonlinear redundant process/control variables, the approach shown in Figure 2 is proposed. Correlated nonlinear variables are first projected into a higher-dimensional MKFDA subspace. Then, in order to find the useful variables, the importance of each input is measured in the subspace with a distance measure inspired by FNN. Accordingly, redundant variables are recognized, which makes overlapping faults easier to separate.

False Nearest Neighbors.
FNN is a feature selection method based on phase space reconstruction (PSR) in high-dimensional data space [21]. As the embedding dimension increases, the trajectory unfolds, and false nearest neighbors with high similarity can be removed in turn, which restores the trajectory of the chaotic system. The algorithm is as follows.
In the $d$-dimensional phase space, which includes the original variables and their time delays, each phase vector $X(i) = \{x(i), x(i+\tau), \ldots, x(i+(d-1)\tau)\}$ has a nearest neighbor $X^{NN}(i)$. Their 2-norm distance is

$$R_d(i) = \lVert X(i) - X^{NN}(i) \rVert_2 .$$

When the dimension is increased from $d$ to $d+1$, the distance between the two extended phase vectors changes to a new value, denoted by $R_{d+1}(i)$. If $R_{d+1}(i)$ is much larger than $R_d(i)$, the two vectors were neighbors only because two nonneighboring phase vectors were projected from the higher dimension to the lower one. Such neighbors are false nearest neighbors.
The same idea is applied to process variables. Let $\Psi(X)$ be the projection of the original samples and $\Psi(X_i)$ the projection obtained when the original variable $x_i$ is set to zero. The similarity between $\Psi(X)$ and $\Psi(X_i)$ is evaluated with a distance measure. If the distance is small, the two vectors are highly similar; that is, removing variable $x_i$ has little impact on the nonlinear pattern, and process variable $x_i$ has low interpreting ability. Otherwise, if the distance is much larger, $\Psi(X_i)$ differs strongly from $\Psi(X)$, process variable $x_i$ is important for interpreting the nonlinear pattern, and $\Psi(X_i)$ is a false nearest neighbor of $\Psi(X)$.
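The classical FNN test on a scalar time series can be sketched as follows. This is a minimal illustration under stated assumptions: the names `delay_vectors` and `false_neighbor_fraction`, the simple ratio test $R_{d+1}(i)/R_d(i) > $ threshold, and the threshold value 10 are choices made for the example, not values given in this paper.

```python
import math
from typing import List, Sequence


def delay_vectors(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Phase vectors X(i) = [x(i), x(i+tau), ..., x(i+(dim-1)*tau)]."""
    n = len(x) - (dim - 1) * tau
    return [[x[i + j * tau] for j in range(dim)] for i in range(n)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def false_neighbor_fraction(
    x: Sequence[float], dim: int, tau: int = 1, ratio_threshold: float = 10.0
) -> float:
    """Fraction of nearest neighbors in `dim` dimensions that separate in `dim + 1`."""
    low = delay_vectors(x, dim, tau)
    high = delay_vectors(x, dim + 1, tau)
    n = len(high)  # only indices that exist in both embeddings
    false_count = 0
    counted = 0
    for i in range(n):
        # Nearest neighbor of X(i) in the dim-dimensional space.
        j = min((k for k in range(n) if k != i),
                key=lambda k: euclidean(low[i], low[k]))
        r_d = euclidean(low[i], low[j])
        if r_d == 0.0:
            continue  # identical points: the ratio is undefined, skip them
        r_d1 = euclidean(high[i], high[j])
        counted += 1
        if r_d1 / r_d > ratio_threshold:
            false_count += 1  # the pair was close only because of the projection
    return false_count / counted if counted else 0.0


series = [0.0, 1.0, 0.001, -1.0, 0.5]
fraction = false_neighbor_fraction(series, dim=1, tau=1)
print(fraction)
```

In the toy series, the values 0.0 and 0.001 are neighbors in one dimension but are followed by 1.0 and -1.0, so they fly apart in two dimensions and are counted as false neighbors.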

Kernel Fisher Discriminant Analysis.
KFDA is well suited to nonlinear classification problems [22]. The nonlinear discriminant vector in the original space is obtained as the optimal linear discriminant vector in a high-dimensional feature space $F$ with conventional Fisher discriminant analysis (FDA). Since the dimension of $F$ is very high, it is hard to determine the nonlinear mapping function from the original space to the feature space directly. The reproducing-kernel method widely developed in machine learning (ML) achieves this goal: the nonlinear mapping is found indirectly through $k(x, y) = \Phi(x)^T \Phi(y)$ in Gram space [23], where $\Phi : R^d \to F$.
Conventional kernel functions can be selected for this purpose [6]. For the Gaussian radial basis function kernel, $\sigma$ is the width parameter.
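As an illustration of a kernel with a width parameter, the sketch below builds the Gram matrix of a Gaussian kernel. The scaling $\exp(-\lVert x-y\rVert^2 / (2\sigma^2))$ is one common convention and is an assumption here; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); the 2*sigma^2 scaling is assumed."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(samples: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """Matrix of all pairwise kernel values K[i][j] = k(x_i, x_j)."""
    return [[gaussian_kernel(a, b, sigma) for b in samples] for a in samples]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, sigma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

A larger `sigma` makes distant samples look more similar (entries closer to 1), while a smaller `sigma` makes the Gram matrix approach the identity.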
Assume that the original sample set is $X = \{x_1, x_2, \ldots, x_N\}$ with dimension $d$ and $N$ samples, where $X_i$ is the set of samples of the $i$th type, $N_i = |X_i|$, and $i = 1, 2$. There exists a nonlinear mapping function $\Phi : R^d \to F$ that transforms the nonlinear problem in the original sample space $R^d$ into a linear classification problem in the high-dimensional space $F$; that is, $\Phi(x) \in F$, $x \in R^d$. In space $F$, the intraclass scatter and the interclass scatter of the training data are $S_w$ and $S_b$ in (7) and (9), respectively:

$$S_w = \sum_{i=1,2} \sum_{x \in X_i} (\Phi(x) - m_i)(\Phi(x) - m_i)^T, \tag{7}$$

$$m_i = \frac{1}{N_i} \sum_{x \in X_i} \Phi(x), \tag{8}$$

$$S_b = (m_1 - m_2)(m_1 - m_2)^T, \tag{9}$$

where $m_i$ is the mean of the $i$th type in feature space. KFDA finds a projection direction $w$ that meets the following two properties: (1) data with similar characteristics should be gathered together as closely as possible; (2) data with different characteristics should be separated as far as possible. The key is therefore to search for the projection direction $w^*$ and its corresponding discriminant function $f(x) = (w^*)^T x - y_0$. As in linear FDA, the optimal projection direction $w^*$ is the vector $w$ that maximizes the Fisher criterion function (10):

$$J(w) = \frac{w^T S_b w}{w^T S_w w}. \tag{10}$$

Since the dimension of the feature space $F$ is usually high and $\Phi$ is an indirect mapping function, the discriminant vector is hard to compute directly. Thus, according to the kernel-based method, each solution $w$ is expressed as a linear combination of the samples in (11),

$$w = \sum_{i=1}^{N} \alpha_i \Phi(x_i), \tag{11}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$. Moreover, the mapped samples $\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_N)$ can be projected onto the direction $w$ in feature space $F$:

$$w^T \Phi(x_j) = \sum_{i=1}^{N} \alpha_i \Phi(x_i)^T \Phi(x_j) = \alpha^T (\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_N))^T \Phi(x_j) = \alpha^T (k(x_1, x_j), k(x_2, x_j), \ldots, k(x_N, x_j))^T. \tag{12}$$

From (11), for all $x \in R^d$, let $k_x = (k(x_1, x), k(x_2, x), \ldots, k(x_N, x))^T$. The projection of the mean vector $m_i$ onto the direction $w$ in feature space $F$ is

$$w^T m_i = \frac{1}{N_i} \sum_{x \in X_i} \alpha^T k_x = \alpha^T u_i, \tag{13}$$

where

$$u_i = \frac{1}{N_i} \sum_{x \in X_i} k_x. \tag{14}$$

From (12) and (13), we have

$$w^T S_b w = \alpha^T (u_1 - u_2)(u_1 - u_2)^T \alpha, \qquad w^T S_w w = \alpha^T K_w \alpha, \tag{15}$$

where

$$K_w = \sum_{i=1,2} \sum_{x \in X_i} (k_x - u_i)(k_x - u_i)^T.$$

Since the Fisher criterion function is optimized through the quadratic forms in (15), the vector $w$ can be resolved through $\alpha$ in the following Fisher criterion (16) [24]:

$$J(\alpha) = \frac{\alpha^T (u_1 - u_2)(u_1 - u_2)^T \alpha}{\alpha^T K_w \alpha}. \tag{16}$$

Furthermore, the optimal vector $\alpha^*$ and $y_0$ can be solved [25] with

$$\alpha^* = K_w^{-1}(u_1 - u_2), \qquad y_0 = \frac{(\alpha^*)^T (u_1 + u_2)}{2}. \tag{17}$$

Thus, the kernel Fisher discriminant function is obtained as

$$f(x) = (\alpha^*)^T k_x - y_0.$$

Multikernel Fisher Discriminant Analysis.
From Section 3.2, maximizing the Fisher criterion through (15) is equivalent to maximizing (16). Assume that $\alpha^*$ is the optimal solution for classification; $\alpha^*$ is determined both by the kernel scatter matrix $K_w$ and by the difference of the kernel mean vectors $(u_1 - u_2)$. Under the independent and identically distributed condition, the kernel mean of the samples is independent of the number of samples, so the difference of the kernel mean vectors $(u_1 - u_2)$ is unaffected by sample imbalance. Hence $\alpha^*$ is determined only by the intraclass kernel scatter matrix $K_w$. If the distributions of different variables differ, the contributions $K_{w1}, K_{w2}, \ldots, K_{wn}$ do not lie in a similar interval, and the solution $\alpha^*$ is not the optimal one. Hence, in order to avoid the influence of different sample distributions, we present the multikernel Fisher discriminant analysis method. It advances the kernel criterion function

$$J(\alpha) = \frac{\alpha^T (u_1 - u_2)(u_1 - u_2)^T \alpha}{\alpha^T \left(\lambda K_{w1} + (1 - \lambda) K_{w2}\right) \alpha},$$

where $\lambda$ ($\lambda \in [0, 1]$) is the adjustable MATLAB parameter and $K_{w1}$ and $K_{w2}$ are the kernel matrices computed with suitable kernel functions from Section 3.2 (i)/(ii)/(iii). In this way, the influence of different sample distributions is taken into account through the suitable kernel functions.
The above algorithm can be summarized as in Table 1. In this way, the contribution of each original process variable $x_i$ to a certain fault is measured.

Table 1 :
Step 2: Select a suitable multikernel function.
Step 3: Compute the kernel mean vectors of the two classes.
Step 4: Compute the intraclass kernel scatter matrix.
Step 5: Compute ...
Step 6: Get the optimal solution of (16).
Step 7: Set the inspected process variable to zero in the original samples.
Step 8: Project the new samples into the feature space.
Step 9: Compute the contribution of one variable at a time with FNN in the MKFDA subspace.
Step 10: Repeat the above procedure for the remaining variables.
Outputs: The distance measure of each original variable is obtained.
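The procedure summarized in Table 1 can be sketched end to end on toy data. This is an illustration under several assumptions, not the authors' implementation: a single Gaussian kernel stands in for the multikernel combination, a small ridge term `reg` is added so that the kernel scatter matrix can be inverted, the contribution of a variable is taken as the Euclidean distance between the projections before and after zeroing it, and all function names and data values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rbf(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian kernel; the 2*sigma^2 scaling is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_kfda(samples: Matrix, labels: List[int], sigma: float = 1.0,
             reg: float = 1e-3) -> Tuple[Vector, float]:
    """Return (alpha, y0) for two classes labelled 1 and 2."""
    n = len(samples)
    gram = [[rbf(a, b, sigma) for b in samples] for a in samples]  # row i is k_{x_i}
    k_w = [[reg if r == s else 0.0 for s in range(n)] for r in range(n)]  # ridge (assumed)
    u = {}
    for c in (1, 2):
        rows = [gram[i] for i in range(n) if labels[i] == c]
        u[c] = [sum(col) / len(rows) for col in zip(*rows)]  # kernel mean vector u_c
        for row in rows:
            d = [v - mean for v, mean in zip(row, u[c])]
            for r in range(n):
                for s in range(n):
                    k_w[r][s] += d[r] * d[s]  # intraclass kernel scatter
    alpha = solve(k_w, [p - q for p, q in zip(u[1], u[2])])
    y0 = sum(a * (p + q) for a, p, q in zip(alpha, u[1], u[2])) / 2.0
    return alpha, y0


def project(x: Sequence[float], samples: Matrix, alpha: Vector, sigma: float = 1.0) -> float:
    """Projection alpha^T k_x of a sample onto the discriminant direction."""
    return sum(a * rbf(s, x, sigma) for a, s in zip(alpha, samples))


def variable_contributions(samples: Matrix, alpha: Vector, sigma: float = 1.0) -> Vector:
    """Distance between projections before and after zeroing each variable in turn."""
    base = [project(x, samples, alpha, sigma) for x in samples]
    scores = []
    for v in range(len(samples[0])):
        zeroed = [[0.0 if j == v else val for j, val in enumerate(x)] for x in samples]
        moved = [project(x, samples, alpha, sigma) for x in zeroed]
        scores.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(base, moved))))
    return scores


# Toy data: variable 0 separates the classes, variable 1 is near-constant noise.
samples = [[1.0, 0.01], [1.2, -0.01], [0.9, 0.0],
           [-1.0, 0.01], [-1.2, -0.01], [-0.9, 0.0]]
labels = [1, 1, 1, 2, 2, 2]
alpha, y0 = fit_kfda(samples, labels)
scores = variable_contributions(samples, alpha)
print(scores)
```

On this toy set, zeroing variable 0 collapses both classes onto nearly the same point, so its score is far larger than that of variable 1, which is the behavior the distance measure is meant to expose. Replacing `rbf` with a weighted mix of two kernels would be one way to move toward the multikernel variant.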

Fault Separation of Tennessee Eastman with Redundancy Variables
Tennessee Eastman Chemical Process.
Tennessee Eastman (TE) is a classical chemical process created by Eastman Chemical Company in 1993 [26]. Its technological process is shown in Figure 4. There are four reactants (A, C, D, and E) and two products (G and H), as well as one inert component B and one byproduct F. The dynamic TE model is composed of five major units: a reactor, a separator, a stripper, a condenser, and a compressor. Each unit is described by a set of equations, with 148 algebraic equations and 30 differential equations in total. It is therefore one of the most complex benchmark models and is widely used to test algorithms for control, system monitoring, fault diagnosis, and so forth. Here, we take the TE process as the study object to evaluate the fault separation ability of our method.

Nonlinear Fault Separation of Redundancy Variables.
In the TE process, there are 41 observed variables and 12 manipulated variables from the controller, some of which are nonlinear redundant variables. Moreover, there are 20 types of classical faults in the TE process, shown in Table 2. Since Fault9 and Fault11 overlap nonlinearly, as shown in Figure 5, we take their separation as the study goal; meanwhile, the 53 process variables must be screened for multicollinearity and nonlinear redundancy. Process data of TE are simulated at a one-minute sampling time in MATLAB with the model from Downs [27]. All the measurements contain Gaussian noise. A total of 1000 samples are collected for training, of which 800 are for Fault9 and 200 for Fault11. In addition, 835 samples are used to test the separation validity, with 644 for Fault9 and 171 for Fault11.
In the following, the curves of the two most important variables, Vab.21 and Vab.13, in the TE process are given in Figures 7 and 8, which show the strong variation of process variables Vab.21 and Vab.13. According to the ranking of the process variables, different feature sets are constructed as {Vab.21}, {Vab.21, Vab.13}, {Vab.21, Vab.13, Vab.9}, and so on. The nonlinear pattern classification of Fault9 and Fault11 is tested with a support vector machine (SVM), which is widely used in pattern recognition. The two SVM parameters are optimized by cross-validation to 2035 and 1024, respectively. With the above variable sets, the accuracy of fault separation between Fault9 and Fault11 is tested successively. The results are shown in Figure 9 and Table 4. They reveal that the separation accuracy becomes lower as the number of considered variables increases.
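The nested feature sets described above can be generated mechanically from a ranking of the variables. In this sketch the numeric contribution scores are hypothetical placeholders; only the ordering Vab.21, Vab.13, Vab.9, Vab.16, Vab.7 follows the text, and the function name is an assumption.

```python
from typing import Dict, List


def nested_feature_sets(contributions: Dict[str, float]) -> List[List[str]]:
    """Rank variables by contribution (largest first) and return growing prefixes."""
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return [ranked[: k + 1] for k in range(len(ranked))]


# Hypothetical scores: only their order mirrors the ranking reported in the text.
contribution_scores = {
    "Vab.9": 0.5,
    "Vab.21": 0.9,
    "Vab.7": 0.3,
    "Vab.13": 0.7,
    "Vab.16": 0.4,
}

feature_sets = nested_feature_sets(contribution_scores)
for subset in feature_sets:
    print(subset)
```

Each printed subset would then be used as the input set of the classifier (an SVM in this paper), and the test accuracy recorded per subset to find the point at which adding variables stops helping.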
From the above results, we conclude that (1) if all 53 process variables are used to separate Fault9 and Fault11, the right ratio is merely 72.12%. This indicates that not all of the variables are directly related to a certain fault, and that some redundant or irrelevant variables may decrease the classification accuracy.

Conclusions
Nonlinear redundant and multicollinear variables decrease the accuracy of a classifier and must be eliminated.
To address this problem, FNN in the MKFDA subspace is studied in this paper. Nonlinear variables are projected into a new higher-dimensional linear subspace with single-kernel Fisher discriminant analysis to obtain the optimal classification, with intraclass samples as close together and interclass samples as far apart as possible. Furthermore, the conventional single-kernel KFDA is extended to a multikernel method to handle process variables with different distribution functions. To reduce the high dimension arising in the multikernel KFDA subspace, FNN is incorporated to recognize the importance of each process variable to the faults. According to the simulation results on the TE process, the original variables are reduced to 5, and the tested right ratio for separating Fault9 and Fault11 reaches 94.55%, compared with 72.12% when all variables are used.

Figure 1: Fault diagnosis with multivariate data.

Figure 2: Nonlinear fault diagnosis with redundant process variables based on FNN in MKFDA subspace.

Figure 4: The technological process of Tennessee Eastman.

Figure 6: Contribution of all 53 process variables to distinguishing Fault9 and Fault11.

Figure 7: Variation of process variable Vab.21 in the actual TE process.

Figure 8: Variation of process variable Vab.13 in the actual TE process.

Figure 9: Accuracy with different feature sets in identifying Fault9 and Fault11 with testing data.

Table 2: State distribution in TE process.

Table 3: The contributions of 53 process variables to fault separation.

Table 4: The accuracy with different feature sets with testing data.

The selected variables {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are the reactor coolant temperature, product separation pressure, reactor temperature, stripper pressure, and reactor pressure, respectively. It is easy to see that the five selected variables are closely related to Fault9 and Fault11, so the simulation results agree with reality.