Fault Detection and Diagnosis in Process Data Using Support Vector Machines



Introduction
In complex industrial production systems, fault diagnosis and prediction play an extremely important role. Fault diagnosis identifies the abnormal circumstances of a system [1]. Finding the fault type and determining its cause as soon as possible is of vital significance [2]: it not only reduces financial loss and avoids wasted resources but also ensures the safety of production. In recent years, numerous multivariate statistical process control (MSPC) methods have been developed and have attracted the attention of industry [3,4]. MSPC tools such as principal components analysis (PCA) [5], dynamic principal components analysis (DPCA), correspondence analysis (CA) [6], canonical variate analysis (CVA) [7], kernel independent component analysis (KICA) [8], and modified independent component analysis (MICA) [9] have been used in actual industrial processes [10,11]. Support vector machine (SVM) is a machine learning technique with a robust mathematical foundation and many advantages in solving classification and regression problems. Its generalization ability is also strong, so SVM has attracted growing research interest. The recursive feature elimination (RFE) algorithm calculates a sorting coefficient from the weight vector $w$ obtained during SVM model training and then ranks the feature variables in descending order of that coefficient. The study in this paper is based on the Tennessee Eastman (TE) process, which has been extensively applied in process control and statistical process monitoring [12,13]. Datasets of this process have been used in a wide range of machine learning research.
In this paper, the PCA and RFE algorithms are first introduced. The fault detection methods using the PCA $T^2$ and SPE statistics and the original SVM are then outlined, followed by the SVM-RFE method, which finds the variables most relevant to these common faults. The pertinent variables of these common faults are then presented. Finally, the PCA-SVM and SVM-RFE methods are used to classify several kinds of faults.

Related Works
2.1. Support Vector Machine. SVM was first proposed by Cortes and Vapnik in 1995 [14] and developed from statistical learning theory. It is closely associated with two theories. The first is VC dimension theory, in which the VC dimension measures the complexity of a problem: the higher the VC dimension, the weaker the generalization ability. The second is structural risk minimization, which SVM pursues, meaning that the empirical risk and the confidence interval are minimized at the same time. SVM has many advantages when the dataset is relatively small, the samples are nonlinear, or the data are high dimensional. The SVM algorithm first maps the low-dimensional samples into a higher-dimensional feature space, so that samples which cannot be separated in the low-dimensional space become linearly separable. A maximum-margin separating hyperplane is then constructed in the higher-dimensional space. SVM handles the linearly nonseparable case with slack variables and an error penalty. Since the separating plane is determined only by the support vectors, SVM copes well with high-dimensional problems. A kernel function is introduced to replace the explicit nonlinear mapping, so the mapping itself never has to be computed.
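Assume that the input vectors belong to two categories. The labels of the first category are $y_i = +1$, and the labels of the second category are $y_i = -1$; $x_i$ represents an input vector. The input samples can be written as follows:

$$(x_i, y_i), \quad x_i \in \mathbb{R}^d, \quad y_i \in \{+1, -1\}, \quad i = 1, \ldots, n.$$

The classification hyperplane that separates the given data is as follows:

$$w \cdot x + b = 0,$$

where $w$ represents the weight vector and $b$ represents the constant. Taking into account the samples that cannot be classified, the slack variable is $\xi$ and the penalty variable is $C$. The function to get the optimal hyperplane can be written as follows:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$

where $\xi_i$ reflects the distance between $x_i$ and the margin, $\xi_i \ge 0$, and the sum $\sum_i \xi_i$ accounts for the wrongly classified samples. The calculation can be simplified into another problem by introducing the Lagrange multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$:

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i. \tag{5}$$

In order to get the values of $w$ and $b$ that minimize (5), the equations at the extreme point can be written as

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i, \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad C - \alpha_i - \mu_i = 0. \tag{7}$$

According to (5) and (7), the problem is developed into a dual quadratic optimization problem:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C.$$

The kernel function plays an important role in solving nonlinear problems. The low-dimensional vectors are projected onto a high-dimensional space by the nonlinear function $\Phi(x) = (\phi_1(x), \ldots, \phi_l(x))$. The linear decision function is replaced by

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i \, \Phi(x_i) \cdot \Phi(x) + b \right).$$

The kernel function $K(x_i, x) = \Phi(x_i) \cdot \Phi(x)$ can be used to solve the computational problem: the form of $\Phi$ does not need to be known when a kernel function is applied. The decision function can be written as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right).$$

There are three ways to use the SVM method for multiclass classification: 1-v-r SVMs, 1-v-1 SVMs, and H-SVMs.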
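The kernel form of the decision function can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel with a hypothetical width `gamma`, and the support vectors, multipliers, and bias are hand-picked for illustration rather than obtained from training; the names `rbf_kernel` and `svm_decision` are likewise illustrative.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Return sgn(sum_i alpha_i * y_i * K(x_i, x) + b) as +1 or -1."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical, hand-picked values (not a trained model).
# They satisfy the dual constraint sum_i alpha_i * y_i = 0.
support_vectors = [(0.0, 0.0), (2.0, 0.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]
b = 0.0

pred_near_positive = svm_decision((1.8, 0.1), support_vectors, labels, alphas, b)
pred_near_negative = svm_decision((0.2, -0.1), support_vectors, labels, alphas, b)
```

Only kernel evaluations between the test point and the support vectors are needed; the mapping $\Phi$ never appears in the code.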
In the 1-v-r SVMs classification method, each class in turn is treated as one class and all the remaining samples as the other class. Trained in this way, $k$ types of samples yield $k$ classification models. A testing sample is assigned to the class whose model gives the maximal classification function value. In the 1-v-1 SVMs classification method, an SVM classification model is trained for every pair of classes, so $k$ categories of training samples require $k(k-1)/2$ SVM classification models. Every classification model is then applied to a testing sample, and the category of the sample is decided by the number of votes cast by the models. In H-SVMs, all categories are first divided into two subsets. After testing by an SVM classifier, the subset to which the sample belongs is divided into another two subsets, and this continues until the right category is finally determined [15,16].
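The 1-v-1 voting rule and the 1-v-r maximum-score rule can be sketched as follows. The binary classifiers here are toy nearest-centre rules on one-dimensional data that stand in for trained SVMs; the centres and helper names are assumptions made for illustration, and only the class labels (the fault numbers studied later) come from the paper.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, pairwise):
    """Majority vote over all pairwise classifiers.

    `pairwise` maps a class pair (a, b) to a function returning a or b.
    """
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]

def one_vs_rest_predict(x, scorers):
    """Pick the class whose one-vs-rest decision value is largest."""
    return max(scorers, key=lambda c: scorers[c](x))

# Toy 1-D stand-ins for trained SVMs: class c is centred at centres[c].
centres = {1: 0.0, 4: 10.0, 5: 20.0, 7: 30.0, 11: 40.0}

def make_pair_clf(a, b):
    # The nearest-centre rule plays the role of a trained binary SVM.
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b

pairwise = {(a, b): make_pair_clf(a, b) for a, b in combinations(centres, 2)}
scorers = {c: (lambda x, c=c: -abs(x - centres[c])) for c in centres}
```

With $k = 5$ classes, `pairwise` holds $5 \cdot 4 / 2 = 10$ classifiers, whereas `scorers` holds only 5; this is the usual trade-off between the two schemes.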
Because of its rigorous mathematical foundation, the support vector machine offers many unique advantages and has been widely used in research. In the age of big data, machine learning methods and support vector machine technology are expected to develop further.

2.2. Principal Components Analysis. PCA was proposed by Pearson in 1901. PCA aims at decorrelating industrial process data, which are highly correlated and multivariate, and at reducing the dimension of the data. To reduce the dimension while keeping as much information of the original data as possible, a low-dimensional data model is established such that the variance of the projected data reaches its maximum. After dimension reduction, the noise and redundant information in the original higher-dimensional space can be eliminated, which reduces the error and improves the recognition accuracy. Dimension reduction can also reveal the internal structure of the data and improves the calculation speed. Finally, PCA statistics are used as metrics to determine whether the samples are faulty: if the metric of a testing sample exceeds the control limit, the sample is treated as faulty [17,18].

2.3. Recursive Feature Elimination Algorithm. The RFE algorithm first calculates a sorting coefficient from the weight vector $w$ obtained during SVM model training, then removes the feature variable with the minimal sorting coefficient in each iteration, and finally obtains the feature variables in descending order. The classical SVM-RFE is based on the linear kernel function, while in the nonlinear case the RBF is used as the kernel [19].
In each iteration, the feature variable with the minimal sorting coefficient is removed, and a new sorting coefficient is obtained by training the SVM on the remaining feature variables. Executing this process iteratively yields a feature ranking list, which shows how relevant each feature is to the category. The ranking list defines a series of nested feature subsets for the SVM model. The quality of these subsets can be assessed by the classification accuracy, so the optimal feature subset can be obtained. For the TE process, such a feature ranking list is obtained for each type of fault, and the first feature in the list is taken as the most relevant one to be analysed [20].
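The elimination loop can be sketched as below for the linear-kernel case. Two assumptions are made that the text does not state: the sorting coefficient is taken as $c_i = w_i^2$ (the criterion commonly used with linear SVM-RFE), and the linear SVM is fitted by a simple full-batch subgradient descent that stands in for a proper SVM solver. The data are synthetic, and all function names are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.05, epochs=300):
    """Fit w, b by subgradient descent on 0.5*||w||^2 + C*mean(hinge loss).

    A simple stand-in for a real SVM solver (assumption, not the paper's code).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # samples violating the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -C * y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y):
    """Return feature indices ordered from most to least relevant."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > 1:
        w, _ = train_linear_svm(X[:, remaining], y)
        scores = w ** 2                           # assumed coefficient c_i = w_i^2
        worst = int(np.argmin(scores))
        eliminated.append(remaining.pop(worst))   # drop the weakest feature
    return remaining + eliminated[::-1]

# Synthetic data: only feature 2 carries the class label.
rng = np.random.default_rng(0)
y = np.where(rng.random(200) < 0.5, -1.0, 1.0)
X = rng.normal(size=(200, 4))
X[:, 2] = 2.0 * y + 0.3 * rng.normal(size=200)
ranking = svm_rfe(X, y)
```

The model is retrained after every removal, which is what distinguishes RFE from ranking the features once by a single weight vector.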

2.4. PCA-SVM Algorithm. PCA aims at decorrelating the industrial process data, which are highly correlated and multivariate, and at reducing the dimension of the data [21].
First, a data matrix $M \in \mathbb{R}^{n \times m}$ is collected under normal conditions; the rows of the matrix represent $n$ specimens and the columns represent $m$ variables. This matrix is autoscaled first. The singular value decomposition of this matrix is as follows:

$$M = U \Sigma V^T.$$

The singular value decomposition of this matrix can be replaced by the eigenvalue decomposition of the sample covariance matrix $R$. Because the sample matrix has been autoscaled, the eigenvalue decomposition is as follows:

$$R = \frac{1}{n-1} M^T M = N \Lambda N^T, \quad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_m).$$

The eigenvalues are ranked in descending order, $\lambda_1 \ge \cdots \ge \lambda_m$, and the eigenvectors in $N$ are ordered according to their eigenvalues. The first $k$ linearly independent eigenvectors $\hat{N} = [p_1, \ldots, p_k]$ constitute the principal component space $\hat{S}$, and the remaining vectors $\tilde{N} = [p_{k+1}, \ldots, p_m]$ form the residual space $\tilde{S}$.

$M$ is decomposed as follows:

$$M = T \hat{N}^T + E = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E,$$

where $T = M \hat{N}$ represents the score matrix and $E$ is an error matrix, which is caused by noise and contains only a little useful information. It can be removed with little loss, so $M$ can be written as follows:

$$M \approx T \hat{N}^T = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T.$$

After the principal component model is established, a new sample $x$ to be tested can be written as follows:

$$x = \hat{x} + \tilde{x}, \quad \hat{x} = \hat{N} \hat{N}^T x, \quad \tilde{x} = (I - \hat{N} \hat{N}^T) x.$$

The vectors $\hat{x}$ and $\tilde{x}$ represent the parts of the data that lie in the principal component space $\hat{S}$ and the residual space $\tilde{S}$, respectively.
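The construction of the principal component model and the split of a sample into its principal and residual parts can be sketched with NumPy. The data are synthetic, the number of retained components is fixed by hand, and the function names `fit_pca` and `split_sample` are illustrative rather than taken from the paper.

```python
import numpy as np

def fit_pca(M, k):
    """Autoscale M (rows = samples) and return mean, std, eigenvalues, loadings.

    Eigenpairs of the sample covariance are sorted by decreasing eigenvalue;
    the first k eigenvectors span the principal component space.
    """
    mean, std = M.mean(axis=0), M.std(axis=0, ddof=1)
    Z = (M - mean) / std
    cov = Z.T @ Z / (Z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, std, eigvals[order], eigvecs[:, order][:, :k]

def split_sample(x, mean, std, N_hat):
    """Decompose an autoscaled sample into principal and residual parts."""
    z = (x - mean) / std
    z_hat = N_hat @ (N_hat.T @ z)                 # projection onto the PC space
    return z_hat, z - z_hat                       # remainder lies in the residual space

# Synthetic, correlated data: 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
M = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

mean, std, eigvals, N_hat = fit_pca(M, k=2)
z_hat, z_tilde = split_sample(M[0], mean, std, N_hat)
```

Because the data are autoscaled, the covariance matrix is a correlation matrix and its eigenvalues sum to the number of variables, which gives a quick sanity check on the decomposition.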
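When the PCA-SVM method is used to classify the types of fault, the dataset should first be autoscaled. $\bar{X}$ is the dataset that has been autoscaled:

$$\bar{X} = \frac{X - \operatorname{mean}(X)}{\operatorname{std}(X)},$$

where the functions std(⋅) and mean(⋅) calculate the standard deviation and mean value of each variable, respectively. Both the training data and the testing data should be autoscaled. $X_{\mathrm{tr}}$ is the autoscaled training data and $X_{\mathrm{te}}$ is the autoscaled testing data.

The PCA decomposition is applied to the normal sample data $X_0$. The number of principal components is set as $k$. The control limit of the PCA $T^2$ statistic is determined by the $F$ distribution as follows:

$$T^2_{\alpha} = \frac{k (n^2 - 1)}{n (n - k)} F_{\alpha}(k, n - k).$$

The control limit of the PCA SPE statistic is calculated as follows:

$$\mathrm{SPE}_{\alpha} = \theta_1 \left[ \frac{c_{\alpha} \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0},$$

where $\theta_i = \sum_{j=k+1}^{m} \lambda_j^i$, $i = 1, 2, 3$, and $h_0 = 1 - 2 \theta_1 \theta_3 / (3 \theta_2^2)$. Here $n$ is the number of normal samples, $m$ is the number of variables, $\lambda_j$ are the eigenvalues of the covariance matrix of $X_0$ in descending order, and $c_{\alpha}$ is the standard normal quantile at confidence level $\alpha$.

The autoscaled fault samples $X_{\mathrm{tr}}$ and $X_{\mathrm{te}}$ are mapped onto the space constructed by the normal samples. Dimension reduction is applied to these fault samples according to (15). The PCA $T^2$ statistics for the training and testing samples are calculated as follows:

$$T^2 = x^T \hat{N} \hat{\Lambda}^{-1} \hat{N}^T x.$$

The PCA SPE statistics for the training and testing samples are calculated as follows:

$$\mathrm{SPE} = \left\| (I - \hat{N} \hat{N}^T) x \right\|^2,$$

where $x$ is one sample (row) of $X_{\mathrm{tr}}$ or $X_{\mathrm{te}}$, $\hat{N}$ contains the first $k$ eigenvectors, and $\hat{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_k)$. If the calculated $T^2$ or SPE value exceeds its threshold, the sample is categorized as faulty.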
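The two monitoring statistics and the SPE control limit can be sketched as follows, assuming synthetic Gaussian data and a confidence level of 0.99 (the text does not state the level used). The SPE limit uses the $\theta_i$ defined above together with the standard normal quantile from Python's `statistics.NormalDist`. The $T^2$ limit is left out because it needs an $F$-distribution quantile, which the standard library does not provide (SciPy's `scipy.stats.f.ppf` is one option). Function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def spe_limit(residual_eigvals, alpha=0.99):
    """SPE control limit computed from the discarded eigenvalues."""
    theta = [np.sum(residual_eigvals ** i) for i in (1, 2, 3)]
    h0 = 1.0 - 2.0 * theta[0] * theta[2] / (3.0 * theta[1] ** 2)
    c = NormalDist().inv_cdf(alpha)               # standard normal quantile
    term = (c * np.sqrt(2.0 * theta[1] * h0 ** 2) / theta[0]
            + 1.0 + theta[1] * h0 * (h0 - 1.0) / theta[0] ** 2)
    return theta[0] * term ** (1.0 / h0)

def t2_and_spe(Z, N_hat, lam_hat):
    """Row-wise T^2 and SPE for autoscaled samples Z."""
    scores = Z @ N_hat
    t2 = np.sum(scores ** 2 / lam_hat, axis=1)
    residual = Z - scores @ N_hat.T
    return t2, np.sum(residual ** 2, axis=1)

# Synthetic "normal" data: 6 correlated variables, 2 dominant directions.
rng = np.random.default_rng(2)
X0 = (rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6))
      + 0.05 * rng.normal(size=(500, 6)))
Z0 = (X0 - X0.mean(axis=0)) / X0.std(axis=0, ddof=1)

lam, vec = np.linalg.eigh(Z0.T @ Z0 / (len(Z0) - 1))
lam, vec = lam[::-1], vec[:, ::-1]                # descending eigenvalues
k = 2
t2, spe = t2_and_spe(Z0, vec[:, :k], lam[:k])
q_lim = spe_limit(lam[k:])
```

On the normal data themselves, the mean of $T^2$ equals $k(n-1)/n$ and the mean of SPE equals $(n-1)/n$ times the sum of the discarded eigenvalues, which is a convenient check that the projection is implemented correctly.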

Tennessee Eastman Process
The TE process was created by the Eastman Chemical Company to provide a realistic industrial process for evaluating process control and monitoring methods. It was first presented by Downs and Vogel at the AIChE (American Institute of Chemical Engineers) [22]. It has been extensively used in control and optimization, process control, fault diagnosis, statistical process monitoring, data-driven methods, and so on. Its dataset has become a common data source for these research directions [23].
As shown in Figure 1, the TE plant mainly consists of five units. There are eight components: A, C, D, and E are gaseous materials, B is an inert substance, F is a liquid reaction byproduct, and G and H are liquid reaction products. The reaction producing H has a lower activation energy than that producing G, so the production of G is more sensitive to temperature.
In the presence of a catalyst, the gaseous reactants form liquid products after they enter the reactor. The catalyst is permanent and soluble in the liquid. The reactor has an internal condenser, which removes the heat generated by the reaction. Together with the components that have not completely reacted, the products leave the reactor as vapor and pass through a condenser to the gas-liquid separator. Part of the vapor is condensed and transported to the stripper. The stream that mainly contains A and C is used as the stripping stream. The remaining unreacted components are separated at the bottom of the stripper. The inert matter and the byproduct leave mainly as gas from the gas-liquid separator [24].
The whole TE process simulation has 21 preprogrammed faults and consists of 22 continuous measured variables, 19 composition measurements, and 12 manipulated variables. Both the continuous measured variables and the composition measurements are measured variables.

Fault Diagnosis
The TE process dataset is available on the internet. All the variables are taken into account in the analysis [24]. Faults 1, 4, 5, 7, and 11 are selected for study. The training dataset for PCA contains 500 samples obtained under normal conditions, from which the thresholds of the PCA $T^2$ and SPE statistics are calculated. The testing dataset for fault detection by PCA contains 960 samples of each fault, and the testing dataset for fault detection by SVM contains the last 800 samples of each fault. For the PCA-SVM and SVM-RFE algorithms, 480 samples of each fault are selected as the training dataset. Each fault run in the dataset contains 960 samples, of which the first 160, obtained during the first 8 hours, are normal. The last 800 samples of each fault are therefore chosen from the original dataset as the testing dataset of the PCA-SVM and SVM-RFE algorithms.
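The split of one 960-sample fault run into its normal and faulty parts can be sketched as follows. The list of integers is a placeholder for real sample rows, and the helper name `split_fault_run` is hypothetical. The sampling interval is derived from the figures in the text: 160 samples in 8 hours gives one sample every 3 minutes.

```python
def split_fault_run(run, n_normal=160):
    """Split one fault run into the leading normal part and the faulty part."""
    return run[:n_normal], run[n_normal:]

# Placeholder run: sample indices stand in for the real 52-variable rows.
run = list(range(960))
normal_part, faulty_part = split_fault_run(run)

# 160 samples are recorded during the first 8 hours.
sampling_minutes = 8 * 60 / 160

# Labels following the convention used in this section: 0 means normal.
fault_id = 4
labels = [0] * len(normal_part) + [fault_id] * len(faulty_part)
```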
With the SVM-RFE algorithm, a feature ranking list is obtained for each fault, from which the most relevant variables are identified, as shown in Table 1. The causes of each fault can be analysed from these relevant variables.
The SVM method is also used to determine whether a testing sample is faulty. The model is trained on the training dataset and directly predicts the category of each testing sample. A sample whose predicted label is 0 is normal; otherwise, it is faulty.
The PCA $T^2$ and SPE statistics are used as the metrics for fault detection. The thresholds are calculated from the normal samples. If the value of a testing sample exceeds either threshold, the sample is treated as faulty.
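The either-threshold rule and the resulting detection rate can be written as a short sketch. The two limits are the values this paper reports for the TE normal data; the four sample statistics are invented purely for illustration, and the function names are hypothetical.

```python
T2_LIMIT = 21.666        # T^2 threshold reported in this paper
SPE_LIMIT = 2.5437e-6    # SPE threshold reported in this paper

def is_faulty(t2, spe, t2_limit=T2_LIMIT, spe_limit=SPE_LIMIT):
    """A sample is flagged when either statistic exceeds its threshold."""
    return t2 > t2_limit or spe > spe_limit

def detection_rate(flags):
    """Fraction of samples flagged as faulty."""
    return sum(flags) / len(flags)

# Invented (T^2, SPE) pairs for four hypothetical faulty samples.
samples = [(5.0, 1.0e-6), (30.0, 1.0e-6), (5.0, 5.0e-6), (40.0, 9.0e-6)]
flags = [is_faulty(t2, spe) for t2, spe in samples]
rate = detection_rate(flags)
```

When every sample in the list is known to be faulty, as for the last 800 samples of a fault run, this fraction is the detection rate quoted in the results.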
From the normal dataset of the TE process, the thresholds of the PCA $T^2$ and SPE statistics are obtained. As shown in Figure 2, data exceeding the red dotted line are considered faulty. The normal dataset lies under the thresholds as a whole, except for some abnormal individual points. The threshold of the $T^2$ statistic is 21.666, and the threshold of the SPE statistic is $2.5437 \times 10^{-6}$.
The changes in the flow rates and compositions of stream 6 can affect the variables in the reactor. The flow rate in stream 4 will arrive at a steady-state value that is lower than under normal operating conditions. Since many variables change noticeably, such faults are easy to detect.

Fault 1.
As shown in Figure 4, the SVM method is used to detect fault 1, with an accuracy rate of 92.6%. Some points fail to be recognised. The main reason is that the feature extraction is not efficient and reasonable, so the characteristics of the fault are not identified well by SVM classification. Compared with the PCA method, the detection rate of SVM is lower.
Figure 5 shows the PCA-based statistics for fault detection. The lines in the figure are the $T^2$ and SPE thresholds. The first eight hours, before the fault is introduced, are under normal operating conditions, so all the data in that period are below the predetermined thresholds. The accuracy rate of fault detection with this method is 99.5%. The PCA statistics are more sensitive than the SVM in the detection of fault 1. For a fault that causes obvious changes in the variables, many methods perform well in fault detection.
Fault 4. When fault 4 happens, the cooling water temperature increases quickly, but the standard deviation of the other variables is similar to that under normal conditions. This increases the difficulty of fault detection.
As shown in Figure 6, the value of variable 51 in the testing dataset has an obvious step change. After the change, variable 51 stays between 44 and 46, while in the normal dataset it has no step change and stays between 40 and 42. For the training samples of fault 4, the value also ranges between 44 and 46 and differs from the normal values. This characteristic helps the SVM classification model distinguish the fault samples correctly.
As shown in Figure 7, the SVM method is used to separate the fault 4 data from the normal samples. SVM correctly classifies the majority of the fault data: the recognition rate of fault 4 is 99.8%, similar to that obtained with the PCA method. As shown in Figure 8, the PCA-based statistics are used for fault 4 detection; the dotted lines represent the thresholds of $T^2$ and SPE. After the 160 normal samples, the statistics show a step change, which corresponds to the step change of that temperature. After the fault occurs, the SPE statistics are higher than the threshold as a whole, and the $T^2$ statistics are partly higher than their threshold. The fault can be detected easily from this figure. The detection rate of fault 4 by PCA is almost 100%.

Fault 11.
The cause of fault 11 is the same as that of fault 4. As shown in Figure 9, the fault causes a greater oscillation in the temperature of the cooling water, with an amplitude much larger than under normal conditions. Under normal circumstances the variable's value is between 40 and 42, while it is between 35 and 50 when the fault occurs. The other variables remain at the level of normal operating conditions. As shown in Figure 10, the accuracy rate of fault 11 by the SVM method is 99.6%. The testing dataset consists of 800 faulty samples, so the samples labelled 0 (normal) have been put into the wrong class. Additional data preprocessing could further improve the accuracy of fault detection by SVM.
As shown in Figure 11, the PCA-based statistics are sensitive to the samples of fault 11. After the first 160 samples, the $T^2$ and SPE statistics fluctuate more strongly than under normal conditions. Most of the data are beyond the red line, and the recognition rate of this fault is 87%, which is lower than that of the SVM method. According to the simulation results, the PCA-based method reduces the complexity of the classification model compared with the original SVM classification method. Different detection methods suit different faults; for instance, the SVM method is more appropriate than the PCA method for the detection of fault 11. Fault detection by SVM could be further strengthened by adding data preprocessing and choosing an appropriate feature extraction method to increase the detection accuracy.

Fault Detection.
In the SVM-RFE algorithm, the features are ranked in a list by the calculated sorting coefficient, from which the most relevant variables and their relevance ranking for the corresponding faults are obtained. To get the maximum accuracy rate, different numbers of features are selected to train the classification model. The classification accuracy rates obtained by simulation for different numbers of features are as follows.
The first feature in the ranking list is chosen for analysis as the one most relevant to the fault. A top-ranked feature does not necessarily give the SVM classifier its best classification performance on its own; the classifier achieves optimal performance when multiple features are combined. The SVM-RFE algorithm can therefore select the best complementary feature combination. As shown in Table 2, if the dimension is reduced too far, useful information is lost, which lowers the classification accuracy rate, and the faults cannot be diagnosed well.
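Choosing the number of features from the nested subsets can be sketched as a simple search over prefixes of the ranking list. Everything concrete here is hypothetical: the variable indices in `ranking`, the accuracy values, and the `evaluate` callback, which in practice would train and test an SVM on the given subset.

```python
def best_nested_subset(ranking, evaluate):
    """Evaluate the top-1, top-2, ... subsets and keep the most accurate one.

    Ties are resolved in favour of the smaller subset.
    """
    best_size, best_acc = 0, float("-inf")
    for size in range(1, len(ranking) + 1):
        acc = evaluate(ranking[:size])
        if acc > best_acc:
            best_size, best_acc = size, acc
    return ranking[:best_size], best_acc

# Hypothetical ranking (most relevant first) and accuracy per subset size.
ranking = [51, 9, 21, 3, 7]
toy_accuracy = {1: 0.80, 2: 0.92, 3: 0.95, 4: 0.95, 5: 0.90}

def evaluate(subset):
    # Stand-in for "train an SVM on these features and measure test accuracy".
    return toy_accuracy[len(subset)]

chosen, accuracy = best_nested_subset(ranking, evaluate)
```

In this toy table the accuracy peaks with three features and falls again when all five are used, mirroring the observation that neither too few nor too many features is optimal.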
In the PCA-SVM algorithm, the data are first preprocessed by autoscaling. The PCA method is then used for feature extraction and dimensionality reduction. After the faulty samples are projected onto the principal component space constructed from the normal samples, both the dimension and the amount of computation decrease, and the principal features of each fault are extracted. As shown in Table 3, three different methods have been used to distinguish the samples of the 5 faults. The table gives the classification accuracy rates for each fault and for the entire testing dataset. For faults 1, 4, 5, and 7, all three methods achieve high classification accuracy, and only a few samples in the testing data are wrongly classified. For fault 11, which has strong serial correlation, none of the three methods classifies well, probably because they cannot adequately account for the deviation caused by the serial correlation. Overall, the RFE feature extraction method can improve the accuracy rate over the original SVM algorithm when an appropriate number of features is selected. Although the PCA-SVM method performs similarly to the original SVM method in detecting these types of faults, it greatly improves the computing speed.

Conclusion
With the PCA-SVM and SVM-RFE methods, these common faults in the TE process are effectively diagnosed. The two methods have been compared in terms of classification accuracy rate, and the simulation results are satisfactory. Since the dimension of the dataset obtained from the TE process is only 52, reducing the dimension with the SVM-RFE algorithm alone is not a reliable way to quickly improve the classification accuracy. If the selected number of features is not appropriate, useful information is lost as the dimension decreases, which harms fault diagnosis.
In addition, the thresholds of the PCA statistics are obtained from the normal samples of the TE process. They can be used to test whether a sample is faulty or normal, with very good detection results. The original SVM method is also used to detect the faults, with good results. Different detection methods suit different faults. With the RFE method, the variables most relevant to each fault have been found and used to analyse why these faults occur. The methods in this paper have a few shortcomings; how to distinguish all the faults in the TE process quickly and accurately by SVM still needs to be studied.

As shown in Figure 3, the value of variable 19 has a step change, which means that the ratio of A/C feeds takes a step change. Under normal circumstances, variable 19 does not have such a change, so the step indicates the occurrence of the fault. The training samples also have a step change, so the SVM model can learn this change from the training samples, which allows the fault samples to be detected.

Figure 11: The PCA statistics for fault detection for fault 11.

Table 1: The most relevant variable for each fault.

Table 2: TE process: fault detection rate for training and testing data by SVM-RFE.

Table 3: TE process: fault detection rate for testing data.