Fault Diagnosis Method Based on Information Entropy and Relative Principal Component Analysis

In traditional principal component analysis (PCA), the influence of the differing dimensions (units) of the system variables is neglected, so the selected principal components (PCs) often fail to be representative. While relative-transformation PCA can solve this problem, the weight of each characteristic variable is not easy to calculate. To address this, this paper proposes a fault diagnosis method based on information entropy and Relative Principal Component Analysis. Firstly, the algorithm calculates the information entropy of each characteristic variable in the original dataset using the information gain algorithm. Secondly, it standardizes the dimension of every variable in the dataset. Then, according to the information entropy, it allocates a weight to each standardized characteristic variable. Finally, it uses the established relative-principal-component model for fault diagnosis. Simulation experiments on the Tennessee Eastman process and Wine datasets demonstrate the feasibility and effectiveness of the new method.


Introduction
In industrial manufacturing processes there are large numbers of highly correlated variables, and these variables contain essential information for judging the status of the system. As a result, finding and predicting faults from this information, so that the equipment always works safely and reliably, is an important problem [1,2].
However, the characteristic variables collected during industrial manufacturing have different units, which raises the problem that different results may be obtained merely because of unit differences; hence, the units must be standardized. Moreover, standardization inevitably erases the diversity among the variables and makes their geometric distribution uniform, which makes it hard to extract principal components for compression and diagnosis. To overcome these problems, several methods have been proposed recently [3-8]. Shi et al. use the Mahalanobis distance for relative transformation to reduce the effect of dimension standardization [4]. Tang et al. propose a relative-transformation principal component analysis to reduce data noise in predicting transformer oil breakdown voltage [5]. Yi et al. introduce a relative transformation operator that changes the spatial distribution of the original variables and the eigenvalues of the covariance matrix in the feature space [6]. Wen et al. propose a method called Relative Principal Component Analysis (RPCA), which weights each variable based on prior information about the system in order to eliminate the false information introduced by standardizing the variable units [7,8]; its shortcoming is that it requires a large amount of prior information about the system, which is hard to obtain in real engineering applications.
To solve this problem, this paper introduces the concept of information entropy and proposes a new fault diagnosis method that combines information entropy with relative-transformation PCA, called information entropy relative-transformation PCA (InEnRPCA). Information entropy was put forward by Shannon in 1948 [9]; it indicates that redundancy exists in any information and can be measured from the symbols in the information, such as numbers, letters, and words. With the development of information theory, information entropy has become an effective way to measure the importance of each feature in a sample and has been widely applied in many areas. For instance, when using decision trees for classification, Hu et al. use information entropy to calculate the significance of each feature and then prune the decision tree to reduce false alarms [10]. Y. Y. Chen and Y. M. Chen use information entropy to measure the uncertainty of the data in an attribute reduction algorithm [11]. Wang et al. use information entropy to balance the weight of each sememe in natural language processing [12].
For the problem at hand, we start from the perspective of information theory: we use the information gain algorithm to extract information entropy from the original dataset as heuristic knowledge, use it to calculate the relative transformation factors and allocate a weight to each standardized characteristic variable, and finally apply the corresponding RPCA method for fault diagnosis.
The rest of this paper is organized as follows. Section 2 reviews the definitions and algorithms of information entropy and information gain. The original relative-transformation PCA method is given in Section 3. Our simulation experiments on the Tennessee Eastman process and the Wine dataset from UCI are reported in Section 4, where we compare PCA, DSPCA, and the improved InEnRPCA on thirteen datasets to demonstrate the effectiveness of the new method. Finally, Section 5 gives conclusions and discussion.

Overview of Our Approach
This section gives a brief overview of our fault diagnosis approach. As the framework in Figure 1 shows, the proposed approach consists of two parts: calculating the relative transformation operators based on information entropy, and fault diagnosis based on RPCA-kNN.
In the first part, the information entropy and information gain algorithms are applied to the training data to obtain the relative transformation operators. The operators are then combined with the original data to obtain the relative transformation matrix.
In the fault diagnosis part, RPCA is used to reduce the dimension of the processed data, and kNN is used for classification training. After this process, a model is available for further fault diagnosis.
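As a purely illustrative sketch of the classification stage, a minimal kNN vote over the reduced scores could look as follows; the function name and toy data are our own, not from the paper:

```python
import numpy as np

def knn_predict(train_Z, train_y, test_Z, k=3):
    """Minimal kNN majority vote over reduced (relative-PC) scores."""
    preds = []
    for z in test_Z:
        # indices of the k nearest training points by Euclidean distance
        idx = np.argsort(np.linalg.norm(train_Z - z, axis=1))[:k]
        votes = [train_y[i] for i in idx]
        preds.append(max(set(votes), key=votes.count))
    return preds

# Toy example: two well-separated clusters in a 1-D score space.
train_Z = np.array([[0.0], [0.1], [5.0], [5.1]])
train_y = ["normal", "normal", "fault", "fault"]
print(knn_predict(train_Z, train_y, np.array([[0.05], [5.05]]), k=2))
# → ['normal', 'fault']
```

In practice any off-the-shelf kNN implementation serves the same role; only the reduced scores fed into it come from the RPCA stage.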

Information Entropy
In information theory, probability, and statistics, entropy describes the uncertainty of a random variable; it can also quantify how much the uncertainty about a set D is reduced once a feature A is known.

The Definition of the Information Entropy and Information Gain
Theorem 1 (information entropy). Assume X is a Lebesgue-measurable set generated by measurable sets with sigma-algebra F and measure μ, where μ(X) = 1, and X can be written as a union of incompatible (pairwise disjoint) sets, that is, X = ∪_{i=1}^{n} A_i with A_i ∩ A_j = ∅ for all i ≠ j [13]. Then the entropy of X is

H(X) = -∑_{i=1}^{n} μ(A_i) log μ(A_i),

where μ(A_i) is the measure of A_i.
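As a quick numerical check of this definition, the entropy of a partition can be computed directly from the measures μ(A_i); this small sketch (ours, not the paper's) uses base-2 logarithms:

```python
import math

def entropy(measures):
    """Entropy H(X) = -sum mu(A_i) * log2 mu(A_i) of a partition;
    the measures must sum to 1, and zero-measure parts contribute nothing."""
    return -sum(m * math.log2(m) for m in measures if m > 0)

# A set split into four equally likely parts has entropy log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # → 2.0
```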
Theorem 2 (information gain) [14]. The information gain algorithm can be stated as follows.

Input: training dataset D and feature A.
Output: the information gain of set D by feature A, written g(D, A).

Step 1. Calculate the empirical entropy of the dataset, H(D) = -∑_{k=1}^{K} (|C_k|/|D|) log₂(|C_k|/|D|).

Step 2. Calculate the conditional empirical entropy, H(D|A) = ∑_{i=1}^{n} (|D_i|/|D|) H(D_i).

Step 3. Calculate the information gain, which is also the relative transformation operator: g(D, A) = H(D) - H(D|A).

Step 4. Repeat Steps 1 to 3 for each feature A in the sample to obtain the relative transformation operator of each feature.
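The four steps above can be sketched in a few lines; this is an illustrative implementation of the standard information gain computation, with toy data of our own:

```python
from collections import Counter
import math

def H(labels):
    # Step 1: empirical entropy H(D) from the class label frequencies.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    # Steps 2-3: g(D, A) = H(D) - H(D|A), the reduction in label
    # uncertainty once feature A is known.
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    h_cond = sum(len(g) / n * H(g) for g in groups.values())
    return H(labels) - h_cond

# Toy example: the feature splits the labels perfectly, so g(D, A) = H(D).
labels  = ["fault", "fault", "normal", "normal"]
feature = ["high", "high", "low", "low"]
print(info_gain(feature, labels))  # → 1.0
```

Step 4 simply repeats `info_gain` over every feature column, yielding one relative transformation operator per feature.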

Relative Transformation Principal Component Analysis
After the data dimensions are standardized in the traditional PCA method, fake information may be introduced into the principal components because the variables become uniformly distributed.
Therefore, we use the relative-transformation PCA method to solve this problem [15]. The brief RPCA procedure is as follows. Assume X ∈ R^{n×m}, where n is the number of samples and m is the number of features.
Step 1. Transform the original data into standardized data with zero mean and unit variance. The mean of each column is x̄_j = (1/n) ∑_{i=1}^{n} x_{ij}, and the standardized entries are x'_{ij} = (x_{ij} - x̄_j)/σ_j, where σ_j (j = 1, ..., m) is the standard deviation of each column. We then apply the relative transformation X_R = X'W, where X_R is the relative transformation matrix obtained from X' and W is a diagonal matrix whose diagonal entries w_j are the relative transformation operators obtained in Section 3.2.

Step 2. Compute the covariance matrix C_R from X_R.

Step 3. Compute the eigenvalues and eigenvectors of C_R.

Step 4. Select the number of relative principal components based on the cumulative contribution rate.

Step 5. Use kNN for classification.
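Steps 1-4 can be sketched as follows, assuming the relative transformation operators (`weights`) have already been obtained as in Section 3.2; this is a minimal illustration, not the authors' code, and the function name and threshold default are ours:

```python
import numpy as np

def rpca_transform(X, weights, var_threshold=0.85):
    """Project X (n samples x m features) onto its relative principal
    components, keeping enough PCs to reach the cumulative
    contribution-rate threshold."""
    # Step 1: standardize to zero mean / unit variance, then apply the
    # relative transformation X_R = X' * diag(w_1, ..., w_m).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Xr = Xs @ np.diag(weights)
    # Step 2: covariance matrix of the transformed data.
    C = np.cov(Xr, rowvar=False)
    # Step 3: eigendecomposition (eigh returns ascending order; reverse it).
    vals, vecs = np.linalg.eigh(C)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Step 4: smallest k whose cumulative contribution rate exceeds the threshold.
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_threshold)) + 1
    return Xr @ vecs[:, :k]
```

The returned scores are what Step 5 would feed into a kNN classifier; with all weights equal to one the procedure reduces to ordinary PCA on standardized data.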

Application Experiments
To verify the universal applicability and effectiveness of the new method, we first test it in thirteen experiments based on the Tennessee Eastman process dataset and the Wine dataset [16], which differ in the number of samples, dimensions, and classes; in addition, we divide all the classes into groups of three or four in sequence to avoid deliberate selection. Experiments are also performed with the original PCA and traditional DSPCA, respectively. The dataset descriptions and experimental settings are shown in Table 1.

TE Process Example.
We use the Tennessee Eastman process data as the test samples, obtained from the documentation. The dataset includes twenty-one classes; in the experiment, each class has two hundred training samples and one hundred test samples, and each sample has 52 features. The different fault types in the TE process overlap and are difficult to classify in the observed space.
After data preprocessing, the number of principal components is determined by a cumulative contribution rate of more than 85%. Among the twelve experiments based on the Tennessee Eastman process, we choose the one based on TE classes 5, 6, 7, and 8, labeled F1, F2, F3, and F4, respectively, for detailed presentation. The relative transformation operator of each feature in the TE process is given in Table 2, and the experimental results with the three approaches are shown in Figures 2(a)-2(c).
As the figures show, PCA and DSPCA perform poorly in pattern classification: the four classes overlap. In Table 4, their classification accuracies are 31.5% and 75.3%, respectively. In contrast, the information entropy RPCA separates the classes better, and its classification accuracy is 82.3%. Clearly, InEnRPCA is a suitable feature extraction and classification method for fault diagnosis.

Wine Dataset.
We select the Wine data from the UCI machine learning repository, which includes three classes with 59, 61, and 58 samples, respectively; each sample has fourteen features. The types in Wine overlap and are difficult to classify in the observed space.
The relative transformation operator of each feature in the Wine dataset is given in Table 3, and the experimental results with the three approaches are shown in Figures 3(a)-3(c). From Figure 3(b) and Table 4, we can see that PCA and DSPCA do not perform perfectly in pattern classification, with accuracy rates of 66.7% and 96.7%, respectively. In contrast, InEnRPCA identifies each class accurately, the test data are also clearly classified, and its classification accuracy rate is 100%.

Discussion.
The experiments above show that the proposed method can successfully distinguish different fault types and carry out fault pattern recognition and diagnosis in most cases, by utilizing information entropy to calculate a relative transformation operator for each feature and combining it with the relative PCA approach.
Meanwhile, we also find that although the proposed method improves fault classification performance in most cases (nearly 80% of them), it is sometimes inferior to DSPCA; the reason may be overfitting to the training set and dissimilarity between the training and test sets. Nevertheless, we can be confident that utilizing the information entropy as heuristic knowledge is beneficial on the whole.

Conclusion
This paper has analyzed the PCA method from the perspective of information theory and proposed a fault diagnosis method based on information entropy and Relative Principal Component Analysis. The proposed method effectively solves the following problems: (1) obtaining different results merely because of differences in the variables' dimensions; (2) the uniform distribution of the characteristic variables after dimension standardization; (3) how to calculate the weight of each variable. Both theoretical analysis and simulation experiments on the Tennessee Eastman process and the Wine dataset from UCI demonstrate the feasibility and effectiveness of the new approach.
It is worth noting that the idea of using information entropy to determine the importance of features is not new. However, to our knowledge, the idea of combining information entropy with RPCA for fault detection has not been published before. The proposed method should be seen as an alternative fault diagnosis method; it is not superior to the other methods in all cases.
There is some interesting future work: (i) To make the system energy the same before and after the relative transformation: from the perspective of energy conservation, the system energy before and after the relative transformation is usually not the same. If energy conservation can be taken into account during the relative transformation, the results may be much better.
(ii) To apply the relative transformation by information entropy as a data preprocessing step in other methods: the proposed approach is not limited to RPCA; the information-entropy-based relative transformation can also serve as preprocessing for other feature extraction and classification methods.
Given the training set D and the corresponding feature A, the empirical entropy H(D) measures the uncertainty in classifying set D, and the conditional empirical entropy H(D|A) measures the uncertainty in classifying set D given feature A. The difference between them is the information gain, which stands for the reduction in the uncertainty of classifying set D once feature A is given. Obviously, for a dataset D, the information gain depends on the features, and different features have different information gains; the bigger the information gain, the stronger the classifying ability of the feature.

3.2. The Information Gain Algorithm. Assume the training dataset is D, where |D| denotes the sample capacity, that is, the number of samples. There are K classes C_k (k = 1, 2, ..., K) in set D, and |C_k| is the number of samples belonging to C_k, so that ∑_{k=1}^{K} |C_k| = |D|. Also assume feature A takes n values {a_1, a_2, ..., a_n}, which divide D into n parts D_i (i = 1, 2, ..., n), where |D_i| is the number of samples belonging to D_i, so that ∑_{i=1}^{n} |D_i| = |D|. Based on the above, define the set D_ik as the intersection of class C_k and subset D_i, that is, D_ik = D_i ∩ C_k, with |D_ik| its number of samples.

Table 1 :
The datasets descriptions and parameter constitution.

Table 2 :
Weight for each feature in the TE dataset.

Table 3 :
Weight for each feature in the Wine dataset.

Table 4 :
The classifying accuracy rate for each method.