Kernel Local Linear Discriminate Method for Dimensionality Reduction and Its Application in Machinery Fault Diagnosis

Dimensionality reduction is a crucial task in machinery fault diagnosis. Recently, manifold learning, a popular dimensionality reduction technique, has been successfully used in many fields. However, most manifold learning techniques are not suitable for this task, because they are unsupervised in nature and fail to discover the discriminant structure in the data. To overcome these weaknesses, the kernel local linear discriminate (KLLD) algorithm is proposed. KLLD is a novel algorithm that combines the advantages of neighborhood preserving projections (NPP), the Floyd algorithm, the maximum margin criterion (MMC), and the kernel trick. KLLD has four advantages. First, KLLD is a supervised dimensionality reduction method that can overcome the out-of-sample problem. Second, the short-circuit problem can be avoided. Third, KLLD uses the between-class and within-class scatter matrices more efficiently. Finally, the kernel trick is included in KLLD to find a more precise solution. The main feature of the proposed method is that it attempts both to preserve the intrinsic neighborhood geometry of the data and to extract the discriminant information. Experiments have been performed to evaluate the new method, and the results show that KLLD has more benefits than the traditional methods it is compared with.


Introduction
With information collection technology becoming increasingly advanced, a huge amount of data is produced while mechanical equipment is running. The sensitive information that reflects the running status of the equipment is submerged in a large amount of redundant data, and effective dimensionality reduction can solve this problem. Dimensionality reduction is therefore one of the key technologies for equipment condition monitoring and fault diagnosis. The nonlinear and nonstationary vibration signals generated by rolling bearings [1,2] make the original high-dimensional feature space, which consists of statistical characteristics of the signals, inseparable. Traditional linear dimensionality reduction methods such as PCA and ICA assume that the data have a global linear structure, and they use a linear transformation matrix to find the best low-dimensional projection. Classification information plays an important role in fault diagnosis; however, when the original high-dimensional feature space has a nonlinear structure, this information is difficult to obtain with linear methods. KPCA is a traditional nonlinear dimensionality reduction method, which maps the data into a high-dimensional feature space and reduces dimensionality by discarding the projections with relatively small variance in that space. In addition, because KPCA seeks the principal components with the largest variance, it may lose useful discriminant information [3].
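To make the KPCA baseline concrete, here is a minimal NumPy sketch of kernel PCA with a polynomial kernel of the form (x · y + 1)^r. The function names, the row-per-sample layout, and the default exponent r = 2 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def poly_kernel(X, Y, r):
    """Polynomial kernel (x . y + 1)^r between the rows of X and Y."""
    return (X @ Y.T + 1.0) ** r


def kpca(X, n_components, r=2):
    """Kernel PCA sketch: rows of X are samples; returns the projections
    onto the n_components leading feature-space principal components."""
    n = X.shape[0]
    K = poly_kernel(X, X, r)
    # Center the kernel matrix, i.e. center the mapped data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns ascending eigenvalues; keep the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scale coefficients so each feature-space axis has unit norm.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    # Components beyond n_components (small variance) are simply discarded.
    return Kc @ alphas
```

The reduced dimension comes only from keeping the top components, which is why class labels never enter the computation and discriminant directions with small variance can be lost.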
Manifold learning is a data-driven approach that can reveal the underlying nature of complex data structures, providing a new way to analyze the intrinsic dimension based on the data distribution. Manifold learning has achieved a series of research results in feature extraction [4][5][6]. Manifold learning methods fall broadly into two categories with different advantages and disadvantages [7]: global methods (e.g., Isomap [8]) and local methods (e.g., locally linear embedding [9]). In [10], the author points out that, for discriminant analysis, the local structure is usually more important than the global structure when there are not enough samples. Among local manifold learning methods, locally linear embedding (LLE) has many advantages, such as a globally optimal solution and fast computation. Furthermore, its minimum-reconstruction-error weights keep the local neighborhood geometry of the data unchanged when the data are scaled or rotated. LLE has therefore been applied to fault feature extraction [11][12][13][14][15].
Shock and Vibration
NPP [16] is a manifold learning method that extends LLE by introducing a linear transformation matrix. NPP has been successfully applied to the dimensionality reduction of the well-known "Swiss roll" and "S-curve" datasets. The algorithm assumes that the data are linear in a local sense. However, when the data manifold is strongly curved, manifold learning methods can suffer from the short-circuit problem.
To address these issues, a fault feature extraction method named KLLD is proposed in this paper. The method is applied to the dimensionality reduction of both the iris dataset and an original rolling bearing feature dataset constructed from wavelet packet energies. Its effectiveness is verified by comparison with conventional analysis methods.
The rest of this paper is organized as follows. In Section 2, we briefly review the LLE, NPP, Floyd, and MMC algorithms. In Section 3, the algorithms proposed in this paper are first derived; next, the short-circuit problem is introduced and the Floyd algorithm is employed to overcome it; finally, based on the LLD, KLLD, and Floyd algorithms, the calculation steps of LLD and KLLD are designed. In Section 4, we design a KLLD experimental procedure for the dimensionality reduction of the iris and rolling bearing datasets, and we apply the KPCA, NPP, LLD, and KLLD algorithms to these datasets. Conclusions are drawn and several issues for future study are addressed in Section 5.

Basic Principle
Locally Linear Embedding.
For given D-dimensional real-valued vectors x_i ∈ X (i = 1, ..., N), assume that each point and its nearest neighbors lie in a locally linear space, so that x_i can be reconstructed from its k nearest neighbors with weighting coefficients W_ij (i = 1, ..., N; j = 1, ..., k). W is selected by minimizing the cost function, that is,
$$\Psi(W) = \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{k} W_{ij}\, x_j \Big\|^{2}, \qquad \text{s.t.} \quad \sum_{j=1}^{k} W_{ij} = 1. \tag{1}$$
In the LLE algorithm, the low-dimensional points are reconstructed with the same weights as the high-dimensional points. Extending W to an N × N matrix with W_ij = 0 whenever x_j is not a neighbor of x_i, the embedding Y = [y_1, ..., y_N] minimizes
$$\Phi(Y) = \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} W_{ij}\, y_j \Big\|^{2} = \operatorname{tr}\left(Y M Y^{T}\right), \qquad M = (I - W)^{T}(I - W). \tag{2}$$
The eigenvalue decomposition of M in (2) is computed, and the bottom d + 1 eigenvectors, corresponding to its d + 1 smallest eigenvalues, are identified. Y is then constructed from these eigenvectors after discarding the eigenvector corresponding to the smallest eigenvalue. The derivation can be found in [9]. In summary, we state LLE as the following algorithm.
Step 1. Select the k nearest neighbors of each point.
Step 2. Compute the reconstruction weights W by minimizing (1).
Step 3. Map to the embedded coordinates Y using (2).
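The three LLE steps can be sketched in NumPy as follows, with samples stored as the columns of X. The regularization of the local Gram matrix (needed when k exceeds D) and the helper names are our assumptions, not part of the original description.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k Euclidean nearest neighbours of each column of X."""
    D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbour
    return np.argsort(D2, axis=1)[:, :k]


def lle_weights(X, neighbors, reg=1e-3):
    """Step 2: minimise (1), ||x_i - sum_j W_ij x_j||^2 with sum_j W_ij = 1."""
    N = X.shape[1]
    W = np.zeros((N, N))
    for i in range(N):
        nb = neighbors[i]
        Z = X[:, nb] - X[:, [i]]      # neighbours shifted so x_i is the origin
        C = Z.T @ Z                   # local Gram matrix (k x k)
        # Regularise: C is singular whenever k > D.
        C += reg * max(np.trace(C), 1e-12) * np.eye(len(nb))
        w = np.linalg.solve(C, np.ones(len(nb)))
        W[i, nb] = w / w.sum()        # enforce the sum-to-one constraint
    return W


def lle(X, k, d):
    """Steps 1-3 of LLE; X is D x N, the returned Y is d x N."""
    N = X.shape[1]
    W = lle_weights(X, knn_indices(X, k))      # Steps 1 and 2
    I = np.eye(N)
    M = (I - W).T @ (I - W)                    # matrix M of (2)
    vals, vecs = np.linalg.eigh(M)             # ascending eigenvalues
    # Step 3: drop the bottom (constant) eigenvector, keep the next d.
    return vecs[:, 1:d + 1].T, W, M
```

Because every row of W sums to one, the constant vector lies in the null space of M, which is why the bottom eigenvector carries no information and is discarded.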

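Neighborhood Preserving Projections
Let
$$Y = A^{T} X. \tag{3}$$
Substituting (3) into (2), the LLE embedding problem is transformed into the following, where M is obtained from (2):
$$\min_{A}\ \operatorname{tr}\left(A^{T} X M X^{T} A\right) \qquad \text{s.t.} \quad A^{T} X X^{T} A = I. \tag{4}$$
Using the Lagrange multiplier method gives, for each column a of A,
$$X M X^{T} a = \lambda\, X X^{T} a. \tag{5}$$
Therefore, the projection matrix A consists of the eigenvectors of (XX^T)^{-1}(XMX^T), that is, the generalized eigenvectors of the pair (XMX^T, XX^T).

Floyd Algorithm
Step 1. Initialization: compute d^{(0)}_{ij}, the length of the direct edge between points i and j:
$$d^{(0)}_{ij} = \begin{cases} 0, & i = j, \\ w_{ij}, & \text{if points } i \text{ and } j \text{ are connected}, \\ \infty, & \text{otherwise}, \end{cases} \tag{6}$$
where w_ij is the length of the edge between points i and j.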
Step 2. For t = 1, 2, ..., N, iterate (7) to update the shortest-path length d_ij between every pair of points:
$$d^{(t)}_{ij} = \min\left(d^{(t-1)}_{ij},\ d^{(t-1)}_{it} + d^{(t-1)}_{tj}\right). \tag{7}$$

Maximum Margin Criterion.
LDA (linear discriminant analysis) is a popular linear feature extractor. Its key step is to find a transformation matrix Q ∈ R^{D×d} that maximizes the Fisher criterion, where D is the dimension of the original dataset and d is the dimension of Y [17]:
$$J(Q) = \frac{\left| Q^{T} S_b Q \right|}{\left| Q^{T} S_w Q \right|}, \qquad S_b = \sum_{i=1}^{c} p_i (m_i - m)(m_i - m)^{T}, \qquad S_w = \sum_{i=1}^{c} p_i S_i, \tag{8}$$
where S_b is the between-class scatter matrix, S_w is the within-class scatter matrix, and S_i is the covariance matrix of class i; c is the number of classes; m_i and p_i are the mean vector and a priori probability of class i, respectively; and m is the overall mean vector.
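However, (8) has a drawback: it cannot be applied when S_w is singular, which happens in the small sample size problem. The MMC method was proposed in [17] to overcome this drawback. Following LDA, MMC can be represented as
$$J = \operatorname{tr}\left(S_b - S_w\right), \tag{9}$$
which, for a projection matrix Q with orthonormal columns, becomes
$$\max_{Q}\ \operatorname{tr}\left(Q^{T} (S_b - S_w) Q\right) \qquad \text{s.t.} \quad Q^{T} Q = I. \tag{10}$$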
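The MMC criterion described above can be sketched as follows, with samples as the columns of X and p_i = N_i / N used as the a priori class probabilities (an assumption; the text leaves p_i general). The function names are hypothetical.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Between-class S_b and within-class S_w; columns of X are samples and
    p_i = N_i / N is used as the prior of class i (assumption)."""
    D, N = X.shape
    m = X.mean(axis=1, keepdims=True)            # overall mean vector
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        p = Xc.shape[1] / N
        mc = Xc.mean(axis=1, keepdims=True)      # class mean vector
        Sb += p * (mc - m) @ (mc - m).T
        Sw += p * (Xc - mc) @ (Xc - mc).T / Xc.shape[1]
    return Sb, Sw


def mmc(X, labels, d):
    """Maximise tr(Q^T (S_b - S_w) Q) subject to Q^T Q = I.
    No inverse of S_w is needed, so a singular S_w is not a problem."""
    Sb, Sw = scatter_matrices(X, labels)
    vals, vecs = np.linalg.eigh(Sb - Sw)         # ascending eigenvalues
    return vecs[:, ::-1][:, :d]                  # d largest eigenvalues
```

Because MMC only needs a symmetric eigendecomposition of S_b − S_w, the sketch still runs when there are fewer samples than dimensions, which is exactly the case the Fisher ratio in (8) cannot handle.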

Local Linear Discriminate Algorithm.
Following the LLT-STA derivation in [18], the basic idea of the LLD proposed in this paper is that if the linear transformation matrix A of NPP (see (5)) also satisfies (10), its ability to discriminate between different classes of data will be greatly improved. This problem can be expressed as the multiobjective optimization problem (11). Equation (11) can be changed into a constrained optimization problem, which is solved with the Lagrange multiplier method and further deduced into (14). Equation (14) is then converted into (15), and the projection matrix A is given by the eigenvectors of the matrix in (15).
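Because the displayed equations (11) to (15) are not reproduced here, the following is one plausible reading of the derivation. It assumes that the NPP cost tr(A^T X M X^T A) and the MMC objective tr(A^T (S_b − S_w) A) are combined by simple subtraction under the NPP constraint; it is our reconstruction for illustration, not the paper's exact formulation.

```latex
% Two objectives: NPP locality (minimize) and MMC margin (maximize)
\min_{A}\ \operatorname{tr}\left(A^{T} X M X^{T} A\right), \qquad
\max_{A}\ \operatorname{tr}\left(A^{T} (S_b - S_w) A\right)

% Assumed single constrained problem
\min_{A}\ \operatorname{tr}\left(A^{T}\left[X M X^{T} - (S_b - S_w)\right] A\right)
\quad \text{s.t.} \quad A^{T} X X^{T} A = I

% Lagrangian with symmetric multiplier matrix \Lambda
\mathcal{L}(A, \Lambda) = \operatorname{tr}\left(A^{T}\left[X M X^{T} - S_b + S_w\right] A\right)
  - \operatorname{tr}\left(\Lambda\left(A^{T} X X^{T} A - I\right)\right)

% Stationarity, column by column (diagonal \Lambda)
\left[X M X^{T} - S_b + S_w\right] a = \lambda\, X X^{T} a
\;\Longrightarrow\;
\left(X X^{T}\right)^{-1}\left[X M X^{T} - S_b + S_w\right] a = \lambda\, a
```

Under this reading, minimizing the combined cost picks the eigenvectors with the smallest eigenvalues, which is consistent with the eigenvector selection in Step 5 of the LLD calculation steps.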

Kernel Local Linear Discriminate Algorithm.
Suppose that φ is a nonlinear mapping to some feature space F; then (15) can be changed into (16) in F. To find local linear discriminant information in the feature space F, we express (16) in terms of dot products of the input patterns and then replace each dot product with a kernel function. From the theory of reproducing kernels, any solution A^φ ∈ F must lie in the span of all N training samples in F. Therefore, A^φ can be expanded as
$$A^{\phi} = \phi(X)\, \alpha, \tag{17}$$
where α is the N × d matrix of expansion coefficients. Combining (16) and (17) and multiplying both sides of (16) by [φ(X)]^T gives (18). Setting K(x_i, x_j) = (φ(x_i) · φ(x_j)), (18) can be rewritten as (19), where N_i is the number of samples in the ith class, N is the number of samples, and D is the number of sample dimensions. Taking the polynomial kernel function K(x, y) = (φ(x) · φ(y)) = (x · y + 1)^r as an example, where r is the kernel parameter, the kernel matrix entries become K_ij = (x_i · x_j + 1)^r. As in Section 3.1, (19) can be treated as a generalized eigenvalue decomposition problem.
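For illustration, the kernel version can be written entirely in terms of the kernel matrix K if one assumes that (15) has the form [X M X^T − (S_b − S_w)]a = λ X X^T a. Writing the feature-space scatter matrices as S_b^φ = φ(X) B φ(X)^T and S_w^φ = φ(X) H φ(X)^T, with N × N matrices B and H built from the class labels, and substituting A^φ = φ(X)α gives the generalized eigenproblem below. This is our sketch of what (19) may look like, not the paper's displayed equation; the symbols u_i, 1_{C_i} (indicator of class i), B, and H are notation we introduce.

```latex
u_i = \frac{1}{N_i}\,\mathbf{1}_{C_i}, \qquad u = \frac{1}{N}\,\mathbf{1}, \qquad
B = \sum_{i=1}^{c} p_i\, (u_i - u)(u_i - u)^{T}

H = \sum_{i=1}^{c} \frac{p_i}{N_i}\left(\operatorname{diag}(\mathbf{1}_{C_i})
    - \frac{1}{N_i}\,\mathbf{1}_{C_i}\mathbf{1}_{C_i}^{T}\right)

% a^T \phi(X) M \phi(X)^T a = \alpha^T K M K \alpha, and similarly for B and H
K\,(M - B + H)\,K\,\alpha = \lambda\, K K\,\alpha,
\qquad K_{ij} = (x_i \cdot x_j + 1)^{r}
```

Since KK is frequently singular (for example, a polynomial kernel of low degree has limited rank), a small ridge term is usually added to the right-hand side before solving.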

Short-Circuit Problem.
The traditional Euclidean distance has many advantages: it is intuitive and easy to understand and compute. However, it can easily lead to the short-circuit problem [19] when the data lie on a hypersurface with large curvature in the high-dimensional space. The short-circuit problem refers to a point's neighborhood being mixed with points of other classes, so that discriminant information cannot be extracted effectively. Figure 1 shows the distribution of two classes of data in a two-dimensional space.
Figure 1 shows two classes of points, round and square. With Euclidean distances, the five nearest neighbors of round12 are {round11, round13, square2, square3, square4}, which distorts the data after dimensionality reduction in the low-dimensional space. To overcome this drawback, we construct a connection graph in which points of different classes are not connected, and the distance between unconnected points is taken to be infinite. In this way, the five nearest neighbors of round12 become {round9, round10, round11, round13, round14}. Therefore, after the connection graph is established in the high-dimensional sample space, the Floyd algorithm in Section 2.3 is used to find the distances between points. Using the Floyd algorithm to find the correct nearest neighbors in the LLD algorithm effectively avoids mixing samples from different classes.
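The neighbor selection described above can be sketched in NumPy as follows. This is an illustration under assumptions: edge lengths are Euclidean distances, the threshold eps plays the role of the connection distance, and the helper names are hypothetical.

```python
import numpy as np


def connection_graph(X, labels, eps):
    """Graph in the spirit of Figure 1: columns x_i, x_j are joined by an edge
    of Euclidean length only if they share a class and lie within eps."""
    E = np.sqrt(np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0))
    same = labels[:, None] == labels[None, :]
    G = np.where(same & (E <= eps), E, np.inf)   # inf = not connected
    np.fill_diagonal(G, 0.0)
    return G


def floyd(G):
    """Floyd all-pairs shortest paths: start from d^(0) = G, then for each
    intermediate point t apply d_ij = min(d_ij, d_it + d_tj)."""
    Dt = G.copy()
    for t in range(Dt.shape[0]):
        Dt = np.minimum(Dt, Dt[:, [t]] + Dt[[t], :])
    return Dt


def geodesic_neighbors(X, labels, eps, k):
    """k nearest neighbours of each point under shortest-path distance."""
    Dist = floyd(connection_graph(X, labels, eps))
    np.fill_diagonal(Dist, np.inf)               # exclude the point itself
    return np.argsort(Dist, axis=1)[:, :k], Dist
```

If fewer than k same-class points are reachable from a point, the tail of its neighbor list falls back to unreachable points at infinite distance, so eps should be chosen large enough that every point has at least k reachable same-class neighbors.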

Steps of LLD Calculation.
Based on the analysis in Sections 3.1, 3.2, and 3.3, we state LLD as the following algorithm.
Input. The original D × N data matrix X, the number of nearest neighbors k, the connection distance ε, and the low-dimensional embedding dimension d (d < D).
Output. The d × N low-dimensional matrix Y.
Step 2. Set the connection threshold ε, determine the edge weights between points, and construct a weighted graph similar to Figure 1.
Step 3. Calculate the distances between x_i and all other points, and select the k nearest neighbors of x_i according to the shortest-path distances obtained with the Floyd algorithm (Section 2.3).
Step 4. Reconstruct the weighting matrix W according to formula (1).
Step 5. Calculate the matrix in formula (15) and its eigenvalue decomposition, find its d + 1 smallest eigenvalues, and form the projection matrix A from the eigenvectors corresponding to the 2nd to (d + 1)th smallest eigenvalues.
Step 6. Calculate Y by Y = A^T X.
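Steps 2 to 6 can be put together in a compact NumPy sketch. It assumes that formula (15) has the form [X M X^T − (S_b − S_w)]a = λ X X^T a with priors p_i = N_i / N, uses Euclidean edge lengths in the graph, and regularizes the local Gram matrix in Step 4; the function name `lld` and the parameter `reg` are hypothetical, and Step 1 of the listing is not modeled.

```python
import numpy as np


def lld(X, labels, k, eps, d, reg=1e-3):
    """LLD sketch (Steps 2-6). X is D x N with samples as columns.
    ASSUMPTION: formula (15) is taken as
        [X M X^T - (S_b - S_w)] a = lambda X X^T a,  with p_i = N_i / N."""
    D, N = X.shape
    # Steps 2-3: class-aware graph, Floyd shortest paths, k neighbours.
    E = np.sqrt(((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0))
    G = np.where((labels[:, None] == labels[None, :]) & (E <= eps), E, np.inf)
    np.fill_diagonal(G, 0.0)
    for t in range(N):
        G = np.minimum(G, G[:, [t]] + G[[t], :])
    np.fill_diagonal(G, np.inf)
    nbrs = np.argsort(G, axis=1)[:, :k]
    # Step 4: reconstruction weights from (1), then M from (2).
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]
        C = Z.T @ Z
        C += reg * max(np.trace(C), 1e-12) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    # MMC scatter matrices S_b and S_w.
    m = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        p = Xc.shape[1] / N
        Sb += p * (mc - m) @ (mc - m).T
        Sw += p * (Xc - mc) @ (Xc - mc).T / Xc.shape[1]
    # Step 5: symmetric generalized eigenproblem T a = lambda (X X^T) a.
    T = X @ M @ X.T - (Sb - Sw)
    L_inv = np.linalg.inv(np.linalg.cholesky(X @ X.T))   # X X^T = L L^T
    vals, U = np.linalg.eigh(L_inv @ T @ L_inv.T)         # ascending order
    A = L_inv.T @ U[:, 1:d + 1]   # 2nd to (d+1)th smallest, as in Step 5
    # Step 6: Y = A^T X.
    return A.T @ X, A
```

Solving the generalized eigenproblem through a Cholesky factor of X X^T keeps everything symmetric, and the returned A satisfies the NPP constraint A^T X X^T A = I by construction.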

Steps of KLLD Calculation.
KLLD has the same inputs and output as LLD, with the kernel parameter as an additional input. Its calculation has three steps.
Step 1. Construct the weighting matrix W as in Steps 1 to 4 of Section 3.4, and then calculate M according to formula (2).
Step 2. Calculate α according to formula (19); A^φ can then be obtained by (17).
Step 3. Calculate Y by Y = (A^φ)^T φ(X) = α^T K, where K is the kernel matrix.
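The three KLLD steps can likewise be sketched. Because (19) is not reproduced here, the sketch assumes the kernelized form K(M − B + H)Kα = λKKα, where B and H are label-dependent N × N matrices expressing S_b and S_w in sample space, and it adds a small ridge term to KK so that the right-hand side is positive definite. The names `klld`, `r` (polynomial exponent), `reg`, and `mu` are hypothetical.

```python
import numpy as np


def klld(X, labels, k, eps, d, r=2, reg=1e-3, mu=1e-4):
    """KLLD sketch. X is D x N with samples as columns.
    ASSUMPTION: (19) is taken as K (M - B + H) K alpha = lambda (K K + ridge) alpha,
    where B and H express S_b and S_w in sample space.  r (polynomial
    exponent) and mu (relative ridge) are hypothetical settings."""
    N = X.shape[1]
    # Step 1: class-aware Floyd neighbours, LLE weights W and M (as in LLD).
    E = np.sqrt(((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0))
    G = np.where((labels[:, None] == labels[None, :]) & (E <= eps), E, np.inf)
    np.fill_diagonal(G, 0.0)
    for t in range(N):
        G = np.minimum(G, G[:, [t]] + G[[t], :])
    np.fill_diagonal(G, np.inf)
    nbrs = np.argsort(G, axis=1)[:, :k]
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]
        C = Z.T @ Z
        C += reg * max(np.trace(C), 1e-12) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    # Sample-space forms: S_b = phi(X) B phi(X)^T and S_w = phi(X) H phi(X)^T.
    u = np.full((N, 1), 1.0 / N)
    B = np.zeros((N, N))
    H = np.zeros((N, N))
    for c in np.unique(labels):
        ind = (labels == c).astype(float)[:, None]   # class indicator column
        Nc = ind.sum()
        p = Nc / N
        B += p * (ind / Nc - u) @ (ind / Nc - u).T
        H += (p / Nc) * (np.diagflat(ind) - ind @ ind.T / Nc)
    # Step 2: polynomial kernel matrix and the generalized eigenproblem for alpha.
    Kmat = (X.T @ X + 1.0) ** r
    T = Kmat @ (M - B + H) @ Kmat
    R = Kmat @ Kmat
    R += mu * np.trace(R) / N * np.eye(N)   # ridge: K K is often singular
    L_inv = np.linalg.inv(np.linalg.cholesky(R))
    vals, U = np.linalg.eigh(L_inv @ T @ L_inv.T)
    alpha = L_inv.T @ U[:, 1:d + 1]   # same 2nd..(d+1)th rule as LLD Step 5
    # Step 3: Y = (A^phi)^T phi(X) = alpha^T K.
    return alpha.T @ Kmat, alpha
```

Projecting a new sample x would use the kernel column [K(x_1, x), ..., K(x_N, x)] in place of a column of Kmat, which is how the expansion (17) avoids ever forming φ explicitly.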

Application Analysis
4.1. Iris Dataset Dimensionality Reduction.
We evaluated the performance of the new approach on the iris plants database, which contains three species: Iris setosa, Iris versicolor, and Iris virginica. Each sample is described by four features: sepal length, sepal width, petal length, and petal width. Each species has 50 samples, which we divided equally into two parts, named dataset1 and dataset2, so each class has 25 samples in each dataset. For comparison, KPCA, NPP, and LLD were also applied in this section. A polynomial kernel function with parameter 25 was employed in KLLD and KPCA, and the number of nearest neighbors was set to 14 in NPP, LLD, and KLLD. The results of the iris dataset dimensionality reduction are shown in Figure 2. As Figures 2(a)-2(d) show, KPCA and NPP can hardly discriminate the three species, because both reduce dimensionality to describe the data rather than to separate classes; that is, they aim to preserve information during dimensionality reduction. Figures 2(e)-2(h) show the results of the LLD and KLLD methods. In Figures 2(e)-2(f), points representing different species overlap, because LLD is a linear method. As shown in Figures 2(g)-2(h), KLLD discriminates the different species properly. To investigate classification accuracy, an SVM [20] is used as the classifier, with dataset1 as the training dataset and dataset2 as the testing dataset. Table 1 shows the SVM classification accuracy and illustrates that KLLD obtains the highest classification accuracy.

Rolling Bearing Fault Datasets Dimensionality Reduction Experiment
4.2.1. KLLD Experiment Process Design. The calculation steps are as follows.
Step 1. Collect the vibration signals of the rolling bearing.
Step 2. Construct the original feature datasets from wavelet packet energies.
Step 3. Obtain the projection matrix A^φ by KLLD.
Step 4. Obtain the dimensionality reduction result by Y = (A^φ)^T φ(X).
The KLLD dimensionality reduction process for the rolling bearing fault datasets is shown in Figure 3.

Wavelet Packet Energy Original Feature Construction.
Figure 4 shows the time-domain waveforms under normal and fault operating conditions.
The time-domain waveform of a bearing with an inner race fault contains typical shock components. The waveform of a normal bearing is stable, with small amplitude fluctuations. The waveform of a bearing with a rolling element fault contains random single impact components, while the waveform of a bearing with an outer race fault is very similar to that of the inner race fault. It is therefore hard to distinguish the fault conditions of the rolling bearing from the time-domain waveforms alone. Wavelet packet analysis is a precise signal analysis method and is currently widely used in bearing fault diagnosis, so we use it to construct the original features. The wavelet packet energies of typical faults are shown in Figure 5. After wavelet packet decomposition, the signals of different rolling bearing faults have significantly different energy amplitudes in the different frequency bands.
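As an illustration of how such band energies can be computed, here is a minimal wavelet packet energy sketch. The wavelet and decomposition depth are not specified in this section, so the Haar filter, three levels (eight bands), and normalization to relative energy are our assumptions.

```python
import numpy as np


def haar_split(x):
    """One orthonormal Haar analysis step: approximation and detail halves."""
    x = x[: len(x) // 2 * 2]                # drop a trailing odd sample
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def wavelet_packet_energy(signal, level=3):
    """Relative energy of the 2**level terminal nodes of a Haar wavelet
    packet tree, returned in natural tree order."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        # Unlike the plain wavelet transform, both halves are split again.
        nodes = [half for node in nodes for half in haar_split(node)]
    energy = np.array([np.sum(n ** 2) for n in nodes])
    return energy / energy.sum()
```

Each vibration segment would give one feature vector of 2**level band energies, and stacking these vectors as columns yields a feature matrix of the kind shown in Table 2. Note that the natural tree order of the nodes is not the same as ascending frequency order.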

To verify the validity of the KLLD method, experiments were performed on the rolling bearing vibration database of the Electrical Engineering Laboratory of Case Western Reserve University. We selected the SKF6203 bearing running at 1730 rpm under four conditions: normal, inner race fault, ball fault, and outer race fault. The signals were processed by wavelet packet decomposition, and two original feature datasets, named dataset1 and dataset2, were constructed. Each dataset contains 40 points. Table 2 shows the original features of dataset1.

Parameter Settings.
According to [21], the optimal embedding dimension of a manifold learning algorithm can be calculated by
$$d = c - 1, \tag{20}$$
where d is the optimal embedding dimension and c is the number of categories. This study considers four operating states of the rolling bearing, so the low-dimensional space has dimension 3 by (20); in addition, the distribution of the data can be clearly illustrated in three dimensions. The number of nearest neighbors k is one of the most important parameters in manifold learning algorithms: if it is too large, the small-scale structure of the manifold can be eliminated and the whole manifold becomes overly smooth; if it is too small, a continuous manifold may be divided into disjoint submanifolds. The residual variance can be defined as 1 − ρ²(D_X, D_Y), where D_X and D_Y are the Euclidean distance matrices of the points in X and Y, respectively, and ρ is the standard linear correlation coefficient. The smaller the residual variance, the better the high-dimensional dataset is embedded in the low-dimensional space [9]. Because LLE is the basic version of NPP, LLD, and KLLD, it can be used to determine the optimal k for the datasets. With dataset1 as the input X of LLE and k ranging from 2 to 39, the results are illustrated in Figure 6, and k = 10 is the optimal number of neighbors.
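The residual variance used to choose k can be computed as in the sketch below. Restricting the correlation to the upper-triangular pairs of the two distance matrices is our assumption; the text does not state whether the diagonal or duplicate pairs are included. The function names are hypothetical.

```python
import numpy as np


def pairwise_dist(Z):
    """Euclidean distance matrix between the columns of Z."""
    return np.sqrt(((Z[:, :, None] - Z[:, None, :]) ** 2).sum(axis=0))


def residual_variance(X, Y):
    """1 - rho^2 between the distance matrices of X (D x N) and Y (d x N)."""
    iu = np.triu_indices(X.shape[1], k=1)     # each pair counted once
    rho = np.corrcoef(pairwise_dist(X)[iu], pairwise_dist(Y)[iu])[0, 1]
    return 1.0 - rho ** 2
```

In the sweep described above, X would be dataset1 and Y the LLE embedding obtained for each k from 2 to 39, and the k with the smallest residual variance would be selected. Since ρ is scale invariant, an embedding that only rescales distances scores zero.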
Table 3 shows that distances between points of different classes are numerically large, whereas distances within the same class are small. To guarantee that each point has 10 neighbors within its own class, the connection distance is set to ε = 0.441 according to Table 3, and the number of neighbors is k = 10. A polynomial kernel function with parameter 35 is used in KLLD and KPCA. Dataset2 is processed in the same way as dataset1, and the KLLD algorithm is applied to both datasets. To evaluate the effectiveness of KLLD more precisely, the within-class distances in the low-dimensional space are calculated and shown in Table 4; they show that the proposed method has better clustering ability. To quantitatively evaluate separability, the ratio of the average between-class distance to the average within-class distance is calculated. Table 5 shows that this ratio is 4.38 × 10^11 for KLLD on dataset1, while it is less than 517.7 for the other methods; on dataset2, it is 186.4 for KLLD and less than 120.8 for the other methods. Finally, an SVM is used to calculate the classification accuracy on the low-dimensional datasets. The results in Table 6 show that the KLLD-SVM method can recognize each condition of the rolling bearing vibration signals.
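The separability measure reported in Table 5 can be sketched as the mean distance over pairs of points from different classes divided by the mean distance over distinct pairs from the same class. The exact averaging used in the paper is not given, so this pairwise definition and the function name are assumptions.

```python
import numpy as np


def separability_ratio(Y, labels):
    """Average between-class distance divided by average within-class
    distance for the columns of Y (distinct pairs only)."""
    Dm = np.sqrt(((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # ignore zero self-distances
    within = Dm[same & off_diag].mean()
    between = Dm[~same].mean()
    return between / within
```

When the embedded classes collapse to nearly single points, the denominator approaches zero and the ratio grows without bound, so very large values indicate extremely tight clusters rather than a proportionally larger gap between classes.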

Conclusions
A novel dimensionality reduction algorithm for discrimination, called kernel local linear discriminate (KLLD), has been proposed in this paper. The most prominent property of KLLD is the complete preservation of both the discriminant and the local geometrical structures in the data, whereas traditional dimensionality reduction algorithms cannot properly preserve the discriminant structure. We first applied our algorithm to the dimensionality reduction of the iris dataset; the experiment demonstrated that it extracts iris features that are suitable for classification. We then applied KLLD to machinery fault diagnosis. First, the original feature space of the rolling bearing datasets was constructed from wavelet packet energies. Second, KLLD and the other dimensionality reduction methods were each applied in the original feature space. Finally, an SVM was used for classification. The experiments show that our method has excellent clustering and dimensionality reduction capability.

Figure 3: Scheme of KLLD dimensionality reduction for the rolling bearing dataset.

4.2.5. Calculation and Discussion.
KPCA, NPP, and LLD are also applied for comparison to analyze the effectiveness of KLLD, with the data in Table 2 as the input matrix.

Figure 7 shows the distributions of the results in three-dimensional space. As shown in Figures 7(a)-7(b), inner race fault and ball fault overlap with each other; KPCA can hardly distinguish the different classes of points during dimensionality reduction. Figures 7(c)-7(d) illustrate that normal and inner race fault can be distinguished; however, ball fault and outer race fault show slight aliasing, especially in dataset2. The same phenomenon can be seen in Figures 7(e)-7(f). The results of KLLD are shown in Figures 7(g)-7(h): the KLLD algorithm distinguishes the different classes of the rolling bearing datasets. Figure 7 suggests that KLLD retains the features that are sensitive for discriminating the different states.

Table 1: Comparison of four dimensionality reduction methods for iris dataset classification.

Table 2: Original features constructed by wavelet packet energy spectrums of rolling bearing dataset1.

Table 3: Distances between the points of rolling bearing dataset1.

Table 4: Comparison of within-class distances in the low-dimensional space using four dimensionality reduction methods.

Table 5: Ratio of between-class distance to within-class distance using four dimensionality reduction methods.

Table 6: Comparison of four dimensionality reduction methods for classification.