Casing Vibration Fault Diagnosis Based on Variational Mode Decomposition, Local Linear Embedding, and Support Vector Machine

To diagnose mechanical faults of a rotor-bearing-casing system by analyzing its casing vibration signal, this paper proposes a training procedure for a fault classifier based on variational mode decomposition (VMD), local linear embedding (LLE), and support vector machine (SVM). VMD is used first to decompose the casing signal into several modes, which are subsignals usually modulated by fault frequencies. Vibrational features are extracted from both the VMD subsignals and the original signal. LLE is employed to reduce the dimensionality of these extracted features and make the samples more separable. The low-dimensional data sets are then used to train the multiclass SVM, whose accuracy is tested by classifying the test samples. When the parameters of LLE and SVM are well optimized, the proposed method performs well on experimental data, showing its capacity for diagnosing casing vibration faults.


Introduction
The fault diagnosis of gas turbine engines is a major challenge to today's mechanical engineers due to the complexity of their interior structure. Displacement sensors sometimes cannot be installed inside the machine, so vibration signals are collected by accelerometers installed on the casing. But it is difficult to analyze casing signals because of the complex surrounding structure and the long transmission path from the inside to the outside [1]. The frequency components of casing acceleration signals can be various and complex. Hence, the feature extraction and fault classification of casing vibration faults are not the same as those designed for traditional rotating machineries.
To deal with the complex vibration signal of the rotor-bearing-casing system, signal decomposition can be used to extract time or frequency characteristics before further processing. In addition to traditional IIR and FIR filters, the wavelet transform is another widely used technique that can decompose and reconstruct signals. The wavelet transform can extract the time-frequency characteristics of a signal with variable resolution [2], but the selection of wavelets and their parameters is hard to decide [3]. Empirical mode decomposition (EMD) is an adaptive technique with high-frequency resolution [4] but still has disadvantages like the end effect and mode mixing. Since 1998, the application of EMD has been well developed, and many other adaptive decomposition methods have been put forward, like local mean decomposition (LMD) [5], intrinsic time-scale decomposition (ITD) [6], and variational mode decomposition (VMD) [7]. This paper employs VMD for signal decomposition.
Features of signals reflect the vibrational condition of the rotor-bearing-casing system, but too many features cause high-dimensionality difficulties during analysis. Dimensionality reduction is the way to handle multidimensional data, which reduces the complexity of calculation [8]. Many nonlinear dimensionality reduction techniques have been proposed in recent years, which have advantages over traditional linear techniques like principal component analysis (PCA) [9]. Local linear embedding (LLE) is a nonlinear technique proposed in 2000 [10] that has been widely used in many data processing problems. LLE is employed here for dimensionality reduction because of its high efficiency in preserving the local properties of data.
The neural network has shown a strong capacity for solving nonlinear problems [11], but it still has some disadvantages, like the existence of many local minima and the difficulty of choosing the number of hidden units. The support vector machine (SVM) is a breakthrough beyond neural networks, based on the development of statistical learning theory and structural risk minimization [12]. The original SVM algorithm was invented in 1963, and in 1992 it was improved into a nonlinear classifier by applying the kernel trick to maximum-margin hyperplanes [13].
In this paper, faults of the rotor-bearing-casing system are recognized by analyzing the casing vibration signal. Vibrational features in the time and frequency domains, along with those extracted by the VMD method, are used as the database for fault classification. The experimental data are obtained from the rotor-bearing-casing test system, where several kinds of mechanical faults can be simulated. Since the vectors consisting of features are multidimensional and complicated, they are dimensionally reduced through LLE to get a low-dimensional data set. The low-dimensional data set then serves as the training data of the multiclass SVM. The parameters of LLE and SVM are optimized, so that a capable classifier for faults of the rotor-bearing-casing system can be produced.

Variational Mode Decomposition.

VMD defines an Intrinsic Mode Function (IMF) as an amplitude-modulated, frequency-modulated signal

$$u_k(t) = A_k(t) \cos(\phi_k(t)),$$

where $A_k(t)$ is the instantaneous amplitude, $\phi_k(t)$ is the instantaneous phase, and the instantaneous frequency is $\omega_k(t) = \phi_k'(t)$. Three important signal processing tools are involved in the building of the VMD model: Wiener filtering, the Hilbert transform, and frequency mixing.
The goal of VMD is to decompose a real-valued input signal $f$ into $K$ subsignals (modes) $u_k$, each of which is mostly compact around a center pulsation $\omega_k$. The constrained variational model VMD builds [7] can be described as

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} u_k = f.$$

To construct the model, first compute the associated analytic signal of each mode $u_k$ by means of the Hilbert transform to obtain a unilateral frequency spectrum. Multiplying by $e^{-j\omega_k t}$ then shifts the mode's frequency spectrum to baseband. Finally, the bandwidth is estimated through the squared $L^2$-norm of the gradient.
The augmented Lagrangian $\mathcal{L}$ is introduced to convert the constrained variational problem into an unconstrained one:

$$\mathcal{L}(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle.$$

The solution to the original minimization problem now becomes finding the saddle point of the augmented Lagrangian $\mathcal{L}$ through a sequence of iterative updates of the subsignals and center pulsations. The algorithm is the alternate direction method of multipliers (ADMM), and the complete VMD updates in the frequency domain are [14]

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2},$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega},$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right),$$

with $\hat{u}_k^1$, $\omega_k^1$, and $\hat{\lambda}^1$ initialized as 0.
The IMF subsignals $u_k$ are the inverse Fourier transforms of $\hat{u}_k$.
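The ADMM updates above can be sketched in a few lines of NumPy. This is a simplified, one-sided-spectrum illustration (no signal mirroring, and a heuristic spread of the initial center pulsations), not the exact implementation used in the paper:

```python
import numpy as np

def vmd(f, K, alpha=2000.0, tau=0.1, tol=1e-7, max_iter=500):
    """Simplified VMD via the ADMM updates, on the one-sided spectrum."""
    T = len(f)
    freqs = np.fft.rfftfreq(T)                     # normalized frequencies, 0 .. 0.5
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    lam_hat = np.zeros(freqs.size, dtype=complex)  # Lagrangian multiplier
    omega = np.linspace(0.05, 0.45, K)             # initial center pulsations (heuristic)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # mode update: Wiener-filter-like shrinkage centered at omega[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # center pulsation update: spectral centroid of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # dual ascent on the reconstruction constraint sum_k u_k = f
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < tol:
            break
    u = np.fft.irfft(u_hat, n=T, axis=1)           # IMFs = inverse FFT of the mode spectra
    return u, omega
```

Decomposing a two-tone test signal with this sketch recovers center pulsations at the tone frequencies, mirroring the behavior described for the simulation example later in the paper.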

Locally Linear Embedding.

LLE assumes that each data point $x_i \in \mathbb{R}^D$ ($i = 1, \ldots, N$) lies on a locally linear patch of the underlying manifold, so it can be reconstructed from its $k$ nearest neighbors $x_{ij}$. The reconstruction weights minimize the error

$$\varepsilon(W) = \sum_{i=1}^{N} \Big| x_i - \sum_{j=1}^{k} w_{ij} x_{ij} \Big|^2$$

with the constraint $\sum_{j=1}^{k} w_{ij} = 1$. Zero-filling each weight vector $(w_{i1}, w_{i2}, \ldots, w_{ik})$ to an $N$-dimensional vector $W_i$ combines all the weights into an $N \times N$ matrix $W$. The low-dimensional outputs $Y = (y_1, \ldots, y_N)$, $y_i \in \mathbb{R}^d$, are then chosen to minimize the reconstruction error

$$\Phi(Y) = \sum_{i=1}^{N} \Big| y_i - \sum_{j} w_{ij} y_j \Big|^2.$$

Suppose $M = (I - W)^T (I - W)$, where $I$ is the $N \times N$ identity matrix. Then

$$\Phi(Y) = \operatorname{tr}\left( Y M Y^{T} \right),$$

where $\operatorname{tr}$ is the trace of a matrix, and the optimal embedding is given by the eigenvectors of $M$ corresponding to its smallest nonzero eigenvalues.
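As an illustration of how few parameters LLE needs, here is a minimal sketch using scikit-learn's implementation, where only the number of neighbors $k$ and the output dimension $d$ are chosen (the toy data set is hypothetical, not the paper's feature vectors):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Toy data: a 1-D curve (helix) embedded in 3-D, which LLE should flatten out.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 400))
X = np.column_stack([np.sin(t), np.cos(t), t])

# n_neighbors (k) and n_components (d) are the two LLE parameters to optimize
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)        # Y has shape (400, 2)
```

In the fault-diagnosis setting, `X` would be the matrix of extracted vibrational features and `Y` the low-dimensional training data passed on to the SVM.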

Support Vector Machine.
The main purpose of the support vector machine is to construct a hyperplane in space for classification, regression, or other tasks. When used for classification, the largest distance between the data points and the hyperplane means a good separation. If the samples are not linearly separable in the original space, it is proposed to map this space into a much higher-dimensional space, even an infinite-dimensional one, which may make the separation easier [16].
Suppose the training data points are $\{x_i, y_i\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^n$ are the input data and $y_i \in \{-1, 1\}$ are the binary class labels. Define the nonlinear function $\varphi(\cdot) : \mathbb{R}^n \to \mathbb{R}^{n_h}$, which maps the original space into a higher-dimensional feature space. When the hyperplane is $w^T \varphi(x) + b = 0$ and $w$ is its normal vector, the training data satisfy

$$y_i \left[ w^T \varphi(x_i) + b \right] \geq 1 - \xi_i, \quad i = 1, \ldots, N.$$

Thus the classifier takes the form

$$y(x) = \operatorname{sign}\left[ w^T \varphi(x) + b \right].$$

The problem of building the hyperplane then turns into the optimization problem

$$\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i$$

with the constraint condition above, where the $\xi_i$ are slack variables measuring the distance from misclassified samples to the hyperplane, acting as a punishment. The positive constant $C$ is the penalty coefficient: the bigger $C$ is, the greater the penalty on misclassification.
This is a convex optimization problem, and the Kuhn-Tucker theorem is used to translate it into the dual problem

$$\max_{\alpha} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,l=1}^{N} \alpha_i \alpha_l y_i y_l K(x_i, x_l)$$

with the constraint conditions $\sum_{i=1}^{N} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$. The mapping $\varphi(x)$ is not calculated in an explicit way but through the Mercer kernel $K(x_i, x_l) = \varphi(x_i)^T \varphi(x_l)$. Several kernels are often used:

(i) linear: $K(x, x_i) = x^T x_i$;
(ii) polynomial: $K(x, x_i) = (x^T x_i + 1)^d$;
(iii) RBF: $K(x, x_i) = \exp(-\|x - x_i\|^2 / \sigma^2)$;
(iv) MLP: $K(x, x_i) = \tanh(\kappa\, x^T x_i + \theta)$.
So the nonlinear SVM classifier in the dual space becomes

$$y(x) = \operatorname{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right].$$

This is the decision function of the optimized hyperplane. Its value without the sign function may also be used as an indicator in multiclass classification problems.
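A short sketch of a kernel SVM on a problem that is not linearly separable, using the polynomial kernel and the penalty coefficient $C$ from the formulation above (toy data, and scikit-learn's solver rather than a from-scratch dual optimization):

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary problem: points inside vs. outside a circle -- not linearly separable.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(int)

# A degree-2 polynomial kernel can represent the quadratic boundary;
# C is the penalty coefficient of the soft-margin formulation.
clf = SVC(kernel="poly", degree=2, C=10.0)
clf.fit(X[:300], y[:300])
acc = clf.score(X[300:], y[300:])   # held-out accuracy, close to 1.0
```

The `decision_function` of the fitted model is exactly the indicator value (the expression inside the sign function) used later for one-versus-rest multiclass voting.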

Simulation Signal Analysis Using VMD
In a rotor-bearing-casing structure, resonances are often excited by the harmonic frequencies of the shaft and other forms of vibration. A large amount of information, including fault features, is buried in the resonances. The High-Frequency Resonance Technique (HFRT) is a commonly used method to extract vibrational parameters from modulated signals [17], but HFRT selects resonant frequency bands in empirical ways. It is impractical to select a frequency band for each sample when there are too many samples. VMD helps to decompose the signal into a fixed number of intrinsic modes automatically.
Here is an example of a simulation signal, showing the effect of VMD in separating modes. Suppose the shaft vibration $x_r$ consists of the rotational frequency together with its subharmonic and harmonic frequencies. Two resonances of the casing are excited by the shaft vibration, with center frequencies set to 1000 Hz and 2400 Hz. So the casing signal $s$ is the sum of the shaft components, the two resonance responses, and white Gaussian noise wgn($t$).
Here amp = [1.5, 3, 2.5, 2, 1.5, 1, 0.5] and freq = [0.5, 1, 2, 3, 4, 5, 6] (multiples of the rotational frequency). Figure 1 is the time waveform of $s$ and its frequency spectrum, where components of the shaft vibration and the structural resonances are clear to see. Thus there are 3 so-called modes in the signal, and the purpose of VMD here is to separate these 3 modes. Figure 2 shows the output IMF subsignals and their frequency spectrums: the main frequency components remain in the 3 IMFs, and noise not close to these 3 modes is filtered out. For comparison, the signal is also decomposed by empirical mode decomposition (EMD). Figure 3 shows the time waveforms of the EMD subsignals (IMFs) and the frequency spectrums of the first 7 IMFs. With the stop criterion set to the value recommended by Rilling et al. [18], 13 modes are decomposed, most of which are not needed. Other stop criterions were tried, but the mode number did not decrease to 3. Figure 4 shows the center frequencies changing during the VMD iteration. After convergence, the 3 center frequencies are 0.0076, 0.1248, and 0.3002 relative to the sampling frequency of 8000 Hz (60.8 Hz, 998.4 Hz, and 2401.6 Hz), very close to the center frequencies set before the simulation. Without white Gaussian noise in the simulated signal, the center frequencies from VMD would be 56 Hz, 1000 Hz, and 2400 Hz. This example demonstrates VMD's remarkable ability to deal with casing vibration signals.
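A signal of the same general shape can be synthesized as below. The rotational frequency and the form of the resonance modulation are assumptions chosen for illustration, since the paper's exact expression is not reproduced here:

```python
import numpy as np

fs, n = 8000, 8000                # 8000 Hz sampling, 1 s of data
t = np.arange(n) / fs
fr = 28.0                         # rotational frequency in Hz -- an assumption, not given in the text
amp = [1.5, 3, 2.5, 2, 1.5, 1, 0.5]
mult = [0.5, 1, 2, 3, 4, 5, 6]

# shaft vibration: subharmonic, rotational, and harmonic components
x_shaft = sum(a * np.sin(2 * np.pi * m * fr * t) for a, m in zip(amp, mult))

# two casing resonances, modeled here as carriers at 1000 Hz and 2400 Hz,
# amplitude-modulated by the rotational frequency (the exact modulation is assumed)
res1 = (1 + 0.5 * np.sin(2 * np.pi * fr * t)) * np.sin(2 * np.pi * 1000 * t)
res2 = (1 + 0.5 * np.sin(2 * np.pi * fr * t)) * np.sin(2 * np.pi * 2400 * t)

rng = np.random.default_rng(0)
s = x_shaft + res1 + res2 + 0.3 * rng.standard_normal(n)   # add white Gaussian noise
```

The spectrum of `s` shows the low-frequency shaft components plus two clusters around 1000 Hz and 2400 Hz, the three "modes" that VMD is asked to separate.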

Experimental System and Data
The rotor-bearing-casing coupled vibration experiment system is designed to simulate vibration faults and to study the transmission characteristics of the rotor-bearing-casing system in a gas turbine engine. Figure 5 shows the structural drawing and a photo of the test rig. Although the structure of the test section is relatively simple, it has a complete transmission path from rotor to casing. The test rig therefore cannot fully represent a gas turbine engine but shares many similar characteristics. Five different types of mechanical faults are simulated on this test rig: unbalance, misalignment, supporting looseness, bearing looseness, and rub-impact. Together with the normal state, there are 6 running conditions in total. The acceleration signals are obtained at a sampling frequency of 8000 Hz with the shaft rotating at speeds near 1500 rpm and 1800 rpm. The signals are divided into time slices of 0.5 s as samples for further analysis. Figure 6 shows the time waveform and frequency spectrum of one signal sample for each running condition at the rotating speed of 1800 rpm.
There are 56 samples for each condition, 28 at 1500 rpm and 28 at 1800 rpm, so there are 336 samples in total. Some of them are chosen as the training data and the rest are the testing data.

Vibrational Feature Extraction of Experimental Data
As the purpose of fault diagnosis is to distinguish vibration conditions from each other, more significant features extracted from the vibration signal usually mean a better chance of diagnosing the faults correctly. Different types of features can be extracted from a signal sample.
The most common features of a signal sample are parameters in the time domain and frequency domain. Features in the time domain include the time waveform and amplitude parameters. In this case, 12 different amplitude parameters are calculated, whose formulas are shown in Table 1. They reflect the time domain characteristics of the waveform in many ways. Faults make changes to the distribution of these parameters, but not enough changes to completely distinguish the faults from each other.
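A representative set of such amplitude parameters can be computed as below. Table 1 itself is not reproduced here, so this list is typical of the genre rather than the paper's exact 12 parameters:

```python
import numpy as np

def amplitude_features(x):
    """Common time-domain amplitude parameters of a signal sample (a representative
    set; the paper's Table 1 may differ in its exact choice of 12 parameters)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    abs_mean = np.mean(np.abs(x))
    sra = np.mean(np.sqrt(np.abs(x))) ** 2          # square-root amplitude
    return {
        "mean": mean, "std": std, "rms": rms, "peak": peak,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest": peak / rms,                        # crest factor
        "shape": rms / abs_mean,                    # shape factor
        "impulse": peak / abs_mean,                 # impulse factor
        "clearance": peak / sra,                    # clearance factor
    }
```

For a pure sine wave, for instance, the crest factor evaluates to $\sqrt{2}$ and the kurtosis to 1.5, which is a quick sanity check on the formulas.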
The frequency spectrum displays the frequency domain components of a signal. Mechanical faults are often accompanied by changes in the frequency components of the vibration signal. Common rotor faults, like unbalance and misalignment, may bring significant changes in the low-frequency part of the rotor's vibration signal, mostly at the rotational frequency and its harmonic frequencies. These changes in the low-frequency part may be transferred from rotor to casing. Thus, the amplitudes of the first 12 harmonic frequencies are chosen as features in the frequency domain. The higher-order parts are left to the VMD method.
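Reading the harmonic amplitudes off the spectrum can be sketched as a nearest-bin lookup (a simplification that ignores spectral leakage between bins):

```python
import numpy as np

def harmonic_amplitudes(x, fs, fr, n_harm=12):
    """Amplitudes at the first n_harm multiples of the rotational frequency fr,
    taken from the single-sided FFT amplitude spectrum (nearest-bin sketch)."""
    spec = np.abs(np.fft.rfft(x)) * 2 / len(x)       # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.array([spec[np.argmin(np.abs(freqs - k * fr))]
                     for k in range(1, n_harm + 1)])
```

Applied to a signal containing a 30 Hz component of amplitude 2 and a 60 Hz component of amplitude 1, the first two entries of the returned vector recover those amplitudes.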
Further feature extraction is based on the VMD outputs [19, 20]. Figure 8 shows the time waveforms, frequency spectrums, and envelope spectrums of the IMFs from one signal sample. Envelope spectrum amplitude values at the rotational frequency and its 2nd to 10th harmonic frequencies of each subsignal are chosen as features of a sample. When the harmonic order is larger than 10, the envelope spectrum components are not very clear, so they are not used.
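The envelope spectrum of a subsignal can be obtained with the Hilbert transform, as sketched below:

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs):
    """Envelope spectrum of a (narrowband) subsignal: take the magnitude of the
    analytic signal as the envelope, then the amplitude spectrum of the envelope."""
    env = np.abs(hilbert(x))                 # demodulated envelope
    env = env - env.mean()                   # remove the DC component before the FFT
    spec = np.abs(np.fft.rfft(env)) * 2 / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1 / fs)
    return freqs, spec
```

For a carrier amplitude-modulated at 30 Hz, the envelope spectrum shows its main peak at 30 Hz, which is exactly how a fault frequency modulating a resonance-band IMF would appear.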
Different faults excite the resonances of different parts of the whole structure, so the center pulsations $\omega_k$ of one condition differ from those of the others. The values of the center pulsations may therefore also serve as one type of feature for casing fault diagnosis. For example, Figure 8(d) is the plot of center pulsations for all the training data related to unbalance and rub-impact, where the difference between these two groups is quite clear. The x-axis of the plot corresponds to the 6 IMFs of the signal sample. The center frequencies of the first 3 modes of unbalance seem close to those of rub-impact, but the last 3 modes are clearly different.

Fault Classification Using SVM
Traditional SVM can only do binary classification. There are several methods for multiclass classification based on binary SVMs, like one-versus-rest (1-v-r), one-versus-one (1-v-1), hierarchical support vector machines (H-SVMs), and Directed Acyclic Graph SVMs (DAG-SVMs) [21]. In this paper, the one-versus-rest and H-SVMs methods are used and compared. (ii) H-SVMs. By building binary SVMs, all the categories are divided into two subclasses, and each subclass is divided again until every subclass contains only one category. In this process, $N-1$ binary SVMs are built in total for $N$ categories. Figure 9 shows the procedure of training and testing an SVM classifier. There are 6 classes, whose labels are listed in Table 2. Table 3 is an example showing the feature parameters of a sample. Vibrational features in the amplitude domain and frequency domain are included, as well as envelope spectrum parameters of the VMD subsignals and center pulsations of the VMD modes. Only the first 2 IMFs are used to obtain the VMD-envelope parameters, since the high-frequency subsignals contain too much noise, which would interfere with the fault classification. All these parameters can be part of an input vector used in the SVM training. Because the vibration data are collected under different rotating speeds, the rotational frequency of the main shaft is also included in every input vector used in the training.
Different types of features are used as the training data of the multiclass classifier. Each type is used individually at first, and both the one-versus-rest and H-SVMs methods give a classification result. The polynomial kernel is chosen as the kernel function of the SVM because its performance is not worse than that of other kernel functions and it has only one discrete kernel parameter, which is easy to optimize. Polynomial orders 1 to 7 are tested, and the one giving the best classification result is taken as the kernel parameter.
Half of all the samples are selected as training data and the rest are testing data (168 out of 336). The training data are randomly selected 10 times, and each time the training and testing yield a classification accuracy. The mean accuracy of these 10 runs is used as the accuracy of the classifier with this specific input type. The closer the accuracy is to 100%, the better the classification. The results below are all average accuracies of 10 classifications.
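The repeated random-split evaluation can be sketched as follows; the synthetic features here stand in for the experimental data, which are not available in this text:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 6 classes x 56 samples, 8 features each (hypothetical, well separated).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.6, (56, 8)) for c in range(6)])
y = np.repeat(np.arange(6), 56)

# 10 random 50/50 splits; the mean held-out accuracy is the reported score.
accs = []
for seed in range(10):
    Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.5,
                                          stratify=y, random_state=seed)
    clf = SVC(kernel="poly", degree=3, C=10.0).fit(Xtr, ytr)
    accs.append(clf.score(Xte, yte))
mean_acc = float(np.mean(accs))
```

Averaging over several random splits reduces the variance that any single 168/168 split would introduce into the reported accuracy.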
Then all these features are combined into one single input vector. The classification results for different training data are listed in Table 4. SVMs trained by a single type of feature do not yield satisfying results, but when the combined features are used, the results improve. In most cases, the one-versus-rest method performs better than the H-SVMs method. To compare with the performance of SVM, two kinds of artificial neural networks, the radial basis function neural network (RBF) [22] and the probabilistic neural network (PNN) [23], are also tried as the classification algorithm. The results of RBF and PNN are in Table 4 as well, which shows that in most cases SVM has better results than PNN, and PNN has better results than RBF.

Dimensionality Reduction Using LLE
There are many elements in the feature vector, and some of them show little variance across categories. Dimensionality reduction (DR) methods can reduce the dimensionality and simplify the input vectors. The first choice for dimensionality reduction is Principal Component Analysis (PCA), a linear dimensionality reduction method of simple theory but high efficiency, widely used to decrease the size of feature vectors. Figure 10 is the flow chart of how PCA and SVM are combined to realize the training and testing [24]. The loadings of PCA gained from the training data also map the testing data into the dual space, where the first several components keep most of the information. During the training of PCA there are not many parameters to select, for only the number of dimensions after PCA needs to be decided. Classification results with PCA are shown in Table 4.
PCA is sensitive to the relative scaling of the original variables. Doing PCA before SVM training means a smaller feature vector but may not lead to much improvement in the SVM's classification accuracy, as seen from the results. The performance without PCA is equivalent to, and sometimes even better than, the one with PCA. One explanation is that, since the training of SVM already involves an optimization procedure, PCA, as a linear dimensionality reduction method, may not bring any improvement when the kernel parameters and other parameters are already well selected.
To improve the performance of the classifier, the PCA used above can be replaced with nonlinear dimensionality reduction approaches, like the local linear embedding (LLE) introduced in this paper. The flow chart is updated accordingly in Figure 11. Although LLE is a nonlinear dimensionality reduction method, it does not have many parameters: in addition to the number of dimensions, the number of nearest points, $k$, is the other parameter to select during the optimization. The combined features are used as input vectors here. The accuracy results in Table 5 show that LLE not only reduces the dimension of the input data but also improves the classification performance of the SVM. Using the two types of multiclass SVM, the classification accuracies both reach 97%. It may seem strange that after dimensionality reduction the kernel function of the SVM maps the data into a higher-dimensional space again. But, like the nonlinear kernel function that makes the separation of different categories much easier, this nonlinear DR method can make the data sets more separable if the parameters are well optimized. These two methods may seem contradictory but work well together in fault classification problems, like the casing fault diagnosis problem in this paper.
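The joint selection of the two LLE parameters by classification accuracy can be sketched as a small grid search. The data and the search ranges below are assumptions for illustration, not the paper's actual features or grid:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 6 classes x 56 samples with 20 hypothetical features each.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 1.0, (56, 20)) for c in range(6)])
y = np.repeat(np.arange(6), 56)

# Grid over neighbors k and output dimension d, scored by SVM accuracy
# on a held-out half of the embedded samples.
best = (0.0, None)
for k in (8, 12, 16):
    for d in (3, 5, 8):
        Y = LocallyLinearEmbedding(n_neighbors=k, n_components=d,
                                   eigen_solver="dense").fit_transform(X)
        Ytr, Yte, ytr, yte = train_test_split(Y, y, train_size=0.5,
                                              stratify=y, random_state=0)
        acc = SVC(kernel="poly", degree=3).fit(Ytr, ytr).score(Yte, yte)
        if acc > best[0]:
            best = (acc, (k, d))
```

Note that fitting LLE on all samples before splitting is a transductive shortcut used here only to keep the sketch short; a stricter protocol would embed the test data via an out-of-sample extension.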
Another important point is that the classifiers here are trained by vibration data under two rotating speeds. The results are still good, which means that the procedure is not very sensitive to rotating speed. Some other nonlinear DR methods do not have the same performance on this problem. For example, Isomap is similar to LLE but more likely to capture global properties of the data, which makes it more sensitive to short-circuiting [25]. The results show that with Isomap as the DR process, the performance of the classifier declines a little. Some other DR methods were tried and their results are also listed, including Laplacian Eigenmaps, Locality Preserving Projection (LPP), and Neighborhood Preserving Embedding (NPE) [26, 27]. These other DR methods do not suit this problem as well as LLE.
The selected number of training samples has a certain influence on the classification results. Previously, half of the samples were used as training data. The number of training samples is then decreased to 120 and increased to 216, and the classification results are in Table 6. More training samples lead to higher classification accuracy. Here, as the selected proportion of all the samples changes from 5/14 to 7/14 to 9/14, the accuracy increases by no more than 2% each time. Although the influence of the sample number is relatively small when the classification method is well chosen, more training data are preferred since the goal of classification is always 100%.

Discussion
The fault diagnosis process based on vibration signals can be simply described in 3 steps: feature extraction, feature optimization, and fault classification. The main techniques employed in this paper, namely variational mode decomposition, local linear embedding, and support vector machine, correspond to these three steps. Vibrational features can be extracted from many aspects, including statistical properties, low-frequency characteristics, and high-frequency resonance characteristics. Figure 12 shows the flow chart of the whole procedure.
The time waveforms and frequency spectrums have shown the complexity of the experimental signals, even though the mechanical structure of the test rig is relatively simple. The vibration signal of a real rotor-bearing-casing system must be more complex, but the fault diagnosis procedure is similar. Vibrational features can likewise be extracted from the time domain, the frequency domain, and the VMD output. Parameters like the VMD mode number and the harmonic order should be adjusted according to the real structure and its characteristics. However, when some special faults happen, like bearing faults and gearbox faults, additional analysis is necessary for more detailed features; otherwise the faults are hard to detect. More signal preprocessing is required because of the greater influence of the inner structure. And for real systems the data amount will be much larger, so the classifier training will be quite complicated, since the diagnosis procedure involves optimization of several important parameters.

Conclusion
This paper presents a procedure for training a mechanical fault classifier for the rotor-bearing-casing system using casing vibration signals. The database of signals used here was obtained from the casing of the rotor-bearing-casing test rig. The test section of the rig is considered a rotor-bearing-casing system simplified from part of a gas turbine engine, so this procedure can be a reference for mechanical fault diagnosis of gas turbine engines and other rotating machineries with similar structures.
Instead of the High-Frequency Resonance Technique, VMD is used here to decompose the casing signal into several modes, which are usually subsignals modulated by fault frequencies. The amplitude values at the rotational frequency and its harmonic frequencies on the envelope spectrums of the modes are chosen as part of the features, along with the center frequencies of the modes calculated by VMD and the feature parameters in the time and frequency domains. LLE is used to reduce the dimensionality of the feature vectors, and then the multiclass SVM is trained by the low-dimensional data. Compared with PCA, the linear DR method, and the other nonlinear DR methods, LLE is capable not only of reducing the dimensionality but also of making the data samples more separable if its parameters are well chosen. The classification reaches an accuracy of 98% on the experimental data when the amplitude, frequency, and VMD features are combined, LLE is used for DR, and the SVM is then trained. Considering the complexity and randomness of the experimental data, this result shows that the procedure proposed here is a promising alternative for fault classifiers.

Figure 1: Time waveform (a) and frequency spectrum (b) of the simulation signal.

Figure 2: VMD subsignals (a) and their frequency spectrums (b) of the simulation signal.
(i) One-versus-Rest. Each training round chooses one category as one class and all the other categories as the other class, so $N$ binary SVMs are built for $N$ categories. When classifying a sample, each SVM gives the value of its indicator function, and the sample is assigned to the class whose SVM has the biggest indicator.
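The one-versus-rest scheme described above can be sketched directly: one binary SVM per class, with the decision-function value as the indicator (scikit-learn used for the binary SVMs; toy data for the demonstration):

```python
import numpy as np
from sklearn.svm import SVC

def ovr_train(X, y, **svc_kwargs):
    """Train one binary SVM per class: class c vs. all the rest."""
    return {c: SVC(**svc_kwargs).fit(X, (y == c).astype(int))
            for c in np.unique(y)}

def ovr_predict(models, X):
    """Assign each sample to the class whose SVM gives the largest indicator
    (decision-function value), as described above."""
    classes = sorted(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]
```

Using the raw decision-function value, rather than the sign, is what allows the ties between the $N$ binary classifiers to be resolved.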

Figure 9: Flow chart of fault diagnosis using SVM.

Table 3: An example of feature parameters.

Table 4: Classification results with different training data.

Table 5: Classification results with different DR methods.

Table 6: Classification results with different selected numbers of training samples.