Fault Diagnosis for Engine Based on Single-Stage Extreme Learning Machine

A Single-Stage Extreme Learning Machine (SS-ELM) is presented in this paper to address mechanical fault diagnosis. Based on it, the traditional mapping stage of the extreme learning machine (ELM) is changed: the eigenvectors extracted by signal processing methods are directly regarded as the outputs of the network's hidden layer. The uncertainty introduced when training data are transformed from the input space to the ELM feature space by the random ELM mapping, as well as the problem of selecting the number of hidden nodes, is thereby avoided effectively. Experimental results on diesel engine fault diagnosis show the good performance of the SS-ELM algorithm.


Introduction
As representative equipment, the engine is a general power source, and its safety and reliability are very important. In equipment fault diagnosis, reciprocating machinery faults are among the most difficult cases. To address this problem, piezoelectric pressure sensors, accelerometers, and sound sensors are widely used to measure signals from the engine. Faults are defined as deviations from normal behavior in the plant. Because the engine's working conditions are harsh and its structure is complex, engine fault signals are nonstationary and nonlinear. It is difficult to extract a threshold that clearly reflects the fault characteristics, so methods combining signal processing and intelligent pattern recognition are used to carry out fault diagnosis. First, signal processing methods, such as frequency spectrum analysis, the wavelet transform, the Hilbert-Huang transform, and mathematical morphology, are used to denoise the measured signals and extract a feature vector that broadly reflects the fault characteristics and types. Second, intelligent pattern recognition methods, such as artificial neural networks and support vector machines (SVM), are used to map the feature vector into a higher dimensional feature space and classify the fault mode based on iterative optimization or statistical learning.
The combination of signal processing and intelligent pattern recognition solves the puzzle of mechanical fault diagnosis to a certain extent. However, many parameters need to be tuned to achieve a preferable fault classification rate when the recognition algorithm is applied, and the computational complexity and computing time of the recognition also limit its effective realization in embedded systems. As a novel learning algorithm, ELM was proposed recently by Huang et al. [1] for single hidden layer feedforward neural networks (SLFNs). Different from gradient-descent-based methods, ELM randomly chooses the input weights (linking the input layer to the hidden layer) and hidden biases, and the output weights (linking the hidden layer to the output layer) are analytically determined using the Moore-Penrose generalized inverse instead of being tuned iteratively. Experimental results show that the learning speed of ELM can be thousands of times faster than that of gradient-descent learning algorithms, so it has received wide application, such as fault prognosis of mechanical components [2], fault classification in series compensated transmission lines [3], fault diagnosis on a hydraulic tube tester [4], and computer aided diagnosis systems [5].
However, because of the random mapping from the input space to some feature space, the numerical stability of the output has generally been ignored. Moreover, random selection of the input weights and biases results in a large number of hidden units, which consumes much computing time.
To overcome these shortcomings, many improved ELM algorithms have been investigated recently [6][7][8][9]. Zhao et al. [10] proposed an input weight selection algorithm for an ELM with linear hidden nodes to alleviate the ill-conditioning problem. Huynh and Won proposed the Least Squares Extreme Learning Machine (LS-ELM) [11], the Regularized Least Squares Extreme Learning Machine (RLS-ELM) [12], and the SVD-Neural classifier [13]. The Projection Vector Machine was proposed by Deng et al. [14] for high-dimension small-sample data. Although these improved algorithms have enhanced the numerical stability of the ELM output to a certain extent, some output fluctuations still exist. For applications with high security requirements, an unstable judgment may cause fatal safety accidents, for example in engine fault diagnosis, industrial process control, control of chaotic systems, and operation condition monitoring of hydroelectric generating sets [15][16][17][18]. So, in order to introduce ELM to engine fault diagnosis effectively, we propose the Single-Stage Extreme Learning Machine (SS-ELM) in this paper. First, the eigenvectors extracted by signal processing methods are directly regarded as the SS-ELM network's hidden layer output matrix. Second, the Moore-Penrose generalized inverse is used to calculate the output weights. The method therefore has lower computational complexity and can be transplanted to embedded systems. Experimental results show that this approach is feasible for identifying engine faults.
The rest of this paper is organized as follows. Section 2 describes the extreme learning machine algorithm and its shortage on classification. The Single-Stage Extreme Learning Machine is presented in Section 3. Experimental results and analysis on engine fault diagnosis are given in Section 4. Finally, the conclusion is drawn in Section 5.

Extreme Learning Machine and Its Shortage on Classification

Given N distinct training samples (x_j, t_j), where x_j = [x_j1, x_j2, ..., x_jn]^T ∈ R^n is the input and t_j = [t_j1, t_j2, ..., t_jm]^T ∈ R^m is the desired output, SLFNs with Ñ hidden nodes and activation function g(x) can be mathematically modeled as

∑_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = o_j, j = 1, ..., N, (1)

where w_i = [w_i1, w_i2, ..., w_in]^T is the input weight vector connecting the input neurons and the i-th hidden node, b_i is the threshold of the i-th hidden node, β_i = [β_i1, β_i2, ..., β_im]^T is the weight vector connecting the i-th hidden node with the output neurons, o_j is the real output vector of the SLFN, and w_i · x_j is the scalar product of w_i and x_j. The main aim of the training process is to minimize the following error function by adjusting the network parameters w_i, b_i, and β_i:

E = ∑_{j=1}^{N} ‖o_j − t_j‖. (2)
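As a minimal numerical sketch of the SLFN model in (1) and the error function in (2) (the symbols follow the formulation above; NumPy and the array shapes are our own illustrative choices, not part of the paper):

```python
import numpy as np

def slfn_forward(X, W, b, beta, g=np.tanh):
    """SLFN forward pass, eq. (1).
    X: (N, n) inputs, W: (n_hidden, n) input weights w_i,
    b: (n_hidden,) hidden thresholds b_i, beta: (n_hidden, m) output weights,
    g: hidden-node activation function."""
    H = g(X @ W.T + b)   # hidden layer outputs g(w_i . x_j + b_i), shape (N, n_hidden)
    return H @ beta      # network outputs o_j, shape (N, m)

def training_error(O, T):
    """Sum of per-sample output error norms, eq. (2)."""
    return np.sum(np.linalg.norm(O - T, axis=1))
```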

Extreme Learning Machine.
Traditionally, a gradient-descent algorithm is used to train SLFNs, in which the parameter set W is iteratively tuned by

W_k = W_{k−1} − η (∂E(W)/∂W), (3)

where W consists of the parameters w_i, b_i, and β_i and η denotes the learning rate. As a popular gradient-descent training algorithm for feedforward neural networks, the back-propagation (BP) learning algorithm has been used in various fields; its parameters are adjusted by propagating the error from the output layer back to the input layer. However, such algorithms clearly suffer from a slow learning rate, are prone to overlearning, and may stop at a local minimum.
Recently, an effective training algorithm for SLFNs, called ELM, was proposed by Huang et al. [1]. According to Huang and Babri [19], SLFNs with at most N hidden nodes and almost any nonlinear activation function can exactly learn N distinct observations. So if a standard SLFN with Ñ hidden nodes can approximate these N distinct observations with zero error, there exist β_i, w_i, and b_i such that

∑_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j, j = 1, ..., N. (4)

Equation (4) can be written compactly as

Hβ = T, (5)

where

H = [g(w_i · x_j + b_i)]_{N×Ñ}, β = [β_1, ..., β_Ñ]^T, T = [t_1, ..., t_N]^T. (6)

As proposed by Huang et al. [1], H is the hidden layer output matrix. The parameters w_i and b_i (input weights and biases) may simply be assigned random values and need not be adjusted during the training process. Then (5) becomes a linear system; because the matrix H may not always be square, the smallest norm least-squares solution of the network is estimated as

β̂ = H† T, (7)

where H† is the Moore-Penrose generalized inverse of matrix H.
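The two-step ELM training recipe just described (random input weights and biases, then an analytic output-weight solve via the Moore-Penrose inverse) can be sketched as follows; this is our own NumPy illustration, with function names and sizes chosen for clarity rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden, g=np.tanh):
    """ELM training: input weights and biases are drawn at random and
    never tuned; only the output weights are solved for analytically."""
    n = X.shape[1]
    W = rng.uniform(-1, 1, (n_hidden, n))   # random input weights w_i
    b = rng.uniform(-1, 1, n_hidden)        # random hidden biases b_i
    H = g(X @ W.T + b)                      # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T            # beta = H† T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta, g=np.tanh):
    return g(X @ W.T + b) @ beta
```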

Shortage of Extreme Learning Machine on Classification.
Although the ELM algorithm surmounts many puzzles that beset traditional gradient-descent approaches, such as an improper learning rate, overlearning, and local minima, the random selection of input weights and biases may lead to an ill-conditioned problem, so that the output of the network becomes numerically unstable [10]. On the other hand, in order to obtain a preferable classification rate, the number of hidden nodes has to be increased substantially, which usually increases the complexity of the network and the training time considerably.
In order to verify the problem of ELM with random mapping, we chose the Page Blocks dataset from the UCI Machine Learning Repository [20]. The initial number of hidden nodes is set to 50 and the increment for each simulation is set to 5. The sigmoidal function is chosen as the activation function; the input weights and biases are determined randomly. The training and testing accuracy of ELM with respect to the initial network parameters and the number of hidden nodes on the Page Blocks dataset are shown in Figures 1 and 2. The training and testing times of ELM with different numbers of hidden nodes are shown in Figures 3 and 4. The simulations are carried out in the MATLAB 7.11.0 environment running on an AMD 2.2-GHz CPU with 1 GB RAM.
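The instability being probed here is easy to reproduce on toy data (this sketch uses synthetic two-class data of our own, not the Page Blocks dataset, and a small tanh network; only the random input weights and biases change between runs):

```python
import numpy as np

# Toy illustration: train the same ELM several times with different random
# input weights and observe that the test accuracy fluctuates even though
# the data, node count, and activation function are all held fixed.
rng_data = np.random.default_rng(42)
X = rng_data.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # simple synthetic labels
T = np.eye(2)[y]                          # one-hot targets
Xtr, Ttr = X[:150], T[:150]
Xte, yte = X[150:], y[150:]

accuracies = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (20, 4))       # random input weights
    b = rng.uniform(-1, 1, 20)            # random biases
    H = np.tanh(Xtr @ W.T + b)
    beta = np.linalg.pinv(H) @ Ttr        # analytic output weights
    pred = np.argmax(np.tanh(Xte @ W.T + b) @ beta, axis=1)
    accuracies.append(float(np.mean(pred == yte)))

print(accuracies)  # the list varies with the random seed
```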
The simulation results show that different outputs of ELM were obtained with the same hidden nodes and activation function because of the random selection of the input weights and biases. Moreover, the generalization performance of ELM depends on the proper selection of the input parameters, but it is difficult to search for the best network parameters. ELM usually requires a large number of hidden nodes to achieve a preferable classification accuracy, which results in a slow response of the trained networks. According to Figure 3, the training time of ELM increases approximately exponentially with the number of hidden nodes. In addition, the large number of hidden nodes also requires a large memory to store the parameters. These problems seriously restrict ELM's applications in embedded systems.
For an ill-conditioned system, the change in the final solution may be large even if the change in the output of ELM is small, so numerical stability is an especially important aspect of a fault diagnosis system. Equipment would be maintained in a lagging manner, and excess maintenance caused by underreporting and misjudgment of the fault mode would result in serious accidents and a larger financial burden. So ELM must be improved before it can be applied in the field of fault diagnosis, because of the disturbance of its output.

Single-Stage Extreme Learning Machine
The ELM algorithm can classify multidimensional data extracted from time domain measured signals, but the signal obtained from a transducer is a time series that cannot be imported into the ELM network directly. In other words, ELM uses three steps to accomplish classification of a time series. First, feature extraction methods map the time series into a feature space. The second step maps the eigenvector from the d-dimensional feature space into the N-dimensional hidden layer space using the random input weights and hidden layer biases. The third step analytically determines the output weights by computing the pseudoinverse of the hidden layer output matrix. As discussed above, the random mapping results in perturbation of the output, and a good classification performance requires many more hidden nodes. In addition, the transformation of the feature vectors increases the input dimension, because the number of hidden nodes is always larger than the number of input nodes. In order to solve these problems, we simplify the network structure by regarding the feature vectors as the output of the hidden layer; the original network then becomes a network with only two layers, the original random mapping is avoided, and the original training process is simplified. We therefore call the improved structure the Single-Stage Extreme Learning Machine (SS-ELM). In SS-ELM, the input matrix is equal to the hidden layer output matrix in ELM, and the output weights can be calculated by a pseudoinverse operation on the input matrix. The sketch maps of ELM and SS-ELM are shown in Figures 5 and 6.
Given a set of training data {(x_j, t_j)}_{j=1}^{N}, where x_j ∈ R^d and t_j ∈ R^m are the j-th input data and its desired output, the input matrix X can be defined by X_{N×d} = [x_1, x_2, ..., x_N]^T and the desired output matrix T by T_{N×m} = [t_1, t_2, ..., t_N]^T. The output weight of the network based on SS-ELM can then be obtained by

β = X† T, (8)

where X† is the Moore-Penrose generalized inverse of matrix X.
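Because the feature matrix itself plays the role of the hidden layer output matrix, SS-ELM training collapses to a single pseudoinverse. A minimal NumPy sketch (function names are ours):

```python
import numpy as np

def sselm_train(X, T):
    """SS-ELM training: the feature matrix X stands in for the ELM hidden
    layer output matrix, so the output weight is just beta = X† T."""
    return np.linalg.pinv(X) @ T

def sselm_predict(X, beta):
    """With no random hidden stage, prediction is a single matrix product."""
    return X @ beta
```

Note that, unlike the ELM sketch earlier, nothing here is random: training the same data twice yields exactly the same output weights.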
The training times of SS-ELM and ELM are mainly spent on computing the Moore-Penrose generalized inverse of the hidden layer output matrix [21]. In most cases, singular value decomposition (SVD) is used to compute the Moore-Penrose generalized inverse; for a matrix H ∈ R^{N×Ñ}, the computational complexity of the SVD is O(4NÑ² + 8Ñ³) [22]. If the number of hidden layer nodes becomes large, the computational time of the SVD rises remarkably. With its compact network structure, SS-ELM achieves both better output stability and lower computational complexity.
To analyze the random mapping from the input space to the hidden space, consider a special structure of ELM in which the number of hidden nodes is equal to the dimension of the input feature vector. We then investigate whether a difference exists between this special structure of ELM and SS-ELM. A performance comparison of SS-ELM and the ELM algorithm on the Iris and Wine datasets is carried out; 70% and 30% of the samples in each dataset are randomly chosen as training and testing data in each trial. The details of the datasets and the average learning time over forty trials for the two algorithms are listed in Table 1. The training and testing accuracies of the 40 trials for each method are shown in Figures 7 and 8.
The simulation results above show that the learning speed of SS-ELM is as fast as that of ELM, and that the main difference between the SS-ELM and ELM algorithms lies in the stability of the network output. ELM has an unstable output due to the random mapping from the original space to the hidden space, while SS-ELM has a robust output due to its simplified network structure.
In some real-world applications, the training data may arrive chunk-by-chunk or one-by-one. Under these circumstances, incremental learning algorithms may outperform batch learning algorithms, as they do not require retraining on the old data whenever new data are received [23]. The batch SS-ELM algorithm can therefore also be extended to online incremental learning for online problems.
Given a set of initial training data S_0 = {(x_j, t_j)}_{j=1}^{N_0}, the initial input matrix X_0 and output matrix T_0 are easily obtained, so the initial output weight can be computed as

β^(0) = X_0† T_0. (9)

If X_0^T X_0 tends to be nonsingular, the initial output weight can be defined by

β^(0) = K_0^{−1} X_0^T T_0, (10)

where K_0 = X_0^T X_0. If X_0^T X_0 tends to be singular, it can be made nonsingular by adding a constant diagonal matrix, redefining K_0 as K_0 = X_0^T X_0 + λI. When another N_1 observations of the second training subset S_1 = {(x_j, t_j)}_{j=N_0+1}^{N_0+N_1} are received, the training problem becomes minimizing

‖ [X_0; X_1] β − [T_0; T_1] ‖, (11)

where X_1 and T_1 are the input and output matrices built from S_1 in the same way as X_0 and T_0. Considering both blocks of training sets S_0 and S_1, the output weight β^(1) becomes

β^(1) = K_1^{−1} (X_0^T T_0 + X_1^T T_1), where K_1 = K_0 + X_1^T X_1. (12)
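One way to realize this chunk-by-chunk update in code is the standard recursive least-squares form, in which the new weight is obtained from the old weight and the new chunk alone, without revisiting S_0. The sketch below is our own (the λI term and the rearrangement of the update into a correction of the previous β are implementation choices consistent with the derivation, not code from the paper):

```python
import numpy as np

def sselm_init(X0, T0, lam=1e-6):
    """Initial batch: K0 = X0^T X0 + lam*I (lam guards against singularity),
    beta0 = K0^{-1} X0^T T0."""
    K = X0.T @ X0 + lam * np.eye(X0.shape[1])
    beta = np.linalg.solve(K, X0.T @ T0)
    return K, beta

def sselm_update(K, beta, X1, T1):
    """Fold in a new chunk (X1, T1): K1 = K0 + X1^T X1, and the weight is
    corrected by the residual on the new chunk only, which is algebraically
    equal to K1^{-1} (X0^T T0 + X1^T T1)."""
    K_new = K + X1.T @ X1
    beta_new = beta + np.linalg.solve(K_new, X1.T @ (T1 - X1 @ beta))
    return K_new, beta_new
```

The correction form avoids storing X_0 and T_0: only the d×d matrix K and the current β need to be kept between chunks.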

Experimental Results
In this paper, we measured vibration signals on an F3L912-type diesel engine, which has 3 cylinders and 4 strokes. The rotating speed is 1200 r/min and the engine runs with no load while the acceleration signals on the first cylinder are sampled. The oil pressure signal can serve as an indication for the vibration signal on the cylinder head, so the oil pressure signal on the third cylinder is measured by a clip-on oil pressure sensor synchronously with the vibration signal, as shown in Figure 9.
As mentioned above, the vibration signal on the cylinder head reflects the action time and intensity of five main excitation events. From the time-frequency domain distribution, we know that the vibration signal has a wide frequency band and is nonstationary. Multiscale principal component analysis (MSPCA) [24] computes the PCA of the wavelet coefficients at each scale and combines the results at the relevant scales; because of this multiscale nature, it has wide applications. So, in this paper, we used the MSPCA method to analyze the vibration signal on the cylinder head. The original vibration signal on the cylinder head and its MSPCA-based transformation under the four conditions are shown in Figures 10, 11, 12, and 13. The wavelet packet can focus the energy distribution of the signal on different decomposition coefficients, which gives it advantages in feature extraction [25]. So we first decompose the vibration signal into three layers using the Daubechies 4 wavelet and reconstruct the conjugate filter coefficients of the wavelet packet decomposition. Then we calculate the energy of the signal in each frequency band using Parseval's theorem and normalize the energy of each frequency band to form the feature vector. 200 and 125 feature vectors are randomly chosen for training and testing in the fault diagnosis problem. Forty trials are conducted for each algorithm, and the average over the forty trials is taken as the final result. The training and testing accuracies are shown in Figure 14. The training times of ELM with the Morlet function and the sigmoid function are 4.3 ms and 3.9 ms, respectively, with the number of hidden layer nodes equal to the dimension of the feature vector. The training time of SS-ELM is 2.3 ms.
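The normalized band-energy feature vector described above can be sketched compactly. This is not the paper's wavelet-packet code: as a self-contained stand-in for the three-level Daubechies 4 wavelet packet decomposition (which yields 2³ = 8 bands), we partition the FFT power spectrum into 8 equal bands, relying on Parseval's theorem to equate spectral and signal energy; a real implementation would substitute wavelet packet coefficients for the FFT bands.

```python
import numpy as np

def band_energy_features(x, n_bands=8):
    """Normalized energy per frequency band of a 1-D signal.
    Stand-in for a 3-level wavelet packet energy feature: partition the
    power spectrum into n_bands contiguous bands, sum the energy in each,
    and normalize so the feature vector sums to 1."""
    spec = np.abs(np.fft.rfft(x)) ** 2        # power spectrum (Parseval)
    bands = np.array_split(spec, n_bands)     # 8 contiguous frequency bands
    e = np.array([b.sum() for b in bands])    # energy per band
    return e / e.sum()                        # normalized feature vector
```

Each measured vibration record is reduced to one such 8-dimensional vector, which is exactly the kind of eigenvector that SS-ELM then takes as its hidden layer output.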
The experiment shows that the diesel engine fault diagnosis method based on MSPCA and SS-ELM described above has a high fault identification accuracy. SS-ELM overcomes the shortcoming of ELM that the input vector must be randomly mapped into the kernel space of the hidden layer. SS-ELM thereby avoids lagging maintenance and the excess maintenance caused by underreporting and misjudgment of the fault mode. So we believe that SS-ELM can be used in the field of fault diagnosis.

Conclusion
In this paper, a fault diagnosis method for engines is proposed. Because of the random mapping from the input space to the hidden layer space of the traditional ELM, an unstable output exists that may cause fatal safety accidents in engine fault diagnosis. We therefore simplify the original network structure and propose the Single-Stage Extreme Learning Machine (SS-ELM). In SS-ELM, the original random mapping is avoided, the original training process is simplified, the input matrix is equal to the hidden layer output matrix of ELM, and the output weights are calculated by a pseudoinverse operation on the input matrix. Experimental results show that the output stability of the modified method outperforms that of a traditional ELM in which the input and hidden layers have the same dimension. The learning speed of SS-ELM is as fast as that of ELM, but the modified algorithm does not need to tune any parameter during the whole training process.

Figure 1: Training accuracy of ELM variation with respect to hidden nodes and initial parameters (dataset: Page Blocks).
Figure 2: Testing accuracy of ELM variation with respect to hidden nodes and initial parameters (dataset: Page Blocks).

Figure 3: Training time of ELM variation with respect to hidden nodes and initial parameters (dataset: Page Blocks).
Figure 4: Testing time of ELM variation with respect to hidden nodes and initial parameters (dataset: Page Blocks).

Figure 7: Training and testing accuracy of ELM and SS-ELM for Wine dataset.

Figure 8: Training and testing accuracy of ELM and SS-ELM for Iris dataset.

Figure 9: Vibration signal of cylinder head in one work cycle and its STFT.

Figure 10: Signal in time domain and power spectrum on normal condition.

Figure 12: Signal in time domain and power spectrum on small exhaust valve clearance condition.

Figure 13: Signal in time domain and power spectrum on gas leak air supply valve condition.

Figure 14: Fault identification results on the engine based on different algorithms.

Table 1: Details of the real-world datasets and the average learning time for each algorithm.