Aero Engine Component Fault Diagnosis Using Multi-Hidden-Layer Extreme Learning Machine with Optimized Structure

A new gas path component fault diagnosis method for aero gas turbine engines is proposed, based on a multi-hidden-layer extreme learning machine with optimized structure (OM-ELM). OM-ELM employs quantum-behaved particle swarm optimization to obtain the optimal network structure automatically, according to both the root mean square error on the training data set and the norm of the output weights. The proposed method is applied to a handwritten digit recognition data set and to a gas turbine engine diagnostic application, and it is compared with basic ELM, multi-hidden-layer ELM, and two state-of-the-art deep learning algorithms: the deep belief network and the stacked denoising autoencoder. Results show that, with its optimized network structure, OM-ELM obtains better test accuracy in both applications and is more robust to sensor noise. It also controls model complexity and needs far fewer hidden nodes than multi-hidden-layer ELM, which saves computer memory and makes it more efficient to implement. These advantages make our method an effective and reliable tool for engine component fault diagnosis.


Introduction
The aero gas turbine engine is susceptible to many problems during operation, including erosion, corrosion, fouling, and foreign object damage [1]. These problems may cause engine component deterioration and thus degrade engine performance. It is therefore very important to develop engine component diagnostic methods that use engine performance data to detect and isolate component faults, both to ensure aircraft safety and to reduce maintenance cost.
International Journal of Aerospace Engineering
Traditional model-based diagnostic methods, which are often used in practice, require an accurate mathematical model of the engine, and their reliability often decreases as system nonlinearity and modeling uncertainty increase. In essence, engine component fault diagnosis is a challenging classification problem that can be addressed with neural network-based techniques. Applications of neural networks to engine fault diagnosis have been widely studied in the literature [2][3][4][5][6][7]. In recent years, a learning algorithm for single-hidden-layer feed-forward neural networks called the extreme learning machine (ELM) [8,9] has been proposed and applied to engine fault diagnosis. In ELM, the input weights and hidden biases are randomly generated, and the output weights are calculated with the Moore-Penrose (MP) generalized inverse. ELM learns much faster, and with better generalization performance, than traditional gradient-based learning algorithms such as back-propagation, and it avoids many problems those algorithms face, such as choosing stopping criteria and learning rates and getting trapped in local minima. Yigang et al. [10] applied ELM to aircraft engine sensor fault diagnosis, and their results show that ELM achieves higher classification precision and shorter training time than a conventional BP neural network. Li et al. [11] proposed a fusion diagnosis method for aero gas turbine engine component faults based on ELM and Kalman filters. To overcome the drawbacks of ELM, the input weights and hidden biases have been optimized by differential evolution. In [12], an ELM with optimized input weights and hidden biases was applied to a turbofan engine diagnostic problem and achieved better results than SVM and BP neural network methods. However, as the number of hidden nodes increases, optimizing so many input weights and biases becomes more difficult and time consuming.
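To make the basic ELM training rule concrete, the following is a minimal NumPy sketch. It assumes uniformly distributed random input weights and biases in [-1, 1] and a sigmoid activation (illustrative choices, not taken from [8,9]); only the output weights are computed, in a single step, with the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`). The function names `elm_train` and `elm_predict` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Basic ELM: random hidden parameters, output weights via the MP generalized inverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases (never trained)
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # output weights: beta = H^+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy usage: two well-separated synthetic Gaussian blobs with one-hot targets
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.3, (50, 2)), rng.normal(1.0, 0.3, (50, 2))])
y = np.repeat([0, 1], 50)
W, b, beta = elm_train(X, np.eye(2)[y], n_hidden=20)
acc = np.mean(elm_predict(X, W, b, beta).argmax(axis=1) == y)
print(f"training accuracy: {acc:.2f}")
```

No iterative weight update takes place, which is why ELM sidesteps learning-rate and stopping-criterion choices.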
The basic ELM and most of its variants employ single-hidden-layer feed-forward networks, which limits their feature abstraction ability and classification performance in some real-world applications. Recently, deep learning methods such as the deep belief network (DBN) [13], the stacked denoising autoencoder (SDAE) [14], and the deep Boltzmann machine (DBM) [15] have shown better performance than shallow neural networks in machine learning [16,17]. Deep network architectures can be exponentially more efficient than shallow ones [18]: shallow networks may require a large number of hidden neurons to represent highly varying functions [19][20][21], whereas deep architectures can represent such functions more compactly and therefore outperform shallow models in many applications.
Inspired by the depth of deep learning networks, Kasun et al. [22] developed a multi-hidden-layer extreme learning machine (M-ELM) that uses an ELM-based autoencoder as its building block. This deep architecture extracts features layer by layer, with higher layers representing more abstract information than lower ones. Tests on MNIST show that M-ELM performs on par with DBM and outperforms SDAE and DBN. However, M-ELM in [22] employs a fixed network structure and tends to need a large model with a very large number of hidden nodes for difficult classification tasks: it used a 700-700-15000 network on the MNIST data set and had to run on a computer with 32 GB of RAM. Such a network cannot be run on computers with small or medium amounts of RAM. In addition, designing a suitable network requires many trials and much experience. Moreover, a fixed network structure is not robust to noise and may perform even worse when the sensor noise level is high.
To address these issues of M-ELM, this paper proposes an effective multi-hidden-layer extreme learning machine that selects the optimal network structure automatically and adaptively. The new method adopts quantum-behaved particle swarm optimization (QPSO) to optimize the network structure according to both the root mean square error (RMSE) on the training data set and the norm of the output weights. Results on both the MNIST data set and an engine fault diagnosis application show that our method outperforms ELM, M-ELM, and state-of-the-art deep learning methods in testing accuracy and robustness to sensor noise. In addition, QPSO reduces the number of hidden nodes significantly, which saves computational resources and makes the method more efficient to implement.
The rest of the paper is organized as follows. Section 2 briefly reviews ELM, M-ELM, and the QPSO algorithm. Section 3 presents the proposed OM-ELM. In Section 4, our method is applied to the MNIST data set and compared with other methods. Section 5 compares OM-ELM with other methods on engine component fault diagnosis, followed by the conclusions in Section 6.

Figure 1: ELM-AE structure.

Preliminaries
2.1. Multi-Hidden-Layer Extreme Learning Machine.
Kasun et al. developed a multi-hidden-layer learning architecture using the ELM-based autoencoder (ELM-AE) as its building block for representational learning [22]. M-ELM performs layer-wise unsupervised training of each ELM-AE. Unlike conventional deep learning algorithms, however, it requires no fine tuning, and the unsupervised training of each layer is executed in batch mode, in closed form, without iteration. This makes M-ELM much faster to train than conventional deep learning algorithms.

ELM-AE.
As Figure 1 illustrates, an ELM-AE has an input layer, a hidden layer, and an output layer. The input data are also used as the output data, i.e., $\mathbf{t} = \mathbf{x}$. The random weights and biases of the hidden nodes are chosen to be orthogonal; orthogonalizing these randomly generated hidden parameters tends to improve ELM-AE's generalization performance [23]. In ELM-AE, the orthogonal random weights and biases of the hidden nodes project the input data to a space of different or equal dimension:
$$\mathbf{h} = g(\mathbf{a} \cdot \mathbf{x} + \mathbf{b}), \qquad \mathbf{a}^{T}\mathbf{a} = \mathbf{I}, \quad \mathbf{b}^{T}\mathbf{b} = 1, \qquad (1)$$
where $\mathbf{a}$ denotes the orthogonal random weights and $\mathbf{b}$ denotes the orthogonal random biases between the input and hidden nodes, and $g(\cdot)$ is the activation function.
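The output weights are calculated as follows:
$$\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{X}, \qquad (2)$$
where $\mathbf{H}$ denotes ELM-AE's hidden layer outputs and $\mathbf{X}$ is both its input and its output data. $\mathbf{I}/C$ is the regularization term, used to improve generalization performance and make the solution more robust.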
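A minimal NumPy sketch of ELM-AE following (1) and (2) is given below. It assumes a sigmoid activation, builds the orthogonal random weights from a QR decomposition of a Gaussian random matrix, and normalizes a random bias vector to unit length; these construction details and the names `orthogonal_weights` and `elm_ae` are illustrative assumptions rather than the implementation of [22].

```python
import numpy as np

def orthogonal_weights(d, n_hidden, rng):
    """Random d x n_hidden matrix with orthonormal columns (n_hidden <= d) or rows (n_hidden > d)."""
    if n_hidden <= d:
        q, _ = np.linalg.qr(rng.standard_normal((d, n_hidden)))
        return q                                   # a^T a = I
    q, _ = np.linalg.qr(rng.standard_normal((n_hidden, d)))
    return q.T                                     # a a^T = I

def elm_ae(X, n_hidden, C, seed=0):
    """Return the ELM-AE output weights beta (n_hidden x d), i.e., the learned features."""
    rng = np.random.default_rng(seed)
    a = orthogonal_weights(X.shape[1], n_hidden, rng)
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)                         # unit-norm random bias, b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))         # hidden outputs, Eq. (1) with sigmoid g
    # Regularized least squares, Eq. (2): beta = (I/C + H^T H)^{-1} H^T X; the target is X itself
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

rng = np.random.default_rng(0)
X = rng.random((200, 8))                           # synthetic inputs for illustration
beta = elm_ae(X, n_hidden=5, C=1e7)
print(beta.shape)                                  # (5, 8)
```

Solving the linear system with `numpy.linalg.solve` avoids forming the explicit inverse in (2), which is numerically preferable.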

M-ELM.
Figure 2 illustrates the construction of M-ELM. As can be seen from the figure, the output weights $\boldsymbol{\beta}^{1}$ of the ELM-AE with respect to the input data are the first-layer weights of M-ELM, and the output weights $\boldsymbol{\beta}^{i+1}$ of the ELM-AE with respect to the $i$th hidden layer output $\mathbf{h}^{i}$ of M-ELM are the $(i+1)$th layer weights of M-ELM. The M-ELM output layer weights are calculated by regularized least squares as in (2), with the training targets in place of $\mathbf{X}$. Because the output weights of an ELM-AE are the learned features, M-ELM realizes layer-wise feature abstraction like deep learning, but without an iterative training process or fine tuning.
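The layer-wise construction of M-ELM can be sketched as follows, reusing the ELM-AE computation of (1) and (2). This is a hedged illustration: it assumes that each hidden layer applies the sigmoid function to the previous layer's output multiplied by the transpose of the corresponding ELM-AE output weights, and the helper names (`elm_ae_beta`, `melm_train`, `melm_predict`) and toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))   # clip avoids overflow warnings

def elm_ae_beta(X, n_hidden, C, rng):
    """ELM-AE output weights (n_hidden x d) with orthogonal random hidden parameters."""
    d = X.shape[1]
    if n_hidden <= d:
        a, _ = np.linalg.qr(rng.standard_normal((d, n_hidden)))
    else:
        a = np.linalg.qr(rng.standard_normal((n_hidden, d)))[0].T
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    H = sigmoid(X @ a + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

def melm_train(X, T, layers, C, seed=0):
    """Stack ELM-AEs layer by layer, then fit the output layer by regularized least squares."""
    rng = np.random.default_rng(seed)
    weights, H = [], X
    for n in layers:
        beta = elm_ae_beta(H, n, C, rng)   # features learned from the previous layer's output
        weights.append(beta.T)             # layer weights: transpose of the ELM-AE output weights
        H = sigmoid(H @ beta.T)            # next hidden-layer output
    beta_out = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)
    return weights, beta_out

def melm_predict(X, weights, beta_out):
    H = X
    for W in weights:
        H = sigmoid(H @ W)
    return H @ beta_out

# Toy usage on two synthetic Gaussian blobs (illustration only)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.3, (100, 4)), rng.normal(1.5, 0.3, (100, 4))])
y = np.repeat([0, 1], 100)
weights, beta_out = melm_train(X, np.eye(2)[y], layers=[10, 10, 20], C=1e3)
acc = np.mean(melm_predict(X, weights, beta_out).argmax(axis=1) == y)
print(f"training accuracy: {acc:.2f}")
```

Every layer is obtained by a single linear solve, which reflects the "no iteration, no fine tuning" property described above.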
2.2. QPSO.
QPSO alleviates the premature (local) convergence problem of PSO and has shown better performance than PSO in many applications [23,24]. In QPSO, the state of a particle is described by the Schrödinger wave function $\psi(X, t)$ instead of by position and velocity, because the position and velocity of a quantum particle cannot be determined simultaneously. Only the probability of the particle appearing at a given position can be learned, from the probability density function $|\psi(X, t)|^2$, whose form depends on the potential field in which the particle lies. Employing the Monte Carlo method, the $i$th particle $X_i$ of the population moves according to the following iterative equation:
$$X_{i,j}(t+1) = \begin{cases} p_{i,j}(t) - \alpha \left| mbest_j(t) - X_{i,j}(t) \right| \ln\left(\dfrac{1}{u_{i,j}(t)}\right), & k \ge 0.5, \\[2ex] p_{i,j}(t) + \alpha \left| mbest_j(t) - X_{i,j}(t) \right| \ln\left(\dfrac{1}{u_{i,j}(t)}\right), & k < 0.5, \end{cases} \qquad (3)$$
where $X_{i,j}(t+1)$ is the position of the $i$th particle in the $j$th dimension at iteration $t+1$, and $p_{i,j}$ is the local attractor of the $i$th particle in the $j$th dimension, defined as
$$p_{i,j}(t) = \varphi_{i,j}(t)\, P_{i,j}(t) + \left(1 - \varphi_{i,j}(t)\right) G_j(t), \qquad (4)$$
$$mbest(t) = \frac{1}{N_p} \sum_{i=1}^{N_p} P_i(t), \qquad (5)$$
where $N_p$ is the number of particles, $P_i$ is the best previous position of the $i$th particle, $G$ is the global best position of the particle swarm, and $mbest$ is the mean best position, i.e., the mean of the best previous positions of all particles. $\varphi$, $u$, and $k$ are random numbers uniformly distributed in [0, 1]. The contraction-expansion coefficient $\alpha$ controls the convergence speed of the algorithm.
With a fixed network structure, M-ELM may contain many unnecessary redundant nodes, which leads to an ill-conditioned hidden output matrix and decreased generalization performance.
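The QPSO update of Section 2.2, equations (3)-(5), can be written compactly in vectorized NumPy. In this sketch, the function name `qpso_step`, the toy sphere objective, and the linearly decreasing schedule for the contraction-expansion coefficient α (from 1.0 to 0.5) are assumptions for illustration; u is drawn from (0, 1] so that ln(1/u) stays finite.

```python
import numpy as np

def qpso_step(X, P, G, alpha, rng):
    """One QPSO move for the whole swarm.

    X: (Np, D) current positions, P: (Np, D) best previous positions, G: (D,) global best.
    """
    mbest = P.mean(axis=0)                        # Eq. (5): mean best position
    phi = rng.random(X.shape)
    p = phi * P + (1.0 - phi) * G                 # Eq. (4): local attractors
    u = 1.0 - rng.random(X.shape)                 # uniform in (0, 1], keeps ln(1/u) finite
    k = rng.random(X.shape)
    sign = np.where(k >= 0.5, -1.0, 1.0)          # k chooses the sign in Eq. (3)
    return p + sign * alpha * np.abs(mbest - X) * np.log(1.0 / u)

def sphere(x):
    return np.sum(x ** 2, axis=-1)                # toy objective with its minimum at the origin

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, (20, 3))
P, fP = X.copy(), sphere(X)
initial_best = fP.min()
G = P[fP.argmin()].copy()
for t in range(100):
    alpha = 1.0 - 0.5 * t / 100                   # assumed schedule for the contraction-expansion coefficient
    X = qpso_step(X, P, G, alpha, rng)
    fX = sphere(X)
    better = fX < fP                              # update best previous positions
    P[better], fP[better] = X[better], fX[better]
    G = P[fP.argmin()].copy()                     # update global best
print(f"best value after 100 iterations: {fP.min():.2e}")
```

Unlike PSO, no velocity is stored: each new position is sampled around the local attractor with a spread proportional to the distance from the mean best position.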

OM-ELM
To address these problems and obtain an optimal network structure automatically, in this section we propose a method named OM-ELM, illustrated in Figure 3. The method uses QPSO to optimize the network structure according to both the training accuracy and the norm of the output weights, in order to achieve good generalization performance.
The main steps of the proposed OM-ELM are as follows.
Step 1 (initializing). First, we randomly generate a population of particles $X_i = [n_1, \ldots, n_L]$, where $n_l$, $l = 1, \ldots, L$, denotes the number of nodes in the $l$th hidden layer and $L$ is the number of hidden layers. Because $n_l$ must be an integer, any noninteger value produced during the iterations is rounded to the nearest integer:
$$n_l \leftarrow \operatorname{round}(n_l).$$
To control network complexity and save computing resources, $n_l$ is restricted to a range with predefined lower and upper bounds chosen according to the application.
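Step 1 might be sketched as follows. The lower bound of 1 hidden node is a hypothetical choice (the text only states that predefined bounds are used), the upper bound of 200 and the 20 particles match the engine experiments in Section 5, and clipping to the bounds after rounding is one assumed way of enforcing the range.

```python
import numpy as np

def init_swarm(n_particles, n_layers, lower, upper, rng):
    """Each row is one particle: a candidate structure [n_1, ..., n_L] of hidden-layer sizes."""
    return rng.uniform(lower, upper, size=(n_particles, n_layers))

def to_structure(position, lower, upper):
    """Round a continuous position to integer layer sizes and keep them within [lower, upper]."""
    return np.clip(np.rint(position), lower, upper).astype(int)

rng = np.random.default_rng(0)
swarm = init_swarm(n_particles=20, n_layers=3, lower=1, upper=200, rng=rng)
structures = [to_structure(x, 1, 200) for x in swarm]
print(to_structure(np.array([33.6, 51.2, 250.0]), 1, 200).tolist())   # [34, 51, 200]
```

Keeping the particle positions continuous and rounding only when a network is built lets the QPSO update of (3) operate unchanged.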
Step 2 (fitness evaluation). For each particle (a candidate network structure), the output weights between the last hidden layer and the output layer are computed according to (2). The fitness of the particle is then evaluated by the root mean square error between the desired and estimated outputs on the training data:
$$f(X_i) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left\| \mathbf{y}_k - \mathbf{t}_k \right\|_2^2},$$
where $N$ is the number of training samples and $\mathbf{y}_k$ and $\mathbf{t}_k$ are the estimated and desired outputs of the $k$th sample.
Step 3 (updating $P_i$ and $G$). Using the fitness values of all particles in the population, the best previous position $P_i$ of each particle and the global best position $G$ of the population are updated. As suggested in [25], neural networks tend to generalize better when their weights have smaller norms. Therefore, the fitness values and the norms of the output weights are considered together when updating $P_i$ and $G$. The updating criterion compares the fitness values $f(X_i)$, $f(P_i)$, and $f(G)$ of the $i$th particle's position, the best previous position of the $i$th particle, and the global best position of the population, together with the norms of the corresponding output weights $wo_{X_i}$, $wo_{P_i}$, and $wo_G$ (the weights between the last hidden layer and the output layer). Under this criterion, particles with smaller fitness values or smaller output weight norms are more likely to be selected as $P_i$ or $G$.
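A sketch of the Step 2 fitness and a possible form of the Step 3 comparison is shown below. The exact updating criterion is not reproduced in this text, so the rule in `prefer` is an assumption modeled on the tolerance-based criteria used in evolutionary ELM variants: a candidate wins if its RMSE is clearly smaller, or if the RMSEs agree within a relative tolerance `tol` (a hypothetical value) and its output weights have a smaller Frobenius norm.

```python
import numpy as np

def rmse(Y, T):
    """Step 2 fitness: RMSE between estimated outputs Y and desired outputs T (rows = samples)."""
    return float(np.sqrt(np.mean(np.sum((Y - T) ** 2, axis=1))))

def prefer(f_new, w_new, f_old, w_old, tol=0.02):
    """Step 3 comparison (ASSUMED form): should the candidate replace the stored P_i or G?

    f_*: fitness values; w_*: output weight matrices; tol: hypothetical relative tolerance.
    """
    if f_old - f_new > tol * f_old:               # clearly smaller RMSE wins
        return True
    if abs(f_old - f_new) < tol * f_old:          # similar RMSE: smaller output-weight norm wins
        return bool(np.linalg.norm(w_new) < np.linalg.norm(w_old))
    return False

w_small, w_big = np.ones((3, 2)), 2.0 * np.ones((3, 2))
print(prefer(0.100, w_small, 0.101, w_big))   # True: similar RMSE, smaller norm
print(prefer(0.150, w_small, 0.100, w_big))   # False: clearly worse RMSE
```

The tolerance lets the norm act as a tie-breaker; with a strict RMSE comparison alone, exact ties would almost never occur and the norm would rarely matter.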
Step 4. Calculate each particle's local attractor $p_i$ and the mean best position $mbest$ according to (4) and (5).
Step 5. Update the position of each particle according to (3).
Steps 2 to 5 are repeated until the maximum number of epochs is reached. Finally, the optimized network structure is obtained and applied to the testing data set.
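Putting the steps together, the following self-contained sketch searches over three-hidden-layer M-ELM structures. It is a simplified illustration under several assumptions: a sigmoid activation, the tolerance-based comparison rule with a hypothetical `tol`, a linearly decreasing contraction-expansion coefficient, and clipping of positions to the node bounds. Because the hidden parameters are random, re-evaluating the same structure can give a slightly different fitness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def melm_fit(X, T, layers, C, rng):
    """M-ELM: stacked ELM-AEs plus a regularized least-squares output layer."""
    H, Ws = X, []
    for n in layers:
        d = H.shape[1]
        q = np.linalg.qr(rng.standard_normal((max(d, n), min(d, n))))[0]
        a = q if n <= d else q.T                  # orthonormal columns or rows
        b = rng.standard_normal(n)
        b /= np.linalg.norm(b)
        A = sigmoid(H @ a + b)
        beta = np.linalg.solve(np.eye(n) / C + A.T @ A, A.T @ H)   # ELM-AE weights, Eq. (2)
        Ws.append(beta.T)
        H = sigmoid(H @ beta.T)
    wo = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)
    return Ws, wo

def melm_out(X, Ws, wo):
    H = X
    for W in Ws:
        H = sigmoid(H @ W)
    return H @ wo

def om_elm(X, T, n_layers=3, lo=1, hi=200, n_particles=20, epochs=30, C=1e7, tol=0.02, seed=0):
    rng = np.random.default_rng(seed)

    def evaluate(pos):                            # Step 2: fitness (training RMSE) and ||wo||
        layers = np.clip(np.rint(pos), lo, hi).astype(int)
        Ws, wo = melm_fit(X, T, layers, C, rng)
        err = melm_out(X, Ws, wo) - T
        return np.sqrt(np.mean(np.sum(err ** 2, axis=1))), np.linalg.norm(wo)

    def better(f1, n1, f0, n0):                   # Step 3: HYPOTHETICAL tolerance-based criterion
        return f0 - f1 > tol * f0 or (abs(f0 - f1) < tol * f0 and n1 < n0)

    pos = rng.uniform(lo, hi, (n_particles, n_layers))              # Step 1
    P = pos.copy()
    fP, nP = map(np.array, zip(*[evaluate(x) for x in pos]))
    g = int(np.argmin(fP))
    for t in range(epochs):
        alpha = 1.0 - 0.5 * t / epochs            # assumed linearly decreasing coefficient
        mbest = P.mean(axis=0)                    # Step 4: Eq. (5)
        phi = rng.random(pos.shape)
        attractor = phi * P + (1.0 - phi) * P[g]  # Step 4: Eq. (4)
        u, k = 1.0 - rng.random(pos.shape), rng.random(pos.shape)
        step = alpha * np.abs(mbest - pos) * np.log(1.0 / u)
        pos = np.clip(attractor + np.where(k >= 0.5, -1.0, 1.0) * step, lo, hi)  # Step 5: Eq. (3)
        for i in range(n_particles):              # Steps 2-3 for the new positions
            f, nrm = evaluate(pos[i])
            if better(f, nrm, fP[i], nP[i]):
                P[i], fP[i], nP[i] = pos[i], f, nrm
            if better(fP[i], nP[i], fP[g], nP[g]):
                g = i
    return np.clip(np.rint(P[g]), lo, hi).astype(int), fP[g]

# Toy usage: three synthetic Gaussian classes, small search for speed (illustration only)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.4, (60, 4)) for m in (-1.0, 0.0, 1.0)])
T = np.eye(3)[np.repeat([0, 1, 2], 60)]
structure, fitness = om_elm(X, T, hi=30, n_particles=6, epochs=5, C=1e3)
print("optimized structure:", structure.tolist(), "training RMSE:", round(float(fitness), 3))
```

The toy run uses far fewer particles and epochs than the experiments in Sections 4 and 5 so that it finishes quickly; the returned structure is then used to train the final network.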

Comparisons on MNIST Data Set
Before applying the method to aero gas turbine engine fault diagnosis, in this section we first apply it to the MNIST handwritten digit data set [26]. MNIST consists of 60,000 training images and 10,000 testing images of the handwritten digits 0-9. Because each digit has its own shape and different people write digits in their own ways, MNIST is an ideal data set for this purpose and is commonly used to test the performance of deep learning algorithms.
In this section, we compare the proposed method with four other classification methods: basic ELM, M-ELM, and the two state-of-the-art deep learning methods SDAE and DBN. In our method, the maximum number of QPSO epochs is 20, the number of particles in the population is 20, and the upper bound on the number of hidden nodes in each hidden layer is 1500. Based on validation tests, the ridge parameter C in (2) is set to $10^8$ for all hidden layers.
Running our method on the MNIST data set yields the optimized structure 75-108-1473. For a fair comparison, all other algorithms except basic ELM use three hidden layers with roughly the same total number of hidden nodes.
For DBN, SDAE, and M-ELM, the hidden layer structure is 400-400-800, and basic ELM has one hidden layer with 1000 nodes (more nodes may cause an out-of-memory error on a computer with a medium amount of RAM). All methods use the sigmoid activation function. For the two deep learning methods, the learning rate is 0.1, the number of unsupervised pretraining epochs is 200, and the number of supervised fine-tuning epochs is 400. The training data set is divided into mini-batches of 100 samples each.
All simulations were run in MATLAB R2008a on a PC with a 3.4 GHz dual-core CPU and 4 GB of RAM. The results are listed in Table 1.
Table 1 shows that our method achieved the highest testing accuracy among the compared methods with a similar number of hidden nodes. This testing accuracy is slightly lower than the result reported in [22], but it uses only 1656 hidden nodes, roughly one-tenth of the 16,400 used in [22]. It therefore needs much less memory and can be run efficiently on common computers without large RAM. Its computing time is longer than that of M-ELM and basic ELM because our method must evaluate the whole population in every iteration, but OM-ELM is much faster than the deep learning methods. The good performance on MNIST suggests that our method is a promising tool for engine fault diagnosis.
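The node counts quoted above can be checked with simple arithmetic, using the layer sizes given in this section and the 700-700-15000 structure of [22] described in the Introduction:

```python
om_elm_nodes = 75 + 108 + 1473      # structure found by OM-ELM on MNIST
baseline_nodes = 400 + 400 + 800    # DBN, SDAE, and M-ELM in Table 1
ref22_nodes = 700 + 700 + 15000     # M-ELM structure used in [22]
ratio = om_elm_nodes / ref22_nodes
print(om_elm_nodes, baseline_nodes, ref22_nodes, round(ratio, 3))   # 1656 1600 16400 0.101
```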

Engine Selection and Modeling.
We evaluate the methods on a two-shaft turbofan engine with a mixer and an afterburner (the engine type is omitted for confidentiality reasons), as illustrated in Figure 4. This engine has a low bypass ratio of 0.62.
The gas turbine engine is susceptible to many physical problems, which may cause component faults that reduce component isentropic efficiency. These faults in turn cause deviations in performance parameters such as rotational speeds, pressures, and temperatures across the engine components. It is therefore practical to detect and isolate the faulty component using measured engine performance data. However, performance data from real faulty engines are very difficult to obtain, often belong to the manufacturer's or users' proprietary information, and cannot be accessed easily. Component faults are therefore usually simulated with an engine mathematical model, as suggested in [12]. In this study, we simulate the behavior of the engine with component faults using an engine mathematical model developed in MATLAB. By implanting a given magnitude of isentropic efficiency deterioration in selected components of the engine performance model, we obtain simulated engine performance data with component faults.

Generating Component Fault Samples.
In this study, we focus on four rotating components; the engine component fault scenarios tested, including single- and multiple-fault cases, are listed in Table 2. The first four columns represent the four single-fault classes: the low pressure compressor (LPC), high pressure compressor (HPC), low pressure turbine (LPT), and high pressure turbine (HPT) fault classes. The faulty components of each class are marked with an "F". C5 and C6 represent the double-fault classes "LPC + HPC" and "LPC + LPT", and the last two columns represent the triple-fault classes "LPC + HPC + LPT" and "LPC + LPT + HPT".
According to [12], the engine operating point has no obvious effect on the classification accuracy of the fault detection methods; therefore, we chose a single operating point. The fuel flow and environmental parameters of this operating point are listed in Table 3.
The input parameters of the training and test data sets are the relative deviations of the simulated performance parameters of the engine with component faults from those of the "healthy" engine. These parameters were selected by sensitivity analysis. They are the low pressure rotor rotational speed, … In this study, all input parameters have been normalized into the range [0, 1]. The output is the fault class; for example, [0 1 0 0 0 0 0 0] represents the second fault class (HPC fault) in Table 2.
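The preprocessing described above might look like the following sketch. The normalization formula is not specified in the text, so per-parameter min-max scaling is an assumption; the class numbering 1-8 simply follows the column order of Table 2, and the function names are hypothetical.

```python
import numpy as np

def minmax_normalize(D):
    """Scale each input parameter (column) into [0, 1]; constant columns map to 0."""
    lo, hi = D.min(axis=0), D.max(axis=0)
    return (D - lo) / np.where(hi > lo, hi - lo, 1.0)

def one_hot(labels, n_classes=8):
    """Fault class k (1-based, column order of Table 2) -> n_classes-element target vector."""
    return np.eye(n_classes, dtype=int)[np.asarray(labels) - 1]

print(one_hot([2]).tolist())   # [[0, 1, 0, 0, 0, 0, 0, 0]], the HPC fault class
```

If the scaling statistics are computed on the training set only, the same `lo` and `hi` should be reused for the test set.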
Figure 5 shows the deviation responses of the engine performance parameters (i.e., the input parameters) to the different fault patterns (1% loss in isentropic efficiency). The responses to HPC and HPT faults are very similar, which makes these two faults very difficult for a diagnosis method to distinguish.
For each single-fault class, 50 samples were generated by randomly selecting the isentropic efficiency deterioration of the corresponding component within the range 1%-5%. For each double-fault class, 100 samples were generated by randomly setting the isentropic efficiency deterioration of both faulty components within the range 1%-5% simultaneously. The same method was applied to the triple-fault classes, each of which has 300 samples. Altogether there are 1000 samples.
In real engine applications, sensor noise is always present. To simulate real engine sensor signals, all input data are contaminated with measurement noise as follows:
$$\tilde{x} = x + \sigma_n \, \sigma \, \mathcal{N}(0, 1),$$
where $x$ is the clean input parameter, $\sigma_n$ denotes the imposed noise level, $\sigma$ is the standard deviation of the data set, and $\mathcal{N}(0, 1)$ is a standard normal random variable. For each imposed noise level, we expand the data set proportionally from 1000 to 4000 samples. We chose 3000 samples as the training data set (with the number of samples in each class proportional to the original data set), and the remaining 1000 samples form the testing data set.
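The noise model and the sample bookkeeping can be sketched as follows. Two points are assumptions: σ is taken as the per-parameter standard deviation of the data set, and the expansion from 1000 to 4000 samples is modeled as four independently corrupted copies of every clean sample. The number of input parameters (6 here) and the random clean data are placeholders, since the full parameter list is not given in this section.

```python
import numpy as np

def add_sensor_noise(D, noise_level, rng):
    """x_noisy = x + noise_level * sigma * N(0, 1), sigma = per-parameter std of the data set."""
    sigma = D.std(axis=0)
    return D + noise_level * sigma * rng.standard_normal(D.shape)

# Samples per fault class in the column order of Table 2: 4 single, 2 double, 2 triple faults
per_class = np.array([50] * 4 + [100] * 2 + [300] * 2)
expanded = 4 * per_class            # ASSUMPTION: 4 independent noisy copies of every clean sample
n_train = 3 * per_class             # 3000 training samples, proportional to the original classes
n_test = expanded - n_train         # the remaining 1000 samples form the test set
print(per_class.sum(), expanded.sum(), n_train.sum(), n_test.sum())   # 1000 4000 3000 1000

rng = np.random.default_rng(0)
clean = rng.random((per_class.sum(), 6))   # placeholder for the 1000 x n_inputs clean deviations
noisy = np.vstack([add_sensor_noise(clean, 0.05, rng) for _ in range(4)])
```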

Parameter Settings.
In our method, the maximum number of QPSO epochs is 30, the number of particles in the population is 20, and the upper bound on the number of hidden nodes in each hidden layer is 200. The ridge parameter C is set to $10^7$ for all hidden layers. Running our method on the data set with noise level 0.05 yields the optimized structure 34-51-129 (214 hidden nodes in total). For a fair comparison, the hidden layer structure of DBN, SDAE, and M-ELM is 60-60-90 (210 nodes), so that all methods have about the same total number of hidden nodes, and basic ELM has a single hidden layer of 210 nodes. For the two deep learning methods, the learning rate is 0.1, the number of unsupervised pretraining epochs is 200, and the number of supervised fine-tuning epochs is 500. The training data set is divided into mini-batches of 30 samples each.
To account for the stochastic nature of these diagnostic methods, each of the five methods is run 10 times. All simulations were run in the same environment as in Section 4.

Comparisons of the Five Methods.
Performance comparisons of the five methods were first conducted at a small noise level of 0.05. Table 4 lists the mean training accuracy, testing accuracy, and training time over all fault classes in 10 runs. Figure 6 shows the mean testing accuracy on each fault class.
It can be seen from Table 4 that basic ELM obtained the lowest testing accuracy but the highest training accuracy, which means that its generalization performance is not as good as that of the methods with more hidden layers. This suggests that a multi-hidden-layer structure can ameliorate the overfitting problem faced by single-hidden-layer neural networks.
Among the four multi-hidden-layer methods, our method achieved the highest mean testing accuracy on both single-fault and multifault classes. Its mean testing accuracy on all fault classes (0.985) is also better than that of any other method, which is consistent with the results on MNIST. Furthermore, its testing performance is very stable, with the smallest mean standard deviation. The testing performance of M-ELM is on par with SDAE and better than DBN.
Owing to its iterative nature, OM-ELM requires much more training time than basic ELM and M-ELM, but it trains much faster than the deep learning algorithms DBN and SDAE.
Table 5 presents the confusion matrix of our method in a random run. Our method achieved a satisfactory result: it recognized four fault classes with 100 percent accuracy, and it misclassified fewer samples than the other methods.
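Per-class accuracies such as those in Figure 6 can be read off a confusion matrix like Table 5 by dividing each diagonal entry by its row sum. The helpers below are a hypothetical illustration on made-up labels, not the paper's data, and assume rows index the true class and columns the predicted class.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(M, (y_true, y_pred), 1)            # count each (true, predicted) pair
    return M

def per_class_accuracy(M):
    return np.diag(M) / M.sum(axis=1)

# Hypothetical labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
M = confusion_matrix(y_true, y_pred, 3)
print(per_class_accuracy(M).tolist(), np.trace(M) / M.sum())   # [1.0, 0.5, 1.0] 0.8333333333333334
```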
We also compared the performance of these methods at different noise levels. To study how the noise level affects the methods, we tested them under six noise levels: 0.05, 0.1, 0.15, 0.2, 0.25, and 0.3. The mean testing accuracy of each method versus noise level is illustrated in Figure 7. The testing accuracy of all methods decreases as the noise level increases. Owing to its shallow architecture, basic ELM did not perform as well as the methods with multiple hidden layers. With its optimized network structure, our method was the least affected by sensor noise and achieved the best testing accuracy at all noise levels. This suggests that our method is more reliable and robust to sensor noise and could be more suitable for aero gas turbine engine fault diagnosis tasks.
To further show how the QPSO strategy helps our method achieve this performance, the evolution of the population's mean testing accuracy and of the norm of the output weights at noise level 0.1 in a single run is shown in Figures 8 and 9, respectively.
Because the QPSO updating criterion considers both the fitness value and the norm of the output weights, the testing accuracy keeps increasing and the norm keeps decreasing over the iterations. In the end, the optimized network structure is obtained, giving our method good classification and generalization capability.

Figure 6: Mean classification accuracies of the five methods under different conditions.

Figure 8: Mean evolution of the testing accuracy of our method.

Figure 9: Mean evolution of output weight norm of our method.

Table 1: Performance of all methods on MNIST.
Figure 4: Schematic layout of the studied turbofan engine. A, inlet; B, fan; C, high pressure compressor; D, main combustor; E, high pressure turbine; F, low pressure turbine; G, external duct; H, mixer; I, afterburner; J, nozzle.

Table 2: Fault class settings.

Table 3: Description of the studied operating point.
Table 6 lists the mean training accuracy, testing accuracy, and training time in 10 runs at noise level 0.1. Our method again achieved the best testing accuracy among all methods.

Table 5: Confusion matrix of our method.

Table 6: Performance of all methods on the data set with noise level 0.1.