A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

This research proposes a novel framework for final drive simultaneous failure diagnosis that consists of feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. In the feature extraction module, wavelet packet transform and fuzzy entropy are adopted to reduce noise interference and to extract representative features of each failure mode. Probability classifiers are constructed from single failure samples using paired sparse Bayesian extreme learning machines, which are trained only on single failure modes and inherit the high generalization and sparsity of the sparse Bayesian learning approach. To generate the optimal decision threshold, which converts the probability output of the classifiers into final simultaneous failure modes, this research uses samples containing both single and simultaneous failure modes together with the Grid Search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machine and probability neural networks, experimental results based on the $F_1$-measure verify that the diagnostic accuracy and efficiency of the proposed framework, both of which are crucial for simultaneous failure diagnosis, are superior to those of the existing approaches.


Introduction
With the sustained increase in the complexity of working conditions, simultaneous failures occur more frequently in the final drive, which is a pivotal part of the car, and they seriously affect the running status, comfort, and safety of the car. The final drive mainly consists of a pair of gears that mesh together when the car runs. Owing to this complex structure, a functional disorder in the final drive usually stems from more than one single failure occurring at the same time, which is called a simultaneous failure. Traditional manual techniques cannot accomplish simultaneous failure diagnosis (SFD). This paper focuses on final drive simultaneous failure diagnosis, which is essential for auto manufacturers and the maintenance industry.
Failure diagnosis using vibration signals is among the most frequently used approaches because vibration signals are more precise and accurate than sound for diagnosis. It can be divided into three main steps: feature extraction, training of diagnostic models, and failure mode identification. The vibration signal collected from the final drive is nonlinear and nonstationary, and it carries a lot of uncorrelated and superfluous information. Valid failure mode information cannot be extracted directly from the original vibration signal because of the noise and interference embedded in it. Frequently used preprocessing methods include wavelet analysis [1,2], wavelet packet transform (WPT) [3,4], and empirical mode decomposition (EMD) [5]. WPT is suitable for nonstationary vibration signals: it decomposes the original signal into several subfrequency bands that contain different failure information, and it effectively reduces noise interference.
The data contained in the preprocessed signal are high-dimensional, so they cannot be input directly into the diagnostic system. Feature extraction has a strong effect on the accuracy and reliability of failure diagnosis. Recently, researchers have introduced entropy into the field of feature extraction, including approximate entropy, sample entropy [6], and fuzzy entropy [7]. Approximate entropy and sample entropy are based on the Heaviside step function, which changes abruptly at the classification boundary. In contrast, fuzzy entropy eliminates the influence of baseline drift in the data and guarantees that the entropy varies smoothly and continuously with the similarity tolerance [8], so it is well suited to measuring the complexity and self-similarity of the preprocessed vibration signal and to reflecting changes in the vibration performance of mechanical equipment [9].
In recent years, many machine learning methods have been applied to failure diagnosis, including support vector machine (SVM) [10], artificial neural networks (ANN) [11], extreme learning machine (ELM) [12,13], and kernel extreme learning machine (KELM) [14]. ELM is a single-hidden-layer feedforward neural network that needs no human intervention in tuning parameters, which distinguishes it from SVM and ANN and gives it high generalization and short learning time. KELM applies a kernel function to ELM to improve generalization and nonlinear approximation ability [15]. However, the computational and memory costs of KELM are high for large-scale problems. Recently, Bayesian methods have been introduced into ELM to learn the output weights by estimating the probability distribution of the output, which yields high generalization. Soria-Olivas and Gómez-Sanchis proposed the Bayesian extreme learning machine [16] for linear regression, which does not address classification problems. Sparse Bayesian extreme learning machine (SBELM) [17] is a novel method that finds a sparse representation of the hidden layer output weights by imposing a hyperparameter on each weight. During the learning phase, SBELM tunes some output weights to zero to obtain a compact model. In summary, SBELM has the advantages of probability output, high generalization, sparsity, and fast training speed. To solve the problem of simultaneous failure diagnosis, a proper classifier has to offer the probability of every possible failure. In this research, the proposed framework constructs classifiers based on paired SBELM, in which each SBELM classifier is trained with samples from a pair of single failure modes. The paired SBELM effectively reflects the probability distribution of the failure modes. Only single failure samples are used for constructing the diagnostic models.
Since it is impossible to collect all combinations of existing single failure modes for training, training on single failure samples alone allows the proposed framework to overcome this practical bottleneck in simultaneous failure diagnosis. To recognize simultaneous failure modes, both single and simultaneous failure samples are used with the Grid Search method to generate the optimal decision threshold, which converts the probability output of the classifiers into the final multiple failure modes. Considering that partial matching is valid and instrumental in simultaneous failure diagnosis, this research adopts the $F_1$-measure to evaluate the performance of the proposed framework.
This paper is organized as follows: Section 2 presents the proposed framework. Section 3 presents the experiment setup, data acquisition, and preprocessing. The results of the experiments are discussed in Section 4. Finally, a conclusion is given in Section 5.

The Proposed Framework for Final Drive Simultaneous Failure Diagnosis

Wavelet packet transform (WPT) is an extended form of wavelet transform for analysing nonstationary and nonlinear signals. It provides a finer partition of the frequency band, because its equal-width frequency bands give good resolution at both high and low frequencies [18]. As a multiresolution analysis method, WPT can effectively preprocess nonstationary vibration signals in both the time domain and the frequency domain. The two-scale equation of WPT is shown below, in which $h_0$ and $h_1$ represent the filter coefficients and $W_n(t)$ denotes the wavelet packet functions:

$$W_{2n}(t) = \sqrt{2}\sum_{k} h_0(k)\,W_n(2t-k), \qquad W_{2n+1}(t) = \sqrt{2}\sum_{k} h_1(k)\,W_n(2t-k). \quad (1)$$

The recursion formula of the wavelet packet coefficients, where $c_j^{n}(k)$ is the $k$th coefficient of the $n$th packet at level $j$, is

$$c_{j+1}^{2n}(k) = \sum_{l} h_0(l-2k)\,c_j^{n}(l), \qquad c_{j+1}^{2n+1}(k) = \sum_{l} h_1(l-2k)\,c_j^{n}(l). \quad (2)$$

Fuzzy Entropy. When a failure occurs in the final drive, the complexity of the oscillation changes; hence representative features contained in the signal should be extracted. Fuzzy entropy combines Shannon entropy with fuzzy set theory [19]. The procedure of fuzzy entropy is described as follows.
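(1) For a time series $\{u(i) : 1 \le i \le N\}$ and an embedding dimension $m$, construct the vectors

$$X_i^m = \{u(i), u(i+1), \ldots, u(i+m-1)\} - u_0(i), \quad (3)$$

where $u_0(i) = \frac{1}{m}\sum_{k=0}^{m-1} u(i+k)$ is the average of the $m$ samples in vector $X_i^m$.

(2) Define the distance $d_{ij}^m$ between $X_i^m$ and $X_j^m$, where $i, j = 1, 2, \ldots, N-m$, $j \neq i$, as follows:

$$d_{ij}^m = \max_{k \in \{0, \ldots, m-1\}} \left| \big(u(i+k) - u_0(i)\big) - \big(u(j+k) - u_0(j)\big) \right|. \quad (4)$$

(3) Calculate the similarity $D_{ij}^m$ between $X_i^m$ and $X_j^m$ using the fuzzy function:

$$D_{ij}^m = \exp\!\left(-\frac{(d_{ij}^m)^n}{r}\right). \quad (5)$$

(4) Define the function $\phi^m$ as follows:

$$\phi^m(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\, j\neq i}^{N-m} D_{ij}^m\right). \quad (6)$$

(5) Change $m$ to $m+1$ and repeat steps (1) to (4):

$$\phi^{m+1}(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\, j\neq i}^{N-m} D_{ij}^{m+1}\right). \quad (7)$$

The fuzzy entropy is then defined as

$$\mathrm{FuzzyEn}(m, n, r) = \lim_{N\to\infty}\left[\ln\phi^m(n, r) - \ln\phi^{m+1}(n, r)\right], \quad (8)$$

which for a series of finite length $N$ is estimated by

$$\mathrm{FuzzyEn}(m, n, r, N) = \ln\phi^m(n, r) - \ln\phi^{m+1}(n, r). \quad (9)$$

The output function of ELM with $L$ hidden nodes is shown as follows:

$$f(x) = \sum_{i=1}^{L}\beta_i h_i(x) = h(x)\beta, \quad (10)$$

where $\beta = [\beta_1, \ldots, \beta_L]^T$ is the output weight connecting the hidden nodes and the output nodes, and $h(x) = [h_1(x), \ldots, h_L(x)]$ is the hidden layer output for input $x$, in which $h_i(x)$ is the output of the $i$th hidden node. Equation (10) can be written as follows:

$$H\beta = T, \quad (11)$$

where $H$ is the hidden layer output matrix of the training data and $T$ is the training data target matrix. SBELM learns the output weight $w$ by using a Bayesian method instead of calculating the Moore-Penrose generalized inverse of $H$ [17]. The hidden layer output $h$ becomes the input of SBELM. Each training sample is treated as an independent Bernoulli event, so the probability $p(t \mid x)$ follows a Bernoulli distribution. The sigmoid function is applied to convert the predicted output $Y(h; w)$ as follows:

$$\sigma[Y(h; w)] = \frac{1}{1 + e^{-Y(h; w)}}. \quad (12)$$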
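The likelihood function of the sample set is expressed as follows:

$$p(t \mid w, H) = \prod_{i=1}^{N} \sigma[Y(h_i; w)]^{t_i}\,\{1 - \sigma[Y(h_i; w)]\}^{1-t_i}, \quad (13)$$

where $t_i$ is the target of training sample $x_i$, $Y(h_i; w) = h_i w$, and $t_i \in \{0, 1\}$. Conditioned on a hyperparameter vector $\alpha$, the zero-mean Gaussian prior distribution over $w$ is as follows:

$$p(w \mid \alpha) = \prod_{k=1}^{L} \sqrt{\frac{\alpha_k}{2\pi}}\,\exp\!\left(-\frac{\alpha_k w_k^2}{2}\right). \quad (14)$$

The key step of SBELM is to establish the marginal likelihood of $t$ conditioned on $\alpha$ and $H$ and to determine $\alpha$ by maximizing the marginal likelihood $p(t \mid \alpha, H)$ with the Laplace approximation method:

$$\ln\{p(t \mid w, H)\,p(w \mid \alpha)\} = \sum_{i=1}^{N}\left[t_i \ln y_i + (1 - t_i)\ln(1 - y_i)\right] - \frac{1}{2} w^T A w + \mathrm{const}, \quad (15)$$

where $y_i = \sigma[Y(h_i; w)]$, $A = \mathrm{diag}(\alpha)$, and $\mathrm{const} = \sum_{k=1}^{L}\left(\frac{1}{2}\ln\alpha_k - \frac{1}{2}\ln 2\pi\right)$. Then, a quadratic approximation of the log posterior is made from its gradient and Hessian:

$$\nabla_w \ln\{p(t \mid w, H)\,p(w \mid \alpha)\} = H^T(t - y) - Aw, \qquad \nabla_w\nabla_w \ln\{p(t \mid w, H)\,p(w \mid \alpha)\} = -(H^T B H + A), \quad (16)$$

where $B$ is a diagonal matrix in which $b_i = y_i(1 - y_i)$ with $i = 1, \ldots, N$. Therefore, the center $\hat{w}$ and covariance matrix $\Phi$ of the Gaussian distribution of $w$ are obtained as follows:

$$\hat{w} = \Phi H^T B\,\hat{t}, \qquad \Phi = (H^T B H + A)^{-1}, \quad (17)$$

where $\hat{t} = Hw + B^{-1}(t - y)$. With the Gaussian approximation of $w$, the log of the marginal likelihood is represented as follows:

$$L(\alpha) = -\frac{1}{2}\left[N \ln 2\pi + \ln|C| + \hat{t}^T C^{-1}\hat{t}\right], \quad (18)$$

where $C = B^{-1} + H A^{-1} H^T$. By setting the derivative of $L(\alpha)$ with respect to $\alpha$ to 0, the hyperparameter is updated as follows:

$$\alpha_k^{\mathrm{new}} = \frac{1 - \alpha_k \Phi_{kk}}{\hat{w}_k^2}, \quad (19)$$

where $\Phi_{kk}$ is the $k$th diagonal element of $\Phi$. The main procedure of SBELM is described as follows.

(2) By utilizing the Laplace approximation approach, obtain the approximated Gaussian distribution of $w$ and update $\hat{w}$ and $\Phi$ by using (17).
(3) By maximizing the marginal likelihood, utilize (19) to update the hyperparameter $\alpha$ until the termination criteria are reached.
(4) By tuning some $w_k$ to 0, obtain the sparse representation of the hidden layer output weight.
(5) For an unknown sample $x$, utilize (12) to predict the probability distribution $p(t \mid x, \hat{w})$.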
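The SBELM procedure above can be illustrated with a minimal standard-library sketch. It is not the implementation of [17]: the sigmoid hidden activation, the uniform random initialization, the iteration counts, the pruning threshold `prune`, the lower bound on $\alpha_k$ added for numerical safety, and all function names are assumptions made for illustration. The Newton iteration used here reaches the same mode as (17), and the hyperparameter update follows (19).

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function, as in (12)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def hidden_output(X, W, b):
    """Hidden layer output h(x) of a random-feature layer (assumed sigmoid nodes)."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(Wj, x)) + bj)
             for Wj, bj in zip(W, b)] for x in X]

def posterior_terms(H, t, w, alpha, active):
    """Gradient H^T(t - y) - A w and matrix H^T B H + A over the active weights."""
    y = [sigmoid(sum(w[k] * h[k] for k in active)) for h in H]
    g = [sum(h[k] * (ti - yi) for h, ti, yi in zip(H, t, y)) - alpha[k] * w[k]
         for k in active]
    M = [[sum(h[k] * yi * (1.0 - yi) * h[l] for h, yi in zip(H, y))
          + (alpha[k] if k == l else 0.0) for l in active] for k in active]
    return g, M

def train_sbelm(X, t, n_hidden=8, outer=10, newton=5, prune=1e4, seed=0):
    """Binary SBELM sketch: returns hidden weights W, biases b, sparse output weights w."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_output(X, W, b)
    w, alpha = [0.0] * n_hidden, [1.0] * n_hidden
    active = list(range(n_hidden))
    for _ in range(outer):
        if not active:
            break
        for _ in range(newton):  # Laplace approximation: Newton steps to the mode
            g, M = posterior_terms(H, t, w, alpha, active)
            for k, step in zip(active, solve(M, g)):
                w[k] += step
        _, M = posterior_terms(H, t, w, alpha, active)
        for idx, k in enumerate(active):  # hyperparameter update (19)
            unit = [1.0 if j == idx else 0.0 for j in range(len(active))]
            phi_kk = solve(M, unit)[idx]  # k-th diagonal element of the covariance
            gamma = max(1.0 - alpha[k] * phi_kk, 1e-12)
            alpha[k] = max(gamma / max(w[k] ** 2, 1e-12), 1e-6)  # floor is an assumption
        for k in active:  # weights with very large alpha are tuned to zero
            if alpha[k] > prune:
                w[k] = 0.0
        active = [k for k in active if alpha[k] <= prune]
    return W, b, w

def predict_proba(X, model):
    """Probability of class 1 for each sample via the sigmoid of h(x) w."""
    W, b, w = model
    return [sigmoid(sum(wk * hk for wk, hk in zip(w, h)))
            for h in hidden_output(X, W, b)]
```

In this sketch a call such as `predict_proba(X_new, train_sbelm(X, t))` returns one probability per sample; the paired strategy would train one such binary model for each pair of single failure modes.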

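In simultaneous failure diagnosis, more than one failure may occur at the same time, which implies that $\sum_{i=1}^{d} p_i \neq 1$ [21], where $p_i$ is the probability of the $i$th failure and $d$ is the number of single failure modes. Each binary SBELM classifier estimates a probability output $p_{ij}$ that measures the correlation between the $i$th and the $j$th class, and the paired probability output is obtained as follows:

$$p_i = \frac{\sum_{j=1,\, j\neq i}^{d} n_{ij}\, p_{ij}}{\sum_{j=1,\, j\neq i}^{d} n_{ij}},$$

where $n_{ij}$ is the number of training samples belonging to the $i$th and the $j$th class.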
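As a small worked illustration of the paired probability output, the following sketch combines pairwise outputs into one probability per failure mode, weighting each pairwise output by the number of training samples of the two classes involved. The function name, the nested-list layout, and the example numbers are hypothetical.

```python
def paired_probabilities(pair_prob, pair_count):
    """Combine pairwise outputs into one probability per class.

    pair_prob[i][j]  : probability of class i from the classifier trained on classes i and j
    pair_count[i][j] : number of training samples belonging to classes i and j
    Diagonal entries are ignored.
    """
    d = len(pair_prob)
    combined = []
    for i in range(d):
        num = sum(pair_count[i][j] * pair_prob[i][j] for j in range(d) if j != i)
        den = sum(pair_count[i][j] for j in range(d) if j != i)
        combined.append(num / den)
    return combined

# Hypothetical three-class example with 100 training samples per pair.
pair_prob = [[0.0, 0.9, 0.8],
             [0.1, 0.0, 0.7],
             [0.2, 0.3, 0.0]]
pair_count = [[0, 100, 100],
              [100, 0, 100],
              [100, 100, 0]]
p = paired_probabilities(pair_prob, pair_count)
```

In this example the combined probabilities are approximately 0.85, 0.40, and 0.25, so they do not sum to 1, which is what allows several failure modes to be reported at once.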

Optimization of Decision Threshold for Simultaneous Failure Mode Recognition.
For a $d$-class classification problem, the output of the classifiers based on paired SBELM (PSBELM) is a probability vector $p = [p_1, \ldots, p_d]$ in which $p_i$ represents the probability that the $i$th failure occurs. To obtain the final simultaneous failure modes, an appropriate threshold value is indispensable. Researchers usually use 0.5 as the threshold value [21], which is general but not tuned to a specific application. This paper utilizes the Grid Search method and an independent sample set containing both single and simultaneous failures to generate an optimal decision threshold $\varepsilon^*$ between 0 and 1, which converts the probability output vector $p$ into the result vector $y = [y_1, \ldots, y_i, \ldots, y_d]$:

$$y_i = \begin{cases} 1, & p_i \ge \varepsilon^*, \\ 0, & \text{otherwise.} \end{cases}$$

The simultaneous failure modes are those single failures whose corresponding $y_i$ is equal to 1. Since the search range is limited, the time-consuming nature of the Grid Search method does not outweigh its advantage in global optimization compared with GA and PSO [18].
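The conversion from a probability vector to a result vector can be sketched in a few lines. The function names, the use of "greater than or equal" at the boundary, and the example values are assumptions for illustration only.

```python
def apply_threshold(probabilities, threshold):
    """Convert a probability vector into a 0/1 result vector."""
    return [1 if p >= threshold else 0 for p in probabilities]

def detected_failures(probabilities, threshold):
    """Indices of the single failure modes reported as present."""
    result = apply_threshold(probabilities, threshold)
    return [i for i, y in enumerate(result) if y == 1]

# Hypothetical probability vector for four single failure modes.
p = [0.85, 0.40, 0.25, 0.70]
y = apply_threshold(p, 0.6)       # [1, 0, 0, 1]
modes = detected_failures(p, 0.6)  # [0, 3]
```

With a threshold of 0.6, the first and fourth modes are reported together, that is, as one simultaneous failure.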

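Evaluation of Performance Based on $F_1$-Measure.
Considering that partial matching is valid and significant in simultaneous failure diagnosis [22], an independent testing set and the $F_1$-measure [23], which is commonly used to evaluate information retrieval systems, are used to evaluate the diagnostic accuracy of the proposed simultaneous failure diagnostic framework. Given a data set $D = \{(x_i, t_i)\}$, $i = 1, \ldots, N$, $x_i \in \mathbb{R}^n$, $t_i \in \mathbb{R}^d$, $t_{ij} \in \{0, 1\}$, $j = 1, \ldots, d$, define two variables, precision ($P$) and recall ($R$): $P$ is the ratio between the correctly identified single failure modes and the predicted simultaneous failure modes, and $R$ is the ratio between the correctly identified single failure modes and the actual simultaneous failure modes:

$$P = \frac{\sum_{i=1}^{N}\sum_{j=1}^{d} y_{ij}^{*}\, t_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{d} y_{ij}^{*}}, \qquad R = \frac{\sum_{i=1}^{N}\sum_{j=1}^{d} y_{ij}^{*}\, t_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{d} t_{ij}},$$

where $y_i^{*} = [y_{i1}^{*}, \ldots, y_{id}^{*}]$ is the simultaneous failure mode vector predicted by the proposed framework and $t_i = [t_{i1}, \ldots, t_{id}]$ is the actual simultaneous failure mode vector of $x_i$. The $F_1$-measure value can be obtained as follows:

$$F_1 = \frac{2PR}{P + R} = \frac{2\sum_{i=1}^{N}\sum_{j=1}^{d} y_{ij}^{*}\, t_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{d} y_{ij}^{*} + \sum_{i=1}^{N}\sum_{j=1}^{d} t_{ij}}.$$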
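A short sketch of the evaluation may help to show how partial matches are credited. The function name and the two-sample example are hypothetical; returning 0 when a denominator is empty is a convention chosen here, not something stated in the text.

```python
def f1_measure(predicted, actual):
    """Precision, recall and F1 over 0/1 label vectors (one vector per sample)."""
    correct = sum(y * t for ys, ts in zip(predicted, actual) for y, t in zip(ys, ts))
    n_predicted = sum(sum(ys) for ys in predicted)
    n_actual = sum(sum(ts) for ts in actual)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_actual if n_actual else 0.0
    total = n_predicted + n_actual
    f1 = 2.0 * correct / total if total else 0.0
    return precision, recall, f1

# Hypothetical example: each sample has one mode identified correctly and one mismatch.
predicted = [[1, 1, 0], [0, 1, 0]]
actual = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = f1_measure(predicted, actual)
```

Here two of the three predicted modes are correct and two of the three actual modes are found, so precision, recall, and $F_1$ are all 2/3, whereas an exact-match criterion would score both samples as wrong.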

The Proposed Framework for Final Drive Simultaneous Failure Diagnosis.
The structure of the proposed framework is shown in Figure 2, and its procedure is described as follows.
(1) Divide the sample set into four parts: $D_{training1}$, $D_{training2}$, $D_{threshold}$, and $D_{testing}$. All samples are preprocessed by WPT, and fuzzy entropy is used to measure the features of the oscillation.
(2) Use $D_{training1}$, which contains only single failure modes, to optimize the parameters of WPT in preprocessing, including the number of decomposition levels and the mother wavelet, with an SBELM-based failure diagnostic model.
(3) With the optimal parameters of WPT obtained in step (2), preprocess $D_{training2}$, which contains only single failure modes, and train the classifiers based on paired SBELM.
(4) The optimal PSBELM diagnostic model generates a probability output vector $[p_1, \ldots, p_d]$ for the $d$-label classification problem; use $D_{threshold}$, which contains both single and simultaneous failure modes, together with Grid Search to determine the decision threshold value $\varepsilon^*$ that is used to obtain the simultaneous failure modes.

Experiment Setup.
To obtain representative sample data for constructing the diagnostic model and to verify the efficiency of the proposed diagnostic platform, the experiments are carried out in a quiet room on a test bed consisting of a PC, two sensors, a signal amplifier, and a simulation turntable, as shown in Figure 3, to collect enough original vibration signals from the final drive. The two sensors are mounted on the final drive in the horizontal and vertical directions, as shown in Figure 4, to collect the vibration signal while the drive is running. Most failures of the final drive, such as gear error, gear hard point, and broken tooth, occur in the gear pair, which is the core of the final drive and consists of a driving gear and a driven gear. This research simulates 9 common failures, including 6 single failures and 3 simultaneous failures, which are described in detail in Table 1, at a driving motor speed of 1200 r/min. As shown in Figure 5, the amplitudes of simultaneous failures are clearly greater than those of single failures, because the single failures are strongly coupled when a simultaneous failure occurs. The waveforms of single and simultaneous failure patterns are similar, so it is difficult to distinguish them manually, but the characteristics embedded in each vibration signal can be extracted and identified with the methods mentioned above.
To make the vibration signals used to construct the diagnostic models representative, each failure mode is simulated 100 times and the most stable 2 seconds of each run are recorded at a sampling rate of 12 kHz, which should be higher than the gear meshing frequency so that effective failure information is not discarded during sampling. Eventually, 1000 samples are obtained and prepared for preprocessing. All the simulations are implemented in MATLAB 7.0 running on a PC with a 3.4 GHz CPU and 4.0 GB of RAM.

Feature Extraction Based on WPT and Fuzzy Entropy.
In this paper, fuzzy entropy is used to reflect changes in complexity. The average fuzzy entropy of the vibration signal corresponding to each failure mode is shown in Table 2. The fuzzy entropy values of these 10 failure modes are close to one another, so that it cannot [...] and 300 simultaneous failure samples. In each trial, the whole sample set is randomly divided into four parts: $D_{training1}$, $D_{training2}$, $D_{threshold}$, and $D_{testing}$. $D_{training1}$ and $D_{training2}$, which consist of only single failure modes, are used for optimizing the parameters of WPT and for training the optimal diagnostic model based on paired SBELM. $D_{threshold}$, which contains both single and simultaneous failure modes, is used to generate the optimal decision threshold that converts the probability output of the paired SBELM diagnostic model into the final simultaneous failure modes. $D_{testing}$ is used to test and evaluate the proposed framework with the $F_1$-measure. The whole sample set is preprocessed, and the training samples outnumber the testing samples to ensure the generalization of the proposed framework. The distribution plan is shown in Table 3.
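The random division performed in each trial can be sketched as follows. The subset sizes are assumptions pieced together from the sample counts quoted elsewhere in this paper (250 single failure samples in each training subset, 100 single plus 200 simultaneous samples for the threshold subset, and 100 single plus 100 simultaneous samples for testing); the function name and the dictionary keys are hypothetical.

```python
import random

def split_samples(single, simultaneous, seed=None):
    """Randomly divide the samples into the four subsets used in one trial."""
    if len(single) < 700 or len(simultaneous) < 300:
        raise ValueError("expected at least 700 single and 300 simultaneous samples")
    rng = random.Random(seed)
    single = list(single)
    simultaneous = list(simultaneous)
    rng.shuffle(single)
    rng.shuffle(simultaneous)
    return {
        # Only single failure modes are used for the two training subsets.
        "training1": single[:250],
        "training2": single[250:500],
        # Both kinds of samples are used for threshold generation and testing.
        "threshold": single[500:600] + simultaneous[:200],
        "testing": single[600:700] + simultaneous[200:300],
    }
```

Calling the function with a different `seed` in each trial reshuffles the samples while keeping the four subsets disjoint.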

Optimization of Preprocessing and Feature Extraction.
In data preprocessing, an optimal combination of the decomposition level and mother wavelet of WPT and of the parameters of fuzzy entropy achieves better classification performance. We use $D_{training1}$, which contains 250 randomly selected single failure samples, to obtain the combination of level number and mother wavelet best suited to preprocessing the samples collected from the final drive. To simplify the experiment, and on the basis of previous research results, we focus on three wavelets, Db3, Db4, and Db5, and three decomposition levels from 3 to 5. The three parameters of fuzzy entropy, $m$, $n$, and $r$, are defined empirically in advance. Parameter $m$ is usually set to 2. The parameters $n$ and $r$, which determine the boundary of the fuzzy function, are set to 2 and 0.2 STD, where STD is the standard deviation of the original data [8].
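The feature extraction step, a wavelet packet decomposition followed by the fuzzy entropy of each sub-band, can be sketched with the standard library alone. To stay self-contained, this sketch swaps in Haar filters in place of the Daubechies wavelets considered above, and it takes the standard deviation from the series passed in; both choices, together with the function names, are assumptions for illustration. The fuzzy function is the exponential form $\exp(-d^n / r)$.

```python
import math

def haar_wpt(signal, level):
    """Wavelet packet decomposition with Haar filters.

    Returns 2**level sub-bands in natural (not frequency) order.
    """
    bands = [list(signal)]
    for _ in range(level):
        next_bands = []
        for band in bands:
            pairs = range(0, len(band) - 1, 2)
            next_bands.append([(band[i] + band[i + 1]) / math.sqrt(2) for i in pairs])  # low-pass
            next_bands.append([(band[i] - band[i + 1]) / math.sqrt(2) for i in pairs])  # high-pass
        bands = next_bands
    return bands

def fuzzy_entropy(u, m=2, n=2, r=None):
    """Fuzzy entropy ln(phi^m) - ln(phi^(m+1)) of a finite series u."""
    N = len(u)
    if r is None:
        mean = sum(u) / N
        r = 0.2 * math.sqrt(sum((x - mean) ** 2 for x in u) / N)  # 0.2 STD
    if r <= 0:
        raise ValueError("r must be positive (the series must not be constant)")

    def phi(dim):
        # N - m vectors for both dimensions, each with its own baseline removed.
        vectors = []
        for i in range(N - m):
            segment = u[i:i + dim]
            baseline = sum(segment) / dim
            vectors.append([x - baseline for x in segment])
        count = len(vectors)
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i != j:
                    d = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
                    total += math.exp(-(d ** n) / r)  # fuzzy similarity degree
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))

def extract_features(signal, level=3):
    """One fuzzy entropy value per sub-band (8 values for a 3-level decomposition)."""
    return [fuzzy_entropy(band) for band in haar_wpt(signal, level)]
```

For a three-level decomposition this yields an eight-dimensional feature vector per vibration signal, which is the kind of input the diagnostic model receives.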
Using the single failure samples contained in $D_{training1}$ and a standard diagnostic model based on SBELM with failure parameters, the parameters of WPT that achieve the best preprocessing performance are identified. The standard SBELM-based diagnostic model is used only for selecting the optimal parameters of WPT, namely, those of the failure identification model with the highest classification accuracy. The comparison result is shown in Figure 6, which indicates that preprocessing with 3-level decomposition and Db4 as the mother wavelet, combined with the standard SBELM-based diagnostic model, gives the highest classification accuracy, 95.2%. This parameter combination of WPT is therefore suitable for preprocessing the dataset in this application. After decomposing the vibration signal with three-level wavelet packet decomposition, the corresponding fuzzy entropy values are calculated as shown in Figure 7. In Figure 7, the horizontal axis represents the eight subfrequency bands of the three-level wavelet packet decomposition, and the vertical axis represents the fuzzy entropy value. The FuzzyEn of the oscillation from a final drive with simultaneous failures is larger than that of single failures and of normal status. When simultaneous failures occur under rotation of the gear pair, different failure points are coupled together to make the

Effectiveness of Optimal Decision Threshold.
After the optimal diagnostic model based on paired SBELM has been constructed from only single failure modes, with the optimal parameters of WPT in preprocessing, generating the optimal decision threshold is the pivotal step that determines the final diagnostic accuracy for simultaneous failures. Traditional machine learning methods usually adopt 0.5 as a general threshold value (GT) [24]. This research uses $D_{threshold}$, which contains 100 single failure samples and 200 simultaneous failure samples, and the Grid Search method with an interval of 0.01 to search for the final decision threshold $\varepsilon^*$ in the range from 0 to 1. Although Grid Search is time consuming, it can obtain the global optimum.
To verify the effectiveness of the optimal decision threshold, a set of experiments with 5-fold cross validation is carried out on $D_{threshold}$ for the recognition of both single and simultaneous failure modes. The results are shown in Figure 8.
After the threshold is optimized, the accuracy of the diagnostic model improves by an average of 6%. The fixed general threshold is chosen by experience, so it is general but not optimized [25]. Even the same diagnostic model may require a different threshold when it diagnoses a different sample set. Therefore, this research uses an independent sample set to generate the optimal decision threshold.
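The threshold search described in this subsection can be sketched as an exhaustive scan over candidate values spaced 0.01 apart. The selection criterion used here, the $F_1$-measure on the threshold subset, is an assumption, as are the function names and the three-sample example.

```python
def f1_score(predicted, actual):
    """F1-measure over 0/1 label vectors (one vector per sample)."""
    correct = sum(y * t for ys, ts in zip(predicted, actual) for y, t in zip(ys, ts))
    total = sum(sum(ys) for ys in predicted) + sum(sum(ts) for ts in actual)
    return 2.0 * correct / total if total else 0.0

def grid_search_threshold(prob_vectors, actual, step=0.01):
    """Scan thresholds in [0, 1] and keep the first one with the best F1."""
    n_steps = int(round(1.0 / step))
    best_threshold, best_f1 = 0.0, -1.0
    for k in range(n_steps + 1):
        threshold = k / n_steps
        predicted = [[1 if p >= threshold else 0 for p in vec] for vec in prob_vectors]
        score = f1_score(predicted, actual)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1

# Hypothetical probability outputs and true labels for three samples.
probs = [[0.90, 0.60, 0.10],
         [0.20, 0.80, 0.55],
         [0.70, 0.25, 0.20]]
labels = [[1, 1, 0],
          [0, 1, 1],
          [1, 0, 0]]
best_threshold, best_f1 = grid_search_threshold(probs, labels)
```

Because only 101 candidate values are evaluated, the scan is cheap even though it is exhaustive. In this example the general threshold of 0.5 would already be adequate; the search matters when the probability outputs of true failure modes fall below 0.5 or those of absent modes rise above it.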

Sensitivity Analysis of SBELM.
For diagnosis based on ELM, diagnostic accuracy and training speed are sensitive to the initial number of hidden nodes. To analyze the sensitivity of SBELM to the number of hidden nodes in this application, the 500 single failure samples in $D_{training1}$ and $D_{training2}$ are used to train an SBELM-based classifier, and the best average accuracy as the number of hidden nodes increases is shown in Figure 9. As shown in Figure 9, the average accuracies of ELM vary widely as the number of hidden nodes increases. The reason for this fluctuation is that ELM generalizes poorly because of overfitting [17]. In contrast, the average accuracies of SBELM are stable and clearly higher than those of ELM. The result verifies that SBELM is relatively insensitive to the initial number of hidden nodes. Moreover, SBELM achieves excellent accuracy with a small hidden layer, which effectively reduces the computational cost.

Evaluation of the Proposed Framework.
To confirm the validity of the proposed simultaneous failure diagnosis framework, we use $D_{testing}$, which contains 100 single failure samples and 100 simultaneous failure samples, and the $F_1$-measure to compare the proposed framework with diagnostic models based on PNN and SVM in terms of diagnostic accuracy and diagnostic speed. First, the sample set consisting of $D_{training1}$ and $D_{training2}$ is used to construct the diagnostic models based on PNN and SVM and to tune their parameters separately, and then $D_{threshold}$ is used to generate the optimal threshold value. Since SVM is essentially a binary classifier [26], for simultaneous failure diagnosis we combine SVM with a multiclass classification strategy to construct a set of classifiers in which each classifier focuses on only two failure modes. To ensure good performance of the SVM-based classifiers, the regularization parameter $C$ of SVM is set to a power of 10 with an exponent between 0 and 2. A radial basis kernel function is employed in SVM, with $C = 10$ and a kernel parameter of 2 giving the best classification accuracy. As a probability classifier, PNN has one crucial hyperparameter, the spread $s$. In this research, the value of $s$ is chosen from 1 to 3 with an interval of 0.5, following the conclusions of the references. Finally, the best spread and threshold value for PNN are 1 and 0.69.
To verify the effectiveness of the paired strategy in the proposed framework, a set of experiments with the one-to-all strategy is carried out. The experimental results are shown in Table 4. Across the different classifiers, the accuracies obtained with the paired strategy are generally 2% to 4% higher than those obtained with the one-to-all strategy. The primary reason is that the paired strategy used in the proposed framework fully considers the correlation between each pair of single failures, whereas the one-to-all strategy may leave indecision regions between different classes, and samples that fall into an indecision region are prone to misclassification.
To verify the performance of the proposed framework, a set of experiments is carried out on different classifiers with the same testing set and their best parameters. The decision threshold values, training time, testing time, and testing accuracy of the diagnostic models based on paired SBELM, SVM, and PNN are shown in Table 5. The diagnostic accuracies of paired SBELM for single failures, simultaneous failures, and the entire sample set are 98.4%, 92.8%, and 96.2%, which are higher than those of SVM and PNN. The reason is that SBELM estimates the probability distribution of the output values instead of fitting the data, which improves generalization [17]. Moreover, the training time and testing time of paired SBELM are 145.4 ms and 48.7 ms, which are much shorter than those of SVM. The reason for this disparity is that, even though paired SBELM builds a set of binary classifiers, the sparsity of SBELM reduces the computational cost. Consequently, the disparity will become more obvious as the sample size grows.
In the practical application of auto manufacturers, representative and valid samples are continuously collected and added to the training sample database to improve training accuracy. Learning speed therefore becomes a crucial factor in evaluating the efficiency of a diagnostic platform. Overall, considering both diagnostic accuracy and diagnostic efficiency, the proposed platform is superior in simultaneous failure diagnosis, and it is suitable not only for the final drive of a car but can also be ported to other research fields.
To verify the stability of the proposed diagnostic framework based on paired SBELM, 100 trials are carried out. In each trial the whole sample set is reshuffled and randomly redistributed into $D_{testing}$, while making sure that $D_{testing}$ contains enough single failure samples and simultaneous failure samples. The testing result is shown in Figure 10, in which the testing accuracy stays within the range between 95% and 97% and shows no dramatic variation over the 100 simulation trials.

Conclusion
This paper proposes a novel framework based on SBELM and fuzzy entropy for simultaneous failure diagnosis of the final drive, a core component that affects the performance and safety of the car. The proposed framework contains four parts: preprocessing and feature extraction based on WPT and fuzzy entropy, construction of a diagnostic model based on paired SBELM, generation of the decision threshold value, and recognition of simultaneous failure modes. Single failure samples are used to obtain the optimal parameters of WPT for the data in this application. A diagnostic model based on paired SBELM is then constructed, in which each binary classifier is trained with only single failure samples. With an independent sample subset containing both single and simultaneous failure samples, the Grid Search method is used to generate the optimal decision threshold, by which the probability output of the diagnostic model is converted into the final simultaneous failure modes. Compared with the frequently used diagnostic models based on SVM and PNN, the proposed framework has three advantages. (1) The proposed framework based on SBELM inherits the advantages of ELM (efficient approximation and learning speed) and of sparse Bayesian learning (high sparsity and generalization). (2) Given that assembling all possible simultaneous failure modes is impractical, the proposed framework trains the paired SBELM classifiers with only single failure samples; moreover, the paired strategy effectively avoids the indecision regions between different classes that can result in misclassification. (3) With an average testing accuracy of 96.2% and a testing time of 48.7 ms, the proposed framework outperforms the other diagnostic models in diagnostic accuracy and learning speed.
The proposed framework is general and transferable, so it can be applied to simultaneous failure diagnosis in other industrial applications in which the accuracy and time cost of failure identification are key factors.