A Novel Deep Sparse Filtering Method for Intelligent Fault Diagnosis by Acoustic Signal Processing



Introduction
Rotating machinery is widely used in automobile engines, wind power equipment, and water turbine generators. Even a slight mechanical fault may directly affect the operation of a machine and cause a severe accident. Thus, it is important to ensure the highly stable operation of machines [1]. Fault diagnosis technology has proven to be an effective way to monitor the operating state of equipment in recent years [2][3][4]. How to extract useful features from massive mechanical data to precisely diagnose the health condition of machines has long been a hot research topic [5][6][7].
As an effective fault diagnosis approach for rotating machinery, acoustic signal processing has received increasing research attention. Because acoustic signals are easier to obtain and less costly than vibration signals [8], applying them to fault diagnosis has become a trend [9]. However, the signal-to-noise ratio of acoustic signals is low, which increases the difficulty of signal denoising and feature extraction. Traditional fault diagnosis is usually based on signal processing methods such as the short-time Fourier transform (STFT) [10], the wavelet transform (WT) [11], and empirical mode decomposition (EMD) [12], which can handle acoustic signals and achieve certain results. However, all of these methods require empirical knowledge and are time-consuming.
Unsupervised learning may hold potential to overcome the aforementioned weaknesses of traditional intelligent fault diagnosis methods. The basic idea behind unsupervised feature learning is that training an artificial intelligence technique can be viewed as learning a nonlinear function that transforms the raw data from the original space into a feature space. The purpose of unsupervised feature learning is to adaptively learn effective features from unlabeled data rather than rely on hand-engineered feature representations [13]. Unsupervised feature learning has been widely applied in speech recognition [14], face recognition [15], image classification [16], and other fields. Sparse filtering (SF) [17] is an unsupervised two-layer neural network that exploits population sparsity, lifetime sparsity, and high dispersal to learn discriminative features automatically. SF has been successfully applied to many scenarios, and its usefulness has been repeatedly confirmed; how the implicit hypotheses and constraints of sparse filtering make it suitable for certain scenarios is analyzed in [18]. Lei et al. [19] first applied SF to bearing fault diagnosis by adopting a two-stage learning method, which greatly reduced human labor and made intelligent fault diagnosis much better suited to big data. Yang et al. [20] introduced L2-norm regularization to enhance the generalization ability of SF and achieved better classification performance. Qian et al. [21] introduced L1-norm regularization into the cost function of SF and replaced the soft-absolute activation function with a logarithm function to prevent overfitting more efficiently. Wang et al. [22] proposed a sparse-filtering-based framework that can adaptively extract features from frequency-domain signals.
Although SF can automatically extract useful features from vibration signals, its feature extraction ability remains poor on acoustic signals. We therefore build a batch-normalized deep sparse filtering (DSF) model that filters the acoustic signal twice to better remove redundant information, after which the weights are fine-tuned by the back-propagation (BP) algorithm. The main contributions of this paper are summarized as follows: (1) Batch normalization is introduced into the DSF model, which reparametrizes the hidden layers of DSF, improves training speed, and accelerates convergence. (2) A two-layer batch-normalized DSF is established to filter the acoustic signal twice, and the weights are then adjusted by the BP algorithm to obtain more robust features. The experimental results on gearbox and bearing datasets show that DSF achieves higher accuracy and shorter computing time than other SF models. The rest of this paper is organized as follows. In Section 2, DSF is introduced. In Sections 3 and 4, a deep sparse filtering framework is established, and planetary gear and bearing datasets are investigated using the different SF models, respectively. The conclusion is presented in Section 5.

Sparse Filtering.
Traditional unsupervised feature learning methods [23] need to adjust multiple parameters to achieve good performance, which is an arduous task. Therefore, Ngiam et al. [17] proposed SF, a method with only one hyperparameter, to address this weakness. As a simple and efficient unsupervised learning method, SF focuses only on the sparse distribution of the data features. The structure of SF is shown in Figure 1; its input and output are the collected dataset and the learned features, respectively.
First, acoustic signals are collected from each health condition and combined into a training set $\{x^i\}_{i=1}^{M}$. The feature matrix is computed with the soft-absolute activation

$$f_j^i = \sqrt{\left(W_j^T x^i\right)^2 + \varepsilon},$$

where $f_j^i$ corresponds to the $j$th feature of the $i$th sample. Each row of the feature matrix is then normalized by its L2 norm:

$$\tilde{f}_j = \frac{f_j}{\|f_j\|_2}.$$

Then, each column is normalized by its L2 norm, so that the features are mapped onto the unit L2-ball:

$$\hat{f}^i = \frac{\tilde{f}^i}{\|\tilde{f}^i\|_2}.$$

Finally, the optimized features are obtained through an L1 penalty; the cost function of SF is

$$\min_W \sum_{i=1}^{M} \left\|\hat{f}^i\right\|_1.$$
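The three steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation; the constant `eps`, the matrix shapes, and the random data are assumptions for demonstration.

```python
import numpy as np

def sparse_filtering_cost(W, X, eps=1e-8):
    """Sparse filtering cost for a weight matrix W (n_out x n_in)
    and a data matrix X (n_in x n_samples), following the three
    steps described above. `eps` is an assumed small constant for
    the soft-absolute activation and to guard divisions."""
    # Soft-absolute activation: f = sqrt((W x)^2 + eps)
    F = np.sqrt((W @ X) ** 2 + eps)
    # Row normalization: each feature divided by its L2 norm over samples
    F_row = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)
    # Column normalization: each sample mapped onto the unit L2-ball
    F_hat = F_row / (np.linalg.norm(F_row, axis=0, keepdims=True) + eps)
    # L1 penalty on the normalized (nonnegative) features
    return np.abs(F_hat).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))   # 50 samples, 20 input dimensions
W = rng.standard_normal((8, 20))    # 8 learned features
cost = sparse_filtering_cost(W, X)
```

In practice the cost is minimized over W with an off-the-shelf optimizer (the original SF work uses L-BFGS); only the objective is shown here.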

Deep Sparse Filtering.
The standard SF model is a simple two-layer neural network. In this paper, the DSF model is built by layer-by-layer unsupervised learning. Specifically, the output features of the first SF are used as the input of the second SF to extract features layer by layer, and the weights are then fine-tuned in reverse using the BP algorithm. In addition, we use the rectified linear unit (ReLU) [24] as the activation function. The output features of the $n$th hidden layer are calculated as

$$f^n = \operatorname{ReLU}\left(W^n f^{n-1}\right),$$

where $W^n$ represents the weight matrix between the $(n-1)$th and $n$th hidden layers. Each SF layer is trained by solving the minimization problem

$$\min_{W^n} \sum_{i=1}^{k} \left\|\hat{f}^{n,i}\right\|_1,$$

where $k$ is the number of training samples in the $n$th layer. At the same time, batch normalization is introduced to optimize the DSF. Batch normalization can reparametrize almost any deep network in an elegant way, and the procedure can be used before every activation layer without parameter adjustment. For a layer whose units $x_i$ are computed over a mini-batch $(x_1, \dots, x_k)$, batch normalization makes two simplifications to improve training and reduce internal covariate shift.
Firstly, each scalar feature is normalized independently to zero mean and unit variance:

$$\hat{x}_i = \frac{x_i - E[x_i]}{\sqrt{\operatorname{Var}[x_i] + \varepsilon}},$$

where $E[x_i]$ and $\operatorname{Var}[x_i]$ are the mean and variance of each unit over the mini-batch and $\varepsilon$ is a small constant for numerical stability. However, simply normalizing each input of a layer can still change what the layer represents. Therefore, two parameters $\gamma_i$ and $\beta_i$ are employed for each activation $x_i$ to scale and shift the normalized value:

$$y_i = \gamma_i \hat{x}_i + \beta_i.$$

$\gamma_i$ and $\beta_i$ are learned along with the original model parameters and restore the representation power of the network. Note that the raw activations can be recovered by setting $\gamma_i = \sqrt{\operatorname{Var}[x_i]}$ and $\beta_i = E[x_i]$. In this way, a steady distribution of activation values is guaranteed during training.
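The two batch-normalization steps above can be sketched as follows. This is a minimal NumPy illustration of the training-time transform; the value of `eps`, the batch size, and the identity initialization of `gamma`/`beta` are chosen for demonstration rather than taken from the paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch, features):
    first normalize each unit to zero mean and unit variance, then
    scale and shift with the learnable gamma and beta."""
    mean = x.mean(axis=0)                    # E[x_i] per unit
    var = x.var(axis=0)                      # Var[x_i] per unit
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # restore representation power

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 4)) * 3.0 + 5.0   # batch of 32, 4 units
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = np.sqrt(x.var(axis=0))` and `beta = x.mean(axis=0)`, the output approximately recovers the raw activations, matching the recovery property noted above.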
Here, we apply batch normalization immediately before the activation layers of SF, so the layer output becomes

$$f^n = \operatorname{ReLU}\left(\operatorname{BN}\left(W^n f^{n-1}\right)\right).$$

Therefore, the BN transform introduces normalized activations into the network and ensures that the layers keep learning on input distributions with reduced internal covariate shift, which provides an easier starting condition for training and further accelerates it.
The number of layers can be selected according to the task requirements. It is generally believed that increasing the number of hidden layers can reduce the network error and improve accuracy, but it also complicates the network, thus increasing the training time and the tendency to overfit. Choosing the depth is therefore a trade-off in applying the proposed method. In this paper, we choose two SF layers to extract features from the acoustic signal, which keeps the computation low while still extracting deep features. Figure 2 shows the schematic of DSF. The acoustic signal datasets are used to train the first SF layer and subsequently the second SF layer: the batch-normalized output features of the first SF layer are used as the input features of the second SF layer, and softmax regression is connected to the last layer of DSF as the classification layer. Finally, the BP algorithm is used for reverse weight fine-tuning.
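The stacked forward pass described above can be sketched as follows. The layer sizes and random weights here are illustrative assumptions (not the paper's 1200-800-400-9 configuration), and batch normalization and the softmax classifier are omitted for brevity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dsf_forward(x, weights):
    """Forward pass of a stacked (deep) sparse-filtering network:
    each layer applies its weight matrix followed by a ReLU, and
    the output of one SF layer feeds the next."""
    h = x
    for W in weights:          # layer-by-layer feature extraction
        h = relu(W @ h)
    return h

rng = np.random.default_rng(2)
x = rng.standard_normal((100, 1))          # one frequency-domain sample
W1 = rng.standard_normal((40, 100)) * 0.1  # first SF layer (illustrative size)
W2 = rng.standard_normal((10, 40)) * 0.1   # second SF layer
features = dsf_forward(x, [W1, W2])
```

In the greedy scheme described above, `W1` would be trained first by minimizing the SF cost on the raw inputs, `W2` on the first layer's outputs, and both would then be fine-tuned jointly by back-propagation through the softmax layer.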

Intelligent Fault Diagnosis Framework Based on DSF
The proposed fault diagnosis method consists of three stages, as shown in Figure 3. In the first stage, the collected time-domain acoustic signals are pre-normalized to eliminate the adverse effects of singular samples. Then, the normalized time-domain signals are transformed into frequency-domain signals through the FFT. In the second stage, the weight matrix $W$ is obtained by training the batch-normalized DSF with the frequency-domain signals, and $W$ is then fine-tuned by the BP algorithm; the optimized $W$ is used to learn deep discriminative features from the original frequency-domain signals. In the third stage, softmax regression is used as a classifier for health condition recognition based on the learned features:
(1) Training data collection: the acoustic time-domain signals collected from rotating machinery under different health conditions are divided into $K$ samples to form the training dataset $\{x^i, y^i\}_{i=1}^{K}$, where $x^i \in \mathbb{R}^{N\times 1}$ denotes each sample containing $N$ time-domain points and $y^i$ denotes the health condition label of the $i$th sample.
(2) Training data processing: the training set is rewritten in matrix form as $X \in \mathbb{R}^{N\times K}$. Before training the DSF model, each column of $X$ is first normalized by its $l_2$-norm:

$$x_n^i = \frac{x^i}{\|x^i\|_2}.$$

Then, the pre-normalized training dataset $\{x_n^i\}_{i=1}^{K}$ is transformed into the training dataset $\{t^i\}_{i=1}^{K}$ by the FFT, where $t^i \in \mathbb{R}^{N_{in}\times 1}$ denotes each sample containing $N_{in}$ Fourier coefficients. $N_{in}$ represents the input dimension of DSF, and $N_{out}$ is the output dimension. The training set $\{t^i\}_{i=1}^{K}$ can be further written as a matrix $S \in \mathbb{R}^{N_{in}\times K}$ for simplicity.
As shown in Figure 4(a), the test bench includes a motor, three shaft couplings, a bearing seat, a gearbox, and a brake. One sample is randomly selected from each health condition to show the acoustic signal details. The time-domain and corresponding frequency-domain waveforms of the samples are shown in Figure 5. It can be seen that it is arduous to distinguish different health conditions manually, and the huge amount of data also increases the difficulty of feature extraction. Therefore, the DSF model is proposed to automatically extract features from the acoustic signal and conduct precise fault classification.

Results and Analysis.
The frequency-domain signal is used as the input of the DSF model, and the output dimensions of the two SF layers are set to 800 and 400, respectively. The number of softmax outputs is 9; therefore, the structure of the DSF model is 1200-800-400-9. Subsequently, we investigate the effect of the iteration number. We randomly select 5% of the samples for training, and the diagnostic accuracies for different iteration numbers are displayed in Figure 6. Since the increase in accuracy is not obvious after the number of iterations exceeds 40, we choose 40 as the iteration number of DSF. Meanwhile, the iteration number of the BP algorithm is 50, and the batch size is 30. To show the superiority of the DSF model, standard SF [19], L1-regularized sparse filtering (L1-SF) [21], and L2-regularized sparse filtering (L2-SF) [20] are used as comparison methods. The output dimension of the three comparison methods is set to 1200, the number of iterations is 100, and the regularization parameter is 1E-5. Each experiment is repeated 20 times to reduce the influence of randomness. The computing platform is a PC with an I5-4210M CPU and 8 GB RAM. The diagnosis results for different numbers of training samples using the proposed DSF model are shown in Figure 7. The accuracy and computing time both increase with the number of training samples. The DSF model with only 5% of the training samples achieves an average testing accuracy of 98.15% ± 0.33%, indicating that the proposed method can diagnose the 9 health conditions even when training samples are scarce. When the proportion of training samples increases to 10%, the average testing accuracy reaches 99.92% ± 0.027%, and the average computing time is 14.9 s. Therefore, in the following experiments, 10% of the samples are used for training. The diagnosis results of the four methods are shown in Figure 8.
The DSF model has the highest average testing accuracy (99.93%) and the lowest standard deviation (0.027%) among all the methods. The average accuracy of the standard SF is 89.05% ± 1.39%, the worst among the methods. The testing accuracies of L1-SF and L2-SF are 90.45% ± 1.09% and 91.63% ± 0.77%, respectively, slightly higher than that of SF. It is worth mentioning that the computing time of the proposed DSF model is 14.9 s; by contrast, the average computing time of SF, L1-SF, and L2-SF is about 100 s. This finding indicates that the DSF method better overcomes the difficulty of extracting acoustic signal features, achieving the highest accuracy and the least computing time among the four methods when diagnosing bearing fault types.
To better present the superiority of DSF, we make a detailed comparison between our method and several classical methods on the same bearing dataset, as summarized in Table 1. In Method 1, ensemble empirical mode decomposition (EEMD) [25] was employed to extract features, which were then classified by an optimized SVM; it achieved 96.67% testing accuracy on the bearing dataset. In Method 2, Jia et al. [26] constructed an SAE-based deep network using frequency spectra as inputs and obtained 99.68% testing accuracy. In Method 3, the frequency spectra are also used as inputs to a back-propagation neural network (BPNN), and the diagnosis accuracy is 73.74%. In Method 4, Xie et al. [27] proposed a feature extraction algorithm based on empirical mode decomposition (EMD) and convolutional neural network (CNN) techniques and obtained 99.75% testing accuracy. In Method 5, the proposed method achieves the best testing accuracy of 99.93%, outperforming all compared approaches.
To show the details of the diagnostic results of the four methods, the confusion matrices on the bearing dataset are presented in Figure 9. It can be seen from Figures 9(a) and 9(b) that the classification results of SF and L1-SF are unsatisfactory.
Concurrent faults such as ROF0.2 and ROF0.4 are not well distinguished, and single faults such as RF0.4 and OF0.4 are not perfectly distinguished either. The fault classification performance of L2-SF is slightly better than that of SF and L1-SF, as shown in Figure 9(c), but it still cannot distinguish the different health conditions with high accuracy, which shows that concurrent faults increase the difficulty of fault classification. As shown in Figure 9(d), the proposed DSF model can distinguish not only single faults but also concurrent faults almost perfectly, which shows that the proposed method can better extract the deep features of the acoustic signal.
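A confusion matrix of the kind shown in Figure 9 can be computed with a small helper like the one below; the integer labels in the example are illustrative, not the paper's fault codes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry (i, j) counts the samples whose true class is i and
    whose predicted class is j; the diagonal holds the correctly
    classified samples."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2])   # illustrative labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # illustrative predictions
cm = confusion_matrix(y_true, y_pred, 3)
```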

Data Description.
The gear fault signals are measured from the gearbox of the test bench shown in Figure 4(a). The collected dataset contains one normal condition (NC) and four kinds of mechanical faults: sun wheel crack (WC), sun wheel pit (WP), pinion crack (PC), and pinion pit (PP), as shown in Figure 10. The gear speed is 2600 r/min, and the sampling frequency is 10.24 kHz. 300 samples were collected for each health condition, and each sample contains 1600 data points. Each sample yields 800 frequency-domain points through the FFT as the input of the model.

Results and Analysis.
10% of the gear samples were randomly selected to train the DSF model. After testing, we set 600 and 100 as the output dimensions of the two SF layers. Each experiment is repeated 20 times to reduce the effects of randomness. The gear fault diagnosis results of the four methods are shown in Figure 11. The training accuracy of all four methods is 100%. The performance of the SF model is the most unsatisfactory, with a testing accuracy of 87.82% ± 0.91%. The testing accuracies of L1-SF and L2-SF are 88% and 90%, respectively. By contrast, the proposed DSF model achieves the highest testing accuracy of 99.11% ± 0.11%. Meanwhile, the average computing time of the DSF model is 9.58 s, while the computing time of the three comparison methods is about 5 times that of the DSF model. In conclusion, the proposed DSF method achieves the highest accuracy and robustness in acoustic signal fault diagnosis. The average accuracies of the five health conditions are shown in Figure 12. All four methods can precisely diagnose the health condition PC. However, the three comparison methods have lower testing accuracies for health conditions NC, PP, WC, and WP. In contrast, the DSF model overcomes this shortcoming and accurately diagnoses all five health conditions.

Shock and Vibration
To confirm the performance of the proposed DSF model, t-distributed stochastic neighbor embedding (t-SNE) is applied to obtain the first two dimensions of the learned features, and the results are shown in Figure 13. It can be seen from Figure 13(a) that the features of the same health condition are well clustered. However, ten points of WC were misclassified as PP, and five points of NC were misclassified as WC, which explains why this method obtains low diagnosis accuracy for the health conditions PP and WC. In Figures 13(b) and 13(c), the separation between NC and WC is not obvious, and more points are misclassified. It is worth noting that the feature clusters of the DSF method are well separated; only four points of WC were mistakenly assigned to NC, and three points of WP were mistakenly assigned to PC, which means that the DSF model learns more discriminative features than the comparison methods.

Conclusion
In this paper, a batch-normalized DSF model is proposed to process acoustic signals for fault diagnosis. Two SF models are stacked to extract the deep features of acoustic signals, and the optimal weights are obtained through fine-tuning with the BP algorithm. The experimental results on the bearing and gearbox datasets show that the DSF model can achieve high testing accuracy even when training samples are insufficient. Meanwhile, compared with other SF models, DSF attains the highest accuracy and the least computing time, which shows that the proposed method is more efficient in feature extraction.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.