Fine-Grained Fault Diagnosis Method of Rolling Bearing Combining Multisynchrosqueezing Transform and Sparse Feature Coding Based on Dictionary Learning



Introduction
The rolling bearing is both a core component of rotating mechanical transmission devices and one of the parts most prone to failure; its health state directly affects the performance of the whole equipment and the safety of staff [1]. To avoid human casualties and economic losses, it is necessary to diagnose the fault states of rolling bearings accurately and in time and to take maintenance measures according to the diagnosis results.
Acquiring good raw signals plays a crucial role in subsequent fault diagnosis. However, raw vibration signals of rolling bearings are usually nonlinear and nonstationary in practice, so proper analysis methods are needed to process these complex signals. Fortunately, joint time-frequency analysis can identify the frequency components of a signal and reveal their time-variant features [2]. Therefore, various time-frequency methods have been widely used in fault diagnosis, mainly including the short-time Fourier transform (STFT) [3], continuous wavelet transform (CWT) [4], Wigner-Ville distribution (WVD) [5], S transform (ST) [6], and Hilbert-Huang transform (HHT) [7]. Unfortunately, the time-frequency images generated by these methods are often blurry or contain serious cross-terms. To address this shortcoming, researchers have improved on these classic time-frequency analysis methods. For instance, Zhang et al. [8] presented a time-frequency analysis method based on CWT and multiple Q-factor Gabor wavelets (MQGWs), which adopted Gabor wavelets with multiple Q-factors to improve the resolution of the CWT time-frequency map. Cai and Xiao [9] introduced the generalized S transform into the fault diagnosis of rolling bearings, which effectively enhances the resolution of the vibration signal in the time-frequency domain. Recently, Yu et al. [10] proposed a novel time-frequency method named the multisynchrosqueezing transform (MSST), which employs an iterative reassignment procedure to improve the energy concentration of the time-frequency representation. MSST generates better energy concentration and suppresses cross-terms over the time-frequency plane, so it can deal effectively with strongly time-varying signals. However, MSST has not been widely used in the fault diagnosis of rolling bearings. Hence, this paper utilizes MSST to construct time-frequency images of each raw vibration signal for fine-grained fault diagnosis of rolling bearings.
The other significant factors that affect the performance of fault diagnosis are effective feature extraction and fault classification. Intelligent diagnosis methods have been successfully applied to identify faults, such as the convolutional neural network (CNN) [3,4], support vector machine (SVM) [11,12], and evolving algorithms (EA) [13]. However, it is impractical to feed time-frequency matrixes directly to classifiers because of their high dimension. To avoid the curse of dimensionality, some researchers have focused on extracting effective, low-dimensional fault features. Yu et al. [14] proposed a supervised sparse coding (SSC) method to further extract time-frequency features from the marginal spectrum acquired from HHT. Li et al. [15] used the two-dimensional nonnegative matrix factorization (2DNMF) technique to extract more informative features from time-frequency matrixes obtained by the generalized S transform. Li et al. [16] proposed a feature extraction and selection scheme, which first used NMF to obtain a candidate feature subset from time-frequency matrixes produced by the S transform and then applied a feature selection algorithm based on mutual information and the nondominated sorting genetic algorithm II (NSGA-II) to select features from that candidate subset. Although these methods achieve good classification results on their own datasets, they may not be suitable for fine-grained fault diagnosis, which must identify both where the fault occurred on the rolling bearing and how severe it is. Therefore, an effective feature extraction method, which combines nonnegative matrix factorization with sparseness constraints (NMFSC) [17] and the solution of nonnegative linear equations, is proposed to reduce the dimension of the feature matrix and extract the most discriminative features for fine-grained diagnosis.
Considering the above factors, a fine-grained fault diagnosis scheme integrating MSST and sparse feature coding based on dictionary learning (SFC-DL) is proposed in this paper. Within this scheme, MSST is first adopted to process the strongly time-varying signal and obtain time-frequency matrixes that accurately reflect the raw signal information, and then SFC-DL is proposed to extract low-dimensional and highly discriminative features from the time-frequency matrixes. Finally, considering the characteristics of the dataset in this paper and the particular advantages of SVM in dealing with small-sample, high-dimensional, and nonlinear datasets, LSVM is employed to classify the sparse feature coding and thereby realize fine-grained fault diagnosis of rolling bearings.
The flowchart of the fine-grained fault diagnosis method is displayed in Figure 1, and the detailed steps are described as follows.
Step 1. Collect the vibration data by data acquisition system and note their states.
Step 2. Sample the vibration signal and ensure each sample contains one complete period at least.
Step 3. Perform MSST for signal samples and obtain the time-frequency images with high resolution.
Step 4. Use SFC-DL to process time-frequency matrixes and get effective sparse feature coding of each sample.
Step 5. Divide the sparse feature coding set into training set and testing set and feed the training set into LSVM for training.
Step 6. Utilize the testing set to verify the feasibility of the proposed method.
The rest of this paper is organized as follows. In Sections 2 and 3, the theories of MSST and SFC-DL are described, respectively. In Section 4, the proposed fine-grained fault diagnosis method is applied to the experimental dataset and compared with state-of-the-art methods on the same dataset. In Section 5, the proposed method is applied to another dataset and shown to be effective and robust. The conclusion is given in Section 6.

Time-Frequency Analysis Based on MSST
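2.1. MSST. MSST is a time-frequency analysis method based on the synchrosqueezing transform (SST) [18][19][20], which not only generates better energy concentration and suppresses cross-terms but also retains the ability to reconstruct the signal. The strongly time-varying signal can be defined as

$$s(t) = \sum_{k=1}^{K} s_k(t) = \sum_{k=1}^{K} A_k(t)\, e^{i\varphi_k(t)}, \tag{1}$$

where $s_k(t)$ is the kth monocomponent signal, $K$ is the number of monocomponent signals, $A_k(t)$ is the instantaneous amplitude, and $\varphi_k(t)$ denotes the instantaneous phase. The MSST of $s(t)$ is defined as

$$Ts^{[N]}(t, \eta) = \int_{-\infty}^{+\infty} G(t, w)\, \delta\big(\eta - \hat{w}^{[N]}(t, w)\big)\, dw, \tag{2}$$

where $G(t, w)$ is the STFT of $s(t)$, $w$ is the angular frequency, $\eta$ is the frequency variable of the reassigned representation, $\delta()$ denotes the Dirac delta function, and $N$ is the iteration number, that is, how many times SST is executed iteratively. The value of $N$ affects the energy concentration and cross-term suppression in the time-frequency image and must be set manually before running MSST such that $N \ge 2$. $\hat{w}^{[N]}(t, w)$ is the instantaneous frequency (IF) estimate for the MSST, which, under the linear-chirp signal model with a Gaussian window used in [10], can be written as

$$\hat{w}^{[N]}(t, w) = \varphi'(t) + \left[\frac{\varphi''(t)^2}{1 + \varphi''(t)^2}\right]^{N} \big(w - \varphi'(t)\big), \tag{3}$$

where $\varphi'(t)$ and $\varphi''(t)$ are the first-order and second-order derivatives of the instantaneous phase $\varphi(t)$ of the signal $s(t)$.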
A new IF estimate is constructed at each iteration to reassign the blurry STFT energy. Over multiple iterations, the IF estimate of MSST moves closer and closer to the true IF of the signal [10], so that high-resolution time-frequency images of strongly time-varying signals can be obtained.
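The effect of the iteration can be illustrated numerically. The sketch below is not the MSST implementation of [10]; it only assumes the linear-chirp, Gaussian-window model, under which one SST step maps a frequency w to φ'(t) + [φ''(t)² / (1 + φ''(t)²)] (w − φ'(t)). The function names and the numeric values are hypothetical.

```python
def if_estimate(w, phi1, phi2):
    """One IF-estimation step under the assumed chirp model.

    phi1 and phi2 stand for phi'(t) and phi''(t) at a fixed time t.
    """
    shrink = phi2 ** 2 / (1.0 + phi2 ** 2)  # always in [0, 1)
    return phi1 + shrink * (w - phi1)


def iterated_if_estimate(w, phi1, phi2, n_iter):
    """Apply the one-step estimate n_iter times (the N of MSST)."""
    for _ in range(n_iter):
        w = if_estimate(w, phi1, phi2)
    return w


# Hypothetical values: true IF of 100 rad/s, chirp rate of 3 rad/s^2,
# and an STFT frequency bin that lies 20 rad/s away from the ridge.
errors = [abs(iterated_if_estimate(120.0, 100.0, 3.0, n) - 100.0)
          for n in range(6)]
```

Because the shrink factor is strictly below 1, the distance between the estimate and the true IF decreases geometrically with the number of iterations, which is the behaviour the text attributes to MSST.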

Time-Frequency Analysis of Raw Vibration Signal.
To verify that MSST is superior to other time-frequency analysis methods, several methods are applied to a strongly time-varying signal. Figure 2(a) gives the waveform of an actual fault vibration signal, and Figures 2(b)-2(f) show its time-frequency representations obtained by STFT, CWT, WVD, ST, and MSST, respectively. These representations were reproduced by following [3][4][5][6][10] and using the time-frequency analysis toolbox in MATLAB.
As shown in Figure 2(f), MSST achieves the most satisfactory time-frequency concentration compared with STFT, CWT, WVD, and ST. Each monocomponent signal can be clearly detected and separated from the others, and the improvement in energy concentration over the other methods is easily noticeable. Therefore, this paper utilizes MSST to process raw signals, which provides a sound data basis for further feature extraction.

Sparse Feature Coding Based on Dictionary Learning
As can be seen in Section 2.2, high-resolution time-frequency images of raw vibration signals can be obtained by MSST. However, it is not reasonable to classify those time-frequency distributions directly, since the data dimension is too high to handle and the time-frequency matrixes contain heavy noise and irrelevant or redundant information.
In order to eliminate redundant features and avoid the curse of dimensionality, SFC-DL, which combines NMFSC with the solution of nonnegative linear equations, is proposed. It not only reduces the data dimension but also retains the features that best distinguish the slight differences between fault states. The algorithm mainly contains the following two steps.
Step 1. Utilize NMFSC and the minibatch gradient descent algorithm to train on the original features and obtain the basis dictionary.
Step 2. Solve for the sparse feature coding of each sample using the basis dictionary and nonnegative linear equations.
The process of this feature extraction algorithm is shown in Figure 3.

Construction of Basis Dictionary Based on NMFSC.
Nonnegative matrix factorization (NMF) [21,22] is a matrix decomposition technique. The factorization can be expressed as follows:

$$V_{n \times m} \approx W_{n \times r} H_{r \times m}, \tag{4}$$

where $V_{n \times m}$, $W_{n \times r}$, and $H_{r \times m}$ are all nonnegative matrixes. The rank $r$ of the factorization is chosen to achieve dimensionality reduction. $V_{n \times m}$ is the set of all sample features, in which $m$ is the number of samples and each column contains an $n$-dimensional feature. Because each vector in $V$ can be considered a linear combination of the columns of $W$, weighted by the components of $H$, $W$ and $H$ can be regarded as the basis dictionary and the feature coding, respectively [16]. It is the nonnegativity that makes the method practical, since the features are combined additively.
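As an illustration of the factorization described above, the following minimal sketch uses the standard multiplicative update rules for NMF. It is an assumption that this matches the solver behind [21,22]; the function name, the toy data, and the iteration count are hypothetical and are not taken from the experiments in this paper.

```python
import numpy as np


def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative V (n x m) into W (n x r) and H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: 40-dimensional features for 30 samples with exact rank 5.
rng = np.random.default_rng(1)
V = rng.random((40, 5)) @ rng.random((5, 30))
W, H = nmf(V, r=5)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `V` is approximated by the columns of `W` weighted by the matching column of `H`, so `W` plays the role of the basis dictionary and `H` that of the feature coding.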
Then, NMF can be adopted to further extract time-frequency features, and the decomposition process of the time-frequency matrix is shown in Figure 4. However, conventional NMF cannot control the degree of sparseness of the feature coding, so it cannot remove redundant features effectively. Therefore, SFC-DL is proposed by adding sparseness constraints to conventional NMF so that the feature coding becomes sparse while preserving the original characteristics. The objective function and constraints of NMFSC are as follows:

$$\min_{W, H} \; \|V - WH\|_F^2 \quad \text{s.t.} \quad W \ge 0,\; H \ge 0,\; \text{sparseness}(h_i) = S_h \;\; \forall i, \tag{5}$$

where sparseness() denotes the sparseness function, $h_i$ is the ith row of $H$, and $S_h$ is the desired sparsity of the ith row of $H$. The specific expression of sparseness() is as follows:

$$\text{sparseness}(x) = \frac{\sqrt{n} - \left(\sum_i |x_i|\right) \big/ \sqrt{\sum_i x_i^2}}{\sqrt{n} - 1}, \tag{6}$$

where $n$ is the dimension of vector $x$.
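A short sketch of the sparseness measure may help to interpret values of the sparsity such as 0.7. It assumes that sparseness() is the measure defined in [17], which compares the L1 and L2 norms of a vector; the function name and test vectors are illustrative.

```python
import math


def sparseness(x):
    """Sparseness measure of [17]: 0 for a uniform vector, 1 for a one-hot vector."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two components and a nonzero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


dense = sparseness([1.0, 1.0, 1.0, 1.0])   # all components equal
sparse = sparseness([0.0, 0.0, 5.0, 0.0])  # a single active component
```

The measure interpolates between these two extremes, so a target of 0.7 asks for rows of H in which most of the energy sits in relatively few components.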

Shock and Vibration
According to equations (5) and (6), the nonnegative matrixes W and H can be obtained through the minibatch gradient descent algorithm and a projection operator [17] that enforces sparseness by explicitly setting both the $L_1$ and $L_2$ norms. That is, each row $h_i$ of H is first projected onto the hyperplane $\sum_j h_{ij} = L_1$ and then projected onto the closest point of the joint constraint hypersphere (the intersection of the sum and $L_2$ constraints) by moving radially outward from the center of the sphere. If all components of $h_i$ are then nonnegative, the projection is complete; otherwise, the negative components are set to zero, and a new point satisfying the constraints is sought by iteration. The nonnegative matrix W is the basis dictionary required for the sparse feature coding of each sample. The specific training process is as follows:
(a) A small number of samples from each dataset are randomly selected to form a batch of training samples $V_{train}$. In our experiment, 60 samples are randomly selected from each dataset.
(b) The desired sparseness of the nonnegative matrix H is enforced by the projection operator, and the basis dictionary W is iteratively updated by the standard multiplicative step.
(c) A total of 20 batches are trained until convergence to obtain the desired nonnegative matrixes W and H, where W is regarded as the basis dictionary for solving the sparse feature coding of each sample.
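The projection step described above can be sketched as follows. This is a minimal sketch written from the description of the operator in [17], not the code used in the experiments; the function names, the unit L2 norm, and the random test vector are assumptions made for illustration.

```python
import numpy as np


def l1_for_sparseness(n, s_h, l2=1.0):
    """L1 norm that yields sparseness s_h for an n-vector with L2 norm l2."""
    return l2 * (np.sqrt(n) - (np.sqrt(n) - 1.0) * s_h)


def project_sparse(x, l1, l2):
    """Closest nonnegative vector to x with sum l1 and Euclidean norm l2."""
    n = x.size
    zeroed = np.zeros(n, dtype=bool)          # components pinned to zero
    s = x + (l1 - x.sum()) / n                # move onto the sum hyperplane
    while True:
        # Centre of the sphere restricted to the still-active components.
        m = np.where(zeroed, 0.0, l1 / (n - zeroed.sum()))
        d = s - m
        a, b, c = d @ d, 2.0 * (m @ d), m @ m - l2 ** 2
        if a == 0.0:
            raise ValueError("input coincides with the centre; no direction")
        # Solve ||m + alpha * d||_2 = l2 for the nonnegative root alpha.
        alpha = (-b + np.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
        s = m + alpha * d
        if (s >= 0).all():
            return s
        # Pin negative components to zero and restore the sum constraint.
        zeroed |= s < 0
        s[zeroed] = 0.0
        s[~zeroed] -= (s.sum() - l1) / (~zeroed).sum()


rng = np.random.default_rng(0)
h = rng.random(50)                             # stand-in for one row of H
target_l1 = l1_for_sparseness(h.size, 0.7)
h_proj = project_sparse(h, target_l1, 1.0)
```

In a training loop of the kind listed in (a) to (c), such a projection would be applied to every row of H after each gradient step, while W is updated multiplicatively.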

Solution of Sparse Feature Coding.
In Section 3.1, the basis dictionary W is obtained by repeated iterative updates. In this section, nonnegative linear equations are combined with the basis dictionary to solve for the sparse feature coding of each sample. The specific calculation expression is as follows:

$$W h_i' = v_i \quad \text{s.t.} \quad h_{ij}' \ge 0, \tag{7}$$

where $v_i$ represents the ith sample, $h_i'$ denotes the sparse feature coding corresponding to the ith sample, and $h_{ij}' \ge 0$ indicates that every element of the sparse feature coding is nonnegative. Therefore, the vector set $[h_1', h_2', \ldots, h_m']$ solved by (7) is the sparse feature coding set $H'$ required for fine-grained fault diagnosis.
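One way to solve such nonnegative linear equations is in the least-squares sense. The sketch below uses projected gradient descent as a stand-in solver; the paper does not state which solver was used, so the method, the function name, and the toy dictionary are all assumptions.

```python
import numpy as np


def sparse_code(W, v, n_iter=2000):
    """Approximate argmin_h ||v - W h||_2 subject to h >= 0."""
    gram = W.T @ W
    wtv = W.T @ v
    step = 1.0 / np.linalg.norm(gram, 2)   # 1 / largest eigenvalue of W^T W
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        # Gradient step on the least-squares error, then clip to h >= 0.
        h = np.maximum(h - step * (gram @ h - wtv), 0.0)
    return h


# Toy dictionary with 5 atoms and a sample built from a known sparse code.
rng = np.random.default_rng(0)
W = rng.random((30, 5))
h_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
v = W @ h_true
h = sparse_code(W, v)
```

Applying `sparse_code` to every sample column with the trained dictionary would give the columns of the coding set H'; the dictionary itself stays fixed at this stage.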

First Case Study
To verify the feasibility and effectiveness of the proposed fault diagnosis method, the collected vibration data of the rolling bearing are first classified and processed by MSST. Then the influence of the rank r and the sparsity $S_h$ on fault diagnosis performance is analyzed. Moreover, the completeness and effectiveness of the sparse feature coding set are verified. Finally, the proposed method is compared with state-of-the-art methods. All experiments are carried out on Windows 7 with an Intel Xeon E5-2640 CPU at 2.40 GHz, 64 GB of RAM, and MATLAB R2017a.

Data Acquisition.
The vibration data of the rolling bearing come from the Case Western Reserve University (CWRU) bearing dataset [23]. As shown in Figure 5, the experimental setup mainly consists of a loading motor, a torque transducer/encoder, a dynamometer, and control electronics. Single-point faults were introduced to each bearing (6205-2RS JEM SKF) using electro-discharge machining with fault diameters of 0.007 inches, 0.014 inches, and 0.021 inches.
The accelerometer was placed near the drive end to collect normal signals and fault signals at a sampling frequency of 12 kHz. Fault signals were collected at the inner race, ball, and outer race. All tests were run under four different motor loads (0, 1, 2, and 3 hp).
To improve the robustness of the diagnosis method and meet the needs of practical engineering, the influence of motor load on fault classification is ignored. The vibration signals of the drive end bearing are divided into 10 states: the normal state and 9 fault states, namely inner race, ball, and outer race faults with damage diameters of 0.007, 0.014, and 0.021 inches (N, IR7, IR14, IR21, B7, B14, B21, OR7, OR14, and OR21). Each type of data consists of samples collected under the four different motor loads. Each state contains 600 samples, and each sample is a vibration signal segment of 800 sampling points, so an experimental dataset of 6000 samples is established. The data sampling process is shown in Figure 6. To avoid particularity and contingency, 420 samples of each state are selected randomly for training and the remaining 180 are used for testing. More details about the 10 states are listed in Table 1.
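The sample construction and the per-state random split described above can be sketched in a few lines. The function names are hypothetical, and the all-zero record is only a placeholder for one state's vibration signal, not CWRU data.

```python
import random


def segment(signal, length):
    """Cut a 1-D signal into consecutive, non-overlapping windows."""
    n_windows = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n_windows)]


def split_per_state(samples_by_state, n_train, seed=0):
    """Randomly pick n_train samples of every state for training."""
    rng = random.Random(seed)
    train, test = [], []
    for state, samples in samples_by_state.items():
        order = list(range(len(samples)))
        rng.shuffle(order)
        train += [(samples[i], state) for i in order[:n_train]]
        test += [(samples[i], state) for i in order[n_train:]]
    return train, test


# Placeholder record standing in for one state's vibration signal.
record = [0.0] * (600 * 800)
samples = segment(record, length=800)
train, test = split_per_state({"IR7": samples}, n_train=420)
```

Splitting within each state keeps the 420/180 ratio identical for all 10 classes, which a single shuffle over the pooled 6000 samples would not guarantee.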

Time-Frequency Representations of Vibration Signals Acquired from Rolling Bearing Using MSST.
Figure 7 shows

Then, the feature set corresponding to each value of r is input to LSVM to obtain the classification accuracy, and this is repeated 10 times. Finally, to avoid randomness and contingency, the average accuracy is calculated after removing the highest and lowest accuracies. Figure 9 displays the average accuracy corresponding to the different values of r.
As seen in Figure 9, when the sparsity $S_h$ is 0.7, the classification accuracy fluctuates with slight changes in the value of r. If r is too small, the dimension is reduced further, but useful information is lost, so the sparse coding set cannot reflect the true information of the original time-frequency matrixes. If r is too large, the sparse coding set still contains redundancy, which also prevents the best classification accuracy from being reached. Therefore, considering the influence of r on both the matrix dimension and the classification accuracy, the value of r is set to 25.
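The averaging rule described above (ten runs, with the highest and lowest accuracies discarded) can be sketched as follows. The function name and the accuracy values are hypothetical and are not results from this paper.

```python
def trimmed_mean_accuracy(accuracies):
    """Average after dropping one highest and one lowest run."""
    if len(accuracies) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(accuracies)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical accuracies (%) of ten repeated runs for one value of r.
runs = [97.8, 98.1, 97.9, 98.3, 96.5, 98.0, 98.2, 99.0, 97.7, 98.0]
average = trimmed_mean_accuracy(runs)
```

Dropping the two extreme runs makes the reported average less sensitive to a single unusually favourable or unfavourable random split.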

Analysis of Different Sparsity $S_h$.
Likewise, the sparsity of the rows of matrix H is directly related to the quality of the sparse dictionary W and further affects the performance of the feature set. CV is therefore also used to find the optimal sparsity $S_h$: r is fixed at 25, and the classification accuracy is obtained for $S_h$ in the range 0.3-0.9. To prevent extreme data from influencing the experimental results, the fine-grained fault diagnosis experiment is conducted 10 times for each sparsity $S_h$. Then the average accuracy is calculated after removing the highest and lowest accuracies. Figure 10 displays the average accuracy corresponding to the different values of $S_h$.
Figure 10 shows that the accuracy fluctuates with the sparsity $S_h$. If $S_h$ is too small, the matrix H still contains redundancy, which makes the basis dictionary W less concise, so that the sparse coding set $H'$ contains useless information. If $S_h$ is too large, the basis dictionary W is not complete enough, which leads to a loss of important information in the sparse coding set $H'$. Both situations degrade the result of fault diagnosis. The experimental results show two peaks in the line chart. When the sparsity $S_h$ is 0.7, however, the average accuracy is the global maximum, and the basis dictionary W is sparse yet complete and more representative than when the sparsity is 0.4.

Different Feature Extraction Algorithms Based on NMF.
According to Section 4.2, 6000 time-frequency matrixes of 400 by 800 can be constructed by MSST. However, it is impractical to use all the elements of the original time-frequency matrix directly as the input vector for classification because of the high dimension and the redundant information in the matrixes. Three NMF-based algorithms are employed to reduce the dimension of the time-frequency matrixes. The first is conventional NMF, the second is SFC-DL, and the third adds max-relevance and min-redundancy (MRMR) to SFC-DL in an attempt to select better feature sets from the sparse feature coding acquired by SFC-DL. Table 2 displays the experimental results of these three feature extraction algorithms.
As shown in Table 2, the performance of SFC-DL is clearly superior to that of the conventional NMF algorithm, which indicates that SFC-DL not only reduces the dimension but also removes redundant elements; it is the elimination of redundant information that raises the accuracy from 92.63% to 98.03%. When MRMR is added to the feature set obtained by SFC-DL, it fails to find a better feature set within the 25 features, which suggests that the feature set obtained by SFC-DL is already free of redundant features.
Therefore, compared with the other two algorithms, SFC-DL acquires the most discriminative and complete feature set and effectively realizes the fine-grained classification of 10 fault states under mixed working conditions.

Comparison with State-of-the-Art Methods.
To highlight the effectiveness of the rolling bearing fault diagnosis method proposed in this paper, Table 3 displays the different fault diagnosis methods based on the data from CWRU.
In reference [7], the author applied HHT with CNN (HHT + CNN) to diagnose fault with 10 mixed working conditions.In reference [24], Zhang et al. proposed a fault diagnosis method with multivariable ensemble-based incremental support vector machine (MEISVM) to realize the classification of 7 kinds of fault bearings under the same motor load.In reference [25], Li et al. presented a semisupervised diagnosis method based on a distance-preserving self-organizing map (SS-DPSOM) to identify 4 fault states under mixed working conditions by manually extracting 19 features.In reference [26], the author utilized eight wavelet packet energy with multifractal features (8WPE-MF) to train SVMs and to realize 10 fault states diagnosed under mixed working conditions.
It can be seen in Table 3 that, compared with the methods used in [7,26], the proposed method achieves the highest average accuracy, up to 98.03%, under the same complex working conditions. Then, compared with the methods used in [24,25], the proposed method can effectively identify where the fault happened on rolling bearing

Second Case Study
5.1. Data Acquisition. In order to further verify the robustness of the proposed method, another dataset, provided by the Machinery Failure Prevention Technology (MFPT) Society [27], is used. The test rig was equipped with a NICE bearing. In total, three baseline conditions, ten outer race fault conditions, and seven inner race fault conditions were tracked. To improve the robustness of the diagnosis method and meet the needs of practical engineering, the influence of motor load on fault classification is ignored. The MFPT dataset is divided into 3 states: normal state, inner race fault state, and outer race fault state (N, IR, and OR). The three baseline data were gathered at a sampling frequency of 97656 Hz under 270 lbs of load; seven outer race fault data were gathered at a sampling frequency of 48828 Hz under 25, 50, 100, 150, 200, 250, and 300 lbs of load, respectively; and seven inner race fault data were gathered at a sampling frequency of 48828 Hz under 0, 50, 100, 150, 200, 250, and 300 lbs of load, respectively.
Due to the limited data length, the samples are sliced with overlap. The data sampling process is shown in Figure 11: each sample is a vibration signal segment of 4000 sampling points, and the shift between consecutive segments is 2000 points. In total, each state contains 500 samples, and a dataset of 1500 samples is established. To avoid particularity and contingency, 350 samples of each state are randomly selected for training and the remaining 150 are used for testing. More details about the 3 states are listed in Table 4.

time-frequency matrixes of 2000 by 4000 will be obtained, which are too large to compute with. In order to unify the parameters of the proposed algorithm and reduce the computational burden, each sample signal was downsampled by keeping one of every five sample points. Then, 1500 time-frequency matrixes of 400 by 800 were obtained. Figure 13 shows the time-frequency representations of the three states.
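The overlapped slicing and the downsampling described above can be sketched as follows. The function names and the placeholder record are hypothetical; the sketch keeps every fifth point exactly as the text describes and applies no anti-aliasing filter, which is an assumption about how the downsampling was done.

```python
def overlapping_segments(signal, length=4000, shift=2000):
    """Slide a window of `length` points over the signal in steps of `shift`."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, shift)]


def downsample(segment, factor=5):
    """Keep one of every `factor` points (no anti-aliasing filter is applied)."""
    return segment[::factor]


# Placeholder record: 10000 points give windows starting at 0, 2000, 4000, 6000.
record = list(range(10000))
samples = [downsample(s) for s in overlapping_segments(record)]
```

With a shift of half the window length, consecutive samples share 50% of their points, and keeping every fifth point reduces each 4000-point segment to 800 points, the same sample length as in the first case study.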

Experiment Result and Analysis.
To further verify the effectiveness of the rolling bearing fault diagnosis method proposed in this paper, the MFPT dataset is applied to the model trained on the CWRU dataset. To simplify the experiment while achieving high classification accuracy, only the value of the rank r is adjusted and the other parameters are left unchanged. As shown in Table 5, when r is set to 100, the average classification accuracy reaches 95.83%. Compared with the method used in [28], the proposed method achieves better classification accuracy for the three states with fewer features on the MFPT dataset, which indicates that the proposed rolling bearing fault diagnosis method is robust and effective.

Conclusion
In this paper, we combine MSST with SFC-DL to realize fine-grained classification of rolling bearing faults under different kinds of mixed working conditions. Raw vibration signals are first transformed into high-resolution time-frequency images through MSST to provide detailed fault information related to fault types and fault severities. Then SFC-DL is

Figure 1: Flowchart of the presented fine-grained fault diagnosis method.

Figure 3: The flowchart of the feature extraction algorithm.

Figure 4: The NMF of the time-frequency image.

4.3. Analysis of Main Parameters
4.3.1. Analysis of Different Ranks r. It can be seen in Section 3.1 that the rank r should first be suitably selected for dimensionality reduction, since it directly affects the accuracy and the time required for fault diagnosis. Herein, the control variates (CV) approach is applied to find the optimal value of r. First, the sparsity of the rows of matrix H is set to 0.7 ($S_h = 0.7$), and different values of r are used to generate different feature sets.

Figure 10: The average accuracy corresponding to the different sparsity $S_h$ ($r = 25$).

Figure 12: Vibration signals acquired from three states of rolling bearing in MFPT: (a) N, (b) IR, and (c) OR.

Table 1: Dataset description in CWRU.
Acquired from Rolling Bearing Using MSST. Figure 12 shows the waveforms of the three states under different motor loads. The vibration signals are strongly time-varying, and it is difficult to distinguish them directly. If each sample from MFPT is processed by MSST, 1500

Table 2: The results of different feature extraction algorithms based on NMF.

Table 3: Comparison of the proposed method with state-of-the-art methods.

proposed to excavate the most effective and representative features in the time-frequency matrixes as sparse feature coding to train LSVM for fault diagnosis. Experimental results indicate that our method is superior to state-of-the-art methods and realizes fine-grained classification of rolling bearing faults for 10 mixed working conditions on the CWRU dataset. Meanwhile, the rolling bearing fault diagnosis model proposed in this paper is applied to the MFPT dataset to realize the classification of 3 fault states under different working conditions. These experimental results indicate that our method is an effective and robust tool for fine-grained bearing fault diagnosis and can be applied to the fault diagnosis of other rotating machinery such as gears and rotors.

Table 5: The average accuracy results of the dataset from MFPT.
Columns: Method, $S_h$, feature size, average accuracy (%).