Data-Driven Fault Diagnosis for Rolling Bearing Based on DIT-FFT and XGBoost

The rolling bearing is an extremely important basic mechanical device, and the diagnosis of its faults plays an important role in the safe and stable operation of the mechanical system. This study proposed an approach, based on the Fast Fourier Transform (FFT) with Decimation-In-Time (DIT) and the XGBoost algorithm, to identify the fault type of a bearing quickly and accurately. First, the original vibration signal of the rolling bearing was transformed by DIT-FFT and divided into a training set and a test set. Next, the training set was used to train the XGBoost fault diagnosis model, and the test set was used to validate the trained model. Finally, the proposed approach was compared with some common methods. It is demonstrated that the proposed approach can diagnose and identify the fault type of a bearing quickly with almost 99% accuracy, higher than Machine Learning (89.88%), Ensemble Learning (93.25%), and Deep Learning (95%). This approach is suitable for the fault diagnosis of rolling bearings.


Introduction
The rolling bearing is an extremely important basic mechanical component of rotating machinery. It is widely used in various fields of the national economy and defense due to its high efficiency, easy assembly, and lubrication [1]. The health state of a rolling bearing is directly related to the performance and service life of mechanical equipment. According to incomplete statistics, about 30% of rotating machinery faults are caused by rolling bearings [2]. Therefore, it is necessary to diagnose and identify rolling bearing faults in a timely manner.
Over the years, various methods and techniques have been used to monitor the health state of equipment [3][4][5][6][7]. For the fault diagnosis of bearings, most of them are based on analyzing the bearing's vibration signal [8][9][10][11][12][13]. Generally, these methods consist of two stages: data processing and fault state determination [11]. For data processing, most methods extract fault parameters of the vibration signal from the amplitude spectrum, amplitude-frequency diagram, power spectrum, or wavelet spectrum in the time and frequency domains and then constitute eigenvectors with these parameters for signal analysis. Signal analysis methods mainly include time domain analysis, frequency domain analysis, and time-frequency domain analysis [12,13]. Time domain approaches analyze the vibration signal as a function of time [14], such as the spike energy method [15] and the signal enveloping method [16]. Frequency domain analysis, based on the availability of Fourier transform technology, extracts vibration signal features more easily than time domain analysis [16]. The Fourier transform family includes the short-time Fourier Transform, the Fast Fourier Transform (FFT), and the Discrete Fourier Transform (DFT). Among them, the FFT computes the DFT of a finite sequence quickly; thus, it has excellent performance in feature extraction for fault diagnosis [14,17,18]. Time-frequency domain approaches combine both time and frequency domain information to study the inner features of a signal, such as the Gabor transform [19], the continuous wavelet transform [20], and the Wigner-Ville distribution [21].
There are also other signal processing methods used to construct feature sets, such as sample entropy, fuzzy entropy, and amplitude spectral entropy [11].
For fault state determination, different Machine Learning (ML) approaches are used to construct a classifier, for instance, traditional ML, Deep Learning (DL), and Ensemble Learning (EL). Regarding traditional ML, Chouri et al. [8] applied SVM to automate the fault diagnosis procedure, extracting features of the faulty bearing's vibration signal with the Alpha-stable distribution. Bu et al. [22] proposed a method combining LS-SVM and Local Mean Decomposition (LMD) to diagnose bearing faults: the original vibration signal was decomposed by LMD to constitute the feature vectors, and LS-SVM was used to determine the health state of the bearing. However, the diagnosis accuracies of these traditional methods were only about 90%. Regarding DL, Yang et al. [10] constructed a data set after extracting features of the vibration signal in the frequency domain; the data set was used to train a deep neural network (DNN) to classify the fault types. Jiang et al. [23] presented a method based on the convolutional neural network (CNN), extracting Mel-frequency cepstral coefficients and delta cepstrum parameters to train the diagnosis model. However, parameter tuning of these DL methods is difficult, and the training time is usually very long. Regarding EL, Hu et al. [24] proposed a method combining kernel principal component analysis with random forest (RF); a group of classifiers was trained in a high-dimensional kernel space with the RF method. But RF tends to overfit when the data is noisy. A new EL algorithm called eXtreme Gradient Boosting (XGBoost) was proposed by Chen [25]. This method has many advantages, such as regularization, parallel processing, and missing-value handling, and it performs excellently on regression and classification problems [26,27]. Rao et al. [28] applied the XGBoost algorithm to detect steam turbine anomalies by learning from historical data and obtained promising results.
This study proposes an approach based on the FFT with Decimation-In-Time (DIT) and XGBoost to diagnose and identify rolling bearing faults. The experimental data is provided by the Case Western Reserve University (CWRU) Bearing Data Center. The DIT-FFT is used to process the vibration signal, and the XGBoost model is used as a classifier to diagnose the faults of the rolling bearing.

Related Algorithms
This section starts with an introduction of the DFT. Assume x(n) is a typical N-point finite sequence; its DFT is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, \qquad k = 0, 1, \ldots, N-1,$$

where k is the normalized digital frequency and $W_N^{nk} = e^{-j2\pi nk/N}$. Unfolding equation (1) into matrix form, we obtain

$$\begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix} = \begin{bmatrix} W_N^0 & W_N^0 & \cdots & W_N^0 \\ W_N^0 & W_N^1 & \cdots & W_N^{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ W_N^0 & W_N^{N-1} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}.$$

Its abbreviated matrix form is $X = Fx$, where X is the DFT vector, x is the sequence vector, and F denotes the transformation matrix. It can be seen from F that the computational complexity of an N-point DFT is N × N, recorded as O(N²): it includes N² multiplications and N × (N − 1) additions. Thus, processing data directly with this method consumes a lot of computing time and memory space. This study adopted the radix-2 FFT algorithm [29] to improve computational efficiency and reduce computational complexity. In practical applications, there are many different radix-2 FFT algorithms. Among them, the radix-2 FFT algorithm with DIT (radix-2 DIT-FFT) is the most important one. The flow diagram of its butterfly operation is shown in Figure 1.
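As a minimal illustration of the O(N²) cost, the DFT definition above can be evaluated directly via its transformation matrix F (a sketch for exposition, not the paper's code):

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of X = F x from the DFT definition."""
    N = len(x)
    n = np.arange(N)
    # F[k, n] = W_N^{nk} = exp(-j*2*pi*n*k/N), the transformation matrix
    F = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return F @ np.asarray(x, dtype=complex)   # N^2 complex multiplications
```

The matrix product makes the N² multiplication count explicit, which is the baseline the DIT-FFT improves on.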
As can be seen from Figure 1, the output of the upper branch equals the sum of the inputs of the upper and lower branches, that is, $X(k) = X_0(k) + W_N^k X_1(k)$. The output of the lower branch equals their difference, that is, $X(k + N/2) = X_0(k) - W_N^k X_1(k)$. With this decomposition, for $N = 2^l$ (l a positive integer), the subsequence lengths after the first, second, ..., and last decompositions are $2^{l-1}, 2^{l-2}, \ldots$, and $2^0 = 1$, respectively. Therefore, the total number of decompositions is $l = \log_2 N$. For each decomposition, the corresponding butterfly operations require $N/2$ multiplications and $N$ additions. Thus, the total multiplication count $M_c$ (i.e., the computational complexity) and addition count $A_c$ are

$$M_c = \frac{N}{2}\log_2 N, \qquad A_c = N \log_2 N.$$

The ratio of multiplications of DIT-FFT to DFT, $\alpha_M = M_c / N^2 = \log_2 N / (2N)$, and the ratio of additions, $\alpha_A = A_c / (N(N-1)) = \log_2 N / (N-1)$, are illustrated in Figure 2 for different N. As we can see from Figure 2, both α_M and α_A decrease rapidly as N increases. In other words, the DIT-FFT method markedly accelerates the calculation and improves efficiency when N is large enough. But when N is greater than $2^{11}$, α_M and α_A decrease only slowly. Therefore, this paper chose $N = 2^{11}$ as the length of each group sample to reduce the computational complexity.
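The butterfly relations above can be sketched as a compact recursive radix-2 DIT-FFT (an illustrative implementation, not the one used in the paper):

```python
import numpy as np

def dit_fft(x):
    """Recursive radix-2 DIT-FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    X0 = dit_fft(x[0::2])                            # DFT of even-indexed samples
    X1 = dit_fft(x[1::2])                            # DFT of odd-indexed samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors W_N^k
    return np.concatenate([X0 + W * X1,              # X(k)       = X0(k) + W_N^k X1(k)
                           X0 - W * X1])             # X(k + N/2) = X0(k) - W_N^k X1(k)
```

For the paper's choice N = 2¹¹ = 2048, this gives M_c = (N/2)·log₂N = 11,264 multiplications versus N² = 4,194,304 for the direct DFT, i.e., α_M ≈ 0.0027.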

XGBoost.
XGBoost is an EL algorithm based on the Classification and Regression Tree (CART) [25]. Its objective function is defined as

$$Obj = \sum_{i} L(\hat{y}_i, y_i) + \sum_{t} \Omega(f_t),$$

where L is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$, $f_t(x_i)$ is the score function of the sample in the t-th round, and $x_i$ is the i-th input sample. The score function can be expressed as

$$f_t(x) = \omega_{q(x)}, \qquad \omega \in \mathbb{R}^T,$$

where ω is the leaf weight vector, q is the structure of the tree, and T represents the number of leaf nodes. The second term Ω on the right side of equation (6) penalizes the complexity of the model. It is defined as

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \|\omega\|^2,$$

where λ and γ are two model parameters used to control the proportion of Ω, and $\|\omega\|^2 = \sum_{j=1}^{T}\omega_j^2$ represents L2 regularization of ω. If the value of Ω(f) is small, the complexity of the tree is low while its generalization ability is strong.
After a second-order Taylor expansion of equation (6), a new objective function is obtained:

$$Obj^{(t)} \simeq \sum_{i=1}^{n}\left[L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t),$$

where $g_i = \partial_{\hat{y}^{(t-1)}} L(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} L(y_i, \hat{y}^{(t-1)})$ are the first- and second-order gradient statistics of the loss function L, respectively. The final objective function thus depends only on the first- and second-order derivatives of the error at each data point. Removing the constant term and expanding Ω(f), the objective function is updated to

$$Obj^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T,$$

where $I_j$ is the set of samples assigned to leaf j. For the sake of simplicity, define $G_j = \sum_{i\in I_j} g_i$ and $H_j = \sum_{i\in I_j} h_i$; then,

$$Obj^{(t)} = \sum_{j=1}^{T}\left[G_j \omega_j + \frac{1}{2}\left(H_j + \lambda\right)\omega_j^2\right] + \gamma T.$$

Next, computing the partial derivative with respect to $\omega_j$, setting it to zero, and substituting the result back into equation (11), the following formulas are obtained:

$$\omega_j^* = -\frac{G_j}{H_j + \lambda}, \qquad Obj^* = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T.$$

Complexity
Obj* can be used as a scoring function to measure the quality of a tree structure q; the smaller Obj* is, the better structured the tree will be. This score is like the impurity score for evaluating decision trees, except that it is derived for a wider range of objective functions. Generally, it is impossible to enumerate all possible tree structures q. Instead, a greedy algorithm that starts from a single leaf and iteratively adds branches to the tree is used [25]. Let $I_L$ and $I_R$ be the instance sets of the left and right nodes after a split, and define $I = I_L \cup I_R$. The loss reduction after the split is given by

$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma.$$

This formula is usually used in practice to evaluate split candidates.
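Assuming the per-node gradient sums G and H have already been accumulated, the optimal leaf weight and split gain above can be sketched as follows (`lam` and `gamma` are the λ and γ regularization parameters; the function names are illustrative):

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight: w* = -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss reduction of a candidate split, per the Gain formula."""
    def score(G, H):
        # Each leaf contributes G^2 / (H + lambda) to the structure score.
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR)
                  - score(GL + GR, HL + HR)) - gamma
```

A split is worth making only when `split_gain` is positive: the children's scores must outweigh the parent's score plus the fixed penalty γ for adding a leaf.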

Bearing Fault Test Bench
In this paper, the proposed DIT-FFT-XGBoost method is tested on the SKF rolling bearing data sets of CWRU, which are standard data for testing bearing fault diagnosis methods. Figure 3 illustrates the bearing fault test bench of CWRU. It consists of a 2 hp electric motor, a torque transducer/encoder, a dynamometer, and control electronics. Generally speaking, the rolling element (RE), inner race (IR), and outer race (OR) of the motor's rolling bearing are easily damaged. The test bench can simulate these fault types, that is, the RE fault, IR fault, and OR fault.

Fault Diagnosis Procedure.
The fault diagnosis procedure of the rolling bearing with the DIT-FFT-XGBoost method is shown in Figure 4. It mainly includes two stages: data processing and fault state determination. There are two steps in the data processing stage.
Step 1: calculate the standard deviation and average value of the input data and then standardize the data to a normal distribution to eliminate the impact of abnormal samples on data processing. This is suitable for complicated big data.
Step 2: process the standardized data with DIT-FFT and then divide it into the training set and test set. In the fault state determination stage, feed the training set into the XGBoost model. The XGBoost model fits the negative gradient of the loss function in each round to adjust its parameters. Finally, the well-trained model identifies the fault type of the bearing. The analysis was performed in Python on a laptop with an Intel(R) Core(TM) i5-7200 CPU @ 2.70 GHz.
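The two data-processing steps for one group sample could be sketched as follows, with numpy's built-in FFT standing in for a hand-written radix-2 DIT-FFT (`process_sample` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def process_sample(raw):
    """Steps 1-2 of the data-processing stage for one group sample:
    standardize to zero mean / unit variance, then take the magnitude
    of the FFT as the feature vector fed to the classifier."""
    x = np.asarray(raw, dtype=float)
    x = (x - x.mean()) / x.std()      # Step 1: standardization
    return np.abs(np.fft.fft(x))      # Step 2: FFT magnitude spectrum
```

The resulting magnitude spectra, one per 2048-sample window, are the feature vectors on which the XGBoost classifier is then trained.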

Data Processing.
The original data of the RE fault, IR fault, and OR fault with the 0.007-inch damage diameter, collected by the recorder at the drive end (DE) under the conditions of 12 kHz and 1772 rpm, are shown in Figure 5.
After fitting and standardizing the data of the above three fault types to a normal distribution, the standardization results are shown in Figure 6.
Next, the standardization results are transformed by DIT-FFT; their absolute values are shown in Figure 7.
The processing results of the normal state and the remaining fault types with 0.014- and 0.021-inch damage diameters are illustrated in Figures 8-14, respectively.
For each of the above figures, the waveforms from top to bottom are the original data, the standardization result, and the transformation result, respectively. As mentioned above, a data length of N = 2¹¹ = 2048 was chosen as a group sample for the radix-2 DIT-FFT algorithm in this paper; the data length is 2048 in all of the above data processing figures. According to the data lengths of the 12 kHz DE bearing data listed in Table 1 and the above sample-length selection rule, we constructed the training set and test set as illustrated in Table 2. For each bearing fault type and the normal state, 50 group samples were taken from the original data as the training set, that is, a data length of 102,400. The remaining data was used to construct the test set. Thus, there were 9 group samples per fault type in the test set (i.e., a data length of 18,432) and 186 group samples for the normal state (i.e., a data length of 380,928). Therefore, the training set and test set comprised 500 and 267 group samples, respectively.
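The group-sample construction described above can be sketched as follows, under the assumption that each class's processed signal is available as a 1-D array (`make_split` and `class_signals` are illustrative names, not from the paper):

```python
import numpy as np

def make_split(class_signals, n=2048, n_train=50):
    """Slice each class's signal into length-2048 group samples; the
    first 50 groups form the training set, the remainder the test set."""
    X_train, y_train, X_test, y_test = [], [], [], []
    for label, sig in class_signals.items():
        sig = np.asarray(sig)
        groups = sig[: (len(sig) // n) * n].reshape(-1, n)  # whole windows only
        X_train.append(groups[:n_train]); y_train += [label] * n_train
        X_test.append(groups[n_train:]);  y_test += [label] * (len(groups) - n_train)
    return (np.vstack(X_train), np.array(y_train),
            np.vstack(X_test), np.array(y_test))
```

With a fault-class signal of 59 × 2048 points this yields 50 training groups and 9 test groups, matching the counts reported in Table 2.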

Fault State Determination.
In order to make full use of the training set and improve the diagnosis accuracy of the XGBoost model, k-fold cross-validation was introduced to divide the training set into k folds. Each time, k − 1 folds were used to train the model, and the remaining fold was used to test it. With the parameters of the XGBoost model at their default values, the influence of k on the training time and diagnosis accuracy of the model is illustrated in Figure 15. As k increases from 3 to 7, the training time gradually increases from 31.91 s to 89.62 s, while the diagnosis accuracy first increases and then decreases. When k = 5, the diagnosis accuracy is the highest and the training time is not too long. Therefore, the training set was divided into 5 folds. Several parameters of the XGBoost model affect the accuracy of fault diagnosis; their detailed information is listed in Table 3.
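The k-fold splitting described above can be sketched as follows (`kfold_indices` is an illustrative helper; the paper does not give its implementation):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k folds; each fold in
    turn serves as the validation set while the other k-1 folds train."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

For the 500-sample training set with k = 5, each round trains on 400 samples and validates on the remaining 100.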
Among them, the most important parameters are max_depth, min_child_weight, and gamma. max_depth, that is, the maximum depth of a tree, is a parameter that needs to be adjusted when building a tree; if it is too large, the model becomes more complicated and overfits more easily [30]. The test values of this parameter ranged from 3 to 6 in this paper. min_child_weight, that is, the minimum sum of instance weight needed in a child ($H_j$ in equation (15)), is used to avoid overfitting: if a tree partition step results in a leaf node whose sum of instance weights is less than min_child_weight, the building process gives up further partitioning [30]. The larger min_child_weight is, the more conservative the model will be. Its range was from 1 to 5. The influence of these two parameters on the diagnosis accuracy of the XGBoost model is shown in Figure 16.
As can be seen from Figure 16, on the whole, the diagnosis accuracies of the XGBoost model for the different max_depth values decrease as min_child_weight increases from 1 to 5. The diagnosis accuracy is highest when max_depth = 4; in particular, the model reaches its highest diagnosis accuracy (up to 100%) when min_child_weight = 1. The last important parameter, gamma (γ), represents the minimum loss reduction required to make a further partition on a leaf node of the tree (γ in equation (8)). The larger γ is, the more conservative the algorithm will be [30]. Its range was therefore from 0 to 0.5 in this study. The influence of γ on the diagnosis accuracy of the XGBoost model is shown in Figure 17.
As can be seen from Figure 17, the diagnosis accuracy of the XGBoost model decreases slightly as γ increases from 0 to 0.5. When γ = 0 or 0.1, the model has the highest accuracy, up to 100%. Moreover, compared with γ = 0.1, the computation of the model is smaller when γ = 0; thus, γ = 0 is the better choice for the XGBoost model.
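The parameter sweep over max_depth (3-6), min_child_weight (1-5), and gamma (0-0.5) amounts to an exhaustive grid search, which could be sketched as below; `evaluate` is an assumed scoring callback (e.g., returning 5-fold CV accuracy), not part of XGBoost itself:

```python
from itertools import product

def grid_search(evaluate, depths=range(3, 7), child_weights=range(1, 6),
                gammas=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Score every (max_depth, min_child_weight, gamma) combination with a
    user-supplied `evaluate` callback and return the best-scoring setting."""
    best = max(product(depths, child_weights, gammas),
               key=lambda p: evaluate(*p))
    return dict(zip(("max_depth", "min_child_weight", "gamma"), best))
```

With the accuracy surface reported in Figures 16 and 17, such a sweep selects max_depth = 4, min_child_weight = 1, and gamma = 0, the values used below.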

Result and Analysis
T-distributed stochastic neighbor embedding (t-SNE) is an ML algorithm for dimensionality reduction that is convenient for data visualization. This paper adopted it to reduce each group sample to 2D and visualize the effect of data processing with DIT-FFT. Figure 18 shows the outcome of the dimensionality reduction of the processed data. It can be seen that the effect of data processing with DIT-FFT is very obvious: each fault type clusters together after processing. With the parameter values of the XGBoost model at their defaults in Table 3, the original data and the DIT-FFT-processed data are used to train the XGBoost model and diagnose the test set, respectively. The comparisons of model training time and test-set diagnosis accuracy between these two cases are listed in Table 4.
It can be seen from Table 4 that, using the original data as the input of the XGBoost model, the training time is 52.65 s, but it is drastically reduced to 31.91 s using the data processed by DIT-FFT. The diagnosis accuracy on the test set also greatly improves, from about 55% to 93%.

Next, the diagnosis accuracy of the XGBoost model without parameter adjustment is further analyzed for each fault type. The confusion matrices associated with the results obtained above are illustrated in Figure 19.
As can be seen from Figure 19(a), when the XGBoost model is trained with the original data, it only has high accuracy for the OR fault with a 0.021-inch damage diameter (i.e., label 8). But when the model is trained with the data processed by DIT-FFT, it has high accuracies for 7 fault types (i.e., labels 0, 1, 2, 3, 5, 8, and 10), as shown in Figure 19(b). This further shows that processing the data with DIT-FFT can greatly improve the diagnosis accuracy for the bearing.
To further improve the diagnosis accuracy, the parameters of the XGBoost model were adjusted to their best values, that is, max_depth = 4, min_child_weight = 1, and γ = 0. Both the training set and test set were processed by DIT-FFT beforehand. The confusion matrix for the test set is shown in Figure 20.
We can see from Figure 20 that the XGBoost model with the best parameter values has high accuracies for the classes with labels 0, 1, 2, 3, 4, 5, 7, 8, 9, and 10. The total diagnosis accuracy on the test set is up to 98.12%. But the model has a low diagnosis accuracy for the RE fault with a 0.021-inch damage diameter (i.e., label 6), misidentifying it as the RE fault with a 0.014-inch damage diameter (i.e., label 3). Analyzing the vibration signals of these two faults, we find that they are so similar that the model cannot identify them accurately. This problem will be addressed in future work.

Method Comparison
In this part, the proposed DIT-FFT-XGBoost method is compared with other methods from two aspects: data processing and fault state determination. For data processing, the Empirical Mode Decomposition (EMD) method was chosen for comparison with DIT-FFT. EMD performs very well on nonstationary and nonlinear data [17,27]; it decomposes a signal based on the time-scale characteristics of the data itself. Taking the RE fault with a 0.007-inch damage diameter (i.e., label 0) as an example, the decomposition of the vibration signal by the EMD method is illustrated in Figure 21.
After analyzing the kurtosis of the Intrinsic Mode Function (IMF) components shown in Figure 21, four components, IMF1-IMF4, were chosen to build a new data set as the input to train the XGBoost model. The diagnosis accuracy of this trained model for the label-0 fault is only 82.40%. In comparison, when the vibration signal of the label-0 fault processed by DIT-FFT is used to train the XGBoost model, the diagnosis accuracy increases to 88.9%, higher than that achieved with EMD.

For fault state determination, the SVM method from ML, the GBDT and RF methods from EL, and DNN and CNN from DL were chosen for comparison with XGBoost. All the input training data of these models were preprocessed by DIT-FFT. The comparisons of these models in terms of training time and diagnosis accuracy are shown in Table 5.
As we can see from Table 5, the training time of SVM is the shortest, but its diagnosis accuracy is lower than 90%, which does not satisfy the requirement of high diagnosis accuracy for bearing faults. Compared with the other methods, XGBoost has the highest diagnosis accuracy (up to 98.12%) without a significant increase in training time (59.53 s). Therefore, it is the best choice for the diagnosis of bearing faults.

Conclusion
This paper proposed an approach based on DIT-FFT and XGBoost for the fault diagnosis of rolling bearings. The DIT-FFT was used to process the vibration signal, and the XGBoost model was used as a classifier for fault identification. The proposed approach was tested with the CWRU bearing data and compared with some common methods. The following conclusions can be drawn: (1) After the vibration signal was processed by DIT-FFT, both the training time and the diagnosis accuracy of the XGBoost model improved significantly. The training time was reduced from 52.65 s to 31.91 s; meanwhile, the diagnosis accuracy on the test set improved from about 55% to 93%.
(2) After the parameters of the XGBoost model were adjusted to their best values, the model had high accuracies for 9 fault types, and the total diagnosis accuracy on the test set further increased to 98.12%. (3) Compared with the EMD data processing method, DIT-FFT extracted more fault features and achieved higher diagnosis accuracy.
(4) Compared with other traditional ML, EL, and DL methods, XGBoost had the highest diagnosis accuracy (up to 98.12%) without a significant increase in training time (59.53 s).
There is no doubt that the combination of DIT-FFT and XGBoost can be used to diagnose bearing faults quickly and accurately.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.