Rolling Bearing Fault Diagnosis Using Improved Deep Residual Shrinkage Networks

Shock and Vibration


Introduction
Rotating machines are integral to both industrial production and daily life. When a rotating machine breaks down, it not only reduces production efficiency and causes economic losses but also threatens the safety of the people working with it. Rolling bearings are among the key components of rotating machinery [1,2]. Incomplete statistics indicate that 30% of rotating machinery failures are caused by bearing failures [3,4]. Therefore, it is highly significant to study fault diagnosis methods for rolling bearings. Existing methods mainly include signal-processing-based methods and machine-learning-based methods. Signal-processing-based methods implement fault diagnosis by detecting characteristic frequencies related to the faults. In general, faults can be diagnosed with signal-processing-based methods; consequently, many researchers have focused on them. For example, Li et al. [5] developed a variational-mode-decomposition-based bearing fault diagnosis method that can effectively identify fault frequencies. Wang et al. [6] proposed a wavelet-transform-based method that can effectively detect hidden fault features in rotating machinery. However, signal-processing-based methods rely on expert knowledge. In addition, the fault signatures in the original vibration signals are easily overwhelmed by strong noise in real environments, especially at the incipient-fault stage [7]. Therefore, it is difficult for signal-processing-based methods to achieve accurate fault diagnosis in real environments with strong noise. Machine-learning-based intelligent methods [8], on the other hand, can perform fault diagnosis without fault-related characteristic frequencies or prior physical knowledge.
These methods learn sensitive fault features from the original vibration signals and use them as training features for a diagnosis model that identifies the health state of the machine. They include artificial neural networks (ANNs) [9], support vector machines (SVMs) [10], k-nearest neighbors (KNN) [11], probabilistic graphical models (PGMs) [12], and deep learning (DL) [13]. DL methods have been widely used in fault diagnosis because of their strong ability to learn features automatically; examples include stacked autoencoders (SAEs) [14][15][16][17][18][19], deep belief networks (DBNs) [20][21][22][23][24], and convolutional neural networks (CNNs) [25][26][27][28][29]. For example, Xiang et al. [18] used SAEs to diagnose rolling bearings without being affected by speed and load. Mao et al. [19] applied a new deep autoencoder method that effectively improved diagnostic accuracy. Shao et al. [23] proposed an adaptive DBN, which further improved diagnostic accuracy and convergence speed. Chen et al. [25] effectively improved the diagnostic accuracy for rolling bearings by combining preprocessing based on cyclic spectral coherence with CNNs. Recently, deep CNNs have become the mainstream solution to many tasks. However, deep CNNs are harder to train than shallow neural networks, and accurate diagnosis is difficult when training fails. Therefore, He et al. [30] developed deep residual networks (ResNets), which use identity shortcuts to reduce the difficulty of training deep CNNs. In recent years, ResNets have become a mainstream solution for fault diagnosis tasks [31][32][33][34]. For example, Zhang et al. [31] developed a ResNet-based fault diagnosis method for rolling bearings, and Peng et al. [34] integrated ResNets with CNNs for wheelset bearing fault diagnosis.
However, in real applications, the vibration signals collected from rolling bearings often contain a large amount of noise. Therefore, it is necessary to develop a method suited to heavily noised vibration signals. In a recent paper, Shao et al. [35] developed the deep residual shrinkage network (DRSN), which integrates ResNets with soft thresholding; the DRSN is thus a variant of the ResNet. As reported in [35], the DRSN improves fault diagnosis accuracy for vibration signals containing strong noise, which makes it suitable for fault diagnosis under strong background noise. However, the soft thresholding in the DRSN may eliminate useful signal features along with the noise during feature learning, so the DRSN can fail to achieve accurate diagnosis because valid features are lost. Therefore, it is essential to develop a new shrinkage function for the DRSN when vibration signals contain strong noise.
In this paper, inspired by recent work, we improve the DRSN to address this issue of soft thresholding under strong background noise, with the final objective of achieving high fault diagnosis accuracy for rolling bearings under strong background noise. The main contributions are summarized as follows. (1) We develop a new shrinkage function, named leaky thresholding, and replace soft thresholding with leaky thresholding in the deep residual shrinkage network. As mentioned in [35], soft thresholding may eliminate useful signal features along with the noise; our leaky thresholding further improves fault diagnosis accuracy for vibration signals containing strong noise. (2) We provide a group searching method to determine the slope value of leaky thresholding for different input signals. The optimal slope value selected by this method allows the whole network to achieve better diagnostic accuracy on noisy vibration signals. (3) We find that normalizing the original vibration signals further improves the fault diagnosis accuracy of rolling bearings under strong background noise.
The rest of this article is organized as follows. The basic concepts of deep residual shrinkage networks and a detailed description of leaky thresholding are given in Section 2. Experimental comparisons illustrating the effectiveness of the improved deep residual shrinkage network proposed in this paper are presented in Section 3, and conclusions are summarized in Section 4.

Theory of IDRSN
The basic components of the network are described as follows. The aim of the convolutional layer is to extract different features of the input; convolutional layers reduce the number of parameters in network training, help avoid overfitting, and improve model accuracy. The mapping between the input features of the convolutional layer and the convolutional kernel can be expressed as follows:
$$y_j = \sum_{i \in M_j} x_i * k_{ij} + b_j$$
where x and y are the input and output feature maps, respectively, b is the bias, k is the convolutional kernel, i and j are channel indices, and $M_j$ is the set of input channels used to compute $y_j$.
Batch normalization (BN) is a technique for normalizing input features during the training of deep networks. BN works in two stages: first, it adjusts features to a standard normal distribution; second, it scales and shifts them to a learned distribution. As a result, BN reduces internal covariate shift, mitigates the vanishing gradient problem, and speeds up learning convergence, so it improves the training speed of the model. Batch normalization is expressed by
$$\mu = \frac{1}{N_{\text{batch}}}\sum_{n=1}^{N_{\text{batch}}} x_n, \qquad \sigma^2 = \frac{1}{N_{\text{batch}}}\sum_{n=1}^{N_{\text{batch}}} \left(x_n - \mu\right)^2, \qquad \hat{x}_n = \frac{x_n - \mu}{\sqrt{\sigma^2 + \varepsilon}}, \qquad y_n = \gamma \hat{x}_n + \beta$$
where $x_n$ and $y_n$ are the input and output of the nth observation, $N_{\text{batch}}$ is the minibatch size, ε is a small constant close to zero, γ scales the distribution, and β shifts it. Both γ and β are learned during training.
The activation function is an important component of a neural network and provides the nonlinear transformation. The rectified linear unit (ReLU) is widely used because it effectively prevents gradient vanishing. ReLU is expressed by
$$y = \max(x, 0)$$
where x and y are the input and output features, respectively.
Global average pooling (GAP) computes the average value of each channel before the FC output layer. GAP has two advantages. First, it reduces the number of network parameters and the risk of overfitting during training. Second, it avoids the shift-variance problem and reduces the effect of fault-pulse position on the learned features.
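The four building blocks described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' TensorFlow implementation; it assumes every input channel feeds every output channel (so M_j covers all input channels), stride 1 and no padding, and it computes BN statistics over the minibatch axis only, following the description of BN above.

```python
import numpy as np

def conv1d(x, k, b):
    """Multichannel 1-D convolution y_j = sum_{i in M_j} x_i * k_ij + b_j.

    x: (C_in, W) input feature map; k: (C_out, C_in, K) kernels; b: (C_out,).
    Assumes M_j = all input channels, stride 1 and 'valid' padding. Like most
    deep learning frameworks, it computes cross-correlation (no kernel flip).
    """
    c_out, c_in, ksize = k.shape
    w_out = x.shape[1] - ksize + 1
    y = np.zeros((c_out, w_out))
    for j in range(c_out):
        for i in range(c_in):
            y[j] += np.correlate(x[i], k[j, i], mode="valid")
        y[j] += b[j]
    return y

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the minibatch axis (axis 0), then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def relu(x):
    """Rectified linear unit: max(x, 0) elementwise."""
    return np.maximum(x, 0.0)

def gap(x):
    """Global average pooling: one mean per channel over the width axis."""
    return x.mean(axis=-1)

x = np.arange(6.0).reshape(1, 6)     # one channel, width 6
k = np.array([[[1.0, -1.0]]])        # one output channel, kernel width 2
y = conv1d(x, k, np.array([0.5]))    # x[t] - x[t+1] + 0.5 = -0.5 everywhere
```

In a real network these operations would come from the framework (for example, TensorFlow/Keras layers); the explicit loops only make the summation over M_j visible.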
In multiclass classification, the cross-entropy error is often adopted as the loss function because it can improve the training efficiency of the neural network. The softmax function must be applied before the cross-entropy error is computed. The softmax function is expressed by
$$y_j = \frac{e^{x_j}}{\sum_{i=1}^{N_{\text{class}}} e^{x_i}}$$
where $x_j$ and $y_j$ are the input and output of the jth neuron, $x_i$ is the input of the ith neuron, and $N_{\text{class}}$ is the number of neurons (classes). The cross-entropy function is expressed by
$$E = -\sum_{j=1}^{N_{\text{class}}} t_j \ln y_j$$
where $t_j$ is the actual (target) probability of the jth neuron at the output layer.
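A numerically stable sketch of the softmax and cross-entropy computations is shown below. It assumes NumPy; subtracting the maximum logit and the small `eps` guard are implementation choices of this sketch, not details stated in the paper.

```python
import numpy as np

def softmax(x):
    """y_j = exp(x_j) / sum_i exp(x_i); subtracting max(x) avoids overflow
    and leaves the result unchanged."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(y, t, eps=1e-12):
    """E = -sum_j t_j ln(y_j); eps guards against ln(0)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return -np.sum(t * np.log(y + eps))

# Two equal (large) logits give probabilities [0.5, 0.5]; a one-hot target
# on the first class then gives E = ln 2.
probs = softmax([1000.0, 1000.0])
loss = cross_entropy(probs, [1.0, 0.0])
```

Without the max subtraction, `np.exp(1000.0)` would overflow to infinity and the probabilities would become NaN.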

Architecture of the RSBU.
The DRSN is a variant of the ResNet. The soft thresholding in the DRSN can effectively remove noise. Soft thresholding has often been used as a denoising method in signal processing; in general, it denoises a signal by converting its near-zero features to zero. Soft thresholding is expressed by
$$y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases}$$
where x and y are the input and output features, respectively, and τ is the threshold, a positive number. Its derivative is
$$\frac{\partial y}{\partial x} = \begin{cases} 1, & x > \tau \\ 0, & -\tau \le x \le \tau \\ 1, & x < -\tau \end{cases}$$
The derivative is therefore either one or zero, so soft thresholding can not only remove noise from signals but also avoid the vanishing gradient problem. The most important module of the DRSN is the residual shrinkage building unit (RSBU), shown in Figure 1, where C is the number of channels, W is the width, and 1 is the height of the feature map. An RSBU includes two BNs, two ReLUs, two convolutional layers, a threshold module, a soft thresholding module, and an identity shortcut. The threshold module is composed of an absolute-value operation, a GAP, a BN, a ReLU, and two FC layers.
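Soft thresholding and its derivative translate directly into vectorized NumPy. The sketch below is an illustration under that assumption, not the DRSN reference code.

```python
import numpy as np

def soft_threshold(x, tau):
    """y = x - tau if x > tau, 0 if |x| <= tau, x + tau if x < -tau."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def soft_threshold_grad(x, tau):
    """dy/dx is 1 outside [-tau, tau] and 0 inside it."""
    x = np.asarray(x, dtype=float)
    return (np.abs(x) > tau).astype(float)

x = np.array([-3.0, -0.5, 0.5, 3.0])
y = soft_threshold(x, 1.0)        # values -2, 0, 0, 2
g = soft_threshold_grad(x, 1.0)   # values  1, 0, 0, 1
```

The zero derivative inside [-τ, τ] is what the leaky variant later replaces with a small slope.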
The threshold module automatically determines the threshold during network training. Specifically, the feature map x passes through an absolute-value operation, a GAP, a BN, a ReLU, and two FC layers; the number of neurons in the second FC layer equals the number of channels of x. The scaling parameter is obtained by
$$\alpha_c = \frac{1}{1 + e^{-z_c}}$$
where $z_c$ is the feature of the cth neuron, $\alpha_c$ is the corresponding scaling parameter, and c is the channel index. The threshold is then computed as
$$\tau_c = \alpha_c \cdot \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} \left| x_{i,j,c} \right|$$
where $\tau_c$ is the threshold of the cth channel of the feature map x; i, j, and c are the indices of width, height, and channel of x, respectively; and W and H (here H = 1) are its width and height. Because $\alpha_c$ lies in (0, 1), each threshold is positive and smaller than the mean absolute value of its channel.
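The per-channel threshold computation can be sketched as follows. The FC weights here are hypothetical random values, the layer order |x| → GAP → FC → ReLU → FC → sigmoid is an assumption, and the BN step of the threshold module is omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_thresholds(x, w1, b1, w2, b2):
    """Per-channel thresholds tau_c = alpha_c * mean |x_{i,c}|.

    x: feature map of shape (C, W) (height 1). w1, b1, w2, b2 are the weights
    of the two FC layers (hypothetical values here). Assumed layer order:
    |x| -> GAP -> FC -> ReLU -> FC -> sigmoid; the BN step is omitted.
    """
    mean_abs = np.abs(x).mean(axis=1)          # GAP of |x|, shape (C,)
    h = np.maximum(w1 @ mean_abs + b1, 0.0)    # first FC layer + ReLU
    z = w2 @ h + b2                            # second FC layer: C outputs
    alpha = sigmoid(z)                         # scaling parameters in (0, 1)
    return alpha * mean_abs                    # tau_c for each channel

rng = np.random.default_rng(0)
C, W = 4, 32
x = rng.normal(size=(C, W))
tau = channel_thresholds(x, rng.normal(size=(C, C)) * 0.1, np.zeros(C),
                         rng.normal(size=(C, C)) * 0.1, np.zeros(C))
```

Scaling the channel mean by a sigmoid output keeps each threshold positive but below the average magnitude, so the shrinkage never zeroes an entire channel.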

The DRSN's Structure.
The structure of a DRSN is shown in Figure 2. It consists of an input layer, a convolutional layer, one or more RSBUs, a BN, a ReLU, a GAP layer, and an output FC layer, which produces the fault classification results.

Leaky Thresholding.
In general, the near-zero features of heavily noised vibration signals are unimportant, and eliminating them is necessary for accurate fault diagnosis. As mentioned in [35], the soft thresholding in the DRSN can effectively eliminate these unimportant features. However, soft thresholding may also eliminate useful signal features along with the noise, reducing fault diagnosis accuracy. To address this issue, we develop a new shrinkage function named leaky thresholding, which retains as many useful features of noisy vibration signals as possible. In the leaky thresholding function, x and y are the input and output features, respectively, and τ and α are the threshold and the slope of the shrinkage function, respectively; both τ and α are positive numbers. The leaky-thresholding process is shown in Figure 3(a). As shown in Figure 3(b), the derivative of the leaky-thresholding function is either 1 or α, which also helps prevent vanishing and exploding gradients during model training. The derivative of the leaky-thresholding function is expressed by
$$\frac{\partial y}{\partial x} = \begin{cases} 1, & |x| > \tau \\ \alpha, & |x| \le \tau \end{cases}$$
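Because the exact leaky-thresholding formula is given here only graphically (Figure 3(a)), the sketch below assumes a continuous piecewise-linear form: slope 1 outside [-τ, τ], slope α inside, which matches the stated derivative. It is an illustrative assumption, not necessarily the authors' exact equation. Written this way it equals soft thresholding plus a small "leak" α·clip(x, -τ, τ), so setting α = 0 recovers soft thresholding.

```python
import numpy as np

def leaky_threshold(x, tau, alpha):
    """Assumed continuous leaky thresholding (not the paper's exact equation):
    y = x - (1 - alpha) * tau   for x > tau
    y = alpha * x               for |x| <= tau
    y = x + (1 - alpha) * tau   for x < -tau
    """
    x = np.asarray(x, dtype=float)
    soft = np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
    return soft + alpha * np.clip(x, -tau, tau)

def leaky_threshold_grad(x, tau, alpha):
    """Derivative: 1 for |x| > tau, alpha for |x| <= tau."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > tau, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.5, 3.0])
y = leaky_threshold(x, tau=1.0, alpha=0.1)        # -2.1, -0.05, 0.05, 2.1
g = leaky_threshold_grad(x, tau=1.0, alpha=0.1)   # 1, 0.1, 0.1, 1
```

Unlike soft thresholding, near-zero inputs are attenuated rather than discarded, so a small feature with diagnostic value still passes through (scaled by α) and still receives a nonzero gradient.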

Improved DRSN Model (IDRSN).
To address the issue that soft thresholding in the DRSN may eliminate useful signal features along with the noise, we replace soft thresholding with leaky thresholding in the DRSN. The architecture of the improved residual shrinkage building unit (IRSBU) is shown in Figure 4. The basic architecture of the IDRSN, shown in Figure 5, is similar to that of the DRSN: replacing the RSBU in Figure 2 with the IRSBU yields the improved deep residual shrinkage network (IDRSN).
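The only structural difference between an RSBU and an IRSBU is the shrinkage function, which the sketch below makes explicit by passing it in as a parameter. The residual path and threshold module are replaced by hypothetical stand-ins (`residual_fn`, `threshold_fn`) purely to exercise the data flow, and the leaky form is the same continuous assumption used earlier.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def leaky_threshold(x, tau, alpha=0.05):
    # Assumed continuous leaky form: soft thresholding plus a slope-alpha leak.
    return soft_threshold(x, tau) + alpha * np.clip(x, -tau, tau)

def shrinkage_unit(x, residual_fn, threshold_fn, shrink_fn):
    """Residual shrinkage building unit with an identity shortcut.

    x: (C, W) feature map. residual_fn stands in for BN-ReLU-conv-BN-ReLU-conv,
    threshold_fn maps the residual features to per-channel thresholds (C,),
    shrink_fn is soft thresholding (RSBU) or leaky thresholding (IRSBU).
    """
    r = residual_fn(x)
    tau = threshold_fn(r)[:, None]    # broadcast each channel's tau over W
    return x + shrink_fn(r, tau)

def residual_fn(x):
    # Hypothetical stand-in for the convolutional residual path.
    return 0.5 * x

def threshold_fn(r):
    # Hypothetical stand-in for the threshold module (fixed alpha_c = 0.5).
    return 0.5 * np.abs(r).mean(axis=1)

x = np.random.default_rng(1).normal(size=(3, 8))
out_rsbu = shrinkage_unit(x, residual_fn, threshold_fn, soft_threshold)
out_irsbu = shrinkage_unit(x, residual_fn, threshold_fn, leaky_threshold)
```

Keeping the shrinkage function pluggable means the rest of the network (Figure 2 versus Figure 5) can stay identical when comparing the DRSN with the IDRSN.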

Experiments and Results
The IDRSN was implemented in Python using TensorFlow. Experiments were run on a computer with an NVIDIA GeForce RTX 2060 Max-Q graphics processing unit and an Intel i7-10875H CPU. The experimental results are summarized in this section. As shown in Figure 6, we adopt the vibration signal datasets from Case Western Reserve University (CWRU) to demonstrate the IDRSN's ability to diagnose rolling bearing faults from heavily noised vibration signals. The bearing conditions include a healthy condition (H) and three fault types: inner race fault (IF), outer race fault (OF), and ball fault (BF). Each fault type is considered at fault diameters of 7, 14, and 21 mils. Therefore, the datasets include ten bearing conditions, all under a load of 0 hp. Each observation includes 400 data points, and each bearing condition has 3000 observations, of which 2800 form the training set and 200 form the test set. The datasets are described in detail in Table 1.
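One way to obtain 3000 observations of 400 points from a single bearing record, and to split them 2800/200, is sketched below. The text does not say how the observations were extracted, so the fixed-stride (possibly overlapping) windowing, the shuffling, and the synthetic `record` are all assumptions of this sketch.

```python
import numpy as np

def segment(signal, n_obs=3000, length=400):
    """Cut one 1-D vibration record into n_obs windows of `length` points.

    Assumption: windows are taken with a fixed stride and may overlap; the
    text does not state how the 3000 observations were extracted.
    """
    signal = np.asarray(signal, dtype=float)
    stride = (len(signal) - length) // (n_obs - 1)
    if stride < 1:
        raise ValueError("record too short for the requested windows")
    starts = np.arange(n_obs) * stride
    return signal[starts[:, None] + np.arange(length)]   # (n_obs, length)

def train_test_split(obs, n_train=2800, seed=0):
    """Shuffle the observations of one bearing condition and split them."""
    idx = np.random.default_rng(seed).permutation(len(obs))
    return obs[idx[:n_train]], obs[idx[n_train:]]

record = np.arange(121_000, dtype=float)   # synthetic stand-in for one record
obs = segment(record)
train, test = train_test_split(obs)
```

Repeating this for each of the ten bearing conditions and attaching the condition index as the label would give the full training and test sets.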

Noise Preparation.
To verify the IDRSN's fault diagnosis ability under strong background noise, Gaussian white noise was added to the original vibration signals to simulate the real environment. The signal-to-noise ratio (SNR) measures noise intensity and is expressed as
$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right)$$
where $P_{\text{signal}}$ and $P_{\text{noise}}$ are the signal power and noise power, respectively. In this paper, Gaussian noise was added to the original vibration signals at noise intensities ranging from −4 dB to 6 dB. The noisy vibration signals were used as both training and test datasets to simulate a real strong-background-noise environment. The training and test datasets were regenerated randomly in each experiment to reduce the dependence of the results on any single random draw.
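Adding Gaussian white noise at a prescribed SNR can be sketched as follows: the noise power is derived from the measured signal power and the target SNR in dB. The sinusoidal test signal is a synthetic assumption used only to check the result.

```python
import numpy as np

def add_noise(x, snr_db, rng=None):
    """Add Gaussian white noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    p_signal = np.mean(x ** 2)                  # average signal power
    p_noise = p_signal / 10 ** (snr_db / 10)    # required noise power
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)

t = np.linspace(0.0, 1.0, 100_000)
x = np.sin(2 * np.pi * 50 * t)                  # synthetic test signal
noisy = add_noise(x, snr_db=-4, rng=np.random.default_rng(0))
```

At −4 dB the noise power is about 2.5 times the signal power, which is why the low-SNR end of the range is the hardest case.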

Experimental Comparison with the DRSN.
The superiority of DRSNs over ResNets and deep CNNs under strong background noise has been validated in [35]. Thus, this article takes the DRSN as the benchmark for comparison. The architecture-related hyperparameters, including the number of layers, the number of convolutional kernels, and the size of the convolutional kernels, follow the recommendations in [35] and are presented in Table 2. In the third and fourth columns, the first number in each bracket is the number of convolutional kernels and the second is the kernel width; "/2" indicates a convolution stride of 2.
For training, the optimization-related hyperparameters are as follows: the Adam optimizer is adopted with a learning rate of 0.001 for 50 epochs, the L2 regularization coefficient is 0.0001, and the minibatch size is 16. The accuracies of the DRSN and the IDRSN for fault diagnosis under noise of different intensities are listed in Table 3. The IDRSN adopts the leaky thresholding proposed in this paper, and its slope parameter α is selected by the proposed group searching method, whose process is shown in Figure 7: the slope value that yields the best diagnosis accuracy is selected as the model training parameter. In this paper, the group of slope values consists of 0.001, 0.003, 0.01, 0.05, 0.075, 0.1, and 0.2. The fault diagnosis results for the different slope values in the group are listed in Table 3.
The test accuracy reported in the table is the average of five experimental runs.
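The group searching procedure, combined with averaging over five runs, can be sketched as a simple loop over the candidate slopes. The `evaluate` callable is hypothetical: it stands in for training an IDRSN with a given slope and returning its test accuracy, and the dummy evaluator below exists only to exercise the selection logic.

```python
from statistics import mean

SLOPES = (0.001, 0.003, 0.01, 0.05, 0.075, 0.1, 0.2)

def group_search(evaluate, slopes=SLOPES, n_runs=5):
    """Return the slope with the best mean test accuracy over n_runs runs.

    evaluate(alpha, run) is a hypothetical callable that trains an IDRSN with
    leaky-thresholding slope alpha (run = repetition index, e.g. a seed) and
    returns its test accuracy.
    """
    scores = {a: mean(evaluate(a, run) for run in range(n_runs)) for a in slopes}
    best = max(scores, key=scores.get)
    return best, scores

def dummy_evaluate(alpha, run):
    # Stand-in for training; accuracy peaks at alpha = 0.05 by construction.
    return 0.9 - abs(alpha - 0.05)

best, scores = group_search(dummy_evaluate)
```

Because the search is run separately for each noise level, the selected slope can differ between input signals, which matches the idea of choosing α per input condition.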
To demonstrate the superiority of the IDRSN over the DRSN, the two are compared under different noise intensities, and their fault diagnosis accuracies are shown in Figure 8. The accuracy obtained with the optimal slope is taken as the IDRSN's actual fault diagnosis accuracy. As indicated in Figure 8, the IDRSN is more effective than the DRSN for rolling bearing fault diagnosis under strong background noise: averaged over the different noise intensities, the IDRSN improves diagnosis accuracy by 1.88% compared with the DRSN. This demonstrates that the proposed leaky thresholding effectively addresses the issue that soft thresholding may discard useful features during signal denoising.
To further improve the average test accuracy, the input signal is preprocessed by normalization, which is expressed by
$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where x and y are the input and output signals, respectively, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the input signal.
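The min-max normalization can be sketched as follows. The text does not say whether $x_{\min}$ and $x_{\max}$ are taken per observation or over the whole dataset; applying it per observation is an assumption of this sketch, as is the guard for a constant input.

```python
import numpy as np

def min_max_normalize(x):
    """y = (x - x_min) / (x_max - x_min), mapping the signal onto [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:               # constant input: avoid division by zero
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

def normalize_observations(obs):
    """Normalize each observation (row) separately (assumed per-observation)."""
    return np.apply_along_axis(min_max_normalize, 1, obs)

y = min_max_normalize([2.0, 4.0, 6.0])   # 0.0, 0.5, 1.0
```

Per-observation scaling removes amplitude differences between segments, so the network sees every observation on the same [0, 1] range regardless of the added noise level.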
The fault diagnosis accuracies of the normalized DRSN and the normalized IDRSN under different noise intensities are summarized in Table 4.
As shown in Figure 9, the accuracies of both the normalized IDRSN and the normalized DRSN are significantly higher than before normalization, and the accuracy of the normalized IDRSN is higher than that of the normalized DRSN. Averaged over the different noise intensities, the normalized IDRSN and normalized DRSN yield improvements of 4.53% and 4.4% in diagnosis accuracy compared with the IDRSN and DRSN, respectively. A comparison of the computational cost of the models is given in Table 5. It can be observed that the IDRSN proposed in this paper improves fault diagnosis accuracy without increasing the computational complexity of the model.

Conclusions
In this paper, inspired by recent work [35], we proposed an improved deep residual shrinkage network (IDRSN) to improve the fault diagnosis accuracy of rolling bearings under strong background noise. As described in [35], soft thresholding may eliminate useful signal features along with the noise. We developed a new shrinkage function, named leaky thresholding, to replace soft thresholding in the DRSN. Leaky thresholding retains as many useful features of noisy vibration signals as possible. Its slope value is determined by a group searching method, which selects the optimal slope as the model training parameter.
Compared with the original DRSN in [35], our IDRSN achieves better rolling bearing fault diagnosis results when the vibration signals contain noise, as verified in the experiments: the IDRSN outperformed the DRSN with an average accuracy improvement of 1.88%. We also applied normalization to further improve diagnosis accuracy; the normalized IDRSN and normalized DRSN outperformed the IDRSN and DRSN by 4.53% and 4.4%, respectively. Therefore, the IDRSN diagnoses faults from noisy vibration signals more effectively than the DRSN, and normalization further improves the diagnostic performance of both. In summary, the IDRSN developed in this paper can effectively improve the fault diagnosis accuracy of rolling bearings under strong background noise.
Data Availability
The vibration signal datasets of Case Western Reserve University (CWRU) can be downloaded from https://csegroups.case.edu/bearingdatacenter/home.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this article.