Fault Diagnosis of Rotating Machinery Based on One-Dimensional Deep Residual Shrinkage Network with a Wide Convolution Layer

In actual engineering applications, inevitable noise seriously affects the accuracy of fault diagnosis for rotating machinery. To effectively identify the fault classes of rotating machinery under noise interference, an efficient fault diagnosis method that requires no additional denoising procedure is proposed. First, a one-dimensional deep residual shrinkage network, which directly takes raw vibration signals contaminated by noise as input, is developed to realize end-to-end fault diagnosis. Then, to further enhance the noise immunity of the diagnosis model, the first layer of the model is set to a wide convolution layer to extract short-time features. Moreover, an adaptive batch normalization algorithm (AdaBN) is introduced into the diagnosis model to enhance its adaptability to noise. Experimental results show that the fault diagnosis model for rotating machinery based on the one-dimensional deep residual shrinkage network with a wide convolution layer (1D-WDRSN) can accurately identify fault classes even under noise interference.


Introduction
With the rapid development of integrated technology, complex industrial machinery systems are facing huge safety challenges [1,2]. As the core component of mechanical systems, rotating machinery usually works in complex environments. An operational failure of rotating machinery affects the function of the entire mechanical system and can even cause catastrophic accidents [3,4]. Therefore, to ensure the continuous and effective operation of industrial machinery systems, research on intelligent diagnosis methods has become a trend in machinery health monitoring.
Traditional fault diagnosis of rotating machinery relied on the professional experience of engineers; thus, the diagnosis accuracy inevitably had a certain degree of randomness. In actual engineering applications, it is more desirable to adopt an automatic method to ensure the reliability of rotating machinery. The functional framework of the automatic fault diagnosis process, which consists of data preprocessing, feature extraction, fault classification, and result analysis, is shown in Figure 1. Its main functions are as follows: vibration sensors collect the vibration signals of the rotating machinery; normalization and other preprocessing operations are performed to construct the original sample set; appropriate feature extraction methods are adopted to find feature vectors that reflect the differences between samples; an efficient fault classification model is trained to identify the fault classes of unknown samples; and the diagnosis results are output and evaluated. The development of artificial intelligence and machine learning has steadily pushed fault diagnosis methods for rotating machinery toward intelligence, gradually forming a diagnosis mode that combines manual feature extraction with shallow learning classification. In this mode, traditional machine learning classifiers, including the sparse representation classifier (SRC) [5,6], support vector machine (SVM) [7,8], artificial neural network (ANN) [9], and nearest mean (NM) [10][11][12], are trained on sensitive features extracted from the raw data to achieve automatic identification of fault classes. However, traditional feature extraction methods based on manual intervention depend strongly on empirical factors, which makes it easy to lose useful information in the samples.
In addition, classifiers based on shallow learning suffer from poor robustness and limited generalization ability.
Recently, deep learning methods, including the stacked autoencoder (SAE), deep neural network (DNN), convolutional neural network (CNN), and deep belief network (DBN), have been widely applied in various data mining and pattern recognition scenarios. These methods transform the representation of samples from the raw space into a new feature space through multilayer feature transformation, thus improving classification accuracy. Compared with methods that rely on manual intervention, deep learning methods can obtain rich internal information from the original data. Therefore, deep learning has become a research hotspot in the field of fault diagnosis for rotating machinery. Chen et al. [13] extracted the time-domain and frequency-domain features of the raw vibration signals to train a CNN model, thereby achieving fault diagnosis for a gearbox. Guo et al. [14] proposed a multitask convolutional neural network with information fusion, which addressed fault diagnosis for rotating machinery under noise interference. Sun et al. [15] designed a fault diagnosis model based on a sparse autoencoder to identify the fault classes of an induction motor: corrupting noise was first added to the input layer of the sparse autoencoder to improve its robustness, and Dropout was then introduced to avoid overfitting during training. Gai et al. [16] used a hybrid grey wolf optimizer to determine the penalty parameter and the number of components of variational mode decomposition (VMD), and they performed singular value decomposition (SVD) to construct the feature matrix of the vibration signals. The feature matrix was input into a DBN fault diagnosis model for training to realize fault diagnosis for rolling bearings. Pan et al. [17] constructed a hybrid fault diagnosis model based on CNN and a long short-term memory (LSTM) neural network, using the features learned by CNN to train the LSTM model and thereby accurately identifying the fault classes of bearings. To improve the feature learning ability for gearbox signals, Shao et al. [18] adopted the maximum entropy to construct the loss function of a deep autoencoder and used the artificial fish swarm algorithm to optimize the network parameters.
It is worth noting that most existing studies carry out fault diagnosis based on ideal vibration signals of rotating machinery. In actual engineering applications, however, the key information reflecting the fault status of rotating machinery is usually hidden in noise-contaminated signals [19,20]. Therefore, the performance of existing fault diagnosis methods degrades seriously under noise interference [21,22]. To realize health monitoring under actual engineering conditions, it is of practical value to study fault diagnosis models for rotating machinery with noise immunity. Ou et al. [23] introduced improved particle swarm optimization (IPSO) to optimize the majorization-minimization-based total variation (TV-MM) algorithm, thereby effectively removing the noise interference in signals. Zhang et al. [24] first used the raw signals to construct a Hankel matrix and restricted the weight vector to a unit vector. Then, to eliminate the random noise in the raw vibration signals, an $L_{3/2}$-norm sparse filter was adopted to extract the different frequency components among the sample signals. Jiang et al. [25] found a converging U-shape phenomenon by investigating the variation of the centre frequency (CF) in VMD, and they constructed a CF-guided VMD optimization strategy to adaptively obtain the weak damage features of rotating machinery. Shen et al. [26] designed a fault diagnosis model for rotating machinery based on a stacked contractive autoencoder (CAE), which obtains robust hidden features by penalizing the Frobenius norm of the Jacobian matrix of the hidden features. Combining VMD and multiscale convolutional neural networks, Wu et al. [27] established a multiperspective fault diagnosis architecture, which provides the features of the raw signals from multiple perspectives, including channel, component, and time scale. The noise robustness of the architecture was verified on a high-speed train dataset and a bearing dataset.
Yao et al. [28] rearranged the positions of the bottleneck layers, convolution layers, and linear bottleneck layers of CNN to propose a stacked inverted residual CNN structure, which can identify bearing fault classes in different noise environments. Jin et al. [29] designed a hybrid model based on CNN and a gated recurrent unit (GRU) neural network to address fault diagnosis for bearings under noise interference. To improve the adaptive feature learning ability, the exponential linear unit (ELU) was introduced into the CNN as the activation function, and an attention mechanism was added to the GRU. After time-frequency decomposition of the vibration signals, Qiao et al. [30] also combined CNN and LSTM to identify the fault classes of bearings under noise interference. Zhang et al. [31] introduced an integrated learning mechanism into the deep shrinkage autoencoder and constructed a fault diagnosis model based on the integrated deep shrinkage autoencoder, which has a strong antinoise ability. Furthermore, LiftingNet [32], Bayesian networks [33], the concurrent convolutional neural network [34], and the capsule network [35] have also been adopted to identify the fault classes of rotating machinery under noise interference.
To address the low accuracy of fault diagnosis for rotating machinery caused by noise-contaminated signals, a novel method based on a one-dimensional deep residual shrinkage network with a wide convolution layer (1D-WDRSN) is proposed in this paper. The innovative contributions of this paper are summarized as follows.
(1) An end-to-end fault diagnosis method is provided; that is, the raw vibration signals contaminated by noise are input directly to obtain the diagnosis results. With soft thresholding and an attention mechanism, the fault diagnosis model based on 1D-WDRSN can identify faults of rotating machinery under different degrees of noise interference.
(2) The one-dimensional deep residual shrinkage network is developed to realize end-to-end fault diagnosis under noise interference. By setting the first layer of the deep residual shrinkage network (DRSN) to a suitable wide convolution layer, the short-term features of the raw vibration signals contaminated by noise can be obtained, which further enhances the noise immunity of the network.
(3) The AdaBN algorithm is used to enhance the domain adaptability of the network. With AdaBN, the training samples and test samples are adjusted to a new distribution space. Because the two distributions are approximately the same in this space, the influence of noise interference on the fault diagnosis model is reduced.
The paper is organized as follows. After the introduction, Section 2 illustrates the basic concepts and procedures of the fault diagnosis method. To illustrate the performance of the proposed method for fault diagnosis of rotating machinery, experiments under various conditions are conducted in Section 3. Finally, conclusions are given in Section 4.

Overview.
To address the challenge of fault diagnosis under noise interference, a fault diagnosis model based on 1D-WDRSN is proposed. Figure 2 shows the framework of the 1D-WDRSN model, which consists of a wide convolutional layer and multiple residual building units (RBUs). The model realizes adaptive feature learning: it extracts essential features from one-dimensional vibration signals contaminated by noise and outputs the diagnosis results. First, a large convolution kernel is adopted in the wide convolution layer to extract short-term features of the signals, which reduces noise interference to a certain extent. After the wide convolution layer, Dropout is applied to avoid overfitting during network training. Zero-padding is used in the RBUs to ensure that the feature maps produced by different convolution kernels have the same size. Moreover, the AdaBN algorithm is introduced to accelerate the network training process and enhance the adaptability of the network under noise interference. Finally, a Softmax classifier classifies the extracted features, and cross-entropy is used as the loss function to measure the error between the predicted value and the true value.
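The classification head described above can be sketched in NumPy as follows. This is a minimal illustration of a Softmax layer combined with a cross-entropy loss, not the authors' implementation; the logits in the usage lines are arbitrary example values, and the small constant added inside the logarithm is an assumption to avoid `log(0)`.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    probs = softmax(logits)
    n = logits.shape[0]
    # Probability assigned to the true class of each sample.
    p_true = probs[np.arange(n), labels]
    return float(-np.mean(np.log(p_true + 1e-12)))

# Two samples, 10 outputs (one per bearing class); values are illustrative only.
logits = np.zeros((2, 10))
logits[0, 3] = 5.0          # sample 0 strongly favours class 3
labels = np.array([3, 7])
loss = cross_entropy(logits, labels)
```

With all-zero logits the prediction is uniform over the 10 classes, so the loss equals log(10); confident correct predictions push the loss toward zero.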

DRSN Model.
By introducing soft thresholding and an attention mechanism into ResNet, DRSN [36] is constructed to achieve accurate classification of noise-contaminated samples. The working principle of DRSN is to identify the interference features of the input samples through the attention mechanism and set them to zero with the soft threshold function, thereby reducing the influence of noise interference on pattern recognition.
ResNet is a deep learning architecture that alleviates the degradation problem caused by increasing network depth. ResNet is composed of a series of RBUs, each of which can be expressed as follows:

$$x_{l+1} = x_l + F(x_l, W_l)$$
where $x_l$ represents the identity shortcut part and $F(x_l, W_l)$ is the residual part. The residual part $F(x_l, W_l)$ is composed of basic components, including batch normalization (BN), rectified linear units (ReLU), and convolutional layers (Conv). Because each convolution kernel is shared across all positions of its input, convolutional layers need far fewer trainable parameters than fully connected layers, which reduces the risk of overfitting. The convolution mapping is as follows:

$$y_j = \sum_{i \in M_j} x_i * k_{ij} + b_j$$
where $x_i$ represents the $i$-th channel of the input feature, $y_j$ represents the $j$-th channel of the output feature, $M_j$ is the set of input channels used to calculate $y_j$, $k$ represents the convolution kernel, and $b$ represents the bias.
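To make the residual mapping and the convolution mapping concrete, here is a NumPy sketch. Assumptions not stated in the text: $M_j$ covers all input channels, the stride is 1, "convolution" is implemented as cross-correlation (as in most deep learning libraries), and zero padding keeps the output length equal to the input length, in line with the zero-padding mentioned for the RBUs. The residual branch in the usage line (ReLU followed by one convolution) is a simplified, hypothetical stand-in for the full BN-ReLU-Conv sequence.

```python
import numpy as np

def conv1d(x, k, b):
    """Multi-channel 1D convolution (cross-correlation), 'valid' padding, stride 1.

    x: input features, shape (C_in, L)
    k: kernels, shape (C_out, C_in, K)
    b: biases, shape (C_out,)
    Returns an array of shape (C_out, L - K + 1).
    """
    c_out, c_in, K = k.shape
    y = np.zeros((c_out, x.shape[1] - K + 1))
    for j in range(c_out):
        for i in range(c_in):  # here M_j is assumed to be all input channels
            # np.correlate slides k[j, i] over x[i] without flipping it
            y[j] += np.correlate(x[i], k[j, i], mode="valid")
        y[j] += b[j]
    return y

def conv1d_same(x, k, b):
    """Zero-pad both ends so the output length equals the input length."""
    K = k.shape[2]
    left = (K - 1) // 2
    right = K - 1 - left
    xp = np.pad(x, ((0, 0), (left, right)))
    return conv1d(xp, k, b)

def residual_unit(x, residual_fn):
    """x_{l+1} = x_l + F(x_l, W_l): identity shortcut plus residual branch."""
    return x + residual_fn(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1024))          # 4 channels, 1024 points
k = rng.standard_normal((4, 4, 16)) * 0.1   # kernel size 16, as used in the RBUs
b = np.zeros(4)
out = residual_unit(x, lambda z: conv1d_same(np.maximum(z, 0.0), k, b))
```

Because the shortcut adds $x_l$ unchanged, a residual branch that outputs zeros leaves the input intact, which is why stacking RBUs does not degrade the signal path.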

Shock and Vibration
Adding BN between neighbouring convolutional layers reduces internal covariate shift, thereby improving the efficiency of network training and enhancing the generalization ability of the network. BN performs feature normalization: features are first normalized to a standard distribution and are then adjusted to a learned distribution. The process of BN is expressed as follows:

$$\mu = \frac{1}{N_{\mathrm{batch}}} \sum_{n=1}^{N_{\mathrm{batch}}} x_n$$

$$\sigma^2 = \frac{1}{N_{\mathrm{batch}}} \sum_{n=1}^{N_{\mathrm{batch}}} (x_n - \mu)^2$$

$$\hat{x}_n = \frac{x_n - \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

$$y_n = \gamma \hat{x}_n + \beta$$

where $N_{\mathrm{batch}}$ represents the size of the mini-batch, $x_n$ and $y_n$ are the input and output of the $n$-th observation in the mini-batch, $\mu$ and $\sigma^2$ are the mini-batch mean and variance, $\varepsilon$ is a constant close to 0 that ensures numerical stability, $\gamma$ is a scaling parameter, and $\beta$ is a bias parameter.

The activation function is the nonlinear transformation part of the neural network. Because it alleviates the vanishing gradient problem, ReLU is widely used in deep learning methods.
The ReLU activation function is expressed as follows:

$$y = \max(x, 0)$$
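The BN and ReLU operations above can be sketched in NumPy as follows. This is an illustrative training-mode BN that normalizes over the mini-batch axis; `gamma` and `beta` play the roles of the scaling and bias parameters, and the value of `eps` is an assumed default rather than one given in the paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the mini-batch axis (axis 0).

    x: shape (N_batch, features); gamma, beta: shape (features,).
    """
    mu = x.mean(axis=0)                     # mini-batch mean
    var = x.var(axis=0)                     # mini-batch (biased) variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learned rescale and shift

def relu(x):
    """y = max(x, 0), applied element-wise."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))   # a mini-batch of 64 feature vectors
y = relu(batch_norm(x, gamma=np.ones(8), beta=np.zeros(8)))
```

With `gamma = 1` and `beta = 0`, the normalized features have approximately zero mean and unit variance regardless of the input's original offset and scale.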
Compared with a plain network, ResNet introduces skip connections, which let the information of the previous RBU flow into the next RBU without hindrance. Therefore, ResNet not only accelerates network convergence but also avoids the vanishing gradient and degradation problems caused by increasing network depth.
At present, soft thresholding is the core step of many denoising algorithms. It removes features whose absolute values are less than the threshold and shrinks features whose absolute values are greater than the threshold toward 0. The soft thresholding function is as follows:

$$y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases}$$

where $x$ and $y$ are the input and output, respectively, and $\tau$ represents the threshold. The threshold must meet two conditions: it must be positive, and it cannot be greater than the maximum absolute value of the input (otherwise every output would be zero). In addition, it is preferable to set an independent threshold for each input sample according to its noise content. The derivative of the soft thresholding function is as follows:

$$\frac{\partial y}{\partial x} = \begin{cases} 1, & x > \tau \\ 0, & -\tau \le x \le \tau \\ 1, & x < -\tau \end{cases}$$

It can be seen that the derivative is either 1 or 0, which is the same property as ReLU. Therefore, soft thresholding can not only reduce noise interference but also avoid the vanishing gradient problem.
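The soft thresholding function and its derivative can be written compactly in NumPy. This sketch uses the equivalent closed form $y = \operatorname{sign}(x)\max(|x| - \tau, 0)$, which reproduces the three cases above; the threshold value in the usage line is an arbitrary example.

```python
import numpy as np

def soft_threshold(x, tau):
    """y = sign(x) * max(|x| - tau, 0): zero inside [-tau, tau], shrink toward 0 outside."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def soft_threshold_grad(x, tau):
    """dy/dx: 1 where |x| > tau, 0 where |x| <= tau."""
    return (np.abs(x) > tau).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
y = soft_threshold(x, 0.5)        # small values are zeroed, large ones shrink by 0.5
g = soft_threshold_grad(x, 0.5)   # gradient is 1 for kept features, 0 for removed ones
```

Unlike hard thresholding, the output is continuous at $|x| = \tau$, and the 0/1 gradient passes through unchanged for the retained features.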
In the field of computer vision, the visual system of animals can quickly scan an entire scene to find a target object and then focus on it to extract more essential information while suppressing irrelevant information. The Squeeze-and-Excitation Network (SENet) [37] is a more recent deep learning method that uses this attention mechanism. The contributions of the various feature channels of an input sample to the classification task usually differ. SENet uses a small subnetwork to obtain a set of weights, which are multiplied with the features of each channel to rescale them. This process can be regarded as applying different amounts of attention to each feature channel. In SENet, each input sample gets its own independent set of weights. The specific path is "global average pooling (GAP) ⟶ fully connected layer (FC) ⟶ ReLU ⟶ FC ⟶ Sigmoid". DRSN borrows the subnetwork structure of SENet to implement soft thresholding with the attention mechanism. The threshold is set automatically by the subnetwork in the red dashed box in Figure 3. In this subnetwork, the absolute values of the input sample features are calculated, and a feature $A$ is then obtained after GAP. In the other path, the features after GAP are fed into a small fully connected network whose last layer is a Sigmoid function, so its output is normalized to a coefficient $\alpha$ in (0, 1). Finally, the threshold corresponding to the input sample is expressed as $\tau = \alpha \times A$, which meets the two conditions of soft thresholding: it is positive, and because $\alpha < 1$ and $A$ is an average of absolute values, it cannot exceed the maximum absolute value of the input.
Since a corresponding threshold can be set for each sample, DRSN realizes a special attention mechanism: it discovers the interference features of the input samples and sets them to zero through soft thresholding. In other words, it focuses on the essential features and retains them in the network.
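The threshold-learning subnetwork can be sketched for one sample as follows. Assumptions: thresholds are computed per channel, the FC weights (`W1`, `b1`, `W2`, `b2`) are random, hypothetical placeholders rather than trained values, and the BN layer that DRSN places inside this subnetwork is omitted for brevity. The sketch shows only the data flow $|x| \to$ GAP $\to A$, then FC $\to$ ReLU $\to$ FC $\to$ Sigmoid $\to \alpha$, and finally $\tau = \alpha \times A$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learned_thresholds(x, W1, b1, W2, b2):
    """Channel-wise thresholds tau = alpha * A for one sample.

    x: features of one sample, shape (C, L).
    W1: (H, C), b1: (H,), W2: (C, H), b2: (C,): weights of the small FC subnetwork.
    """
    A = np.abs(x).mean(axis=1)           # GAP of |x| per channel, shape (C,)
    h = np.maximum(W1 @ A + b1, 0.0)     # FC -> ReLU
    alpha = sigmoid(W2 @ h + b2)         # FC -> Sigmoid, alpha in (0, 1)
    return alpha * A                     # positive and no larger than mean |x| per channel

def shrink(x, tau):
    """Soft thresholding with one threshold per channel, broadcast along time."""
    t = tau[:, None]
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(7)
C, L, H = 4, 256, 4
x = rng.standard_normal((C, L))
W1, b1 = rng.standard_normal((H, C)) * 0.5, np.zeros(H)   # hypothetical weights
W2, b2 = rng.standard_normal((C, H)) * 0.5, np.zeros(C)   # hypothetical weights
tau = learned_thresholds(x, W1, b1, W2, b2)
y = shrink(x, tau)
```

Because $\alpha$ comes from a Sigmoid and $A$ is an average of absolute values, each threshold is automatically positive and never large enough to zero out an entire channel.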

Enhancement of Feature Learning Ability Based on the Wide Convolution Layer.
Feature extraction is the basis of fault diagnosis for rotating machinery; its purpose is to mine the essential information in the raw signals. However, due to the complex working environment of rotating machinery, the measured vibration signals are contaminated by noise, which increases the difficulty of feature extraction. Therefore, to enhance the feature learning ability, a wide convolutional layer is adopted as the first layer of 1D-WDRSN in this paper. Like the classical short-time Fourier transform, the wide convolution layer extracts short-time features; the difference is that it can adaptively learn the effective features of vibration signals and remove interference features that affect the diagnosis results. The core idea of the wide convolution layer is to improve the noise immunity of the model by using a larger convolution kernel. A large convolution kernel has a larger receptive field, so each output aggregates many input points and is less sensitive to point-wise noise fluctuations. Zhang et al. [38] showed that a large convolution kernel can be regarded as a low-pass filter, which effectively obtains the low-frequency features of the raw vibration signals and suppresses high-frequency noise interference. In the proposed fault diagnosis model based on 1D-WDRSN, the wide convolutional layer contains a large convolution kernel, and the appropriate kernel size is determined through experiments in the next section.
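The low-pass argument can be illustrated numerically. A learned wide kernel is not a simple average; here a normalized box (moving-average) kernel is used as an assumed, simplest example of a wide low-pass kernel. Filtering unit-variance white noise with a length-$K$ box kernel reduces its variance to roughly $1/K$, while a slow sinusoid whose period is much longer than $K$ passes almost unchanged.

```python
import numpy as np

def moving_average(x, K):
    """Convolve with a normalized length-K box kernel (a simple low-pass FIR filter)."""
    kernel = np.ones(K) / K
    return np.convolve(x, kernel, mode="valid")

rng = np.random.default_rng(2)
n = np.arange(4096)
clean = np.sin(2 * np.pi * n / 2048)    # slow component (period 2048 samples)
noise = rng.standard_normal(n.size)     # white Gaussian noise, variance about 1

# Residual noise variance after filtering: roughly 1 / K for white noise.
noise_var = {K: moving_average(noise, K).var() for K in (4, 64, 256)}
# Amplitude of the slow component after the widest kernel (close to 1).
kept_amplitude = np.abs(moving_average(clean, 256)).max()
```

The wider the kernel, the more the broadband noise is averaged out, which matches the intuition that wide first-layer kernels favour low-frequency structure.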

Improvement of Domain Adaptability Based on AdaBN.
AdaBN, originally proposed for image recognition, is an improved BN algorithm with domain adaptability. In AdaBN, the mean and variance of each BN layer, estimated from the source domain samples, are replaced by the mean and variance of the target domain samples. By performing BN on the source domain samples and AdaBN on the target domain samples, the two domains can be adjusted into a new distribution space, thereby achieving domain adaptation. Therefore, AdaBN not only ensures that the network parameters are shared between the source domain and the target domain but also keeps the statistics of the two domains independent at the BN layers.
In this paper, the AdaBN algorithm is introduced into 1D-WDRSN to accelerate the network training process and improve the adaptability of the network to noise interference. First, the training samples are employed to train the 1D-WDRSN model. If the distributions of the training samples and the test samples are inconsistent, the mean and variance of all BN layers in the network are replaced with those computed on the test samples, while all other parameters are kept unchanged. Finally, the 1D-WDRSN model with domain adaptability is used to implement fault diagnosis for rotating machinery. The specific description of 1D-WDRSN based on AdaBN is shown in Algorithm 1.
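The statistic-replacement step of AdaBN can be sketched for a single BN layer. This is a minimal illustration, not Algorithm 1 itself: the class `BatchNorm1dStats` and function `adabn` are hypothetical names, the features are synthetic, and in a full network the statistics would be re-estimated layer by layer by propagating the test samples through the trained model.

```python
import numpy as np

class BatchNorm1dStats:
    """Minimal BN layer holding learned (gamma, beta) and stored mean/variance."""

    def __init__(self, gamma, beta, eps=1e-5):
        self.gamma, self.beta, self.eps = gamma, beta, eps
        self.mean = None
        self.var = None

    def fit_stats(self, x):
        """Estimate per-feature mean and variance from x of shape (N, features)."""
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0)
        return self

    def __call__(self, x):
        x_hat = (x - self.mean) / np.sqrt(self.var + self.eps)
        return self.gamma * x_hat + self.beta

def adabn(bn, target_x):
    """AdaBN: keep the learned gamma/beta, replace the statistics with target-domain ones."""
    return BatchNorm1dStats(bn.gamma, bn.beta, bn.eps).fit_stats(target_x)

rng = np.random.default_rng(4)
x_train = rng.standard_normal((500, 8))    # source-domain features (noise-free training)
x_test = rng.standard_normal((300, 8)) + rng.standard_normal((300, 8))  # extra noise, variance about 2
bn = BatchNorm1dStats(gamma=np.ones(8), beta=np.zeros(8)).fit_stats(x_train)
before = bn(x_test).var(axis=0).mean()             # about 2: source statistics mismatch
after = adabn(bn, x_test)(x_test).var(axis=0).mean()  # about 1 after AdaBN
```

Only the normalization statistics change; the learned affine parameters, like all convolution weights, are shared between the training and test domains.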

Fault Diagnosis Process.
The process of the fault diagnosis model based on 1D-WDRSN for rotating machinery proposed in this paper is shown in Figure 4.
The 1D-WDRSN model is introduced for adaptive feature learning; it extracts key information from signals contaminated by noise and accurately identifies the fault classes of rotating machinery. The specific steps of the fault diagnosis process are as follows: (1) Randomly divide the original vibration signal samples into a training sample set, which is employed to train the 1D-WDRSN model, and a test sample set, which is employed to verify the effectiveness of the 1D-WDRSN model.

Experiments and Results
The data acquisition system of the CWRU bearing laboratory is shown in Figure 5. The damage locations of the bearing are the outer race, the inner race, and the ball, and each damage location has fault diameters of 0.007 inches, 0.014 inches, and 0.021 inches. Together with the normal condition, the dataset therefore includes ten bearing classes (3 locations × 3 diameters + 1 normal). A dataset under a load of 1 hp is adopted in this paper. Each vibration signal sample contains 1024 data points, and the entire dataset consists of 1000 samples (100 per class), where the ratio of the training sample set to the test sample set is 7 : 3. The detailed information of the dataset is described in Table 1.
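The sample construction and the 7 : 3 split can be sketched as follows. The paper does not state how samples are cut from the recordings, so non-overlapping 1024-point windows are an assumption here, as is the per-class (stratified) split. The `records` array is random placeholder data standing in for the ten CWRU signals, and `segment` and `split_per_class` are hypothetical helper names.

```python
import numpy as np

def segment(signal, length=1024, n_samples=100):
    """Cut the first n_samples non-overlapping windows of `length` points."""
    needed = length * n_samples
    if signal.size < needed:
        raise ValueError("signal too short for the requested number of samples")
    return signal[:needed].reshape(n_samples, length)

def split_per_class(samples_by_class, train_ratio=0.7, seed=0):
    """Random per-class split; returns (X_train, y_train, X_test, y_test)."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr, X_te, y_te = [], [], [], []
    for label, X in enumerate(samples_by_class):
        idx = rng.permutation(len(X))
        n_train = int(round(train_ratio * len(X)))
        X_tr.append(X[idx[:n_train]])
        y_tr += [label] * n_train
        X_te.append(X[idx[n_train:]])
        y_te += [label] * (len(X) - n_train)
    return np.concatenate(X_tr), np.array(y_tr), np.concatenate(X_te), np.array(y_te)

# Placeholder recordings: random numbers standing in for the 10 vibration signals.
rng = np.random.default_rng(3)
records = [rng.standard_normal(120_000) for _ in range(10)]
samples = [segment(r) for r in records]
X_train, y_train, X_test, y_test = split_per_class(samples)
```

This yields 700 training and 300 test samples, with 70 and 30 per class respectively.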

Noise Preparation.
Due to the complex working environment of rotating machinery, noise interference is unavoidable in actual engineering. Therefore, Gaussian white noise is introduced to simulate real noise, and the signal-to-noise ratio (SNR) is used to measure the noise intensity. SNR is calculated as follows:

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right)\ \mathrm{dB}$$
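where $P_{\mathrm{signal}}$ is the signal power and $P_{\mathrm{noise}}$ is the noise power. A lower SNR therefore corresponds to stronger noise; for example, at SNR = 0 dB the noise power equals the signal power, and at −4 dB it is about 2.5 times the signal power.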
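Adding Gaussian white noise at a prescribed SNR follows directly from the definition above: the noise power is set to $P_{\mathrm{signal}} / 10^{\mathrm{SNR}/10}$. The sketch below computes power as the mean square of the samples; the test signal is a synthetic sinusoid, not CWRU data, and the function names are hypothetical.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(signal ** 2)                  # average signal power
    p_noise = p_signal / (10 ** (snr_db / 10))       # noise power for the target SNR
    noise = rng.standard_normal(signal.shape) * np.sqrt(p_noise)
    return signal + noise, noise

def snr_db(signal, noise):
    """Measured SNR in dB from a clean signal and the added noise."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(5)
x = np.sin(2 * np.pi * np.arange(100_000) / 64)   # synthetic stand-in signal
noisy, noise = add_noise_at_snr(x, -4.0, rng)     # harshest setting in the experiments
```

For long signals the measured SNR matches the requested value closely; for a single 1024-point sample it fluctuates slightly around it because the noise power is estimated from a finite draw.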
To evaluate the noise immunity of the proposed fault diagnosis model, fault diagnosis experiments are performed in noisy environments with SNRs ranging from −4 dB to 10 dB. To simulate actual engineering applications, the raw vibration signals are used to build the training sample set, while noise is added to the samples in the test sample set. For each noise intensity, ten pairs of training and test sample sets are randomly generated to reduce the randomness of the diagnosis results.

Selection of Model Parameters.
Several parameters, i.e., the size and number of convolution kernels in the wide convolution layer, the number of RBUs, and the size of convolution kernels in the RBUs, have a great impact on the diagnosis performance of the proposed 1D-WDRSN model, so a series of experiments is conducted to determine their values. In these experiments, vibration signals contaminated by noise at SNR = −4 dB are used as the test samples. To reduce random effects, the average fault diagnosis accuracy over 30 experiments is taken as the final result.

The Size of Convolution Kernels in the Wide Convolution Layer.
As discussed above, a larger convolution kernel can enhance the noise immunity of the network. Therefore, the kernel size is set to 48, 64, 96, 128, 192, 256, 384, and 512 in this experiment. The number of convolution kernels is fixed to 32, the number of RBUs is fixed to 2, and the size of convolution kernels in the RBUs is fixed to 16, so that only the influence of the kernel size in the wide convolution layer on the diagnosis performance is analysed. The experimental results are shown in Table 2.
As shown in Table 2, the accuracy increases steadily with the kernel size and tends to stabilize once the kernel size in the wide convolution layer reaches 256. Therefore, the kernel size is set to 256.

The Number of Convolution Kernels in the Wide Convolution Layer.
After determining the kernel size, the influence of the number of convolution kernels in the wide convolution layer on the 1D-WDRSN model is analysed. In this experiment, the kernel size is fixed to 256, the number of RBUs is fixed to 2, and the size of convolution kernels in the RBUs is fixed to 16. The experimental results are shown in Table 3.
As shown in Table 3, the fault diagnosis accuracy increases steadily with the number of convolution kernels and tends to stabilize once the number of convolution kernels in the wide convolution layer reaches 48. Therefore, the number of convolution kernels is set to 48.

The Number of RBUs.
After determining the parameters of the wide convolution layer, the influence of the number of RBUs on the 1D-WDRSN model is analysed. In this experiment, the size of convolution kernels in the RBUs is fixed to 16. The experimental results are shown in Table 4.
As shown in Table 4, the accuracy changes little as the number of RBUs increases beyond 2. Because adding RBUs slows down network training, 2 RBUs are adopted in the 1D-WDRSN model.

The Size of Convolution Kernels in RBUs.
Finally, the influence of the size of convolution kernels in RBUs on the diagnosis performance of the 1D-WDRSN model is shown in Table 5.
As shown in Table 5, the fault diagnosis accuracy reaches its highest value when the size of convolution kernels in the RBUs is 16. Therefore, the kernel size in the RBUs is set to 16. According to the above results, the network parameters of the proposed 1D-WDRSN model are summarized in Table 6. The model consists of a wide convolutional layer, two RBUs, and a Softmax layer. The 10 bearing classes correspond to the 10 outputs of the Softmax layer.

Noise Immunity Evaluation Experiments.
The parameters used for training the network are set as follows. The Adam optimizer is applied to the 1D-WDRSN model with a learning rate of 0.001, and the number of iterations is set to 200. The Dropout rate after the wide convolution layer is set to 0.5.
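The two training settings above, inverted Dropout with rate 0.5 and Adam with learning rate 0.001, can be sketched in NumPy. The Adam hyperparameters `beta1 = 0.9`, `beta2 = 0.999`, and `eps = 1e-8` are common defaults assumed here because the paper does not report them; the parameter and gradient values in the usage lines are arbitrary.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

class Adam:
    """Standard Adam update; beta1, beta2, eps are assumed common defaults."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = None
        self.v = None
        self.t = 0

    def step(self, theta, grad):
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad          # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2     # second moment
        m_hat = self.m / (1 - self.b1 ** self.t)                  # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

rng = np.random.default_rng(6)
features = dropout(np.ones(1000), rate=0.5, rng=rng)   # after the wide convolution layer
opt = Adam(lr=0.001)
theta = np.array([1.0, -2.0])
theta_new = opt.step(theta, grad=2 * theta)            # gradient of sum(theta**2)
```

On the first step Adam moves each parameter by almost exactly the learning rate in the direction opposite its gradient, independent of the gradient's magnitude; at test time `dropout(..., training=False)` returns its input unchanged.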
To verify the adaptive feature learning ability of the proposed 1D-WDRSN model under noise interference, the raw vibration signals and the features extracted by the 1D-WDRSN model at SNR = 0 dB are visualized in Figures 6 and 7, respectively. Because the original signals and extracted features are high-dimensional, t-distributed stochastic neighbour embedding (t-SNE) is adopted to visualize them. The features are mapped into a 3-dimensional scatter plot, which provides a way to evaluate their effectiveness. The three axes of the scatter plot are named "Feature 1", "Feature 2", and "Feature 3".
As shown in Figure 6, most fault classes of the original signals overlap with other fault classes to different degrees. In contrast, Figure 7 shows that the 10 fault classes of the features extracted by the 1D-WDRSN model are clearly separated from each other. From the perspective of fault classification, feeding these features into the classifier yields higher diagnosis accuracy. These results demonstrate that the 1D-WDRSN model can mine the key information of the signals under noise interference.
In addition, the proposed method is compared with other fault diagnosis methods for rotating machinery under noise interference to verify its effectiveness. The comparison methods are sparse filtering (SF) [24], multiperspective CNN (MP-CNN) [27], CNN + LSTM [30], wide convolution and multiscale convolution (WMSCCN) [35], and deep convolutional neural networks with wide first-layer kernels (WDCNN) [38]. The fault diagnosis results of these methods under different noise intensities are shown in Figure 8.
These results show that the proposed fault diagnosis method for rotating machinery achieves the best performance under different degrees of noise interference, and its advantage is more pronounced at low SNR. The reasons are as follows. First, 1D-WDRSN uses the attention mechanism to discover the interference features of the input samples and sets them to zero with the soft threshold function, thereby reducing the influence of noise on fault diagnosis accuracy. Second, the wide convolution layer at the beginning of the 1D-WDRSN model suppresses high-frequency noise, which helps extract the key information of different fault classes from noise-contaminated vibration signals. Third, the AdaBN algorithm adjusts the training samples and test samples to a suitable distribution space, thereby strengthening the domain adaptability of the model.

Table 1 (partial; class labels not recoverable): Training samples: 70 per class (10 classes); Test samples: 30 per class (10 classes).

Conclusions
To solve the problem of poor noise immunity of fault diagnosis models for rotating machinery, an end-to-end method based on 1D-WDRSN is proposed in this paper. First, to remove the interference features of the input samples identified by the attention mechanism, the soft threshold function is introduced into the 1D-WDRSN model, thereby reducing the influence of noise interference on fault diagnosis accuracy. Then, a wide convolution layer is adopted to suppress high-frequency noise interference, which helps extract key information of different fault classes from noise-contaminated vibration signals. After the wide convolution layer, Dropout is applied to avoid overfitting during network training. Finally, to further improve the adaptability of the network under noise interference, the AdaBN algorithm is employed to adjust the training samples and test samples to a suitable distribution space in the fault diagnosis process. Vibration signals obtained from the deep groove ball bearing in the CWRU bearing laboratory are chosen as the original experimental data, and different degrees of noise are added to the original vibration signals to simulate fault diagnosis under noise interference. Experimental results show that the fault diagnosis method based on 1D-WDRSN can effectively identify the fault classes of rotating machinery under noise interference. In brief, fault diagnosis based on 1D-WDRSN can improve the reliability and maintainability of rotating machinery and provides a good solution for machinery health monitoring. Future work will be devoted to applying the proposed fault diagnosis model to the mechanical system of a water jet propulsion device, which consists of various rotating machinery.

Conflicts of Interest
The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.