Application of Rotating Machinery Fault Diagnosis Based on Deep Learning

With the continuous progress of modern industry, rotating machinery is gradually developing toward complexity and intelligence. Fault diagnosis of rotating machinery is one of the key means to ensure the normal operation of equipment and safe production. Deep learning is a useful tool for analyzing and processing big data and has been widely used in various fields. After a brief review of early fault diagnosis methods, this paper focuses on the models that are widely used in deep learning: deep belief networks (DBN), autoencoders (AE), convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), and transfer learning methods. These are summarized from the two aspects of principle and application in the field of fault diagnosis of rotating machinery. Then, the indicators commonly used to evaluate the performance of rotating machinery fault diagnosis methods are summarized. Finally, according to the current research status in the field of rotating machinery fault diagnosis, the current problems and possible future research trends are discussed.


Introduction
Complex electromechanical equipment is an important basis for the development of modern industries such as coal, transportation, aviation, and construction. With the continuous progress of modern industry, rotating machinery, as an important part of complex electromechanical equipment, is developing toward greater complexity, larger scale, and intelligence [1]. Rotating machinery usually works for long periods under heavy loads and at high speeds. Bearings, gearboxes, and other key components are likely to suffer from wear, deformation, fracture, and other faults during the inherent degradation process and under time-varying operating conditions. The failure of these components affects the normal operation and use of the equipment. Severe cases can result in downtime or damage and can even cause casualties and huge economic losses. Therefore, research on condition monitoring and fault diagnosis of rotating machinery is one of the key means to ensure the normal operation of equipment, reduce unplanned maintenance, avoid catastrophic failure, and ensure the safety of industrial production [2,3]. The main tasks of fault diagnosis research on rotating machinery include determining the operating state, judging the fault type, and predicting the fault trend [4,5]. In recent years, researchers have done extensive work in the field of rotating machinery fault diagnosis, and the results have been applied in actual working conditions, as shown in Figure 1.
With the rapid development of intelligent algorithms, research on deep learning in the fault diagnosis of rotating machinery is increasing year by year and is attracting the attention of more and more researchers. In order to better promote research progress in this field, this article reviews the related research from the two aspects of deep learning theory and its application in fault diagnosis. This review can provide convenience and inspiration for related researchers and a reference for understanding and promoting the development of fault diagnosis research. This paper first introduces several development stages of fault diagnosis methods for rotating machinery and explains the diagnosis process, advantages, and disadvantages of the methods at each stage. Then, several mainstream deep learning models and transfer learning methods in the field of rotating machinery fault diagnosis are reviewed from both theoretical and application aspects. Finally, the challenges faced by deep learning in the field of rotating machinery fault diagnosis and possible future research trends are discussed.

Development Process of Rotating Machinery Fault Diagnosis
Rotating machinery generates different physical signals when it is running. These signals can be used as characteristic signals to characterize its operating status during fault diagnosis research. According to signal type and acquisition method, common analysis methods include vibration signal analysis [6], thermal imaging analysis [7], acoustic signal analysis [8], temperature signal analysis, and electrical signal analysis. As shown in Table 1, each of these methods has its own advantages and disadvantages due to factors such as operating environment and sensor installation. Vibration signal analysis is the most widely used of these methods because it responds quickly to changes in the state of rotating machinery, the relevant signal processing methods are diverse, and the diagnostic accuracy is relatively high. The early traditional fault diagnosis of rotating machinery was based on simple signal processing technology [9,10]. As shown in Figure 2, the traditional fault diagnosis method first collects the vibration, temperature, voltage, and current signals of the rotating machinery through sensors. After processing, the characteristic parameters that characterize the operating conditions of the equipment are obtained. The characteristic parameters of normal signals and fault signals are compared and analyzed, and an appropriate threshold is selected. When the characteristic parameters of the collected signals exceed the set threshold, the device is judged to be faulty. However, in actual working conditions, rotating machinery operates in a harsh environment, and its signals are often nonstationary and nonlinear and contain a large amount of noise [11]. Therefore, the accuracy of the traditional fault diagnosis method, which relies only on the characteristic parameters, is low.
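The threshold rule described above can be illustrated with a minimal Python sketch. It assumes the root mean square (RMS) of a vibration record as the single characteristic parameter and a threshold of three times the healthy RMS; the feature choice, the factor of three, and the sample values are all hypothetical and are not taken from the reviewed works.

```python
import math


def rms(signal):
    """Root mean square of a sampled signal (one possible characteristic parameter)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def is_faulty(signal, threshold):
    """Flag a fault when the RMS feature exceeds the chosen threshold."""
    return rms(signal) > threshold


# Hypothetical records: a healthy baseline and a record with larger vibration.
healthy = [0.1, -0.2, 0.15, -0.1, 0.05, -0.12]
suspect = [0.9, -1.1, 1.3, -0.8, 1.0, -1.2]

# Assumed rule for illustration: threshold = 3 x RMS of the healthy baseline.
threshold = 3.0 * rms(healthy)

print(is_faulty(healthy, threshold), is_faulty(suspect, threshold))
```

A fixed threshold of this kind is simple to deploy, but it is sensitive to noise and changing operating conditions, which is the limitation noted in the paragraph above.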
Shock and Vibration

With the rise of artificial intelligence, machine learning (ML), as its core, has gained widespread attention. Researchers combined machine learning methods with feature extraction methods based on signal processing to perform fault diagnosis of rotating machinery. Feature extraction methods based on signal processing are used for feature extraction and selection of equipment monitoring signals. Commonly used features include time domain features, frequency domain features, and time-frequency domain features [1,12]. The ML method obtains a diagnostic model with high generalization by training on the input features and, through the trained diagnostic model, establishes the relationship between the selected features and the health state of rotating mechanical equipment [13]. The commonly used ML methods include the BP neural network, RBF neural network, extreme learning machine, and support vector machine (SVM) [2]. Although fault diagnosis methods for rotating machinery based on signal processing and machine learning have achieved some success, they still have two major defects [14,15]. First, feature extraction requires a large number of signal processing techniques and rich engineering experience to extract and select appropriate fault features. Second, the ML methods used are shallow models [16], and the model parameters increase as the input data increase. It will lead to the decrease of ... [17], the amount of monitoring data of key components of rotating machinery such as bearings and gearboxes increases day by day, pushing the field of fault diagnosis into the era of "big data" [5]. In the field of ML, Hinton's paper on data dimension reduction with deep neural networks [18] marked the beginning of deep learning research.
Deep learning (DL) methods automatically learn features from the original input by building an end-to-end diagnostic model [19], directly establishing a connection between the growing monitoring data and the health status of the machine [1]. This removes the heavy workload and high cost of selecting features from a large amount of monitoring data. Therefore, the application of deep learning methods in the fault diagnosis of rotating machinery is of great significance. Figure 3 shows the implementation process of the above three-stage diagnosis method.
Rotating machinery fault diagnosis methods based on deep learning can often achieve good results in a laboratory environment, because enough labeled data are available there to train the diagnosis model, but this is not the case in actual working conditions [20]. In actual working conditions, rotating machinery usually undergoes a long degradation process from a healthy state to failure; when a failure occurs, it is repaired promptly, so obtaining failure data is time-consuming and labor-intensive [21]. As a result, the collected data contain far more normal data than fault data, and the large difference in the number of samples per class causes serious data imbalance. Moreover, labeling a large amount of monitoring data with health categories is very costly. Therefore, most of the monitoring data collected in actual working conditions are unlabeled. For these reasons, fault diagnosis methods based on deep learning perform unsatisfactorily in actual working conditions. Transfer learning (TL) is a new machine learning method that is closely related to deep learning and can use existing knowledge to solve problems in different but related fields [22]. Therefore, DL and TL can be used as a bridge to apply the rich data resources of the laboratory environment to the fault diagnosis of rotating machinery under actual working conditions, addressing the low accuracy of diagnosis models in identifying equipment health status that results from the scarcity of fault data in practical applications.
Although the research methods used for fault diagnosis of rotating machinery differ, most of them follow the same process, as shown in Figure 4. First, signals are collected from the target equipment to obtain characteristic signals that can characterize its operating status. Then, different research methods are used to establish diagnostic models that process and analyze the characteristic signals. Finally, the health of the target device is diagnosed and decisions are made, so as to ensure the normal operation of the target device.

Deep Learning-Based Fault Diagnosis Method for Rotating Machinery and Its Application
The mainstream deep learning models in the field of rotating machinery fault diagnosis include deep belief networks (DBN), autoencoders (AE) and their variants, convolutional neural networks (CNN), recurrent neural networks (RNN), and generative adversarial networks (GAN).

Basic Theory of DBN.
The deep belief network (DBN) is composed of a stack of multiple restricted Boltzmann machines (RBM) [23]. The RBM is a probabilistic generative model, and its structure is shown in Figure 5.
An RBM consists of a visible layer $v$ and a hidden layer $h$. The visible layer represents the input sample, and the hidden layer acts as a feature extractor. The two layers are connected by a weight matrix $w$: every neuron in one layer is connected to every neuron in the other layer, but there are no connections within a layer. The energy function of the RBM is as follows:

$$E(v, h; \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \tag{1}$$

In the formula, $\theta = \{w, a, b\}$; $n$ and $m$ are the numbers of visible layer and hidden layer units, respectively; $a_i$ and $b_j$ are the neuron biases of the visible layer and the hidden layer, respectively; and $w_{ij}$ is the connection weight between visible neuron $i$ and hidden neuron $j$. The energy function is used to define the joint probability distribution of the nodes in the visible layer and the nodes in the hidden layer, which can be expressed as the following formula, where $Z(\theta)$ is the normalization factor:

$$P(v, h; \theta) = \frac{1}{Z(\theta)} e^{-E(v, h; \theta)}, \qquad Z(\theta) = \sum_{v} \sum_{h} e^{-E(v, h; \theta)} \tag{2}$$
The activation conditions of the visible unit and the hidden unit are defined as follows:

$$P(h_j = 1 \mid v) = \sigma_s\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{3}$$

$$P(v_i = 1 \mid h) = \sigma_s\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big) \tag{4}$$

where $\sigma_s$ is the activation function, for which the sigmoid function is usually used. Then, the maximum likelihood estimation method is used to obtain the parameters of the RBM, expressed as

$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{k} \ln P(x_i; \theta) \tag{5}$$

where $\{x_i\}_{i=1}^{k}$ is the input data set with $k$ samples. In the DBN, each RBM is connected by weights, and each layer of the RBM network is independent of the others. Pretraining is performed through forward learning; that is, the output of the previous RBM layer is used as the input of the next layer, and the features are mapped and passed layer by layer. The RBM of each layer adjusts its weights to reach the optimal mapping of that layer's feature vector, so that a more abstract and more representative feature representation is formed at the high level. Finally, the backpropagation algorithm is used at the high level to propagate the error from top to bottom to each RBM, so as to realize the supervised fine-tuning of the entire DBN. The core idea of the DBN is to optimize the network parameters using a greedy layer-by-layer algorithm and to use two training stages, interlayer pretraining and reverse fine-tuning, to extract the distributed features of the input data. Pretraining is unsupervised: it uses a large number of unlabeled samples to minimize the reconstruction error between layers, maps data from input to output, and constructs complex nonlinear functions to characterize features. Reverse fine-tuning uses a small number of labeled samples in a supervised manner to achieve accurate classification with the DBN.
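The RBM quantities discussed above can be made concrete with a short Python sketch. It assumes the standard binary RBM, that is, the energy $E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$ and sigmoid conditionals; the toy weights and biases are hypothetical, and the brute-force partition function is only feasible for very small networks.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, w, a, b):
    """RBM energy: -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i w_ij h_j."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    pair_term = sum(v[i] * w[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - pair_term


def partition_function(w, a, b):
    """Normalization factor Z: sum of exp(-E) over all binary states (toy sizes only)."""
    n, m = len(a), len(b)
    return sum(math.exp(-energy(v, h, w, a, b))
               for v in product([0, 1], repeat=n)
               for h in product([0, 1], repeat=m))


def joint_probability(v, h, w, a, b):
    return math.exp(-energy(v, h, w, a, b)) / partition_function(w, a, b)


def p_hidden_given_visible(v, w, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible_given_hidden(h, w, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


# Hypothetical toy RBM with 3 visible and 2 hidden units.
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]

print(energy([1, 0, 1], [1, 0], w, a, b))
print(p_hidden_given_visible([1, 0, 1], w, b))
```

Because there are no connections within a layer, the hidden units are conditionally independent given the visible layer, which is why the conditionals factorize into one sigmoid per unit.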

Application of DBN in Fault Diagnosis.
Researchers have carried out a lot of research on the application of DBN. Jiang et al. [24] and Han et al. [25] constructed DBN-based diagnostic models by stacking multiple RBMs and achieved higher diagnostic accuracy than traditional methods. In [26,27], the authors used the frequency domain data obtained by the fast Fourier transform (FFT) as the input of the DBN model to diagnose induction motor faults. Tao et al. used the Teager energy operator (TEO) to extract the instantaneous energy in the rolling bearing vibration signal and construct the corresponding feature vector, and then they combined it with the DBN to diagnose rolling bearing faults [28]. In [29], a novel intelligent ball screw degradation recognition method based on DBN and multisensor data fusion is proposed. First, the method calculates the frequency spectra of the raw signals, and the fused frequency spectra are obtained by multisensor data fusion. Then, a deep learning-based recognition model that can estimate the degradation condition of the ball screw automatically is established with the fused data set. The flow chart of this method is shown in Figure 6.
Zhang et al. proposed a semisupervised fault recognition model based on Laplacian eigenmaps (LE) and DBN to identify the failure mode of mechanical equipment [30]. In [31], DBN is used for automatic diagnosis of high-speed train on-board equipment, and its diagnosis performance is better than that of KNN and ANN. Chen et al. used the feature self-extraction capability of DBN to extract the characteristics of the vibration signal of the gear transmission system and then perform fault identification [32]. Some researchers used DBN to build diagnostic models for faults in hydraulic equipment [33], wind turbines [34], and compressors [35] and achieved higher diagnostic accuracy than traditional methods. Oh et al. preprocessed the vibration signal to generate a 2D image, applied the histogram of oriented gradients as the input feature, and performed feature extraction and fault classification through DBN [36]. Tao et al. proposed a DBN-based multisensor information fusion bearing fault diagnosis model. The input of the model is 14 time domain statistical features of the vibration signals collected by 3 sensors [37]. The flow diagram of multisensor information fusion is shown in Figure 7.
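As a rough illustration of feature-level multisensor fusion, the following Python sketch computes a few common time domain statistics per sensor and concatenates them into one input vector. The source does not list the 14 features used in [37], so the six statistics here (mean, RMS, standard deviation, peak, crest factor, kurtosis) are an assumed subset chosen for illustration, and the signals are hypothetical. The sketch also assumes each signal has nonzero variance.

```python
import math


def time_domain_features(x):
    """Six common time domain statistics of one signal (assumed feature set)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    peak = max(abs(v) for v in x)
    crest_factor = peak / rms
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * var ** 2)
    return [mean, rms, std, peak, crest_factor, kurtosis]


def fuse_sensors(sensor_signals):
    """Concatenate the per-sensor feature vectors into one model input."""
    fused = []
    for signal in sensor_signals:
        fused.extend(time_domain_features(signal))
    return fused


# Hypothetical short records from three sensors.
signals = [
    [0.1, -0.3, 0.2, -0.1, 0.4, -0.2],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [0.0, 0.5, -0.5, 0.2, -0.2, 0.1],
]
features = fuse_sensors(signals)
print(len(features))
```

With three sensors and six statistics each, the fused vector has 18 entries; a feature set of 14 statistics per sensor would give 42 in the same way.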
In order to speed up DBN model training and improve its generalization ability, operating efficiency, and recognition accuracy, researchers further studied optimization algorithms for the DBN model, as shown in Table 2. Li et al. proposed a DBN algorithm and bearing fault diagnosis model based on particle swarm optimization (PSO) [38]. In [39][40][41], the researchers used Nesterov momentum to adaptively optimize the training of the DBN-based diagnosis model, and the diagnosis accuracy was higher than that of the standard DBN. He et al. used a genetic algorithm to optimize the structure of the DBN and diagnose faults of the gear transmission chain [42]. Tao et al. proposed a rolling bearing fault diagnosis method based on bacterial foraging decision and the deep belief network, which improved the accuracy of rolling bearing fault diagnosis [43]. Shao et al. constructed an adaptive DBN, used an adaptive learning rate and momentum algorithm to train the DBN model [44], and further proposed a convolutional DBN algorithm for bearing fault diagnosis, which used the exponential moving average technique to improve the performance of the diagnostic model [45].

Basic Theory of AE and Its Variants
(1) Autoencoder (AE). The autoencoder (AE) is a feedforward neural network with an input layer, a hidden layer, and an output layer. The input layer and the hidden layer form a coding network, and the hidden layer and the output layer form a decoding network. The structure is shown in Figure 8. The basic idea of AE is to transform the high-dimensional input data into a low-dimensional code vector through the nonlinear mapping of the coding network, then reconstruct the input from the code vector through the decoding network, and thereby learn a new data representation [47].
Given a data set $\{x_m\}_{m=1}^{M}$, the encoding network uses the encoding function $f_\theta$ to transform the training sample $x_m$ into a low-dimensional coding vector $h_m$:

$$h_m = f_\theta(x_m) = s_f(W x_m + b) \tag{6}$$

where $s_f$ is the activation function of the coding network; $\theta$ is the parameter set of the coding network, $\theta = \{W, b\}$; $W$ is the weight matrix from the input layer to the hidden layer; and $b$ is the bias term. Then, the encoding vector $h_m$ is reconstructed by the decoding function $g_{\theta'}$ in the decoding network to obtain the reconstructed representation $\hat{x}_m$ of the sample $x_m$:

$$\hat{x}_m = g_{\theta'}(h_m) = s_g(W' h_m + d) \tag{7}$$

where $s_g$ is the activation function of the decoding network; $\theta'$ is the parameter set of the decoding network, $\theta' = \{W', d\}$; $W'$ is the weight matrix from the hidden layer to the output layer; and $d$ is the bias term. The autoencoder minimizes the reconstruction error between $x_m$ and $\hat{x}_m$ by optimizing the parameter set $\{\theta, \theta'\}$:

$$\{\theta, \theta'\}^{*} = \arg\min_{\theta, \theta'} \frac{1}{M} \sum_{m=1}^{M} \lVert x_m - \hat{x}_m \rVert^2 \tag{8}$$
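A minimal Python sketch of the encode, decode, and reconstruction-error steps is given below. It assumes sigmoid activations for both networks and a mean squared reconstruction error, i.e. $h = s_f(Wx + b)$, $\hat{x} = s_g(W'h + d)$, and error $\frac{1}{M}\sum_m \lVert x_m - \hat{x}_m \rVert^2$; the weights and samples are hypothetical, and no training step is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, x, bias):
    """Compute weights @ x + bias for a weight matrix stored as a list of rows."""
    return [sum(row[c] * x[c] for c in range(len(x))) + bias[r]
            for r, row in enumerate(weights)]


def encode(x, W, b):
    """Coding network: h = s_f(W x + b)."""
    return [sigmoid(z) for z in affine(W, x, b)]


def decode(h, W_dec, d):
    """Decoding network: x_hat = s_g(W' h + d)."""
    return [sigmoid(z) for z in affine(W_dec, h, d)]


def reconstruction_error(samples, W, b, W_dec, d):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in samples:
        x_hat = decode(encode(x, W, b), W_dec, d)
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(samples)


# Hypothetical 3-2-3 autoencoder (3 inputs, 2 hidden units).
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
b = [0.0, 0.1]
W_dec = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.1]]
d = [0.0, 0.0, 0.1]
samples = [[0.2, 0.7, 0.5], [0.9, 0.1, 0.4]]

print(reconstruction_error(samples, W, b, W_dec, d))
```

Training would adjust `W`, `b`, `W_dec`, and `d` to reduce this error, for example by gradient descent with backpropagation.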
As shown in Figure 9, after stacking multiple layers of AE, a deep network structure can be formed: the stacked autoencoder (SAE). This structure uses the hidden layer output of the previous AE as the input layer of the next AE. The training of the model includes two stages: pretraining and fine-tuning. The unsupervised layer-by-layer pretraining stage is used to extract fault features in the signal, and the supervised global fine-tuning stage is used to optimize the model's expression of fault features and give it diagnostic capability.
(2) Denoising Autoencoder (DAE). To address the problem that the original signal of rotating machinery contains a lot of noise and nonlinear components, Vincent et al. [48] proposed adding noise with certain statistical characteristics to the sample data when training the autoencoder, so as to obtain more robust features for initializing the deep network structure. The result is the denoising autoencoder (DAE). During training, random noise is first added to the sample $x_m$ according to the distribution $q_D$ to obtain the noisy sample $\tilde{x}_m$:

$$\tilde{x}_m \sim q_D(\tilde{x}_m \mid x_m)$$

Then, the DAE is trained by optimizing an objective function such as (8).

Figure 6: The flow chart of the diagnostic method for ball screw proposed in [29].
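The corruption step can be sketched in Python as follows. The source does not specify the form of $q_D$, so two commonly used choices are assumed here for illustration: masking noise, which zeroes each component with a fixed probability, and additive Gaussian noise. The function names, the sample, and the noise levels are hypothetical.

```python
import random


def mask_corrupt(x, drop_prob, rng):
    """Masking noise: set each component to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def gaussian_corrupt(x, sigma, rng):
    """Additive Gaussian noise with zero mean and standard deviation sigma."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
x = [0.2, 0.8, 0.5, 0.9]          # hypothetical clean sample
x_masked = mask_corrupt(x, 0.5, rng)
x_noisy = gaussian_corrupt(x, 0.1, rng)

print(x_masked)
print(x_noisy)
```

In a denoising autoencoder, the corrupted sample is fed to the encoder while the reconstruction error is still measured against the clean sample `x`, which pushes the network toward noise-robust features.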
(3) Sparse Autoencoder (SAE). A sparse penalty term is added to the loss function of the autoencoder to reduce the probability that the network simply copies the input information to the hidden layer during training. The sparse features learned by this method can better express the input data, and better hidden layer features can be obtained when the number of model neurons is large. The Kullback-Leibler (KL) divergence is usually chosen to determine the penalty term, and the penalty term $PN$ can be expressed as follows:

$$PN = \sum_{j=1}^{S} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j), \qquad \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \ln \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \ln \frac{1 - \rho}{1 - \hat{\rho}_j} \tag{9}$$

In the formula, $S$ is the number of neurons in the hidden layer, $\rho$ is the target sparsity, $\hat{\rho}_j$ is the average activation of hidden neuron $j$, and $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the KL divergence. The general cost function of the neural network is expressed as

$$J(W, b) = \frac{1}{M} \sum_{m=1}^{M} \lVert x_m - \hat{x}_m \rVert^2 \tag{10}$$

After adding the sparse penalty term, it can be expressed as

$$J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{S} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \tag{11}$$

In the formula, $\beta$ is the weight of the sparse penalty term. The parameters $W$ and $b$ obtained in the encoding process are also the parameters of the sparse cost function $J_{\mathrm{sparse}}$. Therefore, the optimal parameters $W$ and $b$ can be obtained by minimizing the sparse cost function, and this process can be carried out with the backpropagation algorithm.
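The sparsity penalty can be illustrated with a few lines of Python. The sketch assumes the usual Bernoulli form of the KL divergence, $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \ln(\rho / \hat{\rho}_j) + (1 - \rho) \ln((1 - \rho) / (1 - \hat{\rho}_j))$, where $\rho$ is the target sparsity and $\hat{\rho}_j$ is the mean activation of hidden neuron $j$; the numeric values are hypothetical.

```python
import math


def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparsity_penalty(rho, mean_activations):
    """Penalty term: sum of KL divergences over the hidden neurons."""
    return sum(kl_bernoulli(rho, r) for r in mean_activations)


def sparse_cost(base_cost, beta, rho, mean_activations):
    """Sparse cost: base cost plus beta times the sparsity penalty."""
    return base_cost + beta * sparsity_penalty(rho, mean_activations)


# Hypothetical values: target sparsity 0.05 and three hidden neurons.
rho = 0.05
mean_activations = [0.05, 0.20, 0.60]
print(sparsity_penalty(rho, mean_activations))
print(sparse_cost(0.8, 3.0, rho, mean_activations))
```

The penalty is zero when every mean activation equals the target and grows as the activations move away from it, so neurons that are active too often are penalized.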

(4) Convolutional Autoencoder (CAE).
Masci et al. [49] proposed combining the unsupervised learning method of the traditional autoencoder with the convolution and pooling operations of the convolutional neural network to achieve feature extraction, and applying a deconvolution operation to decode the features. The result is the convolutional autoencoder (CAE).
First, suppose there are $k$ convolution kernels, each composed of parameters $w_k$ and $b_k$; then, for an input $x$, the encoding vector $h_k$ can be expressed as

$$h_k = \sigma(x * w_k + b_k) \tag{12}$$

where $*$ denotes the convolution operation and $\sigma$ is the activation function. By performing feature reconstruction on the obtained $h_k$, the following formula can be obtained:

$$y = \sigma\Big(\sum_{k} h_k * \tilde{w}_k + c\Big) \tag{13}$$

where $\tilde{w}_k$ is the flipped version of the kernel $w_k$ and $c$ is the bias of the reconstruction. Considering (13), the input sample and the reconstructed result are compared using the Euclidean distance, and a complete convolutional autoencoder is obtained through optimization with the backpropagation algorithm.

Application of Autoencoder and Its Variants in Fault Diagnosis.
AE models can learn representations from machinery data automatically. Jia et al. constructed a diagnostic model based on stacked autoencoders (SAE), which automatically learned fault features from frequency domain data to diagnose rolling bearings and planetary gearboxes [15]. Liu ... [55] and used the Gaussian wavelet function as the activation function to propose a depth tracking wavelet adaptive encoder (TDWAE) to diagnose the bearings of electric locomotives [56]. The structure of the wavelet autoencoder used in the article is shown in Figure 10. Liu et al. used STFT to process the acoustic signal into a normalized spectrum and input it into a two-layer deep neural network based on SAE for rolling bearing fault diagnosis [57]. Cheng et al. [58] used SAE to extract features in the time domain, frequency domain, and time-frequency domain; an SVM is used for fault classification at the end of the network. Zhou et al. cascaded three SAE modules and used them to classify fault mode, fault type, and fault severity [59]. Because the original signal of rotating machinery contains a lot of noise and nonlinear components [60], an AE may learn similar features during feature extraction, and the learned features have shifting mutation characteristics, which can lead to misclassification of machinery health status. Therefore, researchers applied AE variant structures to the fault diagnosis of rotating machinery.

Figure 10: The structure of the wavelet autoencoder used in [56].

Table 2: Optimizing methods for DBN-based diagnostic models.
References      Optimizing method
[38]            Particle swarm optimization (PSO)
[39][40][41]    Adaptive learning rate combined with Nesterov momentum
[42]            Genetic algorithm (GA)
[43]            Bacterial foraging decision
[44]            Algorithm of adaptive learning rate and momentum
[45]            The exponential moving average technique
Lei et al. used frequency domain signals as input and stacked multiple denoising autoencoders (DAE) to form a DNN for fault diagnosis [61]. Wang et al. [62] proposed a rolling bearing fault diagnosis method based on EMD and the sparse stacked autoencoder (SSAE), as shown in Figure 11. This method uses EMD to obtain the IMF components of the bearing vibration signal and constructs the Hankel matrix to obtain the singular values as the input samples of SSAE.
Lu et al. used stacked denoising AE for bearing fault diagnosis, and the results showed that the diagnosis performance of the proposed method is better than that of traditional SVM, ANN, and other methods [63]. Sun et al. proposed integrating denoising coding into the sparse autoencoder to improve the robustness of feature expression [64]. Wang et al. proposed a continuous sparse autoencoder (CSAE) to identify transformer faults [65]. Shen et al. [66] and Liu et al. [67] constructed fault diagnosis models of rotating machinery using the contractive AE and the convolutional AE, respectively. Zhang et al. used convolutional autoencoders (CAE) to diagnose rolling bearing faults [68]. The results showed that the method can eliminate noise, strengthen fault characteristic signals, and attenuate noncharacteristic impact signals. In [69], Chen et al. proposed an improved ensemble deep autoencoder (IEDAE). First, the loss function of the autoencoder is improved, and three kinds of wavelet convolution autoencoders are designed. Then, five kinds of autoencoders, such as the discriminative autoencoder and the wavelet convolution autoencoder, are employed to construct the corresponding deep autoencoders, and a "cross-layer" connection is designed to alleviate the vanishing gradient of the deep network. Finally, the recognition result is given using the weighted averaging method to ensure an accurate and stable diagnosis result. Wu et al. modified the mean square error (MSE) commonly used in unsupervised autoencoders and proposed a semisupervised fault diagnosis method called the hybrid classification autoencoder [70]. This method can use both labeled and unlabeled data to train the model. The architecture of the proposed method is shown in Figure 12.
In order to improve the performance of the fault diagnosis model, researchers have optimized AE-based diagnosis models using different optimization algorithms, as shown in Table 3. In [71,72], a batch normalization layer was added to SAE to solve the problem of internal covariate shift in multilayer network training and speed up convergence. The results show that this method can achieve higher accuracy than the original SAE method. In [73], Saufi et al. proposed a method that combines differential evolution and a resilient backpropagation approach to improve the performance of SSAE networks in bearing fault classification. Hou et al. proposed a rolling bearing fault identification model based on a stacked denoising autoencoder optimized by particle swarm optimization (PSO-SDAE). The model can obtain a robust and deep-level representation of the bearing fault state characteristics [74]. Chen et al. used SAE to diagnose diesel engine faults [75]. The authors used the harmony search (HS) algorithm to optimize hyperparameters, adaptively adjust the network structure of SAE, and improve the feature extraction ability of the network. In [76], Wang et al. proposed a deep neural network based on a kernel function and the denoising autoencoder (DAE). Then, the chaotic firefly algorithm is used to optimize the kernel parameters and the undetermined parameters in the deep network. Based on AE networks, [77,78] proposed hybrid diagnostic models for motor bearings and wind turbines, respectively. The results show that the proposed models have better generalization and convergence speed than the original AE network.

Basic Theory of CNN.
The convolutional neural network (CNN) is a feedforward neural network proposed by LeCun [79], which has been widely used in computer vision [80,81], speech recognition [82], and other fields [83]. The typical structure of CNN is shown in Figure 13.
The convolutional layer, pooling layer, and fully connected layer are all hidden layers of the convolutional neural network, and the input layer and output layer are its visible layers. CNN learns abstract features by alternately stacking convolutional layers and pooling layers. The convolutional layer convolves multiple local filters with the original input data to generate translation-invariant local features. Each filter uses the same kernel to extract the local features of each local area of the input. The form of the convolutional layer is as follows:

$$x_j^{l} = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \tag{14}$$

where $l$ is the index of the current layer, $f(\cdot)$ is the activation function, $k_{ij}$ is the weight matrix of the convolution kernel, $M_j$ represents the set of selected input features, and $b_j$ is the bias term corresponding to each feature in the convolutional layer.
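The convolutional layer can be sketched for 1D signals in plain Python. The sketch assumes the common form $x_j^{l} = f(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l})$ with $M_j$ taken to be all input channels, ReLU as the activation $f$, and a "valid" sliding dot product (cross-correlation, as most CNN libraries compute it) in place of a flipped-kernel convolution. All names and values are hypothetical.

```python
def conv1d_valid(x, kernel):
    """Slide the kernel over x without padding (cross-correlation form)."""
    k = len(kernel)
    return [sum(x[t + s] * kernel[s] for s in range(k))
            for t in range(len(x) - k + 1)]


def relu(z):
    return max(0.0, z)


def conv_layer(inputs, kernels, biases):
    """One convolutional layer.

    inputs:  list of input feature maps (channels).
    kernels: kernels[i][j] is the kernel linking input map i to output map j.
    biases:  one bias per output map.
    """
    outputs = []
    for j in range(len(biases)):
        per_input = [conv1d_valid(inputs[i], kernels[i][j])
                     for i in range(len(inputs))]
        summed = [sum(vals) + biases[j] for vals in zip(*per_input)]
        outputs.append([relu(v) for v in summed])
    return outputs


# Hypothetical example: one input channel, two output feature maps.
signal = [[1.0, 2.0, 3.0, 4.0]]
kernels = [[[1.0, 1.0], [1.0, -1.0]]]
biases = [0.5, 0.0]
print(conv_layer(signal, kernels, biases))
```

The same kernel is applied at every position of the input, which is the weight sharing that keeps the number of trainable parameters small.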
Pooling is a downsampling operation that reduces the spatial size of features, the length of feature maps, and the number of model parameters. The pooling layer extracts fixed-length features over a sliding window according to a rule. Commonly used pooling operations include maximum pooling and average pooling [84]. The fully connected layer sequentially expands all the finally obtained feature maps into a feature vector, which is fully connected with the output layer. The fully connected layer is located at the end of the CNN and is used to calculate the output of the whole network. CNN is a feature learning method with multilayer processing units, which can convert the data of the input layer into more easily recognizable features layer by layer. CNN uses local connections and weight sharing to reduce the complexity and computational cost of the network. Local connection effectively reduces the number of weight parameters. Weight sharing means that the weights connected by the same convolution kernel are the same, which reduces the number of training parameters, improves the convergence rate, and can effectively suppress overfitting.

Figure 12: The architecture of the proposed hybrid classification autoencoder in [70].

Table 3: Optimizing methods for AE-based diagnostic models.
References    Optimizing method
[71,72]       Batch normalization (BN)
[73]          Differential evolution/resilient backpropagation approach
[74]          Particle swarm optimization (PSO)
[75]          Harmony search (HS)
[76]          Chaotic firefly algorithm (CFA)
[77,78]       Hybrid diagnostic model
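The pooling operation and the effect of weight sharing on the parameter count can be illustrated with a short Python sketch. The window size, stride, and layer sizes are hypothetical, and the parameter counts assume one bias per output unit or output channel.

```python
def pool1d(x, window, stride, mode="max"):
    """Max or average pooling over a sliding window on a 1D feature map."""
    pooled = []
    for start in range(0, len(x) - window + 1, stride):
        segment = x[start:start + window]
        pooled.append(max(segment) if mode == "max" else sum(segment) / window)
    return pooled


def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel_size, in_channels, out_channels):
    """Trainable parameters of a 1D convolutional layer with shared weights."""
    return kernel_size * in_channels * out_channels + out_channels


feature_map = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
print(pool1d(feature_map, window=2, stride=2, mode="max"))
print(pool1d(feature_map, window=2, stride=2, mode="avg"))

# Hypothetical comparison for a 1024-sample input and 8 output maps.
print(dense_params(1024, 8 * 1021), conv_params(4, 1, 8))
```

In this hypothetical comparison, the convolutional layer needs 40 parameters regardless of the input length, whereas a fully connected layer producing the same number of outputs needs millions.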
The training process of the convolutional neural network includes two stages: forward propagation and backpropagation. Forward propagation is to input samples into the network, initialize the network parameters, and finally get the output. Backpropagation adjusts network parameters by minimizing the error cost function until the network converges or the specified iteration termination condition is reached.

Application of CNN in Fault Diagnosis.
CNN can efficiently extract the feature information contained in massive data, which makes it very suitable for processing large quantities of data. Since the vibration signal is a 1D signal, researchers often use 1D CNN to extract features from vibration signals, and they have conducted health detection and fault diagnosis classification for a variety of equipment, such as rolling bearings [85], motors [86], planetary gearboxes [87], and fixed gearboxes [88]. Among these references, it is worth noting that Han et al. proposed an enhanced convolutional neural network (ECNN) that expands the receptive field [87]. This method uses a 1D convolutional layer to initially enlarge the receptive field and capture the fault information in adjacent point groups of the vibration signal, then builds multiple fused dilated convolutional layers to further expand the receptive field and fully capture the long-distance dependence of the original signal, and directly inputs the original vibration signal into the developed network for training. The structural framework of the method is shown in Figure 14.
Reference [89] uses the motor current signal as input, combined with an improved one-dimensional convolutional neural network, to achieve real-time monitoring of motor faults. Janssens et al. use the original frequency domain data as the input of a 2D-CNN model. The model consists of a single convolutional layer combined with a fully connected layer and completes the fault diagnosis of bearings under four types of rotating machinery conditions [90]. She et al. [91] input multichannel signals into a multichannel 1D CNN for fault diagnosis of rolling bearings. The network structure is shown in Figure 15. Reference [92] uses ensemble empirical mode decomposition (EEMD) to decompose the original signal into intrinsic mode functions (IMF), which are selected with the combined mode function (CMF) algorithm and used as the input of CNN. The input data of CNN is usually 2D data. Therefore, some scholars use other methods to decompose or reconstruct the input vibration signal so that it suits a CNN-based diagnosis model for intelligent fault diagnosis of the equipment [93-96]. As shown in Figure 16, some researchers use the short-time Fourier transform (STFT), the Hilbert-Huang transform (HHT) [97], the continuous wavelet transform (CWT) [98], the synchrosqueezing transform (SST) [99], and other methods to convert 1D vibration signals into 2D time-frequency images as the input of CNN. Other researchers used gray images [100], wavelet packet energy diagrams [101], infrared thermal images [102-104], root mean square diagrams [105], feature statistics diagrams [106], and other images as the input of CNN for fault diagnosis of rotating machinery. Wang et al. used the STFT to obtain the time-frequency map of the vibration signal and then adaptively extracted the time-frequency map features through CNN [107]. The effects of preprocessing methods and hyperparameters on the diagnostic accuracy of the network were also studied.
The results show that batch size is the main factor affecting training accuracy and efficiency. Xiao et al. converted the vibration signal into a 2D gray image to extract image features. Then, a feedforward denoising convolutional neural network is used for noise reduction, and the CNN gradient descent algorithm is optimized with a parameter-adaptive learning rate [...] [112]. The model has a wide first-layer kernel and small convolution kernels. The results show that the proposed model also has high diagnostic accuracy in noisy environments. Reference [113] proposes an end-to-end convolutional neural network method.
This method takes the original time signal as input and does not require any denoising or batch normalization preprocessing. It has high diagnostic accuracy in noisy environments and when the workload changes. The model structure is shown in Figure 17. They used a large-size kernel function and a nonlinear function to filter out noise [114]. Dong et al. proposed a bearing fault diagnosis method based on multilayer noise reduction technology and an improved convolutional neural network (ICNN) [115]. This method introduces an attention mechanism in the feature extraction layer of CNN, which improves the ability to extract nonsensitive features. Ye et al. proposed a multichannel weighted convolutional neural network (MCW-CNN) for feature learning and fault diagnosis of gearbox vibration signals [116]. Guo et al. used a hierarchical deep convolutional neural network to extract fault features and perform fault diagnosis on rolling bearings. The network contains two CNN modules to identify faults and determine their severity [117]. Gong et al. [118] proposed an improved CNN-SVM method for motor bearing fault diagnosis. This method uses a 1 × 1 transitional convolutional layer and a global mean pooling layer to replace the fully connected layer structure of the traditional CNN and thus reduce the number of training parameters. As in [119], SVM is used instead of the Softmax classifier to classify fault features and further improve the accuracy of diagnosis. To prevent part of the effective information from being filtered out by the pooling operation, Li et al. proposed a deep CNN that omits the pooling layer to predict the remaining life of multivariable equipment [120].
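The conversion of a 1D vibration signal into a 2D time-frequency image by the STFT, mentioned above as a common CNN preprocessing step, can be sketched without external libraries as follows. The frame length, hop size, Hann window, sampling rate, and test tone are all assumptions for illustration; practical work would normally rely on an FFT routine from a numerical library.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Return a 2D list: rows are time frames, columns are frequency bins."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff))
        image.append(row)
    return image


# Synthetic 8 Hz tone sampled at 64 Hz (hypothetical stand-in for a vibration signal).
fs = 64
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(256)]
image = stft_magnitude(signal, frame_len=32, hop=16)

# Frequency resolution is fs / frame_len = 2 Hz, so the 8 Hz tone falls in bin 4.
peak_bins = [row.index(max(row)) for row in image]
print(len(image), len(image[0]), peak_bins[0])
```

The resulting matrix can be treated as a single-channel image and passed to a 2D CNN; for a real fault signal, the energy would spread over several bins and vary from frame to frame.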
As the number of CNN layers increases, gradients may vanish or explode during training. Therefore, researchers added a residual unit to the CNN architecture and proposed the ResNet structure to address the problem that parameters such as weights and biases in deep CNN architectures are usually not easy to optimize [121,122]. Under complex conditions such as variable speed or variable load, fault diagnosis models of rotating machinery based on the ResNet structure can achieve higher diagnostic accuracy and generalization performance [123-126]. Reference [125] proposed a method based on time-frequency analysis and a deep residual network to diagnose planetary gearbox faults.
[...] fault diagnosis of planetary gearbox under severe noise environment is carried out [127,128].

Basic Theory of RNN.
Recurrent neural network (RNN) is a neural network that includes feedforward connections and internal feedback connections. RNN is often used to deal with sequence problems. RNN adds self-connected neurons in the hidden layer to form an internal memory, which retains historical information and feeds the network state back into the computation. This special structure allows the RNN to retain the state information of the hidden layer at the previous moment, which gives it a powerful advantage in modeling complex dynamic systems.
The model structure is shown in Figure 18.
In the figure, $x_t$ is the input unit at time $t$, $h_t$ is the hidden state at time $t$, and $o_t$ is the output of the network at time $t$. $U$, $V$, and $W$ represent the connection weights between layers. After the repetitive structure in the network is unrolled, network parameters such as the weight matrices and bias terms are shared across time steps. The calculation formula at time $t$ is as follows:
$$h_t = f(U x_t + W h_{t-1} + b_h)$$
In the formula, $b_h$ is the bias vector of the hidden layer and $f(\cdot)$ is the activation function. The value of each hidden layer of the RNN is determined by the input at the current moment and the value of the hidden layer at the previous moment. According to the feedback path, RNN can be divided into Jordan-type and Elman-type networks. Unrolled in time, the traditional RNN is equivalent to a multilayer feedforward neural network. As the length of the time series increases, the number of network layers increases and the amount of calculation grows significantly. When dealing with long-term monitoring sequences, large prediction deviations will occur, and gradients may vanish or explode [129]. Researchers have developed improved models such as long short-term memory (LSTM) and gated recurrent units (GRU) based on the standard RNN to overcome these shortcomings.
Table 4: Summary of fault diagnosis based on optimization algorithm and CNN mode.
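The recurrence described above, in which each hidden state depends on the current input and the previous hidden state, can be sketched as a forward pass of an Elman-type RNN. The roles assigned to $U$ (input to hidden), $W$ (hidden to hidden), and $V$ (hidden to output), the tanh activation, the linear output without bias, and all numeric values are assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(xs, U, W, V, b_h, h0):
    """Run an Elman-type RNN over the sequence xs with shared parameters."""
    h = h0
    hidden_states, outputs = [], []
    for x_t in xs:
        # h_t = tanh(U x_t + W h_{t-1} + b_h); the same U, W, b_h at every step.
        pre = [a + b + c for a, b, c in zip(matvec(U, x_t), matvec(W, h), b_h)]
        h = [math.tanh(p) for p in pre]
        hidden_states.append(h)
        outputs.append(matvec(V, h))  # assumed linear readout o_t = V h_t
    return hidden_states, outputs


# Hypothetical parameters: 1 input feature, 2 hidden units, 1 output.
U = [[0.5], [-0.5]]
W = [[0.1, 0.2], [0.0, 0.3]]
V = [[1.0, -1.0]]
b_h = [0.0, 0.0]

# An impulse followed by zeros: later hidden states are driven only by memory.
xs = [[1.0], [0.0], [0.0]]
hidden_states, outputs = rnn_forward(xs, U, W, V, b_h, h0=[0.0, 0.0])
print(hidden_states[-1], outputs[-1])
```

Although the input is zero after the first step, the later hidden states remain nonzero, which shows how the feedback connection carries information forward; it also shows why the influence of early inputs fades over long sequences.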
Long short-term memory (LSTM) is the most representative variant of RNN. LSTM replaces the neurons in the hidden layer with a memory unit containing a gate structure, which can add information to or remove information from the cell state and allows information to pass through selectively.
As shown in Figure 19, the LSTM memory cell structure includes an input gate, a forget gate, an output gate, and an input modulation gate. The input gate, forget gate, and output gate use the Sigmoid function to control how far each gate opens, and the input modulation gate uses the Tanh function to form the candidate information [130]. The output of the forget gate is multiplied by the cell state at the previous moment, so the forget gate controls how much past information is forgotten. The output of the input gate is multiplied by the output of the input modulation gate, so the input gate controls how much new information enters the cell. The sum of these two products is the cell state of the LSTM memory unit at the current moment, and the product of the output gate and the cell state processed by the Tanh layer is passed as input information to the next unit.
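The gate interactions described above can be sketched for a single LSTM unit with scalar input and state. This follows the commonly used LSTM formulation; the parameter layout, the gate names used as dictionary keys, and the numeric values are assumptions for illustration and are not taken from [130].

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x_t + w_h * h_prev + b)

    i_t = gate("input", sigmoid)         # how much new information enters
    f_t = gate("forget", sigmoid)        # how much of the old cell state is kept
    o_t = gate("output", sigmoid)        # how much of the cell state is exposed
    g_t = gate("modulation", math.tanh)  # candidate information

    c_t = f_t * c_prev + i_t * g_t       # new cell state
    h_t = o_t * math.tanh(c_t)           # output passed to the next step
    return h_t, c_t


# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 0.0),
    "modulation": (1.0, 0.0, 0.0),
}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.8, params=params)
print(h_t, c_t)
```

With these settings the cell state is carried over almost unchanged, which is the mechanism that lets an LSTM preserve information over long sequences where a plain RNN would lose it.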

Application of RNN in Fault Diagnosis.
RNN can retain the state information of the hidden layer at the previous moment, which overcomes the limitations of simple neural networks. It is mainly used to process sequential data or degradation data. Jiang et al. proposed an adaptive RNN for intelligent fault diagnosis of bearings and used an adaptive learning algorithm to further improve the performance of the diagnostic model [131]. Reference [132] uses RNN to model the operating behavior of wind power generation systems. Fault identification is achieved by comparing the residuals between the real system output and the model output. Reference [133] uses RNN to solve the robustness problem of actuator fault diagnosis in dynamic nonlinear systems. Helmes trains an RNN-based model through the backpropagation through time (BPTT) algorithm and the extended Kalman filter method to predict the remaining life [134]. Lin et al. used a recursive fuzzy neural network model to perform fault-tolerant control of the permanent magnet synchronous motor position servo drive [135].
It is difficult for traditional RNN methods to analyze and process multidimensional data, and large prediction errors occur when processing long-term sequences. Therefore, researchers have improved the RNN method. Wu et al. used LSTM to predict the remaining life of engineering equipment and used dropout to improve the generalization ability of LSTM [136]. The results showed that the proposed method predicts better than the traditional RNN model. Reference [137] uses three models, traditional RNN, LSTM, and GRU, to diagnose and predict aircraft engine faults. The results show that LSTM and GRU perform better than the traditional RNN. Zhang et al. proposed a remaining useful life (RUL) prediction method based on bidirectional LSTM, an architecture that is specialized in discovering the underlying patterns embedded in time series, to track the system degradation and consequently predict the RUL [138]. The model structure is shown in Figure 20.
Zhao et al. proposed a convolutional bidirectional long short-term memory network (CBLSTM) to predict tool wear failure [139]. This method uses CNN to extract local features and uses bidirectional LSTM to encode the temporal information output by CNN. Finally, fully connected layers and a linear regression layer are stacked to predict the target value. Liu et al. established an intelligent prediction model for rolling mill flutter energy value based on an LSTM recurrent neural network and analyzed the influence of different time steps on the prediction effect to obtain the optimal prediction step [140]. Reference [141] uses the weighted feature averaging method and bidirectional GRU to build an enhanced bidirectional GRU network for machine health monitoring.

Basic Theory of GAN.
The monitoring data of rotating machinery under actual working conditions suffers from data imbalance; that is, the amount of normal data far exceeds the amount of fault data, and the numbers of samples of different fault types are not balanced. When imbalanced data is used for fault diagnosis, the classification boundary of the classifier will be biased toward the majority classes. As a result, it is difficult to identify minority samples, and the performance of the classifier will be seriously affected.
Generative adversarial network (GAN) [142] is a feature learning algorithm based on a game scenario. Feature learning is performed through adversarial learning, which can be used to solve the above-mentioned data imbalance problem. The structure of GAN is shown in Figure 21. The generative adversarial network consists of two parts: a generator and a discriminator. The input of the generator is random noise $z$ obeying a certain distribution, and the output is a generated sample $G(z)$ similar to the real sample $x$. The input of the discriminator is the real sample and the generated sample. The function of the discriminator is to distinguish the source of the input sample, that is, to output the probability that the input comes from the real data rather than from the generator. When the discriminator inputs $x$, $D(x)$ is close to 1; when $G(z)$ is input, $D(G(z))$ is close to 0. In the training process, the generator uses the discrimination results of the discriminator to improve its generation ability, making $G(z)$ as similar as possible to the real sample $x$, so that the discriminator cannot distinguish the source of samples. The discriminator optimizes itself through the probability of misjudgment so as to improve its ability to discriminate generated samples. The whole network is optimized through this mutual antagonism; at the end of training, the output of the generator is close to the real sample distribution, and the discriminator cannot distinguish the generated samples. The objective function of GAN training can be expressed as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \tag{16}$$
The discriminator in GAN can be regarded as a kind of classifier that distinguishes the authenticity of samples, and cross entropy is often used to measure the similarity of sample distributions. The formula is as follows: ...
Figure 20: The structure of residual life prediction method based on bidirectional LSTM proposed in [138].
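As a rough numerical illustration of the adversarial objective, the sketch below estimates the standard GAN value function, $\mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$, from batches of discriminator outputs. The function names and the example probabilities are hypothetical; no network is trained here, and the discriminator outputs are simply given as numbers in (0, 1).

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    """The discriminator maximizes V, so it minimizes -V (a cross-entropy loss)."""
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake):
    """The generator minimizes the only term of V that depends on G."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# A discriminator that separates real from generated samples well ...
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ... versus one that cannot tell them apart (outputs 0.5 everywhere).
confused = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, confused)
```

When the discriminator outputs 0.5 for every sample, the value equals $-\log 4$, which is the value reached when the generated distribution matches the real one in the original GAN analysis.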

Application of GAN in Fault Diagnosis.
[...] diagnosis of the planetary gearbox [143]. Liu et al. [144] and Mao et al. [145] used GAN and stacked denoising autoencoder to solve the problem of data imbalance in bearing fault diagnosis. The results show that the fault samples generated by GAN can improve the fault diagnosis accuracy in the case of data imbalance. Sun et al. proposed an adversarial generative oversampling model based on GAN to produce valuable artificial samples for the minority class to balance the data distribution and used it for tool breakage detection [146]. Unlike previous research using GAN, this method uses the discriminator to filter the samples generated by the generator to achieve effective oversampling. The framework structure and network model structure of the proposed method are shown in Figures 22 and 23 [148]. Reference [149] proposed a method based on deep convolutional generative adversarial networks (DCGAN) to accurately detect rolling bearing faults under data imbalance and complex dynamic conditions. Dai et al. combined generative adversarial networks and autoencoders to construct an encoding-decoding-recoding network model to detect anomalies in mechanical systems [150].

Basic Theory of TL.
The current machine learning methods used for fault diagnosis of rotating machinery are mostly based on the assumption that training data and test data are in the same feature space and have the same distribution. However, there is a data imbalance between the health data and fault data of rotating machinery in actual working conditions, and the cost of labeling large-scale data is extremely high, or labeling is even impossible. Therefore, it is difficult to construct a large-scale and well-labeled data set. At the same time, the above assumption often does not hold in actual working conditions, so a diagnostic model trained with the training set will perform poorly on the test set.
Transfer learning (TL) relaxes the assumption that training data and test data must be independent and identically distributed. TL can apply knowledge or patterns learned in a certain field or task to different but related fields and has been widely used in many fields [151,152].
Assume that the sample space of a machine learning task $T$ is $X \times Y$, where $X$ is the input space, $Y$ is the output space, and its probability density function is $p(x, y)$. Suppose $X$ is a subset of the $d$-dimensional real number space and $Y$ is a discrete set. A sample space and its distribution can be called a domain: $D = (X, Y, p(x, y))$. Two domains are considered different if at least one of their input spaces, output spaces, or probability distributions is different. Transfer learning refers to the process of knowledge transfer between two different domains. Features or knowledge structures are transferred from the source domain to complete or improve learning in the target domain, where labeled data are missing or not available. The number of training samples in the source domain is generally much larger than that in the target domain. Different from traditional machine learning methods, as shown in Figure 24, transfer learning frameworks focus on using transferable characteristics or knowledge of the source domain to improve model performance, which can reduce the number of samples required in the target domain.
Deep learning methods can learn the deep representation of data and provide cross-domain invariant features for transfer learning. Transfer learning based on cross-domain invariant features can effectively reduce the difference between the source domain and the target domain, so transfer learning and deep learning methods are usually closely combined. Transfer learning is generally divided into instance-based transfer learning, feature-representation transfer learning, parameter transfer learning, and relational knowledge transfer learning [153].

Application of TL in Fault Diagnosis of Rotating Machinery.
[...] [155] and conveyor roller bearing [156] based on the transfer learning method. Chen et al. proposed an improved LSSVM transfer learning method based on auxiliary data to solve the problem of insufficient bearing data available under different working conditions [157]. Lei et al. proposed a deep transfer diagnosis method for the transfer diagnosis between laboratory bearings and electric locomotive bearings [22].
This method extracts transfer fault characteristics from the monitoring data of different devices by constructing a domain-sharing deep residual network and then imposes domain adaptation regularization constraints during the training process to form a deep transfer diagnosis model. Lu et al. used stacked autoencoders (SAE) to extract fault features with similar distributions to perform transfer learning fault diagnosis on motor bearings and gears [158]. For motor bearings under different operating conditions, Zhang et al. [159] and Hasan et al. [160] used samples from the target operating conditions to fine-tune the pretrained diagnostic model. Compared with a diagnostic model trained only on a small number of target domain samples, the fine-tuned diagnostic model has faster convergence speed and higher diagnostic accuracy. The schematic diagram of the method is shown in Figure 25. [...] first obtains the initialization parameters of the target model by pretraining with sufficient source domain data and then uses a sample of the target domain to fine-tune the target model to adapt to the remaining samples. Cao et al. [162] and Shao et al. [163] converted vibration data into 2D time-frequency images and then performed parameter transfer on a pretrained image recognition model to obtain the pretrained network. Finally, the fine-tuned network is used for feature extraction and fault classification of time-frequency images. Reference [164] used the deep network model trained in the source domain to complete information transfer in the target domain by fine-tuning parameters, and it constructed a RUL prediction model with good feature representation.
Another approach based on transfer learning realizes fault diagnosis by reducing the difference in the distribution of sample features between the source domain and the target domain. Wen et al. [165] proposed an SAE-based domain adaptation method to diagnose bearings under different working conditions and used the maximum mean discrepancy (MMD) term to measure the difference between domains. Li et al. [166] and Zhang et al. [167] used MMD as a loss function to measure the difference in the distribution of data in the two domains and constructed a CNN-based transfer diagnosis model. As shown in Figure 26, Yang et al. [20] used a CNN-based transfer diagnosis model to transfer the diagnosis knowledge of laboratory motor bearings to the diagnosis of locomotive bearings. Multilayer MMD and its improved variants [168,169] were added to the diagnosis model to improve its transfer performance and robustness.
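A minimal sketch of how MMD can quantify the gap between source-domain and target-domain features is given below. It uses the biased empirical estimate of the squared MMD with a Gaussian kernel; the kernel bandwidth and the toy feature vectors are assumptions, and the cited works may use different kernels, multiple bandwidths, or multilayer variants.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets."""
    def mean_kernel(A, B):
        total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
        return total / (len(A) * len(B))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical 2D features: the target domain is the source shifted by +3.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_shifted = [[x + 3.0, y + 3.0] for x, y in source]

same_domain = mmd_squared(source, source)
different_domain = mmd_squared(source, target_shifted)
print(same_domain, different_domain)
```

In a transfer diagnosis model, a term of this kind is typically added to the classification loss so that training pulls the feature distributions of the two domains together.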
Qian et al. [170] and Zhang et al. [112] used the adaptive batch normalization (AdaBN) operation in CNN-based bearing transfer diagnosis models to improve diagnosis performance under different working conditions. Chen et al. used transfer component analysis (TCA) to reduce the distribution difference of rolling bearing monitoring data under different operating conditions and extract transfer features [171]. Han et al. [172] proposed a deep transfer intelligent fault diagnosis framework that extends marginal distribution adaptation (MDA) to joint distribution adaptation (JDA). As shown in Figure 27, the framework uses a discriminant structure related to the labeled source domain data to adapt to the conditional distribution of unlabeled target data, thereby ensuring more accurate distribution matching. Mao et al. [173] used TCA and SVM classifiers to study gearbox transfer fault diagnosis methods and RUL predictions.
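The idea behind AdaBN can be sketched in a few lines: the learned weights are left untouched, and the batch normalization statistics are recomputed on target-domain data. The sketch below applies this to one layer of features and omits the learned scale and shift for brevity; the feature values are hypothetical.

```python
import math


def batch_stats(features):
    """Per-feature mean and variance over a batch of feature vectors."""
    n = len(features)
    dims = range(len(features[0]))
    mean = [sum(row[d] for row in features) / n for d in dims]
    var = [sum((row[d] - mean[d]) ** 2 for row in features) / n for d in dims]
    return mean, var


def normalize(features, mean, var, eps=1e-5):
    """Batch-normalize features with the given statistics."""
    return [[(row[d] - mean[d]) / math.sqrt(var[d] + eps)
             for d in range(len(row))] for row in features]


def adabn(target_features, eps=1e-5):
    """AdaBN-style normalization: use target-domain statistics at test time."""
    mean, var = batch_stats(target_features)
    return normalize(target_features, mean, var, eps)


# Source features are centered at 0; the target operating condition shifts them to 5.
source = [[-1.0], [0.0], [1.0]]
target = [[4.0], [5.0], [6.0]]

src_mean, src_var = batch_stats(source)
with_source_stats = normalize(target, src_mean, src_var)
with_target_stats = adabn(target)

mean_before = sum(row[0] for row in with_source_stats) / len(target)
mean_after = sum(row[0] for row in with_target_stats) / len(target)
print(mean_before, mean_after)
```

Normalizing with source statistics leaves the target features far from zero mean, while the target statistics re-center them, so the later layers see inputs in the range they were trained on.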

Performance Evaluation Index of Rotating Machinery Fault Diagnosis Method Based on Deep Learning
After a researcher proposes a specific research method, it is usually necessary to conduct a rigorous evaluation, that is, to evaluate the performance of the method against a suitable evaluation standard and judge the validity and practicability of the method model. However, in the evaluation process, factors such as the selection of evaluation indicators, data types, and model application scenarios will affect the measured performance of the method model. Therefore, the selection criteria of evaluation indicators differ between application fields. Some scholars have analyzed various performance indicators and classification methods, systematically organized commonly used performance indicators and error measures, and summarized their basic principles, application suggestions, and limitations [174,175].
This section provides an overview of the performance evaluation indicators that are widely used in the field of rotating machinery fault diagnosis. However, in specific research, researchers still need to select evaluation indicators according to the actual situation in order to correctly evaluate the performance of the method.
Figure 25: Illustrations of the fine-tuned diagnostic model based on the TL method proposed in [160].

Classification Task Performance Evaluation Index.
Related research in the field of fault diagnosis of rotating machinery mainly focuses on the motors, bearings, gearboxes, and other components in mechanical equipment. Most of these studies are classification tasks. Taking the fault diagnosis of rolling bearings as an example, the diagnosis target is the fault type of the bearing (normal bearing, bearing inner ring fault, bearing outer ring fault, bearing rolling element fault). Performance evaluation indicators widely used in these classification tasks include the following:
(1) Accuracy (ACC). ACC is defined as the ratio of the number of correct predictions to the total number of samples. Its expression is
$$\mathrm{ACC} = \frac{TP + TN}{P + N},$$
where $P$ denotes the number of real positives, $N$ denotes the number of real negatives, $TP$ denotes true positives, $TN$ denotes true negatives, $FP$ denotes false positives, and $FN$ denotes false negatives.
[...]
(6) F1-Score. F1-score is defined as the harmonic mean of the precision and recall. Its expression is
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
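As an illustration of the classification indicators, the following sketch computes accuracy, precision, recall, and F1-score from the four counts TP, TN, FP, and FN. The function name and the example counts are hypothetical, and precision and recall follow their usual definitions, TP/(TP + FP) and TP/(TP + FN).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical result of a binary "faulty versus normal" bearing classifier.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

On imbalanced fault data, accuracy alone can look good while recall on the fault class is poor, which is one reason why several indicators are usually reported together.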

Predictive Task Performance Evaluation Index.
In addition to the classification of fault types, the research field of fault diagnosis of rotating machinery also includes another important research direction, that is, the prediction of the remaining life of the equipment. The remaining life prediction of rotating machinery is a prediction task. Given a sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y_i$ is the true label of sample $x_i$, evaluating the performance of a method model $f$ requires comparing the model prediction result $f(x)$ with the real label $y$. The most commonly used performance evaluation indicators for this type of task include the following:
(1) Error (E). E is defined as the amount by which a prediction differs from the actual value. Its expression is
$$E_i = A_i - P_i,$$
where $A$ is actual measurements and $P$ is predictions.
(2) Mean Error (ME). ME is defined as the average of all errors in a set. Its expression is
$$\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} E_i.$$
(3) Mean Absolute Error (MAE). MAE measures the average magnitude of the difference between two continuous variables. Its expression is
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |E_i|.$$
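The three error measures listed above can be computed as in the sketch below. The sign convention (actual minus predicted) is an assumption, and the remaining-life values are hypothetical.

```python
def errors(actual, predicted):
    """Per-sample error, assuming the convention actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_error(actual, predicted):
    """Average signed error; positive and negative errors cancel out."""
    e = errors(actual, predicted)
    return sum(e) / len(e)


def mean_absolute_error(actual, predicted):
    """Average error magnitude; errors cannot cancel out."""
    e = errors(actual, predicted)
    return sum(abs(v) for v in e) / len(e)


# Hypothetical remaining-life values (in operating cycles) and model predictions.
actual = [100, 80, 60, 40]
predicted = [90, 85, 60, 50]

e = errors(actual, predicted)
me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(e, me, mae)
```

The mean error is small here because overestimates and underestimates partly cancel, while the mean absolute error reflects the typical size of a miss; reporting both helps distinguish bias from scatter.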

Other Performance Evaluation Methods.
In addition to the above-mentioned performance evaluation indicators, there are also some widely used methods that can intuitively evaluate the performance of the method model. These methods include the following.

Confusion Matrix.
The confusion matrix contains the statistics of actual and predicted classifications. Each column in the matrix represents a predicted class, and each row represents an actual class. The confusion matrix is of great significance for understanding the accuracy of the model's classification [174]. As shown in Figure 28, the classification results and misjudgments for each type of sample can be read from the confusion matrix [176].
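A confusion matrix with rows for actual classes and columns for predicted classes can be built as in the following sketch. The class labels and the predictions are hypothetical and only mimic a four-class bearing diagnosis task.

```python
def confusion_matrix(actual, predicted, labels):
    """Count samples per (actual class, predicted class) pair.

    Rows follow the actual class and columns the predicted class,
    both in the order given by labels.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["normal", "inner", "outer", "ball"]
actual = ["normal", "normal", "inner", "inner", "outer", "outer", "ball", "ball"]
predicted = ["normal", "normal", "inner", "outer", "outer", "outer", "ball", "inner"]

matrix = confusion_matrix(actual, predicted, labels)
# Diagonal entries are correct predictions, so accuracy is trace / total.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
print(accuracy)
```

Off-diagonal entries show which fault types are confused with each other, for example an inner ring fault predicted as an outer ring fault in the second row.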

t-SNE Visualization.
t-SNE is a nonlinear dimensionality reduction algorithm. When researching the classification of high-dimensional data, t-SNE can project the data into a 2D or 3D space for observation and judge the separability of the data (small intervals between similar types and large intervals between heterogeneous types). It is also possible to visually analyze the features extracted from each layer of the deep learning model through t-SNE. As shown in Figure 29, the classification performance of the model can be effectively evaluated through the observed feature distribution and the degree of clustering. In addition to the above methods, the performance of the method model can be also evaluated to a certain extent through methods such as the training/validation accuracy curve of the method model, the training/validation loss curve, and the algorithm running time. However, the hardware equipment and operating environment used by different research methods are different, so these methods have limitations. Depending on the application field of the research method and the type of data, the selection criteria of the evaluation indicators will also change. When selecting performance indicators, it is necessary to select multiple indicators or a combination of multiple indicators according to different research methods to properly evaluate the performance of the method.

Existing Problems and Future Development Trends
Rotating machinery fault diagnosis methods based on deep learning can not only quickly and effectively extract characteristic signals reflecting equipment operating conditions but also establish a nonlinear relationship between equipment monitoring data and equipment operating conditions and then accurately identify the type and degree of equipment failure. However, there are still certain problems and challenges that need to be further studied.
Regarding fault diagnosis when multiple fault forms exist on the same component at the same time, existing studies have mostly neglected the simultaneous occurrence of multiple failure modes in engineering applications. For example, a bearing failure may be caused by several failure forms at once, such as corrosion of the inner ring and cracks on the surface of the outer ring. Therefore, fault diagnosis methods for components with multiple simultaneous fault forms are worthy of further study.
In respect of fault diagnosis when multiple component faults are coupled with each other, rotating machinery in engineering applications runs for a long time under heavy load, and several of its components often fail in different ways at the same time. These components usually affect each other and jointly determine the operating conditions of the equipment. Most of the existing research focuses on a single part of the equipment, such as a bearing or gearbox, and conducts its fault diagnosis research separately. Therefore, equipment fault diagnosis methods for multiple failed components that are coupled to each other need to be studied in depth.
Concerning methods to improve the quality of operating data for rotating machinery, rotating machinery data in engineering applications is characterized by large data volume, multiple signal sources, different sampling forms, and susceptibility to interference by random factors. It is difficult to establish appropriate evaluation criteria to quantitatively describe the completeness, accuracy, and timeliness of the data. Therefore, it is necessary to study intelligent data cleaning algorithms or other methods to improve data quality and increase data availability.
With regard to extending the application field of fault diagnosis based on deep learning, current fault diagnosis mostly focuses on key components of rotating machinery such as bearings or gearboxes. However, there are a large number of other types of mechanical equipment in engineering applications, such as vibrating machinery or unfixed machinery. The characteristic signals of this type of mechanical equipment are different from those of rotating machinery. Therefore, it is necessary to study the application of deep learning-based fault diagnosis methods to other types of mechanical equipment such as vibrating machinery and unfixed machinery.
As for the robustness and real-time performance of fault diagnosis methods based on deep learning, the actual working conditions in which a diagnosis model is used are not necessarily the same. For example, a model that performs well in the diagnosis of fan bearings is not necessarily suitable for the diagnosis of coal mine belt conveyors. Therefore, it is necessary to study more robust diagnosis methods. At the same time, the diagnosis model and related algorithms need to be updated according to in-service performance to deal with new situations.

Conclusion
The fault diagnosis research of rotating machinery is a key link to ensure the normal operation of rotating machinery and equipment, reduce unplanned maintenance, and ensure the safety of industrial production, which is of great significance. With the rapid development of intelligent algorithms and hardware equipment, the research on fault diagnosis of rotating machinery based on deep learning has received attention. This article reviews the research on fault diagnosis of rotating machinery based on deep learning, and the conclusions and contributions obtained are as follows:
(1) The development process of fault diagnosis research in the field of rotating machinery is summarized, the advantages and disadvantages of the methods at each stage are explained, and the fault diagnosis research based on deep learning methods is pointed out as one of the future research trends.
(2) This article discusses deep belief network (DBN), autoencoder (AE) and its variants, convolutional neural network (CNN), recurrent neural network (RNN), generative adversarial network (GAN), and transfer learning (TL) from both aspects of basic theory and specific applications. It provides a certain reference and convenience for researchers in this field to carry out follow-up research work.
(3) This article provides an overview of some commonly used evaluation indicators used to evaluate the performance of diagnostic methods. From the perspective of engineering application, the problems existing in the study of fault diagnosis of mechanical equipment with deep learning methods are analyzed, and the future research and development trends are prospected. This work provides some inspiration for researchers in this field and helps to promote the development of this field.
In future work, researchers should combine theory and practice to increase the size of the available data set as the data basis for future research, improve the accuracy and speed of the diagnosis algorithm as much as possible, and combine this with the equipment in actual working conditions to improve the diagnosis and the usability of the algorithm in the industrial field [177].

Data Availability
No data were used to support this study.

Shock and Vibration