Deep Adaptive Adversarial Network-Based Method for Mechanical Fault Diagnosis under Different Working Conditions



Introduction
Bearings and gears are widely used transmission parts in rotating machinery, and their failure directly affects the healthy operation of the machinery and can even cause serious incidents. Therefore, monitoring and diagnosing the health condition of these transmission parts is crucial [1,2]. In recent years, Internet of Things (IoT) based infrastructure has often been adopted for condition monitoring and analysis because it can directly handle massive monitoring data with minimal manual intervention [3,4]. Lei et al. [5] developed an intelligent method based on sparse filtering for bearing fault diagnosis. Jia et al. [6] presented a stacked autoencoder (SAE) based network to diagnose faults of bearings and planetary gearboxes. Xu et al. [7] used a deep convolutional neural network (CNN) to address a bearing fault diagnosis problem under different working conditions. An et al. [8] adopted a recurrent neural network (RNN) to process variable-size sequences of bearing fault samples and achieved satisfactory performance. Xiao et al. [9] proposed a deep mutual information maximization (DMIM) method using a variational divergence estimation approach to maximize the mutual information between the input and output of a deep neural network and achieved motor fault diagnosis. Wang et al. [10] presented a capsule neural network for bearing fault diagnosis and obtained a high classification accuracy. Although these methods have achieved excellent diagnosis performance, they require plenty of labeled data. Besides, the training and testing data must have the same probability distribution. However, obtaining a considerable amount of labeled data is quite hard for some machines, and the probability distribution of the fault samples constantly changes due to variable speeds and loads.
Transfer learning provides a promising way of addressing these problems [11,12]. In recent years, various related methods have been investigated for fault diagnosis.
Wen et al. [13] introduced maximum mean discrepancy (MMD) into SAE and achieved feature transfer learning under variable speeds. Lu et al. [14] developed a transfer learning-based model with domain adaptation for bearing fault diagnosis. Guo et al. [15] presented a deep convolutional transfer learning network (DCTLN) and used six cases to test the effectiveness of DCTLN. Yang et al. [16] offered a domain-shared CNN model to learn the transferable features from bearings used in laboratory machines and real-case machines simultaneously. An et al. [17] proposed a multilayer multiple kernel variant of MMD, which introduced the kernel method to replace the high-dimensional map of MMD and achieved bearing fault diagnosis under different working conditions. Zhang et al. [18] developed a novel sparse filtering based domain adaptation (SFDA) method for mechanical fault diagnosis, which applied the l1-norm and l2-norm to MMD to obtain high-dimensional adaptive features. These studies utilized MMD to minimize the target loss by using the source loss to learn cross-domain-invariant features. However, MMD measures the discrepancy using a high-dimensional map based on a reproducing kernel Hilbert space (RKHS), which cannot guarantee that the features of different domains are brought sufficiently close in the RKHS.
In recent years, adversarial learning, represented by generative adversarial networks (GAN) [19], has drawn widespread attention. Various GAN-based variant networks have remarkably improved the learning effect compared with traditional GANs [20-22]. In the field of fault diagnosis, GAN has also been successfully used for data augmentation to enrich training datasets. Wang et al. [23] utilized GAN to generate synthetic fault signals from frequency spectra to expand the amount of training data and achieve effective fault diagnosis of a gearbox. Mao et al. [24] also used GAN to solve the imbalanced fault diagnosis problem and provided a comprehensive comparative study. Liu et al. [25] trained an autoencoder based on the adversarial training process of GAN to perform bearing fault diagnosis. GAN aims to generate training samples, which differs from the goal of transfer learning. However, since a source domain and a target domain naturally exist in transfer learning, Ganin et al. [26] observed that the process of generating samples can be avoided and the data in one of the domains (usually the target domain) can be directly treated as the generated samples. In this setting, the generator extracts features instead of generating new samples by continuously learning the characteristics of the domain data and making it impossible for the discriminator to distinguish their differences. Thus, the original generator can also be referred to as the feature extractor. On this basis, domain adversarial training of neural networks (DANN) was developed in reference [26]. However, the gradient of DANN is always unstable when training the discriminator. To overcome this limitation, the Wasserstein distance [27] is employed in the discriminator to evaluate the difference between the two distributions. The Wasserstein distance, also called the earth-mover distance, is a metric for measuring the discrepancy between the distributions of the two domains. It can improve the stability of the optimization process, and the Wasserstein distance-based domain adversarial method can directly extract domain-invariant features from the original signal. Furthermore, spectral normalization (SN) [28] is applied to the discriminator to stabilize the training process. SN controls the Lipschitz constant of the discriminator function by strictly restricting the spectral norm of each layer, so that the discriminator does not make drastic adjustments, while the Lipschitz constant is the only hyperparameter. In contrast, other normalization terms impose stronger constraints on the weight matrix than expected, which limits the ability of the discriminator to distinguish the generated distribution from the real one. Therefore, a novel deep adaptive adversarial network (DAAN) is developed in this study. The main contributions can be summarized as follows: (1) a Wasserstein distance-based domain adversarial method for transfer fault diagnosis is proposed; (2) a new discriminator is designed using the SN strategy to stabilize the training process and accelerate convergence. The remainder of this paper is organized as follows. Section 2 describes the transfer learning problem. Section 3 details the proposed DAAN model. Section 4 presents the fault diagnosis experiments under different working conditions. Section 5 finally provides conclusions.

Wasserstein Distance.
The Wasserstein distance, also called the earth-mover distance, is a distance metric for comparing probability measures and distributions. The gradient of the DANN is always unstable when training the feature extractor. To reduce the gradient vanishing problem, the Wasserstein distance is employed in the discriminator D as the distribution measurement function, defined as the minimum cost of transporting p_g onto p_r:

W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \| x - y \| \right],

where \Pi(p_r, p_g) represents the set of all joint distributions \gamma(x, y) whose marginals are p_r and p_g, respectively. Intuitively, \gamma(x, y) can be considered the amount of mass that must be moved from x to y in order to transform p_r into p_g. The Wasserstein distance solves the optimal transportation problem, so W(p_r, p_g) is the minimum transport cost under optimal path planning. Therefore, by the Kantorovich-Rubinstein duality, the improved objective function can be obtained as follows:

W(p_r, p_g) = \sup_{D \in \mathcal{R}} \; \mathbb{E}_{x \sim p_r} [D(x)] - \mathbb{E}_{y \sim p_g} [D(y)],

where \mathcal{R} is the set of 1-Lipschitz functions.
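For intuition, the one-dimensional earth-mover distance between two equal-size empirical samples can be computed exactly by pairing sorted values, which is the optimal 1-D transport plan. The following minimal sketch (illustrative only, not part of the original paper) makes the "minimum transport cost" interpretation concrete:

```python
def empirical_w1(xs, ys):
    """1-D earth-mover (W1) distance between two equal-size empirical samples.

    Sorting both samples and pairing them in order is the optimal 1-D
    transport plan, so the average pairwise move equals W1.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

src = [0.0, 1.0, 3.0]          # "source-domain" feature values
tgt = [5.0, 6.0, 8.0]          # same shape, shifted by +5
print(empirical_w1(src, tgt))  # -> 5.0 (every point moves a distance of 5)
print(empirical_w1(src, src))  # identical distributions cost nothing -> 0.0
```

Shifting a distribution by a constant shifts the W1 distance by exactly that constant, which is why the metric gives useful gradients even when the two distributions do not overlap.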

Spectral Normalization (SN).
SN controls D by constraining the spectral norm of each network layer. Given a linear layer f(h) = Wh, its norm is defined by the Lipschitz constant:

\| f \|_{\mathrm{Lip}} = \sup_h \sigma(\nabla f(h)) = \sigma(W),

where \sigma(W) denotes the spectral norm of W:

\sigma(W) = \max_{h \ne 0} \frac{\| W h \|_2}{\| h \|_2},

which is equal to the largest singular value of W. If the Lipschitz norm of each activation function is equal to 1, then the composition inequality \| g_1 \circ g_2 \|_{\mathrm{Lip}} \le \| g_1 \|_{\mathrm{Lip}} \cdot \| g_2 \|_{\mathrm{Lip}} can be used to bound the Lipschitz norm \| g \|_{\mathrm{Lip}} of the whole discriminator. SN normalizes the spectral norm of W to make sure it satisfies the Lipschitz constraint \sigma(\bar{W}_{\mathrm{SN}}(W)) = 1:

\bar{W}_{\mathrm{SN}}(W) = W / \sigma(W).
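In practice, the spectral norm σ(W) is usually estimated with a few power iterations rather than a full singular value decomposition. A minimal numpy sketch of this normalization (illustrative only, not the authors' implementation) is:

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Return W / sigma(W), with sigma(W) estimated by power iteration."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)   # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u)   # left singular vector estimate
    sigma = u @ W @ v            # converges to the largest singular value
    return W / sigma

W = np.array([[3.0, 0.0],
              [0.0, 1.0]])      # singular values 3 and 1
W_sn = spectral_normalize(W)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # largest singular value ~ 1.0
```

After normalization the largest singular value is 1, so the layer satisfies the 1-Lipschitz constraint without otherwise restricting the direction of the weights.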

Deep Adaptive Adversarial Network (DAAN).
As shown in Figure 1, the proposed DAAN includes a condition recognition module and a domain adversarial learning module. The condition recognition module contains a feature extraction network and a fault classification network. The feature extraction network automatically learns the fault features, and the fault classification network identifies health conditions according to the extracted features. The domain adversarial learning module consists of a discriminator network connected to the feature extraction network to help learn the domain-invariant features.
(1) Condition recognition: a three-layer feedforward neural network (FFNN) is used to construct this module, followed by a classifier that recognizes the health condition. The optimization objective is to jointly train the feature extractor F with parameters θ_F and the classifier C with parameters θ_C. The loss L_C is defined as the cross-entropy between the predicted softmax probability distribution and the corresponding labels:

L_C = -\frac{1}{n_s} \sum_{i=1}^{n_s} \sum_{k=1}^{K} \mathbb{1}(y_i^s = k) \log C(F(x_i^s))_k,

where \mathbb{1}(y_i^s = k) is the indicator function, C(F(x_i^s))_k is the kth value of the predicted distribution, and K is the number of health conditions.
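The softmax cross-entropy loss L_C above can be sketched in a few lines of numpy (hypothetical helper names, not the authors' code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy between the softmax prediction and integer labels."""
    p = softmax(np.asarray(logits, dtype=float))
    n = len(labels)
    # Indexing with the label picks out the 1(y_i = k) term of the sum.
    return float(-np.log(p[np.arange(n), labels]).mean())

# Uninformative (all-zero) logits over K = 5 classes give a loss of log(5).
loss = cross_entropy(np.zeros((4, 5)), np.array([0, 1, 2, 3]))
print(round(loss, 4))   # -> 1.6094
```

The log(K) baseline for uniform predictions is a handy sanity check when monitoring the training curve of the classifier.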
(2) Domain adversarial learning: the adversarial training strategy of the GAN is used to extract domain-invariant features. The discriminator D with parameters θ_D is optimized by maximizing the domain adversarial loss L_D, while the feature extractor parameters θ_F are optimized to minimize the distribution discrepancy between the two domains. Following the Wasserstein formulation, L_D is defined as follows:

L_D = \frac{1}{n_s} \sum_{i=1}^{n_s} D(F(x_i^s)) - \frac{1}{n_t} \sum_{j=1}^{n_t} D(F(x_j^t)).

By combining the two optimization objectives, the final loss function can be written as follows:

\min_{\theta_F, \theta_C} \max_{\theta_D} \; L(\theta_F, \theta_C, \theta_D) = L_C(\theta_F, \theta_C) + \lambda L_D(\theta_F, \theta_D),

where the hyperparameter λ determines the strength of the domain adversarial strategy.

Training Strategy of DAAN.
As displayed in Figure 2, once the optimization objective of the DAAN is built, it is convenient to train the proposed method with the Adam algorithm. In the discriminator D, a gradient reversal layer [26] is used to connect the feature extractor during the training process. This layer ensures that the feature distributions of the two domains remain indistinguishable to the discriminator D, so that domain-invariant features are obtained. Therefore, the loss can be rewritten as follows:

L(\theta_F, \theta_C, \theta_D) = L_C(\theta_F, \theta_C) + \lambda L_D(\theta_F, \theta_D).

Based on the above equations and the Adam algorithm, the parameters θ_F, θ_C, and θ_D are updated as follows:

\theta_F \leftarrow \theta_F - \alpha \left( \frac{\partial L_C}{\partial \theta_F} + \lambda \frac{\partial L_D}{\partial \theta_F} \right), \qquad
\theta_C \leftarrow \theta_C - \alpha \frac{\partial L_C}{\partial \theta_C}, \qquad
\theta_D \leftarrow \theta_D + \alpha \lambda \frac{\partial L_D}{\partial \theta_D},

where α is the learning rate.
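The gradient reversal layer described above acts as the identity in the forward pass and flips the sign of the incoming gradient, scaled by λ, in the backward pass. A minimal framework-free sketch of this behavior (illustrative only) is:

```python
import numpy as np

class GradientReversal:
    """Identity forward; multiplies gradients by -lambda in the backward pass."""
    def __init__(self, lam):
        self.lam = lam
    def forward(self, x):
        return x                        # features pass through unchanged
    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient reaches F

grl = GradientReversal(lam=0.005)       # the penalty value used in this paper
x = np.array([1.0, 2.0])
print(grl.forward(x))                   # -> [1. 2.]
print(grl.backward(np.array([0.4, -0.2])))  # approximately [-0.002, 0.001]
```

Because the discriminator's gradient arrives at the feature extractor with its sign reversed, a single gradient-descent step simultaneously descends L_C for the classifier and ascends L_D for the discriminator, realizing the min-max objective without two separate optimizers.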
After the network training is finished, the classifier can accurately identify the unlabeled dataset in the target domain, provided that the domain categories have become indistinguishable in the learned features. In the testing process, the remaining target domain dataset is used as the input of the DAAN, and the classifier outputs the classification result. The structure of the condition recognition module is [1200, 600, 200, 100, 5], and that of the domain adversarial module is [1200, 600, 200, 100, 1], in which the unit number of the input layer is determined by the dimension of the samples, the unit number of the output layer of the condition recognition module is determined by the number of health conditions, and the unit number of the output layer of the domain adversarial module is determined by the true-or-false discrimination output. The unit numbers of the hidden layers follow the dimension-reduction principle. The learning rate is 0.002, and the penalty parameter λ is 0.005. Each training batch includes 500 samples from the source domain and the target domain, respectively. The other 500 target domain samples are adopted for testing. In each experiment, a total of 15 trials were conducted to reduce the effects of randomness, and each trial is trained for 50 epochs. In case A ⟶ B, the curves of the training and testing accuracies are plotted in Figure 5. The training accuracy approaches 100% after approximately 15 training epochs, whereas the testing accuracy needs approximately 47 training epochs to reach this level.
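A rough sketch of the [1200, 600, 200, 100, 5] condition-recognition structure follows, using random illustrative weights; the ReLU activation is an assumption, since the paper does not state the hidden activation here:

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [1200, 600, 200, 100, 5]        # condition-recognition module sizes
# Random illustrative weights for each fully connected layer.
weights = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
           for m, n in zip(layers[:-1], layers[1:])]

def forward(x, weights):
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)       # ReLU hidden layers (assumed)
    return x @ weights[-1]               # logits for the 5 health conditions

batch = rng.standard_normal((500, 1200)) # one batch of frequency-domain samples
logits = forward(batch, weights)
print(logits.shape)                      # -> (500, 5)
```

Swapping the final width 5 for 1 gives the matching [1200, 600, 200, 100, 1] shape of the domain adversarial module, whose single output feeds the Wasserstein critic.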

Experimental Studies
The classifier loss curve of the DAAN is plotted in Figure 4; the training loss of the DAAN converges to zero after approximately 15 training epochs. For comparison, the loss curves of DANN and DAAN without SN are also plotted in Figure 4. It is easy to find that DANN is much more difficult to converge, and DAAN without SN needs 25 training epochs to converge.
These performances indicate that the proposed DAAN has a strong domain-invariant feature extraction ability, which helps the model achieve fast convergence. The results of the six transfer cases are displayed in Table 1. All the testing accuracies in each case are over 90%, and some are even over 98%. This high accuracy indicates that the DAAN can effectively identify the health condition of a bearing in the absence of labeled data.
To further demonstrate the effectiveness of the DAAN, five methods are adopted for comparison on the six transfer learning cases: SAE [6], transfer component analysis (TCA) [31], MK-MMD [17], SFDA [18], and DANN. The subsequent classifier of SAE and TCA is softmax regression. SAE is trained only on the source domain data. TCA, MK-MMD, and SFDA are three representative examples of MMD-regularized subspace learning methods in the domain adaptation field.
The testing accuracies on the six transfer learning cases are listed in Table 1. It is easy to find that the DAAN achieves the highest accuracies and the lowest standard deviations among the given approaches. The average testing accuracy of SAE without transfer learning is only 60.20% because the target domain data did not participate in the model training.
Therefore, compared with SAE, it is obvious that the transfer learning-based methods are more effective in handling unlabeled data than traditional intelligent fault diagnosis. The average accuracies of TCA, MK-MMD, and SFDA are 81.53%, 94.90%, and 92.98%, respectively. These results are considerably better than those of SAE but are still worse than those of the proposed method.
Thus, it can be concluded from the comparison that the DAAN can learn more robust domain-invariant features than the other transfer learning methods.
Furthermore, the t-SNE [32] algorithm is adopted to map the learned features into a 2D scatter diagram to offer visual insights into the two domains. Taking the case A⟶B as an example, the domain-invariant features learned by the DAAN are displayed in Figure 6(f), and the mapping results obtained using the other comparison methods are shown in Figures 6(a)-6(e). The source and target domains are denoted by S and T, respectively. The result in Figure 6(a) demonstrates that although the SAE model obtains good cluster results, the distribution discrepancies between the two domains are substantially large, except for the NC condition. Thus, it cannot effectively classify the unlabeled target samples when the model is trained only on the source samples. Figures 6(b) and 6(c) plot the mapped results of the transferred features learned by TCA and DANN, and the cross-domain discrepancies are clearly reduced. However, some overlapping samples still exist between the IF and RF conditions. Meanwhile, the source and target domain samples of ORF are poorly clustered. Figures 6(d) and 6(e) plot the mapped results of the transferred features learned by MK-MMD and SFDA; the cluster performances are further improved, but there is still some distance between the two domains. Figure 6(f) illustrates that the proposed DAAN method not only reduces the distribution discrepancy between the two domains but also enlarges the feature distance between different health conditions.
Therefore, this validates that the DAAN can extract considerably more robust transferable features than the other traditional methods.

Case 2: Fault Diagnosis under Different Loads.
Another experiment bench for the transfer learning task under different loads is displayed in Figure 7.
This experiment also covers five bearing health conditions: NC, IF, OF, RF, and ORF. The rotating speed was fixed at 1800 r/min, and the sampling frequency was 12.8 kHz.
The vibration signals were measured under three different loads of 20 N (dataset D), 40 N (dataset E), and 60 N (dataset F). 200 samples were also collected for each health condition under each load, and each sample contained 2400 data points. The frequency-domain samples were again utilized as the inputs of the DAAN, and the other parameter settings were the same as in Case 1.
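The FFT preprocessing used throughout the experiments (2400 time-domain points reduced to a 1200-point one-sided magnitude spectrum) can be sketched as follows, using a toy sine wave in place of a real vibration signal; the 160 Hz tone is purely illustrative:

```python
import numpy as np

fs = 12800.0                             # Case 2 sampling frequency (12.8 kHz)
n = 2400                                 # data points per sample
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 160.0 * t)   # toy vibration signal at 160 Hz

# One-sided magnitude spectrum: keep the first 1200 of the 2400 FFT bins.
spectrum = np.abs(np.fft.fft(signal))[: n // 2]
print(spectrum.shape)                    # -> (1200,)
print(np.argmax(spectrum) * fs / n)      # dominant bin -> 160.0 Hz
```

Keeping only the first half of the FFT output discards the redundant mirrored bins of a real-valued signal, which is why each 2400-point sample yields exactly the 1200 input units of the DAAN.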
The results were compared with those of the five other methods, as displayed in Table 2. It shows that the DAAN again achieved the highest diagnosis accuracies for all cases among these six methods, with an average testing accuracy of 92.65%.
The SAE method without the transfer learning strategy still performed the worst, yielding an accuracy of 50.65%. Besides, the average testing accuracies of TCA and DANN are 72.78% and 81.46%, respectively, and those of MK-MMD and SFDA are 91.10% and 89.70%, respectively.
These results demonstrate that the proposed DAAN method delivers better transfer performance than the other methods.
Similarly, for the case D⟶E, the dimension-reduction results of these methods are displayed in Figure 8. Figure 8(a) shows that the features learned by SAE still poorly cluster the same health condition samples under different loads, corresponding to a low classification accuracy of 55.28%. Figures 8(b) and 8(c) demonstrate that the transferable features learned by TCA and DANN are subject to a smaller distribution discrepancy than those of SAE. However, the RF and ORF samples under different loads are still separated. Figures 8(d) and 8(e) show the results of MK-MMD and SFDA; the distributions of the transferred features from the two domains are closer than those of the features learned by TCA and DANN. Figure 8(f) displays the excellent cluster result obtained by the proposed DAAN.
The source and target features under the same health condition are gathered remarkably close together, and samples of different health conditions are also effectively separated. Consequently, the proposed DAAN method can learn domain-invariant features to reduce the discrepancy between different domains.

Case 3: Fault Diagnosis Using CWRU Bearing Dataset.
In order to test the proposed method under different loads and speeds, a bearing dataset offered by Case Western Reserve University (CWRU) [33] is applied in this section. Four fault types of bearing are considered: (1) normal condition (NC); (2) inner ring fault (IF); (3) outer ring fault (OF); and (4) roller fault (RF). The four datasets are named Datasets G, H, I, and J. In this experiment, each fault type under one load includes 200 samples, and each sample contains 2400 data points, so there is a total of 2000 samples for each load.
The accuracies and the corresponding standard deviations of all the transfer scenarios are shown in Table 3; a total of 12 different transfer scenarios are applied to obtain the diagnosis accuracies. The average testing accuracies of all the scenarios using the proposed method are above 98.71%, with standard deviations below 0.17%, which means the proposed method can effectively and stably achieve transfer fault diagnosis under different loads and speeds. In addition, the other transfer learning-based methods can also achieve good results, possibly because the differences between these working conditions are not large. The dimension-reduction results of all the transfer learning-based methods are also basically the same, so we only provide the results of G⟶H, I⟶H, and J⟶H to show the effectiveness of the proposed method, as displayed in Figure 9. It is observed that almost all the transferable features of the same health condition are assembled in the corresponding cluster, and the features of different health conditions are separated. This indicates that the proposed method can learn transferable features without being affected by the varying loads and speeds.
and concurrent fault in the outer ring and roller (ORF). The four fault bearings are depicted in Figure 3(b). Vibration signals are commonly utilized in condition monitoring and diagnosis because they carry rich and useful information at high sampling frequencies [29,30]. All vibration acceleration data were measured under three different speeds of 1100 r/min (dataset A), 1300 r/min (dataset B), and 1500 r/min (dataset C). The sampling frequency of the accelerometer is 25.6 kHz; 200 samples are selected from each bearing health condition, and each sample contains 2400 data points. Hence, a total of 1000 samples are acquired. The raw signals are then transformed via FFT, and the 1200 frequency-domain data points of each sample are obtained as the input of the DAAN model. In each experiment, all the source domain samples and half of the target domain samples are used for training. The remaining target domain samples are used for testing. The spectra of the three datasets and the transfer learning cases are displayed in Figure 2.

Diagnostic Results.
Figure 4 shows that the proposed DAAN is evaluated on six transfer learning cases: A⟶B, B⟶A, B⟶C, C⟶B, A⟶C, and C⟶A. In each case, the parts before and after the arrow refer to the source domain and the target domain, respectively. For example, in the case A⟶B, datasets A and B are the source domain and target domain, respectively.

Figure 7 :
Figure 7: Test bench of bearing fault.

Table 1 :
Classification results in Experiment 1.

Table 2 :
Diagnosis results in Experiment 2.

Table 3 :
Diagnosis results in Experiment 3.