A New Deep Convolutional Domain Adaptation Network for Bearing Fault Diagnosis under Different Working Conditions

Effective fault diagnosis methods can ensure the safe and reliable operation of machines. In recent years, deep learning technology has been applied to diagnose various mechanical equipment faults. However, in real industries, the data distribution often differs across working conditions, which leads to serious degradation of diagnostic performance. To solve this issue, this study proposes a new deep convolutional domain adaptation network (DCDAN) method for bearing fault diagnosis. This method implements cross-domain fault diagnosis by using the labeled source domain data and the unlabeled target domain data as training data. In DCDAN, firstly, a convolutional neural network is applied to extract features of source domain data and target domain data. Then, the domain distribution discrepancy is reduced by minimizing the multiple kernel maximum mean discrepancy (MK-MMD) between the two domains and maximizing the domain recognition error of a domain classifier. Finally, the source domain classification error is minimized. Extensive experiments on two rolling bearing datasets verify that the proposed method can implement accurate cross-domain fault diagnosis under different working conditions. The study may provide a promising tool for bearing fault diagnosis under different working conditions.


Introduction
Rolling element bearings are an integral part of rotating mechanical systems and are widely applied in many fields, such as gearboxes, locomotive wheels, and gas turbines. Bearing failure directly causes unexpected downtime, which leads to higher maintenance costs and even safety issues. Therefore, it is of great significance to implement high-accuracy bearing fault diagnosis. In recent years, with the development of big data technology, data-driven intelligent fault diagnosis has attracted wide attention because it can provide accurate diagnosis results without extensive expert knowledge and cumbersome manual feature extraction. In particular, many studies have introduced deep learning into bearing fault diagnosis and achieved good results [1][2][3].
A number of studies neglected changes in working conditions and assumed that the training data and testing data follow the same distribution. Zhang et al. [4] designed a deep belief network and verified its effectiveness on the turbofan engine degradation dataset. Lei et al. [5] established two learning stages: the first uses unsupervised networks to extract features, and the second uses softmax regression to classify the health conditions. Jia et al. [6] presented a local connection network constructed from normalized sparse autoencoders, and the performance of the method was verified on a gearbox dataset and a bearing dataset. Jian et al. [7] combined an adaptive one-dimensional convolutional neural network (CNN) with wide kernels and Dempster-Shafer evidence theory to put forward a one-dimensional fusion neural network; experimental results on the bearing data of the Case Western Reserve University (CWRU) data center showed that this method has good diagnostic accuracy. Wen [8] converted signals into two-dimensional images to build a new CNN fault diagnosis model based on LeNet-5. Shao et al. [9] developed a novel method for intelligent fault diagnosis of rolling bearings using ensemble deep autoencoders. Wang et al. [10] used batch normalization to train the deep neural network, and experimental results show that the proposed method can extract features quickly. Huang et al. [11] added a new layer in front of the convolution layer to construct a composite signal; the validity and necessity of the new layer were verified by experiments. Zhang et al. [12] proposed a residual learning algorithm to address the issue that gradients may vanish or explode during backpropagation. With the rapid development of deep learning technology, various fault diagnosis methods based on deep learning have been continuously proposed [13][14][15].
However, the direct application of deep learning to fault diagnosis rests on two conditions. (1) Accurate diagnosis needs a large amount of labeled data for training. (2) The training data and testing data follow the same distribution. In some fields, such as locomotive bearings or aerospace bearings, labeled data are difficult to obtain. In addition, due to the constant change of working conditions in real industries, the distributions of training data and testing data often differ, which reduces the generalization ability of the model. Therefore, it is of great practical significance to propose a fault diagnosis model that can implement accurate fault diagnosis under different working conditions.
Targeting this issue, various signal processing methods were proposed. Liu et al. [16] proposed a fault diagnosis technique for bearings under unknown time-varying speed based on multicurve extraction and selection, the Vold-Kalman filter, and generalized demodulation. Zheng et al. [17] developed a multiscale fuzzy entropy method for measuring the complexity of time series. However, these methods depend largely on the quality of manually extracted features, and they require domain knowledge and human intervention. Therefore, we consider whether this problem can be solved by a deep learning method that directly takes the raw vibration signal as input. Transfer learning provides an effective way to do so. Transfer learning involves source domain data and target domain data; the source domain data is labeled, and transfer learning can be divided into supervised, semisupervised, and unsupervised transfer learning according to whether the target domain data is completely labeled, partially labeled, or unlabeled. The purpose of transfer learning is to reduce the feature distribution discrepancy between source domain data and target domain data [18]. When the distributions of the source domain data and target domain data differ but the two tasks are the same, this special case of transfer learning is called domain adaptation. Deep learning algorithms based on domain adaptation have achieved important results in image recognition and speech recognition [19,20]. In recent years, a variety of intelligent bearing fault diagnosis methods based on domain adaptation have been proposed [21][22][23]. A domain adaptation bearing fault diagnosis method based on the fast Fourier transform of the original signal was proposed in [24]. Ren et al. [25] used multiscale permutation entropy and time-domain features as network input to train a neural network, and experiments verified that the proposed method can implement fault diagnosis under different working conditions. However, the diagnostic performance of the methods in [24][25] is also affected by the quality of manually extracted features. Deep learning can also be used to achieve transfer learning. Lu et al. [26] proposed a novel deep neural network model for domain adaptation, which shortened the distance between the source domain features and the target domain features through maximum mean discrepancy (MMD).
Through the addition of MMD adaptation layers, the distance between the two domains can be significantly shortened, and the accuracy of cross-domain diagnosis can be improved [27,28]. The methods in [26][27][28] used MMD alone to reduce the distribution discrepancy between source and target domain features. Li et al. [29] proposed a novel cross-domain fault diagnosis method based on deep generative neural networks. Based on condition recognition and domain adaptation, Guo et al. [30] established a deep convolutional transfer learning network. Inspired by [29,30], to effectively reduce the distribution discrepancy and improve diagnostic accuracy, we consider both the MK-MMD loss and the domain classifier loss.
Fault diagnosis under different working conditions is very common in practice, so it is of great significance to find a widely applicable fault diagnosis method for such conditions. Based on the above analysis, a new DCDAN is proposed in this paper to implement accurate fault diagnosis of unlabeled data under different working conditions. The main contributions of this paper can be summarized as follows. Firstly, a new domain adaptation method is proposed, which can implement accurate fault diagnosis without labeled target data under various working conditions. Secondly, a new optimization objective function is proposed, which includes minimizing the source domain classification error, minimizing MK-MMD, and maximizing the domain recognition error. Lastly, cross-domain fault diagnosis experiments on two datasets are conducted, and the superiority of the proposed method is demonstrated by comparison with existing methods. The structure of this paper is as follows: Section 2 details the domain adaptation, CNN, and MMD problems. Section 3 presents the model framework and optimization function of the proposed method. In Section 4, two case studies are conducted with the proposed model. Section 5 concludes the paper.

Previous Works and Preliminaries
2.1. Domain Adaptation. Domain adaptation learning can effectively solve the problem of inconsistent probability distribution between training data and testing data. In general, let $D = \{\chi, P(X)\}$ represent a domain, where $\chi$ is the feature space of inputs, $P(X)$ is the marginal probability distribution of inputs, and $X = \{x_1, x_2, \ldots, x_n\} \in \chi$ is a series of learning samples. Usually, $D_s$ represents the source domain and $D_t$ represents the target domain. Given a labeled source domain $D_s = \{(x_i, y_i)\}_{i=1}^{n}$ and an unlabeled target domain $D_t$, the two domains have the label space $Y_s = Y_t$, conditional probability $P(Y_s \mid X_s) = P(Y_t \mid X_t)$, and marginal probability distribution $P(X_s) \neq P(X_t)$. The goal of domain adaptation learning is to use the labeled data $D_s$ to learn a classifier that predicts the labels of the target domain $D_t$. In this paper, we try to solve the domain adaptation issue under different working conditions, that is, how to use the labeled data under one working condition to implement fault diagnosis of unlabeled data under other unknown working conditions. Since the source domain data and the target domain data are derived from vibration data under different working conditions, the marginal distributions of these domains differ. As shown in Figure 1

Maximum Mean Discrepancy.
MMD is a measure of the discrepancy between two domains and is the most frequently used nonparametric distance metric in domain adaptation learning. MMD is a kernel learning method that measures the distance between two distributions in a reproducing kernel Hilbert space (RKHS) [31]. Supposing that the source dataset $X_s$ and the target dataset $X_t$ are obtained from the distributions $P_1$ and $P_2$ through independent and identically distributed sampling and that the sizes of the datasets are $n_1$ and $n_2$, respectively, MMD is defined as
$$\mathrm{MMD}_{\mathcal{H}}(X_s, X_t) = \left\| \frac{1}{n_1}\sum_{i=1}^{n_1} f(x_i^s) - \frac{1}{n_2}\sum_{j=1}^{n_2} f(x_j^t) \right\|_{\mathcal{H}}, \tag{1}$$
where $\mathcal{H}$ represents the RKHS, $f: X_s, X_t \rightarrow \mathcal{H}$ is the feature map, and $k(x, x') = \langle f(x), f(x') \rangle_{\mathcal{H}}$ is the feature kernel. In a deep neural network, the features become more domain-specific as the layers deepen, and in the higher layers the transferability of the features decreases significantly. So, optimal kernel choice is crucial for the effect of domain adaptation. In this paper, in order to enhance the transferability of the domain feature representation and better implement domain transfer learning, we focus on the multiple kernel variant of MMD (MK-MMD). As mentioned in [32], MK-MMD assumes that the optimal kernel can be obtained by a linear combination of multiple kernels, which is defined as
$$\mathcal{K} = \left\{ k = \sum_{u=1}^{m} \beta_u k_u : \sum_{u=1}^{m} \beta_u = 1,\ \beta_u \ge 0,\ \forall u \right\}, \tag{2}$$
where $k_u$ are the $m$ base kernels and $\beta_u$ are the coefficients.
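The multiple kernel MMD described above can be illustrated with a small numerical sketch. The version below is only an illustration under stated assumptions: it uses Gaussian base kernels with hypothetical bandwidths `sigmas`, equal coefficients by default, and the simple biased estimator of the squared discrepancy. None of these choices are taken from the paper, and the function names are made up for the example.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def multi_kernel(x, y, sigmas, betas):
    """Convex combination of Gaussian kernels: sum_u beta_u * k_u(x, y)."""
    return sum(b * gaussian_kernel(x, y, s) for s, b in zip(sigmas, betas))

def mk_mmd_squared(source, target, sigmas=(0.5, 1.0, 2.0), betas=None):
    """Biased estimate of the squared MK-MMD between two sample sets."""
    if betas is None:
        # Assumption: equal weights; the paper does not state its coefficients here.
        betas = [1.0 / len(sigmas)] * len(sigmas)
    # The coefficients must be non-negative and sum to one.
    assert all(b >= 0 for b in betas) and abs(sum(betas) - 1.0) < 1e-9

    def mean_kernel(a_set, b_set):
        total = sum(multi_kernel(a, b, sigmas, betas)
                    for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy features: the second set is a shifted copy of the first.
source = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
shifted = [[2.0, 2.1], [2.2, 1.9], [2.1, 2.0]]
print(mk_mmd_squared(source, source))   # identical sets give zero discrepancy
print(mk_mmd_squared(source, shifted))  # shifted sets give a clearly larger value
```

In a training setting the same quantity would be computed on mini-batches of layer activations and minimized with respect to the network parameters.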

The Proposed Method
A novel DCDAN model is proposed to solve the domain adaptation problem in bearing fault diagnosis under changing working conditions. The method can extract features directly from the raw vibration signals of the source domain and target domain and can diagnose bearing faults of the unlabeled target domain without manual data conversion. In this part, we mainly introduce the details of the DCDAN model, including the model structure, optimization algorithm, and training strategy. Because CNN performs very well in image feature extraction, this paper transforms the original vibration signal into a two-dimensional image through data preprocessing.

DCDAN.
The architecture of the DCDAN model is shown in Figure 2. This model can be divided into three parts: feature extraction, domain adaptation, and fault recognition.
In feature extraction, the labeled data from the source domain and the unlabeled data from the target domain are fed into the CNN at the same time, and features are extracted through four stacked convolution layers. Each convolution layer performs four operations: convolution, batch normalization (BN), activation, and max pooling. In each convolution layer, filters are convolved with the input image, and zero padding is applied so that the feature map dimension does not change. After the convolution, BN is applied to speed up training and pull the data distribution back toward the standard normal distribution, which helps keep the gradients from becoming too small. After the BN, a nonlinear activation function is introduced to enhance the learning ability of the network and to alleviate the problem of vanishing or exploding gradients. The nonlinear activation function is usually ReLU, which has proved to be an excellent activation function in neural networks [33]. Finally, the max-pooling layer retains the main features while reducing the parameters and calculations of the next layer and helping to prevent overfitting.
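To make the effect of the stacked convolution blocks concrete, the following sketch tracks the side length of the feature map through the network. It is a rough illustration only: the 2 × 2 pooling window with stride 2 is an assumption (the actual layer settings are given in the paper's parameter table, which is not reproduced here), and `feature_map_sizes` is a hypothetical helper.

```python
def feature_map_sizes(input_size=28, num_blocks=4, pool=2):
    """Side length of the feature map after each convolution block.

    Zero padding keeps the size unchanged through the convolution itself,
    so only the max-pooling step shrinks the map. The pooling window
    (pool x pool, stride = pool) is an assumed value, not the paper's.
    """
    sizes = [input_size]
    for _ in range(num_blocks):
        # Pooling with stride `pool` drops any remainder (floor division).
        sizes.append(sizes[-1] // pool)
    return sizes

print(feature_map_sizes())
```

Under these assumed settings a 28 × 28 input shrinks to a single spatial position after four blocks, which shows why the pooling configuration matters for such small inputs.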
Previous studies have shown that the number and size of convolutional filters have a great impact on the diagnosis results. After feature extraction, the learned features are flattened into a one-dimensional feature vector, which is used as the input of F1 and F4. The F1 layer connects to the F2 and F3 layers, and the F4 layer connects to the F5 layer. The feature distribution differs under different working conditions. In the domain adaptation part, in order to minimize the distribution discrepancy, we use MK-MMD to measure the distribution discrepancy of the learned transferable features. As shown in Figure 2, we embed the features of the F1, F2, and F3 layers into the RKHS and use a linear combination of Gaussian kernels to align the distributions of the different domains. Then, the multiple kernel distribution discrepancy is taken as the optimization goal, and the network parameters are trained by minimizing MK-MMD. In addition, the domain classifier is another means of reducing the distribution discrepancy between domains. For the domain classification task, the predicted domain is output by the final domain classifier layer.
In the fault recognition part, for the fault classification task with labels in the source domain, the final output layer uses softmax regression to output the predicted fault categories. The feature distributions of the target domain data and the source domain data are drawn closer by domain adaptation learning. Consequently, the network can correctly classify the unlabeled target domain data. In deep learning model training, if the model has too many parameters and the training samples are too few, the trained model is prone to overfitting. To prevent overfitting, the model applies dropout after each fully connected layer.

Optimization Objective.
The proposed optimization objective of DCDAN consists of the following three parts: (1) minimizing the classification error of source domain data; (2) minimizing the MK-MMD discrepancy between the distributions of the two domains; and (3) maximizing the domain classification error. In order to bring the predicted fault categories of the DCDAN model closer to the actual fault categories, the cross-entropy loss is used as the loss function. Cross-entropy evaluates the difference between the predicted results and the real results, so reducing the cross-entropy loss can improve the prediction accuracy of the model. The cross-entropy loss $L_c$ of the source domain is presented as follows:
$$L_c = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} h(y_i = j)\,\log O_{i,j}, \tag{3}$$
where $n$ is the batch size of input data, $k$ is the total number of fault categories, $h(\cdot)$ encodes the real label of the input data (1 if sample $i$ belongs to category $j$ and 0 otherwise), and $O_i$ is the softmax output for sample $i$, with $O_{i,j}$ its $j$-th component. Although MMD has achieved a good domain adaptation effect in some studies, most of them minimize the distribution discrepancy of only the last fully connected layer. Single-layer MMD cannot completely eliminate the feature discrepancy because feature transferability deteriorates in multiple top layers. So, in this study, we embed the features learned in the fully connected layers $F_1$, $F_2$, and $F_3$ into the RKHS and measure their discrepancy with MK-MMD. The second optimization objective of the DCDAN model is to minimize MK-MMD. The MK-MMD loss based on equation (2) is calculated as follows [34]:
$$L_{MK\text{-}MMD} = \sum_{l \in \{F_1, F_2, F_3\}} \mathrm{MMD}_k^2\left(X_s^l, X_t^l\right), \tag{4}$$
where $X_s^l$ and $X_t^l$ denote the source and target features of layer $l$ and $k$ is the multiple kernel of equation (2). At the same time, in order to better maintain domain invariance, we design a domain classifier. As shown in Figure 2, the domain classifier is connected to the last convolution layer. If the domain classifier cannot distinguish the data of the source domain from the data of the target domain, the features of the last convolution layer are domain invariant. Therefore, the third optimization objective of the DCDAN model is to maximize the domain classification error.
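The source-domain cross-entropy loss described above can be sketched in a few lines. This is a minimal illustration, assuming integer class indices as labels and raw network scores (logits) as inputs; the function names are hypothetical and not part of the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one sample's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean cross-entropy over a batch.

    batch_logits: list of per-sample score lists (one score per fault category)
    labels:       list of integer class indices (the real labels)
    """
    loss = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        # Only the probability of the true class contributes to the sum.
        loss -= math.log(probs[label])
    return loss / len(labels)

# Two samples, three hypothetical fault categories.
batch = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
print(cross_entropy(batch, [0, 2]))  # confident and correct: small loss
print(cross_entropy(batch, [1, 0]))  # wrong labels: larger loss
```

A lower value means the softmax output places more probability on the true fault category, which is the quantity the first part of the objective drives down.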
The domain classifier solves a two-class problem, and its cross-entropy loss is as follows:
$$L_{dc} = -\frac{1}{n}\sum_{i=1}^{n}\left[x_i \log O_i + (1 - x_i)\log(1 - O_i)\right], \tag{5}$$
where $n$ is the batch size of input data, $x_i \in \{0, 1\}$ is the real domain label of the input data, and $O_i$ is the softmax output, taken as the predicted probability of the domain labeled 1. To maximize $L_{dc}$, a special gradient reversal layer (GRL) is introduced. The GRL leaves its input unchanged during forward propagation and reverses the sign of the gradient during backpropagation. The GRL can be formulated as a function $R(x)$ described as follows:
$$R(x) = x, \tag{6}$$
$$\frac{dR}{dx} = -I, \tag{7}$$
where $I$ is an identity matrix. Through equations (3)-(5), we can write the final optimization objective as follows:
$$L = L_c + \lambda L_{MK\text{-}MMD} - \mu L_{dc}, \tag{8}$$
where $\lambda > 0$ and $\mu > 0$ are the trade-off parameters and $\mu + \lambda = 1$.
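The behavior of the gradient reversal layer can be sketched without any deep learning framework. The class below is a toy illustration on plain Python lists, not the authors' implementation; in PyTorch the same idea is usually written as a custom `torch.autograd.Function` with explicit forward and backward methods.

```python
class GradientReversal:
    """Identity in the forward pass, sign flip in the backward pass."""

    def forward(self, features):
        # Features pass through unchanged on the way to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # Gradients flowing back to the feature extractor are negated, so a
        # descent step on the features increases the domain classifier loss.
        return [-g for g in grad_output]

grl = GradientReversal()
features = [0.3, -1.2, 0.8]
grad_from_domain_classifier = [0.5, -0.25, 0.0]

print(grl.forward(features))                       # same values as the input
print(grl.backward(grad_from_domain_classifier))   # every sign flipped
```

Because only the backward direction is altered, the domain classifier itself still learns to separate the domains while the feature extractor is pushed to confuse it.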
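In this paper, the optimization objective of the DCDAN model is trained by the stochastic gradient descent (SGD) method. In model training, we define $\theta_f$, $\theta_c$, and $\theta_d$ as the parameters of the feature extractor, fault recognition, and domain classifier, respectively. The loss function (8) can also be written as follows:
$$L(\theta_f, \theta_c, \theta_d) = L_c(\theta_f, \theta_c) + \lambda L_{MK\text{-}MMD}(\theta_f, \theta_c) - \mu L_{dc}(\theta_f, \theta_d). \tag{9}$$
SGD is applied to update $\theta_f$, $\theta_c$, and $\theta_d$ as follows:
$$\theta_f \leftarrow \theta_f - \eta\left(\frac{\partial L_c}{\partial \theta_f} + \lambda \frac{\partial L_{MK\text{-}MMD}}{\partial \theta_f} - \mu \frac{\partial L_{dc}}{\partial \theta_f}\right), \tag{10}$$
$$\theta_c \leftarrow \theta_c - \eta\left(\frac{\partial L_c}{\partial \theta_c} + \lambda \frac{\partial L_{MK\text{-}MMD}}{\partial \theta_c}\right), \tag{11}$$
$$\theta_d \leftarrow \theta_d - \eta \frac{\partial L_{dc}}{\partial \theta_d}, \tag{12}$$
where $\eta$ is the learning rate.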
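The parameter updates can be illustrated with a small sketch of how the three loss gradients might be combined for the feature extractor before a plain SGD step. This is an assumption-laden illustration: which trade-off parameter weights which term (here `lam` for the MK-MMD term and `mu` for the reversed domain classifier term) is assumed, and both helper functions are hypothetical.

```python
def feature_extractor_grad(g_c, g_mmd, g_dc, lam=0.5, mu=0.5):
    """Combine per-loss gradients with respect to the feature extractor.

    g_c:   gradient of the source classification loss
    g_mmd: gradient of the MK-MMD loss
    g_dc:  gradient of the domain classifier loss (sign reversed, because
           the feature extractor tries to maximize that loss)
    The assignment of lam and mu to the two terms is an assumption.
    """
    return [gc + lam * gm - mu * gd for gc, gm, gd in zip(g_c, g_mmd, g_dc)]

def sgd_step(params, grads, lr):
    """One plain SGD update: theta <- theta - lr * gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

theta_f = [1.0, 2.0]
grad = feature_extractor_grad(g_c=[1.0, 0.0], g_mmd=[2.0, 1.0], g_dc=[4.0, -1.0])
theta_f = sgd_step(theta_f, grad, lr=0.01)
print(grad, theta_f)
```

The domain classifier's own parameters would be updated with the ordinary (non-reversed) gradient of its loss, so the two parts of the network pull in opposite directions on the shared features.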

Algorithm and Training Strategy.
The overall framework of the DCDAN model is shown in Figure 3. Firstly, the data from different working conditions are divided into labeled source domain data and unlabeled target domain data. Then, domain-invariant features are extracted from the data through the convolution layers. In the process of training the DCDAN model, SGD is used to minimize the loss in (9) over the whole batch of data. After a certain number of epochs, the loss function generally converges. Finally, the DCDAN model with the saved trained network parameters is used to classify the testing data and obtain the diagnosis results.

Results
In this section, two experiments on rolling element bearings are taken as examples to verify the effectiveness of the proposed model. Case 1 adopts the CWRU bearing dataset. Case 2 uses rolling element bearing data collected on the bearing test rig we built.

CWRU Experiment Description.
The first experiment in this study adopts the CWRU bearing dataset [34]. There are four bearing states: normal (N), inner-race fault (IF), outer-race fault (OF), and ball fault (BF). Each fault position has damage diameters of 0.007 inches, 0.014 inches, and 0.021 inches, which together with N gives a total of ten categories. The drive-end data is adopted and the sampling frequency is 12 kHz. In this paper, we use the overlapping slice technique to expand the data, and the principle is shown in Figure 4 [35]. We used a sliding window of size 784 to expand the CWRU bearing dataset. The overlapping parameter is set to 16. In this way, data augmentation yields 2000 samples for each health condition, with a total of 20000 samples. Before the data are fed into the DCDAN model, each one-dimensional sample of length 784 is converted into two-dimensional data of size 28 × 28. In this experiment, the data of the abovementioned ten states are collected at four rotating speeds (1797 rpm, 1772 rpm, 1750 rpm, and 1730 rpm), and datasets (N1, N2, N3, and N4) under four different working conditions are formed, as shown in Table 1. N1, N2, N3, and N4 are four different domains; each domain contains ten health conditions (nine fault states and one normal state).
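The overlapping slicing and the reshaping into 28 × 28 inputs can be sketched as follows. This is an illustration under one assumed reading of the text: the "overlapping parameter" of 16 is interpreted as the shift between the starts of consecutive windows, which the paper does not state explicitly, and both function names are hypothetical.

```python
def overlapping_slices(signal, window=784, shift=16):
    """Cut a 1-D signal into overlapping windows.

    Consecutive windows start `shift` points apart (assumed meaning of the
    overlapping parameter), so neighbors share window - shift points.
    """
    last_start = len(signal) - window
    return [signal[start:start + window]
            for start in range(0, last_start + 1, shift)]

def to_image(sample, side=28):
    """Reshape a flat sample of length side*side into a side x side grid."""
    assert len(sample) == side * side
    return [sample[row * side:(row + 1) * side] for row in range(side)]

# Synthetic stand-in for a vibration record (values are just indices).
record = list(range(32768))
samples = overlapping_slices(record)
image = to_image(samples[0])
print(len(samples), len(image), len(image[0]))
```

Under this reading, a record of 32768 points yields (32768 - 784) / 16 + 1 = 2000 windows, which matches the per-condition sample count reported above; that agreement is suggestive but does not confirm the interpretation.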
In order to demonstrate the benefits of the method proposed in this paper, we design 12 domain adaptation fault diagnosis tasks: N1 ⟶ N2, N1 ⟶ N3, N1 ⟶ N4, N2 ⟶ N1, N2 ⟶ N3, N2 ⟶ N4, N3 ⟶ N1, N3 ⟶ N2, N3 ⟶ N4, N4 ⟶ N1, N4 ⟶ N2, N4 ⟶ N3. In each domain adaptation experiment, the domain before the arrow is the source domain, and the domain after the arrow is the target domain. Source domain data is labeled, and target domain data is unlabeled. The training data consists of the source domain data and 50% of the target domain data. The remaining 50% of the target domain data is used to test the model.
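The task list and the split of the target domain can be generated programmatically, as sketched below. The random shuffling with a fixed seed is an assumption made for the example; the paper does not say how the two halves of the target data are selected, and the helper names are hypothetical.

```python
import random
from itertools import permutations

DOMAINS = ["N1", "N2", "N3", "N4"]

def transfer_tasks(domains):
    """All ordered (source, target) pairs of distinct domains."""
    return list(permutations(domains, 2))

def split_target(samples, seed=0):
    """Split target-domain samples into an unlabeled training half and a
    held-out test half. Random selection is an assumption."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

tasks = transfer_tasks(DOMAINS)
for source, target in tasks:
    print(f"{source} -> {target}")

# Sample indices stand in for the 20000 samples of one target domain.
train_half, test_half = split_target(range(20000))
print(len(tasks), len(train_half), len(test_half))
```

With four domains this enumerates the same 12 ordered pairs listed above, and keeping the test half disjoint from the training half ensures the reported accuracy is measured on target samples never seen during adaptation.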

Implementation Details.
In the fault diagnosis experiment, the setting of the network parameters is also very important. This paper sets the structure parameters of the CNN convolutional layers and fully connected layers through a large number of experiments; the network parameters are shown in Table 2.
The related parameters are as follows. The network parameter settings of CNN feature extraction are shown in Table 2. The trade-off parameters affect the cross-domain diagnosis results of the DCDAN. In order to find the optimal combination, μ and λ are searched from {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, respectively, subject to μ + λ = 1. Each experiment is conducted for 4 trials, and the mean value is calculated. Through comparative experiments, the optimal result is obtained when μ = 0.5 and λ = 0.5. The batch size is set at 128 and the number of training epochs is set at 1000. The initial learning rate is set at 0.01 and is reduced to 10% of its value after every one hundred epochs. Each group of domain adaptation experiments is repeated 10 times, and the accuracy is averaged. The model is run on a PC with an Intel i7-8700 CPU, 16 GB RAM, and an NVIDIA RTX 2070 GPU, and the PyTorch platform is used for the programming. Taking experiment N1⟶N2 as an example, the training loss is shown in Figure 5. It can be seen that the training loss of the proposed DCDAN converges within the 1000 training epochs.
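The learning-rate schedule and the trade-off search grid described above can be written down directly. The sketch assumes step decay, meaning the rate is multiplied by 0.1 at each 100-epoch boundary relative to its current value; the function names are hypothetical. In PyTorch a comparable schedule is available as `torch.optim.lr_scheduler.StepLR` with `step_size=100` and `gamma=0.1`.

```python
def learning_rate(epoch, initial=0.01, drop=0.1, every=100):
    """Step decay: multiply the rate by `drop` after every `every` epochs."""
    return initial * drop ** (epoch // every)

def trade_off_grid(steps=10):
    """All (mu, lam) pairs on a uniform grid that satisfy mu + lam = 1."""
    return [(i / steps, (steps - i) / steps) for i in range(steps + 1)]

for epoch in (0, 99, 100, 250):
    print(epoch, learning_rate(epoch))

grid = trade_off_grid()
print(len(grid), grid[5])
```

Because the constraint μ + λ = 1 ties the two parameters together, the search is effectively one-dimensional with eleven candidate settings rather than a full two-dimensional grid.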

Comparison Methods.
In order to show the superiority of the DCDAN model, the proposed method is compared with several common domain adaptation learning methods. Specifically, the other methods are applied to the same dataset for the same domain adaptation tasks, and their diagnosis results are obtained. This paper mainly studies the following methods.
(1) CNN: the CNN model trained in the source domain is directly used for fault diagnosis in the target domain, without domain adaptation. The CNN network consists of the four stacked convolution layers and four fully connected layers. The parameters of the four stacked convolution layers are the same as those of the four convolution modules of DCDAN, as shown in Table 2. Three fully connected layers are used in the network with 256, 128, and 10 hidden units, respectively. The learning rate is set at 0.0001. The optimization objective only includes the cross-entropy loss. Like DCDAN, CNN uses two-dimensional raw vibration data as network inputs. The other parameter settings are the same as in DCDAN.
(2) Transfer component analysis (TCA): it is a domain adaptation method based on handcrafted features [36]. The inputs of TCA are frequency spectrum data. The 14 handcrafted features used are the mean, RMS, kurtosis, variance, crest factor, wave factor, and eight energy ratios of the wavelet packet transform [23]. The trade-off parameter is searched from {0.01, 0.1, 1.0, 10, 100}.
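Six of the handcrafted features named above have simple closed forms, sketched below using common textbook definitions (crest factor as peak over RMS, wave factor as RMS over mean absolute value, non-excess kurtosis). These definitions are assumptions, since the paper does not spell them out, and the eight wavelet packet energy ratios are omitted.

```python
import math

def time_domain_features(x):
    """Six statistical features of a 1-D signal (assumed definitions)."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n        # population variance
    rms = math.sqrt(sum(v * v for v in x) / n)
    kurtosis = (sum((v - mean) ** 4 for v in x) / n) / variance ** 2
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "mean": mean,
        "rms": rms,
        "kurtosis": kurtosis,
        "variance": variance,
        "crest_factor": peak / rms,     # how pronounced the largest peak is
        "wave_factor": rms / mean_abs,  # shape of the waveform
    }

# A sampled sine wave as a stand-in for a vibration segment.
signal = [math.sin(2 * math.pi * i / 64) for i in range(640)]
for name, value in time_domain_features(signal).items():
    print(f"{name}: {value:.4f}")
```

Features of this kind compress each vibration segment into a short vector, which is why the quality of methods built on them depends on how well the chosen statistics capture the fault signature.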

CWRU Diagnosis Results and Discussion.
The diagnostic results of the testing samples for the proposed DCDAN method and the compared approaches on the CWRU dataset are given in Table 3. Without domain adaptation, the average diagnostic accuracy is only 82.88%. Two existing
To better demonstrate the superiority of the proposed method, t-distributed stochastic neighbor embedding (t-SNE) is used to visualize the output features [37]. Taking experiment N1⟶N2 as an example, the t-SNE visualizations of the last layer features learned by the comparison methods and DCDAN are shown in Figure 6. It can be seen from
Therefore, the superiority of the proposed method is further verified.

Bearing Test Rig Experiment Description.
In order to further verify the proposed method, this paper establishes a practical bearing test rig as shown in Figure 7. A rolling bearing is used as the test bearing in the experiment. Five bearing health conditions (N, IF, OF, BF, and a compound fault (OF and BF)) are simulated in the experiment. The accelerometer is installed on the bearing housing and is used to collect vibration data. The sampling frequency is set to 20 kHz and the sampling time is 220 s. In the experiment, data on the five health conditions are collected at rotating speeds of 600, 1200, and 1800 rpm, respectively.
Thus, this experiment studies six domain adaptation tasks as shown in Table 4. Each domain consists of five health conditions, and 1000 samples are taken from each health condition without the use of data enhancement technology. The output of the F3 layer of the DCDAN network is 5. Other experimental settings are similar to those of the CWRU experiment.

Bearing Test Rig Diagnosis Results and Discussion.
Because the rotating speeds in the CWRU data are relatively close to one another, that dataset does not fully reflect fault diagnosis in real industries. For this reason, three working conditions with large speed differences are selected in this experiment to better suit real industrial applications, which also increases the difficulty of cross-domain fault diagnosis. The accuracy of cross-domain fault diagnosis in this experiment is shown in Table 5. Without domain adaptation, the average diagnostic accuracy is only 39.94%. This shows that the feature distributions of the five health conditions differ considerably under our experimental settings and that cross-domain fault diagnosis is more difficult than in the CWRU experiment. The proposed method can achieve more than 86% average diagnostic accuracy. Meanwhile, the diagnostic accuracies of TCA, DDC, and DCDAN-WDC are 49.92%, 66.45%, and 72.62%, respectively. This shows that the proposed DCDAN can effectively achieve cross-domain fault diagnosis even when the working conditions are very different. It is worth noting that the average accuracy between T1 and T2 is 86.10%, the average accuracy between T2 and T3 is 91.93%, and the average accuracy between T1 and T3 is 82.87%. This indicates that the greater the variation of working conditions, the greater the difficulty of cross-domain fault diagnosis.
Taking experiment T1⟶T2 as an example, the t-SNE visualizations of the last layer features learned by CNN, TCA, DDC, DCDAN-WDC, and DCDAN are shown in Figure 8. It can be seen from Figures 8(a) and 8(b) that, with the CNN and TCA methods, the target domain features are not separated. Similarly, DDC and DCDAN-WDC align the features of some health conditions, but the target domain features are not completely separated. In contrast, Figure 8(e) shows that the features of each of the five health conditions are well clustered and that the target domain features are largely separated. This shows that the proposed method can achieve bearing fault diagnosis under different working conditions and has strong robustness and generalization ability.

Table 4: Domain adaptation tasks (rotating speeds in rpm).
Task            T1 ⟶ T2   T1 ⟶ T3   T2 ⟶ T1   T2 ⟶ T3   T3 ⟶ T1   T3 ⟶ T2
Source domain   600       600       1200      1200      1800      1800
Target domain   1200      1800      600       1800      600       1200

Conclusion
Intelligent fault diagnosis in real industry suffers from a decline in diagnostic performance due to changing working conditions. To address this issue, this study proposes a novel domain adaptation method for bearing fault diagnosis under different working conditions. In our study, the MK-MMD of three fully connected layers between the two domains is minimized and the recognition error of a domain classifier placed after the high-dimensional features is maximized to better learn domain-invariant features. Extensive experiments on two datasets show that the DCDAN outperforms the comparison methods. In the CWRU experiment, DCDAN improves the average accuracy over the comparison methods (CNN, TCA, DDC, and DCDAN-WDC) by between 1.01% and 16.74% (1.01%, 5.93%, 13.67%, and 16.74%), and the average accuracy of the DCDAN reaches 99.35%. The visualizations of the output features verify that the proposed method obtains a more accurate feature distribution alignment across domains and good clustering of the target domain. On the bearing test rig dataset, the proposed method achieves more than 86% average diagnostic accuracy, although the working conditions are very different. The comparison with the other methods further demonstrates the effectiveness and superiority of the DCDAN. Therefore, the proposed method has a wide range of industrial application prospects.

Data Availability
All data included in this study are available upon request by contacting the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.