Deep Domain Adaptation Model for Bearing Fault Diagnosis with Riemann Metric Correlation Alignment

In many real-world fault diagnosis applications, frequent changes in working conditions cause the distribution of the labeled training data (source domain) to differ from that of the unlabeled test data (target domain), which degrades performance. To solve this problem, this study proposes an end-to-end unsupervised domain adaptation model for bearing fault diagnosis that combines Riemann metric correlation alignment with a one-dimensional convolutional neural network (RMCA-1DCNN). The second-order statistic alignment of a specific activation layer between the source and target domains is treated as a regularization term and embedded in the deep convolutional neural network architecture to compensate for domain shift. Experimental results on the Case Western Reserve University motor bearing database demonstrate that the proposed method has strong fault-discriminative and domain-invariant capacity and therefore achieves higher diagnosis accuracy than the other methods compared.


Introduction
Rolling bearings are key components in heavy-duty machinery and manufacturing systems and are widely used in modern industries. However, unexpected bearing faults during long-term operation lead to large maintenance costs and safety losses [1]. In the past decades, machine learning and statistical inference techniques have been intensively studied and have become increasingly popular because they process collected signals rapidly and efficiently and provide reliable fault diagnosis results without prior expertise [2][3][4][5]. Recently, with the development of deep learning, the performance of fault diagnosis has been remarkably improved. Excellent performance in various fault diagnosis applications is reported in [6][7][8][9][10][11][12][13].
Data-driven techniques for fault diagnosis generally assume that training and testing data are derived from the same distribution. However, in real-world applications, the distributions of training and testing data are often different from each other due to changes in the environment, working conditions, and bearing quality. Consequently, fault diagnosis systems suffer from large performance degradation.
To address this challenge without rebuilding the model for every new condition, a domain adaptation technique is needed, that is, one that adapts a learning model built in a source domain to a different but related target domain. Many studies have reported that domain adaptation is beneficial and promising in areas that include image classification, natural language processing, object recognition, and feature learning [14][15][16].
Domain adaptation has recently been introduced into the field of fault diagnosis, in which the fault diagnosis model parameters or input features are adjusted to compensate for the mismatch.
Zhang et al. [17] took raw vibration signals as inputs of a deep convolutional neural network with wide first-layer kernels (WDCNN) and used adaptive batch normalization (AdaBN) as the domain adaptation algorithm to realize fault diagnosis under different load conditions and noisy environments. Lu et al. [18] introduced a deep CNN model with domain adaptation for fault diagnosis; this model integrated the maximum mean discrepancy as a regularization term into the loss function to reduce the cross-domain distribution difference. Zhang et al. [19] developed an adversarial domain adaptation model for fault diagnosis, which comprises a source feature extractor, a target feature extractor, a domain discriminator, and a label classifier. Jian et al. [20] proposed a fusion CNN model that combines one-dimensional CNN (1DCNN) and Dempster-Shafer evidence theory to enhance cross-domain adaptive capability for fault diagnosis. Li et al. [21] presented a deep domain adaptation method for bearing fault diagnosis that minimizes multikernel maximum mean discrepancies between domains in multiple layers so that representations learned from the source domain can be applied to the target domain.
The main contributions of this study are as follows:
(1) We propose an end-to-end approach that directly takes raw temporal signals as inputs and requires neither time-consuming denoising preprocessing nor a separate feature extraction algorithm.
(2) We combine RMCA with 1DCNN for bearing fault diagnosis in a unique domain adaptation pipeline, RMCA-1DCNN, which can learn fault-discriminative and domain-invariant features between domains.
(3) Extensive experiments on Case Western Reserve University (CWRU) bearing datasets demonstrate that RMCA-1DCNN achieves superior performance to that of existing baseline methods.
The rest of the paper is outlined as follows. In Section 2, we discuss some necessary theoretical background on unsupervised domain adaptation and CNN. In Section 3, we present our RMCA-1DCNN unsupervised domain adaptation fault diagnosis model. We report a broad experimental validation in Section 4. Finally, we provide conclusions in Section 5.

Theoretical Background
2.1. Unsupervised Domain Adaptation. Domain adaptation is a special case of transfer learning. When the data distributions of the source (training data) and target (testing data) domains differ but the two tasks are the same, the transfer learning problem is called domain adaptation, which can be divided into two classes: supervised and unsupervised adaptation. If the source data have labels and the target data have no labels, then it is called unsupervised domain adaptation. Its formal definition is as follows:
Definition 1. (domain). A domain $\mathcal{D}$ comprises two components: a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$ is a dataset, that is, $\mathcal{D} = \{\mathcal{X}, P(X)\}$.

Definition 2. (task). A task is the learning goal. A task $\mathcal{T}$ comprises two components: a label space $\mathcal{Y}$ and a predictive function $f(X)$ corresponding to the labels, i.e., $\mathcal{T} = \{\mathcal{Y}, f(X)\}$. The predictive function $f(X) = Q(Y \mid X)$ is also the conditional probability distribution, and $Y \in \mathcal{Y}$.
Consider a labeled source domain $D_S = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ and an unlabeled target domain $D_T = \{x_j^t\}_{j=1}^{N_t}$, where $N_s$ and $N_t$ are the numbers of samples of the source and target domains, respectively. The feature spaces of $D_S$ and $D_T$ (that is, $\mathcal{X}_s = \mathcal{X}_t$), the label spaces (that is, $\mathcal{Y}_s = \mathcal{Y}_t$), and the conditional probability distributions $Q_s(y_s \mid x_s) = Q_t(y_t \mid x_t)$ are assumed to be the same. However, the marginal probability distributions of the two domains are different, that is, $P_s(x_s) \neq P_t(x_t)$. Domain adaptation aims to use the labeled $D_S$ to learn a classifier $f(x_t) \to y_t$ for predicting the labels $y_t$ of $D_T$.
To reduce the distribution divergence between domains, learning strategies of domain adaptation can be roughly divided into two categories, namely, instance transfer and feature matching. The former reweights the source domain data according to the shared information contained in the target data and then further analyzes the reweighted source data [22,23]. The latter either performs subspace learning by utilizing the subspace geometrical structure [24][25][26][27] or performs distribution alignment to reduce the marginal or conditional distribution divergence between domains [28][29][30][31]. Within feature matching, some approaches based on deep neural and adversarial networks have demonstrated superior performance on domain adaptation benchmark datasets [32][33][34][35][36].

Deep Correlation Alignment.
In the activations computed at a given layer of a deep neural network, $x_i^s$ and $x_i^t$ are the d-dimensional representations of the source and target samples, respectively. $C_S$ and $C_T$ are the covariance matrices of the source and target features. According to [27,37], $C_S$ and $C_T$ are defined as follows:
$$C_S = \frac{1}{N_S - 1} X_S^{\top} P X_S, \qquad C_T = \frac{1}{N_T - 1} X_T^{\top} P X_T, \tag{1}$$
where $X_S \in \mathbb{R}^{N_S \times d}$ and $X_T \in \mathbb{R}^{N_T \times d}$ stack the source and target features as rows, and $P$ is the centering matrix [38]. Taking the source domain as an example, $P$ is an $N_S \times N_S$ matrix with the value
$$P = I_{N_S} - \frac{1}{N_S} \mathbf{1}\mathbf{1}^{\top}. \tag{2}$$
To minimize the distribution discrepancy between the second-order statistics (covariances) of the source and target features, we define the CORAL loss [27,36] as
$$L_{CORAL} = \frac{1}{4d^2} \left\| C_S - C_T \right\|_F^2, \tag{3}$$
where $\|\cdot\|_F^2$ represents the squared matrix Frobenius norm.
A covariance matrix is symmetric positive definite and therefore lies on a Riemannian manifold, on which the Euclidean distance is suboptimal. The Log-Euclidean distance is an approximate Riemannian metric that can effectively capture the manifold structure. Using the Log-Euclidean distance, the RMCA loss [39] is redefined as
$$L_{RMCA} = \frac{1}{4d^2} \left\| U \,\mathrm{diag}(\log \sigma_1, \ldots, \log \sigma_d)\, U^{\top} - V \,\mathrm{diag}(\log \mu_1, \ldots, \log \mu_d)\, V^{\top} \right\|_F^2, \tag{4}$$
where $U$ and $V$ are the matrices that diagonalize $C_S$ and $C_T$, respectively, and $\sigma_i$ and $\mu_i$, $i = 1, \ldots, d$, are the corresponding eigenvalues. The normalization term $1/d^2$ makes the loss independent of the size of the features.
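The two alignment losses described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the 1/(N − 1) covariance normalization, and the small `eps` added to the diagonal before taking logarithms are assumptions made here for a runnable example.

```python
import numpy as np

def covariance(feats):
    """Sample covariance of an (N, d) feature batch using the centering matrix P.
    The 1/(N - 1) normalization is an assumption of this sketch."""
    n = feats.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n          # N x N centering matrix
    return feats.T @ P @ feats / (n - 1)

def coral_loss(cs, ct):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = cs.shape[0]
    return np.sum((cs - ct) ** 2) / (4 * d * d)

def spd_log(c, eps=1e-6):
    """Matrix logarithm U diag(log sigma) U^T of a symmetric matrix.
    eps (hypothetical) keeps the eigenvalues strictly positive."""
    vals, vecs = np.linalg.eigh(c + eps * np.eye(c.shape[0]))
    return (vecs * np.log(vals)) @ vecs.T        # scales column j by log(vals[j])

def rmca_loss(cs, ct):
    """Log-Euclidean distance between covariances, scaled by 1/(4 d^2)."""
    d = cs.shape[0]
    return np.sum((spd_log(cs) - spd_log(ct)) ** 2) / (4 * d * d)
```

Both losses take covariance matrices of the same size d × d, so the source and target batches may contain different numbers of samples.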

Convolutional Neural Network (CNN).
CNNs are characterized by local receptive fields, shared weights, and spatial subsampling. A standard CNN comprises input, convolution, pooling (subsampling), fully connected, and output layers. We focus on 1DCNN because vibration signals are one-dimensional. Compared with two-dimensional data, the one-dimensional representation is simple and intuitive: the signal can be regarded as an image with a height of 1. In 1DCNN, the forward propagation from a previous convolution layer $l-1$ to the input of a neuron in the current layer $l$ can be expressed as follows:
$$x_j^{l} = b_j^{l} + \sum_{i} W_{ij}^{(l)} * H_i^{l-1}, \tag{5}$$
where $x_j^{l}$ is the input of the $j$-th neuron at layer $l$; $*$ denotes one-dimensional convolution; $b_j^{l}$ is the bias of the $j$-th neuron at layer $l$; $H_i^{l-1}$ is the output of the $i$-th neuron at layer $l-1$; and $W_{ij}^{(l)}$ represents the weight of the kernel that connects the $i$-th neuron at layer $l-1$ to the $j$-th neuron at layer $l$. The pooling layer usually follows the convolution layer and samples the features based on the following rule:
$$H_j^{l} = \mathrm{down}\left(H_j^{l-1}\right), \tag{6}$$
where $\mathrm{down}(\cdot)$ denotes the max or average pooling function. After passing through multiple convolutional and pooling layers, the CNN classifies the extracted features through the fully connected and softmax layers and obtains the labels of the samples.
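The convolution and pooling steps described above can be illustrated with a small NumPy sketch. The function names and array layouts are assumptions chosen for this example, and the convolution is implemented as a "valid" cross-correlation, which is the usual convention in CNN libraries.

```python
import numpy as np

def conv1d_forward(h_prev, w, b):
    """Pre-activation input of layer l.

    h_prev: (C_in, T) outputs of layer l-1
    w:      (C_in, C_out, K) kernels, w[i, j] connects channel i to channel j
    b:      (C_out,) biases
    Returns an array of shape (C_out, T - K + 1).
    """
    c_in, t = h_prev.shape
    _, c_out, k = w.shape
    x = np.zeros((c_out, t - k + 1))
    for j in range(c_out):
        for i in range(c_in):
            # 'valid' cross-correlation of one input channel with one kernel
            x[j] += np.correlate(h_prev[i], w[i, j], mode="valid")
        x[j] += b[j]
    return x

def max_pool1d(h, size):
    """Non-overlapping max pooling along time; trailing samples are dropped."""
    c, t = h.shape
    t_out = t // size
    return h[:, : t_out * size].reshape(c, t_out, size).max(axis=2)
```

A ReLU activation, `np.maximum(x, 0)`, would typically be applied between the two functions.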

Riemannian Metric Correlation Alignment Loss.
In unsupervised domain adaptation fault diagnosis, the source domain data have labels, and the cross-entropy $H(X_S, Y_S)$ is defined as
$$H(X_S, Y_S) = -\sum_{i=1}^{N_S} \left\langle y_i^s, \log f(x_i^s, \theta) \right\rangle, \tag{7}$$
where, for each source sample $x_i^s$, $y_i^s$ is the ground-truth label and $f(x_i^s, \theta)$ is the network prediction. Because bearing fault diagnosis is a multiclass classification problem, the final deep features must be both sufficiently discriminative to train strong classifiers and invariant across domains. Minimizing the classification loss alone might lead to overfitting on the source domain and reduce performance on the target domain. Meanwhile, minimizing the RMCA loss alone is likely to produce degenerate features. Therefore, we train jointly with the cross-entropy loss on the source domain and the second-order statistical alignment of the given layer between the source and target domains, and define the final loss function as follows:
$$L = L_{CLASS} + \alpha L_{RMCA}, \tag{8}$$
where $L_{CLASS}$ denotes the classification loss on the labeled source domain data, $L_{RMCA}$ denotes the Log-Euclidean metric of the second-order statistics between the source and the target, and the hyperparameter $\alpha$ controls the strength of the domain confusion.
By considering the two losses together, the network not only learns features that classify well but also reflects the statistical structure of the source and the target and avoids overfitting. During model training, objective function (8) is minimized by gradient descent on $\theta$. The final learned features are expected to work well on the target domain.
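As a rough illustration of how the joint objective (8) combines its two terms, the sketch below computes a source cross-entropy from softmax outputs and adds a weighted alignment term. The function names, the use of integer class labels, and the summed (rather than averaged) cross-entropy are assumptions of this example; the alignment value is passed in as a precomputed number.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Summed cross-entropy over a batch.

    probs:  (N, K) softmax outputs of the network on source samples
    labels: (N,) integer class indices (equivalent to one-hot ground truth)
    """
    n = probs.shape[0]
    picked = np.clip(probs[np.arange(n), labels], 1e-12, 1.0)  # avoid log(0)
    return -np.sum(np.log(picked))

def total_loss(probs_src, labels_src, align_loss, alpha):
    """Classification loss plus alpha times the alignment (e.g. RMCA) loss."""
    return cross_entropy(probs_src, labels_src) + alpha * align_loss
```

With `alpha = 0` the objective reduces to plain source-only training, which is a convenient baseline when tuning the trade-off.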

RMCA-1DCNN Fault Diagnosis Model.
RMCA-1DCNN is proposed to solve the cross-domain learning problem in the bearing fault diagnosis area. As shown in Figure 1, a DCNN is used as the main architecture, and the model employs a domain adaptation layer with the Riemannian metric correlation alignment loss before the classifier. The labeled source and unlabeled target data are fed into the RMCA-1DCNN model in the training process. Then, domain-invariant features of the raw vibration signals are extracted through the multiple convolutional and pooling layers. The distribution discrepancy is minimized at the fully connected layers. Theoretically, correlation alignment can be performed at multiple layers in parallel. Empirical evidence [36,37] shows that solid performance is obtained even if this alignment is conducted only once. As a common practice, correlation alignment is performed after the last fully connected layer. Joint training with the classification loss and the second-order statistic loss between the two domains in the given layer adapts the representations learned in the source domain for application to the target domain. The domain-invariant features can be efficiently extracted to improve the cross-domain testing performance (Table 1).

Architectural Design of 1DCNN.
Because the vibration signals of bearings collected by acceleration sensors are usually one-dimensional, 1DCNN is adopted in this study to process the vibration signals for bearing fault diagnosis. The network structure comprises four convolutional and pooling layers, a fully connected layer, and a softmax layer at the end. The first convolutional layer uses wide kernels for feature extraction and high-frequency noise suppression. Small convolutional kernels in the subsequent layers are used to deepen the network for multilayer nonlinear mapping and to prevent overfitting [17]. The parameters of 1DCNN are detailed in Table 2. The pooling type is max pooling, and the activation function is ReLU. The ADAM stochastic optimization algorithm is applied to train the model to minimize the loss function, and the learning rate is set as 1e-3. The experiments are conducted using the TensorFlow toolbox of Google.
Figure 1: Framework of RMCA-1DCNN.
[40]

Experimental Analysis of the
The data were collected from a motor-driven mechanical system under four different loads (0, 1, 2, and 3 hp), at three different locations (the fan end, the drive end, and the base), and at sampling frequencies of 48 and 12 kHz. The four health types of the bearing are normal condition (N), outer race fault (OF), inner race fault (IF), and roller fault (RF). Each fault type contains fault diameters of 0.007, 0.014, and 0.021 inch. Therefore, together with the normal condition, we have 10 health conditions in total (three fault types at three sizes plus normal).
In this paper, the vibration signals of different fault locations and health states with a sampling frequency of 12 kHz at the drive end of the rolling bearing are selected for experimental research. The detailed description of the datasets is shown in Table 2. Three datasets are acquired under three loads of 1, 2, and 3 hp. Each dataset contains training and testing samples, and each sample contains 2,048 data points. An overlap sampling technique is used to increase the number of training samples, that is, consecutive training samples overlap to augment the data [17]. However, no overlap occurs among the testing samples. Overall, each dataset comprises 6,600 training samples and 250 test samples of 10 health states.
Among the compared methods, (1)-(3) and (5) work with data transformed by the fast Fourier transform, and (4) is a CNN-based method that works with the normalized raw signals.
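The overlap sampling described above amounts to cutting a long recording into fixed-length windows whose start points are closer together than the window length. The sketch below is an illustration only: the function name and the stride values are hypothetical, since the text states the sample length (2,048 points) but not the shift between consecutive training samples.

```python
import numpy as np

def segment(signal, length=2048, stride=2048):
    """Cut a 1-D signal into windows of `length` points, one every `stride` points.

    stride < length gives overlapping (augmented) training samples;
    stride == length gives non-overlapping samples, as used for testing.
    """
    if len(signal) < length:
        raise ValueError("signal is shorter than one window")
    n = (len(signal) - length) // stride + 1
    return np.stack([signal[i * stride : i * stride + length] for i in range(n)])
```

For example, `segment(x, 2048, 64)` would produce many more training windows from the same recording than `segment(x, 2048, 2048)`; the value 64 is a placeholder, not a setting taken from the paper.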

Experiment
For a fair comparison, we adopt accuracies reported by other authors with the same setting or conduct experiments using the source code provided by the authors. As shown in Figure 3, the performance of SVM, MLP, and DNN in domain adaptation is poor, and their average accuracy in the six scenarios is 66.63%, 80.40%, and 78.05%, respectively. This result indicates that the sample distribution differs under varying conditions and that a model trained in one condition is unsuitable for fault diagnosis and prediction in another condition.
Compared with some recent methods, such as WDCNN (AdaBN) and A2CNN, our method achieves an average accuracy of 99.33%, which is evidently higher than that of all the baseline methods.
In five out of six shifts, that is, A ⟶ B, A ⟶ C, B ⟶ C, C ⟶ B, and C ⟶ A, the fault diagnosis accuracy of the proposed method achieves state-of-the-art domain adaptation performance, and the first four domain shifts reach 100%. In the domain-shift scenario of B ⟶ A, the accuracy of the proposed method is 0.18% lower than that of the A2CNN method but far better than that of the SVM, MLP, DNN, WDCNN, and WDCNN (AdaBN) methods. On this basis, RMCA-1DCNN can learn fault-discriminative and domain-invariant features and effectively solve the domain adaptation problem caused by different loads of bearing data.

Precision and Recall for a fault category c are defined as
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
where True Positive (TP) represents the number of faults correctly identified as fault category c, False Positive (FP) represents the number of faults wrongly identified as fault category c, and False Negative (FN) represents the number of faults of category c incorrectly labeled as not belonging to c.
F-Measure is used as a reference for diagnosis analysis. It denotes the weighted harmonic average of Precision and Recall, with α as the weight. When α > 1, Precision is more important; when α < 1, Recall is more important. In this study, α is set to 1, which indicates that Precision is as important as Recall and gives
$$F\text{-Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
This evaluation method considers both Precision and Recall. The highest F-Measure is 1, and an F-Measure close to 1 indicates better fault diagnosis performance. Precision, Recall, and F-Measure of each health state in the RMCA-1DCNN approach are shown in Table 3.
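The per-class Precision, Recall, and F-Measure (with α = 1) discussed above can be computed from true and predicted labels through a confusion matrix. The following is a minimal sketch; the function name and the convention of returning 0 for a class with no predictions or no samples are assumptions of this example.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Return per-class (precision, recall, f1) arrays of length n_classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                # predicted as c but belonging elsewhere
    fn = cm.sum(axis=1) - tp                # belonging to c but predicted elsewhere
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```

A Precision of 0.86 for one class, for instance, means that 14% of the samples assigned to that class actually belong to other classes.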
In Table 3, for the first fault type, that is, the roller fault with a size of 0.007 inch, the RMCA-1DCNN method has low Precision in the domain-shift scenarios B ⟶ A and C ⟶ A, at 86% and 83%, respectively. Thus, approximately 15% of the alerts for this fault type in the two domain shifts are unreliable. For the fourth fault type, the Precision of the proposed method in the domain shift B ⟶ A is 93%, indicating that 7% of the samples assigned to this fault category are incorrectly classified.

Figure 3: Accuracy on six domain shifts of the proposed and compared methods.

In Table 3, for the third fault type, that is, the roller fault with a size of 0.021 inch, the Recall of the RMCA-1DCNN method in the domain-shift scenarios B ⟶ A and C ⟶ A is low at 80%. This finding indicates that 20% of these faults are undetected.
In short, Precision, Recall, and F-Measure of the RMCA-1DCNN method are all high. Except for the first, the third, and the fourth fault types, the RMCA-1DCNN method assigns all samples to the correct classes. These results suggest that the classification performance of the proposed method is considerably improved after Riemann metric correlation alignment.

Parameter Sensitivity.
In this section, we study the hyperparameter α, which is a critical coefficient to be selected by cross-validation. A high value of α may force the network to learn oversimplified low-rank feature representations. Although such a high value may lead to perfectly aligned covariances, the resulting features may not be useful for classification. Meanwhile, a small α may be insufficient to bridge the domain shift.
Three typical domain-transfer scenarios, namely, B ⟶ A, C ⟶ A, and B ⟶ C, are selected. The results for different values of α are shown in Figure 4. Similar trends are observed in the other domain-transfer scenarios. As shown in Figure 4, any α in the range [10, 25] yields better results than those of the best baseline methods. When the value of α is larger than 25, the accuracy rapidly decreases in the three domain shifts. The effectiveness and robustness of the proposed method are thus further verified.
Furthermore, the second-order statistics of the specific activation layer in the source and target domains belong to the Riemannian manifold. When the classifier is in the optimal state, the entropy on the source domain is minimized, and the entropy on the target domain is also minimized because the two domains are indistinguishable after the alignment [37]. Given that the target domain data are unlabeled, the entropy E on the target domain is defined as
$$E(X_T) = -\sum_{j=1}^{N_t} \left\langle f(x_j^t, \theta), \log f(x_j^t, \theta) \right\rangle,$$
which depends on nothing but the network predictions. Figure 5 shows the plots of target entropy and diagnosis accuracy for α ∈ {0.1, 0.5, 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30} on the domain-transfer scenario of C ⟶ A. When α = 17, the target entropy is minimal and the diagnosis accuracy is the best, that is, the minimal target entropy corresponds to the maximum performance on the target. Note that the values of α giving the best performance lie in the range [10, 25], which also indicates that target entropy minimization is necessary but not sufficient for domain adaptation. Therefore, in the unsupervised domain adaptation of bearing fault diagnosis, the selection of the hyperparameter α is compatible with the data-driven cross-validation strategy when using the Riemann metric correlation alignment [37].
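Because the target entropy depends only on network predictions, it can be evaluated without any target labels, which is what makes it usable as a criterion for choosing α. The sketch below is an assumption-laden illustration: the function name is hypothetical, the entropy is summed over samples, and the predictions are assumed to be softmax outputs.

```python
import numpy as np

def target_entropy(probs, eps=1e-12):
    """Summed prediction entropy over target samples.

    probs: (N, K) softmax outputs on unlabeled target data.
    eps guards against log(0) for fully confident predictions.
    """
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p))
```

One would train a model for each candidate α, evaluate `target_entropy` on the target predictions of each, and keep the α with the smallest value.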

Performance under Noise Environment.
In a realistic industrial environment, vibration signals are easily contaminated by noise. This section discusses the diagnosis accuracy of the proposed RMCA-1DCNN method in a noisy environment. In our experiments, for the six domain-shift scenarios, the source domain data remain the same, and noise is added only to the target data to enlarge the distribution gap between the source and the target. The added noise is additive white Gaussian noise, and the signals are compounded at different signal-to-noise ratios (SNR). The SNR is defined as follows:
$$\mathrm{SNR}_{dB} = 10 \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right),$$
where $P_{signal}$ and $P_{noise}$ are the power of the signal and the noise, respectively. By definition, the more noise the signal contains, the smaller the SNR value is. Figure 6 shows the original signal of the inner race fault, the additive white Gaussian noise signal, and the composite of the two signals at an SNR of 0 dB. The composite signal is seriously contaminated by noise, and it is difficult to visually distinguish the vibration features of the source signal.

Figure 4: Accuracy corresponding to different α on three domain shifts.

To verify the antinoise performance of the proposed method, we test the RMCA-1DCNN method with noisy signals ranging from −2 dB to 10 dB. The results are shown in Table 4. The diagnosis accuracy increases as the SNR value increases and decreases as the SNR value decreases. When the SNR is more than 4 dB, the accuracy reaches above 97%. The reason is that the larger the SNR value is, the less noise there is in the composite signal, the less the fault features are affected by noise, and the better the model performance is. Conversely, the smaller the SNR value is, the greater the noise in the composite signal is; the noise masks most of the vibration signal, resulting in a loss of fault characteristic information and worse model performance. Furthermore, we consider that the RMCA-1DCNN method works better when the gap between the source and the target is small and only moderately well when the gap is large.
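The noise injection described above can be reproduced by scaling white Gaussian noise so that the signal-to-noise power ratio matches a requested SNR in dB. The sketch below is illustrative only; the function names and the use of NumPy's `default_rng` generator are choices made for this example.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Return (noisy_signal, noise) with noise power set by the target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)                 # average signal power
    p_noise = p_signal / (10 ** (snr_db / 10))      # invert SNR_dB = 10 log10(Ps / Pn)
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise

def measure_snr_db(signal, noise):
    """Empirical SNR in dB from the signal and noise powers."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))
```

At 0 dB the noise carries the same power as the signal, which is why the composite waveform in that setting is hard to interpret by eye.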

Network Visualizations.
The features of the source and target domain test data in the last hidden layer are reduced to two dimensions and visualized using the t-SNE dimension reduction technique to further understand the influence of RMCA-1DCNN on network training. Taking the domain-shift scenarios of B ⟶ A, C ⟶ A, and B ⟶ C as examples, the features in the last hidden layer of the source and the target are shown in Figure 7. As presented in Figures 7(a) and 7(b), the domain-shift scenario of B ⟶ C has no overlap between classes, and the distance between different classes is large, that is, the features are highly separable. Therefore, the experiment achieves a test accuracy of 100%.
In the domain-shift scenarios of B ⟶ A and C ⟶ A, Figures 7(a) and 7(b) show that individual samples of the roller fault with a size of 0.021 inch are wrongly placed near the roller fault with a size of 0.007 inch, and overlaps are observed in the signal features between the two classes. In other words, the model has insufficient discrimination for the two kinds of signals. Hence, individual samples may be misclassified. This result is consistent with the diagnosis accuracy of 98%.

Conclusions
We designed an RMCA-1DCNN model in this study to solve cross-domain learning problems in the bearing fault diagnosis field. RMCA-1DCNN aims to extract domain-invariant features that bridge the cross-domain discrepancy while strengthening the fault-discriminative capacity between the two domains. The experimental results on CWRU bearing datasets confirm the superiority of the proposed method. In future work, we will attempt to apply correlation alignment at multiple layers of 1DCNN in parallel, possibly further improving the domain adaptation performance of the proposed model.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.