A New Framework Based on Supervised Joint Distribution Adaptation for Bearing Fault Diagnosis across Diverse Working Conditions

To address the degradation of diagnostic performance caused by data distribution differences and the scarcity of labeled fault data, this study focuses on transfer learning-based cross-domain fault diagnosis, which has attracted considerable attention. However, deep transfer learning-based methods are often time-consuming and costly, particularly when tuning hyperparameters. To address this issue, and building on classical feature-based transfer learning, this study introduces a new framework for bearing fault diagnosis based on supervised joint distribution adaptation and feature refinement. It first uses ensemble empirical mode decomposition to process raw signals and then extracts statistical features. Then, a new feature refinement module is designed to refine domain adaptation features from the high-dimensional feature set by evaluating the fault distinguishability and working-condition invariance of the feature data. Next, a supervised joint distribution adaptation method is proposed to conduct improved joint distribution alignment that preserves neighborhood relationships within a manifold subspace. Finally, an adaptive classifier is trained to predict fault labels of feature data across varying working conditions. To demonstrate the cross-domain fault diagnosis performance of the proposed methods, experiments are conducted on two bearing datasets. The results verify that the model built by the proposed framework achieves desirable diagnosis performance under different working conditions and clearly outperforms the comparative models.


Introduction
In the last several years, with the rapid and sustained advancement of modern industrial equipment, rotating machinery has played a major role in various production scenarios, such as transportation, mining, logistics, electricity, and manufacturing [1]. The bearing is one of the most important units of industrial machinery, and a bearing malfunction may cause serious accidents and economic losses. Moreover, bearings typically operate under complicated conditions, which makes them prone to malfunction. Importantly, it is usually difficult to collect fault samples from real-world mechanical facilities under variable operating conditions [2].
Therefore, when facing real-world industrial scenes, most existing artificial intelligence-based fault diagnosis techniques for rolling bearings still suffer from challenges such as data distribution differences and inadequate fault samples [3,4].
Artificial intelligence technologies applied to bearing fault diagnosis are mainly divided into three classes: the classical machine learning-based method (CMLM), the deep learning-based method (DLM), and the transfer learning-based method (TLM) [5,6]. CMLM, which has been studied widely for many years, includes the support vector machine (SVM) [7], artificial neural network (ANN) [8], k-nearest neighbor (KNN) [9], extreme learning machine (ELM) [10], and random forest (RF) [11]. These methods have some major drawbacks, including heavy reliance on expert knowledge under variable working conditions and a default assumption that all samples share the same probability distribution [3,6]. At present, DLM has attracted widespread attention because of its powerful ability to automatically extract deep features with better representation performance. Commonly studied approaches include the deep auto-encoder (DAE) [12], deep residual network [13], deep belief network (DBN) [14], and convolutional neural network (CNN) [15]. Nevertheless, several shortcomings of DLM remain prominent [1,3]. In particular, fault diagnosis of rotating machinery based on traditional DLM assumes that the data under diverse working conditions follow an identical distribution, which conflicts with the distribution shift observed under actual operating conditions. Furthermore, a DLM-based bearing fault diagnosis model requires sufficient training samples to achieve ideal performance, which contradicts the shortage of fault data in actual industrial scenes. In addition, DLM usually involves a costly and time-consuming procedure to tune numerous hyperparameters [3].
To date, TLM has received increasing attention in cross-domain fault diagnosis (CFD) because its distribution adaptation ability is expected to tackle the above challenges of CMLM and DLM. TLM intends to learn related domain knowledge from the source domain (SD) and apply it to the target domain (TD). In the bearing fault diagnosis field, a fault dataset collected under one working state can constitute a domain. Transfer learning methods can be mainly divided into two classes: classical manual feature extraction-based transfer learning (TL) approaches and deep transfer learning (DTL) approaches [3,16]. Although DTL methods have attracted increasing attention in bearing fault diagnosis across different working conditions, they still have some drawbacks. A common and important one is that a desirable DTL-based fault diagnosis model requires a costly and time-consuming procedure because numerous hyperparameters must be adjusted. Accordingly, in this article, we focus on the typical feature-based TL approach to achieve desirable CFD of rolling bearings in real-world industrial scenarios. Commonly studied feature-based TL methods mainly include balanced distribution adaptation (BDA) [17], joint distribution adaptation (JDA) [18], transfer component analysis (TCA) [19], geodesic flow kernel (GFK) [20], and joint geometrical and statistical alignment (JGSA) [21]. Based on these methods, some intelligent models for cross-domain diagnosis have been investigated. In [22], a transfer deep learning network was proposed to resolve the drawbacks of existing deep learning-based rolling bearing fault algorithms; in this network, feature transfer is performed using TCA and a pretrained convolutional neural network. In [23], a source domain multisample JDA (SM-JDA) approach was used for bearing fault diagnosis under variable operating conditions. In [24], BDA was introduced to facilitate domain adaptation in bearing cross-domain fault diagnosis. In [25], aiming at the domain shift (distribution discrepancy) issue in bearing fault diagnosis, the multikernel joint distribution adaptation (MKJDA) with dynamic distribution alignment was proposed. In [3], based on BDA, a new balanced adaptation regularization was designed to solve the degradation of CFD performance caused by sample distribution discrepancy. In [26], an adaptive manifold probability distribution was studied for CFD; in this method, the GFK was implemented for distribution adaptation, and a domain adaptive classifier was further trained to diagnose the target domain under different working conditions. In [27], transfer sparse coding and JGSA were combined to construct a novel fault diagnosis approach for bearings under different operating conditions.
Although the above-mentioned methods have successfully realized CFD of bearings, three issues still block their application in actual industrial scenarios. (1) In most studies, distribution adaptation is implemented by aligning probability distributions in the original feature space, which makes it difficult to tackle feature distortion and may lead to poor domain adaptation (DA) performance [28]. (2) Most distribution adaptation methods in TLM concentrate merely on decreasing probability distribution differences and enhancing the transferability of features, while the class distinguishability of features is usually neglected, which may lead to poor classification performance [29,30]. (3) In the process of distribution adaptation, the impact of class information and neighborhood relationships of feature data has not been effectively considered, which may restrict the CFD accuracy and generalization ability of the model [28,31].
Considering the three issues of the above-mentioned TLM approaches, we investigate a new DA idea, namely, joint distribution alignment with neighborhood relationship preserving in a manifold subspace. Moreover, to improve the DA capability, we consider the impact of the fault discriminability and working-condition invariance (WCI) of features in the DA procedure. Therefore, we design a feature refinement module to refine features with better domain adaptability from the original high-dimensional feature set (OHFS). In view of the above discussion, this study proposes a new CFD framework for bearings on the basis of feature refinement and supervised JDA. There are four modules in this framework: the signal processing and feature extraction module, the feature refinement module, the DA module, and the classifier module for CFD. The signal processing and feature extraction module uses ensemble empirical mode decomposition (EEMD) to decompose the raw signals collected from bearings and conducts feature extraction. For the feature refinement module, a domain adaptation feature refinement method based on classification accuracy and distribution discrepancies (DFCD) is investigated to estimate the fault distinguishability and WCI of features. In the DA module, a new DA method, termed improved JDA with manifold subspace learning and neighborhood relationship preserving (IDAMN), is proposed. Finally, in the cross-domain classifier module, a classical machine learning classifier, the KNN, is trained on the labeled data of SD, and the trained classifier predicts the labels of the TD data. The main contributions are summarized as follows:
The rest of this paper is arranged as follows. In Section 2, the preliminaries of ensemble empirical mode decomposition, domain adaptation, MMD, and local Fisher discriminant analysis are introduced. Section 3 describes the DFCD-IDAMN framework. In Section 4, the experimental validation is given to illustrate the performance of the proposed methods. The conclusions of this work are presented in Section 5.

Ensemble Empirical Mode Decomposition (EEMD).
EEMD was proposed to overcome the mode-mixing problem of empirical mode decomposition (EMD). Its basic principle is that Gaussian white noise is added to the raw signal so that the signal components are automatically distributed to the appropriate reference scales. Therefore, EEMD can achieve better time-frequency analysis of nonstationary vibration signals from bearings [32,33]. The procedure of EEMD is illustrated in Figure 1, and the specific implementation process is as follows [34]:
(1) Given an original signal $s(t)$, set the variable $i$ to 1 and set the number of ensemble trials of EEMD to $N$.
(2) Add the Gaussian white noise (GWN) $n_i(t)$ to $s(t)$ to obtain the signal $s_i(t)$:
$$s_i(t) = s(t) + n_i(t).$$
(3) Apply EMD to $s_i(t)$ to obtain the intrinsic mode functions (IMFs) and the corresponding residual component, so that $s_i(t)$ can be written as
$$s_i(t) = \sum_{j=1}^{J} \mathrm{IMF}_{ij}(t) + r_{ij}(t),$$
where $\mathrm{IMF}_{ij}(t)$ represents the $j$-th IMF component obtained by EMD, $J$ is the number of IMFs, and $r_{ij}(t)$ represents the residual component.
(4) Add different GWN to $s(t)$ and repeat steps (2) and (3). Average the IMF components obtained in the $N$ decompositions to offset the GWN, which gives the final IMF components:
$$\mathrm{IMF}_{j}(t) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IMF}_{ij}(t).$$
(5) Through the above steps, $s(t)$ is finally decomposed to …

… $\in c$ represents the corresponding label set of $X$. $D_T$ and $D_S$ are drawn from two different probability distributions, and the optimization goal of DA is to shrink the distribution discrepancies between SD and TD [35].
MMD [36], a widely used nonparametric distance estimate in TL, was proposed by Gretton et al. for estimating the distance between distributions in a reproducing kernel Hilbert space (RKHS). The MMD between the distributions of $D_S$ and $D_T$ can be expressed as
$$\mathrm{MMD}(D_S, D_T) = \left\| \frac{1}{n_S}\sum_{i=1}^{n_S}\phi\left(x_i^{S}\right) - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi\left(x_j^{T}\right) \right\|_{\mathcal{H}},$$
where $n_S$ and $n_T$ are the numbers of samples in SD and TD, $\|\cdot\|_{\mathcal{H}}$ represents the RKHS norm, and $\phi(\cdot)$ is the transformation function that maps data to an RKHS. Because inconsistent feature distributions exist in CFD, the MMD has been widely used to estimate distribution discrepancies between domains and to align data distributions.
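As an illustration of how the MMD can be estimated from samples, the following minimal sketch computes the biased empirical squared MMD with a Gaussian kernel. The function names, the kernel choice, and the bandwidth parameter `gamma` are assumptions made for illustration and are not taken from this study.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets.

    Uses the kernel trick: ||mean(phi(S)) - mean(phi(T))||^2
    = mean k(S, S) + mean k(T, T) - 2 * mean k(S, T).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Toy data: the second set is the first one shifted by 2 in every coordinate.
source = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
shifted = [[x + 2.0, y + 2.0] for x, y in source]

print(mmd_squared(source, source))   # identical sets: no discrepancy
print(mmd_squared(source, shifted))  # shifted set: clearly positive
```

A larger value indicates a larger discrepancy between the two sample sets; the value depends on the kernel bandwidth, so comparisons are only meaningful under a fixed `gamma`.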
Shock and Vibration 3

Local Fisher Discriminant Analysis (LFDA).
LFDA was proposed by Sugiyama [37] as an improvement of local Fisher analysis (LFA), and it is a classical supervised dimensionality reduction approach. Let $f_i \in \mathbb{R}^d$, $i = 1, 2, \dots, n$, be $d$-dimensional data and $y_i \in [1, c]$ be the corresponding category labels, where $n$ is the number of samples and $c$ is the number of classes.
According to the literature [37,38], the objective of LDA is to maximize the ratio of the between-class scatter matrix (BSM) $S_b$ to the within-class scatter matrix (WSM) $S_w$:
$$A_{\mathrm{LDA}} = \arg\max_{A}\ \operatorname{tr}\left(\left(A^{T} S_w A\right)^{-1} A^{T} S_b A\right),$$
where $A$ is a mapping matrix, and the definitions of $S_b$ and $S_w$ are as follows:
$$S_b = \frac{1}{2}\sum_{i,j=1}^{n} P^{b}_{ij}\,(f_i - f_j)(f_i - f_j)^{T}, \qquad S_w = \frac{1}{2}\sum_{i,j=1}^{n} P^{w}_{ij}\,(f_i - f_j)(f_i - f_j)^{T},$$
where
$$P^{b}_{ij} = \begin{cases} 1/n - 1/n_l, & y_i = y_j = l,\\ 1/n, & y_i \neq y_j, \end{cases} \qquad P^{w}_{ij} = \begin{cases} 1/n_l, & y_i = y_j = l,\\ 0, & y_i \neq y_j, \end{cases}$$
where $n_l$ is the number of samples in class $l$. Compared with LDA, LFDA has the more demanding objective of maximizing the between-class separability while simultaneously preserving the within-class local manifold structure in a new feature space of reduced dimension. Starting from the above $S_b$ and $S_w$, the local relationship of the feature data is incorporated into the definition of the weights. Accordingly, the new BSM $\tilde{S}_b$ and WSM $\tilde{S}_w$ are substituted for $S_b$ and $S_w$, respectively. The expressions of $\tilde{S}_b$ and $\tilde{S}_w$ are as follows [37]:
$$\tilde{S}_b = \frac{1}{2}\sum_{i,j=1}^{n} \tilde{P}^{b}_{ij}\,(f_i - f_j)(f_i - f_j)^{T}, \qquad \tilde{S}_w = \frac{1}{2}\sum_{i,j=1}^{n} \tilde{P}^{w}_{ij}\,(f_i - f_j)(f_i - f_j)^{T},$$
where
$$\tilde{P}^{b}_{ij} = \begin{cases} A_{ij}\,(1/n - 1/n_l), & y_i = y_j = l,\\ 1/n, & y_i \neq y_j, \end{cases} \qquad \tilde{P}^{w}_{ij} = \begin{cases} A_{ij}/n_l, & y_i = y_j = l,\\ 0, & y_i \neq y_j, \end{cases}$$
where the definition of the affinity $A_{ij}$ is as follows:
$$A_{ij} = \exp\left(-\frac{\|f_i - f_j\|^{2}}{c_i c_j}\right),$$
where $c_i$ and $c_j$ are the local scalings around $f_i$ and $f_j$.

To stay as close as possible to actual industrial scenarios, the DFCD runs on the following input: labeled feature data in the fault states and the normal state from SD, unlabeled feature data in the fault states from TD, and feature data in the normal state from TD. The reason for this choice is that, in actual industrial scenes, the category of newly collected samples is unknown, whereas samples of all fault states under one specific working condition are usually easy to prepare and obtain; therefore, the input feature data from TD are unlabeled. However, for any mechanical equipment, samples in the normal state are easily accessible under all working conditions. Accordingly, the labeled feature data from SD are used to evaluate the fault discriminability because their labels are known, and only the normal-state feature data from TD are used to measure the WCI of features.

DFCD-IDAMN Framework
According to the structure shown in Figure 4, the labeled feature data (containing multiple fault categories) under a certain operating condition and the normal-state feature data under other operating conditions are used for feature evaluation. Firstly, the labeled feature data are randomly divided into training and testing data, and a KNN classifier is trained to predict the class labels of the testing data. Accordingly, the classification accuracy of each feature can be used to measure its fault discriminability. Then, the normal-state feature data under two working conditions are used to calculate the MMD and the Kullback-Leibler divergence (KLD) of each feature, which quantifies the WCI of the feature. Finally, a novel evaluation index for domain adaptation feature refinement, the domain adaptability index (DAI), is built. In this study, we presume that a feature with a higher DAI is more advantageous to domain adaptation and fault classification. The detailed description of DFCD is as follows.

Evaluate Fault Discriminability of Feature Based on Classification Accuracy.
Given a high-dimensional original feature set (OFS) that includes $P$ feature samples containing … where $f_i^{q}$ represents the $q$-th feature of the $i$-th sample. Accordingly, the OFS can be presented as follows:
The row of OFS, the first feature data and $Y_{\mathrm{test}}$, the expression of $\mathrm{accuracy}(q)$ is presented as follows:
The remaining features are handled in the same way. Let $\mathrm{accuracy}(q)$ denote the classification accuracy of the $q$-th feature. Therefore, the classification accuracy sequence $\{\mathrm{accuracy}(1), \mathrm{accuracy}(2), \dots, \mathrm{accuracy}(Q)\}$ can be obtained. In this study, we presume that a higher classification accuracy indicates better fault discriminability.
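The per-feature evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions: each row of the input holds one feature across all labeled samples, a single random split is used, and a one-dimensional KNN with majority voting plays the role of the classifier. The function names, the split ratio, and the value of `k` are hypothetical and are not taken from this study.

```python
import random


def knn_predict_1d(train_x, train_y, x, k=3):
    """Predict the label of scalar x by majority vote of its k nearest neighbors."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


def feature_accuracy(values, labels, k=3, test_ratio=0.3, seed=0):
    """Classification accuracy obtained when only one feature is used."""
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)          # random train/test split
    n_test = max(1, int(len(idx) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_x = [values[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    correct = sum(
        knn_predict_1d(train_x, train_y, values[i], k) == labels[i]
        for i in test_idx
    )
    return correct / n_test


def accuracy_sequence(feature_rows, labels, **kwargs):
    """Accuracy of every feature; feature_rows[q] is feature q over all samples."""
    return [feature_accuracy(row, labels, **kwargs) for row in feature_rows]


# Toy example: feature 0 separates the two classes, feature 1 is constant.
labels = [0] * 10 + [1] * 10
separable = [0.1 * i for i in range(10)] + [10.0 + 0.1 * i for i in range(10)]
constant = [1.0] * 20
print(accuracy_sequence([separable, constant], labels))
```

In practice, averaging the accuracy over several random splits would reduce the variance of the estimate; a single split is used here only to keep the sketch short.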

Measure WCI of Feature Based on MMD and KLD.
For a more comprehensive WCI evaluation of features, MMD and KLD are employed to evaluate the distribution difference between feature samples from SD and TD. The basic principle of MMD is introduced in Section 2.2. The details of KLD are described as follows [39].
KLD is an effective metric for estimating distribution differences [40], and it is often applied in the fields of statistical learning, information technology, signal processing, etc. Given the probability density functions of two different variables, $\mathrm{prod}_1$ and $\mathrm{prod}_2$, the KLD is defined on the basis of information entropy:
$$I(\mathrm{prod}_1, \mathrm{prod}_2) = \int \mathrm{prod}_1(x)\,\ln\frac{\mathrm{prod}_1(x)}{\mathrm{prod}_2(x)}\,dx,$$
where the function $I(\cdot)$ is not symmetric, that is, $I(\mathrm{prod}_1, \mathrm{prod}_2) \neq I(\mathrm{prod}_2, \mathrm{prod}_1)$. According to the references [39][40][41], the expression of KLD in symmetric form can be denoted as
Based on the basic principles of MMD and KLD, given normal state feature sets $\mathrm{OFS}^{\mathrm{normal}}_{S}$ …
where $f^{Q}_{SM}$ represents the $Q$-th feature of the $M$-th sample from SD, $f^{Q}_{TM}$ represents the $Q$-th feature of the $M$-th sample from TD, and $M$ is the number of normal-state feature samples. The first rows of $\mathrm{OFS}^{\mathrm{normal}}_{S}$ and $\mathrm{OFS}^{\mathrm{normal}}_{T}$, that is, the data of the first feature, are used to calculate the MMD and KLD of the first feature between SD and TD. The remaining features are handled in the same way. Let $\mathrm{mmd}(q)$ and $\mathrm{kld}(q)$ denote the MMD and KLD of the $q$-th feature, respectively. Therefore, the MMD sequence $\{\mathrm{mmd}(1), \mathrm{mmd}(2), \dots, \mathrm{mmd}(Q)\}$ and the KLD sequence $\{\mathrm{kld}(1), \mathrm{kld}(2), \dots, \mathrm{kld}(Q)\}$ can be obtained. In this study, we presume that the WCI of a feature is better when the sum of its MMD and KLD is smaller.
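To make the KLD part of the WCI measurement concrete, the sketch below estimates a symmetric KLD between the normal-state values of one feature in SD and TD. It rests on two assumptions that the text does not confirm: the densities are approximated by histograms over a shared range, and the symmetric form is taken as the sum of the two directed divergences (some references use half of this sum). All names are hypothetical.

```python
import math


def histogram_probs(values, lo, hi, bins=10, eps=1e-9):
    """Histogram-based probability estimate with a small smoothing term eps."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0           # guard against a zero-width range
    for v in values:
        b = min(int((v - lo) / width), bins - 1)
        counts[b] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Directed divergence I(p, q) = sum_i p_i * ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kld(a, b, bins=10):
    """Symmetric KLD, assumed here to be I(p, q) + I(q, p)."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    p = histogram_probs(a, lo, hi, bins)
    q = histogram_probs(b, lo, hi, bins)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy example: one feature under two working conditions.
feature_sd = [0.1 * i for i in range(20)]
feature_td_same = list(feature_sd)
feature_td_shifted = [v + 5.0 for v in feature_sd]

print(symmetric_kld(feature_sd, feature_td_same))     # no discrepancy
print(symmetric_kld(feature_sd, feature_td_shifted))  # large discrepancy
```

The histogram estimate is sensitive to the number of bins and to the smoothing term, so the absolute values are only comparable between features when these settings are kept fixed.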

Build the Domain Adaptability Index.
According to the estimation of the fault discriminability and WCI of features, based on the classification accuracy, MMD, and KLD, a new domain adaptability index, DAI, is proposed to help refine domain adaptation features. For the $q$-th feature, the definition of DAI is presented as follows:
where $\theta$ is a trade-off parameter. Then, we can obtain the DAI sequence of the $Q$ features, $\mathrm{DAI} = \{\mathrm{DAI}(1), \mathrm{DAI}(2), \dots, \mathrm{DAI}(Q)\}$. In this work, it is supposed that the domain adaptability of a feature is stronger when the corresponding DAI value is higher. Accordingly, we can refine domain adaptation features from the OHFS by sorting the DAI sequence in descending order, and the features with high DAI values are used to form the feature subset for domain adaptation.
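The selection step described above (sorting the DAI sequence in descending order and keeping the best features) can be sketched as follows. The DAI values are taken as given, since the sketch does not attempt to reproduce the index itself; the function names and the two selection modes (a fixed number of features `nf` or a DAI threshold) are illustrative assumptions.

```python
def refine_top_features(dai, nf):
    """Indices of the nf features with the highest DAI, in descending DAI order."""
    order = sorted(range(len(dai)), key=lambda q: dai[q], reverse=True)
    return order[:nf]


def refine_by_threshold(dai, threshold):
    """Indices of all features whose DAI is at least the chosen threshold."""
    order = sorted(range(len(dai)), key=lambda q: dai[q], reverse=True)
    return [q for q in order if dai[q] >= threshold]


# Hypothetical DAI values for four features.
dai_values = [0.2, 0.9, 0.5, 0.7]
print(refine_top_features(dai_values, 2))    # [1, 3]
print(refine_by_threshold(dai_values, 0.5))  # [1, 3, 2]
```

The returned indices can then be used to slice the rows (features) of the original feature set before it is passed to the domain adaptation module.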

Improved JDA with Manifold Subspace Learning and Neighborhood Relationship Preserving (IDAMN).
Many existing DA approaches in feature-based TL share three significant issues. (1) In most studies, distribution adaptation is implemented by aligning probability distributions in the original complex and high-dimensional feature space, which makes it difficult to tackle feature distortion and may lead to poor domain adaptation performance [28]. (2) The optimization goals of numerous existing DA methods in TLM concentrate merely on decreasing the distribution differences and enhancing the transferability of features, while the class distinguishability of features is usually neglected, which may lead to poor classification performance [29,30]. (3) In the process of distribution adaptation, the impact of class information and neighborhood relationships of feature data has not been effectively considered, which may degrade the CFD performance and generalization ability of the model [28,30]. Therefore, in this section, on the basis of the idea of joint distribution alignment with neighborhood relationship preserving in a manifold subspace, a novel domain adaptation method, IDAMN, is designed. IDAMN consists of four steps: (1) Grassmann manifold subspace learning; (2) joint distribution alignment; (3) neighborhood relationship preserving; and (4) improved joint distribution adaptation. The details of IDAMN are presented as follows.

Grassmann Manifold Subspace Learning.
This work applies the classical unsupervised manifold learning approach of the geodesic flow kernel (GFK) to learn the low-dimensional manifold structure of the feature set in the original high-dimensional space [42]. Accordingly, features with certain geometrical structures in the manifold subspace can be obtained, which can overcome the problem of feature distortion in the raw feature space [28,43]. Given that the feature datasets of SD and TD are expressed as $X_S$ and $X_T$, respectively, the GFK is implemented to map the original feature data $X_S$ and $X_T$ into the Grassmann manifold (GM) space $G(d)$ by … [20,42], and $Z_S$ and $Z_T$ can be obtained, respectively. For a detailed introduction to GFK, see [20,42].
In particular, the subspace dimension of GFK must not exceed half of the input feature space dimension. Therefore, to handle the scenario in which the input feature dimension is less than twice the set dimension of the manifold subspace, a dimension comparison and automatic adjustment is conducted before executing the unsupervised manifold learning of GFK. Specifically, if the feature dimension is less than twice the set manifold subspace dimension, the manifold subspace dimension is set to half of the feature dimension. Conversely, if the feature dimension is not less than twice the set manifold subspace dimension, GFK is implemented under the set manifold subspace dimension.
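The adjustment rule can be written as a small helper. This is a sketch: the function name is hypothetical, and rounding down for odd feature dimensions is an assumption, since the text only states that the dimension is set to half of the feature dimension.

```python
def adjust_subspace_dim(d_set, d_f):
    """Return the manifold subspace dimension actually used by GFK.

    d_set: manifold subspace dimension chosen by the user.
    d_f:   dimension of the input feature space.
    """
    if d_f < 2 * d_set:
        # Feature space too small for the requested subspace: use half of it.
        # Integer division (rounding down) is an assumption for odd d_f.
        return d_f // 2
    return d_set


# With a set dimension of 50, small feature sets shrink the subspace.
for n_features in (40, 90, 100, 162):
    print(n_features, adjust_subspace_dim(50, n_features))
```

For a set dimension of 50, feature dimensions of 40 and 90 give subspace dimensions of 20 and 45, while 100 and 162 keep the set value of 50.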

Joint Distribution Alignment.
In order to further shrink the distribution divergences between SD and TD, joint distribution alignment is introduced.It includes two parts: marginal distribution alignment (MDA) and conditional distribution alignment (CDA).
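(1) MDA. Let $Z_S$ and $Z_T$ denote the representations of the SD and TD data on the GM space, respectively. Their corresponding marginal distributions are $P(Z_S)$ and $P(Z_T)$. The marginal distribution alignment is conducted by minimizing the MMD between $P(Z_S)$ and $P(Z_T)$ [17]. The expression of the MMD between $P(Z_S)$ and $P(Z_T)$ is as follows:
$$\mathrm{MMD}^{2}_{\mathcal{H}}(P_S, P_T) = \left\| \frac{1}{n_S}\sum_{i=1}^{n_S} W^{T} z_i - \frac{1}{n_T}\sum_{j=n_S+1}^{n_S+n_T} W^{T} z_j \right\|^{2}_{\mathcal{H}} = \operatorname{tr}\left(W^{T} Z L_0 Z^{T} W\right), \tag{22}$$
where $\mathcal{H}$ represents the RKHS, $\operatorname{tr}(W^{T} Z L_0 Z^{T} W)$ represents the trace of $W^{T} Z L_0 Z^{T} W$, $W$ is the optimal transformation matrix, and $Z$ denotes the input feature data matrix composed of $Z_S$ and $Z_T$. The definition of the matrix $L_0$ is as follows:
$$(L_0)_{ij} = \begin{cases} \dfrac{1}{n_S^{2}}, & z_i, z_j \in Z_S,\\[4pt] \dfrac{1}{n_T^{2}}, & z_i, z_j \in Z_T,\\[4pt] -\dfrac{1}{n_S n_T}, & \text{otherwise}, \end{cases} \tag{23}$$
where $n_S$ and $n_T$ are the numbers of samples in $Z_S$ and $Z_T$, respectively. By minimizing equation (22), a new representation $W^{T} Z$ can be obtained in which the marginal distribution discrepancy between SD and TD is narrowed.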
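The role of the matrix $L_0$ can be checked numerically. The sketch below builds the marginal MMD matrix in the standard JDA form (an assumption about the exact definition used here) and verifies, for a single projected feature, that the quadratic form $z L_0 z^{T}$ equals the squared difference between the source and target means. The function names are hypothetical.

```python
def mmd_matrix_l0(n_s, n_t):
    """Marginal MMD matrix: 1/n_s^2 (source block), 1/n_t^2 (target block),
    -1/(n_s * n_t) (cross blocks), built as the outer product e e^T."""
    e = [1.0 / n_s] * n_s + [-1.0 / n_t] * n_t
    return [[ei * ej for ej in e] for ei in e]


def quadratic_form(z, matrix):
    """Compute z * matrix * z^T for a one-dimensional sample vector z."""
    n = len(z)
    return sum(z[i] * matrix[i][j] * z[j] for i in range(n) for j in range(n))


# One projected feature: three source samples followed by two target samples.
z_source = [1.0, 2.0, 3.0]   # mean 2
z_target = [4.0, 6.0]        # mean 5
L0 = mmd_matrix_l0(len(z_source), len(z_target))

# (2 - 5)^2 = 9, the squared distance between the two domain means.
print(quadratic_form(z_source + z_target, L0))
```

For multidimensional data the same identity holds per projected dimension, which is why the sum over dimensions appears as a trace in the objective.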
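(2) CDA. The CDA is conducted by minimizing the MMD between the conditional distributions $Q_S(Y_S \mid Z_S)$ and $Q_T(Y_T \mid Z_T)$ [18]. Because $Y_T$ is not available, a base classifier $f$ is trained on $Z_S$ with $Y_S$, and the pseudo labels $\hat{Y}_T$ of the TD data $Z_T$ can then be predicted by $f$ [18]. Because $Q_S(Y_S \mid Z_S)$ and $Q_T(Y_T \mid Z_T)$ are posterior probabilities and quite involved, the sufficient statistics of the class-conditional distributions $Q_S(Z_S \mid Y_S = c)$ and $Q_T(Z_T \mid Y_T = c)$ can be explored instead [44], where $c$ is a category in the label set and $c \in [1, 2, \dots, C]$ ($C$ is the total number of categories) [18]. Therefore, the MMD between $Q_S(Z_S \mid Y_S = c)$ and $Q_T(Z_T \mid Y_T = c)$ can be expressed as follows:
$$\mathrm{MMD}^{2}_{\mathcal{H}}\left(Q_S^{(c)}, Q_T^{(c)}\right) = \left\| \frac{1}{n_S^{(c)}}\sum_{z_i \in Z_S^{(c)}} W^{T} z_i - \frac{1}{n_T^{(c)}}\sum_{z_j \in Z_T^{(c)}} W^{T} z_j \right\|^{2}_{\mathcal{H}} = \operatorname{tr}\left(W^{T} Z L_c Z^{T} W\right), \tag{24}$$
where $Z_S^{(c)}$ and $Z_T^{(c)}$ are the sets of SD and TD samples pertaining to class $c$ (using the pseudo labels for TD), and $n_S^{(c)}$ and $n_T^{(c)}$ are the corresponding numbers of samples, respectively. Accordingly, the MMD matrix $L_c$ can be obtained by the following equation:
$$(L_c)_{ij} = \begin{cases} \dfrac{1}{\left(n_S^{(c)}\right)^{2}}, & z_i, z_j \in Z_S^{(c)},\\[4pt] \dfrac{1}{\left(n_T^{(c)}\right)^{2}}, & z_i, z_j \in Z_T^{(c)},\\[4pt] -\dfrac{1}{n_S^{(c)} n_T^{(c)}}, & z_i \in Z_S^{(c)}, z_j \in Z_T^{(c)} \ \text{or}\ z_i \in Z_T^{(c)}, z_j \in Z_S^{(c)},\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{25}$$
When the minimum of equation (24) is achieved, a new representation $W^{T} Z$ can be obtained in which the discrepancy between the conditional distributions $Q_S(Y_S \mid Z_S)$ and $Q_T(Y_T \mid Z_T)$ is narrowed.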
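A class-wise MMD matrix can be built in the same way as the marginal one, using the true source labels and the target pseudo labels. The sketch below follows the standard JDA construction, which is assumed to match the definition used here; the function names and the handling of a class that is absent from one domain (its entries are set to zero) are illustrative choices.

```python
def mmd_matrix_lc(y_source, y_target_pseudo, c):
    """Conditional MMD matrix for class c, built as the outer product e e^T.

    e_i =  1/n_sc for source samples of class c,
    e_j = -1/n_tc for target samples pseudo-labeled c, and 0 elsewhere.
    """
    src = [1.0 if y == c else 0.0 for y in y_source]
    tgt = [1.0 if y == c else 0.0 for y in y_target_pseudo]
    n_sc, n_tc = sum(src), sum(tgt)
    e = [v / n_sc if n_sc else 0.0 for v in src]
    e += [-v / n_tc if n_tc else 0.0 for v in tgt]
    return [[ei * ej for ej in e] for ei in e]


def quadratic_form(z, matrix):
    """Compute z * matrix * z^T for a one-dimensional sample vector z."""
    n = len(z)
    return sum(z[i] * matrix[i][j] * z[j] for i in range(n) for j in range(n))


# One projected feature: three source samples, then three target samples.
y_s = [0, 0, 1]
y_t_pseudo = [0, 1, 1]
z = [1.0, 3.0, 10.0, 5.0, 20.0, 30.0]

# Class 0: source mean 2, target mean 5  -> (2 - 5)^2   = 9
# Class 1: source mean 10, target mean 25 -> (10 - 25)^2 = 225
print(quadratic_form(z, mmd_matrix_lc(y_s, y_t_pseudo, 0)))
print(quadratic_form(z, mmd_matrix_lc(y_s, y_t_pseudo, 1)))
```

Because the target labels are only pseudo labels, the quality of $L_c$ depends on the base classifier, which is why JDA-style methods typically recompute the pseudo labels and $L_c$ over several iterations.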

Neighborhood Relationships Preserving.
To account for the impact of class information and neighborhood relationships of feature data in the process of distribution adaptation, and inspired by the principles of LDA [45] and LFDA [37], a new local minimum margin criterion matrix (LMMCM) is designed to use the label information while preserving the local neighborhood geometry of the feature data. The expression of LMMCM is presented as follows:
where
where $n$, $l$, and $n_l$ are, respectively, the number of feature samples, the class label of a feature sample, and the number of feature samples that belong to class $l$. $\tilde{p}^{Lb}_{ij}$ and $\tilde{p}^{Lw}_{ij}$ constitute the weight matrices. In $\tilde{p}^{Lb}_{ij}$, the meaning of $z_i \neq z_j\ (j \in \mathrm{Nst}(i))$ is that $j$ is the nearest neighbor of $i$ and that they pertain to different classes. $A_{ij} \in [0, 1]$ is defined as follows:
$$A_{ij} = \exp\left(-\frac{\|z_i - z_j\|^{2}}{c_i c_j}\right),$$
where $c_i = \|z_i - z_i^{m}\|$ represents the local scaling around $z_i$ and $z_i^{m}$ is the $m$-th nearest neighbor of $z_i$. The closer $z_i$ and $z_j$ are, the larger $A_{ij}$ is; otherwise, $A_{ij}$ is smaller. By introducing the LMMCM, the local neighborhood geometry of the feature data, including the neighborhood relationships between data of the same category and those between data of different classes, can be considered. Furthermore, the class label information is effectively introduced, and the discriminability of the feature data can be improved by minimizing the LMMCM.
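The locally scaled affinity that weights neighboring samples can be sketched as follows, assuming it takes the local-scaling form used in LFDA, $A_{ij} = \exp(-\|z_i - z_j\|^2 / (c_i c_j))$, with $c_i$ the distance from $z_i$ to its $m$-th nearest neighbor. The function names and the default `m = 7` are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_scaling_affinity(points, m=7):
    """Affinity matrix A with A[i][j] = exp(-d_ij^2 / (c_i * c_j)).

    c_i is the distance from point i to its m-th nearest neighbor
    (m is capped at the number of other points).
    """
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    scale = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scale.append(others[min(m, len(others)) - 1])
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = scale[i] * scale[j]
            if denom > 0.0:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / denom)
            else:
                # Degenerate case (duplicate points): only identical points are affine.
                affinity[i][j] = 1.0 if dist[i][j] == 0.0 else 0.0
    return affinity


# Two tight pairs of points that are far apart from each other.
pts = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
A = local_scaling_affinity(pts, m=1)
print(A[0][1], A[0][2])  # nearby pair has a much larger affinity
```

Local scaling adapts the kernel width to the density around each sample, so points in sparse regions are not automatically treated as unrelated.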

Improved Joint Distribution Adaptation.
On the basis of the above three steps, we design an improved joint distribution adaptation, $D_{\mathrm{IJDA}}(Z_S, Z_T)$, which is defined as follows:
where $\beta \in [0, 1]$ and $\eta \in [0, 1]$ are adjustable parameters, and $\beta$ tunes the proportion of the marginal and conditional distribution adaptation. According to equations (22) and (24), $D_{\mathrm{IJDA}}(Z_S, Z_T)$ can be further expressed as follows:
According to the optimization objective of JDA and equation (26), the optimization goal of IDAMN can be defined as
where $\lambda$ is the regularization parameter, $\|\cdot\|_F^{2}$ is the squared Frobenius norm, and $\lambda\|W\|_F^{2}$ is used to ensure that the optimization problem is well defined. $I \in \mathbb{R}^{(n_S+n_T)\times(n_S+n_T)}$ and $E$ represent the identity matrix and the centering matrix, respectively, with $E = I - \frac{1}{n_S+n_T}\mathbf{1}$, where $\mathbf{1}$ is the $(n_S+n_T)\times(n_S+n_T)$ matrix of ones. To solve equation (34), based on constrained optimization theory, set the Lagrange multipliers $\Phi = \operatorname{diag}(\phi_1, \phi_2, \dots, \phi_k) \in \mathbb{R}^{k\times k}$; accordingly, the Lagrange function for solving equation (34) is as follows:
By setting the derivative $\partial L/\partial W = 0$, the solution of equation (34) can be derived as a generalized eigendecomposition problem as follows:
According to equation (36), the optimal adaptation matrix $W$ is finally built from the eigenvectors corresponding to the $k$ smallest eigenvalues, and the new feature representations $U_S = W^{T} Z_S$ and $U_T = W^{T} Z_T$ are obtained. Then, the labeled $U_S$ can be used to learn an adaptive classifier $f$, and the learned adaptive classifier $f$ is employed to predict the labels of the unlabeled $U_T$.
In summary, the overall procedure of IDAMN is as follows:
(1) Input: source and target domain feature sets $X_S$ and $X_T$, true labels $Y_S$ of $X_S$, manifold subspace dimension $d$, regularization parameters $\lambda$, $\beta$, and $\eta$, and dimension $k$ of the output source and target domain feature space. The number of iterations is $i$. The dimension of $X_S$ or $X_T$ is $d_f$. When $d_f < 2 \times d$, the manifold subspace dimension $d$ is set to $0.5 \times d_f$.
(2) By equation (28), learn the Grassmann manifold transformation kernel $G$ to transform the original feature data ($X_S$ and $X_T$) into $G(d)$ with … Accordingly, the new source domain $Z_S$ and the new target domain $Z_T$ are obtained.
(3) Learn a base classifier on $Z_S$ and conduct prediction on $Z_T$ to obtain its pseudo labels $\hat{Y}_T$.
(4) Constitute $Z = [Z_S, Z_T]$; compute $L_0$ and $L_c$ by equations (23) and (25). Compute $S^{L}_{w}$ and $S^{L}_{b}$ by equations (27) and (28).
(5) Solve the eigendecomposition problem in equation (36) and use the eigenvectors corresponding to the $k$ smallest eigenvalues to form the adaptation matrix $W$, and …

Complete Process of the Cross-Domain Fault Diagnosis Based on the DFCD-IDAMN.
Based on the DFCD-IDAMN framework and the cross-domain fault diagnosis tasks, the complete process is described in detail as follows:

Experimental Verification
In … 3, from which 162 statistical features can be extracted to form the original high-dimensional feature set. Vibration signal samples of the no-defect bearing and the inner raceway defect under motor loads of 0 hp, 1 hp, 2 hp, and 3 hp are presented in Figure 6. The corresponding IMFs of these samples are presented in Figures 7 and 8. Secondly, the feature refinement module is carried out. The proposed DFCD evaluates the fault distinguishability and WCI of the 162 statistical features, which yields their DAI values and helps to refine features with better domain adaptability from the high-dimensional original feature set. Take the no-defect vibration data under a motor load of 0 hp as an example. Figure 9 presents the DAI of the 162 statistical features. The figure shows that different features have different DAI values, which indicates that their quantified domain adaptability differs. The DAI values of the 39th and 42nd features are significantly higher than those of the other features, which shows that their domain adaptability is more prominent. As stated above, in this study we assume that a higher DAI value indicates greater domain adaptability. Therefore, the DFCD can help to refine the features that are more advantageous to domain adaptation by manually selecting a threshold on the DAI value, and these refined features are processed by the subsequent domain adaptation module.
Next, the refined features obtained from the feature refinement module constitute a cross-domain adaptation feature set (CDAF). The labeled CDAF of the SD and the unlabeled CDAF of the TD are input into the proposed IDAMN domain adaptation method, which performs joint distribution alignment with neighborhood relationship preserving in the Grassmann manifold subspace and learns an adaptive classifier $f$ for CFD. Finally, the learned classifier $f$ predicts the labels of the target domain feature set, from which the CFD result can be calculated.
After performing the above steps, the experimental results of the 12 CFD tasks are listed in Table 4, which shows the mean diagnosis accuracies of 12 bearing defect types under different numbers of domain adaptation features (nf). From the diagnosis accuracies of these 12 CFD tasks, the following conclusions can be drawn. Firstly, the proposed DFCD-IDAMN framework for CFD of bearings achieves high fault diagnosis accuracy. The diagnosis accuracies of tasks 2, 4, 5, 6, 9, and 12 reach 100% with a suitable nf, and tasks 1, 3, and 7 attain over 99.5% diagnosis accuracy. Accordingly, the effectiveness of the DFCD-IDAMN framework is validated. Secondly, the use of the proposed DFCD has an apparent effect on the fault diagnosis accuracy. Without DFCD, all 162 features are passed to the subsequent IDAMN domain adaptation method and CFD, and the diagnosis result is not ideal: the diagnosis accuracies of tasks 1-12 are 96.46%, 99.58%, 83.33%, 99.17%, 100.00%, 89.58%, 98.54%, 97.29%, 99.38%, 95.83%, 82.29%, and 99.58%, respectively. When the DFCD is applied and the refined CDAF is employed for the subsequent procedure, the CFD accuracies are higher than those obtained without DFCD for every task except task 5, where both reach 100.00%. The maximum accuracies (mda) of the 12 CFD tasks are 99.79%, 100.00%, 99.58%, 100.00%, 100.00%, 100.00%, 99.58%, 98.54%, 100.00%, 96.88%, 91.67%, and 100.00%, respectively. Therefore, the effectiveness of the DFCD with a suitable nf for improving fault diagnosis accuracy is verified.
The above CFD experiment involves some parameters of DFCD and IDAMN that need to be chosen manually. The specific values of these parameters are set based on experimental experience; therefore, we directly present the relevant parameter values in this manuscript. For the DFCD, the trade-off parameter is θ = 0.5. The parameters set in IDAMN include the manifold subspace dimension d = 50, the regularization parameters λ = 0.1, β = 0.3, and η = 0.5, the dimension of the output source and target domain feature space k = 20, and the number of iterations i = 10. In particular, although the manifold subspace dimension is set to 50, when the feature dimension after the proposed feature refinement (that is, nf) is less than twice the set manifold subspace dimension, the manifold subspace dimension is automatically adjusted to half of nf. In Table 4, when nf is 40, 50, 60, 70, 80, and 90, the manifold subspace dimension is automatically adjusted to 20, 25, 30, 35, 40, and 45, respectively. On the contrary, when nf is not less than twice the set manifold subspace dimension (when nf is 100 to 162), the GFK is implemented under the set manifold subspace dimension of 50.
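The automatic adjustment of the manifold subspace dimension described above can be illustrated with a minimal Python sketch. The function name `effective_subspace_dim` is hypothetical, and the use of integer halving is an assumption, since the text only lists even values of nf.

```python
def effective_subspace_dim(nf: int, d_set: int) -> int:
    """Return the manifold subspace dimension used for nf refined features.

    nf    : number of features kept after feature refinement
    d_set : manifold subspace dimension chosen by the user (50 in case 1)
    """
    if nf < 2 * d_set:
        # Too few features for the requested dimension: fall back to nf / 2.
        # Integer division is an assumption; the text only reports even nf.
        return nf // 2
    # Enough features: keep the dimension that was set.
    return d_set


# Values reported for case 1 (d_set = 50).
case1 = {nf: effective_subspace_dim(nf, 50) for nf in (40, 50, 60, 70, 80, 90, 100, 162)}
print(case1)
```

With `d_set = 50` the sketch reproduces the adjusted dimensions listed in the text (20 to 45 for nf from 40 to 90, and 50 once nf reaches 100).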

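Comparative Analysis with Other Fault Diagnosis Models. In an effort to further validate the advantages of the DFCD-IDAMN framework for CFD, some common and competitive approaches are used to conduct a series of comparison experiments; these methods include KNN, SVM, DAE, CNN, DBN, JDA, TCA, JGSA, BDA, and GFK. The reasons for this setup are as follows: (1) three categories of methods are chosen, namely classical machine learning methods, classical deep learning methods, and classical transfer learning methods, in order to compare the differences in effectiveness between them. (2) KNN and SVM are classic classifiers that have been widely used and are very representative. (3) DAE, CNN, and DBN are widely developed and studied classical deep learning approaches. (4) JDA, TCA, JGSA, BDA, and GFK are representative transfer learning methods that have gradually received attention and study from many researchers in recent years. Table 5 presents the comparative models built by these methods, DFCD, and IDAMN. These comparative models are labeled as M1-M18 and can be divided into three types.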

Shock and Vibration
(1) The models are not combined with domain adaptation methods, and they only utilize the original high-dimensional feature set (OHFS) and classical classifiers. Take M1 as an example: it is a classical classifier-based model, and the OHFS is directly input into the SVM classifier for cross-domain fault diagnosis. (2) The models are combined with domain adaptation methods, and they use the OHFS, a domain adaptation method, and a base classifier. Take M7 as an example: it is a domain adaptation-based model in which the OHFS is first processed by TCA, and the output features are input into the KNN classifier. (3) The models are combined with DFCD and domain adaptation methods, and they use the OHFS, DFCD, a domain adaptation method, and a base classifier. Take M13 as an example: the OHFS is first refined by the proposed DFCD, then the refined features are processed by TCA, and finally the output features are input into the KNN classifier.
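The three model types can be summarized as compositions of optional stages. The following Python sketch is only an illustration of that taxonomy: the `ModelSpec` class and its field names are hypothetical, and only the three example models named in the text (M1, M7, and M13) are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of one comparative model."""
    name: str
    classifier: str                    # base classifier, e.g. "SVM" or "KNN"
    adaptation: Optional[str] = None   # domain adaptation method, e.g. "TCA"
    refinement: Optional[str] = None   # feature refinement module, e.g. "DFCD"

    def stages(self) -> List[str]:
        """Processing order: OHFS -> [refinement] -> [adaptation] -> classifier."""
        steps = ["OHFS"]
        if self.refinement:
            steps.append(self.refinement)
        if self.adaptation:
            steps.append(self.adaptation)
        steps.append(self.classifier)
        return steps

    def model_type(self) -> int:
        """Return 1, 2, or 3 following the three types listed in the text."""
        if self.adaptation is None:
            if self.refinement is not None:
                # Refinement without domain adaptation is not one of the three types.
                raise ValueError("combination not covered by the three model types")
            return 1
        return 3 if self.refinement else 2


# The three examples given in the text.
EXAMPLES = [
    ModelSpec("M1", classifier="SVM"),
    ModelSpec("M7", classifier="KNN", adaptation="TCA"),
    ModelSpec("M13", classifier="KNN", adaptation="TCA", refinement="DFCD"),
]

for spec in EXAMPLES:
    print(spec.name, "type", spec.model_type(), " -> ".join(spec.stages()))
```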

Moreover, we select some other literature that used similar DA methods for cross-domain fault diagnosis experiments similar to ours and compare our experimental results with theirs.
some features (which are more advantageous to domain adaptation) by manually selecting a threshold of the DAI value, and these refined features are processed by the subsequent domain adaptation module. In particular, although the manifold subspace dimension is set to 40, when the feature dimension after the proposed feature refinement (that is, nf) is less than twice the set manifold subspace dimension, the manifold subspace dimension is automatically adjusted to half of nf. In Table 10, when nf is 40, 50, 60, and 70, the manifold subspace dimension is automatically adjusted to 20, 25, 30, and 35, respectively. On the contrary, when nf is not less than twice the set manifold subspace dimension (when nf is 80 to 162), the GFK is implemented under the set manifold subspace dimension of 40.

Comparative Analysis with Other Fault Diagnosis Models. The comparative models used in this section are also shown in Table 6.

Conclusions
This work designs a new framework based on the proposed DFCD and IDAMN for rolling bearing fault diagnosis across diverse operating conditions. In this framework, the EEMD is first applied for signal processing and statistics-based feature extraction. Then, the DFCD is employed to refine the features by evaluating the fault distinguishability and WCI. Next, the IDAMN is performed to map the feature data into a GM subspace and further achieve improved JDA with neighborhood relationship preservation. Finally, an adaptive classifier is trained for fault diagnosis. By utilizing bearing data collected from two experimental platforms, extensive fault diagnosis experiments are conducted. These experimental results show the following: (1) the DFCD can effectively refine features with better domain adaptability; accordingly, the utilization of the DFCD significantly enhances the diagnosis accuracy of domain adaptation-based models. (2) IDAMN possesses more robust domain adaptation ability than JDA, TCA, BDA, JGSA, and GFK. (3) The model built by the DFCD and IDAMN can attain a desirable cross-domain fault diagnosis accuracy with a suitable nf, which shows promise for practical industrial scenarios with variable working conditions. In the future, we plan to develop stronger domain adaptation-based approaches for more complicated fault detection scenarios and to conduct research on adaptive optimization methods for the related parameters used in the proposed methods.
respectively, feature sets pertaining to class c. $\hat{y}(z_i)$ is the pseudo label of the TD data $z_i$. $n_S^{(c)}$ and $n^{(c)}$

where $S_w^L$ and $S_b^L$ are the local WSM and local BSM, respectively, which are expressed as follows:

(6) Train an adaptive classifier f on $\{W^T Z_S, Y_S\}$ and update the pseudo labels $\hat{Y}_T$ of the target domain data, $\hat{Y}_T = f(W^T Z_T)$.
(7) Construct the MMD matrices $L_c$.
(8) Repeat step (4) until the iteration number i is reached.
(9) Output the learned adaptive classifier f.

Table 1: The CWRU bearing data for experiments.
categories of methods: classical machine learning methods, classical deep learning methods, and classical transfer learning methods, which are used to compare the differences in effectiveness between them. (2) KNN and SVM are classic classifiers that have been widely used and are very representative. (3) DAE, CNN, and DBN are widely developed and studied classical deep learning approaches. (4) JDA, TCA, JGSA, BDA, and GFK are representative transfer learning methods that have gradually received attention and study from many researchers in recent years. Table

Figure 6: Vibration signal samples of no-defect bearing and inner raceway defect bearing. (a) A vibration signal sample of no-defect bearing under motor loads of 0 hp and 1 hp. (b) A vibration signal sample of no-defect bearing under motor loads of 2 hp and 3 hp. (c) A vibration signal sample of inner raceway defect bearing under motor loads of 0 hp and 1 hp. (d) A vibration signal sample of inner raceway defect bearing under motor loads of 2 hp and 3 hp.

Figure 16: DAI of 162 statistical features extracted from no-defect bearing vibration data under a motor speed of 1730 rpm (trade-off parameter θ is 0.5).
is used to obtain the classification accuracy with the KNN classifier. The labeled q-th feature data from the source domain are randomly divided into the training dataset $\{D_{train}, Y_{train}\}$ and the testing dataset $\{D_{test}, Y_{test}\}$. $D_{train}$ and $D_{test}$ are the training and testing data samples, respectively, and $Y_{train}$ and $Y_{test}$ are the corresponding labels of $D_{train}$ and $D_{test}$. On this basis, $\{D_{train}, Y_{train}\}$ is employed to train the KNN classifier, and the trained KNN predicts the labels of $D_{test}$. Accordingly, the predicted labels of $D_{test}$, termed as Y
Source and target domain feature sets $X_S$ and $X_T$, true labels $Y_S$ of $X_S$, manifold subspace dimension d, regularization parameters λ, β, and η, dimension k of the output source and target domain feature space, and the number of iterations i. On this basis, the proposed IDAMN is performed; accordingly, the new feature sets $U_S$ and $U_T$ and the adaptive classifier f are obtained. Finally, the cross-domain fault diagnosis accuracy is calculated.
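The KNN-based accuracy evaluation of a single feature described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the number of neighbors (k = 3), the 70/30 split, and the absolute-difference distance are assumptions that the text does not specify.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def split_train_test(values: Sequence[float], labels: Sequence[int],
                     test_ratio: float = 0.3, seed: int = 0
                     ) -> Tuple[Tuple[List[float], List[int]], Tuple[List[float], List[int]]]:
    """Randomly divide one labeled feature into (D_train, Y_train) and (D_test, Y_test)."""
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(test_ratio * len(idx))))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([values[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([values[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def knn_predict(d_train: Sequence[float], y_train: Sequence[int],
                d_test: Sequence[float], k: int = 3) -> List[int]:
    """Predict a label for each test value by majority vote of its k nearest training values."""
    predictions = []
    for x in d_test:
        nearest = sorted(range(len(d_train)), key=lambda j: abs(d_train[j] - x))[:k]
        votes = Counter(y_train[j] for j in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def feature_accuracy(values: Sequence[float], labels: Sequence[int],
                     k: int = 3, test_ratio: float = 0.3, seed: int = 0) -> float:
    """Classification accuracy of KNN trained and tested on a single feature."""
    (d_train, y_train), (d_test, y_test) = split_train_test(values, labels, test_ratio, seed)
    y_pred = knn_predict(d_train, y_train, d_test, k)
    correct = sum(p == t for p, t in zip(y_pred, y_test))
    return correct / len(y_test)


# Synthetic feature (not data from the paper): two well-separated fault classes.
demo_values = [0.01 * i for i in range(20)] + [10.0 + 0.01 * i for i in range(20)]
demo_labels = [0] * 20 + [1] * 20
print(feature_accuracy(demo_values, demo_labels))
```

A feature whose values separate the fault classes well yields a high accuracy in this sketch, which is the sense in which such an accuracy can serve as a measure of fault distinguishability.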
source and $\mathrm{OFS}_{\mathrm{target}}$. Therefore, the new feature sets of the source and target domains, $X_S$ and $X_T$, are obtained for the subsequent step. (4) Input:

Table 2: The CFD tasks for case 1.

Table 4: CFD results obtained by the DFCD-IDAMN framework in case 1. The bold values highlight that the experimental results are desirable.

99.79 (nf: 67), 100.00 (nf: 80), 99.58 (nf: 92), 100.00 (nf: 80), 100.00 (nf: 60), 100.00 (nf: 90), 99.58 (nf: 81), 99.79 (nf: 46), 100.00 (nf: 60), 96.88 (nf: 131), 99.38 (nf: 0), 100.00 (nf: 60)
4.2.2. Diagnosis Results of the Proposed DFCD-IDAMN Framework. To further demonstrate the performance and advantages of the DFCD-IDAMN framework, bearing datasets from the SQI-MFS test-bed under diverse working speeds are employed for CFD experiments, and the contents
significantly higher than other features, which shows that their domain adaptability is more significant. For this reason, this work assumes that a higher DAI value indicates greater domain adaptability; therefore, the DFCD can help to refine

Table 10
obviously improved CFD accuracies. Therefore, the effectiveness of the DFCD-IDAMN framework is validated again. The above CFD experiment involves some parameters of DFCD and IDAMN that should be set manually. As the basis for setting the hyperparameters of the proposed methods, the specific values of these parameters are set based on

Table 7: Comparison of experimental results between DFCD-IDAMN and relevant methods from other literature.

Table 6, and the experimental contents are the same as case 1. The corresponding cross-domain fault diagnosis results are listed in Table 11 and Figure 17. The results show that the performance of the model built by the DFCD-IDAMN framework significantly surpasses that of the other models. The detailed comparative analysis is as follows. (1) Comparing the DFCD-IDAMN model with M1-M6 (base classifier-based models), the diagnosis accuracies of the DFCD-IDAMN model in tasks 1 and 2 are remarkably higher than those of the M1-M6 models. Moreover, the OHFS-IDAMN model achieves higher diagnosis accuracies in tasks 1 and 2 than the M1-M6 models. (2) Comparing the OHFS-IDAMN (M12) model with the M7-M11 models (domain adaptation-based models), its diagnosis accuracies in tasks 1 and 2 are noticeably higher than those of the M7-M11 models. The accuracy of the M12 model in task 1 attains 86.17%, which is 10.17, 3.67, 20.00, 6.00, and 11.67 percentage points higher than the M7-M11 models, respectively. Accordingly, in terms of domain adaptation ability, the proposed IDAMN outperforms traditional JDA, BDA, TCA, JGSA, and GFK and can effectively increase the CFD accuracy. (3) Comparing M7-M12 (domain adaptation-based models without DFCD) with M13-M18 (domain adaptation-based models with DFCD), the utilization of the DFCD remarkably improves the diagnosis accuracy of domain adaptation-based models. Taking OHFS-JDA (M8) and OHFS-DFCD-JDA (M14) as examples, the accuracies of the M14 model in tasks 1 and 2 are 89.00% and 83.83%, respectively; nevertheless, the M8 model only attains

Table 9: The CFD tasks for case 2.
82.50% and 72.00% accuracies, respectively, which are clearly inferior to those of the M14 model. Accordingly, the above experimental analysis once again shows that the DFCD can help to refine features with strong domain adaptability, which can effectively enhance domain adaptation performance and increase CFD accuracy. To sum up, extensive experiments are carried out, and the results further validate the validity, adaptability, and superiority of the DFCD-IDAMN framework under diverse working speeds.
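As a quick arithmetic check on the case 2 comparison, the task 1 accuracies of M7-M11 implied by the reported margins, and the gain of M14 over M8, can be recomputed from the figures quoted in the text. The short Python sketch below uses only those quoted numbers; the variable names are illustrative.

```python
# Task 1 accuracy of OHFS-IDAMN (M12) and its reported margins over M7-M11,
# in the order given in the text.
m12_task1 = 86.17
margins = {"M7": 10.17, "M8": 3.67, "M9": 20.00, "M10": 6.00, "M11": 11.67}

# Accuracies of M7-M11 in task 1 implied by those margins.
implied_task1 = {model: round(m12_task1 - gap, 2) for model, gap in margins.items()}
print(implied_task1)

# Gain from adding DFCD to JDA: M14 (with DFCD) versus M8 (without DFCD).
m14 = {"task1": 89.00, "task2": 83.83}
m8 = {"task1": 82.50, "task2": 72.00}
dfcd_gain = {task: round(m14[task] - m8[task], 2) for task in m14}
print(dfcd_gain)
```

The implied task 1 accuracy of M8 (82.50%) agrees with the value stated directly for OHFS-JDA, and adding DFCD to JDA raises accuracy by 6.50 and 11.83 percentage points in tasks 1 and 2.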

Table 11: CFD results of M1-M18 models in case 2. The bold values highlight that the experimental results are desirable.