Rolling Bearing Fault Diagnosis Based on Domain Adaptation and Preferred Feature Selection under Variable Working Conditions

In real industrial scenarios, data-driven diagnosis models built with conventional machine learning techniques often fail to achieve the desired fault diagnosis performance, because they assume that the training and testing datasets have the same feature distributions. To address this problem, a novel bearing fault diagnosis framework based on domain adaptation and preferred feature selection is proposed, in which a model trained on labeled data collected under one working condition can be applied to diagnose new but similar target data collected under other working conditions. In this framework, an improved domain adaptation method, transfer component analysis with preserving local manifold structure (TCAPLMS), is proposed to reduce the differences in data distributions between datasets from different domains while taking the label information of the feature dataset and the local manifold structure of the feature data into consideration. Furthermore, preferred feature selection by fault sensitivity and feature correlation (PSFFC) is embedded into the framework to select features that are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set. Finally, vibration datasets collected from two test platforms are used for experimental analysis. The experimental results validate that the proposed method can clearly improve diagnosis accuracy and has significant potential benefits for actual industrial scenarios.


Introduction
Among the various parts of rotating machinery, rolling element bearings (REBs) often have a higher failure probability than other components when the machinery operates in harsh working environments [1,2]. Therefore, bearing fault diagnosis plays a prominent role in industrial applications, for example, in ensuring reliability and reducing economic losses [3]. With the advent of the big data era, signal processing and data mining technologies are undergoing rapid development, and data-driven fault diagnosis methods are also developing rapidly [3][4][5][6]. However, with respect to practical industrial applications, conventional data-driven intelligent diagnosis methods have two main disadvantages [4,5,7]. (1) Feature extraction and fault classification models rest on the common premise that the training data and the test data have identical distributions. If this premise does not hold, the generalization ability of these intelligent diagnosis methods is greatly reduced, and the premise does not hold when the working conditions are inconsistent, as in real industrial scenarios. (2) Labeled target fault data are insufficient, because the working conditions are variable and the failure types of rotating machines are diverse. Therefore, conventional data-driven intelligent diagnosis methods cannot establish an accurate fault diagnosis model of the target bearing in real diagnosis scenarios. To overcome these limitations, it is essential to construct an advanced fault diagnosis model. This advanced model should perform accurate fault classification on a specific dataset and show significant generalization ability on unlabeled data from other working conditions.
In the process of fault diagnosis, the crucial steps for fault pattern recognition are signal processing and feature extraction. In current research on bearing fault diagnosis, the analyzed signals are mostly vibration signals of REBs, and these signals are generally analyzed by time-frequency domain methods [5][6][7][8][9]. Yu et al. [10] decomposed the raw signal with MODWPT and used the reconstructed signals to obtain primitive statistical features representing the fault state in different frequency bands. For compound fault diagnosis of bearings, Shao et al. [11] proposed a new diagnosis method called adaptive DTCWPT based on the high-order spectrum. To decompose and reconstruct the vibration signal faster and more accurately, Zhang et al. [12] proposed a lifting wavelet method, which outperforms the traditional wavelet, and combined it with the morphological fractal dimension to recognize the running status of the rolling bearing. By combining VMD time-frequency analysis and SVM, Zhou [13] proposed a new method to detect rolling bearing faults, in which the original signal is decomposed into a set of intrinsic mode functions (IMFs) through VMD. For weak damage characteristic extraction, Xingxing et al. [8] proposed a new fault diagnosis method based on VMD guided by ICF. For the detection and isolation of multiple failures, Zhang [14] proposed a new fault detection technology for rolling bearings, which combines FAWT with time-frequency analysis. To identify the health state of rolling bearings under varying conditions, Zhong et al. [15] proposed an intelligent method based on STFT and CNN. To combine the ability of WT to extract interference instantly with the adaptive ability of EMD to analyze time-varying or nonlinear signals, Merainani [16] proposed a new method that combines EWT with the S transform (ST).
After the raw vibration signal is processed, statistical characteristics such as PV, RMS, V, Sw, K, energy, and energy entropy can be selected to represent the characteristic information. In [9], the raw signal was decomposed by wavelet decomposition to obtain the reconstructed signals, and 192 statistical characteristics were then obtained from the corresponding HHT envelope spectrum (HES) of the reconstructed signals. In [17], raw signals were decomposed into many distinct IMFs through EMD, the first four IMFs were selected to obtain the HES and the HHT marginal spectrum, and these were then used to calculate statistical features. Wang et al. [18] proposed an original signal preprocessing technology based on wavelet packet denoising (WPD) and the random forest algorithm (RFs) and used the values of SNR and mean square error to choose the mother wavelet set for signal preprocessing. Shuangli et al. [19] combined wavelet analysis and entropy theory to decompose and reconstruct the collected signals and then input them into a GA-optimized SVM (GA-SVM). Considering the modulation characteristics of bearing fault signals and the disadvantages of selecting the resonant high-frequency band based on experience, Jing [20] proposed a bearing fault diagnosis technology that combines EMD and spectral kurtosis. Considering the effectiveness of the extracted statistical features over the entire life cycle, Xiaodong [21] proposed a feature extraction approach based on the optimal selection of statistical indicators. Qing-feng et al. [22] used the variance contribution to eliminate false components in the IMFs and combined wavelet packet decomposition to improve EMD. By using wavelets and FFT to decompose signals, Seryasat et al. [23] extracted the energy and RMS of signals in distinct frequency bands, which can recognize bearing faults effectively. High-dimensional characteristic sets are usually obtained after signal processing and feature extraction.
Taking into account the complex mapping relationship between bearing faults and their symptoms, it is hard to decide which statistical attributes in the high-dimensional characteristic space best reflect the nature of the fault. High-dimensional characteristic sets are prone to contain redundant characteristics, which reduces the accuracy and efficiency of troubleshooting. The key step of the classification process is selecting a subset of features. Previous studies indicate that feature selection is an important prerequisite for accurate diagnosis. Yu et al. [17] proposed a novel feature extraction method for selecting the most sensitive characteristics, which combines the STD of the characteristic data with the K-means method. Fei et al. [9] proposed a characteristic selection technology that determines the fault-sensitive features by combining ARI and the sum of within-class MD. Considering that the stable distribution can extract features with high discriminating ability, Chouri et al. [24] combined alpha-stable distribution feature extraction with the weighted support vector machine (WSVM) to extract features efficiently. To address nonsensitive characteristics in the feature set, Liu et al. [25] proposed a characteristic selection approach based on sensitive characteristic extraction and nonlinear feature fusion, which uses CDET to choose and weight sensitive characteristics and then uses locality preserving projections (LPP) to reduce the dimension of the weighted sensitive features to obtain more sensitive characteristics. Liu et al. [26] used the K-SVD approach to train a sparse dictionary from sample data and applied it to sparsely decompose the feature vector for fault classification and identification. Considering that actual working conditions are complex and changeable, Chen et al. [27] proposed a cross-domain feature selection approach with TCA. Sun et al. [28] proposed a feature extraction and diagnosis algorithm using CNN, which can automatically select features from time-domain vibration signals and discover distributed features of the data effectively. By analyzing vibration and current signals, BR et al. [29] selected 12 statistical time-domain characteristics, among which SSC, VAR, and STD were identified as good characteristics. Therefore, how to select the statistical features that are more beneficial to fault pattern recognition is also an important step. In this paper, a new characteristic selection approach, priority selection of features based on fault sensitivity and correlation between features (PSFFC), is proposed.
According to the above discussion, and with regard to the two main limitations of conventional data-driven fault diagnosis technologies, recent research has shown that transfer learning or domain adaptation methods [30] have vast application prospects and extensive applicability in many fields [31][32][33]. Transfer learning and domain adaptation have therefore inspired new ideas for overcoming the existing limitations. To address the problem of insufficient fault information in significant fault samples, Chen et al. [34] presented an early fault diagnosis model, which combines DNN with transfer learning and can select the fault features of a great number of fault samples and the unimportant fault features of other fault samples. With HKL and transfer learning, Qian [35] built a robust fault diagnosis network applicable to different working conditions. To overcome the problem that the training and test data have different distributions, and considering the valid and general diagnostic knowledge obtained from multiple related source domains, Zheng et al. [3] proposed a new intelligent fault recognition approach for multiple source domains. To overcome the problem that too few labeled samples are available for accurate diagnosis, Zeng et al. [36] proposed a fault diagnosis method in which certain parameters unrelated to subsequent tasks are used for pretraining on a large amount of unlabeled data. Zhou et al. [37] proposed MSDCTL for fault diagnosis under a new working condition, which does not need new labeled data. To overcome the problem of unlabeled cross-domain diagnosis of bearings, Shao et al. [38] proposed an adversarial domain adaptation approach with deep transfer learning. The basic idea of the above studies is to learn fault detection knowledge from the training data and transfer that knowledge to the test data. This improves the generalization ability of these models on the target dataset.
Unlike traditional machine learning techniques, which need to learn each new task from scratch, transfer learning or domain adaptation learns fault detection knowledge from the source domain and applies it to the target domain, which is more suitable for cross-domain learning applications [3,6]. Feature-based transfer learning is a main branch of these methods, and among them TCA is a representative method [6]. To reduce the distance between the source domain and the target domain, features are mapped to a higher-dimensional reproducing kernel Hilbert space by means of a nonlinear transformation [30]. Although it is very useful, TCA needs further optimization, because it ignores the category information of the samples and the local manifold structure of the data. Therefore, based on TCA, transfer component analysis with preserving local manifold structure (TCAPLMS) is proposed in this paper. It reduces the distribution difference between the source domain and the target domain and improves the discriminability of the feature dataset.
Given the above discussion, a novel intelligent fault diagnosis method for bearings is proposed, which is based on domain adaptation and preferred feature selection. The contributions of this paper are summarized as follows.
(1) A new feature selection method, PSFFC, is proposed to select features that are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set.
(2) An improved feature-based transfer learning approach, TCAPLMS, is proposed for domain adaptation. TCAPLMS reduces the differences in the marginal distributions between datasets from different domains and, at the same time, takes the label information of the feature dataset and the local manifold structure of the feature data into consideration, so that both the domain adaptability and the discriminant performance are improved.
(3) A new intelligent bearing fault diagnosis framework is proposed for variable working conditions, which addresses the limitations of data-driven fault diagnosis methods and enhances the generalization ability of the fault diagnosis model in actual diagnosis scenarios.
The rest of this paper is organized as follows. Section 2 introduces the theoretical background of MODWPT, LFDA, and TCA. Section 3 presents the fault diagnosis framework, PSFFC, and TCAPLMS. Section 4 gives the experimental analysis of the method, which is used to validate its effectiveness and adaptability. Finally, Section 5 is the conclusion. Furthermore, we present some acronyms in Table 1.

Maximal Overlap Discrete Wavelet Packet Transform (MODWPT).
Discrete wavelet transform (DWT) is a typical time-frequency analysis method, but it also has limitations. For example, a full transform requires that the length of the analyzed signal sequence be an integer power of 2 [9]. To overcome this limitation, a highly redundant nonorthogonal wavelet transform named MODWT was proposed, which places no restriction on the signal sequence length. MODWT is in turn replaced by MODWPT, which has good frequency resolution at high frequencies and preserves the good performance of MODWT [39].
DWT can be explained as follows: $\{X_t,\ t = 0, \ldots, N-1\}$ is a real-valued time sequence, and $N$ is the signal sequence length. $\{g_l,\ l = 0, \ldots, L-1\}$ is the even-length low-pass filter of DWT, which outputs the low-frequency part of the input signal and filters out the high-frequency part, while the high-pass filter $\{h_l,\ l = 0, \ldots, L-1\}$ does the opposite, where $L$ is the filter length. For all nonzero integers $n$, the low-pass and high-pass filters can be written by the following equation [9,39]: Additionally, the low-pass and high-pass filters are chosen to be quadrature mirror filters, so $h_l$ and $g_l$ are related to each other as follows: With $V_{0,t} = X_t$, the $j$-th level scale transformation coefficients are denoted $V_{j,t}$.

Shock and Vibration 3
For the DWT pyramid algorithm, $V_{j-1,t}$ is the $j$-th level input, so the $j$-th level output consists of the $j$-th level scaling transform coefficients and wavelet transform coefficients, which are presented as follows: where mod represents the remainder after division. DWT has the following limitations [9,39,40]: (1) DWT can be fully performed only when the length of the signal sequence to be analyzed is an integer power of 2; (2) when the signal is cyclically shifted, the wavelet coefficients and scaling coefficients of DWT do not undergo the identical cyclic shift; (3) the number of wavelet coefficients and scaling coefficients of DWT is halved with each increase in the decomposition level, which affects the statistical analysis of the coefficients.
At the same time, $g_l$ and $h_l$ now satisfy the following equations: The MODWT algorithm performs a weighted average over all observation starting points, which reduces the deviation caused by cyclic shift. To avoid halving the number of coefficients, MODWT rebuilds the filters by inserting $2^{j-1} - 1$ zeros between the elements of $g_l$ and $h_l$ at each $j$-th level, and the resulting coefficients are presented as follows: In order to decompose high-frequency signals, MODWPT can be used to process signals. The coefficient of MODWPT is defined as follows: where $n$ is the frequency band number.
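The zero-insertion step and the quadrature mirror relationship described above can be illustrated with a minimal Python sketch. The function names `upsample_filter` and `quadrature_mirror` are hypothetical, the sign convention $h_l = (-1)^l g_{L-1-l}$ is one common textbook choice assumed here, and the Haar filter is used only as a small example.

```python
def upsample_filter(taps, level):
    """Insert 2**(level - 1) - 1 zeros between consecutive filter taps."""
    if level < 1:
        raise ValueError("level must be >= 1")
    gap = 2 ** (level - 1) - 1
    out = []
    for i, tap in enumerate(taps):
        out.append(tap)
        if i < len(taps) - 1:
            out.extend([0.0] * gap)
    return out


def quadrature_mirror(low_pass):
    """Derive a high-pass filter via h_l = (-1)**l * g_{L-1-l} (assumed convention)."""
    length = len(low_pass)
    return [(-1) ** l * low_pass[length - 1 - l] for l in range(length)]


# Haar low-pass filter, chosen only because it is short enough to check by hand.
g = [2 ** -0.5, 2 ** -0.5]
h = quadrature_mirror(g)          # high-pass counterpart of g
g_level3 = upsample_filter(g, 3)  # three zeros inserted between the two taps
```

At level 1 no zeros are inserted, so the filter is returned unchanged; at level $j$ the filter length grows to $(L-1)2^{j-1}+1$ while the number of nonzero taps stays at $L$.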

Local Fisher Discriminant Analysis (LFDA).
LFDA integrates the advantages of LDA and LPP [17,41]; that is, LFDA aims to obtain the best separability between classes in the embedding space while retaining the local structure within each class. On the basis of LDA, LFDA considers the proximity relationship between sample data of the same class, so that the data after dimensionality reduction are more conducive to classification [41]. LDA can be explained as follows: let $x_i \in R^d\ (i = 1, 2, \ldots, n)$ be $d$-dimensional samples and $y_i \in \{1, 2, \ldots, c\}$ be the associated class labels, where $n$ is the number of samples and $c$ is the number of classes. Let $n_\ell$ be the number of samples in class $\ell$. Let $S^{(w)}$ be the within-class scatter matrix and $S^{(b)}$ be the between-class scatter matrix [17,41]: where $\mu_\ell$ is the sample average of class $\ell$ and $\mu$ is the average of all samples. Assuming that $S^{(w)}$ has full rank and $T^{\top} S^{(w)} T$ is invertible, the LDA transformation matrix $T_{LDA}$ is defined as follows: LPP can be explained as follows: $A$ is an affinity matrix and $A_{i,j}$ is the affinity between $x_i$ and $x_j$, where the value of $A_{i,j}$ depends on the proximity relationship between $x_i$ and $x_j$ in the feature space. $A_{i,j}$ is defined as follows [40]: where $\sigma_i$ is the local scaling of the data samples around $x_i$, which is defined as follows: where $x_i^{(K)}$ is the $K$-th nearest neighbor of $x_i$ and $K$ is generally 7.
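A minimal sketch of an affinity matrix with local scaling is given below. It assumes the commonly used form $A_{i,j} = \exp(-\lVert x_i - x_j\rVert^2 / (\sigma_i \sigma_j))$ with $\sigma_i = \lVert x_i - x_i^{(K)}\rVert$, which matches the description of $\sigma_i$ and $K$ above but is not copied from the paper's equations. The function name `local_scaling_affinity` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_scaling_affinity(samples, k=7):
    """Affinity matrix A[i][j] = exp(-d_ij**2 / (sigma_i * sigma_j)) (assumed form)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    dist = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    # sigma_i: distance from x_i to its k-th nearest neighbour, excluding x_i itself.
    # If fewer than k neighbours exist, the farthest available neighbour is used.
    sigma = []
    for i in range(n):
        neighbours = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(neighbours[min(k, len(neighbours)) - 1])
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = sigma[i] * sigma[j]
            if scale > 0:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / scale)
            else:
                # Degenerate scale (duplicate samples): only coincident points are affine.
                affinity[i][j] = 1.0 if dist[i][j] == 0 else 0.0
    return affinity


A = local_scaling_affinity([[0.0], [1.0], [3.0]], k=1)
```

Because each pairwise distance is divided by the local scales of both endpoints, samples in sparse regions are not penalized more heavily than samples in dense regions.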
The LPP transformation matrix $T_{LPP}$ is represented as follows: subject to where $D$ is the $n$-dimensional diagonal matrix and satisfies the following equation: The LFDA transformation matrix $T_{LFDA}$ is defined as follows [41]:

where $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ are the updated within-class and between-class scatter matrices, respectively: where $W^{(w)}_{i,j}$ and $W^{(b)}_{i,j}$ represent the weight values for sample pairs in the within-class and between-class scatter matrices, respectively, which can be defined as follows: It is necessary to weight the values of sample pairs in the same class based on the affinity $A_{i,j}$, so that samples that are far apart within the same class have less impact on $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$.
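For reference, the widely cited formulation of LFDA can be written as below. This is a sketch of the standard form, assumed to correspond to the definitions in this section. The tilde notation for the locally weighted scatter matrices and the superscripts on the weights are notational assumptions.

```latex
\tilde{S}^{(w)} = \frac{1}{2}\sum_{i,j=1}^{n} W^{(w)}_{i,j}\,(x_i - x_j)(x_i - x_j)^{\top},
\qquad
\tilde{S}^{(b)} = \frac{1}{2}\sum_{i,j=1}^{n} W^{(b)}_{i,j}\,(x_i - x_j)(x_i - x_j)^{\top}

W^{(w)}_{i,j} =
\begin{cases}
A_{i,j}/n_{\ell}, & y_i = y_j = \ell,\\
0, & y_i \neq y_j,
\end{cases}
\qquad
W^{(b)}_{i,j} =
\begin{cases}
A_{i,j}\left(\dfrac{1}{n} - \dfrac{1}{n_{\ell}}\right), & y_i = y_j = \ell,\\
\dfrac{1}{n}, & y_i \neq y_j
\end{cases}

T_{LFDA} = \arg\max_{T}\ \operatorname{tr}\!\left(\left(T^{\top}\tilde{S}^{(w)}T\right)^{-1} T^{\top}\tilde{S}^{(b)}T\right)
```

Setting $A_{i,j} = 1$ for all pairs recovers the pairwise form of the ordinary LDA scatter matrices, which is why LFDA can be read as LDA with locality weights.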

Transfer Component Analysis (TCA).
TCA [42] is a typical feature-based transfer learning method, which aims to reduce the difference between the marginal distributions of different datasets. A domain $\mathcal{D}$ consists of a $D$-dimensional feature space $\mathcal{X}$, whose marginal probability distribution is $P(X)$; $X = \{x_1, \ldots, x_n\}$ is the training dataset. $\mathcal{T} = \{\mathcal{Y}, f(X)\}$ is the learning task, where $Y = \{y_1, \ldots, y_n\}$ is the D-dimensional label space and $f(X) = Q(Y|X)$ is the predictive function, which represents the conditional probability distribution. Given $\mathcal{D}_S$ (source domain), $\mathcal{D}_T$ (target domain), and the corresponding learning tasks $\mathcal{T}_S$ and $\mathcal{T}_T$, TCA aims to improve the predictive function $f_T(X)$ in $\mathcal{D}_T$ by learning from the data of $\mathcal{D}_S$ and $\mathcal{D}_T$, although in real diagnostic procedures $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. A nonlinear mapping function $\phi$ into a reproducing kernel Hilbert space $\mathcal{H}$ is assumed to exist such that $P_S(\phi(X_S)) \approx P_T(\phi(X_T))$ and $P_S(Y_S|\phi(X_S)) \approx P_T(Y_T|\phi(X_T))$. The optimization objective of TCA is to preserve the variance of the feature data in a latent space while minimizing the distance between the marginal distributions of the $\mathcal{D}_S$ and $\mathcal{D}_T$ datasets as much as possible. In TCA, the empirical maximum mean discrepancy (MMD) represents the distance between the two marginal distributions $P_S(X)$ and $P_T(X)$, and the definition of MMD is represented as follows [7,42,43]: where $n_S$ and $n_T$ are the numbers of samples in $\mathcal{D}_S$ and $\mathcal{D}_T$, respectively. $K$ represents a kernel matrix, as follows: where $K_{S,S}$, $K_{S,T}$, and $K_{T,T}$ are the kernel matrices in $\mathcal{D}_S$, across domains, and in $\mathcal{D}_T$, respectively. The expression of $L$ is shown as follows: Empirical kernel mapping can reduce the matrix dimension, and it can transform high-dimensional data to low-dimensional data by using a matrix $\tilde{W} \in R^{(n_S+n_T) \times m}$, which is embedded into $K$. The resultant kernel matrix is as follows [7]: where $W = K^{-1/2}\tilde{W}$, and equation (20) can be transformed as Therefore, the kernel learning problem (the objective function of TCA) can be replaced as follows: where $\operatorname{tr}(W^{\top}W)$ is used to control the complexity of $W$ and at the same time to avoid the rank deficiency of the denominator. In equation (26), $\mu$ is a trade-off parameter, $I \in R^{m \times m}$ is the identity matrix, and $H$ is the centering matrix. Finally, based on the trace optimization problem, equation (26) can be efficiently solved.
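The empirical MMD described above is commonly written as $\operatorname{tr}(KL)$, where $L_{i,j}$ equals $1/n_S^2$ for two source samples, $1/n_T^2$ for two target samples, and $-1/(n_S n_T)$ otherwise. The following minimal sketch assumes that standard form. The function names and the linear kernel are illustrative choices, not the paper's settings.

```python
def linear_kernel(a, b):
    """Inner product of two vectors (illustrative kernel choice)."""
    return sum(x * y for x, y in zip(a, b))


def mmd_squared(source, target, kernel=linear_kernel):
    """Empirical squared MMD computed as tr(K L) over source samples followed by target samples."""
    n_s, n_t = len(source), len(target)
    data = list(source) + list(target)
    n = n_s + n_t
    K = [[kernel(data[i], data[j]) for j in range(n)] for i in range(n)]

    def l_entry(i, j):
        # Entries of the (symmetric) MMD coefficient matrix L.
        if i < n_s and j < n_s:
            return 1.0 / (n_s * n_s)
        if i >= n_s and j >= n_s:
            return 1.0 / (n_t * n_t)
        return -1.0 / (n_s * n_t)

    # Because L is symmetric, tr(K L) equals the sum of elementwise products.
    return sum(K[i][j] * l_entry(i, j) for i in range(n) for j in range(n))


source = [[0.0, 0.0], [2.0, 0.0]]   # mean (1, 0)
target = [[1.0, 1.0], [3.0, 1.0]]   # mean (2, 1)
shifted = mmd_squared(source, target)
same = mmd_squared(source, source)
```

With a linear kernel the quantity reduces to the squared distance between the two sample means, which makes the example easy to check by hand; a nonlinear kernel compares higher-order statistics as well.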

Preferred Feature Selection by Fault Sensitivity and Feature Correlation (PSFFC).
In order to select features that are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set, in this paper, we suggest that the fault sensitivity of each statistical feature and the correlation between features should both be considered when selecting preferred features. Therefore, there are two aspects: (1) The K-means algorithm (KA) [44] and the SMD of the feature data are applied to indicate the fault sensitivity. Each type of statistical feature is processed by KA, which yields an index, namely, the adjusted rand index (ARI). For each kind of feature, the MD of the data samples in each condition can be calculated, and the SMD over all bearing conditions can be further obtained. The ARI and SMD indicate the class discriminative degree and the cohesiveness of the feature data, respectively. The ratio of the ARI to the SMD is used to evaluate the fault sensitivity of a feature, and the higher the ratio, the greater the fault sensitivity.
(2) PCC [45] is used to evaluate the correlation between features. The higher the PCC, the higher the correlation between features.
On the basis of the above two aspects, a new feature evaluation index, the feature priority selection degree (FPSD), is proposed to select preferred features for fault pattern recognition. The procedure of PSFFC is summarized as follows.
Step 1: given a raw vibration signal dataset, there are $M$ fault types, and each fault type has $N$ vibration signal samples. $K$ types of statistical features can be obtained by vibration signal processing and original feature extraction. These features constitute the raw statistical feature set $[RFS_1, RFS_2, \ldots, RFS_K]$, where the expression of $RFS_k$ is as follows: where $FS^k_{ij}$ is the $k$-th feature of the $j$-th sample in the $i$-th fault type. Then, KA is applied to each feature, and the ARI of the resulting clustering partition is computed [46,47]. The definition of ARI is described as follows: given a set of $n$ objects $X = \{x_1, x_2, \ldots, x_n\}$, suppose that $U = \{u_1, u_2, \ldots, u_R\}$ and $V = \{v_1, v_2, \ldots, v_C\}$ represent two different partitions of the objects in $X$. The ARI is then defined as follows [46,47]: where $a$ is the number of object pairs placed in the same class in both $U$ and $V$; $d$ is the number of object pairs placed in different classes in both $U$ and $V$; $b$ is the number of object pairs placed in the same class in $U$ but in different classes in $V$; $c$ is the number of object pairs placed in different classes in $U$ but in the same class in $V$. The maximum of ARI is 1, which indicates that KA achieves the correct classification between classes [17,47]. Therefore, the value of ARI can be used to indicate the clustering performance, which reflects the discriminant power of the feature [47]. When clustering analysis is performed on the feature sets $[RFS_1, RFS_2, \ldots, RFS_K]$ by the K-means algorithm, the corresponding $ARI = \{ARI(1), ARI(2), \ldots, ARI(K)\}$ can be obtained. The higher the value of ARI, the greater the class discriminative degree of the feature.
Step 2: for each bearing condition (fault pattern) and each type of statistical feature, the MD of the feature data (the elements of the corresponding row of $RFS_k$) is calculated. Thus, the corresponding MD set can be obtained, that is where Then, for the $k$-th statistical feature of the $M$ fault types, the SMD of the feature samples is calculated to obtain $SMD(k)$. The expression of $SMD(k)$ is as follows: Therefore, the $K$ types of statistical features have a mean deviation sequence $(SMD(1), SMD(2), \ldots, SMD(K))$. We suppose that the MD can be used to indicate the cohesion of the feature data. The smaller the value of $SMD(k)$, the greater the class cohesion of the feature.
Step 3: the evaluation index of fault sensitivity, FSD (fault sensitivity degree), can be obtained by calculating the ratio of ARI to SMD. For $K$ types of features, there is an FSD sequence $FSD = \{FSD(1), FSD(2), \ldots, FSD(K)\}$, where $FSD(k)$ is defined as follows: The higher the value of $FSD(k)$, the better the fault sensitivity of the feature.
Step 4: for the raw feature set, which contains $K$ types of statistical features, the PCC between each feature and the remaining $K - 1$ features is calculated, and thus, each feature has $K - 1$ PCCs. Then, the SPCC (sum of the $K - 1$ PCCs) can be obtained. Given two samples $X = \{x_1, x_2, x_3, \ldots, x_n\}$ and $Y = \{y_1, y_2, y_3, \ldots, y_n\}$, the PCC is defined as follows: where $\mu_X$ and $\mu_Y$ are the means of the samples, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of samples $X$ and $Y$, respectively. Next, there is an SPCC sequence $SPCC = \{SPCC(1), SPCC(2), \ldots, SPCC(K)\}$, and $SPCC(k)$ is defined as follows: where $PCC_{ki}$ represents the PCC between the $k$-th type of feature and the $i$-th type of feature. In this paper, we suppose that the higher the SPCC of a feature, the higher the redundancy that the feature introduces into the raw feature set.
Step 5: a new feature evaluation index, FPSD, can be obtained by combining the FSD and SPCC. The expression of FPSD is presented as follows: where $\mu \in [0, 1]$ is a balance factor. When $\mu$ is 0, FPSD takes only feature correlation into account; conversely, when $\mu$ is 1, FPSD takes only fault sensitivity into account. In this paper, it is presumed that the higher the value of FPSD, the higher the selection priority of the feature. Therefore, the sorted FPSD sequence can be obtained by sorting the FPSD values of the features in descending order. The sorted FPSD sequence can be used to select features for the subsequent fault diagnosis process.
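Steps 1 to 5 can be sketched in Python as follows. Several parts are assumptions rather than the paper's definitions: the ARI uses the common pair-counting form $2(ad - bc)/[(a+b)(b+d) + (a+c)(c+d)]$, the MD is taken to be the mean absolute deviation about the class mean, the SPCC is the plain sum of the $K - 1$ PCCs, and the FPSD is assumed to be $\mu \cdot FSD - (1 - \mu) \cdot SPCC$, which is only one form consistent with the stated behavior at $\mu = 0$ and $\mu = 1$. All function names are hypothetical.

```python
import math
from itertools import combinations


def adjusted_rand_index(u, v):
    """ARI from the pair counts a, b, c, d described in Step 1 (assumed standard form)."""
    if len(u) != len(v):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    # denom can be zero only when b = c = 0, i.e. the partitions agree on every pair.
    return 1.0 if denom == 0 else 2.0 * (a * d - b * c) / denom


def mean_deviation(values):
    """Assumed MD: mean absolute deviation of one class's samples about their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def fault_sensitivity(ari, feature_by_class):
    """FSD = ARI / SMD, with SMD the sum of the per-class MDs (Steps 2 and 3)."""
    smd = sum(mean_deviation(row) for row in feature_by_class)
    if smd == 0:
        raise ValueError("SMD is zero, so the ratio is undefined")
    return ari / smd


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples (Step 4)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant feature")
    return cov / (sx * sy)


def spcc(features):
    """For each feature, sum its PCCs with the remaining K - 1 features."""
    count = len(features)
    return [sum(pearson(features[k], features[i]) for i in range(count) if i != k)
            for k in range(count)]


def rank_by_fpsd(fsd, spcc_values, mu=0.5):
    """Assumed FPSD = mu * FSD - (1 - mu) * SPCC; returns feature indices, best first."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    scores = [mu * f - (1.0 - mu) * s for f, s in zip(fsd, spcc_values)]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
```

In practice the FSD and SPCC sequences would likely need to be brought to a comparable scale before they are combined; the sketch leaves that choice open because the text does not specify it.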

Transfer Component Analysis with Preserving Local Manifold Structure (TCAPLMS).
TCA can preserve the variance of the data to the greatest extent and minimize the marginal distribution differences between datasets in different domains as much as possible [41]. However, TCA does not take the label information and the local manifold structure of the feature data into consideration. For fault pattern recognition, the label information of the training feature dataset helps to improve the discriminant performance of the feature data and to increase the classification accuracy [9,17]. Furthermore, preserving the local manifold structure of the data is beneficial to pattern recognition and classification of multimode feature data [2,17]. Therefore, TCAPLMS, a novel feature-based transfer learning method, is proposed in this section. TCAPLMS inherits the merits of TCA and LFDA; that is, the optimization goal of an improved LFDA is integrated into TCA, so that the label information of the feature data is considered and the local manifold structure of the data is preserved. Based on the introduction of TCA and LFDA in Section 2, the optimization goal of TCAPLMS can be defined by integrating the optimization goals of TCA and the improved LFDA.
The goal function of TCAPLMS is presented as follows: where the $S$ terms build on equations (17) and (18). The expressions of $S$ are shown as follows: where $W^{(sw)}_{i,j}$ and $W^{(sb)}_{i,j}$ are presented as follows: In equation (39), the condition $y_i \neq y_j\ (j \in \mathrm{Nst}(i))$ indicates that $x_j$ is a nearest neighbor of $x_i$ that belongs to a different class. The reason for this modification is a shortcoming of LFDA: the neighbor relationships between samples in the same class are taken into account, but the neighbor relationships between samples in different classes are not. To address this problem, the between-class scatter matrix $S^{(b)}$ is modified to $S_{SM}$. Solving equation (36) can be transformed into solving a trace optimization problem. A diagonal matrix containing the Lagrange multipliers is applied to equation (36), which is presented as follows: Then, the matrix $W$ can be obtained by solving a generalized eigenvalue problem: Finally, the eigenvalues and the corresponding eigenvectors are obtained by solving the above problem, and the $d$ eigenvectors corresponding to the $d$ smallest nonnegative eigenvalues ($d < D$, where $D$ is the higher dimension of the inputs of TCAPLMS) are selected to compose the transformation matrix $W$.
With the use of the proposed TCAPLMS, the low-dimensional representation of the training and testing datasets can be obtained with a smaller difference of marginal distributions between them, and they have greater discriminant performance and less redundant information.
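The final selection of eigenvectors can be sketched as follows. The sketch assumes that the eigenpairs of the generalized eigenvalue problem have already been computed by some numerical solver and only illustrates the rule of keeping the $d$ eigenvectors with the smallest nonnegative eigenvalues. The function name `select_components` and the tolerance `tol` are hypothetical.

```python
def select_components(eigenvalues, eigenvectors, d, tol=1e-12):
    """Return the d eigenvectors whose eigenvalues are the smallest nonnegative ones.

    eigenvalues[i] is assumed to belong to eigenvectors[i]; each eigenvector is a list.
    Values slightly below zero (within tol) are treated as nonnegative round-off.
    """
    pairs = [(val, vec) for val, vec in zip(eigenvalues, eigenvectors) if val >= -tol]
    if len(pairs) < d:
        raise ValueError("fewer than d nonnegative eigenvalues are available")
    pairs.sort(key=lambda pair: pair[0])  # ascending eigenvalue, stable for ties
    return [vec for _, vec in pairs[:d]]


# Toy eigenpairs, used only to show the selection rule.
values = [0.5, -0.1, 0.0, 2.0]
vectors = [[1, 0], [0, 1], [1, 1], [1, -1]]
W_columns = select_components(values, vectors, d=2)
```

The returned vectors form the columns of the transformation matrix; the negative eigenvalue is discarded before sorting, so it can never be selected.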

System Framework.
The structural block diagram of the proposed system framework for variable-condition bearing fault diagnosis is shown in Figure 1. According to the system framework, the entire procedure has four steps, namely, signal processing, feature extraction, feature transfer learning, and fault pattern recognition. There are two phases, the training phase and the testing phase. First, the vibration signals collected for training and testing are each decomposed into different packet nodes by MODWPT. Then, the original statistical features are generated. In the training phase, the original feature set is processed by the proposed PSFFC to obtain the sorted FPSD sequence, and the most preferred features are selected for training the fault diagnosis model. In the testing phase, the sorted FPSD sequence obtained from the training phase is directly applied to select the preferred features and construct the feature subset. Next, the labeled feature data from the training phase and the unlabeled feature data from the testing phase are chosen as the source domain and the target domain, respectively. The proposed TCAPLMS is employed to process the source domain and the target domain, which yields the low-dimensional feature datasets. Then, the low-dimensional feature dataset of the training phase is employed to train the pattern recognition classifier. Finally, the trained pattern recognition classifier is applied to the low-dimensional testing feature dataset and outputs the fault diagnosis accuracy.
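The reuse of the training-phase ranking in the testing phase can be illustrated with a short sketch. The function name `select_features`, the row-per-sample layout, and the toy numbers are assumptions made for illustration only.

```python
def select_features(feature_matrix, ranking, top_n):
    """Keep the top_n ranked feature columns of every sample, in ranking order."""
    if top_n > len(ranking):
        raise ValueError("top_n exceeds the number of ranked features")
    chosen = ranking[:top_n]
    return [[row[k] for k in chosen] for row in feature_matrix]


# Ranking of feature indices obtained once, from the training data only.
training_ranking = [2, 0, 1]

train_features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
test_features = [[7.0, 8.0, 9.0]]

# The same ranking is applied to both phases, so the columns stay aligned.
train_subset = select_features(train_features, training_ranking, top_n=2)
test_subset = select_features(test_features, training_ranking, top_n=2)
```

Applying one ranking to both phases matters because the testing data are unlabeled, so the fault sensitivity cannot be re-estimated on them.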

Experiments and Analysis Results
In this paper, bearing vibration datasets obtained from two experimental test platforms are used to validate the effectiveness of the proposed fault diagnosis framework for real industrial scenarios. The two experimental test rigs are introduced as follows: (1) Test rig 1, shown in Figure 2, is from Case Western Reserve University (CWRU) [48][49][50], and this test rig supports a motor load of 0-3 horsepower (hp). Three accelerometers are placed on the fan-end and drive-end bearings at the 12 o'clock position. (2) Test rig 2 is the SQI-MFS test rig, shown in Figure 3 [9, 17]; its different fault conditions are presented in Figure 4. SQI-MFS supports motor speeds of 1200-1800 rpm. Two accelerometers are placed on the fan-end and drive-end bearings to collect vibration signals. A high-speed AD collector is used to collect the vibration data under different working conditions.

Experimental Analysis.
First, the vibration signals are processed by MODWPT to obtain the wavelet packet nodes. In this section, "dmey" is selected as the mother wavelet and the decomposition level is 4. One normal sample, one ball fault sample, one inner race fault sample, and one outer race fault sample from the 3 hp training set are presented in Figures 5-8, respectively. Signal processing yields 16 terminal nodes, and the HES of the reconstructed signal of each terminal node is calculated, giving 16 HES. Therefore, the raw feature set (RFS) consists of 192 statistical features, generated by calculating the 6 statistical parameters of each of the 16 reconstructed signals and each of the 16 HES (6 × 32 = 192). Table 3 presents the 6 statistical parameters.
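The feature count above can be checked with a short sketch. It assumes that HES denotes the Hilbert envelope spectrum and that the 16 reconstructed node signals are already available (random placeholders stand in for the MODWPT output here). Table 3 is not reproduced in this text, so the six statistics used below (range, mean value, standard deviation, kurtosis, energy, energy entropy) are an assumed set chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hilbert_envelope_spectrum(x):
    """Amplitude spectrum of the Hilbert envelope of a real 1-D signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Build the analytic signal: keep DC, double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(spectrum * gain))
    # Remove the mean so the DC term does not dominate the spectrum.
    return np.abs(np.fft.rfft(envelope - envelope.mean()))


def six_statistics(x):
    """Six summary statistics of a 1-D array (assumed set, see lead-in)."""
    x = np.asarray(x, dtype=float)
    energy = float(np.sum(x ** 2))
    p = x ** 2 / energy if energy > 0 else np.zeros_like(x)
    p = p[p > 0]
    energy_entropy = float(-np.sum(p * np.log(p)))
    std = float(x.std())
    kurtosis = float(np.mean((x - x.mean()) ** 4) / std ** 4) if std > 0 else 0.0
    value_range = float(x.max() - x.min())
    return [value_range, float(x.mean()), std, kurtosis, energy, energy_entropy]


def raw_feature_vector(node_signals):
    """Concatenate statistics of each node signal and of its HES."""
    feats = []
    for signal in node_signals:
        feats.extend(six_statistics(signal))
        feats.extend(six_statistics(hilbert_envelope_spectrum(signal)))
    return np.array(feats)


rng = np.random.default_rng(0)
node_signals = rng.normal(size=(16, 1024))  # placeholder for 16 reconstructed nodes
features = raw_feature_vector(node_signals)  # 16 nodes x 2 views x 6 statistics
```

Each sample therefore contributes one 192-dimensional row to the RFS, regardless of the signal length.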
For the RFS, the proposed feature selection method PSFFC is employed to evaluate the feature priority selection degree of each statistical feature of the training data. The ARI, SSMD, FSD, SPCC, and FPSD of the 192 features of the training samples are shown in Figures 9-13, respectively. The horizontal axis of Figures 9-13 is the feature number. After the PSFFC procedure, a sorted FPSD sequence is obtained and the preferred feature subset is formed. Then, the proposed feature-based transfer learning method TCAPLMS is applied to reduce the marginal distribution differences between the training and testing feature subsets and to obtain a low-dimensional feature set with desirable discriminant performance. In this paper, the parameters a and μ in TCAPLMS are set to 0.5 and 0.3, respectively. Finally, the low-dimensional feature set is used to train the fault diagnosis model.

The diagnosis models are listed in Table 4. For example, RFS-SVM is an SVM-based diagnosis model that takes the RFS as the SVM input. Embedding TCA into RFS-SVM gives the RFS-TCA-SVM model, and embedding PSFFC into RFS-TCA-SVM gives the RFS-PSFFC-TCA-SVM model. In this paper, the average diagnostic accuracy over the 12 bearing conditions is reported in the experimental analysis, as detailed below.
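To make the domain adaptation step more concrete, the following sketch implements the standard transfer component analysis (TCA) baseline with a linear kernel. It is not TCAPLMS: the label information and local manifold terms of the proposed method are omitted, and the function name `tca_linear`, the kernel choice, the trade-off value `mu=1.0`, and the random data are all assumptions made for illustration.

```python
import numpy as np


def tca_linear(source, target, dim, mu=1.0):
    """Baseline TCA with a linear kernel; returns embedded source and target."""
    ns, nt = len(source), len(target)
    n = ns + nt
    x = np.vstack([source, target])
    kernel = x @ x.T  # linear kernel over all samples

    # MMD coefficient matrix: 1/ns^2 within source, 1/nt^2 within target,
    # and -1/(ns*nt) across domains.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    mmd = np.outer(e, e)

    # Centering matrix used to preserve data variance.
    center = np.eye(n) - np.ones((n, n)) / n

    # Leading eigenvectors of (K L K + mu I)^{-1} K H K give the projection.
    a = np.linalg.solve(kernel @ mmd @ kernel + mu * np.eye(n),
                        kernel @ center @ kernel)
    eigvals, eigvecs = np.linalg.eig(a)
    order = np.argsort(-eigvals.real)[:dim]
    w = eigvecs[:, order].real

    z = kernel @ w
    return z[:ns], z[ns:]


rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, size=(30, 12))  # stands in for training features
target = rng.normal(loc=0.5, size=(20, 12))  # shifted "working condition"
zs, zt = tca_linear(source, target, dim=5)
```

A classifier such as SVM or KNN would then be trained on `zs` with the source labels and evaluated on `zt`.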
In the first group of experiments, PSFFC is not performed. The experimental results of the diagnosis models listed in Table 4 [...] diagnosis model, and the experimental results of the RFS-SVM and RFS-KNN models using MODWPT and WPT are presented in Table 5. According to the diagnosis accuracies in Table 5, the models using MODWPT clearly outperform those using WPT. Therefore, only the results of the models using MODWPT are analyzed below. For the testing set of case 1, all models obtain desirable diagnosis accuracy: the maximum accuracies of RFS-SVM, RFS-LFDA-SVM, RFS-TCA-SVM, and RFS-TCAPLMS-SVM are 98.54%, 99.79%, 91.67%, and 99.38%, respectively. The main reason is that the training and testing samples come from the same working condition, so their distributions are almost the same. For the testing set of case 2, the highest accuracies of RFS-SVM, RFS-LFDA-SVM, and RFS-TCA-SVM are only 83.54%, 87.08%, and 87.29%, respectively, whereas RFS-TCAPLMS-SVM reaches 97.50%, clearly higher than the other models. The KNN-based models behave similarly, and the highest accuracy of RFS-TCAPLMS-KNN is 97.08%, again clearly higher than the other models. The first group of experiments thus exposes a problem: a conventional fault diagnosis model cannot easily guarantee good diagnosis performance when the training and testing sets follow different distributions. Using TCAPLMS helps the diagnosis model attain desirable diagnosis performance.

Table 3 (recoverable entries only): range, energy, energy entropy, mean value.

In the second group of experiments, PSFFC is performed before the feature transfer learning and pattern classification steps.
The experimental results of the fault diagnosis models listed in Table 4 are shown in Tables 9-12 [...] achieve desirable performance, clearly better than RFS-SVM; their maximum diagnosis accuracies are 95.63% (psfn = 90), 97.50% (psfn = 40), and 100% (psfn = 40, 70, 80), respectively. The KNN-based models behave similarly, and for the testing set of case 2 the highest diagnosis accuracy of RFS-PSFFC-TCAPLMS-KNN is 100%. According to the second group of experiments, under different working conditions the proposed PSFFC helps the diagnosis models improve their performance, and the combination of PSFFC and TCAPLMS attains ideal fault diagnosis accuracy when a suitable psfn is used; for the testing set of case 2, the RFS-PSFFC-TCAPLMS-SVM model attains 100% accuracy when psfn is 40, 70, or 80. Therefore, the effectiveness and adaptability of PSFFC and TCAPLMS are verified. [...] Table 13.

Experimental Analysis.
The experimental procedure is the same as that for test rig 1. First, the vibration signals are processed by MODWPT to obtain the wavelet packet nodes. One normal, one ball fault, one inner race fault, and one outer race fault vibration signal sample from the 1800 rpm training set are presented in Figures 23-26 [...] TCAPLMS is further applied to the training and testing feature subsets, which helps to obtain desirable discriminant performance. Finally, the low-dimensional feature set is used to train the fault diagnosis model. [...] Table 16. When the testing dataset is from case 1, all models attain desirable diagnosis accuracy. The highest diagnosis accuracy of RFS-PSFFC-TCAPLMS-SVM (psfn of 110-150) and RFS-PSFFC-TCAPLMS-KNN (psfn of 120-150) is 100%. With a suitable psfn, the fault diagnosis model also attains desirable results for the testing set of case 2; for example, the highest testing accuracy of the RFS-PSFFC-TCAPLMS-SVM model is 89.50% when psfn is 80, clearly higher than that of RFS-TCAPLMS-SVM. The testing accuracy curves of RFS-PSFFC-TCAPLMS-SVM and RFS-PSFFC-TCAPLMS-KNN are shown in Figures 32 and 33. The testing accuracies of the models using PSFFC, TCA, LFDA, and TCAPLMS are presented in Figure 34. Therefore, the effectiveness and adaptability of PSFFC and TCAPLMS are further validated.

Conclusions
In real industrial scenarios, complex working conditions mean that data-driven diagnosis methods based on conventional machine learning often struggle to achieve the desirable fault diagnosis performance, because the feature distributions of the training and testing data are assumed to be the same. To address this problem, a novel intelligent bearing fault diagnosis framework is proposed for real industrial scenarios. In this framework, an improved domain adaptation method, transfer component analysis with preserving local manifold structure (TCAPLMS), is proposed to reduce the marginal distribution differences between different domain datasets while taking the label information of the feature dataset and the local manifold structure of the feature data into consideration. Furthermore, preferred feature selection by fault sensitivity and feature correlation (PSFFC) is embedded into the framework to select the features that are most beneficial to fault pattern recognition and to reduce the redundancy of the feature set. Finally, vibration signal datasets collected from two experimental test platforms are used for experiments. The results indicate that the proposed PSFFC and TCAPLMS have great potential in actual bearing fault diagnosis applications. In the experiments, two cases are selected for comparison; for test rig 1, cases 1 and 2 share the same training samples but use different testing samples. The experimental results show that the diagnosis model using PSFFC and TCAPLMS attains desirable performance and improves the generalization ability of the models, and with a suitable psfn, cases 1 and 2 both attain 100% diagnosis accuracy.
In summary, the experimental results from test rig 2 further demonstrate the effectiveness, adaptability, and great potential of the diagnosis model using PSFFC and TCAPLMS under variable working conditions.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.