In mechanical fault diagnosis, it is difficult to collect massive labeled samples drawn from the same distribution in real industrial settings. Transfer learning, a promising method, is commonly used to address this problem. However, as the number of samples increases, the interdomain distribution discrepancy measures of existing methods incur higher computational complexity, which may degrade the generalization ability of the method. To solve this problem, we propose a deep transfer learning method based on a 1D-CNN for rolling bearing fault diagnosis. First, a one-dimensional convolutional neural network (1D-CNN) is used as the basic framework to extract features from the vibration signal. CORrelation ALignment (CORAL) is employed to minimize the marginal distribution discrepancy between the source domain and target domain. Then, the cross-entropy loss function and the Adam optimizer are used to minimize the classification errors and the second-order statistics of the feature distance between the source domain and target domain, respectively. Finally, based on the bearing datasets of Case Western Reserve University and Jiangnan University, comparative transfer fault diagnosis experiments with seven methods are carried out. The results show that our method achieves better performance.

As an essential component of mechanical systems, bearings are widely used in rotating machinery. Once a bearing fails, it can cause the whole mechanical system to fail, shutting down the rotating machinery and even causing casualties. Therefore, bearing fault diagnosis is of great significance to the healthy and safe operation of machinery and has attracted increasing attention from scholars and the manufacturing industry [

In the last decade, with the increasingly complex structure of mechanical equipment and the rapid development of sensor technology, acquiring vibration signals has become easy, bringing new perspectives and challenges to the traditional intelligent fault diagnosis of rotating machinery [

In order to meet the actual needs of bearing fault diagnosis in engineering practice, transfer learning, a classification paradigm that reuses knowledge learned from a source domain in an unknown target domain, has attracted increasing attention in fault diagnosis [

Based on the above literature analysis, transfer learning has made great breakthroughs for insufficient training data and data collected under varying conditions. However, existing transfer learning-based methods mainly focus on how to measure the interdomain feature marginal distribution discrepancy in domain adaptation. Maximum mean discrepancy (MMD), a well-known distance metric for domain adaptation, has been widely adopted for marginal distribution optimization and has achieved good performance [

Motivated by the above analysis, this paper proposes an intelligent bearing fault diagnosis method based on deep transfer learning with a CORAL loss metric, which is used to measure the interdomain marginal distribution discrepancy. First, as a basic feature representation-learning framework, a CNN is used to obtain a robust feature space from the vibration signals. To estimate the marginal distribution discrepancy between the source and target domains, nonlinear-transformed CORAL domain adaptation is exploited to minimize the discrepancy and, at the same time, to constrain the CNN parameters so that the CNN learns a more robust feature representation. Then, two objectives are optimized: one is the conditional classifier based on the CNN, whose classification error is minimized by the cross-entropy loss function; the other is the second-order statistics of features between the source and target domains, optimized by the Adam method. Finally, twelve comparative experiments on the Case Western Reserve University bearing dataset and six comparative experiments on the Jiangnan University bearing dataset are carried out to verify the effectiveness of our method. The main contributions of our method lie in the following aspects:

A one-dimensional (1D) CNN is built to extract representative features from the original vibration signal; domain adaptation is then performed only in the last two layers, unlike other CNN-based methods that adapt the last three, in order to reduce computational cost.

A differentiable loss function is constructed that extends the CORAL metric to domain adaptation, minimizing the marginal distribution discrepancy between the cross-domain representation feature covariances.

Two objectives are optimized: the cross-entropy loss function minimizes the classification error of the CNN, and the Adam optimizer minimizes the distance between the second-order statistics of the source-domain and target-domain features.

The remainder of this paper is organized as follows. In Section

In this section, we mainly introduce the model structures of TL and CNN, which are used to transfer learned knowledge from the source domain to the target domain and to classify faults. In addition, we introduce the relevant theory of CORAL.

TL, an important branch of machine learning, is usually used to tackle the problems of insufficient data and marginal distribution inconsistency by transferring knowledge learned from training data to testing data. It has been widely used in fault diagnosis [

The classification accuracy of traditional machine learning methods drops sharply when the source and target domains do not share the same marginal distribution. Thus, domain adaptation is used to weaken the influence of the marginal distribution inconsistency between the two domains [

The framework of transfer learning.

Given a labeled source domain

MMD, a distance widely used in transfer learning to measure the interdomain distribution discrepancy, is constructed as a new regularization term in the loss function to make the distribution discrepancy between the two domains as small as possible. It yields a nonparametric distance between interdomain feature distributions without computing intermediate densities. Measuring the marginal distribution discrepancy of the migrated data in a reproducing kernel Hilbert space (RKHS), MMD is defined as follows:
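The closed-form RKHS definition referenced above did not survive extraction; as a hedged illustration (not necessarily the paper's exact implementation), a biased empirical estimate of the squared MMD with a Gaussian kernel can be computed as:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # pairwise squared distances between the rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    # biased empirical estimate of the squared MMD in the RKHS:
    # mean self-similarity of each domain minus twice the cross term
    kss = gaussian_kernel(xs, xs, sigma).mean()
    ktt = gaussian_kernel(xt, xt, sigma).mean()
    kst = gaussian_kernel(xs, xt, sigma).mean()
    return kss + ktt - 2 * kst
```

The estimate is zero when both domains share the same samples and grows as their distributions drift apart.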

CNN, one of the most representative networks in the field of deep learning, has been extensively used in civil structures, mechanical structures, and wind engineering [

The convolution layer is the core layer of a CNN and contains a set of trainable filters. Weight sharing is the most important characteristic of the convolution layer: it reduces the number of network parameters, avoiding the overfitting caused by too many parameters and relaxing the computational load. The convolution operation is expressed as follows:

Generally speaking, the pooling layer (PL) performs a downsampling operation. Its main purpose is to reduce the number of parameters of the neural network while retaining representative features, preventing overfitting and improving the generalization ability of the model. The PL operation is carried out as follows:
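The two equations referenced above were lost in extraction; as a minimal sketch (assuming valid, no-padding windows), the shared-weight convolution and max-pooling operations can be written as:

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1D cross-correlation (the 'convolution' used in CNNs)
    of signal x with a single shared kernel w."""
    n = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + len(w)], w)
                     for i in range(n)])

def max_pool1d(x, size=2, stride=2):
    """Downsampling: keep only the largest value in each window."""
    n = (len(x) - size) // stride + 1
    return np.array([x[i * stride : i * stride + size].max()
                     for i in range(n)])
```

Because the same kernel `w` slides over every position, the parameter count is independent of the signal length, which is the weight-sharing property described above.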

The fully connected layer plays the role of a “classifier” in the whole neural network. First, the output of the last pooling layer is flattened into a one-dimensional feature vector as the input of the fully connected layer. Then, the inputs and outputs are fully connected, and the activation function of the hidden layers is ReLU. Finally, the Softmax function is applied to the output layer; the computation of the fully connected layer is given as follows:
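As a hedged sketch of the classifier head's final computations, the numerically stabilized Softmax and the cross-entropy loss used later for training can be written as:

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max before exponentiating for stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true class indices
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()
```

The loss shrinks toward zero as the probability assigned to the correct class approaches one, which is the classification objective minimized on the source domain.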

CORAL is an effective and simple unsupervised domain adaptation method first proposed by [

Unlike MMD-based approaches, which usually apply the same transformation to both the source and target domains, [

Domain adaptation is achieved by minimizing the difference between the feature spaces of the source and target domains using the CORAL method. By adding the CORAL loss to the optimization objective, the similarity of the feature spaces learned in the source and target domains is maximized, compensating for the CNN's insufficient learning of a domain-invariant feature space.
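A minimal sketch of this CORAL loss, written as in Deep CORAL as the squared Frobenius distance between the source and target feature covariances scaled by 1/(4d²) (whether the paper uses exactly this scaling is an assumption):

```python
import numpy as np

def coral_loss(xs, xt):
    """Squared Frobenius distance between the source and target
    feature covariance matrices, scaled by 1/(4 d^2)."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)   # d x d source covariance
    ct = np.cov(xt, rowvar=False)   # d x d target covariance
    return np.sum((cs - ct) ** 2) / (4 * d ** 2)
```

In training this term is added to the classification loss with a penalty weight λ, so the network is pushed toward features whose second-order statistics match across domains.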

The condition classifier based on CNN consists of 10 layers of one-dimensional CNN: one input layer, four convolution layers, two pooling layers, two fully connected layers, and one output layer. The first seven layers form the feature extractor, which extracts condition-representative features from the vibration signals, while the last layer serves as the condition classifier that judges the condition of a test sample. The input layer is constructed from a one-dimensional vibration signal of length 784. In the convolution layers, convolution kernels are applied to local regions of the input signal to generate the corresponding features, as shown in Figure

The condition classifier framework based on CNN.

In order to reduce the dimension of the convolution features while preserving representative features as much as possible, a pooling layer is connected after the first convolution layer and after the last convolution layer. Through the four convolution layers and two pooling layers, the input features are flattened in the first fully connected layer

Structure and parameters of CNN.

Layers | Parameters | Activation function |
---|---|---|
Input | — | — |
Conv1 | Kernels: 1 × 64 × 16, stride: 16 | ReLU |
Pool1 | Stride: 2, max pooling | — |
Conv2 | Kernels: 1 × 3 × 32, stride: 1 | ReLU |
Conv3 | Kernels: 1 × 5 × 64, stride: 1 | ReLU |
Conv4 | Kernels: 1 × 5 × 128, stride: 1 | ReLU |
Pool2 | Stride: 2, max pooling | — |
FC1 | Weights: 5000 | ReLU |
FC2 | Weights: 1000 | ReLU |
Output | Weights: 10 | Softmax |
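The feature length implied by the table can be traced layer by layer. This sketch assumes 'valid' (no-padding) convolutions and pooling windows of size 2, which the paper does not state explicitly, so the final flattened length is illustrative:

```python
def conv_out(n, kernel, stride=1):
    # output length of a 'valid' 1D convolution or pooling window
    return (n - kernel) // stride + 1

# trace the feature length through the layers of the table above
n = 784                     # input vibration-signal length
n = conv_out(n, 64, 16)     # Conv1: kernel 64, stride 16 -> 46
n = conv_out(n, 2, 2)       # Pool1: window 2, stride 2   -> 23
n = conv_out(n, 3)          # Conv2: kernel 3             -> 21
n = conv_out(n, 5)          # Conv3: kernel 5             -> 17
n = conv_out(n, 5)          # Conv4: kernel 5             -> 13
n = conv_out(n, 2, 2)       # Pool2: window 2, stride 2   -> 6
flat = n * 128              # 128 channels after Conv4    -> 768
```

With same-padding instead of valid convolutions the numbers change, which is why the flattened width feeding FC1 should be recomputed for the padding actually used.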

Domain adaptation is an important means to transfer knowledge from the source domain to the target domain when their data marginal distributions are inconsistent, and it determines the efficiency of knowledge transfer. The common MMD domain adaptation criterion suffers from high computational complexity and low generalization ability as the data volume increases. In order to effectively measure the marginal distribution difference between the source and target domains, we use a differentiable loss function to minimize the marginal distribution discrepancy between them [

In this subsection, we describe the optimization objectives of the proposed method in detail. There are two objectives to be optimized: (1) minimize the conditional classification errors on the source domain dataset, given as in Figure

Let

Based on equation (

In the training process of the proposed method, the Adam optimization algorithm is used for objective optimization [
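A hedged sketch of a single Adam update with the standard hyperparameters (the paper's learning rate of 0.001 is assumed as the default; the gradients would come from the combined classification-plus-CORAL objective):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    and its square are bias-corrected, then scale the step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected 1st moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected 2nd moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Applied repeatedly to the gradient of a convex toy objective such as f(θ) = θ², the update drives the parameter toward its minimizer, which is the behavior exploited in the training process described here.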

The flowchart of our method.

In order to test the performance of the proposed intelligent fault diagnosis method and verify its effectiveness, we conduct experiments using two bearing datasets. Comparative experiments are also carried out to compare the classification accuracy with existing methods, including a traditional CNN without transfer learning, TCA-based [

The data for Experiment 1 come from the Bearing Data Center of Case Western Reserve University [

Bearing testbed used for experiment.

The vibration signals are collected by a bearing test platform consisting of a motor (left), a dynamometer (right), and a control circuit, with the test bearing being an SKF 6205. We divide the experimental data into four operating-condition datasets, each containing ten classes: one healthy class and nine fault classes, covering normal (N), inner-race fault (IF), outer-race fault (OF), and ball fault (BF). Each fault type has three degrees of severity (0.007, 0.014, and 0.021 inch fault diameters), so there are 9 fault conditions and 1 health condition. More details are given in Table

Experimental data description.

Dataset | Classes (fault size, ×0.001 inch) | Samples | Operating condition |
---|---|---|---|
A | N (0), IF (7, 14, 21), BF (7, 14, 21), OF (7, 14, 21) | 10 × 150 | 0 HP (1797 rpm) |
B | N (0), IF (7, 14, 21), BF (7, 14, 21), OF (7, 14, 21) | 10 × 150 | 1 HP (1772 rpm) |
C | N (0), IF (7, 14, 21), BF (7, 14, 21), OF (7, 14, 21) | 10 × 150 | 2 HP (1750 rpm) |
D | N (0), IF (7, 14, 21), BF (7, 14, 21), OF (7, 14, 21) | 10 × 150 | 3 HP (1730 rpm) |

In our experiments, we select the original vibration signals with a sampling rate of 12 kHz, and four different motor speeds (1797, 1772, 1750, and 1730 rpm) are applied to the bearing. We regard them as four different operating conditions (named A, B, C, and D), each containing 1500 samples (nine fault classes with 150 samples each, plus 150 samples of the health condition). The waveforms of the various categories of original vibration signals used in the experiment are illustrated in Figure

Original waveform of various working conditions.

The second bearing dataset comes from the rolling bearing fault diagnosis testbed of the centrifugal fan system at Jiangnan University [

Original waveform of various working conditions.

In order to present a comprehensive evaluation of our method, we first carry out fault diagnosis experiments both from source domain to source domain and from source domain to target domain. The classification accuracies of the proposed method under various conditions are shown in Table

Classification accuracy (%) of various conditions.

Transfer condition | Source domain accuracy (%) | Target domain accuracy (%) |
---|---|---|
A ⟶ B | 100 | 98.50 |
B ⟶ A | 100 | 98.33 |
A ⟶ C | 100 | 99.07 |
C ⟶ A | 100 | 98.40 |
A ⟶ D | 100 | 99.00 |
D ⟶ A | 100 | 93.87 |
B ⟶ C | 100 | 99.53 |
C ⟶ B | 100 | 97.67 |
B ⟶ D | 100 | 98.40 |
D ⟶ B | 100 | 95.20 |
C ⟶ D | 100 | 98.93 |
D ⟶ C | 100 | 97.20 |
AVG | 100 | 97.85 |

Furthermore, to analyze the classification accuracy of each category in more detail, the widely used confusion matrix is employed to assess the performance of the compared methods. Here, we randomly choose task A ⟶ B to compute the confusion matrix. The detailed results are illustrated in Figure

Confusion matrix of the prediction for task A ⟶ B: (a) CNN; (b) DDC; (c) DAN; (d) the proposed.

From Figure

For a detailed comparative experiment, we group the compared algorithms into categories according to their feature extraction; methods with the same feature extraction are further distinguished by their transfer manner. The details of the various methods are introduced in Table

Parameters and category of compared method.

Category | Method | Features | Transfer manner |
---|---|---|---|
1 | CNN | Learned features | No transfer |
2 | TCA [ | Handcrafted features | MMD |
2 | CORAL [ | Handcrafted features | Covariances |
3 | WD-DTL [ | Learned features | Wasserstein distance |
4 | DDC [ | Learned features | MMD |
4 | DAN [ | Learned features | MMD |
5 | Our method | Learned features | Covariances |

Classification accuracy of various methods.

Method | A ⟶ B | B ⟶ A | A ⟶ C | C ⟶ A | A ⟶ D | D ⟶ A | B ⟶ C | C ⟶ B | B ⟶ D | D ⟶ B | C ⟶ D | D ⟶ C | AVG |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CNN | 95.80 ± 0.1% | 95.67 ± 0.43% | 93.33 ± 1.26% | 92.40 ± 2.13% | 95.33 ± 0.17% | 77.60 ± 1.17% | 95.40 ± 0.28% | 96.67 ± 0.26% | 91.07 ± 0.47% | 77.27 ± 5.35% | 97.13 ± 0.12% | 75.20 ± 4.22% | 90.24 |
TCA [ | 79.13 ± 3.17% | 77.33 ± 2.22% | 72.93 ± 1.13% | 63.00 ± 1.24% | 66.67 ± 4.73% | 74.47 ± 0.62% | 82.80 ± 0.71% | 78.27 ± 1.52% | 75.27 ± 3.53% | 74.13 ± 1.22% | 61.07 ± 4.24% | 68.44 ± 5.36% | 72.79 |
CORAL [ | 75.55 ± 2.16% | 72.33 ± 4.22% | 69.80 ± 3.52% | 66.86 ± 7.14% | 52.73 ± 4.15% | 54.00 ± 5.56% | 72.53 ± 1.93% | 70.47 ± 2.18% | 80.86 ± 0.19% | 76.93 ± 3.17% | 72.27 ± 4.16% | 67.60 ± 5.13% | 67.60 |
WD-DTL [ | 97.52 ± 3.09% | 96.80 ± 1.10% | 94.43 ± 2.99% | 92.16 ± 2.61% | 95.05 ± 2.52% | 89.82 ± 2.41% | 99.69 ± 0.59% | 96.03 ± 6.27% | 95.51 ± 2.52% | 95.16 ± 3.67% | 97.56 ± 3.31% | 99.62 ± 0.80% | 95.75 |
DDC [ | 95.60 ± 1.25% | 90.60 ± 2.43% | 95.00 ± 2.32% | 97.87 ± 0.12% | 86.73 ± 2.71% | 88.47 ± 2.04% | 92.80 ± 1.58% | 97.07 ± 0.21% | 89.53 ± 4.19% | 79.27 ± 0.23% | 96.87 ± 0.18% | 86.27 ± 2.26% | 91.34 |
DAN [ | 95.90 ± 1.17% | 93.33 ± 1.32% | 96.67 ± 0.59% | 96.00 ± 0.68% | 88.93 ± 4.66% | 88.00 ± 1.02% | 94.40 ± 1.63% | 98.00 ± 0.22% | 90.80 ± 1.13% | 78.33 ± 3.16% | 97.13 ± 0.33% | 81.93 ± 3.42% | 91.63 |
The proposed | 98.50 ± 0.11% | 98.33 ± 0.06% | 99.07 ± 0.12% | 98.40 ± 0.46% | 99.00 ± 0.21% | 93.87 ± 1.46% | 99.53 ± 0.14% | 97.67 ± 0.32% | 98.40 ± 0.18% | 95.20 ± 2.24% | 98.93 ± 0.13% | 97.20 ± 1.02% | 97.85 |

In the TCA method, MMD is used to calculate the distribution discrepancy of the two domains, and a Gaussian kernel function is employed to minimize it. The CORAL method aims to minimize the domain offset by aligning the second-order statistics (mean and covariance matrices) through a linear transformation. The details of the classification accuracy of the various transfer tasks are shown in Table

Cheng et al. proposed an intelligent fault diagnosis method based on Wasserstein distance deep transfer learning named WD-DTL [

Classification accuracy of mentioned methods.

In order to verify the results obtained in the previous section, we use the second dataset to run experiments with each method. The experimental results are shown in Table

Classification accuracy of various methods.

Method | E ⟶ F | F ⟶ E | E ⟶ G | G ⟶ E | F ⟶ G | G ⟶ F | AVG (%) |
---|---|---|---|---|---|---|---|
CNN | 62.8 ± 0.1% | 51.4 ± 0.43% | 53.33 ± 1.26% | 62.4 ± 2.13% | 55.4 ± 0.28% | 56.67 ± 0.26% | 57.00 |
TCA [ | 39.13 ± 3.17% | 37.33 ± 2.22% | 32.93 ± 1.13% | 33.25 ± 1.24% | 42.80 ± 0.71% | 38.27 ± 1.52% | 37.29 |
CORAL [ | 41.55 ± 2.16% | 36.33 ± 4.22% | 35.80 ± 3.52% | 36.86 ± 7.14% | 37.53 ± 1.93% | 40.47 ± 2.18% | 38.09 |
WD-DTL [ | 77.52 ± 3.09% | 74.80 ± 2.10% | 64.69 ± 2.99% | 72.86 ± 2.61% | 75.35 ± 2.59% | 68.74 ± 4.27% | 72.23 |
DDC [ | 71.67 ± 4.25% | 73.60 ± 4.43% | 66.70 ± 2.32% | 69.87 ± 4.12% | 73.80 ± 2.58% | 72.07 ± 2.21% | 71.29 |
DAN [ | 72.90 ± 2.17% | 76.33 ± 3.32% | 67.67 ± 3.59% | 66.40 ± 4.68% | 74.40 ± 3.63% | 72.30 ± 2.22% | 71.67 |
The proposed | 75.50 ± 3.11% | 81.33 ± 2.06% | 75.07 ± 3.52% | 75.40 ± 0.46% | 80.53 ± 3.14% | 71.67 ± 4.32% | 76.58 |

Classification accuracy of mentioned methods.

In this subsection, we give a detailed introduction to our experiments. The software framework is Python, and the GPU is an NVIDIA GTX 1660 Ti. In each experiment, the Adam optimizer is used with a learning rate of 0.001 and a batch size of 128. The penalty parameter λ affects the performance of transfer fault diagnosis; by tuning it over {0.1, 0.2, 0.5, 1, 10}, the best classification accuracy is acquired. As an example, we consider transfer task A ⟶ B to show the training process of its loss function. Due to Adam's fast convergence, the loss decreases rapidly in the first 30 iterations, stabilizes at about 50 iterations, and approaches 0 at about 400 iterations; the detailed process is illustrated in Figure
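The λ tuning described above amounts to a small grid search. In this sketch, `evaluate` is a hypothetical stand-in for training the model with a given penalty weight and returning the target-domain accuracy:

```python
def select_lambda(evaluate, grid=(0.1, 0.2, 0.5, 1, 10)):
    """Grid search over the CORAL penalty weight: `evaluate` is a
    hypothetical callable mapping a candidate lambda to target-domain
    accuracy; the best-scoring candidate is kept."""
    return max(grid, key=evaluate)
```

Each call to `evaluate` here would involve a full training run, so the grid is deliberately small.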

The loss of training process for the transfer task A ⟶ B.

In order to further demonstrate the knowledge transfer ability of our method at the feature level, t-SNE is also employed to visualize the learned features for further analysis of the classification accuracy [

Figure

t-SNE visualization of features: (a) CNN; (b) DDC; (c) DAN; (d) the proposed.

Figures

In this paper, a deep transfer learning method based on the CORAL metric for bearing fault diagnosis is proposed. The key idea of the proposed method is to employ the nonlinear-transform-based CORAL loss function to estimate the interdomain discrepancy. As a feature extractor and classifier, the CNN is used to train the condition classifier model, with its parameters constrained by the CORAL loss function. Eighteen transfer tasks across two different datasets are carried out to verify the classification performance of the proposed method, and five state-of-the-art architectures are compared with ours. The results illustrated above demonstrate that the proposed approach achieves more satisfactory classification accuracy and domain adaptation capability. However, in the second experiment, where the differences between working conditions are larger, the results obtained by our method were less satisfactory. This may be due to the following two limitations of CORAL: (1) aligning covariances with the usual Euclidean metric is suboptimal, and (2) second-order statistics have limited expressiveness for non-Gaussian distributions [

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

This research was supported by the National Natural Science Foundation of China (Grant Nos. 61803087 and 61972091) and the Guangdong Natural Science Foundation (Grant Nos. 2017A030310580 and 2017A030313388).