Fault Diagnosis of Rolling Bearing Based on Modified Deep Metric Learning Method

A novel fault diagnosis method for rolling bearings based on deep metric learning and the Yu norm, called the deep metric learning method based on Yu norm (DMN-Yu), is proposed in this paper. To address the misclassification caused by traditional deep metric learning based on a distance metric function, a similarity criterion based on the Yu norm is introduced into traditional deep metric learning. Firstly, the deep metric learning neural network (DMN) is used to adaptively extract the fault feature parameters. Secondly, considering that data samples at the boundary between different fault categories can be misclassified, the marginal Fisher analysis method based on the Yu norm is used to optimize the features. Then, the BPNN classifier of the DMN-Yu method is used to fine-tune the network parameters and diagnose the fault category. Finally, the effectiveness and feasibility of the proposed DMN-Yu method are verified with a rolling bearing fault diagnosis test, and the superiority of the proposed method is validated by comparing its diagnosis accuracy with that of the deep metric learning method based on Euclidean distance (DMN-Euc), the traditional deep belief network (DBN), and the support vector machine (SVM) combined with common time-domain statistical features.


Introduction
The rolling bearing is one of the key components in rotating machinery, and its running state has an important influence on the health of rotating machinery. In order to prevent production losses and casualties, it is necessary to monitor the running condition of rolling bearings and identify their fault categories [1]. The condition monitoring system is an effective tool to ensure the normal operation of the bearing. Owing to the long data acquisition time from the beginning of service to the end of life, the relatively large number of measuring points, and the high sampling frequency, mechanical big data can be obtained to reflect the health condition of bearings [2][3][4]. Therefore, it is of great significance to study how to effectively use this big data to diagnose bearing faults.
In recent years, many traditional intelligent diagnosis methods based on machine learning have been applied to the field of fault diagnosis. KNN, Bayes networks, and other shallow neural networks have all been used to diagnose the different fault categories of mechanical equipment [5][6][7][8][9], but these methods need feature parameters that are manually extracted using diagnosis experience and signal processing methods. Afterwards, deep learning [10,11] was developed and applied to the field of fault diagnosis because of its ability to automatically extract feature parameters and directly identify faults from big data [2]. Some deep learning models have also been designed to diagnose bearing faults; for example, the deep belief network (DBN) has been proposed to diagnose faults of aircraft engines, power transformers, and rolling bearings [12,13], and the deep convolution neural network (DCNN) has been used to identify outer ring raceway faults and different severity degrees of lubricity faults [14]. Although these deep learning models have strong feature extraction capability and diagnose fault categories with high accuracy, their diagnosis mechanism is difficult to explain. In addition, they have a poor ability to eliminate the overlapping region of different fault categories and can misclassify faulty data samples in the boundary region between different fault categories [15].
Recently, metric learning has been shown to reduce intraclass scatter and interclass similarity by measuring the distance or similarity between different data samples [16]. In view of the advantages of deep learning and metric learning, deep metric learning (DML) methods have been developed to improve classification ability. They combine deep learning with metric learning to map the original feature parameters to a discriminative feature space by maximizing interclass variation and minimizing intraclass variation [17,18]. Many typical DML models have been proposed and applied to the field of pattern recognition. A discriminative deep metric learning (DDML) method was proposed for face recognition, which employs a fully connected deep neural network to learn multiple nonlinear transformations that map face samples into a discriminative distance space in which the similarity of each positive pair is increased and the similarity of each negative pair is decreased [19]. Hu et al. [20] suggested a deep transfer metric learning (DTML) method for face recognition, which learns the deep metric network by maximizing interclass variation and minimizing intraclass variation. Song et al. [21] proposed a DML method for vision recognition that employs lifted structure feature embedding to learn semantic feature embeddings, so that similar samples are mapped close to each other and dissimilar samples far from each other. An adaptive interval DML method was proposed for video classification, which adaptively allocates the distance according to the semantic distance between sample pairs [16]. These DML methods can use the distance criterion to classify face images with high accuracy, but they lack a strong ability to discriminate between fuzzy face images.
In the field of fault diagnosis, some mechanical signals are very difficult to classify because of the complexity of the signal transmission path and the insensitivity of the feature parameters to fault classes. In particular, some data samples in the boundary region between different fault categories can be misclassified [22]. Fuzzy operators have the ability to deal with various fuzzy sets. A novel clustering method that combines adaptive resonance theory (ART) with a similarity measure based on Yu's norm was proposed to diagnose the faults of rolling element bearings, and it can recognize faulty data samples in the boundary region through the fuzzy formalism [23]. However, it requires feature parameters to be manually extracted from the original data samples.
In view of the above, a deep metric learning method based on the Yu norm-based similarity measure (DMN-Yu) is proposed to diagnose mechanical faults. It uses the Yu norm-based similarity measure to calculate the similarity between different data samples and uses the deep network architecture to extract the feature parameters and build the nonlinear relation between data samples and fault categories. The rest of this paper is organized as follows. Section 2 reviews the deep neural network and deep metric learning. A deep metric learning method based on Yu norm is depicted in Section 3. Section 4 describes the details of the DMN-Yu algorithm, and the diagnosis analysis of bearing faults is conducted in Section 5. Finally, the conclusions are drawn in Section 6.

Deep Neural Network.
A deep neural network is a deep structure with multiple hidden layers, which can learn a hierarchical feature representation by constructing the network architecture from the low-level layers to the high-level layers. Thus, feature parameters can be extracted directly and automatically from the original data. Generally, a deep neural network consists of an input layer, multiple hidden layers, and an output layer, and the network architecture is shown in Figure 1. Assume that there is a deep neural network with $N + 1$ layers and $P^{(n)}$ units in the $n$th layer, where $n \in \{1, 2, \ldots, N\}$ and the parameters are $(W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, \ldots, W^{(N)}, b^{(N)})$. Given a data sample $X \in \mathbb{R}^{d}$, the input of the first layer is $X$, and its corresponding output $h^{(1)}$ can be expressed as follows:
$$h^{(1)} = s(Z^{(1)}) = s(W^{(1)} X + b^{(1)}),$$
where $Z^{(1)}$ is the weighted sum of the inputs of the first layer, $W^{(1)} \in \mathbb{R}^{P^{(1)} \times d}$ is the projection matrix learned in the first layer, $b^{(1)} \in \mathbb{R}^{P^{(1)}}$ is the bias vector of the first layer, and $s$ is the nonlinear activation function of each layer, which can be a sigmoid function or a tanh function. The tanh function was used as the nonlinear activation function in this paper. Then, the output $h^{(1)}$ of the first layer is used as the input of the second layer, and the output $h^{(2)}$ of the second layer can be computed as
$$h^{(2)} = s(Z^{(2)}) = s(W^{(2)} h^{(1)} + b^{(2)}),$$
where $Z^{(2)}$ is the weighted sum of the inputs of the second layer and $W^{(2)} \in \mathbb{R}^{P^{(2)} \times P^{(1)}}$, $b^{(2)} \in \mathbb{R}^{P^{(2)}}$, and $s$ are the projection matrix, bias, and activation function of the second layer, respectively.
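Similarly, the output of the $n$th layer can be written as follows:
$$h^{(n)} = s(Z^{(n)}) = s(W^{(n)} h^{(n-1)} + b^{(n)}),$$
and the output of the top layer is
$$f(X) = h^{(N)} = s(W^{(N)} h^{(N-1)} + b^{(N)}),$$
where the mapping function $f: \mathbb{R}^{d} \mapsto \mathbb{R}^{P^{(N)}}$ is a parametric nonlinear function determined by the parameters $W^{(n)}$ and $b^{(n)}$, $n \in \{1, 2, \ldots, N\}$.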
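The layer-by-layer computation described above can be illustrated with a minimal NumPy sketch. It assumes the tanh activation stated in the text and borrows the 512-100-100 layer sizes reported later in the experiments; the function name `forward`, the random initialization scale, and the random input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def forward(x, weights, biases):
    """Return the outputs h^(1), ..., h^(N) of every layer for one input x."""
    h = x
    outputs = []
    for W, b in zip(weights, biases):
        z = W @ h + b       # weighted sum Z^(n) = W^(n) h^(n-1) + b^(n)
        h = np.tanh(z)      # activation s = tanh
        outputs.append(h)
    return outputs

rng = np.random.default_rng(0)
layer_sizes = [512, 100, 100]   # d = 512, P^(1) = 100, P^(2) = 100

# Small random weights and zero biases (an assumed initialization).
weights = [rng.normal(0.0, 0.01, size=(p_out, p_in))
           for p_in, p_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(p_out) for p_out in layer_sizes[1:]]

x = rng.normal(size=layer_sizes[0])   # stand-in for one 512-point sample
outputs = forward(x, weights, biases)
f_x = outputs[-1]                     # top-layer output f(X) = h^(N)
```

Keeping every intermediate output, rather than only the last one, is convenient because the per-layer features are inspected later in the paper.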

Deep Metric Learning.
Deep metric learning can map the original data samples into another feature space through a set of hierarchical nonlinear transformations. It uses the deep neural network structure to integrate feature learning and metric learning into a joint learning framework [24] and to explicitly build the nonlinear mapping function $f$, which is determined by $W^{(n)}$ and $b^{(n)}$. If $f(X)$ is the output for input $X$, the Euclidean distance between the data points $x_i$ and $x_j$ in the deep metric network space is written as follows:
$$d_f(x_i, x_j) = \| f(x_i) - f(x_j) \|_2,$$
where the goal of deep metric learning is to establish the mapping function $f$ under certain constraints.
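As a small illustration of the distance in the learned feature space, the sketch below computes all pairwise Euclidean distances between top-layer outputs. The helper name `pairwise_euclidean` and the toy feature values are assumptions chosen only to make the arithmetic easy to check.

```python
import numpy as np

def pairwise_euclidean(features):
    """features: (M, P) array whose rows are f(x_i); returns an (M, M) distance matrix."""
    diff = features[:, None, :] - features[None, :, :]   # broadcast to (M, M, P)
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Three toy feature vectors standing in for top-layer outputs.
features = np.array([[0.0, 0.0],
                     [3.0, 4.0],
                     [0.0, 1.0]])
D = pairwise_euclidean(features)
```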

Deep Metric Learning Model Based on Yu Norm
Metric learning mainly measures the distance or similarity between different data samples so that, under the metric criterion, data samples of the same fault class are as close as possible to each other and data samples of different fault classes are as far as possible from each other. Therefore, it is very important to select appropriate metric criteria before conducting classification. However, most existing metric learning methods can only map data samples into a feature space through a linear transformation, which makes it difficult to capture the nonlinear relationship between data samples [25]. In order to solve this problem, kernel-based nonlinear metric methods have been developed to map data samples to a high-dimensional feature space and to perform the discriminative distance metric in this space [26,27]. However, the nonlinear mapping function of the kernel-based metric method is not explicit, which hinders understanding and further development. In addition, these two kinds of metric learning methods need the feature parameters to be extracted manually. Different from these two kinds of methods, the Yu norm-based deep metric learning method can extract feature parameters automatically, establish a set of hierarchical nonlinear transformations through a deep neural network, and map data sample pairs into a feature space for recognition.

Yu Norm-Based Similarity Measure Criterion.
The Yu norm was proposed in 1985 and is composed of the T norm and the S norm in the field of fuzzy mathematics [28]. The T norm and the S norm are, respectively, defined as follows:
$$T(x, y) = \max\left(0, (1 + \lambda)(x + y - 1) - \lambda x y\right),$$
$$S(x, y) = \min\left(1, x + y + \lambda x y\right),$$
where $x, y \in [0, 1]$ and $\lambda > -1$. Because of the unique mathematical properties of the Yu norm, an equivalence expression based on it is suggested to compute the similarity between two different data samples [29], and the data samples can be classified based on this similarity degree.
In the process of classification, the data samples in the overlapping region of different fault categories can easily be misclassified by the traditional distance metric method because of the nonlinearity of the classification boundary. However, the Yu norm-based similarity metric method classifies according to the similarity between different data samples rather than the distance. Thus, the Yu norm-based similarity metric method can classify the data samples in the boundary region of fault categories.
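To make the fuzzy operators concrete, the following sketch implements the commonly cited forms of Yu's T norm, $T(x, y) = \max(0, (1 + \lambda)(x + y - 1) - \lambda x y)$, and S norm, $S(x, y) = \min(1, x + y + \lambda x y)$, for $x, y \in [0, 1]$ and $\lambda > -1$. The equivalence used here, $T(S(1 - x, y), S(x, 1 - y))$, is one standard fuzzy bi-implication construction and is an assumption made for illustration; it should not be read as the exact expression used in the paper. The names `yu_equivalence` and `yu_similarity`, and the averaging over feature components, are likewise hypothetical. Features are assumed to be scaled to $[0, 1]$ beforehand.

```python
import numpy as np

def yu_t_norm(x, y, lam):
    """Yu T norm for x, y in [0, 1] and lam > -1."""
    return np.maximum(0.0, (1.0 + lam) * (x + y - 1.0) - lam * x * y)

def yu_s_norm(x, y, lam):
    """Yu S norm for x, y in [0, 1] and lam > -1."""
    return np.minimum(1.0, x + y + lam * x * y)

def yu_equivalence(x, y, lam):
    """ASSUMED equivalence T(S(1 - x, y), S(x, 1 - y)); not taken from the paper."""
    return yu_t_norm(yu_s_norm(1.0 - x, y, lam), yu_s_norm(x, 1.0 - y, lam), lam)

def yu_similarity(u, v, lam=0.2):
    """Hypothetical sample similarity: mean element-wise equivalence of two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.mean(yu_equivalence(u, v, lam)))

u = np.array([0.2, 0.5, 0.9])
v = np.array([0.3, 0.5, 0.1])
sim_uv = yu_similarity(u, v)   # lam = 0.2 is the value reported in the experiments
sim_uu = yu_similarity(u, u)
```

Under this assumed construction, identical vectors receive similarity 1 and the measure is symmetric in its two arguments, which is the behavior a similarity criterion needs.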

Marginal Fisher Analysis.
Marginal Fisher analysis (MFA) is a supervised dimensionality reduction algorithm that measures the distance between every data sample and its neighbor samples [30], which requires the data samples of every category to approximately satisfy the Gauss distribution [31]. Based on the graph embedding framework, MFA can be depicted by two graphs, as shown in Figure 2: an intrinsic graph and a penalty graph. The intrinsic graph characterizes the intraclass compactness, and the penalty graph characterizes the interclass separation. In the intrinsic graph, an adjacency matrix $P$ is established in which each data sample is connected only with its k1-nearest neighbors of the same class. In the penalty graph, an adjacency matrix $Q$ is established in which each sample is connected with its k2-nearest neighbors of different classes. The matrices $P$ and $Q$ can be expressed as follows:
$$P_{ij} = \begin{cases} 1, & \text{if } x_j \text{ is one of the k1 intraclass nearest neighbors of } x_i, \\ 0, & \text{otherwise,} \end{cases}$$
$$Q_{ij} = \begin{cases} 1, & \text{if } x_j \text{ is one of the k2 interclass nearest neighbors of } x_i, \\ 0, & \text{otherwise.} \end{cases}$$
The separation rule of MFA is that each sample is drawn close to the k1-nearest neighbors among its similar samples and pushed away from the k2-nearest neighbors among its dissimilar samples. Based on the graph embedding framework, the intraclass compactness $S_c$ described by the intrinsic graph and the interclass separability $S_b$ described by the penalty graph can be obtained, respectively. Accordingly, after metric learning is introduced into the deep neural network, marginal Fisher analysis is used to optimize the features extracted in the feature space of the top layer, which improves the feature representation ability and the classification performance of the deep neural network.
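The construction of the two adjacency matrices can be sketched as follows. The code assumes Euclidean distance for the neighbor search and the one-directional rule stated in the text (entry $(i, j)$ is 1 when sample $j$ is a selected neighbor of sample $i$); the function name `mfa_adjacency` and the toy data are illustrative.

```python
import numpy as np

def mfa_adjacency(features, labels, k1, k2):
    """Build the intrinsic-graph matrix P and the penalty-graph matrix Q."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    m = len(labels)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))   # pairwise Euclidean distances
    P = np.zeros((m, m), dtype=int)
    Q = np.zeros((m, m), dtype=int)
    indices = np.arange(m)
    for i in range(m):
        same = np.flatnonzero((labels == labels[i]) & (indices != i))
        other = np.flatnonzero(labels != labels[i])
        # k1 nearest neighbors with the same label as sample i
        P[i, same[np.argsort(dist[i, same])[:k1]]] = 1
        # k2 nearest neighbors with a different label
        Q[i, other[np.argsort(dist[i, other])[:k2]]] = 1
    return P, Q

# Two well-separated toy classes on a line.
features = np.array([[0.0], [0.1], [0.3], [1.0], [1.1], [1.3]])
labels = np.array([0, 0, 0, 1, 1, 1])
P, Q = mfa_adjacency(features, labels, k1=1, k2=1)
```

In the experiments reported later, the neighbor counts are k1 = 5 and k2 = 10; the toy call uses 1 and 1 only so the result is easy to verify by hand.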

DMN-Yu Model.
For each pair of samples $x_i$ and $x_j$ in the training sample set $X$, the outputs of the $n$th layer of the network can be represented as $f^{(n)}(x_i)$ and $f^{(n)}(x_j)$, as mentioned above. The similarity of the two samples is expressed by applying the Yu norm-based similarity measure criterion to these outputs. According to the graph embedding framework, MFA is performed on the outputs of all the training samples at the top layer of the deep neural network, and a strongly supervised deep metric learning model is constructed correspondingly. The parameters $W^{(n)}$ and $b^{(n)}$ of the deep metric learning model based on Yu norm (DMN-Yu) are obtained by optimizing an objective function $J$ with the gradient descent algorithm. In this objective function, $\alpha$ is the free parameter that balances the importance of the intraclass compactness and the interclass separability: the larger $\alpha$ is, the greater the interclass scatter is. $c$ is the adjustable regularization parameter, $c > 0$; $\|Z\|_F$ denotes the Frobenius norm of the matrix $Z$; and $S_c^{(n)}$ and $S_b^{(n)}$ define the intraclass compactness and the interclass separability, respectively. Both are computed over the sample pairs selected by the adjacency matrices $P$ and $Q$, where $M$ is the number of samples in the training set. If $x_j$ is one of the k1 intraclass nearest neighbors of $x_i$, then $P_{ij}$ is set to 1 and otherwise 0; if $x_j$ is one of the k2 interclass nearest neighbors of $x_i$, then $Q_{ij}$ is set to 1 and otherwise 0.
To solve the parameter optimization problem in equation (15), the subgradient descent method is used to obtain the parameters $W^{(n)}$ and $b^{(n)}$, where $n = 1, 2, \ldots, N$. The gradients of the objective function $J$ with respect to $W^{(n)}$ and $b^{(n)}$ are computed layer by layer by backpropagation, where $h_i^{(0)} = x_i$ and $h_j^{(0)} = x_j$ are the original input samples of the network, the updating equations for all other layers $n = 1, 2, \ldots, N-1$ are applied recursively, and the operation $\odot$ denotes element-wise multiplication. $W^{(n)}$ and $b^{(n)}$ are then updated by the following gradient descent rule until convergence:
$$W^{(n)} = W^{(n)} - \tau \frac{\partial J}{\partial W^{(n)}}, \qquad b^{(n)} = b^{(n)} - \tau \frac{\partial J}{\partial b^{(n)}},$$
where $\tau$ is the learning rate.
A BPNN classifier, introduced at the top feature output layer of the deep metric neural network, is utilized to further fine-tune the parameters of the network [32,33] and classify data samples.
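The parameter update itself is a plain gradient descent step with learning rate $\tau$. The sketch below shows that step together with the learning rate decay reported in the experiments (initial $\tau = 0.2$, decay 0.95, 10 iterations, $c = 0.5$). Because the full objective is not reproduced here, a hypothetical stand-in objective is used: $c$ times the sum of the squared Frobenius norms of the parameters. That stand-in, and the names `gradient_step` and `regularizer_and_grads`, are assumptions for illustration only.

```python
import numpy as np

def gradient_step(params, grads, tau):
    """One update p <- p - tau * dJ/dp for every parameter array."""
    return [p - tau * g for p, g in zip(params, grads)]

def regularizer_and_grads(params, c):
    """Hypothetical stand-in objective: c * sum of squared Frobenius norms."""
    value = c * sum(float(np.sum(p ** 2)) for p in params)
    grads = [2.0 * c * p for p in params]   # gradient of c * ||p||_F^2
    return value, grads

rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 4)), rng.normal(size=3)]   # toy W and b
tau, decay, c = 0.2, 0.95, 0.5

history = []
for _ in range(10):                     # maximum number of iterations
    value, grads = regularizer_and_grads(params, c)
    history.append(value)
    params = gradient_step(params, grads, tau)
    tau *= decay                        # shrink the learning rate each iteration
```

With this stand-in objective, every step multiplies each parameter by $1 - 2c\tau$, so the recorded objective values decrease monotonically.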

Fault Diagnosis Method Based on DMN-Yu
The DMN-Yu model is trained by the backpropagation algorithm, and the BPNN classifier is utilized to fine-tune the network parameters and to complete the classification of faults. The flowchart of the proposed method is shown in Figure 3.
The corresponding algorithm is summarized as follows:
Step 1: Normalize a large number of labeled data samples and divide them into a training set and a test set according to a certain proportion.
Step 2: Construct the DMN-Yu model, initialize the parameters $W^{(n)}$ and $b^{(n)}$ to values close to 0, and set the iteration number T, the number of intraclass nearest neighbors k1, the number of interclass nearest neighbors k2, the learning rate τ, and the parameters λ, c, and α.
Step 3: Train the DMN-Yu model. The data samples are nonlinearly transformed to the top layer, where marginal Fisher analysis is applied to further constrain the extracted features so that descriptive and discriminative features can be obtained.
Step 4: Add a BPNN classifier to the feature output layer at the top of the network to fine-tune the network and optimize the network parameters.
Step 5: Use the trained network to classify the test dataset.
Step 6: Calculate the classification accuracy.
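The data handling in Steps 1 and 6 can be sketched as below. The paper does not state which normalization is used, so per-sample min-max scaling to $[0, 1]$ is an assumption, as are the helper names `min_max_normalize`, `train_test_split`, and `accuracy`, the random shuffling, and the synthetic data. Model construction, training, and fine-tuning (Steps 2 to 4) are intentionally left out.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale every sample (row) to the range [0, 1] (assumed normalization)."""
    lo = samples.min(axis=1, keepdims=True)
    hi = samples.max(axis=1, keepdims=True)
    return (samples - lo) / np.where(hi > lo, hi - lo, 1.0)

def train_test_split(samples, labels, train_fraction, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_train = int(round(train_fraction * len(labels)))
    train, test = order[:n_train], order[n_train:]
    return samples[train], labels[train], samples[test], labels[test]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

rng = np.random.default_rng(0)
samples = min_max_normalize(rng.normal(size=(100, 512)))   # synthetic stand-in data
labels = np.repeat(np.arange(10), 10)                      # 10 classes, 10 samples each
x_train, y_train, x_test, y_test = train_test_split(samples, labels, 0.7)

demo_accuracy = accuracy([0, 1, 2, 2], [0, 1, 1, 2])       # 3 of 4 labels match
```

The 0.7 training fraction mirrors the 4900/2100 split reported for the experimental dataset.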

Experimental Dataset.
In order to verify the effectiveness of the proposed method, the DMN-Yu model was used to diagnose the rolling bearing data [34] from the Electrical Engineering Laboratory of Case Western Reserve University. The data were collected from an SKF6205-2RS deep groove ball bearing mounted at the motor drive end. There are four bearing states, namely, normal state, inner race fault, outer race fault, and ball fault. The bearings were seeded with faults using electrodischarge machining, and each fault has three different depths of damage; Table 1 shows the description of the bearing dataset. The data were collected by an accelerometer at a sampling frequency of 12 kHz under the operating condition of 0 HP load and 1797 rpm, and the dataset contains 10 fault types (the normal state plus three fault locations at three damage depths each). The number of samples for each fault type is 700, giving a total of 7000 samples, and a single sample has 512 sample points. The training set contains 4900 samples, and the test set contains 2100 samples.
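One way to obtain 700 samples of 512 points from a single vibration record is to slide a fixed-length window along the signal. The paper does not describe how the samples were cut, so the evenly spaced, possibly overlapping windows below are an assumption, as are the function name `segment_signal`, the `stride` parameter, and the synthetic 120000-point record.

```python
import numpy as np

def segment_signal(signal, sample_length=512, n_samples=700, stride=None):
    """Cut a 1D signal into n_samples windows of sample_length points each."""
    signal = np.asarray(signal, dtype=float)
    if stride is None:
        # Spread the windows evenly over the record (windows may overlap).
        stride = (len(signal) - sample_length) // max(n_samples - 1, 1)
    if stride < 1:
        raise ValueError("signal is too short for the requested segmentation")
    starts = np.arange(n_samples) * stride
    return np.stack([signal[s:s + sample_length] for s in starts])

record = np.arange(120000.0)          # synthetic stand-in for one vibration record
segments = segment_signal(record)     # array of shape (700, 512)
```

Repeating this for each of the 10 fault types would give the 7000 samples described above, of which 70% (4900) form the training set.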

Fault Diagnosis and Analysis.
A DMN-Yu model with a three-layer (N = 2) network is constructed, and the numbers of neural nodes in the layers were set as 512-100-100. A BPNN classifier was added to the top-level feature output layer, and the number of nodes in the classification output layer was set as 10. Selecting proper model parameters is very important for obtaining better diagnosis accuracy. Here, α was set as 4.0, λ was 0.2, the maximum number of iterations was 10, the regularization parameter c was 0.5, the initial learning rate τ was 0.2, and the learning rate decay was 0.95. The numbers of neighbor points k1 and k2 have a great impact on the diagnosis ability of the model. If the number of neighbor points is too small, it is difficult for the DMN-Yu model to extract the intrinsic fault information from the high-dimensional data; if it is too large, the geometric and nonlinear information of the data is easily ignored. Therefore, the nearest neighbor values were set as k1 = 5 and k2 = 10 according to reference [30].
In order to obtain the specific diagnosis information of the model for each fault category, the confusion matrix was used to visualize the diagnosis results, and the diagnosis results were quantitatively described using the precision P and the recall R [35]. Figure 4 shows the diagnosis results of the training samples. From Figure 4, it can be seen that the diagnosis accuracy of each fault category is 100%. This indicates that the proposed method can extract the fault feature parameters of the bearing and diagnose the fault category effectively, with no training errors.
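The confusion matrix and the per-class precision P and recall R can be computed as in the sketch below. It assumes the convention that rows are true classes and columns are predicted classes; the function names and the toy labels are illustrative and are not the paper's results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t, p] counts samples of true class t predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm):
    """Per-class precision (column-wise) and recall (row-wise)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # number of samples predicted as each class
    actual = cm.sum(axis=1)      # number of samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall = precision_recall(cm)
```

Precision answers how many of the samples assigned to a class really belong to it, while recall answers how many samples of a class were found, which is why both are reported per fault category.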
In addition, to describe the diagnosis process of the DMN-Yu model, the t-SNE (t-distributed stochastic neighbor embedding) algorithm is used to visualize the high-dimensional data by mapping samples from the original feature space to 3D (3-dimensional) space. The fault features extracted from each layer of the DMN-Yu model are optimized and mapped to the 3D space by t-SNE [36], and the corresponding 3D feature distribution scatter diagrams of the trained DMN-Yu model are obtained. Figures 5-7 show the visualization of the original data, the first hidden layer output features, and the second hidden layer output features, respectively. It can be seen that, as the layer number increases, the interclass distance between the features of different fault categories becomes larger and the intraclass distance becomes smaller, and at the top feature output layer, all fault categories can basically be separated. This indicates that the DMN-Yu model performs feature extraction by minimizing the intraclass distance and maximizing the interclass distance, which benefits the subsequent classification accuracy.
The diagnosis result of the bearing fault test set samples is shown in Figure 8. It can be seen that the classification accuracy is 97% and the error rate is only 3%. For the fault category numbered 7, the precision P is 99.0%. Correspondingly, the classification accuracies of the other fault categories can also be seen in Figure 8. All these results demonstrate that the DMN-Yu model can diagnose the different fault categories of bearings accurately.
The diagnosis accuracy of the SVM combined with nine time-domain statistical features (mean, standard deviation, root mean square, skewness, kurtosis, crest factor, margin factor, shape factor, and impact factor) is 85.05%. It is obvious that the accuracies of the DBN and the traditional SVM are lower than those of DMN-Yu and DMN-Euc, and the accuracy of DMN-Yu is the highest.
All these results demonstrate that the performance of DMN-Yu and DMN-Euc is superior to that of DBN and SVM, and that DMN-Yu performs best. In addition, the parameter values of these methods are set as described in Table 3.
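For reference, the nine time-domain statistical features used with the SVM baseline can be computed from one signal segment as sketched below. The paper lists the feature names but not their formulas, so the definitions here are the commonly used ones and should be treated as assumptions; the function name `time_domain_features` and the toy signal are illustrative.

```python
import numpy as np

def time_domain_features(x):
    """Nine common time-domain statistics of a 1D, non-constant signal."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = abs_x.max()
    centered = x - mean
    return {
        "mean": float(mean),
        "std": float(std),
        "rms": float(rms),
        "skewness": float(np.mean(centered ** 3) / std ** 3),
        "kurtosis": float(np.mean(centered ** 4) / std ** 4),
        "crest_factor": float(peak / rms),
        "margin_factor": float(peak / np.mean(np.sqrt(abs_x)) ** 2),
        "shape_factor": float(rms / abs_x.mean()),
        "impact_factor": float(peak / abs_x.mean()),
    }

features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

A constant or all-zero segment would cause divisions by zero in this sketch, so real data would need a guard for that case.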

Conclusion
A novel deep metric learning method based on the Yu norm is proposed to diagnose the faults of rolling bearings, which can measure the similarities as well as the differences between data samples and improve the diagnosis ability by reducing intraclass scatter and interclass similarity. Due to the fuzziness of the boundary of the fault categories, the data samples at the ...

Abbreviations
DTML: Deep transfer metric learning
ART: Adaptive resonance theory
MFA: Marginal Fisher analysis
t-SNE: t-distributed stochastic neighbor embedding.

Data Availability
The data used to support the findings of this study are available at http://csegroups.case.edu/bearingdatacenter/pages/download-data-file.

Conflicts of Interest
The authors declare that there are no conflicts of interest.