DC-NNMN: Across Components Fault Diagnosis Based on Deep Few-Shot Learning



Introduction
In complex industrial systems, fault diagnosis is an important issue for ensuring the safety of equipment and personnel [1,2]. In recent years, the ability of deep neural network models to learn fault features from a large number of samples has become well known and widely used in the field of fault diagnosis [3,4]. However, the success of deep learning-based fault diagnosis depends on the following two conditions: (1) massive amounts of labeled fault data; (2) training data and testing data that have the same category space and consistent distribution [5][6][7].
At present, many scholars focus on fault diagnosis with limited labeled samples. The method of transfer learning has been introduced in recent years, which uses existing knowledge in the source domain to solve fault classification in different target domains. Lu et al. [8] proposed a deep neural network model with domain adaptation to realize fault diagnosis under different loads. Wen et al. [9] proposed a deep transfer learning method for rolling bearing fault diagnosis with unlabeled target domain data, which minimizes the difference between the features of training and test data using maximum mean discrepancy. Hang et al. [10] proposed a Principal Component Analysis (PCA) method based on the improved SMOTE algorithm and applied PCA to the field of high-dimensional imbalanced fault data.
In order to increase the size of the sample set, many scholars have used the idea of GAN to generate vibration samples for fault diagnosis. Cabrera et al. [11] used the GAN model to estimate the data distribution of each minority failure mode. Zhao et al. [12] proposed a switchable normalized semisupervised generative fault diagnosis network, which assists model training by generating samples. Then, the problem of insufficient labeled fault samples under test conditions can be alleviated. The above studies can solve the problem of fault diagnosis with insufficient labeled data when the training set and testing set have the same category space. However, a model trained on the labeled data of one component cannot classify the fault categories of other components: even when labeled data can be obtained from some other components, the fault category space and data distribution of different components are different. We call this across-components fault diagnosis.
Few-shot learning is committed to understanding new categories from a few examples, and it is a very popular topic in the field of image classification. Implementation approaches include model-based, metric-based, and optimization-based methods. The model-based methods aim to quickly update the parameters with a small number of samples through the design of the model structure, directly establishing a mapping function between the input x and the predicted value p; examples include memory-augmented methods [13] and Meta Networks [14]. The metric-based methods complete the classification by measuring the distance between the samples in the query set and the samples in the support set. Typical metric-based methods are the Siamese Network [15], the Matching Network [16], the Prototypical Network [17], and so on. The optimization-based methods are represented by Finn et al. [18], who observed that ordinary gradient descent methods are difficult to fit in few-shot scenarios. Optimization-based methods complete the few-shot classification task by adjusting the optimization procedure, so these methods are not limited by the size of the parameters or the model architecture.
However, because of the distribution difference between image data and vibration data, existing few-shot learning models cannot be directly adapted to the field of fault diagnosis. Thus, this paper proposes an across-components few-shot learning fault diagnosis method based on the matching network, and the model is verified through a series of experiments. The main insights and contributions of this study are summarized as follows: (1) We propose an intelligent fault diagnosis method based on deep convolutional nearest neighbor matching networks (DC-NNMN). A four-layer convolutional network is designed to extract high-dimensional fault features. The cosine distance is merged into the K-Nearest Neighbor method to model the distance distribution between the unlabeled samples from the query set and the labeled samples from the support set in the high-dimensional feature space, so that fault samples of the same category are close to each other and samples of different categories are far apart. The query set and support set samples of one component are decomposed into different meta-tasks to learn the generalization ability of the model when the fault categories change; then, the unknown fault categories of another component can be classified without changing the network model. (2) We use the Case Western Reserve University (CWRU) bearing vibration dataset as the training set and, as testing sets, the bearing vibration data selected from a lab-built experimental platform and another gearing vibration dataset, to prove the feasibility of the proposed method. Experimental results show that the model trained on bearing fault data achieves accurate fault classification on new fault categories of both bearings and gears. The proposed method thus implements across-components fault diagnosis with tiny numbers of fault samples.
The rest of the paper is organized as follows. Section 2 introduces the preliminaries of DC-NNMN. Section 3 details the proposed deep convolutional nearest neighbor matching network model (DC-NNMN), including the problem description, model structure, and optimization objectives. In Section 4, experimental verification and the corresponding analysis are conducted. The conclusions are drawn in Section 5.

Few-Shot Learning.
The main challenge of few-shot learning is how to understand new categories from a few examples. Specifically, the training set of few-shot learning contains many categories, and each category has multiple samples. In the training phase, c categories are randomly selected from the training set, and k samples are selected from each category (a total of c × k samples) as the support set S. Then, k′ samples are selected from the remaining data in the c categories to serve as the query set Q for the model. The goal of the model is to minimize the prediction loss on the query set Q, given the support set S as input. That is, the model is required to learn how to distinguish these c classes from the c × k samples in the support set. Such a task is called a c-way k-shot problem. In few-shot learning, k is usually less than 20. S and Q can be expressed as follows:

S = {(x_i, y_i)}_{i=1}^{c×k}, Q = {(x̂_j, ŷ_j)}_{j=1}^{k′}. (1)
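The episode construction described above can be sketched as follows. This is a minimal illustration, not the paper's code; the function and variable names are hypothetical, and drawing k′ query samples per selected category (rather than in total) is an assumption.

```python
import random

def sample_episode(dataset, c=5, k=1, k_query=15):
    """Build one c-way k-shot episode.

    dataset: dict mapping category label -> list of samples.
    Returns a support set S (c*k samples) and a query set Q
    (c*k_query samples), disjoint within each category.
    """
    classes = random.sample(sorted(dataset.keys()), c)
    support, query = [], []
    for label in classes:
        # Draw k support and k_query query samples without overlap.
        samples = random.sample(dataset[label], k + k_query)
        support += [(x, label) for x in samples[:k]]
        query += [(x, label) for x in samples[k:]]
    return support, query
```

For a 5-way 1-shot task with 15 query samples per category, this yields 5 support samples and 75 query samples, matching the task sizes used in the experiments below.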

K-Nearest Neighbor.
K-Nearest Neighbor (KNN) was originally proposed by Cover and Hart [19]. It is a relatively mature nonparametric statistical method for classification and regression. The core idea is that if most of the K nearest neighbors of a sample in the feature space belong to a certain category, the sample also belongs to that category. Take a set of data with known labels {(x_1, y_1), (x_2, y_2), . . ., (x_n, y_n)}, where x_i is the feature vector of sample i and y_i ∈ {c_1, c_2, . . ., c_k} is its label. For a test sample x, the KNN algorithm searches for the K instances that are closest to x under the given distance metric, denoted as x_i′, i = 1, 2, . . ., K. Then the label of the sample x is calculated based on the decision rule:

y = arg max_{c_j} Σ_{i=1}^{K} I(y_i′ = c_j),

where I is the indicator function. Therefore, after the distance metric is determined, the K-Nearest Neighbor algorithm has only one parameter, K. How to choose an optimal K value depends on the dataset itself. As shown in Figure 1, the red circle is the test sample: if K = 3, it is classified as a green square, and if K = 5, it is classified as a yellow triangle. KNN has the advantages of being simple, easy to understand, and easy to implement, with no parameters to estimate and no training phase. It is especially suitable for multiclass classification problems.
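The decision rule above can be sketched in a few lines; a Euclidean metric is assumed here purely for illustration (any distance metric can be substituted):

```python
import numpy as np

def knn_predict(x, X_train, y_train, K=5):
    """Classify x by majority vote among its K nearest
    training samples under the Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:K]              # indices of the K closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # y = arg max_c sum I(y_i' = c)
```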

Problem Description.
In this paper, the idea of few-shot learning based on the Matching Network is applied to fault diagnosis across category spaces. We define the across-components few-shot learning fault diagnosis problem as follows: (1) a training set T of labeled fault vibration samples is available for one mechanical component (MCA); (2) a support set S contains only a few labeled fault samples of another mechanical component (MCB); (3) the data in the query set Q have the same categories as the support set S; (4) T and S have different feature spaces χ and category spaces Y. In each training task, a support set T_s and a query set T_q = {(x_j, y_j)}_{j=1}^{k′} are randomly selected from the training set T. Among them, T_s has the same form as S and T_q has the same form as Q. During training, each task randomly selects T_s and T_q to train the fault diagnosis model, and the task is repeated many times to achieve model training at the meta level. Therefore, our goal is to train the model using T_s and T_q built from the fault vibration samples of MCA, so that each new class in Q can be classified according to the support set S of MCB. The main idea of the across-components problem is illustrated in Figure 2.

Deep Convolution Nearest Neighbor Matching Network.
This paper proposes a deep convolutional nearest neighbor matching network (DC-NNMN) to learn from a support set S with labeled fault samples and then classify the fault samples in the query set Q.
As shown in Figure 3, the model proposed in this paper contains two parts: the embedding module f_φ and the matching module g_∅. The embedding module uses a convolutional network to map each sample from the input space to the feature space, and the matching module uses the K-Nearest Neighbor algorithm to complete the matching from the feature space to the category space, so as to achieve the fault classification task.
The features in a time-domain vibration sample have translation invariance; that is, a certain statistical feature may appear at any time in the sample. Convolutional neural networks have the characteristics of local connections and weight sharing, so convolution operations are particularly suitable for processing time-domain vibration samples. As shown in Table 1, we adopt a neural network with four convolutional layers as our embedding module to extract the feature information of each fault sample. Because the number of samples is small, in order to prevent overfitting, the fully connected layer that follows the convolution operations in traditional architectures is removed, reducing the number of parameters that the network model needs to train. The first layer is the input layer, and the size of the input fault sample is L × 1. Each subsequent convolution operation includes a convolution and a batch normalization. The size of the convolution kernel is s and the number of convolution kernels is n. The activation function is the Leaky ReLU. In addition, the first and second layers add a max-pooling layer after the convolution operation. The convolution operations of the first two layers are as follows:

h_d = MaxPool(LeakyReLU(BN(W_d * h_{d−1} + b_d))), d = 1, 2,

where * represents the convolution operation, W_d and b_d represent the convolution kernel and bias, h_d is the result of the convolution operation, d represents the d-th layer of the network, and h_0 is the input sample. The last two layers omit the pooling operation:

h_d = LeakyReLU(BN(W_d * h_{d−1} + b_d)), d = 3, 4.
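The layer arithmetic above can be illustrated with a deliberately simplified single-channel sketch. Batch normalization, the multiple kernels per layer of Table 1, and trained weights are all omitted; the kernels here are placeholders. The point is only to show how the two max-pooling layers reduce an input of length L to a feature map of length L/4:

```python
import numpy as np

def conv1d_same(x, w, b):
    """'Same'-padded 1-D convolution of a single-channel signal."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(w)] @ w for i in range(len(x))]) + b

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def max_pool(x, size=2):
    return x[: len(x) // size * size].reshape(-1, size).max(axis=1)

def embed(x, kernels, biases):
    """Four conv layers; max-pooling after the first two only,
    so an input of length L yields a feature map of length L/4."""
    h = x
    for d, (w, b) in enumerate(zip(kernels, biases)):
        h = leaky_relu(conv1d_same(h, w, b))
        if d < 2:
            h = max_pool(h)
    return h
```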

Shock and Vibration
In this way, after the four-layer convolution operation, a feature vector of size L/4 × 1 × n is obtained, which can be expressed as f_φ(x) ∈ R^{(L/4)×1×n}. The matching module mainly uses the deep feature descriptors of all fault samples in a category to construct a local feature space for fault classification. If we directly use a limited amount of data to train a parametric classifier on a few-shot learning task, the model will almost certainly overfit.
There are tens of thousands of parameters in a neural network classifier that need to be optimized. Instead, nonparametric methods are more suitable. Considering the discreteness of the fault vibration samples, the KNN algorithm is used to measure the spatial distance between the samples of the query set and each category in the support set, as shown in Figure 2.
Specifically, each sample q from the query set Q is processed by the embedding module to obtain f(q) = [x_1, . . ., x_m] ∈ R^{k×m}. For each descriptor x_i in turn, its K nearest neighbors in a category c are found, denoted as x_i′^j, j = 1, 2, . . ., K.

The cosine of the angle between two vectors is used to measure the correlation between them. The cosine distance reduces the sensitivity to absolute values, which makes it suitable for measuring the distance between discrete data. The cosine similarity of the vectors x_i′^j and x_i is

cos(x_i′^j, x_i) = (x_i′^j · x_i) / (‖x_i′^j‖ ‖x_i‖).

Optimization Objective.
In this paper, the number of labeled fault samples known for MCB is less than 20. If we train on this limited number of labeled samples directly, the model will inevitably fall into overfitting and fail to accurately classify faults. The episodic training mechanism [16] has been demonstrated to be an effective approach for learning transferable knowledge from the training dataset. Specifically, in each iteration, we use the constructed training set to build a data structure similar to that in the test set, so the network is trained through N tasks. Each task has two inputs, namely, the support set S and the query set Q. The feature information of each sample is obtained through the embedding module, and the samples are matched with the correct category by the matching module. For each task, we hope the network can classify the samples in Q well; that is, g(f(q), c) should match the correct category. The output of the network is a value between 0 and 1, where 0 means very dissimilar and 1 means completely similar. In this way, for each sample in Q, a predicted value for the real category is obtained. This predicted value can be used to build a cross-entropy loss function for a single task; that is,

L_t = −Σ_i Y_i log Y_i′,

where t represents the t-th training task, Y_i represents the true label of the i-th sample, and Y_i′ represents the predicted label obtained through the network. For N tasks, the total loss function is

L = Σ_{t=1}^{N} L_t.

During the training process, the loss function L is minimized through backpropagation and gradient descent.
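The cosine-based nearest neighbor matching can be sketched as follows. This is a hedged illustration, not the paper's implementation: aggregating the K per-descriptor similarities by summation is an assumption, standing in for the matching module g described above.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between feature vectors a and b."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def class_score(query_feats, class_feats, K=3):
    """Score one query sample against one support category: for each
    descriptor x_i of the query, take its K most similar descriptors
    in the category and sum the cosine similarities."""
    score = 0.0
    for x in query_feats:                        # descriptors x_i, i = 1..m
        sims = sorted((cosine_sim(x, s) for s in class_feats), reverse=True)
        score += sum(sims[:K])                   # K nearest neighbors of x_i
    return score

def match(query_feats, support, K=3):
    """Predict the category whose support descriptors are closest."""
    return max(support, key=lambda c: class_score(query_feats, support[c], K))
```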
We adopt the adaptive moment estimation (Adam) method to update the parameters of the model. The algorithm calculates an adaptive learning rate for each parameter, and its convergence speed is fast. At the same time, it corrects problems of other optimization techniques, such as the vanishing learning rate, slow convergence, or the large variance of the loss function caused by high-variance parameter updates. The parameter update rules are as follows:

m_t = β_1 m_{t−1} + (1 − β_1) g_t,
v_t = β_2 v_{t−1} + (1 − β_2) g_t²,
m̂_t = m_t / (1 − β_1^t),
v̂_t = v_t / (1 − β_2^t),
φ_t = φ_{t−1} − μ m̂_t / (√v̂_t + ε).

In the above equations, φ represents the convolutional network parameters, g_t is the gradient, m_t is the exponential moving average of the first moment of the gradient, and v_t is the noncentral variance estimate from the second moment of the gradient; m̂_t and v̂_t are the bias-corrected estimates of the two moments, μ is the learning rate of the model, ε is a small constant (10^−8) that prevents division by zero, and β_1 and β_2 are the two decay parameters of the Adam optimizer. The pseudocode of the algorithm is shown in Table 2.
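The update rules above can be expressed as a single step function; the default hyperparameter values shown are the commonly used Adam settings, not values reported by this paper.

```python
import numpy as np

def adam_step(phi, grad, m, v, t, mu=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment moving average
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (noncentral) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    phi = phi - mu * m_hat / (np.sqrt(v_hat) + eps)
    return phi, m, v
```

Iterating this step on the gradient of the loss drives the parameters toward a minimum, with the effective step size adapted per parameter.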

Fault Diagnosis Based on DC-NNMN.
The flowchart of the proposed fault diagnosis method is shown in Figure 4. It mainly includes three steps: construction of the training dataset, model building and training, and testing on the fault samples.
(1) In the training dataset construction step, many different categories of labeled vibration samples of faulty bearings are used. According to the dataset setting of few-shot learning, a support set T_s and a query set T_q of the C-way k-shot form are randomly selected. (2) In the model building and training step, the model shown in Figure 2 is built first. Then, we feed each extracted dataset to the network for training and record it as a task. After N tasks of training, the parameters of our model are fixed.
(3) In the testing step, the support set S and query set Q are input to the network, and the network outputs the classification results. It is worth mentioning that, in this stage, the parameters of the model are no longer updated; that is to say, through the training of multiple tasks and the optimization of the parameters, the model has already acquired the ability to classify completely new C-type fault samples on the C-way k-shot sample set.
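The per-task and total objectives minimized during the training step above can be sketched as follows; the one-hot label convention is an assumption for illustration.

```python
import numpy as np

def task_loss(y_true, y_pred):
    """Cross-entropy for one task t: L_t = -sum_i Y_i log Y_i',
    with y_true as one-hot labels and y_pred as predicted
    category probabilities in (0, 1]."""
    return -np.sum(y_true * np.log(y_pred + 1e-12))

def total_loss(tasks):
    """Total objective over N tasks: L = sum_t L_t."""
    return sum(task_loss(yt, yp) for yt, yp in tasks)
```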

Case Study
In this section, we use the Case Western Reserve University (CWRU) bearing dataset [20], the bearing vibration data selected from a lab-built experimental platform, and another gearing dataset [21] in our experiments to prove the feasibility of the proposed method.

Data Setting.
As shown in Figure 5, the CWRU bearing experimental platform includes a 2-horsepower motor (left), a torque sensor (middle), a power meter (right), and electronic control equipment. This dataset is one of the most commonly used benchmark datasets in the field of fault diagnosis. Single-point pitting faults are seeded in the bearings using electrical discharge machining (EDM). The fault categories include IF (inner ring faults), OF (outer ring faults), and BF (rolling body faults). The location of the faulty bearing also differs: it is located at either the drive end or the fan end. It can be clearly seen in Figure 6 that the type of bearing fault, the load, and the fault size cause significant differences in the collected signals. Based on the above, we select the vibration samples under the conditions of bearings at two positions, 2 load conditions, 5 fault categories, and 4 fault sizes. We set up the training set for the model with 80 fault categories of CWRU bearings and 90 samples for each category. The specific description is shown in Table 3, where the bearing position contains FE (fan end) and DE (drive end) and the fault categories contain BF (ball fault), IF (inner ring fault), N (normal), and OF@3, which means that the fault point is at the 3 o'clock position on the outer ring of the bearing; OF@6 and OF@9 are analogous.
During the testing step, two vibration datasets from different mechanical components are used to verify the model. One is the bearing fault data collected by our self-built bearing experimental platform, as shown in Figure 7. The other is the gearing fault data [21]. The specific data settings are shown in Table 4. The bearing fault categories we selected are normal (N), ball fault (BF), outer ring fault (OF), inner ring fault (IF), and a compound fault consisting of a ball fault and an outer ring fault (B&OF). The gear fault categories we selected are crack, health, missing, spall, and chip5a, where 5a denotes the wear degree.

Part 1 Fault Classification Experiment Results on the C-Way K-Shot Problem.
All the experiments in this section revolve around the classification task of the C-way K-shot problem. During the training phase, we extract 5 different categories of fault data for each task, and each fault category contains 1, 3, or 5 support samples. In each task, each fault category additionally provides 15 verification samples for the query set. In other words, each 5-way 1-shot task contains 5 support samples and 75 query samples.
In the testing phase, both the bearing dataset and the gearing dataset are verified. The experimental results are the averages of multiple runs, as shown in Figure 8. When the model is tested on the lab-built bearing dataset, the fault classification accuracies of 5-way 1-shot, 5-way 3-shot, and 5-way 5-shot are, respectively, 82.63%, 92.60%, and 94.79%. That is to say, with only tiny amounts of labeled data per category for model training, the across-components fault diagnosis model has satisfactory generalization performance when the testing set has the same category space but a different probability distribution.
Moreover, when the model is tested on the gearing dataset, the fault classification accuracies of 5-way 1-shot, 5-way 3-shot, and 5-way 5-shot are, respectively, 82.19%, 91.28%, and 93.00%. Considering that the testing sets of these across-components fault diagnosis experiments have a different category space and a different probability distribution, although the classification accuracy is lower than that on the lab-built bearing dataset, the results are still reasonable and favorable.

Part 2 Fault Classification Results of Different Models.
In this section, we compare the performance of the proposed method with several of the most commonly used models for bearing fault diagnosis with few labeled samples, to show the superiority of the proposed method. The compared models include WDCNN, CNN-SVM, SAE, and SS-GAN. We give 5, 50, or 100 labeled fault samples for training each model and test on the query set. All of the samples are selected from the lab-built experimental platform, and the fault classification results are obtained through multiple experiments, as shown in Table 5.
It can be seen that the proposed method has the highest accuracy for all three training set sizes. In the case of only five known fault samples, the best-performing traditional model is the SAE model, but its fault classification accuracy is only 58.07%, while the fault classification accuracy of the proposed method is 82.19% on the gearing dataset and 82.63% on the bearing dataset.
As we know, most neural networks need to be trained with a large amount of labeled data to achieve good classification accuracy. Therefore, when there is only a small amount of labeled data, it is inappropriate to directly use a traditional model, whereas the proposed method remains applicable. We further investigate the influence of the K value of the nearest neighbor algorithm on the classification accuracy; the results are shown in Table 6. From Table 6, we can see that, for the nearest neighbor algorithm, a larger value of K does not always yield better classification accuracy; the optimal value of K varies across datasets. In this paper, when K = 5 on the gearing dataset, the model achieves its best result of 93.63%; when K = 10 on the bearing dataset, the model obtains its best result of 95.51%. This is because the gearing data have higher discreteness and a relatively more dispersed distribution, so a higher K value reduces the accuracy of the classification results, while the distribution of the bearing data is more compact, so increasing K can promote classification accuracy.

Conclusion
In this study, a deep convolutional nearest neighbor matching network based on few-shot learning is proposed, which can solve across-components fault diagnosis with tiny numbers of labeled samples. A convolutional network is used to extract fault features from a small sample dataset. Then, the K-Nearest Neighbor algorithm is adopted to match unlabeled samples against the support set to achieve fault classification of new categories. We have demonstrated the superiority of the proposed method by using three datasets of different components and comparing with four popular network models. The method in this paper provides a good approach to solving the problem of across-components fault diagnosis with tiny numbers of labeled samples.

Data Availability
The experimental data of this article are from the Case Western Reserve University Bearing Data Center; the bearing vibration data selected from the lab-built experimental platform and the additional gearing dataset are specified in the article.

Authors' Contributions
Pengfei Xu contributed to the writing of this manuscript, references, analysis of experimental results, and data interpretation. Juan Xu contributed to the research ideas, research directions, research acquisition, and research design. Lei Shi contributed to document retrieval, research content, algorithm flow, data analysis, algorithm analysis, and manuscript writing. Zhenchun Wei contributed to the experimental design, algorithm implementation, experimental analysis, and manuscript review of this manuscript. Xu Ding contributed to the experimental algorithm.