Fault Diagnosis of Gearbox in Multiple Conditions Based on Fine-Grained Classification CNN Algorithm

The use of convolutional neural networks for fault diagnosis has become a common research method in recent years. Since this method can automatically extract fault features, it has performed well in a number of studies. However, it has a clear drawback: the signals are significantly affected by working conditions and sample size, and it is difficult to improve diagnostic accuracy by learning faults directly, regardless of working conditions. High-precision fault diagnosis under multiple working conditions is therefore a research direction worth pursuing. In this article, using a fine-grained classification algorithm, the working conditions of the object system are treated as the coarse classes, and a specific fault under a given working condition is treated as a fine-grained class. Samples of different faults under different working conditions are learned jointly, and common features are extracted by the convolutional network, so that different faults under different working conditions can be identified simultaneously on the basis of the whole sample set. Experimental results show that the method effectively uses the variable-working-condition sample set to achieve dual recognition of the fault and the specific working condition, and that its recognition accuracy is significantly higher than that of learning without regard to working conditions.


Introduction
Traditional methods of fault diagnosis, whether based on time domain or frequency domain analysis, are highly dependent on physical experience. Since the convolutional neural network (CNN) was proposed [1], deep learning algorithms have developed rapidly; their powerful end-to-end learning ability allows feature extraction, which used to require expert experience, to be completed by the CNN itself, and this has become a new direction of fault diagnosis research. For rotating machinery, and gearboxes in particular, using a CNN to learn vibration signals has been the main method of fault diagnosis [2][3][4][5]. However, most of these studies consider only specific working conditions, while many systems clearly exhibit strong characteristic changes under different working conditions. Although some studies try to use the CNN's capacity to directly model the multiple working conditions of the object, the range of working condition changes considered was very limited [6]. In fact, in some practical fault diagnosis problems, the influence of working conditions on signal characteristics is greater than that of fault types [7]. Therefore, extracting fault features directly without considering working conditions can seriously reduce classification accuracy [8,9]. To address this, some studies model each working condition separately [7,9], which divides the whole problem into independent subproblems that are unrelated to each other and reduces the utilization efficiency of samples. The latter approach introduces another difficulty: the sample size per condition must be quite large. But fault samples are often hard to collect, which means each separate model must be trained on a smaller sample, so accuracy is limited. Transfer learning has been used to mitigate this, including adjusting a source model to adapt to new conditions [10,11] and using a source model to accelerate learning [12]; the premise is that a source model exists.
These methods have achieved excellent results, but they all rest on an important premise: that the sample size is large enough to support diagnostic modeling within a single working condition. In many problems, this is a difficult requirement. A neglected fact is that, although collected under different conditions, the samples share the characteristics of the same target system. Because the influences of working conditions and faults on the system response interact, and the influence of working conditions is greater than that of faults, fault diagnosis across working conditions can be treated as a fine-grained classification problem, which requires a structure different from the traditional CNN. This structure is very effective for hierarchical problems [12][13][14][15] and has been applied to fault diagnosis [16], but only as a means of enriching information; it has not been used to solve the problem of fault diagnosis with limited samples under varying working conditions. Because the influence of working conditions is more significant than that of health states, the model is designed as shown in Figure 1.
In the traditional method (Figure 1(a)), when the sample size is sufficient, good modeling results can be achieved by modeling each working condition separately. In the other traditional method (Figure 1(b)), the coupling between working conditions and faults is ignored, which has a significant impact. Although the method in Figure 1(b) appears to use more samples to support the training of a single network, the modeling accuracy is often seriously reduced instead. This means that, while a more efficient way to use the samples must be found, the ability to distinguish working conditions during fault classification must be preserved at the same time.
The fine-grained method (Figure 1(c)) uses the same convolutional layers to extract common fault features. Through a two-level fully connected network and a two-stage loss function, the more salient working condition features are recognized as the coarse classes, and the fault features are subdivided into specific faults under different working conditions as the fine-grained classes. To increase recognition accuracy, a function that enlarges the interclass spacing is designed and added to the loss. Together, these operations eliminate the impact of working condition changes on fault diagnosis. Our work has the following advantages:
(i) High utilization efficiency of the data: all samples are used jointly to train a single feature extractor, so the sample utilization ratio is far higher than in the traditional per-condition modeling; the convolutional layers receive more adequate training and achieve a better feature extraction effect; and the working condition labels originally used only for manual partitioning are also exploited in learning.
(ii) Under the constraints of the fine-grained model, faults under different working conditions are automatically distinguished, so the influence of working conditions on fault diagnosis is resolved while all samples are trained jointly.
Experiments show that the gearbox, a system with significant differences between working conditions and rare effective fault samples, has its fault diagnosis problem solved by this method.

Research Object
The gearbox is an object with distinctive characteristics. Because different working conditions mean completely different meshing states, the working condition is information that cannot be ignored. The gearbox is also an object for which sample collection is costly: owing to the low failure rate, hundreds of hours of operation may yield only a single failure as an effective sample.
This means the model cannot be formed from a large sample, and an appropriate method must take the economic cost into account. In this paper, the planetary gearbox of a particular type of vehicle is used as the object. The gearbox transmission principle is shown in Figure 2. There are three planetary gears, K1, K2, and K3, that need to be analyzed. The test bench and sensor arrangement are shown in Figure 3. The health states corresponding to the three planetary gears include (a) normal, (b) planetary gear tooth fault (hereinafter fault 1), (c) planetary gear tooth fault (hereinafter fault 2), (d) planetary gear tooth fault (hereinafter fault 3), and (e) sun gear tooth fault (hereinafter fault 4). There are 4 corresponding working conditions: gears 1 to 4. The signal characteristics of the same fault under different working conditions are not exactly the same. For example, the possible working conditions of the normal state, fault 3, and fault 4 include gears 1/2/3/4/5 and reverse gear, while the possible working conditions of fault 1 and fault 2 include gears 2 and 3. However, a conventional convolutional network can only label the 5 situations that need to be identified; it cannot relate the different working conditions to the changes they induce in the fault signal characteristics. The variation of working conditions thus becomes a disruptive factor in fault diagnosis and reduces diagnostic accuracy. A fine-grained classification algorithm can, in principle, separate the influences of working condition and health state. Its performance depends on the design of the loss function.
In [12,13], a contrastive loss function is used. In [14,15], a triplet loss function is used to learn features that maximize the interclass distance while minimizing the intraclass distance, thereby improving the precision of the fine-grained classification algorithm. However, when the amount of data is relatively large, the computation of these two approaches grows explosively when constructing pairs or triplets. Experiments show that the quality of the pairs or triplets has a major influence on the final classification accuracy. Therefore, problems such as slow model convergence, heavy computation, increased training complexity, and increased uncertainty of results arise when using these methods. To solve these problems, a new loss function was proposed in [17], which improved the classification accuracy of fine-grained classification problems in two respects: (1) a cascaded classification structure was designed to better describe the hierarchical relationship between fine-grained and coarse classes; (2) a large-margin loss was proposed, which aims to minimize the intraclass distance and maximize the interclass distance, and requires the distance between fine-grained classes belonging to the same parent class to be smaller than the distance between fine-grained classes belonging to different parent classes. The algorithm in [18] is applied here to gearbox fault diagnosis to explore its effect.

Methodology
Before introducing our method, we give some definitions in order to explain it mathematically. Given a signal data set $\Gamma = \{(S_i, Y_i)\}_{i=1}^{n}$, where $S_i$ is the $i$th input signal and $n$ is the total number of training samples, each input signal $S_i$ has a hierarchical label $Y_i = \{y_i^{(j)}\}_{j=1}^{l}$, where $y_i^{(j)} \in \{1, 2, \ldots, Y^{(j)}\}$ is the class label of the $j$th level, $l$ is the number of levels in the hierarchical label set, and $Y^{(j)}$ is the number of classes at the $j$th level. We suppose that the first level denotes the fine-grained label, so $y_i^{(1)}$ is the fine-grained label corresponding to signal $S_i$, and $Y^{(1)}$ is the total number of fine-grained classes. For each input signal $S_i$, we define the output of the penultimate layer of the CNN model as its feature vector, denoted $s_i$. For the fault signal diagnosis problem addressed in this paper, the hierarchical label structure consists of two levels ($l = 2$). The first level is the fine-grained class level, which identifies the specific failure source ($Y^{(1)}$ = the total number of health states across conditions). The second level is the coarse class level ($Y^{(2)} = 4$), which denotes the different working conditions. With these definitions in place, we introduce the two main parts of our method in turn.

Cascaded Softmax Loss.
For the fine-grained classification problem with two hierarchical labels (working condition coarse classification and health state fine-grained classification), we divide the last classification layer of the CNN model into two fully connected layers and use a cascaded softmax loss for training. Figure 4 shows the whole framework of our method. The number of neurons in the fc6 layer (health state fine-grained classification layer) and the fc7 layer (working condition coarse classification layer) is $Y^{(1)}$ and $Y^{(2)}$, respectively. For each input signal $S_i$, the output of the fc5 layer is the feature vector $s_i$ of the signal $S_i$. The outputs of the fc6 and fc7 layers are the probability scores $p(y_i^{(1)} \mid s_i)$ of the health state fine-grained class $y_i^{(1)} \in \{1, 2, \ldots, Y^{(1)}\}$ and the probability scores $p(y_i^{(2)} \mid s_i)$ of the working condition coarse class $y_i^{(2)} \in \{1, 2, \ldots, Y^{(2)}\}$. The purpose of adding the skip connection between the fc5 layer and the fc7 layer is that fc7 then receives not only the learned features of the health state fine-grained classes but also their probability scores $p(y_i^{(1)} \mid s_i)$ (the output of the fc6 layer). Intuitively, using these two different types of information for the coarse-level working condition classification should be better than using the fine-grained classification results alone, since the former not only exploits the semantic information of the fault signals (i.e., the learned features) but also learns the hierarchical label structure of the fault signals. Besides, in the iterative learning procedure, the error flows of the fc7 layer backpropagate to the fc6 layer, the fc5 layer, and the earlier layers of the CNN model; this helps improve the fine-grained health state classification accuracy.
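As a rough illustration, the cascaded head described above can be sketched in a few lines of numpy. This is a minimal sketch with hypothetical linear heads and random weights, not the paper's trained network; the real feature vector $s_i$ would come from the convolutional stack.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, n_fine, n_coarse = 8, 20, 4               # feature dim, Y(1), Y(2)
W6 = rng.normal(size=(d, n_fine))            # fc6: features -> fine classes
W7 = rng.normal(size=(d + n_fine, n_coarse)) # fc7 sees features AND fc6 output

s = rng.normal(size=(3, d))                  # a batch of fc5 feature vectors
p_fine = softmax(s @ W6)                     # p(y^(1) | s_i)
# The skip connection: fc7's input is [s_i, p(y^(1) | s_i)].
p_coarse = softmax(np.concatenate([s, p_fine], axis=1) @ W7)  # p(y^(2) | s_i)
```

The concatenation is what realizes the skip connection in this sketch: the coarse head conditions on both the raw features and the fine head's prediction.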
To train the network shown in Figure 4, the cascaded softmax loss over the fc6 and fc7 layers is

$$\mathrm{csm}(W, S_i, Y_i) = \mathrm{softmax}(W, S_i, y_i^{(1)}) + \mathrm{softmax}(W, S_i, y_i^{(2)}), \qquad (1)$$

where $W$ denotes the parameters of the whole network. For the fine-grained classification problem with two hierarchical labels (working condition coarse classification and health state fine-grained classification, $l = 2$), $\mathrm{softmax}(W, S_i, y_i^{(1)})$ and $\mathrm{softmax}(W, S_i, y_i^{(2)})$ are the softmax losses used to train the fc6 and fc7 layers, respectively. In fact, the cascaded softmax loss can be seen as a multi-task learning problem: one task is the fine-grained health state classification; the other is the coarse working condition classification. In the joint training procedure, the two tasks can improve each other by sharing the feature representation. The whole loss function for training the CNN network in Figure 4 is defined as

$$L(W) = \sum_{i=1}^{n} \mathrm{csm}(W, S_i, Y_i) + \lambda\, M(W, S, Y), \qquad (2)$$

where $\mathrm{csm}(W, S_i, Y_i)$ is the cascaded softmax loss defined in formula (1) and $M(W, S, Y)$ is the GLM (generalized large-margin) loss, used to train the feature layer (fc5) of the network. The input of the GLM loss is the training feature set $S = \{s_1, \ldots, s_n\}$ and the hierarchical label set $Y = \{Y_1, \ldots, Y_n\}$. $\lambda$ is a hyperparameter that balances the cascaded softmax loss and the GLM loss.
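A minimal numpy sketch of the cascaded loss for a single sample follows; it assumes plain softmax cross-entropy for each head, and the function names are illustrative, not from the paper's code.

```python
import numpy as np

def softmax_ce(logits, label):
    """Softmax cross-entropy of one sample for the given true label."""
    z = logits - logits.max()               # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def cascaded_softmax_loss(fine_logits, coarse_logits, y_fine, y_coarse):
    """Cascaded loss for one sample: sum of the fc6 (fine-grained) and
    fc7 (coarse) softmax cross-entropy terms."""
    return softmax_ce(fine_logits, y_fine) + softmax_ce(coarse_logits, y_coarse)

# The whole training loss would then be the sum of this quantity over the
# batch plus lambda times the GLM loss on the fc5 features.
```

With uniform (all-zero) logits, each head contributes log of its class count, which is a quick sanity check on the implementation.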

Generalized Large-Margin Loss.
For each health state fine-grained class $y$, we define two groups, $SF(y)$ and $\overline{SF}(y)$, which together make up the remaining health state fine-grained classes. These two groups consist of the fine-grained classes that do and do not share the same parent working condition coarse class with class $y$, respectively. The purpose of the GLM loss is twofold: (1) the distance between fine-grained class $y$ and its nearest fine-grained class in $SF(y)$ should be larger than the intraclass distance of class $y$ by a predefined margin; (2) the distance between class $y$ and its nearest fine-grained class in $\overline{SF}(y)$ should be larger than the distance between class $y$ and its farthest fine-grained class in $SF(y)$ by a predefined margin. In the following, we first define the intraclass and interclass distances and then use these definitions to describe the GLM loss. The principle is shown in Figure 5. The features belonging to fine-grained class $y$ in the training set are defined as

$$F_y = \{\, s_i : i \in \tau_y \,\},$$

where $\tau_y$ denotes the index set of the training samples belonging to fine-grained class $y$. The mean vector of $F_y$ is defined as

$$m_y = \frac{1}{n_y} \sum_{i \in \tau_y} s_i,$$

where $n_y = |\tau_y|$. The intraclass distance of $F_y$ is

$$D(F_y) = \frac{1}{n_y} \sum_{i \in \tau_y} \| s_i - m_y \|_2^2,$$

which can equivalently be written with the matrix trace $\mathrm{tr}(\cdot)$ of the class scatter matrix. Given two feature sets $F_m$ and $F_n$, the interclass distance between them is defined via their mean vectors as

$$D(F_m, F_n) = \| m_m - m_n \|_2^2.$$
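The mean vector, intraclass distance, and interclass distance can be sketched in numpy as follows; this is a hedged illustration in which `F` stands for an $n_y \times d$ array of one class's feature vectors, and the toy arrays at the end are illustrative only.

```python
import numpy as np

def class_mean(F):
    """Mean vector m_y of a class feature set F_y (rows are samples)."""
    return F.mean(axis=0)

def intra_distance(F):
    """Intraclass distance: mean squared distance to the class mean."""
    m = class_mean(F)
    return float(np.mean(np.sum((F - m) ** 2, axis=1)))

def inter_distance(F_m, F_n):
    """Interclass distance: squared distance between the two class means."""
    diff = class_mean(F_m) - class_mean(F_n)
    return float(diff @ diff)

F_a = np.array([[0.0, 0.0], [2.0, 0.0]])   # toy features of one class
F_b = F_a + np.array([3.0, 0.0])           # a second class, shifted by 3
```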
With the definitions above, the two constraints in the GLM loss can be expressed as

$$D\big(F_y, F^{(\min)}_{SF(y)}\big) - D(F_y) \geq \alpha_1, \qquad D\big(F_y, F^{(\min)}_{\overline{SF}(y)}\big) - D\big(F_y, F^{(\max)}_{SF(y)}\big) \geq \alpha_2,$$

where $\alpha_1$ and $\alpha_2$ are two predefined margin values. Here $SF(y)$ consists of the health state fine-grained classes that share the same parent (i.e., working condition coarse class) with fine-grained class $y$; $F^{(\min)}_{SF(y)}$ is the feature set of the fine-grained class in $SF(y)$ closest to class $y$, and $F^{(\max)}_{SF(y)}$ is the feature set of the fine-grained class in $SF(y)$ farthest from class $y$. Likewise, $\overline{SF}(y)$ consists of the fine-grained classes that do not share the same parent working condition coarse class with class $y$, and $F^{(\min)}_{\overline{SF}(y)}$ is the feature set of the fine-grained class in $\overline{SF}(y)$ closest to class $y$ (Figure 5).
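The two margin constraints can be turned into hinge penalties, as in the following minimal sketch. The arguments mirror the distances defined above (the intraclass distance of class $y$, its distance to the nearest and farthest same-parent classes, and its distance to the nearest different-parent class); the function name and the default margins are illustrative.

```python
def glm_penalty(D_intra, D_near_sib, D_far_sib, D_near_nonsib,
                alpha1=1.0, alpha2=1.0):
    """Hinge penalties for the two GLM margin constraints of one class."""
    # Constraint 1: the nearest same-parent class must exceed the
    # intraclass distance by at least alpha1.
    c1 = max(0.0, D_intra + alpha1 - D_near_sib)
    # Constraint 2: the nearest different-parent class must exceed the
    # farthest same-parent class by at least alpha2.
    c2 = max(0.0, D_far_sib + alpha2 - D_near_nonsib)
    return c1 + c2
```

A satisfied constraint contributes zero, so only violating classes produce gradient, which is the usual behavior of large-margin losses.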
Using the definitions above, the GLM loss with the two-level label structure can be described as

$$M(W, S, Y) = \sum_{y} \Big[ \max\big(0,\; D(F_y) + \alpha_1 - D(F_y, F^{(\min)}_{SF(y)})\big) + \max\big(0,\; D(F_y, F^{(\max)}_{SF(y)}) + \alpha_2 - D(F_y, F^{(\min)}_{\overline{SF}(y)})\big) \Big].$$

Optimization.
The CNN model is trained with the standard mini-batch backpropagation (BP) algorithm. The full loss function is defined in formula (2). By minimizing this objective, the CNN model learns fault features that better distinguish different health states under different working conditions, thus improving fault classification accuracy. To do so, we need the gradients of the whole loss function with respect to the activations of all CNN layers, which are called the error flows of the corresponding layers. Since the gradient of the softmax loss is straightforward, we only discuss the backpropagation of the GLM loss. In this paper, a two-level hierarchical label structure describes the fault diagnosis problem, so the derivative of the GLM loss with respect to $s_i$ is obtained by differentiating the two hinge terms above, where an indicator function $I(\cdot)$, which equals one if its condition is true and zero otherwise, gates the terms whose margin constraints are violated. Algorithm 1 describes the training algorithm based on the network framework shown in Figure 4.

Data Description.
The experimental data used in this paper are collected on the planetary gearbox of a particular type of vehicle mentioned above, and the physical fault simulation experimental platform is shown in Figure 6. The labels correspond to the five health states, denoted 0-4, respectively. Samples of each health state are collected under 4 working conditions, corresponding to gear positions 1 to 4. The data sampling frequency is 20 kHz, and each working condition has four input speeds: 600 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The load torque is 900 N·m. A total of 2,400 samples are obtained for each health state, and 80% of the samples are used for training. Each sample contains 4 measurement points, which record the vibration signals of the gearbox, and the signal length at each measurement point is 2000, longer than one rotation period, so the sample data length is 4 × 2000. There are 12,000 samples in total. All the sample data are shown in Table 1.
For the fault diagnosis problem described in this paper, traditional method (a) needs to establish a model for each working condition independently, so four models are required to cover the whole problem; each model has 5 states, with 600 samples per state. Traditional method (b) ignores the working conditions and uses all the samples in one model, with 5 states and 2,400 samples each. The fine-grained method used in this paper, method (c), keeps the same label form as traditional method (a) while using the samples of all working conditions jointly; it focuses on fault diagnosis while avoiding the loss of working condition information. There are 4 conditions as the coarse classes and 4 × 5 = 20 faults as the fine classes. For every method, the total number of samples is 12,000.
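The sample bookkeeping above can be checked with a few lines of arithmetic (illustrative only):

```python
conditions = 4                     # working conditions (coarse classes)
states = 5                         # health states per condition
per_state_per_condition = 600      # samples per state under one condition

total_samples = conditions * states * per_state_per_condition    # all methods
per_model_cnn_s = states * per_state_per_condition               # method (a): one model
per_state_cnn_m = conditions * per_state_per_condition           # method (b): one merged state
fine_classes = conditions * states                               # method (c): fine labels
```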

Experiment Settings.
The CNN models used in this paper have the same network structure and hyperparameters except for their output structures: 5 convolutional layers, each followed by a pooling layer and a BN layer, and 2 fully connected layers. The output layer is a softmax layer. The activation function of the convolutional layers is ReLU, and that of the fully connected layers is Sigmoid. The cross-entropy loss is used as the loss function, and the optimization method is the Adam algorithm. The regularization is L2. The learning rate is 0.0001 and the batch size is 32. The hyperparameter λ used in the loss function (formula (2)) of the proposed method is 0.1.

Traditional CNN Fault Diagnosis Method Using Separate Modeling of Multiple Working Conditions.
The framework of the traditional CNN model using separate modeling of multiple working conditions is illustrated in Figure 1(a); this method is abbreviated as CNN-S. It has four CNN models in total, each corresponding to one working condition. The output of each CNN model contains 5 health states, and the working condition of an input sample must be known during the testing stage. The data composition is shown in Table 2; it lists only the data used by one model, and each model has the same data structure.

Traditional CNN Fault Diagnosis Method Using Simultaneous Modeling of Multiple Working Conditions.
The framework of the traditional CNN model using simultaneous modeling of multiple working conditions is illustrated in Figure 1(b); this method is abbreviated as CNN-M. A single CNN model is trained on the samples of all working conditions using only the 5 health state labels. The data composition is shown in Table 3.

Fine-Grained Fault Classification Algorithm.
The framework of the fine-grained fault classification algorithm is illustrated in Figure 1(c); our method is abbreviated as CNN-FG. Compared with the traditional CNN methods, the fine-grained classification algorithm uses a two-level hierarchical label structure consisting of a coarse class level and a fine-grained class level. It therefore has two classifiers (softmax layers) after the fully connected layers of the CNN: one recognizes the fine-grained class level and the other the coarse class level. The coarse class level corresponds to the working conditions and the fine-grained class level represents the health states.

Input: fault signal training set Γ; hyperparameters λ, α1, and α2; maximum number of iterations Tmax; counter t = 0.
Output: parameters W of the CNN model.
(1) Select a mini-batch of samples from the fault signal training set Γ.
(2) Execute the forward propagation of the CNN model and, for each input signal, calculate the activation values of each layer.
(3) Compute the softmax loss error flows of fc7; then calculate the error flows backpropagated from fc7 to fc6 and from fc7 to fc5.
(4) Compute the softmax loss of fc6.
(5) Compute the overall error flows of fc6, which consist of the softmax loss of fc7 and its own softmax loss; then use the BP algorithm to calculate the error flows backpropagated from fc6 to fc5.
(6) Compute the GLM loss error flows backpropagated to fc5, and multiply them by the hyperparameter λ.
(7) Compute the overall error flows of fc5, which consist of those from fc6, fc7, and the GLM loss.
(8) Execute the backpropagation from fc7 to the conv1 layer, and use the BP algorithm to compute the error flows of these layers.
(9) Based on the activation values and error flow values, calculate ∂L/∂W through the BP algorithm.
(10) Update W according to the gradient descent method.
(11) t ← t + 1. If t < Tmax, return to step (1).
Algorithm 1: The training algorithm of the proposed method.
The working conditions are divided into conditions 1 to 4, corresponding to coarse class labels 0-3. The data samples under each working condition are divided into 5 health states, namely the normal state and faults 1-4, with corresponding state labels 0-4. So we have a total of 4 × 5 = 20 fine-grained classes, corresponding to fine-grained class labels 0-19. The composition of the data samples used in the fine-grained classification model is shown in Table 4.
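A minimal sketch of the label scheme described above (condition and state indices are zero-based, as in the tables; the function names are illustrative):

```python
def to_fine_label(condition: int, state: int) -> int:
    """Map (coarse condition label 0-3, health state label 0-4)
    to the fine-grained class label 0-19."""
    assert 0 <= condition < 4 and 0 <= state < 5
    return condition * 5 + state

def from_fine_label(fine: int) -> tuple:
    """Recover (condition, state) from a fine-grained label 0-19."""
    return divmod(fine, 5)
```

The mapping is a plain mixed-radix encoding, so merging fine-grained predictions back into per-condition fault results (as done in the comparisons later) is a single integer division.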

The Results of CNN-S.
CNN-S is the most common method. During model training, when the training set accuracy reaches 100% and the loss function no longer decreases, the fault diagnosis accuracies of the four working condition models on their test sets are only 95.5%, 88.5%, 94.5%, and 95.8%. This shows that the network structure is reasonable and the basic operation is effective. However, owing to the limited number of samples (each model has only 600 samples per health state), the network cannot be trained sufficiently and exhibits typical overfitting. To understand the diagnosis results in more detail, Figure 7 shows the confusion matrices of the CNN-S models on each working condition test set together with the overall statistics. In the confusion matrix plots, the rows correspond to the predicted class and the columns to the true class; the numbers 0-4 are the health state labels defined in Table 2. To analyze the fault diagnosis results more intuitively, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to visualize the extracted features; the results are shown in Figure 8. Under working condition 1 (Figure 8(a)), the interclass distance between the scattered points of different health states is relatively small, which will lead to misdiagnosis. For the scattered points of the same health state, the intraclass distance is relatively large in some classes, and some scattered points of a health state are even distributed across classes. This makes it impossible to identify the health state effectively and causes diagnostic failure. These results show that the CNN-S model does not classify the health states under working condition 1 well and cannot accurately diagnose the fault status. The visualized classification results of the extracted features under working conditions 2, 3, and 4 are shown in Figures 8(b)-8(d), respectively.
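For reference, a confusion matrix following the row/column convention used in Figure 7 (rows predicted, columns true) can be built as in this minimal sketch; the data below are illustrative, not the paper's results.

```python
import numpy as np

def confusion_matrix(y_pred, y_true, n_classes):
    """Count matrix with row = predicted class, column = true class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p, t in zip(y_pred, y_true):
        cm[p, t] += 1
    return cm

def accuracy(cm):
    """Overall accuracy: diagonal mass over the total count."""
    return cm.trace() / cm.sum()

# Toy example: three predictions, one of them wrong.
example_cm = confusion_matrix([0, 1, 1], [0, 1, 0], n_classes=2)
```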
Similarly, the classification results of these three working conditions show cases where the interclass distance between different health states is relatively small and the intraclass distance within the same health state is relatively large.

The Results of CNN-M.
To demonstrate the influence of working conditions on gearbox fault diagnosis, and because the sample size is not sufficient to support CNN-S, the other common method, CNN-M, is examined here. The final diagnostic accuracy of CNN-M on the test set is 79.9%. Although the data of all working conditions are pooled into a single model and the number of samples per class increases, the final diagnostic accuracy is lower than that of any of CNN-S's four single-condition models. Figure 9 shows the confusion matrix on the merged sample data set of Table 3. From Figure 9, with the same CNN network structure and the same network parameters, the diagnostic accuracy on the merged data set ranges from 66% to 93%, and the average accuracy is only about 79%, much lower than any of the four single working condition models of CNN-S. Furthermore, the feature extraction visualization of CNN-M is shown in Figure 10. There is serious overlap between the scattered points of different health state features, more serious than in any of the cases in Figure 8. In addition, the intraclass spacing between the scattered points of the same health state features becomes larger, which again indicates that the fault diagnosis accuracy on the merged sample data set is very low.
These results show that the working condition cannot be ignored as a key element of gearbox fault diagnosis. An appropriate global modeling method must distinguish all faults under different working conditions.

The Results of CNN-FG.
The final diagnostic accuracy of CNN-FG on the test set is 98.8%. Figure 11 shows the fault diagnosis accuracy of the proposed method for both the coarse classes (working conditions) and the fine-grained classes (health states). It can be seen that both the specific working condition of the input signal and the specific health state are accurately identified. This also reflects the effectiveness of the hierarchical label structure proposed in this paper: it combines the working condition information with the health state information, and the two complement each other, providing more reliable information for signal fault diagnosis.
The confusion matrix of CNN-FG in Figure 12 shows the ability of the CNN-FG model to recognize the coarse class of the data samples. For any data sample under working conditions 1-4, the CNN-FG model identifies the working condition accurately; the numbers 0-3 are the coarse class labels defined in Table 4. Figure 13 shows the confusion matrix of fine-grained class recognition using the fine-grained classification model. As seen from the right column, all accuracies are almost 100%, which means the fine-grained classification model can accurately identify all 20 fine-grained classes.
To compare with the two common methods (a) and (b) more intuitively, the diagnosis results of the CNN-FG method are merged according to working condition. The statistical results are shown as a confusion matrix in Figure 14. Clearly, this method performs much better than (a) and (b). Figure 15 shows the t-SNE visualization of the overall feature extraction of the fine-grained classification model. The fine-grained classification model can clearly separate all 20 fine-grained classes in the joint feature distribution of states across working conditions. These results clearly show the superiority of the fine-grained classification algorithm for fault diagnosis under the coupling of working conditions and health states.

Discussion.
The comparison of the traditional CNN methods (CNN-S and CNN-M) with the CNN-FG method proposed in this paper is shown in Figure 16. Figure 16(a) compares the test loss curves of the three methods. The proposed algorithm achieves a lower loss on the test set, indicating better generalization performance. As mentioned above, the network outputs of the CNN-S and CNN-M models have only five classes, while the network output of CNN-FG contains 20 fine-grained classes belonging to 4 coarse classes, so the latter's loss value is larger at the beginning. However, as training progresses, the latter's loss decreases rapidly and falls below that of the traditional CNN methods. This indicates that the fine-grained classification algorithm based on working conditions is better suited to describing the problem of fault diagnosis under variable working conditions. Figure 16(b) compares the fault diagnosis accuracy of the traditional CNN methods and the fine-grained fault diagnosis method proposed in this paper. The method used in this paper is significantly better than the traditional CNN methods in fault diagnosis accuracy, which indicates that traditional CNN methods that directly extract fault signal features without considering the working conditions have obvious defects. Using the fine-grained classification algorithm to extract the common features of different health states under different working conditions addresses the problem considered in this paper: when the health state features are significantly affected by the working conditions, accurately identifying the health states requires accurately identifying the working conditions at the same time, and the two identifications should ultimately be synchronous.
Table 5 compares the fault diagnosis accuracies of the traditional CNN models and the fine-grained CNN model. The advantages of the fine-grained classification method in diagnostic accuracy are evident from the table.

Conclusion
The innovations of the proposed algorithm are twofold: (1) the CNN structure is modified, a skip connection is introduced, and the cascaded softmax loss is used to train the network, so that the CNN model can extract and exploit the characteristics common to the fault types and working conditions; (2) the hierarchical label structure allows the CNN to identify the fault types and working conditions simultaneously and at the same time improves the accuracy of fault type diagnosis. Compared with traditional CNN methods, the fine-grained classification model not only effectively expands the usable sample set by combining the samples of different working conditions, but also, by accounting for the coupling between working conditions and health states within a single CNN model, extracts features from all samples to the greatest extent and completes feature extraction across working conditions. Finally, the decoupling of working conditions and health states is achieved, and the fault diagnosis accuracy is effectively improved.
The problem of expanding the usable samples of the planetary gearbox under a limited-sample situation, and the problem that the coupling of working conditions and health states reduces diagnostic accuracy, are both solved. The results show that this method is superior to the existing methods.
It should be noted that this method is well suited to the gearbox, and to other systems whose characteristics are significantly affected by working conditions and for which large-scale sampling is difficult. When the influence of working conditions is limited, the simpler CNN-M method can be used; if the system is heavily affected by working conditions and the sample size is large, CNN-S can achieve the same effect. However, where both difficulties exist, the method provided in this paper is irreplaceable.
Data Availability

The data are from military equipment and cannot be disclosed.

Conflicts of Interest
The authors declare that they have no conflicts of interest.