Convolutional neural networks (CNNs) have become a common approach to fault diagnosis in recent years. Because a CNN can extract fault features automatically, it has performed well in a number of studies. However, the approach has a clear drawback: the signals are strongly affected by working conditions and sample size, and it is difficult to improve diagnostic accuracy by learning faults directly while ignoring working conditions. High-precision fault diagnosis across varying working conditions is therefore a research direction worth pursuing. In this article, a fine-grained classification algorithm is adopted: the working conditions of the target system are treated as the coarse classes, and a specific fault under a specific working condition is treated as a fine-grained class. Samples of different faults under different working conditions are learned uniformly, and their common characteristics are extracted by the convolutional network, so that different faults under different working conditions can be identified simultaneously on the basis of the whole sample set. Experimental results show that the method makes effective use of the variable-working-condition sample set to achieve dual recognition of the fault and the specific working condition, and its recognition accuracy is significantly higher than that of methods that learn while ignoring working conditions.
Traditional fault diagnosis methods, whether based on time-domain or frequency-domain analysis, are highly dependent on physical experience. In recent years, since the convolutional neural network (CNN) was proposed [
The comparison between the traditional CNN fault diagnosis method and the fine-grained fault diagnosis algorithm. (a, b) The framework of the traditional CNN methods. “fc” denotes the fully connected layer. “C1”∼“C4” denote 1∼4 gear position, respectively. (c) The fine-grained fault diagnosis algorithm with two-level hierarchy label structure.
In the traditional method (Figure
The fine-grained method (Figure ) offers high data-utilization efficiency: all samples are used uniformly to train a single feature extractor, so sample utilization is doubled compared with the traditional one-model-per-condition approach, and the convolutional layers receive more adequate training and thus extract features more effectively. The working-condition labels, originally used only for manual partitioning, also take part in learning: under the constraints of the fine-grained model, faults under different working conditions are distinguished automatically, so the influence of working conditions on fault diagnosis is resolved while all samples are trained together.
Experiments on a gearbox, a system with significant differences between working conditions and with rare effective fault samples, show that the method solves its fault diagnosis problem.
The gearbox is an object with distinct characteristics. Because different working conditions imply completely different meshing states, the working condition is a kind of information that cannot be ignored. The gearbox is also an object for which sample collection is costly: owing to the low failure rate, hundreds of hours of operation often yield only a single failure as an effective sample. This means the model cannot be formed from a large sample size, and an appropriate method must take the economic cost into account. In this paper, the planetary gearbox of a particular type of vehicle is used as the object. The gearbox transmission principle is shown in Figure
Planetary gearbox transmission schematic.
Sensor layout.
The health states corresponding to the three planetary wheels include (a) normal, (b) planetary wheel tooth fault (hereinafter referred to as fault 1), (c) planetary wheel tooth fault (hereinafter referred to as fault 2), (d) planetary wheel tooth fault (hereinafter referred to as fault 3), and (e) sun wheel tooth fault (hereinafter referred to as fault 4). There are 4 corresponding working conditions: gears 1 to 4. The signal characteristics of the same fault are not exactly the same under different working conditions. For example, the possible working conditions of normal, fault 3, and fault 4 include gears 1/2/3/4/5 and reverse, while those of fault 1 and fault 2 include gears 2 and 3. A conventional convolutional network, however, can only label the 5 situations to be identified; it cannot connect the relationships among different working conditions or capture how the working conditions change the fault signal characteristics. The variation of working conditions thus becomes a disruptive factor that reduces diagnostic accuracy. A fine-grained classification algorithm can, in principle, separate the influence of working conditions from that of the health state; its performance depends on the design of the loss function.
In [
Before introducing our method, we give some definitions in order to explain our method in a mathematical way. Given a signal data set
In the problem of fine-grained classifications with two hierarchical labels (working condition coarse class classification and health state fine-grained class classification), we divided the last classification layer of the CNN model into two fully connected layers and used a cascaded softmax loss for training. Figure
Structure of the proposed fine-grained fault signal classification model.
The purpose of adding the skip connection between the fc5 layer and the fc7 layer is that it provides not only the features of the health state fine-grained class but also the probability scores
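As a rough numerical sketch (not the authors' implementation; the layer widths, random weights, and the concatenation detail are assumptions), the skip connection can be read as feeding fc7 both the fc5 features and the fc6 coarse-class probability scores:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, n_coarse, n_fine = 8, 64, 4, 20  # batch, fc5 width, 4 conditions, 20 fine classes (assumed)

W6 = rng.normal(size=(d, n_coarse)) * 0.1           # fc6 head: working-condition scores
W7 = rng.normal(size=(d + n_coarse, n_fine)) * 0.1  # fc7 head: fine-grained class scores

f5 = rng.normal(size=(n, d))                 # features produced by fc5
p_coarse = softmax(f5 @ W6)                  # fc6 output: coarse-class probabilities
x7 = np.concatenate([f5, p_coarse], axis=1)  # skip connection: fc5 features + fc6 scores
p_fine = softmax(x7 @ W7)                    # fc7 output: fine-grained probabilities
```

This makes the coarse prediction an explicit input to the fine-grained head rather than an independent output.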
To train the network shown in Figure
The whole loss function of training the CNN network in Figure
For each health state fine-grained class
The hierarchical and the interclass structure of states.
The features belonging to health state fine-grained class
We define
Then the interclass distance between
After we describe the definitions mentioned above, the two constraints in the GLM loss can be expressed as
In formulas (
Using the definitions mentioned above, the GLM loss which contains two-level label structure can be described as
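Since the GLM equations themselves are not reproduced in this excerpt, the following is only one plausible reading of the two distance constraints (the function name, normalization, and toy data are assumptions): samples of the same fine-grained class should be compact around their center, while centers of different fine-grained classes should be separated.

```python
import numpy as np

def intra_inter(features, fine_labels):
    """Hedged sketch of the two GLM constraints: mean squared distance of
    samples to their own class center (intra) and the minimum squared
    distance between distinct class centers (inter)."""
    classes = sorted(set(fine_labels))
    centers = {c: features[fine_labels == c].mean(axis=0) for c in classes}
    intra = float(np.mean([((features[fine_labels == c] - centers[c]) ** 2).sum(axis=1).mean()
                           for c in classes]))
    inter = min(float(((centers[a] - centers[b]) ** 2).sum())
                for i, a in enumerate(classes) for b in classes[i + 1:])
    return intra, inter

# two well-separated toy clusters: compactness should beat separation
X = np.vstack([np.zeros((5, 3)) + 0.01, np.ones((5, 3))])
y = np.array([0] * 5 + [1] * 5)
intra, inter = intra_inter(X, y)
```

A loss built from such terms would reward small `intra` and large `inter`, which is the qualitative behavior the two constraints describe.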
The CNN model training algorithm used here is the standard mini-batch backpropagation (BP) algorithm. The full loss function is defined in formula (
Algorithm
Input: fault signal training set
Output: parameter
(1) Select a mini-batch sample from the fault signal training set
(2) Execute the forward propagation of the CNN model; for each input signal, calculate the activation values of each layer
(3) Compute the softmax loss error flows of fc7; then calculate the backpropagated error flows from fc7 to fc6 and from fc7 to fc5
(4) Compute the softmax loss of fc6
(5) Compute the overall error flows of fc6, which consist of the softmax loss of fc7 and its own softmax loss; then use the BP algorithm to calculate the error flows backpropagated from fc6 to fc5
(6) Compute the GLM loss error flows backpropagated to fc5, and multiply them by the hyperparameter
(7) Compute the overall error flows of fc5, which consist of those from fc6, fc7, and the GLM loss
(8) Execute the backpropagation from fc7 to the conv1 layer, and use the BP algorithm to compute the error flows of these layers
(9) Based on the activation values and error flow values, calculate
(10) Update
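Numerically, the error flows combined at fc5 come from three loss terms: the fc7 softmax loss, the fc6 softmax loss, and the weighted GLM loss. A minimal check of that combination (the batch values and the weighting hyperparameter are assumptions, and the GLM term is left as a placeholder):

```python
import numpy as np

def cross_entropy(probs, labels):
    # mean negative log-probability of the true class
    return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())

# a mini-batch where both heads are maximally uncertain (uniform outputs)
p_fine = np.full((4, 20), 1 / 20)    # fc7: 20 fine-grained classes
p_coarse = np.full((4, 4), 1 / 4)    # fc6: 4 working conditions
y_fine = np.array([0, 5, 10, 15])
y_coarse = np.array([0, 1, 2, 3])

lam = 0.1   # weight of the GLM term (value assumed)
glm = 0.0   # GLM term of this batch (placeholder)
total = cross_entropy(p_fine, y_fine) + cross_entropy(p_coarse, y_coarse) + lam * glm
# for uniform outputs the two softmax terms are ln(20) and ln(4)
```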
The experimental data used in this paper is collected on a planetary gearbox for a particular type of vehicle mentioned in Section
Physical diagram of fault simulation experimental platform.
The planetary gearbox data samples are divided into five health states: normal state (Normal), K1 planetary gear failure (fault 1), K1 planetary gear failure (fault 2), K2 planetary gear failure (fault 3), and K3 planetary sun gear failure (fault 4). The labels corresponding to the five health states are denoted 0–4, respectively. Samples of each health state are collected under 4 working conditions, corresponding to gear positions 1 to 4. The data sampling frequency is 20 kHz, and each working condition has four input speeds: 600 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The load torque is
Number of samples for different training methods.
Sample number | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Ignore conditions |
---|---|---|---|---|---|
Normal | 600 | 600 | 600 | 600 | 2400 |
Fault 1 | 600 | 600 | 600 | 600 | 2400 |
Fault 2 | 600 | 600 | 600 | 600 | 2400 |
Fault 3 | 600 | 600 | 600 | 600 | 2400 |
Fault 4 | 600 | 600 | 600 | 600 | 2400 |
Total | 3000 | 3000 | 3000 | 3000 | 12000 |
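The counts in the table are mutually consistent, as a quick arithmetic check shows:

```python
states, conditions, per_cell = 5, 4, 600   # from the table above
ignore_conditions_per_state = conditions * per_cell   # "ignore conditions" column
grand_total = states * conditions * per_cell          # all samples combined
```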
For the fault diagnosis problem described in this paper, traditional method (a) establishes one model per independent working condition, four models in total, each covering 5 states with 600 samples per state. Traditional method (b) ignores the working conditions and uses all the samples: 5 states with 2400 samples each. The fine-grained method used in this paper keeps the same label form as traditional method (a) while using the samples of all working conditions uniformly, and method (c) focuses on fault diagnosis while avoiding the loss of working-condition information. There are 4 conditions as the coarse class and
The CNN models used in this paper share the same network structure and hyperparameters except for their output structures: 5 convolutional layers, each followed by a pooling layer and a BN layer, and 2 fully connected layers. The output layer is a softmax layer. The activation function of the convolutional layers is ReLU, and that of the fully connected layers is Sigmoid. Cross-entropy is used as the loss function, the optimizer is the Adam algorithm, and L2 regularization is applied. The learning rate is 0.0001 and the batch size is 32. The hyperparameter
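To illustrate how five conv+pool stages shrink a 1D input before the fully connected layers (the input signal length, kernel size, stride, and padding below are assumptions; the paper does not state them):

```python
def conv_out(n, k, s=1, p=0):
    # standard output-length formula for a 1D convolution
    return (n + 2 * p - k) // s + 1

L = 1024                          # assumed input signal length
for _ in range(5):                # 5 conv + pool stages
    L = conv_out(L, k=3, p=1)     # kernel 3, pad 1, stride 1: length preserved
    L = L // 2                    # 2x pooling halves the length
# L is now 1024 / 2**5 = 32 positions per channel entering the fully connected layers
```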
The framework of the traditional CNN model using separate modeling of multiple working conditions is illustrated in Figure
Data description of CNN-S.
Health state | Sample number | Test/train sample proportion | Label |
---|---|---|---|
Normal | 600 | 20%/80% | 0 |
Fault 1 | 600 | 20%/80% | 1 |
Fault 2 | 600 | 20%/80% | 2 |
Fault 3 | 600 | 20%/80% | 3 |
Fault 4 | 600 | 20%/80% | 4 |
The framework of the traditional CNN model using simultaneous modeling of multiple working conditions is illustrated in Figure
Data description of CNN-M.
Health state | Working condition number | Sample number | Test/train sample proportion | Label |
---|---|---|---|---|
Normal | 4 | 2400 | 20%/80% | 0 |
Fault 1 | 4 | 2400 | 20%/80% | 1 |
Fault 2 | 4 | 2400 | 20%/80% | 2 |
Fault 3 | 4 | 2400 | 20%/80% | 3 |
Fault 4 | 4 | 2400 | 20%/80% | 4 |
The framework of the fine-grained fault classification algorithm is illustrated in Figure
Data description of CNN-FG.
Working condition | Health state | Sample number | Test/train sample proportion | Health state label | Fine-grained class label | Coarse class label |
---|---|---|---|---|---|---|
Condition 1 | Normal | 600 | 20%/80% | 0 | 0 | 1 |
Fault 1 | 600 | 20%/80% | 1 | 1 | ||
Fault 2 | 600 | 20%/80% | 2 | 2 | ||
Fault 3 | 600 | 20%/80% | 3 | 3 | ||
Fault 4 | 600 | 20%/80% | 4 | 4 | ||
Condition 2 | Normal | 600 | 20%/80% | 0 | 5 | 2 |
Fault 1 | 600 | 20%/80% | 1 | 6 | ||
Fault 2 | 600 | 20%/80% | 2 | 7 | ||
Fault 3 | 600 | 20%/80% | 3 | 8 | ||
Fault 4 | 600 | 20%/80% | 4 | 9 | ||
Condition 3 | Normal | 600 | 20%/80% | 0 | 10 | 3 |
Fault 1 | 600 | 20%/80% | 1 | 11 | ||
Fault 2 | 600 | 20%/80% | 2 | 12 | ||
Fault 3 | 600 | 20%/80% | 3 | 13 | ||
Fault 4 | 600 | 20%/80% | 4 | 14 | ||
Condition 4 | Normal | 600 | 20%/80% | 0 | 15 | 4 |
Fault 1 | 600 | 20%/80% | 1 | 16 | ||
Fault 2 | 600 | 20%/80% | 2 | 17 | ||
Fault 3 | 600 | 20%/80% | 3 | 18 | ||
Fault 4 | 600 | 20%/80% | 4 | 19 |
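The label scheme in the table follows a simple rule (a reconstruction from the table, not code from the paper): fine-grained label = (condition − 1) × 5 + health state, with the condition number serving as the coarse label.

```python
def fine_label(condition, health_state):
    # condition in 1..4, health_state in 0..4, matching the table above
    return (condition - 1) * 5 + health_state

def coarse_label(condition):
    # the coarse class is just the working-condition number
    return condition
```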
CNN-S is the most common method. During model training, when the training-set accuracy reaches 100% and the loss function no longer decreases, the fault diagnosis accuracies of the four single-condition models on the test set are only 95.5%, 88.5%, 94.5%, and 95.8%. This shows that the network structure is reasonable and the approach is preliminarily effective. However, because of the limited number of samples (each model has only 600 samples per health state in total), the network is not fully trained, and the result shows typical overfitting. In order to examine the diagnosis results in more detail, Figure
Confusion matrix for CNN-S. (a) Condition 1. (b) Condition 2. (c) Condition 3. (d) Condition 4. (e) Statistics of all single conditions.
In order to analyze the fault diagnosis results more intuitively, t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to visualize the extracted features. The results are shown in Figure
The t-SNE figure of feature extracted by CNN-S: (a) condition 1, (b) condition 2, (c) condition 3, (d) condition 4.
Figure
In order to demonstrate the influence of working conditions on gearbox fault diagnosis, and because the sample size is insufficient to support CNN-S, the other common method, CNN-M, is used here. The final diagnostic accuracy of CNN-M on the test set is 79.9%. Although pooling the data from all working conditions into a single model increases the number of samples, the final diagnostic accuracy is lower than that of any of CNN-S's four single-condition models.
Figure
Confusion matrix for CNN-M.
The t-SNE visualization figure of features extracted from CNN-M.
The final diagnostic accuracy of CNN-FG is 98.8% on the test set. Figure
Fine-grained classification (health state) accuracy and coarse classification (working condition) accuracy of CNN-FG.
The confusion matrix of CNN-FG is shown in Figure
Confusion matrix of coarse class recognition using CNN-FG.
Figure
Confusion matrix of fine-grained class recognition using CNN-FG.
In order to compare with the two common methods (a) and (b) more intuitively, the diagnosis results of the CNN-FG method are merged according to working conditions. The statistics are shown as a confusion matrix in Figure
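Merging CNN-FG's fine-grained predictions back to health states is the inverse of the label scheme in the data table: health state = fine-grained label mod 5, working condition = fine-grained label // 5 + 1 (inferred from the label table, not taken from the paper's code).

```python
import numpy as np

fine_pred = np.array([0, 6, 12, 18, 9])   # example fine-grained class predictions
health_pred = fine_pred % 5               # merged health-state predictions
cond_pred = fine_pred // 5 + 1            # recovered working conditions
```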
Statistics of CNN-FG.
Figure
The t-SNE visualization of the overall feature extraction of CNN-FG.
The comparison of the traditional CNN methods (CNN-S and CNN-M) with the CNN-FG proposed in this paper is shown in Figure
The comparison of traditional methods and the fine-grained classification algorithm. (a) The comparison of test loss drop curves. (b) The comparison of fault diagnosis accuracies.
Table
Comparison of fault diagnosis accuracies between traditional CNN models and fine-grained CNN model.
Working condition | Traditional CNN models | Fine-grained CNN model | ||
---|---|---|---|---|
Accuracy (%) | Avg (%) | Accuracy (%) | Avg (%) | |
Single working condition (CNN-S) | ||||
Condition 1 | 95.5 | 93.6 | — | — |
Condition 2 | 88.5 | — | ||
Condition 3 | 94.5 | — | ||
Condition 4 | 95.8 | — | ||
Multiple working conditions | ||||
Conditions 1–4 | 79.9 (CNN-M) | 98.8 (CNN-FG) |
The innovations of the proposed algorithm are twofold: (1) the CNN structure is modified with a skip connection and trained with a cascaded softmax loss, so that the model can extract and exploit the characteristics common to fault types and working conditions; (2) the hierarchical label structure allows the CNN to identify fault types and working conditions simultaneously while improving the accuracy of fault-type diagnosis. Compared with traditional CNN methods, the fine-grained classification model not only effectively expands the number of usable samples by combining the samples of different working conditions, but also unifies the coupling between working conditions and health states within a single CNN model, so that features are extracted from all samples to the maximum extent and feature extraction across working conditions is completed. The working conditions and health states are thereby decoupled, and the fault diagnosis accuracy is effectively improved. This solves both the sample expansion problem of the planetary gearbox under limited samples and the accuracy loss caused by the coupling of working conditions and health states. The results show that this method is superior to existing methods.
It should be noted that this method is particularly suitable for the gearbox, or for other systems whose signal characteristics are significantly affected by working conditions and whose samples are difficult to collect on a large scale. When the influence of working conditions is limited, the simpler CNN-M method can be used; when the system is heavily affected by working conditions but a large sample size is available, CNN-S can achieve a similar effect. When both difficulties exist, however, the method provided in this paper is irreplaceable.
The data are from military equipment and cannot be disclosed.
The authors declare that they have no conflicts of interest.
This research was supported by the National Natural Science Foundation of China under Grant no. 51875576.