Intelligent Detection Method of Gearbox Based on Adaptive Hierarchical Clustering and Subset

Deep learning trains deep neural networks on mechanical time-frequency signals, which enables automatic feature extraction and intelligent fault diagnosis and removes the dependence on extensive signal processing techniques and experience. To address the misclassification of similar samples, a fault diagnosis algorithm based on adaptive hierarchical clustering and subset (AHC-SFD) is proposed for feature extraction and applied to gearbox fault diagnosis. First, the adaptive hierarchical clustering algorithm analyzes the characteristics of the data and clusters the dataset into multiple feature groups; then, according to the feature groups, the SubCNN model is established for multiscale feature extraction so as to carry out fault diagnosis. The test results show that the fault recognition rate of the proposed method exceeds 99.7% on the gearbox dataset and that the method has better generalization ability.


Introduction
Major accidents caused by mechanical equipment failure [1] are a constant reminder of the need to ensure the safe and reliable operation of equipment. In particular, the failure of key equipment at the core of a production line brings significant shutdown losses to the whole line, causing not only huge economic losses but, in serious cases, also endangering personal safety. The online monitoring, fault diagnosis, and prediction of mechanical equipment [2,3] play an important role in improving equipment operation reliability and optimizing operation and maintenance strategies, and are crucial to the maintenance of mechanical equipment. Traditional intelligent fault diagnosis methods require mastery of a large number of signal processing techniques to extract relatively accurate feature parameters. Moreover, when a shallow model is used to characterize the relationship between signal and fault, its diagnosis ability and generalization ability are insufficient, and it is difficult to meet the actual needs of fault diagnosis under big data.
In recent years, the application of deep learning to fault diagnosis of complex industrial systems has begun to take shape [4]. Lei et al. [5,6] proposed a big-data health monitoring method for mechanical equipment based on the denoising autoencoder (DAE), which realized the diagnosis of a variety of planetary gear faults and reflected the powerful ability of deep learning to extract characteristics from mechanical vibration signals. Yu and Zhao [7][8][9] effectively integrated DAE and EN to solve the problem of noise interference in fault diagnosis, detect abnormal samples in industrial processes, and isolate fault variables from normal variables. Nguyen et al. [10][11][12] proposed a deep learning network composed of an autoencoder and a softmax classifier to identify bearing faults of different degrees. DBN is more often combined with other technologies to solve fault diagnosis problems. Since CNN was used to identify bearing faults in 2016, its fault diagnosis performance and scope of application have been continuously improved. Hoang and Kang [13][14][15][16] proposed a new CNN-based method for rolling bearing fault diagnosis; by exploiting the effectiveness of CNN in image classification, it achieves 100% diagnosis accuracy on the CWRU bearing dataset. Based on ResNet-50, Wen et al. [17,18] proposed a transfer learning convolutional neural network (TCNN) for fault diagnosis, whose prediction accuracy is significantly better than that of other DL models and traditional diagnosis methods. The application of RNN in fault diagnosis began to revive in 2015. Abed et al. [19,20] used RNN for bearing fault diagnosis and realized accurate detection and classification of bearing faults under nonstationary conditions. Pan et al. [21][22][23] proposed a method for bearing fault classification that combines one-dimensional CNN and LSTM, with an experimental test accuracy of 99.6%.
Although the above algorithms have been applied in mechanical equipment fault diagnosis, there is still considerable room to improve the fault recognition rate. Feature extraction is a key part of fault diagnosis. For samples that have similar features but belong to different patterns, a single model extracts similar features, which results in false recognition [24] and reduces the accuracy of fault diagnosis. To address this problem, and drawing on the idea of subset [25,26], this study proposes AHC-SFD, a multiscale feature extraction fault diagnosis model based on adaptive hierarchical clustering, and applies it to gearbox fault diagnosis. The test results show that the proposed method achieves a fault recognition rate of more than 99.7% on the gearbox dataset and has better generalization ability.

Gear Fault Diagnosis Algorithm Based on Adaptive Hierarchical Clustering and Subset
Gearboxes have complex structures and generally work in environments with strong noise, so the collected vibration signals are easily affected by external factors. To fully exploit the feature extraction ability of the CNN, this study proposes a fault diagnosis algorithm based on adaptive hierarchical clustering and subset. First, the optimal clustering result for all data is obtained through adaptive hierarchical clustering; then, a multiscale feature extraction module is designed according to the clustering result to realize the classification of fault data.

Adaptive Hierarchical Clustering.
The number of clusters is an important parameter that affects the clustering effect, but it usually has to be fixed before clustering. As the amount of data changes, the original parameter value can no longer give the best clustering result. Considering the characteristics of vibration signals, an adaptive hierarchical clustering (DIANA) algorithm is proposed in this study. The clustering contour coefficient (silhouette coefficient) is used as the index of clustering effectiveness, so that the number of clusters can be determined adaptively according to the value of a self-defined discriminant function. The process is shown in Figure 1. The specific algorithm flow is as follows:
(1) Extract the average value of each original vibration signal to form a feature sample set $X = \{x_1, x_2, \ldots, x_{num}\}$; $U = \{u_1, u_2, \ldots, u_C\}$ denotes the fault type set.
(2) Start clustering; set $k = 0$ and $s_{max} = -\infty$.
(3) Let $k = k + 1$, take $k$ as the number of clusters, and perform hierarchical clustering (DIANA) on the input training samples.
(4) Calculate the contour coefficient $s(k)$ using equations (1)-(4). In equation (2), $p$ denotes a class label other than class $c$, $n_p$ represents the number of samples not of class $c$, $C_p$ represents the samples that are not of class $c$, $C_c$ represents the samples of class $c$, and $d(i, j)$ is the absolute distance between samples $i$ and $j$. In equation (3), $a(i)$ represents the average distance between sample $i$ and all other samples belonging to the same class, and $b(i)$ represents the minimum, over all other classes, of the average distance between sample $i$ and all samples in that class. In equation (4), $s_i$ is the contour coefficient of an individual sample, $num$ is the number of samples in the feature sample set, and $k$ is the number of clusters.
(5) When $s(k) > s_{max}$, set $s_{Index} = k$ and $s_{max} = s(k)$, and perform step 7.
(6) When $s(k) \le s_{max}$, return to step 3.
(7) Judge whether $k$ is less than $n$, where $n$ is the number of dataset types: when $k \ge n$, $s_{Index}$ is the number of clusters and the clustering results are output; when $k < n$, repeat step 3.
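For reference, the textbook silhouette coefficient agrees with the descriptions of $a(i)$, $b(i)$, $s_i$, and $s(k)$ given in step (4). The block below is a sketch under that assumption rather than the authors' typeset equations; $n_c$ (the size of class $c$) is a symbol introduced here for illustration, and $n_p$ is read as the size of class $p$.

```latex
\begin{align}
a(i) &= \frac{1}{n_c - 1} \sum_{j \in C_c,\; j \neq i} d(i, j), \\
b(i) &= \min_{p \neq c} \; \frac{1}{n_p} \sum_{j \in C_p} d(i, j), \\
s_i  &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \\
s(k) &= \frac{1}{num} \sum_{i=1}^{num} s_i .
\end{align}
```

With this form, $s_i$ lies in $[-1, 1]$: values near 1 mean that sample $i$ is much closer to its own class than to the nearest other class.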

Multiscale (Subset) Feature Extraction.
In order to extract as much feature information as possible from the training data and to iterate quickly, this study designs a multilayer, multichannel, multiscale feature extraction module based on the CNN. The structure is shown in Figure 2. The branch structure of each subset (12 layers in total) is the same, in which the convolution kernel sizes of the 8 convolution layers are 1 * 8, 1 * 8, 1 * 4, 1 * 4, 1 * 4, 1 * 2, and 1 * 2, the numbers of channels are set to 16, 16, 64, 64, 256, 256, 512, and 512, and the step sizes are set to 2, 2, 2, 2, 2, 1, and 1. The ReLU activation function is used after each convolution layer, and the 4 max-pooling layers adopt the 1 * 2 structure. Finally, the extracted feature information is output.
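As a rough illustration of how a strided convolution shortens a one-dimensional signal, the sketch below applies the usual output-length formula to the first listed layer (kernel 1 * 8, step size 2). It assumes a 3600-point input and zero padding; the padding is not stated in the text, and `conv1d_output_length` is a hypothetical helper name.

```python
def conv1d_output_length(length, kernel, stride, padding=0):
    """Output length of a 1-D convolution: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


# First convolution layer as listed: kernel size 8, step size 2.
# Assumption: 3600-point input and no padding.
first_layer_length = conv1d_output_length(3600, kernel=8, stride=2)
print(first_layer_length)
```

The same helper can be chained over the remaining layers once their padding and the positions of the pooling layers are known.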

AHC-SFD Diagnostic Algorithm.
The flow chart of the adaptive hierarchical clustering and subset fault diagnosis method proposed in this study is shown in Figure 3. The mean value of each vibration signal is used as the input of adaptive hierarchical clustering to obtain the optimal clustering result. The labeled samples corresponding to that result are input to the multiscale feature extraction module to obtain more effective fault data features. Finally, the features extracted by the multiscale feature extraction module are transformed into one-dimensional data through the fully connected layer, and the fault diagnosis result is output through the softmax function.

Experimental Verification and Analysis
In order to evaluate the effectiveness and accuracy of fault diagnosis of the AHC-SFD network model, the gearbox dataset is used for experimental verification. The data are collected from a reference two-stage gearbox. The gear speed is controlled by a motor, and the torque is provided by a magnetic brake, which can be adjusted by changing its input voltage. A 32-tooth pinion and an 80-tooth pinion are installed on the first-stage input shaft, and the second stage consists of a 48-tooth pinion and a 64-tooth pinion. The input shaft speed is measured by a tachometer, and the gear vibration signal is measured by an accelerometer, as shown in Figure 4.

Fault Dataset Description and Processing.
The pinion on the input shaft introduces 9 different gear conditions, including five kinds of labels with different severity: healthy, missing teeth, root cracking, peeling, and tip cutting. The number of samples under each status label is the same. The collected data are divided into training samples and test samples in a ratio of roughly 4 : 1. Each sample contains 3600 points. The dataset is described in Figures 5-13 and Table 1.

Refactoring Input Data Format.
The dataset collected on the test bed is a one-dimensional vibration signal sequence. In order to reduce the clustering time and carry out adaptive hierarchical clustering quickly and effectively, this study computes the average value of each one-dimensional vibration signal of 3600 sampling points and uses that average as the input value of the adaptive hierarchical clustering. The specific operation is as follows:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad n = 3600 \quad (5)$$
In equation (5), $x_i$ represents the i-th feature value of a sample and $\bar{X}$ represents the average value of a sample.

Result Output.
The principle of adaptive clustering is to obtain a clustering result in which the distance between classes is as large as possible, the distance within a class is as small as possible, and the classes have good separability. As described in Section 2.1, the cluster contour coefficient is used as the index for cluster effectiveness evaluation in this study. The closer the cluster contour coefficient is to 1, the better the clustering result; the closer it is to −1, the worse the clustering result. In this study, the number of clusters is varied within [1, 9]. The cluster contour coefficients obtained as the number of clusters changes are shown in Figure 14. It can be clearly seen that when the number of clusters is 2, the cluster contour coefficient s(k) is the largest. Therefore, the number of branches of the multiscale feature extraction module is set to 2.
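The selection of the cluster count by the largest contour coefficient can be sketched as follows. This is a minimal illustration, not the authors' implementation: the clustering step is a simple stand-in that cuts sorted one-dimensional values at the largest gaps (it is not DIANA), the scan starts at k = 2 because the silhouette is undefined for a single cluster, and the sample values and function names are hypothetical.

```python
from statistics import mean


def split_1d(values, k):
    """Stand-in clustering (not DIANA): cut sorted values at the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cuts = set(sorted(range(1, len(order)),
                      key=lambda j: values[order[j]] - values[order[j - 1]],
                      reverse=True)[:k - 1])
    labels, current = [0] * len(values), 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1  # a new cluster starts at each cut position
        labels[idx] = current
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient with absolute distance between samples."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(abs(values[i] - values[j]) for j in own)
        b = min(mean(abs(values[i] - values[j]) for j in members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def best_cluster_count(values, k_max):
    """Scan k = 2..k_max and keep the k with the largest coefficient."""
    best_k, s_max = None, float("-inf")
    for k in range(2, k_max + 1):
        s = silhouette(values, split_1d(values, k))
        if s > s_max:
            best_k, s_max = k, s
    return best_k, s_max


# Hypothetical per-signal mean values forming two well-separated groups.
means = [0.10, 0.12, 0.11, 0.13, 0.90, 0.92, 0.91, 0.93, 0.89]
best_k, s_max = best_cluster_count(means, k_max=9)
print(best_k, round(s_max, 3))
```

For real signals, the stand-in `split_1d` would be replaced by the hierarchical clustering routine actually used; the scan over k and the comparison against the running maximum stay the same.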

Grouping Label Data According to Clustering Results.
Labeled data are used. The labeled data samples are $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$, where $x^{(i)}$ represents the feature vector and $y^{(i)} \in \{1, 2, \ldots, t\}$ represents the fault type. According to the clustering results in Section 3.2.2, the labeled data (one-dimensional vibration signals) are divided into two groups. The two groups are divided into training samples and test samples in ratios of 39 : 11 and 19 : 6, respectively. The description of the training and testing datasets is shown in Table 2.

Data Standardization Operation.
In order to speed up network model training, make the data easy to compute, and obtain more generalized results, the input data are standardized: the vibration signal data are mapped to the (0,1) interval using the normalization equation. The mathematical expression is as follows:
$$z_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, f \quad (6)$$
In equation (6), $z_i$ represents the preprocessed data, $x_i$ represents the frequency value of the vibration signal, $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values of frequency in each group of vibration signals, and $f$ represents the number of points in each vibration signal.
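Min-max scaling is one common way to map a signal onto the unit interval. The sketch below assumes that this is the normalization meant here; `min_max_scale` is a hypothetical helper, and the guard for a constant signal is an added convention.

```python
def min_max_scale(signal):
    """Map each value to (x - min) / (max - min), so the result lies in [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # Constant signal: the formula is undefined, so return zeros (assumption).
        return [0.0 for _ in signal]
    span = hi - lo
    return [(x - lo) / span for x in signal]


scaled = min_max_scale([-2.0, 0.0, 1.0, 2.0])
print(scaled)
```

The minimum and maximum are taken per signal, so every signal is scaled independently of the others.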

Diagnostic Result Output.
In order to evaluate the difference between the normalized prediction result and the corresponding sample label, the cross-entropy function is used to calculate the error loss value. The mathematical expression is as follows: In equation (7), $J(\theta)$ represents the loss function, $I\{\cdot\}$ represents the logical indicator function (when the condition is true, $I = 1$; otherwise $I = 0$), and $y^{(i)}$ represents the i-th real label of the fault. The weight matrix $\theta$ is iteratively updated by means of gradient descent. The iterative equation is as follows: In equation (8), $\theta_j$ represents the weight matrix of the j-th update.
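The loss and update described above can be illustrated with a small, dependency-free sketch. It assumes the standard softmax cross-entropy averaged over the batch and a plain gradient descent step; class labels are 0-based here (the text uses 1 to t), and all function names and the learning rate `lr` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Mean of -log p(true class); the indicator picks out the true class."""
    losses = [-math.log(softmax(logits)[y])
              for logits, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def gradient_descent_step(theta, grad, lr):
    """One update: theta_next = theta - lr * grad, element by element."""
    return [t - lr * g for t, g in zip(theta, grad)]


# Uniform scores over 9 classes give a loss of log(9) for any label.
loss = cross_entropy([[0.0] * 9, [0.0] * 9], labels=[0, 7])
theta_next = gradient_descent_step([1.0, 2.0], [0.5, -1.0], lr=0.1)
print(round(loss, 4), theta_next)
```

A loss of log(9) is the value an untrained nine-class classifier would start from, which gives a simple sanity check for the first training iterations.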

Model Parameter Structure.
The experiment was implemented on a Linux computer using the PyCharm platform, with Python as the programming language and PyTorch as the deep learning framework.
During network training based on stochastic gradient descent, the multilayer back-propagation of the error signal can easily lead to "gradient dispersion" (a gradient that is too small makes the back-propagated training error signal extremely weak) or "gradient explosion" (a gradient that is too large leads to NaN values in the model).
As the network depth increases, training becomes more and more difficult. To keep the network lightweight, the Adam optimizer is used during the experiment to continuously update the network training parameters. The batch size is set to 30, the number of iterations is 200, and the learning rate is 0.0005. This study also introduces an early stopping mechanism: by monitoring the change of the training-set loss between adjacent iterations, early stopping can terminate the model training in time to prevent the model from overfitting. Because the model is built on a convolutional neural network, its parameter design is similar to that of a convolutional neural network and is shown in Table 3.
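An early stopping rule of the kind described, which watches the change in training loss between adjacent iterations, might look like the sketch below. The threshold `min_delta`, the `patience` count, and the loss values are hypothetical; the text does not give the criterion actually used.

```python
class EarlyStopping:
    """Stop when the loss change between adjacent iterations stays small."""

    def __init__(self, min_delta=1e-4, patience=5):
        self.min_delta = min_delta  # smallest change that counts as progress
        self.patience = patience    # consecutive small changes tolerated
        self.previous = None
        self.stale = 0

    def should_stop(self, loss):
        if self.previous is not None and abs(self.previous - loss) < self.min_delta:
            self.stale += 1
        else:
            self.stale = 0
        self.previous = loss
        return self.stale >= self.patience


# Hypothetical training losses that flatten out after the third iteration.
losses = [1.0, 0.5, 0.3, 0.29995, 0.29994, 0.29993, 0.2]
stopper = EarlyStopping(min_delta=1e-3, patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at)
```

Requiring several consecutive small changes, rather than one, keeps a single flat iteration from ending training prematurely.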
The comparison results of AHC-SFD and CNN are shown in Figure 15, and those on the test set are shown in Figure 16.
It can be seen from the comparison results in Figures 15 and 16 that after 140 epochs, the accuracy of the AHC-SFD algorithm on the test set reaches 99.7%, while the accuracy of the CNN algorithm on the test set is only 98.9%. Therefore, the diagnostic method in this study converges faster and more stably, with higher accuracy and stronger generalization ability.
In order to further demonstrate the learning ability of the model for different categories of features, the t-SNE dimension reduction algorithm from manifold learning is introduced to visualize the features learned by the fully connected layer.
The experimental results are shown in Figure 17.
It can be seen from the scatter plot in Figure 17 that the AHC-SFD method in this study has identification errors only in the samples of class 0 and class 7, while the other samples are gathered at their corresponding positions. In contrast, the CNN features have recognition errors in class 1, class 2, class 5, and class 8 samples, and there are many overlaps between class 1 and class 5 samples. It can be seen that AHC-SFD has stronger feature learning ability than the CNN.

Conclusion
The AHC-SFD algorithm established in this study is a diagnosis algorithm based on adaptive hierarchical clustering and subset, which has the following three advantages: (1) The AHC-SFD algorithm directly takes the original vibration signal as the input of the 1D-CNN, which preserves the characteristics of the vibration signal to the greatest extent. (2) A grouping method based on adaptive hierarchical clustering is proposed, which analyzes the characteristics of different data and then clusters the dataset into multiple feature groups. (3) A multiscale feature extraction module is proposed to reduce the misclassification of similar samples, thus ensuring the maximum extraction of effective information from the data. Verification on the gearbox dataset shows that the diagnostic accuracy is better than that of the single-channel CNN model.

Data Availability
The dataset used in this article can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.