Deep learning-based fault diagnosis of rolling bearings is an active research topic, and rapid, accurate diagnosis is important. This paper proposes a fault diagnosis method based on a convolutional neural network (CNN) and transfer learning (TL) for vibration image samples of rolling bearings affected by strong noise. First, four vibration image generation methods with different characteristics are put forward, and the corresponding pure (noise-free) vibration image samples are generated from the original data. Second, with the CNN as the adaptive feature extraction and recognition model, the influence of its main sensitive parameters (learning rate, optimizer, and L1 regularization) on recognition performance is studied, and the best model is determined. To obtain the pretraining parameters, training and fault classification tests are carried out separately for each type of image sample. Third, Gaussian white noise at different levels is added to the original signals to obtain four kinds of noised vibration image samples, and the pretrained model parameters are shared for TL. For each kind of sample, the impact of thirteen parameter sharing schemes on TL accuracy and efficiency is compared, and test accuracy and training time are used to evaluate the model. The results show that, among the four image generation methods, the samples obtained by empirical mode decomposition-pseudo-Wigner–Ville distribution (EP) give the best classification performance; at a signal-to-noise ratio (SNR) of 10 dB, the model obtained by TL reaches a test accuracy of 96.67% with a training time of 170.46 s.
The safe operation of mechanical equipment is an important guarantee for modern industrial production. As an indispensable part of intelligent equipment, fault diagnosis technology for rolling bearings has attracted great attention. A rolling bearing generally consists of an inner ring, an outer ring, rolling bodies, and a cage, and various faults tend to occur after long periods of operation. Therefore, rapid and accurate fault identification of rolling bearings is both a great challenge and of great significance [
The traditional fault diagnosis method of rolling bearings is mainly to obtain the characteristic information through the processing of vibration signals. Different analysis algorithms are selected for different types of vibration signal. Common analysis algorithms, such as wavelet transform and empirical mode decomposition (EMD), still play important roles in the fault diagnosis field and are constantly being improved [
With the rapid development of intelligent algorithms such as deep learning and cross-integration with various disciplines, the intelligent diagnostic methods are extensively used in mechanical fault diagnosis [
The data corresponding to different working conditions are of different distributions. It is found that the classification accuracy of CNN is low when the distribution of training data set and test data set is different [
In order to solve the problems of the sample effectiveness and the performance of TL under the strong noise, a method of vibration image-driven CNN- and TL-based model is proposed in this paper. In
The research process of this work is shown in Figure
Overall research flowchart.
As a feedforward neural network, the CNN model is composed of convolution layers, pooling layers, and fully connected layers, in which the convolution layer convolves the image to obtain the feature map. The convolution formula is shown as follows [
After the convolution operation, the parameters are input into the activation function, which increases the nonlinearity of the model, so that the model can be applied to the complex classification problems. The activation function formula is shown as follows:
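As a minimal sketch of the two steps just described, the following pure-Python example slides a small kernel over an image and then applies ReLU, the activation listed in the model presets. The function names, the toy image, and the kernel values are illustrative assumptions, and the sliding product is written as cross-correlation (no kernel flip), which is a common convention in deep learning frameworks rather than something stated in the text.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` with stride 1 and no padding, adding a bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 3x3 "image" and 2x2 kernel (hypothetical values).
image = [[1, 2, 0],
         [0, 1, 3],
         [2, 1, 0]]
kernel = [[1, -1],
          [-1, 1]]

pre_activation = conv2d_valid(image, kernel)
feature_map = relu(pre_activation)
```

Only the positive responses survive the activation, which is the source of the nonlinearity mentioned above.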
The pooling layer is to reduce the dimension of the input parameters, and the commonly used pooling operations are maximum pooling and average pooling. The fully connected layer in the convolution network maps the distributed features to the sample space. In the whole network structure, the fully connected layer generally acts as a classifier, for example, the Softmax function, and the calculated result is the output of the whole CNN model, as shown in the following equations:
In the process of model training, the error between the predicted model output and the actual target is calculated by the loss function, and the commonly used loss function is the cross-entropy function, as follows [
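The Softmax output and the cross-entropy loss can be illustrated with a short sketch. It assumes a single sample with four class scores (one per bearing state) and a one-hot target given as a class index; the logit values are hypothetical, and the max-subtraction is a standard numerical-stability step rather than part of the method described here.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy for a one-hot target: minus the log-probability of the true class."""
    return -math.log(probs[true_index])


logits = [2.0, 0.5, 0.1, -1.0]  # hypothetical scores for the four bearing states
probs = softmax(logits)
loss = cross_entropy(probs, true_index=0)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.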
The
TL is an effective method for solving the small-sample problem: it takes the model developed for task A as the starting point and reuses it when developing the model for task B. In practical research, TL is mainly divided into three kinds: case-based, feature-based, and shared parameter-based transfer [
This paper takes the LeNet-5 network as the basic model. The model has a shallow network layer and requires few training parameters, so it has fast training speed and good classification effect. For the low complexity of vibration image samples in this paper, it is appropriate to choose this network. The network structure is shown in Figure
CNN model structure diagram.
The initial CNN model parameters are shown in Table
CNN model parameters preset.
Items | Parameters |
---|---|
Batch size | 24 |
Convolution1 kernel | Length |
Convolution2 kernel | Length |
Convolution3 kernel | Length |
Pool1 set | Type = max pooling, length |
Pool2 set | Type = max pooling, length |
Activation function | ReLU |
Loss function | Cross entropy |
Optimizer | Stochastic gradient descent |
Initial learning rate | 0.001 |
Regularization | None |
The original vibration signals do not have the advantages of images, such as great differences between samples and more intuitive observation. Therefore, this paper carries out four image generation methods for vibration signals and finds which of 4 types of image has better classification performance. The samples used in this paper are the vibration data of rolling bearing from the Case Western Reserve University [
Bearing test platform of Case Western Reserve University.
The selected data were recorded at 1797 r/min with a fault size of 0.007 inch and a sampling frequency of 12 kHz, and cover 4 working states: normal state, inner ring fault, outer ring fault, and rolling body fault. From the rotational speed and sampling frequency, about 400 sampling points are obtained per rotation (12,000 / (1797 / 60) ≈ 400.7), so this paper takes 400 sampling points as one group of data for image conversion.
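The arithmetic behind the 400-point group length, and the segmentation it implies, can be checked with a short sketch. The function names are illustrative, and dropping the incomplete trailing group is an assumption rather than a detail given in the text.

```python
def samples_per_revolution(speed_rpm, sampling_hz):
    """Number of samples recorded during one shaft revolution."""
    revolutions_per_second = speed_rpm / 60.0
    return sampling_hz / revolutions_per_second


def segment(signal, length=400):
    """Split a signal into consecutive groups of `length` points.

    Any incomplete trailing group is dropped (an assumption of this sketch).
    """
    n_groups = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n_groups)]


spr = samples_per_revolution(1797, 12_000)  # roughly 400.7 samples per revolution
groups = segment(list(range(1000)))         # stand-in signal of 1000 points
```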
For any signal
Treat
Treat
IMFA sample: (a) normal state; (b) inner ring fault; (c) outer ring fault; (d) rolling body fault.
In the process of image generation, in order to show the fluctuation state of each IMF, the method of adaptively adjusting the vertical coordinate is adopted; i.e., the amplitude
Then, the Wigner time-frequency distribution of the original signal is obtained by superposing the PWVD results of different IMFs. The combination of EMD and PWVD not only effectively eliminates the cross interference but also retains the excellent time-frequency focusing [
(1) Vibration signal segmentation: 400 sampling points are processed at a time.
(2) EMD decomposition: the segmented signal is decomposed by EMD, and the IMF components and the residual component are obtained in turn from high frequency to low frequency. Only the first six IMFs are processed in the next step.
(3) EP time-frequency analysis: the first six IMFs are analyzed by PWVD, and the EP time-frequency distribution is then obtained.
(4) Grayscale processing: to reduce the input parameters and improve the training efficiency, the generated samples are converted to grayscale.
According to the above steps and Figure
EP sample: (a) normal state; (b) inner ring fault; (c) outer ring fault; (d) rolling body fault.
The above two methods obtain the images from vibration signal processing, but the SPCI does not belong to this scope, and it represents the form of a mirror symmetrical image in polar coordinates and directly converts the sampled signal into an image. The graphic display is intuitive and has a strong ability to express features.
Schematic diagram of SPCI.
In the above formula,
SPCI sample: (a) normal state; (b) inner ring fault; (c) outer ring fault; (d) rolling body fault.
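Because the mapping formula itself is not recoverable from the text, the following sketch only illustrates the general idea of a mirror-symmetric polar image. It assumes the widely used symmetrized dot pattern mapping, in which the normalized amplitude gives the radius and a lagged, normalized amplitude gives an angular offset on either side of each mirror axis. The parameters `lag`, `gain_deg`, and `n_mirrors` and their default values are hypothetical and are not taken from the paper.

```python
def symmetrized_polar_points(signal, lag=1, gain_deg=30.0, n_mirrors=6):
    """Map a 1-D signal to (radius, angle in degrees) pairs, mirrored about each axis.

    Assumed mapping (not confirmed by the source):
      radius = normalized x[i]
      angle  = axis +/- normalized x[i + lag] * gain_deg
    """
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    norm = [(x - lo) / span for x in signal]
    points = []
    for k in range(n_mirrors):
        axis = 360.0 * k / n_mirrors  # mirror-symmetry axis
        for i in range(len(signal) - lag):
            radius = norm[i]
            offset = norm[i + lag] * gain_deg
            points.append((radius, axis + offset))  # counter-clockwise branch
            points.append((radius, axis - offset))  # mirrored branch
    return points


points = symmetrized_polar_points([0.0, 1.0, 0.5, 0.25])
```

Plotting these pairs in polar coordinates yields a pattern that repeats around each axis, which is what makes differences between bearing states visible at a glance.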
In addition to the above three vibration images, the GTM is introduced in this paper. Before obtaining a GTM, it is necessary to convert the vibration signal into a grayscale image. The grayscale image is a data matrix, in which the subscript of each element corresponds to its position in the image, i.e., row and column coordinates, and the element value represents the luminance value of the corresponding position. The generation process of grayscale image is actually a process of data mapping. The maximum value “max” in the feature matrix is mapped to the gray level 255. The minimum value “min” is mapped to the gray level 0, as shown in Figure
Mapping relationship of grayscale image.
In the above equation,
The data arrangement in traditional grayscale image is “sequential arrangement of vibration data,” but this arrangement uses a large number of data points every time. If the image sample resolution is sacrificed and a small number of sample points are used to convert the grayscale image each time, it cannot highlight the characteristics of the original signal. Therefore, this paper uses vibration data to convert the grayscale image according to the horizontal-vertical cross arrangement; i.e., the Each grayscale image with The horizontal-vertical cross arrangement can form a grid structure in the grayscale image, which can enhance the image texture feature.
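The min-to-0, max-to-255 mapping described above can be sketched as a linear rescaling. This is a minimal illustration under the assumption that the mapping is linear and rounded to the nearest integer level; it covers only the gray-level mapping, not the horizontal-vertical cross arrangement, and the input values are hypothetical.

```python
def to_gray_levels(values):
    """Linearly map values so that the minimum becomes 0 and the maximum 255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # a constant signal carries no contrast
    return [round((v - lo) / (hi - lo) * 255) for v in values]


levels = to_gray_levels([0.0, 1.0, 2.0, 5.0])  # hypothetical amplitudes
```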
After the above conversion, the grayscale image is extracted by the local binary pattern (LBP) [
LBP is used to show the relationship between the pixel value of a certain point in the grayscale image and its surrounding pixel value. The image processed by LBP shows the texture information. Figure
GTM sample: (a) normal state; (b) inner ring fault; (c) outer ring fault; (d) rolling body fault.
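As an illustration of how LBP encodes the relationship between a pixel and its neighbours, the sketch below computes the basic 3 x 3 LBP code for every interior pixel of a small grayscale matrix. The neighbour ordering and the use of `>=` for the comparison are conventions assumed here; the exact LBP variant used in the paper is not stated in the surviving text.

```python
def lbp_3x3(gray):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D list."""
    # Neighbour offsets, clockwise from the top-left corner (ordering is a convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(gray), len(gray[0])
    codes = []
    for r in range(1, rows - 1):
        line = []
        for c in range(1, cols - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            line.append(code)
        codes.append(line)
    return codes


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
codes = lbp_3x3(patch)
```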
The four image generation methods used in this paper each emphasize different features of the vibration data. Four kinds of image sample are obtained; each kind covers the four bearing states and contains 1200 samples. The samples are divided into training, validation, and test sets in the ratio 6 : 2 : 2; the test set does not participate in training and is used only to evaluate the final model.
Deep learning is divided into two stages, training and test, where training is the process of optimizing the model, for example by improving the classification accuracy and strengthening the generalization ability. Repeated experiments show that the model performance is highly sensitive to the initial learning rate, the training optimizer, and the regularization parameters, so this paper focuses on these items. As the procedure is similar for the four types of image sample, only the procedure for EP is shown below, and the results for the other types are given at the end.
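A 6 : 2 : 2 split of 1200 samples gives 720 training, 240 validation, and 240 test samples. The sketch below shows one way to produce such a split; the random shuffling and the fixed seed are assumptions, since the text does not say how samples were assigned to each set.

```python
import random


def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # held out until the final evaluation
    return train, val, test


train, val, test = split_indices(1200)
```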
The learning rate is an important parameter of CNN training. With a fixed learning rate, a small value can yield high training accuracy but slows convergence, whereas a large value converges quickly at the cost of accuracy. To avoid a fixed learning rate, this paper uses a decaying learning rate that decreases as the number of training steps increases, in order to balance training speed and accuracy. The formula is shown as follows:
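The paper's decay formula is not recoverable from the text, so the sketch below assumes one common choice, exponential decay, purely to illustrate a learning rate that falls as the step count grows. The parameters `decay_rate` and `decay_steps` and their values are hypothetical.

```python
def decayed_learning_rate(initial_lr, step, decay_rate=0.96, decay_steps=100):
    """Exponential decay (assumed form): lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Learning rate at a few training steps, starting from 0.0001.
schedule = [decayed_learning_rate(1e-4, s) for s in (0, 100, 500, 1000)]
```

Early steps use a rate close to the initial value for fast progress, while later steps take smaller updates that help the model settle.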
In order to obtain the best training effect, this paper takes the learning rate as 0.001, 0.0001, and 0.00001, respectively. The accuracy and loss curves in the training process are shown in Figure
Comparison of training processes with different initial learning rates: (a) training accuracy; (b) training loss.
Comparison of test results with different initial learning rates.
Learning rate | 0.001 | 0.0001 | 0.00001 |
---|---|---|---|
Test accuracy | 98.75% | 98.33% | 95.42% |
In Figure
Confusion matrix of test results with learning rate = 0.0001.
Bearing states | Normal | Inner ring fault | Outer ring fault | Rolling body fault |
---|---|---|---|---|
Normal | 60 | 0 | 0 | 0 |
Inner ring fault | 2 | 58 | 0 | 0 |
Outer ring fault | 0 | 0 | 59 | 1 |
Rolling body fault | 0 | 0 | 1 | 59 |
In deep learning, there are many optimization methods to find the optimal solution of the model. In this paper, three kinds of optimization algorithms are used to select the most suitable optimizer.
Momentum adds inertia to the gradient descent process: updates accelerate along dimensions whose gradient direction stays the same and slow down along dimensions whose gradient direction changes, which speeds up convergence and reduces oscillation.
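The effect of the inertia term can be seen on a toy problem. The sketch below compares plain SGD with a classical momentum update on the one-dimensional loss f(w) = w², using the same learning rate for both; the loss, the momentum coefficient `beta = 0.9`, and the step count are illustrative assumptions, not settings from the paper.

```python
def sgd_step(w, grad, lr):
    """Plain gradient descent update."""
    return w - lr * grad


def momentum_step(w, velocity, grad, lr, beta=0.9):
    """Classical momentum: the velocity accumulates past gradients."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity


def grad(w):
    return 2.0 * w  # gradient of the toy loss f(w) = w**2


w_sgd = w_mom = 5.0
velocity = 0.0
for _ in range(50):
    w_sgd = sgd_step(w_sgd, grad(w_sgd), lr=0.01)
    w_mom, velocity = momentum_step(w_mom, velocity, grad(w_mom), lr=0.01)
```

After the same number of steps, the momentum iterate is closer to the minimum at zero than the plain SGD iterate.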
In this paper, under the condition that the initial learning rate is 0.0001 and the other parameters are unchanged, the model optimizer is set as SGD, momentum, and Adam, respectively. The accuracy and loss comparison curves in the training process are shown in Figure
Comparison of different optimizer training effects: (a) training accuracy; (b) training loss.
Comparison of different optimizer test results.
Optimizer | SGD (%) | Adam (%) | Momentum (%) |
---|---|---|---|
Test accuracy | 98.33 | 98.33 | 99.17 |
Compared with the three curves in Figure
Confusion matrix of test results with momentum.
Bearing states | Normal | Inner ring fault | Outer ring fault | Rolling body fault |
---|---|---|---|---|
Normal | 60 | 0 | 0 | 0 |
Inner ring fault | 2 | 58 | 0 | 0 |
Outer ring fault | 0 | 0 | 60 | 0 |
Rolling body fault | 0 | 0 | 0 | 60 |
Two kinds of abnormal fitting can occur in the training process: overfitting and underfitting. Overfitting means that the model fits the training samples too closely, resulting in poor performance on the validation and test data sets. Underfitting generally means that too few features are extracted from the training samples, so that the trained model fits poorly and performs badly even on the training data.
To address overfitting, regularization is introduced into the training process. Its main purpose is to control the complexity of the model and thereby reduce overfitting. The basic approach is to add a penalty term to the original loss function to “punish” models with high complexity. In this paper, several commonly used regularization methods, such as
Comparison of validation results under different regularization parameters: (a) validation accuracy; (b) validation loss.
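The penalty-term idea can be made concrete with a small sketch. It assumes the two most common penalties, L1 (sum of absolute weights) and L2 (sum of squared weights), scaled by a coefficient `lam`; the weights, the data loss, and the coefficient are hypothetical values chosen only for illustration.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Original loss plus a penalty that grows with the size of the weights."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    return data_loss + l2_penalty(weights, lam)


weights = [1.0, -2.0]  # hypothetical model weights
loss_l1 = regularized_loss(0.5, weights, lam=0.1, kind="l1")
loss_l2 = regularized_loss(0.5, weights, lam=0.1, kind="l2")
```

A larger coefficient penalizes large weights more strongly, which trades some training fit for a simpler model.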
Comparison of test results under different regularization parameters.
Regularization | Without regularization (%) | ||||
---|---|---|---|---|---|
Test accuracy | 98.75 | 98.75 | 98.75 | 99.17 | 99.17 |
In Figure
Confusion matrix of test results with
Bearing states | Normal | Inner ring fault | Outer ring fault | Rolling body fault |
---|---|---|---|---|
Normal | 60 | 0 | 0 | 0 |
Inner ring fault | 2 | 58 | 0 | 0 |
Outer ring fault | 0 | 0 | 60 | 0 |
Rolling body fault | 0 | 0 | 0 | 60 |
Performance curves of the model with
Under the condition that other parameters remain unchanged, the other three types of image samples were trained and validated for the above three sensitive parameters, and the final parameters are shown in Table
Best parameter determination of four models.
Sample categories | Initial learning rate | Optimizer | Regularization parameter |
---|---|---|---|
EP | 0.0001 | Momentum | 0.0001 |
IMFA | 0.0001 | SGD | 0.01 |
SPCI | 0.0001 | Momentum | 0.0001 |
GTM | 0.0001 | Momentum | 0.1 |
Under the best model parameters obtained above, four types of sample were tested, respectively, and the number of tested samples in each type is 240. This paper uses the accuracy, precision, recall, and
Evaluation data of test results.
Sample categories | Accuracy (%) | Average precision (%) | Average recall rate (%) | Average |
---|---|---|---|---|
EP | 98.75 | 98.75 | 98.75 | 98.50 |
IMFA | 93.75 | 94.25 | 93.75 | 94.00 |
SPCI | 99.19 | 99.25 | 99.25 | 99.00 |
GTM | 96.67 | 96.75 | 96.75 | 96.50 |
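The per-class metrics can be derived from a confusion matrix, as in the sketch below, which reuses the confusion matrix reported for the momentum optimizer. It assumes that rows are true states and columns are predicted states. The fourth metric is cut off in the text; the F1 score is included here only as a common companion of precision and recall, which is an assumption.

```python
def classification_metrics(cm):
    """Accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    Assumed layout: cm[i][j] = samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, precision, recall, f1


# Order: normal, inner ring fault, outer ring fault, rolling body fault.
cm = [[60, 0, 0, 0],
      [2, 58, 0, 0],
      [0, 0, 60, 0],
      [0, 0, 0, 60]]
accuracy, precision, recall, f1 = classification_metrics(cm)
```

Here the two inner ring samples classified as normal lower the recall of the inner ring class and the precision of the normal class, while the overall accuracy is 238/240.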
From the test results, it can be found that the model trained by SPCI samples has a good classification effect, and the accuracy reaches 99.19%, which is the best among four kinds of image samples. The classification effect by EP samples is also good, with the test accuracy of 98.75%. GTM and IMFA samples have the test results of 96.67% and 93.75%, respectively.
In real industrial settings, the collected vibration signals are disturbed by background noise, which masks the effective information in the original signals, so identifying fault categories quickly and accurately from noised signals is important. The samples used in pretraining are similar to the noised samples, which increases the probability of successful parameter transfer. Transferring the model parameters obtained from pretraining avoids a slow training process and improves model efficiency. In this paper, Gaussian white noise (GWN) is added to the bearing data of Case Western Reserve University to simulate actual field signals, and the noised signals are converted into images. Designated layers of the model trained on noised samples are frozen, and the parameters pretrained on the pure samples are shared. The TL flowchart is shown in Figure
Parameter transfer process.
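The parameter transfer process can be sketched without any deep learning framework: pretrained parameters are copied, some layers are marked as frozen, and gradient updates are applied only to the remaining layers. The layer names other than convolution layer 1, the parameter values, and the gradients below are hypothetical placeholders.

```python
def transfer_parameters(pretrained, frozen_layers):
    """Copy pretrained weights and record which layers remain trainable."""
    params = {name: list(values) for name, values in pretrained.items()}
    trainable = {name: name not in frozen_layers for name in params}
    return params, trainable


def apply_gradients(params, trainable, grads, lr):
    """One gradient step that skips every frozen layer."""
    for name, values in params.items():
        if trainable[name]:
            params[name] = [w - lr * g for w, g in zip(values, grads[name])]
    return params


# Hypothetical pretrained parameters from the pure-sample model.
pretrained = {"conv1": [0.2, -0.1], "conv2": [0.4], "fc": [0.3, 0.3]}
params, trainable = transfer_parameters(pretrained, frozen_layers={"conv1", "conv2"})

# Hypothetical gradients computed on a batch of noised samples.
grads = {name: [1.0] * len(values) for name, values in params.items()}
params = apply_gradients(params, trainable, grads, lr=0.1)
```

Freezing more layers reduces the number of parameters updated per step, which shortens training but leaves less capacity to adapt to the noised samples.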
To simulate the influence of different degrees of noise on the signal, GWN corresponding to slight, moderate, and severe noise is added. Tests show that at an SNR of 22 dB the noise begins to slightly mask the pure signal, so GWN at SNRs of 22 dB, 16 dB, and 10 dB is added. The time-domain comparison between the pure and noised signals is shown in Figure
Time-domain comparison of noised signals with different SNRs: (a) 22 dB; (b) 16 dB; (c) 10 dB.
Noised image samples with a SNR of 10 dB: (a) EP; (b) IMFA; (c) SPCI; (d) GTM.
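Adding GWN at a target SNR amounts to scaling the noise variance to the signal power, since SNR (dB) = 10 log10(P_signal / P_noise). The sketch below is a minimal illustration under the assumption that signal power is the mean square of the samples; the sine wave is a stand-in for a bearing signal, and the seed is fixed only for reproducibility.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian white noise so that the result has the target SNR."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noised signal relative to its clean version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Stand-in signal: one second of a 30 Hz sine sampled at 12 kHz.
clean = [math.sin(2 * math.pi * 30 * t / 12_000) for t in range(12_000)]
noisy = add_gaussian_noise(clean, snr_db=10)
snr_check = measured_snr_db(clean, noisy)
```

The measured SNR fluctuates slightly around the target because the noise is a finite random sample.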
Firstly, the experiment for 10 dB was carried out on the noised samples obtained by the EP method. The layers capable of being frozen are the convolution layer 1 (
Types of frozen layer combination.
Type | |||||||
Frozen layer | None | ||||||
Type | |||||||
Frozen layer |
Test accuracy and training time.
Among the above
Performance curves of the model with type
The noised samples with SNRs of 16 dB and 22 dB were tested in the same way, and the training and test results of the best scheme were collected. The results for the selected freezing type in each case are shown in Table
Summary of test results.
SNR (dB) | Parameters | EP | IMFA | SPCI | GTM | Average |
---|---|---|---|---|---|---|
22 | Freezing type | | | | | — |
22 | Test accuracy | 99.58% | 97.75% | 96.67% | 91.67% | 96.42% |
22 | Time cost (s) | 161.52 | 179.44 | 194.86 | 165.77 | 175.40 |
16 | Freezing type | | | | | — |
16 | Test accuracy | 97.92% | 90.42% | 83.33% | 76.67% | 87.09% |
16 | Time cost (s) | 170.68 | 171.54 | 194.14 | 181.55 | 179.48 |
16 | Relative reduction rate | 1.67% | 7.50% | 13.80% | 16.36% | 9.68% |
10 | Freezing type | | | | | — |
10 | Test accuracy | 96.67% | 78.33% | 86.67% | 59.58% | 80.31% |
10 | Time cost (s) | 170.46 | 165.36 | 194.19 | 175.27 | 176.32 |
10 | Relative reduction rate | 1.28% | 13.37% | -4.01% | 22.29% | 7.78% |
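The relative reduction rates in the table are consistent with comparing each accuracy to the accuracy at the next higher SNR, i.e., (previous - current) / previous. The sketch below recomputes them from the reported test accuracies under that assumption; a negative value indicates an increase.

```python
def relative_reduction(previous_acc, current_acc):
    """Relative drop in accuracy, in percent, with respect to the previous noise level."""
    return (previous_acc - current_acc) / previous_acc * 100.0


# Test accuracies (%) reported in the table, keyed by SNR in dB.
accuracy = {
    22: {"EP": 99.58, "IMFA": 97.75, "SPCI": 96.67, "GTM": 91.67},
    16: {"EP": 97.92, "IMFA": 90.42, "SPCI": 83.33, "GTM": 76.67},
    10: {"EP": 96.67, "IMFA": 78.33, "SPCI": 86.67, "GTM": 59.58},
}

rates_16 = {k: relative_reduction(accuracy[22][k], accuracy[16][k]) for k in accuracy[22]}
rates_10 = {k: relative_reduction(accuracy[16][k], accuracy[10][k]) for k in accuracy[16]}
```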
As the noise intensity in the original signal increases, more noise information is introduced into the converted image samples, and the difference between the noised samples and the original pure samples grows. Noise weakens the feature differences between bearing states in the original signal, which makes it difficult for the CNN to extract distinguishable features.
In the experiment, under the slight noise of 22 dB, the four types of image sample are classified accurately, with an average accuracy of 96.42%. When the noise intensity is increased to 16 dB, the test accuracy of EP samples is not significantly reduced and remains at 97.92%, while the accuracies of IMFA, SPCI, and GTM fall by 7.5%, 13.8%, and 16.36%, respectively, relative to the 22 dB case. Under the strong noise of 10 dB, the accuracy of SPCI samples rises by 4.01% relative to the 16 dB case, while the accuracies of EP, IMFA, and GTM fall by 1.28%, 13.37%, and 22.29%, respectively. IMFA and GTM are therefore more sensitive to noise intensity, whereas EP and SPCI maintain test accuracies of 96.67% and 86.67% under strong noise, indicating good noise tolerance.
Under the strong noise of 10 dB, the training times of the four kinds of sample (EP, IMFA, SPCI, and GTM) are 170.46 s, 165.36 s, 194.19 s, and 175.27 s, respectively. EP has the highest test accuracy, 96.67%; the others reach 78.33%, 86.67%, and 59.58%. The results suggest that sharing pretrained model parameters avoids a long feature extraction time and accelerates model training, but if only the last fully connected layer remains trainable, the learning ability of the model is weakened. Therefore, different schemes should be tried during parameter transfer to obtain the best transfer learning efficiency, i.e., to speed up network training while preserving accuracy.
In this paper, four vibration image generation methods are discussed. To distinguish the features among different image samples and optimize the resolution, the adaptive IMFA and the gridded GTM are proposed, which provide new approaches to vibration image sample preparation.
To make full use of the learning efficiency of the CNN model, the best model parameters are obtained by tuning the sensitive parameters, including learning rate, optimizer, and regularization; the trained model classifies accurately when samples obtained by EP and SPCI are used.
For samples with different levels of GWN, the effect of 13 model freezing schemes on TL is studied. Under strong noise, the model still classifies well. With suitable TL schemes, the training time of the model is reduced while the test accuracy is kept at a high level.
The data used to support the results of this study are available from the corresponding author upon request or can be downloaded at Case Western Reserve University website “
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was funded by the National Natural Science Foundation of China (Grant nos. 51605380, 51875451, and 51834006) and Natural Science Basic Research Program of Shaanxi (Program no. 2021JM-391).