Adaptive Fisher-Based Deep Convolutional Neural Network and Its Application to Recognition of Rolling Element Bearing Fault Patterns and Sizes

Deep learning has the ability to mine complex relationships in fault diagnosis. A deep convolutional neural network (DCNN), which has a deep rather than shallow structure, can mine useful information directly from the original vibration data. However, when the number of training samples is small, its diagnosis accuracy suffers. As an improvement of the DCNN, the deep convolutional neural network based on the Fisher criterion (FDCNN) can be used for fault diagnosis with small samples. However, the model parameters of this method are set manually or from prior knowledge, which inevitably has a negative influence on the diagnosis accuracy. Therefore, a novel adaptive Fisher-based deep convolutional neural network (AFDCNN), which optimizes the model parameters adaptively, is proposed as an improvement of the FDCNN. Comparative verification tests show that the AFDCNN achieves better performance than the compared methods.


Introduction
There are many benefits of intelligent health monitoring and fault diagnosis of rotating machinery; for example, it reduces the dependence on costly training and highly skilled operators and detects potential hazards before a catastrophic failure occurs [1,2]. It can also reduce the operation and maintenance costs of complex engineering systems. Rolling bearings are widely used as critical moving parts of rotating machinery, so their health state matters [3,4]. Therefore, intelligent health condition monitoring and accurate fault diagnosis of rolling element bearings are of great significance.
To meet these needs, fault diagnosis methods such as the back-propagation neural network (BPNN) and the support vector machine (SVM) have been used for machinery health monitoring [5][6][7][8][9][10][11]. However, as rotating machinery becomes larger, faster, and more complex, an ideal fault diagnosis method should identify the health status of the diagnosed object accurately, quickly, and intelligently.
Fault diagnosis methods therefore face more stringent requirements to become more intelligent.
As a major advance in diagnosis methods, deep learning [12] addresses two problems of traditional fault diagnosis methods: they must extract features on the basis of prior knowledge, and they have limited capacity to mine hidden relationships in quantitative fault diagnosis. The deep convolutional neural network (DCNN) with deep structures can be established on the basis of deep learning theory [13,14]. It can adaptively mine distributed features from the original vibration data [15,16]. Since deep learning theory was first applied to mechanical fault diagnosis, it has attracted a lot of attention [17]. Jun Lee and Kim [18] proposed a novel algorithm for localizing slab identification numbers (SINs) in factory scenes by using a DCNN. Bai [19] used a DCNN to extract features and achieved good diagnostic results. Guo et al. [20] proposed a novel hierarchical learning rate adaptive deep convolutional neural network based on an improved algorithm and applied it to bearing fault diagnosis. Verstraete et al. [21] proposed a fault diagnosis method based on a DCNN and time-frequency image analysis and achieved good results on two public datasets of rolling element bearing vibration signals. Zhuang et al. [22] proposed a novel deep learning method based on the DCNN and also achieved good results. Zhang et al. [23] proposed a deep graph convolutional network based on graph convolution operators, graph coarsening methods, and graph pooling operations; their experimental results demonstrate that the method can detect different kinds and severities of faults in roller bearings by learning from the constructed graphs. Wang et al. [24] proposed an enhanced intelligent diagnosis method based on multisensor data fusion and a DCNN, which achieved higher prediction accuracy and clearer clustering in visualization.
The aforementioned applications show that the DCNN is a promising tool for fault diagnosis of rolling element bearings. However, as a model trained on samples, the DCNN is also influenced by the number of training samples [25]. This raises a problem: labeled experimental vibration samples are not always sufficient, and some of them are very difficult to obtain [26,27]. The deep convolutional neural network based on the Fisher criterion (FDCNN) has been used for word recognition with small samples [28]. To address this shortcoming of the DCNN, and drawing on related methods in image recognition, this paper adopts the Fisher classification criterion in the back propagation of model training. However, the model parameters in [28] are set based on prior knowledge, which inevitably has a negative influence on the recognition accuracy. Therefore, a novel adaptive Fisher-based deep convolutional neural network (AFDCNN), in which the model parameters are optimized adaptively, is proposed in this paper for bearing fault diagnosis. The advantages of the proposed method are as follows:
(1) The AFDCNN is able to extract fault features from the original data adaptively.
(2) The AFDCNN is able to establish the hidden relationship between machinery health conditions and the measured signals adaptively.
(3) With limited samples, the AFDCNN achieves better performance than the DCNN.
(4) The proposed method avoids dependence on expert experience to some extent.
The rest of the paper is organized as follows. First, a brief introduction to the traditional DCNN is given. Second, the DCNN model is improved based on the idea of the Fisher criterion, which steers the weight updates toward directions that favor classification. However, the model parameters of this method are based on prior knowledge, which inevitably has a negative influence on the diagnosis accuracy.
Therefore, the FDCNN model is improved by using optimization algorithms to optimize the parameter combination adaptively, which yields the proposed AFDCNN.
Third, the collected bearing fault samples are divided into two sets: the training set is used to build the model, and the test set is used to verify it. Furthermore, the AFDCNN is compared with traditional methods. Finally, conclusions are drawn.

Brief Introduction to the DCNN
Essentially, a typical 10-layer DCNN model, shown in Figure 1, has two parts [20]: the feature extractor and the Softmax classifier. The feature extractor has one input layer, three alternating convolutional layers (C-layers) and max-pooling layers (P-layers), and two fully connected layers (FC-layers). The C-layers are used for feature extraction, and the P-layers are used for resampling. The alternating C-layers and P-layers are followed by the FC-layers, which compute the class scores. Then, the class scores are fed into the Softmax classifier to obtain the diagnosis results.
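To make the layer sequence concrete, the following Python sketch traces the output shape of each of the 10 layers, assuming "valid" convolution with unit stride and non-overlapping pooling. The input size (44 × 44), kernel sizes, filter counts, and FC1 width are hypothetical values chosen only so that the sizes divide evenly; they are not settings reported in this paper.

```python
# Minimal shape trace for the 10-layer DCNN described above.
# All numeric settings are HYPOTHETICAL; the text does not give them here.
INPUT_SIZE = 44             # side of the square input matrix (assumed)
KERNEL_SIZES = (5, 5, 5)    # m^l for C1..C3 (assumed)
NUM_FILTERS = (8, 16, 32)   # d_h^l for C1..C3 (assumed)
POOL_SIZE = 2               # pooling size s for P1..P3 (assumed)
FC_UNITS = (64, 4)          # FC1 width (assumed) and 4 classes: N, OF, IF, RF


def trace_shapes(size=INPUT_SIZE):
    """Return a list of (layer name, output shape) pairs for the 10 layers."""
    shapes = [("input", (1, size, size))]
    for idx, (m, d) in enumerate(zip(KERNEL_SIZES, NUM_FILTERS), start=1):
        size = size - m + 1          # 'valid' convolution, unit stride
        shapes.append((f"C{idx}", (d, size, size)))
        size //= POOL_SIZE           # non-overlapping s x s pooling
        shapes.append((f"P{idx}", (d, size, size)))
    for idx, units in enumerate(FC_UNITS, start=1):
        shapes.append((f"FC{idx}", (units,)))
    shapes.append(("softmax", (FC_UNITS[-1],)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

With these assumed settings the P3 output of 32 maps of size 2 × 2 is flattened into a 128-element vector before FC1.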

The Convolutional Layer (C-Layer).
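The filter bank of the l-th layer is described as follows:
$$W^{l} = \left\{ w_k^{l} \in \mathbb{R}^{m^{l} \times m^{l}} \;\middle|\; k = 1, 2, \ldots, d_h^{l} \right\},$$
in which $w_k^{l}$ is a linear filter of the l-th layer of size $m^{l} \times m^{l}$, and $d_h^{l}$ is the number of different kernels (filters) in $W^{l}$. A matrix $I_p^{l-1}$ of size $\omega^{l-1} \times \omega^{l-1}$ is convolved with the filter $w_k^{l}$:
$$I_{p,k}^{l} = f\left( I_p^{l-1} * w_k^{l} + b_k^{l} \right),$$
where $*$ denotes 2-D convolution, $b_k^{l}$ is the bias of the k-th filter, and $f(\cdot)$ is the activation function. Without padding and with unit stride, each output map has size $(\omega^{l-1} - m^{l} + 1) \times (\omega^{l-1} - m^{l} + 1)$.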
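A minimal pure-Python sketch of the C-layer operation follows. It assumes a "valid" convolution with unit stride and uses ReLU as the activation f; the activation choice is an assumption, not something stated here. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation). The function names conv2d_valid and apply_filter_bank are illustrative.

```python
def conv2d_valid(image, kernel, bias=0.0, activation=lambda z: max(0.0, z)):
    """Valid 2-D convolution of an omega x omega input with one m x m filter.

    Like most CNN libraries, this computes cross-correlation (the kernel is
    not flipped). The output side length is omega - m + 1.
    """
    omega, m = len(image), len(kernel)
    out_size = omega - m + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            z = bias
            for i in range(m):
                for j in range(m):
                    z += image[r + i][c + j] * kernel[i][j]
            row.append(activation(z))   # assumed ReLU by default
        output.append(row)
    return output


def apply_filter_bank(image, filter_bank, biases):
    """Convolve one input matrix with every filter w_k in W^l (k = 1..d_h)."""
    return [conv2d_valid(image, w, b) for w, b in zip(filter_bank, biases)]


image = [[4 * r + c for c in range(4)] for r in range(4)]   # 4 x 4 toy input
bank = [[[1, 1], [1, 1]], [[1, 0], [0, -1]]]                # d_h = 2 filters, m = 2
maps = apply_filter_bank(image, bank, biases=[0.0, 0.0])
```

Each of the two output maps is 3 × 3, matching ω − m + 1 = 4 − 2 + 1. The second filter produces negative responses everywhere on this input, so ReLU zeroes that whole map.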

The Pooling Layer (P-Layer).
The P-layer is used for resampling. After this operation, the size of each feature map becomes
$$\frac{\omega^{l-1}}{s} \times \frac{\omega^{l-1}}{s},$$
in which $\omega^{l-1}$ is the size of the input of the l-th layer and s is the down-sampling size; for example, when the mean sampling method is used, s is 2, which halves each dimension.
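The pooling step can be sketched as follows, assuming non-overlapping s × s windows. Both mean pooling (the mean sampling method mentioned above) and max pooling are shown. The name pool2d is illustrative.

```python
def pool2d(feature_map, s=2, mode="mean"):
    """Non-overlapping s x s pooling: omega x omega -> (omega // s) x (omega // s)."""
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    out_size = len(feature_map) // s
    pooled = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Collect the s x s block that maps to output cell (r, c).
            block = [feature_map[r * s + i][c * s + j]
                     for i in range(s) for j in range(s)]
            row.append(sum(block) / len(block) if mode == "mean" else max(block))
        pooled.append(row)
    return pooled


fmap = [[4 * r + c for c in range(4)] for r in range(4)]
print(pool2d(fmap, s=2, mode="mean"))  # [[2.5, 4.5], [10.5, 12.5]]
print(pool2d(fmap, s=2, mode="max"))   # [[5, 7], [13, 15]]
```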

The Softmax Classifier.
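The Softmax classifier can be described as follows:
$$p_{W^{(l)}}\left(i \mid x^{l}\right) = \frac{\exp\left(W_i^{(l)} x^{l}\right)}{\sum_{j=1}^{n} \exp\left(W_j^{(l)} x^{l}\right)}, \quad i = 1, 2, \ldots, n,$$
where $p_{W^{(l)}}(\cdot) \in [0, 1]$ is the activation function with parameter $W^{(l)}$, $W_i^{(l)}$ is the i-th row of $W^{(l)}$, and n is the number of classes. The parameter $W^{(l)}$ is learned from the training set, and $x^{l}$ is the learned feature.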
The output of the Softmax classifier is a probability between 0 and 1. The predicted class $\hat{i}$ and its score $s(\hat{i})$ can be described as
$$\hat{i} = \arg\max_{i} \, p_{W^{(l)}}\left(i \mid x^{l}\right), \qquad s(\hat{i}) = \max_{i} \, p_{W^{(l)}}\left(i \mid x^{l}\right).$$
Compared with traditional fault diagnosis methods, the DCNN has attracted widespread attention because it extracts features adaptively. In this method, the error between the network outputs and the target labels is selected as the energy function. The connection weights of the network are adjusted through the forward and back propagation processes so that the energy function is minimized. Weight sharing is used in the forward propagation process to reduce the complexity of the algorithm. The sample feature vector is transformed by the weights and biases, and the predicted sample labels are then obtained through an activation function. The weight optimization process is therefore one of the key factors in obtaining a better trained model.
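A small sketch of the Softmax classifier and the arg-max prediction rule described above follows. It assumes the class scores $W_i^{(l)} x^{l}$ have already been produced by the last FC-layer. The score values are made up, and the four labels follow the health conditions used in the experiments.

```python
import math

CLASS_LABELS = ("N", "OF", "IF", "RF")   # health conditions used in the experiments


def softmax(scores):
    """Numerically stable Softmax: p(i | x) = exp(s_i) / sum_j exp(s_j)."""
    shift = max(scores)                   # subtracting the max avoids overflow
    exps = [math.exp(v - shift) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels=CLASS_LABELS):
    """Return the predicted class i_hat, its score s(i_hat), and all probabilities."""
    probs = softmax(scores)
    i_hat = max(range(len(probs)), key=probs.__getitem__)
    return labels[i_hat], probs[i_hat], probs


label, score, probs = predict([1.0, 3.0, 0.5, -1.0])   # hypothetical class scores
print(label, round(score, 3))
```

Subtracting the maximum score leaves the probabilities unchanged but keeps math.exp from overflowing on large scores.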

Adaptive Fisher-Based Deep Convolutional Neural Network (AFDCNN)
Given a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of m samples belonging to n categories, the traditional energy function [26] can be represented as follows:
$$J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^{2},$$
where W is the weight of each unit, b is the bias term, and $h_{W,b}(x^{(i)})$ is the output of the last layer of the network, namely, the fault-pattern index of the sample $x^{(i)}$. The goal of training is to find the minimum of J(W, b) by adjusting W and b. When gradient descent is used to optimize the objective function, the iterative formulas are
$$W^{(l)} := W^{(l)} - \alpha \frac{\partial J(W, b)}{\partial W^{(l)}}, \qquad b^{(l)} := b^{(l)} - \alpha \frac{\partial J(W, b)}{\partial b^{(l)}},$$
where α is the learning rate. Before back propagation, forward propagation is performed to calculate the output $h_{W,b}(x^{(i)})$ of the last layer of the network. Then, the error between $h_{W,b}(x^{(i)})$ and the actual value is calculated. For the i-th unit of the output layer, this error (residual) can be represented as
$$\delta_i^{(n_l)} = \frac{\partial}{\partial z_i^{(n_l)}} \frac{1}{2} \left\| y - h_{W,b}(x) \right\|^{2} = -\left( y_i - a_i^{(n_l)} \right) f'\left( z_i^{(n_l)} \right),$$
where $n_l$ is the index of the output layer, $z_i^{(l)}$ is the total weighted input of the i-th unit in layer l (so $z_i^{(n_l)}$ is that of the i-th unit in the last layer), $a_i^{(n_l)} = f(z_i^{(n_l)})$ is its activation, and f is the activation function. The error between the label values and the predicted values is minimized as the energy function in the back propagation process to adjust the weights accurately.
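To illustrate the energy function, the output-layer residual, and the gradient-descent update, the sketch below trains a single sigmoid output layer on four toy samples. It is a deliberately reduced stand-in for the full DCNN. The sigmoid activation, the toy data, the zero initialization, and α = 0.5 are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, x):
    """Return (z, a): total inputs and activations of the single output layer."""
    z = [sum(wij * xj for wij, xj in zip(wi, x)) + bi for wi, bi in zip(W, b)]
    return z, [sigmoid(v) for v in z]


def energy(W, b, data):
    """J(W, b) = (1/m) * sum_i 0.5 * ||h(x_i) - y_i||^2."""
    total = 0.0
    for x, y in data:
        _, a = forward(W, b, x)
        total += 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y))
    return total / len(data)


def gd_step(W, b, data, alpha):
    """One batch gradient-descent step: W := W - alpha * dJ/dW, b := b - alpha * dJ/db."""
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    for x, y in data:
        _, a = forward(W, b, x)
        # Output residual: delta_i = -(y_i - a_i) * f'(z_i), with f'(z) = a(1 - a).
        delta = [-(yi - ai) * ai * (1 - ai) for yi, ai in zip(y, a)]
        for i, d in enumerate(delta):
            gb[i] += d / len(data)
            for j, xj in enumerate(x):
                gW[i][j] += d * xj / len(data)
    W = [[wij - alpha * gij for wij, gij in zip(wi, gi)] for wi, gi in zip(W, gW)]
    b = [bi - alpha * gi for bi, gi in zip(b, gb)]
    return W, b


data = [([1, 0, 0], [1, 0]), ([0, 1, 0], [0, 1]),
        ([1, 0, 1], [1, 0]), ([0, 1, 1], [0, 1])]   # toy samples, one-hot labels
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0]
J_start = energy(W, b, data)                       # 0.25 with all-zero weights
for _ in range(200):
    W, b = gd_step(W, b, data, alpha=0.5)
J_end = energy(W, b, data)
```

In a real DCNN the same residual would then be propagated backward through the FC-, P-, and C-layers. The toy example stops at the output layer.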

The Optimization Process of Model Training.
In the back propagation process of the DCNN, the idea of the Fisher criterion steers the weight adjustment toward directions that favor classification. At the same time, the search space of the weight iteration is constrained by the discriminant conditions, which likewise favors classification.
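$J_1$ is the within-class similarity measure, defined as the sum of the distances between all samples and the mean of their own category. $J_2$ is the between-class similarity measure, defined as the sum, over all samples, of the distance between each sample's class mean and the overall mean. With squared Euclidean distances on the network outputs $h_{W,b}(x)$, they can be written as
$$J_1 = \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n_i} \left\| h_{W,b}\left(x_k^{(i)}\right) - M^{(i)} \right\|^{2}, \qquad J_2 = \frac{1}{2} \sum_{i=1}^{n} n_i \left\| M^{(i)} - M \right\|^{2},$$
where $x_k^{(i)}$ is the k-th of the $n_i$ samples of the i-th category, M is the mean output over all samples, and $M^{(i)}$ is the mean value of the i-th category samples, which can be represented as
$$M^{(i)} = \frac{1}{n_i} \sum_{k=1}^{n_i} h_{W,b}\left(x_k^{(i)}\right).$$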
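The within-class and between-class terms can be computed on a batch of network outputs as sketched below. The sketch assumes squared Euclidean distances, a factor of 1/2, and a between-class term measured as the sample-weighted distance between each class mean $M^{(i)}$ and the overall mean M. These definitions are an interpretation of the text, not a confirmed reproduction of the original formulas.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fisher_terms(features, labels):
    """Return (J1, J2) for network outputs `features` with class `labels`.

    Assumed definitions:
      J1 = 0.5 * sum_x ||h(x) - M_class(x)||^2       (within-class)
      J2 = 0.5 * sum_i n_i * ||M_i - M||^2            (between-class)
    """
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    # Class means M^(i) and the overall mean M, computed column by column.
    means = {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}
    overall = [sum(col) / len(features) for col in zip(*features)]
    j1 = 0.5 * sum(squared_distance(f, means[y]) for f, y in zip(features, labels))
    j2 = 0.5 * sum(len(groups[y]) * squared_distance(means[y], overall) for y in groups)
    return j1, j2


feats = [[0, 0], [2, 0], [10, 0], [12, 0]]   # toy 2-D outputs for two classes
j1, j2 = fisher_terms(feats, [0, 0, 1, 1])
print(j1, j2)  # 2.0 50.0
```

A small J1 indicates compact classes and a large J2 indicates well-separated classes, which is why the two terms pull the training in opposite directions.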

Mathematical Problems in Engineering
When $J_1$ is used as the energy function in the gradient algorithm, the predicted category moves closer to the actual one after each iteration; when $J_2$ is used, the distance between different categories becomes larger. To make the features learned by each DCNN layer more conducive to diagnosis, an overall energy function J is used, which combines R, the energy function of the conventional DCNN, with $J_1$ and $J_2$, weighted by the parameter combination [c, μ].
In the FDCNN, the parameter combination [c, μ] depends on expert experience, which inevitably has a negative influence on the trained model.

The Improved Optimization Process of Model Training.
In order to avoid the influence of human factors on model training and obtain the parameters adaptively, several optimization algorithms have been adopted and compared.
Before the optimization process, the objective functions can be derived as follows.
For the function $J_1$, the residual error of each unit of the output layer can be represented as
$$\delta_{J_1}^{(n_l)}(x) = \left( h_{W,b}(x) - M^{(i)} \right) \odot f'\left( z^{(n_l)} \right),$$
and for the function $J_2$ as
$$\delta_{J_2}^{(n_l)}(x) = \left( M^{(i)} - M \right) \odot f'\left( z^{(n_l)} \right),$$
where x belongs to the i-th category, M is the mean output over all samples, and ⊙ denotes element-wise multiplication.
Particle swarm optimization (PSO) and stochastic gradient descent (SGD) are each adopted to optimize the parameter combination [c, μ] adaptively.
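The residual formulas can be checked numerically. Under the assumed definitions J1 = ½ Σ ||h(x) − M^{(i)}||² and J2 = ½ Σ_i n_i ||M^{(i)} − M||², the gradients with respect to an output h(x) of class i are h(x) − M^{(i)} and M^{(i)} − M. These definitions are an interpretation, restated here so that the block is self-contained. The residuals above additionally multiply these gradients by f'(z^{(n_l)}). The sketch compares the analytic gradients with central finite differences on toy outputs.

```python
def fisher_terms(feats, labels):
    """J1, J2 (assumed definitions) plus the class means and overall mean."""
    def sq(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    groups = {}
    for f, y in zip(feats, labels):
        groups.setdefault(y, []).append(f)
    means = {y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()}
    overall = [sum(col) / len(feats) for col in zip(*feats)]
    j1 = 0.5 * sum(sq(f, means[y]) for f, y in zip(feats, labels))
    j2 = 0.5 * sum(len(groups[y]) * sq(means[y], overall) for y in groups)
    return j1, j2, means, overall


def analytic_grads(feats, labels):
    """dJ1/dh(x) = h(x) - M^(i) and dJ2/dh(x) = M^(i) - M, for x in class i."""
    _, _, means, overall = fisher_terms(feats, labels)
    g1 = [[a - m for a, m in zip(f, means[y])] for f, y in zip(feats, labels)]
    g2 = [[m - o for m, o in zip(means[y], overall)] for y in labels]
    return g1, g2


def numeric_grads(feats, labels, eps=1e-5):
    """Central finite differences of J1 and J2 w.r.t. every output entry."""
    g1 = [[0.0] * len(f) for f in feats]
    g2 = [[0.0] * len(f) for f in feats]
    for k in range(len(feats)):
        for d in range(len(feats[k])):
            plus = [row[:] for row in feats]
            minus = [row[:] for row in feats]
            plus[k][d] += eps
            minus[k][d] -= eps
            jp1, jp2, _, _ = fisher_terms(plus, labels)
            jm1, jm2, _, _ = fisher_terms(minus, labels)
            g1[k][d] = (jp1 - jm1) / (2 * eps)
            g2[k][d] = (jp2 - jm2) / (2 * eps)
    return g1, g2


feats = [[0.2, 0.9], [0.4, 0.7], [0.8, 0.1], [0.6, 0.3], [0.5, 0.5]]  # toy outputs h(x)
labels = [0, 0, 1, 1, 1]
a1, a2 = analytic_grads(feats, labels)
n1, n2 = numeric_grads(feats, labels)
```

Because both terms are quadratic in the outputs, central differences are exact up to rounding, so the two gradient sets should agree to many decimal places.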
In the model, all the weights are obtained with the BP algorithm once the residual error of the last layer is minimized. For different working conditions of the same diagnosed object, the optimal parameter combination [c, μ] found by the optimization algorithm should make the AFDCNN diagnosis model both fast and accurate.
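A minimal global-best PSO for the two-dimensional parameter combination [c, μ] is sketched below. In the actual method, the objective would be the diagnosis error of an AFDCNN trained with the candidate [c, μ]. Because that training loop is not specified here, a hypothetical quadratic surrogate (surrogate_error, with its minimum placed arbitrarily at c = 0.3, μ = 0.7) stands in for it. The search bounds [0, 1] and the inertia and acceleration coefficients (common constriction-style values) are also assumptions.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.729, phi_p=1.494, phi_g=1.494, seed=0):
    """Global-best particle swarm optimization (minimization) within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # swarm's best position
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                rp, rg = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + phi_p * rp * (pbest[k][d] - pos[k][d])
                             + phi_g * rg * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)  # clip to bounds
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


def surrogate_error(params):
    """HYPOTHETICAL stand-in for the validation error of an AFDCNN trained with [c, mu]."""
    c, mu = params
    return (c - 0.3) ** 2 + (mu - 0.7) ** 2


best, best_val = pso(surrogate_error, bounds=[(0.0, 1.0), (0.0, 1.0)])
print([round(v, 4) for v in best], best_val)
```

Each call of the real objective would require a full training run, so in practice the number of particles and iterations would be limited by training cost rather than by PSO itself.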
In this paper, the rolling element bearing is the diagnosed object. Assume that it has m kinds of faults, the i-th category (i = 1, 2, ..., m) has $n_i$ samples, and the sampling frequency is f. The proposed network includes three convolutional layers, three max-pooling layers, and two fully connected layers. The flow chart of the AFDCNN is shown in Figure 2.

Experimental Comparison
The bearing data are provided by Case Western Reserve University (CWRU) [29]. The main components of the experimental apparatus were a 2-hp motor, a torque transducer, and a dynamometer. The motor shaft was supported by 6205-2RS JEM SKF bearings. The data were collected at a sampling frequency of 12 kHz for 1 s. Figure 3 shows time-domain samples of the four health conditions: normal (N), outer race fault (OF), inner race fault (IF), and roller fault (RF). Table 1 shows how the obtained dataset is divided into samples.
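One common way to turn a 1 s, 12 kHz vibration record into square input matrices for a DCNN is to cut it into non-overlapping segments and reshape each segment. The sketch below assumes 32 × 32 matrices (1024 points per segment) and min-max normalization, neither of which is stated in the text. It uses a synthetic 30 Hz sine wave in place of a real CWRU record, and segment_to_matrices is a hypothetical helper name.

```python
import math

FS = 12_000        # sampling frequency from the CWRU description (12 kHz)
DURATION = 1.0     # sampling time in seconds
OMEGA = 32         # HYPOTHETICAL matrix side; segment length = OMEGA**2 = 1024 points


def segment_to_matrices(signal, omega=OMEGA):
    """Split a 1-D signal into non-overlapping omega*omega segments, each reshaped
    row by row into an omega x omega matrix with values min-max scaled to [0, 1]."""
    seg_len = omega * omega
    matrices = []
    for k in range(len(signal) // seg_len):          # leftover points are dropped
        seg = signal[k * seg_len:(k + 1) * seg_len]
        lo, hi = min(seg), max(seg)
        span = (hi - lo) or 1.0                      # guard against a flat segment
        seg = [(v - lo) / span for v in seg]
        matrices.append([seg[r * omega:(r + 1) * omega] for r in range(omega)])
    return matrices


# Synthetic stand-in for one vibration record (not real CWRU data).
signal = [math.sin(2 * math.pi * 30 * t / FS) for t in range(int(FS * DURATION))]
mats = segment_to_matrices(signal)
print(len(mats), len(mats[0]), len(mats[0][0]))
```

With these assumptions, one 12 000-point record yields 11 matrices and the last 736 points are discarded. Overlapping windows are a common alternative when more samples are needed.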
The computer used has an Intel(R) Core(TM) i7-7400 CPU and 16 GB of RAM.

Description of the Data.
The data are described in Figure 3 and Table 1. The convolutional neural network structure of the AFDCNN is the same as that of the DCNN and FDCNN. The flow chart in Figure 5 describes the hierarchical framework of the proposed method. The flow chart of Figure 6 ... Figures 7(b) and 8(b). Table 3 lists the models adopted in this paper and their diagnosis results. These results show that both the PSO-AFDCNN and SGD-AFDCNN models achieve superior recognition rates and that, owing to the difference in optimization speed, the SGD-AFDCNN performs better. Furthermore, the PSO-AFDCNN and SGD-AFDCNN models are compared on quantitative bearing fault diagnosis, and the diagnosis results are shown in Figures 9 and 10.
To further evaluate the two methods, a statistical indicator is used to quantify the accuracy of the second layer of the proposed system. The accumulation error, which denotes the maximum deviation from the actual fault size, is defined as follows:
$$E = \max_{k} \left| \hat{d}_k - d_k \right|,$$
where $\hat{d}_k$ and $d_k$ are the predicted and actual fault sizes of the k-th test sample. This formula is used to calculate the maximum error of the PSO-AFDCNN and SGD-AFDCNN models, which is listed in Table 4 for comparison. The comparison results show that both the PSO-AFDCNN and the SGD-AFDCNN have excellent diagnosis ability on the bearing experiment data. Furthermore, the SGD-AFDCNN diagnosed faster, while the PSO-AFDCNN obtained better diagnosis accuracy. Collectively, the experimental comparison results confirm the superiority of the proposed hierarchical AFDCNN model.
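The maximum-deviation error can be computed as sketched below. The fault sizes in the example are illustrative values in millimetres, not results from Table 4, and max_size_error is a hypothetical helper name.

```python
def max_size_error(predicted, actual):
    """Maximum absolute deviation between predicted and actual fault sizes."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return max(abs(p - a) for p, a in zip(predicted, actual))


predicted_mm = [0.18, 0.35, 0.54]      # hypothetical model outputs
actual_mm = [0.1778, 0.3556, 0.5334]   # hypothetical reference sizes
print(round(max_size_error(predicted_mm, actual_mm), 4))  # 0.0066
```

Because E reports only the worst case, a single badly sized sample dominates it. A mean absolute error could be reported alongside it if average behaviour is also of interest.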

Conclusion
In this paper, a novel DCNN model, called the AFDCNN, is proposed, and its effectiveness is verified through comparative experiments. The main conclusions are as follows:
(1) It is able to extract fault features from the original data adaptively.
(2) It is able to establish the hidden relationship between machinery health conditions and the measured signals adaptively.
(3) Both the SGD-AFDCNN and the PSO-AFDCNN perform very well in bearing fault-pattern recognition. The SGD-AFDCNN computes faster than the PSO-AFDCNN, while in quantitative diagnosis the PSO-AFDCNN obtains better diagnosis accuracy.
(4) The proposed method avoids dependence on expert experience to some extent.
The experimental results demonstrate that the proposed AFDCNN model outperforms other methods such as the DCNN and FDCNN. The AFDCNN model achieves high fault diagnosis accuracy and offers automatic feature extraction, which could make it a practical and convenient method for bearing fault diagnosis.

Conflicts of Interest
The authors declare that they have no conflicts of interest.