A Parameter-Optimized DBN Using GOA and Its Application in Fault Diagnosis of Gearbox



Introduction
As a key part of mechanical transmission systems, the gearbox is widely used in wind turbine generators, coal mining, and military equipment. During operation, the gearbox is exposed to alternating loads, and key parts such as gears and transmission shafts are prone to failure. If a fault is not diagnosed in time and the equipment keeps running, minor faults may develop into serious ones, resulting in machine shutdown, production stagnation, and even casualties [1,2]. Therefore, real-time condition monitoring and fault diagnosis of the gearbox are necessary measures to ensure the safe operation of such equipment [3,4]. The fault diagnosis process for the gearbox generally includes four steps: data collection, feature extraction, feature fusion, and pattern recognition. Among them, feature extraction is the most critical step, since it directly determines the performance of fault diagnosis. Sun et al. [5] proposed a fault diagnosis method for the planetary gearbox based on parameter-optimized variational mode decomposition (VMD), in which the mode number and center frequencies are determined adaptively from the extreme values of the power spectral density. This method effectively extracts the fault characteristic frequency and accurately diagnoses gear crack faults under strong background noise and weak fault signals. Isham et al. [6] decomposed the vibration signal of the gearbox by VMD, extracted the time-domain, frequency-domain, and time-frequency-domain features of each intrinsic mode function (IMF) component to construct the feature matrix of the signal, and finally trained an extreme learning machine (ELM) to establish a fault diagnosis model for intelligent diagnosis of the wind turbine gearbox. Zhang et al. [7] used the GWO algorithm to search for the TVF-EMD parameters that best match the input signal, eliminating the influence of parameter selection on the decomposition results. Then, the fault characteristics of rotating machinery were extracted by analyzing the IMF component with the maximum weighted kurtosis index.
The abovementioned methods are effective for simulated signals and certain specific fault signals, but they require extensive knowledge of signal processing and rich experience in expert diagnosis. At complex industrial test sites with huge amounts of data, fault information is often complex and changeable, and it may also contain internal and external excitation as well as the coupling of multiple faults. It is unrealistic to rely only on professional technicians and diagnostic experts for manual analysis. At present, in health monitoring, the increase in measuring points, sampling frequency, and duration of data collection means that monitoring systems acquire ever larger amounts of data. These massive data create a bottleneck for traditional fault diagnosis methods in real-time monitoring efficiency, fault diagnosis accuracy, and self-adaptive analysis capability. Therefore, exploiting the information in big data to identify the health status efficiently and accurately has become a new problem in equipment health monitoring [8].
With the development of machine learning, fault diagnosis methods based on machine learning models, such as the BP neural network, support vector machines (SVM), and extreme learning machines (ELM), have become a research hotspot. However, when shallow learning models are applied to gearbox fault diagnosis with high-dimensional big data, they lack diagnostic and generalization ability, and their accuracy relies on the quality of the fault features extracted from the data [9]. As a new method in the field of machine learning, deep learning is increasingly applied in fault diagnosis due to its powerful modeling and representation capabilities. Unlike traditional fault diagnosis methods, which treat feature extraction and pattern recognition separately, deep learning integrates them into a deep neural network, extracting signal features in the hidden layers and recognizing state patterns in the output layer. Lei et al. [8] used a denoising autoencoder (DAE) as the unsupervised algorithm in the pretraining stage and the BP algorithm as the supervised algorithm in the fine-tuning stage to build a deep neural network, achieving adaptive extraction of fault characteristics and accurate identification of the health conditions of different gearbox faults under various working conditions and a large number of samples. Jin et al. [9] introduced a multiobjective optimization algorithm to optimize multiple stacked denoising autoencoders (SDAE) and extracted diverse fault features of the planetary gearbox. Lei et al. [10] proposed a two-stage learning method for intelligent machine diagnosis, which learns signal features directly with an unsupervised two-layer neural network and then adopts softmax regression to classify the health status. The method was successfully verified on relevant data sets.
Deep learning avoids the dependence on a large number of signal processing techniques and on diagnostic experience. It extracts fault features self-adaptively and directly from signals in the frequency domain, integrates the feature extraction and pattern recognition steps of traditional fault diagnosis, and achieves self-adaptive extraction of fault features as well as intelligent diagnosis of health conditions under big data.
Deep learning opens a new way for intelligent fault diagnosis. Wen [11] used DBNs with different structures to establish fault diagnosis models for bearings, evaluated the models with multiple performance indexes, and selected the network structure with the best diagnostic performance. Through experiments, Zhang [12] analyzed in depth the influence of the number of hidden layer nodes, the learning rate, and the number of iterations on the feature extraction ability of DBN and determined how the main parameters should be set. The abovementioned methods have achieved certain results in DBN network construction and self-adaptive fault feature extraction. However, in the process of parameter selection for DBN, network parameters are still adjusted according to experience, so the diagnosis model suffers from insufficient stability and high randomness of diagnosis. For this reason, this paper designs a new fault diagnosis method based on DBN to provide an optimal diagnosis scheme. First, parameter optimization is carried out with GOA to reduce the influence of manual parameter setting on the training results. Then, the influence of the network structure and of parameter optimization on the feature extraction capability of the hidden layers is analyzed. Finally, the preprocessed data are input into the network for training, and a DBN-based fault diagnosis model for the gearbox is constructed. Experiments show that the proposed method effectively improves the self-adaptive fault feature extraction ability and identification accuracy of DBN, overcoming the shortcomings of traditional methods under big data.

The Parameter-Optimized DBN Method
In this section, the related algorithms, namely DBN, GOA, and the parameter determining criterion, are introduced. Based on these algorithms, a new parameter-optimized DBN method is proposed.

Brief Overview of DBN.
DBN is a probabilistic generative network composed of several Restricted Boltzmann Machines (RBMs) [13,14]. The network consists of a visible layer, hidden layers, and an output layer. The visible layer and the hidden layer are connected by weights, and each neuron has its own offset (bias). The output layer and the last hidden layer form a BP neural network, which is mainly used to adjust the initial parameters of the hidden layers so that the entire network is trained in a supervised manner. In the DBN learning process, the data are input at the bottom layer and then passed through the successive hidden layers to complete the training. The learning process can be divided into two parts: pretraining and fine tuning. Figure 1 shows a DBN structure with n hidden layers.

Pretraining.
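Pretraining uses an unsupervised greedy layer-by-layer approach to initialize the connection weights and offsets between the RBM layers; each RBM layer is trained separately from bottom to top [15]. Suppose the RBM is an energy-based Bernoulli model. The energy of state (v, h) is given by [16]

$$E(v, h; \theta) = -\sum_{i=1}^{V}\sum_{j=1}^{H} \omega_{ij} v_i h_j - \sum_{i=1}^{V} b_i v_i - \sum_{j=1}^{H} a_j h_j. \tag{1}$$

In this formula, $\theta = \{\omega_{ij}, b_i, a_j\}$ denotes the parameters of the RBM and $\omega_{ij}$ is the connection weight between visible layer node $i$ and hidden layer node $j$. $V$ and $H$ are the numbers of visible units and hidden units. $v_i$ and $h_j$ are the node states of the visible layer and the hidden layer. $b_i$ and $a_j$ are the offsets of the visible layer and the hidden layer. In order to maintain sparseness, the visible layer offset $b_i$ can be initialized to $\mathrm{lb}(p_i/(1 - p_i))$, where $p_i$ is the probability of $v_i = 1$. The hidden layer offset $a_j$ is initialized to a large positive number and $\omega_{ij}$ is initialized to a small random number. The joint probability of the model is then

$$p(v, h; \theta) = \frac{\exp(-E(v, h; \theta))}{Z}, \qquad Z = \sum_{v}\sum_{h} \exp(-E(v, h; \theta)). \tag{2}$$

In this formula, $Z$ is a normalization factor. Since there is no connection between nodes of the same layer, the activation probabilities of the visible layer unit $v_i$ and the hidden layer unit $h_j$ are conditionally independent:

$$p(h_j = 1 \mid v; \theta) = \sigma\Big(\sum_{i=1}^{V} \omega_{ij} v_i + a_j\Big), \qquad p(v_i = 1 \mid h; \theta) = \sigma\Big(\sum_{j=1}^{H} \omega_{ij} h_j + b_i\Big). \tag{3}$$

In the formula, $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function. The marginal distribution of $p(v, h; \theta)$ over $h$ is

$$p(v; \theta) = \frac{1}{Z}\sum_{h} \exp(-E(v, h; \theta)). \tag{4}$$

$\theta$ can be obtained by maximizing the log-likelihood on the training set, and the RBM parameter update criterion is obtained by the contrastive divergence method [17]:

$$\Delta\omega_{ij} = \varepsilon\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{k}\big), \qquad \Delta b_i = \varepsilon\big(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{k}\big), \qquad \Delta a_j = \varepsilon\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{k}\big), \tag{5}$$

where $\varepsilon$ is the learning rate, and $\langle\cdot\rangle_{\text{data}}$ and $\langle\cdot\rangle_{k}$ are the expected values under the distribution defined by the training data and under the model reconstructed after $k$ steps of Gibbs sampling, respectively.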
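The contrastive divergence update described above can be illustrated with a minimal NumPy sketch of a single CD-1 step (k = 1). This is not the authors' implementation: the function name `cd1_update`, the batch averaging, and the use of probabilities instead of sampled states in the negative phase are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, a, lr, rng):
    """One CD-1 step for a Bernoulli RBM (illustrative sketch).

    v0: (n_samples, V) batch of inputs in [0, 1]
    W:  (V, H) weights; b: (V,) visible offsets; a: (H,) hidden offsets
    """
    # Positive phase: hidden activation probabilities given the data.
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    # Negative phase: reconstruct the visible layer, then recompute hidden probs.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + a)
    n = v0.shape[0]
    # Data statistics minus reconstruction statistics, averaged over the batch.
    W_new = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_new = b + lr * (v0 - pv1).mean(axis=0)
    a_new = a + lr * (ph0 - ph1).mean(axis=0)
    return W_new, b_new, a_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = (rng.random((16, 6)) < 0.5).astype(float)  # toy binary batch
    W = 0.01 * rng.standard_normal((6, 3))
    b, a = np.zeros(6), np.zeros(3)
    W, b, a = cd1_update(data, W, b, a, lr=0.1, rng=rng)
    print(W.shape, b.shape, a.shape)
```

In a DBN, a step of this kind would be repeated over mini-batches for each RBM in turn, with the hidden activations of one trained RBM serving as the input of the next.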

Fine Tuning.
Since pretraining is unsupervised, the initial parameter values it produces are not optimal. At this stage, the BP neural network uses the sample labels to fine-tune the parameters and reduce the output error. The BP neural network is set up at the output layer of DBN, and supervised training is performed from top to bottom. According to formula (20), the connection parameters between the layers are optimized so that DBN attains its best classification ability. For the complex characteristics of early fault signals, DBN establishes a deep model by simulating the deep hierarchical structure of the brain, which can characterize the complex mapping between the vibration signal and the running state of the gearbox more effectively.

Parameter Determining Criterion: Minimum Root Mean Square Error (RMSE).
It is necessary to evaluate the network error during training. RMSE is the square root of the mean squared difference between the reconstructed visible layer state vector and the original input data vector, obtained after one step of Gibbs sampling in the RBM with the training sample as the initial state. The specific definition is given in equation (6). The smaller the RMSE, the better the training effect. By observing this error, the training status of the model can be judged, and parameters such as the number of iterations, the learning rate, and the batch size can be adjusted to achieve a better training effect. Therefore, RMSE is an excellent choice as the fitness function in the optimization process.
Shock and Vibration
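As an illustration of the criterion described above, the sketch below computes a reconstruction RMSE after one Gibbs step in an RBM. The function name `reconstruction_rmse` and the choice to average the squared error over all samples and all visible units are assumptions; the source does not state the exact averaging convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_rmse(v, W, b, a, rng):
    """RMSE between inputs and their reconstruction after one Gibbs step.

    v: (n_samples, V) inputs; W: (V, H) weights
    b: (V,) visible offsets; a: (H,) hidden offsets
    """
    ph = sigmoid(v @ W + a)                          # hidden probabilities
    h = (rng.random(ph.shape) < ph).astype(float)    # sampled hidden states
    v_rec = sigmoid(h @ W.T + b)                     # reconstructed visible layer
    # Assumed convention: mean over every sample and every visible unit.
    return float(np.sqrt(np.mean((v - v_rec) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = (rng.random((10, 8)) < 0.5).astype(float)
    W = 0.01 * rng.standard_normal((8, 4))
    print(reconstruction_rmse(v, W, np.zeros(8), np.zeros(4), rng))
```

A value of this kind, evaluated for a candidate learning rate and batch size, is what an optimizer would minimize when RMSE is used as the fitness function.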

Grasshopper Optimization Algorithm.
The Grasshopper Optimization Algorithm (GOA) imitates the swarm foraging behavior of grasshoppers in nature and shows excellent performance in dealing with multiobjective optimization problems [18]. The network formed by the grasshopper population connects all individuals so that the grasshoppers keep in step, and each individual can determine its direction of predation from the others in the group. Since the location of the target is unknown, the position of the grasshopper with the best fitness is considered the closest to the target, and the grasshoppers in the network move toward it. As the positions are updated, the comfort zone shrinks self-adaptively to balance global search and local search, until the grasshoppers finally gather and approach the optimal solution [19,20]:

$$X_i^d = c\left(\sum_{j=1,\, j\neq i}^{N} c\,\frac{ub_d - lb_d}{2}\, s\big(\lvert x_j^d - x_i^d\rvert\big)\,\frac{x_j - x_i}{d_{ij}}\right) + T_d. \tag{7}$$

In equation (7), $X_i^d$ is the position of the $i$th grasshopper in the $d$th dimension; $N$ is the population size; $ub_d$ and $lb_d$ represent the upper and lower bounds of the $d$th dimension, respectively; $T_d$ represents the current iterative optimal solution in the $d$th dimension; $s(\cdot)$ is the function describing the social interaction between grasshoppers; $d_{ij}$ is the distance between the $i$th and $j$th grasshoppers; and $c$ is a decreasing coefficient given by

$$c = c_{\max} - l\,\frac{c_{\max} - c_{\min}}{L}. \tag{8}$$

In this equation, $c_{\max}$ is the maximum value of $c$; $c_{\min}$ is the minimum value of $c$; $l$ represents the current number of iterations; and $L$ represents the maximum number of iterations.
In order to make each grasshopper move towards the optimal solution during each search, the best fitness value among the individuals in the current search is taken as the target value. GOA starts the optimization with a random initial set of solutions and updates the positions according to formula (7), where the factor c is updated according to formula (8). The best target location is updated after each iteration until the termination condition is met, and the location and fitness value of the optimal individual are returned. The optimization steps of GOA for DBN parameters are shown in Figure 2.
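The search loop described above can be sketched as follows. This is a simplified illustration rather than the authors' code: the social force function `s_func` with constants f = 0.5 and l = 1.5 follows the commonly published GOA formulation, while the function name `goa_minimize`, the bound clipping, the default population size, and the quadratic toy fitness are assumptions. In the paper's setting the fitness would be the DBN reconstruction RMSE evaluated for a candidate (learning rate, batch size) pair.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    # Social force: attraction strength f and attractive length scale l
    # (values taken from the commonly published GOA formulation).
    return f * np.exp(-r / l) - np.exp(-r)

def goa_minimize(fitness, lb, ub, n_agents=20, max_iter=100,
                 c_max=1.0, c_min=1e-5, seed=0):
    """Minimize `fitness` inside the box [lb, ub] (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_agents, dim)) * (ub - lb)   # random initial swarm
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    for it in range(1, max_iter + 1):
        # Coefficient c decreases linearly with the iteration count.
        c = c_max - it * (c_max - c_min) / max_iter
        X_new = np.empty_like(X)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if i == j:
                    continue
                diff = X[j] - X[i]
                dist = np.linalg.norm(diff) + 1e-12    # avoid division by zero
                social += c * (ub - lb) / 2.0 * s_func(np.abs(diff)) * diff / dist
            # Move relative to the best solution found so far (the target).
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_fit:
            best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    return best, best_fit

if __name__ == "__main__":
    # Toy fitness standing in for the DBN reconstruction RMSE.
    toy = lambda x: float(np.sum((x - np.array([0.3, 40.0])) ** 2))
    # Search ranges mirror those in the text: learning rate [0, 1], batch [1, 100].
    print(goa_minimize(toy, lb=[0.0, 1.0], ub=[1.0, 100.0], max_iter=50))
```

Because the best solution is only replaced when a better one is found, the returned fitness can never be worse than that of the initial random population.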

The Construction of Gearbox Fault Diagnosis Model Based on Parameter-Optimized DBN
Combining the characteristics of big data from equipment monitoring with the advantages of deep learning, a fault diagnosis method for the gearbox based on parameter-optimized DBN is proposed. This method combines unsupervised learning and supervised learning and is capable of self-adaptive extraction of fault features under big data as well as identification of the equipment running state. It is also superior to traditional methods, which suffer from poor self-adaptive ability in feature extraction and from the insufficient generalization performance of shallow networks in fault identification. The flow chart of the method is shown in Figure 3. The specific steps are as follows: (1) The vibration signal of the gearbox is preprocessed by FFT and linear normalization.

Experiment Setup.
In this paper, the transmission system of the gearbox is taken as the research object, and the effectiveness of the proposed method is verified by monitoring and diagnosing its running state. The test bed of the gearbox drive system is shown in Figure 4(a). It is composed of a drive motor, a gearbox, and a magnetic powder brake. The schematic diagram of the test rig and the accelerometer layout is illustrated in Figure 4. As listed in Table 1, 500 sample groups are obtained from each running state, each containing 512 points. In total, the data set for all running states contains 2000 samples, which represent the gearbox running states under various working conditions and with various faults. During the training and testing of the network, 50% of the samples are randomly selected for training and the other 50% for testing. Table 1 lists the four running states (normal, pitting, snaggletooth, and abrasion) as well as their corresponding status labels.

Data Preprocessing.
According to the procedure of the proposed method, the vibration signal of the gearbox is preprocessed. The FFT spectra of the different running states are given in Figure 5. Each signal corresponds to a superposition of several components in the frequency domain and can be decomposed by frequency-domain analysis. In order to make the signal more concise and more convenient to represent, each group of samples goes through the FFT, yielding 1024 points. In view of the symmetry of the spectrum, half of the data points are taken for the eigenvector, so as to reduce the dimension of the signal feature. The data from sensors at different measurement positions are superimposed to increase the spatial and angular information contained in the eigenvector. In order to reduce the influence of noise and abnormal samples on the network training, the obtained eigenvectors are normalized linearly, which reduces the training time and speeds up convergence.
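A minimal sketch of this preprocessing chain is given below. Several details are assumptions made for illustration: the function name `preprocess`, zero-padding each 512-point segment to a 1024-point FFT, combining sensor channels by concatenation, five channels (inferred from the 2560-dimensional input mentioned later), and min-max scaling of each feature vector to [0, 1] as the linear normalization.

```python
import numpy as np

def preprocess(segments, n_fft=1024):
    """FFT, keep half of the spectrum, combine channels, normalize linearly.

    segments: (n_sensors, n_points) array holding one sample per sensor.
    Returns a 1-D feature vector of length n_sensors * n_fft // 2.
    """
    segments = np.asarray(segments, dtype=float)
    # Magnitude spectrum; n > n_points zero-pads each segment (assumption).
    spectrum = np.abs(np.fft.fft(segments, n=n_fft, axis=1))
    # The spectrum of a real signal is symmetric, so keep the first half only.
    half = spectrum[:, : n_fft // 2]
    # Assumed combination of sensor channels: concatenation into one vector.
    feature = half.reshape(-1)
    lo, hi = feature.min(), feature.max()
    if hi == lo:
        return np.zeros_like(feature)
    return (feature - lo) / (hi - lo)   # linear (min-max) normalization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((5, 512))   # 5 hypothetical sensor channels
    x = preprocess(sample)
    print(x.shape, x.min(), x.max())
```

Normalizing per feature vector is only one possible reading; scaling with statistics of the whole training set would be an equally plausible interpretation of linear normalization.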

Determination for Optimal Parameter Combinations and Network Structure of DBN.
Since no formula or theory is known for setting the number of neuron nodes in each hidden layer, many experiments and relevant knowledge are required. In this paper, three types of hidden layer structures are analyzed: smooth type (200-200-200), increasing type (100-200-400), and decreasing type (400-200-100). GOA is applied to determine the optimal parameter combination and the optimal network structure at the same time. After the optimal parameter combination has been searched for each structure under the same training conditions, the structure whose RMSE converges to the minimum is considered the best.
First, the optimal learning rate and batch size of DBN are searched by GOA, with search ranges of [0, 1] and [1, 100], respectively. According to Zhang's suggestions [18], the parameter settings of GOA are shown in Table 2.
After the parameters of the optimization algorithm have been set, the parameter search for the different network structures is started. To explain the parameter search process in detail, Figure 6 gives the optimization curve for the 400-200-100 network structure, where the RMSE converges to a minimum value of about 0.0074. The search converges after 31 iterations, indicating that the algorithm has strong global optimization ability and fast convergence speed, which makes it suitable for searching the optimal parameter combination of DBN. In this case, the optimal combination of parameters is obtained for the decreasing type (400-200-100). In order to determine the optimal network structure, the error curves of the three network types with their corresponding optimal parameter combinations are given. Figure 7(a) indicates that the RMSE of the decreasing network (400-200-100) converges faster and to a smaller value. Figure 7(b) shows the convergence in the later period (after 50 iterations), where the RMSE of the decreasing type is significantly smaller than that of the other types. Therefore, the decreasing type is taken as the best structure in this paper.
In addition, the DBN model achieves a good training effect and tends to a stable state at the 100th iteration. Although increasing the number of iterations helps to improve the effectiveness of fault recognition, the required calculation time would also increase greatly. Considering both the recognition effect and the calculation cost, the number of iterations is set to 100. The number of nodes in the input layer depends on the sample dimension (2560 dimensions), and the number of nodes in the output layer is determined by the number of running states (4 states). In this paper, the decreasing structure type with the minimum RMSE is applied in the hidden layers. The finally determined structural parameters of the DBN model are shown in Table 3. According to the parameter combination obtained after optimization, the learning parameters of DBN are set as shown in Table 4.

DBN Hidden Layer Feature Extraction Capability Analysis.
In order to verify that the optimized DBN is more capable of feature extraction, the extraction capability of the hidden layers before and after optimization is compared. According to the advice given by Hinton et al. [21], the learning rate and the batch size of DBN selected by experience (viewed as DBN before optimization) are [0.1, 10]. With the same samples and network structure for training, the node values of the third hidden layer are output, and their sparsity is taken as the evaluation index of feature extraction capability.
As illustrated in Figure 8, the features extracted by DBN after optimization are sparser than those extracted by DBN before optimization. Such sparse features can effectively express the essential features of the data and can improve the generalization ability of the fault features. Table 5 lists the changes of the parameter combination, RMSE, and comprehensive distance value during the iterations. The comprehensive distance within and between classes, obtained by dividing the between-class distance by the within-class distance, is an essential criterion for the separability of samples in different classes and the aggregation of samples in the same class. Under a specified feature, the longer the distance between classes, the more separable the samples in different states. Similarly, the shorter the distance within classes, the more concentrated the samples in the same state. Therefore, an increase of the comprehensive distance between and within classes expresses an improvement in the feature extraction ability of the network. As shown in Table 5, RMSE gradually decreases with the iterations, and the comprehensive distance shows an upward trend. At the 31st iteration, RMSE decreases significantly and the comprehensive distance increases significantly, after which both stabilize. This indicates that, as the parameters are iterated, the feature extraction ability of DBN improves, which has a direct impact on the reduction of RMSE. The proposed method is capable of extracting fault features self-adaptively from the spectra of the running states. In order to further verify the feature extraction ability of the proposed method, the first three principal components of these features are extracted by KPCA and visualized. Then, the optimized DBN, the DBN set by experience, the shallow probability network, and traditional feature extraction are compared.
The shallow probability network adopts a Probability Neural Network (PNN) with a single hidden layer, which follows the Bayesian law of prior probability and the Bayesian decision rule to simplify the network training and carry out the nonlinear mapping between the original data and the features.
The traditional feature extraction method extracts 20 common characteristics in the time domain, frequency domain, and time-frequency domain from the vibration signals of the gearbox: mean value, standard deviation, peak value, RMS, root amplitude, margin index, kurtosis index, waveform index, pulse index, peak index, mean frequency, center frequency, RMS frequency, standard deviation frequency, kurtosis frequency, and the energy entropy of the first 5 IMF components from EMD. Figure 9(a) is the scatter diagram of the principal components of the features extracted by the proposed method, indicating that the samples in the same state cluster completely in their own space, while those in different states separate effectively without overlapping. Figure 9(b) is associated with the DBN set by experience. In the scatter diagram of the first three principal components, a slight overlap appears among pitting, snaggletooth, and abrasion, which would have an adverse impact on the accuracy of fault diagnosis. This also verifies the significance of parameter optimization, which significantly affects the feature extraction ability of the network. Figure 9(c) is associated with the shallow probability network. The comparison with a deep probability network such as DBN shows that the deep network is more capable of feature extraction, while serious overlapping exists among different states under the shallow probability network. Figure 9(d) is associated with the traditional feature extraction method. The scatter diagram shows that the different states lie too close together and exhibit aliasing, which is also a main reason for the poor diagnosis effect of traditional fault diagnosis.
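The comprehensive distance used as a separability criterion in this section can be illustrated with a short sketch. The source only states that it is the between-class distance divided by the within-class distance; the specific definitions below (mean pairwise Euclidean distance between class centroids, and mean distance of samples to their own class centroid) and the function name `comprehensive_distance` are assumptions.

```python
import numpy as np

def comprehensive_distance(features, labels):
    """Ratio of between-class to within-class distance (assumed definitions).

    features: (n_samples, n_features) array; labels: (n_samples,) class ids.
    Requires at least two classes and a nonzero within-class spread.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    # Within-class: mean distance of each sample to its own class centroid,
    # averaged over the classes.
    within = np.mean([
        np.linalg.norm(features[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    # Between-class: mean pairwise distance between class centroids.
    pairs = [
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes))
        for j in range(i + 1, len(classes))
    ]
    between = float(np.mean(pairs))
    return between / within

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    y = np.array([0, 0, 1, 1])
    print(comprehensive_distance(X, y))  # centroids 10 apart, spread 1 -> 10.0
```

Larger values correspond to classes that are farther apart and more tightly clustered, which matches the interpretation given in the text.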

Comparative Analysis with Other Methods.
In order to verify the advantages in diagnosis accuracy, the diagnosis rate of the proposed method is compared with those of the DBN set by experience, the shallow probability network, and traditional feature extraction combined with ELM. 250 groups (the remaining 50% of the samples) from each of the four running states of the gearbox are randomly selected. In order to eliminate random errors and to verify the fault identification ability and stability of the model, the test is repeated 25 times. The test results are as follows. As illustrated in Figure 10(a), the accuracy of the fault diagnosis model established by the proposed method is higher than 99.5% in all 25 random sampling tests, and the average diagnosis rate reaches 99.66%, indicating that the proposed method offers a high diagnosis rate and stability for the gearbox under multiple working conditions. Figure 10(b) shows the diagnosis rate of the DBN model with empirically selected parameters. The average diagnosis rate is 98.89%, slightly lower than that of the optimized DBN. Figure 10(c) shows the diagnosis rate of the shallow probability network, with an average diagnosis rate of 84.79%. Compared with the shallow network, the deep network is more suitable for big data and self-adaptive fault diagnosis under complex working conditions. Figure 10(d) shows the diagnosis rate of traditional feature extraction combined with ELM, with an average diagnosis rate of 80.93%. Compared with the deep network model, traditional fault diagnosis methods fall short in self-adaptive fault feature extraction, diagnosis accuracy, and generalization performance.

Conclusion
(1) A parameter-optimized DBN method was proposed to improve the feature extraction ability and fault diagnosis accuracy, in which the minimum RMSE of the network is used as the fitness function and the recently proposed GOA is employed to search for the optimal parameter combination.
(2) The parameter-optimized DBN method can self-adaptively extract the fault information contained in the signal spectrum of the gearbox, avoiding the dependence on a large number of signal processing methods and on diagnosis experience, which gives it advantages in fault diagnosis ability and generalization performance.
(3) A novel integrated fault diagnosis model based on FFT, linear normalization, and the optimized DBN is established, which provides a new intelligent fault diagnosis procedure. Experimental analysis shows that this method is superior to shallow networks and to traditional methods based on the combination of feature extraction and pattern recognition, which contributes to intelligent fault diagnosis under "big data."

Data Availability
The data used in this manuscript are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Jingbo Gai provided the main idea of the study; Junxian Shen analyzed the experiment and completed the paper; He Wang helped with the programming; and Yifan Hu helped to translate the manuscript.