Design and Application of Deep Belief Network Based on Stochastic Adaptive Particle Swarm Optimization

Due to the poor recognition of data with deep fault attributes by traditional shallow networks under semisupervised and weakly labeled conditions, a deep belief network (DBN) was proposed for deep fault detection. To address the difficulty of selecting the DBN's network structure and training parameters, a stochastic adaptive particle swarm optimization (RSAPSO) algorithm was proposed in this study to optimize the DBN. A stochastic criterion was introduced in this method to make particles jump out of their original positions and search again with a certain probability, reducing the probability of falling into a local optimum. The RSAPSO-DBN method used sample data to train the DBN and used the final diagnostic error rate to construct the fitness value function of the particle swarm algorithm. By comparing the minimum fitness value of each particle to determine the advantages and disadvantages of the model, the parameters corresponding to the minimum fitness value (the number of network nodes, learning rate, and momentum) were selected, and the optimal DBN classifier was generated for fault diagnosis. Finally, the validity of the method was verified with bearing data from Case Western Reserve University in the United States and data collected in the laboratory. Compared with the BP neural network, support vector machine, and heterogeneous particle swarm optimization DBN methods, the proposed method demonstrated the highest recognition rates of 87.75% and 93.75%. This proves that the proposed method possesses universality in fault diagnosis and provides new ideas for identifying data with different fault depth attributes.


Introduction
Machine learning is a popular research interest in artificial intelligence and pattern recognition. Its theory and methods have been widely applied to complex problems in engineering applications and science [1][2][3][4]. There have also been important achievements in the fault diagnosis of mechanical equipment, as many domestic and foreign scholars have conducted in-depth research and achieved fruitful results in the field. Some machine learning methods used are shown in Figure 1. For example, the main intelligent diagnostic methods include support vector machines (SVM), artificial neural networks (ANN), multilayer perceptrons (MLP), kernel methods (KMs), and other pattern recognition methods [5][6][7][8]. These methods have achieved desirable results in the fault diagnosis of mechanical equipment, but they belong to the algorithm structure called "shallow learning." Here, the function fitting must be completed in one or two layers of the model's structure; thus, fault diagnosis results are unstable [9,10]. Moreover, a long latency period of the equipment does not imply sudden failure, because there is a long critical state before the failure occurs: the depth of component wear or scratches has not yet reached the state where serious failure occurs. Sample data of various fault depth attributes have been extracted, but the traditional shallow model has been ineffective in characterizing the complex nonlinear mapping relationships between signals and devices. The recognition effect of the shallow network is poor in the semisupervised and weakly labeled case. Hence, developing new fault depth detection methods is imperative.
Deep learning is an emerging machine learning method, which mainly simulates the structure of the human brain and achieves efficient data processing through hierarchical learning. The deep belief network proposed by Hinton et al. is a classic algorithm in deep learning [11], which opened the door to deep learning. For instance, Tamilselvan et al. [12] applied it to monitor and identify the health status of machinery and equipment, and experiments proved that it can effectively identify the fault status of equipment. Compared with other deep learning models such as CNN, RNN, and GAN, the DBN can more easily capture data features under big data and has the advantage of good compatibility with other algorithms [13]. Compared with the shallow model, the DBN can use its stacked restricted Boltzmann machines for unsupervised feature extraction and then use a classifier for fine-tuning [14], which improves the classification effect in the semisupervised and weakly labeled case. Further, Sun et al. [15] used signal processing to extract the fault characteristics of the monitoring signal and used deep learning to diagnose the type of mechanical failure and the degree of damage. Shan Waiping [16] also studied the reconstruction and feature extraction of the original vibration signals of rolling bearings by the DBN. Shao et al. [17] proposed a DBN for time-domain feature extraction and particle swarm optimization (PSO) for the fault diagnosis of rolling bearings. However, these studies have only set the DBN's structural parameters based on experience or repeated experiments. The model optimization and training are time-consuming, and it is difficult to achieve accurate fault diagnoses. Therefore, to improve the accuracy of fault diagnosis and reduce the optimization time of the model, this study proposes a fault diagnosis method based on the stochastic adaptive particle swarm optimization algorithm (RSAPSO) and the DBN.
Particle swarm optimization is a swarm intelligence global optimization search algorithm, which has been well applied in neural network parameter optimization [18]. Compared with other intelligent optimization algorithms, PSO has fewer parameters, is easier to implement, and can produce more accurate results [19]. Using the parallel search capability of the RSAPSO, the model parameters of the DBN were optimized and selected. This method used DBN training on the sample data to construct a fitness function. The final recognition error rate was used as the termination condition of the improved particle swarm algorithm's iteration. The RSAPSO is coupled with the parameter optimization of the DBN to effectively generate a suitable classifier and improve the fault diagnosis accuracy.

Stochastic Adaptive Particle Swarm Optimization Deep Belief Network

DBN Training Process.
The DBN is a stack of multiple RBMs. Its structure is shown in Figure 2. The lower layer represents the details of the original data, and the upper layer represents the data attribute category or feature. The data is abstracted layer by layer from the lower layer upward, so the DBN can deeply mine the essential characteristics of the data. It reduces the impact of human factors and effectively improves the training results of neural networks. The core pretraining method of the DBN is the greedy layer-by-layer learning algorithm, which trains each layer of RBM separately and unsupervised. That is, RBM1 is completely trained before RBM2 is trained. RBM training is unsupervised and has no expected output. Its role is to extract features and adjust the training parameters of the model. Its node values are binary: 0 or 1.
After completing the RBM pretraining, the entire preliminary DBN structure is formed, but a labeled adjustment step is still required. A BP classifier is set at the last layer of the DBN network. The fine-tuning process is supervised, and the weights and biases obtained by the RBM pretraining are adjusted to make the network training more accurate. This improves the recognition accuracy of the network. The steps of the DBN fault diagnosis method are outlined in Figure 3.
Initially, all samples, both labeled and unlabeled, are input. Features are extracted by the stacked RBMs, and the labeled samples are then used for fine-tuning, achieving semisupervised fault diagnosis.
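The greedy layer-by-layer pretraining described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the class layout, layer sizes, and weight initialization are assumptions, and the per-layer CD training is left as a stub.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Single restricted Boltzmann machine with binary units."""
    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible-layer biases
        self.b = np.zeros(n_hidden)    # hidden-layer biases

    def hidden_probs(self, v):
        """Activation probabilities of the hidden layer given v."""
        return sigmoid(v @ self.W + self.b)

def pretrain_dbn(data, layer_sizes, rng):
    """Greedy layer-by-layer pretraining: each RBM is handled completely
    before the next one, and its hidden activations feed the next RBM."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        # ... unsupervised CD-k training of `rbm` on `x` would go here ...
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # abstract the data one level upward
    return rbms, x
```

After this unsupervised stage, the topmost features `x` would be passed to a supervised BP classifier for fine-tuning, as the paper describes.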

Stochastic Adaptive Particle Swarm Algorithm.
In the standard PSO algorithm, the weight adjustment formula has large limitations, and the adjustment range of the weight w is small. Therefore, the shortcomings of local optimization and low search accuracy often occur. To improve the search accuracy of the algorithm and reduce the probability of falling into a local optimum, this study adopts an RSAPSO algorithm, which introduces stochastic adaptation into the algorithm. This allows particles to stochastically reset their positions with a certain probability, thus jumping out of their original positions and searching again. Consequently, the probability that the particle swarm is trapped in a local minimum is reduced.
To begin with, the RSAPSO algorithm modifies the inertia weight formula of the PSO algorithm, as shown in

w = w_min + (w_max − w_min) · (fitness(p) − f_min) / (f_max − f_min),

where w_max and w_min are the maximum and minimum values of the inertia weight, respectively; fitness(p) is the fitness value of the pth particle of each generation; and f_max and f_min are the maximum and minimum fitness values of each generation of particles, respectively. This method can automatically adjust the inertia weight w according to the current particle fitness value. When the fitness value is large, w becomes large, which increases the particle search speed and improves the global search ability of the particle. Conversely, when the fitness value is small, w becomes small, which reduces the particle search speed and improves the local search ability of the particle. The modified inertia weight w has a larger adjustment range and improves the search ability of the algorithm, as shown in Figure 4.
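The adaptive inertia weight can be sketched as below, assuming the standard linear form that matches the behavior described (large fitness gives large w, small fitness gives small w); the default bounds 0.4 and 0.9 are illustrative, not from the paper.

```python
def adaptive_inertia(fitness_p, f_min, f_max, w_min=0.4, w_max=0.9):
    """Adaptive inertia weight: interpolate between w_min and w_max
    according to where the particle's fitness sits in this generation."""
    if f_max == f_min:          # degenerate generation: all fitness equal
        return w_min
    return w_min + (w_max - w_min) * (fitness_p - f_min) / (f_max - f_min)
```

The particle with the worst (largest) fitness thus searches globally with w = w_max, while the best particle refines locally with w = w_min.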
Second, the particle swarm algorithm has no process such as cross-mutation and tends to fall into local optima; therefore, the stochastic rule is added. When the stochastic number in the update formula is greater than the set threshold, the particles stochastically reset their positions with a certain probability, thereby reducing the probability of the particle swarm falling into a local minimum. The stochastic criterion is shown in

x_id = x_max · rands(1, D), if rand > P,

where P is the threshold, x_max is the maximum allowed position, x_id is the current position, and rands(1, D) is a vector of D stochastic numbers between 0 and 1.
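The reset rule can be sketched as follows; this is a minimal reading of the criterion above, with the random draws and the position range as stated in the text.

```python
import numpy as np

def stochastic_reset(x, x_max, P, rng):
    """If a uniform random draw exceeds the threshold P, re-draw the
    particle's position uniformly in [0, x_max) per dimension, letting
    it jump out of its current search region; otherwise keep it."""
    if rng.random() > P:
        return x_max * rng.random(x.shape[0])   # rands(1, D) in [0, 1)
    return x
```

A small P resets often (more exploration); P = 1 disables the reset and recovers plain PSO behavior.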

RSAPSO-DBN Model.
A neural network has a tendency to learn new samples and forget old samples during training, and too many training types will cause low learning efficiency and slow convergence. Therefore, it is necessary to choose a suitable momentum parameter m and learning rate η. When the values of m and η are too large, the corresponding updates to the weights W and thresholds a and b will increase, which increases the convergence speed. However, it makes the model unstable, the loss function oscillates continually, and it is difficult to improve the accuracy, as shown at 1 in Figure 5. When the values of m and η are too small, the updates to W, a, and b become smaller, which causes the model to converge slowly and require longer training time, and the values may fall into a local optimum during reverse fine-tuning. This results in model training failure, as shown at 2 in Figure 5. Therefore, choosing the optimal momentum parameter and learning rate is the key to successful model training.
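The role of m and η in the update can be sketched with a generic gradient step with momentum; this is a standard textbook rule used here for illustration, not the paper's exact DBN update.

```python
def momentum_step(W, dW_prev, grad, eta, m):
    """Parameter step with learning rate eta and momentum m.
    Large eta/m -> large steps (fast but possibly oscillating);
    small eta/m -> small steps (stable but slow to converge)."""
    dW = m * dW_prev - eta * grad
    return W + dW, dW
```

The same trade-off applies to the thresholds a and b, which is why the paper optimizes m and η jointly.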
After the training set and classification are determined, the corresponding input and output layer nodes are determined accordingly. A large number of experiments show that if the number of nodes in the hidden layer N is too small, the network cannot have the necessary learning and information processing capabilities. Conversely, if it is too large, it will not only greatly increase the complexity of the network structure, but the network will be more likely to fall into a local minimum during the learning process, which will make the learning speed of the network very slow.
As there is currently no analytical method for determining the number of hidden layer nodes N of a deep neural network, the learning rate η, or the momentum parameter m, most studies still determine the network structure parameters of the DBN based on experience or multiple experiments. The parameters optimized by the particle swarm are therefore the learning rate η, the momentum m, and the number of hidden layer nodes N.
After the population is initialized, the parameter search space is set, and iteration starts. While the number of iteration steps is less than the maximum, the fitness value of each particle is calculated, and the minimum fitness value among them is recorded and checked against the convergence criterion. When the convergence requirement is reached, the loop is exited and training is complete; when the requirement is not met, iteration continues until the maximum number of iteration steps is reached. Finally, the parameters of the minimum fitness value are recorded, and the DBN classifier is output. The optimization process is shown in Figure 6.
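The outer optimization loop can be sketched as follows. This is a simplified stand-in: the fitness function here is generic (in the paper it is the DBN's recognition error rate), the coefficients and convergence tolerance are assumptions, and the adaptive inertia and stochastic reset are omitted for brevity.

```python
import numpy as np

def pso_optimize(fitness_fn, bounds, n_particles=20, max_iter=50, seed=0):
    """Evaluate each particle's fitness, track personal and global bests,
    and stop on convergence or at the iteration limit."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pos = lo + (hi - lo) * rng.random((n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness_fn(p) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(max_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(2)
            vel[i] = (0.7 * vel[i]
                      + 2.0 * r1 * (pbest[i] - pos[i])
                      + 2.0 * r2 * (gbest - pos[i]))
            pos[i] = np.clip(pos[i] + vel[i], lo, hi)
            f = fitness_fn(pos[i])
            if f < pbest_f[i]:                 # new personal best
                pbest_f[i], pbest[i] = f, pos[i].copy()
        gbest = pbest[np.argmin(pbest_f)].copy()
        if pbest_f.min() < 1e-6:               # convergence criterion met
            break
    return gbest, pbest_f.min()                # best parameters, best fitness
```

In the paper's setting, each position would encode (m, η, N) and `fitness_fn` would train and test a DBN with those parameters.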

Network Optimization Analysis
As stated already, the particle swarm optimized neural network has three parameters: the learning rate η, the momentum parameter m, and the number of hidden layer nodes N. For the network optimization analysis, the number of particles is set; generally, 20-40 particles are selected, and the particle dimension is defined as 3, corresponding to the three parameters. That is, each particle is represented as p(m_p, η_p, N_p).
The RBM model is an energy-based model that defines an energy function used to introduce a series of probability distribution functions:

E(v, h | θ) = −Σ_{i=1}^{n_v} a_i v_i − Σ_{j=1}^{n_h} b_j h_j − Σ_{i=1}^{n_v} Σ_{j=1}^{n_h} v_i W_ij h_j,

where n_v represents the number of nodes in the visible layer; n_h represents the number of nodes in the hidden layer; W_ij represents the weight from the ith node of the visible layer to the jth node of the hidden layer; and θ = {W, a, b} represents the set of all parameters of the system.
Using the above energy function, when θ is determined, the joint probability of (v, h) can be obtained as

P(v, h | θ) = e^{−E(v, h | θ)} / Z_θ, with Z_θ = Σ_{v,h} e^{−E(v, h | θ)},

where Z_θ is a normalization term, the partition function, that guarantees that P is a probability distribution. The conditional probability of hidden unit j given the visible vector v and the conditional probability of visible unit i given the hidden vector h are

P(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij), P(v_i = 1 | h) = σ(a_i + Σ_j W_ij h_j),

where σ is the sigmoid function. From the calculated probability distribution, Gibbs sampling is used to extract a sample h_1 ∼ P(h_1 | v_1); h_1 is then used to reconstruct the visible layer, and the visible layer's activation probability P(v_i = 1 | h_1) is calculated. Similarly, a Gibbs sample v_2 ∼ P(v_2 | h_1) is extracted from the calculated probability distribution. Then, the activation probability P(h_j = 1 | v_2) of each neuron in the hidden layer given v_2 is calculated. This is performed for k Gibbs samplings. The weights and thresholds are updated according to the contrastive divergence rule

ΔW_ij = η(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_recon), Δa_i = η(v_i^(1) − v_i^(2)), Δb_j = η(h_j^(1) − h_j^(2)).

The parameters p(m_p, η_p, N_p) of 40 particles are initialized, the search space of m_p, η_p, and N_p is defined, and the fitness function of the particle swarm algorithm is defined as the DBN recognition error rate. Further, the parameter values of the 40 particles are put into the DBN for testing, the fitness value corresponding to each particle is calculated, the particle p with the smallest fitness value is selected, and its corresponding p_1(m_p, η_p, N_p) is recorded. This is performed iteratively for the 40 particles: the fitness value corresponding to each particle is calculated, and the particle parameters p_2(m_p, η_p, N_p) corresponding to the minimum fitness value are selected.
This continues until the particle fitness value reaches the requirement or the maximum number of iterations is reached; the p(m_p, η_p, N_p) corresponding to the final minimum fitness value is then selected, the parameter values are assigned to the DBN network, and the optimal DBN is output.
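The CD-k update at the heart of the RBM training above can be sketched for k = 1 as follows; the function name and the vector shapes are illustrative, and the learning rate eta stands in for the update coefficient.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v1, W, a, b, eta, rng):
    """One contrastive divergence step (k = 1): sample a hidden state
    from the data, reconstruct the visible layer from it, and update the
    weights and biases from the difference of the two correlations."""
    ph1 = sigmoid(v1 @ W + b)                    # P(h = 1 | v1)
    h1 = (rng.random(ph1.shape) < ph1) * 1.0     # Gibbs sample h1
    pv2 = sigmoid(h1 @ W.T + a)                  # reconstruction P(v = 1 | h1)
    v2 = (rng.random(pv2.shape) < pv2) * 1.0     # Gibbs sample v2
    ph2 = sigmoid(v2 @ W + b)                    # P(h = 1 | v2)
    W += eta * (np.outer(v1, ph1) - np.outer(v2, ph2))
    a += eta * (v1 - v2)
    b += eta * (ph1 - ph2)
    return W, a, b
```

Running this over all training vectors, layer by layer, constitutes the unsupervised pretraining whose parameters (m, η, N) the particle swarm then selects.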
Using a particle swarm algorithm to select network parameters not only avoids the uncertainty of parameter value selection caused by human experience but also improves the implementation efficiency of the network.

A total of 1600 training samples and 400 test samples were obtained for this study.

Experimental Verification.
The data in Section 3.1.1 were input to the BP, SVM, standard PSO-, APSO-, and improved RSAPSO-optimized DBN networks used in this study for 10 tests. Among these, the DBN's hidden layer node search space was [10, 20], the learning rate search space was (0, 0.1], and the momentum parameter search space was [0.8, 1). The recognition rates of the five different models are shown in Figure 8.
It can be seen from Figure 8 that the shallow BP and SVM networks struggled to identify data with different fault depth attributes: the recognition rate reached 50% only once, and all others were below 50%. PSO-DBN and APSO-DBN each exceeded the RSAPSO-DBN recognition rate only once, and their highest recognition rates were lower than that of the RSAPSO-DBN proposed in this study. It can be seen that the actual effect of the proposed method is superior to that of the other two PSO methods. More precisely, the structural parameters of the DBN were better defined. The structural parameters of the particle swarm optimized DBN corresponding to the optimal result are shown in Table 1.

Experimental Analysis of Different Fault Depth Databases
Based on Support Vector Data Description (SVDD).
In reality, it is difficult to obtain data on different fault depths of the same fault type, which is a serious limitation of this kind of research. Thus, the authors proposed a linear sample data generation method based on SVDD for generating data with different fault depth attributes.
SVDD is a description method based on boundary data. The goal is to find a minimal sphere or domain that contains all or almost all target samples. When it is difficult to obtain different degrees of damage to the same part, SVDD is used to describe the fault data. Thus, abnormal data with different degrees of damage are constructed for further fault diagnosis research. The linear weight was selected according to the time-domain characteristic formula. Taking the peak value as an example, the envelope contrast images of two different fault depths were drawn, as shown in Figure 9.
As shown in Figure 9, when the fault depth is 0.007 inches, the vibration is relatively stable, and at a fault depth of 0.021 inches, the peak value increases by a roughly linear ratio. Therefore, for a single sample of data, data with different depth attributes are generated by scaling it by a certain linear ratio. This is based on the relationship between the recognition accuracy and the linear sample library generated here; the fault state is shown in equation (11), where A is the recognition accuracy during construction. The hypersphere description image is shown in Figure 10. A single-sample dataset IF1/OF1/RF1 is used to create a trained SVDD model. The single-sample dataset is then scaled by a certain linear ratio and input to the SVDD for judgment, and the linear weight is judged according to equation (7), thereby constructing a linear sample database.
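The construction can be sketched as follows. This is a simplified stand-in: the true SVDD boundary is a kernel-based minimal enclosing sphere, approximated here by the sample mean and the largest distance to it, and the function names, ratios, and margin are assumptions for illustration.

```python
import numpy as np

def fit_sphere(samples):
    """Crude stand-in for SVDD: approximate the minimal enclosing sphere
    by the sample mean (center) and the largest distance to it (radius)."""
    center = samples.mean(axis=0)
    radius = np.linalg.norm(samples - center, axis=1).max()
    return center, radius

def linear_sample_library(sample, ratios, center, radius, margin=1.0):
    """Scale one measured sample by a set of linear ratios; keep only
    the scaled copies the sphere model still accepts (within
    margin * radius of the center)."""
    library = []
    for r in ratios:
        candidate = r * sample
        if np.linalg.norm(candidate - center) <= margin * radius:
            library.append(candidate)
    return np.array(library)
```

Ratios whose scaled copies fall far outside the described domain are rejected, which is the role equation (7) plays in the paper's construction.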

Construction of Deep Sample Library for Laboratory Data.
This study collected bearing data on the laboratory rotating machinery test bench. The experimental platform is shown in Figures 11 and 12. The bearing data are divided into 4 types: normal, rolling element fault (RF), inner ring fault (IF), and outer ring fault (OF). The data sampling frequency is 5.12 kHz. In Figure 11, the red circles mark the sensor sampling locations. The authors selected three levels of data, 100%, 50%, and 0%, for illustration. The relationship between recognition accuracy and linear weight is shown in Figures 13-15. The data were input to the two shallow networks and the three PSO-optimized DBN diagnostic models. The DBN's hidden layer node search space was [10, 20], the learning rate search space was (0, 0.1], and the search space of the momentum parameter was [0.8, 1). The line chart of the recognition rate after 10 tests is shown in Figure 16. The structural parameters of the DBN corresponding to the optimal result are shown in Table 3. For each type of data, 160 sets of training samples and 40 sets of test samples were selected. The test results are shown in Table 3. The line chart of the highest recognition rate of the three particle swarm optimization algorithms is shown in Figure 17.
According to the test results in Table 3 and Figure 15, it can be seen that, in the generated linear sample database with different degrees of damage, the DBN can well identify data with different fault depth attributes. From Figure 16, it can be seen that, across the 10 tests of fault type and average recognition rates, RSAPSO-DBN has the best effect. Only one fault recognition rate is lower than that of PSO-DBN; hence, it is proven that the RSAPSO optimization method proposed in this paper is superior in most cases.
The DBN structure parameters defined in this way give the best performance and can better identify data with different fault depth attributes.

Conclusions
It is difficult for a shallow network to identify data with different fault depth attributes; hence, a deep neural network is required for fault diagnosis. Compared to a shallow network, the deep network is less prone to falling into a local optimum and can effectively perform fault depth identification. Because defining the DBN network structure parameters through a large number of experiments or from experience alone is inaccurate, the improved particle swarm algorithm proposed in this study was compared with traditional algorithms. Results show that its parallel search ability can effectively determine the network structure parameters of the DBN, improving both efficiency and accuracy compared to manually chosen parameters.
Meanwhile, the proposed method can be applied to most mechanical equipment in modern machinery, such as bearings, motors, and roadheaders. By analyzing the patterns of the vibration signals produced by mechanical equipment, faults can be diagnosed and the equipment state can be monitored. There are still some problems with the method proposed in this paper: it increases the complexity of the network model, resulting in a longer single run time and more calculation steps, which also places a certain burden on the equipment; the population optimization algorithm is ultimately external, and the structural parameters still need to be defined from the principles of deep learning itself; and at present, this method is only used in the fault diagnosis of mechanical equipment and needs to be further extended to other fields of machine learning. Therefore, how to solve the above problems will be the next key direction of DBN research.

Data Availability
Experimental data were collected on a rotating machine test bench inside the China University of Mining and Technology (Beijing) Laboratory. Data cannot be shared publicly due to some confidentiality reasons.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.