Application of Model-Based Deep Learning Algorithm in Fault Diagnosis of Coal Mills

The coal mill is one of the important auxiliary machines in a coal-fired power station, and its operating status is directly related to the safe and steady operation of the units. In this paper, a model-based deep learning algorithm for fault diagnosis is proposed to effectively detect the operating state of coal mills. Based on the system mechanism model of coal mills, massive fault data are obtained by analyzing and simulating different types of faults. Then, stacked autoencoders (SAEs) are established by combining these data with the deep learning algorithm. The SAE model is trained on the fault data, which gives it the capability to learn and identify the characteristics of faults. According to the simulation results, the accuracy of SAE-based fault diagnosis of coal mills reaches 98.97%. Finally, the proposed SAEs can reliably detect faults in coal mills and generate warnings in advance.


Introduction
The coal mill is one of the important auxiliary machines of coal-fired units, and its operating status is directly related to the safe and stable operation of the units. When a fault occurs in the coal mill, the fuel supply of the boiler cannot be guaranteed, which creates a mismatch between the boiler energy output and the turbine power output. In this situation, a fast load rejection will occur, which directly leads to flameout in the furnace. A fault in the coal mill therefore causes large economic losses to power generation enterprises and decreases the safety and stability of the power system. It is thus necessary to guarantee normal operation through effective fault warning and diagnosis of the coal mill.
Agrawal et al. [1] divided fault diagnosis methods into three categories: model-based, signal-based, and historical operation data-based methods. Model-based fault diagnosis methods need to establish a mathematical model of the coal mill. Odgaard and Mataji [2] used a simplified energy balance equation to monitor and diagnose abnormal energy flow in the coal mill. Andersen et al. [3] designed a Kalman filter to estimate the moisture in the coal that enters and exits a coal mill in order to determine whether the energy in the coal mill is in a normal condition. Based on the multisegment model of coal mills established by Wei et al. [4], Guo et al. [5] monitored the state of coal mills by identifying abnormal variations in the model parameters.
Model-based fault diagnosis methods analyze the mathematical model of the actual object for fault diagnosis, and thus, the physical meaning is clear. However, establishing an exact model in practical applications is difficult [6][7][8][9][10]. Thus, the operability of these methods is poor.
Signal-based fault diagnosis systems are widely used to evaluate the health of mechanical equipment. Many high-frequency signals change during the operation of the coal mill, such as the current of the coal mill, the outlet primary air flow, and the differential pressure of primary air. Su et al. [11] designed a system that records the vibration signals of the coal mill and converts them to energy amplitudes by wavelet analysis. Whether the coal mill is in coal interruption, coal choking, or another fault condition can be determined by analyzing the relationship between the vibration signals and the amount of coal in the mill. Kisić et al. [12] proposed a method to detect the degree of wear of the grinding roller and analyzed a multivariate control chart on the frequency spectrum to find the appropriate time to replace the worn parts. Collura et al. [13] utilized model identification and signal processing techniques to develop a coal mill performance monitoring tool based on real-time detection of the fineness of pulverized coal. Compared with the model-based method, the signal-based fault diagnosis method does not need a complex object model; a fault in the system can be found solely by analyzing the collected data. However, these methods often require a large number of sensors to collect signals, which results in high implementation and maintenance costs.
Fault diagnosis based on historical operation data mainly analyzes the differences between normal operation data and fault operation data to determine the health status of the coal mill. Han and Jiang [14] proposed a fault diagnosis method based on fuzzy decision clustering and used a single-layer neural network to identify three kinds of coal mill faults. Qin et al. [15] utilized the abnormal operation data of the coal mill to establish an expert system and determined the operating status of the coal mill by comparing the trend of the model output with the expert system. Data-based fault analysis is a data-driven approach, so even researchers who are unfamiliar with the system can apply the relevant algorithms. However, the fault types and fault data in the mass historical data of thermal power units are incomplete, and a data-driven method requires a large amount of fault data. Thus, selecting the fault data from the vast amount of historical data one by one is difficult [16][17][18].
In summary, the model-based fault diagnosis method needs an accurate model of the coal mill in order to obtain good diagnosis results. However, the coal mill is a complex object with multiparameter coupling, and it is difficult to establish an accurate mathematical model. The premise of applying signal-based fault diagnosis methods is that the monitored parameters can be measured. Therefore, a large number of new sensors must be installed on the shell of the coal mill. However, when the coal mill was initially constructed, mechanical interfaces for new sensors were usually not reserved, so it is not easy to install new sensors on the shell of the coal mill. The fault diagnosis method based on historical operation data first needs to obtain a large amount of fault operation data of the coal mill.
However, the fault data of the coal mill are usually mixed with the normal operation data and are difficult to classify and identify. Based on the above analysis, the existing methods struggle to achieve good results in the fault diagnosis of coal mills. Although the three types of traditional methods have shortcomings, combining their advantages can yield a simpler and more effective method for the fault diagnosis of coal mills. The basic idea is to obtain fault simulation data based on a simplified model and to use big data analysis for fault identification.
In recent years, the rapid development of deep learning algorithms has made such big data analysis possible. Guo et al. [19] constructed an adaptive convolutional neural network, which greatly improves the accuracy of fault diagnosis of motor bearings. Duan et al. [20] used a deep learning algorithm to study missing traffic data and to implement the interpolation of missing data.
In this study, a model-based data-driven fault diagnosis method is proposed to obtain a fault diagnosis method with simple operation, low cost, and high accuracy. First, on the basis of the simplified coal mill model, massive fault data of the coal mill are obtained by analyzing the fault principle and simulating the fault operation status of the coal mill. This solves the difficulty of manually obtaining a large amount of fault data from the massive historical data. Then, stacked autoencoders (SAEs) with multilayer neural networks are established on the basis of deep learning theory. The numerous fault data obtained in the previous step are used to train the networks so as to fully exploit the nonlinear characteristics of deep neural networks.
Thus, the built network can accurately learn the essential characteristics of all kinds of faults and then achieve early warning and diagnosis of faults in coal mills. The above method can greatly improve the fault diagnosis accuracy of the coal mill and, at the same time, provide fault warnings to the operator, which is of great significance for ensuring the safe operation of the power plant and the safety of the equipment.
The rest of the paper is organized as follows. Section 2 introduces the working principle of the coal mill and its nonlinear dynamic model. Section 3 analyzes the mechanism of two typical coal mill faults and obtains a large number of fault data by simulation experiments. Section 4 introduces the working principle of the SAEs and makes certain improvements to the model. Section 5 presents the simulation analysis that verifies the effectiveness of the proposed method in the fault diagnosis of the coal mill. Section 6 presents the conclusions of the study.

Brief Introduction of the Coal Mill System
2.1. Working Principle. The MPS-type medium-speed coal mill [21] is a roller-type coal mill designed and manufactured by Babcock, Germany. Such mills are characterized by smooth output, low energy consumption, and long maintenance intervals. In this study, the MPS180-HP-II medium-speed coal mill is used in the analysis (Figure 1). Its maximum output is 44.496 t/h, and the fineness of coal powder R90 is 22%, where R90 indicates the percentage of coal powder that cannot pass through a sieve with a pore size of 90 μm. The raw coal falls into the coal mill through the coal dropping pipe and is milled into coal powder under the squeezing effect of the two milling parts (grinding disk and rollers) [4]. The primary air enters the coal mill through the annulus around the grinding disk to dry the coal powder and carry it into the coarse coal separator for separation. The qualified fine coal powder is blown into the boiler for combustion, while the coarse particles return to the mill for further milling.

Mathematical Model of the Coal Mill.
The operation of the coal mill involves the mass balance of coal and the energy balance of the entire coal mill. Establishing an effective dynamic mathematical model of coal mills is an important prerequisite for the state monitoring of coal mills. Zeng et al. [22][23][24] established an MPS medium-speed mill model (equation (1)), which includes three inputs and three outputs, based on the mass and energy balance of the primary air and coal moisture in the mill. The proposed method is based on this model, and the symbolic description of the model is shown in Table 1. In equation (1), $U_L$, $U_H$, and $W_c$ are the control quantities of the model; $W_{air}$, $T_{out}$, and $W_{pf}$ are the output quantities of the model; and $K_i$ ($i = 1, 2, \ldots, 15$) and $T_j$ ($j = 1, 2$) are the model parameters to be identified, whose values are shown in Table 2.

Model-Based Coal Mill Fault Simulation
The fault types and fault data in the vast amount of historical data of thermal power units are incomplete, and selecting the fault data one by one from the massive historical data is difficult. Therefore, effectively obtaining a large number of fault data is the key to solving the fault diagnosis problem of coal mills. The simulation results in [22] showed that the mathematical model of the MPS-type medium-speed coal mill has high precision. In the current study, this coal mill model is used in the analysis, and two typical coal mill faults (coal interruption and coal choking) are simulated by analyzing the fault mechanism of the coal mill. The simulation experiments yield a large number of fault data, which effectively solves the difficulty of obtaining fault data manually from the massive historical data.
First, a control scheme is designed for the coal mill model. The purpose is to ensure that the simulation experiments are conducted under closed-loop regulation, so that the fault data obtained by the simulation experiments are close to the real operation status of the coal mill. The control scheme is shown in Figure 2. The entire control scheme consists of three controlled variables, three control variables, and four state variables, as presented in Table 3. The control circuits are composed of three single-loop proportion-integral-differential (PID) controllers. Specifically, PID1, with setting parameters $K_p = 1$, $K_i = 0.05$, and $K_d = 0$, controls the outlet temperature of the coal mill by adjusting the valve position of cold air. PID2, with setting parameters $K_p = 2$, $K_i = 0.5$, and $K_d = 0$, controls the inlet primary air flow of the coal mill by adjusting the valve position of hot air. PID3 controls the outlet pulverized coal flow of the coal mill by adjusting the inlet coal flow of the coal mill. After the control scheme is designed, the fault operation status of the coal mill can be simulated by adjusting the corresponding controllers.
[Table 1, partial: differential pressure of primary air (mbar); $C_{in}$, specific heat capacity of mixed primary air (kJ/(kg·°C)); $C_L$, specific heat capacity of cold air (kJ/(kg·°C)); $C_H$, specific heat capacity of hot air (kJ/(kg·°C)); $T_{in}$, inlet primary air temperature of coal mill (°C).]
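As an illustration of the single-loop controllers in this control scheme, the following is a minimal discrete-time PID sketch in Python. The gains for PID1 and PID2 are taken from the text; the sampling period `dt`, the positional form of the algorithm, and the class and attribute names are assumptions made only for this example and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Positional discrete PID controller (illustrative sketch)."""
    kp: float
    ki: float
    kd: float
    dt: float = 1.0          # assumed sampling period in seconds
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        """Return the control output for one sampling instant."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Gains quoted in the text for the first two loops.
pid1 = PID(kp=1.0, ki=0.05, kd=0.0)  # outlet temperature via cold air valve
pid2 = PID(kp=2.0, ki=0.5, kd=0.0)   # inlet primary air flow via hot air valve

u = pid1.step(setpoint=2.0, measurement=0.0)
print(round(u, 6))  # 2.1
```

With $K_d = 0$ both loops reduce to PI control, so the derivative term does not contribute to the output.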

Fault Simulation of Coal Interruption.
When an obstruction exists in the coal dropping pipe or a fault occurs in the coal feeder, the amount of coal entering the coal mill decreases directly, and in serious cases coal interruption occurs, thereby endangering the stability of boiler combustion. The process of simulating coal interruption is as follows. When the coal mill is in stable operation, a negative slope signal is superimposed on PID3, such that the mass of coal entering the coal mill is gradually reduced to 0. The data generated in this process can be considered coal interruption samples. A large number of coal interruption samples can be obtained by adjusting the set value to run the coal mill in other operating states and repeating the steps above to record fault data.
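The negative slope signal described above can be sketched as a time-dependent offset on a set value. The snippet below is only an illustration: it assumes that the ramp is added to the set value handled by PID3 and that the result is clipped at zero, and the base value, slope, and timing are hypothetical numbers rather than values from the study.

```python
def ramp_offset(t, t_start, t_end, slope):
    """Offset of the fault signal: a ramp inside [t_start, t_end), zero elsewhere."""
    if t_start <= t < t_end:
        return slope * (t - t_start)
    return 0.0


def faulted_setpoint(base, t, t_start, t_end, slope):
    """Set value with the ramp superimposed, clipped so it never goes below zero."""
    return max(0.0, base + ramp_offset(t, t_start, t_end, slope))


# Hypothetical numbers: base set value 10, slope -0.25 per second,
# fault signal active from t = 100 s to t = 175 s.
for t in (50, 120, 170, 175):
    print(t, faulted_setpoint(10.0, t, 100, 175, -0.25))
# 50 10.0
# 120 5.0
# 170 0.0
# 175 10.0
```

Repeating the run with different base set values corresponds to the step of running the mill in other operating states to collect more samples.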
To verify the effectiveness of the simulation experiments of coal interruption, the variables that change significantly during the period of coal interruption are selected and their curves are drawn. Figure 3 shows the result for an arbitrarily selected set of experimental data. Figure 3(a) shows that coal interruption decreases the mass of coal entering the coal mill, which then decreases the outlet pulverized coal flow of the coal mill. Meanwhile, the heat consumed from the primary air passing through the coal mill decreases, which results in an upward trend in the outlet temperature of the coal mill; the cold air valve is then rapidly opened, which brings the outlet temperature back down (Figure 3(b)). The reduction in the mass of coal stored in the coal mill reduces the current of the coal mill and the differential pressure of primary air. This trend is consistent with the curves in Figures 3(c) and 3(d). The ramp signal is removed after 75 s, and the variables return to the original set values under the action of the controllers.
The research object in [19] is an MPS-type medium-speed coal mill in a power plant in Hainan, China. The current study obtains sets of coal interruption fault data by searching the historical operation data of this coal mill and draws the curves of key variables, as shown in Figure 4. According to the accident analysis, the coal interruption fault occurs because of a malfunction of the coal feeder; as a result, the actual supply of coal gradually reduces to 0 (Figure 4(a)). Figure 4 shows that, when the coal interruption fault occurs, the outlet temperature of the coal mill rises (Figure 4(b)), whereas the current of the coal mill (Figure 4(c)) and the differential pressure of primary air (Figure 4(d)) decrease. This trend is similar to that of the key variables in the simulation of coal interruption.
Therefore, the simulation of coal interruption in this study is reasonable. Accordingly, the data in the rectangular frame in Figure 3 can be recorded as fault samples.

Fault Simulation of Coal Choking.
Coal choking may be caused by too little inlet primary air flow, excessive coal feed, or too much moisture in the raw coal. The process of simulating coal choking is as follows. A positive step signal is superimposed on PID3 to make the mass of coal in the coal mill quickly reach the upper limit, and the data are recorded as coal choking samples. As in the previous simulation experiments, a large number of coal choking samples can be obtained by adjusting the set value to run the coal mill in other operating states and repeating the steps above to record fault data. The upper limit of the mass of coal stored in the coal mill is set as 60 kg. The curves of the variables are drawn, and Figure 5 shows the result for an arbitrarily selected set of experimental data. As shown in the figure, the sudden increase in the set value of the outlet pulverized coal flow causes the mass of raw coal in the coal mill to rise continuously (Figure 5(a)) and increases the flow resistance, which results in an increase in the differential pressure of primary air (Figure 5(c)). At the same time, the workload of the coal mill increases accordingly, such that the current of the coal mill increases as well (Figure 5(b)). When the thickness of the raw coal layer reaches a certain degree, the grinding efficiency drops significantly, which then reduces the current of the coal mill. The outlet pulverized coal flow is reduced to 0 once the mass of raw coal in the coal mill reaches the upper limit, at which time the primary air pipe is blocked and the pulverized coal cannot be blown out (Figure 5(d)). The analysis above shows that the simulation results are consistent with the fault characteristics of coal choking; therefore, the data in the rectangular frame can be used as fault samples.
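Recording "the data in the rectangular frame" as fault samples amounts to cutting a time window out of each simulated record and attaching a fault label. A minimal sketch of that bookkeeping follows; the record layout, the window bounds, and the label string are hypothetical and chosen only for illustration.

```python
def extract_fault_samples(records, t_start, t_end, fault_label):
    """Keep the measurements whose time stamp lies in [t_start, t_end].

    records: list of (time, measurement_vector) pairs from one simulation run.
    Returns a list of (measurement_vector, fault_label) pairs.
    """
    return [(values, fault_label)
            for t, values in records
            if t_start <= t <= t_end]


# Hypothetical run: ten time steps, each with three measured variables
# (differential pressure, outlet temperature, mill current).
records = [(t, [0.1 * t, 80.0 - t, 40.0 + t]) for t in range(10)]
samples = extract_fault_samples(records, t_start=3, t_end=6,
                                fault_label="coal choking")
print(len(samples))  # 4
```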

Stacked Autoencoders
The fault diagnosis method based on historical operation data is a data-driven approach, which aims to obtain the nonlinear mapping relationship between the data and the fault features. When sufficient data are available for learning, deep neural networks can theoretically approximate any nonlinear function.
This section describes a deep neural network called SAE, which is built by stacking autoencoders (AEs), for the fault diagnosis of coal mills and proposes two ways to improve the network performance.

Fundamentals of Autoencoder.
An AE neural network can be considered a three-layer neural network. This network applies an unsupervised learning algorithm to train and adjust the network weights so that the network output ultimately equals the network input. A typical example is shown in Figure 6, where $\{x_1, x_2, \ldots, x_n;\ x_i \in \mathbb{R}^n\}$ can be treated as a set of unlabeled raw data and $\{x'_1, x'_2, \ldots, x'_n;\ x'_i \in \mathbb{R}^n\}$ represents the network output. The circles with b are called bias units and correspond to the intercept term. The transfer of raw data from the input layer to the hidden layer is called encoding, and the transfer from the hidden layer to the output layer is called decoding, which can be described by

$$a = S(W_1 x + b_1), \tag{2}$$

$$x' = S(W_2 a + b_2), \tag{3}$$

where $a$ denotes the activation of the hidden layer, $S(\cdot)$ represents the sigmoid function, $W_1$ represents the weight matrix between the input and hidden layers, $W_2$ represents the weight matrix between the hidden and output layers, and $b_1$ and $b_2$ represent the biases. According to the concepts mentioned in this section, the AE tries to learn a function $h_{W,b}(x) \approx x$. In other words, the AE is trained to learn an approximate function such that the network output is similar to the network input. By putting constraints on the AE, such as limiting the number of nodes in the hidden layer, the AE can obtain a low-dimensional feature of the data by compressing the high-dimensional input data. In Section 3, a large number of fault data have been obtained by fault simulation experiments. The remaining parts focus on establishing a suitable AE network that finds the relationship between the data and the fault characteristics by effectively learning the complex fault data.
Mathematical Problems in Engineering
The backpropagation algorithm is used for AE training. A training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of m training samples is assumed. The network can be trained using batch gradient descent.
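The encoding and decoding steps described above can be sketched in a few lines of plain Python. The layer sizes, the random initial weights, and the function names are hypothetical and serve only to illustrate the sigmoid mapping from input to hidden layer and from hidden layer to output; this is not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation S(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(W, b, v):
    """Compute S(W v + b); W is a list of rows, b and v are lists."""
    return [sigmoid(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(W, b)]


def encode(x, W1, b1):
    """Input layer -> hidden layer."""
    return layer(W1, b1, x)


def decode(a, W2, b2):
    """Hidden layer -> output layer (reconstruction of the input)."""
    return layer(W2, b2, a)


# Hypothetical sizes and random initial weights, for illustration only.
rng = random.Random(0)
n_in, n_hidden = 4, 2
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b2 = [0.0] * n_in

x = [0.2, 0.8, 0.5, 0.1]
a = encode(x, W1, b1)      # compressed representation (2 values)
x_rec = decode(a, W2, b2)  # reconstruction (4 values)
print(len(a), len(x_rec))  # 2 4
```

Because the hidden layer here has fewer nodes than the input layer, `a` is a compressed representation of `x`; before training, `x_rec` is of course not yet close to `x`.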
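For a single training example (x, y), the cost function can be defined as

$$J(W,b;x,y) = \frac{1}{2}\left\| h_{W,b}(x) - y \right\|^2. \tag{4}$$

For a training set of m samples, the overall cost function is

$$J(W,b) = \left[\frac{1}{m}\sum_{k=1}^{m} J\left(W,b;x^{(k)},y^{(k)}\right)\right] + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W^{(l)}_{ji}\right)^2, \tag{5}$$

where λ represents the weight decay coefficient that controls the relative importance of the two terms in equation (5), $W^{(l)}_{ji}$ represents the synaptic weight between the i-th neuron in layer l and the j-th neuron in layer l + 1, $n_l$ represents the number of layers in the AE (so that layer $n_l$ is the output layer of the network), and $s_l$ represents the total number of neurons in layer l. The first term in the definition of J(W, b) is an average sum-of-squares error term. The second term is a weight decay term that decreases the magnitude of the weights and prevents overfitting. The weight W and bias b are updated with gradient descent as follows:

$$W^{(l)}_{ij} = W^{(l)}_{ij} - \alpha\,\frac{\partial}{\partial W^{(l)}_{ij}} J(W,b),$$

$$b^{(l)}_{i} = b^{(l)}_{i} - \alpha\,\frac{\partial}{\partial b^{(l)}_{i}} J(W,b),$$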

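where α represents the learning rate. The partial derivatives in the equations above are derived as follows:

$$\frac{\partial}{\partial W^{(l)}_{ij}} J(W,b) = \left[\frac{1}{m}\sum_{k=1}^{m}\frac{\partial}{\partial W^{(l)}_{ij}} J\left(W,b;x^{(k)},y^{(k)}\right)\right] + \lambda W^{(l)}_{ij},$$

$$\frac{\partial}{\partial b^{(l)}_{i}} J(W,b) = \frac{1}{m}\sum_{k=1}^{m}\frac{\partial}{\partial b^{(l)}_{i}} J\left(W,b;x^{(k)},y^{(k)}\right),$$

where

$$\frac{\partial}{\partial W^{(l)}_{ij}} J(W,b;x,y) = a^{(l)}_{j}\,\delta^{(l+1)}_{i}, \qquad \frac{\partial}{\partial b^{(l)}_{i}} J(W,b;x,y) = \delta^{(l+1)}_{i},$$

where $a^{(l)}_j$ represents the activation of unit j in layer l and $\delta^{(l+1)}_i$ represents the error term of layer l + 1, given by

$$\delta^{(l)}_{i} = \left(\sum_{j=1}^{s_{l+1}} W^{(l)}_{ji}\,\delta^{(l+1)}_{j}\right) f'\left(z^{(l)}_{i}\right),$$

where $z^{(l)}_i$ represents the weighted input sum of unit i in layer l and $f'(\cdot)$ represents the derivative of the sigmoid function. The error term of the output layer $n_l$ is given by

$$\delta^{(n_l)}_{i} = -\left(y_i - a^{(n_l)}_{i}\right) f'\left(z^{(n_l)}_{i}\right),$$

where $a^{(l)}_i$ represents the activation of unit i in layer l, $a^{(n_l)}_i$ represents the activation of unit i in the output layer, and $z^{(n_l)}_i$ represents the weighted input sum of unit i in the output layer.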
Iterating the above update equations drives the output of the AE toward its input by minimizing the overall cost function (equation (5)).
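To make the training procedure concrete, the following sketch trains a very small AE with batch gradient descent, using a sum-of-squares reconstruction error plus a weight decay term and sigmoid error terms propagated backward from the output layer. It is a toy illustration under stated assumptions: the data set, layer sizes, learning rate `alpha`, decay coefficient `lam`, and iteration count are all hypothetical, and the target of each sample is the sample itself.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations and reconstruction for one sample."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, out


def overall_cost(data, W1, b1, W2, b2, lam):
    """Average squared reconstruction error plus weight decay."""
    err = sum(0.5 * sum((o - t) ** 2
                        for o, t in zip(forward(x, W1, b1, W2, b2)[1], x))
              for x in data) / len(data)
    decay = 0.5 * lam * sum(w * w for W in (W1, W2) for row in W for w in row)
    return err + decay


def train_step(data, W1, b1, W2, b2, alpha, lam):
    """One batch gradient descent update (weights are modified in place)."""
    m = len(data)
    gW1 = [[0.0] * len(W1[0]) for _ in W1]
    gW2 = [[0.0] * len(W2[0]) for _ in W2]
    gb1, gb2 = [0.0] * len(b1), [0.0] * len(b2)
    for x in data:
        h, out = forward(x, W1, b1, W2, b2)
        # Output error term: -(y - a) * f'(z), with f'(z) = a * (1 - a).
        d_out = [-(t - o) * o * (1.0 - o) for o, t in zip(out, x)]
        # Hidden error term: weighted sum of output errors times f'(z).
        d_hid = [sum(W2[j][i] * d_out[j] for j in range(len(d_out)))
                 * h[i] * (1.0 - h[i]) for i in range(len(h))]
        for j, dj in enumerate(d_out):
            gb2[j] += dj
            for i, hi in enumerate(h):
                gW2[j][i] += dj * hi
        for i, di in enumerate(d_hid):
            gb1[i] += di
            for k, xk in enumerate(x):
                gW1[i][k] += di * xk
    for W, gW in ((W1, gW1), (W2, gW2)):
        for r, row in enumerate(W):
            for c in range(len(row)):
                row[c] -= alpha * (gW[r][c] / m + lam * row[c])
    for b, gb in ((b1, gb1), (b2, gb2)):
        for i in range(len(b)):
            b[i] -= alpha * gb[i] / m


# Hypothetical toy data (4 inputs, 2 hidden units).
rng = random.Random(1)
data = [[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1], [0.9, 0.9, 0.1, 0.1]]
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]
b1, b2 = [0.0] * 2, [0.0] * 4

cost_before = overall_cost(data, W1, b1, W2, b2, lam=1e-4)
for _ in range(300):
    train_step(data, W1, b1, W2, b2, alpha=0.5, lam=1e-4)
cost_after = overall_cost(data, W1, b1, W2, b2, lam=1e-4)
print(cost_after < cost_before)
```

The comparison of the cost before and after training is the only check performed here; a practical implementation would use a matrix library and monitor the cost on held-out data.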

Improvement of AE.
As mentioned in Section 4.1, limiting the number of nodes in hidden layer is conducive to helping AE learn the relationship between the input data and fault features, because reducing the number of neurons can simplify the structure of the hidden layer and reduce the dimension of the input data.
Restricting the number of neurons can reduce the dimension of the data, but the network can then learn only a few features in the hidden layer. To preserve the diversity of the features in the hidden layer, a method called sparse constraint is introduced in this study to improve the AE. The main idea is not to reduce the number of neurons but to impose restrictions that limit the activity of the neurons and thus reduce the dimension of the input data. Accordingly, the original overall cost function (equation (5)) is modified by introducing an additional penalty term, given by

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{s}\mathrm{KL}\left(\rho\,\|\,\hat{\rho}_j\right),$$

where

$$\mathrm{KL}\left(\rho\,\|\,\hat{\rho}_j\right) = \rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j},$$

where $\sum_{j=1}^{s}\mathrm{KL}(\rho\,\|\,\hat{\rho}_j)$ represents the sparsity penalty term, β controls the weight of the sparsity penalty term, $\hat{\rho}_j$ represents the average activation of unit j in the hidden layer, ρ represents the sparsity parameter, and s represents the number of units in one hidden layer. The penalty term has the following property: if $\hat{\rho}_j = \rho$, then $\mathrm{KL}(\rho\,\|\,\hat{\rho}_j) = 0$; otherwise, its value increases monotonically with the difference between $\hat{\rho}_j$ and ρ. Therefore, the activations of the hidden units are kept sufficiently small when ρ is set close to zero.
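The behavior of the sparsity penalty can be checked numerically. The sketch below computes the Kullback-Leibler divergence between a target sparsity ρ and the average activation of each hidden unit and sums it over the hidden layer; the function names and the example activations are hypothetical, and activations are assumed to lie strictly between 0 and 1 (as they do for sigmoid units).

```python
import math


def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparsity_penalty(hidden_activations, rho, beta):
    """beta * sum_j KL(rho || rho_hat_j).

    hidden_activations: one hidden-layer activation vector per training sample.
    """
    m = len(hidden_activations)
    s = len(hidden_activations[0])
    rho_hat = [sum(h[j] for h in hidden_activations) / m for j in range(s)]
    return beta * sum(kl_bernoulli(rho, r) for r in rho_hat)


print(kl_bernoulli(0.1, 0.1))                           # 0.0
print(kl_bernoulli(0.1, 0.3) < kl_bernoulli(0.1, 0.5))  # True
```

The penalty vanishes when the average activation equals ρ and grows as the average activation moves away from it, which is the property used in the text.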
Random noise is introduced into the input data to make the network learn richer information and thus prevent the AE from learning only an equivalent representation of the original data. The main idea is to set a small number of nodes in the input layer to zero with a small probability. However, the probability of introducing random noise should be appropriate; otherwise, the noise may cause irreversible damage to the input data.
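Setting input nodes to zero with a small probability is commonly called masking noise. A minimal sketch, with a hypothetical function name and an explicit random generator so that runs are reproducible:

```python
import random


def mask_inputs(x, p, rng):
    """Return a copy of x in which each entry is set to 0 with probability p."""
    return [0.0 if rng.random() < p else value for value in x]


rng = random.Random(42)
x = [0.4, 0.7, 0.1, 0.9, 0.3]
print(mask_inputs(x, 0.0, rng) == x)                # True: no noise
print(mask_inputs(x, 1.0, rng) == [0.0] * len(x))   # True: everything masked
```

During training, the corrupted vector would be fed to the network while the reconstruction target remains the clean input `x`; that pairing is the usual denoising setup and is stated here as an assumption about how the noise is used.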
SAEs are deep neural networks consisting of multiple layers of the improved AEs, in which the output of each layer is wired to the input of the next layer. The SAE model is connected to a Softmax classifier to complete the construction of the deep neural network (Figure 7). The SAE model can identify faults in the coal mill by learning the labeled data obtained from the fault simulation experiments of the coal mill.
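The Softmax classifier at the top of the stack converts the scores of the last layer into class probabilities. The following sketch shows the computation; the class names and their order are hypothetical placeholders, and the scores are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most probable label and the full probability vector."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical class order.
labels = ("normal operation", "coal interruption", "coal choking")
label, probs = predict([0.1, 2.0, -1.0], labels)
print(label)  # coal interruption
```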

Data Preprocessing and Health State Definition.
In accordance with the fault simulation method described in Section 3, the simulation experiments are conducted repeatedly, and 5000 sets of experimental data are obtained. They comprise three kinds of data samples, in which the coal mill operates in the coal-broken condition (coal interruption), the full-of-coal condition (coal choking), and the normal condition (normal operation). To facilitate the training of the SAE model, the three operating conditions of the coal mill are labeled; the definition is shown in Table 4. The 5000 sets of data are randomly divided into training data and test data (Table 5). Two experiments are conducted to validate the proposed fault diagnosis method, and the test samples of the two experiments are the same.
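A random division of the labeled data into training and test sets can be sketched as below. The 5000 samples match the count in the text, but the training fraction, the integer labels, and the placeholder sample contents are hypothetical, since the actual label definition and split sizes are given in Tables 4 and 5.

```python
import random


def split_dataset(samples, train_fraction, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: (sample_id, label) with hypothetical labels 0, 1, 2.
samples = [(i, i % 3) for i in range(5000)]
train, test = split_dataset(samples, train_fraction=0.8)
print(len(train), len(test))  # 4000 1000
```

Fixing the seed keeps the test set identical across experiments, which mirrors the statement that both experiments use the same test samples.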

Establishment of the SAE Model.
The choice of SAE model has an important effect on the accuracy of fault identification. In [25,26], the unsupervised learning effect of the SAE is reported to be affected by model parameters such as the number of nodes in the input and hidden layers, the sparsity parameter, and the number of training iterations. The experimental data of Experiment 1 are used as training samples, and experiments are conducted to determine the optimum parameters of the SAE model. The evaluation index is the reconstruction error of the first layer of the SAE model, which is calculated by equation (4), and the experimental results are shown in Figure 8.
According to the analysis in Section 3, the variables that change significantly during the fault period of the coal mill include the differential pressure of primary air, the outlet temperature of the coal mill, and the current of the coal mill. Therefore, these three variables are used as the input nodes of the SAE model. The SAE model can learn more information when the number of nodes in the input layer is large. However, the number of nodes in the input layer cannot be increased indefinitely because of the computational complexity. Figure 8(a) shows that, when the number of nodes in the input layer increases from 40 to 100, the reconstruction error of the network decreases continuously. If the number of nodes in the input layer increases further, the reconstruction error remains unchanged.
The number of nodes in the hidden layer determines the degree to which the model compresses the input data. The degree of compression is high when the number of nodes in the hidden layer is small. An experiment is conducted using the first layer of the SAE. In this experiment, the input size is set to 120 on the basis of the experiment above, and the appropriate hidden layer parameters are determined by analyzing their influence on the reconstruction results.
As shown in Figure 8(b), when the number of hidden layer nodes is less than the number of input layer nodes, the reconstruction error fluctuates in a small range. This result indicates that the original data are compressed well when the number of hidden layer nodes is small, and this situation helps the model to learn the data characteristics. However, when the number of nodes in the hidden layer exceeds the number of nodes in the input layer, the reconstruction error increases rapidly, and the training effect is poor. The reason is that the sparsity parameter ρ is set to 0 at this time, so the activities of the neurons in the hidden layer cannot be limited, which leads to a poor compression effect of the SAE model on the original data. Considering both the complexity of the network structure and the computational efficiency, the number of hidden layers is set to three, and the numbers of nodes in these layers are 100, 50, and 25.
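As a quick bookkeeping check on the chosen structure, the number of trainable weights and biases in the encoder stack can be counted. The sketch assumes an input size of 120, the three hidden layers of 100, 50, and 25 nodes, and a three-class Softmax output (one node per health state); counting only the encoder path and the classifier, and not the decoders used during layer-wise pretraining, is an assumption of this example.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected stack with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


layer_sizes = [120, 100, 50, 25, 3]  # input, three hidden layers, Softmax output
print(count_parameters(layer_sizes))  # 18503
```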
The sparse constraint is introduced to improve the capability of the SAE model to compress input data. Figure 8(c) shows that, when the value of ρ is between 0.05 and 0.15, the reconstruction error of the network continues to decrease, which shows that the inhibitory effect on the neurons is appropriate. With a further increase in the value of ρ, the inhibitory effect on the neurons is excessive, and the reconstruction error increases rapidly.
Random noise is introduced into the input data to prevent the SAE model from learning only an equivalent representation of the original data. As shown in Figure 8(d), when the probability of introducing noise is in the range of 0 to 0.1, the reconstruction error decreases as the noise increases. However, with a further increase in noise, the reconstruction error increases rapidly because excessive noise causes irreversible damage to the raw data.
In combination with the analysis above, the key parameters of the SAE model are shown in Table 6. The allocation of the nodes in the input layer is shown in Table 7. [...] Figure 10.
The two sets of misdiagnosed samples contained in the rectangular box in Figure 9(a) correspond to the data contained in the rectangular box in Figure 10. The trend of the data during this period indicates that the differential pressure of primary air decreases, the outlet temperature of the coal mill rises, and the current of the coal mill decreases. These trends are consistent with the characteristics of the coal mill when a coal interruption fault occurs, as described in Section 3.1. Therefore, when normal operation samples are insufficient, the SAE cannot fully learn the differences between the two types of data and wrongly diagnoses the normal operation data as a coal interruption fault. Similarly, the trend of the data contained in the ellipse box in Figure 10 is consistent with the characteristics of coal choking: the differential pressure of primary air rises, the outlet temperature of the coal mill decreases, and the current of the coal mill decreases. Thus, the SAE can mistakenly diagnose the normal operation data as a coal choking fault. To improve the accuracy of fault diagnosis of the SAE, the training samples of normal operation are increased to 1500 groups, and a second experiment (Experiment 2) is conducted. Figure 9(b) shows that, although two sets of misdiagnosed samples are still present, the accuracy of fault diagnosis of the SAE has improved to 98.7%. Therefore, if the training samples continue to increase, the accuracy of fault diagnosis of the SAE will theoretically approach 100%.
To illustrate the effectiveness of the method that improves the performance of SAE, the accuracy rate of fault diagnosis of SAE before and after the algorithm improvement is compared.
The comparison results are shown in Table 8. The training and test samples are taken from the data in Experiment 1. Table 8 shows that, when no improvement is implemented in the SAE, the fault recognition rate is 84.02%. When the sparse constraint is introduced in the SAE, the fault recognition rate increases to 85.05%. After random noise is introduced into the input data, the fault recognition rate increases to 89.18%. When both improvements are introduced into the SAE, the fault recognition rate further increases to 95.4%. This analysis shows that the two improvements proposed in this study enhance the fault recognition capability of the SAE.
The coal mill is a system with a large delay. Detecting changes in the outlet pulverized coal flow of the coal mill to find an operation fault often cannot provide early warning. Through real-time monitoring of three fast-changing signals (the differential pressure of primary air, the outlet temperature of the coal mill, and the current of the coal mill), the trained SAE can find an operation fault in the coal mill in advance. Coal interruption is taken as an example in this study. As shown in Figure 11, a coal interruption fault is produced by artificial simulation. As a result, the outlet pulverized coal flow of the coal mill reduces to 0 at 110 s, while the output of the SAE jumps from the normal operation state to the coal interruption fault at 75 s. The network thus predicts the fault in the coal mill 35 s in advance. With the adjustment of the PID controllers, the outlet pulverized coal flow of the coal mill rises gradually and returns to the safety limit at 160 s. At this point, the output of the SAE returns to normal. Thus, the proposed method based on the deep learning algorithm can play an important role in the fault diagnosis of the coal mill.
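The warning lead time quoted above is the difference between the moment the monitored flow reaches zero and the moment the classifier first reports the fault. The sketch below computes it on synthetic series constructed to match the timings in the text (fault reported at 75 s, flow at zero at 110 s); the series themselves, the 5 s sampling step, and the label strings are hypothetical.

```python
def first_time(times, values, predicate):
    """Return the first time stamp whose value satisfies the predicate, else None."""
    for t, v in zip(times, values):
        if predicate(v):
            return t
    return None


def warning_lead_time(times, sae_output, coal_flow, fault_label):
    """Time by which the classifier output precedes the flow reaching zero."""
    t_detect = first_time(times, sae_output, lambda label: label == fault_label)
    t_zero = first_time(times, coal_flow, lambda flow: flow <= 0.0)
    if t_detect is None or t_zero is None:
        return None
    return t_zero - t_detect


# Synthetic series sampled every 5 s (arbitrary flow units).
times = list(range(0, 165, 5))
sae_output = ["coal interruption" if 75 <= t < 160 else "normal operation"
              for t in times]
coal_flow = [max(0.0, 110.0 - t) for t in times]

print(warning_lead_time(times, sae_output, coal_flow, "coal interruption"))  # 35
```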

Conclusions
In this study, a deep learning algorithm based on a data-driven model is proposed for the fault diagnosis of coal mills. On the basis of the mechanism model of coal mills, the fault operation of coal mills is simulated and numerous fault data are obtained. Thus, the difficulty of obtaining fault data with traditional methods is addressed. The performance of the SAE is improved by introducing sparse constraints and random noise in the input layer. At the same time, the accuracy of fault diagnosis of coal mills is effectively improved, which makes the prediction of faults in coal mills possible. The method proposed in the paper greatly improves the accuracy of the fault diagnosis of coal mills, which is of great significance for ensuring the safe operation of power plants. In addition, the proposed method is easy to generalize: complex mechanical equipment in other industrial fields can use this method for fault diagnosis. The method can reduce the number of sensors needed for fault diagnosis of large equipment as well as the investment of human resources, which helps to improve the economy and safety of the industry.
It should be noted that the paper does not consider online training of the SAE. The main reason is that online training of deep neural networks takes a lot of time, which places stricter requirements on the performance of computers and optimization algorithms. We therefore adopt a simplified approach that directly uses the offline-trained model for fault diagnosis, which greatly reduces the computational cost. However, an online training model can continuously improve the accuracy of the model [27][28][29][30][31] as data accumulate. Therefore, how to realize online training of the model will be the focus of our follow-up research.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.