The Bidirectional Information Fusion Using an Improved LSTM Model

Information fusion technology is of great significance in intelligent systems. Modern coal-fired power plants already have fully functional sensor networks. However, many data that are important for the operation of a power plant, such as coal quality, cannot be obtained directly. Therefore, information fusion technology needs to be introduced to obtain the implicit information of the power plant. As a practical application, this paper takes the soft measurement of coal quality as its research object. It proposes an improved LSTM model that combines bidirectional deep fusion, an alertness mechanism, and parameter self-learning (DFAS-LSTM) to realize online soft computing of the proximate (industrial) and ultimate (elemental) analyses of coal. First, a latent structure model is established to preprocess the noisy and redundant sensor network data. Second, an alertness mechanism is proposed, and a self-learning method for the activation function parameters is used for data feature extraction. Third, a deeply bidirectional fusion layer is added to the long short-term memory (LSTM) network to address its insufficient accuracy and weak generalization. The DFAS-LSTM model is established from the historical data of the sensor network. Then, online sensor network data are fed into the DFAS-LSTM model to perform online coal quality analyses. Experiments show that the accuracy of the coal quality analyses is increased by 1%–2.42% compared with the conventional bidirectional LSTM.


Introduction
The long short-term memory (LSTM) network is a variant of the recurrent neural network [1,2]. In recent years, there have been many studies on the LSTM. Reference [3] proposed a novel DC-Bi-LSTM model. Reference [4] developed a selective multimode long short-term memory network. Reference [5] studied a novel layered network to solve the problem of pedestrian trajectory prediction. Besides, for 3D motion recognition, reference [6] introduced a new gating mechanism into the LSTM to increase the reliability of the sequential input data and to adjust their effect on updating the long-term context information stored in the memory cell.
In the past few years, neural network models based on bidirectional long short-term memory (Bi-LSTM) networks have achieved excellent performance in their respective fields, demonstrating the vitality of the Bi-LSTM in sequential data processing [7,8].
Therefore, basic research on the bidirectional long short-term memory network is of great significance both for upgrading the performance of the neural network models built on it and for developing new neural network models in specific application fields.
In recent years, neural network technology has shown excellent performance in pattern recognition, automatic control, signal processing, auxiliary decision-making, and other fields [9,10]. In particular, coal is an important energy source worldwide, and there is still considerable room to improve the way it is utilized; neural network technology has practical value for improving this utilization. According to the data provided by BP p.l.c. for 2019, the world's total primary energy consumption was 583.88 EJ, of which coal consumption was 157.86 EJ (about 27%). Coal therefore plays an important role in primary energy consumption [11]. The main utilization mode of coal is combustion in coal-fired power plants [12]. The types of coal used in coal-fired power plants are complex and changeable. The boiler combustion system is the heart of a coal-fired power plant, and changes in coal quality directly affect the combustion state of the boiler as well as the stability, safety, and economy of the boiler combustion system. Timely acquisition of coal quality information is therefore of great significance for ensuring smooth operation and adequate fuel combustion in the boiler.
At the same time, it can also help improve the economic benefits of coal-fired power plants and the energy utilization rate of the coal-fired industry. Current coal quality measurement technologies include neutron activation analysis, laser-induced breakdown spectroscopy, and near-infrared spectroscopy [13,14]. These technologies realize real-time measurement of coal quality in coal-fired power plants, but all of them rely on expensive hardware.
For this application, an improved LSTM model with bidirectional fusion, an alertness mechanism, and parameter self-learning (DFAS-LSTM) is used for coal quality computing in coal-fired power plants. A real-time coal quality computing method has not yet been reported in the literature, and soft computing of coal quality in coal-fired power plants has been a major challenge for many years. However, coal-fired power plants generally have a certain degree of intelligence.
They have a strong sensor network composed of edge devices and central processing systems. The sensor network contains a large number of sensors of all categories, which provide a wealth of edge data for the system.
These edge data contain rich information about coal-fired power plant systems. Some of this information, such as coal quality information, is hidden in the edge data in the form of high-dimensional data features. Realizing the elemental analysis of coal quality through edge data mining avoids both interference with system operation and the cost increase caused by additional hardware, and it is cheap and convenient. To achieve this goal, this paper proposes a DFAS-LSTM model to mine coal quality-related information from the edge data of coal-fired power plants. The model uses the alertness mechanism to flag abnormal data, and it uses the improved activation function and the deeply bidirectional fusion structure of the LSTM to improve its performance. The logic diagram of the soft computing system is shown in Figure 1.
The rest of this paper covers related work, the data set and its latent structure, the establishment of the alertness mechanism, deeply bidirectional fusion LSTM modeling, experiments and result analyses, and conclusions.

Related Work
This section discusses the worldwide trend toward efficient energy use and related work by scholars from various countries on optimizing the LSTM structure and coal utilization.
At present, the world is undergoing economic transformation and moving toward efficient energy use. Sustainable economic development places higher requirements on the rational and efficient use of primary energy. According to the statistics of the International Energy Agency (IEA), the energy intensities of some countries are shown in Figure 2.
Among primary energy sources, coal is mainly used for power generation in coal-fired power plants. Coal-fired power plants face increasingly severe challenges, and they need to introduce intelligent technologies and methods to adapt to the historical trend toward efficient energy utilization.
In recent years, the generation capacity of wind and solar power has continued to grow. However, because renewable generation from wind and solar power is highly volatile, coal-fired power plants must cover the gap between renewable generation and load to keep the frequency and the power supply stable [15]. Therefore, coal-fired power generation is still irreplaceable.
According to survey data from BP p.l.c., the growth of renewable energy, driven by wind and solar energy, reached a record level in 2019, accounting for more than 40% of primary energy growth [16]. This also means that the ever-changing energy structure places higher demands on coal-fired power generation technology.

Optimization of the LSTM Model.
Aiming at increasing the depth in the time dimension, reference [17] extended a highway LSTM (HW-LSTM) model by adding highway networks inside an LSTM and used it for language modeling. As an application of the recursive neural network, reference [18] captured behavior trajectories through a large context window, thereby alleviating data sparseness and improving robustness. Combining the attention mechanism with a character-level convolutional neural network, reference [19] proposed several new classification architectures based on the long short-term memory (LSTM) language model and the gated recurrent unit (GRU) language model. Reference [20] formulated precipitation nowcasting as a spatiotemporal sequence forecasting problem. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, the authors proposed the convolutional LSTM (ConvLSTM) and applied it to precipitation nowcasting. Reference [21] presented a novel unified framework (LSTM-E) that jointly explores LSTM learning and visual-semantic embedding. For energy load forecasting, reference [22] presented a novel forecasting model based on long short-term memory algorithms. Reference [23] designed an architecture that can generate sequence samples while simultaneously classifying a given sequence.

Optimization of the Coal Energy Utilization.
At present, power plants are becoming increasingly intelligent. Reference [24] developed a dynamic model of the drum boiler using NARX neural networks, which can forecast the actual pressure and water level of the drum boiler. Reference [25] proposed a pilot program aimed at developing a comprehensive knowledge base for power plants using principal component analysis and artificial neural networks. The program used principal component analysis to filter noise in the prediagnosis stage and evaluated the neural network model on representative data of the power plant. Reference [26] developed a diagnosis system for power plant gas turbines to detect turbine deterioration; using an artificial neural network, the system can predict the deterioration of the main components. Reference [27] proposed a new method for predicting the output of a power plant using a feedforward neural network. The network used ambient temperature, atmospheric pressure, relative humidity, and vacuum as input parameters to predict the average hourly output of the power plant.
At present, coal quality information of a coal-fired power plant cannot be obtained in real time. As a result, untimely coal quality information lacks guiding value for the optimization of coal combustion, which leads to the underutilization of the coal in coal-fired power plants. As a method to improve the utilization of the coal, an online soft measurement model for coal quality is proposed in this paper.
The system framework is shown in Figure 3.

Data Set and Its Latent Structure
This section discusses data acquisition and preprocessing, including the selection of conventional measurement points, principal component analysis, and independent component analysis.

Selection of Conventional Measurement Points.
The sensor network of a coal-fired power plant has many kinds of conventional measurement points, and the measured data are highly redundant. Most conventional measurement points have no obvious correlation with coal quality and offer no guidance for coal quality soft computing. Therefore, it is necessary to screen the conventional measurement points of the coal-fired power plant and obtain effective data from the monitoring points. The operation of a coal-fired power plant involves complex physical and chemical reaction processes, and existing research on its operating mechanism provides a basis for selecting the conventional measurement points [28]. According to the laws of energy conservation and mass conservation and the actual process flow of the plant, 190 conventional measurement points related to coal quality are selected from the sensor network of the coal-fired power plant. Some of the relevant measurement points used in coal quality soft computing are shown in Table 1.
In coal-fired power plants, the sensors at conventional measurement points work on various principles and in complex environments, which results in serious data redundancy and noise. At the same time, the range and accuracy of the data from different measurement points differ significantly, as does their correlation with coal quality. To ensure the performance of the model, the data of the conventional measurement points must be preprocessed.
Table 1: Some relevant measurement points used in soft computing of the coal quality.
No.  Name/unit
1  Active power of the generator (MW)
2  Total power of the generator (MW)
3  Total coal supply (t/h)
4  Current of the coal feeder (A)
5  Current of the coal mill (A)
6  Primary air volume at the mill inlet (t/h)
7  Primary air temperature at the mill inlet (°C)
8  Steam temperature at the outlet of the final reheater (left side) (°C)
9  Steam temperature at the outlet of the final superheater (left side) (°C)
10  Flow of the feed water (t/h)
11  Temperature of the feed water (°C)
12  Pressure of the feed water (MPa)
13  Drum pressure (MPa)
14  Drum water level (mm)
15  Secondary air flow rate (t/h)
16  Desuperheating water temperature of the superheater (°C)
17  Outer wall temperature of the high-pressure cylinder (°C)
18  Inner wall temperature of the high-pressure cylinder (°C)
19  Inlet air temperature of the secondary heater (°C)
20  Primary air pressure at the outlet of the air preheater (MPa)
21  Secondary air pressure at the inlet of the air preheater (MPa)
22  Outlet air pressure of the forced draft fan (MPa)
23  Current of the supply fan (A)
24  Current of the primary fan (A)
25  Outlet air pressure of the primary fan (MPa)
26  Differential pressure between the secondary air box and the furnace (MPa)
27  Inlet air temperature of the primary fan (°C)
28  Outlet air temperature of the primary air heater (°C)
29  Inlet air temperature of the primary air heater (°C)
30  Negative pressure at the outlet of the extension end low-temperature superheater (MPa)
31  Negative pressure at the outlet flue of the fixed end superheater (MPa)

Mobile Information Systems

For the monitoring data from conventional measurement points in coal-fired power plants, bad points are eliminated first. The data preprocessing model does this mainly through the following steps:
(1) According to the measurement range of each measurement point, remove the data that obviously deviate from that range.
(2) Based on the actual operating experience of the power plant, remove the data that obviously deviate from the experience value under the current working condition.
At the same time, to shorten the data processing time and remove redundant information from the monitoring data, this paper proposes a feature extraction method for the monitoring data based on a latent structure model. Two common methods, principal component analysis (PCA) [29,30] and independent component analysis (ICA) [31,32], are used in this latent structure model.
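The two bad-point rules in steps (1) and (2) can be sketched as a simple row filter. This is a minimal illustration, not the plant's actual procedure: the function name, the per-point ranges, the experience values, and the tolerances are all hypothetical, and the sketch assumes that a sample (one sampling time) is dropped as a whole when any of its points fails either rule.

```python
import numpy as np


def remove_bad_points(data, lower, upper, expected, tolerance):
    """Drop samples that violate the two bad-point rules (hypothetical helper).

    data      : (N, M) array, one row per sampling time, one column per measurement point
    lower     : (M,) lower limits of each point's measurement range        -> rule (1)
    upper     : (M,) upper limits of each point's measurement range        -> rule (1)
    expected  : (M,) experience values for the current working condition   -> rule (2)
    tolerance : (M,) largest acceptable absolute deviation from `expected`  -> rule (2)
    """
    data = np.asarray(data, dtype=float)
    in_range = (data >= lower) & (data <= upper)
    near_experience = np.abs(data - expected) <= tolerance
    keep = np.all(in_range & near_experience, axis=1)  # every point of the sample must pass
    return data[keep], keep


# Two points, e.g. generator power (MW) and coal supply (t/h); the values are made up.
samples = [[300.0, 120.0],
           [310.0, 500.0],   # coal supply outside its measurement range
           [-5.0, 118.0],    # negative power: outside its measurement range
           [305.0, 125.0]]
clean, keep = remove_bad_points(samples,
                                lower=np.array([0.0, 0.0]),
                                upper=np.array([700.0, 300.0]),
                                expected=np.array([300.0, 120.0]),
                                tolerance=np.array([50.0, 30.0]))
print(keep.tolist())  # [True, False, False, True]
```

Dropping whole samples keeps every remaining row complete for the later PCA step. Masking individual values and imputing them would be an alternative design.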

Principal Component Analysis.
In coal-fired power plants, there are many conventional measurement points.
The soft computing method, which obtains coal quality-related information through edge data mining, requires processing high-dimensional data. Principal component analysis is a commonly used dimensionality reduction method that maximizes the variance of the data after dimensionality reduction. The PCA algorithm maps the raw data to a low-dimensional feature space while retaining most of the information. At the same time, it compresses the data and prevents the subsequent neural network from having too many parameters.
In this paper, the data of the conventional measurement points and the coal quality test are used as the test data of the model. Set the data set of the conventional measurement points of the coal-fired power plants after data preprocessing as U.
$$
U=\begin{bmatrix}
X_{0,t_0} & X_{0,t_1} & \cdots & X_{0,t_{N-1}}\\
X_{1,t_0} & X_{1,t_1} & \cdots & X_{1,t_{N-1}}\\
\vdots & \vdots & \ddots & \vdots\\
X_{M-1,t_0} & X_{M-1,t_1} & \cdots & X_{M-1,t_{N-1}}
\end{bmatrix}
$$
where $\{X_{M-1,t_0}, X_{M-1,t_1}, \ldots, X_{M-1,t_{N-1}}\}$ are the data of the $M$th conventional measurement point of the coal-fired power plant at the sampling times $t_0, t_1, \ldots, t_{N-1}$, respectively; $N$ is the number of samples, and $M$ is the total number of conventional measurement points. The correlation matrix $R$ of $U^T$ is then calculated; it is an $M \times M$ matrix whose element $r_{ij}$ is the correlation coefficient between measurement points $i$ and $j$ over the $N$ samples. Find the eigenvalues $\lambda_j$ of $R$, sort them in descending order, select the $d$ largest ones, and calculate the eigenvectors corresponding to these $d$ eigenvalues. After normalization, denote them by $U_j$, $j = 1, 2, \ldots, d$. The transformation matrix $A$ is composed of the $U_j$:
$$
A = [U_1, U_2, \ldots, U_d].
$$

Table 1 (continued).
No.  Name/unit
32  Temperature of the primary air (°C)
33  Temperature of the hot secondary air (°C)
34  Primary air duct pressure of the furnace (MPa)
35  Main pipe pressure of the hot primary air (MPa)
36  Outlet pipe temperature of the reheater (left side) (°C)
37  Outlet pressure of the reheater (left side) (MPa)
38  Working environment temperature of the coal mill (°C)
39  Inlet temperature of the reheater desuperheater (left side) (°C)
40  Desuperheating water flow of the reheater (left side) (t/h)
41  Primary air pressure at the mill inlet (MPa)
42  Pressure of the reheater inlet (MPa)
43  Inlet steam temperature of the low-pressure cylinder (°C)
44  Exhaust temperature of the high-pressure cylinder (°C)
45  Exhaust pressure of the high-pressure cylinder (MPa)
46  Outlet air temperature of the secondary heater (°C)
47  Steam pipe pressure of the high and intermediate pressure cylinder (MPa)
48  Steam pipe temperature of the high and intermediate pressure cylinder (°C)
49  Differential pressure between the secondary air box and the furnace (MPa)
50  Oxygen concentration at the chimney inlet (%)
51  Outlet flue gas temperature of the air preheater (°C)
52  Outlet flue gas pressure of the air preheater (kPa)
53  Flue gas temperature at the inlet of the air preheater (°C)
54  Inlet flue gas pressure of the air preheater (kPa)
55  Main motor current of the air preheater (A)
56  Current of the induced draft fan (A)
57  Flue gas pressure at the inlet of the induced draft fan (kPa)
58  Low-pressure cylinder exhaust temperature (°C)
59  Inlet flue pressure of the fixed end economizer (MPa)
60  Feed water temperature at the economizer inlet (°C)
61  Pressure of the main water supply pipe at the economizer inlet (MPa)
62  Inlet flue pressure of the economizer at the expansion end (MPa)
The K-L transformation is applied to the sample set $U^T$, giving the transformed matrix
$$
I = U^T A,
$$
where each row of $I$ is the low-dimensional data of one sampling time obtained after principal component analysis, so the data dimension is reduced to $d$. Principal component analysis removes the correlation of the data using only second-order statistics. The processed data may therefore still contain higher-order redundant information, and their components may not be mutually independent. For this reason, independent component analysis is applied to the output of principal component analysis to obtain independent components.
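The PCA steps above (correlation matrix of $U^T$, eigenvalues sorted from large to small, the $d$ leading unit eigenvectors collected in $A$, then the K-L transform) can be sketched with NumPy as follows. This sketch runs on synthetic data, and the function name is hypothetical. It projects the standardized samples rather than the raw ones, which is the usual pairing when PCA is based on a correlation matrix.

```python
import numpy as np


def pca_kl_transform(U, d):
    """PCA on U (M points x N samples) via the correlation matrix of U^T.

    Returns I (N x d), the transformation matrix A (M x d), and all eigenvalues
    sorted from large to small.
    """
    X = U.T                                        # (N, M): one row per sampling time
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized samples
    R = np.corrcoef(X, rowvar=False)               # (M, M) correlation matrix of U^T
    eigvals, eigvecs = np.linalg.eigh(R)           # R is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # sort eigenvalues from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :d]                             # unit eigenvectors U_1..U_d as columns
    I = Z @ A                                      # K-L transform: each row is d-dimensional
    return I, A, eigvals


rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                 # two hidden factors (synthetic)
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(500, 6))
U = X.T                                            # M = 6 points, N = 500 samples
I, A, eigvals = pca_kl_transform(U, d=2)
print(I.shape, eigvals[:2].sum() / eigvals.sum())  # share of total variance kept by d = 2
```

Because the columns of `A` are orthonormal eigenvectors of `R`, the columns of `I` are uncorrelated, with variances equal to the retained eigenvalues. This is exactly the second-order decorrelation mentioned above.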

Independent Component Analysis.
Independent component analysis (ICA) is a data analysis method that aims to make the components of the processed data independent. This paper uses the Fast-ICA algorithm as an example to describe the implementation of ICA. The Fast-ICA algorithm, also known as the fixed-point algorithm, is a fast iterative optimization algorithm with variants based on kurtosis, likelihood, and negentropy (negative entropy). This section uses the negentropy-based Fast-ICA algorithm. Suppose the data to be processed are $X$. The Fast-ICA algorithm determines the separation matrix $W$ from the observed data $X$, and the result $Y$ of independent component analysis is
$$
Y = WX.
$$
According to information theory, among random variables with the same variance, a Gaussian random variable has the largest differential entropy. According to the central limit theorem, a mixture of independent components tends to be more Gaussian than the components themselves. Therefore, the stronger the non-Gaussianity of $Y$, the greater its negentropy and the stronger its independence. The negentropy-based Fast-ICA algorithm therefore uses the maximization of negentropy as its criterion. Negentropy is defined as
$$
J_g(Y) = H(Y_{\mathrm{Gauss}}) - H(Y),
$$
where $J_g(Y)$ is the negentropy, $Y_{\mathrm{Gauss}}$ is a Gaussian random variable with the same variance as $Y$, and $H(\cdot)$ denotes differential entropy. In practical applications, to avoid using the unknown probability density function of $Y$, negentropy is approximated by
$$
J_g(Y) \propto \left[E\{G(Y)\} - E\{G(Y_{\mathrm{Gauss}})\}\right]^2,
$$
where $E$ denotes the mean (expectation) and $G$ is a nonlinear function.
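Because the data are generally standardized (centered and whitened) before Fast-ICA is applied, the optimization for one row $w$ of $W$ is subject to the constraint
$$
E\{(w^T X)^2\} = \|w\|^2 = 1.
$$
Maximizing the approximate negentropy under this constraint with the method of Lagrange multipliers yields equation (10):
$$
E\{X\, g(w^T X)\} - \beta w = 0, \tag{10}
$$
where $g$ is the derivative of $G$ and $\beta$ is the Lagrange multiplier.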
In practice, equation (10) is solved with a fixed-point (Newton-type) iteration, which performs independent component analysis on the data.
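Equation (10) is usually solved with the fixed-point update $w \leftarrow E\{X g(w^T X)\} - E\{g'(w^T X)\}\,w$, followed by renormalization. The sketch below implements this one-unit, negentropy-based iteration with $G(u) = \log\cosh u$ (so $g = \tanh$) and extracts several components by deflation. The choice of $G$, the whitening step, the function names, and the synthetic sources are assumptions for illustration, not settings taken from this paper.

```python
import numpy as np


def whiten(X):
    """Center X (n_signals x n_samples) and whiten it so that E{z z^T} = I."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    return E @ np.diag(d ** -0.5) @ E.T @ Xc


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Negentropy-based Fast-ICA with G(u) = log cosh(u), g = tanh, and deflation.

    Returns the estimated components Y = W Z and the separation matrix W
    (acting on the whitened data Z).
    """
    Z = whiten(X)
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[0]))
    for p in range(n_components):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = w @ Z                                    # projections w^T z for every sample
            g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
            # fixed-point form of E{z g(w^T z)} - beta w = 0
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)           # deflation: orthogonal to earlier rows
            w_new /= np.linalg.norm(w_new)               # constraint ||w|| = 1
            converged = abs(abs(w_new @ w) - 1.0) < tol  # direction no longer changes (up to sign)
            w = w_new
            if converged:
                break
        W[p] = w
    return W @ Z, W


t = np.linspace(0, 50, 5000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])   # two synthetic sources
X = np.array([[1.0, 1.0], [0.5, 2.0]]) @ S                # observed mixtures
Y, W = fast_ica(X, n_components=2)
match = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])   # |corr| of each source vs. each estimate
print(match.round(3))
```

ICA recovers components only up to order and sign, so the check compares absolute correlations rather than the raw signals.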

Establishment of the Alertness Mechanism
The basic theory of the alertness mechanism originates from philosophy, cognitive psychology, social science, and linguistics. It is a new mechanism that strengthens the key information in the data based on prior knowledge [33]. The data used for online coal quality measurement are actual operating data of a coal-fired power plant. During actual operation, the segmented control behavior of the control system or subjective actions of the operators destroy the regularity of the raw time series in some places. This damage reduces the accuracy of the model. Because it can be anticipated from prior knowledge, however, it is feasible to alert the model to it. This paper therefore introduces the alertness mechanism into data processing to optimize the model. The implementation is as follows: (1) According to prior knowledge of coal-fired power plant operation, determine the data locations of the conventional measurement points that need active alerting, such as the change points of the total coal input of the coal mill, the plant load, and the boiler water level. (2) Define the active alert matrix C. The alert matrix is sparse, and each of its elements is an alert weight. Elements at positions that need alerting are nonzero, and all other elements are 0.
Here $W_{i,t_0}, W_{j,t_1}, \ldots, W_{h,t_{N-1}}$ are the alert weights, and the subscripts $i, j, h$ and $t_0, t_1, \ldots, t_{N-1}$ indicate the location and time of the alerted data. (3) Modify the data by applying the alertness mechanism to the feature vectors extracted by the latent structure model. Let the input of the alertness mechanism be the matrix $I$ composed of $I_{t_0}, I_{t_1}, \ldots, I_{t_{N-1}}$, which are $d$-dimensional feature vectors. The data after the introduction of the alertness mechanism are given by equation (13), where $N$ is the number of samples, $I_{t_0}, I_{t_1}, \ldots, I_{t_{N-1}}$ represent the sampling time series, $I$ is the original data, and $W$ is the alert weight.
(4) The output of the alertness mechanism is the input of the long short-term memory network in the model. During training, the weights of the alertness mechanism and the neural network parameters are updated by gradient descent.
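Steps (1) to (4) can be sketched in NumPy as below. Equation (13) is not spelled out above, so the combination rule $(1 + C) \odot I$ used here is an assumption. The change-point threshold rule, the initial weight 0.5, and all function names are also hypothetical. In the actual model, the gradient with respect to $C$ would come from backpropagation through the LSTM; here a placeholder gradient is passed in.

```python
import numpy as np


def change_points(signal, threshold):
    """Hypothetical prior-knowledge rule for step (1): flag the sampling times at
    which a monitored quantity (e.g. total coal input) jumps by more than `threshold`."""
    jumps = np.abs(np.diff(signal)) > threshold
    return np.flatnonzero(jumps) + 1            # index of the first sample after each jump


def build_alert_matrix(d, n_samples, alert_times, init_weight=0.5):
    """Step (2): sparse alert matrix C (d x N); nonzero only at alerted positions."""
    C = np.zeros((d, n_samples))
    C[:, alert_times] = init_weight             # prior-knowledge initial alert weight
    return C


def apply_alertness(I, C):
    """Step (3) under an ASSUMED rule (1 + C) * I: alerted entries are re-weighted,
    all other entries pass through unchanged. The paper's equation (13) may differ."""
    return (1.0 + C) * I


def update_alert_weights(C, grad_C, mask, lr=0.01):
    """Step (4): one gradient-descent step that only touches alerted positions,
    so C stays sparse during training."""
    return C - lr * grad_C * mask


coal_input = np.array([100.0, 100.0, 101.0, 130.0, 131.0, 130.0, 100.0, 100.0])
alert_times = change_points(coal_input, threshold=10.0)   # -> [3, 6]
I = np.ones((3, coal_input.size))                        # d = 3 feature vectors as columns
C = build_alert_matrix(3, coal_input.size, alert_times)
I_alert = apply_alertness(I, C)
C = update_alert_weights(C, grad_C=np.ones_like(C), mask=C != 0, lr=0.1)
```

Keeping an explicit mask, instead of recomputing it from the current values of `C`, means an alert weight that happens to pass through zero during training is still updated afterwards.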
The alertness mechanism proposed in this paper is inspired by the attention mechanism in neural networks, but it differs clearly from it [34,35]. The main differences between the alertness mechanism and the attention mechanism are as follows: (1) The initial alertness weights are set from prior knowledge of the practical problem, whereas the attention mechanism does not need the support of prior knowledge.
(3) The alertness mechanism acts on specific discrete data points, whereas the attention mechanism is applied to all or part of the data. (4) The purpose of the alertness mechanism is to reduce the damage caused by subjective operator actions or the segmented control behavior of the control system in coal-fired power plants, whereas the general purpose of the attention mechanism is to focus on the key information in the data and enhance the data mining ability of the neural network. The alertness mechanism enhances the stability of the trained model and suppresses the influence of abnormal data fluctuations on the accuracy of the model output. To a certain degree, it reduces the damage that unpredictable factors do to data regularity and improves the performance of the DFAS-LSTM model proposed in this paper.

Deeply Bidirectional Fusion LSTM Modeling
This section discusses the establishment of the deeply bidirectional fusion LSTM, including the improved activation function with parameter self-learning, the structure of the deeply bidirectional fusion LSTM, and the Encoder-Decoder framework with attention mechanism.

Improved Activation Function of the Parameter Self-Learning.
Long short-term memory (LSTM) is a variant of the recurrent neural network. Unlike the traditional recurrent neural network, the LSTM network uses three gates: the input gate, the output gate, and the forget gate. In addition to the original short-term memory, a memory cell is added to maintain long-term memory. Reference [36] studied recent LSTM variants, summarized the results of 5400 experimental runs, and found that the forget gate and the output activation function are the most critical components. Compared with traditional recurrent neural networks, the LSTM network uses a gate structure, which enhances its selective memory ability and overcomes the tendency of traditional recurrent neural networks toward exploding and vanishing gradients on long sequences. Therefore, it has a unique advantage in long-term sequential problems. The activation function is an indispensable part of the LSTM network, and the activation function of its input gate plays an important role in mapping the input to the neuron state [37]. The tanh function is a commonly used activation function in bidirectional LSTM networks. Its expression is given in equation (14):
$$
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{14}
$$
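The gate structure described above can be made concrete with one step of a standard LSTM cell. The NumPy sketch below uses the common formulation (sigmoid gates, tanh candidate and output activation). The stacked weight layout and the function names are illustrative choices, not details from this paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One step of a standard LSTM cell.

    Wx: (4H, D), Wh: (4H, H), b: (4H,); rows stacked as [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: how much new information enters the cell
    f = sigmoid(z[H:2 * H])          # forget gate: how much old memory is kept
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much memory is exposed
    g = np.tanh(z[3 * H:4 * H])      # candidate state (tanh activation)
    c = f * c_prev + i * g           # memory cell carries long-term information
    h = o * np.tanh(c)               # hidden state (short-term output)
    return h, c


def lstm_forward(xs, Wx, Wh, b):
    """Run the cell over a sequence xs (T, D) and return all hidden states (T, H)."""
    H = Wh.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, Wx, Wh, b)
        outputs.append(h)
    return np.array(outputs)


rng = np.random.default_rng(1)
D, H, T = 3, 4, 5
Wx = rng.normal(scale=0.5, size=(4 * H, D))
Wh = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
hs = lstm_forward(rng.normal(size=(T, D)), Wx, Wh, b)
```

Running the same cell (with its own weights) over the time-reversed sequence gives the reverse direction of a Bi-LSTM.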
The tanh activation function and its derivative curve are shown in Figure 4.
As shown in Figure 4, the tanh function has a wide saturation region in which its derivative is almost zero. Because the neural network updates its weights by gradient descent, weight updates become very slow once the activation function enters the saturation region. When tanh is used as the activation function and many inputs fall in this region, the weight parameters may stall because of the slow weight correction, resulting in longer training time or even failure to train. To solve these problems of the tanh activation function, this paper proposes an improved multiparameter hyperbolic tangent activation function f(x) with three parameters: λ regulates the output amplitude of the activation function, c regulates the scale of the independent variable, and η regulates the gradient of the activation function, reflecting its gradient limit. The curve of the improved activation function f(x) is shown in Figure 5. λ, c, and η are adjustable parameters of the neural network, and their values affect the performance of the model. In this paper, λ, c, and η are set as optimization variables. After their initial values are set, they are optimized by gradient descent during training; once training is complete, their values are fixed in the model. The optimization of λ, c, and η during training is shown in Figure 6.
It can be seen from Figure 6 that the parameters λ, c, and η of the improved multiparameter activation function are self-learned during training. The parameter correction is relatively slow; in this model, λ, c, and η stabilize after about 50000 steps. After training converges, the activation function parameters are approximately λ ≈ 0.2442114, c ≈ 2.82857346, and η ≈ 0.01878586. The improved hyperbolic tangent activation function avoids the training difficulty caused by gradient saturation. At the same time, the model optimizes these parameters on its own, which weakens the influence of subjectively chosen parameters on the performance of the neural network. The parameter self-learning method proposed in this paper also applies to some common hyperparameters and provides a new idea for hyperparameter optimization in neural networks.
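Parameter self-learning can be illustrated with a small NumPy sketch. The expression of f(x) is not reproduced in the text above, so this sketch assumes the form f(x) = λ·tanh(c·x) + η·x. That form matches the stated roles (λ sets the amplitude, c scales the input, and the slope tends to η in the saturation region), but it is a hypothetical stand-in, not necessarily the paper's formula. The parameters are fitted by plain gradient descent to a synthetic target. In the DFAS-LSTM, the gradients would instead come from the network's training loss.

```python
import numpy as np


def f(x, lam, c, eta):
    """ASSUMED multiparameter tanh: lam scales the output amplitude, c scales the
    input, and the linear term eta * x keeps the slope from vanishing."""
    return lam * np.tanh(c * x) + eta * x


def df_dx(x, lam, c, eta):
    """Slope of f; tends to eta (the gradient limit) in the saturation region."""
    return lam * c * (1.0 - np.tanh(c * x) ** 2) + eta


def param_grads(x, lam, c, eta):
    """Partial derivatives of f with respect to lam, c, and eta."""
    t = np.tanh(c * x)
    return t, lam * x * (1.0 - t ** 2), x


def fit_params(x, y, lam=1.0, c=1.0, eta=0.01, lr=0.05, steps=2000):
    """Self-learn (lam, c, eta) by gradient descent on the mean squared error."""
    for _ in range(steps):
        err = f(x, lam, c, eta) - y
        g_lam, g_c, g_eta = param_grads(x, lam, c, eta)   # evaluated before any update
        lam -= lr * 2.0 * np.mean(err * g_lam)
        c -= lr * 2.0 * np.mean(err * g_c)
        eta -= lr * 2.0 * np.mean(err * g_eta)
    return lam, c, eta


x = np.linspace(-3.0, 3.0, 200)
y = f(x, 0.8, 1.5, 0.05)                  # synthetic target, not plant data
loss_start = np.mean((f(x, 1.0, 1.0, 0.01) - y) ** 2)
lam, c, eta = fit_params(x, y)
loss_end = np.mean((f(x, lam, c, eta) - y) ** 2)
print(loss_start, loss_end)
```

At x = 10, the slope of plain tanh is below 1e-7, whereas `df_dx` never drops below η. This is the gradient-saturation effect the improved function is meant to avoid.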

Structure of the Bidirectional Deep Fusion LSTM.
When deep learning methods are used for sequence problems, the RNN is a common and effective choice. During the operation of a coal-fired power plant, it takes a long time for the material and energy of the coal to be completely converted, so information about coal quality is scattered over a long time series. As a kind of RNN, the LSTM performs better than the traditional RNN on long sequences. Compared with unidirectional LSTM networks, bidirectional long short-term memory (Bi-LSTM) networks also consider the relevance of the data in the reverse direction, which helps to fully mine the relevance of the data in both the forward and reverse directions [38]. On the basis of the Bi-LSTM, this paper proposes the deeply bidirectional fusion LSTM structure shown in Figure 7. This structure is an important part of the DFAS-LSTM proposed in this paper: it uses a fusion layer to fuse the forward and reverse data in the hidden layers of the model. By contrast, the traditional bidirectional LSTM is essentially two independent unidirectional LSTM networks, with no bidirectional data fusion in its hidden layers. Splitting the forward and reverse data in this way hinders the ability of the hidden layers to extract bidirectional data features. The deeply bidirectional fusion LSTM structure in the DFAS-LSTM model overcomes this defect through its fusion structure, which gives the network a stronger ability to represent data features.
Deeply bidirectional fusion LSTM structure is the core structure of the DFAS-LSTM model, which consists of the input layer, forward LSTM layer, reverse LSTM layer, bidirectional data fusion layer, and output layer. Among them, the fusion layer is the key structure to realize bidirectional data fusion in DFAS-LSTM model and also the key to distinguish the deeply bidirectional fusion LSTM from the traditional deeply bidirectional LSTM.
The fusion layer contains bidirectional fusion weights, a sigmoid function, and an Encoder-Decoder unit. Its input is the output of the bidirectional LSTM neurons in the layer above, and its output is the input of the LSTM neurons in the layer below. After the data enter the fusion layer, the forward and reverse neuron outputs are weighted by their fusion weights and superposed, and the sigmoid function then outputs the fused vector.
This vector is then mapped to vector data of the specified dimension by the Encoder-Decoder framework, which introduces the attention mechanism.
This output is connected to the bidirectional LSTM neuron nodes of the lower layer. The structure of the bidirectional data fusion layer is shown in Figure 8.
To characterize the working process of the fusion layer precisely, its mathematical expressions are given. Suppose the inputs of the fusion layer, that is, the outputs of the previous layer, are $Y_f$ and $Y_b$, respectively. Then the fusion layer operates as follows:

$$Y_f = \{y_{f1}, y_{f2}, y_{f3}, \ldots, y_{fn}\},$$
$$Y_b = \{y_{b1}, y_{b2}, y_{b3}, \ldots, y_{bn}\},$$
$$Y = \operatorname{sigmoid}\left(W_f Y_f + W_b Y_b\right),$$

where $Y$ is the input of the Encoder-Decoder framework, $Y_f$ is the forward output matrix, $Y_b$ is the reverse output matrix, $W_f$ is the forward fusion weight, and $W_b$ is the reverse fusion weight. The sigmoid function strengthens the nonlinear fitting ability of the fusion layer, bounds the range of the fusion result, and enhances the representation ability of the model.
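The fusion step can be sketched in plain Python as below. This sketch assumes that `W_f` and `W_b` are matrices shared across all time steps, that each forward output `y_ft` and reverse output `y_bt` is a vector, and that no bias term is used, since the text does not mention one. The function name `fusion_layer` is hypothetical, and the Encoder-Decoder step that follows the sigmoid is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def fusion_layer(Y_f, Y_b, W_f, W_b):
    """Fuse forward and reverse outputs per step: y_t = sigmoid(W_f y_ft + W_b y_bt)."""
    if len(Y_f) != len(Y_b):
        raise ValueError("forward and reverse sequences must have the same length")
    fused = []
    for y_f, y_b in zip(Y_f, Y_b):
        # Weighted superposition of the two directions at this time step.
        z = [a + b for a, b in zip(matvec(W_f, y_f), matvec(W_b, y_b))]
        fused.append([sigmoid(v) for v in z])  # squash each element into (0, 1)
    return fused
```

Because the weighted forward and reverse terms are added before the sigmoid, every fused element depends on both directions, which is the property that a plain concatenation of two independent passes lacks. The row count of `W_f` and `W_b` sets the dimension of the fused vector.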

Encoder-Decoder Framework with Attention Mechanism.
As shown in Figure 8, the fusion layer uses the Encoder-Decoder framework to adjust the data after bidirectional fusion. The Encoder-Decoder framework works as follows: the encoder maps the input into a space of the specified dimension to obtain a fixed-dimension vector C, and the decoder then decodes C. This structure acquires data features and adjusts the data structure.
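A minimal sketch of the plain Encoder-Decoder idea follows, assuming simple tanh recurrent cells for both the encoder and the decoder; the paper does not specify the cell types or dimensions, and all weight names (`W_x`, `W_h`, `U_c`, `U_s`, `V`) are hypothetical. Whatever the input length, the encoder returns a vector `C` of fixed dimension, and the decoder unrolls from `C` to produce outputs of a chosen dimension.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def encode(xs, W_x, W_h):
    """Simple tanh RNN encoder; its final hidden state is the fixed-dimension vector C."""
    h = [0.0] * len(W_h)
    for x in xs:
        h = [math.tanh(v) for v in vadd(matvec(W_x, x), matvec(W_h, h))]
    return h

def decode(C, U_c, U_s, V, out_len):
    """Unroll a tanh RNN decoder conditioned on C and emit out_len output vectors."""
    s = [0.0] * len(U_s)
    outputs = []
    for _ in range(out_len):
        s = [math.tanh(v) for v in vadd(matvec(U_c, C), matvec(U_s, s))]
        outputs.append(matvec(V, s))  # project the decoder state to the output dimension
    return outputs

# Hypothetical example weights: input dim 2, C dim 3, decoder state dim 3, output dim 1.
W_x = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]
W_h = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
U_c = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
U_s = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.2]]
V = [[1.0, 1.0, 1.0]]
```

Sequences of two steps and five steps both yield a three-element `C`, which illustrates the fixed-dimension bottleneck that the attention mechanism described next relaxes.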

Mobile Information Systems
In this paper, the fusion layer of the DFAS-LSTM model is built on the Encoder-Decoder framework with an added attention mechanism. The attention mechanism in deep learning is essentially similar to the selective visual attention of human beings: its core goal is to separate the primary from the secondary information for the current task, so that the model focuses on the primary information and ignores the secondary information. In the traditional Encoder-Decoder framework, the encoder compresses the input into a single fixed-dimension vector; with attention, the encoder instead encodes the input into a vector sequence. Each vector in this sequence receives an attention weight according to its importance to the target output, and the decoder then decodes the weighted vector sequence. In the fusion layer, this Encoder-Decoder framework can effectively capture the key information in the bidirectionally fused data and improve the feature extraction ability of the fusion layer.
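The attention weighting can be sketched as follows. The paper does not state how importance scores are computed, so this sketch assumes dot-product scoring between each encoder vector and a query vector (in a full model, typically the current decoder state), followed by a softmax; `attention_context` is a hypothetical name.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(H, query):
    """Dot-product attention over encoder vectors H; returns weights and context vector."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in H]
    alphas = softmax(scores)  # attention weights, non-negative and summing to 1
    dim = len(H[0])
    # Context vector: attention-weighted sum of the encoder vectors.
    context = [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(dim)]
    return alphas, context
```

With an all-zero query every encoder vector gets the same weight and the context is their plain average; a query aligned with some vectors concentrates almost all of the weight on them, which is the "primary versus secondary information" selection described above.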
The Encoder-Decoder framework with attention mechanism is shown in Figure 9.

Experiment and Result Analyses
This section discusses the performance of DFAS-LSTM in coal quality soft computing. It covers presetting and training, evaluation metrics, soft computing of the industrial coal quality analyses, and soft computing of the elemental coal quality analyses.

Presetting and Training.
Our algorithm is implemented in TensorFlow 1.12.0 with the Python wrapper, using the eight logical cores of a 3.6 GHz Intel Core i7-7700 CPU and two NVIDIA GeForce GTX 1080 Ti GPUs. The data for model training and testing come from the actual operating data of a coal-fired power plant, and the coal quality data come from the plant's laboratory analyses. The coal quality data include industrial analysis data and elemental analysis data, on the basis of which the soft computing method proposed in this paper realizes both industrial and elemental analyses of the coal quality. The industrial analyses of the coal quality include low calorific value, total moisture, ash content, and volatile matter; the elemental analyses include carbon, hydrogen, oxygen, nitrogen, and sulfur content.
The DFAS-LSTM model is used to solve the coal quality soft computing problem. The framework of the DFAS-LSTM model for coal quality soft computing is shown in Algorithm 1.

Metrics.
To represent the ability of the model intuitively, this paper uses the fitting index and the root mean square error (RMSE) as evaluation indices. Equation (21) gives the expression of the fitting index.
where $R_f$ is the fitting index, $y_i$ is the real value of the sample, and $y_i^p$ is the output value of the model. The proposed model is a regression model and uses the root mean square error as its loss function. The loss function evaluates the deviation between the model output and the real value; the smaller the value, the closer the model output is to the real value. The RMSE loss function is

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i^{p}\right)^{2}},$$

where RMSE is the root mean square error, $n$ is the number of samples, $y_i$ is the true value of the samples, and $y_i^p$ is the output value of the model.
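For reference, the RMSE loss translates directly into Python. This is a straightforward transcription of the formula, not the authors' TensorFlow code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between real values y_i and model outputs y_i^p."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    return math.sqrt(sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred)) / n)
```

For example, `rmse([0.0, 0.0], [3.0, 4.0])` is the square root of (9 + 16) / 2, about 3.536.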

Soft Computing of Industrial Coal Quality Analyses.
Industrial analysis of coal, also called technical analysis or practical analysis, is the basis for evaluating coal quality and an important indicator for understanding it. This section performs soft computing of the industrial analyses, which include low calorific value, total moisture, ash content, and volatile matter. Each of these is defined as follows:
(1) Low calorific value: the heat produced by combustion of the coal at atmospheric pressure minus the heat of vaporization of the moisture in the coal, that is, the heat that can actually be used.
(2) Total moisture: the moisture that the coal sample loses in air until it reaches equilibrium with the air humidity.
(3) Ash content: the residue left after the coal is completely burned.
(4) Volatile matter: the material that escapes when the coal is heated in isolation from air at a specified temperature, minus the moisture.
Figure 9: Encoder-Decoder with attention mechanism.
To verify the advantages of the DFAS-LSTM model proposed in this paper, soft computing of the industrial analyses is performed on the same data using DFAS-LSTM, a conventional Bi-LSTM model, a Bi-LSTM model with the improved activation function, and a Bi-LSTM with the alertness mechanism. Each model is run 20 times, the resulting accuracies are averaged, and the statistics are shown in Figure 10.
Using the DFAS-LSTM model proposed in this paper, soft computing of the above industrial analyses is realized from the conventional measurement point data of a coal-fired power plant. For comparison, 20 data points were selected at random times and are shown in chronological order, with the actual values of the industrial analyses alongside the soft computing results. The result is shown in Figure 11.

Soft Computing of Elemental Coal Quality Analyses.
In this section, the DFAS-LSTM model is used to achieve soft computing for the elemental analysis of coal. Elemental analysis detects and analyzes the content of elements in coal, which is an important indicator of coal quality. The elemental analysis data used in this paper are on an as-received basis. Specifically, the elemental analysis here includes carbon, hydrogen, oxygen, nitrogen, and sulfur content. As with the industrial analyses, elemental analysis soft computing is performed using DFAS-LSTM, a conventional Bi-LSTM model, a Bi-LSTM model with the improved activation function, and a Bi-LSTM with the alertness mechanism. Each model is run 20 times, the resulting accuracies are averaged, and the statistics are shown in Figure 12.
Algorithm 1: Framework of the DFAS-LSTM model for coal quality soft computing.
Input: historical conventional measurement point data of the coal-fired power plant X_h; real-time conventional measurement data of the coal-fired power plant X_r; coal quality test data of the coal-fired power plant y_r.
Output: real-time coal quality data y in the furnace of the coal-fired power plant.
(1) Remove noise from the historical data X_h by filtering;
(2) Standardize the data, and process the standardized data with the PCA and ICA algorithms to obtain data X_i;
(3) Initialize the weight parameters, split the data into batches X_ib, and input X_ib into DFAS-LSTM;
(4) Use the alertness mechanism to process X_ib and obtain X_a;
(5) Input X_a into the LSTM, which uses the improved activation function and the fusion structure;
(6) Compare the output of the neural network with the coal quality test data y_r to obtain the cost function C;
(7) Use the optimizer to minimize C by updating the weight parameters of the neural network;
(8) When the model is stable, end the optimization and fix the model parameters;
(9) Take X_r as input and output the coal quality information y in real time.
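Steps (2) and (6) to (8) of Algorithm 1 can be illustrated with the toy sketch below. It is only a stand-in under stated assumptions: a one-variable linear model replaces DFAS-LSTM, plain gradient descent replaces the unspecified optimizer, PCA and ICA are omitted, and "stable" is interpreted as the change in cost falling below a hypothetical tolerance `tol`. The cost is the mean squared error, which has the same minimizer as the RMSE because the square root is monotonic.

```python
import math

def standardize(column):
    """Step (2), first half: z-score one feature column (PCA/ICA omitted here)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column]

def train_until_stable(xs, ys, lr=0.1, tol=1e-10, max_epochs=10_000):
    """Steps (6)-(8) on a toy linear model y = w*x + b standing in for DFAS-LSTM."""
    w, b = 0.0, 0.0
    prev_cost = float("inf")
    n = len(xs)
    for _ in range(max_epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        cost = sum(e * e for e in errors) / n       # step (6): cost function C
        if abs(prev_cost - cost) < tol:             # step (8): stop once C is stable
            break
        prev_cost = cost
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w                            # step (7): gradient-descent update
        b -= lr * grad_b
    return w, b  # fixed parameters, reused for real-time inference in step (9)
```

For instance, standardizing `[1.0, 2.0, 3.0, 4.0, 5.0]` and fitting targets generated as `2 * x + 1` recovers `w` close to 2 and `b` close to 1 after a few dozen epochs.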

The elemental analyses are carried out in the same way as the industrial analyses. Using the DFAS-LSTM model proposed in this paper, soft computing of the above elemental analyses is realized. For comparison, 20 data points were selected at random times and are shown in chronological order, with the actual values of the elemental analyses alongside the soft computing results. The result is shown in Figure 13.

Conclusions
This paper discusses information fusion technology applied in coal-fired power plants. As a practical application, soft measurement of the coal quality in a power plant is achieved by the information fusion method. Based on the sensor network of a coal-fired power plant, an improved LSTM model with bidirectional fusion, alertness mechanism, and parameter self-learning (DFAS-LSTM) is proposed to realize soft computing of the coal quality. The alertness mechanism suppresses interference information; the improved activation function with parameter self-learning improves the accuracy of the model; and the bidirectional fusion structure improves both the accuracy and the generalization ability of the model. To verify the superiority of the DFAS-LSTM model, it is compared with a conventional Bi-LSTM model, a Bi-LSTM model with the improved activation function, and a Bi-LSTM with the alertness mechanism. In the tests, data from a coal-fired power plant are used to perform the industrial and elemental analyses of the coal quality. Specifically, the industrial analyses include low calorific value, total moisture, ash content, and volatile matter, and the elemental analyses include carbon, hydrogen, oxygen, nitrogen, and sulfur content. Verification shows that the DFAS-LSTM model basically fulfills the functions of industrial and elemental analyses, which supports online analysis of coal quality in coal-fired power plants. Traditional measurement methods rely on expensive equipment and cannot analyze coal quality in real time; the soft measurement method proposed in this paper avoids this weakness and saves cost. In future work, the correlation between conventional measurement points and coal quality will be analyzed further. Removing the measurement points with low correlation will further reduce the data dimension, and with it both the number of model parameters and the time consumed for analysis.
Data Availability
The data used to support the findings of this study are available from the corresponding author, Mei Wang, whose e-mail is wangm@xust.edu.cn.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the paper.