Short-Term Power Load Forecasting of Integrated Energy System Based on Attention-CNN-DBILSTM

Traditional power load forecasting methods have difficulty extracting the latent high-dimensional features in historical load sequences, and they do not account for the coupling among electricity, heat, and gas. This paper therefore considers the correlation of electricity, heat, and gas loads and proposes a short-term power load forecasting method for integrated energy systems based on Attention-CNN-DBILSTM, where CNN denotes a convolutional neural network and DBILSTM a deep bidirectional long short-term memory network. First, the correlation among the multiple loads and their influencing factors is considered, and the Pearson coefficient is used to quantify it. Second, a CNN consisting of one-dimensional convolutional layers and pooling layers is built to extract high-dimensional features that reflect the dynamic changes of the load. The extracted feature vectors are arranged as a time series and fed to the DBILSTM network, which models and learns the dynamic behavior of the time series data. Then, the Attention mechanism is introduced to assign different weights to the hidden states of the DBILSTM through mapping weights and a learned parameter matrix, which reduces the loss of historical information and strengthens the impact of key information; a Dense layer outputs the load prediction. Finally, the influence of the multiple load correlation and its influencing factors on the forecasting results is analyzed using historical load data from an integrated energy system in a certain area of Northeast China. The simulation results show that the prediction accuracy of the method reaches 97.99%, and that using the electricity, heat, and gas load correlation coefficients as input parameters of the Attention-CNN-DBILSTM network reduces the average prediction error by 0.37% to 1.93%. Comparison with the CNN-LSTM, CNN-BILSTM, and CNN-DBILSTM networks verifies that the proposed method effectively improves prediction accuracy.


Introduction
Against the background of the strategic goal of "carbon peak and carbon neutrality," the coupled operation of electricity, heat, and gas is the key to constructing a new low-carbon, safe, and efficient power system [1,2]. As an important part of the new power system, the integrated energy system takes the power distribution system as its core, and the multienergy flows of electricity, heat, and gas are coupled and complementary. It is an important material basis and means of realization for a low-carbon, high-efficiency, multienergy power system and energy Internet with new energy as the main body, and it is the key to promoting the energy revolution in the new era [3][4][5]. However, the coupled operation of the electricity, heat, and gas multienergy flows poses a severe test for power system load forecasting [6]. Therefore, considering the correlation among the electricity, heat, and gas loads is of great significance for the operation and dispatch of the integrated energy system. Unlike the independent operation of the traditional power system, thermal system, and natural gas system, the integrated energy system meets multiple load demands through the coordination and complementation of electric energy, heat energy, and natural gas [3]. How important the correlation between multiple loads is among the many influencing factors, and how to account for it in the model, are the key issues for integrated energy system power load forecasting. Some research has been carried out at home and abroad on load forecasting that considers multiple load correlations.
Literature [7] used historical time series of cooling, heating, and power loads together with weather factors to form a multivariable time series and proposed a load forecasting method for combined cooling, heating, and power systems based on multivariable phase space reconstruction and a Kalman filter; the correlation of the weather factors most closely linked to the cooling, heating, and power loads was given initial consideration. Literature [8] analyzed the load characteristics of the integrated energy system and studied the effect of its load patterns and energy consumption data types on the predictability of multiple loads. Literature [9] used Copula theory to analyze the coupling characteristics of the cooling, heating, and electric loads in the integrated energy system and established a multiple load forecasting model. The above studies consider the common influencing factors of multiple loads to improve the accuracy of integrated energy system load forecasting. However, they are limited to the factors shared by each energy form, and the correlation arising from the coupling among the multiple loads themselves has not been considered.
In terms of short-term load forecasting algorithms, traditional methods and machine learning forecast quickly, but they do not consider the temporal order of the data. In recent years, as the amount of available training data has grown substantially, deep learning methods have been able to fully mine the features of historical data sequences and offer better robustness [10,11]. Much research has been carried out at home and abroad on short-term power load forecasting based on deep learning. Literature [12] applied a DNN (deep neural network) to short-term power load forecasting. Literature [13] and literature [14] used CNN models to extract load features and capture seasonal cycles to improve load forecasting accuracy. Literature [15] and literature [16] considered the long-term temporal dependence of the data and introduced the LSTM (long short-term memory) network and the GRU (gated recurrent unit) network, respectively, into short-term power load forecasting to solve the vanishing gradient problem of the RNN (recurrent neural network). Literature [17] proposed a BILSTM network based on feature screening, which further mined the time series features of multidimensional load data. Literature [18] introduced the Attention mechanism into short-term load forecasting, giving the hidden layer different weights and strengthening the influence of important information in the load data. The above studies achieve high accuracy on highly nonlinear sequences, but the power load data of an integrated energy system with a high proportion of renewable energy are strongly volatile and uncertain, and the dynamic changes of the load are difficult to learn well with a single, unidirectional neural network model.
From the above analysis, it can be seen that current domestic and foreign research on power load forecasting of the integrated energy system mainly focuses on the common influencing factors of multiple loads. However, there is little research on load forecasting that considers the correlation arising from the coupling of multiple loads. Therefore, in view of the strong coupling among the multiple loads of the integrated energy system, and on the basis of considering the correlation of the multiple loads, deep learning methods need to be introduced into the power load forecasting of the integrated energy system. At the same time, relevant influencing factors such as season, weather, and date are considered, and the multiple load time series data are processed and analyzed rationally.
Based on the above considerations, the correlation among the electricity, heat, and gas loads is considered in this article, and a CNN-DBILSTM short-term power load forecasting model for the integrated energy system based on the Attention mechanism is proposed. First, the Pearson coefficient is used to quantify the correlation among the multiple loads of the integrated energy system, a significance test is applied to the resulting coefficients, and the strongly correlated factors are selected to support load forecasting. Then, the CNN is used to extract effective feature vectors from the historical load sequence as the input of the DBILSTM network, which models the dynamic changes of the extracted time series features. The Attention mechanism is introduced to give different probability weights to the hidden states of the DBILSTM network, and the influence of important information is strengthened. Finally, using historical load data of the integrated energy system in a certain area of Northeast China, MAPE (mean absolute percentage error) and RMSE (root mean square error) are used as evaluation indicators. The proposed model is compared with the CNN-LSTM, CNN-BILSTM, and CNN-DBILSTM networks to verify that the Attention-CNN-DBILSTM network proposed in this paper effectively improves the power load prediction accuracy of the integrated energy system.

Analysis of Load Correlation of Integrated Energy System
The integrated energy system is composed of energy input equipment, energy conversion equipment, and multiple loads. The energy input mainly includes photovoltaic power generation, wind power generation, and power purchased from external power grids. Electric energy, heat energy, and gas energy are coupled and converted as multienergy flows through the energy conversion equipment, which includes electric boilers, micro gas turbines, and P2G equipment. The structure of the integrated energy system is shown in Figure 1.
Considering that the Pearson coefficient reflects both the direction and the degree of the trend between two variables, it is used to quantitatively analyze the correlation between electricity, heat, and gas in the integrated energy system [19]. For variables $X = [x_1, x_2, \ldots, x_n]^T$ and $Y = [y_1, y_2, \ldots, y_n]^T$, the correlation coefficient can be expressed as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1}$$

In the formula, $\bar{x}$ and $\bar{y}$ are the mean values of the $n$ eigenvalues $x$ and $y$, respectively. The closer the absolute value of the correlation coefficient $r$ is to 1, the stronger the correlation between the eigenvalues $x$ and $y$. The closer $r$ is to 0, the weaker the correlation between $x$ and $y$.
Then, a significance test is used to check the reliability of the correlation coefficient $r$. Let the null hypothesis $H_0$ state that there is no correlation between the two variables. The t-distribution test is applied, and the statistic is calculated as

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \tag{2}$$

Finally, for a given significance level $\alpha$ and $n-2$ degrees of freedom, the t-distribution table is used to find the critical value $t_{\alpha/2}(n-2)$. If $|t| > t_{\alpha/2}(n-2)$, the null hypothesis $H_0$ is rejected, indicating that the two variables are correlated.
The correlation between multiple loads and their influencing factors is considered in this article, and the significance level α is not fixed in advance. The statistic is calculated according to formula (2), and the significance level α at which H 0 is rejected is then obtained from the t-distribution table.
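The correlation analysis described above can be illustrated with a minimal sketch that computes the Pearson coefficient and the corresponding t statistic with n − 2 degrees of freedom. The function names and the toy series are hypothetical and are not data from this study; the critical value would still be looked up in a t-distribution table.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for H0 'no correlation', with n - 2 degrees of freedom.

    Undefined for |r| == 1 (division by zero), so callers should guard that case.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical toy series (assumption), standing in for electric and heat loads.
electric = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0]
heat = [5.0, 6.5, 7.0, 7.5, 9.0, 9.5]

r = pearson_r(electric, heat)
t = t_statistic(r, len(electric))
print(f"r = {r:.4f}, t = {t:.4f}")
```

A larger |t| corresponds to a smaller significance level at which the null hypothesis can be rejected.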
In the short-term power load forecasting process of the integrated energy system, the power load forecast results are related not only to the electricity, heat, and gas loads but also to weather factors and economic factors. However, increasing the number of factors considered in the forecasting model also increases the uncertainty. The Pearson coefficient is therefore used in this paper to carry out a quantitative correlation analysis, and the more strongly correlated influencing factors are selected for the short-term power load forecasting of the integrated energy system to improve the accuracy of the forecasting model.

CNN Principle Structure.
CNN is a feed-forward neural network and a learning algorithm with a multilayer network structure [20]. It consists of convolutional layers, pooling layers, and a fully connected layer. CNN uses local connections and weight sharing to process data. The alternating use of multiple convolutional layers and pooling layers to extract feature vectors can effectively reduce data complexity, reduce the number of weights, and improve both the quality of the data features and the generalization ability of the prediction model. The CNN model structure is shown in Figure 2.
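The convolution and pooling operations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the kernel values, the pooling window of 2, and the toy load series are hypothetical, and the convolution is written in the cross-correlation form commonly used by deep learning libraries.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form) with stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

# Hypothetical normalized load values and a simple difference kernel (assumptions).
load = [0.2, 0.4, 0.9, 0.7, 0.3, 0.1, 0.5, 0.8]
kernel = [-1.0, 0.0, 1.0]

features = max_pool1d(relu(conv1d(load, kernel)), size=2)
print(features)
```

Max pooling keeps the largest response in each window, which is why it preserves load fluctuation peaks while halving the sequence length.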

BILSTM Principle Structure.
The LSTM network is an improved RNN. Its gating mechanism alleviates the gradient explosion and gradient vanishing problems that occur during training, and the prediction accuracy is improved [21]. The BILSTM (bidirectional long short-term memory) network was proposed to address the low data utilization and poor data relevance caused by the forward-only time series propagation of the LSTM network [22]. The BILSTM network is a bidirectional recurrent network over the time series: the input data are trained in both time directions, so the output contains information on the entire sequence, which gives better time series processing capability. An RNN has memory through parameter sharing between neurons and is often used to process the nonlinear characteristics of sequence data, but its storage capacity is poor. As the time interval grows, earlier hidden-layer information is overwritten and the gradient vanishes. LSTM introduces long-term and short-term memory through gating units, which solves the gradient explosion and gradient vanishing problems to a certain extent. Its network structure is shown in Figure 3.
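LSTM controls the output of the memory unit through three logic units: the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$.

$$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i)$$
$$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$$
$$o_t = \sigma(W_o [h_{t-1}, X_t] + b_o)$$
$$\tilde{C}_t = \tanh(W_c [h_{t-1}, X_t] + b_c)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

In the above formulas, $W_i$, $W_f$, $W_o$, $W_c$, $b_i$, $b_f$, $b_c$, and $b_o$ are gate training parameters; $i_t$ is jointly determined by the input $X_t$, the previous hidden layer output $h_{t-1}$, and the activation function $\sigma$; $\tanh$ is the activation function; $C_t$ is the cell state; and $\tilde{C}_t$ is the candidate value of the new cell state $C_t$.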
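The gate structure can be made concrete with a single-step sketch, assuming the standard LSTM formulation with sigmoid gates and a tanh candidate state. For readability the input and state are scalars, and all parameter values are hypothetical placeholders rather than trained weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with scalar input and scalar state.

    params maps a gate name ('i', 'f', 'o', 'c') to (w_h, w_x, b).
    """
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    i_t = gate("i", sigmoid)        # input gate
    f_t = gate("f", sigmoid)        # forget gate
    o_t = gate("o", sigmoid)        # output gate
    c_tilde = gate("c", math.tanh)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Hypothetical parameters (assumption): identical weights for every gate.
params = {name: (0.5, 0.5, 0.0) for name in "ifoc"}

h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.8]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state c accumulates information across steps, while the output gate decides how much of it is exposed through h.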
The prediction of the LSTM network depends only on the output of the hidden layer at the previous moment, so data utilization is low and relevance is poor. BILSTM combines a forward LSTM that reads the input sequence in order with a backward LSTM that reads it in reverse; each generates its own output, and the two are connected at the output node to synthesize the final output. The forward and backward relationships of the input data are effectively extracted, without relying on data timing and predefined parameters. The network structure of BILSTM is shown in Figure 4.
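The bidirectional idea can be sketched independently of the cell type. The example below is a simplified illustration: it uses a plain tanh recurrent cell as a stand-in for the LSTM cell, and the weights and input sequence are hypothetical.

```python
import math

def run_rnn(sequence, w_x=0.6, w_h=0.3):
    """Run a simple tanh recurrent cell over a sequence and return all hidden states."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(sequence):
    """Pair the forward and backward hidden states for every time step."""
    forward = run_rnn(sequence)
    # Process the reversed sequence, then flip the states back to the original time order.
    backward = run_rnn(sequence[::-1])[::-1]
    return list(zip(forward, backward))

states = bidirectional([0.2, 0.5, 0.9, 0.4])
for t, (fwd, bwd) in enumerate(states):
    print(t, round(fwd, 4), round(bwd, 4))
```

At each time step the forward state summarizes the past and the backward state summarizes the future, so the pair carries information from the whole sequence.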

DBILSTM Principle Structure.
Different from traditional power load forecasting, the power load forecasting process of the integrated energy system needs to consider factors such as the correlation among the electricity, heat, and gas loads. Because a single-layer BILSTM network handles the complex time series data of integrated energy system load forecasting poorly, a DBILSTM network model composed of multiple BILSTM networks is used in this article to forecast the power load of the integrated energy system. The network structure is shown in Figure 5. The DBILSTM network is composed of an input layer, a hidden layer, a Dense layer, and an output layer. The hidden layer is composed of n BILSTM networks. The BILSTM network of each layer obtains information in both the forward and backward directions through its forward LSTM network and reverse LSTM network [23].
Through the information fusion of the first n − 1 BILSTM layers, the output of the n-th layer is used as the bidirectional time series feature vector of the load at time t, and the prediction result is output through the Dense layer. The calculation process is as follows.
In the layer-wise update formulas for the i-th input sequence, $f$ is the activation function; $\overrightarrow{M}$ and $\overleftarrow{M}$ are the forward and backward weight matrices, with the superscript representing the current layer number; and $\oplus$ represents the addition calculation, which keeps the original data dimension unchanged. These formulas give the output sequence of the n-th layer.

Attention Mechanism Principle Structure.
The Attention mechanism is modeled on the resource allocation mechanism of human attention. Its essence is to ignore low-relevance information and highlight the required information through a probability allocation mechanism. According to the influence of the input feature vector on the output feature vector, different weights are given to the states of the hidden layer, thereby effectively improving the prediction accuracy of the model [24]. In this article, the Attention mechanism is introduced into the CNN-DBILSTM power load forecasting model of the integrated energy system. The model attends selectively to the influence of inputs at different time steps on the load forecasting results and assigns higher weights to key information, so that the load forecasting accuracy is improved. The Attention mechanism structure is shown in Figure 6; x t (t ∈ [1, n]) is the attention probability distribution value of the CNN-DBILSTM hidden layer under the Attention mechanism, and y is the CNN-DBILSTM output value after the Attention mechanism is introduced.

Attention-CNN-DBILSTM Model
In the process of power load forecasting of the integrated energy system, historical time series data contain important characteristic information that reflects the trend of power load changes. Traditional methods such as DBM and DNN require manual extraction of fixed time features, and the temporal order and correlation of historical load data are not fully considered. In this article, the CNN network is first used to extract periodic features from the historical load data; the extracted time series features are then input into the DBILSTM network, which models and learns their dynamic changes. The DBILSTM network performs well in modeling the highly volatile and uncertain time series load data of the integrated energy system. However, short-term power load forecasting of the integrated energy system involves long time series, and the DBILSTM network may suffer from information loss and modeling difficulties. The Attention mechanism is therefore introduced to give different probability weights to the hidden states of the DBILSTM network and strengthen the influence of important information.
Therefore, a short-term power load forecasting model for the integrated energy system based on Attention-CNN-DBILSTM is proposed, which can effectively learn the dynamic characteristics of the time series data of the integrated energy system by combining multiple network structures.

Attention-CNN-DBILSTM Model Structure
The power load forecasting framework based on Attention-CNN-DBILSTM proposed in this paper is shown in Figure 7. It is composed of an input layer, a CNN layer, a DBILSTM layer, an Attention layer, a Dense layer, and an output layer. The model is described as follows:

(1) Input layer: First, the Pearson coefficient is used to perform a correlation analysis on the relevant data, and factors with a correlation coefficient greater than 0.3 are selected as effective input factors. Then, abnormal values in the data are processed and the data are normalized. The input layer uses the preprocessed historical time series data as the input $X$ of the prediction model.

(2) CNN layer: The CNN layer is used to extract features from the input data. A CNN consisting of two one-dimensional convolutional layers, two pooling layers, and a Dense layer is built, and the ReLU activation function is used for the convolutional layers. In order to retain the load fluctuation information, pooling layer 1 and pooling layer 2 use maximum pooling. After processing by the convolutional and pooling layers, the feature vector is output through the Dense layer, which uses the Sigmoid activation function. The output feature vector $H_C$ of the CNN layer can be expressed as

$$C_1 = \mathrm{ReLU}(X \otimes W_1 + b_1)$$
$$P_1 = \max(C_1) + b_2$$
$$C_2 = \mathrm{ReLU}(P_1 \otimes W_2 + b_3)$$
$$P_2 = \max(C_2) + b_4$$
$$H_C = \mathrm{Sigmoid}(P_2 \times W_3 + b_5)$$

In the above formulas, $C_1$ and $C_2$ are the outputs of convolutional layer 1 and convolutional layer 2, respectively; $P_1$ and $P_2$ are the outputs of pooling layer 1 and pooling layer 2, respectively; $W_1$, $W_2$, and $W_3$ are weight matrices; $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ are biases; $\otimes$ is the convolution operation; and the output length of the CNN layer is $I$.

(3) DBILSTM layer: The DBILSTM network is composed of multiple BILSTM networks, each including a forward LSTM and a reverse LSTM. Compared with the traditional LSTM network, the DBILSTM network has better time series learning capability. In the expression for the final output sequence of the DBILSTM layer, $g$ is the ReLU activation function of the Dense layer; $W_d$ and $W_o$ are the weight parameters of the Dense layer and the output layer, respectively; and $b_d$ is the bias of the Dense layer.

(4) Attention layer: The output of the DBILSTM layer, after the activation function, is used as the input of the Attention layer. The probabilities corresponding to the different feature vectors are calculated according to the weight distribution principle and are updated iteratively to obtain a better weight parameter matrix. The weight coefficient calculation can be expressed as

$$e_t = u \tanh(w h_t + b)$$
$$\alpha_j = \frac{\exp(e_j)}{\sum_{k=1}^{t} \exp(e_k)}$$
$$s_t = \sum_{j=1}^{t} \alpha_j h_j$$

In the above formulas, $h_t$ is the output vector of the DBILSTM layer at time $t$; $e_t$ represents the attention probability distribution value determined by $h_t$; $\alpha_j$ is the normalized attention weight; $u$ and $w$ are weight coefficients; $b$ is the bias coefficient; and $s_t$ is the output of the Attention layer at time $t$.

(5) Output layer: The output layer calculates the output with a prediction step of $m$ through the fully connected layer, $Y = [y_1, y_2, \ldots, y_m]^T$. In the prediction formula, $y_t$ is the predicted output value at time $t$; $w_o$ is the weight matrix; and $b_o$ is the bias vector.

Figure 6: Attention mechanism structure.
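The weighting step of the Attention layer can be sketched as follows, assuming the common additive scoring form e_t = u·tanh(w·h_t + b) followed by a softmax over time steps. Scalar hidden states and the parameter values are illustrative assumptions; in the model they would be vectors and learned matrices.

```python
import math

def attention(hidden_states, u=1.0, w=1.0, b=0.0):
    """Score each hidden state, normalize the scores with softmax, and return
    the attention weights together with the weighted sum of the hidden states."""
    scores = [u * math.tanh(w * h + b) for h in hidden_states]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = sum(a * h for a, h in zip(weights, hidden_states))
    return weights, context

# Hypothetical DBILSTM hidden states at three time steps (assumption).
hidden = [0.1, 0.9, 0.3]
weights, context = attention(hidden)
print(weights, context)
```

Because the weights sum to 1, the output is a convex combination of the hidden states that leans toward the time steps with the highest scores.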

Loss Function.
In the Attention-CNN-DBILSTM model training process, the Adam (adaptive moment estimation) optimization algorithm is selected to optimize the model parameters [25]. Adam is a first-order optimization algorithm that minimizes the value of the loss function by iteratively updating the network weights. The root mean square error function is used as the loss function of the model, which can be expressed as

$$L = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

In the above formula, $n$ is the number of load forecast output moments; $y_i$ is the actual value; and $\hat{y}_i$ is the load forecast value at time $i$.
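To illustrate how Adam drives the loss down, the sketch below fits a single weight in y = w·x. It is a toy stand-in for the network under stated assumptions: the gradient is taken of the mean squared error (which has the same minimizer as the root mean square error), the hyperparameters are the commonly used defaults apart from the learning rate, and the data are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)

def adam_fit(xs, ys, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x by minimizing the mean squared error with Adam updates."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Hypothetical data generated by y = 2x (assumption).
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w = adam_fit(xs, ys)
loss = rmse(ys, [w * x for x in xs])
print(w, loss)
```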

Case Analysis.
In order to verify the accuracy of the Attention-CNN-DBILSTM model considering the multiple load correlation of the integrated energy system, the MATLAB platform was used to simulate and analyze the original electricity, heat, and gas load data of the integrated energy system in a certain area of northern China from January 1, 2016, to December 31, 2017. Data are collected at 48 points per day, with a sampling interval of 30 minutes, and the training, validation, and test sets are divided in an 8:1:1 ratio. In addition to the electricity, heat, and gas load data, the integrated energy system relies on information acquisition devices to obtain hourly temperature, radiation, wind speed, wind direction, and other related data, as well as holiday information.
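The data partition can be sketched as below. The 8:1:1 ratio and the 48 points per day come from the text; splitting in chronological order without shuffling is an assumption, and the series holds placeholder indices rather than real load values.

```python
def chronological_split(samples, ratios=(8, 1, 1)):
    """Split an ordered sequence into train/validation/test parts without shuffling."""
    total = sum(ratios)
    n = len(samples)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

points_per_day = 48  # 30-minute sampling interval
days = 366 + 365     # 2016-01-01 to 2017-12-31 (2016 is a leap year)
series = list(range(points_per_day * days))  # placeholder indices (assumption)

train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Keeping the split chronological avoids leaking future observations into the training set, which matters for time series forecasting.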
The distribution of power load data of the integrated energy system is shown in Figure 8. It can be seen that the load data are cyclical on a yearly scale but fluctuate sharply on a daily scale.

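Data Preprocessing and Model Evaluation Index
The mean square method is used to deal with abnormal values in the load data, which can be expressed as

$$\bar{x} = \frac{1}{n'}\sum_{i=1}^{n'} x_i, \qquad \delta^2 = \frac{1}{n'}\sum_{i=1}^{n'}(x_i - \bar{x})^2$$

In the above formula, $n'$ is the number of pieces of daily load data; $x_i$ is the load on the i-th day; $\bar{x}$ is the mean value and $\delta^2$ the variance; the abnormal point criterion is $|x_i - \bar{x}| > 3\delta$.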
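The abnormal point criterion can be sketched as a simple three-sigma check. The function name and the toy data are hypothetical, and the population standard deviation is assumed; how flagged points are then repaired is not specified here and is left out.

```python
import math

def flag_outliers(values, k=3.0):
    """Return the indices whose deviation from the mean exceeds k standard deviations."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]

# Hypothetical load readings with one spike at index 10 (assumption).
readings = [100.0] * 10 + [500.0] + [100.0] * 9
print(flag_outliers(readings))
```

With very short series a single spike inflates the standard deviation so much that it cannot exceed three sigma, so the check is only meaningful on reasonably long windows.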
In order to facilitate the training of the model, the min-max normalization method is used to linearly transform the original data and map it to the range [0, 1], which can be expressed as

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

In the above formula, $x'$ is the normalized data; $x$ is the original load data; and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample data, respectively. The mean absolute percentage error (MAPE) and the root mean square error (RMSE) are selected as evaluation indicators, which can be expressed as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

In the above formulas, $n$ is the number of predicted results; $y_i$ and $\hat{y}_i$ are the actual value and the predicted value, respectively.
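The normalization and the two evaluation indicators can be written out directly. This is a minimal sketch with hypothetical values; MAPE is returned in percent and assumes that no actual value is zero.

```python
import math

def min_max_normalize(values):
    """Linearly map values to [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

# Hypothetical actual and predicted loads (assumption).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(min_max_normalize(actual))
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free and easy to compare across loads, while RMSE is in the units of the load and penalizes large individual errors more heavily.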

Correlation Coefficient of Influencing Factors
Correlation analysis is performed on the electricity, heat, and gas load data in the integrated energy system. After the t-distribution test is performed on the obtained Pearson coefficients, the influencing factors with a correlation coefficient greater than 0.3 are normalized and then used as model input for training. The Pearson correlation coefficient values of typical months are shown in Table 1. The influencing factors with an effective correlation to the multiple loads are shown in Figure 9. It can be seen that the thermal load correlation coefficients are all greater than 0.4, which has a great impact on power load forecasting, so the thermal load data in the period are used as an influencing factor in the power load forecasting. The gas load affects the power load forecast only at certain times, so the cases where the correlation coefficient is greater than 0.3 are screened out and the gas load data at those times are used as an influencing factor in the power load forecast. Therefore, it is necessary to consider the heat load and gas load data in the power load forecasting process of the integrated energy system.

Result Analysis
In order to verify the accuracy and stability of the proposed model, the CNN-LSTM, CNN-BILSTM, CNN-DBILSTM, and Attention-CNN-DBILSTM models are compared with the Attention-CNN-DBILSTM model that considers the correlation of multiple loads.
The input data are all preprocessed time series data. The data of the first 16 months are used to train the model to predict the daily load for the next 8 months. One week in each month of the forecast sample is taken for daily power load forecasting. The evaluation indexes of each model are shown in Table 2.
According to the trend of the average evaluation indexes of each model in Figures 10 and 11, the average MAPE of the Attention-CNN-DBILSTM model that considers the multiple load correlation proposed in this article is reduced by 1.93%, 1.39%, 0.71%, and 0.37% relative to the four comparison models. The average RMSE decreased by 41.73%, 29.93%, 23.61%, and 15.79%. Therefore, after the multiple load correlation is considered, the prediction accuracy is effectively improved by the proposed Attention-CNN-DBILSTM model, and the average prediction accuracy can reach 97.99%.
In order to more intuitively reflect the load forecasting effect of different models, Figure 12 shows the comparison between the load forecasting value and the actual value on June 28, 2016, which is a randomly selected working day.
It can be seen that the load curve of the selected forecast day is approximately bimodal, and that it fluctuates greatly between sampling points 20 and 40 (corresponding to the 10:00 to 20:00 time period). Compared with the other models, the proposed model has the smallest fluctuation in the peak and trough regions of the load curve. In addition, in the rising and falling phases of the morning and evening load curves, the law of load changes is also well captured by the proposed model. Figure 13 shows the comparison between the predicted load value and the actual value on January 1, 2017, which is a randomly selected holiday. The prediction results of the other models fluctuate greatly, whereas the proposed model has better prediction accuracy, which verifies that it is also suitable for holidays.
It can be seen that the predictions of the Attention-CNN-DBILSTM model proposed in this article are close to the actual values, and the prediction accuracy is higher. Compared with the other models, the proposed model performs better during the load peak and valley periods, accurately captures the law of load change in those periods, and is also applicable to holidays. The accuracy of load forecasting is effectively improved.

Conclusions
To address the influence of the multiple load correlation factors of the integrated energy system on power load forecasting, the correlation between the electricity, heat, and gas loads is analyzed, and a CNN-DBILSTM short-term power load forecasting model based on the Attention mechanism is established. The effective features of load changes are extracted through the CNN, and the CNN-DBILSTM network combined with the Attention mechanism is trained to mine the internal temporal characteristics of the load data. Then the Dense layer is used to output the short-term power load forecast value. Finally, the validity of the model is verified by simulation based on the historical load data of the integrated energy system in a certain area of northern China. The main conclusions are as follows:
(1) The coupling characteristics of the electricity, heat, and gas loads of the integrated energy system are considered, and the Pearson coefficient is used to quantitatively calculate the correlation of the multiple loads. The results show that the thermal load, natural gas load, and electrical load are highly correlated. At some moments, the correlation between the electrical load and the thermal load is stronger than that between the electrical load and the weather.
(2) After the multiple load correlation is considered, the average prediction accuracy reaches 97.99%. The average MAPE dropped by 0.37% to 1.93%, and the average RMSE dropped by 15.79% to 41.73%. The accuracy of power load forecasting was effectively improved.
(3) Compared with traditional deep learning models, the CNN-DBILSTM short-term power load forecasting model based on the Attention mechanism proposed in this article can fully mine the time series characteristics of load data under multidimensional input feature parameters and has higher prediction accuracy in short-term power load forecasting.

Data Availability
The datasets generated for this study are available upon request to the corresponding author.

Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.