Short-Term Load Forecasting Method Based on Deep Reinforcement Learning for Smart Grid

Short-term load forecasting is an important part of supporting the planning and operation of the power grid. However, current load forecasting methods suffer from poor adaptability of model parameters, which makes it difficult to meet the demand for efficient and accurate power grid load forecasting. To solve this problem, a short-term load forecasting method for the smart grid is proposed based on a multilayer network model. The method uses the integrated empirical mode decomposition (IEMD) method to obtain orderly and reliable load state data, providing high-quality data support for the prediction network model. An enhanced inception network module is used to adaptively adjust the parameters of the deep neural network (DNN) prediction model, improving the fitting and tracking ability of the prediction network. In addition, a hybrid particle swarm optimization algorithm further enhances the dynamic optimization of the deep reinforcement learning model parameters, enabling accurate prediction of the short-term load of the smart grid. Simulation results show that the mean absolute percentage error $e_{\mathrm{MAPE}}$ and root-mean-square error $e_{\mathrm{RMSE}}$ of the prediction model are 10.01% and 2.156 MW, respectively, indicating excellent curve fitting and load forecasting ability.


Introduction
Load forecasting is a key link in keeping the power system stable and controllable [1,2]. Accurate load forecasting supports unit commitment, the sizing of spinning reserve, and maintenance planning, all of which strongly affect the planning and operation of the power system [3][4][5], and it helps keep the production and consumption of electric energy in balance [6]. Therefore, the importance of load forecasting in the energy distribution and management of modern power systems is self-evident.
At present, power load forecasting can be classified into long-term, medium-term, and short-term analysis according to the forecasting time scale [7]. Medium- and long-term forecasting covers horizons from the next few weeks to several years. It is often used for long-term planning, such as optimizing the grid structure and the layout of power supply, so as to improve the economy of power grid planning and construction. Short-term load forecasting is part of the daily operation of the power grid, and its accuracy directly affects the stability of power grid planning and operation [8]. Therefore, this study focuses on the accurate prediction and analysis of the short-term load of the power grid.
Traditional load forecasting methods are mainly regression analysis and time-series methods [9,10]. These methods sort the historical load data along the timeline, construct the corresponding mathematical structure for load forecasting, and derive the analysis formula. In other words, the traditional approach builds a data state model from the historical state data of the power grid and performs linear fitting of the data for load forecasting. However, the current smart grid contains a large number of distributed energy sources and randomly connected intelligent terminals, both of which are uncertain in time and space. As a result, the power grid state data are highly uncertain and strongly nonlinear, which greatly complicates the construction and initialization of traditional models [11]. The emergence of deep learning technology provides a new solution for power grid load forecasting and avoids the disadvantage of traditional methods, which construct forecasting and analysis models directly from the data [12]. The power state data can be continuously trained and learned through a multilayer network model to build a multilayer prediction network with superior fitting ability, so as to achieve accurate and efficient load analysis and prediction [13]. It should be noted that, because the multilayer network structure is deep and complex, its network parameters have difficulty tracking the data and adjusting adaptively [14], which leads to errors in multistep power load prediction.
To address the above problems, this study proposes a new forecasting method based on a deep learning network and a reinforcement learning network. The main innovations are as follows:
(1) To provide reliable data support for the prediction network, this study preprocesses the collected power state data with the integrated empirical mode decomposition (IEMD) method, so as to improve the order and reliability of the network model training data.
(2) To support accurate analysis by the prediction network, the reinforcement learning inception network model is used to adjust the parameters of the deep neural network (DNN), and the hybrid particle swarm optimization algorithm is integrated into the network to achieve accurate and efficient short-term load forecasting of the smart grid.

Related Work
Accurate short-term load forecasting supports stable and controllable operation of the power grid [15], and each one-percentage-point reduction in the average short-term load forecasting error can save tens or even millions of dollars [16]. Traditional load forecasting methods, such as regression analysis and time-series methods, approach the problem through mathematical modeling. They build a mathematical model of the power data directly, which places high requirements on the quality of the collected power state data but allows load forecasting with a small data volume. However, for complex and diverse power state data, the analysis error of such prediction models is large. Moreover, the construction and initialization of the model are themselves difficult for traditional methods, which makes it hard to guarantee the prediction accuracy [17][18][19].
Due to the random grid connection of new energy and the complex change in load, the change in power load presents high uncertainty, and the power grid state data present strong nonlinearity, which poses a higher challenge to the short-term accurate prediction of power load [20][21][22].
Deep learning and reinforcement learning can extract the characteristics of load state data through multilayer network structure and continuously improve the prediction network model through training and learning process to realize efficient power grid load forecasting. Reference [23] considers the uncertainty of temperature and load, and realizes medium-term load forecasting based on quantile regression neural network. Reference [24] uses automatic coding to compress the load history data and constructs a prediction model based on multilayer gated recurrent unit (GRU) to analyze and predict the daily variation of power load. Reference [25] uses long short-term memory (LSTM) model to predict load power based on temperature history data and power history data. Reference [26] combines LSTM network and RNN network to realize effective prediction of industrial power load based on multilayer hybrid deep learning network. However, it should also be pointed out that although the current multilayer network analysis method solves the problem of direct mathematical modeling, it still has certain requirements for the data quality of the training set. A reliable and complete load data set is an important guarantee for accurate prediction [27]. Moreover, due to too many layers of network structure, there is a certain lag in the adjustment of network parameters, and the mismatch of model parameters will also lead to a certain error in the prediction network.
To solve this problem, this study selects IEMD method to optimize power load data, so as to provide reliable data support for forecasting network model. Based on the parameter adaptive multilayer network structure, the efficient and accurate prediction of short-term load of smart grid is realized, which provides a reliable reference for power grid planning and operation.

Multilayer Network Prediction Method
For the complex scenario of the smart grid, this study combines the empirical mode decomposition method with deep reinforcement learning to build a complete power grid state data analysis architecture that couples reliable preprocessing with accurate prediction analysis, so as to realize accurate prediction of the short-term load of the smart grid.

Overall Framework.
Power state data are redundant and incomplete, so it is difficult to achieve accurate short-term load forecasting by training on them directly. This study combines data preprocessing technology with load forecasting technology and proposes a deep learning load forecasting model architecture supported by complete data sets. Figure 1 shows the short-term analysis architecture of power load proposed in this study.
As shown in Figure 1, to achieve accurate power load forecasting, this study first applies data preprocessing to improve the data quality and then processes the data with the IEMD method to improve the reliability and integrity of the training data. At the same time, the multilayer network training model combines a reinforcement learning network with a deep learning network. The inception module dynamically adjusts the DNN prediction network to improve the fitting ability of the prediction model.

Load Data Preprocessing.
Power data acquisition is incomplete, involves heavy data storage tasks and inconsistent storage formats, introduces certain errors during storage, and is very vulnerable to noise interference. Therefore, preprocessing is necessary before the data can be used in practical tasks. In practice, data processing mainly includes four steps: data cleaning, data integration, data specification, and data transformation. Among them, data transformation is the most important step, because transforming nonstandard data into standardized data directly affects the prediction accuracy of the model. There are two methods of data transformation that give different features the same value range: normalization and standardization.
Normalization rescales the data from its original range according to a certain proportion. The target interval is chosen according to actual needs, typically [0, 1] or [−1, 1]. Given a sample of power grid state data $\varphi^{(n)}$, normalization to [0, 1] is calculated as follows:
$$\hat{\varphi}^{(n)} = \frac{\varphi^{(n)} - \min(\varphi^{(n)})}{\max(\varphi^{(n)}) - \min(\varphi^{(n)})}$$
where $\hat{\varphi}^{(n)}$ is the normalized power grid state data, and $\min(\varphi^{(n)})$ and $\max(\varphi^{(n)})$ are the minimum and maximum values of feature $\varphi$ over all samples, respectively.
Data standardization is a preprocessing step that brings the data into a specific range so that the features converge better during back propagation; each one-dimensional feature is adjusted to mean 0 and variance 1. Suppose there are $N$ samples $\{\varphi^{(n)}\}_{n=1}^{N}$. For each one-dimensional feature $\varphi$, the mean and variance are calculated first, as shown in the following formulas:
$$\mu = \frac{1}{N}\sum_{n=1}^{N}\varphi^{(n)}$$
$$\sigma^{2} = \frac{1}{N}\sum_{n=1}^{N}\left(\varphi^{(n)} - \mu\right)^{2}$$
Each feature is then standardized as $\hat{\varphi}^{(n)} = (\varphi^{(n)} - \mu)/\sigma$. In this formula, $\sigma$ cannot be 0. If the variance is $\sigma^{2} = 0$, this one-dimensional feature carries no discriminative information for the task and can be deleted directly without being included in the feature dimension. The power grid state data have time characteristics.
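The two transformations described above can be illustrated with a minimal NumPy sketch. The function names `min_max_normalize` and `standardize` and the sample values are hypothetical and chosen only for illustration. The sketch assumes features are stored as columns and uses the population variance (division by N).

```python
import numpy as np

def min_max_normalize(phi, low=0.0, high=1.0):
    """Rescale each feature column of phi to the interval [low, high]."""
    phi = np.asarray(phi, dtype=float)
    col_min = phi.min(axis=0)
    span = phi.max(axis=0) - col_min
    # Constant columns have zero span; use 1.0 there so they map to `low`.
    safe_span = np.where(span == 0, 1.0, span)
    return low + (phi - col_min) / safe_span * (high - low)

def standardize(phi):
    """Shift each feature to mean 0 and variance 1; drop zero-variance features."""
    phi = np.asarray(phi, dtype=float)
    mu = phi.mean(axis=0)
    sigma = phi.std(axis=0)          # population standard deviation (divides by N)
    keep = sigma > 0                 # sigma = 0 means the feature is uninformative
    return (phi[:, keep] - mu[keep]) / sigma[keep], keep

# Hypothetical samples: columns are load (MW), temperature, and a constant flag.
samples = np.array([[2.0, 18.0, 1.0],
                    [4.0, 21.0, 1.0],
                    [6.0, 24.0, 1.0],
                    [8.0, 27.0, 1.0]])
normalized = min_max_normalize(samples)
standardized, kept = standardize(samples)
```

Passing `low=-1.0, high=1.0` gives the [−1, 1] variant. The constant third column is removed by `standardize`, mirroring the rule that a zero-variance feature can be deleted.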
The traditional approach uses wavelet analysis for data processing, but it suffers from complex and redundant calculation. The empirical mode decomposition (EMD) method works from the time characteristics of the data itself, which avoids the need to specify a basis function as in the wavelet method, and it enables reliable preprocessing of power grid state data.
EMD is used to analyze and process nonstationary, nonlinear signals with large fluctuations. Any signal can be decomposed into several intrinsic mode functions (IMFs) to achieve orderly and reliable data processing. However, because current power grid load characteristics are highly uncertain, the sorted data contain a non-negligible reconstruction error.
The IEMD method is used to decompose signals with different variation characteristics, so as to provide reliable support for power data. IEMD overcomes the shortcomings of traditional EMD by adding adaptive white noise and calculating a unique signal residual, so that the reconstructed signal is almost identical to the original signal. IEMD not only overcomes the mode mixing phenomenon, but also reduces the reconstruction error by increasing the number of decompositions.
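Let $J_c(\cdot)$ denote the operator that yields the $c$th mode component obtained through EMD, and let $w_i(t)$ be Gaussian white noise following a normal distribution. The $p$th IMF of the original net load curve $F(t)$ obtained through IEMD is denoted $\mathrm{IMF}_p$. The main steps of the IEMD algorithm are as follows:
Step 1. Add Gaussian white noise with amplitude constant $\delta_0$ to $F(t)$, decompose each of the $H$ noisy sequences by EMD, and average the first modes to obtain the first component:
$$\mathrm{IMF}_1(t) = \frac{1}{H}\sum_{i=1}^{H} J_1\left(F(t) + \delta_0 w_i(t)\right)$$
Step 2. In the first stage, calculate the first residual sequence $\tau_1(t)$ as follows:
$$\tau_1(t) = F(t) - \mathrm{IMF}_1(t)$$
Step 3. Define $\delta_1$ as the amplitude constant of the Gaussian white noise, and let $J_1$ be the first-mode decomposition operator. The sequence $\tau_1(t) + \delta_1 J_1(w_i(t))$ is decomposed by EMD $H$ times until its first IMF is obtained. Then, the second component $\mathrm{IMF}_2$ of IEMD can be calculated as follows:
$$\mathrm{IMF}_2(t) = \frac{1}{H}\sum_{i=1}^{H} J_1\left(\tau_1(t) + \delta_1 J_1(w_i(t))\right)$$
Step 4. For each of the remaining stages $p$, repeat Step 3 with the residual $\tau_p(t) = \tau_{p-1}(t) - \mathrm{IMF}_p(t)$ and calculate the $(p+1)$th modal component as follows:
$$\mathrm{IMF}_{p+1}(t) = \frac{1}{H}\sum_{i=1}^{H} J_1\left(\tau_p(t) + \delta_p J_p(w_i(t))\right)$$
Step 5. Repeat Step 4 until no further IMF can be extracted from the residual signal, which is the stopping criterion. With $P$ the total number of extracted components, the final residual signal is as follows:
$$R(t) = F(t) - \sum_{p=1}^{P}\mathrm{IMF}_p(t)$$
Therefore, the original net load signal sequence $F(t)$ is finally decomposed into
$$F(t) = \sum_{p=1}^{P}\mathrm{IMF}_p(t) + R(t)$$
By adding standard normal white noise, the IEMD decomposition method resolves the mode aliasing problem of traditional EMD and is more adaptive in short-term power grid load forecasting.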
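The bookkeeping of the stages above (noise-assisted averaging, residual update, final reconstruction) can be sketched in a few lines of NumPy. This is only an illustration under strong assumptions. The function `first_mode` is a crude stand-in for the EMD operator (signal minus a moving average) and is not real EMD sifting. A single noise amplitude `delta` replaces the per-stage constants. The loop stops after a fixed number of modes rather than testing whether an IMF can still be extracted. All names and the test signal are hypothetical.

```python
import numpy as np

def first_mode(x, window=5):
    """Stand-in for the first-mode operator: signal minus its moving average."""
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")   # keep output length equal to len(x)
    trend = np.convolve(padded, kernel, mode="valid")
    return x - trend

def mode(x, p, window=5):
    """Stand-in for the p-th mode operator: peel off p-1 modes, return the next one."""
    residual = np.asarray(x, dtype=float)
    for _ in range(p - 1):
        residual = residual - first_mode(residual, window)
    return first_mode(residual, window)

def iemd_sketch(signal, n_modes=3, realizations=20, delta=0.2, seed=0):
    """Noise-assisted decomposition that follows the staged residual scheme."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    noises = rng.standard_normal((realizations, signal.size))
    # First component: average first mode of the noisy copies of the signal.
    imfs = [np.mean([first_mode(signal + delta * w) for w in noises], axis=0)]
    residual = signal - imfs[0]                     # first residual
    for p in range(1, n_modes):
        # Next component: first mode of residual plus the p-th mode of each noise.
        imf = np.mean([first_mode(residual + delta * mode(w, p)) for w in noises],
                      axis=0)
        imfs.append(imf)
        residual = residual - imf
    return np.array(imfs), residual                 # components and final residual

# Hypothetical 4-day hourly load: base level, daily cycle, and a faster ripple.
t = np.arange(96)
load = 50 + 10 * np.sin(2 * np.pi * t / 24) + 2 * np.sin(2 * np.pi * t / 4)
imfs, residual = iemd_sketch(load)
reconstruction_error = np.max(np.abs(imfs.sum(axis=0) + residual - load))
```

Because each residual is defined by subtraction, the components and the final residual sum back to the input up to floating-point rounding, which is the completeness property the text attributes to IEMD. A faithful implementation would replace `first_mode` with a proper EMD sifting routine.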

Multilayer Network Prediction Method.
To achieve accurate and efficient short-term load forecasting, this study combines reinforcement learning and deep learning into a deep reinforcement learning model with strong fitting and tracking ability. The reinforcement learning model is built on the inception module and adjusts the training parameters of the deep learning network, which is itself constructed as a deep recurrent neural network. The core of the reinforcement learning model is the policy gradient agent. At each time step $t$, the agent interacts with the environment $s_t$ to decide which state to transfer to the DNN model, and the DNN model then uses the newly selected state to complete the task. As shown in Figure 2, this study uses the inception network model as the main network of the reinforcement learning model. The performance of the currently selected model is used as the reward for updating the agent parameters. The three key elements of the agent, namely the environment, the action, and the reward function, are introduced next.
Environment: the previous state $e_{t-1}$ and the current input $x_t$ are combined to form the environment representation of the agent, as shown in the following formula:
$$z_t = e_{t-1} \oplus x_t$$
where $\oplus$ denotes concatenation.
Action: after observing the environment $z_t$, the agent determines which state is best for the DNN unit. Formally, a state set $z_p$ is constructed that retains the most recently obtained states, with its maximum size $p$ set in advance. The agent acts by sampling the optimal state in $z_p$ from the multinomial distribution $\theta_p(p \mid z_t)$, as shown in the following formula:
$$\theta_p(p \mid z_t) = \prod_{i=1}^{k} \eta_i^{[p=i]}, \qquad \eta = \mathrm{softmax}\left(\mathrm{MLP}(z_t)\right)$$
where $[p = i]$ evaluates to 1 if $p = i$ and to 0 otherwise. MLP represents a multilayer perceptron that transforms $z_t$ into a vector of dimension $k$, and the softmax function transforms this vector into the probability distribution $\eta$, whose $i$th element is $\eta_i$. Then, the selected state $z_{1-k}$ is transferred to the DNN unit for further calculation.
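The environment and action definitions above can be made concrete with a small NumPy sketch: the previous state and the current input are concatenated, passed through a multilayer perceptron, turned into a probability vector by softmax, and an action is sampled from it. All sizes, the random weights, and the single-hidden-layer MLP are assumptions made for illustration. The source does not specify the agent's architecture at this level.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def mlp(z, w1, b1, w2, b2):
    """One-hidden-layer perceptron mapping the environment to k action scores."""
    hidden = np.tanh(w1 @ z + b1)
    return w2 @ hidden + b2

rng = np.random.default_rng(0)
state_dim, input_dim, hidden_dim, k = 4, 3, 8, 5   # hypothetical sizes

e_prev = rng.standard_normal(state_dim)            # previous state
x_t = rng.standard_normal(input_dim)               # current input
z_t = np.concatenate([e_prev, x_t])                # environment representation

# Randomly initialised (untrained) agent parameters, for illustration only.
w1 = rng.standard_normal((hidden_dim, z_t.size)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((k, hidden_dim)) * 0.1
b2 = np.zeros(k)

eta = softmax(mlp(z_t, w1, b1, w2, b2))            # probability of each candidate state
action = rng.choice(k, p=eta)                      # sampled index (0-based)
log_prob = np.log(eta[action])                     # term used by policy gradient updates
```

In a policy gradient setting, `log_prob` would be weighted by the reward to update the agent parameters. That training step is omitted here.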
Reward function: an appropriate reward function guides the agent to select a series of best skip actions and thereby train a better prediction model. Through the interaction between reward and environment, the agent is encouraged to select the best skip to improve the performance of the model. As noted above, the reward is derived from the performance of the currently selected model.
This model adopts a simplified inception network structure. In the inception module, the hidden state vectors of each time step are connected to construct a hidden state matrix, and a set of optimized features is generated from the current and nearby hidden state vectors.
For the DNN training model, the network parameters must first be selected, including the number of input and output layer nodes and the number of hidden layers. The specific DNN training network is shown in Figure 3. The input layer parameters of the DNN model are closely related to the predicted power load data, which mainly determine the number of nodes in the input layer. The hidden layer parameters are generally chosen by trial and error: by repeatedly changing the parameters and comparing the analysis results, the number of hidden layers and nodes can be determined. The specific network parameters are given in the experimental part of the next section and are not repeated here. The model is then trained, which includes forward propagation and backpropagation training of the DNN network.
The DNN maps the input sequence of $x$ values to the corresponding sequential output $y$. The learning process runs from $t = 1$ to the final time step. At time step $t$, the network neuron parameters of the $s$th layer update their shared state. In this update, $x^{(t)}$ is the input data at step $t$; $y^{(t)}_{\mathrm{forecast}}$ is the corresponding prediction result; $h^{(t)}_s$ is the shared state of the $s$th-layer network at time $t$; $\kappa$ is the activation function, for which the tanh function is generally used; $\alpha^{(t)}_s$ represents the input features of the $s$th layer at that step; and $J_s$ and $J_N$ represent the model state coefficients of the network at the $s$th layer.
The input features are screened by mutual information theory. Mutual information comes from the concept of entropy in information theory and represents the amount of information shared among multiple variables, so it is often used as a variable selection tool. The mutual information of the data to be analyzed $A$ and the prediction result $B$ can be defined by the following formula:
$$I(A; B) = \sum_{i=1}^{n}\sum_{j=1}^{m} p(a_i, b_j)\,\log\frac{p(a_i, b_j)}{p(a_i)\,p(b_j)}$$
where $n$ and $m$ are the numbers of distinct values of the data to be analyzed $A$ and of the prediction result $B$, respectively. The probability that the data to be analyzed $A$ takes the value $a$ is $p(a)$, the probability that the prediction result $B$ takes the value $b$ is $p(b)$, and $p(a_i, b_j)$ is the joint probability of data sets $A$ and $B$.
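As a rough illustration of mutual-information-based feature screening, the sketch below estimates the mutual information between a candidate input and the target from a two-dimensional histogram. The binning approach, the bin count, and the synthetic `temperature`, `load`, and `unrelated` series are all assumptions. The source does not state how the probabilities are estimated.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Plug-in estimate of I(A; B) in nats from a 2-D histogram of the samples."""
    joint_counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint_counts / joint_counts.sum()        # joint probability p(a_i, b_j)
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal p(a_i), shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal p(b_j), shape (1, bins)
    nonzero = p_ab > 0                              # skip empty cells (0 * log 0 = 0)
    ratio = p_ab[nonzero] / (p_a @ p_b)[nonzero]
    return float(np.sum(p_ab[nonzero] * np.log(ratio)))

# Synthetic example: load depends on temperature but not on the unrelated series.
rng = np.random.default_rng(0)
temperature = rng.normal(25.0, 5.0, 2000)
load = 2.0 * temperature + rng.normal(0.0, 1.0, 2000)
unrelated = rng.normal(0.0, 1.0, 2000)

mi_related = mutual_information(temperature, load)
mi_unrelated = mutual_information(unrelated, load)
```

A feature-screening step would rank candidate inputs by such scores and keep those with the largest values. The histogram estimator is biased for small samples, so the scores are best used for relative comparison.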
The deep reinforcement learning model is in essence a network model with multiple sets of core parameters, and the adaptability of these parameters is very important for the accuracy of the prediction model. Therefore, this study uses the chaotic particle swarm optimization algorithm to adaptively adjust the model configuration.
The correction formula is as follows:
$$v_{k+1} = \delta v_k + \alpha_1 \beta_1 \left(p_{b,k} - x_k\right) + \alpha_2 \beta_2 \left(p_{g,k} - x_k\right), \qquad x_{k+1} = x_k + v_{k+1} \tag{15}$$
where $v_k$ is the velocity vector of the particle and $x_k$ is its position; $p_{b,k}$ is the best solution found so far by the particle itself; $p_{g,k}$ is the best solution in the history of the population; $\beta_1$ and $\beta_2$ are pseudorandom numbers between 0 and 1; $\delta$ is the inertia weight; and $\alpha_1$ and $\alpha_2$ are acceleration constants. The steps of chaotic particle swarm optimization are as follows:
(1) Initialize the population and set the built-in parameters of the model.
(2) Search for the individual optimum and the global optimum.
(3) Use equation (15) to update the particle velocity, position, and weight.
(4) Calculate the objective function of each particle and retain the best-performing particles in the population.
(5) Optimize the best particles in the population by chaos to update the local best points and the global best point.
(6) Stop when the global optimal solution meets the requirements.
Figure 4 shows the main flowchart of the multilayer network power load forecasting method proposed in this study.
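The optimization steps above can be sketched as a compact NumPy routine. Several choices here are assumptions rather than details taken from the source: the chaotic step is implemented as a logistic-map local search around the global best, the inertia weight is kept constant, the loop stops after a fixed number of iterations, and a simple sphere function stands in for the prediction model's objective.

```python
import numpy as np

def chaotic_pso(objective, dim, bounds, n_particles=20, n_iter=100,
                delta=0.7, alpha1=1.5, alpha2=1.5, chaos_steps=10, seed=0):
    """Particle swarm optimization with a logistic-map search around the global best."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, (n_particles, dim))      # step 1: initialise positions
    v = np.zeros((n_particles, dim))
    p_best = x.copy()                                   # step 2: individual bests
    p_best_val = np.array([objective(p) for p in x])
    g_idx = p_best_val.argmin()
    g_best, g_best_val = p_best[g_idx].copy(), p_best_val[g_idx]

    for _ in range(n_iter):
        beta1 = rng.random((n_particles, dim))
        beta2 = rng.random((n_particles, dim))
        # Step 3: velocity and position update.
        v = delta * v + alpha1 * beta1 * (p_best - x) + alpha2 * beta2 * (g_best - x)
        x = np.clip(x + v, low, high)
        # Step 4: evaluate particles and keep the best ones.
        vals = np.array([objective(p) for p in x])
        improved = vals < p_best_val
        p_best[improved] = x[improved]
        p_best_val[improved] = vals[improved]
        g_idx = p_best_val.argmin()
        if p_best_val[g_idx] < g_best_val:
            g_best, g_best_val = p_best[g_idx].copy(), p_best_val[g_idx]
        # Step 5: chaotic perturbation of the global best via the logistic map.
        c = rng.uniform(0.1, 0.9, dim)
        radius = 0.1 * (high - low)
        for _ in range(chaos_steps):
            c = 4.0 * c * (1.0 - c)
            candidate = np.clip(g_best + radius * (2.0 * c - 1.0), low, high)
            val = objective(candidate)
            if val < g_best_val:
                g_best, g_best_val = candidate, val
    return g_best, g_best_val                           # step 6: fixed-budget stop

def sphere(x):
    """Hypothetical stand-in objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

best_pos, best_val = chaotic_pso(sphere, dim=3, bounds=(-5.0, 5.0))
```

In the forecasting setting, the objective would instead be a validation error of the prediction network as a function of its configurable parameters, which the source does not spell out.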

Experiment and Analysis
The experimental simulation is carried out on a high-performance workstation, and the simulation operation scenario is shown in Table 1.

Mobile Information Systems
Meanwhile, Table 2 shows the main parameter configuration of the short-term load forecasting network proposed in this study.

Experimental Dataset.
The dataset, which contains 53,154 real-time load demand records from 2,187 electricity users of an urban power grid company, is used to verify the effectiveness of the proposed method. The load is sampled every hour from August 3, 2020, to August 23, 2020, so that a 20-day load curve is recorded for each load. Figure 5 shows an example of the short-term power consumption of some loads in the simulation data.
Because the simulation data set is large, the experimental data are divided proportionally into a training set and a test set. The training set is used to train the model to a good level, and the test set is used for validation and for evaluating the performance of the model. The ratio of training set to test set is 8 : 2.
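A minimal sketch of the 8 : 2 division is given below. It assumes a chronological split, with the earliest 80% of each hourly series used for training. The source states only the ratio, so the ordering, the function name `split_train_test`, and the placeholder series are illustrative assumptions.

```python
import numpy as np

def split_train_test(series, train_ratio=0.8):
    """Split a time-ordered series so that the test set follows the training set."""
    series = np.asarray(series)
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

# Placeholder series: 20 days of hourly readings for one hypothetical user.
hourly_load = np.arange(480, dtype=float)
train, test = split_train_test(hourly_load)
```

Keeping the split chronological avoids training on readings that occur after the test period, which matters for time-series evaluation.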

Forecasting Network Evaluation Index.
To verify and evaluate the advantages of the proposed method, two evaluation indexes for time-series prediction models are selected, namely the mean absolute percentage error $e_{\mathrm{MAPE}}$ and the root-mean-square error $e_{\mathrm{RMSE}}$. The calculation formulas are as follows:
$$e_{\mathrm{MAPE}} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - y_i^{*}}{y_i}\right| \times 100\% \tag{16}$$
$$e_{\mathrm{RMSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i^{*}\right)^{2}} \tag{17}$$
where $y_i$ and $y_i^{*}$ are the actual value and the predicted value of load demand at the $i$th time, respectively, and $n$ is the number of samples in the test set.
As can be seen from formulas (16) and (17), the $e_{\mathrm{MAPE}}$ value represents the percentage deviation of the prediction result, while the $e_{\mathrm{RMSE}}$ value represents the absolute magnitude of the deviation of the prediction result. The smaller the $e_{\mathrm{MAPE}}$ and $e_{\mathrm{RMSE}}$ values, the better the prediction effect.
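The two evaluation indexes can be computed directly with NumPy, as in the short sketch below. The function names and the three sample values are hypothetical. The percentage error assumes that no actual value is zero.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def rmse(actual, predicted):
    """Root-mean-square error, in the same unit as the load (for example MW)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical actual and predicted loads in MW.
actual = np.array([10.0, 20.0, 40.0])
predicted = np.array([11.0, 18.0, 40.0])
e_mape = mape(actual, predicted)   # (10% + 10% + 0%) / 3
e_rmse = rmse(actual, predicted)   # sqrt((1 + 4 + 0) / 3)
```

The first index is scale-free and the second keeps the physical unit, which is why the paper reports one in percent and the other in MW.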

Data Reliability Analysis.
To verify that the proposed method supports accurate load forecasting analysis, the traditional EMD method is used as a comparison to process and analyze the sample data. Table 3 shows the processing results of the proposed method and the comparison method. As shown in Table 3, in terms of the number of IMFs, the IEMD data processing method obtains 8 IMF components, more than the traditional EMD method, which indicates that it extracts the frequency components more effectively. At the same time, the frequency and amplitude of IMF 1 to IMF 3 for the proposed method remain within 11-12.5% and 1.0-1.2 MW, respectively, which are higher than those of the other components, whereas the frequency and amplitude of the components of the EMD method differ little. The residual R value also shows that the proposed method has better data processing ability than the traditional EMD method and can provide more reliable and complete data support for the subsequent prediction network.

Load Forecasting Analysis.
In this study, the methods of references [25] and [26] are used as comparison methods and are applied to the same experimental data set. All methods are evaluated under the same experimental scenario. Figure 6 shows the qualitative analysis results.
As shown in Figure 6, the proposed method tracks the load change curve better than the comparison methods and has excellent curve fitting ability. The reason is that, before network model training, this study obtains reliable and complete data support through the IEMD data processing method. More importantly, owing to the inception module of the reinforcement learning model, the DNN prediction network can adapt its parameters more rapidly and accurately. Combined with the hybrid particle swarm optimization algorithm, the adaptability of the prediction network is further strengthened, which shows up directly as faster and more accurate tracking of the load curve. By contrast, the network models in references [25,26] do not involve adaptive adjustment of the prediction network parameters, so it is difficult for them to achieve real-time and effective load analysis.
To more intuitively analyze the advantages of the prediction performance of the proposed method, this study makes a quantitative analysis. Table 4 shows the prediction performance evaluation indexes of different prediction methods.
As shown in Table 4, the mean absolute percentage error e MAPE and root-mean-square error e RMSE of the method proposed are 10.01% and 2.156 MW, respectively, 11.93% and 1.659 MW lower than reference [25], and 15.16% and 2.515 MW lower than reference [26], showing better load forecasting performance.

Conclusion
In view of the low performance of current short-term load forecasting methods, this study combines a reinforcement learning model with deep learning to propose an efficient and accurate deep reinforcement learning load forecasting method. The method consists of two main parts. First, the IEMD method is used to process the load-side power data in an orderly way, which provides reliable and complete data support for the prediction network model. Second, the inception module is used to dynamically adjust the parameters of the DNN load forecasting network, which improves the tracking and fitting ability of the forecasting network. The simulation results confirm that the proposed method can track and analyze power load consumption efficiently and accurately in an actual complex scenario. There are still a few data errors in the existing load data. In this study, the erroneous data are eliminated manually. However, if the data screening process is ignored, it will have a great impact on the subsequent model prediction. Therefore, the method can be combined with blockchain technology to realize distributed storage of power data, improve data transmission quality, and support short-term load forecasting of the smart grid.

Data Availability
The data used to support the findings of the study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this study.