A Short-Term Load Forecasting Model of LSTM Neural Network considering Demand Response

As one of the key technologies for accelerating the construction of the ubiquitous Internet of Things, demand response (DR) not only guides users to participate in power market operations but also increases the randomness of grid operations and the difficulty of load forecasting. In order to solve the problems of rough feature engineering and low prediction accuracy, a short-term load forecasting model of LSTM neural network considering demand response is proposed. First, in view of the strong randomness and complexity of the input features, a weighted method is used to process multiple input features, strengthening the contribution of effective features and tapping their potential value. Second, an improved genetic algorithm (IGA) is used to obtain the best LSTM parameters. Finally, the special gate structure of the LSTM model is used to selectively control the influence of input variables on the model parameters and to perform load forecasting. The experimental results show that the proposed method has high prediction accuracy and application value and provides a new approach for the development of power load forecasting.


Introduction
In recent years, due to the shortage of fossil energy such as oil and the serious environmental problems caused by global carbon emissions, more and more experts and scholars focus on the direction of "Energy Internet [1][2][3]." People aim to establish a more flexible and stable form of hybrid power generation. As an important constraint for the safe and stable operation of power system, accurate power load forecasting plays an important role in the economic, safe, and reliable operation of the power system.
Demand response (DR) serves as an important means of shifting energy supply and use across time to counter uncertain variations. It is of great significance for improving the overall energy efficiency of the energy system [4]. However, especially in the Regional Energy Internet System [5], the strong uncertainty of user-side DR increases the difficulty of load forecasting. Accurate load forecasting is the basis for the economic operation and scientific management of the power system, as well as a prerequisite for long-term strategic planning and operational decision-making. In fact, the fundamental requirement for implementing DR programs is the load forecasting of their participants, whether they are groups of residential consumers managed by aggregators, industrial consumers, or commercial consumers. The DR capacity and DR events involved in these programs can be determined through load forecasting [6]. Recently, artificial intelligence (AI) and machine learning (ML) technologies have been widely used in renewable energy forecasting, load forecasting, power system fault diagnosis and optimal dispatching, power grid data visualization, and so forth [7]. At the same time, with the installation of large numbers of measurement devices such as smart meters [8] and weather monitoring equipment [9], power grid companies have obtained unprecedented amounts of data, which provide substantial support for AI-based power load forecasting [10]. On this basis, improving the accuracy of load forecasting and the generalization ability of the load forecasting model has become an urgent problem.
Load forecasting is one of the most widely used areas of artificial intelligence technology in power systems. Scholars have carried out extensive research on the theory and methods of load forecasting. The methods for power load forecasting mainly include traditional methods and artificial intelligence methods. Over time, traditional methods [11][12][13] have gradually exposed problems such as poor model generalization ability, difficulty in determining the model structure, and difficulty in selecting parameters. Artificial intelligence methods [14][15][16][17][18] have gradually come to dominate load forecasting. Ahmad et al. [19] used two ML algorithms to forecast regional short-term energy demand. In [20], a deep recurrent neural network (DRNN) and a deep convolutional neural network (DCNN) were presented for day-ahead multistep load forecasting in commercial buildings. Khan et al. used grey correlation based on random forest (RF) and mutual information (MI) for feature selection. Kernel principal component analysis (KPCA) was used for feature extraction, and an enhanced convolutional neural network (CNN) was used for classification [21]. The choice of model hyperparameters affects the entire prediction process. Finding the optimal model configuration is essentially a combinatorial problem. Bouktif et al. used metaheuristic-search-based algorithms to reduce the complexity of the search.
This method can find the optimal LSTM hyperparameter set more accurately and quickly [22]. Santra and Lin hold a similar view on the hyperparameter problem: they proposed using GA to optimize the initialization parameters of the LSTM, which improved the robustness of short-term load forecasting [23]. Rong and de León [24] proposed a load estimation method suitable for complex power networks. In order to solve the problem of network unobservability, they proposed a nonlinear power-temperature curve to predict the load that varies with temperature. The "effective temperature" was introduced in that paper to properly account for the impact on the power consumption of heating and cooling equipment. This method has been well verified in practice. Li et al. [25] proposed a short-term load forecasting method that considers demand response in an energy Internet environment. The grey correlation analysis method was used to process meteorological data to obtain similar-day characteristic variables as inputs to the forecast model. The LSTM neural network model was used for load forecasting. This method separates out the DR electricity price that motivates users to participate in DR, thereby accounting for the DR load indirectly.
This method provides a new idea for load forecasting considering DR. The rapid development of AI provides more possibilities for load forecasting technology. At the same time, it also places stricter requirements on the quality of load forecasting, which brings new challenges and opportunities to load forecasting technology.
In order to improve prediction accuracy and model generalization ability, a short-term load forecasting model of LSTM neural network considering DR is proposed in this paper. For feature engineering, the weighted method [26] is used to process multiple input features. In order to strengthen the contribution of effective features and explore their potential value in depth, this paper defines weights for the input features. Then, IGA is used to select the optimal model parameters of the LSTM [27]. Finally, the parameter-tuned LSTM model is used for load forecasting.

Feature Extraction
This paper uses data from New South Wales, Australia, from 2006 to 2010 for experimental verification. The data from 2006 to 2008 are used as the training set, and the data from 2009 to 2010 are used as the verification set. The maximum temperature, the minimum temperature, the average temperature, the date, and the real-time electricity price of the area are extracted as the input features.

Extraction of Temperature Characteristics.
The average temperature is the true average temperature of each day. The average temperature is lowest in winter and highest in summer. The load in these two seasons is higher than in the other seasons because of increased air conditioner usage. Therefore, the average temperature is used to pull abnormally high and low temperatures into a reasonable range and reduce temperature fluctuation, and it is then combined with the real temperature. In this paper, the average temperature weight is set to 0.6, and the true temperature weight is set to 0.4. The specific formula is as follows:

$$T = 0.6\,T_{\mathrm{avg}} + 0.4\,T_{\mathrm{true}} \tag{1}$$

where $T$ is the input feature of temperature, $T_{\mathrm{avg}}$ is the average temperature, and $T_{\mathrm{true}}$ is the real temperature. Figure 1 shows the temperature characteristic curve after the weighting treatment.
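As a minimal sketch of the weighting described above, the following function combines the daily average temperature with the real temperature using the stated weights of 0.6 and 0.4. The function name `weighted_temperature` and the use of NumPy arrays are illustrative assumptions, not part of the original study.

```python
import numpy as np

def weighted_temperature(t_avg, t_true, w_avg=0.6, w_true=0.4):
    """Blend the average and real temperature into one input feature.

    The default weights 0.6 and 0.4 are the values stated in the text;
    the function name and array-based interface are assumptions.
    """
    t_avg = np.asarray(t_avg, dtype=float)
    t_true = np.asarray(t_true, dtype=float)
    return w_avg * t_avg + w_true * t_true

# Example: a 38 degree spike on a day whose average is 24 degrees
# is pulled toward the average.
feature = weighted_temperature([24.0], [38.0])
```

Because the average temperature carries the larger weight, an abnormal reading moves the feature by only 40% of its deviation from the daily average.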

Extraction of Date Characteristics.
Date type is another important influencing factor in short-term forecasting. The user-side load on weekends is greater than the load on workdays, and the user-side load on holidays is also greater than the load on workdays. This paper therefore lists the date type as one of the factors that affect load forecasting. Because the parameters of the deep neural network are adjusted through feedback, the date type must be expressed numerically: workdays are encoded as 1, and weekends and holidays are encoded as 2. The date feature is stored in the fourth column of the feature vector.
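The date encoding can be sketched as follows. The function name `date_type` and the `holidays` argument are hypothetical; the paper does not state which holiday calendar was used, so the holiday set must be supplied by the caller.

```python
from datetime import date

def date_type(day, holidays=frozenset()):
    """Return 1 for workdays and 2 for weekends and holidays.

    `holidays` is a caller-supplied set of dates (an assumption:
    the source does not specify the holiday calendar).
    """
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if day.weekday() >= 5 or day in holidays:
        return 2
    return 1

# Example with one assumed public holiday.
nsw_holidays = {date(2009, 1, 1)}
codes = [date_type(d, nsw_holidays)
         for d in (date(2009, 1, 1), date(2009, 1, 3), date(2009, 1, 5))]
```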

Extraction of Real-Time Electricity Price Characteristics.
The reform of the electricity market is an inevitable trend and a requirement of national development. The marketization of electricity prices is the top priority, and the real-time electricity price is an important factor affecting load fluctuation. First, the maximum information coefficient (MIC) [28] is used to analyze the correlation between load and real-time price. MIC measures the degree of linear or nonlinear dependence between two variables. The greater the mutual information between the two variables, the stronger the correlation. MIC overcomes the shortcoming that mutual information is inconvenient to calculate for continuous variables and better reflects the degree of correlation between attributes. For a two-variable data set $D \subset \mathbb{R}^2$, divide $D$ into a grid $G$ of $x$ columns and $y$ rows. Calculate the probability of each cell in $G$ to obtain the probability distribution $D|_G$ of the data set $D$ on the grid $G$, and denote the maximum mutual information over grids of this size as $I^{*}[D(x, y)]$:

$$I^{*}[D(x, y)] = \max_{G} I(D|_G) \tag{2}$$

Normalize the obtained mutual information and take the maximum to obtain the maximum information coefficient:

$$\mathrm{MIC}(D) = \max_{xy < B(n)} \frac{I^{*}[D(x, y)]}{\log \min(x, y)} \tag{3}$$

where $n$ is the sample size and $B(n)$ is a function of the sample size that bounds the total number of cells $xy$ in the grid $G$, which must be less than $B(n)$; generally $B(n) = n^{0.6}$. In essence, MIC is a normalized maximum mutual information with values in the interval $[0, 1]$. The statistical results show that days with MIC > 0.4 total 908, accounting for 46.69%, of which days with MIC > 0.6 total 502, accounting for 26.43%. Figure 2 shows the curves of electricity price and load. It can be seen that the trends of electricity price and load are basically the same. Residents can check the real-time electricity price and appropriately reduce their electricity consumption, which lowers the load and contributes to peak clipping. When the load reaches a trough, the electricity price is also reduced, encouraging users to consume more electricity and fill the valley.
The experiments indicate that the electricity price is strongly correlated with the load. In this paper, the real-time price is placed in the fifth column of the feature vector as an input feature.
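The following is a simplified sketch of an MIC-style score between price and load. It is an approximation rather than the full MIC algorithm: it only tries equal-frequency grids instead of searching for the optimal grid at each resolution, so it yields a lower bound on the MIC. The function names and the equal-frequency binning are assumptions made for illustration; the grid budget `n ** 0.6` follows the text.

```python
import numpy as np

def equal_frequency_bins(values, k):
    """Assign each sample to one of k bins holding roughly equal counts."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks * k // len(values)

def grid_mutual_information(xb, yb, nx, ny):
    """Mutual information (in nats) of the joint histogram on an nx-by-ny grid."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (xb, yb), 1.0)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / outer[mask])))

def approx_mic(x, y, alpha=0.6):
    """Approximate MIC using equal-frequency grids with nx * ny < n ** alpha."""
    n = len(x)
    budget = n ** alpha
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            if nx * ny >= budget:
                continue
            mi = grid_mutual_information(
                equal_frequency_bins(x, nx), equal_frequency_bins(y, ny), nx, ny
            )
            # Normalize by log(min(nx, ny)) so the score lies in [0, 1].
            best = max(best, mi / np.log(min(nx, ny)))
    return best

# Synthetic illustration (not the study's data): load rising with price.
price = np.linspace(20.0, 60.0, 200)
load = 5000.0 + 40.0 * price
score = approx_mic(price, load)
```

A strictly monotonic relationship scores close to 1, while independent series score close to 0; applying the function per day to price and load would mirror the daily statistics reported above.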

Prediction Model
Improved Genetic Algorithm.
The genetic algorithm (GA) is a global search optimization algorithm modeled on the natural laws of selection and inheritance ("survival of the fittest"). An initial population of feasible solutions is evolved by three operators: selection, crossover, and mutation. Individuals are eliminated one by one according to their fitness values, and the individuals with the best fitness remain to form a new population. Because the genetic algorithm evaluates multiple groups of solutions in parallel, repeating this process over generations moves the population toward global convergence and an optimal solution.
Hill climbing is a good local search algorithm. First, a point in the search space is randomly selected as the initial point of the iteration, and then a point is randomly generated in its neighborhood and its function value is calculated. If the function value of the new point is better than that of the current point, the new point replaces the current point and the search continues in its neighborhood. Otherwise, another point is randomly generated in the neighborhood of the current point. The search terminates when no better point has been found several times in a row. Because hill climbing optimizes by randomly generating individuals in the neighborhood, it does not need gradient information. Hill climbing can therefore provide local optimization when the genetic algorithm deals with complex problems. In the iterative process of this paper, hill climbing is introduced to refine the individuals obtained by the genetic algorithm, which greatly improves the efficiency of optimization.
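The hill climbing procedure described above can be sketched as follows. This is a minimal illustration that minimizes a function; the neighborhood size `step`, the failure limit `patience`, and the uniform sampling of neighbors are assumptions, since the paper does not give these details.

```python
import random

def hill_climb(f, x0, step=0.1, patience=20, rng=None):
    """Gradient-free local search that minimizes f starting from x0.

    A neighbor is drawn uniformly within +/- step of the current point.
    The search stops after `patience` consecutive neighbors fail to
    improve on the current point (both values are assumed defaults).
    """
    rng = rng or random.Random(0)
    x, fx = list(x0), f(x0)
    fails = 0
    while fails < patience:
        cand = [xi + rng.uniform(-step, step) for xi in x]
        fc = f(cand)
        if fc < fx:
            x, fx, fails = cand, fc, 0  # accept the better point
        else:
            fails += 1                  # try another neighbor
    return x, fx

# Example: refine a candidate on a simple quadratic bowl.
sphere = lambda v: sum(c * c for c in v)
best_x, best_f = hill_climb(sphere, [1.0, -2.0])
```

In an IGA-style loop, a routine like this would be applied to the individuals produced by the genetic operators, with `f` set to the prediction error of the LSTM.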

LSTM Algorithm Mechanism.
LSTM is an excellent variant of the recurrent neural network (RNN) that addresses the short memory and training difficulty of the RNN. It is well suited to the classification and prediction of time series and is widely used in natural language processing. As shown in Figure 3, $t$ is the time point; $i_t$ is the input gate, which controls how much information about the current state of the network is saved in the internal state; $f_t$ is the forget gate, which controls how much information from the past state is discarded; $o_t$ is the output gate, which controls how much information of the internal state at the current moment is output to the external state; $c_t$ is the internal state of the neuron at the current moment; $h_t$ is the external state at the current moment; $x_t$ is the external input at the current moment; and $\sigma$ is the activation function. These variables are related through the LSTM gate equations. Through the combined application of the three control gates, the LSTM neural network controls the retention and discarding of the information transmitted in the network, and it determines how much of the new state information at the current moment is stored in the memory unit. Compared with general neural networks, LSTM neural networks can learn dependencies over a relatively long span while largely avoiding the problems of vanishing and exploding gradients [29].
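As an illustration of how the three gates interact, the sketch below implements one time step of the standard LSTM formulation with NumPy, using the common convention in which $c_t$ is the cell (memory) state and $h_t$ is the output state. The stacked weight layout (`W`, `U`, `b`) and the function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    Assumed layout: W has shape (4H, D), U has shape (4H, H), and b has
    shape (4H,), stacked in the order input gate, forget gate, output
    gate, candidate state.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i_t = sigmoid(z[0:H])            # input gate
    f_t = sigmoid(z[H:2 * H])        # forget gate
    o_t = sigmoid(z[2 * H:3 * H])    # output gate
    c_tilde = np.tanh(z[3 * H:4 * H])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde  # keep part of the old memory, add new
    h_t = o_t * np.tanh(c_t)            # expose part of the memory
    return h_t, c_t

# Example: 5 input features (as in the feature vector above), 3 hidden units.
rng = np.random.default_rng(0)
D, H = 5, 3
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The additive update of `c_t` is what lets information persist across many time steps: when the forget gate stays near 1, the memory is carried forward almost unchanged.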

Selection of LSTM Optimal Parameters.
The initial values of the weighting matrices can affect the performance of the LSTM. Inspired by [23], an improved genetic algorithm (IGA) is used to assist in searching for proper initial values for the weighting matrices of the LSTM. Figure 4 shows the flow diagram of the proposed method.
Step 1. Generate the initial population.
A set of n chromosomes is randomly generated. Each chromosome w contains the values for all the weighting matrices in the LSTM.
Step 2. Set fitness function.
For each chromosome in the current population, its value is used to initialize the weighting matrices in the LSTM. The prediction error of the LSTM on the training samples serves as the fitness function, from which the fitness value of each individual in the initial population is calculated.
Step 3. Generate the new population through genetic operations.
This step generates a new population that contains the same number of chromosomes as the current population. Roulette wheel selection is used so that a chromosome with a higher fitness value has a higher probability of being selected for genetic operations. The chromosomes of the new population are produced by reproduction, crossover, and mutation operations on the chromosomes selected from the current population.
Chromosomes with higher fitness in the current population are selected for reproduction and added to the new population until the reproduction ratio is reached. An elitism policy is used in this article to ensure that the best chromosome in the current population is added to the new population.
For the crossover operation, two chromosomes of the current population are repeatedly selected as parents to generate two offspring chromosomes, which are added to the new population until the crossover ratio is reached. Uniform crossover is adopted in this study. With uniform crossover, each value in the offspring chromosomes is independently chosen from the two values at the corresponding position in the two parent chromosomes, as shown in Figure 5. The mutation operation repeatedly selects a chromosome from the current population, modifies it to generate a new chromosome, and adds the mutant to the new population until the mutation ratio is reached. With one-point mutation, a small random change is injected into the value at a randomly selected position in the selected chromosome to generate the mutant, as shown in Figure 6. The mutation operation makes the chromosome population more diverse.
Step 4. Select the best chromosome.
After each generation, the fitness value is calculated for the retained chromosomes. The chromosome with the maximum fitness value is compared with the best chromosome of the previous generation, and the better of the two is recorded as the best chromosome. Finally, the chromosome with the best fitness value is used to initialize the weighting matrices in the LSTM, and the training data are fed to the LSTM to generate the fitness value of the chromosome.
Step 5. Stop criterion.
If the fitness value of the best chromosome is already optimal, then the LSTM of the best chromosome is adopted, and the proposed method is terminated. The parameters of the LSTM are initialized with the optimal weights to obtain the optimal network structure.
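The genetic operations in Step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: fitness values are positive with higher meaning better (for example, a decreasing function of the prediction error), chromosomes are flat lists of floats, and the ratios `repro_ratio` and `cross_ratio` and the mutation scale are hypothetical values that the paper does not report.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick a chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitness))
    acc = 0.0
    for chrom, fit in zip(population, fitness):
        acc += fit
        if acc >= pick:
            return chrom
    return population[-1]

def uniform_crossover(p1, p2, rng):
    """Each position of the offspring is drawn independently from a parent."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2

def one_point_mutation(chrom, rng, scale=0.1):
    """Inject a small random change at one randomly chosen position."""
    child = list(chrom)
    k = rng.randrange(len(child))
    child[k] += rng.gauss(0.0, scale)
    return child

def next_generation(population, fitness, rng, repro_ratio=0.2, cross_ratio=0.6):
    """Build a same-sized population by reproduction, crossover, and mutation."""
    n = len(population)
    elite = max(range(n), key=lambda j: fitness[j])
    new_pop = [list(population[elite])]  # elitism: always keep the best
    while len(new_pop) < max(1, round(repro_ratio * n)):
        new_pop.append(list(roulette_select(population, fitness, rng)))
    cross_target = min(n, len(new_pop) + round(cross_ratio * n))
    while len(new_pop) < cross_target:
        new_pop.extend(uniform_crossover(
            roulette_select(population, fitness, rng),
            roulette_select(population, fitness, rng), rng))
    while len(new_pop) < n:  # the remainder is filled by mutation
        new_pop.append(one_point_mutation(
            roulette_select(population, fitness, rng), rng))
    return new_pop[:n]
```

In the setting of this paper, each chromosome would hold the flattened initial weights of the LSTM, and the fitness would be derived from the training error obtained after initializing the network with that chromosome.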
In this paper, the original real data set is divided into three training sets and a test set. Feature engineering is carried out by means of the weighted discretization method, which increases the value of the input features. The IGA is then used to select the optimal parameters of the LSTM network, and the parameter-tuned LSTM network is used to train on and predict the real data. The specific process is shown in Figure 7.

Case Study
In order to evaluate the prediction performance, the mean absolute percentage error $Y_{\mathrm{MAPE}}$ and the forecasting accuracy $Y_{\mathrm{FA}}$ are adopted, following the load forecasting indexes used by the State Grid Corporation of China. In both indexes, $n$ is the total number of predictions, and $X_{\mathrm{act}}(i)$ and $X_{\mathrm{pred}}(i)$ are the real value and the predicted value of the load at time $i$, respectively. The results are listed in Table 1. It can be seen from the table that, whether the horizon is one day or half a year, $Y_{\mathrm{MAPE}}$ of the proposed model is the smallest, $Y_{\mathrm{FA}}$ is the greatest, and the goodness of fit is the best of all models, which shows that the method proposed in this paper is more accurate and more robust. Figure 8 compares the load curves predicted by the four methods with the real load curve. It can be seen from the figure that the load curve predicted by the proposed model has the highest degree of fit with the real curve.
The prediction results are more accurate than those of the other three methods.
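The two evaluation indexes can be sketched as follows. The MAPE is the standard definition, the mean of $|X_{\mathrm{act}}(i) - X_{\mathrm{pred}}(i)| / X_{\mathrm{act}}(i)$ expressed as a percentage. The forecasting accuracy shown here, one minus the root mean square of the relative errors, is an assumption about the form of the State Grid index and may differ from the exact expression used in the paper. The function names are illustrative.

```python
import numpy as np

def mape(x_act, x_pred):
    """Mean absolute percentage error, in percent."""
    x_act = np.asarray(x_act, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return float(np.mean(np.abs((x_act - x_pred) / x_act)) * 100.0)

def forecasting_accuracy(x_act, x_pred):
    """Forecasting accuracy, in percent.

    Assumed form: 1 minus the root mean square relative error.
    """
    x_act = np.asarray(x_act, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    rel = (x_act - x_pred) / x_act
    return float((1.0 - np.sqrt(np.mean(rel ** 2))) * 100.0)

# Illustrative values only (not results from the paper).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
y_mape = mape(actual, predicted)
y_fa = forecasting_accuracy(actual, predicted)
```

Both relative errors in this toy example are 10%, so the MAPE is 10% and the assumed accuracy index is 90%.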

Conclusions
Because user-side DR is highly uncertain and makes load prediction more difficult, this paper selects regions with relatively mature demand-side management as target regions and puts forward a short-term load prediction method that takes DR into consideration. The weighted discretization method is used to process the input features and enhance their contribution. Then, IGA is used to select the optimal model parameters of the LSTM. After that, the LSTM prediction model is established for load prediction. Finally, the effectiveness and superiority of the proposed method are verified on actual data, and the following conclusions are drawn.
(1) Considering a variety of load influencing factors, this paper uses the weighted discretization method to process the input characteristics. The maximum information coefficient method is used to verify that the real-time electricity price characteristics selected in this paper have a strong correlation with load.
(2) The IGA algorithm is used to select the optimal LSTM parameters.
(3) The LSTM model has a special forget gate mechanism, which can automatically screen the input variables that are beneficial to the model during the training process. This shortens the prediction time while improving the model performance and prediction accuracy.

Data Availability
The data used in the study are available at http://www.aemo.com.au.

Conflicts of Interest
The authors declare no conflicts of interest.
Complexity