Short-Term Load Forecasting Based on Frequency Domain Decomposition and Deep Learning

In this paper, we focus on improving the accuracy of short-term load forecasting, which supports reasonable advance planning and stable operation of the power system. For this purpose, a short-term load forecasting model based on frequency domain decomposition and deep learning is proposed. The original load data are decomposed into four parts: the daily and weekly periodic components and the low- and high-frequency components. A long short-term memory (LSTM) neural network is applied to forecast the daily periodic, weekly periodic, and low-frequency components. A combination of isolation forest (iForest) and the Mallat algorithm with LSTM is constructed to forecast the high-frequency component. Finally, the forecasts of the four parts are added together. The actual load data of a Chinese city are studied. Compared with the forecasting results of the empirical mode decomposition- (EMD-) LSTM, LSTM, and recurrent neural network (RNN) methods, the proposed method effectively improves the accuracy and reduces the degree of dispersion between the forecast and actual values.


Introduction
To meet the needs of rapid social development, the power system has gradually evolved into a self-healing, economic, and efficient smart grid with large-scale renewable energy access. Load forecasting is the basis of smart grid planning and operation [1][2][3]. Short-term load forecasting is important for arranging scheduling plans, optimizing unit commitment, and so on [4,5].
At present, various load forecasting methods have been studied, such as the time series method [6], neural network methods [7][8][9][10][11], support vector machine algorithms [12][13][14], the locally weighted linear regression method [15], and the random forest (RF) method [16,17]. Among them, neural network methods are widely used in load forecasting research. In [11], the LSTM neural network is applied to the energy consumption data of 48 nonresidential consumers in China. The LSTM network effectively addresses the problems of being trapped in local minima and of vanishing gradients that affect traditional neural networks during training, which helps improve the accuracy of load forecasting. In recent years, methods that decompose the load sequence and then forecast the components have become a hot topic, such as EMD [18,19], sparse decomposition [20], local mean decomposition [21], and frequency domain decomposition [22,23]. Among them, empirical mode decomposition is highly adaptive but suffers from mode mixing, end effects, and overenvelope. Sparse decomposition is widely used in signal processing, but its double-dictionary decomposition algorithm requires a large amount of computation. Compared with empirical mode decomposition, local mean decomposition has weaker end effects and needs fewer iterations, but its oversmoothing reduces forecasting accuracy to some extent. Frequency domain decomposition is built on the periodicity of the load sequence: the periodicity of the load is mined and extracted from the frequency domain perspective, and an appropriate forecasting method is established according to the behavior of each load component. However, the forecasting accuracy of the high-frequency component has still not reached the desired precision.
In this paper, field data from a Chinese city are decomposed with the frequency domain decomposition algorithm into four parts: the daily and weekly periodic components and the low- and high-frequency components. The LSTM method is used to train on and predict the highly regular daily and weekly periodic components and the smooth low-frequency component. The high-frequency component has no obvious periodicity and fluctuates considerably. To clean abnormal data and remove high-frequency burrs, the iForest algorithm is first used to clean the high-frequency component training samples. The Mallat algorithm is then used to decompose the cleaned data into a high-frequency signal and a low-frequency signal. The low-frequency signal is used as the training samples for the LSTM algorithm to forecast the original high-frequency component. Finally, the forecasting results of the four components are reconstructed according to the time points. The overall forecasting method is shown in Figure 1.

Description of Decomposition.
After the four parts, namely the daily periodic, weekly periodic, low-frequency, and high-frequency components, are obtained, a different forecasting method is adopted for each according to the characteristics of its sequence. Although the load sequence is random, some of the component sequences obtained by frequency domain decomposition show clear regularity, which improves forecasting accuracy.

Frequency Domain Decomposition Algorithm.
Other decomposition algorithms, such as EMD and variational mode decomposition (VMD), generally decompose signals according to scale functions or filters. The frequency domain decomposition algorithm used in this paper instead decomposes the load at preset angular frequencies and then uses the periodic characteristics of the load to reconstruct the component sequences.
Power load is a time series with a strong periodic characteristic. The Fourier transform is used to decompose the load time series $P(t)$ as follows [24]:
$$P(t) = a_0 + \sum_{i=1}^{N-1}\left[a_i\cos(\omega_i t) + b_i\sin(\omega_i t)\right],$$
where $N$ is the number of original load data points and $a_0$ is the direct-current component. In this way, $P(t)$ is decomposed into components with angular frequencies $\omega_i = 2\pi i/N$, $i = 1, 2, \ldots, N-1$. Taking advantage of the periodicity of load variation, $P(t)$ can be regrouped according to the angular frequencies $\omega_i$ and the coefficients $a_i$ and $b_i$ as follows:
$$P(t) = a_0 + D(t) + W(t) + L(t) + H(t),$$
where $a_0 + D(t)$ is the daily periodic component and $W(t)$ is the weekly periodic component; both vary with a fixed cycle. $L(t)$ and $H(t)$ are the low- and high-frequency components, respectively. The components are assigned by a modular operation on the frequency subscript $i$, where $\operatorname{mod}(x, y)$ denotes the remainder of $x$ divided by $y$. In this paper, 96 sampling points per day (a 15-minute interval) are taken as an example.
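The coefficients $a_i$ and $b_i$ of the series can be read directly off a discrete Fourier transform. The sketch below assumes NumPy and the full-range sum up to $N-1$ used above (so each conjugate frequency pair contributes two half-amplitude terms); the function names `fourier_coefficients` and `fourier_series` are hypothetical, chosen for illustration only.

```python
import numpy as np


def fourier_coefficients(p):
    """Return a, b with P(t) = a[0] + sum_{i=1}^{N-1} a[i] cos(w_i t) + b[i] sin(w_i t)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    spectrum = np.fft.fft(p)   # X(w_i) = sum_t P(t) exp(-j w_i t), w_i = 2*pi*i/N
    a = spectrum.real / n      # a[0] is the direct-current component a_0
    b = -spectrum.imag / n     # b[0] is zero for a real series
    return a, b


def fourier_series(a, b, t):
    """Evaluate the full-range series at the integer sample times t."""
    n = a.size
    w = 2.0 * np.pi * np.arange(n) / n             # angular frequencies w_i
    t = np.atleast_1d(np.asarray(t, dtype=float))
    phase = np.outer(w[1:], t)                     # shape (N-1, len(t))
    terms = a[1:, None] * np.cos(phase) + b[1:, None] * np.sin(phase)
    return a[0] + terms.sum(axis=0)
```

Evaluating `fourier_series(a, b, range(len(load)))` with coefficients from `fourier_coefficients(load)` returns the original samples up to rounding error, which is the property the decomposition relies on.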
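(1) The period of $a_0 + D(t)$ is 96 sampling intervals; it is the load component that varies with a period of one day. Its angular frequencies are integer multiples of $2\pi/96$, so its angular frequency set is
$$\Omega_{\mathrm{day}} = \{\omega_i \mid \operatorname{mod}(i, N/96) = 0\}.$$
(2) The period of $W(t)$ is $7 \times 96$ sampling intervals; it is the load component that varies with a period of one week. Because every daily frequency is also a multiple of $2\pi/(7 \times 96)$, the daily frequencies are excluded, and its angular frequency set is
$$\Omega_{\mathrm{week}} = \{\omega_i \mid \operatorname{mod}(i, N/(7 \times 96)) = 0,\ \operatorname{mod}(i, N/96) \neq 0\}.$$
Mathematical Problems in Engineering
(3) $L(t)$ and $H(t)$ take the remaining angular frequencies, which are divided into a low-frequency set $\Omega_{\mathrm{low}}$ and a high-frequency set $\Omega_{\mathrm{high}}$ with $\Omega_{\mathrm{low}} \cup \Omega_{\mathrm{high}} = \{\omega_i \mid \omega_i \notin \Omega_{\mathrm{day}} \cup \Omega_{\mathrm{week}}\}$.
Taking 28 days with 96 sampling points per day as an example ($N = 2688$, so $N/96 = 28$ and $N/(7 \times 96) = 4$), the possible values of the frequency subscript $i$ are: for $a_0 + D(t)$, $i \in \{0, 28, 56, \ldots, 2660\}$; for $W(t)$, $i \in \{4, 8, 12, \ldots, 2684\}$ excluding multiples of 28; and for $L(t)$ and $H(t)$, the remaining subscripts, that is, those not divisible by 4. Here the angular frequency $\omega_0 = 0$ corresponds to the direct-current component.
The discrete Fourier transform and the inverse discrete Fourier transform are
$$X(\omega_i) = \sum_{t=0}^{N-1} P(t)\,e^{-j\omega_i t}, \qquad P(t) = \frac{1}{N}\sum_{i=0}^{N-1} X(\omega_i)\,e^{j\omega_i t}.$$
Comparing the Fourier decomposition coefficients $a_i$ and $b_i$ with the spectrum $X(\omega_i)$ gives
$$a_i = \frac{\operatorname{Re} X(\omega_i)}{N}, \qquad b_i = -\frac{\operatorname{Im} X(\omega_i)}{N}, \qquad i = 0, 1, \ldots, N-1,$$
where $b_0 = 0$ for a real load sequence. Therefore, after the discrete Fourier transform of the load sequence, the coefficients $a_i$ and $b_i$ can be obtained from the spectrum values. Combining the inverse Fourier transform with Euler's formula $e^{j\theta} = \cos\theta + j\sin\theta$, the decomposed sequence is obtained as
$$P(t) = \frac{1}{N}\sum_{i=0}^{N-1} X(\omega_i)\,e^{j\omega_i t} = a_0 + \sum_{i=1}^{N-1}\left[a_i\cos(\omega_i t) + b_i\sin(\omega_i t)\right],$$
where the imaginary parts cancel because $P(t)$ is real; a component whose angular frequencies form a set $\Omega$ is therefore $\sum_{\omega_i \in \Omega}\left[a_i\cos(\omega_i t) + b_i\sin(\omega_i t)\right]$.
Considering the length of the article, only the calculation of the $a_0 + D(t)$ sequence is listed:
(1) $P(t)$ is processed by the discrete Fourier transform to obtain $X(\omega_i)$, $i = 0, 1, \ldots, N-1$.
(2) For all $\omega_i \notin \Omega_{\mathrm{day}}$, let $X(\omega_i) = 0$, which gives the new spectrum sequence
$$X'(\omega_i) = \begin{cases} X(\omega_i), & \omega_i \in \Omega_{\mathrm{day}}, \\ 0, & \omega_i \notin \Omega_{\mathrm{day}}. \end{cases}$$
(3) $X'(\omega_i)$ is inverted by the inverse discrete Fourier transform:
$$a_0 + D(t) = \frac{1}{N}\sum_{i=0}^{N-1} X'(\omega_i)\,e^{j\omega_i t}.$$
The other component sequences are obtained in the same way.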
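The three-step procedure above (transform, zero the spectrum outside a set, invert) can be sketched with NumPy. This is a minimal sketch assuming a series length that covers a whole number of weeks and the mod rules for $\Omega_{\mathrm{day}}$ and $\Omega_{\mathrm{week}}$; because the boundary between $\Omega_{\mathrm{low}}$ and $\Omega_{\mathrm{high}}$ is not specified here, the sketch returns $L(t) + H(t)$ as one residual. The names `frequency_masks`, `component`, and `decompose` are hypothetical.

```python
import numpy as np


def frequency_masks(n, samples_per_day=96):
    """Boolean masks over subscripts i = 0..N-1 for the daily and weekly sets."""
    samples_per_week = 7 * samples_per_day
    if n % samples_per_week:
        raise ValueError("N must cover a whole number of weeks")
    i = np.arange(n)
    day = i % (n // samples_per_day) == 0               # mod(i, N/96) = 0
    week = (i % (n // samples_per_week) == 0) & ~day    # mod(i, N/672) = 0, not daily
    return day, week


def component(p, mask):
    """Zero the spectrum outside `mask` (X'(w_i)) and invert it."""
    spectrum = np.fft.fft(np.asarray(p, dtype=float))
    filtered = np.where(mask, spectrum, 0.0)
    # Each mask is closed under i -> N - i, so conjugate pairs stay together
    # and the inverse transform is real up to rounding error.
    return np.fft.ifft(filtered).real


def decompose(p, samples_per_day=96):
    day, week = frequency_masks(len(p), samples_per_day)
    daily = component(p, day)             # a_0 + D(t)
    weekly = component(p, week)           # W(t)
    # Assumption: L(t) and H(t) are left combined; their split frequency is not given.
    rest = component(p, ~(day | week))
    return daily, weekly, rest
```

For a 28-day series, the masks reproduce the subscript sets listed above, and the three outputs add back to the original series because the masks partition all frequencies.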

Forecasting of the Daily Periodic, Weekly Periodic, and Low-Frequency Components

Description of the Daily Periodic, Weekly Periodic, and Low-Frequency Component Forecasting.
At present, LSTM has been applied in load forecasting research and has achieved good forecasting accuracy [25]. The daily and weekly periodic components are clearly regular, and the low-frequency component is smooth and flat, so the LSTM neural network is used to forecast the daily periodic, weekly periodic, and low-frequency components.

LSTM Neural Network.
The LSTM architecture was proposed by Hochreiter and Schmidhuber and further improved by Gers et al. [26]. The computing node of the LSTM consists of an input gate, an output gate, a forget gate, and a cell. LSTM controls which information is discarded or added through its gates: the input gate controls what new information is added to the cell, and the output gate controls how much of the current cell state is passed to the output. The output of the forget gate lies in [0, 1]: 0 means that all information is discarded, and 1 means that all information is retained. Figure 2 shows the computing node of the LSTM. The cell records the current state, and the gates are defined by the following equations:
$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i), \qquad f_t = \sigma(W_f[h_{t-1}, x_t] + b_f), \qquad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o).$$
The input transformation is defined as
$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c).$$
The state is updated as
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t), \quad (14)$$
where $x_t$ and $h_t$ are the input and the output of the computing node, $i_t$ is the output of the input gate, $f_t$ is the output of the forget gate, $c_t$ is the cell state, $\tilde{c}_t$ is the input transformation, $o_t$ is the output of the output gate, $t$ is the current time, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W$ and $b$ are the weight matrices and bias vectors.
Formula (14) shows that the output $h_t$ depends not only on the output gate $o_t$ but also on the cell state $c_t$. This is a distinctive feature of the LSTM network, and it alleviates the problem of low training efficiency of the parameters.
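A single forward step of the computing node can be written in a few lines of NumPy. This is an illustrative sketch of the gate equations only (no training), not the Keras implementation used in the case study; the function name `lstm_step` and the order in which the four weight blocks are stacked are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM computing-node update.

    W has shape (4*n, n + m) and b shape (4*n,), with n the hidden size and m the
    input size. Assumption: rows are stacked as [input gate, forget gate,
    output gate, input transformation], an ordering chosen for this sketch.
    """
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b   # W[h_{t-1}, x_t] + b for all blocks
    i_t = sigmoid(z[:n])                        # input gate
    f_t = sigmoid(z[n:2 * n])                   # forget gate, values in (0, 1)
    o_t = sigmoid(z[2 * n:3 * n])               # output gate
    c_tilde = np.tanh(z[3 * n:])                # input transformation
    c_t = f_t * c_prev + i_t * c_tilde          # state update
    h_t = o_t * np.tanh(c_t)                    # node output, formula (14)
    return h_t, c_t
```

Setting the forget-gate bias very high and the input-gate bias very low makes $c_t$ copy $c_{t-1}$, which is the mechanism that lets the cell carry information across many time steps.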

Description of the High-Frequency Component Forecasting.
The high-frequency component fluctuates more gently than the original load sequence, but the fluctuation remains. The reconstructed high-frequency component accounts for a large proportion of the original load sequence, so its forecasting results directly determine the quality of the load forecasting model proposed in this paper. The high-frequency component reflects the randomness of load fluctuation. During power load measurement, abnormal values are unavoidable because of power failures, equipment breakdowns, poor communication, and so on, and they hinder precise load forecasting. Because the measured load data may contain abnormal values, the iForest algorithm is used to clean the training samples.
Next, to remove burrs from the high-frequency component, the Mallat algorithm is used to decompose the cleaned high-frequency component sequence into low-frequency and high-frequency signals. The low-frequency signal is an approximate signal that reflects the trend and characteristics of the high-frequency component sequence. Therefore, the reconstructed low-frequency signal sequence is combined with the LSTM neural network to forecast the original high-frequency component.

iForest Algorithm.
The iForest algorithm is a fast, ensemble-based anomaly detection method with linear time complexity and high accuracy [27]. It relies on two characteristics of abnormal data: (i) abnormal data account for only a small fraction of the samples, and (ii) their feature values differ greatly from those of normal data. An iForest consists of a large number of binary trees, each called an isolation tree (iTree). An iTree is built by a completely random process, implemented as follows [28]:
(1) Suppose the training data set is $X$, the current tree height is $h$, and the height limit of the tree is $l$.
(2) Place $X$ in the root node, randomly select a dimension $q$ of $X$, and randomly select a value $p$ between the minimum and maximum values of dimension $q$.
(3) Place the samples in $X$ whose value on $q$ is larger than $p$ in the right child node, and place the other samples in the left child node.
(4) Repeat (2) and (3) until each child node contains only one sample or only identical samples, or the tree reaches the height limit $l$.
The path length $h(x)$ of a point $x$ is the number of edges traversed from the root node to the external node containing $x$, and $E(h(x))$ is the average of $h(x)$ over all trees. The anomaly score of each sample is
$$s(x, n) = 2^{-E(h(x))/c(n)}, \quad (15)$$
where $n$ is the number of samples used to build a tree and $c(n) = 2H(n-1) - 2(n-1)/n$ is the average path length of an unsuccessful search in a binary search tree, used to normalize the result. The harmonic number $H(k)$ can be estimated as $H(k) \approx \ln(k) + \xi$, where $\xi \approx 0.5772$ is Euler's constant. According to formula (15), the closer the score is to 1, the more likely the sample is an abnormal point. Samples with scores well below 0.5 can be regarded as normal, and if all scores are close to 0.5, the data contain no obvious abnormal samples.
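Steps (1) to (4) and formula (15) can be sketched in pure Python. This is an illustrative implementation, not the scikit-learn class used in the case study. Assumptions, labeled in the code as well: the subsample size of 256, the height limit $\lceil \log_2 \psi \rceil$, and adding $c(\text{size})$ at external nodes that hold more than one sample follow common iForest practice rather than anything stated in this paper; all function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler's constant (xi in the text)


def c(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path, n):
    """Formula (15): s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path / c(n))


def build_itree(X, height, limit, rng):
    """Steps (1)-(4): grow one isolation tree on a list of equal-length tuples."""
    if height >= limit or len(X) <= 1 or all(x == X[0] for x in X):
        return {"size": len(X)}                    # external node
    # Only dimensions with a nonzero range can split the samples.
    dims = [d for d in range(len(X[0])) if min(x[d] for x in X) < max(x[d] for x in X)]
    q = rng.choice(dims)                           # random dimension q
    p = rng.uniform(min(x[q] for x in X), max(x[q] for x in X))  # random value p
    left = [x for x in X if x[q] <= p]
    right = [x for x in X if x[q] > p]             # larger than p goes right
    return {"q": q, "p": p,
            "left": build_itree(left, height + 1, limit, rng),
            "right": build_itree(right, height + 1, limit, rng)}


def path_length(x, node, edges=0):
    """Edges from the root to x's external node, plus c(size) for unsplit leaves."""
    if "size" in node:
        return edges + c(node["size"])
    child = node["left"] if x[node["q"]] <= node["p"] else node["right"]
    return path_length(x, child, edges + 1)


def iforest_scores(X, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(X))                 # assumption: subsample size 256
    if psi < 2:
        raise ValueError("need at least two samples")
    limit = math.ceil(math.log2(psi))              # assumption: height limit l
    trees = [build_itree(rng.sample(X, psi), 0, limit, rng) for _ in range(n_trees)]
    return [anomaly_score(sum(path_length(x, t) for t in trees) / n_trees, psi)
            for x in X]
```

A point far from the bulk of the data is isolated after very few splits, so its average path length is short and its score approaches 1, while a point whose average path equals $c(n)$ scores exactly 0.5.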

Mallat Algorithm.
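The Mallat algorithm, a fast discrete orthogonal wavelet transform [29], is used to decompose and reconstruct the high-frequency component sequence. A signal $f(t)$ is decomposed as
$$c_j(k) = \langle f(t), \varphi_{jk}(t)\rangle, \qquad d_j(k) = \langle f(t), \psi_{jk}(t)\rangle,$$
where $c_j(k)$ is the approximate signal, $d_j(k)$ is the detail signal, $j$ and $k$ are integers, $\langle \cdot, \cdot \rangle$ is the inner product, and $\varphi_{jk}(t)$ and $\psi_{jk}(t)$ are the scaling and wavelet functions. They are obtained from the scaling function $\varphi(t)$ and the mother wavelet $\psi(t)$ by translation and dilation:
$$\varphi_{jk}(t) = 2^{-j/2}\varphi(2^{-j}t - k), \qquad \psi_{jk}(t) = 2^{-j/2}\psi(2^{-j}t - k).$$
The decomposition algorithm is
$$a_{j+1}(k) = \sum_n h(n - 2k)\,a_j(n), \qquad d_{j+1}(k) = \sum_n g(n - 2k)\,a_j(n),$$
or, in operator form, $a_{j+1} = H a_j$ and $d_{j+1} = G a_j$, where $a_j$ is the approximation sequence at level $j$ that is decomposed into $a_{j+1}$ and $d_{j+1}$, and $H$ and $G$ (with coefficients $h$ and $g$) are the low-pass and high-pass filters, respectively; the scaling function $\varphi(t)$ is associated with the low-pass filter. The wavelet transform decomposes the signal into an approximate signal at a large time scale and detail signals at different small scales. The Mallat algorithm [30,31] is used to decompose the original high-frequency component sequence with a Daubechies wavelet: because the Daubechies wavelet has compact support and regularity, the information energy of the nonstationary random load sequence remains relatively concentrated after decomposition.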
The approximate low-frequency signal and the high-frequency detail signal are thus obtained. The approximate signal reflects the trend and characteristics of the original high-frequency component sequence, while the detail signal reflects dynamic factors such as disturbances.
The decomposed sequence can be reconstructed by the Mallat algorithm according to
$$a_j = H^{*} a_{j+1} + G^{*} d_{j+1},$$
where $H^{*}$ and $G^{*}$ are the dual operators of the low-pass and high-pass filters $H$ and $G$, respectively. To keep the lengths of the decomposed and reconstructed sequences consistent, the Mallat reconstruction algorithm upsamples each input sequence by a factor of two, inserting a zero between each pair of adjacent samples.
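One decomposition and reconstruction step, including the zero-insertion upsampling, can be sketched with the Haar (db1) filters, the simplest member of the Daubechies family. The Daubechies order used in the paper is not stated here, so db1 is an assumption made only to keep the filters two taps long; the sketch also requires the length to be divisible by $2^{\text{levels}}$, so a real series such as the 2,553 cleaned samples would need padding or boundary extension, which is not handled. Function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# Assumption: Haar (db1) filters, chosen for illustration only.
H = np.array([1.0, 1.0]) / SQRT2    # low-pass filter h
G = np.array([1.0, -1.0]) / SQRT2   # high-pass filter g


def analyze(a):
    """a_j -> (a_{j+1}, d_{j+1}): a_{j+1}(k) = sum_n h(n - 2k) a_j(n), same for g."""
    a = np.asarray(a, dtype=float)
    if a.size % 2:
        raise ValueError("length must be even at every level")
    approx = np.convolve(a, H[::-1])[1::2]   # correlate with h, keep every 2nd sample
    detail = np.convolve(a, G[::-1])[1::2]
    return approx, detail


def upsample(s):
    """Insert a zero between each pair of adjacent samples."""
    out = np.zeros(2 * s.size)
    out[0::2] = s
    return out


def synthesize(approx, detail):
    """a_j = H* a_{j+1} + G* d_{j+1}, implemented as upsample-then-filter."""
    n = 2 * approx.size
    return np.convolve(upsample(approx), H)[:n] + np.convolve(upsample(detail), G)[:n]


def smooth(x, levels):
    """Reconstruct only the level-`levels` approximation (all details set to zero)."""
    approx, sizes = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, detail = analyze(approx)
        sizes.append(detail.size)
    for size in reversed(sizes):
        approx = synthesize(approx, np.zeros(size))
    return approx
```

`synthesize(*analyze(x))` reproduces `x` exactly (perfect reconstruction), and `smooth(x, levels)` is one way to obtain a reconstructed low-frequency signal of the original length; with Haar filters, a single level replaces each pair of samples by their mean.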

Case Study
The training and test results of the model depend strongly on the size of the selected data set, and equally on the quality of the selected data. The load data are inherently unstable, but after frequency domain decomposition some components show clear regularity, which benefits model training and prediction while reducing the amount of data required. Load data from June 1, 2017, to July 5, 2017 (35 days), sampled every 15 minutes, are studied, giving a total of 3,360 sampling points. The first 2,688 sampling points (28 days) are used for training, and the final 672 sampling points (7 days) are used for testing. The training samples are decomposed by EMD and by the frequency domain decomposition algorithm, and the decomposition results are shown in Figures 3 and 4, respectively.
Figure 4 shows that the daily and weekly periodic components follow clear patterns, the low-frequency component is a smooth curve, and the high-frequency component is strongly random. The frequency domain decomposition method reconstructs the daily and weekly periodic components from the regularity of the daily and weekly cycles, and the residual components reflect the randomness of factors such as meteorological conditions. A comparison of Figures 3 and 4 shows that (i) the frequency domain decomposition method produces fewer components, which reduces the computational scale; (ii) EMD produces more high-frequency components, which is not conducive to prediction; and (iii) overall, the components produced by the frequency domain decomposition method are more regular.
Implementation details of the main algorithms in this paper are as follows:
(1) LSTM is implemented with the Python-based Keras package. The model is tuned by adjusting parameters such as Layers, Units, and Dropout. After testing, the main parameters are set as Layers = 2 (the number of stacked LSTM layers) and Units = 50 (the number of hidden neurons). To prevent overfitting, Dropout is set to 0.5, and batch_size = 16 (if the batch size is too small, training may fail to converge; if it is too large, training may fall into a local optimum).
(2) The iForest algorithm uses the "ensemble.IsolationForest" class of the scikit-learn (sklearn) package in Python. Abnormal points are removed from the 2,688 high-frequency component training samples, leaving 2,553 samples after cleaning. The main parameters are n_estimators = 100 and contamination = 0.05.
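A minimal sketch of this cleaning step with scikit-learn, using the stated n_estimators and contamination. Assumption: each high-frequency training sample is treated as a one-dimensional feature, since the features fed to the forest are not specified; the function name `clean_training_samples` and the `random_state` value are hypothetical. With contamination = 0.05, about 5% of the samples are flagged, which is consistent with 2,688 samples shrinking to 2,553 (135 removed).

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def clean_training_samples(samples, n_estimators=100, contamination=0.05, seed=0):
    """Drop the samples that IsolationForest labels as anomalies (-1)."""
    # Assumption: one scalar feature per sample (the load value itself).
    values = np.asarray(samples, dtype=float).reshape(-1, 1)
    model = IsolationForest(n_estimators=n_estimators,
                            contamination=contamination,
                            random_state=seed)
    labels = model.fit_predict(values)   # +1 for inliers, -1 for outliers
    return values[labels == 1, 0], labels
```

The returned labels make it possible to check how many samples were removed before the cleaned sequence is passed on to the Mallat decomposition.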
Following the procedure described in Section 3, the forecasts of the daily periodic, weekly periodic, and low-frequency components are obtained.
After abnormal data are removed with iForest, 2,553 sampling points remain in the training samples of the high-frequency component. Figure 5 shows the Mallat decomposition of the high-frequency component. The forecasting results of the four parts are then added together according to the time points. Figure 6 compares HFDA-LSTM, EMD-LSTM, LSTM, and RNN. The mean absolute percentage error (MAPE) is used to evaluate the quality of the model, and the root mean square error (RMSE) reflects the precision of the forecasting. They are defined in formula (19):
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i^{*} - y_i}{y_i}\right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{*} - y_i\right)^2}, \quad (19)$$
where $y_i^{*}$ is the forecasting value, $y_i$ is the actual value, and $n$ is the number of samples. The error statistics of the overall forecasting results are listed in Table 1.
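Formula (19) translates directly into code. This is a plain-Python sketch; the function names `mape` and `rmse` are illustrative, and the MAPE is returned in percent, assuming nonzero actual values (load values are positive).

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, in the units of the data (e.g., MW)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))
```

For example, actual values [100, 200] and forecasts [110, 190] give a MAPE of about 7.5% and an RMSE of 10, showing how MAPE weights the same absolute error more heavily at lower load levels while RMSE does not.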
For brevity in the figures and tables, the proposed method is referred to as the hybrid frequency domain analysis- (HFDA-) LSTM method. To highlight its advantages, the predictions of HFDA-LSTM are compared with those of EMD-LSTM, a single LSTM network, and RNN.
The enlarged area of Figure 6 shows that the forecasting errors of the single LSTM and RNN methods are relatively large at the peak load, whereas the proposed method performs slightly better than the EMD-LSTM method and its predictions are close to the actual values. Table 1 shows that, averaged over the time sampling points, the proposed method has the best prediction accuracy, with a MAPE of 0.62% and an RMSE of 7.91 MW; its MAPE is 1.11% lower than that of the EMD-LSTM method, and its RMSE is 6.94 MW lower than those of the other three methods. This indicates that the proposed short-term load forecasting method gives more accurate predictions and a lower degree of dispersion between the predicted and actual values.

Conclusions
In this paper, a short-term load forecasting model based on frequency domain decomposition and deep learning is proposed. The frequency domain decomposition algorithm is used to decompose the load data into four components: the daily periodic, weekly periodic, low-frequency, and high-frequency components. The characteristics of the components are analysed, and a targeted forecasting method is applied to each.
The LSTM algorithm is used to forecast the daily periodic, weekly periodic, and low-frequency components, and a combination of iForest and Mallat with the LSTM method is constructed to forecast the high-frequency component.
The forecasting results of the four components are added together and compared with the predictions of EMD-LSTM, a single LSTM network, and RNN, and the proposed method achieves higher prediction accuracy. The contributions of this paper are as follows: (i) the frequency domain decomposition algorithm is combined with the deep learning LSTM network for short-term load forecasting; (ii) for the high-frequency component with large fluctuations, the iForest algorithm is used to clean abnormal data from the training samples, the Mallat algorithm then decomposes the cleaned samples into low-frequency and high-frequency signals, and the low-frequency signal is selected as the training sample for the LSTM neural network to forecast the original high-frequency component.
In future work, we will consider whether transfer learning can be added to the model to improve its sparse decomposition ability and further improve the accuracy of the proposed forecasting model.

Data Availability
The data used to support the results of this study are available from the corresponding authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.