Study on Short-Term Load Combination Forecasting Model Considering Historical Data Interval Construction

In response to the insufficient accuracy of load forecasting in power systems and the excessive width of prediction intervals, a combined short-term power load forecasting model considering the interval construction of historical data is proposed. First, the data are decomposed into relatively stable subsequences using extreme-point symmetric mode decomposition (ESMD), and an adaptive dispersion entropy (DE) based on the C-C algorithm is proposed to recombine similar subsequences. Then, periodicity and correlation analyses are used to determine the input set of each reconstructed component, and a hybrid-strategy improved equilibrium optimizer (HSIEO) is proposed to optimize the output weights of the deep extreme learning machine (DELM), yielding the prediction values of the different components and the historical data errors; historical data intervals are then constructed from the errors of each component. Next, based on the lower and upper bound estimation (LUBE) method, the proposed improved objective function is optimized using HSIEO-DELM to obtain the component prediction intervals, and the optimal prediction intervals are obtained after superposition. Finally, experimental comparison shows that the proposed algorithm achieves higher accuracy and better prediction interval quality.


Introduction
With the rapid development of the "dual-high" power system, there is an urgent need for high-quality load forecasting to guide power supply and scheduling and to maintain the economic stability of the power system. Currently, load forecasting methods mainly include time series analysis [1], regression analysis [2], and artificial intelligence-based approaches such as BP neural networks, support vector machines (SVM), long short-term memory networks (LSTM), deep belief networks (DBN), ELM, thermodynamics-informed neural networks (TINN), and self-adaptive deep neural networks [3][4][5][6][7][8][9]. These artificial intelligence-based approaches excel at handling nonlinear data and characterizing data features.
Predicting load accurately with a single artificial intelligence approach is challenging. As a result, more and more combined forecasting models are being developed and applied. Typically, these models consist of two stages: data preprocessing and prediction. In the data preprocessing stage, raw nonsmooth data are smoothed using techniques such as variational mode decomposition (VMD), ensemble empirical mode decomposition (EEMD), wavelet decomposition (WT), and least-squares adaptive optimal screening, as described in the literature [10][11][12][13]. However, these methods have their limitations. VMD requires manual determination of the optimal number of filters, which is time-consuming and requires several experiments. EEMD adds noise that is difficult to eliminate completely. For WT, the number of screening times is difficult to determine, and its trend-function decomposition is too coarse. ESMD, on the other hand, overcomes these limitations with an explicit trend function and adaptive optimal screening. Additionally, DE, proposed in the literature [14], addresses the drawbacks of sample entropy (SE) and permutation entropy (PE), but its initial parameters require manual testing over multiple inputs. Ultimately, these combined models are designed to reduce computational effort while improving the accuracy of load prediction.
The second stage involves model prediction, which can be optimized with various algorithms. In the literature [15], the particle swarm optimization (PSO) algorithm based on LUBE is used to optimize a BP interval prediction model. Compared to the Bootstrap, Delta transform, and Bayesian network approaches, LUBE relaxes assumptions about the data distribution, improves operational efficiency, and enhances interval quality. However, PSO has weak late-stage search capability, and traditional neural networks are prone to getting stuck in a local optimum. The seagull optimization algorithm (SOA) employed in the literature [16] is structurally simple, but it presents limited initial search capability and susceptibility to local optima. The equilibrium optimizer algorithm (EO) introduced in the literature [17] has addressed the speed and global search limitations to some extent; nonetheless, further enhancements are required to optimize its performance. In contrast, ELM, used in the literature [18], offers fast learning speed and good generalization compared to BP, SVM, and LSTM. However, due to its single-layer structure, it handles low-dimensional data well but is relatively weak at capturing high-dimensional features. The DBN used in the literature [19] has successfully addressed the gradient-descent issues of traditional neural networks, but it suffers from lengthy training time and susceptibility to convergence on local optima when parameters are chosen poorly. The least squares support vector machine (LSSVM) algorithm employed in the literature [20] simplifies the complex convex quadratic programming problem within SVM algorithms; it performs well on small-sample problems but fails to handle large-scale, high-dimensional learning problems effectively. DELM, as used in the literature [21], applies deep learning to the ELM framework and can effectively map high-dimensional data features. This not only improves the prediction accuracy of the model but also enhances its generalization ability.
This paper proposes a short-term electric load combination forecasting model that considers historical data interval construction to address the problems mentioned above. The proposed model adapts the input parameters of DE with the C-C algorithm to obtain the optimal DE entropy value, and the subseries of ESMD are then reconstructed based on this entropy value. Moreover, to address the random parameter selection in the hidden layers of DELM, a high-performance HSIEO algorithm is proposed. First, HSIEO-DELM is employed for point prediction to obtain the predicted value of each recombined component and the prediction error on historical data, enabling the construction of a fitted confidence interval. Second, the point prediction results of each component are used as inputs for LUBE-HSIEO-DELM interval prediction. Simultaneously, to improve the quality of the optimized intervals, the LUBE objective function is improved by considering the reconstruction of each interval at every moment. Finally, the effectiveness of the proposed algorithm and model is verified empirically.

Data Preprocessing
Preprocessing the raw data and uncovering its underlying features provides a solid data foundation for subsequent prediction. This study employs several data decomposition and reconstruction techniques, as well as input set analysis techniques.
2.1. Extreme-Point Symmetric Mode Decomposition. The ESMD algorithm, proposed by Jinliang and Zongjun [13], is a data-driven decomposition method for nonlinear, time-varying signals. It builds upon empirical mode decomposition by replacing external envelope interpolation with internal pole-symmetric interpolation. In addition, the method adopts the least-squares principle to optimize the final residual mode, which serves as the "adaptive global mean" of the entire dataset and determines the optimal number of screening iterations. The ESMD procedure can be summarized as follows: (1) Find all extreme points (poles) of the time series x_t and record them as P_i (1 ≤ i ≤ n), where n is the total number of poles. (2) Calculate the midpoints H_i (1 ≤ i ≤ n − 1) of adjacent poles, and obtain the boundary midpoints H_0 and H_n by linear interpolation. (3) Construct cubic spline interpolation curves from the n + 1 midpoints and take their mean value L*. (4) Repeat the above steps on x_t − L* until |L*| ≤ ε (an allowable error) or a predetermined screening limit K is reached, obtaining mode IMF_1. (5) Repeat the above steps on x_t − IMF_1 to obtain modes IMF_2, IMF_3, … until the number of remaining poles falls below a set value. (6) Vary K within the given interval, repeat the above steps to obtain the modes IMF_1, IMF_2, … and the residual R, and calculate the variance ratio σ/σ_0 at each K, where σ is the standard deviation of x_t − R and σ_0 is the standard deviation of x_t. (7) Take the K that minimizes the variance ratio and compute the corresponding modes IMF_1, IMF_2, … and residual R as the final result.
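The sifting loop in steps (1)-(4) can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: in particular, the boundary midpoints are extended flat here instead of linearly interpolated, and the function name and tolerance are assumptions.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def esmd_sift(x, max_sift=50, tol=1e-3):
    """Extract one ESMD mode from x: interpolate the midpoints of
    adjacent extrema with cubic splines (internal symmetric
    interpolation) and subtract the resulting mean curve until it is
    close to zero. Simplified sketch."""
    r = np.asarray(x, dtype=float).copy()
    n = len(r)
    t = np.arange(n)
    for _ in range(max_sift):
        maxima = argrelextrema(r, np.greater)[0]
        minima = argrelextrema(r, np.less)[0]
        poles = np.sort(np.concatenate([maxima, minima]))
        if len(poles) < 4:          # too few poles to keep sifting
            break
        # midpoints H_i of adjacent poles
        mid_t = (t[poles[:-1]] + t[poles[1:]]) / 2.0
        mid_v = (r[poles[:-1]] + r[poles[1:]]) / 2.0
        # boundary midpoints (flat extension: a simplification of
        # the paper's linear interpolation)
        mid_t = np.concatenate([[0.0], mid_t, [float(n - 1)]])
        mid_v = np.concatenate([[mid_v[0]], mid_v, [mid_v[-1]]])
        mean_curve = CubicSpline(mid_t, mid_v)(t)
        if np.max(np.abs(mean_curve)) < tol:   # |L*| <= eps
            break
        r = r - mean_curve
    return r   # the extracted IMF; the residual is x minus all IMFs
```

Repeating this on the remainder x_t − IMF_1 yields the subsequent modes, as in step (5).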

2.2. Dispersion Entropy.
In 2016, Rostaghi and Azami [14] introduced DE as a measure of the regularity of a time series. This method surpasses SE and PE in speed while also taking signal frequency and amplitude into account; as a result, DE has a higher capacity to differentiate data than PE. The calculation process for DE is as follows: (1) Each component of the ESMD decomposition is used in turn as the signal input, and the following operations are performed. (2) Normalize the signal to y = {y_1, y_2, …, y_n} using the normal cumulative distribution function, then assign each y_i to an integer class in the interval [1, C]. (3) Construct the embedding vectors, where m is the embedding dimension and d is the time delay. (4) Map each embedding vector z_i^{m,c} to its dispersion pattern. (5) Compute the dispersion entropy as the Shannon entropy of the dispersion-pattern probabilities: a higher entropy value indicates greater irregularity, and a lower value indicates greater regularity.
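Steps (2)-(5) can be condensed into a short sketch, following the standard DE definition (the parameter names m, c, d match the text; defaults are illustrative, not the paper's):

```python
import numpy as np
from collections import Counter
from scipy.stats import norm

def dispersion_entropy(x, m=3, c=6, d=1):
    """Dispersion entropy: m = embedding dimension, c = number of
    classes, d = time delay."""
    x = np.asarray(x, dtype=float)
    # step 2: map to (0, 1) with the normal CDF, then to classes 1..c
    y = norm.cdf(x, loc=x.mean(), scale=x.std())
    z = np.clip(np.round(c * y + 0.5).astype(int), 1, c)
    # steps 3-4: embedding vectors and their dispersion patterns
    n = len(z) - (m - 1) * d
    patterns = Counter(tuple(z[i:i + (m - 1) * d + 1:d]) for i in range(n))
    # step 5: Shannon entropy of the pattern probabilities
    p = np.array(list(patterns.values()), dtype=float) / n
    return float(-np.sum(p * np.log(p)))
```

A noisy signal yields a higher DE value than a smooth periodic one, which is what makes DE useful for grouping the ESMD subsequences by complexity.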
2.3. The C-C Algorithm. To determine the ideal delay time, Kim et al. [22] proposed the C-C method. The time series is first split into t disjoint subseries, and the test statistic S(m, N, r, t) is computed, where N is the data length and t is the time scale. The correlation integral C(m, N, r, t) is evaluated over M = N − (m − 1)t embedding points, where m is the embedding (spatial) dimension. As N → ∞, S(m, r, t) = 0 for all r when the data are independent and identically distributed. According to the statistical conclusions of the BDS test, m = [2, 5], r_j = iσ/2 with i = [1, 4], and t = [1, 2000] form the optimal parameter intervals.
The first zero of S(t), or the first local minimum of ΔS(t), gives the optimal time delay τ; the minimum point of S_cor(t) gives the optimal time-delay window τ_w; and the embedding dimension m follows from τ_w = (m − 1)τ.
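The two core quantities of the C-C method can be sketched as follows (a hedged reading of Kim et al.'s definitions; function names are illustrative):

```python
import numpy as np

def correlation_integral(x, m, r, t):
    """C(m, N, r, t): fraction of embedded point pairs whose Chebyshev
    distance is below r, over M = N - (m - 1)t embedding points."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * t
    if M < 2:
        return 0.0
    emb = np.array([x[i:i + (m - 1) * t + 1:t] for i in range(M)])
    dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
    iu = np.triu_indices(M, k=1)           # each pair counted once
    return float(np.mean(dist[iu] < r))

def s_statistic(x, m, r, t):
    """S(m, r, t): average over the t disjoint subseries of
    C_s(m, r) - C_s(1, r)^m; near zero for i.i.d. data."""
    subs = [x[i::t] for i in range(t)]
    vals = [correlation_integral(s, m, r, 1)
            - correlation_integral(s, 1, r, 1) ** m for s in subs]
    return float(np.mean(vals))
```

Scanning s_statistic over t and looking for its first zero (or the first local minimum of its spread over r) yields the delay τ described above.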

2.4. Input Set Analysis Method
The analysis presented below ensures that the input data can be used for high-quality forecasting while minimizing data redundancy and enhancing forecasting efficiency.
2.4.1. Cyclical Analysis. Periodicity analysis is performed on each reconstructed subsequence to determine its approximate period. The formula is as follows: where S is the total number of peaks and troughs and x_i is the i-th peak or trough.
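One plausible reading of the peak-and-trough formula is that, with S extrema over n samples and each full cycle contributing one peak plus one trough, the period is roughly 2n/S. The following sketch is a hypothetical interpretation, not the paper's exact formula:

```python
import numpy as np
from scipy.signal import argrelextrema

def approximate_period(x):
    """Rough period estimate from the extrema count: T ~ 2n / S,
    where S is the number of peaks plus troughs."""
    x = np.asarray(x, dtype=float)
    peaks = argrelextrema(x, np.greater)[0]
    troughs = argrelextrema(x, np.less)[0]
    s = len(peaks) + len(troughs)
    return 2.0 * len(x) / s if s else float(len(x))
```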

2.4.2. Correlation Analysis.
In order to determine the factors affecting the load, correlation analysis is performed for a variety of climatic conditions. In this paper, Pearson correlation coefficients are used, and the interpretation of the coefficient values is shown in Table 1. In the coefficient formula, S is the sample standard deviation and COV(X, Y) is the sample covariance, where X_ave and Y_ave are the respective sample means.

2.4.3. Normalization. To improve the accuracy and efficiency of the model, the data are min-max normalized, where X is the data to be normalized; X_min and X_max are the minimum and maximum values of the data, respectively; Y is the normalized data; and Y_min and Y_max are the minimum and maximum values of the normalized interval. In this paper, the data are normalized to [−1, 1].
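Both operations are short enough to sketch directly (the function names are illustrative; the [−1, 1] target range follows the text):

```python
import numpy as np

def minmax_scale(x, y_min=-1.0, y_max=1.0):
    """Min-max normalization of x to [y_min, y_max]; the paper
    scales its data to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return y_min + (x - x.min()) * (y_max - y_min) / (x.max() - x.min())

def pearson(x, y):
    """Sample Pearson correlation coefficient COV(X, Y) / (S_x S_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.cov(x, y, ddof=1)[0, 1]
                 / (np.std(x, ddof=1) * np.std(y, ddof=1)))
```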

Predictive Model
The prediction algorithm plays a decisive role in the quality of the prediction results. This section covers both the optimization algorithm and the machine learning model.

3.1. Equilibrium Optimizer
3.1.1. Algorithm Principle. The equilibrium optimizer (EO) algorithm was inspired by the control-volume mass balance model used to estimate dynamic and equilibrium states [17]. The concentration of nonreactive components is described by the mass balance equation, whose first-order form accounts for the mass entering, leaving, and generated within the control volume, where V is the volume; C is the concentration; Q is the flow rate in or out; C_eq is the equilibrium concentration; and G is the mass generation rate.
Letting λ = Q/V and solving the differential equation in Equation (5) yields the concentration update rule. 3.1.2. Specific Steps. (1) The population is initialized in the interval [C_min, C_max], where rand_i is the vector of random numbers for individual i in the interval [0, 1].
(2) To increase the algorithm's capacity for global search and prevent it from becoming stuck in a local optimum, the four best candidate solutions and their mean value together make up the equilibrium pool. (3) The exponential term coefficient F, which strongly affects the concentration update, is adjusted to balance the algorithm's local and global search capabilities.
where a_1 and a_2 are constants; r and λ are vectors of random numbers in the interval [0, 1]; and I_NOW and I_MAX are the current and maximum numbers of iterations, respectively. (4) The generation rate is used to enhance the search capability and improve the accuracy of the solution, where G_control is the control parameter of the generation rate, and r_1 and r_2 are vectors of random numbers in the interval [0, 1].
(5) Standard concentration update, where C is the concentration and mod is the remainder (modulo) function.
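Steps (2)-(5) can be condensed into a single update iteration. The sketch below follows the published EO equations (parameter values a1 = 2, a2 = 1, GP = 0.5 come from the original EO paper, not this one), and it omits the hybrid-strategy improvements of HSIEO:

```python
import numpy as np

def eo_step(pop, eq_pool, it, max_it, a1=2.0, a2=1.0, gp=0.5):
    """One equilibrium-optimizer concentration update.
    pop: (n, dim) candidate concentrations;
    eq_pool: the four best solutions plus their mean."""
    n, dim = pop.shape
    t = (1.0 - it / max_it) ** (a2 * it / max_it)        # time factor
    new_pop = np.empty_like(pop)
    for i in range(n):
        c_eq = eq_pool[np.random.randint(len(eq_pool))]  # random pool pick
        lam = np.random.rand(dim) + 1e-12                # turnover rate
        r = np.random.rand(dim)
        # exponential term F balancing exploration/exploitation
        f = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)
        r1, r2 = np.random.rand(), np.random.rand()
        gcp = 0.5 * r1 if r2 >= gp else 0.0              # generation control
        g = gcp * (c_eq - lam * pop[i]) * f              # generation rate
        new_pop[i] = c_eq + (pop[i] - c_eq) * f + (g / lam) * (1.0 - f)
    return new_pop
```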

The Dimension-by-Dimension Mutation
Mutating the optimal individuals fully exploits their guiding effect, drawing the population closer to the optimum while enhancing population diversity and preventing the algorithm from settling in a local optimum. In addition, mutating dimension by dimension avoids interdimensional interference. Therefore, this paper applies the t-distribution mutation [23] to enhance the EO, where C_best is the current optimal solution and C_new is the mutated solution. Selection follows the greedy principle: if C_new^i yields a smaller fitness value than C_best^i, the mutated solution replaces the original one and the optimal fitness is recalculated.
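A minimal sketch of the mutation with greedy selection follows. The degrees-of-freedom parameter is typically tied to the iteration counter so that mutations shrink over time; here it is treated as a plain argument, which is an assumption:

```python
import numpy as np

def t_mutation(c_best, fitness, dof):
    """Dimension-by-dimension t-distribution mutation of the current
    best solution with greedy selection (minimization)."""
    best = np.asarray(c_best, dtype=float).copy()
    for j in range(len(best)):                 # one dimension at a time
        cand = best.copy()
        cand[j] = best[j] + best[j] * np.random.standard_t(dof)
        if fitness(cand) < fitness(best):      # greedy: keep improvements
            best = cand
    return best
```

Because only improving mutations are kept, the returned solution is never worse than the input, which is the point of the greedy rule.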

Deep Extreme Learning Machine
DELM, also referred to as the multilayer extreme learning machine [24], employs a cascade of ELM autoencoders (ELM-AE) for hierarchical unsupervised pretraining. Through this approach, DELM trains its hidden layers to effectively capture the features of the input data; the output weights of the hidden layers are expressed through the hidden layer matrix. The first ELM-AE in DELM obtains the output weights of the first hidden layer, which serve as the input for the next ELM-AE layer along with the target output. This process is repeated until all layers have been trained, and the final step uses ELM to train with the obtained weights and produce the desired output. Notably, the target output of each hidden layer is always the original data, yielding the feature expression, where X is the sample matrix; N is the number of samples; a is the hidden layer input weight; b is the bias; C is the regularization factor; L is the number of nodes in the hidden layer; and β is the output weight. Here a_i^T a_i is the identity matrix and b_i^T b_i = 1. In this paper, the mean square error (MSE) evaluates the accuracy of the point predictions: the smaller the MSE, the more accurate the prediction.
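One ELM-AE pretraining step can be sketched as follows. This is a sketch of the standard ELM-AE (orthogonal input weights a with aᵀa = I, unit-norm bias with bᵀb = 1, ridge-regularized output weights β reconstructing the input), not the authors' exact implementation; the propagation of features to the next layer via sigmoid(X βᵀ) is an assumption:

```python
import numpy as np

def elm_ae_layer(X, n_hidden, C=1e3, rng=None):
    """One ELM-AE layer: random orthogonal input weights, then solve
    the output weights beta that reconstruct X itself. Requires
    n_hidden <= input dimension for the orthogonalization used here."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    a = np.linalg.qr(rng.standard_normal((d, n_hidden)))[0]  # a^T a = I
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)                                   # b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))     # hidden layer matrix
    # beta = (H^T H + I/C)^{-1} H^T X, with the input X as the target
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    return 1.0 / (1.0 + np.exp(-(X @ beta.T))), beta
```

Stacking calls to this function, each consuming the previous layer's features, reproduces the layer-by-layer pretraining described above.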

Objective Function and Interval Construction
where N is the sample size; P_i is the predicted value at moment i; P̄ is the average of the N predictions; and P_i^real is the true value at moment i.

Interval Prediction Evaluation Index and Objective Function
The LUBE technique uses a neural network to directly predict the upper and lower bounds of the interval [25]. To achieve high-quality interval prediction, the objective function accounts for both the prediction interval coverage and the interval width. The structure of this approach is depicted in Figure 1.
(1) Prediction interval coverage probability: where B_j = 1 when the target value falls within the prediction interval [L_j, H_j], and B_j = 0 when it falls outside [L_j, H_j].
(2) Average width of the prediction interval. (3) The composite index used as the objective function: where R is the range of the target values, ζ is the Boolean constant, and Z_set is the confidence level.
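The three indices can be sketched directly. The exponential penalty form follows the widely used coverage-width criterion (CWC); the penalty magnitude eta is a hypothetical value, not taken from the paper:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: mean of B_j."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Average interval width, normalized by the target range R."""
    return float(np.mean(upper - lower) / np.ptp(y))

def cwc(y, lower, upper, z_set=0.95, eta=50.0):
    """Composite index: normalized width, inflated by an exponential
    penalty whenever coverage drops below the nominal level z_set."""
    p = picp(y, lower, upper)
    gamma = 1.0 if p < z_set else 0.0        # Boolean penalty switch
    return pinaw(y, lower, upper) * (1.0 + gamma * np.exp(-eta * (p - z_set)))
```

When coverage meets the nominal level, the penalty vanishes and the index reduces to the normalized average width alone.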
4.1.3. Improving the Interval Objective Function. The composite index of average width plus a penalty factor does not consider the interval quality at each moment, resulting in poor overall interval quality during the search for the best solution. This paper therefore improves the composite index. When PICP ≤ Z_set and the moment-wise indicator B_j = 0, the bandwidth at that moment is randomly increased; the remaining cases are handled by the equations below, and combining them yields the proposed composite index.

4.2. Historical Data Interval Construction. For the initial interval of the training-set historical data, a fitted confidence interval at a given confidence level is obtained by separately fitting the point-prediction error of each recombined component. This replaces the common practice of simply floating fixed percentages above and below each recombined component, so that the initial interval closely tracks the component.
where Y_p and ERROR are the training-set predictions and historical data errors, respectively, and DELTA is the width of the confidence interval of the historical data errors. (5) The inputs for interval prediction are determined from the historical data and the predicted values. (6) The interval historical-data output target is constructed using Equation (35). (7) The input set is then fed to HSIEO-DELM to obtain prediction intervals for each component, which are superimposed to give the final interval prediction results.
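Steps (5)-(7) build on intervals of the form [Y_p + ERROR − DELTA, Y_p + ERROR + DELTA]. The following is a minimal sketch in which empirical error quantiles stand in for the paper's distribution fitting (an assumed simplification):

```python
import numpy as np

def historical_interval(y_pred, errors, confidence=0.95):
    """Attach a confidence band fitted from historical point-prediction
    errors to the component predictions."""
    alpha = 1.0 - confidence
    lo_q, hi_q = np.quantile(errors, [alpha / 2.0, 1.0 - alpha / 2.0])
    return y_pred + lo_q, y_pred + hi_q    # lower and upper bounds
```

Applying this per recombined component and summing the component bounds mirrors the superposition described in step (7).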

Simulation Experiments
In this experiment, we use the electric load dataset provided by the 9th Electrician Mathematical Modeling Contest. The dataset includes electrical load data at 15-min intervals throughout the day, along with daily climatic conditions such as maximum temperature, minimum temperature, average temperature, relative humidity, and rainfall. We select the total daily historical load data from July 26 to August 24, 2012 as the training set to predict the load values and intervals for the next 2 days.

Real Data Preprocessing.
The training set is first decomposed into relatively stable subseries using ESMD. The initial ESMD parameters are set as follows: the minimum and maximum screening times are [1, 50], and the number of remaining poles is Num = 4. The decomposition results are presented in Figure 4.
According to Figure 4(a), the optimal number of screening iterations is K = 40. Figure 4(b) shows that the original data were decomposed into seven subsequences comprising six relatively smooth modal components and one residual component. Given the large number of decomposed subsequences, reconstruction was necessary.

Journal of Sensors
The C-C algorithm was used to adaptively determine the optimal DE parameters, as illustrated in Figure 5.
According to Figure 5(a), the C-C algorithm determined the optimal time delay to be 2. Using τ_w = (m − 1)τ, the optimal embedding dimension was found to be m = 6. With these parameters, the DE entropy values were calculated and similar subsequences were merged, as depicted in Figure 5. Four reconstructed components were obtained, as listed in Table 2. Re1 represents the high-frequency component, which exhibits high randomness and reflects the unpredictability of the load. Re2 corresponds to the medium-frequency component, which is relatively stable. Re3 represents the low-frequency component with high periodicity, reflecting the regularity of the load. Finally, Re4 captures the load trend over time, as shown in Figure 6.
Afterward, the components were analyzed for periodicity, as shown in Table 3. The correlation of each component with the climatic conditions, analyzed using the correlation method described above, is shown in Table 4. Information from previous days is integrated to determine the point-forecast input set, as shown in Table 5.

Real Data Point Prediction
To verify the effectiveness of the proposed point prediction algorithm, three groups of experiments were set up to compare the prediction results and the convergence behavior of the algorithms. The first group is a single prediction model; the prediction results and the convergence effect of the algorithms are compared. The DELM parameters for each algorithm were a population of 50, a maximum of 50 iterations, weight bounds of 1 and −1, and two hidden layers with the sigmoid activation function. The numbers of hidden layer nodes were set to [6, 4] for DELM and 6 for ELM. RMSE was used as the objective function for rolling point prediction. Figure 7 shows the prediction results.
Table 6 displays the prediction error results, which were utilized to evaluate the accuracy of the point prediction.
From Figure 7(d), it is evident that HSIEO exhibits faster convergence and better final convergence than the other algorithms discussed earlier. Analysis of Table 6 shows that, after data processing, the overall accuracy of the optimized prediction model is higher in all three sets of experiments. To validate the effectiveness of the proposed algorithmic model, three scenarios were constructed, all computed with LUBE-HSIEO-DELM at a 95% confidence level. In Scenario 1, the initial interval was set to 20% above and below the power load, and the objective function was Equation (31). Scenario 2 used the same initial interval as Scenario 1 but employed the improved objective function, Equation (34). In Scenario 3, the initial intervals were constructed using the method proposed in this paper, and the intervals were predicted using the improved objective function, Equation (34). Figure 11 presents the prediction results.
The interval evaluation of the predicted results of Figure 11 is shown in Table 7.
The results in Table 7 demonstrate that Scenario 2 achieves higher interval coverage, at a higher average interval width, than Scenario 1, indicating that the proposed interval prediction objective function is superior to Equation (31). The reason is that considering the interval quality at each moment individually is more effective than penalizing with an average width. Scenario 3 significantly improves both the average width and the coverage relative to Scenario 2, proving that the interval construction method proposed in this paper provides a better initial data base than simple up-and-down floating intervals, thereby improving the prediction results.
To further showcase the advantages of the proposed algorithmic model, Scenario 3 is compared with interval prediction by HSIEO-DELM based on Bayesian analysis and by HSIEO-DELM based on the bootstrap method. Figure 12 presents the results; all the aforementioned experiments were conducted under the ESMD-DE data processing condition. The data in Table 8 show that the prediction performance of Bayesian-based HSIEO-DELM, bootstrap-based HSIEO-DELM, and Scenario 3 increases in that order. In particular, the coverage increases from 0.9583 to 0.9792, indicating a higher probability that the actual values fall within the predicted range. In addition, the average interval width decreases from 0.1365 to 0.0885, meaning the estimation of future intervals is more precise. The reason is that the bootstrap draws a large number of resamples with replacement from the original dataset, making it sensitive to the amount of data and to outliers, which affects the prediction results. Bayesian networks make probabilistic inferences from the observed data and known priors, so their accuracy depends heavily on the prior data. In contrast, the method proposed in this paper directly uses the lower and upper bounds to represent the uncertainty range of the predictions, which is a clear advantage under uncertainty with the same amount of data. It can therefore be concluded that the proposed short-term power load forecasting model considering historical data interval construction outperforms the other interval prediction models.

Conclusion
This paper proposes the ESMD-DE-HSIEO-DELM point forecasting algorithm to enhance point load prediction accuracy and interval prediction quality. Furthermore, a short-term load interval forecasting model is assembled using the LUBE-HSIEO-DELM algorithm with interval construction. The main conclusions from the example analysis are as follows: (1) The data processing stage decomposes the data into smooth, distinct components using the ESMD-DE method. The C-C algorithm adaptively determines the optimal DE parameters, eliminating the need for multiple manual trials and improving the operational efficiency of the model. (2) The HSIEO algorithm enhances the search for the optimal solution by leveraging both a uniformly distributed initial population and the individual optima. As a result, HSIEO achieves faster convergence and better convergence outcomes than the other algorithms, and the application of HSIEO-DELM significantly improves the prediction results of the DELM model. (3) The interval construction method proposed in this paper provides higher-quality initial intervals for subsequent interval prediction than traditional up-and-down floating construction, resulting in better interval prediction. (4) The improved interval prediction objective function widens the interval only for cases that fail the preset confidence level, achieving higher interval coverage and a narrower average width than the traditional CWC objective function.

FIGURE 7: Point load forecast results: (a) single prediction model; (b) algorithmic optimization prediction model; (c) algorithmic optimization prediction model after data processing; and (d) algorithm convergence.

FIGURE 10: Historical data interval construction: (a) Re1 historical data interval construction; (b) Re2 historical data interval construction; and (c) Re3 historical data interval construction.

According to Table 4, Re1 and Re2 are not significantly correlated with the climatic conditions. Re3 is affected by the maximum temperature, average temperature, relative humidity, and rainfall at each moment, while Re4 is significantly influenced by the maximum temperature, minimum temperature, and relative humidity. These climatic conditions are marked as "W". Customers' electricity demand during a day is divided into three stages, labeled Type: the low-load period [1, 9] is marked as 0.4; the high-load period (9, 22] is marked as 0.5; and (22, 24) together with (0, 1), where the load decreases but remains high, is marked as 0.1. P(t, d) represents the electricity load demand at time t of day d and serves as the forecast output.

TABLE 7: Interval evaluation index.

TABLE 8: Comparative results.
FIGURE 12: Interval prediction results: (a) Bayesian analysis and (b) bootstrap method.
Compared to the other interval prediction algorithms, the interval prediction algorithm proposed in this paper improves the coverage and the average width of the interval prediction, resulting in better interval prediction quality.