Electric Load Forecast Using Combined Models with HP Filter-SARIMA and ARMAX Optimized by Regression Analysis Algorithm

Electric load in summer has a significant cyclical trend with temperature effects. In general, the parameters of the SARIMA and the SMA turn out to be nonsignificant in most cases. To address this issue, a hybrid time series model is used to extract the spectrum sequences with different frequencies. The original electric load series is first decomposed into the trend sequence "G" and the cycle sequence "C." A revised ARMAX model is then proposed to deal with the two separated sequences. Finally, the combined models are tested in a case study. The case study on electric load forecasting for a city in China shows that the proposed model outperforms four other comparative models in terms of prediction accuracy. The results indicate that the proposed combined model is more accurate than models based on a single forecasting method.


Introduction
Load forecasting is mainly used to predict the power load over the next few days [1,2] and plays an important role in modern electricity Demand Side Management (DSM). Accurate short-term load forecasting is crucial to the dynamic operation of Advanced Electricity Demand Side Management (EDSM) and Advanced Power Information Systems (APIS), improving the efficiency and safety of the power grid. To a certain extent, medium-term power load is affected by seasonal factors, summer temperature, and consumption peaks caused by unexpected events. In general, high summer temperatures produce a large "air conditioning load," which grows as the temperature rises. Temperature fluctuates widely in summer, and electric load changes considerably with it. Such volatile data, for example the summer air conditioning load, are difficult to forecast. To address this problem, this paper proposes a combined model of HP Filter-SARIMA and ARMAX optimized by a regression analysis algorithm.
Over the past decade, many forecasting methods have been put forward, such as time series, gray model, SVM, and artificial neural networks (ANN) [3]. Azadeh et al. explored seasonal fluctuation and nonlinearity in forecasting based on the fuzzy system and data mining techniques to analyze monthly electricity demand in Iran. This model is established on well-developed statistical theories to show explicit relationships between input data and outputs. However, the SARIMA model does not perform well when electric power deviates greatly from the normal weekly pattern. It cannot react under abnormal load conditions before the flow deviation is detected [4]. Hamzacebi and Es predicted annual electricity consumption with the optimized GM(1, 1) model, which is more effective in analyzing the long-term trend but less effective for seasonal variation [5]. Zhu et al. used the HP filter to decompose the GDP sequence into a trend item and a cyclical item [6]. He et al. also used the HP filter in energy price analysis to study the synchronization between domestic and foreign markets [7]. Zhang et al. employed four improved adaptive coefficient approaches optimized by particle swarm optimization (PSO) to forecast daily mean wind speed; the simulation results showed that PSO yielded an observable improvement in forecasting performance [8]. Guo et al. proposed a modified EMD-FNN model by combining empirical mode decomposition (EMD) with the ensemble learning paradigm of the feedforward neural network (FNN), which was more accurate than the basic FNN and the unmodified EMD-FNN [9].
The multivariate ARIMAX model is hypothesized to improve the one-step-ahead forecasting accuracy of the univariate SARIMA model. Bierens and Broersma used the ARMAX model to study the relation between unemployment and interest rate. They found that the relationship is not confined to the Netherlands; it also holds for the USA, Canada, Japan, Germany, the UK, and France [10]. Bordignon et al. analyzed combined versus individual forecasts for British electricity price prediction and found that combined forecasts are more accurate than, or at least equivalent to, individual ones [11]. Yan and Chowdhury adopted a hybrid midterm forecasting model that combines least squares support vector machine (LSSVM) and autoregressive moving average with external input (ARMAX) modules to forecast the electricity market clearing price (MCP). The hybrid model improved forecasting accuracy compared with a single LSSVM [12]. Wang et al. proposed a two-stage model for estimating value-at-risk (VaR) based on ARMAX-GARCHSK and extreme value theory (EVT). Their empirical analysis shows that the ARMAX-GARCHSK-EVT model rapidly reflects the most recent and relevant changes in electricity prices, with accurate forecasts of VaR at all confidence levels, and thereby presents better dynamic characteristics [13]. Yang et al. proposed a new evolutionary programming (EP) approach to identify the autoregressive moving average with exogenous variable (ARMAX) model for hourly load demand forecasts from one day to one week ahead. The EP-based load forecasting algorithm was verified on different types of data for the Taiwan Power (Taipower) system and substation load as well as temperature values [14]. Huang et al. proposed a new particle swarm optimization (PSO) approach to identify the ARMAX model for load forecasts. The test results indicate that the proposed PSO offers high-quality solutions, superior convergence, and shorter computation time [15].
Wangdi et al. adapted the ARIMAX model to determine predictors of malaria in the coming month. The ARIMAX model extends ARIMA modeling in an attempt to predict malaria cases using climatic factors and the number of cases in the previous month. The predictors in the model include the number of cases in the previous month, mean maximum and minimum temperature, relative humidity, and rainfall lagged by one month. The test results show that prediction accuracy is greatly improved [16].
The above forecasting methods are effective on cyclical and trend data, but their generalization capability is generally weak. Traditional time series forecasting methods can be used to predict short-term load data [17], but their forecasting accuracy is lower than that of combined models [18]. The autoregressive integrated moving average with exogenous variables (ARIMAX) model is mainly aimed at forecasting data under the influence of external factors. However, only a few studies discuss this model for load forecasting. Building on the above research, the combined models of HP Filter-SARIMA-Revised ARMAX optimized by regression analysis are proposed here to forecast short-term electric load. They have strong application value in the field of summer load forecasting.

Forecasting Methods
White Noise. A time series $\{x_t\}$ is called a white noise sequence if it satisfies (A) $\forall t \in T$, $E(x_t) = \mu$, and (B) $\forall t, s \in T$,
$$\gamma(t, s) = \begin{cases} \sigma^2, & t = s, \\ 0, & t \neq s. \end{cases}$$
Autocorrelation Function. For a stationary sequence $\{x_t\}$, the autocorrelation function at lag $k$ is
$$\rho_k = \frac{\operatorname{cov}(x_t, x_{t-k})}{\operatorname{var}(x_t)}.$$
Cointegration Theory. Cointegration theory was put forward by Engle and Granger [19]. With it, forecasting models can be developed without requiring that all sequences are stationary. The regression model is as follows:
$$y_t = \beta_0 + \sum_{i=1}^{k} \beta_i x_{it} + \varepsilon_t.$$
A cointegration relationship exists between the independent variable sequences $\{x_{1t}\}, \ldots, \{x_{kt}\}$ and the response variable sequence $\{y_t\}$ when the regression residual sequence $\{\varepsilon_t\}$ is stationary. There is a strong correlation between electric load and temperature in summer, which is shown in Figure 4. The cointegration relationship is evident, and it is tested in the second part of the combined forecasting models.
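A stationary series can be decomposed (Wold decomposition) as
$$x_t = V_t + \sum_{j=0}^{\infty} \varphi_j \varepsilon_{t-j},$$
where $V_t$ is a purely deterministic component and $\sum_{j=0}^{\infty} \varphi_j \varepsilon_{t-j}$ is a moving average time series. The ARMA (autoregressive moving average) model requires the sequence itself to be stationary, so nonstationary data are hard to handle with it.
The ARMA($p$, $q$) model is defined as
$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}.$$
By introducing the delay operator $B$, it can also be presented as $\Phi(B) x_t = \Theta(B) \varepsilon_t$, with $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$.
In terms of structure, ARIMA($p$, $d$, $q$) models are the same as ARMA($p$, $q$) models, except that the time series has first been transformed by differencing, the order of which is specified by $d$. The ARIMA modeling flowchart is shown in Figure 1.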
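The differencing step of ARIMA($p$, $d$, $q$) and the sample autocorrelation used for order identification can be illustrated with a minimal sketch. The function names and the toy series are assumptions made for illustration; they are not taken from the paper.

```python
def difference(series, d=1):
    """Apply d-th order differencing, the transformation ARIMA(p, d, q)
    performs before an ARMA(p, q) model is fitted."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation rho_k = c_k / c_0 for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(v * v for v in dev) / n
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf


# Hypothetical toy data: a linear trend plus a period-2 oscillation.
load = [10 + 2 * t + (1 if t % 2 else -1) for t in range(12)]
diffed = difference(load, d=1)      # the trend is removed by differencing
acf = sample_acf(diffed, max_lag=3)
```

After first-order differencing the linear trend disappears and only the oscillation remains, which the alternating sign of the sample autocorrelation reflects.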

HP Filter
Hodrick-Prescott Filter. Hodrick and Prescott first put forward the Hodrick-Prescott filter (HP filter) in an analytical paper on the economic cycle in postwar America [20]. The sequence is divided into two parts: the sequence $\{G\}$ with the long-term trend, denoted as $G = \{g_1, g_2, \ldots, g_n\}$, and the sequence $\{C\}$ with short-term volatility, denoted as $C = \{c_1, c_2, \ldots, c_n\}$. The relationship is
$$y_t = g_t + c_t, \quad t = 1, 2, \ldots, n.$$
In this paper, the HP filter is applied to the non-seasonally adjusted series, and the original sequence is divided into two sequences with significant spectral frequencies. This weakens the interaction among factors such as seasonality, trend, and cycle, which makes the model more accurate [21].
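The decomposition into a trend part and a cycle part can be sketched as follows. This is a minimal illustration that assumes the standard HP formulation, in which the trend solves $(I + \lambda K^{\mathsf T} K)\,g = y$ with $K$ the second-difference operator. The smoothing parameter value `lam=1600.0` and all function names are assumptions; the paper does not state the value it used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hp_filter(y, lam=1600.0):
    """Split y into (trend, cycle) so that y[t] = trend[t] + cycle[t].

    lam is the smoothing parameter (assumed value, not from the paper).
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # Build A = I + lam * K^T K, where each row of K is the stencil (1, -2, 1).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for i in range(3):
            for j in range(3):
                a[r + i][r + j] += lam * stencil[i] * stencil[j]
    trend = solve_linear(a, [float(v) for v in y])
    cycle = [yi - gi for yi, gi in zip(y, trend)]
    return trend, cycle
```

A dense solver is used only to keep the sketch short; the system is pentadiagonal, so a banded solver would be the natural choice for long series.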

ARMAX
ARMAX Model. Supposing that the response variable sequence $\{y_t\}$ and the input variable sequences $\{x_{1t}\}, \ldots, \{x_{kt}\}$ are all stationary, a regression model is established between the input variable sequences and the response sequence [22]. The ARMAX model is an improvement of the ARIMA model with explanatory exogenous variables ($X$). It combines a regression model with an ARIMA model and so retains the advantages of both. In the actual modeling process, a combined "HP Filter-SARIMA-Revised ARMAX" model is proposed to forecast the short-term electric load. The specific process is described in the modeling steps.

Grey Prediction Model.
The grey prediction processes are as follows [23].
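As an illustration, the grey prediction procedure can be sketched under the assumption that the standard GM(1, 1) form is meant: accumulate the series, estimate the development coefficient `a` and grey input `b` by least squares on $x^{(0)}(k) = -a\,z^{(1)}(k) + b$, and restore forecasts by differencing the fitted accumulated series. Function names and the toy data are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate (a, b) of GM(1, 1) from the raw series x0 by least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1, 1) needs at least 4 observations")
    x1, total = [], 0.0
    for v in x0:                      # accumulated generating operation (AGO)
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # background values
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)       # slope equals -a
    b = (sy - slope * sz) / m
    return -slope, b


def gm11_predict(x0, steps):
    """Return (fitted values for x0, forecasts for the next `steps` points)."""
    a, b = gm11_fit(x0)
    if a == 0:
        raise ValueError("development coefficient is zero")
    n = len(x0)

    def x1_hat(k):                    # fitted accumulated series, k = 0, 1, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    values = [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]
    return values[:n], values[n:]


# Hypothetical toy data growing by 10% per step.
data = [100 * 1.1 ** k for k in range(6)]
fitted, forecast = gm11_predict(data, steps=2)
```

GM(1, 1) assumes a roughly exponential trend, which is consistent with the remark in the Introduction that it handles long-term trends better than seasonal variation.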

MLP Neural Network. The multilayer perceptron neural network (MLP neural network) is an artificial neural network whose use involves three processes, namely, training, testing, and validation [24]. Figure 2 shows a multilayer neural network with an input layer, hidden layers, and an output layer. Synaptic weight change rules are defined for the neurons of the hidden layer and for the output neuron.
2.2.6. Regression Analysis. Regression analysis is used to forecast the value of one variable (the dependent variable) based on other variables (the independent variables).

Modeling Steps
Step 1. The original sequence $\{y_t\}$ or $\{x_t\}$, defined as the superposition of waves with different frequencies, can be divided into the sequence $\{G\}$ and the sequence $\{C\}$ [25]. The separation must satisfy the minimum loss function principle:
$$\min_{\{g_t\}} \left\{ \sum_{t=1}^{n} (y_t - g_t)^2 + \lambda \sum_{t=2}^{n-1} \left[ (g_{t+1} - g_t) - (g_t - g_{t-1}) \right]^2 \right\},$$
where $\lambda$ is the smoothing parameter.
Step 2. The sequence $\{G\}$ is a function of time. The error terms are denoted as $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$. Ordinary least squares (OLS) is used for polynomial curve fitting so as to minimize the sum of squared errors. The polynomial fit can be represented as
$$g_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_m t^m + \varepsilon_t.$$
The resulting predictions of the sequence are denoted as $\hat{g}_t$.
Step 3. The stationarity of the sequence $\{C\}$ is tested. If it is stationary, correlation analysis can be applied to the sequence directly; otherwise, the sequence is differenced until it is stationary [26,27].
Step 4. Through the correlation analysis, the autoregressive and moving average terms can be identified. The ARIMA model is presented as
$$\Phi(B)(1 - B)^d c_t = \Theta(B) \varepsilon_t.$$
Step 5. The rationality of the model is tested using the residual sequence.
Step 6. The predictions of the sequence obtained from the ARIMA model are denoted as $\hat{c}_t$.
Step 7. The final prediction result is obtained according to the principle of the HP filter:
$$\hat{y}_t = \hat{g}_t + \hat{c}_t.$$
Step 8. The correlation coefficients between the stationary sequences $\hat{y}$ and $\hat{x}$ are explored to determine the structure of the improved ARMAX model. This step is the revision to the traditional ARMAX model.
Step 9. The residual sequence $\{\varepsilon_t\}$ is fitted, where $\{a_t\}$ is a zero-mean white noise sequence.
Based on the above steps, the combined "HP Filter-SARIMA-Revised ARMAX" model can be applied to the load forecasting process.
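Steps 2 and 7 can be illustrated with a small sketch: a polynomial is fitted to the trend sequence by OLS through the normal equations, and the trend forecast is then added to a cycle forecast. The function names, the polynomial degree, and the numbers are assumptions for illustration; the cycle forecast is a hypothetical placeholder standing in for the output of the ARIMA step.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit_ols(t, g, degree):
    """OLS polynomial fit: solve (X^T X) beta = X^T g for beta_0..beta_degree."""
    k = degree + 1
    xtx = [[sum(ti ** (i + j) for ti in t) for j in range(k)] for i in range(k)]
    xtg = [sum((ti ** i) * gi for ti, gi in zip(t, g)) for i in range(k)]
    return solve_linear(xtx, xtg)


def polyval(beta, t):
    """Evaluate beta_0 + beta_1 t + ... + beta_m t^m."""
    return sum(b * t ** i for i, b in enumerate(beta))


# Hypothetical trend sequence that follows an exact quadratic in time.
times = list(range(10))
trend = [2 + 0.5 * t + 0.1 * t * t for t in times]
beta = polyfit_ols(times, trend, degree=2)

g_hat = polyval(beta, 10)      # trend forecast for the next time point
c_hat = -0.3                   # hypothetical cycle forecast from the ARIMA step
y_hat = g_hat + c_hat          # Step 7: recombine trend and cycle
```

For high polynomial degrees or long time indices the normal equations become ill-conditioned, so rescaling the time variable or using an orthogonal basis would be a safer design.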

Modeling Flowchart.
The forecasting processes of the combined models are shown in Figure 3.

Load Forecasting with ARMA Model
4.1. Data Source. Figure 4 shows the daily maximum power load and the maximum temperature in a city in China from May 1st to July 15th. Specific information about the city cannot be disclosed here. In this paper, the classical time series models and the combined models are applied to forecast the load. The prediction results of the different models are compared in Section 5.
Figure 4 shows that the load data have clear cyclical fluctuations. The sequences are obviously nonstationary.

Establishing Seasonal-ARIMA Model
The ADF unit root test demonstrates that the original sequence is nonstationary, while the first-order difference of the original sequence is stationary at the 5% significance level [28]. Therefore, the additive model is used to adjust the seasonal sequence. The analysis of the partial autocorrelation and the autocorrelation is shown in Figure 5.
Figure 5 shows that the autocorrelation of the daily peak power load data is stationary and periodic. Only the first-order partial autocorrelation coefficient is significantly greater than two standard deviations [29]. The remaining partial autocorrelation coefficients rapidly decline to zero and fluctuate randomly within the two-standard-deviation range. Thus the partial autocorrelation may be regarded as cutting off after the first order [30].
In the process of model order selection, MA(11) is greatly influential in modeling and predicting. This phenomenon shows that the errors caused by long-term observations still affect the current monthly electricity consumption to some extent [32]. It is inferred that monthly electricity consumption may have the characteristic of long-term memory. This conjecture is confirmed by a long-term memory test using the Rescaled Range Analysis ($R/S$) method, which gives a Hurst exponent of $H = 0.837$. Since $0.5 < 0.837 < 1$, the monthly electricity consumption has the long-term memory characteristic. Therefore, the current forecast is influenced by distant observations [33,34].
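The rescaled range idea behind the long-term memory test can be sketched as follows. This is one common textbook variant (non-overlapping blocks with doubling block sizes and an OLS slope on the log-log points), offered as an assumption; the paper does not specify its exact procedure, and the function names and the simulated noise are hypothetical.

```python
import math
import random


def rescaled_range(x):
    """R/S statistic: range of cumulative mean deviations over the std."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    std = math.sqrt(sum(d * d for d in dev) / n)
    if std == 0:
        raise ValueError("constant block; R/S is undefined")
    cum, total = [], 0.0
    for d in dev:
        total += d
        cum.append(total)
    return (max(cum) - min(cum)) / std


def hurst_rs(x, min_block=8):
    """Estimate the Hurst exponent as the slope of log(R/S) on log(block size)."""
    n = len(x)
    points = []
    size = min_block
    while size <= n // 2:
        blocks = [x[s:s + size] for s in range(0, n - size + 1, size)]
        mean_rs = sum(rescaled_range(b) for b in blocks) / len(blocks)
        points.append((math.log(size), math.log(mean_rs)))
        size *= 2
    if len(points) < 2:
        raise ValueError("series too short for the chosen min_block")
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return sxy / sxx


# Hypothetical input: simulated independent noise, not the paper's load data.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(512)]
h = hurst_rs(noise)
```

A value between 0.5 and 1 is read as persistence (long-term memory), which is the criterion the text applies to the reported exponent of 0.837.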

Integrative Models Using HP Filter
The HP filter is applied to the original sequence. The decomposition results are shown in Figure 6, in which the blue curve is the original sequence and the red curve is the long-term trend. The growth rate of the power consumption is roughly constant from May 1st to July 15th. The green curve changes cyclically and irregularly, and its fluctuation range widens over time.
With the HP filter, the original electric load sequence can be separated into the sequence $\{G\}$ with the long-term trend and the sequence $\{C\}$ with the other fluctuating properties [35].
The data separation results using the HP filter are shown in Figures 7 and 8. The sequence $\{G\}$ approximates a smooth curve, while the curve of the sequence $\{C\}$ fluctuates up and down around zero.
From a statistical perspective, the sequence $\{G\}$ can be treated as a time-related sequence. The independent variable $t$ represents time, while the sequence $\{G\}$ is the dependent variable of the system.
A polynomial is fitted to the electric load data; the fitted load curve is shown in Figure 9.
The sequence $\{G\}$ of the load from May 1st to July 15th can be forecasted based on the fitting polynomial [36], while the sequence $\{C\}$ is predicted based on ARMA(1, 2).
The same procedure can be applied to the temperature data. With the HP filter, the original temperature sequence can be separated into a trend sequence with the long-term trend and a cycle sequence with the other fluctuating properties, as shown in Figure 10.
The data separation results using the HP filter are shown in Figures 11 and 12. The temperature trend sequence looks like a smooth curve, while the temperature cycle sequence fluctuates up and down around zero.
A polynomial is likewise fitted to the temperature data. The temperature trend sequence from May 1st to July 15th can be forecasted based on the fitting polynomial, while the temperature cycle sequence is predicted based on ARMA(1, 1). The fitted temperature curve is shown in Figure 13.
Finally, the input values of the revised ARMAX model are obtained based on the HP filter principle: $\hat{y} = \hat{g} + \hat{c}$ for the load and $\hat{x} = \hat{g} + \hat{c}$ for the temperature, where $\hat{g}$ and $\hat{c}$ are the trend and cycle predictions of the respective series. Table 3 indicates that $\hat{x}$ is a stationary white noise sequence.

Load Forecasting with Combined Models
The best order for the ARMA($p$, $q$) model is [AR(0), MA(1)], as shown in Table 4.
Secondly, the $\hat{x}$ model is set up. The test results obtained with SAS show that $\hat{x}$ is a stationary white noise sequence [37]. Therefore, the fitting model is ARMA(0, 1), that is, an MA(1) model. The fitting parameters of $\hat{x}$ are shown in Table 5.

Computing Load Data with Revised ARMAX Model.
The above model is used to filter the input variable sequences $\{\hat{x}_{1t}\}, \ldots, \{\hat{x}_{kt}\}$ and the response variable sequence $\{\hat{y}_t\}$, after which the cross-correlation coefficients between the filtered independent variables and the filtered response variable are calculated. The regression analysis in Table 6 shows that the final regression coefficient is 0.62871. A statistical test on the residual sequence shows that it is a stationary white noise sequence (Pr > 0.05) [38,39]. The fitted model for the residual sequence is $\varepsilon_t = a_t$, where $a_t$ is a zero-mean white noise sequence [41][42][43].
The cross-correlation coefficient at lag 0 is significantly nonzero, which means that there is no lag effect between the response sequence and the input sequence. Thus the model should be built on same-period values [40]. The same-period model between $\hat{y}$ and $\hat{x}$ is established based on the results in Table 6.
The load from July 16th to July 31st is forecasted with the combined models, HP Filter-SARIMA-Revised ARMAX. The second column in Table 7 is the actual load data ($10^4$ kW·h). The third column is the actual temperature (°C). The fourth is the prediction value of the SARIMA model. The fifth is the prediction value of the ARIMA model. The sixth is the prediction value of grey system theory. The seventh is the prediction value of the MLP neural network.
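The lag identification described here can be illustrated with a minimal cross-correlation sketch. The normalization (full-sample means and standard deviations), the function names, and the toy sequences are assumptions for illustration, not the SAS procedure used in the paper.

```python
import math


def cross_correlation(x, y, max_lag):
    """Return r_k for k = 0..max_lag, where r_k correlates x[t-k] with y[t]."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    ccf = []
    for k in range(max_lag + 1):
        num = sum((x[t - k] - mx) * (y[t] - my) for t in range(k, n))
        ccf.append(num / (sx * sy))
    return ccf


def dominant_lag(x, y, max_lag):
    """Lag with the largest absolute cross-correlation."""
    ccf = cross_correlation(x, y, max_lag)
    return max(range(len(ccf)), key=lambda k: abs(ccf[k]))


# Hypothetical input and response: the response repeats the input 2 steps later.
x_in = [math.sin(0.9 * t) for t in range(60)]
y_out = [math.sin(0.9 * (t - 2)) for t in range(60)]
ccf = cross_correlation(x_in, y_out, max_lag=4)
best = dominant_lag(x_in, y_out, max_lag=4)
```

If the dominant lag is 0, a same-period regression is appropriate; a nonzero dominant lag suggests regressing the response on the correspondingly delayed input.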
The predictions and errors of the five models from July 16th to July 31st (training set) are computed to assess the models' fit performance.
The forecast graph is displayed in Figure 16.
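A comparison of model errors can be sketched with the relative error and its mean absolute value in percent (MAPE). Which error measures appear in Table 8 is not visible in the text, so these metrics, the function names, and the numbers below are assumptions for illustration only.

```python
def relative_errors(actual, predicted):
    """Signed relative error (predicted - actual) / actual for each point."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [(p - a) / a for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = relative_errors(actual, predicted)
    return 100.0 * sum(abs(e) for e in errors) / len(errors)


# Hypothetical values, not taken from Table 7.
actual_load = [100.0, 200.0]
model_a = [110.0, 180.0]
model_b = [101.0, 198.0]
scores = {"model_a": mape(actual_load, model_a),
          "model_b": mape(actual_load, model_b)}
best_model = min(scores, key=scores.get)
```

Relative measures of this kind make models comparable across days with very different load levels, which matters when summer peaks vary widely.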
The residual stationarity and white noise tests show that the residual is a stationary white noise sequence, $\varepsilon_t = a_t$. There is a second-order delay correlation between $\hat{y}$ and $\hat{x}$. The final fitting model is
$$\hat{y}_t = 4.51539 + 0.97436\,\hat{x}_{t-2} + \varepsilon_t, \quad \varepsilon_t = a_t. \tag{33}$$

Conclusion
Based on the above analysis, the combined HP Filter-SARIMA-Revised ARMAX models can effectively forecast electric load with external variables. The structure of the model is determined by statistical regression analysis and the least squares method. The process strictly follows the rules of the $t$-test, AIC, and SBC. The combined models are more accurate than a single forecasting method for short-term electric load forecasting. A total of four traditional forecasting models are applied to forecast electric loads in this paper. The empirical analysis shows that the combined models have a smaller relative error than the traditional methods, so their prediction accuracy is greatly improved. The significance of the parameters is also greatly enhanced. Together with the existing literature, the analysis in this paper suggests that combined models have better prediction performance than a single model when dealing with special data.

Figure 3: The combined models with HP Filter-SARIMA and the revised ARMAX.

Figure 4: Electric load and temperature figure.

Figure 6: Separated electric load sequence by HP filter.

Figure 16: Forecasting graphics of different methods.

Table 1: Parameter estimates of the ARIMA model.

Table 2: Comparison of AIC and SBC.
Combined Models
4.4.1. $\hat{y}$ and $\hat{x}$ Sequences Model. Firstly, the $\hat{y}$ model is established. The test shows that $\hat{y}$ is a stationary white noise sequence; thus the fitting model is
$$\hat{y}_t = \varepsilon_t. \tag{29}$$

Table 3: Autocorrelation check for white noise.

Table 5: Fitting parameters of $\hat{x}$.

Table 6: Parameter estimates of the REG procedure.

Table 7: Predictions of five models.

Table 8: Comparison of errors among the five models for the training set.
The error histogram for the combined model and the other comparative models.
$E(x_t)$: Mean of time series $\{x_t\}$
$\operatorname{var}(x_t)$: Variance of time series $\{x_t\}$
$\operatorname{cov}(x_t, x_{t-1})$: Covariance of time series $\{x_t\}$ at lag 1
$\rho_k$: Autocorrelation function
$\{x_t\}$: A stationary sequence
$\mu$: The average of the sequence $\{x_t\}$