SARIMA-Orthogonal Polynomial Curve Fitting Model for Medium-Term Load Forecasting

Seasonality is a key factor in time series modeling for medium-term electric load forecasting. In this paper, a seasonal ARIMA model is developed, but the parameters of the SAR and SMA terms turn out to be nonsignificant in most cases during model order selection. To address this issue, a hybrid time series model based on the Hodrick-Prescott (HP) filter is used to extract component sequences with different frequencies and to analyze the interactions among the various factors. Finally, an integrative forecast is made for electricity consumption from January to November 2014. The empirical results demonstrate that the method with the HP filter can reduce the relative error caused by the interaction between the trend component and the seasonal component.


Introduction
To a certain extent, medium-term power consumption is affected by seasonal factors, historical consumption, and consumption peaks caused by unexpected events. Following current prediction techniques, these factors are categorized into the long-term trend (T), the seasonal fluctuation (S), the cyclical volatility (C), and the irregular volatility (I). Their influences superimpose on each other, which makes model construction difficult.
In recent years, many studies have addressed these four fluctuation factors in power load forecasting. Neural network methods are often used to predict electric load [1,2]; however, given the available data volume, time series models fit the characteristics of the sequence better than neural network models [3]. Azadeh et al. [4] combined seasonal fluctuation and forecasting nonlinearity with a fuzzy system and data mining techniques to analyze monthly electricity demand in Iran. SVM models can also be used to analyze the effects of seasonal fluctuation and long-term trend [5]. Sequences with a significant trend can be analyzed with the GM(1, 1) model combined with neural network methods [6–8]. It follows that trend and seasonal characteristics are the key factors affecting the accuracy of medium-term load forecasting. The SARIMA model, which can eliminate the effects of seasonal and irregular factors, is better suited to monthly electricity consumption forecasting [9,10]. Sequence decomposition methods are often used to analyze the superimposed effects of seasonal variation and long-term growth or decline [11]; among these, the Hodrick-Prescott (HP) filter has certain advantages in series decomposition [12,13].
The seasonal ARIMA model can take all seasonal fluctuations of the sequence fully into account. However, because of the interaction among the four fluctuations T, S, C, and I, and between the seasonal and nonseasonal factors, the seasonal parameters are nonsignificant in most practical applications. The HP filter, which is based on spectral analysis, separates the data sequence into components and relieves the superimposed impact of the fluctuations. In this paper, the HP filter yields the sequence {g} with a significant trend and the sequence {c} with significant periodicity. By modeling these sequences separately and combining them in an integrative analysis, the model relieves the mutual influence of the changing trends and improves prediction precision. In addition, to address the model order problem, a long-memory test is conducted on the original data sequence. The result shows that the sequence is not a standard random walk process, which suggests new directions for power load forecasting.
The purpose of this paper is to design an accurate prediction model. The remainder of the paper is organized as follows. Section 2 introduces the principles of the methods. Section 3 describes power load forecasting with the traditional method and the improved method and discusses the results. The last section concludes the paper.

Forecasting Models
2.1. SARIMA Model. SARIMA (seasonal autoregressive integrated moving average), denoted SARIMA$(p, d, q)(P, D, Q)_s$, extends the traditional ARIMA$(p, d, q)$ model. It can eliminate the influence of periodicity in the prediction process and is therefore widely applied to forecasting seasonal time series [14,15]. The model can be written as
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d(1-B^s)^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t ,$$
where $B$ is the backward shift operator, satisfying $B^k y_t = y_{t-k}$. The integers $p$, $q$, $P$, and $Q$ are the orders of $\phi_p(B)$, $\theta_q(B)$, $\Phi_P(B^s)$, and $\Theta_Q(B^s)$, respectively: $\phi_p(B)$ and $\theta_q(B)$ are polynomials in $B$ of degrees $p$ and $q$, and $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are polynomials in $B^s$ of degrees $P$ and $Q$. The integers $d$ and $D$ are the numbers of regular and seasonal differences; for a nonstationary time series $y_t$, the differenced series $(1-B)^d(1-B^s)^D y_t$ is stationary. The term $\varepsilon_t$, the current disturbance with mean 0 and variance $\sigma^2$, is the estimated residual at time $t$ and is assumed to be an independent and identically distributed normal random variable.
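The differencing part of the SARIMA model, $(1-B)^d(1-B^s)^D$, can be illustrated with a minimal Python sketch. It assumes the series is a plain list; the function name `difference` and the toy series are hypothetical and not taken from the paper.

```python
def difference(y, d=1, D=1, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to the sequence y.

    B is the backward shift operator (B y_t = y_{t-1}). Each seasonal difference
    drops the first s values and each regular difference drops one more value.
    """
    w = list(y)
    for _ in range(D):  # seasonal differences: w_t - w_{t-s}
        w = [w[t] - w[t - s] for t in range(s, len(w))]
    for _ in range(d):  # regular differences: w_t - w_{t-1}
        w = [w[t] - w[t - 1] for t in range(1, len(w))]
    return w


# Hypothetical series: linear trend plus a repeating pattern of period s = 4.
pattern = [3, -1, 4, -6]
y = [5 + 2 * t + pattern[t % 4] for t in range(12)]
print(difference(y, d=1, D=1, s=4))  # -> [0, 0, 0, 0, 0, 0, 0]
```

Because the two operators commute, the order of seasonal and regular differencing does not matter. Here one seasonal difference removes the pattern and turns the trend into the constant $2s = 8$, and one regular difference removes that constant.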
Seasonal time series analysis involves three issues that need to be addressed.
(1) Stationarity Test. Stationarity of the time series $\{y_t\}$ is the premise for building the SARIMA model. If $\mu$, $\sigma^2$, and $\gamma_k$ in formula (4) are constants that do not depend on $t$, $\{y_t\}$ is weakly (covariance) stationary:
$$E(y_t)=\mu,\qquad \operatorname{Var}(y_t)=\sigma^2,\qquad \operatorname{Cov}(y_t, y_{t+k})=\gamma_k. \tag{4}$$
The ADF unit root test can be used to test whether $\{y_t\}$ is stationary. If it is not, differencing is applied until the differenced sequence is stationary; the resulting stationary sequence is denoted $w_t = \Delta^d y_t$.
(2) Seasonal Analysis. Before the seasonal analysis, the sample autocorrelation function at lag $k$ is defined as
$$r_k = \frac{\sum_{t=1}^{n-k}(w_t-\bar w)(w_{t+k}-\bar w)}{\sum_{t=1}^{n}(w_t-\bar w)^2}, \tag{5}$$
where $w_t$ is a stationary sequence of length $n$ and $\bar w$ is its mean. By comparing the autocorrelations with their confidence interval, the periodicity and the cycle length $s$ of $\{w_t\}$ can be determined. The sequence $\{y_t\}$ can then be seasonally adjusted according to the additive model
$$y_t = TC_t + S_t + I_t, \tag{6}$$
where $TC_t$ denotes the long-term trend and cycle volatility, $S_t$ the seasonal fluctuation, and $I_t$ the irregular volatility.
(3) Model Order Selection and Model Prediction. First, the seasonally adjusted sequence is denoted $\{y_t^{SA}\}$. The lags at which the autocorrelation and partial autocorrelation functions fall outside their confidence intervals are examined to determine the orders of AR($p$), MA($q$), SAR($P$), and SMA($Q$), and the ARIMA$(p, d, q)(P, D, Q)_s$ model is built. Then, under the minimum mean square error principle, the $l$-step-ahead prediction is the conditional expectation of $y_{t+l}$:
$$\hat y_t(l) = E\left(y_{t+l}\mid y_t, y_{t-1}, \ldots, y_1\right).$$
When the ARIMA$(p, d, q)(P, D, Q)_s$ model requires high orders, a long-memory test can be performed on the stationary sequence $\{w_t\}$.
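The checks in steps (1) and (2) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the procedure behind Table 1 or Figure 3: `dickey_fuller_t` runs the plain Dickey-Fuller regression (no lagged-difference terms, so it is not the full ADF test, and no trend term), `acf` computes the sample autocorrelation of step (2), the $\pm 1.96/\sqrt{n}$ band is the usual large-sample approximation, and `seasonal_adjust` is a crude additive adjustment by season means. All function names and the synthetic series are hypothetical.

```python
import math
import random


def dickey_fuller_t(y):
    """Plain Dickey-Fuller regression: dy_t = alpha + gamma * y_{t-1} + e_t.

    Returns (gamma, t_stat). Unlike the ADF test it has no lagged-difference terms
    and no trend; a unit root is rejected at about the 5% level if t_stat < -2.86
    (asymptotic critical value for the constant-only case).
    """
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((v - mx) * (d - md) for v, d in zip(x, dy)) / sxx
    alpha = md - gamma * mx
    ssr = sum((d - alpha - gamma * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return gamma, gamma / se


def acf(w, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a stationary sequence."""
    n = len(w)
    m = sum(w) / n
    denom = sum((v - m) ** 2 for v in w)
    return [sum((w[t] - m) * (w[t + k] - m) for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def seasonal_adjust(w, s):
    """Additive adjustment w - S, with S the demeaned average of each season position.

    A crude stand-in for moving-average (X-11 style) procedures; apply it to a
    series without trend (e.g. after differencing), otherwise the trend leaks into S.
    """
    overall = sum(w) / len(w)
    season = [sum(w[i::s]) / len(w[i::s]) - overall for i in range(s)]
    return [v - season[t % s] for t, v in enumerate(w)]


# Synthetic monthly series (not the paper's data): growth + 12-month cycle + noise.
random.seed(1)
y = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 1)
     for t in range(120)]
w = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference, length 119
band = 1.96 / math.sqrt(len(w))              # approximate 95% band for r_k
r = acf(w, 24)
print("DF t-statistic of w:", round(dickey_fuller_t(w)[1], 2))
print("r_12 =", round(r[11], 3), "vs band", round(band, 3))
print("r_12 after adjustment =", round(acf(seasonal_adjust(w, 12), 12)[11], 3))
```

A lag-12 autocorrelation well outside the band is what identifies the 12-month cycle; after the additive adjustment it should shrink toward the band.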

R/S Method of ARFIMA Model. Long-memory analysis, which tests whether a series departs from a random walk, was proposed by H. E. Hurst in 1951 in his research on the relationship between river flow and reservoir storage capacity, where he introduced rescaled range ($R/S$) analysis. Researchers have since used this method to analyze financial series and to build autoregressive fractionally integrated moving average (ARFIMA) models [16,17]. The $R/S$ procedure is as follows. First, the sequence $\{w_t\}$ of length $N$ is divided into $A$ contiguous, non-overlapping intervals $I_a$ ($a = 1, \ldots, A$), each of length $n$. For each interval,
$$e_a = \frac{1}{n}\sum_{k=1}^{n} w_{k,a}, \qquad X_{k,a} = \sum_{i=1}^{k}\left(w_{i,a}-e_a\right),\quad k = 1, \ldots, n,$$
where $w_{k,a}$ is the $k$th observation in $I_a$, $e_a$ is the mean of $I_a$, and $X_{k,a}$ is the cumulative deviation of $I_a$. Let $R_{I_a} = \max_k X_{k,a} - \min_k X_{k,a}$ denote the range of the cumulative deviations and $S_{I_a}$ the standard deviation of the observations in $I_a$. The $R/S$ statistic then satisfies
$$(R/S)_n = \frac{1}{A}\sum_{a=1}^{A}\frac{R_{I_a}}{S_{I_a}} = c\,n^{H},$$
where $H$, the Hurst index, is the exponent of $n$ and $c$ is a constant. Taking logarithms of both sides gives
$$\log (R/S)_n = \log c + H \log n,$$
so the Hurst index can be estimated by OLS over several interval lengths $n$. Finally, long memory is judged by the following criterion [18]:
$$\begin{cases} 0 \le H < 0.5: & \{w_t\}\ \text{is a mean-reverting process},\\ H = 0.5: & \{w_t\}\ \text{is a standard random walk process},\\ 0.5 < H < 1: & \{w_t\}\ \text{is a long-memory process}. \end{cases} \tag{11}$$
When $0.5 < H < 1$, the original sequence is likely to have long memory [19]; however, this does not guarantee that an ARFIMA model suitable for medium-term load forecasting exists [20].
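The $R/S$ procedure can be sketched as follows, assuming non-overlapping intervals, the population standard deviation within each interval, and a hand-picked set of interval lengths. The names `rescaled_range` and `hurst_rs` are hypothetical, and the random series are synthetic. Classical $R/S$ is biased upward for short intervals, so white noise typically yields an estimate somewhat above 0.5.

```python
import math
import random


def rescaled_range(x):
    """R/S of one interval: range of cumulative deviations divided by the std."""
    n = len(x)
    mean = sum(x) / n
    cum, total = [], 0.0
    for v in x:
        total += v - mean          # cumulative deviation X_k
        cum.append(total)
    r = max(cum) - min(cum)
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return r / s


def hurst_rs(x, sizes):
    """Estimate H as the OLS slope of log (R/S)_n on log n."""
    pts = []
    for n in sizes:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]  # non-overlapping
        rs = sum(rescaled_range(b) for b in blocks) / len(blocks)
        pts.append((math.log(n), math.log(rs)))
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    return (sum((p - mx) * (q - my) for p, q in pts)
            / sum((p - mx) ** 2 for p, _ in pts))


random.seed(0)
noise = [random.gauss(0, 1) for _ in range(2048)]
walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)
sizes = [16, 32, 64, 128, 256, 512]
h_noise = hurst_rs(noise, sizes)  # expected near (somewhat above) 0.5
h_walk = hurst_rs(walk, sizes)    # levels of a random walk push the estimate toward 1
print(round(h_noise, 3), round(h_walk, 3))
```

The second case shows why the test should be applied to the stationary sequence: running it on the nonstationary levels inflates $H$ toward 1 regardless of any genuine long memory.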

Orthogonal Polynomial Curve Fitting. Orthogonal polynomial curve fitting is an improvement on ordinary least squares (OLS).
The OLS method presupposes that the independent variable is measured exactly, which is unrealistic in most cases. When the error in the independent variable becomes large enough, a prediction model fitted by OLS produces noticeable errors. Orthogonal polynomial curve fitting was proposed to handle this situation; its basic principle is to minimize the sum of squared orthogonal distances from all points to the fitted curve. In the OLS method, the fitting polynomial is
$$\hat y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m,$$
which is fitted by the least squares criterion, minimizing the sum of squared distances between the predicted and actual values:
$$\min \sum_{i=1}^{N}\left(y_i - \hat y_i\right)^2 = \min \sum_{i=1}^{N}\Bigl(y_i - \sum_{j=0}^{m} a_j x_i^{j}\Bigr)^2 .$$
The undetermined coefficients are then obtained by setting the partial derivatives with respect to each $a_j$ to zero. The orthogonal polynomial curve fitting method builds on OLS but takes the errors of both the dependent and the independent variables into account. Its fitting polynomial is
$$\hat y_i = \sum_{j=0}^{m} a_j \hat x_i^{\,j},$$
where $\hat x_i$ is the predicted value of the independent variable $x_i$. The orthogonal distance errors are
$$x_i = \hat x_i + \delta_i, \qquad y_i = \hat y_i + \varepsilon_i,$$
where $\delta_i$ and $\varepsilon_i$ are the random errors of $x_i$ and $y_i$, respectively, and the criterion of orthogonal polynomial curve fitting is
$$\min \sum_{i=1}^{N}\left(\delta_i^2 + \varepsilon_i^2\right).$$
Combining the orthogonal criterion with the OLS polynomial in this way improves the fit of the polynomial model.
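The OLS step, which solves for the undetermined coefficients $a_0, \ldots, a_m$ through the normal equations, can be sketched in plain Python. The helper names `polyfit_ols` and `polyval` are hypothetical, and the sketch covers only the ordinary least-squares criterion, not the orthogonal-distance criterion. For a fourth-order fit over $t = 1, \ldots, 120$, the normal equations contain power sums up to $t^8$ and become badly conditioned, so rescaling $t$ (for example to $t/120$) before fitting is advisable.

```python
def polyfit_ols(x, y, degree):
    """Least-squares coefficients a_0..a_m of y ~ sum_j a_j x^j via the normal equations."""
    m = degree + 1
    # Normal equations (X'X) a = X'y, built from power sums of x.
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * m
    for r in range(m - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a


def polyval(a, x):
    """Evaluate sum_j a_j x^j."""
    return sum(coef * x ** j for j, coef in enumerate(a))


xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * v + 3 * v * v for v in xs]   # exact quadratic, illustrative only
coef = polyfit_ols(xs, ys, 2)
print([round(c, 6) for c in coef])          # close to [1, 2, 3]
print(round(polyval(coef, 6), 6))           # extrapolated value at x = 6
```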
The objective function can be expressed as
$$\min F = \sum_{i=1}^{N} d^2(P_i, C),$$
where $P_i = (x_i, y_i)$ represents an observed point, $C$ represents the fitted curve, and $d(P_i, C)$ represents the orthogonal distance from the observed point to the fitted curve.
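Locally, the fitted curve $C$ can be parameterized as the line
$$(x - x_0)\sin\theta - (y - y_0)\cos\theta = 0,$$
where $(x_0, y_0)$ is a point of $C$ and $\theta$ is the angle between the tangent and the abscissa axis, so that the objective function becomes
$$F(x_0, y_0, \theta) = \sum_{i=1}^{N}\left[(x_i - x_0)\sin\theta - (y_i - y_0)\cos\theta\right]^2 .$$
Taking the partial derivatives with respect to $x_0$, $y_0$, and $\theta$ and setting them to zero gives the minimum error and the fitted curve:
$$\frac{\partial F}{\partial x_0}=\frac{\partial F}{\partial y_0}=0 \;\Rightarrow\; (\bar x - x_0)\sin\theta - (\bar y - y_0)\cos\theta = 0,$$
$$\frac{\partial F}{\partial \theta}=0 \;\Rightarrow\; \tan 2\theta = \frac{2\sum_{i}(x_i-\bar x)(y_i-\bar y)}{\sum_{i}\left[(x_i-\bar x)^2-(y_i-\bar y)^2\right]},$$
where $\bar x$ and $\bar y$ are the mean values of the sequences $\{x\}$ and $\{y\}$; that is, the fitted line passes through $(\bar x, \bar y)$.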
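In the straight-line case, minimizing the orthogonal sum of squares has a closed form: the line passes through $(\bar x, \bar y)$ and its angle satisfies $\tan 2\theta = 2S_{xy}/(S_{xx}-S_{yy})$, with `atan2` selecting the root that minimizes (rather than maximizes) the objective. A minimal sketch follows; `orthogonal_line_fit` and `orthogonal_sse` are hypothetical names, and the points are made up. Extending this to a higher-order orthogonal polynomial would need an iterative optimizer, which is beyond this sketch.

```python
import math


def orthogonal_line_fit(x, y):
    """Line minimizing the sum of squared orthogonal distances (total least squares).

    Returns (theta, slope, intercept). The line passes through the centroid, and
    0.5 * atan2(2*Sxy, Sxx - Syy) is the angle of the minimizing direction.
    The slope is undefined for a vertical line (theta = +/- pi/2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    slope = math.tan(theta)
    return theta, slope, my - slope * mx


def orthogonal_sse(x, y, theta, x0, y0):
    """Objective F: squared distances to the line through (x0, y0) at angle theta."""
    s, c = math.sin(theta), math.cos(theta)
    return sum(((a - x0) * s - (b - y0) * c) ** 2 for a, b in zip(x, y))


xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]  # illustrative points
theta, slope, intercept = orthogonal_line_fit(xs, ys)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
ols_slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
print(round(slope, 4), round(ols_slope, 4))
```

Comparing `orthogonal_sse` at the orthogonal angle and at the OLS angle (both through the centroid) shows that the orthogonal fit is never worse under its own criterion.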

Hodrick-Prescott Filter. Hodrick and Prescott first proposed the Hodrick-Prescott (HP) filter in their paper analyzing postwar U.S. business cycles.
The method treats the time series as a superposition of components with different frequencies [14,15]. It divides the sequence into two parts whose relationship with the original sequence is
$$y_t = g_t + c_t, \qquad t = 1, 2, \ldots, T,$$
where the long-term trend sequence is $g = \{g_1, g_2, \ldots, g_T\}$ and the short-term fluctuation sequence is $c = \{c_1, c_2, \ldots, c_T\}$. The separation must minimize the loss function
$$\min_{\{g_t\}} \left\{\sum_{t=1}^{T}\left(y_t - g_t\right)^2 + \lambda \sum_{t=2}^{T-1}\left[\left(g_{t+1}-g_t\right)-\left(g_t-g_{t-1}\right)\right]^2\right\},$$
where $\lambda = \sigma_1^2/\sigma_2^2$ is the smoothing parameter, $\sigma_1^2$ is the variance of the fluctuation sequence $\{c\}$, and $\sigma_2^2$ is the variance of the second differences of the trend sequence $\{g\}$. As $\lambda$ increases, the estimated trend responds less to changes in the series, so a larger $\lambda$ yields a smoother trend; as $\lambda$ tends to infinity, the estimated trend approaches a linear function of time. As a rule of thumb, $\lambda = 14400$ is used for monthly data.
In this paper, the HP filter is applied to the non-seasonally-adjusted series, dividing the original sequence into two sequences with distinct spectral frequencies; weakening the mutual effect between the two sequences allows each to be modeled more accurately.
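The HP loss function, a squared fit term plus $\lambda$ times the squared second differences of the trend, is minimized by solving the linear system $(I + \lambda K^\top K)\,g = y$, where $K$ is the $(T-2)\times T$ second-difference matrix. A minimal pure-Python sketch follows, assuming a plain list input; `hp_filter` is a hypothetical name, and in practice a library routine such as `hpfilter` in statsmodels' `statsmodels.tsa.filters.hp_filter` module would normally be used instead.

```python
def hp_filter(y, lam=14400.0):
    """Hodrick-Prescott decomposition y = g + c.

    Solves (I + lam * K'K) g = y, where each row of K applies the second
    difference g[t-1] - 2 g[t] + g[t+1]. The system is symmetric positive
    definite and pentadiagonal, so banded elimination without pivoting is used.
    """
    T = len(y)
    A = [[0.0] * T for _ in range(T)]
    for i in range(T):
        A[i][i] = 1.0
    stencil = (1.0, -2.0, 1.0)
    for t in range(1, T - 1):               # accumulate lam * K'K, one row of K at a time
        cols = (t - 1, t, t + 1)
        for i, ki in zip(cols, stencil):
            for j, kj in zip(cols, stencil):
                A[i][j] += lam * ki * kj
    b = [float(v) for v in y]
    for col in range(T):                     # forward elimination within the band
        for r in range(col + 1, min(col + 3, T)):
            f = A[r][col] / A[col][col]
            for c in range(col, min(col + 3, T)):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    g = [0.0] * T
    for r in range(T - 1, -1, -1):           # back substitution
        upper = sum(A[r][c] * g[c] for c in range(r + 1, min(r + 3, T)))
        g[r] = (b[r] - upper) / A[r][r]
    return g, [yv - gv for yv, gv in zip(y, g)]


# Hypothetical monthly-like series: linear growth plus an alternating wiggle.
series = [50 + 0.8 * t + (1.5 if t % 2 else -1.5) for t in range(36)]
trend, cycle = hp_filter(series)
print([round(v, 2) for v in trend[:3]], [round(v, 2) for v in cycle[:3]])
```

A purely linear input is returned unchanged as the trend (its second differences are zero), which matches the statement that a very large $\lambda$ drives the trend toward a linear function.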

Error Estimation Methods.
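Five basic error measures are used to evaluate the models: relative error (RE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE), defined as
$$\mathrm{RE}_t = \frac{y_t - \hat y_t}{y_t}\times 100\%,\qquad \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat y_t}{y_t}\right|\times 100\%,$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat y_t\right)^2,\qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},\qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat y_t\right|. \tag{23}$$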
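The five measures can be computed directly from paired actual and predicted values. A minimal sketch; `forecast_errors` is a hypothetical name, RE and MAPE are expressed in percent, and the numbers are illustrative rather than taken from Table 6 or Table 7.

```python
import math


def forecast_errors(actual, predicted):
    """RE (per point, %), MAPE (%), MSE, RMSE and MAE for paired lists."""
    n = len(actual)
    err = [a - p for a, p in zip(actual, predicted)]
    re = [100.0 * e / a for e, a in zip(err, actual)]
    mse = sum(e * e for e in err) / n
    return {
        "RE": re,
        "MAPE": sum(abs(r) for r in re) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in err) / n,
    }


errors = forecast_errors([100.0, 200.0], [90.0, 210.0])
print(errors["MAPE"], errors["RMSE"])  # -> 7.5 10.0
```

The example shows why MAPE and MAE can rank models differently: both points miss by 10, so MAE is 10, but the relative errors are 10% and -5%.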

Sequence Analysis and Combination Model Building. The improved model first separates the original sequence by filtering and then builds a forecasting model for each resulting sequence according to its characteristics.
The detailed process is as follows.
(1) According to the HP filtering principle, the original sequence $\{y_t\}$, regarded as a superposition of waves with different frequencies, is divided into the trend sequence $\{g\}$ and the fluctuation sequence $\{c\}$.
(2) The sequence $\{g\}$ is treated as a function of time $t$, and its scatter plot is drawn. The errors from each point to the fitting curve are denoted $\delta_1, \delta_2, \ldots, \delta_N$ and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$. Orthogonal polynomial curve fitting combined with OLS is then used to minimize the sum of squared errors, $\min \sum_{i=1}^{N}\left(\delta_i^2 + \varepsilon_i^2\right)$.
(3) From the polynomial fitted in the previous step, the predictions of the sequence are obtained and denoted $\hat g$.
(4) The stationarity of the sequence $\{c\}$ is tested. If it is stationary, correlation analysis can be applied directly; otherwise, $\{c\}$ is differenced until it becomes stationary. The stationary sequence is denoted $\{w_t\}$.
(5) Through correlation analysis of the seasonal fluctuation, the autoregressive and moving average terms are identified [18,20]. Based on the result, the ARIMA model is specified as
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d(1-B^s)^D\,c_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t .$$
(6) The adequacy of the ARIMA model is checked using the residual sequence.
(7) The predictions of the sequence from the ARIMA model are obtained and denoted $\hat c$.
(8) The final prediction is obtained according to the HP filter principle:
$$\hat y = \hat g + \hat c .$$
Because the improved model combines two component forecasts, it carries prediction errors from both. In theory, this could increase the total error and reduce prediction accuracy. In practice, however, the HP filter weakens the mutual influence of the long-term trend and the seasonal fluctuation, and the integrative forecast uses models matched to the different characteristics of $\{g\}$ and $\{c\}$. In this way, the trend of the sequence is fitted effectively, its influence on the seasonal component is reduced, and higher prediction accuracy is achieved.
According to the above steps, the specific process of the improved model is shown in Figure 1.
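The error argument in step (8) can be made concrete with a short identity (a worked illustration, not a derivation from the paper). Let $g$ and $c$ be the trend and fluctuation components with $y = g + c$, let $\hat g$ and $\hat c$ be their forecasts, and write $e_g = g - \hat g$ and $e_c = c - \hat c$:

```latex
y - \hat y = (g + c) - (\hat g + \hat c) = e_g + e_c,
\qquad
|y - \hat y| \le |e_g| + |e_c| .
```

The upper bound is where the risk of accumulating two errors comes from; when $e_g$ and $e_c$ have opposite signs they partially cancel, and the combined error is smaller than the larger of the two component errors.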

Empirical Analysis
The example uses monthly electric power consumption data for China from January 2004 to November 2014: the data from January 2004 to December 2013 are used for model building, and the data from January 2014 to November 2014 are used to test the prediction error.
3.1. The Forecasting of the Seasonal-ARIMA Model. Figure 2 plots the monthly electricity consumption from January 2004 to December 2013. The original sequence contains an intercept and a trend and is nonstationary. The result of the ADF unit root test with intercept and trend is shown in Table 1.
The ADF unit root test shows that the original sequence is nonstationary and that its first-order difference is stationary at the 5% significance level. The autocorrelation function of the first-order difference sequence shows that the seasonal cycle is 12 months. Therefore, the additive model is used to seasonally adjust the sequence. The partial autocorrelation and autocorrelation functions are shown in Figure 3.
At the 95% confidence level, the confidence interval of the correlation coefficients is approximately $\pm 1.96/\sqrt{n}$. In the tables, Prob. denotes the $p$ value: the smaller the $p$ value, the stronger the significance, and * indicates rejection of the null hypothesis at the 95% confidence level.
Figure 3 shows that the partial autocorrelation coefficients at lags 1, 2, and 11 and the autocorrelation coefficients at lags 1, 11, and 13 lie outside the confidence interval. Based on the autocorrelation and partial autocorrelation diagrams, the seasonal ARIMA model is estimated by OLS, and model terms are retained or removed according to the significance of their coefficients. The parameter estimates are shown in Table 2.
After the model order is determined through parameter significance tests, the resulting time series model is ARIMA(2, 1, 11)(1, 1, 0)$_{12}$. In this model, the nonseasonal autoregressive terms are AR(1) and AR(2), the nonseasonal moving average terms are MA(1) and MA(11), and the seasonal autoregressive term is SAR(1). This model is used to forecast electricity consumption from January to November 2014, and the results are shown in Table 6.

The Forecasting of the Seasonal-ARFIMA Model.
In the model order selection, the MA(11) term has a significant influence on modeling and prediction. This indicates that errors from distant observations still influence the current monthly electricity consumption to some extent. We therefore infer that monthly electricity consumption may have long-memory characteristics, and this conjecture is confirmed by the $R/S$ long-memory test, which gives a Hurst exponent of $H = 0.837$. Since $0.5 < 0.837 < 1$, the criteria indicate that monthly electricity consumption has long memory, so the current forecast is influenced by distant observations.

Integrative Model.
The HP filter is applied to the original sequence with the smoothing parameter $\lambda = 14400$, and the decomposition is shown in Figure 4. The blue curve is the original sequence. The red curve is the long-term trend, which shows that power consumption grew at a roughly constant rate from 2004 to 2013. The green curve is the cyclical and irregular component, whose fluctuation range widens over time. The HP filter thus separates the original sequence into the sequence $\{g\}$, containing the long-term trend, and the sequence $\{c\}$, containing the other fluctuations.
The data separation result is shown in Figure 5: the sequence $\{g\}$ approximates a smooth curve, while the sequence $\{c\}$ fluctuates around zero.
From a statistical perspective, $\{g\}$ can be treated as a function of $t$. Let the independent variable $t$ ($t = 1, 2, \ldots, 120$) represent time and let $\{g\}$ be the dependent variable. A curve is fitted to describe $\{g\}$ as a function of $t$; the best fit is achieved with a fourth-order polynomial, and the relative error of the fit is shown in Figure 6. The fitted polynomial is then used to forecast the sequence from January to November 2014, with the results shown in Table 3.
The sequence $\{c\}$ is analyzed with a time series model, its terms are adjusted according to their significance, and the estimates are recorded in Table 4. The final model is ARIMA(11, 1, 1)(0, 1, 0)$_{12}$.
Finally, according to the HP filter principle $\hat y = \hat g + \hat c$, the predicted electricity consumption from January to November 2014 is obtained, as shown in Table 6.

Model Error Analysis.
According to Table 6, the absolute values of the relative errors are plotted in Figure 7. Except for March, May, and June, the relative errors of the integrative model are smaller than those of the seasonal ARIMA model. The prediction errors are then analyzed and summarized in Table 7; the comparison shows that the prediction accuracy of the improved model is significantly higher.
As shown in Table 7, the MSE of the improved model is the smallest, indicating that its predictive performance is the most stable. Its MAPE is 1.817%, lower than the 2.643% of the SARIMA model, showing that the predictions of the improved model are closer to the actual values. The MAE leads to the same conclusion.

Conclusions
In this paper, the HP filter is used to decompose the time series. The original sequence is decomposed into sequences with different trend characteristics, which relieves the mutual interference between the different fluctuation components. As a result, the relative error of load forecasting is reduced while multistep prediction is maintained. The $R/S$ test shows that monthly electricity consumption behaves as a long-memory process, although this conclusion may be influenced by the limited amount of data. Nevertheless, in the actual order analysis, higher-order AR or MA terms do affect the power load forecast. Therefore, the long memory of the time series could be considered in building a SARFIMA model for medium-term power load forecasting.

Figure 1: Model flow chart based on HP filter.

Figure 4: HP filter diagram of the power consumption.

Figure 5: Data separation based on HP filter.

Figure 6: Relative error of the curve fitting of sequence {g} from January 2004 to December 2013.

Table 2: Parameter estimates of the seasonal ARIMA model.

Table 4: Parameter estimates of the ARIMA model for the sequence {c}.

Table 6: Electricity consumption forecasting results for 2014.

Table 7: Forecast errors.