The Combination Forecasting of Electricity Price Based on Price Spikes Processing: A Case Study in South Australia



Introduction
Electricity has been regarded as a unique commodity in the energy market under the reformation of the energy economy because it is hard to store and is closely related to people's livelihoods. Electricity price plays an important role in the power market because it balances market demand and supply [1]. The future price of electricity is an indicator of market demand for electricity generators as well as of the profitability of the electricity market. In addition, the nonstorable nature of electricity requires a constant balance between demand and supply. Therefore, an accurate electricity price forecasting method plays an important role in the electricity industry: not only can it help generators and decision-makers to earn more profit, but it can also help to reduce the waste of electricity. However, the price of electricity is affected by many factors, including market demand, network transmission, weather, and many other uncertain factors [2].
Thus, the price is highly volatile, and abnormal jump points amplify this volatility. These abnormal price points are also called price spikes, and they make forecasting the price of electricity a challenge.
The common models used to forecast the price of electricity are classified as time series models and artificial intelligence based models [3]. Time series models, including the ARIMA and GARCH models, are among the most widely used techniques and are regarded as efficient ways to forecast the price of electricity [4]; some have been developed specifically for electricity price forecasting [2]. Artificial intelligence based models, including neural network and data-mining models, are regarded as the most popular techniques applied in electricity price forecasting [5]. Fuzzy inference systems have also been applied to electricity price forecasting, with the desired results [6]. Francesco Serinaldi introduced the application of generalized additive models for location, scale, and shape (GAMLSS), a dynamical distribution model that has proved to be flexible in electricity price forecasting [5].
However, the models listed above are all built on the original data without preprocessing any outliers. Electricity prices can change suddenly and make large jumps to extreme levels; this phenomenon may be caused by unexpected increases in demand, an unexpected supply shortage, or failures in the transmission network [7]. These extreme price values are also called "price spikes," and they degrade forecasting accuracy because many forecasting models are sensitive to extreme observations [5]. Thus, the detection of price spikes is a very important issue in practical situations. In this paper, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to identify and remove the electricity price spikes. A linear interpolation function is used to produce new points that replace the price spikes, yielding the preprocessed data. Both the original data and the preprocessed data are then used to establish the forecasting models, and the forecasting results are compared. The simulation results show that the forecasts built using the preprocessed data outperform those built using the original data.
It is widely accepted that combination forecasts can achieve desirable results [8, 9]. In several research studies, combination forecasts have generally shown outstanding performance in comparison with single forecasts [10]. Because the main idea of a combination forecast is to assign weights to the combined forecasts according to their contributions, selecting a reasonable combination method is an important task. Traditional combination methods impose restrictions on the weights: negative values are not allowed, and the weights must sum to 1. These restrictions lead to low forecasting accuracies. The CPSO weight-determined combination method allows the weights of the combined models to take values in [−1, 1], which overcomes the shortcoming of traditional combination models caused by the nonnegativity constraint on the weights. Furthermore, the CPSO weight-determined combination method inherits the ability of artificial intelligence algorithms to flexibly search for the best solutions. Several combination models are compared, and their forecasting results are analyzed carefully in this paper. The results indicate that the CPSO weight-determined combination models outperform the traditional combination models.
This paper is organized as follows: Section 1 gives a brief introduction; Section 2 presents the theories of the individual forecasts and the combination methods; Section 3 illustrates the electricity price forecasting of South Australia (SA) and contains the analysis of the forecasting results and a comparison of the forecasting models. Lastly, Section 4 concludes with the contributions of this paper.

Materials and Methods
Three individual forecasting models, namely the autoregressive integrated moving average (ARIMA) model, the back propagation neural network (BPNN) model, and the Kalman filtering (KF) model, are chosen for this study. Each of these models has its own advantages in electricity price forecasting. The ARIMA model can analyze the linear relationship of variables and implement various exponential smoothing models [11]. The BPNN model is an efficient way to analyze nonlinear data [12], and the KF model provides an efficient computational solution using the least squares method with only a minor computational cost and with easy adaptation to any alteration of the observations [13]. However, the ARIMA model is not efficient on every data set [11]; the global solution determined using the BPNN model depends on the training of its nonlinear functions, which may cause inaccurate prediction results [14]; and the KF model cannot always produce desirable forecasting results [15]. As a widely accepted way to improve forecasting accuracies, combination methods are proposed in this paper to combine the individual forecasts.

Autoregressive Integrated Moving Average (ARIMA) Model.
The ARIMA model was developed by Box and Jenkins, is also called the "Box-Jenkins" model, and has been widely used in many areas during the last several decades [11]. This model can capture the linear relationship between different variables in the real world [16]. An ARIMA(p, d, q) model can be expressed as follows:

φ(B)(1 − B)^d x_t = θ(B)ε_t, (1)

where x_t (t = 1, 2, ..., N) is the time series value and φ(B) and θ(B) are coefficient polynomials in the lag (or back-shift) operator B, often referred to as the autoregressive and moving average polynomials of orders p and q, respectively; d is the order of differencing, and p, d, and q are integers. Finally, ε_t (t = 1, 2, ..., N) are the random errors at time t, which are assumed to be independently and identically distributed with mean 0 and variance σ². In an ARIMA model, the future time series value is regarded as a linear function of several past observations and random errors [11]. The form of (1) can be written as (2), from which it is easy to obtain the future series from the several past values:

x_t = φ_1 x_{t−1} + ⋅⋅⋅ + φ_p x_{t−p} + ε_t − θ_1 ε_{t−1} − ⋅⋅⋅ − θ_q ε_{t−q}. (2)

To obtain accurate forecasting results, three phases must be performed [16]: (1) identify the structure of the model; (2) estimate the parameters of the model; (3) establish the model and forecast the future series.
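The second phase, parameter estimation, can be illustrated for the purely autoregressive case ARIMA(p, 0, 0). The sketch below is illustrative only and is not the toolchain used in the paper; the function names are our own. It fits the AR coefficients by ordinary least squares and forecasts recursively with equation (2), setting future errors to zero.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of the AR(p) coefficients phi_1..phi_p."""
    x = np.asarray(series, dtype=float)
    # Design matrix: row for time t holds the p lagged values x[t-1], ..., x[t-p].
    X = np.array([[x[t - 1 - i] for i in range(p)] for t in range(p, len(x))])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

def forecast_ar(series, phi, steps):
    """Iterate eq. (2) forward with the future random errors set to 0."""
    hist = list(series)
    out = []
    for _ in range(steps):
        nxt = sum(phi[i] * hist[-1 - i] for i in range(len(phi)))
        hist.append(nxt)
        out.append(nxt)
    return out

# Example: recover the coefficient of a synthetic AR(1) process.
rng = np.random.default_rng(42)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()
phi = fit_ar(x, 1)
fc = forecast_ar(x, phi, 24)
```

On a long enough sample, the least-squares estimate lands close to the true coefficient, and the recursive forecast decays toward the process mean.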

Back Propagation Neural Network (BPNN) Model.
An artificial neural network is a nonlinear model composed of three or more layers: an input layer, one or more hidden layers, and one output layer. A large number of nodes connect the layers through different weights. The BPNN model is a type of multilayer feedforward neural network with a wide variety of applications and is one of the artificial intelligence types of models. Input vectors and the corresponding reference vectors are used to train a network until it can approximate a function and associate input vectors with specific output vectors [12]. In this paper, the Hecht-Nelson method is used to determine the number of hidden layer nodes [13], which means that when the number of input layer nodes is n, the number of hidden layer nodes is chosen as 2n + 1. The structure of the BPNN is shown in Figure 1 and consists of three layers: the input layer, the hidden layer, and the output layer.
Before training the net, it is necessary to process the data. To ensure that the input vectors are compatible when there are significant differences in their magnitudes, we determine the input vector by normalizing each input value as follows [14, 17]:

P = {P_i} = (x_i − x_i,min) / (x_i,max − x_i,min), i = 1, 2, ..., n, (3)

where x_i,max and x_i,min are the maximum and minimum values of the input data, respectively, and x_i is the real value of each vector. Subsequent steps for training a BP net model for application in electricity price forecasting are described below.
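Equation (3) is a standard min-max rescaling into [0, 1]. A one-line sketch (the function name is our own):

```python
import numpy as np

def minmax_normalize(x):
    """Rescale a vector into [0, 1] as in eq. (3): (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

p = minmax_normalize([10.0, 20.0, 30.0])  # -> array([0. , 0.5, 1. ])
```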
Figure 1 shows the topology of the BP network. The net is initialized by assigning the weights ω_ij and V_j and the threshold values θ_j randomly. The input set is then preprocessed by (3), and the output of the hidden layer is calculated by the hidden layer function:

y_j = f_1(∑_i ω_ij x_i − θ_j) (i = 1, ..., n; j = 1, ..., 2n + 1), (4)

where y_j is the output of hidden layer node j and f_1 represents the activation function of a node, which is usually a sigmoid function.
The output of the output layer is calculated in the following form:

output1 = f_2(∑_j V_j y_j − r_1) (j = 1, ..., 2n + 1), (5)

where r_1 represents the bias of the neuron, output1 represents the output data of the network, and f_2 stands for the activation function of the output layer node [14].
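Equations (4) and (5) describe a single forward pass through the n / (2n + 1) / 1 topology of Figure 1. A minimal sketch follows; the class and parameter names are our own, and the back propagation training loop is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNN:
    """n inputs, 2n+1 hidden nodes (Hecht-Nelson rule), one output."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        h = 2 * n + 1
        self.W = rng.normal(0.0, 0.5, (h, n))   # hidden weights omega_ij
        self.theta = rng.normal(0.0, 0.5, h)    # hidden thresholds theta_j
        self.V = rng.normal(0.0, 0.5, h)        # output weights V_j
        self.r1 = rng.normal(0.0, 0.5)          # output bias r_1

    def forward(self, x):
        y = sigmoid(self.W @ x - self.theta)    # eq. (4): hidden layer
        return sigmoid(self.V @ y - self.r1)    # eq. (5): output layer

net = BPNN(3)
out = net.forward(np.array([0.1, 0.5, 0.9]))
```

Because both layers use sigmoid activations, the output is always strictly between 0 and 1, which is why the targets are min-max normalized first.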

Kalman Filtering (KF) Model.
Kalman filters (KF) were proposed by Kalman in 1960. These filters are dynamic behavioral procedure systems for optimal time series estimation. Observations are recursively combined with recent forecasting values using weights that minimize the corresponding biases [15]. The discrete procedure used is stated by the following state equation (6) and output equation (7):

X(k + 1) = AX(k) + V1(k), (6)
Y(k) = CX(k) + V2(k), (7)

where k stands for the time, X is the actual state at time k, and A and C are the coefficient matrices that must be determined before the model is applied in the forecasting system. Y represents the output variable vector. V1 and V2 are the process noise series and measurement noise series, respectively, and they are assumed to be white Gaussian noise [18].
The KF model gives a dynamic estimation of the state X based on the observation value Y along with time k. The present estimate of X(k) is based on the previous value X(k − 1), k = 1, 2, ..., and the relationship can be given as

X_{k/k−1} = AX(k − 1), (8)
Y_{k/k−1} = AY(k − 1) + V1(k). (9)

As the new observation value Y(k) becomes known, the estimate of state X is updated to be

X_k = X_{k/k−1} + K_k (Y_k − C_k X_{k/k−1}), (10)

where

K_k = P_{k/k−1} ⋅ C_k^T (C_k ⋅ P_{k/k−1} ⋅ C_k^T + V2(k))^{−1}. (11)

The final estimate of P is

P_k = (I − K_k ⋅ C_k) P_{k/k−1}. (12)

Equations (8)–(12) provide a detailed updating procedure for the Kalman filter algorithm, where K is the forecasting gain and P is the forecasting covariance. The ARIMA(1, 0, 0) model is used to estimate C in this paper, and the observation equation can be denoted as [15]

y_k = x_{0,k} + x_{1,k} ⋅ d_k, (13)

with the observation matrix C = [1, d_k].

Traditional Combination Method.
In general, the forecasting output of a combination forecasting model based on K component forecasts, which produces forecasts p_t^(1), ..., p_t^(K) of p_t, is given in the following form:

p_t = f(p_t^(1), ..., p_t^(K); w), (14)

where f is a function and w is a weight parameter vector [8]. In this study, two methods are chosen to determine the parameter vector w.
Let x_it denote the forecasting output of the ith individual forecasting model at time point t for the time series x_t. The combination forecasting output at time t can then be denoted in the following form:

x̂_t = ∑_{i=1}^{K} w_i x_it,

where x̂_t is the combination model's forecasting output, K is the number of component forecasting models, and w_i is the weight of the ith individual forecasting model. Then, the constraints on w_i must be considered. In conventional research studies, the constraints must meet the following requirements to ensure that the combined model is reasonable [19]:

∑_{i=1}^{K} w_i = 1, w_i ≥ 0, i = 1, ..., K.

To estimate the performance of the combined models, the key issue is to adjust the weights of the component forecasts. There are many methods for obtaining reasonable weight values, and the most commonly used is the weighted average (WA) method. Let e_it = x_it − x_t denote the residual of the ith individual forecasting model at time t. The residual of the combined model is denoted as follows:

e_t = x̂_t − x_t = ∑_{i=1}^{K} w_i e_it.

Therefore, the WA method minimizes the mean absolute percentage error (MAPE) of the output of the combination forecasts to obtain the optimal weights:

min_w (1/T) ∑_{t=1}^{T} |e_t / x_t|.
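The weighted-sum combination and the MAPE objective described above reduce to a few lines of code. The sketch below is illustrative, with our own function names and toy data; a simple grid search over the weight simplex stands in for whatever solver the WA method uses, and it respects the traditional constraints (weights nonnegative and summing to 1).

```python
import numpy as np

def combine(forecasts, weights):
    """Weighted sum of the K individual forecasts (rows of `forecasts`)."""
    return np.asarray(weights) @ np.asarray(forecasts)

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.mean(np.abs((actual - pred) / actual)) * 100.0

# Toy example with two hypothetical component forecasts of the same series.
actual = np.array([50.0, 52.0, 55.0, 53.0])
f1 = np.array([48.0, 51.0, 57.0, 52.0])
f2 = np.array([53.0, 54.0, 54.0, 55.0])

# Traditional constraints: w1 + w2 = 1, w >= 0 -> search w1 on a grid.
grid = np.linspace(0.0, 1.0, 101)
errors = [mape(actual, combine([f1, f2], [w, 1.0 - w])) for w in grid]
best_w = grid[int(np.argmin(errors))]
best_err = min(errors)
```

Because the endpoints w1 = 0 and w1 = 1 are on the grid, the combined MAPE can never exceed that of the better individual model.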

The Proposed CPSO Weight-Determined Combination Method.
Particle swarm optimization (PSO), developed by Kennedy and Eberhart, is a swarm intelligence algorithm inspired by the cooperation and communication within a swarm of birds or a school of fish looking for food [20–22].

CPSO Algorithm.
Chaos is a common phenomenon that consists of unstable dynamic behavior that is sensitive to initial conditions. However, it contains an inherent regularity, and the ergodicity and regularity of chaotic variables can be used to optimize a search. Recently, chaos series have replaced random series in the development of the chaos optimization algorithm (COA), which has achieved good results [23]. The logistic equation is a chaotic system, and the chaotic sequence is expressed as follows:

z_{k+1} = μ z_k (1 − z_k), k = 0, 1, 2, ...,

where z_0 ∈ [0, 1] and μ is the control parameter; z_k is the value of the chaotic sequence at point k.
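The logistic sequence above is straightforward to generate (an illustrative sketch with our own function name; μ = 4 gives fully chaotic behavior on [0, 1]):

```python
def logistic_sequence(z0, mu, n):
    """Iterate the logistic map z_{k+1} = mu * z_k * (1 - z_k)."""
    zs = [z0]
    for _ in range(n - 1):
        zs.append(mu * zs[-1] * (1.0 - zs[-1]))
    return zs

zs = logistic_sequence(0.3, 4.0, 100)  # zs[1] = 4 * 0.3 * 0.7 = 0.84
```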
The chaos particle swarm optimization (CPSO) algorithm is developed based on the COA and replaces random series with chaos series to avoid premature convergence [24]. The CPSO algorithm integrates the fast computation of the PSO algorithm with the strong ability of the COA to examine values beyond the local extrema. In addition, CPSO can avoid the shortcoming of the PSO algorithm, which easily settles at local extrema, by maintaining the diversity of the swarm [25]. The velocity of particle i is V_i = (v_i1, v_i2, ..., v_iD), which varies according to the following form:

v_id(t + 1) = w v_id(t) + c_1 rand1() (p_id − x_id(t)) + c_2 rand2() (p_gd − x_id(t)),

and the position of each particle changes according to the following equation:

x_id(t + 1) = x_id(t) + v_id(t + 1),

where t is the time, c_1 and c_2 are acceleration coefficients, w is the inertia factor, rand1() and rand2() are two independent random numbers uniformly distributed in the range [0, 1], d = 1, ..., D indexes the dimensions, and i indexes the particles; p_id is the best previous position of particle i, and p_gd is the best position found by the swarm. Many studies have demonstrated that better results can be achieved when c_1 = c_2 = 1.5 and when w varies between 0.4 and 0.9 [26]. Table 1 shows the advantage of using CPSO compared to basic PSO. Generally, the logistic chaos series performs better than the random series on the Sphere, Rosenbrock, Rastrigin, and Schaffer f6 test functions.

Selecting Weights of Combination Models Using the CPSO Algorithm.
In this paper, the CPSO algorithm is used to determine the weights of the combination models. The combination model and the objective function (the MAPE) are those defined for the traditional combination method. However, to obtain better results, the constraints are relaxed to the following:

−1 ≤ w_i ≤ 1, i = 1, ..., K.

The weights are regarded as particles and are optimized by the CPSO algorithm; the best particles carry the optimal weights. The optimization process is expressed in the following steps [27].
Step 1. Initialize the parameters, including the particle population size m, the acceleration coefficients c_1 and c_2, the inertia factor w, and the number of iterations.
Step 2. Let the particles fly, with the speed and position of the particles updated under the control of the velocity and position update equations, in which the random series is replaced by the chaos series.
Step 4. Calculate the fitness value of each particle, which is the MAPE in this study, and then find the best individual g*.
Step 5. Use the best individual g* to replace the current particle.
Step 6. Check whether the optimal solution or the maximum number of iterations has been reached. If not, return to Step 2; otherwise, stop.
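The steps above can be sketched end to end as follows. This is an illustrative implementation under our own naming, not the paper's code: the two uniform random numbers in the PSO velocity update are replaced by logistic chaos sequences (μ = 4), the fitness is the MAPE of the combined forecast, and the weights are clipped to [−1, 1].

```python
import numpy as np

def mape(actual, pred):
    return np.mean(np.abs((actual - pred) / actual)) * 100.0

def cpso_weights(forecasts, actual, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=1):
    """Search combination weights in [-1, 1]^K minimizing the MAPE (Steps 1-6)."""
    forecasts = np.asarray(forecasts, float)     # shape (K, T)
    actual = np.asarray(actual, float)
    K = forecasts.shape[0]
    rng = np.random.default_rng(seed)
    # Step 1: initialize particles (candidate weight vectors) and velocities.
    x = rng.uniform(-1.0, 1.0, (n_particles, K))
    v = np.zeros((n_particles, K))
    z1, z2 = rng.random(), rng.random()          # seeds of the chaos sequences
    fit = np.array([mape(actual, p @ forecasts) for p in x])
    pbest, pbest_fit = x.copy(), fit.copy()
    g = pbest[pbest_fit.argmin()].copy()         # best individual g*
    for _ in range(iters):
        # Step 2: logistic chaos series replace rand1() and rand2().
        z1, z2 = 4.0 * z1 * (1.0 - z1), 4.0 * z2 * (1.0 - z2)
        v = w * v + c1 * z1 * (pbest - x) + c2 * z2 * (g - x)
        x = np.clip(x + v, -1.0, 1.0)            # keep weights inside [-1, 1]
        # Steps 4-5: evaluate the MAPE fitness, update personal and global bests.
        fit = np.array([mape(actual, p @ forecasts) for p in x])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        g = pbest[pbest_fit.argmin()].copy()
    # Step 6: iteration budget exhausted; return the best weights found.
    return g, pbest_fit.min()

# Toy run: the actual series is exactly 0.6*f1 + 0.5*f2, whose weights sum to 1.1,
# so the optimum lies outside the traditional sum-to-1 constraint.
f1 = np.array([50.0, 52.0, 55.0, 53.0, 51.0, 54.0])
f2 = np.array([48.0, 53.0, 54.0, 55.0, 50.0, 52.0])
actual = 0.6 * f1 + 0.5 * f2
weights, err = cpso_weights(np.vstack([f1, f2]), actual)
```

The toy target is chosen so that its optimal weights sum to more than 1, which a traditional sum-to-1 combination cannot represent but the relaxed [−1, 1] search can approach.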

A Case Study
The electricity price data used in this study is from South Australia (SA) (see Figure 2), which is located in the south of Australia [28].

Data Description.
The available data were collected every 30 minutes from 1 March to 21 March 2009, and the collection time began at 00:30 and ended at 12:00 each day. The collected data are grouped into two sets: the "training set" used to build the forecasting models and the test set. The collected data are shown in Figure 3, which reveals the high volatility of the data; in other words, there are many price spikes in the data.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Algorithm.
The DBSCAN algorithm was proposed in 1996 and is a simple and efficient clustering algorithm based on density clustering [29]. The DBSCAN algorithm regards a cluster as a region and assumes that the objects in the region are dense. DBSCAN is well suited to identifying price spikes, for it is able to discover clusters of arbitrary shapes [30]. The DBSCAN algorithm has two parameters, ε and MinPts: ε is the radius of a neighborhood, MinPts is the minimum number of points required in that neighborhood, and D denotes the given data set. The following concepts are introduced for understanding and implementing the algorithm. Figure 4 shows a brief sketch of the DBSCAN algorithm.
Definition 2 (directly density-reachable). An object p in the data is directly density-reachable from an object q if two conditions are satisfied: p ∈ N_ε(q) and |N_ε(q)| ≥ MinPts.
Definition 3 (density-connected). Objects p and q are density-connected in the data set D if, for the given ε and MinPts, p and q are both directly density-reachable from the same object o. The preprocessed data are shown in Figure 5.
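Definitions 2 and 3 translate directly into a naive O(n²) implementation for one-dimensional price data. The sketch below is illustrative under our own naming, not the implementation used in the study; label −1 marks noise, that is, candidate price spikes.

```python
def dbscan_1d(points, eps, min_pts):
    """Naive DBSCAN on scalar data; returns a cluster label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # N_eps(i): indices within distance eps of point i (including i itself).
        return [j for j in range(n) if abs(points[j] - points[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:            # not a core point (Definition 2 fails)
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nb)
        while seeds:                     # expand the density-connected region
            j = seeds.pop()
            if labels[j] == -1:          # previously noise: becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:     # j is itself a core point: keep expanding
                seeds.extend(nb_j)
    return labels

# Toy prices: a dense cluster near 1, an isolated spike at 50, a sparse pair near 2.
labels = dbscan_1d([1.0, 1.1, 1.2, 1.3, 50.0, 2.0, 2.1], eps=0.5, min_pts=3)
```

In the price-spike setting, the points labeled −1 would then be replaced by linear interpolation between their neighbors to produce the preprocessed series.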

Statistic Measurements of Forecasting Performance.
The forecasting performance of the individual forecasts and combination models is evaluated by three common criteria: the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the mean absolute error (MAE). The accuracy of the forecasting results increases as these three error values decrease [31]:

MAPE = (1/N) ∑_{t=1}^{N} |x_t − x̂_t| / x_t × 100%,
RMSE = sqrt((1/N) ∑_{t=1}^{N} (x_t − x̂_t)²),
MAE = (1/N) ∑_{t=1}^{N} |x_t − x̂_t|,

where x_t is the actual value at time t, x̂_t is the forecast value at time t, and N = 24 in this study.
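The three criteria can be computed directly (an illustrative sketch with our own function names):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def rmse(actual, forecast):
    """Root mean square error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mae(actual, forecast):
    """Mean absolute error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))

a = np.array([100.0, 200.0])
f = np.array([110.0, 190.0])
scores = (mape(a, f), rmse(a, f), mae(a, f))  # -> (7.5, 10.0, 10.0)
```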

Simulation and Individual Models' Performance.
The electricity price data of SA are preprocessed and divided into two sets: one set contains the electricity price data from 1 March 2009 to 20 March 2009 and is used to build the forecasting models, and the remaining data form the out-of-sample set. The detailed model-building process is shown in Figure 6, and three individual models are chosen for the simulation. The forecasting result is shown in Figure 7, which suggests that each forecasting model is accurate only locally. Table 2 provides a more detailed description of Figure 7, listing the forecasting results of the three individual models. Three criteria are chosen to evaluate the performance of the three individual forecasts, and the results are shown in Table 3. The data in Table 3 indicate that the KF model is the best of the three forecasting models because it produces the minimum values of the error indicators MAPE, RMSE, and MAE. The ARIMA model performs undesirably in comparison with the other two models.

Simulations and the Evaluation of the Combination Models' Performance.
The proposed combination method combines the three individual forecasts. For comparison, the traditional combination method is also adopted to combine the individual forecasts. Finally, the forecasting results are compared in this section. The simulation process can be seen in Figure 8.

Step 1. Combination simulation: the CPSO weight-determined combination method combines the three individual forecasts. For comparison, the three individual models are also combined using the traditional combination method.
Step 2. Analysis of the results: the performance of the combination models based on the two combination methods is shown in Tables 4 and 5, respectively. Table 4 shows the forecasting results of the CPSO weight-determined combination models in the forecast period from 00:30 to 12:00 on 21 March, and Table 5 shows the forecasting results of the traditional models in the same forecasting period, where the forecasting performance at every step can be observed easily. Comparing the forecasting results shown in Tables 4 and 5 reveals that, as the forecasting points move, the proposed CPSO weight-determined combination models always perform better than the traditional combination models. For the CPSO weight-determined combination models, the A-B, B-K, A-K, and A-B-K models have a minimum MAPE value of 0.00% at time points 04:30, 05:00, 08:30, and 05:30, respectively. However, for the traditional combination method based A-B, A-K, B-K, and A-B-K models, the minimum MAPE values were 6.33%, 3.37%, 3.73%, and 2.89%, respectively. This illustrates the outstanding performance of the CPSO weight-determined combination models. The average evaluation criteria listed in Table 6 suggest that the A-B model based on the traditional combination method, which produced MAPE, RMSE, and MAE values of 25.19%, 9.74, and 7.72, respectively, scores higher than the corresponding A-B model based on the CPSO weight-determined combination method. The same analyses were conducted for the B-K, A-K, and A-B-K models, and the results indicate that the combination models based on the CPSO weight-determined combination method perform better than those based on the traditional combination method overall, because the CPSO algorithm searches for the weights of the combination models by means of artificial intelligence. Table 6 also lists the comparison of the individual and combination models. On the whole, the combination models outperform the best combined individual models because the combination models generate smaller
values of MAPE, RMSE, and MAE. The MAPE values of the A-B-K model are 21.12% (traditional) and 20.79% (CPSO), the RMSE values are 8.19 (traditional) and 7.93 (CPSO), and the MAE values are 6.32 (traditional) and 6.12 (CPSO), which are all smaller than the evaluation criteria of the A-B, B-K, and A-K models. Therefore, the A-B-K models perform better than the A-B, B-K, and A-K models regardless of whether the models are combined by the traditional combination method or the CPSO weight-determined combination method. Table 7 shows the detailed weight assignment of the combination models and compares the two combination methods. The comparison results indicate that the CPSO weight-determined combination method assigns

weights to the individual models flexibly, without the constraint that the weights sum to 1 and without the nonnegativity restriction, which makes it easier to obtain accurate forecasting results. The results in Table 7 suggest that, when the combination models perform undesirably, the traditional combination method assigns the weight of the best combined model a value of 1 and the others a value of 0. However, the CPSO weight-determined combination method adjusts the assignment of the weights, free of the sum-to-1 and nonnegativity constraints, to find the best combination. The MAPE values indicate that the performance of the B-K model based on the traditional combination method is undesirable. Therefore, it is reasonable to improve the combination rule using the CPSO weight-determined combination method.

Comparison of the Forecasting Results of the Preprocessed Data and the Original Data.
It is well known that the preprocessing of anomalous data is very useful in building forecasting models. In this section, the forecasting results of the models built on the preprocessed data (PF) and the models built on the original data (ODF) are compared in Table 8. Table 8 shows that the PF based models have the best performance compared to the ODF based models for both the individual models and the combination models. For the ARIMA models, the PF based model has a MAPE value of 20.05%, which is lower than the MAPE value of 37.01% for the ODF based model. The BP and KF models based on the preprocessed data are likewise superior to the models based on the original data. Although the combination models based on the original data (CODF) perform well in comparison with the individual original-data models (IODF), the CODF models are inferior to the combination models based on the preprocessed data (CPF).

Conclusion
As electricity price forecasting becomes increasingly important, this paper proposes the use of the CPSO weight-determined combination method based on DBSCAN-preprocessed data to improve electricity price forecasting.
The traditional combination method has a nonnegative weight constraint, and the weights must sum to 1, which limits the range of the weights and thus lowers forecasting accuracy. In addition, forecasts based on the original data ignore the fact that many forecasting models are sensitive to outliers such as electricity price spikes, which leads to inaccurate forecasting. To overcome the limitations of traditional combination models based on the original data, the proposed CPSO weight-determined combination method allows the weights of the combined models to take values in [−1, 1], and the original data are preprocessed by the DBSCAN algorithm. The electricity price data of South Australia were simulated with the proposed preprocessed-data-based CPSO weight-determined combination models to forecast the electricity price for the period from 00:30 to 12:00 on 21 March 2009 in SA. The preprocessed-data-based CPSO weight-determined combination models were observed to have optimal performance in comparison with the traditional combination method and with the models built on the original data, in terms of three metrics: MAPE, RMSE, and MAE.

Figure 1: The topology of the BP neural network.
A-B represents the combination model of ARIMA and BP; B-K represents the combination model of BP and KF; A-K represents the combination model of ARIMA and KF; and A-B-K represents the combination model of ARIMA, BP, and KF. In the tables, separate markers distinguish the traditional combination method from the CPSO weight-determined combination method.

Figure 7: Comparison of the three individual forecasting results.

Figure 8: The results of the simulation process.

Table 2: The forecasting results of the three individual models.
Figure 4: A sketch of the DBSCAN algorithm.

Definition 4 (cluster). Clusters are the nonempty subsets of the data set D that are density-connected from core points.

Definition 5 (noise). Suppose that C_1, C_2, ..., C_k are the clusters of the data set D; the noise is the set of points in D that do not belong to any cluster, that is, {p ∈ D | p ∉ C_i, i = 1, 2, ..., k}.

Table 3: Comparison of the three individual forecasting models.

Table 4: Forecasting results of the CPSO weight-determined combination models.

Table 5: Forecasting results of the traditional combination models.

Table 6: Comparison of the performance of different forecasting models.

Table 7: Weights of the individual models in the combination models.

Table 8: Comparison of forecasting models built using preprocessed data and original data.