Support Vector Regression Based on Grid-Search Method for Short-Term Wind Power Forecasting

This paper investigates short-term wind power forecasting (STWPF). STWPF is a complex problem because it is affected by many factors, such as wind speed, wind direction, and humidity. This paper attempts to provide a reference strategy for STWPF and to address existing problems. The two main contributions are as follows. (1) In data preprocessing, each problem encountered in the real data, such as irrelevant values, outliers, missing values, and noise, is taken into account and given a corresponding treatment, and the input variable selection and order estimation are investigated with the Partial least squares technique. (2) STWPF is investigated with a multiscale support vector regression (SVR) technique, and the parameters associated with SVR are optimized by the Grid-search method. To assess the proposed strategy, the forecasting results of two models, multiscale SVR and a multilayer perceptron neural network, are compared. In addition, the error evaluation demonstrates that the multiscale SVR is a robust, precise, and effective approach.


Introduction
Compared with traditional thermal power [1], wind energy is a significant form of renewable energy, and it is attracting more and more attention because it is a renewable, inexhaustible, and free source. Therefore, wind energy forecasting has become a critical issue for the dispatch and scheduling of power systems [2]. Precise wind energy forecasting can help balance and integrate multiple volatile power sources at all levels of the transmission and distribution grid [3]. Moreover, accurate short-term wind power forecasting can reduce the problems caused by grid integration and energy trading [4].
In the short term, the wind distribution is essentially random, although over the long term it can be described by a continuous probability distribution, the Weibull distribution. It is hard to obtain the intrinsic regularity of wind speed in the short term; thus, soft computing can play a significant role in short-term wind power forecasting. The SVR technique has already been employed for short-term energy forecasting in the existing literature; however, many of these studies assume that the employed data are of good quality. In fact, it is impossible to acquire data without noise, for reasons that are sometimes beyond the control of human operators [5]. The generalization capability of traditional forecasting approaches is relatively low because they depend heavily on the quality of the samples. In this paper, we optimize the parameters for two different kernel functions, the radial basis function (RBF) and the polynomial function (PF), based on the Grid-search method for SVR such that the accuracy of the forecasting results is improved. The main algorithm flow is illustrated in Figure 1.
Figure 1 provides a brief illustration of the main processing steps of the proposed approach. The rest of this paper is organized as follows. The dataset and the problem formalization are described in Section 2. A brief review of wind energy forecasting is given in Section 3. Section 4 presents the details of the proposed approach, including the underlying theory. In Section 5, firstly, the missing values of the employed data are filled by data interpolation techniques. Secondly, two different filter methods (the median absolute deviation (MAD) filter and wavelet decomposition and denoising) are used to eliminate irrelevant, noisy, and outlier values. Thirdly, the input variable selection is investigated by the Partial least squares (PLS) technique. Fourthly, the data order estimation is implemented through cross-correlation methods. Fifthly, the multiscale SVR in combination with the Grid-search technique is applied to forecast the short-term wind power. Finally, the performance evaluation and error analysis are given. In Section 6, the results are summarized and prospective research questions are discussed.

Problem Description
2.1. Data Resources. The quality of data samples plays an important role in wind power forecasting because it has a direct impact on forecasting performance. Data quality is the fundamental issue for data analysis; in particular, data without noise do not exist in real applications. The main objective of data analysis is to discover knowledge that will be used to solve real problems and make decisions [5]. The wind tower used to collect the data measures at two different heights, 30 m and 60 m. The data were measured every 3 minutes, and a total of about five days of measurement data is selected. Specifically, the employed data contain a few irrelevant, corrupt, and noisy values, which must be removed or filled before further data analysis. In the real application, the data sampling equipment sometimes encounters a temporary fault and the associated data are recorded as zero; these poor-quality data are therefore eliminated. Typically, short-term wind power prediction covers a horizon of up to 4 days, and ultra-short-term prediction covers a horizon of up to 4 hours. This paper mainly discusses short-term wind power prediction by SVR techniques based on the Grid-search method.

Formalization.
In this paper, the short-term wind forecasting problem is formulated as a regression problem. The time series is denoted by

$$\{x_1(t_i), x_2(t_i), \ldots, x_{10}(t_i), y(t_i)\}, \quad i = 1, \ldots, N, \tag{1}$$

where the 10 variables $x_1, \ldots, x_{10}$ indicate the 30 m average wind speed, direction, temperature, humidity, and pressure and the 60 m average wind speed, direction, variance, real wind speed, and real wind direction, respectively; $y$ is the wind power; and $N$ is a positive integer greater than one. In general, (1) provides all the factors in the real application for short-term wind power forecasting. The primary task of this paper is to predict the output wind power $y(t_i)$, $i = 1, \ldots, N$, at time $t = t_i$ based on the wind measurements $x_j(t_i)$ at times $t_i, t_{i-1}, \ldots, t_{i-m}$, with $i > m$ and $i, m \in \mathbb{Z}^{+}$, that is, on past observations.
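The regression formulation above can be sketched in code. The following is a minimal illustration, assuming the measurements are stored as one list of input values per sampling instant; the function name `make_lagged_samples` and the lag count `m` are illustrative assumptions, not names taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    inputs: Sequence[Sequence[float]],
    power: Sequence[float],
    m: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair each power value with the inputs at the current and m past instants."""
    if m < 0 or len(inputs) != len(power):
        raise ValueError("need m >= 0 and equally long input and power series")
    X: List[List[float]] = []
    y: List[float] = []
    for i in range(m, len(power)):
        row: List[float] = []
        for lag in range(m + 1):  # instants i, i-1, ..., i-m
            row.extend(inputs[i - lag])
        X.append(row)
        y.append(power[i])
    return X, y


# Toy data (hypothetical values): two input variables per instant.
inputs = [[5.0, 0.1], [6.0, 0.2], [7.0, 0.3], [8.0, 0.4]]
power = [10.0, 12.0, 14.0, 16.0]
X, y = make_lagged_samples(inputs, power, m=1)
print(X[0], y[0])
```

Each row of `X` concatenates the current measurement vector with its `m` predecessors, so the first `m` instants produce no sample.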

Related Work
Short-term wind power forecasting has attracted more and more attention in recent decades. Alessandrini et al. [4] compared ECMWF EPS (the Ensemble Prediction System in use at the European Centre for Medium-Range Weather Forecasts) and COSMO-LEPS (the Limited-area Ensemble Prediction System developed within the Consortium for Small-scale Modelling) based on two forecasting models. In a survey of short-term prediction over the last 30 years, Costa et al. [6] investigated the performance of the two principal approaches (mathematical and physical). Kramer and Gieseke [2] applied SVR techniques to short-term wind energy forecasting based on real-world wind power data from the National Renewable Energy Laboratory (NREL) western wind resource dataset, and Osowski and Garanty [7] utilized SVR and the wavelet decomposition method to forecast short-term air pollution. Chen et al. [8] proposed new learning techniques based on the support vector machine (SVM) model for power load forecasting and reported experimental results for short-term load forecasting. Chang [9] proposed a hybrid method that combined the orthogonal least squares (OLS) algorithm and the genetic algorithm (GA) for short-term wind power forecasting based on the radial basis function (RBF) neural network. Che et al. [10] applied a two-voltage stage topology with a boost converter to improve the conversion efficiency of a commercial small wind grid inverter through their proposed control strategy. Based on the combination of the wavelet transform and neural network arithmetic, Song et al. [11] dealt with the energy management of a hybrid power generation system such that the stability of the power generation system was greatly improved. Moreover, a bibliographical survey of the general research and developments in the field of wind power forecasting was presented by Lei et al. [12]. Li et al. [13] presented ideal subspace approximation techniques based on a chaotic time series and nonlinear Kalman filtering, and wind speed prediction experiments were used to demonstrate the high chaotic prediction accuracy. Hoai et al. [14] optimized an empirical-statistical downscaling technique for prediction based on a feed-forward multilayer perceptron (MLP) neural network, and they gave numerical simulations to demonstrate the robustness of the proposed technique. Sánchez [15] gave a statistical forecasting system for short-term wind power prediction up to 48 hours ahead based on a combination of recursive methods and adaptive schemes. An in-depth review of current wind power generation methods and advances in wind power forecasting was given by Foley et al. [16], and Botterud et al. [17] discussed the current development of wind power forecasting in US ISO/RTO markets and the associated application of state-of-the-art forecasts. Based on numerical and statistical models, Stathopoulos et al. [18] proposed strategies for accurate local wind forecasts through the combination with statistical postprocessing.

Theory for Proposed Approach
Data analysis is the fundamental approach to knowledge investigation [19]. Proper data preprocessing can eliminate unreasonable trends in the data without losing the characteristics of the data. Therefore, efficient techniques for automatic data preprocessing are crucial [20]. In this section, the basic theory of data preprocessing and multiscale SVR is introduced.

Missing Value.
In data preprocessing, the issues that must be considered are irrelevant values, missing values, and noisy data. Data of poor quality will result in poor performance of the approaches finally employed. In this case, the data are not complete: there are missing values, and the affected records cannot simply be eliminated because time continuity is required and because they may contain useful information [5]. In this paper, the missing values are filled via data interpolation based on the intrinsic relationship between the data; for instance, a missing value of wind power can be filled using its associated influential factor, wind speed. Nearest-neighbor interpolation (also called proximal interpolation) is a typical method of multivariate interpolation in one or more dimensions, which approximates the value at a nongiven point from the corresponding point in its neighborhood. Under normal circumstances, the filled data are reasonable because data values do not change suddenly in the real application and because the correlation between neighboring data is taken into account in the nearest-neighbor interpolation method.
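As a minimal sketch of nearest-neighbor filling, the following assumes a single series in which missing samples are marked as `None` or NaN and are replaced by the nearest valid sample in time. The function name `fill_nearest` and the tie-breaking rule (prefer the earlier sample) are illustrative assumptions; filling wind power from the associated wind speed, as described above, would need a second series and is not shown.

```python
import math
from typing import List, Optional, Sequence


def _is_missing(v: Optional[float]) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def fill_nearest(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries with the nearest valid sample in time."""
    valid = [i for i, v in enumerate(values) if not _is_missing(v)]
    if not valid:
        raise ValueError("series contains no valid samples")
    filled: List[float] = []
    for i, v in enumerate(values):
        if _is_missing(v):
            # Nearest valid index; on a tie, the earlier sample is used.
            j = min(valid, key=lambda k: (abs(k - i), k))
            filled.append(float(values[j]))
        else:
            filled.append(float(v))
    return filled


series = [3.0, None, None, 6.0, float("nan"), 8.0]
filled = fill_nearest(series)
print(filled)
```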

MAD Filter.
The median absolute deviation (MAD) from the median $x^*$ of a data sequence $(x_1, x_2, \ldots, x_n)$ is a robust estimate that performs well in the presence of multiple outliers [21,22]. The Hampel filter (also known as the MAD filter), which is used to eliminate outliers, is denoted by

$$y_k = \begin{cases} x_k, & |x_k - x_k^*| \le t\,\Gamma D, \\ x_k^*, & \text{otherwise}, \end{cases} \tag{2}$$

where $\Gamma = 1.4826$ and $D = \operatorname{median}\{|x_k - x_k^*|\}$ is the median value of $|x_k - x_k^*|$. $t$ is the threshold that controls how far a sample may deviate before it is replaced, and it can be estimated based on the sample standard deviation of the distribution. The MAD filter replaces the outlier-sensitive mean and standard deviation estimates with the outlier-resistant median and MAD of the data [23].
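The filter can be sketched as a moving-window procedure. The version below is an illustration under stated assumptions: the window is centered and truncated at the ends of the series, and the default threshold `t = 3.0` is a common choice rather than a value reported in this paper. The constant 1.4826 is the scale factor given above.

```python
from statistics import median
from typing import List, Sequence


def hampel_filter(x: Sequence[float], half_window: int = 5, t: float = 3.0) -> List[float]:
    """Replace samples that deviate from the local median by more than t * 1.4826 * MAD."""
    out = list(x)
    n = len(x)
    for k in range(n):
        lo = max(0, k - half_window)
        hi = min(n, k + half_window + 1)
        window = list(x[lo:hi])
        local_median = median(window)
        mad = median([abs(v - local_median) for v in window])
        if abs(x[k] - local_median) > t * 1.4826 * mad:
            out[k] = local_median  # outlier replaced by the local median
    return out


# Toy series (hypothetical values) with one spike at index 4.
raw = [1.0, 1.1, 0.9, 1.0, 9.0, 1.05, 0.95, 1.0]
filtered = hampel_filter(raw, half_window=2, t=3.0)
print(filtered)
```

The windows are always taken from the unfiltered input, so a replaced sample does not influence the decision for its neighbors.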

Wavelet Decomposition and Denoising.
Unlike the MAD filter, wavelet decomposition and denoising analyzes the data locally in both the time domain and the frequency domain, and it can be used to decompose the original data into a high-frequency component (HFC) and a low-frequency component (LFC). Typically, the HFC carries detailed information such as abrupt changes, while the LFC usually represents the general or stationary characteristics of the employed data. A more detailed discussion of wavelet decomposition and denoising can be found in [24][25][26][27].
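Compared with the continuous wavelet transform, the discrete wavelet transform (DWT) is more commonly used in real applications and is defined by

$$W_f(j, k) = \int_{-\infty}^{\infty} f(t)\,\overline{\psi_{j,k}(t)}\,dt, \quad \forall j, k \in \mathbb{Z}, \tag{3}$$

where $\forall$ means "for any," $\psi(t)$ is a basic wavelet, and

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k).$$

The above DWT is commonly computed with the Mallat algorithm [28]. In this paper, the Vaidyanathan filter [29][30][31] is applied for the implementation of data decomposition and denoising.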
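To illustrate the decompose, threshold, and reconstruct idea, the sketch below uses a single-level Haar wavelet with soft thresholding of the high-frequency coefficients. This is a deliberately simplified stand-in: the paper applies the Vaidyanathan filter, whose coefficients are not reproduced here, and the threshold value is a hypothetical choice.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT: returns (low-frequency, high-frequency) coefficients."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def haar_idwt(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x: List[float] = []
    for a, d in zip(low, high):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(coeffs: Sequence[float], thr: float) -> List[float]:
    """Shrink each coefficient toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(x: Sequence[float], thr: float) -> List[float]:
    low, high = haar_dwt(x)
    return haar_idwt(low, soft_threshold(high, thr))


signal = [4.0, 4.2, 5.0, 4.9, 6.0, 6.3, 7.0, 6.8]
smooth = denoise(signal, thr=1.0)  # large threshold removes all detail
print(smooth)
```

With a threshold of zero the signal is reconstructed unchanged, which is a quick check that the transform pair is consistent.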

Input Variable Selection.
The Swedish statistician Herman Wold first introduced the Partial least squares (PLS) technique, which is used to find the fundamental relations between two variables ($X$ and $Y$); that is, a latent variable approach is employed to investigate the covariance structures between the variables. Partial correlation can be used to explore the association between pairs of random variables in the presence of other variables [32,33]; its coefficients are calculated between two variables while excluding the influence of the other variables. The PLS coefficients can then be derived via the following three steps.
(i) Hypothesis. Consider three variables, $x_1$, $x_2$, and $x_3$; the partial correlation coefficient $r_{12(3)}$ between $x_1$ and $x_2$ given $x_3$ is defined by

$$r_{12(3)} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}, \tag{4}$$

where $r_{ij}$ is the product-moment correlation coefficient between the variables with subscripts $i$ and $j$. The range of the PLS coefficient values is $(-1, 1)$; in particular, 0 indicates no association between $x_1$ and $x_2$.
(ii) Statistic. A test associated with the full correlation coefficient is used to test the original hypothesis in (4) under the assumption that the data have an approximately normal distribution. If $r_{12(3)}$ is the obtained partial correlation coefficient, then the appropriate $t$ statistic is denoted as

$$t_{n-k-2} = r_{12(3)} \sqrt{\frac{n - k - 2}{1 - r_{12(3)}^2}}, \tag{5}$$

where $t_{n-k-2}$ has an approximate Student's $t$-distribution with $n - k - 2$ degrees of freedom, $n$ is the number of observations from which the full correlation coefficients are computed, and $k$ is the number of variables held fixed.
(iii) Probability Calculation. Calculate the observed value of the $t$ statistic and its corresponding probability $p$. If the probability $p$ is less than the given significance level $\alpha$, then the original hypothesis should not be accepted; otherwise, it is retained.
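The three steps above can be sketched as follows. This is an illustration using the standard first-order partial correlation formula and its usual $t$ statistic, with one controlled variable by default; the function names are assumptions made for the example. The probability lookup of step (iii) is omitted because it requires the Student's $t$ distribution function, which is not part of the Python standard library.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Product-moment correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x1: Sequence[float], x2: Sequence[float], x3: Sequence[float]) -> float:
    """Correlation between x1 and x2 with the influence of x3 removed."""
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def t_statistic(r: float, n: int, k: int = 1) -> float:
    """t statistic with n - k - 2 degrees of freedom for a partial correlation r."""
    dof = n - k - 2
    if dof <= 0:
        raise ValueError("not enough observations")
    return r * math.sqrt(dof / (1 - r ** 2))


# Toy data (hypothetical): x3 is uncorrelated with both x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 2.0, 4.0]
x3 = [1.0, -1.0, -1.0, 1.0]
r = partial_corr(x1, x2, x3)
t = t_statistic(r, n=len(x1))
print(r, t)
```

Because `x3` is uncorrelated with the other two series in this toy case, the partial correlation reduces to the ordinary correlation between `x1` and `x2`.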

Order Estimation.
Data order reflects the intrinsic relationship between past data and current data, and it is derived through the autocorrelation function (ACF) for one data sequence and the cross-correlation function (CCF) for two different data sequences. Autocorrelation (sometimes also called "lagged correlation" or "serial correlation") refers to the correlation of a time series with its own past and future values, that is, the correlation between members of a series of numbers arranged in time. The CCF is a measure of the similarity of two given data sequences as a function of the time lag, and it is commonly used to search a long sequence for a shorter feature of suitable length. In the discrete domain, the ACF and CCF for two real time series $x_n$, $y_n$, $n = 0, 1, \ldots, N - 1$, at lag $m = 0, 1, \ldots, N - 1$ are defined by

$$R_{xx}(m) = \sum_{n=0}^{N-m-1} x_{n+m}\,x_n, \tag{6}$$

$$R_{xy}(m) = \sum_{n=0}^{N-m-1} x_{n+m}\,y_n. \tag{7}$$

In MATLAB, the ACF and CCF defined by (6) and (7) are computed with the function "xcorr," which evaluates them in the frequency domain.

4.6. Support Vector Regression.
SVM for regression was proposed in 1996 by Drucker et al. [33]; this method is called support vector regression (SVR). Its basic idea follows support vector classification: the cost function does not take into account the training points that lie beyond the margin, and thus the SVR model depends only on a subset of the training data. Analogously, the least squares support vector machine (LS-SVM), another SVM variant, was presented by Suykens and Vandewalle [34]. Vapnik-Chervonenkis theory and structural risk minimization (SRM) are the theoretical foundation of the SVM [35,36]. The SVM investigates the intrinsic relationship between the prediction model for the wind time series and its learning capability in order to derive the best generalization capability, given the sample dataset $X \times Y = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots\}_{i=1}^{N} \in \mathbb{R}^n \times \mathbb{R}$.
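The basic task of regression is to establish the nonlinear function $f : \mathbb{R}^n \to \mathbb{R}$ such that $y = f(x)$; the estimation function and loss function are, respectively, defined as

$$f(x) = w^{T}\varphi(x) + b, \tag{8}$$

$$R(w, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*), \tag{9}$$

where $C$ is the penalty factor that balances the empirical risk against the confidence range, $\xi_i^*, \xi_i$ are relaxation factors that measure the deviation of a sample beyond the tolerance $\varepsilon$, and $L_\varepsilon$ is the loss function applied to estimate the prediction accuracy of (8), given as

$$L_\varepsilon(y) = \begin{cases} 0, & |f(x) - y| < \varepsilon, \\ |f(x) - y| - \varepsilon, & \text{otherwise}. \end{cases}$$

Moreover, $L_\varepsilon(y)$ is equal to zero if $|f(x) - y| < \varepsilon$. Essentially, (8) and (9) can be solved through the dual problem

$$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*),$$

subject to $\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$, where a larger value of $C$ fits the training data more closely and thereby trades the complexity of the model against its approximation error; $\varepsilon$ controls the generalization capability and the regression approximation error; $\alpha_i, \alpha_i^*$ are the Lagrange multipliers, and the samples for which they are nonzero determine the regression prediction; and $K(x_i, x_j)$ is the kernel function, which replaces the inner product and can be chosen as the radial basis function or the polynomial function. Furthermore, the regression function is $y_t = \sum_{i=1}^{N-m} (\alpha_i - \alpha_i^*) K(x_i, x_t) + b$, $t = m + 1, \ldots, N$, and the $l$th step prediction can be denoted by

$$\hat{y}_{N+l} = \sum_{i=1}^{N-m} (\alpha_i - \alpha_i^*) K(x_i, x_{N-m+l}) + b.$$

The forecasting errors are evaluated by

$$\mathrm{MAE} = \frac{1}{N_t} \sum_{t=1}^{N_t} |\hat{y}_t - y_t|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N_t} \sum_{t=1}^{N_t} (\hat{y}_t - y_t)^2}, \quad \mathrm{RMAE} = \frac{1}{N_t} \sum_{t=1}^{N_t} \left| \frac{\hat{y}_t - y_t}{y_t} \right|,$$

where MAE is the average absolute error between the forecast power $\hat{y}_t$ and the real power $y_t$ over the time $t$; $N_t$ is the number of test samples; and RMSE and RMAE are the root mean square error and the relative average absolute error, respectively.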
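The quantities named in this subsection can be sketched in a few functions. The kernel forms below follow the conventions used by libsvm (RBF as $\exp(-g\|u - v\|^2)$ and polynomial as $(g\,u^{T}v + c_0)^d$); RMAE is assumed here to be the mean absolute error relative to the actual value. Both choices, and all function names, are assumptions made for illustration.

```python
import math
from typing import Sequence


def eps_insensitive(y_true: float, y_pred: float, eps: float) -> float:
    """Loss is zero inside the eps-tube and grows linearly outside it."""
    return max(abs(y_pred - y_true) - eps, 0.0)


def rbf_kernel(u: Sequence[float], v: Sequence[float], g: float) -> float:
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def poly_kernel(u: Sequence[float], v: Sequence[float],
                g: float = 1.0, coef0: float = 0.0, degree: int = 3) -> float:
    return (g * sum(a * b for a, b in zip(u, v)) + coef0) ** degree


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rmae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    # Assumed definition: absolute error relative to the actual value (actual != 0).
    return sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)


# Toy values (hypothetical), not results from the paper.
actual = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 40.0]
print(mae(actual, forecast), rmse(actual, forecast), rmae(actual, forecast))
```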

Experimental Analysis
In this section, numerical simulations are carried out for each part of Section 4 based on the wind resource dataset presented in Section 2.1. AWS and power based on the discussion in Section 5.2. Because 10 samples are collected in half an hour according to the sampling frequency, the half-window size of the MAD filter is set to 5, and the Vaidyanathan wavelet filter is employed for the data preprocessing. Without loss of generality, other wavelets such as the Daubechies wavelet could also be used. As shown in Figures 2-4, the irrelevant values, outliers, and noisy values in the employed data are eliminated or filled by the MAD filter and the wavelet transform, respectively. The data now form a relatively stationary and smooth time series, which is a necessary step for further analysis.

Input Variables Selection Results.
This section provides the simulation results for the theory of Section 4.4. The quality and quantity of samples have a significant impact on forecasting accuracy: more data increase the difficulty of real operation, while less data may not contain enough information for analysis. Following the discussion of Section 5.1, the PLS coefficients of the eleven variables are computed by (4) and (5) and are given in Tables 1 and 2.
Based on the discussion of Section 4.4, all of the "significance values" are about $0.6936 \times 10^{-308}$, which is far less than the set value $\alpha = 0.01$; in other words, the original hypothesis should not be accepted because there is significant correlation between the variables. Moreover, the lower and upper bounds at a confidence level of 95% are provided in Table 2, where UB represents the upper bound and LB represents the lower bound. From Tables 1 and 2, we learn two facts: first, all the values in Table 1 are accepted because of the corresponding test values presented in Table 2; second, one group of variables is enough for wind power forecasting because of the similarity between the different variables.

Data Order Estimation and Normalization.
Based on the discussion of Section 5.2, the 30 m variables except for the pressure are selected as the input variables because the pressure shows no significant change. The data order $m$ is estimated through (6) and (7), and the results are shown in Figures 5 and 6, respectively. From Figures 5 and 6, we learn that the two most recent past values have the most significant impact on the current value because the ACF value is above about 0.9 and the CCF value is above 0.7. Thus, the data order can be set to 2 on the basis of the ACF and CCF.
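The order selection described here can be sketched as follows. The example computes a correlation normalized so that the zero-lag autocorrelation equals 1 (without mean removal) and takes as the order the largest lag whose value stays above a chosen threshold. The function names, the toy series, and the threshold of 0.75 are assumptions for illustration; the thresholds suggested by the text would be about 0.9 for the ACF and 0.7 for the CCF.

```python
import math
from typing import Sequence


def xcorr_coeff(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Normalized cross-correlation of x and y at a nonnegative lag."""
    n = len(x)
    if len(y) != n or not 0 <= lag < n:
        raise ValueError("series must have equal length and 0 <= lag < length")
    num = sum(x[i + lag] * y[i] for i in range(n - lag))
    den = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    return num / den


def estimate_order(x: Sequence[float], threshold: float, max_lag: int) -> int:
    """Largest consecutive lag whose autocorrelation is at least the threshold."""
    order = 0
    for lag in range(1, max_lag + 1):
        if xcorr_coeff(x, x, lag) >= threshold:
            order = lag
        else:
            break
    return order


# Toy series (hypothetical values).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
order = estimate_order(series, threshold=0.75, max_lag=5)
print(order)
```

Passing two different series to `xcorr_coeff` gives the corresponding cross-correlation value at the same lag.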

Multiscale SVR Performance.
The following simulation is based on libsvm [37,38] (2013, version 3.17; platform: MATLAB 2012b, Microsoft Visual C++ 2010), and the kernel functions are the radial basis function (RBF) and the polynomial function (PF). Without loss of generality, 80% and 20% of the employed data (2297 samples) are selected as the training and testing samples, respectively. Because the changes in wind direction are not obvious, both the sine and the cosine of the average wind direction (AWD) are selected as input variables. Therefore, the number of input variables is essentially 10 because the data order is set to 2. To allow an accuracy comparison between the different variables, all variable values are normalized to the range [1, 2]. Moreover, k-fold cross-validation (K-CV) [39][40][41][42] and "Grid search" are utilized for parameter selection. "Grid search" tries every candidate value of the parameter pair $(C, g)$ on a predefined grid, and the pair with the best K-CV accuracy is selected, where $C$ and $g$ are the penalty factor and the kernel function parameter, respectively. The trajectories of the original data and the forecast data are given in Figures 7-10. Moreover, a detailed comparison of the simulation results of the multiscale SVR and the MLP neural network is given in Tables 3 and 4.
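The interplay of grid search and K-CV can be sketched without any external library. In the illustration below the model is an RBF kernel ridge regressor, a simplified stand-in that is not SVR and not libsvm but has the same two knobs, a penalty-like factor `C` and a kernel parameter `g`. The interleaved fold assignment, the candidate grids, and the toy data are likewise assumptions made for the example.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf(u: Vector, v: Vector, g: float) -> float:
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def fit_predict(X_tr, y_tr, X_te, C: float, g: float) -> List[float]:
    """Stand-in model: kernel ridge regression with ridge term 1/C (not SVR)."""
    n = len(X_tr)
    K = [[rbf(X_tr[i], X_tr[j], g) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(y_tr))
    return [sum(a * rbf(x, xi, g) for a, xi in zip(alpha, X_tr)) for x in X_te]


def cv_mse(X, y, k: int, C: float, g: float) -> float:
    """Mean squared error over k interleaved cross-validation folds."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        test = list(range(fold, n, k))
        train = [i for i in range(n) if i % k != fold]
        pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in test], C, g)
        total += sum((p - y[i]) ** 2 for p, i in zip(pred, test))
    return total / n


def grid_search(X, y, grid_C, grid_g, k: int) -> Tuple[float, float]:
    """Return the (C, g) pair with the lowest cross-validated error."""
    return min(itertools.product(grid_C, grid_g),
               key=lambda p: cv_mse(X, y, k, p[0], p[1]))


# Toy data (hypothetical): a smooth curve sampled at 20 points.
X = [[i / 10.0] for i in range(20)]
y = [math.sin(row[0]) for row in X]
grid_C = [2.0 ** e for e in (-2, 0, 2, 4)]
grid_g = [2.0 ** e for e in (-2, 0, 2)]
best_C, best_g = grid_search(X, y, grid_C, grid_g, k=5)
print(best_C, best_g)
```

Grids that are spaced exponentially, as here, are a common way to cover several orders of magnitude with few candidates; the cost grows with the product of the grid sizes and the number of folds.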
Based on Tables 3 and 4, the performance of SVR is better than that of the traditional MLP neural network. Because SVR performed best with the MAD filter, the data for the MLP neural network are also filtered by MAD. In fact, SVR proves to be a robust time series forecasting method even under different settings, such as a different number of cross-validation folds, a different kernel function, and different associated parameters. The simulation results indicate that SVR is an effective approach for STWPF.

Conclusions
In this paper, the multiscale SVR technique is applied to short-term wind power forecasting. Firstly, we give a brief illustration of the main processing steps and the corresponding theory. Secondly, the data interpolation technique is used to fill the missing values of the employed data. Thirdly, the median absolute deviation filter and the wavelet decomposition and denoising technique are applied to eliminate irrelevant, noisy, and outlier values. Fourthly, the Partial least squares technique and the autocorrelation and cross-correlation functions are employed for the input variable selection and the order estimation, respectively. Fifthly, the multiscale SVR in combination with the Grid-search technique is utilized to forecast the short-term wind power. Finally, performance evaluation and error analysis are applied to assess the multiscale SVR. Compared with the multilayer perceptron (MLP) neural network, the results demonstrate that the SVR technique is a fast and robust time series forecasting approach. We believe that the proposed strategy has reference value for short-term wind power forecasting and for other energy consumption problems on the demand side.

Figure 3: State trajectories of the 3rd and 4th variables.

Table 1: PLS coefficients of the eleven variables.

Table 2: Lower and upper bounds related to Table 1.

Table 3: Simulation results of the multiscale SVR.
O: original data; M: MAD filter; W: wavelet filter; RBF(3): RBF kernel with 3 cross-validation folds, and similarly for RBF(6) and RBF(10); PF(3): PF kernel with 3 cross-validation folds, and similarly for PF(6) and PF(10); Hn: number of hidden layers; Et: elapsed time in seconds; RMSE: root mean square error for the testing sample; MAE: MAE for the testing sample; RMAE: RMAE for the testing sample; NA: not available.