Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge

Prior knowledge, such as the wind speed probability distribution estimated from historical data and the wind speed fluctuation between the maximal value and the minimal value in a certain period of time, provides additional information about the wind speed, so it is necessary to incorporate it into the wind speed prediction. First, a method of estimating the wind speed probability distribution from historical data is proposed based on Bernoulli’s law of large numbers. Second, in order to describe the wind speed fluctuation between the maximal value and the minimal value in a certain period of time, the probability distribution estimated by the proposed method is incorporated into the training data and the testing data. Third, a support vector regression model for wind speed prediction is proposed based on standard support vector regression. Finally, experiments predicting the wind speed in a certain wind farm show that the proposed method is feasible and effective and that the model’s running time and prediction errors can meet the needs of wind speed prediction.


Introduction
Wind power is a clean, renewable energy source that will play an increasingly important role in the future electricity supply [1]. Unfortunately, due to the stochastic and nonstationary nature of wind, wind power is variable and uncontrollable. It is therefore difficult to maintain the balance between the supply of and the demand for electricity that the electricity system requires [2]. Wind speed prediction is a key point in the management of wind farms because wind speed is directly related to the power produced by each of the farm's turbines. It is usually the basis of wind power forecasts, so increasing the accuracy of wind speed prediction is necessary for the effective use of wind energy.
At present, there are mainly two kinds of wind speed prediction methods. One is based on physical models, and the other is based on historical data. Prediction methods based on physical models often use numerical weather prediction (NWP) data [3, 4]. Methods based on NWP do not focus on the wind speed at a farm's turbines but on the wind speed of a region. Thus, they need to solve the problem of how the wind speed of a region is mapped to the wind speed at a certain wind generator. Methods based on historical data predict the wind speed by using correlations among the initial data. In 2008, Louka et al. [5] improved wind speed forecasts for wind power prediction using Kalman filtering. In 2012, Cao et al. [6] presented a comparative analysis of the wind speed prediction accuracy of univariate and multivariate ARIMA models with their recurrent neural network counterparts. In 2013, Woods et al. [7] developed a method to produce synthetic time series of wind power at several locations based on a measured time series of wind speed from a reference site.
In the 1990s, Vapnik et al. [8, 9] proposed support vector machines (SVMs), including support vector classification (SVC) and support vector regression (SVR). SVMs address statistical learning problems for small samples by solving a convex quadratic optimization problem, and they thereby avoid the local minima that cannot be avoided by neural network algorithms. SVMs use a kernel function to map the data in the original space to a high dimensional feature space and then solve the nonlinear decision problem in that space. Thus, SVMs can successfully deal with the curse of dimensionality and have good generalization ability. However, the standard SVMs focus on historical data and cannot incorporate prior knowledge into the learning process, which may cause their generalization ability to decrease. Therefore, in 2009, Guan et al. [10] proposed a modified method that incorporated prior knowledge into cancer classification based on gene expression data to improve accuracy. In 2011, Zhang et al. [11] proposed a fully Bayesian methodology for generalized kernel mixed models, which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. In 2012, Liu and Xue [12] focused on designing a new class of kernels to incorporate prior information into the training process of support vector regression. Currently, SVMs have received extensive attention and are being studied by more and more scholars from different points of view [13-22].
In 2011, Zhou et al. [23] presented a systematic study on fine tuning of LS-SVM model parameters for one-step ahead wind speed prediction, and Ortiz-García et al. [24] proposed an improvement to an existing wind speed prediction system using banks of regression support vector machines for a final regression step in the prediction system.
However, for the problem of wind speed prediction in practice, there is much prior knowledge. For example, the wind speed has a certain probability distribution in a season or in a day, and the probability distribution can be estimated from historical wind speed data. As the probability distribution provides additional information about the wind speed, it is necessary to incorporate it into the wind speed prediction. Also, in a wind farm, the output wind speed $V$ at a fixed time $t$ is the mean value $\bar{V}$ of many measured values $V_i$ ($i = 0, 1, \ldots, n$) during a certain period of time $\Delta t$. Let $V_{\max} = \max_i \{V_i\}$ and $V_{\min} = \min_i \{V_i\}$; then the larger $V_{\max} - V_{\min}$ is, the greater the fluctuation of the wind speed during the period $\Delta t$. Conversely, the smaller $V_{\max} - V_{\min}$ is, the smaller the fluctuation of the wind speed during the period $\Delta t$. Nevertheless, the mean value $\bar{V}$ does not carry this prior knowledge at all. Therefore, in order to decrease the wind speed prediction errors, it is necessary to find a way to incorporate this prior knowledge into the wind speed prediction. However, existing methods for wind speed prediction often use the historical wind speed data directly, instead of extracting further information from the data, and the prediction errors are difficult to decrease. Therefore, in this paper, the probability distribution of historical wind speed data is estimated and incorporated into the training data and testing data to provide information about the wind speed fluctuation. Then a support vector regression model for wind speed prediction is proposed based on the standard SVR.
This paper is structured as follows. Section 2 gives the preliminaries. Section 3 presents the method of estimating the probability distribution of the historical wind speed data. Section 4 incorporates the prior knowledge about the wind speed fluctuation into the training data and testing data and then establishes the $\varepsilon$-support vector regression method for wind speed prediction incorporating probability prior knowledge (PPK-$\varepsilon$-SVR). Section 5 includes two experiments with historical wind speed data from a wind farm in Gansu province, and Section 6 draws the conclusions.

Preliminaries
In this section, we briefly review some relevant knowledge of probability theory and the standard support vector regression often used in applications.
Let $(\Omega, \mathcal{F})$ be a measurable space, $P$ a function defined on $\mathcal{F}$, and $0 \le P(A) \le 1$ for any $A \in \mathcal{F}$. We call $P(A)$ the probability of event $A$ occurring; $P(A)$ indicates the possibility of event $A$ occurring. Assume that $X(\omega)$ is a random variable on $\Omega$, $F(x) = P\{X(\omega) < x\}$ ($x \in \mathbb{R}$) is the distribution function, $E(X)$ is the expectation, and $D(X)$ is the variance of the random variable $X$, respectively.
Definition 1 (see [25]). Let $F(x)$ be the distribution function of the random variable $X(\omega)$. If there exists a nonnegative and integrable function $p(x)$ satisfying
$$F(x) = \int_{-\infty}^{x} p(s)\,ds,$$
then we call $p(x)$ the probability density of the continuous random variable $X(\omega)$.
Definition 3 (see [25]). Let the sequence of random variables $\{X_n\}$ be independent. $\{X_n\}$ is said to be independent and identically distributed if they have the same distribution function $F(x)$.
Definition 4 (see [25]). Let $\{X_n\}$ be a sequence of random variables and let $a$ be a constant. If, for any $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P\{|X_n - a| < \varepsilon\} = 1,$$
then the sequence $\{X_n\}$ is said to converge in probability to $a$.
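Theorem 5 (Bernoulli's law of large numbers [25]). Let $n_A$ be the number of times event $A$ occurs in $n$ independent repeated experiments, and let $p$ be the probability of event $A$ occurring. Then, for any $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P\left\{\left|\frac{n_A}{n} - p\right| < \varepsilon\right\} = 1,$$
that is,
$$\lim_{n \to \infty} P\left\{\left|\frac{n_A}{n} - p\right| \ge \varepsilon\right\} = 0.$$
Remark 6. Bernoulli's law of large numbers proves the stability of frequency in theory. In other words, the frequency $n_A/n$ converges in probability to $p$. Then, for a sufficiently large $n$, the frequency $n_A/n$ is close to the probability $p$ with high probability. Hence, if the number of experiments is very large, the frequency $n_A/n$ can be treated as the probability $p$ in practice.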
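The frequency stability described in Remark 6 can be observed numerically. The following is a minimal simulation sketch; the event probability 0.3, the trial counts, and the function name `relative_frequency` are illustrative assumptions, not values from the paper.

```python
import random

def relative_frequency(p, n, seed=0):
    """Frequency n_A / n of an event with probability p in n independent trials."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

# Assumed event probability, chosen only for illustration.
p = 0.3
for n in (100, 10_000, 100_000):
    freq = relative_frequency(p, n)
    print(f"n = {n:>7}: frequency = {freq:.4f}, |frequency - p| = {abs(freq - p):.4f}")
```

As the number of trials grows, the printed gap between the frequency and the probability tends to shrink, which is the behaviour the theorem guarantees in probability.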

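Regression Problem. Suppose that the training data are
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}, \quad x_i \in \mathbb{R}^n,\ y_i \in \mathbb{R},\ i = 1, 2, \ldots, l. \tag{6}$$
The standard $\varepsilon$-SVR is to solve the optimization problem
$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$$
$$\text{s.t.} \quad f(x_i) - y_i \le \varepsilon + \xi_i, \quad y_i - f(x_i) \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, \ldots, l,$$
where $f(x_i) = w^{T}\Phi(x_i) + b$, data (6) are mapped to a higher dimensional feature space by the function $\Phi(x)$, and the constant $C > 0$ determines the tradeoff between the flatness of $f(x)$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. Similar to support vector classification, $w$ may be a huge vector variable, and then we solve the dual problem
$$\min_{\alpha, \alpha^*} \ \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j) + \varepsilon \sum_{i=1}^{l} (\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i (\alpha_i^* - \alpha_i)$$
$$\text{s.t.} \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, 2, \ldots, l,$$
where $K(x_i, x_j) = \Phi(x_i)^{T}\Phi(x_j)$ is the kernel function. Only the training points with nonzero $\alpha_i^* - \alpha_i$ (the support vectors) enter the resulting regression function. As a result, SVR has a sparse representation of solutions and hence is relatively fast in training and testing. SVR is the most common application form of SVM and has been popular for regression and function estimation problems in the past decades.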
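To make the tradeoff controlled by the constant $C$ concrete, the sketch below evaluates the standard $\varepsilon$-SVR primal objective for a given model. It assumes a linear feature map (so $f(x) = w \cdot x + b$) and uses the fact that, for fixed $w$ and $b$, the smallest feasible slack of a point is $\max(0, |f(x_i) - y_i| - \varepsilon)$. The function names and the toy data are illustrative assumptions.

```python
def eps_insensitive(residual, eps):
    """Epsilon-insensitive loss: zero inside the tube |residual| <= eps."""
    return max(0.0, abs(residual) - eps)

def primal_objective(w, b, xs, ys, C, eps):
    """0.5 * ||w||^2 + C * (sum of minimal slacks) for f(x) = w . x + b."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    slack = 0.0
    for x, y in zip(xs, ys):
        f = sum(wj * xj for wj, xj in zip(w, x)) + b
        slack += eps_insensitive(f - y, eps)
    return flatness + C * slack

# Toy data (assumed): the third point lies 0.5 away from the line y = x.
xs = [(0.0,), (1.0,), (2.0,)]
ys = [0.0, 1.0, 2.5]
print(primal_objective(w=(1.0,), b=0.0, xs=xs, ys=ys, C=10.0, eps=0.1))
```

Only the third point leaves the tube of half-width 0.1, so it alone contributes slack; a larger $C$ penalizes that deviation more heavily relative to the flatness term.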

Method of Estimating the Probability Distribution of Historical Wind Speed Data
In practical problems, there are a lot of historical wind speed data. In order to extract more information from them, in this part we give a method to estimate the probability distribution of historical wind speed data.
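3.1. Method. Assume that $p(x)$ in Figure 1 is the probability density of independent and identically distributed random variables $X_i$ ($i = 1, 2, \ldots, n$) and that the number and the frequency of $X_i$ falling into the interval $[x_{j-1}, x_j)$ ($j = 1, 2, \ldots, m$, $m \le n$) are $n_j$ and $n_j/n$, respectively. By Definition 1, the probability $p_j$ (namely, the area of the trapezoid with curved edge over $[x_{j-1}, x_j)$) of $X_i$ falling into the interval $[x_{j-1}, x_j)$ is
$$p_j = P\{x_{j-1} \le X_i < x_j\} = \int_{x_{j-1}}^{x_j} p(x)\,dx. \tag{10}$$
By Theorem 5, we have
$$\lim_{n \to \infty} P\left\{\left|\frac{n_j}{n} - p_j\right| < \varepsilon\right\} = 1, \tag{11}$$
so that $p_j \approx n_j/n$ for sufficiently large $n$. With the infinitesimal method, the area of the trapezoid with curved edge can be approximately substituted by the area of the ordinary trapezoid, namely,
$$p_j \approx \frac{1}{2}\bigl(p(x_{j-1}) + p(x_j)\bigr)(x_j - x_{j-1}). \tag{12}$$
By formulas (10), (11), and (12) we have
$$p(x_{j-1}) + p(x_j) \approx \frac{2 n_j}{n (x_j - x_{j-1})}. \tag{13}$$
Similarly, for $p(x_{j-1})$, $p(x_j)$, and $p(x_{j+1})$, we have
$$p(x_j) + p(x_{j+1}) \approx \frac{2 n_{j+1}}{n (x_{j+1} - x_j)}. \tag{14}$$
From formulas (13) and (14) we can obtain an estimate of the density at the node $x_j$.

3.2. Algorithm. Suppose that $V(t)$ is the wind speed at a fixed point. Then $V(t)$ can be seen as a random variable defined on the time set $T$. The $n$ sample data $V(t_i)$ ($i = 1, 2, \ldots, n$) can be seen as a sequence of sample data coming from $n$ independent random variables $X_i$ ($i = 1, 2, \ldots, n$). Supposing that the probability density function of the wind speed $V(t)$ is $p(x)$, the steps of estimating the probability density $p(x)$ are as follows.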
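As a rough illustration of the idea in Section 3.1 (a sketch under stated assumptions, not the authors' algorithm): the frequency $n_j/n$ of a small interval approximates the area under the density over that interval, so $n_j/(nh)$ approximates the average density on an interval of width $h$. The node values below are obtained by averaging the two neighbouring intervals, which is an assumption of this sketch; the function name `estimate_density` and the test distribution are also illustrative.

```python
import random

def estimate_density(samples, a, b, m):
    """Estimate a density on [a, b] from samples using m equal intervals.

    Returns (nodes, bin_density, node_density), where
    bin_density[j] = n_j / (n * h) is the frequency of interval j divided by
    its width, and node_density averages the two neighbouring intervals
    (an assumption of this sketch).
    """
    n = len(samples)
    h = (b - a) / m
    counts = [0] * m
    for x in samples:
        if a <= x <= b:
            j = min(int((x - a) / h), m - 1)  # x == b goes to the last interval
            counts[j] += 1
    bin_density = [c / (n * h) for c in counts]
    node_density = [bin_density[0]]
    for j in range(1, m):
        node_density.append(0.5 * (bin_density[j - 1] + bin_density[j]))
    node_density.append(bin_density[-1])
    nodes = [a + j * h for j in range(m + 1)]
    return nodes, bin_density, node_density

# Usage with standard normal samples (assumed test distribution).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
nodes, bin_density, node_density = estimate_density(samples, -4.0, 4.0, 40)
print(f"estimated density near x = 0: {node_density[20]:.3f}")
```

With more samples the interval frequencies stabilise, in line with Theorem 5, and the estimate becomes smoother; with fewer, narrower intervals trade bias for noise.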

Experiments.
In order to verify the effectiveness of the above method, in this subsection, sample data drawn from a normal distribution and from an exponential distribution are used in two experiments.
Experiment 1. Suppose that $X$ is a standard normal random variable; then its probability density is
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$
and the graph of $p(x)$ is shown in Figure 2. Suppose that the sample data are independent and identically distributed with probability density $p(x)$. With the proposed method above, we can obtain the function $\hat{p}(x)$, which is the estimate of the standard normal density $p(x)$ (see Figure 2).
Experiment 2. Supposing that $X$ is a random variable following an exponential distribution with parameter $\lambda > 0$, the probability density is
$$p(x) = \lambda e^{-\lambda x} \quad (x \ge 0), \qquad p(x) = 0 \quad (x < 0),$$
and the graph of $p(x)$ is shown in Figure 3. With steps similar to those in the above experiment, suppose that the sample data $X_1, X_2, \ldots, X_{10000}$ are independent and identically distributed with probability density $p(x)$ and $X_i \in [0, 10]$ ($i = 1, 2, \ldots, 10000$). Insert 99 points in the interval $[0, 10]$ equidistantly, so that the interval $[0, 10]$ is divided into $m = 100$ small intervals of the same length. With the proposed method above, we can obtain the function $\hat{p}(x)$, which is the estimate of the exponential density $p(x)$ (see Figure 3).
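A setup like Experiment 2 can be imitated in a few lines. The sketch below draws 10000 exponential samples restricted to $[0, 10]$, splits the interval into 100 equal parts, and compares the interval-wise frequency estimate with the true density. The rate $\lambda = 1$, the random seed, the rejection of draws above 10, and the simple interval-wise estimator are all assumptions of this sketch rather than details taken from the paper.

```python
import math
import random

def sample_truncated_exponential(n, rate, upper, rng):
    """Draw n exponential samples, rejecting the rare draws above `upper`."""
    out = []
    while len(out) < n:
        x = rng.expovariate(rate)
        if x <= upper:
            out.append(x)
    return out

def histogram_density(samples, a, b, m):
    """Frequency of each of m equal intervals on [a, b], divided by its width."""
    n = len(samples)
    h = (b - a) / m
    counts = [0] * m
    for x in samples:
        j = min(int((x - a) / h), m - 1)
        counts[j] += 1
    return [c / (n * h) for c in counts], h

rng = random.Random(2008)
rate = 1.0  # assumed rate parameter
samples = sample_truncated_exponential(10_000, rate, 10.0, rng)
density, h = histogram_density(samples, 0.0, 10.0, 100)

# Compare the estimate with the true density at the interval midpoints.
midpoints = [(j + 0.5) * h for j in range(100)]
max_err = max(abs(d - rate * math.exp(-rate * x)) for d, x in zip(density, midpoints))
print(f"largest absolute deviation from the true density: {max_err:.3f}")
```

The estimate is rougher than the true curve, but the overall shape agrees, which is the qualitative behaviour the experiment is meant to show.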

Results Analysis.
From Figure 2 we can see that the estimated density $\hat{p}(x)$ obtained from the sample data is not as smooth as the standard normal density $p(x)$, but the rough shapes of the two functions are almost the same. Similar conclusions can be drawn from Figure 3. That is to say, the method of estimating the probability density based on Bernoulli's law of large numbers and the infinitesimal method is effective, and it lays a foundation for establishing the PPK-$\varepsilon$-SVR.

Support Vector Regression Method Incorporating Probability Prior Knowledge
In this section, we aim to predict the wind speed $V(t)$ at a fixed point.

Incorporating the Prior Knowledge about the Wind Speed Fluctuation into the Training Data and Testing Data.
Suppose that $p(x)$ is the probability density estimated with the above method, $t_0$ is the initial time, and $V_0$ is the wind speed at $t_0$, denoted by $(t_0, V_0)$. In practice, the wind speed at a fixed point is often measured many times, at $t_{0i}$ ($i = 1, 2, \ldots, n$), during every period of time $\Delta t$ (where $t_0 < t_{01} < \cdots < t_{0n} \le t_0 + \Delta t$), and the mean value $\bar{V} = (1/n) \sum_{i=1}^{n} V_{0i}$ is output as the wind speed at the fixed point. In other words, the wind speed $V_1$ at $t_1 = t_0 + \Delta t$ is taken to be $\bar{V}$. Of course, the mean value $\bar{V}$ can represent the wind speed $V_1$ at $t_1$ in a sense, but in some cases the mean value $\bar{V}$ is quite different from $V_1$. For example, if the measured wind speed $V_{0i}$ ($i = 1, 2, \ldots, n$) takes the same value throughout the period $\Delta t$, then "the mean value $\bar{V}$ is the wind speed $V_1$" holds with a high probability. Conversely, if the measured wind speed $V_{0i}$ fluctuates wildly during the period $\Delta t$, then "the mean value $\bar{V}$ is the wind speed $V_1$" holds with a very low probability.
Hence, in order to incorporate this prior knowledge into the wind speed prediction, the training datum $(t_1, V_1)$ is converted into $(t_1, 1 - p_1, V_1)$, where
$$p_1 = P\{V_{\min} \le V \le V_{\max}\} = \int_{V_{\min}}^{V_{\max}} p(x)\,dx, \tag{19}$$
$$V_{\max} = \max_i \{V_{0i}\}, \qquad V_{\min} = \min_i \{V_{0i}\}. \tag{20}$$
In fact, from formulas (19) and (20) we can see that the larger $V_{\max} - V_{\min}$ is, the larger $p_1$ is and, furthermore, the smaller $1 - p_1$ is. That is to say, the possibility of "the wind speed at $t_1$ is $V_1$" is very small. On the other hand, a large $V_{\max} - V_{\min}$ indicates that the wind speed from $t_0$ to $t_1$ fluctuates wildly and that "the mean value $\bar{V}$ is the wind speed $V_1$" holds with a low probability (namely, the possibility of "the wind speed at $t_1$ is $V_1$" is very small), which is in accord with the information provided by $1 - p_1$. Thus, the probability $1 - p_1$ describes the fluctuation of the wind speed during the period $\Delta t$. Therefore, the datum $(t_1, 1 - p_1, V_1)$ contains the prior knowledge provided by the historical data.
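The construction of such a datum can be sketched as follows, assuming that the probability attached to a period is the integral of the estimated density between the smallest and largest readings of that period, and that the estimated density is given by its values at a set of nodes with linear interpolation in between. The helper names (`interp`, `range_probability`, `augment`) and the triangular toy density are hypothetical.

```python
def interp(nodes, values, x):
    """Piecewise-linear interpolation, clamped to the node range."""
    if x <= nodes[0]:
        return values[0]
    if x >= nodes[-1]:
        return values[-1]
    for j in range(1, len(nodes)):
        if x <= nodes[j]:
            w = (x - nodes[j - 1]) / (nodes[j] - nodes[j - 1])
            return (1 - w) * values[j - 1] + w * values[j]

def range_probability(nodes, values, lo, hi):
    """P{lo <= V <= hi} for a piecewise-linear density given at `nodes`."""
    lo, hi = max(lo, nodes[0]), min(hi, nodes[-1])
    if hi <= lo:
        return 0.0
    # Break points: lo, the interior nodes, hi. The trapezoid rule is exact
    # on each piece because the density is linear there.
    pts = [lo] + [x for x in nodes if lo < x < hi] + [hi]
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        total += 0.5 * (interp(nodes, values, left) + interp(nodes, values, right)) * (right - left)
    return total

def augment(t, readings, nodes, values):
    """Turn the readings of one period into the datum (t, 1 - p, mean speed)."""
    mean_speed = sum(readings) / len(readings)
    p = range_probability(nodes, values, min(readings), max(readings))
    return (t, 1.0 - p, mean_speed)

# Toy triangular density on [0, 10] m/s (assumed, integrates to 1).
nodes, values = [0.0, 5.0, 10.0], [0.0, 0.2, 0.0]
print(augment(1.0, [5.0, 5.0, 5.0], nodes, values))  # calm period
print(augment(2.0, [2.0, 5.0, 8.0], nodes, values))  # gusty period
```

Both periods have the same mean speed, but the gusty one receives a much smaller second component, so the regression input can distinguish them.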

$\varepsilon$-Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge.
For the above problem of wind speed prediction, the $\varepsilon$-support vector regression method incorporating probability prior knowledge (PPK-$\varepsilon$-SVR) can be constructed as follows.
Step 3. Construct and solve the convex quadratic programming problem to obtain the optimal solution.
Step 4. Choose a component of the optimal solution and compute the threshold from it; the formula depends on whether a component $\alpha_j$ or a component $\alpha_k^*$ is chosen.
Step 5. Construct the decision function.
Remark 8. When solving regression problems with support vector regression, the kernel function can be selected according to prior knowledge, such as the characteristics of the problem or of the training set. More details about selecting the kernel function with prior knowledge will be investigated in another paper.
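For a kernel SVR in general, the decision function has the form $f(x) = \sum_i (\alpha_i^* - \alpha_i) K(x_i, x) + b$. The sketch below evaluates this expression with an RBF kernel on two-component inputs of the form (time, $1 - p$). The coefficient values, the threshold, and the kernel width `gamma` are hypothetical placeholders, since in practice they come from solving the quadratic program.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, coeffs, b, gamma):
    """f(x) = sum_i coeffs[i] * K(x_i, x) + b, with coeffs[i] = alpha_i^* - alpha_i."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coeffs)) + b

# Hypothetical support vectors (time in hours, 1 - p) and dual coefficients.
support_vectors = [(0.5, 0.9), (1.0, 0.4), (1.5, 0.7)]
coeffs = [0.8, -0.3, 0.5]
print(decision_function((1.2, 0.6), support_vectors, coeffs, b=4.0, gamma=2.0))
```

Because only points with nonzero coefficients appear in the sum, prediction cost scales with the number of support vectors rather than with the size of the training set.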

Experiments
In this part, we take a wind farm in Gansu province as an example. For a fixed point, in order to predict the wind speed $V(t)$ (m/s) at time $t$ (hours), we recorded the wind speed from November 2006 to April 2008 and found that the wind speed had a periodicity with the change of seasons, months, or days. Therefore, the probability distribution of the wind speed in the corresponding period of the previous year (month or day) can be incorporated into the wind speed prediction for the corresponding period in this year (month or day). In this wind farm, the wind speed $V$ is output every ten minutes, so there are 144 data points in a day and more than four thousand in a month. As SVR addresses statistical learning problems for small samples and the wind speed has a daily periodicity, the experiment is aimed at short-term wind speed prediction, and we choose 144 data points to carry out the experiment. Here, without loss of generality, we take the wind speed prediction on 1 April 2008 as an example.

Estimation of the Probability Density of Historical Wind Speed Data.
Suppose that $(t_i, V_i)$ ($i = 1, 2, \ldots, 144$) indicates that the wind speed at time $t_i$ is $V_i$. The wind speed was measured ten times, $V(t_{ij})$ ($j = 1, 2, \ldots, 10$), during every ten minutes, and the mean value $\bar{V}_i = (1/10) \sum_{j=1}^{10} V(t_{ij})$ is output as the wind speed $V_{i+1}$ at time $t_i + 1/6$. By the proposed method in Section 3, the probability density $p(x)$ of the wind speed data on 1 April 2008 was estimated, and the graph of $p(x)$ is shown in Figure 4.

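Incorporating the Prior Knowledge about the Wind Speed Fluctuation into the Training Data and Testing Data.
The wind speed sample data on 1 April 2008 are denoted as data (29), and data (29) are converted into data (30) by attaching the component $1 - p_i$ to each datum, as described in Section 4.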

$\varepsilon$-Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge.
In the experiment, a grid search based on 5-fold cross-validation is used to determine the model parameters, with one parameter searched over $\{2^{-10}, 2^{-9}, \ldots, 2^{1}\}$ and the other over $\{2^{-6}, 2^{-5}, \ldots, 2^{6}\}$, and the kernel function is the radial basis function (RBF). In order to predict the wind speed $V_{144}$ for the given $t_{144}$ with the training data $(t_1, V_1), (t_2, V_2), \ldots, (t_{143}, V_{143})$ in data (30), we run the experiment with PPK-$\varepsilon$-SVR and with standard $\varepsilon$-SVR; the results are shown in Tables 1 and 2, respectively. The wind speeds of training data, normalized wind speeds of testing data, and wind speeds of testing data with PPK-$\varepsilon$-SVR are shown in Figures 5(a), 5(b), and 5(c), respectively.
Similar to the steps of predicting the wind speed $V_{144}$ for the given $t_{144}$, we repeat the experiment 50 times to predict the wind speed $V_i$ for the given $t_i$ ($i = 1, 2, \ldots, 50$) (namely, the first 50 wind speeds monitored on 2 April 2008). The average mean squared errors are shown in Table 3, where the numbers after ± are the standard deviations.
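The parameter search can be organised as in the sketch below, which builds the two power-of-two grids and loops over 5 folds of 143 training indices. It is only an outline of grid search with cross-validation: the folds are taken as contiguous blocks (an assumption), and `score` is a hypothetical user-supplied callable that would train a model on the training indices and return its validation error.

```python
from itertools import product

def power_of_two_grid(lo_exp, hi_exp):
    """[2**lo_exp, ..., 2**hi_exp] as floats."""
    return [2.0 ** k for k in range(lo_exp, hi_exp + 1)]

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(score, grid_a, grid_b, n, k=5):
    """Return (mean error, a, b) for the pair with the lowest mean validation error."""
    folds = k_fold_indices(n, k)
    best = None
    for a, b in product(grid_a, grid_b):
        errs = []
        for i, valid in enumerate(folds):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            errs.append(score(a, b, train, valid))
        mean_err = sum(errs) / k
        if best is None or mean_err < best[0]:
            best = (mean_err, a, b)
    return best

grid_a = power_of_two_grid(-10, 1)  # 12 candidate values
grid_b = power_of_two_grid(-6, 6)   # 13 candidate values

# Placeholder score (assumed) so the sketch runs without a trained model.
dummy_score = lambda a, b, train, valid: abs(a - 0.25) + abs(b - 4.0)
print(grid_search(dummy_score, grid_a, grid_b, n=143))
```

With these grids, 156 parameter pairs are each evaluated on 5 folds, which stays cheap for a training set of 143 points.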

Result Analysis.
From Tables 1 and 2, we can see that the mean squared errors of training data and testing data with PPK-$\varepsilon$-SVR are all smaller than the corresponding ones with standard $\varepsilon$-SVR. Comparing Figure 5(a) with Figure 6(a) and Figure 5(c) with Figure 6(c), we find that the predicted wind speeds of training data and testing data with PPK-$\varepsilon$-SVR are closer to the initial wind speeds than those with standard $\varepsilon$-SVR. This illustrates that the prediction error of PPK-$\varepsilon$-SVR is smaller than that of standard $\varepsilon$-SVR in predicting the wind speed $V_{144}$ for the given $t_{144}$. From Figure 5(b), the difference between the normalized initial wind speed and the normalized predicted wind speed is 0.766 − 0.707 = 0.059, which is approximately equal to the square root of the mean squared error 0.0035 in Table 1; this shows the effectiveness of the PPK-$\varepsilon$-SVR.
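The arithmetic relating the difference read from Figure 5(b) to the mean squared error in Table 1 can be checked directly. The sketch assumes a single testing point, with the normalized values 0.766 and 0.707 taken from the text above.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Normalized initial and predicted wind speeds of the single testing point.
initial = [0.766]
predicted = [0.707]

mse = mean_squared_error(initial, predicted)
rmse = math.sqrt(mse)
print(f"MSE  = {mse:.4f}")
print(f"RMSE = {rmse:.3f}")
```

The squared difference rounds to 0.0035 and its square root recovers the difference 0.059, so the figure and the table are consistent with each other.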
From Table 3, we can see that the average mean squared error (namely, the average prediction error) of PPK-$\varepsilon$-SVR is smaller than that of standard $\varepsilon$-SVR, which illustrates that the PPK-$\varepsilon$-SVR method is more accurate than standard $\varepsilon$-SVR. Moreover, the standard deviation of PPK-$\varepsilon$-SVR is also smaller than that of standard $\varepsilon$-SVR, which illustrates that PPK-$\varepsilon$-SVR is more stable than standard $\varepsilon$-SVR. In addition, the running time of PPK-$\varepsilon$-SVR is less than one minute, which shows that the model's running time can meet the needs of wind speed prediction in application.

Conclusions
In this paper, a method of estimating the probability density of historical wind speed data is proposed, and the estimated probability density is used to describe the wind speed fluctuation between the maximal value and the minimal value in a certain period of time. The prior knowledge provided by historical wind speed data is then incorporated into the training data and the testing data. Based on standard $\varepsilon$-SVR, a support vector regression method for wind speed prediction incorporating probability prior knowledge is proposed. The comparative experiments show that the proposed PPK-$\varepsilon$-SVR is feasible and effective and that the model's running time can meet the needs of wind speed prediction in application. How to incorporate prior knowledge into the selection of the kernel function, so as to decrease the prediction error further, is left for future study.

Figure 4: Probability density of wind speed data on 1 April 2008.
Prior Knowledge about the Wind Speed Fluctuation into the Training Data and Testing Data. Denote $V_i^{\max} = \max_j \{V(t_{ij})\}$ and $V_i^{\min} = \min_j \{V(t_{ij})\}$. By the estimated probability density $p(x)$ of the wind speed data on 1 April 2008 and formula (19), the probability $p_i = P\{V_i^{\min} \le V_i \le V_i^{\max}\}$ can be calculated for each datum of the wind speed sample data on 1 April 2008.

Figure 5: Initial wind speeds and predicted wind speeds with PPK-$\varepsilon$-SVR. (a) Wind speeds of training data; (b) normalized wind speeds of testing data; (c) wind speeds of testing data.
The wind speeds of training data, normalized wind speeds of testing data, and wind speeds of testing data with standard $\varepsilon$-SVR are shown in Figures 6(a), 6(b), and 6(c), respectively.

Figure 6: Initial wind speeds and predicted wind speeds with standard $\varepsilon$-SVR. (a) Wind speeds of training data; (b) normalized wind speeds of testing data; (c) wind speeds of testing data.

Table 2: Experiment results with standard $\varepsilon$-SVR.

Table 3: Average mean squared errors.