Time Series Analysis and Forecasting for Wind Speeds Using Support Vector Regression Coupled with Artificial Intelligent Algorithms



Introduction
In recent decades, increasing attention has been paid to renewable energy around the world because of the limited reserves of nonrenewable resources and the emerging crisis of global warming caused by the large amounts of greenhouse gases emitted by fossil fuel combustion [1,2]. The cumulative installed wind capacity in China was reported to be 91,412.89 MW in 2013, up 21.4% over the previous year. This growth is unlikely to slow, because many countries have set higher wind power capacity goals as a promising solution to their energy crises [3]. As installed wind capacity increases and wind power is integrated into electrical power systems on a large scale, additional problems and challenges have appeared, including power stability, power quality, and, especially, power dispatching [4].
As the penetration of wind power increases, uncertainties related to wind power can put power quality and system reliability at risk, and major grid integration issues, such as reserve capacity and balance management, also arise [5]. Because wind power is proportional to the cube of wind speed, a 10% deviation in the expected wind speed results in an approximately 30% deviation in the expected wind power production [6]; the prediction error of wind energy therefore depends largely on the accuracy of wind speed forecasts. Accurate wind speed predictions for each of a farm's turbines are critical for wind farm management: they are usually the basis of wind power prediction and effective wind power utilization [7], and they can increase the reliability of the power grid and reduce operating costs [8].
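As a quick check of the cubic sensitivity quoted above, the short sketch below computes the relative change in power when the wind speed is 10% higher or lower than expected. It assumes only the proportionality P ∝ v³; air density and rotor area cancel out of the ratio, so they are omitted, and the function name is ours.

```python
def relative_power_deviation(speed_error):
    """Relative change in wind power for a relative wind speed error, assuming P ∝ v**3."""
    return (1.0 + speed_error) ** 3 - 1.0

for err in (0.10, -0.10):
    # Prints the speed error and the implied power error, both as percentages.
    print(f"speed error {err:+.0%} -> power error {relative_power_deviation(err):+.1%}")
```

The deviation is asymmetric (+33.1% for a 10% overestimate, -27.1% for a 10% underestimate), which is why the figure of about 30% is an approximation.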
However, accurate and reliable wind speed forecasting is a significant challenge because wind speed is stochastic, changes rapidly, behaves in a highly nonlinear way with no typical patterns [9], and depends on elevation, terrain, atmospheric pressure, and temperature, all of which lead to large uncertainties in wind speed. Extensive efforts have been devoted to developing efficient wind speed forecasting models, and many models have been proposed and examined, applying different predictive methods and techniques over different forecasting horizons. According to the length of the prediction horizon, wind speed forecasting can be classified into long-term and short-term forecasting. The former can provide critical information for site location, windmill planning, and the proper selection of wind turbines for specific wind farms [10]. Precise short-term wind speed forecasts can minimize scheduling errors, which can have a large impact on grid reliability and market-based ancillary costs [11]. According to the approach adopted, wind speed prediction methods can be grouped into two main categories: physical methods and statistical methods [12].
Physical methods, which take into account physical factors such as temperature, pressure, wind farm layout, and local terrain, are based on numerical weather prediction (NWP) tools that provide weather forecasts by using mathematical models of the atmosphere [2,13-15]; these models require long run times and large amounts of computational resources. Landberg initially proposed using NWP tools as an input; tools such as the Wind Atlas Analysis and Application Program (WAsP) and PARK are now used for wind prediction correction [13].
Statistical methods, which generally use recursive techniques to determine the relationships within historical wind speed data, can be used for short-term wind speed forecasting. Many models have been developed to improve wind speed forecasting accuracy, including autoregression (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), artificial neural networks (ANN), fuzzy logic (FL), support vector machines (SVM), and spatial-temporal models [16]. Torres et al. [17] used ARMA and persistence models to forecast hourly wind speeds up to 10 h ahead. ARIMA and ANN approaches have been used for wind speed time series forecasting on the south coast of the state of Oaxaca, Mexico [18]. Three types of ANN models, namely, adaptive linear element, back propagation, and radial basis function networks, were investigated for hourly mean wind speed forecasting at two observation sites in North Dakota [19]. A fuzzy model was proposed for wind speed prediction that provides forecasts from 30 min to 2 h ahead [20]. Zhou et al. [21] presented the first systematic study of fine-tuning the parameters of least-squares support vector machine (LS-SVM) models for one-step-ahead wind speed forecasting. A methodology that uses scenarios to characterize the stochastic processes of wind speed at different geographical locations was also provided [22].
Moreover, hybrid models, which combine features of different predictive models, are often adopted for wind speed forecasting because they can capture the intricate characteristics of wind speed series more comprehensively. Combining several forecasting methodologies is another strategy that can significantly improve predictive performance by exploiting the strengths of each method with respect to data sets, ability to describe nonlinearity and linearity, and prediction horizons; such combined models can be superior to individual models [12]. Li et al. proposed a hybrid model consisting of ANN and Bayesian approaches, and the results indicated that the hybrid approaches always produced smaller forecasting errors than the ANN alone [23]. Monfared et al. [24] developed a hybrid ANN and FL model to predict actual wind speed time series sampled in Rostamabad from 2002 to 2005 and showed that this approach requires less computational time and provides better prediction performance. Salcedo-Sanz et al. [25] combined a hybridized ANN with a mesoscale model, and this combined strategy produced superior forecasting results. Additionally, Cadenas and Rivera investigated hybrid models consisting of ANN and ARIMA and concluded that the hybrid models outperformed the individual ANN and ARIMA approaches [26].
Based on the aforementioned research, predictive models with different relative strengths and weaknesses have been widely studied and developed. Among these models, combined approaches are of special significance because they can be applied over a much wider area: individual models perform well only under specific conditions, so different conditions may otherwise require different models [12]. In this paper, an intelligent hybrid forecasting model based on support vector regression (SVR), brainstorm optimization (BSO), and the cuckoo search (CS) algorithm is proposed. First, to determine the spatial and temporal relations among wind speed series collected from different wind turbines, cross correlation (CC) analysis was performed to obtain information about the auto- and cross-correlations of the wind speed time series. Then, the proposed hybrid models were applied to 1- or 3-step-ahead wind speed prediction based on averaged hourly and 10 min wind speed series, respectively. The case studies show that the proposed approach performs better than the single SVR and ERNN models for short-term wind speed forecasting, which can assist wind power scheduling and wind farm management.
The remainder of the paper is organized as follows. Section 2 describes the related methodology. Section 3 presents the case study analysis. The conclusions are summarized in Section 4. Finally, acknowledgments and references are presented.

Cross Correlation Analysis.
In time series analysis, cross correlation refers to the correlation between two time series $x_1$ and $x_2$, whereas the autocorrelation of a series $x_1$ is the correlation of $x_1$ with itself. Cross correlation measures the similarity of two time series as a function of a time lag applied to one of them, with values ranging from −1 to +1. Cross correlation is also useful for determining the time delay between two time series: the lag at which it reaches its maximum indicates the point in time where the signals are best aligned [27]. Detailed information about cross correlation can be found in [28].
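A minimal sketch of the lagged, normalized cross correlation described above, written with NumPy. The function name and the sign convention (a positive lag means that the second series follows the first) are assumptions for illustration; the paper does not state which estimator it used. Each lag uses the Pearson correlation of the overlapping segments, so the values lie in [−1, 1].

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for lag = -max_lag..max_lag.

    A positive lag means that y lags behind (follows) x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for lag in lags:
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]      # pair x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[: len(y) + lag]     # pair x[t] with y[t - |lag|]
        values.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(values)

# Example: y is x delayed by 3 samples plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)
lags, cc = cross_correlation(x, y, max_lag=10)
best_lag = lags[np.argmax(cc)]   # the lag at which the two series are best aligned
```

Applied to two turbine series, the lag of the peak estimates the travel delay between the sites, which is the information the CC analysis contributes to input selection.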
Mathematical Problems in Engineering 3

Elman Recurrent Neural Network (ERNN).
The Elman recurrent neural network (ERNN), first proposed by Elman in 1990 [29], is a partially recurrent network model that lies between a classic feed-forward perceptron and a fully recurrent network. Recurrent neural networks can exhibit rich temporal and spatial behaviors, such as stable and unstable fixed points, limit cycles, and chaotic behavior. These behaviors can be used to model certain cognitive functions, such as associative memory, unsupervised learning, self-organizing maps, and temporal reasoning [30]. Detailed information about the ERNN can be found in [31].
The recurrent Elman architecture was chosen for this comparative work because of its nonlinear mapping ability, which can be used to describe the nonlinear patterns of wind speed. A simple Elman artificial neural network structure is shown in Figure 1. Because the dynamic characteristics of an Elman network are provided by its internal connections (the context layer, which stores a copy of the previous hidden-layer output), it does not need to use the state as an input or training signal; this makes the ERNN superior to static feed-forward networks and explains why it is widely used in dynamic system identification.
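A minimal NumPy sketch of an Elman forward pass, assuming a tanh hidden layer, a linear output layer, and a context layer that simply copies the previous hidden state. The class name, layer sizes, and random initialization are illustrative assumptions, and training (for example, backpropagation through time) is omitted.

```python
import numpy as np

class ElmanNetwork:
    """Forward pass of a simple Elman recurrent network (training omitted)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_hidden, n_in))        # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))   # context -> hidden
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.5, (n_out, n_hidden))      # hidden -> output
        self.b_o = np.zeros(n_out)

    def forward(self, inputs):
        """Process a sequence of input vectors; return one output vector per time step."""
        context = np.zeros(self.b_h.shape)   # the context layer starts empty
        outputs = []
        for x_t in inputs:
            hidden = np.tanh(self.W_in @ x_t + self.W_ctx @ context + self.b_h)
            outputs.append(self.W_out @ hidden + self.b_o)        # linear output layer
            context = hidden                                       # copy hidden state into the context layer
        return np.array(outputs)

net = ElmanNetwork(n_in=4, n_hidden=8, n_out=1)
sequence = np.ones((5, 4))   # the same input vector repeated five times
out = net.forward(sequence)
```

Although the input is constant, the outputs change from step to step because the context layer feeds back the previous hidden state; a static feed-forward network would return the same value five times.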

Support Vector Regression (SVR).
SVR is an adaptation of support vector machines (SVM), a machine learning theory proposed by Vapnik et al. [32]. In the SVR model, a regression function $y = f(x)$ is fitted and then applied to predict the outputs for a new set of inputs. A brief review of SVR is given below [33-35].
Step 1. A nonlinear mapping $\varphi(\cdot): \mathbb{R}^{n} \to \mathbb{R}^{n_h}$ is defined to solve a nonlinear regression problem by mapping the training set $\{(x_i, y_i)\}_{i=1}^{N}$ into a high-dimensional feature space $\mathbb{R}^{n_h}$.
Step 2. In the high-dimensional feature space, the nonlinear regression problem in the lower-dimensional space is transformed into a linear one by a linear function, namely, the SVR function
$$f(x) = w^{T}\varphi(x) + b,$$
where $f(x)$ denotes the forecast values and the coefficients $w$ ($w \in \mathbb{R}^{n_h}$) and $b$ ($b \in \mathbb{R}$) are adjustable.
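Step 3. Define the empirical risk
$$R_{\mathrm{emp}}(f) = \frac{1}{N}\sum_{i=1}^{N} \Theta_{\varepsilon}\big(y_i, w^{T}\varphi(x_i) + b\big),$$
where $\Theta_{\varepsilon}(y, f(x))$ is the $\varepsilon$-insensitive loss function given by
$$\Theta_{\varepsilon}(y, f(x)) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon, \\ 0, & \text{otherwise.} \end{cases}$$
The $\varepsilon$-insensitive loss function is used to control the sparsity of the solutions and the generalization ability of the model.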
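Step 4. Determine the overall training errors between the training data and the $\varepsilon$-insensitive loss function, which can be formulated as the constrained optimization problem
$$\min_{w, b, \xi, \xi^{*}} R(w, \xi, \xi^{*}) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right) \tag{4}$$
subject to
$$y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \ldots, N.$$
The first term in (4) regularizes the weight sizes, penalizes large weights, and maintains the flatness of the regression function. The second term in (4) penalizes the training errors between $f(x)$ and $y$ by using the $\varepsilon$-insensitive loss function. Here, $C$ is a parameter that balances the two terms. Training errors below $-\varepsilon$ are denoted $\xi_i$; otherwise, they are denoted $\xi_i^{*}$.

[Figure 1: Elman network structure (input layer, hidden layer, context layer, output layer).]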
Step 5. Obtain the parameter vector $w$ by solving the quadratic optimization problem defined in Step 4:
$$w = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) \varphi(x_i),$$
where $\alpha_i^{*}$ and $\alpha_i$ are the Lagrange multipliers.
Step 6. Establish the SVR regression function as
$$f(x) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b,$$
where $K(x_i, x_j)$ is the kernel function and $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$. In this paper, the radial basis function (RBF) was selected as the kernel function because it implicitly maps the training sets into an infinite-dimensional feature space, which makes it suitable for handling nonlinear relationships.
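The pieces of Steps 3 and 6 can be sketched in NumPy: the ε-insensitive loss, the empirical risk, an RBF kernel, and the kernel expansion used for prediction. The RBF parameterization $K(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$ and all function names are assumptions for illustration. The multipliers are taken as given here; in practice they come from solving the optimization problem of Steps 4 and 5 with a quadratic programming solver or an SVR library.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Theta_eps(y, f) = max(|f - y| - eps, 0), applied elementwise."""
    return np.maximum(np.abs(np.asarray(f) - np.asarray(y)) - eps, 0.0)

def empirical_risk(y, f, eps):
    """R_emp: mean epsilon-insensitive loss over the training set."""
    return float(np.mean(eps_insensitive_loss(y, f, eps)))

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)) for row vectors A_i and B_j."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(X_train, alpha, alpha_star, b, X_new, sigma):
    """Step 6: f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    K = rbf_kernel(X_new, X_train, sigma)   # shape (n_new, n_train)
    return K @ (alpha - alpha_star) + b

# Residuals inside the eps-tube cost nothing; outside it, the cost grows linearly.
y = np.array([1.0, 2.0, 3.0])
f = np.array([1.05, 2.5, 2.0])
losses = eps_insensitive_loss(y, f, eps=0.1)   # per-point losses: 0, 0.4, 0.9
```

Only training points with $\alpha_i - \alpha_i^{*} \neq 0$ (the support vectors) contribute to the prediction, which is the sparsity that the ε-insensitive loss is meant to produce.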

Cuckoo Search (CS) Algorithm.
The CS algorithm, which was inspired by the breeding behavior of cuckoos, is a metaheuristic algorithm recently developed by Yang and Deb [36]. For an optimization problem, the quality or fitness of a solution can simply be proportional to the value of the objective function. In the present study, the CS algorithm was used for parameter optimization. The CS algorithm is described briefly as follows.
When generating a new solution $x_i^{(t+1)}$ for, say, cuckoo $i$, a Levy flight is performed:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \mathrm{Levy}(\lambda), \tag{8}$$
where $\alpha > 0$ is the step size, which should be related to the scale of the problem of interest. Depending on the constraints imposed by the optimization problem, $\alpha$ must be tuned to the desired step size. Usually, $\alpha$ is set as in (9):
$$\alpha = \alpha_0 \left(x_j^{(t)} - x_i^{(t)}\right), \tag{9}$$
where $\alpha_0$ is the initial step change. The Levy-flight update is essentially the stochastic equation for a random walk.
In general, a random walk is a Markov chain whose next state or location depends only on the current location (the first term of the Levy-flight update) and the transition probability (the second term). The product ⊕ denotes entrywise multiplication. This entrywise product is similar to that used in particle swarm optimization (PSO), but here the random walk via a Levy flight explores the search space more efficiently, because its step length is much longer in the long run.
The Levy flight essentially provides a random walk whose random step lengths are drawn from a Levy distribution:
$$\mathrm{Levy} \sim u = t^{-\lambda}, \quad (1 < \lambda \le 3), \tag{10}$$
which has an infinite variance (and, for $\lambda \le 2$, an infinite mean). Here, the steps form a random walk process with a heavy-tailed, power-law step-length distribution. Some of the new solutions should be generated by a Levy walk around the best solution obtained so far, which speeds up the local search. However, a substantial fraction of the new solutions should be generated by far-field randomization, at locations far enough from the current best solution; this ensures that the system will not be trapped in a local optimum (Figure 2).
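A compact NumPy sketch of cuckoo search for minimizing a function within box bounds. Several choices here are our assumptions rather than details from the paper: Levy steps are drawn with Mantegna's algorithm (exponent β = 1.5, which corresponds to λ = 1 + β = 2.5 in (10)); in (9) the partner $x_j$ is taken to be the current best nest; and a fraction pa of nest components is abandoned and rebuilt with a random walk scaled by the difference of two randomly paired nests, as in common reference implementations, rather than by uniform far-field sampling. Function names and default values are illustrative.

```python
import math
import numpy as np

def levy_steps(shape, beta, rng):
    """Levy-distributed steps via Mantegna's algorithm (tail exponent lambda = 1 + beta)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, lower, upper, n_nests=25, n_iter=300, pa=0.25,
                  alpha0=0.1, beta=1.5, seed=0):
    """Minimize f over the box [lower, upper]; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    nests = lower + (upper - lower) * rng.random((n_nests, lower.size))
    fitness = np.array([f(x) for x in nests])
    history = [fitness.min()]

    def greedy_replace(candidates):
        # Keep a candidate only if it improves on the nest it would replace.
        cand = np.clip(candidates, lower, upper)
        cand_fit = np.array([f(x) for x in cand])
        better = cand_fit < fitness
        nests[better] = cand[better]
        fitness[better] = cand_fit[better]

    for _ in range(n_iter):
        best = nests[fitness.argmin()].copy()
        # Levy flight, Eqs. (8)-(9): step alpha = alpha0 * (x_best - x_i), applied entrywise.
        alpha = alpha0 * (best - nests)
        greedy_replace(nests + alpha * levy_steps(nests.shape, beta, rng))
        # Abandon a fraction pa of nest components and rebuild them with a
        # random walk scaled by the difference of two randomly paired nests.
        discovered = rng.random(nests.shape) < pa
        diff = nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)]
        greedy_replace(nests + rng.random(nests.shape) * diff * discovered)
        history.append(fitness.min())

    i = fitness.argmin()
    return nests[i].copy(), float(fitness[i]), history

sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best, hist = cuckoo_search(sphere, [-5.0, -5.0], [5.0, 5.0])
```

For the SVR tuning in this paper, f would be a validation error of the SVR as a function of its parameters; that objective is not reproduced here.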

The Brainstorm Optimization (BSO) Algorithm.
We can describe the brainstorm optimization (BSO) algorithm as follows [37].
Step 1. Randomly generate a population of $n$ individuals that are feasible for the problem. An individual can be represented by a vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, $i = 1, 2, \ldots, n$, where $d$ is the number of dimensions of the individuals. Generally, the individuals are initialized by $x_{ij} = x_j^{\min} + (x_j^{\max} - x_j^{\min}) \cdot \mathrm{rand}$, $j = 1, \ldots, d$, where $x_j^{\min}$ and $x_j^{\max}$ are the left and right boundaries of the $j$th variable and rand is a random value between 0 and 1.
Step 2. Divide the $n$ individuals into $m$ groups by using the $k$-means clustering method.
Step 3. Calculate the fitness function value for each individual and take the individual that has the best value in the group as the center for the group.
Step 4. Draw a random value $r_1$ between 0 and 1 and compare it with a predetermined probability $p_1$. If $r_1 < p_1$, randomly select a group center and replace it with a randomly generated individual.
Step 5. To update the individuals, first draw a random value $r_2$ between 0 and 1 and then proceed as follows. If $r_2$ is less than a predetermined probability $p_2$, randomly select a group and draw a random value $r_3$ between 0 and 1.
(a1) If $r_3$ is less than a predetermined probability $p_3$, select the group center and add random values to it to update the individual.
(a2) Otherwise, choose an individual randomly from this group and update it by adding a random value.
Otherwise, generate a random value $r_4$ between 0 and 1.
(b1) If $r_4$ is less than a predetermined probability $p_4$, randomly select the centers of two groups and combine them; then add random values to update the individual.
(b2) Otherwise, randomly choose one individual from each of two selected groups and combine them; then add random values to update the individual.
(b3) Compare the fitness function values of the newly updated individuals with those of the corresponding original individuals, and keep the better one as the new individual.
Step 6. Repeat Step 5 until all of the individuals have been updated.
Step 7. When the termination criterion has been reached, end the process; otherwise, go to Step 2 and repeat.
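The individual update in Step 5 is clearly vital to the BSO process. The new individual in this step is generated by
$$x_{\mathrm{updated}} = x_{\mathrm{selected}} + \xi \cdot n(\mu, \sigma^{2}),$$
where $x_{\mathrm{selected}}$ and $x_{\mathrm{updated}}$ represent the selected and updated individuals, respectively, $n(\mu, \sigma^{2})$ is a Gaussian random function with mean $\mu$ and variance $\sigma^{2}$, and the coefficient $\xi$ is determined by
$$\xi = \mathrm{logsig}\left(\frac{0.5 \times \mathrm{max\_iteration} - \mathrm{current\_iteration}}{k}\right) \times \mathrm{rand},$$
where max_iteration and current_iteration represent the maximum iteration number and the current iteration number, respectively, logsig is the logarithmic sigmoid function, rand is a value randomly chosen between 0 and 1, and $k$ is a predetermined value that controls the slope of the logsig function.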
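A minimal NumPy sketch of Steps 1 to 7 for minimizing a function within box bounds. Our assumptions: a few iterations of plain Lloyd's k-means for Step 2, a standard normal perturbation (μ = 0, σ = 1) scaled by the logsig coefficient ξ, a convex combination with a random weight when two individuals are combined, a fixed iteration budget as the termination criterion, and illustrative defaults for the population size, group count, the probabilities in Steps 4 and 5, and k. Because Step 4 may replace a group center with a random individual, the sketch keeps a separate record of the best solution found so far.

```python
import numpy as np

def kmeans_labels(X, m, rng, n_steps=10):
    """Plain Lloyd's k-means; returns a group label for each row of X."""
    centroids = X[rng.choice(len(X), m, replace=False)].copy()
    for _ in range(n_steps):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(m):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def bso(f, lower, upper, n=30, m=5, max_iter=200, p1=0.2, p2=0.8, p3=0.4,
        p4=0.5, k=20.0, seed=0):
    """Brainstorm optimization (minimization); returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = lower.size
    X = lower + (upper - lower) * rng.random((n, d))                 # Step 1
    fit = np.array([f(x) for x in X])
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    history = [best_f]
    for t in range(max_iter):
        labels = kmeans_labels(X, m, rng)                             # Step 2
        groups = [np.flatnonzero(labels == c) for c in range(m)]
        groups = [g for g in groups if g.size > 0]
        centers = [g[fit[g].argmin()] for g in groups]                # Step 3
        if rng.random() < p1:                                         # Step 4
            c = centers[rng.integers(len(centers))]
            X[c] = lower + (upper - lower) * rng.random(d)
            fit[c] = f(X[c])
        xi = logsig((0.5 * max_iter - t) / k) * rng.random()          # step-size coefficient
        for i in range(n):                                            # Steps 5 and 6
            if rng.random() < p2:
                g = rng.integers(len(groups))                         # one random group
                base = X[centers[g]] if rng.random() < p3 else X[rng.choice(groups[g])]  # (a1)/(a2)
            else:
                g1, g2 = rng.choice(len(groups), 2, replace=len(groups) < 2)
                if rng.random() < p4:                                 # (b1): two group centers
                    a, b = X[centers[g1]], X[centers[g2]]
                else:                                                 # (b2): two individuals
                    a, b = X[rng.choice(groups[g1])], X[rng.choice(groups[g2])]
                w = rng.random()
                base = w * a + (1.0 - w) * b
            cand = np.clip(base + xi * rng.normal(0.0, 1.0, d), lower, upper)
            f_cand = f(cand)
            if f_cand < fit[i]:                                       # (b3): keep the better one
                X[i], fit[i] = cand, f_cand
        if fit.min() < best_f:
            best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
        history.append(best_f)                                        # Step 7: fixed budget
    return best_x, best_f, history

sphere = lambda x: float(np.sum(x ** 2))
bx, bf, hist = bso(sphere, [-5.0, -5.0], [5.0, 5.0])
```

The logsig schedule makes ξ close to 1 early on (broad exploration) and close to 0 near the end (fine local search). In the hybrid models described below, f would be the SVR validation error as a function of its parameters.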

The Proposed BSO- and CS-Based Hybrid Model.
In this study, an intelligent hybrid model is proposed for short-term wind speed forecasting by integrating intelligent optimization algorithms with a support vector regression model whose inputs are determined by spatial-temporal correlation analysis. The forecasting accuracy of an SVR model largely depends on an appropriate choice of the kernel parameter and the hyperparameter $C$, so the determination of these parameters is a significant issue [38]. However, there is no systematic approach for determining efficient parameters, although opinions on how to do so abound. Grid search, an exhaustive traversal algorithm, is the basic method for determining both the kernel parameter and the hyperparameter, and the resulting parameters depend largely on the grid resolution. On the one hand, a fine grid is well suited to finding accurate SVR parameters, but the large workload and computational time make the model inefficient and impractical. On the other hand, a coarse grid rarely yields precise parameters, which can lead to poor model performance. Recently, the problem of determining suitable kernel and hyperparameter values has often been solved with artificial intelligence optimization algorithms, because they handle optimization problems whose objective function is nondifferentiable or has many local minima well, and such algorithms have been proposed and developed rapidly. Therefore, to improve the forecasting efficiency and performance of the SVR model, two recently developed artificial intelligence algorithms (BSO and CS) were adopted to identify the SVR parameters.
To obtain accurate short-term wind speed forecasts with an ensemble of intelligent hybrid models, two different SVR models were considered: BSO-SVR and CS-SVR. BSO-SVR was applied to the averaged hourly wind speed series at sites 1 and 2, and CS-SVR was used for 10 min wind speed prediction 3 steps ahead. The main structures of the proposed models are shown in Figure 3.

Case Study and Analysis
Throughout this section, $y(t)$ and $\hat{y}(t)$ denote the actual value and the forecast value at time $t$, respectively.

Wind Speed Data Description.
In our case study, a wind park located in a reference region in China was selected, and 10 min wind speed series were collected over a period of time from four wind turbines installed at four sites (sites 1 to 4) in the park. Because the time unit of the wind speed series is 10 min, short-term forecasts are calculated for future times that are multiple time units ahead. As shown in Figure 4, the observed wind speeds at the four study sites exhibit similar fluctuations and trends. The available wind speed data were divided into training sets and test sets, which were used to determine the model structure and to evaluate its predictive performance, respectively. In this paper, we assume that the distances between sites $i$ and $j$ ($i \ne j$) are short, which implies that a wind speed value occurring at site $i$ at time $t$ may occur at site $j$ at the same time or after a delay. It is therefore reasonable to expect spatial-temporal relations of varying strength between pairs of wind speed series. The naive predictor suggests that, as the forecasting time lag increases, the correlation with past measurements becomes negligible [39]. Before constructing the models, we therefore first examined the spatial-temporal correlation between pairs of sites.

[Figure residue: CS Levy-flight update; brainstorming process steps from "get together a brainstorming group" to "obtain a good enough solution"; model blocks "Determine input sets based on CC analysis" and "Hourly wind speed forecasting".]

Spatial-Temporal Correlation (STC) Analysis.
A cross correlation-based STC analysis was performed to explore the time delays between the wind speed series at different sites $i$ and $j$; these delays were used as a reference for selecting the model inputs, which is critical for building an accurate forecasting model.
As shown in Figure 5, the correlation coefficients between sites 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4 are 0.9931, 0.9520, 0.9342, 0.8937, 0.8973, and 0.9577, respectively. The wind speeds at the different sites are therefore highly spatially correlated, as all of the coefficients are approximately equal to or larger than 0.9. Moreover, certain winds measured at site $i$ are expected to arrive at site $j$ after a time delay and, to some extent, with reduced speed.
A traditional method of identifying the time delays between a local site and other sites is to consider the cross correlation of their wind speeds. Figure 6 illustrates the CC between the series at sites $i$ and $j$ for different time delays, and the quantified values are listed in Table 1. In Figure 6, the degree of correlation is indicated by color: red (hot) colors indicate stronger correlations, whereas blue indicates weaker correlations. A lag of zero corresponds to the spatial correlation between the two series with no time delay; a lag $\tau$ larger or smaller than zero corresponds to a temporal correlation in which one series is delayed relative to, or precedes, the other. As each subplot of Figure 6 shows, if the wind at site $i$ is strong at a given time, there is a high correlation between the series at sites $i$ and $j$ for time delays within 5 units, and the corresponding region is shaded in red and other hot colors. Additionally, most of the quantified correlations for the different time delays are 0.8 or higher, which demonstrates a high spatial-temporal correlation (STC) between the sites. Because the principal objective of this study is to forecast the wind speed at site $i$ multiple time steps ahead, we assume that the forecast for site $i$ can be built from the historical wind speeds at site $i$ together with the wind speeds measured at previous times at the other sites.
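The input design described above can be sketched as a lagged design matrix: each row holds the last few values of the local series and of the remote series, and the target is the local value h steps ahead. The function name, the use of one common lag count for every site, and the toy data are assumptions for illustration; in the paper, the lags are guided by the CC analysis.

```python
import numpy as np

def build_lagged_inputs(series, target, n_lags, horizon):
    """Build (X, y) for an h-step-ahead forecast of series[target].

    series : 2-D array of shape (n_sites, n_times), one row per site
    target : row index of the local site to forecast
    n_lags : number of past values taken from every site (local and remote)
    horizon: forecast step h, so each target is series[target, t + horizon]
    Each row of X is [site 0 values, site 1 values, ...], oldest to newest per site.
    """
    series = np.asarray(series, dtype=float)
    n_sites, n_times = series.shape
    rows, targets = [], []
    for t in range(n_lags - 1, n_times - horizon):
        window = series[:, t - n_lags + 1 : t + 1]   # shape (n_sites, n_lags)
        rows.append(window.ravel())
        targets.append(series[target, t + horizon])
    return np.array(rows), np.array(targets)

# Toy data: 2 sites, 8 time steps.
toy = np.array([[0, 1, 2, 3, 4, 5, 6, 7],
                [10, 11, 12, 13, 14, 15, 16, 17]])
X, y = build_lagged_inputs(toy, target=0, n_lags=2, horizon=3)
```

Keeping the remote sites' recent values in every row is what lets a model exploit the travel delay between turbines that the CC analysis reveals.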

Analysis of Forecasting Results
In this section, four cases were used to validate the performance of the proposed BSO-based and CS-based SVR models. First, the 10 min wind speed data at sites 1 and 2 were averaged over 1 h intervals, yielding 258 data pairs for model training and 30 pairs for model testing. Based on the cross correlation analysis, the inputs consisted of lagged values of the wind speeds at the local and remote sites.
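A small sketch of the preprocessing for the first two cases: averaging consecutive blocks of six 10 min values into hourly means and holding out the last points as a chronological test set. The function names are hypothetical. The sizes 258 and 30 in the text refer to input-output pairs, so in practice the split would be applied after the lagged inputs are built.

```python
import numpy as np

def hourly_means(ten_min_values):
    """Average consecutive blocks of six 10 min values; a trailing partial hour is dropped."""
    v = np.asarray(ten_min_values, dtype=float)
    n_hours = len(v) // 6
    return v[: n_hours * 6].reshape(n_hours, 6).mean(axis=1)

def chronological_split(X, y, n_test):
    """Keep temporal order: the last n_test pairs form the test set."""
    if not 0 < n_test < len(y):
        raise ValueError("n_test must be between 1 and len(y) - 1")
    return X[:-n_test], y[:-n_test], X[-n_test:], y[-n_test:]

speeds = np.arange(13, dtype=float)   # 13 ten-minute readings: 0, 1, ..., 12
hourly = hourly_means(speeds)         # two full hours; the 13th reading is dropped
```

A chronological split, rather than a random one, avoids training on observations that come after the test period.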
For the SVR-based models, the cross validation (CV) technique was used to avoid overfitting during the training stage. Because the predictive accuracy of an SVR model largely depends on appropriate values of the kernel parameter and the hyperparameter $C$, the BSO algorithm and the CS method were used to optimize the SVR parameters and thereby improve its forecasting capacity. The radial basis function (RBF) was selected as the kernel function because of its ability to map the training sets nonlinearly into an infinite-dimensional space, which makes it suitable for handling both linear and nonlinear relationships.
For comparison, an additional model, the Elman recurrent neural network (ERNN), was tested alongside the proposed models. The ERNN, a modified form of artificial neural network with an internal layer of self-recurrent neurons, is a nonlinear mapping function that can capture the nonlinearity of the studied wind speed series. For a fair comparison, the single SVR and the ERNN, both of which have nonlinear mapping capacity, were compared with the proposed models. Based on the STC analysis discussed in the previous subsection, the model inputs were the wind speeds at each of the four measurement sites at several lag times. Thus, both local and remote past information, covering adequate look-back periods of 1 h and 30 min, was introduced into these models.

BSO-Based SVR Forecasting Model: Two Cases Averaged over 1 h Interval.
Here, we established the proposed and comparison models on the same wind speed data set and compared their simulation and forecasting results. As described in Section 3.1, the evaluation criteria adopted in this paper are MAPE, MAE, and MSE. The forecast values and errors of the proposed BSO-SVR, basic SVR, and ERNN models are shown in Figure 7 and Table 2.
As shown in Figure 7, over the prediction horizon the timing and range of variation of the forecast series approximate those of the measured series, although some forecast values slightly over- or underestimate the actual values. To evaluate the models' forecasting performance quantitatively, the evaluation errors [...] 8 and Table 3.
As shown in Figure 8, the timing and range of variation of the forecast series mostly agree with those of the actual series, but some forecast values were significantly over- or underestimated. Each model's forecasting performance can be evaluated quantitatively from the errors over the prediction horizon. Table 3 gives the prediction errors for both case studies; the smallest MAPE, MAE, and MSE values were obtained by the proposed CS-SVR model, which demonstrates its better forecasting performance. As listed in Table 3, the forecasting errors of the CS-SVR model were much smaller than those of the single SVR model: MAPE, MAE, and MSE decreased by 1.07%, 0.09 m/s, and 0.06 m/s for Case 3 and by 2.14%, 0.36 m/s, and 0.37 m/s for Case 4, respectively. However, the forecasting errors of the ERNN model are approximately the same as those of the CS-SVR model, which means that the ERNN also performed well in Case 3. These results do not indicate that any particular method is superior to the others in general; model performance under specific conditions should be analyzed and understood, and incremental improvements should be made based on the knowledge gained [40]. Overall, however, the CS-based SVR model clearly improved on the single SVR model, and the proposed method can be used for short-term wind speed forecasting over 45 min ahead.
The proposed BSO-SVR forecasting model outperformed the other two models on the averaged hourly wind speed data, and the CS-SVR model performed well in 10 min wind speed forecasting. Based on the STC analysis results, we believe that the two developed models can also be applied to the other two data sets because of their similar distributions and statistical characteristics.

Conclusions
As one of the most promising forms of renewable energy, wind power has been growing rapidly around the world. Given the significant influence of wind speed on wind power, considerable efforts have been devoted to wind speed prediction.
In this study, two intelligent hybrid models were proposed for short-term wind speed forecasting by integrating intelligent optimization into an SVR model whose inputs are determined by a CC-based analysis of spatial-temporal relationships. To determine the spatial and temporal relations between wind speed series collected from different wind turbines, CC analysis provided adequate information about the auto- and cross-correlations of the wind speed time series. To obtain accurate short-term wind speed forecasts with an ensemble of intelligent hybrid models, two different SVR models were considered: BSO-SVR and CS-SVR. BSO-SVR was used with the averaged hourly wind speed series at sites 1 and 2, and CS-SVR was used for 10 min wind speed prediction 3 steps ahead. The results demonstrate that the proposed hybrid models were more effective for short-term wind speed prediction than the single SVR and ERNN models, indicating that the hybrid models can improve prediction accuracy and forecasting efficiency.

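3.1. Performance Evaluation Criteria.
To evaluate the performance of the different models, three error criteria were adopted: the mean absolute error (MAE), the root mean squared error (denoted MSE in this paper), and the mean absolute percentage error (MAPE). They are defined as
$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N} \left| y(t) - \hat{y}(t) \right|,$$
$$\mathrm{MSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \left( y(t) - \hat{y}(t) \right)^{2}},$$
$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N} \left| \frac{y(t) - \hat{y}(t)}{y(t)} \right| \times 100\%,$$
where $y(t)$ and $\hat{y}(t)$ are the actual and forecast values at time $t$ and $N$ is the number of forecast points.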
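A direct NumPy transcription of the three criteria. Following the text's description of MSE as a root mean squared error, the function used for "MSE" returns the root of the mean squared error, in the units of the data (m/s for wind speed). MAPE assumes that no actual value is zero. Function names are ours.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, in the units of y."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean squared error (reported as 'MSE' in the text), in the units of y."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent; undefined if any actual value is zero."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

actual = [5.0, 6.0, 8.0]
forecast = [4.0, 6.5, 8.0]
scores = {"MAE": mae(actual, forecast), "MSE": rmse(actual, forecast), "MAPE": mape(actual, forecast)}
```

Because MAPE divides by the actual value, it weights errors at low wind speeds more heavily than MAE or RMSE do.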

Figure 3: The structure of the proposed intelligent hybrid models.

Table 1: The quantified cross correlation values.

Table 2: Forecasting errors over the prediction horizon, averaged over 1 h intervals, for Case 1 and Case 2.

Table 3: Forecasting errors over the prediction horizon, averaged over 10 min intervals, for Case 3 and Case 4.