Short-Term Wind Speed Forecasting Study and Its Application Using a Hybrid Model Optimized by Cuckoo Search

1Key Laboratory of Arid Climatic Change and Reducing Disaster of Gansu Province/Key Open Laboratory of Arid Climatic Change and Disaster Reduction of Chinese Meteorological Administration, Institute of Arid Meteorology, Chinese Meteorological Administration, Lanzhou 730020, China 2Gansu Meteorological Service Center, Lanzhou, Gansu 730020, China 3School of Mathematics & Statistics, Lanzhou University, Lanzhou, Gansu 730000, China 4MOE Key Laboratory of Western China’s Environmental Systems, Research School of Arid Environment & Climate Change, Lanzhou University, Lanzhou 730000, China 5Scientific Information Center for Resources and Environment, Lanzhou Branch of the National Science Library, Chinese Academy of Sciences, Lanzhou 730000, China 6Datong Meteorological Bureau of Shanxi Province, Datong 037010, China


Introduction
In the contemporary energy market, the demand for electricity is rising sharply with economic and social development, while reserves of fossil fuel for power generation are being depleted and ecological problems are increasing. Under these conditions, renewable, clean, and nonpolluting energy has become an alternative to fossil fuel, and wind energy satisfies these requirements. Meanwhile, with the increasing generation of wind power and its growing integration into the grid, electricity generation from wind energy has been playing an increasing role in China. The installed wind power capacity increased by approximately 200% between 2005 and 2009 [1]. Despite the high cost of wind power plants, wind power has unique advantages, especially at remote locations in China that are rich in wind energy resources.
Wind speed series from northwest China, namely, Wuwei City and Jinchang City in Gansu Province, have complex characteristics such as high volatility, nonstationarity, and nonlinearity. To operate efficiently in the wind power market, forecasting wind power production is essential for wind farm owners and helps producers make decisions on the sale of energy, thus increasing production and profits. If the wind speed for the following period can be predicted accurately, the total active power that each generator on a wind farm can produce can be determined. Wind speed prediction is therefore attracting increasing attention [2].
However, because of the complicated characteristics of wind speed, such as chaotic fluctuation, nonstationarity, and nonlinearity, forecasting remains a highly challenging task. To predict wind speed efficiently, research on wind power and wind speed forecasting has been devoted to developing reliable and effective tools, and many different methods have been reviewed and proposed [3][4][5].
Because wind speed is highly volatile and nonstationary, additional preprocessing techniques have been proposed to remove irregular wind speed fluctuations, such as the empirical mode decomposition (EMD) method [6][7][8] and the wavelet transform (WT) or wavelet denoising (WD) method [9][10][11][12][13][14]. WT represents a signal in both the frequency and time domains and has recently been applied extensively to analyze nonstationary and highly fluctuating series. It decomposes the original complicated data into several wavelet components, one of which is smooth and reflects the inherent, real information. Because many factors influence wind speed fluctuation, the wavelet transform is used as a preprocessing procedure to obtain better performance. In [13], WT is used to decompose the original wind speed series into detail and approximation signals to remove abnormal fluctuations before further modeling. Catalão et al. [9] propose artificial neural networks combined with WT for short-term wind power forecasting in Portugal; the proposed model is both effective and novel, outperforming persistence, ARIMA, and NN approaches. EMD is based on the local characteristic time scale of a signal and decomposes a complicated signal into a number of intrinsic mode functions (IMFs) for further modeling [15].
Zhang et al. [16] stated that the frequently used approaches to wind speed series forecasting can be classified into statistical models and artificial intelligence (AI) algorithms. The former establish time series models that predict future wind speed by mining the information contained in historical signals. Time series methods include the Autoregressive Integrated Moving Average (ARIMA) model, which is used for forecasting wind power at US wind farms in [17,18]; some of these models outperform the persistence model. The Autoregressive Conditional Heteroscedastic (ARCH) model is combined with the ARIMA model to take into account the heteroscedasticity of the residual series [19]. An ARMA-GARCH-M framework is employed to examine 26 regional wind power markets in the US using daily average wind speed [20]. That study revealed that wind speed displays time-varying volatility and that the relationship between the mean and the volatility of wind speed differs across locations. In [21], the ARIMA-ARCH model is employed to predict the wind speed series itself and is shown to perform better than a single ARIMA model.
The other approach consists of intelligent algorithms that build a nonlinear model fitting the historical wind speed series by minimizing the training error, such as artificial neural networks (ANN). ANN is widely used in many fields, such as forecasting stock prices [22], electricity prices [23], load [24,25], gas consumption [26], and wind speed [27,28]. A typical artificial neural network, the backpropagation neural network (BPNN) [29], learns a mapping from input vectors to outputs without requiring prior knowledge of the correlation between the data. It has been proven mathematically that a BPNN can implement any complicated nonlinear mapping and approximate an arbitrary nonlinear function with satisfactory accuracy [30]. By learning historical data patterns, a BPNN can be used effectively to predict a series over a new horizon. Similarly, support vector regression (SVR) is designed to capture nonlinear patterns in time series [1,31,32], and it has been observed to model nonlinear wind speed with excellent performance. Nevertheless, one disadvantage of this method is the difficulty of selecting the parameter values of the support vector machine, because the chosen values strongly affect its generalization performance. In this paper, metaheuristic optimization algorithms are applied to select these parameter values.
Because of the chaotic fluctuation, nonstationarity, and nonlinearity of wind speed series, hybrid models combining linear and artificial intelligence methods are popular in wind speed forecasting research. Liu et al. [33] proposed two hybrid methods, ARIMA-ANN and ARIMA-Kalman: the ARIMA model is used to determine the structure of the ANN and to initialize the measurement and state equations of the Kalman filter. Su et al. [34] combined ARIMA and the Kalman filter to predict the daily mean wind speed in western China; to adapt the hybrid model to the data set and increase fitting accuracy, this approach used Particle Swarm Optimization (PSO) to optimize the parameters of the ARIMA model. Both approaches perform well in wind speed forecasting. A hybrid ARIMA-ANN model is employed in [35]: ARIMA models forecast the linear pattern, and ANNs are then built on the resulting errors to forecast the nonlinear tendencies that ARIMA cannot identify. The results reveal that these hybrid models achieve higher forecasting accuracy than single ARIMA and ANN models.
A large amount of research has been directed to developing reliable and accurate wind speed and power prediction models. However, it is difficult to conclude which model is best, because a model that performs well at one site may not perform well at others; in other words, the best forecasting model at one site is not guaranteed to work well at another. This paper discusses forecasting accuracy at different sites and in different months based on a preprocessing method, and compares a new optimization algorithm with several conventional optimization algorithms used in the forecasting models. In most cases, statistical tools can provide accurate results for very short-term and short-term prediction. For medium-term and long-term horizons, however, the effect of atmospheric dynamics on wind speed becomes more important, so physical approaches become important in these cases. This paper explores the accuracy of very short-term (10-minute), 3-step forecasting using statistical approaches.
The main contributions of this paper are as follows. Several standard forecasting models (SVR, BP, and Elman) are used to forecast wind speed series, and each performs well. To further improve accuracy, two kinds of techniques are proposed in this paper. The first is the use of the 5-3 Hanning filter and wavelet denoising as a preprocessing procedure. The second is a new metaheuristic algorithm, cuckoo search, which is introduced to optimize the parameters of SVR and is compared with grid search (GS) and two conventional optimization algorithms (GA and PSO). To demonstrate that the proposed method is effective, electricity price data from New South Wales are also used to build the proposed models, with satisfactory results.
This paper is organized as follows. The theories underlying the proposed approach, including the 5-3 Hanning filter and WD, SVR, BP, Elman, and the optimization algorithms, are described in Section 2. Section 3 presents the proposed methods. Section 4 shows the numerical results and evaluates the forecasting performance in the case studies. Section 5 provides conclusions and suggestions.

The Related Methodology
2.1. The Data Preprocessing Method. Proper data preprocessing can effectively remove useless information, such as outliers and noise, from a time series. Because wind speed is highly volatile and nonstationary, preprocessing procedures are introduced to remove irregular wind speed fluctuations and outliers in the electricity price.

The Proposed 5-3 Hanning Filter (5-3H) Technique
5-3H is short for the five-three-Hanning median smoothing method ("five" denotes median-of-five smoothing, "three" median-of-three smoothing, and "H" Hanning smoothing). This method, presented by Tukey, applies three successive smoothing passes to the original data to generate the final smoothed estimates: five-point moving median smoothing, three-point moving median smoothing, and Hanning moving average smoothing. The flowchart of this method is illustrated in Figure 1.
Let the original data be $\{x(t), t = 1, 2, \ldots, n\}$, where $n$ is the length of the time series $x$. The procedure consists of the following steps.
Step 1. Five-point moving median smoothing. The first smoothed estimates are formed by taking the median of five consecutive original points:
$$y_1(t) = \operatorname{med}\{x(t-2), x(t-1), x(t), x(t+1), x(t+2)\}, \quad t = 3, \ldots, n-2.$$
Step 2. Three-point moving median smoothing. For the smoothed signal from the first step, three-point moving median smoothing is used to form the second smoothed estimates. Sorting $y_1(t-1), y_1(t), y_1(t+1)$ in ascending order as $y_{1(1)} \le y_{1(2)} \le y_{1(3)}$, the three-point moving median signal $\{y_2(t)\}$ is
$$y_2(t) = y_{1(2)} = \operatorname{med}\{y_1(t-1), y_1(t), y_1(t+1)\}, \quad t = 4, \ldots, n-3,$$
where $y_2(t)$ is the $t$th point of the second smoothed series. The six missing items at the two ends of $y_2(t)$ are then estimated separately.
Step 3. Hanning moving average smoothing. The Hanning filter is applied to the second smoothed signal to produce the final smoothed signal:
$$y_3(t) = \tfrac{1}{4}y_2(t-1) + \tfrac{1}{2}y_2(t) + \tfrac{1}{4}y_2(t+1), \quad t = 2, \ldots, n-1,$$
where $y_3(t)$ is the $t$th point of the final smoothed signal. The missing items at the ends of $y_3(t)$ are again estimated separately.
Step 4. Compute the median absolute deviation (MAD). MAD measures the absolute dispersion of the original data. With $M$ the median of $\{x(t)\}$,
$$M = \operatorname{med}\{x(t)\}, \qquad \mathrm{MAD} = \operatorname{med}\{|x(t) - M|\}.$$
Step 5. Set a threshold to remove outliers and smooth the data.
In this paper, the threshold value is set to 0.3. Original data points that need to be replaced are replaced by a smoothed replacement series, selected through a logical variable $\delta(t)$ that takes the value 0 or 1; this yields the preliminary 5-3H values. Finally, the eight values at the beginning and end of the preliminary 5-3H series ($t = 1, 2, 3, 4, n-3, n-2, n-1, n$) are replaced to obtain the final 5-3H values.
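The 5-3H procedure above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' implementation. Boundary points that a smoothing window cannot cover are carried over from the previous stage. The outlier rule is a hypothetical stand-in: a point is replaced by its final smoothed value when its distance from that value exceeds `threshold` times the MAD. The exact end-point and replacement formulas are not spelled out in the text.

```python
from statistics import median


def five_three_hanning(x):
    """5-3H smoothing: running median of 5, running median of 3, then Hanning.

    Points a window cannot cover are carried over from the previous stage
    (an assumed boundary rule, not the paper's end-point formulas).
    """
    n = len(x)
    y1 = list(x)
    for t in range(2, n - 2):              # Step 1: median of five
        y1[t] = median(x[t - 2:t + 3])
    y2 = list(y1)
    for t in range(1, n - 1):              # Step 2: median of three
        y2[t] = median(y1[t - 1:t + 2])
    y3 = list(y2)
    for t in range(1, n - 1):              # Step 3: Hanning weights 1/4, 1/2, 1/4
        y3[t] = 0.25 * y2[t - 1] + 0.5 * y2[t] + 0.25 * y2[t + 1]
    return y3


def replace_outliers(x, threshold=0.3):
    """Replace points far from the 5-3H curve by their smoothed values.

    Hypothetical rule: delta(t) = 1 when |x(t) - smoothed(t)| > threshold * MAD.
    """
    smooth = five_three_hanning(x)
    m = median(x)
    mad = median(abs(v - m) for v in x)    # Step 4: median absolute deviation
    out = []
    for v, s in zip(x, smooth):
        delta = abs(v - s) > threshold * mad   # Step 5: logical indicator
        out.append(s if delta else v)
    return out
```

For example, `replace_outliers([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])` replaces the spike at index 3 by the smoothed level 1.0. Because the threshold is a small multiple of the MAD, the rule also pulls many ordinary points onto the smoothed curve. This matches the remark that 5-3H both detects outliers and smooths the data.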

Wavelet Transform (WT).
The WT method is an effective mathematical tool for analyzing a signal by decomposing it into various frequencies. WTs fall into two kinds: the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT), in which the wavelets are discretely sampled. A key advantage of WTs over Fourier transforms is their temporal resolution: they capture both frequency and location (time) information. In this work, DWT is used to decompose the original wind speed data.
WT decomposes a signal into several detail components and an approximation component. The approximation component contains the low-frequency information, which is the most essential part for identifying the signal, while the detail components reveal its noise. Figure 2 shows a wavelet decomposition tree illustrating the procedure. First, the original data are decomposed into an approximation component $A_1$ and a detail component $D_1$; then $A_1$ is further decomposed into another approximation component $A_2$ and detail component $D_2$ if the signal needs to be analyzed at a higher resolution level. This process continues until a suitable number of levels is reached.
The original wind speed data is decomposed into several components, one approximation component and multiple detail components, to reflect the characteristics of the wind speed data on different levels.The approximation is designed to present the main trend of the original wind speed and the details are designed to present the stochastic volatilities on different levels.A suitable number of levels can be determined by comparing the similarity between the approximation and the original wind speed.
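As a minimal illustration of one-level decomposition, thresholding, and reconstruction, the sketch below uses the Haar wavelet and soft thresholding of the detail coefficients. Both are deliberate simplifications. The method in this paper uses the db3 basis with Birgé-Massart threshold selection, which would normally come from a wavelet library. Here `thr` is a hypothetical, user-chosen threshold.

```python
import math

SQRT1_2 = 1.0 / math.sqrt(2.0)


def haar_dwt(x):
    """One-level Haar DWT: returns (approximation A1, detail D1)."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even-length series")
    approx = [(x[k] + x[k + 1]) * SQRT1_2 for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) * SQRT1_2 for k in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction up to rounding)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT1_2, (a - d) * SQRT1_2])
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def wavelet_denoise(x, thr):
    """Decompose to one level, threshold only D1, and reconstruct."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, thr))
```

With `thr = 0` the series is reconstructed unchanged. With a very large `thr` every detail coefficient is removed, and each pair of points collapses to its mean, which is the trend carried by $A_1$.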

Artificial Intelligence Algorithm.
Zhang et al. [16] considered statistical models imperfect for forecasting. Most statistical models assume that the data are normally distributed, whereas wind speed series are not [36]. Additionally, the stochastic and intermittent characteristics of wind speed series require more complex models and functions to capture nonlinear trends and relations. Statistical models, by contrast, are built on the hypothesis that a linear correlation structure exists among time series values [37]. Consequently, wind speed series are difficult to forecast accurately with statistical models. To address these problems, AI models, mainly ANN and SVR, have received increasing attention for accurate short-term wind speed prediction.

Artificial Neural Network (ANN).
ANN consists of interconnected artificial neurons that are programmed to imitate the properties of biological neurons. It has been widely used in time series forecasting, especially for non-normally distributed data such as wind speed.

Backpropagation Neural Network (BPNN).
In this work, a backpropagation (BP) network is adopted as one of the comparative approaches for short-term wind speed forecasting. As shown in Figure 3, a BP network contains an input layer, at least one hidden layer, and an output layer, which map an input vector to an output scalar through the activation functions of the neurons. With $p$ inputs and $q$ hidden neurons, the output $h_j$ of the $j$th hidden node can be calculated as
$$h_j = f_h\left(\sum_{i=1}^{p} w_{ij}\, y_{t-i}\right), \quad j = 1, \ldots, q,$$
where $w_{ij}$ denotes the connection weight from the $i$th input node to the $j$th hidden node, $y_{t-i}$ is the past wind speed $i$ steps behind, and $f_h(\cdot)$ denotes a sigmoid activation function in the hidden layer. The wind speed prediction can then be estimated by
$$\hat{y}_t = f_o\left(\sum_{j=1}^{q} v_j\, h_j\right),$$
where $v_j$ denotes the connection weight from the $j$th hidden node to the output node, $\hat{y}_t$ denotes the forecasted wind speed at the $t$th sampling instant, and $f_o(\cdot)$ is a linear activation function for the output layer. The nonlinear mapping capability of the ANN is achieved by minimizing the overall error between the actual wind speed $y_t$ and the predicted wind speed $\hat{y}_t$ with the Levenberg-Marquardt (LM) algorithm [38].
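The forward pass described above can be written directly. This is a sketch that assumes a logistic sigmoid for the hidden activation, the identity for the output activation, and no bias terms, matching the equations as written. Training the weights, for example with the Levenberg-Marquardt algorithm, is not shown.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used here as the hidden-layer activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-z))


def bp_forward(past, w, v):
    """One-hidden-layer forward pass.

    past[i] holds y_{t-(i+1)}, w[i][j] is the weight from input i to hidden
    node j, and v[j] is the weight from hidden node j to the output.
    """
    q = len(v)
    hidden = [sigmoid(sum(w[i][j] * past[i] for i in range(len(past))))
              for j in range(q)]
    return sum(v[j] * hidden[j] for j in range(q))   # linear output activation
```

For instance, with all input-to-hidden weights set to zero, every hidden node outputs 0.5. The forecast then reduces to half the sum of the output weights.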

Elman Recurrent Neural Network (ERNN).
The Elman recurrent neural network (ERNN), originally proposed by Elman and described in Tong et al. [39], is a well-known recurrent topology. In a typical ERNN, the hidden-layer neurons are fed by the outputs of both the context neurons and the input neurons (Figure 4). The context neurons hold the previous states (memory units) of the hidden-neuron outputs [40]. This recurrent topology makes the ERNN more sensitive to historical data and increases its capacity to deal with dynamic information. In addition, state variables need not be used as inputs or training data. The network's dynamic characteristics come from its internal connections, which make it more suitable for modeling time-varying systems. This is also an important factor that makes the ERNN superior to feed-forward neural networks such as multilayer perceptrons (MLP) and radial basis function (RBF) networks.
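The role of the context units can be made concrete with a forward pass over a sequence. This sketch is an illustration, not the paper's configuration. It assumes the context vector starts at zeros, the hidden activation is `tanh`, and the output is linear. After every time step the hidden outputs are copied into the context units, so each hidden neuron sees both the current input and the previous hidden state.

```python
import math


def elman_forward(inputs, w_in, w_ctx, v):
    """Run an Elman network over a sequence of input vectors.

    w_in[j][i]: input i -> hidden j; w_ctx[j][k]: context k -> hidden j;
    v[j]: hidden j -> output. Returns one output per time step.
    """
    q = len(v)
    context = [0.0] * q                    # empty memory at the start (assumed)
    outputs = []
    for x in inputs:
        hidden = []
        for j in range(q):
            z = sum(w_in[j][i] * x[i] for i in range(len(x)))
            z += sum(w_ctx[j][k] * context[k] for k in range(q))
            hidden.append(math.tanh(z))
        outputs.append(sum(v[j] * hidden[j] for j in range(q)))   # linear output
        context = hidden                   # copy hidden state into context units
    return outputs
```

Setting the context weights to zero turns the sketch into a plain feed-forward network. The second output of `elman_forward([[1.0], [0.0]], [[1.0]], [[1.0]], [1.0])` is nonzero even though the second input is 0, because the network remembers the first input.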
Because the ERNN training algorithm is mainly based on gradient descent, it may cause a number of problems [41]. (1) Network convergence is slow, and training may have low learning efficiency. (2) Because the network structure and weights are not trained concurrently, good dynamic approximation performance cannot be guaranteed. (3) Lacking global search capability, the network easily falls into a local optimum.

Support Vector Regression (SVR).
Developed by Vapnik [42], support vector machines (SVMs) are among the most widely used models based on statistical learning theory. A nonlinear mapping $\varphi(\cdot): \mathbb{R}^{n} \to \mathbb{R}^{n_h}$ is defined to map the training data set (input data) $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ into a high-dimensional feature space $\mathbb{R}^{n_h}$, which may have infinite dimensions (Figure 5). In this feature space, there theoretically exists a linear function $f$ that formulates the nonlinear relationship between input and output data. This linear function, namely, the SVR function, is given as follows:
$$f(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b \tag{14}$$
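where $f(\mathbf{x})$ is the forecasting value and the coefficients $\mathbf{w}$ ($\mathbf{w} \in \mathbb{R}^{n_h}$) and $b$ ($b \in \mathbb{R}$) are adjustable. As mentioned above, the SVM method aims to minimize the empirical risk
$$R_{emp}(f) = \frac{1}{N}\sum_{i=1}^{N}\Theta_{\varepsilon}\big(y_i, f(\mathbf{x}_i)\big),$$
where $\Theta_\varepsilon(y, f(\mathbf{x}))$ is the $\varepsilon$-insensitive loss function (Figure 6), defined as
$$\Theta_\varepsilon\big(y, f(\mathbf{x})\big) = \begin{cases} |f(\mathbf{x}) - y| - \varepsilon, & |f(\mathbf{x}) - y| \ge \varepsilon, \\ 0, & \text{otherwise}. \end{cases}$$
In addition, $\Theta_\varepsilon$ is employed to find an optimal hyperplane in the high-dimensional feature space (Figure 5) that maximizes the distance separating the training data. Thus, SVR focuses on finding the optimal hyperplane while minimizing the $\varepsilon$-insensitive loss, that is, the training error on the training data.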
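SVR then minimizes the overall error
$$\min_{\mathbf{w}, b, \xi, \xi^*} R(\mathbf{w}, \xi, \xi^*) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i=1}^{N}(\xi_i + \xi_i^*) \tag{17}$$
subject to the constraints
$$y_i - \mathbf{w}^{T}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i^*, \qquad \mathbf{w}^{T}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i, \qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N.$$
The first term in (17) employs the concept of maximizing the distance between two separated training data sets. It regularizes weight sizes, penalizes large weights, and maintains the flatness of the regression function. The second term, $C\sum_{i=1}^{N}(\xi_i + \xi_i^*)$, penalizes the training errors between $f(\mathbf{x})$ and $y$ through the $\varepsilon$-insensitive loss function, and $C$ is a parameter that trades off the two terms. Training errors above $\varepsilon$ are denoted $\xi_i^*$, and training errors below $-\varepsilon$ are denoted $\xi_i$ (Figure 6).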
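The $\varepsilon$-insensitive loss and the objective (17) are easy to evaluate for given predictions. For a fixed weight vector, the smallest feasible slack for each sample equals its $\varepsilon$-insensitive loss. The sketch below therefore computes (17) by substituting the loss for the slacks. It only evaluates the objective and does not solve the optimization. The weight vector is passed explicitly as a plain list, which assumes a finite-dimensional feature space.

```python
def eps_insensitive(y, f, eps):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    r = abs(f - y)
    return r - eps if r >= eps else 0.0


def primal_objective(w, y, f, eps, C):
    """0.5 * w^T w + C * sum of slacks, with each slack set to its minimal
    feasible value, i.e. the eps-insensitive loss of that sample."""
    reg = 0.5 * sum(wi * wi for wi in w)
    slack = sum(eps_insensitive(yi, fi, eps) for yi, fi in zip(y, f))
    return reg + C * slack
```

A larger `C` makes training errors outside the tube more expensive relative to the flatness term. This is the trade-off that the parameter search in the following sections tunes.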
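After this quadratic optimization problem with inequality constraints is solved, the parameter vector $\mathbf{w}$ in (14) is obtained as
$$\mathbf{w} = \sum_{i=1}^{N}(\beta_i^* - \beta_i)\,\varphi(\mathbf{x}_i),$$
where $\beta_i^*$ and $\beta_i$ are the Lagrange multipliers, obtained by solving a quadratic problem. Finally, the SVR regression function is obtained in the dual space as
$$f(\mathbf{x}) = \sum_{i=1}^{N}(\beta_i^* - \beta_i)\,K(\mathbf{x}_i, \mathbf{x}) + b,$$
where $K(\mathbf{x}_i, \mathbf{x}_j)$ is called the kernel function. Its value equals the inner product of the feature-space images $\varphi(\mathbf{x}_i)$ and $\varphi(\mathbf{x}_j)$ of the two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$; that is, $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. Any function that satisfies Mercer's condition [42] can be used as the kernel function.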
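Once the Lagrange multipliers are known, a prediction needs only kernel evaluations against the training points. The sketch assumes a Gaussian kernel parameterized as $K(\mathbf{a}, \mathbf{b}) = \exp(-\lVert \mathbf{a} - \mathbf{b} \rVert^2 / (2\sigma^2))$. Other parameterizations, such as $\gamma = 1/(2\sigma^2)$, are also common. The multiplier differences are taken as given; obtaining them requires a quadratic-programming solver such as the one in LIBSVM.

```python
import math


def gaussian_kernel(a, b, sigma):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)); this parameterization is assumed."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def svr_predict(x, support, coef, b, sigma):
    """Dual SVR prediction: sum_i coef[i] * K(x_i, x) + b,
    where coef[i] stands for the multiplier difference beta_i* - beta_i."""
    return sum(c * gaussian_kernel(xi, x, sigma)
               for xi, c in zip(support, coef)) + b
```

Only training points with nonzero multiplier differences (the support vectors) contribute to a prediction. The kernel width `sigma` controls how quickly that contribution decays with distance.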

Artificial Intelligence Optimization Algorithm.
Empirical results show that the choice of the two SVR parameters, $C$ and $\sigma$ (the parameter of the Gaussian kernel function), significantly influences forecasting accuracy. To further improve the accuracy of wind speed forecasting, we employ different metaheuristic algorithms (GA, PSO, and CS) to determine these parameters and to identify which algorithm suits specific data patterns.

Genetic Algorithm (GA).
GA was first developed by John Holland et al. in the 1960s. It is an effective algorithm for nonlinear global optimization inspired by the process of biological evolution. Owing to its simplicity and robustness, it is especially suitable for solving complicated optimization problems and has been used extensively in various forecasting and optimization fields. The GA procedure is as follows.
(i) Select a group of random candidate solutions.
(ii) Iterate the following steps until the stopping criteria are met: (1) compute the fitness values of the candidate solutions according to the fitness criterion; (2) produce the next generation by fitness-proportionate selection (solutions with higher fitness are more likely to be chosen); (3) apply crossover and mutation operations to the candidate solutions to generate new ones.
(iii) Return the solutions.
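The GA loop above can be sketched for a minimization problem such as choosing SVR parameters. The operators are illustrative choices, not the paper's settings. Selection is roulette-wheel on the transformed score 1/(1 + f), which assumes f ≥ 0, as for an error measure. Crossover is one-point, and mutation re-samples a gene uniformly within its bounds. An elitism step carries the best solution into each new generation.

```python
import random


def genetic_algorithm(fitness, bounds, pop_size=30, generations=100,
                      crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize fitness(x) over the box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # (i) random initial population
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        # (1) fitness values; 1 / (1 + f) turns a non-negative error into a score
        scores = [1.0 / (1.0 + fitness(ind)) for ind in pop]
        # (2) fitness-proportionate (roulette-wheel) selection
        parents = rng.choices(pop, weights=scores, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            # (3a) one-point crossover
            if dim > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, dim)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children.extend([a, b])
        # (3b) mutation: re-sample a gene uniformly within its bounds
        for child in children:
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[d] = rng.uniform(lo, hi)
        children[0] = best[:]              # elitism (assumed): keep the best so far
        pop = children
        best = min(pop, key=fitness)
    # (iii) return the best solution found
    return best
```

For SVR tuning, `fitness` would be a validation error of a model trained with the candidate parameters, and `bounds` the search ranges of the two parameters. Both are hypothetical here.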

Particle Swarm Optimization (PSO).
The PSO algorithm was first proposed by Kennedy and Eberhart [43], inspired by the social behavior of animals moving in large groups (birds and insects in particular). Like other swarm-based techniques, the algorithm contains a number of individuals that refine their knowledge of the given search space. In this search space, the individuals, called particles, each have a position and a velocity. The PSO algorithm works by attracting the particles to positions of high fitness in the search space. Each particle has a memory that adjusts its trajectory according to two pieces of information: the best position it has visited so far and the global best position attained by the whole swarm. The whole swarm can be considered a society. The first piece of information results from the particle's memory of its past states, and the second results from the collective experience of all individuals in the society. A fitness evaluation function in PSO evaluates each particle's position and assigns it a fitness value. The position with the highest fitness value visited by the whole swarm is the global best, which every particle remembers. The position with the highest fitness value visited by an individual particle is called its local (personal) best.
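The description above does not state the update equations. The sketch below uses the common global-best form with an inertia weight: $v \leftarrow w v + c_1 r_1 (p_{best} - x) + c_2 r_2 (g_{best} - x)$ and $x \leftarrow x + v$, with $r_1, r_2 \sim U(0,1)$. The values $w = 0.7$ and $c_1 = c_2 = 1.5$ are typical defaults, not values from this paper, and positions are clipped to the search box. Because the problem is minimization, "highest fitness" here means the lowest objective value.

```python
import random


def pso(fitness, bounds, n_particles=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO for minimization; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal (local) best positions
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best of the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # swarm experience
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The two attraction terms correspond directly to the two pieces of information in the text: the particle's own best position and the swarm's best position.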

Cuckoo Search (CS).
The cuckoo search (CS) algorithm is a metaheuristic optimization algorithm proposed by Yang and Deb in 2009 [44]. It is based on stochastic global search and on the obligate brood-parasitic behavior of cuckoos, which lay their eggs in the nests of host birds. In this algorithm, each nest represents a potential solution. Cuckoos choose recently spawned nests so that their eggs are likely to hatch first, because a cuckoo egg usually hatches earlier than the host bird's eggs. In addition, by mimicking the host chicks, a cuckoo chick can deceive the host bird into providing it with more food. If the host birds discover an alien cuckoo egg (with probability $p_a$), they either throw the egg away or abandon the nest and build a completely new nest at a new location. New eggs (solutions) laid by cuckoos choose nests through Lévy flights around the current best solutions, and the Lévy flight behavior improves the efficiency of the local search.
In sum, cuckoo search combines two search capabilities, global search (diversification) and local search (intensification), controlled by a switching/discovery probability $p_a$. Yang and Deb simplified the cuckoo's parasitic breeding process into the following three idealized rules.
(i) Each cuckoo lays only one egg at a time and randomly chooses a nest in which to lay it.
(ii) High-quality eggs survive to the next generation.
(iii) The host bird of the nest in which a cuckoo lays its egg can discover the alien egg with probability $p_a \in [0, 1]$.
The host bird then either throws the egg out of the nest or abandons the nest and builds a new one at a new location. The number of available nests is fixed under these rules.
To better understand these rules, they can be transformed into the following steps.
Step 1. A cuckoo randomly chooses a nest in which to lay only one egg. An egg represents a potential best solution.
Step 2. To maximize the probability that their eggs survive, cuckoos search for the most suitable nests by Lévy flights. According to the elitist selection principle, the best egg (the solution with the minimum objective value) survives to the next generation and has the opportunity to grow into a mature cuckoo. This step gives the algorithm its intensification ability.
Step 3. The number of available nests (the population) is fixed. An alien egg laid by a cuckoo is discovered by the host with probability $p_a \in [0, 1]$. The egg is then thrown away, or the host abandons the nest and builds a completely new one at a new location (a new random solution). This step gives the algorithm its diversification ability.
For minimization problems, the quality or fitness value may be taken as the reciprocal of the objective function. Each egg in a nest represents a solution, and a cuckoo egg represents a new solution; therefore, no distinction is made between an egg, a nest, and a solution.
When a new solution $x_i^{(t+1)}$ is generated for, say, cuckoo $i$, a Lévy flight is performed as follows:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \mathrm{Levy}(\lambda), \tag{21}$$
where $\alpha > 0$ is the step size, which should be related to the scale of the problem of interest; in most cases, $\alpha = 1$ is used. Equation (21) is essentially the stochastic equation of a random walk. A random walk is a Markov chain whose next state depends only on the current state (the first term in (21)) and the transition probability (the second term in (21)). The product $\oplus$ denotes entrywise multiplication, similar to that used in PSO. However, the random walk via Lévy flight explores the search space more efficiently, because its step length is much longer in the long run.
The Lévy flight provides a random walk whose random step length is drawn from a Lévy distribution,
$$\mathrm{Levy} \sim u = t^{-\lambda}, \tag{22}$$
which has an infinite variance with an infinite mean. The steps thus form a random walk with a power-law, heavy-tailed step-length distribution. Some of the new solutions are generated by Lévy flights around the best solution obtained so far, which speeds up the local search (intensification). However, a substantial fraction of the new solutions should be generated by far-field randomization (diversification), at locations sufficiently far from the current best solution; this ensures that the system is not trapped in a local optimum.
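The rules and equations above can be combined into a compact cuckoo search for minimization. Several details are assumptions rather than the paper's specification. Lévy steps are drawn with Mantegna's algorithm (exponent `beta = 1.5`), and each Lévy step is scaled by the distance to the current best nest, a choice borrowed from common CS implementations. A new egg replaces a randomly chosen nest only if it is better. Each non-best nest is abandoned with probability `pa` and rebuilt at a uniformly random location. The step size `alpha = 1` follows the text.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed Levy step with Mantegna's algorithm (assumed generator)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(fitness, bounds, n_nests=15, iterations=200, pa=0.25,
                  alpha=1.0, seed=0):
    """Minimize fitness(x) over the box `bounds` with a basic cuckoo search."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    vals = [fitness(x) for x in nests]
    best = min(range(n_nests), key=vals.__getitem__)
    for _ in range(iterations):
        b = nests[best]
        for i in range(n_nests):
            # Levy flight (21), entrywise; scaled by distance to the best (assumed)
            new = clip([x + alpha * levy_step(rng) * (x - bx)
                        for x, bx in zip(nests[i], b)])
            f_new = fitness(new)
            j = rng.randrange(n_nests)     # the egg lands in a random nest
            if f_new < vals[j]:            # the better egg survives
                nests[j], vals[j] = new, f_new
        best = min(range(n_nests), key=vals.__getitem__)
        # Discovery: each non-best nest is abandoned with probability pa
        for k in range(n_nests):
            if k != best and rng.random() < pa:
                nests[k] = random_nest()
                vals[k] = fitness(nests[k])
        best = min(range(n_nests), key=vals.__getitem__)
    return nests[best], vals[best]
```

The Lévy-flight loop provides intensification around good nests, and the abandonment loop supplies the far-field randomization (diversification) described above. Because the best nest is never abandoned, the best value never gets worse from one iteration to the next.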
The simple flowchart of the cuckoo search algorithm is presented in Figure 7.

The Proposed Hybrid Model
Because of the high volatility, nonstationarity, and nonlinearity of wind speed series, several tools are introduced to preprocess the data so as to make accurate forecasts.
The procedure for applying the proposed method to predict the 10 min wind speed is illustrated in Figure 8 and described as follows.
Step 1. Apply the 5-3H method to detect outliers and replace them with 5-3H values.
In this step, after a large number of experiments, we set the threshold parameter to 0.3. The results show that 5-3H not only detects outliers effectively but also smooths the original data to some extent, capturing most of the trend of the wind speed.
However, slight white noise still remains in the series after 5-3H. Hence, further smoothing via wavelet denoising is necessary in Step 2.
Step 2. Denoise the 5-3H values by wavelet decomposition with the db3 wavelet basis function, and reconstruct the series.
In this approach, we adopt db3 as the wavelet basis function and decompose the data to only one level. Because the data have already been smoothed by 5-3H, many experiments showed that one-level decomposition gives the best denoising, whereas deeper decomposition denoises excessively and removes useful information from the original data. For threshold selection, we use the popular Birgé-Massart method. After filtering, the high-frequency components of the wind speed (i.e., white noise) are suppressed, making the series smoother and better suited to forecasting.
Step 3. Use three popular artificial intelligence algorithms, BP, Elman, and SVR, to fit the models and predict the values for the following day.
We find that SVR performs best among these three models. To further improve its performance, Step 4 applies three artificial intelligence optimization algorithms.
Step 4. Apply GA, PSO, and CS to optimize the two main parameters of SVR, and compare them with the conventional grid search approach.
In this paper, grid search, a nonheuristic algorithm, is used to search for the best SVR parameters $C$ and $\sigma$. In principle, grid search can find the best accuracy (the global optimum over the grid). However, searching a larger range of $C$ and $\sigma$ this way is time-consuming, whereas metaheuristic algorithms can approach the global optimum more efficiently. Therefore, to further improve forecasting accuracy, GA, PSO, and CS are employed to search for the two main parameters of SVR.
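For contrast, grid search evaluates every pair on a fixed grid, so its cost is the product of the grid sizes; the metaheuristics above aim to reduce this cost. In the sketch, `objective(C, sigma)` is a hypothetical callable that would return, for example, the validation error of an SVR trained with those parameters. The log2-spaced grids follow a common convention and are not the grids used in this paper.

```python
import itertools


def grid_search(objective, c_grid, sigma_grid):
    """Evaluate objective(C, sigma) at every grid point; return (C, sigma, score)."""
    best = None
    for C, sigma in itertools.product(c_grid, sigma_grid):
        score = objective(C, sigma)        # e.g. validation error of a fitted SVR
        if best is None or score < best[2]:
            best = (C, sigma, score)
    return best


# Hypothetical log2-spaced grids: 11 x 10 = 110 model fits in total
c_grid = [2.0 ** k for k in range(-5, 16, 2)]
sigma_grid = [2.0 ** k for k in range(-15, 4, 2)]
```

Refining the grid by a factor of two in each direction roughly quadruples the number of model fits. This is why the text turns to metaheuristic search for larger parameter ranges.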

Analysis and Discussion of the Applicative Case Studies
4.1. Data Presentation. Three cases are introduced to validate the proposed forecasting method. The first two are 10 min average wind speed series from 70-meter wind towers at two sites. They cover four seasonal months (January, April, July, and October 2011), which represent the four quarters of the year. The first site is located at Jiling Shoal, Jinchang City (longitude 101.7999°E, latitude 38.5248°N, altitude 2195 m). The second wind tower is at Qingtu Lake, Wuwei City (longitude 103.6201°E, latitude 39.1031°N, altitude 1298 m). For each wind tower and each month, 744 samples are drawn and 3-step forecasting is performed.
The first 600 samples are used to build the model, which then predicts the remaining 144 (48 × 3) points (144 × 10 minutes, i.e., a whole day). To further validate the generality of the approach, electricity price data from New South Wales (NSW) in January 2012 are used as the third case. The raw data of the three cases are illustrated in Figures 9-11.
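One way to read the "48 × 3" split is as a rolling-origin scheme. Starting after the 600 training samples, the model issues a 3-step forecast, the origin advances by three observed points, and this repeats 48 times until all 144 test points are covered. The sketch below only enumerates these forecast windows, using 0-based indices. Whether the model is refitted at each origin is not specified in the text and is left open.

```python
def rolling_forecast_windows(n_total=744, n_train=600, horizon=3):
    """List (origin, target_indices) pairs for rolling multi-step forecasts.

    Data before `origin` are available when forecasting the `horizon`
    points that start at `origin`; indices are 0-based.
    """
    return [(origin, list(range(origin, min(origin + horizon, n_total))))
            for origin in range(n_train, n_total, horizon)]
```

Under this reading, the 48 windows cover indices 600 to 743 exactly once. That is 144 points × 10 min = 1440 min, or one full day.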

Error Evaluation.
Table 1 shows the results of proposed intelligent algorithms to forecasting the wind speed by 3 steps in Jiling Shoal and Qingtu Lake and electricity price by 3 steps in NSW in 2011.We refer to PBP, PElman, and PSVR as prediction method after preprocessing.To validate the proposed approach, we mainly contrast the results of PBP and BP, PElman and Elman, and SVR and PSVR.The mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE) are utilized to scale the prediction accuracy of these three models [45].The MAE values can be calculated by and the values of MSE can be computed by and the MAPE values can be calculated by V best glob Molecular mechanics

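The decrease in each error criterion is measured by the relative error (RE):

$$\mathrm{RE}_{ij} = \frac{E_{\mathrm{ref},ij} - E_{ij}}{E_{\mathrm{ref},ij}} \times 100\%,$$

where $E$ is one of the four error criteria (MAE, MSE, MAPE, or SMAPE), the reference model in our case is the model without preprocessing, model $i$ ($i = 1, 2, 3$) represents one of the three models, and $j$ ($j = 1, 2, 3, 4$) stands for one of the four seasonal months. A positive RE therefore means that preprocessing reduced the error. The RE values are listed in Table 2 and illustrated in Figure 15. Additionally, to provide a comprehensive evaluation of the performance of the proposed methods, the average (Ave.) criterion is introduced, computed as

$$\mathrm{Ave.}_{i} = \frac{1}{4}\sum_{j=1}^{4}\mathrm{RE}_{ij}.$$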
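A minimal Python sketch of the RE and Ave. computations follows. It assumes that RE is the percentage reduction of an error criterion relative to the reference model without preprocessing (so a positive value means preprocessing helped) and that Ave. is the mean RE over the four seasonal months. The error values in the usage example are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def relative_error(reference_error: float, model_error: float) -> float:
    """Percentage decrease of an error criterion relative to the reference model.

    Positive when the model (e.g. PBP) has a smaller error than its reference (e.g. BP).
    """
    if reference_error == 0:
        raise ValueError("reference error must be non-zero")
    return 100.0 * (reference_error - model_error) / reference_error


def average_re(reference_errors: Sequence[float], model_errors: Sequence[float]) -> float:
    """Ave.: mean RE over paired months (assumed here to be January, April, July, October)."""
    if len(reference_errors) != len(model_errors) or not reference_errors:
        raise ValueError("need non-empty, equally long error lists")
    return mean(relative_error(r, m) for r, m in zip(reference_errors, model_errors))


if __name__ == "__main__":
    # Made-up MAE values of BP (reference) and PBP for the four months.
    bp_mae = [1.0, 0.8, 1.2, 0.9]
    pbp_mae = [0.8, 0.7, 0.9, 0.9]
    print([round(relative_error(r, m), 2) for r, m in zip(bp_mae, pbp_mae)])
    print(round(average_re(bp_mae, pbp_mae), 3))
```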

Simulation Results Analysis
Steps 1 and 2. Apply the 5-3H method and wavelet denoising to preprocess the wind speed series at Jiling Shoal in the four seasonal months of 2011.
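Assuming 5-3H refers to the classical 53H smoother (a running median of span 5, then a running median of span 3, then Hanning weighting 1/4, 1/2, 1/4), its outlier-filtering role can be sketched in Python as follows. A point is flagged when it deviates from the smoothed estimate by more than a threshold k and is then replaced by that estimate. The endpoint handling and the threshold k are hypothetical choices, not the paper's settings.

```python
from statistics import median
from typing import List, Sequence


def running_median(x: Sequence[float], width: int) -> List[float]:
    """Centered running median; windows are truncated at the ends (an assumption)."""
    half = width // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def hanning(x: Sequence[float]) -> List[float]:
    """Hanning weights 1/4, 1/2, 1/4; the two end points are left unchanged."""
    if len(x) < 3:
        return list(x)
    inner = [0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1] for i in range(1, len(x) - 1)]
    return [x[0], *inner, x[-1]]


def smooth_53h(x: Sequence[float]) -> List[float]:
    """Running median of 5, then running median of 3, then Hanning smoothing."""
    return hanning(running_median(running_median(x, 5), 3))


def filter_outliers_53h(x: Sequence[float], k: float) -> List[float]:
    """Replace points deviating from the 53H estimate by more than k (hypothetical threshold)."""
    estimate = smooth_53h(x)
    return [e if abs(v - e) > k else v for v, e in zip(x, estimate)]


if __name__ == "__main__":
    wind = [5.0, 5.2, 5.1, 15.0, 5.3, 5.2, 5.4]  # made-up series with one spike
    print(filter_outliers_53h(wind, k=2.0))      # the spike 15.0 is replaced by about 5.225
```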
In Figures 12-13, the first to fourth subplots show the original and preprocessed wind speed series in January, April, July, and October, respectively. Figure 14 illustrates the raw electricity price series in NSW. The preprocessed data are visibly smoother as a result of 5-3H outlier filtering and wavelet denoising. The 5-3H algorithm not only detects outliers but, more importantly, also smooths the data. With these preprocessing approaches, the trends of the wind speed and electricity price series become clearer and easier to forecast, as the next step illustrates.
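Wavelet denoising can be illustrated with a deliberately simple pure-Python sketch: a multilevel Haar transform, soft thresholding of the detail coefficients with the universal threshold σ√(2 ln N), and reconstruction. Here σ is estimated from the median absolute value of the finest-level details divided by 0.6745. The Haar wavelet, the number of levels, and the threshold rule are assumptions for illustration; the paper's wavelet basis and threshold may differ.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert haar_step."""
    s = 1.0 / math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (soft thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def wavelet_denoise(x: Sequence[float], levels: int = 2) -> List[float]:
    """Haar multilevel denoising with the universal threshold sigma * sqrt(2 ln N)."""
    n = len(x)
    if levels < 1 or n % (2 ** levels) != 0:
        raise ValueError("this sketch needs levels >= 1 and a length divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    # Noise level from the finest-scale details: median absolute value / 0.6745.
    sigma = median(abs(v) for v in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))
    for d in reversed(details):  # reconstruct from the coarsest level upward
        approx = inverse_haar_step(approx, [soft_threshold(v, thr) for v in d])
    return approx


if __name__ == "__main__":
    noisy = [5 + 0.1 * (-1) ** i for i in range(8)]  # made-up signal with alternating noise
    print([round(v, 6) for v in wavelet_denoise(noisy)])
```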
Step 3. Use three popular artificial intelligence algorithms, BP, Elman, and SVR, to fit the models and predict the values for the following day. In conclusion, both Table 2 and Figure 15 demonstrate the excellent performance of the proposed preprocessing methods. Given that the total installed wind power capacity in China in 2011 was 62364.2 MW, even this slight improvement in accuracy could save a large amount of money.
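Of the three models in Step 3, the Elman network differs from a plain BP network by its context layer, which feeds the previous hidden state back as extra input. The sketch below shows only that forward recursion in pure Python. The weights are hypothetical placeholders (training is omitted), and the tanh hidden activation with a linear output unit is an assumption.

```python
import math
from typing import List, Sequence


class ElmanForward:
    """Forward recursion of a single-output Elman network (training omitted).

    hidden_t = tanh(W_in x_t + W_ctx context + b);  y_t = w_out . hidden_t + b_out,
    after which the context layer is overwritten with hidden_t.
    """

    def __init__(self, w_in: List[List[float]], w_ctx: List[List[float]], b: List[float],
                 w_out: List[float], b_out: float) -> None:
        self.w_in, self.w_ctx, self.b = w_in, w_ctx, b
        self.w_out, self.b_out = w_out, b_out
        self.context = [0.0] * len(b)  # context units start at zero

    def step(self, x: Sequence[float]) -> float:
        hidden = []
        for k, bias in enumerate(self.b):
            z = bias
            z += sum(w * xi for w, xi in zip(self.w_in[k], x))
            z += sum(w * c for w, c in zip(self.w_ctx[k], self.context))
            hidden.append(math.tanh(z))
        self.context = hidden  # feedback: the hidden state becomes the next step's context
        return self.b_out + sum(w * h for w, h in zip(self.w_out, hidden))


if __name__ == "__main__":
    # Hypothetical weights: 1 input, 1 hidden unit.
    net = ElmanForward(w_in=[[1.0]], w_ctx=[[0.5]], b=[0.0], w_out=[2.0], b_out=0.0)
    print(net.step([0.5]), net.step([0.5]))  # same input, different outputs due to the context
```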

In particular, Table 1 shows that PSVR performs best among the three models. To further improve the accuracy of wind speed forecasting, Step 4 applies three artificial intelligence optimization algorithms to improve the performance of PSVR.
Step 4. Use CS to optimize the two main parameters of PSVR and compare it with the conventional approach (GS) as well as with GA and PSO.
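The cuckoo search of Step 4 can be sketched in pure Python as follows. This is a minimal sketch in the style of Yang and Deb's formulation: Lévy-flight moves generated with Mantegna's algorithm, greedy replacement of each nest, and a discovery step in which each component of a nest is rebuilt by a random walk with probability pa. The parameter values (15 nests, pa = 0.25, step scale 0.01, β = 1.5) are common defaults assumed here, not the paper's settings. For PSVR, the objective would be the validation error over the two hyperparameters; the usage example uses a simple sphere function instead.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one Levy-flight step length with Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def clip(x: Vector, lower: Sequence[float], upper: Sequence[float]) -> Vector:
    """Keep a candidate inside the search box."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def cuckoo_search(
    f: Callable[[Vector], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_nests: int = 15,
    pa: float = 0.25,
    alpha: float = 0.01,
    beta: float = 1.5,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize f over the box [lower, upper]; return (best nest, best objective value)."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n_nests)]
    fitness = [f(x) for x in nests]

    def try_replace(i: int, cand: Vector) -> None:
        # Greedy selection: a new egg replaces nest i only if it is better.
        fc = f(cand)
        if fc < fitness[i]:
            nests[i], fitness[i] = cand, fc

    for _ in range(iterations):
        best = nests[min(range(n_nests), key=fitness.__getitem__)]
        # 1) Levy flights around each nest, scaled by its distance to the current best.
        for i in range(n_nests):
            cand = [x + alpha * levy_step(beta, rng) * (x - b) * rng.gauss(0.0, 1.0)
                    for x, b in zip(nests[i], best)]
            try_replace(i, clip(cand, lower, upper))
        # 2) Discovery: each component is rebuilt by a random walk with probability pa.
        for i in range(n_nests):
            j, k = rng.randrange(n_nests), rng.randrange(n_nests)
            r = rng.random()
            cand = [x + r * (xj - xk) if rng.random() < pa else x
                    for x, xj, xk in zip(nests[i], nests[j], nests[k])]
            try_replace(i, clip(cand, lower, upper))

    i_best = min(range(n_nests), key=fitness.__getitem__)
    return nests[i_best], fitness[i_best]


if __name__ == "__main__":
    sphere = lambda p: sum(v * v for v in p)  # stand-in objective with minimum 0 at the origin
    print(cuckoo_search(sphere, [-5.0, -5.0], [5.0, 5.0]))
```

Because replacement is greedy, the best objective value never gets worse between iterations. The Lévy steps occasionally make long jumps, which is what lets the search leave a poor region of the parameter space.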
Using metaheuristic algorithms such as GA and PSO to optimize the hyperparameters of SVR generally attains better accuracy than using a nonheuristic conventional method such as grid search (GS). However, as Moghram and Rahman [46] noted, no single model or algorithm that forecasts effectively at one wind farm can be applied to every wind farm, because wind speeds differ between wind farms and various location-specific factors influence wind speed patterns. To find the best algorithm for forecasting wind speed at Jiling Shoal and Qingtu Lake, it is necessary to compare different algorithms. In the following, we use the most commonly used algorithms, namely a non-artificial-intelligence method (GS) and artificial intelligence algorithms (GA and PSO), to optimize the hyperparameters of SVR, and then compare them with a new metaheuristic algorithm, CS, optimizing SVR. The final results are shown in Tables 3-4 and Figures 18-21.
Figures 20-21 show the forecasting results of wind speed in a certain month and of electricity price. Table 3 displays the four forecasting error indexes of PSVR optimized by GS, GA, PSO, and CS in the three cases. The final four models are denoted PSVR, GA-PSVR, PSO-PSVR, and CS-PSVR. … fluctuated more intensively and randomly. In conclusion, comparing the average RE values of GA-PSVR, PSO-PSVR, and CS-PSVR in NSW shows that the proposed CS-PSVR method outperforms the other three algorithms.

Conclusion
Wind power has been growing rapidly worldwide. Wind speed forecasting plays an important role in wind energy. Accurate wind speed prediction is becoming increasingly important for improving and optimizing renewable wind power generation. In particular, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation.
In this paper, we use 5-3H and wavelet denoising to preprocess the original data and then apply BP, Elman, and SVR models to make 3-step predictions of the 10 min series. Finally, we adopt GA, PSO, and CS to optimize PSVR. We find that 5-3H combined with wavelet denoising can significantly improve the accuracy of the BP network, the Elman network, and SVR in forecasting wind speed at two sites and electricity price in NSW. These results show that the outlier-removal and denoising ability of 5-3H combined with wavelet denoising can be applied to wind speed forecasting at Jiling Shoal and Qingtu Lake and to electricity price prediction in NSW. For optimizing the two main hyperparameters of SVR, the new metaheuristic intelligent optimization algorithm, cuckoo search, outperforms the traditional methods GS, GA, and PSO.

Figure 7: The flowchart of cuckoo search.

Figure 8: The flowchart of the proposed method.

Figure 16: The forecasting results of Jiling Shoal after preprocessing in October.

Figure 17: The forecasting results of NSW after preprocessing.

Figure 18: Times of best accuracy in three cases.

Table 1: Forecasted results in three cases.
Here $x(t)$ and $\hat{x}(t)$ denote the actual and predicted values at time $t$, respectively. Table 1 lists and compares the MAE, MSE, MAPE, and SMAPE values of the PBP and BP, PElman and Elman, and SVR and PSVR models, including those obtained by the proposed methods. To further facilitate understanding of the performance of the improved approaches, the decreases in the four error criteria are expressed as RE values.

As listed in Table 1, PBP, PElman, and PSVR represent the models applied after data preprocessing. Almost all the accuracies of PBP, PElman, and PSVR exceed those of BP, Elman, and SVR. More explicit results, the RE values, are shown in Table 2 and Figure 15. Table 2 gives the percentage improvement in accuracy of the three models under four error measures (MAE, MSE, MAPE, and SMAPE). Figures 16-17 show the forecasting results of wind speed in a certain month and of electricity price in NSW. Figure 15, made from the Ave. results in Table 2, shows the average percentage improvement in the RE values. As the PBP column of Table 2 shows, almost all the RE values of MAE are positive, which implies that the MAE values attained by PBP are almost always smaller than those obtained by BP. In addition, … values in the three cases. In Jiling Shoal, PBP clearly achieves a greater improvement than the other two models, while PBP and PSVR perform better than PElman in Qingtu Lake, and in NSW PElman markedly outperforms the other two. As Figure 14 shows, the data preprocessing is more effective in NSW because it removes more outliers from the electricity price series than from the wind speed series.

Table 2: The decreased RE values of each site. Average percentage of RE values of BP, Elman, and SVR in three cases.