Multistep Wind Speed and Wind Power Prediction Based on a Predictive Deep Belief Network and an Optimized Random Forest

A variety of supervised learning methods using numerical weather prediction (NWP) data have been exploited for short-term wind power forecasting (WPF). However, the NWP data may be insufficient because of uncertainties in the initial atmospheric conditions. Thus, this study proposes a novel hybrid intelligent method to improve the accuracy of existing forecasting models such as random forest (RF) and artificial neural networks. First, the proposed method develops a predictive deep belief network (DBN) to perform short-term wind speed prediction (WSP), and the WSP data are then transformed into supplementary input features for the WPF prediction process. Second, owing to its ensemble learning and parallelization, the random forest is used as the supervised forecasting model. In addition, a data-driven dimension reduction procedure and a weighted voting method are utilized to optimize the random forest algorithm in the training process and the prediction process, respectively. Because an increasing number of training samples can cause overfitting, the k-fold cross validation (CV) technique is adopted to address this issue. Numerical experiments are performed at 15-min, 30-min, 45-min, and 24-h horizons to demonstrate the advantages of the proposed method over existing methods in terms of forecasting accuracy and scalability.


Introduction
The uncertainty of wind energy imposes major challenges on power system operation and planning, such as power system security assessment and reserve management [1]. Reliable wind power forecasting plays an important role in preventing damage to wind turbines and maintaining the stability and security of the power system.
Numerous forecasting models, which are generally classified into four categories, have been proposed for short-term wind power forecasting.
(1) Physical-based methods include spatial and temporal factors in a full fluid-dynamics model of the atmosphere [2]. NWP is a popular physical approach utilizing complex mathematical models, but it suffers from the difficulty of gaining information and from its limited spatial resolution.
(2) Statistical-based methods characterize the historical data to yield precise performance for short-term forecasting tasks. The dynamical ensemble LSSVR [3], the closed-loop forecasting engine including KIM and EMD [4], and the adaptive robust multikernel regression model [5] have been proposed in this line of research. Adaptive neurofuzzy inference systems were developed to perform a nonlinear mapping between inputs and outputs [6].
(4) Hybrid-based methods, built on two or more architectures, benefit from the advantages of each approach to reach the optimal results. A hybrid approach combining the least square support vector machine with the gravitational search algorithm is proposed in [10].
Most methods reported in the current literature have three disadvantages.
(1) In most artificial intelligence methods, there is only a single hidden layer [11][12][13][14]. Because of the limited ability to search for the optimal solution in the parameter space of an ANN with more than one hidden layer, these models cannot provide accurate outputs.
(2) Some methodologies require tediously hand-engineered features [13]. Without sufficient knowledge of the specific domain, the selected features may not be suitable for the corresponding models.
(3) The conventional features in previous supervised forecasting models contain the NWP data of the wind farms [15][16][17][18][19]. However, the NWP model usually runs only once or twice a day and is therefore mostly applied to medium- to long-term forecasts [20].
Deep learning has attracted tremendous attention in recent years from academic and industrial communities [21]. It has been successfully used in the fields of speech recognition [22], image processing [23], and health status assessment [24,25]. However, it has not been actively utilized in the wind power or wind speed forecasting fields. The deep belief network, due to its strong learning ability, has been applied to short-term WSP [26]. The stacked denoising autoencoder combined with rough set theory was applied to extract features from wind speed series [20]. In the literature [27], deep autoencoders act as base-regressors, whereas a deep belief network is used as a metaregressor. A DNN-MRT scheme was proposed to predict wind power. Therefore, employing deep learning to address the problems mentioned above has great potential.
In this paper, a hybrid intelligent method for short-term wind power forecasting is proposed. First, the DBN, a probabilistic generative model consisting of multiple layers of restricted Boltzmann machines (RBMs), learns unsupervised features from the wind speed series data in the pretraining phase, and a backpropagation neural network (BPNN), as the top layer of the DBN, uses the labeled data to fine-tune the parameters of the DBN by stochastic gradient descent. The performance of the DBN is affected by its parameters, whose selection is often limited by hand engineering. The bat algorithm, which combines the major advantages of particle swarm optimization, the genetic algorithm, and harmony search, is applied to yield optimal parameters for the DBN. Second, random forest is suitable for handling large data due to its parallelization [28]. It has been combined with Spark [28], the heuristic bootstrap sampling method [29], kernel principal component analysis [30], and other technologies to perform fault diagnosis and regression tasks [31,32]. To improve the forecasting accuracy for high-dimensional and large-scale wind turbine data, we propose an optimized random forest method that consists of a dimension reduction procedure and a weighted voting process for short-term WPF.
The contributions of the paper are as follows.
(1) A predictive deep belief network is applied as the deep learning architecture to perform the short-term wind speed prediction. The DBN model has better generalization ability than the traditional shallow architectures.
(2) The bat algorithm is incorporated into the DBN to improve the training accuracy and reduce the time cost of wind speed prediction.
(3) Benefiting from the dimension reduction and weighted voting procedures, the random forest model avoids the lack of prior knowledge for feature selection and enhances the capacity to handle data from new conditions.
This paper is organized as follows. In Section 2, the DBN and its learning algorithm for short-term WSP are presented. Section 3 presents the optimized random forest and the proposed model for short-term WPF. Numerical results and conclusion are presented in Sections 4 and 5, respectively.

Brief Architecture about the Proposed Method
In this section, the proposed method is briefly introduced; its process is shown in Figure 1. Wind speed, the most important variable for the normal running of a wind turbine, is hard to predict because of its randomness and fluctuation. Therefore, to merge the advantages of deep learning architectures and the bat algorithm, a WSP model based on the predictive deep belief network and the bat algorithm is first used to capture the characteristics of the wind speed series. Then, owing to its ensemble learning and parallelization, the random forest is used as the WPF model. In addition, a data-driven dimension reduction procedure and a weighted voting method are utilized to optimize the random forest algorithm in the training process and the prediction process, respectively.

DBN for Wind Speed Prediction
Due to the intermittence and volatility of wind energy, it is difficult to obtain accurate future wind speed series. Thus, this section introduces a novel WSP method to perform the wind speed forecasting.

Short-Term Wind Speed Forecasting.
The wind speed series can be described as $X = \{x_1, x_2, \ldots, x_t\}$, where $x_t$ is the average wind speed in the past 15 minutes, as illustrated in Figure 2. Short-term wind speed prediction forecasts the future value $x_{t+h}$ by utilizing the previous $N$ data, where $t$ is the index of the wind speed series and $h$ is the forecast horizon. Therefore, it is a significant task to build the function $f$ that describes the wind speed explicitly:
$$x_{t+h} = f(x_t, x_{t-1}, \ldots, x_{t-N+1}; \theta),$$
where $\theta$ is the parameter vector.
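The mapping from the previous N values to the value h steps ahead can be made concrete as a sliding-window dataset. The following is a minimal sketch only; the function name `make_windows`, its arguments, and the sample speeds are illustrative assumptions, not part of the original method.

```python
from typing import List, Tuple


def make_windows(series: List[float], n_lags: int, horizon: int
                 ) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of n_lags consecutive values with the value
    `horizon` steps after the last point of the window."""
    inputs, targets = [], []
    # t is the index of the most recent observation inside the window.
    for t in range(n_lags - 1, len(series) - horizon):
        inputs.append(series[t - n_lags + 1: t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


# Hypothetical 15-min average wind speeds (m/s), for illustration only.
speeds = [5.1, 5.4, 5.0, 4.8, 5.6, 6.0, 6.3, 5.9]
X, y = make_windows(speeds, n_lags=3, horizon=2)
```

With 15-min averages, `horizon=1`, `2`, and `3` would correspond to the 15-min, 30-min, and 45-min forecasts, assuming one model is trained per horizon.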

Traditional Deep Belief Network.
The architecture of the DBN is shown in Figure 3. It is composed of several stacked RBMs. The number of units in the first input layer is N, which is the same as the number of previous data. The numbers of units in the hidden layers are gradually reduced. The input layer of the BPNN, which is also the hidden layer of the last RBM, consists of two units. The output layer has one unit. Then, the optimal mapping function can be found by searching the parameter space with the bat algorithm, which avoids the effect of personal experience. There are two steps in the training process of the traditional DBN.
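(1) The pretraining phase: As the core of the DBN, the RBM is a stochastic generative model that performs this procedure through greedy layer-wise unsupervised learning. Each RBM consists of two layers, namely, the visible layer and the hidden layer, as shown in Figure 3. The probability distribution over an RBM is defined by the connection weights between visible units and hidden units through an energy-based model $E(v, h; \theta)$:
$$E(v, h; \theta) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i}\sum_{j} v_i w_{ij} h_j,$$
where $\theta = (W, b, c)$ represents the model parameters, $W$ denotes the weights connecting hidden and visible units, and $b$ and $c$ are the biases of the visible units and the hidden units, respectively. The conditional probability over visible units and hidden units is defined as
$$P(h \mid v) = \prod_{j} P(h_j \mid v), \qquad P(v \mid h) = \prod_{i} P(v_i \mid h).$$
Considering the specific structure of the RBM, the neurons are binary; the probability of activating a unit is a logistic function given by
$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i} w_{ij} v_i\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j} w_{ij} h_j\Big),$$
where $\sigma$ is the logistic sigmoid function.
The objective of training the model is to maximize the log-likelihood of the training data, $\sum_{v} \log P(v; \theta)$. In the commonly studied case, contrastive divergence is adopted to approximate the derivative of the log probability with respect to the model parameters, for example,
$$\frac{\partial \log P(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},$$
where $\langle \cdot \rangle_{\text{data}}$ denotes an expectation under the data distribution and $\langle \cdot \rangle_{\text{model}}$ denotes an expectation under the model distribution.
Then the update rules for the parameters can be written as
$$w_{ij} \leftarrow w_{ij} + \varepsilon\big(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}\big), \quad b_i \leftarrow b_i + \varepsilon\big(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}}\big), \quad c_j \leftarrow c_j + \varepsilon\big(\langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}}\big),$$
where $\varepsilon$ is the learning rate.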
(2) The fine-tune phase: For the DBN model, gradient descent associated with the greedy layer-wise training method is performed in the pretraining phase, and the predictive fine-tuning is carried out by adopting the BPNN. The parameter space of the DBN model is fine-tuned by minimizing the error between the predicted value and the actual value. The parameters can be updated as
$$W_l \leftarrow W_l - \eta \frac{\partial J(W, b; x)}{\partial W_l}, \qquad b_l \leftarrow b_l - \eta \frac{\partial J(W, b; x)}{\partial b_l},$$
where $W_l$ and $b_l$ are the weights and bias of the $l$th layer, $\eta$ is the learning rate, and $J(W, b; x)$ is the cost function.
The DBN model is able to forecast the wind speed in the current condition space after the fine-tune procedure. When it comes to a new dataset, the parameter space can be further fine-tuned on the new dataset instead of training from scratch.
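To make the pretraining phase more tangible, the sketch below performs one-step contrastive divergence (CD-1) for a single binary RBM in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the inputs are assumed to be binary (or scaled to [0, 1]), the reconstruction uses probabilities instead of sampled states, and all names (`cd1_step`, `hidden_probs`, `visible_probs`) are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_step(v0, W, b, c, lr, rng):
    """Update W, b, c in place with one CD-1 step on training vector v0."""
    ph0 = hidden_probs(v0, W, c)                       # positive phase
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    v1 = visible_probs(h0, W, b)                       # reconstruction
    ph1 = hidden_probs(v1, W, c)                       # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
for _ in range(100):
    cd1_step([1.0, 0.0, 1.0, 0.0], W, b, c, lr=0.1, rng=rng)
```

In a stacked DBN, the hidden probabilities of one trained RBM would serve as the visible input of the next RBM before the BPNN fine-tuning described above.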

Optimization of the DBN Model.
To find the optimal mapping function in Section 3.1, an improved DBN is employed. The connecting weights, the offsets of the units, and the learning rate need to be decided when the model is applied. As mentioned above, the model parameters are initialized randomly and then updated by iterative training, which may take a long time and relies on manual experience. To address this problem, a bat algorithm is employed to help search the optimal parameter space, which is composed of the connecting weights between visible units and hidden units, the biases of the visible units and the hidden units, the learning rate in the RBM, and the parameters in the BPNN. The detailed steps are as follows.
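Step 1. Initialize $t = 1$ and, for each of the $n$ bats, the position $x_i$, velocity $v_i$, frequency $f_i$, pulse rate $r_i$, and loudness $A_i$.
Step 2. Compute the fitness function value of every bat from the error between the predicted values $\hat{y}_k$ and the measured values $y_k$, $k = 1, \ldots, m$, where $m$ is the size of the output vector, and select the best bat $x^*$ as the best solution in the current situation.
Step 3. Compute the new frequency $f_i$, the new velocity $v_i^t$, and the new solution $x_i^t$, and evaluate the new fitness:
$$f_i = f_{\min} + (f_{\max} - f_{\min})\beta, \qquad v_i^t = v_i^{t-1} + (x_i^{t-1} - x^*) f_i, \qquad x_i^t = x_i^{t-1} + v_i^t,$$
where $\beta$ is a uniform random value within [0, 1], $x^*$ is the current global best solution, and $f_{\max}$ and $f_{\min}$ denote the maximum and minimum frequency, respectively.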
Step 4. BPNN or RBM updates the parameters space according to its own learning method.
Step 5. Compare the performance of parameters spaces between the BPNN or RBM and the bat algorithm using (16), and select the best one as the new best global solution.
Steps 2 to 5 are repeated until the termination condition is met. Figure 4 shows the framework of the DBN model developed in this section.
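The search loop in Steps 1 to 5 can be sketched with the standard bat-algorithm updates on a toy objective. This is a simplified illustration, not the hybrid procedure itself: the RBM/BPNN update of Step 4 is omitted, loudness and pulse rate are held constant, a sphere function stands in for the DBN prediction error, and the names `bat_search` and `sphere` are assumptions.

```python
import random


def bat_search(fitness, dim, n_bats=10, n_iter=200, f_min=0.0, f_max=2.0,
               loudness=0.5, pulse_rate=0.5, seed=0):
    """Minimize `fitness` with a simplified bat algorithm."""
    rng = random.Random(seed)
    # Step 1: initialize positions and velocities.
    x = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    # Step 2: evaluate every bat and remember the best one.
    fit = [fitness(p) for p in x]
    i_best = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = list(x[i_best]), fit[i_best]
    for _ in range(n_iter):
        for i in range(n_bats):
            # Step 3: frequency, velocity, and position updates.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [v[i][k] + (x[i][k] - best[k]) * freq for k in range(dim)]
            cand = [x[i][k] + v[i][k] for k in range(dim)]
            # Local random walk around the current best solution.
            if rng.random() > pulse_rate:
                cand = [best[k] + 0.01 * rng.gauss(0.0, 1.0) for k in range(dim)]
            cand_fit = fitness(cand)
            # Accept a non-worsening candidate with probability `loudness`.
            if cand_fit <= fit[i] and rng.random() < loudness:
                x[i], fit[i] = cand, cand_fit
            # Keep the best solution found so far (analogue of Step 5).
            if cand_fit < best_fit:
                best, best_fit = list(cand), cand_fit
    return best, best_fit


def sphere(p):
    """Toy objective standing in for the DBN prediction error."""
    return sum(c * c for c in p)


best, best_fit = bat_search(sphere, dim=2)
```

In the proposed method each bat position would encode DBN parameters (weights, biases, learning rate) and the fitness would be the forecasting error, so each fitness evaluation is far more expensive than in this toy setting.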

Optimized Random Forest for Wind Power Forecasting
Random forest has been widely used for prediction problems.
The proposed WPF model is based on WSP and the optimized random forest, which consists of attribute reduction and weighted voting procedures. Figure 5 shows the training and prediction process of the WPF model. First, Pearson's correlation coefficient is computed on the vector space to select the favor features. Second, this paper adopts the WSP and the selected SCADA variables as the input attributes. Finally, a real-time weighted voting is constructed in the prediction process. Each regression tree provides a prediction for every testing sample, and the final result depends on the vote of these trees. The original training data is denoted as $S = \{(x_i, y_j),\ i = 1, 2, \ldots, L;\ j = 1, 2, \ldots, D\}$, where $x$ is a sample and $y$ is the feature set. Namely, the training data contains $D$ variables and $L$ samples. The steps of constructing each decision tree of the random forest are as follows.
Step 1. Select training subsets in a bootstrap manner.
Step 2. $d$ features are randomly selected from the $D$ variables.
Step 3. Construct the tree to the maximum depth without pruning.
The above three steps are repeated until k decision trees are collected into a random forest model.

The Dimension Reduction for High-Dimensional Data.
To improve the prediction accuracy of the random forest model, we present a Max Relevance-Min Redundancy (MRMR) index based on Pearson's correlation coefficient to reduce the number of dimensions. In the training procedure of each decision tree, the top $s$ features according to the MRMR index value are selected as the favor features, and then we randomly select the other $(d - s)$ feature variables from the remaining $(D - s)$ ones. Therefore, the dimension is reduced from $D$ to $d$. The process of the dimension reduction is presented in Figure 6.
First, in the training process, Pearson's correlation coefficient based on the covariance matrix is adopted to evaluate the relationship between two vectors $\alpha$ and $\beta$:
$$P(\alpha, \beta) = \frac{\operatorname{cov}(\alpha, \beta)}{\sigma_{\alpha}\,\sigma_{\beta}},$$
where $\operatorname{cov}(\alpha, \beta)$ is the covariance of the two vectors and $\sigma_{\alpha}$ and $\sigma_{\beta}$ are their standard deviations.
The relevance of each feature is calculated from its correlation with the target, $P(f_i, y)$, where $f_i$ is the feature, $D$ is the size of the dimension, and $y$ is the target feature; the most relevant feature can be obtained by
$$f_{\max} = \arg\max_{f_i} P(f_i, y).$$
Second, the redundancy of each input variable is calculated. The min redundancy feature can be obtained by
$$f_{\min} = \arg\min P(f_i, f_j).$$
The MRMR index is a simple form of $P(f_i, y)$ and $P(f_i, f_j)$, and the candidate feature is selected according to this index. Given the framework discussed above, we adopt the MRMR index to select the favor features; the detailed steps are shown in Algorithm 1. Compared with the traditional RF model, the dimension reduction method balances the accuracy and diversity of the RF algorithm and efficiently prevents overfitting.

Algorithm 1: The process of selecting the favor features.
(1) Input: a training dataset, the whole feature set $F$, the favor feature size $s$, the favor feature set $F_s = \emptyset$, the remaining feature set $F_r = F - F_s$
(2) for k = 1 to s
      search the favor feature $f_k$ from the remaining feature set according to the MRMR index
(3)   update the favor and the remaining feature sets: $F_s = F_s \cup \{f_k\}$, $F_r = F_r - \{f_k\}$
    end for
Output: $F_s$
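The greedy loop of Algorithm 1, followed by the random draw of the other features for one tree, can be sketched as below. The exact form of the MRMR index is not recoverable from the text, so this sketch assumes a common difference form (absolute relevance minus mean absolute redundancy with the already selected features); the function names and the toy data are likewise illustrative assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def mrmr_score(name, features, target, favor):
    """ASSUMED index: relevance minus mean redundancy with chosen features."""
    relevance = abs(pearson(features[name], target))
    if not favor:
        return relevance
    redundancy = sum(abs(pearson(features[name], features[g]))
                     for g in favor) / len(favor)
    return relevance - redundancy


def select_favor_features(features, target, s):
    """Greedy loop of Algorithm 1: move s features from F_r to F_s."""
    remaining = list(features)        # F_r
    favor = []                        # F_s
    for _ in range(s):
        best = max(remaining,
                   key=lambda n: mrmr_score(n, features, target, favor))
        favor.append(best)
        remaining.remove(best)
    return favor


def draw_tree_features(all_names, favor, d, rng):
    """Favor features plus (d - s) features drawn from the remaining ones."""
    rest = [n for n in all_names if n not in favor]
    return list(favor) + rng.sample(rest, d - len(favor))


# Toy data with hypothetical variable names.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "wind": [1.0, 2.0, 3.0, 4.0, 6.0],
    "wind_dup": [1.0, 2.0, 3.0, 4.0, 7.0],
    "temp": [2.0, 1.0, 2.0, 1.0, 3.0],
}
favor = select_favor_features(features, power, s=2)
tree_features = draw_tree_features(list(features), favor, d=3,
                                   rng=random.Random(0))
```

Because every tree keeps the same favor features and only the rest are drawn at random, the trees stay diverse while always seeing the most informative inputs, which is the balance between accuracy and diversity described above.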

Weighted Prediction Process.
In the prediction process, because the traditional RF algorithm is limited to its training data, its performance is likely to drop when data from new conditions are applied. To overcome this drawback, a real-time update of the voting weights is proposed to decrease the prediction error on the testing data.
Each sample of the testing data is predicted by all the decision trees in the RF model, and the prediction result is the weighted average of the outputs of all decision trees (28). In this weighted regression result, $w_{ij}$ is the weight of the $i$th decision tree on the $j$th sample, $\hat{y}_{ij}$ is the prediction result of the $i$th decision tree on the $j$th sample, and $\hat{y}_j$ is the final prediction result. The objective of the predictor learning is to minimize the prediction error, which is measured by the error function. Stochastic gradient descent is conducted to update the weights of the decision trees (30)-(31):
$$w_{i,j+1} = w_{ij} - \Delta w_{ij} \qquad (31)$$
In the weighted voting procedure of the RF model, a reasonable weight associated with each tree is applied to improve the global prediction accuracy and reduce the generalization error, especially for data from new conditions.
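The real-time weight update can be illustrated with a small sketch. The exact forms of the weighted result and the error function are not recoverable from the text, so the sketch assumes a weighted sum of the tree outputs and a squared error of one half times the squared residual, whose gradient with respect to each weight is the residual times that tree's prediction; the learning rate and all names are assumptions as well.

```python
def weighted_prediction(weights, tree_preds):
    """ASSUMED form of the weighted vote: sum_i w_i * prediction_i."""
    return sum(w * p for w, p in zip(weights, tree_preds))


def update_weights(weights, tree_preds, y_true, lr=0.001):
    """One stochastic gradient step on E = 0.5 * (y_hat - y_true) ** 2."""
    residual = weighted_prediction(weights, tree_preds) - y_true
    # dE/dw_i = residual * prediction_i, so w_i <- w_i - lr * dE/dw_i.
    return [w - lr * residual * p for w, p in zip(weights, tree_preds)]


# Two hypothetical trees: the second one overestimates the true power.
tree_preds = [10.0, 14.0]
y_true = 10.0
weights = [0.5, 0.5]
before = weighted_prediction(weights, tree_preds)
weights = update_weights(weights, tree_preds, y_true)
after = weighted_prediction(weights, tree_preds)
```

After the step, the tree with the larger overestimate loses more weight, so the combined prediction moves toward the measured value once the true power of a sample becomes available.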

The Short-Term Wind Power Forecasting Model.
To further improve the inferential ability of the proposed method, the k-fold cross validation partitions the training data into k different subsets.Then (k-1) of these subsets are selected to train the model and the other one is used for testing the performance of the trained model.The root mean square error is taken as the criterion to select the optimal model.To clearly describe the process of the WPF, the detailed steps of training and prediction procedure are given as follows.
The training procedure includes the following.
Step 1. Select the training features from the input variables according to the dimension reduction method mentioned in Section 4.2. Twelve SCADA variables, listed in Table 1, are selected as the candidate features for the RF model.
Step 2. Based on the input features selected in Step 1, the regression tree model is built in the traditional way, with k-fold cross validation and without pruning.
Step 3. The above two steps are repeated until the size of the random forest model reaches the threshold.
The prediction procedure includes the following.
Step 1. Each regression tree of the RF model is applied to the testing data, which consist of the future wind speed series obtained from the WSP and the SCADA variables, to perform short-term WPF.
Step 2. Compute the final prediction result based on the prediction values from all regression trees according to (28), and then update the error weight for each regression tree in real-time according to (30)- (31).
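The k-fold cross validation used in the training procedure can be sketched as follows. This is a generic illustration with contiguous folds and a trivial stand-in model that predicts the training mean; the function names and the `fit`/`predict` interface are assumptions, and a regression tree would take the place of the stand-in in practice.

```python
import math


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def cross_validate(fit, predict, X, y, k):
    """Train on k-1 folds, test on the held-out fold, return RMSE per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(rmse([y[i] for i in test_idx], preds))
    return scores


def fit_mean(X_train, y_train):
    """Stand-in model: remember the mean of the training targets."""
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(10)]
y = [2.0] * 10
scores = cross_validate(fit_mean, predict_mean, X, y, k=5)
```

The fold with the lowest RMSE would identify the model to keep, in line with the selection criterion stated above.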

Numerical Results
To verify the performance of the proposed method, numerical experiments have been carried out on four sets of historical SCADA data. The task is to forecast the 15-min, 30-min, 45-min, and 24-h wind speed and the corresponding wind power. The sizes of the training sets and testing sets in the four cases are shown in Table 2.

Performance Index.
The root mean square error (RMSE), the mean absolute error (MAE), the average percentage error (APE), the bias (BIAS), and the standard deviation of the error (SDE) are used as the error measures. In their definitions, $\hat{y}_i$ is the predicted value from the model, $y_i$ is the true value, and $n$ is the number of testing samples.
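As a hedged illustration of these indexes, the sketch below computes RMSE, MAE, BIAS, and SDE with their commonly used definitions, which are assumed here because the original formulas are not recoverable from the text. APE is left out for the same reason, since its normalization cannot be determined. The function name and sample values are illustrative.

```python
import math


def error_indexes(y_true, y_pred):
    """Return RMSE, MAE, BIAS, and SDE under commonly used definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    bias = sum(errors) / n                                   # mean error
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    sde = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    return {"RMSE": rmse, "MAE": mae, "BIAS": bias, "SDE": sde}


# Hypothetical measured and predicted wind speeds (m/s).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 8.5]
idx = error_indexes(measured, predicted)
```

Under these definitions the squared RMSE equals the squared BIAS plus the squared SDE, which separates the systematic part of the error from its random part.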

Short-Term Wind Speed Prediction Results.
The structure of the predictive DBN, 10-7-5-4-2-4-1, is determined by human experience. How to search for the optimal structure is still an open problem. In this section, the proposed method is compared with several wind speed forecasting models that have been proposed in the literature, including BPNN, ELMAN, the persistence method, and the traditional DBN.
To evaluate the generalization performance, experiments with the forecast horizon ranging from 15 min to 24 hours are carried out on four sets: Case 1-Case 4.
The actual wind speed data and their predicted values using the predictive DBN, traditional DBN, BPNN, ELMAN, and persistence model are shown in Figure 7. Figures 7(a)-7(j) show the predicted wind speed series for the next 15 minutes and 30 minutes from the predictive DBN, traditional DBN, BPNN, ELMAN, and persistence model in Case 1 and Case 2. Similarly, the results for the 45-minute and 24-hour horizons in Case 3 and Case 4 are shown in Figures 7(k)-7(t), where the green and blue lines denote the actual and predicted wind speed series, respectively. In the first two cases, the estimated values from the predictive DBN are very close to the real data, whereas the other models show obvious errors. However, as can be seen from Figures 7(k)-7(t), the performance of these models decreases gradually as the inference step increases, and the predictive DBN still generates the minimum prediction errors.
To further show the performance of these models, the RMSE, MAE, BIAS, SDE, and APE indexes are computed and reported in Table 3. As shown in Table 3, the deep learning architectures achieve much better results than the shallow networks and the persistence model. The proposed model shares nearly the same generalization capacity as the traditional DBN in predicting the future wind speed, while the forecasting ability of the persistence model degrades as the time scale grows. Predictive DBN, as a deep learning network, has improved the RMSE, MAE, BIAS, SDE, and APE by 1.505, 1.427, 2.439, 1.134, and 0.625 over multiple time

Discussion
As seen clearly, the DBN together with the optimized random forest can provide notably accurate future wind power series.
In Case 1, Case 2, and Case 3, the lines of predicted values and the actual data almost overlap. A small error appears in Case 4 because of the wind fluctuation and the longer time horizon, as can be seen from Figure 7(p). To further inspect the generalization ability of the proposed model, Table 4 reports the RMSE, MAE, BIAS, SDE, and APE indexes of the proposed model and the unoptimized random forest. Moreover, Table 5 shows the forecasting indexes obtained by utilizing tenfold cross validation in the training procedure of the random forest. The training data are divided into ten subsets, and each subset is used once for testing and nine times for training. The CV method contributes to searching for the optimal weights of the RF algorithm. The effectiveness of CV can be easily seen by comparing the indexes in Table 5 with those in Table 4: most of the error indexes in all cases decrease with CV in the training procedure, except for some individual situations. From this observation, using the CV technology in the training process of the random forest is a reasonable choice. It can be observed that the prediction accuracy of the optimized random forest is greater than that of RFWWU and RFWFF for all cases. The performance of RFWWU is only slightly worse than that of the optimized random forest. The predicted values from RFWFF have the largest errors among those models, which from another point of view proves the importance of the favor features. The results of the MRMR index and the favor features are illustrated in Table 6. The favor feature size is set to 5, and the features in bold are chosen as favor features.

Conclusions
In this paper, we proposed a multistep wind speed and wind power forecasting model that combines a predictive deep belief network with an optimized random forest to produce the future 15-min, 30-min, 45-min, and 24-h wind speed and wind power series. The DBN models the wind speed series through its deep learning ability, and the bat algorithm is employed to further enhance its inference performance. The performance of the predictive DBN is verified on four cases, and the results demonstrate that the DBN yields a major increase in prediction accuracy. Wind speed has a pivotal effect on wind power generation. Benefiting from the approximation ability of the DBN model, the random forest algorithm can produce better future wind power data from more precise wind speed series than traditional WPF models. In addition, competitive generalization can be obtained through the dimension reduction and the real-time weight updating.
However, many problems remain for the wind speed and wind power forecasting model. The first is that the learning time grows rapidly at large scale, which is why parallel deep learning algorithms have been under development in recent years. The second is parameter selection: in this paper, we utilize the bat algorithm to search the parameter space, but it is still very difficult to determine the optimal number of layers, the number of units in each layer, and the number of epochs. These settings significantly affect the learning ability of the deep network, and automatic selection models should be developed to address this problem.

Figure 1: The process of the proposed method for multistep wind speed and wind power prediction.

Figure 2: The wind speed series recorded on October 30, 2017, in northern China.

Figure 4: Using bat algorithm to design an optimized DBN.

Figure 5: Process of training and prediction of the WPF model.

Figure 6: Dimension reduction in the training process.

Figure 7 (recovered panel titles, wind speed in m/s): (g) The forecasting result of traditional DBN for 30-min; (h) The forecasting result of BPNN for 30-min.

Algorithm 1: The process of selecting the favor features. Step (3) of the loop updates the favor and the remaining feature sets, $F_s = F_s \cup \{f_k\}$ and $F_r = F_r - \{f_k\}$; the output is $F_s$.

Table 1: The list of candidate features.

Table 2: The size of training sets and testing sets.

Table 3: The performance of wind speed forecasting models for different horizons.

Table 4: The performance of wind power forecasting models for different horizons.

Table 5: The performance of wind power forecasting models with 10-fold CV for different horizons.

Table 6: The MRMR index and the favor features.