Multidimensional Meteorological Variables for Wind Speed Forecasting in Qinghai Region of China: A Novel Approach

The accurate, efficient, and reliable forecasting of wind speed is a hot research topic in wind power generation and integration. However, available forecasting models focus on forecasting the wind speed using historical wind speed data and ignore multidimensional meteorological variables. The objective is to develop a hybrid model with multidimensional meteorological variables for forecasting the wind speed accurately. The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is applied to handle the nonlinearity of the wind speed. Then, the original wind speed is decomposed into a series of intrinsic mode functions (IMFs) with specified numbers of frequencies. A quadratic model that considers the two-way interactions between factors is used to pursue accurate forecasting. To reduce the model complexity, Gram-Schmidt-based feature selection (GSFS) is applied to extract the important meteorological factors. Finally, all the forecasting values of the IMFs are summed by assigning weights that are carefully determined by the whale optimization algorithm (WOA). The proposed forecasting approach has been applied on six datasets that were collected in Qinghai province and is compared with several state-of-the-art wind speed forecasting models. The forecasting results demonstrate that the proposed model can represent the nonlinearity of the wind speed and deliver better results than the competitors.


Introduction
The depletion of conventional fossil fuels and the deterioration of the ecological environment have been receiving increasing concern in recent years. It is inevitable that fossil fuels will be restricted or replaced by renewable energy resources, which include wind, solar, geothermal, and biomass ones [1,2]. As a type of clean, inexhaustible, and environmentally preferable renewable energy, wind energy has been developed rapidly since the 1990s around the world and has received widespread attention [3]. Based on the annual report by the Global Wind Energy Council, the global market was 52.573 GW of wind power installed capacity in 2017, thereby resulting in a cumulative installed capacity of 539.581 GW [4]. Figure 1 presents the top 10 countries in the world in terms of wind power installed capacity from January to December 2017. The wind power is a function of the cube of the wind speed; hence, small differences in the wind speed forecast cause large differences in the wind power forecast [5]. Moreover, the wind speed is an important variable that is related to the wind turbine output power, and the accurate forecast of wind speed facilitates the improvement of the efficiency of wind turbines and the enhancement of the stability of the power supply [6]. However, the wind speed exhibits stochastic fluctuations and irregular intermittency due to the influence of several meteorological factors, and its distribution is not normal. Therefore, it is also critically challenging to forecast the wind speed accurately. Many scientific techniques have been developed for forecasting the wind speed. These algorithms can be divided into five types: physical-based methods, statistical-based methods, persistence methods, intelligent methods, and hybrid methods [7]. Physical-based methods, which are suitable for long-term wind speed forecasts, apply parametric models according to the physical changes in the atmosphere [8], such as the numerical weather prediction (NWP) model [9].
Statistical-based methods, such as statistical regression [10], the autoregressive moving average (ARMA) models [11], Bayesian models [12], and Markov models [13], depend on the historical wind speed data and forecast the future pattern [6]. These models are suitable for forecasting short-term wind speed but have difficulties if the wind speed series exhibit high-frequency variations. Persistence methods, which are used in short-term wind speed forecasts, are based on the assumption that the wind speed at time k+1 will not change significantly from the wind speed at time k [14]. The intelligent methods apply machine learning and artificial intelligence algorithms to forecast the wind speed and include Kalman filtering [15], artificial neural networks (ANNs) [16], support vector machines [17], extreme learning machines (ELMs) [18], and high-dimensional feature selection [19]. The most popular intelligent methods for forecasting wind speed are ANNs. Zhang et al. proposed a new hybrid model that combines a shared-weight long short-term memory network (SWLSTM) and Gaussian process regression (GPR), and they applied SWLSTM-GPR to wind speed prediction cases [20]. Li et al. developed an innovative framework for multistep wind speed prediction that uses a recursive neural network that is based on the wind speed and the turbulence intensity. According to experimental results, the proposed model dramatically outperforms conventional machine learning models on multistep wind speed prediction [21].
Currently, the main focus has shifted from single methods to advanced hybrid methods, which combine techniques such as statistical and machine learning approaches and utilize the characteristics of the individual methods. Sheela and Deepa used a hybrid computing model that integrates a multilayer perceptron network and self-organizing feature maps for wind speed prediction [22][23][24]. Shukur and Lee proposed a hybrid model that was based on ARIMA for forecasting the daily wind speed in Iraq and Malaysia; to improve the forecasting accuracy, an ANN and a Kalman filter (KF) could be utilized to overcome uncertainty and nonlinearity problems [25]. Doucoure et al. used wavelet decomposition and ANNs with predictability analysis to forecast wind speed time series [26]. Liu et al. developed a novel wind speed prediction model that was based on the WPD (wavelet packet decomposition), a convolutional neural network and a convolutional long short-term memory network, and they demonstrated that the proposed model can perform best in wind speed 1-step forecasting [29]. In addition, wind speed time series exhibit nonlinearity and nonstationarity due to weather and geographical factors. This may lead to incorrect conclusions regarding the activity of the wind speed when the techniques are used to forecast [15]. Therefore, several researchers have attempted to improve the forecasting precision via the addition of a preprocessing stage for time series decomposition and feature selection [30]. Time series decomposition approaches, such as the wavelet transform (WT) [31] and variational mode decomposition (VMD) [32], have been widely used to decompose the original complex-structured series into several simple series. Feature selection approaches facilitate the selection of suitable inputs for the forecasting models, which enhances the performances of the models substantially.
Various feature selection approaches have yielded satisfactory results, such as Gram-Schmidt orthogonalization (GSO) [33], the least absolute shrinkage and selection operator (LASSO) [19], and forward regression [34]. The main contribution of this paper is the developed hybrid forecasting model, which uses random forest, a Gram-Schmidt-based feature selection method, and the whale optimization algorithm. The proposed method has inherited all the advantages of the above approaches, which are embedded together. Instead of combining them in parallel, random forest and the Gram-Schmidt-based feature selection method are embedded in the whale optimization algorithm; this embedding is a challenging task in terms of computation. The objective function value of the whale optimization algorithm is determined by these two methods. The details will be presented in Section 2, which describes the use of the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN), random forest (RF), the Gram-Schmidt-based feature selection method (GSFS), and the whale optimization algorithm (WOA) to solve several chained problems.
(i) The first methodology (CEEMDAN in Section 2.1) addresses the problem that the signals that are detected from the measured objects are often mixed with various levels of noise. Wavelet analysis can filter out weak or useless signals and extract the information that users need. This paper introduces a data preprocessing strategy that is designed based on the theory of decomposition and reconstruction; it filters the noise from the original wind speed efficiently and searches for the intrinsic values of the wind speed data. Finally, this preprocessing procedure utilizes wavelet packet decomposition for denoising and multiscale wavelet decomposition to complete the data preprocessing, and it guarantees the accurate forecasting of the wind speed.

(ii) The second methodology (RF in Section 2.2) is a machine learning model that is utilized as a forecasting model after all the important variables have been selected. This forecasting model is stable since it weakens the correlations between variables to address the multicollinearity issue. Furthermore, it is designed based on bootstrapping, which theoretically guarantees its stability and accurate forecasting results. In summary, compared with traditional programming methods, the utilized machine learning algorithm simplifies the code and improves its execution performance. Second, machine learning technology can be used to solve problems that cannot be solved via traditional methods. Furthermore, machine learning can adapt to new data; namely, the system can be personalized according to the environment.

(iii) The third methodology (GSFS in Section 2.3) is an effective and efficient variable selection approach that is used to select the important variables after the preprocessing procedure. This approach is designed based on an orthogonalization method, which makes the selected variables orthogonal to each other.
Furthermore, in contrast to other penalized variable selection methods, this novel approach does not introduce any bias into the model; hence, all nuisance variables can be eliminated effectively.

(iv) The fourth methodology (WOA in Section 2.4) is the whale optimization algorithm, which is a new swarm intelligence optimization algorithm that simulates the predation behavior of humpback whales in the ocean. The algorithm has the advantages of a simple structure, few parameters, strong search performance, and easy implementation. It is applied in the proposed model in an embedded way to determine the optimal weights for assignment to the machine learning models. The fitness function in the WOA is determined by the machine learning model and the variable selection method.

The hybrid forecasting model, namely, RF-GSFS-WOA, is proposed in Section 3. The experimental results and the corresponding analysis are presented in Section 4. Finally, Section 5 presents the conclusions of this study.

Materials and Methods
In this section, four techniques, namely, CEEMDAN, random forest, GSFS, and WOA, will be introduced. The proposed methodology is developed based on these important techniques.

Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN).
The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) originates from the ensemble empirical mode decomposition (EEMD) algorithm. To solve the mode mixing problem, adaptive white noise that has a specified standard deviation can be added at every stage of the decomposition, and a unique residue can be calculated to generate each mode. Moreover, the decomposition is completed with a negligible error; hence, the CEEMDAN provides superior spectral separation of the modes and an accurate reconstruction of the original signal at a lower computational cost [35,36]. The process of the CEEMDAN is as follows.
Step 1: add realizations of a white Gaussian noise series to generate the corresponding signals:
\[ y_i(n) = y(n) + \varepsilon_0 \omega_i(n), \quad i = 1, 2, \ldots, I, \]
where y(n) is the original signal, i is the number of the trial, ε₀ is the noise standard deviation, and ω_i(n) denotes the realizations of the white Gaussian noise series.
Step 2: decompose the I realizations y_i(n) via EMD and average their first modes to obtain the first intrinsic mode function (IMF):
\[ \mathrm{IMF}_1(n) = \frac{1}{I} \sum_{i=1}^{I} E_1\big(y_i(n)\big). \]
Step 3: calculate the first residue r₁(n) (at the first stage, k = 1):
\[ r_1(n) = y(n) - \mathrm{IMF}_1(n). \]
Step 4: decompose the noise-added residues until their first EMD mode and define the (k+1)-th IMF of the CEEMDAN:
\[ \mathrm{IMF}_{k+1}(n) = \frac{1}{I} \sum_{i=1}^{I} E_1\big(r_k(n) + \varepsilon_k E_k(\omega_i(n))\big), \]
where the operator E_k(·) produces the k-th mode of the EMD.
Step 5: calculate the k-th residue:
\[ r_k(n) = r_{k-1}(n) - \mathrm{IMF}_k(n), \quad k = 2, \ldots, K, \]
where K is the total number of modes.
Step 6: repeat Step 4 and Step 5 for the next k until the obtained residue is not suitable for decomposition. The final residue is expressed as
\[ R(n) = y(n) - \sum_{k=1}^{K} \mathrm{IMF}_k(n). \]
Hence, the original signal can be expressed as
\[ y(n) = \sum_{k=1}^{K} \mathrm{IMF}_k(n) + R(n). \]

Random Forest.

Random forest is an ensemble learning method that uses decision trees as base learners and measures the importance of each variable. Random forest is a variant of bagging (bootstrap aggregating); whereas bagging considers all m attributes or variables when establishing a base learner, random forest considers a random subset of them. Decision trees tend to overfit the data, thereby leading to low bias but high variance. To overcome this drawback, random forest constructs many decision trees and integrates the results of all the individual trees to generate the final outcome [37]. In classification, the mode of the classes that are predicted by the trees is adopted as the output, while, in regression, the mean of the predictions of all the individual trees is used. Since bootstrapping is used to generate the sample from which an individual tree is grown, only approximately 63.2% of the observations are used to produce each base learner, and the remainder of the training set can be used as a validation set: suppose that there are m samples; the probability that a given sample is never chosen for the training set is
\[ \lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^{m} = \frac{1}{e} \approx 0.368. \]
With x representing the observations and y representing the response variable, the out-of-bag (OOB) estimate, which is denoted by H_OOB(x), is produced by the base learners [32,33]. Given an ensemble of classifiers h₁, h₂, …, h_T, it is defined as follows:
\[ H_{\mathrm{OOB}}(x) = \operatorname*{arg\,max}_{y \in Y} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big) \cdot \mathbb{I}\big(x \notin D_t\big), \]
where Y is the set that consists of all possible values of the response and D_t is the subsample that is applied to produce base learner h_t. The OOB estimate of the generalization error is expressed as
\[ \epsilon_{\mathrm{OOB}} = \frac{1}{|D|} \sum_{(x, y) \in D} \mathbb{I}\big(H_{\mathrm{OOB}}(x) \neq y\big), \]
where D is the training set. Both bagging and random forest measure the importance of each variable via the following procedure: (1) calculate the OOB error of each observation and average them to obtain the OOB error of the forest.
(2) Permute the values of the j-th feature in the training set. (3) Compute the OOB error again on this perturbed dataset; the difference in the OOB error before and after the permutation measures the importance of the j-th variable, where a larger difference corresponds to higher importance of variable j.
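The two ideas above, namely, the roughly 63.2% bootstrap coverage and permutation-based variable importance, can be sketched in Python as follows. This is an illustrative sketch using scikit-learn's `RandomForestRegressor` on synthetic data; the sample size, the column count, and the model settings are assumptions for the example, not the paper's configuration, and the importance here is measured against in-sample predictions rather than pure OOB predictions for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Bootstrap coverage: each tree leaves roughly (1 - 1/m)^m ~ 36.8% of
# the m observations out-of-bag, so ~63.2% are used for training.
m = 480
oob_fraction = (1 - 1 / m) ** m

# Toy data: the response depends on only the first two of five columns.
X = rng.normal(size=(480, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=480)

rf = RandomForestRegressor(n_estimators=300, oob_score=True,
                           random_state=0).fit(X, y)

# Permutation importance: increase in squared error after shuffling
# one column at a time; larger increase = more important variable.
base = float(np.mean((rf.predict(X) - y) ** 2))
deltas = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    deltas.append(float(np.mean((rf.predict(Xp) - y) ** 2)) - base)

print(round(oob_fraction, 3), [round(d, 2) for d in deltas])
```

On this toy example, the error increase for the two informative columns dwarfs that of the three noise columns, mirroring the procedure in steps (1)-(3).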

Gram-Schmidt-Based Feature Selection Method (GSFS).
Feature selection substantially facilitates the reduction of the model complexity and the enhancement of the forecasting accuracy when establishing a forecasting model. Many feature selection methods are designed based on penalized regression, which shrinks the estimates; these methods can yield inaccurate results. In this study, a subset selection method, namely, the Gram-Schmidt-based feature selection method (GSFS) [38], is used for feature selection. The objective of GSFS is to identify all the relevant features that contribute to the model forecasting. The details of GSFS are presented as follows.
GSFS iterations: let G⁽⁰⁾ be the empty set and initialize ŷ⁽⁰⁾ and k to 0, where ŷ⁽⁰⁾ denotes the initial forecasting values. Calculate the pseudoresponse R_t⁽ᵏ⁾ = y_t − ŷ_t⁽ᵏ⁾ and select the optimal variable x_{j_{k+1}} by regressing R⁽ᵏ⁾ on each candidate feature, i.e., the feature that is most strongly correlated with the pseudoresponse:
\[ j_{k+1} = \operatorname*{arg\,max}_{j} \frac{\big| \sum_{t=1}^{n} R_t^{(k)} x_{tj} \big|}{\| x_j \|}. \]
The objective in building G⁽¹⁾ is to add the optimal variable to the empty set.
In the next M_n = ⌈(n/log p)^{1/2}/2⌉ steps, orthogonalization is conducted before each new feature is selected. Similar to the first step, compute the pseudoresponse O_t⁽ᵏ⁾ = y_t − ŷ_t⁽ᵏ⁾ and select the feature that is most strongly correlated with it among the candidates that have been orthogonalized against the selected set G⁽ᵏ⁾. The objective of this procedure is to sequentially select variables that are orthogonal to each other and add them to the set. The main advantage of GSFS is that it does not involve any penalized regression. It detects the relevant features by examining each feature sequentially. Although it may take more time, it will not miss any important data information. In the second step, GSFS computes the pseudoresponse as the target vector, which can be used to find the features that are most strongly correlated with the pseudoresponse.
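The selection-plus-orthogonalization loop described above can be sketched in NumPy as follows. This is a simplified illustration on a toy dataset: the selection rule is the standard greedy correlation criterion, while the paper's exact step count M_n and the HDIC post-selection are omitted.

```python
import numpy as np

def gsfs(X, y, q):
    """Greedy forward selection with Gram-Schmidt orthogonalization.

    At each step, pick the feature most correlated with the current
    pseudoresponse (residual), then orthogonalize the remaining
    features against the chosen one.
    """
    Xw = X.astype(float).copy()
    r = y.astype(float).copy()
    selected = []
    for _ in range(q):
        norms = np.linalg.norm(Xw, axis=0)
        norms[norms < 1e-8] = np.inf        # skip exhausted columns
        scores = np.abs(Xw.T @ r) / norms   # correlation with residual
        j = int(np.argmax(scores))
        selected.append(j)
        u = Xw[:, j] / np.linalg.norm(Xw[:, j])
        r = r - (u @ r) * u                 # update pseudoresponse
        Xw = Xw - np.outer(u, u @ Xw)       # Gram-Schmidt sweep
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 4] + 2 * X[:, 9] + 0.05 * rng.normal(size=200)
sel = gsfs(X, y, 2)
print(sorted(sel))   # the two truly relevant columns
```

Because each selected direction is removed from both the residual and the remaining columns, the features enter the model mutually orthogonal, which is the property the text emphasizes.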
This pseudoresponse differs from the ordinary response that is applied in regression or classification problems. It is updated in every iteration of GSFS to eliminate the effects of the selected features. An orthogonalization procedure that is based on the Gram-Schmidt method is applied to make all the features orthogonal to each other. This procedure removes the dependencies among features and shortens the computation time for inverting the projection matrix. The determination of M_n plays an important role in guaranteeing the accuracy and the selection consistency. Let p = p_n ⟶ ∞; namely, let the number of features increase with the sample size. We must impose the following regularity condition before proving the theorem.
It follows that lim_{n⟶∞} (log p_n)/n = 0. Assume that the noise term is independent of each feature and let u and u₀ be constants; namely, (C2) E(exp(uε)) < ∞ for u < u₀. Let σ_j² = E(x_j²), and let z_j = x_j/σ_j and z_{tj} = x_{tj}/σ_j. Assume that there exists a constant u₁ such that a similar exponential moment condition holds for each z_{tj}. Theorem 1 establishes the selection consistency of the GSFS method and provides a theoretical guarantee on the choice of M_n. Due to the selection consistency, the feature selection method can detect all the important features as the sample size approaches infinity. Here, G_{M_n} contains the features that are selected in step M_n. To select the optimal model with all the important features, high-dimensional information criteria (HDIC) will be applied after GSFS. The proof of Theorem 1 is similar to [34] and is omitted here.

Whale Optimization Algorithm (WOA).

The whale optimization algorithm (WOA) [39] is a novel metaheuristic that mimics the bubble-net foraging behavior of humpback whales in searching for their prey, as illustrated in Figure 2(a). The humpback whales attack small fishes or schools of krill that are close to the surface by swimming in a shrinking circle and creating distinctive bubbles along a spiral-shaped path. The algorithm includes two phases: the first phase is encircling a prey and applying the spiral bubble-net attacking method, and the second phase is searching randomly for a prey. The details of each phase are presented as follows.

Encircling Prey.
The humpback whales dive downward and generate bubbles in a spiral shape, which makes the prey (typically krill) feel threatened and, hence, gather together. It is assumed that there are N humpback whales in a d-dimensional search space; the position of each whale is denoted by X_i (i = 1, 2, …, N). The position of the prey represents the globally optimal solution to the problem. The humpback whales can identify the prey's position and encircle it. However, there is no a priori information about the optimal solution. Therefore, in the WOA, the current best position is adopted as the position that corresponds to the optimal solution. Once this position has been identified, the other humpback whales will gather around it and update their positions as follows [36]:

Advances in Meteorology
\[ D = \big| C \cdot X^{*}(t) - X(t) \big|, \qquad X(t+1) = X^{*}(t) - A \cdot D, \]
where t is the current iteration, X(t) is the current position, X*(t) is the current best position, and A and C are the coefficient vectors:
\[ A = 2a \cdot \mathrm{rand}_1 - a, \qquad C = 2 \cdot \mathrm{rand}_2, \]
where rand₁ and rand₂ are two random numbers in the interval (0, 1) and a is a convergence factor, which decreases linearly as the iteration proceeds. Define t as the iteration number and t_max as the maximum allowed number of iterations. Factor a is defined as
\[ a = 2\left(1 - \frac{t}{t_{\max}}\right). \]
Two mechanisms are designed to describe the hunting behavior of the humpback whales, namely, the shrinking encircling mechanism and the spiral position updating mechanism, as illustrated in Figures 2(b) and 2(c). Shrinking encircling is realized as the convergence factor a decreases, while, in spiral position updating, the spiral movement of the humpback whales is simulated by
\[ X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), \]
where D′ = |X*(t) − X(t)| is the distance between the humpback whale and the prey, b is the shape parameter of the spiral, and l is a random number in the interval (−1, 1). Since the humpback whales perform these two behaviors simultaneously, in each iteration, each mechanism has a probability of 0.5 of being utilized. The corresponding model is as follows:
\[ X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5, \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \geq 0.5, \end{cases} \]
where p is a random number in (0, 1).

Search for Prey.
To enhance the exploration in the WOA, instead of updating the positions of the humpback whales based on the best position found so far, a random position is chosen to guide the search. A vector A with random values that are greater than 1 or less than −1 is applied to force the position to move far away from the best position. The model of searching for prey is defined as [40]
\[ D = \big| C \cdot X_{\mathrm{rand}} - X(t) \big|, \qquad X(t+1) = X_{\mathrm{rand}} - A \cdot D, \]
where X_rand is a random position (i.e., a random whale) that is chosen from the current population.
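A compact sketch of the full WOA update loop (shrinking encircling, spiral update, and random search) might look as follows. The bounds, population size, iteration count, and test function are illustrative assumptions; the paper's actual fitness function is the forecast error of the weighted RF ensemble.

```python
import numpy as np

def woa(fitness, dim, n_whales=20, t_max=200, lb=-5.0, ub=5.0, seed=0):
    """Minimal whale optimization algorithm sketch (minimization)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_whales, dim))
    best = X[int(np.argmin([fitness(x) for x in X]))].copy()
    b = 1.0                                   # spiral shape parameter
    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max)           # linearly decreasing factor
        for i in range(n_whales):
            A = 2.0 * a * rng.random(dim) - a
            C = 2.0 * rng.random(dim)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1.0):   # exploit: encircle the best
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                          # explore: random whale
                    xr = X[rng.integers(n_whales)]
                    D = np.abs(C * xr - X[i])
                    X[i] = xr - A * D
            else:                              # spiral position update
                l = rng.uniform(-1.0, 1.0)
                Dp = np.abs(best - X[i])
                X[i] = Dp * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(best):
                best = X[i].copy()
    return best

sphere = lambda x: float(np.sum(x ** 2))      # toy objective
best = woa(sphere, dim=3)
print(sphere(best))   # close to 0
```

The |A| < 1 test decides between exploitation around the best whale and exploration toward a random whale, matching the search-for-prey rule above.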

Proposed Hybrid Forecasting Model: RF-GSFS-WOA
In this paper, the proposed RF-GSFS-WOA is defined by the following procedure.

Step 1: the original wind speed time series y(n) is decomposed into I IMFs (let IMF_I = R(n)) by following Step 1-Step 6 in Section 2.1.
Step 2: a quadratic model (QM) is established for each IMF_i:
\[ \mathrm{IMF}_i = a_0 + \sum_{j=1}^{p} a_j x_j + \sum_{j=1}^{p} \sum_{k=j}^{p} a_{jk} x_j x_k + \varepsilon_t, \]
where x₁, …, x_p are the variables (first order) in the original dataset; x_j x_k are the quadratic variables (second order) that are constructed by calculating the component-wise product of x_j and x_k; a₁, a₂, …, a_p, …, a₁₁, …, a_pp are the corresponding coefficients of the variables and quadratic variables; and ε₁, …, ε_n are independent Gaussian noise errors that follow N(0, σ²). I quadratic models will be obtained, which are denoted by QM₁, …, QM_I. As p² + p variables are included in the QM, the computational cost is high even if the value of p is moderate. Therefore, a variable selection method is urgently needed to reduce the computational cost.
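The construction of the second-order variables can be sketched as follows. This is a minimal illustration that enumerates the products x_j x_k with j ≤ k, one common convention for the two-way interaction terms; the paper's own expansion may enumerate the pairs differently.

```python
import numpy as np

def quadratic_features(X):
    """Append all two-way interaction/quadratic terms x_j * x_k
    (j <= k) to a design matrix, as in the quadratic model above."""
    n, p = X.shape
    cols = [X]
    for j in range(p):
        for k in range(j, p):
            cols.append((X[:, j] * X[:, k]).reshape(n, 1))
    return np.hstack(cols)

X = np.arange(6.0).reshape(3, 2)   # p = 2 original variables
Q = quadratic_features(X)
print(Q.shape)                     # 2 first-order + 3 second-order columns
```

With p original variables this adds p(p + 1)/2 second-order columns, which makes clear why feature selection is needed even for moderate p.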
Step 3: to reduce the computational cost of the QM in Step 2, GSFS is used to complete the variable selection task. With a slight abuse of notation, the pseudoresponse R_t⁽ᵏ⁾ is computed as in Section 2.3, and the optimal variables are selected by regressing R_t⁽ᵏ⁾ on the candidate first- and second-order variables. In the next M_n = ⌈(n/log p)^{1/2}/2⌉ steps, the data are orthogonalized via the Gram-Schmidt method before each selection, exactly as in Section 2.3.

Step 4: a random forest model RF_i is built over the variables that were selected in Step 3.
Step 5: assign a weight ω_i to each RF_i; the final forecast is provided by the ensemble model (EM):
\[ \mathrm{EM} = \sum_{i=1}^{I} \omega_i \, \mathrm{RF}_i, \qquad \sum_{i=1}^{I} \omega_i = 1. \]
The RF-GSFS-WOA algorithm is presented in detail as Algorithm 1. The proposed combination forecasting method, namely, RF-GSFS-WOA, has the following advantages. Instead of applying a single decision tree, RF grows many decision trees and, hence, can handle the correlation among the features. It bootstraps samples, which promotes the diversity of the base learners and, consequently, reduces the variance of the model. Furthermore, RF outperforms bagging as the number of base learners increases. It trains fast since it applies a fraction of the features instead of considering all possible features; the time that is required for training RF is often shorter than that for bagging. The overfitting problem of RF can be controlled well via the selection of a suitable number of features in each decision tree. GSFS is a greedy feature selection method, which is a type of forward selection method. In contrast to penalized feature selection methods such as LASSO and SCAD, it does not introduce any bias into the model. WOA is a metaheuristic method that can obtain globally optimal solutions of nonlinear optimization problems. A flowchart of the proposed RF-GSFS-WOA is presented in Figure 3, and the algorithm is also listed in the Appendix.
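Step 5 can be sketched as a weighted sum with normalized weights. The forecasts and weights below are toy values; in the proposed model, each row would be the RF forecast of one IMF and the weights would be supplied by WOA.

```python
import numpy as np

def ensemble_forecast(preds, weights):
    """Combine per-IMF forecasts with weights normalized to sum to 1,
    following the ensemble model EM = sum_i w_i * RF_i above."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # enforce sum(w) == 1
    return np.sum(w[:, None] * np.asarray(preds, dtype=float), axis=0)

# Three IMF-level forecasts over five time steps (toy numbers):
preds = [[1, 2, 3, 4, 5],
         [2, 2, 2, 2, 2],
         [0, 1, 0, 1, 0]]
out = ensemble_forecast(preds, [0.5, 0.3, 0.2])
print(out)
```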

Case Studies
This section describes the data collection process, establishes the forecasting assessment criteria, and presents, discusses, and analyzes the experimental results. The main objective of the experiments is to evaluate the performance of the proposed hybrid forecasting model, namely, RF-GSFS-WOA.

Data Collection.
Qinghai province is located in the northeast of the Qinghai-Tibetan Plateau, and most of the area is rich in wind energy resources; hence, it is suitable for the installation of large-scale wind farms. Thus, it is meaningful to investigate sites in this region. The hourly data were collected between 0:00 and 24:00 from 1/1/2014 to 12/31/2014 at six sites in the Qinghai region. The dataset that is used in this study was collected by the National Renewable Energy Laboratory (NREL) and is available at https://www.nrel.gov/gis/tools.html. To simplify the description, the six sites are denoted by Site 1, Site 2, Site 3, Site 4, Site 5, and Site 6. The longitudes and latitudes of these locations are specified in Figure 4. Five meteorological factors, namely, the temperature (T), pressure (P), relative humidity (RH), precipitable water (PW), and wind direction (WD), are considered in the forecasting of the wind speed for the

Qinghai region, which facilitates the design of wind power plants to efficiently generate electricity for the local population [41]. There are four seasons in each year: spring (Mar., Apr., and May), summer (Jun., Jul., and Aug.), autumn (Sep., Oct., and Nov.), and winter (Dec., Jan., and Feb.). The data are divided into training data and forecasting data as follows: 20 days in each season are randomly chosen as the training data (20 days × 24 hours/day = 480 observations) for building the model, and the forecasting data include 5 days (5 days × 24 hours/day = 120 observations) that are randomly chosen from the remaining days in each season. The performances of the various forecasting models are evaluated in the four seasons. The training sets and forecasting sets for the six sites are described in Table 1. The forecasting sets account for approximately 20% of the total data, which is reasonable.
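The season-wise split described above can be sketched as follows; the day indices and the ~90-day season length are illustrative assumptions for one season.

```python
import numpy as np

rng = np.random.default_rng(2014)
hours_per_day = 24

# One season of roughly 90 days: 20 random days form the training set
# and 5 of the remaining days form the forecasting (test) set.
season_days = np.arange(90)
train_days = rng.choice(season_days, size=20, replace=False)
rest = np.setdiff1d(season_days, train_days)
test_days = rng.choice(rest, size=5, replace=False)

train_hours = len(train_days) * hours_per_day   # 480 hourly observations
test_hours = len(test_days) * hours_per_day     # 120 hourly observations
print(train_hours, test_hours, test_hours / (train_hours + test_hours))
```

Drawing the test days from the remainder guarantees that no hour appears in both sets, and the forecasting share is exactly 120/600 = 20%.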

Forecasting Assessment Criteria.
To evaluate the results of the forecasting models quantitatively, four assessment measures, namely, the mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), and Theil inequality coefficient (TIC), are applied. All are calculated between the actual data and the forecasted values, and the smaller their values, the better the forecasting results. These criteria are expressed as follows:
\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|, \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}, \]
\[ \mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \]
\[ \mathrm{TIC} = \frac{\sqrt{(1/N) \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}}{\sqrt{(1/N) \sum_{i=1}^{N} y_i^2} + \sqrt{(1/N) \sum_{i=1}^{N} \hat{y}_i^2}}, \]
where N is the size of the forecasting sample and y_i and ŷ_i are the actual value and the forecasted value, respectively, for time period i.
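The four criteria transcribe directly into code; the wind speed values below are toy numbers for illustration only.

```python
import numpy as np

def mae(y, f):
    return float(np.mean(np.abs(y - f)))

def rmse(y, f):
    return float(np.sqrt(np.mean((y - f) ** 2)))

def mape(y, f):
    # expressed in percent; assumes no zero actual values
    return float(100.0 * np.mean(np.abs((y - f) / y)))

def tic(y, f):
    # Theil inequality coefficient, bounded in [0, 1]
    return rmse(y, f) / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(f ** 2)))

y = np.array([5.0, 4.0, 6.0, 5.0])   # actual wind speeds (toy)
f = np.array([5.5, 3.5, 6.0, 4.5])   # forecasted values (toy)
print(mae(y, f), rmse(y, f), mape(y, f), tic(y, f))
```

Note that MAPE is undefined when an actual value is zero, which is worth guarding against for calm-wind hours.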

Experiment Results and Corresponding Analysis.
The experimental settings are described in detail in Table 2. The parameters in the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) are set as follows: the noise standard deviation (Nstd) ε₀ is set as 0.2, the number of realizations (Nr) is set as 500, and the maximum allowed number of sifting iterations (MaxIter) is set as 500.
The regularization parameters in LASSO and SCAD are specified in Table 2. Furthermore, Elman is trained using backpropagation through time, and the corresponding weights are selected based on a uniform distribution. Based on the experimental results, the momentum in gradient descent is set as 0.82, and the adaptive learning rate is set as 0.02. In GSFS, we set q as 30; hence, 30 variables will be chosen after the GSFS procedure, and, thus, there are 1200 variables in the quadratic model. The number of trees in RF is set as 3000, and the maximum number of iterations (MaxIterRF) is 500. RF is implemented using the MATLAB function TreeBagger. The fitness function of WOA is selected as "F1," which is the squared error loss function, and the search range is (0, 1). The weights are determined after normalization such that the sum of all the weights is 1.
To evaluate the forecasting performance of the proposed hybrid model for forecasting hourly wind speed, experiments for six sites and four seasons are designed in this paper. Four types of assessment criteria are used in all experiments to evaluate the performance. The proposed hybrid forecasting model is RF-GSFS-WOA, and six forecasting models, namely, an Elman neural network, Elman-LASSO, Elman-SCAD, a BP neural network (BP), BP-LASSO, and BP-SCAD, are applied as comparison models to evaluate the performance of the developed hybrid model. Wind speed data exhibit volatility or instability due to changes in the weather; hence, this study conducts a preprocessing step before building all the models; namely, the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to eliminate the noise and smooth the wind speed time series data before the developed hybrid model and the six comparison models are established for hourly wind speed forecasting in four seasons at six sites.
In each season, 20 days of hourly wind speed time series are selected randomly as the training phase for the construction of the models, and 5 days of wind speed time series are selected randomly as the forecasting phase. Thus, 80% of the data are used as training data, and the remaining 20% are used as test data. The reason for this splitting scheme is that the model can be trained sufficiently with more observations (80% of the data) and overfitting can be avoided. However, the number of observations in the test data is also important: the test accuracy cannot be evaluated accurately if the number of observations is too small (less than 20%). The multiple-output strategy (5 days are used as outputs), which is one of the multistep forecasting techniques, is applied in the experiments. The wind speed is influenced by several meteorological variables. Consequently, the seven forecasting models consider multidimensional meteorological factors, such as the temperature, pressure, relative humidity, precipitable water, and wind direction, as input variables in multivariate models. The use of multivariable inputs increases the difficulty of determining the model parameters.
Four performance criteria, namely, MAPE, RMSE, MAE, and TIC, for each forecasting model in the four seasons at the six sites, are listed in Tables 3-5. Figures 5-7 present the forecasting results of the seven models for observation Sites 1-6, which illustrate that the proposed hybrid model RF-GSFS-WOA yields results that are closer to the real wind speed data than any other forecasting model in all instances. Furthermore, the developed hybrid framework realizes the best MAPE, RMSE, MAE, and TIC values. For hourly wind speed forecasting with multidimensional meteorological variables, the forecasting performance is described in detail in Tables 3-5 and Figures 5-8, from which the following findings are obtained.
(I) From Table 3 and Figure 5, it is found that the Elman-based models, namely, Elman, Elman-LASSO, and Elman-SCAD, outperform the BP-based models, namely, BP, BP-LASSO, and BP-SCAD, and that Elman and BP yield the worst forecasting results. For instance, at Site 1, the developed hybrid model, namely, RF-GSFS-WOA, outperforms the other six forecasting models. The RF-GSFS-WOA model realizes the smallest MAPE of 2.44% in summer, and similar results are obtained in terms of the MAE, RMSE, and TIC criteria.

(II) According to Table 4 and Figure 6, the proposed RF-GSFS-WOA still outperforms the other forecasting models. As in Table 3 and Figure 5, the SCAD-related forecasting models outperform the LASSO-related forecasting models. This is mainly because SCAD can better handle the collinearity of variables than LASSO. Similar phenomena are observed in Table 5 and Figure 7. RF-GSFS-WOA realizes remarkable forecasting performances.

(III) According to Figure 8, which presents the average MAPE and RMSE values of the compared forecasting methods in the four seasons, RF-GSFS-WOA realizes the lowest MAPE values over all seasons, which are 4.04%, 3.73%, 3.02%, 3.08%, 2.81%, and 4.13% at the six sites. Elman-SCAD still outperforms Elman-LASSO and BP-SCAD. The worst value is obtained by BP, which is approximately 20%. BP-LASSO and BP-SCAD slightly outperform BP, which does not perform as well as Elman. Therefore, from the discussions above, the proposed hybrid forecasting model, namely, RF-GSFS-WOA, performs well in the wind speed forecasting task by combining the advantages of the machine learning module RF, the feature selection approach GSFS, and the metaheuristic method WOA.

Statistical Test.
To further evaluate the performance of the proposed forecasting model, a nonparametric statistical test, namely, the Friedman test, is conducted, since the forecasting performances of the compared methods differ among the six sites [42]. The following null hypothesis H₀, along with the alternative hypothesis H₁, is considered in the Friedman test:
\[ H_0: L_1 = L_2 = \cdots = L_k, \qquad H_1: \text{not all } L_j \text{ are equal}, \]
where L_j = (1/B) Σ_i a_i^j, with a_i^j representing the rank of the j-th of the k algorithms on the i-th of the B datasets. The Friedman statistic is defined as follows [43]:
\[ \chi_F^2 = \frac{12B}{k(k+1)} \left[ \sum_{j=1}^{k} L_j^2 - \frac{k(k+1)^2}{4} \right]. \]
This statistic follows the chi-square distribution χ²(k − 1). An F-distributed statistic, namely, F(k − 1, (k − 1)(B − 1)), is calculated as follows:
\[ F_F = \frac{(B-1)\, \chi_F^2}{B(k-1) - \chi_F^2}. \]
If there are significant differences among the compared forecasting methods, the null hypothesis H₀ will be rejected. If this occurs, critical differences (CDs) will be applied for further comparisons, which is known as a post hoc test. A Bonferroni-Dunn test is used with the CD value
\[ \mathrm{CD} = q_{0.1} \sqrt{\frac{k(k+1)}{6B}}, \]
where q₀.₁ is determined as 2.291 in [44]. The Friedman test is described in detail in Table 6. All the F_F values are much larger than the critical value F(6, 30) = 2.42.
This suggests that the null hypothesis H₀ should be rejected and a post hoc test is needed. Using the above-described method, the CD value is calculated as CD = q₀.₁ √(k(k + 1)/(6B)) = 2.291 × √(7 × (7 + 1)/(6 × 6)) = 2.8574, which is used as a benchmark for comparison. The proposed RF-GSFS-WOA significantly outperforms BP-LASSO, BP, and Elman since the differences between the average ranks of these methods exceed 2.8574. For instance, the difference between RF-GSFS-WOA and BP-LASSO is 4.67 for MAE, which is larger than 2.8574. Compared with the other methods, the proposed method does not show strong advantages in this test. However, based on the above analysis, the proposed method delivers remarkable forecasting results.
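The critical-difference computation above can be verified numerically with the values given in the text (k = 7 models, B = 6 sites, q₀.₁ = 2.291):

```python
import math

# Bonferroni-Dunn critical difference: CD = q * sqrt(k*(k+1) / (6*B))
k, B, q = 7, 6, 2.291
cd = q * math.sqrt(k * (k + 1) / (6 * B))
print(round(cd, 4))   # 2.8574
```

Any pair of methods whose average ranks differ by more than this CD is declared significantly different at the 0.1 level.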