A Hybrid Model Based on Ensemble Empirical Mode Decomposition and Fruit Fly Optimization Algorithm for Wind Speed Forecasting

As a clean and renewable energy source, wind power has increasingly captured the world's attention. Reliable and precise wind speed prediction is vital for wind power generation systems, so a more effective and precise prediction model is needed in the field of wind speed forecasting. Most previous forecasting models could adapt to various wind speed series; however, these models ignored the importance of data preprocessing and model parameter optimization. In view of this, a novel hybrid ensemble learning paradigm is proposed. In this model, the original wind speed data are first divided into a finite set of signal components by ensemble empirical mode decomposition. Each component is then predicted by several artificial intelligence models whose parameters are optimized by the fruit fly optimization algorithm, and the final prediction values are obtained by reconstructing the refined series. To assess the forecasting ability of the proposed model, 15 min wind speed data from wind farms in the coastal areas of China were used as a case study. The empirical results show that the proposed hybrid model is superior to some existing traditional forecasting models in forecast performance.


Introduction
The world's current sources of fossil fuels will eventually be depleted, mainly due to high demand and, in some situations, extravagant consumption [ ].
The recently posted Energy Outlook of British Petroleum predicts that primary energy consumption will increase by % between and , with growth averaging . % per year. Approximately % of the expected growth will be in countries that are not members of the Organization for Economic Cooperation and Development (OECD), with energy consumption growing at . % per year [ ]. According to some statistics, energy demand worldwide will grow rapidly by one-third from to , and China and India will become the largest contributors, accounting for percent of the growth during that period. Moreover, China is expected to be the largest oil importer by [ , ]. To cope with the growing demand for energy, countries such as China can look to renewable energy sources as an opportunity for sustainable development. The significance of renewable sources was recently underpinned by a plethora of advocates and reports, which have mostly focused on wind energy studied by the related institutions and energy commissions of several countries [ , -]. According to reports from the China National Renewable Energy Center (CNREC), wind resources in China are rich and have promising prospects, carrying a potential of more than . TW, mostly in the Three North Areas, with an onshore potential of more than . TW. Before , land-based wind power will dominate, with offshore wind power in the demonstration status. Furthermore, the annual discharge of carbon dioxide will be reduced to . billion tons and . billion tons in in the conservative and aggressive scenarios, and an estimated jobs and jobs will be created, respectively [ , ]. Based on these figures, wind energy should be regarded as an appealing energy option because it is both abundant and [...]

[...] data processing, EEMD was especially devised for nonlinear and complicated signal sequences, such as wind speed series. For example, Hu et al. [ ] proposed a hybrid method based on the EEMD to decompose the original wind speed datasets into a series of independent Intrinsic Mode Functions (IMFs) and use SVM to predict the values of IMFs at different frequencies. Jiang et al. [ ] also proposed a hybrid model for high-speed rail demand forecasting based on EEMD, in which the original series are decomposed into signals with different frequencies and then the grey support vector machine (GSVM) is employed for forecasting. Zhou et al. [ ] additionally proposed a hybrid method based on EEMD and the generalized regression neural network (GRNN). In this method, the original data are decomposed by EEMD into different IMFs with corresponding frequencies and a residue component, and then each component is taken as an input to establish a GRNN forecasting model.
Each of the aforementioned models employs only a single ANN model to predict all of the signal sequences decomposed by EEMD; however, different signals have different characteristics, so a single model cannot adapt to all properties of the data. Moreover, previous literature has not addressed which features are best suited for choosing the most appropriate approach. Thus, in our study, we propose a hybrid model based on a model selector that combines RBF, GRNN, and SVR to address signal data series with different characteristics and further improve forecasting accuracy.
In existing neural network training structures, model parameters are vital factors affecting prediction precision, and different types of data require different parameters. The genetic algorithm (GA) and particle swarm optimization (PSO) are the most common approaches to optimizing the parameters of neural network structures. Liu et al. [ ] used the genetic algorithm to determine the weight coefficients of a combined model for wind speed forecasting. Zhao et al. [ ] developed a combined model for energy consumption prediction based on model parameter optimization with the genetic algorithm. Ren et al. [ ] applied particle swarm optimization to set the weight coefficients of a forecasting model for -hour wind speed forecasting. However, these metaheuristic algorithms have the drawbacks of being hard to understand and converging slowly to the global optimal solution. The fruit fly optimization algorithm (FOA) [ ] is a newer optimization and evolutionary computation technique with a simple computational process, fewer parameters to fine-tune, and a stronger ability to search for global optimal solutions; it has been reported to outperform other metaheuristic algorithms [ , ]. In our study, we introduce the FOA to automatically determine the necessary parameters of the RBF, GRNN, and SVR models to achieve better performance.
The rest of the paper is organized as follows. Section briefly introduces related methods, while Section describes the proposed hybrid approach in detail. Section describes the dataset used for this study and discusses the forecasting results of the proposed model compared with other prediction models. Section concludes the work.

Related Methodology
This section briefly introduces EEMD, FOA, and three classical forecasting models: RBF, GRNN, and SVR, which will be used in our research.
RBF. The radial basis function (RBF) neural network is a type of feedforward network developed by Broomhead and Lowe [ ]. This type of neural network is based on a supervised algorithm and has been widely applied to interpolation, regression, prediction, and classification [ -]. It has a three-layer architecture in which there are no weights between the input and hidden layers, and each hidden unit implements a radial activation function. The Gaussian activation function is used in each neuron of the hidden layer, which can be formulated as

$$\phi_j(x_i) = \exp\left(-\frac{\lVert x_i - \mu_j \rVert^2}{2\sigma_j^2}\right), \quad i = 1, 2, \ldots, N,$$

where $x_i$ is the $i$th input sample, $\mu_j$ is the mean value of the $j$th hidden unit, representing the center vector, $\sigma_j$ is the covariance of the $j$th hidden unit, denoting the width of the RBF kernel function, and $N$ is the number of training samples. The network output layer is linear, so the $k$th output is an affine function that can be expressed as

$$y_k(x_i) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x_i) + b_k,$$

where $w_{kj}$ is the weight between the $k$th output and the $j$th hidden unit, $b_k$ is the bias weight of the $k$th output, and $M$ is the number of hidden nodes.
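The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual Gaussian form exp(-||x - c||^2 / (2 s^2)) for the hidden units; the function name `rbf_forward` and the toy centers, widths, weights, and biases are hypothetical and do not come from the study.

```python
import math

def rbf_forward(x, centers, widths, weights, biases):
    """Return the outputs of a Gaussian RBF network for one input vector x."""
    # Hidden layer: one Gaussian unit per center (no weights on the inputs).
    hidden = []
    for c, s in zip(centers, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        hidden.append(math.exp(-sq_dist / (2.0 * s ** 2)))
    # Output layer: affine combination of the hidden activations.
    return [sum(w * h for w, h in zip(w_row, hidden)) + b
            for w_row, b in zip(weights, biases)]

# Toy network (assumed values): two hidden units, one output.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [[2.0, -1.0]]
biases = [0.5]
out = rbf_forward([0.0, 0.0], centers, widths, weights, biases)
```

At the first center the first hidden unit outputs 1 and the second outputs exp(-1), so the single output equals 2.5 - exp(-1). In practice the centers, widths, and output weights would be fitted to data rather than set by hand.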

GRNN.
The general regression neural network (GRNN), first proposed by Specht [ ], is a powerful computational technique used to solve nonlinear approximation problems based on nonlinear regression theory. The advantages of the GRNN include its good feasibility, simple structure, and fast convergence rate. It consists of four layers, and its basic principles are presented in Figure .

Figure : A structure schematic chart of GRNN. The caption defines the input variable of the network, the training vector of each neuron in the pattern layer, the smoothing parameter (also called the spread parameter), the measured value of the output variable, the pattern Gaussian function, the two network weights, the two signals from the summation neurons, and the network output.

[...] definition of the IMF above, the EMD process of a raw data series $x(t)$ ($t = 1, 2, \ldots, T$) can be formulated as

$$x(t) = \sum_{i=1}^{n} \mathrm{imf}_i(t) + r_n(t),$$

where $x(t)$ denotes any nonlinear and nonstationary signal, $\mathrm{imf}_i(t)$ is the $i$th IMF of the signal, and $r_n(t)$ is the residual item, which can be a constant or the signal mean trend. However, the EMD method is imperfect, and the mode-mixing problem [ ] is encountered frequently in practical applications. To address this drawback, Wu and Huang [ ] proposed the EEMD method, whose procedure is as follows.
Step (a). Add a white noise series to the original data.
Step (b). Decompose the data with added white noise into IMFs through the EMD algorithm.
Step (c). Repeat the abovementioned two steps, but add white noise series at different scales each time.
Step (d). Calculate the means of each IMF of the decomposition to constitute the final IMFs.
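Steps (a) to (d) can be sketched as an averaging loop around any decomposition routine. The sketch below is illustrative only: `eemd` and `toy_decompose` are hypothetical names, the toy decomposer is a moving-average split standing in for a real EMD sifting routine, and the noise amplitude is kept fixed across trials. A real EMD may also return a different number of IMFs per trial, which this sketch does not handle.

```python
import random

def eemd(signal, decompose, n_ensemble=50, noise_std=0.2, seed=0):
    """Average the components returned by `decompose` over noisy copies of `signal`.

    `decompose` is a stand-in for an EMD routine: it takes a list of floats and
    returns a fixed number of component lists, each as long as the input.
    """
    rng = random.Random(seed)
    sums = None
    for _ in range(n_ensemble):
        # Step (a): add a fresh white noise series to the original data.
        noisy = [x + rng.gauss(0.0, noise_std) for x in signal]
        # Step (b): decompose the noisy copy into components.
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in comps]
        for acc, comp in zip(sums, comps):
            for i, v in enumerate(comp):
                acc[i] += v
        # Step (c): the loop repeats (a) and (b) with new noise each time.
    # Step (d): the ensemble mean of each component is the final component.
    return [[v / n_ensemble for v in acc] for acc in sums]

def toy_decompose(series, window=5):
    """Toy stand-in for EMD: split a series into a detail part and a smooth trend."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    detail = [s - t for s, t in zip(series, trend)]
    return [detail, trend]
```

Because the added noise has zero mean, the averaged components still sum to approximately the original signal, and the leftover noise shrinks as the ensemble grows.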
As a result, the white noise series incorporated into the original signal can provide a uniform reference scale to facilitate the EMD process and, consequently, help extract the true IMFs. The relationship between the ensemble number, the error tolerance, and the added noise level can be described according to the well-established statistical rule proved by Wu and Huang:

$$\varepsilon_n = \frac{\varepsilon}{\sqrt{N}},$$

where $\varepsilon$ is the amplitude of the added noise, $\varepsilon_n$ is the final standard deviation of error, and $N$ is the number of ensemble members. Generally, it is suggested that an amplitude fixed at . will result in an exact result. In this study, we set the number of ensemble members to and select the optimal standard deviation of the white noise series from . to . with a -fold cross-validation method.

Advances in Meteorology

Fruit Fly Optimization Algorithm (FOA).
The fruit fly optimization algorithm (FOA), inspired by the food-finding behavior of the fruit fly, is a new swarm intelligence algorithm that was put forward by Pan in [ ]. It is an interactive evolutionary computation method for finding global optima and has been reported to perform better than traditional metaheuristic algorithms. The FOA has been applied successfully to optimization problems and has received significant attention in multiple scientific and academic fields. The fruit fly is superior to other species in its visual and olfactory senses. It can make the most of these instinctive advantages to find food and is even capable of smelling a food source from km away. A fruit fly searches for food by first using its olfactory organ to smell food odors in the air and flying towards that location. Upon getting closer to the food location, it uses its keen eyesight to find the food and the flocking location of its companions, and then it flies to that position too. Figure shows the iterative process of food searching of a fruit fly swarm.
A rudimentary FOA is outlined in Algorithm .
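The smell-then-vision loop can be sketched as follows. This is a minimal illustration of the commonly described basic FOA for a single positive parameter, not a transcription of the paper's algorithm listing; the function name `foa_minimize`, the swarm size, the step size, and the initial location range are all assumptions.

```python
import math
import random

def foa_minimize(fitness, n_flies=20, n_iter=200, step=1.0, seed=0):
    """Minimize `fitness(s)` over a positive scalar s with a basic fruit fly search."""
    rng = random.Random(seed)
    # Random initial location of the swarm.
    x_axis, y_axis = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
    best_s, best_smell = None, math.inf
    for _ in range(n_iter):
        gen_best = None  # (smell, s, x, y) of the best fly in this generation
        for _ in range(n_flies):
            # Smell-based search: each fly takes a random step around the swarm.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            dist = math.hypot(x, y)
            if dist == 0.0:
                continue
            s = 1.0 / dist      # smell concentration judgment value
            smell = fitness(s)  # smell concentration (lower is better here)
            if gen_best is None or smell < gen_best[0]:
                gen_best = (smell, s, x, y)
        # Vision-based step: the swarm flies to the best fly if it improves.
        if gen_best is not None and gen_best[0] < best_smell:
            best_smell, best_s, x_axis, y_axis = gen_best
    return best_s, best_smell

best_s, best_smell = foa_minimize(lambda s: (s - 0.5) ** 2)
```

Here the candidate parameter is the reciprocal of the fly's distance to the origin, so the search only covers positive values; tuning several parameters at once would need one such coordinate pair per parameter.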

Combined Model
The combined model first applies the EEMD technique to decompose the original time series into a collection of relatively stationary subseries, and model selection is then used to select the optimal model among the above artificial neural networks, optimized by the FOA, for predicting each subseries. The prediction results are then aggregated to obtain the final prediction values of the wind speed series.

Table : Four evaluation rules (columns: Metric, Equation, Definition).
Through the process of EEMD, distinct information scales in the original wind speed series can be determined and decomposed into a set of IMFs. Additionally, different IMFs exhibit different frequency characteristics, and the instantaneous frequency of each IMF has its meaning at any point. Moreover, no clear theory exists to determine which characteristic is best suited for choosing the most suitable approach. Thus, we must describe some performance metrics to comprehensively measure the strengths of different models. To evaluate the forecast capacity of the proposed models, three evaluation criteria are applied in model selection. They are the mean absolute error (MAE), root mean square error (RMSE), and index of agreement (IA), as shown in Table . Here, $y_t$ and $\hat{y}_t$ denote the real and predicted values at time $t$, respectively, and $N$ is the sample size. The IA is a dimensionless indicator that portrays the similarity between the observed and forecasted tendencies. The range of IA is from 0 to 1, and for a "perfect" model the value of IA is close to 1 while the MAE and RMSE are equal to 0.
Step (model selection and optimization of model parameters). First, the FOA is used to select appropriate parameters for the RBF, GRNN, and SVR models. Next, model selection chooses among these optimized models to forecast the IMFs and the residual R.
Step (ensemble forecast). Combine the forecasting results of each signal component to obtain the final result.
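For reference, the three criteria can be computed as below. The table's equations are not reproduced in this text, so the sketch assumes the standard definitions, with IA taken as Willmott's index of agreement; the function names are illustrative.

```python
import math

def mae(y, y_hat):
    """Mean absolute error between observed y and predicted y_hat."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error between observed y and predicted y_hat."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def index_of_agreement(y, y_hat):
    """Willmott's index of agreement (assumed definition); 1 means perfect agreement."""
    y_bar = sum(y) / len(y)
    num = sum((p - a) ** 2 for a, p in zip(y, y_hat))
    den = sum((abs(p - y_bar) + abs(a - y_bar)) ** 2 for a, p in zip(y, y_hat))
    return 1.0 - num / den

observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # every forecast is off by exactly 1
scores = (mae(observed, shifted), rmse(observed, shifted),
          index_of_agreement(observed, shifted))
```

For the shifted forecasts above, MAE and RMSE are both 1 and IA is 1 - 4/25 = 0.84. The IA denominator is zero when observations and forecasts are all equal to the observed mean, so a constant series needs separate handling.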
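The select-then-combine logic of these steps can be sketched with simple stand-in forecasters. Everything named here is hypothetical: the three toy models replace the FOA-optimized RBF, GRNN, and SVR, MAE on a held-out tail is used as the only selection criterion, and the components would come from EEMD rather than being written by hand.

```python
def mae(y, y_hat):
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

# Stand-in one-step-ahead forecasters (NOT the paper's models).
def persistence(history):
    return history[-1]

def mean_of_last(history, k=4):
    tail = history[-k:]
    return sum(tail) / len(tail)

def linear_trend(history):
    return 2 * history[-1] - history[-2]

def select_model(series, candidates, n_val):
    """Pick the candidate with the lowest one-step-ahead MAE on the last n_val points."""
    best_name, best_err = None, float("inf")
    start = len(series) - n_val
    for name, model in candidates.items():
        preds = [model(series[:t]) for t in range(start, len(series))]
        err = mae(series[start:], preds)
        if err < best_err:
            best_name, best_err = name, err
    return best_name

def hybrid_forecast(components, candidates, n_val=10):
    """Forecast each component with its selected model and sum the forecasts."""
    chosen, total = [], 0.0
    for comp in components:
        name = select_model(comp, candidates, n_val)
        chosen.append(name)
        total += candidates[name](comp)
    return total, chosen

candidates = {"persistence": persistence, "mean": mean_of_last, "trend": linear_trend}
oscillating = [1.0 if i % 2 == 0 else -1.0 for i in range(30)]
ramp = [0.5 * i for i in range(30)]
forecast, chosen = hybrid_forecast([oscillating, ramp], candidates)
```

On this toy input the oscillating component is best served by the short mean and the ramp by the linear trend, so different components end up with different models, which is the point of the model selector.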

Results and Analysis
In this section, the RBF, GRNN, and SVR models optimized by the FOA are described first, followed by the model selection process. The section concludes with the final forecasting results of the hybrid model compared with other forecasting models.
Data Selection. Shandong Province, located in eastern China, has abundant wind energy resources. In our study, the wind speed series from the wind farm in Weihai was used to examine the performance of the combined model. Figures (a) and (b) present the statistical measures and visual graphs of four wind speed datasets, which show apparent differences between the four seasons.
Thus, the original wind speed data, picked randomly from each of the four seasons of the year, are used to test whether the proposed models can be applied on different occasions. The wind speed data were sampled at an interval of 15 min, so there are 96 data records per day. Data from days, providing a total of points of 15 min data, were selected for model training, and the next of the 15 min data values were used to test the effectiveness of the developed hybrid model (as shown in Figure (b)).
The Performance Metric. Forecasting accuracy is an important criterion in the evaluation of forecasting models. In this paper, three metric rules were applied to evaluate the accuracy of forecasting models, as shown in Table . In addition, two benchmark models and the bias-variance framework are used to test the hybrid model.

Figure : The process of food-seeking of a fruit fly swarm (labels: the best fruit fly, food, fly group, iterative evolution).
Persistence Model. The persistence model is a simple statistical model that is easy to compute and provides accurate predictions over very short horizons; it has been widely used as a benchmark to evaluate the accuracy of more advanced forecasting models. The persistence model can be given by

$$\hat{y}(t + k) = y(t),$$

where $\hat{y}(t + k)$ is the forecasting value, $t$ is a time index, and $k$ is the look-ahead time.
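As a quick illustration, the persistence benchmark amounts to shifting the series by the look-ahead time. The helper name `persistence_forecast` and the toy series are illustrative.

```python
def persistence_forecast(series, k=1):
    """Pair each target y(t + k) with its persistence forecast y(t); requires k >= 1."""
    if k < 1 or k >= len(series):
        raise ValueError("k must satisfy 1 <= k < len(series)")
    forecasts = series[:-k]  # value observed at time t
    targets = series[k:]     # value that actually occurred at time t + k
    return targets, forecasts

targets, forecasts = persistence_forecast([1.0, 2.0, 4.0, 7.0], k=1)
```

The returned pairs can be passed directly to an error metric to obtain the benchmark score.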
Autoregressive Integrated Moving Average (ARIMA). The ARIMA model is widely used because it can characterize nonstationary data. A general ARIMA model is known as ARIMA($p$, $d$, $q$), where $p$ is the order of the autoregressive part, $d$ is the number of differences applied to the original time series to make it stationary, and $q$ is the order of the moving average part. The general equation for ARIMA models, written for the differenced series, is

$$y_t = \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j},$$

where $y_t$ is the observed value at time $t$, $\phi_i$ is the $i$th autoregressive parameter, $\theta_j$ is the $j$th moving average parameter, and $\varepsilon_t$ is the error at time $t$.
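As a small worked case, consider ARIMA(1, 1, 0) with no constant term. This order is chosen purely for illustration and is not the order used in the study. Writing $w_t$ for the once-differenced series and substituting back shows how the differencing step turns into a two-lag recursion on the original series:

```latex
w_t = y_t - y_{t-1}, \qquad w_t = \phi_1 w_{t-1} + \varepsilon_t
\;\Longrightarrow\;
y_t = (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \varepsilon_t .
```

With $\phi_1 = 0$ this reduces to a random walk, whose one-step forecast coincides with the persistence model.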
Figure : The procedures of wind speed forecasting using the hybrid model.

Bias-Variance Framework. To evaluate the wind speed forecasting models, the bias-variance framework [ ] was employed to assess the accuracy and stability of the proposed hybrid model and the single models. Let $y_i - \hat{y}_i$ be the difference between the observed value $y_i$ and the predicted value $\hat{y}_i$; the average difference over all points is

$$E(y - \hat{y}) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right),$$

where $y_i$ is the $i$th data point for performance evaluation and $N$ is the number of forecasting data used for performance evaluation. The expectation of the forecasting values is $E(\hat{y}) = (1/N)\sum_{i=1}^{N}\hat{y}_i$, and the expectation of the actual values is $\bar{y} = (1/N)\sum_{i=1}^{N} y_i$. The bias-variance framework can be decomposed as follows:

$$\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 = \left[E(\hat{y}) - \bar{y}\right]^2 + \frac{1}{N}\sum_{i=1}^{N}\left[\left(\hat{y}_i - y_i\right) - \left(E(\hat{y}) - \bar{y}\right)\right]^2 = \mathrm{Bias}^2(\hat{y}) + \mathrm{Var}(\hat{y}),$$

where $\mathrm{Bias}^2(\hat{y})$ indicates the prediction accuracy of the forecasting model and $\mathrm{Var}(\hat{y})$ demonstrates the stability.
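The decomposition can be checked numerically. In this sketch the variance is taken over the forecast errors, which is an assumption made so that squared bias plus variance equals the mean squared error exactly; the function name and the toy values are illustrative.

```python
def bias_variance(y, y_hat):
    """Return (bias squared, variance of the forecast error, mean squared error)."""
    n = len(y)
    errors = [p - a for a, p in zip(y, y_hat)]
    bias = sum(errors) / n  # equals mean(y_hat) - mean(y)
    var = sum((e - bias) ** 2 for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    return bias ** 2, var, mse

bias_sq, var, mse = bias_variance([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 4.5])
```

For these toy values the errors are 0.5, 0.5, -0.5, and 0.5, giving a squared bias of 0.0625, a variance of 0.1875, and a mean squared error of 0.25.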
Process of Parameter Optimization. Selecting appropriate parameters is critical to improving the accuracy of model prediction; thus, the abovementioned FOA is used to optimize the parameters of the RBF, GRNN, and SVR [...] After that, the offspring is entered into the three models, and the smell concentration value is calculated again. Then the smell concentration (Smell) is calculated by substituting into the smell concentration judgment function (also called the fitness function); the smaller the value of the fitness function, the better the result. Through the fruit fly's random food searching using its sensitive sense of smell, and its flocking to the location of the highest smell concentration using its vision, the optimal parameters of the three models are obtained.
To test the effect of the model parameters optimized by the FOA, the four seasons of wind speed data were selected. The three criteria were employed to evaluate the performance of the three models optimized by the FOA. Results of the comparison are shown in Table and
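To make the idea concrete, the sketch below tunes a single parameter, the GRNN spread, with a basic fruit fly search that uses validation RMSE as the fitness function. It is an illustration under stated assumptions rather than the study's procedure: the GRNN is written as a Gaussian-kernel weighted average of training targets, and the names `grnn_predict`, `validation_rmse`, and `foa_tune_spread`, along with the swarm settings, are hypothetical.

```python
import math
import random

def grnn_predict(x, train_x, train_y, spread):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * spread ** 2))
               for xi in train_x]
    total = sum(weights)
    if total == 0.0:  # all kernels underflowed: fall back to the mean target
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total

def validation_rmse(spread, train, valid):
    """Fitness function: RMSE on the validation set for a given spread."""
    train_x, train_y = train
    valid_x, valid_y = valid
    sq = [(grnn_predict(x, train_x, train_y, spread) - y) ** 2
          for x, y in zip(valid_x, valid_y)]
    return math.sqrt(sum(sq) / len(sq))

def foa_tune_spread(train, valid, n_flies=10, n_iter=30, step=1.0, seed=0):
    """Search for a spread with low validation RMSE using a basic fruit fly search."""
    rng = random.Random(seed)
    x_axis, y_axis = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)
    best_spread, best_fit = None, math.inf
    for _ in range(n_iter):
        for _ in range(n_flies):
            # Smell phase: random step, then candidate spread = 1 / distance.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            dist = math.hypot(x, y)
            if dist == 0.0:
                continue
            spread = 1.0 / dist
            fit = validation_rmse(spread, train, valid)
            if fit < best_fit:
                # Vision phase: remember the best location for the swarm.
                best_fit, best_spread = fit, spread
                x_best, y_best = x, y
        if best_spread is not None:
            x_axis, y_axis = x_best, y_best
    return best_spread, best_fit
```

The same loop could tune an SVR or RBF parameter by swapping the fitness function; tuning several parameters together would need one coordinate pair per parameter.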

Figure : The process of the hybrid model.
The Process of Model Selection. Given the complexity and chaos of the original wind speed series, the tendency of wind speed is very difficult to predict directly by using the abovementioned individual models. As such, the original wind speed datasets are decomposed by EEMD into several IMFs and a residue R, which makes the raw datasets easier to simulate. The FOARBF, FOAGRNN, and FOASVR models are used to forecast each IMF and the residue R; the input nodes, hidden nodes, and output nodes of the three neural networks are set to , , and , respectively. The rolling operation method was used in this paper, and the wind speed data in four seasons were selected to test the proposed models.
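One common reading of the rolling operation is a sliding window in which the most recent values form the network input and the next value is the target. The sketch below assumes that reading; the helper name `make_windows` and the window length of 3 are illustrative, since the number of input nodes is not given in this text.

```python
def make_windows(series, n_inputs):
    """Build rolling (input window, next value) pairs from a 1-D series."""
    inputs, targets = [], []
    for t in range(n_inputs, len(series)):
        inputs.append(series[t - n_inputs:t])  # the n_inputs most recent values
        targets.append(series[t])              # the value to be forecast
    return inputs, targets

inputs, targets = make_windows([1, 2, 3, 4, 5], n_inputs=3)
```

Each IMF and the residue would be windowed separately in this way before being passed to its forecasting model.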
The selection process of the hybrid model is shown in Figure and its results are shown in Tables -, and

Forecasting Results and Comparative Analysis. In the abovementioned process, the six independent IMFs and one residual decomposed by EEMD are predicted by three different models: FOARBF, FOAGRNN, and FOASVR. The optimal model corresponding to each IMF and the residue R is then selected through model selection. In Step , each IMF is predicted by the selected optimal method, and the final results are obtained by assembling the forecasting results of each IMF.
Forecasting Comparison Results. To evaluate the forecasting accuracy of the proposed hybrid model based on the model selector, three single models and two benchmark models are compared with the hybrid model. The single models are the FOARBF, FOAGRNN, and FOASVR, each of which is used for forecasting all of the signals decomposed by EEMD. The two benchmark models are the persistence model and the ARIMA model. The comparison results for forecasting ability are shown in Table . Detailed analyses are elaborated as follows: [...] Above all, the proposed hybrid model has been verified as an effective approach for improving the forecasting performance through the analysis of the prediction results.
Tested with Bias-Variance Framework. [...] Thus, it is clear that the hybrid model has higher accuracy and stability in wind speed forecasting, and it performs much better than the individual models.

Conclusions
Reliable and precise wind speed forecasting is vital for wind power generation systems. However, wind speed shows nonlinearity and nonstationarity, which pose great challenges to the task of predicting wind speed precisely. Regarding the currently available forecasting models, the single model applied for forecasting wind speed has limited capacity and is not suitable for all situations. [...] forecasting accuracy. The experimental results indicate that the proposed hybrid model has minimum statistical error in terms of MAE, RMSE, IA, and bias variance, which shows that the proposed hybrid method performs better than single models and is superior to other hybrid models as well, such as the EEMD-FOARBF, EEMD-FOAGRNN, and EEMD-FOASVR. Based on the abovementioned analysis, we conclude that the proposed hybrid model can not only take full advantage of several single ANNs to improve prediction accuracy but also be easily implemented in wind parks.

[ ] J. Wang, W. Zhang, J. Wang, T. Han, and L. Kong, "A novel hybrid approach for wind speed prediction," Information Sciences, vol. , pp.
[ ] E. Haven, X. Liu, and L. Shen, "De-noising option prices with the wavelet method," European Journal of Operational Research, vol. , no. , pp. -, .
[ ] X. Jiang, L. Zhang, and M. X. Chen, "Short-term forecasting of high-speed rail demand: a hybrid approach combining ensemble empirical mode decomposition and gray support vector machine with real-world applications in China," Transportation Research Part C: Emerging Technologies, vol. , pp.
[ ] Q. Zhou, H. Jiang, J. Wang, and J. Zhou, "A hybrid model for PM2.5 forecasting based on ensemble empirical mode decomposition and a general regression neural network," Science of the Total Environment, vol. , pp.