A Novel Hybrid Model for Short-Term Wind Speed Forecasting Based on Twice Decomposition, PSR, and IMVO-ELM

Accurate wind speed forecasting is an effective way to improve the safety and stability of the power grid. In this paper, a novel hybrid model based on twice decomposition, phase space reconstruction (PSR), and an improved multiverse optimizer-extreme learning machine (IMVO-ELM) is proposed to enhance the performance of short-term wind speed forecasting. Considering the nonstationarity of the wind speed signal, a twice decomposition based on improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), fuzzy entropy, and variational mode decomposition (VMD) is first proposed to reduce the nonstationarity of the original signal. Then, PSR based on the C-C method is employed to reconstruct the decomposed signal as the input of the prediction model. Lastly, an improved multiverse optimizer is proposed to improve the stability and efficiency of the ELM, which is used as the prediction model. Furthermore, two experiments are designed to verify the performance of the proposed method; the results indicate that (1) wind speed forecasting with twice decomposition of the original wind speed signal is better than single-decomposition methods and much better than forecasting without decomposition; (2) the C-C-PSR method can determine the input dimension of the ELM and improve its prediction accuracy; (3) the IMVO improves the stability of the ELM, and its optimization efficiency is better than that of the other compared optimization methods. The results show that the proposed hybrid approach is a useful tool for short-term wind speed forecasting.


Introduction
With the depletion of fossil energy and increasingly strict environmental-protection requirements, energy supply has become an important problem. Developing clean energy is an effective way to solve it. Wind energy, as a cheap, renewable, and pollution-free energy source, has been vigorously developed by many countries, and installed wind-turbine capacity is increasing rapidly [1]. According to statistics, installed wind-turbine capacity increased from 487 GW in 2016 to 702 GW in 2020 [2].
Wind speed is random, intermittent, and fluctuating, which makes the output power of wind turbines unstable. As large-scale wind power is connected to the grid, this unstable output brings great challenges to the power grid [3]. Accurate wind speed forecasting is an effective tool to improve the safety and stability of the power grid [4]. Many wind speed forecasting methods have been proposed in the past few decades. These methods can be classified into two categories [5]: physical-driven methods and data-driven methods. Physical-driven methods are usually built from topography, temperature, air density, air pressure, and altitude, and numerical weather prediction (NWP) is employed for forecasting [6,7]. Because of the low resolution of NWP, physical-driven methods usually cannot meet the demands of short-term wind speed forecasting [8].
Data-driven methods need only historical data for forecasting, which makes them more suitable for short-term wind speed forecasting. They can be divided into two categories: statistical algorithms and artificial intelligence algorithms. The statistical algorithms employed for wind speed forecasting mainly include the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model [9,10]. The ARMA model is linear and is not well suited to nonstationary signals [11]. The ARIMA model converts a nonstationary signal into a stationary time series by differencing, which improves the prediction accuracy of wind speed [12]. With the development of computer science, artificial intelligence algorithms have been widely employed in wind speed forecasting, such as support vector machine (SVM) [13,14], backpropagation (BP) neural network [15], Elman neural network [16,17], and extreme learning machine (ELM) [18,19]. Among these algorithms, the ELM has the fastest computation speed and strong generalization ability [20], which makes it well suited to short-term forecasting.
Because wind speed is nonstationary, data preprocessing can extract more useful features from the original wind speed signal and thus improve prediction accuracy [21,22]. Data preprocessing methods have been widely used to reduce the nonstationarity of the wind speed signal, such as wavelet transform (WT) [12], empirical mode decomposition (EMD) [23], ensemble empirical mode decomposition (EEMD) [24], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [25], improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) [26], and variational mode decomposition (VMD) [27]. ICEEMDAN alleviates the mode mixing problem and greatly reduces the residual noise in the intrinsic mode functions (IMFs) [28,29]. VMD also addresses mode mixing by decomposing the signal into band-limited subseries [30,31].
In this paper, a novel hybrid model for short-term wind speed forecasting based on twice decomposition, phase space reconstruction (PSR), and an improved multiverse optimizer-extreme learning machine (IMVO-ELM) is proposed. The proposed method consists of a data processing module, a prediction module, and a module that combines the final results. A twice-decomposition method based on ICEEMDAN, fuzzy entropy, and VMD serves as the data processing module, and a prediction model based on C-C-PSR and IMVO-ELM serves as the prediction module. The main contributions of this paper are as follows:
(1) A twice decomposition based on ICEEMDAN, fuzzy entropy, and VMD is proposed for the wind speed signal to improve prediction accuracy. ICEEMDAN is first applied to the original wind speed signal. Because some of the high-frequency IMFs are still complex to predict, fuzzy entropy is used to estimate the complexity of each IMF, and VMD is employed to further decompose the complex IMFs.
(2) PSR based on the C-C method is used to construct the input of the prediction model to improve prediction accuracy.
(3) An improved multiverse optimizer is proposed to optimize the input-to-hidden weights and the hidden-layer biases of the ELM. The IMVO-ELM improves the stability and efficiency of the ELM.
The rest of this paper is organized as follows: the theoretical background related to the proposed method is described in Section 2. The proposed hybrid model and methodology are described in detail in Section 3. Experiments are conducted and the results are analyzed in Section 4. Conclusions are given in Section 5.

Theoretical Background
The theoretical background related to the proposed method is briefly reviewed in this section, including ICEEMDAN, VMD, fuzzy entropy, PSR based on the C-C method, and ELM.
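2.1. ICEEMDAN.
ICEEMDAN was proposed by Colominas et al. [32] as an improvement of CEEMDAN, which is itself recognized as an important improvement of EEMD. Instead of adding white noise directly to the original signal, ICEEMDAN adds EMD modes of white noise, which greatly reduces the residual noise in the IMFs. The detailed steps of ICEEMDAN are as follows:
Step 1: The first EMD mode of each white-noise realization is added to the original signal:
$$x^{(i)} = f + \beta_0 E_1\left[w^{(i)}\right], \quad i = 1, 2, \ldots, I$$
where $f$ is the original signal, $\beta_0$ controls the signal-to-noise ratio (SNR) of the added noise, $E_k[\cdot]$ denotes the $k$-th mode obtained by EMD, $w^{(i)}$ denotes the $i$-th white-noise realization added to the original signal, and $I$ is the total number of white-noise realizations.
Step 2: The first-order residual and the first IMF are calculated:
$$r_1 = \left\langle M\left[x^{(i)}\right]\right\rangle, \qquad c_1 = f - r_1$$
where $r_1$ is the first-order residual, $M[\cdot]$ computes the local mean of a signal, $\langle\cdot\rangle$ denotes averaging over the $I$ realizations, and $c_1$ is the first IMF.
Step 3: The remaining residuals and IMFs are calculated recursively for $k = 2, 3, \ldots$:
$$r_k = \left\langle M\left[r_{k-1} + \beta_{k-1} E_k\left[w^{(i)}\right]\right]\right\rangle, \qquad c_k = r_{k-1} - r_k$$
where $r_k$ is the $k$-th order residual, $\beta_{k-1}$ is the noise coefficient at stage $k$, and $c_k$ is the $k$-th IMF.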

VMD.
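VMD is an adaptive decomposition algorithm that decomposes a signal into a set of IMFs with limited bandwidth [30]. The detailed steps of VMD are as follows:
Step 1: The variational problem of VMD can be described as
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{5}$$
where $f$ is the original signal, $K$ is the number of IMFs, $u_k$ is the $k$-th IMF of $f$, $\omega_k$ is the center frequency of $u_k$, $\delta(t)$ is the Dirac delta, and $*$ denotes convolution. Because equation (5) cannot be solved directly, it is converted into an unconstrained problem through the augmented Lagrangian
$$\mathcal{L}\left(\{u_k\},\{\omega_k\},\eta\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \eta(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{6}$$
where $\eta$ is the Lagrange multiplier and $\alpha$ is the penalty factor.

2 Complexity

Step 2: $u_k^{n+1}$, $\omega_k^{n+1}$, and $\eta^{n+1}$ are updated alternately to search for the saddle point of equation (6):
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\eta}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2}, \qquad \omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}, \qquad \hat{\eta}^{n+1}(\omega) = \hat{\eta}^{n}(\omega) + \tau\left(\hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega)\right) \tag{7}$$
where $\hat{u}_k(\omega)$, $\hat{f}(\omega)$, and $\hat{\eta}(\omega)$ are the Fourier transforms of $u_k(t)$, $f(t)$, and $\eta(t)$, the sum over $i \neq k$ uses the latest available estimates of the other modes, and $\tau$ is the update step.
During this process, the center frequency and bandwidth of each mode are updated continually, and several IMFs with narrow bandwidths are finally obtained.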
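To make the update rules in equation (7) concrete, the following Python sketch runs them on a discrete one-sided spectrum. It is a minimal illustration under simplifying assumptions, not the implementation used in this paper: it uses a naive O(N²) DFT instead of an FFT, omits the mirror extension of the signal that common VMD implementations apply, initializes the center frequencies uniformly, and expresses frequencies in cycles per sample. The names `dft`, `idft_real`, and `vmd` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(xh):
    """Inverse DFT of a Hermitian spectrum, keeping the real part."""
    n = len(xh)
    return [(sum(xh[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def vmd(f, K, alpha=2000.0, tau=0.0, max_iter=500, tol=1e-10):
    """Hypothetical minimal VMD: alternating updates of equation (7) on the
    non-negative half of the spectrum. Returns modes sorted by center frequency."""
    n = len(f)
    half = n // 2
    freqs = [j / n for j in range(half + 1)]      # normalized frequency, cycles/sample
    f_hat = dft(f)[: half + 1]                     # one-sided spectrum of f
    u_hat = [[0j] * (half + 1) for _ in range(K)]
    omega = [0.5 * k / K for k in range(K)]        # assumed uniform initialization
    eta = [0j] * (half + 1)                        # Lagrange multiplier spectrum
    for _ in range(max_iter):
        previous = omega[:]
        for k in range(K):
            # u_k update: residual of the other modes, filtered around omega_k
            u_hat[k] = [(f_hat[j] - sum(u_hat[i][j] for i in range(K) if i != k)
                         + eta[j] / 2) / (1 + 2 * alpha * (freqs[j] - omega[k]) ** 2)
                        for j in range(half + 1)]
            power = [abs(v) ** 2 for v in u_hat[k]]
            if sum(power) > 0:
                # omega_k update: power-weighted mean frequency of mode k
                omega[k] = sum(w * p for w, p in zip(freqs, power)) / sum(power)
        # eta update (dual ascent); tau = 0 leaves eta at zero
        eta = [eta[j] + tau * (f_hat[j] - sum(u[j] for u in u_hat))
               for j in range(half + 1)]
        if max(abs(a - b) for a, b in zip(omega, previous)) < tol:
            break
    modes = []
    for k in range(K):
        full = [0j] * n
        full[: half + 1] = u_hat[k]
        for j in range(1, n - half):
            full[n - j] = u_hat[k][j].conjugate()  # Hermitian symmetry -> real mode
        modes.append(idft_real(full))
    order = sorted(range(K), key=lambda k: omega[k])
    return [modes[k] for k in order], [omega[k] for k in order]
```

For example, applying `vmd(signal, K=2)` to a 128-sample sum of two cosines at 8/128 and 40/128 cycles per sample should separate them into two modes whose center frequencies settle near those two values. In practice an FFT-based implementation would replace the naive transforms.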

Fuzzy Entropy.
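Fuzzy entropy is an improved complexity measure based on sample entropy [33]. It replaces the hard similarity threshold of sample entropy with a membership function from fuzzy set theory, which makes the similarity evaluation smoother. The detailed steps of fuzzy entropy are as follows. Consider a time series with $N$ samples, $[u(1), u(2), \ldots, u(N)]$. The phase space $U$ is reconstructed from the time series as
$$U(i) = \left[u(i), u(i+1), \ldots, u(i+m-1)\right] - u_0(i), \quad i = 1, 2, \ldots, N-m \tag{8}$$
where $m$ is the dimension of the phase space and $u_0(i) = \frac{1}{m}\sum_{l=0}^{m-1} u(i+l)$ is the local baseline. The maximum absolute distance between $U(i)$ and $U(j)$ is
$$d_{ij}^m = \max_{k \in \{0,\ldots,m-1\}} \left| \left(u(i+k) - u_0(i)\right) - \left(u(j+k) - u_0(j)\right) \right| \tag{9}$$
The similarity is calculated as
$$D_{ij}^m = \exp\left(-\frac{\left(d_{ij}^m\right)^n}{r}\right) \tag{10}$$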

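where $n$ and $r$ determine the gradient and the width of the boundary of the exponential membership function. Equations (8)-(10) are repeated to obtain the similarity for the phase space of dimension $m + 1$. With
$$\phi^m(n, r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} D_{ij}^m \right)$$
the fuzzy entropy is defined as
$$\mathrm{FuzzyEn}(m, n, r) = \ln \phi^m(n, r) - \ln \phi^{m+1}(n, r) \tag{11}$$

2.4. PSR Based on C-C.
PSR is a basic method for chaotic time series analysis [34]. For a time series $x = \{x_i \mid i = 1, 2, \ldots, N\}$, the PSR model can be described as
$$X_i = \left[x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right], \quad i = 1, 2, \ldots, N-(m-1)\tau \tag{12}$$
where $m$ is the embedding dimension and $\tau$ is the delay time. The embedding dimension $m$ and the delay time $\tau$ are usually identified by the C-C method [35], whose steps are as follows. The correlation integral of the embedded time series is defined as
$$C(m, N, r, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \Theta\left(r - \left\|X_i - X_j\right\|_\infty\right), \quad M = N - (m-1)t \tag{13}$$
where $t$ is the candidate delay, $r > 0$ is the search radius, and $\Theta(\cdot)$ is the Heaviside step function. Dividing the series into $t$ disjoint subseries, the statistic $S_1(m, N, r, t)$ is defined as
$$S_1(m, N, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s\left(m, \frac{N}{t}, r, t\right) - C_s^m\left(1, \frac{N}{t}, r, t\right) \right] \tag{14}$$
When the number of samples tends to infinity, equation (14) becomes
$$S_2(m, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s(m, r, t) - C_s^m(1, r, t) \right] \tag{15}$$
The following statistics of $S_2$ are then calculated, usually for $m = 2, \ldots, 5$ and $r_j = j\sigma/2$ ($j = 1, \ldots, 4$), where $\sigma$ is the standard deviation of the series:
$$\Delta S_2(m, t) = \max_j S_2(m, r_j, t) - \min_j S_2(m, r_j, t), \quad \bar{S}_2(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S_2(m, r_j, t), \quad \Delta \bar{S}_2(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S_2(m, t), \quad S_{2\mathrm{cor}}(t) = \Delta \bar{S}_2(t) + \left|\bar{S}_2(t)\right|$$
The first zero crossing of $\bar{S}_2(t)$, or the first local minimum of $\Delta \bar{S}_2(t)$, gives the best delay time $\tau$. The value of $t$ at which $S_{2\mathrm{cor}}(t)$ reaches its minimum gives the delay-time window $T_w = (m-1)\tau$, from which the embedding dimension $m$ is obtained.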
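The delay embedding in equation (12), the correlation integral in equation (13), and the fuzzy entropy in equations (8)-(11) can be sketched in plain Python as below. This is an illustrative sketch rather than the paper's code: the helper names are hypothetical, the default width r = 0.2 × standard deviation is a commonly used convention assumed here, and the full C-C statistics of equations (14)-(15) are omitted for brevity.

```python
import math


def embed(x, m, tau=1):
    """Delay embedding X_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}] (equation (12))."""
    count = len(x) - (m - 1) * tau
    return [[x[i + k * tau] for k in range(m)] for i in range(count)]


def correlation_integral(x, m, t, r):
    """C(m, N, r, t) of equation (13): share of point pairs closer than r (sup norm)."""
    pts = embed(x, m, t)
    M = len(pts)
    close = sum(1 for i in range(M) for j in range(i + 1, M)
                if max(abs(a - b) for a, b in zip(pts[i], pts[j])) < r)
    return 2.0 * close / (M * (M - 1))


def _phi(u, dim, count, n, r):
    """Average fuzzy similarity of `count` baseline-removed vectors of length `dim`."""
    vecs = []
    for i in range(count):
        seg = u[i:i + dim]
        base = sum(seg) / dim                      # u_0(i) in equation (8)
        vecs.append([v - base for v in seg])
    total = 0.0
    for i in range(count):
        s = 0.0
        for j in range(count):
            if j != i:
                d = max(abs(a - b) for a, b in zip(vecs[i], vecs[j]))  # equation (9)
                s += math.exp(-(d ** n) / r)                            # equation (10)
        total += s / (count - 1)
    return total / count


def fuzzy_entropy(u, m=2, n=2, r=None):
    """FuzzyEn(m, n, r) = ln phi^m - ln phi^(m+1), equation (11)."""
    if r is None:
        mean = sum(u) / len(u)
        std = math.sqrt(sum((v - mean) ** 2 for v in u) / len(u))
        r = 0.2 * std                              # assumed convention: r = 0.2 * std
    count = len(u) - m                             # same number of vectors for m and m + 1
    return math.log(_phi(u, m, count, n, r)) - math.log(_phi(u, m + 1, count, n, r))
```

A regular signal such as a sine wave should yield a lower fuzzy entropy than irregular noise; this is the property that lets fuzzy entropy single out the complex IMFs.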

ELM.
Extreme learning machine is a feedforward neural network characterized by fast learning. For an ELM with a single hidden layer and $N$ training samples, the model can be described as [36]
$$y_j = \sum_{i=1}^{L} \beta_i\, h\left(\omega_i \cdot X_j + b_i\right), \quad j = 1, 2, \ldots, N$$
where $y_j$ is the output of the ELM, $X_j$ is its input, $L$ is the number of hidden neurons, $\omega_i$ is the weight vector from the input layer to the $i$-th hidden neuron, $\beta_i$ is the weight from the $i$-th hidden neuron to the output layer, $b_i$ is the bias of the $i$-th hidden neuron, and $h(\cdot)$ is the activation function.
The objective of ELM training is to minimize the output error. If the training error is driven to zero, the ELM model can be written in matrix form as
$$H\beta = Y, \qquad H = \left[h\left(\omega_i \cdot X_j + b_i\right)\right]_{N \times L}$$
where $Y$ is formed by the real outputs, and $\omega_i$ and $b_i$ are randomly selected. The output weights $\beta$ can then be determined by
$$\beta = H^{\dagger} Y$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$.

The Structure of the Proposed Model.
The structure of the proposed method is shown in Figure 1. The method is mainly composed of three modules: a data processing module, a prediction module, and a module that combines the final results.
Module 1: Data processing. In this module, the original wind speed data are first decomposed by ICEEMDAN. Then, the fuzzy entropy of each IMF is calculated, and the IMFs with higher entropy, which are regarded as more complex subseries, are decomposed again by VMD. The detailed process of the twice decomposition is presented in Section 3.2, and ICEEMDAN and VMD are described in Sections 2.1 and 2.2, respectively.
Module 2: Prediction. The subseries obtained by Module 1 are used for prediction. First, the C-C method and PSR are used to reconstruct the input of the prediction model, which extracts more useful information; the dimension of the input is also determined by the C-C method. The details of the C-C method and PSR are presented in Section 2.4. Then, the IMVO-ELM model is employed to predict each subseries. The IMVO-ELM model is detailed in Section 3.3.
Module 3: Combination of final results. The final forecast is the sum of the predictions of all subseries.
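The data flow through the three modules can be written as a small skeleton in which the decomposition and the per-subseries forecaster are passed in as functions. This only illustrates the structure, with hypothetical names; the actual decomposition (twice decomposition) and forecaster (C-C-PSR with IMVO-ELM) are not implemented here, and the usage example relies on toy stand-ins.

```python
from typing import Callable, List, Sequence


def hybrid_forecast(series: Sequence[float],
                    decompose: Callable[[Sequence[float]], List[List[float]]],
                    fit_predict: Callable[[List[float]], float]) -> float:
    """Module 1 -> Module 2 -> Module 3 for a one-step-ahead forecast."""
    subseries = decompose(series)                    # Module 1: e.g. twice decomposition
    forecasts = [fit_predict(s) for s in subseries]  # Module 2: one model per subseries
    return sum(forecasts)                            # Module 3: sum of subseries forecasts


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): mean/residual split and a persistence forecaster.
    def mean_split(s):
        mu = sum(s) / len(s)
        return [[mu] * len(s), [v - mu for v in s]]

    print(hybrid_forecast([1.0, 2.0, 3.0], mean_split, lambda s: s[-1]))  # 3.0
```

Summation in Module 3 is valid because the subseries add back up to the original signal, so the sum of their forecasts is a forecast of the original series.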

Twice Decomposition Based on ICEEMDAN, Fuzzy Entropy, and VMD.
In this paper, a twice-decomposition method is proposed to reduce the complexity of the input data of the prediction model. Because the original wind speed is nonstationary, ICEEMDAN is first employed to decompose it, which reduces the difficulty of prediction. However, some of the IMFs obtained by ICEEMDAN, especially the high-frequency subseries, are still complex for the prediction model. To identify these IMFs, fuzzy entropy is employed to estimate the complexity of each IMF, and the IMFs are reclassified into two datasets as follows:
$$\mathrm{IMF}_i \in \begin{cases} L, & \mathrm{FEn}(\mathrm{IMF}_i) \le \mathrm{FEn}(\mathrm{original}) \\ H, & \mathrm{FEn}(\mathrm{IMF}_i) > \mathrm{FEn}(\mathrm{original}) \end{cases}$$
where $\mathrm{FEn}(\mathrm{IMF}_i)$ is the fuzzy entropy of the $i$-th IMF and $\mathrm{FEn}(\mathrm{original})$ is the fuzzy entropy of the original wind speed. Dataset L contains the IMFs with lower fuzzy entropy, which are easy to predict, and dataset H contains the IMFs with higher fuzzy entropy, which are difficult to predict. VMD is employed to decompose the IMFs in dataset H again to reduce their complexity. The subseries obtained by the twice decomposition have greatly reduced complexity and are used for prediction.
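The reclassification rule and the second decomposition step can be sketched as follows. The decomposition and entropy routines are passed in as callables because they are placeholders here (ICEEMDAN, VMD, and fuzzy entropy are not implemented in this snippet), and assigning ties, where an IMF's entropy equals that of the original signal, to dataset L is an assumption. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def reclassify(imfs: Sequence[Signal], original: Signal,
               entropy: Callable[[Signal], float]) -> Tuple[List[int], List[int]]:
    """Indices of IMFs in dataset L (FEn <= FEn(original)) and H (FEn > FEn(original))."""
    threshold = entropy(original)
    values = [entropy(imf) for imf in imfs]
    low = [i for i, v in enumerate(values) if v <= threshold]
    high = [i for i, v in enumerate(values) if v > threshold]
    return low, high


def twice_decompose(signal: Signal,
                    first: Callable[[Signal], List[Signal]],
                    second: Callable[[Signal], List[Signal]],
                    entropy: Callable[[Signal], float]) -> List[Signal]:
    """First decomposition (ICEEMDAN in the paper), then re-decompose dataset H (VMD)."""
    imfs = first(signal)
    low, high = reclassify(imfs, signal, entropy)
    subseries = [imfs[i] for i in low]             # dataset L is kept as-is
    for i in high:
        subseries.extend(second(imfs[i]))          # dataset H is split again
    return subseries
```

Since both decompositions are additive, the returned subseries still sum to the original signal, which keeps the final summation step valid.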

Improved Multiverse Optimizer for ELM.
As introduced in Section 2.5, the input-to-hidden weights and the hidden-layer biases of the ELM are generated randomly and remain constant during training. According to previous studies [18,37], this principle makes ELM training fast, but it can also lead to poor training performance. To solve this problem, optimization methods have been widely employed to improve the ELM model [38,39]. The number of parameters to be optimized is determined by the numbers of input and hidden neurons: for d input neurons and L hidden neurons there are d × L weights and L biases. This number is usually large, so an efficient optimization method is necessary. The MVO is a nature-inspired global optimization algorithm proposed by Mirjalili et al. [40], and many studies have reported that it performs better than other well-known optimization methods. Although the MVO has good optimization ability, its exploration and exploitation are difficult to balance, and its initial population may be unevenly distributed. In this paper, an improved multiverse optimizer (IMVO) is proposed with two improvement strategies.
First, the cubic chaotic map is employed to increase the diversity of the initial population. The cubic chaotic map can be described as
$$x_{n+1} = b x_n - a x_n^3$$
where a and b are control factors that determine the state and range of the map. In general, the map is chaotic when b ∈ (2.3, 3); the sequence satisfies $x_n \in (-2, 2)$ when a = 1 and $x_n \in (-1, 1)$ when a = 4. Second, a sine function is proposed to control the wormhole existence probability (WEP), a control parameter in MVO.
The WEP is varied according to a sine function of the iteration count, where WEP max and WEP min are the maximum and minimum values of WEP, iter is the current iteration, and iter max is the maximum number of iterations. Under this control strategy, WEP changes slowly in the early stage to improve exploration. In the middle stage, WEP changes quickly, so the algorithm shifts rapidly from exploration to exploitation. In the late stage, WEP again changes slowly to improve exploitation.
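The two IMVO strategies can be sketched as follows. The cubic map uses the form $x_{n+1} = b x_n - a x_n^3$ with a = 4 and b = 2.5, the values used in the experiments; the starting value and the linear rescaling of the chaotic values to the search bounds are assumptions. The exact sine formula for WEP is not reproduced in the text, so `wep_sine` is a hypothetical schedule that only reproduces the described slow-fast-slow rise between WEP min and WEP max; the linear schedule of the original MVO is included for comparison.

```python
import math
import random


def cubic_map(x, a=4.0, b=2.5):
    """Cubic chaotic map x_{n+1} = b * x_n - a * x_n**3."""
    return b * x - a * x ** 3


def chaotic_population(size, dim, lb, ub, rng, a=4.0, b=2.5):
    """Initial universes from one cubic-map trajectory, rescaled from (-1, 1) to [lb, ub]."""
    x = rng.uniform(-0.7, 0.7)   # assumed start; for a=4, b=2.5 the orbit then stays in (-0.77, 0.77)
    population = []
    for _ in range(size):
        row = []
        for _ in range(dim):
            x = cubic_map(x, a, b)
            row.append(lb + (x + 1.0) / 2.0 * (ub - lb))
        population.append(row)
    return population


def wep_sine(it, it_max, wep_min=0.2, wep_max=1.0):
    """Hypothetical sine-shaped WEP: slow rise, fast middle, slow finish."""
    s = 0.5 * (1.0 + math.sin(math.pi * (it / it_max - 0.5)))
    return wep_min + (wep_max - wep_min) * s


def wep_linear(it, it_max, wep_min=0.2, wep_max=1.0):
    """Linear WEP schedule of the original MVO, for comparison."""
    return wep_min + it * (wep_max - wep_min) / it_max


if __name__ == "__main__":
    pop = chaotic_population(30, 48, -1.0, 1.0, random.Random(0))
    print([round(wep_sine(i, 50), 3) for i in (0, 10, 25, 40, 50)])
```

With b = 2.5 the trajectory stays within roughly (-0.76, 0.76), so this linear rescaling does not quite reach the search bounds; a different normalization may be preferable in practice.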

Dataset Description.
The experimental data of this paper were collected from the Sotavento Galicia wind farm, with wind speed recorded at a 10-min interval. Four datasets collected in different seasons are utilized for the experiments, and their wind speeds are shown in Figure 2. For each dataset, the first 1000 samples are used as the training set and the last 100 samples are used as the testing set. The statistical information of each dataset, including the mean, maximum, minimum, standard deviation, skewness, and kurtosis, is given in Table 1. In all datasets, the wind speed varies over a wide range between its minimum and maximum, and the skewness and kurtosis show that it is not normally distributed. Taken together, these statistics suggest that the wind speed exhibits strong nonlinearity and nonstationarity.

Evaluation Metrics.
To evaluate the performance of each forecasting method, evaluation metrics are calculated from the forecast and actual values. In this paper, the mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) are used:
$$\mathrm{MAPE} = \frac{1}{L} \sum_{i=1}^{L} \left| \frac{y_i - y_i'}{y_i} \right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{i=1}^{L} \left(y_i - y_i'\right)^2}, \qquad \mathrm{MAE} = \frac{1}{L} \sum_{i=1}^{L} \left| y_i - y_i' \right|$$
where $L$ is the number of samples, and $y_i$ and $y_i'$ are the observed and forecast wind speeds at time $i$.
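The three metrics translate directly into code. The sketch below uses illustrative function names; note that MAPE is undefined when an observed wind speed is zero.

```python
import math
from typing import Sequence


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error, in the units of y (m/s here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error, in the units of y (m/s here)."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


print(mape([2.0, 4.0], [1.0, 5.0]), rmse([2.0, 4.0], [1.0, 5.0]), mae([2.0, 4.0], [1.0, 5.0]))
# 37.5 1.0 1.0
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE weights errors at low wind speeds more strongly, which is why the low-wind-speed dataset tends to show the largest MAPE values.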

Comparison and Analysis with Different Optimization Methods for ELM.
In this paper, the IMVO method proposed in Section 3.3 is used to improve the efficiency of the ELM. To demonstrate its performance, the IMVO method is compared with the genetic algorithm (GA), particle swarm optimization (PSO), grey wolf optimizer (GWO), and MVO, which are well-known optimization methods. First, the parameters of these optimization methods are set so that their computational cost is roughly the same. The population size is 30 and the maximum number of iterations is 50 for all methods, and the other main parameters are set as follows: GA: generation gap = 0.95, crossover rate = 0.7, mutation rate = 0.01. PSO: acceleration constants c1 = 2 and c2 = 2, inertia weight ω = 0.6. MVO: WEP max = 1, WEP min = 0.2, p = 6. IMVO: WEP max = 1, WEP min = 0.2, p = 6, a = 4, b = 2.5. The number of input neurons of the ELM is set to 5 and the number of hidden neurons is set to 8.
Second, the ELM model is trained on the training datasets, and the above optimization methods are used to optimize it. The objective function is the training MAPE, which is minimized. Because intelligent optimization algorithms are stochastic, each method is run 20 times independently.
The average values of the evaluation metrics in the training process for the different methods and datasets are given in Table 2, and boxplots of the evaluation metrics over the 20 runs are shown in Figure 3.
As shown in Table 2 and Figure 3, the MAPE, MAE, and RMSE of the plain ELM are worse than those of the other methods, which is caused by the instability of the ELM: some of its results deviate considerably from the average. For example, the worst MAE of the ELM on dataset B is 1.62 m/s, whereas its average MAE on dataset B is 1.18 m/s, so the worst case exceeds the average by about 37%. Once the ELM falls into such a situation, it produces a larger error in wind speed forecasting. The results also indicate that intelligent optimization algorithms can improve the stability of ELM training. In Figure 3 … methods. In Figure 3(b), the MAE values of all methods on dataset B are almost the same, and the MAE values of the IMVO method are better than those of the other methods on datasets A, C, and D; the GA performs worse than the other optimization algorithms. In Figure 3(c), the GWO, PSO, MVO, and IMVO perform almost the same and better than the GA. Although some methods achieve similar results, their convergence rates and search abilities differ, which matters for short-term wind speed forecasting. The average convergence curves of the intelligent optimization algorithms over the 20 runs are shown in Figure 4. As shown in Figures 4(a)-4(c), the MVO has better search ability than the GA, GWO, and PSO, but its convergence rate cannot match those of the PSO and GWO. The proposed IMVO not only increases the search ability but also improves the convergence rate. As shown in Figure 4(d), although the IMVO converges slightly more slowly than the PSO in the early period, its strong search ability yields better results in the middle and late periods.
The results indicate that the proposed IMVO-ELM method makes the ELM model more stable and avoids extreme situations. Moreover, the proposed IMVO has strong search ability and a fast convergence rate, which makes the ELM model more effective.

Comparison and Analysis with Different Prediction Models.
In this subsection, the proposed short-term wind speed forecasting method is verified against seven comparative methods: IMVO-ELM, EMD-IMVO-ELM, CEEMDAN-IMVO-ELM, ICEEMDAN-IMVO-ELM, EMD-cc-PSR-IMVO-ELM, CEEMDAN-cc-PSR-IMVO-ELM, and ICEEMDAN-cc-PSR-IMVO-ELM. All of these methods use the IMVO-ELM model, which was shown to be effective in the previous subsection; they differ only in the input signal. The IMVO-ELM approach uses the original wind speed directly as input. The EMD-IMVO-ELM, CEEMDAN-IMVO-ELM, and ICEEMDAN-IMVO-ELM approaches use the EMD, CEEMDAN, and ICEEMDAN decompositions of the original wind speed as input, respectively. In the EMD-cc-PSR-IMVO-ELM, CEEMDAN-cc-PSR-IMVO-ELM, and ICEEMDAN-cc-PSR-IMVO-ELM approaches, the original wind speed is decomposed by EMD, CEEMDAN, and ICEEMDAN, respectively, and PSR, with the embedding dimension and delay time determined by the C-C method, is then employed to reconstruct the input from the decomposed signal.
Because the number of ELM input neurons can be determined by the cc-PSR method, the EMD-cc-PSR-IMVO-ELM, CEEMDAN-cc-PSR-IMVO-ELM, ICEEMDAN-cc-PSR-IMVO-ELM, and proposed approaches determine it automatically, whereas the IMVO-ELM, EMD-IMVO-ELM, CEEMDAN-IMVO-ELM, and ICEEMDAN-IMVO-ELM approaches require it to be chosen manually. For these four approaches, a traversal (grid search) is used to find the best number of input neurons, which is varied from 1 to 10. The other parameters of the IMVO and ELM are set as in Section 4.3, and each approach is run 20 times independently. The average MAPE, MAE, and RMSE of the forecasting results for the different approaches and numbers of input neurons are shown in Figure 5.
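The traversal over the number of input neurons can be sketched as a simple grid search. Here `evaluate(d, seed)` is a hypothetical callback that trains one model with d input neurons and returns a single error value (for example, the test MAPE); ranking candidates by one averaged metric is an assumption, since the text reports all three metrics.

```python
from statistics import mean
from typing import Callable, Iterable, Optional, Tuple


def select_input_size(evaluate: Callable[[int, int], float],
                      candidates: Iterable[int] = range(1, 11),
                      runs: int = 20) -> Tuple[Optional[int], float]:
    """Traverse candidate input sizes and keep the one with the lowest
    average error over `runs` independent runs (seeds 0 .. runs - 1)."""
    best_d, best_err = None, float("inf")
    for d in candidates:
        err = mean(evaluate(d, seed) for seed in range(runs))
        if err < best_err:
            best_d, best_err = d, err
    return best_d, best_err


if __name__ == "__main__":
    # Toy stand-in for training/evaluating IMVO-ELM: error is smallest at d = 3.
    print(select_input_size(lambda d, seed: (d - 3) ** 2 + 0.1 * (seed % 2)))
```

Averaging over independent runs before comparing candidates keeps the choice of input size from being driven by one lucky or unlucky random initialization.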
According to the traversal, the best numbers of input neurons for datasets A, B, C, and D are 5, 6, 3, and 6 for IMVO-ELM; 3, 2, 3, and 2 for EMD-IMVO-ELM; 3, 3, 3, and 3 for CEEMDAN-IMVO-ELM; and 3, 6, 3, and 3 for ICEEMDAN-IMVO-ELM. The results of these four approaches with their best numbers of input neurons are used for comparison with the other methods. Meanwhile, each of the EMD-cc-PSR-IMVO-ELM, CEEMDAN-cc-PSR-IMVO-ELM, ICEEMDAN-cc-PSR-IMVO-ELM, and proposed approaches is run 20 times independently on each dataset. The average evaluation metrics of all approaches are shown in Table 3.
As shown in Table 3, the IMVO-ELM approach using the original wind speed has the worst performance, which indicates that the original wind speed is nonstationary and difficult for the IMVO-ELM model to predict directly.
The forecasting performance is greatly improved by the EMD-IMVO-ELM, CEEMDAN-IMVO-ELM, and ICEEMDAN-IMVO-ELM approaches, whose inputs are the signals decomposed by EMD, CEEMDAN, and ICEEMDAN. This suggests that these decomposition methods separate the random, periodic, and trend components of the signal well, which helps forecasting. The experiments also show that, in this wind speed forecasting task, ICEEMDAN outperforms CEEMDAN and CEEMDAN outperforms EMD. The EMD-cc-PSR-IMVO-ELM, CEEMDAN-cc-PSR-IMVO-ELM, and ICEEMDAN-cc-PSR-IMVO-ELM approaches add the cc-PSR method to reconstruct the input of each IMF, and they perform better than EMD-IMVO-ELM, CEEMDAN-IMVO-ELM, and ICEEMDAN-IMVO-ELM, respectively. These results illustrate that the cc-PSR method extracts more useful information from the time series.
The proposed method achieves the best evaluation metrics on all datasets. Dataset A is a low-wind-speed case: the MAPE of IMVO-ELM exceeds 25%, and the MAPEs of the other methods except the proposed one are around 10%, whereas the MAPE of the proposed method is 6.68%, much better than the others. Datasets B, C, and D are medium- and high-wind-speed cases, for which the MAPE, MAE, and RMSE of the proposed method are reduced by nearly 0.5%, 0.05 m/s, and 0.05 m/s, respectively, compared with the best of the other methods. These results indicate that the proposed method is useful over a wide range of wind speeds. The forecast wind speed series of each approach are compared with the original wind speed in Figure 6, together with the forecasting errors of each approach. As shown in the figure, the proposed method matches the original curve well, especially at the peaks, and its error curve is smoother and closer to zero.
Finally, the average calculation time of each method is listed in Table 4.
The results indicate that, although the proposed method takes slightly more time than the other methods, its calculation time is still acceptable for short-term wind speed forecasting.
According to the above comparisons, the proposed method has higher prediction accuracy and stronger adaptability over a wide range of wind speeds than the other compared methods.

Conclusions
A novel hybrid model based on twice decomposition, PSR, and IMVO-ELM is proposed to enhance the performance of short-term wind speed forecasting. In the proposed hybrid model, a twice decomposition based on ICEEMDAN, fuzzy entropy, and VMD is proposed to reduce the nonstationarity of the original wind speed signal. Then, the decomposed signal is reconstructed by the C-C-PSR method as the input data of the prediction model, and an IMVO-ELM model is proposed as the prediction model, in which the proposed IMVO improves the stability and efficiency of the ELM. Finally, two comparison experiments are designed to verify the performance of the proposed method, and the experimental conclusions are as follows:
(1) Twice decomposition greatly reduces the nonstationarity of the original wind speed signal, which benefits wind speed forecasting.
(2) The C-C-PSR method can determine the input dimension of the ELM, which improves the prediction accuracy of the ELM.
(3) The IMVO improves the stability of the ELM, and its optimization efficiency is better than that of the other compared methods.
Therefore, the proposed hybrid approach is a useful tool for short-term wind speed forecasting.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.