A Hybrid Method for Ultrashort-Term Wind Power Prediction considering Meteorological Features and Seasonal Information

High-precision wind power prediction is important for the planning, economics, and security maintenance of a power grid. Meteorological features and seasonal information are strongly related to wind power prediction. 'is paper proposes a hybrid method for ultrashort-termwind power prediction consideringmeteorological features (wind direction, wind speed, temperature, atmospheric pressure, and humidity) and seasonal information. 'e wind power data are decomposed into stationary subsequences using the ensemble empirical mode decomposition (EEMD). 'e principal component analysis (PCA) is used to reduce the redundant meteorological features and the algorithm complexity. With the stationary subsequences and extracted meteorological features data as inputs, the long short-term memory (LSTM) network is used to complete the wind power prediction. Finally, the seasonal autoregressive integrated moving average (SARIMA) is innovatively used to fit seasonal features (quarterly and monthly) of wind power and reconstruct the prediction results of LSTM. 'e proposed method is used to predict 15-minute wind power. In this study, three datasets were collected from a windfarm in Laizhou to validate the prediction performance of the proposed method. 'e experimental results showed that the prediction accuracy was significantly improved when meteorological features were considered and further improved with seasonal correction.


Introduction
With the sharp decline in fossil energy reserves and increasing prominence of environmental problems, wind power has seen rapid development as a renewable and clean energy source around the world [1]. At the end of 2016, the global wind energy cumulative capacity reached 486.74 GW, with a new installation for 54.6 GW [2]. With the increasing proportion of installed wind capacity, the impact of the volatility and randomness of wind power on the power grid has become obvious, and the subsequent abandonment of wind has become more serious [3]. To suppress the influence of randomness, ensure the safe operation of the power grid, and improve the rate of wind power consumption, accurate prediction of wind power is important.
According to the time scale, the wind power forecasting model is divided into four categories: long-term, mediumterm, short-term, and ultrashort-term prediction. e long-term prediction is usually in years and mainly serviced for the formulation of wind farm plan and annual power generation plan. e medium-term prediction is usually in weeks or months and generally used to arrange maintenance plans. e short-term prediction is generally in days and generally used to reduce the discarded wind power, optimize the maintenance scheduling, and optimize the generation scheduling [4]. So the accuracy of short-term prediction is particularly important for planning of power grid.
Ultrashort-term wind power prediction refers to forecasting at intervals of 5, 10, or 15 minutes. Such prediction results are mainly applied to increasing wind power consumption, reducing wind turbine regulation pressures, and ensuring the safe operation of power grids [5]. e ultrashortterm prediction is theoretically the most demanding forecasting method; it is mainly used to control daily operations of wind farm units [6,7]. In addition, it is considered essential for maximizing the potential revenues of operators due to price fluctuations.
is is because future price in the renewable energy market is determined via a bidding mechanism between wind power operators and renewable market operators; the potential revenue for a wind power operator is directly affected by their ability to predict when and how much electricity will be generated in the future [8].
Currently, there are three basic types of wind power prediction methods: physical, statistical, and hybrid. Some research focused on building physical methods using mechanism analysis and equations. For ensuring the best efficiency of power outputs, the authors in [9,10], respectively, established turbine models to discuss the influence of different hub heights and configurations and the effects of inflow turbulence intensity and turbine arrangements. But due to the fluctuations and uncertainty of meteorological change, it is difficult to obtain the satisfactory forecasting results only through a single mechanism prediction model. Statistical methods, such as the autoregressive moving average model, are based on pattern recognition, parameter estimation, and model checking to construct historical data and predicted values [11,12]. However, statistical models cannot dig deep into data features, which lead to large prediction errors. Hybrid methods have two main constructions. e first construction focuses on data preprocessing, in which a decomposition algorithm is used to decompose the original wind power sequence into several sublayers to weaken the volatility. Each sublayer is predicted separately by the prediction model, and the results of all sublayers are then recombined to obtain the final predicted values [13,14]. e second construction usually uses several different prediction models to predict the wind power separately and then recombines the results to obtain the final predicted values [15,16]. Because hybrid methods have a better predictive performance, research with this approach has peaked in recent years. Hong et al. [17] proposed a hybrid method based on empirical mode decomposition (EMD) and an artificial neural network (ANN). e EMD was used to decompose the wind and wind power to obtain the stationary sub-sequences, and the ANN was used to predict the wind power. Yang et al. [14] proposed a hybrid method using the improved EMD to decompose the wind power. e ANN and SVM were used to establish prediction model according to different frequencies of sub-sequences. In recent years, long short-term memory (LSTM) networks have been widely used because of their unique memory and forgetting mode; they can flexibly adapt to timing fluctuations in wind power and have better prediction accuracies than conventional methods. Liu et al. [18] proposed a hybrid method based on a discrete wavelet transform (DWT) and the LSTM. e DWT is used to decompose the nonstationary wind power into components which have more stationarity, and each component is dug by an independent LSTM. Yu et al. [19] proposed a hybrid method using improved LSTM network to predict the wind power and further optimized the prediction results with clustering analysis. Han et al. [20] proposed a hybrid method using improved variational mode decomposition (VMD) to decompose wind power, and using improved LSTM to get the prediction. ese studies achieved many optional models for wind power prediction, and the forecasting accuracy was improved by applied the hybrid methods. However, most of the current studies only used wind power data, which do not make full use of meteorological information such as wind speed, wind direction, and temperature. In addition, the prediction model fails to fully extract seasonal information which will greatly improve the prediction accuracy if these messages are considered.
Based on previous work, a hybrid method based on meteorological features and seasonal information is proposed.
is paper uses LSTM-SARIMA hybrid model to achieve a 15-minute prediction. e innovations are as follows: (i) e meteorological features were regarded as the input variables, and the principal component analysis (PCA) was used to reduce the redundant components that these excessive influencing variables may cause higher computational complex. (ii) With the stationary subsequences and extracted meteorological features data as inputs, the LSTM was used to predict the wind power, taking advantage of seasonal autoregressive integrated moving average (SARIMA) method that can extract the seasonality characteristics (including quarterly and monthly) of wind power [21]. e rest of this paper is organized as follows. Section 2 describes the process of data preprocessing, including the decomposition of the wind power and the extraction of the related meteorological features. Section 3 introduces the prediction algorithms used in this paper. Section 4 introduces the case study. e method was applied to data from a windfarm in Laizhou for verification, and the prediction accuracy was compared with those of the four other methods. Finally, Section 5 summarizes this study and presents plans for future work.

Data Preprocessing
e architectural design of this prediction method is shown in Figure 1. e details of how the proposed method decomposes the original wind power sequence and considers the coupling relationship between the wind power and meteorological features are presented here.

Ensemble Empirical Mode Decomposition of Wind Power.
EMD is a direct, empirical, and adaptive data processing method designed specifically for nonlinear and non-stationary data. Unlike traditional methods, such as Fourier decomposition, that use fixed bases, EMD directly decomposes nonlinear and non-stationary time series into multiple sublayers (multiple IMF layers and a Res term) by the Hilbert transform without any base functions or filter functions. Because of this unique advantage, EMD is widely used in complex system analyses [22,23] and has been demonstrated to be advantageous for time series prediction.
EEMD was developed as an extended version of EMD to overcome the modal confusion problem [24] by adding white noise to the original time series multiple times; then, EMD is performed on the modified time series to obtain IMF sublayers, and the overall average of the corresponding IMF sublayers is considered as the final decomposition result. e main steps of EEMD are as follows.
Add white noise w i (t) − N(0, σ 2 ) to the original wind power series p(t) to convert it to a new time series: Decompose the time series p i (t) into n IMF sublayers e i j (t) (j � 1, 2, . . ., n) and a Res term r i (t) whose frequency is close to zero and cannot be extracted further: where e i j (t) is the jth IMF sublayer of the ith wind power sequence.
Repeat the above operation several times, in which the corresponding IMF sublayers could be obtained by adding different white noise to the original time series.
Calculate the average of all IMF sublayers obtained in m decompositions as the final IMF sublayer: Once EEMD is complete, the original time series can be represented as a linear combination of IMF sublayers and the Res term: where e j (t) is the average of the jth IMF extracted during m decompositions, r(t) is the final residual, and n is the number of IMF sublayers.

Meteorological Feature Extraction Based on Principal Component
Analysis. PCA is a multivariate statistical analysis technique that combines data compression and feature extraction [25]. PCA data inputs increase the overall accuracy by preserving the actually useful information. Also, by reducing the number of input data, the model training and prediction times can be significantly reduced. PCA has Mathematical Problems in Engineering thus been effectively used in wind power prediction [26,27]. e idea of PCA is to construct new variables from linear combinations of the original and select the variables that reflect the original as much as possible. PCA maps n-dimensional features to k dimensions (k < n), which results in a new set of orthogonal features called principal components (PC), rather than simply removing the remaining n − k dimensional features from the n-dimensional.
e PCA algorithm calculates the eigenvalues of the input variable matrix, finds the variance corresponding to each input variable, and then determines the PC according to the cumulative value [28]. e main steps are as follows.
Center all meteorological features (including wind direction, wind speed, temperature, atmospheric pressure, and humidity): find the average of each feature f j (j � 1, 2, . . ., 5) and then subtract the mean from each feature f j : where f ij is the ith value of feature j, f j is the average of feature j, n is the number of values of feature j, and p is the number of features. Calculate the covariance matrix of features: Calculate the eigenvalues of the covariance matrix and corresponding eigenvectors. Calculate the contribution rate of all eigenvectors and select the appropriate PC that was regarded as the extracted meteorological features. A higher contribution rate indicates that the original feature contains stronger information. e selection of a PC is mainly determined by the cumulative contribution rate of eigenvectors. In other words, the cumulative contribution rate usually needs to be more than 85% to ensure that the PC can contain most of the information of the original feature:

Seasonal Auxiliary Prediction Based on the SARIMA Model.
As an extension of the autoregressive integrated moving average (ARIMA) model, SARIMA considers the periodicity of the time series. e predicted results can be reconstructed using the results of the LSTM network to correct the wind power values. When a wind power sequence contains seasonal and non-seasonal information, the typical ARIMA model cannot capture the behavior of the seasonal portion, which results in a wrong choice of model parameters [29]. e general formula for SARIMA (p, d, q) (P, D, Q) S [30] is given by where P t is the wind power sequence, e t is white noise, L is the lag operator, D is the seasonal difference order, and d is the regular difference order, which represents the number of differences that smooth the wind power sequence, a is the regression model parameter, and s is the seasonality order that represents the seasonal change cycle of wind power.
Equations (11) and (12) are autoregressive and moving average polynomials, respectively, and represent the dependence of the future time series on past values and errors: Similarly, equations (13) and (14) represent the seasonal autoregressive and seasonal moving average polynomials, respectively. Adding these polynomials to the ARIMA equation can help capture seasonal variations in time series:

Wind Power Prediction Based on the LSTM Networks.
As an improved model of the recurrent neural network (RNN), the LSTM network has not only the recursiveness of RNN but also a unique memory and forgetting mode that can flexibly adapt to the timing characteristics of wind power. e LSTM network solves the gradient disappearance and gradient explosion problems of RNN during training and can fully utilize the time dependence of historical wind power data. e LSTM network consists of an input layer, output layer, and hidden layer. Compared to the traditional RNN, the hidden layer is no longer an ordinary neural unit but an LSTM unit with a unique memory mode. e structure of the memory cell is shown in Figure 2.
Each LSTM unit has a cell [31] whose state at time t is denoted as c t ; this can also be regarded as a memory unit which will input in the same cell the next time. e reading and modification of the memory cell are accomplished by controlling the input gate, forgetting gate, and output gate. Specifically, the workflow of the LSTM unit is as follows: at each given time, the LSTM unit receives the input state of combined feature vectors: where x i is the ith combined feature vector, IMF i is the ith wind power sub-sequence of different frequency scales, and In addition, the LSTM unit also receives the hidden state of the previous time h t−1 and the internal information c t−1 as the state of the last cell. After receiving the information, the logic function of a gate determines whether the input is active.
en, the input is transformed by a nonlinear function and superimposed with the state of the memory cell processed by the forgetting gate to form a new memory cell state c t . Finally, the state c t forms the wind power prediction h t by operation of the nonlinear function and dynamic control of the output gate. e variables are calculated as follows: where w xc , w xi , w xg , w xo are the weight matrices connected to the input signal x t ; w hc , w hi , w hg , w ho are the weight matrices connected to the input of the hidden layer; h t−1 , w ci , w cg , w co are the weight matrices connected to the internal information; and c t−1 , b i , b g , b c , b o are the offset vectors. σ is the activation function, which is usually a tanh or sigmoid function. According to (18), the current cell state c t is composed of the current input feature x t and the previous cell state c t−1 in which there is the memory of previous input features that is controlled by forgetting gate g t . When predicting the subsequent time point c t+1 , the information contained in c t−1 is not abandoned but is input into the next cell. As the sequence grows, all the features are used as important information for every subsequent wind power prediction.
Finally, the prediction results of the LSTM network are corrected by the results of the SARIMA model. e final wind power predicted values can be expressed as where p t is the prediction results of SARIMA and ω is the ratio by which the SARIMA model modifies the predicted values of the LSTM network, which takes the minimum error between the predicted value and the real value as the standard.

Evaluation Indices.
In this study, several indices were used to evaluate the proposed wind power prediction method: the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). MAE reflects the true state of the predicted value error; MAPE measures the deviation between the predicted and true values; and RMSE represents the sample standard deviation between the predicted and true values, which is sensitive to larger errors. e specific equations are given below: where n is the size of the test data and Y pre (t) and Y act (t) represent the predicted and true values of wind power, respectively. To compare the prediction performances of relevant models, the percentage improvements of MAE, MAPE, and RMSE were used as evaluation indices:

Data Description.
Laizhou is located in the northeast of Shandong Province (37.0215°N, 119.4546°N), which borders the ocean and is rich in wind power resources. is study used three time series datasets of wind power collected at 15minute intervals from a Laizhou wind farm. e wind turbine siting layout is fixed. e datasets contain a total of 35040 samples (from 15 January 2017 to 27 February 2018). As shown in Table 1, the detail dates for prediction model are presented. e wind farm has a total installed capacity of 100 MW. e information of wind speed, pressure, temperature, and relative humidity is directly acquired through the system of Supervisory Control and Data Acquisition (SCADA), which could realize real-time collection of local Mathematical Problems in Engineering meteorological information. ere are many kinds of sensors in the system, which can collect many kinds of data such as meteorological environment and running state. e data has six features, as shown in Figure 3: wind direction, wind speed, temperature, atmospheric pressure, humidity, and wind power. Wind power shows high volatility and seasonality. e unit of wind speed is meters per second; the wind direction ranges from 0 to 360°, the temperature unit is Celsius, and the humidity ranges from 0 to 100 percent. During the modelling process, the first 80% of each dataset was used for training; the next 10% was used for crossvalidation; and the last 10% was used for the test. Figure 3, the original wind power sequence had strong instability with no obvious rule to follow. It is difficult to obtain reliable results if the prediction model is directly established for analysis. In order to decrease the volatility of the sequence, EEMD was used to decompose the wind power. e standard deviation of the additional white noise was set to 0.2, and the EMD execution frequency was set to 100 times to make the Res item small enough. e algorithm produced 10 IMF sublayers and a Res term at different time scales. As shown in Figure 4, the stability of the IMF sublayers was significantly improved. e 11 subsequences generated after decomposition were used for the next LSTM modelling analysis.

Extracting Results Using PCA.
e feature properties of wind power include five vectors: wind direction, wind speed, temperature, atmospheric pressure, and humidity. In order to analyze the correlation between the features and wind power, the Person correlation coefficient between wind power and each feature was tested. e calculation formula is shown as follows: where r represents the Person correlation coefficient, X i and Y i represent the features, respectively, and X and Yrepresent the mean value of respective samples.
As shown in Table 2, the wind speed is the most relevant to wind power, and the following is wind direction. It can be concluded that the wind turbine output is directly affected by the wind speed and wind direction. e temperature and humidity have a low correlation with wind power but have a certain correlation with wind speed. is is because temperature and humidity can indirectly reflect wind speed. In summary, this article considered all these five meteorological features for experiments.
In order to reduce the redundant components of meteorological characteristics and reduce the computational complex, PCA was used to extract features, and the components with a large contribution rate were kept. Table 3 presents the contribution rates of the extracted PC. e third and fourth PC had very small contribution rates and contained negligible information of the original features. erefore, only the first and second PC were considered in this experiment.

Parameter and Hyperparameter Settings of LSTM.
e IMF sublayers and Res term were combined with the principal components extracted by PCA to obtain 11 combined feature vectors X t (IMF it , PC 1 , PC 2 ), which were inputted into the LSTM network for training. For the experimental design of the training process, the Keras deep learning framework was selected. e sigmoid function was used as the activation function, and the model was optimized with the Adam optimization algorithm. To prevent overfitting, we set a dropout of 0.8 after cross-validation. e number of training samples per unit of time (batch size) was set to 96 that equals the amount of data in a day. e learning rate affects convergence effect and speed of the model, and we set it to 0.02 after experiments. In terms of the LSTM layers and neurons of dense, the following values were considered: {layers: 1, 2, 3, 4}, {neurons of dense: 1(70), 1(80), 2(70, 80), 2(80, 80)}. As shown in Table 4, RMSE is calculated to evaluate the forecasting performance using training set and validation set. As the number of layers increases, the training model will have a better effect, even reaching 100% prediction accuracy. However, what follows is that the model is overfitted, and the prediction effect is seriously reduced when the model is put on the test data. Above all, increasing the number of hidden layers may not be beneficial to forecasting performance. Also, based on their effects on the prediction and training speed, the number of LSTM layers was set to 2 with a dense layer that has 80 neurons.

Parameter Settings of SARIMA.
At the same time, the SARIMA (p, d, q) (P, D, Q) S model was built according to the information of the wind power sequence. e Akaike information criterion (AIC) was used to evaluate and compare models with different parameters [32]. AIC also considers the overall model complexity when measuring how a model fits the data. A model suitable for wind power data would have a lower AIC score than others. In Mathematical Problems in Engineering applications, d rarely exceeds 1, and P and Q are usually no more than 3 [33]. In this study, the SARIMA function of the statsmodels framework was used to test the effects of different model parameters on AIC, and the results are presented in Table 5. Based on the minimum AIC score, SARIMA (3, 0, 1) (3, 1, 1) 12 was chosen as the wind power prediction model.

Comparative Analysis.
To test prediction ability of the proposed three-stage wind power prediction method, it was compared with four other methods: single LSTM, Gated Recurrent Unit (GRU), EEMD-LSTM, and PCA-EEMD-LSTM. GRU is a sequence prediction network. Compared with the LSTM, it only contains update gate and reset gate, which also has a good prediction effect while simplifying the structure [34]. As the existing paper has demonstrated that the LSTM model is superior to traditional models in prediction effect, the traditional methods are not considered in this experiment. e three Laizhou wind farm datasets were used in this comparison. is study took the predicted wind power data for the following day as example, which contains 96 data points. e prediction interval is 15 minutes. To better explain the performance of the methods, we used the determination coefficient (R 2 ) to test the degree of  Mathematical Problems in Engineering association between the predicted and true values. R 2 is defined as follows: where y act represents the original values, y pre represents the predicted values, and y is the average of the original values. e closer R 2 is to 1, the smaller the deviation is between the predicted and true values [35]. Table 6 provides the results of R 2 for the cross-validation datasets. R 2 is close to 1, which shows that the value predicted by this model is very close to the true value; the method performed well during the cross-validation. Table 7 provides the results of R 2 with the test dataset; here too, it can be seen that the predicted value is close to the true value. Figures 5-7 show the prediction results for all methods for datasets 1-3. A one-day forecast curve was drawn with a total of 96 data points. Table 8 provides the prediction errors for the different methods. Table 9 presents the percentage improvement in the evaluation indices in the case of the proposed method, compared to the others. e main purpose of this experiment was to compare the proposed method with others, thereby demonstrating its effectiveness.
e minimum error value is taken as the evaluation standard, and the proposed method performed the best at wind power prediction among the four methods, with MAPE, RMSE, and MAE values of 4.78, 3.38, and 2.86, respectively. Taking dataset 1 as example, detailed results are presented as follows: (i) e PCA-EEMD-LSTM combination method has a significantly higher prediction accuracy than the EEMD-LSTM combination method. Tables 7 and 8 indicate that the PCA-EEMD-LSTM combination method has MAPE, RMSE, and MAE values of 5.33, 3.4, and 3.09, respectively. Compared to the EEMD-LSTM combination method, the percentage improvements in MAPE, RMSE, and MAE were 33.87%, 18.83%, and 13.93%, respectively. Figure 5 shows the prediction results for these two methods. e experimental results show that the meteorological information is effective in improving the accuracy, and the PCA can have a satisfactory effect on reducing the redundant components and get the components with high correlation with wind power, which improves the prediction accuracy and computing speed.            Figure 5 shows the prediction results for these two methods. e experimental results show that the SARIMA model can track the seasonality and periodicity of the wind power sequence, which improves the results predicted by the LSTM network.

Conclusions
In this study, a hybrid ultrashort-term wind power prediction method was developed. Based on the prediction using historical wind power data, the meteorological information regarded as the features of wind power is helpful to improve prediction accuracy.
e PCA can be used to reduce the redundant meteorological features and only get the components with high correlation with wind power. In addition, the SARIMA can effectively extract seasonal information and be regarded as the auxiliary predictor of LSTM. e hybrid prediction model was applied to three datasets collected from a wind farm in Laizhou to verify prediction ability, and the experimental results showed that this method got great prediction accuracy. In the future, the proposed method can be applied to real-time prediction systems. Although it takes more than 3 hours to build the models on a general computer, the time required for real-time prediction using the built models is extremely short, which fully meets ultrashort-term requirements. Moreover, the future research should comprise the following: (i) e EEMD algorithm can be applied not only to the wind power sequence but also to meteorological features. (ii) e proposed method shows satisfactory results for a 15-minute prediction. We will verify predictions at 30-minute, 1-hour, or even longer intervals in the future.

EEMD:
Ensemble empirical mode decomposition IMF: Intrinsic mode function PCA: Principal component analysis PC: Principal component LSTM: Long short-term memory network SARIMA: Seasonal autoregressive integrated moving average.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.