Short-Time Wind Speed Forecast Using Artificial Learning-Based Algorithms

The need for an efficient power source to operate modern industry has been rapidly increasing in recent years. However, the output of renewable power sources is difficult to predict, because the generated power depends strongly on fluctuating factors such as wind bearing, pressure, wind speed, and the humidity of the surrounding atmosphere. Thus, developing accurate forecasting methods and employing them in practice is of paramount importance. In this paper, a case study of a wind harvesting farm is investigated in terms of collected wind speed data. For data such as wind speed, which are hard to predict, a well-built and well-tested forecasting algorithm must be provided. To accomplish this goal, four neural network-based algorithms, namely, an artificial neural network (ANN), a convolutional neural network (CNN), long short-term memory (LSTM), and a hybrid convolutional LSTM (ConvLSTM) model that combines LSTM with CNN, along with one support vector machine (SVM) model, are investigated, evaluated, and compared using different statistical and time indicators to ensure that the final model meets the goal it is built for. Results show that even though SVM delivered the most accurate predictions, ConvLSTM was chosen due to its lower computational cost together with its high prediction accuracy.


Introduction
The need to move towards renewable and clean energy sources has increased considerably over the previous years. Fossil fuels are being used excessively and will eventually run out. In contrast, renewable energy (RE) sources such as wind, solar, and hydroelectric power are regularly replenished and will be sustained indefinitely. Grid operators who use RE face many challenges that lead to variability and uncertainty in power generation. For instance, in the case of solar power, clouds that move above solar power plants can reduce power generation for brief intervals of time. Cloud cover may introduce a very quick shift in the output of solar installations, but solar energy is still considered highly predictable since the motion of the sun is well understood [1]. Wind power generation, however, is less predictable because fluctuations in wind speed are stochastic in nature. This issue can cause a mismatch between supply and demand. Therefore, to enhance and optimize renewable wind power generation, wind speed or power production forecasting models are being used to resolve this problem, which has led to a huge increase in the installation of wind power plants [2].
As the demand for wind power has increased over the last decades, there is a serious need to set up wind farms and construct facilities based on accurately forecasted wind data. Accurate short-term wind forecasting has a significant effect on the electricity market [3] and is also necessary to identify the appropriate size of wind farms.
It is obvious that there is a need for an accurate wind forecasting technique to substantially reduce the cost of wind power scheduling [4]. There are several methods aimed at short-term wind forecasting (e.g., statistical time series and neural networks). For more advanced and accurate forecasting, hybrid models are used. These models combine physical and statistical approaches, short- and medium-term models, and combinations of alternative statistical models. The concept of artificial neural networks (ANNs) was first introduced by McCulloch and Pitts [5] in 1943 as a computational model for biological neural networks. The convolutional neural network (CNN) was influenced by the "Neocognitron" networks first introduced by Fukushima in 1980 [6]. CNNs were based on biological processes and consisted of hierarchical multilayered neural networks used for image processing. These networks are capable of "learning without a teacher" and of recognizing various stimulus shapes depending on their geometrical designs [7].
Long short-term memory (LSTM) [8], designed by Hochreiter and Schmidhuber in 1997, is built upon the recurrent neural network (RNN) structure. LSTM uses the concept proposed in [9], which depends on feedback connections between its layers. Unlike standard feedforward neural networks, LSTM can process entire sequences of data (such as voice or video) and not just single data points (such as images). The support vector machine (SVM) [10] is a popular machine learning technique that is advanced enough to deal with complex data and was originally aimed at challenges in classification problems.
In 2016, convolutional LSTM (ConvLSTM) was used to build a video prediction model by Shi et al. [11]. A tool was developed to forecast action-conditioned video by modeling pixel motion, predicting a distribution over pixel movement from earlier frames. Stacked convolutional LSTMs were employed to generate the motion predictions. This approach achieved the finest results in predicting future object motion.
An end-to-end learning of driving models was developed in [12] using an LSTM-based algorithm. A trainable structure for learning to accurately predict a distribution over upcoming vehicle motion was developed by learning a generic vehicle motion model from large-scale crowd-sourced video.
The data sources were a rapid monocular camera, observations, and past vehicle states. The images were encoded through a long short-term memory fully convolutional network (FCN-LSTM) to determine the relevant graphical representation in every input frame, side by side with a temporal network to use the motion history information. The authors were able to compose an innovative hybrid structure for time-series prediction (TSP) that combined an LSTM temporal encoder with a fully convolutional visual encoder.
Various papers on wind speed forecasting have been explored in the literature. For instance, a model was introduced by Xu et al. [13] to predict short-term wind speed using LSTM, empirical wavelet transformation (EWT), and Elman neural network approaches. The EWT is implemented to break down the raw wind speed data into multiple sublayers, which are then fed to an Elman neural network (ENN) and an LSTM network to predict the low- and high-frequency sublayers, respectively. An unscented Kalman filter (UKF) together with a support vector regression (SVR) based state-space model was applied by Chen and Yu [14] to efficiently correct short-term estimates of the wind speed series.
A nonlinear-learning ensemble for deep learning time series prediction, EnsemLSTM, was developed by Chen et al. [15]. This scheme relied on LSTMs, a support vector regression machine (SVRM), and an extremal optimization algorithm (EO). Wind speed data are forecasted separately by an array of LSTMs with different hidden layers and different numbers of neurons in each hidden layer. The authors proved that the introduced EnsemLSTM is capable of achieving improved forecasting performance along with the least mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), and the highest R-squared (R²).
A hybrid model constructed from the wavelet transform (WT) and SVM was proposed by Liu et al. [16] to predict wind speed in the short term. The model is improved by a genetic algorithm (GA), which is used to tune the essential parameters of the SVM by reducing the produced errors and searching for the optimum parameters to bypass the danger of instability. The presented model proved to be more efficient than the SVM-GA model. Wang [17] developed a genetic algorithm wavelet neural network (GAWNN) model. The developed model showed enhanced operation compared to the normal wavelet neural network (WNN) model in predicting short-term wind power, both at the beginning of network training and in convergence precision.
A prediction model was proposed by Sheikh et al. [18] based on support vector regression (SVR) and a neural network (NN) with the backpropagation technique. Windowing data preprocessing was combined with cross and sliding window validations in order to predict wind speed with high accuracy. A hybrid method was presented by Nantian et al. [19], which included variational mode decomposition (VMD), partial autocorrelation function (PACF) feature selection, and modular weighted regularized extreme learning machine (WRELM) prediction. The optimal number of decomposition layers was analyzed via the prediction error of one-step forecasting with different decomposition layers. A robust forecasting model was proposed by Haijian and Deng [20] by evaluating seasonal features and lag space in the wind resource. The proposed model was based on a multilayered perceptron with one hidden layer trained using the Levenberg-Marquardt optimization method. The least squares support vector machine (LSSVM) was used by Xiaodan [21] for wind speed forecasting. The accuracy of the prediction model parameters was optimized using particle swarm optimization (PSO) to minimize the fitness function in the training process. Ningsih et al. [22] predicted wind speed using recurrent neural networks (RNNs) with long short-term memory (LSTM). Two optimization methods, stochastic gradient descent (SGD) and adaptive moment estimation (Adam), were evaluated. The Adam method was shown to be better and quicker than SGD, with a higher level of accuracy and less deviation from the target. A nonlinear autoregressive neural network (NAR-NET) model was developed by Datta [23]. The model employed univariate time series data to generate hourly wind speed forecasts. The closed-loop structure provided error feedback to the hidden layer to generate the forecast of the next point. A short-term wind speed forecasting method was proposed by Guanlong et al.
[24] using a backpropagation (BP) neural network. The weight and threshold values of the BP network are trained and optimized by an improved artificial bee colony algorithm. Then, the gathered wind speed samples are trained and optimized. When training is finished, test samples are used to forecast and validate.
Fuzzy C-means (FCM) clustering was used by Gonggui et al. [25] to forecast wind speed. The input data of the BP neural network with similar characteristics are divided into corresponding classes, and a different BP neural network is established for each class. The coefficient of variation is used to illustrate the dispersion of the data, and statistical knowledge is used to eliminate input data with large dispersion from the original dataset. Artificial neural networks (ANNs) and decision trees (DTs) were used by ZhanJie and Mazharul Mujib [26] to analyze meteorological data for the application of data mining techniques through cloud computing in wind speed prediction. The neurons in the hidden layer are increased gradually, and the network performance in the form of an error is examined. Table 1 highlights the main characteristics of the existing schemes developed for wind speed forecasting.

The novelty of this work lies in enhancing the accuracy of wind speed forecasting by using a hybrid model called ConvLSTM and comparing it with four other commonly used models with optimized lags, hidden neurons, and parameters. This includes testing and comparing the performance of these five different models based on historical data as well as employing the multi-lags-one-step (MLOS) ahead forecasting concept. MLOS provided an efficient generalization to new time series data and thus increased the overall prediction accuracy.

The remainder of this paper is organized as follows. Section 2 describes the four learning algorithms, in addition to a hybrid algorithm, investigated for accurate wind speed forecasting. Section 3 illustrates the study methodology. Section 4 presents a real case study of a wind farm. Section 5 introduces the results and discussion. Finally, conclusions and future work are presented in Section 6. Table 2 lists the acronyms and notations used throughout the paper.

Prediction Algorithms
In this section, the algorithms used for wind speed forecasting are summarized as follows.
2.1. LSTM Algorithms. LSTM is built on a unique architecture that empowers it to forget unnecessary information, by turning multiplication into addition and using a function whose derivative can be preserved over a long range before going to zero, in order to reduce the vanishing gradient problem (VGP). It is constructed of a sigmoid layer that takes the inputs $x_t$ and $h_{t-1}$ and then decides, by generating zeros, which parts of the old output should be removed. This process is done through the forget gate $f_t$, whose contribution is given as $f_t \odot c_{t-1}$. After that, a vector of all the candidate values from the new input is created by a tanh layer. These two results are combined to renew the old memory $c_{t-1}$, which gives $c_t$. In other words, a sigmoid layer decides which portions of the cell state will be the outcome. Then, the outcome of the sigmoid gate is multiplied by the candidate values produced by tanh. Thus, the output consists of only the parts that are decided to be generated.
LSTM networks [8] belong to the family of recurrent neural networks (RNNs), are capable of learning long-term dependencies, and are powerful for modeling long-range structure. The main component of the LSTM network is the memory cell, which can memorize the temporal state. The cell is shaped by the addition or removal of information through three controlling gates: the input gate, the forget gate, and the output gate. LSTMs renew and control the information flow in the block using these gates through the following equations:

$$i_t = \sigma\left(\theta_{xi} \cdot x_t + \theta_{hi} \cdot h_{t-1} + b_i\right), \quad (1)$$
$$f_t = \sigma\left(\theta_{xf} \cdot x_t + \theta_{hf} \cdot h_{t-1} + b_f\right), \quad (2)$$
$$\tilde{c}_t = \tanh\left(\theta_{xc} \cdot x_t + \theta_{hc} \cdot h_{t-1} + b_c\right), \quad (3)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad (4)$$
$$o_t = \sigma\left(\theta_{xo} \cdot x_t + \theta_{ho} \cdot h_{t-1} + b_o\right), \quad (5)$$
$$h_t = o_t \odot \tanh(c_t), \quad (6)$$

where "·" denotes matrix multiplication, "⊙" is an element-wise multiplication, and θ stands for the weights. $\tilde{c}_t$ is the input to the cell $c$, which is gated by the input gate, while $o_t$ is the output gate activation. The nonlinear functions σ and tanh are applied element-wise, where $\sigma(x) = 1/(1 + e^{-x})$. Equations (1) and (2) establish the gate activations, equation (3) computes the cell input, equation (4) determines the new cell state, where the "memories" are stored or deleted, and equation (5) gives the output gate activation, which produces the final output in equation (6).
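As a concreteness check, equations (1)-(6) can be sketched as a single NumPy step function. The block below is a minimal illustration only: the weight shapes, the random initialization, and the toy input sequence are assumptions for demonstration, not the trained network used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, theta):
    """One LSTM step implementing equations (1)-(6).

    theta holds weight matrices theta_x* (n_h x n_x), theta_h* (n_h x n_h),
    and bias vectors b_* (n_h,)."""
    i_t = sigmoid(theta["xi"] @ x_t + theta["hi"] @ h_prev + theta["bi"])      # (1) input gate
    f_t = sigmoid(theta["xf"] @ x_t + theta["hf"] @ h_prev + theta["bf"])      # (2) forget gate
    c_tilde = np.tanh(theta["xc"] @ x_t + theta["hc"] @ h_prev + theta["bc"])  # (3) cell input
    c_t = f_t * c_prev + i_t * c_tilde                                         # (4) new cell state
    o_t = sigmoid(theta["xo"] @ x_t + theta["ho"] @ h_prev + theta["bo"])      # (5) output gate
    h_t = o_t * np.tanh(c_t)                                                   # (6) final output
    return h_t, c_t

# Toy dimensions and random untrained weights (illustrative assumptions).
rng = np.random.default_rng(0)
n_x, n_h = 4, 3
theta = {}
for g in "ifco":
    theta["x" + g] = rng.normal(size=(n_h, n_x)) * 0.1
    theta["h" + g] = rng.normal(size=(n_h, n_h)) * 0.1
    theta["b" + g] = np.zeros(n_h)

h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                       # process a short random sequence
    h, c = lstm_step(rng.normal(size=n_x), h, c, theta)
```

Because $h_t = o_t \odot \tanh(c_t)$ with $o_t \in (0, 1)$, every component of the hidden output stays strictly inside $(-1, 1)$ regardless of the weights.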
2.2. CNN Algorithms. To achieve network architecture optimization and solve for the unknown parameters in the network, the attributes of a two-dimensional image are extracted and backpropagation algorithms are implemented. To obtain the final outcome, the sampled data are fed into the network to extract the needed attributes with prerefining, and then classification or regression is applied [27]. The CNN is basically composed of two types of layers: convolutional layers and pooling layers. The neurons in a convolutional layer are locally connected to the preceding layer, so the neurons' local attributes are obtained. Local sensitivity is handled through the pooling layer, which aggregates the attributes repeatedly. Together, the convolution and pooling layers reduce the attribute resolution and the number of network parameters that require tuning.
A CNN typically describes data structured as a two-dimensional array and is extensively utilized in the area of image processing. In this paper, the CNN algorithm is configured to predict wind speed and fitted to process a one-dimensional array of data. In the preprocessing phase, the one-dimensional data are reconstructed into a two-dimensional array, which enables the CNN algorithm to deal with the data smoothly. This creates two files: the property file and the response file. These files are delivered as inputs to the CNN, where the response file contains the expected output values.
Each sample is represented by a line from the property and response files. Weights and biases can be obtained as soon as a sufficient number of samples to train the CNN is delivered. Training continues by comparing the regression results with the response values in order to reach the minimum possible error. This delivers the final trained CNN model, which is utilized to produce the needed predictions.
The fitting mechanism of the CNN relies on pooling. Various computational studies have shown that two pooling approaches can be used: average pooling and maximum pooling. Images are stationary, and all parts of an image share similar attributes. Therefore, the pooling approach applies the same average or maximum calculation to every part of a high-resolution image. The pooling process reduces the dimensionality of the statistics and increases the generalization strength of the model. The results are well optimized and have a lower possibility of overfitting.
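The two pooling variants just described can be sketched in a few lines of NumPy. This is a minimal illustration assuming non-overlapping 2×2 windows on a small feature map; the map values are arbitrary.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map.

    Assumes both dimensions of x are divisible by `size`."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # maximum pooling
    return blocks.mean(axis=(1, 3))      # average pooling

fmap = np.array([[1., 2., 0., 1.],
                 [3., 4., 1., 0.],
                 [0., 1., 5., 6.],
                 [1., 0., 7., 8.]])
max_out = pool2d(fmap, mode="max")   # -> [[4., 1.], [1., 8.]]
avg_out = pool2d(fmap, mode="avg")   # -> [[2.5, 0.5], [0.5, 6.5]]
```

Either way, the 4×4 map shrinks to 2×2, which is exactly the reduction in attribute resolution and parameter count the text attributes to pooling.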

2.3. ANN Algorithms. An ANN is built up of three layers: the input, hidden, and output layers. These layers have the ability to map an input vector to an output scalar or vector using activation functions in the various neurons. The $j$th hidden neuron $Z_j$ can be computed from the $p$ inputs using the following equation [14]:

$$Z_j = f_h\left(\sum_{i=1}^{p} w_{ij}\, y_{k-i}\right), \quad j = 1, \ldots, m, \quad (7)$$

where $w_{ij}$ is the connection weight from the $i$th input node to the $j$th hidden node, $y_{k-i}$ is the wind speed $i$ steps in the past, $m$ is the number of hidden neurons, and $f_h(\cdot)$ is the activation function of the hidden layer. The future wind speed can then be predicted through

$$\hat{y}_k = f_0\left(\sum_{j=1}^{m} w_j Z_j\right), \quad (8)$$

where $w_j$ is the connection weight from the $j$th hidden node to the output node, $\hat{y}_k$ is the predicted wind speed at the $k$th sampling moment, and $f_0$ is the activation function of the output layer. By minimizing the error between the actual and predicted wind speeds, $y_k$ and $\hat{y}_k$, respectively, using the Levenberg-Marquardt (LM) algorithm, the nonlinear mapping capability of the ANN can be obtained [28].
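The hidden-layer and output computations above reduce to two weighted sums with an activation in between. The sketch below is illustrative only: it assumes a tanh hidden activation, a linear output, and random untrained weights, none of which are specified by the paper.

```python
import numpy as np

def ann_forecast(lags, W_h, W_o):
    """One-hidden-layer ANN forward pass.

    lags: the p previous wind speeds y_{k-1}, ..., y_{k-p}.
    W_h:  (m, p) hidden weights; W_o: (m,) output weights."""
    Z = np.tanh(W_h @ lags)        # hidden neurons, f_h = tanh (assumed)
    return float(W_o @ Z)          # linear output f_0, predicted y_k

# Toy sizes loosely matching the lag/neuron scale discussed later (4 lags,
# 15 hidden neurons); the weights here are random, not LM-trained.
rng = np.random.default_rng(1)
p, m = 4, 15
W_h = rng.normal(size=(m, p))
W_o = rng.normal(size=m)
y_hat = ann_forecast(np.array([5.1, 4.8, 5.3, 5.0]), W_h, W_o)
```

In practice the weights would be fitted by minimizing the prediction error with the Levenberg-Marquardt algorithm, as the text describes; the forward pass itself is unchanged.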

2.4. SVM Algorithms. Assume a set of samples $(x_i, y_i)$, where $i = 1, 2, \ldots, N$, with input vectors $x_i \in \mathbb{R}^m$ and outputs $y_i \in \mathbb{R}$. The regression problem aims to identify a function $f(x)$ that describes the correlation between inputs and outputs. The goal of SVR is to obtain a linear regression in the high-dimensional feature space delivered by mapping the primary input set using a predefined function $\phi(\cdot)$, while minimizing the structural risk $R[f]$. This mechanism can be written as follows [15]:

$$f(x) = W^{T}\phi(x) + b, \quad (9)$$
$$R[f] = \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{N} L_{\varepsilon}\left(y_i, f(x_i)\right), \quad (10)$$

where $W$, $b$, and $C$ are, respectively, the regression coefficient vector, the bias term, and the punishment coefficient, and

$$L_{\varepsilon}\left(y, f(x)\right) = \max\left(0, |y - f(x)| - \varepsilon\right) \quad (11)$$

is the ε-insensitive loss function. The regression problem can be handled by the following constrained optimization problem:

$$\min_{W,\, b,\, \zeta,\, \zeta^*} \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{N}\left(\zeta_i + \zeta_i^*\right)$$
$$\text{s.t.} \quad y_i - W^{T}\phi(x_i) - b \le \varepsilon + \zeta_i, \quad W^{T}\phi(x_i) + b - y_i \le \varepsilon + \zeta_i^*, \quad \zeta_i,\, \zeta_i^* \ge 0, \quad (12)$$

where $\zeta_i$ and $\zeta_i^*$ represent the slack variables that make the constraints feasible. By using Lagrange multipliers, the regression function can be written as follows:

$$f(x) = \sum_{i=1}^{N}\left(a_i - a_i^*\right)K(x_i, x) + b, \quad (13)$$

where $a_i$ and $a_i^*$ are the Lagrange multipliers that fulfil the conditions $a_i \ge 0$, $a_i^* \ge 0$, and $\sum_{i=1}^{N}(a_i - a_i^*) = 0$. $K(x_i, x_j)$ is a general kernel function. In this study, the well-known radial basis function (RBF) is chosen as the kernel:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \quad (14)$$

where σ defines the RBF kernel width [15].
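The RBF kernel and the Lagrange-multiplier form of the SVR prediction can be sketched directly. The support vectors, multiplier differences, bias, and kernel width below are placeholders for illustration, not values fitted on the wind data.

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def svr_predict(x, support, alpha_diff, b, sigma=1.0):
    """f(x) = sum_i (a_i - a_i*) K(x_i, x) + b, given trained multipliers."""
    return sum(ad * rbf_kernel(xi, x, sigma)
               for xi, ad in zip(support, alpha_diff)) + b

# Placeholder "trained" quantities, purely illustrative.
support = [[0.0], [1.0], [2.0]]
alpha_diff = [0.3, -0.1, 0.2]
pred = svr_predict([0.5], support, alpha_diff, b=0.1)

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
```

Note that the kernel equals 1 for identical inputs and decays towards 0 as the points move apart, with σ controlling how fast: this is the "kernel width" mentioned in the text.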

2.5. ConvLSTM Algorithm. ConvLSTM is designed to be trained on the spatial information in the dataset, and its aim is to handle three-dimensional data as an input. Furthermore, it replaces matrix multiplication with a convolution operation at every gate of the LSTM cell. By doing so, it has the ability to capture the underlying spatial features in multidimensional data. The formulas used at each of the gates (input, forget, and output) are as follows:

$$i_t = \sigma\left(W_{xi} * x_t + W_{hi} * h_{t-1} + b_i\right), \quad (15)$$
$$f_t = \sigma\left(W_{xf} * x_t + W_{hf} * h_{t-1} + b_f\right), \quad (16)$$
$$o_t = \sigma\left(W_{xo} * x_t + W_{ho} * h_{t-1} + b_o\right), \quad (17)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c\right), \quad (18)$$
$$h_t = o_t \odot \tanh(c_t), \quad (19)$$

where $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, $W$ is the weight matrix, $x_t$ is the current input data, $h_{t-1}$ is the previous hidden output, and $c_t$ is the cell state. The difference from the corresponding LSTM equations is that the matrix multiplication (·) is substituted by the convolution operation (*) between $W$ and each of $x_t$ and $h_{t-1}$ at every gate. By doing so, the fully connected layer is replaced by a convolutional layer, and thus the number of weight parameters in the model can be significantly reduced.
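The gate computation with convolution in place of matrix multiplication can be illustrated with one-dimensional "same"-size convolutions. Everything here is an illustrative assumption: 3-tap kernels, a length-8 state, random weights, and biases folded away.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_same(w, x):
    """'Same'-size 1-D convolution standing in for the * operator in the gates."""
    return np.convolve(x, w, mode="same")

def convlstm_step(x_t, h_prev, c_prev, W):
    """One ConvLSTM step: the LSTM gates with convolution instead of matmul."""
    i_t = sigmoid(conv_same(W["xi"], x_t) + conv_same(W["hi"], h_prev))  # input gate
    f_t = sigmoid(conv_same(W["xf"], x_t) + conv_same(W["hf"], h_prev))  # forget gate
    o_t = sigmoid(conv_same(W["xo"], x_t) + conv_same(W["ho"], h_prev))  # output gate
    c_t = f_t * c_prev + i_t * np.tanh(
        conv_same(W["xc"], x_t) + conv_same(W["hc"], h_prev))            # cell state
    h_t = o_t * np.tanh(c_t)                                             # hidden output
    return h_t, c_t

rng = np.random.default_rng(2)
n = 8                                         # spatial length of the state
W = {k: rng.normal(size=3) * 0.1
     for k in ("xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc")}
h, c = np.zeros(n), np.zeros(n)
h, c = convlstm_step(rng.normal(size=n), h, c, W)
```

Each gate now needs only a 3-tap kernel instead of an n × n matrix (3 weights versus 64 here), which is the parameter reduction the text describes.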

Methodology
Due to the nonlinear, nonstationary attributes and the stochastic variations of the wind speed time series, accurate prediction of wind speed is known to be a challenging task [29]. In this work, to improve the accuracy of the wind speed forecasting model, a comparison between five models is conducted to forecast wind speed considering the available historical data. A concept called multi-lags-one-step (MLOS) ahead forecasting is employed to illustrate its effect on the accuracy of the five models. Assume that we are at time index $X_t$. To forecast one output element in the future, $X_{t+1}$, the input dataset can be split into many lags (past data) $X_{t-i}$, where $i \in \{1, \ldots, 10\}$. By doing so, the model can be trained on more elements before predicting a single event in the future. The model accuracy improves until it reaches the optimum lagging point, which gives the best accuracy; beyond this point, the accuracy degrades, as will be illustrated in the Results section. Figure 1 illustrates the workflow of the forecasting model. Specifically, the proposed methodology entails four steps.
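The MLOS windowing just described can be sketched as a small helper that turns a univariate series into (lags, next-value) training pairs. The function name and the toy series are illustrative assumptions.

```python
import numpy as np

def make_mlos_dataset(series, n_lags):
    """Multi-lags-one-step windowing.

    Each input row holds the n_lags past values X_{t-n_lags+1}, ..., X_t;
    the target is the single next value X_{t+1}."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # the lag window (past data)
        y.append(series[t])              # the one-step-ahead target
    return np.array(X), np.array(y)

# Toy wind speed series (m/s), purely illustrative.
speeds = np.array([5.0, 5.2, 4.9, 5.1, 5.4, 5.3, 5.0])
X, y = make_mlos_dataset(speeds, n_lags=4)   # yields 3 (window, target) pairs
```

Sweeping `n_lags` over 1-10 and tracking the validation error reproduces the search for the optimum lagging point described above.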
In Step 1, the data are collected and averaged from 5 minutes to 30 minutes and to 1 hour, respectively. The datasets are then standardized to a mean value of 0 and a standard deviation of 1. The lagging stage in Step 2 is very important, as the data are split into different lags to study the effect of training the models on more than one element (input) to predict a single event in the future. In Step 3, the models are applied, taking into consideration that some models, such as CNN, LSTM, and ConvLSTM, need their inputs adjusted from a matrix-shape perspective, since these models normally work with 2D data or more; in this stage, the matrix is manipulated and reshaped. For the sake of checking and evaluating the proposed models, in Step 4, three main metrics are used to validate the case study (MAE, RMSE, and R²). In addition, the execution time and optimum lag are taken into account to select the best model. Algorithm 1 illustrates the training procedure for ConvLSTM.

Table 3 illustrates the characteristics of the collected data in the 5-minute time span. The data are collected from a real wind speed dataset over a three-year period from the West Texas Mesonet, with a 5-minute observation period, from near Lake Alan Henry, Garza County [30]. The data are processed through averaging from 5 minutes to 30 minutes (whose statistical characteristics are given in Table 4) and once more to 1 hour (whose statistical characteristics are given in Table 5). The goal of averaging is to study the effect of reducing the data size in order to compare the five models and then select the one that achieves the highest accuracy for the three dataset cases. As shown in the three tables, the datasets are almost identical and retain their seasonality; they are not affected by the averaging process. The data have been split into three sets (training, validation, and test) with fractions of 53 : 14 : 33.
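Step 1 (block averaging, standardization, and the 53 : 14 : 33 split) can be sketched as follows. Since the Mesonet data cannot be redistributed, the gamma-distributed synthetic wind speeds below are purely illustrative stand-ins.

```python
import numpy as np

def preprocess(series, factor, fractions=(0.53, 0.14, 0.33)):
    """Average blocks of `factor` samples (e.g. factor=6 turns 5-minute data
    into 30-minute data), standardize to zero mean / unit std, then split
    chronologically into train / validation / test sets."""
    n = len(series) // factor * factor
    avg = series[:n].reshape(-1, factor).mean(axis=1)      # block averaging
    std = (avg - avg.mean()) / avg.std()                   # mean 0, std 1
    i1 = int(len(std) * fractions[0])
    i2 = i1 + int(len(std) * fractions[1])
    return std[:i1], std[i1:i2], std[i2:]

rng = np.random.default_rng(3)
five_min = rng.gamma(2.0, 2.5, size=6000)      # synthetic 5-minute wind speeds
train, val, test = preprocess(five_min, factor=6)   # 1000 30-minute averages
```

The chronological (rather than shuffled) split preserves the seasonality of the series, matching how the three dataset cases are built in the paper.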

Results and Discussion
To quantitatively evaluate the performance of the predictive models, three commonly used statistical measures are computed [20]. All of them measure the deviation between the actual and predicted wind speed values. Specifically, RMSE, MAE, and R² are defined as follows:

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted wind speeds, respectively, while $\bar{y}$ is the mean value of the actual wind speed sequence. Smaller values of RMSE and MAE indicate an improved forecasting procedure, while R² is the goodness-of-fit measure for the model; the larger its value, the better the model fits. Table 6 illustrates the chosen optimized internal parameters (hyperparameters) for the forecasting methods used in this work. For each method, the optimal number of hidden neurons is chosen to achieve the maximum R² and the minimum RMSE and MAE values.
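The three measures can be implemented directly from their definitions; the toy actual/predicted values below are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [5.0, 5.5, 6.0, 5.2]   # illustrative actual wind speeds (m/s)
y_pred = [5.1, 5.4, 5.9, 5.3]   # illustrative predictions
```

With a constant absolute error of 0.1 m/s, RMSE and MAE both come out to 0.1, while R² stays close to 1 because the error is small relative to the spread of the actual values.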
After the implementation of CNN, ANN, LSTM, ConvLSTM, and SVM, the most fitted model is chosen depending on its accuracy in predicting future wind speed values; thus, seasonality is considered in the forecast mechanism. The chosen model has to deliver the best-fitted predictions with the least amount of error, taking into consideration the nature of the data and not applying naive forecasting to it.
To achieve this goal, the statistical error indicators are calculated for every model and time lapse and fully represented as Figure 2 illustrates. The provided results suggest that the ConvLSTM model has the best performance compared to the other four models. The chosen model has to reach the minimum RMSE and MAE values together with the maximum R² value.
Different parameters are also tested to ensure the right decision in choosing the best-fitted model. The optimum number of lags, presented in Table 7, is one of the most important indicators in selecting the best-fitted model: the fewer historical points the model needs, the lower its computational effort. For each method, the optimal number of lags is chosen to achieve the maximum R² and the minimum RMSE and MAE values. For instance, Figures 3 and 4 show the relation between the statistical measures and the number of lags and hidden neurons, respectively, for the proposed ConvLSTM method in the 5-minute time span case. It can be seen that 4 lags and 15 hidden neurons achieved the maximum R² and minimum RMSE and MAE values. The execution time shown in Table 8 is calculated for each method and time lapse to ensure that the final chosen model is efficient and can effectively predict future wind speed. The shorter the execution time, the more efficient and helpful the model; this is also a sign that the model is suitable for further modifications. According to Table 8, the ConvLSTM model beats all other models in the time needed to process the historical data and deliver a final prediction: SVM needed 54 minutes to accomplish the training and produce testing results, while ConvLSTM did it in just 1.7 minutes. This huge difference motivated the choice of ConvLSTM. Figure 5 shows that the 5-minute lapse dataset is the dataset best fitted by the chosen model and demonstrates how accurate the prediction of future wind speed will be.
For completeness, to effectively evaluate the investigated forecasting techniques in terms of their prediction accuracies, a 50-trial cross-validation procedure is carried out in which the investigated techniques are built and then evaluated on 50 different training and test datasets, respectively, randomly sampled from the overall available dataset. The ultimate performance metrics are then reported as the average and standard deviation of the 50 metrics obtained in the cross-validation trials. In this regard, Figure 6 shows the average performance metrics on the test dataset using the 50-trial cross-validation procedure. It can be easily recognized that the forecasting models that employ the LSTM technique outperform the other investigated techniques in terms of the three performance metrics, R², RMSE, and MAE.
From the experimental results of short-term wind speed forecasting shown in Figure 6, we can observe that ConvLSTM performs the best in terms of the forecasting metrics (R², RMSE, and MAE) compared to the other models (i.e., CNN, ANN, SVR, and LSTM). The related statistical tests in Tables 6 and 7, respectively, proved the effectiveness of ConvLSTM and its capability of handling large noisy data. ConvLSTM showed that it can produce highly accurate wind speed predictions with fewer lags and hidden neurons. This was indeed reflected in the results shown in Table 8, with less computation time compared to the other tested models. Furthermore, we introduced multi-lags-one-step (MLOS) ahead forecasting combined with the hybrid ConvLSTM model to provide an efficient generalization to new time series data and predict wind speed accurately. The results showed that the ConvLSTM model proposed in this paper is an effective and promising model for wind speed forecasting.

Similar to our work, the EnsemLSTM model proposed by Chen et al. [15] contained different clusters of LSTMs with different hidden layers and hidden neurons. They combined the LSTM clusters with SVR and an external optimizer in order to enhance the generalization capability and robustness of their model. However, their model showed high computational complexity with mediocre performance indices. Our proposed ConvLSTM with MLOS boosted the generalization and robustness for new time series data while producing high performance indices.

Conclusions
In this study, we proposed a hybrid deep learning-based framework, ConvLSTM, for short-term prediction of the wind speed. The proposed dynamic prediction model was optimized for the number of input lags and the number of internal hidden neurons. Multi-lags-one-step (MLOS) ahead wind speed forecasting using the proposed approach showed superior results compared to four other models built using standard ANN, CNN, LSTM, and SVM approaches. The proposed modeling framework combines the benefits of CNN and LSTM networks in a hybrid modeling scheme that delivers highly accurate wind speed predictions with fewer lags and hidden neurons, as well as lower computational complexity. For future work, further investigation can be done to improve the accuracy of the ConvLSTM model, for instance, by increasing and optimizing the number of hidden layers, applying multi-lags-multi-steps (MLMS) ahead forecasting, and introducing a reinforcement learning agent to optimize the parameters in comparison with other optimization methods.
Data Availability

The wind speed data used in this study have been taken from the West Texas Mesonet of the US National Wind Institute (http://www.depts.ttu.edu/nwi/research/facilities/wtm/index.php). Data are provided freely for academic research purposes only and cannot be shared/distributed beyond academic research use without permission from the West Texas Mesonet.

Conflicts of Interest
The authors declare that they have no conflicts of interest.