Integrated Machine Learning and Enhanced Statistical Approach-Based Wind Power Forecasting in Australian Tasmania Wind Farm

This paper develops an integrated machine learning and enhanced statistical approach for wind power interval forecasting. A time-series wind power forecasting model is formulated as the theoretical basis of our method. The proposed model takes into account two important characteristics of wind speed: nonlinearity and a time-changing distribution. Based on the proposed model, six machine learning regression algorithms are employed to forecast the prediction interval of the wind power output. The six methods are tested using real wind speed data collected at a wind station in Australia. For wind speed forecasting, the long short-term memory (LSTM) network algorithm outperforms the other five algorithms. In terms of the prediction interval, the five nonlinear algorithms show superior performance. The case studies demonstrate that, combined with an appropriate nonlinear machine learning regression algorithm, the proposed methodology is effective in wind power interval forecasting.


Introduction
Wind power is rapidly expanding its market share around the world. However, the intermittency and uncertainty of wind make it a challenge to integrate wind power into the power system. A wind power forecasting system can greatly help the integration process since system operators rely on accurate wind power forecasts to design operational plans and assess system security [1,2]. The servo mechanism is the foundation of wind turbines, and precise wind power forecasting can improve the accuracy of parameter estimation and control of wind turbine servo systems [3][4][5]. Predictions of wind power outputs are traditionally provided in the form of point forecasts. The advantage of point forecasts is that they can be easily understood: a single value is expected to tell everything about future power generation. Nowadays, the majority of research efforts on wind power forecasting are still focused on point forecasting. Reviews of the state of the art in wind power forecasting can be found in [6,7]. A book on physical approaches to short-term wind power forecasting also partly discusses the state of the art [8].
However, even if both meteorological and power conversion processes are well understood and modelled, there will always be inherent and inevitable uncertainty in wind power forecasts. The uncertainty comes from the incomplete knowledge of the physical processes that influence future events [9]. The uncertainty of wind power forecasts mainly depends on the predictability of the current meteorological status and the level of the predicted wind speed [10]. To assist with the management of the forecasting uncertainty, extensive research studies have been conducted to develop wind power forecasting methods. Different regression methods have been introduced in [10][11][12]. These approaches use probabilistic forecasts generated by different quantile regression methods to provide complete information about future wind production. A multiscale reliable wind power forecasting (WPF) method was developed by Yan et al. in [13]. This method provides the expected future value and the associated uncertainty by means of a multi-to-multi mapping network and a stacked denoising autoencoder. Wang et al. developed a short-term wind prediction method with a convolutional neural network (CNN) based on information from neighbouring wind farms [14]. One popular approach is to use ensemble-based probabilistic forecasting methodologies, which enable better wind power management and trading [15,16]. In [17,18], statistical analysis has been conducted to study the distribution of wind power forecasting errors. Because wind power is stochastic in nature, errors will always exist in wind power forecasts. Therefore, besides predicting the expected value of the future wind power, it is also important to estimate its forecasting errors.
A key weakness of the above studies is that they fail to establish proper statistical models for interval forecasting of wind power and also fail to take into account the time-changing effect of the error distribution. Generally speaking, a prediction interval is a stochastic interval that contains the true value of wind power with a preassigned probability. Because the prediction interval can quantify the uncertainty of the forecasted wind power, it can be employed to evaluate the risks of the decisions made by market participants. The existing methods discussed above cannot effectively handle wind power interval forecasting since they mainly focus on predicting the expected point value of wind power.
There are two main challenges in providing accurate interval forecasting of wind power: (i) the expected value of wind power should be accurately predicted. This is difficult since wind power is a nonlinear time series and is therefore highly volatile. Nonlinear systems are highly complex to study, and nonlinear control problems continually emerge in practice [19][20][21][22][23]; (ii) the probability distribution of forecasting errors should also be accurately estimated. This is even more difficult since the error distribution can be time-changing. In this paper, a novel approach is proposed to forecast the prediction interval of wind power. A statistical model is first formulated to properly model the time series of wind speed. Based on the proposed model, a number of different machine learning algorithms are introduced to predict the expected value of wind speed and the parameters of the forecasting error distribution. Prediction intervals of wind speed are then constructed based on the predicted wind speed value and error distribution. The wind speed prediction interval is finally transformed into the wind power prediction interval with the wind turbine power curve. Comprehensive studies are performed to compare the performances of six machine learning algorithms in wind power interval forecasting. The main contributions of this paper are as follows:
(1) A comprehensive statistical model is introduced, which forms the theoretical basis for wind power interval forecasting.
(2) Different machine learning regression methods are incorporated into the proposed model, and a comparison of the regression algorithms in wind power forecasting is presented.
(3) The proposed integrated statistical machine learning approach can highlight the essential information of the available data.
The rest of the paper is organized as follows: in Section 2, a statistical model for the wind speed time series is formulated. We also introduce the Lagrange multiplier (LM) test to verify that the forecasting errors of wind power have a time-changing distribution. In Section 3, the basic concept of machine learning and six machine learning algorithms for wind power forecasting are introduced. Afterwards, comprehensive case studies are performed in Section 4. Section 5 finally concludes the paper.

The Statistical Model of the Wind Speed Time Series
To forecast the power output of a wind turbine, a widely used approach is to predict the wind speed first and then transform the predicted wind speed into wind power with the power curve. Therefore, in this section, a statistical model of wind speed is first formulated. We will also briefly explain how to integrate the proposed model with nonlinear regression techniques to forecast the prediction intervals of wind speed. The wind speed time series can usually be assumed to be generated by the following stochastic process:

$$Y_t = f(Y_{t-1}, Y_{t-2}, \ldots, \vec{X}_t) + \varepsilon_t, \tag{1}$$

where $Y_t$ denotes the random wind speed and $y_t$ is the observed value of $Y_t$ at time $t$. $\vec{X}_t \in \mathbb{R}^m$ is an $m$-dimensional explanatory vector. Each element $X_{t,i}$ of $\vec{X}_t$ represents an explanatory variable that can influence $Y_t$, for example, the temperature and humidity. The current value of $Y_t$ can be determined by its lagged values $Y_{t-1}, Y_{t-2}, \ldots$ and the explanatory vector $\vec{X}_t$. Note that the mapping $f(\cdot)$ from $Y_{t-1}, Y_{t-2}, \ldots, \vec{X}_t$ to $Y_t$ can be any linear or nonlinear function. Most existing methods essentially forecast wind speed by estimating the mapping $f(\cdot)$; the estimate $\hat{f}(\cdot)$ of $f(\cdot)$ can be called the point forecast of wind speed. According to (1), the wind speed $Y_t$ contains two components: $f(\cdot)$ is a deterministic component, and $\varepsilon_t$ is a random component, which is also known as noise. Statistical and engineering models are approximations to reality, so they always carry some degree of error. There is a large body of research on error tracking and control [24][25][26][27][28][29]. Precise prediction and error reduction are prerequisites for all further control work. Detailed statistical studies [30] show that $\varepsilon_t$ can be assumed to follow a normal distribution. We therefore have

$$\varepsilon_t \sim N(\mu, \sigma^2). \tag{2}$$

Because $f(\cdot)$ is a deterministic function, we should be able to approximate it with arbitrary accuracy by employing a powerful nonlinear machine learning technique (e.g., a neural network).
Most existing wind speed forecasting methods mainly focus on estimating $f(\cdot)$ and selecting its estimated value as the predicted wind speed. However, because of the uncertainty introduced by noise $\varepsilon_t$, errors will always exist in wind speed forecasts. Therefore, estimating $\mu$ and $\sigma^2$ is essential for estimating the uncertainty of $Y_t$. In models (1) and (2), parameters $\mu$ and $\sigma^2$ are assumed to be constant. In practice, the model parameters can usually be time-changing. We therefore introduce the following time-changing distribution model of wind speed:

$$\begin{aligned} Y_t &= f(Y_{t-1}, Y_{t-2}, \ldots, \vec{X}_t) + \varepsilon_t, \\ \varepsilon_t &\sim N(\mu_t, \sigma_t^2), \\ \mu_t &= g(\varepsilon_{t-1}, \varepsilon_{t-2}, \vec{X}_t), \\ \sigma_t^2 &= h(\varepsilon_{t-1}, \varepsilon_{t-2}, \vec{X}_t). \end{aligned} \tag{3}$$

Similar to $f(\cdot)$, mappings $g(\cdot)$ and $h(\cdot)$ can also be either linear or nonlinear. According to model (3), the uncertainty of wind speed is time-changing. The mean and variance of noise $\varepsilon_t$ are determined by the previous noises and the explanatory vector. Note that model (3) is a generalization of the traditional ARCH (AutoRegressive Conditional Heteroskedasticity) model: by setting $\mu_t \equiv 0$ and assuming $f(\cdot)$ and $h(\cdot)$ are linear functions, model (3) becomes identical to the ARCH model. To justify our model more strictly, the Lagrange multiplier (LM) test can be employed to verify that the wind speed has a time-changing distribution. In the case study, we will test whether the actual wind speed data of Australia have a time-changing distribution by performing the LM test.
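Based on statistical model (3) of wind speed, we can construct the prediction interval, which contains the true value of wind speed with any preassigned probability. The definition of the prediction interval can be given as follows: given a wind speed time series $\{Y_t\}$ generated by model (3), an $\alpha$-level prediction interval of $Y_t$ is a stochastic interval $[L_t, U_t]$ such that $P(Y_t \in [L_t, U_t]) = \alpha$.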
Because noise $\varepsilon_t$ is usually assumed to be normally distributed, the $\alpha$-level prediction interval $[L_t, U_t]$ can therefore be calculated as

$$L_t = f_t(\cdot) + \mu_t - z_{(1-\alpha)/2}\,\sigma_t, \tag{4}$$

$$U_t = f_t(\cdot) + \mu_t + z_{(1-\alpha)/2}\,\sigma_t, \tag{5}$$

where $f_t(\cdot)$ represents the value of the deterministic component $f(\cdot)$ at time $t$, $\alpha$ is the confidence level, and $z_{(1-\alpha)/2}$ is the critical value of the standard normal distribution. Based on (4) and (5), to calculate the prediction interval, we should first obtain three quantities: the wind speed forecast $f_t(\cdot)$, the mean $\mu_t$, and the variance $\sigma_t^2$ of the noise. In practice, traditional time-series models, such as ARIMA and GARCH, usually perform poorly on short-term wind speed forecasting since they are linear models and therefore cannot handle the complex nonlinear patterns of wind speed data. To give accurate wind speed forecasts, the three mappings $f(\cdot)$, $g(\cdot)$, and $h(\cdot)$ in model (3) should be accurately estimated with nonlinear techniques. In this paper, we introduce six different machine learning methods to estimate $f(\cdot)$, $g(\cdot)$, and $h(\cdot)$. To apply machine learning methods to estimate $g(\cdot)$ and $h(\cdot)$, an unsolved problem is how to obtain the estimates of the mean $\mu_t$ and variance $\sigma_t^2$ of the noise. In this paper, the moving window method is employed. Given the noise series $\varepsilon_t$, the estimates of $\mu_t$ and $\sigma_t^2$ are calculated as the sample mean and sample variance of the noise within a moving window (equations (6) and (7)). By combining a machine learning method with proposed model (3), the main procedure of wind power interval forecasting is given as follows:
(1) Given the historical wind speed data $Y_t$ and the explanatory vectors $\vec{X}_t$, employ a machine learning technique to estimate $f(\cdot)$.
(2) Calculate the forecasting error series $e_t$ as the difference between the observed wind speed $y_t$ and its forecast. Note that $e_t$ can be considered as the estimate of noise $\varepsilon_t$.
(3) Based on error series $e_t$, calculate the estimates of $\mu_t$ and $\sigma_t^2$ with equations (6) and (7).
(4) Based on error series $e_t$ and the mean and variance estimate series $\mu_t$ and $\sigma_t^2$, employ a machine learning technique to estimate the functions $\mu_t = g(e_{t-1}, e_{t-2}, \vec{X}_t)$ and $\sigma_t^2 = h(e_{t-1}, e_{t-2}, \vec{X}_t)$, and use them as the estimates of $g(\cdot)$ and $h(\cdot)$.
(5) To forecast the wind speed at $t$, first employ the estimates of $f(\cdot)$, $g(\cdot)$, and $h(\cdot)$ to calculate $f_t(\cdot)$, $\mu_t$, and $\sigma_t^2$; then, calculate the wind speed prediction interval with equations (4) and (5).
(6) Transform the wind speed prediction interval into the wind power prediction interval with the wind turbine power curve, which will be discussed in the following sections.
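The moving-window estimation and the interval construction in the procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the trailing window, and the 1/n variance are assumptions of the example, not the exact form of equations (6) and (7), and the learned mappings for the mean and variance are replaced here by the raw window estimates.

```python
from statistics import NormalDist

import numpy as np


def moving_window_stats(errors, window):
    """Trailing-window mean and variance of a forecasting error series.

    Entry t uses errors[t - window + 1 : t + 1]; the first window - 1
    entries are NaN. The trailing window and the 1/n variance are
    assumptions of this sketch.
    """
    errors = np.asarray(errors, dtype=float)
    mu = np.full(errors.shape, np.nan)
    var = np.full(errors.shape, np.nan)
    for t in range(window - 1, len(errors)):
        chunk = errors[t - window + 1 : t + 1]
        mu[t] = chunk.mean()
        var[t] = chunk.var()  # population variance (ddof=0)
    return mu, var


def prediction_interval(point, mu, var, alpha=0.95):
    """Alpha-level interval point + mu -/+ z * sigma under a normal error model."""
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.95.
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    sigma = np.sqrt(var)
    centre = np.asarray(point, dtype=float) + mu
    return centre - z * sigma, centre + z * sigma
```

For example, `prediction_interval(10.0, 0.0, 1.0, alpha=0.95)` gives an interval of roughly 10 ± 1.96 m/s around a 10 m/s point forecast with zero error mean and unit error variance.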

Machine Learning Methods for Wind Power Interval Forecasting
In this section, we first provide a brief introduction to machine learning, which is an important research area in forecasting. The six machine learning algorithms used in this paper are then presented. The power curve for converting wind speed into wind power is introduced. We finally discuss how to evaluate the performance of wind power interval forecasting methods.

Introduction to Machine Learning.
Machine learning is the science that studies how to use computers to simulate or realize human learning activities. It is one of the leading-edge research fields in artificial intelligence. Machine learning techniques are essential to the integration of renewable energy such as PV and wind power [31,32]. Machine learning can be divided into supervised learning and unsupervised learning [33][34][35]. As can be seen from Figure 1, supervised learning can be classified into classification and regression, and unsupervised learning can be classified into clustering and correlation.
Regression [36] is a process to estimate a functional mapping between a data vector and a target variable. Regression aims at determining a continuous target variable, which is usually called the dependent variable, while the components of the data item itself are usually called independent variables, explanatory variables, or predictors. For example, in wind speed forecasting, the predictors can be historical wind speed, temperature, and humidity, while the dependent variable is the future wind speed. Regression usually estimates the mapping based on a training dataset in which the dependent variables of all data items have been given. Regression is therefore a supervised learning problem in the sense that the estimation of the mapping is supervised by the training data. Regression is also an important research area of statistics. The most important statistical method is linear regression, which assumes that the dependent variable is determined by a linear function of the predictors. In recent years, the machine learning community has proposed many other regression methods, such as deep learning. In this paper, we will introduce six different machine learning regression techniques and integrate them with the proposed statistical model to perform wind power interval forecasting.

Linear Regression.
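Linear regression is a traditional and widely used statistical technique for regression. It is selected as the baseline technique in this paper and will be compared with five nonlinear techniques. Linear regression models the relationship between the dependent variable $y_i$ and the vector of predictors $x_i$. It assumes that the dependent variable $y_i$ depends linearly on the predictors $x_i$, plus a noise term $\varepsilon_i$. The model can be written as

$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = x_i^{\prime}\beta + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $x_i^{\prime}\beta$ is the inner product between vectors $x_i$ and $\beta$. These $n$ equations can be written in vector form as

$$y = X\beta + \varepsilon,$$

where $y = (y_1, \ldots, y_n)^{\prime}$, $X$ is the $n \times p$ matrix whose $i$th row is $x_i^{\prime}$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^{\prime}$. Each $\varepsilon_i$ is usually assumed to follow a normal distribution with zero mean and variance $\sigma^2$. We therefore have

$$y_i \mid x_i \sim N(x_i^{\prime}\beta, \sigma^2),$$

and $\beta$ is a $p$-dimensional parameter vector, which specifies how much each component of $X$ contributes to the output $y$ [37].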
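As a minimal illustration of the baseline, the parameter vector of a linear model can be estimated by ordinary least squares with NumPy. The function names are hypothetical, and the explicit intercept column is an assumption of this sketch (it corresponds to adding a constant predictor to each $x_i$).

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares fit; X is (n, p), y is (n,).

    Returns the coefficients with the intercept first. The intercept
    column is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_linear(beta, X):
    """Evaluate the fitted linear model on new predictor rows."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

In a forecasting setting, each row of `X` would hold lagged wind speeds and the explanatory variables, and `y` the wind speed one step ahead.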

Multilayer Perceptron Network.
A multilayer perceptron (MLP) network is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. Extending the standard linear perceptron, an MLP uses three or more layers of nodes with nonlinear activation functions. An MLP network consists of a set of source nodes as the input layer, one or more hidden layers of computation nodes, and an output layer of nodes. Figure 2 shows the signal flow process of a feedforward neural network. An MLP network has two stages: a forward pass and a backward pass. The forward pass consists of presenting a sample input to the network and letting activations flow until they reach the output layer [38,39]. The backward pass propagates the output error back through the network to update the weights.

Long Short-Term Memory (LSTM) Network-Based Deep Learning.
Deep learning is a branch of machine learning. It is essentially a special artificial neural network. Deep learning utilizes a multilayer network structure and applies appropriate nonlinear transformation functions to each hidden node to achieve high-level abstraction of the data. The traditional feedforward artificial neural network usually contains only one hidden layer, whereas deep learning uses many hidden layers. Deep learning therefore adopts a training mechanism different from that of the traditional artificial neural network in order to solve the problems of training a deep neural network [40]. LSTM is a recurrent neural network that handles the exploding and vanishing gradient problems more effectively than the traditional recurrent neural network. LSTM is composed of a set of recurrent subnets called memory blocks. Each memory block is composed of the input gate, forget gate, and output gate. Figure 3 shows the LSTM structure [41].
In general, the LSTM recurrent neural network is composed of the following components: the input gate $i_t$ with the corresponding weights $w_{xi}$, $w_{hi}$, $w_{ci}$, and $b_i$; the forget gate $f_t$ with the corresponding weights $w_{hf}$, $w_{cf}$, and $b_f$; and the output gate $o_t$ with the corresponding weights $w_{xo}$, $w_{ho}$, $w_{co}$, and $b_o$. The function of the input gate is to selectively record new information in the cell state. The function of the forget gate is to selectively forget the state information in the cell. The function of the output gate is to export certain information from the cell. The detailed workflow of LSTM is given in [42]; in it, $\sigma$ is the logistic sigmoid function with output in $[0, 1]$, and $\tanh$ is the hyperbolic tangent function with output in $[-1, 1]$.
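To make the gate mechanics concrete, the following NumPy sketch performs one time step of a standard LSTM cell with peephole connections (the weights that link the cell state to the gates). It is an assumption that this standard formulation matches the workflow of [42]; the parameter names, the input weight of the forget gate, the candidate-state weights, and the elementwise (diagonal) peephole weights are choices of this sketch.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a peephole LSTM cell (standard form, assumed here).

    x: input vector, h_prev: previous hidden state, c_prev: previous cell state.
    p: dict of weights; W_* are matrices, w_c* are peephole vectors, b_* biases.
    """
    # Input gate: how much new information enters the cell state.
    i = sigmoid(p["W_xi"] @ x + p["W_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])
    # Forget gate: how much of the old cell state is kept.
    f = sigmoid(p["W_xf"] @ x + p["W_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])
    # New cell state: retained memory plus gated candidate.
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x + p["W_hc"] @ h_prev + p["b_c"])
    # Output gate: how much of the cell state is exposed.
    o = sigmoid(p["W_xo"] @ x + p["W_ho"] @ h_prev + p["w_co"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c


def zero_params(n_in, n_hidden):
    """All-zero parameters, useful only for checking shapes and gate arithmetic."""
    p = {}
    for gate in ("i", "f", "c", "o"):
        p[f"W_x{gate}"] = np.zeros((n_hidden, n_in))
        p[f"W_h{gate}"] = np.zeros((n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    for gate in ("i", "f", "o"):
        p[f"w_c{gate}"] = np.zeros(n_hidden)
    return p
```

With all-zero weights every gate equals 0.5, so the new cell state is half of the previous one; this makes the arithmetic easy to verify by hand before real weights are trained.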

Lazy IBK.
Lazy IBK is one of the widely used lazy learning methods. Lazy learning methods defer the decision of how to assign the dependent variable until a new query explanatory vector is presented. When the query explanatory vector is received, a set of similar data records is retrieved from the available training dataset and is used to assign the dependent variable to the new instance [43]. In order to choose the similar data records, lazy methods employ a distance measure that gives nearby data records higher relevance. Lazy methods choose the k data records that are nearest to the query instance. The dependent variable of the new instance is determined based on the k-nearest instances.
Lazy learning algorithms have three basic steps: (i) Defer: lazy learning algorithms store all training data and defer processing until a new query is given. (ii) Reply: a local learning approach developed by Bottou and Vapnik in 1992 is a popular method to determine the dependent variables for new queries [44]. In the Bottou and Vapnik learning approach, instances are defined as points in the space, and a similarity function is defined on all pairs of these instances. (iii) Flush: after solving a query, the answer and any intermediate results are discarded.
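The k-nearest-instance idea described above can be illustrated with a short NumPy sketch. The function name, the Euclidean distance, and the unweighted average over the k neighbours are assumptions of the example; a lazy learner may also weight neighbours by distance.

```python
import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Predict the dependent variable as the mean over the k nearest records.

    Nothing is fitted in advance: all work happens when the query arrives,
    which is the "defer" step of lazy learning.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from the query to every stored training record.
    dists = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(y_train[nearest].mean())
```

Because the explanatory variables (wind speed, direction, humidity, temperature) have different scales, they would normally be standardized before distances are computed.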

Regression Tree.
A regression tree is one of the widely used decision tree algorithms. A decision tree is a data-mining tool designed to extract useful information from large datasets and use the information to help decision-making processes. A regression tree consists of a set of nodes that can assign the value of the dependent variable to an explanatory vector. A regression tree constructs a tree-style decision rule set and divides the training data into the leaf nodes of the decision tree according to the numerical or categorical values of the explanatory variables. The regression rules of each leaf node are derived from a mathematical process that minimizes the regression errors of the leaf nodes [45].

Decision Table.
Similar to the regression tree, the decision table also determines the value of the dependent variable with a set of decision rules [46]. However, the decision table arranges decision rules as a table, rather than a tree. A decision table usually consists of a number of parallel decision rules. Similar to the regression tree, the training data will be divided into several groups, each of which will be represented by a decision rule. For a given explanatory vector (input), an appropriate decision rule will be first selected based on the values of its explanatory variables. The dependent variable for this input will be assigned as the average of the dependent variables of all training data vectors in the corresponding group. The dependent variable can also be determined by performing linear regression on the corresponding group of training data. Empirical studies show that the decision table has a similar performance to regression trees.

Converting Wind Speed to Wind Power.
An elementary method is used in this paper to convert the predicted wind speed to the predicted wind power output of a wind turbine or wind farm. The predicted wind speed is provided by one of the six machine learning regression methods discussed above. The wind speed is then input into the certified wind turbine power curve and transformed into the wind power. The Vestas V90-3.0 MW wind turbine is selected for the case studies in this paper. Vestas V90-3.0 MW is a pitch-regulated upwind wind turbine with active yaw and a three-blade rotor. It has a rotor diameter of 90 m with a generator rated at 3.0 MW. Vestas V90-3.0 MW is widely used in Australian wind power plants and has a proven high efficiency. The typical power curve of Vestas V90-3.0 MW, 60 Hz, 106.7 dB(A) is shown in Figure 4. It can be clearly observed that the wind power output $p(u)$ is proportional to $u^3$ for small wind speeds $u$. Moreover, the power curve is steep for medium wind speeds and flat for large wind speeds. The cut-in speed is 3.5 m/s, and the cut-out speed is 25 m/s [47].
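The conversion step can be sketched with an idealized power curve. The curve below is not the certified Vestas curve of Figure 4: only the cut-in speed, cut-out speed, and rated power come from the text, while the cubic ramp and the rated wind speed of 15 m/s are assumptions made for illustration. The interval helper scans a grid of speeds rather than mapping only the endpoints, because the drop to zero at cut-out makes the curve non-monotone.

```python
import numpy as np

CUT_IN = 3.5        # m/s, from the text
CUT_OUT = 25.0      # m/s, from the text
RATED_POWER = 3.0   # MW, from the text
RATED_SPEED = 15.0  # m/s, assumed for this sketch


def power_curve(u):
    """Idealized power output in MW for wind speed u in m/s (illustrative only)."""
    u = np.asarray(u, dtype=float)
    # Cubic ramp between cut-in and the assumed rated speed.
    ramp = RATED_POWER * (u**3 - CUT_IN**3) / (RATED_SPEED**3 - CUT_IN**3)
    p = np.where(u < RATED_SPEED, ramp, RATED_POWER)
    # No output below cut-in or above cut-out.
    return np.where((u < CUT_IN) | (u > CUT_OUT), 0.0, p)


def power_interval(speed_lo, speed_hi, n_grid=201):
    """Map a wind speed interval to a wind power interval via a grid scan."""
    grid = np.linspace(speed_lo, speed_hi, n_grid)
    p = power_curve(grid)
    return float(p.min()), float(p.max())
```

In practice the tabulated manufacturer curve would replace `power_curve`, for instance through interpolation of the published speed and power pairs.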

Performance Evaluation.
Before presenting the case study results, several criteria are introduced for performance evaluation. Given $T$ historical wind power values $p_t$, $1 \le t \le T$, which are converted from $T$ historical wind speed observations, and the corresponding forecasted power values $p_t^{*}$, $1 \le t \le T$, the mean absolute percentage error (MAPE) is defined as

$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{p_t - p_t^{*}}{p_t}\right| \times 100\%.$$

MAPE is a widely used criterion for time-series forecasting. It will also be employed to evaluate the proposed method in the case studies.
Another two criteria are presented to evaluate interval forecasting. Given $T$ wind power values $p_t$, $1 \le t \le T$, of a time series and the corresponding forecasted $\alpha$-level prediction intervals $[l_t, u_t]$, $1 \le t \le T$, the empirical confidence $\hat{\alpha}$ [48] and the absolute coverage error (ACE) are defined as

$$\hat{\alpha} = \frac{\operatorname{frequency}(p_t \in [l_t, u_t])}{T}, \qquad \mathrm{ACE} = |\alpha - \hat{\alpha}|,$$

where $\hat{\alpha}$ is the number of observations that fall into the forecasted prediction interval (PI), divided by the sample size. It should be as close to $\alpha$ as possible.
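The three criteria are straightforward to compute. The sketch below uses hypothetical function names and treats the interval bounds as inclusive, which is an assumption of the example.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


def empirical_coverage(actual, lower, upper):
    """Share of observations that fall inside their prediction interval."""
    actual = np.asarray(actual, dtype=float)
    inside = (actual >= np.asarray(lower)) & (actual <= np.asarray(upper))
    return float(np.mean(inside))


def absolute_coverage_error(actual, lower, upper, alpha):
    """Absolute gap between the nominal level alpha and the empirical coverage."""
    return abs(alpha - empirical_coverage(actual, lower, upper))
```

Note that MAPE is undefined whenever an actual value is zero, which happens for wind power at speeds below cut-in; such points need to be excluded or handled separately before the criterion is applied.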

The Setting of Case Studies.
In the experiments, the wind power forecasting model has been evaluated using the wind speed data from the Devonport Airport Wind Station, Tasmania, Australia. The data were provided by the Australian Bureau of Meteorology. The training and testing data have the following four numerical features: wind speed, wind direction, humidity, and temperature. The training data are from 1st February 2018 to 1st March 2018, while the testing data are from 1st February 2019 to 1st March 2019. To empirically support the validity of our model, we first verify that the wind speed data exhibit the time-changing distribution effect by performing the Lagrange multiplier test [49,50]. The results of the LM test at the 0.05 significance level on the data from 1st February 2019 to 1st March 2019 are given in Table 1.
As illustrated in Table 1, with the significance level set to 0.05, the P value of the LM test is zero in all six cases. Moreover, the LM statistics are significantly greater than the critical value of the LM test in all cases. These two facts strongly indicate that the wind speed data have a strong time-changing distribution effect. In the test, an order of 10 means that the variance $\sigma_t^2$ is correlated with its lagged values up to at least $\sigma_{t-10}^2$. In other words, the wind speed at 10 time units before time $t$ can still influence the uncertainty of the wind speed at time $t$.
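For readers who wish to reproduce this kind of check, the following sketch computes an LM statistic under the assumption that the test is Engle's ARCH LM test: the squared residuals are regressed on a constant and their own lags, and the statistic is the number of regression observations times the resulting R². Under the null hypothesis of constant variance, this statistic is asymptotically chi-square distributed with `order` degrees of freedom. The function name is hypothetical, and the exact test variant used for Table 1 is not stated in the text.

```python
import numpy as np


def arch_lm_statistic(residuals, order):
    """Engle-style ARCH LM statistic n * R^2 for the given lag order.

    Assumes the squared residuals are not constant, otherwise R^2 is undefined.
    """
    e2 = np.asarray(residuals, dtype=float) ** 2
    n = len(e2) - order
    y = e2[order:]
    # Column j holds the squared residuals lagged by j steps.
    lags = [e2[order - j : len(e2) - j] for j in range(1, order + 1)]
    design = np.column_stack([np.ones(n)] + lags)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(n * r2)
```

A statistic far above the chi-square critical value for the chosen order (for example, about 18.3 for order 10 at the 0.05 level) would indicate a time-changing variance.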

Results of Wind Speed Forecasting.
Wind speed forecasting is the first step of wind power forecasting. Six regression methods are first employed to perform one-hour-ahead wind speed forecasting in this paper. The performances of the six algorithms are shown in Table 2.
As illustrated in Table 2, the MAPEs of LSTM and lazy IBK are smaller than those of the other methods. Moreover, the MAPE of LSTM is under 10%, which is sufficiently good considering the very high volatility of wind speed. The results indicate that these two nonlinear machine learning regression methods perform well in wind speed forecasting. The forecasting errors of three methods are shown graphically in Figure 5. In Figure 5, visual inspection suggests that the forecasting errors of the three algorithms have a normal distribution. It is very important to know the type of the error distribution to ensure that the proposed statistical model rests on a valid assumption. To empirically check that the wind speed forecasting errors are normally distributed, the forecasting errors of all six methods are checked for normality by performing the Kolmogorov-Smirnov normality test. The test results also show that all six forecasting methods have normally distributed errors. These results again support the validity of the assumptions of our model.

Results of Wind Power Interval Forecasting.
The wind speed forecasts given by the six machine learning regression algorithms are then converted into wind power forecasts as discussed in Section 3. As before, the mean absolute percentage error (MAPE) is used to evaluate the performances of the different methods. From Table 3, it is observed that, for wind power forecasting, the MAPE of LSTM is still lower than those of the other five algorithms.
Based on Tables 2 and 3, the LSTM method is selected as the wind speed point forecasting method (the estimator of $f(\cdot)$). The procedure discussed in Section 2 is then employed to give the prediction intervals of wind power. We employ all six regression methods to estimate $g(\cdot)$ and $h(\cdot)$ and then compare their performances in wind power interval forecasting. Table 4 presents the ACEs of the different regression methods for the 95% and 99% confidence levels. As seen in Table 4, the ACEs of the five nonlinear methods are similar regardless of the confidence level. Moreover, all five nonlinear regression algorithms outperform linear regression. This is clear evidence that strong nonlinearity exists in the wind power data. The 95% level and 99% level prediction intervals given by the different methods are illustrated in Figures 6 and 7. As illustrated, the prediction intervals given by all five nonlinear machine learning algorithms perfectly contain the true values of wind power. These results clearly support the effectiveness of the proposed statistical model. Moreover, the results also show that nonlinear machine learning regression methods are suitable candidates for wind power interval forecasting. Compared with the other machine learning methods, LSTM performed best in wind power interval forecasting. LSTM is a deep learning neural network algorithm. Deepening the structure of a deep learning neural network strengthens the information abstraction ability of the model, and therefore its ability to extract and learn complex information from large amounts of data. The accuracy of wind power interval forecasting improves accordingly. The multilayer perceptron (MLP) can be categorized as a feedforward neural network.
In a traditional feedforward neural network such as MLP, the input layer, the hidden layer, and the output layer in the network are fully connected, but the nodes within each layer are disconnected. This structure leaves the traditional feedforward neural network unable to deal with the problem of correlation between inputs. Compared with the feedforward neural network, the recurrent neural network introduces directed cycles. The nodes within the hidden layer are then no longer disconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. As a result, LSTM can perform better than MLP.

Conclusion
This research work develops a novel, comprehensive, integrated statistical machine learning strategy for wind power forecasting in an Australian wind farm, which includes exploring the statistical characteristics of the data with statistical tools and developing the forecasting model with different statistical machine learning methods. Accurate wind power interval forecasting is essential for efficient planning and operation of power systems. Wind energy is characterised by its nonlinearity and intermittency, which pose significant challenges for wind power forecasting. Traditional linear time-series models cannot appropriately handle these challenges and therefore cannot achieve satisfactory performance. In this paper, we propose a machine learning-based statistical approach, which can handle nonlinear time series with time-changing distributions and is thus suitable for wind power interval forecasting.
Compared with other relevant references, this research work shows that classical regression techniques are not suitable for complicated applications such as wind power interval forecasting. It is inappropriate to simply use linear assumptions for these problems. In addition, other research works that use only complicated machine learning approaches fail to balance the important information in the historical data. Experimental results show that LSTM is the most suitable candidate for wind power forecasting. Moreover, the effectiveness and accuracy of the proposed model in wind power interval forecasting are also demonstrated by the case studies.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.