Comparison of Weibull Estimation Methods for Diverse Winds

Wind farm siting relies on in situ measurements and statistical analysis of the wind distribution. The current statistical methods include distribution functions. The one that is known to provide the best fit to the nature of the wind is the Weibull distribution function. It is relatively straightforward to parameterize wind resources with the Weibull function if the distribution fits what the function represents, but the estimation process gets complicated if the distribution of the wind is diverse in terms of speed and direction. In this study, data from a 101 m meteorological mast were used to test several estimation methods. The available data display seasonal variations, with low wind speeds in different seasons and effects of moderately complex surroundings. The results show that the maximum likelihood method is much more successful than the industry-standard WAsP method when diverse winds with a high percentage of low wind speeds occur.


Introduction
One of the first and most crucial steps for wind farm investment is to investigate the characteristics of the local wind resources. The aim is to select the optimum locations for the turbines, maximizing income and minimizing cost. The analysis most commonly relies on in situ measurements collected through conventional meteorological masts (met. masts) [1,2]. Recently, remote sensing devices with vertical wind profiling capabilities have also been accepted by the market, but these devices have limitations, mostly related to the complexity of the terrain [3][4][5][6]. Therefore, spatial modeling based on point measurements is practically the only accepted methodology for the wind site assessment step. A conventional met. mast is usually erected at a location that is considered representative of the area of interest. More than one met. mast may be necessary if the wind characteristics are particularly variable in an area. Cup anemometers, vanes, sonics, and atmospheric sensors are used as measurement devices. It is important that data collection has a high recovery rate in order to accurately capture the wind resources, and a rate of 90% is ideally preferable. The final aim is to use the collected data for creating sector-wise and generalized wind statistics for the measurement location. The generalized wind statistics can be used to create regional wind climates, also known as wind atlases. Since the atlas represents the larger domain, one can statistically transfer the in situ measurements to the desired wind turbine locations. After several iterations of the process, the best production numbers are calculated, which will maximize the income of the wind farm. For many years, wind data have been represented by the Rayleigh distribution function [7,8].
Although the two are similar and come from the same statistical methodology, for about three decades the Weibull distribution function has been the choice of experts, mainly because it is considered to be more representative of the natural wind characteristics [9]. One of the first studies that used the Weibull function as the main statistical methodology was the European Wind Atlas [10], which led to the industry-standard Wind Atlas Methodology. The Weibull distribution function (equation (1)) can be presented as a cumulative distribution function (CDF) (equation (2)) or as a probability density function (PDF) (equation (3)), where U is the wind speed, A is the scale parameter of the wind speed (m/s), and k is the unitless shape parameter [11].
The common way of using the distribution function is to fit measured wind speed frequency data to the probability density function to get the relationship between the wind speed and the wind frequency with the two-parameter Weibull distribution.
Thereafter, one can define the observational climate of the location by recording only two characteristics, A and k.
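As a minimal sketch of the two-parameter Weibull functions referred to above, the following assumes the standard textbook forms F(U) = 1 - exp(-(U/A)^k) for the CDF and f(U) = (k/A)(U/A)^(k-1) exp(-(U/A)^k) for the PDF. The function names and the values A = 8 m/s and k = 2 are illustrative assumptions, not values from the study.

```python
import math


def weibull_cdf(u, A, k):
    """Cumulative probability that the wind speed is at or below u (u >= 0)."""
    return 1.0 - math.exp(-((u / A) ** k))


def weibull_pdf(u, A, k):
    """Probability density at wind speed u (u > 0 assumed, so k < 1 is safe)."""
    return (k / A) * (u / A) ** (k - 1) * math.exp(-((u / A) ** k))


# Illustrative parameters only: A in m/s, k unitless.
A_demo, k_demo = 8.0, 2.0
for u in (2.0, 6.0, 10.0):
    print(u, weibull_pdf(u, A_demo, k_demo), weibull_cdf(u, A_demo, k_demo))
```

With these two functions, a whole observed wind climate at one height is summarized by the pair (A, k), which is what the estimation methods compared in this study try to recover.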
Although the method looks straightforward, there are known problems. The most important one is diverse wind conditions occurring with high frequency during the year, which add extra uncertainty to the fitted parameters; this is the case for all methods in use [12]. In this study, a dataset with a high recovery rate from a semicomplex terrain at 101 m height was used to compare different estimation methods of the Weibull parameters. The aim was to perform a comparison between the most widely used methods at a height representative of common wind turbines.
In the next section, studies on distribution functions for wind energy use, and specifically the Weibull distribution, are discussed, followed by the experimental setup and a presentation of the available wind data used with five different estimation methods. Finally, the results are compared and discussed.

Experimental Setup and General Wind Characteristics
Izmir Institute of Technology (IZTECH) is located on the west coast of Turkey, around 40 km to the west of the city of Izmir at the village of Urla. The peninsula surrounding the area has more than 20% of the total installed wind power capacity of Turkey. In August 2017, a mast with 101 m height was erected on the university campus with several instruments mounted (Table 1) at the geographic coordinates 38.3332° N, 26.6326° E. The purpose of the new met. mast is to support academic studies on wind speed distributions, atmospheric stability, and short-term predictions in complex terrain. Therefore, the mast is also equipped with two 3D sonic anemometers and measurement devices at different heights. Data collection started on the 1st of August 2017 and the campaign is ongoing. The best recovery rate, over 99.9% for a full year's data, is available between the 21st of December 2017 and the 21st of December 2018, totaling 52191 10-minute samples. The available data were filtered with two rules: (i) the standard deviation of the channels must be greater than zero and (ii) wind speeds below a calm threshold level of 0.3 m/s for the cup and 0.4 m/s for the vane are ignored. After filtering, the final recovery rate for the channels dropped to 99%. The data can be considered to be of high quality in terms of availability. The air density was measured to be 1.19 kg/m³ on average and changes by only ±0.01 kg/m³ throughout the day in the monthly and yearly averages, but it can sometimes reach 1.01 kg/m³ in an occasional 10-minute sample. The location of the met. mast was chosen for research purposes (Figure 1), such that each wind sector has a different combination of terrain and roughness classifications. The main wind direction is the first sector, S1, where the northerly wind occurs more than 45% of the time. S1 is located 5 km away from the coastline and has a flat terrain at 50 m a.s.l., covered with grass. S2 is only 1 km away from the coastline and is under the influence of the sudden roughness change.
S3, S4, and S5 are occupied by a small village with narrow streets, where the tallest houses are approximately 6.5 m high. The village is built parallel to the coastline and the sea is nearly 1.25 km from the met. mast at its closest location. S6 and S7 are occupied by the IZTECH campus, with the buildings closest to the mast being 4 m high at maximum, while further away (> 12 m distance), there are taller university buildings but more sparsely located. A hill parallel to the coastline with an average height of 350 m a.s.l. covers the grounds of S8-S12. The vegetation of these sectors is characterized by low bushes (max. 30 cm) mixed with grassland. The only unique sector within this range is sector S9, where five 3 MW wind turbines are located more than 5 km away from the mast. Due to the long distance, it is assumed by the author that the met. mast is not under the influence of these turbines (Table 2).
Vertical wind profiles were calculated with all the data, and the yearly statistics show an almost logarithmic vertical wind speed profile except for a minor speed-up at 30 m height (Figure 2(a)). The wind directional turn shows minor terrain effects above 50 m, where the averaged wind direction becomes nearly stable (Figure 2(b)). All five wind speed channels were also analyzed for diurnal statistics, which demonstrate that the measurement location is characterized by an almost laminar flow at night and quite unstable conditions during daytime (Figure 3). Based on the yearly averaged wind speed values, the omnidirectional wind shear was calculated with the power law function (equation (4)) as 0.18, which is within the safety limits of 0.0 and 0.2 suggested by the IEC 61400-1 standard. The wind shear of the dominant northerly wind direction is even lower, at 0.09. The wind sectors from the urban areas, S3 to S5, display wind shear and turbulence values above the acceptable ranges; it must be noted, however, that few data points were available for these sectors, hindering the wind shear calculations, as discussed later in the manuscript.
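The power law shear calculation mentioned above can be sketched as follows, assuming equation (4) has the usual form U2/U1 = (z2/z1)^α. The function names and the example speed of 6.0 m/s at 30 m are hypothetical; only the heights 30 m and 101 m and the exponent 0.18 come from the text.

```python
import math


def shear_exponent(u1, z1, u2, z2):
    """Power law exponent alpha from mean speeds u1, u2 at heights z1, z2."""
    return math.log(u2 / u1) / math.log(z2 / z1)


def extrapolate_speed(u1, z1, z2, alpha):
    """Mean speed at height z2 predicted from speed u1 at z1 with exponent alpha."""
    return u1 * (z2 / z1) ** alpha


# Hypothetical lower-level mean speed; alpha = 0.18 is the reported omnidirectional value.
u_30 = 6.0
u_101 = extrapolate_speed(u_30, 30.0, 101.0, 0.18)
print(u_101, shear_exponent(u_30, 30.0, u_101, 101.0))
```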
Monthly statistics of the top-mounted anemometer, WS101, show high wind speeds during the fall and winter periods, while the averaged wind speeds get lower in the summer and spring, as expected due to the local weather conditions (Table 3). Wind data were split into 12 equal wind direction sectors, based on the vane measurements at 98 m coupled with the top-mounted cup anemometer at 101 m. The resulting directional wind speed statistics show the highest energy density at sector 1 and sector 7, which are centered at 0° and 180°, respectively (Table 4).
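The split into 12 equal sectors can be illustrated with a short sketch. It assumes that sector 1 spans 345° to 15° (centered at 0°, as stated above) and that a direction exactly on a boundary belongs to the higher-numbered sector; the boundary convention and the function name are assumptions.

```python
def sector_of(direction_deg, n_sectors=12):
    """Return the 1-based sector index for a wind direction in degrees.

    Sector 1 is centered at 0 degrees, so the direction is shifted by half
    a sector width before the integer division.
    """
    width = 360.0 / n_sectors
    return int(((direction_deg + width / 2.0) % 360.0) // width) + 1


# Count how many (hypothetical) vane readings fall in each sector.
readings = [2.0, 358.0, 181.5, 95.0, 14.0, 200.0]
counts = {}
for d in readings:
    s = sector_of(d)
    counts[s] = counts.get(s, 0) + 1
print(counts)
```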

Distribution Analysis
In the last decade, several other distribution functions have been proposed, among which the most commonly used in the literature seem to be the following: Rayleigh, Weibull, Lognormal, Gamma, Pearson, Kappa, Erlang, and Gumbel, or bimodal combinations of these [13][14][15][16][17].
Among the referenced studies, the general conclusion is that the two- or three-parameter Weibull distribution is the best distribution function to describe the wind characteristics [11,18]. This is also reflected in industry applications, where any major wind energy or wind field modeling tool employs the two-parameter Weibull function (e.g., WAsP and WindPRO).
In the current study, the selected dataset is also analyzed using the sector-wise Weibull distribution function.
The two-parameter Weibull function is applied to each subset of the data split by wind direction sector; each sector covers 30°. The relationship between the mean wind speed U and the Weibull parameters, similar to equation (1), is given in equation (5), where A is the scale parameter and k is the shape parameter as before, so the mean value of the wind speed can be calculated. The power density, P_d, can also be computed from the third moment of the distribution (see equation (6)). Gamma (Γ) is the Euler Gamma function (equation (7)).
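The moment relations described above can be sketched as below, assuming the usual forms: mean speed = A Γ(1 + 1/k) and P_d = 0.5 ρ A³ Γ(1 + 3/k). Whether equation (6) in the source includes the factor 0.5 ρ is an assumption here; the default air density of 1.19 kg/m³ is the site average reported earlier, and the function names are illustrative.

```python
import math


def weibull_mean_speed(A, k):
    """Mean wind speed (m/s) implied by scale A and shape k."""
    return A * math.gamma(1.0 + 1.0 / k)


def weibull_power_density(A, k, rho=1.19):
    """Mean power density (W/m^2) from the third moment of the distribution."""
    return 0.5 * rho * A ** 3 * math.gamma(1.0 + 3.0 / k)


# Illustrative parameters only.
print(weibull_mean_speed(8.0, 2.0), weibull_power_density(8.0, 2.0))
```

Because the power density depends on A cubed, a small error in the fitted scale parameter is amplified roughly threefold in the energy estimate, which is why the fits are later judged on power density as well as on curve-fit statistics.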
Several different methods have been studied extensively in the past by the wind energy community. A review of the literature shows a great variety of formulations for estimating the Weibull distribution parameters (Table 5) [34,35], but see [36]. In the current study, a selection of methods from the literature was used for the estimation of the Weibull parameters. Methods were eliminated on the basis of similarity. Between the similar methods EM and PDM, and between GM and LSM, the EM and the LSM were chosen, respectively. The moment methodologies have similarities with WAsP; therefore, they were left out. Five methods were shortlisted for the study: EM, MLM, MMLM, LSM, and WAsP.

Empirical Method (EM).
One of the most well-known and simple methods for estimating the Weibull parameters is the Empirical Method (EM), derived from the power density function [25,27,29,30,32]. The energy factor, E_f, can be calculated as the ratio between the mean of the cubes of all wind speeds and the cube of the mean wind speed (equation (8)). After E_f is calculated, one can easily use a derived numerical solution for k and A (equation (9)). The simplicity of the equation makes it very useful for initial calculations, but it has also been observed that the fit is not as good as that of other methods when there is low wind speed and high turbulence, in which case the observed distribution does not follow a smooth Weibull PDF curve. Nevertheless, it was chosen for this study as a reference formulation in order to explore the possible differences with more advanced methods.
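A minimal sketch of this procedure follows. It assumes that equation (9) takes the commonly cited closed form k = 1 + 3.69 / E_f² with A = mean(U) / Γ(1 + 1/k); that form is an assumption about the source equation, and the function names and sample values are illustrative.

```python
import math


def energy_factor(speeds):
    """Ratio of the mean of cubed speeds to the cube of the mean speed."""
    n = len(speeds)
    mean_u = sum(speeds) / n
    mean_u3 = sum(u ** 3 for u in speeds) / n
    return mean_u3 / mean_u ** 3


def fit_empirical(speeds):
    """Return (k, A) from the energy factor; the closed form is assumed."""
    ef = energy_factor(speeds)
    k = 1.0 + 3.69 / ef ** 2
    A = (sum(speeds) / len(speeds)) / math.gamma(1.0 + 1.0 / k)
    return k, A


# Hypothetical 10-minute mean speeds in m/s.
print(fit_empirical([3.1, 4.8, 6.2, 7.9, 5.5, 9.4, 2.7, 6.8]))
```

No iteration is needed, which is what makes the method attractive for a first estimate.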

Maximum Likelihood Method (MLM).
Another well-known and widely used method is the Maximum Likelihood Method (MLM) [20][21][22][23][24][25][26][27][28][29]. In MLM, the k parameter is found by iterating equation (10) over n samples, starting from an initial shape parameter of k = 2. After finding k within the desired limits with equation (11), the scale parameter A can be computed with equation (12).
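The iteration can be sketched as follows, assuming equation (10) is the standard maximum likelihood condition k = [Σ U^k ln U / Σ U^k − (1/n) Σ ln U]^(-1) and equation (12) is A = [(1/n) Σ U^k]^(1/k). The averaging of the old and new k in each step is a damping choice added here for stability, not something stated in the source; the function name and tolerances are also assumptions.

```python
import math


def fit_mlm(speeds, k0=2.0, tol=1e-8, max_iter=500):
    """Return (k, A) by damped fixed-point iteration of the ML condition."""
    if any(u <= 0.0 for u in speeds):
        raise ValueError("all wind speeds must be strictly positive")
    n = len(speeds)
    logs = [math.log(u) for u in speeds]
    mean_log = sum(logs) / n
    k = k0
    for _ in range(max_iter):
        powers = [u ** k for u in speeds]
        weighted_log = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        k_new = 1.0 / (weighted_log - mean_log)
        k_next = 0.5 * (k + k_new)  # damping step (assumption, see lead-in)
        converged = abs(k_next - k) < tol
        k = k_next
        if converged:
            break
    A = (sum(u ** k for u in speeds) / n) ** (1.0 / k)
    return k, A


# Hypothetical 10-minute mean speeds in m/s.
print(fit_mlm([3.1, 4.8, 6.2, 7.9, 5.5, 9.4, 2.7, 6.8]))
```

The calm threshold filter described in the experimental setup guarantees strictly positive speeds, which the logarithm requires.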

Modified Maximum Likelihood Method (MMLM).
There is an alternative version of MLM, which has a modification on the wind frequencies [20,21,26,27,30,32]. The method is mostly preferred when there is a large amount of missing data. All wind frequency values are weighted based on the available data; therefore, the method is called modified. Equation (11) is rewritten with the addition of the frequency, knowing that all the samples are above 0 m/s (equation (13)). The same iteration as in MLM is applied to calculate the modified scale parameter A_m as in equation (14).
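A frequency-weighted variant can be sketched as below. It assumes the form commonly given for the modified method, k = [Σ U_i^k ln U_i f_i / Σ U_i^k f_i − Σ ln U_i f_i / f(U ≥ 0)]^(-1) and A_m = [Σ U_i^k f_i / f(U ≥ 0)]^(1/k), where U_i are bin centers, f_i are bin frequencies, and f(U ≥ 0) is taken as the sum of the frequencies. These forms, the damping step, and the names are assumptions rather than the source equations (13) and (14).

```python
import math


def fit_mmlm(bin_speeds, freqs, k0=2.0, tol=1e-8, max_iter=500):
    """Return (k, A_m) from binned speeds and their frequencies."""
    pairs = [(u, f) for u, f in zip(bin_speeds, freqs) if f > 0.0]
    if any(u <= 0.0 for u, _ in pairs):
        raise ValueError("bin speeds with nonzero frequency must be positive")
    total = sum(f for _, f in pairs)  # plays the role of f(U >= 0)
    mean_log = sum(f * math.log(u) for u, f in pairs) / total
    k = k0
    for _ in range(max_iter):
        s_pow = sum(f * u ** k for u, f in pairs)
        s_pow_log = sum(f * u ** k * math.log(u) for u, f in pairs)
        k_new = 1.0 / (s_pow_log / s_pow - mean_log)
        k_next = 0.5 * (k + k_new)  # damping step (assumption)
        converged = abs(k_next - k) < tol
        k = k_next
        if converged:
            break
    A_m = (sum(f * u ** k for u, f in pairs) / total) ** (1.0 / k)
    return k, A_m


# Hypothetical histogram: 0.5 m/s wide bins given by their centers and counts.
centers = [0.25 + 0.5 * i for i in range(12)]
counts = [2, 5, 9, 14, 17, 16, 13, 10, 7, 4, 2, 1]
print(fit_mmlm(centers, counts))
```

Because both sums are normalized by the total frequency, the result does not depend on whether counts or relative frequencies are passed in.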

Least Square Method (LSM).
A less preferred method, but one that is known to be more accurate for diverse frequency distributions, is the Least Square Method (LSM) [21,23,30,37]. The Weibull function is transformed into a linear function of the form "y = Gain · x + Offset." In order to make this transition, the natural logarithm of both sides of equation (2) is taken (twice, after rearranging), which leads to the desired format as in equation (15). The left side of the equation can be computed from the observed cumulative frequencies, and a linear fit algorithm can be used to calculate k and A.
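The linearization can be sketched as follows, assuming equation (15) is the usual form ln(−ln(1 − F(U))) = k ln U − k ln A, so that the gain equals k and the offset equals −k ln A. The ordinary least-squares fit below is written out by hand; the function name and the choice to drop points with F equal to 0 or 1 are assumptions.

```python
import math


def fit_lsm(speeds, cum_freqs):
    """Return (k, A) from a straight-line fit in the transformed coordinates.

    speeds are the upper edges of the wind speed bins and cum_freqs the
    observed cumulative frequencies F at those speeds.
    """
    xs, ys = [], []
    for u, F in zip(speeds, cum_freqs):
        if u > 0.0 and 0.0 < F < 1.0:  # the transform is undefined otherwise
            xs.append(math.log(u))
            ys.append(math.log(-math.log(1.0 - F)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx                  # slope of the line, equal to k
    offset = mean_y - gain * mean_x   # intercept, equal to -k ln A
    return gain, math.exp(-offset / gain)


# Hypothetical cumulative frequencies at bin upper edges (m/s).
print(fit_lsm([2.0, 4.0, 6.0, 8.0, 10.0, 12.0], [0.06, 0.22, 0.43, 0.63, 0.79, 0.90]))
```

Every retained point carries equal weight in the fit, so sparsely populated low and high speed bins influence k and A as much as the well-populated bins near the mean.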

WAsP Weibull Method.
This method uses a different approach for estimating the Weibull parameters; it assumes that the wind speeds above the mean value are most likely to create the maximum power. Therefore, Weibull fitting using only the data above the mean wind speed value is considered to be more realistic by the method's developers [38]. Nevertheless, that does not mean it is more effective for diverse frequency distributions; therefore, it has been selected as one of the methods to be studied. The reference documentation about the method is not detailed, but the WAsP method is one of the most used methodologies through the WAsP software family. The first step in the method is to define a parameter that gives the probability of the wind speeds above the mean value, which can be calculated through the cumulative distribution function (CDF) with the Weibull parameters (equation (2)). If the mean wind speed is applied to the function, the cumulative probability is the total sum of the probabilities of the wind speeds below the mean value; therefore, 1 − F(U) becomes the proportion of the values above the mean value, U_p (equation (16)).
When the natural logarithm of both sides of the equation is taken (equation (17)), one can write the parameter A as a function of k or vice versa and calculate both in two steps. The scale parameter A can be written as a function of k through the power density function (equation (6)), which is equal to the mean of the cubed wind speed samples (equation (18)). If parameter A is singled out, it can be written as a function of k (equation (19)). If equations (17) and (19) are merged as in equation (20), one can use iterative numerical steps to solve for k and place the result in equation (19) to calculate A.
Table 5: Weibull parameter estimation methods used in previous studies: [19], Seguro [20], Donk [21], Ramirez [22], Carta [23], Cellura [24], Akdag [25], Saleh [26], Khahro [27], Arslan [28], Mohammadi [29], Katinas [30], Ali [31], Kang [32], and Polnumtiang [33].
Omnidirectional data and grouped sectors were used in the analysis (see Table 2). Sectors were grouped based on the common roughness and obstacle types of the sectors. Group I includes sectors 1 and 2, with the northerly winds and the highest number of samples. The urban area located in sectors 3, 4, and 5 constitutes Group II, which has the lowest number of samples and a total energy density calculated to be close to zero. Therefore, the results for Group II might be misleading; nevertheless, they are still presented for the sake of completeness. The university zone, sectors 6 and 7, constitutes Group III, and the sectors covered with hills, from 8 to 12, form Group IV. The collected data were analyzed in wind speed bins of 0.5 m/s, and all available data for the selected sectors were processed for the calculation.
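The two-step WAsP-style fit described in this subsection can be sketched as follows. It is based only on the description in the text, not on the WAsP source code: the fitted distribution is required to reproduce the observed mean of U³, giving A(k) = [mean(U³) / Γ(1 + 3/k)]^(1/3), and the observed share of samples above the mean, giving (mean(U) / A(k))^k = −ln(U_p). The bisection solver, the search bracket for k, and all names are assumptions.

```python
import math


def fit_wasp(speeds, k_lo=0.5, k_hi=10.0, tol=1e-10):
    """Return (k, A) matching the third moment and the share above the mean."""
    n = len(speeds)
    mean_u = sum(speeds) / n
    mean_u3 = sum(u ** 3 for u in speeds) / n
    u_p = sum(1 for u in speeds if u > mean_u) / n  # observed share above mean
    if not 0.0 < u_p < 1.0:
        raise ValueError("need samples both above and not above the mean")
    target = -math.log(u_p)

    def scale_for(k):
        # Scale parameter that reproduces the observed mean of U^3.
        return (mean_u3 / math.gamma(1.0 + 3.0 / k)) ** (1.0 / 3.0)

    def residual(k):
        # Zero when the fitted probability above the mean equals u_p.
        return (mean_u / scale_for(k)) ** k - target

    lo, hi = k_lo, k_hi
    if residual(lo) * residual(hi) > 0.0:
        raise ValueError("no root for k inside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    return k, scale_for(k)


# Hypothetical 10-minute mean speeds in m/s.
print(fit_wasp([3.1, 4.8, 6.2, 7.9, 5.5, 9.4, 2.7, 6.8]))
```

Since only the third moment and the split around the mean enter the fit, the shape of the distribution at low wind speeds has little direct influence on the result.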

Results
In order to assess the accuracy of a Weibull estimation method, most studies treat the question as one of statistical error and calculate the root mean square error (RMSE) against the measured data,
where y_i is the measured value and x_i is the value calculated from the Weibull parameters. However, this method can be misleading because it places the same importance on every range of wind speed, and the total sum of errors is not weighted by the possible energy density. Therefore, in the current study, the accuracy is also evaluated through the power density function, because this is the real effective difference between estimation methods when it comes to calculating the wind energy production. It is common to calculate the power density from the two datasets as in equation (22), where ρ is the air density and f_m is the frequency for the given wind speed range. The values can be used to derive an error percentage value, ε (equation (23)), compared to the measured value.
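The two accuracy measures discussed above can be sketched as follows, assuming the RMSE is sqrt((1/n) Σ (y_i − x_i)²), the binned power density is 0.5 ρ Σ f_m U_m³ with frequencies normalized to sum to one, and ε is the signed percentage difference relative to the measured value. The sign convention, the normalization, and the function names are assumptions.

```python
import math


def rmse(measured, modelled):
    """Root mean square error between measured and Weibull-based values."""
    n = len(measured)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(measured, modelled)) / n)


def power_density_binned(bin_speeds, freqs, rho=1.19):
    """Mean power density (W/m^2) from bin speeds and their frequencies."""
    total = sum(freqs)
    return 0.5 * rho * sum(f * u ** 3 for u, f in zip(bin_speeds, freqs)) / total


def power_density_error(p_weibull, p_measured):
    """Signed error in percent of the measured power density."""
    return 100.0 * (p_weibull - p_measured) / p_measured


# Hypothetical measured and fitted bin frequencies for three speed bins.
speeds = [4.0, 8.0, 12.0]
f_measured = [0.5, 0.4, 0.1]
f_weibull = [0.48, 0.41, 0.11]
p_m = power_density_binned(speeds, f_measured)
p_w = power_density_binned(speeds, f_weibull)
print(rmse(f_measured, f_weibull), power_density_error(p_w, p_m))
```

In this illustration a small frequency error in the highest bin dominates ε because of the cubic weighting, while it contributes no more to the RMSE than the same error in a low speed bin.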
The omnidirectional results (Figure 4 and Table 6) show a minor error in power density prediction, as low as ±3-4% for the methods EM, MLM, and WAsP, which is already within the limits of statistical uncertainty [39], while the frequency levels show that almost 25% of the whole dataset is below 3 m/s, which is the cut-in wind speed for most turbines. When the sector-wise results are observed (Figures 5-7 and Table 6), it is also seen that low wind speeds below 3-4 m/s are common and can affect the calculations for all groups. Among the compared methods, MLM appears to be the best performing overall, with a maximum error of 3% in power density calculations at the sectors without urban areas.
The WAsP method gives the second lowest percentage error, arriving at results similar to those of EM, with the difference being that the WAsP method has a better fit for wind speeds close to the mean level, which is intentional by the design of the methodology, as described in subsection "WAsP Weibull Method." MMLM and LSM produce high error percentages in nearly every sector and even in the omnidirectional fit. RMSE and R² values show an inverse relationship, as is expected. When the RMSE increases, the R² parameter decreases, and vice versa, but there

Conclusion
Yearly 10-minute statistics from a 101 m met. mast with top-mounted cup anemometer data have been used for the comparison of Weibull parameter estimation methods. The measurement period is from 27 December 2017 to 27 December 2018, a total of 1 full year with over 99% recovery rate. Almost 25% of the data consists of low wind speeds, which causes difficulties in the estimation and is the core reason for the study. Vertical wind characteristics from the met. mast show that the measurement location has wind shear within the limits of the standards and that the vertical directional turn is negligible between the cup anemometer and the vane used in the study. Basic statistics of monthly and 12 equal wind sectors are calculated. Sectors 2 to 6 are excluded from the study due to the low recovery rate and urban areas closer than 500 m. The results leading to the conclusion are obtained with the other sectors. The omnidirectional and sector-wise calculations for five different estimation methods are tested and the results are presented with the observed statistics. The distribution fits are analyzed through root mean square errors (RMSE) and R² parameters in addition to the power density error function (ε). The results show that the Maximum Likelihood Method (MLM) has the best performance. A clear link has not been found between ε and RMSE and/or R², contrary to several previously cited studies, where conclusions are made based on these statistical parameters. The other estimation methods show high uncertainty and power density calculation error. These two results lead to the conclusion that power density error estimation is the most effective way of checking the distribution fit quality, even though similar studies only focus on the statistical terms of RMSE and/or R². Based on the Group III results, it can be said that a low number of samples with low wind speeds causes high uncertainty and deviation in any method.
Several wind data analysis tools in the world use the WAsP Weibull fitting method as the core estimation method for Weibull parameters through the WAsP software. Nevertheless, it is observed that, for diverse winds, the Maximum Likelihood Method is much more stable and has a better correlation with the measured data. One should create observational wind statistical data by means of MLM and apply the Weibull parameters to the model even if the model used is the WAsP software.
In this study, wind speed data in the form of 10-minute statistics from a 101 m met. mast with top-mounted cup anemometers are used to compare different methods for the estimation of Weibull parameters. The measurement period was from 27 December 2017 to 27 December 2018, that is, one full year of data with over 99% recovery rate. A high proportion of the dataset (over 20%) consisted of low wind speeds, which are known to cause difficulties in the parameter estimation, and this is the main driver that motivated the study. The vertical wind characteristics from the met. mast show that the measurement location has wind shear within the limits of the industry standards. Basic statistics of monthly and 12 equal wind sectors were calculated. The sectors are divided into four groups (Figure 1). The results leading to the conclusion were produced with data from these groups. Nevertheless, Group II does not have enough data to compare and present the methods; therefore, it is excluded from the results. The omnidirectional and sector-wise calculations for five different estimation methods were tested and the results are presented with the observed statistics. The distribution fits were analyzed through root mean square errors (RMSE) and R² parameters in addition to the power density error function (ε). The results show that the Maximum Likelihood Method (MLM) had the best performance for the dataset. A clear link has not been found between ε and RMSE and/or R², contrary to several previous studies, where conclusions on performance were made based on these statistical parameters. A selected list of these referenced studies is given in Table 5.
The other estimation methods tested showed high uncertainty and higher power density calculation errors. These two results lead to the conclusion that power density error estimation is the most effective way of checking the distribution fit quality of an estimation method, even though similar studies only focus on the statistical terms of RMSE and/or R². Based on the results of Group III, it is evident that a low number of samples combined with low wind speeds causes high uncertainty and deviation in any method.
Several wind data analysis tools in the world use the WAsP Weibull fitting method as the core estimation method for Weibull parameters through the WAsP software. Nevertheless, it is demonstrated that, in the case of diverse winds, the Maximum Likelihood Method is much more stable and has a better correlation with the measured data. For these types of datasets, it is advisable to create observational wind statistical data by means of MLM and apply the Weibull parameters to the model.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this article.