Comparison of the Applicability of Two Reanalysis Products in Estimating Tall Tower Wind Based on Multiple Linear Regression and Artificial Neural Network in South China

Climate reanalysis products have been widely used to overcome the absence of high-quality and long-term observational records for wind energy users. In this study, the applicability of two popular reanalysis datasets (ERA5 and MERRA2) in estimating wind characteristics for four tall tower observatories (TTOs) in South China was assessed. For each TTO, linear and nonlinear downscaling techniques, namely, multiple linear regression (MLR) and an artificial neural network (ANN), respectively, were adopted for the downscaling of the scalar wind speed and the corresponding U/V components. The downscaled wind speed and U/V components were subsequently compared with the TTO observations by the correlation coefficient (Pearson’s r), the root mean square error (RMSE), the uncertainty analysis (U95), and the reliability analysis (RE). According to the results, ERA5 had better applicability (higher Pearson’s r and RE, but lower RMSE and U95) in estimating TTO wind speed than MERRA2 with both the MLR and ANN downscaling methods. The average Pearson’s r, RE, RMSE, and U95 of the downscaled wind from ERA5 by the MLR (ANN) method were 0.66 (0.69), 40.8% (41.8%), 2.20 m/s (2.11 m/s), and 0.181 m/s (0.179 m/s), respectively; the corresponding values for MERRA2 were 0.60 (0.63), 38.0% (39.7%), 2.32 m/s (2.25 m/s), and 0.189 m/s (0.187 m/s). The wind components analysis showed that the better performance of ERA5 was attributable to its smaller error in estimating the V component compared with MERRA2. For the wind direction, the two reanalysis datasets did not display distinct differences. Additionally, the misalignment of the wind direction between the reanalysis products and the TTOs was higher for the secondary predominant wind direction (SPWD) than for the predominant wind direction (PWD). Furthermore, we found that the reanalysis U wind had a higher RMSE but a lower RE and Pearson’s r than the V wind, which indicates that the misalignment in the wind direction was mainly associated with the deviation in the U component.


Introduction
High-quality wind observations are essential for wind resource assessment prior to the construction of wind farms [1]. Traditionally, the hub-height wind speed is measured at tall tower observatories (TTOs) at the representative location of a wind farm, and observations should be recorded for at least one year. However, factors such as tower icing and equipment failure may lead to the absence of wind records. Thus, TTO wind measurements must be inspected before being used for wind resource assessment.
Traditionally, missing wind records were interpolated with data from nearby meteorological stations. However, the station data must meet certain prerequisites; for example, the climate conditions and the time resolution should be similar to those of the TTO. A low-density distribution of meteorological stations may limit the effectiveness of using station data for interpolation, especially in developing countries. In China, most stations are located near urban areas and are usually far from areas suitable for wind generation and development. This could create distinct microclimates between the TTO and the meteorological station.
Reanalysis datasets are the outputs of numerical models based on reprocessed and assimilated meteorological observations [2,3]. Such data have global coverage and are more resolved in space and time than meteorological stations. Thus, reanalysis has gained popularity among many wind power researchers [4,5]. MERRA and MERRA2, with hourly temporal resolutions and an adequate wind speed height (50 m), have been widely used in wind power modeling [6,7], and the results have displayed relatively low errors when compared to measurements. More recently, a new-generation reanalysis product named ERA5 has been released, and many studies have demonstrated that ERA5 has better accuracy than MERRA2 for wind resource assessment [1,8,9]. In certain applications, the variability of surface wind is often represented by the wind error in space and time. However, few studies have compared the applicability of different reanalysis wind speeds at the hub height.
Since wind resource assessments may be required at various scales that differ from the pixel scale for reanalysis, downscaling is necessary for various applications. Downscaling techniques are classified into statistical and dynamic methods [10]. Dynamic downscaling uses numerical weather models to produce high-resolution variables for selected regions or in situ sites by processing the physical mechanism of a land-atmosphere system; notably, large amounts of computational resources and detailed knowledge of land-surface properties are required [11]. In contrast, statistical downscaling relies on the statistical relationships between in situ data and large-scale variables, thus providing an attractive alternative to dynamic downscaling due to its simplicity and lower computational demands [12]. The most widely used approach for establishing statistical relationships is multiple linear regression (MLR), which has been proven effective in estimating regional climate variables such as precipitation [13] and temperature [14]. However, since the relationships among climate variables are highly nonlinear, some high-potential tools, such as artificial neural networks (ANNs), have risen in popularity for use in complex nonlinear models. With respect to ANNs, the distinctive structure of the network and the nonlinear transfer function related to each hidden and output node allow ANNs to estimate highly nonlinear relationships [12]. Moreover, while other regression techniques assume a functional form, ANNs allow the data to define the functional form [15,16]. Comparisons have been performed between MLR and ANNs in estimating regional temperature [17] and precipitation [12]. However, comparisons of different downscaling methods have rarely been conducted for wind variables.
Wind energy has been developing rapidly in China in recent years. MERRA or MERRA2 has been widely used in characterizing wind resources in China [18,19]. However, few studies have analyzed the reliability of the new ERA5 in China. Thus, the purpose of this paper was to reveal the reliability of ERA5 and MERRA2 in estimating tall tower wind characteristics and to explore the applicability of the MLR and ANN downscaling methods.

Tall Tower Observations.
The observed wind speed data used in this study came from four TTOs located in mountainous areas of South China (Figure 1). Two towers were 80 m high, and the other two were 90 m. The data period for each tower is listed in Table 1. Data were recorded at hourly resolution.

Reanalysis Datasets.
MERRA2 from the NASA Center for Climate Simulation was released in 2016 to replace the first version of MERRA. MERRA2 uses an upgraded data assimilation system, namely, the Goddard Earth Observing System Model, Version 5 (GEOS-5) data assimilation system. ERA5 is the most recent climate reanalysis product of the European Centre for Medium-Range Weather Forecasts. The major improvement of ERA5 over the former product, ERA-Interim, is that it has higher spatial and temporal resolutions and provides wind variables at 100 m, which makes it particularly suitable for wind resource assessment. The characteristics of MERRA2 and ERA5 are presented in Table 2. Both reanalysis products had the same temporal resolution of 1 hour, but MERRA2 had a coarser spatial resolution.

Statistical Downscaling.
MLR and ANN, representing linear and nonlinear methods, respectively, were used to downscale the reanalysis data to the TTOs. For MLR, the overall relationship can be written as follows:

$$y = \sum_{i} a_i x_i + b, \tag{1}$$

where $y$ represents the simulated results; $x_i$ represents the independent variables; $a_i$ is the estimated regression coefficient for each independent variable; and $b$ is the deviation value (intercept).
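The linear relationship described above can be illustrated with an ordinary least-squares fit. The following is only a minimal sketch in Python with NumPy, not the code used in the study: the function names `fit_mlr` and `predict_mlr` and the synthetic data are hypothetical, and the elimination of insignificant coefficients described later in the text is not reproduced here.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of a_i and b in y = sum_i a_i * x_i + b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so that the intercept b is estimated jointly.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]

def predict_mlr(X, a, b):
    """Apply the fitted linear relationship to new predictor values."""
    return np.asarray(X, dtype=float) @ a + b

# Synthetic, noise-free demonstration data (NOT data from the study):
# five predictors standing in for W, T, DT, P, and DP.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
a_true = np.array([0.8, -0.1, 0.3, 0.05, -0.2])
b_true = 1.5
y_demo = X_demo @ a_true + b_true

a_hat, b_hat = fit_mlr(X_demo, y_demo)
```

Because the demonstration data are noise-free, the fitted coefficients recover `a_true` and `b_true` up to floating-point precision; with real tower records the fit would only approximate the observations.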
For ANN, we selected the back-propagation (BP) network for analysis. A BP network contains three types of layers: the input layer, one or more hidden layers, and the output layer. Each layer consists of processing elements, called neurons, which are fully connected to the neurons of the preceding layer by weights (W) and biases (b). The training of a BP network involves determining the combination of W and b for each neuron that produces the smallest difference between the output and the actual value. For the input and output layers, the number of neurons depends on the input and output variables of the problem. However, the number of neurons in the hidden layer depends on the complexity of the problem and is determined during the training of the network. Typically, a BP network with one hidden layer can approximate any nonlinear input-output relationship. Figure 2 illustrates the BP network considered in this study.
In the hidden layer, each neuron was passed an input value and exported the corresponding outcome based on the following processing:

$$a_n = \sum_{i} W_{in} x_i + b_n, \tag{2}$$

$$O_n = \varphi(a_n), \tag{3}$$

where $O_n$ is the outcome of the $n$th neuron; $a_n$ is the weighted sum at the $n$th neuron; $W_{in}$ is the weight for the connection of the $i$th input neuron and the $n$th hidden layer neuron; $x_i$ is the $i$th input variable; $b_n$ is the bias of the $n$th hidden layer neuron; and $\varphi$ is the nonlinear activation function for the hidden layer. The output layer neuron implemented the same processing operations as the hidden layer neuron. The output of the network ($S$) can be written as follows:

$$S = \tau\Big(\sum_{n} W_n O_n + b\Big), \tag{4}$$

where $W_n$ is the weight for the connection of the $n$th hidden layer neuron and the output neuron; $\tau$ is the nonlinear activation function for the output layer; and $b$ is the bias of the output layer neuron.
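The forward pass of a one-hidden-layer network as described above can be sketched as follows. This is an illustrative Python/NumPy example, not the AMORE implementation used in the study; the function names, the choice of `tanh` for the hidden activation and a logistic sigmoid for the output activation, and the array shapes are all assumptions made for the illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used here as an assumed output activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out, phi=np.tanh, tau=sigmoid):
    """Forward pass of a network with a single hidden layer.

    x        : input vector, shape (n_inputs,)
    W_hidden : input-to-hidden weights W_in, shape (n_inputs, n_hidden)
    b_hidden : hidden-layer biases b_n, shape (n_hidden,)
    w_out    : hidden-to-output weights, shape (n_hidden,)
    b_out    : bias of the output neuron (scalar)
    """
    a = W_hidden.T @ x + b_hidden   # weighted sum at each hidden neuron
    O = phi(a)                      # outcome of each hidden neuron
    return tau(w_out @ O + b_out)   # network output S

# Hypothetical example: 5 inputs (W, T, DT, P, DP) and 3 hidden neurons.
rng = np.random.default_rng(1)
x_demo = rng.uniform(size=5)
S_demo = forward(x_demo,
                 rng.normal(size=(5, 3)), rng.normal(size=3),
                 rng.normal(size=3), 0.1)
```

Training (the back-propagation step that adjusts the weights and biases) is deliberately left out; the sketch only shows how an output is produced from a given set of parameters.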

Data Process.
The flowchart of this study is presented in Figure 3. The wind records for each TTO and the reanalysis data were randomly divided into three groups: a training group, a validation group, and a testing group, with a proportion of 7 : 1 : 2. The training group was used for training the MLR and ANN models. During the training process, five variables from the reanalysis grid cell nearest to the TTO were selected as model inputs: the reanalysis wind speed (W), 2-meter air temperature (T), hourly variation in the 2-meter air temperature (DT), surface pressure (P), and hourly variation in surface pressure (DP), as these variables have been shown to have a strong physical relationship with the wind speed and have been used to train neural network models in other studies [20,21]. That is, the ANN models had five neurons in the input layer. For the MLR method, the variables with insignificant regression coefficients were eliminated from the regression equation until all the coefficients reached a significant level [22]. During training, the validation group was used to ensure that the ANN training did not result in overfitting. After the MLR and ANN models were trained, variables from the testing groups in MERRA2 and ERA5 were applied to estimate the wind values. The accuracy of the estimated results was then determined by comparing them with the actual values of the four TTOs using the following evaluation metrics:
Correlation coefficient (Pearson's r): Pearson's r depicts the degree of agreement between the simulated and the observed wind, with values ranging from −1 to 1, where a value closer to 1 means higher agreement.
Root mean squared error (RMSE): the RMSE represents the magnitude of deviation, and a lower RMSE value indicates a smaller magnitude of the bias.
Uncertainty analysis (U95): the major goal of the uncertainty analysis is to bound the expected range in which the true value of the outcome of an experiment lies. Here, following Saberi-Movahed et al. [23], the U95 was used to compute the uncertainty interval, which can be interpreted as follows: if the given experiment is repeated many times, the true value of the outcome will lie in the uncertainty interval in approximately 95 out of every 100 trials.
Reliability analysis (RE): the reliability analysis is a statistical method for measuring the overall consistency of a model. More specifically, it determines whether a suggested model achieves a permissible level of performance.
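These evaluation metrics were given by

$$r = \frac{\sum_{i=1}^{m} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2 \sum_{i=1}^{m} (\hat{y}_i - \bar{\hat{y}})^2}}, \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}, \tag{6}$$

$$U95 = \frac{1.96}{m} \sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2 + \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}, \tag{7}$$

$$\mathrm{RE} = \left(\frac{1}{m} \sum_{i=1}^{m} k_i\right) \times 100\%, \tag{8}$$

where $m$ is the length of the testing group; $y_i$ represents the wind speed of the tall tower observatory; $\hat{y}_i$ is the wind speed simulated by the MLR or ANN; and $\bar{y}$ and $\bar{\hat{y}}$ are the corresponding mean values. In Formula (8), $k_i$ is obtained through two steps. First, the relative average error (RAE) is defined as a vector whose $i$th component is

$$\mathrm{RAE}_i = \left| \frac{y_i - \hat{y}_i}{y_i} \right|. \tag{9}$$

Second,

$$k_i = \begin{cases} 1, & \mathrm{RAE}_i \le \Delta, \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $\Delta$ is the threshold value of reasonable wind deviation. In other words, the sum of $k_i$ is the number of times the value of RAE is less than or equal to $\Delta$. Following Saberi-Movahed et al. [23], the optimum value of $\Delta$ is 0.2, or equivalently 20%.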
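The four evaluation metrics can be sketched in a few lines of Python with NumPy. This is an illustrative example only: the function names are hypothetical, the sample values are invented for the demonstration, and the U95 expression is the form assumed here to follow Saberi-Movahed et al. [23] (1.96/m times the square root of the summed squared deviations of the observations from their mean plus the summed squared errors). Observations equal to zero would need special handling in the reliability calculation, which is not addressed.

```python
import numpy as np

def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated values."""
    return float(np.corrcoef(obs, sim)[0, 1])

def rmse(obs, sim):
    """Root mean square error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def u95(obs, sim):
    """Uncertainty at the 95% level (assumed form, see the lead-in)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    m = len(obs)
    spread = np.sum((obs - obs.mean()) ** 2)   # deviation from observed mean
    error = np.sum((obs - sim) ** 2)           # simulation error
    return float(1.96 / m * np.sqrt(spread + error))

def reliability(obs, sim, delta=0.2):
    """Percentage of samples whose relative error is at most delta."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rae = np.abs(obs - sim) / np.abs(obs)      # undefined where obs == 0
    return float(100.0 * np.mean(rae <= delta))

# Invented sample values (m/s), for illustration only.
obs_demo = np.array([2.0, 4.0, 6.0, 8.0])
sim_demo = np.array([2.2, 3.0, 6.6, 8.0])
re_demo = reliability(obs_demo, sim_demo)      # 3 of 4 samples within 20%
```

In the sample, only the second value (relative error 25%) exceeds the 20% threshold, so three of four samples count as reliable.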
Additionally, before training the ANN model, the input and output data were normalized using min-max normalization to avoid network nonconvergence due to the magnitude difference among different variables. In this study, a total of 24 ANNs were trained based on observations from 4 TTOs and 2 reanalysis products for 3 targets (the scalar wind speed, U component of wind, and V component of wind). Notably, the wind direction was calculated based on the U/V components. The properties of the network used in this study can be seen in Table 3. The training and testing procedures were implemented with the AMORE package in the R language.
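The preprocessing steps mentioned in this section (hourly variations, the random 7 : 1 : 2 split, min-max normalization, and the wind direction derived from the U/V components) can be sketched as follows. This Python/NumPy example is illustrative and is not the R code used in the study. All function names are hypothetical. The definition of the hourly variation as the difference from the previous hour, and the meteorological direction convention (direction the wind blows from, in degrees clockwise from north, with U positive eastward and V positive northward), are assumptions, since the text does not specify them.

```python
import numpy as np

def hourly_variation(series):
    """Change relative to the previous hour (assumed definition of DT, DP)."""
    series = np.asarray(series, dtype=float)
    change = np.full_like(series, np.nan)   # first hour has no predecessor
    change[1:] = series[1:] - series[:-1]
    return change

def random_split(n_samples, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle indices and cut them into training/validation/testing groups.

    The testing group receives whatever remains after the first two cuts.
    """
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

def minmax_scale(x, lo, hi):
    """Map values from the range [lo, hi] onto [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def wind_direction(u, v):
    """Direction the wind comes from, degrees clockwise from north (assumed)."""
    return np.degrees(np.arctan2(-u, -v)) % 360.0

train_idx, val_idx, test_idx = random_split(1000)
```

One design choice worth noting: computing `lo` and `hi` from the training group only, and reusing them for the validation and testing groups, keeps information from the testing data out of the training step. Whether the study did this is not stated.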

The Interpolation of Wind Speed.
The scatter plots of the observed wind speed and the wind speed simulated by the MLR and ANN interpolation methods are presented in Figures 4 and 5, respectively. The upper row of each plot was simulated from ERA5, and the second row was simulated from MERRA2. Among the four TTOs, T4 had the best correlation and the lowest RMSE for both the MLR and ANN methods; taking MLR as an example, the correlation coefficients were 0.80 and 0.70, and the RMSEs were 1.70 m/s and 1.94 m/s for ERA5 and MERRA2 (Figures 4(d) and 4(h)), respectively. In contrast, the correlation was the poorest for T3, and the RMSEs were 2.71 m/s and 2.79 m/s for ERA5 and MERRA2 based on the MLR method (Figures 4(c) and 4(g)), respectively.
For all the tall towers, the wind speed simulated from ERA5 yielded higher correlations and lower root mean square error (RMSE) than MERRA2 for both the MLR and ANN downscaling methods. For the MLR method, the average RMSE of ERA5 was 2.20 m/s, which was 0.12 m/s lower than MERRA2. For the ANN method, the average RMSE of ERA5 was 0.14 m/s lower than MERRA2.
In addition, the interpolation results based on the ANN were better than those based on the MLR for all four tall tower observatories. However, the difference in RMSE between the downscaling methods was smaller than that between the reanalyses. For ERA5, the average RMSE of the ANN method was 0.09 m/s lower than that of the MLR method. For MERRA2, the average RMSE of the ANN method was 0.07 m/s lower than that of the MLR method.
The results of the uncertainty and reliability evaluation are presented in Table 4. Table 4 shows that ERA5 had a lower value of U95 than MERRA2 for both the MLR and ANN downscaling methods. For example, for the ANN method, the average U95 of ERA5 was 0.179 m/s, which is lower than that of MERRA2 (U95 = 0.187 m/s). Table 4 also shows that the reliability of the wind speed downscaled from ERA5 by both the MLR (RE = 40.8%) and ANN (RE = 41.8%) methods was higher than that from MERRA2 (the RE values were 38.0% and 39.7% for the MLR and ANN, respectively). Additionally, comparing the downscaling methods, the ANN provided lower uncertainty and higher reliability than the MLR method. Taking ERA5 as an example, the average uncertainty value was 0.179 m/s for the ANN method, which was lower than that for the MLR method (U95 = 0.181 m/s). Also, the reliability value was 41.8% for the ANN method, which was higher than that for the MLR method (RE = 40.8%). Moreover, it should be noted that, like the RMSE, the difference in uncertainty and reliability between the downscaling methods was smaller than that between the reanalyses. For example, the difference in U95 between the downscaling methods was 0.002 m/s for both ERA5 and MERRA2, while it was 0.008 m/s between the reanalyses for both the MLR and ANN methods.

The Interpolation of Wind Direction.
Figures 6 and 7 show the wind rose plots drawn from the TTOs and from the reanalyses downscaled by the MLR and ANN methods, respectively. As shown, the wind rose plots drawn from the reanalyses were similar to those based on observations. There was no visually significant difference between the wind rose plots based on ERA5 and MERRA2, nor between the plots based on the MLR and ANN interpolation methods.
Compared to the TTOs, both reanalysis products had better results for the predominant wind direction (PWD) than for the secondary predominant wind direction (SPWD). For example, for T2, the PWD was NNE (Figure 6(d)), which was perfectly estimated by ERA5 and MERRA2 by either the MLR method (Figures 6(e) and 6(f)) or the ANN method (Figure 7).

The Interpolation of U/V Components of Wind.
Figure 8 presents the RMSE and Pearson's r for the downscaled U/V components of wind based on the MLR (Figure 8(a)) and ANN (Figure 8(b)) methods. The figure shows a clearly lower RMSE and a higher Pearson's r for the V wind than for the U wind for both ERA5 and MERRA2. For example, for ERA5 downscaled by the MLR method (Figure 8(a)), the RMSE and Pearson's r were 2.80 m/s and 0.80 for the V wind, respectively, and 3.18 m/s and 0.55, respectively, for the U wind. Similarly, for MERRA2 downscaled by the MLR method (Figure 8(a)), the average RMSE of the U/V wind was 3.14/2.95 m/s, and Pearson's r was 0.55/0.80. The same pattern was found for the ANN downscaling method (Figure 8(b)).
For different interpolation methods, the ANN method displayed slightly better performance than the MLR method in simulating both the U and V winds. The average RMSE of the U wind predicted by the ANN method was 3.07 m/s and 2.98 m/s for ERA5 and MERRA2, respectively (Figure 8(b)), and the corresponding values obtained for the MLR method were higher (3.18 m/s and 3.14 m/s for ERA5 and MERRA2, respectively) (Figure 8(a)). Similarly, for the V wind, the average RMSE predicted by the ANN method was 2.68 m/s and 2.78 m/s for ERA5 and MERRA2, respectively; the RMSE values were higher for the MLR method (the corresponding RMSE was 2.80 m/s and 2.98 m/s, respectively).
Table 5 shows the uncertainty and reliability analysis for the U/V components of wind. From the results of the uncertainty analysis (U95), the V wind had a higher U95 than the U wind for both ERA5 and MERRA2. The average U95 of the V wind downscaled by the MLR method was 0.282 m/s and 0.296 m/s for ERA5 and MERRA2, respectively. According to Formula (7), the U95 is not a normalized value; it increases with the magnitude of the wind. As Figures 6 and 7 show, the most frequent wind directions followed a south-north pattern, indicating that the wind component in the meridional (south-north) direction (i.e., the V wind) was larger than that in the zonal (east-west) direction (i.e., the U wind). This leads to a larger U95 for the V wind. As a normalized value, RE is more suitable for comparing the applicability of the U/V wind. Table 5 shows that the RE of the V wind was higher than that of the U wind, which was consistent with the results of RMSE and Pearson's r. For example, the average RE of the V wind downscaled from ERA5 by the MLR method was 23.2%, whereas it was 11.8% for the U wind. Furthermore, in the estimation of the V wind, ERA5 showed better uncertainty and reliability than MERRA2. For example, the average U95 of the MLR method was 0.282 m/s for ERA5, which is lower than that for MERRA2 (U95 = 0.296 m/s).
However, with respect to the U wind, neither reanalysis was consistently better: each had advantages and disadvantages at different TTOs. Additionally, by comparing the downscaling methods, we found that the ANN method produced better performance in the uncertainty and reliability analysis. Taking the V wind of ERA5 as an example, the value of U95 for the ANN method was 0.280 m/s, which was lower than that for the MLR method (U95 = 0.282 m/s), and the reliability for the ANN method was 25.4%, while it was 23.2% for the MLR method.

Discussion
Climate reanalysis has been widely used to overcome the absence of high-quality and long-term observational records for wind energy users. The new ERA5 dataset has been proven superior to the traditional MERRA2 product in predicting the regional distributions of wind resources in the South China Sea [24], Europe [25], America [26], and other areas worldwide [9]. Here, we analyzed the applicability of these two reanalyses in estimating tall tower wind characteristics using different statistical downscaling methods in South China.
On average, ERA5 displayed improvements (higher Pearson's r and RE, but lower RMSE and U95) over MERRA2 in the ability to estimate wind speed based on both the MLR and ANN downscaling methods. Similar results were obtained in Europe [1,27,28] and the Arctic [8]. For example, Ramon et al. [28] compared five global reanalysis datasets (ERA5, MERRA2, ERA-Interim, JRA55, and NCEP-R1) with in situ observations worldwide, and the results showed that the ERA5 surface winds exhibited the best agreement with the in situ winds at a daily time scale. Possible reasons include the different representations of land-surface roughness, the varying data density and quality, and the output properties of the different reanalysis models. For example, the output wind of ERA5 was at a height of 100 m, which was closer to the tall tower height in this study than that of MERRA2 (50 m) [1]. Additionally, the accuracy of the wind speed components of the different reanalyses also contributes. In this study, regardless of the downscaling method, the V component of ERA5 had a smaller error compared to MERRA2. For example, when using the ANN downscaling method, ERA5 produced a lower RMSE (2.68 m/s) than MERRA2 (2.78 m/s) when estimating the V component of wind (Figure 8(b)). However, with respect to the U wind, neither reanalysis was consistently better across the TTOs. Therefore, the better ability of ERA5 in estimating the V component of wind could be another reason for its better applicability over MERRA2.
For the wind direction, the downscaling simulations driven by ERA5 and MERRA2 did not display distinct differences. This is consistent with a comparison among other reanalysis products [29]. That study tested the reconstruction skill of three reanalysis products (i.e., ERA-Interim, CFSR, and JRA55) in reproducing the China Yellow Sea coastal winds and demonstrated that these products were consistent with each other in the reproduction of local wind direction. However, few studies have compared the reconstruction skill of wind direction between ERA5 and MERRA2 in other locations. Here, we found that both ERA5 and MERRA2 could reproduce the PWD better than the SPWD. The misalignment was typically within 1 wind direction angle for the PWD. However, the SPWD was biased by at least 2 wind direction angles. We explored the reasons for the wind direction differences by analyzing the bias of the wind speed components, and the results showed that the V wind had a higher Pearson's r and RE but a lower RMSE than the U wind for all four TTOs, indicating that the misalignment in wind direction was mainly associated with the deviation in the U component of wind.
Additionally, our results showed that the ANN method yielded better estimations of wind speed than the traditional MLR method. The average RMSEs of the ANN/MLR methods were 2.11/2.20 m/s and 2.25/2.32 m/s for ERA5 and MERRA2, respectively. The superiority of the ANN method over other downscaling methods has also been verified for other climate factors, such as precipitation and evapotranspiration [12,22]. Our findings suggest that the ANN is an attractive alternative to traditional methods for the interpolation of wind records. However, we found that the difference in applicability between the downscaling methods was smaller than that between the reanalyses. Thus, for wind resource assessment, comparing the applicability of different reanalysis products should be the priority.

Conclusion
In this study, wind characteristics downscaled from two reanalysis products by two downscaling methods were compared with the observations from four tall towers in South China. Various evaluation metrics (including the correlation coefficient, the root mean square error, the uncertainty analysis, and the reliability analysis) were applied to reveal the differences in applicability between different reanalysis products and different downscaling methods. The following specific conclusions were drawn from this study:
(1) ERA5 performed better than MERRA2 in estimating wind speed based on both the MLR and ANN downscaling methods. The smaller error of ERA5 in estimating the V component could be one of the reasons for its superiority over MERRA2.
(2) For the wind direction, the wind rose plots produced by ERA5 and MERRA2 were generally consistent. Additionally, both reanalysis products were more accurate for the estimation of the predominant wind direction (PWD) than the secondary predominant wind direction (SPWD).
(3) The ANN downscaling method yielded better metrics than the MLR method. However, the difference in capability between the downscaling methods was smaller than the difference between the reanalyses. Therefore, for the interpolation of wind records, the choice of reanalysis product is more important than the choice of downscaling method.
Data Availability
The datasets generated and/or analyzed during the current study are not publicly available because the data also form part of an ongoing study, but they are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.