Applications of the MVWG Multivariable Stochastic Weather Generator

Weather generators (WGs) have become significant modules of crop models and decision support systems in the past decade. Using a large meteorological database from North America, two basic problems related to the applicability of WGs when data series are short or missing were investigated in the framework of the Multivariable weather generator (MVWG). First, the minimum data series length required for adequate parameterization of the WG was determined. Our results suggest that 15 years of observed data are enough for adequate parameterization of the MVWG. We then investigated the possibility of spatially interpolating WG parameters in order to generate weather data for sites with no meteorological observations. Coupled with the presented interpolation technique, the MVWG was able to generate realistic weather data for sites with no measurements situated in climatically and geographically homogeneous regions.


Introduction
Artificial weather data series produced by stochastic weather generators can be used in a wide range of agrometeorological, hydrological, risk analysis, and climate change studies. Combined with spatial interpolation methods, generators can provide realistic weather data for locations with no measurements using the data of the surrounding meteorological stations. Coupled with future climate scenarios, weather generators can produce coherent present and future weather data series that could be used for predicting the agricultural, hydrological, ecological, and economic effects of prospective climate change with the help of various impact models (e.g., crop models).
The outputs of weather generators have been used in many scientific studies: as inputs for crop models [1][2][3][4][5] and in climate change studies [6][7][8][9]. The main hindrance to using and improving crop models is the lack of input data, including weather data, required for operating and testing these models. Weather generator outputs could be especially useful for crop modellers in two situations: they can be applied for sites (1) where no meteorological data are available or (2) where only a short data series is available.
In the past decade, many meteorological stations were set up all over the world, mainly owing to special (e.g., environmental modelling) projects as well as to decreasing purchase prices. Hence, there are plenty of automated weather stations with data series only a few years long. A frequently applied presupposition concerning weather generators is that a minimum of 30 years, that is, the so-called climate normal period, is required for their adequate parameterization. The first objective of our study was to determine the minimum data series length required for parameterizing the MVWG [10] stochastic weather generator to produce weather data of sufficient quality.
Certain applications (e.g., crop-model-aided decision support systems) require weather data that are at least realistic for locations with no measurements. Such data could be produced by weather generators parameterized with the measured data of the neighboring meteorological stations [11]. Thus, the second objective of our study was to investigate the conditions for obtaining weather data of sufficient quality using the spatially interpolated results of the MVWG generator.

Materials and Methods
2.1. The MVWG Weather Generator. The recently developed multivariable stochastic weather generator MVWG [10] was used in this study. Without repeating all the details given in [10], the main features of the MVWG are as follows. The daily precipitation amounts, as well as the dry or wet status of each day, are calculated independently of the other meteorological variables. The MVWG uses the "serial approach" [12] instead of a Markov chain for generating consecutive dry and wet spells. A three-parameter Weibull function (1) is used for calculating the lengths of dry and wet spells, and the same Weibull function (1) is used for approximating the distribution of daily precipitation amounts:

F(x) = 1 - \exp\left[-\left(\frac{x - \gamma}{\beta}\right)^{\alpha}\right],  (1)

where \alpha, \beta, and \gamma (the shape, scale, and location parameters, respectively) are parameters of the cumulative distribution function (CDF) that are to be determined by curve fitting.
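A minimal Python sketch of the three-parameter Weibull CDF and of drawing spell lengths or precipitation amounts from it by inverse-transform sampling is given below. It assumes the standard form F(x) = 1 − exp[−((x − γ)/β)^α] with shape α, scale β, and location γ; the function names, the example parameter values, and the rule of rounding spell lengths up to whole days are hypothetical illustrations, not the MVWG implementation.

```python
import math
import random


def weibull3_cdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Three-parameter Weibull CDF (shape alpha, scale beta, location gamma)."""
    if x <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((x - gamma) / beta) ** alpha))


def weibull3_quantile(u: float, alpha: float, beta: float, gamma: float) -> float:
    """Inverse of the CDF above: the x for which F(x) = u, with 0 <= u < 1."""
    return gamma + beta * (-math.log(1.0 - u)) ** (1.0 / alpha)


def sample_spell_length(rng: random.Random, alpha: float, beta: float, gamma: float) -> int:
    """Draw a dry or wet spell length in days (hypothetical rule: round up, minimum 1)."""
    x = weibull3_quantile(rng.random(), alpha, beta, gamma)
    return max(1, math.ceil(x))


def sample_precipitation(rng: random.Random, alpha: float, beta: float, gamma: float) -> float:
    """Draw a daily precipitation amount (mm) for a wet day."""
    return weibull3_quantile(rng.random(), alpha, beta, gamma)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical fitted parameters, for illustration only.
    spells = [sample_spell_length(rng, 1.1, 3.0, 0.5) for _ in range(5)]
    amounts = [round(sample_precipitation(rng, 0.8, 6.0, 0.1), 1) for _ in range(5)]
    print(spells, amounts)
```

Inverse-transform sampling is used here because the Weibull CDF has a closed-form inverse, so each draw needs only one uniform random number.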
Calculation of the rest of the daily variables (maximum and minimum temperature, global radiation, sunshine hours, relative humidity, cloudiness, wind speed, and atmospheric pressure) is conditioned on the dry or wet status of the day. The time series of each variable is reduced to a time series of residual elements by removing the annual courses of the means and standard deviations, which are approximated by third-order Fourier series. The residual series of the variables are calculated using the weakly stationary generating process proposed by Matalas [13], in which the auto- and cross-correlations of the generated variables are implemented through lag-0 and lag-1 cross-correlation matrices calculated for each month. The resulting monthly matrix equations are solved by applying the spectral decomposition theorem. A user-friendly interface has been created for the MVWG that enables users to customize the parameterization procedure as well as to handle the input and output data.

The SAMSON [14] database has been used in the study. The data series of each location were at least 20 years long, taken from the period 1961 to 1990, and included daily global radiation, maximum and minimum temperature, and precipitation [15]. This database provides a very good basis for analyzing and comparing the performance of weather generators, since it contains data from locations with different climates and the covered period is free from explicit monotonic trends [16].
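The residual-generating process of Matalas [13] mentioned above can be illustrated with a minimal NumPy sketch of the lag-1 multivariate model x_t = A x_{t-1} + B ε_t, with A = M1 M0^{-1} and B Bᵀ = M0 − M1 M0^{-1} M1ᵀ, where M0 and M1 are the lag-0 and lag-1 cross-correlation matrices of the standardized residuals and ε_t is a vector of independent standard normal variables. Here B is obtained from the eigendecomposition of the symmetric right-hand side, which is one way of applying the spectral decomposition theorem. The sketch uses a single pair of matrices, whereas the MVWG uses monthly ones; the function names and example matrices are hypothetical.

```python
import numpy as np


def matalas_matrices(m0: np.ndarray, m1: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Coefficient matrices A and B of x_t = A x_{t-1} + B eps_t.

    m0: lag-0 cross-correlation matrix of the standardized residuals.
    m1: lag-1 cross-correlation matrix, m1[i, j] = corr(x_i(t), x_j(t-1)).
    """
    a = m1 @ np.linalg.inv(m0)
    c = m0 - a @ m1.T                  # B B^T = M0 - M1 M0^-1 M1^T
    c = (c + c.T) / 2.0                # remove round-off asymmetry
    w, v = np.linalg.eigh(c)           # spectral decomposition c = V diag(w) V^T
    w = np.clip(w, 0.0, None)          # clip tiny negative eigenvalues from round-off
    b = v @ np.diag(np.sqrt(w))        # then B B^T = V diag(w) V^T = c
    return a, b


def generate_residuals(a: np.ndarray, b: np.ndarray, m0: np.ndarray,
                       n_days: int, rng: np.random.Generator) -> np.ndarray:
    """Generate n_days of correlated standardized residuals (rows = days)."""
    k = a.shape[0]
    x = np.empty((n_days, k))
    # Start from the stationary distribution N(0, M0).
    x[0] = np.linalg.cholesky(m0) @ rng.standard_normal(k)
    for t in range(1, n_days):
        x[t] = a @ x[t - 1] + b @ rng.standard_normal(k)
    return x


if __name__ == "__main__":
    # Hypothetical correlations for two residual series (e.g. Tmax and radiation).
    m0 = np.array([[1.0, 0.5], [0.5, 1.0]])
    m1 = np.array([[0.6, 0.3], [0.2, 0.5]])
    a, b = matalas_matrices(m0, m1)
    x = generate_residuals(a, b, m0, 20000, np.random.default_rng(0))
    print(np.corrcoef(x.T).round(2))
```

With these choices the generated series reproduces M0 as its lag-0 and M1 as its lag-1 correlation structure, since A M0 = M1 and A M0 Aᵀ + B Bᵀ = M0.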

Minimum Data Series Length for Generator Parameterization.
A subset of 61 stations (Figure 1) having complete 30-year data series was collated from the SAMSON [14] database for finding the minimum data series length for generator calibration. Eighty subseries were separated within the 30-year data series for each location (Figure 1): six 25-year-long series (1961-1985, 1962-1986, . . . , 1966-1990); eleven 20-year-long series (1961-1980, 1962-1981, . . . , 1971-1990); sixteen 15-year-long series (1961-1975, 1962-1976, . . . , 1976-1990); twenty-one 10-year-long series (1961-1970, 1962-1971, . . . , 1981-1990); and twenty-six 5-year-long series (1961-1965, 1962-1966, . . . , 1986-1990). The MVWG was parameterized using each subseries as well as the 30-year-long series, and 100-year-long synthetic data series were generated for every location with each parameterization. The generated series were compared to the observed 30-year-long series. The expected values of the monthly number of wet days, cumulative solar radiation, average temperature, and cumulative precipitation were compared for every location and each month, which meant 61 × 12 = 732 comparisons for every climatic variable and each parameterization. The Mann-Whitney U test [17] was used to determine whether the difference in a comparison was significant, since the normality of the distributions could not be guaranteed. A score of zero, one, two, or three was assigned to each comparison if the result of the U test indicated a nonsignificant (p ≥ 0.05), marginally significant (p < 0.05), significant (p < 0.01), or highly significant (p < 0.001) difference, respectively, between the observed and the synthetic data. The sum of the assigned values gave a total score (TS) for each climatic variable and each parameterization, while the maximum score (MS) was 732 × 3 = 2196.
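The moving-window subseries described above can be enumerated with a few lines of Python; counting the windows gives 6, 11, 16, 21, and 26 windows of 25, 20, 15, 10, and 5 years, which add up to eighty subseries per station. The function name and its defaults are illustrative.

```python
def moving_windows(first_year: int = 1961, last_year: int = 1990,
                   lengths: tuple[int, ...] = (25, 20, 15, 10, 5)) -> dict[int, list[tuple[int, int]]]:
    """Return every (start, end) year window of each length inside the record."""
    windows = {}
    for n in lengths:
        # The last admissible window ends exactly in last_year.
        windows[n] = [(start, start + n - 1)
                      for start in range(first_year, last_year - n + 2)]
    return windows


if __name__ == "__main__":
    w = moving_windows()
    for n, wins in w.items():
        print(f"{n:2d}-year: {len(wins):2d} windows, {wins[0]} ... {wins[-1]}")
    print("total:", sum(len(wins) for wins in w.values()))  # total: 80
```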
An acceptability index (AI) was calculated for all climatic variables and for each parameterization using the following formula:

AI = 100\left(1 - \frac{TS}{MS}\right).  (2)

The higher the acceptability index, the better the performance of the weather generator: AI = 100 means that no significant differences could be found between the observed and the generated data series of the climatic variable in question, while AI = 0 means that the compared data series were highly significantly different for all 61 sites and every month. The acceptability indices of the corresponding (equally long) subseries were averaged.
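The scoring and the acceptability index can be sketched as follows, assuming AI = 100 (1 − TS/MS), which gives AI = 100 when all comparisons are nonsignificant and AI = 0 when all are highly significant. In practice the p-values would come from the Mann-Whitney test of each monthly comparison (for example via `scipy.stats.mannwhitneyu(...).pvalue`); here they are passed in directly so that the sketch needs only the standard library. The function names are hypothetical.

```python
from collections.abc import Iterable


def significance_score(p: float) -> int:
    """Score one comparison: 0 nonsignificant, 1 marginal, 2 significant, 3 highly significant."""
    if p < 0.001:
        return 3
    if p < 0.01:
        return 2
    if p < 0.05:
        return 1
    return 0


def acceptability_index(p_values: Iterable[float]) -> float:
    """AI = 100 * (1 - TS / MS), where MS = 3 * number of comparisons."""
    scores = [significance_score(p) for p in p_values]
    if not scores:
        raise ValueError("at least one comparison is required")
    total_score = sum(scores)          # TS
    max_score = 3 * len(scores)        # MS, e.g. 3 * 732 = 2196
    return 100.0 * (1.0 - total_score / max_score)


if __name__ == "__main__":
    # Four hypothetical comparisons, one in each score class.
    print(acceptability_index([0.40, 0.03, 0.004, 0.0002]))  # 50.0
```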
Table 1: Central points (CP) and border points (BP) of the polygons defined for interpolation of the WG parameters, as well as the average distance (D CP-BPs) and the mean altitude difference (AD CP-BPs) between the central point and the border points.

#   Central point (CP)   Border points (BPs)   Number of BPs   D CP-BPs (km)   AD CP-BPs (m)
1   CHNC                 COSC, GRNC, GRSC      3               132             88
2   COOH                 DAOH, HUWV, MAOH      3               131             65
3   WAIA                 …

Then, the observed and the synthetic data series were used as crop model inputs. Corn yields and annual cumulative evapotranspiration values were calculated with the CERES-Maize crop model [18] for all of the investigated sites. Soil data of a loam profile and plant-specific data of a FAO-400 cultivar were retrieved from the database of the DSSAT ver. 3.5 software package [19] and given as inputs to the crop model. The simulation results obtained with the generated weather data were compared to those obtained with the observed data using the significance test and the acceptability index defined above.

Spatial Interpolation of the Generator Results.
Lam [20] and Burrough [21] have described a variety of quantitative interpolation methods suitable for computer algorithms, which can be divided into exact (proximal, B-splines, kriging, etc.) and approximate (trend surface analysis, Fourier series, distance-weighted averaging, etc.) methods. A simple, point-based approximate interpolation method with a distance-weighted averaging scheme (3) was chosen for this study. From the 38 available weather stations (Figure 2), twenty polygons (Table 1) were formed in the eastern and central regions of the US (three example polygons are presented in Figure 2), each defined by one central point and 3 to 9 border points. The MVWG was parameterized for every border station, and the parameter values were interpolated to the central point using the following formula:

P = \frac{\sum_{i=1}^{n} P_i / d_i}{\sum_{i=1}^{n} 1 / d_i},  (3)

where P is the parameter for the central point, P_i is the parameter for the ith border point, d_i is the distance between the central and the ith border point, and n is the number of border points. The interpolated parameters were then used for generating 100-year-long data series for the central points of the polygons. The means of the monthly values of the following climatic variables were compared between the observed and the generated data series: number of wet days, cumulative global radiation, average temperature, and cumulative precipitation. The synthetic and the observed data were compared using the significance test and the acceptability index defined previously, with a maximum score of 3 × 12 = 36, since the comparisons were carried out on a monthly basis for each polygon.
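The distance-weighted averaging scheme (3) can be sketched as an inverse-distance weighting applied to each generator parameter of the border stations independently. The power exponent (1 by default, i.e. weights 1/d_i), the parameter names, and the distances in the example are assumptions for illustration. Because the weights are positive and normalized, each interpolated parameter stays within the range of the border-point values.

```python
from collections.abc import Mapping, Sequence


def idw(values: Sequence[float], distances: Sequence[float], power: float = 1.0) -> float:
    """Inverse-distance-weighted mean with weights d_i ** -power."""
    if len(values) != len(distances) or not values:
        raise ValueError("need one distance per value and at least one value")
    if any(d < 0 for d in distances):
        raise ValueError("distances must be non-negative")
    for v, d in zip(values, distances):
        if d == 0:          # central point coincides with a border station
            return v
    weights = [d ** -power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def interpolate_parameters(border_params: Sequence[Mapping[str, float]],
                           distances: Sequence[float]) -> dict[str, float]:
    """Interpolate each named generator parameter to the central point."""
    names = border_params[0].keys()
    return {name: idw([p[name] for p in border_params], distances) for name in names}


if __name__ == "__main__":
    # Hypothetical parameters of three border stations and their distances (km).
    border = [
        {"wet_spell_alpha": 0.9, "tmax_mean_july": 30.1},
        {"wet_spell_alpha": 1.1, "tmax_mean_july": 31.4},
        {"wet_spell_alpha": 1.0, "tmax_mean_july": 29.8},
    ]
    print(interpolate_parameters(border, [120.0, 140.0, 135.0]))
```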
The effect of the number of border points (Figure 2; polygons 15 and 19 have the same central point, with 4 and 9 border points, respectively), as well as of the average distance and altitude difference between the central and the border points (Table 1), on interpolation efficiency (measured by AI) was also investigated.

Figure 2: Weather stations whose data were used for testing the spatial interpolation method (… denote central points, and A, B, and C denote border points). Three example polygons are presented (polygons 4, 15, and 19; see Table 1). Polygons #15 and #19 have the same central point but have 4 (solid line) and 9 (dashed line) border points, respectively.

The observed weather data, as well as the synthetic data series generated with the interpolation technique introduced above, were also used as crop model inputs for the central points of each polygon (Table 1). The rest of the model inputs were set according to Section 2.3. The simulated corn yield and evapotranspiration values obtained with the observed and the generated weather data were compared using Student's t-tests.

Minimum Data Series Length for Generator Calibration.
The dependence of the generator performance (acceptability indices) on the length of the data series used for parameterization is presented in Table 2. Though exact determination of the minimum required length is difficult, one can establish that for the generated meteorological variables even 15 years provide fair approximations in over 95% of the cases (even when significant and highly significant differences are given double and triple weights, respectively). For crop model simulations, an observed data series at least 20 years long is required for the same level of adequacy. The difference is likely caused by the nonlinear accumulation of errors in the CERES crop model. It should also be noted that a 30-year-long series yields AI = 100, that is, no significant differences from the observed series, even for crop yield and evapotranspiration calculations. When the distributions of the AIs over the months or the sites were investigated, no outlying month or station was found; the performance of the generator was practically the same for all months and stations. Thus, the results in Table 2 can be regarded as quite robust and independent of the location and of the period within the year. Concerning the calculated yields, nonsignificant, marginally significant, and highly significant test results correspond to average relative errors of 0-2%, 9-12%, and 16-20%, respectively. Crop modellers usually consider a 15% relative error in yield prediction to be an acceptable deviation. When artificial weather data based on 15-year series were used for yield predictions, the proportion of highly significant differences (average relative error greater than 15%) was only 2%. Based on these results, one can be highly confident that 15 years of observed weather data are enough for adequately parameterizing the MVWG generator.

Spatial Interpolation of the Generator Results.
The introduced spatial interpolation technique was not able to provide adequate parameters for generating realistic weather data for 5 polygons (Table 3). The failure of the interpolation can most likely be traced back to specific geographical features of these "problematic" polygons. The central point of the 3rd polygon, Waterloo, IA, USA (Table 1) …

Table 3: Acceptability indices obtained when the observed weather series were compared to the generated series for the central points of the investigated polygons. The parameters for generating artificial weather data were obtained by spatial interpolation using the data of the border points.

… Tables 3 and 4 are cross-checked. The interpolation technique performed worst for the 16th polygon (central point: Memphis, TN, USA) and was significantly worse than for the next worst case, the 6th polygon (central point: Atlanta, GA, USA). Despite this, there were no significant differences in the crop model results at Memphis, while in Atlanta significantly different yield and evapotranspiration values were obtained with generated and with observed weather data. This apparent inconsistency can be resolved as follows. In the applied crop model, the relationship between global radiation and biomass production is direct: the higher the radiation, the more biomass is produced. The effect of temperature, though more complex, is the opposite: the higher the temperature, the less biomass is produced, owing to increased heat stress as well as increased water stress induced by elevated transpiration rates. Therefore, radiation and temperature act in opposite directions. In Atlanta, the generated data underestimated global radiation and overestimated temperature, resulting in less biomass production when the generated weather data were used for the simulations (Table 4). This is most likely caused by less available sunlight and increased evapotranspiration.
In Memphis, the weather generator underestimated both radiation and temperature. Their effects counterbalanced each other; thus there was practically no difference (<0.3%) between the yields obtained with observed and with generated weather data. Note that even for the worst case (polygon #20), the relative error of the yield estimates is smaller than 15%. For 90% and 70% of the investigated polygons, the relative error is smaller than 10% and 5%, respectively.
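The relative-error thresholds above can be evaluated as in the following sketch. It assumes that the relative error for a polygon is the absolute difference between the mean simulated yields obtained with generated and with observed weather, divided by the latter; the function names and the example yields are hypothetical.

```python
from collections.abc import Sequence


def relative_error(observed_mean: float, generated_mean: float) -> float:
    """|generated - observed| / |observed|, as a fraction (0.15 means 15%)."""
    if observed_mean == 0:
        raise ValueError("observed mean must be nonzero")
    return abs(generated_mean - observed_mean) / abs(observed_mean)


def share_below(errors: Sequence[float], threshold: float) -> float:
    """Fraction of cases whose relative error is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical mean yields (t/ha) for four polygons: (observed, generated).
    pairs = [(8.0, 8.1), (7.5, 7.2), (9.0, 8.2), (6.0, 6.9)]
    errors = [relative_error(o, g) for o, g in pairs]
    for limit in (0.05, 0.10, 0.15):
        print(f"share below {limit:.0%}: {share_below(errors, limit):.2f}")
```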
For the rest of the investigated polygons (14 out of 20), the generator performance was quite convincing in estimating climatic parameters as well as yields. The investigated climatic parameters of the synthetic data series were statistically similar to those of the observed series. There were only two polygons with a more moderate performance, with AI values below 90 for radiation. Despite this, no significant differences were found for these polygons when the crop model results were investigated (Table 4).
No explicit relationship was found between the efficiency of the interpolation technique used and the investigated factors: the number of border points and the average distance and altitude difference between the central and the border points. Despite this, the following rule of thumb could be observed: the efficiency of the interpolation (expressed as AI, Table 3) decreases as the average distance or the average altitude difference between the central and the border points increases (Table 1). One should note, however, that the above analysis was performed for polygons with fairly even and low topography, where the network of existing stations may be representative enough. Regions with higher and more complex topography could behave differently, especially with regard to the interpolation technique presented above.

Conclusions
This study was carried out to determine the minimum data series length required for adequate parameterization of the MVWG stochastic weather generator using the data of 61 meteorological stations in the USA. Based on a 15-year-long measured data series, the MVWG is able to generate realistic weather data that can be used in meteorological as well as modeling applications with high (95-100%) accuracy. Though this series length is most likely insufficient for accurately modeling the occurrence of extremes, it seems to be enough for mimicking the averages of the most important climatic parameters.
The applicability of a simple, point-based approximate interpolation method with a distance-weighted averaging scheme for providing adequate weather generator parameters was also investigated. With the help of the presented interpolation technique, the MVWG could be used for generating realistic weather data for sites with no measurements situated in climatically and geographically homogeneous regions. The topography of the investigated Midwestern and southern regions of the US is generally less diverse than that of the western regions. The findings of this paper apply to fairly homogeneous terrain and should also be tested for mountainous and highly heterogeneous terrain.