Estimating Potential Evapotranspiration by Missing Temperature Data Reconstruction

This work studies the statistical characteristics of potential evapotranspiration calculations and their relevance within the water balance used to determine water availability in hydrological basins. The study had two purposes: first, to apply a missing data reconstruction scheme to weather stations of the Rio Queretaro basin; second, to reduce the uncertainty that the temperature data (mean, minimum, and maximum values) introduce into the evapotranspiration calculation, which is of paramount importance in obtaining the water balance of any hydrological basin. The reconstruction of missing data was carried out in three steps: (1) application of a 4-parameter sinusoidal regression to the temperature data, (2) linear regression on the residuals to obtain a regional behavior, and (3) estimation of missing temperature values for a certain year and a certain season within the basin under study; estimated and observed temperature values were then compared. Finally, using the obtained temperature values, the methods of Hamon, Papadakis, Blaney and Criddle, Thornthwaite, and Hargreaves were employed to calculate potential evapotranspiration, which was compared to the values observed at the weather stations. With the results obtained from this procedure, the surface water balance was corrected for the case study.


Introduction
The uncertainties associated with global reanalysis data or simulated output from atmosphere-ocean global circulation models (GCMs), hydrological process models (HPMs), and land surface models (LSMs) used to predict the terrestrial water and energy balance [1,2] have usually been evaluated by means of multiple statistical indicators, such as the efficiency, the skewness, and the mean squared error [3,4]. The most common calibration approaches are based on error minimization under the hypothesis that the only source of error in the measurements is of Gaussian type. Actually, there are different sources of error, including the uncertainty of the input data (e.g., precipitation and temperature) employed in the calibration-validation of the model, the structure of the model, and its associated parameters. These sources affect the error distribution but are not usually explicitly recognized; therefore, the calibration process can produce biased parameters [5].
Most hydrological models assume that precipitation is equally distributed through the basin. However, since precipitation is intercepted by the vegetation and returns to the atmosphere as vapor, reaching the ground surface only when the vegetation storage capacity is exceeded, the distribution of the water balance components can be modified spatially by the vegetation cover on the ground surface [6,7].
On the other hand, the proportional changes of evaporation over a time period can be related to modifications in land use, vegetation cover, and climate. Furthermore, some empirical models based on weather data and algorithms [8], as well as energy balance formulations, have been developed to estimate evapotranspiration, but those models show significant differences with respect to field measurements [9].
McMahon et al. in 2013 [10] identified five definitions of evapotranspiration, but only three are considered to be useful: actual, real, and potential evapotranspiration (PET). This paper is focused on estimating PET, and it is possible to the actual and the real, the latter defined as the maximum meteorologically evaporative power on land surface [11]. Potential evapotranspiration is a function of humidity, availability of surface energy, wind speed, and temperature; however, recent papers have questioned the ability of temperature to describe PET variability [12]. Evapotranspiration is an important component of the hydrological cycle in which water is returned to the atmosphere as water vapor [13,14]; it often leaves only a small fraction of usable renewable water [15] and is fundamental to understanding water resources systems [16], since it impacts several climate properties and processes [17].
Insufficient records for the hydrological period, natural variability mixed with the anthropogenic changes recognized in diverse climate change studies, and future effects should all be considered within hydrological studies as well as in water resource management programs. Such questions are addressed by different hydrologists for the management of hydrological resources [18].
In this context, the present work studies the uncertainty of the potential evapotranspiration calculation, reconstructing temperature data from traditional weather stations located within the Rio Queretaro basin, which is of a semidesert type and where daily records of climate variables are available. The next sections describe the study area, the current status of the temperature data, and the method for reconstructing the temperature data used in estimating PET for the Rio Queretaro basin.

Study Zone:
The Rio Queretaro Basin.
The Rio Queretaro basin is part of the Lerma-Santiago regional basin. It is located in the central part of Mexico, between latitudes 20°15′ and 21°00′ North and longitudes 100°05′ and 100°40′ West. The Rio Queretaro flows to the west, with the Juriquilla (North) and El Pueblito (South) streams as its main tributaries. The water balance of the basin shows deficits from 129 to 106 × 10⁶ m³ during the 2003-2010 period; the annual runoff was also reduced (DOF, Mexican Government, 2003, 2010). The basin has the shape of a long leaf, 20 km long and 14 km wide. The area is 2,142.7 km², composed of 36 subbasins whose surface varies from 214 km² to 15 km²; 13 of these units have an average slope of 10%, and only 7 have less than 6%. The mean annual precipitation in the Rio Queretaro basin is 550 mm, 30% under the national average. The rain and the regional climate are influenced by the topography, by the slopes of the Pacific Ocean and the Gulf of Mexico, and locally by the limits of the city of Querétaro.
The Rio Queretaro basin covers the municipalities of El Marqués, Colón, Corregidora, Pedro Escobedo, Huimilpan, Amealco de Bonfil, and Querétaro. A high percentage of the state's population is concentrated in the basin, which leads to an excessive use of its natural resources, in particular water, ignoring that environmental protection is an important indicator of sustainable development. Figure 1 shows the areal extent of the basin and the location of the weather stations selected for the analysis.

Database.
For the present study, climatological information from 11 weather stations administered by the Comisión Nacional del Agua (CONAGUA) was available. Gaps in the information are mostly associated with sensor failures, with the general operation of the weather station, or with human causes, and they may last part of a day or several days, months, or years.
The data available for the reconstruction of mean, minimum, and maximum daily temperature and evaporation were extracted from the CLICOM database (CONAGUA 2007), updated in February of 2012, as well as from the climatological measurements registered at the weather stations located within the basin.
As selection criteria, only weather stations with at least 20 years of records and operating at the time of the study were considered. Additionally, the weather stations outside the basin were reviewed, and only those located within a 10-kilometer radius of the watershed divide were considered; this analysis was carried out with the software Arc View 10 (ESRI). Figure 2 shows the temporal distribution of the daily maximum temperature data, where one axis corresponds to the years of record and the other to the weather stations. White represents the existence of 0 to 50 days of maximum and minimum temperature records in a year; black represents the existence of 300 to 366 days of maximum and minimum temperature records at the weather station. From this analysis, the period selected to reconstruct the missing data is 1986-2008 (between the red parallel lines).
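The screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout (date, maximum, minimum, with None for a missing value), and the intermediate "partial" class for years with 51 to 299 days are hypothetical and do not come from the study; only the 0-50 and 300-366 day classes, the 20-year minimum, and the 10 km radius are taken from the text.

```python
from collections import Counter
from datetime import date


def yearly_coverage(records):
    """Count, per year, the days that have both a maximum and a minimum value.

    `records` is an iterable of (date, tmax, tmin); None marks a missing value
    (hypothetical layout, chosen for this sketch).
    """
    counts = Counter()
    for day, tmax, tmin in records:
        if tmax is not None and tmin is not None:
            counts[day.year] += 1
    return dict(counts)


def coverage_class(n_days):
    """Map a yearly day count to the classes used for the data-pattern figure."""
    if n_days <= 50:
        return "sparse"      # 0 to 50 days (white in the figure)
    if n_days >= 300:
        return "complete"    # 300 to 366 days (black in the figure)
    return "partial"         # assumed label for everything in between


def station_qualifies(years_with_data, operating, inside_basin, km_to_divide,
                      min_years=20, max_km=10.0):
    """Apply the selection criteria: record length, operation, and location."""
    if len(set(years_with_data)) < min_years or not operating:
        return False
    return inside_basin or km_to_divide <= max_km
```

For example, a station outside the basin with records for every year from 1986 to 2008 that lies 8 km from the watershed divide would pass this filter, while the same station at 12 km would not.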

Reconstruction of Missing Temperature Data.
The procedure applied for reconstructing the missing temperature data within the 1986-2008 period for the 11 selected weather stations is explained below.

Air Temperature.
The methodology for reconstructing the data was based on the one presented by Saito and Simunek in 2009 [19], chosen after an extensive literature review that included the evaluation of principal components and autoregressive models of a given order [20,21]. Saito and Simunek proposed a synthesis and comparison of methods, and the real daily cycle was used when only daily data were available (with maximum and minimum information). In this paper a sine function is used instead of a cosine function; the modified method for reconstructing daily temperature data is described next.
Method for Obtaining Residuals.
The method uses the correlation between two or more neighboring homogeneous weather stations to estimate missing data in periods longer than a 1-day time interval. Nonetheless, a simple correlation is not convenient due to the sinusoidal behavior of temperature; additionally, with this method, the local effects on temperature are eliminated. In order to make the correlation between two weather stations possible, the temperature residuals are correlated as follows.
(1) The daily mean temperature is normalized with a sinusoidal function with 4 parameters (Sigma Plot 10), and the normalization covers an annual cycle.
The sinusoidal function has the form see [1].
(2) Once the daily mean temperature is normalized, the residuals between the temperatures registered each day and the normalized daily mean temperature are calculated. These residuals of the daily cycles are then also normalized by applying a sinusoidal function with 4 parameters, in order to obtain the function that describes the daily mean residuals.
(3) The differences between the normalized daily mean temperature and the normalized residuals at a daily scale are calculated. These differences are used to correlate the weather stations.
(4) The statistical correlation provides the value needed in the reconstruction of missing data. For this purpose, the software Statgraphics XV.II was employed to obtain a table of Pearson product-moment correlations between the differences for all the weather stations.
(5) With the differences, a linear function is then obtained between the station to be reconstructed and the one with which it has the highest correlation. In this step the database of maximum and minimum temperature is reconstructed.
(6) Finally, the calculated residuals are added to the adjusted means of the daily cycle in order to obtain a complete database for the corresponding schedule.
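The steps above can be sketched in Python as follows. This is a simplified illustration, not the procedure as run in Sigma Plot and Statgraphics. It rests on several assumptions: the sinusoid is written as y0 + a·sin(2πt/b + c), a common 4-parameter form; the period b is held fixed at 365.25 days so that the fit becomes a linear least-squares problem; and the second sinusoidal fit of the residuals (steps 2 and 3) is collapsed into a single linear regression between the residuals of the two stations. All function names are hypothetical.

```python
import math

PERIOD_DAYS = 365.25  # assumption: annual period held fixed instead of fitted


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_annual_sine(days, temps):
    """Least-squares fit of y0 + a*sin(2*pi*t/PERIOD_DAYS + c); returns (y0, a, c)."""
    w = 2.0 * math.pi / PERIOD_DAYS
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in days]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, temps)) for i in range(3)]
    y0, p, q = solve_linear(ata, aty)
    # a*sin(wt + c) = p*sin(wt) + q*cos(wt) with p = a*cos(c), q = a*sin(c)
    return y0, math.hypot(p, q), math.atan2(q, p)


def sine_value(params, t):
    """Evaluate the fitted annual cycle on day t."""
    y0, a, c = params
    return y0 + a * math.sin(2.0 * math.pi * t / PERIOD_DAYS + c)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def reconstruct(days, target, neighbor):
    """Fill the None entries of `target` from the residuals of `neighbor`."""
    have = [i for i, v in enumerate(target) if v is not None]
    fit_t = fit_annual_sine([days[i] for i in have], [target[i] for i in have])
    fit_n = fit_annual_sine(days, neighbor)
    res_n = [v - sine_value(fit_n, t) for t, v in zip(days, neighbor)]
    res_t = [target[i] - sine_value(fit_t, days[i]) for i in have]
    slope, intercept = fit_line([res_n[i] for i in have], res_t)
    # Observed values are kept; gaps get the cycle plus the regressed residual.
    return [v if v is not None else sine_value(fit_t, t) + intercept + slope * r
            for t, v, r in zip(days, target, res_n)]
```

The sketch assumes that the neighboring station has no gaps on the days being filled; in practice the neighbor would be the best-correlated station that has data on those days.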

Statistical Analysis.
For the generic description of the statistical behavior of the data, it is necessary to determine central tendency and dispersion measurements of the residuals, such as the mean, standard deviation, variance, and kurtosis, among others. The analyses were carried out using SPSS and Statgraphics Centurion XV.II.
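As an illustration of these descriptive measures, the following Python sketch computes them with the standard library. It is not the SPSS or Statgraphics computation: the kurtosis here is the simple moment-based excess kurtosis, whereas statistical packages usually report a bias-corrected estimator, so the values may differ slightly for small samples. The function name `describe` is hypothetical.

```python
import math
import statistics


def describe(values):
    """Return mean, sample variance, standard deviation, and excess kurtosis."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    # Moment-based excess kurtosis (assumption: no small-sample correction).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

Applying `describe` to the observed series and to the reconstructed series of the same station gives the kind of side-by-side comparison of mean and dispersion that the study reports.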

Evapotranspiration Estimation.
To estimate potential evapotranspiration, the following methods were taken into account. The method of Thornthwaite was developed from precipitation and runoff data for various drainage basins. The simplified Hargreaves method [22] was designed to assess potential evapotranspiration and requires only temperature and solar radiation data. The Blaney and Criddle method, intended for arid and semiarid areas, considers the water use of a crop under the assumption of a water deficit in the soil; evapotranspiration is a function of temperature. For the case study, an extreme scenario of monthly percentages of daylight hours for latitude 21° North (the location of the basin) was evaluated. Due to the limited availability of climatic variables in traditional weather stations, the method of Hamon for the estimation of temperature data was employed. Thornthwaite developed an empirical method using data of daily average temperature and average number of light hours. Several authors found a similarity between the Thornthwaite empirical relationship and the expression for estimating the saturation vapor pressure. The Papadakis method is based on the vapor saturation deficit; thus the expression given by Hamon was obtained with the requirement of data for relative humidity and temperature [23].
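For orientation, three of these temperature-based methods can be sketched in Python using their commonly cited textbook forms. These are assumptions for illustration and not necessarily the exact variants or coefficients applied in this study: Hargreaves is written in the Hargreaves-Samani form with extraterrestrial radiation expressed as equivalent evaporation (mm/day), Blaney and Criddle in the simplified FAO form with p as the mean daily percentage of annual daytime hours, and Thornthwaite in its original monthly form. The Hamon and Papadakis methods are omitted, and all function and parameter names are hypothetical.

```python
import math


def hargreaves_samani(ra_mm_day, t_mean, t_max, t_min):
    """Reference evapotranspiration in mm/day from temperature and radiation."""
    t_range = max(t_max - t_min, 0.0)
    return 0.0023 * ra_mm_day * (t_mean + 17.8) * math.sqrt(t_range)


def blaney_criddle(p_percent, t_mean):
    """Simplified FAO Blaney-Criddle estimate in mm/day.

    `p_percent` is the mean daily percentage of annual daytime hours,
    for example about 0.27 near the equinoxes.
    """
    return p_percent * (0.46 * t_mean + 8.0)


def thornthwaite(monthly_t, day_length_h, days_in_month):
    """Monthly Thornthwaite PET in mm/month for twelve monthly mean temperatures."""
    warm = [max(t, 0.0) for t in monthly_t]  # months at or below 0 C contribute nothing
    heat_index = sum((t / 5.0) ** 1.514 for t in warm)
    if heat_index == 0.0:
        return [0.0] * len(warm)
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    return [16.0 * (hours / 12.0) * (days / 30.0) * (10.0 * t / heat_index) ** a
            for t, hours, days in zip(warm, day_length_h, days_in_month)]
```

Feeding the same reconstructed temperature series to each function makes the spread between methods visible, which is the comparison the study draws for station 22006.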

Reconstruction of Missing Data
Air Temperature.
Maximum and minimum data for each weather station were analyzed with the computer program Sigma Plot 12 (SYSTAT). Figure 3 shows the maximum and minimum temperature data recorded by the El Pueblito (22006) station plotted against the corresponding date on a Julian time scale; this was done in order to normalize the data.
With the normalized temperature, the residual temperatures for each season were calculated; in this way, minimum and maximum residual data were obtained for each station. Figure 4 shows the residuals for the maximum and minimum temperature of station 22006.
A 4-parameter sinusoidal regression was applied to the residuals in order to standardize them for all the seasons. The correlation between these standardized residuals and the observed temperature data in all the weather stations was computed using the computer package Statgraphics XV.II, and the results are shown in Table 1.
Even though some stations have correlations below 0.5, the standard deviation of these data diminished in the same manner as that of data with greater correlation. These correlations were employed to estimate the residuals for missing data; the reconstructed maximum (a) and minimum (b) temperature data for station 22006 are shown in Figure 5. The red line represents the data recorded at the weather station, and the black line corresponds to the estimates obtained with the proposed method; this procedure was performed for all the selected weather stations.
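The correlation step can be illustrated with a short Python sketch that computes Pearson product-moment correlations between stations and picks the best-correlated neighbor for a given station. It is a minimal sketch, not the Statgraphics output: the function names are hypothetical, missing days are assumed to be marked with None, and only days present in both series are used for each pair.

```python
import math


def pearson(x, y):
    """Pearson correlation using only the days where both series have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def best_neighbor(target_id, series_by_station):
    """Return (station_id, r) of the station most correlated with the target."""
    target = series_by_station[target_id]
    scores = {sid: pearson(target, s)
              for sid, s in series_by_station.items() if sid != target_id}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The station returned by `best_neighbor` would then supply the residuals for the linear function used to fill the gaps of the target station.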

Statistical Analysis.
Statistics for the observed and reconstructed temperature data are shown in Table 2. The mean indicates the value around which the data are grouped; the standard deviation and the variance give the average distance of the data from the mean; and the kurtosis indicates how the temperature data fluctuate around a maximum value. With the reconstructed database, the statistics changed in comparison to those of the observed data because the number of observations increased; as a result, the mean value of the data increased and the standard deviation decreased, which will reduce the uncertainty in the potential evapotranspiration calculations and consequently in the water balance. Observed and reconstructed temperature data of some weather stations are shown as boxplots in Figure 6.
Potential evapotranspiration was calculated by the different methods with the reconstructed temperature values for the 1986-2008 period, resulting in the distribution shown in Figure 7, which represents the evapotranspiration estimates for station 22006, "El Pueblito."

Figure 2: Data pattern of the number of available maximum and minimum daily temperature records per year for the weather stations employed.

Figure 3: Maximum and minimum daily temperature for El Pueblito (ID 22006) weather station.

Figure 4: Residuals of maximum and minimum daily temperature for El Pueblito (ID 22006) weather station.

Table 1: Correlations of observed temperature data (a) and standardized residuals (b).

Table 2: Statistics of observed and reconstructed temperature data.

Figure 6: Observed and reconstructed data in a boxplot figure.