Comparative Study of M5 Model Tree and Artificial Neural Network in Estimating Reference Evapotranspiration Using MODIS Products

Reference evapotranspiration (ET_o) is one of the major parameters affecting the hydrological cycle. Satellite images can help compensate for a lack of reliable weather data. This study aimed to determine ET_o using land surface temperature (LST) data acquired from the MODIS sensor. LST data were used as inputs to two data-driven models, an artificial neural network (ANN) and the M5 model tree, to estimate ET_o, and the results were compared with ET_o calculated by the FAO-Penman-Monteith (FAO-PM) equation. Climatic data from five weather stations in Khuzestan province, which is located in southwestern Iran, were used to calculate ET_o. LST data extracted from the corresponding points of the MODIS images were used to train the ANN and the M5 model tree. Three stations (Amirkabir, Farabi, and Gazali) were selected for building the models and two stations (Khazaei and Shoeybie) for testing. At Khazaei station, the coefficient of determination (R²) between ET_o calculated by FAO-PM and ET_o estimated by the ANN and the M5 model tree was 0.79 and 0.80, respectively. The corresponding R² values for Shoeybie station were 0.86 and 0.85. Overall, the results showed that both models can properly estimate ET_o from LST data derived from the MODIS sensor.


Introduction
The decline in the availability of water for agriculture is one of the most serious challenges facing human life and has affected agricultural production in some arid and semiarid regions around the world. Determining future water demands of the agricultural sector involves computing several factors such as runoff, groundwater, precipitation, and evapotranspiration (ET) [1]. ET is the combination of two different processes: evaporation from the soil surface and crop transpiration [2].
Accurate and reliable estimates of ET are necessary to determine temporal variations in irrigation requirements, improve the allocation of water resources, and evaluate the effect of changes in land use and crop patterns on the water balance [3]. Because direct measurement of ET is difficult [4], this parameter is estimated from the reference evapotranspiration (ET_o) and a crop coefficient (K_c) for a specific crop [5].
Therefore, the calculation of ET_c (evapotranspiration of the given crop), and subsequently of the crop water requirement as irrigation water, depends on ET_o estimates. ET_o is defined as the rate of ET from a reference surface that is assumed to be covered with a hypothetical grass with specific characteristics [2]. A large number of methods have been proposed to model ET_o [2,6,7]; the majority of these methods are complicated and rely on weather data such as temperature, solar radiation, wind speed, and air humidity [1,8]. ET_o is principally calculated by physically based equations (e.g., the Penman-Monteith (PM) equation) or by empirical relationships between meteorological variables.
The FAO-Penman-Monteith (FAO-PM) method is currently recommended as the standard technique for computing ET_o. This method employs a variety of complex equations, which are based on weather data, including air temperature, humidity, radiation, and wind speed [9]. The requirement of various weather data, which are not mainly [...] et al. [21] compared ET_o calculated by ANN with the ET_o values of radiation and thermal models and declared that ANN can provide more accurate results than empirical models.
Because of its simple geometric structure and the fast-computing set of IF-THEN rules it provides, which are easily understandable [31], the model tree has been used in different scientific studies. Pal and Deswal [1] evaluated the capability of the M5 model tree to model ET_o at the daily scale, and the results showed that the M5 model tree can be properly used in this context. Rahimikhoob et al. [11] [...]

Study Area and Data.
The farms of the Sugarcane Development Project, which are located in Khuzestan province, Iran, were considered as the study area in the present research. Khuzestan province lies in the southwestern part of the country, with an area of approximately 63,238 km², between longitudes 47°38′E and 50°32′E and latitudes 29°57′N and 33°00′N. The province borders Iraq to the west and the Persian Gulf to the south. According to the de Martonne climate classification, Khuzestan province has a semiarid climate. The average temperature during the winter season is nearly 9.14 °C, and its minimum occasionally falls a few degrees below 0 °C. The average temperature in summer is approximately 31.2 °C, whereas its maximum sometimes exceeds 50 °C. The climatic variables required to estimate ET_o in this study were collected from five weather stations (Amirkabir, Farabi, Gazali, Khazaei, and Shoeybie) for 2006 and 2007, in the periods when satellite images existed. The meteorological data consisted of daily observations of maximum and minimum temperature, sunshine duration, relative humidity, and wind speed, which were used to estimate ET_o by the FAO-PM equation. Figure 1 illustrates the location of the weather stations in the study area.

MODIS Images.
In the current study, daytime LST (LST_day) and nighttime LST (LST_night) data obtained from the MODIS sensor were used as inputs to the M5 model tree and ANN models. The MODIS sensor, which is on board the Terra and Aqua satellites, provides images in 36 spectral bands between 0.62 and 14.385 µm. The spatial resolution of the sensor is 250 m, 500 m, or 1 km, depending on the band. In the thermal-infrared (TIR) bands, MODIS has a 1 km spatial resolution [19]. The sensor performs regular daily observations of the sea, land, and atmosphere at the global scale and has extensive and continuous spectral and spatial coverage. MODIS was launched aboard the Terra and Aqua satellites in 1999 and 2002, respectively. Both satellites have a sun-synchronous, near-polar, circular orbit; Terra crosses the equator from north to south every day, while Aqua moves in the opposite direction, from south to north. The United States National Aeronautics and Space Administration (NASA) is responsible for processing and providing MODIS data. NASA supplies the data in a large variety of products applicable to land, ocean, and atmosphere uses [4]. The data and products of the MODIS sensor are available on NASA's website (http://modis.gsfc.nasa.gov/). The images selected for this study are the level three (L3) data of the MODIS sensor (MOD11A1) from the Terra satellite. This product provides daytime and nighttime LST data at a 1 km spatial resolution, gridded in the sinusoidal projection [33]. The MODIS LST data are derived from two TIR channels, 31 (10.78-11.28 µm) and 32 (11.77-12.27 µm), using the split-window algorithm [34].
A total of 428 MODIS images of the MOD11A1 product from the Terra satellite, covering the study area in 2006 and 2007, were retrieved from the Land Processes Distributed Active Archive Center (LP DAAC) (http://lpdaac.usgs.gov). These images were selected because they were acquired under cloudless, fair-weather conditions. The coordinate system was changed from sinusoidal to UTM (datum WGS84) using the MODIS Reprojection Tool (MRT). Afterward, the locations of the study stations were projected onto the images, and LST_day and LST_night were extracted from the corresponding pixels. The LST values at the stations were extracted with Hawth's Analysis Tools (HAT) in ArcGIS 9.3. The LST values extracted from the images were in Kelvin, so they were converted to Celsius before being used as model inputs.
In the present study, all LST values extracted from the MODIS images at three stations (Amirkabir, Farabi, and Gazali) were used to construct and train the ANN and the M5 model tree, and the two other stations (Khazaei and Shoeybie) were used to test these models.

Artificial Neural Network (ANN).
ANNs are powerful machine-learning techniques that are extensively used for numerical prediction and classification [35]. ANNs are also effective tools for modeling nonlinear systems and need fewer input data than customary mathematical approaches [27]. A neural network processes data through a procedure inspired by biological neural systems. It consists of an interconnected group of neurons arranged in layers, and data processing is carried out by the neurons and their connections. ANNs include one input layer, one output layer, and one or more hidden layers; the neurons of each layer are joined to all neurons of the previous layer by weighted connections. The number of neurons in each hidden layer is determined by the user. The input vectors are received by the neurons of the input layer, and the values are conveyed across the connections to the next layer of processing elements. This process continues up to the output layer. ANN modeling is carried out in two stages, training and testing. In the training stage, the network receives the input data and attempts to convert the inputs into the desired outputs. The connection weights of the network neurons are determined in this stage; in the testing stage, these weights are examined using different datasets.
A multilayer perceptron (MLP) ANN with one input layer, one hidden layer, and one output layer was used to estimate ET_o from LST_day, LST_night, mean LST (LST_mean), and day of year (DOY) as inputs. The transfer function in the networks was log-sigmoid. The accuracy of the networks was evaluated at each training epoch through the mean squared error (MSE). The backpropagation (BP) algorithm was employed to train the MLP neural network. Levenberg-Marquardt (LM), a second-order nonlinear optimization technique, was used with an early stopping criterion to improve the training speed and efficiency. The target error for training was 10⁻⁴. With this target error, the maximum number of iterations for all ANN models was set to 1000.
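As a minimal sketch of the network structure described above, the following pure-Python code runs a forward pass through an MLP with four inputs and one log-sigmoid hidden layer and computes the MSE. The linear output neuron, the absence of input scaling, and all weight values are assumptions made for illustration; the study does not report them, and the training step (BP with LM) is not shown.

```python
import math


def logsig(x):
    """Log-sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> log-sigmoid hidden layer -> linear output.

    The linear output neuron is an assumption, not stated in the study.
    """
    hidden = [
        logsig(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def mse(targets, predictions):
    """Mean squared error used to monitor training accuracy."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical example: 4 inputs (LST_day, LST_night, LST_mean, DOY), 3 hidden nodes.
x = [45.0, 25.0, 35.0, 196.0]
w_hidden = [[0.01, 0.01, 0.01, 0.001] for _ in range(3)]  # placeholder weights
estimate = mlp_forward(x, w_hidden, [0.0, 0.0, 0.0], [2.0, 2.0, 2.0], 1.0)
print(round(estimate, 3))
```

In practice the inputs would normally be scaled to a common range before training, since DOY and LST differ by an order of magnitude.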

M5 Model Tree.
The M5 model tree was introduced by Quinlan in 1992 [36]. The model is based on a binary decision tree with linear regression functions at the terminal nodes (leaves), which establish a relationship between the independent and dependent variables [11]. Tree-based models are built with a divide-and-conquer method for constructing a relationship between independent and dependent variables [37]. Tree models can also be used for both qualitative and quantitative data [9].
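To make the structure concrete, the sketch below represents a model tree as binary split nodes with a linear regression model at each leaf and shows how a prediction is obtained. The `Node` class, the split threshold, and the coefficients are hypothetical and chosen only for illustration; they are not the tree developed in this study.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Either an internal split (feature, threshold) or a leaf (coeffs)."""
    feature: Optional[int] = None            # index of the attribute tested
    threshold: Optional[float] = None        # split value
    left: Optional["Node"] = None            # branch for x[feature] <= threshold
    right: Optional["Node"] = None           # branch for x[feature] > threshold
    coeffs: Optional[Sequence[float]] = None  # leaf model: (a0, a1, a2, ...)


def predict(node, x):
    """Follow the IF-THEN splits to a leaf, then apply its linear model."""
    while node.coeffs is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    intercept, *slopes = node.coeffs
    return intercept + sum(a * xi for a, xi in zip(slopes, x))


# Hypothetical tree with one split and two leaf models.
tree = Node(
    feature=0,
    threshold=20.0,
    left=Node(coeffs=(1.0, 0.1, 0.0)),
    right=Node(coeffs=(2.0, 0.2, 0.1)),
)
print(predict(tree, (10.0, 5.0)), predict(tree, (30.0, 10.0)))
```

Because every leaf is an explicit linear equation, the fitted model can be read and applied by hand, which is the interpretability advantage usually attributed to model trees.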
Building the M5 model tree involves two separate steps [36,38]. The first consists of dividing the data into subsets to construct the model tree. The dividing criterion treats the standard deviation of the values that reach a node as a measure of the error at that node and calculates the expected reduction in this error as a result of testing each attribute at that node. The standard deviation reduction (SDR) is given as follows [1]:

$$\mathrm{SDR} = \mathrm{SD}(T) - \sum_{i} \frac{|T_i|}{|T|}\,\mathrm{SD}(T_i)$$

where T is the set of examples that reaches the node, T_i is the subset of examples that have the ith outcome of the potential test, and SD is the standard deviation [39]. Due to the branching process, data in child nodes (subtrees or smaller nodes) have a lower SD than data in parent nodes (greater nodes). After examining all possible splits, the one with the maximum expected error reduction is chosen. This division often creates a large tree-like structure that is prone to overfitting. In order to avoid overfitting, in the second step the overgrown tree is pruned, and the pruned subtrees are replaced with linear regression functions.

Figure 2: (a) Example of M5 model tree, splitting the input space x_1 × x_2 by the M5 model tree algorithm and (b) diagram of model tree with six linear regression models at the leaves [11].

In the FAO-PM equation, ET_o is the reference evapotranspiration (mm d⁻¹), Δ is the slope of the saturation vapor pressure curve (kPa °C⁻¹) at the daily mean air temperature (°C), R_n and G are the net radiation and the soil heat flux density (MJ m⁻² d⁻¹), γ is the psychrometric constant (kPa °C⁻¹), T is the daily mean temperature (°C), u_2 is the mean wind speed (m s⁻¹), e_s is the saturation vapor pressure (kPa), and e_a is the actual vapor pressure (kPa) [2].
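The SDR splitting criterion of the M5 model tree can be sketched as follows. The code evaluates every candidate threshold on a single attribute and keeps the one with the largest standard deviation reduction. The use of the population standard deviation, the midpoint thresholds, and the sample values are assumptions made for illustration.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction: SD(T) - sum(|T_i| / |T| * SD(T_i))."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return (threshold, sdr) for the best binary split on one attribute.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = sdr(ys, [left, right])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical data: attribute values (e.g., LST in degrees C) and target ET_o values.
lst = [10.0, 12.0, 30.0, 32.0]
eto = [1.0, 1.0, 5.0, 5.0]
print(best_split(lst, eto))
```

A full M5 implementation repeats this search over all attributes at every node, then prunes the grown tree and fits the leaf regressions; those steps are omitted here.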
In the current study, the daily values of Δ, R_n, e_s, and e_a were calculated using the equations given by Allen et al. [2]. To calculate R_n, the daily solar radiation (R_s) is needed. Allen et al. [2] proposed that R_s be calculated using the Angstrom formula, which relates solar radiation to extraterrestrial radiation (R_a) and relative sunshine duration. The formula has the following form:

$$R_s = \left(a + b\,\frac{n}{N}\right) R_a$$

where n and N are, respectively, the actual daily sunshine duration and the daily maximum possible sunshine duration.

In the error statistics used to evaluate the models, the reference values are the ET_o values computed by FAO-PM, with their average as the reference mean, and the estimated values are the ET_o values computed by ANN or the M5 model tree, with their average as the estimated mean.
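The error statistics reported in this paper (R², RMSE, and MAE) can be computed as in the sketch below. The formulas are the standard definitions and are assumed here, since the original equations are not reproduced in the text; in particular, R² is taken as the squared Pearson correlation, which is consistent with the use of both the reference mean and the estimated mean. The sample values are hypothetical.

```python
import math


def rmse(reference, estimated):
    """Root mean squared error between FAO-PM and model ET_o values."""
    n = len(reference)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(reference, estimated)) / n)


def mae(reference, estimated):
    """Mean absolute error between FAO-PM and model ET_o values."""
    n = len(reference)
    return sum(abs(o - p) for o, p in zip(reference, estimated)) / n


def r_squared(reference, estimated):
    """Squared Pearson correlation (assumed definition of R^2)."""
    o_mean = sum(reference) / len(reference)
    p_mean = sum(estimated) / len(estimated)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(reference, estimated))
    var_o = sum((o - o_mean) ** 2 for o in reference)
    var_p = sum((p - p_mean) ** 2 for p in estimated)
    return cov ** 2 / (var_o * var_p)


# Hypothetical daily ET_o values in mm/day.
fao_pm = [2.0, 4.0, 6.0, 8.0]
model = [3.0, 5.0, 7.0, 9.0]
print(rmse(fao_pm, model), mae(fao_pm, model), r_squared(fao_pm, model))
```

The example illustrates why R² alone is not sufficient: a constant bias of 1 mm/day leaves R² at 1 while RMSE and MAE both report the offset.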

Results and Discussion
In the first stage of the study, before calculating ET_o, the correlation between LST obtained from the MODIS sensor and air temperature was determined. Accordingly, scatter plots of LST values from the MODIS images versus air temperature are presented for the different stations in Figures 3 and 4.
As can be seen in Figure 3, there is a high correlation between LST_night and minimum air temperature (T_min). The greatest R² between LST_night and T_min was observed at Shoeybie station (0.91); the lowest R² was observed at Amirkabir station (0.87). For all stations combined, R² between these two parameters was 0.88. In Figure 3, the symmetrical distribution of points around the 1:1 line indicates a strong correlation between the two parameters.
In Figure 4, the relation between LST_day and maximum air temperature (T_max) shows that the greatest R² between LST_day and T_max occurs at Amirkabir and Khazaei stations (R² = 0.91); the lowest value is associated with Gazali station (R² = 0.81). For all study stations combined, R² was 0.78. Overall, it can be concluded that there is a strong relationship between these two parameters. The scatter of points around the 1:1 line in Figure 4 shows that LST_day overestimates T_max. This overestimation is more pronounced for temperatures over 30 °C. For all examined datasets, the correlation between LST_night and T_min is higher than the correlation between LST_day and T_max for the thermal products of the MODIS sensor. Previous studies on the estimation of air temperature have similarly confirmed that the correlation between LST_night and T_min is greater than that between LST_day and T_max [12,19].
In the second stage, ANN models and the M5 model tree were employed to calculate ET_o using LST values extracted from the MODIS sensor (LST_day and LST_night). In addition to LST_day and LST_night, two further parameters, mean LST (LST_mean) and day of year (DOY), were used as model inputs in order to increase the accuracy of the ET_o estimates. Thus, for both the ANN model and the M5 model tree, four parameters (LST_day, LST_night, LST_mean, and DOY) were taken as inputs, and ET_o calculated from weather data using the FAO-PM equation was taken as the output.
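The assembly of the four model inputs for one station-day can be sketched as follows, including the Kelvin-to-Celsius conversion applied to the extracted LST values. The function names, the dictionary layout, and the definition of LST_mean as the simple average of the daytime and nighttime values are assumptions for illustration; the study does not state how LST_mean was computed.

```python
from datetime import date


def kelvin_to_celsius(lst_kelvin):
    """Convert an LST value from Kelvin to degrees Celsius."""
    return lst_kelvin - 273.15


def build_inputs(acquisition_date, lst_day_k, lst_night_k):
    """Return the four model inputs for one station-day.

    LST_mean is assumed to be the average of daytime and nighttime LST.
    """
    lst_day = kelvin_to_celsius(lst_day_k)
    lst_night = kelvin_to_celsius(lst_night_k)
    return {
        "LST_day": lst_day,
        "LST_night": lst_night,
        "LST_mean": (lst_day + lst_night) / 2.0,
        "DOY": acquisition_date.timetuple().tm_yday,
    }


# Hypothetical pixel values for a mid-July image.
row = build_inputs(date(2006, 7, 15), 318.15, 298.15)
print(row)
```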
To generate the ANN model, the training data (all data from Amirkabir, Farabi, and Gazali stations) were used, and the number of nodes in the hidden layer was obtained by trial and error. The number of nodes in the hidden layer was varied from 1 to 12; based on the error statistics, a network with 3 nodes in the hidden layer gave the lowest error in comparison with the ET_o values calculated by the FAO-PM method. Table 1 presents the error statistics for the neural network with three nodes in the hidden layer. In Figure 5, ET_o values calculated by the ANN for all training data are plotted against ET_o values obtained by the FAO-PM equation. As shown in Figure 5, R² = 0.79 demonstrates that the ANN provided an acceptable coefficient of determination in approximating ET_o. In this figure, the scatter of points around the 1:1 line increases at large ET_o values, which indicates overestimation or underestimation of the data in this range. To test the ANN and the M5 model tree, data from Khazaei and Shoeybie stations were used. Figure 7 shows the scatter plots of ET_o calculated by the ANN and the M5 model tree versus the FAO-PM values for these stations.
The R² values of ET_o calculated by the ANN against the FAO-PM ET_o values for Khazaei and Shoeybie stations were 0.79 and 0.86, respectively. The corresponding R² values for the M5 model tree were 0.80 and 0.85, respectively. In both methods, the ET_o values were properly distributed around the 1:1 line. For smaller ET_o values, there was better agreement between the values calculated by the two models and the FAO-PM values than for greater ET_o values. In other words, underestimation or overestimation of ET_o is more associated with warm seasons than with cold seasons. The time series of calculated ET_o values are illustrated for the two test stations in Figure 8. As can be seen in this graph, the maximum ET_o values from the two models were not entirely compatible with the FAO-PM values. This could be due to the omission of some weather parameters affecting the ET_o rate (e.g., wind speed, solar radiation, and relative humidity) from the inputs of the ANN model and the M5 model tree.
Table 2 presents the error statistics of ET_o estimation by the ANN and the M5 model tree for the two test stations. According to the RMSE and MAE statistics, both methods performed equally well. Similar results have been reported on the close performance of ANN and M5 models in rainfall-runoff modeling [43] and river flow forecasting [44]. The main advantage of the M5 model tree is that it provides the results in a simple and comprehensible form of regression equations [1], which can be easily used in the calculation of ET_o.
In general, LST_day and LST_night values obtained from the MODIS sensor, used with either the ANN or the M5 model, can be properly utilized in the estimation of ET_o.

Conclusion
In this study, attempts were made to calculate ET_o using LST values extracted from images of the MODIS sensor. For this purpose, ANN and the M5 model tree were applied. According to the error values, both models showed close performance in determining ET_o. Considering other significant variables affecting the ET process, such as relative humidity and wind speed, as model inputs may lead to higher accuracy. Overall, it can be concluded that, unlike the complex ANN method, the M5 model tree, due to its simple structure and the simple regression equations it provides, is a more appropriate method for determining ET_o using LST data derived from the MODIS sensor.
In this paper, an M5 model tree with some simple linear regressions was developed to estimate ET_o using MODIS LST data. Considering the simple linear equations and the small amount of data required by the developed M5 model tree, this method could be an alternative for calculating ET_o in the study area. Although the model provided acceptable results in the study area, its applicability in other regions of the world, including regions with similar climate conditions, needs more investigation.

Figure 1: Location of weather stations in the study area.

Figure 3: Scatter plot of T_min against LST_night.

Figure 6: Scatter plot of estimated ET_o using M5 model tree against calculated ET_o by FAO-PM equation for training data.

Figure 7: Scatter plot of estimated ET_o using ANN and M5 model tree against calculated ET_o by FAO-PM method, (a) ANN method, and (b) M5 method.

Figure 8: Time variations of estimated ET_o using ANN and M5 model tree for test stations, (a) ANN method, and (b) M5 method.
Rahimikhoob et al. [11] conducted a study to assess the performance of the M5 model tree in estimating ET_o in a semiarid region of Iran by means of pan evaporation values. They asserted that this model shows more satisfactory performance than other approaches in estimating ET_o in the study region. Sattari et al. [32] compared ANN and the M5 model tree in determining ET_o and concluded that both can properly estimate ET_o. ANN showed better performance; however, the M5 model tree offered simple linear regressions that can easily calculate ET_o. The main aim of this study is to evaluate the ability of the M5 model tree and the ANN model to estimate daily ET_o using LST values obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS)/Terra sensor. The accuracy of ET_o calculated by the M5 model tree and the ANN model was investigated by comparison with the ET_o values of the FAO-PM method.
Figure 2(a) illustrates splitting the input space x_1 × x_2 (independent variables) into six linear regression functions at the leaves (labeled LM1 through LM6) by the M5 model tree algorithm. The general form of each model is y = a_0 + a_1 x_1 + a_2 x_2, in which a_0, a_1, and a_2 are linear regression constants. Figure 2(b) shows the relations between branches in the form of a tree diagram [19].

FAO-PM Equation.
The FAO-PM equation, a physically based method, is widely used in the calculation of ET_o and is a reliable method for estimating ET_o in various climate conditions [2,40,41]. Moreover, FAO proposed this equation as a standard method for estimating ET_o. Although the most precise method for estimating ET_o is to employ lysimeter data, such data were not available in the study region, so the performance of the ANN and the M5 model tree was assessed against the FAO-PM equation. The equation for estimating daily ET_o is as follows [2]:

$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$

In the Angstrom formula, R_s and R_a are, respectively, the daily global solar radiation (MJ m⁻² d⁻¹) and the daily extraterrestrial solar radiation (MJ m⁻² d⁻¹) on a horizontal surface, and a and b are empirical coefficients that depend on location. The a and b coefficients were set to the default values of 0.25 and 0.5, respectively, proposed by Allen et al. [2].
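As a worked illustration of the two formulas, the sketch below evaluates the Angstrom estimate of R_s and the daily FAO-PM equation for given inputs. It uses the coefficient 0.408 from the FAO-56 formulation of Allen et al. [2] and the default a = 0.25 and b = 0.5. The function names and all numerical inputs are hypothetical; the auxiliary FAO-56 equations for Δ, R_n, e_s, and e_a are not reproduced, so those quantities are passed in directly.

```python
def angstrom_rs(n, n_max, ra, a=0.25, b=0.5):
    """Daily solar radiation R_s from sunshine duration (Angstrom formula).

    n      actual sunshine duration
    n_max  maximum possible sunshine duration (same unit as n)
    ra     extraterrestrial radiation (MJ m-2 d-1)
    """
    return (a + b * n / n_max) * ra


def fao_pm_eto(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration (mm d-1) by the FAO-PM equation."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Hypothetical inputs for a warm day.
rs = angstrom_rs(n=6.0, n_max=12.0, ra=40.0)
eto = fao_pm_eto(delta=0.2, rn=15.0, g=0.0, gamma=0.066,
                 t_mean=27.0, u2=2.0, es=3.5, ea=1.5)
print(round(rs, 2), round(eto, 2))
```

Keeping the radiation and aerodynamic terms separate makes it easier to see which inputs, such as wind speed and vapor pressure deficit, are not represented when only LST is available.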

Table 1: Error statistics in estimating ET_o for model tree and ANN methods using training data.

[...] the software and established linear regression relations. Afterward, a scatter plot of ET_o estimated by the M5 model tree versus ET_o calculated by the FAO-PM equation is presented for the training data (Figure 6); an R² value of 0.83 indicates proper agreement. The distribution of points around the best-fit line shows that this method is more accurate in estimating low ET_o values; for large ET_o values, the error increases as well. The error statistics of the models developed by the ANN and the M5 model tree in estimating ET_o using the training data are compared in Table 1. As seen, the M5 model tree, with RMSE and MAE values of 1.27 and 0.96 mm d⁻¹, respectively, resulted in higher accuracy than the ANN in estimating ET_o.

Table 2: Error statistics in estimating ET_o by ANN and M5 model tree using testing data.

For this purpose, two methods, ANN and the M5 model tree, were selected to estimate ET_o; four parameters (LST_day, LST_night, LST_mean, and DOY) were considered as model inputs, and ET_o calculated from the FAO-PM equation was considered as the model output. A comparison of ET_o estimated by each model with the values obtained from the FAO-PM method revealed the high accuracy of both methods. The average RMSE values at the two test stations for the ANN and the M5 model tree were 1.35 and 1.34 mm/day, respectively.