Modeling Evapotranspiration Response to Climatic Forcings Using Data-Driven Techniques in Grassland Ecosystems

Remarkable progress has been made over the last decade toward characterizing the mechanisms that dominate the exchange of water vapor between the biosphere and the atmosphere. This progress is attributed partly to the considerable development of machine learning techniques, which allow the scientific community to approximate the nonlinear processes affecting the variation of water vapor in terrestrial ecosystems. Three novel machine learning approaches, namely, group method of data handling (GMDH), extreme learning machine (ELM), and adaptive neurofuzzy inference system (ANFIS), were developed to simulate and forecast daily evapotranspiration (ET) at four different grassland sites based on flux tower data obtained with the eddy covariance method. These models were compared with extensively utilized data-driven models, including artificial neural network (ANN), generalized regression neural network (GRNN), and support vector machine (SVM). Moreover, the influences of internal functions on their corresponding models (SVM, ELM, and ANFIS) were investigated. It was demonstrated that most developed models did a good job of simulating and forecasting daily ET at the four sites. In addition to their strengths of robustness and simplicity, the newly proposed methods achieved estimates comparable to those of the conventional approaches and accordingly can be used as promising alternatives to traditional methods. It was further discovered that the generalization performance of the ELM, ANFIS, and SVM models strongly depended on their respective internal functions, especially for SVM.


Introduction
Grasslands cover approximately 13% of the global land surface [1]. It is widely acknowledged that grassland ecosystems play a crucial role in global water and carbon budgets as well as energy balances [2,3]. In grassland ecosystems, the interrelations between the atmosphere and evapotranspiration (ET), consisting of transpiration and evaporation from the soil and plant systems, are closely related to ecosystem productivity, particularly in arid and semiarid regions [4]. The dynamic interactions between biophysical and biogeochemical processes are substantially affected by a variety of environmental driving factors, such as climate change, nutrient deposition, and increased atmospheric CO2 [4-6]. Therefore, measuring and estimating ET in grasslands are essential for understanding the responses of ET to different environmental forcings and for providing precise ET budgets for water resource allocation and management. The eddy covariance technique is considered a reliable approach and is used extensively to measure grassland ET in different climates [7]. A large amount of data from numerous flux tower sites, including ET and other micrometeorological variables, has been collected and archived in the global flux network in recent years [8,9], providing a solid foundation for understanding the underlying mechanisms that influence the variation of ET across time scales from hourly to interannual. However, accurately estimating ET of grassland ecosystems remains a great challenge due to the limited understanding of how ET responds to human activities (e.g., land use change, grazing, and fertilization) and climate extreme events (e.g., extreme precipitation, wind storms, droughts, and heat waves).
A broad range of approaches is presently utilized to model and predict terrestrial ET at various spatial levels, from the ecosystem to the regional or global level. These methods are commonly grouped into three categories: remote sensing-based [10,11], physically based [12,13], and data-driven [14,15]. Among these approaches, physically based techniques are widely used by both the ecological and hydrological communities, due to their distinct advantage of offering physical insight into the complex hydrological processes controlling the temporal and spatial variation of ET. However, the temporal and spatial heterogeneity of land surface characteristics strongly affects the underlying processes of water vapor exchange, which considerably increases the complexity of elucidating the hydrological relationship between land and the atmosphere and subsequently impedes the development of accurate physically based models.
Data-driven methods have received increasing attention in the last decade for ET simulation and prediction in different terrestrial ecosystems. Numerous previous studies have reported the ability of machine learning techniques to predict ET in terrestrial ecosystems [16,17]. It has been demonstrated that these approaches are capable of effectively capturing most of the useful information stored in the available data (e.g., measurements based on the eddy covariance method), without depending on any complex underlying knowledge about the evolving process of ET. Nevertheless, to the best of our knowledge, most studies have focused primarily on three ubiquitous approaches, namely, artificial neural network (ANN), generalized regression neural network (GRNN), and support vector machine (SVM).
In recent years, many novel machine learning techniques have been proposed for handling complex nonlinear problems, such as adaptive neurofuzzy inference system (ANFIS), extreme learning machine (ELM), and group method of data handling (GMDH). The generalization ability of these advanced approaches has been extensively acknowledged in forecasting environmental and climatological variables. More particularly, these methods have been widely applied to nonlinear time series prediction in various branches of hydrology, such as reference ET forecasting [18,19], evaporation prediction [20,21], and soil moisture estimation [22,23]. However, comparatively few attempts have been made to date to apply these state-of-the-art approaches to estimating ET in terrestrial ecosystems based on eddy covariance data.
In addition, all the techniques, including the conventional ANN, GRNN, and SVM methods and the relatively novel methods (ELM, ANFIS, and GMDH) proposed by this study, differ remarkably in terms of their structures, principles, and parameters; therefore, their generalization ability should differ in theory. For this reason, systematically evaluating the ability of all the methods for ET prediction is particularly important for ensuring predictive accuracy in practical applications such as the present study. It is also beneficial for identifying the strengths and shortcomings of each method and for providing helpful suggestions to other researchers.
It is widely accepted that the ability of all the above-mentioned methods is strongly affected by their corresponding inner parameters [24]. However, very few studies have reported the roles of different intrinsic parameters for each approach in characterizing the actual behavior of the addressed problems, especially for the relatively new methods, ANFIS and ELM, even though these approaches have been widely applied in other fields. For example, although several recent studies have examined the effects of different membership functions on ANFIS for various hydrological time series predictions [25-27], it remains unclear how the ANFIS model is influenced by different identification algorithms for fuzzy inference system (FIS) generation in the present application. In fact, when these models are established in practical applications, the intrinsic algorithms are commonly determined by random selection or by a trial-and-error procedure, which may limit, to some extent, the ability of the resulting models. Accordingly, it is worth examining the impacts of inner parameters on SVM, ANFIS, and ELM for predicting ET, which is another major task of the current research.
Considering the issues mentioned above, the novelty of this paper lies in the use of three state-of-the-art data-driven methods for estimating daily ET based on the available eddy covariance measurements at four different grassland sites. More specifically, the novelty of this paper is threefold. First, we probe into the feasibility and effectiveness of three modern data-driven models, GMDH, ELM, and ANFIS, for modeling and predicting ET at the daily time scale. Second, we investigate whether these newly proposed models have the potential to complement or replace the conventionally recognized methods, ANN, GRNN, and SVM. Finally, we focus on evaluating the impacts of the intrinsic algorithms involved in each model (SVM, ELM, and ANFIS) on their respective generalization ability.

Site Description and Data Used.
In this study, four grassland ecosystems were used to explore the modeling capability of the proposed models. These sites are situated in different countries. The details of each site are shown in Table 1. They were selected for the present study mainly due to their continuous multiyear measurements of water vapor flux (ET) with the eddy covariance technique. In addition, measurements of other variables were also undertaken at these sites, primarily including CO2 flux, energy fluxes (latent heat and sensible heat), and meteorological variables. The instruments and data processing procedures used at each site have been reported in the previous studies listed in Table 1.
Statistical parameters of the daily measurements utilized in this study, covering a period of at least 6 years for all the sites, are given in Table 2. As shown in the table, the annual mean air temperature changes from 6.48 °C at CA-Let to 10.20 °C at HU-Bug; the annual mean soil temperature is between [...]

[...] [32]. The ultimate goal of an ANN is to establish the nonlinear relationship between input (independent) variables and output (dependent) variables. In this study, a feed-forward ANN with a single hidden layer was used, and its parameters were updated using the backpropagation learning algorithm to minimize the error between the target values and the values calculated by the network. When designing the ANN models in this study, the hyperbolic tangent sigmoid and linear functions were selected as transfer functions for the hidden layer and the output layer, respectively, to generate the output of each neuron. In addition, other key parameters and functions, such as the number of hidden-layer nodes and the training function, were determined by a trial-and-error approach.
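The network structure described above (one hidden layer with hyperbolic tangent units and a linear output) can be illustrated by a minimal forward-pass sketch. The function name `ann_forward` and all weight values are hypothetical, and the backpropagation training step is omitted; this is not the MATLAB implementation used in the study.

```python
import math

def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer feed-forward network.

    x        : list of input values (one sample)
    w_hidden : list of weight rows, one row per hidden node
    b_hidden : list of hidden-node biases
    w_out    : list of output weights, one per hidden node
    b_out    : output bias
    """
    # Hidden layer: hyperbolic tangent transfer function
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: linear transfer function
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Illustrative (made-up) weights for two inputs and two hidden nodes
et_hat = ann_forward([0.4, 0.7],
                     w_hidden=[[0.5, -0.3], [0.1, 0.8]],
                     b_hidden=[0.0, -0.2],
                     w_out=[1.2, 0.6],
                     b_out=0.1)
```

In practice the weights would be fitted by backpropagation, and the number of hidden nodes would be chosen by trial and error as described in the text.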

Support Vector Machine.
The SVM approach is an important soft computing method and has been utilized in many research fields. Details of the theory of SVM can be found in Vapnik [33], and its advances in practical applications can be found in some recent reviews [33-35].
Based on accumulated experience, it is worth pointing out that the generalization ability of the SVM method depends considerably on the choice of kernel function.
There are many alternative kernel functions, such as radial basis function (RBF), polynomial, sigmoid, exponential, Laplacian, and rational quadratic. The first three kernel functions are commonly used for SVM models and were therefore compared and assessed in this study. For convenience, the SVM models with the RBF, polynomial, and sigmoid kernel functions are hereinafter referred to as SVM-RBF, SVM-Poly, and SVM-Sig, respectively. For a given case, the critical internal parameters involved in the training process, including the cost factor, the slack factor, and the parameter of the respective kernel function, were obtained through a grid search. All the SVM models in the present research were implemented with the LIBSVM software in the MATLAB environment [36].
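To make the difference between the three compared kernels concrete, the following sketch writes them out in the parameterization commonly used by LIBSVM (`gamma`, `coef0`, `degree`). The default values shown are arbitrary placeholders, not the values obtained by the grid search in this study.

```python
import math

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def poly_kernel(u, v, gamma=0.5, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * u.v + coef0)^degree."""
    return (gamma * dot(u, v) + coef0) ** degree

def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * u.v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)
```

The RBF kernel depends only on the distance between two samples and has a single kernel parameter, whereas the polynomial and sigmoid kernels depend on the inner product and carry additional parameters, which enlarges the grid that has to be searched.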

Adaptive Neurofuzzy Inference System.
ANFIS has received much attention for modeling and predicting hydrological time series (e.g., evaporation, groundwater level, and stream-flow) in recent years. Hence, this method has been widely recognized as a promising alternative to traditional data-driven techniques for mapping the nonlinear relationship between independent and dependent variables in different research fields. When establishing an ANFIS model, selecting a reasonable FIS is an important precondition for successfully elucidating the nonlinear processes underlying a given problem. Presently, the Takagi and Sugeno [37] method is extensively adopted in diverse research areas, and, in particular, its ability to estimate various hydrological variables is supported by accumulating evidence. Accordingly, Takagi and Sugeno-based ANFIS models were developed in the present work for all the applied cases. To achieve an optimum ANFIS model, two types of parameters need to be optimized appropriately: the antecedent parameters, which influence the effectiveness of the applied membership function, and the consequent parameters, which determine the quality of the system output. The hybrid learning procedure is most commonly used; it adjusts the antecedent and consequent parameters using the gradient descent and least squares methods, respectively. In addition, of particular interest are the impacts of different identification algorithms for generating FISs on the ability of the developed ANFIS models. The current research also attempts to address this issue. For convenience, the ANFIS models with three extensively used identification algorithms, namely, grid partitioning, subtractive clustering, and fuzzy c-means clustering, are abbreviated as ANFIS-Grid, ANFIS-SC, and ANFIS-FCM, respectively.
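The roles of the antecedent and consequent parameters can be seen in a minimal sketch of first-order Takagi-Sugeno inference. Gaussian membership functions, the product rule for firing strengths, and the names `gauss_mf` and `sugeno_predict` are assumptions made for illustration; the membership functions and rule bases actually generated in the study are not reproduced here.

```python
import math

def gauss_mf(x, center, sigma):
    """Gaussian membership function (antecedent parameters: center, sigma)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, rules):
    """First-order Takagi-Sugeno inference for one sample.

    x     : list of input values
    rules : list of (antecedents, consequent) pairs, where
            antecedents = [(center, sigma), ...], one pair per input, and
            consequent  = (coeffs, intercept) of a linear rule output
    """
    num = 0.0
    den = 0.0
    for antecedents, (coeffs, intercept) in rules:
        # Firing strength: product of the membership degrees of all inputs
        strength = 1.0
        for xi, (center, sigma) in zip(x, antecedents):
            strength *= gauss_mf(xi, center, sigma)
        # Rule output: linear function of the inputs (consequent parameters)
        rule_out = sum(a * xi for a, xi in zip(coeffs, x)) + intercept
        num += strength * rule_out
        den += strength
    # System output: firing-strength-weighted average of the rule outputs
    return num / den
```

Grid partitioning, subtractive clustering, and fuzzy c-means clustering differ in how they produce the initial rule list passed in here; the hybrid learning procedure then refines the centers and widths by gradient descent and the linear coefficients by least squares.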

General Regression Neural Network.
GRNN, proposed by Specht [38] on the basis of nonlinear regression theory, is widely utilized for function approximation. It is closely associated with the radial basis function neural network, but there is a slight distinction between these two networks in terms of topological structure. The GRNN model is a one-pass learning method designed with a highly parallel framework. The number of neurons in each layer is determined by the number of input variables and the sample size of the training dataset, and thus these neurons do not require adjustment during learning.
Another noteworthy advantage of the GRNN approach over other data-driven techniques is that the GRNN model does not need much time to select its internal algorithms or parameters, apart from the smoothing factor, which strongly affects the generalization performance of the model. At present, there is no standard, universally accepted method for determining the optimal smoothing factor that guarantees the forecasting precision of the GRNN model. In this study, the smoothing factor was searched over the range from 0.01 to 1 using an iterative algorithm. Moreover, to prevent overfitting, fourfold cross validation was utilized in the training. Further details regarding the principle of the GRNN approach can be found in Raghavendra and Deka [34] and Yaseen et al. [39].
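The role of the smoothing factor can be illustrated with a minimal sketch of the GRNN estimate in its standard kernel-weighted form (after Specht). The function name `grnn_predict` and the toy data are hypothetical, and the iterative search and cross validation used in the study are not shown.

```python
import math

def grnn_predict(x, train_X, train_y, sigma):
    """GRNN estimate for one sample x with smoothing factor sigma.

    The prediction is a weighted average of the training targets, with
    Gaussian weights that decay with the squared distance to x.
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_X]
    # Subtracting the smallest distance leaves the weighted average unchanged
    # but prevents all weights from underflowing to zero for small sigma.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Toy training set with one normalized input
X_train = [[0.0], [0.5], [1.0]]
y_train = [0.0, 1.0, 2.0]
smooth = grnn_predict([0.25], X_train, y_train, sigma=1.0)   # heavily averaged
sharp = grnn_predict([0.25], X_train, y_train, sigma=0.01)   # near nearest neighbors
```

A large smoothing factor pulls every prediction toward the mean of the training targets, while a very small one makes the model follow the nearest training samples, which is why the value has to be tuned.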

Extreme Learning Machine.
ELM, first proposed by Huang et al. [40], is a relatively new data-driven technique and has been successfully applied to nonlinear problems in diverse areas during the last few years. The ELM method has the topological architecture of a single-hidden-layer feed-forward neural network. Its hidden nodes can be generated randomly, and their parameters (weights and biases) do not need to be tuned, as they have been demonstrated to be independent of the training dataset. By contrast, for the traditional backpropagation-based ANN approach, almost all the parameters are highly problem-dependent and must be carefully set or optimized by a time-consuming trial-and-error process. Moreover, the approximation ability of the ELM approach is supported by several lines of evidence [39,41]. The fundamental principle of the ELM approach is described in more detail in Huang et al. [40]. Furthermore, it should be emphasized that an appropriate activation function is important for guaranteeing the modeling and generalization performance of the ELM method [42]. The sigmoid, sine, and hard limit functions are the three common types of activation functions applied in the hidden layer. The influences of these activation functions on the modeling ability of the developed ELM models were evaluated in this study. For convenience, the ELM models with the sigmoid, sine, and hard limit functions are abbreviated as ELM-Sig, ELM-Sin, and ELM-Hard, respectively. The linear activation function was adopted for the output layer of all the developed ELM models.
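A minimal NumPy sketch of the training scheme described above may help: hidden-node parameters are drawn at random and left untuned, and only the linear output weights are solved for by least squares. The names `elm_fit` and `elm_predict`, the uniform sampling range, and the number of hidden nodes are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np

# Hidden-layer activation functions compared in the text
ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),      # sigmoid (ELM-Sig)
    "sin": np.sin,                                  # sine (ELM-Sin)
    "hardlim": lambda z: (z >= 0).astype(float),    # hard limit (ELM-Hard)
}

def elm_fit(X, y, n_hidden=20, activation="sig", seed=0):
    """Fit an ELM: random, untuned hidden layer plus least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = ACTIVATIONS[activation](X @ W + b)                   # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                             # linear output weights
    return W, b, beta

def elm_predict(X, model, activation="sig"):
    """Predict with a fitted ELM; the output layer is linear."""
    W, b, beta = model
    return ACTIVATIONS[activation](X @ W + b) @ beta
```

Because the only fitted quantity is `beta`, training reduces to a single pseudo-inverse, which is the source of the speed advantage over iterative backpropagation.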

Group Method of Data Handling.
The GMDH approach, a typical inductive high-order regression-type algorithm, is commonly subsumed under the category of feed-forward neural networks [43]. The architecture of this heuristic nonlinear method is established automatically by a self-organizing learning algorithm, which differs significantly from other machine learning algorithms. Moreover, the GMDH method does not require prior knowledge about a given problem, and redundant input variables can be eliminated effectively during learning. In addition, GMDH is reported to be robust to noisy data and is therefore less affected by outliers in the training samples. In a well-developed GMDH network, each neuron in a hidden layer has two inputs and a single output and acts as a transfer function for the results generated by two neurons from the previous layer. Mathematically, each neuron is represented by a quadratic polynomial with five weights and one bias. Ultimately, the well-trained network can be expressed as an explicit high-order polynomial function of all the neurons retained in the network. All the weighting coefficients involved in this iterative function are calculated by the ordinary least squares method. More details about the GMDH method can be found in Ivakhnenko [43] and Barzegar et al. [44].

Model Structure and Evaluation.
It is clear from Table 2 that each environmental variable has a high linear correlation with ET. Therefore, these four variables were used as inputs for all the proposed data-driven models. Before the training of each model, the input variables and ET were normalized to a range between 0 and 1. MATLAB software (version 8.2, R2013b) was utilized for the development of the models and the statistical analysis of the estimated results.
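The normalization of the inputs and ET to the range between 0 and 1 mentioned above can be sketched as a simple min-max scaling. The function names are hypothetical; taking the minimum and maximum from the training period only is an assumption made here to avoid leaking information from the prediction period, and the source does not state which period was used.

```python
def minmax_fit(values):
    """Return the (min, max) pair used for scaling, e.g. from the training period."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi

def minmax_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_invert(scaled, lo, hi):
    """Map scaled model outputs back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Illustrative daily ET values (made-up numbers)
et_train = [0.4, 1.2, 3.6, 2.0]
lo, hi = minmax_fit(et_train)
et_scaled = minmax_scale(et_train, lo, hi)
```

Model outputs are mapped back with `minmax_invert` before the performance indices are computed in the original units.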
In this study, the coefficient of determination (R²), Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and mean absolute error (MAE) were considered as statistical indices for measuring the performance of the developed data-driven models. These statistical indices are, respectively, expressed as below:

$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(\mathrm{ET}_{o,i}-\overline{\mathrm{ET}}_{o}\right)\left(\mathrm{ET}_{m,i}-\overline{\mathrm{ET}}_{m}\right)\right]^{2}}{\sum_{i=1}^{n}\left(\mathrm{ET}_{o,i}-\overline{\mathrm{ET}}_{o}\right)^{2}\,\sum_{i=1}^{n}\left(\mathrm{ET}_{m,i}-\overline{\mathrm{ET}}_{m}\right)^{2}},$$

$$\mathrm{NSE} = 1-\frac{\sum_{i=1}^{n}\left(\mathrm{ET}_{o,i}-\mathrm{ET}_{m,i}\right)^{2}}{\sum_{i=1}^{n}\left(\mathrm{ET}_{o,i}-\overline{\mathrm{ET}}_{o}\right)^{2}},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{ET}_{o,i}-\mathrm{ET}_{m,i}\right)^{2}},$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{ET}_{o,i}-\mathrm{ET}_{m,i}\right|,$$

where $\mathrm{ET}_{o,i}$ and $\mathrm{ET}_{m,i}$ denote the observed and modeled values of daily ET, respectively; $\overline{\mathrm{ET}}_{o}$ and $\overline{\mathrm{ET}}_{m}$ are the means of the observed and modeled values, respectively; and $n$ is the number of observed values. Regarding the physical significance of these indicators, R² was used to measure the proportion of the total variance in the observed ET that is explained by the modeled ET. For a perfect model, R² is expected to be close to unity. However, this index is limited because it is calculated from the linear relationship between observed and simulated values and is sensitive to outliers. Alternatively, the NSE represents the level of agreement between observed and simulated ET and offers a relative assessment of the generalization performance of a given model. It is sensitive to differences in the measured and calculated means and variances and accordingly was employed in the present investigation. Moreover, the RMSE and MAE indicators provide further information about the predictive ability of the developed models. Therefore, these performance criteria were employed together to obtain insight into the efficiency of the developed models.
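The four indices can be computed with a few lines of Python, shown here as a sketch. It assumes that R² is the squared Pearson correlation between observed and modeled values, which matches the description of R² as a measure of linear agreement; the function name `evaluate` and the sample numbers are hypothetical.

```python
import math

def evaluate(observed, modeled):
    """Return R2, NSE, RMSE, and MAE for paired observed and modeled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modeled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modeled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    sq_err = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    abs_err = sum(abs(o - m) for o, m in zip(observed, modeled))
    return {
        "R2": cov ** 2 / (var_o * var_m),   # squared linear correlation
        "NSE": 1.0 - sq_err / var_o,        # 1 is perfect; 0 equals the observed mean
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
    }

# A model with a constant bias of +1 (made-up values)
scores = evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this example the constant bias leaves R² at 1 while NSE drops to about 0.2 and both RMSE and MAE equal 1, which illustrates why R² alone is not sufficient and the indices are used together.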

Results
The performance indices, including [...]
The comparisons of daily ET observed and predicted by the data-driven models over the prediction period at the AT-Neu site are shown as scatter plots in Figure 1. For convenience, the estimates of the optimal ELM, ANFIS, and SVM models are compared with those of the ANN, GRNN, and GMDH models. As illustrated in Figure 1, the fit line of the SVM-RBF model is closer to the ideal line (1 : 1 line) than those of the other models, considering the corresponding equations of the fit lines. However, it can be seen from Figure 1 and Table 3 that the ANFIS-SC model has a slightly higher R² value than the other models and obtains scattered estimates similar to those of the ANN and ELM-Sin models.
The performance of the applied data-driven models for estimating ET over the training, validation, and prediction periods at the CA-Let site is shown in Table 4. As can be seen from the table, the ANN model performs the best on the whole, followed by the ANFIS-SC and ANFIS-Grid models. The SVM-Sig model provides the worst accuracy among the twelve models. Based on the R², NSE, RMSE, and MAE metrics, the overall performance of the applied models in the prediction period can be ranked as follows: ANN, ANFIS-SC, ANFIS-Grid, ANFIS-FCM, SVM-RBF, ELM-Sin, ELM-Sig, SVM-Poly, GRNN, GMDH, ELM-Hard, and SVM-Sig. Figure 2 illustrates the ET measured and predicted by the optimal model of each approach in the prediction stage at the CA-Let site. As shown in the figure, the ANN, ANFIS-SC, and SVM-RBF models have higher R² and lower RMSE than the other models. In addition, the fit lines of these three models are closer to the ideal line (1 : 1 line) than those of the other models.
Comparisons of the applied models for modeling daily ET at the DE-Gri site are made in Table 5. Unlike at the previous sites, the ELM-Sin model has better accuracy than the other models in terms of the indices used. The ranks of the models in the prediction stage are: ELM-Sin, ELM-Sig, ANN, ANFIS-Grid, GRNN, GMDH, ANFIS-FCM, ANFIS-SC, SVM-RBF, ELM-Hard, SVM-Poly, and SVM-Sig. Figure 3 illustrates the estimates of the optimal models for ET simulation in the prediction phase at the DE-Gri site. It is apparent from the figure that the fit line of the ANFIS-Grid model is closest to the ideal line (1 : 1 line), while the ELM-Sin model has the highest R² and provides less scattered estimates than the other models, which is also confirmed in Table 5.
Table 6 summarizes the accuracy of the applied models for forecasting daily ET in the prediction phase at the HU-Bug site. It can be clearly seen from the table that the GMDH model achieves the best precision among the twelve developed models. Based on the performance indicators used, the overall efficiency of these models in the prediction period can be ranked as follows: GMDH, ELM-Sig, ELM-Sin, ANN, ANFIS-SC, SVM-RBF, ANFIS-FCM, ANFIS-Grid, GRNN, SVM-Poly, ELM-Hard, and SVM-Sig. Scatterplot comparisons of the methods with their respective optimal parameters over the prediction stage at the HU-Bug site are made in Figure 4. As shown in the figure, the slope and bias values of the fit line equations of both the ANFIS-SC and ELM-Sig models are closer to those of the ideal line (1 : 1 line, slope: 1, and bias: 0) than those of the other models. However, the GMDH model appears more successful than the other models from the R² and RMSE viewpoints.
In order to provide insights into the over- and underestimation of the data-driven models in predicting daily ET, the values measured and modeled by the best model for each site over the whole period are shown in Figure 5. It is evident from the figure that the values modeled by the best model for each site closely follow the corresponding observed ones, which was previously confirmed in Tables 3-6. However, the peak values during the growing season in the prediction period appear to be appreciably underestimated by the optimal models for all the sites, especially for the CA-Let site, which is also consistent with the scatter plots in Figures 1-4.

Discussion
In the following subsections, we concentrate primarily on discussing the generalization ability of all data-driven models for forecasting daily ET and then on exploring the effects of different internal functions on the ELM, ANFIS, and SVM models. Finally, we point out some potential improvements for follow-up work.
Our estimates demonstrated that all the examined models, including ANN, GRNN, GMDH, ELM, ANFIS, and SVM, can capture the nonlinear relationship between ET and environmental variables according to the combination of four different performance criteria. Moreover, our modeling results also confirmed that the new computational intelligence techniques (GMDH, ELM, and ANFIS) were capable of effectively capturing the seasonal and interannual variation in ET driven by the environmental variables. More importantly, these three methods had clear advantages over conventional methods in terms of robustness and simplicity. Of particular interest regarding the ELM method is the fact that its learning speed has been shown to be much faster than that of other data-driven approaches, partly because the hidden nodes of an ELM model can be generated randomly. In addition, their parameters (weights and biases) need not be tuned, as they are problem-independent and thus are not tied to the applied training dataset.
Furthermore, with the advances in machine learning in recent years, a number of new machine learning methods, such as relevance vector machine [45], M5 model tree [46], and genetic programming [47], have been proposed and successfully applied in other fields, such as forecasting of meteorological time series (e.g., air temperature and precipitation) and prediction of water resource variables (e.g., rainfall-runoff, groundwater level, and drought). Consequently, it is also important to investigate the feasibility and effectiveness of these new approaches for ET estimation, which will be undertaken in our future work.
Our modeling results also showed that, within each method (ELM, ANFIS, and SVM), predictive performance differed clearly among the internal functions. Therefore, selecting an appropriate internal function for each method is essential for achieving the best modeling ability. It is highly recommended that different internal functions for each method be evaluated in advance when these methods are used for regression and classification problems. More specifically, for the ANFIS method in the present study, the three evaluated algorithms for generating FISs had noticeable effects on the overall performance of ANFIS. We found that, among the three ANFIS models, the model with the subtractive clustering algorithm performed the best at the AT-Neu, CA-Let, and HU-Bug sites, while the grid partitioning based ANFIS model generated the optimal estimates at the DE-Gri site. Therefore, no single algorithm was appropriate for all the cases. In addition, Cobaner [48] investigated the ability of two different ANFIS methods, based on the subtractive clustering and grid partitioning algorithms, respectively, in estimating reference ET using daily climatic data, and found that the subtractive clustering based ANFIS model achieved better precision with less computation than the grid partitioning based ANFIS model. Moreover, to the best of our knowledge, most studies comparing the performance of different types of ANFIS techniques in other fields have concentrated mainly on evaluating the roles of the membership functions under the grid partitioning algorithm. It should be mentioned that selecting an appropriate membership function and determining its optimal number may effectively improve the performance of the grid partitioning based ANFIS model. In conclusion, our study has broadened the scope of ANFIS research and provided insights into the application of the ANFIS method. In follow-up work, we will jointly investigate the capability of the aforementioned algorithms (subtractive clustering, grid partitioning, and fuzzy c-means clustering), as well as different types of membership functions under the grid partitioning algorithm.
Regarding the predictive capability and efficiency of SVM, our results showed a remarkable difference among the three kernel functions (sigmoid, polynomial, and RBF), which has also been confirmed by previous studies in other fields [49-51]. Moreover, our estimates indicated that the RBF kernel function for the SVM method performed better than the other two kernel functions (sigmoid and polynomial) in the prediction of daily ET. This concurs with the findings of numerous studies showing the superiority of RBF over the other kernel functions in solving regression problems [52-54].

Advances in Meteorology
For example, Zounemat-Kermani et al. [55] evaluated the capability of the SVM model with four different kernel functions (linear, polynomial, sigmoid, and RBF) for forecasting daily suspended sediment concentrations and further pointed out that the RBF kernel was the best choice for modeling hydrological phenomena with SVM. Mohammadi et al. [56] investigated the ability of two different types of SVM models, based on the polynomial and RBF kernel functions, in forecasting horizontal global solar radiation and found that SVM with the RBF kernel was highly competent for predicting daily horizontal global solar radiation in comparison with the polynomial function.
As a relatively new method, ELM exhibited strong modeling accuracy in predicting daily ET, in line with previous studies for other applications, such as reference ET prediction [57] and stream-flow forecasting [39]. In particular, a noteworthy aspect was that the ELM method presented a remarkable advantage over other data-driven approaches with respect to computational time and efficiency due to its simple network structure and nontuned mechanism [58], which is beneficial for real-time forecasting problems, such as rainfall and runoff time series forecasting. In brief, the ELM technique can be a promising alternative to traditional methods for dealing with the regression problem in the current research as well as the gap-filling problem of ET flux measured by the eddy covariance technique. On the other hand, relatively little attention has been paid to exploring the effects of various activation functions (e.g., the sine, sigmoid, and hard limit activation functions) in the hidden layer on the generalization ability of ELM. In the present study, our results demonstrated that the sine and sigmoid activation functions for ELM models played similar roles in estimating daily ET and dramatically outperformed the hard limit activation function. Therefore, the sine and sigmoid activation functions are recommended for establishing ELM models for ET forecasting.
Furthermore, many recent studies in other fields have found that selecting an appropriate training function is essential for assuring the predictive ability and reliability of ANN [59-61]. Although ANN has been recognized as the most popular method for simulating ET as well as carbon fluxes at the ecosystem level based on eddy covariance-measured data [62-64], the influences of various training algorithms on ANN performance have, to our knowledge, not been examined to date. Consequently, special attention will be given in our follow-up work to evaluating the modeling abilities of ANN models with a variety of training algorithms for predicting ET and carbon fluxes.

Conclusions
In recent years, many attempts, mainly involving soft computing modeling approaches, have been made to simulate and forecast ET in terrestrial ecosystems. However, the lack of comprehensive comparative studies of these state-of-the-art modeling techniques hinders their applicability and popularity, primarily because it remains unclear which of the many data-driven methods should be chosen in practical applications. To overcome this hindrance, the current research first investigated the suitability and effectiveness of three newly proposed methods, GMDH, ELM, and ANFIS, for estimating daily ET at four different grassland sites based on eddy covariance-measured data. In addition, three traditional soft computing techniques, ANN, GRNN, and SVM, were employed as benchmarks for comparison. Moreover, this study investigated the influence of internal functions on the generalization performance of their corresponding models (ELM, ANFIS, and SVM) on the basis of a set of statistical indices (R², NSE, RMSE, and MAE).
To summarize, the primary findings of the present study are as follows:
(1) All the models developed in this study were capable of mapping the nonlinear processes governing the variation of ET between the biosphere and the atmosphere at the ecosystem level, and the novel models (GMDH, ELM, and ANFIS) produced estimates comparable to those of the conventional models.
(2) Considering their robustness and simplicity, these novel approaches can be used as promising alternatives to traditional methods for modeling and forecasting daily ET.
(3) The predictive accuracy of the SVM, ELM, and ANFIS models was strongly dependent on their respective internal functions, especially for SVM. Three different kernel functions were tested for the SVM method, and the results suggested that the RBF kernel function substantially outperformed both the polynomial and sigmoid kernel functions.
(4) The ELM models with the sigmoid and sine activation functions achieved similar modeling accuracy and were appreciably superior to the ELM model with the hard limit function.
(5) For the ANFIS method, the algorithm used to generate the FIS had a noticeable effect on performance. The optimal algorithm can be determined by a trial-and-error procedure.
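For reference, the three SVM kernel functions compared in finding (3) are commonly written in the following forms. This is the parameterization used by widely available SVM libraries and is given here as an assumption; the exact forms and hyperparameter values used in this study are not restated in this section. The symbols γ (kernel width or scale), r (offset), and d (polynomial degree) are hyperparameters, and x_i and x_j are input vectors.

```latex
\begin{aligned}
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j) &= \exp\!\left(-\gamma \,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right), \\
K_{\mathrm{poly}}(\mathbf{x}_i, \mathbf{x}_j) &= \left(\gamma \,\mathbf{x}_i^{\top}\mathbf{x}_j + r\right)^{d}, \\
K_{\mathrm{sig}}(\mathbf{x}_i, \mathbf{x}_j) &= \tanh\!\left(\gamma \,\mathbf{x}_i^{\top}\mathbf{x}_j + r\right).
\end{aligned}
```

The RBF kernel depends only on the distance between inputs and is positive definite for any γ > 0, whereas the sigmoid kernel is positive semidefinite only for some combinations of γ and r.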
It should be noted that data-driven modeling techniques, which rely heavily on large amounts of data, are often criticized because the intrinsic mechanisms of the well-trained models cannot yet be represented explicitly. This criticism is likely to reduce the credibility of these techniques and to impede their application. Notwithstanding this limitation, our investigation suggests that data-driven methods based on soft computing can effectively complement physically based models, broaden the horizon of ecological, climatological, and hydrologic researchers, and thereby contribute to estimates of regional and global water resources under climate change.

Figure 1: Comparisons of daily ET between eddy covariance measured and simulated by data-driven models in the prediction period at AT-Neu site.(a) ANN model; (b) GRNN model; (c) GMDH model; (d) ELM-Sin model; (e) ANFIS-SC model; and (f) SVM-RBF model.

Figure 3: Comparisons of daily ET between eddy covariance measured and simulated by data-driven models in the prediction period at DE-Gri site.(a) ANN model; (b) GRNN model; (c) GMDH model; (d) ELM-Sin model; (e) ANFIS-Grid model; and (f) SVM-RBF model.

Figure 4: Comparisons of daily ET between eddy covariance measured and simulated by data-driven models in the prediction period at HU-Bug site.(a) ANN model; (b) GRNN model; (c) GMDH model; (d) ELM-Sig model; (e) ANFIS-SC model; and (f) SVM-RBF model.

Figure 5: Eddy covariance measured and simulated daily ET by their respective best models for the four sites in the whole period.(a) For AT-Neu site using ANFIS-SC model; (b) for CA-Let site using ANN model; (c) for DE-Gri site using ELM-Sin model; and (d) for HU-Bug site using GMDH model.

Table 1: Site characteristics used in this study.

Table 2: Daily statistical parameters of flux tower measured environmental variables, including air temperature, net radiation, relative humidity, soil temperature, and evapotranspiration (ET), in the whole period. Note. The units of air temperature, net radiation, relative humidity, soil temperature, and ET are °C, mol m−2, %, °C, and mm day−1, respectively. The statistics reported for each variable are the mean, maximum, minimum, standard deviation, kurtosis, and skewness, together with the correlation coefficient between each variable and ET.
The statistical indices R², NSE, RMSE, and MAE are employed to evaluate the accuracy of the models in predicting daily ET in this study. Larger values of R² and IA and smaller values of RMSE and MAE imply higher model efficiency. Table 3 shows the results of all machine learning models (ANN, GRNN, GMDH, ELM, ANFIS, and SVM) for ET over the training, validation, and prediction periods at the AT-Neu site. As can be seen from Table 3, the ANFIS-SC model gives the best performance in the prediction period, with the highest values of R² (0.9379) and NSE (0.9355) and the lowest values of RMSE (0.3308 mm day−1) and MAE (0.2260 mm day−1). The ELM-Sin model gives the second-best results in terms of R², NSE, RMSE, and MAE, and the SVM-Sig model performs the worst among the twelve models. According to the R², NSE, RMSE, and MAE metrics, the overall performance ranking of the developed models in the prediction period, from best to worst, is as follows: ANFIS-SC, ELM-Sin, ANFIS-FCM, ANN, ELM-Sig, GMDH, SVM-RBF, ANFIS-Grid, GRNN, ELM-Hard, SVM-Poly, and SVM-Sig.
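The four indices can be computed as in the following minimal sketch, which uses only the Python standard library. It assumes that R² denotes the squared Pearson correlation between measured and simulated ET and that NSE is the Nash-Sutcliffe efficiency in its usual form; the definitions are not restated in this section, so both are assumptions, and the function name `evaluation_metrics` and the sample values are hypothetical.

```python
import math


def evaluation_metrics(observed, simulated):
    """Return R2, NSE, RMSE, and MAE for paired observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n

    sq_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    abs_err = sum(abs(o - s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))

    return {
        "R2": cov ** 2 / (var_o * var_s),  # squared Pearson correlation (assumed)
        "NSE": 1.0 - sq_err / var_o,       # Nash-Sutcliffe efficiency
        "RMSE": math.sqrt(sq_err / n),     # same units as the inputs
        "MAE": abs_err / n,                # same units as the inputs
    }


# Hypothetical daily ET values in mm day-1, not taken from the flux tower data.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 3.5, 4.5]
metrics = evaluation_metrics(observed, simulated)
```

In this example the simulated series is the observed series shifted by a constant 0.5 mm day−1, so R² is 1 while NSE is only 0.8. This illustrates why a correlation-based index alone cannot detect systematic bias and why RMSE and MAE are reported alongside it.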

Table 3: Comparisons of data-driven model performances for evapotranspiration among the training, validation, and prediction periods at AT-Neu site. Note. The units of RMSE and MAE are mm day−1.

Table 5: Comparisons of data-driven model performances for evapotranspiration among the training, validation, and prediction periods at DE-Gri site. Note. The units of RMSE and MAE are mm day−1.

Table 6: Comparisons of data-driven model performances for evapotranspiration among the training, validation, and prediction periods at HU-Bug site. Note. The units of RMSE and MAE are mm day−1.