A Comparative Analysis of Data-Driven Empirical and Artificial Intelligence Models for Estimating Infiltration Rates

Infiltration is a vital phenomenon in the water cycle, and consequently, estimation of the infiltration rate is important for many hydrologic studies. In the present paper, different data-driven models including Multiple Linear Regression (MLR), Generalized Reduced Gradient (GRG), two Artificial Intelligence (AI) techniques (Artificial Neural Network (ANN) and Multigene Genetic Programming (MGGP)), and the hybrid MGGP-GRG have been applied to estimate infiltration rates. The estimated infiltration rates were compared with those obtained by empirical infiltration models (Horton's model, Philip's model, and modified Kostiakov's model) for published infiltration data. Among the conventional models considered, Philip's model provided the best estimates of the infiltration rate. It was observed that the application of the hybrid MGGP-GRG model and MGGP improved the estimates of infiltration rates as compared to the conventional infiltration models, while ANN provided the best prediction of infiltration rates. To be more specific, the application of ANN and the hybrid MGGP-GRG reduced the sum of squared errors by 97.86% and 81.53%, respectively. Finally, based on the comparative analysis, implementation of AI-based models, as a more accurate alternative, is suggested for estimating infiltration rates in hydrological models.


Introduction
Infiltration can be defined as the process by which water enters the surface of the Earth [1]. It leads to the entrance of water into the soil, thereby contributing to groundwater recharge and subsurface runoff. In essence, infiltration is among the most crucial processes of the water cycle. Furthermore, estimates of the infiltration capacity of soil are required in the design of efficient irrigation systems and in the estimation of evapotranspiration, groundwater recharge, surface runoff, effective rainfall, crop water requirement, and transport of chemicals in surface and subsurface water [2]. As a result, modelling and prediction of infiltration rates are an inevitable part of hydrological modelling. For instance, Morel-Seytoux [3] reviewed the importance of infiltration in large-scale hydrologic modelling. Furthermore, Šraj et al. [4] pointed towards the impact of the estimation of infiltration rates on the runoff hydrograph, which plays a vital role in watershed modelling and water management. Similarly, Wen et al. [5] demonstrated the implication of excessive infiltration on watershed models. Together, these studies demonstrate why an accurate estimation of time-dependent infiltration is important in hydrological modelling.
Owing to the wide applications of the infiltration rate, its estimation has gained significant attention from researchers. Over the years, various infiltration models have been proposed for the estimation of infiltration rates. They include models that have physical, semiempirical, and empirical formulations. Despite the development of several models, no single model universally outperforms the others. The suitability of an infiltration model for a particular site depends on the type of soil and field conditions [1]. In this regard, many comparative studies have been conducted to assess the suitability of various infiltration models for different soil types under varying field conditions. Mishra et al. [6] conducted one of the most comprehensive analyses on the suitability of infiltration models for different soils. Similarly, the methodology used to model the infiltration rate has a significant impact on the estimation of infiltration. Deep and Das [7] compared various optimization algorithms to estimate the parameters of infiltration models. Nonetheless, the application of different optimization techniques can only move the solution from local optimum parameters towards global optimum parameters, while they cannot increase the flexibility of infiltration models to mimic actual infiltration rates. Haghiabi et al. [8] employed a dimensionless form of infiltration data to estimate infiltration parameters accurately. However, Zakwan [9] suggested that such a transformation may not necessarily improve the accuracy of infiltration equations. Finally, Chen et al. [10] utilized a genetic algorithm to improve the estimates of the Green-Ampt infiltration model under a rainfall condition.
Recent applications of computational techniques in water resource engineering have widened the scope further [11-21]. With the advancement in computational methods and modelling approaches, these approaches have also provided a viable alternative for the estimation of infiltration rates. Kumar and Sihag [22] applied Gene Expression Programming (GEP) to model infiltration rates. Moreover, Dewidar et al. [23] proposed the application of fuzzy logic to estimate infiltration rates. In addition, Patle et al. [24] employed a multiple linear regression model to predict time-dependent infiltration values based on several soil properties such as bulk density, silt, sand percentage, and moisture content. Furthermore, Sihag et al. [25] exploited the support vector machine (SVM) for modelling infiltration rates in sandy soil. Also, Pahlavan-Rad et al. [26] compared the performance of Multiple Linear Regression (MLR) and Random Forest Tree in depicting the spatial variation of infiltration rates and reported the superiority of Random Forest Tree over MLR. Recently, Sepahvand et al. [27] utilized several data-driven models to predict infiltration rates. Their investigation revealed the superiority of neural networks over other data-driven techniques such as model tree, Gaussian process, and regression analysis. According to the recent studies, considering time as the exclusive state variable in empirical models should be revisited in favour of better infiltration predictions, while the AI-based models were found to perform better than the conventional infiltration models. Therefore, despite previous efforts in improving the estimation of infiltration rates, further studies are needed to explore these issues. The present study aims to compare the performances of different infiltration methods. Additionally, it attempts to assess the capability of MGGP and of the novel hybrid MGGP-GRG to model the infiltration process. In a bid to seek a better time-dependent infiltration model, the performances of the MGGP-based models were compared with those of the conventional models, regression techniques, and a commonly used neural network.

Data.
In the present study, the infiltration data reported by Sihag et al. [28] were utilized. The data were divided into training and testing data sets. To be more precise, 75% of the data were used for training, while the remaining 25% were used to test the obtained results. Table 1 summarises the data sets used in the present paper. Figure 1 shows the observed infiltration data over the same time duration. It also illustrates that the infiltration rate may depend on factors other than time (soil properties such as bulk density and sand percentage). The infiltration data set, which was obtained from the literature [28], belongs to the infiltration observations carried out at the Davood Rashid and Honam regions in Lorestan Province and the Kelat region in Ilam Province in Western Iran.
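The 75%/25% division described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the split was drawn, so the random shuffle, the `seed` argument, the function name `split_train_test`, and the sample records are all hypothetical.

```python
import random


def split_train_test(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical records: (time in min, sand %, density in g/cm^3, rate in cm/min)
records = [(5.0 * i, 40.0, 1.5, 1.0 / i) for i in range(1, 21)]
train, test = split_train_test(records)
print(len(train), len(test))  # 15 5
```

Whatever split procedure is used, the same division should be reused for every model so that the comparison stays like for like.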

Conventional Infiltration Models.
There are a number of infiltration models available in the literature. A brief description of the commonly used infiltration models considered in the present study is as follows.
(1) Horton's Model
Horton [29] proposed an empirical equation for the exponential decay of the infiltration rate after analysing several infiltrometer data sets:
$$f = f_c + (f_0 - f_c)\,e^{-Kt} \quad (1)$$
where $f$ is the infiltration capacity at any time $t$ from the start; $f_c$ is the final or ultimate infiltration capacity occurring at $t = t_c$; $f_0$ is the initial infiltration capacity at time $t = 0$; and $K$ is Horton's decay coefficient.
(2) Philip's Model
Philip [30] proposed an infinite series solution of Richards' equation to derive a relationship between the cumulative infiltration ($F$) and soil properties, which is presented in equation (2). Differentiating equation (2) with respect to time gives the infiltration rate, which is presented in equation (3).
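For orientation, the two-term truncation of Philip's series that is commonly quoted in the hydrology literature can be written as below. This is a sketch under assumptions rather than a reproduction of equations (2) and (3): the symbols $S_p$ (sorptivity) and $A$ (a constant related to hydraulic conductivity) are not defined in this text, and $S_p$ carries a subscript only to avoid confusion with the sand percentage $S$ used later.

```latex
F = S_p\, t^{1/2} + A\, t
\qquad\Longrightarrow\qquad
f = \frac{dF}{dt} = \tfrac{1}{2}\, S_p\, t^{-1/2} + A
```

The second expression follows from the first by term-by-term differentiation with respect to $t$.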

(3) Modified Kostiakov's Model
Kostiakov [31] observed the temporal variation of infiltration into soil and proposed a time-dependent infiltration model, known as Kostiakov's model. The major limitation of Kostiakov's model is that it predicts a final infiltration rate approaching zero, rather than a constant final rate, and an infinite infiltration rate at the start. Smith [32] modified Kostiakov's [31] equation to include the constant term $f_i$. The modified version is given in equation (4).
The parameters of the different infiltration models were obtained by minimizing the sum of squared errors using a nonlinear optimization tool.
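Thus, the objective function becomes
$$\min \; \sum \left(f_{obs} - f_{est}\right)^2 \quad (5)$$
where $f_{obs}$ is the observed infiltration rate and $f_{est}$ is the estimated infiltration rate at any time $t$, and the summation runs over all observations.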
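As a minimal sketch of how this objective can be evaluated for one of the conventional models, the snippet below codes Horton's exponential decay and the sum of squared errors. The function names, the sample times, and the parameter values are hypothetical and are not taken from the study; the optimizer that would search over the parameters is deliberately left out.

```python
import math


def horton_rate(t, f0, fc, k):
    """Horton's infiltration capacity at time t: fc + (f0 - fc) * exp(-k * t)."""
    return fc + (f0 - fc) * math.exp(-k * t)


def sum_of_squared_errors(observed, estimated):
    """Objective function: sum over all points of (f_obs - f_est)^2."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated))


# Hypothetical example: synthetic "observations" generated from known parameters.
times = [0, 5, 10, 20, 40, 60]      # minutes
true_params = (2.0, 0.3, 0.08)      # f0, fc, K (illustrative values only)
observed = [horton_rate(t, *true_params) for t in times]


def sse_for(params):
    """Objective value for a candidate parameter set (f0, fc, K)."""
    estimated = [horton_rate(t, *params) for t in times]
    return sum_of_squared_errors(observed, estimated)


print(sse_for(true_params))           # 0.0, the generating parameters fit exactly
print(sse_for((2.0, 0.3, 0.2)) > 0)   # True, a wrong decay coefficient is penalised
```

Any nonlinear optimizer can then be pointed at `sse_for` to search for the parameter set with the smallest objective value.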

Multiple Linear Regression.
MLR has been widely used in water resource engineering [17,33]. It has also been applied to estimate the infiltration rate [25]. In accordance with MLR, the infiltration rate can be expressed by equation (6), where $c_1$, $c_2$, $c_3$, $c_4$, and $c_5$ are coefficients; $f$ is the infiltration rate in cm/min; $t$ is time in minutes; $S$ is the percentage of sand; and $D$ is the density in g/cm$^3$.
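To illustrate how MLR coefficients can be calibrated, the sketch below solves the least-squares normal equations in plain Python. Equation (6) is not reproduced in this text, so the linear form f = c1 + c2 t + c3 S + c4 D used here is an assumption, as are the function names and the synthetic data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_mlr(predictors, targets):
    """Least-squares coefficients [intercept, slope_1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]
    n_coef = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(n_coef)]
           for i in range(n_coef)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(n_coef)]
    return solve_linear_system(xtx, xty)


# Hypothetical data (t, S, D) generated from f = 3.0 - 0.01 t + 0.02 S - 1.0 D.
predictors = [(5, 40, 1.4), (10, 55, 1.5), (20, 35, 1.6),
              (30, 60, 1.3), (45, 50, 1.7), (60, 45, 1.45)]
targets = [3.0 - 0.01 * t + 0.02 * s - 1.0 * d for t, s, d in predictors]
coefficients = fit_mlr(predictors, targets)
print([round(c, 6) for c in coefficients])
```

Because the synthetic targets are exactly linear in the predictors, the fitted coefficients should recover the generating values up to rounding error.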

Generalized Reduced Gradient (GRG).
GRG is a gradient-based nonlinear optimization technique [34]. Earlier, Zakwan et al. [1] and Muzzammil et al. [35] suggested that the GRG technique is superior to the conventional graphical method for estimating infiltration parameters and rating curve parameters. In accordance with GRG, the infiltration rate can be expressed by equation (7), where $c_5$, $c_6$, $c_7$, and $c_8$ are coefficients.
In the present study, the GRG solver embedded in Microsoft Excel was used to estimate the infiltration rate by minimizing the sum of squared errors. A detailed explanation of the working of the GRG technique is available in the literature [17,36].

Artificial Neural Network.
ANN is a well-documented AI model. It has been used for solving various problems in water resources and hydrological modelling [37,38]. Generally, an ANN has a few layers, whose neurons store data. The neurons in each layer (input, hidden, and output layers) are connected with neurons in the previous and next layers, whereas neurons within the same layer are not connected to each other [39]. The flexible architecture of ANN facilitates the estimation of a relationship between input and output data [40]. In this study, a feed-forward ANN was used to predict the infiltration rate. The controlling parameters of ANN were set to those used in previous studies [41].
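The layer-to-layer connectivity described above can be illustrated with a forward pass through a small feed-forward network. This is a sketch only: the text does not state the architecture, activation function, or weights used in the study, so the single hidden layer, the tanh activation, and all numerical values below are assumptions.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: one tanh hidden layer followed by a linear output neuron."""
    # Each hidden neuron sees every input, but no neuron in its own layer.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron combines every hidden activation.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with three inputs (t, S, D, assumed already normalized)
# and two hidden neurons; the weights are illustrative, not calibrated values.
inputs = (0.2, 0.5, 0.7)
hidden_weights = [[0.4, -0.1, 0.3], [-0.6, 0.2, 0.1]]
hidden_biases = [0.0, 0.1]
output_weights = [0.8, -0.5]
output_bias = 0.05

print(forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias))
```

Training would adjust the weights and biases to minimize the prediction error; only the prediction step is shown here.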

Multigene Genetic Programming.
MGGP is a modified version of genetic programming (GP), which is classified as an AI technique [42]. It not only utilizes a genetic algorithm as its search engine but also works as a flexible estimator that does not require the shape of the prediction model to be known in advance [43]. In essence, MGGP follows a solving approach similar to that of GP using a tree-like structure, while it enables the use of more than one gene, i.e., tree, in each individual. This characteristic benefits MGGP in developing estimation models when the relation among the variables involved is complicated. As a result, a typical MGGP solution consists of a set of equations, each associated with one gene, which are algebraically summed using weighting coefficients. These coefficients are calibrated in MGGP, while a term called the bias is also added to the final solution. The terms comprising the final solution of MGGP help in improving its flexibility in capturing the relationship between input and output data. In this study, an open-access code of MGGP was used. This code was adopted from the literature [44], and it has been used in previous studies for other purposes [20]. It minimizes the root mean square of errors between the estimated and observed values of the normalized infiltration rates. Additionally, the MGGP parameters were selected from previous studies [20,43]. Since each run of MGGP may result in a unique equation, more than 100 runs of MGGP were considered for developing the relation between the infiltration rates and the other variables involved. The common number of MGGP runs in the literature is 50 [20,45], while double that number, i.e., 100, was taken into account to make sure that the best relation was achieved.
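The structure of a multigene solution, a bias plus a weighted sum of gene outputs, can be sketched as follows. The two genes, the weights, and the bias below are hypothetical and are not the expressions evolved in the study; the evolutionary search that would produce the genes is not shown.

```python
import math


def gene_1(t, s, d):
    """Hypothetical gene (expression tree): 1 / sqrt(t)."""
    return 1.0 / math.sqrt(t)


def gene_2(t, s, d):
    """Hypothetical gene (expression tree): S / (D * t)."""
    return s / (d * t)


def multigene_prediction(inputs, genes, weights, bias):
    """Final multigene output: bias + sum of weight_i * gene_i(inputs)."""
    return bias + sum(w * g(*inputs) for w, g in zip(weights, genes))


# Illustrative inputs: t = 4 min, S = 50 %, D = 1.25 g/cm^3.
inputs = (4.0, 50.0, 1.25)
prediction = multigene_prediction(inputs, [gene_1, gene_2], (2.0, 0.1), 0.3)
print(round(prediction, 6))  # 2.3
```

Because the output is linear in the weights and the bias, those can be calibrated by ordinary least squares once the genes are fixed, while a nonlinear optimizer can refine constants that sit inside the genes.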

Hybrid MGGP-GRG Technique.
The hybrid MGGP-GRG was first proposed in the literature for developing stage-discharge relationships [20]. In this technique, MGGP and GRG are used in two successive steps to find the best-fit model. Figure 2 depicts the flowchart of the hybrid MGGP-GRG for estimating infiltration rates. As shown, MGGP is first run to search for the form of equation that best fits the data, and the GRG technique is then used to optimize the coefficients of the equation obtained by MGGP. Hence, this hybrid technique not only benefits from the powerful capability of MGGP for seeking an accurate prediction model but also uses GRG to enhance the performance of the estimation model.

Performance Evaluation Criteria.
The performance of the infiltration models and soft computing techniques was compared based on several criteria taken from the literature [28,46]. In these criteria, $f_{obs}$ is the observed infiltration rate, $f_{obs}^{max}$ and $f_{obs}^{min}$ are the maximum and minimum observed infiltration rates, $f_{est}$ is the estimated infiltration rate at any time, and $\bar{f}$ is the mean of the observed infiltration rates. The Nash criterion has been widely used as an indicator of goodness of fit; its value ranges from $-\infty$ to 1, with 1 indicating a perfect fit.
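Three of the criteria named in this paper (MARE, MXARE, and the Nash criterion) can be sketched in Python as below. The defining equations are not reproduced in this text, so the commonly used forms are assumed here: relative errors are taken with respect to the observed values and reported as fractions rather than percentages, and the Nash criterion is taken as the Nash-Sutcliffe efficiency. The sample values are hypothetical.

```python
def mean_absolute_relative_error(observed, estimated):
    """MARE (assumed form): mean of |f_obs - f_est| / |f_obs|."""
    errors = [abs(o - e) / abs(o) for o, e in zip(observed, estimated)]
    return sum(errors) / len(errors)


def max_absolute_relative_error(observed, estimated):
    """MXARE (assumed form): largest |f_obs - f_est| / |f_obs|."""
    return max(abs(o - e) / abs(o) for o, e in zip(observed, estimated))


def nash_sutcliffe_efficiency(observed, estimated):
    """Nash criterion: 1 - sum (f_obs - f_est)^2 / sum (f_obs - mean(f_obs))^2."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Hypothetical observed and estimated infiltration rates (cm/min).
observed = [2.0, 1.0, 0.5, 0.25]
estimated = [1.8, 1.1, 0.5, 0.3]

print(mean_absolute_relative_error(observed, estimated))
print(max_absolute_relative_error(observed, estimated))
print(nash_sutcliffe_efficiency(observed, estimated))
print(nash_sutcliffe_efficiency(observed, observed))  # 1.0 for a perfect fit
```

A model that does no better than predicting the mean of the observations scores zero on the Nash criterion, and worse models score below zero.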
In the sensitivity analysis (SA), $IR_{max}(x_i)$ and $IR_{min}(x_i)$ are the maximum and minimum infiltration rates, respectively, determined by varying the input parameter $x_i$ while each of the other input parameters is set to its average value. The higher the SA percentage for a specific input variable, the more sensitive the model is to that variable.
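A sketch of this procedure is given below. The SA equation itself is not reproduced in this text, so one common normalization is assumed: the output range obtained for each input is divided by the sum of the ranges over all inputs and expressed as a percentage. The model, the function names, and the sample points are hypothetical.

```python
def sensitivity_percentages(model, samples):
    """SA percentage per input: output range for that input / sum of all ranges."""
    n_inputs = len(samples[0])
    means = [sum(row[i] for row in samples) / len(samples) for i in range(n_inputs)]
    ranges = []
    for i in range(n_inputs):
        outputs = []
        # Sweep input i over its sampled values; hold the others at their means.
        for value in sorted({row[i] for row in samples}):
            point = list(means)
            point[i] = value
            outputs.append(model(*point))
        ranges.append(max(outputs) - min(outputs))  # IR_max(x_i) - IR_min(x_i)
    total = sum(ranges)
    if total == 0:
        raise ValueError("model output does not vary with any input")
    return [100.0 * (r / total) for r in ranges]


def hypothetical_model(t, s, d):
    """Illustrative stand-in for a trained predictor of the infiltration rate."""
    return 2.0 / t ** 0.5 + 0.01 * s - 0.5 * d


# Hypothetical sample points (t in min, S in %, D in g/cm^3).
samples = [(5, 40, 1.4), (10, 55, 1.5), (20, 35, 1.6), (30, 60, 1.3)]
print(sensitivity_percentages(hypothetical_model, samples))
```

With this normalization, the percentages of all inputs add up to 100, so each value can be read as a share of the total output variation.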

Reliability Analysis.
The reliability analysis is conducted to investigate the overall consistency of a prediction model. For this analysis, the relative error of the estimation model is computed for each data point and compared with a threshold. Then, the number of cases whose relative error is equal to or lower than the specified threshold is divided by the total number of points. Finally, this ratio, expressed as a percentage, is the reliability metric, which demonstrates how reliably the prediction model performs with respect to the desired threshold. In this study, the reliability analysis was carried out for all methods used for predicting the infiltration rate, and the threshold was set to 20% based on the literature [49].
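The steps above translate directly into a short function. In this sketch, the relative error is assumed to be taken with respect to the observed value, and the function name and sample values are hypothetical.

```python
def reliability_percentage(observed, estimated, threshold=0.20):
    """Share of points (in %) whose relative error does not exceed the threshold."""
    within = sum(
        1
        for o, e in zip(observed, estimated)
        if abs(o - e) / abs(o) <= threshold  # relative error w.r.t. the observation
    )
    return 100.0 * within / len(observed)


# Hypothetical observed and estimated infiltration rates (cm/min).
observed = [2.0, 1.0, 0.5, 0.25]
estimated = [1.8, 1.3, 0.5, 0.26]

# Relative errors are about 0.10, 0.30, 0.00, and 0.04, so three of four points pass.
print(reliability_percentage(observed, estimated))  # 75.0
```

Tightening the threshold can only lower the reliability percentage, which makes the metric easy to compare across models at a fixed threshold.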

Results and Discussion
Accurate estimation of the infiltration rate plays a vital role in various aspects of watershed hydrology. The present work focuses on improving the estimates of the infiltration rate through the application of different soft computing approaches. The infiltration rates estimated by these techniques were compared with those approximated by the conventional infiltration models (Horton's model, modified Kostiakov's model, and Philip's model). In the conventional infiltration models, the observed infiltration rates and time were used as input data in accordance with the model equations to obtain the estimated infiltration rate. On the other hand, in MLR, GRG, ANN, MGGP, and the hybrid MGGP-GRG models, the observed infiltration rates, time, sand percentage, and density were used as the input variables to obtain the estimated infiltration rates. Table 2 presents the model parameters obtained in the training phase for the three conventional infiltration models. For the test phase, these parameters were used to estimate the infiltration rate based on equations (1), (3), and (4). The results of the different approaches considered in the present study were compared with respect to four criteria for both the train and test data. This comparative analysis is shown in Table 3. In this comparison, the same data divisions were considered for all methods. The metrics used for comparing the different infiltration models are given in Table 3. Based on Table 3, it may be observed that, among the conventional models, the performance of Horton's model was the worst for both the training and testing parts of the data. The modified Kostiakov's model improved the estimates of the infiltration rate by almost 4% and 10% as compared to those of Horton's model during training and testing, respectively. The performances of Philip's model and modified Kostiakov's model were almost comparable. Table 3 reveals that the technique used to model infiltration rates influences the estimates of the infiltration rate considerably. It can be observed that MLR provides the worst estimates of infiltration rates, which may reflect the nonlinear nature of the infiltration process. The conventional models provide slightly better predictions of infiltration rates than those obtained by MLR. Application of the GRG solver further improves the estimate of infiltration, as equation (7) involves a higher nonlinearity and a larger number of parameters than equations (1)-(4). Before the application of

Comparison of the Conventional Models with Soft Computing Approaches. A perusal of
Figures 3-10 present the relative error plots obtained from the different conventional infiltration models and computational techniques during training and testing. These figures also compare the methods based on MARE and MXARE for both the train and test data. Although the relative error plots of the conventional infiltration models and the other computational techniques (MLR, GRG, MGGP, and the hybrid MGGP-GRG) followed a similar sequence, the relative error plots of ANN followed a different pattern during both training and testing. It may also be observed from Figures 3-10 that the relative errors achieved by ANN are the lowest. On the other hand, the relative errors obtained by Horton's infiltration model were the highest. Furthermore, the AI-based models (ANN, MGGP, and the hybrid MGGP-GRG), which consider three independent variables (t, S, and D) instead of one variable (t), achieved much better MARE and MXARE than the empirical models during the training and testing phases. According to Figures 3-10, ANN and the hybrid MGGP-GRG resulted in the first and second best MARE and MXARE values, whereas MLR and Horton's model yielded the first and second worst MARE and MXARE values for the train and test data.
Figures 11 and 12 depict the comparison between the observed and estimated infiltration rates obtained by the best-fit models (ANN and the hybrid MGGP-GRG) and the worst-fit model (Horton's model). It may be observed from Figure 11 that the infiltration rates estimated by ANN almost fit the observed data during the training phase. On the other hand, the infiltration rates predicted by Horton's model deviated significantly from the observed data. The performance of the hybrid MGGP-GRG was better than that of Horton's model but poorer than that of ANN. During the testing phase, the estimates of the hybrid MGGP-GRG and ANN were almost identical, as shown in Figure 12.
The estimates obtained by Horton's model during the testing phase were again significantly different from the corresponding observed values. Hence, Figures 11 and 12 demonstrate how much the infiltration estimates can be enhanced by considering other variables involved in the process in addition to time, and they clearly indicate the better performance of the AI-based models in comparison with the available empirical equations.
Figure 13 depicts the results of the sensitivity analysis, which was conducted for ANN and MGGP. As shown, time has the highest SA percentage (SA = 91.98%) for ANN, which implies that the infiltration rates predicted by ANN are far more sensitive to time than to the other two input variables (sand percentage and density). This finding is in agreement with the fact that the empirical models (such as Horton's and modified Kostiakov's) used for estimating infiltration rates rely only on time. On the other hand, the MGGP-based model, which yielded a lower accuracy for predicting infiltration rates than ANN, was found to be more sensitive to sand percentage than to time. Therefore, as the physical background of the problem indicates that infiltration rates depend on time, the results of the sensitivity analysis also indicate that ANN estimated infiltration rates better than the MGGP-based model.
The reliability analysis was carried out for the train and test data separately. The results of this analysis are presented in Figure 14. As shown, ANN achieved the highest percentages of reliability for both the train and test data. Furthermore, the reliability percentages obtained by MGGP and the hybrid MGGP-GRG were higher than those of the empirical models, MLR, and GRG. Finally, the reliability analysis conducted in this study reveals the improvement made by the AI models over other data-driven methods available in the literature for predicting infiltration rates.
The structures of the equations used by the conventional infiltration models, MLR, and GRG are known before these methods are applied. On the other hand, ANN, MGGP, and the hybrid MGGP-GRG are highly nonlinear techniques with greater degrees of freedom and complexity and, therefore, provide better estimates of the infiltration rate.
However, the more precise results of ANN, MGGP, and the hybrid MGGP-GRG are obtained at the expense of higher computational effort. These machine learning tools require a considerable number of runs, unlike the conventional models and MLR, in which a single attempt is sufficient for determining the model output. Based on the comparative analysis conducted in this study, ANN yielded the best estimates of infiltration rates. However, the estimates obtained from the hybrid MGGP-GRG were also comparable, especially for the test data. Furthermore, unlike ANN, the hybrid MGGP-GRG model provides explicit equations for predicting infiltration rates, which can be implemented in typical hydrological models and may be preferred in practice by engineers. This may be counted as an advantage of this AI-based technique.

Conclusions
In the present study, published infiltration data were used to assess the performances of MGGP and the hybrid MGGP-GRG technique in modelling the infiltration rates of soil. The estimated infiltration rates were compared with those obtained by the conventional models (Horton's model, Philip's model, and modified Kostiakov's model). It was observed that the application of the hybrid MGGP-GRG and MGGP improved the estimates of infiltration rates as compared to the conventional infiltration models by over 80%. On the other hand, ANN provided the best estimates of infiltration rates. Along with the improvement in accuracy, the application of ANN, MGGP, and the hybrid MGGP-GRG increased the complexity of the modelling equations. Future studies may focus on comparing the hybrid MGGP-based models with other machine learning approaches. In addition, applying the explicit infiltration models developed by either MGGP or the hybrid MGGP-GRG in hydrological models is anticipated in order to assess their performance in practice.

Data Availability
The data used in this study are available in the related literature.