Integration of Multiple Models with Hybrid Artificial Neural Network-Genetic Algorithm for Soil Cation-Exchange Capacity Prediction

Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
Department of Civil Engineering, Siddaganga Institute of Technology, Tumakuru 572103, Karnataka, India
Department of Railroad Construction and Safety Engineering, Dongyang University, Yeongju 36040, Republic of Korea
Department of Real Estate Development and Management, Faculty of Applied Sciences, Ankara University, Ankara, Turkey
Faculty of Engineering and Architecture, Department of Geomatics Engineering, Tokat Gaziosmanpaşa University, Tokat, Turkey
Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE C1A4P3, Canada
School of Climate Change and Adaptation, University of Prince Edward Island, Charlottetown, PE C1A4P3, Canada
New Era and Development in Civil Engineering Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah 64001, Iraq
Department of Earth Sciences and Environment, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
Adjunct Research Fellow, USQ’s Advanced Data Analytics Research Group, School of Mathematics Physics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia


Introduction
Cation-exchange capacity (CEC) is the capacity of a soil to retain exchangeable cations, which have a direct bearing on the soil fertility triangle [1]. Soil CEC is a sensitive indicator of natural and human-induced perturbations over the soil profile and groundwater [2]. Monitoring changes in soil CEC can assist in predicting whether soil quality has degraded, improved, or been sustained under diverse agricultural or forestry schemes. In the course of conventional soil management practices to replenish the soil solution that supports plant growth, the negatively charged clay particles and organic substances adsorb and hold positively charged soil nutrients (e.g., NH₄⁺, K⁺, Mg²⁺, and Ca²⁺) via electrostatic forces [3,4]. The preferential adsorption of cations follows the sequence Al³⁺ > Ca²⁺ > Mg²⁺ > K⁺ = NH₄⁺ > Na⁺ [5]. Depending on the soil structure, CEC indicates the shrink-swell potential of a soil: a high CEC value (>40 meq/100 g) denotes that the soil structure will recover gradually and can sometimes show expansive behavior. In contrast, a soil with a low CEC value (<10 meq/100 g) will have a reduced capacity to hold water and will acidify rapidly [6]. Soil CEC can fluctuate according to clay percentage, soil pH, ionic strength, soil-to-solution ratio, clay type, and changing organic matter composition. It is sometimes affected by the redistribution of cations (exchange kinetics) in the soil attributed to soil solution buffering and solute transport. CEC also enables the categorization of certain soils including oxisols, vertisols, alfisols, mollisols, and ultisols [7]. In general, organic matter enriches soils and clays (except kaolinite) usually have a high CEC, while sands have no CEC. For agriculture, the preferred value of CEC is >10 meq/100 g for exchange between plant root hairs and soils [8].
The leaching of contaminants into the underlying aquifer system is usually affected by CEC and percent base saturation, which are informative indices of soil fertility and nutrient retention capacity. In areas of intensive irrigation, the continuous use of inorganic fertilizers in excess inundates the soil profile with nutrients and thereby flushes a plume of contaminants into the groundwater [9]. Hence, in the early stages of agriculture, it is necessary to estimate CEC in order to determine the supplemental nutrient needs or to remove excess salts, which influence soil structure and agricultural productivity. As noted above, tracking changes in soil CEC helps indicate whether soil quality has degraded, improved, or been sustained under diverse agricultural or forestry schemes [10].
Various methods for direct measurement of soil CEC have been reported and extensively discussed in the literature [11][12][13]. A comparison of multiple CEC estimation techniques is presented by Conradie and Kotze [14]. In addition, there exist several ancillary approaches such as pedotransfer functions (PTFs) for estimating CEC from easily measured physical and chemical soil properties [15][16][17][18]. Several other researchers conducted studies on the functional relationships between CEC, water retention, and particle-size distribution. Lambooy investigated the influence of CEC on the water retention characteristics of soils [19]. Using multiple regression, Parfitt et al. estimated CEC by taking into account soil organic carbon and clay content [20]. Krogh et al. modeled the CEC of Danish soils by using clay and organic matter content as input variables in a multiple linear regression analysis [21]. The actual CEC of agricultural soils was found to be directly related to the estimated charge of clay and organic carbon in the soil mass at the actual pH [22]. Using soil organic matter and noncarbonate clay contents as predictors, Seybold et al. explained the variation in CEC for several soil horizons based on soil pH, mineralogy class, taxonomic family, and CEC-activity class [23]. Fooladmand derived PTFs using multiple linear regression between CEC and soil textural data including sand content, clay content, geometric mean particle-size diameter, the soil particle-size distribution, and soil organic matter content [24]. Several PTFs relating soil CEC to the sand, silt, or clay fractions and the soil organic carbon content were evaluated by Khodaverdiloo et al., taking into account the effect of calibration dataset size on the prediction accuracy of soil CEC [25].
These classical pedotransfer function-based approaches often suffer from a high degree of inaccuracy due to spatial scale dependence, nonlinear relationships among variables, and an inability to handle mixed data [26]. Hence, the current state of the art motivates a new research direction in which more intelligent models are explored in this field.
Recent research studies have focused on improving the estimation accuracy of soil CEC by means of artificial intelligence (AI) techniques. Artificial neural network (ANN) based PTFs have become popular for predicting or estimating the soil CEC of different soil types under diverse climatic zones [27][28][29][30][31]. Kalkhajeh et al. predicted soil CEC using different data-driven models [32]. They compared the performance of multiple linear regression (MLR), adaptive neurofuzzy inference system (ANFIS), multilayer perceptron (MLP), and radial basis function (RBF) based ANN models for predicting soil CEC using the bulk density, calcium carbonate, organic carbon, clay, and silt content (%) of the soil as input variables. The MLP model gave the most reliable prediction of soil CEC. A set of AI models along with empirical PTFs was developed and evaluated by Ghorbani et al. [33]; the authors identified the soil properties that most influence soil CEC through sensitivity analysis. The ANFIS model performed better than the RBF, MLP, MLR, and empirical PTF models in estimating soil CEC. Arthur [2] presented an ANN based methodology for estimating CEC from soil water content at different relative humidity ranges. Relatively few studies have used a support vector machine (SVM), random forests (RF), genetic expression programming (GEP), multivariate adaptive regression splines (MARS), or a subtractive clustering algorithm based ANFIS for estimating soil CEC using readily measured soil properties as inputs [34][35][36][37][38]. A hybrid model integrating the ant colony optimization (ACO) algorithm with ANFIS improved the prediction accuracy of soil CEC, accompanied by an optimal choice of input subset comprising soil properties (e.g., soil organic matter, clay, silt, pH, and bulk density) [39].
Although there has been noticeable progress in AI implementation in the field of geoscience, the development of more reliable intelligent predictive models remains an active research area. In addition, applications of hybrid AI models have been widely reported in the literature across diverse engineering and science domains [40][41][42][43]. This motivates the development of multiple learning intelligent models for soil CEC, which is investigated here. Hybrid soft computing approaches involving evolutionary algorithms coupled with AI techniques facilitate the development of more sophisticated models with higher prediction accuracy. Hence, in the present study, a hybrid approach involving the multilayer perceptron neural network optimized with a genetic algorithm (GANN) was developed and employed to enhance the prediction efficiency of soil CEC in Tabriz plain, an arid region of Iran. In addition, a multiple model integration scheme supervised with hybrid GANN (MM-GANN) was also simulated and verified to improve the soil CEC prediction efficiency. This multiple model integration scheme supervised with the GANN approach is a unique form of hybrid model for soil CEC prediction. Standalone MLP artificial neural network (ANN) and extreme learning machine (ELM) models were also implemented, both for incorporation into the multiple model integration scheme and for comparison with the MM-GANN model predictions.

Artificial Neural Network (ANN).
The multilayer perceptron (MLP), a class of feedforward ANN, is one of the most versatile algorithms and has proven able to simulate highly complex and nonlinear relationships between a set of input variables (predictors) and the output data (predictand) [44]. An MLP neural network with one hidden layer is shown in Figure 1. The network is trained to learn a function $f(\cdot): P^{d} \rightarrow P^{o}$ on a set of training data, where "d" denotes the number of input dimensions and "o" denotes the number of output dimensions of the model [45].
The Levenberg-Marquardt backpropagation (BP) algorithm fine-tunes the weights and parameters of the MLP network. In the network architecture, the input layer consists of a set of processing units (neurons) $\{p_i \mid p_1, p_2, \ldots, p_n\}$ representing the model input features, and every hidden layer neuron performs a nonlinear transformation of the inputs from the previous layer via a weighted linear summation of inputs ($w_1 p_1 + w_2 p_2 + \cdots + w_n p_n$). A nonlinear activation function ($\sigma$) is then applied to each hidden unit following the affine transformation, which makes a specific topology of weighted links more flexible [46]. The neurons of the final layer receive connections from the hidden layers of the network and form the output layer, which produces a refined output. Commonly used activation functions include the hyperbolic tangent (tanh) and sigmoid (logsig) functions. There are no general rules for choosing training algorithms and adjusting the associated parameters of the MLP architecture to maximize the efficiency of the network. A good introduction to ANN, its mathematical concepts, and its applications is provided in the literature [47][48][49][50][51].
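The weighted summation and activation described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the function name `mlp_forward`, the tanh hidden units, the linear output neuron, and all numeric weights are hypothetical choices for illustration, and no training (Levenberg-Marquardt or otherwise) is performed.

```python
import math

def mlp_forward(p, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with tanh hidden units."""
    hidden = []
    for w_row, b in zip(hidden_weights, hidden_biases):
        # Weighted linear summation w1*p1 + ... + wn*pn plus a bias term.
        z = sum(w * x for w, x in zip(w_row, p)) + b
        # Nonlinear activation applied after the affine transformation.
        hidden.append(math.tanh(z))
    # Linear output neuron combining the hidden activations.
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for a network with 3 inputs and 2 hidden neurons.
y = mlp_forward(
    p=[0.5, -1.0, 2.0],
    hidden_weights=[[0.1, 0.2, 0.3], [-0.4, 0.5, 0.0]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.0, -1.0],
    output_bias=0.5,
)
```

A training algorithm would then adjust `hidden_weights`, `hidden_biases`, `output_weights`, and `output_bias` to reduce the error between `y` and the measured target.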

Extreme Learning Machine (ELM).
The extreme learning machine (ELM) model proposed by Huang et al. [52] for a single layer feedforward network (SLFN) has been widely used for prediction, forecasting, and estimation in many engineering fields [53][54][55]. Previous research studies have demonstrated the advantages of the ELM model over traditional AI techniques [56][57][58]. In addition, the ELM model can be implemented easily and has improved features such as fast learning speed [59], superior generalization performance [60], and the ability to use nondifferentiable activation functions for training the SLFN [52,61]. Figure 2 portrays the general network structure of an ELM model. For N arbitrary distinct input samples $(X_j, Y_j) \in \mathbb{R}^n \times \mathbb{R}^n$, the standard SLFN with "L" hidden layer nodes can be described as follows:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot X_j + c_i) = Y_j, \quad j = 1, \ldots, N,$$

where $c_i \in \mathbb{R}$ is the assigned bias of the ith hidden node, $w_i \in \mathbb{R}^n$ is the assigned input weight connecting the ith hidden node and the input layer nodes, $\beta_i$ is the weight connecting the ith hidden node and the output layer nodes, and $g(\cdot)$ gives the output of the ith hidden layer node with respect to the input $X_j$. Each input is assigned to the hidden nodes in the ELM model. The output weights can be derived by finding the least-squares solution of this linear system. The main difference between the ELM model and traditional AI techniques is that the parameters of the feedforward network, including its input weights and hidden layer biases, are randomly selected without any adjustment in the ELM model. For a good introduction to ELM, its mathematical concepts, and its architectures, refer to Huang et al. [62], Martínez-Martínez et al. [63], Wang et al. [64], and Ding et al. [53].
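The two defining steps of ELM described above, random hidden-layer parameters that are never adjusted and output weights obtained by least squares, can be sketched in a few lines. This is an illustrative sketch, not the authors' MATLAB implementation: the sigmoid activation, the uniform range [-1, 1] for the random parameters, the function names, and the toy data are all assumptions. The output weights are solved through ridge-stabilized normal equations, a stand-in chosen to keep the example dependency-free in place of the Moore-Penrose pseudo-inverse usually used with ELM.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def hidden_output(x, w, c):
    """Sigmoid hidden-layer outputs g(w_i . x + c_i) for one sample."""
    out = []
    for w_row, c_i in zip(w, c):
        z = sum(w_k * x_k for w_k, x_k in zip(w_row, x)) + c_i
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out

def elm_fit(xs, ys, n_hidden, ridge=1e-8, seed=0):
    rng = random.Random(seed)
    n_in = len(xs[0])
    # Input weights and biases are drawn at random and never adjusted.
    w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    c = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_output(x, w, c) for x in xs]
    # Least-squares output weights from (H^T H + ridge * I) beta = H^T y.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    return w, c, solve_linear(hth, hty)

def elm_predict(x, model):
    w, c, beta = model
    return sum(b * g for b, g in zip(beta, hidden_output(x, w, c)))

# Toy one-input regression problem (hypothetical data).
xs = [[i / 19] for i in range(20)]
ys = [2 * x[0] + 1 for x in xs]
model = elm_fit(xs, ys, n_hidden=5)
sse = sum((elm_predict(x, model) - y) ** 2 for x, y in zip(xs, ys))
```

Because only the output weights are fitted, training reduces to one linear solve, which is the source of the fast learning speed noted above.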

Hybrid Genetic Algorithm-Neural Network (GANN).
The genetic algorithm (GA) belongs to a class of iterative search approaches based on the "Darwinian" theory of natural selection and genetics, which provide optimum solutions for combinatorial optimization, heuristic search, or process planning problems [65,66]. GA applies genetic operators such as reproduction, crossover, and mutation to improve the population and search for the best individuals by artificially imitating the natural evolution process. The genetic algorithm is initiated with an initial population of possible solutions (individuals) and a specified objective (fitness function), wherein every individual is represented by a chromosome, a distinct form of encoding [67]. The chromosomes of a population are nominated for reproduction based on their fitness values, and the fittest individuals so selected are manipulated using crossover and mutation.
The basic idea here is the hope that superior parents can probabilistically produce superior offspring. The offspring of the next generation are generated by applying the GA operators, crossover and mutation, to the selected parents. The iteration process continues until the search meets the termination criterion [65,68]. The schematic illustration of the GA cycle is presented in Figure 3. The advantages of GA include (1) rapid convergence to the global optima, (2) superior multidirectional global search even on complex search surfaces, (3) use of probabilistic rather than deterministic transition rules, and (4) applicability to search spaces where gradient information is missing. The training of an MLP network, which is a type of neural network (NN), is somewhat of a cyclic process. However, in the present case of the hybrid genetic algorithm-neural network (GANN), the intelligent search technique (GA) allows the user to configure the weight initialization range and the number of hidden layer neurons and to update the weights and bias terms of an MLP network. In effect, GA is used to learn the best hyperparameters for an MLP network. Even though the weights of the MLP network are initialized randomly, GA does not follow a simple random walk. Based on the parameter settings, it effectively exploits the available information to propose new search points with anticipated improved performance [69]. GA iteratively selects the solutions with the best fitness values and recombines them with mutation and crossover operators to introduce offspring into the population.
This process continues until the optimal solution with the highest fitness value is found according to the chosen stopping criterion. Thus, the population's fittest MLP network is determined.
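The selection, crossover, and mutation cycle described above can be sketched as a small GA that evolves the weights and biases of an MLP. This is a simplified illustration under stated assumptions, not the configuration used in the study: the number of hidden neurons is fixed rather than optimized, and the tournament selection, single-point crossover, Gaussian mutation, elitism, population size, rates, and toy data are hypothetical choices.

```python
import math
import random

def mlp_predict(chrom, x, n_hidden):
    """One-input MLP; chrom is a flat chromosome [w, b, v] per hidden node + output bias."""
    out = chrom[-1]
    for j in range(n_hidden):
        w, b, v = chrom[3 * j: 3 * j + 3]
        out += v * math.tanh(w * x + b)
    return out

def mse(chrom, data, n_hidden):
    return sum((mlp_predict(chrom, x, n_hidden) - y) ** 2 for x, y in data) / len(data)

def ga_train(data, n_hidden=3, pop_size=30, generations=60,
             crossover_rate=0.8, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)
    n_genes = 3 * n_hidden + 1
    fitness = lambda chrom: mse(chrom, data, n_hidden)  # lower is fitter
    # Initial population: random weights within an assumed range [-1, 1].
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the fittest individual over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied gene by gene.
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Toy regression target (hypothetical data).
data = [(i / 10, math.sin(i / 10)) for i in range(-10, 11)]
best, history = ga_train(data)
```

With elitism, the best error recorded in `history` can never increase from one generation to the next, which makes the stopping criterion easy to monitor.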

Multiple Model Integration Scheme Supervised with Hybrid GANN Model.
The proposed multiple-model integration scheme involves developing the ANN and ELM models individually, using the input combinations defined in their model structures. The discrete outputs (predicted series) of the individual ANN and ELM models are then combined as inputs to the GANN model to obtain superior soil CEC predictions.
The implementation of this multiple-model scheme involves two phases. In the first phase, the best-performing ANN and ELM models are identified by simulating all possible combinations of inputs. In the second phase, the discrete outputs (predicted series) of the best ANN and ELM models are combined as inputs to simulate the GANN model. The GA optimizes the number of hidden layer neurons and updates the weights and bias terms of the ANN. The final output derived from this proposed scheme is referred to as the integrated multiple models supervised with a hybrid GANN (MM-GANN) strategy (Figure 4).
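The data flow of the two phases can be sketched as follows. The sketch shows only the bookkeeping: enumerating input combinations, keeping the best-scoring one, and pairing two predicted series as inputs for the supervising model. The learner `toy_fit` is a hypothetical stand-in (a single least-squares scale factor) used in place of the ANN and ELM models, the GANN step itself is not implemented, and the soil values are invented for illustration.

```python
import math
from itertools import combinations

def all_input_combinations(features):
    """Every non-empty subset of the candidate inputs (phase 1 search space)."""
    return [combo for r in range(1, len(features) + 1)
            for combo in combinations(features, r)]

def rmse(obs, pred):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(obs, pred)) / len(obs))

def select_best(fit_fn, rows, target, features):
    """Phase 1: fit one model per input combination and keep the lowest-RMSE one."""
    best = None
    for combo in all_input_combinations(features):
        predict = fit_fn(rows, target, combo)
        preds = [predict(row) for row in rows]
        score = rmse(target, preds)
        if best is None or score < best[0]:
            best = (score, combo, preds)
    return best

def stack_inputs(ann_preds, elm_preds):
    """Phase 2: pair the two predicted series as inputs for the supervising model."""
    return [list(pair) for pair in zip(ann_preds, elm_preds)]

def toy_fit(rows, target, combo):
    """Hypothetical stand-in learner: one least-squares scale on the summed inputs."""
    s = [sum(row[f] for f in combo) for row in rows]
    k = sum(a * b for a, b in zip(s, target)) / sum(a * a for a in s)
    return lambda row: k * sum(row[f] for f in combo)

# Invented sample values; the target depends on clay and OM only.
rows = [
    {"clay": 10, "OM": 1, "pH": 7},
    {"clay": 20, "OM": 3, "pH": 8},
    {"clay": 30, "OM": 2, "pH": 6},
    {"clay": 40, "OM": 5, "pH": 9},
]
target = [2 * (r["clay"] + r["OM"]) for r in rows]
score, combo, preds = select_best(toy_fit, rows, target, ["clay", "OM", "pH"])
# In the scheme described above, the two series would come from the best ANN
# and the best ELM model; the same series is reused here only for brevity.
gann_inputs = stack_inputs(preds, preds)
```

With the five soil parameters used in this study, an exhaustive search would cover 31 input combinations per model type.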

Case Study and Data Description
The study area (Tabriz plain) encompasses 150,000 hectares (between 45°25′-46°12′ E, 37°50′-38°20′ N) and is located in the East Azerbaijan province of Iran. The surface topography of the area comprises rugged, mountainous rims, and the study area lies toward the north-eastern part of Urmia Lake (Figure 5). Tabriz plain is a high-altitude location (1360 m above mean sea level) characterized by cooler, wetter winters and hot summers with a tropical and subtropical steppe climate. The study area receives no more than 40 mm of rainfall in any month, and the annual mean precipitation is around 360.7 mm. The geology of the area includes recent alluvium, fine clastic sediments, and red conglomerate with an alternation of sandstone and red marl. The ammonium saturation method described by Chapman [70] was used for the cation-exchange capacity determination. The descriptive statistics of soil CEC and the other soil parameters of the study area are tabulated in Table 1.
The spatial distribution of observed soil CEC is presented in Figure 6. The clay and soil organic matter were positively
Figure 5: Location of the study area along with sampling points.

Modeling Development
Based on different combinations of soil parameters, a framework of model input-output scenarios was set up for the development of the ANN and ELM models, with soil CEC as the output parameter. The input-output scenarios tested are listed in Table 2. The performance of the developed models was assessed using multiple statistical indices, namely, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Nash-Sutcliffe Efficiency (NSE) [71], Mean Absolute Percentage Error (MAPE), and coefficient of determination (R²).

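Root Mean Square Error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$

Mean Absolute Error:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$$

Nash-Sutcliffe Efficiency:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Mean Absolute Percentage Error:
$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right|$$

Coefficient of determination:
$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\,\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}\right]^2$$

where $x_i$ is the actual value, $y_i$ is the model estimated value, $\bar{x}$ is the mean of the true values, $\bar{y}$ is the mean of the model estimated values, and $n$ is the number of data points.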
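The five indices named above can be computed directly from the observed and estimated series. The sketch below uses their standard definitions, with R² taken as the squared Pearson correlation; the function name `evaluate` and the sample values are hypothetical.

```python
import math

def evaluate(obs, pred):
    """Return RMSE, MAE, NSE, MAPE (%), and R^2 for observed vs. predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    sq_err = sum((x - y) ** 2 for x, y in zip(obs, pred))
    var_o = sum((x - mean_o) ** 2 for x in obs)
    var_p = sum((y - mean_p) ** 2 for y in pred)
    cov = sum((x - mean_o) * (y - mean_p) for x, y in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": sum(abs(x - y) for x, y in zip(obs, pred)) / n,
        # NSE compares the model error with the spread of the observations.
        "NSE": 1.0 - sq_err / var_o,
        # MAPE is undefined when an observed value is zero.
        "MAPE": 100.0 / n * sum(abs((x - y) / x) for x, y in zip(obs, pred)),
        # R^2 as the squared Pearson correlation coefficient.
        "R2": cov ** 2 / (var_o * var_p),
    }

# Hypothetical observed and estimated CEC values (meq/100 g).
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

NSE and R² can differ noticeably: R² measures only linear association, whereas NSE also penalizes bias in the estimates.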

Performance of ANN and ELM Models.
The ANN and ELM models were simulated for predicting soil CEC based on the input-output combinations listed in Table 2. The model structure (input nodes-hidden layer nodes-output nodes) and performance metrics of the ANN model for each input combination are presented in Table 3. In this study, the proposed ANN, ELM, and MM-GANN models were developed using MATLAB code. The input-output scenario involving all the soil parameters (i.e., clay + OM + pH + silt + CCE) provided the best estimates of soil CEC, with NSE = 0.842. The input-output scenario involving four soil parameters (i.e., clay, OM, pH, and silt) also offered relatively good soil CEC estimates, with NSE = 0.826. Despite the significant correlation between clay and soil CEC, the single-input ANN model (i.e., clay-CEC) failed to provide good soil CEC predictions. The spatial distribution map of ANN predicted soil CEC is presented in Figure 7. The ability of the MLP network to formulate a priori explicit hypotheses about a possible nonlinear relationship among several input variables distinguishes it from other AI methods. The performance metrics of the ELM models for each input-output scenario are tabulated in Table 4. The scenario involving all the soil parameters (i.e., clay + OM + pH + silt + CCE) provided the best predictions of soil CEC, with NSE = 0.835. The ELM model efficiency was slightly lower than that of the ANN. The ELM model simulated with four inputs (i.e., clay, OM, pH, and silt) performed somewhat worse than the ANN model with the same input structure. The spatial distribution map of ELM predicted soil CEC is shown in Figure 8. The scatter plots of the three efficient models presented in Figure 9 display the linear relationship between the observed soil CEC and that estimated by the ANN and ELM models.
According to Figure 9, the ELM outperformed the ANN, although the two have very close performance in terms of the statistical indices (Tables 3 and 4). The ELM is known for its faster learning speed and better generalization performance than the ANN architecture. The soil CEC estimates of the ANN and ELM models were employed as new inputs to the GANN model to predict soil CEC. To select the optimal input combinations in further modeling steps, examples from previous literature in different fields were consulted for enhancing model accuracy [72][73][74][75]. It is worth mentioning that only the three best-performing combinations were considered in this hybrid model. The parameters of the genetic algorithm for adjusting the weights and bias terms of the ANN are presented in Table 5, and the performance statistics of the MM-GANN models are shown in Table 6. The spatial distribution map of MM-GANN predicted soil CEC is presented in Figure 10 and is very similar to the observed soil CEC map. The MM-GANN models developed with the predictions of ELM and ANN models calibrated with three and four soil parameters as inputs also offered convincingly good soil CEC predictions, with NSE = 0.80 and 0.854, respectively. The scatter plots of the MM-GANN models shown in Figure 9 depict the goodness of fit of the model predictions against the actual soil CEC values. In Figure 9, it is evident that the linear fit of the third MM-GANN combination lies very close to the 1:1 line, especially for the combination that included all the parameters. Table 8 compares the performance of the best ANN, ELM, and MM-GANN models based on the statistical measures during the training and testing phases. This table shows that the accuracy of the hybrid model is higher than that of the ELM and ANN models, based on all the criteria.
The Taylor diagrams plotted for the best ANN, ELM, and MM-GANN models are shown in Figure 11. According to the Taylor diagram, it is evident that the multiple-model scheme (MM-GANN) offered more accurate estimates of soil CEC than the ELM and ANN models based on three statistical metrics (RMSD, standard deviation, and correlation coefficient). The MM-GANN model was the closest to the observed data. The point density plots presented in Figure 12 also support this statement by showing the correspondence between observed and modeled soil CEC.
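A single point on a Taylor diagram can encode all three statistics because they are tied together by a law-of-cosines relation. As a brief reminder, with notation assumed here rather than taken from the study ($\sigma_o$ and $\sigma_m$ for the standard deviations of the observed and modeled series, $R$ for their correlation coefficient, and $E'$ for the centered RMSD):

```latex
E'^2 = \sigma_o^2 + \sigma_m^2 - 2\,\sigma_o\,\sigma_m\,R
```

A model point therefore lies closer to the reference (observed) point when its standard deviation matches the observed one and $R$ approaches 1.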

Validation with Published Research Studies.
It is worthwhile to validate the results of the current research against reliable published literature for a similar kind of study area (i.e., a semiarid region). The coefficient of determination (R²) was selected as the indicator of prediction capability. The best R² values obtained for the MM-GANN, ELM, and ANN models are R² ≈ 0.88, 0.85, and 0.84, respectively. In one of the earliest studies on soil CEC simulation, along the Zayandehroud River in Isfahan, Iran, Amini et al. [27] established two classical ANN algorithms (i.e., a feed-forward neural network and a generalized regression neural network). The applied models gave poor prediction results, with R² ≈ 0.69 and 0.66. Another study was conducted by Emamgolizadeh et al. to predict soil CEC from soil information collected in Semnan, Mashahad, and Taybad cities of Iran [35]. The authors developed two data intelligence models, namely, genetic expression programming (GEP) and multivariate adaptive regression spline (MARS). The GEP and MARS models attained R² ≈ 0.80 and 0.86. Overall, the current study showed a convincing correlation performance relative to these state-of-the-art research studies.
Although the current research is the only attempt so far to develop and assess the multiple model integration scheme supervised with hybrid GANN (MM-GANN), its limitations should be addressed in future research. It is evident from the tables and figures that the MM-GANN model can improve the prediction accuracy of soil CEC when its inputs are the predictions of ELM and ANN models calibrated with all the soil parameters (e.g., clay, OM, pH, silt, and CCE). However, one of the disadvantages of the MM-GANN model lies in the selection of the best standalone model for enhancing the prediction accuracy of soil CEC. Therefore, it is recommended to incorporate the prediction results of other data-driven models as inputs of the MM-GANN model, which can enhance the model's performance. In addition, this concept can be expanded and applied to other engineering fields such as structural, hydrologic, water resources, and climatic engineering, and to different time series prediction/forecasting problems.

Conclusion
Over the past two decades, there has been a noticeable demand for soil data assessment with regard to pollution and land degradation. Soil process modeling using data intelligence models has advanced rapidly. The current study aimed to develop a hybrid machine intelligence model based on the multimodel genetic algorithm-neural network for soil cation-exchange capacity. Two classical artificial intelligence models, namely, the ANN and ELM, were developed to evaluate their performance in estimating soil CEC along with the proposed hybrid MM-GANN model. Several correlated soil parameters including clay, silt, pH, calcium carbonate equivalent (CCE), and soil organic matter (OM) were used as input attributes to the proposed and the comparable machine intelligence models. In particular, the hybrid MM-GANN model, which received the predicted values of ANN and ELM as input attributes, performed well in the estimation of soil CEC. Overall, the proposed multiple model integration scheme supervised with the hybrid GANN model functions as an efficient pedotransfer function to predict or estimate soil CEC using readily available soil parameters (i.e., clay, OM, pH, silt, and CCE) as input variables. In particular, the conclusions of the current investigation are as follows:
(i) Based on the applied evaluation metrics, the ELM model provided better CEC estimates than the ANN.
(ii) The proposed hybrid MM-GANN model outperforms both standalone ANN and ELM models in terms of all the statistical metrics.
(iii) The proposed integrated hybrid machine intelligence scheme (MM-GANN) proved to be a reliable strategy for modeling the soil cation-exchange capacity of the study area.
Finally, it is worth stating the possibilities for future research. Soil CEC is influenced by several morphological parameters [76,77]; thus, integrating feature selection as a modeling phase prior to the prediction process is highly recommended. In addition, owing to the variability associated with each soil CEC type, it would be ideal to estimate each type individually.

Complexity
Data Availability
The datasets are available. Data can be shared upon request from the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.