Evolutionary Prediction of Soil Loss from Observed Rainstorm Parameters in an Erosion Watershed Using Genetic Programming

Various environmental problems, such as soil degradation and landform evolution, are initiated by the natural process of soil erosion. Aggregated soil surfaces are dispersed by raindrop impact and its associated parameters, which are treated in this work as the variables on which soil loss depends. To monitor environmental degradation due to raindrop impact and its associated factors, this work employed the learning abilities of genetic programming (GP) to predict soil loss, using as predictors a database of rainfall amount, kinetic energy, rainfall intensity, gully head advance, soil detachment, factored soil detachment, runoff, and runoff rate collected over a three-year period. Three evolutionary trials were executed, and three models were presented considering different permutations of the predictors. The performance evaluation of the three models showed that trial 3, which had the highest parametric permutation, i.e., included the influence of all the studied parameters, gave the least error of 0.1 and the maximum coefficient of determination (R²) of 0.97 and as such is the most efficient, robust, and applicable GP model for predicting the soil loss value.


Soil Loss and Influencing Rainstorm Parameters.
As the part of the earth with the highest organic matter content, topsoil plays a significant role in farming and other fertility-dependent activities. Soil erosion, a naturally occurring process in which topsoil is removed by the action of rain or wind, is a serious environmental hazard that reduces soil fertility as well as the productivity of agricultural land [1][2][3][4][5]. A considerable percentage of global land is being permanently lost to water erosion through runoff and to wind. In fact, there is a strong, unresolved interconnectedness among precipitation, vegetation, and erosion [6]. Soil loss has also been reported to contribute to environmental degradation via air and water pollution, nutrient loss, and loss of soil organic matter or biota [1,7]. Among the factors that govern soil formation, such as topography, original materials, organisms, and time, which also determine the rate at which soil loss occurs, climatic or hydrometeorological factors such as rainstorm parameters play the most critical role [8]. Some authors have described soil temperature and soil loss as functions of meteorological variables, which are also positively affected by hydrological factors such as rainfall intensity, kinetic energy, runoff, and many more [9,10]. For instance, Kinnell [10] had earlier reported that the difference between rainfall intensity and the average infiltration rate could be used to estimate variations in the efficiency with which rainfall energy causes sheet erosion. While investigating the connection between the amount and kinetic energy of a rainstorm and its influence on soil, Kinnell [10] developed an equation that modelled rainfall kinetic energy as a function of rainfall intensity, both of which many researchers have regarded as critical factors in soil loss [11].
Moreover, another significant environmental threat many locations face is the continuous increase in the gully head or elevation as more and more dissection of the landscape is triggered by incessant soil erosion. Due to its intricacy and its critical role in exacerbating soil loss, gully head advancement has been modelled using various field studies, aerial photography, GIS analysis, and multiple regression, and the results all confirmed the spatial and temporal variation of gully longitudinal development, which reveals the effects of the runoff and waterfall process on soil erosion [12,13]. Another common occurrence within the environment is when coarse and medium sand size material gets reduced to either larger or smaller particle sizes [14]. During this phenomenon of particle detachment, in either the factored or unfactored form, high hydration energies from raindrops induce collision of particles, followed by a splash of drops of the fluidized soil and then a disintegration of the fluidized soil by overland flow [14,15]. None of these parameters affecting soil erosion has been perfectly measured, and hence, forecasting or prediction techniques have proven useful.
At present, various modelling techniques have emerged, such as field studies, aerial photography, GIS analysis, remote sensing, and multiple regression, which help to estimate soil loss under a wide range of climatic and land use conditions [12,16,17,18,19]. There has also been a massive decrease in the emphasis of erosion research on strictly empirically based models, such as the Universal Soil Loss Equation, due to the emergence of various evolutionary computation techniques, otherwise known as machine learning methods, that facilitate easy, accurate, and robust computations and predictions. Not only does the machine learning approach predict outputs more precisely, but it is also capable of inferring a decision boundary that separates the input data space into two distinct regions of erosion and nonerosion, thereby making soil loss prediction worthwhile [20]. As systems that imitate the human brain, machine learning techniques have been applied in various areas of engineering and beyond and are useful in making predictions, performing clustering, extracting association rules, or making decisions from a given dataset [21][22][23].

Genetic Programming (GP) and Potential for Soil Loss Prediction.
Genetic programming (GP) is one of the data-driven evolutionary computation methods that explores a program space rather than just searching it in a process setting as shown in Figure 1, which is a unique advantage it claims over the genetic algorithm [24,25]. GP performs this exploration effectively by selecting datasets accordingly, fitting, and introducing genetic variation via some genetic operators [26,27]. GP programs have the potential of changing their sizes, shapes, and composition, much like a living organism, to learn and report findings made from inputted datasets [21,28,29].
As a model that deals with complex adaptive systems via its tree-structure modelling mechanism (Figure 2), gene expression programming is a new, popular evolutionary technique that produces modelling equations in addition to its robust prediction configuration [31][32][33]. GP has been applied to many studies in environmental, water resources, structural engineering, geotechnical engineering, and beyond [26,27,31,33,34,35].
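The tree-structure mechanism described above can be illustrated with a minimal sketch. It is not the gene expression programming software used in this study: the function set, the terminal probabilities, the protected division and logarithm, and the names `random_tree`, `evaluate`, and `mutate` are all assumptions chosen for illustration. Candidate formulas are stored as nested tuples, evaluated against one data record, and varied by subtree mutation.

```python
import math
import random

# Assumed function set: name -> (arity, implementation). The "protected"
# division and logarithm avoid domain errors so any random tree can be evaluated.
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),
    "ln": (1, lambda a: math.log(abs(a)) if abs(a) > 1e-9 else 0.0),
}


def random_tree(variables, depth, rng):
    """Grow a random expression tree as nested tuples (function, child, ...)."""
    if depth <= 0 or rng.random() < 0.3:
        # Terminal node: usually a variable name, sometimes a numeric constant.
        if rng.random() < 0.8:
            return rng.choice(variables)
        return round(rng.uniform(-1.0, 1.0), 3)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    children = tuple(random_tree(variables, depth - 1, rng) for _ in range(arity))
    return (name,) + children


def evaluate(tree, record):
    """Evaluate a tree for one record (a dict mapping variable name -> value)."""
    if isinstance(tree, tuple):
        _, fn = FUNCTIONS[tree[0]]
        return fn(*(evaluate(child, record) for child in tree[1:]))
    if isinstance(tree, str):
        return record[tree]
    return tree  # numeric constant


def mutate(tree, variables, depth, rng):
    """Subtree mutation: replace one randomly chosen node with a fresh subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(variables, max(depth, 0), rng)
    i = rng.randrange(1, len(tree))
    child = mutate(tree[i], variables, depth - 1, rng)
    return tree[:i] + (child,) + tree[i + 1:]


# Hand-written example tree: P + ln(D), using two of the study's symbols.
example = ("add", "P", ("ln", "D"))
value = evaluate(example, {"P": 2.0, "D": 1.0})
```

A full GP run would add a population, a fitness measure, selection, and crossover on top of these three pieces; only the representation is sketched here.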
As earlier inferred, extensive care must be taken to ensure that the various constraints affecting soil erosion are all considered when predicting soil loss from the daily temporal and spatial variations of the concerned rainstorm parameters occasioned by different land use and climatic factors. The GP technique offers the advantage of being a data-driven tool that not only studies and analyzes all parameters of interest but also generates prediction equations and models based on lessons learnt from the input datasets. The use of GP in predicting soil loss from rainstorm parameters promises to aid in developing new algorithms or evaluating the existing ones for sustainable adjustment of soil parameters against continuous soil erosion actions [36,37]. The present study aims at utilizing the GP approach in predicting soil loss from various rainstorm parameters, particularly rainfall amount (mm), rainfall intensity (mm/h), kinetic energy (J/m²/mm), gully head advance (m), unfactored and factored soil detachments (g/m²), runoff rate (m³/s), runoff (cm³ × 10³), and soil loss (kg/m²), all having spatial and temporal variability modelled for the day-by-day period of 1993, 1995, and 1996.

Data Collection and Tabulation.
The data collection method adopted in this work was a literature collation of observed rainfall data from previous research works spanning a period of three years in a watershed measuring about 35 m², as presented in Table 1 [38] and graphically in the accompanying figures. The graphs represent the observed and estimated values of rainfall amount, kinetic energy, rainfall intensity, gully head advance, soil detachment, factored soil detachment, runoff, runoff rate, and soil loss, respectively, collated from the erosion watershed under study within the three-year period.

[Diagram labels: Set of Inputs, Process, Set of Outputs]
Figure 6: Estimated gully head advance with the selected dates of rainfall observation.
Figure 10: Observed runoff rate with the selected dates of observation.

Applied and Environmental Soil Science
Furthermore, the gully head advance parameter was estimated using the United States Soil Conservation Service simple parametric model for the average gully head advance in erosion watersheds, presented as equation (1) [39][40][41], where A is the drainage area above the gully head in m² (3.5 × 10 = 35 m²) and P is the 24-hour rainfall in mm. The values from the literature data collation and the parametric estimation of the erosion parameter were tabulated and deployed in a prediction exercise using the gene expression programming learning technique. The model functional relationship is presented in the following equation:

$$L = f(P, E_K, I, E, D, D_f, R, R_r) \qquad (2)$$

From equation (2), it can be deduced that the soil loss (L) is the dependent or target variable, while rainfall amount (P), kinetic energy (E_K), rainfall intensity (I), gully head advance (E), soil detachment (D), factored soil detachment (D_f), runoff (R), and runoff rate (R_r) are the independent or predictor variables of which L is a function.
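Forty-eight (48) records were collected to carry out this study. Each record contains the following data: rainfall amount P (mm), rainfall intensity I (mm/h), kinetic energy E_K (J/m²/mm), gully head advance E (m), soil detachment D and factored soil detachment D_f (g/m²), runoff R (cm³ × 10³), runoff rate R_r (m³/s), and soil loss L (kg/m²).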

Statistical Analysis of Database.
The collected records were divided into two sets. The first is a training set that contains 32 records, while the second is a validation set that contains 16 records. The complete dataset is given in Table 2. Statistical analysis of the utilized database is summarized in Table 3 to show the significant variability in the soil properties. It can be observed that soil loss has the highest degree of deviation and variance compared to the other observed parameters, with gully head advance showing the least. However, the deviation and variance of all the parameters showed values less than 1.
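The partitioning and descriptive statistics described above can be sketched as follows. This is only an illustration under stated assumptions: the records generated here are synthetic placeholders and not the measured values of Table 2, the random shuffle is an assumed way of forming the 32/16 split (the selection rule is not stated in this work), and the sample (n − 1) forms of standard deviation and variance are likewise assumed.

```python
import random
import statistics

# Symbols used for the eight predictors and the target (soil loss, L).
FIELDS = ["P", "I", "E_K", "E", "D", "D_f", "R", "R_r", "L"]


def make_placeholder_records(n=48, seed=1):
    """Synthetic stand-in records for illustration; NOT the data of Table 2."""
    rng = random.Random(seed)
    return [{name: rng.uniform(0.0, 1.0) for name in FIELDS} for _ in range(n)]


def split_records(records, n_train=32, seed=0):
    """Shuffle a copy and split it into training and validation sets.

    A random split is an assumption made for this sketch.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def describe(records, field):
    """Descriptive statistics of one field (sample std and variance assumed)."""
    values = [r[field] for r in records]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


records = make_placeholder_records()
train, valid = split_records(records)
summary = {name: describe(records, name) for name in FIELDS}
```

With the real database substituted for the placeholder records, `summary` would hold the kind of per-parameter figures reported in Table 3.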

Research Program.
Three trials were carried out to correlate the soil loss value (L) with the corresponding field measurements. Each trial uses a certain complexity level, starting from a four-level expression and going up to a six-level expression. Iterations are performed until the minimum SSR of the set is achieved, which reflects the most accurate expression at the considered level of complexity. The characteristics of each trial are summarized in Table 4.
Figure 11: Observed soil loss with the selected dates of observation.
The following paragraphs present and discuss the results of each trial. Results of all trials are summarized in Table 5. For all performed trials, the main target was to compare the predicted and measured soil loss values and hence evaluate the accuracy of the developed expressions based on the statistical analysis.

Prediction of Soil Loss Value Using Genetic Programming (GP).
As usual in the GP technique, the trials start with the lowest level of complexity, which is then increased to improve the prediction accuracy. This is because simple formulas allow for a limited number of variables; hence, the variables that appear in them are the most effective ones, while more complicated formulas admit additional, less effective variables. Hence, the considered variables could be ranked according to their impact on the output.

GP Model Trial No. (1).
This trial starts with the simplest expression, with four levels of complexity (chromosome length = 32 genes); the generated formula is shown in equation (3) and in Figure 12(a). The achieved error percentages for the training, validation, and total sets are 30%, 26%, and 29%, respectively, while the corresponding coefficient of determination (R²) values are 0.61, 0.65, and 0.62, respectively. The achieved R² values indicate a fair correlation between the predicted and measured soil loss values.
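The accuracy measures quoted for each trial can be computed along the following lines. The exact definitions used in this work are not spelled out, so this sketch assumes that SSR is the sum of squared residuals, that the error percentage is the root mean squared error divided by the mean observed value, and that R² is one minus the ratio of SSR to the total sum of squares; the function names are illustrative only.

```python
import math


def ssr(observed, predicted):
    """Sum of squared residuals between measured and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def error_percent(observed, predicted):
    """RMSE relative to the mean observed value, in percent (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(ssr(observed, predicted) / n)
    return 100.0 * rmse / (sum(observed) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSR/SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ssr(observed, predicted) / sst
```

Applying the same three functions separately to the training, validation, and total sets would give one row of a summary such as Table 5 for each trial.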

GP Model Trial No. (2).
In this trial, the complexity level is expanded to five levels (chromosome length = 64 genes). Equation (5) shows a well-correlated relationship, while Figure 12 (1) and (2) illustrate that the accuracy is significantly improved.
A1 = Ln(D) +

GP Model Trial No. (3).
Six levels of complexity (chromosome length = 128 genes) were used in this trial to produce equation (7). The relation between measured and predicted values of soil loss is shown in Figure 12(c). The achieved error values are 11%, 9%, and 10% for the training, validation, and total sets, and their R² values are 0.96, 0.98, and 0.97, respectively. It could be noted that the equation (3) where A1 = Ln

Conclusions
This study was concerned with predicting the soil loss value using the available database. The considered parameters were rainfall amount, kinetic energy, rainfall intensity, gully head advance, soil detachment, factored soil detachment, runoff, and runoff rate. Three GP models were developed with different complexity and accuracy levels, and comparing these models shows the following:
(i) The 1st model is the simplest one (with only 32 genes in the chromosome) and depended mainly on rainfall amount (P), soil detachment (D), and factored soil detachment (D_f). A fair accuracy level was achieved.
(ii) The 2nd model (with 64 genes in the chromosome) utilized the same parameters as the 1st model, besides gully head advance (E), runoff (R), and runoff rate (R_r), to enhance the accuracy level.
(iii) The last and most complicated model (with 128 genes in the chromosome) included all eight considered parameters and showed an excellent level of accuracy.
Based on the above brief, the following points could be concluded:
(i) The genetic programming (GP) technique was successfully used to predict the soil loss value (L) with excellent accuracy.
(ii) The accuracy of the predicted soil loss value increased with increasing complexity of the used expression up to a certain level (6 levels in this study); beyond that, the extra accuracy is not worth the extra complexity.
(iii) Rainfall amount (P), soil detachment (D), and factored soil detachment (D_f) are the main parameters controlling the soil loss (L).
(iv) Gully head advance (E), runoff (R), and runoff rate (R_r) have a secondary impact on soil loss (L).
(v) Kinetic energy (E_K) and rainfall intensity (I) do not have a significant effect on the soil loss (L).
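The ranking logic behind points (iii) to (v) can be expressed as a short sketch: each parameter is assigned the number of the simplest trial in whose formula it first appears, so a lower number suggests a stronger influence on soil loss. The variable sets below are taken from the model descriptions in these conclusions; the dictionary layout and the function name `rank_by_first_appearance` are illustrative assumptions.

```python
# Parameters appearing in each trial's formula, as described in the conclusions.
TRIAL_VARIABLES = {
    1: {"P", "D", "D_f"},
    2: {"P", "D", "D_f", "E", "R", "R_r"},
    3: {"P", "D", "D_f", "E", "R", "R_r", "E_K", "I"},
}


def rank_by_first_appearance(trial_variables):
    """Map each variable to the lowest-complexity trial in which it appears."""
    first_seen = {}
    for trial in sorted(trial_variables):
        for name in sorted(trial_variables[trial]):
            # setdefault keeps the earliest (simplest) trial for each variable.
            first_seen.setdefault(name, trial)
    return first_seen


ranking = rank_by_first_appearance(TRIAL_VARIABLES)
```

Here rank 1 corresponds to the main controlling parameters, rank 2 to the secondary ones, and rank 3 to those that entered only the most complex expression.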

Data Availability
The data used to support the results of this research are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.