Assessing Rainfall Erosivity with Artificial Neural Networks for the Ribeira Valley, Brazil

Soil loss is one of the main causes of the impoverishment and alteration of agricultural soil properties. Various empirical models (e.g., USLE) are used to predict soil losses from climate variables, which in general have to be derived from the spatial interpolation of point measurements. Alternatively, Artificial Neural Networks (ANNs) may be used as a powerful option to obtain site-specific climate data from independent factors. This study aimed to develop an artificial neural network to estimate rainfall erosivity in the Ribeira Valley and Coastal region of the State of São Paulo. In developing the network, the input variables were latitude, longitude, and annual rainfall, and the output was a mathematical equation of the activation function for use in the study area. The results show that ANNs can be used to interpolate rainfall erosivity values for the Ribeira Valley and Coastal region of the State of São Paulo with a satisfactory degree of precision for erosion estimation. The performance of the equation was demonstrated by comparison with the mathematical equation of the activation function adjusted to the specific conditions of the study area.


Introduction
Erosion is considered one of the main causes of the impoverishment and alteration of soil properties and, consequently, of the loss of agricultural soil. Mathematical models are used to quantify and/or predict such losses [1,2]. One classical example is the universal soil loss equation (USLE), proposed by Wischmeier and Smith [3]. This equation predicts the average annual soil loss from agricultural land. The USLE is represented by the product of the following factors: (a) rainfall erosivity (R), (b) soil erodibility (K), (c) slope length (L), (d) slope percent (S), (e) soil use, management, and cover (C), and (f) conservation support practices (P).
The R factor is an index that expresses rainfall erosivity, in other words, the erosive capacity of rainfall [4]. Erosivity is defined as the potential of rainfall to cause soil erosion and is exclusively a function of the physical characteristics of rainfall, including amount, intensity, droplet size, terminal velocity, and kinetic energy. Studies that examined this erosive agent in more detail showed that the rainfall characteristics that correlate best with soil losses are intensity and kinetic energy [5]. Therefore, the estimation of erosivity values, which represent the rainfall potential for erosion, is essential for planning soil and water conservation.
Water erosion is causing severe problems for the population of the State of São Paulo, such as loss of soil from arable farmland, reduction in public investment in infrastructure works, and degradation of urban areas. An estimated seven thousand gullies currently exist in the territory of the State of São Paulo. The cost of the corrective actions required to stabilize this geological phenomenon, which causes severe water erosion, corresponds to 20% of the State's budget, excluding the costs of restoring degraded urban areas, constructions, and urban street design schemes, among others. Water erosion on agricultural land is even more critical: it is estimated that 80% of São Paulo's agricultural soils are affected by erosion.
The procedures used to estimate rainfall erosivity (R factor) are slow. Pluviograph data are necessary for the calculation of the R factor, and such data are not easily available in Brazil. Data processing and analysis are slow and complex [6]. Besides, the small number of meteorological stations equipped to provide pluviograph data makes it difficult to disseminate such studies. An alternative is the use of empirical equations based on monthly and annual rainfall to estimate the R value [1,7,8]. However, even with these equations, rainfall erosivity estimates are restricted to sites that have the required pluviometric data. For places where such data are not available, authors like Bertoni and Lombardi Neto [6], Erosividade [9], and Silva [10] used interpolation techniques for R values. Machine Learning (ML) techniques have also been considered an attractive alternative for handling this problem [11].
One of the main techniques within ML is the Artificial Neural Network (ANN). According to Persson et al. [12], an ANN is a system developed to imitate the operation of the human brain, which acquires knowledge through a training process that aims to find the weights of the different connections. According to Haykin [13], this technique is based on a form of nonalgorithmic computation. For Sárközy [14], ANNs can be used as interpolation tools, and their ability to learn from different input parameters makes them capable of solving complex problems from many other areas. ANNs are also cited as alternative resources for estimating climatic variables that may replace the traditional interpolation methods [15].
An ANN is composed of a set of computational elements called artificial neurons, which relate the output and input values through the following equation:

$$y_i^{j} = f\left(\sum_{k=1}^{n} w_{ik}^{j}\, y_k^{(j-1)} + b_i^{j}\right), \tag{1}$$

where $y_i^{j}$ is the output value of neuron $i$ of layer $j$; $n$ is the number of neurons in the previous layer; $y_k^{(j-1)}$ is the output value of neuron $k$ in the previous layer; $w_{ik}^{j}$ is the synaptic weight of neuron $i$ in layer $j$, activated by neuron $k$ in the previous layer; $b_i^{j}$ is the compensation value (bias) of neuron $i$ in layer $j$; and $f$ is the activation function of neuron $i$ [16].
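As a minimal sketch of the neuron relation described above, the following Python snippet computes the output of one neuron from the outputs of the previous layer. The function name `neuron_output` and all numeric values are illustrative assumptions, not taken from the study; NumPy is assumed to be available.

```python
import numpy as np


def neuron_output(prev_outputs, weights, bias, activation=np.tanh):
    """Return f(sum_k w_k * y_k + b) for a single artificial neuron.

    prev_outputs: outputs y_k of the n neurons in the previous layer
    weights:      synaptic weights w_k linking those neurons to this one
    bias:         compensation value b of this neuron
    activation:   activation function f (tanh by default, an assumption)
    """
    prev_outputs = np.asarray(prev_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return activation(np.dot(weights, prev_outputs) + bias)


# Illustrative values only (hypothetical, not from the paper).
y_prev = [0.5, -1.0, 2.0]
w = [0.2, 0.4, -0.1]
b = 0.3

# A linear (identity) activation, as used for an output neuron that
# returns a continuous quantity.
linear_out = neuron_output(y_prev, w, b, activation=lambda z: z)

# The same neuron with a nonlinear activation.
tanh_out = neuron_output(y_prev, w, b)
```

With an identity activation the neuron reduces to a weighted sum plus a bias, which is why a network with a single linear output neuron yields a multivariate linear equation.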
According to Saito et al. [17], no consensus has been reached on the most appropriate statistical methods for analyzing the performance of models. However, for Camargo and Sentelhas [18], model precision is given by the correlation coefficient (r), and accuracy concerns the difference between the estimated and the observed values, given by the index of agreement (d). Another resource is the efficiency coefficient (E), which has been used by many authors in the evaluation of hydrological models and in the quantification of water constituents to estimate water quality [19].
Based on the proposition of Moreira et al. [2] and considering the specificity of climate and geomorphology, we aimed to develop an artificial neural network to estimate rainfall erosivity in the Ribeira Valley and Coastal region in the State of São Paulo.

Material and Methods
Part of the study area (Vale do Ribeira) is considered by the United Nations Educational, Scientific and Cultural Organization (UNESCO) as a Mata Atlântica biosphere reserve. According to Romão [20], this region has the largest continuous tract of the remaining Mata Atlântica and of the associated Brazilian ecosystems, concentrating 40% of the conservation units of the State of São Paulo. The use of around 75% of the region's land is regulated by environmental protection acts, and 58% of these areas have been turned either into public parks and ecological stations for protection (so land renting is forbidden) or into environmental protection areas where land ownership is private, though with restrictions on use.
The study was conducted in the Ribeira Valley and Coastal region of the State of São Paulo, Brazil (Figure 1), located between latitudes 23.77° and 24.97° S and longitudes 45.42° and 49.17° W, with altitudes varying from 2 to 890 m asl. The region's climate, according to Köppen, is Cfa, that is, humid subtropical, with an average annual temperature of 21 °C, a mean maximum temperature of 28.3 °C, and a mean minimum temperature of 17 °C. The assessed variables included monthly precipitation, rainfall erosivity, latitude, longitude, and altitude data from 32 pluviometric stations (Table 1).
The R values for each station were obtained through (2), proposed by Silva et al. [8], which gives the monthly erosion index EI₃₀ (MJ mm (h ha)⁻¹) from the average monthly precipitation Pm (mm) and the annual precipitation Pa (mm). The sum of the EI₃₀ values for the 12 months of the year represented the rainfall erosivity value.
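The aggregation step described above (twelve monthly erosion indices summed into one annual R value) can be sketched as follows. The coefficients of the Silva et al. [8] equation are not given in this text, so the monthly index below uses a power-law form `a * (Pm**2 / Pa)**b` purely as an assumption, with `a` and `b` left as required parameters; the function names are hypothetical.

```python
def monthly_ei30(pm, pa, a, b):
    """Hypothetical monthly erosion index of the form a * (pm**2 / pa)**b.

    pm: average monthly precipitation (mm)
    pa: annual precipitation (mm)
    a, b: empirical coefficients (assumed; not provided by the source text)
    """
    return a * (pm ** 2 / pa) ** b


def annual_erosivity(monthly_precip, a, b):
    """Sum the 12 monthly erosion indices into the annual R value."""
    if len(monthly_precip) != 12:
        raise ValueError("expected 12 monthly precipitation values")
    pa = sum(monthly_precip)  # annual precipitation from the monthly means
    if pa <= 0:
        raise ValueError("annual precipitation must be positive")
    return sum(monthly_ei30(pm, pa, a, b) for pm in monthly_precip)


# Sanity check with placeholder coefficients a = b = 1 and a uniform
# 100 mm per month: each month contributes 100**2 / 1200, and the
# twelve months together sum to 100.
r_uniform = annual_erosivity([100.0] * 12, a=1.0, b=1.0)
```

Real use would require the coefficients fitted for the study area in place of the placeholders.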
The Weka (Waikato Environment for Knowledge Analysis) package, a collection of implementations of algorithms from various data mining techniques, was used to develop the network. The input variables were the latitude and longitude of each station (in decimal degrees), the altitude (m), and the average annual precipitation (mm). A linear activation function was used in the output variable to obtain the rainfall erosivity value (R factor), in MJ mm (h ha)⁻¹, of the location represented by the input vector.

The analysis was carried out as an ML study. The results were obtained using 10-fold cross-validation with 10 repetitions. Multilayer Perceptron ANNs were trained with the backpropagation algorithm, with momentum term and learning rate of, respectively, 1 and 0. One-layer ANNs trained with the Least Mean Square (LMS) algorithm were also used. The root mean squared error (RMSE) and the correlation coefficient (r) obtained in the test sets were used to compare the networks.

The correlation coefficient (r), the agreement index (d) proposed by Willmott [21], and the confidence index (c), which is the product of r and d, were used to evaluate the performance of the model obtained with ML. The agreement index can be calculated by (3) as follows:

$$d = 1 - \frac{\sum_{i=1}^{N} \left(\mathit{Re}_i - R_i\right)^2}{\sum_{i=1}^{N} \left(\left|\mathit{Re}_i - \mathit{Rex}\right| + \left|R_i - \mathit{Rex}\right|\right)^2}, \tag{3}$$

where $i$ indexes the $N$ observations, $R$ is the experimentally observed value, $\mathit{Re}$ is the value estimated by the model, and $\mathit{Rex}$ is the average of the experimentally observed values.
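The evaluation protocol described above (a one-layer linear network trained with LMS and scored by 10-fold cross-validation repeated 10 times) can be sketched in plain NumPy. This is not the Weka workflow used in the study: the data are synthetic, the learning rate and epoch count are arbitrary assumptions, and the helper names `train_lms` and `repeated_kfold_rmse` are hypothetical.

```python
import numpy as np


def train_lms(X, y, lr=0.01, epochs=100, seed=0):
    """Fit y ~ X @ w + b with the Widrow-Hoff (LMS) per-sample update."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = y[i] - (X[i] @ w + b)  # error on one training sample
            w += lr * err * X[i]         # move weights along the error
            b += lr * err
    return w, b


def repeated_kfold_rmse(X, y, k=10, repeats=10, seed=0):
    """Mean test RMSE over `repeats` runs of k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        order = rng.permutation(len(y))
        for fold in np.array_split(order, k):
            train = np.setdiff1d(order, fold)
            # Standardize with training statistics only, to avoid leakage.
            mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
            y_mu, y_sigma = y[train].mean(), y[train].std()
            w, b = train_lms((X[train] - mu) / sigma,
                             (y[train] - y_mu) / y_sigma)
            pred = (((X[fold] - mu) / sigma) @ w + b) * y_sigma + y_mu
            scores.append(np.sqrt(np.mean((pred - y[fold]) ** 2)))
    return float(np.mean(scores))


# Synthetic stand-in for 32 stations with 4 inputs (not the study's data).
rng = np.random.default_rng(42)
X = rng.uniform(size=(32, 4))
y = X @ np.array([3.0, -2.0, 1.0, 5.0]) + 10.0 + rng.normal(scale=0.05, size=32)

rmse = repeated_kfold_rmse(X, y)
print(f"mean test RMSE over 10 x 10 folds: {rmse:.3f}")
```

Standardizing inside each fold keeps the LMS step size stable when the inputs have very different scales, as altitude in metres and coordinates in degrees would.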
The efficiency coefficient (E) defined by Nash and Sutcliffe [22] was also calculated, as in (4), together with the mean absolute error (MAE) presented by Legates and McCabe Jr. [23], as in (5), the mean percent error (MPE) used by Chong et al. [24], as in (6), and the root mean squared error (RMSE), as in (7).

The spatial distribution of the erosivity (R factor) estimated by Silva et al. [8] and obtained through the use of ANNs was mapped with the kriging interpolation method, using Surfer version 8.0.
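For reference, the statistics cited above as (4) to (7) are commonly written as below, with $R_i$ the observed value, $\mathit{Re}_i$ the estimated value, $\mathit{Rex}$ the mean of the observed values, and $N$ the number of observations. This is a sketch of the usual textbook forms rather than a transcription of the authors' equations; in particular, taking the absolute value inside the MPE is an assumption, since signed variants are also in use.

```latex
\begin{align*}
E &= 1 - \frac{\sum_{i=1}^{N} \left(R_i - \mathit{Re}_i\right)^2}
              {\sum_{i=1}^{N} \left(R_i - \mathit{Rex}\right)^2} \\[4pt]
\mathrm{MAE} &= \frac{1}{N} \sum_{i=1}^{N} \left|R_i - \mathit{Re}_i\right| \\[4pt]
\mathrm{MPE} &= \frac{100}{N} \sum_{i=1}^{N} \left|\frac{R_i - \mathit{Re}_i}{R_i}\right| \\[4pt]
\mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(R_i - \mathit{Re}_i\right)^2}
\end{align*}
```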

Results and Discussion
The input variables, composed of the latitude, longitude, altitude, and average annual precipitation values for each station, made it possible to adjust (8) to estimate rainfall erosivity (R factor). This equation, obtained with the use of the ANN, was a multivariate linear function, and its determination coefficient (R² = 0.97) corroborates its performance in estimating rainfall erosivity at the 95% confidence level. The RMSE value for the LMS algorithm (496.43) was lower than that for the backpropagation algorithm (738.28), which corroborates the statement in [13] that the LMS algorithm provides a better solution for linear problems. In (8), Alt is the altitude (m), Lat is the latitude (decimal degrees), Long is the longitude (decimal degrees), and Pa is the annual precipitation (mm).

Table 2 shows a summary of the descriptive statistical analysis of the performance of the equation proposed by Silva et al. [8] and of the performance obtained through the use of the ANN in the estimation of erosivity. The main parameters were the correlation coefficient (r), agreement index (d), efficiency coefficient (E), mean absolute error (MAE), mean square error (MSE), root mean squared error (RMSE), and mean percent error (MPE).

Table 3: Results of the estimation of rainfall erosivity (R) according to [8] (MJ mm (h ha)⁻¹), of the estimation with the equation obtained with the use of artificial neural networks (Re) (MJ mm (h ha)⁻¹), and of the relative percentage error (RPE).
The values found for the correlation coefficient and the agreement index were, respectively, 0.97 and 0.99. The closer r and d are to 1.0, the more precise and accurate, respectively, the results. Moreover, a value of 0.95 was found for E, which may vary from minus infinity to 1.0, with 1.0 indicating a perfect simulation by the model.
The lower the percentage error, the higher the efficiency of the method used. According to the literature, errors of around 10% are considered low, and therefore acceptable, for the estimation of rainfall erosivity [2]. In this study, the MPE value was 2.95% and was thus found to be acceptable. The confidence index (c) was 0.96, which indicates optimum performance, because according to Camargo and Sentelhas [18] c-values higher than 0.85 ensure optimal performance.
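The performance statistics discussed above can be computed along the following lines. This is a hedged sketch: the function names are hypothetical, the four observed and estimated values are made-up toy numbers rather than station data, and NumPy is assumed. The last lines reproduce the reported index c as the product of the reported r and d.

```python
import numpy as np


def agreement_index(obs, est):
    """Willmott's agreement index d between observed and estimated values."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    mean_obs = obs.mean()
    num = np.sum((est - obs) ** 2)
    den = np.sum((np.abs(est - mean_obs) + np.abs(obs - mean_obs)) ** 2)
    return 1.0 - num / den


def efficiency(obs, est):
    """Nash-Sutcliffe efficiency E (1.0 means a perfect simulation)."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2)


def confidence_index(obs, est):
    """Index c: the product of the correlation r and the agreement index d."""
    r = np.corrcoef(obs, est)[0, 1]
    return r * agreement_index(obs, est)


# Toy erosivity values in MJ mm (h ha)^-1 (hypothetical, not from Table 3).
observed = [5000.0, 6000.0, 7000.0, 8000.0]
estimated = [5100.0, 5900.0, 7200.0, 7900.0]

d_toy = agreement_index(observed, estimated)
e_toy = efficiency(observed, estimated)
c_toy = confidence_index(observed, estimated)

# Product of the r and d values reported in the text (0.97 and 0.99).
c_reported = round(0.97 * 0.99, 2)
```

Because c multiplies a precision measure (r) by an accuracy measure (d), a model must score well on both to exceed the 0.85 threshold cited from Camargo and Sentelhas [18].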
The variability in altitude values, shown by the dispersion of the data around the mean (Table 1), can explain the low synaptic weight (−0.4362) of the corresponding neuron and the small contribution of this variable to (8). This result differs from that of Moreira et al. [2], who stress the influence of altitude and continentality on rainfall erosivity throughout the State of São Paulo, obtained with the use of an ANN. However, it is important to emphasize that those authors used the model proposed by Lombardi Neto and Moldenhauer [7] as the mathematical equation of the activation function, as well as data from meteorological stations distributed throughout the State of São Paulo, so that the variability (dispersion) of the altitude data was more diluted.

Figure 1: Location of the study area: Ribeira Valley and Coastal region of the State of São Paulo, Brazil.

Figure 2: Erosivity values estimated by Silva et al. [8] and by the ANN for each municipality.

Table 1: Altitude, latitude, longitude, annual precipitation, and erosivity of 32 municipalities of the Ribeira Valley and Coastal region of the State of São Paulo.

Table 2: Summary of the descriptive statistical analysis of erosivity values estimated by Silva et al. [8] and with the use of the ANN in the Ribeira Valley and Coastal region of the State of São Paulo.
Table 3: Relative percentage error (RPE), in modulus, between the erosivity values R and Re, in %, for the 32 municipalities in the Ribeira Valley and Coastal region of the State of São Paulo.