Forecasting Daily and Monthly Reference Evapotranspiration in the Aidoghmoush Basin Using Multilayer Perceptron Coupled with Water Wave Optimization

The aim of this study is to evaluate the ability of soft computing models, including the multilayer perceptron coupled with water wave optimization (MLP-WWO), MLP-particle swarm optimization (MLP-PSO), and MLP-genetic algorithm (MLP-GA), to simulate the daily and monthly reference evapotranspiration (ET) in the Aidoghmoush basin (Iran). Principal component analysis (PCA) was used to find the best input combination from the lagged ET values. According to the results, the ET values with lags of 1, 2, and 3 days, as well as those with lags of 1, 2, and 3 months, were the most effective variables in the formation of the PCs. The proportion of total variance of the inputs and the eigenvalues were used to identify the most important variables. The accuracy of the models was assessed based on multiple statistical indices, such as the mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and percent bias (PBIAS). The results showed that the performance of the hybrid MLP models was better than that of the standalone MLP, and the findings confirmed that the MLP-WWO could precisely predict ET.


Introduction
Soft computing models are widely used to solve various problems in water resource management, such as reservoir operation [1,2], flood routing [3], irrigation management [4], and drug removal modeling [5]. Reference evapotranspiration (ET) is a key parameter in hydrology [6]. The major significance of ET in estimating the water budget has been well established, and its precise estimation is highly important in managing water resources. In fact, it is essential to predict ET with an acceptable level of precision in managing watersheds [7], since accurate predictions help decision-makers ensure the best allocation of water resources among the stakeholders [8]. For example, models used for predicting ET in relatively humid areas may not be suitable for dry basins where water shortage is a significant challenge [9]. However, ET is measured only at a very limited number of meteorological stations due to the high cost of the necessary equipment.
The FAO-56 Penman-Monteith equation, which is widely used as a reference model for ET computation [10], requires a multitude of hydrological data, which is considered one of its serious drawbacks. Many hydrological modelers do not have access to accurate measurement devices to record this information regularly [11]. Several empirical methods have therefore been proposed to compute ET from meteorological variables. The radiation-based and temperature-based ET models are widely used because of restrictions in the availability of meteorological data, although a large number of studies have shown that the temperature-based ET models cannot achieve a high level of accuracy.
A growing body of research has examined the ability of soft computing models to estimate ET. These models usually use nonlinear inputs to predict ET, whereas older data-driven ET predictions mostly relied on statistical methods such as linear regression and autoregressive integrated moving average models [12] and are restricted by the hypothesis that the inputs are linear. Nonlinear models have received growing attention in water resource management through soft computing techniques such as the artificial neural network (ANN) and the adaptive neuro-fuzzy inference system (ANFIS).
Shiri et al. [13] compared the performance of gene expression programming (GEP) and the FAO-56 Penman-Monteith (PM) equation in Spain. Their results indicated that the GEP model outperformed the other models in most cases. In another study, Tabari et al. [14] applied ANFIS and the support vector machine (SVM) to calculate potato crop evapotranspiration and found that the ANFIS performed better than the SVM model. In addition, Luo et al. [15] applied ANN and PM models to predict ET and reported the superior performance of the ANN in comparison to the PM model. Furthermore, Patil and Chandra Deka [16] examined the ability of the extreme learning machine (ELM) and the ANN model in India and found that the ELM model provided better results.
Patil and Chandra Deka [17] applied ANFIS, wavelet-ANFIS, and wavelet-ANN to predict ET. Their results showed that the ANFIS-wavelet model outperformed the other models. Seifi and Riahi [18] investigated the effectiveness of the least square support vector machine (LSSVM), ANFIS, and ANN models in predicting ET using meteorological data. The results indicated that the LSSVM model provided a more desirable prediction than the other models.
In this study, the MLP model was used to predict monthly and daily ET. Although the SVM models can predict hydrological variables, they have some inherent drawbacks [18]. For example, the SVM models cannot accurately estimate the target variables when the number of features exceeds the number of samples or when the input data are noisy [18], and choosing a kernel function is not easy for modelers. The MLP models are widely used to predict hydrological variables because they can handle multivariate inputs and produce multistep forecasts. However, the MLP models also have some important drawbacks [19]. Training soft computing models is a real challenge for users: traditional training algorithms such as backpropagation may become trapped in local optima. Optimization algorithms are considered suitable alternatives to traditional training algorithms because their advanced operators avoid trapping in local optima, and they are widely used for training soft computing models [19][20][21][22][23][24][25][26]. The genetic algorithm (GA) and particle swarm optimization (PSO) are powerful optimization algorithms; in PSO, the positions of the particles are treated as candidate solutions. This study combines the WWO and MLP models to increase the convergence speed and accuracy. In addition, the WWO avoids trapping in local optima through its refraction operator, which increases the population diversity.
Water wave optimization (WWO), an innovative optimization algorithm, has recently been used in various research fields such as optimal reactive power dispatch, benchmark functions, and the traveling salesman problem [27][28][29]. Previous research has shown that the WWO can increase the convergence speed and computational accuracy compared with PSO, GA, and other algorithms. The WWO has several advantages, such as a good balance between exploration and exploitation. Furthermore, it uses different operators, namely refraction, propagation, and breaking, to increase population diversity, and these advanced operators can reduce premature convergence. The main motivation of this paper is to develop new hybrid MLP models for predicting ET; the new hybrid models can also be used for predicting target variables in other fields. Although different studies use different meteorological data for predicting ET, this study uses the lagged ET values, which makes it useful when modelers do not have access to diverse climatic input data.
In this paper, a new optimization algorithm combined with the MLP model was developed to predict daily and monthly ET. To this end, meteorological data were collected from the Aidoghmoush station in Iran. To the best of our knowledge, little research has examined the potential of the combination of the MLP model and the WWO algorithm to predict daily and monthly ET. To evaluate the performance of the combined MLP-WWO, its outputs are compared with those of the standalone MLP model, the MLP-genetic algorithm (MLP-GA), and the MLP-particle swarm optimization (MLP-PSO).

MLP Model.
MLPs include a set of neurons placed in layers. An activation function is used in each node to transform the weighted inputs into an output shaped by the mathematical properties of that function. The MLP in this study was trained using the backpropagation algorithm (BPA) [30]. This network includes input, hidden, and output layers: the input data are received in the first layer, the information is processed in the hidden layer, and the model prediction is produced by the output layer. The applied MLP model is based on the following levels: (1) The input-output data were randomly selected from the given training data. Splits of different sizes were tested, and the split with the highest accuracy, 70% of the data for training and 30% for testing, was selected.
(2) The outputs were generated for the input patterns after applying the transfer function. (3) An objective function, the root mean square error (RMSE), was selected. (4) The connection weights were updated to obtain the lowest RMSE value.
(5) For the training and testing levels, each pair of input-output vectors was propagated through the levels until there was no considerable change in the RMSE of the model (Figure 1).
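As a concrete illustration of levels (1)-(5), the following is a minimal sketch of a one-hidden-layer MLP trained by backpropagation with an RMSE objective; the network size, learning rate, and toy lagged-input data are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def train_mlp(X, y, hidden=8, epochs=1000, lr=0.1, seed=0):
    """One-hidden-layer MLP trained by full-batch backpropagation (squared error)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)            # hidden layer with tanh activation
        out = h @ W2 + b2                   # linear output layer
        err = out - y[:, None]              # prediction error
        # Backpropagate the error through both layers (mean-squared-error gradients).
        gW2 = h.T @ err / len(X); gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)    # chain rule through tanh
        gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    pred = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
    return pred, float(np.sqrt(np.mean((pred - y) ** 2)))

# Toy regression in the spirit of lagged-ET inputs: y depends on three lags.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2]
pred, rmse = train_mlp(X, y)
print(rmse)
```

The RMSE printed at the end plays the role of the objective function that the training loop, and later the optimization algorithms, seek to minimize.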

Genetic Algorithm.
GA is an evolutionary algorithm that finds optimal solutions for a problem based on Darwin's principle via mutation, crossover, and selection operators. To this end, several initial solutions are generated and their corresponding objective function values are computed. The selection operator is used to choose parents from the old population. Next, new individuals are generated by the crossover operator [31]. Finally, the mutation operator is used to maintain diversity between the current and the next generations. The algorithm ends when the stopping criteria are satisfied (Figure 2).
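A toy sketch of the GA loop described above, minimizing a simple test function; tournament selection and elitism are common implementation choices assumed here, and all parameter values are illustrative.

```python
import random

def genetic_minimize(f, dim, pop_size=30, gens=60, pc=0.9, pm=0.2, seed=0):
    """Toy GA: tournament selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=f)
        new = scored[:2]                      # elitism: carry the two best over
        while len(new) < pop_size:
            # Tournament selection of two parents from the old population.
            p1 = min(rng.sample(pop, 3), key=f)
            p2 = min(rng.sample(pop, 3), key=f)
            child = list(p1)
            if rng.random() < pc:             # one-point crossover
                cut = rng.randrange(dim)
                child = p1[:cut] + p2[cut:]
            if rng.random() < pm:             # Gaussian mutation keeps diversity
                i = rng.randrange(dim)
                child[i] += rng.gauss(0, 0.5)
            new.append(child)
        pop = new
    best = min(pop, key=f)
    return best, f(best)

best, val = genetic_minimize(lambda x: sum(v * v for v in x), dim=3)
print(val)
```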

PSO Algorithm.
PSO is widely used for different optimization problems. Similar to other optimization algorithms, it begins with a randomly initialized population of members and uses the social behavior of the particles to obtain the best solution by setting the position of each member with respect to the best position of the particles in the swarm. Equations (1) and (2) are used to update the velocity and position of the particles [32]:

V_i(t + 1) = V_i(t) + c_1 · rand · (lbest_i − X_i(t)) + c_2 · rand · (gbest_i − X_i(t)),  (1)

X_i(t + 1) = X_i(t) + V_i(t + 1),  (2)

where X_i(t + 1) is the position of particle i at iteration (t + 1), V_i(t + 1) denotes the velocity of particle i at iteration (t + 1), c_1 and c_2 represent constants in the range 0-2, gbest_i shows the global best particle, lbest_i is the local best particle, and rand denotes a random number between 0 and 1. Figure 3 shows the optimization process of PSO.
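The velocity and position updates of equations (1) and (2) can be sketched as follows; the inertia weight w is a common extension of the basic update assumed here, and the swarm size, iteration count, and test function are illustrative.

```python
import random

def pso_minimize(f, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO with the velocity/position updates of equations (1) and (2)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    lbest = [list(x) for x in X]                    # each particle's best position
    gbest = list(min(X, key=f))                     # best position in the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Equation (1): inertia + cognitive + social terms.
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (lbest[i][d] - X[i][d])
                           + c2 * rng.random() * (gbest[d] - X[i][d]))
                # Equation (2): move the particle.
                X[i][d] += V[i][d]
            if f(X[i]) < f(lbest[i]):
                lbest[i] = list(X[i])
            if f(X[i]) < f(gbest):
                gbest = list(X[i])
    return gbest, f(gbest)

best, val = pso_minimize(lambda x: sum(v * v for v in x), dim=3)
print(val)
```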

WWO Algorithm.
The WWO algorithm aims to enhance the exploration and exploitation abilities of optimization algorithms through the propagation, refraction, and breaking operators. The WWO mimics shallow water wave theory: every agent in the algorithm resembles a "water wave" entity with a wavelength λ and a wave height h. As shown in Figure 4, the lower (better) the objective function value of a water wave, the shorter its wavelength, analogous to waves moving from deep water toward the shore [29].

Propagation.
It is hypothesized that x represents the original water wave and x′ denotes the new one generated by the propagation operator (here, a vector of MLP weights and biases) [33]:

x′(d) = x(d) + rand(−1, 1) · λ · L(d),

where L(d) is the length of the d-th dimension of the search space, rand(−1, 1) is a uniformly distributed random number in [−1, 1], x(d) represents the original water wave, and x′(d) denotes the new water wave. After propagation, the objective function value of x′ is evaluated. Without loss of generality, the problem is assumed to be a minimization problem with objective function f. The practical problem can be compared with the shallow water model (Figure 5).
If f(x′) < f(x), x′ replaces x in the population, and the wave height is reset to the maximum height (h_max). Otherwise, the wave x is kept, and its wave height is reduced by one to reflect the loss of energy. The following equation is used to update the wavelength [33]:

λ = λ · α^(−(f_max − f(x) + ε)/(f_max − f_min + ε)),

where α is the wavelength reduction coefficient, f_max shows the maximum objective function value in the population, f_min represents the minimum objective function value, and ε denotes a small positive number; in this way, waves with better (lower) objective function values receive shorter wavelengths and search finer neighborhoods.
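A minimal sketch of the propagation and wavelength-update steps described above; the out-of-range handling (random reinitialization) and the default α = 1.0026 follow common WWO practice and are assumptions here.

```python
import random

def propagate(x, wavelength, lower, upper, rng):
    """WWO propagation: shift each dimension by rand(-1, 1) * lambda * L(d)."""
    new = []
    for d in range(len(x)):
        L = upper[d] - lower[d]                  # length of the search space in dim d
        v = x[d] + rng.uniform(-1, 1) * wavelength * L
        # Reset out-of-range coordinates to a random position in range.
        if not lower[d] <= v <= upper[d]:
            v = rng.uniform(lower[d], upper[d])
        new.append(v)
    return new

def update_wavelength(lam, fx, f_min, f_max, alpha=1.0026, eps=1e-31):
    """Minimization form: the best wave (fx == f_min) gets the shortest wavelength."""
    return lam * alpha ** -((f_max - fx + eps) / (f_max - f_min + eps))

rng = random.Random(0)
lower, upper = [-5.0] * 3, [5.0] * 3
x = [1.0, -2.0, 0.5]
x_new = propagate(x, wavelength=0.5, lower=lower, upper=upper, rng=rng)
print(x_new)
```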

Refraction.
In the optimization process, the refraction operator is applied only to a wave x whose height has decreased to 0, to prevent search stagnation. The position update is given as follows [33]:

x′(d) = N((x*(d) + x(d))/2, |x*(d) − x(d)|/2),

where x*(d) is the best solution found so far and N(μ, σ) is a Gaussian random number with mean μ = (x*(d) + x(d))/2 and standard deviation σ = |x*(d) − x(d)|/2. This operator allows a stagnated wave x to learn from the best solution x*.

2.4.3. Breaking.
The purpose of the breaking operator is to keep the population diverse. k dimensions are randomly chosen, and a solitary wave x′ is generated along each chosen dimension d [29]:

x′(d) = x(d) + N(0, 1) · β · L(d),

where β is the breaking coefficient and N(0, 1) is a standard Gaussian random number. If the objective function value of the wave x* is better than those of all generated solitary waves, x* is kept; otherwise, x* is replaced by the best solitary wave. Figure 6 shows the optimization steps of the WWO.
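Putting the three operators together, a compact WWO loop for a minimization problem might look as follows; this is a simplified sketch (e.g., the number of breaking dimensions, the bound handling, and all parameter defaults are assumptions), not the exact implementation used in the paper.

```python
import random

def wwo_minimize(f, dim, lower, upper, pop_size=10, iters=200,
                 h_max=6, alpha=1.0026, beta=0.1, eps=1e-31, seed=0):
    """Compact WWO loop: propagation, refraction, and breaking (minimization)."""
    rng = random.Random(seed)
    L = [upper[d] - lower[d] for d in range(dim)]
    waves = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
             for _ in range(pop_size)]
    lam = [0.5] * pop_size                   # wavelengths
    h = [h_max] * pop_size                   # wave heights (remaining energy)
    fit = [f(w) for w in waves]
    f_best = min(fit); best = list(waves[fit.index(f_best)])
    for _ in range(iters):
        for i in range(pop_size):
            # Propagation: shift each dimension within rand(-1, 1) * lambda * L(d).
            x_new = []
            for d in range(dim):
                v = waves[i][d] + rng.uniform(-1, 1) * lam[i] * L[d]
                if not lower[d] <= v <= upper[d]:
                    v = rng.uniform(lower[d], upper[d])
                x_new.append(v)
            f_new = f(x_new)
            if f_new < fit[i]:               # a better wave replaces the old one
                waves[i], fit[i], h[i] = x_new, f_new, h_max
                if f_new < f_best:
                    best, f_best = list(x_new), f_new
                    # Breaking: local solitary waves around a new best wave.
                    for d in rng.sample(range(dim), max(1, dim // 2)):
                        cand = list(best)
                        cand[d] += rng.gauss(0, 1) * beta * L[d]
                        if f(cand) < f_best:
                            best, f_best = cand, f(cand)
            else:
                h[i] -= 1                    # energy loss without improvement
                if h[i] == 0:
                    # Refraction: regenerate between the wave and the best solution.
                    waves[i] = [rng.gauss((best[d] + waves[i][d]) / 2,
                                          abs(best[d] - waves[i][d]) / 2 + eps)
                                for d in range(dim)]
                    fit[i], h[i] = f(waves[i]), h_max
                    if fit[i] < f_best:
                        best, f_best = list(waves[i]), fit[i]
        # Wavelength update: better (lower-error) waves get shorter wavelengths.
        f_min, f_max = min(fit), max(fit)
        lam = [l * alpha ** -((f_max - fi + eps) / (f_max - f_min + eps))
               for l, fi in zip(lam, fit)]
    return best, f_best

best, f_best = wwo_minimize(lambda x: sum(v * v for v in x), dim=3,
                            lower=[-5.0] * 3, upper=[5.0] * 3)
print(f_best)
```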

Optimization Algorithms for Training MLPs.
It is necessary to consider two technical aspects when integrating optimization algorithms with the MLP: the method for encoding the agents/solutions and the procedure for determining the objective function. Although the standalone MLP models have high ability, their training algorithms may converge slowly or become trapped in local optima [21][22][23][24][25][26][27][28][29][30][31][32][33][34]. Therefore, it is essential to improve the accuracy of the MLP models. The models in this study were trained for 1000 epochs, with a learning rate of 0.001 and a momentum coefficient of 0.09.

In the evolutionary-algorithm-MLP models, each dimensional vector refers to an agent (e.g., particles in PSO, chromosomes in GA, and "water wave" objects in WWO), which may include random numbers in [−1, 1]. Each agent represents a candidate MLP (Figure 7). The encoded agents contain sets of bias values and connection weights, and the number of weights and biases determines the length of the vector. To compute the objective function values of the agents, each agent is transferred to the MLP and interpreted as its connection weights. The RMSE is typically the objective function applied in MLP-optimization algorithms.
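The encoding described above can be sketched as follows: a flat agent vector is decoded into the weight matrices and bias vectors of a one-hidden-layer MLP, and the RMSE of the decoded network serves as the objective function; the layer sizes and toy data are illustrative.

```python
import numpy as np

def decode(agent, n_in, n_hidden, n_out):
    """Map a flat agent vector onto MLP weight matrices and bias vectors."""
    i = 0
    W1 = agent[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = agent[i:i + n_hidden]; i += n_hidden
    W2 = agent[i:i + n_hidden * n_out].reshape(n_hidden, n_out); i += n_hidden * n_out
    b2 = agent[i:i + n_out]
    return W1, b1, W2, b2

def rmse_objective(agent, X, y, n_hidden):
    """Objective used to rank agents: RMSE of the decoded MLP on (X, y)."""
    W1, b1, W2, b2 = decode(agent, X.shape[1], n_hidden, 1)
    pred = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
    return float(np.sqrt(np.mean((pred - y) ** 2)))

n_in, n_hidden = 3, 5
dim = n_in * n_hidden + n_hidden + n_hidden * 1 + 1   # vector length = weights + biases
agent = np.random.default_rng(0).uniform(-1, 1, dim)
X = np.random.default_rng(1).uniform(-1, 1, (50, 3))
y = X.sum(axis=1)
print(dim, rmse_objective(agent, X, y, n_hidden))
```

Any of the optimization algorithms can then evolve such vectors while calling `rmse_objective` as the fitness function.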
The hybrid model training can be explained in the following steps: (1) The optimization-algorithm-MLP model initializes random agents. (2) The agents are mapped to the bias and weight values of a candidate MLP. (3) The quality of each MLP is evaluated according to the RMSE. (4) The optimization-algorithm-MLP model constructs the fittest MLP with the minimum RMSE. (5) The agents are updated. (6) Steps 2 to 5 are repeated until the last iteration (Figure 7).
2.6. Dataset. Figure 8 shows the data obtained from the meteorological station located in the Aidoghmoush basin of Iran (37°16′ to 37°31′N; 47°33′ to 47°49′E) during 1987 to 2000. The average discharge and the average annual rainfall are 190 × 10⁶ m³ per year and 340 mm, respectively. The rainy period of the year lasts from October to May, and August is the driest month. In addition, rainfall is most probable in April, while the rainless period of the year is from June to

October. The brighter and darker periods are from May to August and from October to February, respectively [37]. The MLP, MLP-WWO, MLP-PSO, and MLP-GA were used to forecast the daily and monthly ET. Table 1 tabulates the statistical characteristics of the input data used in this study. The lagged ET values were used to make one-day-ahead and one-month-ahead predictions of ET; they were chosen in order to evaluate the ability of the models under limited input data. In fact, since there was no access to the climate data, accurate forecasting of ET was an important issue. The data were divided into 70% for training and 30% for testing; different split sizes were tested, and the 70/30 split achieved the lowest value of the objective function. Before developing the models, principal component analysis (PCA) was performed on the monthly and daily ET values to select the significant lags, that is, the lagged inputs that most affect the daily and monthly ET values. The variables lagged up to 7 days and 7 months were used as inputs for the soft computing models to predict daily and monthly ET. PCA is a statistical model that transforms a given set of n variables into a new set of PCs which are orthogonal to each other. The PCA was used to choose the best-lagged input variables for predicting hydrological variables [37][38][39]. Previous studies indicated the superiority of PCA over other methods such as the gamma test and the correlation method [37][38][39]. The PCs with eigenvalues greater than one are considered significant inputs, and the most effective variables within a PC have a loading coefficient of ≥0.90. Thus, the PCs and their variables were selected based on these rules [37][38][39].
Figure 7: Solution structure in the MLP-optimization model [35].

Figure 8: Location of the case study [36].

Preparing the input data, computing the covariance matrix along with its eigenvalues and eigenvectors, and calculating the proportion of total variance for each PC are the main steps of the PCA model. The following indices are used to assess the models:
MAE = (1/n) Σ |Y_i^obs − Y_i^sim|,

NSE = 1 − Σ (Y_i^obs − Y_i^sim)² / Σ (Y_i^obs − Y_mean)²,

PBIAS = 100 × Σ (Y_i^obs − Y_i^sim) / Σ Y_i^obs,

where MAE is the mean absolute error, NSE is the Nash–Sutcliffe efficiency, Y_i^obs indicates the observed data, Y_mean represents the average of the observed data, and Y_i^sim denotes the forecasted data. Table 2 shows the outputs of the PCA, including the contribution of the seven inputs to the seven PCs, the variance explained by each PC, and the cumulative sum of the explained variance. As shown, PC1 alone and the first three PCs together contributed 54% and 91% of the total variance, respectively. The results indicated that the ET values with lags of 1, 2, and 3 days were significant and were used as the inputs.
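The error indices described in this section can be computed directly from their standard definitions; the observed and simulated values below are illustrative.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(obs - sim)))

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is perfect; <= 0 means no better than the mean."""
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pbias(obs, sim):
    """Percent bias: positive values indicate underestimation of the observations."""
    return float(100 * np.sum(obs - sim) / np.sum(obs))

obs = np.array([2.0, 3.0, 4.0, 5.0])
sim = np.array([2.1, 2.9, 4.2, 4.8])
print(mae(obs, sim), nse(obs, sim), pbias(obs, sim))
```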

Monthly Scale.
Based on the results, PC1 alone and the first three PCs together contributed 53% and 94% of the total variance, respectively. In addition, the ET values with lags of 1, 2, and 3 months were significant and were used as the inputs to the soft computing models (Table 2).
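The lag-selection procedure can be sketched as follows: a matrix of lagged values is standardized, the eigenvalues of its correlation matrix are computed, and PCs with eigenvalues greater than one are retained (the Kaiser criterion used here); the AR(1)-style series standing in for the ET record is synthetic.

```python
import numpy as np

def significant_pcs(X):
    """Eigen-decompose the correlation matrix; keep PCs with eigenvalue > 1."""
    Z = (X - X.mean(0)) / X.std(0)             # standardize each lagged series
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # sort PCs by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    keep = eigvals > 1.0                       # Kaiser criterion (eigenvalue > 1)
    return eigvals, explained, keep

# Synthetic AR(1)-like "ET" series and a lag matrix with lags 1..7.
rng = np.random.default_rng(0)
et = np.zeros(400)
for t in range(1, 400):
    et[t] = 0.8 * et[t - 1] + rng.normal()
lags = np.column_stack([et[7 - k:-k] for k in range(1, 8)])
eigvals, explained, keep = significant_pcs(lags)
print(eigvals.round(2), int(keep.sum()))
```

The loadings in `eigvecs` can then be inspected to identify which individual lags dominate the retained PCs.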

Daily Scale.
The random parameters of the optimization algorithms are influential coefficients that significantly affect the performance of the algorithms. For example, the random parameters of the WWO are the population size, wavelength, wave height, and breaking coefficient; Table 3 shows the other important variables. Each MLP-optimization algorithm model was tested with different values of the random parameters, and a sensitivity analysis was used to find their optimal values. The variation of the objective function was computed against the variation of the parameter of interest; while one parameter varies, the other parameters are held fixed.
The best values of the parameters minimize the objective function, which was used to compute the performance error at the end of each iteration. Each model was run for 2000 iterations to optimize the MLP parameters. Therefore, a series of MLP-optimization-algorithm runs was conducted using population sizes (PS), wavelengths (WL), wave heights (WH), and breaking coefficients (BC) ranging from 100 to 400, 0.1 to 1, 1 to 5, and 0.01 to 0.10, respectively. The results showed that PS = 200, WL = 0.5, WH = 2, and BC = 0.05 provided the lowest RMSE, so these values were selected as the optimum parameters of the MLP-WWO model. Following a similar process, the optimal parameters of the other algorithms were obtained. Table 4 presents the outputs of the training level of the MLP models. The results shown in Tables 4 and 5 are based on the best values of the random parameters obtained in Table 3.
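The one-at-a-time sensitivity analysis described above can be sketched as follows; the error surface `fake_rmse` is a hypothetical stand-in for the MLP-WWO training RMSE, chosen so that its minimum matches the reported optimal parameters.

```python
def one_at_a_time(objective, defaults, grids):
    """Vary one parameter over its grid while all others stay at their current values."""
    best = dict(defaults)
    for name, grid in grids.items():
        trials = []
        for value in grid:
            params = dict(best); params[name] = value
            trials.append((objective(params), value))
        best[name] = min(trials)[1]           # keep the value with the lowest error
    return best

# Hypothetical smooth error surface standing in for the MLP-WWO RMSE.
def fake_rmse(p):
    return ((p["PS"] - 200) / 100) ** 2 + (p["WL"] - 0.5) ** 2 \
         + (p["WH"] - 2) ** 2 + (p["BC"] - 0.05) ** 2

defaults = {"PS": 100, "WL": 0.1, "WH": 1, "BC": 0.01}
grids = {"PS": [100, 200, 300, 400], "WL": [0.1, 0.25, 0.5, 1.0],
         "WH": [1, 2, 3, 4, 5], "BC": [0.01, 0.05, 0.10]}
best = one_at_a_time(fake_rmse, defaults, grids)
print(best)
```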
Based on Table 4, the hybrid MLP models give better and more acceptable results for modeling. The MAE of the MLP-WWO was 2.1%, 3.2%, and 4.1% lower than those of the MLP-PSO, MLP-GA, and MLP models, respectively. The PBIAS of the MLP-WWO was 0.14, while it was 0.35, 0.37, and 0.39 for the MLP-PSO, MLP-GA, and MLP models, respectively; thus, the MLP-WWO has the highest NSE and the lowest PBIAS among the models, and the PBIAS of the standalone MLP is higher than those of the other models. Table 4 also shows the error indices of the soft computing models at the daily scale for the testing level. As seen in this table, the MAE of the MLP-WWO is 1.3%, 2.5%, and 3.3% lower than those of the MLP-PSO, MLP-GA, and MLP models, respectively. Furthermore, the PBIAS of the MLP-GA is higher than those of the other hybrid models. The NSE of the MLP is 0.84, while it is 0.87, 0.90, and 0.92 for the MLP-GA, MLP-PSO, and MLP-WWO models, respectively. Based on Table 5, which shows the results of the testing level, the hybrid MLP models also give better and more acceptable testing results. The PBIAS of the MLP-WWO is lower than those of the other hybrid models, and its MAE was 7.2%, 14%, and 17% lower than those of the MLP-PSO, MLP-GA, and MLP models, respectively; the highest NSE and lowest PBIAS were obtained for the MLP-WWO model, and the NSE of the MLP-PSO is higher than those of the MLP-GA and MLP models. The results indicated that the optimization algorithms improved the accuracy of the standalone MLP model. In other words, the combination of the hybrid MLP model and a preprocessing method such as PCA could be used for practical problems with different input scenarios in water resource management. Modelers may encounter a large number of input data when estimating different hydrological variables, and the standalone soft computing models alone may not lead to good results.
Thus, it is essential to use a hybrid framework based on preprocessing methods and optimization algorithms to ensure accurate estimations of the target variables. Figure 9 shows the scatter plots for the soft computing models. The results indicated that the outputs of the MLP-WWO were closest to the observed data, indicating its superior performance. Figure 9(a) demonstrates that the standalone MLP has the worst performance among the MLP models. Furthermore, as displayed in Figure 9(a), the MLP-PSO scatter points are closer to the 45° line than the MLP-GA points.

Monthly Scale.
The computed R² values for the soft computing models indicated that the MLP-WWO had the best performance compared with the other models, and the PSO outperformed the GA. In sum, the hybrid MLP models outperformed the standalone MLP model (Figure 9(b)).

Probability Distribution of NSE.
e training data were randomly sampled M times with replacement to build a model and evaluate its NSE for each resample. M trained models were used to compute the NSE based on the validation data.
This approach was used to assess the goodness of fit between the predicted and observed data. The procedure may require considerable computational time, depending on the number of patterns. After approximating the probability distribution of the NSE, its significance was evaluated based on the 95% confidence interval (CI) (Table 6). The results for the daily and monthly ET predictions were analyzed as follows: Figure 10(a) displays that the probability that NSE > 0.80 is as high as 93% for the MLP-WWO model. In contrast, the MLP-GA model did not reach an NSE of 0.90, with values ranging from 0.50 to 0.89 (Figure 11(a)). The results for the MLP-PSO show that more than 60% of the CIs are above an NSE of 0.80 (Figure 11(a)). Based on these results, the MLP-PSO performs better than the MLP-GA and the standalone MLP.
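The bootstrap procedure for approximating the probability distribution of the NSE can be sketched as follows; M, the series length, and the noise level are illustrative assumptions, and the observed/simulated pairs are synthetic stand-ins.

```python
import numpy as np

def bootstrap_nse(obs, sim, M=500, seed=0):
    """Resample (obs, sim) pairs with replacement M times; return the NSE sample."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    nses = []
    for _ in range(M):
        idx = rng.integers(0, n, n)            # bootstrap resample of the indices
        o, s = obs[idx], sim[idx]
        nses.append(1 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2))
    return np.array(nses)

rng = np.random.default_rng(1)
obs = rng.uniform(2, 8, 300)                   # stand-in "observed ET" series
sim = obs + rng.normal(0, 0.4, 300)            # stand-in model predictions
nses = bootstrap_nse(obs, sim)
lo, hi = np.percentile(nses, [2.5, 97.5])      # 95% confidence interval
print(round(lo, 3), round(hi, 3), float((nses > 0.8).mean()))
```

The last printed value is the bootstrap estimate of the probability that NSE exceeds 0.80, the quantity reported in Figures 10 and 11.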

Monthly Scenario (NSE).
Regarding the MLP-WWO, the results indicated that more than 93% of the CIs are above an NSE of 0.80 (Figure 11(b)). In contrast, the MLP-GA model failed to reach an NSE of 0.90, with values ranging from 0.50 to 0.87 (Figure 11(b)).
Based on the convergence curves in Figure 10, the WWO converged earlier than the other methods.

Conclusion
In this study, lagged ET values were used as the input for the soft computing models. The forecasting models were generated using the lagged ET values for the Aidoghmoush basin (Iran). The outputs of the soft computing models were then compared, which indicated that the MLP-WWO outperformed the other MLP models. The MAE of the MLP-WWO was 1.3%, 2.5%, and 3.3% lower than those of the MLP-PSO, MLP-GA, and MLP models at the daily scale, and 7.2%, 14%, and 17% lower at the monthly scale. In addition, the outputs of the MLP-WWO were closest to the observed data. Finally, it must be stated that an appropriate optimization algorithm affects the accuracy of standalone soft computing models; thus, the selection of a robust optimization algorithm is an important issue in developing soft computing models. Future investigations can improve the performance of the models of this study. The proposed models can be used for predicting other hydrological variables such as rainfall, temperature, and runoff. Furthermore, future studies can investigate the effect of uncertainty on the accuracy of the models.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon reasonable request.