Training and Testing Data Division Influence on Hybrid Machine Learning Model Process: Application of River Flow Forecasting

Hydrological processes are dynamic and characterised by randomness and complex phenomena. The application of machine learning (ML) models to river flow forecasting has grown rapidly, owing to their capacity to simulate the complex phenomena associated with hydrological and environmental processes. In this study, four different ML models were developed for forecasting the flow of a river located in a semiarid region of Iraq, and the influence of data division on the ML modeling process was investigated. Three data division scenarios were inspected: 70%–30%, 80%–20%, and 90%–10%. Several statistical indicators were computed to verify the performance of the models. The results revealed the potential of the hybridized support vector regression model with a genetic algorithm (SVR-GA) over the other ML models for monthly river flow forecasting using the 90%–10% data division. In addition, SVR-GA was found to improve the accuracy of forecasting high flow events. The ability of the GA optimizer to tune the internal parameters of the SVR model provides a robust learning process, which made SVR-GA more efficient in forecasting stochastic river flow behaviour than the other developed hybrid models.


Introduction
The hydrological, environmental, and climatological processes related to different components of the hydrologic cycle, such as rainfall, evaporation, infiltration, groundwater, and river flow, exhibit high nonlinearity, nonstationarity, and redundancy [1,2]. Mathematical models are generally used to address the different forms of nonlinearity and to model different hydrological processes [3,4]. A large number of hydrological models have been developed and successfully applied for forecasting different components of the hydrological cycle [5]. Among them, models for forecasting river flow have received the most attention because of their enormous importance [6]. Because river flow patterns are difficult to understand fully, owing to temporal and spatial changes in basin characteristics and variability in the rainfall-runoff process, univariate river flow simulation has become a trending topic in hydrology [7]. River flow modeling in a particular climate environment (e.g., semiarid) is highly complicated, as it is associated with regional climate and human interventions. Significant changes in climate have been witnessed across the globe in recent years. Streamflow time series are dynamic and complex and exhibit nonlinearity and randomness. These characteristics make the forecasting process challenging for most hydrological researchers [8,9]. Accurate long-term forecasting of river flow at monthly and yearly scales is very important for the planning and operation of water reservoirs, agricultural and irrigation water management, estimation of catchment water balance, estimation of minimum instream environmental flow, and other purposes [10,11]. Accurate short-term (real-time) forecasting of river flow at hourly or daily time steps is important for flood and water scarcity forecasting in order to minimize and mitigate their effects on infrastructure and public health [12].
In addition, this river flow forecast makes it possible to predict the minimum instream environmental flow that is primarily used to sustain organisms' reproduction and growth and provide optimum habitat area [13].
Several developments have been recorded over the years in the application of machine learning (ML) models, artificial intelligence (AI) models, and big data mining technology to hydrological engineering problems [14][15][16][17][18]. Because these models do not depend on a physical meaning, they are suitable for problems that are influenced by several complex factors, such as nonlinear natural processes and forecasting problems [19]. These models have also been found suitable for the solution of hydrological problems [20]. ML models, unlike physically based models, can mimic the actual trend of hydrological time series data by autonomously learning the various hydrological processes. Moreover, ML models need a low level of expertise for implementation, and they can provide a fast solution compared to physically based models.
Despite the suitability of ML models for hydrological studies, they are still prone to several drawbacks, such as prolonged computation time and insufficient feature extraction capability [21]. The recent interest in ML models has exposed other drawbacks of classical ML models such as the artificial neural network (ANN), support vector regression (SVR), adaptive neurofuzzy inference system (ANFIS), and random forest (RF) [22], including trapping at local optima and vanishing gradients. Therefore, the exploration of new, robust, and reliable versions of ML models for modeling various hydrological phenomena remains a motivation for hydrologists and soft computing scientists [20]. Recently, a new era of ML models has emerged in the form of hybridized models, in which a parameter-tuning algorithm is integrated to determine the internal parameters of the model using bioinspired or mathematical optimizers. The hybrid ML model has emerged as the sought-after model because of its capability to overcome the drawbacks of standalone ML models [23]. It has been successfully applied in recent years to complex hydrological problems [24,25]. Traditional ML models build their learning processes using a trial-and-error procedure, which can limit the learning process. Introducing new optimization approaches can solve this problem and provide a reliable and robust learning mechanism. Risks associated with flooding can be reduced via accurate modeling of river flow time series dynamics. This can also enhance the capability for proper management of reservoirs during droughts [26]. Accurate forecasting of river flow time series should preferably be based on existing long data records with memory networks. Hybrid ML models, as a robust methodology, provide an excellent learning memory that could better model river flow patterns and provide better forecasting.
When training a predictive model, the data are divided into training and testing sets. Too low a ratio of training data may degrade the performance of the model, whereas too high a ratio can lead to overfitting. In both cases, the model performs poorly and yields unacceptable results. Choosing the best data division ratio is therefore a challenging task in developing a machine learning model [27].
The main objective of the present study is to investigate the impact of training and testing data division on several hybrid ML models, including ANN and SVR hybridized with the genetic algorithm (GA) and SVR and RF hybridized with the grid search algorithm. The introduced models are developed for river flow forecasting using historical data from the Tigris River in the semiarid climate of Iraq. The capacity of the developed models to handle the complexity of river flow is examined using statistical metrics and graphical presentation. The modeling procedure is structured on different antecedent values of river flow, which define the matrix of attributes for univariate modeling.
Three data division scenarios for the training and testing datasets were inspected. The obtained results are discussed comprehensively and analysed comparatively to reveal the forecasting ability of the different models. Thereafter, the forecasted river flow was used to estimate the minimum instream environmental flow that is primarily used to sustain organisms' reproduction and growth and provide optimum habitat area.

Description of Study Areas and Data
The Tigris River is one of the largest rivers in the Middle East. The total length of the river is about 1718 km, shared by Turkey, Syria, and Iraq. About 85% of the total basin of the Tigris River (253,000 km²) lies in Iraq. The Tigris River, along with the Euphrates River, supplies the major share of the total water required for irrigation, human use, and industrial purposes for several cities in Iraq, Turkey, and Iran. The climate of the basin is predominantly arid; however, semiaridity is the main characteristic of the river. The average rainfall in the basin is 216 mm, with most of the rainfall occurring during winter (December to February) [28]. However, the rainfall concentration varies between the north, middle, and south of Iraq [29]. The temperature varies from a maximum of 45°C during summer to a minimum of 10°C in winter [29]. The monthly river flow data of the Tigris River for the period January 1991 to November 2010 were obtained from the USGS Data Series 540 for the present study [30]. The mean and the standard deviation of the monthly discharge of the Tigris River at Baghdad Station are 411.35 m³/s and 234.52 m³/s, respectively [31]. The location of the Tigris River on the map of Iraq is presented in Figure 1.

Data Division Scenarios and Input Combinations
To apply the machine learning methods for forecasting, the observed river flow data were split into two sets (training and testing). Three data divisions were inspected in this research: 70%-30%, 80%-20%, and 90%-10%. This is because ML models can behave differently depending on the span of data supplied for the learning and testing phases [32]. The identification of the input parameters is an essential step prior to the models' learning process. In this study, as the aim is river flow forecasting, the lead times were determined using statistical approaches, namely the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The suitable input parameters were selected based on the lead times most influential for one-step-ahead forecasting. Valuable temporal patterns may exist in the observed data that can be used as input patterns for the development of the forecasting model. The ACF captures information on the temporal patterns existing in a time series: it gives the correlation between two separate points of the series at different time leads and reveals repeating patterns. Thus, it tells how the past river flow influences the future river flow. The ACF value ranges between 1 and −1; a value near 1 indicates near-perfect correlation, and a value near −1 indicates complete anticorrelation. The river flow time series is treated as time-independent (stationary); therefore, the correlation between a pair of values depends only on the time difference between the two points and not on their position in the series. To distinguish the lead times of the time series that might substantially influence the forecasting, a reference value of the ACF should be identified.
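The ACF-based screening of lead times can be illustrated with a minimal sketch. It assumes the standard (biased) sample autocorrelation estimator, which the text does not specify, and it uses a synthetic seasonal series rather than the Tigris record; the function names `autocorrelation` and `select_lags` are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator, an assumption)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        acf.append(num / denom)
    return acf


def select_lags(acf, threshold):
    """Return the lags k >= 1 whose ACF value is at least `threshold`."""
    return [k for k in range(1, len(acf)) if acf[k] >= threshold]


# Synthetic monthly series with a 12-month cycle (illustrative only, not observed data).
flow = [400 + 200 * math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = autocorrelation(flow, 20)   # 20 time differences, as in Figure 2
lags = select_lags(acf, 0.5)      # reference value taken from the text
```

For a purely seasonal series such as this one, the retained lags cluster around multiples of the period; for observed river flow, the retained lags depend on the persistence in the record.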
In this study, ACF values equal to or greater than 0.5 were considered for the selection of the time-lag pattern [33,34]. Figure 2 illustrates the ACF values for different lead times up to 20 time differences. Figure 2 shows that the ACF values for the Tigris River for the lead times $Q_{t-1}$ to $Q_{t-5}$ are more than 0.2. Accordingly, five input combinations were constructed (Model I, Model II, ..., Model V).
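The construction of the input combinations and the three data divisions can be sketched as follows. Two points are assumptions rather than statements from the text: Model k is taken to use the first k antecedent values ($Q_{t-1}, \ldots, Q_{t-k}$), and the split is taken to be chronological (no shuffling). The series is a placeholder of 239 values, the number of months from January 1991 to November 2010; `make_lagged` and `chronological_split` are hypothetical helper names.

```python
def make_lagged(series, n_lags):
    """Build rows [Q(t-1), ..., Q(t-n_lags)] with target Q(t)."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_fraction):
    """Split without shuffling, so the testing block follows the training block in time."""
    cut = int(round(len(X) * train_fraction))
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Placeholder series (not observed data): 239 monthly values.
flow = [float(v) for v in range(1, 240)]

# Model I .. Model V: one to five antecedent values (assumed nesting).
models = {n_lags: make_lagged(flow, n_lags) for n_lags in range(1, 6)}

# The three data division scenarios applied to Model V.
X5, y5 = models[5]
splits = {frac: chronological_split(X5, y5, frac) for frac in (0.7, 0.8, 0.9)}
```

Note that the larger input combinations lose a few rows at the start of the record, because the first targets have no complete set of antecedent values.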

Machine Learning Models Overview
In this study, four different hybrid machine learning models, namely ANN-GA, SVR-GA, Grid-SVR, and Grid-RF, were developed for monthly forecasting of the flow of a river located in a semiarid region of Iraq. RapidMiner software was used to develop the machine learning models. RapidMiner is an open-source, free, and flexible software implemented in the Java language. The program has been used in data analysis, application design, and the development of complex models [35]. The development of predictive models using the above hybrid ML models is described in the following sections.

ANN-GA.
Inspired by the human neural network, the ANN was proposed and developed to simulate the human brain during learning. With high computing power, an ANN can outperform humans in some cases. ANNs have been applied to many regression, clustering, and classification tasks.
An ANN, as shown in Figure 3, consists of three types of layers: input layer, hidden layer, and output layer [36]. Each layer consists of a set of nodes called artificial neurons that perform elementary calculations [37]. Weighted connections link neurons in successive layers. During the training procedure, the weights are defined and updated with the aim of minimizing the error between the actual output and the computed output. An ANN can produce output with reasonable accuracy [38] if it has gone through an effective learning phase. The backpropagation neural network (BPNN) proposed by Rumelhart et al. [39] is one of the most popular learning algorithms. Backpropagation (BP) aims to optimize the network parameters by minimizing the least square error between the actual and computed outputs.
Inspired by Darwin's theory of biological evolution, the GA was developed as a heuristic method for finding the optimal value of a function [40,41]. It is one of the most popular forms of evolutionary algorithm used to solve different optimization problems [42,43]. The GA is initialized by generating a random population of individuals (solutions) and tries to optimize these individuals by applying three successive operations: selection, crossover, and mutation. This procedure is repeated a certain number of times until the optimum values are obtained. In this section, the performance of BP was optimized using the GA to tune the parameters that affect the NN's performance. The training procedure starts by using the GA to perform a global search for the network weight values. The GA refines an initially random set of weights to obtain a better estimate that is likely to be close to the global optimum [44,45]. BP training then refines the solution provided by the GA to bring it to the optimum solution.
The general steps of ANN-GA can be summarized as follows, and the flowchart of the method is illustrated in Figure 4. Initially, a feasible NN topology is predefined by determining the number of neurons in the hidden layer [46,47]. The GA then improves the performance of the neural network as follows:
Step 1: initialize random values for the weights and biases ($w_{ij}$ and $b_i$) according to the initial network topology.
Step 2: generate an initial random population of parameters. Then perform the following steps until the maximum number of iterations is reached:
Step 2.1: calculate the fitness value by applying the fitness function to each individual in the population.
Step 2.2: update the network parameters ($w_{ij}$ and $b_i$) based on the lowest error E.
Step 2.3: generate a new population by performing the GA operations: selection, crossover, and mutation.
Step 3: obtain the optimal values of the parameters from the last population.
Step 4: train the NN by the BP training procedure (updating the final weights and biases).
Step 5: evaluate the performance of the NN using the testing data. If the error is acceptable, stop and return the optimal model; otherwise, change the network topology (number of neurons in the hidden layer) and return to step 1.
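Step 2.1 can be made concrete with a small sketch of how one GA individual is scored. The encoding is an assumption for illustration: a chromosome is a flat list holding the hidden-layer weights, hidden biases, output weights, and output bias of a single-hidden-layer network with tanh activation, and the fitness is the mean squared error. None of these choices is stated in the text, and the helper names are hypothetical.

```python
import math


def chromosome_length(n_in, n_hidden):
    """Number of genes: hidden weights + hidden biases + output weights + output bias."""
    return n_in * n_hidden + 2 * n_hidden + 1


def decode(chromosome, n_in, n_hidden):
    """Unpack a flat chromosome into the parameters of a one-hidden-layer network."""
    w_hidden = [chromosome[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    i = n_in * n_hidden
    b_hidden = chromosome[i:i + n_hidden]
    w_out = chromosome[i + n_hidden:i + 2 * n_hidden]
    b_out = chromosome[i + 2 * n_hidden]
    return w_hidden, b_hidden, w_out, b_out


def forward(params, x):
    """Network output for one input vector x (tanh hidden units, linear output)."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def fitness(chromosome, X, y, n_hidden):
    """Mean squared error of the decoded network; lower is fitter."""
    params = decode(chromosome, len(X[0]), n_hidden)
    return sum((forward(params, x) - t) ** 2 for x, t in zip(X, y)) / len(y)
```

The GA would call `fitness` once per individual in each generation, and the fittest chromosome would supply the starting weights for the subsequent BP refinement in Step 4.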

RF-Grid.
Grid search is a traditional way of performing hyperparameter optimization for ML models. It is simply an exhaustive search method that sets up a grid of the possible values of the hyperparameters (Figure 5) and trains a model for each of the combinations [48]. In this method, all the possible combinations of the hyperparameter values are tried and tested using the k-fold cross-validation technique.
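A minimal sketch of exhaustive grid search with k-fold cross-validation is given below. It is not the RapidMiner implementation used in the study; the fold layout (contiguous folds), the toy model, and the names `k_fold_indices`, `grid_search`, and `fit_and_score` are assumptions made for illustration.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Score every combination in `param_grid`; return (best_params, best_mean_error)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy stand-in model: prediction = shrink * mean(training targets) + offset.
targets = [8.0, 12.0] * 6


def fit_and_score(params, train_idx, val_idx):
    mean_train = sum(targets[i] for i in train_idx) / len(train_idx)
    pred = params["shrink"] * mean_train + params["offset"]
    return sum((targets[i] - pred) ** 2 for i in val_idx) / len(val_idx)


best, score = grid_search({"shrink": [0.5, 1.0, 1.5], "offset": [-1.0, 0.0, 1.0]},
                          fit_and_score, n_samples=len(targets), k=3)
```

The cost grows with the product of the grid sizes times k, which is why the search ranges and step sizes matter in practice.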
Random forest (RF) is one of the most potent ensemble learning techniques. It was developed by Breiman in 2001 [49] to solve different regression, classification, and clustering problems, and it has exhibited excellent performance in many fields [50][51][52].
Despite the advantages of decision trees, namely their simplicity, ease of use, and interpretability [53,54], they have many limitations, such as suboptimal performance and lack of robustness. RF overcomes the limitations of traditional decision trees by combining many randomized, decorrelated decision trees to predict the variable or classify it into a specific class. RF is a bagging (bootstrap aggregating) technique that combines the outputs of a number of decision trees via a voting scheme. An example of RF is shown in Figure 6, in which the forest consists of n trees and a voter.
The main advantages of RF in regression tasks include (i) bootstrap resampling, (ii) random feature selection, and (iii) out-of-bag error estimation.
Suppose there are n trees $T_1(x), T_2(x), \ldots, T_n(x)$. The prediction value of each decision tree is $Y_i = T_i(x)$, while the final prediction output $Y$ results from aggregating the outputs of all randomly generated trees. The aggregation is accomplished in the classification task by taking the majority vote, while in the regression task it is accomplished by taking the average:

$$Y = \frac{1}{n} \sum_{i=1}^{n} T_i(x).$$

The RF algorithm can be summarized as follows [55,56]:
(i) Generate n trees by randomly selecting different bootstrap samples from the training data. The out-of-bag samples are the samples that are not selected.
(ii) For each bootstrap sample, grow a full decision tree to the maximum size without pruning. In splitting the nodes of the tree, a specific number of features is selected randomly instead of considering all features (random feature selection).
(iii) Repeat step (ii) until a randomly generated forest consisting of n decision trees is formed.
(iv) Predict new data by applying the n trees and aggregating the results.
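The resampling and aggregation parts of the algorithm above can be sketched without growing actual trees. The snippet shows bootstrap sampling with its out-of-bag set, random feature selection, and the two aggregation rules; the tree outputs passed to the aggregators are made-up numbers, and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def random_feature_subset(n_features, n_selected, rng):
    """Pick the candidate features considered at one node split."""
    return sorted(rng.sample(range(n_features), n_selected))


def aggregate_regression(tree_outputs):
    """Regression: average of the individual tree predictions."""
    return sum(tree_outputs) / len(tree_outputs)


def aggregate_classification(tree_outputs):
    """Classification: majority vote over the individual tree predictions."""
    return Counter(tree_outputs).most_common(1)[0][0]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_sample(50, rng)
features = random_feature_subset(n_features=5, n_selected=2, rng=rng)
forecast = aggregate_regression([410.0, 390.0, 400.0])   # illustrative tree outputs
label = aggregate_classification(["high", "low", "high"])
```

The out-of-bag indices of each tree are the samples on which that tree can be evaluated without a separate validation set.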
In this section, the grid search algorithm was used to optimize the RF algorithm by tuning its hyperparameters.
The primary hyperparameters of RF that affect its performance are (1) the number of trees in the forest that must be generated before taking the majority vote or the average of predictions and (2) the maximum number of features considered for splitting each node of the tree. The hybrid structure of RF and grid search comprises the following steps (Figure 7):
(i) Step 1: define the range of the RF search parameters: maximum value, minimum value, and step size.
(ii) Step 2: build the grid search space on the coordinate system.
(iii) Step 3: build the RF models using all possible combinations of parameters and evaluate the performance of RF.
(iv) Step 4: return the multiple sets of optimal parameters.
(v) Step 5: if the accuracy is satisfied, stop and return the optimal parameters; otherwise, go to step 6.
(vi) Step 6: redefine the search range near the optimal parameters and reduce the step size. Then, go to step 2.
(vii) Step 7: repeat steps 2-6 until optimal hyperparameter values satisfying the accuracy are found.
(viii) Step 8: build the RF model with the optimal parameters.
(ix) Step 9: predict the output values of the data in the testing set using the RF model.
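The coarse-to-fine loop of steps 1-7 can be sketched as follows for integer hyperparameters. The `score` function is a made-up stand-in for the cross-validated error of an RF model, the halving of the step size is one possible refinement rule (the text does not specify one), and `refine_grid_search` is a hypothetical name.

```python
from itertools import product


def grid_values(lo, hi, step):
    """Integer grid from lo to hi (inclusive) with the given step."""
    return list(range(lo, hi + 1, step))


def refine_grid_search(score, ranges, tol=0.0, max_rounds=10):
    """ranges maps name -> (min, max, step); score(params) returns an error to minimize."""
    ranges = dict(ranges)
    names = sorted(ranges)
    best_params, best_err = None, float("inf")
    for _ in range(max_rounds):
        # Steps 2-4: build the grid and evaluate every combination.
        for values in product(*(grid_values(*ranges[name]) for name in names)):
            params = dict(zip(names, values))
            err = score(params)
            if err < best_err:
                best_params, best_err = params, err
        # Step 5: stop when the accuracy requirement is met.
        if best_err <= tol:
            break
        # Step 6: narrow the range around the current optimum and reduce the step.
        for name in names:
            step = ranges[name][2]
            centre = best_params[name]
            ranges[name] = (max(1, centre - step), centre + step, max(1, step // 2))
    return best_params, best_err


def score(params):
    """Stand-in error surface (not a real RF evaluation)."""
    return (params["n_trees"] - 137) ** 2 + (params["max_features"] - 3) ** 2


best, err = refine_grid_search(score, {"n_trees": (10, 210, 50), "max_features": (1, 5, 2)})
```

Compared with a single fine grid over the full range, the refinement evaluates far fewer combinations, at the risk of missing an optimum that lies outside the neighbourhood of the coarse best point.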

SVR-GA.
Support vector regression (SVR) was introduced by Vapnik [57] as an extension of SVM for solving regression problems. SVR is a very useful tool for prediction because of its ability to map a nonlinear data space into a higher dimensional feature space [58]. Consider a learning dataset defined as $\{(x_i, y_i)\}$ $(i = 1, \ldots, n)$, where $x_i \in R^n$ is the input vector and $y_i \in R$ is its corresponding output. The main objective of SVR is to deduce the regression function $f(x)$ that describes the relationship between the input data $x$ and the target output $y$ with an error of less than $\varepsilon$ for all training data. The SVR function can be written as follows [59]:

$$f(x) = w \cdot \varphi(x) + b, \quad (1)$$

where $f(x)$ is the computed output of $x$, $\varphi(x)$ is the nonlinear feature mapping function of the inputs, and $w$ and $b$ are adjustable coefficients that represent the weight vector and the intercept, respectively. Training the SVR means finding the values of $w$ and $b$ by minimizing the upper bound of the regression error. SVR is therefore an optimization problem that tries to make the regression function $f(x)$ as flat as possible by minimizing $w$, which requires minimizing the Euclidean norm $\|w\|^2$. The optimization problem that identifies the regression function is given as follows:

$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \le \varepsilon, \\ w \cdot \varphi(x_i) + b - y_i \le \varepsilon. \end{cases} \quad (2)$$

Equation (2) applies if there is a function $f(x)$ that approximates all pairs $(x_i, y_i)$ with an accuracy of $\varepsilon$. To tolerate some errors that violate these conditions, that is, to handle constraints that would otherwise be infeasible, the slack variables $\xi_i$ and $\xi_i^*$ are introduced. Equation (2) can, therefore, be rewritten as follows:

$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \\ w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0, \end{cases} \quad (3)$$

where $C$ is a nonnegative constant expressing the box constraint, which penalizes data points lying outside the $\varepsilon$-insensitive error range and also helps to avoid overfitting [58]. There are several commonly used kernel types in SVR that convert the nonlinear function in equation (1) into a higher dimensional space.
The radial basis function (RBF) has been widely used in previous studies [5,60]:

$$K(x, x_i) = \exp\left(-\gamma \|x - x_i\|^2\right), \quad (4)$$

where $K(x, x_i)$ is the kernel function and $\gamma$ is the width parameter. The performance of the SVR model depends on the tuning of its hyperparameters: the regularization parameter ($C$), the epsilon ($\varepsilon$), and the kernel parameter ($\gamma$).
These parameter values strongly affect the performance of the model, as follows. The first parameter (the regularization parameter C, also called the box constraint) determines the penalty of the approximation function. It must be neither very small nor very large: if it is too small, it causes underfitting, and if it is too large, it causes overfitting. The maximum margin of the decision boundary, in contrast, is controlled by the insensitivity loss function (ε). Finally, the kernel function governs the ability of the model to deal with nonlinearity, as it is responsible for transforming the nonlinear function into a more suitable function [61,62].
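The role of each hyperparameter can be seen in a few lines of code. The sketch assumes the common RBF form $K(x, x_i) = \exp(-\gamma\|x - x_i\|^2)$ and the standard ε-insensitive loss; the numerical values and function names are illustrative only.

```python
import math


def rbf_kernel(x, x_i, gamma):
    """RBF kernel (assumed form): exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, x_i)))


def epsilon_insensitive(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, y_true, y_pred, C, epsilon):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses (primal objective value)."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    penalty = C * sum(epsilon_insensitive(t, p, epsilon) for t, p in zip(y_true, y_pred))
    return flatness + penalty


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)    # identical inputs
wide = rbf_kernel([0.0], [1.0], gamma=0.1)                    # small gamma: slow decay
narrow = rbf_kernel([0.0], [1.0], gamma=10.0)                 # large gamma: fast decay
inside_tube = epsilon_insensitive(1.0, 1.05, epsilon=0.1)
outside_tube = epsilon_insensitive(1.0, 1.5, epsilon=0.1)
```

A larger C makes every violation of the ε tube more expensive relative to the flatness term, which is the mechanism behind the overfitting risk mentioned above.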
Many studies have tuned the hyperparameters of SVR using manual or grid search [63,64]. However, the complexity of this approach increases sharply as the search space widens. Also, this approach does not always find the best hyperparameter values for the model. Other approaches, such as the genetic algorithm (GA), have been adopted to overcome these limitations. The GA, proposed by Holland in 1975 and inspired by Darwin's theory, is considered one of the powerful optimization algorithms. In this section, the GA is used to find the hyperparameters of SVR. The procedure for optimizing the SVR hyperparameters using the GA is illustrated in Figure 8, and the steps involved are summarized as follows:
Step 1: initialize the SVR parameters (ε, C) and the kernel parameter γ. Code these parameters directly to create a chromosome.
Step 2: initialize the GA parameters randomly: population size, number of generations, mutation rate, and crossover rate.
Step 3: fit the SVR model on the training data using k-fold cross-validation.
Step 4: calculate the fitness value of each individual in the population according to the mean square error (MSE).
Step 5: generate a new offspring population of parameters through selection, crossover, and mutation.
Step 6: repeat steps 4-5 until ε, C, and γ yield the minimal error; otherwise, continue to optimize.
Step 7: output the optimal parameters found at the end of the last generation, and train the SVR model with these parameters.
Step 8: predict the output for the data in the testing dataset using the SVR model.
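The GA loop in steps 1-7 can be sketched as follows. Several choices are assumptions not taken from the text: genes are stored as base-10 logarithms within hypothetical bounds, selection is by tournament, crossover is uniform, mutation is Gaussian, and the best individual is carried over unchanged (elitism). The fitness passed in the usage example is a made-up surrogate standing in for the k-fold cross-validated MSE of an SVR model.

```python
import math
import random

# Hypothetical search bounds, expressed as log10 of each hyperparameter.
BOUNDS = {"epsilon": (-3.0, 0.0), "C": (-1.0, 3.0), "gamma": (-3.0, 1.0)}
NAMES = list(BOUNDS)


def decode(individual):
    """Chromosome (log10 genes) -> hyperparameter dictionary."""
    return {name: 10 ** gene for name, gene in zip(NAMES, individual)}


def tournament(population, fits, rng, size=3):
    """Return the fittest of `size` randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: fits[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(individual, rng, rate=0.2, sigma=0.3):
    """Gaussian perturbation of some genes, clipped to the search bounds."""
    child = []
    for gene, name in zip(individual, NAMES):
        if rng.random() < rate:
            lo, hi = BOUNDS[name]
            gene = min(hi, max(lo, gene + rng.gauss(0.0, sigma)))
        child.append(gene)
    return child


def genetic_search(fitness, pop_size=20, generations=30, crossover_rate=0.9, seed=0):
    """Minimize `fitness(params)`; return (best_params, best_fitness_per_generation)."""
    rng = random.Random(seed)
    population = [[rng.uniform(*BOUNDS[n]) for n in NAMES] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [fitness(decode(ind)) for ind in population]
        best = min(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        offspring = [population[best]]            # elitism keeps the current best
        while len(offspring) < pop_size:
            a = tournament(population, fits, rng)
            b = tournament(population, fits, rng)
            child = crossover(a, b, rng) if rng.random() < crossover_rate else list(a)
            offspring.append(mutate(child, rng))
        population = offspring
    fits = [fitness(decode(ind)) for ind in population]
    best = min(range(pop_size), key=lambda i: fits[i])
    history.append(fits[best])
    return decode(population[best]), history


def surrogate_mse(params):
    """Made-up error surface with its minimum at C = 10, gamma = 0.1, epsilon = 0.01."""
    return ((math.log10(params["C"]) - 1.0) ** 2
            + (math.log10(params["gamma"]) + 1.0) ** 2
            + (math.log10(params["epsilon"]) + 2.0) ** 2)


best_params, history = genetic_search(surrogate_mse)
```

Searching in logarithmic space is a design choice that lets a single mutation scale cover hyperparameters spanning several orders of magnitude.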

SVR-Grid.
In this section, the grid search algorithm is used to obtain the optimal values of the SVR parameters (ε, C, and γ). The grid search algorithm tries all possible values of the parameters in a given space with a specified step distance. The cross-validation technique [65] is used to identify the parameters that give the SVR model its best accuracy. The hybrid structure of SVR optimized by grid search is illustrated in the flowchart in Figure 9 and consists of the following steps:
Step 1: define the range of the SVR search parameters.
Step 2: initialize the values of the parameters and the step distance.
Step 3: split the dataset into two sets (training and testing sets).
Step 4: train the SVR model using k-fold cross-validation on the training dataset.
Step 5: if the MSE satisfies the accuracy, select the set of parameters with the minimum MSE; otherwise, continue the optimization with steps 6 and 7.
Step 6: search for multiple sets of optimal parameters.
Step 7: redefine the search range near the optimal parameters and reduce the search step; then go to step 4.
Figure 9: The flowchart of the SVR-grid model.

Model Performance Evaluation Using Statistical Indices.
The effectiveness of the proposed modeling techniques was examined by comparing the forecasted river flow with the observed river flow data. It should be noted that the data used in this investigation were continuous and without any missing values. The performance of the models in forecasting river flow one month ahead was evaluated using several statistical metrics. Six statistical metrics were used to measure the performance of the predictive models during model testing, namely, Mean Error (ME), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination ($R^2$) [66,67]. These statistical metrics are widely used by researchers to evaluate the performance of predictive models in hydrology and machine learning [68], and they are considered good indicators of the accuracy and robustness of a model. In their definitions, $N$ is the length of the testing dataset, $y_o$ and $y_p$ are the actual and forecasted river flow data, and $\bar{y}_o$ and $\bar{y}_p$ are the mean values of the actual and forecasted river flow data.
Tables 1-3 present the statistical performance indicators (i.e., ME, RMSE, MAE, MPE, MAPE, and $R^2$) for the five constructed input combinations, the training and testing phases, and the three data division scenarios. The tables show that all four hybrid ML models (ANN-GA, SVR-GA, Grid-SVR, and Grid-RF) generally performed well. However, their performance varied from one input combination to another, which depends on the historical data memory provided by the lead times (the antecedent river flow values). A notable enhancement was achieved using the hybridized SVR-GA, which corroborates the findings of several other studies in the hydrological engineering literature [69][70][71][72].
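The six indicators named in this section can be computed as in the sketch below. It uses common textbook definitions under stated assumptions: errors are taken as forecast minus observed, MPE and MAPE are expressed in percent, and $R^2$ is the squared correlation between observed and forecasted values (consistent with the use of both means). The sign convention and scaling actually used in the study are not recoverable from the text, and the sample numbers are invented.

```python
import math


def evaluate(observed, forecast):
    """Return ME, RMSE, MAE, MPE (%), MAPE (%), and R2 for paired series."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, forecast)]   # assumed sign convention
    mean_o = sum(observed) / n
    mean_p = sum(forecast) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, forecast))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in forecast)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100.0 / n * sum(e / o for e, o in zip(errors, observed)),
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(errors, observed)),
        "R2": cov ** 2 / (var_o * var_p),
    }


# Invented values for illustration (m3/s); not results from the study.
scores = evaluate([100.0, 200.0, 400.0], [110.0, 190.0, 420.0])
```

ME and MPE keep the sign of the errors and therefore reveal systematic over- or underestimation, whereas RMSE, MAE, and MAPE measure the magnitude of the errors only.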
It is observed that the data division plays an essential role in the learning process of the developed ML models. Increasing the span of the training phase contributes to an enhancement of model predictability. In quantitative terms, the best forecasting results were attained by the SVR-GA (RMSE = 0.04, MAE = 0.03, and $R^2$ = 0.95). The SVR-GA model showed a boost in forecasting accuracy, although the other ML models also obtained reasonable prediction accuracy. This observation confirms the capacity of the hybrid SVR-GA model to handle the complexity of river flow in a semiarid environment.
The coefficient of determination ($R^2$) exceeded 0.90 for almost all the input combinations using the SVR-GA model, although the performance of the SVR-GA model over the training phase was not superior (Table 3). This supports the feasibility of the SVR-GA as the more potent model.

Models Graphical Evaluation.
Two different graphical presentations are used to evaluate the performance of the proposed models. The actual and forecasted values of the river flow for the Tigris River are presented using scatter plots and Taylor diagrams. The scatter plots for the three data division scenarios (70%-30%, 80%-20%, and 90%-10%) and the four developed ML models (ANN-GA, SVR-GA, Grid-SVR, and Grid-RF) are illustrated in Figures 10-12. It can be clearly observed that the 90%-10% data division scenario using the SVR-GA model attained the best match between the observed and forecasted river flow based on four-month antecedent values. For this particular data division and lead time, the observed values and the values forecasted by all the models were found to follow similar trends. However, the match between the observed and forecasted values was best for the SVR-GA model. To assess the efficacy of the models, the linear relationship between the time series of observed and forecasted river flows for the testing period was investigated. The scatter plot of river flow forecasted by SVR-GA attained the maximum determination coefficient ($R^2$ = 0.96).
The other graphical presentation investigated for the applied predictive models was the Taylor diagram [73] (Figures 13-15). It is a distinguished two-dimensional graphical presentation that combines three statistical metrics: RMSE, correlation, and standard deviation. In harmony with the presented statistical results and the scatter plots, using four-month lead time predictors for the one-month-ahead river flow, the SVR-GA model indicated the best forecasting performance according to the location of the model results on the Taylor diagram for the 90%-10% data division scenario.
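The three statistics summarized by a Taylor diagram can be computed as in the following sketch. In Taylor's construction the RMS quantity is the centred pattern RMS difference (means removed), and it is tied to the two standard deviations and the correlation by $E'^2 = \sigma_o^2 + \sigma_p^2 - 2\sigma_o\sigma_p R$, which is what allows all three to be shown in one two-dimensional plot. The function name and the sample values are illustrative.

```python
import math


def taylor_statistics(observed, forecast):
    """Return (sd_observed, sd_forecast, correlation, centred_rms_difference)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(forecast) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in forecast]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * sd_o * sd_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return sd_o, sd_p, corr, crmsd


# Invented values for illustration (m3/s); not results from the study.
sd_o, sd_p, corr, crmsd = taylor_statistics([100.0, 200.0, 400.0, 300.0],
                                            [120.0, 180.0, 390.0, 330.0])
law_of_cosines = sd_o ** 2 + sd_p ** 2 - 2.0 * sd_o * sd_p * corr
```

A model plots closest to the observation point when its standard deviation matches the observed one and its correlation is high, which is equivalent to a small centred RMS difference.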

Discussion and Possible Future Research Directions
The results indicate that the proposed hybrid ML models can provide high accuracy in forecasting river flows for the studied Tigris River, where the variability of river flow is low. Among the four developed models, SVR-GA was superior to the others. The model demonstrated the ability to handle a complex process related to an engineering problem. SVR-GA achieved a high coefficient of determination for almost all the input combinations for streamflow forecasting. It can be concluded that SVR-GA has the potential to deal with dynamic and chaotic environments with high forecasting accuracy.
In the current study, the forecasting is based on a univariate modeling procedure in which only historical river flow data were incorporated in the model development. It is therefore suggested that future work include other variables that have a strong relationship with river flow, such as rainfall, humidity, temperature, or evaporation rate. However, it is worth highlighting that the proposed models proved to be efficient soft computing models for capturing the actual trend of the river flow time series. This is highly essential for several water and environmental engineering applications, particularly the monitoring, management, and mitigation of flood events. Based on the reported results, 20 years of river flow data were sufficient for the development of the forecasting model. However, the length of the data span used for the learning process has a considerable effect on the accuracy of model performance. This is the essential finding regarding the hypothesized effect of the data division scenarios on the capacity of the machine learning models. Indeed, serious attention should be given to selecting the length of data for training the models. Selecting the length of data appropriately reduces underfitting and helps the modeler to choose the best size of training data. This is because the training stage should experience the majority of river flow patterns to allow the models to forecast river flow in the testing stage with an acceptable level of accuracy. The present study indicates that 20 years of river flow data are enough to provide acceptable accuracy in forecasting river flow.
Another significant aspect which can improve the predictability of the applied predictive models is the optimum selection of the lead times correlated to the targeted variable.
The mutual information (MI) statistical approach can potentially be integrated as a prior stage of the forecasting model development process to extract the highly associated information. The approach is based on information theory and the notion of entropy [74].
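The idea of using MI as a preselection stage can be sketched as follows. The example uses a simple histogram (plug-in) estimator to score how much information each lagged copy of a series carries about the current value, and then ranks the lags. The estimator, the bin count, the function names `mutual_information` and `rank_lags`, and the synthetic sinusoidal series are assumptions made for illustration only; the study does not specify an MI estimator.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(dx)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    # Sum p(i,j) * log(p(i,j) / (p(i) * p(j))) over occupied histogram cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def rank_lags(series, max_lag=12, bins=4):
    """Score each lag by the MI between the series and its lagged copy."""
    scores = {lag: mutual_information(series[:-lag], series[lag:], bins)
              for lag in range(1, max_lag + 1)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in for a monthly flow series with a 12-month cycle.
flow = [100 + 50 * math.sin(2 * math.pi * t / 12) for t in range(240)]

ranking = rank_lags(flow, max_lag=12)
for lag, score in ranking[:3]:
    print(f"lag {lag}: MI = {score:.3f} nats")
```

Unlike linear correlation, MI also captures nonlinear dependence between antecedent and current flow values. Histogram estimates are sensitive to the number of bins and to sample size, so the ranking is a screening aid and not a definitive selection.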
It is worth highlighting that the development of hybrid machine learning models such as those developed here requires extracting the highly correlated features (the correlated lead times). In this respect, a deep learning model has the advantage of deploying hidden layers to extract the features automatically. The hydrological process is linked to typical time-sequential data [75]; therefore, the forecasting of hydrological time series is based on a fixed number of previous steps selected based on correlation.
For such a case, a deep learning model can be trained to learn time-varying or sequential patterns by using time delay units through feedback connections for the selection of antecedent values as input. The suitability of the deep learning model for hydrological applications relies on its capability of providing accurate and timely time series prediction.

Conclusion
The motivation for the current research was to forecast monthly river flow in a semiarid environment. Four hybrid ML models (i.e., ANN-GA, SVR-GA, Grid-SVR, and Grid-RF) were developed for this purpose. Three data division modeling scenarios (i.e., 70%-30%, 80%-20%, and 90%-10%) were proposed and inspected for their effect on the models' predictive performance. The architecture of the applied ML models was established based on several antecedent values of the river flow in accordance with the correlation analysis. The performance of the models was assessed using a number of numerical skill indicators and graphical presentations. In general, the results demonstrated that the SVR-GA model achieved the highest ability in forecasting monthly river flow with significant accuracy. Therefore, it is possible to improve the river flow forecasting ability using the proposed hybrid machine learning model. In addition, the results indicated that building the predictive model based on the 90%-10% training-testing division attained better prediction capability. The results indicated that using 20 years of river flow data is sufficient for the development of the forecasting model. The study concluded that the size of the training data has a significant effect on the accuracy of the predictive model. The study revealed that the data division has an important role in the learning process of the developed ML models.
The results demonstrated that increasing the span of the training phase can enhance the accuracy of model performance.
The current research can be further extended to enhance river flow forecasting by including more information on river flow patterns through climate parameters such as rainfall, humidity, and temperature as inputs. The models in the present study were developed for forecasting only one-step-ahead river flow. However, multiple-month-ahead forecasting models are important for water resources planning and management. Although the success of the SVR-GA model in forecasting one-step-ahead river flow indicates its capability in longer time-step-ahead river flow forecasting, it is still necessary to examine the ability of the SVR-GA model in multiple-month-ahead forecasting. The study recommends using the mutual information (MI) statistical approach as a prior stage of the forecasting model development process to extract the highly associated information.

Data Availability
The data used in the study are available upon request from the corresponding author.

Conflicts of Interest
The authors have no conflicts of interest to declare.