Multivariate Streamflow Simulation Using Hybrid Deep Learning Models

Reliable and accurate streamflow simulation has a vital role in water resource development, mainly in agriculture, environment, domestic water supply, hydropower generation, flood control, and early warning systems. In this context, these days, deep learning algorithms have got enormous attention due to their high-performance simulation capacity. In this study, we compared multilayer perceptron (MLP), long short-term memory (LSTM), and gated recurrent unit (GRU) with the proposed new hybrid models, including CNN-LSTM and CNN-GRU. Hence, we can simulate one-step daily streamflow in different agroclimatic conditions, rolling time windows, and a range of variable input combinations. The analysis used daily multivariate and multisite time series data collected from Awash River Basin (Borkena watershed: Ethiopia) and Tiber River Basin (Upper Tiber River Basin: Italy) stations. The datasets were subjected to rigorous quality control processes. Consequently, it rolled to a different time lag to remove noise in the time series and further split into training and testing datasets using a ratio of 80 : 20, respectively. Finally, the results showed that integrating the GRU layer with the convolutional layer and using monthly rolled average daily input time series could substantially improve the simulation of streamflow time series.


Introduction
One of the emerging research areas in hydrology is hydrological simulation [1], through which catchment responses are evaluated in terms of meteorological forcing variables. Hydrological simulation is also crucial for water resource planning and management, such as flood prevention, water supply distribution, hydraulic structure design, and reservoir operation [2,3]. However, river flow simulation is not an easy task since river flow time series are commonly random, dynamic, and chaotic. e relationship between streamflow generation and other hydrologic processes is nonlinear, which is controlled not only by external climatic factors and global warming but also by physical catchment characteristics.
Stream flows are mostly recorded at river gauging stations. However, different research studies show that the availability of gauging station records is generally decreasing in most parts of the world [4]. Tourian et al. [5] gathered a time series plot of the number of stations with available discharge data from the Global Runoff Data Centre (GRDC).
is time series indicates a decline in the total monitored annual stream flows between 1970 and 2010. Besides, inadequate discharge observation and malfunctioned gauging stations worsen the situation in developing countries [6]. Sparsely distributed rain gauge stations in Ethiopia also limit the performance of physical hydrological models. erefore, research studies on the robustness of innovative discharge data estimation models are undeniably important.
Streamflow simulation models in the literature generally are divided into two: (1) process or physical-based models that are generated from catchment characteristics and (2) data-driven models that depend on historically collected data [2,3,7]. Process-based models commonly use the experimental formula that provides insight into physical characteristics and has extensive data requirements. On the other hand, data-driven models are suitable and can function easily without considering the internal physical mechanism of the watershed system [2,3,7].
Artificial neural networks (ANNs) are the most used and studied "black-box" models. ey are utilized in many scientific and technological areas than the list of available black-box algorithms, such as support vector machine (SVM), genetic programming (GP), fuzzy logic (FL), recurrent neural network (RNN), and long short-term memory (LSTM) [7,8]. ANN is available in different functionalities and architectural forms, from simple to advanced levels. A recurrent neural network (RNN) is one of the advanced ANN architectures. It has been considered a specially designed deep learning network for time series analysis that quickly adapts to temporal dynamics using previous time step information [2]. However, RNN cannot capture long-time dependencies, and it is susceptible to vanishing and exploding gradients.
Couta et al. suggested advanced RNN or long short-term memory (LSTM) as one of the most effective approaches [8].
e LSTM unit has a cell that comprises an input gate, an output gate, and a forget gate [9]. Due to these gates, the LSTM model has shown promising results in different applications, including speech recognition, time series modelling, natural language processing, handwriting recognition, and traffic flow simulation [3,10]. Studies have also shown that LSTM has powerful performance for streamflow simulation over different powerful multilayered (ML) tools [3,11]. Campos et al. [10] applied autoregressive integrated moving average (ARIMA) and LSTM network to forecast floods on four Para' iba do Sul's River stations in Brazil. Aljahdali et al. [7] also compared the LSTM network and layered RNN to forecast streamflow in the USA's two rivers, the Black and Gila rivers. A recent article by Rahimzad et al. [12] used time-lagged Qt-1, Qt-2, and other climatic variables to forecast Qt in the future and concluded that the LSTM network outperforms linear regression (LR), multilayer perceptron (MLP), and support vector machine (SVM) in forecasting daily streamflow.
A few years back, Cho et al. [13] introduced gated recurrent units (GRUs) similar to LSTM with a forget gate which have fewer parameters than LSTM, as it lacks an output gate. GRU's capacities in speech signal modelling and natural language processing were similar to those of LSTM. However, there are debates on the relative performance of these two architectures for streamflow and reservoir inflow simulation, which is not well studied with different timescales and environments.
Notwithstanding the difference in their performance, selecting appropriate time series models from various known deep learning network architectures is difficult. LSTMs and GRUs are not always the ideal sequence prediction option. However, simulation with better prediction accuracy, fast running time, and less complicated models requires more research. Hence, this comparative analysis on the network architectures helps decide the optimized alternative for time series analysis. Recently, different hybrid deep learning models are getting wide attention from researchers in various fields of study. Chen et al. [14] used convolutional neural network (CNN), LSTM, and hybrid CNN-LSTM models for nitrogen oxide emission prediction. ey concluded that CNN-LSTM has an accurate and stable forecast of periodic nitrogen oxide emissions from the refining industry. Moreover, Li et al. [15] used univariate and multivariate time series data as input for LSTM and CNN-LSTM models. Hence, for the analysis of air quality using particulate matter (PM2.5) concentration prediction, the proposed multivariate CNN-LSTM model gives the best result due to low error and short training time.
e integration of CNN and LSTM models benefits time series prediction models such that the LSTM model can efficiently capture long time sequences of pattern information. In contrast, CNN models can filter out the noise of the input data and extract more valuable features, which could increase the accuracy of the prediction model [16]. Moreover, integrating CNN with GRU can also lead us to robust preprocessing of data, providing a viable option to improve the model's accuracy [17]. Even though combining CNN with LSTM showed remarkable results in different studies, its application in hydrological fields still demands more research [18]. Muhammad et al. [19] used LSTM, GRU, and hybrid CNN-GRU models for streamflow simulation based on 35 years of Model Parameter Estimation Experiment (MOPEX) dataset of 10 river basins in the USA. ey revealed that the proposed hybrid model outperforms the conventional LSTM; nevertheless, the performance is almost the same with GRU. Recently, Barzegar et al. [20] studied short-term water quality variable prediction using a hybrid CNN-LSTM model and effectively captured low and high water quality variables, mainly dissolved oxygen concentrations.
Screening input variables for different model architectures is also a challenging task for the researchers. Even though rainfall, evaporation, and temperature are causal variables for streamflow modelling, data availability and study objectives limit the choice variability [21]. Van et al. [21] discussed that applying temperature and evapotranspiration input nodes into the model increases the network complexity and causes overfitting. In contrast, Parisouj et al. [22] concluded that using readily available input variables such as temperature and precipitation for data-driven streamflow simulation will provide a reliable result. Hence, this research will contribute a step to this debate by testing different input combinations of various climatic regions in the performance of the proposed models.
To the best of our knowledge, we identify minimal literature that shows the performance variation of different hybrid models for streamflow simulation in various input variability conditions at once. us, we compared various forms of hybrid CNN-LSTM and CNN-GRU architectures with the classical MLP, GRU, and LSTM networks to simulate single-step streamflow using two climatic regions, available precipitation, and minimum and maximum temperature data. Moreover, the study tests the hybrid models with different layer arrangements and applies Keras tuner to optimize model hyperparameters. In general, the primary objective of this study will be to test the performance variation of the proposed models with extreme input variability 2 Computational Intelligence and Neuroscience conditions, which includes climatic, input combination, input time window, and average rolling time window variability. is study used different open-source software and machine learning libraries, including Python 3.6 for programming, NumPy, pandas, Scikit-learn, Hydroeval, Statsmodels, and Matplotlib libraries. All were used for data preprocessing, evaluation, and graphical interpretation. Moreover, TensorFlow and Keras deep learning frameworks were employed for modelling deep learning architectures.

Study Area
In the present study, two river subcatchments were selected in two climatic regions: the Awash River Basin, Borkena subcatchment in Ethiopia (Figure 1(a)), and the Upper Tiber River Basin in Italy (Figure 1(b)).

Borkena Watershed (Ethiopia).
e first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in the northern part of Ethiopia. e mainstream of the watershed emanates from Tosa mountain, which is found near Dessie town. e area's altitude ranges from 1,775 m at the lowest site near Kombolcha to 2,638 m at the highest site upstream of Dessie. e main rainy season of this watershed is from July to September.

Upper Tiber River Basin (Italy).
e second case study area is located in the Upper Tiber River Basin (UTRB) in Italy.
e Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin is located between 40.5°N to 43°N latitudes and 10.5°E to 13°E longitudes, covering about 17,500 km 2 that occupies roughly 5% of the Italian territory. e Upper Tiber River Basin (UTRB) is part of the TRB, covering 4145 km2 (∼20% of the TRB) with its outlet at Ponte Nuovo. e elevation of the catchment ranges from 148 to 1561 m above sea level. e area's climate is the Mediterranean, with precipitation mainly occurring from autumn (March to May) to spring (September to November). e intense rainfall highly influences the basin's hydrology at the upstream part that causes frequent floods in the downstream areas [24].

Data Source and Preprocessing
Borkena's required hydrological and metrological datasets were collected from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and archived for public use with the Water Resource Management and Evaluation (WRME) platform in the following link: http://hydrogate.unipg.it/wrme/.
We collected 5844 available data series from the time window of January 1, 1999, to December 31, 2014, from the Borkena watershed. Similarly, for UTRB, 7670 data series were collected from January 1, 1958, to December 31,1978. Both case study datasets are multivariate and multisite. Even though we are highly concerned and chose the series of time windows with minimum data gap for both stations, the datasets contain many missing values due to different reasons. us, our first task was to fill the missing values with the Monte Carlo approach for this research.
e study applied linear correlation statistics to measure the strength of dependency between different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide abundant inputs, our study does no't have a huge feature size that requires intensive feature selection criteria. Hence, we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also stated that inputs selected with PCC showed superior model accuracy. Hence, this study applied Pearson linear correlation coefficient [29,30]. It has a value ranging between "+1" and "-1," where "+1" indicates a positive linear correlation, "0" is no linear correlation, and "-1" shows a negative linear correlation [25]. Equation (1) [31]. However, since we have a small number of variables and data size, for this study, we decided to omit Borkena station (T max ) values, which have r values ranging between (−0.129) and (+0.107), and the details are presented in Table 1. (1) After passing rigorous quality control processes, the raw data were then split chronologically into training and testing datasets with a ratio of 80 : 20, respectively. e time series graph and the corresponding box plot of split data for both stations are presented in Figure 2. Different options existed in the literature to remove noise from the time series. A sliding window is the first option to temporarily approximate the time series data's actual value [32]. In comparison, rolling windows (moving average) is the second option that smooths the time series data by calculating the average, maximum, minimum, or sum over a specific time [33]. Hence, for this study, we applied average rolling windows to smooth and remove noise from the time series by keeping the length of the data still. en, daily, weekly, and monthly average rolling sliding windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were then prepared with the time lag window of 30 or 45 for single-step streamflow simulation at Borkena and UTRB stations, respectively. Moreover, split time series data variable scaling was performed using Standard Scaler for the modelling process's computational easiness and numerical stability.

Methods
In this study, three types of network architectures MLP, GRU, and LSTM, were compared with the proposed hybrid deep neural network architectures CNN-LSTM and CNN-GRU for the simulation of single-step streamflow by taking different combinations of precipitation (P), minimum temperature (T min ), and maximum temperature (T max ) as inputs. e proposed simulation model architectures with their input and output variables are briefly presented as a flowchart in Figure 3.

Deep Learning Models.
Deep learning models are part of a broader family of machine learning, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and deep neural networks (DNNs). ese models have been applied to different fields of study, including speech recognition, computer vision, natural language processing, and time series analysis [13,16,[34][35][36]. e following sections will briefly discuss some of these architectures that were used in the present study.

Artificial Neural Network (ANN).
Artificial neural network (ANN) is the most common machine learning model that has found application in streamflow simulation over the last two decades [1,37]. It is known for modelling complex input-output relationships inherent in hydrological time series features within a river catchment. e traditional feedforward neural network (FFNN) with three layers of input-output and hidden layers trained by backpropagation (BP) algorithm gained popularity for nonlinear hydrological time series modelling. Figure 4 displays the typical architecture of ANN.
where i, h, j, b, and w indicate neurons of the input, hidden, output layers, bias, and applied weight of the neuron, respectively; f h and f j show the activation functions of the hidden layer and output layer, respectively; x i , n, and m represent, respectively, the input value, input neuron, and hidden neuron numbers; and y and y j denote the observed and calculated target values, respectively. In the calibration phase of the model, the values of the hidden and output layers and corresponding weights could be varied and calibrated [38]. e ability of ANN to link input and output variables in complex hydrological systems without the need for prior knowledge about the nature of the process has led to a huge leap in the use of ANN models in hydrological simulations [38].

Long Short-Term Memory (LSTM).
e difference of LSTM from the classical MLP network is that layers of the neurons in LSTM have recurrent connections; thus, the state from the previous activation time step is used to formulate an output. e LSTM replaces the typical neuron in the hidden layer with a memory cell and three gates: an input gate, a forget gate, and an output gate [39]. It is an advanced form of recurrent neural network (RNN) that can capture long-term dependencies. On the other hand, RNN is a circular network in which an additional input is added to represent the state of the neuron in the hidden layer at the previous time steps [40]. LSTM has two critical benefits over RNN: overcoming vanishing and exploding gradients and holding memory to capture long-term temporal dependency in input sequences. e mathematical formulation for different parameters is listed in Table 3, and Figure 5 displays the LSTM memory cell with three gated layers. * W i , W f , W o , and W c are the weights that map the hidden layer input to the three gates of input, forget, and output. U i , U f , Uo, and U c weight matrices map the hidden layer output to gates; bi, b f , b o , and b c are vectors. C t and h t are the outcome of the cell and the outcome of the layer, respectively.

Gated Recurrent Unit (GRU)
. GRU is a special type of LSTM architecture in which it merges the input and forget gates and converts them into an update gate, which makes the parameter numbers fewer, and the training will be easier. ere are two input features each time: the input vector x t and the previous output vector h t−1 .
e output of each specific gate can be calculated through logical operation and nonlinear transformation of the input [34]. e mathematical formulations among inputs, outputs, and different parameters are listed in equations (3), (4), (5), and (6). Moreover, Figure 6 displays the structure of the gated recurrent unit (GRU) network.
where Z t is the update gate vector, r t is the reset gate vector, W and U are parameter matrices, σ is a sigmoid function, and tanh is a hyperbolic tangent.

Convolutional Neural Network (CNN).
Convolutional neural network (CNN) is one of the most successful deep learning models, especially for feature extraction, and its network structures include 1D CNN, 2D CNN, and 3D CNN [15]. CNN structure generally consists of a convolution layer, a pooling layer, and a full connection layer [18].   1D CNN is mainly implemented for sequence data processing [41], 2D CNN is usually used for text and image identification [42], and usually, 3D CNN is recognized for modelling medical image and video data identification [43]. Hence, since the aim of the present study is time series analysis, we implemented 1D CNN. e detailed process of 1D CNN is described in Figure 7.
As depicted in Figure 7, the input series is convoluted to the convolution layer from top to bottom (shown by the arrows). e grey or the mesh colours represent different filters where the size of the convolution layer depends on the number of input data dimensions, the size of the filter, and the convolution step length.

CNN-LSTM and CNN-GRU Hybrid Models.
In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers. Hence, the feature sequence from the CNN layer was considered as the input for the LSTM or GRU layer, and then the short and long-time dependencies were further extracted. e proposed CNN-LSTM or CNN-GRU models contain two main components: the first component consists of one dimensional single or double convolutional and average pooling layers. Moreover, a flatten layer is connected to further process the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM, GRU, and dense layers. Additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project.

Data Analysis
Simulation with deep learning requires selecting a probable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent as there is no hard rule to follow. " e process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner optimizer developed by the Google team and included it in the Keras open library [45,46].

Hyperparameter Optimization.
Tuning machine learning model hyperparameters is critical. Varying hyperparameter values often results in models with significantly different performances [47]. e models applied in this study mainly contain two types of hyperparameters: constant hyperparameters that are not altered through the optimization process and variable hyperparameters. Adam optimizer is applied under the category of the constant hyperparameters because of its efficiency and easiness to implementation that requires minimum memory and is commonly suited in different problems [48]. In this category, rectified linear unit (Relu) was used as an activation function, and mean squared error (MSE) was used as a loss function.
In contrast, the second type of changing hyperparameters is optimized by Keras tuner, and hyperparameter choices or value ranges for optimization are set using different trials. We also considered our PC capacity (processor: Intel(R) Core (TM) i7-6500U CPU 2.50 GHz and RAM: 8 gigabytes) with Windows 10 operating system. Hyperparameters are optimized with 20 trials, and since deep learning networks have different training and validation plots for each run, we decided to repeat the iteration three times.
All hyperparameter ranges or choices are listed in Table 4. CNN-LSTM 1 and CNN-GRU 1 models used hyperparameter values from numbers 1 to 13 for optimization (Table 4), while we omitted 4, 5, and 6 for CNN-LSTM 2 and CNN-GRU 2 . e remaining deep learning models MLP, LSTM, and GRU used a list of hyperparameters from numbers 7 to 13. Finally, each optimized hyperparameter is used for each training and testing experiment. Moreover, the train and test traces from each run can be plotted to give a more robust idea of the behaviour of the model to inspect overfitting and underfitting issues.

Performance Measures.
A wide variety of evaluation metrics are listed in the literature [49]. However, the popular ones are mean error (ME), coefficient of determination (R 2 ), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash-Sutcliffe efficiency (NSE).
is study used different input and model variability conditions. Hence, to concisely measure the analysis output and present the result, we applied the following top three standard performance evaluation criteria that can also have the potential to capture the extreme streamflow time series values effectively [50]. Coefficient of determination (R 2 ): Root mean square error (RMSE): Mean absolute error (MAE):    Computational Intelligence and Neuroscience Chooses the information to reject from the cell Decides what information is relevant to update in the current cell state Decides what to output based on input and the long-term memory of the cell Cell state Long-term memory  Figure 5: LSTM memory cell with three gated layers [11]. . C 2 C 1 =W 1 P 1 +W 2 P 2 C 2 =W 1 P 2 +W 2 P 3 C n =W 1 P n-1 +W 2 P n C 3 C n .

13
Number of batch sizes 10 100 10 * * * * Value ranges or choices for optimization by Keras tuner: (objective � "validation loss," max trials � 20, and executions per trial � 3). * Not applicable.  Computational Intelligence and Neuroscience Table 6: Weekly rolled streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions.

Model 2
Borkena UTRB P + Tmin P P + Tmin + Tmax where Q obs � discharge observed, Q sim � discharge simulated, and n � number of observations. e range of R 2 lies between 0 and 1, representing, respectively, no correlation and a perfect correlation between observed and simulated values, whereas smallest RMSE and MAE scores or values close to zero direct to the best model performance.

Results
Streamflow simulation result with the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three    average rolling time windows is presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). e highest scores are presented here.
( Moreover, from the proposed four hybrid models, CNN-GRU 2 or the model designed by a single 1D CNN layer showed the highest promising result on trial model 1(UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM 2 on model 2 (Borkena), and CNN-GRU 1 on model 1 (Borkena) shared the secondhighest promising result. Streamflow simulation with the CNN-GRU 2 model generally showed the highest performance than the other tested hybrid deep learning models and state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the result is discussed with different variability conditions in the following paragraphs.

Climatic Region Variability.
Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested different   Computational Intelligence and Neuroscience models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores on tested case study areas.

Input Combination
Variability. Input combination, minimum temperature (Tmin) with precipitation (P), does not show significant performance increment in the Borkena station (Tables 5 and 6). In some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables or Tmin, Tmax, and P showed significant performance increments (Table 7).

Average Rolling Time Window Variability.
Streamflow simulation without rolling daily time series data had deficient performance compared to monthly rolled average time series. is could be because the time series noise in UTRB is visible compared to that in Borkena station. As a result, performance increment from daily to monthly rolled window models is much higher in UTRB than in Borkena station.
Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results in both stations (Table 7). e corresponding training and test loss functions of this optimized high score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true values and predicted values of this model. e optimized hybrid model boosts the performance score and lowers the training time per epoch much better than GRU and LSTM models.
is model, input feature, and Keras tuner optimized hyperparameter values for both stations with its MSE score are presented in Tables 8  and 9. Moreover, the internal network structures of these models are also shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

Conclusions
is study showed a comparative analysis of different hybrid deep learning algorithms with state-of-the-art machine learning models for one-step daily streamflow simulation at two river basins or subcatchment stream flow outlets. e proposed algorithms for this study are CNN-LSTM and CNN-GRU hybrid deep learning models, each model having one or two 1D CNN layers with the classic MLP, LSTM, and GRU models. is study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. e following list of points will summarize the significant findings of this study.
(i) CNN-GRU 2 with one 1D CNN layer showed the best simulation performance reporting the lowest RMSE, MAE, and R 2 out of all models in both case study areas. Such results dictate that the performance of the selected architectures is irrespective of the climatic characteristics of the basins. (ii) Combining temperature data with precipitation as input and inserting to the proposed models had minimum performance increment in Borkena station compared to UTRB case study area, which clearly showed that temperature data scarcity has more performance loss implication in UTRB station. On the other hand, the Borkena station has significant natural streamflow variability than UTRB, which is also reflected in the model results. is implies the consideration of catchment response before any deep learning model applications.
(iii) Rolling the time window of input and output time series for streamflow simulation using the proposed models considerably increases performance in the UTRB than in the Borkena station. (iv) e analysis results also showed that training time per epoch for the hybrid deep learning models is much lower than that of GRU and LSTM models.
Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, from this case study, acceptable results and considering hybrid models' hyperparameters sensitivity and complexity, future research may further design optimized configurations. Moreover, they can test these hybrid models for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and other fields of study. Our future research will try to synchronize the highly performed hybrid deep learning models in this study with remote sensing datasets for the problem we experience in the ungauged catchments.
Data Availability e raw hydrological and metrological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et/) and the National Meteorological Agency of Ethiopia (NMA) (http:// www.ethiomet.gov.et), whereas for UTRB, the datasets can be retrieved from an online repository (http://hydrogate. unipg.it/wrme/).

Conflicts of Interest
e authors declare that they have no conflicts of interest.