Predicting Surface Runoff from Catchment to Large Region

Predicting surface runoff from catchments to large regions is a fundamental and challenging task in hydrology. This paper presents a comprehensive review of studies conducted over the last several decades to improve runoff predictions from catchments to large regions. It summarizes well-established methods and discusses promising approaches in four research fields: (1) modeling catchment, regional, and global runoff using lumped conceptual rainfall-runoff models, distributed hydrological models, and land surface models; (2) parameterizing hydrological models in ungauged catchments; (3) improving hydrological model structure; and (4) using new remote sensing precipitation data.


Introduction
Runoff from the land surface is the flow of excess water from rain, meltwater, or other sources over the Earth's surface. It is a major component of the regional and global hydrological cycle. It directly affects human lives because it is a key water resource for agriculture, industry, urban water use, and so forth. Understanding the complex relationships between rainfall and runoff processes, and then accurately estimating surface runoff, is crucial for the efficient design, planning, and management of catchments. This can be achieved with hydrological models, which not only estimate continuous surface runoff but also help in understanding catchment behavior and in modeling the impacts of climate and land use changes on the surface water balance [1,2].
Model calibration is a necessary step for achieving good simulations and predictions of surface runoff. Hydrological models are usually calibrated against observed streamflow, tuning their parameters to account for the inputs and water fluxes in a catchment [3,4]. With the development of remote sensing technology, more information is now available for hydrological modeling: for example, remotely sensed precipitation and leaf area index can serve as model inputs [5,6], and additional data (such as remotely sensed soil moisture, evapotranspiration, groundwater, and snow cover area) can be incorporated into multiple objective model calibration [4,7–9].
Local hydrological models have been widely used to predict runoff time series in a small number of catchments covering a small region with similar climate conditions [10,11]. Recently, they have been used to predict surface runoff in ungauged catchments across large regions, such as southeastern Australia [12], the Tibetan Plateau [13], the UK [14], and France [15]. This is important because many rivers, together with their reaches and tributaries, are ungauged or poorly gauged worldwide [14,16,17].
Credibly predicting surface runoff is difficult in ungauged catchments or in regions where runoff data are unavailable or sparse. Hydrologists have been developing strategies to estimate runoff in ungauged catchments since the 1970s, especially after the International Association of Hydrological Sciences (IAHS) launched the Predictions in Ungauged Basins (PUB) initiative in 2003, which aims at predicting or forecasting the hydrological responses of ungauged or poorly gauged basins.

Catchment, Regional and Global Runoff Models
Various models simulate surface runoff in an empirical, semimechanistic, or fully mechanistic way. Surface runoff models are generally classified as deterministic or stochastic; as physically based (white-box), conceptual, or black-box (empirical); as lumped or distributed; and as global hydrological models or land surface models (LSMs) [39,73–75]. This paper separates hydrological models into three categories according to complexity and application: (1) lumped conceptual rainfall-runoff (RR) models, (2) distributed hydrological models, and (3) global hydrological models and LSMs [73,76]. The first two categories are normally applied from catchment to regional scales, and the third is generally applied from large regions to the global land surface. Table 1 summarizes the major models in the three categories used for runoff estimation and prediction across a wide range of climatic and physiographic conditions.

Lumped Conceptual Rainfall-Runoff Models.
Lumped conceptual RR models treat a catchment as a single homogeneous unit. They are widely used because they tend to be parametrically parsimonious while yielding good performance after calibration against historical watershed input-output data [77]. Numerous RR models have been developed and documented [78,79]. Crawford and Linsley's Stanford Watershed Model was one of the notably successful efforts to introduce a complex RR model accounting for the dynamics of the hydrologic processes governing a watershed [32]. Other examples of conceptual RR models include the Xinanjiang model, developed in China in the 1980s [34], and the Sacramento Soil Moisture Accounting Model (SAC-SMA) [30], a widely used operational model for flood forecasting in the US National Weather Service (NWS).
RR models have been used very successfully to estimate runoff in small and large catchments under different climate regimes. They usually use rainfall and other climate data (e.g., temperature and/or potential evaporation) to estimate runoff. Although their main emphasis is runoff, they are normally designed to simulate actual evapotranspiration to account for the soil water balance. However, they do not directly quantify surface energy fluxes [76]. RR model parameters are usually optimized so that simulated runoff matches recorded runoff as closely as possible. A variety of calibration techniques, both manual and automatic, have been developed and implemented to ensure conformity between simulated system behavior and observations [3,80].
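The rainfall-to-runoff step and the calibration loop described above can be illustrated with a minimal sketch. The single-store model below is hypothetical (it is not SAC-SMA, Xinanjiang, or any other named model); `capacity`, `k`, the half-full initial store, and the parameter grid are all assumptions. Calibration is a brute-force grid search that maximizes the Nash-Sutcliffe efficiency (NSE); operational studies normally use global optimizers instead.

```python
from itertools import product
from typing import List, Sequence, Tuple


def simulate(rain: Sequence[float], pet: Sequence[float],
             capacity: float, k: float) -> List[float]:
    """Hypothetical single-store lumped model (daily time step, mm).

    capacity: soil store capacity (mm); k: linear-reservoir drainage rate (1/day).
    """
    storage = 0.5 * capacity  # assumed initial condition: store half full
    runoff = []
    for p, e in zip(rain, pet):
        water = storage + p
        # Actual ET scales with relative wetness and cannot exceed available water.
        aet = min(e * storage / capacity, water)
        storage = water - aet
        # Saturation excess: water above capacity leaves as quick flow.
        quick = max(storage - capacity, 0.0)
        storage -= quick
        # Linear reservoir: a fraction k of the remaining store drains as slow flow.
        slow = k * storage
        storage -= slow
        runoff.append(quick + slow)
    return runoff


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency; 1.0 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def calibrate(rain, pet, q_obs, capacities, ks) -> Tuple[Tuple[float, float], float]:
    """Grid search returning the (capacity, k) pair with the highest NSE."""
    best = max(product(capacities, ks),
               key=lambda pk: nse(q_obs, simulate(rain, pet, *pk)))
    return best, nse(q_obs, simulate(rain, pet, *best))


if __name__ == "__main__":
    rain = [0, 10, 25, 0, 5, 40, 0, 0, 15, 0]
    pet = [3.0] * len(rain)
    q_obs = simulate(rain, pet, capacity=100.0, k=0.1)  # synthetic "observations"
    print(calibrate(rain, pet, q_obs, [50.0, 100.0, 150.0], [0.05, 0.1, 0.2]))
    # ((100.0, 0.1), 1.0)
```

Because the synthetic observations were generated with parameters on the grid, the search recovers them exactly; with real data, forcing and observation errors mean no parameter set reaches NSE = 1.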
Compared with distributed hydrological models, RR models are simpler, need less input data, and are inexpensive to calibrate, so they are easy to use and are important tools for hydrologic analysis. More importantly, RR models are comparable to distributed hydrological models in accuracy for predicting daily, monthly, and annual runoff time series. For instance, Vansteenkiste et al. [81] compared three RR models (NAM, PDM, and VHM) with two distributed models (WetSpa and MIKE-SHE) in a medium-sized catchment in Belgium to assess model accuracy. They found that all tested models performed well in estimating total runoff and its components, as well as peak and low flow extremes. However, calibrating the RR models was much less time-consuming and produced higher overall model performance than the two distributed models. Reed et al. [82] compared 12 distributed models with a lumped model; the lumped model outperformed the distributed models in more cases, although some calibrated distributed models performed at a level comparable to or better than the calibrated lumped model. The limitation of RR models is that they cannot simulate how spatial patterns of land cover and land use change affect surface water availability.
RR models are normally applied at catchment scales. Their hydrologic predictions are strongly influenced by uncertainties in the forcing data (generally treated as deterministic), in the observed system response (due to errors in measuring the physical quantities), in the imperfect model structure, and in the parameter values resulting from model calibration, which are themselves profoundly affected by these uncertainty sources [3].
In summary, the RR models are still very important tools in hydrological modeling, particularly for predicting runoff in ungauged catchments because of their simplicity and usability.

Distributed Hydrological Models.
Distributed hydrological models represent a series of interconnected hydrological processes, such as runoff generation, groundwater recharge, snow accumulation and melt, soil moisture dynamics, evapotranspiration, and routing in lakes and rivers [10]. In addition, they account for the spatial variability of climate, terrain, soil, and vegetation by dividing the watershed into smaller units that are more homogeneous than the whole watershed. This feature offers the potential to improve hydrologic predictions [83]. Distributed hydrological models can be used directly to estimate the impacts of land use and land cover change on surface runoff and water availability [1,2], which is particularly important for catchments with a wide range of climatic and land surface conditions.
Distributed hydrological models have been well developed since the 1970s owing to the robust development of 3S (RS/GPS/GIS) technology. A representative semidistributed model is the topography-based hydrological model TOPMODEL, developed in 1979. It describes runoff generation, including both saturation excess and infiltration excess runoff, according to a topographic index derived from a digital elevation model (DEM) [44]. However, TOPMODEL does not consider the spatial variability of precipitation. After TOPMODEL, distributed hydrological models such as SHE (Système Hydrologique Européen) [40] and SWAT (Soil and Water Assessment Tool) [42] became fully distributed and included more complex hydrological processes.
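The TOPMODEL topographic index is conventionally written as ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope. The sketch below computes the index and then applies the textbook TOPMODEL relation for the local storage deficit, S_i = S̄ − m(TI_i − λ), where S̄ is the catchment-mean deficit, λ the catchment-mean index, and m a decay parameter, to find the saturated fraction of cells. It illustrates only the saturation-excess part under those standard assumptions; it is not the original model code, and the input values are made up.

```python
import math
from typing import Sequence


def topographic_index(upslope_area: float, contour_width: float, slope_rad: float) -> float:
    """TOPMODEL index ln(a / tan(beta)), with a = upslope area per unit contour width."""
    a = upslope_area / contour_width
    return math.log(a / math.tan(slope_rad))


def saturated_fraction(indices: Sequence[float], mean_deficit: float, m: float) -> float:
    """Fraction of cells whose local deficit S_i = S_mean - m * (TI_i - lambda) is <= 0."""
    lam = sum(indices) / len(indices)  # catchment-mean topographic index
    deficits = [mean_deficit - m * (ti - lam) for ti in indices]
    return sum(1 for d in deficits if d <= 0.0) / len(indices)


if __name__ == "__main__":
    # 1000 m2 draining through a 10 m contour on a 10 % slope: ln(1000)
    print(round(topographic_index(1000.0, 10.0, math.atan(0.1)), 3))  # 6.908
    print(saturated_fraction([4.0, 6.0, 8.0, 10.0], mean_deficit=0.03, m=0.02))  # 0.25
```

Cells with a high index (large drainage area, gentle slope) saturate first, which is how TOPMODEL generates a variable contributing area from a DEM.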
Although distributed hydrological models have a more solid physical basis than lumped models, several model comparison studies [74,75,82,84,85] have shown that no single model performs consistently best; rather, individual model performance varies with the setting. Model selection therefore depends on the objectives, the application, and the availability of data.
Despite their complexity, the distributed hydrological models are very useful for investigating changes in hydrological processes caused by anthropogenic activities, such as forestation, deforestation, and urbanization.

Global Hydrological and Land Surface Models.
The hydrological models presented in Sections 2.1 and 2.2 are normally applied at catchment to regional scales. At larger scales, from large regions to the globe, global hydrological models and LSMs (Table 1) have been developed to simulate and predict surface runoff. Global hydrological models have traditionally focused on water resources and lateral water fluxes, whereas LSMs can be coupled to global climate models to describe the vertical exchange of heat, water, carbon, and other elements. Because both are applied at similar spatial scales, this review does not distinguish between the two kinds of models and refers to them collectively as "global LSMs."
Compared to the lumped RR and distributed hydrological models, the global LSMs are far more complicated since they can simulate not only hydrological processes, but also various material and energy transfer processes on land surface [86].These processes include precipitation interception, snow accumulation and melt, runoff generation, water transfer amongst soil layers, shortwave radiation's reflection and transmission, longwave radiation's absorption and emission, separation of sensible heat and latent heat, plant growth and respiration, photosynthesis and gross primary production, microbe activities, and nutrient cycle.
First-generation LSMs, such as the Bucket model [87], do not consider vegetation and include only one soil layer. Second-generation LSMs, such as BATS [48] and SiB [69], contain "big-leaf" vegetation and two to three soil layers. Third-generation LSMs, such as CLM [49], contain "two-leaf" vegetation and multiple soil layers for hydrological processes. Some widely used land surface models are listed in Table 1.
Surface runoff is treated quite differently in distributed hydrological models and global land surface models. It is a key output of lumped RR and distributed hydrological models, whereas in global LSMs it is taken as the residual of the water balance equation. Because errors accumulate in land surface models, they generally perform more poorly than distributed hydrological models [88–90]. Gosling et al. [91] compared the projected impacts of climate change on river runoff from two types of hydrological models: a global hydrological model (GHM) and catchment-scale hydrological models (CHMs). The results showed differences between the GHM and the CHMs in mean annual runoff, due to different potential evapotranspiration estimation methods, and the differences in projected changes in mean annual runoff between the two model types could be substantial for a given general circulation model (GCM). Haddeland et al. [74] compared six land surface models and five global hydrological models and found that significant differences between model simulations were caused by the snow schemes employed and that differences between models are a major source of uncertainty.
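The idea of runoff as a water-balance residual can be illustrated with a first-generation bucket scheme such as the Bucket model [87]: precipitation fills a single soil store, evaporation depletes it, and whatever exceeds field capacity is diagnosed as runoff. The 150 mm field capacity and the 0.75 evaporation threshold follow the commonly quoted bucket formulation but should be read as assumptions here; snow and drainage are ignored.

```python
from typing import List, Sequence, Tuple


def bucket_step(w: float, precip: float, pet: float,
                w_fc: float = 150.0) -> Tuple[float, float, float]:
    """One daily step of a single-layer bucket (all values in mm).

    Returns (new soil water, evaporation, runoff). Runoff is not produced by a
    runoff-generation equation: it is the water-balance residual above w_fc.
    """
    beta = min(w / (0.75 * w_fc), 1.0)  # evaporation efficiency from soil wetness
    evap = min(beta * pet, w + precip)   # cannot evaporate more water than exists
    w_new = w + precip - evap
    runoff = max(w_new - w_fc, 0.0)      # overflow of the bucket
    return w_new - runoff, evap, runoff


def run_bucket(precip: Sequence[float], pet: Sequence[float],
               w0: float = 100.0) -> Tuple[List[float], List[float], float]:
    """Run the bucket over a series; returns (runoff, evaporation, final storage)."""
    w, runoff, evap = w0, [], []
    for p, e in zip(precip, pet):
        w, ev, r = bucket_step(w, p, e)
        evap.append(ev)
        runoff.append(r)
    return runoff, evap, w


if __name__ == "__main__":
    print(bucket_step(140.0, 30.0, 5.0))  # (150.0, 5.0, 15.0)
```

Over any period, precipitation minus evaporation minus runoff equals the change in storage, so any error in the evaporation term passes directly into the diagnosed runoff.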
The main strength of global hydrological and land surface models is that they can be used for answering the regional and global questions for water availability and changes in global hydrological cycles [74].

Regionalization.
Ungauged catchments have no or few observations, so predicting surface runoff in them depends on alternative methods [17]. Regionalization is a commonly used method for runoff prediction [15,92], in which model parameters calibrated in gauged catchments are transferred to ungauged catchments using various approaches. Obtaining satisfactory regionalization results is challenging [15,17,93,94] because of limited datasets, the wide range of catchment attributes, poor-quality model inputs, unsatisfactory model calibrations, and so on [15,17].
All these regionalization methods have been applied in many catchments, and many attempts have been made to determine the most appropriate regionalization approach (Table 2). Merz and Blöschl [19] regionalized an 11-parameter semidistributed conceptual RR model using more than 300 Austrian catchments. They found that spatial proximity performed best and that using nested catchments as donors could significantly improve the performance of spatial proximity. Young [14] regionalized a six-parameter version of the PDM model on 260 UK catchments and found that the regression approach yielded the best results. Oudin et al. [15] compared three regionalization schemes: spatial proximity (SP), physical similarity (PS), and regression (Reg). Using 913 French catchments and two lumped models, they found that spatial proximity provided the best regionalization solution. Li et al. [101] proposed a new regionalization method, the index model, which establishes a nonparametric relationship between each parameter of a predictive tool and a linear combination of predictors. Predictions for 227 catchments in southeast Australia showed that the index model was more accurate than regional models based on linear regression, nearest neighbor, and hydrological similarity. Shu and Ouarda [102] introduced a regression-based logarithmic interpolation method to estimate regional flow duration curves (FDCs) at ungauged sites and combined the estimated FDCs with a spatial interpolation algorithm to obtain daily streamflow estimates. McIntyre et al. [103] and Oudin et al. [15] showed that output averaging, in which the target catchment is modeled using parameter values from many donor catchments and the resulting simulations are averaged, can reduce uncertainty in runoff predictions in ungauged catchments. Similarly, Reichl et al. [98] showed that flow prediction using an optimized model averaging method based on physical similarity is superior to the regression and spatial proximity approaches.
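Two of the approaches above, spatial proximity donor selection and output averaging over several donors, can be sketched as follows. The donor records, the Euclidean distance on projected coordinates, and the one-parameter runoff-coefficient model are hypothetical placeholders; in practice the model would be a calibrated RR model and each donor's parameters would come from local calibration. Replacing the coordinate distance with a distance in normalized catchment-attribute space would turn this into a physical similarity approach.

```python
import math
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple

Params = Dict[str, float]


class Donor(NamedTuple):
    name: str
    xy: Tuple[float, float]  # catchment centroid, projected coordinates (e.g. km)
    params: Params           # parameters calibrated at this gauged catchment


def nearest_donors(target_xy: Tuple[float, float], donors: Sequence[Donor], k: int) -> List[Donor]:
    """Spatial proximity: the k gauged catchments closest to the ungauged target."""
    return sorted(donors, key=lambda d: math.dist(target_xy, d.xy))[:k]


def output_average(model: Callable[[Sequence[float], Params], List[float]],
                   forcing: Sequence[float], donors: Sequence[Donor]) -> List[float]:
    """Run the model once per donor parameter set and average the simulated series."""
    sims = [model(forcing, d.params) for d in donors]
    return [sum(step) / len(sims) for step in zip(*sims)]


def runoff_coefficient_model(rain: Sequence[float], params: Params) -> List[float]:
    """Hypothetical toy model: runoff = c * rain."""
    return [params["c"] * r for r in rain]


if __name__ == "__main__":
    donors = [Donor("A", (0.0, 0.0), {"c": 0.2}),
              Donor("B", (1.0, 0.0), {"c": 0.4}),
              Donor("C", (10.0, 10.0), {"c": 0.9})]
    chosen = nearest_donors((0.4, 0.0), donors, k=2)
    print([d.name for d in chosen])  # ['A', 'B']
    print(output_average(runoff_coefficient_model, [10.0, 0.0, 5.0], chosen))  # [3.0, 0.0, 1.5]
```

Averaging outputs rather than parameters avoids feeding the model a parameter combination that no donor actually calibrated to, which matters when parameters interact nonlinearly.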
In summary, studies carried out in most countries, such as Austria, France, and Australia, found that SP performs better than PS and that Reg is the least satisfactory. This has also been confirmed on the world's highest plateau, the Tibetan Plateau [105]. Only in the UK did studies find that Reg performs better than SP or PS.
There are various reasons for the differing results of the above-mentioned studies, including the use of different catchment sets, catchment descriptors, and hydrological models [14,15]. This suggests that no single regionalization approach performs consistently best. Razavi and Coulibaly [20] found that the performance of regionalization approaches is climate related: overall, spatial proximity and physical similarity have shown satisfactory performance in arid to warm temperate climates (e.g., Australia), whereas regression-based methods have been preferred in warm temperate regions (e.g., most European countries). Global comparison studies are critical for fully understanding the performance of the various regionalization approaches; however, no such studies have been reported yet.

Multiple Objective Model Calibration.
It was recognized early [80,107] that models calibrated only against observed hydrographs can be considered overparameterized if they have more than five parameters [29], because high model complexity relative to the typically small number of constraints used in calibration limits the predictive capability of hydrological models [108]. An important strategy to overcome this problem is to incorporate more information (such as different aspects of the hydrograph, soil moisture, evapotranspiration, groundwater, and snow depth) in multiple objective model calibration.
Abbreviations used in Table 2: AM: arithmetic mean method; SP: spatial proximity method; PS: physical similarity method; Reg: regression methods; HS: hydrological similarity method.
Madsen [7] used a calibration scheme including optimization of multiple objectives that measure different aspects of the hydrograph (overall water balance, overall shape of the hydrograph, peak flows, and low flows).Seibert and McDonnell [109] reported that the inclusion of groundwater dynamics results in significantly improved and more consistent overall model performances.Nester et al. [110] demonstrated the value of remotely sensed snow cover patterns to constrain parameter uncertainty of catchment models.
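Multiple objective evaluation in the spirit of the hydrograph aspects listed above can be sketched as follows: each candidate simulation is scored on overall water balance, overall RMSE, RMSE over high flows, and RMSE over low flows, and only the non-dominated (Pareto-optimal) candidates are kept. The flow thresholds that select peak and low-flow time steps are assumptions, and these definitions are simplified stand-ins rather than Madsen's exact objective functions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    if not pairs:
        raise ValueError("no time steps selected for this objective")
    return math.sqrt(sum((o - s) ** 2 for o, s in pairs) / len(pairs))


def objectives(obs: Sequence[float], sim: Sequence[float],
               peak_threshold: float, low_threshold: float) -> Tuple[float, float, float, float]:
    """Return (relative volume error, RMSE, peak-flow RMSE, low-flow RMSE); all minimized."""
    pairs = list(zip(obs, sim))
    volume_error = abs(sum(sim) - sum(obs)) / sum(obs)
    overall = _rmse(pairs)
    peaks = _rmse([(o, s) for o, s in pairs if o >= peak_threshold])
    lows = _rmse([(o, s) for o, s in pairs if o <= low_threshold])
    return volume_error, overall, peaks, lows


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of candidate parameter sets not dominated by any other candidate."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s) for o_name, other in scores.items() if o_name != name)]


if __name__ == "__main__":
    obs = [1.0, 2.0, 10.0, 3.0, 1.0]
    candidates = {
        "set1": [1.0, 2.0, 8.0, 3.0, 1.0],   # misses the peak, exact low flows
        "set2": [1.5, 2.5, 10.0, 3.5, 1.5],  # exact peak, biased low flows
        "set3": [2.0, 3.0, 8.0, 4.0, 2.0],   # dominated by set1
    }
    scores = {k: objectives(obs, v, peak_threshold=5.0, low_threshold=1.0)
              for k, v in candidates.items()}
    print(pareto_front(scores))  # ['set1', 'set2']
```

The front makes the trade-off explicit: no candidate fits both peaks and low flows best, so the modeler must choose a compromise rather than rely on a single aggregate score.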
Others used remotely sensed soil moisture and evaporation, respectively, to improve model parameterizations [111–113]. Zhang et al. [114] showed that incorporating remotely sensed leaf area index and surface soil moisture into the calibration objective function marginally improves daily runoff estimates but noticeably improves leaf area index and soil moisture estimates in the validation catchments. Zhang et al. [4] used remotely sensed evapotranspiration estimates together with recorded streamflow to constrain rainfall-runoff model calibration and then used the optimized parameter sets for runoff prediction. They found that using remotely sensed evapotranspiration data in calibration improves daily or monthly runoff predictions in ungauged catchments. However, Willem Vervoort et al. [9] showed that satellite evapotranspiration did not improve the calibration results of a lumped conceptual model, and they confirmed that calibrating models against multiple environmental time series (such as MODIS evapotranspiration and streamflow) can help identify structural model issues.
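Calibrating against streamflow and remotely sensed evapotranspiration together requires an objective that rewards both fits. A simple, hypothetical choice is a weighted sum of the two Nash-Sutcliffe efficiencies, with daily simulated ET first aggregated to the compositing period of the satellite product (MODIS ET, for example, is distributed as 8-day composites). The equal weighting and the block-aggregation rule are assumptions, not the formulations used in the studies cited above.

```python
from typing import List, Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency; 1.0 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def block_sum(daily: Sequence[float], period: int = 8) -> List[float]:
    """Sum daily values into consecutive blocks; an incomplete final block is dropped."""
    n_blocks = len(daily) // period
    return [sum(daily[i * period:(i + 1) * period]) for i in range(n_blocks)]


def joint_objective(q_obs: Sequence[float], q_sim: Sequence[float],
                    et_obs_8day: Sequence[float], et_sim_daily: Sequence[float],
                    weight_q: float = 0.5) -> float:
    """Weighted NSE of daily streamflow and 8-day ET (higher is better)."""
    et_sim = block_sum(et_sim_daily, 8)
    return weight_q * nse(q_obs, q_sim) + (1.0 - weight_q) * nse(et_obs_8day, et_sim)
```

Setting `weight_q = 1.0` recovers streamflow-only calibration, so the weight can be varied to see how much the ET constraint changes the optimized parameters.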

Regional Calibration against Observations from Multiple Catchments.
Regional model calibration is defined here as model calibration conducted simultaneously against observations from multiple catchments (from dozens to hundreds) across a wide region to obtain a single parameter set for all catchments. In contrast, local model calibration refers to calibration against observations in a single catchment.
The major advantage of local model calibration is that an optimum parameter set is obtained for each individual catchment and matches the local data most accurately. However, locally optimized parameter values are not always suitable for runoff prediction: gauging stations can be few and far apart, so the underlying assumption that nearby catchments respond similarly can be problematic. Furthermore, observational errors (e.g., in streamflow gauging and rainfall inputs) can bias local calibration, and the biased model parameters are then regionalized.
The main benefits of regional model calibration are that (1) one set of optimized parameter values (or perhaps several sets, if different objective functions are considered or the study region is divided into subregions) can improve hydrological and vegetation estimates at the regional scale, and (2) there is no noticeable degradation in performance from calibration to validation. The disadvantage of regional calibration is that it requires substantial computational resources and is normally conducted on supercomputer clusters.
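The contrast between local and regional calibration can be shown with a toy example: local calibration picks the best parameter for each catchment separately, while regional calibration picks one parameter that maximizes the mean NSE across all catchments. The one-parameter runoff-coefficient model, the grid search (a stand-in for global optimizers such as SCE-UA), and the equal weighting of catchments are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Catchment = Tuple[List[float], List[float]]  # (rain, observed runoff)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def model(rain: Sequence[float], c: float) -> List[float]:
    """Hypothetical toy model: runoff = c * rain."""
    return [c * r for r in rain]


def calibrate_local(catchments: Dict[str, Catchment], grid: Sequence[float]) -> Dict[str, float]:
    """One best parameter per catchment."""
    return {name: max(grid, key=lambda c: nse(q, model(rain, c)))
            for name, (rain, q) in catchments.items()}


def regional_objective(catchments: Dict[str, Catchment], c: float) -> float:
    """Mean NSE over all catchments for a single shared parameter."""
    scores = [nse(q, model(rain, c)) for rain, q in catchments.values()]
    return sum(scores) / len(scores)


def calibrate_regional(catchments: Dict[str, Catchment], grid: Sequence[float]) -> float:
    """One parameter shared by every catchment."""
    return max(grid, key=lambda c: regional_objective(catchments, c))


if __name__ == "__main__":
    rain = [0.0, 10.0, 25.0, 0.0, 5.0, 40.0]
    catchments = {"X": (rain, model(rain, 0.2)), "Y": (rain, model(rain, 0.4))}
    grid = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(calibrate_local(catchments, grid))     # {'X': 0.2, 'Y': 0.4}
    print(calibrate_regional(catchments, grid))  # 0.2
```

The regional value is a compromise that fits neither catchment perfectly, but because it is constrained by many catchments at once it is less sensitive to gauging errors in any single one, which is the property that makes it attractive for ungauged prediction.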
Previous studies showed that regional calibration can improve the accuracy of simulated runoff in ungauged regions, and it has been used for runoff simulation and prediction [21,115]. Regional calibration is likely to become an important research field in large-scale hydrological simulation and prediction and will benefit from advances in computing.

Improving Hydrological Model Structure
The model structure represents a formalized perception of how the catchment system is organized and how its various parts are interconnected [138]. Selecting a suitable model structure ideally depends on a number of factors, as one strives to represent runoff processes realistically so that the model can be safely used in predictive mode. However, there is still room to further improve model structures.

Modifying RR Model Structure.
RR models usually use simple conceptual equations to simulate evapotranspiration from soil wetness and potential evapotranspiration (calculated from basic climate data) and seldom consider vegetation dynamics, which can play an important role in midlatitude catchments [11,139,140]. Because RR model inputs lack surface vegetation information, calibrated RR models may not accurately estimate water balance components such as evapotranspiration and water storage change, which may limit their ability to estimate runoff.
Remotely sensed data can provide temporally dynamic and spatially explicit information on land surface characteristics such as vegetation cover types and leaf area index. Vegetation processes play an important role in evapotranspiration and runoff in midlatitude catchments [140,141]. Yildiz and Barros [140] showed that vegetation properties such as fractional vegetation coverage and leaf area index (LAI) significantly affected hydrological model results by controlling evapotranspiration rates, and that this control was especially critical during the spring-summer transition, which coincides with the greening season in midlatitudes.
A suitable way to integrate vegetation processes into hydrological models is to use remote sensing vegetation data, such as LAI and fractional vegetation cover [142–144]. Recent studies have included remote sensing vegetation information as inputs to RR models. Reference [97] used MODIS LAI data with the Penman-Monteith equation in the lumped Xinanjiang model and showed that this improves runoff prediction in ungauged basins. Oudin et al. [145] modified water balance models to introduce the fractional coverage of land cover types and showed that land cover information improves overall model efficiency.
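One simple way to let remotely sensed LAI influence evapotranspiration in an RR model is to convert LAI to fractional vegetation cover with a Beer-Lambert-type relation, f_v = 1 − exp(−k·LAI), and to partition potential evapotranspiration between the vegetated and bare fractions. This is a hypothetical illustration, not the Penman-Monteith scheme of Reference [97] or the formulation of Oudin et al. [145]; the extinction coefficient k = 0.5 and the bare-soil evaporation factor of 0.3 are assumed values.

```python
import math


def vegetation_fraction(lai: float, k: float = 0.5) -> float:
    """Fractional vegetation cover from LAI via a Beer-Lambert-type relation."""
    return 1.0 - math.exp(-k * lai)


def actual_et(pet: float, lai: float, wetness: float,
              soil_evap_factor: float = 0.3, k: float = 0.5) -> float:
    """Hypothetical ET partition (mm/day).

    wetness: relative soil moisture in [0, 1]. Transpiration from the vegetated
    fraction and reduced bare-soil evaporation from the remainder both scale with it.
    """
    fv = vegetation_fraction(lai, k)
    transpiration = fv * pet * wetness
    soil_evaporation = (1.0 - fv) * soil_evap_factor * pet * wetness
    return transpiration + soil_evaporation
```

Feeding a satellite LAI time series into a function like `actual_et` makes simulated ET, and hence the soil water left for runoff, respond to the greening season rather than to climate forcing alone.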

Improving Distributed Hydrological and Land Surface Model Structure.
Appropriate land surface parameterization is based on a comprehensive understanding of land surface processes and can thus improve the performance of physically based models. For instance, Liang and Xie [146] replaced the old surface runoff parameterization in the VIC model with a new one that accounts for the effects of soil heterogeneity on Horton and Dunne runoff. The new parameterization played a very important role in partitioning the water budget between surface runoff and soil moisture. Pitman et al. [147] compared runoff estimated by the BASE model with and without a frozen soil parameterization over the region from 30°N to 90°N. The frozen soil parameterization greatly influenced runoff generation, resulting in less runoff variability. Haverd and Cuntz [148] found that soil litter is important for simulating soil moisture and evapotranspiration in forest regions: when coupled with a soil litter model, CABLE estimated soil moisture and evapotranspiration at a forest flux site in Australia much more accurately. Choi and Liang [149] identified several deficiencies in the existing formulations of terrestrial hydrologic processes in CLM and improved its runoff predictions through five modifications to its parameterization. In summary, there is plenty of room

Improving Precipitation Inputs
High-quality daily precipitation estimates are required for accurate hydrological modeling. There are two major sources for estimating precipitation fields: rain gauge stations and remote sensing devices (such as satellites and radar). Rain gauge observations are considered more accurate and reliable, but their spatial coverage is unsatisfactory. Hence, areal precipitation estimates constructed solely from rain gauges exhibit a great deal of uncertainty, especially in areas of low rain gauge density. Remote sensing gridded precipitation estimates offer good spatial and temporal coverage with less uncertainty [150,151]. However, weather radar networks currently cover only some areas of the world. Therefore, with the advent of meteorological satellites in the 1970s, great efforts have been directed to estimating precipitation from satellite observations (e.g., TRMM, TMPA, CMORPH, and GSMaP), which cover most of the globe (Table 3). However, the accuracy of satellite precipitation may not be satisfactory, and precipitation estimates can be improved by blending rain gauge and satellite data [151]. Several statistical merging schemes have been developed for experimental and/or operational use, such as conditional merging [152], Bayesian merging [153], statistical objective analysis [154], data assimilation [137,155], and double/single optimal estimation [156].
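Conditional merging [152] can be summarized in three steps: interpolate the gauge observations to the grid, interpolate the satellite (or radar) values sampled at the gauge locations with the same method, and add the difference between the raw satellite field and its interpolated version to the gauge-based field. The sketch below follows these steps but uses inverse distance weighting in place of kriging and clips negative results to zero; both choices are assumptions, and the coordinates and rainfall values are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw(points: Sequence[Point], values: Sequence[float], target: Point,
        power: float = 2.0) -> float:
    """Inverse distance weighting; returns the observed value at a coincident point."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def conditional_merge(grid: Sequence[Point], sat_grid: Sequence[float],
                      gauges: Sequence[Point], gauge_vals: Sequence[float],
                      sat_at_gauges: Sequence[float]) -> List[float]:
    """Merge gauge and satellite rainfall (mm) following the three steps above."""
    merged = []
    for xy, sat in zip(grid, sat_grid):
        gauge_field = idw(gauges, gauge_vals, xy)    # step 1: gauges only
        sat_smooth = idw(gauges, sat_at_gauges, xy)  # step 2: satellite sampled at gauges
        # step 3: add the satellite's spatial detail (deviation from its own smooth field)
        merged.append(max(gauge_field + (sat - sat_smooth), 0.0))
    return merged


if __name__ == "__main__":
    grid = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(conditional_merge(grid, [8.0, 7.0, 5.0],
                            gauges=[(0.0, 0.0), (2.0, 0.0)],
                            gauge_vals=[10.0, 4.0], sat_at_gauges=[8.0, 5.0]))
    # [10.0, 7.5, 4.0]
```

The merged field reproduces the gauge values exactly at gauge locations, while between gauges it keeps the spatial pattern seen by the satellite.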
Gottschalck et al. [157] showed that the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP) agrees most closely with a CPC rain gauge dataset in all seasons except winter, while TRMM overestimated summertime precipitation in the central United States (200–400 mm). Chappell et al. [158] evaluated geostatistical methods of blending satellite and gauge data to estimate near real-time daily rainfall for Australia and found that blending considerably reduced the estimation variance. Mitra et al. [159] showed that merging TRMM with gauge station data can significantly improve the estimated spatial distribution of precipitation in the Indian monsoon region. Ryo et al. [160] showed that blended precipitation data can improve hydrological modeling, especially flood modeling, in Vietnam.
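Why blending reduces estimation variance can be seen in the simplest case: two independent, unbiased estimates of the same rainfall, x₁ and x₂, with error variances σ₁² and σ₂². Weighting each by its inverse variance gives a combined estimate whose variance, 1/(1/σ₁² + 1/σ₂²), is never larger than the smaller of the two. The sketch assumes independent, unbiased errors; the geostatistical methods cited above are considerably more elaborate.

```python
from typing import Tuple


def inverse_variance_blend(x_sat: float, var_sat: float,
                           x_gauge: float, var_gauge: float) -> Tuple[float, float]:
    """Combine two independent unbiased estimates; returns (estimate, error variance)."""
    w_sat, w_gauge = 1.0 / var_sat, 1.0 / var_gauge
    estimate = (w_sat * x_sat + w_gauge * x_gauge) / (w_sat + w_gauge)
    return estimate, 1.0 / (w_sat + w_gauge)


if __name__ == "__main__":
    # Satellite says 10 mm, gauge-based estimate says 20 mm, equal error variances.
    print(inverse_variance_blend(10.0, 4.0, 20.0, 4.0))  # (15.0, 2.0)
```

With unequal variances the blend leans toward the more reliable source, which is why gauge-dense areas end up dominated by gauges and gauge-sparse areas by the satellite field.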

Summary
This paper provides a comprehensive review of catchment, regional, and global runoff modeling. Continuous surface runoff modeling can be carried out with conceptual rainfall-runoff models, distributed models, and land surface models. Hydrological models can be parameterized in ungauged catchments through regionalization, multiple objective model calibration, and regional calibration against observations from multiple catchments. The models can be further improved by incorporating remote sensing vegetation and precipitation data.
There is still considerable room to improve surface runoff prediction from catchments to large regions. In a large region, regionalization performance can be improved through better catchment characteristics, ensembles of different regionalization approaches, multiple-donor output averaging, model ensembles, and so forth. Special attention should be paid to using remote sensing data for multiple objective model calibration and for improving hydrological model structure, since such data offer great advantages in ungauged catchments and data-sparse regions. How to smartly parameterize global land surface models, or smartly modify their structure, to improve runoff predictions from large regions to the globe will be a great challenge for hydrologists and meteorologists in the next couple of decades.

Table 1:
Major catchment, regional and global runoff models.

Table 2:
Summary of regionalization approaches conducted using large datasets.

Table 3:
Summary of precipitation datasets.
Ocean Reference Experiment Version 2: Large-Yeager Air-Sea Surface Flux; NCAR: National Center for Atmospheric Research; NLDAS: North American Land Data Assimilation System; GLDAS: Global Land Data Assimilation System; PERSIANN: Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks; CHRS: Center for Hydrometeorology and Remote Sensing; PRISM: Parameter-Elevation Relationships on Independent Slopes Model; PREC/L: NOAA's Precipitation Reconstruction Land; SSM/I, SSMIS: Special Sensor Microwave/Imager and Sounder; TRMM: Tropical Rainfall Measuring Mission; JAXA: Japan Aerospace Exploration Agency.