This study investigates whether assimilating streamflow observations can improve the performance of a distributed hydrological model. Incorrectly estimated model states lead to discrepancies between the observed and estimated streamflow. Consequently, streamflow observations can be used to update the model states, and the improved model states in turn benefit the streamflow predictions. This study tests this concept in the upper Huai River basin. We sequentially assimilate streamflow observations into the Soil and Water Assessment Tool (SWAT) using the ensemble Kalman filter (EnKF) to update the model states. Both synthetic experiments and a real data application are used to demonstrate the benefit of this data assimilation scheme. The synthetic experiments show that assimilating the streamflow observations at interior sites significantly improves the streamflow predictions for the whole basin, whereas assimilating the catchment outlet streamflow improves the streamflow predictions near the catchment outlet. In the real data case, the estimated streamflow at the catchment outlet is significantly improved by assimilating the in situ streamflow measurements at interior gauges. Assimilating the in situ catchment outlet streamflow also improves the streamflow prediction at one interior location on the main reach. These results suggest that updating model states using streamflow observations can constrain the flux estimates in distributed hydrological modeling.
Catchment hydrological modeling is critical for water resources management and planning (e.g., flood control and drought monitoring). However, the accuracy and reliability of hydrological modeling are generally subject to various uncertainties associated with model forcing inputs, model parameters, and model structures [
Assimilating discharge observations offers more advantages for improving rainfall-runoff modeling, since discharge observations are more reliable and more directly related to streamflow modeling. This concept has been intensively tested in lumped hydrological modeling [
Besides, the state variables of distributed models vary spatially, and this variability is usually controlled by the underlying land surface characteristics of the catchment (e.g., topography, land cover, and soil types). The state update is therefore likely to be influenced by the underlying land surface conditions, which in turn affects the performance of streamflow assimilation in improving rainfall-runoff modeling. A good understanding of these factors is important for improving streamflow assimilation performance. However, to our knowledge, the land surface factors that influence streamflow assimilation in distributed modeling have rarely been reported.
In this study, we investigate the capability of streamflow assimilation to improve the distributed hydrological modeling of the upper Huai River basin by updating the basin states. This investigation is based on the Soil and Water Assessment Tool (SWAT) and the ensemble Kalman filter (EnKF) method. Both a synthetic experiment and a real application are considered; the synthetic experiment assesses the upper bound of the streamflow assimilation performance under idealized conditions of unbiased model parameters and well-known modeling and observation errors, while the real application provides a more realistic measure of streamflow assimilation performance. We also analyze the influence of the underlying land surface conditions (e.g., slopes) on the performance of streamflow assimilation.
SWAT is a physically based basin scale distributed model developed by USDA (United States Department of Agriculture) Agricultural Research Service. It has been widely used in predicting the impact of land management practices on water, sediment, and agricultural chemical yields in large, complex catchments with varying soil, land use, and management conditions over long periods of time [
The surface runoff on a given day (
The rainfall that remains after surface runoff recharges the soil profile and is redistributed among the soil layers in the root zone using the storage routing technique as
For groundwater modeling, SWAT incorporates two aquifers in each subbasin: the shallow (unconfined) and deep (confined) aquifer. Only the water in the shallow aquifer contributes to flow in the main channel of the subbasin, where the water balance is described by
The surface runoff (
The channel water balance in the reach can be generally expressed as
Based on the theory of the linear Kalman filter [
When the observations are available, the predicted model state ensemble
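As a minimal sketch of the EnKF analysis step outlined in this section, the following Python/NumPy code implements a standard stochastic (perturbed-observation) ensemble update. It is an illustration under stated assumptions rather than the implementation used in this study: the function name `enkf_update`, the array layout, the linear observation operator `H`, and the additive Gaussian observation error with covariance `R` are all hypothetical choices made for the example.

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step (illustrative sketch).

    X   : (n_state, n_ens) forecast state ensemble
    y   : (n_obs,) observation vector
    H   : (n_obs, n_state) linear observation operator (assumed)
    R   : (n_obs, n_obs) observation error covariance (assumed known)
    rng : numpy.random.Generator used to perturb the observations
    """
    n_ens = X.shape[1]
    # Ensemble anomalies of the states and of the predicted observations
    A = X - X.mean(axis=1, keepdims=True)
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)
    # Sample covariances estimated from the ensemble
    P_xy = A @ HA.T / (n_ens - 1)
    P_yy = HA @ HA.T / (n_ens - 1)
    # Kalman gain K = P_xy (P_yy + R)^-1, computed with a linear solve
    S = P_yy + R
    K = np.linalg.solve(S, P_xy.T).T
    # One perturbed copy of the observations per ensemble member
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T
    Y = y[:, None] + eps
    # Update every member with its own innovation
    return X + K @ (Y - HX)
```

In this sketch an unobserved state (for example, a storage) is corrected only through its ensemble covariance with the predicted observation (for example, streamflow), which is why states that are weakly correlated with streamflow receive small updates.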
To assess the performance of streamflow assimilation, the root mean square error, the Nash-Sutcliffe coefficient of efficiency, Pearson’s correlation coefficient, and the normalized error reduction index are used.
The root mean square error (RMSE) can be expressed by [
The Nash-Sutcliffe coefficient (NSE) is expressed by [
Pearson’s correlation coefficient is obtained by [
The normalized error reduction index (NER) is expressed by [
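The four evaluation metrics can be sketched in Python as follows. RMSE, NSE, and Pearson's correlation coefficient follow their standard definitions. The NER is written here as the percentage reduction of the RMSE with assimilation relative to the RMSE of a reference run without assimilation; this form is an assumption made for illustration. All function names are hypothetical.

```python
import numpy as np

def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pearson_cc(sim, obs):
    """Pearson's linear correlation coefficient."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.corrcoef(sim, obs)[0, 1])

def ner(rmse_assim, rmse_reference):
    """Normalized error reduction in percent (assumed form):
    positive when assimilation lowers the RMSE, negative when it raises it."""
    return 100.0 * (1.0 - rmse_assim / rmse_reference)
```

Under this assumed form, an NER of 100% would correspond to a perfect estimate after assimilation, and an NER of 0% to no change relative to the reference run.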
The study catchment is located between 113°15′E~116°00′E and 31°30′N~33°00′N in the upper Huai River basin in China, with an area of 16005 km^{2} and an elevation from 25 m to 1117 m (Figure
Upper Huai River basin (note: ST represents station; DPL, CTG, ZGF, XX, HC, and HB represent Dapoling, Changtaiguan, Zhuganfu, Xixian, Huangchuan, and Huaibin station, resp.; 1 to 28 represent the subbasin number).
The input data required by the SWAT model mainly include the meteorological forcing data and the underlying land surface data. The meteorological input data include precipitation, maximum/minimum temperature, solar radiation, wind speed, and relative humidity. Precipitation is provided by 106 local rainfall gauges in the basin; for each subbasin, the precipitation input is interpolated from the 106 gauges using the Thiessen polygon method. The other five types of meteorological data are collected from 2 meteorological stations (Xinyang and Guangshui station) in or near the catchment (Figure
Soil classification and its area proportions in the upper Huai River basin.
Soil code  Source soil  Clay (%)  Silt (%)  Sand (%)  Rock (%)  USDA soil texture  Area proportion (%) 

S1  Huanghetu  23.43  65.1  11.5  0  Silt loam  21.52 
S2  Shuidaotu  16.46  71  12.5  0  Silt loam  34.62 
S3  Huangzongrang  17.03  39.4  43.6  0  Loam  13.7 
S4  Cugutu  7.05  34  35.9  23.1  Sandy loam  7.35 
S5  Shizhitu  9.16  44.1  46.7  0  Loam  7.8 
S6  Huichaotu  12.86  51.8  35.3  0  Silt loam  4.72 
S7  Shajiangheitu  20.32  65.5  14.2  0  Silt loam  9.81 
There are 6 streamflow gauges in the catchment (Figure
Given the complicated model physics and structures and the large number of state variables in SWAT, spurious correlations may exist in a high-dimensional state vector and thus introduce many degrees of freedom into the state update in streamflow assimilation. To reduce these degrees of freedom, only the state variables on which the streamflow predictions strongly depend (i.e., the state variables to which the streamflow modeling is sensitive) are updated. Based on the physical significance of the state variables in SWAT, five variables are preliminarily chosen (Table
State variables in SWAT model chosen for correlation analysis.
Order  State variable  Description  Units  Level 

1  
Amount of surface runoff lagged or stored on a given day  mm  HRU 
2  
Amount of lateral flow lagged or stored on a given day  mm  HRU 
3  
Amount of water stored in shallow aquifer on a given day  mm  HRU 
4  
Amount of water stored in the soil layer (ly) at the end of day  mm  HRU 
5  
Water stored in reach at the end of day  m^{3}  Subbasin 
Note:
The analysis results are presented in Figure
Correlation between the state variables of the fifteenth calculation units and the sixth subbasin on the horizontal axis and the simulated streamflow at six streamflow gauges (DPL, HB, XX, CTG, HC, and ZGF in Figure
In this study, the uncertainty of hydrological modeling comes from the errors of meteorological forcing inputs (e.g., precipitation), model parameters, and model structures. Each ensemble member in EnKF is perturbed using the assumptions summarized in Table
Error parameters in the synthetic experiment.
Variables  Error distribution  Mean  Standard deviation  Bound 

Precipitation  Gaussian, multiplicative  1  0.2  (0, —) 
Predicted states  Gaussian, multiplicative  1  0.01  (0, —) 
Predicted streamflow  Gaussian, multiplicative  1  0.2  (0, —) 
Parameters  Gaussian distribution  1  Given in Table 

Observed streamflow  Gaussian, multiplicative  1  0.1  (0, —) 
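The Gaussian multiplicative errors listed in the table above (mean 1, a given standard deviation, and a lower bound of 0) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `perturb_multiplicative` is hypothetical, and the lower bound is enforced here by redrawing non-positive factors, which is one possible reading of the bound (0, —).

```python
import numpy as np

def perturb_multiplicative(values, sd, n_ens, rng):
    """Return an ensemble of `values` scaled by Gaussian factors N(1, sd^2).

    values : array-like, e.g. a precipitation or streamflow series
    sd     : standard deviation of the multiplicative error
    n_ens  : number of ensemble members
    rng    : numpy.random.Generator
    Non-positive factors are redrawn so every factor lies in (0, inf);
    this truncation rule is an assumption of the sketch.
    """
    values = np.asarray(values, dtype=float)
    factors = rng.normal(1.0, sd, size=(n_ens,) + values.shape)
    bad = factors <= 0.0
    while bad.any():
        factors[bad] = rng.normal(1.0, sd, size=int(bad.sum()))
        bad = factors <= 0.0
    # Broadcasting multiplies each member's factors by the same input series
    return factors * values
```

For example, `perturb_multiplicative(precip, 0.2, 200, rng)` would generate a 200-member precipitation ensemble with the standard deviation given for precipitation, and a standard deviation of 0.1 would correspond to the observed streamflow row. Because the error is multiplicative, zero inputs (dry days) stay zero in every member.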
The sensitive parameters in SWAT for rainfallrunoff modeling and their standard deviation (SD) of Gaussian multiplicative error in streamflow assimilation.
Parameters  Units  Description  Level  SD of GME 

CN_{2}  —  Initial SCS runoff curve number for moisture condition II  HRU  0.05 
ESCO  —  Soil evaporation compensation factor  HRU  0.2 
ALPHA_BF  Days  Baseflow recession constant  HRU  0.2 
GW_REVAP  —  Ground “revap” coefficient  HRU  0.2 
GWQMN  mm  Threshold depth of water in the shallow aquifer required for return flow to occur  HRU  0.2 
REVAPMN  mm  Threshold depth of water in the shallow aquifer for “revap” or percolation to the deep aquifer to occur  HRU  0.2 
OV_N  —  Manning’s “ 
HRU  0.1 
SOL_AWC  mm/mm  Available water capacity of the soil layer  HRU  0.2 
CH_N( 
—  Manning’s “ 
HRU  0.2 
CH_N( 
—  Manning’s “ 
Subbasin  0.2 
SURLAG  —  Surface runoff lag coefficient  Basin  0.2 
Note:
In the synthetic experiment, one reference field is randomly picked out from a Gaussian distribution with given model inputs. Then, the model is propagated forward to obtain a reference modeling process, which is regarded as the synthetic truth. The synthetic streamflow observations are generated by randomly perturbing the synthetic true streamflow drawn from the reference simulation using a Gaussian multiplicative error with the same standard deviation as that of the streamflow observation error. An ensemble integration with 200 members of SWAT model is performed with known errors (Table
In the real data assimilation, the model and observation error settings for EnOL and EnKF are identical to those of the synthetic experiment. As in the synthetic experiment, we consider the following two cases: (1) the streamflow measurements at the five interior discharge gauges are assimilated and the in situ streamflow measurements at the catchment outlet are used for validation; (2) the streamflow measurements at the catchment outlet are assimilated and the measured streamflow at the five interior sites is used for validation.
Figure
The RMSE (
The NER (
Figure
Comparison of the estimated streamflow by EnKF and EnOL at the basin outlet (Huaibin station) in synthetic experiment: (a) the whole period from Jan. 1, 1996 to Dec. 31, 1997; (b) and (c) details of the period with high flows. The light blue and red lines indicate the ensemble members of the EnOL and EnKF, respectively.
The improvement of the streamflow prediction by EnKF is expected to result from the real-time updating of the surface runoff storage (
The RMSE (
RMSE of the estimated river flow storage (
For different subbasins and HRUs, the impact of streamflow assimilation on the update of
Figure
Comparison of the RMSE (
Figure
The NER (
Figure
The RMSE (
The NER (
The correlation coefficients (CC) between the simulated streamflow at all subbasin outlets and that at the catchment outlet (HB station). The value of CC for one subbasin outlet is displayed on the subbasin where it is located. “Corr. Coef. versus HB” indicates the correlation coefficients between the streamflow at the subbasin outlet and that at HB station.
Figure
Comparison of the simulated streamflow by EnKF and EnOL at the basin outlet (HB station) in real data application: (a) the whole period from Jan. 1, 1996 to Dec. 31, 1997; (b) and (c) details of the period with high flow. The light blue and red lines represent the ensemble members of EnOL and EnKF, respectively.
Table
The statistics of the estimated streamflow obtained by EnOL and EnKF at the five interior discharge gauges.
Discharge gauges  EnOL RMSE (m^{3}/s)  EnOL NSE  EnOL CC  EnKF RMSE (m^{3}/s)  EnKF NSE  EnKF CC  NER (%) 

Dapoling  39.82  0.46  0.71  39.01  0.49  0.71  2.03 
Changtaiguan  72.99  0.41  0.66  72.96  0.41  0.66  0.04 
Zhuganfu  56.51  0.64  0.81  60.12  0.6  0.78  
Huangchuan  67.36  0.59  0.77  68.07  0.58  0.76  
Xixian  199.98  0.55  0.74  187.38  0.61  0.78  6.3 
Note: CC represents the correlation coefficient.
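As a consistency check on the table above, the NER column can be recomputed from the two RMSE columns. The sketch below assumes NER = (1 − RMSE_EnKF / RMSE_EnOL) × 100%, a form inferred here because it reproduces the three NER values that are listed; it is not quoted from the paper's definition. Only the gauges with a reported NER are included, and the names `rows` and `ner_percent` are hypothetical.

```python
# (EnOL RMSE, EnKF RMSE, reported NER in %) taken from the table above
rows = {
    "Dapoling": (39.82, 39.01, 2.03),
    "Changtaiguan": (72.99, 72.96, 0.04),
    "Xixian": (199.98, 187.38, 6.30),
}

def ner_percent(rmse_enol, rmse_enkf):
    """Assumed NER form: percentage RMSE reduction of EnKF relative to EnOL."""
    return 100.0 * (1.0 - rmse_enkf / rmse_enol)

# Recompute NER for each gauge, rounded to the table's two decimals
recomputed = {
    gauge: round(ner_percent(enol, enkf), 2)
    for gauge, (enol, enkf, _) in rows.items()
}

for gauge, (_, _, reported) in rows.items():
    print(f"{gauge}: recomputed {recomputed[gauge]:.2f}%, reported {reported:.2f}%")
```

Under the same assumption, a gauge whose EnKF RMSE exceeds its EnOL RMSE would yield a negative NER.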
The correlation coefficient between the streamflow at the catchment outlet (HB station) and that at the five interior gauges (DPL, CTG, ZGF, HC, and XX station) for both observation and model simulations.
This study uses both a synthetic experiment and a real data application to investigate the capability of streamflow assimilation to improve the streamflow predictions of the upper Huai River basin by updating the basin states in the distributed hydrological modeling process. The two sensitive model state variables in SWAT (i.e., the surface runoff storage and the reach flow storage) are updated using the streamflow observations (1) at the five interior sites and (2) at the catchment outlet.
The synthetic experiment shows that the predicted streamflow at all subbasin outlets is improved by assimilating the synthetic streamflow observations at the five interior sites. This improvement increases from upstream to downstream along the main reach, and the predicted streamflow at the catchment outlet shows the most significant improvement. The improvement of the catchment streamflow modeling results from the updating of the surface runoff storage and reach flow storage over the whole basin, as both state variables move significantly closer to their synthetic truth after assimilation. The significant improvement of the surface runoff storage generally occurs on the calculation units with slopes less than 5°. We also find that the main channel length of the subbasins has a considerable impact on the updating of reach flow storage (
The real data application shows that the estimated streamflow at the catchment outlet is greatly improved by updating the basin states using the in situ streamflow measurements at the five interior gauges in rainfall-runoff modeling. The assimilation of the in situ catchment outlet streamflow measurements improves the streamflow modeling at Xixian, slightly degrades the streamflow estimation at Zhuganfu and Huangchuan, and has no significant influence on the streamflow modeling upstream at Dapoling and Changtaiguan. In this study, the estimation of the model and observation error parameters is subjective and empirical to a certain extent, as it generally relies on experience and on conclusions from previous studies. More rigorous and theoretically based methods for quantifying the model and observation error parameters, for example, the adaptive assimilation approach [
The authors declare that they have no competing interests.
This study was supported by the National Natural Science Fund Project of China, the Response Mechanism of the Watershed Hydrological Processes under Meteorological Drought (41371050), the Specialized Research Fund for the Doctoral Program of Higher Education of China (20130094110007), the Fundamental Research Funds for the Central Universities (2015B05514), and the Graduate Research and Innovation Program for Ordinary University of Jiangsu Province, China (CXZZ13_0248). The authors thank Susan Steele-Dunne from Delft University of Technology for her valuable suggestions.