Improving the Distributed Hydrological Model Performance in Upper Huai River Basin: Using Streamflow Observations to Update the Basin States via the Ensemble Kalman Filter

This study investigates the capability of improving distributed hydrological model performance by assimilating streamflow observations. Incorrectly estimated model states lead to discrepancies between the observed and estimated streamflow. Consequently, streamflow observations can be used to update the model states, and the improved model states will eventually benefit the streamflow predictions. This study tests this concept in the upper Huai River basin. We assimilate the streamflow observations sequentially into the Soil and Water Assessment Tool (SWAT) using the ensemble Kalman filter (EnKF) to update the model states. Both synthetic experiments and a real data application are used to demonstrate the benefit of this data assimilation scheme. The synthetic experiment shows that assimilating the streamflow observations at interior sites significantly improves the streamflow predictions for the whole basin. Assimilating the catchment outlet streamflow improves the streamflow predictions near the catchment outlet. In the real data case, the estimated streamflow at the catchment outlet is significantly improved by assimilating the in situ streamflow measurements at interior gauges. Assimilating the in situ catchment outlet streamflow also improves the streamflow prediction at one interior location on the main reach. This suggests that updating model states using streamflow observations can constrain the flux estimates in distributed hydrological modeling.


Introduction
Catchment hydrological modeling is critical for water resources management and planning (e.g., flood control and drought monitoring). However, the accuracy and reliability of hydrological modeling are generally subject to various uncertainties associated with model forcing inputs, model parameters, and model structures [1,2]. Data assimilation (DA) techniques, owing to their capability to merge modeling and observations by jointly considering the uncertainties in both, have been commonly used in hydrology to improve rainfall-runoff simulation and prediction [3][4][5][6]. Previous research has demonstrated that assimilating remote sensing data, for example, satellite soil moisture products [4,7,8], into hydrological models can improve the simulation of hydrological processes.
Assimilating discharge observations offers further advantages for improving rainfall-runoff modeling, since discharge observations are more reliable and more directly related to streamflow modeling. This concept has been intensively tested in lumped hydrological modeling [9][10][11][12]. In distributed modeling, the number of studies on streamflow assimilation has kept growing in recent years. Clark et al. [13] employed streamflow observations to update the states of a distributed model (TopNet) to improve the streamflow prediction and demonstrated that transforming streamflow into log space before computing the covariance would improve the performance of the ensemble Kalman filter (EnKF) [14]. Xie and Zhang [5] demonstrated the capability of the EnKF to improve parameter estimation, and thus hydrological modeling in SWAT, by assimilating streamflow observations in synthetic experiments. Lee et al. [15] validated the potential of updating the soil moisture states of a distributed model by assimilating streamflow and in situ soil moisture data using a variational assimilation approach for the prediction of streamflow and soil moisture. Streamflow assimilation has the potential to improve distributed hydrological simulations and predictions by updating the model states during model propagation, but to date this potential has not been fully exploited: few studies have evaluated the forecast improvements in distributed hydrological modeling that are possible from improved estimates of basin states obtained by assimilating streamflow observations. In theory, however, distributed models are expected to outperform lumped ones in rainfall-runoff modeling, as they can account for the spatial variability of meteorological inputs and of the underlying land surface conditions of a catchment [16]. Unlike lumped models, which can only produce streamflow predictions at the basin outlet, distributed models can provide streamflow predictions at interior locations. Streamflow assimilation in distributed models can therefore benefit not only the streamflow simulations and predictions upstream of the observation sites but also those of neighboring or downstream areas. This property is especially important for streamflow prediction in ungauged basins. Given the significance of streamflow assimilation for improving distributed hydrological simulations and predictions, and the small number of studies on this topic so far, streamflow assimilation in distributed models deserves further study, and its potential for improving distributed hydrological modeling needs to be explored further.
In addition, the state variables of distributed models are spatially variable, and this variability is usually controlled by the underlying land surface characteristics of the catchment (e.g., topography, land cover, and soil types). The state update is therefore likely to be influenced by the underlying land surface conditions, which in turn affects the performance of streamflow assimilation in improving rainfall-runoff modeling. A good understanding of these factors is important for improving streamflow assimilation performance. However, to our knowledge, the influence of land surface conditions on streamflow assimilation in distributed modeling has rarely been reported.
In this study, we investigate the capability of streamflow assimilation to improve the distributed hydrological modeling of the upper Huai River basin by updating the basin states. The investigation is based on the Soil and Water Assessment Tool (SWAT) and the ensemble Kalman filter (EnKF). Both a synthetic experiment and a real application are considered: the synthetic experiment assesses the upper bound of the streamflow assimilation performance under idealized conditions of unbiased model parameters and well-known modeling and observation errors, while the real application provides a more realistic picture of streamflow assimilation performance. We also analyze the influence of the underlying land surface conditions (e.g., slopes) on the performance of streamflow assimilation.

Soil and Water Assessment Tool (SWAT)
SWAT is a physically based, basin-scale distributed model developed by the USDA (United States Department of Agriculture) Agricultural Research Service. It has been widely used to predict the impact of land management practices on water, sediment, and agricultural chemical yields in large, complex catchments with varying soil, land use, and management conditions over long periods of time [17]. In SWAT, the catchment is geographically partitioned into subbasins, and each subbasin is further delineated into several Hydrological Response Units (HRUs) based on soil, land cover, and slope. HRUs are the basic calculation units of the model, on which the surface runoff, soil water, lateral flow, and groundwater are produced. The water generated in the HRUs is aggregated at the subbasin level and discharged to the outlet of the catchment through the river routing process.
The surface runoff on a given day ($Q_{surf}$) is simulated by the modified SCS curve number method:

$$Q_{surf} = \frac{(R_{day} - 0.2S)^2}{R_{day} + 0.8S}, \tag{1}$$

where $R_{day}$ is the precipitation on a given day (mm) and $S$ is the retention parameter (mm) that can be described by

$$S = 25.4\left(\frac{1000}{\mathrm{CN}} - 10\right), \tag{2}$$

where CN is the SCS curve number for a given day; it is determined by the initial curve number $\mathrm{CN}_2$ [17] and the profile soil water content on the same day.
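The curve number relations can be sketched in a few lines of Python. This is only an illustration of the standard SCS form with an initial abstraction of 0.2 S; the function names are hypothetical, and the daily adjustment of CN from soil water content that SWAT performs is not reproduced here.

```python
def retention_parameter(cn: float) -> float:
    """Retention parameter S (mm) for a given daily curve number CN."""
    if not 0.0 < cn <= 100.0:
        raise ValueError("CN must lie in (0, 100]")
    return 25.4 * (1000.0 / cn - 10.0)


def surface_runoff(r_day: float, cn: float) -> float:
    """Daily surface runoff (mm) from daily precipitation r_day (mm).

    Assumption: runoff starts only once rainfall exceeds the initial
    abstraction 0.2 * S, as in the standard SCS curve number method.
    """
    s = retention_parameter(cn)
    initial_abstraction = 0.2 * s
    if r_day <= initial_abstraction:
        return 0.0
    return (r_day - initial_abstraction) ** 2 / (r_day + 0.8 * s)


if __name__ == "__main__":
    # Illustrative values only: 50 mm of rain on a surface with CN = 80.
    print(round(surface_runoff(50.0, 80.0), 2))
```

A higher CN gives a smaller S and therefore more runoff for the same rainfall, which is how wetter soil (through the daily CN) increases the simulated surface runoff.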
The rainfall remaining after surface runoff recharges the soil profile and is redistributed using the storage routing technique at each soil layer in the root zone as

$$SW_{ly,2} = SW_{ly,1} + w_{perc,ly} - Q_{lat,ly} - E_{a,ly}, \tag{3}$$

where $SW_{ly,1}$ and $SW_{ly,2}$ are the soil water content at the beginning and end of the day (mm) in layer $ly$, respectively, $w_{perc,ly}$ is the percolation from the overlying layer (mm) to layer $ly$, $E_{a,ly}$ is the actual evapotranspiration drawn from layer $ly$ during the day, and $Q_{lat,ly}$ is the lateral flow generated in layer $ly$, calculated by

$$Q_{lat,ly} = 0.024 \cdot \frac{2\,(SW_{ly} - FC_{ly})\,K_{sat,ly}\,\mathrm{slp}}{(\phi_{soil,ly} - \phi_{FC,ly})\,L_{hill}}, \quad ly = 1, 2, \ldots, n, \tag{4}$$

where $n$ is the number of layers in the soil profile, $FC_{ly}$ is the soil water content of layer $ly$ at field capacity (mm), $K_{sat,ly}$ is the saturated hydraulic conductivity (mm/h) of layer $ly$, slp is the slope (m/m), $\phi_{soil,ly}$ is the total porosity of soil layer $ly$ (mm/mm), $\phi_{FC,ly}$ is the porosity of layer $ly$ when it is at field capacity, and $L_{hill}$ is the hillslope length (m).
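As a hedged illustration of the lateral flow term, the sketch below evaluates the kinematic storage form used in the SWAT documentation, in which only water above field capacity drains laterally. The function and argument names are hypothetical, and the numbers in the example are made up for illustration.

```python
def lateral_flow(sw: float, fc: float, k_sat: float, slp: float,
                 phi_soil: float, phi_fc: float, l_hill: float) -> float:
    """Daily lateral flow (mm) from one soil layer.

    sw, fc     : layer soil water and field capacity water content (mm)
    k_sat      : saturated hydraulic conductivity (mm/h)
    slp        : slope (m/m)
    phi_soil   : total porosity (mm/mm)
    phi_fc     : porosity filled at field capacity (mm/mm)
    l_hill     : hillslope length (m)
    Assumption: no lateral flow is generated when sw <= fc.
    """
    drainable_porosity = phi_soil - phi_fc
    if drainable_porosity <= 0.0 or l_hill <= 0.0:
        raise ValueError("drainable porosity and hillslope length must be positive")
    excess = max(0.0, sw - fc)  # drainable water above field capacity
    return 0.024 * (2.0 * excess * k_sat * slp) / (drainable_porosity * l_hill)


if __name__ == "__main__":
    print(lateral_flow(sw=120.0, fc=100.0, k_sat=10.0, slp=0.05,
                       phi_soil=0.5, phi_fc=0.3, l_hill=50.0))
```

The factor 0.024 converts mm/h over a hillslope length in metres into mm per day.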
For groundwater modeling, SWAT incorporates two aquifers in each subbasin: the shallow (unconfined) and the deep (confined) aquifer. Only the water in the shallow aquifer contributes to flow in the main channel of the subbasin, where the water balance is described by

$$aq_{sh,2} = aq_{sh,1} + w_{rchrg,sh} - Q_{gw} - w_{revap} - w_{pump,sh}, \tag{5}$$

where $aq_{sh,1}$ and $aq_{sh,2}$ are the water stored in the shallow aquifer at the beginning and end of the day (mm), $w_{rchrg,sh}$ is the recharge entering the shallow aquifer during the day (mm), $Q_{gw}$ is the groundwater flow to the main channel during the day (mm), $w_{revap}$ is the water moving into the soil zone in response to water deficiency during the day (mm), and $w_{pump,sh}$ is the water removed from the shallow aquifer by pumping during the day (mm).
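The surface runoff ($Q'_{surf}$) and lateral flow ($Q'_{lat}$) released to the main channel on a given day are calculated by (6) and (7). One has

$$Q'_{surf} = \left(Q_{surf} + Q_{stor,i-1}\right)\left[1 - \exp\left(-\frac{\mathrm{surlag}}{t_{conc}}\right)\right], \tag{6}$$

where $Q_{stor,i-1}$ is the surface runoff lagged from the previous day (mm), surlag is the surface runoff lag coefficient, and $t_{conc}$ is the time of concentration for the surface runoff. One has

$$Q'_{lat} = \left(Q_{lat} + Q_{latstor,i-1}\right)\left[1 - \exp\left(-\frac{1}{TT_{lag}}\right)\right], \tag{7}$$

where $Q_{latstor,i-1}$ is the lateral flow lagged from the previous day (mm) and $TT_{lag}$ is the lateral flow travel time (days).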
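Both lag relations have the same structure: a fraction of the water available on a day is released and the remainder is carried over as storage. A minimal sketch, assuming this exponential release form and using hypothetical names, is:

```python
import math


def lagged_release(generated: float, carried_over: float, rate: float):
    """Split the available water (mm) into released and stored parts.

    generated    : water generated on the current day (mm)
    carried_over : storage lagged from the previous day (mm)
    rate         : surlag / t_conc for surface runoff, or 1 / TT_lag for
                   lateral flow (dimensionless, assumed positive)
    Returns (released_to_channel, storage_carried_to_next_day).
    """
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    available = generated + carried_over
    released = available * (1.0 - math.exp(-rate))
    return released, available - released


if __name__ == "__main__":
    # Illustrative three-day sequence with made-up runoff values (mm).
    stored = 0.0
    for q_generated in (10.0, 0.0, 0.0):
        released, stored = lagged_release(q_generated, stored, rate=4.0 / 4.0)
        print(round(released, 3), round(stored, 3))
```

The second return value is the lagged storage that persists between days, which is why it can act as a model state in the assimilation.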
The channel water balance in the reach can be generally expressed as

$$V_{stored,2} = V_{stored,1} + V_{in} - V_{out}, \tag{8}$$

where $V_{stored,1}$ and $V_{stored,2}$ are the volumes of water in the reach at the beginning and end of the day, respectively (m³), $V_{in}$ and $V_{out}$ are the volumes of water flowing into and out of the reach during the day (m³), and $V_{out}$ is calculated by (9) in the variable storage routing method (see [17]):

$$V_{out} = \mathrm{SC}\left(V_{in} + V_{stored,1}\right), \tag{9}$$

where SC is the storage coefficient, with an upper limit of 1.0, calculated by

$$\mathrm{SC} = \frac{2\Delta t}{2\,TT + \Delta t}, \tag{10}$$

where $\Delta t$ is the length of the time step (s) and TT is the reach travel time (s), obtained by dividing the main channel length by the flow velocity.
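The routing step can be sketched as follows. This is a minimal illustration of the variable storage relations as described above, with hypothetical function names; in SWAT the travel time itself depends on the flow, which is not modelled here.

```python
def storage_coefficient(dt: float, tt: float) -> float:
    """Storage coefficient SC, capped at its upper limit of 1.0.

    dt : time step length (s); tt : reach travel time (s).
    """
    if dt <= 0.0 or tt < 0.0:
        raise ValueError("dt must be positive and tt non-negative")
    return min(1.0, 2.0 * dt / (2.0 * tt + dt))


def route_reach(v_stored_1: float, v_in: float, dt: float, tt: float):
    """One routing step; returns (v_out, v_stored_2) in the input volume units."""
    sc = storage_coefficient(dt, tt)
    v_out = sc * (v_in + v_stored_1)
    v_stored_2 = v_stored_1 + v_in - v_out
    return v_out, v_stored_2


if __name__ == "__main__":
    day = 86400.0
    print(route_reach(0.0, 300.0, dt=day, tt=day))     # long travel time: water is retained
    print(route_reach(50.0, 300.0, dt=day, tt=3600.0))  # short travel time: reach empties
```

With a daily time step, any travel time of half a day or less drives SC to 1.0, so the reach storage at the end of the day is zero.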

The Ensemble Kalman Filter (EnKF).
Based on the theory of the linear Kalman filter [18], the ensemble Kalman filter (EnKF) uses a Monte Carlo method to generate a state ensemble whose distribution represents the probability distribution of the states. The generated state ensemble is propagated forward in time using the model $M$, with forcing inputs $u(t)$, model parameters $\theta$, and system uncertainty $w(t)$ as

$$x(t) = M\left[x^{u}(t-1), u(t), \theta, w(t)\right], \tag{11}$$

where $x(t)$ is the predicted model state ensemble at time $t$; the state variables to be included are detailed in Section 4.1. $M$ represents the SWAT model in this study, and the model uncertainty $w(t)$ is assumed to follow a Gaussian distribution and to act multiplicatively on the system. When observations are available, the predicted model state ensemble $x(t)$ is related to the observation ensemble $y(t)$ using an operator $H$ as

$$y(t) = H[x(t)] + v(t), \tag{12}$$

where $v(t)$ represents the observation error, assumed to be Gaussian with zero mean and covariance $R$. The state update is obtained by

$$x^{u}(t) = x(t) + K(t)\left\{y(t) - H[x(t)]\right\}, \tag{13}$$

where $x^{u}(t)$ is the updated state ensemble and $K(t)$ is the Kalman gain, which determines the weight of modeling and observation in the state update and is calculated from the forecast error covariance and the observation error covariance as

$$K(t) = C_{xy}\left(C_{yy} + R\right)^{-1}, \tag{14}$$

where $C_{xy}$ is the cross error covariance between the predicted states $x(t)$ and the measurement predictions $H[x(t)]$ and $C_{yy}$ is the forecast error covariance of the measurement predictions [19].
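The analysis step described above can be sketched with NumPy. This is a generic perturbed-observation EnKF update written for illustration, not the code used in the study; the function name, the array layout (states in rows, ensemble members in columns), and the toy numbers are assumptions.

```python
import numpy as np


def enkf_update(X, y_obs, H, R, rng):
    """Perturbed-observation EnKF analysis step.

    X     : (n_state, n_ens) forecast state ensemble
    y_obs : (n_obs,) observation vector
    H     : (n_obs, n_state) linear observation operator
    R     : (n_obs, n_obs) observation error covariance
    rng   : numpy.random.Generator used to perturb the observations
    """
    n_ens = X.shape[1]
    HX = H @ X                                    # measurement predictions
    X_anom = X - X.mean(axis=1, keepdims=True)
    HX_anom = HX - HX.mean(axis=1, keepdims=True)
    C_xy = X_anom @ HX_anom.T / (n_ens - 1)       # state/prediction cross covariance
    C_yy = HX_anom @ HX_anom.T / (n_ens - 1)      # covariance of measurement predictions
    K = C_xy @ np.linalg.inv(C_yy + R)            # Kalman gain
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n_ens).T
    Y = y_obs[:, None] + noise                    # observation ensemble
    return X + K @ (Y - HX)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flow = rng.normal(10.0, 2.0, 500)                   # observed quantity
    storage = 0.5 * flow + rng.normal(0.0, 0.1, 500)    # unobserved, correlated state
    X = np.vstack([storage, flow])
    Xa = enkf_update(X, np.array([14.0]), np.array([[0.0, 1.0]]),
                     np.array([[0.01]]), rng)
    print(X.mean(axis=1), Xa.mean(axis=1))
```

In the toy example only the second state is observed, yet the first is corrected as well because the ensemble carries their cross covariance; this is the mechanism by which streamflow observations can update storage states.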

Evaluation Method.
In order to assess the performance of streamflow assimilation, the root mean square error, Nash-Sutcliffe coefficient of efficiency, Pearson's correlation coefficient and normalized error reduction index are used.
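The root mean square error (RMSE) can be expressed by [20]

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q^{obs}_{i} - Q^{sim}_{i}\right)^{2}}, \tag{15}$$

where $N$ is the total number of time steps and $Q^{obs}_{i}$ and $Q^{sim}_{i}$ are the measured and simulated streamflow at time step $i$.
The Nash-Sutcliffe coefficient (NSE) is expressed by [21]

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Q^{obs}_{i} - Q^{sim}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(Q^{obs}_{i} - \bar{Q}^{obs}\right)^{2}}, \tag{16}$$

where $\bar{Q}^{obs}$ indicates the mean value of the measured streamflow for the whole period. Pearson's correlation coefficient is obtained by [22]

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}\left(Q^{obs}_{i} - \bar{Q}^{obs}\right)\left(Q^{sim}_{i} - \bar{Q}^{sim}\right)}{\sqrt{\sum_{i=1}^{N}\left(Q^{obs}_{i} - \bar{Q}^{obs}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(Q^{sim}_{i} - \bar{Q}^{sim}\right)^{2}}}, \tag{17}$$

where CC represents the correlation coefficient, $Q^{sim}_{i}$ is the simulated state at time step $i$, and $\bar{Q}^{sim}$ indicates the mean value of the simulated states for the whole period. The normalized error reduction index (NER) is expressed by [23]

$$\mathrm{NER} = \left(1 - \frac{\mathrm{RMSE}_{EnKF}}{\mathrm{RMSE}_{EnOL}}\right) \times 100\%. \tag{18}$$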
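The four evaluation measures can be computed with a few lines of standard-library Python. The sketch below uses the usual textbook definitions and hypothetical function names; NER is returned in percent, which matches the way it is reported in the results.

```python
import math


def rmse(obs, sim):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_cc(obs, sim):
    """Pearson correlation coefficient between two series."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def ner(rmse_open_loop, rmse_assimilated):
    """Normalized error reduction in percent; positive means improvement."""
    return 100.0 * (1.0 - rmse_assimilated / rmse_open_loop)


if __name__ == "__main__":
    # RMSE values quoted in the text for the synthetic outlet streamflow.
    print(round(ner(62.18, 23.24)))
```

Applied to the RMSE pairs quoted in the results (62.18 to 23.24 m³/s and 271.58 to 204.7 m³/s), this reproduces the reported NER values of about 63% and 25%.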

Study Area and Data Used
The study catchment is located between 113°15′E and 116°00′E and between 31°30′N and 33°00′N in the upper Huai River basin in China, with an area of 16005 km² and an elevation from 25 m to 1117 m (Figure 1). It is dominated by flat terrain, except for the west and southwest, which are mainly covered by mountains and hills. The basin is located in the transition zone between the northern subtropical and the warm temperate zones; the annual average precipitation is about 900 mm and the annual average temperature is around 15°C. The rainfall is considerably influenced by monsoons during the flood season from June to September, so 50% to 80% of the precipitation falls in this period. The major land cover types of this catchment are agriculture, forest, and brush. Agriculture is the dominant land use type, the majority of which is rice (34.98%) and wheat (32.49%). The input data required by the SWAT model mainly include the meteorological forcing data and the underlying land surface data. The meteorological input data include precipitation, maximum/minimum temperature, solar radiation, wind speed, and relative humidity. The precipitation is provided by 106 local rainfall gauges in the basin; for each subbasin, the precipitation input is interpolated from the 106 rainfall stations using the Thiessen polygon method. The other five types of meteorological data are collected from 2 meteorological stations (Xinyang and Guangshui stations) in or near the catchment (Figure 1). The land surface data contain the topography data (digital elevation model, DEM), soil category, and land cover data. The DEM data are downloaded from the Shuttle Radar Topography Mission (SRTM) with a spatial resolution of 90 m (http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp). The soil data are resampled from a soil map at a scale of 1:100000 collected from the Soil Handbook of Henan province. According to the Soil Handbook, there are 7 types of soil in this basin; the area proportions, soil texture, and the corresponding USDA (United States Department of Agriculture) classification are presented in Table 1. The land use data are resampled from a year-1995 land use map at a scale of 1:210000 provided by the government of Xinyang city.
There are 6 streamflow gauges in the catchment (Figure 1), which provide daily streamflow discharges for model calibration, validation, and data assimilation. The locations of all 6 streamflow stations are set as subbasin outlets. The catchment is partitioned into 28 subbasins in total (Figure 1) based on the DEM data and is further delineated into 82 HRUs according to the slope, soil, and land use information. The soil profile is divided into at most 4 layers.

Implementation of Streamflow Assimilation in SWAT
4.1. Selecting State Variables to Be Updated in SWAT.
Considering the complicated model physics and structure and the large number of state variables in SWAT, spurious correlations may exist in a high dimensional state vector and thus introduce many degrees of freedom into the state update in streamflow assimilation. In order to reduce the degrees of freedom in the state update, only the state variables that are strongly dependent on the streamflow measurement predictions (i.e., the state variables to which the streamflow modeling is sensitive) are updated. Based on the physical significance of the state variables in SWAT, five variables are preliminarily chosen (Table 2). The correlations of these five variables with the simulated streamflow at the six runoff gauges (Figure 1) are then analyzed to determine the variables to be updated in the assimilation. The analysis results are presented in Figure 2, taking the fifteenth calculation unit (HRU) and the sixth subbasin as an example. Compared to the surface runoff storage ($Q_{stor}$) and the reach streamflow storage ($V_{stored}$), the correlation coefficients (CC) of the lateral flow storage ($Q_{latstor}$), the shallow aquifer storage ($aq_{sh}$), and the soil water storage ($SW_{ly}$) of the 4 layers are comparatively low. For the other 81 calculation units and 27 subbasins, the analysis results are similar. Therefore, in this study, only $Q_{stor}$ and $V_{stored}$ are selected to be updated. It is noteworthy that the predicted streamflow at the measurement sites is also included in the state vector, but it is not updated, as it is a diagnostic variable rather than a state variable [13]. As the predicted streamflow at the measurement locations is included in the state vector, there is an exact match between the observations and their model equivalent. Therefore, the matrix $H$ that maps the model states to the observations in the EnKF (12) can be constructed to have a value of 1 for elements where there is a model prediction of the observation and 0 where there is no equivalent of the observation [24].
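A minimal sketch of such a 0/1 observation matrix is shown below. The ordering of the augmented state vector (HRU surface runoff storages first, then reach storages, then the predicted streamflow at the gauges) is an assumption made for illustration only; the text does not specify the layout, and the function name is hypothetical.

```python
import numpy as np


def build_observation_matrix(n_hru: int, n_sub: int, n_gauges: int) -> np.ndarray:
    """Return H with shape (n_gauges, n_hru + n_sub + n_gauges).

    Assumed state layout: [surface runoff storage per HRU,
                           reach storage per subbasin,
                           predicted streamflow per assimilated gauge].
    Each row selects the predicted streamflow of one gauge.
    """
    n_state = n_hru + n_sub + n_gauges
    H = np.zeros((n_gauges, n_state))
    rows = np.arange(n_gauges)
    H[rows, n_hru + n_sub + rows] = 1.0
    return H


if __name__ == "__main__":
    # 82 HRUs and 28 subbasins as in the study area, 5 interior gauges.
    H = build_observation_matrix(82, 28, 5)
    print(H.shape, int(H.sum()))
```

Because each row of H simply picks one entry of the state vector, applying H to an ensemble member returns the model's predicted streamflow at the gauges without any interpolation.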

Modeling and Observation Error.
In this study, the uncertainty of hydrological modeling comes from the errors of the meteorological forcing inputs (e.g., precipitation), model parameters, and model structure. Each ensemble member in the EnKF is perturbed using the assumptions summarized in Table 3. The precipitation error is assumed to be uncorrelated both in time, between consecutive time steps, and in space, between different stations. This design of the precipitation error draws on previous studies of the systematic measurement error of rainfall at precipitation gauges in China [25]. The small error assigned to the predicted model states (i.e., the surface runoff storage and the reach flow storage) is intended to avoid rapid changes between consecutive time steps in the modeling process. The error assumed for the measurement prediction of the streamflow flux accounts for the errors of model structure. The varying standard deviation (SD) for the sensitive parameters (Table 4) ensures that the parameters stay within their physical bounds after perturbation. The error assumed for the streamflow observations considers the characteristics of the uncertainty of streamflow measurement [12] and the error assumption adopted in previous studies [13].

Synthetic and Real Data Assimilation.
In the synthetic experiment, one reference field is randomly drawn from a Gaussian distribution with given model inputs. The model is then propagated forward to obtain a reference simulation, which is regarded as the synthetic truth. The synthetic streamflow observations are generated by randomly perturbing the synthetic true streamflow drawn from the reference simulation using a Gaussian multiplicative error with the same standard deviation as that of the streamflow observation error. An ensemble integration of the SWAT model with 200 members is performed with known errors (Table 3) of model inputs, model parameters, and states, which is regarded as the ensemble open loop (EnOL) run. In the EnKF run, the assimilation integration is performed by introducing the synthetic streamflow observations into the stochastic modeling process with the same errors as those of the EnOL run.
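The generation of synthetic observations can be sketched as below. The relative standard deviation is a free parameter here because the value from Table 3 is not reproduced in the text; the function name, the example value of 0.1, and the clipping of negative flows at zero are assumptions made for illustration.

```python
import random


def synthetic_observations(truth, rel_sd, seed=0):
    """Perturb a 'true' streamflow series with multiplicative Gaussian error.

    truth  : sequence of synthetic true streamflow values
    rel_sd : standard deviation of the multiplicative error (fraction of flow)
    seed   : seed for reproducibility
    Negative values are clipped at zero (an assumption, not from the source).
    """
    rng = random.Random(seed)
    return [max(0.0, q * (1.0 + rng.gauss(0.0, rel_sd))) for q in truth]


if __name__ == "__main__":
    true_flow = [120.0, 340.0, 95.0, 0.0]  # made-up values (m^3/s)
    print(synthetic_observations(true_flow, rel_sd=0.1, seed=1))
```

A multiplicative error keeps the perturbation proportional to the flow, so high flows receive larger absolute errors than low flows.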
Here, two cases are designed: (1) assimilating the synthetic streamflow observations at 5 interior sites (i.e., Dapoling, Changtaiguan, Xixian, Zhuganfu, and Huangchuan); (2) assimilating the synthetic streamflow observations at the catchment outlet (Huaibin). The performance of streamflow assimilation is illustrated by comparing the predicted streamflow obtained by EnKF and EnOL over the whole basin, with the synthetic truth as a reference.
In the real data assimilation, the model and observation error settings for EnOL and EnKF are identical to those of the synthetic experiment. Similar to the synthetic experiment, we consider the following two cases: (1) the streamflow measurements at the five interior discharge gauges are assimilated and the in situ streamflow measurements at the catchment outlet are used for validation; (2) the streamflow measurements at the catchment outlet are assimilated and the measured streamflow at the five interior sites is used for validation.

Assimilation of the Streamflow Observations at Interior Sites.
Figure 3 compares the root mean square error (RMSE) of the estimated streamflow at all subbasin outlets (located at different sites on the reach) with assimilation of the streamflow observations at the five interior sites to that without streamflow assimilation, for the whole assimilation period from Jan. 1, 1996 to Dec. 31, 1997. The RMSE is reduced by EnKF for all subbasin outlets. However, the streamflow assimilation performance differs considerably between subbasins. This is better illustrated by the normalized reduction of RMSE (i.e., NER), whose distribution over the whole basin is presented in Figure 4. To make the NER distribution over the basin clear, the NER value at each subbasin outlet is assigned to the subbasin where it is located. The estimated streamflow at the subbasin outlets is improved to different degrees by EnKF, as the NER ranges from 5.43% to 64.05%. The improvement appears to increase from upstream to downstream along the main reach, where the discharge gauges Dapoling, Changtaiguan, and Xixian (DPL, CTG, and XX) are located. The predicted streamflow at the catchment outlet is significantly improved compared with the case without data assimilation. Figure 5 presents the time series of the estimated streamflow obtained by EnKF and EnOL at the catchment outlet. It can be seen in Figure 5(a) that the outlet streamflow is significantly improved by EnKF for the whole assimilation period. In terms of the ensemble mean of the simulated streamflow, EnKF significantly improves the accuracy, as the ensemble mean is close to the synthetic truth, with the RMSE reduced from 62.18 m³/s to 23.24 m³/s (NER = 63%); the NSE and the correlation coefficient (CC) increase from 0.97 and 0.98, respectively, to approximately 1.0. The uncertainty of the simulated streamflow in EnOL is significantly reduced by EnKF, as the ensemble spread is much smaller in EnKF. Besides, it is clear in Figures 5(b) and 5(c) that this improvement on high flow is significant.
The improvement of streamflow prediction by EnKF should result from the real-time updating of the surface runoff storage ($Q_{stor}$) and the reach flow storage ($V_{stored}$) on the basis of the observed streamflow, because, by design, the sole difference between EnKF and EnOL is that both $Q_{stor}$ and $V_{stored}$ are updated in EnKF but not in EnOL. $Q_{stor}$ and $V_{stored}$ in EnKF are compared with those in EnOL, with their synthetic truth as a reference. Figure 6 displays the RMSE distribution of $Q_{stor}$ for the 82 HRUs in EnKF and EnOL. The RMSE of $Q_{stor}$ is significantly reduced by EnKF, as both the mean and the spread of the RMSE are considerably decreased. Figure 7 presents the RMSE of $V_{stored}$ in EnOL and EnKF and the normalized reduction of RMSE (NER) for $V_{stored}$ on the 28 subbasins. By comparing Figures 7(a) and 7(b), we can see that the RMSE of $V_{stored}$ on most subbasins is substantially reduced by EnKF, especially on subbasins 6, 8, 12, and 14. This is better illustrated by Figure 7(c), where the NER on all subbasins is above zero and that on subbasins 6, 8, 12, and 14 is over 70%. Overall, $Q_{stor}$ and $V_{stored}$ updated by EnKF move significantly closer to their synthetic truth.
For different subbasins and HRUs, the impact of streamflow assimilation on the update of $V_{stored}$ (Figure 7) and $Q_{stor}$ (Figure 6) differs significantly: the net reduction, calculated as $\mathrm{RMSE}_{EnOL} - \mathrm{RMSE}_{EnKF}$, and the normalized reduction of RMSE (18) for $V_{stored}$ vary widely between subbasins, and the RMSE distribution for $Q_{stor}$ changes significantly after the state update. This difference can be partly related to the underlying land surface conditions. Figure 8 shows the impact of slope, soil category, and land cover on the update of the surface runoff storage ($Q_{stor}$), as these three factors determine the HRU delineation. Figure 8(a) shows that $Q_{stor}$ on the HRUs with low slopes (<5°) obtains a large net reduction of RMSE ($\mathrm{RMSE}_{EnOL} - \mathrm{RMSE}_{EnKF}$), which contributes to a significant improvement of rainfall-runoff modeling. Figures 8(b) and 8(c) illustrate that the large reduction of RMSE occurs on the HRUs with soil categories of Huanghetu (S-1), Shuidaotu (S-2), and Shajiangheitu (S-7) (all three soils belong to silt loam in the USDA soil classification) and with land cover of rice (RICE) and mixed agriculture mainly covered by wheat and corn (AGRC).
Figure 9 presents the impact of the main channel length of the subbasins on the state update of the reach flow storage ($V_{stored}$) in terms of the normalized reduction of RMSE (NER) for $V_{stored}$. It shows that the less significant state updates of $V_{stored}$ (NER < 25%) mainly occur on subbasins with a short main channel (<18 km). This can be interpreted from the channel routing equations (8), (9), and (10) in SWAT. A short main channel reduces the channel flow travel time (TT) to a relatively small value, which makes the storage coefficient (SC) approach its upper limit of 1.0 (10). Thus, $V_{stored}$ is frequently reduced to zero by (8) and (9) during the model run, which makes the update of this variable insignificant. Therefore, subbasins with a short main channel tend to have a less significant update of $V_{stored}$ in the assimilation. Besides, the NER appears to increase with the main channel length, which indicates a strong dependency of the state update of the reach flow storage on the length of the reach itself.
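The routing argument can be written out as a short worked step. Here $V_{stored,1}$, $V_{stored,2}$, $V_{in}$, and $V_{out}$ denote the reach storage at the beginning and end of the day and the inflow and outflow volumes of (8) and (9); the threshold $TT \le \Delta t/2$ is derived here from (10) and is not stated explicitly in the original text.

```latex
\begin{align*}
\mathrm{SC} &= \min\!\left(1,\ \frac{2\Delta t}{2\,TT + \Delta t}\right), \\
\frac{2\Delta t}{2\,TT + \Delta t} \ge 1 &\iff TT \le \frac{\Delta t}{2}, \\
\mathrm{SC} = 1 &\implies V_{out} = V_{in} + V_{stored,1}
 \implies V_{stored,2} = V_{stored,1} + V_{in} - V_{out} = 0 .
\end{align*}
```

With a daily time step, any reach whose travel time is at most half a day therefore ends each day with zero storage, so an update of its storage cannot carry over to the next step.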

Assimilation of the Streamflow Observations at Catchment Outlet.
Figure 10 compares the root mean square error (RMSE) of the estimated streamflow at all subbasin outlets (located at different sites of the reach) with assimilation of the streamflow observations at the catchment outlet (HB, i.e., Huaibin station) to that without streamflow assimilation, for the whole assimilation period.
The estimated streamflow at the outlets of several subbasins (e.g., subbasins 1, 5, 6, 9, and 14) is improved by assimilating the streamflow at the catchment outlet, as their RMSE is reduced. The outlet streamflow of the whole basin (the outlet of subbasin 6) obtains the most significant improvement in terms of the degree of RMSE reduction. This is better illustrated by the normalized reduction of RMSE (i.e., NER), whose distribution for all subbasin outlets is presented in Figure 11. To make the NER distribution over the basin clear, the NER value at each subbasin outlet is assigned to the whole subbasin where it is located. The improvement of the streamflow modeling obtained by assimilating the catchment outlet streamflow observations appears to decrease from downstream to upstream, especially for the sites located on the main reach. This trend is consistent with that of the correlation coefficients between the simulated streamflow at the catchment outlet and that at the subbasin outlets (Figure 12), which also decrease from downstream to upstream over the whole basin. Hence, it can be inferred that the assimilation of streamflow observations tends to be effective in improving the streamflow estimation at locations where the simulated streamflow is closely correlated with the predicted streamflow at the measurement site.

Assimilation of the Streamflow Measurements at the Five Interior Gauges.
Figure 13 compares the estimated streamflow at the catchment outlet obtained by EnOL and EnKF with assimilation of the streamflow measurements at the five interior gauges. The simulated streamflow in EnOL is considerably improved (e.g., around days 260, 320, and 450) by the EnKF-based state update over the entire data assimilation period, as shown by the reduction of RMSE from 271.58 m³/s to 204.7 m³/s (NER = 25%), the increase of NSE from 0.62 to 0.78, and the increase of the correlation coefficient from 0.79 to 0.9. However, it can also be seen from Figures 13(b) and 13(c) that the assimilation has limited capability to correct errors in the streamflow prediction when there are large biases between the simulated and the observed streamflow. This limitation might be related to unsatisfactory (or possibly biased) model parameters, which can hardly be avoided in parameter calibration, especially for distributed models with complicated model physics and structures and large numbers of parameters. Moreover, biased parameters will deteriorate the state update of hydrological modeling in the assimilation process [10].

Figure 8: The RMSE (15) for the estimated surface runoff storage ($Q_{stor}$) on 82 HRUs by EnKF and EnOL with varying (a) slope steepness, (b) soil category, and (c) land cover. S-1, S-2, S-3, S-4, S-5, S-6, and S-7 are the seven types of soil (Table 1). RICE, AGRC, FRST, SESB, and MESQ represent the land cover of rice, mixed agriculture, mixed forest, Sesbania, and honey mesquite, respectively. The solid black line is the 1:1 line.

Assimilation of the Streamflow Measurements at the Catchment Outlet.
Table 5 summarizes the statistics of the streamflow estimated by assimilating the streamflow measurements at the catchment outlet (Huaibin station), compared with the case without data assimilation. The streamflow at Xixian station is considerably improved by the catchment outlet streamflow assimilation, as the RMSE is decreased and both the NSE and CC are increased by EnKF. However, the simulated streamflow at Zhuganfu and Huangchuan is worse than that without streamflow assimilation. This might be caused by over-updating of the model state variables, as the correlation coefficients between the streamflow at the measurement site (Huaibin station) and that at the interior gauges are higher in the model simulation than in the observations (Figure 14). This can be related to the inadequate representation of the spatial variability of the hydrological process due to the insufficient spatial variability of the model inputs and parameters [13]. Besides, the assimilation of the catchment outlet streamflow has no significant impact on the streamflow modeling at Dapoling and Changtaiguan, which can be partly explained by the relatively low correlation coefficients between their simulated streamflow and that at the measurement site (Huaibin station).

Conclusions
This study uses both the synthetic experiment and real data application to investigate the capability of streamflow assimilation in improving the streamflow predictions of upper Huai River basin via updating the basin states in distributed hydrological modeling process.The two sensitive model state variables (i.e., the surface runoff storage and the reach flow storage) in SWAT are updated using the streamflow observations (1) at the five interior sites and (2) at the catchment outlet.
The synthetic experiment shows that the predicted streamflow at all subbasin outlets is improved by assimilating the synthetic streamflow observations at the five interior sites.This improvement shows an increasing trend from upstream to downstream of the main reach, and the predicted streamflow at the catchment outlet obtains the most significant improvement.This improvement of the catchment streamflow modeling is benefited from the updating of  the surface runoff storage and reach flow storage of the whole basin as the two state variables significantly approach their synthetic truth after assimilation.It is found that the significant improvement of the surface runoff storage generally occurs on the calculation units with the slopes less than 5 ∘ .We also find that the main channel length of the subbasins has a considerable impact on the updating of reach flow storage ( stored ), as  stored with short main channel tend to get less significant improvement compared to that with long main channel.Besides, the assimilation of the streamflow observations at the catchment outlet improves the estimated streamflow at the outlet of several subbasins near the catchment outlet.The improvement of the streamflow modeling shows a decreasing trend from downstream to upstream, especially for the sites located at the main reach, which is consistent with that of the correlation coefficients between the simulated streamflow at the catchment outlet and that at the subbasin outlets.
The real data application shows that the estimated streamflow at the catchment outlet is greatly improved by updating the basin states using the in situ streamflow measurements at the five interior gauges in rainfall-runoff modeling. The assimilation of the in situ catchment outlet streamflow measurements improves the streamflow modeling at Xixian, slightly degrades the streamflow estimation at Zhuganfu and Huangchuan, and has no significant influence on the streamflow modeling at the upstream gauges of Dapoling and Changtaiguan. In this study, the method for estimating the model and observation error parameters is subjective and empirical to a certain extent, as it generally relies on experience and on the conclusions of previous studies. More rigorous, theoretically based methods for quantifying the model and observation error parameters, for example, the adaptive assimilation approach [2,26] and the maximum a posteriori approach [27], should be attempted in future studies. Besides, the streamflow assimilation performance in distributed hydrological modeling relies on the capability of the distributed model to characterize the spatial variability of the real hydrological process, which is usually controlled by the spatial variability of the model inputs and parameters. More effort should be devoted to improving the spatial variability of the model forcing data, and new and optimal methods for parameter estimation and calibration are needed to produce more accurate and reliable spatially distributed model parameters [28].

Figure 2: Correlation between the state variables of the fifteenth calculation unit and the sixth subbasin on the horizontal axis and the simulated streamflow at six streamflow gauges (DPL, HB, XX, CTG, HC, and ZGF in Figure 1).

Figure 3: The RMSE (15) of the estimated streamflow by EnOL and EnKF at all subbasin outlets in the assimilation of the streamflow observations at the five interior sites.

Figure 4: The NER (18) of the estimated streamflow for all subbasin outlets obtained by assimilating the streamflow observations at five interior gauges (DPL, CTG, ZGF, XX, and HC). The NER value at the subbasin outlet is displayed on the subbasin where it is located.

Figure 5: Comparison of the estimated streamflow by EnKF and EnOL at the basin outlet (Huaibin station) in synthetic experiment: (a) the whole period from Jan. 1, 1996 to Dec. 31, 1997; (b) and (c) details of the period with high flows. The light blue and red lines indicate the ensemble members of the EnOL and EnKF, respectively.

Figure 7: RMSE of the estimated reach flow storage on 28 subbasins by (a) EnOL and (b) EnKF for the whole assimilation period. (c) NER (18) of the estimated reach flow storage. The numbers 1 to 28 are the subbasin numbers.

Figure 9: The NER (18) of the estimated reach flow storage on 28 subbasins with varying main channel length of subbasins.

Figure 10: The RMSE (15) of the estimated streamflow obtained by EnOL and EnKF at the subbasin outlet in the assimilation of the streamflow observations at the catchment outlet.

Figure 11: The NER (18) of the estimated streamflow for all subbasin outlets obtained by assimilating the streamflow observations at the catchment outlet (HB station). The NER value at the subbasin outlet is displayed on the subbasin where it is located.

Figure 12: The correlation coefficients (CC) between the simulated streamflow at all subbasin outlets and that at the catchment outlet (HB station). The value of CC for one subbasin outlet is displayed on the subbasin where it is located. "Corr. Coef. versus HB" indicates the correlation coefficients between the streamflow at the subbasin outlet and that at HB station.

Figure 13: Comparison of the simulated streamflow by EnKF and EnOL at the basin outlet (HB station) in real data application: (a) the whole period from Jan. 1, 1996 to Dec. 31, 1997; (b) and (c) details of the period with high flow. The light blue and red lines represent the ensemble members of EnOL and EnKF, respectively.

Figure 14: The correlation coefficient between the streamflow at the catchment outlet (HB station) and that at the five interior gauges (DPL, CTG, ZGF, HC, and XX station) for both observation and model simulations.

Table 1: Soil classification and its area proportions in the upper Huai River basin.

Table 2: State variables in SWAT model chosen for correlation analysis.
Note:  lay is the number of soil layers on profile in a hydrologic response unit (HRU).

Table 3: Error parameters in the synthetic experiment.

Table 4: The sensitive parameters in SWAT for rainfall-runoff modeling and their standard deviation (SD) of Gaussian multiplicative error in streamflow assimilation.
Note:  lay is the number of soil layers on profile in a hydrologic response unit (HRU); SD represents standard deviation; GME represents Gaussian multiplicative error.

Table 5: The statistics of the estimated streamflow obtained by EnOL and EnKF at the five interior discharge gauges.
Note: CC represents the correlation coefficient.