Spatial Interpolation of Annual Runoff in Ungauged Basins Based on the Improved Information Diffusion Model Using a Genetic Algorithm

1Research Center of Ocean Environment Numerical Simulation, Institute of Meteorology and Oceanography, PLA University of Science and Technology, Nanjing, China 2Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disaster, Nanjing University of Information Science & Technology, Nanjing 210044, China 3Key Laboratory of Surficial Geochemistry, Ministry of Education, Department of Hydrosciences, School of Earth Sciences and Engineering, State Key Laboratory of Pollution Control and Resource Reuse, Nanjing University, Nanjing 210093, China


Introduction
Prediction in Ungauged Basins (PUB) [1], identified by the IAHS as a key issue in hydrological studies, is an important task for water resources planning and management and remains a fundamental challenge for the hydrological community. Accurate estimates of hydrologic variables such as streamflow at ungauged sites allow objective, quantitative, and statistical decision-making with respect to water resources management and natural hazard assessments. Because ungauged basins lack data for model calibration and verification, hydrological regionalization [2] is required to transfer information (e.g., model parameters) from gauged catchments. Regionalization can be defined as the transfer of information from one catchment to another [2], typically from gauged to ungauged catchments (e.g., [3,4]). Its aim is to estimate parameter values of hydrological models for any grid cell, subcatchment, or large geographic region without the need to calibrate or "tune" the model to get the best fit.
Over the years, regionalization has received increasing attention from the hydrological community. A number of regional models are currently available, including (1) the proxy-basin method [5,6], (2) the spatial interpolation method, for instance, linear interpolation by Guo et al. [7], inverse distance weighting (IDW) interpolation by Di Piazza et al. (2001), and Kriging interpolation by Vandewiele and Elias [8], (3) the clustering approach [9,10], (4) the bi- and multivariate regression method [11,12], and (5) one-step regression-regional calibration [13]. Among those, the spatial interpolation method is one of the earliest and most widely used; it estimates the value of unknown spatial data based on known spatial data. Its essence is the spatial forecasting of the whole unknown region using a few known points. Deterministic and geostatistical techniques are the two main groups of spatial interpolation techniques that produce a continuous surface from point measurements. Deterministic interpolation techniques create surfaces from measured points using mathematical functions, which are based on either the extent of similarity (e.g., inverse distance weighted) or the degree of smoothing (e.g., radial basis functions). Geostatistical interpolation techniques (e.g., Kriging) utilize both the mathematical and the statistical properties of the measured points [14,15]. Recent studies highlight that geostatistical interpolation, which was originally developed for the spatial interpolation of point data (see, e.g., [16]), can be effectively applied to the prediction of the streamflow regime in ungauged basins [17,18].
Recently, geostatistical methods have proven valuable for estimating hydrological variables in ungauged catchments. Among geostatistical methods, traditional Kriging and its related algorithms (e.g., Universal Kriging or Cokriging (COK)) are the most widely used. Skøien and Blöschl [18], for instance, developed the topological Kriging technique (or top-Kriging), which accounts for hydrodynamic and geographical dispersion. Their results indicate that this technique can not only outperform deterministic runoff models in regions where stream gauge density is sufficiently high, as it avoids problems with input data errors and parameter identifiability, but also provide more robust estimates than regional regression models [19]. A comparison of top-Kriging with Physiographical-Space Based Interpolation (PSBI) highlights the complementary utility of the two methods for headwater and larger scale catchments [20].
However, geostatistical methods (e.g., Kriging and its related algorithms) have four major limitations.
(1) Ordinary Kriging is defined as a "best linear unbiased estimator." It is "linear" because its estimates are calculated by a linear equation, whereas the change of runoff is a nonlinear process, so some deviation is introduced.
(2) The methods need many sites for modelling, generally more than eight [21]. They are therefore more feasible for interpolation in large watersheds. When a minor watershed contains only a few study sites, the methods cannot capture the spatial structure of the hydrological variables.
(3) The interpolation process of many geostatistical methods needs added water balance constraints, which control the export flows of each subbasin. Before the runoff calculation, the runoff data in the study should be normalized [22]. In the interpolation process, in order to measure the correlation between the basins, the distance between the subbasins in the interpolation algorithm needs to be redefined [23]. The calculation of the entire process is complex and the conversion is troublesome.
(4) These methods have mainly been applied in natural European basins [24]. Their application in watersheds acutely impacted by human activities still needs to be verified.
To overcome these four limitations, a new spatial interpolation method that is more effective and rational is needed, so we introduce the information diffusion model (IDM) in this paper. IDM is a useful method for dealing with the small sample problem [25,26]. By diffusing the observed data, the method can extract additional information from them. Huang [25] showed that a simple window width (SWW) can be determined easily from incomplete data based on the nearby criteria. This method has been widely used in the reliability of risk assessment [27][28][29]. But IDM with SWW (SIDM) cannot accurately handle hydrological or meteorological data that do not follow a normal distribution. To solve this problem, Wang et al. [30] presented the IDM with an optimal window width (OIDM), using the principle of least mean squared error. But the optimal window width (OWW) is prone to the local optimum problem. The Genetic Algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems [31]. Hence, to obtain globally optimal diffusion coefficients, Bai et al. [32] proposed a new information diffusion model using a GA (GIDM) to interpolate river runoff. However, they used the GIDM method only to interpolate river runoff time series. This paper extends the idea of GIDM and uses it to spatially interpolate river runoff in ungauged basins.
No previous hydrological literature has reported the use of the IDM to establish a hydrological model for hydrological spatial interpolation. We explain the principles of the SWW and OWW in Section 2, where we also discuss how to improve the information diffusion model based on GA. In Section 3, we illustrate the interpolation of river runoff based on IDM in detail. To test our new method, seven experiments on runoff interpolation based on the GIDM, six of them in the Yellow River basin, are carried out in Section 4 and compared with SIDM, OIDM, IDW, and COK. Finally, Section 5 summarizes and discusses the results.

Simple Window Width and Optimal Window Width.
The principle of information diffusion and the simple window width (SWW) is discussed in the Appendix and was introduced in the previous literature [25].
The SWW method extracts more information from a small sample [26,33]. However, when the population does not follow a normal distribution, the method is invalid. Based on the idea of the least mean square error, the optimal window width (OWW) can be obtained iteratively [30]; the two indices in the iteration denote the ordinal number of the record and of the iteration, and the iteration starts from an initial value. Because the OWW is based on the mean value, it is prone to the local optimization problem [31]. So GA is introduced.

Searching the General Optimal Window Width Based on GA.
Based on the principles of natural genetics and biological evolution, Genetic Algorithms (GA) can effectively avoid the "local optima" problem [31]. So we use GA to search for the globally optimal diffusion coefficients. For window width searching, the combination of the IDM and GA includes three major phases. Firstly, the GA initializes a population of random codes drawn from the search domain [30], which runs from 0 (exclusive) to one third of the sample range, that is, the difference between the maximum and minimum values of the sample. Then the fitness of all chromosomes is evaluated. Following Wang et al. [30], the fitness is built from the information diffusion estimator evaluated at the different records of the sample, together with a second-order scheme. Finally, the window width with the global optimum is searched by the evolutionary processes described in [31]. In this way we obtain the improved window width (IWW). This section has explained the techniques of IDM and improved IDM; how to interpolate the runoff based on IDM is discussed in the following section.
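The search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of [30] or [32]: the fitness used here is a leave-one-out log-likelihood of a Gaussian diffusion estimate, chosen because the original fitness expression is not reproduced in this section, and the function names, population size, mutation rate, and real-valued encoding are all hypothetical choices.

```python
import math
import random


def loo_log_likelihood(sample, h):
    """Leave-one-out log-likelihood of a Gaussian diffusion estimate.

    Assumed fitness for illustration only; it is not the criterion of [30].
    """
    n = len(sample)
    total = 0.0
    for i, xi in enumerate(sample):
        kernel_sum = sum(
            math.exp(-((xi - xj) ** 2) / (2.0 * h * h))
            for j, xj in enumerate(sample)
            if j != i
        )
        density = kernel_sum / ((n - 1) * h * math.sqrt(2.0 * math.pi))
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def ga_window_width(sample, pop_size=30, generations=60, seed=0):
    """Search (0, (max - min) / 3] for the width with the highest fitness."""
    rng = random.Random(seed)
    upper = (max(sample) - min(sample)) / 3.0
    lower = 1e-6 * upper  # small positive floor, since 0 itself is excluded

    def fitness(h):
        return loo_log_likelihood(sample, h)

    # Phase 1: random initial population inside the search domain.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    for _ in range(generations):
        # Phase 2: evaluate and rank all chromosomes.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism keeps the current best
        # Phase 3: evolve by crossover and mutation.
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2        # arithmetic crossover
            if rng.random() < 0.2:                 # occasional mutation
                child += rng.gauss(0.0, 0.1 * upper)
            children.append(min(max(child, lower), upper))
        population = children
    return max(population, key=fitness)
```

Because the best chromosome is copied unchanged into each new generation, the fitness of the returned width can never be lower than that of the best member of the initial random population. The fixed seed makes the run repeatable.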

Information Diffusion Method with Fuzzy Inference for the Runoff Estimation
The IDM is implemented with the numerical method of [28]. The domain of the input variable and the range of the output variable are each represented by a finite, indexed set of points, as defined in (7).

Using the membership functions (A.1) and (A.2), the information carried by each observation is diffused onto the pairs formed by one input point and one output point, with one window width for the input and one for the output. Each such pair is called an illustrating point, and each illustrating point receives an information gain from the sample. A fuzzy relation matrix [34] is then obtained from the information matrix of these gains. Given an input fuzzy set, the output fuzzy set is calculated with the fuzzy inference formula, in which the operator "∘" denotes the max-min fuzzy composition rule and the entries of the fuzzy relation lie in (0, 1]. Finally, the gravity center of the output fuzzy set is taken as the output.

In general, the given sample is used to construct the relationship between river discharge and its meteorological factors or its antecedent values; each record supplies an input vector, and Ỹ denotes the flow in the ungauged basins. So the value of river runoff in the ungauged basins can be obtained by the IDM method.
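The chain from information matrix to gravity center can be illustrated with a short sketch. It rests on several assumptions that are not confirmed by this section: a Gaussian (normal) diffusion function, a fuzzy relation obtained by dividing each column of the information matrix by its maximum, a Gaussian input fuzzy set centred on the query value, and a single scalar input (the case study uses four site attributes). The names `U`, `V`, `hx`, and `hy` are hypothetical labels for the input points, output points, and the two window widths.

```python
import math


def gauss(d, h):
    """Normal diffusion function for a distance d and window width h."""
    return math.exp(-(d * d) / (2.0 * h * h))


def fuzzy_relation(xs, ys, U, V, hx, hy):
    """Information matrix Q over illustrating points, normalized into R."""
    Q = [
        [sum(gauss(u - x, hx) * gauss(v - y, hy) for x, y in zip(xs, ys)) for v in V]
        for u in U
    ]
    # Assumed normalization: divide each column by its largest entry.
    col_max = [max(Q[j][k] for j in range(len(U))) for k in range(len(V))]
    return [[Q[j][k] / col_max[k] for k in range(len(V))] for j in range(len(U))]


def infer(x0, R, U, V, hx):
    """Max-min composition followed by the gravity center of the output set."""
    A = [gauss(u - x0, hx) for u in U]  # input fuzzy set for the query x0
    B = [max(min(A[j], R[j][k]) for j in range(len(U))) for k in range(len(V))]
    return sum(v * b for v, b in zip(V, B)) / sum(B)


# Hypothetical toy sample, not data from the paper.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
U, V = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
R = fuzzy_relation(xs, ys, U, V, hx=0.5, hy=1.0)
print(infer(2.5, R, U, V, hx=0.5))
```

Since the result is a weighted average of the output points with non-negative weights, it always lies between the smallest and largest output point, which also means the estimate is pulled toward the interior at the edges of the sample.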

Case Study
To test the runoff interpolation effectiveness of our model (GIDM), we carried out seven experiments, which can be divided into four groups. Experiments 1 and 2 are spatial interpolation experiments of runoff on the mainstream of the Yellow River; Experiments 3 and 4 are spatial interpolation experiments of runoff on tributaries of the Yellow River. Experiments 5 and 6 are spatial interpolation experiments of runoff on the mainstream and tributaries of the Yellow River together. These six experiments test the spatial interpolation and prediction ability of the GIDM model for the mainstream and tributaries of the same river basin. Finally, Experiment 7 is the spatial interpolation experiment of runoff in the Daying mine region in Guizhou Province (representing a nonclosure small watershed with no runoff information), which is used to validate the spatial interpolation and prediction ability of the GIDM model for a minor watershed with few hydrological sites. The application of GIDM to runoff interpolation is compared with SIDM, OIDM, IDW, and COK based on the same data.

Study Area and Data.
Tangnaihai, Lanzhou, Toudaoguai, Longmen, Tongguan, Huayuankou, Gaocun, Aishan, and Lijin stations on the mainstream of the Yellow River and Hongqi, Huangfu, Wenjiachuan, Baijiachuan, Ganguyi, Zhangjiashan, Baimasi, and Huaxian stations on tributaries of the Yellow River have been selected for experiments 1-6 (see Figure 1). The Yellow River is the third-longest river in Asia, following the Yangtze River and Yenisei River, and the sixth-longest in the world, with an estimated length of 5,464 km. Originating in the Bayan Har Mountains in Qinghai province of western China, it flows through nine provinces and empties into the Bohai Sea near the city of Dongying in Shandong province. The Yellow River basin has an east-west extent of about 1,900 km and a north-south extent of about 1,100 km. Its total basin area is about 742,443 square kilometers. Because of this importance, the hydrological stations on the Yellow River are selected to test the performance of our model.
Yusha, Baina, Gaofeng, Xiangshui, and Liulong stations in the Daying mine region in Guizhou Province (representing a nonclosure small watershed with no runoff information) have been selected for experiment 7.

Experiments on Estimating Annual Runoff Time Series at Stations on the Mainstream of the Yellow River.
In this section, two real examples using the annual runoff data (×10⁸ m³) from 2002 to 2011, taken at nine stations on the mainstream of the Yellow River (Figure 1), are presented to compare the different models on the mainstream of the same river basin.

Experiment 1: Interpolating Annual Runoff of Two Stations Based on the Data of Other Seven Stations.
The selected sites are Tangnaihai, Lanzhou, Toudaoguai, Longmen, Tongguan, Huayuankou, Gaocun, Aishan, and Lijin stations on the mainstream of the Yellow River (Figure 1). As shown in Table 1, the latitude, longitude, altitude, and catchment area of the nine sites are known. The natural annual runoff data of seven sites (Tangnaihai, Toudaoguai, Longmen, Tongguan, Huayuankou, Gaocun, and Aishan) are known for modelling, while the natural annual runoff data of the Lanzhou site (located upstream on the Yellow River) and the Lijin site (located downstream on the Yellow River) are treated as unknown for testing and are obtained through spatial interpolation. The actual annual runoff data of the Lanzhou and Lijin sites in Table 1 can be used to test the results of spatial interpolation.
The specific procedure of the test is as follows.
Step 1. The data of seven sites (Tangnaihai, Toudaoguai, Longmen, Tongguan, Huayuankou, Gaocun, and Aishan) on the mainstream of the Yellow River are used as the modelling sample, in which latitude, longitude, altitude, and catchment area are the input data and the natural runoff data in 2011 are the output data.
Step 2. Calculate the window width of IDM based on the discussion in Section 2. Table 2 gives the results of SWW, IWW, and OWW.
Step 3. Appropriate illustrating points are very important because they present the basic information of the system. The illustrating space can be reconstructed based on the statistical analysis.
Step 5. Following the same calculation process as for the natural runoff of the two sites in 2011, we can also obtain the annual natural runoff results of the two sites from 2002 to 2010.
To further test the interpolating results of our GIDM model, we compare it not only with the SIDM and OIDM methods but also with the IDW and COK interpolation methods, which are currently two of the most commonly used spatial interpolation methods. The specific principles and operational steps of the IDW and COK interpolation methods can be found in the related literature [35][36][37] and are not described in detail here.
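For readers unfamiliar with the IDW baseline, a minimal sketch of the common Shepard form is given here. The power of 2, the planar Euclidean distance, and the function name are assumptions for illustration; the exact settings used in the comparison are those of [35][36][37], and real station coordinates in latitude and longitude would normally call for a geodesic or projected distance instead.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from known points."""
    num = 0.0
    den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)  # planar Euclidean distance (assumption)
        if d == 0.0:
            return v              # target coincides with a known point
        w = 1.0 / d ** power      # closer points receive larger weights
        num += w * v
        den += w
    return num / den


# Hypothetical coordinates and values, not data from the paper.
print(idw([(0.0, 0.0), (2.0, 0.0)], [10.0, 20.0], (1.0, 0.0)))
```

Because the weights depend only on distance, IDW cannot use attributes such as altitude or catchment area unless they are folded into the distance itself, which is one reason it serves as a simple baseline here.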
Figure 2 shows interpolated and observed runoffs at Lanzhou station and Lijin station. The performance of GIDM is better than that of the other four models at both sites. In Figure 2(a), SIDM, OIDM, IDW, and COK show obvious oversimulation in 2003, 2004, and 2006 and undersimulation in 2009, 2010, and 2011, while the GIDM exhibits a good correlation. Although a few discrepancies between simulated and observed data remain for GIDM (such as from 2004 to 2006), the general tendency can be accepted considering the limited samples.
The Mean Absolute Percentage Error (MAPE, [38]), the Root Mean Square Error (RMSE, [39,40]), the Nash and Sutcliffe coefficient (E, [41]), and the coefficient of correlation (R, [39]) are selected to test the model. The RMSE, R, E, and MAPE of the different models are shown in Table 3. According to Table 3, the IDW performs worst and GIDM performs best among the five models. For example, at the Lanzhou site from 2002 to 2011, where the average annual runoff is as high as 169.1500 × 10⁸ m³, the GIDM performed satisfactorily in the interpolation with an RMSE of 6.48 × 10⁸ m³. Moreover, the GIDM obtained the best R, E, and MAPE statistics at the Lanzhou site, 0.9367, 0.8457, and 12.33%, respectively, while the MAPEs of the other four models are all over 28%. So the interpolation results of GIDM not only are better than those of the traditional IDW and COK interpolation methods but also improve on traditional IDM interpolation by about 10-40%. Based on Table 3, the averages of MAPE, RMSE, E, and R of GIDM over the two sites are 15.38%, 6.225 (×10⁸ m³), 0.8335, and 0.9314, so the GIDM is consistent and robust.
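The four evaluation measures can be computed as in the sketch below, which uses the standard textbook definitions; the function names are hypothetical, and the exact variants used in [38]-[41] (for example, whether MAPE is reported as a percentage) are assumed rather than confirmed. `obs` and `sim` stand for the observed and interpolated annual runoff series of one station.

```python
import math


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error, in percent (obs must be nonzero)."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def nash_sutcliffe(obs, sim):
    """Nash and Sutcliffe coefficient E; 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def correlation(obs, sim):
    """Pearson coefficient of correlation R."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)
```

Note that R measures only how well the two series co-vary, so a biased interpolation can still have R close to 1 while E is much lower; this is why the two are reported together.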
The test site Lanzhou is located in the upper reaches of the Yellow River, which are little affected by human activities, while the test site Lijin is located in the lower reaches of the Yellow River, which are intensely influenced by human activities. The site observation data therefore reflect the impact of human activities on the natural runoff. In this case, the RMSE, R, E, and MAPE statistics of the GIDM interpolation results at the Lijin site are 5.97 (×10⁸ m³), 0.9260, 0.8213, and 18.42%, which are a little worse than those at the Lanzhou site. This indicates that interpolation and prediction in a region intensely influenced by human activities are more difficult than in a region little affected by human activities, but on the whole the interpolation results are still accurate.

Experiment 2: Interpolating Annual Runoff of Three Stations Based on the Data of Other Six Stations.
In experiment 2, the annual natural runoff data of the remaining six mainstream sites in Table 1 are known for modelling, while the annual natural runoff data of the Lanzhou site (located upstream on the Yellow River), the Huayuankou site (located midstream on the Yellow River), and the Lijin site (located downstream on the Yellow River) are treated as unknown for testing (Figure 1) and are obtained through spatial interpolation. The actual annual runoff data of the Lanzhou, Huayuankou, and Lijin sites in Table 1 can be used to test the results of spatial interpolation. Through calculation steps similar to those of experiment 1, the annual natural runoff of the three sites from 2002 to 2011 can be obtained. Figure 3 shows the reconstructed and observed runoff from the different models. The correlation between reconstructed and observed runoff is best for the GIDM among the five models, although some slight oversimulations and undersimulations exist. Table 4 compares the five models, and the GIDM method obtains the best accuracy in terms of the different evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 7.77 (×10⁸ m³), 0.9170, 0.8218, and 20.50%. Compared with the average results of GIDM in Table 3, the interpolating results are worse in experiment 2 than in experiment 1. In addition, the simulations of all five models deteriorate as fewer data are available for training. That is because in experiment 2 there are only six sites for training and the samples contain less information about the runoff than in experiment 1. But the average MAPE of GIDM is 20.50% and the average R is above 0.9, which indicates that the general results of GIDM are still good and acceptable, considering that there are only six training samples.
As in experiment 1, interpolation and prediction in the region intensely influenced by human activities (Huayuankou and Lijin sites) are more difficult than in the region little affected by human activities (Lanzhou site), but on the whole the interpolation results are still accurate.

Experiments on Estimating Annual Runoff Time Series at Stations on the Tributaries of the Yellow River.
In this section, two real examples using the annual runoff data (×10⁸ m³) from 2002 to 2011, taken at eight stations on the tributaries of the Yellow River (Figure 1), illustrate the performance of the different models on the tributaries of the same watershed.

Experiment 3: Interpolating Annual Runoff of Two Stations Using the Data of Other Six Stations.
In experiment 3, from 2002 to 2011, the annual natural runoff data of six sites (Huangfu, Wenjiachuan, Baijiachuan, Ganguyi, Zhangjiashan, and Huaxian) on the tributaries of the Yellow River are known for modelling, while the annual natural runoff data of two sites (Hongqi and Baimasi) on the tributaries of the Yellow River are treated as unknown for testing (Figure 1) and are obtained through spatial interpolation. The actual annual runoff data of the Hongqi and Baimasi sites can be used to test the results of spatial interpolation. Through calculation steps similar to those of experiment 1, the annual natural runoff of the two sites from 2002 to 2011 can be obtained.
Figure 4 shows the reconstructed and observed runoff from the different models. The correlation between reconstructed and observed runoff is best for the GIDM among the five models, although some slight oversimulations and undersimulations exist. Table 5 compares the five models, and the GIDM method obtains the best accuracy in terms of the different evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 2.975 (×10⁸ m³), 0.9184, 0.8197, and 32.16%. Compared with the average results of all models in Table 4, the interpolating results of all models in experiment 3 are worse than those in experiment 2, which also has only six training samples. For example, the average MAPE of GIDM is 32.16% in experiment 3, while that in experiment 2 is only 20.50%. The reason is that the sites in experiment 3 on the tributaries of the Yellow River belong to different subrivers: for example, the Hongqi site belongs to the Tao subriver, Huangfu to the Huangfuchuan subriver, Wenjiachuan to the Kuye subriver, and Baimasi to the Beiluo subriver. So the underlying surface and river hydrological processes are more complex, which makes the interpolation and prediction more difficult. Also, the data magnitudes of the nine sites on the mainstream of the Yellow River in experiments 1 and 2 are not greatly different from each other, while the data magnitudes of the eight sites on the tributaries in experiment 3 differ widely. For example, the maximum ten-year average annual runoff is at Huaxian, 52.89 (×10⁸ m³), and the minimum is at Huangfuchuan, only 0.3022 (×10⁸ m³). Such large differences among the runoff magnitudes at different sites can also affect the interpolation results. But the average MAPE of GIDM is 32.16% and the average R is above 0.9 in experiment 3, which indicates that the general results of GIDM could still be acceptable.

Experiment 4: Interpolating Annual Runoff of Four Stations Using the Data of Other Four Stations.
In experiment 4, from 2002 to 2011, the annual natural runoff data of four sites (Huangfu, Wenjiachuan, Baijiachuan, and Zhangjiashan) on the tributaries of the Yellow River are known for modelling, while the annual natural runoff data of four sites (Hongqi, Ganguyi, Baimasi, and Huaxian) on the tributaries of the Yellow River are treated as unknown for testing (Figure 1) and are obtained through spatial interpolation. The actual annual runoff data of the Hongqi, Ganguyi, Baimasi, and Huaxian sites can be used to test the results of spatial interpolation. Through calculation steps similar to those of experiment 1, the annual natural runoff of the four sites from 2002 to 2011 can be obtained.
Figure 5 shows the reconstructed and observed runoff from the different models. The correlation between reconstructed and observed runoff is best for the GIDM among the five models, although some slight oversimulations and undersimulations exist. Table 6 compares the five models, and the GIDM method obtains the best accuracy in terms of the different evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 3.21 (×10⁸ m³), 0.9033, 0.7891, and 39.32%. Compared with the averages reported for experiment 3 in Table 5, the interpolating results of GIDM are worse in experiment 4; for example, the average MAPE rises from 32.16% to 39.32%. Nevertheless, the average R is above 0.9, which indicates that the general results of GIDM could still be acceptable. Considering that only four sample sites are used for modelling, the spatial interpolation of the GIDM method is still good.

Experiments on Estimating Annual Runoff Time Series at Stations on the Mainstream and Tributaries of the Yellow River.
In this section, two real examples using the annual runoff data (×10⁸ m³) from 2002 to 2011, taken at 17 stations on the mainstream and tributaries of the Yellow River (Figure 1), illustrate the performance of the different models when the mainstream and tributaries of the same watershed are mixed together.

Experiment 5: Interpolating Annual Runoff of Three Stations Using the Data of Other 14 Stations.
In experiment 5, from 2002 to 2011, the annual natural runoff data of 14 sites (Tangnaihai, Toudaoguai, Longmen, Huangfu, Wenjiachuan, Baijiachuan, Zhangjiashan, etc.; 7 sites on the mainstream and 7 sites on the tributaries of the Yellow River) are known for modelling, while the annual natural runoff data of three sites (Lanzhou, Lijin, and Hongqi; 2 sites on the mainstream and 1 site on the tributaries of the Yellow River) are treated as unknown for testing (Figure 1) and are obtained through spatial interpolation. The actual annual runoff data of the Lanzhou, Lijin, and Hongqi sites can be used to test the results of spatial interpolation. Through calculation steps similar to those of experiment 1, the annual natural runoff of the three sites from 2002 to 2011 can be obtained.
Figure 6 shows the reconstructed and observed runoff from the different models. The correlation between reconstructed and observed runoff is best for the GIDM among the five models, although some slight oversimulations and undersimulations exist. Table 7 compares the five models, and the GIDM method obtains the best accuracy in terms of the different evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 5.86 (×10⁸ m³), 0.9348, 0.8277, and 25.16%. Compared with the average results of all models in Table 5, the interpolating results of all models are better in experiment 5 than in experiment 3. For example, the average MAPE of GIDM is 25.16% in experiment 5, while that in experiment 3 is 32.16%. As in experiment 3, the underlying surface and river hydrological processes of the mainstream and tributaries of the Yellow River mixed together are complex, which makes the interpolation and prediction difficult, and the large differences among the runoff at different sites can also affect the interpolation results. However, experiment 5 has many sample sites (14) and only three test sites. More samples contain more information, which makes the final interpolation and prediction results better than those in experiment 3.
The average MAPE of GIDM is 25.16% and the average R is above 0.9, which indicates that the general results of GIDM are good and acceptable.

Experiment 6: Interpolating Annual Runoff of Six Stations Using the Data of Other 11 Stations.
In experiment 6, from 2002 to 2011, the annual natural runoff data of 11 sites (Tangnaihai, Toudaoguai, Longmen, Huangfu, Wenjiachuan, Baijiachuan, etc.; 6 sites on the mainstream and 5 sites on the tributaries of the Yellow River) are known for modelling, while the annual natural runoff data of six sites (Lanzhou, Huayuankou, Lijin, Hongqi, Ganguyi, and Huaxian; 3 sites on the mainstream and 3 sites on the tributaries of the Yellow River) are treated as unknown for testing (Figure 1) and are obtained through spatial interpolation. The actual annual runoff data of the Lanzhou, Huayuankou, Lijin, Hongqi, Ganguyi, and Huaxian sites can be used to test the results of spatial interpolation. Through calculation steps similar to those of experiment 1, the annual natural runoff of the six sites from 2002 to 2011 can be obtained. Figure 7 shows the reconstructed and observed runoff from the different models. The correlation between reconstructed and observed runoff is best for the GIDM among the five models, although some slight oversimulations and undersimulations exist. Table 8 compares the five models, and the GIDM method obtains the best accuracy in terms of the different evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 6.63 (×10⁸ m³), 0.8719, 0.7669, and 40.56%. Compared with the average results of all models in Table 7, the interpolating results of all models are worse in experiment 6 than in experiment 5. That is because in experiment 6 the number of sample sites is drastically reduced and the samples include less information about the runoff, which makes the interpolation results worse.
Compared with the average results of all models in Table 5, the interpolating results of all models are also worse in experiment 6 than in experiment 3. For example, the average MAPE of GIDM is 40.56% in experiment 6, while that in experiment 3 is only 32.16%. This shows that, with a small training sample, interpolation on the mainstream and tributaries of the Yellow River mixed together is more difficult than in experiment 3. That is because the underlying surface and river hydrological processes of the mainstream and tributaries mixed together are more complex than those in experiment 3, which makes the interpolation and prediction more difficult. Also, the differences in data magnitude among the 17 sites on the mainstream and tributaries in experiment 6 (a factor of 1000) are larger than those among the 8 sites on the tributaries in experiment 3 (a factor of 100). Such large differences among the runoff magnitudes at different sites can also affect the interpolation results.
Previous studies also found that the spatial interpolation results on the mainstream and tributaries of the same river basin mixed together are slightly worse than those only on the mainstream or only on the tributaries [24,42,43]. For example, Yan et al. [24] carried out a case study of the Huaihe river basin above Bengbu based on a hydrostochastic approach and likewise found that interpolation on the mainstream and tributaries mixed together is difficult. In this experiment, the average MAPE of GIDM is 40.56% and the average R is above 0.85. Considering the small number of sample sites and the difficulty of interpolation on the mainstream and tributaries of the Yellow River mixed together, the general results of GIDM are good and acceptable.

Experiment 7 concerns the Daying mine region, which is located in the western part of the Guizhou Plateau and has a north subtropical monsoon warm-wet climate. The annual rainfall is about 1000 mm and the region is a rain-fed river basin. The Daying mine region belongs to the upstream basin of the Wuxi River, and Liulong is the only hydrological station in this region. In this experiment, the Daying mine region is treated as a nonclosure small watershed with no runoff information (the Liulong site serves as the test site). There are four hydrological sites (Yusha, Baina, Gaofeng, and Xiangshui) near the mine basin. The annual runoff data of each hydrological station have been reorganized and comprehensively checked, which makes the data available and reliable. As in experiment 1, from 2002 to 2011, the annual natural runoff data of the 4 sites (Yusha, Baina, Gaofeng, and Xiangshui) are known for modelling, while the annual natural runoff data of the Liulong site in the Daying mine region are treated as unknown for testing and are obtained through spatial interpolation.
Table 9 compares the five models; the GIDM method obtains the best accuracy among them on all evaluation measures. The averages of RMSE, R, E, and MAPE of the GIDM are 4.55 (×10⁵ m³), 0.9352, 0.8204, and 27.54%, respectively. Comparing with the average results of the preceding experiments, we can see that, for a nonclosure small watershed with no runoff information and few training samples, the interpolation results of GIDM are still accurate, indicating that our model has good general applicability and reliability. Compared with previous models, the advantages of our model are as follows:
(1) The GIDM model is a nonlinear information diffusion model, which is in line with the changes in hydrological runoff.
(2) The number of training sites required is not high, as in experiment 4. Even with few training samples, the spatial structure of hydrological variables is simulated well and the interpolation results are accurate.
(3) The interpolation process of many geostatistical methods needs added water balance constraints, which control the export flows of each subbasin. Before the runoff calculation, the runoff data must be normalized [22]. In their interpolation process, in order to measure the correlation between basins, the distance between subbasins used by the interpolation algorithm needs to be redefined [23]. The entire calculation is complex and the conversion is troublesome. In contrast, the calculation process of our GIDM model is simple and convenient to operate.
(4) The tests show that the interpolation results of our model for watersheds intensely influenced by human activities (such as the Lijin site in experiments 1 and 2) are also very good.

Conclusions
(1) GIDM is a useful tool for the spatial interpolation of river runoff. Traditional methods need much more data to obtain acceptable results. Through six experiments of three types, we verify that the interpolation results of the GIDM method are not only better than those of the traditional IDW and COK methods but also improve on the traditional IDM interpolation by about 10 to 40%.
(2) In the middle and lower reaches of the Yellow River, which are intensely influenced by human activities, the site observation data reflect the impact of human activities on the natural runoff. Even in this situation, the interpolation results of the GIDM method in the study area are still good, which overcomes the limitation that traditional hydrological interpolation methods give good results only in natural watersheds.
(3) The previous six experiments show that the interpolation results of the GIDM method on large rivers are good. Experiment 7 shows that, for a nonclosure small watershed with no runoff information and few training samples, the interpolation results of GIDM are still accurate, indicating that our model has good general applicability and reliability.
(4) From the four different types of experiments, we can conclude that the number of modelling sites, the differences in the underlying surface and river hydrological processes among sites, and the differences in data magnitude among sites are three key factors affecting the interpolation results of our GIDM method. This is why the interpolation and prediction results in experiment 6 are the worst among all the experiments. Even so, the average MAPE of GIDM in experiment 6 is under 45% and the average  is above 0.85, still in the acceptable range.
Although the new IDM is a useful tool for the spatial interpolation of river runoff, its results are still not acceptable when the sample is too small. Our future work will focus on such issues; for example, we can establish a more useful diffusion function or reduce the computation time of the IDMs.

Figure 1: Map of the Yellow River basin showing the location of hydrological stations.

Table 1: The latitude, longitude, altitude, catchment area, and annual runoff from 2002 to 2011 of nine sites on the mainstream of the Yellow River.

Table 2: The corresponding diffusion coefficients of Experiment 1.

Table 3: The RMSE, R, E, and MAPE of five different models at the Lanzhou site and Lijin site (boldface indicates the best performance).

Table 4: The RMSE, R, E, and MAPE of five different models at the Lanzhou site, Huayuankou site, and Lijin site (boldface indicates the best performance).

Table 5: The RMSE, R, E, and MAPE of five different models at the Hongqi site and Baimasi site (boldface indicates the best performance).

Table 6: The RMSE, R, E, and MAPE of five different models at the Hongqi, Ganguyi, Baimasi, and Huaxian sites (boldface indicates the best performance).

Comparing Table 6 with Table 5, we can see that the test results of the other four models vary much more than those of the GIDM method, whose test results deteriorate only slightly. For example, the MAPE of OIDM in Table 6 is 66.30%, a 34.51% increase over that of OIDM in Table 5, while the MAPE of GIDM in Table 6 is 39.32%, a 22.26% increase over that of GIDM in Table 5. Similarly, the reductions in R and E of GIDM are about 2.81% and 5.21%, which also indicates that the results of GIDM in experiment 4 are only a little worse than those in experiment 3.
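The percentage increases quoted for the MAPE appear to be relative changes rather than differences in percentage points. As a small check, assuming that the GIDM value in Table 5 is the 32.16% average MAPE reported for experiment 3 in the text, the sketch below reproduces the 22.26% figure and back-calculates the OIDM value in Table 5 that the quoted 34.51% increase would imply. The function names are illustrative, and the implied OIDM baseline is a derived value, not one read from the table.

```python
def relative_increase(baseline, new_value):
    """Relative increase from baseline to new_value, in percent."""
    return 100.0 * (new_value - baseline) / baseline


def implied_baseline(new_value, increase_percent):
    """Baseline that yields new_value after a relative increase (percent)."""
    return new_value / (1.0 + increase_percent / 100.0)


# GIDM: 32.16% (experiment 3, assumed to be the Table 5 value) -> 39.32% (Table 6).
gidm_increase = relative_increase(32.16, 39.32)

# OIDM: 66.30% in Table 6 is stated to be a 34.51% increase over Table 5.
oidm_table5_mape = implied_baseline(66.30, 34.51)

print(f"GIDM relative MAPE increase: {gidm_increase:.2f}%")
print(f"Implied OIDM MAPE in Table 5: {oidm_table5_mape:.2f}%")
```

Under this reading the GIDM increase comes out at about 22.26%, matching the text, which supports interpreting both quoted increases as relative changes.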

Table 7: The RMSE, R, E, and MAPE of five different models at the Lanzhou site, Lijin site, and Hongqi site (boldface indicates the best performance).

Table 9: The RMSE, R, E, and MAPE of five different models at the Liulong site (boldface indicates the best performance).

Discussion
Although Prediction in Ungauged Basins (PUB) has attracted increasing attention in recent years and much research has been carried out, research in this area still has much room for improvement owing to the complexity of the problem and the limitations of the methods. For example, Srinivasan et al. [44] propose a framework for developing Soil and Water Assessment Tool (SWAT) input data, including hydrography, terrain, land use, soil, tile, weather, and management practices, for the Upper Mississippi River basin (UMRB). The uncalibrated SWAT model ably predicts annual streamflow at 11 USGS gauges and crop yield at a four-digit hydrologic unit code (HUC) scale. But there is an obvious error between their simulated results and the observed results. Recently, regionalization has been widely used in PUB (Jin et al., 2008; [45-47]). Regionalization is typically used to estimate parameter values of hydrological predictive tools for catchments without observed streamflow. But the prediction results are still not satisfactory and should be improved. For example, Jin et al. (2008) carried out a regionalization study of a conceptual hydrological model in the Dongjiang basin, south China. The hypothetical ungauged catchments produced acceptable results, with an average coefficient of efficiency (ME) equal to 0.72. Li et al. [47] propose a new regionalization method, called the index model, and predict flow duration curves in ungauged basins. The average coefficient of efficiency of their model is 0.722.
Our paper proposes a new method for reconstructing river runoff from incomplete data and thereby provides a new approach to the Prediction in Ungauged Basins (PUB) problem. The algorithm can improve the IDMs by unraveling more information. Conventional IDMs, IDW, and COK are used for comparison. Seven experiments based on GIDM for runoff interpolation at different stations on the Yellow River are carried out and compared with SIDM, OIDM, IDW, and COK. These experiments can be divided into five groups. The advantages of our method are as follows: