Uncertainty Analysis of Multiple Hydrologic Models Using the Bayesian Model Averaging Method

Research Article

Leihua Dong (1, 2), Lihua Xiong (1) (http://orcid.org/0000-0001-6990-2414), Kun-xia Yu (1)

(1) State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan 430072, China (whu.edu.cn)
(2) National Research Center for Sustainable Hydropower Development, China Institute of Water Resources and Hydropower Research, Beijing 100038, China

Academic Editor: Y. P. Li

Journal of Applied Mathematics (JAM), ISSN 1687-0042, 1110-757X, Hindawi Publishing Corporation, Volume 2013, Article ID 346045, doi:10.1155/2013/346045

Received 30 August 2013; Accepted 6 November 2013; Published 5 December 2013

Copyright © 2013 Leihua Dong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Because the Bayesian Model Averaging (BMA) method can combine the forecasts of different models into a new forecast that is expected to be better than any individual model’s forecast, it has been widely used in hydrology for ensemble hydrologic prediction. Previous studies of BMA mostly focused on comparing the BMA mean prediction with each individual model’s prediction. Because BMA can also provide a statistical distribution of the quantity to be forecasted, this study shifts the focus to comparing the prediction uncertainty interval generated by BMA with that of each individual model under two different BMA combination schemes. In the first BMA scheme, three models are each calibrated under the same Nash-Sutcliffe efficiency objective function, providing a three-member prediction ensemble for the BMA combination. In the second BMA scheme, all three models are each calibrated under three different objective functions other than Nash-Sutcliffe efficiency to obtain a nine-member prediction ensemble. Finally, the model efficiency and the uncertainty intervals of each individual model and of the two BMA combination schemes are assessed and compared.

1. Introduction

To date, various hydrological models have been put forward and widely used in flood forecasting, planning, and water resources management [1, 2]. Since different models have strengths in capturing different aspects of real-world processes, combining the results from diverse models through weighting procedures can perform better than any individual model. Early research on model combination in hydrologic forecasting employed tools such as neural networks and fuzzy systems. Recently, Bayesian Model Averaging (BMA), a method for averaging over different competing models, has been introduced to ensemble hydrologic prediction.

Bayesian Model Averaging came to prominence in statistics in the mid-1990s, and Madigan and Raftery were the first to propose this method for combining predictions. Subsequently, Raftery and Draper gave more detailed discussions of BMA. It has been applied in diverse fields such as economics, biology, ecology, public health, toxicology, meteorology, and management science. In many case studies, BMA has produced accurate and reliable predictions and has been shown to be a better scheme than other model-combining methods. In recent years, hydrologists have also applied BMA to hydrologic modeling, such as groundwater and rainfall-runoff modeling.

A prediction from a single model is recognized to carry a certain degree of uncertainty, and so does a prediction obtained by combining a number of different single models. Thus, uncertainty analysis is an indispensable element of any hydrologic modeling study. The uncertainty usually arises from errors in the calibration of parameters, the design of the model structure, and the measurement of input and output data [25, 26]. To account for these uncertainties, many uncertainty analysis techniques have been developed and applied to diverse catchments, such as Generalized Likelihood Uncertainty Estimation (GLUE), Parameter Solution (ParaSol), and Bayesian inference based on Markov chain Monte Carlo (MCMC) [27, 28]. Each of these techniques has its own advantages in uncertainty analysis. In the uncertainty analysis of the BMA scheme, the Monte Carlo composition method is used to generate BMA probabilistic ensemble predictions, and the 90% uncertainty interval is then taken as the range between the 5% and 95% quantiles.

Previous studies of BMA in hydrology mostly focused on comparing the BMA mean prediction with each individual model’s prediction, in order to demonstrate the better performance of the prediction after weighted averaging. Because BMA can also provide a statistical distribution of the quantity to be forecasted, this study shifts the focus to comparing the prediction uncertainty interval generated by BMA with that of each individual model, in order to see whether BMA can also improve the prediction reliability. The technical route of the research in this paper is described in Figure 1. A second purpose of this paper is to construct different sets of ensemble members for combination, so as to fully explore the advantages of BMA; this is done by calibrating different hydrological models under different objective functions, each of which has a distinctive advantage in modeling a certain flow range. Therefore, two kinds of BMA combination schemes are designed, analyzed, and compared. In the first BMA scheme, we calibrate each of the three models under the same Nash-Sutcliffe efficiency objective function, thus providing a three-member prediction ensemble for the BMA combination. In the second BMA scheme, three different objective functions other than Nash-Sutcliffe efficiency are adopted, each of which is supposed to have some advantage in simulating a certain range of flows (low flow, medium flow, and high flow). Each of the three models is calibrated under each of the three objective functions to obtain the optimized parameter sets.

Figure 1: Flowchart of using BMA scheme for hydrological ensemble prediction as well as for prediction uncertainty analysis.

2. Methods

2.1. Bayesian Model Averaging

Bayesian Model Averaging (BMA) is a statistical technique designed to infer a prediction by weighted averaging over many different competing models. This method is not only a scheme for model combination but also a coherent approach for accounting for between-model and within-model uncertainty. Below is a brief description of the basic ideas of this method.

Let us consider a quantity $Q$ to be predicted on the basis of input data $D = [X, Y]$ ($X$ denotes the input forcing data, and $Y$ stands for the observational flow data). $f = [f_1, f_2, \ldots, f_K]$ is the ensemble of the $K$-member predictions. The probabilistic prediction of BMA is given by

$$p(Q \mid D) = \sum_{k=1}^{K} p(f_k \mid D) \cdot p_k(Q \mid f_k, D). \tag{1}$$

The terms in (1) are explained as follows. $p(f_k \mid D)$ is the posterior probability of the prediction $f_k$ given the input data $D$ and reflects how well model $f_k$ fits $Y$. In fact, $p(f_k \mid D)$ is just the BMA weight $w_k$, and better performing predictions receive higher weights than worse performing ones; all weights are positive and should add up to 1. $p_k(Q \mid f_k, D)$ is the conditional probability density function (PDF) of the predictand $Q$ conditional on $f_k$ and $D$. For computational convenience, $p_k(Q \mid f_k, D)$ is always assumed to be a normal PDF and is represented as $g(Q \mid f_k, \sigma_k^2) \sim N(f_k, \sigma_k^2)$, where $\sigma_k^2$ is the variance associated with model prediction $f_k$ and observations $Y$. In order to make this assumption valid, techniques such as the Box-Cox transformation are needed to make the data approximately normally distributed and to narrow the data range.
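The Box-Cox transformation mentioned above can be sketched as follows. This is only an illustration: the paper does not report the transformation coefficient, so the value `lam = 0.3` and the sample discharges are hypothetical, and the function names are introduced here for the example.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values.

    lam is the transformation coefficient (hypothetical here);
    lam == 0 reduces to the natural logarithm.
    """
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def inverse_box_cox(values, lam):
    """Map transformed values back to the original flow scale."""
    if lam == 0:
        return [math.exp(z) for z in values]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in values]

# Illustrative discharges in m3/s (not data from the Mumahe catchment).
flows = [2.0, 15.0, 120.0, 900.0]
lam = 0.3  # assumed coefficient, chosen only for the example
transformed = box_cox(flows, lam)
recovered = inverse_box_cox(transformed, lam)
```

The transformed series spans a much narrower range than the raw flows, which is the "narrowing" effect described in the text; predictions and intervals computed in the transformed space would be mapped back with the inverse transform.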

The BMA mean prediction is a weighted average of the individual model’s predictions, with their posterior probabilities being the weights. In the case that the observations and individual model predictions are all normally distributed, the BMA mean prediction can be expressed as

$$E[Q \mid D] = \sum_{k=1}^{K} p(f_k \mid D) \cdot E\left[g(Q \mid f_k, \sigma_k^2)\right] = \sum_{k=1}^{K} w_k f_k. \tag{2}$$
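As a small numerical illustration of (1) and (2), the sketch below evaluates the BMA mixture density and the BMA mean for one time step. The forecasts, weights, and variances are made-up numbers, and the function names are chosen for this example only.

```python
import math

def normal_pdf(q, mean, variance):
    """Density of N(mean, variance) evaluated at q."""
    return math.exp(-(q - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def bma_pdf(q, forecasts, weights, variances):
    """Mixture density of equation (1) with normal conditional PDFs."""
    return sum(w * normal_pdf(q, f, v) for f, w, v in zip(forecasts, weights, variances))

def bma_mean(forecasts, weights):
    """Weighted mean of equation (2)."""
    return sum(w * f for f, w in zip(forecasts, weights))

# Hypothetical three-member ensemble at a single time step.
forecasts = [10.0, 12.0, 9.0]
weights = [0.5, 0.3, 0.2]      # must be positive and sum to 1
variances = [1.0, 2.0, 1.5]

mean_prediction = bma_mean(forecasts, weights)  # 0.5*10 + 0.3*12 + 0.2*9 = 10.4
density_at_mean = bma_pdf(mean_prediction, forecasts, weights, variances)
```

Because the weights sum to 1 and each component is a proper PDF, the mixture also integrates to 1, which is what makes quantile-based uncertainty intervals meaningful.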

2.2. EM Algorithm for BMA Parameter Estimation

To estimate the BMA weight $w_k$ and the model prediction variance $\sigma_k^2$, the Expectation-Maximization (EM) algorithm is described in this section. It has proved to be an efficient technique for BMA calculation under the assumption that the $K$-member predictions are normally distributed.

Firstly, if we denote the set of BMA parameters to be estimated by $\theta = \{w_k, \sigma_k^2,\ k = 1, 2, \ldots, K\}$, the log form of the likelihood function can be represented as

$$l(\theta) = \log\left(p(Q \mid D)\right) = \log\left(\sum_{k=1}^{K} w_k \cdot g(Q \mid f_k, \sigma_k^2)\right). \tag{3}$$

It is difficult to maximize the function (3) analytically. The EM algorithm is a method for finding the maximum likelihood by alternating between two steps, the expectation step and the maximization step. The two steps are iterated until convergence, that is, until there is no significant change between two consecutive log-likelihood estimates. In the EM algorithm, a latent variable (unobserved quantity) $z_{kt}$ is used as an aid for estimating the BMA weight $w_k$. The procedure of the EM algorithm for the BMA scheme is described as follows.

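(1) Initialization. Set $\mathrm{Iter} = 0$ and initialize

$$w_k^{(0)} = \frac{1}{K}, \qquad \sigma_k^{2\,(0)} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T} (Y_t - f_{kt})^2}{K \cdot T}, \tag{4}$$

where Iter is the iteration number and $T$ is the number of data points in the calibration period. $Y_t$ and $f_{kt}$ denote the observation and the corresponding prediction by the $k$th model at time $t$.

(2) Calculate the Initial Likelihood:

$$l(\theta)^{(0)} = \sum_{t=1}^{T} \log\left(\sum_{k=1}^{K} \left(w_k^{(0)} \cdot g\left(Q \mid f_{kt}, \sigma_k^{2\,(0)}\right)\right)\right). \tag{5}$$

(3) Compute the Latent Variable. Set $\mathrm{Iter} = \mathrm{Iter} + 1$, then calculate

$$z_{kt}^{(\mathrm{Iter})} = \frac{g\left(Q \mid f_{kt}, \sigma_k^{2\,(\mathrm{Iter}-1)}\right)}{\sum_{k=1}^{K} g\left(Q \mid f_{kt}, \sigma_k^{2\,(\mathrm{Iter}-1)}\right)}. \tag{6}$$

(4) Update the Weight:

$$w_k^{(\mathrm{Iter})} = \frac{1}{T} \left(\sum_{t=1}^{T} z_{kt}^{(\mathrm{Iter})}\right). \tag{7}$$

(5) Update the Variance:

$$\sigma_k^{2\,(\mathrm{Iter})} = \frac{\sum_{t=1}^{T} z_{kt}^{(\mathrm{Iter})} \cdot (Y_t - f_{kt})^2}{\sum_{t=1}^{T} z_{kt}^{(\mathrm{Iter})}}. \tag{8}$$

(6) Update the Likelihood:

$$l(\theta)^{(\mathrm{Iter})} = \sum_{t=1}^{T} \log\left(\sum_{k=1}^{K} \left(w_k^{(\mathrm{Iter})} \cdot g\left(Q \mid f_{kt}, \sigma_k^{2\,(\mathrm{Iter})}\right)\right)\right). \tag{9}$$

(7) Check for Convergence. If $l(\theta)^{(\mathrm{Iter})} - l(\theta)^{(\mathrm{Iter}-1)}$ is less than a prespecified tolerance level, stop the whole estimation procedure; else go back to Step (3).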
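Steps (1) to (7) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation. Two points are assumptions of the sketch: (a) the density $g$ is evaluated at the observation $Y_t$; (b) by default each density in the latent-variable step is multiplied by its current weight, which is the standard EM update for a finite mixture, whereas equation (6) as printed normalizes the densities without the weights. Passing `weight_in_e_step=False` follows (6) literally. The synthetic series, the numerical floors, and all function names are hypothetical.

```python
import math

def normal_pdf(y, mean, variance):
    """Density of N(mean, variance) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def log_likelihood(obs, preds, weights, variances):
    """Log-likelihood of the normal mixture, as in (5) and (9)."""
    total = 0.0
    for t, y in enumerate(obs):
        mix = sum(w * normal_pdf(y, p[t], v) for w, p, v in zip(weights, preds, variances))
        total += math.log(max(mix, 1e-300))  # floor guards against log(0)
    return total

def bma_em(obs, preds, tol=1e-6, max_iter=500, weight_in_e_step=True):
    """Estimate BMA weights and variances; preds is a list of K series of length T."""
    n_models, n_times = len(preds), len(obs)
    # Step 1: equal weights and one pooled variance for every model, as in (4).
    weights = [1.0 / n_models] * n_models
    pooled = sum((obs[t] - p[t]) ** 2 for p in preds for t in range(n_times)) / (n_models * n_times)
    variances = [pooled] * n_models
    # Step 2: initial log-likelihood.
    ll_old = log_likelihood(obs, preds, weights, variances)
    for _ in range(max_iter):
        # Step 3: latent variable z[k][t].
        z = [[0.0] * n_times for _ in range(n_models)]
        for t in range(n_times):
            dens = [normal_pdf(obs[t], preds[k][t], variances[k]) for k in range(n_models)]
            if weight_in_e_step:  # assumption: standard mixture EM; False follows (6) as printed
                dens = [weights[k] * dens[k] for k in range(n_models)]
            norm = sum(dens) or 1e-300
            for k in range(n_models):
                z[k][t] = dens[k] / norm
        # Step 4: weights, as in (7).
        weights = [sum(z[k]) / n_times for k in range(n_models)]
        # Step 5: variances, as in (8), with a small floor for numerical safety.
        variances = [
            max(sum(z[k][t] * (obs[t] - preds[k][t]) ** 2 for t in range(n_times))
                / max(sum(z[k]), 1e-300), 1e-12)
            for k in range(n_models)
        ]
        # Steps 6 and 7: update the likelihood and check convergence.
        ll_new = log_likelihood(obs, preds, weights, variances)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return weights, variances

# Synthetic example: three hypothetical "models" with different error patterns.
obs = [10.0 + 5.0 * math.sin(t / 5.0) for t in range(60)]
preds = [
    [y + 0.5 * math.cos(t) for t, y in enumerate(obs)],
    [y + 2.0 * math.sin(1.7 * t + 0.3) for t, y in enumerate(obs)],
    [1.2 * y for y in obs],
]
em_weights, em_variances = bma_em(obs, preds)
```

In practice the series passed to such a routine would be the Box-Cox transformed observations and predictions, since the normality assumption is made in the transformed space.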

2.3. Estimation of Prediction Uncertainty Interval

After the BMA weight $w_k$ and prediction variance $\sigma_k^2$ have been estimated, we use the Monte Carlo composition method to generate BMA probabilistic predictions for any time $t$. The procedure is described as follows.

(1) Generate an integer value of $k$ from $[1, 2, \ldots, K]$ with probabilities $[w_1, w_2, \ldots, w_K]$. A specific procedure is as follows.

(a) Set the cumulative weight $w'_0 = 0$ and compute $w'_k = w'_{k-1} + w_k$ for $k = 1, 2, \ldots, K$.

(b) Generate a random number $u$ between 0 and 1.

(c) If $w'_{k-1} \le u < w'_k$, choose the $k$th member of the ensemble predictions.

(2) Generate a value of $Q_t$ from the PDF $g(Q_t \mid f_{kt}, \sigma_k^2)$. Here, $g(Q_t \mid f_{kt}, \sigma_k^2)$ represents the normal distribution with mean $f_{kt}$ and variance $\sigma_k^2$.

(3) Repeat steps (1) and (2) $M$ times, where $M$ is the probabilistic ensemble size. In this paper, we set $M = 100$.

After generating the BMA probabilistic ensemble predictions, sort them in the ascending order. Then the 90% uncertainty intervals can be derived within the range of the 5% and 95% quantiles.

For each individual model in the BMA scheme, the prediction uncertainty interval can also be constructed, with the Monte Carlo sampling method still being used to approximate the assumed PDF $g(Q_t \mid f_{kt}, \sigma_k^2)$.
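The sampling procedure of this section can be sketched for a single time step as follows. The forecasts, weights, and variances are hypothetical; the paper does not state how the 5% and 95% quantiles are read from the 100 sorted samples, so the linear-interpolation rule in `quantile` is an assumption of this sketch.

```python
import random

def sample_bma(forecasts_t, weights, variances, m=100, rng=None):
    """Draw m values from the BMA mixture at one time step and return them sorted."""
    rng = rng or random.Random()
    # Cumulative weights w'_1, ..., w'_K.
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running)
    samples = []
    for _ in range(m):
        u = rng.random()
        # Pick the first k with u < w'_k; fall back to the last member if rounding
        # leaves the final cumulative weight slightly below 1.
        k = next((i for i, c in enumerate(cumulative) if u < c), len(weights) - 1)
        # Draw from N(f_kt, sigma_k^2); gauss() expects a standard deviation.
        samples.append(rng.gauss(forecasts_t[k], variances[k] ** 0.5))
    return sorted(samples)

def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

# Hypothetical ensemble at one time step.
forecasts_t = [10.0, 12.0, 9.0]
weights = [0.5, 0.3, 0.2]
variances = [1.0, 2.0, 1.5]

samples = sample_bma(forecasts_t, weights, variances, m=100, rng=random.Random(42))
lower_bound = quantile(samples, 0.05)
upper_bound = quantile(samples, 0.95)
```

For an individual model, the same code applies with a single forecast, weight 1, and that model's variance. Repeating the call for every time step yields the lower and upper bound series of the 90% uncertainty interval.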

3. Materials

3.1. Study Area and Data

The study area is the Mumahe catchment, drained by a tributary of the Han River. It is located in Shaanxi Province of China and has a total area of 1224 km2. The basin has a subtropical climate, and the area is humid with fairly high precipitation. The mean annual rainfall for the period of 1980–1987 is 1070 mm, and the mean annual runoff is 687 mm, or roughly 64% of the annual rainfall. The hydrological data include daily runoff, rainfall, and evaporation. There are 2992 data points in total; 1825 of them (the period of 1980.1.1–1985.12.31) are used for calibration, while the remaining 1167 data points (the period of 1986.1.1–1987.12.31) are used for validation.

3.2. Hydrological Models and Optimization Algorithm

In this study, three conceptual hydrological models are employed for testing the capability of BMA: the Xinanjiang Rainfall-Runoff Model (XAJ), the Soil Moisture Accounting and Routing Model (SMAR), and SIMHYD Rainfall-Runoff Model.

The Xinanjiang Rainfall-Runoff Model was developed in the 1970s. It is a conceptual hydrologic model that has been widely used in humid and semihumid regions of China, and all 15 of its parameters have strong physical meanings. The SMAR model is a lumped conceptual model with soil moisture as its central theme. It consists of two components in sequence: a water balance component with 5 water balance parameters and a routing component with 4 routing parameters. The SIMHYD model is a daily conceptual model that estimates daily streamflow from daily rainfall and areal potential evapotranspiration data; it contains 7 parameters. To calibrate these hydrological models, the Shuffled Complex Evolution (SCE-UA) method is employed for parameter optimization.

3.3. Objective Functions

The selection of the objective function (OF) is of great importance since it strongly influences the values of the calibrated parameters and thus the simulation results of the rainfall-runoff model. Different objective functions can be adopted for different kinds of practical issues. For example, the mean squared error of squared transformed flows can be applied in high flow studies, and the mean squared error of logarithmically transformed flows can be applied in low flow studies. In this study, four objective functions have been used for the parameter calibration.

(1) OF1: The Nash-Sutcliffe Coefficient of Efficiency ($R^2$):

$$R^2 = 1.0 - \frac{\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t}\right)^2}{\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t} - \overline{Q_{\mathrm{obs}}}\right)^2}, \tag{10}$$

where $Q_{\mathrm{obs},t}$ and $Q_{\mathrm{sim},t}$ are observed and simulated data at time $t$ and $\overline{Q_{\mathrm{obs}}}$ is the average of observed data in the calibration period.

(2) OF2: Mean Squared Error of Squared Transformed (MSEST):

$$\mathrm{MSEST} = \frac{\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t}^2 - Q_{\mathrm{sim},t}^2\right)^2}{T}. \tag{11}$$

Transforming the observed data in squared form puts great emphasis on fitting peak values.

(3) OF3: Mean Squared Error of Square Root Transformed (MSESRT):

$$\mathrm{MSESRT} = \frac{\sum_{t=1}^{T} \left(\sqrt{Q_{\mathrm{obs},t}} - \sqrt{Q_{\mathrm{sim},t}}\right)^2}{T}. \tag{12}$$

MSESRT can be employed in the medium flow simulation.

(4) OF4: Mean Squared Error of Logarithmic Transformed (MSELT):

$$\mathrm{MSELT} = \frac{\sum_{t=1}^{T} \left(\ln Q_{\mathrm{obs},t} - \ln Q_{\mathrm{sim},t}\right)^2}{T}. \tag{13}$$

This transformation helps model parameterization to better fit the low flow values.
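The four objective functions can be written compactly as below. This is an illustrative sketch with made-up flows and hypothetical function names; it assumes strictly positive flows so that the square root and logarithm are defined. Note that $R^2$ is maximized during calibration, while the three mean squared errors are minimized.

```python
import math

def nash_sutcliffe(obs, sim):
    """OF1: Nash-Sutcliffe efficiency, equation (10)."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator

def mse_squared(obs, sim):
    """OF2 (MSEST): errors of squared flows, emphasizing peaks."""
    return sum((o ** 2 - s ** 2) ** 2 for o, s in zip(obs, sim)) / len(obs)

def mse_sqrt(obs, sim):
    """OF3 (MSESRT): errors of square-root flows, for medium flows."""
    return sum((math.sqrt(o) - math.sqrt(s)) ** 2 for o, s in zip(obs, sim)) / len(obs)

def mse_log(obs, sim):
    """OF4 (MSELT): errors of log flows, emphasizing low flows."""
    return sum((math.log(o) - math.log(s)) ** 2 for o, s in zip(obs, sim)) / len(obs)

# Made-up flows: the simulation is perfect except for an overestimated peak.
obs = [1.0, 4.0, 9.0]
sim = [1.0, 4.0, 16.0]

nse = nash_sutcliffe(obs, sim)    # 1 - 49 / (98/3) = -0.5
of2 = mse_squared(obs, sim)       # (81 - 256)^2 / 3
of3 = mse_sqrt(obs, sim)          # (3 - 4)^2 / 3
of4 = mse_log(obs, sim)           # (ln 9 - ln 16)^2 / 3
```

The same peak error is penalized far more heavily by the squared transform than by the square-root or logarithmic transforms, which is why each objective function steers the calibration toward a different flow range.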

3.4. Construction of BMA(3) and BMA(9) Schemes

When the prediction data are highly non-Gaussian, they should first be transformed to be approximately normally distributed by the Box-Cox transformation before the EM algorithm is applied. OF1 is the most widely used objective function for parameter optimization and is used to calibrate each of the three hydrological models mentioned above, generating three different predictions. We combine these three predictions by BMA to construct a three-member prediction ensemble; thus, we denote the first BMA scheme as BMA(3). Figure 2 shows the procedure of the BMA(3) combination scheme. The other three objective functions, that is, OF2, OF3, and OF4, are suited to high, medium, and low flow simulation, respectively. Each of the three hydrological models is calibrated under each of these three objective functions to obtain the optimized parameter sets. As the same model with different parameter sets gives rise to different outcomes, nine different predictions are generated. We use the BMA method to combine these nine predictions into a nine-member prediction ensemble, which is the second BMA scheme, denoted as BMA(9). The procedure of the BMA(9) combination scheme is described in Figure 3.

Figure 2: Diagram of BMA(3) combination scheme.

Figure 3: Diagram of BMA(9) combination scheme.

Let E denote the uncertainty of the forecast, and it can be written as E=[Eh,Em,El], including three components, that is, the high flow simulation uncertainty Eh, the medium flow simulation uncertainty Em, and the low flow simulation uncertainty El. In BMA(9), the forecasts which are generated under OF2 have relatively small Eh, so they can get higher weights than other forecasts in high flow simulation. Similarly, the forecasts generated under OF3 have relatively high weights in medium flow simulation, while the ones generated under OF4 have higher weights than others in low flow simulation. By averaging the forecasts from a set of different combinations of hydrological model and objective function, the advantage of BMA(9) is its ability to reduce the simulation errors by giving weights to each of the nine-member forecasts according to their performance in different flow ranges.

3.5. Performance Criteria for Evaluating the Mean Prediction

There are three indices for evaluating the mean prediction.

(1) The Nash-Sutcliffe Coefficient of Efficiency ($R^2$). The definition of $R^2$ is given in (10). $R^2$ is not only an objective function but also a widely used performance criterion. It ranges from minus infinity to 1.0, with higher values indicating better agreement. It is difficult to evaluate the performance of the model with $R^2$ in all flow ranges, since the value of $R^2$ tends to be low in the medium flow range.

(2) Daily Root Mean Square Error (DRMS):

$$\mathrm{DRMS} = \sqrt{\frac{\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t}\right)^2}{T}}, \tag{14}$$

where $Q_{\mathrm{obs},t}$ and $Q_{\mathrm{sim},t}$ are observed and simulated data at time $t$. DRMS is sensitive to the differences between the observations and simulations. The lower the DRMS value is, the better the prediction performance is.

(3) Relative Error of Total Runoff (RE):

$$\mathrm{RE} = 1.0 - \frac{\sum_{t=1}^{T} Q_{\mathrm{sim},t}}{\sum_{t=1}^{T} Q_{\mathrm{obs},t}}. \tag{15}$$

It reflects the performance in the simulation of the total runoff amount. Values of RE closer to zero indicate better agreement of total runoff; a negative RE means that the total runoff is overestimated.
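A short sketch of the DRMS and RE criteria, using made-up flows and hypothetical function names, may help make the sign convention of RE concrete.

```python
import math

def drms(obs, sim):
    """Daily root mean square error, equation (14)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def relative_error(obs, sim):
    """Relative error of total runoff, equation (15); positive means underestimation."""
    return 1.0 - sum(sim) / sum(obs)

# Made-up daily flows in m3/s.
obs = [10.0, 20.0, 30.0]
sim = [12.0, 18.0, 33.0]

drms_value = drms(obs, sim)          # sqrt((4 + 4 + 9) / 3)
re_value = relative_error(obs, sim)  # 1 - 63/60 = -0.05, i.e. 5% overestimation
```

Because RE sums the flows before comparing them, positive and negative daily errors can cancel, so a small RE does not by itself imply a small DRMS.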

3.6. Performance Criteria for Assessing the Prediction Uncertainty Interval

Xiong et al. have presented a set of indices for assessing the prediction uncertainty intervals generated by uncertainty analysis methods. Three main indices are selected here to assess the prediction uncertainty intervals produced by the BMA schemes as well as by each individual hydrological model.

(1) Containing Ratio (CR). The containing ratio is used for assessing the goodness of the uncertainty interval. It is defined as the percentage of observed data points that are covered by the prediction bounds.

(2) Average Band-Width ($B$). Consider

$$B = \frac{1}{T} \sum_{t=1}^{T} \left(q_{u,t} - q_{l,t}\right), \tag{16}$$

where $q_{u,t}$ and $q_{l,t}$ denote the upper and lower prediction bounds at time $t$. The average band-width $B$ is also an index for measuring the performance of the estimated uncertainty interval.

(3) Average Deviation Amplitude ($D$). The average deviation amplitude $D$ is an index to quantify the average deflection of the curve of the middle points of the prediction bounds from the observed streamflow hydrograph. It is defined as

$$D = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{1}{2}\left(q_{u,t} + q_{l,t}\right) - Q_{\mathrm{obs},t} \right|, \tag{17}$$

where $Q_{\mathrm{obs},t}$ is the observed discharge at time $t$.
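The three interval indices can be computed as in the sketch below. The bounds and observations are made-up numbers, the function names are hypothetical, and treating an observation that lies exactly on a bound as covered is an assumption, since the text does not say how ties are handled.

```python
def containing_ratio(obs, lower, upper):
    """CR: percentage of observations inside [lower, upper] (bounds inclusive, assumed)."""
    covered = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return 100.0 * covered / len(obs)

def average_band_width(lower, upper):
    """B: mean width of the interval, equation (16)."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)

def average_deviation_amplitude(obs, lower, upper):
    """D: mean absolute distance between interval midpoints and observations, equation (17)."""
    return sum(abs(0.5 * (up + lo) - o) for o, lo, up in zip(obs, lower, upper)) / len(obs)

# Made-up observations and 90% bounds in m3/s.
obs = [5.0, 10.0, 15.0, 20.0]
lower = [4.0, 11.0, 12.0, 18.0]
upper = [6.0, 14.0, 16.0, 25.0]

cr = containing_ratio(obs, lower, upper)                # 3 of 4 covered -> 75.0
b = average_band_width(lower, upper)                    # (2 + 3 + 4 + 7) / 4 = 4.0
d = average_deviation_amplitude(obs, lower, upper)      # (0 + 2.5 + 1 + 1.5) / 4 = 1.25
```

The indices pull in different directions: widening the bounds raises CR but also raises $B$, so an interval is best judged on all three together.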

4. Results and Discussion

The weights of the individual models in the BMA(3) scheme are displayed in Figure 4, while the weights in BMA(9) are shown in Figure 5. Moreover, in order to compare the performance of the two BMA schemes in different flow ranges, the data are divided into three flow ranges according to the characteristics of the streamflow values of the Mumahe catchment: high flow (top 10%), medium flow (middle 50%), and low flow (bottom 40%).
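One way to obtain such a partition is to rank the observed flows and cut the ranking at the stated fractions, as in the sketch below. The text does not say how the split was computed (for example, whether it uses observed flows or how ties and rounding are handled), so ranking by observed discharge and rounding the group sizes are assumptions, and the function name is hypothetical.

```python
def split_flow_ranges(obs, high_frac=0.10, medium_frac=0.50):
    """Return time indices of the high, medium, and low flow ranges.

    Ranks observed flows from largest to smallest; the top high_frac are 'high',
    the next medium_frac are 'medium', and the remainder are 'low'.
    """
    order = sorted(range(len(obs)), key=lambda i: obs[i], reverse=True)
    n_high = round(high_frac * len(obs))
    n_medium = round(medium_frac * len(obs))
    return {
        "high": sorted(order[:n_high]),
        "medium": sorted(order[n_high:n_high + n_medium]),
        "low": sorted(order[n_high + n_medium:]),
    }

# Made-up series of 20 flows; values increase with the index for easy checking.
obs = [float(v) for v in range(1, 21)]
ranges = split_flow_ranges(obs)
```

The returned index lists can then be used to subset the observations, mean predictions, and interval bounds before computing the criteria for each flow range separately.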

Figure 4: Histogram of weights of individual model predictions in BMA(3) scheme.

Figure 5: Histogram of weights of the individual model predictions in BMA(9) scheme.

4.1. BMA(3) Results

We check the mean prediction of BMA(3) using the three criteria described in Section 3.5. Results of BMA(3) and its 3 individual models in the mean prediction for the whole flow series are presented in Table 1. In terms of R2, the mean prediction of BMA(3) achieves 90.68% in the calibration period and 86.98% in the validation period, which is better than its best individual model prediction (XAJ). However, in terms of RE, the mean prediction of BMA(3) performs much worse than its best individual model prediction.

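Table 1: Results of BMA(3) and its 3 individual models in the mean prediction as well as 90% uncertainty interval for the whole flow series. R2, DRMS, and RE refer to the mean prediction; CR, B, and D refer to the 90% uncertainty interval.

| Period | Models | R2 (%) | DRMS | RE (%) | CR (%) | B (m3/s) | D (m3/s) |
|---|---|---|---|---|---|---|---|
| Calibration | XAJ | 88.69 | 30.77 | 21.04 | 24.83 | 31.41 | 16.69 |
| Calibration | SMAR | 87.69 | 32.11 | 16.21 | 32.83 | 32.80 | 17.21 |
| Calibration | SIM | 80.73 | 40.17 | 31.51 | 14.83 | 27.38 | 22.33 |
| Calibration | BMA(3) | 90.68 | 27.92 | 27.87 | 40.72 | 43.76 | 16.06 |
| Validation | XAJ | 85.77 | 29.22 | 17.79 | 24.28 | 24.66 | 14.09 |
| Validation | SMAR | 85.30 | 29.70 | 14.19 | 31.91 | 25.52 | 14.56 |
| Validation | SIM | 69.81 | 42.56 | 39.48 | 14.33 | 18.40 | 20.07 |
| Validation | BMA(3) | 86.98 | 27.95 | 30.72 | 40.65 | 36.71 | 14.13 |

Note: bolded values represent the best results.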

The three indices described in Section 3.6 are used to assess the prediction uncertainty intervals of both BMA(3) and its three individual models. The results for the whole flow series are also shown in Table 1. The BMA(3) uncertainty interval clearly has the largest values of CR and B, and almost the smallest D, in both the calibration and validation periods. In other words, the BMA(3) uncertainty interval has a better CR than any individual model’s uncertainty interval and a D that is the smallest in calibration and nearly the smallest in validation (14.13 versus 14.09 m3/s for XAJ), but it is worse in terms of B. We then compare the uncertainty intervals of BMA(3) and its individual models graphically. Figure 6 displays the mean prediction and 90% uncertainty interval of both BMA(3) and its 3 individual models for the Mumahe catchment in 1983, during the calibration period. The observations of 1983 are shown as dots, and the BMA(3) mean prediction and its individual models’ predictions are represented by solid curves. Consistent with the statistical results in Table 1, the uncertainty intervals of the individual models have low containing ratios and large deviation amplitudes, whereas the uncertainty interval of BMA(3) is much broader than that of any of its individual models. Figure 7 shows that the results for the validation period are similar to those for the calibration period. In general, the uncertainty interval of BMA(3) performs better than those of its individual models for the whole flow series.

Figure 6: The mean prediction and 90% uncertainty interval of both BMA(3) and 3 individual models for the Mumahe catchment in 1983 during the calibration period.

Figure 7: The mean prediction and 90% confidence interval of both BMA(3) and 3 individual models for the Mumahe catchment in 1987 during the validation period.

4.2. BMA(9) Results

Table 2 lists the results of BMA(9) and its 9 individual models in the mean prediction for the whole flow series. In the calibration period, the mean prediction of BMA(9) performs better than its best individual prediction according to the values of R2 and DRMS, although it does not have any advantage over its individual model predictions in terms of RE.

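Table 2: Results of BMA(9) and its 9 individual models in the mean prediction and 90% uncertainty interval for the whole flow series. R2, DRMS, and RE refer to the mean prediction; CR, B, and D refer to the 90% uncertainty interval.

| Period | Objective function | Models | R2 (%) | DRMS | RE (%) | CR (%) | B (m3/s) | D (m3/s) |
|---|---|---|---|---|---|---|---|---|
| Calibration | OF2 (MSEST) | XAJ | 85.45 | 34.89 | 30.24 | 17.89 | 29.43 | 21.46 |
| Calibration | OF2 (MSEST) | SMAR | 84.61 | 35.89 | 6.96 | 31.67 | 36.51 | 19.30 |
| Calibration | OF2 (MSEST) | SIM | 80.73 | 40.17 | 31.51 | 15.39 | 28.47 | 22.67 |
| Calibration | OF3 (MSESRT) | XAJ | 89.78 | 29.25 | 10.44 | 68.06 | 33.37 | 11.75 |
| Calibration | OF3 (MSESRT) | SMAR | 80.25 | 40.66 | 10.13 | 44.17 | 35.37 | 17.39 |
| Calibration | OF3 (MSESRT) | SIM | 72.42 | 48.05 | 5.82 | 47.72 | 42.57 | 21.26 |
| Calibration | OF4 (MSELT) | XAJ | 79.99 | 40.93 | 12.39 | 63.94 | 33.92 | 14.75 |
| Calibration | OF4 (MSELT) | SMAR | 58.01 | 59.29 | −9.22 | 42.28 | 43.45 | 28.32 |
| Calibration | OF4 (MSELT) | SIM | 52.71 | 62.92 | −41.07 | 38.89 | 55.51 | 26.93 |
| Calibration | | BMA(9) | 90.49 | 28.22 | 21.40 | 91.11 | 70.98 | 14.54 |
| Validation | OF2 (MSEST) | XAJ | 82.70 | 32.21 | 31.92 | 14.79 | 21.56 | 18.20 |
| Validation | OF2 (MSEST) | SMAR | 80.05 | 34.59 | 0.66 | 30.23 | 29.52 | 16.64 |
| Validation | OF2 (MSEST) | SIM | 69.81 | 42.56 | 39.48 | 20.84 | 24.43 | 22.32 |
| Validation | OF3 (MSESRT) | XAJ | 88.52 | 26.25 | 4.54 | 68.56 | 26.95 | 9.62 |
| Validation | OF3 (MSESRT) | SMAR | 78.26 | 36.11 | 7.48 | 44.56 | 27.59 | 14.53 |
| Validation | OF3 (MSESRT) | SIM | 71.09 | 41.64 | 8.98 | 53.86 | 27.69 | 16.47 |
| Validation | OF4 (MSELT) | XAJ | 77.25 | 36.94 | 8.74 | 63.07 | 26.68 | 11.85 |
| Validation | OF4 (MSELT) | SMAR | 43.43 | 58.25 | −18.79 | 35.53 | 35.76 | 27.36 |
| Validation | OF4 (MSELT) | SIM | 72.27 | 40.79 | −21.69 | 34.05 | 36.22 | 18.96 |
| Validation | | BMA(9) | 84.54 | 30.46 | 25.42 | 90.23 | 55.91 | 13.20 |

Note: bolded values represent the best results.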

The results for the uncertainty intervals of BMA(9) and its 9 individual models are also listed in Table 2. The containing ratio of the BMA(9) uncertainty interval reaches 91.11% in the calibration period and 90.23% in the validation period, which is much higher than that of the uncertainty interval of any individual model. The average deviation amplitude of the BMA(9) uncertainty interval is smaller than that of the uncertainty intervals of most of its nine individual models. A similar conclusion can be drawn from Figures 8 and 9 for both the calibration and validation periods.

Figure 8: The mean prediction and 90% uncertainty interval of both BMA(9) and SIMHYD3 model (the SIMHYD with the objective function OF3) for the Mumahe catchment in 1983 during the calibration period.

Figure 9: The mean prediction and 90% confidence interval of BMA(9) and SIMHYD3 model (the SIMHYD with the objective function OF3) for the Mumahe catchment in 1987 during the validation period.

4.3. Comparison of BMA(3) and BMA(9)

The results of both BMA(3) and BMA(9) in terms of the mean prediction and 90% uncertainty interval for the whole flow series are listed in Table 3 for comparison. The BMA(3) mean prediction performs slightly better than the BMA(9) mean prediction in terms of R2 and DRMS in both the calibration and validation periods, while it is slightly worse in terms of RE. For the uncertainty intervals, the findings are as follows: (1) the CR of the BMA(9) uncertainty interval is much higher than that of the BMA(3) uncertainty interval in both the calibration and validation periods; (2) the B of the BMA(9) uncertainty interval is clearly larger than that of the BMA(3) uncertainty interval in both periods; (3) in terms of D, the BMA(9) uncertainty interval performs slightly better than the BMA(3) uncertainty interval in both periods.

Table 3: The comparison of BMA(3) and BMA(9) in the mean prediction and 90% uncertainty interval for the whole flow series.

| Indices | Calibration BMA(3) | Calibration BMA(9) | Validation BMA(3) | Validation BMA(9) |
|---|---|---|---|---|
| Mean prediction | | | | |
| R2 (%) | 90.68 | 90.49 | 86.98 | 84.54 |
| DRMS (m3/s) | 27.92 | 28.22 | 27.95 | 30.46 |
| RE (%) | 27.87 | 21.40 | 30.72 | 25.42 |
| 90% uncertainty interval | | | | |
| CR (%) | 40.72 | 91.11 | 40.65 | 90.23 |
| B (m3/s) | 43.76 | 70.98 | 36.71 | 55.91 |
| D (m3/s) | 16.06 | 14.54 | 14.13 | 13.20 |

Further, we compare the BMA(3) and BMA(9) mean predictions with respect to the three flow ranges in Table 4. According to the values of the three indices for the mean prediction, the BMA(3) mean prediction performs better than the BMA(9) mean prediction in the high flow range but worse in the medium and low flow ranges, during both the calibration and validation periods. We then compare the uncertainty intervals of BMA(3) and BMA(9) in the three flow ranges, with the following findings: (1) the CR value of the BMA(9) uncertainty interval is higher than that of the BMA(3) uncertainty interval in each of the three flow ranges in both periods, and far higher in the medium and low flow ranges; (2) the B value of the BMA(9) uncertainty interval is larger than that of the BMA(3) uncertainty interval for all three flow ranges in both periods; (3) the D value of the BMA(9) uncertainty interval is slightly larger than that of BMA(3) in the high flow range but smaller in the medium and low flow ranges in both periods.

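Table 4: The comparison of BMA(3) and BMA(9) in the mean prediction and 90% uncertainty interval for three flow ranges.

| Period | Indices | High flow BMA(3) | High flow BMA(9) | Medium flow BMA(3) | Medium flow BMA(9) | Low flow BMA(3) | Low flow BMA(9) |
|---|---|---|---|---|---|---|---|
| Calibration | Mean prediction | | | | | | |
| Calibration | R2 (%) | 93.01 | 91.74 | 32.28 | 52.76 | 95.83 | 96.39 |
| Calibration | DRMS (m3/s) | 78.15 | 84.90 | 23.24 | 19.41 | 7.81 | 7.27 |
| Calibration | RE (%) | 15.48 | 17.44 | 35.66 | 21.51 | 69.29 | 46.73 |
| Calibration | 90% uncertainty interval | | | | | | |
| Calibration | CR (%) | 88.74 | 92.05 | 45.91 | 91.32 | 27.40 | 90.75 |
| Calibration | B (m3/s) | 273.17 | 342.61 | 40.34 | 74.97 | 6.39 | 19.23 |
| Calibration | D (m3/s) | 59.78 | 63.66 | 18.33 | 15.44 | 6.21 | 5.02 |
| Validation | Mean prediction | | | | | | |
| Validation | R2 (%) | 89.00 | 85.47 | 22.03 | 41.82 | 93.66 | 94.94 |
| Validation | DRMS (m3/s) | 92.51 | 106.35 | 19.01 | 16.42 | 6.87 | 6.14 |
| Validation | RE (%) | 22.49 | 27.68 | 31.35 | 17.66 | 67.48 | 45.11 |
| Validation | 90% uncertainty interval | | | | | | |
| Validation | CR (%) | 85.33 | 88.00 | 46.76 | 90.81 | 28.60 | 90.02 |
| Validation | B (m3/s) | 252.88 | 282.17 | 34.97 | 61.22 | 7.19 | 18.45 |
| Validation | D (m3/s) | 65.67 | 66.12 | 14.90 | 14.03 | 5.99 | 4.82 |

5. Conclusions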

In this paper, the Bayesian Model Averaging (BMA) method is employed to construct a three-member predictions ensemble, denoted by BMA(3), and a nine-member predictions ensemble, denoted by BMA(9), for ensemble prediction as well as for prediction uncertainty analysis. There are three kinds of comparisons made in terms of both mean prediction and prediction uncertainty interval in this study: BMA(3) with its three individual models, BMA(9) with its nine individual models, and BMA(3) with BMA(9). In particular, we break observational flows into three different ranges for detailed comparison and analysis. The performance of two BMA schemes can be summarized as follows.

In terms of mean predictions, BMA(3) generally performs better than any of its individual models, and the BMA(9) mean prediction generally has higher accuracy than each of its individual model predictions. The comparison between BMA(3) and BMA(9) in mean predictions indicates that BMA(9) does not have any advantage over BMA(3) as far as the entire flow series is concerned. The BMA(9) mean prediction performs better than that of BMA(3) in the medium and low flow ranges but worse in the high flow range.

In terms of the containing ratio for assessing the uncertainty intervals, BMA(3) has a larger CR value than any of its individual models. The containing ratio of the BMA(9) uncertainty interval is also markedly larger than that of all its individual models when the CR value is calculated for the whole flow series. When the CR value is compared for different flow ranges, the BMA(9) uncertainty interval performs better than its individual models in the high, medium, and low flow ranges. The BMA(9) uncertainty interval also has a much higher CR than the BMA(3) uncertainty interval.

The average band-width B of the BMA(3) uncertainty interval is larger than that of all its individual models, and the average band-width of the BMA(9) uncertainty interval is even larger than that of BMA(3). It is found that, for uncertainty intervals, an increase in the containing ratio is accompanied by an increase in band-width, which has already been pointed out by Xiong et al.

The average deviation amplitude D of the BMA(3) uncertainty interval is generally smaller than that of the best individual model in the ensemble. In terms of D, the BMA(9) uncertainty interval also performs better than the best individual among its nine-member ensemble, especially in the high flow range. Moreover, in terms of D, the BMA(9) uncertainty interval performs better than the BMA(3) uncertainty interval in the medium and low flow ranges but worse in the high flow range.

Based on this study, it is found that BMA is a particularly useful method for dealing with two issues. Firstly, when there are two or more competing models or methods available for the same problem, BMA can assess the relative performances of all models by assigning weights to each model or method and then produce more accurate mean prediction by weighted averaging of all predictions from those models or methods. Secondly, BMA can be used when there is uncertainty over control variables. The uncertainty intervals for both individual predictions and the BMA prediction can be derived when the distribution of the data is known or assumed.

Two issues arising from this study of BMA also need to be pointed out. The first concerns the data transformation process. The daily flow data clearly do not strictly follow the normal distribution even after the Box-Cox transformation. In fact, it is impossible to make every prediction from every model normally distributed by using only a single, uniform transformation coefficient. The second concerns the quality of the hydrological models chosen for combination. The models employed in this paper are all conceptual hydrological models. If better models are chosen as the ensemble members, better results can be expected from the BMA combination.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant nos. 51190094 and 51079098). The authors also thank the editor and the reviewers, whose comments and suggestions helped to improve the paper.

References

WMO. Intercomparison of conceptual models used in hydrological forecasting. Operational Hydrology Report no. 7, WMO, Geneva, Switzerland, 1975.
Singh V. P. Computer Models of Watershed Hydrology. Water Resources Publications, 1995.
Reid D. J. Combining three estimates of gross domestic products. Economica, vol. 35, pp. 431–444, 1968.
Bates J. M., Granger C. W. J. The combination of forecasts. Operational Research Quarterly, vol. 20, pp. 451–468, 1969.
Dickinson J. P. Some statistical results in the combination of forecasts. Operational Research Quarterly, vol. 24, no. 2, pp. 253–260, 1973.
Shamseldin A. Y., O'Connor K. M., Liang G. C. Methods for combining the outputs of different rainfall-runoff models. Journal of Hydrology, vol. 197, no. 1–4, pp. 203–229, 1997. doi:10.1016/S0022-1694(96)03259-3
Xiong L., Shamseldin A. Y., O'Connor K. M. A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system. Journal of Hydrology, vol. 245, no. 1–4, pp. 196–217, 2001. doi:10.1016/S0022-1694(01)00349-3
Madigan D., Raftery A. E. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, vol. 89, pp. 1535–1546, 1994.
Raftery A. E. Bayesian model selection in social research. Sociological Methodology, vol. 25, pp. 111–163, 1995.
Draper D. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society B, vol. 57, no. 1, pp. 45–97, 1995.
Fernández C., Ley E., Steel M. Benchmark priors for Bayesian model averaging. Journal of Econometrics, vol. 100, no. 2, pp. 381–427, 2001. doi:10.1016/S0304-4076(00)00076-2
Yeung K. Y., Bumgarner R. E., Raftery A. E. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics, vol. 21, no. 10, pp. 2394–2402, 2005. doi:10.1093/bioinformatics/bti319
Wintle B. A., McCarthy M. A., Volinsky C. T., Kavanagh R. P. The use of Bayesian model averaging to better represent uncertainty in ecological models. Conservation Biology, vol. 17, no. 12, pp. 1579–1590, 2003. doi:10.1111/j.1523-1739.2003.00614.x
Morales K. H., Ibrahim J. G., Chen C.-J., Ryan L. M. Bayesian model averaging with applications to benchmark dose estimation for arsenic in drinking water. Journal of the American Statistical Association, vol. 101, no. 473, pp. 9–17, 2006. doi:10.1198/016214505000000961
Koop G., Tole L. Measuring the health effects of air pollution: to what extent can we really say that people are dying from bad air? Journal of Environmental Economics and Management, vol. 47, no. 1, pp. 30–54, 2004. doi:10.1016/S0095-0696(03)00075-5
Raftery A. E., Balabdaoui F., Gneiting T., Polakowski M. Using Bayesian model averaging to calibrate forecast ensembles. Technical Report 440, Department of Statistics, University of Washington, 2003.
Viallefont V., Raftery A. E., Richardson S. Variable selection and Bayesian model averaging in case-control studies. Statistics in Medicine, vol. 20, no. 21, pp. 3215–3230, 2001. doi:10.1002/sim.976
Hoeting J. A., Madigan D., Raftery A. E., Volinsky C. T. Bayesian model averaging: a tutorial. Statistical Science, vol. 14, no. 4, pp. 382–417, 1999. doi:10.1214/ss/1009212519
Clyde M. A. Bayesian model averaging and model search strategies. In Bayesian Statistics 6, Bernardo J. M., Dawid A. P., Berger J. O., Smith A. F. M., Eds., pp. 157–185, Oxford University Press, 1999.
Raftery A. E., Zheng Y. Discussion: performance of Bayesian model averaging. Journal of the American Statistical Association, vol. 98, no. 464, pp. 931–938, 2003.
Neuman S. P. Maximum likelihood Bayesian averaging of uncertain model predictions. Stochastic Environmental Research and Risk Assessment, vol. 17, no. 5, pp. 291–305, 2003. doi:10.1007/s00477-003-0151-7
Ajami N. K., Duan Q., Sorooshian S. An integrated hydrologic Bayesian multimodel combination framework: confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resources Research, vol. 43, no. 1, Article ID W01403, 2007. doi:10.1029/2005WR004745
Duan Q., Ajami N. K., Gao X., Sorooshian S. Multi-model ensemble hydrologic prediction using Bayesian model averaging. Advances in Water Resources, vol. 30, no. 5, pp. 1371–1386, 2007. doi:10.1016/j.advwatres.2006.11.014
Zhang X., Srinivasan R., Bosch D. Calibration and uncertainty analysis of the SWAT model using genetic algorithms and Bayesian model averaging. Journal of Hydrology, vol. 374, no. 3-4, pp. 307–317, 2009. doi:10.1016/j.jhydrol.2009.06.023
Beven K., Binley A. The future of distributed models: model calibration and uncertainty prediction. Hydrological Processes, vol. 6, no. 3, pp. 279–298, 1992.
Gupta H. V., Beven K. J., Wagener T. Calibration and uncertainty estimation. In Encyclopedia of Hydrological Sciences, John Wiley and Sons, Chichester, UK, 2003.
Marshall L., Nott D., Sharma A. A comparative study of Markov chain Monte Carlo methods for conceptual rainfall-runoff modeling. Water Resources Research, vol. 40, no. 2, Article ID W02501, 2004. doi:10.1029/2003WR002378
Vrugt J. A., Gupta H. V., Bouten W., Sorooshian S. A shuffled complex evolution metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resources Research, vol. 39, no. 8, 2003.
Hammersley J. M., Handscomb D. C. Monte Carlo Methods. Methuen, London, UK, 1975.
Chiew F. H. S., Peel M. C., Western A. W. Application and testing of the simple rainfall runoff model SIMHYD. In Mathematical Models of Small Watershed Hydrology and Applications, Singh V. P., Frevert D., Eds., pp. 335–366, Water Resources Publications, 2002.
Duan Q., Sorooshian S., Gupta V. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research, vol. 28, no. 4, pp. 1015–1031, 1992. doi:10.1029/91WR02985
Oudin L., Andréassian V., Mathevet T., Perrin C., Michel C. Dynamic averaging of rainfall-runoff model simulations from complementary model parameterizations. Water Resources Research, vol. 42, no. 7, Article ID W07410, 2006. doi:10.1029/2005WR004636
Xiong L., Wan M., Wei X., O'Connor K. M. Indices for assessing the prediction bounds of hydrological models and application by generalised likelihood uncertainty estimation. Hydrological Sciences Journal, vol. 54, no. 5, pp. 852–871, 2009. doi:10.1623/hysj.54.5.852