Because the Bayesian Model Averaging (BMA) method can combine the forecasts of different models into a new forecast that is expected to be better than that of any individual model, it has been widely used in hydrology for ensemble hydrologic prediction. Previous studies of BMA mostly focused on comparing the BMA mean prediction with each individual model’s prediction. Because BMA can also provide a statistical distribution of the quantity to be forecast, this study shifts the focus to comparing the prediction uncertainty interval generated by BMA with that of each individual model under two different BMA combination schemes. In the first BMA scheme, three models are each calibrated under the same Nash-Sutcliffe efficiency objective function, providing a three-member prediction ensemble for the BMA combination. In the second BMA scheme, all three models are each calibrated under three different objective functions other than the Nash-Sutcliffe efficiency, yielding a nine-member prediction ensemble. Finally, the model efficiency and the uncertainty intervals of each individual model and of the two BMA combination schemes are assessed and compared.
To date, various hydrological models have been put forward and widely used in flood forecasting, planning, and water resources management [
Bayesian Model Averaging came to prominence in statistics in the mid-1990s, and Madigan and Raftery [
A prediction from a single model has been recognized to be associated with a certain degree of uncertainty, and so is the prediction from combining a number of different single models. Thus, uncertainty analysis is an indispensable element for any hydrologic modeling study. The uncertainty usually arises from errors during the calibration of parameters, the design of model structure, and measurements of input and output data [
Previous studies of BMA in hydrology mostly focused on the comparison of the BMA mean prediction with each individual model’s prediction, to prove the better performance of the prediction after weighted averaging. As BMA also has the ability to provide a statistical distribution of the quantity to be forecasted, the research focus in this study is shifted onto the comparison of the prediction uncertainty interval generated by the BMA with that of each individual model, in order to see if BMA can also improve the prediction reliability. The technical route of the research in this paper is described in Figure
Flowchart of using BMA scheme for hydrological ensemble prediction as well as for prediction uncertainty analysis.
Bayesian Model Averaging (BMA) is a statistical technique designed to infer a prediction by weighted averaging over many different competing models. This method is not only a scheme for model combination but also a coherent approach for accounting for between-model and within-model uncertainty [
Let us consider a quantity
The terms in (
The BMA mean prediction is a weighted average of the individual model’s predictions, with their posterior probabilities being the weights. In the case that the observations and individual model predictions are all normally distributed, the BMA mean prediction can be expressed as
To estimate BMA weight
Firstly, if we denote the set of BMA parameters to be estimated by
It is difficult to maximize the function (
Initialize
After BMA weight
Generate an integer value of
Set the cumulative weight
Generate a random number
If
Generate a value of
Repeat the above steps (1) and (2) for
After generating the BMA probabilistic ensemble predictions, sort them in ascending order. The 90% uncertainty interval is then bounded by the 5% and 95% quantiles of the sorted ensemble.
For each individual model in the BMA scheme, the prediction uncertainty interval can also be constructed, with the Monte Carlo sampling method still being used to approximate the assumed PDF of
The study area is the Mumahe catchment, which is drained by a tributary of the Han River. It is located in Shaanxi Province of China and has a total area of 1224 km^{2}. The basin has a humid subtropical climate with fairly high precipitation. The mean annual rainfall for the period 1980–1987 is 1070 mm, and the mean annual runoff is 687 mm, or roughly 64% of the annual rainfall. The hydrological data include daily runoff, rainfall, and evaporation. There are 2992 data points in total; 1825 of them (the period of 1980.1.1–1985.12.31) are used for calibration, while the remaining 1167 data points (the period of 1986.1.1–1987.12.31) are used for validation.
In this study, three conceptual hydrological models are employed for testing the capability of BMA: the Xinanjiang Rainfall-Runoff Model (XAJ), the Soil Moisture Accounting and Routing Model (SMAR), and the SIMHYD Rainfall-Runoff Model.
The Xinanjiang Rainfall-Runoff Model was developed in the 1970s. It is a conceptual hydrologic model that has been widely used in humid and semi-humid regions of China, and all 15 of its parameters have strong physical meanings. The SMAR model is a lumped conceptual model with soil moisture as its central theme. It consists of two components in sequence: a water balance component with 5 parameters and a routing component with 4 parameters. The SIMHYD model is a daily conceptual model that estimates daily streamflow from daily rainfall and areal potential evapotranspiration data, and it contains 7 parameters [
The selection of the objective function (OF) is of great importance, since it strongly influences the calibrated parameter values and thus the simulation results of the rainfall-runoff model. Different objective functions can be adopted for different practical purposes. For example, an objective function based on the squared errors of squared-transformed flow emphasizes high flows and can be applied in high flow studies, while one based on the squared errors of logarithmically transformed flow emphasizes low flows and can be applied in low flow studies [
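The effect of a flow transformation on the objective function can be illustrated with a short sketch. It assumes that each transformed objective is the mean squared error between transformed observed and simulated flows. The keys MSEST, MSESRT, and MSELT follow the labels used in the results tables, but reading MSESRT as a square-root transformation is an assumption, as is the small offset that keeps the logarithm defined at zero flow. The function name `mse_transformed` is hypothetical.

```python
import numpy as np

def mse_transformed(obs, sim, transform):
    """Mean squared error between transformed observed and simulated flows."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.mean((transform(obs) - transform(sim)) ** 2))

# Assumed transformations behind the objective-function labels.
TRANSFORMS = {
    "MSEST": np.square,                    # squared flow: emphasizes high flows
    "MSESRT": np.sqrt,                     # square-root flow: intermediate
    "MSELT": lambda q: np.log(q + 1e-6),   # log flow: emphasizes low flows
}

obs = [1.0, 4.0, 9.0]
sim = [1.0, 4.0, 16.0]
scores = {name: mse_transformed(obs, sim, f) for name, f in TRANSFORMS.items()}
```

In this example the single error on the largest flow dominates the squared-flow score far more than the log-flow score, which is why the two transformations suit high flow and low flow studies, respectively.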
When the prediction data are highly non-Gaussian, they should first be transformed toward normality with the Box-Cox transformation before the EM algorithm is applied. OF1 is the most widely used objective function for parameter optimization and is used to calibrate each of the three hydrological models mentioned above, generating three different predictions. We combine these three predictions by BMA to construct a three-member prediction ensemble; thus, we denote the first BMA scheme as BMA(3). Figure
Diagram of BMA(3) combination scheme.
Diagram of BMA(9) combination scheme.
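The Box-Cox transformation applied before the EM algorithm can be sketched as below. This is a minimal illustration in the standard one-parameter form; the transformation coefficient `lam` used here is an arbitrary example value, not the one adopted in this study, and the function names are hypothetical.

```python
import numpy as np

def boxcox(q, lam):
    """One-parameter Box-Cox transformation of positive flows q."""
    q = np.asarray(q, dtype=float)
    if lam == 0:
        return np.log(q)
    return (q ** lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse Box-Cox transformation, mapping back to flow units."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

flows = np.array([0.5, 10.0, 250.0])
z = boxcox(flows, lam=0.3)          # transformed flows used for fitting
back = inv_boxcox(z, lam=0.3)       # recovers the original flows
```

The same coefficient must be applied to the observations and to every member's predictions, and interval bounds computed in the transformed space are mapped back with the inverse function.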
Let
There are three indices for evaluating the mean prediction.
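A sketch of how such indices might be computed is given below. The definitions are assumptions based on common usage rather than the exact formulas of this paper: the model efficiency is taken as the Nash-Sutcliffe efficiency in percent, DRMS as the root mean square error of daily flows, and RE as the signed relative error of total flow volume in percent. The function names are hypothetical.

```python
import numpy as np

def nse_percent(obs, sim):
    """Nash-Sutcliffe efficiency, in percent (assumed definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * (1.0 - np.sum((sim - obs) ** 2)
                    / np.sum((obs - obs.mean()) ** 2))

def drms(obs, sim):
    """Root mean square error of daily flows (assumed definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def relative_error_percent(obs, sim):
    """Signed relative error of total volume, in percent (assumed definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()

obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]   # simulation biased high by 1 unit every day
```

For this example the three functions give an efficiency of 20%, a DRMS of 1, and an RE of 40%.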
Xiong et al. [
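The three interval indices reported in the results (containing ratio, average bandwidth, and average deviation amplitude) can be sketched as follows. The definitions are the commonly used ones and are assumptions here: the containing ratio is the percentage of observations falling inside the bounds, the average bandwidth is the mean width of the interval, and the average deviation amplitude is the mean absolute distance between the interval midpoint and the observation. The function name `interval_indices` is hypothetical.

```python
import numpy as np

def interval_indices(obs, lower, upper):
    """Return (CR in percent, average bandwidth, average deviation amplitude)."""
    obs = np.asarray(obs, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    inside = (obs >= lower) & (obs <= upper)
    cr = 100.0 * inside.mean()                       # containing ratio
    bandwidth = float(np.mean(upper - lower))        # average bandwidth
    midpoint = 0.5 * (upper + lower)
    deviation = float(np.mean(np.abs(midpoint - obs)))  # deviation amplitude
    return cr, bandwidth, deviation

cr, b, d = interval_indices(obs=[1.0, 3.0, 1.0, 1.0],
                            lower=[0.0, 0.0, 0.0, 0.0],
                            upper=[2.0, 2.0, 2.0, 2.0])
```

A good interval combines a high containing ratio with a narrow bandwidth and a small deviation amplitude, so the three indices have to be read together.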
The weights of individual models in BMA(3) scheme are displayed in Figure
Histogram of weights of individual model predictions in BMA(3) scheme.
Histogram of weights of the individual model predictions in BMA(9) scheme.
We check the mean prediction of BMA(3) using three criteria illustrated in Section
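Results of BMA(3)

| Models | Mean prediction | | | 90% uncertainty interval | | |
|---|---|---|---|---|---|---|
| | Model efficiency (%) | DRMS (m^{3}/s) | RE (%) | CR (%) | Average bandwidth | Average deviation amplitude |
| Calibration period | | | | | | |
| XAJ | 88.69 | 30.77 | 21.04 | 24.83 | 31.41 | 16.69 |
| SMAR | 87.69 | 32.11 | | 32.83 | 32.80 | 17.21 |
| SIM | 80.73 | 40.17 | 31.51 | 14.83 | | 22.33 |
| BMA(3) | **90.68** | **27.92** | 27.87 | **40.72** | 43.76 | **16.06** |
| Validation period | | | | | | |
| XAJ | 85.77 | 29.22 | 17.79 | 24.28 | 24.66 | 14.09 |
| SMAR | 85.30 | 29.70 | | 31.91 | 25.52 | 14.56 |
| SIM | 69.81 | 42.56 | 39.48 | 14.33 | | 20.07 |
| BMA(3) | **86.98** | **27.95** | 30.72 | **40.65** | 36.71 | 14.13 |

Note: bolded values represent the best results.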
Three indices illustrated in Section
The mean prediction and 90% uncertainty interval of BMA(3) and its three individual models for the Mumahe catchment in 1983, during the calibration period.
The mean prediction and 90% uncertainty interval of BMA(3) and its three individual models for the Mumahe catchment in 1987, during the validation period.
Table
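Results of BMA(9)

| Objective function | Models | Mean prediction | | | 90% uncertainty interval | | |
|---|---|---|---|---|---|---|---|
| | | Model efficiency (%) | DRMS (m^{3}/s) | RE (%) | CR (%) | Average bandwidth | Average deviation amplitude |
| Calibration period | | | | | | | |
| OF2 (MSEST) | XAJ | 85.45 | 34.89 | 30.24 | 17.89 | 29.43 | 21.46 |
| | SMAR | 84.61 | 35.89 | 6.96 | 31.67 | 36.51 | 19.30 |
| | SIM | 80.73 | 40.17 | 31.51 | 15.39 | | 22.67 |
| OF3 (MSESRT) | XAJ | 89.78 | 29.25 | 10.44 | 68.06 | 33.37 | |
| | SMAR | 80.25 | 40.66 | 10.13 | 44.17 | 35.37 | 17.39 |
| | SIM | 72.42 | 48.05 | − | 47.72 | 42.57 | 21.26 |
| OF4 (MSELT) | XAJ | 79.99 | 40.93 | 12.39 | 63.94 | 33.92 | 14.75 |
| | SMAR | 58.01 | 59.29 | −9.22 | 42.28 | 43.45 | 28.32 |
| | SIM | 52.71 | 62.92 | −41.07 | 38.89 | 55.51 | 26.93 |
| BMA(9) | | **90.49** | **28.22** | 21.40 | **91.11** | 70.98 | 14.54 |
| Validation period | | | | | | | |
| OF2 (MSEST) | XAJ | 82.70 | 32.21 | 31.92 | 14.79 | | 18.20 |
| | SMAR | 80.05 | 34.59 | | 30.23 | 29.52 | 16.64 |
| | SIM | 69.81 | 42.56 | 39.48 | 20.84 | 24.43 | 22.32 |
| OF3 (MSESRT) | XAJ | | | 4.54 | 68.56 | 26.95 | |
| | SMAR | 78.26 | 36.11 | 7.48 | 44.56 | 27.59 | 14.53 |
| | SIM | 71.09 | 41.64 | 8.98 | 53.86 | 27.69 | 16.47 |
| OF4 (MSELT) | XAJ | 77.25 | 36.94 | 8.74 | 63.07 | 26.68 | 11.85 |
| | SMAR | 43.43 | 58.25 | −18.79 | 35.53 | 35.76 | 27.36 |
| | SIM | 72.27 | 40.79 | −21.69 | 34.05 | 36.22 | 18.96 |
| BMA(9) | | 84.54 | 30.46 | 25.42 | **90.23** | 55.91 | 13.20 |

Note: bolded values represent the best results.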
The results of the uncertainty intervals of BMA(9) and its 9 individual models are also listed in Table
The mean prediction and 90% uncertainty interval of BMA(9) and the SIMHYD3 model (SIMHYD calibrated with the objective function OF3) for the Mumahe catchment in 1983, during the calibration period.
The mean prediction and 90% uncertainty interval of BMA(9) and the SIMHYD3 model (SIMHYD calibrated with the objective function OF3) for the Mumahe catchment in 1987, during the validation period.
The results of both BMA(3) and BMA(9) in terms of the mean prediction and 90% uncertainty interval for the whole flow series are listed in Table
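The comparison of BMA(3) and BMA(9)

| Indices | Calibration: BMA(3) | Calibration: BMA(9) | Validation: BMA(3) | Validation: BMA(9) |
|---|---|---|---|---|
| Mean prediction | | | | |
| Model efficiency (%) | 90.68 | 90.49 | 86.98 | 84.54 |
| DRMS (m^{3}/s) | 27.92 | 28.22 | 27.95 | 30.46 |
| RE (%) | 27.87 | 21.40 | 30.72 | 25.42 |
| 90% uncertainty interval | | | | |
| CR (%) | 40.72 | 91.11 | 40.65 | 90.23 |
| Average bandwidth | 43.76 | 70.98 | 36.71 | 55.91 |
| Average deviation amplitude | 16.06 | 14.54 | 14.13 | 13.20 |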
Further, we compare the BMA(3) and BMA(9) mean predictions with respect to three flow ranges in Table
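The comparison of BMA(3) and BMA(9)

| Indices | High flow: BMA(3) | High flow: BMA(9) | Medium flow: BMA(3) | Medium flow: BMA(9) | Low flow: BMA(3) | Low flow: BMA(9) |
|---|---|---|---|---|---|---|
| Calibration period | | | | | | |
| Mean prediction | | | | | | |
| Model efficiency (%) | 93.01 | 91.74 | 32.28 | 52.76 | 95.83 | 96.39 |
| DRMS (m^{3}/s) | 78.15 | 84.90 | 23.24 | 19.41 | 7.81 | 7.27 |
| RE (%) | 15.48 | 17.44 | 35.66 | 21.51 | 69.29 | 46.73 |
| 90% uncertainty interval | | | | | | |
| CR (%) | 88.74 | 92.05 | 45.91 | 91.32 | 27.40 | 90.75 |
| Average bandwidth | 273.17 | 342.61 | 40.34 | 74.97 | 6.39 | 19.23 |
| Average deviation amplitude | 59.78 | 63.66 | 18.33 | 15.44 | 6.21 | 5.02 |
| Validation period | | | | | | |
| Mean prediction | | | | | | |
| Model efficiency (%) | 89.00 | 85.47 | 22.03 | 41.82 | 93.66 | 94.94 |
| DRMS (m^{3}/s) | 92.51 | 106.35 | 19.01 | 16.42 | 6.87 | 6.14 |
| RE (%) | 22.49 | 27.68 | 31.35 | 17.66 | 67.48 | 45.11 |
| 90% uncertainty interval | | | | | | |
| CR (%) | 85.33 | 88.00 | 46.76 | 90.81 | 28.60 | 90.02 |
| Average bandwidth | 252.88 | 282.17 | 34.97 | 61.22 | 7.19 | 18.45 |
| Average deviation amplitude | 65.67 | 66.12 | 14.90 | 14.03 | 5.99 | 4.82 |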
In this paper, the Bayesian Model Averaging (BMA) method is employed to construct a three-member prediction ensemble, denoted by BMA(3), and a nine-member prediction ensemble, denoted by BMA(9), for ensemble prediction as well as for prediction uncertainty analysis. Three kinds of comparisons are made in terms of both the mean prediction and the prediction uncertainty interval: BMA(3) with its three individual models, BMA(9) with its nine individual models, and BMA(3) with BMA(9). In particular, the observed flows are divided into three ranges for detailed comparison and analysis. The performance of the two BMA schemes can be summarized as follows.
In terms of mean predictions, BMA(3) generally performs better than any of its individual models, and the BMA(9) mean prediction is generally more accurate than each of its individual model predictions. The comparison between the BMA(3) and BMA(9) mean predictions indicates that BMA(9) has no advantage over BMA(3) as far as the entire flow series is concerned. The BMA(9) mean prediction performs better than that of BMA(3) in both the medium and low flow ranges but worse in the high flow range.
In terms of the containing ratio for assessing the uncertainty intervals, the BMA(3) has a larger
The average bandwidth
The average deviation amplitude
This study shows that BMA is particularly useful for dealing with two issues. Firstly, when two or more competing models or methods are available for the same problem, BMA can assess their relative performance by assigning a weight to each model or method and can then produce a more accurate mean prediction by weighted averaging of all their predictions. Secondly, BMA can be used when there is uncertainty over control variables. The uncertainty intervals for both the individual predictions and the BMA prediction can be derived when the distribution of the data is known or assumed.
Two limitations of this BMA study also need to be pointed out. The first concerns the data transformation. The daily flow data do not strictly follow a normal distribution even after the Box-Cox transformation; in fact, it is impossible to make every prediction from every model normally distributed by using a single uniform transformation coefficient. The second concerns the quality of the hydrological models chosen for combination. The models employed in this paper are all conceptual hydrological models. If better models are chosen as ensemble members, better results can be expected from the BMA combination.
This research is supported by the National Natural Science Foundation of China (Grant nos. 51190094, 51079098), which is greatly appreciated. The comments and suggestions from the editor and the reviewers are very helpful in the improvement of the paper and are greatly appreciated.