Model averaging is a technique used to account for model uncertainty in both Bayesian and frequentist multimodel inference. In this paper, we compare the performance of model-averaged Bayesian credible intervals and frequentist confidence intervals. Frequentist intervals are constructed according to the model-averaged tail area (MATA) methodology. Differences between the Bayesian and frequentist methods are illustrated through an example involving cloud seeding. The coverage performance and interval width of each technique are then studied using simulation. A frequentist MATA interval performs best in the normal linear setting, while Bayesian credible intervals yield the best coverage performance in a lognormal setting. The use of a data-dependent prior probability for models improved the coverage of the model-averaged Bayesian interval, relative to that using uniform model prior probabilities. Data-dependent model prior probabilities are philosophically controversial in Bayesian statistics, and our results suggest that their use can be beneficial when model averaging.
1. Introduction
Historically, statistical inference has been based on a single model selected from among a set of predetermined candidate models, with no allowance made for model uncertainty. This process of model selection has been shown to produce biased estimators and result in the incorrect calculation of standard error terms [1–4]. Recently, model averaging has gained popularity as a technique to incorporate model uncertainty into the process of inference [5–7]. The use of model averaging has been studied in a variety of settings (e.g., [8, 9]), where it generally exhibits favorable results relative to traditional model selection.
Model averaging is a natural extension in the Bayesian paradigm, where the choice of model is introduced as a discrete-valued parameter. A prior probability mass function is specified for this parameter, defining the prior probability of each candidate model. Posterior model probabilities are defined by the posterior distribution of the model parameter, and the posterior distributions for model parameters are not conditional upon a particular model and hence naturally account for model uncertainty [10, 11]. In practice, Bayesian model averaging is achieved by allowing a Gibbs sampler to traverse the augmented parameter space, which generates approximations to the posterior distributions of interest. Facilitated by recent advances in computation, Bayesian model averaging has been widely applied in a variety of application domains (e.g., [12–14]).
In the frequentist setting, a model-averaged estimate θ^ is defined as the weighted sum of the single-model estimates,
$$\hat\theta = \sum_{i=1}^{R} w_i \hat\theta_i,$$
where θ^i is the estimate under model Mi, the model weights wi are determined from an information criterion such as AIC, and the summation is over the set of R candidate models.
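Once the single-model estimates and weights are in hand, the weighted sum above is a direct computation. The numbers below are purely hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical single-model estimates of theta and information-criterion
# model weights (weights sum to 1); not taken from the paper's data.
estimates = [1.8, 2.4]   # theta-hat under M1 and M2
weights = [0.38, 0.62]   # w_1, w_2

# Model-averaged estimate: weighted sum of single-model estimates.
theta_hat = sum(w * t for w, t in zip(weights, estimates))
print(theta_hat)  # approximately 2.172
```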
Several approaches to constructing frequentist model-averaged confidence intervals have been suggested. Wald intervals of the form θ^±zαse^(θ^), where zα is the (1-α) quantile of the standard normal distribution, rely on accurate estimation of se(θ^), the standard error of θ^. Estimation of this term is complicated by the fact that the model weights and the single-model estimates are all random quantities. Burnham and Anderson [6] have suggested a variety of forms for se^(θ^), which are studied by Claeskens and Hjort [7] and by Turek and Fletcher [15]. In each of these studies, model-averaged Wald intervals of this form were found to perform poorly in terms of coverage rate.
An alternate methodology for the construction of frequentist model-averaged intervals is proposed by Turek and Fletcher [15]. Here, each confidence limit is defined as the value for which a weighted sum of the resulting single-model Wald interval error rates is equal to the desired error rate. As this involves averaging the “tail areas” of the sampling distributions of single-model estimates, this new construction is called a model-averaged tail area Wald (MATA-Wald) interval. In a simulation study by Turek and Fletcher [15], the MATA-Wald interval outperformed model-averaged intervals of the form θ^±zαse^(θ^). Fletcher and Turek [16] applied the MATA construction to profile likelihood intervals to produce a model-averaged tail area profile likelihood (MATA-PL) interval. Coverage properties of MATA confidence intervals are also studied in Kabaila et al. [17], and a transformed version of the MATA interval was proposed by Yu et al. [18].
In this paper, we compare the performance of model-averaged Bayesian credible intervals and the MATA-Wald and MATA-PL intervals of Turek and Fletcher [15] and Fletcher and Turek [16]. The effect of using various model prior probabilities and parameter prior distributions on Bayesian intervals is considered. We also study the use of several information criteria to calculate frequentist model weights. A theoretical study of the asymptotic properties of these intervals is complicated by the random nature of the model weights. For this reason, we assess the performance of these intervals through a simulation study.
In Section 2, we define the Bayesian and frequentist model-averaged intervals. The differences between these intervals are shown in Section 3, through an example involving cloud seeding. We describe the simulation study used to compare these intervals in Section 4 and present the results of this study in Section 5. We conclude with a discussion in Section 6.
2. Model-Averaged Intervals
Assume a set of R candidate models {Mi} exists, where the parameter of interest θ is common to all models. For data y, let model Mi have likelihood function Li(θ,λi), parameterized in terms of θ and the nuisance parameter λi, which may be vector-valued. We now define the Bayesian and frequentist model-averaged intervals for θ.
2.1. Bayesian Interval
The model-averaged posterior distribution for θ is
$$p(\theta \mid y) = \sum_{i=1}^{R} p(\theta \mid M_i, y)\, p(M_i \mid y), \tag{1}$$
where p(θ∣Mi,y) is the posterior distribution of θ under model Mi and p(Mi∣y) is the posterior probability of Mi [10]. An equal-tailed (1−2α)100% model-averaged Bayesian (MAB) credible interval is defined by the α and (1−α) quantiles of p(θ∣y).
Each posterior distribution p(θ∣Mi,y) in (1) may be expressed through integration of the joint posterior, as
$$p(\theta \mid M_i, y) = \int p(\theta, \lambda_i \mid M_i, y)\, d\lambda_i \propto \int L_i(\theta, \lambda_i)\, p(\theta, \lambda_i \mid M_i)\, d\lambda_i, \tag{2}$$
following Bayes' theorem, where p(θ,λi∣Mi) is the joint prior distribution for parameters θ and λi under Mi. The posterior model probabilities in (1) may be expressed as p(Mi∣y) ∝ p(y∣Mi)p(Mi), where p(Mi) is the prior probability of model Mi and p(y∣Mi) is the integrated likelihood under Mi, given by
$$p(y \mid M_i) = \iint L_i(\theta, \lambda_i)\, p(\theta, \lambda_i \mid M_i)\, d\theta\, d\lambda_i. \tag{3}$$
Evaluation of the integrals in (2) and (3) is generally difficult in practice, and Markov chain Monte Carlo (MCMC) simulation is used to approximate the posterior distributions of interest. In the multimodel case, this is implemented using the reversible jump MCMC (RJMCMC) algorithm [19].
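When each candidate model can be fit separately and the posterior model probabilities are available, the mixture in (1) can also be approximated without RJMCMC by resampling single-model posterior draws in proportion to p(Mi∣y) and taking quantiles of the pooled sample. The sketch below uses stand-in normal draws and assumed posterior model probabilities purely for illustration:

```python
import random

random.seed(1)

# Assumed inputs: MCMC draws of theta under each single model (stand-in
# normal draws here) and the posterior model probabilities p(M_i | y).
draws = {
    "M1": [random.gauss(1.0, 0.3) for _ in range(10000)],
    "M2": [random.gauss(1.5, 0.4) for _ in range(10000)],
}
post_prob = {"M1": 0.5, "M2": 0.5}

# Sample from the model-averaged posterior (1): choose a model with
# probability p(M_i | y), then a draw from p(theta | M_i, y).
models = list(post_prob)
probs = [post_prob[m] for m in models]
mixed = sorted(
    random.choice(draws[random.choices(models, weights=probs)[0]])
    for _ in range(20000)
)

# Equal-tailed 95% model-averaged credible interval: alpha and (1 - alpha)
# quantiles of the pooled sample.
alpha = 0.025
lower = mixed[int(alpha * len(mixed))]
upper = mixed[int((1 - alpha) * len(mixed))]
```

This two-stage resampling targets exactly the mixture distribution in (1); RJMCMC achieves the same goal in one pass while also estimating p(Mi∣y).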
2.2. Frequentist Interval
The frequentist MATA intervals are constructed in a manner analogous to Bayesian model averaging. Confidence limits are defined such that the weighted sum of error rates under each single-model interval will produce the desired overall error rate. This utilizes model weights wi, which are derived from an information criterion.
We initially focus on the information criterion AIC = −2 log L̂ + 2p to define model weights, where L̂ is the maximized likelihood and p is the number of parameters. Model weights are calculated as wi ∝ exp(−ΔAICi/2), where ΔAICi ≡ AICi − min_{j=1,…,R} AICj and AICi is the value of the information criterion for model Mi [20]. Other choices of information criteria for defining model weights are addressed in the discussion in Section 6.
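Normalizing these weights so that they sum to one is straightforward in practice; a minimal Python sketch with hypothetical AIC values:

```python
import math

def aic_weights(aic):
    """AIC model weights: w_i proportional to exp(-delta_AIC_i / 2)."""
    delta = [a - min(aic) for a in aic]      # delta_AIC_i
    raw = [math.exp(-d / 2.0) for d in delta]
    total = sum(raw)
    return [r / total for r in raw]          # normalized to sum to 1

# Hypothetical AIC values for two candidate models; a difference of 1 AIC
# unit yields weights of roughly 0.38 and 0.62.
w = aic_weights([100.0, 99.0])
```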
2.2.1. MATA-Wald Interval
In the normal linear model, the confidence limits θL and θU of a single-model (1−2α)100% Wald interval for θ satisfy the equations
$$1 - F_\nu(t_L) = \alpha, \qquad F_\nu(t_U) = \alpha, \tag{4}$$
where Fν(·) is the distribution function of the t-distribution with ν degrees of freedom, ν is the error degrees of freedom associated with the model, tL = (θ^ − θL)/se^(θ^), tU = (θ^ − θU)/se^(θ^), and se^(θ^) is the estimated standard error of θ^ [21, 22]. A MATA-Wald interval is constructed using a weighted sum of the single-model error rates. The lower and upper confidence limits of a MATA-Wald interval, θL and θU, are defined as the values satisfying
$$\sum_{i=1}^{R} w_i \left[ 1 - F_{\nu_i}(t_{L,i}) \right] = \alpha, \qquad \sum_{i=1}^{R} w_i \, F_{\nu_i}(t_{U,i}) = \alpha, \tag{5}$$
where model Mi has νi error degrees of freedom, tL,i = (θ^i − θL)/se^(θ^i), tU,i = (θ^i − θU)/se^(θ^i), and θ^i is the estimate of θ under model Mi.
The MATA-Wald interval may be generalized to nonnormal data, assuming that we can specify a transformation ϕ = g(θ) for which the sampling distribution of ϕ^i = g(θ^i) is approximately normal when Mi is true; for example, ϕ = logit(θ) when θ is a probability. In this case, the MATA-Wald confidence limits θL and θU are the values satisfying the pair of equations
$$\sum_{i=1}^{R} w_i \left[ 1 - \Phi(z_{L,i}) \right] = \alpha, \qquad \sum_{i=1}^{R} w_i \, \Phi(z_{U,i}) = \alpha, \tag{6}$$
where Φ(·) is the standard normal distribution function, zL,i = (ϕ^i − ϕL)/se^(ϕ^i), zU,i = (ϕ^i − ϕU)/se^(ϕ^i), ϕL = g(θL), and ϕU = g(θU), as set out by Turek and Fletcher [15].
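Solving the pair of equations (6) for θL and θU requires numerical root-finding; the paper uses R's uniroot. The Python sketch below, with purely hypothetical estimates, standard errors, and weights, uses simple bisection in its place:

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs on a scale where normality is assumed to hold:
# single-model estimates, standard errors, and AIC model weights.
est = [1.8, 2.4]
se = [0.30, 0.35]
w = [0.38, 0.62]
alpha = 0.025   # 95% interval

def lower_eq(theta):
    # sum_i w_i [1 - Phi(z_L,i)] - alpha, with z_L,i = (est_i - theta)/se_i
    return sum(wi * (1.0 - phi((ei - theta) / si))
               for wi, ei, si in zip(w, est, se)) - alpha

def upper_eq(theta):
    # sum_i w_i Phi(z_U,i) - alpha, with z_U,i = (est_i - theta)/se_i
    return sum(wi * phi((ei - theta) / si)
               for wi, ei, si in zip(w, est, se)) - alpha

def bisect(f, lo, hi, tol=1e-10):
    """Bisection root-finder; assumes f changes sign on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta_L = bisect(lower_eq, -10.0, 20.0)
theta_U = bisect(upper_eq, -10.0, 20.0)
```

By construction, θL lies below and θU above the single-model estimates, and each weighted tail area equals α at the solution.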
2.2.2. MATA Profile Likelihood Interval
Assuming a single model with likelihood function L(θ,λ), the limits θL and θU of a (1−2α)100% profile likelihood interval for θ satisfy
$$1 - \Phi\big(r(\theta_L)\big) = \alpha, \qquad \Phi\big(r(\theta_U)\big) = \alpha, \tag{7}$$
where r(θ) is the signed likelihood ratio statistic, defined as
$$r(\theta) = \operatorname{sign}(\hat\theta - \theta)\, \sqrt{2 \left[ \log L_p(\hat\theta) - \log L_p(\theta) \right]}, \tag{8}$$
and Lp(θ) = maxλ L(θ,λ) is the profile likelihood function for θ [23, pp. 126–129]. The limits θL and θU of the MATA-PL interval are defined as the values which satisfy
$$\sum_{i=1}^{R} w_i \left[ 1 - \Phi\big(r_i(\theta_L)\big) \right] = \alpha, \qquad \sum_{i=1}^{R} w_i \, \Phi\big(r_i(\theta_U)\big) = \alpha, \tag{9}$$
where ri(θ) is defined in terms of the corresponding likelihood function Li(θ,λi), as in (8), and as described by Fletcher and Turek [16].
3. Example
We use a study of cloud seeding to illustrate the differences between these methods of model averaging. There is clear evidence that seeding clouds causes an increase in the mean volume of rainfall [24–26]. However, the size of this effect may depend on the pattern of motion of the clouds. As rainfall volume has agricultural impacts, the results may affect the practicality and focus of cloud seeding operations. The data we consider come from testing conducted by the Experimental Meteorology Laboratory in Florida, USA. Total rainfall volume was measured for 27 stationary clouds, 16 of which were seeded and 11 of which were unseeded. The full data set appears in Biondini [27], and the subset relevant to our analysis is presented in Table 1.
Table 1: Rain volume data recorded in 1968 and 1970 cloud seeding experiments. All clouds are stationary and are categorized as seeded or unseeded. Rain volume is measured in thousands of cubic meters (10³ m³).

    Seeded clouds:    Unseeded clouds:
    rain volume       rain volume
    --------------    ----------------
     160.32              32.29
      38.84              32.53
    3396.34             397.33
     605.02            1026.84
     147.21             427.38
     248.27            1487.62
     339.80              45.28
     339.80               6.06
    1209.79               6.06
     245.66             201.63
     870.11              26.84
     146.34
     315.44
     142.63
      40.46
      50.23
Suppose that we aim to predict the expected rainfall from seeded, stationary clouds. The lognormal distribution can provide a good model for total rain volume [27]. Denote the volume of rainfall from seeded, stationary clouds as RS, where log RS ~ N(βS, σ²), and the volume of rainfall resulting from unseeded, stationary clouds as RU, where log RU ~ N(βU, σ²). The quantity of interest is the expected rain volume resulting from the seeded clouds, θS ≡ E[RS] = exp(βS + σ²/2), and we consider the following two models:
βS=βU;
βS and βU unspecified.
In the Bayesian analyses, we used a vague N(0, 100²) prior distribution for the parameters βS and βU, a uniform prior distribution on the interval (0, 100) for σ [28], and an equal prior probability for each model. We ran an MCMC algorithm for 300,000 iterations, with a 5% burn-in period. Convergence was assessed using the Brooks-Gelman-Rubin (BGR) diagnostic on two parallel chains [29, 30]. This indicated convergence for each model, with all BGR values being less than 1.008.
Frequentist models were fit using maximum likelihood. Since we are interested in prediction of θS, each likelihood function was reparameterized using log θS − σ²/2 in place of βS and log θU − σ²/2 in place of βU. The MATA-Wald interval was constructed according to (6) and the MATA-PL interval following (9), both of which used AIC weights for wi.
The resulting Bayesian posterior model probabilities were p(M1∣y)=p(M2∣y)=0.50, which were equal to the model prior probabilities to two decimal places. The AIC weights slightly favored M2, with w1=0.38 and w2=0.62. Figure 1 shows the predicted mean rain volume θ^S from seeded, stationary clouds, with 95% confidence intervals. Predictions and confidence intervals are shown for single-model inferences under M1 and M2, as well as using model averaging.
Figure 1: Expected mean rainfall for seeded, stationary clouds, under each model and using model averaging. Vertical bars show 95% intervals for each prediction. Intervals shown: Bayesian and MAB (purple), Wald and MATA-Wald (orange), and profile likelihood and MATA-PL (blue).
The Bayesian posterior mean and the maximum likelihood estimate for predicted rainfall are reasonably similar, with the Bayesian estimate being approximately 15% higher under each model. As expected, all estimates under M2 (where seeding may cause increased rainfall) are greater than those under M1.
The differences between methods are highlighted by the confidence intervals for expected rainfall. All lower limits are reasonably similar, while the upper limits from the Bayesian analyses are markedly higher than those from the frequentist analyses. This is particularly true under M2 and also when model averaging, where the MAB interval is 62% wider than the MATA-Wald interval. The MAB interval produces a visually appealing compromise between the single-model Bayesian intervals, especially when considering the high degree of model uncertainty.
Each profile likelihood interval is slightly more asymmetric than the corresponding Wald interval, as one would expect. The frequentist model-averaged intervals again produce a pleasing compromise between the separate inferences under each model. In light of the model uncertainty present, it would seem appropriate to use one of the model-averaged intervals to summarize the results of this analysis. It is also worth emphasizing that the analysis presented here is general: the same approach applies to any data analysis in which there is model uncertainty, that is, whenever the true, underlying data-generating model is unknown.
4. Simulation Study
Based on the example in Section 3, we considered a two-sample setting for the simulation study, using both normal and lognormal data. Observations were generated as either Yij ~ N(βi, σ²) or log Yij ~ N(βi, σ²), for i = 1, 2 and j = 1, …, n. We fixed β1 = 0, β2 = 1, and σ² = 1 and varied the sample size n between 10 and 100. We focused on prediction of θi ≡ E[Yij], for i = 1, 2. In the lognormal case, θi = exp(βi + σ²/2), so the likelihood was again reparameterized using log θi − σ²/2 in place of βi. The two models considered were
β1=β2;
β1 and β2 unspecified.
The performance of each method was assessed by the actual coverage rate achieved, defined as the proportion of simulations for which θ∈[θL,θU]. We averaged results over 20,000 simulations, ensuring a standard error for the coverage rate less than 0.3%. In addition, we calculated the mean interval width of each method, defined as θU-θL. All calculations were performed in R, version 2.13.0 [31].
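The coverage calculation itself is simple bookkeeping: simulate many datasets, construct the interval on each, and record the proportion containing the true value. The sketch below illustrates this for a single-model normal-theory interval for a mean; it is a stand-in for the paper's model-averaged intervals, and it uses the normal quantile 1.96 rather than the exact t quantile for brevity:

```python
import math
import random

random.seed(42)

# Simulation settings (illustrative, not the paper's two-sample design).
mu, sigma, n, nsim = 1.0, 1.0, 50, 2000
z = 1.96  # approximate 97.5% normal quantile
covered = 0

for _ in range(nsim):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(y) / n
    s = math.sqrt(sum((yi - m) ** 2 for yi in y) / (n - 1))
    half = z * s / math.sqrt(n)
    # Record whether the interval [m - half, m + half] covers the truth.
    if m - half <= mu <= m + half:
        covered += 1

coverage = covered / nsim  # estimated coverage rate, nominally near 0.95
```

With 20,000 simulations, as in the study, the Monte Carlo standard error of such a coverage estimate is below 0.2% near the nominal 95% level.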
4.1. Bayesian Implementation
Three sets of prior probabilities were considered, for the construction of three distinct model-averaged Bayesian intervals. The first Bayesian interval (MAB) used equal prior probabilities for each model and "flat" prior distributions for the parameters, βi ~ N(0, 100²) and σ ~ Uniform(0, 100), as suggested by Gelman [28]. The second interval (MABJ) used equal model prior probabilities and improper Jeffreys' prior distributions [32] for the parameters: p(βi) ∝ 1 and p(σ) ∝ 1/σ (see, e.g., [33]). The third interval (MABKL) used flat prior distributions for the parameters and the Kullback-Leibler (KL) prior probability for each model, defined as
$$p(M_i) \propto \exp\left\{ p_i \left( \tfrac{1}{2} \log n - 1 \right) \right\}, \tag{10}$$
where pi is the number of parameters in model Mi [6, pp. 302–305]. The KL model prior is a Bayesian counterpart to frequentist AIC model weights, being designed to produce posterior model probabilities asymptotically equal to AIC model weights.
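As an illustration of (10), the KL model prior can be computed and normalized directly. The sketch below uses the two candidate models of the simulation, M1 with two parameters (β, σ) and M2 with three (β1, β2, σ), at an assumed sample size of n = 20:

```python
import math

def kl_prior(num_params, n):
    """Normalized KL model prior: p(M_i) proportional to
    exp{p_i (log(n)/2 - 1)}, as in (10)."""
    raw = [math.exp(p * (0.5 * math.log(n) - 1.0)) for p in num_params]
    total = sum(raw)
    return [r / total for r in raw]

# M1 has 2 parameters, M2 has 3; n = 20 is an assumed sample size.
priors = kl_prior([2, 3], n=20)
```

Note that for n > e² ≈ 7.4 the exponent in (10) is positive, so the KL prior assigns higher prior probability to the more complex model, consistent with the idea that larger samples can support more parameters.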
A Gibbs sampler was implemented in R, using the RJMCMC algorithm. Convergence of two parallel chains was again assessed using the BGR convergence diagnostic. Simulations which failed to converge after 100,000 iterations (BGR>1.1) were discarded. In total, 99.7% of the simulations were retained, with a maximum BGR value of 1.099 and a mean BGR value of 1.007. The initial 5% of each simulation was discarded as burn-in.
4.2. Frequentist Implementation
Frequentist model-averaged intervals were constructed using AIC weights, as defined in Section 2.2. For the normal linear simulation, the MATA-Wald interval was constructed using (5) and the lognormal MATA-Wald interval following (6). The MATA-PL interval was defined according to (9), using the reparameterized likelihood in the lognormal case. Numerical solutions to these equations were found using the R root-finding command uniroot.
5. Results
In the normal linear setting, the results for θ1 and θ2 are identical by symmetry. In addition, in the lognormal setting, the results were qualitatively similar for θ1 and θ2. Therefore, for simplicity, we focus on the results for θ2. Figure 2(a) shows the estimated coverage rate for the MATA-Wald, MATA-PL, and MAB intervals. The MATA-Wald interval performs best, in particular for small sample sizes, followed by the MATA-PL and MAB intervals. All intervals asymptotically approach the nominal coverage rate of 95%. We would expect the MATA-Wald interval to perform well, since M2 is the generating model, and the Wald interval based on this model will achieve exact nominal coverage in this setting. To observe the trade-off between coverage rate and interval width, coverage is also plotted against mean interval width. For comparable interval width, the MATA-Wald interval achieves notably improved performance.
Figure 2: Confidence interval coverage rate performance for prediction of the mean, θ2, in the normal linear simulation. (a) MAB (purple), MATA-Wald (orange), and MATA-PL (blue) intervals. (b) MAB (solid), MABJ (dotted), and MABKL (dashed) Bayesian intervals. The nominal 95% coverage rate is shown as a dashed line.
Figure 2(b) provides the same comparison for the Bayesian MAB, MABJ, and MABKL intervals. The MABKL interval provides a noticeable improvement in coverage performance, as compared to the MAB and MABJ intervals, each of which uses equal model prior probabilities. Use of the KL prior probability for models in the MABKL interval provides an improvement of almost 2% in coverage rate for small sample sizes. This improvement comes at no noticeable increase in interval width. In addition, the use of Jeffreys’ prior distributions for the parameters slightly degrades the performance of the Bayesian interval, relative to the use of flat prior distributions.
Figure 3 provides analogous comparisons in the lognormal setting. The MAB interval outperforms the frequentist intervals for small sample sizes, although it requires a substantial increase in interval width. The MAB interval remains roughly within 1% of the nominal coverage rate for all sample sizes, while the frequentist intervals deviate by as much as 3%. The MATA-Wald interval performs better than the MATA-PL interval, with both exhibiting comparable interval widths.
Figure 3: Confidence interval coverage rate performance for prediction of the mean, θ2, in the lognormal simulation. (a) MAB (purple), MATA-Wald (orange), and MATA-PL (blue) intervals. (b) MAB (solid), MABJ (dotted), and MABKL (dashed) Bayesian intervals. The nominal 95% coverage rate is shown as a dashed line.
Comparison of the Bayesian intervals in the lognormal setting is qualitatively similar to that of the normal linear setting. The use of the KL prior probability for models in the MABKL interval provides an improvement over the use of equal prior probabilities, and here the use of Jeffreys’ prior distributions for the parameters severely degrades performance, relative to the use of flat prior distributions. Overall, the Bayesian interval using KL model prior probabilities outperforms all other model-averaged interval constructions in the lognormal setting.
6. Discussion
The aim of this paper has been to compare the performance of Bayesian and frequentist model-averaged confidence intervals. The frequentist MATA intervals are based upon model averaging the error rates of single-model intervals, rather than constructing an interval around a model-averaged estimator. This construction is analogous to Bayesian model averaging, and the idea was initially motivated using an analogy to a model-averaged Bayesian interval [16]. The MATA construction was studied further in Turek and Fletcher’s work [15], where it is shown that, asymptotically, a MATA interval will converge to the single-model interval based upon the candidate model with minimum Kullback-Leibler distance to the true, generating model.
Through simulation, the frequentist MATA-Wald interval produced the best coverage properties in the normal linear setting, where we would expect Wald intervals to perform well. In the lognormal setting, Bayesian intervals produced substantial improvement over the frequentist intervals. A Bayesian analysis fully allows for parameter uncertainty and does not rely on the asymptotic distributions of estimators. So long as we are willing to accept the prior distributions for the parameters, we might expect the Bayesian approach to be better suited for nonnormal settings. In contrast, when the assumptions of Wald intervals are satisfied exactly (as with normal data), use of the frequentist MATA-Wald interval resulted in improved coverage performance.
In both settings, the use of KL prior probabilities provided a noteworthy improvement in the performance of the Bayesian interval, when compared to the use of equal model prior probabilities. The KL model prior is designed to produce posterior model probabilities approximately equal to frequentist AIC model weights. This agreement between posterior probabilities and model weights was observed in our simulation.
Burnham and Anderson [6] describe prior probabilities which depend upon sample size and model complexity, such as the KL prior, as “savvy priors,” and argue in favor of their use. Larger data sets have the potential to support more complex models, which may justify assigning model prior probabilities dependent upon the data available and the relative complexity of the models being considered.
In contrast, Link and Barker [34] argue that for large sample sizes the data ought to completely dominate the priors, and the use of prior probabilities which depend upon the sample size may prevent this from occurring. They also argue that prior probabilities should represent one’s beliefs prior to data collection and have no dependence upon the data observed. This is consistent with Box and Tiao [33], where a prior is defined as “what is known about [a parameter] without knowledge of the data.” This discrepancy in what a prior probability may represent is interesting, especially considering that data-dependent priors were seen to be advantageous for Bayesian model averaging.
Thus far, we have presented results for frequentist MATA intervals constructed using AIC model weights. Two alternate information criteria were also considered: AICc [35] and BIC [36]. AICc was derived as a small-sample correction to AIC and in certain contexts may be favorable for use in model selection [37]. BIC provides an asymptotic approximation to Bayes factors and may also be used to approximate the posterior model probabilities which result from equal model priors [34].
In our study, the MATA intervals based upon AICc and BIC weights were consistently inferior to those using AIC weights. This was true in both simulation settings, even for small sample sizes, where one might have expected AICc to perform best. This result is consistent with the findings of Fletcher and Dillingham [38], in which model-averaged intervals constructed using AIC weights yielded improved coverage properties over those using a variety of other information criteria, including both AICc and BIC.
Our study has used the assumption that “truth is in the model set.” This assumption is also used in the derivations of the MATA-Wald and MATA-PL intervals, as well as generally in Bayesian multimodel inference. We do not feel that this assumption undermines our conclusions, since all model averaging techniques would be adversely affected when this assumption is not met.
Our simulation has also followed the assumption that "the largest model is truth." Philosophically this may not pose a problem, as Burnham and Anderson [6] believe that nature is arbitrarily complex, and it is unrealistic to assume that we might fully characterize the underlying process. From this viewpoint, model selection attempts to identify the most parsimonious model approximating the truth, given the data available. This assumption may in part explain the superior performance of AIC model weights, since AIC is known to favor increased model complexity [39]. However, we do not consider this an issue, since results from Fletcher and Turek [16] indicate that intervals using AIC weights perform at least as well as those using other information criteria when the most complex model is not the generating model. Furthermore, all simulations presented herein were repeated using data generated under the simpler of the two candidate models (M1). In these simulations, all model-averaged constructions performed similarly to one another and achieved very nearly the nominal coverage rate. This would be expected, since model averaging takes place over two models, both of which now represent truth.
Any simulation study is inherently limited in scope. We have considered both normal and nonnormal data, as well as a wide range of sample sizes, and observed consistent patterns throughout. Bayesian model averaging was better suited for the nonnormal setting, and the frequentist MATA-Wald interval performed best in the normal linear setting. In addition, the performance of model-averaged Bayesian intervals was improved through use of the KL model prior, a data-dependent prior probability. This result raises consideration of exactly what Bayesian priors represent; in particular, whether or not knowledge of the size of an observed sample provides grounds to update model prior probabilities.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
References

[1] Hurvich, C. M. and Tsai, C.-L. (1990). The impact of model selection on inference in linear regression. The American Statistician, 44(3), 214–217.
[2] Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158(3), 419–466.
[3] Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society, Series B, 57(1), 45–97.
[4] Kabaila, P. (1995). The effect of model selection on confidence regions and prediction regions. Econometric Theory, 11(3), 537–549.
[5] Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92(437), 179–191.
[6] Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd edition. Springer, New York, NY, USA.
[7] Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, UK.
[8] Viallefont, V., Raftery, A. E., and Richardson, S. (2001). Variable selection and Bayesian model averaging in case-control studies. Statistics in Medicine, 20(21), 3215–3230.
[9] Pesaran, M. H. and Zaffaroni, P. (2004). Model averaging and value-at-risk based evaluation of large multi asset volatility models for risk management. Working Paper 1358.
[10] Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial. Statistical Science, 14(4), 382–417.
[11] Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1), 92–107.
[12] Volinsky, C. T., Madigan, D., Raftery, A. E., and Kronmal, R. A. (1997). Bayesian model averaging in proportional hazard models: assessing the risk of a stroke. Journal of the Royal Statistical Society, Series C (Applied Statistics), 46(4), 433–448.
[13] Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133(5), 1155–1174.
[14] Duan, Q., Ajami, N. K., Gao, X., and Sorooshian, S. (2007). Multi-model ensemble hydrologic prediction using Bayesian model averaging. Advances in Water Resources, 30(5), 1371–1386.
[15] Turek, D. and Fletcher, D. (2012). Model-averaged Wald confidence intervals. Computational Statistics and Data Analysis, 56(9), 2809–2815.
[16] Fletcher, D. and Turek, D. (2012). Model-averaged profile likelihood intervals. Journal of Agricultural, Biological, and Environmental Statistics, 17(1), 38–51.
[17] Kabaila, P., Welsh, A. H., and Abeysekera, W. Model-averaged confidence intervals. Scandinavian Journal of Statistics. doi:10.1111/sjos.12163.
[18] Yu, W., Xu, W., and Zhu, L. (2014). Transformation-based model averaged tail area inference. Computational Statistics, 29(6), 1713–1726.
[19] Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711–732.
[20] Buckland, S. T., Burnham, K. P., and Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53(2), 603–618.
[21] Wald, A. (1940). The fitting of straight lines if both variables are subject to error. The Annals of Mathematical Statistics, 11(3), 285–300.
[22] Stein, C. and Wald, A. (1947). Sequential confidence intervals for the mean of a normal distribution with known variance. The Annals of Mathematical Statistics, 18, 427–433.
[23] Davison, A. C. (2003). Statistical Models. Cambridge University Press, Cambridge, UK.
[24] Simpson, J., Woodley, W. L., Miller, A. H., and Cotton, G. F. (1971). Precipitation results of two randomized pyrotechnic cumulus seeding experiments. Journal of Applied Meteorology, 10(3), 526–544.
[25] Simpson, J. (1972). Use of the gamma distribution in single-cloud rainfall analysis. Monthly Weather Review, 100(4), 309–312.
[26] Rosenfeld, D. and Woodley, W. L. (1993). Effects of cloud seeding in west Texas: additional results and new insights. Journal of Applied Meteorology, 32(12), 1848–1866.
[27] Biondini, R. (1976). Cloud motion and rainfall statistics. Journal of Applied Meteorology, 15(3), 205–224.
[28] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.
[29] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
[30] Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
[31] R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[32] Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, 186, 453–461.
[33] Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA, USA.
[34] Link, W. A. and Barker, R. J. (2006). Model weights and the foundations of multimodel inference. Ecology, 87(10), 2626–2635.
[35] Sugiura, N. (1978). Further analysts of the data by Akaike's information criterion and the finite corrections. Communications in Statistics — Theory and Methods, 7(1), 13–26.
[36] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
[37] Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297–307.
[38] Fletcher, D. and Dillingham, P. W. (2011). Model-averaged confidence intervals for factorial experiments. Computational Statistics and Data Analysis, 55(11), 3041–3048.
[39] Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.