Analysis of the Behrens-Fisher Problem Based on Bayesian Evidence

Yuliang Yin (http://orcid.org/0000-0003-1801-0031) and Baoren Li

School of Economics, Beijing Technology and Business University, Beijing 100048, China

Research Article. Journal of Applied Mathematics, Volume 2014, Article ID 978691, Hindawi Publishing Corporation. ISSN 1687-0042, 1110-757X. doi:10.1155/2014/978691

Received 13 October 2013; Accepted 28 January 2014; Published 4 March 2014

Academic Editor: Francis T. K. Au

Copyright © 2014 Yuliang Yin and Baoren Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Behrens-Fisher problem concerns the inferences for the difference between the means of two normal populations without making any assumption about the variances. Although the problem has been extensively studied in the literature, researchers cannot agree on its solution at present. In this paper, we propose a new method for dealing with the Behrens-Fisher problem in the Bayesian framework. The Bayesian evidence for testing the equality of two normal means and a credible interval at a specified level for the difference between the means are derived. Simulation studies are carried out to evaluate the performance of the provided Bayesian evidence.

1. Introduction

The Behrens-Fisher problem may arise in the comparison of two treatments, products, and so forth. It concerns comparing the means of two normal distributions whose variances are unknown. Suppose that $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are two independent random samples from two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, where both $\sigma_1^2$ and $\sigma_2^2$ are completely unspecified. We are interested in testing the hypothesis $H_0: \mu_1 = \mu_2$ and in interval estimation of the difference between the two means, $\theta = \mu_1 - \mu_2$.

The difficulty with the Behrens-Fisher problem is that the standard classical frequentist evidence is not available because nuisance parameters are present. Tsui and Weerahandi [1] introduced the concept of the generalized P value to deal with nuisance parameters in testing hypotheses. If the corresponding sample means and sample variances are denoted by $(\bar{x}, \bar{y})$ and $(s_1^2, s_2^2)$, respectively, a generalized frequentist evidence for testing $H_0$ can be formulated by the approach of the generalized P value as

$$p(x) = P\left( F_{1,m+n-2} \ge \frac{(\bar{x} - \bar{y})^2 (m+n-2)}{\tilde{s}_1^2 B_{(m-1)/2,(n-1)/2}^{-1} + \tilde{s}_2^2 \left(1 - B_{(m-1)/2,(n-1)/2}\right)^{-1}} \right), \tag{1}$$

where $F_{1,m+n-2}$ is an $F$-variable with 1 and $m+n-2$ degrees of freedom, $B_{(m-1)/2,(n-1)/2}$ is a Beta-variable with parameters $(m-1)/2$ and $(n-1)/2$ that is independent of $F_{1,m+n-2}$, $\tilde{s}_1^2 = (1 - m^{-1}) s_1^2$, and $\tilde{s}_2^2 = (1 - n^{-1}) s_2^2$. This generalized frequentist solution is formally equivalent to the Bayesian solution given by Jeffreys [2] or the fiducial solution given by Wallace [3]. Meng [4] introduced the concept of the posterior predictive P value and provided posterior predictive evidence. In the case of the Behrens-Fisher problem, this test is formulated as

$$p_{ppp}(x) = P\left( F_{1,m+n} \ge \frac{(\bar{x} - \bar{y})^2 (m+n)}{\left[\tilde{s}_1^2 + (\bar{x} - \mu)^2\right] B_{m/2,n/2}^{-1} + \left[\tilde{s}_2^2 + (\bar{y} - \mu)^2\right] \left(1 - B_{m/2,n/2}\right)^{-1}} \right), \tag{2}$$

where $F_{1,m+n}$ is an $F$-variable with 1 and $m+n$ degrees of freedom, $B_{m/2,n/2}$ is a Beta-variable with parameters $m/2$ and $n/2$ that is independent of $F_{1,m+n}$, and $\mu$ is a variable with a "combined $t$" distribution:

$$\pi_0(\mu \mid x) \propto \left[ 1 + \frac{1}{m-1} \left( \frac{\mu - \bar{x}}{s_1/\sqrt{m}} \right)^2 \right]^{-m/2} \times \left[ 1 + \frac{1}{n-1} \left( \frac{\mu - \bar{y}}{s_2/\sqrt{n}} \right)^2 \right]^{-n/2}. \tag{3}$$
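The probability in (1) has no closed form, but it can be approximated by simulating the $F$- and Beta-variables. The following is a minimal Python sketch of such a Monte Carlo estimate, not the authors' implementation; the function name `generalized_p_value`, the number of draws, the seed, and the sample values are all illustrative assumptions.

```python
import random
import statistics

def generalized_p_value(x, y, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the generalized p-value p(x) in (1)."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    nu = m + n - 2
    diff_sq = (statistics.mean(x) - statistics.mean(y)) ** 2
    # s~^2 = (1 - 1/m) s^2 is the variance with divisor m (population variance).
    s1_tilde_sq = statistics.pvariance(x)
    s2_tilde_sq = statistics.pvariance(y)

    hits = 0
    for _ in range(n_draws):
        # F(1, nu) variate: squared standard normal over an independent chi-square / nu.
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(nu / 2, 2.0)
        f = z * z / (chi2 / nu)
        # Beta((m-1)/2, (n-1)/2) variate, drawn independently of f.
        b = rng.betavariate((m - 1) / 2, (n - 1) / 2)
        if 0.0 < b < 1.0:
            bound = diff_sq * nu / (s1_tilde_sq / b + s2_tilde_sq / (1 - b))
        else:
            bound = 0.0  # limiting value when the denominator diverges
        hits += f >= bound
    return hits / n_draws

# Hypothetical samples, for illustration only.
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
y = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3, 4.5]
print(generalized_p_value(x, y))
```

The estimate varies with the seed and the number of draws; more draws give a smaller Monte Carlo error at the cost of run time.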

Behrens [5] gave a confidence interval for the difference between the two means in a testing context of $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$ based on the pivotal quantity

$$D = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\left( m^{-1} S_1^2 + n^{-1} S_2^2 \right)^{1/2}}. \tag{4}$$

Bartlett [6] revealed, from a frequentist perspective, that the coverage probability of the confidence interval given by Behrens is different from the specified confidence coefficient. Fisher [7] derived a fiducial interval for $\theta = \mu_1 - \mu_2$ which has a specified fiducial level by the method of fiducial inference. Neyman illustrated by calculation that an interval estimator with a fiducial level of $1 - \alpha$ is not necessarily a confidence interval with a confidence coefficient of $1 - \alpha$. Welch [8, 9] gave approximate solutions of the confidence intervals which are also constructed in a testing context based on the pivotal quantity $D$. In the Bayesian framework, Jeffreys [10], based on the objective prior

$$\pi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) \propto \sigma_1^{-2} \sigma_2^{-2}, \tag{5}$$

constructed a Bayesian credible interval. This interval is algebraically equivalent to the fiducial interval of Fisher.

For more discussions of the Behrens-Fisher problem see Wilks [11], Chernoff [12], Chand [13], Banerjee [14], Srivastava [15], Ghosh and Kim [16], Madruga et al. [17], and McMurry et al. [18].

In this paper, we derive the Bayesian evidence for the Behrens-Fisher problem using the procedure in Yin [19] for testing point null hypotheses. Based on the provided Bayesian evidence, a Bayesian credible interval at a specified credible level for the difference of the means $\theta = \mu_1 - \mu_2$ is derived in a Bayesian testing context.

This paper is organized as follows. In Section 2, we give the main results of the Bayesian analysis of the Behrens-Fisher problem concerning the testing and interval estimation of the difference of two normal means with the variances completely unknown. Some conclusions and discussions are given in Section 3.

2. Main Results

2.1. Bayesian Evidence for the Behrens-Fisher Problem

Yin [19] introduced a Bayesian measure of evidence for testing point null hypotheses of the form

$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \ne \theta_0. \tag{6}$$

Let $X_1, \ldots, X_n$ be a random sample from a distribution with density $f(x \mid \theta)$, where $\theta$ is an unknown element of the parameter space $\Theta$. The Bayesian evidence against the null hypothesis $H_0$ based on a prior $\pi(\theta)$ is given by

$$p_B(x) = P\left( |\theta - E(\theta \mid x)| \ge |\theta_0 - E(\theta \mid x)| \mid x \right), \tag{7}$$

where $E(\theta \mid x)$ is the posterior expectation of $\theta$ under the prior $\pi(\theta)$ and the probability is taken over the posterior distribution of $\theta$. A smaller $p_B(x)$ means stronger evidence against the null hypothesis $H_0$. In his work, Yin illustrated that the Bayesian evidence given by (7) under the Jeffreys noninformative prior is equivalent to the corresponding frequentist evidence for many classical testing situations and showed that Lindley's paradox in Lindley [20] can be avoided by this Bayesian method of testing point null hypotheses.
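As a worked illustration of (7), consider a standard textbook setting that is assumed here and is not taken from the paper: a normal sample with known variance $\sigma^2$ and the flat prior $\pi(\theta) \propto 1$. The posterior is then normal and centered at $\bar{x}$, and the Bayesian evidence reduces to the usual two-sided $z$-test p-value, with $\Phi$ denoting the standard normal distribution function.

```latex
% Assumed setting (not from the paper): X_1, ..., X_n ~ N(\theta, \sigma^2) with \sigma known,
% and flat prior \pi(\theta) \propto 1.
\begin{align*}
\theta \mid x &\sim N\!\left(\bar{x}, \sigma^2 / n\right), \qquad E(\theta \mid x) = \bar{x}, \\
p_B(x) &= P\left( |\theta - \bar{x}| \ge |\theta_0 - \bar{x}| \,\middle|\, x \right)
        = 2\left[ 1 - \Phi\!\left( \frac{\sqrt{n}\, |\bar{x} - \theta_0|}{\sigma} \right) \right].
\end{align*}
```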

Now consider the Behrens-Fisher problem of testing hypotheses

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \ne \mu_2. \tag{8}$$

Note that (8) can be reformulated as

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_1: \mu_1 - \mu_2 \ne 0. \tag{9}$$

The posterior distribution for $\theta = \mu_1 - \mu_2$ under the objective prior (5) can be obtained as

$$\theta \mid x \sim \bar{x} - \bar{y} - \left( \frac{s_1 T_{m-1}}{\sqrt{m}} - \frac{s_2 T_{n-1}}{\sqrt{n}} \right), \tag{10}$$

where $T_{m-1}$ and $T_{n-1}$ are two independent $t$-variables with $m-1$ and $n-1$ degrees of freedom, respectively. Since the posterior expectation of $\theta$ is

$$E(\theta \mid x) = \bar{x} - \bar{y}, \tag{11}$$

the Bayesian evidence under the objective prior (5) can be formulated as

$$p_{BF}(x) = P\left( |\theta - (\bar{x} - \bar{y})| \ge |\bar{x} - \bar{y}| \mid x \right) = P\left( \left| \frac{s_1 T_{m-1}}{\sqrt{m}} - \frac{s_2 T_{n-1}}{\sqrt{n}} \right| \ge |\bar{x} - \bar{y}| \right), \tag{12}$$

where the first probability is taken over the posterior distribution of $\theta$ and the second one is taken over two independent $t$-variables $T_{m-1}$ and $T_{n-1}$.
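The second probability in (12) involves only two independent $t$-variables, so it can be estimated by direct simulation. The sketch below shows one way this might be done in Python; the function names, the number of draws, the seed, and the sample values are illustrative assumptions rather than part of the paper.

```python
import math
import random
import statistics

def student_t(rng, df):
    """Draw one Student t variate with df degrees of freedom."""
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

def bayesian_evidence(x, y, n_draws=100_000, seed=0):
    """Monte Carlo estimate of p_BF(x) in (12)."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    # s_1 / sqrt(m) and s_2 / sqrt(n), with s the usual (divisor m - 1) standard deviation.
    scale1 = statistics.stdev(x) / math.sqrt(m)
    scale2 = statistics.stdev(y) / math.sqrt(n)
    observed = abs(statistics.mean(x) - statistics.mean(y))

    hits = 0
    for _ in range(n_draws):
        deviation = scale1 * student_t(rng, m - 1) - scale2 * student_t(rng, n - 1)
        hits += abs(deviation) >= observed
    return hits / n_draws

# Hypothetical samples, for illustration only.
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
y = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3, 4.5]
print(bayesian_evidence(x, y))
```

Only the sample sizes, sample means, and sample standard deviations enter the computation, so the same function could be rewritten to accept summary statistics instead of raw data.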

Now we carry out a simulation study to illustrate the performance of the proposed Bayesian evidence. The simulation results listed in Table 1 show that $p_{BF}(x)$ is quite reasonable evidence for testing the Behrens-Fisher problem. For fixed values of $\sigma_1$ and $\sigma_2$, the larger the difference between $\mu_1$ and $\mu_2$, the smaller the value of $p_{BF}(x)$, that is, the stronger the Bayesian evidence against the null hypothesis $H_0: \mu_1 = \mu_2$. Moreover, $p_{BF}(x)$ gives more reliable and efficient evidence when the population variances are small. It can also be noticed that the Bayesian evidence $p_{BF}(x)$ is very close to the corresponding generalized frequentist evidence $p(x)$ in (1) and the posterior predictive evidence $p_{ppp}(x)$ in (2).

Table 1: $p_{BF}(x)$, $p(x)$, and $p_{ppp}(x)$ for testing the Behrens-Fisher problem.

σ1 = 2, σ2 = 3                                   σ1 = 3, σ2 = 2
μ1      μ2      p_BF(x)   p(x)     p_ppp(x)      μ1      μ2      p_BF(x)   p(x)     p_ppp(x)
2.00    2.00    0.7823    0.7810   0.7816        2.00    2.00    0.7635    0.7606   0.7615
2.00    2.01    0.3133    0.3104   0.3102        2.00    2.01    0.3028    0.3031   0.3017
2.00    2.02    0.1722    0.1713   0.1711        2.00    2.02    0.1949    0.1950   0.1938
2.00    2.03    0.0854    0.0863   0.0868        2.00    2.03    0.0628    0.0627   0.0630
2.00    2.04    0.0418    0.0420   0.0428        2.00    2.04    0.0188    0.0185   0.0190
2.00    2.05    0.0117    0.0122   0.0121        2.00    2.05    0.0018    0.0017   0.0016

σ1 = 0.1, σ2 = 0.2                               σ1 = 0.2, σ2 = 0.1
μ1      μ2      p_BF(x)   p(x)     p_ppp(x)      μ1      μ2      p_BF(x)   p(x)     p_ppp(x)
2.000   2.000   0.8597    0.8603   0.8596        2.000   2.000   0.8525    0.8522   0.8511
2.000   2.001   0.1329    0.1319   0.1336        2.000   2.001   0.4979    0.4998   0.4961
2.000   2.002   0.0688    0.0690   0.0696        2.000   2.002   0.1090    0.1070   0.1085
2.000   2.003   0.0384    0.0383   0.0379        2.000   2.003   0.0239    0.0239   0.0237
2.000   2.004   0.0313    0.0317   0.0318        2.000   2.004   0.0014    0.0014   0.0011
2.000   2.005   0.0033    0.0032   0.0035        2.000   2.005   0.0007    0.0008   0.0006

σ1 = 2, σ2 = 0.1                                 σ1 = 2, σ2 = 2
μ1      μ2      p_BF(x)   p(x)     p_ppp(x)      μ1      μ2      p_BF(x)   p(x)     p_ppp(x)
2.00    2.00    0.5239    0.5245   0.5279        2.00    2.00    0.9963    0.9963   0.9962
2.00    2.01    0.3327    0.3346   0.3357        2.00    2.01    0.2523    0.2507   0.2500
2.00    2.02    0.0224    0.0228   0.0218        2.00    2.02    0.0996    0.1006   0.0991
2.00    2.03    0.0032    0.0033   0.0034        2.00    2.03    0.0366    0.0366   0.0368
2.00    2.04    0.0017    0.0017   0.0018        2.00    2.04    0.0119    0.0122   0.0122
2.00    2.05    0.0001    0.0001   0.0001        2.00    2.05    0.0036    0.0040   0.0040

We now apply this Bayesian evidence to two examples. The first is from Lehmann [21]: the driving times from a person's house to his workplace along two different routes were measured, and they are listed in Table 2. The second is from Ghosh et al. [22], where the data, listed in Table 3, come from a clinical trial conducted by Sahu to compare the improvement score of surgical treatment with that of nonsurgical treatment. If it is assumed that the two independent samples in each of Tables 2 and 3 are drawn from two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and if we are interested in the equality of the two means $\mu_1$ and $\mu_2$, each of these two examples reduces to the Behrens-Fisher problem of testing hypotheses (8). For both situations, the Bayesian evidence $p_{BF}(x)$, the corresponding generalized frequentist evidence $p(x)$, and the posterior predictive evidence $p_{ppp}(x)$ are all nearly zero, giving very strong evidence against the null hypothesis that there is no difference between the two means. This agrees with our intuition from the observed data.

Table 2: Driving times measured along two different routes.

Route Times
I 6.5 6.8 7.1 7.3 10.2
II 5.8 5.8 5.9 6.0 6.0 6.0 6.3 6.3 6.4 6.5 6.5

Table 3: Scores of surgical and nonsurgical treatments.

Treatment Scores
Surgical 15 9 12 16 14 15 18 13 12 11 15 9 16 9
Non-surgical 6 8 7 4 4 6 8 3 7 8 9 6 3 6 4

2.2. Bayesian Credible Interval

Based on the proposed Bayesian evidence, a credible interval for the difference of means $\theta = \mu_1 - \mu_2$ at a specified credible level can be constructed in a testing context. For the following hypothesis testing problem of comparing two normal means:

$$H_0: \mu_1 - \mu_2 = \theta_0 \quad \text{versus} \quad H_1: \mu_1 - \mu_2 \ne \theta_0, \tag{13}$$

where the variances are completely unspecified, the Bayesian evidence under the objective prior (5) is

$$p_{BF}(x; \theta_0) = P\left( |\theta - (\bar{x} - \bar{y})| \ge |\theta_0 - (\bar{x} - \bar{y})| \mid x \right) = P\left( \left| \frac{s_1 T_{m-1}}{\sqrt{m}} - \frac{s_2 T_{n-1}}{\sqrt{n}} \right| \ge |\theta_0 - (\bar{x} - \bar{y})| \right), \tag{14}$$

where the first probability is taken over the posterior distribution of $\theta$ and the second one is taken over two independent $t$-variables $T_{m-1}$ and $T_{n-1}$.

Theorem 1.

For the Behrens-Fisher problem, let $A_{BF}(\theta_0) = \{x : p_{BF}(x; \theta_0) \ge \alpha\}$, $S_{BF}(x) = \{\theta_0 : x \in A_{BF}(\theta_0)\}$, and $I_k = [\bar{x} - \bar{y} - k, \bar{x} - \bar{y} + k]$. For a fixed $\alpha$, if $I_{k_{BF}}$ satisfies

$$P(\theta \in I_{k_{BF}} \mid x) = 1 - \alpha, \tag{15}$$

then one has

$$S_{BF}(x) = I_{k_{BF}}. \tag{16}$$

Proof.

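On one hand, $I_{k_{BF}}$ satisfies

$$P(\theta \in I_{k_{BF}} \mid x) = 1 - \alpha, \tag{17}$$

which means that

$$P\left( |\theta - (\bar{x} - \bar{y})| \le k_{BF} \mid x \right) = 1 - \alpha. \tag{18}$$

On the other hand, it is easy to see that $p_{BF}(x; \theta_0) \ge \alpha$ is equivalent to

$$P\left( |\theta - (\bar{x} - \bar{y})| \le |\theta_0 - (\bar{x} - \bar{y})| \mid x \right) \le 1 - \alpha. \tag{19}$$

Because the posterior distribution of $\theta$ is continuous, comparing (18) and (19) shows that (19) holds exactly when $|\theta_0 - (\bar{x} - \bar{y})| \le k_{BF}$, that is, when $\theta_0 \in I_{k_{BF}}$. Hence $S_{BF}(x) = I_{k_{BF}}$.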

By Theorem 1, the $1 - \alpha$ credible interval for $\theta = \mu_1 - \mu_2$ centered at $E(\theta \mid x) = \bar{x} - \bar{y}$ is obtained by collecting all values $\theta_0$ for which $p_{BF}(x; \theta_0) \ge \alpha$. This is a Bayesian interval obtained in a testing context. Interestingly, the interval resulting from our method is equivalent to that given by Fisher or Jeffreys.
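The construction in Theorem 1 can be checked numerically: estimate the half-width $k_{BF}$ from posterior draws, then confirm that values of $\theta_0$ inside the interval have evidence at least $\alpha$ and values outside have evidence below $\alpha$. The following Python sketch does this under illustrative assumptions; the function names, the level $\alpha = 0.05$, the seed, and the sample values are hypothetical and not taken from the paper.

```python
import math
import random
import statistics

def posterior_deviations(x, y, n_draws=100_000, seed=0):
    """Draws of theta - (xbar - ybar) under the posterior (10)."""
    rng = random.Random(seed)

    def t(df):
        # Student t variate: standard normal over sqrt(chi-square / df).
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2, 2.0) / df)

    a = statistics.stdev(x) / math.sqrt(len(x))
    b = statistics.stdev(y) / math.sqrt(len(y))
    return [-(a * t(len(x) - 1) - b * t(len(y) - 1)) for _ in range(n_draws)]

def evidence(devs, center, theta0):
    """p_BF(x; theta0) as in (14), estimated from the posterior draws."""
    distance = abs(theta0 - center)
    return sum(abs(v) >= distance for v in devs) / len(devs)

def half_width(devs, alpha):
    """k such that about 1 - alpha of the posterior mass lies in I_k."""
    abs_devs = sorted(abs(v) for v in devs)
    return abs_devs[math.ceil((1 - alpha) * len(abs_devs)) - 1]

# Hypothetical samples, for illustration only.
x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
y = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3, 4.5]
alpha = 0.05  # assumed level
center = statistics.mean(x) - statistics.mean(y)
devs = posterior_deviations(x, y)
k = half_width(devs, alpha)
print("interval:", (center - k, center + k))
print("evidence just inside: ", evidence(devs, center, center + 0.9 * k))
print("evidence just outside:", evidence(devs, center, center + 1.1 * k))
```

Reusing one set of posterior draws for both the half-width and the evidence keeps the two estimates consistent with each other, which is what makes the inside/outside comparison meaningful.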

In fact, we have another interesting result about the interval estimation of $\theta = \mu_1 - \mu_2$ on the basis of the Bayesian evidence $p_{BF}(x; \theta_0)$, which shows that the $1 - \alpha$ credible interval centered at the posterior expectation for the Behrens-Fisher problem can be constructed from the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution of $\theta$. We summarize this as the following theorem.

Theorem 2.

For the Behrens-Fisher problem, $p_{BF}(x; \theta_0) \ge \alpha$ yields the $1 - \alpha$ credible interval for $\theta = \mu_1 - \mu_2$ centered at the posterior expectation $E(\theta \mid x) = \bar{x} - \bar{y}$ as follows:

$$I_{k_{BF}} = \left[ \hat{\theta}_{\alpha/2}(x), \hat{\theta}_{1-\alpha/2}(x) \right], \tag{20}$$

where $\hat{\theta}_{\alpha/2}(x)$ and $\hat{\theta}_{1-\alpha/2}(x)$ are, respectively, the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior distribution

$$\theta \mid x \sim \bar{x} - \bar{y} - \left( \frac{s_1 T_{m-1}}{\sqrt{m}} - \frac{s_2 T_{n-1}}{\sqrt{n}} \right). \tag{21}$$

Proof.

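We first prove that the Bayesian evidence for testing (13) can be expressed as

$$p_{BF}(x; \theta_0) = 2 \min\left\{ P(\theta \le \theta_0 \mid x), P(\theta \ge \theta_0 \mid x) \right\}. \tag{22}$$

In fact, if $\theta_0 \ge \bar{x} - \bar{y}$, we have

$$\begin{aligned}
p_{BF}(x; \theta_0) &= P\left( |\theta - (\bar{x} - \bar{y})| \ge |\theta_0 - (\bar{x} - \bar{y})| \mid x \right) \\
&= P\left( \theta - (\bar{x} - \bar{y}) \ge \theta_0 - (\bar{x} - \bar{y}) \mid x \right) + P\left( \theta - (\bar{x} - \bar{y}) \le -(\theta_0 - (\bar{x} - \bar{y})) \mid x \right) \\
&= 2 P\left( \theta - (\bar{x} - \bar{y}) \ge \theta_0 - (\bar{x} - \bar{y}) \mid x \right) \\
&= 2 P(\theta \ge \theta_0 \mid x),
\end{aligned} \tag{23}$$

where the third equality is due to the fact that the posterior distribution of $\theta$ is symmetric about $\bar{x} - \bar{y}$. Similarly, if $\theta_0 \le \bar{x} - \bar{y}$, we have

$$p_{BF}(x; \theta_0) = 2 P(\theta \le \theta_0 \mid x). \tag{24}$$

By (23) and (24) together with the symmetry of the posterior distribution of $\theta$, we have

$$p_{BF}(x; \theta_0) = 2 \min\left\{ P(\theta \le \theta_0 \mid x), P(\theta \ge \theta_0 \mid x) \right\}. \tag{25}$$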

It then follows that $p_{BF}(x; \theta_0) \ge \alpha$ if and only if $P(\theta \le \theta_0 \mid x) \ge \alpha/2$ and $P(\theta \ge \theta_0 \mid x) \ge \alpha/2$ hold simultaneously, which is equivalent to

$$\hat{\theta}_{\alpha/2}(x) \le \theta_0 \le \hat{\theta}_{1-\alpha/2}(x). \tag{26}$$

Since the posterior of $\theta$ is symmetric about $\bar{x} - \bar{y}$, $[\hat{\theta}_{\alpha/2}(x), \hat{\theta}_{1-\alpha/2}(x)]$ is a credible interval centered at $\bar{x} - \bar{y}$. This completes the proof.

Theorem 2 provides another way of constructing the credible interval for $\theta = \mu_1 - \mu_2$. Moreover, it follows from the proof of Theorem 2 that the $1 - \alpha$ credible interval for $\theta = \mu_1 - \mu_2$ centered at the posterior expectation is given by $[\hat{\theta}_{\alpha/2}(x), \hat{\theta}_{1-\alpha/2}(x)]$ even when other priors are used, so long as the posterior of $\theta$ is symmetric.

Now we return to the examples of comparing means of driving time and comparing improvement scores of treatments discussed above. We recommend the $1 - \alpha$ credible intervals of $(-0.4659, 3.2313)$ and $(5.1982, 9.3422)$ for Lehmann's and Sahu's data, respectively, which are obtained according to our procedure. The recommended intervals are essentially equivalent to the intervals given by the method of Fisher or Jeffreys.
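The quantile construction of Theorem 2 can be sketched in a few lines of Python applied to the data of Tables 2 and 3. This is an illustrative sketch and not the authors' code: the level $\alpha = 0.05$ is an assumption, since the value used for the reported intervals is not stated, and the function names, number of draws, and seed are hypothetical. Monte Carlo estimates depend on the seed and the number of draws, so they need not reproduce the reported endpoints digit for digit.

```python
import math
import random
import statistics

# Data as listed in Table 2 (driving times) and Table 3 (treatment scores).
ROUTE_I = [6.5, 6.8, 7.1, 7.3, 10.2]
ROUTE_II = [5.8, 5.8, 5.9, 6.0, 6.0, 6.0, 6.3, 6.3, 6.4, 6.5, 6.5]
SURGICAL = [15, 9, 12, 16, 14, 15, 18, 13, 12, 11, 15, 9, 16, 9]
NON_SURGICAL = [6, 8, 7, 4, 4, 6, 8, 3, 7, 8, 9, 6, 3, 6, 4]

def posterior_draws(x, y, n_draws=100_000, seed=0):
    """Sorted draws of theta = mu1 - mu2 from the posterior (21)."""
    rng = random.Random(seed)

    def t(df):
        # Student t variate: standard normal over sqrt(chi-square / df).
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2, 2.0) / df)

    center = statistics.mean(x) - statistics.mean(y)
    a = statistics.stdev(x) / math.sqrt(len(x))
    b = statistics.stdev(y) / math.sqrt(len(y))
    return sorted(center - (a * t(len(x) - 1) - b * t(len(y) - 1))
                  for _ in range(n_draws))

def quantile_interval(draws, alpha):
    """Empirical alpha/2 and 1 - alpha/2 quantiles of sorted draws, as in (20)."""
    n = len(draws)
    lower = draws[int(alpha / 2 * n)]
    upper = draws[int((1 - alpha / 2) * n) - 1]
    return lower, upper

ALPHA = 0.05  # assumed level; not stated in the text
for name, x, y in [("driving times", ROUTE_I, ROUTE_II),
                   ("treatment scores", SURGICAL, NON_SURGICAL)]:
    lower, upper = quantile_interval(posterior_draws(x, y), ALPHA)
    print(f"{name}: ({lower:.4f}, {upper:.4f})")
```

Because the posterior is symmetric about $\bar{x} - \bar{y}$, the midpoint of each estimated interval should lie close to the observed difference of sample means, which gives a quick sanity check on the simulation.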

3. Conclusions

We carry out a Bayesian analysis of the Behrens-Fisher problem in this paper. The Bayesian evidence for testing the hypothesis $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$ is given. Simulation results show that our evidence performs quite well and is very close to the corresponding generalized frequentist evidence and posterior predictive evidence for the Behrens-Fisher problem. Based on the proposed evidence, a method of constructing the credible interval at a specified level for the difference of means $\theta = \mu_1 - \mu_2$ is provided in a Bayesian testing context. It is interesting that the credible interval given by our method coincides with that derived by Fisher or Jeffreys. This way of constructing the credible interval via the Bayesian testing evidence is analogous to the way of constructing the confidence interval via the frequentist evidence.

This method of analyzing the Behrens-Fisher problem gives an efficient way of dealing with nuisance parameters, which are the source of the difficulty with this problem. This is because our inferences about $\theta = \mu_1 - \mu_2$ are based on the posterior distribution of the parameter of interest, which can be easily obtained in the Bayesian framework even when nuisance parameters are present. Both the Bayesian evidence and the credible interval can be computed quite easily by the Monte Carlo method. Furthermore, even if an informative prior different from that in (5) is used, the corresponding Bayesian evidence and credible intervals can be obtained in the same way. In other words, this method provides an efficient way of combining the information contained in the prior and that contained in the samples. Further research would be needed to evaluate the performance of the inferences by the proposed method if an informative prior is introduced.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank the editors and reviewers for their kind help and valuable comments that led to significant improvement of this paper. The work was supported by the Foundation for Training Talents of Beijing (Grant no. 19000532377), the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (Grant no. IDHT20130505), and the Research Foundation for Youth Scholars of Beijing Technology and Business University (Grant no. QNJJ2012-03).

References

[1] K.-W. Tsui and S. Weerahandi, "Generalized P-values in significance testing of hypotheses in the presence of nuisance parameters," Journal of the American Statistical Association, vol. 84, no. 406, pp. 602-607, 1989. MR1010352.
[2] H. Jeffreys, Theory of Probability, 3rd edition, Oxford University Press, 1967.
[3] D. L. Wallace, The Behrens-Fisher and Fieller-Creasy Problems, edited by R. A. Fisher, Springer, New York, NY, USA, 1980. MR578886.
[4] X.-L. Meng, "Posterior predictive P-values," The Annals of Statistics, vol. 22, no. 3, pp. 1142-1160, 1994. doi:10.1214/aos/1176325622. MR1311969.
[5] B. V. Behrens, "Ein Beitrag zur Fehlerberechnung bei wenige Beobachtungen," Landwirtschaftliches Jahresbuch, vol. 68, pp. 807-837, 1929.
[6] M. S. Bartlett, "The information available in small samples," Proceedings of the Cambridge Philosophical Society, vol. 32, no. 4, pp. 560-566, 1936. doi:10.1017/S0305004100019290.
[7] R. A. Fisher, "The fiducial argument in statistical inference," The Annals of Eugenics, vol. 11, pp. 141-172, 1935.
[8] B. L. Welch, "The significance of the difference between two means when the population variances are unequal," Biometrika, vol. 29, pp. 350-362, 1938.
[9] B. L. Welch, "The generalization of student's problem when several different population variances are involved," Biometrika, vol. 34, pp. 28-35, 1947. MR0019277. ZBL0029.40802.
[10] H. Jeffreys, Theory of Probability, Oxford University Press, 1961. MR0187257.
[11] S. S. Wilks, "On the problem of two samples from normal populations with unequal variances," Annals of Mathematical Statistics, vol. 11, no. 4, pp. 475-476, 1940. doi:10.1214/aoms/1177731837.
[12] H. Chernoff, "Asymptotic studentization in testing of hypotheses," Annals of Mathematical Statistics, vol. 20, pp. 268-278, 1949. doi:10.1214/aoms/1177730035. MR0030170. ZBL0033.07701.
[13] U. Chand, "Distributions related to comparison of two means and two regression coefficients," Annals of Mathematical Statistics, vol. 21, pp. 507-522, 1950. doi:10.1214/aoms/1177729748. MR0038612. ZBL0039.35503.
[14] S. K. Banerjee, "Approximate confidence interval for linear functions of means of k populations when the population variances are not equal," Sankhya, vol. 22, pp. 357-358, 1960. MR0125679.
[15] M. S. Srivastava, "On a sequential analogue of the Behrens-Fisher problem," Journal of the Royal Statistical Society B, vol. 32, pp. 144-148, 1970. MR0275581. ZBL0209.50404.
[16] M. Ghosh and Y. Kim, "The Behrens-Fisher problem revisited: a Bayes-frequentist synthesis," Biometrika, vol. 29, no. 1, pp. 5-17, 2001. MR1834483. ZBL1015.62023.
[17] M. R. Madruga, C. A. B. Pereira, and J. M. Stern, "Bayesian evidence test for precise hypotheses," Journal of Statistical Planning and Inference, vol. 117, no. 2, pp. 185-198, 2003. doi:10.1016/S0378-3758(02)00368-3. MR2004654. ZBL1021.62018.
[18] T. L. McMurry, D. N. Politis, and J. P. Romano, "Subsampling inference with K populations and a non-standard Behrens-Fisher problem," International Statistical Review, vol. 80, no. 1, pp. 149-175, 2012. doi:10.1111/j.1751-5823.2012.00177.x. MR2990350.
[19] Y. Yin, "A new Bayesian procedure for testing point null hypotheses," Computational Statistics, vol. 27, no. 2, pp. 237-249, 2012. doi:10.1007/s00180-011-0252-6. MR2923226. ZBL06080832.
[20] D. V. Lindley, "A statistical paradox," Biometrika, vol. 44, pp. 187-192, 1957.
[21] E. L. Lehmann, Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco, Calif, USA, 1975.
[22] J. K. Ghosh, M. Delampady, and T. Samanta, An Introduction to Bayesian Analysis, Springer, New York, NY, USA, 2006. MR2247439.