Testing Normal Means: The Reconcilability of the P Value and the Bayesian Evidence

Yuliang Yin (1) (http://orcid.org/0000-0003-1801-0031) and Junlong Zhao (2)

(1) School of Economics, Beijing Technology and Business University, Beijing 100048, China (btbu.edu.cn)
(2) School of Mathematics and System Science, Beihang University, LMIB of the Ministry of Education, Beijing 100083, China (buaa.edu.cn)

Research Article. The Scientific World Journal (TSWJ), ISSN 1537-744X, Hindawi Publishing Corporation, Volume 2013, Article ID 381539, doi:10.1155/2013/381539.

Received 8 August 2013; accepted 9 September 2013; published 30 October 2013. Academic editors: M. Guillén and S. Umarov.

Copyright © 2013 Yuliang Yin and Junlong Zhao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The problem of reconciling frequentist and Bayesian evidence in testing statistical hypotheses has been studied extensively in the literature. Most existing work considers cases without nuisance parameters, although nuisance parameters are very common in practice. In this paper, we consider testing normal means in the presence of nuisance parameters and study the reconcilability of the Bayesian evidence against the null hypothesis $H_0$, expressed as the posterior probability that $H_0$ is true, and the frequentist evidence against $H_0$, expressed as the P value. We show that the two kinds of evidence are reconcilable both for testing a single normal mean and for the Behrens-Fisher problem.

1. Introduction

In testing a statistical hypothesis $H_0$, a frequentist may quantify the evidence against $H_0$ by the observed significance level, the P value, while a Bayesian may quantify it by the posterior probability that $H_0$ is true. Lindley [1] illustrated the possible discrepancy between the Bayesian and the frequentist evidence. The relationship between these two measures of evidence has since been studied extensively. Pratt [2] showed that P values are usually approximately equal to posterior probabilities in one-sided testing problems. Casella and Berger [3] considered testing a one-sided hypothesis for a location parameter and showed that, in many cases, the lower bounds of the posterior probability over some reasonable classes of priors are exactly equal to the corresponding P values. Other important papers on the reconcilability of the Bayesian and frequentist evidence include Bartlett [4], Cox [5], Shafer [6], Berger and Delampady [7], and Berger and Sellke [8].

Although much research has addressed the problem of reconciling the Bayesian and frequentist evidence, and some of it shows that the evidence is reconcilable in several specific situations, most existing work assumes that no unknown parameters are present other than the parameters of interest. In practice, nuisance parameters arise in many situations. In location-scale settings, for example, when the location parameter is unknown, the scale parameter is in general unknown as well.

However, in significance testing of hypotheses with nuisance parameters, the classical P values are typically not available. Tsui and Weerahandi [9], considering the one-sided hypothesis of the form

$$H_0: \theta \le c \quad \text{versus} \quad H_1: \theta > c, \tag{1}$$

where $\theta$ is the parameter of interest and $c$ is a fixed constant, introduced the concept of the generalized P value, which appears to be useful in situations where conventional frequentist approaches do not provide useful solutions.

Tsui and Weerahandi [9] and some later related works formulated the generalized P values for many specific examples. Hannig et al. [10] provided a general method for constructing the generalized P value via fiducial inference.

In this paper, for the one-sided testing situations about normal means where the nuisance parameters are present, we study the reconcilability of the Bayesian evidence and the generalized P value. It is shown that, under the conjugate class of prior distributions, the Bayesian evidence and the generalized P value are reconcilable both for the problem of testing a normal mean and for the Behrens-Fisher problem.

This paper is organized as follows. In Section 2, we give the main results of the reconcilability of the P value and the Bayesian evidence in testing normal means. Some conclusions and discussions are given in Section 3.

2. Main Results

In this section, we consider two testing problems in which the nuisance parameters are present. When no efficient classical frequentist evidence is available because of the presence of the nuisance parameters, we formulate the frequentist evidence by the generalized P value.

2.1. One-Sample Normal Mean

Let $X_1, \ldots, X_n$ be a random sample from a normal population $N(\mu, \sigma^2)$, where both the mean $\mu$ and the variance $\sigma^2$ are unknown. Consider the following problem of testing the mean of a normal distribution:

$$H_0: \mu \le c \quad \text{versus} \quad H_1: \mu > c, \tag{2}$$

where $c$ is a fixed constant.

For this testing problem, where the nuisance parameter is present, we can still obtain the classical P value as

$$p(x) = P\left(T_{n-1} \le \frac{\sqrt{n}\,(c - \bar{x})}{s}\right), \tag{3}$$

where $T_{n-1}$ is a t-variable with $n-1$ degrees of freedom and $\bar{x}$ and $s^2$ stand for the observed sample mean and sample variance, respectively.
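
As a minimal sketch of how (3) might be evaluated numerically, the following Python code approximates the t distribution function by Simpson's rule, using only the standard library. The function names (`t_pdf`, `t_cdf`, `classical_p_value`) and the example inputs are illustrative assumptions, not part of the original paper.

```python
import math

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(q, df, steps=2000):
    """P(T <= q) via Simpson's rule on [0, |q|], using symmetry about 0.

    `steps` must be even for Simpson's rule.
    """
    a = abs(q)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |q|)
    return 0.5 + area if q > 0 else 0.5 - area

def classical_p_value(xbar, s, n, c):
    """P value (3): P(T_{n-1} <= sqrt(n) (c - xbar) / s)."""
    return t_cdf(math.sqrt(n) * (c - xbar) / s, n - 1)

# Hypothetical data: n = 10, sample mean 1.0, sample s.d. 2.0, testing mu <= 0.
print(classical_p_value(xbar=1.0, s=2.0, n=10, c=0.0))
```

When the observed mean equals $c$ the sketch returns exactly 0.5, and the P value falls below 0.5 as soon as $\bar{x} > c$.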

To derive the Bayesian evidence, we need a prior for the parameters. One reasonable and conventional class of priors for $\mu$ and $\sigma^2$ is the following conjugate class of prior distributions $G_{c1}$:

$$\mu \mid \sigma^2 \sim N\left(\mu_0, \frac{\sigma^2}{\kappa_0}\right), \qquad \frac{1}{\sigma^2} \sim \mathrm{Gamma}\left(\frac{\nu_0}{2}, \frac{\nu_0 \sigma_0^2}{2}\right), \tag{4}$$

where the prior parameters $(\mu_0, \kappa_0)$ can be interpreted as the mean and sample size of the normal prior observations and $(\sigma_0^2, \nu_0)$ as the sample variance and sample size of the Gamma prior observations.

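Under (4) we have

$$\mu \mid x, \sigma^2 \sim N\left(\mu_n(x), \frac{\sigma^2}{\kappa_n}\right), \qquad \frac{1}{\sigma^2} \,\Big|\, x \sim \mathrm{Gamma}\left(\frac{\nu_n}{2}, \frac{\nu_n \sigma_n^2(x)}{2}\right), \tag{5}$$

where $x = (x_1, \ldots, x_n)$, $\kappa_n = \kappa_0 + n$, $\mu_n(x) = (\kappa_0 \mu_0 + n\bar{x})/\kappa_n$, $\nu_n = \nu_0 + n$, and $\sigma_n^2(x) = [\nu_0 \sigma_0^2 + (n-1)s^2 + \kappa_0 n (\bar{x} - \mu_0)^2/\kappa_n]/\nu_n$. Therefore, the posterior density of $(\mu, \sigma^2)$ is

$$\begin{aligned} \pi(\mu, \sigma^2 \mid x) &= \frac{\sqrt{\kappa_n}}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{\kappa_n (\mu - \mu_n(x))^2}{2\sigma^2}\right] \times \frac{\left((\nu_n/2)\,\sigma_n^2(x)\right)^{\nu_n/2}}{\Gamma(\nu_n/2)\,\sigma^{\nu_n+2}} \exp\left[-\frac{\nu_n \sigma_n^2(x)}{2\sigma^2}\right] \\ &= \frac{\sqrt{\kappa_n}\left((\nu_n/2)\,\sigma_n^2(x)\right)^{\nu_n/2}}{\sqrt{2\pi}\,\Gamma(\nu_n/2)\,\sigma^{\nu_n+3}} \times \exp\left[-\frac{\kappa_n (\mu - \mu_n(x))^2 + \nu_n \sigma_n^2(x)}{2\sigma^2}\right]. \end{aligned} \tag{6}$$

The marginal posterior density of $\mu$ is then obtained by integrating out $\sigma^2$:

$$\pi(\mu \mid x) = \frac{\sqrt{\kappa_n}\left((\nu_n/2)\,\sigma_n^2(x)\right)^{-1/2}\,\Gamma((\nu_n+1)/2)}{\sqrt{2\pi}\,\Gamma(\nu_n/2)} \times \left[1 + \frac{\kappa_n (\mu - \mu_n(x))^2}{\nu_n \sigma_n^2(x)}\right]^{-(\nu_n+1)/2}, \tag{7}$$

from which we know that, given $x$,

$$\frac{\sqrt{\kappa_n}\,(\mu - \mu_n(x))}{\sigma_n(x)} \sim t(\nu_n). \tag{8}$$

Consequently, the posterior probability of $H_0$ being true is

$$P(H_0 \mid x) = P\left(T_{\nu_n} \le \sqrt{\frac{\kappa_n}{\sigma_n^2(x)}}\,(c - \mu_n(x))\right), \tag{9}$$

where $T_{\nu_n}$ is a t-variable with $\nu_n$ degrees of freedom. Notice that if $\mu_0 = c$, we have

$$\lim_{\kappa_0, \sigma_0 \to 0} P(H_0 \mid x) = P\left(T_{\nu_n} \le \sqrt{\frac{\nu_n}{n-1}}\,\frac{\sqrt{n}\,(c - \bar{x})}{s}\right). \tag{10}$$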
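
The update rules in (5) and the limit in (10) can be checked with a few lines of arithmetic. The sketch below is an illustration under assumed inputs: the function names and the numerical values are hypothetical, and only the formulas for $\kappa_n$, $\mu_n(x)$, $\nu_n$, and $\sigma_n^2(x)$ are taken from the text.

```python
import math

def posterior_params(xbar, s2, n, mu0, kappa0, sigma0_sq, nu0):
    """Posterior hyperparameters (kappa_n, mu_n, nu_n, sigma_n^2) from (5)."""
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    nu_n = nu0 + n
    sigma_n_sq = (nu0 * sigma0_sq + (n - 1) * s2
                  + kappa0 * n * (xbar - mu0) ** 2 / kappa_n) / nu_n
    return kappa_n, mu_n, nu_n, sigma_n_sq

def posterior_cutoff(xbar, s2, n, c, mu0, kappa0, sigma0_sq, nu0):
    """Argument of the t(nu_n) distribution function in (9), plus nu_n."""
    kappa_n, mu_n, nu_n, sigma_n_sq = posterior_params(
        xbar, s2, n, mu0, kappa0, sigma0_sq, nu0)
    return math.sqrt(kappa_n / sigma_n_sq) * (c - mu_n), nu_n

def limiting_cutoff(xbar, s2, n, c, nu0):
    """Argument of the t(nu_n) distribution function in the limit (10)."""
    nu_n = nu0 + n
    return math.sqrt(nu_n / (n - 1)) * math.sqrt(n) * (c - xbar) / math.sqrt(s2)

# Hypothetical data; kappa0 and sigma0^2 are taken very small to mimic the limit.
cut, nu_n = posterior_cutoff(xbar=1.2, s2=4.0, n=10, c=0.0, mu0=0.0,
                             kappa0=1e-12, sigma0_sq=1e-12, nu0=1.0)
print(cut, limiting_cutoff(xbar=1.2, s2=4.0, n=10, c=0.0, nu0=1.0), nu_n)
```

With $\kappa_0$ and $\sigma_0^2$ close to zero, the cutoff in (9) agrees with the limiting cutoff in (10) to within rounding error, which is the content of the limit statement.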

Lemma 1.

Let $T$ be a t-variable with $\alpha$ degrees of freedom, where $\alpha$ is a positive real number. Then for $R = T/\sqrt{\alpha}$ and a fixed constant $r$, $P(R \le r)$ is nonincreasing in $\alpha$ if $r \le 0$ and is nondecreasing in $\alpha$ if $r > 0$.

Proof.

Suppose that $X$ is a nonpositive random variable obtained from the negative part of $T$; that is, the density of $X$ is

$$f(x, \alpha) = \frac{2\,\Gamma((\alpha+1)/2)}{\sqrt{\alpha\pi}\,\Gamma(\alpha/2)} \left(1 + \frac{x^2}{\alpha}\right)^{-(\alpha+1)/2}, \quad x \le 0. \tag{11}$$

Then the density of $Y = X/\sqrt{\alpha}$ is

$$p(y, \alpha) = \frac{2\,\Gamma((\alpha+1)/2)}{\sqrt{\pi}\,\Gamma(\alpha/2)} \left(1 + y^2\right)^{-(\alpha+1)/2}, \quad y \le 0. \tag{12}$$

By Theorem 3.3.2 in Lehmann [11], for any fixed nonpositive constant $r$, $P(Y \le r)$ is nonincreasing in $\alpha$, since it can be verified that the family of densities $p(y, \alpha)$ has monotone likelihood ratio in $y$. This implies that Lemma 1 holds when $r \le 0$, since $P(R \le r) = P(Y \le r)/2$ in that case. When $r > 0$, we have $P(R \le r) = 1/2 + P(0 \le R \le r)$, and the proof is completely analogous if we introduce a nonnegative random variable obtained from the positive part of $T$.
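
The monotonicity claimed in Lemma 1 can be checked numerically. The following sketch integrates the density of $R = T/\sqrt{\alpha}$ on the whole real line, which is one half of $p(y, \alpha)$ in (12) reflected about zero, by Simpson's rule. The function name `prob_R_le` and the grid of $\alpha$ values are assumptions chosen for illustration; a numerical check of this kind does not replace the proof.

```python
import math

def prob_R_le(r, alpha, steps=2000):
    """P(T / sqrt(alpha) <= r) for T ~ t(alpha), by Simpson's rule."""
    log_c = (math.lgamma((alpha + 1) / 2) - math.lgamma(alpha / 2)
             - 0.5 * math.log(math.pi))

    def g(y):
        # Density of R on the real line: half of p(y, alpha) in (12).
        return math.exp(log_c - (alpha + 1) / 2 * math.log1p(y * y))

    a = abs(r)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = g(0.0) + g(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    area = total * h / 3  # approximates P(0 <= R <= |r|)
    return 0.5 + area if r > 0 else 0.5 - area

alphas = [1, 2, 5, 10, 30]
print([round(prob_R_le(-0.5, a), 4) for a in alphas])  # r <= 0 case
print([round(prob_R_le(0.5, a), 4) for a in alphas])   # r > 0 case
```

For $r = -0.5$ the printed probabilities decrease as $\alpha$ grows, and for $r = 0.5$ they increase, in line with the two cases of the lemma.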

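Now take $r = \sqrt{n}\,(c - \bar{x})/(\sqrt{n-1}\,s)$. By Lemma 1, for $r \le 0$, we have

$$P\left(T_{\nu_n} \le \sqrt{\nu_n}\,r\right) \le P\left(T_{n-1} \le \sqrt{n-1}\,r\right) = P\left(T_{n-1} \le \frac{\sqrt{n}\,(c - \bar{x})}{s}\right). \tag{13}$$

Then, comparing (3) and (10), for $\mu_0 = c$ and any fixed nonnegative $\nu_0$, we have

$$\lim_{\kappa_0, \sigma_0 \to 0} P(H_0 \mid x) < p(x), \quad \text{as } p(x) < \frac{1}{2}, \tag{14}$$

which implies that

$$\inf_{\pi \in G_{c1}} P(H_0 \mid x) < p(x), \quad \text{as } p(x) < \frac{1}{2}. \tag{15}$$

The reconcilability of the Bayesian and frequentist evidence is therefore obtained in this testing problem. We summarize this as the following theorem.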

Theorem 2.

For testing the hypothesis of the form (2) under a normal distribution $N(\mu, \sigma^2)$ with $\sigma^2$ unknown, the Bayesian and frequentist lines of evidence are reconcilable under the conjugate class of priors (4).
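
A closed-form special case may help to make the comparison of (3) and (10) concrete. Assume, purely for illustration, $n = 2$ and $\nu_0 = 0$, so that $T_{n-1}$ is a Cauchy variable and $T_{\nu_n}$ is a t-variable with 2 degrees of freedom; both distribution functions then have elementary expressions. The function names and inputs below are hypothetical.

```python
import math

def cauchy_cdf(t):
    """Distribution function of t(1), the standard Cauchy distribution."""
    return 0.5 + math.atan(t) / math.pi

def t2_cdf(t):
    """Distribution function of t(2), which has a closed form."""
    return 0.5 + t / (2 * math.sqrt(2 + t * t))

def evidence_pair_n2(xbar, s, c):
    """Return (p(x) from (3), limit from (10)) for n = 2 and nu_0 = 0."""
    q = math.sqrt(2) * (c - xbar) / s        # sqrt(n) (c - xbar) / s with n = 2
    p_value = cauchy_cdf(q)                  # T_{n-1} = T_1
    bayes_limit = t2_cdf(math.sqrt(2) * q)   # sqrt(nu_n / (n - 1)) = sqrt(2)
    return p_value, bayes_limit

# Hypothetical data with sqrt(n) (c - xbar) / s = -1.
p, b = evidence_pair_n2(xbar=math.sqrt(2) / 2, s=1.0, c=0.0)
print(p, b)  # approximately 0.25 and 0.1464
```

In this special case the limiting posterior probability (about 0.146) lies below the P value (0.25), as (14) asserts whenever $p(x) < 1/2$.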

2.2. Behrens-Fisher Problem

We now turn to the Behrens-Fisher problem. It is a classical testing situation in which nuisance parameters are present and no useful pivotal quantities are available. Suppose that $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are two independent random samples from two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, where both $\sigma_1^2$ and $\sigma_2^2$ are completely unspecified. We are interested in testing the hypothesis of the form

$$H_0: \mu_1 - \mu_2 \le c \quad \text{versus} \quad H_1: \mu_1 - \mu_2 > c, \tag{16}$$

where $c$ is a fixed constant.

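In situations where the traditional frequentist approaches fail to provide useful solutions, the concept of the generalized P value introduced by Tsui and Weerahandi [9] appears to be helpful in deriving the frequentist evidence for testing a statistical hypothesis. For the specific problem of testing hypothesis (16), we can give the generalized P value as

$$\begin{aligned} p(x) &= P\left(\bar{x} - \bar{y} - \left(\frac{\sqrt{m-1}\,s_1 U}{\sqrt{m}\sqrt{\chi_{m-1}^2}} - \frac{\sqrt{n-1}\,s_2 V}{\sqrt{n}\sqrt{\chi_{n-1}^2}}\right) \le c\right) \\ &= P\left(\frac{s_2}{\sqrt{n}}\,T_{n-1} - \frac{s_1}{\sqrt{m}}\,T_{m-1} \le c - (\bar{x} - \bar{y})\right), \end{aligned} \tag{17}$$

where $U \sim N(0,1)$, $V \sim N(0,1)$, $\chi_{m-1}^2 \sim \chi^2(m-1)$, $\chi_{n-1}^2 \sim \chi^2(n-1)$, $T_{m-1} \sim t(m-1)$, and $T_{n-1} \sim t(n-1)$.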
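
Because (17) has no closed form, one natural way to evaluate it is Monte Carlo simulation. The sketch below draws the two t-variables from the second line of (17) with the standard library only. The function names, the number of draws, and the seed are assumptions made for illustration; the paper does not state how its own simulations were implemented.

```python
import math
import random

def t_variate(rng, df):
    """Draw T = Z / sqrt(chi2_df / df), with chi2_df ~ Gamma(shape=df/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)

def generalized_p_value(xbar, ybar, s1, s2, m, n, c, draws=20000, seed=0):
    """Monte Carlo estimate of the generalized P value in (17)."""
    rng = random.Random(seed)
    target = c - (xbar - ybar)
    hits = 0
    for _ in range(draws):
        stat = (s2 / math.sqrt(n) * t_variate(rng, n - 1)
                - s1 / math.sqrt(m) * t_variate(rng, m - 1))
        if stat <= target:
            hits += 1
    return hits / draws

# Hypothetical inputs with s1^2 = 1, s2^2 = 4 and c - (xbar - ybar) = -2.5.
print(generalized_p_value(xbar=2.5, ybar=0.0, s1=1.0, s2=2.0, m=5, n=5, c=0.0))
```

The estimate carries Monte Carlo error of order $1/\sqrt{\text{draws}}$, so small differences between two estimated values should not be over-interpreted.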

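In this problem, we consider the reconcilability of evidence under the following conjugate class of prior distributions $G_{c2}$:

$$\begin{aligned} \mu_1 \mid \sigma_1^2 &\sim N\left(\mu_{01}, \frac{\sigma_1^2}{\kappa_{01}}\right), & \frac{1}{\sigma_1^2} &\sim \mathrm{Gamma}\left(\frac{\nu_{01}}{2}, \frac{\nu_{01}\sigma_{01}^2}{2}\right); \\ \mu_2 \mid \sigma_2^2 &\sim N\left(\mu_{02}, \frac{\sigma_2^2}{\kappa_{02}}\right), & \frac{1}{\sigma_2^2} &\sim \mathrm{Gamma}\left(\frac{\nu_{02}}{2}, \frac{\nu_{02}\sigma_{02}^2}{2}\right). \end{aligned} \tag{18}$$

Under $G_{c2}$, the posterior density of $(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$ is

$$\pi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2 \mid x, y) = \frac{f(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, x, y)}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty} f(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, x, y)\,d\sigma_1^2\,d\sigma_2^2\,d\mu_1\,d\mu_2}, \tag{19}$$

where

$$\begin{aligned} f(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, x, y) = {} & \exp\left[-\frac{\sum_{i=1}^{m}(x_i - \mu_1)^2 + \kappa_{01}(\mu_1 - \mu_{01})^2 + \nu_{01}\sigma_{01}^2}{2\sigma_1^2} - \frac{\sum_{i=1}^{n}(y_i - \mu_2)^2 + \kappa_{02}(\mu_2 - \mu_{02})^2 + \nu_{02}\sigma_{02}^2}{2\sigma_2^2}\right] \\ & \times \left((\sigma_1^2)^{(m+\nu_{01}+4)/2}\,(\sigma_2^2)^{(n+\nu_{02}+4)/2}\right)^{-1}. \end{aligned} \tag{20}$$

Let $\theta = \mu_1 - \mu_2$. Then the posterior density of $(\theta, \mu_2, \sigma_1^2, \sigma_2^2)$ is

$$\pi(\theta, \mu_2, \sigma_1^2, \sigma_2^2 \mid x, y) = \frac{f(\theta, \mu_2, \sigma_1^2, \sigma_2^2, x, y)}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty} f(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, x, y)\,d\sigma_1^2\,d\sigma_2^2\,d\mu_1\,d\mu_2}, \tag{21}$$

where

$$\begin{aligned} f(\theta, \mu_2, \sigma_1^2, \sigma_2^2, x, y) = {} & \exp\left[-\frac{\sum_{i=1}^{m}(x_i - \theta - \mu_2)^2 + \kappa_{01}(\theta + \mu_2 - \mu_{01})^2 + \nu_{01}\sigma_{01}^2}{2\sigma_1^2} - \frac{\sum_{i=1}^{n}(y_i - \mu_2)^2 + \kappa_{02}(\mu_2 - \mu_{02})^2 + \nu_{02}\sigma_{02}^2}{2\sigma_2^2}\right] \\ & \times \left((\sigma_1^2)^{(m+\nu_{01}+4)/2}\,(\sigma_2^2)^{(n+\nu_{02}+4)/2}\right)^{-1}. \end{aligned} \tag{22}$$

Hence the posterior probability of $H_0$ is

$$P(H_0 \mid x) = \frac{\int_{-\infty}^{c}\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty} f(\theta, \mu_2, \sigma_1^2, \sigma_2^2, x, y)\,d\sigma_1^2\,d\sigma_2^2\,d\mu_2\,d\theta}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty} f(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, x, y)\,d\sigma_1^2\,d\sigma_2^2\,d\mu_1\,d\mu_2}. \tag{23}$$

It is straightforward to check that

$$\begin{aligned} \lim_{\kappa_{01}, \kappa_{02}, \sigma_{01}, \sigma_{02} \to 0} P(H_0 \mid x) &= \frac{\Gamma((m+\nu_{01}+2)/2)\,\Gamma((n+\nu_{02}+2)/2)}{\pi\sqrt{(m-1)(n-1)}\,\Gamma((m+\nu_{01}+1)/2)\,\Gamma((n+\nu_{02}+1)/2)} \\ &\quad \times \int_{-\infty}^{\infty}\int_{-\infty}^{-\sqrt{m}(\bar{x}-\bar{y}-c)/s_1 + \sqrt{m}\,s_2 r/s_1} \left(1 + \frac{t^2}{m-1}\right)^{(-m-\nu_{01}-2)/2} \left(1 + \frac{r^2}{n-1}\right)^{(-n-\nu_{02}-2)/2} dt\,dr \\ &= P\left(\bar{x} - \bar{y} - \left(\frac{\sqrt{m+\nu_{01}+1}\,s_1 U}{\sqrt{m+\nu_{01}+2}\sqrt{\chi_{m+\nu_{01}+1}^2}} - \frac{\sqrt{n+\nu_{02}+1}\,s_2 V}{\sqrt{n+\nu_{02}+2}\sqrt{\chi_{n+\nu_{02}+1}^2}}\right) \le c\right) \\ &= P\left(\frac{s_2}{\sqrt{n+\nu_{02}+2}}\,T_{n+\nu_{02}+1} - \frac{s_1}{\sqrt{m+\nu_{01}+2}}\,T_{m+\nu_{01}+1} \le c - (\bar{x} - \bar{y})\right), \end{aligned} \tag{24}$$

where $\bar{x}$ and $\bar{y}$ are the observed values of the sample means $\bar{X}$ and $\bar{Y}$, respectively, $s_1^2$ and $s_2^2$ are the observed values of the sample variances $S_1^2$ and $S_2^2$, respectively, $U \sim N(0,1)$, $V \sim N(0,1)$, $\chi_{m+\nu_{01}+1}^2 \sim \chi^2(m+\nu_{01}+1)$, and $\chi_{n+\nu_{02}+1}^2 \sim \chi^2(n+\nu_{02}+1)$.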

We now prove an interesting result: when $m$ and $n$ are sufficiently large, the frequentist and Bayesian lines of evidence given respectively by (17) and (23) are reconcilable for any fixed $\nu_{01}$ and $\nu_{02}$ under the prior class $G_{c2}$.

Theorem 3.

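As $\min\{m, n\} \to +\infty$, for any fixed $0 < \nu_{01}, \nu_{02} < \infty$, we have

$$\lim_{\kappa_{01}, \kappa_{02}, \sigma_{01}, \sigma_{02} \to 0} P(H_0 \mid x) < p(x), \quad \text{as } p(x) < \frac{1}{2}, \tag{25}$$

which implies that

$$\inf_{\pi \in G_{c2}} P(H_0 \mid x) < p(x), \quad \text{as } p(x) < \frac{1}{2}. \tag{26}$$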

Proof.

Let

$$A_1 = \frac{(n-1)\,s_2^2}{n\,\chi_{n-1}^2}, \qquad B_1 = \frac{(n+\nu_{02}+1)\,s_2^2}{(n+\nu_{02}+2)\,\chi_{n+\nu_{02}+1}^2}. \tag{27}$$

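(I) We first prove that, given $s_2$, for sufficiently large $n$, $B_1$ is stochastically smaller than $A_1$:

$$B_1 <_d A_1. \tag{28}$$

In fact, for any $\gamma > 0$ and sufficiently large $n$,

$$\begin{aligned} P(B_1 < \gamma) &= P\left(\chi_{n+\nu_{02}+1}^2 > \frac{(n+\nu_{02}+1)\,s_2^2}{(n+\nu_{02}+2)\,\gamma}\right) \\ &= P\left(\chi_{n+\nu_{02}+1}^2 > \frac{(n-1)\,s_2^2}{n\gamma} + \epsilon(n, \gamma)\right), \end{aligned} \tag{29}$$

where $\epsilon(n, \gamma) = (\nu_{02}+2)\,s_2^2/[(n+\nu_{02}+2)\,n\gamma] \to 0$ as $n \to \infty$. On the other hand, we have

$$P(A_1 < \gamma) = P\left(\chi_{n-1}^2 > \frac{(n-1)\,s_2^2}{n\gamma}\right). \tag{30}$$

Since $\chi_{n+\nu_{02}+1}^2 >_d \chi_{n-1}^2$ holds for any $n$, it follows that, as $n \to \infty$,

$$P(B_1 < \gamma) > P(A_1 < \gamma). \tag{31}$$

That is,

$$B_1 <_d A_1. \tag{32}$$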
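
The algebra behind the second line of (29) can be verified exactly with rational arithmetic. The sketch below is a hedged illustration: the function names and the chosen values of $n$, $\nu_{02}$, $s_2^2$, and $\gamma$ are assumptions, and the inputs are expected to be integers or `Fraction` objects so that no rounding occurs.

```python
from fractions import Fraction

def threshold_B1(n, nu02, s2_sq, gamma):
    """Chi-square threshold in the first line of (29)."""
    return Fraction((n + nu02 + 1) * s2_sq) / ((n + nu02 + 2) * gamma)

def threshold_A1(n, s2_sq, gamma):
    """Chi-square threshold in (30)."""
    return Fraction((n - 1) * s2_sq) / (n * gamma)

def epsilon(n, nu02, s2_sq, gamma):
    """Remainder term epsilon(n, gamma) defined after (29)."""
    return Fraction((nu02 + 2) * s2_sq) / ((n + nu02 + 2) * n * gamma)

# Hypothetical values: nu02 = 1/2, s2^2 = 4, gamma = 3/2.
nu02, s2_sq, gamma = Fraction(1, 2), Fraction(4), Fraction(3, 2)
for n in (3, 10, 1000):
    lhs = threshold_B1(n, nu02, s2_sq, gamma)
    rhs = threshold_A1(n, s2_sq, gamma) + epsilon(n, nu02, s2_sq, gamma)
    print(n, lhs == rhs, float(epsilon(n, nu02, s2_sq, gamma)))
```

The two thresholds differ by exactly $\epsilon(n, \gamma)$, and the printed remainder shrinks as $n$ grows, matching the statement that $\epsilon(n, \gamma) \to 0$.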

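Similarly, as $m \to +\infty$, we have

$$B_2 := \frac{(m+\nu_{01}+1)\,s_1^2}{(m+\nu_{01}+2)\,\chi_{m+\nu_{01}+1}^2} <_d \frac{(m-1)\,s_1^2}{m\,\chi_{m-1}^2} =: A_2. \tag{33}$$

Because $B_1$ and $B_2$ are independent, as are $A_1$ and $A_2$, the ordering is preserved under summation, and we have

$$B_1 + B_2 <_d A_1 + A_2. \tag{34}$$

Consequently, we have

$$\sqrt{B_1 + B_2} <_d \sqrt{A_1 + A_2}. \tag{35}$$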

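(II) We now show that the final conclusion holds. In fact, if we let $C(x, y) = c - (\bar{x} - \bar{y})$, then

$$\begin{aligned} p(x) &= P\left(\sqrt{A_1}\,V - \sqrt{A_2}\,U \le C(x, y)\right) \\ &= E\left(P\left(\sqrt{A_1}\,V - \sqrt{A_2}\,U \le C(x, y) \mid A_1, A_2\right)\right) \\ &= E\left(\Phi\left(\frac{C(x, y)}{\sqrt{A_1 + A_2}}\right)\right), \end{aligned} \tag{36}$$

where $\Phi(\cdot)$ stands for the cumulative distribution function of the standard normal distribution and the last equality is due to the fact that $U$ and $V$ are independent standard normal variables.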

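Similarly, for (24), we have

$$\lim_{\kappa_{01}, \kappa_{02}, \sigma_{01}, \sigma_{02} \to 0} P(H_0 \mid x, y) = E\left(P\left(\sqrt{B_1}\,V - \sqrt{B_2}\,U \le C(x, y) \mid B_1, B_2\right)\right) = E\left(\Phi\left(\frac{C(x, y)}{\sqrt{B_1 + B_2}}\right)\right). \tag{37}$$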

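Note that for each $C(x, y) \in (-\infty, 0)$, $\Phi(C(x, y)/R)$ is increasing in $R \in (0, \infty)$. Therefore, by (35), we have

$$\Phi\left(\frac{C(x, y)}{\sqrt{B_1 + B_2}}\right) <_d \Phi\left(\frac{C(x, y)}{\sqrt{A_1 + A_2}}\right), \quad \text{as } C(x, y) < 0. \tag{38}$$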

Consequently, we have

$$E\left(\Phi\left(\frac{C(x, y)}{\sqrt{B_1 + B_2}}\right)\right) < E\left(\Phi\left(\frac{C(x, y)}{\sqrt{A_1 + A_2}}\right)\right), \quad \text{as } C(x, y) < 0. \tag{39}$$

In addition, by the symmetry of the t-distribution, it follows that $C(x, y) < 0$ is equivalent to $p(x) < 1/2$. Therefore, by (36), (37), and (39), the conclusion of Theorem 3 holds.
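
The conditional-expectation representations (36) and (37) suggest a simple simulation scheme: draw only the chi-square variables and average $\Phi(\cdot)$. The sketch below follows that idea with the Python standard library. The function name, the inputs, the number of draws, and the seed are illustrative assumptions and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist

def evidence_by_conditioning(C, s1_sq, s2_sq, m, n, nu01, nu02,
                             draws=20000, seed=1):
    """Monte Carlo estimates of (36) and (37) as averages of Phi(C / sqrt(.))."""
    rng = random.Random(seed)
    phi = NormalDist().cdf

    def chi2(df):
        # chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2)
        return rng.gammavariate(df / 2.0, 2.0)

    p_sum = b_sum = 0.0
    for _ in range(draws):
        A1 = (n - 1) * s2_sq / (n * chi2(n - 1))
        A2 = (m - 1) * s1_sq / (m * chi2(m - 1))
        B1 = (n + nu02 + 1) * s2_sq / ((n + nu02 + 2) * chi2(n + nu02 + 1))
        B2 = (m + nu01 + 1) * s1_sq / ((m + nu01 + 2) * chi2(m + nu01 + 1))
        p_sum += phi(C / math.sqrt(A1 + A2))   # integrand of (36)
        b_sum += phi(C / math.sqrt(B1 + B2))   # integrand of (37)
    return p_sum / draws, b_sum / draws

# Hypothetical setting: m = n = 5, s1^2 = 1, s2^2 = 4, C(x, y) = -1.6.
print(evidence_by_conditioning(-1.6, 1.0, 4.0, 5, 5, 0.5, 1.0))
```

Averaging $\Phi$ instead of counting indicator events usually reduces the simulation variance, which is the main reason for preferring this form in a sketch; for $C(x, y) < 0$ the second estimate should come out below the first, in agreement with (39).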

The following theorem shows that, even for fixed $m$ and $n$ with $2 < m, n < \infty$, we still obtain the reconcilability of the frequentist and Bayesian evidence.

Theorem 4.

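As $\min\{\nu_{01}, \nu_{02}\} \to \infty$, the conclusion of Theorem 3 holds for any fixed $m$ and $n$ with $2 < m, n < \infty$; that is,

$$\inf_{\pi \in G_{c2}} P(H_0 \mid x) < p(x), \quad \text{as } p(x) < \frac{1}{2}. \tag{40}$$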

Proof.

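We keep the notation of Theorem 3. We first prove that, as $\nu_{02} \to \infty$,

$$B_1 <_d A_1. \tag{41}$$

By the proof of Theorem 3, we have

$$P(B_1 < \gamma) = P\left(\chi_{n+\nu_{02}+1}^2 > \frac{(n-1)\,s_2^2}{n\gamma} + \epsilon(n, \gamma)\right), \tag{42}$$

where

$$\epsilon(n, \gamma) = \frac{(\nu_{02}+2)\,s_2^2}{(n+\nu_{02}+2)\,n\gamma}. \tag{43}$$

It is obvious that

$$\sup_{\nu_{02}} \epsilon(n, \gamma) = \lim_{\nu_{02} \to \infty} \epsilon(n, \gamma) = \frac{s_2^2}{n\gamma}. \tag{44}$$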

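Let $f_n(t)$ denote the density function of $\chi_n^2$. Then it is easy to see that $f_n(t)$ reaches its maximum at $t = n-2$ and that $\max_t f_n(t) = f_n(n-2) \to 0$ as $n \to \infty$. Therefore, for sufficiently large $\nu_{02}$, it holds that

$$P(B_1 < \gamma) = P\left(\chi_{n+\nu_{02}+1}^2 > \frac{s_2^2}{\gamma}\right) + \epsilon(\nu_{02}) = P(\tilde{B}_1 < \gamma) + \epsilon(\nu_{02}) \tag{45}$$

for some $\epsilon(\nu_{02}) > 0$, where $\epsilon(\nu_{02}) \to 0$ as $\nu_{02} \to \infty$, and $\tilde{B}_1 = s_2^2/\chi_{n+\nu_{02}+1}^2$.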
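
The two facts used here about the chi-square density, namely that its maximum is attained at $t = n-2$ and that the maximum value tends to zero, can be illustrated numerically. The sketch below is a minimal check with assumed function names; the values of $n$ are arbitrary.

```python
import math

def chi2_pdf(t, n):
    """Density f_n(t) of the chi-square distribution with n degrees of freedom."""
    if t <= 0:
        return 0.0
    log_f = ((n / 2 - 1) * math.log(t) - t / 2
             - (n / 2) * math.log(2) - math.lgamma(n / 2))
    return math.exp(log_f)

def chi2_peak(n):
    """Value of f_n at t = n - 2, the mode for n > 2."""
    return chi2_pdf(n - 2, n)

for n in (3, 4, 10, 100, 1000):
    print(n, chi2_peak(n))
```

The printed peak values decrease steadily with $n$, which is what allows the shift $\epsilon(n, \gamma)$ in the threshold to change the tail probability by only a vanishing amount.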

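Therefore, for any fixed $n$ $(2 < n < \infty)$, as $\nu_{02} \to \infty$, we have

$$\begin{aligned} P(\tilde{B}_1 < \gamma) &= P\left(\chi_{n+\nu_{02}+1}^2 > \frac{s_2^2}{\gamma}\right) \\ &= P\left(\frac{\chi_{n+\nu_{02}+1}^2 - (n+\nu_{02}+1)}{\sqrt{2(n+\nu_{02}+1)}} > \frac{s_2^2/\gamma - (n+\nu_{02}+1)}{\sqrt{2(n+\nu_{02}+1)}}\right) \\ &= 1 - \Phi\left(\frac{s_2^2/\gamma - (n+\nu_{02}+1)}{\sqrt{2(n+\nu_{02}+1)}}\right) - \epsilon_1(\nu_{02}) \to 1, \end{aligned} \tag{46}$$

where $\epsilon_1(\nu_{02}) \to 0$ as $\nu_{02} \to \infty$.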

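On the other hand, for any fixed $n$ $(2 < n < \infty)$ and $\gamma > 0$, we have

$$P(A_1 < \gamma) = P\left(\chi_{n-1}^2 > \frac{(n-1)\,s_2^2}{n\gamma}\right) \le 1 - \epsilon_0, \tag{47}$$

for some $\epsilon_0 > 0$. Therefore, as $\nu_{02} \to \infty$, by (46) and (47), we have $P(\tilde{B}_1 < \gamma) - P(A_1 < \gamma) > 0$. Furthermore, by (45), we have

$$P(B_1 < \gamma) > P(A_1 < \gamma). \tag{48}$$

That is,

$$B_1 <_d A_1. \tag{49}$$

Similarly, for any fixed $m$ $(2 < m < \infty)$, as $\nu_{01} \to \infty$, we have

$$B_2 <_d A_2. \tag{50}$$

Therefore, as in Theorem 3, as $\min\{\nu_{01}, \nu_{02}\} \to \infty$, for any fixed $m$ and $n$ with $2 < m, n < \infty$, we have

$$B_1 + B_2 <_d A_1 + A_2. \tag{51}$$

The rest of the proof is similar to part (II) of the proof of Theorem 3.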

The following simulation results show that, even for small fixed values of $m$ and $n$ or of $\nu_{01}$ and $\nu_{02}$, the generalized P value and the Bayesian evidence for the Behrens-Fisher problem are still reconcilable.

For fixed $\nu_{01} = 0.5$ and $\nu_{02} = 1$ and for $s_1^2 = 1$ and $s_2^2 = 4$, taking different values of $m$ and $n$, some results comparing the P value and $\lim_{\kappa_{01}, \kappa_{02}, \sigma_{01}, \sigma_{02} \to 0} P(H_0 \mid x)$ are listed in Table 1.

Table 1: P value and $\lim P(H_0 \mid x)$ for testing the Behrens-Fisher problem for different $m$ and $n$. Column headings give the value of $c - (\bar{x} - \bar{y})$.

| $(m, n)$ | Quantity | −2.5000 | −2.1000 | −1.8000 | −1.6000 | −1.3000 | −0.9000 | −0.5000 | −0.3000 | −0.1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| $m = 2$, $n = 2$ | $p(x)$ | 0.1165 | 0.1405 | 0.1785 | 0.1880 | 0.2330 | 0.3110 | 0.3850 | 0.4155 | 0.4745 |
| $m = 2$, $n = 2$ | $\lim P(H_0 \mid x)$ | 0.0195 | 0.0420 | 0.0505 | 0.0705 | 0.1280 | 0.1945 | 0.3140 | 0.3770 | 0.4675 |
| $m = 5$, $n = 5$ | $p(x)$ | 0.0385 | 0.0520 | 0.0760 | 0.0995 | 0.1405 | 0.2250 | 0.3460 | 0.3845 | 0.4705 |
| $m = 5$, $n = 5$ | $\lim P(H_0 \mid x)$ | 0.0060 | 0.0195 | 0.0400 | 0.0465 | 0.0765 | 0.1630 | 0.2840 | 0.3640 | 0.4630 |
| $m = 1$, $n = 8$ | $p(x)$ | 0.1165 | 0.1405 | 0.1785 | 0.1880 | 0.2330 | 0.3110 | 0.3850 | 0.4155 | 0.4745 |
| $m = 1$, $n = 8$ | $\lim P(H_0 \mid x)$ | 0.0195 | 0.0420 | 0.0505 | 0.0705 | 0.1280 | 0.1945 | 0.3140 | 0.3770 | 0.4675 |
| $m = 3$, $n = 5$ | $p(x)$ | 0.0600 | 0.0755 | 0.1030 | 0.1360 | 0.1765 | 0.2615 | 0.3495 | 0.4355 | 0.4595 |
| $m = 3$, $n = 5$ | $\lim P(H_0 \mid x)$ | 0.0070 | 0.0205 | 0.0405 | 0.0545 | 0.0845 | 0.1715 | 0.3005 | 0.3760 | 0.4670 |
| $m = 12$, $n = 3$ | $p(x)$ | 0.0835 | 0.1130 | 0.1260 | 0.1585 | 0.2075 | 0.2975 | 0.3700 | 0.4245 | 0.4995 |
| $m = 12$, $n = 3$ | $\lim P(H_0 \mid x)$ | 0.0185 | 0.0315 | 0.0515 | 0.0730 | 0.1025 | 0.1810 | 0.2905 | 0.3635 | 0.4585 |

For fixed $m = 8$ and $n = 10$ and for $s_1^2 = 1$ and $s_2^2 = 4$, taking different values of $\nu_{01}$ and $\nu_{02}$, we list some results comparing the P value and $\lim_{\kappa_{01}, \kappa_{02}, \sigma_{01}, \sigma_{02} \to 0} P(H_0 \mid x)$ in Table 2.

Table 2: P value and $\lim P(H_0 \mid x)$ for testing the Behrens-Fisher problem for different $\nu_{01}$ and $\nu_{02}$. Column headings give the value of $c - (\bar{x} - \bar{y})$.

| $(\nu_{01}, \nu_{02})$ | Quantity | −2.5000 | −2.1000 | −1.8000 | −1.6000 | −1.3000 | −0.9000 | −0.5000 | −0.3000 | −0.1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| $\nu_{01} = 0.5$, $\nu_{02} = 0.5$ | $p(x)$ | 0.0030 | 0.0080 | 0.0130 | 0.0240 | 0.0490 | 0.1310 | 0.2720 | 0.3495 | 0.4565 |
| $\nu_{01} = 0.5$, $\nu_{02} = 0.5$ | $\lim P(H_0 \mid x)$ | 0.0010 | 0.0035 | 0.0060 | 0.0135 | 0.0280 | 0.1030 | 0.2245 | 0.3375 | 0.4410 |
| $\nu_{01} = 2$, $\nu_{02} = 2$ | $p(x)$ | 0.0025 | 0.0045 | 0.0165 | 0.0270 | 0.0545 | 0.1130 | 0.2550 | 0.3480 | 0.4595 |
| $\nu_{01} = 2$, $\nu_{02} = 2$ | $\lim P(H_0 \mid x)$ | 0.0005 | 0.0015 | 0.0050 | 0.0080 | 0.0245 | 0.1000 | 0.2125 | 0.3155 | 0.4345 |
| $\nu_{01} = 0.2$, $\nu_{02} = 0.5$ | $p(x)$ | 0.0025 | 0.0085 | 0.0145 | 0.0295 | 0.0470 | 0.1225 | 0.2550 | 0.3400 | 0.4610 |
| $\nu_{01} = 0.2$, $\nu_{02} = 0.5$ | $\lim P(H_0 \mid x)$ | 0.0015 | 0.0030 | 0.0045 | 0.0130 | 0.0275 | 0.1055 | 0.2215 | 0.3245 | 0.4455 |
| $\nu_{01} = 0.5$, $\nu_{02} = 0.2$ | $p(x)$ | 0.0055 | 0.0070 | 0.0170 | 0.0265 | 0.0545 | 0.1215 | 0.2485 | 0.3535 | 0.4650 |
| $\nu_{01} = 0.5$, $\nu_{02} = 0.2$ | $\lim P(H_0 \mid x)$ | 0.0010 | 0.0055 | 0.0070 | 0.0125 | 0.0440 | 0.1085 | 0.2500 | 0.3285 | 0.4300 |
| $\nu_{01} = 5$, $\nu_{02} = 0.5$ | $p(x)$ | 0.0025 | 0.0060 | 0.0150 | 0.0230 | 0.0570 | 0.1145 | 0.2645 | 0.3685 | 0.4460 |
| $\nu_{01} = 5$, $\nu_{02} = 0.5$ | $\lim P(H_0 \mid x)$ | 0.0020 | 0.0025 | 0.0065 | 0.0095 | 0.0330 | 0.0890 | 0.2205 | 0.3105 | 0.4305 |

3. Conclusions

We have studied the reconcilability of the P value and the Bayesian evidence in one-sided hypothesis testing problems about normal means in the presence of nuisance parameters. For testing a single normal mean with an unknown variance, it is shown that the Bayesian and frequentist lines of evidence are reconcilable. For the Behrens-Fisher problem, it is shown that if the sample sizes $m$ and $n$ tend to infinity, then for fixed prior parameters $\nu_{01}$ and $\nu_{02}$ both lines of evidence are reconcilable. Furthermore, if the prior parameters $\nu_{01}$ and $\nu_{02}$ tend to infinity, then the lines of evidence are reconcilable for any fixed sample sizes $m$ and $n$. Simulation results show that, even for small fixed sample sizes $m$ and $n$ or for small values of the prior parameters $\nu_{01}$ and $\nu_{02}$, the Bayesian and frequentist evidence remain reconcilable.

This provides another example of a testing situation in which the Bayesian and frequentist evidence can be reconciled, and it may therefore to some extent discourage debasing or even dismissing P values as evidence in hypothesis testing problems. Furthermore, our reconcilability results for one-sided testing suggest that it may be arbitrary to assert the irreconcilability of the evidence in two-sided (point or interval) hypothesis testing problems, and that more attention should perhaps be paid to the appropriateness of the methods used to handle a two-sided hypothesis in both the frequentist and the Bayesian frameworks.

Acknowledgments

The work was supported by the Foundation for Training Talents of Beijing (Grant no. 19000532377), the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (Grant no. IDHT20130505) and the Research Foundation for Youth Scholars of Beijing Technology and Business University (Grant no. QNJJ2012-03).

References

[1] D. V. Lindley, "A statistical paradox," Biometrika, vol. 44, pp. 187-192, 1957.
[2] J. W. Pratt, "Bayesian interpretation of standard inference statements," Journal of the Royal Statistical Society B, vol. 27, pp. 169-203, 1965.
[3] G. Casella and R. L. Berger, "Reconciling Bayesian and frequentist evidence in the one-sided testing problems," Journal of the American Statistical Association, vol. 82, no. 397, pp. 106-111, 1987. doi:10.1080/01621459.1987.10478396
[4] M. S. Bartlett, "A comment on D. V. Lindley's statistical paradox," Biometrika, vol. 44, pp. 533-534, 1957.
[5] D. R. Cox, "The role of significance tests," Scandinavian Journal of Statistics, vol. 4, pp. 49-70, 1977.
[6] G. Shafer, "Lindley's paradox," Journal of the American Statistical Association, vol. 77, no. 378, pp. 325-351, 1982. doi:10.1080/01621459.1982.10477809
[7] J. O. Berger and M. Delampady, "Testing precise hypotheses," Statistical Science, vol. 2, pp. 317-352, 1987. doi:10.1214/ss/1177013238
[8] J. O. Berger and T. Sellke, "Testing a point null hypothesis: the irreconcilability of p-values and evidence," Journal of the American Statistical Association, vol. 82, pp. 112-122, 1987.
[9] K. W. Tsui and S. Weerahandi, "Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters," Journal of the American Statistical Association, vol. 84, pp. 602-607, 1989.
[10] J. Hannig, H. Iyer, and P. Patterson, "Fiducial generalized confidence intervals," Journal of the American Statistical Association, no. 473, pp. 254-269, 2006.
[11] E. L. Lehmann, Testing Statistical Hypotheses, 2nd ed., John Wiley & Sons, New York, NY, USA, 1986.