Inequalities Between Hypergeometric Tails

A special inequality between the tail probabilities of certain related hypergeometrics was shown by Seneta and Phipps [19] to suggest useful 'quasi-exact' alternatives to Fisher's [5] Exact Test. With this result as motivation, two inequalities of Hájek and Havránek [6] are investigated in this paper and are generalised to produce inequalities in the form required. A parallel inequality in binomial tail probabilities is also established.


Introduction
The hypergeometric variable U ∼ HG(z, m, n) has probability distribution
$$P(U = u) = \frac{\binom{m}{u}\binom{n}{z-u}}{\binom{m+n}{z}}, \qquad \max(0, z-n) \le u \le \min(z, m).$$
A standard result for independent binomial variables X and Y, where X ∼ B(m, p_1) and Y ∼ B(n, p_2) with p_1 = p_2 (common success probability), is that the distribution of X, conditional on Z (= X + Y) = z, is hypergeometric, HG(z, m, n). This result is exploited in Fisher's Exact Test, the commonly used approach for testing the hypothesis of common success probability (H_0 : p_1 = p_2 = p) in independent binomials when the sample sizes, m and n, are small. In this context, X and Y represent the number of successes in the two independent samples, and the observed success and failure frequencies may be summarized in a 2 × 2 table with entries a and b = z − a (successes) in the first row, c = m − a and d = n − z + a (failures) in the second row, and column totals m and n, which are the fixed values.
Based on these empirically observed values of (X, Y), the Fisher-exact P-value for an upper one-sided test (with H_1 : p_1 > p_2) is P(X ≥ a | Z = z) = p(a; z, m, n), which we shall denote by the generic p_F. The corresponding test procedure at nominal level α ∈ (0, 1) is: "Reject H_0 if p_F ≤ α", and the test is known as Fisher's Exact Test.
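The tail p(a; z, m, n) is straightforward to compute directly. The following is a minimal sketch, assuming the standard hypergeometric probability function; the function names `hg_pmf` and `hg_tail` are ours, and the frequencies a = 5, z = 6, m = 15, n = 16 are those of the numerical example quoted later in the paper.

```python
from math import comb

def hg_pmf(u, z, m, n):
    """P(U = u) for U ~ HG(z, m, n)."""
    if u < max(0, z - n) or u > min(z, m):
        return 0.0  # outside the support
    return comb(m, u) * comb(n, z - u) / comb(m + n, z)

def hg_tail(a, z, m, n):
    """Upper tail p(a; z, m, n) = P(U >= a)."""
    lo = max(a, 0, z - n)
    return sum(hg_pmf(u, z, m, n) for u in range(lo, min(z, m) + 1))

# Frequencies from the paper's numerical example: a = 5, z = 6, m = 15, n = 16.
p_F = hg_tail(5, 6, 15, 16)
print(f"p_F = {p_F:.3f}")  # prints p_F = 0.072
```

Fisher's Exact Test at nominal level α then rejects H_0 when `p_F <= alpha`.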
This test is conditional since it treats z as fixed, rather than as an observed value of the variable Z (= X + Y). The use of p_F as P-value cleverly avoids the theoretical and computational problems involved in calculating an unconditional P-value, since it is free of the nuisance parameter, p, and it also avoids the problems of 'ordering' the 2 × 2 tables. It is generally agreed, however, that p_F is conservative. The difference of opinion about the reason (discreteness or conditioning) for this conservativeness is well documented, and a comprehensive overview of these opinions is presented by Sahai and Khurshid [17]. Fisher's test is obviously α-level in the unconditional setting, where the variable corresponding to p_F is p(X; Z, m, n): that is,
$$P_{H_0}(p_F \le \alpha) \le \alpha$$
for all p ∈ (0, 1) and for any nominal level α ∈ (0, 1). Fisher's test is, however, very conservative, and it is not unusual to find that P_H0(p_F ≤ α) < α/2, as demonstrated by Boschloo [3].
This excessive conservativeness of p_F suggests that a less conservative measure may be preferable, provided it is also easily calculated. In §2 we give a brief summary of the findings of Seneta and Phipps [19] concerning the properties of two measures based on hypergeometric tails. These measures, p(.), not only have some statistical justification as significance measures in the two-binomial problem, but also satisfy the strict double inequality (1). This means that they are less conservative than p(a; z, m, n) = p_F and yet not as liberal as p(a + 1; z, m, n):
$$p(a+1; z, m, n) < p(\cdot) < p(a; z, m, n). \tag{1}$$
Motivated by this result, we generalise two inequalities due to Hájek and Havránek [6], and show that there are more related hypergeometric tails, p(.), satisfying (1). This is followed by a numerical example comparing the measures p(.). A parallel inequality in binomial tails is established in §3 and some implications are discussed.

Some Alternatives to Fisher's Exact Test
We begin by discussing two measures which are of historical significance in the two-binomial context, and which also satisfy (1).

Lancaster's mid-P, p_M
A measure which has gained acceptance as an alternative to Fisher's P-value (see for example Hirji, Tan and Elashoff [7]) is an adjustment for discrete P-values due to Lancaster [8], [9]. The adjustment is called the mid-P and will be denoted by p_M. Lancaster's mid-P adjustment of p_F is defined by
$$p_M = \tfrac{1}{2}\left[\,p(a; z, m, n) + p(a+1; z, m, n)\,\right].$$
Since p_M is the average of p(a; z, m, n) and p(a + 1; z, m, n), it is clear that (1) is satisfied by p(.) = p_M, and therefore that p_M is less conservative than Fisher's p_F but does not err too far in the other direction. Barnard [1] suggests that p_F and p_M should both be quoted when testing equality of success probability for small samples because of the conservativeness of p_F. Further, Berry and Armitage [2] point out that p_M has mean 1/2 and variance close to 1/12, in line with the properties of uniformly distributed P-values (based on continuous test statistics), and that p_M has some justification as a significance measure on these grounds. (We note here that all other weighted averages of p(a; z, m, n) and p(a + 1; z, m, n) also satisfy (1), but that they do not have the stated desirable properties of p_M.)
The corresponding mid-P test procedure at arbitrary nominal significance level α is "Reject H_0 when p_M ≤ α." In contrast with Fisher's Exact Test, this procedure is not strictly α-level, since there is no guarantee that P_H0(p_M ≤ α) ≤ α for arbitrary α ∈ (0, 1). Hirji, Tan and Elashoff [7] describe the procedure as quasi-exact. Their extensive empirical assessment reveals the excessive conservativeness of p_F when compared with p_M. They also demonstrate that in the unconditional setting p_M is occasionally (but only mildly) anti-conservative, i.e. P_H0(p_M ≤ α) ≈ α even though α is occasionally exceeded. It is worth mentioning that this is true also of the Pearson χ²-statistic used for large samples in this context (loc. cit.).
Hirji et al. [7] argue that closeness to nominal levels with only rare exceedance is an important criterion for assessing a test procedure. They conclude that, although not strictly a P-value, p_M can be regarded as an approximation in the unconditional setting, just as the chi-squared approximation is used in the large-sample case.
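As an illustration of the mid-P as the average of two adjacent hypergeometric tails, here is a short sketch. The helper names `hg_tail` and `mid_p` are ours, and the input frequencies are those of the paper's numerical example (a = 5, z = 6, m = 15, n = 16).

```python
from math import comb

def hg_tail(a, z, m, n):
    """Upper tail p(a; z, m, n) = P(U >= a) for U ~ HG(z, m, n)."""
    lo, hi = max(a, 0, z - n), min(z, m)
    return sum(comb(m, u) * comb(n, z - u) for u in range(lo, hi + 1)) / comb(m + n, z)

def mid_p(a, z, m, n):
    """Lancaster's mid-P: the average of p(a; z, m, n) and p(a + 1; z, m, n)."""
    return 0.5 * (hg_tail(a, z, m, n) + hg_tail(a + 1, z, m, n))

p_F = hg_tail(5, 6, 15, 16)
p_M = mid_p(5, 6, 15, 16)
print(f"p_F = {p_F:.3f}, p_M = {p_M:.3f}")  # prints p_F = 0.072, p_M = 0.039
```

Because p_M is an average, it necessarily lies strictly between the two tails whenever the observed outcome has positive probability.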

Liebermeister's measure, p_L
We now turn to a different hypergeometric, HG(z + 1, m + 1, n + 1). The use of its tail probability, p(a + 1; z + 1, m + 1, n + 1), in the two-binomial setting dates back to Liebermeister [10]; Seneta [18] shows the Bayesian derivation and historical background to this tail probability, which we shall denote by p_L. We note that Overall [11], [12] also recommends the use of p_L, purely on the basis of worked numerical examples.
Seneta and Phipps [19] prove that, in addition to the Bayesian origins of p_L, inequality (1) is satisfied by p(.) = p_L, i.e.
$$p(a+1; z, m, n) < p(a+1; z+1, m+1, n+1) < p(a; z, m, n). \tag{2}$$
From (2), it is seen that Liebermeister's measure, p_L, is less conservative than p_F but not too anti-conservative, and so, like the mid-P, p_L is quasi-exact and can be interpreted as an approximation to the unconditional P-value in the sense that P_H0(p_L ≤ α) ≈ α for arbitrary α ∈ (0, 1). A comparison of the degree of anti-conservatism and also power comparisons are carried out by Seneta and Phipps [19] for the measures p_F, p_M and p_L. The point is also made that the calculations required for p_L are no more complicated than for p_F. In fact, existing software for p_F can be used simply by adding unity to the diagonals a and d in the 2 × 2 table of frequencies, as the numerical example in §2.4 demonstrates.
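The remark about reusing software for p_F can be sketched as follows. This is an illustration under assumptions: the table layout [[a, b], [c, d]] with successes in the first row and the two samples in the columns is our reading of the notation, and `fisher_p` and `liebermeister_p` are hypothetical helper names rather than functions from any library.

```python
from math import comb

def fisher_p(a, b, c, d):
    """One-sided Fisher P-value P(X >= a | Z = z) for the table [[a, b], [c, d]].

    Assumed layout: row 1 holds successes, row 2 failures; columns are the samples.
    """
    m, n, z = a + c, b + d, a + b
    hi = min(z, m)
    return sum(comb(m, u) * comb(n, z - u) for u in range(a, hi + 1)) / comb(m + n, z)

def liebermeister_p(a, b, c, d):
    """p_L = p(a+1; z+1, m+1, n+1): Fisher's P-value after adding 1 to a and d."""
    return fisher_p(a + 1, b, c, d + 1)

# Table implied by p(5; 6, 15, 16): a = 5, b = 1, c = 10, d = 15.
print(f"p_F = {fisher_p(5, 1, 10, 15):.3f}")         # prints p_F = 0.072
print(f"p_L = {liebermeister_p(5, 1, 10, 15):.3f}")  # prints p_L = 0.035
```

Adding unity to a and d raises z, m and n by one each, which is exactly the shift from HG(z, m, n) to HG(z + 1, m + 1, n + 1).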

Further inequalities in hypergeometric tails
Other promising related hypergeometrics are HG(z + 1, m + 1, n) and HG(z, m − 1, n). Hájek and Havránek [6] proved two inequalities involving their tail probabilities. They showed, subject to a > zm/(m + n), that (in our notation) p(a + 1; z + 1, m + 1, n) ≤ p_F and also p(a; z, m − 1, n) ≤ p_F. We shall write p(a + 1; z + 1, m + 1, n) as p_Ha and p(a; z, m − 1, n) as p_Hb. In the context of an upper tail test, it is only the cases a > zm/(m + n) which are of interest, since the mean of HG(z, m, n) is zm/(m + n). Nevertheless we show that a > zm/(m + n) is unnecessarily restrictive and also that the inequalities can actually be extended to double inequalities like (1), which means that p_Ha and p_Hb are both less conservative than p_F, but not as liberal as p(a + 1; z, m, n).

The inequality for p_Ha
The inequality
$$p(a+1; z, m, n) < p(a+1; z+1, m+1, n) < p(a; z, m, n) \tag{3}$$
holds for l < a ≤ u, where l = max(0, z − n) and u = min(z, m) are the lower and upper bounds respectively of HG(z, m, n).
The boundary case a = l is of no interest in significance testing, but we note here for completeness that (3) does also hold for a = l when z < n. The right hand inequality '<' needs to be replaced by '≤' only for the case a = l when z ≥ n, and in that case p(a + 1; z + 1, m + 1, n) = p(a; z, m, n) = 1.
Since HG(z, m, n) is degenerate when z = 0 or z = m + n, statistical interest is in the case 0 < z < m + n only. A brief outline of the proof of (3) for this case now follows. The complete proof, including a discussion of the degenerate cases z = 0 and z = m + n, is in Phipps [14].
Outline of the proof. The right hand inequality of (3), which is the strict version of the inequality of Hájek and Havránek [6], is considered first, namely:
$$p(a+1; z+1, m+1, n) < p(a; z, m, n). \tag{4}$$
Clearly the two tails p(a + 1; z + 1, m + 1, n) and p(a; z, m, n) have the same number of summands. It can easily be seen that all the summands of p(a + 1; z + 1, m + 1, n) are strictly smaller than the corresponding summands of p(a; z, m, n) when a > (m + 1)(z + 1)/(m + n + 1) − 1, but not otherwise. Hence (4) is satisfied for a ≥ l*, where l* (not to be confused with the lower bound l) is the integer part of (m + 1)(z + 1)/(m + n + 1). To prove that (4) is also satisfied for a < l*, we focus on the summands of the lower tails: 1 − p(a + 1; z + 1, m + 1, n) and 1 − p(a; z, m, n).
A parallel argument gives p(a + 1; z, m, n) < p(a + 1; z + 1, m + 1, n) for all integer a satisfying l ≤ a ≤ u. Taking this inequality together with (4), the double inequality (3) is proved for l < a ≤ u, with a weaker inequality at a = l.
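The termwise comparison used in the outline can be made explicit. The following worked step is our reconstruction, assuming the standard hypergeometric probability function; the summation index k is our notation. The ratio of the summand of p(a + 1; z + 1, m + 1, n) at k + 1 to the corresponding summand of p(a; z, m, n) at k is:

```latex
\[
\frac{\binom{m+1}{k+1}\binom{n}{z-k}\big/\binom{m+n+1}{z+1}}
     {\binom{m}{k}\binom{n}{z-k}\big/\binom{m+n}{z}}
= \frac{m+1}{k+1}\cdot\frac{z+1}{m+n+1}
= \frac{(m+1)(z+1)}{(k+1)(m+n+1)},
\qquad a \le k \le u .
\]
```

This ratio is less than 1 exactly when k + 1 > (m + 1)(z + 1)/(m + n + 1), and it decreases as k increases, so every summand comparison goes the same way once the condition holds at k = a.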

The inequality for p_Hb
For l and u defined as in §2.3.1, the following inequality holds for l < a ≤ u:
$$p(a+1; z, m, n) \le p(a; z, m-1, n) < p(a; z, m, n). \tag{5}$$
The proof is not given here, but follows similar arguments to those given for p_Ha. Notice that the left hand inequality of (5) holds with equality at a = m, since both p(m + 1; z, m, n) and p(m; z, m − 1, n) are identically zero. This means that an outcome with a = m (every observation in the first sample a success, so that the table has frequencies m and z − m in the success row and 0 and n − z + m in the failure row) has positive probability, and yet p_Hb = 0. This is an unacceptable approximation to a positive P-value, and so p_Hb is not suitable as a significance measure. Nevertheless we include p_Hb for completeness in the following numerical example.
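The behaviour of p_Hb, including the equality at a = m, can be checked exhaustively for small tables. The sketch below is illustrative only: it uses exact rational arithmetic, the helper names are ours, and the bound `max_size=8` is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def hg_tail(a, z, m, n):
    """Exact upper tail p(a; z, m, n) = P(U >= a) for U ~ HG(z, m, n)."""
    lo, hi = max(a, 0, z - n), min(z, m)
    total = sum(comb(m, u) * comb(n, z - u) for u in range(lo, hi + 1))
    return Fraction(total, comb(m + n, z))

def check_p_hb(max_size=8):
    """Check p(a+1; z,m,n) <= p(a; z,m-1,n) < p(a; z,m,n); return the equality cases."""
    equalities = []
    for m in range(1, max_size + 1):
        for n in range(1, max_size + 1):
            for z in range(1, m + n):            # non-degenerate case 0 < z < m + n
                l, u = max(0, z - n), min(z, m)
                for a in range(l + 1, u + 1):    # l < a <= u
                    left = hg_tail(a + 1, z, m, n)
                    p_hb = hg_tail(a, z, m - 1, n)
                    right = hg_tail(a, z, m, n)
                    assert left <= p_hb < right
                    if left == p_hb:
                        equalities.append((a, z, m, n))
    return equalities

ties = check_p_hb()
print(all(a == m for a, z, m, n in ties))  # prints True
```

Every equality case found has a = m, where p_Hb = 0 even though the observed outcome has positive probability.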
• The frequencies are too small for the Pearson χ²-statistic to be appropriate, but the approximate P-value calculated from its positive square root is Chi-P = 0.028. The Yates-corrected value is 0.073.
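For reference, the two chi-squared values can be reproduced with the usual 2 × 2 shortcut formula. This is a sketch under assumptions: the table entries a = 5, b = 1, c = 10, d = 15 are inferred from p(5; 6, 15, 16), the function name is ours, and the one-sided P-value is taken as the upper normal tail of the positive square root, which presumes the data lie in the direction of the alternative.

```python
from math import erfc, sqrt

def one_sided_chi_p(a, b, c, d, yates=False):
    """Upper one-sided P-value from the positive square root of Pearson's chi-squared.

    Assumes ad > bc, i.e. the observed difference is in the upper-tail direction.
    """
    m, n = a + c, b + d          # column (sample) totals
    z, N = a + b, a + b + c + d  # success total and grand total
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - N / 2, 0.0)  # Yates continuity correction
    chi2 = N * diff**2 / (m * n * z * (N - z))
    return 0.5 * erfc(sqrt(chi2) / sqrt(2))  # 1 - Phi(sqrt(chi2))

print(f"{one_sided_chi_p(5, 1, 10, 15):.3f}")              # prints 0.028
print(f"{one_sided_chi_p(5, 1, 10, 15, yates=True):.3f}")  # prints 0.073
```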
Figure 1 shows a plot of the unconditional P-value for this example,
$$P(p) = \sum_{(x,z)\in C} \binom{m}{x}\binom{n}{z-x}\, p^{z}(1-p)^{m+n-z},$$
as p varies. We have used p_F as the criterion for 'ordering' the 2 × 2 tables, i.e. the region of summation used was C = {(x, z) : p_F(x; z, m, n) ≤ 0.072}. Other criteria for ordering the tables, such as p_L, lead to almost identical curves. (Pierce and Peters [15] give reasons for such phenomena in a more general context.) Superimposed on the plot of P(p) in Figure 1 are horizontal lines corresponding to the Fisher-P (p_F = 0.072), the mid-P (p_M = 0.039), the Liebermeister-P (p_L = 0.035) and the P-value from the chi-squared test (Chi-P = 0.028). The values for the two measures p_Ha = 0.0415 and p_Hb = 0.0590 are also superimposed. We observe that the maximum likelihood estimate of p is 6/31 ≈ 0.2, and it is clear from the diagram that the Liebermeister-P is closer to P(p) for all p ∈ (0.2, 0.8).
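A curve of this kind can be computed by summing the joint probability of two independent binomials over the region C. The sketch below is our illustration, not the authors' code: the helper names are hypothetical, the grid of p values is arbitrary, and the threshold is taken as the exact observed p_F (which rounds to 0.072) plus a small tolerance so that the observed table itself is included in C.

```python
from math import comb

def hg_tail(a, z, m, n):
    """Upper tail p(a; z, m, n) = P(U >= a) for U ~ HG(z, m, n)."""
    lo, hi = max(a, 0, z - n), min(z, m)
    return sum(comb(m, u) * comb(n, z - u) for u in range(lo, hi + 1)) / comb(m + n, z)

def binom_pmf(x, size, p):
    """P(X = x) for X ~ B(size, p)."""
    return comb(size, x) * p**x * (1 - p)**(size - x)

def unconditional_p(p, m, n, region):
    """P(p): probability under H0 that (X, Z) falls in the region C."""
    return sum(binom_pmf(x, m, p) * binom_pmf(z - x, n, p) for x, z in region)

m, n = 15, 16
p_obs = hg_tail(5, 6, m, n)  # observed Fisher P-value, about 0.072

# Region C: all tables (x, z) at least as extreme as the observed one under the p_F ordering.
region = [(x, z)
          for z in range(m + n + 1)
          for x in range(max(0, z - n), min(z, m) + 1)
          if hg_tail(x, z, m, n) <= p_obs + 1e-12]

curve = [(k / 20, unconditional_p(k / 20, m, n, region)) for k in range(1, 20)]
for p, value in curve:
    print(f"p = {p:.2f}  P(p) = {value:.4f}")
```

Since Fisher's test is α-level, every value of P(p) computed this way lies at or below the observed p_F.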
This numerical example is typical of 2 × 2 tables with small sample sizes. The two measures p_Ha and p_Hb are 'closer' than p_F to the unconditional P-value, but typically they are more conservative than either the mid-P or the Liebermeister-P. As a result, it is only p_M and p_L which are seriously considered as useful quasi-exact alternatives to Fisher's Exact Test. In their comparison of p_M, p_L and p_F as suitable, easily calculated approximations to the unconditional P-value, Seneta and Phipps [19] include plots of the Type I error probability at various significance levels and for various combinations of m and n. The comparisons support the computational use of p_L, except for very unbalanced tables, for which p_L behaves erratically (the example used is m = 80, n = 40, α = 0.05); for such tables the use of p_M is recommended instead.

The Binomial Tail Analogue
An inequality corresponding to (1), for tails from the binomial B(z, p), is:
$$b(a+1; z, p) < b(\cdot) < b(a; z, p), \tag{6}$$
where b(a; z, p) denotes the upper tail probability P(X ≥ a) for X ∼ B(z, p). Inequality (6) is satisfied by b(.) = b(a + 1; z + 1, p). This can be proved using elementary combinatorial algebra, since it is not difficult to show that b(a + 1; z + 1, p) can be expressed as follows:
$$b(a+1; z+1, p) = p\, b(a; z, p) + (1-p)\, b(a+1; z, p).$$
This is simply a weighted average of b(a + 1; z, p) and b(a; z, p), and therefore inequality (6) is satisfied by b(.) = b(a + 1; z + 1, p). The particular case p = 0.5 is b(.) = b(a + 1; z + 1, 0.5), and is the mid-P in the following two tests.
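The weighted-average representation of b(a + 1; z + 1, p) is easy to confirm numerically. The following sketch is illustrative: the function names are ours, and the values z = 10, p = 0.3 and a = 8 are arbitrary choices rather than data from the paper.

```python
from math import comb

def binom_tail(a, z, p):
    """Upper tail b(a; z, p) = P(X >= a) for X ~ B(z, p)."""
    return sum(comb(z, k) * p**k * (1 - p)**(z - k) for k in range(max(a, 0), z + 1))

def check_weighted_average(z, p, tol=1e-12):
    """Check b(a+1; z+1, p) = p b(a; z, p) + (1-p) b(a+1; z, p) and the double inequality."""
    for a in range(z + 1):
        lhs = binom_tail(a + 1, z + 1, p)
        rhs = p * binom_tail(a, z, p) + (1 - p) * binom_tail(a + 1, z, p)
        assert abs(lhs - rhs) < tol
        assert binom_tail(a + 1, z, p) < lhs < binom_tail(a, z, p)
    return True

print(check_weighted_average(10, 0.3))  # prints True

# With p = 0.5 the weights are equal, so b(a+1; z+1, 0.5) is the mid-P of b(a; z, 0.5).
a, z = 8, 10
mid_p = 0.5 * (binom_tail(a, z, 0.5) + binom_tail(a + 1, z, 0.5))
print(f"{mid_p:.4f} {binom_tail(a + 1, z + 1, 0.5):.4f}")  # prints 0.0327 0.0327
```

The identity reflects the fact that a B(z + 1, p) variable is a B(z, p) variable plus one further independent trial.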

Exact test for Poisson means
It is well known that if X and Y are independent Poisson variables with common parameter λ, the distribution of X conditional on X + Y = z is binomial, B(z, 0.5). The 'exact' (upper-tail) test for a common Poisson mean is based on this conditional distribution (see for example Robinson [16]). For an empirically observed value (a, z − a) for (X, Y), the P-value for an upper tail 'exact' test is b(a; z, 0.5). The less conservative mid-P, b(a + 1; z + 1, 0.5), has some justification as an alternative measure on the grounds that it more closely resembles the uniform distribution. Seneta and Phipps [19] show that this measure is also justified on Bayesian grounds. They use uniform priors to obtain b(a + 1; z + 1, 0.5), by analogy with the method used to derive the Liebermeister p_L. It is not difficult to show that the same result is obtained using exponential priors with arbitrary positive, finite mean. It is curious that the resulting measure, b(a + 1; z + 1, 0.5), is identical to the mid-P, in contrast to the two measures p_L and p_M discussed in §2.

The sign test
Suppose we want an upper one-tail test of the hypothesis (H_0) of equal probability of positive and negative counts in a small sample of n counts, some of which may be zero (or ties in a sample of n pairs). Let X, Y, W be the number of positive, negative and zero (or tied) counts, and write Z = X + Y. The variable (X, Y, W) is trinomial and, if H_0 is true, conditional on Z (= X + Y) = z, the distribution of X is binomial, B(z, 0.5). The 'exact' test is therefore the usual sign test, and if (a, z) is the observed value of (X, Z), the P-value is P_H0(X ≥ a | Z = z) = b(a; z, 0.5). The parallel with Fisher's Exact Test is immediate, and the corresponding quasi-exact test is the test based on the mid-P. Phipps [13], in discussing the sign test, demonstrates the superiority of the mid-P, b(a + 1; z + 1, 0.5), over the conditional P-value, b(a; z, 0.5), from the sign test.

Figure 1. A plot of P(p), the unconditional P-value as p varies, for the numerical example of §2.4. Approximations to P(p) for this example are superimposed on the plot: p_F (Fisher-P), p_M (Mid-P), p_L (Liebm.-P), p_Ha (Ha-P), p_Hb (Hb-P) and Chi-P.
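The numerical example of §2.4 is based on a 2 × 2 table of observed frequencies which arose from a study by Di Sebastiano et al. [4] on rumbling appendix pain (success) in independent samples of non-acute and acute appendix cases: 5 successes and 10 failures in the first sample (m = 15), and 1 success and 15 failures in the second (n = 16). An upper tail test for success probability was required.
• The Fisher-P measure is p_F = p(5; 6, 15, 16) = 0.072.
• The Liebermeister-P is p_L = p(6; 7, 16, 17) = 0.035. This is the value of p_F for the table in which unity has been added to the diagonals of the previous table, giving 6 successes and 10 failures in the first sample and 1 success and 16 failures in the second.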
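All five hypergeometric measures for this example come from the same tail function with shifted arguments, as the sketch below illustrates. The helper name `hg_tail` and the dictionary layout are ours; the definitions of the measures follow the text.

```python
from math import comb

def hg_tail(a, z, m, n):
    """Upper tail p(a; z, m, n) = P(U >= a) for U ~ HG(z, m, n)."""
    lo, hi = max(a, 0, z - n), min(z, m)
    return sum(comb(m, u) * comb(n, z - u) for u in range(lo, hi + 1)) / comb(m + n, z)

a, z, m, n = 5, 6, 15, 16  # observed frequencies of the numerical example

measures = {
    "p_F":  hg_tail(a, z, m, n),                  # Fisher
    "p_Hb": hg_tail(a, z, m - 1, n),              # HG(z, m - 1, n)
    "p_Ha": hg_tail(a + 1, z + 1, m + 1, n),      # HG(z + 1, m + 1, n)
    "p_M":  0.5 * (hg_tail(a, z, m, n) + hg_tail(a + 1, z, m, n)),  # mid-P
    "p_L":  hg_tail(a + 1, z + 1, m + 1, n + 1),  # Liebermeister
}
lower = hg_tail(a + 1, z, m, n)  # the liberal bound p(a + 1; z, m, n)

for name, value in measures.items():
    print(f"{name:5s} {value:.4f}")

# Each alternative measure lies strictly between the two bounds of inequality (1).
print(all(lower < v < measures["p_F"] for k, v in measures.items() if k != "p_F"))
```

After rounding, the computed values agree with those quoted in the text (0.072, 0.0590, 0.0415, 0.039 and 0.035).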