Exact Probabilities and Confidence Limits for Binomial Samples: Applied to the Difference between Two Proportions

An exact probabilities method is proposed for computing the confidence limits of medical binomial parameters computed from the 2×2 contingency table. The developed algorithm is described and assessed for the difference between two binomial proportions (a bidimensional parameter). The behavior of the proposed method was analyzed and compared with four previously defined methods: Wald and Wilson, with and without continuity corrections. The exact probabilities method proved to be monotonic in computing the confidence limits. The experimental errors of the exact probabilities method applied to the difference between two proportions never exceeded the imposed significance level of 5%.


INTRODUCTION
The problem of computing confidence limits for a binomial success probability is not new; however, a consensus has not yet been reached. Most methods of computing the confidence interval are implemented in order to exclude both tails, but they could also be expressed to exclude only one tail. The confidence interval is used to report research results as a criterion of assessment of the trustworthiness or robustness of the finding [1,2], allowing a better interpretation of the results.
The following approaches are frequently used to construct the confidence interval for binomial proportions:
• Cumulative probabilities of a binomial distribution (Clopper and Pearson construct confidence limits by intersecting one-sided lower and upper intervals) [3]
• Approximation of the binomial distribution with the normal distribution (Wald method [4], also known as the normal approximation interval)

Binomial and Bidimensional Binomial Distribution
A binomial sample is generated when m experiments are performed and the number of successes, x, is counted. If p is the probability of success and q is the probability of failure (where q = 1-p) in the population, the probability of obtaining x successes in a sample of size m is given by the formula presented in Eq. 1.
Using x as an unbiased estimator of the population mean (m·p) and performing independent repetitions of the binomial experiment with the same sample size (m), the probability of observing y successes is given by the formula in Eq. 2.
A bidimensional binomial distribution could be obtained when the cases (noted x and y) are extracted from two samples of sizes m and n. The main hypothesis in medical studies is that the two samples (m and n) belong to the same population. The hypothesis that must be verified is: "Is there a significant difference between the two samples under the hypothesis that both are representative for the population?" The probability that the binomial variable X (sample size of m and mean of x) and Y (sample size of n and mean of y) occur simultaneously and independently is given by Eq. 3.
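Eqs. 1-3 are not reproduced in this text. Under the definitions given above, they would take the following standard forms. This is a reconstruction, not the authors' typeset equations: the symbol P_B and the argument order of P_BB are assumptions, chosen to match the P_BB(vx,X,m,vy,Y,n) call that appears in the algorithm description.

```latex
% Eq. 1 (binomial probability of x successes in m trials)
P(x; m, p) = \binom{m}{x}\, p^{x}\, q^{m-x}, \qquad q = 1 - p

% Eq. 2 (plug-in form: p estimated by x/m, y successes in a repetition of size m)
P_B(y; x, m) = \binom{m}{y} \left(\frac{x}{m}\right)^{y} \left(1 - \frac{x}{m}\right)^{m-y}

% Eq. 3 (independent samples: observed x of m and y of n, drawn values X and Y)
P_{BB}(x, X, m, y, Y, n) = P_B(X; x, m) \cdot P_B(Y; y, n)
```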
where P_BB = probability of the bibinomial distribution. The probability that pairs (X,Y) occur in samples of size 10 (m = n = 10) was obtained by applying Eq. 3 (Fig. 1).
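The bidimensional probability function can be sketched in a few lines of Python. This is a minimal illustration, assuming the plug-in estimate p = x/m described above. The function names `p_b`, `p_bb`, and `probability_matrix` are hypothetical and are not taken from the authors' PHP programs. The example values (x = 9, m = 10) and (y = 1, n = 10) are the ones quoted later for Table 1.

```python
from math import comb


def p_b(k, obs, size):
    """Probability of k successes in `size` trials, with p estimated as obs/size."""
    p = obs / size
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def p_bb(x, X, m, y, Y, n):
    """Probability that X (from sample x of m) and Y (from sample y of n)
    occur simultaneously and independently (assumed form of Eq. 3)."""
    return p_b(X, x, m) * p_b(Y, y, n)


def probability_matrix(x, m, y, n):
    """Rows index Y = 0..n, columns index X = 0..m."""
    return [[p_bb(x, X, m, y, Y, n) for X in range(m + 1)] for Y in range(n + 1)]


matrix = probability_matrix(9, 10, 1, 10)
total = sum(sum(row) for row in matrix)  # should be 1 up to rounding
print(f"cells: {len(matrix)}x{len(matrix[0])}, total probability: {total:.12f}")
```

Because the two samples are treated as independent, every cell is a product of two one-dimensional binomial probabilities, and the whole matrix sums to 1.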

FIGURE 1.
Bidimensional probability function; event: simultaneous extraction of X and Y from independent binomial samples (x,10) and (y,10). Probability for X = 3 and Y = 7.

Algorithm of the Exact Probabilities Method
Confidence limits depend on the binomial distribution function calculated based on the values of the studied variables (as is shown in Fig. 1). The lower and upper boundaries of the confidence interval could be obtained by the sum of the probabilities of simultaneous occurrence of values adjacent to the binomial values observed in the investigated samples (the sum of the segments on the plane XOY in Fig. 1).
A series of in-house PHP (a recursive acronym for PHP: Hypertext Preprocessor) programs were developed in order to compute the lower and upper confidence limits at a significance level of 5%. The algorithm was developed and implemented after a previous analysis.
The algorithm is presented below:
for every observation vx from sample m
  for every observation vy from sample n
    for every random X drawn from the binomial distribution (vx,m)
      for every random Y drawn from the binomial distribution (vy,n)
        compute P_BB(vx,X,m,vy,Y,n) given by Eq. 3
        compute a medical-like indicator (MLI) using a formula for X, m, Y, n: MLI=MLI(X,m,Y,n)
    sort the table (ascending, maintaining associations) containing P_BB and MLI by MLI values (procedure done in Table 3
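The table-building and sorting steps listed above can be sketched as follows for a single observed pair (vx, vy). This is only an illustration under stated assumptions: every possible X in 0..m and Y in 0..n is enumerated rather than sampled, the difference of proportions stands in for the MLI, and the observed values (3 of 5, 2 of 5) are hypothetical. The names `p_b`, `mli_table`, and `diff` are not from the authors' code.

```python
from math import comb


def p_b(k, obs, size):
    """Probability of k successes in `size` trials, with p estimated as obs/size."""
    p = obs / size
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def diff(X, m, Y, n):
    """Illustrative MLI: difference between two proportions."""
    return X / m - Y / n


def mli_table(vx, m, vy, n, mli):
    """Build rows (MLI, P_BB, X, Y) for all X, Y and sort them ascending by MLI."""
    rows = []
    for X in range(m + 1):
        for Y in range(n + 1):
            prob = p_b(X, vx, m) * p_b(Y, vy, n)  # assumed form of Eq. 3
            rows.append((mli(X, m, Y, n), prob, X, Y))
    # Sorting whole rows by their MLI keeps each probability attached to its MLI.
    rows.sort(key=lambda row: row[0])
    return rows


table = mli_table(3, 5, 2, 5, diff)
for value, prob, X, Y in table[:3]:
    print(f"MLI={value:+.2f}  P_BB={prob:.6f}  X={X}  Y={Y}")
```

Keeping each row as a tuple is one way to "maintain associations" during the sort; the original PHP programs may have used an associative-array sort instead.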

Algorithm Assessment
The algorithm for computing the confidence interval was assessed on the excess risk (a measure of the association between a specified risk factor and a specified outcome) [43].
The excess risk is computed from a 2×2 contingency table by using the formula presented in Eq. 4.
where ER = excess risk; a = true positive; b = false positive; c = false negative; d = true negative. The mathematical function associated with Eq. 4 is presented in Eq. 5.
where x, y = binomially distributed variables; m, n = sizes of the samples. Two experiments were conducted for equal sample sizes of 15 (m = n = 15):
1. (0≤x≤15, 15,14,15), where x varied from 0 to 15, y was fixed at 14, and the sample sizes m and n were equal to 15
2. (x_random,15,y_random,15), where x and y were random variables and the sample sizes m and n were equal to 15
The following methods for computing the 95% confidence interval (significance level of 5%) were applied: exact probabilities (abbreviated as Exact.p) (proposed algorithm); Wald method (abbreviated as DWald) [4]; Wald with continuity correction (abbreviated as DWaldC; used only in the random experiment) [30]; Wilson (abbreviated as DWilson) [5]; and Wilson with continuity correction (abbreviated as DWilsonC) [29].
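For orientation, the first experiment can be sketched with the difference of proportions x/m - y/n as the excess risk function, together with the textbook Wald interval for a difference between two proportions. Both are assumptions: Eqs. 4 and 5 are not reproduced in this text, and the DWald variant cited as [4] may differ in detail from the standard formula used here. The names `excess_risk` and `wald_diff_ci` are hypothetical.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def excess_risk(x, m, y, n):
    """Assumed form of Eq. 5: difference between two proportions."""
    return x / m - y / n


def wald_diff_ci(x, m, y, n, z=Z_95):
    """Textbook Wald interval for p1 - p2 (no continuity correction, no clipping)."""
    p1, p2 = x / m, y / n
    se = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    d = p1 - p2
    return d - z * se, d + z * se


# Experiment 1: x runs from 0 to 15 while y = 14 and m = n = 15.
for x in range(16):
    lower, upper = wald_diff_ci(x, 15, 14, 15)
    print(f"x={x:2d}  ER={excess_risk(x, 15, 14, 15):+.4f}  Wald: [{lower:+.4f}, {upper:+.4f}]")
```

The unclipped Wald limits can fall outside the admissible range [-1, 1] near the extremes, for example the lower limit for (0,15,14,15).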
Four parameters were computed and assessed: the value of excess risk, the lower and upper 95% boundaries for each method (α = 5%), and the associated error for each method.

Examples of Bidimensional Binomial Distribution
The bidimensional probability matrix was constructed in order to meet the research objective. The bidimensional probability matrix obtained by applying Eq. 2, which corresponds to the extraction of the value (Y) from the binomial sample of (y = 1, n = 10) (see rows in the matrix) simultaneously with the extraction of the value (X) from the binomial sample of (x = 9, m = 10) (see columns in the matrix), is presented in Table 1. The intersection of a row with a column contains the values obtained by applying Eq. 3.
The probabilities associated with a bidimensional binomial distribution ([x,m],[y,n]) could be used to calculate probabilities for any mathematical function of the f(x,m,y,n) type.
The data in Table 2 are presented in ascending order according to the values of f4(X,5,Y,5).
The data presented in Table 3 were reorganized by summing the probabilities that correspond to identical values of the f4(X,m,Y,n) function (Table 4). The probabilities cumulated from the extreme values towards the observed value of the f4(x,m,y,n) function were also included in Table 4 (last column).
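The grouping and cumulation steps can be sketched as below. This is a sketch under explicit assumptions, not the authors' implementation: the difference of proportions stands in for the function, the significance level is split equally (α/2 per tail), and a single value is returned for each limit, whereas the text reports an interval for each limit. The names `exact_limits` and `diff` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def p_b(k, obs, size):
    """Probability of k successes in `size` trials, with p estimated as obs/size."""
    p = obs / size
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def diff(X, m, Y, n):
    """Illustrative function; exact fractions make identical values compare equal."""
    return Fraction(X, m) - Fraction(Y, n)


def exact_limits(x, m, y, n, f, alpha=0.05):
    # Step 1: sum the probabilities that share the same function value.
    mass = defaultdict(float)
    for X in range(m + 1):
        for Y in range(n + 1):
            mass[f(X, m, Y, n)] += p_b(X, x, m) * p_b(Y, y, n)
    values = sorted(mass)

    # Step 2: cumulate from each extreme, excluding values while the
    # excluded tail stays at or below alpha/2 (assumed equal split).
    lo, low_tail = 0, 0.0
    while lo < len(values) - 1 and low_tail + mass[values[lo]] <= alpha / 2:
        low_tail += mass[values[lo]]
        lo += 1
    hi, high_tail = len(values) - 1, 0.0
    while hi > lo and high_tail + mass[values[hi]] <= alpha / 2:
        high_tail += mass[values[hi]]
        hi -= 1

    # Experimental error: excluded probability mass, in percent.
    return float(values[lo]), float(values[hi]), 100 * (low_tail + high_tail)


lower, upper, error = exact_limits(14, 15, 14, 15, diff)
print(f"limits: [{lower:+.4f}, {upper:+.4f}]  experimental error: {error:.2f}%")
```

By construction, the excluded mass in this sketch cannot exceed α, which mirrors the requirement that the experimental error stay at or below the imposed significance level.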

Exact Probabilities Method: Experimental Errors
The experimental errors associated with the 95% confidence boundaries computed by the implemented algorithm for excess risk (m = n = 15) are shown in Fig. 2. The errors varied from 0.00% to 4.95%, with a mean of 3.56% and a standard deviation of 0.93%.
The following results were obtained when the confidence limits presented in Fig. 3 were analyzed to see whether they fit in the "gold standard" interval (the Exact.p method):
• The DWald method proved to be the closest method to the proposed Exact.p approach. The lower and upper limits obtained by applying the DWald method fell within the Exact.p intervals for the lower and upper limit in 10 out of 16 cases (~63%).
• The Wilson methods (DWilson and DWilsonC) performed similarly, with one exception (DWilson for (13,15,14,15)): the lower and upper boundaries were not included in the intervals obtained by the Exact.p method.
The descriptive statistics parameters associated with the experimental error for all the methods were computed and the results are presented in Table 5. The errors associated with the investigated methods (at a significance level of 5%) are presented in Fig. 4.

Results of the (x_random,15,y_random,15) Experiment
Forty-five values for x and y were assigned randomly by applying the (0≤x≤m and 0≤y≤n) criteria. The obtained results are presented in Table 6. Table 6 also summarizes the inclusion of the limits obtained by each applied method into the range of boundaries obtained by the Exact.p method (under the assumption that the Exact.p method is the most restrictive method).
The variation of the experimental errors obtained by applying all the methods when computing the 95% confidence interval and when using the "|5-abs(Err)|" criterion is presented in Fig. 5.

DISCUSSION
The aim of the research was to develop and assess an exact method for computing confidence boundaries for medical parameters computed on the 2×2 contingency table. The exact probabilities method proved to be useful in computing the confidence limits on binomial samples with small sample sizes and provided reliable results. The Exact.p method provides confidence limits within 95% (significance level of 5%) or better no matter how small the sample size. Computing the binomial probability matrix was the first step in applying the proposed method. The obtained probabilities were used later to construct confidence limits for binomial parameters. As expected, the repetitions of the value associated with the f4(X,m,Y,n) function used for exemplification (see Table 3) were not identified when the confidence limits were calculated. This observation applies to any function of the f(x,m,y,n) type.
The analysis of the results presented in Tables 1-3 revealed an important conclusion. The values of X and Y proved to be relevant only for the calculation of probabilities and for computing the values of functions. These values proved not to be important in the addition of probabilities used to construct the confidence limits. Adding the probabilities associated with the bibinomial distribution values from the two extremes (lowest and highest, that is, the beginning and end of the last column) proved to be a reliable solution for constructing the confidence limits by using the imposed probability and the experimental errors (see Table 4). These experimental errors are the real values of the probability errors obtained for the computed confidence limits. The experimental errors of the Exact.p method never exceeded the imposed significance level (4.95 was the maximum value, see Fig. 2). The boundaries calculated by using the exact probabilities approach are, in fact, intervals for both the lower and the upper limit, as shown in Fig. 3. The Exact.p method is considered the "gold standard" in the evaluation of the 95% confidence interval according to the values of the experimental errors (Tables 5 and 6). The errors in the (0≤x≤15, 15,14,15) experiment varied from 0.18% (DWilsonC) to 35.55% (DWilson); the lowest variation was obtained by the Exact.p method (3.32 was the difference between the maximum and the minimum error). The errors in the (x_random,15,y_random,15) experiment varied from 0.03% (DWilsonC) to 35.55% (DWilson); the lowest variation was again obtained by the Exact.p method (3.34 was the difference between the maximum and the minimum error).
The analysis of experimental errors (Tables 5 and 6, Figs. 4 and 5) led to the following observations:
• The experimental errors of the proposed method of constructing confidence limits never exceeded the imposed significance level (α = 5%).
• The DWald method proved to perform better than the DWilson method and its continuity correction, but similarly to the DWaldC method. The mean error in the random experiment was 5.24 (1.86 standard deviation) when the DWald method was used and 5.00 (1.84 standard deviation) when the DWaldC method was used.
The confidence limits obtained in the random experiment were checked to see whether they fit in the "gold standard" interval (the Exact.p method, according to the values of the experimental errors). The following proved to be true (see Table 6):
• The DWaldC method had the highest inclusion: 19 out of 45 cases. It was closely followed by the DWald method: 18 out of 45 cases.
• The DWilson method performed slightly better than DWilsonC: eight inclusions out of 45 compared with six inclusions out of 45.
The newly introduced method for constructing the confidence limits for discrete distributions using the probability distribution matrix proved to respect, without any exception, the imposed significance level. The proposed algorithm was assessed in two experiments by using the excess risk parameter, which is a binomial parameter computed on the 2×2 contingency table. The algorithm was assessed in terms of the difference between two proportions, but it could also be applied to construct the confidence limits of any parameter computed based on the 2×2 contingency table, since computing the confidence limits is based on the probability matrix.

CONCLUSIONS
The proposed algorithm obtained the exact domain for the lower and upper limits of the confidence interval for f, X, Y, m, n, α, and their associated experimental errors. The implemented algorithm required the experimental errors to be as close as possible to the imposed significance level α without exceeding this level.
The proposed algorithm proved to be monotone and never exceeded the imposed significance level in small sample sizes (m = n = 10, m = n = 15). Further studies will be conducted in order to analyze the behavior of the implemented algorithm on different sample sizes and different discretely distributed parameters.