Odds Ratios Estimation of Rare Event in Binomial Distribution

We introduce a new estimator of the odds ratio for rare events, based on the Empirical Bayes (EB) method, in two independent binomial distributions. We compare the proposed estimator with two existing estimators, the modified maximum likelihood estimator (MMLE) and the modified median unbiased estimator (MMUE), using the Estimated Relative Error (ERE) as the criterion of comparison. The new estimator is found to be more efficient than the other two methods.


Introduction
The odds ratio is a measure of association between two independent groups on a categorical response with two possible outcomes, success and failure. The two independent groups can be two treatment groups or a treatment group and a control group. The odds ratio is widely used in many fields of medical and social science research. It is most commonly used in epidemiology to express the results of clinical trials and of case-control studies.
The number of subjects in each group falling in each category can be summarized in a two-way contingency table. The total numbers of subjects in group 1 and group 2 are $n_1$ and $n_2$, which are assumed to be fixed. The numbers of successes in group 1 and group 2 are $Y_1$ and $Y_2$, which are considered as independent binomial random variables with observed values $y_1$ and $y_2$. Let $p_1$ and $p_2$ be the probabilities of success in group 1 and group 2, respectively. The odds of success in group 1 are defined as $\text{odds}_1 = p_1/(1 - p_1)$, and similarly for group 2. The usual maximum likelihood estimator of the odds ratio is defined as
$$\widehat{OR} = \frac{y_1/(n_1 - y_1)}{y_2/(n_2 - y_2)} = \frac{y_1 (n_2 - y_2)}{y_2 (n_1 - y_1)}. \qquad (1)$$
The odds ratio is a nonnegative real value. When the odds of success are equal in both groups, the odds ratio is equal to 1, meaning that group membership is independent of the response. When the odds of a positive response are higher in group 1 than in group 2, the odds ratio is greater than 1, and vice versa for values less than 1. The farther the odds ratio is from 1 in a given direction, the stronger the association. In addition, its sampling distribution is highly skewed. The natural logarithm of the sample odds ratio, which is less skewed, is often used for inference. However, the estimated odds ratio can be zero (if a zero cell count appears in the numerator of (1)), infinite (if a zero cell count appears in the denominator of (1)), or undefined (if zero cell counts appear in both the numerator and the denominator of (1)). Haldane [1] and Gart and Zweifel [2] suggested adding a correction term of 0.5 to each cell when a zero cell count occurs, which gives the modified maximum likelihood estimator (MMLE) as
$$\widehat{OR}_{MMLE} = \frac{(y_1 + 0.5)(n_2 - y_2 + 0.5)}{(y_2 + 0.5)(n_1 - y_1 + 0.5)}. \qquad (2)$$
Even though $\widehat{OR}_{MMLE}$ always lies between 0 and infinity, some investigators have discouraged adding 0.5 to each cell because it gives the appearance of adding "fake data"; see Bishop et al. [3] and Agresti and Yang [4].
Amid this controversy, several alternatives to the modified maximum likelihood estimator have been proposed. Hirji et al. [5] proposed the median unbiased estimator (MUE) of the odds ratio, obtained from the conditional noncentral hypergeometric distribution. However, the MUE is still undefined when $y_1 = n_1$ and $y_2 = n_2$ or when $y_1 = y_2 = 0$. Parzen et al. [6] proposed an estimator of the odds ratio based on the MUE, called the modified median unbiased estimator (MMUE), for which the estimated probability of success is always in the interval (0, 1), even if there are $0$ or $n_i$ successes in a group. Consequently, the estimated odds ratio always lies between 0 and infinity. Additionally, this method performs well with respect to bias in small samples and is an alternative to adding "fake data."
In this paper, we focus on "rare events," that is, events of interest for which zero or small counts are observed within a given time period or a given sample, such as natural disasters or some diseases. As mentioned above, rare events make the odds ratio difficult to estimate because zeros or small observed counts occur in the numerator, in the denominator, or in both, resulting in a large standard error and therefore a less precise confidence interval. Only a rough estimate of the odds ratio is thus obtained.
Association between categorical variables in contingency tables has long been studied, using both classical and Bayesian approaches. Good [7] studied, at an early stage, the association factor in large contingency tables with small entries, assuming log-normal and Pearson type III distributions. The author also mentioned that these assumptions may be less accurate but are easy to handle. Fisher [8] estimated the odds ratio in a 2 × 2 table based on the hypergeometric distribution, using an exact method. Thomas and Gart [9] constructed a table of 95% confidence limits for the difference and ratio of two proportions, including the odds ratio and the one-tailed $P$ value for the Fisher-Irwin exact test, in various types of 2 × 2 tables. Altham [10] studied association and the exact $P$ value in a 2 × 2 contingency table based on cumulative posterior probabilities, which were not easy to extract. Nurminen and Mutanen [11] proposed a Bayesian approach for estimating the difference between two proportions, the risk ratio, and the odds ratio, using independent beta priors, and provided integral expressions for the cumulative posterior distribution. They also applied the proposed method to real data on malignant lymphoma and colon cancer cases exposed to phenoxy acids and chlorophenols in agriculture. Nouri et al. [12] presented the estimation of the odds ratio in stratified 2 × 2 tables when exposure was misclassified. They compared the matrix and inverse matrix methods with the MLE method in a simulation study and found that the inverse matrix method, which has a closed form, was more efficient than the matrix method.
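To make the zero-cell problem concrete, the following minimal Python sketch evaluates the usual maximum likelihood estimator of the odds ratio and the 0.5-corrected MMLE on small tables. The function names are illustrative and not taken from the paper, and for simplicity the correction is applied to every table rather than only to tables with a zero cell.

```python
import math

def odds_ratio_mle(y1, n1, y2, n2):
    """Usual ML estimate y1(n2 - y2) / (y2(n1 - y1)); may be 0, inf, or nan."""
    num = y1 * (n2 - y2)
    den = y2 * (n1 - y1)
    if den == 0:
        # Zero in the denominator: infinite, or undefined if the numerator is zero too.
        return math.nan if num == 0 else math.inf
    return num / den

def odds_ratio_mmle(y1, n1, y2, n2, c=0.5):
    """Corrected estimate: add c (0.5 by default) to each of the four cells."""
    return ((y1 + c) * (n2 - y2 + c)) / ((y2 + c) * (n1 - y1 + c))

# Rare-event tables with n1 = n2 = 10 (illustrative counts).
print(odds_ratio_mle(0, 10, 2, 10))   # zero cell in the numerator
print(odds_ratio_mle(1, 10, 0, 10))   # zero cell in the denominator
print(odds_ratio_mle(0, 10, 0, 10))   # zero cells in both
print(odds_ratio_mmle(0, 10, 0, 10))  # the corrected estimate stays finite
```

With equal sample sizes and equal counts, the corrected estimate is exactly 1, whereas the uncorrected estimate is undefined.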
As previously mentioned, measures of association in a two-way contingency table can be estimated using classical and Bayesian approaches. The exact distribution under the classical approach is, however, rather difficult to handle mathematically. In the Bayesian approach, where prior belief is incorporated into the derivation of the posterior density, the hyperparameters characterizing the prior density are often unknown to researchers and need to be assessed irrespective of the current data. However, controversy still exists. Alternatively, the hyperparameters can plausibly be estimated using the Empirical Bayes method, which, in contrast to the fully Bayesian approach, uses the current data to estimate the unknown hyperparameters. As a consequence, we focus on using the Empirical Bayes method to estimate the odds ratio in a two-way contingency table with small proportions of success. Our proposed estimator tends to outperform the traditional estimators, MMLE and MMUE, without interfering with the original data.
The rest of this paper is organized as follows. In the next section, we discuss the modified median unbiased estimator. The third section describes the odds ratio estimation using the EB method. The fourth section presents simulation results, in which the efficiency of EB is compared with that of MMLE and MMUE. The fifth section applies our method to real data. Our conclusion is drawn in the final section.

The Modified Median Unbiased Estimator of Odds Ratio
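Parzen et al. [6] suggested the modified median unbiased estimator (MMUE) for two independent binomial distributions.
Let $\hat{p}_i$ be the estimator of the success probability which satisfies
$$P(Y_i \le y_i \mid \hat{p}_i) \ge 0.5 \quad \text{and} \quad P(Y_i \ge y_i \mid \hat{p}_i) \ge 0.5.$$
To obtain $\hat{p}_i$, they use the binomial distribution, $Y_i \sim \mathrm{Bin}(n_i, p_i)$, where $Y_i$ denotes the random variable representing the number of successes in the $i$th group ($i = 1, 2$). Let $y_i$ be the observed value of $Y_i$.
The MMUE can be computed from the distribution of sufficient statistics for binomial data.
Compute $\hat{p}_{iL}$ and $\hat{p}_{iU}$ as those values of $p_i$ for which
$$P(Y_i \ge y_i \mid \hat{p}_{iL}) = 0.5, \qquad P(Y_i \le y_i \mid \hat{p}_{iU}) = 0.5,$$
where $\hat{p}_{iL}$ and $\hat{p}_{iU}$ are the smallest and largest values of $p_i$, respectively. Then, the MMUE is defined as
$$\hat{p}_i = \frac{\hat{p}_{iL} + \hat{p}_{iU}}{2}.$$
When $0 < y_i < n_i$, we can find values of $\hat{p}_{iL}$ and $\hat{p}_{iU}$ which satisfy these equations. Solve $\hat{p}_{iL}$ from
$$\sum_{k=y_i}^{n_i} \binom{n_i}{k} \hat{p}_{iL}^{\,k} (1 - \hat{p}_{iL})^{n_i - k} = 0.5$$
and solve $\hat{p}_{iU}$ from
$$\sum_{k=0}^{y_i} \binom{n_i}{k} \hat{p}_{iU}^{\,k} (1 - \hat{p}_{iU})^{n_i - k} = 0.5.$$
The values of $\hat{p}_{iL}$ and $\hat{p}_{iU}$ can be obtained in practice by using the relationship between the cumulative beta distribution and the cumulative binomial distribution function as follows (Daly [13] and Johnson et al. [14]).
Let $X \sim \mathrm{Beta}(a, b)$ with cumulative distribution function $F(x \mid a, b)$. Then
$$P(Y_i \ge y_i \mid p) = F(p \mid y_i,\, n_i - y_i + 1), \qquad P(Y_i \le y_i \mid p) = 1 - F(p \mid y_i + 1,\, n_i - y_i).$$
We need to find $\hat{p}_{iL}$ and $\hat{p}_{iU}$ such that
$$F(\hat{p}_{iL} \mid y_i,\, n_i - y_i + 1) = 0.5, \qquad F(\hat{p}_{iU} \mid y_i + 1,\, n_i - y_i) = 0.5.$$
In particular,
$$\hat{p}_{iL} = F^{-1}(0.5 \mid y_i,\, n_i - y_i + 1), \qquad \hat{p}_{iU} = F^{-1}(0.5 \mid y_i + 1,\, n_i - y_i),$$
where $F^{-1}(q \mid a, b)$ is the $q$th quantile of the beta distribution with parameters $a$ and $b$.
Now suppose $y_i = 0$, and then $P(Y_i \ge 0 \mid \hat{p}_{iL}) = 1$. Any value of $\hat{p}_{iL}$ in the interval [0, 1] satisfies $P(Y_i \ge 0 \mid \hat{p}_{iL}) \ge 0.5$, where $\hat{p}_{iL} = 0$ is the smallest possible value of $\hat{p}_{iL}$. Similarly, when $y_i = 0$, $\hat{p}_{iU}$ satisfies $P(Y_i \le 0 \mid \hat{p}_{iU}) = (1 - \hat{p}_{iU})^{n_i} = 0.5$, so that $\hat{p}_{iU} = 1 - 0.5^{1/n_i}$. Consequently, when $y_i = 0$, $\hat{p}_i$ equals $(\hat{p}_{iL} + \hat{p}_{iU})/2 = (1 - 0.5^{1/n_i})/2$.
Similarly, when $y_i = n_i$, $\hat{p}_{iU} = 1$ is the largest possible value of $\hat{p}_{iU}$, and $\hat{p}_{iL}$ satisfies $P(Y_i \ge n_i \mid \hat{p}_{iL}) = \hat{p}_{iL}^{\,n_i} = 0.5$, so that $\hat{p}_{iL} = 0.5^{1/n_i}$ and $\hat{p}_i = (\hat{p}_{iL} + \hat{p}_{iU})/2 = (0.5^{1/n_i} + 1)/2$.
Then, the MMUE of the odds ratio is defined as
$$\widehat{OR}_{MMUE} = \frac{\hat{p}_1/(1 - \hat{p}_1)}{\hat{p}_2/(1 - \hat{p}_2)},$$
where $\hat{p}_1$ and $\hat{p}_2$ denote the success probability estimators in groups 1 and 2, respectively.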
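The MMUE computation described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: instead of beta quantiles, it finds the lower and upper probabilities by plain bisection on the binomial tail sums, which needs only the standard library, and all function names are hypothetical.

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Bin(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))

def solve_half(f, increasing, tol=1e-12):
    """Bisection on [0, 1] for f(p) = 0.5, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < 0.5) == increasing:
            lo = mid  # the root lies above mid
        else:
            hi = mid  # the root lies at or below mid
    return (lo + hi) / 2

def mmue_prob(y, n):
    """Midpoint of the lower and upper median-unbiased probabilities."""
    # Lower value: P(Y >= y | p) = 0.5, taken as 0 when y = 0.
    if y == 0:
        p_low = 0.0
    else:
        p_low = solve_half(lambda p: 1 - binom_cdf(y - 1, n, p), increasing=True)
    # Upper value: P(Y <= y | p) = 0.5, taken as 1 when y = n.
    if y == n:
        p_up = 1.0
    else:
        p_up = solve_half(lambda p: binom_cdf(y, n, p), increasing=False)
    return (p_low + p_up) / 2

def odds_ratio_mmue(y1, n1, y2, n2):
    """Odds ratio built from the two MMUE success probabilities."""
    p1, p2 = mmue_prob(y1, n1), mmue_prob(y2, n2)
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Zero successes in group 1 still give a finite, positive estimate.
print(odds_ratio_mmue(0, 10, 2, 30))
```

Because the estimated probability is strictly between 0 and 1 for every count from 0 to n, the resulting odds ratio is always finite and positive. Where a beta quantile function is available (for example `qbeta` in R), it can replace the bisection step.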

Proposed Estimation of Odds Ratio
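In this section, we propose a new method for odds ratio estimation using the Empirical Bayes method in two independent binomial distributions. Let $Y_1$ and $Y_2$ be random variables distributed as binomial, with equal or unequal sample sizes and unknown success probabilities, $Y_i \sim \mathrm{Bin}(n_i, p_i)$, $i = 1, 2$, where each $p_i$ is assigned a beta prior, $p_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$. Consequently, the marginal distribution of $Y_i$ is the beta-binomial distribution (BBD),
$$P(Y_i = y_i \mid \alpha_i, \beta_i) = \binom{n_i}{y_i} \frac{B(y_i + \alpha_i,\, n_i - y_i + \beta_i)}{B(\alpha_i, \beta_i)},$$
where $B(\cdot, \cdot)$ is the beta function. Both hyperparameters in each group can then be estimated by the maximum likelihood method, using the likelihood function of this marginal distribution. Because the likelihood equations are nonlinear, the Newton-Raphson method is applied: writing $\theta_i = (\alpha_i, \beta_i)^{T}$, the $(t + 1)$th iterate of the maximum likelihood estimator of the hyperparameters ($t = 1, 2, \ldots$) is obtained from
$$\theta_i^{(t+1)} = \theta_i^{(t)} - H^{-1}\big(\theta_i^{(t)}\big)\, g\big(\theta_i^{(t)}\big),$$
where $g$ and $H$ denote the gradient vector and the Hessian matrix of the log-likelihood, and where the moment estimators of the hyperparameters of the beta-binomial distribution are used as initial values; see Minka [15].
The posterior distribution of $p_i$ is thus calculated, yielding
$$p_i \mid y_i \sim \mathrm{Beta}(y_i + \hat{\alpha}_i,\; n_i - y_i + \hat{\beta}_i).$$
Thus, the EB estimator of the odds ratio can be obtained as follows:
$$\widehat{OR}_{EB} = \frac{\hat{p}_1^{EB}/(1 - \hat{p}_1^{EB})}{\hat{p}_2^{EB}/(1 - \hat{p}_2^{EB})},$$
where $\hat{p}_1^{EB}$ and $\hat{p}_2^{EB}$ denote the success probability estimators in groups 1 and 2, respectively.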
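The final step of the Empirical Bayes procedure can be sketched as below. Several assumptions are made for illustration only: each success probability has a Beta(alpha, beta) prior, the hyperparameters are taken as already estimated (the Newton-Raphson fit itself is not shown), and the posterior mean is used as the success probability estimator. The function names and the hyperparameter values in the usage lines are hypothetical and do not come from the paper.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(y, n, alpha, beta):
    """Marginal P(Y = y) when Y | p ~ Bin(n, p) and p ~ Beta(alpha, beta)."""
    return comb(n, y) * exp(log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))

def posterior_mean(y, n, alpha, beta):
    """Mean of the Beta(alpha + y, beta + n - y) posterior of p (assumed estimator)."""
    return (alpha + y) / (alpha + beta + n)

def odds_ratio_eb(y1, n1, y2, n2, hyper1, hyper2):
    """Odds ratio from the two posterior means; hyper1, hyper2 are (alpha, beta) pairs."""
    p1 = posterior_mean(y1, n1, *hyper1)
    p2 = posterior_mean(y2, n2, *hyper2)
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Hypothetical hyperparameter estimates for a rare-event setting.
hyper1, hyper2 = (0.5, 9.5), (1.0, 9.0)
print(odds_ratio_eb(0, 10, 1, 30, hyper1, hyper2))
```

Because positive hyperparameters keep the posterior mean strictly inside (0, 1), the estimate remains finite even when a group has zero successes, and no artificial counts are added to the observed table.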

Simulation Study for MMLE, MMUE, and EB Method
Simulation studies were carried out using the R program (version 3.2.0) [16] to assess the efficiency of the EB method in comparison with the two existing methods. Binomial data are generated with equal and unequal sample sizes, $(n_1, n_2) = (10, 10)$, $(10, 30)$, and $(10, 50)$, with probabilities of success in group 1 of $p_1 = 0.01, 0.03, 0.05, 0.1$, and $0.15$. For each setting, the ERE is computed, where $OR$ denotes the usual maximum likelihood estimator of the odds ratio and $\widehat{OR}_j$ denotes the estimate of the odds ratio using EB, MMLE, and MMUE ($j = 1, 2, 3$), respectively. The simulation results with odds ratio estimates for sample sizes $(n_1, n_2) = (10, 10)$, $(10, 30)$, and $(10, 50)$ are given in Tables 1-3. The performance of the estimators, measured by ERE, is given in Tables 4-6 and displayed graphically in Figure 1; the other cases provide similar results. The EB method yields the smallest ERE in 78.67% of the cases, while the MMLE and MMUE methods yield the smallest ERE in only 6.67% and 14.66% of the cases, respectively.
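To illustrate why the small sample sizes and success probabilities in this design are demanding, the sketch below generates binomial pairs in Python and measures how often the uncorrected odds ratio is zero, infinite, or undefined. It is not the R code used in the study: the group 2 probability, the number of replications, the seed, and the function names are all assumptions made for illustration.

```python
import random

def rbinom(n, p, rng):
    """One Bin(n, p) draw as the sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def prob_interior(n, p):
    """P(0 < Y < n) for Y ~ Bin(n, p)."""
    return 1 - (1 - p) ** n - p ** n

def prob_degenerate(n1, p1, n2, p2):
    """Exact probability that at least one cell of the 2 x 2 table is zero."""
    return 1 - prob_interior(n1, p1) * prob_interior(n2, p2)

def simulate_degenerate_share(n1, p1, n2, p2, reps=10_000, seed=1):
    """Monte Carlo share of tables with y = 0 or y = n in either group."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y1, y2 = rbinom(n1, p1, rng), rbinom(n2, p2, rng)
        if y1 in (0, n1) or y2 in (0, n2):
            hits += 1
    return hits / reps

# Sample sizes from the design above; p2 = 0.1 is a hypothetical choice.
for n2 in (10, 30, 50):
    exact = prob_degenerate(10, 0.05, n2, 0.1)
    approx = simulate_degenerate_share(10, 0.05, n2, 0.1)
    print(n2, round(exact, 4), round(approx, 4))
```

The exact formula serves as a check on the simulated share. With $n_1 = 10$ and $p_1 = 0.01$, group 1 alone has no successes in roughly nine samples out of ten, so an estimator that remains defined at zero counts is essential in this setting.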

Illustrative Examples Using Real Data
Our first example is taken from the studies of Good [7] and Hardell [17]. As shown in Table 7, subjects with malignant [...]
The EB estimator of the odds ratio is also more efficient than the other two estimators, MMLE and MMUE. In addition, our proposed estimator is an alternative to the MMLE method for odds ratio estimation that does not disturb the original data.

Table 7: True odds ratios and their estimates using EB, MMLE, and MMUE, with corresponding percentages of ERE.