Bayesian Equivalence Testing and Meta-Analysis in Two-Arm Trials with Binary Data

We consider a Bayesian approach for assessing hypotheses of equivalence in two-arm trials with binary data. We discuss the development of the likelihood, the prior, and the posterior distributions of the parameters of interest. We then examine the suitability of a normal approximation to the posterior distribution obtained via a Taylor series expansion. Bayesian inference is carried out using Markov chain Monte Carlo (MCMC) methods. We illustrate the methods using actual data arising from two-arm clinical trials on preventing mortality after myocardial infarction.


Introduction
Consider a clinical trial in which a pharmaceutical company wants to test a new drug against an existing drug. In some of these studies, the clinical trial end point is the success or failure of the treatment. A binary outcome is an outcome whose unit can take only two possible states, "0" and "1." This success/failure response variable could be heart disease (Yes/No), patient condition (Good/Critical), how often a patient feels depressed (Never/Often), and so on. The natural distribution for modeling these types of binary data is the binomial distribution given by
$$f(x; p) = \binom{n}{x} p^{x} (1 - p)^{n - x} \quad \text{for } x = 0, 1, \ldots, n,\; p \in (0, 1). \tag{1}$$
The mean and variance of the binomial random variable are $E(X) = np$ and $\mathrm{var}(X) = np(1 - p)$, respectively. In (1), it is assumed that there are only two outcomes (denoted "success" or "failure") and a fixed number of trials ($n$). The trials are independent with a constant probability of success. The main objective of this type of clinical trial is to determine whether there is a significant difference between the active treatment (new drug) and the reference treatment (current drug). Tests of significance have generally been argued not to be enough. That is, if the p value for a test of significance leads to the nonrejection of the null hypothesis, this is not proof that the null hypothesis holds. The clinician may want to test a null hypothesis of equivalence against an alternative hypothesis that states that there is a sufficient difference between the two drugs.
Equivalence testing is widely used when a choice is to be made between a drug (or a treatment) and an alternative. The term equivalence in the statistical sense is used to mean a weak pattern displayed by the data under study regarding the underlying population distribution. Equivalence tests are designed to show the nonexistence of a relevant difference between two treatments. It is known that Fisher's one-sided exact test is the same as the test for equivalence in the frequentist approach [1].
This testing procedure is similar to the classical two-sided test procedure but involves an equivalence zone determined by a margin known as the equivalence margin ($\delta$). The equivalence margin, which represents a margin of clinical indifference, is usually estimated from previous studies and is based primarily on clinical criteria as well as statistical principles. The margin is influenced by statistical principles but depends largely on the interests of the experimenter and the research questions clinicians wish to answer. As such, the statistical method and the study design must be chosen so that the margin is not too restrictive to capture the bounds of the research question. For a test of equivalence of two binomial proportions, the equivalence margin is discussed in [2]. The frequentist approach to equivalence testing is the two one-sided tests (TOST) procedure. Under the TOST, equivalence is established at the $\alpha$ significance level if a $(1 - 2\alpha) \times 100\%$ confidence interval for the difference in treatment means $\mu_i - \mu_j$ is contained within the interval $(-\delta, \delta)$, where $\delta$ is the equivalence margin.
The motivation for this paper is that, for a given disease, there are likely to be many substitute or new drugs that can be used to treat patients. These drugs may not all cost the same; some may have adverse side effects, and the method of application could be complex for others. On the basis of this information, we carry out equivalence testing to see whether two different drugs can be regarded as equivalent in terms of their treatment effect. There are a variety of approaches to this problem, as indicated by recent literature. See Wellek [1], Albert [2], Gamalo et al. [3], Rahardja and Zhao [4], and Zaslavsky [5] for comprehensive details on recent developments. We remark that Gamalo et al. [3] consider a Bayesian approach to proportions along with noninferiority trials.
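The TOST decision rule described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited reference: it assumes the difference of interest is a difference of two binomial proportions and uses a Wald (normal-theory) interval, and the function name `tost_equivalent` and its arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_equivalent(x_t, n_t, x_c, n_c, delta, alpha=0.05):
    """Return (lower, upper, equivalent) for a Wald-interval TOST check.

    Assumption: a normal-theory interval for a difference of proportions;
    other interval constructions may be preferable for small samples.
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    diff = p_t - p_c
    # Wald standard error for a difference of two independent proportions.
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    # A (1 - 2*alpha) two-sided interval uses the (1 - alpha) normal quantile.
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    # Equivalence is declared only if the whole interval lies in (-delta, delta).
    return lower, upper, (-delta < lower and upper < delta)
```

For example, `tost_equivalent(500, 1000, 500, 1000, delta=0.1)` declares equivalence for these made-up counts, while tightening the margin to `delta=0.01` does not, because the interval is then wider than the equivalence zone.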
In this paper, we consider a Bayesian approach focusing on equivalence tests. We also construct a simple normal approximation and provide a mechanism for missing data analysis.
The remaining sections of this article are organized as follows. In Section 2, the Bayesian inferential procedure for binary data is discussed. Section 3 presents a normal approximation to the posterior distribution obtained via a Taylor series expansion and examines its suitability. We discuss a Gibbs sampling mechanism for estimating missing data in Section 4. In Section 5, we analyze a published dataset by Carlin [6] and Yusuf et al. [7]. This dataset consists of 22 treatment-control trials to prevent mortality after myocardial infarction. We conclude with a discussion of the approach in Section 6.

Bayesian Inferential Procedure
Let $X_t$ be the number of individuals with positive exposure out of a total of $n_t$ patients in the treatment group with proportion $P_t$. Accordingly, let $X_c$ denote the number of individuals with positive exposure out of a total of $n_c$ in the control group with proportion $P_c$. Then,
$$X_t \mid P_t \sim \mathrm{Binomial}(n_t, P_t), \qquad X_c \mid P_c \sim \mathrm{Binomial}(n_c, P_c).$$
The priors on the parameters $P_t$ and $P_c$ are given by
$$P_t \sim \mathrm{Beta}(\alpha, \beta), \qquad P_c \sim \mathrm{Beta}(\alpha, \beta).$$
Then the posterior distributions of $P_t$ and $P_c$ are given by
$$P_t \mid X_t \sim \mathrm{Beta}(X_t + \alpha,\; n_t - X_t + \beta), \qquad P_c \mid X_c \sim \mathrm{Beta}(X_c + \alpha,\; n_c - X_c + \beta).$$
For Bayesian inference about the treatment effect, a test is required to determine the posterior probability that the difference between the treatment proportions $P_t$ and $P_c$ lies within the bounds of the equivalence margin. There is therefore a need to sample from the posterior distribution of $P_t - P_c$. Although the marginal posteriors of $P_t$ and $P_c$ are Beta distributions, the posterior of their difference, $\pi(P_t - P_c \mid X_t, X_c)$, is not in an analytically tractable form. So, $P_{1t}, P_{2t}, \ldots, P_{nt}$ are generated from $\pi(P_t \mid X_t)$ and, independently, $P_{1c}, P_{2c}, \ldots, P_{nc}$ are generated from $\pi(P_c \mid X_c)$, because $P_t$ and $P_c$ are independent. Then, $P_{1t} - P_{1c}, P_{2t} - P_{2c}, \ldots, P_{nt} - P_{nc}$ can be treated as a random sample from $\pi(P_t - P_c \mid X_t, X_c)$.
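The sampling scheme for $P_t - P_c$ can be sketched as follows. This is a minimal illustration that assumes conjugate Beta posteriors with parameters $x + \alpha$ and $n - x + \beta$ in each arm; the function name `posterior_prob_equivalence`, the default prior `a = b = 1`, and the number of draws are hypothetical choices, not values taken from the paper.

```python
import random


def posterior_prob_equivalence(x_t, n_t, x_c, n_c, delta,
                               a=1.0, b=1.0, draws=20000, seed=1):
    """Monte Carlo estimate of P(|P_t - P_c| < delta | data).

    Assumption: independent Beta(a, b) priors on both arms, so each
    posterior is Beta(x + a, n - x + b).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Independent draws from the two marginal Beta posteriors.
        p_t = rng.betavariate(x_t + a, n_t - x_t + b)
        p_c = rng.betavariate(x_c + a, n_c - x_c + b)
        # Each difference is one draw from the posterior of P_t - P_c.
        if abs(p_t - p_c) < delta:
            hits += 1
    return hits / draws
```

The returned proportion estimates the posterior probability that the two proportions differ by less than `delta`; its Monte Carlo error shrinks as `draws` grows.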

Normal Approximation to the Beta Posterior Distribution
Note that the posterior distributions of $P_t$ and $P_c$ are Beta distributions. Following Kpekpena [8], a normal approximation to these posteriors can be obtained using a Taylor series expansion of the Beta distribution. By applying a Taylor series expansion with the first three terms to the log posterior $L(P) = \log \pi(P \mid X)$ about the posterior mode $P_0$, it can be shown that $\pi(P_t \mid X_t)$ is approximately normal with mean $P_0$ and variance $-1/L''(P_0)$. Similarly, the approximation of $\pi(P_c \mid X_c)$ can also be obtained. The details of this construction are given in the Appendix. We provide some approximations based on this development in Table 1 and Figures 1 and 2. It is clear that the approximation starts to work well once the posterior parameters reach $x + \alpha = 10$ and $n + \beta - x = 10$. However, the approximation is not suitable when the Beta posterior parameters are less than 10.
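The quality of a mode-based normal approximation can be checked numerically. The sketch below assumes that the approximation is the second-order expansion of the log Beta density about its mode, which is our reading of the Appendix set-up rather than a formula quoted from the paper; the helper names `laplace_normal` and `compare_cdf` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def laplace_normal(a, b):
    """Mode-based normal approximation to a Beta(a, b) density."""
    if a <= 1 or b <= 1:
        raise ValueError("the mode-based expansion needs a > 1 and b > 1")
    mode = (a - 1) / (a + b - 2)
    # Variance is -1 / L''(mode), where L is the log Beta density.
    sd = sqrt(mode * (1 - mode) / (a + b - 2))
    return NormalDist(mode, sd)


def compare_cdf(a, b, q, draws=20000, seed=1):
    """Return (Monte Carlo Beta CDF at q, approximate normal CDF at q)."""
    rng = random.Random(seed)
    exact = sum(rng.betavariate(a, b) < q for _ in range(draws)) / draws
    return exact, laplace_normal(a, b).cdf(q)
```

Calling `compare_cdf(50, 50, 0.55)` and `compare_cdf(2, 3, 0.2)` contrasts a case with large posterior parameters against one with small parameters, where the two probabilities are expected to disagree noticeably.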

Estimating Missing Data in Arms
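Missing data are easily handled in Bayesian inference by treating them as another set of parameters. We estimate the missing values conditional on the observed data. For example, let $X_1, \ldots, X_n$ be a binary random sample from $\mathrm{Ber}(P)$ in an arm and suppose that $X_m$ is missing. Note that $P$ represents $P_t$ in the treatment arm and $P_c$ in the control group. Let $P \sim \mathrm{Beta}(\alpha, \beta)$ and $Y = \sum_{i \neq m} X_i$. Then, the likelihood of the observed data is
$$L(P \mid Y) \propto P^{Y} (1 - P)^{n - 1 - Y}.$$
The posterior of $P$ based on the complete data is
$$\pi(P \mid Y, x_m) \propto P^{Y + x_m + \alpha - 1} (1 - P)^{n - Y - x_m + \beta - 1}.$$
The full conditionals of $P$ and $X_m$ are
$$P \mid Y, x_m \sim \mathrm{Beta}(Y + x_m + \alpha,\; n - Y - x_m + \beta), \qquad X_m \mid P, Y \sim \mathrm{Ber}(P).$$
It is easy to generate from these full conditionals in R, so $P$ and $x_m$ can be estimated using Gibbs sampling.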
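A Gibbs sampler for a single missing binary observation can be sketched as follows. The paper's implementation is in R; this Python version is only an illustration and assumes the full conditionals $P \mid Y, x_m \sim \mathrm{Beta}(Y + x_m + \alpha, n - Y - x_m + \beta)$ and $X_m \mid P \sim \mathrm{Ber}(P)$. The function name `gibbs_missing`, the burn-in length, and the default prior are hypothetical.

```python
import random


def gibbs_missing(y, n, a=1.0, b=1.0, iters=20000, burn=1000, seed=1):
    """Gibbs sampler for (P, x_m) when one of n Bernoulli values is missing.

    y is the number of successes among the n - 1 observed values.
    """
    rng = random.Random(seed)
    x_m = 0  # arbitrary starting value for the missing observation
    p_draws, x_draws = [], []
    for it in range(iters + burn):
        # P | Y, x_m ~ Beta(Y + x_m + a, n - Y - x_m + b)
        p = rng.betavariate(y + x_m + a, n - y - x_m + b)
        # X_m | P ~ Bernoulli(P)
        x_m = 1 if rng.random() < p else 0
        if it >= burn:
            p_draws.append(p)
            x_draws.append(x_m)
    return p_draws, x_draws
```

The mean of `p_draws` estimates the posterior mean of $P$, and the mean of `x_draws` estimates the posterior probability that the missing value equals 1.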

Data Analysis
We apply our approach to the data analyzed in [7,9]. The data include 22 trials of beta-blockers to prevent mortality after myocardial infarction. For each of the 22 trials, a test of equivalence is done to ascertain whether the treatment proportion is equivalent to the control proportion. This example is based on the statistical inferential procedure for binary data discussed in Sections 2 and 3. For each arm, the number of patients who had myocardial infarction out of a total $n_t$ is considered to be the number of successes in $n_t$ binomial trials. Similarly, the number of cases in the control group is treated as a binomial outcome independent of the treatment group. The equivalence margin $\delta$ is chosen to be as small as possible such that, if the absolute value of the difference between the control and treatment proportions is less than $\delta$, we can say that the two proportions are equivalent. For demonstration purposes, we assume a practically meaningful equivalence margin $\delta = 0.01$. We use noninformative priors Beta(2, 1) for the parameters $P_t$ and $P_c$.
The hypotheses for a test of equivalence between the treatment group of study $i$ and its control group are as follows:
$$H_0: |P_{ti} - P_{ci}| \geq \delta \quad \text{versus} \quad H_1: |P_{ti} - P_{ci}| < \delta. \tag{9}$$
We perform the equivalence test in (9) using the Bayes factor [9]. Table 2 gives the results of the equivalence tests. The first column, $D_i$, is the $i$th study label. Columns 2 and 3 are the treatment proportion ($x_t/n_t$) and control proportion ($x_c/n_c$), respectively. Columns 4 ($P(H_1 \mid X)$) and 6 ($P_A(H_1 \mid X)$) are the posterior probabilities that $H_1: |P_{ti} - P_{ci}| < \delta$ is true under the Beta posterior distributions and under the normal approximation to the Beta posterior, respectively. Column 5 ($B$) is the Bayes factor for the exact posterior, and $B_A$ in column 7 is the Bayes factor based on the normal approximation. For study 1, the Bayes factor for the exact posterior is 7.3822, whereas that of the normal approximation is 7.5466. Both Bayes factors are above 1, which implies that $H_0$, the hypothesis that the treatment proportion is not equivalent to the control proportion, is more likely to be true. We remark that classical hypothesis tests give one hypothesis a preferred status and only consider evidence against it, which is not the case in Bayesian tests. The results also indicate that the approximation and the exact computation lead to the same conclusion in each study, indicating the suitability of the approximation.
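The step from a posterior probability of $H_1$ to a Bayes factor in favor of $H_0$ can be sketched with the general definition of the Bayes factor as posterior odds divided by prior odds. The prior probability of $H_1$ used by the paper is not stated in this section, so the default `prior_prob_h1=0.5` below is purely an assumption, and the function name `bayes_factor_h0` is hypothetical.

```python
def bayes_factor_h0(post_prob_h1, prior_prob_h1=0.5):
    """Bayes factor for H0 against H1 as posterior odds over prior odds.

    Assumption: H0 and H1 are complementary, so P(H0) = 1 - P(H1).
    """
    if not (0 < post_prob_h1 < 1 and 0 < prior_prob_h1 < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds_h0 = (1 - post_prob_h1) / post_prob_h1
    prior_odds_h0 = (1 - prior_prob_h1) / prior_prob_h1
    return posterior_odds_h0 / prior_odds_h0
```

With equal prior probabilities the Bayes factor reduces to the posterior odds, so a value above 1 means the data favor $H_0$ (nonequivalence) and a value below 1 means they favor $H_1$ (equivalence).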
We now consider a missing data analysis in an arm. As an example, suppose an observation was missing in the treatment group under study 1. We estimate this missing value using the Gibbs sampler derived in Section 4. The posterior distributions of the parameters $P$ and $x_m$ are given in Figure 3, based on 20000 MCMC simulations. According to Figure 3, it is likely that $x_m$ is 0. The trace plot in Figure 4 shows that mixing is adequate, and there are no large spikes in the autocorrelation plot after lag 0. This is an indication of convergence of the Markov chain.
We also consider a meta-analysis of the binary data in two-arm trials in order to assess the between-study variations. Let $y_i$ be the estimate of the true effect size $\mu_i$ corresponding to the $i$th study. Then a random effects model is specified for $y_i$. As developed in Muthukumarana and Tiwari [10], we consider a hierarchical Dirichlet process formulation for $\mu_i$, in which $M$, $\mu_0$, and $d$ are known.
We analyze the dataset published in Nissen and Wolski [11] to assess the between-study heterogeneity. In this dataset, there are 42 trials including 15565 diabetes patients who were put on rosiglitazone (treatment group) and 12282 diabetes patients assigned to medication that does not contain rosiglitazone (control group). Note that the interest is in myocardial infarction and death from rosiglitazone as a treatment for diabetes. We use the odds ratio as the treatment effect. The parameters of the model are estimated by a Gibbs sampling algorithm implemented in R. The estimates of the model parameters ($\mu_i$) are given in Table 3.
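The per-study odds ratio used as the effect estimate can be sketched as below. This is only an illustration of the standard log odds ratio and its large-sample standard error; the 0.5 continuity correction for tables with a zero cell is an assumption made here, not a choice documented in the paper, and the function name `log_odds_ratio` is hypothetical.

```python
from math import log, sqrt


def log_odds_ratio(x_t, n_t, x_c, n_c, correction=0.5):
    """Return (log odds ratio, standard error) for one two-arm study."""
    a, b = x_t, n_t - x_t  # events and non-events, treatment arm
    c, d = x_c, n_c - x_c  # events and non-events, control arm
    if 0 in (a, b, c, d):
        # Assumed continuity correction so the ratio stays finite.
        a, b, c, d = (v + correction for v in (a, b, c, d))
    log_or = log(a * d / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se
```

Exponentiating the first returned value gives the odds ratio itself; values below 1 indicate fewer events in the treatment arm than in the control arm.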
We also conducted a simulation study to assess the validity of the approach. Each study was simulated by means of a binomial random variable in which the numbers of cases in the treatment group and the control group are generated as independent binomial random variables. We generate twenty binomial successes using the rbinom random generator. We assume $n = 200$ in each case and fix $p$ at 0.7. This setting is similar to administering a treatment in twenty hospitals with 200 patients in each hospital. Fixing $p$ at 0.7 generates numbers of cases that do not vary much from each other. This is confirmed by the nonsignificance of the chi-square test for heterogeneity. Another set of twenty "number of cases" is generated from the binomial distribution, but this time we induce heterogeneity.
This is done by varying the success probability of each trial. For instance rbinom (1, 200, 0. Note that our interest is in comparing the posterior treatment means of the heterogeneous studies with those of the studies that are not heterogeneous. Table 4 compares the posterior treatment means of 20 studies with heterogeneity to the treatment means of 20 other studies in which there is no heterogeneity. Column 1 gives the posterior treatment means of the nonheterogeneous studies ($\mu_i$), whereas $\mu_i^{*}$ in column 2 gives the posterior treatment means of the heterogeneous studies. The treatment means in column 1 ($\mu_i$) are mostly 0.68 or just slightly below or above it. If the responses are similar, the treatment effects are supposed to be estimates of a common treatment mean. On the other hand, all the treatment means in column 2 ($\mu_i^{*}$) differ from each other.
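The data-generation step of this simulation can be sketched as follows. The paper uses R's rbinom; the Python stand-in below draws each binomial count as a sum of Bernoulli trials. The heterogeneous success probabilities (evenly spaced from 0.3 to 0.9) are hypothetical, since the values used in the paper are not shown, and the sketch covers only the chi-square heterogeneity check, not the posterior treatment means.

```python
import random


def rbinom(rng, size, prob):
    """One binomial draw as a sum of Bernoulli trials (stand-in for R's rbinom)."""
    return sum(rng.random() < prob for _ in range(size))


def homogeneity_chisq(counts, size):
    """Pearson chi-square statistic for equal success probabilities across studies."""
    pooled = sum(counts) / (len(counts) * size)
    expected_var = size * pooled * (1 - pooled)
    return sum((x - size * pooled) ** 2 for x in counts) / expected_var


rng = random.Random(2024)
n, k = 200, 20
# Homogeneous studies: a common success probability of 0.7.
homogeneous = [rbinom(rng, n, 0.7) for _ in range(k)]
# Heterogeneous studies: hypothetical probabilities spread over 0.3 to 0.9.
probs = [0.3 + 0.6 * i / (k - 1) for i in range(k)]
heterogeneous = [rbinom(rng, n, p) for p in probs]

stat_hom = homogeneity_chisq(homogeneous, n)
stat_het = homogeneity_chisq(heterogeneous, n)
critical_19 = 30.144  # 95th percentile of chi-square with k - 1 = 19 df
```

Comparing `stat_hom` and `stat_het` with `critical_19` gives the 5% level test of homogeneity for each set of twenty studies.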

Discussion
We have considered a Bayesian analysis of binary data in testing hypotheses of equivalence. Tests of hypotheses of equivalence are popular in clinical trials, and our approach is relatively simple and easy to perform. A Bayesian formulation was considered for testing hypotheses of equivalence, and we observed that the normal approximation to the Beta posterior can be used for moderately large sample sizes. We also presented a mechanism for estimating missing data in arms. This is useful in situations where data are partially missing in some arms. We also considered a meta-analytic approach for assessing between-study variations.
There are two directions we would like to pursue along the methods discussed in this article. We are interested in enhancing the method to accommodate extra covariates in the model in the presence of multiple outcomes in an arm. The incorporation of covariates makes the Bayes factor inappropriate, and we would like to examine other model selection criteria in place of the Bayes factor.

Appendix
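Let the best estimate of $P$, denoted $P_0$, be the value of $P$ for which the posterior is at its maximum. That is,
$$\left.\frac{d\pi(P \mid x)}{dP}\right|_{P_0} = 0, \qquad \left.\frac{d^{2}\pi(P \mid x)}{dP^{2}}\right|_{P_0} < 0. \tag{A.1}$$
The Taylor series expansion of a function $f(x)$ at $x = x_0$ is
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^{2} + \cdots.$$
Let the log of the posterior distribution be $L(P) = \log(\pi(P \mid X))$.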
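As a hedged sketch of where this set-up leads, the expansion of $L(P)$ about $P_0$ can be carried through to a normal form. This is a reconstruction under the definitions above, not a reproduction of the original derivation; the symbols $a$ and $b$ are introduced here for illustration and denote generic Beta posterior parameters, both assumed greater than 1 so that the mode lies inside $(0, 1)$. The first-order term vanishes because $L'(P_0) = 0$ at the mode.

```latex
\begin{align*}
L(P) &\approx L(P_0) + L'(P_0)(P - P_0) + \tfrac{1}{2} L''(P_0)(P - P_0)^{2}
      = L(P_0) + \tfrac{1}{2} L''(P_0)(P - P_0)^{2}, \\
\pi(P \mid X) &\approx \pi(P_0 \mid X)
      \exp\!\left\{ -\frac{(P - P_0)^{2}}{2\sigma^{2}} \right\},
      \qquad \sigma^{2} = -\frac{1}{L''(P_0)}, \\
\intertext{and, for a $\mathrm{Beta}(a, b)$ posterior with $a, b > 1$,}
L(P) &= (a - 1)\log P + (b - 1)\log(1 - P) + \text{const}, \\
P_0 &= \frac{a - 1}{a + b - 2}, \qquad
\sigma^{2} = \frac{P_0 (1 - P_0)}{a + b - 2}.
\end{align*}
```

Under this reading, the approximating normal density is centered at the posterior mode, and its variance shrinks as $a + b$ grows.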