PLAY-THE-WINNER RULE AND ADAPTIVE DESIGNS OF CLINICAL TRIALS

In another paper, we have argued that the traditional randomized design of clinical trials is ethically infeasible in desperate medical situations and adaptive designs are morally required. We have also argued that in such situations, the appropriate designs must satisfy what we call the "Principle of interchangeability." In this statistics paper, we show that the statistical model of bandit processes satisfies this principle of interchangeability. Moreover, we demonstrate that when such a model is used as an adaptive design, the total regret of successes lost is smaller when compared with simple randomization. We illustrate the results by the simple deterministic play-the-winner design.


1. Introduction.
In our ethics paper (Pullman and Wang [17]) on the ethical necessity to use adaptive designs in clinical research, we have argued that the key consideration in clinical research depends on the nature of the relationship between the clinician/researcher and the patient/subject, instead of on the tension between individual and collective ethics. Our study on the dynamics of the researcher's moral responsibility shows that the process of informed consent is central to this relationship. Randomization is justified as long as the patient/subject understands the nature of the proposed clinical research and the risks involved, and is capable of providing a fully informed consent. The ethical justification for randomization hence lies in the process of informed consent.
However, in desperate medical situations, the patient/subject is justifiably concerned with his/her own physical well-being, and from his/her point of view the physician's dual role as a researcher is irrelevant to the clinical decision that must be made. The patient is no longer capable of comprehending the nature of the proposed clinical research and the risks involved, and his/her capacity to provide a fully informed consent is almost entirely compromised.
As subjects/patients lose their capacity for autonomous choice, it is incumbent upon the researchers to assume greater responsibility for their care and well-being. The design of the clinical research must then be chosen so as to minimize the risk to individual patients. Adaptive clinical trials are designed for this purpose. Moreover, informed consent does not provide the ethical justification for adaptive designs as it does for randomization. Hence adaptive designs are ethically justified and may be morally required in desperate medical situations.
We have argued that the appropriate adaptive design must satisfy what we call the "Principle of interchangeability." Suppose that there are N patients to be treated both in and after the trial, and each patient is treated by one and only one of two treatments. We say that a design satisfies the principle of interchangeability if any two of these N patients are ethically interchangeable. That is, at the point of enrollment in the clinical trial, the intent is to provide the best treatment available to each patient given current information. This principle meets the ethical imperative that clinicians must always endeavour to provide the best overall treatment for each individual patient and so each patient's fate is determined not by the particular design of the trial but instead by the chance and timing of getting the disease.
Randomized trials do not satisfy this principle of interchangeability because a patient in the trial and another patient after the trial are not ethically interchangeable. In Section 2, we demonstrate that the principle of interchangeability is satisfied if we utilize the statistical model of bandit processes as an adaptive design. We also compare this adaptive model with randomization and show that more lives can be saved under adaptive designs. The comparison is illustrated by the simple deterministic play-the-winner design. Finally, we conclude with a discussion on relevant studies and further research directions.
2. Bandit processes and the play-the-winner rule.
For our discussion, assume immediate and dichotomous responses and two treatments A and B with probabilities of success $P_A$ and $P_B$, respectively. We denote by $Z_i$, $i = 1, 2, \ldots, N$, the response from the $i$th patient, which is either 1 for a success or 0 for a failure. As noted by Berry and Eick [3], $N$ depends on the prevalence of the disease and is normally unknown to us.
The principle of interchangeability is satisfied when we maximize the total expected responses from all $N$ patients. Such an objective is particularly desirable from the subjects'/patients' perspective. Following Berry and Eick [3], we maximize
$$W_\pi(P_A,P_B) = E_\pi\left(Z_1 + Z_2 + \cdots + Z_N \mid P_A,P_B\right),$$
conditionally on $P_A$, $P_B$, and $N$, or unconditionally, where $\pi$ is a strategy for allocating treatments to these $N$ patients. Any optimal strategy $\pi^*$ which maximizes $W_\pi(P_A,P_B)$ is characterized by the dynamic programming equation, which states that the current patient is offered the best treatment under the current information, given that all future patients are treated optimally. Such a recursive property of the optimality equation demonstrates the satisfaction of the principle of interchangeability.
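The dynamic programming characterization can be made concrete for a small number of patients. The following is only a minimal sketch of the unconditional case, not the formulation of Berry and Eick [3]: it assumes independent uniform Beta(1, 1) priors on $P_A$ and $P_B$ (an assumption chosen here purely for illustration), and the function name `optimal_expected_successes` is hypothetical.

```python
from functools import lru_cache


def optimal_expected_successes(N: int) -> float:
    """Bayes-optimal expected number of successes among N patients.

    Assumption (not from the paper): independent uniform Beta(1, 1)
    priors on the success probabilities of treatments A and B.
    """

    @lru_cache(maxsize=None)
    def value(sa: int, fa: int, sb: int, fb: int) -> float:
        # State: successes/failures observed so far on A and on B.
        if sa + fa + sb + fb == N:
            return 0.0  # no patients left to treat
        # Posterior mean success probabilities under the uniform priors.
        mean_a = (sa + 1) / (sa + fa + 2)
        mean_b = (sb + 1) / (sb + fb + 2)
        # Optimality equation: immediate expected response plus the value
        # of treating all remaining patients optimally.
        give_a = (mean_a * (1.0 + value(sa + 1, fa, sb, fb))
                  + (1.0 - mean_a) * value(sa, fa + 1, sb, fb))
        give_b = (mean_b * (1.0 + value(sa, fa, sb + 1, fb))
                  + (1.0 - mean_b) * value(sa, fa, sb, fb + 1))
        return max(give_a, give_b)

    return value(0, 0, 0, 0)


if __name__ == "__main__":
    for n_patients in (1, 2, 10, 20):
        print(n_patients, optimal_expected_successes(n_patients))
```

The number of states grows roughly like $N^4$, so this direct recursion is practical only for small $N$. Under these priors, equal randomization yields $N/2$ expected successes, which gives a simple baseline for comparison.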
This optimization problem is a typical bandit problem (Berry and Fristedt [4], Gittins [11], and Presman and Sonin [16]). The fundamental characteristic of an optimal strategy is that it enables us to compromise between the need to gather information about the unknown effectiveness of the treatments in order to provide better treatments in the future and the ethical imperative to maximize the immediate response for the current patient (Berry and Fristedt [4]). By nature such a strategy must be adaptive. Randomization aims only at gathering information and ignores immediate responses. On the other hand, the myopic strategy focuses on immediate responses and ignores information gathering. These are two extreme strategies and are not optimal in general.
Suppose that $n$ ($\le N$) patients are included in the clinical trial, and the remaining $N - n$ patients are treated with the superior treatment identified at the end of the trial; $n$ may be either fixed or random. If $P_A$ and $P_B$ were known, the total expected number of successes would be $N\max\{P_A,P_B\}$. The conditional regret of successes lost for strategy $\pi$ when not knowing the treatment effectiveness $P_A$, $P_B$ is defined as
$$R_\pi(P_A,P_B) = N\max\{P_A,P_B\} - W_\pi(P_A,P_B) = R_{\pi,1}(P_A,P_B) + R_{\pi,2}(P_A,P_B), \tag{2.1}$$
where the after-trial term $R_{\pi,1}(P_A,P_B)$ is governed by $P_A^\pi$ and $P_B^\pi$, respectively the estimated probabilities of success on treatments A and B at the conclusion of the trial.
$R_{\pi,1}(P_A,P_B)$ is the total regret from all patients after the trial and is determined by the trial conclusion. On the other hand, $R_{\pi,2}(P_A,P_B)$ is the total regret from all patients in the trial and is determined by the trial design and treatment allocations. It is ideal to minimize both $R_{\pi,1}(P_A,P_B)$ and $R_{\pi,2}(P_A,P_B)$ simultaneously in order to minimize $R_\pi(P_A,P_B)$.
Compare the performance of randomization $\pi_1$ and an adaptive design $\pi_2$ with respect to $R_\pi(P_A,P_B)$. Conditioning on [...] (2.2)
Fixed-size randomized trials use fixed sample sizes for $n$ and minimize $R_{\pi,1}(P_A,P_B)$ only. In a sequential trial, $n$ is adaptive and is minimized in the sense that the trial is stopped as soon as there is strong evidence to indicate the superiority or inferiority of one treatment. Hence a sequential trial minimizes in general both $R_{\pi,1}(P_A,P_B)$ and $R_{\pi,2}(P_A,P_B)$. This clearly indicates that sequential trials are both ethically and statistically superior to fixed-size randomized trials, especially when there is a large difference between the two treatments.
Let [...]; then the unconditional advantage of adaptive designs over randomization for $R_2$ is [...], which is negative if and only if [...] or [...] (2.10).
In both conditional and unconditional cases, the total regret for all patients in the trial is less if the total probability of using the superior treatment for all patients in the trial is more than $n/2$. This is exactly what is expected through the use of adaptive designs.
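The $n/2$ criterion can be checked numerically. The sketch below assumes that the in-trial regret is $n\max\{P_A,P_B\}$ minus the expected number of in-trial successes, and that the $i$th patient receives treatment A with probability $p_i$. The function name `in_trial_regret` and the example allocation sequence are hypothetical illustrations, not values from the paper.

```python
def in_trial_regret(alloc_probs_a, p_a, p_b):
    """Expected successes lost among trial patients.

    alloc_probs_a[i] is the probability that patient i+1 receives
    treatment A; p_a and p_b are the true success probabilities.
    """
    n = len(alloc_probs_a)
    expected_successes = sum(p * p_a + (1.0 - p) * p_b for p in alloc_probs_a)
    return n * max(p_a, p_b) - expected_successes


if __name__ == "__main__":
    p_a, p_b, n = 0.7, 0.4, 10
    randomized = [0.5] * n                # equal randomization
    adaptive = [0.5] + [0.65] * (n - 1)   # hypothetical adaptive allocation
    r_rand = in_trial_regret(randomized, p_a, p_b)
    r_adapt = in_trial_regret(adaptive, p_a, p_b)
    # Under the stated assumptions the difference equals
    # (P_A - P_B) * (n/2 - sum of p_i).
    identity = (p_a - p_b) * (n / 2 - sum(adaptive))
    print(r_rand, r_adapt, r_adapt - r_rand, identity)
```

With $P_A > P_B$, the adaptive sequence has a smaller in-trial regret exactly when its allocation probabilities sum to more than $n/2$, which is the condition stated in the text.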
It is hypothesized that in general, the probability that the superior treatment is allocated to each patient would be more than 0.5 under the use of an adaptive design. Although its demonstration may be difficult in general, we show it for the deterministic play-the-winner design (Zelen [28]). Under this design, the first patient is allocated to either treatment by a simple randomization. Then the same treatment is applied after a success and the other treatment is used after a failure. Assuming clinical equipoise (Freedman [10]) (i.e., there is genuine uncertainty at the beginning of the trial as to which treatment is superior), the principle of interchangeability is satisfied when the initial patient is enrolled.
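The allocation rule just described is simple enough to simulate directly. The following is a minimal sketch with immediate binary responses; the function name `simulate_play_the_winner`, the seed handling, and the example success probabilities are assumptions made for illustration.

```python
import random


def simulate_play_the_winner(p_a, p_b, n, seed=0):
    """Simulate the deterministic play-the-winner rule for n patients.

    The first patient is randomized between A and B with equal
    probability; afterwards the same treatment is kept after a success
    and the other treatment is used after a failure.
    Returns (allocations, responses).
    """
    rng = random.Random(seed)
    treatment = rng.choice("AB")
    allocations, responses = [], []
    for _ in range(n):
        success_prob = p_a if treatment == "A" else p_b
        success = rng.random() < success_prob
        allocations.append(treatment)
        responses.append(1 if success else 0)
        if not success:
            # Switch treatments after a failure.
            treatment = "B" if treatment == "A" else "A"
    return allocations, responses


if __name__ == "__main__":
    allocs, resps = simulate_play_the_winner(0.7, 0.4, 20000, seed=1)
    print("fraction on A:", allocs.count("A") / len(allocs))
    print("success rate:", sum(resps) / len(resps))
```

In a long run with $P_A > P_B$, the observed fraction of patients on treatment A should exceed one half, in line with the hypothesis stated above.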
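Suppose the deterministic play-the-winner rule is followed, and let $p_n$ denote the probability that the $n$th patient is allocated to treatment A. Let $\Delta = P_A - P_B$ and $K = P_A + P_B$. To avoid triviality, assume that $K \ne 2$. Then $p_1 = 1/2$ and for any integer $n \ge 0$, by mathematical induction,
$$p_{n+1} = \frac{1}{2} + \frac{\Delta\left[1 - (K-1)^n\right]}{2(2-K)}. \tag{2.11}$$
This sequence of probabilities of allocations on treatment A has the following properties.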
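The allocation probabilities can also be computed numerically. Under the rule, a patient receives treatment A either after a success on A or after a failure on B, which gives the recursion $p_{n+1} = p_n P_A + (1 - p_n)(1 - P_B)$ with $p_1 = 1/2$. The sketch below iterates this recursion and compares it with the closed form $p_{n+1} = 1/2 + \Delta[1 - (K-1)^n]/(2(2-K))$ that solves it. Both function names are hypothetical, and the closed form is derived here from the recursion rather than quoted from the paper.

```python
def allocation_probs(p_a, p_b, n):
    """Return [p_1, ..., p_n], the probabilities of allocating treatment A."""
    probs = [0.5]  # the first patient is randomized
    for _ in range(n - 1):
        p = probs[-1]
        # A is used next after a success on A or after a failure on B.
        probs.append(p * p_a + (1.0 - p) * (1.0 - p_b))
    return probs


def closed_form(p_a, p_b, i):
    """Closed-form p_i for i >= 1, assuming P_A + P_B != 2."""
    delta, k = p_a - p_b, p_a + p_b
    return 0.5 + delta * (1.0 - (k - 1.0) ** (i - 1)) / (2.0 * (2.0 - k))


if __name__ == "__main__":
    p_a, p_b = 0.7, 0.4
    for i, p in enumerate(allocation_probs(p_a, p_b, 8), start=1):
        print(i, p, closed_form(p_a, p_b, i))
    # Limiting fraction on A: (1 - P_B) / (2 - K)
    print("limit:", (1.0 - p_b) / (2.0 - (p_a + p_b)))
```

The limit $(1 - P_B)/(2 - K)$ is the fixed point of the recursion; for $P_A = 0.7$ and $P_B = 0.4$ it equals $2/3$.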
(a) $p_{n+1} > 1/2$ for any integer $n \ge 1$ if $P_A > P_B$, $p_{n+1} < 1/2$ for any integer $n \ge 1$ if $P_A < P_B$, and $p_{n+1} = 1/2$ if $P_A = P_B$. Moreover, when $K > 1$, $p_{n+1}$ is increasing in $n$ when $P_A > P_B$, and is decreasing when $P_A < P_B$.
These results have intuitive interpretations. (a), (b), (d), and (e) say that the probability of using the superior treatment is more than 50%. (c) indicates that $p$ is simply the asymptotic fraction of patients on treatment A. Zelen [28] achieved the same result based on a different approach. Wei and Durham [26] observed the same result for the randomized play-the-winner rule. (f) shows that if $P_A$ (or $P_B$) is sufficiently large, treatment A (or B) will eventually be identified as the superior treatment. On the other hand, if there is no difference between the two treatments, we eventually randomize between them.
(g) assumes that $P_B = 1 - P_A$. Suppose that $P_A > 0.5$. If we call it a "success" when treatment A is allocated and a "failure" when B is used, we essentially have a binomial experiment with $P_A$ as the probability of success. The probability of using the superior treatment remains a constant larger than 0.5, and the expected number of successes is then [...]
For $n < N$, we have [...] Then conditioning on $(P_A,P_B)$, [...] For given values of $P_A$ and $P_B$, this upper bound $B$ is decreasing in $n < N$ and is negative for sufficiently large $n$.
when N = 100 is given in Table 2.1.The values become smaller when N is increased.
Therefore, if the majority of all patients are recruited into the trial (i.e., $n/N$ is large), the deterministic play-the-winner design is statistically superior to randomization in the sense that the maximum number of patients in and after the trial are treated successfully. The required proportion $n/N$ becomes smaller for a larger difference between the treatments.

3. Conclusion and discussions.
The deterministic play-the-winner rule has been used in some clinical trials such as Bjerkeset et al. [5], Larsen et al. [14], and Reiertsen et al. [18,19,20]. Since its use in practice may introduce selection bias, it was later generalized to the randomized play-the-winner design (Wei and Durham [26]), which has been used in Bartlett et al. [2] and Tamura et al. [25].
Many simulations have demonstrated the superiority of adaptive designs over randomization. For example, Berry and Eick [3] proposed and examined the objective of optimization which we have discussed here. Based on simulations, they suggest that adaptive designs may be appropriate when the majority of the patients with a particular disease are recruited into the trial. Our statistical arguments are consistent with their observations. Yao and Wei [27] have simulated the AZT trial for reducing the risk of maternal-to-infant HIV transmission (Connor et al. [7], Rosenberger [21]), and concluded that more newborns could have been saved with no jeopardy to the statistical power if the randomized play-the-winner design were used instead of randomization. Day [8] shows that adaptive clinical trials are better than sequential clinical trials, which are in turn better than randomized clinical trials. Simon [24] reports that the gain of adaptive clinical trials over sequential clinical trials is relatively modest, and Hallstrom et al. [12] indicate that the power is largely unaffected by using the play-the-winner rule in a typical chronic disease mortality trial. More recently, Coad and Rosenberger [6] have reported a reduction of failures when the randomized play-the-winner design is combined with a fully sequential triangular test. Also see Flehinger and Louis [9], Louis [15], and Rosenberger and Seshaiyer [23] on adaptive designs for survival trials.
The use of adaptive designs has also been suggested by Hardwick [13], Rosenberger and Lachin [22], and others. Yet adaptive designs have never become part of the mainstream clinical research methodology even though they seem to be desirable in desperate medical situations from an ethical point of view. Pullman and Wang [17] (and references quoted therein) have discussed the many reasons behind this, including ideological resistance.
What kind of adaptive design is both statistically optimal and practically feasible remains an open problem. It is also a major difficulty to develop appropriate methods of statistical inference for adaptive designs. We use the deterministic play-the-winner design for illustration purposes only.
Despite these challenges in both the design and analysis of adaptive clinical trials, we are confident that adaptive designs will remain an important and active research area. After all, this seems to be the kind of contribution we statisticians should be making (Armitage [1]). This area has a bright future because when adaptive trials are properly designed and analyzed, we statisticians may save more lives than the medical doctors.
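[...] $n$, and
$$R_{\pi_1,2}(P_A,P_B) = \frac{n}{2}\left(\max\{P_A,P_B\} - \min\{P_A,P_B\}\right).$$
The conditional advantage of adaptive design $\pi_2$ over randomization $\pi_1$ for $R_2$ is
$$R_{\pi_2,2}(P_A,P_B) - R_{\pi_1,2}(P_A,P_B) = (P_A - P_B)\left(\frac{n}{2} - \sum_{i=1}^{n} p_i\right), \tag{2.3}$$
where $p_i$ is the probability that the $i$th patient is allocated to treatment A under $\pi_2$. To minimize $R_{\pi_2,2}(P_A,P_B)$, it is desired to maximize $\sum_{i=1}^{n} p_i$ when $P_A > P_B$ or to minimize $\sum_{i=1}^{n} p_i$ when $P_A < P_B$.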