Methodology and Application of Adaptive and Sequential Approaches in Contemporary Clinical Trials

The clinical trial, a prospective study to evaluate the effect of interventions in humans under prespecified conditions, is a standard and integral part of modern medicine. Many adaptive and sequential approaches have been proposed for use in clinical trials to allow adaptations or modifications to aspects of a trial after its initiation without undermining the validity and integrity of the trial. The application of adaptive and sequential methods in clinical trials has significantly improved the flexibility, efficiency, therapeutic effect, and validity of trials. To further advance the performance of clinical trials and convey the progress of research on adaptive and sequential methods in clinical trial design, we review significant research that has explored novel adaptive and sequential approaches and their applications in Phase I, II, and III clinical trials and discuss future directions in this field of research.


Statistical Methodology of Phase I Clinical Trials
A Phase I trial is one of the most important steps in a drug's development: it is the first clinical trial in human subjects, conducted after laboratory and animal studies of a therapeutic agent have shown a potential therapeutic effect on the disease. The sample size of a Phase I clinical trial is relatively small, typically between twenty and eighty patients. It is a widely accepted assumption that the therapeutic effect of a drug is linked to its toxicity and increases monotonically with dose: higher doses are associated with both more severe toxicity and a better therapeutic effect. A balance must therefore be struck between toxicity and therapeutic benefit. To achieve the best therapeutic benefit, a patient should be treated, under close monitoring, with the maximum dose at which the associated toxicities are still tolerable. Among all the toxicities patients experience, some are so severe that they limit dose escalation; these are called dose-limiting toxicities (DLTs). In the National Cancer Institute (NCI) Common Toxicity Criteria, a DLT is defined as a grade 3 or higher nonhematologic toxicity or a grade 4 nontransient hematologic toxicity. The grades of all toxicities are classified as below:
The main goals of a Phase I trial are to determine the dose-toxicity relationship of a new therapeutic agent and to estimate the maximum tolerated dose (MTD) of the agent given a specified tolerable toxicity level. The highest acceptable DLT rate is usually defined as the target toxicity level (TTL); the TTL therefore determines the MTD of the new agent. A careful and thoughtful approach to the design of Phase I trials and accurate estimation of the MTD are decisive for the fate of the new drug in subsequent clinical trials.
In a Phase I clinical trial, the well-accepted assumption is that the probability of toxicity increases monotonically with increasing dose; a decrease in the probability of toxicity at high dose levels can occur in some special cases, but these are uncommon and are not considered here. The dose-toxicity relationship can be described nonparametrically or parametrically. In the nonparametric approach, the only assumption is that toxicity is nondecreasing with dose. In the parametric approach, a distribution with a small number of parameters is adopted to model the dose-toxicity curve. From a biological point of view, the human body has stabilization and self-salvage systems that protect against mild toxicity while the dose stays below a certain threshold; once these systems are overwhelmed, the probability of toxicity increases at an accelerating rate, rapidly reaches the worst outcome, death, and then levels off. A sigmoid-shaped distribution is therefore an appropriate model for the relationship between toxicity probability and dose. Many statistical designs have been proposed for Phase I clinical trials; the most commonly used are summarized and compared in Table 1. According to their algorithms, Phase I designs fall into two major categories: rule-based designs and model-based designs [3].
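As a minimal illustration of a parametric sigmoid dose-toxicity curve, the sketch below uses the two-parameter logistic model (the same form that appears later in the description of EWOC) and inverts it to find the dose whose modelled DLT probability equals a target toxicity level. The function names and the parameter values `alpha = -4.0` and `beta = 0.05` (dose in mg) are hypothetical and chosen only for illustration.

```python
import math


def p_dlt(dose: float, alpha: float, beta: float) -> float:
    """Logistic (sigmoid) dose-toxicity curve: logit P(DLT | dose) = alpha + beta * dose."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * dose)))


def dose_at_target(ttl: float, alpha: float, beta: float) -> float:
    """Invert the logistic curve: the dose whose modelled P(DLT) equals ttl.

    Requires 0 < ttl < 1 and beta > 0 (toxicity increasing with dose).
    """
    if not 0.0 < ttl < 1.0 or beta <= 0.0:
        raise ValueError("need 0 < ttl < 1 and beta > 0")
    return (math.log(ttl / (1.0 - ttl)) - alpha) / beta


# Hypothetical parameters (dose in mg), for illustration only
alpha, beta = -4.0, 0.05
for dose in (20, 40, 60, 80, 100):
    print(dose, round(p_dlt(dose, alpha, beta), 3))

mtd = dose_at_target(0.30, alpha, beta)
print("dose with modelled P(DLT) = 0.30:", round(mtd, 1))
```

Under this model the dose at the TTL is a closed-form function of the two parameters, which is why model-based designs concentrate on estimating those parameters well.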

Rule Based Phase I Designs
All rule-based designs follow a sequential approach, and a nondecreasing dose-toxicity relationship is the only assumption they require. Rule-based designs are therefore well suited to first-in-human clinical trials, in which the dose-toxicity relationship is not well understood. Common rule-based designs include the 3+3 design [4], the isotonic design [5], and the accelerated titration design [6]. The 3+3 designs are rule-based up-and-down methods used in the Phase I protocol templates of the Cancer Therapy Evaluation Program (CTEP), whose mission is to improve the lives of cancer patients by sponsoring clinical trials that evaluate new anticancer agents, with a particular emphasis on translational research to elucidate molecular targets and mechanisms of drug effects. Although 3+3 designs have become standard practice among many Phase I clinical trialists, they were not designed to produce accurate estimates of a target quantile. Rather, they are designed to screen drugs quickly and identify a dose level that does not exhibit too much toxicity in a very small group of patients. The 3+3 designs fall into two categories: without dose de-escalation (Figure 1) and with dose de-escalation (Figure 2). In the 3+3 design without dose de-escalation, three patients are assigned to the first dose level. If no DLT is observed, the trial proceeds to the next dose level and another cohort of three patients is enrolled. If at least two of the three patients experience a DLT, the previous dose level is taken as the MTD; if exactly one patient experiences a DLT, three additional patients are enrolled at the same dose level. If at least one of the three additional patients experiences a DLT, the previous dose level is taken as the MTD; otherwise, the dose is escalated.
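The escalation rules of the 3+3 design without de-escalation can be simulated directly. The sketch below is a minimal Monte Carlo illustration, assuming a hypothetical vector `true_p` of true DLT probabilities per dose level; the convention that the highest level is returned when every level is cleared is an assumption made here, since the text does not specify that case.

```python
import random
from collections import Counter


def three_plus_three(true_p, rng):
    """Simulate one 3+3 trial without dose de-escalation, following the rules above.

    true_p: assumed true DLT probability at each dose level (lowest first).
    Returns (mtd_index, n_patients); mtd_index is -1 if the first dose is too toxic.
    If every level is cleared, the highest level is returned (a convention assumed here).
    """
    n = 0
    for level, p in enumerate(true_p):
        dlts = sum(rng.random() < p for _ in range(3))
        n += 3
        if dlts >= 2:
            return level - 1, n          # >= 2 of 3 DLTs: previous level is the MTD
        if dlts == 1:
            extra = sum(rng.random() < p for _ in range(3))
            n += 3
            if extra >= 1:
                return level - 1, n      # >= 2 of 6 DLTs: previous level is the MTD
        # 0 of 3 or 1 of 6 DLTs: escalate to the next level
    return len(true_p) - 1, n


rng = random.Random(2024)
true_p = [0.05, 0.10, 0.20, 0.35, 0.50]  # hypothetical scenario
results = [three_plus_three(true_p, rng) for _ in range(5000)]
freq = Counter(mtd for mtd, _ in results)
for level in range(-1, len(true_p)):
    print(f"MTD = level {level}: {freq[level] / len(results):.3f}")
print("mean sample size:", sum(n for _, n in results) / len(results))
```

Running such a simulation over several scenarios is one way to see the tendency, noted below, of designs that approach the MTD from below to select a dose lower than the target quantile.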
The 3+3 design with dose de-escalation allows three new patients to be treated at the previous dose level if only three patients were treated at that level before. Dose reduction continues until a dose level is reached at which six patients have been treated and at most one DLT has been observed among them. The MTD is defined as the highest dose level at which at most one of six patients experiences a DLT, with at least two patients experiencing DLTs at the next higher dose level. If the first dose is not tolerable, the MTD cannot be established within the confines of the study. Hence, the MTD is identified from the data and is a statistic rather than a parameter. Storer (1989) was probably the first to examine the characteristics of the 3+3 design from a statistician's standpoint [7], and the operating characteristics of the 3+3 design were discussed by Lin and Shih (2001) [4]. Note that any design whose sampling is asymmetric about the MTD will yield a biased result; thus the standard design, and all other designs that approach the MTD from below, will tend to underestimate the MTD. The 3+3 designs are simple, can usually determine a reasonable MTD, and are thus the most widely used methods for Phase I clinical trials. But they also have many shortcomings: they are not designed around a quantile of interest, not all toxicity data are used to determine the MTD, and the resulting MTD does not correspond to any particular probability of toxicity. These disadvantages led to the exploration of isotonic designs for Phase I clinical trials. Leung and Wang (2001) were the first to introduce a semiparametric Phase I design, called the isotonic design, in which a nondecreasing dose-toxicity relationship is the only required assumption [5].
In the isotonic design, the pool-adjacent-violators algorithm (PAVA) and isotonic regression are used to update the estimated probability of DLT at each dose level after the toxicity responses of each newly treated cohort have been observed. Each new cohort is treated at the dose level whose estimated probability of DLT is closest to the prespecified target toxicity level. The trial stops when the same dose has been tested in a certain number of consecutive cohorts or when a maximum number of patients have been treated. After the trial stops, the dose that would be recommended for the next cohort on the basis of all completed data is taken as the MTD. In simulation studies, the isotonic design performed substantially better than the 3+3 design and comparably to the continual reassessment method (CRM) [8], Storer's up-and-down designs, and the escalation with overdose control (EWOC) design [9]. Moreover, the isotonic design is model-free and is especially appropriate when the parametric dose-toxicity relationship is not well understood.
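The core update of the isotonic design, fitting a nondecreasing DLT curve with PAVA and choosing the dose closest to the target, can be sketched as follows. This is a simplified illustration, not Leung and Wang's full procedure: only doses that have already been tried are considered, the stopping rules are omitted, and ties are broken toward the lower dose as an assumption of this sketch. The DLT counts in the example are hypothetical.

```python
def pava(rates, weights):
    """Weighted pool-adjacent-violators algorithm.

    Returns the nondecreasing sequence closest (in weighted least squares) to `rates`.
    All weights must be positive, i.e. only doses that have been tried.
    """
    blocks = []  # each block: [pooled mean, pooled weight, number of doses pooled]
    for r, w in zip(rates, weights):
        blocks.append([r, w, 1])
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def next_dose(dlts, treated, target):
    """Pick the tried dose whose isotonic DLT estimate is closest to the target.

    Ties go to the lower dose (a conservative choice assumed here).
    """
    rates = [d / n for d, n in zip(dlts, treated)]
    fitted = pava(rates, treated)
    best = min(range(len(fitted)), key=lambda i: (abs(fitted[i] - target), i))
    return best, fitted


# Hypothetical data: DLTs / patients at four tried dose levels
level, fitted = next_dose(dlts=[0, 1, 0, 2], treated=[3, 3, 3, 3], target=0.30)
print(level, [round(f, 3) for f in fitted])
```

Here the raw rates 0, 1/3, 0, 2/3 violate monotonicity at the third dose, so PAVA pools the second and third doses into a common estimate of 1/6.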

Journal of Probability and Statistics
There are many other rule-based designs. All rule-based designs can estimate a reasonable MTD using a stopping rule based either on observed DLTs or on convergence criteria. Additional dose levels can also be added ad hoc when needed without affecting their robustness. Most rule-based designs are simple in practice and easy to implement. At present, 3+3 designs are still the most popular designs in Phase I clinical trials.

Model Based Designs
In model-based designs, three parametric dose-toxicity functions are usually employed to describe the relationship between dose and toxicity: the logistic model, the hyperbolic model, and the power function. Model-based designs often fail to find an MTD based on observed DLTs in first-in-human studies. The most common model-based designs are the CRM and EWOC, whose algorithms are illustrated in Figure 3. O'Quigley et al. (1990) originally introduced the CRM, a Bayesian approach that fully and efficiently uses all data and prior information available in a Phase I study [8]. As in rule-based designs, a TTL, Γ, is specified, and the goal is to estimate the dose associated with it. Implementing the CRM requires a parametric model of the dose-toxicity relationship and a prior distribution for each unknown parameter of the model. The posterior mean of each parameter is computed from its prior and all available toxicity data, and is then used to estimate the probability of toxicity, P(DLT), at each dose level. This computation is repeated, updating P(DLT) at each dose level with the accumulated toxicity data, whenever a new patient is recruited. The main idea of the CRM is to treat each patient at the dose level whose P(DLT) is closest to Γ. The MTD is defined as the dose level of the last patient treated in the trial. The originally proposed CRM uses a one-parameter dose-toxicity model and cohorts of a single patient. Furthermore, in the original CRM the first patient is treated at a dose level determined purely by a guess, which makes the method impractical. Therefore, Korn et al. (1994) proposed a modified CRM in which the trial starts at the lowest dose level, no dose level can be skipped during dose escalation, and the trial stops when the same dose has been recommended for a fixed number of consecutive new patients [10].
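The CRM update described above can be sketched with a one-parameter model and numerical integration over a grid. The sketch assumes the commonly used power ("empiric") model P(DLT at level i) = skeleton_i^exp(a) with a normal prior on a; the skeleton values, the prior standard deviation of 1.16 (variance about 1.34), and the function name `crm_update` are illustrative assumptions, not necessarily the exact choices of O'Quigley et al. As in the text, the posterior mean of the parameter is plugged into the model to estimate P(DLT) at each level.

```python
import math


def crm_update(skeleton, doses_given, dlts, target, prior_sd=1.16, grid_n=2001):
    """One-parameter power-model CRM update by numerical integration on a grid.

    Model (assumed here): P(DLT at level i) = skeleton[i] ** exp(a), prior a ~ N(0, prior_sd^2).
    skeleton values must lie strictly between 0 and 1.
    doses_given: dose-level index of each treated patient; dlts: 1 if that patient had a DLT.
    Returns (posterior mean of a, plug-in P(DLT) per level, recommended level).
    """
    lo = -6.0 * prior_sd
    step = 12.0 * prior_sd / (grid_n - 1)
    grid = [lo + g * step for g in range(grid_n)]
    log_post = []
    for a in grid:
        lp = -0.5 * (a / prior_sd) ** 2                    # normal prior, up to a constant
        for level, y in zip(doses_given, dlts):
            log_p = math.exp(a) * math.log(skeleton[level])  # log P(DLT)
            lp += log_p if y else math.log(-math.expm1(log_p))  # log(1 - P(DLT))
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]    # unnormalized posterior
    a_hat = sum(a * w for a, w in zip(grid, weights)) / sum(weights)
    p_hat = [s ** math.exp(a_hat) for s in skeleton]
    level = min(range(len(skeleton)), key=lambda i: abs(p_hat[i] - target))
    return a_hat, p_hat, level


skeleton = [0.05, 0.10, 0.20, 0.30, 0.45]   # hypothetical prior guesses of P(DLT)
a_hat, p_hat, level = crm_update(skeleton,
                                 doses_given=[0, 0, 0, 1, 1, 1, 2, 2, 2],
                                 dlts=[0, 0, 0, 0, 0, 0, 0, 1, 0],
                                 target=0.25)
print(round(a_hat, 3), [round(p, 3) for p in p_hat], "next level:", level)
```

A full trial would loop this update, adding safety rules such as Korn et al.'s start at the lowest level and no skipping of untried levels.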
However, patients may still be treated at excessively toxic doses in the modified CRM because of its single-patient cohorts, and the study is still very long because the toxicity outcomes of all treated patients must be observed before the dose level for the next patient can be calculated. In addition to the modifications of Korn et al. (1994) [10], Faries (1994) [11] added to his modified CRM a rule that no dose escalation is allowed for the next patient when the last patient has experienced a DLT. This rule helps avoid treating patients at overly toxic doses compared with the traditional 3+3 design. To address the ethical requirement that the probability of a patient being treated at an overdose stay below a prespecified value, an adaptive dose escalation scheme called EWOC was introduced [9].
EWOC's constraint on overdosing is an advantage over the CRM, and its theoretical foundation was further elaborated by Zacks et al. (1998) [12]. A two-parameter model, $\operatorname{logit} P(\mathrm{DLT} \mid x_i) = \alpha + \beta x_i$, is first used to describe the relationship between dose $x_i$ and DLT, and the joint posterior for $\alpha$ and $\beta$ is then transformed into a joint posterior for the MTD and the probability of DLT at the lowest dose level, $\rho_0$. In addition to controlling overdosing, EWOC is designed to approach the MTD rapidly; it starts from the lowest dose level and uses a single patient per cohort. After the toxicity response of the last enrolled patient has been observed, the joint posterior for the MTD and $\rho_0$ is updated using all available information, and the next patient is treated at the 25th percentile of the marginal posterior distribution of the MTD. The trial stops after a fixed number of patients have been treated, and the MTD is then estimated either as its posterior mean or by minimizing the posterior expected loss under a specified loss function. To improve safety and shorten the trial, EWOC can also be implemented so that no dose level is skipped during escalation and cohorts of multiple patients are used. In simulation studies, EWOC has been shown to be effective in controlling overdosing and to estimate the MTD with accuracy comparable to that of the CRM. Compared with some other nonparametric designs, such as four up-and-down designs and two stochastic approximation methods, EWOC treats fewer patients at nonoptimal dose levels, resulting in fewer DLTs, and its estimated MTD has a smaller average bias and mean squared error [9]. EWOC therefore appears to be a promising alternative design for Phase I clinical trials, especially when the ethical and safety requirement of overdose control is a particular concern.
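The EWOC dose assignment rule, treating the next patient at a low quantile of the posterior distribution of the MTD, can be sketched on a simple grid posterior. This is a simplified illustration under stated assumptions: flat priors on a bounded, hypothetical grid of (α, β) with β > 0 (the original EWOC formulation places its prior on the MTD and ρ0 instead), a target toxicity level `theta = 1/3`, a feasibility quantile of 0.25, and discrete dose levels with a no-skipping rule. The dose levels and outcomes in the example are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def ewoc_next_dose(doses, x_obs, y_obs, theta=1 / 3, feasibility=0.25):
    """Simplified EWOC step on a grid posterior for (alpha, beta).

    Assumed model: logit P(DLT | x) = alpha + beta * x with beta > 0 and flat priors
    on a bounded grid. theta is the target toxicity level. The next dose is the largest
    available dose not above the `feasibility` quantile of the posterior of the MTD,
    and at most one level above the highest dose tried (no skipping).
    """
    alphas = [-8.0 + 0.1 * i for i in range(81)]    # hypothetical grid for alpha
    betas = [0.0025 * j for j in range(1, 61)]      # beta in (0, 0.15]
    logit_theta = math.log(theta / (1.0 - theta))
    points = []  # (MTD value implied by (alpha, beta), log posterior)
    for a in alphas:
        for b in betas:
            ll = sum(log_sigmoid(a + b * x) if y else log_sigmoid(-(a + b * x))
                     for x, y in zip(x_obs, y_obs))
            points.append(((logit_theta - a) / b, ll))
    top = max(lp for _, lp in points)
    weighted = sorted((mtd, math.exp(lp - top)) for mtd, lp in points)
    total = sum(w for _, w in weighted)
    cum, q = 0.0, weighted[-1][0]
    for mtd, w in weighted:                          # weighted posterior quantile
        cum += w
        if cum >= feasibility * total:
            q = mtd
            break
    tried = [i for i, d in enumerate(doses) if d in x_obs]
    max_level = min(max(tried) + 1, len(doses) - 1) if tried else 0
    allowed = [d for d in doses[: max_level + 1] if d <= q]
    return (allowed[-1] if allowed else doses[0]), q


doses = [10, 20, 40, 60, 80, 100]   # hypothetical dose levels (mg)
dose, q25 = ewoc_next_dose(doses, x_obs=[10, 10, 20, 20, 40, 40], y_obs=[0, 0, 0, 0, 0, 1])
print("next dose:", dose, " posterior 25th percentile of MTD:", round(q25, 1))
```

Using a quantile below the median is what limits the posterior probability that a patient receives a dose above the MTD to about the chosen feasibility level.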
Both the CRM and EWOC are adaptive dose-finding designs, in which a Bayesian approach is usually employed and the dose level for each new cohort is adapted to the toxicity responses of the patients previously treated in the ongoing trial. Another adaptive dose-finding design is the nonparametric adaptive urn design for estimating a dose-response curve [13].
All rule-based designs are robust and simple to implement and usually give a reasonable MTD under their rules. Applying a model such as isotonic regression to the data can improve the accuracy of the MTD. Model-based designs require a parametric model of the dose-toxicity relationship and, when their assumptions are satisfied, may greatly improve the probability of identifying the correct MTD compared with rule-based designs. However, model-based designs are not robust and should not be used unless their underlying assumptions can be met with confidence. The accuracy of the estimated MTD depends substantially on the number of observed DLTs, and the sample size is also an important factor. Overall, different designs, whether rule-based or model-based, usually perform similarly when they have similar sample sizes and aggressiveness. Thus, simple designs, especially the standard designs, are still very popular in Phase I clinical trial practice.
The design of Phase I clinical trials can involve one or two stages, and rule-based or model-based designs can be implemented in each stage of a two-stage design. Other critical issues in Phase I clinical trial design include the operating characteristics of the 3+3 design in terms of the expected toxicity level [14], two-stage or multistage Phase I designs, within-patient dose escalation, late toxicity, combinations of multiple agents, the balance between toxicity and efficacy, individual MTDs, and full utilization of all toxicity information [15, 16]. Outstanding research has been conducted on these topics; it is not elaborated on here because of space constraints but has been described in several comprehensive review articles [3, 17-19].

Statistical Methodology of Phase II Clinical Trials
After the safety and MTD of an experimental drug have been established in a Phase I clinical trial, the drug enters Phase II clinical trials, which provide an initial evaluation of its therapeutic effects at the recommended MTD. Phase II trials are sometimes further classified as Phase IIa and IIb studies: Phase IIa trials screen a promising novel agent for significant antidisease activity, and Phase IIb trials focus on the drug's improved therapeutic effectiveness over the standard treatment. Phase II studies provide critical information for deciding whether further testing of the experimental drug in a large confirmatory Phase III trial is warranted. The surrogate endpoint used in a Phase II clinical trial needs to be obtainable in a short time and should be able to assess the treatment's primary benefit. In cancer trials, the experimental drug's antitumor activity and the progression-free survival (PFS) of treated patients are often used as surrogates of efficacy. Antitumor activity is measured as the clinical response within a short period after treatment and is classified as complete response (CR), partial response (PR), progressive disease (PD), or stable disease (SD). PFS, defined as the time from the start of treatment to disease progression or death, resembles overall survival, the outcome of the subsequent Phase III clinical trial, and is also widely used when it can be measured in a short time.

Single Arm Phase II Designs
The most commonly used Phase II clinical trial designs are summarized in Table 2. A Phase II trial can involve either a single arm, in which the new treatment is compared with the standard response rate reported in historical data, or two or more arms, with patients randomized among the treatments. In a single-arm Phase II trial, two-stage or multistage designs may be used to improve efficiency and save resources by terminating a futile trial early. The interim analysis between consecutive stages examines the accumulated data and decides whether the trial should stop because of early evidence of futility or continue to the next stage. The earliest two-stage Phase II design was proposed by Gehan in 1961 [20]: the trial is terminated for futility if none of the patients enrolled in the first stage respond; otherwise it continues to the second stage, in which additional patients are enrolled to estimate the response rate more accurately. This design provides interim monitoring and can rule out ineffective drugs with a minimal sample size. However, it is only appropriate for binary outcomes, which differ from the overall survival endpoint used in the subsequent Phase III trial. Moreover, this design performs no statistical test on agents showing some promise and is not optimized. Simon (1989) therefore proposed an optimized two-stage Phase II design that controls both the type I and type II error rates while optimizing the sample sizes of both stages [21]. This design can quickly screen out ineffective agents while testing further those that show some promise. The design has two variants, optimal and minimax. The optimal design minimizes the expected sample size under the null hypothesis, which accounts for the probability that the trial stops after the first stage, so it is appropriate for experimental drugs with a high probability of failing after the first stage.
The minimax design minimizes the maximum sample size, that is, the total sample size when the trial proceeds through both stages, so it is better suited to highly promising experimental drugs. As with Gehan's design, Simon's two-stage designs are only appropriate for binary outcomes.
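The quantities that Gehan's and Simon's designs are built on are simple binomial calculations. The sketch below computes (i) Gehan's first-stage size, the smallest n for which zero responses would have probability at most β if the true response rate were p1, and (ii) the operating characteristics of a given Simon two-stage design: the probability of early termination (PET), the expected sample size, and the probability of declaring the drug promising. The design r1/n1 = 1/10, r/n = 5/29 is, to our recollection, Simon's optimal design for p0 = 0.10, p1 = 0.30, α = 0.05, β = 0.20; treat it as illustrative and check it against the published tables. The search that selects an optimal or minimax design is not shown.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def gehan_first_stage(p1, beta=0.05):
    """Smallest n with P(0 responses | p1) = (1 - p1)^n <= beta."""
    n = 1
    while (1 - p1) ** n > beta:
        n += 1
    return n


def simon_oc(r1, n1, r, n, p):
    """Operating characteristics of a Simon two-stage design at true response rate p.

    Stop after stage 1 if responses <= r1; otherwise treat n - n1 more patients and
    declare the drug promising if total responses > r.
    Returns (PET, expected sample size, probability of declaring the drug promising).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r + 1 - x1, 0)  # stage-2 responses needed for a total above r
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    return pet, n1 + (1 - pet) * n2, promising


print("Gehan first stage (p1 = 0.20):", gehan_first_stage(0.20))
pet0, en0, alpha = simon_oc(1, 10, 5, 29, p=0.10)   # under H0: p = p0
_, _, power = simon_oc(1, 10, 5, 29, p=0.30)        # under H1: p = p1
print(f"PET={pet0:.3f}  EN={en0:.1f}  type I={alpha:.3f}  power={power:.3f}")
```

An optimal design search would enumerate (r1, n1, r, n), keep those meeting the α and β constraints, and minimize the expected sample size under H0; a minimax search would minimize n instead.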
Other investigators have proposed conducting multiple interim analyses in Phase II clinical trials by using multiple stages. For example, Fleming (1982) [22] and Chang et al. (1987) [23] studied multiple testing and group sequential methods for Phase II trial designs. However, the inflation of the overall type I error needs to be considered in these kinds of Phase II designs. Another major group of single-arm Phase II designs comprises Bayesian Phase II designs. For example, Thall and Simon (1994) [24] proposed a Bayesian Phase II design that examines the results continuously after each newly enrolled patient and determines whether the trial can stop with a firm decision on the efficacy of the experimental drug or should continue to enroll patients until enough data are available to make a decision. Lee and Liu (2008) [25] proposed a Bayesian approach called the predictive probability Phase II design. This design provides a flexible monitoring schedule that makes Phase II clinical trials more efficient and robust, but at the cost of intensive computation and heavy reliance on the statistician during the trial. Yin et al. (2011) further combined predictive probability monitoring with adaptive randomization in a randomized Phase II trial and extensively compared this hybrid Bayesian approach with group sequential methods [26].
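The predictive probability idea can be sketched for a single-arm trial with a binary response and a beta prior: given the interim data, average over the beta-binomial distribution of future responses and count the outcomes under which the final posterior would declare success. This is a minimal sketch under stated assumptions, not Lee and Liu's exact implementation: the prior Beta(1, 1), the success criterion P(p > p0 | data) > 0.95, the response rate p0 = 0.20, and the maximum of 40 patients are all hypothetical, and the prior parameters must be integers so that the beta tail probability can be computed with a binomial identity from the standard library alone.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_sf_int(p0, a, b):
    """P(p > p0) for p ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(p <= p0) = P(Binomial(a + b - 1, p0) >= a).
    """
    m = a + b - 1
    return sum(comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(a))


def predictive_probability(x, n, n_max, p0, theta_t=0.95, a0=1, b0=1):
    """Predictive probability that the trial will end in success.

    x responses in n patients so far; n_max is the maximum sample size; success at the
    end means P(p > p0 | all data) > theta_t. The prior Beta(a0, b0) must have integer
    parameters in this sketch.
    """
    m = n_max - n                      # patients still to be enrolled
    a, b = a0 + x, b0 + n - x          # current posterior Beta(a, b)
    pp = 0.0
    for y in range(m + 1):
        # Beta-binomial probability of y further responses among m patients
        w = comb(m, y) * exp(log_beta(a + y, b + m - y) - log_beta(a, b))
        if beta_sf_int(p0, a + y, b + m - y) > theta_t:
            pp += w
    return pp


# Hypothetical monitoring: p0 = 0.20, maximum of 40 patients, interim look after 20
for x in (2, 4, 6, 8):
    print(x, round(predictive_probability(x, 20, 40, p0=0.20), 3))
```

In a design of this type, the trial might stop for futility when the predictive probability falls below a small threshold (0.05, say) and continue otherwise; such thresholds are design choices that would be calibrated by simulation.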

Two or More Arm Phase II Designs
Some Phase II clinical trials have two arms, and randomization is frequently used to generate a reliable concurrent control arm and to reduce bias. Such a randomized Phase II trial is more similar to a Phase III trial. Randomized Phase II trials may reduce the so-called trial effect, which often arises from differences in patient populations, physician preferences, and medical environments between the current and previous studies. However, the sample size, trial length, and cost increase about fourfold compared with a single-arm design.
There are several multiple-arm Phase II designs [27]. In the Phase II "pick the winner" design, each experimental regimen is compared with a historical control; no formal statistical comparison between the arms is conducted, and the arm with the best observed result is declared the winner of the trial. This design provides an efficient and effective way of comparing two or more experimental regimens but is not appropriate for evaluating the addition of an experimental agent to a standard regimen.
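Because the "pick the winner" design simply selects the arm with the best observed result, its key operating characteristic is the probability of selecting the truly best arm. The sketch below estimates it by Monte Carlo for a binary response, assuming hypothetical true response rates, a common number of patients per arm, and random tie-breaking (an assumption of this sketch).

```python
import random


def prob_correct_selection(p, n, reps=5000, seed=2024):
    """Monte Carlo probability that a pick-the-winner trial selects the truly best arm.

    p: assumed true response rate of each arm; n: patients per arm.
    The arm with the most observed responses wins; ties are broken at random.
    """
    rng = random.Random(seed)
    best = max(range(len(p)), key=lambda i: p[i])
    hits = 0
    for _ in range(reps):
        counts = [sum(rng.random() < pi for _ in range(n)) for pi in p]
        top = max(counts)
        winners = [i for i, c in enumerate(counts) if c == top]
        hits += rng.choice(winners) == best
    return hits / reps


# Hypothetical scenario: two arms at 20% and one at 35% response
for n in (10, 20, 40):
    print(n, prob_correct_selection([0.20, 0.20, 0.35], n))
```

This is the sense in which the design is efficient: it only needs enough patients per arm to make a correct ranking likely, not enough to test a formal hypothesis between arms.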
Phase II screening design is another Phase II design with multiple arms in which all experimental arms are compared with the standard treatment arm and all the experimental arms beating the standard treatment arm are winners. Therefore this design limits the sample size required for a randomized Phase II comparison and it is appropriate for testing the effect of adding an experimental agent to a standard regimen. However, it provides no statistical comparison between the selected winning arms.
Some investigators have proposed a novel Phase II randomized discontinuation design in which all patients receive the same treatment for a period of time and those with stable disease are randomized to continue or discontinue. This design is particularly appropriate when the treatment is known to have better therapeutic effects and it is ethical for all participants to benefit from it, or when the potential subgroup of patients who can benefit from the treatment is unknown before receiving it. However, this design requires a large number of patients to be treated with a treatment not effective for them. Therefore this design has specific applications but is not widely used.
Conventionally, Phase II and III trials are conducted separately and sequentially, and only an experimental drug that has successfully passed a Phase II trial can enter a Phase III trial. The resulting gap and time lag between trials may be unnecessary under certain circumstances. A seamless Phase II/III design has therefore been proposed, which uses Phase II data in the Phase III trial and minimizes the delay in starting the Phase III study [28, 29]. The Phase II part is usually a randomized Phase II trial with a concurrent control. This nonstop Phase II/III design is particularly useful for new drugs showing efficacy. However, it usually requires a large sample size, and a Phase III infrastructure must be developed even if the trial stops early.

Other Advanced Topics in Phase II Designs
Categorical tumor response has been the most common endpoint in Phase II clinical trial designs. From a statistical standpoint, however, categorizing the continuous percentage change in tumor size into a four-level categorical tumor response loses study power because not all available data are fully used. Several publications have extensively studied the direct use of continuous tumor shrinkage as the primary endpoint for measuring drug efficacy in Phase II clinical trials [30-32]. The success rate of Phase III oncology trials remains very low (e.g., 50-60%) despite the success demonstrated in the preceding Phase II trials [30]. The relationship between tumor response or percentage tumor shrinkage and overall survival, the gold standard for drug efficacy, has been revisited [33]. PFS has the advantage of a short follow-up time [34] and has been reported to be the best surrogate for overall survival [35], so PFS is recommended over categorical tumor response as the primary endpoint in Phase II clinical trials when feasible.

Statistical Methodology of Phase III Clinical Trials
If an experimental agent exhibits adequate short-term therapeutic effects in a Phase II trial, the drug moves forward to a Phase III study for confirmatory testing of its long-term effectiveness. The typical endpoint in a Phase III trial is a time-to-event measure, such as progression-free survival or overall survival. Phase III trials are large in terms of sample size, resources, effort, and cost. This phase collects a large amount of data over a long follow-up period to evaluate the ultimate therapeutic effect of a new drug. The design of Phase III clinical trials has therefore become a very important research field aimed at improving the performance of these critical trials. The most commonly used Phase III clinical trial designs are summarized in Table 3.

Randomization
The earliest Phase III clinical trial design is a single-arm study using historical controls from the literature, existing databases, or medical charts. This kind of Phase III design addresses ethical concerns and can increase enrollment, because patients are assured of receiving the new therapy. In addition, such trials are shorter and less costly, making this design a good choice for the initial testing of new treatments, or when the disease diagnosis is clearly established, the prognosis is well known, or the disease is highly fatal. This design, however, provides no comparison with concurrent control data and is vulnerable to bias because disease and mortality rates change over time; literature controls are particularly poor. Phase III trials conducted with this design tend to exaggerate the value of a new treatment. To avoid bias and eliminate time trends, a nonrandomized design with concurrent controls was then proposed and implemented for Phase III clinical trials. In this design, treatment assignment is not randomized: it is easy to select a group to receive the intervention and then select controls matching key characteristics. This design can therefore reduce costs and is relatively simple and readily acceptable to both investigators and participants. However, the intervention and control groups may not be comparable because of selection bias and differences between the group populations. Comparability is difficult to demonstrate because it is impractical to collect information on all important prognostic factors and to match on several factors at once, and the influence of unknown or unmeasured factors, even in large studies, remains uncertain. Post hoc covariance analysis is not adequate for offsetting the imbalance between groups.
To eliminate bias, facilitate the masking of treatments, and permit the use of statistical theory, randomization has been widely employed in Phase III clinical trials [36]. There are two major types of randomization approaches: nonadaptive and adaptive.

Table 3 (partial)
Randomized controlled trial (RCT). Disadvantages: subjects may not represent the general patient population; increased sample size and cost; acceptability of the randomization process; administrative complexity.
Sequential RCT design. Advantages: continues to randomize subjects until the null hypothesis is either rejected or "accepted"; good for acute responses, paired subjects, and continuous testing; good for one-time dichotomous decisions such as regulatory approval. Disadvantages: multiple testing inflates the type I error; inhibits adaptation because all possible study outcomes must be prespecified.
Bayesian RCT design. Advantages: dynamic learning (adaptive) feature; incorporates external evidence; can add new interventions and drop less effective ones without restarting the trial; improves the timeliness and clinical relevance of trial results; lowest sample size and cost. Disadvantages: may be criticized as too subjective, not well planned, or too complicated.

Simple randomization, block randomization, and stratified randomization are nonadaptive. Simple randomization is robust against both selection and accidental bias and is appropriate for RCTs with more than 200 subjects, because group sizes may be imbalanced in small RCTs [37]. Block randomization guarantees balanced group sizes by prespecifying the block size and allocation ratio and allocating subjects randomly within each block [33]. Block randomization is often combined with stratified randomization in small RCTs. There are several adaptive randomization approaches: adaptive biased-coin, covariate-adaptive, and response-adaptive randomization [33]. Adaptive biased-coin randomization reduces group-size imbalance and is less affected by selection bias than permuted-block randomization; it does so by decreasing the probability of assignment to the overrepresented group and increasing the probability of assignment to the underrepresented group. Covariate-adaptive randomization produces groups that are balanced with respect to several covariates. The most common covariate-adaptive randomization approaches are Taves's method [38], the Pocock and Simon method [39], and Frane's method [40], the last of which handles both continuous and categorical covariates. Overall, covariate-adaptive randomization can reduce imbalance further and handle more covariates simultaneously than the combination of block and stratified randomization [41]. Randomization can also be adapted to responses or outcomes in order to increase the therapeutic benefit of the trial, taking ethical considerations into account. Response-adaptive randomization assigns more patients to the better treatment by skewing the probability of assigning new patients toward the group showing a favorable response as trial data accumulate, while maintaining a certain study power [41].
The most common approaches to response-adaptive randomization are the urn model, the biased coin design, and Bayesian approaches [34]. Each randomization approach has its own merits and limitations, and the choice of randomization method depends on the specific purpose of the study.
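Three of the randomization mechanisms described above can be contrasted in a few lines: permuted-block randomization (nonadaptive), an adaptive biased coin, and an urn model for response-adaptive allocation. The sketch assumes two arms labelled "A" and "B"; it uses Efron's biased coin with p = 2/3 as the biased-coin example and the randomized play-the-winner urn as the urn example. The block size, coin bias, urn start, and success probabilities are illustrative choices, not prescriptions from the text.

```python
import random
from collections import Counter


def permuted_blocks(n, rng, block_size=4, arms=("A", "B")):
    """Block randomization: each block holds equal numbers of each arm, shuffled."""
    if block_size % len(arms):
        raise ValueError("block_size must be a multiple of the number of arms")
    seq = []
    while len(seq) < n:
        block = [arm for arm in arms for _ in range(block_size // len(arms))]
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]


def efron_biased_coin(n, rng, p=2 / 3):
    """Efron's biased coin: favor the under-represented arm with probability p."""
    seq, diff = [], 0                      # diff = (#A - #B) so far
    for _ in range(n):
        prob_a = 0.5 if diff == 0 else (p if diff < 0 else 1 - p)
        arm = "A" if rng.random() < prob_a else "B"
        diff += 1 if arm == "A" else -1
        seq.append(arm)
    return seq


def randomized_play_the_winner(n, rng, p_success, start=1):
    """Urn model: a success adds a ball for the same arm, a failure one for the other arm."""
    urn = {"A": start, "B": start}
    seq = []
    for _ in range(n):
        arm = "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"
        other = "B" if arm == "A" else "A"
        success = rng.random() < p_success[arm]   # simulated patient outcome
        urn[arm if success else other] += 1
        seq.append(arm)
    return seq


rng = random.Random(11)
print(Counter(permuted_blocks(20, rng)))
print(Counter(efron_biased_coin(20, rng)))
print(Counter(randomized_play_the_winner(200, rng, {"A": 0.7, "B": 0.4})))
```

Block randomization balances exactly at the end of every block, the biased coin keeps imbalance small without fixed blocks (which makes the next assignment harder to predict), and the urn skews allocation toward the arm with more observed successes.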

Randomized Controlled Phase III Trials
Randomization removes potential bias in group allocation. Together, randomization and a concurrent control produce comparable groups and make conclusions more convincing, and blinding, where feasible, minimizes bias after randomization. At present, the standard form of a Phase III trial is the double-blind randomized placebo-controlled clinical trial (RCT). The control arm may receive a placebo or the standard of care; a placebo is acceptable only if no better or standard therapy is available. Interim monitoring is also often considered for a long-term confirmatory RCT. The RCT, which guarantees the validity of statistical tests and comparisons, is generally regarded as the "gold standard" for verifying the efficacy of new drugs. However, RCTs still have limitations: subjects may not represent the general patient population, sample size and cost increase substantially, the randomization process may not be widely accepted, and the administrative process may be complex. Besides the conventional fixed-sample Phase III trial, in which a single data analysis is conducted at the end of the study, RCT designs with additional analyses before the final analysis can be divided, according to their statistical algorithms and characteristics, into two distinct categories: sequential RCT designs and Bayesian adaptive RCT designs.

Group Sequential RCT Design
The scheme of the group sequential design is summarized in Figure 4. In this design, type I and type II error rates are explicitly controlled while the study hypotheses are tested, and patients continue to be enrolled and randomized until a decision on the primary hypothesis is reached. To design a Phase III trial with the group sequential method, the total number of stages, the sample size, and the stopping criterion for testing the null hypothesis at each stage must be prespecified before the trial starts, in addition to the usual specifications of a conventional Phase III trial. At each interim stage, all data accumulated up to that point are analyzed, and the test statistic is compared with critical values derived from the sequential design to determine whether the trial should stop or continue. If the trial passes all interim analyses without stopping, a conclusion on the primary hypothesis must be reached at the final stage. Repeated testing during a sequential trial inflates the overall type I error rate, which can be controlled using the Pocock approach [42], the O'Brien-Fleming approach [43], or an alpha spending function [44]. The Pocock approach was the first group sequential testing method with a given overall type I error rate and power; it uses the same nominal significance level at every interim and final analysis. For example, in a trial with two interim analyses and one final analysis, the Pocock procedure uses the same cutoff at all three analyses, and the trial can stop and claim a positive outcome if the P value is less than 0.022 at any analysis time. An obvious problem with the Pocock approach is its high probability of stopping the trial early. To discourage early stopping and keep the final cutoff close to the overall significance level, such as 0.05, the O'Brien-Fleming approach [43] uses a very strict P value cutoff at the beginning and then relaxes the cutoff over time.
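The nominal cutoffs for this three-analysis example can be computed, and the overall type I error rate checked, with a short simulation. This sketch assumes a two-sided α = 0.05, equally spaced analyses, and a normally distributed test statistic; the constants 2.289 (Pocock) and 2.004 (O'Brien-Fleming) are the standard tabulated values for three analyses (for example, in Jennison and Turnbull's monograph), and the helper names are hypothetical. The simulation also shows how testing at an unadjusted 0.05 on every look inflates the overall type I error rate to about 0.11.

```python
import math
import random

K = 3             # two interim analyses + one final analysis, equally spaced
POCOCK_C = 2.289  # tabulated Pocock constant, K = 3, two-sided alpha = 0.05
OBF_C = 2.004     # tabulated O'Brien-Fleming constant, K = 3, two-sided alpha = 0.05


def two_sided_p(z):
    """Two-sided nominal P value corresponding to a standard normal cutoff z."""
    return math.erfc(z / math.sqrt(2))


pocock_bounds = [POCOCK_C] * K                                      # flat boundary
obf_bounds = [OBF_C * math.sqrt(K / k) for k in range(1, K + 1)]    # strict early
naive_bounds = [1.959964] * K                                       # unadjusted 0.05 test


def overall_type_i_error(bounds, n_sims=50_000, seed=1):
    """Monte Carlo probability under H0 that |Z_k| >= bounds[k] at any look.

    Z_k is the standardized cumulative sum after k equal-sized groups, so the
    looks share data and are correlated, as in a real group sequential trial.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        s = 0.0
        for k, c in enumerate(bounds, start=1):
            s += rng.gauss(0.0, 1.0)  # new group's contribution to the sum
            if abs(s / math.sqrt(k)) >= c:
                rejections += 1
                break
    return rejections / n_sims


if __name__ == "__main__":
    print("Pocock nominal P:", [round(two_sided_p(c), 4) for c in pocock_bounds])
    print("OBF nominal P:   ", [round(two_sided_p(c), 4) for c in obf_bounds])
    for name, b in [("naive", naive_bounds), ("Pocock", pocock_bounds), ("OBF", obf_bounds)]:
        print(name, overall_type_i_error(b))
```

The O'Brien-Fleming boundary shrinks with the square root of the information fraction, which is why its final cutoff stays close to the unadjusted critical value.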
For the same trial, the O'Brien-Fleming P value cutoffs for the first and second interim analyses are 0.0005 and 0.014, respectively, and the cutoff for the final analysis is 0.045, close to 0.05. Both the Pocock and O'Brien-Fleming approaches maintain the overall type I error rate by paying a penalty at the final analysis, but the O'Brien-Fleming method pays a much smaller penalty at the planned conclusion of the study because it applies stricter standards earlier. Both methods have limitations: they require a prespecified maximum number of patients, a prespecified number of interim analyses, and equal increments of information between stages. DeMets and Lan [44] (1994) therefore introduced the spending function approach to relax the requirement of equal information increments. This approach spends the allowable type I error rate over time according to a chosen spending function and the amount of information accrued, and it allows interim analyses to be dropped or added during the conduct of the trial. Several types of spending functions have been proposed. Besides the Pocock-type and O'Brien-Fleming-type spending functions proposed by Lan and DeMets, the gamma spending function [45] proposed by Hwang, Shih, and DeCani and the power spending function [46] described by Jennison and Turnbull are also commonly used in clinical trials. Because conclusions at the interim and final analyses depend heavily on the prespecified boundaries, the choice of spending function is important and should reflect the specific purpose of the trial and its associated clinical program. In addition to efficacy, the safety profile of the drug is an important factor when considering early stopping of a trial.
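The four spending-function families named above can be written as functions of the information fraction t, which runs from 0 to 1. The sketch below assumes a two-sided α = 0.05; the parameter defaults γ = -4 and ρ = 3 (often cited as giving roughly O'Brien-Fleming-like behavior) and the look times in the example are illustrative. It computes only the type I error allotted to each look; converting those increments into critical values requires the recursive multivariate normal integration performed by group sequential software, which is omitted here.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def obf_type(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type: 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))."""
    z = _N.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _N.cdf(z / math.sqrt(t))


def pocock_type(t, alpha=0.05):
    """Lan-DeMets Pocock-type: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def hsd_gamma(t, alpha=0.05, gamma=-4.0):
    """Hwang-Shih-DeCani family: alpha * (1 - exp(-gamma t)) / (1 - exp(-gamma))."""
    if gamma == 0:
        return alpha * t  # limiting case: linear spending
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))


def power_family(t, alpha=0.05, rho=3.0):
    """Power family: alpha * t ** rho."""
    return alpha * t ** rho


def alpha_increments(spend, fractions, **kwargs):
    """Type I error newly spent at each look (differences of cumulative spending)."""
    cumulative = [spend(t, **kwargs) for t in fractions]
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


if __name__ == "__main__":
    looks = [0.4, 0.75, 1.0]  # hypothetical, unequally spaced information fractions
    for f in (obf_type, pocock_type, hsd_gamma, power_family):
        print(f.__name__, [round(x, 5) for x in alpha_increments(f, looks)])
```

Every family spends the full α by t = 1, so the increments always sum to 0.05; the families differ only in how much α they release at early looks, and the O'Brien-Fleming-type function releases very little.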
The major advantages of the group sequential RCT design are its ability to prevent unnecessary exposure of patients to an unsafe or ineffective new drug or to a placebo, and to save time and resources by stopping the trial early for efficacy, futility, or safety. The sequential RCT design is suitable for acute responses, paired subjects, and continuous testing. It is especially appropriate for dichotomous (yes/no) decisions because the result of the trial is judged significant or not according to a prespecified significance level (type I error rate). Although the sequential RCT is the most widely used design in Phase III trials, it has limitations. It may require larger sample sizes than a Bayesian adaptive RCT because of additional variability and the comparison of multiple treatments with similar efficacies. It is somewhat adaptive through interim monitoring and stopping rules, but it requires prespecification of all possible study outcomes, which prevents full adaptation to, and use of, newly accumulated data from the ongoing trial.

Bayesian RCT Design
Bayesian randomized clinical trials are trials in which Bayesian approaches are applied extensively to some or all trial processes, including randomization, monitoring, interim and futility analyses, final analysis, and adaptive decisions. Berry and Kadane [47] proposed optimal Bayesian randomization in 1997, and practical uses of Bayesian adaptive randomization in clinical trials have been reviewed by Thall and Wathen [48]. Bayesian monitoring has frequently been used in Phase III trials, especially those with failure-time endpoints [49]. Bayesian analysis has recently become increasingly common in clinical trials, in part because it can borrow strength from information outside the study [50]. Bayesian adaptive decisions can be based on the posterior or predictive probability of trial success, or on the result of a Bayesian final analysis. Bayesian adaptive decisions have been compared with frequentist sequential approaches [51], and several studies [52-54] have proposed Bayesian decision-theoretic approaches for optimizing designs in various settings.
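A posterior-probability decision of the kind described above can be sketched for a binary endpoint with conjugate Beta priors. The uniform Beta(1, 1) priors, the Monte Carlo estimate, and the 0.99 efficacy and 0.05 futility thresholds are assumptions for illustration only; in practice such thresholds are typically calibrated by simulation so that the design's frequentist error rates are acceptable.

```python
import random


def prob_treatment_better(x_t, n_t, x_c, n_c, a=1.0, b=1.0, draws=20_000, seed=0):
    """Monte Carlo estimate of P(p_T > p_C | data) under independent Beta(a, b) priors.

    With x responders out of n, each arm's posterior is Beta(a + x, b + n - x).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_t = rng.betavariate(a + x_t, b + n_t - x_t)  # treatment-arm posterior draw
        p_c = rng.betavariate(a + x_c, b + n_c - x_c)  # control-arm posterior draw
        if p_t > p_c:
            wins += 1
    return wins / draws


def interim_decision(prob, efficacy=0.99, futility=0.05):
    """Hypothetical posterior-probability stopping rule; thresholds are illustrative."""
    if prob >= efficacy:
        return "stop for efficacy"
    if prob <= futility:
        return "stop for futility"
    return "continue"


if __name__ == "__main__":
    p = prob_treatment_better(30, 50, 15, 50)  # hypothetical interim counts
    print(round(p, 4), interim_decision(p))
```

The same posterior draws could feed a predictive probability of eventual success by simulating the remaining patients, at the cost of an additional simulation loop.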
Bayesian RCT design is inherently adaptive, a form of dynamic learning: it prespecifies how to combine all data accumulated during the study, quantify uncertainty probabilistically, control the probabilities of false-positive and false-negative conclusions, and modify the study design accordingly [55]. A Bayesian adaptive RCT can not only compare multiple active treatments but also allow the ongoing trial to add promising new interventions, discontinue interventions shown by accumulated within-trial data to be less effective, or focus on biomarker-defined patient subgroups for whom interventions are more or less effective. As a result, the trial tests the most current interventions, improves its clinical relevance, and targets biomarkers that predict response to alternative interventions. Using existing external data from previous studies at the design stage, together with accumulated within-trial data to update the design, can reduce the sample size, duration, and cost of a Bayesian adaptive RCT [56]. However, Bayesian RCTs may be criticized as too subjective, insufficiently planned, or overly complicated.
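One common way to borrow external data is the power prior of Ibrahim and Chen, which down-weights historical data by a factor a0 between 0 and 1. The sketch below covers a binomial response rate with a Beta(1, 1) initial prior, for which the power prior remains conjugate; the function names and the counts in the example are hypothetical.

```python
def power_prior_posterior(x, n, x_hist, n_hist, a0, a=1.0, b=1.0):
    """Beta posterior parameters for a response rate under a power prior.

    Historical binomial data (x_hist responders of n_hist) enter the prior with
    weight a0 in [0, 1]: a0 = 0 ignores them, a0 = 1 pools them fully.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha_post = a + a0 * x_hist + x
    beta_post = b + a0 * (n_hist - x_hist) + (n - x)
    return alpha_post, beta_post


def beta_mean_sd(alpha_post, beta_post):
    """Mean and standard deviation of a Beta(alpha_post, beta_post) distribution."""
    total = alpha_post + beta_post
    mean = alpha_post / total
    sd = (alpha_post * beta_post / (total ** 2 * (total + 1))) ** 0.5
    return mean, sd


if __name__ == "__main__":
    # Hypothetical: 12/40 current control responders, 45/150 historical responders
    for a0 in (0.0, 0.5, 1.0):
        params = power_prior_posterior(12, 40, 45, 150, a0)
        print(a0, params, [round(v, 4) for v in beta_mean_sd(*params)])
```

Raising a0 shrinks the posterior standard deviation of the control response rate, which is the mechanism by which borrowing can reduce the required sample size; when historical and current data conflict, however, heavy borrowing can introduce bias.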
Both Bayesian and sequential RCT designs have advantages and disadvantages. Rather than favoring either approach a priori, statisticians and investigators should choose the Phase III design that best fits the goals of the trial and is most likely to perform well.

Adaptive Sample Size Calculation and Adaptive Stopping
In the planning stage of a Phase III trial, sample size is one of the most important factors to consider because the trial budget depends on the minimum required sample size. The sample size is usually fixed in advance, but adaptive clinical trials often use adaptive sample size calculation, in which the sample size is adjusted based on the data observed at an interim analysis [1]. Sample size determination depends on the expected treatment difference and its standard deviation; however, the initial estimates of these quantities often prove too large or too small in light of data accumulating in the ongoing trial or results of newly completed studies. In this case, keeping the original sample size leads to an underpowered or overpowered trial, so the sample size should be adjusted according to the updated effect size. Several approaches to sample size adjustment have been proposed, based on treatment effect size, conditional power, and/or reproducibility probability [57-61]. The treatment effect and standard deviation observed in a limited number of subjects at an interim analysis may be imprecise and not statistically significant. These interim estimates should therefore not be weighted too heavily, and the targeted clinically meaningful difference should always be fully considered in an adaptive sample size calculation.
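The usual per-arm sample size for comparing two means with a two-sided z-test and 1:1 allocation is n = 2(z_{1-α/2} + z_{1-β})²σ²/δ². The sketch below applies it at the design stage and again with an interim standard deviation, holding δ at the clinically meaningful difference as recommended above. The rule that the re-estimated size may neither fall below the planned size nor exceed a prespecified cap `n_max` is an assumption, and re-estimation based on the interim effect size (rather than the variance) generally requires adjusted test statistics that are not shown.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.9):
    """Per-arm sample size for a two-sided, two-sample z-test of means (1:1)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def reestimated_n(delta_clinical, sigma_interim, n_planned, n_max, **kwargs):
    """Re-estimate with the interim SD but the prespecified clinically meaningful
    difference; assumed policy: never below n_planned, never above n_max."""
    n_new = n_per_arm(delta_clinical, sigma_interim, **kwargs)
    return min(max(n_new, n_planned), n_max)


if __name__ == "__main__":
    planned = n_per_arm(5, 10)  # design assumptions: delta = 5, sigma = 10
    print(planned, reestimated_n(5, 12, planned, n_max=150))
```

With δ = 5 and a planning σ = 10, `n_per_arm(5, 10)` gives 85 per arm at 90% power; if the interim standard deviation is 12, `reestimated_n(5, 12, 85, 150)` raises the target to 122 per arm.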
The fate of an ongoing Phase III trial is decided at meetings of its data monitoring committee (DMC), which makes recommendations based on the available data according to the stopping rules in the statistical guidelines. Factors commonly considered in stopping rules include safety, efficacy, futility, the benefit-risk ratio, the balance between short-term and long-term treatment effects, and conditional or predictive power [1]. Current tools for monitoring Phase III trials include stopping boundaries, conditional and predictive power, futility indices, repeated confidence intervals, and Bayesian monitoring tools. Although stopping rules are usually stipulated at the design stage, adaptive stopping is becoming increasingly common because of unforeseen events during the conduct of a trial, such as a change in the DMC meeting date because committee members are unavailable, patient accrual that departs from projections, or deviations from the planned analysis schedule. Moreover, the true values of the parameters used to construct stopping boundaries are never known, and the initial estimates of variability and treatment effect made at the design stage often prove inaccurate once preliminary results of the ongoing trial are available. Such deviations can substantially affect the stopping boundaries, which makes adaptive stopping especially desirable. To stop a trial prematurely under an adaptive stopping algorithm, a threshold number of randomized subjects must be reached and the boundaries of the relevant rules, such as utility or futility rules, must be crossed.
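Conditional power, one of the monitoring tools listed above, can be computed with the Brownian-motion (B-value) formulation of Lan and Wittes. This sketch assumes a one-sided α = 0.025 z-test with an interim Z statistic observed at information fraction t; the function name is hypothetical, and the futility threshold of 0.10 mentioned after the block is illustrative only.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def conditional_power(z_interim, t, alpha=0.025, theta=None):
    """Conditional power of a one-sided level-alpha z-test at information fraction t.

    B-value formulation: B(t) = Z(t) * sqrt(t), and the remaining increment
    B(1) - B(t) ~ Normal(theta * (1 - t), 1 - t), where theta is the drift
    (the expected final Z statistic). If theta is None, the current-trend
    estimate theta = Z(t) / sqrt(t) is used; pass the design drift
    z_{1-alpha} + z_{1-beta} to evaluate power under the planned alternative.
    """
    if not 0 < t < 1:
        raise ValueError("t must be strictly between 0 and 1")
    b = z_interim * math.sqrt(t)
    if theta is None:
        theta = z_interim / math.sqrt(t)
    z_crit = _N.inv_cdf(1 - alpha)
    return 1 - _N.cdf((z_crit - b - theta * (1 - t)) / math.sqrt(1 - t))


if __name__ == "__main__":
    design_theta = _N.inv_cdf(0.975) + _N.inv_cdf(0.9)  # 90% power design
    for z in (0.0, 1.0, 2.5):
        print(z, round(conditional_power(z, 0.5), 4),
              round(conditional_power(z, 0.5, theta=design_theta), 4))
```

For example, with no observed effect halfway through the trial, `conditional_power(0.0, 0.5)` is below 0.01 under the current trend, which would trigger a hypothetical futility rule of stopping when conditional power falls below 0.10, whereas `conditional_power(2.5, 0.5)` exceeds 0.95.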

Concluding Remarks
Clinical trials remain an indispensable component of new drug development. Novel statistical approaches have been applied to clinical trials and have significantly improved their performance in every step from design, conduct, and monitoring to data analysis and drawing final conclusions. As modern medicine progresses, increasingly complex requirements and factors need to be considered in clinical trials, which in turn create new challenges for statisticians. In the future, more novel statistical approaches, frequentist and Bayesian, should be developed to enhance the performance of clinical trials in terms of therapeutic effect, safety, accuracy, efficiency, simplicity, and validity of conclusions and to expedite the development of effective new drugs to improve human healthcare.