
The clinical trial, a prospective study to evaluate the effect of interventions in humans under prespecified conditions, is a standard and integral part of modern medicine. Many adaptive and sequential approaches have been proposed for use in clinical trials to allow adaptations or modifications to aspects of a trial after its initiation without undermining the validity and integrity of the trial. The application of adaptive and sequential methods in clinical trials has significantly improved the flexibility, efficiency, therapeutic effect, and validity of trials. To further advance the performance of clinical trials and convey the progress of research on adaptive and sequential methods in clinical trial design, we review significant research that has explored novel adaptive and sequential approaches and their applications in Phase I, II, and III clinical trials and discuss future directions in this field of research.

Medicine is of paramount importance for human healthcare. The development of successful new medicines is a lengthy, difficult, and expensive process that consists of laboratory experimentation, animal studies, clinical trials (Phase I, II, and III), and postmarket follow-up (Phase IV). Clinical trials are FDA-approved studies conducted in human beings to demonstrate the safety and efficacy of new drugs for health interventions under prespecified conditions. A clinical trial is conducted in a small sample of patients, and its conclusions are applied to the whole target population; statistics is therefore an indispensable component of clinical trial design and analysis, and it has become increasingly important in contemporary clinical trials. As the gold standard for the evaluation of a new drug, every contemporary clinical trial must be well designed according to its specific purpose and conducted properly under governmental regulations. The major roles of a statistician in a clinical trial are to design an efficient trial with minimum cost and length and maximum therapeutic effect for patients in the trial, and to draw convincing conclusions by applying appropriate, up-to-date statistical methods. In the past several decades, numerous novel statistical methodologies have been developed and applied to clinical trials and have significantly improved their performance. Consequently, clinical trials have evolved from simple observational studies to hypothesis-driven, well-designed prospective studies. At present, clinical trials have become the most important part of modern medicine.

Classical clinical trials are usually designed with a fixed sample size and schedule and do not use information obtained from the ongoing trial. However, it has become increasingly common to modify a trial and/or its statistical procedures while the trial is under way. Modifiable procedures include patient eligibility and evaluation criteria, drug or treatment dosage and schedule, laboratory testing or clinical diagnosis, study endpoints, measurement of clinical response, formulation of study objectives into statistical hypotheses, choice of study design according to study purpose, calculation of minimum sample size, participant randomization, study monitoring with interim/futility analysis, the statistical data analysis plan, and the drawing of conclusions. The purpose of such modifications is to improve the performance of a trial by promptly using data accumulating within the trial as well as newly emerging related information from the literature.

Recently, adaptive and sequential clinical trials have become increasingly popular. The sequential method is an approach of frequentist statistics in which data are evaluated sequentially as they are accumulated and a study is monitored sequentially for stopping whenever a conclusion is reached with enough evidence. Adaptive design refers to the modification of aspects of the trial according to data accumulating during the progress of the trial, while preserving the integrity and validity of the trial. The modifiable aspects of adaptive trials include, but are not limited to, (a) sample size, (b) addition or removal of a study arm, (c) dose modification, (d) treatment switch, and so forth [

A Phase I trial is one of the most important steps in a drug’s development and is the first clinical trial in human subjects after laboratory and animal studies of a therapeutic agent have shown a potential curative effect on the disease. The sample size of a Phase I clinical trial is relatively small, typically between twenty and eighty. It is widely assumed that the therapeutic effect of a drug is tied to its toxicity and increases monotonically with its dose level: higher doses are associated with both more severe toxicity and better therapeutic effect. Therefore, a balance must be struck between toxicity and therapeutic benefit. To achieve the best therapeutic benefit, a patient should be treated, under close monitoring, with the highest dose whose associated toxicities the patient can tolerate. Some of the toxicities patients experience are so severe that they limit dose escalation; these are called dose limiting toxicities (DLTs). In the National Cancer Institute (NCI) Common Toxicity Criteria, DLT is defined as a group of grade 3 or higher nonhematologic toxicities and grade 4 hematologic nontransient toxicities. Toxicities are graded as follows:

grade 0: no toxicity;

grade 1: mild toxicity;

grade 2: moderate toxicity;

grade 3: severe toxicity;

grade 4: life-threatening toxicity;

grade 5: death.

The main goals of a Phase I trial are to determine the dose-toxicity relationship of a new therapeutic agent and estimate the maximum tolerated dose (MTD) of the agent given the specified tolerable toxicity level. The highest acceptable DLT level is usually defined as a target toxicity level (TTL). It can be said that the TTL determines the MTD of the new therapeutic agent. A careful and thoughtful approach to the design of Phase I trials and accurate MTD estimation are essential for the fate of the new drug in subsequent clinical trials.
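To illustrate how the TTL pins down the MTD, the following minimal sketch assumes a two-parameter logistic dose-toxicity curve. The functional form, the parameter values, the dose units, and the function names are illustrative assumptions and are not estimates from any trial.

```python
import math


def toxicity_probability(dose, intercept, slope):
    """Assumed logistic dose-toxicity curve: P(DLT | dose)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * dose)))


def mtd_for_target(ttl, intercept, slope):
    """Dose at which the assumed curve crosses the target toxicity level."""
    if not 0.0 < ttl < 1.0:
        raise ValueError("ttl must lie strictly between 0 and 1")
    if slope <= 0.0:
        raise ValueError("slope must be positive (toxicity increases with dose)")
    logit_ttl = math.log(ttl / (1.0 - ttl))
    return (logit_ttl - intercept) / slope


# Hypothetical curve parameters (dose in arbitrary units).
INTERCEPT, SLOPE = -4.0, 0.05

for ttl in (0.20, 0.33):
    mtd = mtd_for_target(ttl, INTERCEPT, SLOPE)
    print(f"TTL={ttl:.2f} -> MTD={mtd:.1f}, "
          f"P(DLT at MTD)={toxicity_probability(mtd, INTERCEPT, SLOPE):.2f}")
```

Under this assumed curve, choosing a lower TTL always yields a lower MTD, which is the sense in which the TTL determines the MTD.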

In a Phase I clinical trial, the well accepted assumption is that the probability of toxicity increases monotonically with increasing drug dose; a decrease in the probability of toxicity at high dose levels can occur in some special cases, but these are uncommon and are not considered here. The toxicity-dose relationship can be described nonparametrically or parametrically. In the nonparametric description, the only assumption is that toxicity is nondecreasing with dose. In the parametric description, a distribution with a few parameters is adopted to model the toxicity-dose curve. From a biological point of view, the human body has stabilization and self-salvage systems that protect the person from mild toxicity when the drug dose is below a certain threshold; once these systems have been overcome, the probability of toxicity increases at an accelerating rate, rapidly approaches the worst outcome, death, and then levels off. Therefore, a sigmoid-shaped curve is an appropriate model for the relationship between toxicity probability and dose. Many statistical designs have been proposed for Phase I clinical trials; the most commonly used are summarized and compared in Table

Summary of main Phase I clinical trial designs.

Designs | Advantages | Disadvantages |
---|---|---|
Standard 3 + 3 design | Robust. | MTD is not a dose with any particular probability of DLT, but in the range from 20% to 25% DLT. |
Isotonic design (ID) | Only assumes a monotonically increasing relationship between dose and toxicity. | The accuracy of MTD may not be as good as CRM or EWOC. |
Continual reassessment method (CRM) | Fit parametric model for dose toxicity relationship. | High risk of patients being treated with over toxic dosages. |
Escalation with overdose control (EWOC) | Includes all advantages of CRM. | If the parametric model is not reliable, the result could be questionable. |

All rule based designs follow a sequential approach. In rule based designs, a non-decreasing dose toxicity relationship is the only well accepted assumption required. Therefore rule based designs are well suited for first in human clinical trials in which the dose toxicity relationship is not well understood. Common rule based designs include

The

Escalation scheme for

Escalation scheme for

Leung and Wang (2001), for the first time, introduced a semiparametric Phase I design called isotonic design in which only a non-decreasing dose toxicity relationship is the required assumption [

There are many other rule based designs. All rule based designs can estimate a reasonable MTD using a stopping rule based either on observed DLTs or on convergence criteria. Ad hoc additional dose levels can also be added when needed without any impact on their robustness. Most rule-based designs are practically simple and easy to implement. At present,

In model based designs, three parametric dose-toxicity functions (logistic model, hyperbolic model, and power function) are usually employed to depict the relationship between dose and toxicity. Model based designs often fail to find an MTD in first in human studies that are based on observed DLTs. The most common model based designs are CRM and EWOC. Their algorithms are illustrated in Figure

Diagram of model based Phase I designs: continual reassessment method (CRM) and escalation with overdose control (EWOC).

O’Quigley et al. (1990) originally introduced the CRM, a Bayesian approach to fully and efficiently use all data and prior information available in a Phase I study [

All rule based designs are robust and simple to implement and usually give a reasonable MTD under their rules. Applying a model, such as isotonic regression, to the data can improve the accuracy of the MTD estimate. Model based designs require a parametric model of the dose toxicity relationship and, when their assumptions are satisfied, may greatly improve the probability of estimating the correct MTD compared with rule based designs. However, model based designs are not robust and should not be used unless their underlying assumptions can be met with confidence. The accuracy of the estimated MTD depends substantially on the number of observed DLTs, and the sample size is also an important factor. Overall, different designs, whether rule based or model based, usually perform similarly when they are similar in sample size and aggressiveness. Thus, simple designs, especially standard designs, are still very popular in Phase I clinical trial practice.
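As a concrete example of a simple rule based design, the sketch below simulates one commonly described variant of the standard 3 + 3 escalation rule. The rule details are an assumption of this sketch: escalate after 0/3 or 1/6 DLTs, stop after two or more DLTs in a cohort of up to six, and declare the next lower dose the MTD, without a confirmatory cohort of six at the declared MTD. The true DLT probabilities and the function name are hypothetical.

```python
import random
from collections import Counter


def simulate_three_plus_three(true_dlt_probs, rng):
    """Simulate one trial under a simplified 3 + 3 rule.

    Returns the index of the declared MTD, or -1 if even the lowest
    dose level is judged too toxic.
    """
    level = 0
    while True:
        p = true_dlt_probs[level]
        # First cohort of three patients at the current dose level.
        dlts = sum(rng.random() < p for _ in range(3))
        if dlts == 1:
            # Exactly one DLT: expand the cohort to six patients.
            dlts += sum(rng.random() < p for _ in range(3))
        if dlts >= 2:
            # Too toxic: the next lower level is declared the MTD.
            return level - 1
        # 0/3 or 1/6 DLTs: escalate, or stop at the highest level.
        if level == len(true_dlt_probs) - 1:
            return level
        level += 1


# Hypothetical true DLT probabilities for five dose levels.
probs = [0.05, 0.10, 0.20, 0.35, 0.50]
rng = random.Random(42)
declared = Counter(simulate_three_plus_three(probs, rng) for _ in range(2000))
for level in sorted(declared):
    print(f"declared MTD index {level}: {declared[level] / 2000:.2%}")
```

Repeating the simulation under different assumed toxicity curves gives a rough view of how often the rule selects each dose level, which is one way to compare designs of similar sample size and aggressiveness.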

The design of Phase I clinical trials can involve one or two stages. Rule based or model based designs can be implemented in each stage of two stage designs. There are other critical issues in Phase I clinical trial designs, such as the operating characteristics of

After the safety and MTD of an experimental drug have been established in a Phase I clinical trial, the drug enters Phase II clinical trials, which provide an initial evaluation of its therapeutic effects at the recommended MTD. Phase II trials are sometimes further classified as Phase IIa and IIb studies. Phase IIa trials screen the promising novel experimental agent for significant antidisease activity, and Phase IIb trials focus on the drug’s improved therapeutic effectiveness over the standard treatment. Phase II studies provide critical information for deciding whether further testing of the experimental drug in a large confirmatory Phase III trial is warranted. The surrogate endpoint used in Phase II clinical trials needs to be obtainable in a short time and should be able to assess the treatment’s primary benefit. For cancer trials, the experimental drug’s antitumor activity and the progression-free survival (PFS) of treated patients are often used as surrogates of the drug’s efficacy. Antitumor activity is measured as clinical response within a short period of time following the treatment and is classified as complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD). PFS, which is estimated as the time elapsed from the date of treatment to the date of disease progression or death, resembles the outcome (overall survival) of the subsequent Phase III clinical trial and is also widely used when it can be measured in a short time.

The most commonly used Phase II clinical trial designs are summarized in Table

Summary of main Phase II clinical trial designs.

Designs | Advantages | Disadvantages |
---|---|---|
One stage one arm design | Compare with historical control. | Delay the evaluation of effectiveness. |
Gehan’s two stage design | With interim monitoring. | No testing on agents showing some promise. |
Simon’s two stage design | The samples in two stages are optimized. | Only suitable for binary outcome. |
Bayesian Phase II design | Flexible monitoring schedule. | Intensive computation. |
Randomized Phase II design | Use of randomization. | Sample size increases. |
Phase II pick the winner design | Efficient and effective way of comparing two or multiple experimental regimens. | Not appropriate for comparison of adding an experimental agent to standard regimen. |
Phase II screening design | Limits the sample size required for a randomized Phase II comparison. | No statistical comparison between the selected arms. |
Phase II randomized discontinuation design | Good when significant continued benefit after initial benefit implies significant benefit overall, and vice versa, or when benefit is restricted to a nonidentifiable subgroup of patients. | May need a large number of patients treated at a treatment not effective for them. |
Phase II/III design | Use of Phase II data in Phase III trial. | Large sample sizes. |

Among the single arm Phase II designs, another major group is Bayesian Phase II design. For example, Thall and Simon (1994) [

Some Phase II clinical trials may have two arms and randomization is frequently used to generate a reliable concurrent control arm and reduce biases. This kind of randomized Phase II trial is more similar to a Phase III trial. Randomized Phase II trials may reduce the so-called trial effect which often arises due to different patient populations, physician preferences, and medical environments between current and previous studies. But the sample size, trial length, and cost increase about 4-fold.
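A rough calculation shows where a factor of about four can come from. The sketch below assumes a normally distributed endpoint with known standard deviation, a one-sided test, equal allocation, and a historical control value treated as a fixed constant in the single arm case. These assumptions, the function names, and the example numbers are illustrative only.

```python
from statistics import NormalDist


def single_arm_n(delta, sigma, alpha=0.05, power=0.80):
    """Patients needed to detect a shift `delta` from a fixed historical value."""
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return (z_sum * sigma / delta) ** 2


def two_arm_total_n(delta, sigma, alpha=0.05, power=0.80):
    """Total patients for a 1:1 randomized comparison of two means.

    The variance of a difference of two means is twice that of one mean,
    so each arm needs twice the single arm size, and there are two arms.
    """
    per_arm = 2 * single_arm_n(delta, sigma, alpha, power)
    return 2 * per_arm


# Hypothetical effect size and standard deviation (unrounded sample sizes).
delta, sigma = 0.5, 1.0
ratio = two_arm_total_n(delta, sigma) / single_arm_n(delta, sigma)
print(f"two arm total / single arm = {ratio:.1f}")
```

The factor of two for the doubled variance and the factor of two for the second arm multiply, which is consistent with the roughly 4-fold increase noted above.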

There are several multiple arm Phase II designs [

Phase II screening design is another Phase II design with multiple arms in which all experimental arms are compared with the standard treatment arm and all the experimental arms beating the standard treatment arm are winners. Therefore this design limits the sample size required for a randomized Phase II comparison and it is appropriate for testing the effect of adding an experimental agent to a standard regimen. However, it provides no statistical comparison between the selected (winning) arms.

Some investigators have proposed a novel Phase II randomized discontinuation design in which all patients receive the same treatment for a period of time and those with stable disease are randomized to continue or discontinue. This design is particularly appropriate when the treatment is known to have better therapeutic effects and it is ethical for all participants to benefit from it, or when the potential subgroup of patients who can benefit from the treatment is unknown before receiving it. However, this design requires a large number of patients to be treated with a treatment not effective for them. Therefore this design has specific applications but is not widely used.

Conventionally, Phase II and III trials are conducted separately in a sequential order and only an experimental drug that has successfully passed a Phase II trial can enter a Phase III trial. The resulting gap between trials and time lag may be unnecessary under certain circumstances. Therefore, a seamless Phase II/III design has been proposed, which uses Phase II data in a Phase III trial and minimizes delay in starting up the Phase III study [

Categorical tumor response has been the most common endpoint in the Phase II clinical trial designs. However, from a statistical standpoint, categorizing a continuous tumor change percentage into a categorical tumor response with 4 levels results in a loss of study power by not fully utilizing all available data. Several publications have studied extensively the direct utilization of continuous tumor shrinkage as the primary endpoint for the measurement of drug efficacy in Phase II clinical trials [

If an experimental agent exhibits adequate short term therapeutic effects in a Phase II trial, the drug moves forward to a Phase III study for confirmatory testing of its long term effectiveness. The typical endpoint in a Phase III trial is a time to event measurement, such as progression free survival or overall survival. Phase III trials are large scale in terms of sample size, resources, effort, and cost. This phase collects a large amount of data over a long period of follow-up to evaluate the ultimate therapeutic effect of a new drug. The design of Phase III clinical trials has therefore become a very important research field aimed at improving the performance of these critical trials. The most commonly used Phase III clinical trial designs are summarized in Table

Summary of main Phase III clinical trial designs.

Designs | Advantages | Disadvantages |
---|---|---|
Historical control | Allows ethical consideration. | Vulnerable to bias. |
Concurrent control, not randomized | Eliminates time trends. | Intervention and control groups may not be comparable because of selection bias and different treatment groups are not comparable. |
Randomized clinical trials (RCT) | Considered to be “gold standard”. | Subjects may not represent general patient population. |
Sequential RCT design | Continues to randomize subjects until null hypothesis is either rejected or “accepted.” | Multiple testing inflates type I error. |
Bayesian RCT design | Dynamic learning adaptive feature. | May be criticized as too subjective, not well planned, or too complicated. |

The earliest design of Phase III clinical trials is a single arm study that uses historical controls from the literature, existing databases, or medical charts. This kind of Phase III design accommodates ethical considerations and can increase enrollment because patients are assured of receiving the new therapy. In addition, such trials take less time and cost less, which makes this design a good choice for the initial testing of new treatments, or when disease diagnosis is clearly established, prognosis is well known, or the disease is highly fatal. This design, however, provides no comparison with concurrent control data and is vulnerable to bias because disease and mortality rates change over time, and literature controls are particularly poor. Phase III trials conducted using this design tend to exaggerate the value of a new treatment. To avoid this bias and eliminate time trends, a concurrent control but nonrandomized design for Phase III clinical trials was then proposed and implemented. In this design, randomization does not interfere with treatment selection. It is easier to select a group to receive the intervention and then select controls matched on key characteristics. This design can therefore reduce costs, is relatively simple, and is readily accepted by both investigators and participants. However, the intervention and control groups may not be comparable because of selection bias and differences between the group populations. Comparability is difficult to prove because it is impractical to have information on all important prognostic factors and to match on several factors at once. Whether unknown or unmeasured factors exist in large studies is also uncertain. Post hoc covariance analysis is not adequate to offset the imbalance between groups.

To eliminate the bias, facilitate masking treatments, and permit the use of statistical theory, randomization has been employed widely in the Phase III clinical trials [

Randomization removes potential bias in group allocation. Together, randomization and a concurrent control produce comparable groups and make conclusions more convincing. Blinding, where feasible, minimizes bias after randomization. At present, the standard form of a Phase III trial is a randomized, double-blind, placebo-controlled clinical trial (RCT). The control arm may be a placebo or the standard of care. The use of placebo is acceptable only if no better or standard therapy is available. Interim monitoring is also often considered for a long term confirmatory RCT. The RCT, which guarantees the validity of statistical tests and of comparisons, is generally regarded as the “gold standard” for verifying the efficacy of new drugs. However, RCTs still have limitations: subjects may not represent the general patient population; sample size and cost increase substantially; the randomization process may not be widely accepted; and the administrative process may be complex. Besides the conventional fixed sample Phase III clinical trial, in which a single data analysis is conducted at the end of the study, RCT designs with additional analyses before the final analysis can be divided, according to their statistical algorithms and characteristics, into two distinct categories: sequential RCT design and Bayesian adaptive RCT design.

The scheme of the group sequential design is summarized in Figure

Diagram of group sequential design.

Multiple testing during the sequential trial may inflate type I error which can be controlled using the Pocock approach [

The major advantages of the group sequential RCT design are its ability to prevent unnecessary exposure of patients to an unsafe or ineffective new drug or to a placebo treatment, and to save time and resources by stopping the trial early for efficacy, futility, or safety. The sequential RCT design is suitable for acute responses, paired subjects, and continuous testing. It is especially appropriate for dichotomous (yes/no) decisions because the result of the RCT is declared significant or not according to a prespecified significance level (type I error). Although sequential RCT is the most widely used design in Phase III clinical trials, it has some limitations. Sequential RCT may require larger sample sizes than Bayesian adaptive RCT as a result of additional variability and the comparison of multiple treatments with similar efficacies. Sequential RCT is somewhat adaptive through its interim monitoring and stopping rules, but it requires prespecification of all possible study outcomes, which inhibits full adaptation to, and utilization of, newly accumulated data from the ongoing trial.
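The type I error inflation caused by repeated looks, and its control by a constant Pocock-type boundary, can be checked with a small Monte Carlo sketch. It assumes five equally spaced analyses of a normally distributed test statistic under the null hypothesis; the boundary value 2.413 is the commonly tabulated Pocock constant for five looks at a two-sided level of 0.05. The function name and simulation settings are illustrative.

```python
import math
import random


def rejection_rate(n_looks, critical_value, n_sims, rng):
    """Share of null trials in which |Z_k| crosses the boundary at any look."""
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Each group contributes an independent standard normal increment.
            cumulative += rng.gauss(0.0, 1.0)
            z_at_look = cumulative / math.sqrt(look)
            if abs(z_at_look) > critical_value:
                rejections += 1
                break  # the trial stops at the first boundary crossing
    return rejections / n_sims


rng = random.Random(2024)
naive = rejection_rate(5, 1.96, 20000, rng)     # unadjusted 1.96 at every look
pocock = rejection_rate(5, 2.413, 20000, rng)   # constant Pocock-type boundary
print(f"unadjusted type I error: {naive:.3f}")
print(f"Pocock-type boundary:    {pocock:.3f}")
```

Using the unadjusted critical value at every look rejects a true null hypothesis far more often than the nominal 5%, whereas the wider constant boundary brings the overall rate back to roughly the nominal level.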

Bayesian randomized clinical trials refer to trials in which Bayesian approaches are applied extensively to some or all of the processes of a trial including randomization, monitoring, interim and futility analysis, final analysis, and adaptive decisions. Berry and Kadane [

Bayesian RCT design is dynamic learning adaptive in nature as it prespecifies the approaches to combine all available data accumulated during the process of the study, calculate probabilistic estimation of uncertainty, control the probability of false-positive and false-negative conclusions, and change the study design correspondingly [

Both Bayesian and sequential RCT designs have their advantages and disadvantages. Instead of biasing toward either Bayesian or sequential methods, statisticians and investigators should choose the design of Phase III clinical trial that best fits the goals of the trial and is most likely to provide the best performance.
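To make the Bayesian monitoring idea concrete, the following minimal sketch computes the posterior probability that the treatment response rate exceeds the control rate at an interim look. It assumes a binary response endpoint, independent Beta(1, 1) priors, and Monte Carlo sampling from the Beta posteriors; the endpoint, the priors, the interim counts, and the function name are all illustrative assumptions, and time to event endpoints would need a different model.

```python
import random


def prob_treatment_better(resp_t, n_t, resp_c, n_c,
                          prior_a=1.0, prior_b=1.0,
                          n_draws=20000, seed=0):
    """Posterior P(p_treatment > p_control) under independent Beta priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # Conjugate update: Beta(a + responders, b + nonresponders).
        p_t = rng.betavariate(prior_a + resp_t, prior_b + n_t - resp_t)
        p_c = rng.betavariate(prior_a + resp_c, prior_b + n_c - resp_c)
        wins += p_t > p_c
    return wins / n_draws


# Hypothetical interim data: 18/40 responders on treatment, 11/40 on control.
posterior = prob_treatment_better(18, 40, 11, 40)
print(f"P(treatment better | data) = {posterior:.3f}")
```

A prespecified decision rule could then compare this probability with efficacy and futility thresholds chosen so that the frequentist error rates of the design remain acceptable.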

In the planning stage of a Phase III clinical trial, sample size is one of the most important factors to be considered because the budget for the trial depends on the minimum required sample size. Usually sample size is fixed in a trial, but an adaptive sample size calculation is often used in adaptive clinical trials and the sample size is adjusted based on the observed data at the interim analysis [

The fate of an ongoing Phase III trial is determined at its data monitoring committee (DMC) meeting, which makes recommendations based on the available data according to stopping rules in the statistical guidelines. The common factors considered in stopping rules are safety, efficacy, futility, benefit-risk ratio, weight between the short term and long term treatment effects, and conditional power or predictive power [
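As an illustration of one quantity a monitoring committee may examine, the sketch below computes conditional power under the current trend, using the standard Brownian motion approximation for a one-sided test. The one-sided level of 0.025, the interim values, and the function name are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, alpha=0.025):
    """Probability of final significance if the current trend continues.

    z_interim     : interim standardized test statistic
    info_fraction : fraction of total statistical information observed
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must lie strictly between 0 and 1")
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha)
    # Estimated drift under the current trend: Z_t / sqrt(t).
    drift = z_interim / math.sqrt(info_fraction)
    return 1 - norm.cdf((z_crit - drift) / math.sqrt(1 - info_fraction))


# Hypothetical interim looks at half of the planned information.
for z in (0.5, 1.5, 2.5):
    print(f"interim Z={z:.1f} -> conditional power={conditional_power(z, 0.5):.3f}")
```

A low conditional power at an interim look is one argument for stopping for futility, while a high value supports continuing; any specific threshold would be set in the trial's statistical guidelines.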

Clinical trials remain an indispensable component of new drug development. Novel statistical approaches have been applied to clinical trials and have significantly improved their performance in every step from design, conduct, and monitoring to data analysis and drawing final conclusions. As modern medicine progresses, increasingly complex requirements and factors need to be considered in clinical trials, which in turn create new challenges for statisticians. In the future, more novel statistical approaches, frequentist and Bayesian, should be developed to enhance the performance of clinical trials in terms of therapeutic effect, safety, accuracy, efficiency, simplicity, and validity of conclusions and to expedite the development of effective new drugs to improve human healthcare.

This work is supported in part by NIH/NCI Grants no. 1 P01 CA116676 (Z. Chen), P30 CA138292-01 (Z. Chen and J. Kowalski), and 5 P50 CA128613 (Z. Chen); NSA Grant H98230-12-1-0209 (Y. Zhao).