Number of Patients per Cohort and Sample Size Considerations Using Dose Escalation with Overdose Control

The main objective of cancer phase I clinical trials is to determine a maximum tolerated dose (MTD) of a new experimental treatment. In practice, most of these trials are designed so that three patients per cohort are treated at the same dose level. In this paper, we compare the safety and efficiency of trials using the escalation with overdose control (EWOC) scheme designed with three or only one patient per cohort. We show through simulations that the number of patients per cohort does not impact the proportion of patients given therapeutic doses, safety of the trial, and efficiency of the estimate of the MTD. Additionally, we present guidelines and tabulated values on the number of patients needed to design a phase I cancer clinical trial using EWOC to achieve a given accuracy of the estimate of the MTD.


Introduction
Cancer phase I clinical trials are small studies whose main objective is to determine a maximum tolerated dose (MTD) of a new experimental drug or combination of known drugs for use in a phase II trial. Patients are typically accrued to the trial sequentially in cohorts of size m, and dose level assignment to a given cohort of patients is dependent upon the dose levels and toxicity outcomes of the previously treated cohorts of patients. A large number of statistical methodologies which account for the sequential nature of the data generated by such designs have been proposed in the literature; see [1, 2] for a comprehensive review of such methods. In particular, the continual reassessment method (CRM) proposed by [3] and its modifications [4-8] and escalation with overdose control (EWOC) described in [9-15] are Bayesian adaptive designs that produce consistent sequences of doses and can be easily implemented in practice using published tutorials and free interactive software; see, for example, [16-19].

m P Design Using EWOC
EWOC is a Bayesian adaptive design permitting precise determination of the MTD while directly controlling the likelihood of an overdose. It is the first statistical method to directly incorporate formal safety constraints into the design of cancer phase I clinical trials. Zacks et al. [10] and Tighiouart and Rogatko [15] discuss statistical properties and coherence of the method, and a comparison of EWOC with alternative phase I design methods is given in [9]. Babb and Rogatko [11] provide a summary of Bayesian phase I design methods and Tighiouart et al. [12] studied the performance of EWOC under a richer class of prior distributions for the model parameters. The defining property of EWOC is that the expected proportion of patients treated at doses above the MTD is equal to a specified value α, the feasibility bound. This value is selected by the clinician and reflects his or her level of concern about overdosing. Zacks et al. [10] showed that among designs with this defining property, EWOC minimizes the average amount by which patients are underdosed. This means that EWOC approaches the MTD as rapidly as possible, while keeping the expected proportion of patients overdosed less than the value α. Zacks et al. [10] also showed that, as a trial progresses, the dose sequence defined by EWOC approaches the MTD (i.e., the sequence of recommended doses converges in probability to the MTD). Eventually, all patients beyond a certain time would be treated at doses sufficiently close to the MTD.
EWOC has been used to design more than a dozen phase I studies approved by the Research Review Committee and the Institutional Review Board of the Fox Chase Cancer Center, Philadelphia, Winship Cancer Institute, Atlanta, and Cedars Sinai Medical Center, Los Angeles (see [23-29] for some of the published trials).
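We adopt the logistic model to represent the dose-toxicity relationship:

\[ P(\mathrm{DLT} \mid \mathrm{Dose} = x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}, \tag{2.1} \]

where $(\beta_0, \beta_1) \in (-\infty, \infty) \times (0, \infty)$, so that the probability of dose limiting toxicity (DLT) is an increasing function of dose. The MTD $\gamma$ is defined as the dose expected to produce DLT in a specified proportion $\theta$ of patients. Let $\rho_0$ be the probability of a DLT at the starting dose. To facilitate interpretation of model parameters by the clinicians, we further parameterize model (2.1) in terms of $(\rho_0, \gamma)$; see [9, 12] for more details.

Suppose we plan to enroll n patients in the trial in cohorts of size m. Dose levels in the trial are selected in the interval $[X_{\min}, X_{\max}]$ and an m P design proceeds as follows. We first specify prior distributions for $\rho_0$ and $\gamma$. Then, the first cohort of m patients receives the dose $x_1 = X_{\min}$. Let $d_1$ be the number of toxicities observed among the first m patients. The likelihood given the observed data thus far is

\[ L(\rho_0, \gamma \mid D_1) = p(\rho_0, \gamma, x_1)^{d_1} \, \bigl(1 - p(\rho_0, \gamma, x_1)\bigr)^{m - d_1}, \tag{2.2} \]

where, writing $\mathrm{logit}(u) = \ln\{u/(1-u)\}$,

\[ p(\rho_0, \gamma, x_1) = \frac{\exp\left\{ \dfrac{(\gamma - x_1)\,\mathrm{logit}(\rho_0) + (x_1 - X_{\min})\,\mathrm{logit}(\theta)}{\gamma - X_{\min}} \right\}}{1 + \exp\left\{ \dfrac{(\gamma - x_1)\,\mathrm{logit}(\rho_0) + (x_1 - X_{\min})\,\mathrm{logit}(\theta)}{\gamma - X_{\min}} \right\}} \tag{2.3} \]

and $D_1 = \{(x_1, d_1)\}$. Let $\Pi_1(x)$ be the marginal posterior cumulative distribution function (cdf) of the MTD $\gamma$ given $D_1$. The second cohort of m patients receives the dose $x_2 = \Pi_1^{-1}(\alpha)$, so that the posterior probability of exceeding the MTD is equal to the feasibility bound α.

In general, the likelihood of the data after observing the toxicity outcomes of the ith cohort of m patients is

\[ L(\rho_0, \gamma \mid D_i) = \prod_{j=1}^{i} p(\rho_0, \gamma, x_j)^{d_j} \, \bigl(1 - p(\rho_0, \gamma, x_j)\bigr)^{m - d_j}, \tag{2.4} \]

where $x_j$ is the dose assigned to the jth cohort of m patients, $d_j$ is the number of toxicities observed in that cohort, $p(\rho_0, \gamma, x_j)$ is given by (2.3) with $x_1$ replaced by $x_j$, and $D_i = \{(x_j, d_j),\ j = 1, \ldots, i\}$. The (i + 1)st cohort of m patients receives the dose $x_{i+1} = \Pi_i^{-1}(\alpha)$, where $\Pi_i(x)$ is the marginal posterior cdf of $\gamma$ given $D_i$. This process is repeated until a total of k cohorts are enrolled in the trial. This completes the description of an m P design.

For a given sample size n, we propose to compare the performance of a 1 P design with a 3 P design by estimating the percent of patients treated within a neighborhood of the true MTD. Other comparisons include safety and efficiency of the estimate of the MTD under the two designs.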
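The dose-selection rule described above can be sketched numerically. The following is only an illustration under stated assumptions, not the authors' software: the posterior is approximated on a midpoint grid, the priors are taken to be independent uniforms on (0, θ) and (X_min, X_max), and the reparameterized logistic curve is reconstructed from the definitions of ρ0 and γ (it equals ρ0 at X_min and θ at γ). The function names `dlt_prob` and `next_dose` are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def dlt_prob(rho0, gamma, x, theta, x_min=0.0):
    """P(DLT at dose x): logistic in x, equal to rho0 at x_min and to theta at gamma."""
    eta = ((gamma - x) * logit(rho0) + (x - x_min) * logit(theta)) / (gamma - x_min)
    p = 1.0 / (1.0 + math.exp(-max(-700.0, min(700.0, eta))))
    return min(max(p, 1e-12), 1.0 - 1e-12)  # keep log(p) and log(1 - p) finite

def next_dose(doses, dlts, m, theta=0.3, alpha=0.25, x_min=0.0, x_max=1.0, grid=50):
    """Alpha-quantile of the (grid-approximated) marginal posterior of the MTD.

    doses[j] is the dose given to cohort j, dlts[j] the number of DLTs among
    its m patients. Uniform priors are an assumption of this sketch.
    """
    rho0_grid = [theta * (i + 0.5) / grid for i in range(grid)]
    gamma_grid = [x_min + (x_max - x_min) * (i + 0.5) / grid for i in range(grid)]
    weights = []  # unnormalized marginal posterior of gamma
    for g in gamma_grid:
        total = 0.0
        for r in rho0_grid:
            log_lik = 0.0
            for x, d in zip(doses, dlts):
                p = dlt_prob(r, g, x, theta, x_min)
                log_lik += d * math.log(p) + (m - d) * math.log(1.0 - p)
            total += math.exp(log_lik)  # summing over rho0 marginalizes it out
        weights.append(total)
    norm = sum(weights)
    cdf = 0.0
    for g, w in zip(gamma_grid, weights):
        cdf += w / norm
        if cdf >= alpha:
            return g
    return gamma_grid[-1]
```

Under this model an outcome at the starting dose X_min carries no information about γ (the DLT probability there is ρ0 whatever γ is), so after the first cohort the recommended dose is simply the α-quantile of the prior on γ. A DLT at a later dose pulls the next recommendation down, and a non-toxic outcome pushes it up.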

Sample Size Determination
An increasing number of clinicians inquire about the number of patients they need to accrue in the design of cancer phase I trials to achieve a specific goal. Sample size recommendations based on the expected number of patients treated at each dose level in "3 + 3" designs and A + B designs have been studied in [21, 22], respectively. However, these methods apply to a prespecified set of discrete doses and it is not clear how they can be applied to continuous doses. Unlike the frequentist approach, there is no consensus on a specific Bayesian method for the sample size determination (SSD) problem; see Adcock [30] for a review of Bayesian approaches. In this paper, we present numerical results based on the posterior variance of the MTD and the highest posterior density (HPD) interval; see [31].
Denote by $\mathrm{Var}(\gamma \mid D_n)$ the posterior variance of the MTD given that n patients have been accrued to the trial. The first criterion is to find the smallest n that satisfies

\[ E_{D_n}\left[ \mathrm{Var}(\gamma \mid D_n)^{1/2} \right] \le \eta, \tag{3.1} \]

where the above expectation is taken with respect to the marginal distribution of the data and η is specified by the clinician. In other words, we require an estimate of the MTD within a given accuracy, as measured by the posterior variance, on average over all possible trials.
For the second criterion, we seek the smallest n such that

\[ E_{D_n}\left[ l(D_n) \right] \le d, \tag{3.2} \]

where $l(D_n)$ is the length of the HPD interval $(a, a + l(D_n))$ determined by the constraint on the coverage probability

\[ \int_{a}^{a + l(D_n)} \pi(\gamma \mid D_n) \, d\gamma = 1 - \alpha_1, \tag{3.3} \]

with $\pi(\gamma \mid D_n)$ denoting the marginal posterior density of the MTD given $D_n$. This is also known as the average length criterion (ALC) because, for each realization of a trial $D_n$, the corresponding HPD interval is determined by (3.3), and the lengths of these HPD intervals are averaged with respect to the marginal distribution of the data in (3.2). The tolerance value d for the average length of the HPD interval and the coverage probability $1 - \alpha_1$ are prespecified by the clinician. Since both the posterior distribution of the MTD and the marginal distribution of the data are intractable, Monte Carlo averages were used to estimate the left hand sides of (3.1) and (3.2). Details on the computation of $\mathrm{Var}(\gamma \mid D_n)$ and $l(D_n)$ can be found in [9, 18].
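To make the two accuracy measures concrete, here is a minimal sketch of how a posterior standard deviation and an HPD interval might be computed once the posterior of the MTD has been discretized on a grid. It is not the computation referenced in [9, 18]; it assumes a unimodal posterior, for which the HPD interval coincides with the shortest interval holding the required mass. The helper names, including `smallest_n`, are hypothetical.

```python
import math

def posterior_sd(values, probs):
    """Standard deviation of a discrete posterior given as (values, probs)."""
    mean = sum(v * p for v, p in zip(values, probs))
    return math.sqrt(sum((v - mean) ** 2 * p for v, p in zip(values, probs)))

def hpd_interval(values, probs, coverage=0.90, tol=1e-12):
    """Shortest run of consecutive grid points with total mass >= coverage.

    values must be sorted in increasing order; tol guards against
    floating-point round-off when the mass sits exactly at the coverage.
    """
    best = (values[0], values[-1])
    lo, mass = 0, 0.0
    for hi in range(len(values)):
        mass += probs[hi]
        # drop points on the left while the window still holds enough mass
        while mass - probs[lo] >= coverage - tol:
            mass -= probs[lo]
            lo += 1
        if mass >= coverage - tol and values[hi] - values[lo] < best[1] - best[0]:
            best = (values[lo], values[hi])
    return best

def smallest_n(average_by_n, tolerance):
    """Smallest sample size whose averaged criterion is within tolerance, else None."""
    feasible = [n for n, avg in sorted(average_by_n.items()) if avg <= tolerance]
    return feasible[0] if feasible else None
```

In use, `average_by_n` would map each candidate sample size to a Monte Carlo average of the posterior standard deviation (or HPD length) over simulated trials, and `tolerance` would play the role of η (or d).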

Numerical Results
The simulation results presented below all assume that the feasibility bound is α = 0.25 and that the dose levels are standardized so that the starting dose for each trial is $x_1 = 0$ and all subsequent dose levels are selected from the unit interval. Independent uniform prior distributions were put on the parameters $\rho_0$ and $\gamma$ on the intervals [0, θ] and [0, 1], respectively.

Comparison of Designs 3 P with 1 P
We simulate trials under different scenarios corresponding to different values of $\rho_0$ and $\gamma$. For the 1 P design, the first patient receives dose 0 and the next dose $x_2$ is determined as described in Section 2. The second response $y_2$ is then generated from the logistic model (2.3). This process is repeated until a trial of n patients is generated. The same process applies to the 3 P design except that 3 patients are given the same dose at each stage of the trial and 3 responses, instead of 1, are generated independently from model (2.3). Since $0 \le \rho_0 \le \theta$ and $0 \le \gamma \le 1$, we considered 12 scenarios corresponding to combinations of three values of $\rho_0$, {θ/4, θ/2, 3θ/4}, with four values of $\gamma$, 0.2, 0.4, 0.6, and 0.8. We will refer to θ/4, θ/2, and 3θ/4 as low, intermediate, and high values for $\rho_0$, respectively. Similarly, 0.2 and 0.4 will be referred to as low values for the MTD $\gamma$ and 0.6 and 0.8 as high values. The same value θ = 0.3 was used in all simulations. For each design, each sample size n = 12, 18, 24, 30, and each combination of $(\rho_0, \gamma)$, we simulated 5000 trials and calculated the proportions of patients given therapeutic doses, that is, doses in an ε-neighborhood of the true MTD, for ε = 0.05, 0.1, 0.15, 0.2.
Table 1 gives the estimated proportions of patients given doses in an ε-neighborhood of the true MTD under designs 1 P and 3 P and the difference in these proportions between the two designs for low values of the true MTD $\gamma$ and different sample sizes. Table 2 gives the corresponding estimates for high values of the true MTD and Table 3 gives the average of these estimates across the 12 combinations of $(\rho_0, \gamma)$. For low values of the true MTD, design 1 P in general assigns more patients to doses near the MTD than design 3 P, and the difference can be as high as 16% for ε = 0.05, $(\gamma, \rho_0)$ = (0.4, 0.075), and n = 12. For high values of the MTD, Table 2 shows that design 1 P always assigns more patients to doses near the MTD than design 3 P, and the highest difference is about 16% for ε = 0.2, $(\gamma, \rho_0)$ = (0.6, 0.075), and n = 12. The estimated differences between the 1 P and 3 P designs in the proportions of patients given doses in an ε-neighborhood of the true MTD, averaged across the 12 scenarios for $(\rho_0, \gamma)$ and reported for different sample sizes, show that the proportion of patients given therapeutic doses under design 1 P is always greater than the corresponding proportion under design 3 P; the largest of these differences is about 5%. This difference has little practical impact because of the relatively small number of patients involved in phase I cancer clinical trials.

In Tables 4 and 5, we present differences between the 1 P and 3 P designs in (i) the proportions of patients exhibiting DLT, (ii) the proportions of patients given doses above the "true" MTD, (iii) the bias, and (iv) the mean square error. Table 6 gives the values of these statistics averaged across the 12 scenarios for $(\rho_0, \gamma)$. Based on (i) and (ii), the results indicate that the two designs are equally safe, and according to (iii) and (iv), no practical gain is achieved in terms of the efficiency of the estimate of the MTD. From an ethical point of view, we recommend the 1 P design, which prevents the three simultaneous DLTs that can occur under the 3 P design. This should be discussed with the clinician after assessing the importance of the length of the trial.
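The summary statistics compared in this section are simple functions of the doses, toxicity outcomes, and MTD estimates produced by each simulated trial. The sketch below shows one way they might be computed. The function names are hypothetical, and treating the ε-neighborhood as the closed interval |x - γ| <= ε is an assumption, since the text does not say whether the endpoints are included.

```python
def trial_summary(doses, dlt_flags, true_mtd, eps=0.1):
    """Per-trial safety and dosing summaries.

    doses[i] is the dose given to patient i and dlt_flags[i] is 1 if that
    patient had a DLT, else 0.
    """
    n = len(doses)
    return {
        # share of patients dosed within eps of the true MTD (closed interval assumed)
        "near_mtd": sum(abs(x - true_mtd) <= eps for x in doses) / n,
        # share of patients dosed above the true MTD
        "above_mtd": sum(x > true_mtd for x in doses) / n,
        # share of patients with a dose limiting toxicity
        "dlt_rate": sum(dlt_flags) / n,
    }

def bias_and_mse(mtd_estimates, true_mtd):
    """Bias and mean square error of the MTD estimates across simulated trials."""
    errors = [g - true_mtd for g in mtd_estimates]
    bias = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return bias, mse
```

Averaging `trial_summary` over the simulated trials of each design, and taking the difference between designs, would give quantities of the kind reported in the tables.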

Sample Size Determination
In this section, we present tabulated values for the average posterior standard deviation of the MTD and the average length of the HPD interval that are achieved for even sample sizes n = 6, ..., 40 and selected values of θ, the target probability of DLT. Table 7 summarizes the results for θ = 0.3. For a given sample size n, each entry in the table was calculated according to the following algorithm. Set j = 1.
(i) Generate $(\rho_{0,j}, \gamma_j)$ from the prior distribution of $(\rho_0, \gamma)$.
(ii) Simulate a trial of n patients $D_{n,j}$ according to the EWOC algorithm described in Section 4.1 with $(\rho_{0,j}, \gamma_j)$ as the true model parameters.
(iii) Calculate the posterior variance $\mathrm{Var}(\gamma \mid D_{n,j})$ and the HPD interval $(a_j, a_j + l(D_{n,j}))$ using (3.3).
(iv) Repeat steps (i)-(iii) for j = 2, ..., M.
The left hand sides of (3.1) and (3.2) are estimated by

\[ \frac{1}{M} \sum_{j=1}^{M} \mathrm{Var}(\gamma \mid D_{n,j})^{1/2}, \qquad \frac{1}{M} \sum_{j=1}^{M} l(D_{n,j}), \tag{4.1} \]

respectively.
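The Monte Carlo procedure above can be sketched end to end in a few dozen lines. This is a rough illustration under several assumptions, not the code behind Table 7: single-patient cohorts, doses standardized to [0, 1] with X_min = 0, independent uniform priors, true parameters for each simulated trial drawn from those priors, a coarse grid approximation of the posterior, and a unimodal posterior so that the shortest interval stands in for the HPD interval. All function names are hypothetical.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def dlt_prob(rho0, gamma, x, theta):
    """P(DLT at dose x) with X_min = 0: equals rho0 at x = 0 and theta at x = gamma."""
    eta = logit(rho0) + (logit(theta) - logit(rho0)) * x / gamma
    p = 1.0 / (1.0 + math.exp(-min(eta, 700.0)))
    return min(max(p, 1e-12), 1.0 - 1e-12)

def simulate_trial(n, rho0, gamma, theta, alpha, rng, grid=25):
    """Simulate one trial (n >= 1); return the gamma grid and its posterior probabilities."""
    r_grid = [theta * (i + 0.5) / grid for i in range(grid)]
    g_grid = [(i + 0.5) / grid for i in range(grid)]
    log_post = [[0.0] * grid for _ in range(grid)]  # flat priors, indexed [gamma][rho0]
    x = 0.0                                          # starting dose
    for _ in range(n):
        y = rng.random() < dlt_prob(rho0, gamma, x, theta)  # outcome under the true model
        for a, g in enumerate(g_grid):
            for b, r in enumerate(r_grid):
                p = dlt_prob(r, g, x, theta)
                log_post[a][b] += math.log(p if y else 1.0 - p)
        top = max(max(row) for row in log_post)
        w = [sum(math.exp(v - top) for v in row) for row in log_post]
        total = sum(w)
        probs = [v / total for v in w]               # marginal posterior of gamma
        cdf = 0.0
        for g, pr in zip(g_grid, probs):             # next dose: alpha-quantile
            cdf += pr
            if cdf >= alpha:
                x = g
                break
    return g_grid, probs

def sd_and_hpd_length(values, probs, coverage):
    mean = sum(v * p for v, p in zip(values, probs))
    sd = math.sqrt(sum((v - mean) ** 2 * p for v, p in zip(values, probs)))
    step = values[1] - values[0]                     # width of one grid cell
    best, lo, mass = values[-1] - values[0], 0, 0.0
    for hi in range(len(values)):                    # shortest window with enough mass
        mass += probs[hi]
        while mass - probs[lo] >= coverage:
            mass -= probs[lo]
            lo += 1
        if mass >= coverage:
            best = min(best, values[hi] - values[lo])
    return sd, best + step

def average_accuracy(n, M=1000, theta=0.3, alpha=0.25, coverage=0.90, seed=0):
    """Monte Carlo averages of the posterior SD and HPD length for sample size n."""
    rng = random.Random(seed)
    sd_sum = len_sum = 0.0
    for _ in range(M):
        rho0 = rng.uniform(1e-6, theta)   # assumed: parameters drawn from the priors
        gamma = rng.uniform(1e-6, 1.0)    # (lower bound avoids division by zero)
        values, probs = simulate_trial(n, rho0, gamma, theta, alpha, rng)
        sd, length = sd_and_hpd_length(values, probs, coverage)
        sd_sum += sd
        len_sum += length
    return sd_sum / M, len_sum / M
```

A call such as `average_accuracy(20, M=1000)` returns the pair of averages for n = 20. The pure-Python grid is slow at that scale, and the coarse grid limits precision, so the output should be read as a rough approximation rather than a reproduction of the tabulated values.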
In the numerical results presented here, we took M = 1000. When θ = 0.3, Table 7 shows that with 6 patients, we can estimate the MTD with an average posterior standard deviation equal to 25% of the range of the dose and that a 17% decrease in the average posterior standard deviation is achieved when increasing the sample size from 6 to 40 patients. Similarly, the average length of the 90% HPD interval is 74% of the dose range when 6 patients are enrolled in the phase I trial and a reduction of 16% of this length is achieved when increasing the number of patients from 6 to 40. Figures 1 and 2 show the average posterior standard deviation and average lengths of the 95% HPD intervals as functions of the sample size n and target probability of DLT θ.

Illustrative Example
A randomized phase I clinical trial of the combination bortezomib and melphalan as conditioning for autologous stem cell transplant in patients with multiple myeloma was designed using EWOC and the results published in [27]. Patients are randomized to arm A, where a fixed dose of melphalan, 100 mg/m^2, is given before bortezomib, and arm B, where the same fixed dose of melphalan is given after bortezomib. The doses available for bortezomib are 0.4, 0.7, 1.0, 1.3, and 1.6 mg/m^2, with the first patient in either arm receiving 1.0 mg/m^2. For each arm, the MTD is defined to be the dose level of bortezomib that, when administered to a patient in combination with 100 mg/m^2 of melphalan (either before or after), results in a probability equal to θ = 0.33 that a dose limiting toxicity will be manifest.
In this trial, we start at α = 0.3 and increase α in small increments of 0.05 until α = 0.5, this value being a compromise between the therapeutic aspect of bortezomib and its toxic side effects. Since the doses in this trial are discrete, the dose allocated to the next patient is obtained by rounding down the dose recommended by the EWOC algorithm to the nearest discrete dose; see [9, 15] on how to conduct a trial in the presence of a prespecified set of discrete doses. Figure 3 shows all the possible dose sequences that could be realized for the first four patients, assuming that only one patient is treated at each dose, and a selected situation for patient 5. The principal investigator (PI) wanted to determine the number of patients to accrue in each arm so that the posterior standard deviation of the MTD is no more than one-fifth the range of the dose levels. This statistical constraint, combined with logistics such as availability of the resources for the PI, number of patients available, and limits on the duration of the trial, led us to select 20 patients per arm. In fact, a sample size of 20 results in an average posterior standard deviation $E_{D_{20}}\left[\mathrm{Var}(\gamma \mid D_{20})^{1/2}\right] \approx 0.228$; this is just below one-fifth the range of dose levels 0.4-1.6, that is, 0.24.
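Two small computations in this example lend themselves to a quick check: rounding a continuous EWOC recommendation down to the available dose levels, and comparing the average posterior standard deviation with one-fifth of the dose range. The sketch below uses the dose levels and the value 0.228 quoted above. The function names, and the choice to fall back to the lowest level when the recommendation lies below every available dose, are assumptions made for illustration.

```python
BORTEZOMIB_LEVELS = [0.4, 0.7, 1.0, 1.3, 1.6]  # mg/m^2, as listed in the trial description

def round_down_to_dose(recommended, dose_levels):
    """Largest available dose not exceeding the recommendation.

    Falls back to the lowest level if the recommendation is below all of
    them (an assumption; the text does not cover this case).
    """
    eligible = [d for d in sorted(dose_levels) if d <= recommended]
    return eligible[-1] if eligible else min(dose_levels)

def sd_target(dose_levels, fraction=0.2):
    """Tolerance on the posterior SD: a fraction of the dose range."""
    return fraction * (max(dose_levels) - min(dose_levels))

def meets_sd_target(average_sd, dose_levels, fraction=0.2):
    return average_sd <= sd_target(dose_levels, fraction)
```

For instance, a hypothetical recommendation of 1.12 mg/m^2 would be rounded down to 1.0 mg/m^2, and `meets_sd_target(0.228, BORTEZOMIB_LEVELS)` compares 0.228 with one-fifth of the 1.2 mg/m^2 dose range.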

Concluding Remarks
The objectives of this paper are to provide a rationale for the choice of cohort sizes and number of patients to accrue in a phase I cancer clinical trial when the Bayesian adaptive design EWOC is used. In these trials, patients are typically enrolled in cohorts of size three for no apparent reason other than being in agreement with the traditional "3 + 3" design and shortening the duration of the trial. We have shown through simulations that the two designs are equally safe and that no practical gain is achieved in terms of the efficiency of the estimate of the MTD. Depending on how important the length of the trial is to the clinician and the institution, we recommend using one patient per dose level to avoid seeing simultaneous toxic events when a group of patients is treated at the same dose level, as was the case in a recent phase I trial of the drug TGN1412; see [32]. In that trial, six volunteers were given what was believed to be a safe dose of an anti-inflammatory drug, TGN1412. Shortly after, all six were admitted into intensive care due to severe reactions including swelling of the head and neck. The simulation results were obtained by generating the toxicity responses using the logistic model (2.3). This assumption may not be true in practice and the operating characteristics of EWOC may be sensitive to model misspecification. However, for the purpose of model comparisons between 1 P and 3 P designs, any model misspecification for the probability of toxicity response will affect the two designs the same way.
In the second part of the paper, we addressed the SSD problem by giving tabulated values of the number of patients to accrue in a cancer phase I clinical trial as a function of the posterior standard deviation and length of the HPD interval of the MTD, on average over all possible trials. Although this aspect of the trial has never received much emphasis in the literature due to the relatively small number of patients and logistical issues associated with such trials, we felt that providing a measure of the accuracy of the estimate of the MTD that can be achieved for a given sample size would help clinicians understand what can and cannot be achieved during this phase of the trial. Our results show that, in general, there is a 17% decrease in the average posterior standard deviation of the MTD when the sample size increases from 6 to 40 patients and that, for a sample size of 20 patients, the average posterior standard deviation of the MTD is about one-fifth the range of the dose levels. Although this decrease in the average posterior standard deviation seems modest, we note that it depends on the choice of prior distribution for the MTD. A more informative prior based on past data will result in smaller average posterior standard deviations and narrower HPD intervals.

Figure 1: Estimated mean posterior standard deviation as a function of the number of patients accrued to the trial for different target probabilities of DLT θ.

Figure 3: All the possible dose sequences that could be realized for the first four patients and a selected situation for patient 5. It assumes no simultaneous treatment of patients.

Table 1: Estimated proportions of patients given doses in an ε-neighborhood of the true MTD under designs 1

Table 3: Estimated proportions of patients given doses in an ε-neighborhood of the true MTD under designs 1 P and 3 P and differences between these proportions on the average.

Table 4: Estimated proportions of patients exhibiting DLTs, treated above the MTD, MSE, and bias of the MTD under designs 1

Table 5: Estimated proportions of patients exhibiting DLTs, treated above the MTD, MSE, and bias of the MTD under designs 1

Table 6: Estimated proportions of patients exhibiting DLTs, treated above the MTD, MSE, and bias of the MTD under designs 1 P and 3 P and differences between these proportions on the average.

Table 7: Average posterior standard deviation and average length of HPD of the posterior distribution of the MTD that are achieved for a given sample size for θ = 0.3.