Biostatistical Assessment of Mutagenicity Studies: A Stepwise Confidence Procedure

The paper addresses the identification of the maximum safe dose in noninferiority trials in which several doses of a toxicological compound are tested. Statistical methodology for identifying the maximum safe dose is available for three-arm noninferiority designs with only one experimental treatment. Extensions of this methodology to several experimental groups exist, but they require multiplicity adjustment. However, if the experimental (treatment) groups can be ordered a priori according to their treatment effect, no multiplicity adjustment is needed. Assuming normally distributed data with homogeneous variances across dose groups, we employed the generalized Fieller confidence interval method in a stepwise multiple comparison procedure that incorporates the partitioning principle in order to control the familywise error rate (FWER). Simulation results revealed that the procedure controlled the FWER in the strong sense. Moreover, the power of the procedure increases with increasing sample size and ratio of mean differences. We illustrate the procedure with a mutagenicity dataset from a micronucleus assay.


Introduction
Assessing an investigational substance for mutagenic activity is one of the vital concerns of genetic toxicologists, because it is unacceptable to declare a substance nonmutagenic when it is in fact mutagenic. Hence, the objective of a mutagenicity assay in regulatory toxicology is a decision on the mutagenicity or nonmutagenicity of an investigational substance (Hothorn et al. [1]). Any statistical procedure, however, carries the possibility of a false decision, so it is important to adopt a reliable biostatistical procedure that controls the familywise error rate (FWER) in the strong sense. A typical experimental design for genotoxicity assessment in this assay is a one-way layout with $k + 2$ groups: a negative control, $k$ doses $D_1, \dots, D_k$ of the investigational substance, and a positive control, indexed $0, 1, \dots, k, k+1$, respectively. In this setup, we have two objectives. First, we need to assess the sensitivity of the experiment, in order to ensure the validity of the study, by comparing the positive control with the negative control. Second, we simultaneously compare each of the $k$ treatments with the negative control. Statistical decisions in this setting involve multiple comparison and stepwise procedures: individual inferences are made stepwise, following a prespecified order, as in Stefensson et al. [2], Cao et al. [3], Chen [4], and Adjabui et al. [5]. Some simultaneous inference procedures avoid multiplicity adjustment by invoking the partitioning principle of Finner and Strassburger [6], in which the parameter space is partitioned into disjoint subsets, exactly one of which contains the true parameter of interest, so that the FWER is properly controlled. In the literature, mutagenicity data have been assessed as a proof of safety using the concept of the maximum safe dose (Hothorn and Hauschke [7]) by numerous authors, among them Hauschke and Hothorn [8], Hauschke et al. [9], and Hothorn and Bretz [10]. Accordingly, this article discusses statistical aspects of design and analysis using a stepwise confidence set-based procedure for identifying the maximum safe dose, that is, the highest experimental dose with no biologically relevant increase in the safety endpoint in comparison with the negative control (Hothorn and Hauschke [9]).
We organize the article as follows. In Section 2, we provide the testing and confidence notation needed to construct the proposed stepwise confidence procedure. In Section 3, we propose a stepwise confidence interval procedure for identifying the maximum safe dose for normally distributed data with equal variances across dose groups. In Section 4, we carry out simulation studies to investigate the performance of the procedure in terms of FWER and power. We apply the procedure to a real dataset in Section 5 and conclude in Section 6.
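For ethical reasons, a negative control group can be included in the trial. Therefore, the testing problem can be written as
$$H_{0i}: \theta_i \ge \delta \quad \text{versus} \quad H_{1i}: \theta_i < \delta, \qquad i = 1, 2, \dots, k, \tag{3}$$
where $\delta$ is the prespecified safety threshold and $\theta_i$ is the ratio of differences in means,
$$\theta_i = \frac{\mu_i - \mu_0}{\mu_{k+1} - \mu_0}. \tag{4}$$
Equation (4) is valid if and only if $\mu_{k+1} - \mu_0 > 0$; this condition is indispensable and must be established in the first step of our stepwise procedure in order to assess the sensitivity of the trial. Multiplying through by $\mu_{k+1} - \mu_0 > 0$, we can express (3) as
$$H_{0i}: \mu_i - \delta\mu_{k+1} - (1-\delta)\mu_0 \ge 0 \quad \text{versus} \quad H_{1i}: \mu_i - \delta\mu_{k+1} - (1-\delta)\mu_0 < 0.$$
Let the sample mean estimates be $\hat\mu_j = \bar X_j = n_j^{-1}\sum_{l=1}^{n_j} X_{jl}$, $j = 0, 1, \dots, k+1$. The unknown common variance $\sigma^2$ can be estimated as
$$\hat\sigma^2 = \frac{(n_i - 1)s_i^2 + (n_{k+1} - 1)s_{k+1}^2 + (n_0 - 1)s_0^2}{n_i + n_{k+1} + n_0 - 3},$$
where $\hat\sigma^2$ is the pooled estimator of $\sigma^2$, and $s_i^2$, $s_{k+1}^2$ and $s_0^2$ denote the sample variances of the experimental, positive control and negative control groups, respectively. Then the random variables
$$T_i = \frac{\delta\hat\mu_{k+1} + (1-\delta)\hat\mu_0 - \hat\mu_i}{\hat\sigma\sqrt{1/n_i + \delta^2/n_{k+1} + (1-\delta)^2/n_0}}, \qquad i = 1, 2, \dots, k,$$
are the test statistics for the testing problem in (3); at the boundary $\theta_i = \delta$, each has a central $t$ distribution with $\nu = n_i + n_{k+1} + n_0 - 3$ degrees of freedom. Pigeot et al. [13] have proved that one can claim safety if
$$T_i > t_{1-\alpha,\nu},$$
where $t_{1-\alpha,\nu}$ is the $(1-\alpha)$-percentile of the central $t$ distribution with $\nu$ degrees of freedom. There are two approaches to solving this testing problem, namely, the p-value approach and the confidence interval approach. The confidence interval approach is generally preferred in the literature because, besides the test decision, it conveys the size of the effect. Therefore, in this study, we construct a confidence set-based approach for $\theta_i$, $i = 1, 2, \dots, k$, that requires no multiplicity adjustment. The maximum safe dose (MSD) for the proof of safety was defined by Hothorn and Hauschke [7] as
$$\mathrm{MSD} = D_M, \qquad M = \max\{i : H_{0j} \text{ is rejected for all } j = 1, \dots, i\},$$
where $H_{0i}$ is rejected if $T_i > t_{1-\alpha,\nu}$ (that is, $\theta_i < \delta$ is concluded) at a given level $\alpha$. Safety can then be concluded for the doses $D_1, \dots, D_M$. To solve the testing problem in (3), we construct simultaneous confidence sets using the intersection-union principle of Berger [14]: the global null hypothesis is expressed as the union of the individual null hypotheses $H_{0i}$, tested against the intersection of the alternative hypotheses $H_{1i}$, that is,
$$H_0 = \bigcup_{i=1}^{M} H_{0i} \quad \text{versus} \quad H_1 = \bigcap_{i=1}^{M} H_{1i}.$$
If $H_{0M}$ is rejected, then $H_{0i}$, $i = 1, 2, \dots, M - 1$, have also been rejected in a stepwise fashion. In this case, no multiplicity adjustment is needed. Note that these hypotheses are ordered a priori according to their importance, interest and prior beliefs, but no order restrictions are imposed on the parameters.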
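The pooled variance and the statistic $T_i$ can be sketched in a few lines of Python. This is a minimal illustration, assuming the sign convention in which a large positive $T_i$ supports safety ($\theta_i < \delta$); the function name `ratio_test_statistic` and the toy data are hypothetical, and the critical value $t_{1-\alpha,\nu}$ must be supplied separately (for example from a $t$ table or `scipy.stats.t.ppf`).

```python
import math
from statistics import mean, variance


def ratio_test_statistic(dose, negative, positive, delta):
    """T_i for H0i: theta_i >= delta vs H1i: theta_i < delta (hypothetical helper).

    theta_i = (mu_i - mu_0) / (mu_{k+1} - mu_0). A large positive T_i is
    evidence that theta_i < delta. Returns (T_i, degrees of freedom).
    """
    n_i, n_0, n_p = len(dose), len(negative), len(positive)
    m_i, m_0, m_p = mean(dose), mean(negative), mean(positive)
    # Pooled variance of the three groups involved in comparison i
    df = n_i + n_p + n_0 - 3
    s2 = ((n_i - 1) * variance(dose)
          + (n_p - 1) * variance(positive)
          + (n_0 - 1) * variance(negative)) / df
    # Estimated contrast delta*(mu_{k+1} - mu_0) - (mu_i - mu_0)
    contrast = delta * (m_p - m_0) - (m_i - m_0)
    se = math.sqrt(s2 * (1 / n_i + delta**2 / n_p + (1 - delta)**2 / n_0))
    return contrast / se, df


t_stat, df = ratio_test_statistic([3, 4, 5], [1, 2, 3], [11, 12, 13], delta=0.5)
print(t_stat, df)
```

Safety for dose $i$ would then be claimed when `t_stat` exceeds $t_{1-\alpha,\nu}$ for `df` degrees of freedom.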

Fieller's Confidence Interval.
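We employ the generalized Fieller theorem [15] to construct confidence intervals for $\theta_i$, $i = 1, 2, \dots, k$. This requires solving $k$ quadratic equations, for which we adapt the following notation from Hasler et al. [11]. With $q = t_{1-\alpha,\nu}^2\hat\sigma^2$, let
$$\begin{aligned} A &= (\hat\mu_{k+1} - \hat\mu_0)^2 - q\left(\frac{1}{n_{k+1}} + \frac{1}{n_0}\right),\\ B_i &= -2\left[(\hat\mu_i - \hat\mu_0)(\hat\mu_{k+1} - \hat\mu_0) - \frac{q}{n_0}\right],\\ C_i &= (\hat\mu_i - \hat\mu_0)^2 - q\left(\frac{1}{n_i} + \frac{1}{n_0}\right), \end{aligned}$$
which are the coefficients of $A\theta^2 + B_i\theta + C_i = 0$, obtained by squaring $T_i$ with $\delta$ replaced by $\theta$ and setting it equal to $t_{1-\alpha,\nu}^2$. By Fieller's theorem [15], the resulting confidence interval is valid only as long as $A > 0$, that is,
$$\frac{\hat\mu_{k+1} - \hat\mu_0}{\hat\sigma\sqrt{1/n_{k+1} + 1/n_0}} > t_{1-\alpha,\nu},$$
i.e., the positive control response is significantly larger than the negative control response. The upper confidence limits of the one-sided $100(1 - \alpha)\%$ confidence intervals for the parameters $\theta_i$ are then
$$\hat\theta_i^{U} = \frac{-B_i + \sqrt{B_i^2 - 4AC_i}}{2A}, \qquad i = 1, 2, \dots, k.$$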
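A sketch of the one-sided Fieller upper bound follows, assuming the quadratic obtained by squaring the pivot $T_i$ with $\delta$ replaced by $\theta$ and taking its larger root. The function name and argument order are hypothetical; it returns `None` when the assay-sensitivity condition ($A > 0$ with a positive estimated control difference) fails, in which case no finite upper bound exists.

```python
import math


def fieller_upper_bound(m_i, m_0, m_p, s2, n_i, n_0, n_p, t_crit):
    """One-sided upper Fieller bound for theta_i = (mu_i - mu_0)/(mu_p - mu_0).

    m_*: sample means; s2: pooled variance estimate; t_crit: t_{1-alpha,nu}.
    Returns None if the positive control is not significantly above the
    negative control (no finite bound).
    """
    num = m_i - m_0          # estimate of mu_i - mu_0
    den = m_p - m_0          # estimate of mu_{k+1} - mu_0
    q = t_crit**2 * s2
    # A*theta^2 + B*theta + C = 0 comes from
    # (num - theta*den)^2 = q * (1/n_i + theta^2/n_p + (1 - theta)^2/n_0)
    a = den**2 - q * (1 / n_p + 1 / n_0)
    if den <= 0 or a <= 0:
        return None
    b = -2 * (num * den - q / n_0)
    c = num**2 - q * (1 / n_i + 1 / n_0)
    # With a > 0 the discriminant is positive; the larger root is the bound
    return (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)


ub = fieller_upper_bound(4.0, 2.0, 12.0, 1.0, 3, 3, 3, t_crit=1.943)
print(ub)
```

At the returned bound the pivot equals $-t_{1-\alpha,\nu}$ exactly, which is a convenient check on any implementation.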

Stepwise Confidence Interval for Identifying Maximum Safe Dose Based on Ratio of Mean Differences.
We identify the maximum safe dose via the stepwise confidence set procedure of Hsu and Berger [16]. In the first step, we establish the assay sensitivity of the experiment by showing that the positive control response is significantly larger than the negative control response, that is, $\hat\mu_{k+1} - \hat\mu_0 > t_{1-\alpha,\nu}\,\hat\sigma\sqrt{1/n_{k+1} + 1/n_0}$. If not, the procedure stops, indicating that the sensitivity of the experiment is inadequate. In the second step, we estimate the upper confidence limits $\hat\theta_i^{U}$, $i = 1, 2, \dots, k$, where $k$ is the total number of treatment doses to be tested.
In the third step, we screen the doses sequentially, starting with the lowest dose ($i = 1$) and proceeding in ascending order to $i = 2, 3, \dots, k$ without adjusting the level $\alpha$ at any step, and search for the first integer $M$, $1 \le M \le k$, if it exists, such that $\hat\theta_M^{U} < \delta$ and $\hat\theta_{M+1}^{U} \ge \delta$; the dose at step $M + 1$ is thus the first dose that cannot be shown to be noninferior to the reference. The dose at step $M$ is then the estimated MSD: the highest dose declared safe, that is, noninferior to the reference, such that all lower doses at steps $1, 2, \dots, M - 1$ are also noninferior. If all $k$ upper bounds fall below $\delta$, the highest dose $D_k$ is the estimated MSD; if $\hat\theta_1^{U} \ge \delta$, no dose can be declared safe. Once the dose at step $M$ has been identified as the MSD, the upper confidence bounds for the doses at steps $M + 2, M + 3, \dots, k$ are not needed and should not be computed. This procedure is theoretically more powerful than the Bonferroni-Holm step-down procedure (Holm [17]) because $\alpha$ is never exhausted: the full $\alpha$ is used at each step without multiplicity adjustment, whereas the Bonferroni-Holm step-down procedure spends $\alpha$ through the levels $\alpha/k, \alpha/(k-1), \dots, \alpha/2, \alpha$ and is therefore conservative, especially when $k$ is large. The partitioning principle employed in our procedure overcomes this conservativeness.
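The three steps can be sketched as a small Python function, assuming the upper bounds are supplied lazily in ascending dose order so that bounds beyond step $M + 1$ are never computed. The name `stepwise_msd` and the `assay_sensitive` flag are hypothetical; the usage example feeds in the three bounds reported in the Example section (0.24, 0.35, 0.74) for the hydroquinone doses.

```python
from typing import Iterable


def stepwise_msd(upper_bounds: Iterable[float], delta: float,
                 assay_sensitive: bool = True) -> int:
    """Return M, the number of doses declared safe (0 if none).

    upper_bounds: one-sided upper confidence bounds for theta_1, theta_2, ...
    in ascending dose order, each at the full level alpha (no adjustment).
    The dose at step M is the estimated maximum safe dose.
    """
    if not assay_sensitive:      # step 1 failed: stop without any safety claim
        return 0
    m = 0
    for bound in upper_bounds:   # step 3: lowest dose first
        if bound >= delta:       # first dose not shown safe: stop screening
            break
        m += 1                   # H0i rejected: dose i declared safe
    return m


doses = [30, 50, 75, 100]        # mg/kg, hydroquinone example
m = stepwise_msd(iter([0.24, 0.35, 0.74]), delta=0.5)
print(m, doses[m - 1] if m else None)  # prints: 2 50
```

Because the loop breaks at the first bound that is not below $\delta$, passing a generator means later bounds are never evaluated, which mirrors the rule that bounds after step $M + 1$ should not be computed.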

Validity of the Stepwise Procedure.
To construct and validate $100(1 - \alpha)\%$ simultaneous confidence sets for estimating the MSD with the above procedure, each individual confidence interval should have confidence level $100(1 - \alpha)\%$. For each $\theta_i$, the alternative (safety) region, in which $H_{0i}$ is rejected, is $(-\infty, \delta)$, and the null region is $[\delta, \infty)$. We can construct a simultaneous confidence set for the parameter vector $\Gamma = (\theta_1, \theta_2, \dots, \theta_k)$ by employing the partitioning principle (Bretz et al. [18]). In identifying the MSD, the parameter space $\Theta$ can be decomposed into nonempty disjoint subsets as follows:
$$\begin{aligned} \Theta_1^* &= \{\Gamma \in \Theta : \theta_1 \ge \delta\},\\ \Theta_i^* &= \{\Gamma \in \Theta : \theta_1 < \delta, \dots, \theta_{i-1} < \delta,\ \theta_i \ge \delta\}, \quad i = 2, \dots, k,\\ \Theta_{k+1}^* &= \{\Gamma \in \Theta : \theta_1 < \delta, \dots, \theta_k < \delta\}. \end{aligned} \tag{12}$$
Therefore, $\Theta_1^*, \Theta_2^*, \dots, \Theta_{k+1}^*$ partition the entire parameter space, that is, $\Theta = \Theta_1^* \cup \Theta_2^* \cup \dots \cup \Theta_{k+1}^*$. Each subset $\Theta_i^*$ is tested at local level $\alpha$; because the true parameter lies in one and only one of these disjoint subsets, this construction leads to a multiple comparison procedure that controls the familywise error rate in the strong sense.
Theorem 1. Suppose that $\hat\theta_1^{U}, \hat\theta_2^{U}, \dots, \hat\theta_k^{U}$ are one-sided $100(1 - \alpha)\%$ upper confidence bounds for $\theta_1, \theta_2, \dots, \theta_k$, respectively. Then, for all $\Gamma \in \Theta$, the confidence set $C$ produced by the stepwise procedure satisfies
$$P_\Gamma(\Gamma \in C) \ge 1 - \alpha.$$
The proof of Theorem 1 is a direct application of Theorem 1 of Hsu and Berger [16].
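Proof. Case 1. Let $M = 1$ be the step at which the procedure stops. In this situation, the assay sensitivity of the experiment cannot be established. Case 2. $2 \le M \le k$: the confidence bound obtained at step $M$ provides a $100(1-\alpha)\%$ confidence set for $\Gamma$ on the corresponding subset of the partition. The union of these confidence sets decomposes over the disjoint subsets of the partition, and since the true parameter lies in exactly one of them, the overall coverage probability is at least $1 - \alpha$. Remark 2. Theorem 1 guarantees simultaneous coverage of at least $1 - \alpha$ and hence control of the FWER at level $\alpha$ in the strong sense.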
For this reason, we state and prove the following proposition.

Proposition 3. The stepwise simultaneous inferences procedure for ratio of difference in means strongly controls the FWER at level 𝛼.
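Proof. Let $I \subseteq \{1, 2, \dots, k\}$ be the unknown index set of true null hypotheses. If $I = \emptyset$, no type I error can occur, and the FWER is zero. Thus, assume that $I \neq \emptyset$ and let $j = \min I$. Because the hypotheses are tested in a fixed order and testing stops at the first nonrejection, any false rejection requires the rejection of $H_{0j}$, which in turn requires the level-$\alpha$ test of $H_{0j}$ to reject. Hence, $\mathrm{FWER} \le P(H_{0j} \text{ is rejected}) \le \alpha$ for every configuration of true null hypotheses.
Remark 4. Proposition 3 guarantees that the FWER is controlled at the prespecified nominal level $\alpha$. Strong FWER control is a critical requirement of the Food and Drug Administration (FDA) for statistical procedures in dose finding.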
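The combinatorial core of this argument can be checked exhaustively for small $k$. The sketch below assumes the usual Hsu-Berger partition $\Theta_j^* = \{\theta_1 < \delta, \dots, \theta_{j-1} < \delta, \theta_j \ge \delta\}$ plus a final subset in which every $\theta_i < \delta$; `partition_index`, `stepwise_rejections` and `check_fwer_argument` are hypothetical helpers. For every pattern of individual level-$\alpha$ test outcomes it confirms that a false rejection happens exactly when the stepwise procedure rejects $H_{0j}$ with $j = \min I$, which is why the FWER cannot exceed $\alpha$.

```python
from itertools import product


def partition_index(theta, delta):
    """1-based index j of the partition subset containing theta (assumed form):
    Theta*_j = {theta_1 < delta, ..., theta_{j-1} < delta, theta_j >= delta},
    and Theta*_{k+1} = {all theta_i < delta}."""
    for j, value in enumerate(theta, start=1):
        if value >= delta:
            return j
    return len(theta) + 1


def stepwise_rejections(individual):
    """Indices rejected by the fixed-sequence procedure, given the outcomes
    (True = reject) of the individual level-alpha tests in dose order."""
    rejected = set()
    for i, rejects in enumerate(individual, start=1):
        if not rejects:
            break
        rejected.add(i)
    return rejected


def check_fwer_argument(theta, delta):
    """True if, for every outcome pattern, a false rejection occurs exactly
    when the stepwise procedure rejects H0j, j = min(true nulls)."""
    k = len(theta)
    true_nulls = {i for i, v in enumerate(theta, start=1) if v >= delta}
    j = partition_index(theta, delta)
    # The partition index coincides with the smallest true-null index
    assert j == (min(true_nulls) if true_nulls else k + 1)
    for individual in product([False, True], repeat=k):
        false_rejection = bool(stepwise_rejections(individual) & true_nulls)
        # H0j is rejected stepwise only if tests 1..j all reject
        expected = j <= k and all(individual[:j])
        if false_rejection != expected:
            return False
    return True


print(check_fwer_argument([0.2, 0.6, 0.1, 0.9], delta=0.5))  # prints: True
```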
To confirm these theoretical results, we carried out the simulation studies described in Section 4.

Simulation Studies
4.1. FWER.
We conducted simulation studies to investigate the FWER of the procedure. Without loss of generality, we set $\delta = 0.8$ and $\alpha = 0.025$. Observations were generated from normal distributions with 1 million replications under the assumption of equal variances across dose groups; this case is labeled HOMO in Table 1. We also explored the effect of violating this assumption, in order to compare the two situations; this case is labeled HETRO in Table 1. We used the means configuration of Hasler et al. [11], with two means equal to 16.5 and $\mu_1 = \mu_2 = 32.66$. For HOMO, all standard deviations were set to 5; for HETRO, the standard deviations were 5 and 12 for the control groups and $\sigma_i = 9$ for $i = 1, 2$. We considered only $k = 2$ experimental treatments. Table 1 indicates that the FWER is properly controlled at the nominal level $\alpha = 0.025$ under equal variances, whereas under unequal variances the simulated FWER deviates markedly from the nominal level, falling far below or above 0.025, so the FWER is poorly controlled.
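A much smaller version of such a simulation can be sketched with the Python standard library. This is not the authors' simulation code: the configuration below (negative control mean 10, positive control mean 20, both dose means 15, so that $\theta_1 = \theta_2 = \delta = 0.5$, common $\sigma = 2$, $n = 10$ per group, 2000 replications) is hypothetical, and the critical value $t_{0.95,27} \approx 1.7033$ is hard-coded for $\alpha = 0.05$ and $\nu = 10 + 10 + 10 - 3 = 27$. At this boundary configuration the estimated FWER should be close to $\alpha$.

```python
import math
import random
from statistics import mean, variance


def fieller_upper(num, den, s2, n_i, n_0, n_p, t_crit):
    """One-sided Fieller upper bound for num/den (None if it does not exist)."""
    q = t_crit**2 * s2
    a = den**2 - q * (1 / n_p + 1 / n_0)
    if den <= 0 or a <= 0:           # assay sensitivity not established
        return None
    b = -2 * (num * den - q / n_0)
    c = num**2 - q * (1 / n_i + 1 / n_0)
    return (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)


def simulate_fwer(mu_doses, mu_0, mu_p, sigma, n, delta, t_crit,
                  reps=2000, seed=1):
    """Monte Carlo FWER of the stepwise procedure with equal variances and
    equal group sizes. A type I error is declaring safe any dose whose true
    theta_i = (mu_i - mu_0) / (mu_p - mu_0) is >= delta."""
    rng = random.Random(seed)
    thetas = [(m - mu_0) / (mu_p - mu_0) for m in mu_doses]
    errors = 0
    for _ in range(reps):
        neg = [rng.gauss(mu_0, sigma) for _ in range(n)]
        pos = [rng.gauss(mu_p, sigma) for _ in range(n)]
        declared_safe = []
        for i, mu_i in enumerate(mu_doses):
            dose = [rng.gauss(mu_i, sigma) for _ in range(n)]
            # Pooled variance of the three groups (equal n)
            s2 = (variance(dose) + variance(pos) + variance(neg)) / 3
            ub = fieller_upper(mean(dose) - mean(neg), mean(pos) - mean(neg),
                               s2, n, n, n, t_crit)
            if ub is None or ub >= delta:
                break                # stop at the first dose not shown safe
            declared_safe.append(i)
        if any(thetas[i] >= delta for i in declared_safe):
            errors += 1
    return errors / reps


fwer = simulate_fwer([15.0, 15.0], mu_0=10.0, mu_p=20.0, sigma=2.0, n=10,
                     delta=0.5, t_crit=1.7033)
print(f"estimated FWER: {fwer:.4f}")
```

Replacing the common `sigma` by group-specific standard deviations would give a rough analogue of the HETRO setting.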

Power Estimation.
Power estimation is essential for a well-designed clinical study. There are many definitions of power in multiple comparison procedures; in this study, power is defined as the probability of rejecting all false null hypotheses,
$$1 - \beta = P\left(\text{reject } H_{0i} \text{ for all } i \in I_1\right), \qquad I_1 = \{i : \theta_i < \delta\}. \tag{25}$$
This power concept is directly related to the all-pairs power introduced by Ramsay [19]. Expression (25) can therefore be rewritten as
$$1 - \beta = P\left(\bigcap_{i \in I_1} \{T_i > t_{1-\alpha,\nu}\}\right). \tag{26}$$
Expression (26) can be calculated from a $k$-variate noncentral $t$ distribution with $\nu$ degrees of freedom and noncentrality parameters
$$\lambda_i = \frac{\delta(\mu_{k+1} - \mu_0) - (\mu_i - \mu_0)}{\sigma\sqrt{1/n_i + \delta^2/n_{k+1} + (1-\delta)^2/n_0}}, \qquad i = 1, 2, \dots, k. \tag{27}$$
The common standard deviation $\sigma$ can be expressed as a multiple of the difference $\mu_{k+1} - \mu_0$, that is, $\sigma = c(\mu_{k+1} - \mu_0)$ with $c > 0$. This gives the following representation of the noncentrality parameter in terms of the ratio of mean differences:
$$\lambda_i = \frac{\delta - \theta_i}{c\sqrt{1/n_i + \delta^2/n_{k+1} + (1-\delta)^2/n_0}}. \tag{28}$$
From (28), the power is a function of $\theta_i$, the ratio of mean differences, of $c$, and of the sample sizes. Table 2 shows that power increases with increasing $\theta_i$ and sample size but decreases with increasing $c$.
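The noncentrality parameter $\lambda_i = (\delta - \theta_i)/(c\sqrt{1/n_i + \delta^2/n_{k+1} + (1-\delta)^2/n_0})$ is easy to evaluate directly, assuming the sign convention in which a large $T_i$ supports safety. The sketch below also gives a rough single-dose normal approximation $\Phi(\lambda_i - t_{1-\alpha,\nu})$ to $P(T_i > t_{1-\alpha,\nu})$; this approximation ignores the randomness of $\hat\sigma$ and the correlation between the $T_i$, so it is not the $k$-variate noncentral $t$ computation used for the power tables. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def noncentrality(theta_i, delta, c, n_i, n_0, n_p):
    """lambda_i when sigma = c * (mu_{k+1} - mu_0), c > 0."""
    scale = math.sqrt(1 / n_i + delta**2 / n_p + (1 - delta)**2 / n_0)
    return (delta - theta_i) / (c * scale)


def approx_power(lam, t_crit):
    """Rough normal approximation to P(T_i > t_crit); ignores sigma-hat noise."""
    return NormalDist().cdf(lam - t_crit)


lam = noncentrality(theta_i=0.2, delta=0.5, c=0.5, n_i=10, n_0=10, n_p=10)
print(round(lam, 4), round(approx_power(lam, t_crit=1.7033), 3))
```

Increasing the group sizes enlarges $\lambda_i$, while increasing $c$ shrinks it, which is the direction in which the approximate power moves as well.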

Example
To illustrate our procedure, we used the raw data published by Adler and Kliesch [20] for a micronucleus assay with doses of 30 mg/kg, 50 mg/kg, 75 mg/kg, and 100 mg/kg of hydroquinone (Hydro) and 25 mg/kg cyclophosphamide as the positive control. Their primary interest was to determine whether the substance is able to induce chromosome damage or interact with the spindle apparatus. The results for male mice at the 24 h sampling time are given in Table 3, and a summary of the test for the micronucleus assay data from Hasler et al. [11] is given in Table 4.
In evaluating the mutagenicity data from Table 3 with $\alpha = 0.05$ and safety threshold $\delta = 0.5$, the following upper confidence bounds and decisions were obtained:
$$\begin{aligned} \hat\theta_1^{U} &= 0.24 < \delta = 0.5: && \text{reject } H_{01},\\ \hat\theta_2^{U} &= 0.35 < \delta = 0.5: && \text{reject } H_{02},\\ \hat\theta_3^{U} &= 0.74 \ge \delta = 0.5: && \text{do not reject } H_{03}. \end{aligned} \tag{29}$$
The procedure therefore stops at step 3, and no further steps are needed.
From this analysis, the doses 30 mg/kg and 50 mg/kg are declared safe at level $\alpha$, whereas 75 mg/kg and 100 mg/kg cannot be declared safe. Since $\hat\theta_3^{U} = 0.74 \ge \delta = 0.5$, 50 mg/kg is recommended as the maximum safe dose, that is, the highest dose that is noninferior to the reference at level $\alpha$. Note that 30 mg/kg is also noninferior to the reference but is a lower dose.

Conclusion
In this paper, we have proposed a stepdown confidence set approach for identifying the maximum safe dose within the framework of noninferiority trials. The classical three-arm noninferiority trial involves only one experimental treatment, but some therapeutic situations require comparisons of several experimental compounds. The proposed $(k + 2)$-arm design therefore extends the three-arm noninferiority trial from one experimental treatment to multiple treatments without multiplicity adjustment. Our simulation results showed strong control of the familywise type I error rate under equal variances across dose groups for normally distributed data, in agreement with the theoretical guarantee provided by the partitioning principle.

Table 3: Number of micronuclei per animal and 2000 scored cells for the negative control, the four doses of hydroquinone, and the positive control cyclophosphamide.