Simultaneous Inference on All Linear Combinations of Means with Heteroscedastic Errors

We propose a statistical method for constructing simultaneous confidence intervals on all linear combinations of means without assuming equal variances, a setting in which the classical Scheffé simultaneous confidence intervals no longer preserve the familywise error rate (FWER). The proposed method is useful when the number of comparisons on linear combinations of means is extremely large. The FWERs of the proposed simultaneous confidence intervals under various configurations of variances are assessed through simulations and are found to preserve the predefined nominal level very well. An example of pairwise comparisons on heteroscedastic means is given to illustrate the proposed method.


Introduction
Multiple comparisons on a large number of linear combinations of means are of general interest in many applications. If an inferential procedure depends on the number of comparisons, it can become quite challenging as the number of comparisons increases. Additionally, we often cannot assume that all population variances are equal. Many authors have proposed methods for multiple comparisons on means. Scheffé [1] proposed a method to construct simultaneous confidence intervals for all linear combinations of means while keeping the Type I error under control. Since Scheffé's method constructs simultaneous confidence intervals for all possible linear combinations of means, it has an advantage when dealing with a large number of comparisons. Three major assumptions are required for Scheffé's simultaneous confidence intervals to be valid: (1) the samples are independent, (2) the populations are normally distributed, and (3) the populations have equal variance. The third assumption, often referred to as homoscedasticity, is the most vulnerable. Violation of homoscedasticity often results in inflation of the familywise error rate (FWER). As pointed out by Scheffé [2], his method has a certain robustness to unequal variances when the group sample sizes are equal. However, the FWER is out of control in situations where both the variances and the sample sizes are unequal. No explicit formula has so far been available for simultaneous confidence intervals on all linear combinations of means in the case of unequal variances.
The problem of comparing two means in the case of unequal population variances is known as the Behrens-Fisher problem [3]. Dunnett [4, 5] and Nel and van der Merwe [6] published simulation-based results assessing different pairwise mean comparison procedures in the unequal variance case. Kim [7] proposed a practical solution to the Behrens-Fisher problem using the geometry of confidence ellipsoids for two mean vectors. Wilcox [8] tackled the Behrens-Fisher problem via trimmed means. Christensen and Rencher [9] compared Type I error rates and power levels in the Behrens-Fisher problem. Fouladi and Yockey [10] conducted a Monte Carlo study to evaluate the performance of tests on means under normality and nonnormality. Hoover [11] discussed behavioral interventions with heterogeneous subgroup effects in clinical trials.

In this paper, a method for constructing simultaneous confidence intervals on all linear combinations of means with unequal variances is proposed. Since there is no limit on the number of linear combinations of means, the proposed method may be used in situations where comparisons on a large number of linear combinations of means are deemed necessary. The proposed simultaneous confidence intervals, which we refer to as the generalized Scheffé confidence intervals, have an explicit form that is similar to that of their classical counterparts. The equal variance assumption is no longer needed. In addition, these simultaneous confidence intervals reduce to the classical Scheffé confidence intervals when all population variances and sample sizes are equal. Most importantly, the proposed simultaneous confidence intervals preserve the FWER in all configurations of variances and sample sizes examined in our simulations.

Generalized Scheffé Confidence Intervals
Suppose that we have $I$ populations and let $\mu_i$ and $\sigma_i^2$ be the true mean and variance of population $i$. Let $n_i$, $\bar{D}_i$, and $S_i^2$ be the sample size, sample mean, and sample variance of the $i$th population. In the case of equal variance among the $I$ populations, that is, $\sigma_i^2 \equiv \sigma^2$, Scheffé's simultaneous confidence intervals on all linear combinations of means $\sum_{i=1}^{I} c_i \mu_i$ are given by

$$\sum_{i=1}^{I} c_i \bar{D}_i \pm \sqrt{I\, F_{\alpha,I,N-I}}\, \sqrt{\mathrm{MSE} \sum_{i=1}^{I} \frac{c_i^2}{n_i}}$$ (2.1)

where MSE is the mean squared error, that is, the pooled estimate of the common variance, and $N$ is the total sample size.

If pairwise comparisons are of interest, we can set one pair $(c_i, c_j)$ to $(1, -1)$ and the remaining $c_i$'s to zero. This is a special case of a contrast. Note that Scheffé's intervals are useful when dealing with a large number of linear combinations of means. Once the total number of observations and the number of populations are fixed, the quantity $F_{\alpha,I,N-I}$ stays the same regardless of the number of simultaneous confidence intervals. For the Bonferroni approach, the confidence intervals become wider as the number of linear combinations of means increases. Suppose that we have 10 populations, each with a sample size of 10. If we have 100 simultaneous confidence intervals for the linear combinations of means, the $\sqrt{F}$ term in Scheffé's method, based on $10 \times F_{0.05,10,100-10}$, is 1.9635. If we apply Bonferroni's approach, $|t_{0.05/200,\,100-10}| = 3.6118$. This means that Scheffé's intervals may be narrower than Bonferroni's intervals. There is a breakdown point beyond which Scheffé's intervals become narrower than Bonferroni's intervals as the number of linear combinations of means gets larger. This alters the common perception that Scheffé's intervals are more conservative than Bonferroni's intervals.
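The breakdown point can be explored numerically. The following Python sketch is only an illustration under stated assumptions: it takes the Scheffé critical constant to be $\sqrt{I\,F_{\alpha,I,N-I}}$ and the two-sided Bonferroni constant for $m$ intervals to be $t_{\alpha/(2m),\,N-I}$, and it assumes SciPy is available. The function names are hypothetical, and the constants it prints follow from these assumed forms; they are not intended to reproduce the figures quoted in the text.

```python
import math

from scipy import stats


def scheffe_constant(alpha, n_groups, n_total):
    """Assumed Scheffe constant sqrt(I * F_{alpha, I, N-I}).

    It does not depend on how many intervals are formed.
    """
    df_error = n_total - n_groups
    f_crit = stats.f.isf(alpha, n_groups, df_error)  # upper-alpha quantile
    return math.sqrt(n_groups * f_crit)


def bonferroni_constant(alpha, n_intervals, n_groups, n_total):
    """Two-sided Bonferroni t critical value when n_intervals are formed."""
    df_error = n_total - n_groups
    return stats.t.isf(alpha / (2.0 * n_intervals), df_error)


def breakdown_point(alpha, n_groups, n_total, max_intervals=10**7):
    """Smallest number of intervals at which Bonferroni exceeds Scheffe.

    Returns None if this does not happen up to max_intervals.
    """
    target = scheffe_constant(alpha, n_groups, n_total)
    if bonferroni_constant(alpha, max_intervals, n_groups, n_total) <= target:
        return None
    lo, hi = 1, max_intervals
    while lo < hi:  # bisection; the Bonferroni constant increases with m
        mid = (lo + hi) // 2
        if bonferroni_constant(alpha, mid, n_groups, n_total) > target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    I, N = 10, 100  # 10 populations with 10 observations each
    print("Scheffe constant:", scheffe_constant(0.05, I, N))
    for m in (10, 100, 1000, 10000):
        print(m, "intervals, Bonferroni constant:",
              bonferroni_constant(0.05, m, I, N))
    print("breakdown point:", breakdown_point(0.05, I, N))
```

Because the Scheffé constant is fixed once $I$ and $N$ are fixed while the Bonferroni constant grows with $m$, a bisection search over $m$ is sufficient to locate the crossover.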
We now consider the problem of constructing simultaneous intervals without assuming equal variance. Let $a_i = \sigma_i^2 / \sum_{i=1}^{I} \sigma_i^2$ and define

Finding the exact distribution of a linear combination of $\chi^2$ variables, known as Satterthwaite's problem, is rather difficult. Satterthwaite approximated this type of variable by a $\chi^2_\nu$ random variable divided by its degrees of freedom $\nu$ (see [12]). The degrees of freedom $\nu$ are then obtained by the method of moments. As noted in Casella and Berger [12], for a variable

We then set

where $\nu_1$ and $\nu_2$ are the respective degrees of freedom for $R_1$ and $R_2$. By applying the results above we can estimate $\nu_1$ and $\nu_2$. First we consider $\nu_1$, which can be found as

A natural estimate of $\nu_1$ is given by

It can be estimated by

(2.7)
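The moment-matching idea behind these degrees of freedom can be illustrated with a short Python sketch. It implements the generic Satterthwaite estimate for a combination $\sum_i a_i Y_i$ with independent $Y_i \sim \chi^2_{r_i}$, namely $\hat{\nu} = (\sum_i a_i Y_i)^2 / \sum_i (a_i Y_i)^2 / r_i$, as given in Casella and Berger. This is the general approximation, not the specific estimates of $\nu_1$ and $\nu_2$ in (2.7); the function name and the example inputs are hypothetical.

```python
def satterthwaite_df(weights, values, dfs):
    """Satterthwaite degrees of freedom for sum(a_i * Y_i).

    weights: the constants a_i
    values:  the observed Y_i (or plug-in quantities such as sample variances)
    dfs:     the degrees of freedom r_i attached to each Y_i
    """
    if not (len(weights) == len(values) == len(dfs)):
        raise ValueError("weights, values and dfs must have the same length")
    terms = [a * y for a, y in zip(weights, values)]
    numerator = sum(terms) ** 2
    denominator = sum(t * t / r for t, r in zip(terms, dfs))
    return numerator / denominator


if __name__ == "__main__":
    # Two-sample illustration: terms are S_i^2 / n_i with n_i - 1 degrees
    # of freedom (the familiar Welch-Satterthwaite setting).
    sizes = [5, 20]
    sample_vars = [9.0, 1.0]
    nu = satterthwaite_df([1.0 / n for n in sizes],
                          sample_vars,
                          [n - 1 for n in sizes])
    print("approximate degrees of freedom:", nu)
```

When all terms and all $r_i$ are equal, the estimate reduces to the sum of the $r_i$, which mirrors the way the generalized procedure is described as reducing to the classical one under equal variances and sample sizes.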
Note that if the $I$ populations have equal variance, $\sigma_i^2 \equiv \sigma^2$, we have $\nu_1 = I$; additionally, if all populations have the same sample size, that is,

To derive the generalized Scheffé intervals we need the following projection lemma (see [13], pages 231-232). For $I$ real numbers $z_1, z_2, \ldots, z_I$ and all $a = (a_1, a_2, \ldots, a_I) \in \mathbb{R}^I$ to satisfy the following inequality:

(2.9)
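A projection lemma of this kind is, in its usual form, a consequence of the Cauchy-Schwarz inequality: $|\sum_i a_i z_i| \le c\,(\sum_i a_i^2)^{1/2}$ holds for all $a \in \mathbb{R}^I$ if and only if $\sum_i z_i^2 \le c^2$. That statement is assumed here as the content of the lemma. The Python sketch below checks it numerically by showing that the ratio $|a \cdot z| / \|a\|$ never exceeds $\|z\|$ over many random directions and that the bound is attained at $a = z$. The function name and test vector are hypothetical.

```python
import math
import random


def projection_check(z, n_directions=20000, seed=0):
    """Return (largest ratio over random a, ratio at a = z, norm of z).

    The ratio is |a . z| / ||a||; by Cauchy-Schwarz it is at most ||z||.
    """
    rng = random.Random(seed)
    norm_z = math.sqrt(sum(v * v for v in z))
    largest = 0.0
    for _ in range(n_directions):
        a = [rng.gauss(0.0, 1.0) for _ in z]
        norm_a = math.sqrt(sum(v * v for v in a))
        ratio = abs(sum(ai * zi for ai, zi in zip(a, z))) / norm_a
        largest = max(largest, ratio)
    # Choosing a = z gives |z . z| / ||z|| = ||z||, so the bound is sharp.
    ratio_at_z = sum(v * v for v in z) / norm_z
    return largest, ratio_at_z, norm_z


if __name__ == "__main__":
    largest, ratio_at_z, norm_z = projection_check([1.0, -2.0, 0.5, 3.0])
    print(largest, ratio_at_z, norm_z)
```

This equivalence is what allows a single probability statement about a sum of squares to be converted into simultaneous statements about every linear combination.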
Choosing $F_{\alpha,\nu_1,\nu_2}$, the $1-\alpha$ quantile of an $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom, based on the results in (2.7), we have

(2.10)

Applying the projection lemma, this probability statement can be pivoted to give the following generalized $1-\alpha$ simultaneous confidence intervals for $\sum_{i=1}^{I} c_i \mu_i$,

For the population means $\mu_i$ and their pairwise differences $\mu_i - \mu_j$, the generalized Scheffé confidence intervals are

(2.12)

where $1 \le i \ne j \le I$. By comparing (2.1) with (2.11), it can be seen that the generalized Scheffé confidence intervals are very similar to their classical counterparts.

Assessment of Familywise Error Rate
In multiple comparisons, the familywise Type I error rate is the probability of incorrectly rejecting at least one of the null hypotheses that make up the family. The validity of the proposed generalized Scheffé confidence intervals rests largely on controlling the FWER at a given nominal level $\alpha$.
There are two major factors, the sample sizes and the population variances, that affect the performance of Scheffé's confidence intervals. We will show through simulation that the FWER is inflated when the population variances are unequal.
A variety of configurations of variances and sample sizes are selected to assess the performance of the generalized Scheffé method. To this end, the number of groups is chosen to be $I = 4$. Without loss of generality, we use 0 for all population means, that is, $(\mu_1, \mu_2, \mu_3, \mu_4) = (0, 0, 0, 0)$. The specification of sample sizes and variances is given in Table 1. Although Scheffé's intervals apply to inference on all linear combinations, for simplicity we focus on two sets of inferences only: population means and their pairwise differences. For each configuration we conducted 5,000 simulation runs, and for each run 95% Scheffé intervals and generalized Scheffé intervals on both population means and pairwise mean differences were computed. We then obtained the coverage rates, that is, the proportion of runs in which the intervals contain the true values, which all equal 0. Table 1 reports the coverage rates for both methods. Note that the empirical FWER is one minus the coverage rate. Clearly, in the case of equal variances, both methods give very similar coverage rates for balanced and unbalanced designs alike. In the unequal variance case, the coverage rate of Scheffé's method drops. However, its FWER still stays around the nominal level $\alpha = 0.05$ for balanced designs. This confirms Scheffé's observation that his method is robust to heteroscedasticity when the sample sizes from the populations are equal. We notice that the FWER is inflated when the design is unbalanced: under the unequal variance configuration $(3, 3, 1, 1)$, for sample sizes $(n_1, n_2, n_3, n_4) = (5, 5, 10, 10)$, $(5, 5, 20, 20)$, $(10, 10, 20, 20)$, and $(10, 10, 50, 50)$, the FWERs are 12.8%, 23.5%, 12.95%, and 27.45%, respectively. Note that these FWERs are all considerably greater than the nominal level $\alpha = 0.05$. It can be seen that the greater the difference in sample sizes, the larger the corresponding FWER. On the other hand, the performance of the generalized Scheffé method is much more robust. For the same configuration settings, the FWERs based on the generalized Scheffé intervals are between 0.025 and 0.038. Although the method is conservative, its FWER stays well within the nominal level of $\alpha = 0.05$.
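The inflation of the FWER for the classical method can be illustrated with a small Monte Carlo sketch in Python. It is not the simulation code used for Table 1: it covers only the classical Scheffé intervals, assumes the critical constant $\sqrt{I\,F_{\alpha,I,N-I}}$ with a pooled MSE, treats the entries of the configuration $(3, 3, 1, 1)$ as standard deviations (an assumption, since the text does not say whether they are variances or standard deviations), and uses fewer runs. NumPy and SciPy are assumed to be available, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def scheffe_coverage(sizes, sigmas, n_runs=2000, alpha=0.05, seed=0):
    """Monte Carlo coverage of classical Scheffe intervals.

    The family consists of all group means and all pairwise differences,
    and every true mean is 0. `sigmas` are standard deviations (assumed).
    """
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes)
    sigmas = np.asarray(sigmas, dtype=float)
    k = len(sizes)
    n_total = int(sizes.sum())
    crit = np.sqrt(k * stats.f.isf(alpha, k, n_total - k))

    # Coefficient vectors c: the k unit vectors, then the pairwise contrasts.
    eye = np.eye(k)
    pairs = [eye[i] - eye[j] for i in range(k) for j in range(i + 1, k)]
    coef = np.vstack([eye] + pairs)
    scale = (coef ** 2 / sizes).sum(axis=1)  # sum_i c_i^2 / n_i per row

    covered = 0
    for _ in range(n_runs):
        samples = [rng.normal(0.0, s, n) for s, n in zip(sigmas, sizes)]
        means = np.array([x.mean() for x in samples])
        variances = np.array([x.var(ddof=1) for x in samples])
        mse = np.sum((sizes - 1) * variances) / (n_total - k)
        half_width = crit * np.sqrt(mse * scale)
        # A run counts as covered only if every interval contains 0.
        covered += int(np.all(np.abs(coef @ means) <= half_width))
    return covered / n_runs


if __name__ == "__main__":
    balanced = scheffe_coverage((10, 10, 10, 10), (1, 1, 1, 1))
    unbalanced = scheffe_coverage((5, 5, 20, 20), (3, 3, 1, 1))
    print("empirical FWER, balanced and homoscedastic:", 1 - balanced)
    print("empirical FWER, unbalanced and heteroscedastic:", 1 - unbalanced)
```

Pairing the larger standard deviations with the smaller samples makes the pooled MSE understate the variability of those sample means, which is the mechanism behind the undercoverage.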
It is also of interest to see how the widths of the two types of intervals differ. Comparing (2.1) with (2.12), one can see that the difference between them is due to the following two terms:

(3.1)
The averages of $Q_1$ and $Q_2$ over 5,000 simulation runs are presented in Table 2.
It can be seen that they are very close to each other in the case of equal variances. However, in the case of unequal variances, $Q_1$ becomes overoptimistically smaller than $Q_2$, which leads to the inflation of the FWER. Finally, Scheffé's intervals are derived from the fact that the $F$ statistic follows the $F_{I,N-I}$ distribution under a number of assumptions. When these assumptions are violated, the performance of Scheffé's intervals depends on how the $F$ statistic deviates from the $F_{I,N-I}$ distribution. For the generalized Scheffé intervals, the FWER largely depends on how accurately the distribution of $R_1/R_2$ is approximated by $F_{\nu_1,\nu_2}$. Figure 1 plots the empirical distribution functions (edf) of $R_1/R_2$ and of the $F$ statistic in (2.13), along with their designated $F$ distributions. We selected four different configurations of variances and sample sizes, which correspond to the homoscedastic/heteroscedastic and balanced/unbalanced cases. The overlap between the edf of $R_1/R_2$ and $F_{\nu_1,\nu_2}$ suggests that the $F$ distribution approximates the ratio of $R_1$ and $R_2$ very well.
In addition, the edf of the $F$ statistic also matches the $F_{I,N-I}$ distribution well (panels 1(b)-3(b) in Figure 1), except in the unbalanced heteroscedastic case, where Scheffé's method fails (panel 4(b) in Figure 1). This explains why the FWER is inflated in the case of unequal variances. As a last comment, the above simulation results suggest that the generalized Scheffé intervals tend to be wider than the Scheffé intervals. This is our overall impression, but it is not true in general: in the simulations we observed, from time to time, generalized Scheffé intervals that were narrower. We will see this feature in the data analysis example in the next section.

Example of Data Analysis
Solomon et al. [14] studied smoking behavior in pregnant women. They examined the women's determination to quit smoking while pregnant. They interviewed 349 women at their first prenatal visit, all of whom were smokers when they became pregnant, and classified them into four groups: precontemplation (PC), contemplation (C), preparation (P), and action (A). Their intention was to look at the subsequent smoking behavior of these subjects during the course of pregnancy, but one important consideration was how much these women smoked when they became pregnant. The sample sizes, means, and standard deviations of these four groups, in terms of cigarettes smoked per day when they became pregnant, are given in Table 4. Noting that the smallest sample size is 37, we do not need to worry about the normality assumption even though the response of interest is a count. Table 3 presents the 95% Scheffé intervals and the generalized Scheffé intervals for the four group means and their differences. Since both the sample sizes and the variances differ considerably across groups, the generalized Scheffé intervals are more reliable.
One may make a number of inferences with a joint confidence level of 95%. For example, women in the preparation (P) group smoked on average between 26.09 and 31.51 cigarettes per day, which appears to make them the heaviest-smoking group. No significant difference is found between group P and group PC, because the confidence interval for their difference, $(-8.87, 0.87)$, includes 0. It is also interesting to notice that, in this example, the generalized Scheffé intervals are even narrower than the Scheffé intervals.

Discussion
The Scheffé method is one of the most commonly used methods for making simultaneous inference on all linear combinations of means. Scheffé intervals cover all possible linear combinations of means, which is beneficial when a large number of linear combinations of means need to be compared. The assumption of equal variances across populations is needed to control the Type I error. When this assumption is violated, the proposed method can be conveniently used for constructing simultaneous confidence intervals where type I error

$\mathrm{MSE} = \sum_{i=1}^{I} (n_i - 1) S_i^2 / (N - I)$ is the pooled estimate of the common variance from the $I$ populations; $F_{\alpha,I,N-I}$ is the upper $\alpha$th quantile of the $F$ distribution with $I$ and $N - I$ degrees of freedom; and $N = \sum_{i=1}^{I} n_i$ is the total sample size. If $I$ constants $c_1, c_2, \ldots, c_I$ satisfy $\sum_{i=1}^{I} c_i = 0$, Scheffé's simultaneous confidence intervals on all contrasts $\sum_{i=1}^{I} c_i \mu_i$ are given by:

Figure 1: Empirical density plots. Each density curve is generated from 5,000 simulation runs. The solid line is for the $F$ statistic or $R_1/R_2$ and the dotted line is for their designated $F$ distribution.

Table 3: The simultaneous Scheffé intervals and generalized Scheffé intervals on means and pairwise mean differences in the cigarette example.

Table 4: Sample sizes, means, and sample standard deviations of 349 women who stopped smoking during pregnancy period [14].