Simultaneous Inference on All Linear Combinations of Means with Heteroscedastic Errors

Xin Yan (1) and Xiaogang Su (2)

(1) Department of Statistics, University of Central Florida, Orlando, FL 32816, USA
(2) School of Nursing, University of Alabama at Birmingham, Birmingham, AL 35294, USA

Academic Editor: Alan M. Polansky

Research Article. Journal of Probability and Statistics (ISSN 1687-9538, 1687-952X), Hindawi Publishing Corporation, 2011, Article ID 484272, doi:10.1155/2011/484272.

Copyright © 2011 Xin Yan and Xiaogang Su. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a statistical method for constructing simultaneous confidence intervals on all linear combinations of means without assuming equal variances, a setting in which the classical Scheffé simultaneous confidence intervals no longer preserve the familywise error rate (FWER). The proposed method is useful when the number of comparisons on linear combinations of means is extremely large. The FWERs of the proposed simultaneous confidence intervals under various configurations of variances and sample sizes are assessed through simulations and are found to preserve the predefined nominal level very well. An example of pairwise comparisons on heteroscedastic means is given to illustrate the proposed method.

1. Introduction

Multiple comparisons on a large number of linear combinations of means are of general interest in many applications. If an inferential procedure depends on the number of comparisons, it can become quite challenging as that number increases. In addition, we often cannot assume that all variances are equal. Many authors have proposed methods for multiple comparisons on means. Scheffé proposed a method to construct simultaneous confidence intervals for all linear combinations of means while keeping the Type I error under control. Because Scheffé's method covers all possible linear combinations of means, it has a particular advantage when dealing with a large number of comparisons. Scheffé's simultaneous confidence intervals rest on three major assumptions: (1) the samples are independent, (2) the populations are normally distributed, and (3) the populations have equal variances. The third assumption, often referred to as homoscedasticity, is the most vulnerable, and its violation often results in inflation of the familywise error rate (FWER). As pointed out by Scheffé, his method has a certain robustness to unequal variances when the group sample sizes are the same. However, the FWER is no longer controlled when both the variances and the sample sizes are unequal. No explicit formula has so far been available for simultaneous confidence intervals on all linear combinations of means in the case of unequal variances.

The problem of comparing two means when the population variances are unequal is known as the Behrens-Fisher problem. Dunnett [4, 5] and Nel and van der Merwe published simulation-based results assessing different pairwise mean comparison procedures in the unequal variance case. Kim proposed a practical solution to the Behrens-Fisher problem using the geometry of confidence ellipsoids for two mean vectors. Wilcox tackled the Behrens-Fisher problem via trimmed means. Christensen and Rencher compared Type I error rates and power levels in the Behrens-Fisher problem. Fouladi and Yockey conducted a Monte Carlo study to evaluate the performance of tests on means under normal and nonnormal conditions. Hoover discussed behavioral interventions with heterogeneous subgroup effects in clinical trials.

In this paper, we propose a method for constructing simultaneous confidence intervals on all linear combinations of means with unequal variances. Since there is no limit on the number of linear combinations of means, the proposed method may be used in situations where comparisons on a large number of linear combinations of means are deemed necessary. The proposed simultaneous confidence intervals, which we refer to as the generalized Scheffé confidence intervals, have an explicit form that is similar to that of their classical counterparts. The equal variance assumption is no longer needed. In addition, these simultaneous confidence intervals reduce to the classical Scheffé confidence intervals when all population variances and sample sizes are equal. Most importantly, the proposed simultaneous confidence intervals preserve the FWER in all configurations of variances and sample sizes considered.

2. Generalized Scheffé Confidence Intervals

Suppose that we have $I$ populations and let $(\mu_i, \sigma_i^2)$ be the true mean and variance of population $i$. Let $(n_i, D_i, S_i^2)$ be the sample size, sample mean, and sample variance of the $i$th population. In the case of equal variances among the $I$ populations, that is, $\sigma_i^2 \equiv \sigma^2$, Scheffé's simultaneous confidence intervals on all linear combinations of means $\sum_{i=1}^{I} c_i \mu_i$ are given by

$$\sum_{i=1}^{I} c_i D_i \pm \sqrt{I\,F_{\alpha,I,N-I}\,\mathrm{MSE}\sum_{i=1}^{I}\frac{c_i^2}{n_i}}, \tag{2.1}$$

where the mean squared error $\mathrm{MSE} = \sum_{i=1}^{I}(n_i-1)S_i^2/(N-I)$ is the pooled estimate of the common variance from the $I$ populations, $F_{\alpha,I,N-I}$ is the upper $\alpha$th quantile of the $F$ distribution with $I$ and $N-I$ degrees of freedom, and $N = \sum_{i=1}^{I} n_i$ is the total sample size. If the $I$ constants $c_1, c_2, \ldots, c_I$ satisfy $\sum_{i=1}^{I} c_i = 0$, Scheffé's simultaneous confidence intervals on all contrasts $\sum_{i=1}^{I} c_i \mu_i$ are given by

$$\sum_{i=1}^{I} c_i D_i \pm \sqrt{(I-1)\,F_{\alpha,I-1,N-I}\,\mathrm{MSE}\sum_{i=1}^{I}\frac{c_i^2}{n_i}}. \tag{2.2}$$

If pairwise comparisons are of interest, we can set one pair $(c_i, c_j)$ to $(1, -1)$ and the remaining $c_i$'s to zero. This is a special case of a contrast.

Note that Scheffé's intervals are useful when dealing with a large number of linear combinations of means. Once the total number of observations and the number of populations are fixed, the quantity $F_{\alpha,I,N-I}$ stays the same regardless of the number of simultaneous confidence intervals. For the Bonferroni approach, in contrast, the confidence intervals become wider as the number of linear combinations of means increases. Suppose that we have 10 populations, each with a sample size of 10. If we have 100 simultaneous confidence intervals for linear combinations of means, the critical quantity in Scheffé's method, based on $F_{0.05,\,10,\,100-10}$, is given as 1.9635, whereas Bonferroni's approach uses $|t_{0.05/200,\,100-10}| = 3.6118$. This means that Scheffé's intervals may be shorter than Bonferroni's intervals. There is a breakdown point beyond which Scheffé's intervals are shorter than Bonferroni's intervals as the number of linear combinations of means gets larger. This challenges the common perception that Scheffé's intervals are more conservative than Bonferroni's intervals.
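The breakdown point mentioned above can be located numerically. The following is a minimal sketch, not the authors' computation: it assumes the two methods are compared through the multiplier applied to the standard error, taken here to be $\sqrt{I\,F_{\alpha,I,N-I}}$ for Scheffé (from (2.1)) and the two-sided $t$ quantile at level $\alpha/(2m)$ with $N-I$ degrees of freedom for Bonferroni with $m$ intervals. The function names are hypothetical, and SciPy is assumed to be available.

```python
import math
from scipy import stats


def scheffe_multiplier(num_groups, total_n, alpha=0.05):
    """sqrt(I * F_{alpha, I, N-I}); does not depend on the number of intervals."""
    f_crit = stats.f.ppf(1.0 - alpha, num_groups, total_n - num_groups)
    return math.sqrt(num_groups * f_crit)


def bonferroni_multiplier(num_intervals, num_groups, total_n, alpha=0.05):
    """Two-sided t quantile at level alpha / (2m) with N - I degrees of freedom."""
    return stats.t.ppf(1.0 - alpha / (2.0 * num_intervals), total_n - num_groups)


def breakdown_point(num_groups, total_n, alpha=0.05):
    """Smallest number of intervals m at which Bonferroni is at least as wide."""
    s = scheffe_multiplier(num_groups, total_n, alpha)
    tail = stats.t.sf(s, total_n - num_groups)  # P(T > s)
    m = max(1, math.ceil(alpha / (2.0 * tail)))
    # Guard against floating-point rounding at the boundary.
    while bonferroni_multiplier(m, num_groups, total_n, alpha) < s:
        m += 1
    return m


if __name__ == "__main__":
    # 10 groups of size 10, as in the example in the text.
    print("Scheffe multiplier:", scheffe_multiplier(10, 100))
    print("Breakdown point m:", breakdown_point(10, 100))
```

Under these assumptions the Scheffé multiplier is fixed once $I$ and $N$ are fixed, while the Bonferroni multiplier grows with $m$, so a crossover always exists; where it falls depends on the scale on which the two are compared.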

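We now consider the problem of constructing simultaneous intervals without assuming equal variances. Let $a_i = \sigma_i^2 / \sum_{k=1}^{I}\sigma_k^2$ and define

$$R_1 = \sum_{i=1}^{I} a_i\left(\frac{D_i-\mu_i}{\sigma_i/\sqrt{n_i}}\right)^{2} = \sum_{i=1}^{I} a_i Y_i, \qquad R_2 = \sum_{i=1}^{I}\frac{a_i}{n_i-1}\cdot\frac{(n_i-1)S_i^2}{\sigma_i^2} = \sum_{i=1}^{I}\frac{a_i}{n_i-1}\,Z_i. \tag{2.3}$$

Note that $Y_i \sim \chi^2_{1}$ and $Z_i \sim \chi^2_{n_i-1}$. Therefore, $R_1$ and $R_2$ are linear combinations of $\chi^2$ variables with $E(R_1) = E(R_2) = 1$.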

Finding the exact distribution of a linear combination of $\chi^2$ variables, known as Satterthwaite's problem, is rather difficult. Satterthwaite approximated this type of variable by a $\chi^2_{\nu}$ random variable divided by its degrees of freedom $\nu$ (see ). The degrees of freedom $\nu$ are then obtained by the method of moments. As noted in Casella and Berger, for a variable $Y \sim \chi^2_{\nu}/\nu$ we have $E(Y) = 1$. Hence

$$\nu = \frac{2\,(EY)^2}{\mathrm{Var}(Y)} = \frac{2}{\mathrm{Var}(Y)}. \tag{2.4}$$
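The moment-matching step can be spelled out as a short derivation. It uses only the standard facts $E(\chi^2_\nu) = \nu$ and $\mathrm{Var}(\chi^2_\nu) = 2\nu$; the auxiliary symbol $X$ is introduced here for illustration.

```latex
\begin{align*}
Y &= \frac{X}{\nu}, \qquad X \sim \chi^2_{\nu},\\
\mathrm{E}(Y) &= \frac{\mathrm{E}(X)}{\nu} = \frac{\nu}{\nu} = 1,\\
\mathrm{Var}(Y) &= \frac{\mathrm{Var}(X)}{\nu^{2}} = \frac{2\nu}{\nu^{2}} = \frac{2}{\nu}
\quad\Longrightarrow\quad
\nu = \frac{2}{\mathrm{Var}(Y)} = \frac{2\,(\mathrm{E}Y)^{2}}{\mathrm{Var}(Y)}.
\end{align*}
```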

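We then approximate $R_1 \sim \chi^2_{\nu_1}/\nu_1$ and $R_2 \sim \chi^2_{\nu_2}/\nu_2$, where $\nu_1$ and $\nu_2$ are the respective degrees of freedom of $R_1$ and $R_2$, and apply the result above to find $\nu_1$ and $\nu_2$. Since $\mathrm{Var}(Y_i) = 2$, for $\nu_1$ we obtain

$$\nu_1 = \frac{2}{\sum_{i=1}^{I} a_i^2\,\mathrm{Var}(Y_i)} = \frac{1}{\sum_{i=1}^{I} a_i^2} = \frac{\left(\sum_{i=1}^{I}\sigma_i^2\right)^{2}}{\sum_{i=1}^{I}\sigma_i^4}. \tag{2.5}$$

A natural estimate of $\nu_1$ is $\hat\nu_1 = \left(\sum_{i=1}^{I} S_i^2\right)^{2} / \sum_{i=1}^{I} S_i^4$. For $\nu_2$, since $\mathrm{Var}(Z_i) = 2(n_i-1)$, we have

$$\nu_2 = \frac{2}{\sum_{i=1}^{I}\dfrac{a_i^2}{(n_i-1)^2}\,\mathrm{Var}(Z_i)} = \frac{1}{\sum_{i=1}^{I} a_i^2/(n_i-1)} = \frac{\left(\sum_{i=1}^{I}\sigma_i^2\right)^{2}}{\sum_{i=1}^{I}\sigma_i^4/(n_i-1)}. \tag{2.6}$$

It can be estimated by $\hat\nu_2 = \left(\sum_{i=1}^{I} S_i^2\right)^{2} / \sum_{i=1}^{I}\left[S_i^4/(n_i-1)\right]$. Furthermore, $R_1$ is independent of $R_2$; therefore, $R = R_1/R_2$ has approximately the $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom. It turns out that $R = R_1/R_2$ has a very simple form:

$$R = \frac{R_1}{R_2} = \frac{\sum_{i=1}^{I} n_i (D_i-\mu_i)^2}{\sum_{i=1}^{I} S_i^2} \sim F_{\nu_1,\nu_2}. \tag{2.7}$$

Note that if the $I$ populations have equal variances, $\sigma_i^2 \equiv \sigma^2$, we have $\nu_1 = I$; if, in addition, all populations have the same sample size, that is, $n_i \equiv n$, then $\nu_2 = N - I$.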
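The two estimated degrees of freedom are simple functions of the sample variances and sample sizes. The sketch below computes them in plain Python; the function name `satterthwaite_df` is hypothetical and chosen only for illustration.

```python
def satterthwaite_df(sample_vars, sample_sizes):
    """Return (nu1_hat, nu2_hat) from sample variances S_i^2 and sizes n_i."""
    if len(sample_vars) != len(sample_sizes):
        raise ValueError("need one variance and one sample size per group")
    total = sum(sample_vars)
    # nu1_hat = (sum S_i^2)^2 / sum S_i^4
    nu1 = total ** 2 / sum(v ** 2 for v in sample_vars)
    # nu2_hat = (sum S_i^2)^2 / sum [S_i^4 / (n_i - 1)]
    nu2 = total ** 2 / sum(v ** 2 / (n - 1) for v, n in zip(sample_vars, sample_sizes))
    return nu1, nu2


if __name__ == "__main__":
    # Equal variances and equal sizes: recovers nu1 = I and nu2 = N - I.
    print(satterthwaite_df([2.0, 2.0, 2.0, 2.0], [10, 10, 10, 10]))
    # Unequal variances and sizes give smaller, non-integer values.
    print(satterthwaite_df([9.0, 9.0, 1.0, 1.0], [10, 10, 50, 50]))
```

When all sample variances and sample sizes coincide, the function returns $I$ and $N - I$, matching the special case noted above.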

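To derive the generalized Scheffé intervals we need the following projection lemma (see  pages 231-232). For real numbers $y_1, \ldots, y_I$, $z_1, \ldots, z_I$, and $r > 0$, the inequality

$$\sum_{i=1}^{I} a_i y_i - r\left(\sum_{i=1}^{I} a_i^2\right)^{1/2} \le \sum_{i=1}^{I} a_i z_i \le \sum_{i=1}^{I} a_i y_i + r\left(\sum_{i=1}^{I} a_i^2\right)^{1/2} \tag{2.8}$$

holds for all $a = (a_1, a_2, \ldots, a_I) \in \mathbb{R}^I$ if and only if $\sum_{i=1}^{I}(z_i - y_i)^2 \le r^2$.

We then choose $z_i = \sqrt{n_i}\,\mu_i$ and let $z = (z_1, z_2, \ldots, z_I)$ satisfy $\sum_{i=1}^{I}\left(z_i - \sqrt{n_i}\,D_i\right)^2 \le F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2$, which constitutes the interior of an $I$-dimensional sphere centered at the point $(\sqrt{n_1}D_1, \sqrt{n_2}D_2, \ldots, \sqrt{n_I}D_I)$ with radius $\sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2}$. Applying the projection lemma to the vector $a = (c_1/\sqrt{n_1}, c_2/\sqrt{n_2}, \ldots, c_I/\sqrt{n_I})$, we have, for all $c = (c_1, \ldots, c_I)$,

$$\begin{aligned}
\left\{\sum_{i=1}^{I}\left(\sqrt{n_i}D_i - \sqrt{n_i}\mu_i\right)^2 \le F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2\right\}
&= \left\{\sum_{i=1}^{I}\frac{c_i}{\sqrt{n_i}}\sqrt{n_i}\,\mu_i \in \sum_{i=1}^{I}\frac{c_i}{\sqrt{n_i}}\sqrt{n_i}\,D_i \pm \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2\sum_{i=1}^{I}\frac{c_i^2}{n_i}}\right\}\\
&= \left\{\sum_{i=1}^{I} c_i\mu_i \in \sum_{i=1}^{I} c_i D_i \pm \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2\sum_{i=1}^{I}\frac{c_i^2}{n_i}}\right\}.
\end{aligned} \tag{2.9}$$

Choosing $F_{\alpha,\hat\nu_1,\hat\nu_2}$ to be the $1-\alpha$ quantile of an $F$ distribution with $\hat\nu_1$ and $\hat\nu_2$ degrees of freedom, the result in (2.7) gives, approximately,

$$P\left\{\sum_{i=1}^{I}\left(\sqrt{n_i}D_i - \sqrt{n_i}\mu_i\right)^2 \le F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2\right\} = 1 - \alpha. \tag{2.10}$$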

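Applying the projection lemma, this probability statement can be pivoted to give the following generalized $1-\alpha$ simultaneous confidence intervals for $\sum_{i=1}^{I} c_i\mu_i$:

$$\sum_{i=1}^{I} c_i D_i \pm \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2\sum_{i=1}^{I}\frac{c_i^2}{n_i}}. \tag{2.11}$$

For the population means $\mu_i$ and their pairwise differences $\mu_i - \mu_j$, the generalized Scheffé confidence intervals are

$$D_i \pm \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{k=1}^{I} S_k^2\cdot\frac{1}{n_i}}, \qquad D_i - D_j \pm \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{k=1}^{I} S_k^2\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}, \tag{2.12}$$

where $1 \le i < j \le I$. By comparing (2.1) with (2.11), it can be seen that the generalized Scheffé confidence intervals are very similar to their classical counterparts.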
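The similarity to the classical intervals can be made explicit in the balanced case. The derivation below is a sketch under two stated assumptions: all sample sizes equal $n$ (so that $N = In$), and the degrees of freedom are taken at their equal-variance values $\nu_1 = I$ and $\nu_2 = N - I$ rather than at their estimates.

```latex
\begin{align*}
I\cdot\mathrm{MSE}
 &= I\cdot\frac{\sum_{i=1}^{I}(n-1)S_i^{2}}{N-I}
  = I\cdot\frac{(n-1)\sum_{i=1}^{I}S_i^{2}}{I(n-1)}
  = \sum_{i=1}^{I}S_i^{2},\\[4pt]
\sqrt{F_{\alpha,\nu_1,\nu_2}\sum_{i=1}^{I}S_i^{2}\sum_{i=1}^{I}\frac{c_i^{2}}{n_i}}
 &= \sqrt{I\,F_{\alpha,I,N-I}\,\mathrm{MSE}\sum_{i=1}^{I}\frac{c_i^{2}}{n_i}}
 \qquad (\nu_1 = I,\ \nu_2 = N-I).
\end{align*}
```

In practice $\hat\nu_1$ and $\hat\nu_2$ are computed from the sample variances, so the two half-widths agree only approximately even in this case.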

3. Assessment of Familywise Error Rate

In multiple comparisons, the familywise Type I error rate (FWER) is the probability of incorrectly rejecting at least one of the null hypotheses that make up the family. The validity of the proposed generalized Scheffé confidence intervals lies largely in successfully controlling the FWER at a given nominal level $\alpha$.

Two major factors, the sample sizes and the population variances, affect the performance of Scheffé's confidence intervals. We will show through simulation that the FWER is inflated when the population variances are unequal, most severely when the sample sizes are also unequal.

A variety of configurations of variances and sample sizes will be selected to assess the performance of the generalized Scheffé method. To this end, the number of groups is chosen to be I=4. Without loss of generality, we use 0 for all population means, that is, (μ1,μ2,μ3,μ4)=(0,0,0,0). The specification of sample sizes and variances is given in Table 1.

Table 1: Coverage rates (%) of 95% Scheffé intervals (S) and generalized Scheffé intervals (GS). Two sets of inferences are considered: the population means and the pairwise mean differences. Column headings give (σ1, σ2, σ3, σ4).

                                       Equal variances                            Unequal variances
                    (0.1, 0.1, 0.1, 0.1)          (1, 1, 1, 1)  (0.3, 0.3, 0.1, 0.1)          (3, 3, 1, 1)
Sample size                 S         GS          S         GS          S         GS          S         GS
Balanced
(5, 5, 5, 5)            98.00      98.60      98.45      99.00      93.60      96.85      94.05      97.35
(10, 10, 10, 10)        97.90      98.45      98.20      98.65      94.75      97.10      95.10      97.30
(20, 20, 20, 20)        97.70      97.90      98.20      98.35      93.90      96.25      94.80      96.45
(50, 50, 50, 50)        97.90      97.95      98.35      98.35      94.35      96.75      94.55      96.60
Unbalanced
(5, 5, 10, 10)          98.20      98.75      98.20      98.70      87.70      97.50      87.20      97.40
(5, 5, 20, 20)          98.40      99.10      97.90      98.45      73.00      96.20      76.50      96.65
(10, 10, 20, 20)        97.95      98.05      98.35      98.35      88.40      97.30      87.05      96.65
(10, 10, 50, 50)        98.60      98.80      98.70      98.65      73.95      96.70      72.55      97.10

Although Scheffé’s intervals apply to inference on all linear combinations, for simplicity, we have focused on two sets of inferences only: population means and their pairwise differences. For each configuration we conducted 5,000 simulation runs and for each run 95% Scheffé's intervals and generalized Scheffé's intervals on both population means and pairwise mean differences were computed. We then obtained the coverage rates that the proposed intervals contain the true means, which all equal 0.
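The simulation design described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: it assumes normal samples with all true means equal to 0, checks the $I$ means and all pairwise differences jointly, and uses a hypothetical function name `simultaneous_coverage`. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def simultaneous_coverage(sigmas, sizes, runs=2000, alpha=0.05, seed=0):
    """Estimate joint coverage of all means and all pairwise differences.

    Returns (scheffe_coverage, generalized_coverage); all true means are 0.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    I, N = len(sizes), int(sizes.sum())

    # Coefficient vectors: each single mean, then each pairwise difference.
    eye = np.eye(I)
    combos = [eye[i] for i in range(I)]
    combos += [eye[i] - eye[j] for i in range(I) for j in range(i + 1, I)]
    C = np.array(combos)
    scale = np.sqrt((C ** 2 / sizes).sum(axis=1))  # sqrt(sum c_i^2 / n_i)

    f_classic = stats.f.ppf(1 - alpha, I, N - I)
    hits_s = hits_gs = 0
    for _ in range(runs):
        samples = [rng.normal(0.0, s, n) for s, n in zip(sigmas, sizes)]
        means = np.array([x.mean() for x in samples])
        s2 = np.array([x.var(ddof=1) for x in samples])
        est = np.abs(C @ means)  # distance of each estimate from the truth (0)

        # Classical Scheffe half-widths, as in (2.1).
        mse = ((sizes - 1) * s2).sum() / (N - I)
        half_s = np.sqrt(I * f_classic * mse) * scale

        # Generalized Scheffe half-widths, as in (2.11).
        nu1 = s2.sum() ** 2 / (s2 ** 2).sum()
        nu2 = s2.sum() ** 2 / (s2 ** 2 / (sizes - 1)).sum()
        half_gs = np.sqrt(stats.f.ppf(1 - alpha, nu1, nu2) * s2.sum()) * scale

        hits_s += bool(np.all(est <= half_s))
        hits_gs += bool(np.all(est <= half_gs))
    return hits_s / runs, hits_gs / runs


if __name__ == "__main__":
    print(simultaneous_coverage((3, 3, 1, 1), (10, 10, 50, 50)))
```

Because the run count and random seed differ from the original study, the estimates will not match Table 1 digit for digit, but the qualitative pattern should be comparable.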

Table 1 reports the coverage rates based on both methods. Note that the empirical FWER is one minus the coverage rate. Clearly, in the case of equal variances, both methods give very similar coverage rates for balanced and unbalanced designs alike. In the unequal variance case, the coverage rate of Scheffé's method drops. For balanced designs, however, its FWER still stays close to the nominal level α = 0.05. This confirms Scheffé's observation that his method is robust to heteroscedasticity when the sample sizes are equal. The FWER is inflated when the sample sizes differ among the populations. It can be seen from Table 1 that when (σ1, σ2, σ3, σ4) = (0.3, 0.3, 0.1, 0.1), for sample sizes (n1, n2, n3, n4) = (5, 5, 10, 10), (5, 5, 20, 20), (10, 10, 20, 20), and (10, 10, 50, 50), the FWERs are 12.3%, 27%, 11.6%, and 26.05%, respectively. When (σ1, σ2, σ3, σ4) = (3, 3, 1, 1), for the same sample sizes, the FWERs are 12.8%, 23.5%, 12.95%, and 27.45%, respectively. These FWERs are all substantially greater than the nominal level α = 0.05 (5%). It can be seen that the greater the difference in sample sizes, the larger the corresponding FWER. On the other hand, the performance of the generalized Scheffé method is much more robust. For the same configuration settings, the FWERs based on the generalized Scheffé intervals are between 2.5% and 3.8%. Although the method is conservative, its FWER stays within the nominal level of α = 0.05.
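The FWER figures quoted above follow directly from Table 1 as 100 minus the coverage rate. A small sketch of that arithmetic, with the coverage values copied from the unbalanced, unequal-variance rows of Table 1 (the variable and function names are illustrative):

```python
# Coverage rates (%) of the classical Scheffe intervals, copied from Table 1
# for the unbalanced designs under unequal variances.
SCHEFFE_COVERAGE = {
    (0.3, 0.3, 0.1, 0.1): {(5, 5, 10, 10): 87.70, (5, 5, 20, 20): 73.00,
                           (10, 10, 20, 20): 88.40, (10, 10, 50, 50): 73.95},
    (3, 3, 1, 1): {(5, 5, 10, 10): 87.20, (5, 5, 20, 20): 76.50,
                   (10, 10, 20, 20): 87.05, (10, 10, 50, 50): 72.55},
}


def fwer_percent(coverage_percent):
    """Empirical FWER (%) = 100 - coverage (%)."""
    return round(100.0 - coverage_percent, 2)


# Nested dict with the same keys, holding FWER (%) instead of coverage (%).
FWER = {sig: {n: fwer_percent(c) for n, c in by_n.items()}
        for sig, by_n in SCHEFFE_COVERAGE.items()}

if __name__ == "__main__":
    for sig, by_n in FWER.items():
        for n, rate in by_n.items():
            print(sig, n, rate)
```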

It is also of interest to see how the two types of intervals differ in width. Comparing (2.1) with (2.12), one can see that the difference between them is due to the following two terms:

$$Q_1 = \sqrt{I\,F_{\alpha,I,N-I}\,\mathrm{MSE}}, \qquad Q_2 = \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i=1}^{I} S_i^2}.$$
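For a single data set, the two width terms can be computed from the sample variances and sample sizes alone. The following sketch does so; the function name `width_terms` is hypothetical, and SciPy is assumed for the $F$ quantiles (which accept non-integer degrees of freedom).

```python
from scipy import stats


def width_terms(sample_vars, sizes, alpha=0.05):
    """Return (Q1, Q2) for one data set, given S_i^2 and n_i."""
    I, N = len(sizes), sum(sizes)

    # Q1 = sqrt(I * F_{alpha, I, N-I} * MSE), with MSE the pooled variance.
    mse = sum((n - 1) * v for v, n in zip(sample_vars, sizes)) / (N - I)
    q1 = (I * stats.f.ppf(1 - alpha, I, N - I) * mse) ** 0.5

    # Q2 = sqrt(F_{alpha, nu1_hat, nu2_hat} * sum S_i^2).
    total = sum(sample_vars)
    nu1 = total ** 2 / sum(v ** 2 for v in sample_vars)
    nu2 = total ** 2 / sum(v ** 2 / (n - 1) for v, n in zip(sample_vars, sizes))
    q2 = (stats.f.ppf(1 - alpha, nu1, nu2) * total) ** 0.5
    return q1, q2


if __name__ == "__main__":
    # Sample variances set equal to population values, for illustration only.
    print(width_terms([1.0, 1.0, 1.0, 1.0], [10, 10, 10, 10]))
    print(width_terms([9.0, 9.0, 1.0, 1.0], [10, 10, 50, 50]))
```

Plugging in population variances in place of sample variances, as in the usage lines, only indicates the typical size of each term; the simulation averages them over random samples.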

The averaged Q1 and Q2 from 5,000 simulation runs are presented in Table 2.

Table 2: Comparison of interval widths between Scheffé's and the generalized Scheffé methods (α = 0.05). The interval widths differ in the quantities $Q_1 = \sqrt{I\,F_{\alpha,I,N-I}\,\mathrm{MSE}}$ for Scheffé's method and $Q_2 = \sqrt{F_{\alpha,\hat\nu_1,\hat\nu_2}\sum_{i} S_i^2}$ for the generalized Scheffé method. Column headings give (σ1, σ2, σ3, σ4).

                                       Equal variances                            Unequal variances
                    (0.1, 0.1, 0.1, 0.1)          (1, 1, 1, 1)  (0.3, 0.3, 0.1, 0.1)          (3, 3, 1, 1)
Sample size                Q1         Q2         Q1         Q2         Q1         Q2         Q1         Q2
Balanced
(5, 5, 5, 5)            0.343      0.370      3.422      3.680      0.754      0.909      7.598      9.182
(10, 10, 10, 10)        0.322      0.331      3.229      3.323      0.718      0.813      7.166      8.105
(20, 20, 20, 20)        0.315      0.319      3.153      3.194      0.703      0.778      7.032      7.780
(50, 50, 50, 50)        0.310      0.312      3.105      3.120      0.694      0.759      6.945      7.595
Unbalanced
(5, 5, 10, 10)          0.329      0.350      3.284      3.490      0.602      0.905      6.052      9.125
(5, 5, 20, 20)          0.318      0.340      3.196      3.423      0.489      0.905      4.894      9.093
(10, 10, 20, 20)        0.318      0.326      3.173      3.250      0.597      0.812      5.951      8.092
(10, 10, 50, 50)        0.312      0.321      3.128      3.218      0.466      0.810      4.669      8.138

It can be seen that $Q_1$ and $Q_2$ are very close to each other in the case of equal variances. However, in the case of unequal variances with unequal sample sizes, $Q_1$ becomes overoptimistically smaller than $Q_2$, which leads to the inflation of the FWER. Finally, Scheffé's intervals are derived from the fact that the $F$ statistic

$$F = \frac{\sum_{i=1}^{I} n_i (D_i-\mu_i)^2}{I\cdot\mathrm{MSE}} \tag{2.13}$$

follows the $F_{I,N-I}$ distribution under a number of assumptions. When these assumptions are violated, the performance of Scheffé's intervals depends on how far the above $F$ statistic deviates from the $F_{I,N-I}$ distribution. For the generalized Scheffé intervals, the FWER largely depends on how accurately the distribution of $R_1/R_2$ is approximated by $F_{\nu_1,\nu_2}$. Figure 1 plots the empirical distribution functions of $R_1/R_2$ and of the $F$ statistic in (2.13), along with their designated $F$ distributions. We selected the following four configurations of variances and sample sizes, which correspond to the homoscedastic/heteroscedastic and balanced/unbalanced cases:

(1) $(\sigma_1,\sigma_2,\sigma_3,\sigma_4) = (1,1,1,1)$, with $(n_1,n_2,n_3,n_4) = (10,10,10,10)$ or $(10,10,50,50)$;

(2) $(\sigma_1,\sigma_2,\sigma_3,\sigma_4) = (3,3,1,1)$, with $(n_1,n_2,n_3,n_4) = (10,10,10,10)$ or $(10,10,50,50)$.

Figure 1: Empirical density plots. Each density curve is generated from 5000 simulation runs. The solid line is for the $F$ statistic or $R_1/R_2$, and the dotted line is for the designated $F$ distribution.

Configuration (1) corresponds to equal variances for the four means, with equal or unequal sample sizes. Configuration (2) corresponds to unequal variances for the four means, with equal or unequal sample sizes. We calculated the empirical distribution function (edf) of $R_1/R_2$, and it nearly overlaps with $F_{\nu_1,\nu_2}$ in all four configurations of variances and sample sizes (1(a)-4(a) in Figure 1). This overlap suggests that the $F$ distribution is an excellent approximation to the distribution of the ratio of $R_1$ to $R_2$. In addition, the edf of the $F$ statistic also matches the $F_{I,N-I}$ distribution well (1(b)-3(b) in Figure 1), except in the unbalanced heteroscedastic case, where Scheffé's method fails (4(b) in Figure 1). This explains why the FWER is inflated in the case of unequal variances and unequal sample sizes.
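The quality of the $F_{\nu_1,\nu_2}$ approximation can also be checked numerically by estimating how often $R$ in (2.7) exceeds the upper $\alpha$ quantile of the approximating distribution. The sketch below is an illustration rather than the authors' procedure: it assumes normal populations, uses the true $\nu_1$ and $\nu_2$ from (2.5) and (2.6), and draws $D_i$ and $S_i^2$ directly from their sampling distributions. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def tail_rate_of_R(sigmas, sizes, runs=20000, alpha=0.05, seed=0):
    """Monte Carlo estimate of P(R > F_{alpha, nu1, nu2}) with true nu1, nu2."""
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    var = sigmas ** 2

    # True Satterthwaite degrees of freedom, as in (2.5) and (2.6).
    nu1 = var.sum() ** 2 / (var ** 2).sum()
    nu2 = var.sum() ** 2 / (var ** 2 / (sizes - 1)).sum()
    crit = stats.f.ppf(1 - alpha, nu1, nu2)

    # Under normality: D_i ~ N(0, sigma_i^2 / n_i) with all true means 0, and
    # (n_i - 1) S_i^2 / sigma_i^2 ~ chi^2_{n_i - 1}, independent of D_i.
    shape = (runs, len(sizes))
    D = rng.normal(0.0, sigmas / np.sqrt(sizes), size=shape)
    S2 = var * rng.chisquare(sizes - 1, size=shape) / (sizes - 1)

    R = (sizes * D ** 2).sum(axis=1) / S2.sum(axis=1)
    return float((R > crit).mean())


if __name__ == "__main__":
    for sig in [(1, 1, 1, 1), (3, 3, 1, 1)]:
        for n in [(10, 10, 10, 10), (10, 10, 50, 50)]:
            print(sig, n, tail_rate_of_R(sig, n))
```

A tail rate close to $\alpha$ in each configuration is consistent with the near overlap seen in panels 1(a)-4(a).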

As a final comment, the above simulation results suggest that the generalized Scheffé intervals tend to be wider than the Scheffé intervals. This is our overall impression, but it does not always hold. In the simulations we occasionally observed narrower generalized Scheffé intervals. This feature also appears in the data analysis example in the next section.

4. Example of Data Analysis

Solomon et al. studied smoking behavior in pregnant women. They examined the women's determination to quit smoking while pregnant. They interviewed 349 women at their first prenatal visit, all of whom were smokers when they became pregnant, and classified them into four groups: precontemplation (PC), contemplation (C), preparation (P), and action (A). Their intention was to look at the subsequent smoking behavior of these subjects during the course of pregnancy, but one important consideration was how much these women smoked when they became pregnant. The sample sizes, means, and standard deviations of these four groups, in terms of cigarettes smoked per day when they became pregnant, are given in Table 4. Since the smallest sample size is 37, we do not need to worry about the normality assumption even though the response of interest is an integer count.

Table 3 presents the 95% Scheffé’s intervals and the generalized Scheffé intervals for the four group means and their differences. Since both sample sizes and variances are quite different from each other, the generalized Scheffé intervals are more reliable.

Table 3: Simultaneous 95% Scheffé intervals and generalized Scheffé intervals on means and pairwise mean differences in the cigarette example.

Parameter        Scheffé               Generalized Scheffé
Mean
μPC              (20.66, 28.94)        (20.76, 28.84)
μC               (10.95, 22.25)        (11.08, 22.12)
μP               (26.02, 31.58)        (26.09, 31.51)
μA               (10.08, 17.32)        (10.16, 17.24)
Pairwise comparisons
μPC-μC           (1.19, 15.21)         (1.36, 15.04)
μPC-μP           (−8.98, 0.98)         (−8.87, 0.87)
μPC-μA           (5.59, 16.60)         (5.73, 16.47)
μC-μP            (−18.49, −5.90)       (−18.35, −6.05)
μC-μA            (−3.81, 9.61)         (−3.65, 9.45)
μP-μA            (10.53, 19.67)        (10.64, 19.56)

Table 4: Sample sizes, means, and sample standard deviations of cigarettes smoked per day for the 349 women in the study of smoking during pregnancy.

Label  Condition         Description                                   n_i    ȳ_i    s_i
PC     Precontemplation  Smokes and has no plan to quit smoking         69   24.8   13.3
C      Contemplation     Smokes but is thinking of quitting             37   16.6    5.2
P      Preparation       Smokes but has made some effort at quitting   153   28.8   12.2
A      Action            Has already quit                               90   13.7    8.8

One may make a number of inferences with a joint confidence level of 95%. For example, women in the preparation (P) group smoked on average between 26.09 and 31.51 cigarettes per day, which makes them the heaviest-smoking group. No significant difference is found between group P and group PC, because the confidence interval for their difference, (−8.87, 0.87), includes 0. It is also interesting to note that in this example the generalized Scheffé intervals are even narrower than the Scheffé intervals.
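The generalized Scheffé column of Table 3 can be recomputed from the summary statistics in Table 4 using (2.12). The sketch below is an illustrative implementation, not the authors' code; the function name and the dictionary layout of the result are assumptions, and SciPy is assumed for the $F$ quantile.

```python
from itertools import combinations
from scipy import stats


def generalized_scheffe_intervals(means, sds, sizes, alpha=0.05):
    """Generalized Scheffe intervals for each mean and each pairwise difference.

    Keys are (i,) for mean i and (i, j) for the difference mean_i - mean_j.
    """
    s2 = [s ** 2 for s in sds]
    total = sum(s2)
    nu1 = total ** 2 / sum(v ** 2 for v in s2)
    nu2 = total ** 2 / sum(v ** 2 / (n - 1) for v, n in zip(s2, sizes))
    base = stats.f.ppf(1 - alpha, nu1, nu2) * total  # F * sum of S_i^2

    out = {}
    for i, (m, n) in enumerate(zip(means, sizes)):
        h = (base / n) ** 0.5
        out[(i,)] = (m - h, m + h)
    for i, j in combinations(range(len(means)), 2):
        h = (base * (1 / sizes[i] + 1 / sizes[j])) ** 0.5
        d = means[i] - means[j]
        out[(i, j)] = (d - h, d + h)
    return out


# Summary statistics from Table 4, in the order PC, C, P, A.
intervals = generalized_scheffe_intervals(
    means=[24.8, 16.6, 28.8, 13.7],
    sds=[13.3, 5.2, 12.2, 8.8],
    sizes=[69, 37, 153, 90],
)

if __name__ == "__main__":
    labels = ["PC", "C", "P", "A"]
    for key, (lo, hi) in intervals.items():
        name = " - ".join(labels[k] for k in key)
        print(f"{name}: ({lo:.2f}, {hi:.2f})")
```

Small discrepancies in the last digit relative to Table 3 are possible because the inputs in Table 4 are rounded.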

5. Discussion

The Scheffé method is one of the most commonly used methods for making simultaneous inference on all linear combinations of means. Because Scheffé intervals cover all possible linear combinations of means, they are advantageous when a large number of linear combinations need to be compared. The method requires the assumption of equal variances in order to control the Type I error. When this assumption is violated, the proposed method can be conveniently used to construct simultaneous confidence intervals for which the Type I error is controlled at a prespecified nominal level. Results from simulations show that the FWER of the proposed simultaneous confidence intervals is well preserved at the nominal level without requiring the equal variance assumption.

References

[1] H. Scheffé, "A method for judging all contrasts in the analysis of variance," Biometrika, vol. 40, pp. 87-104, 1953.
[2] H. Scheffé, The Analysis of Variance, John Wiley & Sons, New York, NY, USA, 1959.
[3] H. Scheffé, "Practical solutions of the Behrens-Fisher problem," Journal of the American Statistical Association, vol. 65, pp. 1501-1504, 1970. doi:10.2307/2284332
[4] C. W. Dunnett, "Pairwise multiple comparison in the unequal variance case," Journal of American Statistical Association, vol. 75, pp. 796-800, 1980. doi:10.2307/2287161
[5] C. W. Dunnett, "Pairwise multiple comparisons in the homogeneous variance, unequal sample size case," Journal of American Statistical Association, vol. 75, pp. 789-795, 1980. doi:10.2307/2287160
[6] D. G. Nel and C. A. van der Merwe, "A solution to the multivariate Behrens-Fisher problem," Communications in Statistics. Simulation and Computation, vol. 15, no. 12, pp. 3719-3735, 1986. doi:10.1080/03610928608829342
[7] S. J. Kim, "A practical solution to the multivariate Behrens-Fisher problem," Biometrika, vol. 79, no. 1, pp. 171-176, 1992. doi:10.1093/biomet/79.1.171
[8] R. R. Wilcox, "Simulation results on solutions to the multivariate Behrens-Fisher problem via trimmed means," The Statistician, vol. 44, pp. 213-225, 1995. doi:10.2307/2348445
[9] W. F. Christensen and A. C. Rencher, "A comparison of type I error rates and power levels for seven solutions to the multivariate Behrens-Fisher problem," Communications in Statistics. Simulation and Computation, vol. 26, no. 4, pp. 1251-1273, 1997. doi:10.1080/03610919708813439
[10] R. T. Fouladi and R. D. Yockey, "Type I error control of two-group multivariate tests on means under conditions of heterogeneous correlation structure and varied multivariate distributions," Communications in Statistics. Simulation and Computation, vol. 31, no. 3, pp. 375-400, 2002. doi:10.1081/SAC-120003848
[11] D. R. Hoover, "Clinical trials of behavioural interventions with heterogeneous teaching subgroup effects," Statistics in Medicine, vol. 30, pp. 1351-1364, 2002. doi:10.1002/sim.1139
[12] G. Casella and R. L. Berger, Statistical Inference, Duxbury, 2001.
[13] J. C. Hsu, Multiple Comparisons, Theory and Methods, Chapman and Hall/CRC, London, UK, 1999.
[14] L. J. Solomon, R. H. Secker-Walker, J. M. Skelly, and B. S. Flynn, "Stages of change in smoking during pregnancy in low-income women," Journal of Behavioral Medicine, vol. 19, no. 4, pp. 333-344, 1996. doi:10.1007/BF01904760