An Integrated Selection Formulation for the Best Normal Mean: The Unequal and Unknown Variance Case

This paper considers an integrated formulation for selecting the best normal mean in the case of unequal and unknown variances. The formulation separates the parameter space into two disjoint parts, the preference zone (PZ) and the indifference zone (IZ). In the PZ we insist on selecting the best population for a correct selection (CS_1), while in the IZ we define any selected subset to be correct (CS_2) if it contains the best population. We find the least favorable configuration (LFC) in the PZ and the worst configuration (WC) in the IZ. We derive formulas for P(CS_1|LFC) and P(CS_2|WC), along with bounds for the expected sample size E(N). We also give tables of the procedure parameters needed to implement the proposed procedure. An example illustrates how to apply the procedure and how to use the tables.


Introduction
This paper studies an integrated approach to selecting the best normal mean among k normal populations with unequal and unknown variances. Unlike the case of common and unknown variance studied in Chen and Zhang (1997), we cannot use the pooled sample variance to estimate the unknown variances here. One important change, compared to the common-variance case, is that with unequal and unknown variances we use weighted averages as the estimators of the population means. Such a change enables us to effectively evaluate lower bounds for the probability of a correct selection.
Historically, many authors have studied multiple decision procedures for the case of unequal and unknown variances using the classical approaches. Within the indifference zone approach, Bechhofer, Dunnett, and Sobel (1954) mentioned the possibility of a two-stage procedure for selecting the best of k normal populations with unknown means and unequal and unknown variances. Dudewicz (1971) showed that under the indifference zone approach of Bechhofer (1954), a single-stage procedure is not appropriate when the variances are unequal and unknown. Dudewicz and Dalal (1975) proposed a generalized Stein-type two-stage procedure using the indifference zone approach. In the subset selection approach, Gupta and Huang (1974) proposed a single-stage procedure based on unequal sample sizes for selecting a subset that contains the best population when the variances are unknown and possibly unequal. Chen and Sobel (1987) was the first article to propose the integrated selection formulation; it studied a single-stage procedure for the common known variance case. The integrated formulation for the selection problem with unequal and unknown variances has not been studied. Yet this case is important in applications, since variances are unknown and unequal in most real-world problems. The objective of this paper is to develop a two-stage procedure, using the integrated approach, to select the best normal mean from k normal populations with unequal and unknown variances.
In Section 2 we state our goal, assumptions, and the probability requirements. We propose a two-stage procedure in Section 3. In Section 4 we derive lower bounds for the probability of a correct selection. These bounds enable us to effectively compute the unknown parameters in our selection procedure and to guarantee that the procedure satisfies a given probability requirement (P*_1, P*_2). The experimenter can allocate sample sizes according to these parameters. In Section 5, we develop bounds for the expected sample size of the proposed procedure. The integrated formulation requires our procedure to satisfy two probability requirements simultaneously; it is therefore to be expected that the expected sample size of our procedure is larger than that under the indifference zone approach. Section 6 discusses the computation of the tables. Section 7 gives an illustrative example.

Assumptions, Goal, and The Probability Requirements
Suppose that we have k normal populations π_1, . . ., π_k with unknown means and unequal and unknown variances σ_1^2, σ_2^2, . . ., σ_k^2. We denote the ordered means by µ_[1] ≤ µ_[2] ≤ · · · ≤ µ_[k], and denote by π_(i) the population corresponding to µ_[i]. We define the best population to be π_(k), the population corresponding to the largest population mean µ_[k]. Our goal is to derive a two-stage selection procedure P_E. The parameter space is partitioned into the preference zone PZ = {µ : µ_[k] − µ_[k−1] ≥ δ*} and the indifference zone IZ = {µ : µ_[k] − µ_[k−1] < δ*}, where δ* > 0 is a prespecified constant.
We define CS_1 to be the event that our procedure selects the one best population when µ ∈ PZ, and CS_2 to be the event that our procedure selects a subset containing the best population when µ ∈ IZ. We require that our two-stage selection procedure P_E, defined formally in Section 3, satisfy, for a given (P*_1, P*_2), the probability requirements P(CS_1) ≥ P*_1 whenever µ ∈ PZ and P(CS_2) ≥ P*_2 whenever µ ∈ IZ.

Procedure P E
We propose a Dudewicz–Dalal-type two-stage selection procedure.

Procedure P_E:

(i) Take an initial sample X_i1, X_i2, . . ., X_in0 of size n_0 (≥ 2) from population π_i, i = 1, 2, . . ., k, and compute the first-stage sample mean and sample variance s_i^2.

(ii) Define the total sample size n_i, where [y] denotes the smallest integer greater than or equal to y. Here h* = max{h*_1, h*_2}, and h*_1, h*_2, h*_3, and c are chosen to satisfy the probability requirement (5). They are the solutions of integral equations of the following type. When k = 2, for given n_0 and specification (δ*, P*_1, P*_2, a), the values h*_1 and h*_2 satisfy the two equations of Theorem 2, the first of which is ∫_{−∞}^{+∞} G(t + h*_1) g(t) dt = P*_1. For any k ≥ 3 and any n_0 and specification (δ*, P*_1, P*_2, a), the values h*_1, h*_2 and h*_3 simultaneously satisfy the equations of Theorem 3. Here G and g are Student's t distribution and density functions, respectively.
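For k = 2, the first of these integral equations, ∫ G(t + h*_1) g(t) dt = P*_1, can be solved numerically. The following is a minimal sketch using SciPy; the function name, the root bracket [0, 20], and the specification values n_0 = 15, P*_1 = .90 are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy import integrate, optimize, stats

def solve_h1_k2(n0, p_star):
    """Solve the k = 2 equation  int G(t + h) g(t) dt = P*  for h,
    where G and g are the Student's t CDF and density with n0 - 1
    degrees of freedom."""
    df = n0 - 1

    def lhs(h):
        val, _ = integrate.quad(
            lambda t: stats.t.cdf(t + h, df) * stats.t.pdf(t, df),
            -np.inf, np.inf)
        return val

    # lhs is increasing in h, from 1/2 at h = 0 toward 1; bracket and solve.
    return optimize.brentq(lambda h: lhs(h) - p_star, 0.0, 20.0)

h1 = solve_h1_k2(n0=15, p_star=0.90)
```

The left-hand side equals P(T_1 − T_2 < h) for independent t variables, which is why it increases monotonically from 1/2 to 1 and a root always exists for P* ∈ (1/2, 1).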
(iii) Take n_i − n_0 additional observations from the i-th population. Denote the observations by X_ij, where i = 1, 2, . . ., k and j = 1, 2, . . ., n_i, and compute the weighted average X̃_i associated with population π_i, where the a_ij's are to be chosen so that the following conditions are satisfied for i = 1, 2, . . ., k: Here δ* > c, δ* = ac, and a > 1 is given. The procedure is meaningful only if the a_ij exist; their existence can be shown through simple but extended algebra. Essentially, the a_ij adjust for the fact that the sample size must be a whole number, so that a standard-error estimate based on the preliminary sample would take only discrete values if all observations were equally weighted. By allocating unequal weights, the estimated standard error can be equated to a specified quantity.
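The conditions on the a_ij are not reproduced in this text; in the Dudewicz–Dalal construction they are Σ_j a_ij = 1 together with s_i^2 Σ_j a_ij^2 = z^2 for z = (δ* − c)/h*, with one common weight on the first n_0 observations and another on the second-stage observations. Under those assumptions (including the sample-size rule n_i = max{n_0 + 1, ⌈s_i^2/z^2⌉}, which is assumed here rather than quoted from the paper's formula (7)), the bookkeeping can be sketched as:

```python
import math

def stage_two(n0, s2_i, h_star, delta_star, c):
    """Sketch of the second-stage sample size and two-group weights,
    assuming the Dudewicz-Dalal-type conditions
        sum_j a_ij = 1   and   s2_i * sum_j a_ij**2 = z**2,
    with z = (delta_star - c) / h_star, equal weight a on the first n0
    observations and equal weight b on the n_i - n0 new ones."""
    z = (delta_star - c) / h_star
    # Assumed rule: smallest whole n_i with n_i >= s2_i / z**2.
    n_i = max(n0 + 1, math.ceil(s2_i / z**2))
    m = n_i - n0
    # Solving the two conditions for b gives a quadratic; the ceiling in
    # n_i guarantees a nonnegative discriminant, i.e. the a_ij exist.
    disc = (n0 / m) * (n_i * z**2 / s2_i - 1.0)
    b = (1.0 - math.sqrt(disc)) / n_i
    a = (1.0 - m * b) / n0
    return n_i, a, b
```

The discriminant is nonnegative exactly when n_i ≥ s_i^2/z^2, which is the "existence of the a_ij" referred to above: rounding n_i up makes the equally-weighted standard error too small, and the unequal weights inflate it back to the target z.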

Lower Bounds for P (CS 1 ) and P (CS 2 )
To derive lower bounds for the probability of a correct selection, one needs to find the least favorable configuration as well as the worst configuration. We first define the least favorable configuration in the PZ and the worst configuration in the IZ.
Definition 1. For any σ^2 = (σ_1^2, σ_2^2, . . ., σ_k^2), the least favorable configuration (LFC) in the PZ is defined to be the configuration that minimizes P(CS_1) over the PZ, and the worst configuration (WC) in the IZ is defined to be the configuration that minimizes P(CS_2) over the IZ.

To derive lower bounds for P(CS_1) and P(CS_2) on the parameter space Ω, we first show (for any σ^2) that if T_i = (X̃_i − µ_i)/((δ* − c)/h*), then the T_i have independent Student's t distributions with n_0 − 1 degrees of freedom, i = 1, 2, . . ., k.
Proof: The proof can be found in Stein (1945).
As the denominator (δ* − c)/h* is a constant, this lemma can hold only because the additional sample sizes n_i are random.
Theorem 1. Under procedure P_E, the LFC for P(CS_1|PZ) is given by the slippage configuration, i.e., by µ_[1] = · · · = µ_[k−1] = µ_[k] − δ*, and the WC for P(CS_2|IZ) is given by the equal-parameter configuration, i.e., by µ_[1] = · · · = µ_[k].

Proof: From (18), the random variable T_i (i = 1, 2, . . ., k) has a t distribution with n_0 − 1 degrees of freedom. Rewrite X̃_i accordingly and consider the family of distribution functions {G_n(x|µ)}, where G_n is the distribution of the random variable ((δ* − c)/h*) · t_{n−1} + µ; here (δ* − c)/h* is a constant, µ is the parameter of interest, and t_{n−1} has a t distribution with n − 1 degrees of freedom. It is then clear that {G_n(x|µ)} is a stochastically increasing family in µ, and one shows that the LFC for P(CS_1|PZ) is the slippage configuration. The proof for the WC for P(CS_2|IZ) is similar. Starting with an arbitrary configuration in the IZ and letting X̃_(i) denote the weighted sample mean associated with µ_[i], define the function ψ = ψ(y_1, y_2, . . ., y_k) and consider its behavior in y_i when all the y_j for j ≠ i are held fixed. Since the X̃'s are from a stochastically increasing family, Lemma 5.1 of Chen and Sobel (1987) yields the conclusion. This completes the proof of the theorem.
Lemma 2. Under procedure P_E, the probabilities of a correct selection in the PZ and in the IZ are, respectively, as given below. Proof: The result is clear for P(CS_1|PZ).

Theorem 2. When k = 2, for given n_0 and specification (δ*, P*_1, P*_2, a), the h*_1 and h*_2 values which satisfy (26) and (27) are the values for procedure P_E to satisfy the probability requirement (5).
Here G and g are Student's t distribution and density functions, respectively. Remark: When k = 2, d > 0 can be chosen arbitrarily, since if we did not select the one best population, we would select both populations regardless of the value of d.
Proof: Denote (δ* − c)/h* by e*. The bounds for P(CS_1) and P(CS_2) follow from Lemma 2; the first inequality follows from the fact that T_1 and T_2 both have Student's t distributions. From Theorem 2, it is clear that as h*_1, h*_2 → ∞, the left-hand sides of (26) and (27) approach 1.
Theorem 3. For any k ≥ 3 and any n_0 and specification (δ*, P*_1, P*_2, a), the h*_1, h*_2 and h*_3 values which simultaneously satisfy (32) and (33) are the values for procedure P_E to satisfy the probability requirement (5).
Here G and g are Student's t-distribution and density function, respectively.
Proof: The proof of Theorem 3 is lengthy and is omitted here; readers may contact the first author for a full version of the manuscript containing the proof. The left-hand sides of the integral equations (32) and (33) in Theorem 3 are increasing in h*_1, h*_2 and h*_3. Indeed, as h*_1 approaches infinity, the left-hand side of (32) increases to 1, and as h*_2 and h*_3 approach infinity, the left-hand side of (33) also increases to 1. Thus we can always find h*_1, h*_2, and h*_3 that satisfy the probability requirements P*_1 and P*_2.
One should note that it is necessary to let h*_1, h*_2 and h*_3 vary freely so that our procedure is applicable for any given probability requirements. Otherwise, the integral equations (32) and (33) might not have a solution, in which case procedure P_E is not applicable. For instance, if one requires h*_1 = h*_2, then for some (P*_1, P*_2) the integral equations (32) and (33) might not have a solution.
In procedure P_E, we let δ* = ac, a > 1. Such a requirement has the advantage that the lower bounds for the probability of a correct selection do not involve c. Instead of letting δ* = ac, a > 1, one can require that δ* = a + c, a > 0. In such a case, (32) in Theorem 3 is unchanged, but (33) is changed to:

The Expected Sample Sizes and The Expected Subset Size
The total sample size n_i from population π_i (i = 1, 2, . . ., k) in procedure P_E can be calculated from (7). Clearly the n_i, i = 1, 2, . . ., k, are random variables, and their expected values are often valuable to the experimenter. In our case, studying the expected sample size is especially important, since there are two unknowns in the integral equation (11) with only one constraint; there are thus infinitely many solutions, and we need some additional guideline to choose h*_2 and h*_3. The expected sample size, which is a function of h*, gives us some idea of how h* relates to E(n_i). It is reasonable to choose h*_2 and h*_3 to minimize the expected sample sizes in addition to satisfying the probability requirements. To evaluate the expected sample sizes, we use the method of Stein (1945).
Theorem 4. For any i ∈ {1, 2, . . ., k}, the expected sample size E(n_i) for procedure P_E satisfies the following inequality, where F_i(x) is a chi-squared probability distribution function with i degrees of freedom and e*_2 = (δ* − c)/h*_2:
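Since the exact bounds of Theorem 4 are not reproduced in this text, a Monte Carlo check of E(n_i) is a useful companion. The sketch below assumes the sample-size rule n_i = max{n_0 + 1, ⌈s_i^2/z^2⌉} with z = (δ* − c)/h*, and uses Stein's setup s_i^2 ~ σ_i^2 χ^2_{n_0−1}/(n_0 − 1); the function name and parameter values are illustrative.

```python
import numpy as np

def expected_n_mc(n0, sigma2, h_star, delta_star, c, reps=100_000, seed=0):
    """Monte Carlo estimate of E(n_i) under the assumed rule
    n_i = max(n0 + 1, ceil(s_i^2 / z^2)), z = (delta* - c)/h*,
    with s_i^2 ~ sigma2 * chi2(n0 - 1) / (n0 - 1)."""
    rng = np.random.default_rng(seed)
    z2 = ((delta_star - c) / h_star) ** 2
    s2 = sigma2 * rng.chisquare(n0 - 1, size=reps) / (n0 - 1)
    n_i = np.maximum(n0 + 1, np.ceil(s2 / z2))
    return float(n_i.mean())

# E(s_i^2) = sigma2, so E(n_i) is close to sigma2 / z^2 when that ratio
# is large, matching the h^2 r behaviour discussed with Table 3 below.
est = expected_n_mc(n0=15, sigma2=1.0, h_star=4.0, delta_star=1.0, c=0.5)
```

Such an estimate sits between the lower and upper bounds of Theorem 4 and can guide the choice of h*_2 and h*_3 when minimizing the expected sample sizes.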
Proof: The proof follows the ideas of Stein (1945) and is omitted here. Readers may contact the first author for a full version of the manuscript, which contains the proof.

The difference between the upper bounds and the lower bounds of E(n_i)
Proof: These properties are immediate by Theorem 4.

Tables
To carry out procedure P_E, one needs the values of h*_1, h*_2, and h*_3. In Table 1, we provide the h_1 values, for the cases k = 3, 4 and P* = .50, .75, .90, .95, .99, which satisfy the following integral equation:
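The Table 1 equation is not reproduced in this text; the standard form for selecting the largest of k means is ∫ G^{k−1}(t + h) g(t) dt = P*, and under that assumption the table can be regenerated numerically. The function name and the root bracket are illustrative.

```python
import numpy as np
from scipy import integrate, optimize, stats

def h1_table_value(k, n0, p_star):
    """Solve  int G(t + h)**(k - 1) g(t) dt = P*  for h, with G, g the
    Student's t CDF/density on n0 - 1 df -- the assumed Table 1 equation."""
    df = n0 - 1

    def lhs(h):
        val, _ = integrate.quad(
            lambda t: stats.t.cdf(t + h, df) ** (k - 1) * stats.t.pdf(t, df),
            -np.inf, np.inf)
        return val

    # lhs increases from 1/k at h = 0 toward 1; bracket and solve.
    return optimize.brentq(lambda h: lhs(h) - p_star, 0.0, 30.0)

# A small slice of the assumed table for k = 3, n0 = 15.
row = {p: round(h1_table_value(3, 15, p), 3) for p in (0.75, 0.90, 0.95)}
```

The solution increases in both P* and k, which matches the qualitative behavior one expects of the tabulated h_1 values.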
As discussed in Section 4, the integral equation (33) has infinitely many solutions. Therefore, it is impossible to provide tables covering all practical situations: a particular solution of (33) might be good for one objective yet unsuitable for another. We tabulate in Table 2 the values of h_2 and h_3 for k = 3, 4 and P*_2 = .50, .75, .90, .99, where h_2 and h_3 satisfy the following integral equation: The relationship between h*_2, h*_3 and h_2, h_3 is as follows: The computation of Table 2 relies on the following assumptions: 1. We take a = 2 (thus c = δ*/2). 2. We take h*_1 = h*_2 = h_1 = h_2, where h_1 is the value corresponding to P*_1 = P*_2 = .50, .75, .90, .95, .99 in Table 1, respectively. 3. The probabilities are accurate to ±.0003. We use Fortran 77 to program the double integrals. Integration is carried out by the Romberg numerical method (Burden and Faires (1988)), with Neville's algorithm (Burden and Faires (1988)) used for extrapolation. We modified the subroutines provided by Press, Teukolsky, Vetterling, and Flannery (1992) for our program. The upper limits of integration for the Student's t density functions depend on the degrees of freedom of the density. All real variables are declared as double precision. Programs were executed under a UNIX environment on SUN4 600 Series and SUN4 Sparc 2000 machines.
We also provide a table (Table 3) of approximations to the expected sample sizes, using the h = h_1 value obtained in Table 1 and selected values of r. Mathematica was used to perform the calculation. We compute the approximations using the lower-bound formula for E(n_i) in Theorem 4. The formula is: By (38), it is clear that E(n_i) is dominated by h^2 r when r is small, h is large, and n_0 is not very large. Indeed, from Table 3 one sees that in this regime the change in E(n_i) is proportional to the change in r for fixed h; in fact h^2 r is then a precise estimate of E(n_i).

An Illustrative Example
We now present an example to illustrate procedure P_E. Example: Suppose that we are given three normal populations with unequal and unknown variances, and that we wish to use the integrated formulation to select the population having the largest population mean if µ_[3] − µ_[2] ≥ 1, and to select a subset that contains the largest mean if µ_[3] − µ_[2] < 1. Suppose that for certain practical reasons the experimenter decides to take an initial sample of size n_0 = 15. We use Fortran to generate three random samples of size 15 from the populations N(4, .9^2), N(4.5, 1^2), and N(5.5, 1.5^2).
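The stage-one computation of this example can be sketched in a few lines; the random seed, and hence the particular samples, are arbitrary rather than those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2024)
n0 = 15
params = [(4.0, 0.9), (4.5, 1.0), (5.5, 1.5)]  # (mean, sd) of the three populations

# Stage one: an initial sample of size n0 from each population,
# with its sample mean and sample variance s_i^2.
samples = [rng.normal(mu, sd, n0) for mu, sd in params]
stage_one = [(x.mean(), x.var(ddof=1)) for x in samples]
for i, (xbar, s2) in enumerate(stage_one, start=1):
    print(f"pi_{i}: mean = {xbar:.3f}, s_i^2 = {s2:.3f}")
```

The sample variances s_i^2 computed here feed into stage (ii) of procedure P_E to determine the total sample sizes n_i.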
(For P(CS_2|P_E), the events H_0, H_1 and H_2 correspond to the cases of X̃_(k) being the largest, the second largest, and neither the largest nor the second largest, respectively.)

Table 1 .
This table provides some h_1 values for procedure P_E.

Table 2 .
This table provides some (h_2, h_3) values for procedure P_E.

Table 3 .
This table provides some approximations of the expected sample sizes for procedure P E .