Lower Confidence Bounds for the Probabilities of Correct Selection

We extend the results of Gupta and Liang (1998), derived for location parameters, to obtain lower confidence bounds for the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, \ldots, k - 1$ for the general scale parameter models, where $k$ is the number of populations involved in the selection problem. The application of the results to the exponential and normal probability models is discussed. The implementation of the simultaneous lower confidence bounds for $\mathrm{PCS}_t$ is illustrated through real-life datasets.


Introduction
The population $\Pi_i$ is characterized by an unknown scale parameter $\theta_i$ ($> 0$), $i = 1, \ldots, k$. Let $T_i$ be an appropriate statistic for $\theta_i$, based on a random sample of size $n$ from population $\Pi_i$, having the probability density function (pdf) $f_{\theta_i}(x) = (1/\theta_i) f(x/\theta_i)$ with the corresponding cumulative distribution function (cdf) $F_{\theta_i}(x) = F(x/\theta_i)$, $x > 0$, $\theta_i > 0$, $i = 1, \ldots, k$. Here $F(\cdot)$ is an arbitrary continuous cdf with pdf $f(\cdot)$.

Let the ordered values of the $T_i$'s and $\theta_i$'s be denoted by $T_{[1]} \le \cdots \le T_{[k]}$ and $\theta_{[1]} \le \cdots \le \theta_{[k]}$, respectively. Let $T_{(i)}$ be the statistic having scale parameter $\theta_{[i]}$, and let $\Pi_{(i)}$ denote the population associated with $\theta_{[i]}$, the $i$th smallest of the $\theta_i$'s. Any other population or sample quantity associated with $\Pi_{(i)}$ will be denoted by the subscript $(i)$ attached to it. Throughout, we assume that there is no prior knowledge about which of $\Pi_1, \ldots, \Pi_k$ is $\Pi_{(i)}$, $i = 1, \ldots, k$, and that $\theta_1, \ldots, \theta_k$ are unknown. We call the populations $\Pi_{(k)}, \Pi_{(k-1)}, \ldots, \Pi_{(k-t+1)}$ the $t$ best populations.
In practice, the interest is to select the populations $\Pi_{(k)}, \Pi_{(k-1)}, \ldots, \Pi_{(k-t+1)}$, that is, the populations associated with the largest unknown parameters $\theta_{[k]}, \theta_{[k-1]}, \ldots, \theta_{[k-t+1]}$. For this, the natural selection rule "select the populations corresponding to the $t$ largest $T_i$'s, that is, $T_{[k]}, T_{[k-1]}, \ldots, T_{[k-t+1]}$, as the $t$ best populations" is used. However, the populations selected by the natural selection rule may not be the best. Therefore, a question which naturally arises is: what kind of confidence statement can be made about these selection results? Motivated by this, we make an effort to answer this question.
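The natural selection rule described above can be sketched in a few lines. This is only an illustration: the function name `natural_selection`, the population labels, and the numerical values of the statistics are hypothetical and do not come from the data sets analyzed in this article.

```python
def natural_selection(stats, t):
    """Return the labels of the t populations with the largest statistics T_i.

    stats maps a population label to its observed statistic T_i.
    """
    k = len(stats)
    if not 1 <= t <= k - 1:
        raise ValueError("t must satisfy 1 <= t <= k - 1")
    # Rank the population labels by their statistic, largest first.
    ranked = sorted(stats, key=stats.get, reverse=True)
    return ranked[:t]


# Hypothetical statistics for k = 4 populations (illustrative values only).
example_stats = {"pi_1": 9.2, "pi_2": 4.1, "pi_3": 2.7, "pi_4": 6.5}
selections = {t: natural_selection(example_stats, t) for t in range(1, 4)}
```

Here `selections[t]` holds the populations declared to be the $t$ best for each $t = 1, \ldots, k - 1$; whether that declaration is correct is exactly what the confidence bounds on the probability of correct selection address.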
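Let $\mathrm{CS}_t$ (a correct selection of the $t$ best populations) denote the event that the $t$ best populations are actually selected. Then, the probability of correct selection of the $t$ best populations ($\mathrm{PCS}_t$) is

$$\mathrm{PCS}_t(\theta) = P_\theta\Big\{\max_{1 \le i \le k-t} T_{(i)} < \min_{k-t+1 \le j \le k} T_{(j)}\Big\}$$

$$= \sum_{j=k-t+1}^{k} \int_0^\infty \prod_{i=1}^{k-t} F\!\left(\frac{y\,\theta_{[j]}}{\theta_{[i]}}\right) \prod_{\substack{m=k-t+1 \\ m \ne j}}^{k} \bar{F}\!\left(\frac{y\,\theta_{[j]}}{\theta_{[m]}}\right) dF(y) \tag{1.1a}$$

$$= \sum_{i=1}^{k-t} \int_0^\infty \prod_{\substack{m=1 \\ m \ne i}}^{k-t} F\!\left(\frac{y\,\theta_{[i]}}{\theta_{[m]}}\right) \prod_{j=k-t+1}^{k} \bar{F}\!\left(\frac{y\,\theta_{[i]}}{\theta_{[j]}}\right) dF(y), \tag{1.1b}$$

where $\bar{F}(\cdot) = 1 - F(\cdot)$ and $\theta = (\theta_1, \ldots, \theta_k)$.

For $k$ populations differing in their location parameters $\mu_1, \ldots, \mu_k$, Gupta and Liang [1] provided a novel idea to construct simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$. Their result was applied to the selection of the $t$ best means of normal populations. For other references under the location setup, one may refer to the papers cited therein.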
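Since the probability of correct selection depends on the unknown parameters, it can help to see how it behaves for a known configuration. The following is a minimal Monte Carlo sketch, not part of the method of this article: it assumes independent statistics of the form $T_i = \theta_i G_i$ with $G_i$ standard gamma (one possible choice of $F$), and the function name `simulate_pcs` and its arguments are hypothetical.

```python
import random


def simulate_pcs(thetas, t, shape, reps=20000, seed=0):
    """Monte Carlo estimate of PCS_t under the natural selection rule.

    Assumes T_i = theta_i * G_i with G_i ~ Gamma(shape, 1), independent.
    """
    rng = random.Random(seed)
    k = len(thetas)
    # Indices of the t largest scale parameters: the truly best populations.
    best = set(sorted(range(k), key=lambda i: thetas[i])[k - t:])
    hits = 0
    for _ in range(reps):
        stats = [th * rng.gammavariate(shape, 1.0) for th in thetas]
        # Natural selection rule: pick the t largest observed statistics.
        chosen = set(sorted(range(k), key=lambda i: stats[i])[k - t:])
        hits += chosen == best
    return hits / reps


# Widely separated scale parameters: correct selection is almost certain.
pcs_separated = simulate_pcs([1.0, 1000.0], t=1, shape=10.0, reps=2000)
# Equal scale parameters: the selection is essentially a coin flip.
pcs_equal = simulate_pcs([1.0, 1.0], t=1, shape=10.0)
```

The two calls illustrate the extremes: the estimate is close to 1 when the scale parameters are far apart and close to 1/2 when two populations are indistinguishable, which is why a data-based lower bound on the probability is of interest.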
For other relevant references, one may refer to Gupta et al. [2], Gupta and Panchapakesan [3], Mukhopadhyay and Solanky [4], and the review papers by Gupta and Panchapakesan [5, 6], Khamnei and Kumar [7], and the references cited therein.

In this article, we use the methodology and results of Gupta and Liang [1] to derive simultaneous lower confidence bounds for the $\mathrm{PCS}_t$ for all $t = 1, \ldots, k - 1$ under the general scale parameter models. Section 2 deals with obtaining such bounds. The application of the results to the exponential and normal probability models is discussed in Section 3. In the case of an exponential distribution, Type-II censored data is also considered. In Section 4, we give some numerical examples, based on real-life data sets, to illustrate the procedure for finding simultaneous lower confidence bounds for the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$).

Simultaneous Lower Confidence Bounds for $\mathrm{PCS}_t$
Most of the results in this section are simple consequences of the results obtained by Gupta and Liang [1].
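From (1.1a), the $\mathrm{PCS}_t(\theta)$ can be expressed as

$$\mathrm{PCS}_t(\theta) = \sum_{j=k-t+1}^{k} P_{tj}(\theta), \tag{2.1}$$

where, for each $j = k - t + 1, \ldots, k$,

$$P_{tj}(\theta) = \int_0^\infty \prod_{i=1}^{k-t} F\big(y\,\Delta_{tji}^{(1)}\big) \prod_{m=k-t+1}^{j-1} \bar{F}\big(y\,\Delta_{tjm}^{(2)}\big) \prod_{l=j+1}^{k} \bar{F}\big(y\,\Delta_{tjl}^{(3)}\big)\, dF(y), \tag{2.2}$$

where $\Delta_{tji}^{(1)} = \theta_{[j]}/\theta_{[i]}$, $\Delta_{tjm}^{(2)} = \theta_{[j]}/\theta_{[m]}$, and $\Delta_{tjl}^{(3)} = \theta_{[j]}/\theta_{[l]}$.

Note that for each $j$ ($k - t + 1 \le j \le k$), $P_{tj}(\theta)$ is increasing in $\Delta_{tji}^{(1)}$, and decreasing in $\Delta_{tjm}^{(2)}$ and $\Delta_{tjl}^{(3)}$, respectively. Thus, if we develop simultaneous lower confidence bounds for $\Delta_{tji}^{(1)}$, $1 \le i \le k - t$, and upper confidence bounds for $\Delta_{tjm}^{(2)}$ and $\Delta_{tjl}^{(3)}$, $k - t + 1 \le m \le j \le l \le k$, $m \ne j$, $l \ne j$, for all $t = 1, \ldots, k - 1$, then simultaneous lower confidence bounds for $\mathrm{PCS}_t(\theta)$ for all $t = 1, \ldots, k - 1$ can be established.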
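Also, from (1.1b), the $\mathrm{PCS}_t(\theta)$ can be expressed as

$$\mathrm{PCS}_t(\theta) = \sum_{i=1}^{k-t} Q_{ti}(\theta), \tag{2.3}$$

where, for each $i = 1, \ldots, k - t$,

$$Q_{ti}(\theta) = \int_0^\infty \prod_{m=1}^{i-1} F\big(y\,\delta_{tim}^{(1)}\big) \prod_{l=i+1}^{k-t} F\big(y\,\delta_{til}^{(2)}\big) \prod_{j=k-t+1}^{k} \bar{F}\big(y\,\delta_{tij}^{(3)}\big)\, dF(y), \tag{2.4}$$

with $\delta_{tim}^{(1)} = \theta_{[i]}/\theta_{[m]}$, $\delta_{til}^{(2)} = \theta_{[i]}/\theta_{[l]}$, and $\delta_{tij}^{(3)} = \theta_{[i]}/\theta_{[j]}$.

Note that $Q_{ti}(\theta)$ is increasing in $\delta_{tim}^{(1)}$ and $\delta_{til}^{(2)}$, and decreasing in $\delta_{tij}^{(3)}$, respectively. Thus, if simultaneous lower confidence bounds for $\delta_{tim}^{(1)}$ and $\delta_{til}^{(2)}$, $1 \le m \le i \le l \le k - t$, $m \ne i$, $l \ne i$, and upper confidence bounds for $\delta_{tij}^{(3)}$, $i \le k - t < j \le k$, can be obtained, then, by using (2.3) and (2.4), we can obtain simultaneous lower confidence bounds for the $\mathrm{PCS}_t(\theta)$ for all $t = 1, \ldots, k - 1$.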
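For each $P^*$ ($0 < P^* < 1$), let $c = c(k, n, P^*)$ be the value such that

$$P_\theta\left\{\frac{\max_{1 \le i \le k} T_i/\theta_i}{\min_{1 \le i \le k} T_i/\theta_i} \le c\right\} = P^*. \tag{2.5}$$

Note that since $T_i$ has distribution function $F(y/\theta_i)$, $i = 1, \ldots, k$, the value of $c = c(k, n, P^*)$ is independent of the parameter $\theta$. Throughout, $y^{+} = \max(1, y)$ and $y^{-} = \min(1, y)$.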
Proof. Part (a) follows along the lines of Lemma 2.1 of Gupta and Liang [1], noting that $\theta_{[i]}/\theta_{[j]} \ge 1$ for $j < i$ and $\theta_{[i]}/\theta_{[j]} \le 1$ for $i < j$. Part (b) follows immediately from part (a) and (2.5).

For each $t = 1, \ldots, k - 1$ and $j = k - t + 1, \ldots, k$, let

2.8
The following Lemma is a direct result of Lemma 2.1.

2.14
Define $P_{tL} = \max(P_t, Q_t)$.

2.15
We propose $P_{tL} = \max(P_t, Q_t)$ as an estimator of a lower confidence bound of the $\mathrm{PCS}_t(\theta)$ for each $t = 1, \ldots, k - 1$. We have the following theorem.

2.17
This proves the theorem.

Exponential Distribution

(i) Complete Data
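Let $X_{ij}$, $j = 1, \ldots, n$, denote a random sample of size $n$ from the two-parameter exponential population $\Pi_i$ with pdf

$$f(x; \mu_i, \theta_i) = \frac{1}{\theta_i} \exp\left\{-\frac{x - \mu_i}{\theta_i}\right\}, \quad x > \mu_i,\ \theta_i > 0,$$

and let $M_i = \min_{1 \le j \le n} X_{ij}$ and $Y_i = \sum_{j=1}^{n} (X_{ij} - M_i)$, $i = 1, \ldots, k$. Here, $(M_i, Y_i)$ is a sufficient statistic for $(\mu_i, \theta_i)$, and $Y_i/\theta_i$ has a standardized gamma distribution with shape parameter $\vartheta = n - 1$, $i = 1, \ldots, k$. Then, based on the statistics $Y_1, \ldots, Y_k$, by applying the natural selection rule for each $t = 1, \ldots, k - 1$, the associated $\mathrm{PCS}_t$ is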

3.1
where $F(\cdot)$ is the distribution function of the standardized gamma distribution with shape parameter $\vartheta = n - 1$.
For each $P^*$ ($0 < P^* < 1$), let $c = c(k, P^*, n)$ be the $P^*$ quantile of the distribution of the random variable $Z$ defined as $Z = \{\max_{1 \le i \le k} Y_i/\theta_i\} / \{\min_{1 \le i \le k} Y_i/\theta_i\}$, the extreme quotient of the independent and identically distributed random variables $Y_i/\theta_i$.
Given $k$, $n$, and $P^*$, the value of $c$ can be obtained from the tables of Hartley's ratio $Z$ with $2(n - 1)$ degrees of freedom (refer to Pearson and Hartley [8]).
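When the tables are not at hand, the constant $c$ can be approximated by simulation, since it is simply a quantile of the extreme quotient of $k$ independent standardized gamma variables. The sketch below is an illustration under that reading, not the tabulation method of Pearson and Hartley; the function name `extreme_quotient_quantile`, the number of replications, and the seed are arbitrary choices.

```python
import random


def extreme_quotient_quantile(k, shape, p_star, reps=100000, seed=1):
    """Approximate the p_star quantile of max(G_i) / min(G_i).

    G_1, ..., G_k are independent Gamma(shape, 1) variables, standing in
    for Y_i / theta_i with shape = n - 1 in the complete-data case.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        draws = [rng.gammavariate(shape, 1.0) for _ in range(k)]
        ratios.append(max(draws) / min(draws))
    ratios.sort()
    # Empirical quantile: the order statistic at position p_star * reps.
    return ratios[min(reps - 1, int(p_star * reps))]


# k = 4 populations and n = 11 observations each, so the shape is n - 1 = 10.
c_est = extreme_quotient_quantile(k=4, shape=10, p_star=0.95)
```

For $k = 4$, $n = 11$, and $P^* = 0.95$, the simulated value can be compared with the tabulated value 3.29 quoted in Example 4.1; agreement is only up to Monte Carlo error.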

(ii) Type-II Censored Data
From each population $\Pi_i$, $i = 1, \ldots, k$, we take a sample of $n$ items. Let $X_{i(1)} \le \cdots \le X_{i(n)}$ denote the order statistics representing the failure times of the $n$ items from population $\Pi_i$, $i = 1, \ldots, k$.
Let $r$ be a fixed integer such that $1 \le r \le n$. Under Type-II censoring, the first $r$ failures from each population $\Pi_i$ are to be observed. The observations from population $\Pi_i$ cease after observing $X_{i(r)}$. The $n - r$ items whose failure times are not observable beyond $X_{i(r)}$ become the censored observations. Type-II censoring was investigated by Epstein and Sobel [9]. The sufficient statistic for $\theta_i$, when the location parameters $\mu_i$ are known, is

$$U_i = \sum_{j=1}^{r} \big(X_{i(j)} - \mu_i\big) + (n - r)\big(X_{i(r)} - \mu_i\big).$$

$U_i$ is called the total time on test (TTOT) statistic. It is easy to verify that $U_i/\theta_i$ has a standardized gamma distribution with shape parameter $r$, $i = 1, \ldots, k$. Again, the results for complete data can be applied simply by taking $\vartheta = r$.
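The total time on test statistic is straightforward to compute from the observed failures. The following is a minimal sketch for a single population, assuming the usual form of the statistic (the sum of the first $r$ failure times plus $n - r$ times the $r$th failure time, each measured from a known location); the function name `total_time_on_test` and the sample values are hypothetical.

```python
def total_time_on_test(failure_times, n, location=0.0):
    """Total time on test for a Type-II censored sample.

    failure_times: the r observed failure times (any order).
    n: number of items originally put on test.
    location: known location parameter (assumed 0 if not given).
    """
    observed = sorted(failure_times)
    r = len(observed)
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n observed failures")
    # Time accumulated by each failed item, measured from the location.
    exposure = [x - location for x in observed]
    # The n - r censored items each survive at least until the r-th failure.
    return sum(exposure) + (n - r) * exposure[-1]


# Hypothetical sample: n = 5 items on test, first r = 3 failures observed.
u_censored = total_time_on_test([1.0, 2.0, 3.0], n=5)
# With r = n there is no censoring and the statistic is the plain sum.
u_complete = total_time_on_test([1.0, 2.0, 3.0], n=3)
```

The resulting values play the role of the statistics to which the natural selection rule is applied in the censored case, with shape parameter $r$ in place of $n - 1$.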

Normal Distribution
Let $\Pi_i$ denote the normal population with mean $\mu_i$ and variance $\theta_i$ (both unknown), $i = 1, \ldots, k$. The sufficient statistic for $\theta_i$ based on a random sample $X_{i1}, \ldots, X_{in}$ of size $n$ from $\Pi_i$ is $S_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2$, where $\bar{X}_i = \frac{1}{n}\sum_{j=1}^{n} X_{ij}$. Once again, the above results for the exponential distribution can be used with $\vartheta = (n - 1)/2$.

Examples
To illustrate the implementation of the simultaneous lower confidence bounds for the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$), we consider the following examples.
Example 4.1. Hill et al. [10] considered data on survival days of patients with inoperable lung cancer, who were subjected to a test chemotherapeutic agent. The patients are divided into the following four categories depending on the histological type of their tumor: squamous, small, adeno, and large, denoted by $\pi_1$, $\pi_2$, $\pi_3$, and $\pi_4$, respectively, in this article. The data are a part of a larger data set collected by the Veterans Administration Lung Cancer Study Group in the USA.
We consider a random sample of eleven survival times from each group, and they are given in Table 1.
Using the standard results of reliability (refer to Lawless [11]), one can check the validity of the two-parameter exponential model for Table 1. In this example, the populations with larger survival times (i.e., larger $Y_i$'s) are desirable.
For the Table 1 data set, the statistics are ordered as $Y_3 < Y_2 < Y_4 < Y_1$. Hence, according to the natural selection rule, the populations $\pi_1$, $\pi_2$, and $\pi_4$ are selected as the $t$ ($t = 1, 2, 3$) best populations; that is, for $t = 1$, population $\pi_1$, which has the largest survival time, is the best; for $t = 2$, populations $\pi_1$ and $\pi_4$, which have the two largest survival times, are the best; and for $t = 3$, populations $\pi_1$, $\pi_2$, and $\pi_4$, which have the three largest survival times, are the best. However, it is possible that the populations selected according to the natural selection rule may not be the best. Therefore, we wish to find a confidence statement that can be made about the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, 2, 3$.
Here, $k = 4$, $n = 11$, and, by taking $P^* = 0.95$, we get, from the tables of Pearson and Hartley [8], $c = c(k, n, P^*) = 3.29$. Then, $P_t$ and $Q_t$ computed for the above data set using (3.5) are given in Table 2.
From Table 2, we have, with at least a 95% confidence coefficient, that simultaneously $\mathrm{PCS}_1(\theta) \ge 0.551725$, $\mathrm{PCS}_2(\theta) \ge 0.33380$, and $\mathrm{PCS}_3(\theta) \ge 0.174162$.

Example 4.2. Nelson [12] considered data which represent times to breakdown (in minutes) of an insulating fluid subjected to high voltage stress. The times in their observed order were divided into three groups. After analyzing the data, it was shown to follow an exponential distribution. We consider the following data based on a random sample of size 11 from each of the three groups; the observations are in Table 4.
For the above data set, the statistics are ordered as $Y_3 < Y_2 < Y_1$. Hence, according to the natural selection rule, the populations $\pi_1$ and $\pi_2$ are selected as the $t$ ($t = 1, 2$) best populations; that is, for $t = 1$, population $\pi_1$, which has the largest time to breakdown, is the best; and for $t = 2$, populations $\pi_1$ and $\pi_2$, which have the two largest times to breakdown, are the best. However, it is possible that the populations selected according to the natural selection rule may not be the best. Therefore, we wish to find a confidence statement that can be made about the probability of correctly selecting the $t$ best populations ($\mathrm{PCS}_t$) simultaneously for all $t = 1, 2$.
Here, $k = 3$, $n = 11$, and, by taking $P^* = 0.95$, we get, from the tables of Pearson and Hartley [8], $c = c(k, n, P^*) = 2.95$. Then, $P_t$ and $Q_t$ computed for the above data set using (3.5) are given in Table 3. From Table 3, we have, with at least a 95% confidence coefficient, that simultaneously $\mathrm{PCS}_1(\theta) \ge 0.424471$ and $\mathrm{PCS}_2(\theta) \ge 0.248274$.

Example 4.3. Proschan [13] considered data on intervals between failures (in hours) of the air-conditioning system of a fleet of 13 Boeing 720 jet airplanes. After analyzing the data, he found that the failure distributions of the air-conditioning system for each of the planes were well approximated as exponential. We consider the following data based on four random samples of size seven each; the observations in the samples are given in Table 5.
For the above data set: 2 , Gupta and Panchpakesan 3 , Mukhopadhyay and Solanky 4 , and the review papers by Gupta and Panchapakesan 5, 6 , Khamnei and Kumar 7 , and the references cited therein.