Ranked Set Sampling: Its Relevance and Impact on Statistical Inference

Ranked set sampling (RSS) is an approach to data collection and analysis that continues to stimulate substantial methodological research. It has spawned a number of related methodologies that are active research arenas as well, and it is finally beginning to find its way into significant applications beyond its initial agricultural-based birth in the seminal paper by McIntyre (1952). In this paper, we provide an introduction to the basic concepts underlying ranked set sampling in general, with specific illustrations from the one- and two-sample settings. Emphasis is on the breadth of the ranked set sampling approach, with targeted discussion of the many options available to the researcher within the RSS paradigm. The paper also provides a thorough bibliography of the current state of the field and introduces the reader to some of the most promising new methodological extensions of the RSS approach to statistical data analysis.


1. Introduction
Basic statistical tenets and principles play vital roles in important research across all of the sciences (agricultural, biological, ecological, engineering, medical, physical, and social), and perhaps the most fundamental of these principles is the one ensuring that the experimental data to be collected are truly representative of the scientific questions under investigation. If this principle is violated, even optimal statistical procedures will not allow us to make legitimate statistical inferences about the questions of interest.
ISRN Probability and Statistics

In settings where we are concerned with making inferences about a population based on a sample of data collected from the population, the most common approach to data collection uses the concept of a simple random sample (SRS) from the population. The emphasis throughout this paper will be on the setting where the underlying population is taken to be infinite or large enough to be well approximated by an infinite population. In such settings, it is standard to assume that the underlying population is represented by a probability distribution with p.d.f. $f(x)$ and c.d.f. $F(x)$, and a simple random sample of size $n$ from the population can then be defined as follows.
Definition 1.1. A collection of random variables $X_1, \ldots, X_n$ is said to be a simple random sample (SRS) of size $n$ from an underlying probability distribution with p.d.f. $f(x)$ and c.d.f. $F(x)$ if the variables satisfy the following two properties.
(i) Each $X_i$, $i = 1, \ldots, n$, has the same probability distribution as the underlying population with p.d.f. $f(x)$ and c.d.f. $F(x)$.
(ii) The $n$ random variables $X_1, \ldots, X_n$ are mutually independent.
Even though the emphasis in this paper will be on the setting where the underlying population is taken to be infinite or large enough to be well approximated by an infinite population, we will have occasion to mention adjustments that are necessary when the underlying population must be viewed accurately as a finite population containing $N$ objects. In such finite population settings, a simple random sample satisfies a different set of conditions.

Definition 1.2. A collection of $n$ sample observations $X_1, \ldots, X_n$ from a finite population consisting of a total of $N$ observations is said to be a simple random sample if each of the $\binom{N}{n}$ possible subsets of $n$ observations has the same chance of being selected as the random sample.
Throughout this paper, we will take our underlying population to be infinite or at least quite large relative to the size of the sample being collected so that Definition 1.1 applies, unless we specifically state that the population under consideration is finite.
While the necessary stipulations are in place to ensure that a random sample represents the underlying population in the probabilistic sense that each of the items in the random sample has the same probability distribution as the underlying population, we are also aware that there is no guarantee that a specific random sample of units selected from the population for measurement is truly representative of the population. We only have the assurance that if we were to repeat this sampling process over and over, the sample averages for an attribute of interest would, taken across the repeated random samples, provide a good estimator of the population value of the attribute. The single sample actually collected might or might not provide such a good estimate.
Early attempts to minimize this "bad random sample" possibility concentrated on using prior information to first divide the population into more similar subgroups and then employ the random sample approach within each of these subgroups to ensure broad representation across the entire population. Examples of these approaches include systematic sampling, stratified sampling, probability-proportional-to-size sampling, cluster sampling, and quota sampling. These refinements to simple random sampling provide increased assurance that the collected sample data will be more representative of the entire population by using prior knowledge about the structure of the population with respect to the variable of interest, usually through some correlated auxiliary variable that is readily available. However, none of these approaches uses extra information from specific units in the population to guide the search for a truly representative sample. It was not until McIntyre [1] (reprinted in 2005) introduced the concept of ranked set sampling that statisticians had a valid way to use additional information from individual population units to aid in the selection of a more representative sample from a population. This seminal paper proposed  of the individual sample items represents a typical value chosen from the underlying population. That is not the case for a balanced RSS of size $n$. While the individual observations in a balanced RSS remain mutually independent, they are clearly not identically distributed, so individual observations in a balanced RSS do not represent typical values from the underlying population. In fact, the individual judgment order statistics represent distinctly different portions of the underlying population. This is a very important feature of an RSS: the items in the sample are selected in such a way as to provide greater assurance that the entire range of population values is represented. This is best illustrated by considering an example.

Suppose that $X$ has a standard normal distribution, and let $X_1, X_2, X_3, X_4$, and $X_5$ be a random sample of size five from this distribution. Let $X_{(1)} \le X_{(2)} \le X_{(3)} \le X_{(4)} \le X_{(5)}$ be the associated order statistics.
In Figure 1, we plot the underlying $N(0, 1)$ density as well as the marginal densities of the five individual order statistics $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$, and $X_{(5)}$.
If we use perfect rankings to collect an RSS of size five from the standard normal distribution, then these five RSS observations behave like mutually independent order statistics from the standard normal, and their densities are represented by the five individual marginal density curves in Figure 1. While these five densities certainly overlap, they assign the bulk of their individual marginal probabilities to five different subregions of the standard normal domain. As a result, the five RSS observations are much more likely to represent the full range of values for the standard normal distribution than would an SRS of size five; that is, the probability that the five SRS observations fail to represent the full range of the standard normal distribution is greater than the corresponding probability for the five RSS observations. As we will see in Section 5, this feature enables RSS to be more effective than SRS in estimation of a population mean.
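The marginal densities plotted in Figure 1 follow from the standard formula for the density of the $i$th order statistic of a random sample of size $k$, namely $\frac{k!}{(i-1)!(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i}f(x)$. The short Python sketch below (standard library only; the function names are our own) evaluates these five curves for $N(0, 1)$ at a few points. It also checks a fact that drives the RSS results of Section 5: averaged over $i = 1, \ldots, k$, the order-statistic densities reproduce the population density $f(x)$.

```python
import math

def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal c.d.f. F(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def order_stat_pdf(x, i, k):
    """Density of the i-th order statistic (1 <= i <= k) of an SRS of size k:
    k!/((i-1)!(k-i)!) * F(x)**(i-1) * (1 - F(x))**(k-i) * f(x)."""
    F = normal_cdf(x)
    coef = math.factorial(k) / (math.factorial(i - 1) * math.factorial(k - i))
    return coef * F ** (i - 1) * (1.0 - F) ** (k - i) * normal_pdf(x)

k = 5
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    dens = [order_stat_pdf(x, i, k) for i in range(1, k + 1)]
    average = sum(dens) / k  # equals f(x): the k curves average back to N(0, 1)
    print(f"x = {x:+.1f}: " + "  ".join(f"{d:.3f}" for d in dens)
          + f"  | average {average:.4f}, f(x) {normal_pdf(x):.4f}")
```

Each row lists the five order-statistic densities at one point; the lower order statistics dominate on the left and the upper ones on the right, which is the separation visible in Figure 1.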

3. Collecting a Ranked Set Sample: Example
Unburned hydrocarbons emitted from automobile tailpipes and via evaporation from manifolds are among the primary contributors to ground-level ozone and smog in large cities. One way to reduce the effect of this factor on air pollution is through the use of reformulated gasoline, designed to reduce its volatility as measured by the Reid Vapor Pressure (RVP) value. To assure that gasoline stations in metropolitan areas are selling gasoline that complies with clean air regulations, regular samples of reformulated gasoline from the pumps at these stations are collected and their RVP values are measured.
The RVP value for a sample can either be measured by a crude field technique right after collection at the gasoline pump or via a more sophisticated analysis after the sample has been shipped to a government laboratory. While the actual laboratory analysis of RVP is not overly expensive, it is costly to ship these gasoline samples to the laboratory, since they must be packed to prevent gaseous hydrocarbons from escaping en route and special transport measures are required for flammable liquids like gasoline. It would be beneficial to use these cruder, less expensive field RVP measurements as reliable surrogates for the more expensive laboratory RVP measurements to reduce the required number of formal laboratory tests without significant loss of accuracy, resulting in considerable cost savings.

Figure 1: Standard normal density (dotted curve) and the individual marginal densities of the five order statistics $X_{(1)}, \ldots, X_{(5)}$ (solid curves, in order of peaks from the minimum, $X_{(1)}$, on the left to the maximum, $X_{(5)}$, on the right) for a random sample of size five from the standard normal distribution.
Nussbaum and Sinha [6] suggested the use of RSS as an aid in achieving this goal. Thirty-six of the field RVP measurements collected at the pumps considered by Nussbaum and Sinha are given in Table 2.
Nussbaum and Sinha recommended using these field RVP values, which are highly correlated with the more precise laboratory measurements, to provide the ranking mechanism for selecting a much smaller subgroup of gasoline samples to submit for follow-up laboratory analysis. They considered a set size of $k = 3$ with $m = 4$ cycles, which leads to a ranked set sample of only $n = 12$ gasoline samples to send for full laboratory RVP measurement.
To select this RSS of twelve gasoline samples, using a set size $k = 3$, to be sent to the laboratory for more precise RVP measurements, we must first randomly divide the 36 gasoline samples into twelve sets of three each. For this purpose, we use the R command sample(1:36, 36, replace = F) to obtain the following random ordering of the sample numbers 1 to 36, clustered into twelve sets of size $k = 3$ each based on their order of appearance:

3.1
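The random division into sets only requires a random permutation of the sample labels. The Python sketch below is a stand-in for the R call above, not the authors' code: random.sample returns a random permutation of the labels 1 to 36, which is then cut into consecutive sets of size $k = 3$. The function name and seed are arbitrary assumptions, so the sets produced will not match those used in this example.

```python
import random

def random_sets(n_units, k, seed=None):
    """Randomly permute the labels 1..n_units and cut the permutation into
    consecutive sets of size k (analogous to R's sample(1:36, 36, replace = F))."""
    if n_units % k != 0:
        raise ValueError("n_units must be a multiple of the set size k")
    rng = random.Random(seed)
    perm = rng.sample(range(1, n_units + 1), n_units)  # random permutation
    return [perm[j:j + k] for j in range(0, n_units, k)]

sets = random_sets(36, 3, seed=2012)
for number, units in enumerate(sets, start=1):
    print(f"set {number:2d}: {units}")
```

Cutting one permutation into consecutive blocks gives every partition into labeled sets of three the same probability, which is all the procedure needs.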
Next we must decide which four sets will be used to obtain the smallest judgment ordered units, which four will be used to obtain the median judgment ordered units, and which four will be used to obtain the largest judgment ordered units. There is complete flexibility here, but these decisions must be made without knowledge of the actual field RVP values in the twelve sets. For the sake of illustration, we choose to select the minimum judgment ordered unit from the first four sets, the median judgment ordered unit from the second four sets, and the maximum judgment ordered unit from the final four sets. The twelve sets of three RVP values each (ordered as above) that result from our sampling process are given in Table 3.
Using our chosen criteria for selecting the judgment ordered units, we see that the units selected by our ranked set sampling scheme for shipment to the laboratory for precise RVP measurements are those gasoline samples corresponding to the bolded, enlarged field RVP values in Table 4.
Thus we will send gasoline samples 23, 17, 21, 15, 5, 14, 22, 4, 8, 29, 7, and 25 to the laboratory for more precise RVP determinations, and the resulting laboratory RVP values will constitute our balanced ranked set sample of size $n = 12$ based on a set size of $k = 3$ and $m = 4$ cycles, using field RVP value as our auxiliary ranking variable.
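Once the judgment rank to be measured in each set has been fixed in advance, selecting the units is mechanical. In the Python sketch below, each set is ordered by the ranking variable and the unit holding the prescribed judgment rank is returned. The function name and the field values are hypothetical (they are not the Table 2 data); the list paper_ranks encodes the allocation chosen above: minimum from sets 1 to 4, median from sets 5 to 8, and maximum from sets 9 to 12.

```python
def select_rss_units(sets, values, ranks):
    """Pick one unit per set for precise measurement.

    sets   -- list of sets, each a list of unit labels
    values -- dict mapping unit label -> value of the ranking variable
    ranks  -- judgment rank (1 = smallest) to measure in each set,
              fixed before the ranking values are examined
    """
    if len(sets) != len(ranks):
        raise ValueError("need exactly one judgment rank per set")
    chosen = []
    for units, r in zip(sets, ranks):
        ordered = sorted(units, key=lambda u: values[u])  # judgment ordering
        chosen.append(ordered[r - 1])
    return chosen

# Allocation used in the text: k = 3, m = 4, ranks 1,1,1,1,2,2,2,2,3,3,3,3.
k, m = 3, 4
paper_ranks = [r for r in range(1, k + 1) for _ in range(m)]

# Tiny hypothetical illustration (k = 3, m = 1); the values are made up.
toy_sets = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_values = {1: 7.9, 2: 8.3, 3: 7.5, 4: 8.0, 5: 8.6, 6: 7.7,
              7: 8.1, 8: 7.6, 9: 8.4}
print(paper_ranks)
print(select_rss_units(toy_sets, toy_values, [1, 2, 3]))  # [3, 4, 9]
```

Only the selected units go on to the precise (laboratory) measurement; the ranking values themselves are used solely to decide which unit that is.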

4. Early Historical Development
McIntyre [1] first proposes the concept of ranked set sampling in the context of obtaining reliable farm yield estimates based on sampling of pastures and crop plots. He provides a clear and insightful introduction to the basic framework of ranked set sampling and lays out the rationale for how it can lead to improved estimation relative to simple random sampling. However, more than fifteen years passed before Takahashi and Wakimoto [7] and Takahashi [8] formally developed the statistical methodology underlying this sampling approach. Dell and Clutter [9] and David and Levine [10] soon followed with the important result that the ranked set sample mean is always an unbiased estimator of the population mean and that it is at least as precise as the simple random sample estimator based on the same number of sample observations. Moreover, this remarkable fact is true regardless of possible errors in the rankings used to obtain the ranked set sample data. Initially, theoretical interest in ranked set sampling was minimal. Yanagawa and Shirahata [11], Yanagawa and Chen [12], and Shirahata [13] use the notion of selective probability in conjunction with RSS. Stokes [14] proposes the use of concomitant variables to aid in the ranking process used to obtain RSS data. She also studies the use of the ranked set sample approach for making inferences about the population variance [15] and a correlation coefficient [16]. These papers were followed a few years later by an elegant paper by Stokes and Sager [17] in which they discuss the use of ranked set sampling to make inferences about a population distribution function. Their paper contains many innovative ideas that lead directly to the modern era of ranked set sampling research. It has not only proven to be the stimulus for an ongoing and active ranked set sampling research community, but it has also spawned a number of spinoffs of the basic concept, leading to additional interesting and important approaches to statistical inference.
We will return to this more recent research later in the paper. First, we illustrate the use of ranked set sampling methodology by providing a few details for three specific ranked set sampling procedures: estimation of a population mean, estimation of a population proportion, and an analogue to the Mann-Whitney two-sample test procedure.

5. Ranked Set Sample Estimation of a Population Mean
Consider two mutually independent sets of $n$ observations each from a continuous population with distribution function $F$, density function $f$, finite mean $\mu$, and finite variance $\sigma^2$. One set of $n$ observations, $X_1, \ldots, X_n$, is collected as a simple random sample (SRS), and the second set of $n$ observations is collected as a balanced ranked set sample (RSS) corresponding to set size $k$ and $m$ cycles, with $n = km$. The ranked set sample observations from cycle 1 are denoted by $X_{[1]1}, X_{[2]1}, \ldots, X_{[k]1}$, those from cycle 2 by $X_{[1]2}, X_{[2]2}, \ldots, X_{[k]2}$, ..., and those from the final cycle $m$ by $X_{[1]m}, X_{[2]m}, \ldots, X_{[k]m}$.
The SRS estimator for the population mean $\mu$ is just the sample mean $\hat{\mu}_{SRS} = \bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$, and it is well known that $E(\hat{\mu}_{SRS}) = \mu$ and $\mathrm{Var}(\hat{\mu}_{SRS}) = \sigma^2/n$.
The natural ranked set sample estimator, $\hat{\mu}_{RSS}$, for the population mean $\mu$ based on the balanced ranked set sample $X_{[1]1}, \ldots, X_{[k]1}; X_{[1]2}, \ldots, X_{[k]2}; \ldots; X_{[1]m}, \ldots, X_{[k]m}$ is simply the average of the sample observations, namely,
$$\hat{\mu}_{RSS} = \frac{1}{km}\sum_{j=1}^{m}\sum_{i=1}^{k} X_{[i]j}. \tag{5.1}$$
The balanced RSS estimator $\hat{\mu}_{RSS}$ (5.1) is also an unbiased estimator for the population mean $\mu$, regardless of whether the judgment rankings are perfect or imperfect. As noted previously, Dell and Clutter [9] established this result in the general setting for set size $k$ and $m$ cycles without any restriction on the accuracy of the judgment rankings. We demonstrate the argument under the more restrictive assumption that the judgment rankings are perfect. Under this additional assumption of perfect rankings, the ranked set sample observations are, in fact, true order statistics from the underlying continuous population.
For simplicity in the argument, we consider only the case of a single cycle ($m = 1$), so that the total sample size $n$ is equal to the set size $k$. Under the assumption of perfect rankings, we can represent the RSS observations for this setting by $X^*_1, \ldots, X^*_k$, where these $k$ variables are mutually independent and $X^*_i$, $i = 1, \ldots, k$, is distributed like the $i$th order statistic for a random sample of size $k$ from a continuous distribution with distribution function $F$ and density $f$.
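It follows immediately from properties of a simple average that
$$E(\hat{\mu}_{RSS}) = \frac{1}{k}\sum_{i=1}^{k} E(X^*_i). \tag{5.2}$$
Moreover, since $X^*_i$ is distributed like the $i$th order statistic for a random sample of size $k$ from a continuous distribution with distribution function $F$ and density $f$ under perfect rankings, we have
$$E(X^*_i) = \int_{-\infty}^{\infty} x\,\frac{k!}{(i-1)!\,(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i} f(x)\,dx \tag{5.3}$$
for $i = 1, \ldots, k$. Combining (5.2) and (5.3), we obtain
$$E(\hat{\mu}_{RSS}) = \int_{-\infty}^{\infty} x f(x)\left[\sum_{i=1}^{k}\frac{(k-1)!}{(i-1)!\,(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i}\right]dx. \tag{5.4}$$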
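Letting $q = i - 1$ in the summation in (5.4), we see that
$$\sum_{i=1}^{k}\frac{(k-1)!}{(i-1)!\,(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i} = \sum_{q=0}^{k-1}\binom{k-1}{q}[F(x)]^{q}[1-F(x)]^{k-1-q} = 1, \tag{5.5}$$
since the latter expression is just the sum over the entire sample space of the probabilities for a binomial random variable with parameters $k - 1$ and $p = F(x)$. Using this fact in (5.4), we obtain
$$E(\hat{\mu}_{RSS}) = \int_{-\infty}^{\infty} x f(x)\,dx = \mu, \tag{5.6}$$
thus establishing the fact that $\hat{\mu}_{RSS}$ is an unbiased estimator for $\mu$.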
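The only nontrivial step in this argument is the binomial identity. As a quick sanity check (an illustration, not part of the proof), the Python sketch below evaluates the bracketed sum in (5.4), with $p$ standing in for $F(x)$, using exact rational arithmetic from the fractions module for several set sizes and values of $p$. The function name bracket_sum is our own.

```python
from fractions import Fraction
from math import comb

def bracket_sum(k, p):
    """Bracketed factor in (5.4) with p = F(x):
    sum over i = 1..k of (k-1)!/((i-1)!(k-i)!) * p**(i-1) * (1-p)**(k-i)."""
    return sum(comb(k - 1, i - 1) * p ** (i - 1) * (1 - p) ** (k - i)
               for i in range(1, k + 1))

# Exact rational arithmetic, so the comparison with 1 involves no rounding.
for k in (2, 3, 5, 10):
    for p in (Fraction(1, 10), Fraction(1, 2), Fraction(7, 9)):
        assert bracket_sum(k, p) == 1
print("the bracketed sum equals 1 for every k and p tried")
```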
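To obtain the variance of the RSS estimator $\hat{\mu}_{RSS}$, we note that the mutual independence of the $X^*_i$'s implies that
$$\mathrm{Var}(\hat{\mu}_{RSS}) = \frac{1}{k^2}\sum_{i=1}^{k}\mathrm{Var}(X^*_i). \tag{5.7}$$
Letting $\mu^*_i = E(X^*_i)$, $i = 1, \ldots, k$, we have
$$\sum_{i=1}^{k}\mathrm{Var}(X^*_i) = \sum_{i=1}^{k} E\big[(X^*_i - \mu)^2\big] - \sum_{i=1}^{k}(\mu^*_i - \mu)^2, \tag{5.8}$$
since the cross product terms $2(\mu^*_i - \mu)E(X^*_i - \mu^*_i)$ are zero. Combining (5.7) and (5.8) yields the expression
$$\mathrm{Var}(\hat{\mu}_{RSS}) = \frac{1}{k^2}\sum_{i=1}^{k} E\big[(X^*_i - \mu)^2\big] - \frac{1}{k^2}\sum_{i=1}^{k}(\mu^*_i - \mu)^2. \tag{5.9}$$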
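Now, proceeding as we did with $E(\hat{\mu}_{RSS})$, we see that
$$\sum_{i=1}^{k} E\big[(X^*_i - \mu)^2\big] = \int_{-\infty}^{\infty}(x-\mu)^2\sum_{i=1}^{k}\frac{k!}{(i-1)!\,(k-i)!}[F(x)]^{i-1}[1-F(x)]^{k-i} f(x)\,dx = k\int_{-\infty}^{\infty}(x-\mu)^2 f(x)\left[\sum_{q=0}^{k-1}\binom{k-1}{q}[F(x)]^{q}[1-F(x)]^{k-1-q}\right]dx. \tag{5.10}$$
Once again using the binomial distribution, the interior sum is equal to 1, and we obtain
$$\sum_{i=1}^{k} E\big[(X^*_i - \mu)^2\big] = k\int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx = k\sigma^2. \tag{5.11}$$
Combining (5.9) and (5.11), it follows that
$$\mathrm{Var}(\hat{\mu}_{RSS}) = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}(\mu^*_i - \mu)^2. \tag{5.12}$$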
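Thus, both $\hat{\mu}_{SRS}$ and $\hat{\mu}_{RSS}$ are unbiased estimators for the population mean. Moreover, from (5.12), it follows that
$$\mathrm{Var}(\hat{\mu}_{RSS}) = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}(\mu^*_i - \mu)^2 \le \frac{\sigma^2}{k} = \mathrm{Var}(\hat{\mu}_{SRS}),$$
where $\hat{\mu}_{SRS}$ is based on $n = k$ observations. Hence, in the case of perfect rankings, not only is $\hat{\mu}_{RSS}$ an unbiased estimator, but its variance is also never larger than the variance of the SRS estimator $\bar{X}$ based on the same number of measured observations. In fact, the inequality is strict unless $\mu^*_i = \mu$ for all $i = 1, \ldots, k$, which is the case only if the judgment rankings are purely random.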
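A small simulation makes (5.12) concrete. The Python sketch below assumes a standard normal population, set size $k = 3$, a single cycle, and perfect rankings (each RSS unit is the true $i$th smallest of a fresh set of three); the seed and replication count are arbitrary. For this case the order-statistic means are $-3/(2\sqrt{\pi})$, $0$, and $3/(2\sqrt{\pi})$, so (5.12) gives $\mathrm{Var}(\hat{\mu}_{RSS}) = 1/3 - 1/(2\pi) \approx 0.174$, compared with $1/3$ for an SRS of three observations.

```python
import math
import random
import statistics

def rss_mean(rng, k):
    """One cycle (m = 1) of balanced RSS from N(0, 1) with perfect rankings:
    for i = 1..k, draw a fresh set of k units and keep its i-th smallest."""
    obs = [sorted(rng.gauss(0.0, 1.0) for _ in range(k))[i] for i in range(k)]
    return sum(obs) / k

def srs_mean(rng, n):
    """Mean of an SRS of size n from N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n

rng = random.Random(1952)
k, reps = 3, 40_000
rss = [rss_mean(rng, k) for _ in range(reps)]
srs = [srs_mean(rng, k) for _ in range(reps)]

# (5.12) with sigma^2 = 1: order-statistic means are -a, 0, a with a = 3 / (2 sqrt(pi))
a = 3 / (2 * math.sqrt(math.pi))
exact_rss_var = 1 / k - (2 * a * a) / k ** 2  # = 1/3 - 1/(2*pi), about 0.174

print(f"mean: RSS {statistics.fmean(rss):+.4f}   SRS {statistics.fmean(srs):+.4f}")
print(f"var:  RSS {statistics.pvariance(rss):.4f} (5.12 gives {exact_rss_var:.4f})"
      f"   SRS {statistics.pvariance(srs):.4f} (sigma^2/k = {1 / k:.4f})")
```

Both simulated means should sit near 0, and the simulated RSS variance should be close to the (5.12) value and well below the SRS variance.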

6. Ranked Set Sample Estimation of a Population Proportion
While estimation of a population proportion is simply a special case of estimation of a population mean, some important ranked set sampling developments have resulted from considering it in its own right. For populations consisting of binary data corresponding to "success" or "failure," for example, the feature of interest is the proportion, $p$, of "successes" in the population. If we assign the numerical values 0 and 1 to "failure" and "success," respectively, then the proportion $p$ is nothing more than the population mean $\mu$ discussed in the previous section, so one natural estimator for $p$ is simply the sample average, $\hat{p}_{RSS}$, corresponding to the proportion of "successes" observed in the ranked set sample. Lacayo et al. [18] discuss this naive estimator. However, $\hat{p}_{RSS}$ does not fully utilize the additional information incorporated in the ranked set sample data via the prior ranking process; that is, unlike with a simple random sample, not all "successes" in a ranked set sample should be treated equally.
Taking into account this special information associated with the different ranked set sample observations, Terpstra [19] develops the RSS maximum likelihood estimator, $\hat{p}_{RSS,MLE}$, for a population proportion $p$. He shows that $\hat{p}_{RSS,MLE}$ is slightly more efficient than $\hat{p}_{RSS}$ and uniformly more efficient than the standard sample proportion of "successes," $\hat{p}_{SRS}$, for a simple random sample of the same size. Terpstra and Nelson [20] compare this RSS maximum likelihood estimator with a weighted average competitor and develop optimal unbalanced allocation protocols for both estimators. Terpstra and Miller [21] and Terpstra and Wang [22] study mechanisms for obtaining RSS confidence intervals and hypothesis tests for a proportion.
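To see why the successes should not be weighted equally, note that under perfect rankings the $i$th judgment order statistic in a set of $k$ binary units equals 1 exactly when the set contains at least $k - i + 1$ successes, so its success probability is $\pi_i(p) = P\{\mathrm{Bin}(k, p) \ge k - i + 1\}$. The Python sketch below assumes perfect rankings and balanced data with $m$ cycles, writes down the resulting product-binomial likelihood, and maximizes it by a crude grid search. It illustrates the likelihood idea only; it is not presented as Terpstra's exact estimator or algorithm, and the function names and data are hypothetical.

```python
from math import comb, log

def pi_i(p, i, k):
    """P(i-th smallest of k Bernoulli(p) units is 1) under perfect rankings:
    the set must contain at least k - i + 1 successes."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j)
               for j in range(k - i + 1, k + 1))

def log_likelihood(p, successes, m, k):
    """successes[i-1] = number of 1's among the m measured i-th judgment
    order statistics; each count is Binomial(m, pi_i(p))."""
    total = 0.0
    for i, y in enumerate(successes, start=1):
        q = pi_i(p, i, k)
        total += y * log(q) + (m - y) * log(1 - q)
    return total

def mle_grid(successes, m, k, step=0.001):
    """Grid-search maximizer over p in (0, 1); a stand-in for a real optimizer."""
    grid = [j * step for j in range(1, round(1 / step))]
    return max(grid, key=lambda p: log_likelihood(p, successes, m, k))

# Hypothetical data: k = 3, m = 10, success counts for judgment ranks 1, 2, 3.
k, m, counts = 3, 10, [0, 2, 7]
print("naive p_RSS:", sum(counts) / (k * m))
print("grid MLE   :", mle_grid(counts, m, k))
print("pi_i(0.3)  :", [round(pi_i(0.3, i, k), 3) for i in range(1, k + 1)])
```

The printed $\pi_i(0.3)$ values (0.027, 0.216, 0.657) show how differently a success at judgment rank 1 and at rank 3 should be interpreted, even though their average is exactly $p$.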
Another factor that is important to consider when applying RSS methodology to estimation of a population proportion is the curious aspect of initially "ranking" binary data to implement the ranked set sample structure. This is not an issue if individuals are used to subjectively judgment rank the candidates within a set with respect to their relative likelihoods of being "successes." However, if we wish to use additional quantitative information from the population to aid in these within-set binary rankings, then appropriate mechanisms are required to enable that process. Terpstra and Liudahl [23] suggest the use of a single concomitant variable to facilitate the ranking of binary data, and Chen et al. [24, 25] expand on this concept through the use of logistic regression to incorporate multiple concomitants in a formal mechanism for ranking such data. Using data from the Third National Health and Nutrition Examination Survey (NHANES III) [26] conducted by the National Center for Health Statistics, Centers for Disease Control and Prevention, and the definitions of overweight and obesity given in Kuczmarski et al. [26], they find that the use of logistic regression substantially improves the accuracy of the preliminary rankings in the RSS process, which, in turn, leads to considerable gains in precision for estimation of the proportions of obese and overweight individuals in the NHANES III population.

7. Ranked Set Sample Analogue of the Two-Sample Mann-Whitney Test Procedure
One of the most common statistical applications is comparing the medians of two populations. The test procedure proposed by Wilcoxon [27] and Mann and Whitney [28] is often used for this purpose when only minimal assumptions can be made about the underlying populations and the data are independent simple random samples from the populations. Bohn and Wolfe [29] develop an analogue to the Mann-Whitney-Wilcoxon procedure that is applicable when the data are ranked set samples rather than simple random samples.
Let $F$ and $G$ denote the c.d.f.'s for continuous populations 1 and 2, respectively, and let $\Delta$ denote the shift in location of population 2 relative to population 1. We are interested in testing the null hypothesis of no differences in the two populations, namely,
$$H_0: F(x) = G(x) \text{ for every } x \quad (\text{i.e., } \Delta = 0). \tag{7.1}$$
We collect $m$ balanced ranked set sample observations from population 1 using $c$ cycles of set size $k$ each, where $m = kc$; the ranked set sample observations from cycle 1 are denoted by $X_{[1]1}, X_{[2]1}, \ldots, X_{[k]1}$, those from cycle 2 by $X_{[1]2}, X_{[2]2}, \ldots, X_{[k]2}$, ..., and those from the final cycle $c$ by $X_{[1]c}, X_{[2]c}, \ldots, X_{[k]c}$. In addition, we collect $n$ balanced ranked set sample observations from population 2 using $d$ cycles of set size $q$ each, where $n = qd$; the ranked set sample observations from cycle 1 are denoted by $Y_{[1]1}, Y_{[2]1}, \ldots, Y_{[q]1}$, those from cycle 2 by $Y_{[1]2}, Y_{[2]2}, \ldots, Y_{[q]2}$, ..., and those from the final cycle $d$ by $Y_{[1]d}, Y_{[2]d}, \ldots, Y_{[q]d}$. Thus we obtain a total of $N = m + n$ ranked set sample observations, $m$ from population 1 and $n$ from population 2. We also assume that the ranked set samples from the two populations are themselves independent and that the ranking processes used to obtain them are perfect, so that the ranked set sample observations are true order statistics from their respective populations.
To compute the Bohn and Wolfe [29] statistic BW, we follow the form of the two-sample Mann and Whitney [28] U statistic by computing the $mn = kcqd$ indicator statistics $\varphi(X_{[s]t}, Y_{[u]v})$, for $s = 1, \ldots, k$; $t = 1, \ldots, c$; $u = 1, \ldots, q$; $v = 1, \ldots, d$, where
$$\varphi(a, b) = \begin{cases} 1, & \text{if } a < b, \\ 0, & \text{otherwise}. \end{cases} \tag{7.2}$$
The Bohn-Wolfe statistic is then
$$BW = \sum_{s=1}^{k}\sum_{t=1}^{c}\sum_{u=1}^{q}\sum_{v=1}^{d}\varphi(X_{[s]t}, Y_{[u]v}), \tag{7.3}$$
and their level $\alpha$ procedure for testing $H_0$ (7.1) against the alternative $H_1: \Delta > 0$, corresponding to the $Y$'s tending to be larger than the $X$'s, is
$$\text{Reject } H_0 \text{ if } BW \ge bw_\alpha; \text{ otherwise do not reject}, \tag{7.4}$$
where the constant $bw_\alpha$ is the upper $\alpha$ percentile of the null ($\Delta = 0$) distribution of BW, chosen to make the type I error probability equal to $\alpha$. For the alternative $\Delta < 0$, the Bohn-Wolfe test rejects for small values of BW, and for the general alternative $\Delta \ne 0$ it rejects for either small or large values of BW.
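Computationally, BW is simply the Mann-Whitney count of pairs with $X < Y$ taken over all $mn$ cross-sample pairs; the judgment ranks matter for its null distribution, not for how it is computed. A minimal Python sketch, assuming the measured values of each sample are supplied as flat lists (the function names and toy values are our own):

```python
def phi(a, b):
    """Indicator (7.2): 1 if a < b, otherwise 0."""
    return 1 if a < b else 0

def bohn_wolfe(x_values, y_values):
    """BW (7.3): sum of phi(X, Y) over all m*n pairs of measured RSS values.
    x_values holds the m = kc X observations, y_values the n = qd Y observations."""
    return sum(phi(x, y) for x in x_values for y in y_values)

# Hypothetical toy values, only to show the count.
x = [1.2, 3.4, 2.2]
y = [2.0, 4.1, 0.5]
print(bohn_wolfe(x, y))  # 4 pairs with X < Y
```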
Generating the necessary critical values $bw_\alpha$ for the BW test (7.4) is both interesting and challenging if the sample sizes $m$ and $n$ are even moderately large. Fortunately, Bohn and Wolfe established the asymptotic normality of a properly standardized version of the BW statistic, and this can be used to obtain approximate critical values for the test procedure.
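One generic alternative to exact enumeration or the normal approximation is Monte Carlo simulation; it is not the method of Bohn and Wolfe. Under $H_0$ and perfect rankings, BW has the same null distribution for every continuous $F$ (applying $F$ to the data turns the judgment order statistics into uniform order statistics without changing any comparison), so the Python sketch below simulates from Uniform(0, 1). The seed, the number of replications, the sample configuration, and the function names are arbitrary assumptions.

```python
import bisect
import random

def perfect_rss(rng, k, cycles):
    """Balanced RSS with perfect rankings from Uniform(0, 1): for each cycle
    and each rank i, draw a fresh set of k units and keep the i-th smallest."""
    return [sorted(rng.random() for _ in range(k))[i]
            for _ in range(cycles) for i in range(k)]

def bw(x_values, y_values):
    """Bohn-Wolfe count of pairs with X < Y."""
    return sum(1 for a in x_values for b in y_values if a < b)

def simulate_bw_null(k, c, q, d, reps=20_000, seed=1):
    """Sorted Monte Carlo draws of BW under H0 with perfect rankings."""
    rng = random.Random(seed)
    return sorted(bw(perfect_rss(rng, k, c), perfect_rss(rng, q, d))
                  for _ in range(reps))

def upper_critical_value(null_draws, alpha):
    """Smallest b whose estimated upper tail P(BW >= b) is at most alpha."""
    reps = len(null_draws)
    b = null_draws[0]
    while (reps - bisect.bisect_left(null_draws, b)) / reps > alpha:
        b += 1
    return b

draws = simulate_bw_null(k=3, c=2, q=3, d=2)
print("mean of BW:", sum(draws) / len(draws), "(mn/2 = 18)")
print("approx bw_0.05:", upper_critical_value(draws, 0.05))
```

The simulated null mean should be close to $mn/2$; the estimated critical value inherits Monte Carlo error, so more replications are needed for small $\alpha$.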

8. Impact of Imperfect Rankings
The effectiveness of RSS procedures depends directly on how well the within-set rankings used to select the units for measurement can be carried out. While perfect rankings are surely the goal of any RSS protocol, they are often not attainable in practice. Thus it is imperative that we be able to assess the effect of imperfect rankings on our procedures, and the most appropriate way to do this is to develop statistical models that capture the uncertainty of the ranking process.
Dell and Clutter [9] propose the first class of models for this purpose. They view the ranks of the experimental units as being based on perceived values that are associated with the true measured values through an additive model. Taking a much different approach, Bohn and Wolfe [30] consider the distributions of the judgment order statistics to be mixtures of distributions of the true order statistics and base their model on the expected spacings between order statistics. Aragon et al. [31] study the effect of imperfect rankings on RSS-based procedures through the use of a ranking error probability matrix. Presnell and Bohn [32] point out some limitations of this approach. Frey [33] overcomes the Presnell-Bohn concerns by producing a much larger class of models through a clever scheme of subsampling order statistics from the basic Bohn-Wolfe model. A recent attempt by Fligner and MacEachern [34] to understand the ranking process uses the monotone likelihood ratio principle to develop a class of imperfect ranking models. Özturk [35] uses a nonparametric maximum likelihood approach to estimate within-set ranking errors. Park and Lim [36] study the effect of imperfect rankings on the amount of Fisher information in ranked set samples.
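The additive model of Dell and Clutter can be mimicked in a simulation: units are ranked on a perceived value $X + e$, but the true $X$ is measured. The Python sketch below assumes $X \sim N(0, 1)$, normal ranking errors $e$ with standard deviation noise_sd, set size $k = 3$, and one cycle; the parameter values, seed, and function names are illustrative assumptions. Larger noise_sd means poorer rankings, and a very large value makes the ranking essentially random.

```python
import random
import statistics

def rss_mean_additive(rng, k, noise_sd):
    """Mean of one cycle of balanced RSS from N(0, 1) when units are ranked
    on the perceived value X + e, e ~ N(0, noise_sd**2), but X is measured."""
    measured = []
    for i in range(k):
        units = [rng.gauss(0.0, 1.0) for _ in range(k)]
        perceived = [x + rng.gauss(0.0, noise_sd) for x in units]
        order = sorted(range(k), key=perceived.__getitem__)  # judgment ranking
        measured.append(units[order[i]])  # keep the unit judged i-th smallest
    return sum(measured) / k

def simulated_variance(k, noise_sd, reps=20_000, seed=7):
    """Monte Carlo variance of the RSS mean under the additive ranking model."""
    rng = random.Random(seed)
    return statistics.pvariance([rss_mean_additive(rng, k, noise_sd)
                                 for _ in range(reps)])

k = 3
results = {sd: simulated_variance(k, sd) for sd in (0.0, 1.0, 1e6)}
for sd, v in results.items():
    print(f"noise_sd = {sd:g}: Var(RSS mean) about {v:.3f}   (SRS: {1 / k:.3f})")
```

The variance should rise from about 0.174 (perfect rankings) toward the SRS value $1/3$ (random rankings), without exceeding it beyond simulation noise, in line with the Dell and Clutter result quoted in Section 4.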

9. Adjustments to Mitigate the Effect of Imperfect Rankings
As noted in the previous section, the impact of seriously imperfect rankings on the performance of ranked set sampling procedures can be substantial. In particular, this is certainly the case for the Bohn-Wolfe ranked set sample analogue of the Mann-Whitney procedure discussed in Section 7. Both Bohn and Wolfe [30] and Fligner and MacEachern [34] point out that the true significance level of the Bohn-Wolfe procedure can be considerably larger than the nominal significance level prescribed under the condition of perfect rankings.
For the setting where the set size is the same for both the $X$ and $Y$ ranked set samples (i.e., $k = q$), Fligner and MacEachern [34] propose a modified extension of the Mann-Whitney procedure that includes cross comparisons between the $X$ and $Y$ samples only for those $(X, Y)$ pairs of observations that have the same within-set judgment rank. Let
$$T_j = \sum_{t=1}^{c}\sum_{v=1}^{d}\varphi(X_{[j]t}, Y_{[j]v}), \quad j = 1, \ldots, k, \tag{9.1}$$
where
$$\varphi(a, b) = \begin{cases} 1, & \text{if } a < b, \\ 0, & \text{otherwise}. \end{cases} \tag{9.2}$$
Thus, $T_j$ is simply the Bohn-Wolfe statistic with Mann-Whitney counts limited solely to those $X$ and $Y$ observations that have the same within-set judgment rank $j$, for $j = 1, \ldots, k$. The Fligner-MacEachern test statistic is then the sum of these common-judgment-rank Mann-Whitney statistics, namely,
$$FM = \sum_{j=1}^{k} T_j. \tag{9.3}$$
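Computing FM requires keeping track of the judgment rank of each measured observation. In the Python sketch below (names and toy values are our own), the measured values are grouped by judgment rank: x_by_rank[j - 1] holds the $c$ measured $X$ values with judgment rank $j$, and similarly for $Y$, so each $T_j$ is an ordinary Mann-Whitney count within rank $j$.

```python
def mann_whitney_count(xs, ys):
    """Number of pairs (x, y) with x < y."""
    return sum(1 for x in xs for y in ys if x < y)

def fligner_maceachern(x_by_rank, y_by_rank):
    """Return (FM, [T_1, ..., T_k]) as in (9.1)-(9.3).

    x_by_rank[j - 1] -- the c measured X values with judgment rank j
    y_by_rank[j - 1] -- the d measured Y values with judgment rank j
    """
    if len(x_by_rank) != len(y_by_rank):
        raise ValueError("FM requires the same set size k for both samples")
    t = [mann_whitney_count(xs, ys) for xs, ys in zip(x_by_rank, y_by_rank)]
    return sum(t), t

# Hypothetical toy data: k = 2 judgment ranks, c = d = 2 cycles.
x_by_rank = [[0.4, 1.1], [2.5, 3.0]]
y_by_rank = [[0.9, 1.5], [2.0, 3.6]]
print(fligner_maceachern(x_by_rank, y_by_rank))  # (5, [3, 2])
```

Unlike BW, no comparison is made between, say, a rank-1 $X$ and a rank-2 $Y$, which is exactly what keeps FM insensitive to the quality of the rankings under $H_0$.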
The statistic FM is distribution-free under $H_0: \Delta = 0$ for any ranking mechanism (perfect or imperfect) that is the same for both the $X$ and $Y$ populations. In fact, the null distribution of each $T_j$, $j = 1, \ldots, k$, is precisely that of the usual two-sample Mann-Whitney statistic for $c$ $X$ observations and $d$ $Y$ observations. Moreover, $T_1, \ldots, T_k$ are mutually independent, since the $(c + d)k$ RSS observations are mutually independent and each observation enters exactly one $T_j$. Thus the null distribution of FM (9.3) can be obtained as the convolution of $k$ independent Mann-Whitney null distributions, each for the same sample sizes of $c$ $X$'s and $d$ $Y$'s, and this is true whether the ranking process is perfect or imperfect in any fashion, including completely at random. Fligner and MacEachern compared the performance of tests based on the Bohn-Wolfe statistic BW with tests based on their statistic FM under perfect rankings and under a variety of imperfect ranking models. Since the BW statistic includes more individual comparisons between the two samples than does the FM statistic, it is not surprising that Fligner and MacEachern found the Bohn-Wolfe procedure to generally have higher power than the FM procedure when the rankings are perfect, although this edge in power for BW under perfect rankings is never overwhelming for the underlying distributions considered in their study.
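Because FM is a sum of $k$ independent Mann-Whitney statistics, its exact null distribution can be built from the Mann-Whitney null distribution by repeated convolution. The Python sketch below counts arrangements with the standard recursion on the largest observation (if it is a $Y$, it exceeds all of the $X$'s; if it is an $X$, it adds no pairs) and uses exact fractions for the probabilities. The function names are hypothetical, and the approach is intended for small $c$, $d$, and $k$.

```python
from fractions import Fraction
from math import comb

def mw_null_counts(c, d):
    """counts[u] = number of the C(c + d, c) equally likely orderings of
    c X's and d Y's with exactly u pairs X < Y."""
    table = {}
    for a in range(c + 1):
        for b in range(d + 1):
            if a == 0 or b == 0:
                table[a, b] = [1]  # only U = 0 is possible
                continue
            counts = [0] * (a * b + 1)
            for u, v in enumerate(table[a - 1, b]):  # largest value is an X: adds 0 pairs
                counts[u] += v
            for u, v in enumerate(table[a, b - 1]):  # largest value is a Y: adds a pairs
                counts[u + a] += v
            table[a, b] = counts
    return table[c, d]

def convolve(p, q):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def fm_null_pmf(k, c, d):
    """Exact null p.m.f. of FM = T_1 + ... + T_k, T_j i.i.d. Mann-Whitney(c, d)."""
    single = mw_null_counts(c, d)
    dist = [1]
    for _ in range(k):
        dist = convolve(dist, single)
    total = comb(c + d, c) ** k
    return [Fraction(v, total) for v in dist]

def fm_upper_critical_value(pmf, alpha):
    """Smallest f with P(FM >= f) <= alpha under H0."""
    tail = Fraction(0)
    for f in range(len(pmf) - 1, -1, -1):
        tail += pmf[f]
        if tail > alpha:
            return f + 1
    return 0

pmf = fm_null_pmf(k=3, c=2, d=2)
print(mw_null_counts(2, 2))  # [1, 1, 2, 1, 1]
print("critical value:", fm_upper_critical_value(pmf, Fraction(1, 20)))
```

Since this null distribution does not depend on how well the units were ranked, the same table serves under perfect and imperfect rankings alike.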

On the other hand, when the rankings are imperfect, they found that the FM procedure was generally superior to the BW procedure. This is also not surprising, given that the FM procedure is truly distribution-free under $H_0$, so that it maintains its nominal significance level even when the rankings are imperfect, while the true significance level of the BW procedure can be considerably inflated over its nominal level in the presence of less than perfect rankings.
Özturk [42] takes a different approach to dealing with imperfect rankings. He points out that the only effect imperfect rankings have on the large-sample approximation for the Bohn-Wolfe procedure is through the asymptotic null variance $\sigma^2_{0,\infty}$. He uses a natural stochastic ordering constraint on the ranked set sample data to construct a consistent estimator, $\hat{\sigma}^2_{0,\infty}$, for $\sigma^2_{0,\infty}$ that leads to a modified standardized version of the Bohn-Wolfe statistic, namely, $BW^*_{OZ} = [BW - E_0(BW)]/\hat{\sigma}_{0,\infty}$. He shows that when $H_0: \Delta = 0$ is true, the calibrated statistic $BW^*_{OZ}$ has, as the cycle sizes $c$ and $d$ tend to infinity, an asymptotic $N(0, 1)$ distribution, even in the presence of imperfect rankings. This Özturk-calibrated test procedure maintains the nominal significance level asymptotically without negatively impacting the power of the test. Özturk proposes a similar correction to deal with imperfect rankings for the RSS sign test. Özturk [43] proposes an alternative way to adjust for imperfect rankings by estimating the parameters in the imperfect ranking models of Bohn and Wolfe [30] and Frey [33] through minimization of a distance measure. He then uses these fitted models to calibrate RSS confidence intervals and tests. Alexandridis and Özturk [44] use a robust inference approach to alleviate the effect of imperfect rankings.

10. Choices: Set Size and Cycle Size, Balanced versus Unbalanced, Iterative Sampling
Even though McIntyre introduced the basic concept of ranked set sampling in his seminal paper sixty years ago, it was not until the paper by Stokes and Sager [17] that the true impact of this simple idea began to materialize. Ranked set sampling has been an active arena of statistical research ever since, and it continues to attract widespread attention sixty years after McIntyre. Part of this richness is due to the great flexibility of the ranked set paradigm. In this section, we discuss some aspects of this flexibility that provide excellent research opportunities and address complexities in applications.

10.1. Set Size and Cycle Size
Set size plays a critical role in the performance of any RSS procedure. For a given set size $k$, each measured ranked set sample observation uses additional information obtained from its ranking relative to $k - 1$ other units from the population. With perfect rankings, this additional information is clearly an increasing function of $k$. Thus, with perfect rankings, we would like to take our set size $k$ to be as large as economically possible within available resources. However, the likelihood of errors in our rankings is also an increasing function of the set size; that is, the larger $k$ is, the more likely we are to experience ranking errors. Therefore, to select the set size $k$ optimally, we need to be able both to model the probabilities of imperfect rankings and to assess their impact on our RSS statistical procedures.
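How quickly the benefit grows with $k$ under perfect rankings can be quantified for a specific population through (5.12). The Python sketch below assumes a standard normal population and perfect rankings, computes the order-statistic means $\mu^*_i$ by numerical integration (trapezoidal rule on a truncated range), and reports the relative efficiency $\mathrm{Var}(\hat{\mu}_{SRS})/\mathrm{Var}(\hat{\mu}_{RSS})$ for a single cycle; the same ratio holds for any number of cycles $m$, since both variances are divided by $m$. It deliberately ignores the ranking errors that grow with $k$, and the function names are our own.

```python
import math

def order_stat_means_normal(k, lo=-10.0, hi=10.0, n=20_000):
    """mu*_i = E[X_(i)], i = 1..k, for a sample of size k from N(0, 1),
    by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + j * h for j in range(n + 1)]
    f = [math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi) for x in xs]
    F = [0.5 * (1 + math.erf(x / math.sqrt(2))) for x in xs]
    means = []
    for i in range(1, k + 1):
        coef = math.factorial(k) / (math.factorial(i - 1) * math.factorial(k - i))
        g = [x * coef * Fx ** (i - 1) * (1 - Fx) ** (k - i) * fx
             for x, Fx, fx in zip(xs, F, f)]
        means.append(h * (sum(g) - 0.5 * (g[0] + g[-1])))
    return means

def relative_efficiency(k):
    """Var(SRS mean) / Var(RSS mean) from (5.12) with sigma^2 = 1, mu = 0."""
    mu_star = order_stat_means_normal(k)
    var_rss = 1 / k - sum(m * m for m in mu_star) / k ** 2
    return (1 / k) / var_rss

for k in range(2, 7):
    print(k, round(relative_efficiency(k), 3))
```

For $k = 2$ and $k = 3$ the closed forms are $1/(1 - 1/\pi) \approx 1.467$ and $1/(1 - 3/(2\pi)) \approx 1.914$, which the numerical values should reproduce.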

Unbalanced Ranked Set Sampling
The emphasis in this paper has been on balanced ranked set sample data of the form $X_{[i]j}$, $i = 1, \ldots, k$ and $j = 1, \ldots, m$, where k is the common set size and m is the number of cycles. Thus, in the case of balanced ranked set sample data we have the same number, m, of each of the judgment order statistics; that is, we have m mutually independent and identically distributed first judgment order statistics $X_{[1]1}, \ldots, X_{[1]m}$; m mutually independent and identically distributed second judgment order statistics $X_{[2]1}, \ldots, X_{[2]m}$; ...; and m mutually independent and identically distributed kth judgment order statistics $X_{[k]1}, \ldots, X_{[k]m}$. While balanced RSS is the most common form of ranked set sampling data, there are situations where it is not optimal to collect the same number of measured observations for each of the judgment order statistics. For example, consider an underlying distribution that is unimodal and symmetric about its median θ, and suppose we are interested only in making inferences about θ using ranked set sample data based on an odd set size k. Among all the order statistics for a random sample of size k, we know that the sample median $X_{((k+1)/2)}$ contains the most information about θ. Thus, to estimate θ in this setting, it is natural to consider measuring the same judgment order statistic, namely, the judgment median $X_{[(k+1)/2]}$, in each set, so that it is measured all k times in each of the m cycles. The resulting ranked set sample consists of mk measured observations, each of which is a judgment median from a set of size k. This would be the most efficient ranked set sample for estimating the population median θ for a population that is both unimodal and symmetric about θ, and it is clearly as unbalanced as possible. A similar approach calls for a distinctly different unbalanced ranked set sample for estimating the median of an asymmetric unimodal population. There are, of course, other considerations. While median judgment order statistics do provide an efficient estimator for the median of a symmetric population, they would not be an optimal choice if we also want to estimate the variance of the population; more balanced RSS measurements would be preferable for that purpose.
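To see the gain from the maximally unbalanced design, the sketch below compares, by simulation, the sample mean of a balanced RSS with the sample mean of a median RSS (the judgment median measured in every set) for a N(θ = 0, 1) population with k = 3. Perfect rankings and the use of the mean of the measured values as the estimator of θ are simplifying assumptions of this illustration; the function names are hypothetical.

```python
import random
import statistics


def measured_values(k, m, rng, scheme):
    """Measured values of a ranked set sample from N(theta = 0, 1), perfect rankings.

    scheme == "balanced": in each cycle, set i contributes its i-th order statistic.
    scheme == "median":   every set contributes its median (k is assumed odd).
    """
    values = []
    for _ in range(m):
        for i in range(k):
            ordered = sorted(rng.gauss(0.0, 1.0) for _ in range(k))
            rank = i if scheme == "balanced" else (k - 1) // 2
            values.append(ordered[rank])
    return values


def mse_of_mean(k, m, scheme, reps=10000, seed=7):
    """Monte Carlo MSE of the mean of the measured values as an estimator of theta."""
    rng = random.Random(seed)
    estimates = [statistics.fmean(measured_values(k, m, rng, scheme))
                 for _ in range(reps)]
    return statistics.fmean(e * e for e in estimates)  # theta = 0, so MSE = E[est^2]
```

For k = 3 the median-RSS mean should show a mean squared error roughly 14% below that of the balanced-RSS mean, in line with the normal order statistic variances (about 0.449 for the median of three versus about 0.560 for the extremes).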
Chen et al. 45 and Chen et al. 46 consider the use of unbalanced ranked set samples in the estimation of a population proportion p. They use Neyman allocation to decide on optimal representations of the various judgment order statistics in the formation of a ranked set sample. This approach leads to the preferred use of balanced RSS for values of p near 1/2, but the optimal allocation becomes dramatically more unbalanced as p nears either 0 or 1.
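A simplified version of this allocation calculation is sketched below. It assumes perfect rankings, a known (or guessed) value of p = P(X ≤ t), and the estimator that averages the k judgment-stratum proportions. Under perfect rankings the indicator I(X_[i] ≤ t) is Bernoulli with success probability P(Bin(k, p) ≥ i), and Neyman allocation assigns measurements in proportion to the square roots of those Bernoulli variances. This illustrates the idea only; it is not a reproduction of the exact procedure in Chen et al. 45, 46.

```python
from math import comb, sqrt


def order_stat_probs(p, k):
    """P(X_(i) <= t) for i = 1..k when P(X <= t) = p and rankings are perfect.

    X_(i) <= t exactly when at least i of the k units in the set are <= t.
    Averaging these k probabilities gives back p, so the stratum-average
    estimator of p is unbiased.
    """
    return [sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(i, k + 1))
            for i in range(1, k + 1)]


def neyman_allocation(p, k):
    """Fraction of measured units assigned to each judgment order statistic.

    Neyman allocation: n_i proportional to the standard deviation of the
    Bernoulli indicator I(X_(i) <= t), i.e. sqrt(p_i * (1 - p_i)).
    """
    sds = [sqrt(q * (1 - q)) for q in order_stat_probs(p, k)]
    total = sum(sds)
    return [s / total for s in sds]
```

For k = 3, the allocation stays comparatively even at p = 0.5 (about 0.28, 0.43, 0.28) but is close to (0.78, 0.19, 0.03) at p = 0.05, concentrating measurements on the first judgment order statistic.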
Additional work with very specific median, truncated, and extreme unbalanced RSS schemes can be found in: Samawi

Unequal Set Sizes
Sometimes the sets that arise naturally in RSS applications are of unequal sizes. For instance, commuters on different public buses in a large city or patients in a collection of doctors' waiting rooms represent naturally occurring sets of varying sizes. One alternative in such situations is to pare down the larger sets to agree in size with the smaller sets, but that can lead to a loss of valuable information that could have been obtained from the more comprehensive rankings within the larger sets. Gemayel et al. 75 propose an estimator for the median of a symmetric population that combines medians of ranked set samples of varying sizes. While not optimal for any specific symmetric distribution, they show that the estimator is robust over a wide class of symmetric distributions.

Cost Considerations
Even under perfect judgment rankings, the costs of the various components of ranked set sampling, namely, identifying sampling units, ranking of sets of sampling units, and eventual measurement of units selected for inclusion in a ranked set sample, all affect the choice of an optimal set size k.
From the very beginning 1, 7, 76, the importance of the relative costs of these factors has been emphasized. Dell and Clutter 9 incorporate the cost of stratification (sampling and ranking the units) and the cost of quantification for the selected units in a model to assess the efficiency of the RSS mean relative to the SRS mean in estimation of the population mean. Kaur

Multiple Observations per Set
In all of the previous discussion of ranked set sampling in this paper, we have considered measuring only a single observation from each set. The rationale behind this approach is that the correlation inherent in measuring more than one observation per set typically leads to a reduction in efficiency for RSS estimation. Wang et al. 83, however, demonstrate that this is not necessarily the case when the cost of the ranking process itself is not small relative to the costs of unit selection and unit measurement. Under such conditions, they find that quantifying two or more observations from a set can actually lead to improved RSS estimation. Muttlak 84 and Hossain and Muttlak 85 also suggest selecting two observations for measurement from each ranked set. Ghosh and Tiwari 86 take a related, but more general, approach in the estimation of the distribution function and extend their methodology to other settings in Ghosh and Tiwari 87. This idea is also critical to the development of order-restricted randomization, which will be discussed in Section 13.2.

Extended Forms of Ranked Set Sampling
A number of modifications to the basic ranked set sampling process have also appeared in the literature. Double ranked set sample procedures, where a second application of the ranked set sampling process occurs within the initially selected RSS units before formal measurements are obtained, are discussed in Al-Saleh and Al-Kadiri 88, Abu-Dayyeh et al. 89

Estimation of the Population Distribution Function
Utilization of information obtained from rankings is clearly an integral part of the ranked set sample concept through the judgment ranking process used to select the specific items for measurement. However, it was not until the seminal paper by Stokes and Sager 17 that a rank-based nonparametric approach was proposed for analysis of the RSS measurements themselves.
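Let $X_{[1]j}, \ldots, X_{[k]j}$, $j = 1, \ldots, m$, be the ranked set sample for set size k and m cycles from a distribution with distribution function F(t). Stokes and Sager 17 consider the sample distribution function for the RSS data, namely,
$$\hat{F}_{RSS}(t) = \frac{1}{mk}\sum_{i=1}^{k}\sum_{j=1}^{m} I\left(X_{[i]j} \le t\right),$$
to be the natural RSS estimator for F(t). They show that $\hat{F}_{RSS}(t)$ is an unbiased estimator of F(t) and that
$$\mathrm{Var}\left[\hat{F}_{RSS}(t)\right] \le \mathrm{Var}\left[\hat{F}_{SRS}(t)\right],$$
where $\hat{F}_{SRS}(t)$ is the usual sample distribution function for an SRS of the same size n = mk.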
They also demonstrate how to use $\hat{F}_{RSS}(t)$ in conjunction with the Kolmogorov-Smirnov statistic to provide simultaneous confidence bands for the distribution function F(t). Kvam and Samaniego 103 consider competitors to $\hat{F}_{RSS}(t)$ that allow for differential weightings of the RSS observations in the averaging process. Their approach leads to more efficient estimators than $\hat{F}_{RSS}(t)$ under a variety of specific distributional assumptions about F(t). Kvam and Samaniego 104 use a similar approach to obtain a nonparametric maximum likelihood estimator (NPMLE) of F(t) based on RSS data. The estimators proposed by Kvam and Samaniego in these two papers also automatically accommodate unbalanced ranked set sample data, where the different order statistics are not equally represented in the collected ranked set sample (see Section 10.2 for more discussion of unbalanced RSS approaches). The original Stokes and Sager estimator $\hat{F}_{RSS}(t)$ does not immediately adapt to such unbalanced ranked set samples. Huang 105 studies the asymptotic properties of the NPMLE, showing that it is consistent and that it converges weakly to a normal process. Kvam and Tiwari 106 consider Bayes estimation of a distribution function with RSS data. Kim and Arnold 107 utilize generalized ranked set sampling to estimate the distribution function. Özturk 108 develops an estimator for the distribution function under the additional assumption of population symmetry. Lam et al. 109 use the kernel method to estimate the distribution function in conjunction with auxiliary information from ranked set samples. Frey 110 uses ranked set sampling in conjunction with a covariate to estimate both the distribution function and the population mean.
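The Stokes-Sager estimator is simply the proportion of all mk measured values that do not exceed t, so it is easy to compute and to compare with its SRS counterpart by simulation. The sketch below assumes a Uniform(0, 1) population (so F(t) = t), perfect rankings, k = 3, m = 2, and t = 0.5; for these settings the exact variances are 0.46875/18 ≈ 0.026 for the RSS estimator and 0.25/6 ≈ 0.042 for the SRS estimator. The function names are hypothetical.

```python
import random


def rss_edf(sample, t):
    """Stokes-Sager estimator: fraction of the m*k measured RSS values that are <= t.

    `sample` is a list of cycles; cycle j lists X_[1]j, ..., X_[k]j.
    """
    values = [x for cycle in sample for x in cycle]
    return sum(x <= t for x in values) / len(values)


def draw_rss(k, m, rng):
    """Balanced RSS from Uniform(0, 1) with perfect rankings (a fresh set per measured unit)."""
    return [[sorted(rng.random() for _ in range(k))[i] for i in range(k)]
            for _ in range(m)]


def edf_mse(k=3, m=2, t=0.5, reps=20000, seed=3):
    """Monte Carlo MSE of the RSS and SRS empirical distribution functions at t."""
    rng = random.Random(seed)
    n = m * k
    rss_sq = srs_sq = 0.0
    for _ in range(reps):
        rss_sq += (rss_edf(draw_rss(k, m, rng), t) - t) ** 2  # F(t) = t for Uniform(0, 1)
        srs = [rng.random() for _ in range(n)]
        srs_sq += (sum(x <= t for x in srs) / n - t) ** 2
    return rss_sq / reps, srs_sq / reps
```

Because both estimators are unbiased, the two returned mean squared errors estimate their variances directly.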

Estimation of the Population Variance
The ranked set sampling approach has also been used to estimate a population variance. Let $X_{[1]j}, \ldots, X_{[k]j}$, $j = 1, \ldots, m$, be a ranked set sample for set size k and m cycles from a population with finite variance $\sigma^2$. Stokes 15 proposes the following RSS estimator for $\sigma^2$:
$$\hat{\sigma}^2_{Stokes} = \frac{1}{mk-1}\sum_{i=1}^{k}\sum_{j=1}^{m}\left(X_{[i]j} - \hat{\mu}_{RSS}\right)^2, \qquad (11.3)$$
where $\hat{\mu}_{RSS}$ (5.1) is the RSS estimator for the population mean. Stokes shows that the estimator $\hat{\sigma}^2_{Stokes}$ (11.3) is asymptotically unbiased for $\sigma^2$ and, for sufficiently large m or k, at least as efficient as the standard variance estimator, $\hat{\sigma}^2_{SRS}$, based on a simple random sample of the same size n = mk. Stokes points out, however, that $\hat{\sigma}^2_{Stokes}$ does not perform as well for small or moderate samples, primarily because it can be quite biased for even moderate sample sizes.
MacEachern et al. 111 and Perron and Sinha 112 note that the Stokes estimator $\hat{\sigma}^2_{Stokes}$ treats each observation in the ranked set sample the same regardless of which judgment order statistic it corresponds to, thereby ignoring some of the structural information provided by the ranked set sample design. They take advantage of this additional structure to propose a competitor estimator that incorporates both within-judgment-ranking and between-judgment-ranking information from the RSS data. MacEachern et al. show that $\hat{\sigma}^2_{MOSW}$ (11.4) is an unbiased estimator for $\sigma^2$ and that, for small to moderate sample sizes, it is more efficient than the Stokes estimator $\hat{\sigma}^2_{Stokes}$ (11.3) over a broad variety of underlying distributions. Under mild conditions, the asymptotic relative efficiency of $\hat{\sigma}^2_{MOSW}$ relative to $\hat{\sigma}^2_{Stokes}$ is 1 when the judgment ranking is perfect.
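The sketch below computes $\hat{\sigma}^2_{Stokes}$ as in (11.3) together with an unbiased pairwise-difference estimator that, in the spirit of the within/between construction described above, averages squared differences over all pairs of independent measurements: pairs from different judgment strata, and pairs from the same stratum in different cycles. It rests on the identity $\sigma^2 = \frac{1}{2k^2}\sum_{i,i'} E(X_{[i]} - X'_{[i']})^2$ for independent measurements; we do not claim it is term-for-term the MOSW estimator (11.4). The Uniform(0, 1) population ($\sigma^2 = 1/12$), perfect rankings, and function names are assumptions of the illustration.

```python
import random
import statistics


def stokes_variance(sample):
    """Stokes estimator (11.3): squared deviations from the RSS mean, divisor mk - 1."""
    values = [x for row in sample for x in row]
    mu = statistics.fmean(values)
    return sum((x - mu) ** 2 for x in values) / (len(values) - 1)


def pairwise_variance(sample):
    """Unbiased pairwise-difference estimator (illustrative construction, needs m >= 2).

    sample[i][j] is the measured i-th judgment order statistic in cycle j.
    Cross-stratum pairs use all m*m cycle pairs; within-stratum pairs need j != j'.
    """
    k, m = len(sample), len(sample[0])
    total = 0.0
    for i in range(k):
        for ip in range(k):
            if i == ip:
                sq = sum((sample[i][j] - sample[i][jp]) ** 2
                         for j in range(m) for jp in range(m) if j != jp)
                total += sq / (m * (m - 1))
            else:
                sq = sum((sample[i][j] - sample[ip][jp]) ** 2
                         for j in range(m) for jp in range(m))
                total += sq / (m * m)
    return total / (2 * k * k)


def draw_rss(k, m, rng):
    """sample[i][j]: i-th order statistic of a fresh Uniform(0, 1) set in cycle j."""
    return [[sorted(rng.random() for _ in range(k))[i] for _ in range(m)]
            for i in range(k)]


def average_estimates(k=3, m=2, reps=20000, seed=11):
    """Monte Carlo averages of both estimators for Uniform(0, 1), where sigma^2 = 1/12."""
    rng = random.Random(seed)
    stokes_vals, pair_vals = [], []
    for _ in range(reps):
        sample = draw_rss(k, m, rng)
        stokes_vals.append(stokes_variance(sample))
        pair_vals.append(pairwise_variance(sample))
    return statistics.fmean(stokes_vals), statistics.fmean(pair_vals)
```

With k = 3 and m = 2, the Stokes average should sit near 0.092, noticeably above 1/12 ≈ 0.083, reflecting its exact bias $\frac{1}{k(mk-1)}\sum_i(\mu_{[i]} - \mu)^2$ under perfect rankings, while the pairwise estimator should average close to 1/12.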
Yu and Tam 113 consider the problem of estimating the population mean and standard deviation based on an RSS with partially censored data. Chen 114 and Barabesi and Fattorini 115 explore the use of RSS data in conjunction with the kernel density method to estimate the underlying density function. Ghosh and Tiwari 116 explore Bayesian density estimation using ranked set samples. Chen Baklizi 127, 128 investigates the use of the empirical likelihood to develop inferences for population quantiles for either balanced or unbalanced ranked set samples. Mahdizadeh and Arghami 129 study quantile estimation with ranked set samples in the special case when the population mean is known.

Nonparametric Test and Confidence Interval Procedures
In addition to the Bohn-Wolfe analogue of the Mann-Whitney test procedure for the two-sample setting discussed in Section 7, ranked set sample analogues are also available for a number of other standard nonparametric tests. Hettmansperger
There are many additional published articles dealing with RSS as it applies to making inferences about specific distributions in a parametric setting. Since the goal of this paper is to emphasize the robust, nonparametric nature of ranked set sampling, we have chosen not to provide details for RSS methodology that is applicable only under specific distributional assumptions.

Applications of Ranked Set Sampling Methodology
Applications of ranked set sampling did not begin to appear until nearly fifteen years after the publication of McIntyre's paper. Halls and Dell 76 discuss its application in a study of forage yields, and they were actually the first to invoke the name ranked set sampling for the methodology. Evans 208

Related Statistical Approaches and Extensions
The rapid development of the field of ranked set sampling over the past two decades has also provided a stimulus for the emergence of other important related approaches to statistical inference. In this section, we discuss four such areas that have arisen directly from previous RSS considerations.

Judgment Poststratification
One of the features of ranked set sampling is that a researcher is required to judgment rank the potential units prior to obtaining any measurements; that is, the researcher must commit to the ranked set sampling approach from the onset of the experiment. MacEachern et al. 224 introduce a data collection method, called judgment poststratification (JPS), that enables a researcher to collect an initial simple random sample (SRS) in standard fashion from the population of interest and then to poststratify the SRS observations by ranking each of them among its own randomly chosen comparison sample. Thus the variable of interest is first measured on all of the original simple random sample units, and only then is relative judgment ranking information obtained from the comparison samples to enable the judgment poststratification. This approach allows the researcher to utilize the measurements in the full SRS as well as the additional information obtained from the judgment poststratification process.
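A minimal JPS sketch, assuming a Uniform(0, 1) population and perfect rankings within each comparison group, is given below. Each SRS unit is measured and then assigned to the post-stratum r in {1, ..., k} given by its rank among itself and k − 1 fresh comparison units; the estimator averages the nonempty stratum means with equal weights, since each rank occurs with probability 1/k. This equal-weight form is one common JPS mean estimator, used here for illustration, and the function names are hypothetical.

```python
import random
import statistics


def jps_mean(n, k, rng):
    """Equal-weight JPS estimate of the mean of Uniform(0, 1) (perfect rankings assumed).

    Each of the n SRS units is measured, then ranked among k - 1 fresh comparison
    units; its rank r in 1..k defines its post-stratum.
    """
    strata = {}
    for _ in range(n):
        y = rng.random()                                    # measured SRS unit
        comparisons = [rng.random() for _ in range(k - 1)]  # ranked, never measured
        r = 1 + sum(c < y for c in comparisons)
        strata.setdefault(r, []).append(y)
    # Average the nonempty stratum means with equal weight (each rank has probability 1/k)
    return statistics.fmean(statistics.fmean(v) for v in strata.values())


def jps_vs_srs(n=20, k=3, reps=10000, seed=5):
    """Monte Carlo MSE of the JPS estimator and the exact variance of the SRS mean."""
    rng = random.Random(seed)
    mse_jps = statistics.fmean((jps_mean(n, k, rng) - 0.5) ** 2 for _ in range(reps))
    return mse_jps, (1 / 12) / n
```

With n = 20 and k = 3, the JPS estimate should have a mean squared error well below the SRS value (1/12)/n, even though every unit was measured exactly as in an ordinary SRS; only the extra, unmeasured comparison units differ.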
The JPS approach provides a mechanism for incorporating both imprecise rankings and information from multiple rankers via the judgment poststratification process. For additional work on JPS, see Wang et

Order-Restricted Randomization
Özturk and MacEachern 233, 234 build on the general framework of ranked set sampling to develop order-restricted randomized (ORR) designs that utilize subjective judgment ranking to enable restricted randomization in the comparison of two treatments (one of which could be a control). The units within a given set are assigned to different treatments, and then, instead of the typical RSS approach that selects a single unit from each ranked set for full measurement, the ORR designs allow for all of the units within a set to be fully measured. The positive dependence between the units within sets leads to contrast estimators and confidence intervals with smaller variability than those based on either completely randomized designs or purely ranked set sample designs. An added feature of ORR estimation is that it does not rely on perfect judgment rankings. Özturk and Sun 235 utilize subjective information on experimental units to develop an ORR-design two-sample rank sum test.

Intentionally Representative Sampling
Frey 236 introduces a novel approach to data collection, dubbed intentionally representative sampling (IRS), that allows a researcher more flexibility in the use of prior and auxiliary information than is possible with RSS. Once a target sample size n has been established, the IRS process requires that the researcher divide the population of interest into disjoint potential samples of size n, each of which is judged, on the basis of prior and auxiliary information, to be at least roughly representative of the overall population with respect to the measurement of interest. In this way, the researcher can exclude from the very beginning any potential samples that are considered to be unrepresentative of the population. To implement the IRS approach effectively, we must, of course, have reasonably good auxiliary information about all of the units in the population, not just the ranking subsets that are required for implementation of RSS procedures.

Sampling from Partially Rank-Ordered Sets
There are times when it is difficult to rank all of the experimental units in a set with high confidence, particularly when subjective information is utilized in the ranking process. Özturk 237, 238 and Gao and Özturk 239 consider a judgment ordering process called judgment subsetting that allows a judgment ranker to use tied ranks when it is difficult to fully rank the experimental units in a set. They show that this added flexibility leads to improved precision for RSS estimation procedures in settings where the full ranking cannot be done with high confidence. Frey 240 studies nonparametric mean estimation using partially rank-ordered sets. Özturk 241 proposes statistical procedures that utilize partially rank-ordered data from multiple observers to assist in the selection of units for measurement in a basic ranked set sample design or to construct a judgment post-stratified design.

Summary
What started as a simple attempt by McIntyre 1 to utilize additional information to improve precision in the estimation of pasture yields through the selection of more representative sample observations has clearly grown into a major field of statistical methodology that continues to attract substantial research activity. For more detailed bibliographic discussions on developments in the area of ranked set sampling, we refer the interested reader to a series of review papers by Patil

Figure 1: Standard normal density (dotted curve) and the individual marginal densities of the five order statistics $X_{(1)}$, $X_{(2)}$, $X_{(3)}$, $X_{(4)}$, and $X_{(5)}$ (solid curves, in order of peaks, from the minimum, $X_{(1)}$, on the left to the maximum, $X_{(5)}$, on the right) for a random sample of size five from the standard normal distribution.
Frey et al. 37, Li and Balakrishnan 38, Vock and Balakrishnan 39, and Zamanzade et al. 40 develop nonparametric test procedures to assess the perfectness of judgment rankings. Chen et al. 41 provide an empirical assessment of the ranking accuracy in ranked set sampling for data from the Third National Health and Nutrition Examination Survey 26.
et al. 77 devise a more complex model that incorporates a further delineation of the various costs, a model that was also used by Yu and Lam 78 in their study of an RSS regression estimator. Nahhas et al. 79 utilize the concept of a coherent ranking developed by Patil et al. 80 and a modified version of the Kaur et al. 77 cost model to study the interplay between the accuracy of visual ranking and the costs of sampling, quantification, and ranking on the choice of an optimal set size for RSS estimation of a population mean. They found that set sizes between 3 and 8 were optimal for a reasonable range of ranking error probabilities. Mode et al. 81 and Buchanan et al. 82 study the total costs of ranked set sampling relative to simple random sampling in the context of ecological research.
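The interplay between ranking cost and efficiency can be illustrated with a deliberately simple, hypothetical cost model (not the Dell and Clutter or Kaur et al. formulations): each SRS unit costs c_s to identify plus c_q to quantify, while each measured RSS unit costs k·c_s for identification, c_r per pairwise comparison for ranking k units (k(k − 1)/2 comparisons), plus c_q. At a fixed budget, the ratio of the SRS-mean variance to the RSS-mean variance is then the relative efficiency multiplied by the ratio of per-unit costs. The sketch uses the exact perfect-ranking relative efficiency (k + 1)/2 for a uniform population; the parameter and function names are illustrative.

```python
def relative_efficiency_uniform(k):
    """Exact Var(SRS mean) / Var(RSS mean) for a uniform population, perfect rankings."""
    return (k + 1) / 2


def cost_adjusted_gain(k, c_s, c_r, c_q):
    """Var(SRS mean) / Var(RSS mean) at equal total budget (hypothetical cost model).

    SRS unit: c_s (identify) + c_q (quantify).
    RSS measured unit: k * c_s + c_r * k(k - 1)/2 pairwise comparisons + c_q.
    k = 1 reduces to simple random sampling (no ranking cost).
    """
    cost_srs = c_s + c_q
    cost_rss = k * c_s + c_r * k * (k - 1) / 2 + c_q
    return relative_efficiency_uniform(k) * cost_srs / cost_rss


def best_set_size(c_s, c_r, c_q, k_max=15):
    """Set size in 1..k_max that maximizes the cost-adjusted gain."""
    return max(range(1, k_max + 1), key=lambda k: cost_adjusted_gain(k, c_s, c_r, c_q))
```

With c_s = 1, c_r = 0.5, and c_q = 20, the gain peaks at k = 8 (a variance ratio of 2.25); if quantification costs only c_q = 1, the best choice is k = 1, that is, simple random sampling. Expensive measurement relative to ranking is what makes larger sets pay off.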
, Samawi and Tawalbeh 90, Al-Saleh and Al-Omari 91, Al-Saleh and Zheng 92, Abujiya and Muttlak 93, Al-Saleh and Samuh 94, Al-Omari and Al-Saleh 95, Samuh and Al-Saleh 96, Agarwal et al. 97, and Al-Omari and Haq 98. RSS procedures involving random selection of the ranked set sample units for measurement are discussed in Li et al. 99, Rahimov and Muttlak 100, and Jozani and Perron 101. Rahimov and Muttlak 102 study random ranked set sampling, where the set size and/or the number of cycles are also allowed to be random.
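A sketch of double ranked set sampling under one common description of the scheme follows (perfect rankings, Uniform(0, 1) population, function names hypothetical): k³ units are split into k blocks of k², RSS is applied to each block to obtain k unmeasured RSS samples of size k, and RSS is applied once more, treating each first-stage sample as a set, to choose the k units that are finally measured.

```python
import random
import statistics


def rss_step(units, k):
    """One RSS step on k*k units: split into k consecutive sets, keep the i-th smallest of set i."""
    sets = [units[i * k:(i + 1) * k] for i in range(k)]
    return [sorted(s)[i] for i, s in enumerate(sets)]


def rss_sample(k, rng):
    """Ordinary RSS of size k (one cycle) from Uniform(0, 1), perfect rankings."""
    return rss_step([rng.random() for _ in range(k * k)], k)


def drss_sample(k, rng):
    """Double RSS of size k: k**3 units -> k first-stage RSS samples -> one more RSS step."""
    stage_one = []
    for _ in range(k):
        stage_one.extend(rss_sample(k, rng))  # k ranked but not yet measured units
    return rss_step(stage_one, k)  # each first-stage sample plays the role of one set


def mse_of_mean(sampler, k, reps=40000, seed=2):
    """Monte Carlo MSE of the sample mean (true mean 0.5) under a given sampler."""
    rng = random.Random(seed)
    return statistics.fmean((statistics.fmean(sampler(k, rng)) - 0.5) ** 2
                            for _ in range(reps))
```

For k = 2, the RSS and DRSS means should have mean squared errors near 1/36 ≈ 0.0278 and 13/600 ≈ 0.0217, respectively; the second value follows from the exact distribution of the minimum of a Beta(1, 2) and an independent Beta(2, 1) variable (and, symmetrically, of their maximum).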
117 investigates many of the basic properties, including consistency and asymptotic normality, of RSS sample quantiles. Zhu and Wang 118 study a competitor estimator for population quantiles. Kaur et al. 119 consider the properties of a sign test for quantiles. Zhu and Wang 118, Özturk and Deshpande 120, and Frey 121 discuss RSS nonparametric quantile confidence intervals, and Özturk and Balakrishnan 122 use their results to develop a test for quantile differences between two populations. Deshpande et al. 123 extend these quantile confidence interval results to accommodate finite populations. Balakrishnan and Li 124, 125 introduce the concept of ordered ranked set samples and use them to construct confidence intervals for quantiles and tolerance intervals. Tiensuwan et al. 126 investigate nonnegative unbiased RSS estimators of scale parameters and associated quantiles.
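One natural RSS sample quantile, sketched below, inverts the Stokes-Sager distribution function estimate: the p-th quantile estimate is inf{t : F̂_RSS(t) ≥ p}, which is the ⌈pn⌉-th smallest of the n = mk measured values. Applying the same rule to an SRS gives the ordinary sample quantile, so the simulation compares the two sample medians for a Uniform(0, 1) population under perfect rankings; these settings and the function names are illustrative assumptions.

```python
import math
import random


def sample_quantile(values, p):
    """inf{t : F_hat(t) >= p}: the ceil(p * n)-th smallest value.

    Applied to pooled RSS measurements this inverts the Stokes-Sager estimator;
    applied to an SRS it is the ordinary sample quantile. The 1e-9 guards
    against floating-point error in p * n.
    """
    ordered = sorted(values)
    idx = max(math.ceil(p * len(ordered) - 1e-9), 1)
    return ordered[idx - 1]


def median_mse(k=3, m=10, reps=5000, seed=4):
    """Monte Carlo MSE of the RSS and SRS sample medians for Uniform(0, 1) (median 0.5)."""
    rng = random.Random(seed)
    n = k * m
    rss_sq = srs_sq = 0.0
    for _ in range(reps):
        # Fresh set for every measured unit; keep its i-th order statistic (perfect ranking)
        rss = [sorted(rng.random() for _ in range(k))[i]
               for _ in range(m) for i in range(k)]
        srs = [rng.random() for _ in range(n)]
        rss_sq += (sample_quantile(rss, 0.5) - 0.5) ** 2
        srs_sq += (sample_quantile(srs, 0.5) - 0.5) ** 2
    return rss_sq / reps, srs_sq / reps
```

For k = 3 the RSS median should show a clearly smaller mean squared error than the SRS median of the same size; the large-sample variance ratio at the median is 0.625 under perfect rankings.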

130, Koti and Babu 131, Barabesi 132, 133, D. H. Kim and H. G. Kim 134, and Wang and Zhu 135 discuss various attributes of RSS versions of the standard sign test, and Bohn 136 develops an RSS signed rank procedure. Dong and Cui 137 study an optimal RSS sign test for a general quantile. Other approaches to nonparametric inference using RSS data in the one- and two-sample arena include the papers by Li et al. 138, Özturk 139, 140, Özturk and Wolfe 141-144, Sengupta and Mukhuti 145, Hussein et al. 146, and Hussein et al. 147. Frey 148 develops a general class of distribution-free statistical intervals based on ranked set samples. Vock and Balakrishnan 149 study nonparametric RSS prediction intervals. Hartlaub and Wolfe 150 propose an RSS test procedure designed to detect umbrella alternatives in the k-sample setting, and Magel and Qin 151 study a competitor to the Hartlaub and Wolfe procedure. Özturk et al. 152 use simultaneous one-sample sign confidence intervals for population medians to develop a k-sample RSS test procedure designed to detect simple-tree alternatives. Özturk and Balakrishnan 153 propose an exact RSS control-versus-treatment test procedure. Chen et al. 154 extend the application of RSS methodology to ordered categorical variables with the goal of estimating the probabilities of all of the categories. They use ordinal logistic regression to aid in the ranking of the ordinal variable of interest and propose an optimal allocation scheme. Özturk 155 explores the adaptation of rank regression methodology to RSS data, and Liu et al. 156 study the use of the empirical likelihood in the context of ranked set sampling. Gaur et al. 157 consider an RSS approach to the multiple sample scale problem.
et al. 242, Kaur et al. 243, Patil 244, Bohn 245, Patil et al. 246, Muttlak and Al-Saleh 247, Patil 248, Wolfe 249, Chen 250, and Wolfe 251. Chen et al. 216 have written the only monograph/textbook on the subject, although a chapter in the third edition of Nonparametric Statistical Methods by Hollander et al. 252 is devoted to ranked set sampling. But, of course, the activity continues to outpace the review papers and monographs!

Table 3
Muttlak and McDonald 158, 159 utilize the RSS scheme in conjunction with size-biased probability of selection, and Muttlak and McDonald 160 propose using a two-stage sampling plan with line-intercept sampling in the first stage and RSS in the second stage. Nematollahi et al. 161 employ ranked set sampling in the second stage of a two-stage cluster sampling design. Al-Saleh and Samawi 162 and Frey 163 present results about inclusion probabilities for population elements under RSS designs, and Gökpinar and Özdemir 164 use these inclusion probabilities to construct a Horvitz-Thompson RSS estimator for the population mean in a finite population setting. Samawi 165, Al-Saleh and Samawi 166, and Al-Nasser and Al-Talib 167 incorporate the RSS approach to obtain more efficient Monte Carlo methods. Barabesi and Pisani 168 consider the use of RSS in replications of designs such as plot sampling or line-intercept sampling, and Barabesi and Pisani 169 continue their work with a study of steady-state RSS for replicated environmental sampling plans. Barabesi and Marcheselli 170 investigate the use of auxiliary variables in design-based ranked set sampling, and Chen and Shen 171 approach RSS as a two-layer process with multiple concomitant variables. Muttlak and Al-Sabah 172, Al-Nasser and Al-Rawwash 173, and Al-Omari and Al-Nasser 174 incorporate RSS in statistical quality control. Mode et al. 175 study the general use of incorporating prior knowledge in environmental sampling, including RSS. Ridout and Cobby 176 look at RSS under the condition of non-random selection of sets. Samawi and Muttlak 177 use RSS to estimate a ratio. Patil et al. 178, Norris et al. 179, and Ridout 180 all explore the use of RSS when we are interested in making inferences about multiple characteristics. Ahmed et al. 181 and Muttlak et al. 182 explore the role of RSS in Stein-type estimation and shrinkage estimation, respectively. Modarres et al. 183 investigate the use of resampling techniques with RSS data. A number of authors discuss the use of RSS in the context of regression analysis, including Patil et al. 184, Muttlak 185, 186, Yu and Lam 78, Barreto and Barnett 187, Chen 188, Muttlak 189, Chen and Wang 190, Hui et al. 191, Tipton Murff and Sager 192, and Alodat et al. 193. RSS methodology for bivariate data also receives considerable attention in the literature, including papers by Samawi and Al-Saleh 194, Samawi et al. 195-197, Samawi and Al-Saleh 198, Hui et al. 199, Samawi and Pararai 200, 201, and Samawi et al. 202. Arnold et al. 203 use multivariate order statistics to extend the RSS approach to a multivariate setting. The use of RSS in a Bayesian context is explored in Al-Saleh and Muttlak 204, Lavine 205, Al-Saleh et al. 206, and Gemayel 207.
uses this approach in regeneration surveys for long-leaf pine trees. More than ten years later, Martin et al. 209 employ RSS in the estimation of shrub phytomass in Appalachian oak forests; Nelson et al. 210 study the nutrition of Populus deltoides plantations in the lower Mississippi River Valley using RSS-collected data; and Cobby et al. 211 utilize RSS in their investigation of grass and grass-clover swards. More recently, Mode et al. 81, 175 investigate the use of ranked set sampling in the assessment of stream habitat areas in the Pacific Northwest in connection with salmon production. Murray et al. 212 provide an application of RSS in the comparison of different approaches to spraying of apple orchards. Al-Saleh and Al-Shrafat 213 use RSS to estimate average milk yield among sheep, and Özturk et al. 214 use it to estimate the population mean and variance in regard to sheep flock management. Kvam 215 applies RSS to binary water quality data with covariates. Chen et al. 216 illustrate the use of an RSS approach in the estimation of tree heights for a set of data collected by Platt et al. 217 and for estimation of cinchona yield from a previous experiment by Sengupta et al. 218. Husby et al. 219 use a crop production data set from the United States Department of Agriculture to demonstrate the practical benefits of RSS in the timely prediction of corn production and corn yield. Muttlak 220, Özturk 155, and Özturk et al. 152 apply an RSS protocol in a one-way analysis of variance setting to assess the relative healthiness of young males raised in different regions of Jordan. Tarr et al. 221 incorporate RSS in their study of the map accuracy of soil variables using soil electrical conductivity as a covariate. Wang et al. 222 show how RSS can be used to increase efficiency and reduce costs in fishery research. In a totally different venue, Gemayel et al. 223 provide an illustration of the cost savings that can result from the application of RSS in the field of auditing.