Double Sampling with Ranked Set Selection in the Second Phase with Nonresponse: Analytical Results and Monte Carlo Experiences

This paper studies the behavior of double sampling for dealing with nonresponse when a ranked set sample is used. The characteristics of the sampling strategies are derived. The structure of the errors made it necessary to study the optimality of the strategies through a set of Monte Carlo experiments.


Introduction
The usual theory of survey sampling is developed assuming that the finite population $U = \{u_1, \ldots, u_N\}$ is composed of individuals that can be perfectly identified. A sample $s$ of size $n \le N$ is selected, and the variable of interest $Y$ is measured in each selected unit. Real-life surveys must deal with missing observations. There are three ways to cope with them: ignore the nonrespondents, subsample the nonrespondents, or impute the missing values. Ignoring the nonresponses is a dangerous decision, while subsampling is a conservative and costly solution. Imputation is often used to compensate for item nonresponse. See, for example, Rueda and González [1] and Singh [2] for discussions of the topic.
Section 2 presents the problem of nonresponse when a single sample is selected. We consider the use of double sampling to obtain information on an auxiliary variable $X$. A first, large sample is selected; it is assumed to be inexpensive. The values of $X$ are used to select a ranked set sample (RSS): the units are ranked using the values observed in the first-stage sample. The second sample is thus a subsample of the preliminary large sample. The literature on simple random double sampling (SRS) is large; textbooks give the basic theory, see Singh [2] and Cochran [3]. In this paper we consider a double sampling procedure based on ranked set sampling. It is presented in Section 3, where a family of estimators is considered as an RSS alternative to the proposal of Singh and Kumar [4]. An expression for the gain in accuracy of the proposed estimator is derived, and the estimator is compared with the simple mean and with the proposal of Singh and Kumar [4]. Real-life data are used in Section 4 to evaluate the behavior of these alternative estimators of the population mean.

The Nonresponse Problem: A Single Sample
Nonresponses may be caused by the refusal of some units to give the true value of $Y$ or by other causes. In 1946 Hansen and Hurwitz [5] proposed selecting a subsample among the nonrespondents; see Cochran [3]. The performance of this approach depends heavily on the subsampling rule adopted. Subsampling rules are due to Hansen and Hurwitz [5], Srinath [6], and Bouza [7]. The existence of nonresponses implies that $U$ is divided into two strata: $U_1$, formed by the units that respond, and $U_2$, formed by the units that do not respond. The procedure is a particular double sampling design, described below using the Hansen-Hurwitz rule (HHR).
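Step 1. Select a sample $s$ of size $n$ from $U$ using simple random sampling with replacement (srswr).
Step 2. Evaluate $Y$ among the respondents and determine the set $s_1$ of respondents and the set $s_2$ of nonrespondents, of sizes $n_1$ and $n_2$, respectively.
Step 3. Fix the subsample size $n_2' = n_2/K$, with $K > 1$.
Step 4. Select a subsample $s_2'$ of size $n_2'$ from $s_2$ using srswr.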
Step 5. Evaluate $Y$ in $s_2'$ and estimate the population mean by (2.3).
The variance of (2.3) is deduced using the following device: the first term is the mean of $s$, hence its variance is $\sigma^2/n$. For the second term we have

(2.7)

Conditioning on a fixed $n_2$, we obtain the expectation of the third term:

(2.8)

Hence the expected error of (2.3) is given by the well-known Hansen-Hurwitz expression.
Our proposal is to use the information provided by a known auxiliary variable $X$ to select the samples by RSS.
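The Hansen-Hurwitz procedure described in the steps above can be simulated directly. The Python sketch below assumes a hypothetical population in which the nonrespondent stratum $U_2$ has larger $Y$ values, assumes that (2.3) has the usual form $(n_1\bar y_1 + n_2\bar y_2')/n$, and rounds $n_2/K$ upward; the function name `hansen_hurwitz_mean` and all numerical settings are illustrative, not taken from the paper.

```python
import math
import random
import statistics

def hansen_hurwitz_mean(y, responds, n, K, rng):
    """One replicate of a Hansen-Hurwitz-type estimator (illustrative sketch).

    y        -- Y values of the whole population U
    responds -- True if the unit belongs to the respondent stratum U_1
    n        -- size of the first sample s, drawn by srswr
    K        -- subsampling factor; n2' = ceil(n2 / K) is an assumed rounding rule
    """
    s = rng.choices(range(len(y)), k=n)                # srswr sample s
    s1 = [i for i in s if responds[i]]                 # respondents
    s2 = [i for i in s if not responds[i]]             # nonrespondents
    total = sum(y[i] for i in s1)                      # n1 * mean of respondents
    if s2:
        n2_sub = math.ceil(len(s2) / K)                # subsample size n2'
        sub = rng.choices(s2, k=n2_sub)                # srswr subsample of s2
        total += len(s2) * statistics.fmean(y[i] for i in sub)  # n2 * subsample mean
    return total / n

# Hypothetical population: 700 respondents, 300 nonrespondents with larger Y
rng = random.Random(2024)
y = [rng.gauss(50, 10) for _ in range(700)] + [rng.gauss(80, 10) for _ in range(300)]
responds = [True] * 700 + [False] * 300
mu = statistics.fmean(y)

reps = [hansen_hurwitz_mean(y, responds, n=100, K=2, rng=rng) for _ in range(5000)]
naive = statistics.fmean(y[i] for i in range(700))   # what ignoring U_2 would target
print(f"mu = {mu:.2f}, mean of HH estimates = {statistics.fmean(reps):.2f}, "
      f"respondent-stratum mean = {naive:.2f}")
```

The average of the replicates should sit close to the population mean, while the respondent-stratum mean shows the bias incurred by ignoring the nonrespondents.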
McIntyre [8] proposed the method of RSS and observed a gain in accuracy over the sample mean under srswr. Dell and Clutter [9] and Takahashi and Wakimoto [10] provided mathematical support for his claims. The following procedure describes RSS selection.

RSS Procedure
Step 1. Randomly select m 2 units from the target population.
Step 2. Allocate the m 2 selected units as randomly as possible into m sets, each of size m.
Step 3. Without yet knowing any values of the variable of interest, rank the units within each set with respect to the variable of interest. This may be based on personal professional judgment or done with a concomitant variable correlated with the variable of interest.
Step 4. Choose a sample for actual quantification by including the smallest ranked unit in the first set, the second smallest ranked unit in the second set, and so on, until the largest ranked unit is selected from the last set.
Step 5. Repeat Steps 1 through 4 for r cycles to obtain a sample of size mr for actual quantification.
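Steps 1 to 5 can be written as a short function. This Python sketch draws the $m^2$ units of each cycle with replacement (an assumption, in line with srswr) and ranks them with a user-supplied key that plays the role of the judgment or concomitant variable; `ranked_set_sample` is a hypothetical name.

```python
import random

def ranked_set_sample(population, m, r, rank_key, rng):
    """Ranked set sample of size m * r (sketch of Steps 1-5)."""
    sample = []
    for _ in range(r):                                        # Step 5: r cycles
        units = rng.choices(population, k=m * m)              # Step 1: m^2 random units
        sets = [units[j * m:(j + 1) * m] for j in range(m)]   # Step 2: m sets of size m
        for i, group in enumerate(sets):
            ranked = sorted(group, key=rank_key)              # Step 3: rank within the set
            sample.append(ranked[i])                          # Step 4: i-th ranked unit of set i
    return sample

# Usage: units are (x, y) pairs; ranking uses the concomitant x, never y
rng = random.Random(1)
pop = [(x, 2 * x + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(500))]
s = ranked_set_sample(pop, m=3, r=4, rank_key=lambda unit: unit[0], rng=rng)
print(len(s))  # 12 = m * r
```

Only the $m r$ selected units need to be measured on $Y$; the remaining $m^2 r - mr$ units are used only for ranking.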
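The RSS sample is the sequence of order statistics (OS) $\xi_{1:m}(t), \ldots, \xi_{m:m}(t)$, where $\xi_{j:h}(t)$ denotes the order statistic of order $j$ among $h$ ranked units in cycle $t = 1, \ldots, r$; by Step 4, $\xi_{i:m}(t)$ comes from the $i$th set. We have $n = mr$ observations, $r$ of which are realizations of the $i$th order statistic, $i = 1, \ldots, m$. The RSS estimator of the mean $\mu_\xi$ of a variable of interest $\xi$ is

$$\bar\xi_{rss} = \frac{1}{mr}\sum_{t=1}^{r}\sum_{i=1}^{m}\xi_{i:m}(t),$$

and its variance is given by

$$V(\bar\xi_{rss}) = \frac{\sigma^2_\xi}{mr} - \frac{1}{m^2 r}\sum_{i=1}^{m}\Delta^2_{(i)}, \qquad (2.11)$$

where $\sigma^2_\xi$ is the variance of $\xi$ and $\Delta_{(i)} = E(\xi_{i:m}(t)) - \mu_\xi$. The second term of (2.11) is the gain in accuracy due to the use of RSS instead of srswr. Bouza [11] developed an RSS alternative under nonresponse. The number of nonresponses in $s$ is $n_2 = r m_2$. He showed that, using a subsample size $m_2' = m_2/K$, the estimator (2.12) is unbiased for the mean of $Y$ in the nonresponse stratum.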
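For a parent distribution whose order-statistic means are known in closed form, the variance (2.11), $V(\bar\xi_{rss}) = \sigma^2_\xi/(mr) - \sum_i \Delta^2_{(i)}/(m^2 r)$ with $\Delta_{(i)} = E(\xi_{i:m}) - \mu_\xi$, can be evaluated exactly. The Python sketch below assumes $\xi \sim U(0,1)$ with perfect ranking, for which $\sigma^2_\xi = 1/12$ and $E(\xi_{i:m}) = i/(m+1)$; it uses exact fractions so the relative precision of RSS over srswr can be read off directly. The function name is hypothetical.

```python
from fractions import Fraction

def rss_variance_uniform(m, r):
    """Exact terms of (2.11) for xi ~ U(0,1) under perfect ranking.

    Returns (variance of the RSS mean, variance of the srswr mean, gain).
    """
    sigma2 = Fraction(1, 12)                                   # variance of U(0,1)
    deltas = [Fraction(i, m + 1) - Fraction(1, 2) for i in range(1, m + 1)]  # Delta_(i)
    srs_var = sigma2 / (m * r)                                 # first term: srswr, n = m*r
    gain = sum(d * d for d in deltas) / (m * m * r)            # second term of (2.11)
    return srs_var - gain, srs_var, gain

for m in (3, 4, 6):
    v_rss, v_srs, gain = rss_variance_uniform(m, r=1)
    print(m, v_rss, v_srs / v_rss)
# 3 1/72 2
# 4 1/120 5/2
# 6 1/252 7/2
```

For this parent the relative precision works out to $(m+1)/2$, independent of $r$, which shows how quickly the second term of (2.11) grows with the set size.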
The expected value of the cross-products is zero. In this case the RSS is balanced, and we may express the variance of the order statistics, $V(y_{i:m_2'}(t))$, as a function of the variance of $Y$ in $U_2$ and of the gains in accuracy measured by the $\Delta^2_{2Y(i)}$'s. Substituting $n_2' = r m_2/K$, we obtain the following:

(2.14)

Taking the RSS estimator

(2.15)

the use of RSS yields a gain in accuracy, which includes a term measuring the gain due to the use of RSS in the second stage.
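The effect of using RSS when subsampling the nonrespondents can be explored by simulation. The Python sketch below rests on several assumptions not stated in the text: $Y$ depends linearly on $X$, each unit fails to respond with probability 0.3, the subsample is an RSS of $r$ cycles with set size $m_2'$ drawn with replacement from $s_2$ and ranked on $X$, and $r$ is chosen so that $r m_2' \approx n_2/K$. It compares the empirical variance with that of an srswr subsample of the same size; it is an illustration, not a reproduction of (2.12)-(2.15), and `hh_estimate` is a hypothetical name.

```python
import math
import random
import statistics

def hh_estimate(y, x, responds, n, K, m_sub, use_rss, rng):
    """Hansen-Hurwitz-type mean with an RSS or srswr subsample of nonrespondents."""
    s = rng.choices(range(len(y)), k=n)                    # srswr sample s
    s1 = [i for i in s if responds[i]]
    s2 = [i for i in s if not responds[i]]
    total = sum(y[i] for i in s1)
    if s2:
        r = max(1, math.ceil(len(s2) / (K * m_sub)))       # cycles: r * m_sub ~ n2 / K
        if use_rss:
            sub = []
            for _ in range(r):
                for j in range(m_sub):
                    group = rng.choices(s2, k=m_sub)       # one set drawn from s2
                    sub.append(sorted(group, key=lambda i: x[i])[j])  # j-th ranked on X
        else:
            sub = rng.choices(s2, k=r * m_sub)             # srswr subsample, same size
        total += len(s2) * statistics.fmean(y[i] for i in sub)
    return total / n

# Hypothetical population with Y strongly correlated with X
rng = random.Random(11)
N = 2000
x = [rng.uniform(0, 10) for _ in range(N)]
y = [5 + 2 * xi + rng.gauss(0, 1) for xi in x]
responds = [rng.random() > 0.3 for _ in range(N)]
mu = statistics.fmean(y)

est_rss = [hh_estimate(y, x, responds, 120, 2, 3, True, rng) for _ in range(4000)]
est_srs = [hh_estimate(y, x, responds, 120, 2, 3, False, rng) for _ in range(4000)]
var_rss, var_srs = statistics.variance(est_rss), statistics.variance(est_srs)
print(f"mu = {mu:.3f}; variance with RSS subsample {var_rss:.4f}, with srswr {var_srs:.4f}")
```

Both versions are unbiased, because the RSS values from independent sets average to the mean of $s_2$; only the subsampling component of the variance changes, so the reduction is largest when $X$ ranks $Y$ well.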

The Nonresponse Problem: Double Sampling
We consider that double sampling is used: a first-phase sample $s^*$ of size $n^*$ is selected from $U$ using srswr.
An inexpensive variable $X$, correlated with $Y$, is measured on the units in $s^*$, so its mean can be computed in the first stage. There are nonresponses. In the second stage we know $\bar x_{s^*} = \sum_{i=1}^{n^*} x_i/n^*$ and $\bar x = \sum_{i=1}^{n} x_i/n$. Note that these estimates are used only in the estimation process.
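The two means of $X$ available in the second stage can be illustrated with a short Python sketch. For illustration only, it assumes that the second-phase sample $s$ is drawn by srswr from $s^*$ and uses a hypothetical population of $X$ values; $n^* = 240$ mirrors the first-phase size used in the experiments, while $n = 60$ is arbitrary.

```python
import random
import statistics

def two_phase_means(x, n_star, n, rng):
    """Return (mean of X in s*, mean of X in s) for a two-phase srswr design."""
    s_star = rng.choices(range(len(x)), k=n_star)   # first phase: s* from U
    s = rng.choices(s_star, k=n)                    # second phase: s from s* (assumed srswr)
    return (statistics.fmean(x[i] for i in s_star),
            statistics.fmean(x[i] for i in s))

rng = random.Random(3)
x = [rng.uniform(0, 10) for _ in range(5000)]      # hypothetical auxiliary variable
mu_x = statistics.fmean(x)
draws = [two_phase_means(x, n_star=240, n=60, rng=rng) for _ in range(2000)]
print(f"mu_X = {mu_x:.3f}")
print(f"average of x-bar_s* = {statistics.fmean(d[0] for d in draws):.3f}")
print(f"average of x-bar    = {statistics.fmean(d[1] for d in draws):.3f}")
```

Both means are unbiased for $\mu_X$, but $\bar x_{s^*}$ is based on the larger sample and is therefore more stable; estimators of this kind exploit the contrast between the two.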
Nonresponses on $Y$ are present in the second-stage sample, and a subsample of the nonrespondents is selected. Singh and Kumar [4] considered this problem for simple random sampling. They proposed a family of estimators in which the sampler fixes the constants $\alpha$ and $\beta$, as well as $a$ and $b$. These can be constants or functions, with $a \neq 0$. Taking

(3.6)

the variance is given by

(3.8)
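We now derive the RSS counterpart of this family. The first-phase sample is selected using srswr, and the information on $X$ is used both for selecting the initial sample and for subsampling the nonrespondents. Our proposal is to use the estimator (3.9), where $\bar x_{rss}$ is the RSS mean of $X$ in the second stage. Let us represent the involved estimators in terms of their relative errors $\varepsilon_{rss}$, $\theta_{rss}$, $\vartheta$ and $\omega_{rss}$; for instance, $\bar x_{rss} = \mu_X(1 + \omega_{rss})$:

(3.12)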
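The relative-error representation is the one used in the Taylor expansion of (3.13). The Python sketch below computes $\omega_{rss} = \bar x_{rss}/\mu_X - 1$ for an RSS of $X \sim U(0,1)$ with perfect ranking ($m = 3$, $r = 1$, assumptions for illustration) and checks that its mean is close to zero and its variance close to $V(\bar x_{rss})/\mu_X^2 = (1/72)/(1/4) = 1/18$, the value the RSS variance formula gives for this parent. The function name is hypothetical.

```python
import random
import statistics

def rss_mean_uniform(m, r, rng):
    """RSS mean of X ~ U(0,1); ranking on X itself, so ranking is perfect."""
    values = []
    for _ in range(r):
        for i in range(m):
            group = sorted(rng.random() for _ in range(m))  # a fresh set of size m
            values.append(group[i])                          # its i-th order statistic
    return statistics.fmean(values)

rng = random.Random(7)
mu_x = 0.5
omegas = [rss_mean_uniform(3, 1, rng) / mu_x - 1 for _ in range(20000)]  # relative errors
print(f"mean {statistics.fmean(omegas):.4f}, variance {statistics.variance(omegas):.4f}, "
      f"theory {1 / 18:.4f}")
```

Because the RSS mean is unbiased, $E(\omega_{rss}) = 0$, so only second-order moments of the relative errors enter the leading terms of the bias.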
We can rewrite (3.9) as (3.13). Note that

(3.14)

Under the hypothesis $|\varphi Z| < 1$, $Z \in \{\varepsilon_{rss}, \theta_{rss}, \vartheta, \omega_{rss}\}$, a Taylor series expansion of (3.13) may be worked out. Grouping terms conveniently, we have

(3.15)
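The condition $|\varphi Z| < 1$ is the usual requirement for expanding factors such as $(1 + \varphi Z)^{\alpha}$ in a power series. Since the exact form of (3.13) is not reproduced here, the Python sketch below only assumes a generic factor $(1+u)^{\alpha}$ with $u = \varphi Z$: it shows that the binomial series converges for $|u| < 1$, that a second-order truncation (typical in bias calculations of this kind) is already close for small $u$, and that the series diverges for $|u| > 1$. The function name is hypothetical.

```python
def binomial_partial_sum(u, alpha, terms):
    """Sum of the first `terms` terms of the binomial series of (1 + u)**alpha."""
    total, coef = 0.0, 1.0
    for k in range(terms):
        total += coef * u ** k
        coef *= (alpha - k) / (k + 1)      # next generalized binomial coefficient
    return total

alpha = -1.0                               # e.g. a factor of the form 1 / (1 + u)
for u in (0.2, 1.5):
    exact = (1 + u) ** alpha
    second_order = binomial_partial_sum(u, alpha, 3)   # 1 + alpha*u + alpha(alpha-1)/2 * u^2
    many_terms = binomial_partial_sum(u, alpha, 30)
    print(f"u={u}: exact {exact:.4f}, 2nd order {second_order:.4f}, 30 terms {many_terms:.4g}")
```

With $u = 0.2$ the partial sums approach the exact value, whereas with $u = 1.5$ they oscillate with growing magnitude, which is why the hypothesis is needed before grouping terms.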
The cross-products of the order statistics $Z_{(i)}$, $Z = X, Y$, are expressed by

(3.16)

The conditional expectations of the RSS estimators are given by (3.17). Using these results we have

(3.20)

Substituting in (3.15), after some algebra we obtain the bias of (3.9), where the subscript $z$ takes the values $x$ and $y$:

(3.22)
For large $n$ the bias tends to zero. This proves the first statement of the following proposition.

(3.25)

Calculating the expected value and grouping terms, we have

(3.26)
Remark 3.3. Denote by $G$ the gain in accuracy, in terms of the variance, due to the use of (3.9). Hence, as $V(\bar y^*_{rss}) = V(\bar y^*) + G$, the proposed method is more precise if $G < 0$. $\varepsilon$ was generated using the same distribution. The results are given in Table 2. Note that the gain in efficiency is generally larger when the underlying distribution is symmetric. The best results are obtained with $m = 4$, except for the Beta distribution.
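The sign convention of Remark 3.3 ($G < 0$ means the RSS-based estimator is more precise) can be checked numerically. The Python sketch below estimates $G = V_{rss} - V_{srs}$ from independent Monte Carlo replicates in the simplest case, the RSS mean versus the srswr mean of $X \sim U(0,1)$ with $m = 3$, $r = 1$ and perfect ranking. These settings are assumptions for illustration, chosen because the RSS variance formula then gives $G = 1/72 - 1/36 = -1/72$ exactly; the function names are hypothetical.

```python
import random
import statistics

def rss_mean(rng, m=3):
    """RSS mean (one cycle) of U(0,1) draws with perfect ranking."""
    picks = []
    for i in range(m):
        group = sorted(rng.random() for _ in range(m))   # i-th set
        picks.append(group[i])                             # its i-th order statistic
    return statistics.fmean(picks)

def srs_mean(rng, m=3):
    """srswr mean of m U(0,1) draws (same sample size as one RSS cycle)."""
    return statistics.fmean(rng.random() for _ in range(m))

def monte_carlo_gain(rss_estimator, srs_estimator, reps, rng):
    """Estimate G = V(rss) - V(srs); G < 0 favours the RSS estimator."""
    v_rss = statistics.variance([rss_estimator(rng) for _ in range(reps)])
    v_srs = statistics.variance([srs_estimator(rng) for _ in range(reps)])
    return v_rss - v_srs

rng = random.Random(5)
G = monte_carlo_gain(rss_mean, srs_mean, reps=40000, rng=rng)
print(f"estimated G = {G:.5f} (theory {-1 / 72:.5f}); RSS more precise: {G < 0}")
```

The same helper can compare any pair of estimators, for example the two subsampling schemes for the nonrespondents, as long as both are simulated under the same population model.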

Conclusions
Judging by $G_{rss}$, the proposed method seems to be more accurate than the SRS method. $G_{rss}$ can take negative values, but it was larger than zero in the experiments performed. It was around 0.1 in all cases, and $m = 4$ may be the best choice.

Table 1: Gain in accuracy due to the use of RSS in three populations.

Table 2: Gain in accuracy due to the use of RSS in six populations: $n^* = 240$ and $K = 0, 10$; $m = 3, 4, 6$.

The results are given in Table 1. They show that the use of RSS provides gains in accuracy larger than 10%. A similar study was carried out by generating a sample of 240 values of $X$ and determining, for