This paper presents a technique for estimating the finite population mean of the study variable in the presence of two auxiliary variables under a two-phase sampling scheme when the regression line does not pass through the neighborhood of the origin. The properties of the proposed class of estimators are studied under the large-sample approximation. In addition, bias and efficiency comparisons are carried out to study the performance of the proposed class of estimators relative to existing estimators. It is also shown that the proposed technique has greater applicability in survey research. An empirical study is carried out to demonstrate the performance of the proposed estimators.
1. Introduction
The use of auxiliary information for estimating the population mean of the study variable is widely applicable in survey research. It is utilized at the estimation stage and the design stage to obtain an estimator that improves on those not utilizing auxiliary information. The use of ratio and product strategies in survey sampling depends solely upon knowledge of the population mean $\bar{X}$ of the auxiliary variable $X$.
The ratio estimator was developed by Cochran [1] to estimate the population mean $\bar{Y}$ of the study variable $Y$ by using information on an auxiliary variable $X$ that is positively correlated with $Y$. The ratio estimator is most effective when the relationship between $Y$ and $X$ is linear through the origin and the variance of $Y$ is proportional to $X$. Robson [2] defined a product estimator that was revisited by Murthy [3]. The product estimator is used when the auxiliary variable $X$ is negatively correlated with the study variable $Y$.
When the population mean $\bar{X}$ of the auxiliary variable $X$ is not known before the start of a survey, a first-phase sample of size $n'$ is selected from the population of size $N$, on which only the auxiliary variable $X$ is measured in order to furnish a good estimate of $\bar{X}$. A second-phase sample of size $n$ is then selected from the first-phase sample of size $n'$, on which both the study variable $Y$ and the auxiliary variable $X$ are measured. This procedure of selecting the samples from the given population is known as two-phase sampling (or double sampling). The concept of double sampling was first introduced by Neyman [4]. Contributions to two-phase sampling have been made by Sukhatme [5], Hidiroglou and Sarndal [6], Fuller [7], Hidiroglou [8], Singh and Vishwakarma [9], and Sahoo et al. [10].
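The two-phase selection described above can be sketched in a few lines. This is a minimal illustration, assuming simple random sampling without replacement at both phases; the function name `two_phase_srswor` and the toy population are hypothetical and not taken from the paper.

```python
import random
from statistics import mean


def two_phase_srswor(N, n_first, n_second, seed=None):
    """Return (first_phase, second_phase) lists of unit indices.

    Both phases use simple random sampling without replacement;
    the second phase is drawn from the first-phase units only.
    """
    if not 0 < n_second <= n_first <= N:
        raise ValueError("need 0 < n_second <= n_first <= N")
    rng = random.Random(seed)
    first_phase = rng.sample(range(N), n_first)
    second_phase = rng.sample(first_phase, n_second)
    return first_phase, second_phase


# Hypothetical toy population: x is cheap to observe, y is costly.
x = [float(i + 1) for i in range(34)]
y = [2.0 * xi + 5.0 for xi in x]

first, second = two_phase_srswor(N=len(x), n_first=10, n_second=7, seed=1)
x_bar_first = mean([x[i] for i in first])   # first-phase mean of X
x_bar = mean([x[i] for i in second])        # second-phase mean of X
y_bar = mean([y[i] for i in second])        # second-phase mean of Y
print(x_bar_first, x_bar, y_bar)
```

Only the auxiliary variable is read for the first-phase units, while the study variable is read for the smaller second-phase subsample, which mirrors the cost structure that motivates double sampling.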
One, two, or more auxiliary variables can be used when estimating the population mean of the study variable; with this in view, Chand [11] introduced chain ratio estimators. This led various authors, including Kiregyera [12], Singh and Upadhyaya [13], Prasad et al. [14], Singh et al. [15], Singh and Choudhury [16], and Vishwakarma and Gangele [17], to modify the chain type estimators and discuss their properties.
When the population mean $\bar{Z}$ of another auxiliary variable $Z$, which has a positive correlation with $X$ (i.e., $\rho_{XZ} > 0$), is known and if $\rho_{YX} > \rho_{YZ} > 0$, then it is advisable to estimate $\bar{X}$ by $\hat{\bar{X}} = \bar{x}'(\bar{Z}/\bar{z}')$, which would provide a better estimate of $\bar{X}$ as compared to $\bar{x}'$.
The usual chain type ratio and product estimators of $\bar{Y}$ under the double sampling scheme using two auxiliary variables $X$ and $Z$ are given, respectively, by
$$\bar{y}_{Rdc} = \bar{y}\,\frac{\bar{x}'}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}'}, \qquad \bar{y}_{Pdc} = \bar{y}\,\frac{\bar{x}}{\bar{x}'}\,\frac{\bar{z}'}{\bar{Z}}. \tag{1}$$
Singh and Choudhury [16] suggested the following exponential chain type ratio and product estimators of $\bar{Y}$ under the double sampling scheme using two auxiliary variables $X$ and $Z$:
$$\bar{y}_{Redc} = \bar{y}\exp\left\{\frac{(\bar{x}'/\bar{z}')\bar{Z} - \bar{x}}{(\bar{x}'/\bar{z}')\bar{Z} + \bar{x}}\right\}, \qquad \bar{y}_{Pedc} = \bar{y}\exp\left\{\frac{\bar{x} - (\bar{x}'/\bar{z}')\bar{Z}}{\bar{x} + (\bar{x}'/\bar{z}')\bar{Z}}\right\}, \tag{2}$$
where $\bar{x}'$ and $\bar{z}'$ are the sample means of $X$ and $Z$, respectively, based on the first-phase sample of size $n'$ drawn from the population of size $N$ by Simple Random Sampling Without Replacement (SRSWOR). Also, $\bar{y}$ and $\bar{x}$ are the sample means of $Y$ and $X$, respectively, based on the second-phase sample of size $n$ drawn from the first-phase sample of size $n'$ by SRSWOR.
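As a concrete reading of (1) and (2), the following sketch evaluates the four chain type estimators from sample means. The function and argument names are illustrative choices (`x_bar1` and `z_bar1` stand for the first-phase means $\bar{x}'$ and $\bar{z}'$), and the numeric inputs are made up for demonstration only.

```python
import math


def chain_ratio(y_bar, x_bar, x_bar1, z_bar1, Z_bar):
    """Chain type ratio estimator, first expression in (1)."""
    return y_bar * (x_bar1 / x_bar) * (Z_bar / z_bar1)


def chain_product(y_bar, x_bar, x_bar1, z_bar1, Z_bar):
    """Chain type product estimator, second expression in (1)."""
    return y_bar * (x_bar / x_bar1) * (z_bar1 / Z_bar)


def _exponent(x_bar, x_bar1, z_bar1, Z_bar):
    """Exponent shared by both estimators in (2)."""
    x_hat = (x_bar1 / z_bar1) * Z_bar   # chain estimate of the mean of X
    return (x_hat - x_bar) / (x_hat + x_bar)


def exp_chain_ratio(y_bar, x_bar, x_bar1, z_bar1, Z_bar):
    """Exponential chain type ratio estimator, first expression in (2)."""
    return y_bar * math.exp(_exponent(x_bar, x_bar1, z_bar1, Z_bar))


def exp_chain_product(y_bar, x_bar, x_bar1, z_bar1, Z_bar):
    """Exponential chain type product estimator, second expression in (2)."""
    return y_bar * math.exp(-_exponent(x_bar, x_bar1, z_bar1, Z_bar))


# Made-up sample means, for illustration only.
args = dict(y_bar=50.0, x_bar=9.5, x_bar1=10.0, z_bar1=4.8, Z_bar=5.0)
for est in (chain_ratio, chain_product, exp_chain_ratio, exp_chain_product):
    print(est.__name__, round(est(**args), 4))
```

When the second-phase mean of $X$ equals the chain estimate $(\bar{x}'/\bar{z}')\bar{Z}$, all four estimators collapse to $\bar{y}$, which is a quick sanity check on any implementation.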
2. Proposed Estimator
It has been theoretically established that, in general, the linear regression estimator is more efficient than the ratio (product) estimator except when the regression line of $Y$ on $X$ passes through the neighborhood of the origin, in which case the efficiencies of these estimators are almost equal. However, owing to their stronger intuitive appeal, survey statisticians favour the use of ratio and product estimators. Further, we note that, in many practical situations, the regression line does not pass through the neighborhood of the origin. In these situations, the ratio estimator does not perform as well as the linear regression estimator. Considering this fact, Singh and Ruiz Espejo [18] made an attempt to improve the performance of these estimators and suggested the following ratio-product type estimator for the population mean $\bar{Y}$ under the double sampling scheme using a single auxiliary variable $X$:
$$\bar{y}_{RPd} = \bar{y}\left[\alpha\,\frac{\bar{x}'}{\bar{x}} + (1-\alpha)\,\frac{\bar{x}}{\bar{x}'}\right], \tag{3}$$
where $\alpha$ is a real constant.
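We propose the following exponential chain ratio-product type estimator for the population mean $\bar{Y}$ under the double sampling scheme using two auxiliary variables $X$ and $Z$:
$$\bar{y}_{RPedc} = \bar{y}\left[\alpha\exp\left\{\frac{(\bar{x}'/\bar{z}')\bar{Z} - \bar{x}}{(\bar{x}'/\bar{z}')\bar{Z} + \bar{x}}\right\} + (1-\alpha)\exp\left\{\frac{\bar{x} - (\bar{x}'/\bar{z}')\bar{Z}}{\bar{x} + (\bar{x}'/\bar{z}')\bar{Z}}\right\}\right], \tag{4}$$
where $\alpha$ is a real constant to be determined such that the Mean Square Error (MSE) of the proposed estimator $\bar{y}_{RPedc}$ is minimum. For $\alpha = 1$, $\bar{y}_{RPedc}$ reduces to $\bar{y}_{Redc}$, whereas, for $\alpha = 0$, $\bar{y}_{RPedc}$ reduces to $\bar{y}_{Pedc}$.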
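The proposed estimator in (4) is a weighted combination of the two exponential chain estimators, which a short sketch makes explicit. The function name `proposed_estimator`, the argument names (`x_bar1`, `z_bar1` for the first-phase means), and the numeric inputs are illustrative assumptions, not values from the paper.

```python
import math


def proposed_estimator(y_bar, x_bar, x_bar1, z_bar1, Z_bar, alpha):
    """Exponential chain ratio-product type estimator, equation (4)."""
    x_hat = (x_bar1 / z_bar1) * Z_bar            # chain estimate of the mean of X
    theta = (x_hat - x_bar) / (x_hat + x_bar)    # exponent of the ratio part
    ratio_part = math.exp(theta)
    product_part = math.exp(-theta)
    return y_bar * (alpha * ratio_part + (1.0 - alpha) * product_part)


# Made-up sample means, for illustration only.
means = dict(y_bar=50.0, x_bar=9.5, x_bar1=10.0, z_bar1=4.8, Z_bar=5.0)

for alpha in (0.0, 0.5, 1.0, 1.5):
    print(alpha, round(proposed_estimator(alpha=alpha, **means), 4))
```

Note that $\alpha$ is not restricted to the interval $[0, 1]$; it is any real constant, so values above 1 are legitimate inputs.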
Remark. It is noted that the proposed estimator in (4) is a special case of the class of estimators $\bar{y}_{class} = \bar{y}\,H(\bar{x}, \bar{z}')$ proposed by Srivastava [19], where $H(\cdot)$ is a parametric function such that $H(\bar{x}' \mid s_1, \bar{Z}) = 1$ and satisfies certain regularity conditions defined in Srivastava [19].
3. Bias and MSE of the Proposed Estimator
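To obtain the Bias and Mean Square Error (MSE) of the proposed estimator $\bar{y}_{RPedc}$, we consider
$$\bar{y} = \bar{Y}(1+e_0), \quad \bar{x} = \bar{X}(1+e_1), \quad \bar{x}' = \bar{X}(1+e_1'), \quad \bar{z}' = \bar{Z}(1+e_2'), \tag{5}$$
such that
$$E(e_0) = E(e_1) = E(e_1') = E(e_2') = 0, \tag{6}$$
where $|e_0| < 1$, $|e_1| < 1$, $|e_1'| < 1$, $|e_2'| < 1$.
Let $C_Y$, $C_X$, and $C_Z$ be the coefficients of variation of $Y$, $X$, and $Z$, respectively. Also, let $\rho_{YX}$, $\rho_{YZ}$, and $\rho_{XZ}$ be the correlation coefficients between $Y$ and $X$, $Y$ and $Z$, and $X$ and $Z$, respectively. Then, we have
$$\begin{aligned}
E(e_0^2) &= f_1 C_Y^2, & E(e_1^2) &= f_1 C_X^2, & E(e_1'^2) &= f_2 C_X^2, & E(e_2'^2) &= f_2 C_Z^2,\\
E(e_0 e_1) &= f_1 \rho_{YX} C_Y C_X, & E(e_0 e_1') &= f_2 \rho_{YX} C_Y C_X, & E(e_0 e_2') &= f_2 \rho_{YZ} C_Y C_Z, &&\\
E(e_1 e_1') &= f_2 C_X^2, & E(e_1 e_2') &= f_2 \rho_{XZ} C_X C_Z, & E(e_1' e_2') &= f_2 \rho_{XZ} C_X C_Z, &&
\end{aligned} \tag{7}$$
where
$$\begin{aligned}
f_1 &= \left(\frac{1}{n} - \frac{1}{N}\right), \quad f_2 = \left(\frac{1}{n'} - \frac{1}{N}\right), \quad f_3 = f_1 - f_2 = \left(\frac{1}{n} - \frac{1}{n'}\right),\\
C_Y^2 &= \frac{S_Y^2}{\bar{Y}^2}, \quad C_X^2 = \frac{S_X^2}{\bar{X}^2}, \quad C_Z^2 = \frac{S_Z^2}{\bar{Z}^2},\\
\rho_{YX} &= \frac{S_{YX}}{S_Y S_X}, \quad \rho_{YZ} = \frac{S_{YZ}}{S_Y S_Z}, \quad \rho_{XZ} = \frac{S_{XZ}}{S_X S_Z},\\
S_Y^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2, \quad S_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2, \quad S_Z^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Z_i - \bar{Z})^2,\\
S_{YX} &= \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}), \quad S_{YZ} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(Z_i - \bar{Z}), \quad S_{XZ} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Z_i - \bar{Z}).
\end{aligned} \tag{8}$$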
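Now, expressing the estimator $\bar{y}_{RPedc}$ in terms of $e_0$, $e_1$, $e_1'$, and $e_2'$ and neglecting the terms in $e_0$, $e_1$, $e_1'$, and $e_2'$ of degree greater than two, we get
$$\begin{aligned}
\bar{y}_{RPedc} = \bar{Y}\Big[\,& 1 + e_0 + \alpha\left(e_1' - e_2' - e_1 + e_0 e_1' - e_0 e_2' - e_0 e_1\right) - \frac{\alpha}{2}\left(e_1'^2 - e_2'^2 - e_1^2\right)\\
& - \frac{1}{2}\left(e_1' - e_2' - e_1 + e_0 e_1' - e_0 e_2' - e_0 e_1\right) + \frac{1}{4}\left(e_1'^2 - e_2'^2 - e_1^2 - e_1' e_2' + e_1 e_2' - e_1 e_1'\right)\\
& + \frac{1}{8}\left(e_1'^2 + e_2'^2 + e_1^2\right)\Big].
\end{aligned} \tag{9}$$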
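To the first degree of approximation, the Bias and Mean Square Error (MSE) of the proposed estimator $\bar{y}_{RPedc}$ are given by
$$B(\bar{y}_{RPedc}) = \bar{Y}\left[\frac{(4\alpha-1)}{8}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} - \frac{(2\alpha-1)}{2}\left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right], \tag{10}$$
$$\mathrm{MSE}(\bar{y}_{RPedc}) = \bar{Y}^2\left[f_1 C_Y^2 + \frac{(2\alpha-1)^2}{4}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} - (2\alpha-1)\left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right]. \tag{11}$$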
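The step from (9) to (11) can be made explicit. The shorthand $d$, $A$, and $K$ below is introduced here for compactness and is not part of the original notation. Writing $d = e_1' - e_2' - e_1$ and using the moments in (7), where the $\rho_{XZ}$ terms cancel because $E(e_1 e_2') = E(e_1' e_2')$:

```latex
\begin{aligned}
E(d^2) &= f_2 C_X^2 + f_2 C_Z^2 + f_1 C_X^2 - 2 f_2 C_X^2
        = f_3 C_X^2 + f_2 C_Z^2 =: A, \\
E(e_0 d) &= f_2 \rho_{YX} C_Y C_X - f_2 \rho_{YZ} C_Y C_Z - f_1 \rho_{YX} C_Y C_X
          = -\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right) =: -K, \\
\bar{y}_{RPedc} - \bar{Y} &\approx \bar{Y}\left[e_0 + \tfrac{(2\alpha-1)}{2}\, d\right]
  \quad \text{(first-order terms of (9))}, \\
\mathrm{MSE}(\bar{y}_{RPedc}) &\approx \bar{Y}^2\left[E(e_0^2)
  + \tfrac{(2\alpha-1)^2}{4} E(d^2) + (2\alpha-1) E(e_0 d)\right]
  = \bar{Y}^2\left[f_1 C_Y^2 + \tfrac{(2\alpha-1)^2}{4} A - (2\alpha-1) K\right].
\end{aligned}
```

The same two aggregates appear in the bias: taking expectations of the second-order terms of (9) gives $\bar{Y}\left[\tfrac{(4\alpha-1)}{8}A - \tfrac{(2\alpha-1)}{2}K\right]$, in agreement with (10).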
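To the first degree of approximation, the expressions for the Bias and Mean Square Error (MSE) of the estimators $\bar{y}_{Rdc}$, $\bar{y}_{Pdc}$, $\bar{y}_{Redc}$, $\bar{y}_{Pedc}$, and $\bar{y}_{RPd}$ are, respectively, given by
$$\begin{aligned}
B(\bar{y}_{Rdc}) &= \bar{Y}\left[f_3 C_X^2 + f_2 C_Z^2 - f_3 \rho_{YX} C_Y C_X - f_2 \rho_{YZ} C_Y C_Z\right],\\
B(\bar{y}_{Pdc}) &= \bar{Y}\left[f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right],\\
B(\bar{y}_{Redc}) &= \bar{Y}\left[\frac{3}{8}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} - \frac{1}{2}\left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right],\\
B(\bar{y}_{Pedc}) &= \bar{Y}\left[-\frac{1}{8}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} + \frac{1}{2}\left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right],\\
B(\bar{y}_{RPd}) &= \bar{Y}\left[\alpha f_3 C_X^2 - (2\alpha-1) f_3 \rho_{YX} C_Y C_X\right],
\end{aligned} \tag{12}$$
$$\begin{aligned}
\mathrm{MSE}(\bar{y}_{Rdc}) &= \bar{Y}^2\left[f_1 C_Y^2 + f_3 C_X^2 + f_2 C_Z^2 - 2 f_3 \rho_{YX} C_Y C_X - 2 f_2 \rho_{YZ} C_Y C_Z\right],\\
\mathrm{MSE}(\bar{y}_{Pdc}) &= \bar{Y}^2\left[f_1 C_Y^2 + f_3 C_X^2 + f_2 C_Z^2 + 2 f_3 \rho_{YX} C_Y C_X + 2 f_2 \rho_{YZ} C_Y C_Z\right],\\
\mathrm{MSE}(\bar{y}_{Redc}) &= \bar{Y}^2\left[f_1 C_Y^2 + \frac{1}{4}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} - \left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right],\\
\mathrm{MSE}(\bar{y}_{Pedc}) &= \bar{Y}^2\left[f_1 C_Y^2 + \frac{1}{4}\left\{f_3 C_X^2 + f_2 C_Z^2\right\} + \left\{f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right\}\right],\\
\mathrm{MSE}(\bar{y}_{RPd}) &= \bar{Y}^2\left[f_1 C_Y^2 + 4\alpha^2 f_3 C_X^2 - 4\alpha f_3\left\{C_X^2 + \rho_{YX} C_Y C_X\right\} + f_3\left\{C_X^2 + 2\rho_{YX} C_Y C_X\right\}\right].
\end{aligned} \tag{13}$$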
3.1. Optimum Value of $\alpha$
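The constant $\alpha$ is determined so as to minimize the Mean Square Error (MSE) of the estimators $\bar{y}_{RPd}$ and $\bar{y}_{RPedc}$. The optimum values of $\alpha$, for which $\mathrm{MSE}(\bar{y}_{RPd})$ and $\mathrm{MSE}(\bar{y}_{RPedc})$ are minimum, are obtained from the following conditions:
$$\frac{\partial}{\partial\alpha}\mathrm{MSE}(\bar{y}_{RPd}) = 0, \qquad \frac{\partial}{\partial\alpha}\mathrm{MSE}(\bar{y}_{RPedc}) = 0. \tag{14}$$
The optimum value of $\alpha$, which minimizes the Mean Square Error (MSE) of the estimator $\bar{y}_{RPd}$, is given by
$$\alpha_{opt} = \frac{1}{2}\left[1 + \rho_{YX}\frac{C_Y}{C_X}\right]. \tag{15}$$
The optimum value of $\alpha$, which minimizes the Mean Square Error (MSE) of the estimator $\bar{y}_{RPedc}$, is given by
$$\alpha_{opt} = \frac{f_3\left(2\rho_{YX} C_Y C_X + C_X^2\right) + f_2\left(2\rho_{YZ} C_Y C_Z + C_Z^2\right)}{2\left(f_3 C_X^2 + f_2 C_Z^2\right)}. \tag{16}$$
Substituting the value of $\alpha$ from (15) in (13), we get the minimum MSE of $\bar{y}_{RPd}$ as
$$\mathrm{MSE}(\bar{y}_{RPd})_{\min} = \bar{Y}^2\left[f_1 C_Y^2 - f_3 \rho_{YX}^2 C_Y^2\right]. \tag{17}$$
Substituting the value of $\alpha$ from (16) in (11), we get the minimum MSE of $\bar{y}_{RPedc}$ as
$$\mathrm{MSE}(\bar{y}_{RPedc})_{\min} = \bar{Y}^2\left[f_1 C_Y^2 - \frac{\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right)^2}{f_3 C_X^2 + f_2 C_Z^2}\right]. \tag{18}$$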
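The minimization behind (16) and (18) is a one-line quadratic argument. The shorthand $A$ and $K$ is introduced here for readability and does not appear in the original text:

```latex
\begin{aligned}
A &:= f_3 C_X^2 + f_2 C_Z^2, \qquad
K := f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z, \\
\mathrm{MSE}(\bar{y}_{RPedc}) &= \bar{Y}^2\left[f_1 C_Y^2 + \tfrac{(2\alpha-1)^2}{4} A - (2\alpha-1) K\right], \\
\frac{\partial}{\partial\alpha}\mathrm{MSE}(\bar{y}_{RPedc})
  &= \bar{Y}^2\left[(2\alpha-1) A - 2K\right] = 0
  \;\Longrightarrow\; 2\alpha_{opt} - 1 = \frac{2K}{A}
  \;\Longrightarrow\; \alpha_{opt} = \frac{2K + A}{2A}, \\
\mathrm{MSE}(\bar{y}_{RPedc})_{\min}
  &= \bar{Y}^2\left[f_1 C_Y^2 + \frac{1}{4}\cdot\frac{4K^2}{A^2}\,A - \frac{2K}{A}\,K\right]
   = \bar{Y}^2\left[f_1 C_Y^2 - \frac{K^2}{A}\right].
\end{aligned}
```

The second derivative with respect to $\alpha$ is $2\bar{Y}^2 A$, which is positive whenever $A > 0$, so the stationary point is a minimum. Expanding $2K + A$ recovers the numerator of (16) term by term.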
4. Efficiency Comparisons
It is well known that the bias and variance of the usual unbiased estimator $\bar{y}$ of the population mean under SRSWOR are
$$B(\bar{y}) = 0, \tag{19}$$
$$V(\bar{y}) = f_1 S_Y^2 = f_1 \bar{Y}^2 C_Y^2. \tag{20}$$
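From (11), (13), and (20), we have the following conditions.
$\mathrm{MSE}(\bar{y}_{RPedc}) < V(\bar{y})$, if
$$\alpha < \frac{4\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right) + f_3 C_X^2 + f_2 C_Z^2}{2\left(f_3 C_X^2 + f_2 C_Z^2\right)}, \tag{21}$$
$\mathrm{MSE}(\bar{y}_{RPedc}) < \mathrm{MSE}(\bar{y}_{Rdc})$, if
$$\alpha < \frac{4\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right) - \left(f_3 C_X^2 + f_2 C_Z^2\right)}{2\left(f_3 C_X^2 + f_2 C_Z^2\right)}, \tag{22}$$
$\mathrm{MSE}(\bar{y}_{RPedc}) < \mathrm{MSE}(\bar{y}_{Pdc})$, if
$$\alpha < \frac{4\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right) + 3\left(f_3 C_X^2 + f_2 C_Z^2\right)}{2\left(f_3 C_X^2 + f_2 C_Z^2\right)}, \tag{23}$$
$\mathrm{MSE}(\bar{y}_{RPedc}) < \mathrm{MSE}(\bar{y}_{Redc})$, if
$$\alpha < \frac{2\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right)}{f_3 C_X^2 + f_2 C_Z^2}, \tag{24}$$
$\mathrm{MSE}(\bar{y}_{RPedc}) < \mathrm{MSE}(\bar{y}_{Pedc})$, if
$$\alpha < \frac{2\left(f_3 \rho_{YX} C_Y C_X + f_2 \rho_{YZ} C_Y C_Z\right) + f_3 C_X^2 + f_2 C_Z^2}{f_3 C_X^2 + f_2 C_Z^2}, \tag{25}$$
$\mathrm{MSE}(\bar{y}_{RPedc}) < \mathrm{MSE}(\bar{y}_{RPd})$, if
$$\alpha < \frac{4\left(f_3 \rho_{YX} C_Y C_X - f_2 \rho_{YZ} C_Y C_Z\right) + \left(3 f_3 C_X^2 - f_2 C_Z^2\right)}{2\left(3 f_3 C_X^2 - f_2 C_Z^2\right)}. \tag{26}$$
The range of $\alpha$ provides enough scope for choosing many estimators that are more efficient than the estimators considered above.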
5. Empirical Study
To examine the merits of the proposed estimator of $\bar{Y}$, we have considered the following natural population datasets.
Population I (source: Cochran [20]):
$Y$: number of “placebo” children,
$X$: number of paralytic polio cases in the “placebo” group,
$Z$: number of paralytic polio cases in the “not inoculated” group.
Population II (source: Murthy [21]):
$Y$: area under wheat in 1964,
$X$: area under wheat in 1963,
$Z$: cultivated area in 1961,
$N = 34$, $n' = 10$, $n = 7$, $\bar{Y} = 199.44$, $\bar{X} = 208.89$, and $\bar{Z} = 747.59$,
$\rho_{YX} = 0.9801$, $\rho_{YZ} = 0.9043$, $\rho_{XZ} = 0.9097$, $C_Y^2 = 0.5673$, $C_X^2 = 0.5191$, and $C_Z^2 = 0.3527$.
Here, we have computed
the Absolute Relative Bias (ARB) of the different estimators of $\bar{Y}$ using the formula
$$\mathrm{ARB}(\cdot) = \left|\frac{\mathrm{Bias}(\cdot)}{\bar{Y}}\right|, \tag{27}$$
and the Percentage Relative Efficiencies (PREs) of the different estimators of $\bar{Y}$ with respect to $\bar{y}$ using the formula
$$\mathrm{PRE}(\cdot, \bar{y}) = \frac{V(\bar{y})}{\mathrm{MSE}(\cdot)} \times 100. \tag{28}$$
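The ARB and PRE formulas can be evaluated directly from the Population II summary values quoted above, using the first-order bias and MSE expressions (10) to (13) and the optimum constants (15) and (16). The sketch below is one way to do so; the variable names and the shorthand `A` and `K` for the two recurring aggregates are assumptions of this illustration, and small differences from published figures can arise from rounding of the inputs.

```python
import math

# Population II summary values as reported in the text.
N, n1, n = 34, 10, 7                      # n1 is the first-phase size n'
rho_yx, rho_yz = 0.9801, 0.9043
CY2, CX2, CZ2 = 0.5673, 0.5191, 0.3527

f1 = 1 / n - 1 / N
f2 = 1 / n1 - 1 / N
f3 = f1 - f2

CY, CX, CZ = math.sqrt(CY2), math.sqrt(CX2), math.sqrt(CZ2)
A = f3 * CX2 + f2 * CZ2                                   # shorthand, not in the paper
K = f3 * rho_yx * CY * CX + f2 * rho_yz * CY * CZ         # shorthand, not in the paper

alpha_rpd = 0.5 * (1 + rho_yx * CY / CX)                  # equation (15)
alpha_rpedc = (2 * K + A) / (2 * A)                       # equation (16)

# Bias divided by Y-bar, from (10) and (12), at the optimum alpha where relevant.
rel_bias = {
    "y_bar": 0.0,
    "Rdc": A - K,
    "Pdc": K,
    "Redc": 3 * A / 8 - K / 2,
    "Pedc": -A / 8 + K / 2,
    "RPd": alpha_rpd * f3 * CX2 - (2 * alpha_rpd - 1) * f3 * rho_yx * CY * CX,
    "RPedc": (4 * alpha_rpedc - 1) * A / 8 - (2 * alpha_rpedc - 1) * K / 2,
}

# MSE divided by Y-bar squared, from (13), (17), (18), and (20).
rel_mse = {
    "y_bar": f1 * CY2,
    "Rdc": f1 * CY2 + A - 2 * K,
    "Pdc": f1 * CY2 + A + 2 * K,
    "Redc": f1 * CY2 + A / 4 - K,
    "Pedc": f1 * CY2 + A / 4 + K,
    "RPd": f1 * CY2 - f3 * rho_yx ** 2 * CY2,
    "RPedc": f1 * CY2 - K ** 2 / A,
}

arb = {name: abs(b) for name, b in rel_bias.items()}                   # equation (27)
pre = {name: 100 * rel_mse["y_bar"] / m for name, m in rel_mse.items()}  # equation (28)

for name in rel_mse:
    print(f"{name:6s} ARB = {arb[name]:.4f}   PRE = {pre[name]:.2f}")
```

Because $\bar{Y}$ cancels in both (27) and (28), only the coefficients of variation, correlations, and sample sizes are needed. For these inputs the two product-type estimators come out with a PRE below 100, since both aggregates are positive.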
6. Conclusion
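It is observed from Table 1 that,
for population I,
$$\mathrm{ARB}(\bar{y}) < \mathrm{ARB}(\bar{y}_{RPedc}) < \mathrm{ARB}(\bar{y}_{Redc}) < \mathrm{ARB}(\bar{y}_{Pedc}) < \mathrm{ARB}(\bar{y}_{RPd}) < \mathrm{ARB}(\bar{y}_{Rdc}) < \mathrm{ARB}(\bar{y}_{Pdc}), \tag{29}$$
for population II,
$$\mathrm{ARB}(\bar{y}) < \mathrm{ARB}(\bar{y}_{RPd}) < \mathrm{ARB}(\bar{y}_{Rdc}) < \mathrm{ARB}(\bar{y}_{Redc}) < \mathrm{ARB}(\bar{y}_{Pedc}) < \mathrm{ARB}(\bar{y}_{RPedc}) < \mathrm{ARB}(\bar{y}_{Pdc}). \tag{30}$$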
Table 1: Absolute Relative Bias (ARB) of different estimators of $\bar{Y}$.

Estimator             Population I    Population II
$\bar{y}$             0.0000          0.0000
$\bar{y}_{Rdc}$       0.0369          0.0042
$\bar{y}_{Pdc}$       0.0564          0.0513
$\bar{y}_{Redc}$      0.0068          0.0079
$\bar{y}_{Pedc}$      0.0165          0.0198
$\bar{y}_{RPd}$       0.0222          0.0008
$\bar{y}_{RPedc}$     0.0058          0.0243

From Table 2, we see that the Percentage Relative Efficiency (PRE) of the proposed estimator $\bar{y}_{RPedc}$, for populations I and II, is higher than that of all the other estimators considered, that is, the usual unbiased estimator $\bar{y}$, the chain type ratio estimator $\bar{y}_{Rdc}$, the chain type product estimator $\bar{y}_{Pdc}$, the exponential chain type ratio estimator $\bar{y}_{Redc}$, the exponential chain type product estimator $\bar{y}_{Pedc}$, and the ratio-product type estimator $\bar{y}_{RPd}$.
Table 2: Percentage Relative Efficiencies (PREs) of different estimators of $\bar{Y}$ with respect to $\bar{y}$.

Estimator             Population I    Population II
$\bar{y}$             100             100
$\bar{y}_{Rdc}$       136.91          730.81
$\bar{y}_{Pdc}$       *               *
$\bar{y}_{Redc}$      184.36          259.55
$\bar{y}_{Pedc}$      *               *
$\bar{y}_{RPd}$       133.95          156.96
$\bar{y}_{RPedc}$     189.27          763.30

*Data is not applicable.

Finally, from Tables 1 and 2, we conclude that the proposed estimator $\bar{y}_{RPedc}$ (based on two auxiliary variables $X$ and $Z$) is a more appropriate estimator than the other estimators considered: it attains the highest efficiency for both populations, and for population I it also has the lowest absolute relative bias among the biased estimators.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors are grateful to the editor Professor Zhidong Bai and the learned referee for their comments leading to the improvement of the paper.
References
[1] W. G. Cochran, The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce.
[2] D. S. Robson, Applications of multivariate polykays to the theory of unbiased ratio-type estimation.
[3] M. N. Murthy, Product method of estimation.
[4] J. Neyman, Contribution to the theory of sampling human populations.
[5] B. V. Sukhatme, Some ratio-type estimators in two-phase sampling.
[6] M. A. Hidiroglou and C. E. Sarndal, Use of auxiliary information for two phase sampling.
[7] W. A. Fuller, Two-phase sampling, Proceedings of the Annual Meeting of the Survey Methods Section of the Statistical Society of Canada, 2000, 23-30.
[8] M. A. Hidiroglou, Double sampling.
[9] H. P. Singh and G. K. Vishwakarma, Modified exponential ratio and product estimators for finite population mean in double sampling.
[10] L. N. Sahoo, G. Mishra, and S. R. Nayak, On two different classes of estimators in two-phase sampling using multi-auxiliary variables.
[11] L. Chand.
[12] B. Kiregyera, A chain ratio-type estimator in finite population double sampling using two auxiliary variables.
[13] G. N. Singh and L. N. Upadhyaya, A class of modified chain-type estimators using two auxiliary variables in two phase sampling.
[14] B. Prasad, R. S. Singh, and H. P. Singh, Some chain ratio-type estimators for ratio of two population means using two auxiliary characters in two phase sampling.
[15] S. Singh, H. P. Singh, and L. N. Upadhyaya, Chain ratio and regression type estimators for median estimation in survey sampling.
[16] B. K. Singh and S. Choudhury, Exponential chain ratio and product type estimators for finite population mean under double sampling scheme.
[17] G. K. Vishwakarma and R. K. Gangele, A class of chain ratio-type exponential estimators in double sampling using two auxiliary variates.
[18] H. P. Singh and M. Ruiz Espejo, Double sampling ratio-product estimator of a finite population mean in sample surveys.
[19] S. K. Srivastava, A generalized estimator for the mean of a finite population using multi-auxiliary information.
[20] W. G. Cochran.
[21] M. N. Murthy.