Efficient estimation of the finite population mean can be achieved by using auxiliary information meaningfully. In this paper we suggest some modified ratio, product, and regression type estimators that make use of the minimum and maximum values of the study and auxiliary variables. Expressions for the biases and mean squared errors of the suggested estimators have been derived up to the first order of approximation. The performances of the suggested estimators, relative to their usual counterparts, have been studied, and improved performance has been established. The gain in efficiency from using the maximum and minimum values has been verified numerically.
1. Introduction
Supplementary information, in the form of an auxiliary variable, is widely used in the estimation of the finite population mean of a study variable. The ratio and product estimators due to Cochran [1] and Murthy [2], respectively, are good examples of incorporating information on an auxiliary variable for improved estimation of the finite population mean of the study variable. When the correlation between the study variable (y) and the auxiliary variable (x) is positive, the ratio method of estimation is effective; when the correlation is negative, the product method of estimation is used. There have been many improvements and advancements in the construction of ratio, product, and regression estimators using auxiliary information. For recent details, see Haq et al. [3], Haq and Shabbir [4], Yadav and Kadilar [5], Kadilar and Cingi [6], and Koyuncu and Kadilar [7] and the references cited therein.
The ratio method of estimation is at its best when the relationship between y and x is linear and the line of regression passes through the origin; as the line departs from the origin, the efficiency of this method decreases. In practice, the condition that the regression line passes through the origin is rarely satisfied, and the regression estimator is then used for estimating the population mean.
Let U=(U1,U2,…,UN) be a population of size N. Let (yi,xi) be the values of the study and the auxiliary variables, respectively, on the ith unit of a finite population.
Let us assume that a simple random sample of size n is drawn without replacement from U for estimating the population mean Y- = ∑i=1N yi/N. It is further assumed that the population mean X- = ∑i=1N xi/N of the auxiliary variable x is known. The minimum (say xmin) and maximum (say xmax) values of the auxiliary variable are also assumed to be known.
The variance of mean per unit estimator y-=∑i=1nyi/n is given by
(1)V(y-)=λSy2,
where λ=((1/n)-(1/N)) and Sy2=(1/(N-1))∑i=1N(yi-Y-)2.
Sometimes there exist unusually large (say ymax) or unusually small (say ymin) units in the population. The mean per unit estimator is very sensitive to such unusual observations, and as a result the population mean will be either underestimated (when the sample contains ymin) or overestimated (when the sample contains ymax). To overcome this situation, Sarndal [8] suggested the following unbiased estimator:
(2) y-s = { y- + c if the sample contains ymin but not ymax,
y- - c if the sample contains ymax but not ymin,
y- for all other samples,
where c is a constant.
The variance of y-s is given by
(3) V(y-s) = λSy2 - (2λnc/(N-1))(ymax - ymin - nc).
Further, V(y-s)<V(y-) if 0<c<(ymax-ymin)/n.
For copt = (ymax - ymin)/2n, the variance of y-s is given by
(4) V(y-s)opt = V(y-) - λ(ymax - ymin)2/(2(N-1)),
which is always smaller than V(y-).
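Sarndal's adjustment can be checked exactly on a toy population by enumerating all C(N, n) samples. The sketch below uses a hypothetical population of six units with one large outlier (not data from the paper); it confirms that y-s stays design-unbiased while its exact variance drops well below that of the sample mean. Formulas (3) and (4) are first-order approximations to this exact gain.

```python
from itertools import combinations

# Exhaustive check of Sarndal's adjusted estimator (2) on a small
# hypothetical population with one unusually large unit.
y = [1, 2, 3, 4, 5, 20]
N, n = len(y), 2
y_min, y_max = min(y), max(y)
c = (y_max - y_min) / (2 * n)            # the optimal constant c_opt
Y_bar = sum(y) / N

def y_s(sample):
    """Shift the sample mean toward the population mean when exactly
    one of the two extreme units falls in the sample."""
    m = sum(sample) / n
    if y_min in sample and y_max not in sample:
        return m + c
    if y_max in sample and y_min not in sample:
        return m - c
    return m

samples = list(combinations(y, n))        # all C(N, n) SRSWOR samples
plain = [sum(s) / n for s in samples]
adjusted = [y_s(s) for s in samples]

bias_adj = sum(adjusted) / len(samples) - Y_bar   # design bias of y_s
var_plain = sum((e - Y_bar) ** 2 for e in plain) / len(samples)
var_adj = sum((e - Y_bar) ** 2 for e in adjusted) / len(samples)
```

For this particular population the exact design variance falls from about 16.7 to about 4.7 while the estimator remains exactly unbiased, since the numbers of samples receiving +c and -c are equal.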
The usual ratio and product estimators of population mean (Y-) are given by
(5) y-R = y-(X-/x-), y-P = y-(x-/X-),
where y-=∑i=1nyi/n and x-=∑i=1nxi/n are the sample means of variables y and x, respectively.
The expressions for the biases (B(·)) and mean square errors (M(·)) of the conventional ratio and product estimators are given by
(6) B(y-R) ≅ Y-λ(Cx2 - ρyxCyCx),
(7) B(y-P) = Y-λρyxCyCx,
(8) M(y-R) ≅ Y-2λ(Cy2 + Cx2 - 2ρyxCyCx),
(9) M(y-P) ≅ Y-2λ(Cy2 + Cx2 + 2ρyxCyCx),
where Cy=Sy/Y- and Cx=Sx/X- are the coefficients of variation of y and x, respectively, ρyx=Syx/SySx is the correlation coefficient between y and x, Sx2=∑i=1N(xi-X-)2/(N-1), and Syx=∑i=1N(yi-Y-)(xi-X-)/(N-1) are the population variance and population covariance, respectively.
The usual regression estimator is given by
(10)y-lr=y-+b(X--x-),
where b is the sample regression coefficient.
The variance of the estimator y-lr is given by
(11)V(y-lr)=λSy2(1-ρyx2).
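To first order, the regression estimator can never be worse than the ratio or product estimator, since the gaps factor as perfect squares: M(y-R) - V(y-lr) = λ(RSx - ρyxSy)2 and M(y-P) - V(y-lr) = λ(RSx + ρyxSy)2, with R = Y-/X-. A minimal sketch, evaluated with the Population 1 summary statistics reported in Section 4:

```python
import math

# First-order MSE formulas (8), (9), and (11), evaluated with the
# summary statistics of Population 1 from Section 4.
N, n = 27, 12
X_bar, Y_bar = 10.41111, 11.25185
Sy2, Sx2, Syx = 4.103, 4.931, 4.454

lam = 1 / n - 1 / N
R = Y_bar / X_bar
Sy, Sx = math.sqrt(Sy2), math.sqrt(Sx2)
rho = Syx / (Sy * Sx)

M_ratio = lam * (Sy2 + R ** 2 * Sx2 - 2 * R * Syx)    # eq. (8)
M_product = lam * (Sy2 + R ** 2 * Sx2 + 2 * R * Syx)  # eq. (9)
V_regression = lam * Sy2 * (1 - rho ** 2)             # eq. (11)

# The gaps factor as perfect squares, hence are never negative.
gap_ratio = lam * (R * Sx - rho * Sy) ** 2
gap_product = lam * (R * Sx + rho * Sy) ** 2
```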
2. Proposed Estimators
Motivated by Sarndal [8], we extend this idea to estimators that make use of auxiliary information for increased precision. It is well known that the ratio and product estimators are used when y and x are positively and negatively correlated, respectively. We suggest an estimator for each case separately as follows.
Case 1 (positive correlation between y and x).
When y and x are positively correlated, then with the selection of a larger value of x a larger value of y is expected to be selected, and with a smaller value of x a smaller value of y is expected. So we define the following estimators:
(12) Y-^RC = (y-c11/x-c21)X- = { ((y- + c1)/(x- + c2))X-, ((y- - c1)/(x- - c2))X-, (y-/x-)X- }
and similarly
(13)y-lrC1=y-c11+b(X--x-c21),
where (y-c11 = y- + c1, x-c21 = x- + c2) if the sample contains ymin and xmin; (y-c11 = y- - c1, x-c21 = x- - c2) if the sample contains ymax and xmax; and (y-c11 = y-, x-c21 = x-) for all other samples.
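The case rules in (12) can be sketched as a small routine. The population extremes, constants, and samples below are hypothetical illustrations, not data from the paper:

```python
# A sketch of the Case 1 adjustment rule and the ratio-type estimator
# (12). All numbers here are hypothetical illustrations.
def adjusted_means(ys, xs, y_ext, x_ext, c1, c2):
    """Return (y_bar_c11, x_bar_c21) for a drawn sample."""
    n = len(ys)
    y_bar, x_bar = sum(ys) / n, sum(xs) / n
    (y_min, y_max), (x_min, x_max) = y_ext, x_ext
    if y_min in ys and x_min in xs:       # sample holds both minima
        return y_bar + c1, x_bar + c2
    if y_max in ys and x_max in xs:       # sample holds both maxima
        return y_bar - c1, x_bar - c2
    return y_bar, x_bar                   # all other samples

def ratio_RC(ys, xs, X_bar, y_ext, x_ext, c1, c2):
    y_adj, x_adj = adjusted_means(ys, xs, y_ext, x_ext, c1, c2)
    return (y_adj / x_adj) * X_bar        # estimator (12)

y_ext, x_ext = (2.0, 20.0), (1.0, 10.0)   # known population extremes
n = 4
c1 = (y_ext[1] - y_ext[0]) / (2 * n)      # optimum (y_max - y_min)/2n
c2 = (x_ext[1] - x_ext[0]) / (2 * n)      # optimum (x_max - x_min)/2n
X_bar = 5.5

# A sample holding both minima is pulled upward; a sample with no
# extremes reduces to the plain ratio estimate.
est_low = ratio_RC([2.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0], X_bar, y_ext, x_ext, c1, c2)
est_mid = ratio_RC([4.0, 5.0, 6.0, 7.0], [2.0, 3.0, 4.0, 5.0], X_bar, y_ext, x_ext, c1, c2)
```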
Case 2 (negative correlation between y and x).
When y and x are negatively correlated, then with the selection of a larger value of x a smaller value of y is expected to be selected, and with a smaller value of x a larger value of y is expected. Keeping these points in view, the following estimators are suggested:
(14) Y-^PC = y-c12 x-c22/X- = { (y- + c1)(x- - c2)/X-, (y- - c1)(x- + c2)/X-, y-x-/X- }
and similarly
(15) y-lrC2 = y-c12 + b(X- - x-c22),
where (y-c12 = y- + c1, x-c22 = x- - c2) if the sample contains ymin and xmax; (y-c12 = y- - c1, x-c22 = x- + c2) if the sample contains ymax and xmin; and (y-c12 = y-, x-c22 = x-) for all other samples.
To find the bias and mean square error of these suggested estimators, we first prove two theorems which will be used in subsequent derivations.
Theorem 1.
If a sample of size n units is drawn from a population of size N units, then the covariance between y-c11 and x-c21, when y and x are positively correlated, is given by
(16) Cov(y-c11, x-c21) = λSyx - (nλ/(N-1))[c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2].
Proof.
Let us assume that n units are drawn without replacement from a population of size N. Let Sn denote the sample space. We partition the whole sample space into three mutually exclusive and collectively exhaustive sets S1, S2, and S3, such that Sn = S1 ∪ S2 ∪ S3. Here S1 is the set of all possible samples which contain ymin and xmin, S2 consists of all samples which contain ymax and xmax, and S3 = Sn - S1 - S2. The numbers of sample points in S1, S2, and S3 are C(N-2, n-1), C(N-2, n-1), and C(N, n) - 2C(N-2, n-1), respectively.
By definition of covariance, we have
(17) Cov(y-c11, x-c21)
= C(N, n)^(-1)[∑s∈S1 (y- + c1 - Y-)(x- + c2 - X-) + ∑s∈S2 (y- - c1 - Y-)(x- - c2 - X-) + ∑s∈S3 (y- - Y-)(x- - X-)]
= C(N, n)^(-1)[∑s∈S1 (y- - Y-)(x- - X-) + ∑s∈S2 (y- - Y-)(x- - X-) + ∑s∈S3 (y- - Y-)(x- - X-) - c1{∑s∈S2 (x- - X-) - ∑s∈S1 (x- - X-)} - c2{∑s∈S2 (y- - Y-) - ∑s∈S1 (y- - Y-)} + c1c2(|S1| + |S2|)]
= C(N, n)^(-1) ∑s∈Sn (y- - Y-)(x- - X-) - C(N, n)^(-1)[c1((xmax - xmin)/n)C(N-2, n-1) + c2((ymax - ymin)/n)C(N-2, n-1) - 2c1c2 C(N-2, n-1)]
= Cov(y-, x-) - ((N-n)/(N(N-1)))[c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2]
= λSyx - (nλ/(N-1))[c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2].
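The collapsing step in the proof rests on the exact SRSWOR identity Cov(y-, x-) = λSyx, which can be verified by enumerating every sample of a small hypothetical population:

```python
from itertools import combinations

# Exact check of Cov(y_bar, x_bar) = lambda * S_yx under SRSWOR, the
# identity used when the three partial sums in the proof are recombined.
# The (y, x) pairs are hypothetical.
pairs = [(2, 1), (4, 3), (5, 2), (7, 6), (9, 8), (12, 10)]
N, n = len(pairs), 3
Y_bar = sum(y for y, _ in pairs) / N
X_bar = sum(x for _, x in pairs) / N
S_yx = sum((y - Y_bar) * (x - X_bar) for y, x in pairs) / (N - 1)
lam = 1 / n - 1 / N

samples = list(combinations(pairs, n))    # all C(6, 3) = 20 samples
cov = sum(
    (sum(y for y, _ in s) / n - Y_bar) * (sum(x for _, x in s) / n - X_bar)
    for s in samples
) / len(samples)
```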
Theorem 2.
If a sample of size n units is drawn from a population of size N units, then the covariance between y-c12 and x-c22, when they are negatively correlated, is given by
(18) Cov(y-c12, x-c22) = λSyx + (nλ/(N-1))[c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2].
Theorem 2 can be proved along the same lines as Theorem 1.
We define the following relative error terms.
Let e0 = (y-c11 - Y-)/Y- and e1 = (x-c21 - X-)/X- (and analogously with y-c12 and x-c22 in Case 2), such that
(19) E(e0) = E(e1) = 0,
(20) E(e02) = (λ/Y-2)[Sy2 - (2nc1/(N-1))(ymax - ymin - nc1)],
(21) E(e12) = (λ/X-2)[Sx2 - (2nc2/(N-1))(xmax - xmin - nc2)],
(22) E(e0e1) = (λ/(Y-X-))[Syx - (n/(N-1)){c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2}],
where, for Case 2, the n/(N-1) term in (22) enters with a positive sign by Theorem 2.
Expressing Y-^RC in terms of e’s, we have
(23) Y-^RC = Y-(1 + e0)(1 + e1)^(-1).
Expanding and rearranging the right-hand side of (23), to the first degree of approximation, we have
(24) (Y-^RC - Y-) ≅ Y-(e0 - e1 - e0e1 + e12).
Using (24), the bias of Y-^RC is given by
(25) B(Y-^RC) ≅ (λ/X-)[R{Sx2 - (2nc2/(N-1))(xmax - xmin - nc2)} - {Syx - (n/(N-1))(c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2)}],
where R=Y-/X-.
Using (24), the mean square error of Y-^RC, to the first degree of approximation, is given by
(26) M(Y-^RC) ≅ λ[Sy2 - (2nc1/(N-1))(ymax - ymin - nc1) + R2{Sx2 - (2nc2/(N-1))(xmax - xmin - nc2)} - 2R{Syx - (n/(N-1))(c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2)}]
or
(27) M(Y-^RC) ≅ M(y-R) - (2λn/(N-1))(c1 - Rc2)[(ymax - ymin) - R(xmax - xmin) - n(c1 - Rc2)].
To find the optimum values of c1 and c2, we differentiate (27) with respect to c1 and c2:
(28) ∂M(Y-^RC)/∂c1 = 0 ⟹ (ymax - ymin) - R(xmax - xmin) - 2n(c1 - Rc2) = 0,
∂M(Y-^RC)/∂c2 = 0 ⟹ (ymax - ymin) - R(xmax - xmin) - 2n(c1 - Rc2) = 0.
Both derivatives reduce to a single equation in the two unknowns, so a unique solution is not possible; we therefore set c2 = (xmax - xmin)/2n, which gives c1 = (ymax - ymin)/2n.
For optimum values of c1 and c2, the optimum mean square error of Y-^RC is given by
(29) M(Y-^RC)opt ≅ M(y-R) - (λ/(2(N-1)))[(ymax - ymin) - R(xmax - xmin)]2.
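At the optimum constants, c1 - Rc2 = D/2n with D = (ymax - ymin) - R(xmax - xmin), so the reduction term of (27) becomes (2λn/(N-1)) · (D/2n) · (D/2) = λD2/(2(N-1)), which is exactly (29). A numerical confirmation using the Population 1 values from Section 4:

```python
# Check that the optimum constants collapse the reduction term of (27)
# into the closed form of (29). Values: Population 1, Section 4.
N, n = 27, 12
X_bar, Y_bar = 10.41111, 11.25185
y_max, y_min, x_max, x_min = 14.8, 7.9, 14.5, 6.5

lam = 1 / n - 1 / N
R = Y_bar / X_bar
c1 = (y_max - y_min) / (2 * n)            # optimum constants
c2 = (x_max - x_min) / (2 * n)
D = (y_max - y_min) - R * (x_max - x_min)

# Reduction term of (27) at the optimum constants...
red_27 = (2 * lam * n / (N - 1)) * (c1 - R * c2) * (D - n * (c1 - R * c2))
# ...and the closed-form reduction of (29); the two agree exactly.
red_29 = lam * D ** 2 / (2 * (N - 1))
```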
Similarly, the bias and mean square error of Y-^PC are, respectively, given by
(30) B(Y-^PC) ≅ (λ/X-)[Syx + (n/(N-1)){c1(xmax - xmin) + c2(ymax - ymin) - 2nc1c2}],
(31) M(Y-^PC) ≅ M(y-P) - (2λn/(N-1))(c1 + Rc2)[(ymax - ymin) + R(xmax - xmin) - n(c1 + Rc2)].
For optimum values of c1 and c2, the optimum mean square error of Y-^PC is given by
(32) M(Y-^PC)opt ≅ M(y-P) - (λ/(2(N-1)))[(ymax - ymin) - R(xmax - xmin)]2.
The variance of regression estimator y-lrC1 in case of positive correlation is given by
(33) V(y-lrC1) = V(y-lr) - (2λn/(N-1))(c1 - βc2)[(ymax - ymin) - β(xmax - xmin) - n(c1 - βc2)],
where β=ρyx(Sy/Sx) is the population regression coefficient of y on x.
For c2=(xmax-xmin)/2n and c1=(ymax-ymin)/2n, optimum variance of y-lrC1 is given by
(34) V(y-lrC1)opt = V(y-lr) - (λ/(2(N-1)))[(ymax - ymin) - β(xmax - xmin)]2.
For negative correlation, variance of the regression estimator y-lrC2 is given by
(35) V(y-lrC2) = V(y-lr) - (2λn/(N-1))(c1 + βc2)[(ymax - ymin) + β(xmax - xmin) - n(c1 + βc2)].
For c2=(xmax-xmin)/2n and c1=(ymax-ymin)/2n, optimum variance of y-lrC2 is given by
(36) V(y-lrC2)opt = V(y-lr) - (λ/(2(N-1)))[(ymax - ymin) + β(xmax - xmin)]2.
So in general we can write V(y-lrC)opt as
(37) V(y-lrC)opt = V(y-lr) - (λ/(2(N-1)))[(ymax - ymin) - |β|(xmax - xmin)]2.
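The unification in (37) is sign bookkeeping: for β > 0 the reduction in (34) already uses -|β|, and for β < 0 the +β term in (36) equals -|β|. A minimal sketch with hypothetical inputs:

```python
def reduction_case(beta, y_range, x_range, lam, N):
    """Optimum variance reduction written per case: (34) for beta > 0,
    (36) for beta < 0. y_range = y_max - y_min, x_range = x_max - x_min."""
    if beta > 0:
        term = y_range - beta * x_range    # eq. (34)
    else:
        term = y_range + beta * x_range    # eq. (36): beta itself is negative
    return lam * term ** 2 / (2 * (N - 1))

def reduction_unified(beta, y_range, x_range, lam, N):
    """The single expression (37) covering both signs of beta."""
    return lam * (y_range - abs(beta) * x_range) ** 2 / (2 * (N - 1))

# Hypothetical inputs: both signs of the regression coefficient.
lam, N, y_range, x_range = 0.05, 30, 9.0, 8.0
red_pos = reduction_case(0.9, y_range, x_range, lam, N)
red_neg = reduction_case(-0.9, y_range, x_range, lam, N)
```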
3. Comparison
The conditions under which the suggested estimators Y-^RC, Y-^PC, and y-lrC perform better than the usual mean per unit estimator and their usual counterparts are given below.
(a) Comparison of Proposed Ratio Type Estimator. A proposed estimator Y-^RC will perform better than
(i) mean per unit estimator (by (1) and (27)) if V(y-)-M(Y-^RC)>0 or if
(38) ρyx > RSx/(2Sy) - (n/(RSySx(N-1)))(c1 - Rc2)[(ymax - ymin) - R(xmax - xmin) - 2n(c1 - Rc2)];
(ii) usual ratio estimator (by (8) and (27)) if M(y-R)-M(Y-^RC)>0 or if
(39) min[Rc2, Rc2 - {R(xmax - xmin) - (ymax - ymin)}/n] < c1 < max[Rc2, Rc2 - {R(xmax - xmin) - (ymax - ymin)}/n].
(b) Comparison of Proposed Product Type Estimator. A proposed product type estimator will perform better than
(iii) mean per unit estimator if V(y-)-M(Y-^PC)>0 (by (1) and (31)) or if
(40) ρyx < -RSx/(2Sy) + (n/(RSySx(N-1)))(c1 + Rc2)[(ymax - ymin) + R(xmax - xmin) - 2n(c1 + Rc2)];
(iv) usual product estimator if M(y-P)-M(Y-^PC)>0 (by (9) and (31)) or if
(41) min[-Rc2, -Rc2 + {R(xmax - xmin) + (ymax - ymin)}/n] < c1 < max[-Rc2, -Rc2 + {R(xmax - xmin) + (ymax - ymin)}/n].
(c) Comparison of Proposed Regression Type Estimator. A proposed regression type estimator (positive correlation) will perform better than
(v) mean per unit estimator if V(y-)-V(y-lrC1)>0 (by (1) and (33)) or if
(42) ρyx2 > -(2n/(Sy2(N-1)))(c1 - βc2)[(ymax - ymin) - β(xmax - xmin) - 2n(c1 - βc2)];
(vi) usual regression estimator if V(y-lr)-V(y-lrC1)>0 (by (11) and (33)) or if
(43) min[βc2, βc2 - {β(xmax - xmin) - (ymax - ymin)}/n] < c1 < max[βc2, βc2 - {β(xmax - xmin) - (ymax - ymin)}/n].
A proposed regression type estimator (negative correlation) will perform better than
(vii) mean per unit estimator if V(y-)-V(y-lrC2)>0 (by (1) and (35)) or if
(44) ρyx2 > -(2n/(Sy2(N-1)))(c1 + βc2)[(ymax - ymin) + β(xmax - xmin) - 2n(c1 + βc2)];
(viii) usual regression estimator if V(y-lr) - V(y-lrC2) > 0 (by (11) and (35)) or if
(45) min[-βc2, -βc2 - {β(xmax - xmin) - (ymax - ymin)}/n] < c1 < max[-βc2, -βc2 - {β(xmax - xmin) - (ymax - ymin)}/n].
(d) Comparison of the Suggested Estimators for Optimum Values of c1 and c2 with the Usual Estimators. For the optimum values of c1 and c2, the proposed estimators will always perform better than the usual mean per unit estimator and their usual counterparts (ratio, product, and regression estimators).
4. Empirical Study
We consider the following datasets for numerical comparison.
Population 1 (Singh and Mangat [9, page 193]).
Let y be the milk yield in kg after the new food and let x be the yield in kg before the new food. N = 27, n = 12, X- = 10.41111, Y- = 11.25185, ymax = 14.8, ymin = 7.9, xmax = 14.5, xmin = 6.5, Sy2 = 4.103, Sx2 = 4.931, Sxy = 4.454, and ρyx = 0.990.
Population 2 (Singh and Mangat [9, page 195]).
Let y be the weekly time (hours) spent in nonacademic activities and let x be the overall grade point average (on a 4.0 scale). N = 36, n = 12, X- = 2.798333, Y- = 14.77778, ymax = 33, ymin = 6, xmax = 3.82, xmin = 1.81, Sy2 = 38.178, Sx2 = 0.3504, Sxy = -2.477, and ρyx = -0.6772.
Population 3 (Murthy [10, page 399]).
Let y be the area under wheat crop in 1964 and let x be the area under wheat crop in 1963. N = 34, n = 12, X- = 208.882, Y- = 199.441, ymax = 634, ymin = 6, xmax = 564, xmin = 5, Sy2 = 22564.56, Sx2 = 22652.05, Sxy = 22158.05, and ρyx = 0.980.
Population 4 (Cochran [11, page 152]).
Let y be the population size in 1930 (in 1000) and let x be the population size in 1920 (in 1000). N = 49, n = 12, X- = 103.1429, Y- = 127.7959, ymax = 634, ymin = 46, xmax = 507, xmin = 2, Sy2 = 15158.83, Sx2 = 10900.42, Sxy = 12619.78, and ρyx = 0.98.
The conditional values and results are given in Tables 1 and 2, respectively.
Table 1: Numerical values of conditions (38)–(45).

Condition | Population 1 | Population 2 | Population 3 | Population 4
(38) | 0.99 > 0.592 | −0.667 > 0.2529* | 0.980 > 0.478 | 0.981 > 0.525
(39) | 0.214 < 0.287 < 0.360 | 0.442 < 1.125 < 1.807 | 22.23 < 26.16 < 30.094 | 22.92 < 24.5 < 26.07
(40) | 0.99 < −0.592* | −0.667 < −0.253 | 0.980 < −0.4783* | 0.981 < −0.525*
(41) | −0.360 < 0.287 < 0.93 | −0.442 < 1.125 < 2.692 | −22.238 < 26.16 < 74.57 | −26.07 < 24.5 < 75.07
(42) | 0.980 > 0.000 | 0.459 > 0.000 | 0.9605 > 0.000 | 0.963 > 0.000
(43) | 0.273 < 0.287 < 0.301 | −0.773 < 1.125 < −0.592* | 22.78 < 26.16 < 29.549 | 24.36 < 24.5 < 24.63
(44) | 0.980 > −1.913 | 0.459 > 0.272 | 0.9605 > −0.862 | 0.963 > −1.88
(45) | −0.30 < 0.287 < −0.15* | 0.592 < 1.125 < 4.026 | −30.63 < 26.16 < −22.78* | −24.36 < 24.5 < −21.2*

Note: * indicates that the condition is not satisfied.
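Each entry of Table 1 can be recomputed from the population summaries. As a sketch, the bounds of condition (39) for Population 1, which match the tabled 0.214 < 0.287 < 0.360 up to rounding:

```python
# Recompute condition (39) for Population 1: the range of c1 for which
# the proposed ratio estimator beats the usual ratio estimator.
n = 12
X_bar, Y_bar = 10.41111, 11.25185
y_max, y_min, x_max, x_min = 14.8, 7.9, 14.5, 6.5

R = Y_bar / X_bar
c1 = (y_max - y_min) / (2 * n)            # optimum constants actually used
c2 = (x_max - x_min) / (2 * n)

other = R * c2 - (R * (x_max - x_min) - (y_max - y_min)) / n
lower, upper = min(R * c2, other), max(R * c2, other)
```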
Table 2: PRE of different estimators with respect to y-.

Estimator | Population 1 | Population 2 | Population 3 | Population 4
y-R | 1742.030 | 51.514 | 2501.29 | 2442.965
y-P | 21.053 | 175.210 | 26.383 | 23.998
y-lr | 5115.818 | 184.699 | 2536.131 | 2763.749
Y-^RC | 2319.293 | 54.325 | 2940.086 | 2502.692
Y-^PC | 21.117 | 212.639 | 26.424 | 24.004
y-lrC | 5249.673 | 208.254 | 2856.83 | 2764.336
For percentage relative efficiency (PRE), we use the following expression:
(46) PRE(y-i, y-) = [V(y-)/(V(y-i) or M(y-i))] × 100, for i = R, P, RC, PC, lr, lrC.
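As a sketch, the Table 2 entries for the product estimators in Population 2 can be recovered, up to rounding, from (1), (9), (32), and (46):

```python
# Recover the Table 2 PRE entries for the product estimators in
# Population 2 from (1), (9), (32), and (46).
N, n = 36, 12
X_bar, Y_bar = 2.798333, 14.77778
Sy2, Sx2, Syx = 38.178, 0.3504, -2.477
y_max, y_min, x_max, x_min = 33, 6, 3.82, 1.81

lam = 1 / n - 1 / N
R = Y_bar / X_bar
V_mean = lam * Sy2                                    # eq. (1)
M_P = lam * (Sy2 + R ** 2 * Sx2 + 2 * R * Syx)        # eq. (9)
D = (y_max - y_min) - R * (x_max - x_min)
M_PC = M_P - lam * D ** 2 / (2 * (N - 1))             # eq. (32)

PRE_P = 100 * V_mean / M_P                            # eq. (46)
PRE_PC = 100 * V_mean / M_PC
```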
5. Conclusion
From Table 2, it is observed that the ratio estimator Y-^RC performs better than y-R in Populations 1, 3, and 4 because of positive correlation. The product estimator Y-^PC is better than y-P only in Population 2 because of negative correlation. The regression estimator y-lrC outperforms all other considered estimators and is therefore preferable.
Acknowledgments
The authors are thankful to the learned referees for their valuable suggestions and helpful comments in revising the manuscript.
References
[1] Cochran, W. G. The estimation of the yields of cereal experiments by sampling for the ratio to total produce.
[2] Murthy, M. N. Product method of estimation.
[3] Haq, A., Shabbir, J., and Gupta, S. Improved exponential ratio type estimators in stratified sampling.
[4] Haq, A. and Shabbir, J. Improved family of ratio estimators in simple and stratified random sampling.
[5] Yadav, S. K. and Kadilar, C. Improved class of ratio and product estimators.
[6] Kadilar, C. and Cingi, H. Improvement in estimating the population mean in simple random sampling.
[7] Koyuncu, N. and Kadilar, C. Ratio and product estimators in stratified random sampling.
[8] Sarndal, C. E. Sample survey theory vs general statistical theory: estimation of the population mean.
[9] Singh, R. and Mangat, N. S.
[10] Murthy, M. N.
[11] Cochran, W. G.