The Scientific World Journal, vol. 2013, Article ID 431868. doi:10.1155/2013/431868

Research Article

Some Improved Ratio, Product, and Regression Estimators of Finite Population Mean When Using Minimum and Maximum Values

Manzoor Khan and Javid Shabbir (http://orcid.org/0000-0002-0035-7072)

Department of Statistics, Quaid-i-Azam University, Islamabad 45320, Pakistan

Academic Editors: T. Robak and Y. Xia

Received 5 August 2013; Accepted 16 September 2013; Published 17 November 2013

Copyright © 2013 Manzoor Khan and Javid Shabbir. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Efficient estimation of the finite population mean relies on meaningful use of auxiliary information. In this paper we suggest some modified ratio, product, and regression type estimators that make use of the minimum and maximum values of the study and auxiliary variables. Expressions for the biases and mean squared errors of the suggested estimators are derived up to the first order of approximation. The performance of the suggested estimators relative to their usual counterparts is studied, and improved performance is established. The gain in efficiency from using the maximum and minimum values is also verified numerically.

1. Introduction

Supplementary information in the form of an auxiliary variable is widely used in the estimation of the finite population mean of a study variable. The ratio and product estimators due to Cochran [1] and Murthy [2], respectively, are good examples in which information on an auxiliary variable is incorporated for improved estimation of the finite population mean of the study variable. When the correlation between the study variable (y) and the auxiliary variable (x) is positive, the ratio method of estimation is effective; when the correlation is negative, the product method of estimation is used. There have been many improvements and advancements in the construction of ratio, product, and regression estimators using auxiliary information. For recent details, see Haq et al. [3], Haq and Shabbir [4], Yadav and Kadilar [5], Kadilar and Cingi [6], and Koyuncu and Kadilar [7] and the references cited therein.

The ratio method of estimation is at its best when the relationship between y and x is linear and the regression line passes through the origin; as the line departs from the origin, the efficiency of this method decreases. In practice, the condition that the regression line passes through the origin is rarely satisfied, and the regression estimator is then used for estimating the population mean.

Let \(U=(U_1,U_2,\ldots,U_N)\) be a population of size N. Let \((y_i,x_i)\) be the values of the study and auxiliary variables, respectively, on the ith unit of the population.

Let us assume that a simple random sample of size n is drawn without replacement from U for estimating the population mean \(\bar Y=\sum_{i=1}^{N}y_i/N\). It is further assumed that the population mean \(\bar X=\sum_{i=1}^{N}x_i/N\) of the auxiliary variable x is known. The minimum (\(x_{\min}\)) and maximum (\(x_{\max}\)) values of the auxiliary variable are also assumed to be known.

The variance of the mean per unit estimator \(\bar y=\sum_{i=1}^{n}y_i/n\) is given by

(1) \(V(\bar y)=\lambda S_y^2\),

where \(\lambda=(1/n)-(1/N)\) and \(S_y^2=\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar Y)^2\).

Sometimes there exist unusually large (say \(y_{\max}\)) and unusually small (say \(y_{\min}\)) units in the population. The mean per unit estimator is very sensitive to such observations, so the population mean will be underestimated when the sample contains \(y_{\min}\) and overestimated when the sample contains \(y_{\max}\). To overcome this, Sarndal [8] suggested the following unbiased estimator:

(2) \(\bar y_s=\begin{cases}\bar y+c & \text{if the sample contains } y_{\min} \text{ but not } y_{\max},\\ \bar y-c & \text{if the sample contains } y_{\max} \text{ but not } y_{\min},\\ \bar y & \text{for all other samples},\end{cases}\)

where c is a constant.

The variance of \(\bar y_s\) is given by

(3) \(V(\bar y_s)=\lambda S_y^2-\frac{2\lambda nc}{N-1}(y_{\max}-y_{\min}-nc)\).

Further, V(y-s)<V(y-) if 0<c<(ymax-ymin)/n.

For \(c_{\mathrm{opt}}=(y_{\max}-y_{\min})/2n\), the variance of \(\bar y_s\) is given by

(4) \(V(\bar y_s)_{\mathrm{opt}}=V(\bar y)-\frac{\lambda(y_{\max}-y_{\min})^2}{2(N-1)}\),

which is always smaller than \(V(\bar y)\).
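Sarndal's rule and the variance identity (3) can be checked by exhaustive enumeration on a small population. The following Python sketch uses hypothetical toy values (not from the paper): it enumerates every simple random sample of size n, applies rule (2), and compares the exact variance of \(\bar y_s\) with (3).

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical toy population; y_min and y_max are unique
y = [2, 5, 7, 11, 3, 9]
N, n, c = len(y), 2, 0.5
Ybar, Sy2 = mean(y), variance(y)          # variance() uses the N-1 divisor
lam = 1 / n - 1 / N
y_min, y_max = min(y), max(y)

def y_s(sample):
    """Sarndal's estimator (2): shift the sample mean by +/- c."""
    m = mean(sample)
    if y_min in sample and y_max not in sample:
        return m + c
    if y_max in sample and y_min not in sample:
        return m - c
    return m

ests = [y_s(s) for s in combinations(y, n)]
exact = mean((e - Ybar) ** 2 for e in ests)   # V(y_s) by enumeration
formula = lam * Sy2 - 2 * lam * n * c / (N - 1) * (y_max - y_min - n * c)
print(exact, formula)   # the two coincide; mean(ests) equals Ybar (unbiasedness)
```

The enumeration also confirms that the adjustment leaves the estimator unbiased, since the sets of samples shifted by +c and by -c are equally numerous.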

The usual ratio and product estimators of the population mean \(\bar Y\) are given by

(5) \(\bar y_R=\bar y\left(\frac{\bar X}{\bar x}\right),\qquad \bar y_P=\bar y\left(\frac{\bar x}{\bar X}\right)\),

where \(\bar y=\sum_{i=1}^{n}y_i/n\) and \(\bar x=\sum_{i=1}^{n}x_i/n\) are the sample means of y and x, respectively.

The biases \(B(\cdot)\) and mean square errors \(M(\cdot)\) of the conventional ratio and product estimators are given by

(6) \(B(\bar y_R)\approx \bar Y\lambda(C_x^2-\rho_{yx}C_yC_x)\),

(7) \(B(\bar y_P)=\bar Y\lambda\rho_{yx}C_yC_x\),

(8) \(M(\bar y_R)\approx \bar Y^2\lambda(C_y^2+C_x^2-2\rho_{yx}C_yC_x)\),

(9) \(M(\bar y_P)\approx \bar Y^2\lambda(C_y^2+C_x^2+2\rho_{yx}C_yC_x)\),

where \(C_y=S_y/\bar Y\) and \(C_x=S_x/\bar X\) are the coefficients of variation of y and x, respectively, \(\rho_{yx}=S_{yx}/S_yS_x\) is the correlation coefficient between y and x, and \(S_x^2=\sum_{i=1}^{N}(x_i-\bar X)^2/(N-1)\) and \(S_{yx}=\sum_{i=1}^{N}(y_i-\bar Y)(x_i-\bar X)/(N-1)\) are the population variance of x and the population covariance, respectively.

The usual regression estimator is given by

(10) \(\bar y_{lr}=\bar y+b(\bar X-\bar x)\),

where b is the sample regression coefficient.

The variance of the estimator \(\bar y_{lr}\) is given by

(11) \(V(\bar y_{lr})=\lambda S_y^2(1-\rho_{yx}^2)\).

2. Proposed Estimators

Motivated by Sarndal [8], we extend this idea to estimators that use auxiliary information for increased precision. It is well known that ratio and product estimators are used when y and x are positively and negatively correlated, respectively. We suggest an estimator for each case separately, as follows.

Case 1 (positive correlation between y and x).

When y and x are positively correlated, selecting a larger value of x makes selection of a larger value of y likely, and selecting a smaller value of x makes selection of a smaller value of y likely. So we define the following estimators:

(12) \(\hat{\bar Y}_{RC}=\frac{\bar y_{c11}}{\bar x_{c21}}\bar X=\begin{cases}\dfrac{\bar y+c_1}{\bar x+c_2}\bar X,\\[4pt] \dfrac{\bar y-c_1}{\bar x-c_2}\bar X,\\[4pt] \dfrac{\bar y}{\bar x}\bar X,\end{cases}\)

and similarly

(13) \(\bar y_{lrC1}=\bar y_{c11}+b(\bar X-\bar x_{c21})\),

where \((\bar y_{c11}=\bar y+c_1,\ \bar x_{c21}=\bar x+c_2)\) if the sample contains \(y_{\min}\) and \(x_{\min}\); \((\bar y_{c11}=\bar y-c_1,\ \bar x_{c21}=\bar x-c_2)\) if the sample contains \(y_{\max}\) and \(x_{\max}\); and \((\bar y_{c11}=\bar y,\ \bar x_{c21}=\bar x)\) for all other samples.

Case 2 (negative correlation between y and x).

When y and x are negatively correlated, selecting a larger value of x makes selection of a smaller value of y likely, and vice versa. Keeping this in view, the following estimators are suggested:

(14) \(\hat{\bar Y}_{PC}=\frac{\bar y_{c12}\,\bar x_{c22}}{\bar X}=\begin{cases}\dfrac{(\bar y+c_1)(\bar x-c_2)}{\bar X},\\[4pt] \dfrac{(\bar y-c_1)(\bar x+c_2)}{\bar X},\\[4pt] \dfrac{\bar y\,\bar x}{\bar X},\end{cases}\)

and similarly

(15) \(\bar y_{lrC2}=\bar y_{c12}+b(\bar X-\bar x_{c22})\),

where \((\bar y_{c12}=\bar y+c_1,\ \bar x_{c22}=\bar x-c_2)\) if the sample contains \(y_{\min}\) and \(x_{\max}\); \((\bar y_{c12}=\bar y-c_1,\ \bar x_{c22}=\bar x+c_2)\) if the sample contains \(y_{\max}\) and \(x_{\min}\); and \((\bar y_{c12}=\bar y,\ \bar x_{c22}=\bar x)\) for all other samples.
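For illustration, the case logic of (12) can be written as a small function. The sample summaries and the flags `contains_min`/`contains_max` (indicating whether the drawn sample happened to include the population's extreme units) are hypothetical; this is only a sketch of how the adjustment is applied in practice.

```python
def ratio_estimator(ybar, xbar, Xbar, c1, c2, contains_min, contains_max):
    """Proposed ratio estimator (12) for positively correlated y and x."""
    if contains_min and not contains_max:      # sample holds y_min and x_min
        ybar, xbar = ybar + c1, xbar + c2
    elif contains_max and not contains_min:    # sample holds y_max and x_max
        ybar, xbar = ybar - c1, xbar - c2
    return ybar * Xbar / xbar                  # classical ratio form otherwise

# Hypothetical sample summaries: a sample that contained the minimum units
print(ratio_estimator(11.2, 10.1, 10.41, 0.29, 0.33, True, False))
```

When neither flag is set the function reduces to the usual ratio estimator (5), which is the third branch of (12).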

To find the bias and mean square error of these suggested estimators, we first prove two theorems which will be used in subsequent derivations.

Theorem 1.

If a sample of size n units is drawn from a population of size N units, then the covariance between \(\bar y_{c11}\) and \(\bar x_{c21}\), when y and x are positively correlated, is given by

(16) \(\operatorname{Cov}(\bar y_{c11},\bar x_{c21})=\lambda S_{yx}-\frac{n\lambda}{N-1}\big[c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big]\).

Proof.

Let us assume that n units are drawn without replacement from a population of size N, and let \(S_n\) denote the sample space. We partition \(S_n\) into three mutually exclusive and collectively exhaustive sets \(S_1\), \(S_2\), and \(S_3\), so that \(S_n=S_1\cup S_2\cup S_3\). Here \(S_1\) is the set of all samples that contain \(y_{\min}\) and \(x_{\min}\), \(S_2\) is the set of all samples that contain \(y_{\max}\) and \(x_{\max}\), and \(S_3=S_n\setminus(S_1\cup S_2)\). The numbers of sample points in \(S_1\), \(S_2\), and \(S_3\) are \(\binom{N-2}{n-1}\), \(\binom{N-2}{n-1}\), and \(\binom{N}{n}-2\binom{N-2}{n-1}\), respectively.

By the definition of covariance, we have

(17)
\(\operatorname{Cov}(\bar y_{c11},\bar x_{c21})=\binom{N}{n}^{-1}\Big[\sum_{s\in S_1}(\bar y+c_1-\bar Y)(\bar x+c_2-\bar X)+\sum_{s\in S_2}(\bar y-c_1-\bar Y)(\bar x-c_2-\bar X)+\sum_{s\in S_3}(\bar y-\bar Y)(\bar x-\bar X)\Big]\)

\(=\binom{N}{n}^{-1}\Big[\sum_{s\in S_n}(\bar y-\bar Y)(\bar x-\bar X)-c_1\Big\{\sum_{s\in S_2}(\bar x-\bar X)-\sum_{s\in S_1}(\bar x-\bar X)\Big\}-c_2\Big\{\sum_{s\in S_2}(\bar y-\bar Y)-\sum_{s\in S_1}(\bar y-\bar Y)\Big\}+c_1c_2\Big\{\sum_{s\in S_1}1+\sum_{s\in S_2}1\Big\}\Big]\)

\(=\operatorname{Cov}(\bar y,\bar x)-\binom{N}{n}^{-1}\Big[c_1\frac{x_{\max}-x_{\min}}{n}\binom{N-2}{n-1}+c_2\frac{y_{\max}-y_{\min}}{n}\binom{N-2}{n-1}-2c_1c_2\binom{N-2}{n-1}\Big]\)

\(=\lambda S_{yx}-\frac{N-n}{N(N-1)}\big[c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big]\)

\(=\lambda S_{yx}-\frac{n\lambda}{N-1}\big[c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big].\)
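The covariance formula of Theorem 1 can likewise be verified by enumeration on a toy population in which one unit carries \((y_{\min},x_{\min})\) and another carries \((y_{\max},x_{\max})\), as the partition into \(S_1\) and \(S_2\) presumes. All values in this Python sketch are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical paired population: unit 0 is (y_min, x_min), unit 5 is (y_max, x_max)
pop = [(2, 1), (4, 3), (5, 2), (7, 5), (9, 6), (12, 8)]
N, n, c1, c2 = len(pop), 2, 0.4, 0.3
ys, xs = [u[0] for u in pop], [u[1] for u in pop]
Ybar, Xbar = mean(ys), mean(xs)
lam = 1 / n - 1 / N
Syx = sum((yv - Ybar) * (xv - Xbar) for yv, xv in pop) / (N - 1)
lo, hi = pop[0], pop[5]                      # the extreme units

cov = 0.0
samples = list(combinations(pop, n))
for s in samples:
    yb, xb = mean(v[0] for v in s), mean(v[1] for v in s)
    if lo in s and hi not in s:              # set S1: adjust upward
        yb, xb = yb + c1, xb + c2
    elif hi in s and lo not in s:            # set S2: adjust downward
        yb, xb = yb - c1, xb - c2
    cov += (yb - Ybar) * (xb - Xbar)
cov /= len(samples)                          # exact covariance by enumeration

rng_x, rng_y = max(xs) - min(xs), max(ys) - min(ys)
formula = lam * Syx - n * lam / (N - 1) * (c1 * rng_x + c2 * rng_y - 2 * n * c1 * c2)
print(cov, formula)   # the two coincide
```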

Theorem 2.

If a sample of size n units is drawn from a population of size N units, then the covariance between \(\bar y_{c12}\) and \(\bar x_{c22}\), when y and x are negatively correlated, is given by

(18) \(\operatorname{Cov}(\bar y_{c12},\bar x_{c22})=\lambda S_{yx}+\frac{n\lambda}{N-1}\big[c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big]\).

Theorem 2 can be proved along the same lines as Theorem 1.

We define the following relative error terms. Let \(e_0=(\bar y_{c11}-\bar Y)/\bar Y\) and \(e_1=(\bar x_{c21}-\bar X)/\bar X\), so that

(19) \(E(e_0)=E(e_1)=0\),

(20) \(E(e_0^2)=\frac{\lambda}{\bar Y^2}\left[S_y^2-\frac{2nc_1}{N-1}(y_{\max}-y_{\min}-nc_1)\right]\),

(21) \(E(e_1^2)=\frac{\lambda}{\bar X^2}\left[S_x^2-\frac{2nc_2}{N-1}(x_{\max}-x_{\min}-nc_2)\right]\),

(22) \(E(e_0e_1)=\frac{\lambda}{\bar Y\bar X}\left[S_{yx}-\frac{n}{N-1}\{c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\}\right]\).

Expressing \(\hat{\bar Y}_{RC}\) in terms of the e's, we have

(23) \(\hat{\bar Y}_{RC}=\bar Y(1+e_0)(1+e_1)^{-1}\).

Expanding the right-hand side of (23) and retaining terms up to the first degree of approximation, we have

(24) \(\hat{\bar Y}_{RC}-\bar Y\approx \bar Y(e_0-e_1-e_0e_1+e_1^2)\).

Using (24), the bias of \(\hat{\bar Y}_{RC}\) is given by

(25) \(B(\hat{\bar Y}_{RC})\approx \frac{\lambda}{\bar X}\left[R\left\{S_x^2-\frac{2nc_2}{N-1}(x_{\max}-x_{\min}-nc_2)\right\}-\left\{S_{yx}-\frac{n}{N-1}\big(c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big)\right\}\right]\),

where \(R=\bar Y/\bar X\).

Using (24), the mean square error of \(\hat{\bar Y}_{RC}\), to the first degree of approximation, is given by

(26) \(M(\hat{\bar Y}_{RC})\approx \lambda\left[S_y^2-\frac{2nc_1}{N-1}(y_{\max}-y_{\min}-nc_1)+R^2\left\{S_x^2-\frac{2nc_2}{N-1}(x_{\max}-x_{\min}-nc_2)\right\}-2R\left\{S_{yx}-\frac{n}{N-1}\big(c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\big)\right\}\right]\)

or

(27) \(M(\hat{\bar Y}_{RC})\approx M(\bar y_R)-\frac{2\lambda n}{N-1}(c_1-Rc_2)\big[(y_{\max}-y_{\min})-R(x_{\max}-x_{\min})-n(c_1-Rc_2)\big]\).

To find the optimum values of \(c_1\) and \(c_2\), we differentiate (27) with respect to \(c_1\) and \(c_2\):

(28) \(\frac{\partial M(\hat{\bar Y}_{RC})}{\partial c_1}=0\ \Rightarrow\ (y_{\max}-y_{\min})-R(x_{\max}-x_{\min})-2n(c_1-Rc_2)=0\),

\(\frac{\partial M(\hat{\bar Y}_{RC})}{\partial c_2}=0\ \Rightarrow\ (y_{\max}-y_{\min})-R(x_{\max}-x_{\min})-2n(c_1-Rc_2)=0\).

Both derivatives yield the same single equation in two unknowns, so a unique solution is not possible; we therefore set \(c_2=(x_{\max}-x_{\min})/2n\), which then gives \(c_1=(y_{\max}-y_{\min})/2n\).

For these optimum values of \(c_1\) and \(c_2\), the optimum mean square error of \(\hat{\bar Y}_{RC}\) is given by

(29) \(M(\hat{\bar Y}_{RC})_{\mathrm{opt}}\approx M(\bar y_R)-\frac{\lambda}{2(N-1)}\big[(y_{\max}-y_{\min})-R(x_{\max}-x_{\min})\big]^2\).
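As a numerical sanity check, a short Python sketch (assuming the Population 1 summary statistics quoted in Section 4 are exact) evaluates (1), the covariance form of (8), and (29); it reproduces, up to rounding of the published inputs, the percentage relative efficiencies reported for \(\bar y_R\) and \(\hat{\bar Y}_{RC}\) in Table 2.

```python
# Population 1 (Singh and Mangat [9]) summary statistics from Section 4
N, n = 27, 12
Ybar, Xbar = 11.25185, 10.41111
Sy2, Sx2, Syx = 4.103, 4.931, 4.454
ymax, ymin, xmax, xmin = 14.8, 7.9, 14.5, 6.5

lam = 1 / n - 1 / N
R = Ybar / Xbar
V_mean = lam * Sy2                                    # eq. (1)
M_R = lam * (Sy2 + R**2 * Sx2 - 2 * R * Syx)          # eq. (8), covariance form
M_RC = M_R - lam / (2 * (N - 1)) * ((ymax - ymin) - R * (xmax - xmin)) ** 2  # eq. (29)
pre_R, pre_RC = 100 * V_mean / M_R, 100 * V_mean / M_RC
print(pre_R, pre_RC)   # close to Table 2's 1742.03 and 2319.29
```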

Similarly, using Theorem 2, the bias and mean square error of \(\hat{\bar Y}_{PC}\) are, respectively, given by

(30) \(B(\hat{\bar Y}_{PC})\approx \frac{\lambda}{\bar X}\left[S_{yx}+\frac{n}{N-1}\{c_1(x_{\max}-x_{\min})+c_2(y_{\max}-y_{\min})-2nc_1c_2\}\right]\),

(31) \(M(\hat{\bar Y}_{PC})\approx M(\bar y_P)-\frac{2\lambda n}{N-1}(c_1-Rc_2)\big[(y_{\max}-y_{\min})-R(x_{\max}-x_{\min})-n(c_1-Rc_2)\big]\).

For the optimum values of \(c_1\) and \(c_2\), the optimum mean square error of \(\hat{\bar Y}_{PC}\) is given by

(32) \(M(\hat{\bar Y}_{PC})_{\mathrm{opt}}\approx M(\bar y_P)-\frac{\lambda}{2(N-1)}\big[(y_{\max}-y_{\min})-R(x_{\max}-x_{\min})\big]^2\).
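The same kind of check works for the product estimator. The sketch below assumes the Population 2 summary statistics quoted in Section 4 are exact, evaluates the covariance form of (9) and then (32), and recovers the Table 2 efficiencies for \(\bar y_P\) and \(\hat{\bar Y}_{PC}\) up to rounding.

```python
# Population 2 (Singh and Mangat [9]) summary statistics from Section 4
N, n = 36, 12
Ybar, Xbar = 14.77778, 2.798333
Sy2, Sx2, Syx = 38.178, 0.3504, -2.477     # note the negative covariance
ymax, ymin, xmax, xmin = 33, 6, 3.82, 1.81

lam = 1 / n - 1 / N
R = Ybar / Xbar
V_mean = lam * Sy2                                    # eq. (1)
M_P = lam * (Sy2 + R**2 * Sx2 + 2 * R * Syx)          # eq. (9), covariance form
M_PC = M_P - lam / (2 * (N - 1)) * ((ymax - ymin) - R * (xmax - xmin)) ** 2  # eq. (32)
pre_P, pre_PC = 100 * V_mean / M_P, 100 * V_mean / M_PC
print(pre_P, pre_PC)   # close to Table 2's 175.210 and 212.639
```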

The variance of the regression estimator \(\bar y_{lrC1}\) in the case of positive correlation is given by

(33) \(V(\bar y_{lrC1})=V(\bar y_{lr})-\frac{2\lambda n}{N-1}(c_1-\beta c_2)\big[(y_{\max}-y_{\min})-\beta(x_{\max}-x_{\min})-n(c_1-\beta c_2)\big]\),

where \(\beta=\rho_{yx}(S_y/S_x)\) is the population regression coefficient of y on x.

For \(c_2=(x_{\max}-x_{\min})/2n\) and \(c_1=(y_{\max}-y_{\min})/2n\), the optimum variance of \(\bar y_{lrC1}\) is given by

(34) \(V(\bar y_{lrC1})_{\mathrm{opt}}=V(\bar y_{lr})-\frac{\lambda}{2(N-1)}\big[(y_{\max}-y_{\min})-\beta(x_{\max}-x_{\min})\big]^2\).

For negative correlation, the variance of the regression estimator \(\bar y_{lrC2}\) is given by

(35) \(V(\bar y_{lrC2})=V(\bar y_{lr})-\frac{2\lambda n}{N-1}(c_1+\beta c_2)\big[(y_{\max}-y_{\min})+\beta(x_{\max}-x_{\min})-n(c_1+\beta c_2)\big]\).

For \(c_2=(x_{\max}-x_{\min})/2n\) and \(c_1=(y_{\max}-y_{\min})/2n\), the optimum variance of \(\bar y_{lrC2}\) is given by

(36) \(V(\bar y_{lrC2})_{\mathrm{opt}}=V(\bar y_{lr})-\frac{\lambda}{2(N-1)}\big[(y_{\max}-y_{\min})+\beta(x_{\max}-x_{\min})\big]^2\).

So in general we can write \(V(\bar y_{lrC})_{\mathrm{opt}}\) as

(37) \(V(\bar y_{lrC})_{\mathrm{opt}}=V(\bar y_{lr})-\frac{\lambda}{2(N-1)}\big[(y_{\max}-y_{\min})-|\beta|(x_{\max}-x_{\min})\big]^2\).
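Finally, (11) and the general form (37) can be checked against Table 2 in the same way. The Python sketch below again assumes that the Population 2 summary statistics quoted in Section 4 are exact; \(\beta\) is negative for this population, so (37) applies with \(|\beta|\).

```python
# Population 2 again: regression estimators under negative correlation
N, n = 36, 12
Sy2, Sx2, Syx = 38.178, 0.3504, -2.477
ymax, ymin, xmax, xmin = 33, 6, 3.82, 1.81

lam = 1 / n - 1 / N
rho2 = Syx**2 / (Sy2 * Sx2)                 # squared correlation
beta = Syx / Sx2                            # population regression slope (< 0 here)
V_mean = lam * Sy2                          # eq. (1)
V_lr = V_mean * (1 - rho2)                  # eq. (11)
V_lrC = V_lr - lam / (2 * (N - 1)) * ((ymax - ymin) - abs(beta) * (xmax - xmin)) ** 2  # eq. (37)
pre_lr, pre_lrC = 100 * V_mean / V_lr, 100 * V_mean / V_lrC
print(pre_lr, pre_lrC)   # close to Table 2's 184.699 and 208.254
```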

3. Comparison

The conditions under which the suggested estimators \(\hat{\bar Y}_{RC}\), \(\hat{\bar Y}_{PC}\), and \(\bar y_{lrC}\) perform better than the usual mean per unit estimator and their usual counterparts are given below.

(a) Comparison of Proposed Ratio Type Estimator. A proposed estimator Y-^RC will perform better than

(i) the mean per unit estimator (by (1) and (27)) if \(V(\bar y)-M(\hat{\bar Y}_{RC})>0\), or if

(38) \(\rho_{yx}>\frac{RS_x}{2S_y}-\frac{n}{RS_yS_x(N-1)}\big[(c_1-Rc_2)\{(y_{\max}-y_{\min})-R(x_{\max}-x_{\min})-2n(c_1-Rc_2)\}\big]\);

(ii) the usual ratio estimator (by (8) and (27)) if \(M(\bar y_R)-M(\hat{\bar Y}_{RC})>0\), or if

(39) \(\min\left[Rc_2,\ Rc_2-\frac{R(x_{\max}-x_{\min})-(y_{\max}-y_{\min})}{n}\right]<c_1<\max\left[Rc_2,\ Rc_2-\frac{R(x_{\max}-x_{\min})-(y_{\max}-y_{\min})}{n}\right]\).

(b) Comparison of Proposed Product Type Estimator. A proposed product type estimator will perform better than

(iii) the mean per unit estimator if \(V(\bar y)-M(\hat{\bar Y}_{PC})>0\) (by (1) and (31)), or if

(40) \(\rho_{yx}<-\frac{RS_x}{2S_y}+\frac{n}{RS_yS_x(N-1)}\big[(c_1+Rc_2)\{(y_{\max}-y_{\min})+R(x_{\max}-x_{\min})-2n(c_1+Rc_2)\}\big]\);

(iv) the usual product estimator if \(M(\bar y_P)-M(\hat{\bar Y}_{PC})>0\) (by (9) and (31)), or if

(41) \(\min\left[-Rc_2,\ -Rc_2+\frac{R(x_{\max}-x_{\min})+(y_{\max}-y_{\min})}{n}\right]<c_1<\max\left[-Rc_2,\ -Rc_2+\frac{R(x_{\max}-x_{\min})+(y_{\max}-y_{\min})}{n}\right]\).

(c) Comparison of Proposed Regression Type Estimator. A proposed regression type estimator (positive correlation) will perform better than

(v) the mean per unit estimator if \(V(\bar y)-V(\bar y_{lrC1})>0\) (by (1) and (33)), or if

(42) \(\rho_{yx}^2>-\frac{2n}{S_y^2(N-1)}\big[(c_1-\beta c_2)\{(y_{\max}-y_{\min})-\beta(x_{\max}-x_{\min})-2n(c_1-\beta c_2)\}\big]\);

(vi) the usual regression estimator if \(V(\bar y_{lr})-V(\bar y_{lrC1})>0\) (by (11) and (33)), or if

(43) \(\min\left[\beta c_2,\ \beta c_2-\frac{\beta(x_{\max}-x_{\min})-(y_{\max}-y_{\min})}{n}\right]<c_1<\max\left[\beta c_2,\ \beta c_2-\frac{\beta(x_{\max}-x_{\min})-(y_{\max}-y_{\min})}{n}\right]\).

A proposed regression type estimator (negative correlation) will perform better than

(vii) the mean per unit estimator if \(V(\bar y)-V(\bar y_{lrC2})>0\) (by (1) and (35)), or if

(44) \(\rho_{yx}^2>-\frac{2n}{S_y^2(N-1)}\big[(c_1+\beta c_2)\{(y_{\max}-y_{\min})+\beta(x_{\max}-x_{\min})-2n(c_1-\beta c_2)\}\big]\);

(viii) the usual regression estimator if \(V(\bar y_{lr})-V(\bar y_{lrC2})>0\) (by (11) and (35)), or if

(45) \(\min\left[-\beta c_2,\ -\beta c_2+\frac{\beta(x_{\max}-x_{\min})+(y_{\max}-y_{\min})}{n}\right]<c_1<\max\left[-\beta c_2,\ -\beta c_2+\frac{\beta(x_{\max}-x_{\min})+(y_{\max}-y_{\min})}{n}\right]\).

(d) Comparison of Suggested Estimators for Optimum Values of \(c_1\) and \(c_2\) with Usual Estimators. For the optimum values of \(c_1\) and \(c_2\), the proposed estimators always perform better than the usual mean per unit estimator and their usual counterparts (ratio, product, and regression estimators).

4. Empirical Study

We consider the following datasets for numerical comparison.

Population 1 (Singh and Mangat [9, page 193]).

Let y be the milk yield (kg) after the new food and let x be the milk yield (kg) before the new food. N=27, n=12, X̄=10.41111, Ȳ=11.25185, ymax=14.8, ymin=7.9, xmax=14.5, xmin=6.5, Sy²=4.103, Sx²=4.931, Sxy=4.454, and ρyx=0.990.

Population 2 (Singh and Mangat [9, page 195]).

Let y be the weekly time (hours) spent on nonacademic activities and let x be the overall grade point average (on a 4.0 scale). N=36, n=12, X̄=2.798333, Ȳ=14.77778, ymax=33, ymin=6, xmax=3.82, xmin=1.81, Sy²=38.178, Sx²=0.3504, Sxy=−2.477, and ρyx=−0.6772.

Population 3 (Murthy [10, page 399]).

Let y be the area under the wheat crop in 1964 and let x be the area under the wheat crop in 1963. N=34, n=12, X̄=208.882, Ȳ=199.441, ymax=634, ymin=6, xmax=564, xmin=5, Sy²=22564.56, Sx²=22652.05, Sxy=22158.05, and ρyx=0.980.

Population 4 (Cochran [11, page 152]).

Let y be the population size in 1930 (in thousands) and let x be the population size in 1920 (in thousands). N=49, n=12, X̄=103.1429, Ȳ=127.7959, ymax=634, ymin=46, xmax=507, xmin=2, Sy²=15158.83, Sx²=10900.42, Sxy=12619.78, and ρyx=0.98.

The conditional values and results are given in Tables 1 and 2, respectively.

Table 1: Numerical values of conditions (38)–(45).

Conditions Population 1 Population 2 Population 3 Population 4
(38) 0.99 > 0.592 −0.667 > 0.2529* 0.980 > 0.478 0.981 > 0.525
(39) 0.214 < 0.287 < 0.360 0.442 < 1.125 < 1.807 22.23 < 26.16 < 30.094 22.92 < 24.5 < 26.07
(40) 0.99 < −0.592* −0.667 < −0.253 0.980 < −0.4783* 0.981 < −0.525*
(41) −0.360 < 0.287 < 0.93 −0.442 < 1.125 < 2.692 −22.238 < 26.16 < 74.57 −26.07 < 24.5 < 75.07
(42) 0.980 > 0.000 0.459 > 0.000 0.9605 > 0.000 0.963 > 0.000
(43) 0.273 < 0.287 < 0.301 −0.773 < 1.125 < −0.592* 22.78 < 26.16 < 29.549 24.36 < 24.5 < 24.63
(44) 0.980 > −1.913 0.459 > 0.272 0.9605 > −0.862 0.963 > −1.88
(45) −0.30 < 0.287 < −0.15* 0.592 < 1.125 < 4.026 −30.63 < 26.16 < −22.78* −24.36 < 24.5 < −21.2*

Note: *indicates that the condition is not satisfied.

Table 2: PRE of different estimators with respect to \(\bar y\).

Estimator Population 1 Population 2 Population 3 Population 4
\(\bar y_R\) 1742.030 51.514 2501.29 2442.965
\(\bar y_P\) 21.053 175.210 26.383 23.998
\(\bar y_{lr}\) 5115.818 184.699 2536.131 2763.749
\(\hat{\bar Y}_{RC}\) 2319.293 54.325 2940.086 2502.692
\(\hat{\bar Y}_{PC}\) 21.117 212.639 26.424 24.004
\(\bar y_{lrC}\) 5249.673 208.254 2856.83 2764.336

For the percentage relative efficiency (PRE), we use the expression

(46) \(\mathrm{PRE}(\bar y_i,\bar y)=\frac{V(\bar y)}{V(\bar y_i)\ \text{or}\ M(\bar y_i)}\times 100,\quad \text{for } i=R,P,RC,PC,lr,lrC\).

5. Conclusion

From Table 2, it is observed that the ratio estimator \(\hat{\bar Y}_{RC}\) performs better than \(\bar y_R\) in Populations 1, 3, and 4 because of the positive correlation. The product estimator \(\hat{\bar Y}_{PC}\) is better than \(\bar y_P\) only in Population 2 because of the negative correlation. The regression estimator \(\bar y_{lrC}\) outperforms all other considered estimators and is preferable.

Acknowledgments

The authors are thankful to the learned referees for their valuable suggestions and helpful comments in revising the manuscript.

1. Cochran, W. G., "The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce," The Journal of Agricultural Science, vol. 30, pp. 262–275, 1940.
2. Murthy, M. N., "Product method of estimation," Sankhya A, vol. 16, pp. 69–74, 1964.
3. Haq, A., Shabbir, J., and Gupta, S., "Improved exponential ratio type estimators in stratified sampling," Pakistan Journal of Statistics, vol. 29, no. 1, pp. 13–31, 2013.
4. Haq, A. and Shabbir, J., "Improved family of ratio estimators in simple and stratified random sampling," Communications in Statistics: Theory and Methods, vol. 42, no. 5, pp. 782–799, 2013.
5. Yadav, S. K. and Kadilar, C., "Improved class of ratio and product estimators," Applied Mathematics and Computation, vol. 219, no. 22, pp. 10726–10731, 2013. doi:10.1016/j.amc.2013.04.048
6. Kadilar, C. and Cingi, H., "Improvement in estimating the population mean in simple random sampling," Applied Mathematics Letters, vol. 19, no. 1, pp. 75–79, 2006. doi:10.1016/j.aml.2005.02.039
7. Koyuncu, N. and Kadilar, C., "Ratio and product estimators in stratified random sampling," Journal of Statistical Planning and Inference, vol. 139, no. 8, pp. 2552–2558, 2009. doi:10.1016/j.jspi.2008.11.009
8. Sarndal, C. E., "Sample survey theory vs. general statistical theory: estimation of the population mean," International Statistical Review, vol. 40, pp. 1–12, 1972.
9. Singh, R. and Mangat, N. S., Elements of Survey Sampling, Kluwer Academic Publishers, London, UK, 1996.
10. Murthy, M. N., Sampling Theory and Methods, Statistical Publishing Society, Calcutta, India, 1967.
11. Cochran, W. G., Sampling Techniques, 3rd edition, John Wiley & Sons, New York, NY, USA, 1977.