On Improving Ratio / Product Estimator by Ratio / Product-cum-Mean-per-Unit Estimator Targeting More Efficient Use of Auxiliary Information

To achieve a more efficient use of auxiliary information, we propose single-parameter ratio/product-cum-mean-per-unit estimators for a finite population mean under simple random sampling without replacement when the magnitude of the correlation coefficient is not very high (less than or equal to 0.7). First order large sample approximations to the bias and the mean square error of the proposed estimators are obtained. We use simulation to compare our estimators with the well-known sample mean, ratio, and product estimators, as well as the classical linear regression estimator, for efficient use of auxiliary information. The results conform to the aim motivating our proposal.


Introduction and Notation
This paper addresses the problem of efficiently estimating the population mean using auxiliary information. A fairly large simple random sample of size $n$ is selected without replacement from, say, a large bivariate population of size $N$ (which could, reasonably, be thought to have come from a normal superpopulation), with the sampling fraction $f = n/N$, $N \gg n$, so that $f$ is negligible. Quite often, we have surveys where some auxiliary variable $x$ may be relatively less expensive to observe than the main variable $y$. In order to have a survey estimate of the population mean $\bar{Y}$ of the main variable, assuming knowledge of the population mean $\bar{X}$ of the auxiliary variable, the following estimators are well known.
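The ratio estimator:

$$\hat{Y}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \hat{R}\,\bar{X}. \tag{1}$$

The product estimator:

$$\hat{Y}_P = \frac{\bar{y}\,\bar{x}}{\bar{X}} = \frac{\hat{P}}{\bar{X}}. \tag{2}$$

Here $\hat{R} = \bar{y}/\bar{x}$ is the estimate of the ratio of the population means and $\hat{P} = \bar{y} \cdot \bar{x}$ is the estimate of the product of the population means, $\bar{y}$ and $\bar{x}$ being the unweighted sample means of the two variables, respectively. Usually, the variability of $x$ is less than that of $y$.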
It is straightforward to derive first order approximations to the bias and mean square error of these estimators. Let $C_y = S_y/\bar{Y}$ and $C_x = S_x/\bar{X}$ be the population coefficients of variation of $y$ and $x$, respectively, where $S_y^2$ and $S_x^2$ are the population variances of $y$ and $x$, respectively. Let $\bar{y} = \bar{Y}(1 + e)$ and $\bar{x} = \bar{X}(1 + e')$, where the errors $e$ and $e'$ can be positive or negative, so that $E(e) = E(e') = 0$. It is known that, for simple random sampling without replacement, $\mathrm{Var}(e) = (1-f)C_y^2/n$, $\mathrm{Var}(e') = (1-f)C_x^2/n$, and $\mathrm{Cov}(e, e') = (1-f)\rho C_y C_x/n$, where $\rho$ is the correlation coefficient between the variables (P. V. Sukhatme and B. V. Sukhatme [1]). Further, to validate our first order large sample approximations, we assume that the sample is large enough to make $|e|$ and $|e'|$ so small that terms involving $e$ and/or $e'$ to a degree higher than two are negligible, an assumption which is not unrealistic.
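Substituting the expressions for $\bar{y}$ and $\bar{x}$ in terms of $e$ and $e'$ in (1) we have

$$\hat{Y}_R = \bar{Y}(1 + e)(1 + e')^{-1}.$$

Assuming that $|e'| < 1$, we expand $(1 + e')^{-1}$ to obtain, up to the first order of approximation, $O(1/n)$,

$$B_1(\hat{Y}_R) = E(\hat{Y}_R) - \bar{Y} = \bar{Y}\,E(e'^2 - e e') = \bar{Y}\,\frac{1-f}{n}\left(C_x^2 - \rho C_y C_x\right).$$

Since $(1 - f)/n \to 0$ as $n \to \infty$, the ratio estimator is asymptotically unbiased up to $O(1/n)$. Similarly, the product estimator is asymptotically unbiased (Murthy [2]). Also, up to the first order of approximation,

$$M_1(\hat{Y}_R) = \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + C_x^2 - 2\rho C_y C_x\right).$$

Thus, up to order $O(1/n)$,

$$M_1(\hat{Y}_P) = \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + C_x^2 + 2\rho C_y C_x\right)$$

(Murthy [2]). Thus the ratio and product estimators are relatively more efficient than the usual unbiased estimator (u.u.e.), the sample mean $\bar{y}$, when $\rho > 1/2$ and $\rho < -1/2$, respectively (taking $C_x \approx C_y$; in general the conditions are $\rho C_y/C_x > 1/2$ and $\rho C_y/C_x < -1/2$). Consequently, $\hat{Y}_R$ and $\hat{Y}_P$ fail to improve on $\bar{y}$ (by using auxiliary information) when $-1/2 \le \rho \le 1/2$.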
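As a quick numerical illustration of the efficiency conditions just stated, the following minimal sketch evaluates the first order MSE of the sample mean, ratio, and product estimators, each divided by the common factor $\bar{Y}^2(1-f)/n$. The function names and the coefficient of variation values (0.2 for both variables) are illustrative assumptions, not taken from the paper.

```python
def first_order_mse_factors(rho, c_y, c_x):
    """First order MSE of each estimator, divided by Ybar^2 * (1 - f) / n.

    rho is the correlation coefficient; c_y and c_x are the population
    coefficients of variation of y and x (hypothetical argument names).
    """
    return {
        "mean": c_y ** 2,
        "ratio": c_y ** 2 + c_x ** 2 - 2 * rho * c_y * c_x,
        "product": c_y ** 2 + c_x ** 2 + 2 * rho * c_y * c_x,
    }


def best_classical(rho, c_y, c_x):
    """Name of the estimator with the smallest first order MSE."""
    factors = first_order_mse_factors(rho, c_y, c_x)
    return min(factors, key=factors.get)


# Illustrative values only: equal coefficients of variation (0.2 each).
for rho in (-0.8, -0.3, 0.0, 0.3, 0.8):
    print(rho, best_classical(rho, 0.2, 0.2))
```

With equal coefficients of variation, the sample mean comes out best for the moderate correlations in this grid, which is the range the proposed estimators target.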
Also we cannot ignore the classically well-known linear regression estimator, say $\hat{Y}_{LR}$:

$$\hat{Y}_{LR} = \bar{y} + b\,(\bar{X} - \bar{x}),$$

with $b$ the sample regression coefficient of $y$ on $x$. If we recall the ANOVA of linear regression analysis, the residual variance for $\hat{Y}_{LR}$ is $S_y^2(1 - \rho^2)$ (Cochran [3]). Thus when $|\rho|$ is high (say $\rho > 0.7$ or $\rho < -0.7$), the linear regression estimator is most likely to be more efficient than the ratio/product estimators in using the auxiliary information (via the auxiliary variable $x$). We aim at improving the use of auxiliary information over $\bar{y}$ when $-1/2 \le \rho \le 1/2$; over $\hat{Y}_R$ when $\rho > 1/2$; over $\hat{Y}_P$ when $\rho < -1/2$; and over $\hat{Y}_{LR}$ when $|\rho| \le 0.7$.

Our Proposed Estimators
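Because $\hat{Y}_R$ and $\hat{Y}_P$ are relatively more efficient than $\bar{y}$ when $\rho > 1/2$ and $\rho < -1/2$, respectively, we try the following single-parameter linear combinations of $\hat{Y}_R$ and $-\bar{y}$, as well as $\hat{Y}_P$ and $-\bar{y}$, to propose the estimators:

(i) Shirley-Sahai-Dialsingh-ratio-cum-mean, say $\hat{Y}_{SSDR}$:

$$\hat{Y}_{SSDR} = (1 + \lambda)\,\hat{Y}_R - \lambda\,\bar{y}; \tag{8}$$

(ii) Shirley-Sahai-Dialsingh-product-cum-mean, say $\hat{Y}_{SSDP}$:

$$\hat{Y}_{SSDP} = (1 + \lambda)\,\hat{Y}_P - \lambda\,\bar{y}. \tag{9}$$

In (8) and (9), $\lambda$ is the design parameter for our proposed estimators, to be assigned an optimal value, for example, so as to minimize the first order MSE, $M_1(\circ)$, as in our case. Note that when $\lambda = 0$, $\hat{Y}_{SSDR} = \hat{Y}_R$ and $\hat{Y}_{SSDP} = \hat{Y}_P$. As remarked earlier, quite often a good guess of $k = \rho C_y/C_x$ is available, from which we can give a suitable value to $\lambda$.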
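The two proposed estimators can be computed directly from a sample once a value of the design parameter is chosen. The sketch below assumes the forms $(1+\lambda)\hat{Y}_R - \lambda\bar{y}$ and $(1+\lambda)\hat{Y}_P - \lambda\bar{y}$, which agree with the reduction to the ratio and product estimators at $\lambda = 0$. The function names, the argument `lam`, and the toy data are hypothetical.

```python
from statistics import fmean


def ratio_estimate(y, x, x_pop_mean):
    """Classical ratio estimator: (ybar / xbar) * Xbar."""
    return fmean(y) / fmean(x) * x_pop_mean


def product_estimate(y, x, x_pop_mean):
    """Classical product estimator: ybar * xbar / Xbar."""
    return fmean(y) * fmean(x) / x_pop_mean


def ssdr_estimate(y, x, x_pop_mean, lam):
    """Ratio-cum-mean combination with design parameter lam (assumed form)."""
    return (1 + lam) * ratio_estimate(y, x, x_pop_mean) - lam * fmean(y)


def ssdp_estimate(y, x, x_pop_mean, lam):
    """Product-cum-mean combination with design parameter lam (assumed form)."""
    return (1 + lam) * product_estimate(y, x, x_pop_mean) - lam * fmean(y)


# Toy data, purely illustrative.
y = [12.0, 15.0, 11.0, 14.0]
x = [10.0, 13.0, 9.0, 12.0]
x_pop_mean = 10.0

for lam in (-1.0, -0.5, 0.0):
    print(lam, ssdr_estimate(y, x, x_pop_mean, lam), ssdp_estimate(y, x, x_pop_mean, lam))
```

Setting `lam = 0` recovers the ratio and product estimators, while `lam = -1` recovers the sample mean, so a single parameter moves between the classical estimators.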

Sampling Bias and Mean Square Error of the Proposed Estimators
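We derive the first order approximation, $B_1(\hat{Y}_{SSDR})$, to the bias of $\hat{Y}_{SSDR}$. Using the notation introduced in Section 1 and substituting the expressions for $\bar{y}$ and $\bar{x}$ in terms of $e$ and $e'$ in (8) we have

$$\hat{Y}_{SSDR} = \bar{Y}(1 + e)\left[(1 + \lambda)(1 + e')^{-1} - \lambda\right].$$

It is realistic in practice to suppose that $|e'| < 1$, so that $(1 + e')^{-1}$ is expandable. Then, to the first order of approximation, $O(1/n)$, the bias of $\hat{Y}_{SSDR}$ is given by

$$B_1(\hat{Y}_{SSDR}) = \bar{Y}\,\theta\,(1 - k)\,B_0,$$

where $k = \rho C_y/C_x$, $\theta = 1 + \lambda$, and $B_0 = (1 - f)C_x^2/n$. Since $B_0 \to 0$ as $n \to \infty$, $\hat{Y}_{SSDR}$ is asymptotically unbiased up to $O(1/n)$.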
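To compute the MSE of $\hat{Y}_{SSDR}$ we have, with $\theta = 1 + \lambda$,

$$M_1(\hat{Y}_{SSDR}) = \bar{Y}^2 E(e - \theta e')^2 = V_0 + \bar{Y}^2\,\frac{1-f}{n}\left(\theta^2 C_x^2 - 2\theta C_{yx}\right),$$

where $V_0 = \mathrm{Var}(\bar{y}) = ((1 - f)/n)\,S_y^2$ and $C_{yx} = \rho C_y C_x$. For large sample size, $M_1(\hat{Y}_{SSDR})$ is minimum for $\theta = C_{yx}/C_x^2$. The optimal value of $\theta$ is thus $\rho C_y/C_x = k$. If a good guess of $k$, say $k^*$, is available, we use $\lambda = k^* - 1$ in our proposed estimator (8), so that

$$\hat{Y}_{SSDR} = k^*\,\hat{Y}_R + (1 - k^*)\,\bar{y}.$$

We deduce the large sample approximation for the bias of $\hat{Y}_{SSDP}$ in a similar manner:

$$B_1(\hat{Y}_{SSDP}) = \bar{Y}\,\theta\,k\,B_0,$$

where $k = \rho C_y/C_x$ and $B_0 = (1 - f)C_x^2/n$ are as before. Since $B_0 \to 0$ as $n \to \infty$, $\hat{Y}_{SSDP}$ is asymptotically unbiased.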
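To compute the MSE of $\hat{Y}_{SSDP}$ we have

$$M_1(\hat{Y}_{SSDP}) = \bar{Y}^2 E(e + \theta e')^2 = V_0 + \bar{Y}^2\,\frac{1-f}{n}\left(\theta^2 C_x^2 + 2\theta C_{yx}\right),$$

where $V_0 = \mathrm{Var}(\bar{y})$ and $C_{yx} = \rho C_y C_x$. Up to $O(1/n)$, $M_1(\hat{Y}_{SSDP})$ is a minimum for $\theta = -k$. The optimal value of $\theta$ is thus $-\rho C_y/C_x = -k$.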
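As a worked step, the minimum first order MSE can be written out explicitly. The block below is a sketch under the notation assumed here: $\theta = 1 + \lambda$, $k = \rho C_y/C_x$, and $V_0 = \mathrm{Var}(\bar{y}) = \bar{Y}^2 (1-f) C_y^2/n$. It substitutes the optimal $\theta$ into the first order MSE of each proposed estimator.

```latex
\begin{aligned}
M_1(\hat{Y}_{SSDR})\big|_{\theta = k}
  &= \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + k^2 C_x^2 - 2k\rho C_y C_x\right) \\
  &= \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + \rho^2 C_y^2 - 2\rho^2 C_y^2\right)
   = V_0\,(1 - \rho^2), \\[4pt]
M_1(\hat{Y}_{SSDP})\big|_{\theta = -k}
  &= \bar{Y}^2\,\frac{1-f}{n}\left(C_y^2 + k^2 C_x^2 - 2k\rho C_y C_x\right)
   = V_0\,(1 - \rho^2).
\end{aligned}
```

Under these assumptions, both estimators attain $V_0(1 - \rho^2)$ at the exact optimum, which matches the familiar large sample variance of the linear regression estimator. Any gain over $\hat{Y}_{LR}$ in finite samples would therefore have to come from terms beyond the first order and from how well $k$ is guessed.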
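We use $\lambda = -k^* - 1$ in our proposed estimator (9), where $k^*$ is the guess of $k$. Thus,

$$\hat{Y}_{SSDP} = -k^*\,\hat{Y}_P + (1 + k^*)\,\bar{y}.$$

Consequently, we have used the software R [4] to calculate the MSEs of each of the following estimators: $\bar{y}$, $\hat{Y}_R$, $\hat{Y}_P$, $\hat{Y}_{LR}$, $\hat{Y}_{SSDR}$, and $\hat{Y}_{SSDP}$. We use 10,000 replications of simulated samples of sizes $n$ = 30, 40, 60, 80, and 100. We have then compared the efficiencies of these estimators relative to $\bar{y}$ by using the MSE of $\bar{y}$ relative to the MSE of the estimator concerned. Motivated by our desire to beat ratio/product estimators (implicitly, therefore $\hat{Y}_{LR}$ also), we have, therefore, taken up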

Results and Discussion
The results of our simulations are tabulated in the Appendix.
For illustrative purposes, we highlight below the relative efficiency values for the various values of $\rho$, for the cases when $n = 30$ and $\delta = 0$ (that is, $k^* = k$). For readability, we have rounded these values to two decimal places. We also include a column for the value of $k = \rho C_y/C_x$.
Tables 1 and 2 illustrate the relative improvement achieved by our proposed estimators vis-à-vis $\hat{Y}_{LR}$ and $\hat{Y}_R$ (or $\hat{Y}_P$, as the case may be). Notably, when $|\rho|$ is not greater than 1/2, our estimators are more efficient than $\bar{y}$ even though $\hat{Y}_R$ or $\hat{Y}_P$ (as the case may be) is worse than $\bar{y}$, which does not even use auxiliary information. Also, when $|\rho|$ is significantly less than 1/2, our estimators are more efficient than $\bar{y}$ even though $\hat{Y}_{LR}$ is worse than $\bar{y}$ (i.e., it fails to use auxiliary information rightly).
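A simulation of the kind described can be sketched as follows. This is a minimal illustration and not the authors' code: the paper used R, whereas this sketch uses Python's standard library, and the superpopulation parameters (means of 50 and standard deviations of 10 for both variables), the seed, and the number of replications are assumptions chosen only to keep the example small. Relative efficiency is taken here as the Monte Carlo MSE of the sample mean divided by that of each estimator.

```python
import math
import random
from statistics import fmean


def draw_sample(n, rho, mu_y, mu_x, sd_y, sd_x, rng):
    """Draw n (y, x) pairs from a bivariate normal superpopulation."""
    ys, xs = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return ys, xs


def estimates(ys, xs, mu_x, k_star):
    """All six estimators for one sample; k_star is the guessed value of k."""
    y_bar, x_bar = fmean(ys), fmean(xs)
    ratio = y_bar / x_bar * mu_x
    product = y_bar * x_bar / mu_x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx  # sample regression coefficient of y on x
    return {
        "mean": y_bar,
        "ratio": ratio,
        "product": product,
        "regression": y_bar + b * (mu_x - x_bar),
        "ssdr": k_star * ratio + (1.0 - k_star) * y_bar,
        "ssdp": -k_star * product + (1.0 + k_star) * y_bar,
    }


def relative_efficiencies(n=30, rho=0.4, reps=4000, delta=0.0, seed=1):
    """Monte Carlo MSE of the sample mean divided by the MSE of each estimator."""
    mu_y = mu_x = 50.0  # assumed superpopulation means
    sd_y = sd_x = 10.0  # assumed superpopulation standard deviations
    k = rho * (sd_y / mu_y) / (sd_x / mu_x)
    k_star = k * (1.0 + delta)  # guess of k with relative error delta
    rng = random.Random(seed)
    sq_err = {}
    for _ in range(reps):
        ys, xs = draw_sample(n, rho, mu_y, mu_x, sd_y, sd_x, rng)
        for name, est in estimates(ys, xs, mu_x, k_star).items():
            sq_err[name] = sq_err.get(name, 0.0) + (est - mu_y) ** 2
    return {name: sq_err["mean"] / total for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, value in relative_efficiencies().items():
        print(f"{name:10s} {value:.2f}")
```

Because the population is treated as effectively infinite, the sampling fraction is taken as zero, in line with the large-population assumption in the text. Each replication evaluates every estimator on the same sample, which keeps the comparison between estimators stable even with a modest number of replications.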

Conclusion
Our results conform to our motivating aim of achieving more efficient use of auxiliary information. Many other authors, such as Sahai [5] and Chami et al. [6], have suggested efficient variants of ratio and product estimators. In ongoing and future work, we are comparing these estimators and trying even better estimators, like the proposed ones, which will be not only relatively more efficient but also, possibly, more robust against possible over/underguessing of the key population parameter $k$.
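The robustness question can be explored to first order with a short calculation. The sketch below assumes the guess takes the form $k^* = k(1 + \delta)$, with $\delta$ the relative error of the guess, and that the first order MSE of the proposed estimator is $V_0 + \bar{Y}^2 \frac{1-f}{n}(\theta^2 C_x^2 - 2\theta\rho C_y C_x)$ evaluated at $\theta = k^*$, where $V_0$ is the variance of the sample mean. Substituting gives $M_1/V_0 = 1 - \rho^2(1 - \delta^2)$. The function name and the grid of $\rho$ values are hypothetical.

```python
def first_order_relative_efficiency(rho, delta):
    """First order efficiency relative to the sample mean when k* = k (1 + delta).

    Uses M1 / V0 = 1 - rho^2 * (1 - delta^2), derived under the assumptions
    stated in the accompanying text.
    """
    return 1.0 / (1.0 - rho ** 2 * (1.0 - delta ** 2))


# Relative guess errors of up to 10 percent, with illustrative rho values.
deltas = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
for rho in (0.3, 0.5, 0.7):
    row = [round(first_order_relative_efficiency(rho, d), 4) for d in deltas]
    print(rho, row)
```

To this order the loss from a misguess depends only on $\delta^2$, so over- and underguessing by the same relative amount cost the same, and small guess errors cost little. Higher order terms, which this sketch ignores, may break that symmetry in small samples.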
Apparently, no algebraic comparison of mean square errors is feasible. We therefore use a numerical setup under simulation to do so. Knowing $k$ exactly is seldom tenable in practice. Consequently, we have to assume the availability of a guess value of $k$, which we have called $k^*$, defined by $k^* = k(1 + \delta)$, where $\delta$ designates the quantum of relative underguess/overguess. We have taken the following $\delta$ values: 0, ±0.02, ±0.04, ±0.06, ±0.08, and ±0.10. We have also assumed that the parent population is very large, envisaged to have come from a superpopulation which is bivariate normal with the following parameters, therefore having the same parametric values: