On Estimation of Distribution Function Using Dual Auxiliary Information under Nonresponse Using Simple Random Sampling

In this paper, we propose two new families of estimators of the population distribution function under nonresponse in simple random sampling, using supplementary information on the auxiliary variable and the exponential function. The estimation is carried out in two nonresponse scenarios: nonresponse on the study variable only, and nonresponse on both the study and auxiliary variables. In the first family, the mean of the auxiliary variable is used as additional auxiliary information, while in the second family, its ranks are used. Expressions for the biases and mean squared errors of the proposed and existing estimators are obtained up to the first order of approximation. The performances of the proposed and existing estimators are compared theoretically. From these theoretical comparisons, we demonstrate that, under the obtained conditions, the proposed families of estimators perform better than the existing estimators available in the literature. Furthermore, these theoretical findings are supported numerically by an empirical study reporting the percentage relative efficiencies of the proposed families of estimators.


Introduction
It is well known that known auxiliary information in sample surveys yields more efficient estimates of population parameters, such as the population mean and the population distribution function, under some essential conditions. This auxiliary information may be used when drawing a random sample by SRSWR or SRSWOR. Also, simple random sampling can be improved using the following sampling methods.
Stratification, systematic, nonresponse sampling, and probability proportional sampling schemes are used for estimating the population parameter. Auxiliary information leads to estimation techniques based on the ratio, product, regression, and other methods. In practice, one of the important issues in surveys is nonresponse, a common problem that can creep into any sample survey. Nonresponse arises in many ways. Examples include language barriers, illness, refusal to participate, a misdirected return address, and receipt of the questionnaire by another person. Research has shown that various types of nonresponse may have different effects on estimators. Many authors have worked on the estimation of the population mean under nonresponse, both to control the nonresponse bias and to increase the efficiency of the estimators. The problem of nonresponse is more common in mail surveys than in personal interview surveys. Hansen and Hurwitz [1] proposed that a subsample of the initial nonrespondents be recontacted with a more expensive method; they made the first attempt by mail questionnaire and the second attempt by personal interview. However, Hansen and Hurwitz [1] did not use any kind of supplementary information to increase the efficiency of the estimator. The author of [2] was the first to use auxiliary information for estimating the population mean. Cochran [3] used auxiliary information for estimating the population mean under nonresponse. The work on nonresponse was then extended by many authors (cf. [4-7]), who recommended various types of estimators of the population mean and distribution function using secondary information under nonresponse. Okafor and Lee [8] presented ratio and regression estimation with partial sampling of the nonrespondents for estimating the population mean.
Furthermore, the authors of [9,10] proposed estimators of the population mean using multiauxiliary information in different directions, and Zhao et al. [11] considered robust estimation of the distribution function and quantiles with nonignorable missing data.
Also, the authors of [12-15] have made significant contributions to estimating the population mean under the two-phase sampling strategy in the presence of nonresponse. Diana and Perri [16] suggested a class of estimators in two-phase sampling with subsampling of nonrespondents for estimating the finite population mean. In this paper, we introduce the use of the sample distribution functions of the study variable and the auxiliary variable, along with the mean of the auxiliary variable and also the ranks of the auxiliary variable, for estimating the population distribution function.
An extensive literature has been published on estimation of the population mean under nonresponse; however, little effort has been dedicated to the development of efficient methods for the population cumulative distribution function. In survey sampling, statisticians are often interested in a proportion associated with the study variable, i.e., the proportion of units in the population with values less than or equal to a specified value of y; for instance, we may be interested to know the proportion of the population in which 31% or more people are educated.
Motivated by $\hat{F}_{R,D}(y)$, $\hat{F}_{S}(y)$, and the average of $\hat{F}_{BT,R}(y)$ and $\hat{F}_{BT,P}(y)$, two new families of estimators are proposed for estimating the distribution function in the presence of nonresponse. By numerical results, we will show that the proposed families of estimators are more precise than the existing estimators.
The paper is organized as follows. In Section 2, some notations are introduced. In Section 3, the existing estimators are reviewed briefly. Two new families of estimators are introduced in Section 4. The existing and proposed estimators are compared theoretically and numerically in Sections 5 and 6. In Section 7, the concluding remarks of the paper are given.

Notations
Consider a finite population $\Omega = \{V_1, V_2, \ldots, V_N\}$ of N distinct units, which is partitioned into the respondent group $\Omega_1 = \{V_1, V_2, \ldots, V_{N_1}\}$ and the nonrespondent group $\Omega_2 = \{V_{N_1+1}, \ldots, V_N\}$, with sizes $N_1$ and $N_2$, respectively, for estimating the CDF, where $N = N_1 + N_2$. A sample of size n is drawn from this population by simple random sampling without replacement (SRSWOR), out of which $n_1$ units respond and $n_2 = n - n_1$ do not respond. It is assumed that the $n_1$ responding units come from the response group $\Omega_1$ and the $n_2$ nonresponding units come from the nonresponse group $\Omega_2$. Moreover, a subsample of size $r = n_2/k$ $(k > 1)$ is drawn by SRSWOR from the $n_2$ nonresponding units, and this time a response is obtained from all r units. Let Y and X be the study and auxiliary variables, respectively. Let Z denote the ranks of X, and let $I(Y \le y)$ and $I(X \le x)$ be the indicator variables based on Y and X. Furthermore, $\sum_{i=N_1+1}^{N} I(Y_i \le y)/N_2$ and $\sum_{i=N_1+1}^{N} I(X_i \le x)/N_2$ are the population distribution functions of $I(Y \le y)$ and $I(X \le x)$ for the nonresponse group, and $\bar{X}_2 = \sum_{i=N_1+1}^{N} X_i/N_2$ and $\bar{Z}_2 = \sum_{i=N_1+1}^{N} Z_i/N_2$ are the population means of X and Z for the nonresponse group, respectively.
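The notation above can be made concrete with a small numerical sketch. The toy population, the split point N1, and the helper names (indicator, ranks, F2_y, and so on) are assumptions introduced only for illustration; they are not taken from the paper's data.

```python
def indicator(values, threshold):
    """Return I(v <= threshold) as 0/1 for each value."""
    return [1 if v <= threshold else 0 for v in values]


def ranks(values):
    """Ranks 1..N of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            result[order[j]] = average_rank
        pos = end + 1
    return result


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy population: the last N2 = N - N1 units are nonrespondents.
Y = [3, 7, 5, 9, 4, 8, 6, 2]
X = [30, 70, 50, 90, 40, 80, 60, 20]
N1 = 6

Z = ranks(X)                 # ranks of the auxiliary variable
y_cut = mean(Y)              # threshold y taken as the population mean of Y
x_cut = mean(X)              # threshold x taken as the population mean of X

F_y = mean(indicator(Y, y_cut))        # population CDF of Y at y
F_x = mean(indicator(X, x_cut))        # population CDF of X at x
F2_y = mean(indicator(Y[N1:], y_cut))  # CDF of Y in the nonresponse group
X2_bar = mean(X[N1:])                  # mean of X in the nonresponse group
Z2_bar = mean(Z[N1:])                  # mean rank in the nonresponse group
```

Here the distribution function at a point is simply the mean of the corresponding indicator variable, which is why mean-type estimators carry over to the CDF.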
Here, x takes the values $\bar{X}$ and $\Theta_2(x)$, and y takes the values $\bar{Y}$ and $\Theta_2(y)$, where $\bar{X}$ and $\bar{Y}$ are the population means of X and Y, and $\Theta_2(x)$ and $\Theta_2(y)$ are the population second quartiles of X and Y, respectively.
To obtain the bias and MSE of the proposed estimator, we consider the following error terms. Here, $\hat{F}^*_H(y)$, $\hat{F}^*_H(x)$, $\bar{X}^*$, and $\bar{Z}^*$ denote the CDFs, the mean, and the mean of ranks when nonresponse occurs on both the study and auxiliary variables (situation-I), and $F_H(x)$, $\bar{X}$, and $\bar{Z}$ denote the CDF, the mean, and the mean of ranks when nonresponse occurs on the study variable only (situation-II), as shown in Table 1.
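is the coefficient of multiple determination of $I(Y \le y)$ on $I(X \le x)$ and Z under situation-II. Here, $\lambda = (1/n - 1/N)$, and $S_1^2$, $S_2^2$, $S_3^2$, and $S_4^2$ are the population variances of $I(Y \le y)$, $I(X \le x)$, X, and Z for the response group, respectively.
$S_{12}$, $S_{13}$, $S_{23}$, $S_{14}$, and $S_{24}$ are the population covariances for the response group.
$S_{12(2)}$, $S_{13(2)}$, $S_{23(2)}$, $S_{14(2)}$, and $S_{24(2)}$ are the population covariances for the nonresponse group.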
Journal of Probability and Statistics
Similarly, $\rho_{12} = S_{12}/(S_1 S_2)$, $\rho_{13} = S_{13}/(S_1 S_3)$, $\rho_{23} = S_{23}/(S_2 S_3)$, $\rho_{14} = S_{14}/(S_1 S_4)$, and $\rho_{24} = S_{24}/(S_2 S_4)$ are the population correlation coefficients for the response group, respectively. Likewise, $\rho_{12(2)} = S_{12(2)}/(S_{1(2)} S_{2(2)})$, $\rho_{13(2)} = S_{13(2)}/(S_{1(2)} S_{3(2)})$, $\rho_{23(2)} = S_{23(2)}/(S_{2(2)} S_{3(2)})$, $\rho_{14(2)} = S_{14(2)}/(S_{1(2)} S_{4(2)})$, and $\rho_{24(2)} = S_{24(2)}/(S_{2(2)} S_{4(2)})$ are the population correlation coefficients for the nonresponse group.
Let $\hat{F}_1(y) = \sum_{i=1}^{n_1} I(Y_i \le y)/n_1$ denote the sample distribution function of the $n_1$ responding units out of n units, and let $\hat{F}_{2r}(y) = \sum_{i=1}^{r} I(Y_i \le y)/r$ denote the sample distribution function of the r responding units out of the $n_2$ nonresponse units. The existing Hansen and Hurwitz [1] unbiased estimator of F(y) is denoted by $\hat{F}_H(y)$. Similarly, the unbiased estimators based on $I(X \le x)$, X, and Z are denoted by $\hat{F}_H(x)$, $\bar{X}_H$, and $\bar{Z}_H$, each with its corresponding variance.
In practice, three situations can occur under nonresponse, but here we consider the two that occur most often, namely, nonresponse on both the study variable and the auxiliary variable (say situation-I) and nonresponse on the study variable only (say situation-II). For notational convenience, we follow the notations given in Table 1.
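As an illustration of the sampling scheme and of the Hansen and Hurwitz [1] estimator, the following sketch assumes the standard weighted form, in which the sample distribution functions of the respondents and of the subsampled nonrespondents are combined with weights n1/n and n2/n. The function names, the rounding of r = n2/k to an integer, and the toy inputs are assumptions made for this example, not details taken from the paper.

```python
import random


def ecdf_at(values, y):
    """Sample distribution function: share of values that are <= y."""
    return sum(1 for v in values if v <= y) / len(values)


def draw_sample(y_values, responds, n, k, rng):
    """SRSWOR of size n, then an SRSWOR subsample of about n2/k nonrespondents."""
    chosen = rng.sample(range(len(y_values)), n)
    resp = [y_values[i] for i in chosen if responds[i]]
    nonresp = [y_values[i] for i in chosen if not responds[i]]
    # Assumption: r = n2/k is rounded to an integer of at least 1.
    r = max(1, round(len(nonresp) / k)) if nonresp else 0
    subsample = rng.sample(nonresp, r)
    return resp, subsample, len(nonresp)


def hansen_hurwitz_cdf(resp, subsample, n2, y):
    """Weighted estimate (n1/n) * F1(y) + (n2/n) * F2r(y)."""
    n1 = len(resp)
    n = n1 + n2
    f1 = ecdf_at(resp, y) if n1 else 0.0
    f2r = ecdf_at(subsample, y) if n2 else 0.0
    return (n1 / n) * f1 + (n2 / n) * f2r


# Hypothetical sample: 6 respondents, 4 nonrespondents, 2 of them re-interviewed.
estimate = hansen_hurwitz_cdf([1, 2, 3, 4, 5, 6], [2, 8], n2=4, y=4)

# Hypothetical population in which the last 25% of the units are nonrespondents.
rng = random.Random(1)
population_y = list(range(1, 41))
responds = [i < 30 for i in range(40)]
resp, subsample, n2 = draw_sample(population_y, responds, n=12, k=2, rng=rng)
simulated = hansen_hurwitz_cdf(resp, subsample, n2, y=20)
```

The subsample stands in for the whole nonresponse part of the sample, which is why it receives the full weight n2/n rather than r/n.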

Existing Estimators
In this section, some existing estimators of the finite population mean are adapted for estimating the finite population CDF under nonresponse; the biases and MSEs of these existing estimators are derived to the first order of approximation.
(1) Cochran's [17] existing ratio estimator of F(y) is denoted by $\hat{F}_R(y)$. The bias and MSE of $\hat{F}_R(y)$ are obtained to the first order of approximation.
(2) Murthy's [18] existing product estimator of F(y) is denoted by $\hat{F}_P(y)$. To the first order of approximation, the bias of $\hat{F}_P(y)$ is $\mathrm{Bias}(\hat{F}_P(y)) \cong F(y)\,\Theta_{1100}$, and its MSE is obtained to the same order.
(3) The existing regression estimator of F(y) is denoted by $\hat{F}_{Reg}(y)$ and involves an unknown constant k. Here, $\hat{F}_{Reg}(y)$ is an unbiased estimator of F(y). Its minimum variance is given in (15), which may also be written in an equivalent form.
(4) Rao's [19] existing difference-type estimator of F(y) is denoted by $\hat{F}_{R,D}(y)$, where $k_1$ and $k_2$ are unknown constants. The bias and MSE of $\hat{F}_{R,D}(y)$ are obtained to the first order of approximation. The optimum values of $k_1$ and $k_2$ are determined by minimizing (18). The resulting minimum MSE (20) may also be written in an equivalent form.
(5) Grover and Kaur's [20] existing generalized class of ratio-type exponential estimators of F(y) is denoted by $\hat{F}_{G,K}(y)$, where $k_3$ and $k_4$ are unknown constants. The bias and MSE of $\hat{F}_{G,K}(y)$ are obtained to the first order of approximation. The optimum values of $k_3$ and $k_4$ are determined by minimizing (15). The simplified minimum MSE of $\hat{F}_{G,K}(y)$ at the optimum values of $k_3$ and $k_4$ is given in (25), which may be rewritten in a form which shows that $\hat{F}_{G,K}(y)$ is more precise than $\hat{F}_4(y)$.

On similar lines, the second proposed family of estimators for estimating F(y) is denoted by $\hat{F}_{Pr2}(y)$, where $k_8$, $k_9$, and $k_{10}$ are unknown constants and $a\,(\neq 0)$ and b are either two real numbers or functions of known population parameters of $I(X \le x)$, such as $\rho_{12}$, $\beta_2$ (coefficient of kurtosis), and $C_2$. The estimator $\hat{F}_{Pr2}(y)$ can also be rewritten, and simplifying (34) while keeping terms only up to the second power of the $e_i$'s yields its first-order approximation. The simplified minimum MSE of $\hat{F}_{Pr2}(y)$ at the optimum values of $k_8$, $k_9$, and $k_{10}$ is expressed in terms of $R^2_{1.24}$, where
$R^2_{1.24} = \dfrac{\Theta_{1100}^2\,\Theta_{0002} + \Theta_{1001}^2\,\Theta_{0200} - 2\,\Theta_{1001}\,\Theta_{1100}\,\Theta_{0101}}{\Theta_{2000}\left(\Theta_{0200}\,\Theta_{0002} - \Theta_{0101}^2\right)}.$
It can be seen that $\hat{F}_{Pr2}(y)$ is more precise than $\hat{F}_{Reg}(y)$. In Table 2, we list some members of the Grover and Kaur [20] family and of the proposed families of estimators for selected choices of a and b.
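The expression for the coefficient of multiple determination above can be checked numerically. The sketch below assumes that the Θ terms behave like second-order moments (variances and covariances) of I(Y ≤ y), I(X ≤ x), and Z, so that the ratio reduces to the familiar correlation form with ρ12, ρ14, and ρ24. The toy vectors and function names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Covariance with divisor N - 1; cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def r2_from_moments(t2000, t0200, t0002, t1100, t1001, t0101):
    """R^2_{1.24} written with the Theta-type second moments."""
    numerator = t1100 ** 2 * t0002 + t1001 ** 2 * t0200 - 2 * t1001 * t1100 * t0101
    denominator = t2000 * (t0200 * t0002 - t0101 ** 2)
    return numerator / denominator


def r2_from_correlations(r12, r14, r24):
    """Same quantity written with correlation coefficients."""
    return (r12 ** 2 + r14 ** 2 - 2 * r12 * r14 * r24) / (1 - r24 ** 2)


# Hypothetical values of I(Y <= y), I(X <= x), and the ranks Z of X.
ind_y = [1, 0, 1, 0, 1, 0, 0, 1]
ind_x = [1, 0, 1, 0, 1, 1, 0, 1]
z = [2, 6, 4, 8, 3, 7, 5, 1]

s11, s22, s44 = cov(ind_y, ind_y), cov(ind_x, ind_x), cov(z, z)
s12, s14, s24 = cov(ind_y, ind_x), cov(ind_y, z), cov(ind_x, z)

r2_moments = r2_from_moments(s11, s22, s44, s12, s14, s24)
r2_corr = r2_from_correlations(
    s12 / math.sqrt(s11 * s22),
    s14 / math.sqrt(s11 * s44),
    s24 / math.sqrt(s22 * s44),
)
```

Both forms agree because any common scale factor in the second moments cancels between the numerator and the denominator.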

Empirical Study
In this section, we conduct a numerical study to assess the performance of the existing and proposed distribution function estimators. For this purpose, three populations are considered. The summary statistics of these populations are reported in Tables 3-5. The percentage relative efficiency (PRE) of an estimator $\hat{F}_i(y)$ is computed with respect to $\hat{F}_H(y)$, where $i = R, P, Reg, (R,D), \ldots, Pr2$.
The PREs of the distribution function estimators, computed from the three populations, are given in Tables 6 and 7.
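For readers who want to reproduce this kind of comparison, the sketch below assumes the usual definition of PRE, namely the variance of the baseline estimator divided by the (minimum) MSE of the competing estimator, multiplied by 100. The function name and the numerical inputs are hypothetical and are not values from Tables 6 and 7.

```python
def percentage_relative_efficiency(baseline_variance, estimator_mse):
    """PRE of an estimator relative to a baseline: 100 * Var(baseline) / MSE(estimator)."""
    if estimator_mse <= 0:
        raise ValueError("estimator_mse must be positive")
    return (baseline_variance / estimator_mse) * 100.0


# Hypothetical inputs: the baseline variance is twice the competing estimator's MSE.
pre_value = percentage_relative_efficiency(0.004, 0.002)

# An estimator with the same MSE as the baseline has a PRE of 100.
pre_same = percentage_relative_efficiency(0.004, 0.004)
```

A PRE above 100 indicates that the estimator is more precise than the baseline, and a PRE below 100 indicates that it is less precise.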
Population I (source: [21]). Y: duration of sleep of persons aged more than 50 years; X: age of the persons in years. The nonresponse units are taken to be the last 25% of the units in the population.
Population II (source: [22]). Y: the eggs produced in 1990 (millions); X: the price per dozen (cents) in 1990. The nonresponse units are taken to be the last 25% of the units in the population.

Population III (source: [22]). Y: the eggs produced in 1990 (millions); X: the price per dozen (cents) in 1991. The nonresponse units are taken to be the last 25% of the units in the population.
From the numerical results presented in Tables 6 and 7, it is observed that the PREs of all families of estimators change with the choices of a and b. It is further noted that the proposed families of estimators are more precise than the existing distribution function estimators of Hansen and Hurwitz [1], Cochran [17], Murthy [18], Rao [19], and Grover and Kaur [20] in terms of PRE under both situations.

Concluding Remarks
In this paper, we have proposed two new families of estimators for estimating the finite population distribution function. The proposed estimators require supplementary data on the sample mean and ranks of the auxiliary variable. The biases and mean squared errors of the proposed families of estimators were derived to the first order of approximation. Based on theoretical as well as numerical comparisons, it is concluded that the proposed families of estimators are more precise than their existing counterparts under situation-I and situation-II. We therefore recommend using the sample mean and ranks of the auxiliary variable with the proposed families of estimators for estimating the finite population distribution function.

Data Availability
The data used to support the numerical findings of this study are available from the corresponding author upon request. The data can also be obtained from the cited data sources.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.