Robust Regression-Ratio-Type Estimators of the Mean Utilizing Two Auxiliary Variables: A Simulation Study

Çankırı Karatekin University, Faculty of Science, Department of Statistics, Çankırı, Turkey; Ondokuz Mayıs University, Faculty of Science, Department of Statistics, Samsun, Turkey; Department of Mathematics, Usmanu Danfodiyo University, Sokoto, Nigeria; Department of Mathematics, Masinde Muliro University of Science and Technology, Kakamega, Kenya; Department of Mathematics and Statistics, PMAS Arid Agriculture University, Rawalpindi, Pakistan


Introduction
The classical ratio estimator is the most commonly used estimator of the population mean $\bar{Y}$ when the correlation between the study variable $y_i$ and the auxiliary variables $x_1$ and $x_2$ is strongly positive. However, when the dataset contains outliers, classical estimators lose efficiency and perform poorly. Several studies on ratio estimators have therefore sought to lessen the detrimental impact of outliers. Kadilar et al. [1] introduced the Huber-M estimate in place of least squares estimation (LSE) in ratio estimators. Noor-ul-Amin et al. [2] proposed using the Huber estimate instead of LSE under double sampling. Zaman and Bulut [3] improved ratio estimators by applying robust regression coefficients. Zaman [4] provided estimators of $\bar{Y}$ that combine the ratio estimators given in Zaman and Bulut [3]. Subzar et al. [5] provided ratio estimators of $\bar{Y}$ using various robust regression techniques. Shahzad et al. [6] presented ratio estimators of $\bar{Y}$ based on the Zaman and Bulut [3] ratio estimators for several cases of missing observations. Zaman and Bulut [7] presented ratio estimators using robust regression techniques and robust covariance estimates for stratified random sampling. Subzar et al. [8] presented various ratio estimators using robust regression tools. Ali et al. [9] provided robust regression-type estimators of the mean for simple random sampling. Grover and Kaur [10] improved various ratio-type estimators using robust regression tools. In this paper, we suggest using robust regression estimates, instead of LSE, in an improved estimator of $\bar{Y}$ that uses two auxiliary variables under simple random sampling.
The next section reviews ratio-type estimators of $\bar{Y}$ for simple random sampling, together with their MSEs. In Section 4, we derive the properties of the suggested estimators. Section 5 compares the efficiency of the estimator of Kadilar and Cingi [11] with that of the suggested estimators on the basis of their MSEs. An empirical study using two datasets and a simulation study are then conducted, and we obtain satisfactory results, both theoretically and numerically. The numerical results are presented in Sections 6 and 7, respectively. We conclude in the last section. The current study relies on robust estimates with two auxiliary variables under simple random sampling.

Traditional Ratio Estimator
Abu-Dayyeh et al. [12] provided the estimator in (1) of $\bar{Y}$, which uses two auxiliary variables under simple random sampling and assumes that the population means $\bar{X}_1$ and $\bar{X}_2$ are known; in (1), $\alpha_1$ and $\alpha_2$ are real numbers. Building on the ratio estimator in (1), Kadilar and Cingi [11] provided the estimator in (2), whose MSE is given in (3). In (3), $B_1$ and $B_2$ are obtained by the LS estimate, $f = n/N$, $S_{x_1}^{2}$ and $S_{x_2}^{2}$ are the population variances of $x_{1i}$ and $x_{2i}$, respectively, and $S_{yx_1}$ and $S_{yx_2}$ are the population covariances between $y_i$ and $x_{1i}$ and between $y_i$ and $x_{2i}$, respectively [11].
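The displayed forms of (1) and (2) are not reproduced in this text, so the following Python sketch rests on an assumption: it takes the regression-ratio-type estimator to be $\bar{y}\,(\bar{X}_1/\bar{x}_1)^{\alpha_1}(\bar{X}_2/\bar{x}_2)^{\alpha_2} + b_1(\bar{X}_1-\bar{x}_1) + b_2(\bar{X}_2-\bar{x}_2)$, with each $b_j$ computed as a per-variable least-squares slope. The function names and the default $\alpha_1=\alpha_2=1$ are hypothetical choices for illustration only.

```python
from statistics import mean


def ls_slope(y, x):
    """Least-squares slope s_yx / s_x^2 of y on a single auxiliary variable x."""
    y_bar, x_bar = mean(y), mean(x)
    s_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_yx / s_xx


def regression_ratio_estimate(y, x1, x2, X1_bar, X2_bar, b1, b2,
                              alpha1=1.0, alpha2=1.0):
    """Assumed form (not taken from the paper's equations):
    ybar * (X1_bar/x1bar)^alpha1 * (X2_bar/x2bar)^alpha2
         + b1 * (X1_bar - x1bar) + b2 * (X2_bar - x2bar)
    """
    y_bar, x1_bar, x2_bar = mean(y), mean(x1), mean(x2)
    ratio_part = (y_bar
                  * (X1_bar / x1_bar) ** alpha1
                  * (X2_bar / x2_bar) ** alpha2)
    regression_part = b1 * (X1_bar - x1_bar) + b2 * (X2_bar - x2_bar)
    return ratio_part + regression_part
```

Under this assumed form, the estimate reduces to the sample mean $\bar{y}$ whenever the sample means of both auxiliary variables coincide with their population means, which is a quick sanity check on any implementation.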

Robust Regression Methods
Here, we describe some important and famous robust regression methods.

Huber-M Estimation.
Huber [13] proposed a class of estimators, known as M-estimators, that are defined through different $\rho$ functions.
This estimate is based on minimizing another function of the residuals $e_i$ instead of the sum of their squares.
The objective function of M-estimation is $\min \sum_{i} \rho(e_i)$, where $\rho$ is a symmetric function of the residuals. Huber's $\rho$ function is given in (5); its derivative involves the sign function $\operatorname{sgn}(\cdot)$ and a constant value [14].

S Estimation.
Rousseeuw and Yohai [15] proposed S-estimation, a regression estimate associated with M-scales. The S-estimate is based on the residual scale of the M-estimate: it uses the residual standard deviation to overcome the weaknesses of the median, and the function to be minimized is this residual scale.
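As an illustration of the Huber-M subsection, the sketch below implements the common textbook form of Huber's $\rho$ function and its derivative $\psi$. The scaling (a factor 1/2 in the quadratic branch) and the default tuning constant `k = 1.345` are assumptions; the paper's own equation (5) may use a different but equivalent scaling.

```python
def huber_rho(e, k=1.345):
    """Textbook Huber rho: quadratic near zero, linear in the tails."""
    a = abs(e)
    if a <= k:
        return 0.5 * e * e
    return k * a - 0.5 * k * k


def huber_psi(e, k=1.345):
    """Derivative of huber_rho: e inside [-k, k], k * sgn(e) outside."""
    if abs(e) <= k:
        return e
    return k if e > 0 else -k


def m_objective(residuals, k=1.345):
    """M-estimation objective: sum of rho over the residuals."""
    return sum(huber_rho(e, k) for e in residuals)
```

Because $\rho$ grows only linearly beyond $k$, a single large residual contributes far less to `m_objective` than it does to a sum of squares, which is the source of the estimator's resistance to outliers in $y$.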

Least Trimmed Squares Estimation (LTS).
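In the LTS estimate, the squared error terms are first sorted in increasing order. Then, the sum of the first $\varphi$ sorted squared error terms is taken, so that the function to minimize is $\sum_{i=1}^{\varphi} e_{(i)}^{2}$, where $e_{(1)}^{2} \le \dots \le e_{(\tau)}^{2}$ are the ordered squared errors, $\varphi = \tau/2 + 1$, and $\tau$ is the number of observations.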
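The LTS criterion described above can be sketched in a few lines of Python. The function evaluates the criterion for a given set of residuals; it does not search over regression coefficients. Using integer division for $\tau/2$ when $\tau$ is odd is an assumption of this sketch.

```python
def lts_objective(residuals):
    """Sum of the phi smallest squared residuals, with phi = tau // 2 + 1."""
    tau = len(residuals)
    phi = tau // 2 + 1  # assumption: integer part of tau/2 when tau is odd
    squared = sorted(e * e for e in residuals)
    return sum(squared[:phi])
```

For the residuals `[1, -2, 3, 100]`, the criterion sums only the three smallest squares (1, 4, and 9), so the extreme residual 100 has no influence on the value being minimized.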

Least Median of Squares Estimation (LMS).
The LMS estimate was developed by Rousseeuw and Leroy [17]. In LMS, the median of the squared errors, $\operatorname{med}_i\, e_i^{2}$, is minimized. The estimate is robust against unusual observations in both the $x$ and $y$ directions, and its breakdown point is 0.5 [17]. For more detailed information about robust regression techniques, see Zaman and Bulut [3].
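To make the LMS criterion concrete, the following sketch fits a simple regression line by trying every line through two data points and keeping the one with the smallest median squared residual. This elemental-subset search is only an approximation to the exact LMS fit and is restricted here to a single regressor; the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_objective(residuals):
    """Median of the squared residuals."""
    return median(e * e for e in residuals)


def lms_line(x, y):
    """Approximate LMS line: best line through any two data points.

    Returns (criterion, intercept, slope) for the candidate line with
    the smallest median squared residual.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # skip vertical candidate lines
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = lms_objective(
            yk - (intercept + slope * xk) for xk, yk in zip(x, y)
        )
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best
```

With `x = [1, 2, 3, 4, 5, 6]` and `y = [2, 4, 6, 8, 10, 100]`, the search recovers the line through the five clean points, since more than half of the residuals are then exactly zero regardless of the outlying sixth observation.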

Suggested Estimators
In the presence of unusual observations, we propose four robust ratio estimators, given in (10)-(13) and based on LTS, S, LMS, and Huber-M estimation, in place of the ratio estimator stated in (2). The MSE equations of the suggested estimators $\bar{y}_{pri}$, $i = 1, 2, 3, 4$, have the same form as the MSE expression in (3), with $B_1$ and $B_2$ replaced by $B_{1\,\mathrm{rob}(k)}$ and $B_{2\,\mathrm{rob}(k)}$, whose values are computed by the LTS, S, LMS, and Huber-M estimates, respectively.
The resulting MSE equations for the suggested robust regression-ratio-type estimators are given in (14), where $k$ = LTS, S, LMS, and Huber-M.

Efficiency Comparisons
We compare the MSE of the traditional estimator, presented in (3), with the MSEs of the suggested robust regression-ratio-type estimators, given in (14), to derive the conditions under which the suggested estimators perform better than the traditional estimator with two auxiliary variables under simple random sampling.

Numerical Illustrations
In this part, we present numerical examples on two real datasets. The first dataset (Education) was collected to model expenditure on public education [18]. The second dataset (Crime) was used to predict violent crimes in states [19]. These datasets have previously been used in the literature to investigate robust techniques. The Education dataset is available in the robustbase package in R [20].
We employed four robust estimators: LTS, LMS, S, and Huber-M. We compared the efficiency of the robust techniques with that of LSE, assessing the performance of the estimators through their efficiencies relative to LSE. We used the R programming language in the implementation phase [21], with the MASS and robustreg packages for robust regression analysis [20,22]. We obtained the MSE value of each estimator for performance evaluation. The definitions of the variables in the datasets are shown in Table 1. Each dataset contains two auxiliary variables and one study variable. Table 2 gives the descriptive statistics for the real datasets.
In Tables 3-6, the covariance and correlation matrices of the variables are given for both datasets. The high positive correlations satisfy the condition for the applicability of ratio estimators.
We calculated the Mahalanobis distances to check for potential outliers. For each observation $i$, the Mahalanobis distance is calculated from the location matrix $L$ and the covariance matrix $C$. The cut-off value is $\chi^{2}_{p,(1-\alpha)/2}$, where $p$ is the number of variables and $\alpha$ represents the error level. An observation is considered a potential outlier when its distance exceeds the cut-off value.
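The outlier check described above can be sketched for the two-variable case with the Python standard library alone. Several choices here are assumptions rather than the paper's procedure: the sketch uses the classical sample mean and covariance instead of robust (MCD) estimates, works with squared distances, and compares them with the 0.975 quantile of the chi-square distribution with 2 degrees of freedom, for which the closed form $-2\ln(1-q)$ is available.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared classical Mahalanobis distances for a list of (a, b) pairs."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 covariance matrix
    distances = []
    for a, b in points:
        d1, d2 = a - m1, b - m2
        # (d1, d2) C^{-1} (d1, d2)^T written out for the 2x2 inverse
        distances.append((s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det)
    return distances


def chi2_cutoff_2df(prob=0.975):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)


def flag_outliers(points, prob=0.975):
    """Indices whose squared distance exceeds the chi-square cut-off."""
    cutoff = chi2_cutoff_2df(prob)
    return [i for i, d in enumerate(mahalanobis_sq_2d(points)) if d > cutoff]
```

With classical estimates, the largest possible squared distance is $(n-1)^2/n$, so in small samples no point can exceed the cut-off at all. This is one face of the masking effect and a reason to prefer robust location and covariance estimates for this check.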
We used minimum covariance determinant (MCD) estimators to avoid the masking effect [23]. Figures 1 and 2 show the Mahalanobis distance plots for the real datasets. The cut-off value is drawn as a straight line in the plots. The plots show that the Education and Crime datasets include some potential outliers. These two datasets have also been used in the literature to evaluate robustness issues. Table 7 gives the regression coefficients obtained using the robust estimates and LSE. The coefficients from the robust estimates differ considerably from those of LSE. Although there is a positive correlation between $x_1$ and $y$ in the Education dataset, $B_1$ is negative. This case shows the corruptive effect of outliers in the dataset. The robust estimates overcome this problem, and all regression coefficients are positive for each robust estimate.
We use the MSE values of the conventional and suggested robust regression-ratio-type estimators, as specified in Sections 2 and 4, to calculate the relative efficiency of each suggested robust regression-ratio-type estimator in (10)-(13) in comparison with the classical estimator in (2). Table 8 reports the performance of the estimators. According to the efficiencies, the suggested robust regression-ratio-type estimators are better than the estimator presented by Kadilar and Cingi [11]. In particular, the suggested robust regression-ratio-type estimators based on the S and LTS estimates produced the lowest MSE values for the Education and Crime datasets, respectively. The classical estimator has the highest MSE value compared with the four robust regression-ratio-type estimators in both datasets. These results are not surprising because condition (15) is satisfied.

Simulation
A simulation study is performed to evaluate the efficiency of the suggested robust regression-ratio-type estimators. The Epilepsy dataset is used for the simulation [24]. The dataset includes two auxiliary variables and one study variable, and contains $n = 59$ observations. The purpose of this dataset is to predict epilepsy attacks. The Epilepsy dataset is available in the "robustbase" package of the R programming language. The variables of the Epilepsy dataset are described in Table 9. Sum of Y is the study variable, and the other variables are auxiliary variables. Figure 3 shows the Mahalanobis distance plot for the Epilepsy dataset. The distances are obtained as in the previous application section. The plot shows that this dataset contains some possible outliers. We drew random samples from the dataset 10,000 times and estimated the population mean using the traditional and proposed estimators. We computed the MSE as $\mathrm{MSE} = \frac{1}{10000}\sum_{i=1}^{10000}\left(\hat{\bar{Y}}_i - \bar{Y}\right)^{2}$ (18), where $\hat{\bar{Y}}_i$ is the estimate of the mean in replication $i = 1, 2, \ldots, 10000$ and $\bar{Y}$ is the known population mean of the study variable. In Table 10, we report the MSE ratios of the suggested robust regression-ratio-type estimators with respect to the traditional estimator for each dataset. These values are obtained using (18).
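The resampling scheme described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the population is hypothetical, a single auxiliary variable is used for brevity, and the two estimators shown (the sample mean and the classical ratio estimator $\bar{y}\,\bar{X}/\bar{x}$) stand in for the traditional and proposed estimators of the paper.

```python
import random
from statistics import mean


def empirical_mse(y_pop, x_pop, estimator, n, reps=10000, seed=1):
    """Average squared error of `estimator` over repeated simple random
    samples of size n drawn without replacement from the population."""
    rng = random.Random(seed)
    Y_bar = mean(y_pop)  # population mean of the study variable
    X_bar = mean(x_pop)  # population mean of the auxiliary variable
    indices = range(len(y_pop))
    total = 0.0
    for _ in range(reps):
        sample = rng.sample(indices, n)
        y = [y_pop[i] for i in sample]
        x = [x_pop[i] for i in sample]
        total += (estimator(y, x, X_bar) - Y_bar) ** 2
    return total / reps


def sample_mean(y, x, X_bar):
    """Baseline estimator that ignores the auxiliary variable."""
    return mean(y)


def ratio_estimator(y, x, X_bar):
    """Classical ratio estimator with one auxiliary variable."""
    return mean(y) * X_bar / mean(x)


# Hypothetical population of 59 units with y roughly proportional to x.
x_pop = list(range(1, 60))
y_pop = [2 * x + ((7 * x) % 5 - 2) for x in x_pop]
mse_ratio = empirical_mse(y_pop, x_pop, ratio_estimator, n=10, reps=2000)
mse_mean = empirical_mse(y_pop, x_pop, sample_mean, n=10, reps=2000)
relative_mse = mse_ratio / mse_mean  # below 1 favors the ratio estimator
```

Fixing the seed makes the comparison reproducible, and passing the same seed to both calls means the two estimators are evaluated on the same sequence of samples.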

Mathematical Problems in Engineering
Simulations were run for sample sizes $n = 10, 20, 30, 40$. Computations were carried out in R. Table 10 gives the simulation results. According to the results, the suggested robust regression-type estimators clearly outperform the estimator presented in Kadilar and Cingi [11] for all sample sizes. In most cases, the suggested robust regression-type estimator based on the LTS estimate has the lowest MSE value. As the sample size grows, all estimators produce lower MSE values. In general, the suggested regression-type estimators based on robust techniques perform better than the traditional estimator and overcome the outlier problem. These simulation findings support the results in Table 8.
[Table 1, fragment (Crime dataset): auxiliary variable: percentage of the people who are single parents; y *: violent crimes per 100,000 people.]

Discussion
Tables 8 and 10 show that the proposed robust regression-ratio-type estimators of $\bar{Y}$ are more efficient under simple random sampling when the data contain outliers. The estimators in (10)-(13) have lower MSEs than the MSE in (3) of the traditional ratio estimator. This means that the estimators in (10)-(13) perform better than the estimator presented by Kadilar and Cingi [11]. These results have been demonstrated theoretically and are supported by both the empirical and the simulation results.

Conclusion
Traditional ratio estimators suffer from the distorting effect of outliers. In this study, we proposed robust regression-ratio-type estimators that use several robust estimates to make the estimator proposed by Kadilar and Cingi [11] robust. We aimed to improve the performance of the suggested estimators by adopting robust coefficients. We used two real datasets containing possible outliers to compare the suggested robust regression-ratio-type estimators with the traditional estimator. According to our findings, the suggested robust regression-ratio-type estimators based on all of the robust techniques have lower MSE values than the traditional estimator. The numerical results demonstrate that the suggested robust regression-ratio-type estimators are more efficient than the traditional estimator. In future work, we hope to extend the estimators presented here to other sampling designs.

Data Availability
The data used to support this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.