Minimum Covariance Determinant-Based Quantile Robust Regression-Type Estimators for Mean Parameter

Robust regression tools are commonly used to develop regression-type ratio estimators with traditional measures of location whenever data are contaminated with outliers. Recently, researchers extended this idea and developed regression-type ratio estimators through robust minimum covariance determinant (MCD) estimation. In this study, quantile regression with MCD-based measures of location is utilized and a class of quantile regression-type mean estimators is proposed. The mean squared errors (MSEs) of the proposed estimators are also obtained. The proposed estimators are compared with the reviewed class of estimators through a simulation study. We also incorporate two real-life applications. To assess the presence of outliers in these real-life applications, the Dixon chi-squared test is used. It is found that the quantile regression estimators perform better than some existing estimators.


Introduction
The use of auxiliary information in survey sampling is as ancient as survey sampling itself (Bulut and Zaman [1]). Neyman's work [2] mentioned the early works in which auxiliary information was used. The problem of improving the efficiency of parameter estimation by the use of auxiliary information has received a lot of attention in sampling theory and practice. Common examples of such methods are ratio, product, and regression estimators. Under the simple random sampling scheme, ratio and product estimation techniques are widely used. For more details about these two estimation techniques, readers may refer to Cochran [3], Murthy [4], Singh [5], and Shalabh and Tsai [6]. Furthermore, there is a wealth of literature on ratio estimators for the population mean. Studies by Koyuncu [7], Abid et al. [8,9], Irfan et al. [10], Shahzad et al. [11], Ali et al. [12], and Yadav and Zaman [13] are examples of such works.
Both ratio and product techniques have benefits and drawbacks. For instance, a ratio estimator is appropriate when the study and auxiliary variables have a positive linear relationship (correlation), whereas a product estimator is appropriate when they have a negative linear relationship. This limitation is addressed by the regression estimation technique, which yields significantly improved results for both positive and negative correlations. It should be noted that the typical regression estimator is based on the linear least squares regression coefficient. For regression estimation, interested readers may refer to Ijaz et al. [14] and Tanış et al. [15]. Linear least squares, or ordinary least squares (OLS), regression is the most conventional statistical method for parameter estimation because it is easy to implement. This method minimizes the sum of the squared differences, or residuals ($s_i$), between the observed dependent variable and the predictions made by a linear function of the independent variable. The OLS strategy produces the best estimation results for straight-line regression when the ideal conditions assumed by OLS hold. On the other hand, parameter estimates based on OLS are influenced by outliers or extreme values and, therefore, do not produce efficient results in their presence. The breakdown point of the OLS fit is $n^{-1}$ (where $n$ is the sample size), which tends to 0%, indicating that a single outlier can have a significant impact (Rousseeuw and Leroy [16]). As a result, in the presence of outliers, mean estimation based on OLS is also affected (Zaman and Bulut [17]). To overcome this issue in mean estimation, authors have used robust regression tools, robust quantile regression tools, and robust covariance matrices (see, among others, the recent works of Zaman and Toksoy [18], Zaman [19], and Zaman and Bulut [17,20], who developed robust regression techniques to limit the effect of extreme values).
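The sensitivity of OLS to a single extreme value can be illustrated with a minimal Python sketch. The data and the helper name `ols_fit` are illustrative assumptions and do not come from the populations studied here.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi for xi in x]
clean_fit = ols_fit(x, y)

# Replace one response value by an extreme value and refit.
y_contaminated = list(y)
y_contaminated[-1] = 100.0
contaminated_fit = ols_fit(x, y_contaminated)

print("clean slope:       ", round(clean_fit[1], 3))
print("contaminated slope:", round(contaminated_fit[1], 3))
```

In this toy example one contaminated observation out of ten moves the fitted slope from 2 to above 6, which is the behaviour that motivates the robust alternatives.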
In this study, we have attempted to utilize the quantile regression with minimum covariance determinant estimator-based measures of location and propose a class of quantile regression-type mean estimators.
The main parts of this study are organized as follows. First, a review of robust regression tools and the MCD-based ratio estimators is provided. The proposed class of quantile regression-ratio-type mean estimators is then introduced, along with its MSE. In addition, numerical illustrations of the existing and proposed classes of estimators are included. Finally, the conclusion is provided.

Review of MCD-Based Ratio Estimators
In attempts to find estimators for the location parameter that are efficient across a wide range of datasets, the robust approach has been developed (Hogg [21]). The purpose of the robust technique is to discover a single estimator that is efficient across a wide range of datasets, even if it is not exactly optimal for any particular population. For the purposes of this article, the following lines give a concise description of some of the robust regression tools and the MCD estimator that authors have used to overcome the issue of extreme values in mean estimation.
As robust regression tools, we considered the following:
LAD: the least absolute deviation estimator is based on minimizing the sum of the absolute errors.
LMS: the least median of squares estimator is based on minimizing the median of the squared errors (SE).
LTS: the least trimmed squares estimator is based on applying OLS to a specified number of the smallest sorted SE; hence, its computations are not affected by extreme values.
M-estimation: the M-estimation, where M stands for maximum likelihood type, is based on the minimization of an objective function $\phi(\cdot)$ of the residuals $s_i$. Some of the objective functions designed for this purpose are as follows:
Huber M-estimator: Huber [22] investigated the objective function $\phi(s_i)$ with $a = 4.685$ or 6.
Hampel M-estimator: Hampel [23] investigated the objective function $\phi(s_i)$ with $k = 1.7$, $g = 3.4$, and $a = 8.5$.
Tukey M-estimator: Tukey [24] investigated the objective function $\phi(s_i)$ with $a = 4.685$ or 6.
MM-estimator: the MM-estimator, where MM stands for modified M-estimator, is another robust regression tool developed for use in the presence of outliers. It combines the high resistance to outliers of S-estimators with the high efficiency of M-estimators. The MM-estimator is a regression M-estimator with a redescending function, with the initial values of the regression coefficients and the scale estimate coming from the S-estimator, which is based on the minimization of a robust M-estimator of scale. For details, see Yohai [25].
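The M-estimation objective functions named above have well-known textbook forms. The following Python sketch writes out the Huber and Tukey (biweight) functions for a generic tuning constant `a`. It shows the standard forms and is not necessarily the exact parameterization used in the cited works; the function names and the sample residuals are illustrative assumptions.

```python
def huber_loss(s, a):
    """Textbook Huber objective: quadratic near zero, linear in the tails."""
    if abs(s) <= a:
        return 0.5 * s * s
    return a * abs(s) - 0.5 * a * a


def tukey_loss(s, a):
    """Textbook Tukey biweight objective: bounded, constant beyond |s| = a."""
    cap = a * a / 6.0
    if abs(s) <= a:
        return cap * (1.0 - (1.0 - (s / a) ** 2) ** 3)
    return cap


# Hypothetical residuals: small, moderate, and extreme.
for s in (0.5, 2.0, 10.0):
    print(s, 0.5 * s * s, huber_loss(s, 4.685), tukey_loss(s, 4.685))
```

For the extreme residual, the squared-error contribution is 50, whereas the Huber contribution grows only linearly and the Tukey contribution is capped at `a * a / 6`, so large residuals have limited influence on the fit.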
MCD: the minimum covariance determinant estimator is one of the earliest and most robust estimators of multivariate location and spread. Although MCD was first introduced in 1984 (Rousseeuw [26,27]), the development of the computationally efficient fast MCD algorithm in 1999 (Rousseeuw and Van Driessen [28]) marked the beginning of its practical application. The MCD location estimate is the mean of the $r$ observations, with $n/2 \le r \le n$ and $r = n(1 - \text{trimming ratio})$, for which the determinant of the sample covariance matrix is as small as possible. For more details, refer to the studies by Al-Noor and Mohammad [29], Hubert et al. [30], Bulut and Zaman [1], and Zaman and Bulut [17,20].
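The definition of the MCD location estimate can be made concrete with a brute-force Python sketch for bivariate data. It enumerates every subset of size r, which is feasible only for very small n; it is not the fast MCD algorithm and applies no reweighting or consistency correction. The data points and function names are illustrative assumptions.

```python
from itertools import combinations


def mean_vector(points):
    """Componentwise mean of a list of (x, y) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def covariance_determinant(points):
    """Determinant of the 2x2 sample covariance matrix of (x, y) pairs."""
    n = len(points)
    mx, my = mean_vector(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx * syy - sxy ** 2


def mcd_location(points, r):
    """Mean of the r-subset with the smallest covariance determinant."""
    best_subset = min(combinations(points, r), key=covariance_determinant)
    return mean_vector(best_subset), best_subset


# Hypothetical data: seven points near a line plus one gross outlier.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.0), (7, 13.9),
        (4.0, 40.0)]
location, subset = mcd_location(data, r=6)
print("classical mean:", mean_vector(data))
print("MCD location:  ", location)
```

With r = 6 out of n = 8, the selected subset excludes the outlier, so the MCD location stays close to the bulk of the data while the classical mean of Y is pulled upward.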
Zaman and Bulut [17] incorporated traditional measures of location such as the coefficient of variation $\psi_x$, the coefficient of kurtosis $\psi_2(x)$, and the arithmetic mean $\bar{X}$. Because these measures are highly sensitive to departures from normality and to the presence of outliers, they calculated these characteristics through MCD estimation. They defined a family of MCD-based ratio-type estimators, $\bar{y}_{bzi}$, in which $(\bar{X}_M, \bar{Y}_M)$ are the population means and $(\bar{x}_M, \bar{y}_M)$ are the sample means when a simple random sample of size $n$ is drawn from the population. $\beta_{(rob)}$ is based on the robust regression tools discussed earlier in this section. Furthermore, $A_1$ and $A_2$ are either (0, 1) or some known population measures. The family members of $\bar{y}_{bzi}$ are provided in Table 1. The MSE of $\bar{y}_{bzi}$ involves $S_y^2$ and $S_x^2$, the unbiased variances of $Y$ and $X$, and $S_{xy}$, the covariance of $Y$ and $X$. All these quantities are calculated through MCD estimation, as in Bulut and Zaman [1].
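For orientation, regression-type ratio estimators in this literature are commonly written in the Kadilar-Cingi form, in which a regression-adjusted sample mean is rescaled by a ratio built from $A_1$ and $A_2$. The Python sketch below assumes that general form; it is an assumption made for illustration rather than the exact expression of the family $\bar{y}_{bzi}$, and the function name and numerical inputs are hypothetical.

```python
def regression_ratio_estimate(y_bar, x_bar, X_bar, beta, A1=1.0, A2=0.0):
    """Assumed form: (y_bar + beta*(X_bar - x_bar)) / (A1*x_bar + A2) * (A1*X_bar + A2).

    y_bar, x_bar : sample means of Y and X (MCD-based in the setting above)
    X_bar        : known population mean of X
    beta         : slope from a robust (or quantile) regression fit
    A1, A2       : constants, 0 or 1 or known population measures
    """
    adjusted_mean = y_bar + beta * (X_bar - x_bar)
    return adjusted_mean / (A1 * x_bar + A2) * (A1 * X_bar + A2)


# Hypothetical inputs.
ratio_form = regression_ratio_estimate(50.0, 10.0, 12.0, beta=2.0, A1=1.0, A2=0.0)
regression_form = regression_ratio_estimate(50.0, 10.0, 12.0, beta=2.0, A1=0.0, A2=1.0)
print("A1 = 1, A2 = 0:", ratio_form)
print("A1 = 0, A2 = 1:", regression_form)
```

Under this assumed form, setting $A_1 = 0$ and $A_2 = 1$ removes the ratio adjustment and leaves the plain regression estimator, while $A_1 = 1$ and $A_2 = 0$ gives the classical ratio adjustment by $\bar{X}/\bar{x}$.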

Proposed Class of Quantile Regression-Ratio-Type Estimators
Quantile regression is a variant of standard linear regression that estimates the conditional median (or another conditional quantile) of the response variable and can be used when the assumptions of linear regression are not fulfilled. Quantiles are points in a distribution that correspond to the rank order of the distribution's values. The median is the value in the sorted sample that falls in the middle (the middle quantile, $q_{50\text{th}} = 0.50$). Interested readers may refer to studies by Koenker and Bassett [31], Koenker and Hallock [32], Koenker [33], Hao et al. [34], and Korkmaz and Chesneau [35].
So, in the presence of outliers and based on equation (4), with $A_1$ and $A_2$ taken as (0, 1), we propose a class of quantile regression-ratio-type estimators in which $\bar{y}_M$, $\bar{X}_M$, and $\bar{x}_M$ are based on MCD, $\beta_{(q)}$ is the quantile regression coefficient, and $\rho_q(v)$ is a continuous piecewise linear function (or asymmetric absolute loss function), for quantile $q \in (0, 1)$, that is nondifferentiable at $v = 0$. Note that $\beta_{(q)}$ is the quantile regression coefficient for $p = 2$ variables. The MSE of the proposed family of estimators is also obtained. For the purposes of the current study, it is worth mentioning that we use the $q_{14\text{th}} = 0.14$, $q_{24\text{th}} = 0.24$, $q_{34\text{th}} = 0.34$, $q_{44\text{th}} = 0.44$, and $q_{54\text{th}} = 0.54$ quantiles. The results of the numerical study conducted in the next section show that utilizing the quantile regression coefficients based on these quantiles considerably enhances the efficiency of the proposed estimators. Note that considering these five quantiles leads us to propose a class containing five members. For the sake of readability, let us provide the five members of the proposed class with their MSEs in a compact form, as follows:
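The asymmetric absolute loss $\rho_q(v)$ has the standard Koenker-Bassett form $\rho_q(v) = v\,(q - I(v < 0))$. The following Python sketch implements it and, as a simplified illustration, minimizes the total loss over a single constant rather than over regression coefficients, which recovers a sample quantile. The data and function names are illustrative assumptions.

```python
def check_loss(v, q):
    """Asymmetric absolute loss: q*v for v >= 0 and (q - 1)*v for v < 0."""
    return v * (q - (1.0 if v < 0 else 0.0))


def total_loss(c, data, q):
    """Sum of check losses of the residuals y - c."""
    return sum(check_loss(y - c, q) for y in data)


def quantile_by_loss(data, q):
    """Constant c minimizing the total check loss; a minimizer lies at a data point."""
    return min(sorted(data), key=lambda c: total_loss(c, data, q))


# Hypothetical sample with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean:           ", sum(sample) / len(sample))
print("q = 0.50 fit:   ", quantile_by_loss(sample, 0.50))
print("q = 0.14 fit:   ", quantile_by_loss(sample, 0.14))
```

The mean of this sample is 22 because of the extreme value, while the q = 0.50 fit is the median 3. Replacing the constant by a linear function of X gives the quantile regression problem whose slope plays the role of $\beta_{(q)}$.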

Numerical Illustration
In this section, the performance of the proposed and existing estimators is assessed through two real-life applications and a simulation study.

Real-Life Applications
Population 1. In this dataset, "amount of nonreal estate farm loans during 1977" is taken as the auxiliary variable (X), while "amount of real estate farm loans during 1977" is taken as the study variable (Y). Furthermore, n = 20 is selected for N = 50. For the remaining characteristics of the population, interested readers may refer to Singh [36].

Population 2.
We use the "UScereals" dataset, which describes 65 widely available breakfast cereals in the USA, based on the information available on the mandatory food label on the packet. The measurements are normalized here to a serving size of one American cup. The data come from the ASA Statistical Graphics Exposition and are used by Venables and Ripley [37]. The dataset contains a number of variables. Here, "grams of sodium in one portion" is taken as the auxiliary variable (X), while "Number of calories" is taken as the study variable (Y). Furthermore, n = 20 is selected for N = 65. For the remaining characteristics of the population, interested readers may refer to Venables and Ripley [37].
Before applying robust regression techniques to the referenced datasets, a diagnostic check is needed; we therefore check for the presence of outliers in (X, Y) individually through the Dixon chi-squared test (DT) for outliers (Dixon [38,39]). The R package "outliers" is used for this purpose. The results are provided in Table 2. Table 2 shows significant results in terms of p values, providing a clear indication of the presence of outliers in the considered datasets. In light of the DT, we can say that traditional OLS is not suitable for our datasets. So, there is a need to incorporate techniques that can provide better results in the presence of outliers. Therefore, we apply robust and quantile regression with MCD estimation. The results associated with Populations 1 and 2 are available in Tables 3 and 4, respectively.
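As a rough illustration of this kind of diagnostic, the Python sketch below implements a chi-squared test for a single outlier: the squared deviation of the most extreme value from the sample mean, divided by the sample variance, is referred to a chi-squared distribution with one degree of freedom. That this statistic matches the test applied above is an assumption, and the data and function name are hypothetical.

```python
import math


def chisq_outlier_test(values):
    """Return (suspect value, test statistic, p value) for one suspected outlier."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    suspect = max(values, key=lambda v: abs(v - mean))
    statistic = (suspect - mean) ** 2 / variance
    # Upper tail of chi-squared with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return suspect, statistic, p_value


# Hypothetical measurements with one extreme value.
measurements = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0, 10.0, 60.0]
suspect, statistic, p_value = chisq_outlier_test(measurements)
print("suspect:", suspect, "statistic:", round(statistic, 3), "p value:", round(p_value, 4))
```

A small p value flags the most extreme observation as a likely outlier, which is the kind of evidence used above to argue against relying on OLS.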

Simulation Study
In the current subsection, the proposed and some existing estimators are assessed through Monte Carlo simulation. The simulation design is organized as follows. A random variable $X_i \sim G(2.7, 3.9)$ is generated, and the random variable $Y_i$ is defined as $Y_i = h + R X_i + \varepsilon X_i^{p}$. Here, it is assumed that $p = 1.7$, $h = 6$, $R = 2.2$, and $\varepsilon$ has a standard normal distribution, with a population of size $N = 1000$. Simple random sampling (SRS) is considered with $n = 200$, and the sampling is replicated 1000 times. We examine the empirical MSEs of $\bar{y}_{bzi}$, $\bar{y}_{reg}$, and $\bar{y}_{Ni}$, that is, the average over the replications of the squared difference between each estimate and the population mean. The results are available in Table 5.
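The simulation design can be sketched in Python as follows. Two points are assumptions: $G(2.7, 3.9)$ is read as a gamma distribution with shape 2.7 and scale 3.9, and the two estimators shown (the sample mean and the classical OLS regression estimator) are simple stand-ins, not the MCD-based estimators compared in Table 5. Seeds and function names are illustrative.

```python
import random


def simulate_population(N=1000, shape=2.7, scale=3.9, h=6.0, R=2.2, p=1.7, seed=1):
    """Generate X ~ Gamma(shape, scale) and Y = h + R*X + eps * X**p, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gammavariate(shape, scale) for _ in range(N)]
    ys = [h + R * x + rng.gauss(0.0, 1.0) * x ** p for x in xs]
    return xs, ys


def sample_mean(sx, sy, X_bar):
    """Stand-in estimator 1: plain sample mean of Y."""
    return sum(sy) / len(sy)


def ols_regression_estimator(sx, sy, X_bar):
    """Stand-in estimator 2: classical regression estimator with the OLS slope."""
    n = len(sx)
    mx, my = sum(sx) / n, sum(sy) / n
    sxx = sum((a - mx) ** 2 for a in sx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    return my + (sxy / sxx) * (X_bar - mx)


def empirical_mse(xs, ys, estimator, n=200, reps=1000, seed=2):
    """Average squared error of the estimator over repeated SRS without replacement."""
    rng = random.Random(seed)
    N = len(xs)
    X_bar, Y_bar = sum(xs) / N, sum(ys) / N
    total = 0.0
    for _ in range(reps):
        idx = rng.sample(range(N), n)
        sx = [xs[i] for i in idx]
        sy = [ys[i] for i in idx]
        total += (estimator(sx, sy, X_bar) - Y_bar) ** 2
    return total / reps


xs, ys = simulate_population()
mse_mean = empirical_mse(xs, ys, sample_mean)
mse_reg = empirical_mse(xs, ys, ols_regression_estimator)
print("empirical MSE, sample mean:         ", round(mse_mean, 3))
print("empirical MSE, regression estimator:", round(mse_reg, 3))
```

Any other estimator with the same `(sx, sy, X_bar)` signature can be passed to `empirical_mse`, so robust or quantile-based variants could be compared on the same replicated samples by reusing the same seed.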
Regarding Table 3, the MSEs of the proposed and existing estimators for Pop-1 associated with $\theta_1$ to $\theta_5$ can be ordered as follows: On the other hand, among the five values of $\theta$, the quantile estimators perform best with $\theta_1$, LAD performs best with $\theta_3$, and all other estimators perform best with $\theta_4$.

Also, among the five values of $\theta$, the quantile estimators perform best with $\theta_1$, the LAD and LMS estimators perform best with $\theta_5$, and all other estimators perform best with $\theta_4$.
Based on Table 5, the proposed estimators again record the best performance among all compared estimators in this simulation study, where the MSEs of the proposed and existing estimators associated with $\theta_1$ to $\theta_5$ can be ordered as follows: Furthermore, among the five values of $\theta$, all estimators perform best with $\theta_3$.
Overall, with real data and simulation, the proposed estimators record the best or near-best performance.
Mathematical Problems in Engineering

Conclusion
Bulut and Zaman used robust minimum covariance determinant (MCD) estimation to create a new class of robust regression-type ratio estimators. In this article, drawing inspiration from Bulut and Zaman's work, we propose using quantile regression with MCD estimator-based location measures under a simple random sampling scheme to introduce a class of quantile regression-type mean estimators for estimating the population mean in the presence of outliers. The MSEs of the proposed estimators are also obtained. The performance of the proposed and some existing robust regression estimators is assessed through simulation and two real-life applications, where the Dixon chi-squared test is used to assess the existence of outliers in the real-life datasets. Based on the numerical comparisons, the proposed estimators outperform, or nearly outperform, the existing robust estimators across the considered datasets. As a result, the proposed estimators may be of interest, and they are likely to improve the chance of obtaining more accurate estimates of the population mean when outliers exist. Also, these estimators may be developed under the ranked set sampling method, as given by Al-Omari [40], Al-Omari and Almanjahie [41], and Haq et al. [42].

Data Availability
The datasets used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.