L-Moments and Calibration-Based Estimators for Variance Parameter

Variance estimation is one of the most important topics in statistics, and it has been studied extensively because of its many applications in the human and natural sciences. Many variance estimators are built on traditional moments, which are strongly influenced by extreme values. In this paper, for data containing extreme values, we propose some new calibration estimators of the variance based on L-moments under double-stratified random sampling. A simulation study with COVID-19 data is performed to evaluate the efficiency of the proposed estimators. The results indicate that the proposed estimators are often superior to, and highly efficient compared with, the existing traditional estimator.


Introduction
Statistics has special importance among the sciences, whether natural, human, or social, because it serves human and scientific needs at many levels and through a variety of methods. These range from simply describing primary information (statistical data) to more complex and sophisticated methods of analyzing data in order to describe and explain phenomena or events, study their relationships with each other, and predict and control them in light of the laws that govern them.
That is why statistics came to be viewed as the science concerned with studying different phenomena of life by means of a set of statistical tools, for the purpose of describing and explaining them and of forming expectations about their future behavior and the directions of their development. Thus, statistics becomes an essential tool for reaching correct decisions when describing and studying phenomena through the information available or collected. In the statistical literature, additional information attached to each unit or element is referred to as auxiliary (or supplementary, ancillary, concomitant, or supporting) information.
There are several types of statistical data or auxiliary information, including data taken from documenting various actual details, statistics gathered from data available in service delivery centers, and surveys. How this information is used depends on the form in which it is available. In sampling techniques, the use of auxiliary information is long established and goes back to the pioneers of the field: Watson [1] and Cochran [2] used auxiliary information to develop estimation techniques that achieved high estimation accuracy. Later, Abid et al. [3], Ali et al. [4], Hanif and Shahzad [5], Shahzad et al. [6][7][8], and Zaman and Bulut [9,10] made important contributions on the use of auxiliary information in various ways and applications.
The key concern in all sample surveys is to find point estimates of the parameters of interest, and it is equally important to estimate the variability of these estimates. The variance of an estimator matters because it provides a measure of the quality or accuracy of the estimates, enables users to draw specific conclusions, enables statistical agencies to provide data-accuracy indicators to users, and is used to calculate confidence intervals [11]. The sampling design on which the survey is based, including the number of sampling stages, is one of the most important considerations in deciding how the variance is estimated. In this paper, we are interested in estimating the variance under double-stratified random sampling. Stratified sampling is a form of survey sampling in which the population is divided into a number of distinct groups (called strata). Each stratum is then sampled as an independent subpopulation from which individual elements are selected randomly. The strata must not overlap, so that every element of the population belongs to exactly one stratum. For more details about variance estimation and double-stratified random sampling, see Särndal et al. [12] and Al-Omari [13].
The structure of the data plays a major role in carrying out a correct analysis with the various estimation methods. The process is greatly facilitated when the data are homogeneous and free of extreme values or outliers. In practice, however, most data contain such observations, which are inconsistent with the rest of the data and distort variance estimates based on central moments. One strategy for addressing this problem, which provides a robust statistical basis for data analysis, is to adopt L-moments [14], which are computed as linear combinations of the expected values of order statistics.
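The stratified design described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names, the synthetic lognormal strata, and the sample sizes are all hypothetical. It draws an independent SRSWOR sample from each non-overlapping stratum and computes the stratum weights $W_h = N_h/N$ that are commonly used in stratified estimators.

```python
import numpy as np

def stratified_srswor(strata, n_h, rng):
    """Select an independent SRSWOR sample of size n_h[h] from each stratum h."""
    if any(n > len(pop) for pop, n in zip(strata, n_h)):
        raise ValueError("sample size exceeds stratum size")
    return [rng.choice(pop, size=n, replace=False) for pop, n in zip(strata, n_h)]

def stratum_weights(strata):
    """W_h = N_h / N, the population share of each stratum."""
    sizes = np.array([len(pop) for pop in strata], dtype=float)
    return sizes / sizes.sum()

rng = np.random.default_rng(2020)
# Hypothetical population: three non-overlapping strata of sizes 50, 40, 30.
strata = [rng.lognormal(mean=0.0, sigma=1.0, size=N) for N in (50, 40, 30)]
n_h = [10, 8, 6]

samples = stratified_srswor(strata, n_h, rng)
W = stratum_weights(strata)
print("sample sizes:", [len(s) for s in samples], "total n =", sum(n_h))
print("stratum weights:", W)
```

Because each stratum is sampled separately, the total sample size is simply the sum of the stratum sample sizes.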
In addition, calibration estimation is another popular statistical strategy for improving the accuracy of estimators; it uses auxiliary information to adjust the original design weights. The calibration methodology replaces the design weights with calibrated weights that are as close as possible to the original weights, according to a chosen distance measure, while satisfying a set of constraints built from the auxiliary information. The use of calibration estimation with survey data is credited to the pioneering work of Deville and Särndal [15], after which many related works on estimation were carried out, such as those of Clement [16] and Koyuncu [17]. In the current paper, our objective is to develop some new variance estimators based on the two strategies above, L-moments and the calibration method, under double-stratified random sampling. The rest of this article is organized as follows. First, the adapted family of estimators is presented in detail. Second, L-moments and the proposed estimators are described. Then, a simulation study based on COVID-19 data, performed to assess the performance of the proposed estimators, is given along with its results. Finally, the conclusions are provided.

Adapted Estimators
Consider two random variables, X (the auxiliary variable) and Y (the study variable), associated with a finite population Ω of N units partitioned into K non-overlapping strata, the hth of which contains $N_h$ units. From stratum h, a simple random sample of size $n_h$ is selected without replacement, such that $\sum_{h=1}^{K} n_h = n$. Additionally, for X and Y in the hth stratum, with $h = 1, 2, \ldots, K$ and $i = 1, 2, \ldots, N_h$, $(x_{hi}, y_{hi})$ denote the observed values, $(S^2_{xh}, s^2_{xh})$ and $(S^2_{yh}, s^2_{yh})$ denote the population and sample variances, and $(\beta_{2xh}, \beta_{2yh})$ denote the coefficients of kurtosis. In light of this stratified sampling design, a family of estimators [18] is adapted as given in equation (1). The mean square error of $F_{sti}$ is obtained by the Taylor series method, and its expression follows after substituting $d_{hi}$ and $\Sigma_h$ in equation (1). The members of the adapted family are provided in Table 1. Hosking and Wallis [18] employed two robust measures, the interdecile range (IDR) and the midrange (MR), which are less affected by extreme values, together with the conventional second central moment. The IDR and MR of X in stratum h are given, respectively, by $IDR_{xh} = D_{9h} - D_{1h}$ and $MR_{xh} = \{\max(x_h) + \min(x_h)\}/2$, where $D_{1h}$ and $D_{9h}$ are the first and ninth deciles of X in stratum h. Furthermore, $\rho_{xh}$ and $C_{xh}$ denote, respectively, the population correlation coefficient between X and Y and the coefficient of variation of X in stratum h.
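The stratum-level quantities named in this section can be computed directly from data. The sketch below is illustrative only: the function names and the toy data are hypothetical, the deciles use NumPy's default linear interpolation (other quantile definitions give slightly different IDR values), $\beta_2$ is taken as Pearson's $\mu_4/\mu_2^2$, and the $N-1$ divisor for $S_x$ in $C_x$ is an assumption.

```python
import numpy as np

def interdecile_range(x):
    """IDR = D9 - D1 (NumPy's default linear interpolation for the deciles)."""
    d1, d9 = np.quantile(x, [0.1, 0.9])
    return float(d9 - d1)

def midrange(x):
    """MR = (max(x) + min(x)) / 2."""
    return (float(np.max(x)) + float(np.min(x))) / 2.0

def kurtosis_beta2(x):
    """Pearson's beta_2 = mu_4 / mu_2^2 with central moments (divisor N)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)

def coef_variation(x):
    """C_x = S_x / mean(x); the N - 1 divisor for S_x is an assumption."""
    return float(np.std(x, ddof=1) / np.mean(x))

def correlation(x, y):
    """Pearson correlation coefficient rho between X and Y."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy stratum (hypothetical values, not the COVID-19 data).
x_h = np.arange(1.0, 11.0)
y_h = 3.0 * x_h + 1.0
print(interdecile_range(x_h), midrange(x_h), kurtosis_beta2(x_h),
      coef_variation(x_h), correlation(x_h, y_h))
```

In the estimators of Table 1 these quantities would be computed per stratum, for example by applying each function to the X values of stratum h.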

L-Moments and Proposed Estimator
Variance is one of the best-known and most widely used measures of dispersion. However, it depends on traditional moments, which are strongly affected by the presence of extreme values. Hosking [14] provided a solution to this problem: the use of L-moments, which are linear combinations of the expected values of order statistics. L-moments have many advantages over traditional moments. The four main advantages are that they (i) are linear functions of the data, (ii) suffer less from the effects of sampling variability, (iii) are more robust to unusual values (outliers or extreme values) that the data may contain, and (iv) allow more reliable inferences from small samples about the underlying probability distribution. For more information about robust regression methods, interested readers are referred to [19][20][21]. The population formulas for the first four L-moments of X in stratum h are linear combinations of expected order statistics, and the corresponding sample formulas are written in terms of binomial coefficients $\binom{\cdot}{\cdot}$ and $x_{h(k)}$, the kth order statistic of the sample from stratum h. Replacing x with y gives the corresponding L-moments of Y. For further information on L-moments, interested readers may refer to Hosking and Wallis [18]. Now, based on L-moments, some notation for the proposed estimators with respect to stratum h is given below.
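Hosking [14] defines the population L-moments as $\lambda_r = r^{-1}\sum_{k=0}^{r-1}(-1)^k\binom{r-1}{k}E(X_{r-k:r})$. The sketch below computes the first four sample L-moments of one stratum through the unbiased probability-weighted moments $b_r$, a form that is algebraically equivalent to the binomial-coefficient sample formulas; it is a hedged illustration, not necessarily the exact computation used in the paper, and the function name is hypothetical.

```python
import numpy as np

def sample_l_moments(x):
    """First four sample L-moments (l1, l2, l3, l4) via unbiased PWMs b_r."""
    xs = np.sort(np.asarray(x, dtype=float))   # order statistics x_(1) <= ... <= x_(n)
    n = xs.size
    if n < 4:
        raise ValueError("at least four observations are needed")
    j = np.arange(1, n + 1, dtype=float)       # ranks 1..n
    # b_r = n^{-1} sum_j [(j-1)...(j-r)] / [(n-1)...(n-r)] * x_(j)
    b0 = xs.mean()
    b1 = np.sum((j - 1) / (n - 1) * xs) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * xs) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * xs) / n
    # Linear combinations of the b_r give the sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return float(l1), float(l2), float(l3), float(l4)

l1, l2, l3, l4 = sample_l_moments([1.0, 2.0, 3.0, 4.0])
print(l1, l2, l3, l4)   # l3 and l4 vanish (up to rounding) for this symmetric sample
```

A useful check is that $l_2$ equals half the mean absolute difference over all pairs of observations, which is one reason the L-scale is less sensitive to a single extreme value than the variance, whose squared deviations amplify it.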
Analogous symbols denote the X- and Y-population and sample variances.
In this paper, a new calibration estimator of the population variance under stratified sampling is considered, as given in equation (8), where $\vartheta_h^*$ are the calibrated weights. The chi-square loss (distance) function in equation (9) and the three calibration constraints in equations (10)-(12) are combined in a Lagrange function. Minimizing the chi-square loss function in equation (9) subject to the three calibration constraints in equations (10)-(12) gives the calibrated weights under stratified sampling in equation (14). Substituting equation (14) into equations (10)-(12) yields the system of equations (15). Solving the system in equation (15) for $\lambda_s^*$, $s = 1, 2, 3$, substituting these values into equation (14), and then substituting the result into equation (8) with $\Delta_h = 1$ yields the proposed calibration regression estimator of the population variance under stratified double sampling, with regression coefficients $A_{1h(\alpha)}$, $A_{2h(\alpha)}$, and $A_{3h(\alpha)}$, respectively.
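Minimizing a chi-square distance subject to linear constraints has a closed-form solution, which makes the derivation above easy to check numerically. The sketch below is a generic linear (chi-square) calibration, assuming the loss $\sum_h (\vartheta_h^* - W_h)^2/(W_h Q_h)$ and constraints of the form $\sum_h \vartheta_h^* z_{hj} = T_j$, $j = 1, 2, 3$. The constraint variables (a constant, a sample variance of X, and a sample L-scale of X per stratum), the targets, and all numbers are hypothetical stand-ins for the constraints of equations (10)-(12); $Q_h = 1$ is used by default, by analogy with setting $\Delta_h = 1$, and the final estimate assumes the form $\sum_h \vartheta_h^* s^2_{yh}$ for equation (8).

```python
import numpy as np

def chi_square_calibration(w, Z, totals, q=None):
    """Calibrated weights minimizing sum_h (w*_h - w_h)^2 / (w_h q_h)
    subject to sum_h w*_h z_hj = T_j for every constraint column j."""
    w = np.asarray(w, dtype=float)
    Z = np.asarray(Z, dtype=float)                    # shape (K strata, p constraints)
    q = np.ones_like(w) if q is None else np.asarray(q, dtype=float)
    # Stationarity of the Lagrangian gives w*_h = w_h (1 + q_h z_h' lambda);
    # inserting this into the constraints gives
    # (sum_h w_h q_h z_h z_h') lambda = T - sum_h w_h z_h.
    M = (Z * (w * q)[:, None]).T @ Z
    lam = np.linalg.solve(M, np.asarray(totals, dtype=float) - Z.T @ w)
    return w * (1.0 + q * (Z @ lam))

# Hypothetical inputs for K = 4 strata (not the paper's data or constraints).
W = np.array([0.25, 0.30, 0.25, 0.20])            # design (stratum) weights
s2_x = np.array([4.0, 6.5, 3.2, 5.1])             # sample variances of X
l2_x = np.array([1.1, 1.5, 0.9, 1.3])             # sample L-scale of X
Z = np.column_stack([np.ones(4), s2_x, l2_x])     # three constraint variables
T = np.array([1.0, 4.9, 1.25])                    # hypothetical known targets

w_star = chi_square_calibration(W, Z, T)
s2_y = np.array([2.0, 3.1, 1.7, 2.6])             # sample variances of Y
print("calibrated weights:", w_star.round(4))
print("calibration estimate of the variance:", float(w_star @ s2_y))
```

Two properties are worth noting: if the design weights already satisfy the constraints, the calibrated weights equal the design weights, and under the chi-square distance some calibrated weights can become negative when the targets are far from the design-weighted values.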

Simulation Study
In this section, the performance of the proposed estimators will be evaluated by conducting a simulation study based on COVID-19 data.
The coronavirus that causes COVID-19 was discovered in December 2019 in Wuhan, China, and has affected almost every country in the world. The world's population has been suffering from the disease for the last 20 months, and the pandemic is not over yet. Coronaviruses are a broad family of viruses that cause disease in humans and animals.
There are seven continents in the world: Asia, Africa, Europe, North America, South America, Australia, and Antarctica. Asia is the most populous continent, followed by Africa, Europe, and North America in second, third, and fourth place, respectively. These four continents consist, respectively, of 49, 57, 48, and 39 countries. Furthermore, of the world's total population of 7.8 billion, 93.93% live on these four continents. That is why we focus on COVID-19 data for these four main continents (strata): Africa (stratum I), Asia (stratum II), Europe (stratum III), and North America (stratum IV), compiled from the website https://www.worldometers.info. The data consist of two variables: X = the total number of infections (total cases) per million population and Y = the total number of deaths per million population.
For each stratum, we draw a scatter plot (see Figures 1-4). These plots clearly show the presence of extreme values, which makes the data suitable for our proposed estimators. Some essential characteristics of the COVID-19 data are listed in Table 2.
The simulation steps can be summarized as follows:
Step 1: using SRSWOR, select a random sample of size $n_h$ from stratum h.
Step 4: compute the MSE (mean square error) of each estimator.
Step 5: compute the PRE (percentage relative efficiency) of each estimator with respect to the traditional estimator; the PRE values are listed in Table 3.
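The Monte Carlo loop behind these steps can be sketched as follows, assuming the usual definitions $MSE = R^{-1}\sum_{r=1}^{R}(\hat{\theta}_r - \theta)^2$ over R replications and $PRE = 100 \times MSE(\text{reference})/MSE(\text{proposed})$. The target $\theta = \sum_h W_h S^2_{yh}$, the traditional estimator $\sum_h W_h s^2_{yh}$, the synthetic lognormal strata (whose sizes mirror the country counts quoted above), and the function names are all hypothetical choices for illustration; the paper's Steps 2 and 3 and its proposed estimators are not reproduced.

```python
import numpy as np

def mse(estimates, true_value):
    """Monte Carlo mean square error: average squared deviation from the target."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_value) ** 2))

def pre(mse_reference, mse_candidate):
    """Percentage relative efficiency; values above 100 favour the candidate."""
    return 100.0 * mse_reference / mse_candidate

def simulate(strata_y, n_h, estimator, reps, rng):
    """Draw an SRSWOR sample in every stratum `reps` times and apply `estimator`."""
    results = np.empty(reps)
    for r in range(reps):
        samples = [rng.choice(pop, size=n, replace=False)
                   for pop, n in zip(strata_y, n_h)]
        results[r] = estimator(samples)
    return results

rng = np.random.default_rng(1)
# Hypothetical skewed strata standing in for the four continents.
strata_y = [rng.lognormal(1.0, 0.8, size=N) for N in (57, 49, 48, 39)]
n_h = [15, 13, 12, 10]
N_h = np.array([len(pop) for pop in strata_y], dtype=float)
W = N_h / N_h.sum()

# Hypothetical target: weighted within-stratum population variance (N_h - 1 divisor).
target = float(np.sum(W * np.array([np.var(pop, ddof=1) for pop in strata_y])))

def traditional(samples):
    """Traditional stratified estimator: sum_h W_h s_yh^2."""
    return float(np.sum(W * np.array([np.var(s, ddof=1) for s in samples])))

mse_trad = mse(simulate(strata_y, n_h, traditional, 2000, rng), target)
print(f"MSE(traditional) = {mse_trad:.4f}")
```

For any candidate estimator function `f` with the same signature, `pre(mse_trad, mse(simulate(strata_y, n_h, f, 2000, rng), target))` gives its PRE relative to the traditional estimator.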

Conclusions
The statistical analysis of data, as a comprehensive description, is not complete once its shape is determined or the measure of central tendency that fits it is identified; it also requires determining the degree of spread of the observations with an appropriate measure of dispersion.
Variance is one of the best-known and most widely used measures of dispersion. Many variance estimators are built on traditional moments, which are strongly influenced by extreme values. In this paper, for data containing extreme values, we proposed some new estimators of the population variance based on L-moments and calibration strategies under double-stratified random sampling. A simulation study with COVID-19 data for the period January 22, 2020, to August 23, 2020, was performed to evaluate the efficiency of the new estimators. The percentage relative efficiency (PRE) results indicate that the proposed estimators are often superior and highly efficient (PRE > 100) compared with the existing traditional estimator, since higher PRE values indicate better performance. Furthermore, the proposed estimators $F_{st5}$ and $F_{st6}$, which are built on the coefficient of variation, achieve the highest efficiency among the estimators considered.

Data Availability
The datasets used to support the findings of this study are included within the article.