Mean Empirical Likelihood Inference for Response Mean with Data Missing at Random

We extend mean empirical likelihood inference to the response mean with data missing at random. The empirical likelihood ratio confidence regions perform poorly when the response is missing at random, especially when the covariate is high-dimensional and the sample size is small. Hence, we develop three bias-corrected mean empirical likelihood approaches to obtain efficient inference for the response mean. From the three bias-corrected estimating equations, we obtain a new set of estimating equations by producing a pairwise-mean dataset. The method increases the size of the sample used for estimation and reduces the impact of the curse of dimensionality. Consistency and asymptotic normality of the maximum mean empirical likelihood estimators are established. The finite sample performance of the proposed estimators is presented through simulation, and an application to the Boston Housing dataset is shown.


Introduction
Missing data problems exist widely in the social sciences, political science, medical research, and many other fields.
There are different missing patterns, including single variable nonresponse, multivariate nonresponse, monotone nonresponse, and general nonresponse, and there are three types of missing mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Under MAR, missingness depends only on the observed data. Let Y be the value of a response that is subject to nonresponse and let X be a d-dimensional vector of auxiliary variables that are fully observed. Y_o denotes the part of Y that is observed, and Y_m denotes the part of Y that is missing. We are interested in the mean of Y, i.e., μ = E(Y). Let M be the indicator for Y, where M = 1 if Y is observed and M = 0 otherwise. Under the MAR mechanism, the missingness of Y is related only to (Y_o, X) and has nothing to do with Y_m.
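To make the MAR mechanism concrete, the following minimal sketch simulates a response whose observation probability depends only on the fully observed covariate. The regression model, the logistic propensity, and the function name `simulate_mar` are illustrative assumptions, not the models used in this paper. It shows why the complete-case mean is generally biased under MAR.

```python
import math
import random

def simulate_mar(n, seed=0):
    """Draw (x, y, m) triples where y is missing at random given x.

    The model below is a hypothetical example: Y = 1 + X + error, and the
    probability that Y is observed depends on X only (never on Y itself).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + x + rng.gauss(0.0, 1.0)              # true mean of Y is 1
        p_obs = 1.0 / (1.0 + math.exp(-(0.5 + x)))     # P(M = 1 | X = x)
        m = 1 if rng.random() < p_obs else 0
        data.append((x, y, m))
    return data

data = simulate_mar(20000)
full_mean = sum(y for _, y, _ in data) / len(data)
observed = [y for _, y, m in data if m == 1]
cc_mean = sum(observed) / len(observed)                # complete-case mean
print(f"full-data mean {full_mean:.3f}, complete-case mean {cc_mean:.3f}")
```

Because units with large X are observed more often and also tend to have large Y, the complete-case mean overshoots the full-data mean in this sketch, which is the bias that the corrections discussed later are meant to remove.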
Empirical likelihood (EL) [1,2], which is widely used for nonparametric and semiparametric statistical inference, is a competitive and powerful method for constructing confidence intervals (CIs). The empirical log-likelihood ratio (ELR) usually has an asymptotic chi-squared distribution, and the CI based on EL has many excellent properties, as shown by Owen [3]. Due to these properties, Wang and Rao [4], Liang et al. [5], and Stute et al. [6] extended the EL to missing data by imputing the missing values with a kernel regression function of the observed data. For example, the bias-corrected EL method was constructed via inverse propensity weighting (IPW), mean imputation (MI), and augmented inverse propensity weighting (AIPW). Guan et al. [7] proposed an efficient estimator of the response mean constructed using the estimated response probabilities, a new method based on semiparametric maximum likelihood inference. Xie and Zhang [8] constructed empirical likelihood confidence regions for regression parameters, without and with constraints, by unconstrained and constrained empirical likelihood ratio statistics. Wang and Deng [9] proposed a dimension-reduced empirical likelihood, a two-stage estimation procedure that applies the sufficient dimension reduction (SDR) technique in the kernel estimation of the propensity as well as the conditional mean response function. The method avoids the well-known curse of dimensionality in the multivariate situation, but it works only with a large sample. Many studies have been devoted to improving the accuracy of empirical likelihood ratio confidence regions for small samples and multidimensional situations. Hall and Scala [10], DiCiccio et al. [11], and Tsao [12] found that empirical likelihood ratio confidence intervals could have poor accuracy, especially in small sample and multidimensional situations. For confidence estimation, DiCiccio et al. [11], Chen et al. [13], and Abraham and Wu [14] proposed the Bartlett-corrected empirical likelihood (BEL), the adjusted empirical likelihood, and the extended empirical likelihood (EEL), respectively. However, these methods require the calculation of the Bartlett correction constant, which is difficult to compute. So, Liang et al. [15] proposed a new method named mean empirical likelihood, which constructs an empirical likelihood function based on a set of pseudodata and is easy to compute. Therefore, we need to develop a new empirical likelihood inference method for the response mean with data missing at random in the case of a small sample and multidimensional data.
Let V be the pairwise-mean dataset. We will introduce a new empirical likelihood method named mean empirical likelihood (MEL). It constructs an empirical likelihood function based on a set of pseudodata and is easy to compute. The large sample properties of MEL are presented by Liang et al. [15]. Their simulation studies indicate that the confidence regions by MEL are much more accurate than those by other empirical likelihood methods. Larger coverage probability and smaller average interval length are other significant advantages of MEL. Nonetheless, there are two main problems in inference for the response mean with data missing at random. First, the sample size is often so small that the asymptotic normal approximation is unreliable. Second, the covariate may be high dimensional. In these cases, the EL method is ineffective. MEL can address both problems by increasing the size of the sample used for estimation.
In this paper, we propose a new empirical likelihood to make efficient inference on the response mean, especially for the confidence region of the response mean, by applying mean empirical likelihood inference to small sample sizes and multidimensional variable situations. Specifically, we first apply the mean empirical likelihood (MEL) method (see [15]) to the response mean with data missing at random. We construct three bias-corrected nonparametric MEL functions from the pairwise-mean dataset based on the IPW, MI, and AIPW approaches. We show that the resulting three MEL methods yield asymptotically equivalent estimators that achieve the desirable asymptotic unbiasedness and asymptotic normality. After a scale factor is applied to the MELR functions based on IPW or MI, the proposed three MELR functions are shown to be asymptotically standard chi-squared with (d − 1) degrees of freedom, where d is the rank of the covariance matrix of the pairwise-mean dataset. Simulation results show that the proposed method not only has more accurate coverage probabilities, smaller average lengths, and smaller standard deviations but also has efficient point estimators when the sample size is only 40. This paper is organized as follows. Section 2 presents three types of nonparametric MELRs. Then, we introduce our main idea and establish a number of asymptotic properties. The simulation studies are given in Section 3. Section 4 provides a real data analysis, and the paper concludes with a discussion in Section 5.

[…] (1), we make the MAR assumption about the full data, and the missing pattern is general nonresponse. Furthermore, to adjust the weight, we introduce the probability function π(X).

Mean Empirical
The unspecified functions π(X) and m(X) can be estimated by the kernel regression estimators in equation (2), in which K(·) is a symmetric kernel function and h is a bandwidth.
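As an illustration of kernel regression estimation of π(X) and m(X), the sketch below uses Nadaraya-Watson estimators with a product Gaussian kernel. This is an assumed, commonly used form, not necessarily the exact expression of equation (2); the function names `pi_hat` and `m_hat` and the choice of kernel are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Symmetric product Gaussian kernel evaluated at the vector u."""
    return math.exp(-0.5 * sum(v * v for v in u))

def kernel_weights(x0, xs, h):
    """Kernel weights K((x0 - X_j) / h) for every sample point X_j."""
    return [gaussian_kernel([(a - b) / h for a, b in zip(x0, xj)]) for xj in xs]

def pi_hat(x0, xs, deltas, h):
    """Nadaraya-Watson estimate of the propensity P(delta = 1 | X = x0)."""
    w = kernel_weights(x0, xs, h)
    return sum(wj * dj for wj, dj in zip(w, deltas)) / sum(w)

def m_hat(x0, xs, ys, deltas, h):
    """Nadaraya-Watson estimate of E(Y | X = x0) from observed responses only.

    Missing responses must be stored as a finite placeholder such as 0.0,
    because they are multiplied by delta = 0 and so never contribute.
    """
    w = kernel_weights(x0, xs, h)
    num = sum(wj * dj * yj for wj, dj, yj in zip(w, deltas, ys))
    den = sum(wj * dj for wj, dj in zip(w, deltas))
    return num / den

# Toy usage: four one-dimensional covariates, third response missing.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [0.0, 1.0, 0.0, 3.0]
deltas = [1, 1, 0, 1]
print(pi_hat((1.5,), xs, deltas, h=1.0), m_hat((1.5,), xs, ys, deltas, h=1.0))
```

The bandwidth h is passed in explicitly because its selection is a separate modelling choice that this sketch does not address.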
Next, we introduce three bias-corrected estimating equations using the nonparametric kernel estimators π(X) and m(X) in equation (2) for handling missing data as follows.
(i) Nonparametric inverse propensity weighting (IPW) approach: this method assigns each observed Y_i a weight proportional to the inverse of the estimated propensity π(X_i) in (2).
(ii) Nonparametric mean imputation (MI) approach: for each X_i with δ_i = 0, we use the estimator m(X_i) in equation (2) to impute the missing response, which gives a nonparametric imputation estimating equation for μ.
(iii) Nonparametric augmented inverse propensity weighting (AIPW) approach: we combine the nonparametric IPW and MI approaches, leading to the nonparametric AIPW estimating equation for μ.

In general, we use the empirical likelihood method to estimate the response mean. Let ϕ_li(μ), l = 1, 2, 3, denote the estimating functions of the IPW, MI, and AIPW approaches, and let p_li represent the probability weight allocated to ϕ_li(μ) for l = 1, 2, 3, i = 1, . . . , n. The profile empirical log-likelihood ratio function for μ with data missing at random is defined from the ϕ_li(μ) and the weights p_li. However, when the sample size is less than 100 or the data are high dimensional, its finite sample properties may be poor because of the low precision of the χ² approximation. Hence, the efficiency of the EL method for the response mean with data missing at random is low. In particular, the estimated coverage probability and average length are far from their theoretical values. The mean empirical likelihood can deal with small sample and multidimensional situations of the response mean problem with data missing at random.
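The three approaches can be sketched in code as follows. The forms used here are the standard IPW, MI, and AIPW terms from the missing-data literature and are an assumption, since they may differ in detail from the paper's own equations; the helper names and the toy numbers are hypothetical. Each term Z_li is written so that ϕ_li(μ) = Z_li − μ.

```python
def bias_corrected_terms(ys, deltas, pis, ms):
    """Per-unit terms Z_li with phi_li(mu) = Z_li - mu (assumed standard forms).

    ys     : responses, with a finite placeholder (e.g. 0.0) where Y is missing
    deltas : 1 if Y_i is observed, 0 otherwise
    pis    : estimated propensities pi(X_i)
    ms     : estimated conditional means m(X_i)
    """
    ipw = [d * y / p for y, d, p in zip(ys, deltas, pis)]
    mi = [d * y + (1 - d) * m for y, d, m in zip(ys, deltas, ms)]
    aipw = [d * y / p + (1 - d / p) * m
            for y, d, p, m in zip(ys, deltas, pis, ms)]
    return {"ipw": ipw, "mi": mi, "aipw": aipw}

def point_estimates(terms):
    """Solve sum_i phi_li(mu) = 0 for each approach, i.e. average the terms."""
    return {name: sum(z) / len(z) for name, z in terms.items()}

# Toy usage with hypothetical fitted values.
ys = [2.0, 0.0, 4.0, 0.0]        # 0.0 is a placeholder where Y is missing
deltas = [1, 0, 1, 0]
pis = [0.5, 0.5, 0.5, 0.5]
ms = [2.5, 3.0, 3.5, 3.0]
estimates = point_estimates(bias_corrected_terms(ys, deltas, pis, ms))
print(estimates)
```

In this toy example all three estimates coincide; with real data they generally differ, and the AIPW term stays consistent if either the propensity or the conditional mean is estimated well.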

Mean Empirical Likelihood for Response Mean with Data Missing at Random.
In this paper, we apply the MEL to estimate the response mean with data missing at random in small sample and high-dimensional covariate situations.
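A pairwise-mean dataset can be built as in the minimal sketch below. It assumes the pseudodata are the averages (Z_i + Z_j)/2 over all pairs with i ≤ j, which gives n(n + 1)/2 points from n original values; whether a pair of a value with itself is included is an assumption here, so the hypothetical `include_self` flag exposes both variants.

```python
from itertools import combinations, combinations_with_replacement

def pairwise_means(values, include_self=True):
    """Return the averages of all pairs of values.

    include_self=True  uses pairs with i <= j, giving n(n + 1)/2 pseudodata.
    include_self=False uses pairs with i <  j, giving n(n - 1)/2 pseudodata.
    """
    if include_self:
        pairs = combinations_with_replacement(values, 2)
    else:
        pairs = combinations(values, 2)
    return [(a + b) / 2.0 for a, b in pairs]

pseudo = pairwise_means([1.0, 3.0, 5.0])
print(len(pseudo), sum(pseudo) / len(pseudo))
```

The pseudodata keep the mean of the original values while their number grows quadratically in n, which is the sense in which the sample used for estimation is enlarged.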
Based on this pairwise-mean dataset, the empirical log-likelihood ratio for μ is defined in the same way as before, with the pairwise means in place of the original sample; it is named the mean empirical likelihood ratio (MELR). Let θ_0 be the unknown true parameter value. We now have the following main theorems.

Theorem 1. Under the conditions listed in the Appendix, as
where l = 1, 2, 3 denotes IPW, MI, and AIPW, respectively, and ⟶_L denotes convergence in distribution.

Theorem 2. Under the conditions listed in the Appendix, as
where χ²_1 is a χ²-distributed random variable with one degree of freedom,

Theorem 3. Under the conditions that Cov
Proof. See Appendix. □
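Following Theorem 1, a confidence region for the parameter θ with asymptotic coverage probability 1 − α can be defined as the set of θ whose mean empirical log-likelihood ratio does not exceed the corresponding 1 − α quantile of the limiting distribution.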
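The inversion of an empirical log-likelihood ratio into a confidence interval can be sketched as follows for a scalar mean. The code implements the standard empirical likelihood ratio of Owen for a list of values, which may be original terms or pairwise-mean pseudodata. The `scale` argument is a hypothetical placeholder for any calibration factor required by the theorems, which is not reproduced here, and the function names are illustrative.

```python
import math

CHI2_1_95 = 3.841458820694124   # 0.95 quantile of chi-square with 1 df

def el_log_ratio(z, mu):
    """-2 log empirical likelihood ratio for the mean of z evaluated at mu."""
    g = [zi - mu for zi in z]
    if min(g) >= 0.0 or max(g) <= 0.0:
        return float("inf")     # mu is not strictly inside the data range
    # The multiplier lam must keep every weight positive: 1 + lam * g_i > 0.
    lo = (-1.0 / max(g)) * (1.0 - 1e-10)
    hi = (-1.0 / min(g)) * (1.0 - 1e-10)
    lam = 0.0
    for _ in range(200):        # bisection; the score is decreasing in lam
        lam = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + lam * gi) for gi in g)
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    return 2.0 * sum(math.log(1.0 + lam * gi) for gi in g)

def el_interval(z, cutoff=CHI2_1_95, scale=1.0, iters=60):
    """Interval {mu : scale * el_log_ratio(z, mu) <= cutoff} found by bisection."""
    center = sum(z) / len(z)

    def edge(outer):
        inner = center          # the ratio is 0 at the sample mean
        for _ in range(iters):
            mid = 0.5 * (inner + outer)
            if scale * el_log_ratio(z, mid) <= cutoff:
                inner = mid
            else:
                outer = mid
        return inner

    return edge(min(z)), edge(max(z))

lower, upper = el_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(lower, upper)
```

Bisection is used for both the Lagrange multiplier and the interval endpoints because each search is over a monotone scalar function, which keeps the sketch free of external solvers.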

Simulation Studies
We present two simulation studies in this section to examine the finite sample performance of the different MEL methods in Section 2. We compare the MEL with the original empirical likelihood (OEL) for the response mean with data missing at random. For the one-dimensional covariate model, the estimating equation g(X, θ) is equal to the estimating equation ϕ_li(Y_i, μ). For the multivariate covariate model, we compare the performance for different π(X_i), correlation structures, and sample sizes.
Discrete Dynamics in Nature and Society

Simulation 1: Single Covariate Model.
Suppose that g(X, θ) = ϕ_l(Y, μ) = Y − μ is the estimating equation for μ. We aim to compare the confidence intervals derived from MEL and OEL for a given sample size n. We consider different scenarios by generating observations X_1, X_2, . . . , X_n from N(0, 1).
There are four indicators for evaluating method performance.
The bias is the difference between the estimate and the true value, which measures the accuracy of the point estimators. The coverage probability is the probability that the confidence interval contains the true value, which measures the reliability of the interval estimators. The average length is the average length of the confidence intervals, which measures the precision of the interval estimators, and the standard deviation is the degree of dispersion of the estimates, which measures the stability of the point estimators.
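The four indicators can be computed from simulation replicates as in the sketch below. It assumes each replicate is stored as a tuple (estimate, lower limit, upper limit); the function name `summarize` and this storage layout are hypothetical conveniences.

```python
import statistics

def summarize(results, truth):
    """Bias, SD, coverage probability, and average length over replicates.

    results : list of (estimate, lower, upper) tuples, one per replicate
    truth   : true value of the parameter
    """
    estimates = [est for est, _, _ in results]
    covered = [1.0 if lo <= truth <= hi else 0.0 for _, lo, hi in results]
    lengths = [hi - lo for _, lo, hi in results]
    return {
        "bias": statistics.fmean(estimates) - truth,
        "sd": statistics.stdev(estimates),
        "cp": statistics.fmean(covered),
        "al": statistics.fmean(lengths),
    }

# Toy usage with three hypothetical replicates and true value 1.0.
summary = summarize([(1.0, 0.0, 2.0), (1.2, 0.7, 1.7), (0.8, 0.5, 0.9)], 1.0)
print(summary)
```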
Based on 5000 replicates, the bias, coverage proportions, average length, and standard deviation were calculated. The simulation results are presented in Table 1. Table 1 shows the confidence region characteristics of different empirical likelihood methods when X_i follows the one-dimensional normal distribution with mean 1 and variance 1. From Tables 1 and 2, we conclude the following:
(1) Comparison for different sample sizes: as the sample size increases, the coverage probabilities estimated by OEL and MEL increase while n is small and decrease afterwards. However, the bias estimated by the AIPW and MI methods is reduced significantly, and the bias estimated by the complete-case and IPW methods changes little
(2) Comparison of different estimation methods: MEL is much better than OEL. All coverage probabilities estimated by MEL are closer to the nominal levels than those of the OEL method. All standard deviations by MEL are much smaller than those of the OEL method. All average lengths by MEL are much smaller than those of the OEL method
(3) Comparison for different estimating equations: the point estimator based on MI is superior to those based on the other estimating equations. In particular, the coverage probabilities estimated by the MI method perform very well in all simulations. In terms of accuracy, the MI method is superior to the AIPW method, and the AIPW method is superior to the IPW method
(4) Comparison for different error situations: the average length estimated by OEL and MEL with ε_i ∼ N(0, 1 + 0.5X²) is much longer and more stable than that with ε_i ∼ N(0, 1). But the standard deviation estimated by OEL and MEL with ε_i ∼ N(0, 1 + 0.5X²) is much bigger than that with ε_i ∼ N(0, 1). Hence, when the errors are heteroscedastic with ε_i ∼ N(0, 1 + 0.5X²), the MEL method for the response mean also performs well
In a word, IPW, MI, and AIPW based on MEL outperform the same methods based on OEL, and AIPW based on MEL achieves the most accurate results.

Simulation 2: Multivariate Covariate Model.
In simulation 2, the X_i's follow a 6-dimensional normal distribution with mean 0 and covariance matrix Γ. We consider two error structures:
(i) The first has homoscedastic errors, where ε_i and X_i are independent and σ²(X_i) = 1
(ii) The second has heteroscedastic errors, for i = 1, . . . , n
We generate δ_i from the Bernoulli distribution with probability π(X_i) and consider three choices of π(X_i), labeled M1-M3. The coefficients in the propensity models are chosen so that the unconditional rates of missing data are between 20% and 40%. Because Y_i is linear and π(X_i) is logistic linear under M1-M3, S_Y and S_δ under M1-M3 are one-dimensional.
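For illustration, 6-dimensional normal covariates with a common pairwise correlation can be generated as sketched below. The exact form of Γ is not reproduced above, so the equicorrelated structure with unit variances and correlation `rho` is an assumption, as are the function names.

```python
import math
import random

def equicorrelated_normal(n, d, rho, seed=0):
    """Draw n vectors from N(0, Gamma) with unit variances and correlation rho.

    Uses X_j = sqrt(rho) * W + sqrt(1 - rho) * Z_j with independent standard
    normal W and Z_j, which requires 0 <= rho <= 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        rows.append([math.sqrt(rho) * shared
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(d)])
    return rows

def sample_corr(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rows = equicorrelated_normal(20000, 6, 0.5)
corr_01 = sample_corr([r[0] for r in rows], [r[1] for r in rows])
print(round(corr_01, 3))
```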
In addition, we also conduct two simulations with four correlation structures:
(i) Independent covariates, where the correlation coefficient is 0
(ii) Slightly correlated covariates, where the correlation coefficient is 0.2
(iii) Moderately correlated covariates, where the correlation coefficient is 0.5
(iv) Strongly correlated covariates, where the correlation coefficient is 0.8
All results are based on 1000 replications, and the sample sizes are n = 20 and n = 40. Tables 3 and 4 show the biases and standard deviations (SDs) of the point estimators, and the average lengths (ALs) and coverage probabilities (CPs) of the confidence intervals (CIs) at the nominal level of 95%. The CIs based on the mean empirical likelihood ratio L_M(μ) are obtained by I_M in equation (15). The CIs based on μ = Y and μ_cc are obtained by the normal approximation (μ − 1.96σ_se, μ + 1.96σ_se), with the standard error σ_se obtained as the square root of the bootstrap variance estimator based on n/4 bootstrap replications, where n is the sample size.
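The normal-approximation interval with a bootstrap standard error can be sketched as follows for a sample mean. The resampling scheme (nonparametric bootstrap of the values with replacement) and the function name are assumptions made for illustration; the number of bootstrap replications is passed in so that n/4 can be used as described.

```python
import random
import statistics

def bootstrap_normal_ci(values, n_boot, seed=0, z=1.96):
    """Normal-approximation CI for the mean with a bootstrap standard error.

    values : observed values entering the estimator
    n_boot : number of bootstrap replications (at least 2)
    """
    rng = random.Random(seed)
    n = len(values)
    estimate = statistics.fmean(values)
    # Each replication resamples n values with replacement and averages them.
    boot_means = [statistics.fmean(rng.choices(values, k=n))
                  for _ in range(n_boot)]
    se = statistics.stdev(boot_means)   # square root of the bootstrap variance
    return estimate - z * se, estimate + z * se

values = [float(i % 7) for i in range(40)]          # toy data with n = 40
ci_low, ci_high = bootstrap_normal_ci(values, n_boot=max(2, len(values) // 4))
print(ci_low, ci_high)
```

With so few replications the standard error itself is noisy, so in practice one might check the sensitivity of the interval to `n_boot`.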
From Tables 3 and 4, we can see the following.
(1) Comparison for different sample sizes: when the sample size increases, the estimators μ_ipw, μ_mi, and μ_aipw have small biases, the coverage probability is large, and the AL and the SD are small. So, the finite sample performance of the mean empirical likelihood is comparable to that of the original empirical likelihood
(2) Comparison for different estimation methods: the estimators μ_ipw, μ_mi, and μ_aipw based on MEL have point estimates similar to those based on OEL. In addition, the estimators μ_ipw, μ_mi, and μ_aipw based on MEL have bigger CPs and smaller SDs than those based on OEL. The ALs based on MEL are comparable and close to those of the method based on Y
(3) Comparison for different estimating equations: as to the bias, the estimator μ_aipw is closer to the true value than μ_ipw and μ_mi. The CP of the estimator μ_aipw is better than those of μ_ipw and μ_mi in many cases. The AL of the estimator μ_ipw is better than those of μ_aipw and μ_mi in many cases
(4) Comparison for different error situations: the variation range of the bias estimated by OEL and MEL with ε_i ∼ N(0, 1 + 0.5X²) is much smaller and more stable than that with ε_i ∼ N(0, 1). But the standard deviation estimated by OEL and MEL with ε_i ∼ N(0, 1 + 0.5X²) is close to that with ε_i ∼ N(0, 1)
(5) Comparison for different correlation structures: when ρ is large, the biases based on MEL are not stable but are comparable to those of μ_cc. But the CPs and the ALs still perform well, and the ALs are stable. When ρ = 0 and ρ = 0.2, the MEL method performs better than the OEL method
In a word, IPW, MI, and AIPW based on MEL outperform the same methods based on OEL, and AIPW based on MEL achieves the most accurate results.

Real Data Analysis
In this section, we apply the proposed method to the Boston Housing data to illustrate our proposed MEL for missing data.
The data are taken from the UCI Machine Learning Repository. The distribution of the per capita crime rate by town (CRIM) is unknown. Inference on the CRIM distribution based on the full data has been analyzed by Liang et al. [15]. The dataset consists of 506 observations and 14 variates, with CRIM being the response variate and the other 11 being the covariates. We are interested in the mean of CRIM and its confidence region. We set the unconditional rates of missing data between 5% and 15%. All real data analysis results are based on 1000 replications, and the sample sizes are n = 20, n = 40, and n = 80. Tables 5-7 show the biases and standard deviations (SDs) of the point estimators, and the average lengths (ALs) and coverage probabilities (CPs) of the confidence intervals (CIs) at the nominal level of 95%.
In Tables 5-7, it is observed that when the sample size increases, the coverage probabilities of the estimators μ_ipw, μ_mi, and μ_aipw based on the MEL method become larger, while the ALs and the SDs become smaller. In particular, the confidence regions of the response mean are close to the confidence region based on the full data. So, the finite sample performance of the mean empirical likelihood is better than that of the original empirical likelihood.

Conclusions
This paper proposed a new empirical likelihood for inference on the response mean with a small sample and multidimensional covariates using the mean empirical likelihood. Its large sample properties were proved for different bias-corrected estimating equations.
This new method outperforms the original empirical likelihood methods.
On the basis of IPW, MI, and AIPW, the new method uses a pairwise-mean dataset to obtain its advantages. Compared with the original EL method, this method obtains a similar point estimator and better CPs, ALs, and SDs. Compared with the other EL inference methods, this method has two characteristics: one is that the size of the sample is much larger than the dimension, and the other is that the number of samples is large. In addition, AIPW based on MEL achieves the most accurate results. Theoretically, when the dimension of the covariate increases, the performance of EL-based inference for the response mean deteriorates. The method of Liang et al. [15] can solve this problem. Meanwhile, it has superior performance when the sample size is small. Therefore, we recommend that the three bias-corrected models be carefully constructed and used to infer the confidence region of the response mean with data missing at random. We will further study the MEL method for categorical data with covariates missing at random.

Then, we prove that (i) holds. (ii) According to Lemma A.1 (iv) in Liang et al. [15], we notice that […]. Using Theorem 2 and Lemma (i), we have […] N(0, 1). Therefore, we have […], where ϑ_l = σ²/σ²_l, l = 1, 2, and ϑ_3 = 1.

Conflicts of Interest
The authors declare that there are no conflicts of interest in the publication of this paper.