Empirical Likelihood for Multidimensional Linear Model with Missing Responses

Imputation is a popular technique for handling missing data, especially when a large proportion of values are missing. Usually, the empirical log-likelihood ratio statistic under imputation is asymptotically scaled chi-squared because the imputed data are not i.i.d. Recently, a bias-corrected technique was used to study the linear regression model with missing response data, and the resulting empirical likelihood ratio is asymptotically chi-squared. However, that approach may suffer from the "curse of high dimension" in multidimensional linear regression models, since it relies on a nonparametric estimator of the selection probability function. In this paper, a parametric selection probability function is introduced to avoid the dimension problem. With a similar bias-correction method, the proposed empirical likelihood statistic is asymptotically chi-squared when the selection probability is specified correctly, and asymptotically scaled chi-squared when it is specified incorrectly. In addition, our empirical likelihood estimator is always consistent, whether or not the selection probability is specified correctly, and achieves full efficiency when it is specified correctly. A simulation study indicates that the proposed method is comparable in terms of coverage probabilities.


Introduction
Consider the following multidimensional linear regression model:
$$Y_i = X_i^T \beta + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1.1}$$
where $Y_i$ is a scalar response variable, $X_i$ is a $p \times 1$ vector of design variables, $\beta$ is a $p \times 1$ vector of regression parameters, and the errors $\varepsilon_i$ are independent random variables with $E(\varepsilon_i \mid X_i) = 0$ and $\operatorname{Var}(\varepsilon_i \mid X_i) = \sigma^2$. Suppose that we have incomplete observations $(X_i, Y_i, \delta_i)$, $i = 1, 2, \ldots, n$, from this model, where all the $X_i$'s are observed, and $\delta_i = 0$ if $Y_i$ is missing and $\delta_i = 1$ otherwise. Throughout this paper, we assume that $Y$ is missing at random (MAR); that is, the probability that $Y$ is observed depends only on $X$:
$$P(\delta = 1 \mid Y, X) = P(\delta = 1 \mid X) =: p_0(X).$$
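As a concrete illustration (not from the paper), the following sketch generates data from model (1.1) with responses missing at random; the logistic form of $p_0(x)$, the parameter values, and the function name are illustrative assumptions.

```python
# A minimal sketch: simulate from model (1.1) with responses missing at
# random.  The logistic form of p0(x) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def generate_mar_data(n, beta, sigma=0.2):
    p = len(beta)
    X = rng.standard_normal((n, p))            # design vectors X_i
    eps = rng.normal(0.0, sigma, size=n)       # E(eps|X) = 0, Var(eps|X) = sigma^2
    Y = X @ beta + eps                         # scalar responses Y_i
    p0 = 1.0 / (1.0 + np.exp(-X.sum(axis=1)))  # assumed selection probability p0(X)
    delta = rng.binomial(1, p0)                # delta_i = 1 iff Y_i is observed
    Y = np.where(delta == 1, Y, np.nan)        # missing responses stored as NaN
    return X, Y, delta

X, Y, delta = generate_mar_data(200, beta=np.array([0.8, 1.5, 1.0, 2.0]))
```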

Main Result
For simplicity, denote the true selection probability function by $p_0(x)$ and a specified parametric selection probability function by $p(x, \theta)$ for given $\theta$, a $p \times 1$ unknown vector parameter. Thus, the first situation means $p_0(x) = p(x, \theta_0)$ for some $\theta_0$, where $\theta_0$ is the true parameter, and the second situation means $p_0(x) \neq p(x, \theta)$ for any $\theta$. Let $\hat{\beta}_r$ be the least squares estimator of $\beta$ based on the completely observed pairs $\{(X_i, Y_i) : \delta_i = 1,\ i = 1, 2, \ldots, n\}$; that is,
$$\hat{\beta}_r = \left( \sum_{i=1}^n \delta_i X_i X_i^T \right)^{-1} \sum_{i=1}^n \delta_i X_i Y_i .$$

Empirical Likelihood
In this subsection, an empirical likelihood statistic is constructed, and some asymptotic results are presented for the case where the selection probability is specified correctly, that is, $p_0(x) = p(x, \theta_0)$ for some $\theta_0$.

Since the design variable $X_i$ is observable for each subject, the maximum likelihood estimator $\hat{\theta}$ can be obtained by maximizing the likelihood function
$$L(\theta) = \prod_{i=1}^n p(X_i, \theta)^{\delta_i} \{1 - p(X_i, \theta)\}^{1 - \delta_i}.$$
The following regularity assumptions on $L(\theta)$ are sufficiently strong to ensure both consistency and asymptotic normality of the maximum likelihood estimator. Suppose $U(\theta_0)$ is some neighborhood of $\theta_0$, where $\theta_0$ is the true parameter. Then, we use
$$\hat{Y}_i = \frac{\delta_i Y_i}{p(X_i, \hat{\theta})} + \left(1 - \frac{\delta_i}{p(X_i, \hat{\theta})}\right) X_i^T \hat{\beta}_r, \quad i = 1, 2, \ldots, n,$$
as the "complete" data set for $Y$ to construct the auxiliary random vectors
$$\hat{Z}_i(\beta) = X_i \left( \hat{Y}_i - X_i^T \beta \right), \quad i = 1, 2, \ldots, n. \tag{2.2}$$
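To make the construction concrete, here is a minimal sketch assuming a logistic working model $p(x, \theta) = \exp(x^T \theta) / \{1 + \exp(x^T \theta)\}$: $\hat{\theta}$ maximizes $L(\theta)$, $\hat{\beta}_r$ is the complete-case least squares estimator, and the bias-corrected responses $\hat{Y}_i$ are assembled as in the display above. The function names are hypothetical.

```python
# A sketch under a logistic working model for the selection probability.
import numpy as np
from scipy.optimize import minimize

def fit_theta(X, delta):
    # negative Bernoulli log-likelihood of the selection model L(theta)
    def negloglik(theta):
        eta = X @ theta
        return np.sum(np.logaddexp(0.0, eta)) - np.sum(delta * eta)
    return minimize(negloglik, x0=np.zeros(X.shape[1]), method="BFGS").x

def bias_corrected_responses(X, Y, delta, theta_hat):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ theta_hat)))   # p(X_i, theta_hat)
    obs = delta == 1
    # complete-case least squares estimator beta_r
    beta_r = np.linalg.solve(X[obs].T @ X[obs], X[obs].T @ Y[obs])
    dY = np.where(obs, Y, 0.0)                       # delta_i * Y_i
    # Y_hat_i = delta_i Y_i / p_i + (1 - delta_i / p_i) X_i^T beta_r
    return dY / p_hat + (1.0 - delta / p_hat) * (X @ beta_r)
```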
Thus, an empirical log-likelihood ratio is defined as
$$l_n^*(\beta) = 2 \sum_{i=1}^n \log\{1 + \lambda^T \hat{Z}_i(\beta)\},$$
where $\lambda = \lambda(\beta)$ is the solution of
$$\frac{1}{n} \sum_{i=1}^n \frac{\hat{Z}_i(\beta)}{1 + \lambda^T \hat{Z}_i(\beta)} = 0.$$
Further, the maximum empirical likelihood estimator $\hat{\beta}$ of $\beta$ maximizes $\{-l_n^*(\beta)\}$. To ensure the asymptotic results, the following assumptions are needed:

(C4) $p(x, \theta)$ is uniformly continuous in $U(\theta_0)$ for all $x$;

(C5) $A$ and $D_1$ are positive definite matrices, where $A = E\{X X^T\}$ and $D_1 = E\{\{1/p(X, \theta_0)\} X X^T \varepsilon^2\}$;

(C6) $p(x, \theta_0)$ is bounded away from zero almost surely; that is, $\inf_x p(x, \theta_0) > 0$.

Condition (C4) is common for selection probability functions. Conditions (C5)-(C6) are necessary for the asymptotic normality of the maximum empirical likelihood estimator.

Theorem 2.1. Under conditions (C1)-(C6), if $p(x, \theta)$ is specified correctly, then $l_n^*(\beta) \xrightarrow{D} \chi_p^2$, where $\beta$ is the true parameter value.

Let $\chi_p^2(1 - \alpha)$ be the $1 - \alpha$ quantile of the $\chi_p^2$ distribution for $0 < \alpha < 1$. Using Theorem 2.1, we obtain an approximate $1 - \alpha$ confidence region for $\beta$, defined by
$$\left\{\beta : l_n^*(\beta) \le \chi_p^2(1 - \alpha)\right\}.$$
Theorem 2.1 can also be used to test the hypothesis $H_0 : \beta = \beta_0$.

Remark 2.2. In general, a plug-in empirical likelihood asymptotically leads to a sum of weighted $\chi_1^2$ variables with unknown weights. However, when $p(x, \theta)$ is specified correctly, the EL statistic with two plug-ins has the limiting distribution $\chi_p^2$, for the following reasons. First, the bias-correction method, that is, using the selection probability function as an inverse weight, eliminates the influence of $\hat{\beta}_r$. Second, the estimating function has a special structure: the influence of $\hat{\theta}$ is also eliminated, since $\hat{\theta}$ enters through the denominator of the function.
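Computing $l_n^*(\beta)$ at a candidate $\beta$ reduces to the standard Owen-type inner optimization over the Lagrange multiplier $\lambda$. The following damped Newton iteration is one illustrative implementation, not the paper's code.

```python
# An Owen-type evaluation of l_n^*(beta) = 2 sum_i log{1 + lambda^T Z_i(beta)},
# with lambda solving sum_i Z_i / (1 + lambda^T Z_i) = 0.
import numpy as np

def aux_vectors(X, Y_hat, beta):
    # Z_i(beta) = X_i (Y_hat_i - X_i^T beta)
    return X * (Y_hat - X @ beta)[:, None]

def el_log_ratio(Z, n_iter=50, tol=1e-10):
    n, p = Z.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        w = 1.0 + Z @ lam                     # 1 + lambda^T Z_i
        grad = (Z / w[:, None]).sum(axis=0)   # sum_i Z_i / w_i
        hess = -(Z.T / w**2) @ Z              # -sum_i Z_i Z_i^T / w_i^2
        step = np.linalg.solve(hess, -grad)   # Newton step
        while np.any(1.0 + Z @ (lam + step) <= 0):
            step *= 0.5                       # damp to keep log arguments positive
        if np.linalg.norm(step) < tol:
            break
        lam = lam + step
    return 2.0 * np.sum(np.log(1.0 + Z @ lam))
```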

Theorem 2.3. Under conditions (C1)-(C6), if $p(x, \theta)$ is specified correctly, then
$$\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{D} N(0, \Sigma_1),$$
where $\Sigma_1 = A^{-1} D_1 A^{-1}$. To apply Theorem 2.3 to construct the confidence region of $\beta$, we give the estimator of $\Sigma_1$, say $\hat{\Sigma}_1 = \hat{A}^{-1} \hat{D}_1 \hat{A}^{-1}$, where $\hat{A}$ and $\hat{D}_1$ are the empirical counterparts of $A$ and $D_1$ with $\hat{\theta}$ and $\hat{\beta}$ plugged in. It is easily proved that $\hat{\Sigma}_1$ is a consistent estimator of $\Sigma_1$. Thus, by Theorem 2.3, we have
$$\sqrt{n}\, \hat{\Sigma}_1^{-1/2} (\hat{\beta} - \beta) \xrightarrow{D} N(0, I_p),$$
where $I_p$ is the identity matrix of order $p$. Using (10.2d) in Arnold [6], we can obtain
$$n\, (\hat{\beta} - \beta)^T \hat{\Sigma}_1^{-1} (\hat{\beta} - \beta) \xrightarrow{D} \chi_p^2. \tag{2.8}$$
Therefore, the confidence region of $\beta$ can be constructed by using (2.8).
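A sketch of the Wald-type region based on (2.8) follows. One assumption here: $\hat{D}_1$ is taken to be the inverse-probability-weighted empirical counterpart $n^{-1} \sum_i \{\delta_i / p^2(X_i, \hat{\theta})\} X_i X_i^T (Y_i - X_i^T \hat{\beta})^2$, which is consistent for $D_1$ under MAR; the paper's exact plug-in estimator may differ in detail.

```python
# Wald-type chi-squared statistic for the region (2.8).
import numpy as np
from scipy.stats import chi2

def wald_chisq_stat(beta0, X, Y, delta, p_hat, beta_hat):
    # returns n (beta_hat - beta0)^T Sigma1_hat^{-1} (beta_hat - beta0)
    n, p = X.shape
    A_hat = X.T @ X / n                                  # estimates A = E{XX^T}
    resid = np.where(delta == 1, Y - X @ beta_hat, 0.0)  # complete-case residuals
    # E{ (delta / p^2) X X^T eps^2 } = D_1 under MAR
    D1_hat = (X.T * (delta / p_hat**2 * resid**2)) @ X / n
    A_inv = np.linalg.inv(A_hat)
    Sigma1_hat = A_inv @ D1_hat @ A_inv
    diff = beta_hat - beta0
    return n * diff @ np.linalg.solve(Sigma1_hat, diff)

# beta0 belongs to the approximate 95% region iff
# wald_chisq_stat(beta0, ...) <= chi2.ppf(0.95, df=X.shape[1])
```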

Adjusted Empirical Likelihood When $p_0(x) \neq p(x, \theta)$
In this subsection, we also construct the empirical likelihood statistic and discuss some asymptotic results when the selection probability is specified incorrectly, that is, $p_0(x) \neq p(x, \theta)$ for any $\theta$. Since the design variable $X_i$ is observable for each subject, the quasi-maximum likelihood estimator $\hat{\theta}$, rather than the maximum likelihood estimator, can be obtained by maximizing the likelihood function $L(\theta)$ under some regularity assumptions.
The following regularity assumptions are sufficiently strong to ensure both consistency and asymptotic normality of the quasi-maximum likelihood estimator. Let $u = (x, \delta)$, and let $g(u)$ be the true density function of $U$; the working family $f(u, \theta)$ need not contain the true structure $g(u)$. The Kullback-Leibler Information Criterion (KLIC) is defined by $I(g : f, \theta) = E[\log\{g(U)/f(U, \theta)\}]$; here, and in what follows, expectations are taken with respect to the true distribution $g(u)$. When expectations of the partial derivatives exist, we define the corresponding information matrices (the expected Hessian and the expected outer product of the score of $\log f(u, \theta)$).

(C8) $f(u, \theta)$ is measurable in $u$ for every $\theta$ in $\Theta$, a compact subset of $\mathbb{R}^p$, and continuous in $\theta$ for every $u$ in $\Theta$.
(C9) (a) $E[\log g(U)]$ exists and $|\log f(u, \theta)| \le m(u)$ for all $\theta$ in $\Theta$, where $m$ is integrable with respect to $g$; (b) $I(g : f, \theta)$ has a unique minimum at $\theta^*$ in $\Theta$.
(C10) $\partial \log f(u, \theta) / \partial \theta_i$, $i = 1, \ldots, p$, are measurable functions of $u$ for each $\theta$ in $\Theta$ and continuously second-order differentiable functions of $\theta$ for each $u$ in $\Omega$.
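Condition (C9)(b) identifies the pseudo-true value $\theta^*$. Since the entropy term $E[\log g(U)]$ does not involve $\theta$, minimizing the KLIC is equivalent to maximizing the expected working log-likelihood, which is exactly what the quasi-maximum likelihood estimator targets. The following display (a standard identity, added here for clarity) records this:

```latex
% The KLIC splits into a theta-free entropy term and the expected working
% log-likelihood, so its minimizer theta^* maximizes E[log f(U, theta)].
I(g : f, \theta) = E[\log g(U)] - E[\log f(U, \theta)],
\qquad
\theta^{*} = \arg\min_{\theta \in \Theta} I(g : f, \theta)
           = \arg\max_{\theta \in \Theta} E[\log f(U, \theta)] .
```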
Then, we use
$$\tilde{Y}_i = \frac{\delta_i Y_i}{p(X_i, \hat{\theta})} + \left(1 - \frac{\delta_i}{p(X_i, \hat{\theta})}\right) X_i^T \hat{\beta}_r, \quad i = 1, 2, \ldots, n,$$
as the "complete" data set for $Y$ to construct the auxiliary random vectors
$$\tilde{Z}_i(\beta) = X_i \left( \tilde{Y}_i - X_i^T \beta \right), \quad i = 1, 2, \ldots, n. \tag{2.10}$$

Thus, an empirical log-likelihood ratio is defined as
$$l_n(\beta) = 2 \sum_{i=1}^n \log\{1 + \lambda^T \tilde{Z}_i(\beta)\}, \tag{2.11}$$
where $\lambda = \lambda(\beta)$ solves $\sum_{i=1}^n \tilde{Z}_i(\beta) / \{1 + \lambda^T \tilde{Z}_i(\beta)\} = 0$. The maximum empirical likelihood estimator $\hat{\beta}$ of $\beta$ maximizes $\{-l_n(\beta)\}$.

To ensure the asymptotic results, the following assumptions are needed:

(C4) $p(x, \theta)$ is uniformly continuous in $U(\theta^*)$ for all $x$;

(C5) $A$, $C$, $E$, $F$, $G$, $D_1$, and $D_2$ are positive definite matrices;

(C6) $p_0(x)$ is bounded away from zero almost surely; that is, $\inf_x p_0(x) > 0$.

Condition (C4) is common for selection probability functions. Conditions (C5)-(C6) are necessary for the asymptotic normality of the maximum empirical likelihood estimator.

Theorem 2.4. Under conditions (C8)-(C10) and (C4)-(C6), if $p(x, \theta)$ is specified incorrectly, then
$$l_n(\beta) \xrightarrow{D} \sum_{i=1}^p w_i \chi_{1,i}^2,$$
where the weights $w_i$ are the eigenvalues of $D_3^{-1} D_2$, the $\chi_{1,i}^2$ are independent $\chi_1^2$ variables, and $\xrightarrow{D}$ represents convergence in distribution.

Let $r(\beta) = p / \operatorname{tr}(D_3^{-1} D_2)$ be the adjustment factor. Along the lines of [7], it is straightforward to show that $r(\beta)$ can be estimated consistently by a plug-in version $\hat{r}(\beta)$. We define an adjusted empirical log-likelihood ratio by
$$l_{n, ad}(\beta) = \hat{r}(\beta)\, l_n(\beta). \tag{2.13}$$
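The adjustment is a Rao-Scott-type scaling and is mechanical once plug-in estimates $\hat{D}_2$ and $\hat{D}_3$ are available; their exact definitions are given in the paper, and in this sketch they enter only through the trace.

```python
# Adjusted EL ratio l_{n,ad}(beta) = r_hat(beta) * l_n(beta), with
# r_hat(beta) = p / tr(D3_hat^{-1} D2_hat).
import numpy as np

def adjusted_el_ratio(l_n_beta, D2_hat, D3_hat):
    p = D2_hat.shape[0]
    r_hat = p / np.trace(np.linalg.solve(D3_hat, D2_hat))
    return r_hat * l_n_beta
```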
Corollary 2.5. Under the conditions of Theorem 2.4, one has, approximately,
$$l_{n, ad}(\beta) \xrightarrow{D} \chi_p^2. \tag{2.15}$$
Corollary 2.5 can also be used to test the hypothesis $H_0 : \beta = \beta_0$. Note that $r(\beta) \to 1$ when $p(x, \theta)$ is close to the correct one. In fact, the adjustment factor $r(\beta)$ reflects the information loss due to the misspecification of $p(x, \theta)$.

Theorem 2.6. Under the conditions of Theorem 2.4, one has
$$\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{D} N(0, \Sigma_2),$$
where $\Sigma_2 = A^{-1} D_2 A^{-1}$. To apply Theorem 2.6 to construct the confidence region of $\beta$, we give the estimator of $\Sigma_2$, say $\hat{\Sigma}_2 = \hat{A}^{-1} \hat{D}_3 \hat{A}^{-1}$, where $\hat{A}$ and $\hat{D}_3$ are the empirical counterparts of $A$ and $D_2$ with $\hat{\theta}$ and $\hat{\beta}$ plugged in. It is easily proved that $\hat{\Sigma}_2$ is a consistent estimator of $\Sigma_2$. Thus, by Theorem 2.6, we have
$$\sqrt{n}\, \hat{\Sigma}_2^{-1/2} (\hat{\beta} - \beta) \xrightarrow{D} N(0, I_p),$$
where $I_p$ is the identity matrix of order $p$. Using (10.2d) in Arnold [6], we can obtain
$$n\, (\hat{\beta} - \beta)^T \hat{\Sigma}_2^{-1} (\hat{\beta} - \beta) \xrightarrow{D} \chi_p^2. \tag{2.18}$$
Therefore, the confidence region of $\beta$ can be constructed by using (2.18).
Remark 2.7. The estimator proposed by Robins et al. in this situation solves an augmented inverse probability weighted estimating equation; in that case, the underlying regression function can be asymptotically correctly specified. Hence, whether the missing-data mechanism is specified correctly or not, the estimator is always consistent, and it achieves asymptotic full efficiency when the mechanism is specified correctly.
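For comparison, here is a sketch of the kind of estimator this remark refers to, written as the textbook augmented inverse-probability-weighted (AIPW) estimating equation $\sum_i X_i \{\delta_i Y_i / p_i - (\delta_i / p_i - 1)\, m(X_i) - X_i^T \beta\} = 0$ with the linear working regression $m(x) = x^T \hat{\beta}_r$; the names and exact form are illustrative, not taken from Robins' paper. Because the equation is linear in $\beta$, it is solved in closed form.

```python
# AIPW (doubly robust) estimator sketch for the linear model.
import numpy as np

def aipw_estimator(X, Y, delta, p_hat, beta_r):
    m = X @ beta_r                          # working regression prediction
    dY = np.where(delta == 1, Y, 0.0)       # delta_i * Y_i
    pseudo = dY / p_hat - (delta / p_hat - 1.0) * m
    # the estimating equation is linear in beta, so least squares on the
    # pseudo-outcomes solves it exactly
    return np.linalg.solve(X.T @ X, X.T @ pseudo)
```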

Simulation
Due to the curse of dimensionality in nonparametric estimation, Xue's method may be hard to implement here. We therefore conducted an extensive simulation study to compare the performance of the weighted-corrected empirical likelihood (WCEL) proposed in this paper with the adjusted empirical likelihood (AEL) proposed in Wang and Rao, with covariates of four dimensions.
We considered the linear model (1.1) with $d = 4$ and $\beta = (0.8, 1.5, 1, 2)^T$, where $X$ was generated from a four-dimensional standard normal distribution, and $\varepsilon$ was generated from the normal distribution with mean zero and variance 0.04. In the first case, the true selection probability function $p_0(x, \theta)$ was taken to be $\exp(x^T \theta) / \{1 + \exp(x^T \theta)\}$. We considered three values of $\theta$: $\theta_1 = (-0.5, -0.5, -0.5, -0.5)^T$, $\theta_2 = (0, 0, 0, 0)^T$, and $\theta_3 = (0.5, 0.5, 0.5, 0.5)^T$. In the second case, the true selection probability function $p_0(x)$ was taken to be one of the following three cases:

Case 1. $p_1(x) = \ldots$, and 0.9 elsewhere.

Case 2. $p_2(x) = \ldots$, and 0.9 elsewhere.

Case 3. $p_3(x) = 0.6$ for all $x$.
We generated 5000 Monte Carlo random samples of sizes $n = 100$, 200, and 500 based on the above six selection probability functions $p(x)$. With the working model $p(x, \theta) = \exp(x^T \theta) / \{1 + \exp(x^T \theta)\}$, the empirical coverage probabilities for $\beta$, at the nominal level 0.95, were computed for the two methods over the 5000 simulation runs. The results are reported in Table 1.
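One cell of this Monte Carlo design can be sketched as follows, reusing the illustrative helpers `fit_theta`, `bias_corrected_responses`, `aux_vectors`, and `el_log_ratio` from the earlier sketches; the replication count is reduced here for speed, whereas the paper uses 5000.

```python
# Empirical coverage of the WCEL region under the logistic selection model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
beta_true = np.array([0.8, 1.5, 1.0, 2.0])
theta1 = np.full(4, -0.5)                   # theta_1 in the first case
n, n_rep, cover = 200, 500, 0               # n_rep = 5000 in the paper

for _ in range(n_rep):
    X = rng.standard_normal((n, 4))
    Y = X @ beta_true + rng.normal(0.0, 0.2, size=n)   # Var(eps) = 0.04
    p0 = 1.0 / (1.0 + np.exp(-(X @ theta1)))           # true selection probability
    delta = rng.binomial(1, p0)
    Y = np.where(delta == 1, Y, np.nan)
    theta_hat = fit_theta(X, delta)
    Y_hat = bias_corrected_responses(X, Y, delta, theta_hat)
    Z = aux_vectors(X, Y_hat, beta_true)
    cover += el_log_ratio(Z) <= chi2.ppf(0.95, df=4)   # region covers beta?

print("empirical coverage:", cover / n_rep)
```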
From Table 1, we can draw the following conclusions. First, under both cases, WCEL performs better than AEL, since its confidence regions have uniformly higher coverage probabilities. Second, all the empirical coverage probabilities increase as $n$ increases for every fixed missing rate. The missing rate also visibly affects the coverage probability: generally, the coverage probability decreases as the missing rate increases for every fixed sample size. However, in the first case, the values hardly change for either method, owing to the exponential form of the selection probability function.

Conclusion
In this paper, a parametric selection probability function is introduced to avoid the dimension difficulty, and a bias-corrected technique leads to an empirical likelihood (EL) statistic that is asymptotically chi-squared when the selection probability is specified correctly and asymptotically weighted chi-squared when it is specified incorrectly. Moreover, our estimator is always consistent and achieves asymptotic full efficiency when the selection probability function is specified correctly.
Applying the Taylor expansion to (A.11) and (A.13), we obtain (A.14). By (A.12), it follows, together with Lemma A.3 and (A.13), that (A.16) holds. Therefore, from (A.14) we have (A.17). This, together with Lemma A.3, completes the proof of Theorem 2.4.
Proof of Theorem 2.6. From Theorem 1 of Qin [10] and (A.5), we obtain the result of Theorem 2.6 directly.

Table 1: Empirical coverage probabilities of the confidence regions for $\beta$ under different selection probability functions $p(x)$ and sample sizes $n$, at nominal level 0.95.