This paper studies nonparametric regression with missing response data. Three local linear M-estimators, which combine local linear regression smoothing with the robustness of M-estimation, are presented and shown to share the same asymptotic normality and consistency. Their finite-sample performance is then examined via simulation studies. The simulations demonstrate that the complete-case data M-estimator is not superior to the other two local linear M-estimators.
1. Introduction
Local polynomial regression methods, which have advantages over popular kernel methods in terms of design adaptation and high asymptotic efficiency, have been demonstrated to be effective nonparametric smoothers. In addition, local polynomial regression smoothers adapt to almost all regression settings and cope very well with edge effects. For details, see [1] and the references therein. However, a drawback of these local regression estimators is their lack of robustness. It is well known that M-type regression estimators have many desirable robustness properties; as a result, they are natural candidates for robustifying local polynomial smoothers.
In fact, nonparametric functions can be estimated by several methods, such as kernel, local regression, spline, and orthogonal series methods. For an introduction to this subject area, see [1–3]. The local linear smoother, an intuitively appealing method, has become popular in recent years because of its attractive statistical properties. As shown in [4–7], local regression provides many advantages over modified kernel methods. Consequently, it is reasonable to expect that local regression based M-type estimators carry over those advantages.
In the present paper, local M-type regression is used to construct three M-estimators of m(x) with missing response data, namely, the complete-case data M-estimator, the weighted M-estimator, and the estimated weighted M-estimator, and it is shown that these estimators have the same asymptotic normality and consistency. Finite-sample simulations show that the complete-case data M-estimator is not superior to the other two local linear M-estimators.
In the regression analysis setup, the basic inference begins by considering the random sample
$$(X_i, Y_i, \delta_i), \quad i = 1, 2, \ldots, n, \tag{1}$$
where the design point Xi is observed and
$$\delta_i = \begin{cases} 0, & Y_i \text{ is missing}, \\ 1, & \text{otherwise}. \end{cases} \tag{2}$$
Theoretically, this is a missing response problem. Recently, there has been considerable interest in nonparametric regression analysis with missing data, and many methods have been developed. Cheng (see [8]) employed a kernel regression imputation approach to define an estimator of the mean of Y. Hirano et al. (see [9]) defined a weighted estimator for the response mean when the response variable is missing. Wang et al. (see [10]) then developed estimation theory for semiparametric regression analysis in the presence of missing responses. More recently, Liang [11] and Wang and Sun [12] discussed generalized partially linear models with missing covariate data and partially linear models with responses missing at random, respectively.
To study the missing data (1), the missing at random (MAR) assumption requires that there exists a chance mechanism, denoted by p(X), such that
$$P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X) = p(X) \tag{3}$$
holds almost surely. In practice, (3) may be justified by the nature of the experiment when it is legitimate to assume that the missingness of Y depends mainly on X.
The paper is organized as follows. Notation and preliminary results on the three M-estimators of m(x) with missing response data, namely, the local M-estimator with the complete-case data, the weighted M-estimator, and the estimated weighted M-estimator, are given in Section 2. The asymptotic normality and consistency of the three M-estimators are presented in Section 3. In Section 4, simulation studies compare the proposed estimators. Sketches of the proofs are given in Section 5.
2. Model and Estimators
2.1. The Model
The nonparametric regression model that we will consider for the incomplete data (1) is given by
$$Y_i = m(X_i) + \varepsilon_i \tag{4}$$
for $i = 1, 2, \ldots, n$. Here $m(\cdot)$ is the regression function, $X_i$ is a design point, $\varepsilon_i = Y_i - m(X_i)$ is the regression error, and $Y_i$ is the response variable. The regression errors $\varepsilon_i$ are conveniently assumed to be independent and identically distributed (i.i.d.) random variables with mean 0 and variance $\sigma^2 \in (0, \infty)$; more generally, the variance of $\varepsilon_i$ may be allowed to depend on $X_i$, written $\sigma^2(X_i)$, to accommodate heteroscedasticity. To simplify the preliminary discussion, the covariate $X_i$ is assumed to be real-valued.
2.2. The Local M-Estimator with the Complete-Case Data
The local M-estimator with the complete-case data is defined as the solution of the following problem: find a and b to minimize
$$\sum_{i=1}^{n} \rho\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \delta_i, \tag{5}$$
or to satisfy the local estimation system of equations
$$\begin{aligned} \Psi_1^{cc}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \delta_i = 0, \\ \Psi_2^{cc}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) \frac{X_i - x}{h_n} K\!\left(\frac{X_i - x}{h_n}\right) \delta_i = 0, \end{aligned} \tag{6}$$
where $\rho(\cdot)$ is a given outlier-resistant function, $\psi(\cdot)$ is the derivative of $\rho(\cdot)$, $K(\cdot)$ is a kernel function, and $h_n$ is a sequence of positive bandwidths tending to zero.
The M-estimators of $m(x)$ and $m'(x)$ are defined as $\hat a$ and $\hat b$, the solutions of the system (6); we denote them by $\hat m_M^{cc}(x)$ and $\hat m_M^{cc\prime}(x)$, respectively.
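To make the definition concrete, the following is a minimal Python sketch of $\hat m_M^{cc}(x)$. It assumes Huber's $\rho$ with cutoff $c = 1.345$ and the Epanechnikov kernel, and it minimizes (5) numerically instead of solving the system (6); all function names are illustrative, not from the paper.

```python
# A sketch of the complete-case local linear M-estimator (5), under the
# stated assumptions (Huber loss, Epanechnikov kernel, Nelder-Mead search).
import numpy as np
from scipy.optimize import minimize

def huber_rho(t, c=1.345):
    """Huber's outlier-resistant loss rho(t)."""
    a = np.abs(t)
    return np.where(a <= c, 0.5 * t ** 2, c * a - 0.5 * c ** 2)

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+ with support [-1, 1]."""
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def local_m_cc(x, X, Y, delta, h):
    """Minimize (5) over (a, b); returns (m_hat(x), m_hat'(x))."""
    obs = delta == 1                      # keep the complete cases only
    Xo, Yo = X[obs], Y[obs]
    w = epanechnikov((Xo - x) / h)        # local kernel weights
    def objective(theta):
        a, b = theta
        return np.sum(huber_rho(Yo - a - b * (Xo - x)) * w)
    fit = minimize(objective, x0=np.array([np.median(Yo), 0.0]),
                   method="Nelder-Mead")
    return fit.x                          # (a_hat, b_hat)
```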
2.3. The Weighted M-Estimator
If we find a and b to minimize
$$\sum_{i=1}^{n} \rho\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{p(X_i)}, \tag{7}$$
or to satisfy the local estimation system of equations
$$\begin{aligned} \Psi_1^{cp}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{p(X_i)} = 0, \\ \Psi_2^{cp}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) \frac{X_i - x}{h_n} K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{p(X_i)} = 0, \end{aligned} \tag{8}$$
then the solutions $\hat a$ and $\hat b$ are called the weighted M-estimators; they will be denoted by $\hat m_M^{cp}(x)$ and $\hat m_M^{cp\prime}(x)$, respectively.
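Under the same illustrative assumptions as before, the weighted M-estimator changes only the weights in the sketch above: each complete case is inflated by $1/p(X_i)$, with $p(\cdot)$ taken as a known function.

```python
def local_m_weighted(x, X, Y, delta, h, p):
    """Minimize (7) over (a, b); p is the known selection probability function."""
    obs = delta == 1
    Xo, Yo = X[obs], Y[obs]
    w = epanechnikov((Xo - x) / h) / p(Xo)   # inverse-probability weights
    def objective(theta):
        a, b = theta
        return np.sum(huber_rho(Yo - a - b * (Xo - x)) * w)
    fit = minimize(objective, x0=np.array([np.median(Yo), 0.0]),
                   method="Nelder-Mead")
    return fit.x
```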
2.4. The Estimated Weighted M-Estimator
In practice, the selection probability function p(x) is usually unknown. To estimate it, we apply the local linear smoother: find a and b such that
$$\sum_{i=1}^{n} \bigl[\delta_i - a - b(X_i - x)\bigr]^2 K\!\left(\frac{X_i - x}{h_n}\right) \tag{9}$$
is minimized. A straightforward calculation yields
$$\hat p(x) = \frac{\sum_{i=1}^{n} v_i \delta_i}{\sum_{j=1}^{n} v_j}, \tag{10}$$
where
$$v_j = K\!\left(\frac{X_j - x}{h_n}\right)\bigl[m_{n,2} - (X_j - x) m_{n,1}\bigr], \qquad m_{n,l} = \sum_{j=1}^{n} K\!\left(\frac{X_j - x}{h_n}\right)(X_j - x)^l, \quad l = 0, 1, 2. \tag{11}$$
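Since the $v_j$ in (11) are exactly the local linear smoothing weights, $\hat p(x)$ can be computed directly. A sketch, continuing with the illustrative kernel above:

```python
def p_hat(x, X, delta, h):
    """Local linear estimator (10)-(11) of the selection probability p(x)."""
    K = epanechnikov((X - x) / h)
    d = X - x
    m1 = np.sum(K * d)        # m_{n,1} in (11)
    m2 = np.sum(K * d ** 2)   # m_{n,2} in (11)
    v = K * (m2 - d * m1)     # weights v_j in (11)
    return np.sum(v * delta) / np.sum(v)
```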
With the estimator p^(x) defined in (10), we can give the estimated weighted M-estimator by finding a and b to minimize
$$\sum_{i=1}^{n} \rho\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{\hat p(X_i)}, \tag{12}$$
or to satisfy the local estimation system of equations
$$\begin{aligned} \Psi_1^{wp}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{\hat p(X_i)} = 0, \\ \Psi_2^{wp}(a, b) &= \sum_{i=1}^{n} \psi\bigl(Y_i - a - b(X_i - x)\bigr) \frac{X_i - x}{h_n} K\!\left(\frac{X_i - x}{h_n}\right) \frac{\delta_i}{\hat p(X_i)} = 0. \end{aligned} \tag{13}$$
Denote by $\hat m_M^{*}(x)$ and $\hat m_M^{*\prime}(x)$ the solutions $a$ and $b$ of the system (13), respectively.
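Combining the two previous sketches gives the estimated weighted M-estimator: the unknown $p(X_i)$ in the weights of (7) is replaced by $\hat p(X_i)$ from (10). For example:

```python
def local_m_star(x, X, Y, delta, h):
    """Minimize (12) over (a, b): plug p_hat into the inverse-probability weights."""
    p_est = lambda z: np.array([p_hat(zi, X, delta, h) for zi in np.atleast_1d(z)])
    return local_m_weighted(x, X, Y, delta, h, p=p_est)
```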
3. Main Results
In this section, we present the main results of this paper on the asymptotic distribution and consistency of $\hat m_M^{cc}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{*}(x)$. The following conditions will be used throughout this section.
(1) The regression function $m(\cdot)$ has a continuous second derivative at the given point $x$, and $m''(\cdot)$ is continuous and bounded on the support $D$.
(2) The sequence of bandwidths $h_n$ tends to zero such that $nh_n \to +\infty$.
(3) The design density $f(\cdot)$ is continuous at the given point $x$ and $f(x) > 0$.
(4) The kernel function $K(\cdot)$ is a continuous probability density function with bounded support $[-1, 1]$.
(5) $E[\psi(\varepsilon) \mid X = x] = 0$ with $\varepsilon = Y - m(X)$.
(6) The function $\psi(\cdot)$ is continuous and has a derivative $\psi'(\cdot)$. Further, assume that $\psi_\varepsilon(x) = E[\psi'(\varepsilon) \mid X = x]$ and $\psi_{\varepsilon^2}(x) = E[\psi^2(\varepsilon) \mid X = x]$ are positive and continuous at the given point $x$, and that there exists $r > 0$ such that $E[|\psi(\varepsilon)|^{2+r} \mid X = x]$ is bounded in a neighborhood of $x$.
(7) The function $\psi'(\cdot)$ satisfies
$$E\Bigl[\sup_{|z| \le \delta} \bigl|\psi'(\varepsilon + z) - \psi'(\varepsilon)\bigr| \,\Big|\, X = x\Bigr] = o(1), \qquad E\Bigl[\sup_{|z| \le \delta} \bigl|\psi(\varepsilon + z) - \psi(\varepsilon) - \psi'(\varepsilon) z\bigr| \,\Big|\, X = x\Bigr] = o(\delta), \tag{14}$$
as $\delta \to 0$, uniformly in a neighborhood of $x$.
In addition, for convenience of presentation and proof, $\hat m_M(x)$ will denote any of the three estimators $\hat m_M^{cc}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{*}(x)$, and $\hat m_M'(x)$ will denote the corresponding estimator $\hat m_M^{cc\prime}(x)$, $\hat m_M^{cp\prime}(x)$, or $\hat m_M^{*\prime}(x)$. Lastly, assume $s_l = \int_{-\infty}^{+\infty} u^l K(u)\, du < \infty$ for $l = 0, 1, 2$.
The following theorems show that the three M-estimators mentioned above share the same consistency and asymptotic normality.
Theorem 1.
Assume that conditions (1)–(7) hold, that $f(\cdot)$ and $p''(\cdot)$ are bounded, and that $f(\cdot)$ satisfies a Lipschitz condition, that is, $|f(x) - f(y)| \le c|x - y|$ for some $c > 0$. If $h_n = cn^{-\beta}$, $0 < \beta < 1$, then
$$\hat m_M(x) - m(x) \stackrel{P}{\longrightarrow} 0, \qquad h_n\bigl(\hat m_M'(x) - m'(x)\bigr) \stackrel{P}{\longrightarrow} 0, \qquad n \to \infty. \tag{15}$$
Theorem 2.
Assume that conditions (1)–(7) hold, that $f(\cdot)$ and $p''(\cdot)$ are bounded, and that $f(\cdot)$ satisfies a Lipschitz condition, that is, $|f(x) - f(y)| \le c|x - y|$ for some $c > 0$. If $h_n = cn^{-\beta}$, $0 < \beta < 1$, then
$$\sqrt{nh_n}\,\bigl(\hat m_M(x) - m(x) - C_n\bigr) \stackrel{L}{\longrightarrow} N\bigl(0, \pi(x)\bigr), \tag{16}$$
where $C_n = \frac{1}{2} m''(x) h_n^2 s_2 (1 + o_p(1))$ and $\pi(x) = \frac{\psi_{\varepsilon^2}(x)}{[\psi_\varepsilon(x)]^2}\, \frac{1}{f(x) p(x)} \int_{-\infty}^{+\infty} K^2(u)\, du$.
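As an illustration of how Theorem 2 can be used in practice, a sketch of a pointwise asymptotic confidence interval follows; the plug-in estimates $\hat\psi_{\varepsilon^2}(x)$, $\hat\psi_\varepsilon(x)$, $\hat f(x)$, and $\hat p(x)$ are assumptions added here for illustration and are not constructed in the paper. After correcting for the bias term $C_n$, (16) suggests the approximate $(1-\alpha)$-level interval

$$\hat m_M(x) - \hat C_n \pm z_{1-\alpha/2} \sqrt{\frac{\hat\pi(x)}{n h_n}}, \qquad \hat\pi(x) = \frac{\hat\psi_{\varepsilon^2}(x)}{[\hat\psi_\varepsilon(x)]^2}\, \frac{1}{\hat f(x)\hat p(x)} \int_{-\infty}^{+\infty} K^2(u)\, du.$$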
4. Simulation Studies
In this section, we conduct simulations to better understand the finite-sample performance of the three proposed M-estimators. We compare the biases, the sample mean squared errors (MSE), and the sample mean average squared errors (MASE) of the three M-estimators. The MASE of $\hat m(x)$ is defined by
$$\mathrm{MASE}(h) = E\Bigl[N^{-1} \sum_{i=1}^{N} \bigl(\hat m(x_i) - m(x_i)\bigr)^2 W(x_i)\Bigr], \tag{17}$$
where $W(x_i) = I_{[h, 1-h]}(x_i)$ and $\{x_i, i = 1, \ldots, N\}$ are grid points.
The sample size was set to $n = 500$, and the regression model $Y_i = X_i^3 + \varepsilon_i$ was considered, where $X_i$ is uniform on $(0, 1)$ and $\varepsilon_i$ ($i = 1, 2, \ldots, n$) is a random sample from $N(0, 0.1)$, independent of $X_i$. The kernel function was taken as the Epanechnikov kernel $K(u) = 0.75(1 - u^2)_+$, and $\psi(x) = x$. To generate the indicators $\delta_i$, the function $p(x)$ was chosen as $p(x) = 0.8$ for all $x \in [0, 1]$, and pseudo i.i.d. uniform $U(0, 1)$ random variables $U_i$ were generated: if $U_i \le 0.8$, then $\delta_i = 1$; otherwise, $\delta_i = 0$. In this way, 500 independent data sets were generated.
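The following sketch reproduces one replicate of this design with the estimators sketched in Section 2. Two choices are assumptions made here: the 0.1 in $N(0, 0.1)$ is read as the variance, and the Huber loss from the earlier sketches is kept, whereas the paper's choice $\psi(x) = x$ corresponds to the quadratic loss (which the Huber fit matches for moderate errors).

```python
rng = np.random.default_rng(1)            # arbitrary seed

n, h = 500, 0.2
X = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, np.sqrt(0.1), n)    # N(0, 0.1), 0.1 read as the variance
Y = X ** 3 + eps                          # regression model Y_i = X_i^3 + eps_i
delta = (rng.uniform(0.0, 1.0, n) <= 0.8).astype(int)   # p(x) = 0.8

grid = np.linspace(h, 1.0 - h, 50)        # W(x_i) = I_[h, 1-h](x_i)
fits = np.array([local_m_star(x, X, Y, delta, h)[0] for x in grid])
mase_one = np.mean((fits - grid ** 3) ** 2)   # one Monte Carlo replicate of (17)
print(mase_one)
```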
For $\hat m_M^{*}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{cc}(x)$, the optimal bandwidth $h_{\mathrm{opt}}$ was obtained from the 500 simulations: let $h_{\mathrm{opt}}^{i}$ be the bandwidth minimizing $\mathrm{MASE}(h_n)$ in the $i$th simulation, and compute $h_{\mathrm{opt}} = (1/500) \sum_{i=1}^{500} h_{\mathrm{opt}}^{i}$. With $h_{\mathrm{opt}} = 0.2$, we compare the bias, MSE, and MASE of the three M-estimators. The comparison results are shown in Figures 1–3.
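The bandwidth selection just described can be sketched as a grid search: in each replication, evaluate the empirical MASE over candidate bandwidths, record the minimizer, and average across replications. The bandwidth grid, the 20 replications (instead of the paper's 500), and the use of the complete-case estimator are choices made here purely to keep the sketch short.

```python
def mase_one_run(h, rng):
    """Empirical MASE(h) for one simulated data set (complete-case estimator)."""
    X = rng.uniform(0.0, 1.0, 500)
    Y = X ** 3 + rng.normal(0.0, np.sqrt(0.1), 500)
    delta = (rng.uniform(0.0, 1.0, 500) <= 0.8).astype(int)
    grid = np.linspace(h, 1.0 - h, 25)
    fits = np.array([local_m_cc(x, X, Y, delta, h)[0] for x in grid])
    return np.mean((fits - grid ** 3) ** 2)

h_grid = np.linspace(0.1, 0.4, 7)
rng = np.random.default_rng(2)
h_opts = [h_grid[np.argmin([mase_one_run(h, rng) for h in h_grid])]
          for _ in range(20)]
print(np.mean(h_opts))   # average optimal bandwidth, cf. h_opt = 0.2
```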
Figure 1: Comparison of the biases of the three M-estimators, where the solid curve, the dashed curve, and the dotted curve denote the biases of $\hat m_M^{cc}(x)$, $\hat m_M^{*}(x)$, and $\hat m_M^{cp}(x)$, respectively.
Figure 1 shows the biases of $\hat m_M^{*}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{cc}(x)$. It is easy to see that $\hat m_M^{cc}(x)$ has considerably larger bias, while $\hat m_M^{cp}(x)$ and $\hat m_M^{*}(x)$ are very close at most points.
Figure 2 shows the MSE of $\hat m_M^{*}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{cc}(x)$. It follows from Figure 2 that $\hat m_M^{cc}(x)$ has considerably larger MSE, while $\hat m_M^{cp}(x)$ and $\hat m_M^{*}(x)$ are very close at most points.
Figure 2: Comparison of the MSE of the three M-estimators, where the solid curve, the dashed curve, and the dotted curve denote the MSE of $\hat m_M^{cc}(x)$, $\hat m_M^{*}(x)$, and $\hat m_M^{cp}(x)$, respectively.
Figure 3: Comparison of the MASE of the three M-estimators, where the solid curve, the dashed curve, and the dotted curve denote the MASE of $\hat m_M^{cc}(x)$, $\hat m_M^{*}(x)$, and $\hat m_M^{cp}(x)$, respectively.
Figure 3 shows the MASE of $\hat m_M^{*}(x)$, $\hat m_M^{cp}(x)$, and $\hat m_M^{cc}(x)$. We can see that $\hat m_M^{cc}(x)$ has considerably larger MASE, while $\hat m_M^{cp}(x)$ and $\hat m_M^{*}(x)$ are very close at most points.
The comparisons of the biases, the MSE, and the MASE in Figures 1, 2, and 3 show that the weighted M-estimator and the estimated weighted M-estimator are clearly superior to the complete-case data M-estimator, while there is no appreciable difference in performance between the weighted M-estimator and the estimated weighted M-estimator.
5. The Proofs of Theorems
The proofs of Theorems 1 and 2 will be given in this section, respectively. The following lemmas will be needed for our technical proofs.
Lemma 3.
Under conditions (1)–(7) and for any random sequence $\{\eta_j\}_{j=1}^{n}$ with $\max_{1 \le j \le n} |\eta_j| = o_p(1)$, writing $K_j = K((X_j - x)/h_n)$, one has
$$\begin{aligned} \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j + \eta_j)\, \delta_j K_j (X_j - x)^l &= \psi_\varepsilon(x)\, h_n^{l+1} f(x) p(x)\, s_l \,(1 + o_p(1)), \\ \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j + \eta_j)\, R(X_j)\, \delta_j K_j (X_j - x)^l &= \frac{1}{2} \psi_\varepsilon(x)\, h_n^{l+3} m''(x) f(x) p(x)\, s_{l+2} \,(1 + o_p(1)), \end{aligned} \tag{18}$$
where $R(X_j) = m(X_j) - m(x) - m'(x)(X_j - x)$.
Proof.
Since the second equality can be obtained from the first one, we only prove the first equality. It is obvious that
$$\frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j + \eta_j)\, \delta_j K_j (X_j - x)^l = \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K_j (X_j - x)^l + \frac{1}{n} \sum_{j=1}^{n} \bigl[\psi'(\varepsilon_j + \eta_j) - \psi'(\varepsilon_j)\bigr] \delta_j K_j (X_j - x)^l = T_{n,1} + T_{n,2}, \tag{19}$$
where
$$T_{n,1} = \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K_j (X_j - x)^l, \qquad T_{n,2} = \frac{1}{n} \sum_{j=1}^{n} \bigl[\psi'(\varepsilon_j + \eta_j) - \psi'(\varepsilon_j)\bigr] \delta_j K_j (X_j - x)^l. \tag{20}$$
Similar to the proof of Lemma 4 in [13], we have
$$E T_{n,1} = E\Bigl[\frac{1}{n} \sum_{j=1}^{n} E\bigl(\psi'(\varepsilon_j) \mid X_j\bigr) \delta_j K_j (X_j - x)^l\Bigr] = \psi_\varepsilon(x)\, h_n^{l+1} f(x) p(x)\, s_l \,(1 + o(1)), \qquad T_{n,1} = \psi_\varepsilon(x)\, h_n^{l+1} f(x) p(x)\, s_l \,(1 + o_p(1)). \tag{21}$$
We now prove that $T_{n,2} = o_p(h_n^{l+1})$. For any given $\eta > 0$, let $\Delta_n = (\xi_1, \ldots, \xi_n)^T$, $D_\eta = \{\Delta_n : |\xi_j| \le \eta, \ \forall j \le n\}$, and
$$V(\Delta_n) = \frac{1}{n h_n^{l+1}} \sum_{j=1}^{n} \bigl[\psi'(\varepsilon_j + \xi_j) - \psi'(\varepsilon_j)\bigr] \delta_j K_j (X_j - x)^l. \tag{22}$$
Then we have
$$\sup_{D_\eta} |V(\Delta_n)| \le \frac{1}{n h_n^{l+1}} \sum_{j=1}^{n} \sup_{D_\eta} \bigl|\psi'(\varepsilon_j + \xi_j) - \psi'(\varepsilon_j)\bigr|\, \delta_j K_j (X_j - x)^l. \tag{23}$$
By condition (7), and noticing that $|X_j - x| \le h_n$ in the above expression, it is not difficult to see that
$$E \sup_{D_\eta} |V(\Delta_n)| \le a_\eta\, \frac{1}{n h_n^{l+1}}\, E \sum_{j=1}^{n} \delta_j K_j (X_j - x)^l \le b_\eta, \tag{24}$$
where $a_\eta$ and $b_\eta$ are two sequences of positive numbers tending to zero as $\eta \to 0$. Since $\max_{1 \le j \le n} |\eta_j| = o_p(1)$, it follows that $V(\hat\Delta_n) = o_p(1)$ with $\hat\Delta_n = (\eta_1, \ldots, \eta_n)^T$. The conclusion follows from the fact that $T_{n,2} = h_n^{l+1} V(\hat\Delta_n) = o_p(h_n^{l+1})$.
Lemma 4.
Under conditions (1)–(7) and for any random sequence $\{\eta_j\}_{j=1}^{n}$ with $\max_{1 \le j \le n} |\eta_j| = o_p(1)$, writing $K_j = K((X_j - x)/h_n)$, one has
$$\begin{aligned} \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j + \eta_j)\, \frac{\delta_j}{p(X_j)} K_j (X_j - x)^l &= \psi_\varepsilon(x)\, h_n^{l+1} f(x)\, s_l \,(1 + o_p(1)), \\ \frac{1}{n} \sum_{j=1}^{n} \psi'(\varepsilon_j + \eta_j)\, R(X_j)\, \frac{\delta_j}{p(X_j)} K_j (X_j - x)^l &= \frac{1}{2} \psi_\varepsilon(x)\, h_n^{l+3} m''(x) f(x)\, s_{l+2} \,(1 + o_p(1)), \end{aligned} \tag{25}$$
where $R(X_j) = m(X_j) - m(x) - m'(x)(X_j - x)$.
Proof.
The proof is similar to the proof of Lemma 3.
Lemma 5.
Under conditions (1)–(7), $J_n/\sqrt{nh_n}$ is asymptotically normal and
$$\frac{J_n}{\sqrt{nh_n}} \stackrel{L}{\longrightarrow} N\bigl(0, D(x)\bigr), \tag{26}$$
where $J_n = \sum_{j=1}^{n} \psi(\varepsilon_j)\, \delta_j K((X_j - x)/h_n)$ and $D(x) = \psi_{\varepsilon^2}(x) f(x) p(x) \int_{-\infty}^{+\infty} K^2(u)\, du$.
Proof.
Write $J_n = \sum_{j=1}^{n} \psi(\varepsilon_j) K((X_j - x)/h_n)\, \delta_j \triangleq \sum_{j=1}^{n} \xi_j$. Then $J_n$ is a sum of i.i.d. random variables with mean zero and variance $B_n^2$, where
$$B_n^2 = n E\Bigl\{\psi^2(\varepsilon_1) K^2\Bigl(\frac{X_1 - x}{h_n}\Bigr) \delta_1^2\Bigr\}. \tag{27}$$
Similar to the proof of Lemma 4 in [13], we can easily obtain the asymptotic expression of $B_n^2$, namely,
$$B_n^2 = n h_n\, \psi_{\varepsilon^2}(x) f(x) p(x) \int_{-\infty}^{+\infty} K^2(u)\, du\, \bigl(1 + o(1)\bigr), \tag{28}$$
and easily verify Lyapunov's condition $(1/B_n^{2+r}) \sum_{j=1}^{n} E|\xi_j|^{2+r} \to 0$ using condition (6). That is, $J_n$ is asymptotically normal. With (28) we have
$$\frac{J_n}{\sqrt{nh_n}} \stackrel{L}{\longrightarrow} N\bigl(0, D(x)\bigr). \tag{29}$$
This completes the proof of this lemma.
Lemma 6.
Under conditions (1)–(7), $\sqrt{nh_n}\, \hat J_n$ is asymptotically normal and
$$\sqrt{nh_n}\, \hat J_n \stackrel{L}{\longrightarrow} N\bigl(0, D^{*}(x)\bigr), \tag{30}$$
where $\hat J_n = \frac{1}{nh_n} \sum_{j=1}^{n} \psi(\varepsilon_j)\, \frac{\delta_j}{p(X_j)} K((X_j - x)/h_n)$ and $D^{*}(x) = \psi_{\varepsilon^2}(x)\, \frac{f(x)}{p(x)} \int_{-\infty}^{+\infty} K^2(u)\, du$.
Proof.
The proof is similar to the proof of Lemma 5.
Lemma 7.
Under conditions (1)–(7), suppose that $f(\cdot)$ and $p''(\cdot)$ are bounded and that $f(\cdot)$ satisfies a Lipschitz condition, that is, $|f(x) - f(y)| \le c|x - y|$ for some $c > 0$. If $h_n = cn^{-\beta}$, $0 < \beta < 1$, then the following equalities hold:
$$\begin{aligned} &\text{①}\quad \frac{1}{\hat p(x)} = \frac{1}{p(x)}\bigl(1 + o_p(1)\bigr), \\ &\text{②}\quad \frac{1}{nh_n} \sum_{j=1}^{n} K\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j\, \frac{\delta_j}{p^2(X_j)} \bigl(\hat p(X_j) - p(X_j)\bigr) = o_p(1), \\ &\text{③}\quad \frac{1}{nh_n} \sum_{j=1}^{n} K\Bigl(\frac{X_j - x}{h_n}\Bigr) \psi(\varepsilon_j)\, \frac{\delta_j}{p^2(X_j)} \bigl(\hat p(X_j) - p(X_j)\bigr) = o_p(1). \end{aligned} \tag{31}$$
Proof.
To prove ①, note that the denominator of $1/\hat p(x)$ may vanish, so the ratio in (31) is changed into
$$\frac{1}{\hat p(x)} = \frac{\sum_{i=1}^{n} v_i}{\sum_{j=1}^{n} v_j \delta_j + n^{-2}} + \frac{n^{-2} \sum_{i=1}^{n} v_i}{\sum_{j=1}^{n} v_j \delta_j \bigl(\sum_{j=1}^{n} v_j \delta_j + n^{-2}\bigr)}. \tag{32}$$
Now, we write $Z_n = O_r(a_n)$ if $E|Z_n|^r = O(a_n^r)$. It is easy to see that
$$\text{(i)}\ O_r(a_n) O_r(b_n) = O_{r/2}(a_n b_n), \qquad \text{(ii)}\ Z_n = E(Z_n) + O_r\bigl((E|Z_n - EZ_n|^r)^{1/r}\bigr). \tag{33}$$
Then, by standard kernel density estimation arguments,
$$E m_{n,l} = n h_n^{l+1} f(x) s_l \bigl(1 + O(h_n)\bigr), \qquad E s_{n,l} = n h_n^{l+1} f(x) p(x) s_l \bigl(1 + O(h_n)\bigr), \tag{34}$$
where
$$s_{n,l} = \sum_{j=1}^{n} \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)(X_j - x)^l, \quad l = 0, 1, 2. \tag{35}$$
By property (ii) in (33), for an integer $r > 1$,
$$\frac{1}{n h_n^{l+1}} m_{n,l} = \frac{1}{n h_n^{l+1}} E m_{n,l} + O_r\Bigl(\frac{1}{\sqrt{nh_n}}\Bigr) = f(x) s_l + O_r\Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr), \tag{36}$$
$$\frac{1}{n h_n^{l+1}} s_{n,l} = \frac{1}{n h_n^{l+1}} E s_{n,l} + O_r\Bigl(\frac{1}{\sqrt{nh_n}}\Bigr) = f(x) p(x) s_l + O_r\Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr). \tag{37}$$
Since $s_0 = 1$ and $s_1 = 0$,
$$\sum_{i=1}^{n} v_i = m_{n,0} m_{n,2} - m_{n,1}^2 = n^2 h_n^4 f^2(x) s_2 \Bigl(1 + O_r\Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr)\Bigr), \tag{38}$$
$$\sum_{j=1}^{n} v_j \delta_j = s_{n,0} m_{n,2} - s_{n,1} m_{n,1} = n^2 h_n^4 f^2(x) p(x) s_2 \Bigl(1 + O_r\Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr)\Bigr). \tag{39}$$
Let $W_n = \bigl(\sum_{j=1}^{n} v_j \delta_j + n^{-2}\bigr)/(n^2 h_n^4)$ and $W = p(x) f^2(x) s_2$. We will prove that
$$\frac{1}{W_n} = \frac{1}{W} + o_4(1). \tag{40}$$
In fact, (40) holds if $E\bigl((W/W_n) - 1\bigr)^4 = o(1)$. Now,
$$E\Bigl(\frac{W}{W_n} - 1\Bigr)^4 = E\frac{(W_n - W)^4}{W_n^4} I\Bigl(|W_n - W| \le \frac{W}{2}\Bigr) + E\frac{(W_n - W)^4}{W_n^4} I\Bigl(|W_n - W| > \frac{W}{2}\Bigr) \le \Bigl(\frac{W}{2}\Bigr)^{-4} E(W_n - W)^4 + n^{16} E(W_n - W)^4 I\Bigl(|W_n - W| > \frac{W}{2}\Bigr) \triangleq A_n + B_n, \tag{41}$$
where $A_n = (W/2)^{-4} E(W_n - W)^4$ and $B_n = n^{16} E(W_n - W)^4 I(|W_n - W| > W/2)$. It follows from (39) and the definitions of $W_n$ and $W$ that
$$W_n - W = \frac{\sum_{j=1}^{n} v_j \delta_j + n^{-2}}{n^2 h_n^4} - p(x) f^2(x) s_2 = p(x) f^2(x) s_2\, O_r\Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr). \tag{42}$$
Further, (41) and (42) yield $A_n = o(1)$. Again, since $W_n \ge n^{-4}$,
$$B_n \le n^{16} \Bigl(\frac{W}{2}\Bigr)^{-r} E(W_n - W)^{r+4} = O\Bigl(n^{16} \Bigl(h_n + \frac{1}{\sqrt{nh_n}}\Bigr)^{r+4}\Bigr), \tag{43}$$
which shows that $B_n = o(1)$ if $r$ is sufficiently large. As a result, equality (40) holds. Combining (38), (39), and (40) completes the proof of ①.
We next prove ②. Similar to the proof of Theorem 1 in [4], one can easily obtain
$$\begin{aligned} E\bigl\{(\hat p(X_i) - p(X_i)) \mid X_i\bigr\} &= \frac{1}{2} h_n^2 p''(X_i) \int_{-\infty}^{+\infty} u^2 K(u)\, du\, \{1 + o_p(1)\} = \frac{1}{2} h_n^2 C_1(X_i) \{1 + o_p(1)\}, \\ \mathrm{Var}\bigl\{(\hat p(X_i) - p(X_i)) \mid X_i\bigr\} &= \frac{1}{nh_n}\, \frac{p(X_i)(1 - p(X_i))}{f(X_i)} \int_{-\infty}^{+\infty} K^2(u)\, du\, \{1 + o_p(1)\} = \frac{1}{nh_n} C_2(X_i) \{1 + o_p(1)\}. \end{aligned} \tag{44}$$
As a consequence, letting $B_n$ denote the left-hand side of ②, we have with (44) that
$$\begin{aligned} \mathrm{Var}\, B_n &= \frac{1}{n^2 h_n^2} \mathrm{Var}\Bigl\{\sum_{j=1}^{n} K\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j\, \frac{\delta_j}{p^2(X_j)} \bigl(\hat p(X_j) - p(X_j)\bigr)\Bigr\} \\ &= \frac{1}{n^2 h_n^2} E\Bigl\{\sum_{j=1}^{n} K\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j\, \frac{\delta_j}{p^2(X_j)} \bigl(\hat p(X_j) - p(X_j)\bigr)\Bigr\}^2 + o_p(1) \\ &\le \frac{1}{n^2 h_n^2} E\Bigl\{\sum_{j=1}^{n} K^2\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j^2\, \frac{1}{p^4(X_j)}\, E\bigl[(\hat p(X_j) - p(X_j))^2 \mid X_j\bigr]\Bigr\} + o_p(1) \\ &= \frac{1}{n^2 h_n^2} E\Bigl\{\sum_{j=1}^{n} K^2\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j^2\, \frac{1}{p^4(X_j)}\, \mathrm{Var}\bigl(\hat p(X_j) - p(X_j) \mid X_j\bigr)\Bigr\} \\ &\quad + \frac{1}{n^2 h_n^2} E\Bigl\{\sum_{j=1}^{n} K^2\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j^2\, \frac{1}{p^4(X_j)}\, E^2\bigl((\hat p(X_j) - p(X_j)) \mid X_j\bigr)\Bigr\} + o_p(1) \\ &= \frac{1}{n^3 h_n^3} E\Bigl\{\sum_{j=1}^{n} K^2\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j^2\, \frac{C_2(X_j)}{p^4(X_j)}\Bigr\} (1 + o_p(1)) + \frac{h_n^2}{4 n^2} E\Bigl\{\sum_{j=1}^{n} K^2\Bigl(\frac{X_j - x}{h_n}\Bigr) \varepsilon_j^2\, \frac{C_1^2(X_j)}{p^4(X_j)}\Bigr\} (1 + o_p(1)) \\ &= \frac{1}{n^2 h_n^3} E\Bigl\{K^2\Bigl(\frac{X_1 - x}{h_n}\Bigr) \varepsilon_1^2\, \frac{C_2(X_1)}{p^4(X_1)}\Bigr\} (1 + o_p(1)) + \frac{h_n^2}{4n} E\Bigl\{K^2\Bigl(\frac{X_1 - x}{h_n}\Bigr) \varepsilon_1^2\, \frac{C_1^2(X_1)}{p^4(X_1)}\Bigr\} (1 + o_p(1)) \\ &= o_p(1). \end{aligned} \tag{45}$$
This completes the proof of ②.
The proof of ③ is similar to the proof of ②.
Lemma 8.
Under conditions (1)–(7), if $n h_n^4 \to 0$ and $n h_n^2/\log(h_n^{-1}) \to \infty$ as $n \to \infty$, then
$$\sup_{x \in D}\Bigl|\hat p(x) - p(x) - \frac{1}{n h_n f(x)} \sum_{j=1}^{n} \varepsilon_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)\Bigr| = O_p\Bigl\{c_n h_n^2 + c_n^2 \log^{1/2}\Bigl(\frac{1}{h_n}\Bigr)\Bigr\}, \tag{46}$$
where $D \subseteq \mathbb{R}$ is a compact set, $c_n = (nh_n)^{-1/2}$, and $\hat p(x)$ is the estimator of $p(x)$.
Proof.
The proof follows from the same arguments as those of Theorem 2 in [14].
Lemma 9.
Under conditions (1)–(7), suppose that $f(\cdot)$ and $p''(\cdot)$ are bounded and that $f(\cdot)$ satisfies a Lipschitz condition, that is, $|f(x) - f(y)| \le c|x - y|$ for some $c > 0$. If $h_n = cn^{-\beta}$, $0 < \beta < 1$, then $\sqrt{nh_n}\, \tilde J_n$ is asymptotically normal and $\sqrt{nh_n}\, \tilde J_n \stackrel{L}{\longrightarrow} N(0, D^{*}(x))$, where $\tilde J_n = \frac{1}{nh_n} \sum_{j=1}^{n} \psi(\varepsilon_j) K((X_j - x)/h_n)\, \frac{\delta_j}{\hat p(X_j)}$ and $D^{*}(x) = \psi_{\varepsilon^2}(x)\, \frac{f(x)}{p(x)} \int_{-\infty}^{+\infty} K^2(u)\, du$.
Proof.
By equality ① of Lemma 7 and Lemma 8, we get
$$\tilde J_n - \hat J_n = \frac{1}{nh_n} \sum_{j=1}^{n}\Bigl[\frac{\delta_j}{\hat p(X_j)} \psi(\varepsilon_j) K\Bigl(\frac{X_j - x}{h_n}\Bigr) - \frac{\delta_j}{p(X_j)} \psi(\varepsilon_j) K\Bigl(\frac{X_j - x}{h_n}\Bigr)\Bigr] = \frac{1}{nh_n} \sum_{j=1}^{n} \psi(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) \frac{p(X_j) - \hat p(X_j)}{p(X_j) \hat p(X_j)} = \frac{1}{nh_n} \sum_{j=1}^{n} \psi(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) \frac{p(X_j) - \hat p(X_j)}{p^2(X_j)}\, \{1 + o_p(1)\} = -G_n \{1 + o_p(1)\}, \tag{47}$$
where $G_n$ is the sum in equality ③ of Lemma 7. It then follows from ③ of Lemma 7 and from Lemma 6 that $\tilde J_n = \hat J_n + o_p(1)$ and consequently $\sqrt{nh_n}\, \tilde J_n \stackrel{L}{\longrightarrow} N(0, D^{*}(x))$. This completes the proof of this lemma.
In what follows we present the proofs of Theorems 1 and 2, respectively.
Proof of Theorem 1.
The theorem is proved by considering the following two cases.
(i) If either $\hat m_M(x) = \hat m_M^{cc}(x)$ or $\hat m_M(x) = \hat m_M^{cp}(x)$, the proof is similar to the proof of Theorem 1 in [15].
(ii) If $\hat m_M(x) = \hat m_M^{*}(x)$, then, similar to the proof of Theorem 1 in [15], the conclusion is obtained immediately by equality ① of Lemma 7 and Lemma 8. Cases (i) and (ii) together complete the proof of this theorem.
Proof of Theorem 2.
We prove the conclusion of this theorem by considering the following three cases.
(1) If $\hat m_M(x) = \hat m_M^{cc}(x)$, let
$$R(X_j) = m(X_j) - m(x) - m'(x)(X_j - x), \qquad \hat\eta_j = R(X_j) - \bigl[\hat m_M^{cc}(x) - m(x)\bigr] - \bigl[\hat m_M^{cc\prime}(x) - m'(x)\bigr](X_j - x) = m(X_j) - \hat m_M^{cc}(x) - \hat m_M^{cc\prime}(x)(X_j - x), \tag{48}$$
where $\hat m_M^{cc\prime}(\cdot)$ is the estimator of $m'(\cdot)$. Then
$$\varepsilon_j + \hat\eta_j = Y_j - \hat m_M^{cc}(x) - \hat m_M^{cc\prime}(x)(X_j - x). \tag{49}$$
Using (6), we get
$$\sum_{j=1}^{n}\bigl\{\psi(\varepsilon_j) + \psi'(\varepsilon_j)\hat\eta_j + \bigl[\psi(\varepsilon_j + \hat\eta_j) - \psi(\varepsilon_j) - \psi'(\varepsilon_j)\hat\eta_j\bigr]\bigr\}\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) = 0. \tag{50}$$
Again,
$$\sum_{j=1}^{n} \psi'(\varepsilon_j)\hat\eta_j\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) = \sum_{j=1}^{n} \psi'(\varepsilon_j) R(X_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) - \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{cc}(x) - m(x)\bigr) - \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{cc\prime}(x) - m'(x)\bigr)(X_j - x) = I_1 - I_2 - I_3, \tag{51}$$
where
$$I_1 = \sum_{j=1}^{n} \psi'(\varepsilon_j) R(X_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr), \qquad I_2 = \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{cc}(x) - m(x)\bigr), \qquad I_3 = \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{cc\prime}(x) - m'(x)\bigr)(X_j - x). \tag{52}$$
By Lemma 3, we have that
$$I_1 = \frac{n h_n^3}{2}\, p(x) f(x) \psi_\varepsilon(x) m''(x) s_2 (1 + o_p(1)), \qquad I_2 = n h_n\, p(x) f(x) \psi_\varepsilon(x) s_0 \bigl(\hat m_M^{cc}(x) - m(x)\bigr)(1 + o_p(1)), \tag{53}$$
$$I_3 = n h_n^2\, p(x) f(x) \psi_\varepsilon(x) s_1 \bigl(\hat m_M^{cc\prime}(x) - m'(x)\bigr)(1 + o_p(1)). \tag{54}$$
It follows from the consistency shown in (15) and condition (7) that
$$\sum_{j=1}^{n} \bigl[\psi(\varepsilon_j + \hat\eta_j) - \psi(\varepsilon_j) - \psi'(\varepsilon_j)\hat\eta_j\bigr]\, \delta_j K\Bigl(\frac{X_j - x}{h_n}\Bigr) = o_p(I_2). \tag{55}$$
Since $s_0 = 1$ and $s_1 = 0$, we have from (50) and (53)–(55) that
$$n h_n\, p(x) f(x) \psi_\varepsilon(x) \bigl(\hat m_M^{cc}(x) - m(x)\bigr)(1 + o_p(1)) = \frac{n h_n^3}{2}\, p(x) f(x) \psi_\varepsilon(x) m''(x) s_2 (1 + o_p(1)) + J_n. \tag{56}$$
Therefore,
$$\hat m_M^{cc}(x) - m(x) = \frac{h_n^2 m''(x)}{2} s_2 (1 + o_p(1)) + \frac{J_n}{n h_n\, \psi_\varepsilon(x) f(x) p(x)} (1 + o_p(1)) = C_n + \frac{J_n}{n h_n\, \psi_\varepsilon(x) f(x) p(x)} (1 + o_p(1)). \tag{57}$$
By Lemma 5 and Slutsky’s Theorem, we get (16); that is,
$$\sqrt{nh_n}\,\bigl(\hat m_M^{cc}(x) - m(x) - C_n\bigr) \stackrel{L}{\longrightarrow} N\bigl(0, \pi(x)\bigr). \tag{58}$$
(2) If $\hat m_M(x) = \hat m_M^{cp}(x)$, then, similar to the proof of case (1), the conclusion follows immediately from Lemmas 4 and 6.
(3) We now prove the case $\hat m_M(x) = \hat m_M^{*}(x)$. Let
$$R(X_j) = m(X_j) - m(x) - m'(x)(X_j - x), \qquad \hat\eta_j^{*} = R(X_j) - \bigl[\hat m_M^{*}(x) - m(x)\bigr] - \bigl[\hat m_M^{*\prime}(x) - m'(x)\bigr](X_j - x) = m(X_j) - \hat m_M^{*}(x) - \hat m_M^{*\prime}(x)(X_j - x), \tag{59}$$
where $\hat m_M^{*\prime}(\cdot)$ is the estimator of $m'(\cdot)$. Then we have
$$\varepsilon_j + \hat\eta_j^{*} = Y_j - \hat m_M^{*}(x) - \hat m_M^{*\prime}(x)(X_j - x). \tag{60}$$
By (13), one can get
$$\sum_{j=1}^{n}\bigl\{\psi(\varepsilon_j) + \psi'(\varepsilon_j)\hat\eta_j^{*} + \bigl[\psi(\varepsilon_j + \hat\eta_j^{*}) - \psi(\varepsilon_j) - \psi'(\varepsilon_j)\hat\eta_j^{*}\bigr]\bigr\}\, \frac{\delta_j}{\hat p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr) = 0. \tag{61}$$
It then follows from equality ① of Lemma 7, Lemma 8, and (61) that
$$\frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\hat\eta_j^{*}\, \frac{\delta_j}{\hat p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr) = \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j) R(X_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr) - \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{*}(x) - m(x)\bigr) - \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{*\prime}(x) - m'(x)\bigr)(X_j - x) + o_p(1) = I_1 - I_2 - I_3 + o_p(1), \tag{62}$$
where
$$I_1 = \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j) R(X_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr), \qquad I_2 = \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{*}(x) - m(x)\bigr), \qquad I_3 = \frac{1}{nh_n} \sum_{j=1}^{n} \psi'(\varepsilon_j)\, \frac{\delta_j}{p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr)\bigl(\hat m_M^{*\prime}(x) - m'(x)\bigr)(X_j - x). \tag{63}$$
By Lemma 4, it is easy to see
$$I_1 = \frac{h_n^2}{2} f(x) \psi_\varepsilon(x) m''(x) s_2 (1 + o_p(1)), \tag{64}$$
$$I_2 = f(x) \psi_\varepsilon(x) s_0 \bigl(\hat m_M^{*}(x) - m(x)\bigr)(1 + o_p(1)), \tag{65}$$
$$I_3 = h_n f(x) \psi_\varepsilon(x) s_1 \bigl(\hat m_M^{*\prime}(x) - m'(x)\bigr)(1 + o_p(1)). \tag{66}$$
According to equality ① of Lemma 7, Lemma 8, the consistency shown in (15), and condition (7), we have
$$\frac{1}{nh_n} \sum_{j=1}^{n} \bigl[\psi(\varepsilon_j + \hat\eta_j^{*}) - \psi(\varepsilon_j) - \psi'(\varepsilon_j)\hat\eta_j^{*}\bigr]\, \frac{\delta_j}{\hat p(X_j)} K\Bigl(\frac{X_j - x}{h_n}\Bigr) = o_p(I_2). \tag{67}$$
Since $s_0 = 1$ and $s_1 = 0$, we have from (61) and (64)–(67) that
$$f(x) \psi_\varepsilon(x) \bigl(\hat m_M^{*}(x) - m(x)\bigr)(1 + o_p(1)) = \frac{h_n^2}{2} f(x) \psi_\varepsilon(x) m''(x) s_2 (1 + o_p(1)) + \tilde J_n. \tag{68}$$
Therefore,
$$\hat m_M^{*}(x) - m(x) = \frac{h_n^2 m''(x)}{2} s_2 (1 + o_p(1)) + \frac{\tilde J_n}{\psi_\varepsilon(x) f(x)} (1 + o_p(1)) = C_n + \frac{\tilde J_n}{\psi_\varepsilon(x) f(x)} (1 + o_p(1)). \tag{69}$$
By Lemma 9 and Slutsky’s Theorem, we get (16); that is,
$$\sqrt{nh_n}\,\bigl(\hat m_M^{*}(x) - m(x) - C_n\bigr) \stackrel{L}{\longrightarrow} N\bigl(0, \pi(x)\bigr). \tag{70}$$
Combining cases (1), (2), and (3) completes the proof.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was partly supported by the Science Foundation of the Education Department of Shaanxi Province of China (2013JK0593), the Scientific Research Foundation (BS1014) and the Education Reform Foundation (2012JG40) of Xi’an Polytechnic University, and the National Natural Science Foundations of China (11201362, 11271297, and 1101325).
References
[1] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, London, UK, 1996.
[2] P. J. Green and B. W. Silverman, Nonparametric Regression and Generalized Linear Models, Chapman and Hall, London, UK, 1994.
[3] D. M. Titterington and G. M. Mill, "Kernel-based density estimates from incomplete data," Journal of the Royal Statistical Society, Series B, vol. 45, no. 2, pp. 258–266, 1983.
[4] J. Fan, "Local linear regression smoothers and their minimax efficiencies," The Annals of Statistics, vol. 21, no. 1, pp. 196–216, 1993.
[5] T. Hastie and C. Loader, "Local regression: automatic kernel estimators of regression curves," vol. 15, pp. 182–201, 1993.
[6] T. Orchard and M. A. Woodbury, "A missing information principle: theory and applications," in Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, vol. 3, pp. 697–715, University of California, 1970.
[7] D. Ruppert and M. P. Wand, "Multivariate locally weighted least squares regression," The Annals of Statistics, vol. 22, no. 3, pp. 1346–1370, 1994.
[8] P. E. Cheng, "Nonparametric estimation of mean functionals with data missing at random," Journal of the American Statistical Association, vol. 89, no. 425, pp. 81–87, 1994.
[9] K. Hirano, G. W. Imbens, and G. Ridder, "Efficient estimation of average treatment effects using the estimated propensity score," Econometrica, vol. 71, no. 4, pp. 1161–1189, 2003.
[10] Q. Wang, O. Linton, and W. Härdle, "Semiparametric regression analysis with missing response at random," Journal of the American Statistical Association, vol. 99, no. 466, pp. 334–345, 2004.
[11] H. Liang, "Generalized partially linear models with missing covariates," Journal of Multivariate Analysis, vol. 99, no. 5, pp. 880–895, 2008.
[12] Q. Wang and Z. Sun, "Estimation in partially linear models with missing responses at random," Journal of Multivariate Analysis, vol. 98, no. 7, pp. 1470–1493, 2007.
[13] J. Fan and I. Gijbels, "Variable bandwidth and local linear regression smoothers," The Annals of Statistics, vol. 20, no. 4, pp. 2008–2036, 1992.
[14] R. J. Carroll, J. Fan, I. Gijbels, and M. P. Wand, "Generalized partially linear single-index models," Journal of the American Statistical Association, vol. 92, no. 438, pp. 477–489, 1997.
[15] J. Fan and J. Jiang, "Variable bandwidth and one-step local M-estimator," vol. 29, no. 1, pp. 688–702, 1999.