In this paper, we consider a single-index varying-coefficient model with application to longitudinal data. In order to accommodate the within-group correlation, we apply the block empirical likelihood procedure to the longitudinal single-index varying-coefficient model and prove a nonparametric version of Wilks' theorem, which can be used to construct block empirical likelihood confidence regions with asymptotically correct coverage probabilities for the parametric component. In comparison with normal approximations, the proposed method does not require a consistent estimator for the asymptotic covariance matrix, making it easier to conduct inference on the model's parametric component. Simulations demonstrate the performance of the proposed method.
1. Introduction
The single-index varying-coefficient model, proposed by Huang [1], is an important tool for exploring dynamic patterns in many complex dynamic systems arising in economics, finance, politics, epidemiology, medical science, and ecology. As mentioned in Gao et al. [2], complex dynamic systems arise in many varieties. Such systems are often concurrent and distributed, because they have to react to various kinds of events, signals, and conditions. They may be characterized by uncertainties, time delays, stochastic perturbations, hybrid dynamics, distributed dynamics, chaotic dynamics, and a large number of algebraic loops. Much related work has appeared, such as Jian et al. [3] and Hu et al. [4]. The single-index varying-coefficient model is one method that can be used to describe such complex dynamic systems. It is a natural extension of classical parametric models with good interpretability and is becoming increasingly popular in data analysis.
Longitudinal data arise frequently in many scientific studies. For longitudinal data, observations collected from the same subject at different times are correlated, while observations from different subjects are typically independent. Therefore, it is of great interest to estimate the regression function in a way that incorporates the within-subject correlation, so as to improve the efficiency of estimation. The single-index varying-coefficient model is a popular nonparametric fitting technique; it is easily interpreted in real applications because it combines the features of the single-index model and the varying-coefficient model. In addition, the single-index varying-coefficient model may include cross-product terms of some components of the covariates. Hence, it has considerable flexibility to cater for a complex multivariate nonlinear structure.
Without loss of generality, we consider a longitudinal study with N subjects and ni observations over time for the ith subject (i=1,…,N), with a total of n=∑i=1Nni observations. In this article, we adapt the single-index varying-coefficient model to longitudinal data and propose a single-index varying-coefficient longitudinal data model of the form
(1)yij=gT(βTxij)zij+εij,i=1,…,N;j=1,…,ni,
where (xij,zij)∈Rp×Rq is a vector of covariates, yij is the jth measurement on the ith subject, β is a p×1 vector of unknown parameters, g(·) is a q×1 vector of unknown functions, and εij is a random error with mean 0 and finite variance σ2; we assume that εij and (xij,zij) are independent. For identifiability, it is assumed that ∥β∥=1 and that the first nonzero element of β is positive, where ∥·∥ denotes the Euclidean norm.
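For concreteness, data from model (1) can be simulated as follows. This is a minimal sketch: the dimensions, the coefficient-function vector g(·), and the error scale below are illustrative choices of ours, not values used in this paper, and the errors are drawn independently only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and true index parameter (illustration only).
N, n_i, p, q = 50, 4, 2, 2
beta0 = np.array([1.0, 2.0])
beta0 /= np.linalg.norm(beta0)   # identifiability: ||beta|| = 1, first element > 0

def g(u):
    # An example q-vector of coefficient functions (our choice, not the paper's).
    return np.column_stack([np.sin(2 * np.pi * u), np.cos(np.pi * u)])

x = rng.normal(size=(N, n_i, p))             # covariates x_ij
z = rng.normal(size=(N, n_i, q))             # covariates z_ij
u = x @ beta0                                # single index beta^T x_ij, shape (N, n_i)
eps = rng.normal(scale=0.2, size=(N, n_i))   # errors (independent here for brevity)
y = np.sum(g(u.ravel()).reshape(N, n_i, q) * z, axis=2) + eps   # model (1)
```

The normalization of beta0 enforces the identifiability constraint ∥β∥=1 with a positive first element.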
Obviously, model (1) includes a class of important statistical models. For example, if q=1 and zij=1, model (1) reduces to the single-index longitudinal data model which was proposed by Bai et al. [5] to estimate the index coefficient and unknown link function in a single-index model for longitudinal data by combining penalized splines and quadratic inference functions. If p=1 and β=1, (1) is the varying-coefficient longitudinal data model studied by Chiang et al. [6], Huang et al. [7], and Qu and Li [8], among others. So model (1) is easily interpreted in real applications because it has the features of the single-index longitudinal data model and the varying-coefficient longitudinal data model. In addition, model (1) may include cross-product terms of some components of xij and zij. Hence, it has considerable flexibility to cater for complex multivariate nonlinear structure.
When ni=1, model (1) reduces to the nonlongitudinal single-index varying-coefficient model, whose estimation and applications have been studied by several authors. Recently, empirical likelihood methods have been applied to the nonlongitudinal single-index varying-coefficient model. For example, Xue and Wang [9] developed statistical techniques for the unknown coefficient functions and single-index parameters in single-index varying-coefficient models: they first estimate the nonparametric component via local linear fitting, then construct an estimated empirical likelihood ratio function, and hence obtain a maximum empirical likelihood estimator for the parametric component. The motivation is that empirical likelihood based inference has many desirable statistical properties. For example, it does not involve any variance estimation, which is rather complicated in nonparametric or semiparametric regression settings, and hence it is robust against heteroscedasticity; moreover, confidence regions based on the empirical likelihood method have no predetermined symmetry, so they can better reflect the true shape of the underlying distribution. Owen [10, 11] and many others developed empirical likelihood into a general methodology; see, for example, Wang and Jing [12], Chen and Qin [13], Shi and Lau [14], and Xue and Zhu [15–17]. A recent survey on empirical likelihood can be found in the monograph of Owen [18]. Further methods for the single-index varying-coefficient model have been proposed, such as Huang and Zhang [19] and Feng and Xue [20]. When ni>1, model (1) is the single-index longitudinal data model. The usual empirical likelihood method cannot be applied to the single-index longitudinal data model (1), however, due to correlation within groups. In this paper, we propose a block empirical likelihood procedure to accommodate this correlation.
A nonparametric version of the Wilks’ theorem is derived, which can be used to construct confidence regions with asymptotically correct coverage probabilities for the parametric component in the model. Compared with normal approximations, our method has the appealing feature that it does not require one to construct a consistent estimator for the asymptotic covariance matrix. Furthermore, the block empirical likelihood method avoids intensive Monte Carlo simulations usually required by the bootstrap method.
The rest of the paper is organized as follows. Section 2 introduces the estimated block empirical likelihood method. Section 3 derives the nonparametric version of Wilks’ theorem. Section 4 provides a data-driven procedure to choose the tuning parameters. A simulation study is given in Section 5. Proof of the main result is relegated to Section 6.
2. Block Empirical Likelihood Method
In this section, we extend the results of You et al. [21] and Xue and Wang [9] to the single-index varying-coefficient longitudinal data model.
To apply the block empirical likelihood method to model (1), we introduce an auxiliary random vector
(2)ηij(β)={yij-gT(βTxij)zij}g˙T(βTxij)zijxijω(βTxij),
where g˙(·) stands for the derivative of the function vector g(·), and ω(·) is a bounded weight function with bounded support 𝒰ω, introduced to control the boundary effect in the estimation of g(·) and g˙(·). For convenience, we take ω(·) to be the indicator function of the set 𝒰ω. Note that E{ηij(β)}=0 if β=β0. Hence, the problem of testing whether β is the true parameter is equivalent to testing whether E{ηij(β)}=0 for i=1,…,N;j=1,…,ni. Because g(·) and g˙(·) are unknown, we cannot directly use the block empirical likelihood method to make statistical inference on β. A natural way is to replace g(·) and g˙(·) by their estimators. In this paper, we estimate the vector functions g(·) and g˙(·) via the local linear regression technique (see, e.g., Fan and Gijbels [22]). For fixed β, the local linear estimators of g(u) and g˙(u) are defined as g^(u;β)=a^ and g˙^(u;β)=b^, where a^ and b^ minimize the sum of weighted squares:
(3)∑i=1N∑j=1ni[yij-{a+b(βTxij-u)}Tzij]2Kh(βTxij-u),
where Kh(·)=h-1K(·/h), K(·) is a kernel function, and h=hn is a bandwidth sequence that decreases to 0 as n increases to ∞. It follows from least squares theory that
(4)(g^T(u;β),hg˙^T(u;β))T=Sn-1(u;β)ξn(u;β),
where
(5)Sn(u;β)=(Sn,0(u;β)Sn,1(u;β)Sn,1(u;β)Sn,2(u;β)),ξn(u;β)=(ξn,0T(u;β),ξn,1T(u;β))T
with
(6)Sn,k(u;β)=1n∑i=1N∑j=1nizijzijT(βTxij-uh)kKh(βTxij-u),ξn,k(u;β)=1n∑i=1N∑j=1nizijyij(βTxij-uh)kKh(βTxij-u).
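The estimator (4) can be computed without assembling Sn and ξn explicitly, by solving the equivalent weighted least squares problem (3) directly. A minimal sketch follows; the Gaussian kernel and the function name are our own choices for illustration.

```python
import numpy as np

def local_linear_g(u, beta, x, y, z, h):
    """Local linear estimators ghat(u; beta) and gdothat(u; beta), obtained by
    minimizing the weighted sum of squares (3) for a fixed beta.
    x: (n, p), y: (n,), z: (n, q) -- observations pooled over i and j."""
    t = x @ beta - u                                             # beta^T x_ij - u
    w = np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2 * np.pi))   # Gaussian K_h weights
    design = np.hstack([z, t[:, None] * z])                      # columns for a and b in (3)
    wd = design * w[:, None]
    coef = np.linalg.solve(design.T @ wd, wd.T @ y)              # weighted normal equations
    q = z.shape[1]
    return coef[:q], coef[q:]                                    # (ghat(u), gdothat(u))
```

Evaluating the function on a grid of u values traces out g^(·;β); a second, larger bandwidth h1 can be passed for the derivative estimator, as in Remark 1 below.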
Remark 1.
Since the convergence rate of the estimator of g˙0(u) is slower than that of the estimator of g0(u) when the same bandwidth is used, the estimator β^ of β would then converge more slowly than the root-n rate. To increase the convergence rate of the estimator of g˙0(u), we introduce another bandwidth h1 to replace h in g˙^(u;β) and denote the resulting estimator by g˙^h1(u;β).
Similar to Owen [11] and Shi and Lau [14], {r^ij(β)=yij-g^T(βTxij)zij,i=1,…,N;j=1,…,ni} can be treated as a random sieve approximation of the random error sequence {εij,i=1,…,N;j=1,…,ni}. In order to deal with the correlation within groups, we use the block empirical likelihood procedure proposed by You et al. [21]. Unlike the usual empirical likelihood method, the block empirical likelihood procedure takes the “data” r^ij(β) for j=1,…,ni into account as a whole. Let η^ij(β)=r^ij(β)g˙^T(βTxij)zijxijω(βTxij) be ηij(β), with g(βTxij) and g˙(βTxij) replaced by g^(βTxij;β) and g˙^(βTxij;β), respectively, for i=1,…,N;j=1,…,ni. Then an estimated block empirical likelihood function for β is defined as
(7)L^(β)=max{∏i=1Npi∣pi≥0,∑i=1Npi=1,∑i=1Npi∑j=1niη^ij(β)=0}.
For a given β, a unique maximum exists provided that 0 is inside the convex hull of the points ∑j=1niη^ij(β) for i=1,…,N. The maximum of (7) may be found via the method of Lagrange multipliers. The optimal value of pi satisfying (7) may be shown to be
(8)pi=1N×11+λT∑j=1niη^ij(β),
where the Lagrange multiplier λ=(λ1,…,λp)T is the solution of the following equation:
(9)0=1N∑i=1N∑j=1niη^ij(β)1+λT∑j=1niη^ij(β).
Since p1×⋯×pN is maximized for pi=1/N in the absence of parametric constraints, we define the corresponding estimated profile block empirical log-likelihood ratio as
(10)l^(β)=-∑i=1Nlog[1+λT∑j=1niη^ij(β)].
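A sketch of how (8)-(10) can be computed: given the block sums Ti=∑j η^ij(β), the Lagrange multiplier λ solving (9) is found numerically; Newton's method below is a standard choice of ours, not a root-finder prescribed by the paper, and (10) is then evaluated directly.

```python
import numpy as np

def block_el_logratio(T, max_iter=100, tol=1e-10):
    """Estimated block empirical log-likelihood ratio (10).
    T: (N, p) array whose i-th row is the block sum T_i = sum_j etahat_ij(beta)."""
    N, p = T.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        denom = 1.0 + T @ lam                        # 1 + lambda^T T_i, must stay positive
        grad = (T / denom[:, None]).sum(axis=0) / N  # right-hand side of (9)
        hess = -(T / denom[:, None] ** 2).T @ T / N  # Jacobian of (9) in lambda
        step = np.linalg.solve(hess, -grad)          # Newton step
        lam = lam + step
        if np.linalg.norm(step) < tol:
            break
    return -np.sum(np.log1p(T @ lam)), lam           # l_hat(beta) from (10)
```

The returned log-ratio is nonpositive, so -2 times it is the test statistic studied in the next section.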
We will show in the next section that if β0 is the true parameter vector, l^(β0) is asymptotically chi-square distributed.
3. Theoretical Properties
Throughout this article, we assume that N tends to infinity, so that the total sample size n=n1+⋯+nN also tends to infinity, while each ni remains fixed. To establish the nonparametric Wilks' theorem for l^(β0), we first make the following assumptions.
(A.1) The density function f(u) of βTxij is bounded away from zero for u∈𝒰ω and β near β0 and satisfies the Lipschitz condition of order 1 on 𝒰ω, where 𝒰ω is the support of ω(u).
(A.2) The functions gk(u), 1≤k≤q, have continuous second derivatives on 𝒰ω, where gk(u) is the kth component of g(u).
(A.3) The kernel K(·) is a symmetric probability density function with bounded support, satisfies the Lipschitz condition of order 1, and satisfies ∫u2K(u)du≠0.
(A.6) The matrix D(u)=E(zijzijT∣β0Txij=u) is positive definite, and each entry of D(u) and C(u)=E(vijzijT∣β0Txij=u) satisfies the Lipschitz condition of order 1 on 𝒰ω, where vij=xijg˙0T(β0Txij)zijω(β0Txij), and 𝒰ω is defined in (A.1).
(A.7) The matrices B(β0)=E(vijvijT) and B*(β0)=B(β0)-E{C(β0Txij)g˙0(β0Txij)E(xijT∣β0Txij)} are positive definite, where vij is defined in (A.6).
Remark 2.
Condition (A.1) is used to bound the density function of βTxij away from zero. This ensures that the denominators of g^(u;β) and g˙^(u;β) are bounded away from 0, with probability tending to one, for u∈𝒰ω. The second-derivative requirement in (A.2) is a standard smoothness condition. Conditions (A.3)–(A.5) are needed for the asymptotic normality and the uniform consistency of the estimators. It should be pointed out that the moment condition can be replaced by E(∥xij∥6+δ)<∞, E(∥zij∥6+δ)<∞, and E(|εij|6+δ)<∞ for some δ>0; the exponent 6 is the minimum value required for the asymptotic normality and the uniform consistency of the estimators. Conditions (A.6) and (A.7) ensure that the asymptotic variance of the estimator of β0 exists.
Let ℬ={β∈Rp:∥β∥=1, and the first nonzero element is positive}. Then β0 is an inner point of set ℬ. The following theorem shows that -2l^(β0) is asymptotically distributed as a weighted sum of independent χ12 variables.
Theorem 3.
Suppose that (A.1)–(A.7) hold, then as N→∞,
(11)-2l^(β0)→Dω1χ1,12+⋯+ωpχ1,p2,
where →D represents convergence in distribution, χ1,12,…,χ1,p2 are independent χ12 variables, and the weights ωj, for 1≤j≤p, are the eigenvalues of G(β0)=B-1(β0)A(β0). Here B(β0) is defined in condition (A.7),
(12)A(β0)=B(β0)-E{C(β0Txij)D-1(β0Txij)CT(β0Txij)},
and C(u) and D(u) are defined in condition (A.6).
To apply Theorem 3 to construct a confidence region or interval for β0, we need to consistently estimate the unknown weights ωj. By the plug-in method, A(β0) and B(β0) can be consistently estimated by
(13)A^(β^)=1N∑i=1N∑j=1ni{v^ijv^ijT-C^(β^Txij)D^-1(β^Txij)C^T(β^Txij)},(14)B^(β^)=1N∑i=1N∑j=1niv^ijv^ijT,
respectively, where β^ is the maximum empirical likelihood estimator of β0, obtained by maximizing (10), v^ij=xijg˙^T(β^Txij;β^)zijω(β^Txij), C^(·)=∑i=1N∑j=1niWnij(·)v^ijzijT, and D^(·)=∑i=1N∑j=1niWnij(·)zijzijT with
(15)Wnij(·)=K1((β^Txij-·)/bn)/∑k=1N∑l=1nkK1((β^Txkl-·)/bn),
where K1(·) is a kernel function, and bn is a bandwidth with 0<bn→0.
This implies that the eigenvalues of G^(β^)=B^-1(β^)A^(β^), say ω^j, consistently estimate ωj for j=1,…,p. Let c^1-α be the 1-α quantile of the conditional distribution of the weighted sum s^=ω^1χ1,12+⋯+ω^pχ1,p2 given the data. Then an approximate 1-α confidence region for β0 can be defined as follows:
(16)ℛ(α)={β∈ℬ:-2l^(β)≤c^1-α}.
In practice, the conditional distribution of the weighted sum s^, given the sample {(yij,xij,zij),1≤i≤N;1≤j≤ni}, can be calculated using Monte Carlo simulations by repeatedly generating independent samples χ1,12,…,χ1,p2 from the χ12 distribution.
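The Monte Carlo calibration of c^1-α described above can be sketched as follows; the function name and default settings are ours.

```python
import numpy as np

def weighted_chisq_quantile(weights, alpha=0.05, reps=100_000, seed=1):
    """Monte Carlo 1-alpha quantile of s = w_1*chi2_{1,1} + ... + w_p*chi2_{1,p},
    used as c_hat_{1-alpha} to calibrate the confidence region (16)."""
    rng = np.random.default_rng(seed)
    chi = rng.chisquare(df=1, size=(reps, len(weights)))  # independent chi^2_1 draws
    s = chi @ np.asarray(weights)                          # weighted sums
    return np.quantile(s, 1 - alpha)
```

As a sanity check, with a single unit weight the quantile should be close to the chi-square(1) value 3.84.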
In addition to the above direct way of approximating the asymptotic distribution, we can also consider the following alternative, which is motivated by the results of Rao and Scott [24]. We now propose an adjusted empirical log-likelihood whose asymptotic distribution is chi-squared with p degrees of freedom. The adjustment technique was developed by Wang and Rao [25] using an approximation result of Rao and Scott [24]. Define
(17)ρ^(β)=tr{A^-(β)A^(β)}/tr{B^-1(β)A^(β)}.
By examining the asymptotic expansion of -2l^(β), which is specified in the proof of Theorem 4 below, we define an adjustment factor
(18)r^(β)=tr{A^-(β)Σ^(β)}tr{B^-1(β)Σ^(β)},
by replacing A^(β) in ρ^(β) by Σ^(β), where Σ^(β)={∑i=1N∑j=1niη^ij(β)}{∑i=1N∑j=1niη^ij(β)}T. The adjusted empirical log-likelihood ratio is defined by
(19)l^a(β)=r^(β){-2l^(β)},
where l^(β) is defined in (10).
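The adjusted statistic (18)-(19) is a small computation once A^, B^, and Σ^ are available. A sketch (function name ours; the generalized inverse A^- is taken here as an ordinary inverse, assuming A^ is nonsingular):

```python
import numpy as np

def adjusted_el(logratio, A_hat, B_hat, Sigma_hat):
    """Adjusted empirical log-likelihood (19): r_hat(beta) * {-2 l_hat(beta)},
    with the adjustment factor r_hat(beta) from (18)."""
    r_num = np.trace(np.linalg.solve(A_hat, Sigma_hat))  # tr{A^{-1} Sigma}
    r_den = np.trace(np.linalg.solve(B_hat, Sigma_hat))  # tr{B^{-1} Sigma}
    return (r_num / r_den) * (-2.0 * logratio)
```

When A^=B^ the factor reduces to 1 and the adjusted statistic coincides with -2l^(β).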
Theorem 4.
Suppose that conditions (A.1)–(A.6) hold. Then l^a(β0)→Dχp2.
According to Theorem 4, l^a(β) can be used to construct an approximate confidence region for β0. Let
(20)ℛa(α)={β∈ℬ:l^a(β)≤χp2(1-α)}.
Then, ℛa(α) gives a confidence region for β0 with asymptotically correct coverage probability 1-α.
4. Bandwidth Selection
For practical implementation, the tuning parameters need to be chosen. We employ a data-driven procedure to choose the tuning parameter h, which controls the smoothness of g^(·) and g˙^(·). Various existing bandwidth selection techniques for nonparametric regression, such as cross-validation, generalized cross-validation, and the modified multifold cross-validation criterion, can be adapted to the estimation of g^(·) and g˙^(·). Because the algorithm of the modified multifold cross-validation criterion proposed by Cai et al. [26] is simple and fast, we use this criterion throughout the empirical studies in this paper. Specifically, let m and M be two given positive integers with n>mM. The basic idea is first to use M subseries of lengths n-km (k=1,…,M) to estimate the unknown coefficient functions and then to compute the one-step forecasting error for the next section of the sample, of length m, based on the estimated models. More precisely, we choose h to minimize
(21)AMS(h)=∑k=1M1m∑i=n-km+1n-km+m∑j=1ni{yij-∑l=1qg^l,k(βTxij)zij(l)}2,
where g^l,k(·) are computed from the sample {(yij,xij,zij),1≤i≤n-km;1≤j≤ni} with bandwidth h(n/(n-km))1/5. Note that for different sample sizes we rescale the bandwidth according to its optimal rate, that is, h∝n-1/5. Since the selected bandwidth does not depend critically on the choices of m and M, for computational expediency we take m=[0.1n] and M=5 in our simulations.
Let hopt be the bandwidth obtained by minimizing (21) with respect to h>0; that is, hopt=arg minh>0AMS(h). Then hopt is the optimal bandwidth for estimating g^(·). When calculating the block empirical likelihood ratios and the estimator of β0, we use the approximation bandwidths
(22)h=hoptn-1/20(logn)-1/2,h1=hopt,
because this ensures that the required bandwidths have the correct order of magnitude for optimal asymptotic performance (see, e.g., Carroll et al. [27]) and that the bandwidth h satisfies condition (A.4).
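The multifold criterion (21) and the grid minimization for hopt can be sketched as below. For brevity the sketch scores a scalar Nadaraya-Watson smoother of y on the pooled index values rather than the full local linear fit, so it illustrates only the subseries scheme; the toy data and grid are our choices.

```python
import numpy as np

def ams(h, u, y, m=None, M=5):
    """Modified multifold cross-validation score in the spirit of (21), for a
    scalar kernel smoother of y on u (treated as one ordered series of length n)."""
    n = len(u)
    m = m or max(int(0.1 * n), 1)
    total = 0.0
    for k in range(1, M + 1):
        train = slice(0, n - k * m)               # first n - km observations
        held = slice(n - k * m, n - k * m + m)    # next segment of length m
        hk = h * (n / (n - k * m)) ** 0.2         # rescale h at the optimal n^(-1/5) rate
        w = np.exp(-0.5 * ((u[held][:, None] - u[train][None, :]) / hk) ** 2)
        pred = (w @ y[train]) / w.sum(axis=1)     # kernel-weighted one-step prediction
        total += np.mean((y[held] - pred) ** 2)
    return total

# Grid search for the minimizer h_opt of AMS(h) on toy data
rng = np.random.default_rng(2)
u = rng.uniform(-1, 1, 300)
y = np.sin(np.pi * u) + 0.1 * rng.normal(size=300)
grid = np.linspace(0.05, 0.5, 10)
h_opt = grid[np.argmin([ams(h, u, y) for h in grid])]
```

In the paper's procedure the smoother would be the local linear fit of Section 2 and u the estimated index values.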
5. A Simulation Study
In this section, we carry out some simulations to study the finite sample performance of the estimated block empirical likelihood method.
Example 5.
The data are generated from
(23)Yij=g0(βXij)+g1(βXij)Zij+εij,i=1,…,N,j=1,…,ni,
where Xij~N(0,1), Zij~N(0,1), g(t)=sin(2πt), εij=aεi,j-1+eij, and the eij are i.i.d. N(0,1). For each combination of N, ni, and a, 1000 samples are generated from the above model. For each sample, a 95% confidence interval for β=2 is computed using our estimated block empirical likelihood method. For the smoother, we use a local linear smoother with the Gaussian kernel Kh(t)=(1/h√(2π))exp(-t2/2h2) and a modified multifold cross-validation bandwidth throughout all smoothing steps. Some representative coverage probabilities and average confidence intervals are reported in Table 1. The simulation results show that our estimated block empirical likelihood confidence regions have high coverage probabilities and short average confidence interval lengths.
Table 1: Average 95% confidence intervals (CI) and empirical coverage probabilities (CP).

a=0.2
N     Group sizes ni                   CI                     CP
50    n1=⋯=n25=4, n26=⋯=n50=4          [1.645015, 2.349996]   0.9372
50    n1=⋯=n25=4, n26=⋯=n50=5          [1.634619, 2.355068]   0.9270
100   n1=⋯=n25=4, n26=⋯=n100=4         [1.634619, 2.355068]   0.9343
100   n1=⋯=n25=4, n26=⋯=n100=5         [1.634619, 2.355068]   0.9424

a=0.4
N     Group sizes ni                   CI                     CP
50    n1=⋯=n25=4, n26=⋯=n50=4          [1.634619, 2.355068]   0.9428
50    n1=⋯=n25=4, n26=⋯=n50=5          [1.634619, 2.355068]   0.9334
100   n1=⋯=n25=4, n26=⋯=n100=4         [1.634619, 2.355068]   0.9427
100   n1=⋯=n25=4, n26=⋯=n100=5         [1.634619, 2.355068]   0.9351
Example 6.
Consider the regression model
(24)Yij=g0(β0TXij)+g1(β0TXij)Zij(1)+g2(β0TXij)Zij(2)+εij,
where β0=(1/3,2/3)T and the εij are independent N(0,1) random variables. The sample {Xij=(Xij(1),Xij(2))T;1≤i≤N,1≤j≤ni} was generated from a bivariate uniform distribution on [-1,1]2 with independent components, and {Zij=(Zij(1),Zij(2))T;1≤i≤N,1≤j≤ni} was generated from a bivariate normal distribution N(0,Σ) with var(Zij(1))=var(Zij(2))=1 and correlation coefficient ρ=0.5 between Zij(1) and Zij(2). In model (24), the coefficient functions are g0(u)=8exp(-2u2), g1(u)=6u2, and g2(u)=4sin(πu).
For the smoother, we use a local linear smoother with the Gaussian kernel Kh(u)=(1/h√(2π))exp(-u2/2h2), and we use the modified multifold cross-validation criterion proposed by Cai et al. [26], whose algorithm is simple and fast, to select the optimal bandwidth throughout all smoothing steps. We take the weight function ω(u)=I[-1,1](u). The sample size for the simulated data is 100, and 1000 replications are run in all simulations.
The confidence regions of β0 and their coverage probabilities, with nominal level 1-α=0.95, were computed from 1000 runs. The estimated block empirical likelihood was used to construct the confidence regions. The simulated results are given in Figure 1. Simulation results show that our block empirical likelihood confidence regions have high coverage probabilities and short average confidence interval lengths.
Averages of 95% confidence regions of (β1,β2), based on EEL (solid curve) and AEL (dashed curve) when n=100 in the cases of Example 6.
The histograms of the 1000 estimators of the parameters β1 and β2 are in Figures 2(a) and 2(b), respectively. The Q-Q plots of the 1000 estimators of the parameters β1 and β2 are in Figures 3(a) and 3(b), respectively. Figures 2 and 3 show empirically that these estimators are asymptotically normal. The means of the estimates of the unknown parameters β1 and β2 are 0.33342 and 0.66673, respectively, and their biases (standard deviations) are 0.000128 (0.00308) and 0.000603 (0.00352), respectively.
The histograms of the 1000 estimators of every parameter, with the estimated density curve (solid curve) and the normal density curve (dashed curve): (a) for β1 and (b) for β2.
(a) for β1 and (b) for β2: the Q-Q plot of the 1000 estimators of every parameter.
We also consider the average estimates of the coefficient functions g0(u), g1(u), and g2(u) over the 1000 replicates. The estimators g^j(·) are assessed via the root mean squared errors (RMSE); that is, RMSE=∑j=02RMSEj, where
(25)RMSEj=[ngrid-1∑k=1ngrid{g^j(uk)-gj(uk)}2]1/2,
and {uk,k=1,…,ngrid} are regular grid points. The boxplot for the 1000 RMSEs is given in Figure 4. From Figures 4(a)–4(c) we see that every estimated curve agrees with the true function curve very closely. Figure 4(d) shows that all RMSEs of estimates for the unknown functions are very small.
The true curve (solid curve) and the estimated curve (dashed curve); (d) the boxplots of the 1000 RMSE values in estimations of g0(·),g1(·), and g2(·) and the sum of the three RMSEs.
Example 7.
We now apply the block empirical likelihood method to analyze the data from a longitudinal hormone study [28]. The study involved 34 women whose urine samples were collected in one menstrual cycle and whose urinary progesterone was assayed on alternate days. A total of 492 observations were obtained, with each woman contributing from 11 to 28 observations over time. Each woman's cycle length was standardized uniformly to a reference 28-day cycle since the change of the progesterone level for each woman depends on time during a menstrual cycle. In the following, we consider the following model:
(26)Yij=g0(β1AGEij+β2BMIij)+g1(β1AGEij+β2BMIij)tij+εij,
where Yij is the jth log-transformed progesterone value measured at standardized day tij since menstruation for the ith woman, and AGEij and BMIij are the age and body mass index of the ith woman at day tij, respectively.
We apply the block empirical likelihood method to fit the data. Because we focus on the estimators of β1 and β2, we only summarize these estimators in Figure 5. We denote by β^ind and β^AR the estimators of β=(β1,β2) when the correlation structure is specified as independence and as first-order autoregressive, respectively. We see from Figure 5 that neither β^ind nor β^AR is significant, since each of the confidence regions for the two estimators contains (0,0). Therefore, we conclude that the parameters β1 and β2 are not significant, which is consistent with the conclusion of Zhang et al. [28].
The 0.95 confidence regions for the regression coefficients β1 and β2 with the correlation structure specified as independence (dotted curve) and first-order autoregressive (solid curve).
6. Proof of the Theorem
In order to prove Theorem 3, we introduce several lemmas. The first lemma gives uniform convergence rates for g^(u;β) and g˙^(u;β); it is a straightforward extension of known results in nonparametric function estimation. Moreover, the proofs of Lemmas 9 and 10 are similar to those of the corresponding lemmas in Xue and Wang [9], so we omit them.
Lemma 8.
Let ℬn={β∈ℬ:∥β-β0∥≤c0n-1/2} for some positive constant c0. Suppose that conditions (A.1)–(A.3), (A.5), and (A.6) hold. Then
(27)supu∈𝒰ω,β∈ℬn∥g^(u;β)-g0(u)∥=Op({log(1/h)/(nh)}1/2+h2),supu∈𝒰ω,β∈ℬn∥g˙^(u;β)-g˙0(u)∥=Op({log(1/h)/(nh3)}1/2+h).
In order to describe Lemma 9, we use the following notations. Denote 𝒢={g:𝒰ω×ℬ↦Rq},∥g∥𝒢=supu∈𝒰ω,β∈ℬn∥g(u;β)∥. From Lemma 8, we have ∥g^-g0∥𝒢=op(1) and ∥g˙^-g˙0∥𝒢=op(1); hence, we can assume that g lies in 𝒢δ with δ=δn→0 and δ>0, where
(28)𝒢δ={g∈𝒢:∥g-g0∥𝒢≤δ,∥g˙-g˙0∥𝒢≤δ}.
Let g0(βT𝒳;β)=E{g0(β0T𝒳)|βT𝒳} and g˙0(βT𝒳;β)=E{g˙0(β0T𝒳)|βT𝒳},
(29)Q(g,β)=E[{𝒴-gT(βT𝒳;β)𝒵}g˙T(βT𝒳;β)𝒵𝒳ω(βT𝒳)],(30)Qn(g,β)=1n∑i=1N∑j=1ni{yij-gT(βTxij;β)zij}g˙T(βTxij;β)zijxijω(βTxij).
Lemma 9.
Suppose that conditions (A.1)–(A.6) hold. Let
(31)J1(g,β)=Qn(g,β)-Q(g,β)-Qn(g0,β0),J2(g^,β)=Q(g,β)-Q(g0,β)-ϖ(g0(βT𝒳;β);β){g(βT𝒳;β)-g0(βT𝒳)},J3(g,β)=ϖ(g0(βT𝒳),β){g(βT𝒳;β)-g0(βT𝒳)}-ϖ(g0(βT𝒳;β),β0){g(β0T𝒳;β0)-g0(β0T𝒳;β)},J4(g^,β0)=Qn(g0,β0)+ϖ(g0(βT𝒳),β){g(βT𝒳;β)-g0(βT𝒳)}.
Then
(32)sup(g,β)∈𝒢δ×ℬn∥J1(g,β)∥=op(n-1/2),(33)supβ∈ℬn∥J2(g^,β)∥=op(n-1/2),(34)sup(g,β)∈𝒢δ×ℬn∥J3(g,β)∥=o(n-1/2),(35)n1/2J4(g^,β0)→DN(0,σ2A(β0)),
where A(β0) is defined in (12).
Lemma 10.
Suppose that conditions (A.1)–(A.6) hold. Then
(36)supβ∈ℬn∥Qn(g^,β)∥=Op(n-1/2),(37)supβ∈ℬn∥Rn(β)-σ2B(β0)∥=op(1),(38)supβ∈ℬnmax1≤i≤N1≤j≤ni∥η^ij(β)∥=op(n1/2),(39)supβ∈ℬn∥λ(β)∥=op(n-1/2),
where Qn(g^,β) is defined in (30), Rn(β)=n-1∑i=1N∑j=1niη^ij(β)η^ijT(β),B(β0) is defined in condition (A.7), and η^ij(β) is defined in (2).
Proof of Theorem 3.
Note that, when β=β0, Lemma 10 also holds. Applying the Taylor expansion to (7) and invoking Lemma 10, we can obtain
(40)-2l^(β0)=2∑i=1N[λT∑j=1niη^ij(β0)-12{λT∑j=1niη^ij(β0)}2]+op(1).
By (9) and Lemma 10, we have
(41)∑i=1N{λT∑j=1niη^ij(β0)}2=∑i=1NλT∑j=1niη^ij(β0)+op(1),λ={∑i=1N(∑j=1niη^ij(β0))(∑j=1niη^ij(β0))T}-1∑i=1N∑j=1niη^ij(β0)+op(n-1/2).
This together with (40) proves that
(42)-2l^(β0)=nQnT(g^,β0)Rn-1(β0)Qn(g^,β0)+op(1),
where Qn(g^,β0) and Rn(β0) are defined in (30) and (37), respectively. From (37) of Lemma 10 and (42), we obtain
(43)-2l^(β0)={(σ2A)-1/2n1/2Qn(g^,β0)}TG(β0){(σ2A)-1/2n1/2Qn(g^,β0)}+op(1),
where G(β0)=A1/2(β0)B-1(β0)A1/2(β0). Let G0=diag(ω1,…,ωp), where ωi, 1≤i≤p, are the eigenvalues of G(β0). Then there exists an orthogonal matrix H such that G(β0)=HTG0H. Using the notation of Lemma 9, we have
(44)Qn(g^,β)=J1(g^,β)+J2(g^,β)+J3(g^,β)+J4(g^,β)+Q(g0,β).
Noting that Q(g0,β0)=0, from the above equation and Lemma 9, we have
(45)Qn(g^,β0)=J4(g^,β0)+op(n-1/2).
Hence, by (35) of Lemma 9, we have
(46)H{σ2A(β0)}-1/2n1/2Qn(g^,β0)→DN(0,Ip),
where Ip is the p×p identity matrix. This together with (43) proves Theorem 3.
Proof of Theorem 4.
By Lemma 10 and, similarly to the proof of (42), we can obtain
(47)l^(β)=-(n/2)QnT(g^,β){σ2B(β)}-1Qn(g^,β)+op(1),
uniformly for β∈ℬn, where the op(1) term tends to 0 in probability uniformly for β∈ℬn. Note that A^(β0)→pA(β0) and B^(β0)→pB(β0). By the expansion of l^a(β0) defined in (19), together with (47), we get
(48)l^a(β0)=nQnT(g^,β0){σ2A(β0)}-1Qn(g^,β0)+op(1).
This together with (46) and (48) proves Theorem 4, which completes the proof.
Acknowledgments
This research was supported by NNSF project (11171188 and 11231005) of China, Mathematical Finance-Backward Stochastic Analysis and Computations in Financial Risk Control of China (11221061), NSF and SRRF projects (ZR2010AZ001 and BS2011SF006) of Shandong Province of China, K C Wong-HKBU Fellowship Programme for Mainland China Scholars 2010-11, and the Fundamental Research Funds for the Central Universities (27R1310008A).
References

[1] Huang, Z., "Empirical likelihood for single-index varying-coefficient models with right-censored data."
[2] Gao, Z., Kong, D., and Gao, C., "Modeling and control of complex dynamic systems: applied mathematical aspects."
[3] Jian, L., Shen, S., and Song, Y., "Improving the solution of least squares support vector machines with application to a blast furnace system."
[4] Hu, J. and Gao, Z., "Modules identification in gene positive networks of hepatocellular carcinoma using Pearson agglomerative method and Pearson cohesion coupling modularity."
[5] Bai, Y., Fung, W. K., and Zhu, Z. Y., "Penalized quadratic inference functions for single-index models with longitudinal data."
[6] Chiang, C. T., Rice, J. A., and Wu, C. O., "Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables."
[7] Huang, J. Z., Wu, C. O., and Zhou, L., "Varying-coefficient models and basis function approximations for the analysis of repeated measurements."
[8] Qu, A. and Li, R., "Quadratic inference functions for varying-coefficient models with longitudinal data."
[9] Xue, L. and Wang, Q., "Empirical likelihood for single-index varying-coefficient models."
[10] Owen, A. B., "Empirical likelihood ratio confidence intervals for a single functional."
[11] Owen, A., "Empirical likelihood ratio confidence regions."
[12] Wang, Q. H. and Jing, B. Y., "Empirical likelihood for partial linear models with fixed designs."
[13] Chen, S. X. and Qin, Y. S., "Empirical likelihood confidence intervals for local linear smoothers."
[14] Shi, J. and Lau, T. S., "Empirical likelihood for partially linear models."
[15] Xue, L. and Zhu, L., "Empirical likelihood for a varying coefficient model with longitudinal data."
[16] Xue, L. and Zhu, L., "Empirical likelihood semiparametric regression analysis for longitudinal data."
[17] Xue, L. G. and Zhu, L., "Empirical likelihood for single-index models."
[18] Owen, A. B., Empirical Likelihood.
[19] Huang, Z. and Zhang, R., "Testing for the parametric parts in a single-index varying-coefficient model."
[20] Feng, S. and Xue, L., "Variable selection for single-index varying-coefficient model."
[21] You, J. H., Chen, G. M., and Zhou, Y., "Block empirical likelihood for longitudinal partially linear regression models."
[22] Fan, J. and Gijbels, I., Local Polynomial Modelling and Its Applications.
[23] You, J. and Zhou, Y., "Empirical likelihood for semiparametric varying-coefficient partially linear regression models."
[24] Rao, J. N. K. and Scott, A. J., "The analysis of categorical data from complex sample surveys: chi-squared tests for goodness of fit and independence in two-way tables."
[25] Wang, Q. and Rao, J. N. K., "Empirical likelihood-based inference in linear errors-in-covariables models with validation data."
[26] Cai, Z., Fan, J., and Yao, Q., "Functional-coefficient regression models for nonlinear time series."
[27] Carroll, R. J., Fan, J., Gijbels, I., and Wand, M. P., "Generalized partially linear single-index models."
[28] Zhang, D., Lin, X., Raz, J., and Sowers, M., "Semiparametric stochastic mixed models for longitudinal data."