Block Empirical Likelihood for Longitudinal Single-Index Varying-Coefficient Model

In this paper, we consider a single-index varying-coefficient model with application to longitudinal data. In order to accommodate the within-group correlation, we apply the block empirical likelihood procedure to longitudinal single-index varying-coefficient model, and prove a nonparametric version of Wilks’ theorem which can be used to construct the block empirical likelihood confidence region with asymptotically correct coverage probability for the parametric component. In comparison with normal approximations, the proposed method does not require a consistent estimator for the asymptotic covariance matrix, making it easier to conduct inference for the model’s parametric component. Simulations demonstrate how the proposed method works.


Introduction
The single-index varying-coefficient model which was proposed by Huang and Zhensheng [1] is a very important tool to explore the dynamic pattern in many complex dynamic systems, such as economics, finance, politics, epidemiology, medical science, and ecology.As mentioned in Gao et al. [2], the concept of complex dynamic systems arises in many varieties.Such systems are often concurrent and distributed, because they have to react to various kinds of events, signals, and conditions.They may be characterized by a system with uncertainties, time delays, stochastic perturbations, hybrid dynamics, distributed dynamics, chaotic dynamics, and a large number of algebraic loops.Moreover, many related literatures, such as Jian et al. [3] and Hu et al. [4], have been proposed.The single-index varying-coefficient models is one method that can be used to describe the complex dynamic systems.They are natural extensions of classical parametric models with good interpretability and are becoming more and more popular in data analysis.
Longitudinal data arise frequently in many scientific studies.For longitudinal data, we know that the data that are collected from the same subject at different times are correlated and that the observations from different subjects are often independent.Therefore, it is of great interest to estimate the regression function incorporating the withinsubject correlation to improve the efficiency of estimation.The single-index varying-coefficient model is a popular nonparametric fitting technique; it is easily interpreted in real applications because it has the features of the single-index model and the varying-coefficient model.In addition, the single-index varying-coefficient model may include crossproduct terms of some components of covariates.Hence, it has considerable flexibility to cater for a complex multivariate nonlinear structure.
Without loss of generality, we consider a longitudinal study with  subjects and   observations over time for the th subject ( = 1, . . ., ) with a total of  = ∑  =1   observations.In this article, we apply longitudinal data to a single-index varying-coefficient model, and propose a singleindex varying-coefficient longitudinal data model of the form   =   (    )   +   ,  = 1, . . ., ;  = 1, . . .,   , (1) where (  ,   ) ∈   ×  is a vector of covariates,   is the th measurement on the th unit,  is an  × 1 vector of unknown parameters, (⋅) is an  × 1 vector of unknown functions, and   is a random error with mean 0 and finite variance  2 , assuming that   and (  ,   ) are independent.For the sake of identifiability, it is often assumed that ‖‖ = 1 and the first nonzero element is positive, where ‖ ⋅ ‖ denotes the Euclidean metric.
Obviously, model (1) includes a class of important statistical models.For example, if  = 1 and   = 1, model (1) reduces to the single-index longitudinal data model which was proposed by Bai et al. [5] to estimate the index coefficient and unknown link function in a singleindex model for longitudinal data by combining penalized splines and quadratic inference functions.If  = 1 and  = 1, (1) is the varying-coefficient longitudinal data model studied by Chiang et al. [6], Huang et al. [7], and Qu and Li [8], among others.So model ( 1) is easily interpreted in real applications because it has the features of the singleindex longitudinal data model and the varying-coefficient longitudinal data model.In addition, model (1) may include cross-product terms of some components of   and   .Hence, it has considerable flexibility to cater for complex multivariate nonlinear structure.
When   = 1, model (1) reduces to the nonlongitudinal single-index varying-coefficient model.Some authors have studied the estimation and application of the model.Recently, empirical likelihood methods have been applied to nonlongitudinal single-index varying-coefficient model.For example, Xue and Wang [9] developed statistical techniques for the unknown coefficient functions and single-index parameters in the single-index varying-coefficient models.They first estimate the nonparametric component via the local linear fitting, then construct an estimated empirical likelihood ratio function, and hence obtain a maximum empirical likelihood estimator for the parametric component.The motivation is that empirical likelihood based inference has many desirable statistical properties.For example, this method does not involve any variance estimation which is rather complicated in nonparametric or semiparametric regression settings and hence are robust against the heteroscedasticity; confidence region based on the empirical likelihood method does not have predetermined symmetry so that it can better correspond to the true shape of the underlying distribution, and so on.Owen [10,11] and many others developed this into a general methodology.For example, Wang and Jing [12], Chen and Qin [13], Shi and Lau [14], and Xue and Zhu [15][16][17], among others.A recent survey on empirical likelihood can be found in the monograph of Owen [18].More methods about the single-index varying-coefficient model have been proposed, such as Huang and Zhang [19] and Feng and Xue [20].When   > 1, model ( 1) is the single-index longitudinal data model.The usual empirical likelihood method cannot be applied, however, to the single-index longitudinal data model (1) due to correlation within groups.In this paper, we propose a block empirical likelihood procedure to accommodate this correlation.A nonparametric version of the Wilks' theorem is derived, which can be used to construct confidence regions with asymptotically correct coverage probabilities for the parametric component in the model.Compared with normal approximations, our method has the appealing feature that it does not require one to construct a consistent estimator for the asymptotic covariance matrix.Furthermore, the block empirical likelihood method avoids intensive Monte Carlo simulations usually required by the bootstrap method.
The rest of the paper is organized as follows.Section 2 introduces the estimated block empirical likelihood method.Section 3 derives the nonparametric version of Wilks' theorem.Section 4 provides a data-driven procedure to choose the tuning parameters.A simulation study is given in Section 5. Proof of the main result is relegated to Section 6.

Block Empirical Likelihood Method
In this section, we are to extend the results of You et al. [21] and Xue and Wang [9] to the single-index varying-coefficient longitudinal data model.
To apply the block empirical likelihood method to model (1), we introduce an auxiliary random vector where ġ (⋅) stands for the derivative of the function vector (⋅), and (⋅) is a bounded weight function with a bounded support U  , which is introduced to control the boundary effect in the estimations of (⋅) and ġ (⋅).For convenience, we pointed that (⋅) is the indicator function of the set U  .Note that {  ()} = 0 if  =  0 .Hence, the problem of testing whether  is the true parameter is equivalent to testing whether {  ()} = 0, for  = 1, . . ., ;  = 1, . . .,   .Because of the unknowns (⋅) and ġ (⋅), we cannot directly use the block empirical likelihood method to make statistical inference on .A natural way is to replace (⋅) and ġ (⋅) by their estimators.In this paper, we estimate the vector functions (⋅) and ġ (⋅) via the local linear regression technique (see, e.g., Fan and Gijbels [22]).The local linear estimators for () and ġ () are defined as ĝ(; ) = â and ̂ġ (; ) = b at the fixed point  0 , where â and b minimize the sum of weighted squares: where  ℎ (⋅) = ℎ −1 (⋅/ℎ), (⋅) is a kernel function, and ℎ = ℎ  is a bandwidth sequence that decreases to 0 as  increase to ∞.It follows from the least squares theory that ( ĝ (; ) , ℎ ̂ġ  (; )) where with Remark 1.Since the convergence rate of the estimator of ġ  0 () is slower than that of the estimator of  0 () if the same bandwidth is used, this leads to a slower convergence rate for the estimator β of  than √.To increase the convergence rate of the estimator of ġ  0 (), we introduce the another bandwidth ℎ 1 to replace ℎ in ̂ġ (; ) and define it as ̂ġ ℎ 1 (; ).Similar to Owen [11] and Shi and Lau [14], {r  () =   − ĝ (    )  ,  = 1, . . ., ;  = 1, . . .,   } can be treated as a random sieve approximation of the random error sequence {  ,  = 1, . . ., ;  = 1, . . .,   }.In order to deal with the correlation within groups, we use the block empirical likelihood procedure proposed by You et al. [21].Unlike the usual empirical likelihood method, the block empirical likelihood procedure takes the "data" r () for  = 1, . . .,   into account as a whole.Let η () = r () ̂ġ and ̂ġ (    ; ), respectively, for  = 1, . . ., ;  = 1, . . .,   .Then an estimated block empirical likelihood function for  is defined as For a given  a unique maximum exists, provided that 0 is inside the convex hull of the points ∑   =1   () for  = 1, . . ., .The maximum of (7) may be found via the method of Lagrange multipliers.The optimal value for   satisfying (7) may be shown to be where the Lagrange multiplier  = ( 1 , . . .,   )  is the solution of the following equation: Since  1 × ⋅ ⋅ ⋅ ×   is maximized for   = 1/ in the absence of parametric constraints, we define the corresponding estimated profile block empirical log-likelihood ratio as We will show in the next section that if  0 is the true parameter vector, l( 0 ) is asymptotically chi-square distributed.

Theoretical Properties
Throughout this article, we assume that  increases to push up the total sample size  =  1 + ⋅ ⋅ ⋅ +   , while the   is fixed.To establish the nonparametric Wilks' theorem for ( 0 ), we first make the following assumptions.
(A.1) The density function of     , (), is bounded away from zero for  ∈ U  and  near  0 and satisfies the Lipschitz condition of order 1 on U  , where U  is the support of (). (A. 2) The function   (), 1 ≤  ≤ , have continuous second derivatives on U  , where   () are the th components of ().Remark 2. Condition (A.1) is used to bound the density function of     away from zero.This ensures that the denominators of ĝ(; ) and ̂ġ (; ) are, in probability one, bounded away from 0 for  ∈ U  .The second derivatives in (A.2) are standard smoothness conditions.(A.3)-(A.5)are necessary conditions for the asymptotic normality or the uniform consistency of the estimators.It should be pointed out that the condition can be replaced by (‖  ‖ 6+ ) < ∞, (‖  ‖ 6+ ) < ∞, and (|  | 6+ ) < ∞ for some  > 0. In the current work, the exponential index of the norm is set as 6 for it is the minimum value to meet the asymptotic normality or the uniform consistency of the estimators.Conditions (A.6) and (A.7) ensure that the asymptotic variance for the estimator of  0 exists.Let B = { ∈   : ‖‖ = 1, and the first nonzero element is positive}.Then  0 is an inner point of set B. The following theorem shows that −2 l( 0 ) is asymptotically distributed as a weighted sum of independent  2 1 variables.
To apply Theorem 3 to construct a confidence region or interval for  0 , we need to consistently estimate the unknown weights   .By the plug-in method, ( 0 ) and ( 0 ) can be consistently estimated by respectively, where β is the maximum empirical likelihood estimator of  0 defined by ( 9), V =   ̂ġ  ( β   ; β)  ( β   ), Ĉ(⋅) = ∑  =1   (⋅)V     , and where  1 (⋅) is a kernel function, and   is bandwidth with 0 <   → 0. This implies that the eigenvalues of Ĝ( β) = B−1 ( β) Â( β), say ω , consistently estimate   for  = 1, . . ., .Let ĉ1− be the 1 −  quantile of the conditional distribution of the weighted sum ŝ = ω1  2  1,1 + ⋅ ⋅ ⋅ + ω  2 1, given the data.Then an approximate 1 −  confidence region for  0 can be defined as follows: In practice, the conditional distribution of the weighted sum ŝ, given the sample {(  ,   ,   ),1 ≤  ≤ ;1 ≤  ≤   }, can be calculated using Monte Carlo simulations by repeatedly generating independent samples  2  1,1 , . . .,  2 1, from the  2 1 distribution.In addition to the above, direct way of approximating the asymptotic distributions, we can also consider the following alternative.The alternative is motivated by the results of Rao and Scott [24].Now, we propose another adjusted empirical log-likelihood, whose asymptotic distribution is chi-squared with  degrees of freedom.The adjustment technique is developed by Wang and Rao [25] by using an approximate result in Rao and Scott [24].Note that ρ() can be written as By examining the asymptotic expansion of −2 l(), which is specified in the proof of Theorem 4 below, we define an adjustment factor by replacing Â() in ρ() by Σ(), where =1 η ()}  .The adjusted empirical log-likelihood ration is defined by where l() is defined in (10).
According to Theorem 4, l () can be used to construct an approximate confidence region for  0 .Let Then, R  () gives a confidence region for  0 with asymptotically correct coverage probability 1 − .

Bandwidth Selection
For practical implementation, the tuning parameters need to be decided.We employ a data-driven procedure to choose the tuning parameter ℎ, where ℎ controls the smoothness of ĝ(⋅) and ̂ġ (⋅).We all know that various existing bandwidth selection techniques for nonparametric regression, such as the cross-validation, generalized cross-validation, and the modified multifold cross-validation criterion, can be adapted for the estimation ĝ(⋅) and ̂ġ (⋅).Because the algorithm of the modified multifold cross-validation criterion proposed by Cai et al. [26] to select the optimal bandwidth is simple and quick, throughout the empirical studies in this paper, we consider the modified multifold cross-validation criterion.Specifically, let  and  be two given positive integers and  > .The basic idea is first to use  subseries of lengths  −  ( = 1, . . ., ) to estimate the unknown coefficient functions and then to compute the one-step forecasting error of the next section of the sample of length  based on the estimated models.More precisely, we choose ℎ which minimizes where ĝ, (⋅) are computed from the sample {(  ,   ,   ), 1 ≤  ≤  − ;1 ≤  ≤   } with bandwidth equal to ℎ(/( − )) 1/5 .Note that for different sample size, we rescale bandwidth according to its optimal rate, that is, ℎ ∝  −1/5 .Since the selected bandwidth does not depend critically on the choice of  and , to computation expediency, we take  = [0.1]and  = 5 in our simulation.Let ℎ opt be the bandwidth obtained by minimizing ( 21) with respect to ℎ > 0; that is, ℎ opt = inf ℎ>0 AMS(ℎ).Then ℎ opt is the optimal bandwidth for estimating ĝ(⋅).When calculating the block empirical likelihood ratios and estimator of  0 , we use the approximation bandwidth because this insures that the required bandwidth has correct order of magnitude for the optimal asymptotic performance (see, e.g., Carroll et al. [27]), and the bandwidth ĥ satisfies condition (A.4).

A Simulation Study
In this section, we carry out some simulations to study the finite sample performance of the estimated block empirical likelihood method.

Example 6. Consider the regression model
For the smoother, we use a local linear smoother with a Gaussian kernel  ℎ () = (1/ℎ √ 2) exp(− 2 /2ℎ 2 ), and use the modified multifold cross-validation criterion proposed by Cai et al. [26] to select the optimal bandwidth throughout all smoothing steps because the algorithm is simple and quick.We take the weight function () =  [−1,1] .The sample size for the simulated data is 100, and the run is 1000 times in all simulations.
The confidence regions of  0 and their coverage probabilities, with nominal level 1 −  = 0.95, were computed from 1000 runs.The estimated block empirical likelihood was used to construct the confidence regions.The simulated results are given in Figure 1.Simulation results show that our block empirical likelihood confidence regions have high coverage probabilities and short average confidence interval lengths.
The histograms of the 1000 estimators of the parameters  1 and  2 are in Figures 2(a) and 2(b), respectively.The Q-Q plots of the 1000 estimators of the parameters  1 and  2 are in Figures 3(a) and 3(b), respectively.Figures 2 and 3 show empirically that these estimators are asymptotically normal.The means of the estimates of the unknown parameters  1 and  2 are 0.33342 and 0.66673, respectively, and their biases (standard deviations) are 0.000128 (0.00308) and 0.000603 (0.00352), respectively.We also consider the average estimates of the coefficient functions  0 (),  1 (), and  2 () over the 1000 replicates.The estimators ĝ (⋅) are assessed via the root mean squared errors (RMSE); that is, RMSE = ∑ 2 =0 RMSE  , where and {  ,  = 1, . . .,  grid } are regular grid points.The boxplot for the 1000 RMSEs is given in Figure 4. From Figures 4(a)-4(c) we see that every estimated curve agrees with the true function curve very closely.Figure 4(d) shows that all RMSEs of estimates for the unknown functions are very small.
Example 7. We now apply the block empirical likelihood method to analyze the data from a longitudinal hormone study [28].The study involved 34 women whose urine samples were collected in one menstrual cycle and whose urinary progesterone was assayed on alternate days.A total of 492 observations were obtained, with each woman contributing from 11 to 28 observations over time.Each woman's cycle length was standardized uniformly to a reference 28-day cycle since the change of the progesterone level for each woman depends on time during a menstrual cycle.In the following, we consider the following model: where   is the th log-transformed progesterone value measured at standardized day   since menstruation for th woman, and AGE  and BMI  are age and body mass index for the th individual at day   , respectively.
We apply the block empirical likelihood method to fit the data.Because we focus on the estimators of  1 and  2 , we only summarize the estimators of  1 and  2 in Figure 5. Next, we denote  ind and  AR as the estimators of  = ( 1 ,  2 ), when the correlation structures are specified as independence and first-order autoregressive, respectively.We see from Figure 5 that both  ind and  AR are significant for neither of confidence regions for the two estimators including (0,0).Therefore, we conclude that the parameters  1 and  2 are not significant, which is consistent with the conclusion of Zhang et al. [28].

Proof of the Theorem
In order to prove Theorem 3, we introduce the following several lemmas.The following lemma gives uniformly convergent rates of ĝ(; ) and ̂ġ (; ).This lemma is straightforward extension of known results in nonparametric function estimation.Moreover, the proofs of Lemma 9 and Lemma 10 is similar with the corresponding Lemma 9 and Lemma 10 of Xue and Wang [9].We hence omit these proofs.

1 = 5 [Figure 2 :Figure 3 :
Figure 2: The histograms of the 1000 estimators of every parameter, the estimated curve of density (solid curve) and the curve of normal density (dased curve): (a) for  1 and (b) for  2 .

1 Figure 5 :
Figure 5: The 0.95 confidence regions for the regression coefficients  1 and  2 with correlation structures being independence (dotted curve) and first-autoregressive (sold curve).

Table 1 :
The selection probabilities of adaptive EL shrinkage estimation.