The Empirical Cressie-Read Test Statistics for Longitudinal Generalized Linear Models

For the marginal longitudinal generalized linear models (GLMs), we develop the empirical Cressie-Read (ECR) test statistic approach which has been proposed for the independent identically distributed (i.i.d.) case.The ECR test statistic includes empirical likelihood as a special case. By adopting this ECR test statistic approach and taking into account the within-subject correlation, the efficiency theory results of estimation and testing based on ECR are established under some regularity conditions. Although a working correlationmatrix is assumed, there is no need to estimate the nuisance parameters in theworking correlationmatrix based on the quadratic inference function (QIF).Therefore, the proposed ECR test statistic is asymptotically a standard χ2 limit under the null hypothesis. It is shown that the proposed method is more efficient even when the working correlation matrix is misspecified. We also evaluate the finite sample performance of the proposed methods via simulation studies and a real data analysis.


Introduction
Longitudinal studies are increasingly common in many scientific research fields, including epidemiology, econometrics, medicine, and life and social sciences.For example, longitudinal studies are often used in psychology to study developmental trends across the life span, in sociology to study life events throughout lifetimes or generations, and in medical area to take different treatments at the start of the study and to see what kind of effects the assigned patients have with each treatment by week or by year.
Longitudinal data modeling is a statistical method often used in the experiments that are designed such that responses on the same experimental units are observed at each repetition.However, generalized linear models (GLMs [1]) have become a favored tool for the modelling of clustered and longitudinal data by allowing for the non-Gaussian data and nonlinear link functions, such as binomial or Poisson's type responses.The intrinsic complexity of longitudinal GLMs makes them challenging.First, one characteristic of longitudinal data is the within-subject correlation among the repeated measurements.Ignoring this within-subject correlation causes a loss of efficiency in general problems, and the presence of correlation makes it hard to establish the underlying asymptotic theory.Second, the full likelihood for longitudinal data is often difficult to specify, particularly for the correlated non-Gaussian data.Researchers had developed appropriate methods and criteria for longitudinal GLMs, where generalized estimating equations (GEE [2,3]) or a recently proposed related method based on quadratic inference functions (QIF; see [4]) is an attractive option for longitudinal regression, particularly when marginal relationships rather than within-subject correlation are of primary interest.
The empirical likelihood method was introduced by Owen [12][13][14] as a nonparametric method of inference based on a data-driven likelihood ratio function in the i.i.d.case and had been applied to various statistical models.For example, Kolaczyk [15] and Chen and Cui [16] made an extension to the generalized linear models.As for other studies, see Baggerly [17], DiCiccio et al. [18], Chen and Qin [19], Qin and Lawless [20], Zhu and Xue [21], and Li et al. [22] among others.Owen [23] is one of the best references for this field.Although the empirical likelihood method had been investigated by many authors, those researches had mostly focused on independent observations.For the analysis of longitudinal data, Xue and Zhu [24] applied the empirical likelihood approach to the varying coefficient models with longitudinal data, but they did not consider the within-subject correlation structure of the longitudinal data.Li et al. [25] and Li et al. [26] proposed the generalized empirical likelihood method for longitudinal data by introducing the known within-subject correlation structure.Wang et al. [27] proposed two generalized empirical likelihood methods for the longitudinal linear model by taking into consideration within-subject correlations.They showed that, by plugging the estimator of the working correlation matrix into the empirical log-likelihood ratio (ELR), the resulting ELR is asymptotically a weighted sum of independent  2 1 -variables with unknown weights.Thus, the empirical likelihood method needs to be adjusted so that it can be efficiently used to accommodate the correlation inherent in longitudinal data.However, in many applications, how to estimate the working correlation matrix effectively is a challenging problem.
Instead of the empirical likelihood ratio statistic, Baggerly [17] used the empirical Cressie-Read (ECR) test statistic and showed that it is also asymptotically a Chi-squared distribution in the i.i.d.case.Bravo [28] extended the ECR approach to inference for -mixing processes.The empirical Cressie-Read test statistic has user-specified parameter  ∈ (−∞, ∞) and encompasses several commonly used testing statistics as special cases, such as the empirical likelihood statistic ( = 0), the maximum entropy, minimum information or the Kullback-Leibler statistic ( = −1), the Neyman-modified  2 statistic or the Euclidean likelihood statistic ( = −2), the Freeman-Tukey statistic ( = −1/2), and Pearson's  2 statistic ( = 1), the first two being defined in a limiting sense.Thus, we can use these results to construct confidence regions of the parameter  of interest.
In this paper we develop Baggerly's results on the ECR test statistic for the longitudinal data by combining the idea of QIF in Qu et al. [4] and apply the proposed method to the marginal longitudinal GLMs (1).Specifically, we derive the asymptotic distributions of the maximum empirical Cressie-Read likelihood estimator (MECRLE) and the ECR test statistic.Although the working correlation matrix is assumed, the proposed method needs not to estimate the nuisance parameters associated with correlations.This is because the inverse of the commonly used working correlation matrix can be exactly represented or approximated by a linear combination of basis matrices such that the dimension of the unbiased estimating functions is greater than the number of unknown parameters.Thus, the efficiency results can be obtained by combining the ECR method with the generalized method of moments (GMM) proposed by Hansen [29].Therefore, the proposed method inherits the advantages of the empirical likelihood, QIF, and GMM methods, and the proposed method represents a good alternative in this area as well.
The paper is organized as follows.In Section 2, we propose the more generalized method of the ECR test statistic for marginal longitudinal GLMs by combining with the QIF approach.The technical conditions and some asymptotic properties are given in Section 3. In Section 4, we present the results for simulation studies and a real dataset to illustrate the proposed methods.The technical proofs of the main results are presented in Section 5.

Model and ECR Test Statistic
Suppose that the population (X, Y) comes from the marginal longitudinal generalized linear model (1).Then the marginal mean of   is and the marginal variance of   is where ](⋅) is a variance function and  is a scale parameter.With assumptions on the first marginal moments, the GEE [2] estimator of  is defined as the solution of where   = ( 1 , . . .,   )  , μ  =   /, is an  ×  matrix and is a working covariance matrix with A  being the  ×  diagonal matrix of marginal variances Var(Y  ) and R() as the working correlation matrix, where  is a vector which fully characterizes R().The main advantage of the GEE method is that it yields a consistent estimator even if the working correlation matrix is misspecified.
Qu et al. [4] introduced the quadratic inference function (QIF) by assuming that the inverse of the working correlation can be approximated by a linear combination of several basis matrices; that is, where  1 is the identity matrix,  2 , . . .,   are symmetric basis matrices which are determined by the structure of R(), and  1 , . . .,   are constant coefficients.The advantage of this approach is that it does not require estimation of linear coefficients   's which can be viewed as nuisance parameters.
To implement the QIF approach, we need to choose the basis for the inverse of the correlation matrix R().Qu et al. [4] and Qu and Song [30] found that, if R() is the exchangeable working correlation matrix, R −1 =  1  1 +  2  2 , where  1 is the identity matrix and  2 is a matrix with 0 on the diagonal and 1 off diagonal.If the working correlation matrix is AR (1) with   =  |−| , then R −1 can be written as a linear combination of three basis matrices, that is, , where  * 1 is the identity matrix,  * 2 can be 1 on the subdiagonal and 0 elsewhere, and  3 can be 1 on the corners (1, 1) and (, ) and 0 elsewhere.However,  * 3 can often be dropped out of the model, as removing  * 3 does not affect the efficiency of the estimator too much, and this can simplify the estimation procedure.More details can be found in Qu et al. [4], Qu and Song [30], and Qu and Li [8] among others.In addition, Qu and Lindsay [31] developed an adaptive estimating equation approach to find a reliable approximation to the inverse of the variance matrix. Let It is easy to check that the GEE defined in (4) becomes a linear combination of the above extended score vector of ∑  =1   ().Note that the dimension of   () is  = , and it is greater than the number of unknown parameters; thus the GEE method is unavailable.Therefore, Qu et al. [4] proposed the quadratic inference function by extending the method of GMM proposed by Hansen [29] to obtain the estimator of .
In this paper, we consider an alternate method using the following empirical Cressie-Read test statistics [17,32]: where  is a user-specified parameter.ECR ( 7) is a generalization of the empirical log-likelihood ratio statistic and can also be defined as Let   be a multinomial likelihood supported on the observed data, and suppose that we are interested in testing the null hypothesis  0 :  =  0 , where  0 is the true parameter vector.The empirical Cressie-Read family of test statistics for  0 can be based on the following constrained minimization problem: As noted by Baggerly [17], a unique value for the righthand side of ( 9) exists, provided that zero is inside the convex hull of the  1 (), . . .,   () for given .Applying the Lagrange multiplier method, we can obtain the solution of  by minimizing (9).Let where  1 and  2 = ( 1 , . . .,   )  are the Lagrange multipliers.
Let /  = 0; we have where  and the vector  = ( 1 , . . .,   )  are determined by In the present paper, we only are interested in the case of  ̸ = − 1.When  → −1, the estimator of  is viewed as the empirical exponential tilting estimator (see [33,34]).When  ̸ = − 1 and substituting ( 11) into (7), we obtain the following empirical Cressie-Read test statistic: Baggerly [17] had applied the empirical Cressie-Read test statistic to the general parameter case and shown that all members of the empirical Cressie-Read family have a Chisquared calibration and enjoy the advantages of the empirical likelihood method.From the proof of Theorem 1 in this paper, we can obtain that where   () = (1/) ∑  =1   () and Σ = (1/) ∑  =1   ()   ().If β satisfies CR  ( β) = inf ∈B CR  (), the estimator β is called the maximum empirical Cressie-Read likelihood estimator (MECRLE) of the parameter .

Asymptotic Properties
To establish the asymptotic properties, we need the following regularity conditions.
(C1) The matrix Σ = (1/) ∑  =1   ()   () converges a.s. to an invertible positive definite matrix Σ().This condition holds based on the weak law of large number when  tends to infinity and the cluster size is fixed.
(C2) The domain B is a compact subset of R  and the true parameter value  0 is in its interior.Conditions (C1)-(C4) are actually quite mild and can be easily satisfied, and these conditions are also found in Qin and Lawless [20] and Qu et al. [4].Conditions (C1) and (C4) ensure that the asymptotic variance exists for the proposed estimator.Condition (C2) ensures that there exists a √consistent solution in the compact subset B. (C3) is a common condition used in GLMs with longitudinal data.Theorem 1. Assume that conditions (C1-C4) hold.Then, as  → ∞, with probability tending to 1, CR  () attains its minimum value at some point β in the interior of the ball ‖− 0 ‖ ≤  −1/3 and β, ŝ, and t = ( β) satisfy where Theorem 2. Suppose that the conditions of Theorem 1 hold, and further assume that  2 ()/  is continuous about  in a neighborhood of the true value  0 .Then, as  → ∞, one has where "   →" denotes the convergence in distribution and Remark 3. When  → 0, the empirical log-likelihood ratio statistic is the special case of the empirical Cressie-Read test statistic CR  ().
Remark 4. The proposed method provides a way to find efficient estimates for marginal longitudinal GLMs when the within-subject correlation is considered.It is known that we can construct the confidence regions of  0 using Theorem 2.
The asymptotic variance Ω of √( β −  0 ) does not depend on any  and can be consistently estimated by or by the same expression with the p 's, replaced by 1/.
Theorem 5. Suppose that the conditions of Theorem 2 hold; then, under the hypothesis  0 :  =  0 , CR  ( 0 ) is asymptotically a Chi-squared distribution with  degrees of freedom as  → ∞.
Theorem 5 allows us to use the empirical Cressie-Read test statistic for testing or obtaining the confidence regions for the parameter  0 .For any 0 <  < 1, the confidence region of  0 with asymptotic coverage 1 −  can be determined by We can construct a goodness-of-fit statistic to test the model assumption: We call CR  ( 0 ) − CR  ( β) the empirical Cressie-Read likelihood ratio statistic and call CR  ( β) the model test statistic. ,  0 ,  = 1, . . ., ,  = 1, . . ., 5, (22) where  0 = [1, 2, −0.8]  and the covariate x  = ( ,1 , . . .,  ,3 )  has a multivariate normal distribution with mean zero, marginal variance 1, and an AR(1) correlation matrix with autocorrelation coefficient 0.7.The binary response vector for each cluster has mean specified by (22) and an AR(1) correlation structure with an autocorrelation coefficient  = 0.7.Table 1 reports the simulation results for the average coverage probabilities and the average lengths of the confidence intervals for  1 ,  2 , and  3 when the nominal level is 0.95 and the true correlation structure is AR(1) structure.
From Table 1, it is easy to see that the empirical likelihood method performs much better in terms of coverage accuracies of the confidence intervals even when the working correlation structure is misspecified.And when the working correlation structure is correctly specified, the performances of all the methods are usually slightly better.Table 1 also shows that the average coverage probabilities obtained by all methods tend to the nominal level 0.95 and the average lengths decrease as  increases.
When sample size is 60 and the true correlation structure is AR(1), the histograms and the QQ plots of the 500 maximum empirical likelihood estimators β1 , β2 , and β3 based on the empirical likelihood method are plotted in Figure 1 and the histograms and the QQ plots of the 500 maximum empirical exponential tilting estimators β1 , β2 , and β3 based on the empirical exponential tilting method are plotted in Figure 2 under the misspecified and correct correlation structures, respectively.Figures 1 and 2 show that empirically these estimators are asymptotically normal even when the working correlation structure is misspecified.

Application to Real Data.
To examine the performance of the proposed method, we analyze real data [35,36] from a sixweek frequent magnetic resonance imaging (MRI) substudy of the Betaseron clinical trial conducted at the University of British Columbia in relapsing-remitting multiple sclerosis involving 52 patients.The real data concerns a longitudinal clinical trial to assess the effects of neutralizing antibodies on interferon beta-1b (IFNB) in relapsing-remitting multiple sclerosis (MS), which is a disease that destroys the myelin sheath that surrounds the nerves.All patients were randomized into three treatment groups with allocation of 17 patients being treated by placebo, 17 by low dose, and 16 by high dose.This dataset has been studied by Song [37] and Li et al. [11].There exist the missing values in this dataset; for convenience, we only analyze the balanced longitudinal data which contain 39 patients.
For the analysis of this data, the binary response variable is exacerbation, which refers to whether an exacerbation began since the previous MRI scan, and is 1 for yes and 0 for no.Several baseline covariates are included in the model.They are treatment (trt), time () in weeks, squared time ( 2 ), and duration of disease (dur) in years.Here trt is treated as an ordinal covariate with scale 0, 1, 2 representing zero (placebo), low, and high dosage of the drug treatment.We consider the following marginal logistic model for the data: where   is the probability of exacerbation at visit  for subject .Two correlation structures (exchangeable (CS) and AR( 1)) Figure 1: The histograms and the QQ plots for the maximum empirical likelihood estimators based on the empirical likelihood method for  = 60.When the true correlation structure is AR(1), the left plots are obtained by using the misspecified CS working correlation structure and the right plots are obtained by using the true AR(1) working correlation structure.Figure 2: The and the QQ plots for the maximum empirical exponential tilting estimators based on the empirical tilting method for  = 60.When the true correlation structure is AR(1), the left plots are obtained by using the misspecified CS working correlation structure and the right plots are obtained by using the true AR(1) working correlation structure.are considered in this analysis.Table 2 reports the coefficient estimators and the corresponding standard errors for CS and AR(1) correlation structures, respectively.Table 3 reports the 95% confidence intervals and the lengths of the confidence intervals for CS and AR(1) correlation structures.From Tables 2 and 3, we can see that the parameter estimators are very similar for all four methods under the CS and AR(1) working correlation structures.For the CS working correlation structure, the empirical likelihood and the empirical exponential tilting methods have smaller standard errors and smaller interval lengths than the GEE and QIF methods.However, the GEE method has the smaller standard errors and the smaller interval lengths under the AR(1) working correlation structure.Similar to the results in Song [37], we also find that the baseline disease severity measured as the duration of disease before the trial is an important explanatory variable associated with the risk of exacerbation, and we also do not find strong evidence that the drug treatment is effective in reducing the risk of exacerbation.Therefore, all the methods have comparable performance, and these findings are close to the existing analysis in Song [37].

Proofs of the Theorems
In this section, we present rigorous proofs of our results stated in Section 3.

Theorem 6 .
Suppose that the conditions of Theorem 2 hold and   () has dimension  =  and  has dimension  with  < .Then, under the model assumption(21), CR  ( β) is asymptotically a Chi-squared distribution with  −  degrees of freedom as  → ∞; under the null hypothesis  0 :  =  0 , CR  ( 0 )−CR  ( β) is asymptotically a Chi-squared distribution with  degrees of freedom as  → ∞.

Table 1 :
The average coverage probabilities (ACP) and the average lengths (ALEN) of the confidence intervals for  1 ,  2 , and  3 when the nominal level is 0.95 and the true correlation structure is AR(1) structure.
4.1.Simulation Studies.In this subsection, we report the simulation study to illustrate the finite sample properties of the proposed ECR test statistic.For simplicity, we only compute the empirical likelihood (EL,  = 0) and the empirical exponential tilting (ET,  → −1) and compare the proposed methods with the GEE and QIF methods.Throughout the simulation study, each dataset comprised  = 60 and 100 subjects and  = 5 observations per subject over time.For each case, we repeat the experiment 500 times.3 ∑ =1

Table 2 :
The parameter estimators (the corresponding standard errors) for CS and AR(1) correlation structures.

Table 3 :
95% confidence intervals (CI) and the lengths (LEN) of the confidence intervals for CS and AR(1) correlation structures.