A Semiparametric Marginalized Model for Longitudinal Data with Informative Dropout

We propose a marginalized joint-modeling approach for marginal inference on the association between longitudinal responses and covariates when longitudinal measurements are subject to informative dropouts. The proposed model is motivated by the idea of linking longitudinal responses and dropout times by latent variables while focusing on marginal inferences. We develop a simple inference procedure based on a series of estimating equations, and the resulting estimators are consistent and asymptotically normal with a sandwich-type covariance matrix ready to be estimated by the usual plug-in rule. The performance of our approach is evaluated through simulations and illustrated with a renal disease data application.


Introduction
Longitudinal studies often encounter data attrition because subjects drop out before the designated study end.Both statistical analysis and practical interpretation of longitudinal data can be complicated by dropouts.For example, in the Modification of Diet in Renal Disease MDRD study 1, 2 , one main interest was to investigate the efficacy of interventions of blood pressure control and diet modification on patients with impaired renal functions.The primary outcome was glomerular filtration rate GFR , which measured filtering capacity of kidneys, and was repeatedly measured over the study period.However, some patients could leave the study prematurely for kidney transplant or dialysis, which precluded further GFR measurements.This resulted in a dropout mechanism that could relate to patients' kidney function and correlate with their GFR values.Other patients were followed to the end of the study or dropped out due to independent reasons.Thus, statistical analysis of longitudinal GFR needs to take into consideration the presence of mixed types of informative and independent dropouts.
Many statistical models and inference approaches have been proposed to accommodate the nonignorable missingness into modeling longitudinal data see reviews 3-8 .According to the target of inference and the interpretation of model parameters, existing methods can be classified into three categories: subject-specific inference, event-conditioning inference, and marginal inference.First, a widely used modeling strategy for longitudinal data with informative dropouts is to specify their joint distribution via shared or correlated latent variables.Under such model assumptions, the longitudinal parameters have a conditional, subject-specific interpretation e.g., 9-11 .But the interpretation of longitudinal parameters usually changes with the number and characteristics of latent variables assumed, for example, a single random intercept versus a random intercept plus a random slope.
Second, event-conditioning approaches have also been widely used when the target of inference is within subgroups of patients with particular dropout patterns or when the dropout can potentially change the material characteristic of the longitudinal process e.g., death .The inference is usually conducted conditioning on the dropout pattern or on the occurrence of the dropout event.Thus, model parameters have an event-conditioning subpopulation-averaged interpretation, for example, pattern-mixture models for the group expectation of each dropout pattern 3, 12 ; treatment effects among survivors 13 ; gender and age effects in mortal cohort 14 .Because the interpretation of such models is made by conditioning on a future event, event-conditioning approaches may be natural in a retrospective setting but may not be directly useful for the evaluation of treatment efficacy prospectively.
Lastly, when the research objective is to study covariate effects at population level in a dropout-free situation, marginal models address this concern directly.When data are without missing or missing completely at random using Rubin's definition on missingness 15 , the estimation of model parameters can be carried out by the generalized estimating equation GEE approach assuming a "working" correlation matrix 16 .When dropouts are missing at random, the inverse probability-weighted GEE methods are commonly used 17, 18 .In the presence of informative dropouts, the class of selection models that were originally proposed to adjust selection bias in econometrics 19 have been widely used for the marginal analysis of longitudinal data 20-22 .Recently, the marginalized transition model 23 and marginalized pattern-mixture model 24 were proposed for binary longitudinal data with finite nonignorable nonresponse patterns.These marginalized approaches provide a powerful tool for studying the marginal association between longitudinal outcomes and covariates while incorporating nonignorable nonresponses.
In this paper, we shall adopt the idea of shared latent variables to account for the dependence between longitudinal responses and informative dropouts while focusing on marginal inference for the longitudinal responses.Here dropouts can occur on a continuous time scale.We develop an effective estimation procedure built on a series of asymptotically unbiased estimating equations with light computational burden.The resulting estimators for longitudinal parameters are shown to be consistent and asymptotically normal, with a sandwich-type variance-covariance matrix that can be estimated by the usual plug-in rule.
The remainder of the paper is organized as follows.In Section 2, we introduce the notation and the proposed semiparametric marginalized model.In Section 3, a simple estimating equation-based procedure is first proposed for the situation with pure informative dropouts and is extended to a more general situation where there is a mixture of random dropouts and informative dropouts.Asymptotic properties of resulting estimators are also studied.
Simulation studies and an application to a renal disease data set are given in Section 4. Some remarks are discussed in Section 5.All the technical details are provided in the appendix.

Data Notation
Consider that a longitudinal study follows n subjects over time period 0, τ .For the ith subject i 1, . . ., n, the complete-data consist of {Y ij , X ij , t ij , j 1, . . ., n i }, where Y ij is the value of response at the jth observation time t ij and X ij is a p × 1 vector of covariates associated with response Y ij .Note that X ij includes baseline covariates that are separately denoted by Z i and potential time-dependent covariates.Let T i denote the informative dropout time and C i denote the random censoring time that is independent of Y ij , T i given the covariates.In practice, we observe T * i , δ i , where

Semiparametric Marginalized Latent Variable Model
We first introduce the composition of our proposed model and then discuss the model motivation and interpretation.The first component is a marginal generalized linear model for longitudinal responses Y ij 's: where g • is a known link function and μ ij denotes the marginal expectation.The second component is a linear transformation model for the informative dropout time T i : where H • is an unspecified monotone transformation function and η i is assumed to follow a known continuous distribution F • that is independent of Z i .The last component is a conditional mean model characterizing the dependence between longitudinal responses and informative dropouts: where the latent random effects b i η i {b i1 η i , . . ., b in i η i } are investigator-specified functions of η i and covariates, and Δ ij is an implicit parameter whose value is determined by the integral equation matching the conditional mean model 2.3 with the corresponding marginal model 2.1 , that is, The marginal mean model 2.1 directly specifies the marginal relationship between the responses and covariates, and β M is the p × 1 marginal regression parameters of main interest.Next, the semiparametric linear transformation model 2.2 is chosen to provide a flexible survival model for the informative dropout time while it still can be easily incorporated into model 2.3 for the dependence of the longitudinal responses and informative dropouts.Model 2.2 includes the proportional hazards model 25 , the proportional odds model 26 , and the Box-Cox transformation model as special cases and has been studied intensively in survival analysis literature 27-29 .In addition, as we present in Section 3, the explicit assumption on the error distribution in 2.2 can facilitate the "marginalization" procedure for parameter estimation.
The conditional mean model 2.3 is motivated by the construction of the marginalized random-effects model 30, 31 .As a motivating example, we consider a continuous Gaussian process following a simple random-effects model, , where b i0 , b i1 are the random intercept and slope, and error terms ε ij , j 1, . . ., n i are assumed to follow N 0, Σ i but independent of b i0 , b i1 or η i .Note that ε ij 's can still exhibit temporal dependence in addition to what has been accounted by the random effects, that is, Σ i with AR 1 covariance structure.Furthermore, as in joint modeling approaches via latent vari- . It is easy to see that the conditional mean has the expression as model 2.3 , We use model 2.3 primarily as a parsimonious model for the dependence structure between the longitudinal responses and informative dropout times.However, note that although model 2.3 takes a similar form as the marginalized random-effects model, it does not intend to fully specify the joint distribution of the repeated measurements since model 2.3 only specifies the conditional mean function and there is no conditional independence assumed.
Note that Δ ij is the solution of 2.4 , and thus its value implicitly depends on β M , α, the formulation of b ij η i and the distribution of η i .The specification of b ij η i reflects investigator's assumptions on the dependence structure among the longitudinal responses and their association with the dropout times.It is well known that the dependence assumptions between longitudinal measurements and informative dropouts are usually unverifiable from the observed data, but it intrinsically affects the inference about β M .Thus, a sensitivity analysis under various assumptions is always warranted.It is clear that the sensitivity analysis can be easily conducted within the framework of model 2.3 .For example, the analysis may start with a large model in the specification for α b ij η i , for example, α 1 η i α 2 η i t ij α 3 η i X ij , and then examine the statistical significance of estimates for α's to further simplify the model.As shown in the next subsection, complex structure can be imposed on b ij η i without introducing much extra computation.Lastly, we note that the marginal interpretation of longitudinal parameters in model 2.1 is invariant under different specifications of the conditional mean model 2.3 .

Conditional Generalized Estimating Equation
First, assume that η i is known.We construct a "conditional" generalized estimating equation for B β M , α .More specifically, the estimating function U B is specified as where It is easy to see that U B has mean zero at the true parameter values B 0 under model 2.3 .Note that the vector of marginal parameters β M is implicitly present in U B with Δ i• through the constrain equation 2.4 .Thus, the Jacobian matrix A i• η i needs to be derived using both the constrain 2.4 and models 2.1 and 2.3 , which is different from the ordinary GEE.More specifically, entries of the Jacobian matrix A i• η i are given by , and we use ȧ x to denote the derivative of a function a x throughout this paper.In particular, we have Υ ij μ ij 1−μ ij and Υ ij η i μ ij η i {1 − μ ij η i } under the logit-link function for binary data; Υ ij μ ij and Υ ij η i μ ij η i under the log-link function for count data.Thus, under these canonical link functions, , η i are the marginal variance and conditional variance of the responses, respectively.In addition, these formulations also facilitate our selection of the weight matrix W i .For example, for binary longitudinal data with logit-link function, we can choose a weight matrix as It is clear that the implementation of the estimating function 3.1 requires the knowledge of η i , which is an unknown quantity and has to be estimated first.The estimation of the semiparametric linear transformation model 2.2 has been studied by many authors 27-29 .In particular, Chen et al. 27 proposed a class of martingale-based estimating equations, where Then an iterative algorithm can be carried out to solve θ and H simultaneously.We estimate θ, H using the approach of Chen et al. 27 and shall denote the estimates as Θ θ, H .

Estimation Procedure for Pure Informative Dropouts
We first consider the situation of pure informative dropouts, that is, δ i ≡ 1. Define η i H T i θ Z i and replace η i 's in 3.1 with their estimated counterparts η i 's.Denote the resulting estimating function by U B; Θ and define the estimator of B as the solution to U B; Θ 0. The estimation of B entails an iteration between solving nonlinear equations for Δ ij and updating a Newton-Ralphson equation for B.More specifically, given the current estimated value of B J at the Jth step, we first estimate Δ and then update the parameters B by where The algorithm is iterated until it converges.Because η i is assumed to follow an explicit parametric distribution F, it greatly simplifies the marginalization procedure 3.4 .We propose to use the Gaussian-quadrature approach 32 to numerically evaluate 3. 4 and A J i η i .Since the integrand of 3.4 is monotonic in Δ J ij and so is the whole integral, it is easy to calculate a large number of Δ J ij , i 1, . . ., n, j 1, . . ., m i , in all iterative steps.Moreover, the numerical integration is only upon the one-dimensional space of η i and requires light computation even with complex structure assumed on b ij η i .The proposed iterative algorithm has been implemented using "R" codes, which are available from the authors upon request.

Estimation Procedure for Mixed Types of Dropouts
We generalize the proposed estimation function 3.1 to accommodate the situation where there are mixed informative dropouts and random censoring.More specifically, the modified estimating equation is given by where and the Jacobian matrix , the ith component of U * B; Θ is the same as the one in 3.1 .For δ i 0, the entries of A * i• η * i , 0 are given by 3.8 In addition, the entries of the weight matrix can be changed to Var{Y ij | X ij , η * i , δ i } accordingly.Conditional expectations of various functions given η ≥ η * i are computed using the Gaussian-quadrature method.Let η * i H T * i θ Z i and replace η * i 's in 3.6 with their estimated counterparts η * i 's.Denote the resulting estimating function by U * B; Θ .Then the estimator B of B can be obtained from the equation U * B; Θ 0 using the same iterative algorithm described in the previous subsection.

Asymptotic Properties of B
In this subsection, we establish the asymptotic properties of B. Towards this end, we need the following assumptions.
C1 The covariates X ij 's are bounded with probability 1.

C2
The true parameter values B 0 and θ 0 belong to the interior of a known compact set, and the true transformation function H 0 has a continuous and positive derivative.

C5 The matrix
is positive finite, and the number of repeated measurements m i N.
The regularity conditions C1 -C4 are also used by Chen et al. 27 to derive the consistency and asymptotically normality of the estimators Θ. Condition C5 is needed to establish the consistency and asymptotic normality of B, which is given in the following theorem.

Theorem 3.1. Under conditions (C1)-(C5), with probability 1, | B − B
3.9 The definition of V and a sketch of the proof for Theorem 3.1 are given in the appendix.The asymptotic variance-covariance matrix can be consistently estimated by its empirical counterpart Ω −1 V Ω −1 , which can be easily obtained using the usual plug-in rule.

Simulations
We conducted a series of simulation studies to evaluate the finite-sample performance of our proposed approach.Consider a binary longitudinal process with the marginal probability of success as g −1 β 0 β 1 t β 2 Z , where g was the logit-link function; observations occurred at t i {t ij j, j 0, 1, . . ., 5}; Z was generated from a Bernoulli distribution with the success probability of 0.5, and β 0 , β 1 , β 2 −1.5, 0.3, 1 .The informative dropout time T i was generated from a linear transformation model H T i −θZ i η i , where θ −0.5.We considered three distributions for η i : the standard normal distribution N. , the extreme value distribution E. , and the logistic distribution L. , corresponding to the normal transformation model, the Cox proportional hazard model, and the proportional odds model, respectively, for the informative dropout time.We then generated the binary response Y ij independently from a Bernoulli distribution with the success probability of g −1 {Δ ij αb ij η i }, where Δ ij was calculated to match the marginal mean value as in 2.4 and α indicated the level of dependence.We considered several combinations to specify the dependence between longitudinal outcomes and informative dropout times.More specifically, when α 0, there was no informative dropouts; when α 0.5 or 0.25 and b ij η i η i , the dependence existed and was linear in the latent variable η i ; when α 0.25 or 0.5 and b ij η i η i t ij , the dependence was present through an interaction between the latent variable and the observation time.
For each scenario, we considered samples of size 100 and 200 and conducted 500 runs of simulations.The Gaussian-quadrature approximation was calculated using 50 grid points.We first considered the situation of pure informative dropouts and generated the dropout time T i from the transformation model with H 0 t 2{arctan t π/2}.Under the assumptions of η i following the normal, the extreme value, and the logistic distributions, the average numbers of repeated measurements were 3.91, 3.37, and 3.94, respectively.The estimation results on β 1 and α are summarized in Table 1.The proposed estimators are unbiased under all simulated scenarios, and the Wald-type 95% confidence intervals all have reasonable empirical coverage probabilities.The performance of the proposed method is consistent with different distributional assumptions of η i and different specifications of the dependence structure, and the results improve as the sample size increases.
Next, we consider the situation where there are mixed informative dropouts and random censoring.For simplicity, let C i be an administrative censoring at the end of the study, that is, τ 6.The informative dropout time T i was generated from the transformation model with H t log t − 1 and η i followed the standard normal distribution.This yielded the proportion of informative dropouts of 69.6% and the average number of repeated measurements about 4. Other settings were kept the same as in the previous simulations.The simulation results are presented in Table 2. Again, the proposed approach gives unbiased parameter estimates and reasonable coverage probabilities under all the scenarios.For comparison, we also implemented the ordinary GEE method 16 .When the informative dropout is absent, that is, α 0, the GEE method yields consistent parameter estimates of β 1 as expected.But when there is informative dropout α / 0 , the performance of the GEE method deteriorates quickly as the magnitude of the dependence between the longitudinal data and informative dropout increases.
Last, we conducted sensitivity analysis for the proposed approach and our simulations consisted of two parts.First, as discussed in Section 2, to better characterize the dependence  structure between longitudinal responses and informative dropouts, we would suggest to start with a large model in the specification for α b ij η i and then examine the statistical significance of estimates for α's to further simplify the model.We simulated data from a simple model with either α b ij η i α 1 η i or α b ij η i α 2 η i t ij , and then applied the proposed approach by assuming a bigger model 2.3 as Δ ij α 1 η i α 2 η i t ij .The simulation results are summarized in the top panel in Table 3.The proposed method can reasonably well estimate all the parameters, and in particular, could correctly indicate the unnecessary zero term.Second, we simulated data from Δ ij α 1 η i α 2 η i t ij but fitted misspecified models that omitted

Application to Renal Disease Data from MDRD Study
Here we considered a subgroup of 129 patients with low-protein diet in MDRD study B, among whom, 62 patients were randomized to the group of normal-blood-pressure control and 67 patients were randomized to the group of low-blood-pressure control.Besides the randomized intervention, other covariates included time in study (time), baseline disease progression status (Prog), baseline blood pressure (Bp), and log-transformed baseline urine protein level (log.Pt).There were 52 40.3% patients left the study prematurely for kidney transplant or dialysis and were treated as informative dropouts.
We applied the proposed approach to estimate the marginal effects of covariates on GFR values.To account for the possible informative dropouts, we assumed that the dependence term α b ij η i had a form of α 1 η i α 2 η i t ij , analogous to the joint modeling approach with latent random intercept and random slope used in Schluchter et al. 33 .We considered the situations of η i following the standard normal, the extreme value, or the standard logistic distributions.Because the outcome was a continuous variable, we used the identity link function.
Our results are presented in Table 4 and compared with the results from the ordinary GEE 16 with an independent working correlation matrix.More specifically, the slope estimates from the proposed approach indicate a much faster decreasing rate of GFR e.g., time Est −0.27, SE 0.03, under the normality assumption for η i than the result from the ordinary GEE method time Est −0.14, SE 0.03 .A possible explanation is that those patients remaining under observation usually have better kidney functions and thus higher GFR values.The ordinary GEE approach that treats the observed patients as random representatives of the population tends to underestimate the degressive trend of GFR.
The estimates for the intervention on blood pressure control show positive effect of the low-blood-pressure control on the longitudinal GFR development.Although the results are not statistically significant, the estimates from the proposed method e.g., Intervention Est 0.82, SE 1.07, under the normality assumption for η i are about twice large of the values from the ordinary GEE method Intervention Est 0.35, SE 0.90 .Moreover, for the proposed approach, the results under different distributional assumptions for η i are quite similar.The estimates of the dependency parameters α 1 , α 2 are positive and statistically significant.This indicates that higher GFR values are positively associated with longer dropout times in the study.In addition, our proposed approach shows that the baseline urine protein level is significantly associated with the longitudinal GFR development, but the ordinary GEE method does not show such significance.The results obtained using our proposed method are also consistent with those reported in Schluchter et al. 33 .

Discussion
In this paper, we propose a semiparametric marginalized model for marginal inference of the relationship between longitudinal responses and covariates in the presence of informative dropouts.The regression parameters represent the covariate effects on the population level.The proposed estimators are expected to be insensitive to misspecification of the latent variable distribution 31 , which is desirable pertaining to the sensitivity analysis on unverifiable assumptions for the informative dropouts.In practice, the choice between marginal models and other types of joint modeling approaches should be determined by study objective.
To estimate the regression parameters in the proposed marginalized model, we proposed a class of simple conditional generalized estimating equations and demonstrated its computational convenience.In general, a likelihood-based approach can be used to achieve more efficient inference and is also of great interest.For example, a marginalized random effects model 30, 31 can be used for the longitudinal process and a frailty model 34 can be used for the dropout time.Furthermore, latent variables b ij , η i can be modeled by a copula distribution or non-Gaussian distributions 35 .The likelihood-based methods enjoy the high efficiency and facilitate the implementation of classical model selection procedures, such as AIC or BIC; however, intensive computations are often involved when the dimension of random effects is high.
We first derive the consistency of the proposed estimator B, which is the solution of the equation where O i ∂ H T * i ; θ /∂θ| θ θ 0 and its asymptotic representation can be found in the appendix of Chen et where A.5 The definition of B t, s can be found in Chen et al. 27 .Hence, 1/ √ n U * B 0 ; Θ has been written as a standardized summation of independent terms with mean zero.By the central limit theorem, it is asymptotically equivalent to a multivariate Gaussian variable with zero mean and covariance matrix V , which is the limit of A.6 From A.1 , it is easy to see that the estimator B is asymptotically normal with mean zero and the variance-covariance matrix Ω −1 V Ω −1 , which can be consistently estimated by its empirical counterpart Ω −1 V Ω −1 using the usual plug-in rule.
is a positive definite matrix, and by the law of large numbers and the consistency of θ and H, it converges uniformly to a deterministic positive definite matrix Ω B over a compact set of B. In addition, we have that 1/n U * B; Θ converges uniformly to a deterministic function u * B; Θ 0 satisfying u * B 0 ; Θ 0 0 and − ∂u * B; Θ 0 /∂B | B B 0 Ω B 0 Ω.Thus, the estimating equation U * B; Θ 0 exists a unique solution B. Since B 0 is the unique solution of u * B 0 ; Θ 0 0, the consistency of B easily follows.To prove the asymptotic normality, by the Taylor expansion, we have lies between B and B 0 .Since B is consistent and − 1/n ∂U * B; Θ /∂B converges uniformly to Ω B , we have that − 1/n ∂U * B * ; Θ /∂B converges to Ω. Furthermore, the Taylor expansion of U * B 0 ; Θ around Θ 0 T * i min T i , C i and δ i I T i ≤ C i taking the value of 1 if the informative dropout time is observed and 0 otherwise.Throughout the paper, let I • denote the indicator function.Due to the dropout, longitudinal responses and covariates can only be observed at t ij ≤ T * i .Hence, the observed data are {Y ij , X ij , t ij , T *

Table 1 :
Simulation results for pure informative dropouts.
In Tables1-3: F: error distribution of the semiparametric transformation model; N: sample size; SSE: sample standard deviations of estimates; SEE: mean of estimates standard errors; CP: 95% coverage probability of Wald-type confidence interval.

Table 2 :
Simulation results for mixed types of dropouts.

Table 3 :
Sensitivity analysis for misspecified models under mixed types of dropouts.The results are summarized in the lower panel of Table3.It is evident that misspecified models lead to biased estimates for both longitudinal regression coefficients and dependence parameters.

Table 4 :
Estimates of regression coefficients for the MDRD study.Intervention: blood pressure control 1: low and 0: normal ; Prog: baseline disease progression status 1: yes and 0: no ; Bp: baseline blood pressure; log.Pt: baseline log-transformed urine protein level.
al. 27 .The asymptotic representations of θ and H have also been derived by Chen et al. 27 , H t; θ 0 − Λ * {H 0 t } n −1 s dΛ{θ 0 Z i H 0 s } is the mean-zero martingale process, and the definitions of Σ * and the functions Λ * • , λ * • , B 2 • , and μ b • are given in the appendix of Chen et al. 27 .Plugging these terms back to the expansion of 1/ √ n U * B 0 ; Θ specified in A.2 , some rearrangement yields that it is equal to * * i ≤ t and ξ i t I T * i ≥ t are the counting and at-risk processes, respectively, M i t N i t − t 0 ξ i