We propose a marginalized joint-modeling approach for marginal inference on the association between longitudinal responses and covariates when longitudinal measurements are subject to informative dropouts. The proposed model is motivated by the idea of linking longitudinal responses and dropout times through latent variables while focusing on marginal inference. We develop a simple inference procedure based on a series of estimating equations; the resulting estimators are consistent and asymptotically normal, with a sandwich-type covariance matrix that can be estimated by the usual plug-in rule. The performance of our approach is evaluated through simulations and illustrated with an application to renal disease data.
Longitudinal studies often encounter data attrition because subjects drop out before the scheduled end of the study. Dropouts can complicate both the statistical analysis and the practical interpretation of longitudinal data. For example, in the Modification of Diet in Renal Disease (MDRD) study [
Many statistical models and inference approaches have been proposed to accommodate nonignorable missingness when modeling longitudinal data (see reviews [
Second, event-conditioning approaches are also widely used when the target of inference is a subgroup of patients with a particular dropout pattern, or when the dropout event can materially change the characteristics of the longitudinal process (e.g., death). Inference is usually conducted conditional on the dropout pattern or on the occurrence of the dropout event. Thus, model parameters have an event-conditioning, subpopulation-averaged interpretation, for example, pattern-mixture models for the group expectation of each dropout pattern [
Lastly, when the research objective is to study population-level covariate effects in a hypothetical dropout-free setting, marginal models address this objective directly. When data are complete or missing completely at random (following Rubin's classification of missingness [
In this paper, we shall adopt the idea of shared latent variables to account for the dependence between longitudinal responses and informative dropouts while focusing on marginal inference for the longitudinal responses. Here dropouts can occur on a continuous time scale. We develop an effective estimation procedure built on a series of asymptotically unbiased estimating equations with light computational burden. The resulting estimators for longitudinal parameters are shown to be consistent and asymptotically normal, with a sandwich-type variance-covariance matrix that can be estimated by the usual plug-in rule.
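To make the "usual plug-in rule" for a sandwich covariance concrete, the sketch below computes the estimator A⁻¹BA⁻ᵀ for the simplest marginal setting: a logistic marginal mean model fitted by ordinary GEE with an independence working correlation. Here A is the summed derivative ("bread") of the estimating function and B is the summed outer product of subject-level score contributions ("meat"), both evaluated at the estimate. This is the standard GEE sandwich, not the conditional estimating equations proposed in this paper, which additionally involve the dropout model; the function name `gee_independence_logistic` is hypothetical.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def gee_independence_logistic(X_list, y_list, n_iter=50, tol=1e-10):
    """Hypothetical helper: fit a marginal logistic model by GEE with an
    independence working correlation and return (beta_hat, sandwich_cov).

    X_list: list of (m_i, p) design matrices, one per subject
    y_list: list of (m_i,) binary response vectors
    """
    p = X_list[0].shape[1]
    beta = np.zeros(p)
    # Newton-Raphson on the summed estimating function U(beta) = sum_i X_i'(y_i - mu_i)
    for _ in range(n_iter):
        U = np.zeros(p)
        A = np.zeros((p, p))
        for X, y in zip(X_list, y_list):
            mu = expit(X @ beta)
            U += X.T @ (y - mu)
            A += X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(A, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Plug-in rule: evaluate bread (A) and meat (B) at beta_hat
    A = np.zeros((p, p))
    B = np.zeros((p, p))
    for X, y in zip(X_list, y_list):
        mu = expit(X @ beta)
        u_i = X.T @ (y - mu)  # subject-level score contribution
        A += X.T @ (X * (mu * (1.0 - mu))[:, None])
        B += np.outer(u_i, u_i)
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ B @ A_inv.T
```

Summing scores within each subject before taking outer products is what makes the variance robust to misspecification of the within-subject correlation; only independence across subjects is required.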
The remainder of the paper is organized as follows. In Section
Consider that a longitudinal study follows
We first introduce the composition of our proposed model and then discuss the model motivation and interpretation. The first component is a marginal generalized linear model for longitudinal responses
The marginal mean model (
The conditional mean model (
Note that
First, assume that
It is easy to see that
It is clear that the implementation of the estimating function (
We first consider the situation of pure informative dropouts, that is,
We generalize the proposed estimation function (
In this subsection, we establish the asymptotic properties of The covariates The true parameter values Let The matrix
The regularity conditions (C1)–(C4) are also used by Chen et al. [
Under conditions (C1)–(C5), with probability 1,
The definition of
We conducted a series of simulation studies to evaluate the finite-sample performance of our proposed approach. Consider a binary longitudinal process with the marginal probability of success as
For each scenario, we considered sample sizes of 100 and 200 and ran 500 simulation replicates. The Gaussian-quadrature approximation was computed using 50 grid points. We first considered the situation of pure informative dropouts and generated the dropout time
Simulation results for pure informative dropouts.
 |  | n | Bias | SSE | SEE | CP | Bias | SSE | SEE | CP
N. | 0 | 100 | 0.001 | 0.084 | 0.082 | 0.948 | 0.004 | 0.130 | 0.127 | 0.954
N. | 0 | 200 | 0.006 | 0.057 | 0.058 | 0.952 | −0.002 | 0.090 | 0.091 | 0.942
N. | 0.5 | 100 | 0.008 | 0.083 | 0.082 | 0.952 | 0.008 | 0.148 | 0.142 | 0.936
N. | 0.5 | 200 | 0.005 | 0.056 | 0.058 | 0.954 | 0.001 | 0.102 | 0.101 | 0.944
N. | 0.25 | 100 | 0.004 | 0.102 | 0.101 | 0.940 | 0.006 | 0.083 | 0.081 | 0.938
N. | 0.25 | 200 | 0.009 | 0.071 | 0.071 | 0.948 | −0.001 | 0.060 | 0.058 | 0.940
E. | 0 | 100 | 0.007 | 0.107 | 0.097 | 0.920 | 0.001 | 0.150 | 0.140 | 0.942
E. | 0 | 200 | 0.002 | 0.069 | 0.069 | 0.950 | −0.003 | 0.010 | 0.097 | 0.958
E. | 0.5 | 100 | 0.010 | 0.097 | 0.097 | 0.948 | 0.025 | 0.161 | 0.161 | 0.962
E. | 0.5 | 200 | 0.009 | 0.071 | 0.069 | 0.932 | 0.002 | 0.111 | 0.113 | 0.950
E. | 0.25 | 100 | 0.026 | 0.138 | 0.135 | 0.936 | 0.001 | 0.124 | 0.114 | 0.926
E. | 0.25 | 200 | 0.009 | 0.089 | 0.095 | 0.954 | 0.001 | 0.079 | 0.082 | 0.952
L. | 0 | 100 | 0.004 | 0.079 | 0.077 | 0.932 | 0.001 | 0.076 | 0.077 | 0.950
L. | 0 | 200 | 0.002 | 0.057 | 0.055 | 0.952 | −0.001 | 0.058 | 0.054 | 0.938
L. | 0.5 | 100 | 0.009 | 0.079 | 0.076 | 0.944 | 0.007 | 0.114 | 0.105 | 0.928
L. | 0.5 | 200 | 0.002 | 0.053 | 0.054 | 0.964 | 0.006 | 0.074 | 0.075 | 0.956
L. | 0.25 | 100 | 0.003 | 0.096 | 0.096 | 0.954 | 0.008 | 0.069 | 0.067 | 0.942
L. | 0.25 | 200 | 0.002 | 0.067 | 0.068 | 0.964 | 0.004 | 0.047 | 0.047 | 0.958
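The simulations approximate integrals over the latent variable by Gaussian quadrature with 50 grid points. The sketch below shows one common way such an integral enters a marginalized model: given a marginal success probability μ, solve for the conditional intercept Δ such that E[expit(Δ + σZ)] = μ with Z standard normal, using Gauss–Hermite quadrature. It assumes a normal latent variable entering a logit-link conditional model additively, as in a standard marginalized random-intercept model; the paper's own conditional mean model and dependence term are not reproduced here, and `marginal_prob` and `solve_delta` are hypothetical names.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Gauss-Hermite nodes/weights for integrals of the form  ∫ exp(-x^2) f(x) dx
NODES, WEIGHTS = hermgauss(50)  # 50 grid points, matching the simulations


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def marginal_prob(delta, sigma):
    """E[expit(delta + sigma * Z)], Z ~ N(0, 1), via the change of variables
    z = sqrt(2) * x:  E[f(Z)] ≈ (1 / sqrt(pi)) * sum_k w_k f(sqrt(2) * x_k)."""
    z = np.sqrt(2.0) * NODES
    return float(np.sum(WEIGHTS * expit(delta + sigma * z)) / np.sqrt(np.pi))


def solve_delta(mu, sigma, lo=-50.0, hi=50.0, tol=1e-12):
    """Find delta with marginal_prob(delta, sigma) = mu, for 0 < mu < 1.
    marginal_prob is strictly increasing in delta, so bisection is safe."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_prob(mid, sigma) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With σ = 0 the solution reduces to logit(μ); for σ > 0 the conditional intercept is pushed further from zero than logit(μ), which is the familiar gap between conditional and marginal (population-averaged) effects.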
In Tables
Next, we consider a mixture of informative dropouts and random censoring. For simplicity, let
Simulation results for mixed types of dropouts.
Proposed | Proposed | GEE | |||||||||
Bias | SSE | CP | Bias | SSE | CP | Bias | SSE | CP | |||
0 | 100 | 0.003 | 0.075 | 0.958 | −0.007 | 0.158 | 0.956 | 0.002 | 0.068 | 0.946 | |
200 | 0.005 | 0.056 | 0.954 | −0.003 | 0.114 | 0.944 | 0.005 | 0.049 | 0.956 | ||
0.25 | 100 | 0.006 | 0.073 | 0.948 | 0.009 | 0.156 | 0.964 | 0.061 | 0.068 | 0.878 | |
200 | 0.004 | 0.055 | 0.950 | −0.003 | 0.116 | 0.936 | 0.057 | 0.050 | 0.768 | ||
0.50 | 100 | 0.008 | 0.076 | 0.948 | 0.010 | 0.174 | 0.952 | 0.116 | 0.070 | 0.660 | |
200 | 0.006 | 0.051 | 0.960 | −0.008 | 0.121 | 0.956 | 0.113 | 0.049 | 0.380 | ||
0.25 | 100 | 0.005 | 0.106 | 0.948 | 0.005 | 0.106 | 0.960 | 0.233 | 0.074 | 0.110 | |
200 | 0.009 | 0.078 | 0.936 | −0.007 | 0.077 | 0.940 | 0.233 | 0.054 | 0.000 | ||
0.50 | 100 | 0.007 | 0.097 | 0.954 | 0.010 | 0.141 | 0.950 | 0.397 | 0.081 | 0.000 | |
200 | 0.008 | 0.071 | 0.940 | −0.008 | 0.104 | 0.936 | 0.395 | 0.059 | 0.000 |
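Because each scenario uses 500 replicates, an empirical coverage probability (CP) has Monte Carlo standard error of about √(0.95 × 0.05 / 500) ≈ 0.0097 when the true coverage is 95%. The sketch below uses the common rule of thumb of roughly two Monte Carlo standard errors (an assumption, not a criterion stated in the paper), which gives a band of about 0.931 to 0.969, and applies it to the CP values in the first Proposed column and the GEE column of the mixed-dropout table above.

```python
import math


def coverage_band(nominal=0.95, reps=500, k=2.0):
    """Band nominal ± k Monte Carlo SEs for an empirical coverage probability."""
    mc_se = math.sqrt(nominal * (1.0 - nominal) / reps)
    return nominal - k * mc_se, nominal + k * mc_se


# CP values copied from the mixed-dropout table, in row order
proposed_cp = [0.958, 0.954, 0.948, 0.950, 0.948, 0.960, 0.948, 0.936, 0.954, 0.940]
gee_cp = [0.946, 0.956, 0.878, 0.768, 0.660, 0.380, 0.110, 0.000, 0.000, 0.000]

lo, hi = coverage_band()
print(f"band: [{lo:.3f}, {hi:.3f}]")
print("Proposed outside band:", [c for c in proposed_cp if not lo <= c <= hi])
print("GEE outside band:", [c for c in gee_cp if not lo <= c <= hi])
```

All Proposed values fall inside the band, whereas the GEE coverage falls well below it in every row except the first two.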
Last, we conducted a sensitivity analysis of the proposed approach in two parts. First, as discussed in Section
Sensitivity analysis for misspecified models under mixed types of dropouts.
True | Fitted | Proposed | Proposed | Proposed | ||||
Bias | CP | Bias | CP | Bias | CP | |||
100 | 0.009 | 0.946 | 0.014 | 0.956 | 0.002 | 0.950 | ||
200 | 0.009 | 0.964 | −0.002 | 0.948 | 0.002 | 0.968 | ||
100 | 0.005 | 0.946 | −0.004 | 0.954 | 0.007 | 0.956 | ||
200 | 0.009 | 0.936 | 0.001 | 0.956 | −0.007 | 0.952 | ||
100 | 0.071 | 0.860 | 0.119 | 0.918 | ||||
200 | 0.069 | 0.788 | 0.103 | 0.896 | ||||
100 | −0.017 | 0.924 | 0.086 | 0.850 | ||||
200 | −0.017 | 0.902 | 0.078 | 0.692 |
Here we considered a subgroup of 129 patients assigned to the low-protein diet in MDRD Study B; of these, 62 were randomized to normal blood-pressure control and 67 to low blood-pressure control. Besides the randomized intervention, other covariates included time in study
We applied the proposed approach to estimate the marginal effects of covariates on GFR values. To account for the possible informative dropouts, we assumed that the dependence term
Our results are presented in Table
Estimates of regression coefficients for the MDRD study.
Proposed | GEE | |||||||
Variable | Normal | EV | Logistic | |||||
Est | SE | Est | SE | Est | SE | Est | SE | |
18.54 | 0.96 | 18.57 | 0.91 | 18.58 | 1.11 | 18.57 | 0.78 | |
−0.27 | 0.03 | −0.29 | 0.04 | −0.28 | 0.03 | −0.14 | 0.03 | |
0.82 | 1.06 | 0.74 | 1.01 | 0.71 | 1.17 | 0.35 | 0.90 | |
−0.14 | 1.07 | −0.08 | 1.02 | −0.14 | 1.19 | −0.14 | 0.91 | |
−0.15 | 1.38 | −0.20 | 1.34 | −0.07 | 1.48 | −0.36 | 0.49 | |
−1.09 | 0.39 | −1.09 | 0.37 | −1.12 | 0.42 | −0.61 | 0.38 | |
1.91 | 0.50 | 1.38 | 0.38 | 1.11 | 0.28 | |||
0.14 | 0.04 | 0.14 | 0.04 | 0.08 | 0.03 |
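Since the estimators are asymptotically normal, each Est/SE pair in the table above supports a Wald test and confidence interval. The sketch below computes z = Est/SE, the two-sided p-value erfc(|z|/√2), and a 95% interval for the "Normal" column, with rows indexed in the order they appear in the table (the mapping of rows to covariates is not restated here). The helper name `wald` is hypothetical.

```python
import math


def wald(est, se, z_crit=1.959963984540054):
    """Wald z statistic, two-sided normal p-value, and 95% confidence interval."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, (est - z_crit * se, est + z_crit * se)


# (Est, SE) pairs from the "Normal" column of the MDRD table, in row order
normal_col = [(18.54, 0.96), (-0.27, 0.03), (0.82, 1.06), (-0.14, 1.07),
              (-0.15, 1.38), (-1.09, 0.39), (1.91, 0.50), (0.14, 0.04)]

for row, (est, se) in enumerate(normal_col, start=1):
    z, p, (lo, hi) = wald(est, se)
    print(f"row {row}: z = {z:6.2f}, p = {p:.3g}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Rows 3 to 5 have |z| < 1, so their intervals comfortably cover zero; the remaining rows are significant at the 5% level.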
The estimates for the blood-pressure intervention suggest a positive effect of low blood-pressure control on longitudinal GFR. Although these effects are not statistically significant, the estimates from the proposed method (e.g.,
In this paper, we propose a semiparametric marginalized model for marginal inference of the relationship between longitudinal responses and covariates in the presence of informative dropouts. The regression parameters represent the covariate effects on the population level. The proposed estimators are expected to be insensitive to misspecification of the latent variable distribution [
To estimate the regression parameters in the proposed marginalized model, we proposed a class of simple conditional generalized estimating equations and demonstrated their computational convenience. A likelihood-based approach could achieve more efficient inference and is also of great interest. For example, a marginalized random effects model [
Under the conditions (C1)–(C4), Chen et al. [
To prove the asymptotic normality, by the Taylor expansion, we have
Plugging these terms back to the expansion of
Hence,
The authors would like to thank the editor and the referees for their instructive comments. This research was supported partially by NIH grants R01 CA-140632 and R03 CA-153083.