A Proposed Approach for Joint Modeling of the Longitudinal and Time-To-Event Data in Heterogeneous Populations: An Application to HIV/AIDS's Disease

In recent years, the joint models have been widely used for modeling the longitudinal and time-to-event data simultaneously. In this study, we proposed an approach (PA) to study the longitudinal and survival outcomes simultaneously in heterogeneous populations. PA relaxes the assumption of conditional independence (CI). We also compared PA with joint latent class model (JLCM) and separate approach (SA) for various sample sizes (150, 300, and 600) and different association parameters (0, 0.2, and 0.5). The average bias of parameters estimation (AB-PE), average SE of parameters estimation (ASE-PE), and coverage probability of the 95% confidence interval (CP) among the three approaches were compared. In most cases, when the sample sizes increased, AB-PE and ASE-PE decreased for the three approaches, and CP got closer to the nominal level of 0.95. When there was a considerable association, PA in comparison with SA and JLCM performed better in the sense that PA had the smallest AB-PE and ASE-PE for the longitudinal submodel among the three approaches for the small and moderate sample sizes. Moreover, JLCM was desirable for the none-association and the large sample size. Finally, the evaluated approaches were applied on a real HIV/AIDS dataset for validation, and the results were compared.


Introduction
In many studies, the repeated measures of a biomarker are recorded together with time to an event of interest. For example, in HIV/AIDS studies, the trajectories of CD4 counts and time-to-death are collected. In such studies, the interest often lies in understanding the relationships between the longitudinal history of a process and its effect on the risk of an event [1][2][3][4][5][6][7][8][9].
Classical models such as the separate analysis were performed for these types of data; consequently, the association between the longitudinal and survival outcomes is neglected because the linear mixed model for repeated measurements and the Cox model for time-to-event are conducted separately [6,10,11]. In addition, some practices consider the dependency between the two outcomes. Hence, the extended Cox model is used to incorporate the repeated measures as time-varying covariates [4]. In this method, time varying covariates are assumed to be observed continuously till the study terminated using this approach. In practice, this assumption usually does not stratify. Moreover, longitudinal biomarkers tend to be measured with error; thus, modeling the longitudinal measures by a mixed model accounts for this measurement error, which is neglected in the extended Cox model, thus leading to biased and inefficient estimates [4,10,[12][13][14].
In recent years, joint model has been used to analyze the longitudinal and survival outcomes simultaneously to consider association between the two outcomes [1,14,15]. Joint model enjoys some advantages as compared to classical approaches such as Cox and linear mixed models alone and provides more powerful, accurate, efficient, and robust estimations [4,10,12,16].
Most of the joint models allow subjects to just follow one pattern [5,6,13], and the baseline hazard is considered the same for all subjects. Thus, they become inappropriate when there are subgroups with different patterns of response profiles [13]. 2 BioMed Research International Joint latent class model (JLCM) is a type of joint models that assumes the population of the subjects to be heterogeneous with multiple homogenous patterns; it is known as the latent class (subpopulation, subtype, or subgroup), having its own longitudinal trajectory and survival curve [2,5,6,17].
Conditional independence (CI) as a fundamental assumption of the JLCM shows that the entire association between longitudinal and survival outcomes is captured by the latent class structure. Thus, given these latent classes, the two types of outcomes are independent [17][18][19][20]. However, the CI assumption may not sufficiently show the strength of association and might underestimate the association between the longitudinal and survival processes [13]. Furthermore, to ensure the CI assumption, JLCM has to be examined for various numbers of latent classes, which may ultimately lead to choosing an inappropriate and meaningless size of classes.
We designed a simulation study to combine the joint model with the latent class framework which proposed an approach (PA) for heterogeneous population of subjects free from the CI assumption. At first, the class membership for each subject based on the latent class framework was identified for appropriate number of latent classes. Then, the joint model for longitudinal and survival processes was conducted separately in each latent class for PA. In addition, the separate approach (SA), the linear mixed model for the longitudinal data, and the extended Cox model for the survival outcome were applied separately in each latent class. Finally, we compared PA with JLCM and SA for various sample sizes and different association parameters. In addition, we focused on both the longitudinal and survival outcomes in this study. (JLCM). JLCM assumes that the subjects in each latent class have their own specific longitudinal trajectory and risk of the event, which is useful in many types of research with different patterns of the longitudinal and survival outcomes. In addition, JLCM can be performed for normal and nonnormal distributions and ordinal outcomes [6,21]. This model does not require normal distribution of random-effects assumption, since it consists of several subpopulations, where this assumption is not realistic [22]. JLCM includes three components: the latent class membership, the longitudinal, and survival submodels. Given the latent class , there is no association between two processes of the longitudinal and survival outcomes; consequently, dependency between time-to-event and longitudinal processes is captured by the structure of latent class [5]. Several methods were introduced to evaluate the CI assumption: evaluation based on the posterior classification, analysis of the residuals conditional on the event, and a score test [19,23,24]. Among these approaches, the score test is more powerful than the other methods to assess the CI assumption [2,5].

Joint Latent Class Model
In practice, JLCM is applied to a number of latent classes from one to three; the appropriate number of latent classes is determined using the best Bayesian information criterion (lower BIC) and satisfactory CI assumption [6,20].
Each subject is assigned to each latent class, which has the highest class membership probabilities [25]. A case that is wrongly classified is called misclassified on a categorical variable [13]. (SA). Commonly, the linear mixed model is used for continuous longitudinal measurements. Also, the parametric or semiparametric survival models are used for modeling the time-to-event data [11]. In SA, the probability that a subject belongs to a latent class structure can be modeled via a latent class framework. Next, the linear mixed model for modeling the longitudinal measurements and the extended Cox model by incorporating repeated measurements into the survival data were conducted for each latent class.

Proposed Approach (PA).
We incorporated the latent class framework to identify its subgroups behind the observed longitudinal measurements and survival outcome. PA provides an approach that achieves appropriate number of the latent classes in heterogeneous populations without requiring the CI assumption. Appropriate number of latent classes are determined by a suitable and easier interpretation according to researcher' comments. For PA, each subject was allocated to an appropriate class according to the highest class membership probabilities. Then, joint model was conducted for each class; additionally, in each latent class, the association between the longitudinal and time-to-event data was modeled by the entire longitudinal trajectory as a covariate in the survival submodel.
(1) Latent Class Framework. The class membership probability for a subject belonging to a latent class can be modeled via a multinomial logistic regression with vector of covariate : Let represent the latent variable with = 1, . . . , latent classes. 0 is the intercept for class and 1 is the vector of class-specific parameters associated with the set covariates . Also, to ensure identifiability, 0 = 0 and 1 = 0, that is, last latent class as [2,5,25].
In the application, parameters from latent class framework are estimated by maximizing the log likelihood function with iteration of Expected-Maximization (EM) algorithm with steps of Newton-Raphson [26,27].
(2) Longitudinal Submodel. The longitudinal submodel is specified as a class-specific linear mixed model. Let be the total number of subjects and let = 1, 2, . . . , be the number (2) Given the latent class , ( ) is the longitudinal outcome for subject at the time of , and ( ) represents the random effect covariate vectors at the time for subject , associated with the -vector of random effect , where ( ) is the fixed effects covariate vectors at the time , which is associated with the -vector of fixed effect. The random error term, ( ) is usually assumed to be normally distributed.
(3) Survival Submodel. The survival submodel is specified as a Cox or any parametric survival model. Given latent class , the survival submodel is specified as where ℎ 0 ( ) is the baseline hazard function for class and ( ) is the covariate vector associated with the -vector parameters for the latent class .
The quantity, * ( ) = ( ) + ( ) , is the trajectory of the longitudinal function for class to connect the longitudinal process with the survival outcome. The parameter links the longitudinal and time-to-event outcomes in each class.

Simulation Studies.
We conducted this simulation study to examine bias, SE, the average bias of parameters estimation (AB-PE), the average SE of parameters estimation (ASE-PE), and coverage probability of the 95% confidence interval (CP) for three approaches (PA, JLCM, and SA) for the longitudinal and survival submodels. AB-PE shows the average of absolute bias of all parameters estimation. CP shows the proportion of time that confidence interval contains the true value. A multinomial logistic model was considered for the latent class membership for each subject: We considered a binary and a continuous covariate, where 1 is called a treatment effect, which was assumed as a binomial distribution with = 0.5 and 2 ∼ (0, 1). We assumed two latent classes ( = 2), where approximately 50% of the subjects belonged to class 1.
The survival submodel assumed a Cox model with a Weibull baseline hazard function. The event time was generated using an inverse cumulative hazard function [15,28,29]. The censored time is noninformative and is uniformly distributed random variable on 2.5+ uniform [0, 3]. Therefore, the observed failure time for the th subject was considered as the minimum of true event time and censored time [20,30]. As some previous studies, the censoring rate was considered around 60% in this simulation study [13,28].
The survival submodel was generated for each latent class as follows: The treatment effect on the time-to-event was = 0.5 and −0.5 in classes 1 and 2, respectively. The shape and scale parameters, ( , ), of baseline hazard function were (0.6, 0.001) and (1, 0.001) in classes 1 and 2, respectively. Sets of simulated data were performed for three sample sizes (150, 300, and 600 as small, moderate, and large sample sizes). Similar to previous study, three association parameters between longitudinal and survival outcomes were considered = (0, 0.2, 0.5) for none, moderate, and considerable association, respectively [12]. The magnitude of the association parameters was assumed the same in the two classes. For each simulation, the three approaches of PA, SA, and JLCM were fitted. We ran 1000 replications for each set of simulated data.
There are several methods to estimate parameters in joint models, including ML, restricted maximum likelihood (REML), and Bayesian method [18]. In PA, Gauss-Hermite integration method for maximizing the log likelihood of the joint distribution and EM iterations algorithm or quasi-Newton iterations were used. In JLCM, ML with EM algorithm was implemented to estimate parameters. For SA approach, ML in the longitudinal submodel and REML in the survival submodel were used for parameters estimation. The JM and LCMM packages in R version 3.1.1 software were used in this study.

Effect of Sample Size.
Simulations results showed that in most cases when the three approaches were used the sample size increased, while AB-PE and ASE-PE decreased, and the CP went close to nominal level of 0.95. Tables 1-3

None-Association ( = 0).
For JLCM, the model with the best BIC and the CI assumption satisfied included two latent classes ( = 2) for the moderate and large sample sizes, while in the small sample size, = 1 was the best-fit model. For the small sample size, results were reported for the two classes, since we can compare the models together. The average misclassification rates for the two latent classes for sample sizes of 150, 300, and 600 were 24%, 5%, and 1.4%, respectively. AB-PE and ASE-PE for the longitudinal submodel of PA and SA for the small sample size were the same and the lowest among the three approaches. Additionally, PA had lower ASE-PE than SA for the moderate and large sample sizes. For the large sample size, AB-PE of the longitudinal submodel for the three approaches was the same, and JLCM had the lowest ASE-PE (Figure 1). AB-PE for the treatment effect on time-to-event in the PA and SA was approximately the same for the small and moderate sample sizes. In addition, PA had better CP as compared with SA. Besides, for the large sample size, AB-PE and ASE-PE for the treatment effect on the survival submodel of JLCM were the lowest (Figure 2) and had a good CP amongst the three approaches.
The average of absolute bias and SE for the association parameter of PA and SA was approximately the same, and by increasing the sample size, bias and its SE decreased.
Bias, SE, and CP estimated parameters for the three approaches are presented in Table 1. AB-PE and their ASE-PE for the longitudinal and survival submodels are shown in Figures 1 and 2. ( = 0.2). For JLCM, the average misclassification rate for the two latent classes in the small sample size was approximately 20%, which was greater than the other sample sizes.

Moderate Association
PA had the smallest AB-PE and ASE-PE for the longitudinal submodel among the three approaches for small and moderate sample sizes. As for the large sample size, PA and JLCM had the same AB-PE, but JLCM had smaller ASE-PE in comparison with the other approaches for the longitudinal submodel ( Figure 1). AB-PE for the treatment effect on the survival submodel of PA was lower than JLCM and SA for the all sample sizes. In addition, ASE-PE of PA was the lowest among the three approaches for the small and moderate sample sizes. Furthermore, JLCM had the lowest ASE-PE, and SA had the highest AB-PE among the three approaches for the large sample size (Figure 2). Figures 1 and 2 show the results. In addition, by increasing the sample size, CP for PA and JLCM were close to 0.95 (Table 2). ( = 0.5). JLCM with one to three numbers of latent classes was performed. For the moderate and large sample sizes, the three appropriate numbers of latent classes were detected based on the best BIC, and satisfaction CI assumption, and for the small sample size, one latent class was preferred. We reported the estimation of parameters for the two classes in order to compare the three approaches together. The average misclassification rates for the two latent classes were 47%, 26%, and 10%, for the small, moderate, and large sample sizes, respectively.

Considerable Association
PA had the lowest AB-PE and ASE-PE and plausible CP for the longitudinal outcome, as well as the treatment effect on the survival submodel for the three sample sizes. In the three approaches, if sample size increases, AB-PE and ASE-PE decrease (Figures 1 and 2) and the CP get closer to nominal level of 0.95. In addition, bias of association parameter for PA and SA was negative in two classes. Moreover, the average absolute bias of association parameter for SA was higher than PA. The average of CP for PA, JLCM, and SA was 0.970, 0.837, and 0.833, for the large sample size, respectively. For bias, SE, and CP information of parameters estimation, refer to Table 3.

The Data and Methods
Description. The number of new HIV infections has declined by 38% worldwide from 2001 to 2013, followed by a significant decline in AIDS-related deaths [31]. According to the World Health Organization [32] report, 36.7 million people will be living with HIV/AIDS by the end of 2015 [32].
Among infectious diseases, the HIV/AIDS studies are a good example to be used in joint modeling of the longitudinal and survival processes. There are some literatures available that have used the joint modeling on such data [3,6,33,34]. In HIV/AIDS studies, CD4 cells are considered as a sign of disease progression in HIV-infected people. CD4 cells help to coordinate the immune system's response to certain microorganisms such as viruses; a low CD4 count is an indication of a higher risk of infection [6,33,35].
In this study, the HIV/AIDS dataset from Community Programs for Clinical Research on AIDS (CPCRA) was used [36], and a total of 467 patients infected with HIV were included in this study. The two outcomes were the longitudinal measurements of CD4, recorded at different time points: at the study entry, 2, 6, 12, and 18 months, and the time-to-death outcome. In CPCRA study, patients received two treatments, Zalcitabine (ddC) or Didanosine (ddI), randomly. Only a brief description of the dataset used in this study was mentioned here, since they have been fully described elsewhere [36].
In the present study, the HIV/AIDS dataset was used as an example to evaluate PA. To predict the class membership, an intercept-only-model or different covariates such as baseline hemoglobin (Hgb), treatment, and gender were considered from the literature [6,13,18]. In this study, based on Hgb and the treatment covariates, the class membership probability for each patient was identified via latent class framework. Then the patients were divided into two latent classes based on their highest posterior class membership probabilities. The number of latent classes was chosen in a way that there were enough observations in each latent class for easier classification, consistency with our simulation-based study, and easier interpretation.
PA and SA for modeling the influence of effective covariates on CD4 count and time-to-death were conducted in each class. In addition, we fitted JLCM for longitudinal CD4 measures and time-to-death with the number of latent classes varying from 1 to 3.
Due to the skewed distribution of CD4 cell level, and the presence of zero values, log(CD4+1) was used as the longitudinal outcome. The baseline hazard functions were estimated by Weibull distribution.
We used latent GOLD software ver. 4.5 to identify the probability of the class membership for each subject and influential covariates on classes.

Results of Application Data.
The results of latent class framework, using the PA and SA, showed that Hgb was significant ( value < 0.001), while the treatment ( value = 0.170) was an insignificant covariate on the subtype classification. Based on the classification, 51% of the patients were in the first class.
The PA on HIV/AIDS Dataset. In both classes, CD4 values decreased with time. The estimates of association parameters between CD4 and the time-to-death ( ) were significant and negative in both classes. Treatment had a significant effect on time-to-death in the second class. The effective covariates on the longitudinal and survival submodels are presented in Table 4.
The Kaplan-Meier survival plot and the mean of log(CD4+1) stratified by posterior classification are presented in Figures 3 and 4. Patients in the second class had a better survival rate and higher log(CD4+1) values.  The SA on HIV/AIDS Dataset. In the longitudinal submodel, time effect in the first class occurrence reduces CD4 cells significantly. Hgb had a small, but significant effect on CD4 in the first class and a strong significant effect in the second class.
In the survival submodel, the treatment had a significant effect on time-to-death in the second class. The estimated association parameters between CD4 and the time-to-death were significant and negative in both classes. The results for SA are presented in Table 4.
The JLCM on HIV/AIDS Dataset. BIC calculated for two latent classes was 4280.560, which was smaller than one class (4417.210) and three classes (4304.480). Also, the CI assumption was not rejected ( value = 0.250) for this model; thus, the model with two latent classes was preferred.
The probability of belonging to the latent classes was not significantly associated with the treatment ( value = 0.370) and Hgb ( value = 0.566).
In the longitudinal submodel, the time effect was negative and significant in the two latent classes. In the survival submodel, the treatment in the second class and Hgb in the first class were significantly associated with risk of death for HIV/AIDS patients (Table 4).
Overall, ASE-PE for the longitudinal submodel was 0.024, 0.024, and 0.039 for the PA, SA, and JLCM, respectively. Furthermore, ASE-PE for the survival submodel among the three approaches were 0.106, 0.394, and 0.343 for the PA, SA, and JLCM, respectively.

Discussions about Simulation
Results. According to the simulations results, in most cases, in the three approaches when the sample size increased, AB-PE and ASE-PE decreased, and CP got closer to nominal level of 0.95. This finding is consistent with a simulation-based study for a parametric latent class joint model of the longitudinal and survival outcome [2].
Our main finding occurred when there was a considerable association ( = 0.5) between two processes. PA provided lower AB-PE and ASE-PE than JLCM and SA for the three sample sizes; hence, PA yielded unbiased and more efficient estimation of parameters than JLCM and SA for the longitudinal and survival submodels. The results of a similar study are consistent with those of PA for heterogeneous populations [13]. However, PA used the full longitudinal trajectory to connect the longitudinal and survival data, whereas in the similar study, only the shared random effect was used. This study showed that the model worked well in estimating longitudinal and survival parameters in a sample size of 400 and for the considerable association between the two processes.
To the best of our knowledge, no comparison has been made between JLCM and other approaches for the heterogeneous populations. However, to compare with similar studies, we used the ones that had assumed that the subjects exhibited one pattern. For comparison between PA and SA, the results are consistent with previous studies that had conducted simulation-based studies where there was a strong association between the two outcomes. Their results showed that the joint modeling that utilizes information from both outcomes tends to produce almost unbiased estimates and smaller SEs of all the parameters than separate model [8,37,38]. Furthermore, since AB-PE and ASE-PE in JLCM were higher than PA, it seems that JLCM cannot contain the strength of association entirely by latent structures. In addition, the number of latent classes in JLCM could not be estimated directly and for some sample sizes, the appropriate number of classes is selected according to lower BIC, and acceptance of the CI assumption was not consistent with the true size of classes. Therefore, it led to biased estimation of parameters, while PA achieved an appropriate number of latent classes directly with no need for the CI assumption and BIC criterion. In addition, the association parameter for SA was underestimated in comparison with PA. This result concurs with those of a similar study which showed that using the longitudinal outcome as a time-varying covariate into the survival model is not recommended, due to severe underestimation of the association parameter [15].
When there was the moderate correlation ( = 0.2) between the longitudinal and survival processes, PA was preferred over JLCM and SA for the small and moderate sample sizes. In addition, the average misclassification rate for the small sample size in JLCM was high; hence, AB-PE and ASE-PE were increased. Furthermore, for the large sample size, the average misclassification rate for JLCM was low; thus, AB-PE of the longitudinal submodel for JLCM and PA was the same and JLCM was more efficient.
For the case of none-association ( = 0) between the longitudinal and survival processes, results of the longitudinal submodel of PA were similar to SA in the small sample size. Our finding is consistent with a similar study for noneassociation between the longitudinal and survival data [39]. Also, PA was more efficient than SA in the moderate and large sample sizes. For the small and moderate sample sizes, the results of the effect of the treatment on the time-to-event of PA and SA were found to be similar. This result is consistent with similar studies that had shown when there was no association between the longitudinal and survival data; the longitudinal information did not improve the estimation of the treatment effect on the survival outcome [12,39]. Moreover, JLCM was unbiased and more efficient than the other two approaches in the large sample size. Computationally, JLCM was faster and easier than the time consuming PA. In addition, for the large sample size, the misclassification rate was the lowest; hence, the entire association between the longitudinal and survival outcomes can be considered with the latent structure. Therefore, in this case, JLCM was more desirable than the two other approaches.
We believe that our PA can address the heterogeneity and consider the association structure behind the longitudinal and survival processes. One of the advantages of using PA was its capability to reduce AB-PE and ASE-PE by increasing the sample size and intensity of the association parameter. However, it leads to increased computation and time required to fit the model, which is one of the disadvantages of PA.
Finally, this study had some limitations that have to be addressed. First, in this study, we used the same magnitude parameters with opposite direction to consider the two heterogeneous classes. Second, association parameters for the two classes were the same. Third, this study was limited to two latent classes and continuous longitudinal and single event data. Further researches have to use PA with various options for the survival and longitudinal processes such as a nonlinear mixed model for the longitudinal data and a parametric or recurrent survival model. Moreover, we used ML estimation, while the Bayesian inference can be an alternative approach for estimation of parameters. Also, further simulation studies can be performed to evaluate statistical properties for PA including the statistical power.

Discussions about HIV/AIDS Results.
The results showed that Hgb was a significant covariate in classifying subjects via latent class framework, concurring with the results of a study on this dataset [13].
According to the results of the application, the time effect was significant in each class for CD4 longitudinal outcome in PA and JLCM. This study produced results which corroborate the findings of a great deal of the previous works that used this dataset [3,13]. There were no statistically significant differences between the two treatments (ddC and ddI), on CD4 longitudinal outcome in the two latent classes in the three approaches. This result agrees with the findings of similar studies that had investigated the effect of the treatment on CD4 longitudinal outcome [11,13]. In addition, Hgb had a significant positive effect on CD4 values in both classes in PA and SA. This result matches those observed in earlier studies on this dataset [13].
As for the survival submodel, the treatment was a significant factor on time-to-death in the second class in the three approaches. Patients in the second class had a better survival rate when given ddC. Furthermore, Hgb was not a significant factor of the death rate in the two classes for PA and SA. This finding is consistent with a similar study where Hgb was imported into the model [13]. In JLCM, Hgb was a significant factor of death rate in the first class but did not have a significant effect in the second class.
The estimated association parameters ( 1 and 2 ) between CD4 and time-to-death were negative and significant in both classes for PA and SA. This implies that a higher CD4 count is associated with a lower death rate or a reduced number of CD4 significantly increases the risk of death in patients [10,40].
Overall, the results of PA in this study confirm those of the previous studies on this dataset and with the biomedical literature [11,13,14,18]. Moreover, PA and SA had the same ASE-PE approximately for the longitudinal submodel that are consequently more efficient than JLCM. In addition, PA had lower ASE-PE for the survival submodel; hence, PA is more efficient than the other two approaches for indicating the influence covariates on time-to-death in patients with HIV/AIDS. The results of the three approaches on CPCRA data confirm our result in the simulation study when there was a considerable association parameter between the longitudinal outcome and time-to-event in the large sample size.
The application study on CPCRA data shows the advantages of our PA. Therefore, by using appropriate latent class joint model, we can assign treatment ddC to patients with a higher chance of being classified into the second class based on their baseline hemoglobin (Hgb), thereby increasing the survival rate. In other cases, when the treatments have side effects, we could utilize an appropriate latent class joint modeling to identify a subgroup of patients that are most likely to have side effects. Hence, we can assign treatments in a personalized manner to avoid such subgroup, which can further benefit the patients.

Conclusion
This simulation-based study provided an approach for the joint model, by considering the association between the longitudinal and time-to-event data for heterogeneous