Joint Modelling Approaches to Survival Analysis via Likelihood-Based Boosting Techniques

Joint models are a powerful class of statistical models applicable whenever event times are recorded alongside a longitudinal outcome: they connect the longitudinal and time-to-event data within a joint likelihood, allowing the association between the two outcomes to be quantified without the bias incurred by separate modelling. To make joint models feasible for regularization and variable selection, a statistical boosting algorithm has been proposed that fits joint models using component-wise gradient boosting techniques. However, these methods have well-known limitations: they provide no balanced updating procedure for the random effects in the longitudinal part and tend to return biased effect estimates for time-dependent covariates in the survival part. In this manuscript, we adapt likelihood-based boosting techniques to the framework of joint models and propose a novel algorithm to improve inference where gradient boosting exhibits these limitations. The algorithm represents a novel boosting approach allowing for time-dependent covariates in survival analysis and, in addition, offers variable selection for joint models; it is evaluated via simulations and a real-world application modelling CD4 cell counts of patients infected with the human immunodeficiency virus (HIV). Overall, the method stands out with respect to its variable selection properties and represents an accessible approach to boosting with time-dependent covariates in survival analysis, laying a foundation for a variety of possible extensions.


Introduction
First suggested by [1], joint models were established as a valuable tool for analysing data where event times are measured alongside a longitudinal outcome. One naive approach to evaluating such frequently occurring data structures would be separate modelling, i.e., fitting appropriate models independently for the longitudinal and the time-to-event data. However, separate modelling neither corrects for event-dependent dropout in longitudinal analysis nor quantifies the relation between a time-dependent covariate and the risk for an event in survival analysis [2]. Various approaches have been proposed to address these issues, among them the Andersen-Gill model [3] for time-varying covariates in survival analysis and two-stage approaches, where longitudinal model fits are included as fixed covariates in the time-to-event regression. It has been shown, however, that these methods tend to produce biased results [4,5]. One solution is therefore to combine the survival and longitudinal models within a single joint likelihood. A broad introduction to this joint modelling framework, including the JM package [6], is given in [4]. Moreover, an overview of the development of joint models up to 2004 is provided in [7], and in addition, several Bayesian approaches have been proposed [8][9][10].
Current estimation methods for joint models, however, lack clear concepts for proper variable selection and good predictive performance. Moreover, they are not feasible for high-dimensional data, in particular where the number of covariates exceeds the number of observations, i.e., p > n problems. To overcome these hindrances, an algorithm was initially proposed in which joint models are fitted with gradient boosting techniques, which are known to address exactly these issues [11]. Having evolved from a machine learning approach to classification problems originally proposed in [12], gradient boosting handles high-dimensional data, and its component-wise updating scheme offers implicit variable selection. The boosting algorithm for joint models was later extended [13], but when more focus is laid on the survival side of the model, gradient boosting proves to struggle with time-varying covariates in time-to-event analysis. This has also been observed for pure survival models in [14].
Hence, this work focuses on likelihood-based boosting. First introduced in [15], likelihood-based boosting is designed to directly maximize a given likelihood; concepts which gradient boosting offers implicitly are reproduced artificially for regular optimization methods like Newton algorithms or Fisher scoring. The method was further developed for flexible semiparametric mixed models [16] and for several classes of generalised mixed models [17][18][19][20]. The R package GMMBoost [21] covers most of these approaches. Likelihood-based boosting has also proved useful for survival analysis with time-varying effects [22], and a general overview is given in [23]. Since the random structure plays an important role as a connector between longitudinal and time-to-event data, we additionally incorporate a novel correction step within the estimation procedure for the random effects, which was first suggested in [24,25] and reduces possible bias arising from wrongly identified random effects. The contribution of this work is the novel lbbJM boosting algorithm for joint models, which offers the first boosting-based regularization approach for time-dependent covariates in survival analysis and, in addition, new variable selection mechanics for joint models with a focus on time-to-event analysis.
The remainder of this manuscript is structured as follows: Section 2 highlights the overall concepts of both joint modelling and likelihood-based boosting to give a sufficient understanding of the methods used in the following parts. Section 3 then contains a detailed description of the considered joint model together with the proposed boosting algorithm and its computational details. Sections 4 and 5 deal with applying the algorithm to different setups of simulated data as well as to the AIDS dataset [26] included in the JM package. Results and possible extensions are discussed in the final section.

Background
Before the algorithm is presented and discussed in detail, we briefly highlight the concepts of both joint modelling and likelihood-based boosting.
2.1. Joint Models. In general, a joint model consists of two parts, one longitudinal and one survival submodel. A popular view on joint modelling is to choose one model as the main model, while the other model then supports the analysis carried out in the main model. With the primary outcome being longitudinal data, a survival model can be used to correct for event-dependent dropout in the longitudinal analysis. With time-to-event data as the outcome of interest, additional longitudinal modelling reduces measurement error on the one hand and, on the other hand, extrapolates longitudinal data observed only at single time points to continuous functions, which are then included in the survival analysis. From now on, we focus on joint models with time-to-event data as the primary outcome.
The longitudinal submodel takes the form

y(t) = η_l(t, x) + ε(t),

where the longitudinal outcome y is described by the longitudinal predictor function η_l depending on time t and a set of covariates x. Although t could be subsumed in x, we highlight it separately in the context of joint models, as the role of t is of greater importance. In the survival submodel, the hazard is modelled as

λ(t) = λ_0(t) exp(η_s(x) + α η_l(t, x)),

with a baseline hazard λ_0(t) and multiplicative effects described by the survival predictor function η_s. In addition, the longitudinal predictor η_l reappears in the survival model, this time scaled by a factor α. The parameter α thus quantifies the association of the two submodels and is therefore called the association parameter. It can be interpreted as the impact a time-varying longitudinal covariate has on the hazard for an event.
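As a concrete illustration of the role of α, the hazard specification can be written as a one-line function. This is our own hypothetical sketch, not part of the original method; `lam0` and `eta_l` are assumed to be supplied as functions of time:

```python
import math

def hazard(t, lam0, eta_s, alpha, eta_l):
    """Joint-model hazard: lambda(t) = lam0(t) * exp(eta_s + alpha * eta_l(t)).
    lam0 and eta_l are callables of time; eta_s is the linear survival predictor."""
    return lam0(t) * math.exp(eta_s + alpha * eta_l(t))
```

In particular, a one-unit increase of the longitudinal predictor η_l(t) multiplies the hazard by the factor exp(α), which is exactly the interpretation of the association parameter given above.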
Parameter estimation for such joint models can be done in various ways. Two common methods are two-stage and joint likelihood approaches, respectively. In the former, the longitudinal model is estimated with the estimation method of choice, leading to the model fit η̂_l, which is then carried forward as a fixed covariate into the survival analysis. In the latter, longitudinal and survival submodels are combined in a single joint likelihood. Let i = 1, ⋯, n denote clusters and j = 1, ⋯, n_i the repeated measurements. Assuming independent data generating processes for both submodels, the joint likelihood can then be written as

L(ϑ) = ∏_{i=1}^n [ ∏_{j=1}^{n_i} f_l(y_ij | ϑ) ] f_s(T_i, δ_i | ϑ),    (3)

with densities f_l and f_s for the longitudinal and survival submodels and time-to-event outcome (T, δ) = (T_i, δ_i)_{i=1,⋯,n}. Regular inference is then done by maximizing (3) using appropriate maximization methods, the most prominent being EM algorithms.

2.2. Likelihood-Based Boosting.
We intend to give a short description of the underlying mechanics used in the following sections. The overall concept of likelihood-based boosting is an iterative, component-wise updating scheme which would eventually converge to the maximum likelihood estimator but is stopped early in order to prevent overfitting. Let β model the effects of p covariates. Likelihood-based boosting maximizes a given log-likelihood ℓ(β) by component-wise Fisher scoring in the following way: for each covariate r ∈ {1, ⋯, p}, consider the subvector β_r containing only the coefficients referring to the rth covariate. We compute the score vector and Fisher matrix

s_r(β_r) = ∂ℓ/∂β_r,   F_r(β_r) = −E[∂²ℓ/∂β_r ∂β_r^T],

and obtain a possible update u_r = F_r(β_r)^{-1} s_r(β_r) for the rth component. Now, we determine the best performing covariate with respect to likelihood maximization, i.e., find the component

r* = argmax_{r ∈ {1,⋯,p}} ℓ(β + u_r),

where β + u_r denotes updating only the rth component and the corresponding update yields the biggest improvement of the likelihood. One receives a new model fit by weakly updating this best performing component, i.e., by scaling with a factor ν, the so-called step length:

β_{r*} ← β_{r*} + ν u_{r*}.

The step length ν controls the weakness of the update to prevent overfitting and to give every covariate a chance of being selected. A popular choice in the literature is ν = 0.1. Repeating this updating process for a sufficiently large number of iterations leads to the regular maximum likelihood estimator for β. Instead, the algorithm is stopped early to gain better prediction performance and variable selection. The optimal number of iterations is in fact a tuning parameter of the method and can be determined via cross-validation or via information criteria like AIC or BIC [27,28].
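For a Gaussian log-likelihood, the scheme above reduces to repeated component-wise least-squares corrections. The following sketch is our own illustration in Python with hypothetical names; the score `s` and Fisher information `F` are the Gaussian special cases of the general formulas:

```python
def loglik(beta, X, y):
    """Gaussian log-likelihood up to constants: -0.5 * residual sum of squares."""
    return -0.5 * sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
                      for row, yi in zip(X, y))

def boost(X, y, m_stop=100, nu=0.1):
    """Component-wise likelihood-based boosting for a linear model."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(m_stop):
        best_r, best_ll, best_u = None, -float("inf"), 0.0
        for r in range(p):
            # residuals under the current fit
            resid = [yi - sum(b * xi for b, xi in zip(beta, row))
                     for row, yi in zip(X, y)]
            s = sum(row[r] * e for row, e in zip(X, resid))  # score for component r
            F = sum(row[r] ** 2 for row in X)                # Fisher information
            u = s / F                                        # one Fisher-scoring step
            cand = beta[:]
            cand[r] += u
            ll = loglik(cand, X, y)                          # likelihood after full update
            if ll > best_ll:
                best_r, best_ll, best_u = r, ll, u
        beta[best_r] += nu * best_u                          # weak update of best component
    return beta
```

Run long enough, the loop converges to the maximum likelihood (here: least squares) solution; stopping it early yields shrunken estimates and implicit variable selection, exactly the behaviour exploited by the lbbJM algorithm below.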

Boosting Joint Models
3.1. The Model. Before we introduce the boosting algorithm, we describe the specific joint model. With i = 1, ⋯, n denoting the individual and j = 1, ⋯, n_i a single specific measurement, the longitudinal submodel is given by

y_ij = η_l(t_ij, x_li) + ε_ij = β_0 + β_t t_ij + β_l^T x_li + γ_0i + γ_ti t_ij + ε_ij,    (8)

where y_ij is modelled by η_l depending on the specific measurement times t_ij and longitudinal time-independent covariates x_li ∈ ℝ^{p_l}. This represents a standard linear mixed model with intercept β_0, fixed linear effects β_t and β_l of time and baseline covariates, as well as individual-specific random effects γ_0i and γ_ti with (γ_0i, γ_ti)^T ~ N(0, Q) independently across individuals. The error terms ε_ij are assumed to follow a normal distribution with E[ε_ij] = 0 and Var(ε_ij) = σ² > 0.
Please note that the model can additionally be extended by interaction effects of time t_ij with baseline covariates x_ti ∈ ℝ^{p_t} by including the term β_t^T x_ti t_ij in (8). This results in slightly adjusted integrals in the survival part and is omitted in the following for the sake of readability.
In the survival part, the individual hazard is modelled as

λ_i(t) = λ_0(t) exp(η_s(x_si) + α η_l(t, x_li)),    (9)

with the survival predictor η_s(x_si) = β_s^T x_si containing additional linear effects β_s of baseline covariates x_si ∈ ℝ^{p_s}. To allow for a full likelihood approach, the baseline hazard is chosen to be piecewise constant, depending on the number of segments K and their exact locations I_k = [t_{k−1}, t_k) for k = 1, ⋯, K, with t_0 = 0 and t_K = max(T). The collection of values for the baseline hazard is denoted by λ = (λ_k)_{k=1,⋯,K}. Later, we will choose K between 7 and 10 in order to guarantee substantially more flexibility than a constant baseline hazard without becoming computationally too demanding. Given the two formulas (8) and (9), we can now state the joint log-likelihood. Let y = (y_ij)_{i=1,⋯,n; j=1,⋯,n_i} denote the collection of all longitudinal measurements. Assuming the time-to-event process is conditionally independent of the longitudinal random structure, the joint likelihood decomposes into a longitudinal and a survival term. Furthermore, τ contains the information on the variance-covariance components σ² and Q. The unpenalized longitudinal log-likelihood is

ℓ_l(β, σ²) = Σ_{i=1}^n Σ_{j=1}^{n_i} log ϕ(y_ij | η_l(t_ij, x_li), σ²),

where ϕ(·|m, v) denotes the density of a normal distribution with mean m and variance v. A Laplace approximation following [29] then leads to an additional quadratic penalty term for the random effects, yielding the penalized log-likelihood

ℓ_pen(β, τ) = ℓ_l(β, σ²) − (1/2) Σ_{i=1}^n γ_i^T Q^{-1} γ_i,   γ_i = (γ_0i, γ_ti)^T.

Note that for the penalized log-likelihood, τ substitutes σ² as an argument, since the penalized log-likelihood additionally contains the information of the variance matrix Q.
On the other hand, for the given survival outcome (T, δ) = (T_i, δ_i)_{i=1,⋯,n} with event times T_i and censoring indicators δ_i, the survival log-likelihood takes the well-known form

ℓ_s(β_s, α, λ) = Σ_{i=1}^n [ δ_i (log λ_0(T_i) + η_s(x_si) + α η_l(T_i, x_li)) − ∫_0^{T_i} λ_0(u) exp(η_s(x_si) + α η_l(u, x_li)) du ].

3.2. The Algorithm. The following lbbJM algorithm (likelihood-based boosting for joint models) describes a way to fit the formulated joint model by the likelihood-based boosting methods discussed in Section 2.2:

(i) Longitudinal part: for m = 1 to m_l do
step 1: Update the fixed effects β_0, β_t, and β_l component-wise via Fisher scoring on the penalized log-likelihood ℓ_pen.
step 2: Update the random intercepts and slopes in an additional Fisher scoring step on the penalized log-likelihood ℓ_pen.
step 3: Update the variance-covariance components as fixed values.
end for
(ii) Survival part: for m = 1 to m_s do
Update survival effects: compute the score vector and Fisher matrix s_r(β_r) = ∂ℓ_pen/∂β_r, F_r(β_r) = −E[∂²ℓ_pen/∂β_r ∂β_r^T] with respect to the current baseline hazard λ̂^[m−1] and the rth linear effect β̂_r^[m−1] or α̂^[m−1], respectively. Obtain p_s + 1 possible updates and find the best performing effect * ∈ {1, ⋯, p_s + 1} according to Section 2.2, yielding the update u_* = (u_λ^T, u_*)^T containing the update for the effect * with the corresponding baseline hazard update.
end for
(iii) Determine the best performing tuple (m_{*,l}, m_{*,s}) with respect to prediction based on the unpenalized joint log-likelihood ℓ, as explained in more detail in Section 3.3.
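With a piecewise-constant baseline hazard, the integral in the survival log-likelihood reduces to a finite sum over segments. The following sketch illustrates this structure under the simplifying assumption of time-constant predictors η_i (in the joint model, the time-varying term α η_l(u, x_li) enters the integrand as well, so the actual computation is more involved); all names are ours:

```python
import math

def cum_baseline(t, knots, lam):
    """Integral of a piecewise-constant baseline hazard on [0, t].
    knots = [t_0 = 0, t_1, ..., t_K]; lam[k] applies on [t_k, t_{k+1})."""
    total = 0.0
    for k in range(len(lam)):
        lo, hi = knots[k], knots[k + 1]
        total += lam[k] * max(0.0, min(t, hi) - lo)  # overlap of [0, t] with segment k
    return total

def surv_loglik(times, status, eta, knots, lam):
    """Full survival log-likelihood for time-constant linear predictors eta_i."""
    def lam_at(t):
        for k in range(len(lam)):
            if knots[k] <= t < knots[k + 1]:
                return lam[k]
        return lam[-1]  # t at or beyond the last knot
    ll = 0.0
    for T, d, e in zip(times, status, eta):
        ll += d * (math.log(lam_at(T)) + e) - math.exp(e) * cum_baseline(T, knots, lam)
    return ll
```

The cumulative hazard is simply a weighted sum of segment lengths, which is why the piecewise-constant choice keeps the full likelihood approach tractable.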

3.3. Computational Details of the Algorithm.
In general, the algorithm carries out a hybrid between a two-stage and a joint likelihood approach. For one single tuple (m_l, m_s), the fitting procedure goes as follows: in a first step, the longitudinal submodel is boosted for m_l iterations using the lbbLMM boosting algorithm [25]. The received estimates are carried forward into the survival model, where another boosting process of m_s iterations takes place. This fitting process is carried out for all tuples (m_l, m_s) with m_l < m_max,l and m_s < m_max,s, where m_max,l and m_max,s are prespecified maximum numbers of iterations per submodel. For each of these possible combinations of stopping iterations, the corresponding estimates are evaluated based on the joint likelihood using test data, which can be achieved via cross-validation or bootstrapping. Hence, the algorithm uses two-stage fitting but joint likelihood evaluation. We give a detailed description of both parts. Exact formulas for all appearing variants of score vectors and Fisher matrices can be found in the supplementary material. Please note that due to the component-wise updating scheme in both submodels, the lbbJM algorithm works with arbitrarily high numbers of candidate variables and is therefore not confined to low-dimensional data structures.
For the starting values, the parameters which actually underlie the boosting process, i.e., the linear effects β_l and β_s as well as the association parameter α, are necessarily set to zero. The remaining values are extracted from a standard linear mixed model for time and random effects, using, e.g., the function lme from the R package nlme. For boosting the longitudinal fixed effects, we omit the iteration index as well as the hat indicating estimated values in the following subsections for convenience. In the first step of the longitudinal part, the effects β_l follow the classical component-wise likelihood-based boosting procedure. In each iteration, an update for every single covariate r together with the intercept β_0 and the time effect β_t is computed, leading to p_l three-dimensional updates. The best performing component is then selected either as the one yielding the greatest likelihood improvement or as the one with the lowest value of an information criterion like AIC or BIC, which penalizes model complexity rather than merely minimizing residuals. The linear effect β_t is excluded from the selection process, as it holds a very important role in a joint model and should always be included.
For updating the random effects: after the best performing fixed effect from β_l has been updated, an update for the random effects is executed separately in every iteration. This means that the corresponding score vector and Fisher matrix have to be derived in order to execute this update, which involves a correction matrix C preventing potential correlations between the random effects estimates and any observed covariates [25]; its derivation is traced in more detail in the supplementary material.
For updating the variance-covariance components, the covariance matrix Q of the random effects is updated with an approximate EM algorithm using the posterior curvatures F_ii of the random effects model [30], which are combined with the current random effects estimates to form the update. The current longitudinal model error is obtained by setting σ̂² = Var(y − η̂_l).
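A common approximate EM update of this type averages the outer products of the current random effects estimates and their posterior covariances, i.e., Q̂ = n⁻¹ Σ_i (γ̂_i γ̂_i^T + F_ii⁻¹). Note that this exact formula is our assumption for illustration, as the manuscript defers the precise expressions to the supplementary material. A minimal sketch for the 2×2 case of random intercepts and slopes:

```python
def update_Q(gammas, post_covs):
    """Approximate EM update for the 2x2 random-effects covariance Q:
    Q = (1/n) * sum_i (gamma_i gamma_i^T + V_i),
    where gamma_i = (gamma_0i, gamma_ti) is the current estimate for cluster i
    and V_i is its posterior covariance, i.e. the inverse posterior curvature
    (assumed formula; the paper's exact expressions live in its supplement)."""
    n = len(gammas)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for g, V in zip(gammas, post_covs):
        for a in range(2):
            for b in range(2):
                Q[a][b] += (g[a] * g[b] + V[a][b]) / n
    return Q
```

Treating the resulting Q̂ (and σ̂²) as fixed values in the next boosting iteration is what makes this an approximate, rather than a full, EM step.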
For boosting the association parameter and the survival effects: once the longitudinal part has been updated for up to m_l iterations, the algorithm proceeds to boost the effects β_s and α. Although of a different structure, the association parameter α is boosted alongside the effects β_s, meaning the algorithm decides whether the association parameter or some baseline survival effect is updated, based on which parameter leads to the greatest likelihood improvement. This means the linear effect of the whole longitudinal trajectory is also scaled by the step length ν when it is updated within the selection step, which minimizes the chance of overfitting for the association parameter as well. An alternative method would be to choose only among the effects in β_s and to update α in an additional step by optimizing the current likelihood. This approach was used in [11] and treats α as a nuisance parameter, which might not be satisfactory with regard to the importance of α. Again, only the update for α and β_s is scaled by the step length ν_s; the baseline hazard λ receives a full update.
Regarding the step lengths: apart from the stopping iterations, the step lengths are tuning parameters of the boosting algorithm. Although there has been some recent effort towards adaptive step lengths, we set both step lengths to the constant value ν_l = ν_s = 0.1. The exact choice of the step length is of minor importance as long as it is sufficiently small to ensure proper performance, and setting it to 0.1 is an established choice in the boosting literature [31,32].
Regarding the stopping iterations: since the step lengths are chosen to be constant, the tuple (m_l, m_s) is the main tuning parameter of the boosting algorithm. In regular boosting with only one iteration index, it is convenient to check every single iteration and take the estimates from the iteration count leading to the best prediction. In the present two-dimensional case, this means finding

(m_{*,l}, m_{*,s}) = argmax_{(m_l, m_s) ∈ M} ℓ(ϑ̂^[m_l,m_s]; X_test),    (19)

with M ≔ {1, ⋯, m_max,l} × {1, ⋯, m_max,s}, ϑ̂^[m_l,m_s] denoting the vector of estimates received for the tuple (m_l, m_s) of total iterations, and X_test a complete set of test data for evaluation. Problem (19) is then solved via k-fold cross-validation. Since checking every single tuple (m_l, m_s) ∈ M implies a very high computational effort, we suggest coarsening the grid and checking fewer possible stopping iterations in the longitudinal part, e.g., m_l ∈ {10, 20, 30, ⋯, m_max,l}. Because of the two-stage nature of the algorithm, we can still check every single m_s ∈ {1, ⋯, m_max,s} without additional computational effort. Furthermore, parallel computing can be employed to reduce the computational demand.
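The coarsened grid search can be organized as a simple double loop. The sketch below is our own illustration, with `eval_ll` standing in for the cross-validated test log-likelihood of a fit stopped at (m_l, m_s):

```python
def select_stopping(eval_ll, m_max_l, m_max_s, coarsen=10):
    """Search a coarsened grid of stopping iterations.
    eval_ll(m_l, m_s) returns the test-set joint log-likelihood of a model
    boosted for m_l longitudinal and m_s survival iterations (hypothetical
    callback standing in for the cross-validated fit)."""
    best, best_ll = None, -float("inf")
    for m_l in range(coarsen, m_max_l + 1, coarsen):  # coarsened: 10, 20, 30, ...
        for m_s in range(1, m_max_s + 1):             # every survival iteration
            ll = eval_ll(m_l, m_s)
            if ll > best_ll:
                best, best_ll = (m_l, m_s), ll
    return best
```

Because the survival boosting restarts from the m_l-iteration longitudinal fit, all m_s values along one row of the grid come from a single boosting run, which is why only the longitudinal axis needs coarsening.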

Simulations
We evaluate the lbbJM algorithm in a simulation study. The aim is to assess estimation and shrinkage characteristics in general as well as variable selection properties and performance in high-dimensional, i.e., p > n, settings. The lbbJM algorithm is included in two variants: lbbJM_a executes the full approach as depicted in Section 3.2, while lbbJM_b performs a shortened two-stage procedure where the longitudinal submodel is fitted in advance using regular maximum likelihood inference and is not subject to any regularization. The exact lbbJM_b algorithm is depicted in detail in the supplementary material. We additionally include the JM package as the state of the art for convenient estimation of joint models, as well as the glmnet package, which offers elastic net penalization for start-stop data and therefore an alternative approach to regularization of time-dependent covariates in survival analysis. None of the competitors is completely suitable for a benchmark comparison; they are viewed as reference points for the specific objectives addressed by the lbbJM algorithm. Regarding glmnet, as an alternative approach to regularization of time-dependent covariates, its shrinkage and variable selection properties are of interest, although it focuses solely on survival analysis. JM in addition offers unregularized effect estimates with corresponding significance indicators but is neither suitable for high-dimensional setups nor offers variable selection.
4.1. Setup. The simulations are executed according to the model described in Section 3.1 with n ∈ {100, 500} and n_i = 5, using inversion sampling, which is explained in detail in the supplementary material. The true parameter values are prespecified, with variance components σ = 0.1 and random-effects covariance matrix Q. The entries of the covariate vectors x_li and x_si are drawn independently from the standard normal distribution N(0, 1). In addition to the informative covariates with effects β_l and β_s, the covariate vectors x_li and x_si are expanded with noninformative noise variables until the chosen total dimensions p_l and p_s are reached. These additional noise variables are included to evaluate the variable selection properties and the robustness of the approach in case of a misspecified model. The baseline hazard is chosen as λ_0(t) = 2.5 t^1.5, and given the censoring mechanism described in the supplementary material, the chosen parameter values result in an average censoring rate of ≈50%.
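For the stated baseline hazard λ_0(t) = 2.5 t^1.5, the cumulative baseline hazard is Λ_0(t) = t^2.5, so inversion sampling has a closed form whenever the linear predictor is constant in time. With the time-varying η_l of the full model, the inversion has to be carried out numerically, as deferred to the supplementary material; the following closed-form special case is therefore only our hypothetical sketch:

```python
import math
import random

def sample_event_time(eta, u=None, rng=random):
    """Inversion sampling for the hazard lambda(t) = 2.5 * t**1.5 * exp(eta)
    with a time-constant linear predictor eta. The survival function is
    S(t) = exp(-exp(eta) * t**2.5); solving S(T) = U for U ~ Unif(0, 1)
    gives T = (-log(U) / exp(eta)) ** (1 / 2.5)."""
    if u is None:
        u = rng.random()  # draw U ~ Unif(0, 1) if none supplied
    return (-math.log(u) / math.exp(eta)) ** (1.0 / 2.5)
```

Censoring is then imposed separately, e.g., by drawing an independent censoring time and taking the minimum of the two.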
Overall, we consider two scenarios: one low-dimensional setup with n = 500 and p_l = p_s = 9, mimicking a more common data structure, and one high-dimensional setup, where the number of covariates included in the survival submodel exceeds the number of individuals, so that conventional approaches like JM fail to return results.
For the computation, JM and lbbJM use K = 10 with equidistant knot placement. The grid M ≔ {25, 50, ⋯, 500} × {1, 2, ⋯, 1000} is specified for possible tuples of stopping iterations, and the optimal regularization parameter for glmnet is determined by the function cv.glmnet() based on 10-fold cross-validation.

4.2. Results.
Since the compared estimation routines follow different approaches targeting various objectives, from regular maximum likelihood estimation in joint models to regularization in pure time-to-event analysis, we focus on plain coefficient estimates averaged over 100 independent simulation runs in order to assess estimation characteristics. Variable selection properties are evaluated by considering the share of true positives (TP) and the false discovery rate (FDR).
Low-dimensional setup (n = 500, p_s = 9): Table 1 depicts the results for the low-dimensional setup. In the longitudinal submodel, lbbJM_a applies shrinkage and therefore offers variable selection with a rather low false discovery rate of 0.23 while still selecting all informative variables. The time effect β_t receives comparatively high shrinkage, since time, as a cluster-varying variable, adds more information to the model. Overall, the longitudinal submodel is boosted for 108.25 iterations on average, yielding rather strongly shrunk coefficient estimates. Please note that the results for lbbJM_b are simply obtained by lme() and are very similar to JM. In the survival part, both boosting approaches substantially outperform glmnet in terms of variable selection while again also receiving more shrinkage. Due to the comparatively rough baseline hazard depicted in Figure 1, all full likelihood approaches, i.e., JM and lbbJM, receive additional shrinkage which is unaffected by possible regularization. As glmnet uses the partial likelihood, no estimates for the baseline hazard function are available, and the small elastic net penalty of λ* = 0.003 on average also results in weaker performance regarding the rate of false positives. Overall, the lbbJM approaches yield satisfactory results regarding both regularization and variable selection. The effect estimates clearly reflect the true values approximately recovered by JM while simultaneously receiving sufficiently large shrinkage for decent performance in identifying influential covariates in both the longitudinal and survival submodels.
High-dimensional setup (n = 100, p_s = 100): Table 2 depicts the results for the high-dimensional setup. As expected, estimates in the high-dimensional setup are regularized more strongly and therefore experience more shrinkage. Again, the boosting approaches contained in lbbJM yield better variable selection properties and slightly more regularized coefficient estimates, although the results seem to align with increasing dimensions. Note that JM is not capable of handling high-dimensional data structures and is therefore not included in the high-dimensional setup at all.
Computational effort: Table 3 shows estimates of the elapsed computation time for each routine. Times are measured in seconds and depict the computation time for one single model fit, which was carried out on a 2 × 2.66 GHz 6-core Intel Xeon CPU (64 GB RAM). As expected, the full boosting approach executed in lbbJM_a comes with high computational costs similar to [11]. Note that the runtimes are higher in the low-dimensional scenario, as the overall number of clusters is higher (n = 500), leading to an increased burden in the already time-consuming longitudinal boosting procedure. The reduced approach lbbJM_b, however, runs considerably faster and is therefore preferable as long as the research focus lies solely on the time-to-event analysis.

Application
We showcase the lbbJM algorithm by applying it to the 1994 AIDS data [26]. The study aimed at comparing the two antiretroviral drugs didanosine (ddI) and zalcitabine (ddC) based on a collective of 467 patients infected with the human immunodeficiency virus (HIV) who were either intolerant to or had failed a previous treatment with zidovudine (AZT). Alongside several baseline covariates, the square root of the CD4 cell count was recorded at study entry and in multiple follow-ups after 2, 6, 12, and 18 months, respectively. The CD4 cells are attacked by the virus and thus decrease over time for infected patients; hence, they are a widely used surrogate for disease progression. In addition to the longitudinal outcome, 188 patients died during the study period, leading to 188 observed and 279 censored events. The structure of the data is depicted in Table 4.
We formulate the joint model

CD4_i(t) = η_l,i(t) + ε_i(t) = β_0 + β_t t + β_d drug_i + β_td (t · drug_i) + γ_0i + γ_ti t + ε_i(t),
λ_i(t) = λ_0(t) exp(β_1 drug_i + β_2 gender_i + β_3 AZT_i + β_4 prevOI_i + α η_l,i(t)).    (22)

The CD4 cell count CD4_i(t) for i = 1, ⋯, 467 is described by a linear mixed model with random intercepts, random slopes for time t, and linear effects of time, drug, and an additional interaction between time and drug. The true underlying profile of the CD4 cell count, i.e., the one modelled by η_l(t), is then included together with the remaining baseline covariates in the Cox full likelihood model; thus, the model error ε(t) is eliminated. Here, drug is a dummy for ddI, gender for female gender, AZT for failure of the zidovudine therapy, and prevOI for prevalence of AIDS. The number of segments for the baseline hazard was chosen as K = 7.
We fit model (22) with the same methods as already used in the simulation study. The tuning parameters of the boosting algorithms were chosen as ν_l = ν_s = 0.1 for the step lengths and M_a ≔ {5, 10, ⋯, 100} × {1, 2, ⋯, 250} for the grid of possible tuples of stopping iterations for lbbJM_a and M_b ≔ {1, 2, ⋯, 250} for lbbJM_b. Again, glmnet was tuned using the cv.glmnet() function, and all regularization approaches are based on 10-fold cross-validation.
For lbbJM_a, m_{*,l} = 10 and m_{*,s} = 33 formed the best performing tuple of stopping iterations. The two-stage approach lbbJM_b used m_{*,s} = 40, and the optimal penalization parameter for glmnet turned out to be λ*_pen = 0.0014. The corresponding coefficient estimates are shown in Table 5. Overall, the results reflect what was already observed in the simulation study. While glmnet shows quite conservative shrinkage properties, with every variable included in the final model, the lbbJM approaches tend to stop rather early, yielding stronger shrinkage and, in addition, effects which were not selected at all. In order to give some point of reference regarding variable selection properties, the p values computed by JM are included in Table 5. The selected effects align quite nicely with significance according to JM. While lbbJM_a only includes the variable drug, with only half the effect size, lbbJM_b additionally sees a quite tiny impact of female gender.
The corresponding coefficient progressions for the survival submodel are visualized in Figure 2. Both algorithms update the coefficients referring to the longitudinal CD4 cell profile and the variable prevOI right away and therefore see a strong connection between the risk of death and the CD4 cell count as well as whether or not a patient has AIDS. Due to early stopping, lbbJM_a selects neither of the two remaining covariates into the final model. lbbJM_b includes one more variable, gender, which however has only a very small effect.

Outlook and Discussion
Overall, the lbbJM algorithm introduces a novel boosting-based regularization scheme for joint models focusing on survival analysis as well as for Cox models with time-dependent covariates. The method fits in well among alternative routines and especially stands out with respect to its variable selection properties. Due to its clear advantage regarding computational effort, as depicted in Table 3, lbbJM_b is the preferred routine when the research interest clearly lies on the time-to-event analysis, whereas lbbJM_a is capable of regularizing both submodels simultaneously. Besides the good results regarding variable selection, it is also expected that the proposed boosting methods can improve the predictive power of a joint model, since boosting algorithms, due to their model tuning based on test errors, are primarily used for prediction. A thorough investigation of improving and evaluating the prediction performance of a joint model by several boosting techniques remains an interesting task.
Still, the presented foundation is of a comparatively simple nature, and possible extensions include more flexible modelling, e.g., based on P-splines, which allow smooth effects of time-dependent as well as time-independent covariates and can additionally include time-varying effects as well as possibly time-varying association structures. As the current algorithm is confined to a time-constant association parameter α, an extension to α(t) would increase the flexibility of the model. While it is usually difficult to disentangle a time-dependent association parameter from the longitudinal trajectory during parameter estimation, the lbbJM algorithm could potentially avoid these identification issues due to its clear separation into a longitudinal and a survival boosting process.
Similar regularization-based approaches have been proved useful for time-to-event settings [22,33], and it can be assumed that the presented method could benefit from these concepts as well.
Moreover, the presented work only lays a foundation for several extensions addressing known issues for both boosting and joint modelling. It represents an accessible way to boosting for time-dependent covariates in survival analysis. Gradient boosting has known limitations in this matter, and although efforts towards a framework overcoming these issues have been made [34], things rather quickly become technical, and the proven robustness of likelihood-based boosting represents a flexible and far more intuitive alternative. Further developments in this direction could include multiple time-dependent covariates based on the two-stage approach of lbbJM_b, where additional effort is necessary in order to ensure a fair competition between time-dependent and time-independent covariates within the selection procedure.
Furthermore, the component-wise updating process is capable of incorporating allocation mechanisms into the algorithm. While the lbbJM algorithm is, due to its two-stage fitting process, fairly robust when estimating models in which one candidate variable is assigned to both submodels, this kind of specification is not advised in general, as identification issues may arise and a proper interpretation of the resulting model might be challenging. However, it is usually tricky to decide whether a covariate should be included in the longitudinal or the survival part of the model, and these decisions are often made using prior knowledge. An allocation routine based on likelihood maximization could therefore greatly improve joint model inference and would additionally eliminate the two-stage nature of the lbbJM algorithm. This would not only decrease the computational burden but also provide a far more flexible algorithm allowing for regularization, variable selection, and allocation.

Data Availability
The R code and data used to support the findings of this article are included within the supplementary information files.

Conflicts of Interest
The authors declare that there is no conflict of interest.