Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risk Data: With Applications to Massive Biobank Data

Semiparametric joint models of longitudinal and competing risk data are computationally costly, and their current implementations do not scale well to massive biobank data. This paper identifies and addresses some key computational barriers in a semiparametric joint model for longitudinal and competing risk survival data. By developing and implementing customized linear scan algorithms, we reduce the computational complexities from O(n^2) or O(n^3) to O(n) in various steps, including numerical integration, risk set calculation, and standard error estimation, where n is the number of subjects. Using both simulated and real-world biobank data, we demonstrate that these linear scan algorithms can speed up existing methods by a factor of up to hundreds of thousands when n > 10^4, often reducing the runtime from days to minutes. We have developed an R package, FastJM, based on the proposed algorithms for joint modeling of longitudinal and competing risk time-to-event data and made it publicly available on the Comprehensive R Archive Network (CRAN).


Introduction
In clinical research and other longitudinal studies, it is common to collect both longitudinal and time-to-event data on each participant, and these two endpoints are often correlated. Joint models of longitudinal and survival data have been widely used to mitigate incorrect estimation and statistical inference associated with separate analysis of each endpoint [1,2]. For instance, in longitudinal data analysis, joint models are often used to handle nonignorable missing data due to a terminal event, which cannot be properly accounted for by a standard mixed effects model or generalized estimating equation (GEE) method that relies on the missing-at-random or missing-completely-at-random assumption [3][4][5]. Joint models are also popularly employed in survival analysis to study the effects of a time-dependent covariate that is measured intermittently or subject to measurement error [6][7][8][9] and for dynamic prediction of an event outcome from the past history of a biomarker [10][11][12][13][14]. Comprehensive reviews of joint models for longitudinal and time-to-event data and their applications can be found in Elashoff et al. [1], Hickey et al. [15], Rizopoulos [16], Sudell et al. [17] and the references therein.
Despite the explosive growth of literature on joint models for longitudinal and time-to-event data during the past three decades, efficient implementation of joint models has lagged behind, which limits their application to small to moderate studies. Recently, massive data collected from electronic health records (EHRs) and insurance claim databases have presented great opportunities to conduct clinical studies in a real-world setting. For example, the UK Biobank is a prospective cohort study of approximately 500,000 individuals, aged 37-73 years, recruited from the general population of the United Kingdom between 2006 and 2010 [18,19]. Aggregated and quality-controlled EHR data purchasable from Optum (https://www.optum.com/EHR) include more than 80 million patients with longitudinal lab measures. The Million Veteran Program [20] and the IBM MarketScan Database [21] are among many other big biobank databases that contain rich yet complex longitudinal and time-to-event data on 10^5 to 10^8 patients. However, current implementations of many joint models are inefficient and not scalable to the size of biobank data, as demonstrated later in Sections 3 and 4. There is a pressing need to develop efficient implementations of joint models to enable the analysis of these rich data sources.
The purpose of this paper is to develop and implement efficient algorithms for a semiparametric joint model of longitudinal and competing risk time-to-event data. As specified in Section 2.1, the joint model consists of a linear mixed effects submodel for a longitudinal outcome and a semiparametric proportional cause-specific hazards submodel for a competing risk survival outcome. These two submodels are linked by shared random effects or features of an individual's longitudinal biomarker trajectory. In Section 2, we identify key computational bottlenecks in the semiparametric maximum likelihood inference procedure for the joint model. Specifically, we point out that in a standard implementation, the computational complexities of numerical integration, risk set calculation, and standard error estimation are of the order O(n^2), O(n^2), and O(n^3), respectively, where n is the number of subjects. Consequently, current implementations grind to a halt as n becomes large (for example, n > 10^4). We further develop tailored linear scan algorithms to reduce the computational complexity to O(n) in each of these components. We illustrate by simulation and real-world data that the linearization algorithms can yield a drastic speed-up, by a factor of many thousands when n > 10^4, reducing the runtime from days to minutes for big data. Finally, we have developed a user-friendly R package, FastJM, that fits the shared parameter semiparametric joint model using the proposed efficient algorithms, and we have made it publicly available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=FastJM. The rest of the paper is organized as follows. Section 2.1 outlines the semiparametric shared random effects joint model framework and reviews a customized expectation-maximization (EM) algorithm for semiparametric maximum likelihood estimation as well as a standard error estimation method.
Section 2.2 develops various linear scan algorithms to address the key computational bottlenecks in the EM algorithm and standard error estimation for large data. Section 3 presents simulation studies to illustrate the computational efficiency of the proposed linear scan algorithms. Section 4 demonstrates the improved computational performance of our FastJM R package over some established joint model R packages on two moderate to large real-world data sets. Concluding remarks are provided in Section 5.

Efficient Algorithms and Implementation of a Semiparametric Joint Model
Let C_i = (T_i, D_i) denote the competing risk data on subject i, where T_i is the observed time to either failure or censoring and D_i takes values in {0, 1, 2, ..., K}, with D_i = 0 indicating a censored event and D_i = k indicating that the kth type of failure is observed on subject i, k = 1, 2, ..., K.
The censoring mechanism is assumed to be independent of the failure time.
2.1.1. Model. Consider the following joint model in which the longitudinal outcome Y_i(t) is characterized by a linear mixed effects submodel

Y_i(t) = X_i^(1)(t)^T β + X̃_i^(1)(t)^T b_i + ε_i(t),   (1)

and the competing risk survival outcome is modeled by a proportional cause-specific hazards submodel

λ_k(t | X_i^(2), b_i, γ_k, ν_k) = λ_0k(t) exp{X_i^(2)T γ_k + ν_k^T b_i},   k = 1, ..., K,   (2)

where X_i^(1)(t) and X̃_i^(1)(t) are covariate vectors for the fixed effects β and the random effects b_i ~ N(0, Σ), respectively, ε_i(t) ~ N(0, σ^2) is the measurement error independent of b_i, and ε_i(t_1) is independent of ε_i(t_2) for any t_1 ≠ t_2. In the competing risk survival submodel (2), λ_k(t | X_i^(2), b_i, γ_k, ν_k) is the conditional cause-specific hazard rate for type k failure at time t, given time-invariant covariates X_i^(2) and the shared random effects b_i, and λ_0k(t) is a completely unspecified baseline cause-specific hazard function for type k failure. The two submodels are linked by the shared random effects b_i, and the strength of association is quantified by the association parameters ν_1, ..., ν_K.

Semiparametric Maximum Likelihood Estimation via an EM Algorithm

Let Ψ = {β, σ^2, γ, ν, Σ, λ_01(·), ..., λ_0K(·)} denote all unknown parameters of the joint models (1) and (2), and let I(D_i = k) be the competing event indicator for type k failure, which takes the value 1 if D_i = k and 0 otherwise. The observed-data likelihood for Ψ is then given by

L(Ψ; Y, C) = ∏_{i=1}^n ∫ f(Y_i | b_i, Ψ) f(C_i | b_i, Ψ) f(b_i | Ψ) db_i,

which follows from the assumption that Y_i and C_i are independent conditional on the covariates and the random effects. Because Ψ involves K unknown hazard functions, finding its maximum likelihood estimate by maximizing the observed-data likelihood directly is nontrivial. However, a customized EM algorithm can be derived to compute the maximum likelihood estimate of Ψ by regarding the latent random effects b_i as missing data [3]. The complete-data likelihood based on (Y, C, b) involves Λ_0k(·), the cumulative baseline hazard function for type k failure, and ΔΛ_0k(T_i), its jump size at T_i. The EM algorithm iterates between an expectation step (E-step) and a maximization step (M-step) until convergence, where Ψ^(m) denotes the estimate of Ψ from the mth iteration. The E-step involves calculating conditional expectations of the form

E{h(b_i) | Y_i, C_i, Ψ^(m)} = ∫ h(b_i) f(b_i | Y_i, C_i, Ψ^(m)) db_i   (7)

across all subjects for any function h(·). Furthermore, it can be shown that the M-step has closed-form solutions for the parameters β, σ^2, Σ, and Λ_0k(t), and that the remaining parameters γ and ν can be updated using a one-step Newton-Raphson method. Details are provided in equations (7.1)-(7.8) of the supplementary materials (available here).
The standard error of the parametric component Ω̂ = (β̂, σ̂^2, γ̂, ν̂, Σ̂) can be estimated by inverting the empirical information matrix of the profiled likelihood,

V̂(Ω̂) = { Σ_{i=1}^n ∇_Ω l^(i)(Ω̂; Y, C) ∇_Ω l^(i)(Ω̂; Y, C)^T }^{-1},   (8)

where ∇_Ω l^(i)(Ω; Y, C) is the observed score vector from the profiled likelihood l^(i)(Ω; Y, C) of Ω for the ith subject, obtained by profiling out the baseline hazards. Details of the observed score vector for each parametric component are provided in equations (7.9)-(7.13) of the supplementary materials.

Efficient Algorithms and Implementation of the EM Algorithm and Standard Error Estimation

With a naive implementation, multiple quantities in the above E-step, M-step, and standard error estimation involve O(n^2) or O(n^3) operations, which become computationally prohibitive at large sample size n. Below we identify these bottlenecks and present appropriate linear scan algorithms that reduce their computational complexities to O(n).

Efficient Implementation of the E-Step

At each EM iteration, the E-step requires calculating integral (7) across all subjects. Below we discuss how to accelerate two commonly used numerical integration methods for evaluating these integrals.
(1) Standard Gauss-Hermite Quadrature Rule for Numerical Integration. A commonly used method for numerical evaluation of integral (7) is based on the standard Gauss-Hermite quadrature rule [26]:

Computational and Mathematical Methods in Medicine
with Λ_0k^(m)(·) the right-continuous and nondecreasing cumulative baseline hazard function for type k failure at the mth EM iteration, as defined in Section 7.1 (equation (7.4)) of the supplementary material, ΔΛ_0k^(m)(T_i) the jump size of Λ_0k^(m)(·) at T_i, q the dimension of the random effects vector, Σ_{t_1, t_2, ..., t_q} the shorthand for Σ_{t_1=1}^{n_q} ⋯ Σ_{t_q=1}^{n_q}, n_q the number of quadrature points, c_t = (c_{t_1}, c_{t_2}, ..., c_{t_q})^T the abscissas with corresponding weights π_t, b̃_t^(m) = √2 Σ̂^(m)1/2 c_t the rescaled abscissas, and Σ̂^(m)1/2 the square root of Σ̂^(m) [3]. However, this method is computationally intensive for several reasons. First, it usually requires many quadrature points to approximate an integral with sufficient accuracy because the mode of the integrand is often located away from zero. Second, the computational cost increases exponentially with q because the Cartesian product of the abscissas is used to evaluate the integrand with respect to each random effect. Lastly, the rescaled abscissas b̃_t^(m) need to be recalculated at every EM iteration.
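As a concrete illustration of the rule, the following minimal Python sketch (a one-dimensional toy example of our own, not the FastJM implementation) approximates E{h(b)} for b ~ N(0, σ^2) using the standard Gauss-Hermite rule with the same change of variables b = √2 σ c_t described above.

```python
import numpy as np

# Standard Gauss-Hermite rule: integral of g(x) * exp(-x^2) dx ≈ sum_t w_t g(x_t).
# For b ~ N(0, sigma^2), substituting b = sqrt(2)*sigma*x gives
# E[h(b)] = (1/sqrt(pi)) * sum_t w_t * h(sqrt(2)*sigma*x_t).
def gh_expectation(h, sigma, n_q=20):
    x, w = np.polynomial.hermite.hermgauss(n_q)  # abscissas and weights
    return float(np.sum(w * h(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi))

# Sanity check: E[b^2] = sigma^2 for a centered normal
approx = gh_expectation(lambda b: b ** 2, sigma=0.5)
# approx ≈ 0.25
```

Note how the rescaled abscissas depend on σ: in the joint model, Σ̂^(m) changes at every EM iteration, so the points b̃_t^(m) must be recomputed each time, which is one source of the cost discussed above.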
(2) Pseudo-Adaptive Gauss-Hermite Quadrature Rule for Numerical Integration. When the number of longitudinal measurements per subject is relatively large, Rizopoulos [27] introduced a pseudo-adaptive Gauss-Hermite quadrature rule for numerical approximation of integral (7), which achieves good approximation accuracy with only a small number (n_q) of quadrature points and is thus computationally more efficient. The pseudo-adaptive rule proceeds as follows. First, fit the linear mixed effects model (1) alone to obtain the empirical Bayes estimate b̂_i of the random effects for each subject and its covariance matrix H̃_i. Then, recenter and rescale the standard Gauss-Hermite abscissas around these subject-specific estimates, yielding the rescaled quadrature points r̃_t used to approximate integral (7), with the remaining notation defined similarly to Equation (10).
A derivation of (13) is provided in Section 7.2 of the supplementary materials. The pseudo-adaptive Gauss-Hermite quadrature rule is computationally appealing because the rescaled quadrature points r̃_t are computed only once before the EM algorithm and do not need to be updated within it. Additionally, the pseudo-adaptive rule requires fewer quadrature points than the standard Gauss-Hermite rule to achieve the same numerical approximation accuracy [27]. For example, our simulation results in the supplementary materials (Section A.4, Table A.1) show that the pseudo-adaptive rule with n_q = 6 quadrature points produces almost identical results to the standard rule with n_q = 20 quadrature points.

Remark 1 (Linear calculation of H̃_i). A naive computation of the covariance matrices H̃_i involves repeating a summation over subjects for each i. However, because the same quantity can be computed once and reused, the H̃_i's for all subjects can be obtained in O(n) operations. Our simulation study in the supplementary material (Section A.5, Figure A.1) shows that this simple linearization yields a speed-up by a factor of 10 to 10,000 as n grows from 10 to 10^5. Our implementation is also significantly faster than the popular R package lme4 [28], with a speed-up by a factor of 10 to 500 as n grows from 10 to 10^5.

(3) Linear Scan Algorithm for Calculating f(Y_i, C_i | r̃_t, Ψ^(m)) across All Subjects. Both the standard Gauss-Hermite quadrature rule (9) and the pseudo-adaptive Gauss-Hermite quadrature rule (13) require evaluating f(Y_i, C_i | b_i, Ψ^(m)) at their prespecified abscissas across all subjects (see Equations (10) and (14)). Hence, calculating f(Y_i, C_i | r̃_t, Ψ^(m)) requires evaluating Λ_0k^(m)(T_i) across all subjects for each k. We observe from equation (7.4) of the supplementary materials that Λ_0k^(m)(·) is a nondecreasing step function with jumps only at the distinct uncensored event times t_k1 > ⋯ > t_kq_k of type k. Therefore, once the observation times are ranked as T_(1) ≥ ⋯ ≥ T_(n), the values Λ_0k^(m)(T_(i)) for all subjects can be obtained by the following linear scan (15): the t_k1, ..., t_kq_k are scanned from the largest to the smallest, and for each t_kj, only a subset of the ranked observation times T_(i) is scanned to assign Λ_0k^(m)(T_(i)). Specifically, start with t_k1 and scan through the first set of observation times T_(1) ≥ ⋯ ≥ T_(i_k1), where T_(i_k1) ≥ t_k1 > T_(i_k1+1); the corresponding Λ_0k(T_(i))'s take the value Λ_0k(t_k1). Next, move to t_k2 and scan through the second set T_(i_k1+1) ≥ ⋯ ≥ T_(i_k2), where T_(i_k2) ≥ t_k2 > T_(i_k2+1); the corresponding Λ_0k(T_(i))'s take the value Λ_0k(t_k2). Repeat the same process until T_(n) is scanned. Because the scanned T_(i)'s for different t_kj's do not overlap, the entire algorithm costs only O(n) operations.
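The scan above can be sketched in a few lines. The following Python snippet (our illustrative code, not part of FastJM) evaluates a right-continuous step function such as Λ_0k^(m)(·), with jumps at the distinct uncensored event times, at every observation time in a single merge-like pass, assuming both input sequences are sorted in decreasing order.

```python
def step_eval_linear_scan(obs_times, jump_times, jump_values):
    """Return Lambda(t) for each t in obs_times (descending), where Lambda is a
    nondecreasing step function with Lambda(jump_times[j]) = jump_values[j],
    jump_times is descending, and Lambda(t) = 0 below the smallest jump time."""
    out, j = [], 0
    for t in obs_times:                       # scan observation times largest -> smallest
        while j < len(jump_times) and jump_times[j] > t:
            j += 1                            # skip jump times that exceed t
        out.append(jump_values[j] if j < len(jump_times) else 0.0)
    return out

# Example: Lambda(1) = 0.1, Lambda(3) = 0.3, Lambda(5) = 0.6 (listed largest first)
vals = step_eval_linear_scan([6, 4, 2, 0.5], [5, 3, 1], [0.6, 0.3, 0.1])
# vals == [0.6, 0.3, 0.1, 0.0]
```

Because each index only moves forward, the total work is O(n + q_k) rather than O(n q_k).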

Linear Risk Set Scan for the M-Step

In the M-step, multiple quantities in equations (A.4)-(A.8), such as the cumulative baseline hazard functions and the Hessian matrices and score vectors for γ_k and ν_k (k = 1, 2, ..., K), involve aggregating information over the risk set R(t_kj) = {r : T_r ≥ t_kj} at each distinct uncensored event time t_kj. These quantities are further aggregated across all t_kj's. If all subjects are scanned to determine the risk set R(t_kj) at each t_kj, then aggregating information over the risk sets for all uncensored event times requires O(n^2) operations. Below we explain how to linearize the risk set calculations across all uncensored event times by taking advantage of the fact that the risk set is decreasing over time for right-censored data.
First, to calculate Λ_0k^(m+1)(t_kj), j = 1, ..., q_k, one needs to compute Σ_{r ∈ R(t_kj)} exp(X_r^(2)T γ_k^(m)) E{exp(ν_k^(m) b_r)}, j = 1, ..., q_k. Because the distinct uncensored event times t_k1 > ⋯ > t_kq_k are arranged in decreasing order, the risk set R(t_k(j+1)) can be decomposed into two disjoint sets, R(t_k(j+1)) = R(t_kj) ∪ {r : T_r ∈ [t_k(j+1), t_kj)}, and consequently,

Σ_{r ∈ R(t_k(j+1))} a_r = Σ_{r ∈ R(t_kj)} a_r + Σ_{r: T_r ∈ [t_k(j+1), t_kj)} a_r,   (17)

for any sequence of real numbers a_1, ..., a_n. Because the first term of (17) is already available and the subjects in R(t_kj) do not need to be rescanned to calculate the second term, one can calculate Σ_{r ∈ R(t_kj)} a_r, j = 1, ..., q_k, in O(n) operations when the T_r's are scanned backward in time.
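A minimal Python sketch of this backward risk-set scan (illustrative only; the names are ours) computes Σ_{r ∈ R(t_kj)} a_r for all distinct event times in one pass, assuming the subjects' observation times and the event times are both sorted in decreasing order.

```python
def risk_set_sums(T, a, event_times):
    """Return [sum of a[r] for r with T[r] >= t] for each t in event_times.
    T (with matching a) and event_times must both be sorted in decreasing order."""
    sums, running, i = [], 0.0, 0
    for t in event_times:                 # scan event times largest -> smallest
        while i < len(T) and T[i] >= t:   # absorb subjects newly entering the risk set
            running += a[i]
            i += 1
        sums.append(running)              # running total = sum over R(t)
    return sums

# Example: subjects observed at times 7, 5, 4, 2 with values 1, 2, 3, 4
s = risk_set_sums([7, 5, 4, 2], [1, 2, 3, 4], [5, 2])
# s == [3.0, 10.0]: R(5) = {subjects at 7 and 5}, R(2) = all four subjects
```

Each subject is absorbed into the running total exactly once, so the cost is O(n + q_k) overall instead of O(n q_k).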
Next, to calculate the Hessian matrix I_γk^(m) for γ_k in (A.5), we first rewrite it as a combination of risk-set sums of the form (17), which allows one to linearize its calculation across the t_kj's. Finally, the above linear risk set scan algorithm can be adapted to calculate the Hessian matrices and score vectors for γ_k and ν_k in equations (A.6)-(A.8) in O(n) operations in a similar fashion.

Linear Scan Algorithm for Standard Error Estimation.
The standard error estimation formula in (8) relies on the observed score vector from the profiled likelihood in which the baseline hazards are profiled out. However, for each subject i, two components of the score vector, ∇_γk l^(i)(Ω̂; Y, C) and ∇_νk l^(i)(Ω̂; Y, C) as given in equations (A.12) and (A.13), involve aggregating information either over {r ∈ R(T_i)} or over both {r ∈ R(t_kj)} and {j : t_kj ≤ T_i}. If implemented naively, this aggregation takes O(n) or O(n^2) operations per subject, respectively. As a result, computing the observed information matrix can take O(n^3) operations because it requires summing this information across all subjects. Below we describe a sequential linear scan algorithm that reduces the computational complexity from O(n^3) to O(n).
Our algorithm is easily explained by considering the calculation of the following expression in the second term of ∇_γk l^(i)(Ω̂; Y, C) in (7.12). In other words, we need to compute B(T_i) for i = 1, ..., n, where B(t) ≡ Σ_{j: t_kj ≤ t} b_kj, with each term b_kj itself defined through sums over the risk set R(t_kj). Before going further, recall that the distinct uncensored event times t_k1 > ⋯ > t_kq_k are in descending order and that the subjects are sorted so that the observation times T_i are also in descending order.
First, because the risk set is decreasing over time for right-censored data, it follows from Equation (17) that B(t_k1), ..., B(t_kq_k) can be computed in O(n) operations as one scans through t_k1, ..., t_kq_k backward in time. Second, analogous to (15), a linear scan can be used to calculate {B(T_(1)), B(T_(2)), ..., B(T_(n))} from {B(t_k1), ..., B(t_kq_k)}: the t_k1, ..., t_kq_k are scanned from the largest to the smallest, and for each t_kj, only a subset of the ranked observation times T_(i) is scanned to assign the B(T_(i))'s. The details are essentially the same as those following Equation (15) and are omitted here.
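The two passes can be sketched as follows in Python (an illustrative fragment under our own naming; b stands in for the already-computed per-event terms b_kj): pass 1 forms the cumulative sums B(t_kj) from the smallest event time up, and pass 2 evaluates the resulting step function at the descending observation times, exactly as in the cumulative baseline hazard scan.

```python
def eval_B_at_obs(obs_times, event_times, b):
    """B(t) = sum of b[j] over {j : event_times[j] <= t}; event_times and
    obs_times are both sorted in decreasing order."""
    # Pass 1: suffix sums give B(event_times[j]) in O(q) operations
    B, running = [0.0] * len(event_times), 0.0
    for j in range(len(event_times) - 1, -1, -1):
        running += b[j]
        B[j] = running
    # Pass 2: merge scan assigns B(T_(i)) for all observation times in O(n)
    out, j = [], 0
    for t in obs_times:
        while j < len(event_times) and event_times[j] > t:
            j += 1
        out.append(B[j] if j < len(event_times) else 0.0)
    return out

# Example: event times 5, 3, 1 with per-event terms 0.2, 0.3, 0.5
vals_B = eval_B_at_obs([6, 4, 2, 0.5], [5, 3, 1], [0.2, 0.3, 0.5])
# vals_B ≈ [1.0, 0.8, 0.5, 0.0]
```

Summing the resulting per-subject contributions then yields the information matrix in O(n) total operations, rather than recomputing each inner sum from scratch.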

Simulation Studies
We present a simulation study to illustrate the computational speed-up rendered by the proposed linear scan algorithms as the sample size n grows from 100 to 1,000,000. All simulations were run on a MacBook Pro with a 6-core Intel Core i7 processor (2.6 GHz) and 16 GB of RAM running macOS.
We generated longitudinal measurements Y_ij from the linear mixed effects model (1) with X_i^(1)(t_ij)^T = (1, t_ij, X_2i) and X̃_i^(1)(t_ij)^T = (1, t_ij), and competing risk event times from a proportional cause-specific hazards model of the form (2), with the two submodels linked through the shared random effects b_i = (b_0i, b_1i)^T. In the above joint model, t_ij = 0, 1, ... represent scheduled visit times, X_1i follows N(2, 1.0), X_2i ~ Bernoulli(0.5) is a binary covariate, the random effects b_i = (b_0i, b_1i)^T follow a N_2(0, Σ) distribution with Σ_11 = 0.5, Σ_22 = 0.25, and Σ_12 = 0, the measurement errors ε_ij are iid N(0, 0.5) and independent of b_i, and the baseline hazards λ_01(t) and λ_02(t) are constants 0.05 and 0.1, respectively. We simulated a noninformative censoring time V_i following an Exp(20) distribution and let T_i = min{T*_i1, T*_i2, V_i} be the observed (possibly censored) time for subject i. The longitudinal measurements for subject i at t_ij are treated as missing after T_i.
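To make the competing-risk data-generating mechanism concrete, the following Python sketch (our simplified illustration; the linear predictors eta1 and eta2 stand in for the covariate and random-effect terms, and the censoring rate is a placeholder rather than the paper's Exp(20) specification) draws one subject's outcome under the constant baseline hazards 0.05 and 0.1. With constant cause-specific hazards h_1 and h_2, the latent first-event time is exponential with rate h_1 + h_2 and the cause is k with probability h_k / (h_1 + h_2).

```python
import math
import random

def simulate_subject(eta1, eta2, censor_rate=0.05, rng=random):
    """Draw (observed time T_i, event indicator D_i) for one subject.
    D_i = 0: censored; D_i = k: failure of type k (k = 1, 2)."""
    h1 = 0.05 * math.exp(eta1)          # cause-1 cause-specific hazard
    h2 = 0.10 * math.exp(eta2)          # cause-2 cause-specific hazard
    t_event = rng.expovariate(h1 + h2)  # latent first-event time
    cause = 1 if rng.random() < h1 / (h1 + h2) else 2
    v = rng.expovariate(censor_rate)    # noninformative censoring time V_i
    return (v, 0) if v < t_event else (t_event, cause)
```

With eta1 = eta2 = 0, cause 2 accounts for about two-thirds of the observed failures, since its baseline hazard is twice that of cause 1.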
We first compared the runtime between three different implementations of the EM algorithm for fitting the joint models (1) and (2) as described in Section 2.2.
(1) Method 1. This method is a standard implementation of the EM algorithm using the standard Gauss-Hermite quadrature rule in the E-step (Equation (9) with n q = 20) without any linear computation.
(2) Method 2. This is a standard implementation of the EM algorithm using the pseudo-adaptive quadrature rule in the E-step (Equation (13)) without any linear computation.

(3) Method 3. This is method 2 plus the linear scan for calculating the f(Y_i, C_i | r̃_t, Ψ^(m))'s and the linear risk set scan for the M-step, as described in Section 2.2.
The number of quadrature points n_q for methods 1 and 2 was determined by first trying different values, {10, 20, 30} for method 1 and {6, 9, 12, 15} for method 2, and then choosing the smallest value at which the estimation results stabilized and agreed between the two implementations. For comparison purposes, we also included the runtime of an established joint model R package, joineR, which uses a similar EM algorithm for parameter estimation to fit a semiparametric joint model with a slightly different latent association structure in the competing risk submodel [29]. The results are depicted in Figure 1. Figure 1(a) shows that the runtime of method 3 increases linearly with the sample size, while the runtimes of the other three methods grow much faster than linearly. For moderate sample sizes, method 2 is computationally more efficient than method 1 because it requires fewer quadrature points for numerical integration. However, its computational advantage diminishes as the sample size increases because of the rapidly increasing cost of computing the f(Y_i, C_i | r̃_t, Ψ^(m))'s and the risk set calculations in the M-step. By further linearizing the computation of these key components, method 3 yielded more than a 100-fold speed-up over method 2 when n = 10^5, and the speed-up is expected to grow as n increases (Figure 1(b)). Furthermore, method 3 demonstrated more than a 30-fold speed-up over joineR when n = 10^4. We also note that joineR failed to run at n = 10^5 because it ran out of memory.
We also compared the runtime of two implementations of the standard error estimation, with and without linear scan as described in Section 2.2.3, and the bootstrap method employed by the joineR package [29]. The results are shown in Figure 2.
It is seen from Figure 2(a) that the implementation with linear scan easily scales to a million subjects, taking only minutes to finish, whereas the naive implementation without linear scan grinds to a halt when the sample size reaches 10,000 or more. Figure 2(b) shows that linear scan generates a speed-up by a factor of greater than 100,000 when n ≥ 10,000. Similarly, in comparison with joineR, which used 100 bootstrap samples for standard error estimation, our standard error estimation with linear scan generated a speed-up by a factor of greater than 100,000 when n ≥ 5,000.

[Figure 1 caption: Runtime comparison of methods 1-3 for fitting the joint models (1) and (2) and of the joineR package. The details of methods 1-3 are given in Section 3. joineR is an established R package which fits a similar semiparametric joint model with a slightly different latent association structure in the competing risk submodel [29]. Fold change is calculated as the ratio of runtime between two methods.]

Finally, Figures 1 and 2 in Section 3 have focused on contrasting the computational efficiency of different implementations of parameter estimation and standard error estimation in terms of runtime. We have also compared their parameter estimates and standard errors in Section A.6 of the supplementary materials. As one would expect, our three implementations (methods 1-3) yielded almost identical estimation results, whereas joineR produced similar estimation results for the longitudinal submodel but slightly different results for the competing risk submodel due to its different latent association structure.

Real Data Examples
We have developed an R package, FastJM, implementing the efficient algorithms described in Section 2. Below we illustrate the improved computational performance of FastJM in comparison to existing joint model R packages on Lung Health Study (LHS) data with n = 5,887 subjects and UK Biobank data with n = 193,287 participants.

Lung Health Study.
The Lung Health Study (LHS) data were collected from a ten-center randomized clinical trial on 5,887 middle-aged smokers with mild to moderate chronic obstructive pulmonary disease (COPD) [30]. Patients were randomized into three arms: usual care, smoking intervention and placebo inhaler (SIP), and smoking intervention and active bronchodilator inhaler (SIA). An important objective of the study was to determine whether the intervention program, combining intensive smoking cessation counseling with an inhaled anticholinergic bronchodilator, can slow the decline in forced expiratory volume in one second (FEV_1) over a 5-year follow-up period. Patients' FEV_1 values were collected annually after recruitment into the study. FEV_1 was chosen as the primary outcome because its trajectory reflects a patient's natural loss of lung function during the progression of COPD. Because not all patients completed the whole study period, about 9.47% of the longitudinal measurements were missing. One possible reason for dropout is that treatment was not effective; hence, missing longitudinal measurements after dropout are potentially nonignorable.
Joint modeling of FEV_1 together with the possibly informative dropout time provides an attractive approach to handling nonignorable missing longitudinal data due to dropout.

[Figure 2 caption: Runtime comparison of standard error estimation for the joint models (1) and (2), with and without linear scan as described in Section 2.2.3, and the bootstrap method employed by the joineR package [29]. Fold change is calculated as the ratio of runtime between two methods.]

Based on previous findings, we considered the following covariates when characterizing the trajectory of Y = FEV_1: time (year), sex, age, body mass index (BMI), baseline number of cigarettes smoked per day, and the logarithm of the two-point methacholine concentration-FEV_1 O'Connor slope (logslope) [31]. We also included two interaction terms between the treatment indicators SIP and SIA and time, so that the difference in the slope of FEV_1 between SIP (or SIA) and usual care can be evaluated by testing whether the interactions are zero. Specifically, we considered a linear mixed effects model of the form (1) with these covariates and interactions as fixed effects, where the random error term ε_ij ~ iid N(0, σ^2) and the random effects b_i = (b_0i, b_1i)^T are assumed normally distributed with zero mean and covariance matrix Σ. For the dropout time T_i (possibly censored at the end of the study), we assume the Cox proportional hazards submodel
where λ_0(t) denotes the baseline hazard function and W_i(t) is a latent association structure that links the two submodels. Table 1 compares the runtime of FastJM and some existing joint model packages, including joineR [29], different versions of JM [2], JMbayes [32], and JSM [33], under various specifications of λ_0(t) and W_i(t).
Among all the semiparametric models (FastJM, joineR, JSM^a, and JSM^b), FastJM finished in 0.3 minutes, while the other methods took 20.4 to 111 minutes. In fact, the runtime of FastJM was even shorter than those of some parametric joint models. We also observed that JMbayes, which is based on a Bayesian MCMC framework, is considerably slower than its frequentist counterpart JM. Finally, the parameter estimates and inference results for the longitudinal outcome were almost identical across all packages but slightly different for the survival submodel because of the slightly different latent structures W_i(t).

UK Biobank Primary Care (UKB-PC) Data

The UK Biobank (UKB) is a prospective cohort study of approximately 500,000 individuals recruited from the general population of the United Kingdom [18,19]. Participants attended an assessment at their closest clinic center, where they completed questionnaires, took physical measurements, and provided biological samples (blood, urine, and saliva) as a baseline assessment visit. Hospital admission records were available until February 2018 for the full UKB cohort, whereas linkage to primary care records was available for 45% of the UKB cohort (approximately 230,000 participants) until May 2017 for Scotland, September 2017 for Wales, and August 2017 for England. The detailed linkage procedures for primary care records are available online (https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/primary_care_data.pdf).
In this example, we consider a joint model of longitudinal systolic blood pressure (SBP) measurements and a competing risk event time, with age-at-onset of type 2 diabetes (T2D) as the first risk and age-at-onset of stroke, myocardial infarction (MI), or all-cause death, whichever occurred first, as the second risk. Age-at-onset of outcomes was based on participants' primary care or hospital records, whichever occurred first. Follow-up was censored at the primary care data end date for the relevant country or at the date of the outcome, if this occurred earlier. SBP measures were extracted from either the baseline assessment visit or the primary care data. Covariates include sex, ethnicity, and BMI measured during the baseline visit. However, considering the imbalanced racial distribution in this case study, we only considered white vs. non-white ethnicity groups. Specifically, the joint model consists of a linear mixed effects submodel (Model-L) of the form (1) for the longitudinal outcome (SBP) and a proportional cause-specific hazards submodel (Model-PCH) of the form (2) for the competing risk outcome, k = 1, 2. In Model-L, the random error term ε_ij ~ iid N(0, σ^2) and the random effects b_i = (b_0i, b_1i)^T are assumed normally distributed with zero mean and covariance matrix Σ.
In Model-PCH, k = 1 denotes type 2 diabetes and k = 2 denotes the composite of stroke, MI, or all-cause death; λ_0k(t) denotes the baseline cause-specific hazard function for cause k, and W_ik(t) denotes the latent association structure of SBP with the cause-k risk. To our knowledge, besides our FastJM package, joineR and JM are the only other current joint model R packages capable of handling competing risk event outcomes. However, because JM encountered convergence issues, we focus on FastJM and joineR in this case study. Table 2 compares the runtime of FastJM and joineR on subsets of 5,000 and 20,000 participants randomly selected from the UKB-PC data and on the full UKB-PC data with 193,287 participants. For the UKB-PC subset of 5,000 participants, FastJM finished within 1 minute, while joineR took 3.3 hours. For the subset of 20,000 participants, FastJM finished within 5 minutes, while joineR took 33 hours. For the full UKB-PC data with 193,287 participants, FastJM finished within 1 hour, whereas joineR encountered a computational failure.
Finally, the analysis results produced by FastJM and joineR are similar for the longitudinal submodel on the UKB-PC subsets of 5,000 and 20,000 participants and on the full UKB-PC data. For the survival submodel, the results are also similar for most parameters, except for the association parameters, owing to the different latent structures W_i(t) used by the two packages. Detailed analysis results are provided in Section A.8 of the supplementary materials.

Discussion
We have developed customized linear scan algorithms to reduce the computational complexity from O(n^2) or O(n^3) to O(n) within each iteration of the EM algorithm and in the standard error estimation step for a semiparametric joint model of a longitudinal biomarker and a competing risk event time. Through simulation and case studies, we have demonstrated that the efficient implementation can generate a speed-up by a factor of up to hundreds of thousands and often reduces the runtime from days to minutes when the sample size is large (n > 10^4), making it feasible to fit a joint model to large data in near real time.
The ideas and techniques of this paper can potentially be adapted to improve computational efficiency for other joint models. For instance, the linear computational algorithm in Remark 1 for computing the variance-covariance matrices of the empirical Bayes estimates of the random effects is not specific to the joint model considered in this paper and can be used in any procedure that employs the pseudo-adaptive quadrature rule. Also, although we have focused on joint modeling of a single biomarker with a time-to-event outcome, our methodology can be easily extended to handle multiple biomarkers in a similar fashion. It is important to note, however, that the linear risk set scan algorithm is limited to shared random effects joint models in which the Cox submodel (2) involves only time-independent covariates. If the Cox submodel contains time-dependent covariates, such as the current value of the longitudinal marker, then one may have to impose more restrictive assumptions, such as a parametric baseline hazard, to linearize the computational cost with respect to the sample size. This paper has focused on linearizing the computation with respect to the sample size within the framework of the classical EM algorithm coupled with the pseudo-adaptive quadrature rule for numerical integration in the E-step. It would be interesting to investigate whether coupling our algorithms with other numerical integration methods, such as the quasi-Monte Carlo method [34], or with other variations of the EM algorithm, such as the stochastic EM algorithm (stEM) [35] or Turbo-EM [36], could further enhance computational efficiency, especially when there are three or more random effects in the model. Finally, current joint model implementations are generally not scalable as the number of longitudinal measurements grows large, rendering it infeasible to fit dense longitudinal data such as those generated by modern wearable devices for dynamic prediction of a health outcome.
Future research is warranted to develop novel joint modeling procedures that are scalable to large number of subjects, random effects, and longitudinal measurements.

Software
A user-friendly R package FastJM [37] has been developed to fit the shared parameter joint model using the efficient algorithms developed in this paper and is publicly available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=FastJM.

Data Availability
The lung health study (LHS) data used to support the findings in Section 4.1 have not been made available because the authors did not have permission for data sharing from

Disclosure
This paper has been submitted as a preprint, available at https://arxiv.org/abs/2110.14822 [38].

Conflicts of Interest
The authors declare that they have no conflicts of interest.