Truncated power basis expansions and penalized spline methods are demonstrated for estimating nonlinear exposure-response relationships in the Cox proportional hazards model. R code is provided for fitting models to get point and interval estimates. The method is illustrated using a simulated data set under a known exposure-response relationship and in a data application examining risk of carpal tunnel syndrome in an occupational cohort.

The Cox proportional hazards (PH) model is frequently used to model survival data or time-to-event data, particularly in the presence of censored survival times [

Consider an occupational cohort with _{i}, which takes the value of 1 if the individual had the event and 0 if the time is censored. The general form of the Cox PH model for a single covariate is

A nonlinear exposure-response relationship can be modeled by including a transformation of_{i} in the model:

This manuscript provides a detailed introduction to modeling and interpreting nonlinear exposure-response curves using these spline functions. We assume familiarity with the Cox PH model and survival data. The remainder of the paper is structured in three sections. Section

In the Cox PH model in (

To provide flexibility in capturing local features in the exposure-response curve, polynomial spline terms may also be used as basis functions. A spline function is a function, typically a polynomial, defined on a subinterval of the range of exposures. Splines allow for estimation of the exposure-response relationship using a piecewise-defined curve. They are generally considered to provide more flexibility in estimating nonlinear relationships than polynomials or other algebraic functions. To define a piecewise linear curve over four regions in which the slope changes from region to region, we would use a set of basis functions consisting of the functions

As an illustration, we simulated a data set of

True exposure-response relationship used to simulate data (a). Histogram of the simulated exposure data (b). Kaplan-Meier estimates of the survival functions for five exposure groups (c).

We illustrate the spline-based methods for estimating the exposure-response relationship using knots at the quartiles of the case exposures (3.0, 5.5, and 8.3). A cubic truncated power basis representation using these same knots requires six basis functions,

Linear spline (a) and cubic spline (b) basis functions using knots at quartiles of the case exposures (3.0, 5.5, and 8.3).
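The basis functions in this figure can be reproduced with a few lines of base R. The following is a minimal sketch, not the authors' listing: the helper name `tpb` and the exposure grid are assumptions made for illustration, while the knot values are those quoted above. No intercept column is created because the Cox PH model absorbs the intercept into the baseline hazard.

```r
# Truncated power basis for one exposure variable (no intercept column).
# `tpb` is a hypothetical helper name, not a function from any package.
tpb <- function(x, knots, degree = 1) {
  # Polynomial terms x, x^2, ..., x^degree
  poly_terms <- matrix(sapply(seq_len(degree), function(p) x^p),
                       nrow = length(x))
  # Truncated terms (x - knot)_+^degree, one column per knot
  trunc_terms <- matrix(sapply(knots, function(k) pmax(x - k, 0)^degree),
                        nrow = length(x))
  cbind(poly_terms, trunc_terms)
}

knots <- c(3.0, 5.5, 8.3)            # knot values quoted in the text
x <- seq(0, 12, by = 0.5)            # illustrative exposure grid (assumed)

B_lin <- tpb(x, knots, degree = 1)   # x and three truncated lines
B_cub <- tpb(x, knots, degree = 3)   # x, x^2, x^3 and three truncated cubics

ncol(B_lin)                          # four basis functions
ncol(B_cub)                          # six basis functions

# Plot each basis function against exposure
matplot(x, B_lin, type = "l", lty = 1,
        xlab = "Exposure", ylab = "Basis function value")
```

The column counts match the text: four functions for the linear basis and six for the cubic basis with three knots.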

Fitting the Cox PH model requires using the basis function transformations of the exposure variable as the covariates in the model, which introduces one regression coefficient per basis function,

The R software package used here for fitting Cox PH models and obtaining the estimates is the

Based on the calculations and code in Appendices

Estimated ln(HR) and corresponding pointwise 95% confidence intervals using linear spline (a) and cubic spline (b) basis functions with knots at quartiles of the case exposures (3.0, 5.5, and 8.3).

Although the truncated power basis functions are relatively easy to visualize and implement, they do require a choice of the polynomial degree

Linear B-spline (a) and cubic B-spline (b) basis functions using equally spaced knots.
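B-spline bases of this kind can be generated with `bs()` from the `splines` package that ships with R. This is a sketch under assumptions: the interior knots, boundary knots, and exposure grid below are illustrative values, not the ones used for the figure.

```r
library(splines)  # part of the standard R distribution

x <- seq(0, 12, by = 0.25)          # illustrative exposure grid (assumed)
inner_knots <- c(3, 6, 9)           # equally spaced interior knots (assumed)

# intercept = FALSE (the default) drops one column, which suits the Cox PH
# model because the intercept is absorbed into the baseline hazard.
B1 <- bs(x, knots = inner_knots, degree = 1, Boundary.knots = c(0, 12))
B3 <- bs(x, knots = inner_knots, degree = 3, Boundary.knots = c(0, 12))

ncol(B1)   # number of interior knots + degree
ncol(B3)

# Each column is one basis function evaluated over the exposure grid
matplot(x, B3, type = "l", lty = 1,
        xlab = "Exposure", ylab = "Basis function value")
```

Unlike the truncated power basis, each B-spline basis function is nonzero only over a few adjacent knot intervals, which tends to give better-conditioned design matrices.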

With the knots and degree specified, the B-spline basis functions are then the known functions

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) using truncated power basis functions and B-spline basis functions.

Penalized estimates for the unknown parameters in the basis expansion (

As with the truncated power basis expansion method of Section

To illustrate penalized estimates, we used our simulated data with the known quadratic nonlinear exposure-response curve. We fit penalized splines as described above, under three conditions: with

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) using penalized splines.
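A penalized spline fit with the degrees of freedom chosen by corrected AIC can be sketched with `pspline()` from the `survival` package. The data below are simulated under an assumed quadratic ln(HR) with made-up constants, so the sketch shows the mechanics only and does not reproduce the results in the figure.

```r
library(survival)

# Simulated cohort; all constants here are assumptions for illustration
set.seed(1)
n <- 500
x <- runif(n, 0, 12)                             # hypothetical exposure
log_hr <- 0.25 * x - 0.01 * x^2                  # assumed quadratic ln(HR)
t_event <- rexp(n, rate = 0.05 * exp(log_hr))    # exponential event times
t_cens <- runif(n, 0, 20)                        # independent censoring times
dat <- data.frame(time = pmin(t_event, t_cens),
                  event = as.integer(t_event <= t_cens),
                  x = x)

# df = 0 asks pspline() to choose the degrees of freedom by AIC;
# caic = TRUE switches the criterion to the corrected AIC.
fit_ps <- coxph(Surv(time, event) ~ pspline(x, df = 0, caic = TRUE),
                data = dat)
print(fit_ps)

# Fitted curve on the ln(HR) scale with pointwise standard error bands.
# termplot() centers the curve, so it is not yet relative to zero exposure.
termplot(fit_ps, se = TRUE)
```

Fixing the degrees of freedom instead (for example `pspline(x, df = 4)`) gives a fit with a prespecified amount of smoothing.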

Table

Estimated hazard ratios (HR) and 95% pointwise confidence intervals from two Cox proportional hazard model fits.

Exposure | Penalized spline function (AICc), HR (95% CI) | Linear spline function with knots at quartiles of case exposures, HR (95% CI) | True HR |
---|---|---|---|
2.0 | 1.3 (1.2, 1.5) | 1.3 (1.1, 1.6) | 1.5 |
3.0 | 1.5 (1.3, 1.8) | 1.5 (1.1, 2.1) | 1.7 |
4.0 | 1.8 (1.4, 2.2) | 1.8 (1.3, 2.3) | 2.0 |
5.0 | 2.0 (1.6, 2.5) | 2.1 (1.6, 2.7) | 2.3 |
7.0 | 2.5 (2.0, 3.1) | 2.5 (2.0, 3.3) | 2.9 |
9.0 | 2.9 (2.3, 3.6) | 2.9 (2.2, 3.8) | 3.5 |
19.3 | 3.7 (2.1, 6.3) | 4.1 (2.5, 6.5) | 4.0 |
21.1 | 3.5 (1.7, 7.3) | 4.3 (2.5, 7.5) | 3.5 |
24.0 | 3.3 (1.1, 9.9) | 4.7 (2.4, 9.2) | 2.6 |

These estimated hazard ratios give the estimated hazard (risk) of the outcome at a given exposure relative to the hazard when unexposed. For instance, we estimate from the penalized spline fit using AICc that the hazard of the event at an exposure level of 2.0 is 1.3 times that when unexposed, corresponding to a 30% increase in hazard at this exposure level. For this simulated data set, the linear truncated power basis with knots at the quartiles of the case exposures and the penalized spline fit give comparable estimates over most of the exposure range. However, while the linear spline estimate attenuates at the highest exposure values, it does not decrease there, whereas the true hazard ratio does.
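The arithmetic behind this interpretation can be written out as a short worked equation. The symbol $f$ for the exposure-response function on the log hazard scale and the reference exposure of zero for "unexposed" are notational assumptions made here; the numbers are taken from the first row of the table.

```latex
\mathrm{HR}(x) = \frac{h(t \mid x)}{h(t \mid 0)} = \exp\{ f(x) - f(0) \},
\qquad
\text{percent change in hazard} = 100 \times \{ \mathrm{HR}(x) - 1 \}.

\mathrm{HR}(2.0) = 1.3
\;\Rightarrow\; 100 \times (1.3 - 1) = 30\%,
\qquad
95\%\ \text{CI } (1.2,\ 1.5)
\;\Rightarrow\; (20\%,\ 50\%).
```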

The

Garg et al. [

An initial assessment of a nonlinear exposure-response was made using plots of the martingale residuals. To do so, the Cox PH model with all covariates excluding the exposure (SI) variable was fit and the martingale residuals were obtained. These martingale residuals were then plotted against the exposure variable and Loess curves were added to the plot. The residual plot is displayed in Figure

Unscaled (a) and scaled (b) plots of the martingale residuals versus exposure (SI) with Loess curves using various degrees of smoothing (0.4 to 2.0) from a Cox proportional hazards model with all covariates excluding the exposure variable. (b) is scaled to focus on the Loess curves. The distribution of the exposure variable is given in the rug plot on the
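The residual-based check described above can be sketched in R as follows. The data, the covariate `age`, and the exposure column `si` are simulated stand-ins rather than the cohort data, and the mapping of "degree of smoothing" to the `span` argument of `loess()` is an assumption.

```r
library(survival)

# Simulated stand-in data; all names and constants are hypothetical
set.seed(2)
n <- 400
dat <- data.frame(age = rnorm(n, 40, 10),         # hypothetical covariate
                  si = rexp(n, rate = 1 / 10))    # hypothetical exposure score
log_hr <- 0.02 * (dat$age - 40) + 0.6 * log1p(dat$si)  # assumed nonlinear effect
t_event <- rexp(n, rate = 0.02 * exp(log_hr))
t_cens <- runif(n, 0, 15)
dat$time <- pmin(t_event, t_cens)
dat$event <- as.integer(t_event <= t_cens)

# Step 1: Cox PH model with all covariates except the exposure
fit0 <- coxph(Surv(time, event) ~ age, data = dat)

# Step 2: martingale residuals from that model
dat$mres <- residuals(fit0, type = "martingale")

# Step 3: plot residuals against exposure and overlay loess curves
plot(dat$si, dat$mres, col = "grey",
     xlab = "Exposure", ylab = "Martingale residual")
ord <- order(dat$si)
spans <- c(0.4, 1.0, 2.0)
for (j in seq_along(spans)) {
  lo <- loess(mres ~ si, data = dat, span = spans[j])
  lines(dat$si[ord], fitted(lo)[ord], lty = j)
}
rug(dat$si)   # distribution of the exposure variable
```

A loess curve that bends rather than following a straight line suggests that a linear term for the exposure would be inadequate.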

To address the nonlinearity displayed in the residual plots, four models were examined for these revisited analyses: two parametric functional forms (linear and a logarithmic transformation), a linear spline function with a single knot at the median exposure of SI = 13.5 units (as in [

Estimated exposure-response curves for carpal tunnel syndrome and strain index in a cohort of 569 workers. Rug plot is of cases.

Table

Estimated hazard ratios and 95% pointwise confidence intervals from separate Cox proportional hazard models using the carpal tunnel syndrome and strain index exposure data.

Exposure value | Linear | Logarithmic | Linear spline with knot at 13.5 | Penalized spline function |
---|---|---|---|---|
0.8 | 1.01 | 1.21 | 1.10 | 1.04 |
6.0 | 1.10 | 1.88 | 2.09 | 1.35 |
9.0 | 1.15 | 2.11 | 3.03 | 1.57 |
13.5 | 1.24 | 2.38 | 5.27 | 1.89 |
18.0 | 1.33 | 2.60 | 4.85 | 2.12 |
20.3 | 1.38 | 2.70 | 4.65 | 2.18 |
54.0 | 2.33 | 3.68 | 2.51 | 2.32 |

The analyses of the previous sections illustrate a typical modeling conundrum in that the models considered all give differing estimated hazard ratios. For the occupational cohort of the previous section, all examined models provide statistical evidence of elevated risk (or hazard) for carpal tunnel syndrome as SI exposure levels increase relative to unexposed. The linear spline model used by Garg et al. [

A visual representation of the effect size differences (and similarities) between models can be assessed using the

One caution when using the spline-based methods was highlighted in Tables

As an illustration, we simulated two new data sets using the simulation set-up of Section

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) for simulated data with 41 cases in 500 observations (a, b, c) and with 40 cases in 5000 observations (d, e, f) using linear, linear splines, and linear B-splines (a, d), cubic spline and cubic B-splines (b, e), and penalized splines (c, f).

Regression modeling often focuses on interpreting coefficient estimates. When exposure-response relationships are nonlinear and a nonparametric or smoothing method is used to estimate the relationship, the individual regression coefficients are generally not interpretable. However, these methods do provide interpretable effect size estimates: hazard ratios at specific exposures of interest. The methods illustrated here are easily adapted to include a time-varying exposure. They can also be applied to a covariate of interest that is not an exposure measure but some other quantitative covariate, such as a prognostic factor. In these situations, the reference value of

The hazard ratio for a given exposure

We use a basis expansion representation for

The coefficient estimates for the linear truncated power basis have a direct interpretation: each describes the estimated change in the slope of the exposure-response curve that occurs at a knot point. For instance, the estimated slope for exposures up to the first knot point of 3.0 corresponds to the coefficient
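The slope interpretation can be made concrete with a small numerical sketch. The coefficient values below are made up for illustration and are not estimates from the paper; only the knot locations come from the text. Because each truncated term adds to the slope once the exposure passes its knot, the slope within each segment is a running sum of the coefficients.

```r
knots <- c(3.0, 5.5, 8.3)               # knot values quoted in the text

# Made-up coefficients for x, (x - 3.0)+, (x - 5.5)+, (x - 8.3)+
beta <- c(0.20, -0.05, -0.04, -0.03)

# Slope of ln(HR) within each segment: cumulative sum of the coefficients
slopes <- cumsum(beta)

data.frame(segment = c("x <= 3.0", "3.0 < x <= 5.5",
                       "5.5 < x <= 8.3", "x > 8.3"),
           slope_ln_hr = slopes)
```

With these illustrative values the slope falls from 0.20 to 0.15, 0.11, and 0.08 across the four segments, which is the attenuating shape a linear spline can capture.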

The R script for creating the linear truncated power basis using knots at the quartiles of the case exposures is given in Appendix

The R script for creating the linear truncated power basis and fitting the corresponding Cox PH model is as follows:
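A minimal sketch of such a script is given here under stated assumptions; it is not the authors' original listing. The data frame `dat` and its columns `time`, `event`, and `x` are hypothetical names, and the data are simulated with made-up constants so that the example runs on its own.

```r
library(survival)

# Simulated cohort (hypothetical names and constants)
set.seed(3)
n <- 500
x <- runif(n, 0, 12)
t_event <- rexp(n, rate = 0.05 * exp(0.25 * x - 0.01 * x^2))
t_cens <- runif(n, 0, 20)
dat <- data.frame(time = pmin(t_event, t_cens),
                  event = as.integer(t_event <= t_cens),
                  x = x)

# Knots at the quartiles of the exposures among the cases
knots <- quantile(dat$x[dat$event == 1],
                  probs = c(0.25, 0.50, 0.75), names = FALSE)

# Linear truncated power basis: x and one truncated line per knot
dat$b1 <- dat$x
dat$b2 <- pmax(dat$x - knots[1], 0)
dat$b3 <- pmax(dat$x - knots[2], 0)
dat$b4 <- pmax(dat$x - knots[3], 0)

# Cox PH model with the basis functions as covariates
fit_lin <- coxph(Surv(time, event) ~ b1 + b2 + b3 + b4, data = dat)
summary(fit_lin)
```

Storing the basis columns in the data frame keeps the coefficient names simple, which helps when building contrasts for hazard ratios later.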

The R script for computing fitted values at each exposure value, their corresponding standard errors, pointwise 95% confidence intervals, and plotting the results is as follows:
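One way to carry out these computations, sketched here rather than reproduced from the paper, is to form the difference between the basis at each exposure and the basis at a reference exposure and apply it to the coefficient vector and its covariance matrix. The function name `ln_hr_ci`, the reference exposure of zero, and the input values are all assumptions; with a fitted model one would pass `coef(fit)` and `vcov(fit)`.

```r
# ln(HR) relative to a reference exposure with pointwise confidence intervals
# for a linear truncated power basis. `ln_hr_ci` is a hypothetical helper.
ln_hr_ci <- function(x_new, knots, beta, V, x_ref = 0, level = 0.95) {
  basis <- function(x) {
    cbind(x, matrix(sapply(knots, function(k) pmax(x - k, 0)),
                    nrow = length(x)))
  }
  # Contrast: basis at x_new minus basis at the reference exposure
  D <- basis(x_new) - basis(rep(x_ref, length(x_new)))
  est <- as.vector(D %*% beta)
  se <- sqrt(rowSums((D %*% V) * D))      # sqrt of diag(D V D')
  z <- qnorm(1 - (1 - level) / 2)
  data.frame(x = x_new, ln_hr = est, se = se, hr = exp(est),
             lower = exp(est - z * se), upper = exp(est + z * se))
}

# Made-up inputs, only to show the call
knots <- c(3.0, 5.5, 8.3)
beta <- c(0.20, -0.05, -0.04, -0.03)
V <- diag(c(0.004, 0.006, 0.006, 0.005))

res <- ln_hr_ci(x_new = seq(0.5, 12, by = 0.5), knots = knots,
                beta = beta, V = V)

# Estimated ln(HR) curve with pointwise 95% limits
matplot(res$x, cbind(res$ln_hr, log(res$lower), log(res$upper)),
        type = "l", lty = c(1, 2, 2), col = 1,
        xlab = "Exposure", ylab = "ln(HR)")
```

Working with the contrast rather than the fitted values alone matters because the variance of a hazard ratio relative to a reference exposure depends on the covariances between coefficients.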

The R script for fitting a penalized spline with the degrees of freedom selected using the AICc is below. It assumes the

The corresponding output from the

Formal tests can also be carried out for the truncated power basis methods. Because the truncated power bases include a linear term in their expansion, a test of nonlinearity corresponds to testing the null hypothesis H0:

The

The R code and corresponding output for the likelihood ratio test of nonlinearity in the linear truncated power basis expansion are as follows:
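A sketch of such a test, under assumptions and not the authors' original code or output, compares the model with only the linear term against the model with the full linear truncated power basis. The data are simulated with made-up constants and the variable names are hypothetical.

```r
library(survival)

# Simulated cohort (hypothetical names and constants)
set.seed(4)
n <- 500
x <- runif(n, 0, 12)
t_event <- rexp(n, rate = 0.05 * exp(0.25 * x - 0.01 * x^2))
t_cens <- runif(n, 0, 20)
dat <- data.frame(time = pmin(t_event, t_cens),
                  event = as.integer(t_event <= t_cens),
                  x = x)

knots <- quantile(dat$x[dat$event == 1],
                  probs = c(0.25, 0.50, 0.75), names = FALSE)
dat$b2 <- pmax(dat$x - knots[1], 0)
dat$b3 <- pmax(dat$x - knots[2], 0)
dat$b4 <- pmax(dat$x - knots[3], 0)

# Null model: linear term only. Alternative: linear term plus truncated terms.
fit_linear <- coxph(Surv(time, event) ~ x, data = dat)
fit_spline <- coxph(Surv(time, event) ~ x + b2 + b3 + b4, data = dat)

# Likelihood ratio statistic from the maximized partial log-likelihoods
lrt <- 2 * (fit_spline$loglik[2] - fit_linear$loglik[2])
df <- length(coef(fit_spline)) - length(coef(fit_linear))
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p = p_value)

# The same comparison using the anova method for nested coxph fits
anova(fit_linear, fit_spline)
```

A small p-value indicates that the truncated terms improve the fit beyond a single linear term, which is evidence of a nonlinear exposure-response relationship.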

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work was partially supported by the National Institute for Occupational Safety and Health under Grant nos. U01 OH07917 and R01 OH010474.