In this paper, we compare the agreement [
Cortisol regulates normal responses to stress, and is important for vascular reactivity, carbohydrate metabolism, and immune function [
Direct/analogue immunoassay (IA) methods for the measurement of FT4 are widely used and are controversial [
This study represents secondary analyses of previously existing data [
Samples for cortisol and FT4 were collected in plastic red top tubes (containing clot activator, Vacutainer, manufactured by Becton Dickinson, Franklin Lakes, NJ 07417) and allowed to clot for 20 minutes. The samples were then centrifuged at 4,000 rpm for 10 minutes, and the serum was separated and immediately stored at minus
Cortisol was measured on the DPC Immulite 1000 (Diagnostic Products Corporation, Los Angeles, CA) while FT4s were measured on the Dade RxL Dimension (Dade-Behring Diagnostics, Glasgow, DE).
Both the cortisol and FT4s were assayed as previously published in [
Data in each study cohort were analyzed separately using a Bland-Altman (BA) or means-difference plot for the IA-MSMS pairs of measurements. The “summary” of data in the BA plot is reflected in reference lines at zero, representing the ideal mean difference between the two measures, and the values one standard deviation (solid lines) and two standard deviations (short dashed lines) away from zero (long dashed line) on the
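The means-difference summary described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name `bland_altman_summary` and the sample values are hypothetical. The reference lines are placed at zero and at one and two standard deviations of the differences away from zero, following the description in the text.

```python
from statistics import mean, stdev


def bland_altman_summary(msms, ia):
    """Summarize paired measurements for a means-difference (Bland-Altman) plot.

    Differences are computed as MSMS - IA (Y axis); averages of the two
    methods form the X axis. Hypothetical helper, for illustration only.
    """
    if len(msms) != len(ia):
        raise ValueError("paired measurements must have equal length")
    differences = [m - i for m, i in zip(msms, ia)]
    averages = [(m + i) / 2 for m, i in zip(msms, ia)]
    sd = stdev(differences)  # sample standard deviation of the differences
    return {
        "averages": averages,
        "differences": differences,
        "mean_difference": mean(differences),
        "sd_difference": sd,
        # Reference lines centered on zero (the ideal mean difference).
        "one_sd_lines": (-sd, sd),
        "two_sd_lines": (-2 * sd, 2 * sd),
    }


# Hypothetical paired values, not data from this study.
msms_values = [10.0, 20.0, 30.0, 40.0]
ia_values = [9.0, 19.0, 27.0, 35.0]
summary = bland_altman_summary(msms_values, ia_values)
```

Plotting `summary["differences"]` against `summary["averages"]` with the reference lines overlaid gives the kind of display described here. A spread of differences that widens along the X axis is the visual signature of variance increasing with concentration.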
The data are shown first for cortisol values at baseline and at 30 and 60 minutes after Cortrosyn injection, and then for free T4 values sampled in the nonpregnant state and during the successive trimesters of pregnancy. Tables
Descriptive statistics for cortisol (Table
Cortisol (mcg/dl) over a 60 minute test period by MS-MS and IA
Time point | MSMS: mean (SD) | IA: mean (SD) |
---|---|---|
Cortisol, baseline | 9.498 (6.08) | 8.508 (4.99) |
Cortisol, 30 min | 21.982 (10.24) | 20.553 (8.96) |
Cortisol, 60 min | 25.758 (12.22) | 23.844 (10.74) |
Free T4 (FT4, ng/dl) over 3 trimesters of pregnancy, and in nonpregnant women. Note that agreement on reference ranges for FT4 in pregnant women is not universal. Kahric-Janicic et al. (2007) [
Cohort | MSMS: mean (SD), N | IA: mean (SD), N |
---|---|---|
FT4, trimester 1 | 1.125 (0.23), 59 | 1.071 (0.22), 61 |
FT4, trimester 2 | 0.915 (0.31), 36 | 0.795 (0.17), 42 |
FT4, trimester 3 | 0.863 (0.22), 26 | 0.875 (0.18), 35 |
FT4, nonpregnant women | 0.928 (0.26), 28 | 1.102 (0.25), 28 |
Figure
Bland-Altman plot of cortisol (mcg/dl) at
A clear trend in increasing variance is observed along the
Table
Pearson correlations of difference between methods versus MSMS alone, for Cortisol
Difference (MSMS-IA) | Cortisol by MSMS | Pearson r |
---|---|---|
Cortisol, baseline | baseline | 0.650 |
Cortisol, 30 min | 30 min | 0.494 |
Cortisol, 60 min | 60 min | 0.480 |
All Pearson correlation coefficients significant at
Figure
Bland-Altman plots of FT4 (ng/dl) in the first trimester (T1, 2A), second trimester (T2, 2B), and third trimester (T3, 2C), and in nonpregnant (NP, 2D) women. Difference (MSMS-IA) on Y axis and average value from IA and MSMS on X axis. (In the pregnant cohort, the N with both measurements ranges from 59 (T1) to 26 (T3); the nonpregnant cohort N is 28.) Reference line at zero and pairs of lines at
The patterns in the means-difference plots are not as clear cut for FT4 as for cortisol. In the first trimester (T1, Figure
Table
Pearson correlations of difference between methods versus MSMS alone, for free T4 (FT4). MSMS and (MSMS-IA) correlations reflect significant differences in the variances of the measurements by IA versus MSMS over time, and for nonpregnant women (NP) measured at one time only. (In the pregnant cohort, the N with both measurements ranges from 59 (T1) to 26 (T3); the nonpregnant cohort N is 28.)
Difference (MSMS-IA) | FT4 by MSMS | Pearson r |
---|---|---|
FT4, T1 | T1 | 0.598† |
FT4, T2 | T2 | 0.823† |
FT4, T3 | T3 | 0.723† |
FT4, NP | NP | 0.411† |
† Pearson correlation coefficient significant at
Strong, positive correlations were observed between the difference between the two methods and MSMS measurements of FT4 over the three trimesters and in the nonpregnant state, suggesting that the patterns reflected in Figures
This study of two methods to assay FT4 and cortisol over time showed nonlinear disagreement between analytes measured by immunoassay and tandem mass spectrometry. The differences were more dramatic for cortisol than for FT4, but significant correlation coefficients reflected “genuine” trends of increasing variance associated with increased analyte concentration, and over time, for both analytes. In these correlations, we treated MSMS as the standard measurement; IA results are derived through a mathematical formula in the direct/analogue methods used in the great majority (
Our results suggest significant variation (heteroscedasticity) and nonequivalence of these two methods. This could simply reflect poor reliability in IA for these analytes, but in the case of FT4 the variation in disagreement over trimesters may also have causes that themselves vary with pregnancy (e.g., heterophilic antibodies, changes in protein binding). Although the agreement was worse for cortisol, correlations between the difference in results from the two methods and the MSMS results were significant for both analytes and at each of the time points.
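The correlation check discussed here, between the method difference (MSMS-IA) and the MSMS value treated as the standard, can be illustrated with a short sketch. The function names and the sample values are hypothetical and are not taken from the study; the Pearson coefficient is computed directly from its definition so that the block needs only the standard library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def difference_vs_msms(msms, ia):
    """Correlate the difference (MSMS - IA) with the MSMS measurement."""
    differences = [m - i for m, i in zip(msms, ia)]
    return pearson_r(differences, msms)


# Hypothetical paired values in which disagreement grows with concentration.
msms_values = [10.0, 20.0, 30.0, 40.0]
ia_values = [9.0, 19.0, 27.0, 35.0]
r = difference_vs_msms(msms_values, ia_values)

# Hypothetical case where IA reads exactly 90% of MSMS: the difference is
# proportional to the MSMS value, so the correlation is essentially 1.
r_proportional = difference_vs_msms(msms_values, [0.9 * m for m in msms_values])
```

A coefficient near zero would be expected if the disagreement were unrelated to concentration. A clearly positive value, as in this toy example, indicates that the difference between methods grows as the MSMS value rises.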
The purpose of this work is not to provide correct/corrected values for the analytes in these populations but rather to highlight areas where interpretation/interpretability might be compromised by unreliability and/or failures of modeling assumptions (such as heteroscedasticity and time-sensitive variability). As noted earlier, this study sought to characterize the agreement and disagreement between the two measurement methods, not to correct for the degree of disagreement. The magnitudes of the differences our analyses discovered could be clinically relevant. For example, in the evaluation of adrenal insufficiency they could make the difference between concluding that someone has adrenal insufficiency or adrenal sufficiency. In this particular cohort of patients, 11.5% of those tested could be given a different diagnosis (adrenal insufficiency versus adrenal sufficiency) depending on which assay was used to make the diagnosis. Similarly, the differences between the thyroid hormone assays are clinically relevant, particularly in the pregnant population where the thyroid hormone concentration could be most relevant for fetal health [
Our results suggest that immunoassay and tandem mass spectrometry cannot be considered to yield interchangeable results. The methods did not agree and this disagreement became more extreme and less predictable at higher concentrations of the analytes we studied. As the true concentration becomes more extreme, so does the discrepancy between IA and MSMS results. The implication of this for clinicians is that patients with analyte values at the extremes are more likely to be misdiagnosed/mismeasured when IA is used (see [
Since the extreme observations are by definition rare outcomes, they might appear to be outliers and not contribute much (particularly in larger samples) to decrease the
Our results demonstrate good agreement between IA and MSMS in the concentration range of least interest—that is, at normal levels [
IA and MSMS methods have been compared elsewhere [
In conclusion, these analyses demonstrate statistically significant disagreement in the measurement of two analytes at levels outside of the reference range by two different assays. It is important for physicians who are making clinical decisions to be aware that the analyte value they are provided with varies depending on the assay used to generate the data. Clinicians may be surprised to discover that the clinical decision they reach may be affected by the assay employed. The mechanism of action of the MSMS assay relative to that of IA methods supports MSMS as the more specific and accurate assay (see also [
Dr. R. Tractenberg was supported in part by Grant M01RR1329 from the National Center for Research Resources and in part by Grant K01AG027172 from the National Institute on Aging. Dr. S. Soldin is partially supported by NIH GCRC Grant no. MO1-RR-020359, by Grant 1 U10HD45993-02 of the National Institute of Child Health and Human Development, and by Applied Biosystems/Sciex. Dr. J. Jonklaas is supported by National Center for Research Resources Grant K23 RR16524. The authors thank Dr. Niek Verwey for helpful comments on the paper.