Comparison of per cent predicted and percentile values for pulmonary function test interpretation

1University of Ottawa, The Ottawa Hospital, Ottawa, Ontario; 2University of Manitoba, Health Sciences Centre, Winnipeg, Manitoba; 3University Health Network, Toronto, Ontario
Correspondence: Dr Smita Pakhale, University of Ottawa, Division of Respiratory Medicine, The Ottawa Hospital, 501 Smyth Road, Ottawa, Ontario K1H 8L6. Telephone 613-737-8899 ext 79469, fax 613-739-6807, e-mail spakhale@ohri.ca

A pulmonary function test (PFT) is routinely used for the diagnosis and management of many pulmonary conditions. However, standardization of the different parameters in a PFT is a difficult task. The American Thoracic Society (ATS) and European Respiratory Society (ERS) have jointly published standards for interpretative strategies for lung function tests (1), spirometry (2), lung volumes (3) and diffusing capacity of the lung for carbon monoxide (DLco) (4). Despite standardization, there are many differences from laboratory to laboratory, mainly due to the different reference values used (5-9). The best reference values for a particular laboratory depend on its patient population. PFTs are commonly interpreted in comparison with predicted normal values, based on a patient's sex, height, age and race, with the observed value expressed as a per cent of predicted. With the per cent predicted method, abnormal PFTs have been defined as less than 80% or greater than 120% of the predicted value (10,11). However, more recent recommendations of the ATS and ERS suggest a percentile-based approach to interpret PFTs (1). The percentile-based approach defines an abnormal PFT result as less than the fifth percentile or greater than the 95th percentile (1). Although some studies have shown that for adults of average age and height, 80% of predicted is close to the fifth percentile, older and shorter adults are more likely to be classified as abnormal, and taller, younger adults are more likely to be classified as normal, using the per cent predicted approach.
This occurs because the scatter around the predicted value

Therefore, we explored the diagnostic test characteristics of the per cent predicted method, with the percentile method as a reference standard. We used a database of all PFTs performed on individuals older than 18 years of age at the Health Sciences Centre (Winnipeg, Manitoba) between January 2000 and July 2004, and applied both the commonly used Crapo et al (5-7) reference equations and the more recently published, Canadian, Gutierrez et al (12) reference equations.

METHODS
Data contained in the PFT database at the Health Sciences Centre, for the period between January 2000 and July 2004, were used. All PFTs were performed during standard work hours in a single pulmonary function laboratory using Collins Equipment (GS 4G and Body Box II, Warren E Collins Inc, USA). All PFTs were performed according to ATS standards for acceptability and reproducibility (13,14). Registered PFT laboratory technicians performed all PFTs. All patients referred to the laboratory underwent a series of tests including flow-volume loops, lung volumes by body plethysmography when possible and DLco using the single-breath technique. At the end of each day, all PFT results were stored in an Excel (Microsoft Corporation, USA) database format. Although all races were included in the study, data stratified by race were not available and, therefore, correction for race was not performed.
Only one PFT per subject was included in the data collected and analyzed. All subjects included were outpatients and/or inpatients at the Health Sciences Centre. Predicted values for forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio, total lung capacity (TLC), residual volume (RV) and DLco were calculated using the Crapo et al (5-7) reference equations and the more recently published, Canadian, Gutierrez et al (12) reference equations. For percentiles, patients' observed values were converted to standardized residuals: (Observed - Predicted)/RSD, in which RSD is the residual SD. If the standardized residual was less than or equal to -1.64, then the value was at or below the fifth percentile. The RSD was taken from the original papers in which the reference equations were published (5-7,12).
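The standardized residual calculation can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the numeric inputs are hypothetical, the cutoff of -1.64 for the fifth percentile assumes normally distributed residuals, and the mirrored cutoff of +1.64 for the 95th percentile is an assumption made here for symmetry. The last function shows why a fixed 80% threshold and the fifth percentile can diverge: with a constant RSD, the lower limit expressed as a per cent of predicted falls as the predicted value falls.

```python
Z_CUT = 1.64  # magnitude of the standardized residual at the 5th/95th percentile


def standardized_residual(observed, predicted, rsd):
    """(Observed - Predicted) / RSD, where RSD is the residual SD."""
    return (observed - predicted) / rsd


def classify_by_percentile(observed, predicted, rsd, z_cut=Z_CUT):
    """Label a value relative to the 5th and 95th percentiles.

    The upper rule (z >= +z_cut) is an assumed mirror image of the lower rule.
    """
    z = standardized_residual(observed, predicted, rsd)
    if z <= -z_cut:
        return "below_5th"
    if z >= z_cut:
        return "above_95th"
    return "normal"


def lower_limit_as_percent_predicted(predicted, rsd, z_cut=Z_CUT):
    """Fifth-percentile limit expressed as a per cent of the predicted value."""
    return 100.0 * (predicted - z_cut * rsd) / predicted


# Hypothetical values: same RSD (0.5 L), different predicted values.
print(classify_by_percentile(2.0, 3.0, 0.5))        # z = -2.0
print(lower_limit_as_percent_predicted(4.0, 0.5))   # larger predicted value
print(lower_limit_as_percent_predicted(2.0, 0.5))   # smaller predicted value
```

In this hypothetical case, the fifth percentile sits near 80% of predicted when the predicted value is 4.0 L but well below 80% when it is 2.0 L, which is the pattern that leads the per cent predicted method to label more subjects with small predicted values as abnormal.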
Using the per cent predicted method and the conventional criteria (10,11), a test was generally defined as abnormal if it was less than 80% or greater than 120% of predicted, with notable exceptions for RV (greater than 140% of predicted), DLco (less than 75% or greater than 125% of predicted) and FEV1/FVC (observed ratio less than 0.70) (Table 1). All of the PFTs were then reclassified using the percentile approach, in which an abnormal test was defined as less than the fifth or greater than the 95th percentile (based on ATS/ERS recommendations). The values for men and women were calculated separately.
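The conventional thresholds described above can be written as a small lookup, sketched here under stated assumptions. The dictionary and function names are hypothetical. The lower limit of 80% for RV is assumed from the general rule, because the text gives only the upper exception of 140%; Table 1 of the paper remains the authority.

```python
# (lower limit, upper limit) as per cent of predicted.
# The RV lower limit of 80% is an assumption taken from the general rule.
PERCENT_LIMITS = {
    "FEV1": (80.0, 120.0),
    "FVC": (80.0, 120.0),
    "TLC": (80.0, 120.0),
    "RV": (80.0, 140.0),
    "DLco": (75.0, 125.0),
}

RATIO_CUTOFF = 0.70  # FEV1/FVC is judged on the observed ratio, not per cent predicted


def classify_by_percent_predicted(parameter, observed, predicted=None):
    """Return 'reduced', 'increased' or 'normal' by the conventional criteria."""
    if parameter == "FEV1/FVC":
        return "reduced" if observed < RATIO_CUTOFF else "normal"
    lower, upper = PERCENT_LIMITS[parameter]
    percent = 100.0 * observed / predicted
    if percent < lower:
        return "reduced"
    if percent > upper:
        return "increased"
    return "normal"


# Hypothetical values for illustration only.
print(classify_by_percent_predicted("RV", 2.6, 2.0))      # 130% of predicted
print(classify_by_percent_predicted("FEV1/FVC", 0.65))    # observed ratio
```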
Using the percentile method as a reference standard, the diagnostic test characteristics of the per cent predicted method for all parameters of PFT were calculated. Sensitivity and specificity were considered to be suboptimal if they were below 90%. Agreement between the per cent predicted method and percentile method was then calculated as follows:

Agreement = (Number positive by both tests + Number negative by both tests)/Total number of tests
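As an illustration of these calculations, the sketch below derives sensitivity, specificity and agreement from paired classifications, treating the percentile method as the reference. The function name and the ten example results are hypothetical and are not study data.

```python
def diagnostic_characteristics(index_abnormal, reference_abnormal):
    """Compare paired boolean classifications (True means abnormal).

    index_abnormal: results of the per cent predicted method.
    reference_abnormal: results of the percentile method (reference standard).
    """
    pairs = list(zip(index_abnormal, reference_abnormal))
    tp = sum(1 for i, r in pairs if i and r)          # abnormal by both
    fp = sum(1 for i, r in pairs if i and not r)      # abnormal by index only
    fn = sum(1 for i, r in pairs if not i and r)      # abnormal by reference only
    tn = sum(1 for i, r in pairs if not i and not r)  # normal by both

    def ratio(numerator, denominator):
        # Undefined (None) when there are no cases in the denominator.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "agreement": ratio(tp + tn, len(pairs)),
    }


# Ten hypothetical tests.
per_cent_predicted = [True, True, False, False, True,
                      False, False, False, False, False]
percentile = [True, True, True, False, False,
              False, False, False, False, False]
result = diagnostic_characteristics(per_cent_predicted, percentile)
print(result)

# Flag values below the 90% threshold used in this study.
suboptimal = [name for name in ("sensitivity", "specificity")
              if result[name] is not None and result[name] < 0.90]
print(suboptimal)
```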
Separate analyses were performed in subgroups of patients that comprised the extremes of age and height, with respect to sex in each case. Extremes of age were defined as younger than 25 years and older than 70 years, whereas extremes of height were defined as below 152 cm and above 173 cm for women and above 185 cm for men. Extremes of age and height were defined somewhat arbitrarily and based on the number of available subjects in each category of the data set.
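The subgroup definitions can be expressed as a simple rule, sketched below. The function name and labels are hypothetical. Applying the 152 cm lower height limit to men as well as women is an assumption made here; the cohort is reported to contain no men shorter than 152 cm, so the choice does not affect these data.

```python
def extreme_subgroups(sex, age_years, height_cm):
    """Return the extreme-age and extreme-height labels that apply to a subject.

    sex is "F" or "M". The 152 cm lower limit is applied to both sexes,
    which is an assumption of this sketch.
    """
    labels = []
    if age_years < 25:
        labels.append("young")
    elif age_years > 70:
        labels.append("old")
    tall_cutoff_cm = 173 if sex == "F" else 185
    if height_cm < 152:
        labels.append("short")
    elif height_cm > tall_cutoff_cm:
        labels.append("tall")
    return labels


# Hypothetical subjects.
print(extreme_subgroups("F", 22, 175))
print(extreme_subgroups("M", 50, 180))
```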

RESULTS
Full PFTs, including lung volumes and DLco, of 2176 male and 1658 female subjects were analyzed using the Crapo et al (5-7) and Gutierrez et al (12) equations. The mean (± SD) age of the entire study cohort was 52±15 years (Table 2).
Tables 3 and 4 describe the classification of abnormal PFTs by the per cent predicted and the percentile methods using the Crapo and the Gutierrez equations for women and men, respectively. The Crapo equation classified female subjects as having decreased DLco three times more commonly than the Gutierrez equation, using both the per cent predicted and the percentile methods. Tables 5 to 7 demonstrate the diagnostic characteristics of the per cent predicted method using the percentile method as a gold standard, as suggested by the ATS/ERS guidelines, for women, men and for all PFTs combined, respectively. The specificity for a reduction in RV was suboptimal in women (83% to 86%). The sensitivity for an abnormal TLC and increased RV tended to be suboptimal in women (78% to 94%), although less so with Gutierrez than with Crapo predicted values. The performance of the per cent predicted method appeared to be much better in men, with suboptimal sensitivities for increased RV (82% to 90%), reduced DLco (Crapo, 88%) and increased TLC (Gutierrez, 85%). In men and women combined, the specificity for reduced RV (86% to 88%), sensitivity for increased RV (83% to 89%), and sensitivity for abnormal TLC or increased RV (83% to 89%) tended to be suboptimal by both Crapo and Gutierrez predicted values. However, only the Crapo predicted values yielded suboptimal sensitivity for reduced DLco (89%).
For women 18 to 25 years of age (n=96), suboptimal sensitivity (13% to 81%) was found for all parameters, except for increased RV by Crapo and abnormalities in RV by Gutierrez equations, as well as suboptimal specificity for reduced RV. For women older than 70 years of age (n=187), there was suboptimal specificity for FEV1, FVC, FEV1/FVC ratio and reduced DLco, and suboptimal sensitivity for increased RV and reduced TLC by both equations. For women less than 152 cm in height (n=101), there was suboptimal specificity for FEV1, FVC and reduced RV by both equations, reduced DLco by the Gutierrez equation, and suboptimal sensitivity for FEV1/FVC ratio by both equations. For women taller than 173 cm (n=93), there was suboptimal sensitivity for all the parameters except for reduced RV (which had suboptimal specificity) by both equations.
For men younger than 25 years of age (n=98), there was suboptimal sensitivity for all the parameters (except change in RV by both equations and FVC by Crapo's equations) and there was suboptimal specificity for reduced RV by both equations. For men older than 70 years of age (n=337), there was suboptimal specificity for FEV1, FVC and reduced DLco and suboptimal sensitivity for increased RV by both equations, and FEV1/FVC ratio by the Gutierrez equation. For men taller than 185 cm (n=168), there was suboptimal sensitivity for all the parameters, except FVC and reduced RV by Crapo and FEV1/FVC ratio, reduced RV (suboptimal specificity) and DLco by Gutierrez equations. The study cohort did not have any male subjects shorter than 152 cm.
The per cent agreement between the two tests, for all PFTs combined, using both equations, was more than 94% for all parameters except for reduced RV (88% to 89%). For women, the per cent agreement between the two tests was more than 94% for all parameters except for reduced RV (84%) and DLco (90%) using Crapo equations, and reduced RV (87%) using the Gutierrez equation. For men, the per cent agreement between the two tests was at least 93% for all parameters except DLco (92%) using the Crapo equation and reduced RV (87%) using the Gutierrez equation.

DISCUSSION
We compared the per cent predicted and percentile methods of PFT interpretation because the latter method has been recommended by the ATS/ERS guidelines (1). Moreover, there are no studies in the literature comparing the two methods for interpretation of lung volumes and DLco. There are only a few studies in the literature comparing the two methods for all parameters in spirometry (15,16) and FEV1/FVC ratio (17,18). In addition, the majority of the PFT laboratories in North America use the Crapo et al (5-7) or Morris (19) equations for calculating the predicted normal values. However, these equations are dated, having been derived from PFTs performed on older PFT machines. Therefore, we compared the two interpretation strategies using the newer, Canadian, Gutierrez et al equations and the older Crapo et al equations.
Although it is difficult to know whether classifying a single value as normal or abnormal makes a difference in the interpretation of a complete set of PFTs, we selected variables that we believe are of importance to clinicians in making diagnostic and therapeutic decisions. We selected a somewhat arbitrary threshold of acceptability for sensitivity and specificity (greater than 90%), but presented the actual values to allow clinicians to judge whether the degree of agreement was acceptable. We found that both methods, per cent predicted and percentile, were comparable except for lung volumes and DLco. In our population, the reduced specificity for a reduction in the RV of a woman translated into a positive predictive value of 31% to 36% and positive likelihood ratios of 6 to 7 (depending on the predictive equation used). In women, suboptimal sensitivity for restriction (reduction of TLC) by both methods of analysis, elevation of the TLC by Crapo's equation and an increased RV by Crapo's equation, translated into only moderate negative likelihood ratios that were generally between 0.15 and 0.22. In men, suboptimal sensitivity for increased RV by Crapo's equation and increased TLC by Gutierrez's equation translated into only moderate negative likelihood ratios of between 0.15 and 0.18. Hence, these measurements may best be corrected to percentiles. Our study supports the theoretical concern that the per cent predicted method is less accurate in subjects with extreme age or height. This concern should serve as a caution in the interpretation of PFTs in such subjects, in whom it may be advisable to use the percentile method to avoid diagnostic errors.
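The relationship between sensitivity, specificity, likelihood ratios and positive predictive value referred to above follows from standard definitions, sketched here. The function names are hypothetical, and the inputs (sensitivity 95%, specificity 85%, prevalence 5%) are illustrative values chosen for this example rather than figures taken from the tables.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    false_positive_rate = 1.0 - specificity
    # LR+ is unbounded when there are no false positives.
    lr_pos = (sensitivity / false_positive_rate
              if false_positive_rate > 0 else float("inf"))
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a test labelled abnormal is abnormal by the reference."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Illustrative inputs only.
lr_pos, lr_neg = likelihood_ratios(0.95, 0.85)
ppv = positive_predictive_value(0.95, 0.85, 0.05)
print(round(lr_pos, 2), round(lr_neg, 3), round(ppv, 2))
```

With these illustrative inputs, a specificity in the mid-80% range gives a positive likelihood ratio of about 6, and when the abnormality is uncommon by the reference standard the positive predictive value stays low even though sensitivity is high.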
In our study, the differences in both sexes combined appeared to be driven by the differences in the female subjects, as was observed by Aggarwal et al (16) for spirometry. Similar to our study, Aggarwal et al (16) demonstrated that subjects at extremes of age and height had discordant results with the per cent predicted method and percentile method.
Like Gutierrez et al (12), but to a far greater degree, we found that reduced DLco was diagnosed more often with the Crapo equation, and more so in female than in male subjects. The discrepancy in DLco observed in the study by Gutierrez et al (12) and in the present study raises concerns regarding the interpretation of an abnormal DLco measurement.

CONCLUSION
The results of the per cent predicted and percentile-based approaches to PFT interpretation are generally similar. As expected, however, subjects at the extremes of age and height were more likely to be misclassified using the per cent predicted method. In most subjects, the two methods of PFT interpretation may be used interchangeably for spirometry. For TLC, increased RV and DLco, there was suboptimal sensitivity, and for decreased RV there was suboptimal specificity. Inadequate sensitivity may lead to difficulties in detecting important disorders associated with a reduction in DLco (eg, emphysema, interstitial lung disease and pulmonary vascular disease). Caution is warranted in relying solely on per cent predicted methods when assessing lung volumes or DLco in all subjects, and most parameters in subjects at the extremes of age and height, for which it is likely best to correct these values using the percentile method. These results provide empirical evidence to support the ATS/ERS recommendation to use percentile-based interpretation of PFTs rather than the per cent predicted method.

TABLE 4
Percentage of abnormal pulmonary function tests determined by the per cent predicted and percentile methods using Crapo et al (5-7) and Gutierrez et al (12) equations for men (n=2176)
Columns: Conventional: Per cent predicted (Crapo/Gutierrez), %*; Reference standard: Percentile (Crapo/Gutierrez), %*
*Per cent predicted (reduced and increased) and the percentile (<5th and >95th) values are calculated using both the Crapo and the Gutierrez equations; they are reported here in the respective columns as Crapo/Gutierrez; †Abnormal ratio is <0.70 of actual FEV1/FVC. DLco Diffusing capacity of the lung for carbon monoxide; FEV1 Forced expiratory volume in 1 s; FVC Forced vital capacity; RV Residual volume; TLC Total lung capacity

TABLE 5
Diagnostic characteristics of the per cent predicted method, using the percentile method as a reference standard, using Crapo et al (5-7) and Gutierrez et al (12) equations for women (n=1658)
Columns: Parameter; Crapo et al; Gutierrez et al
↓ Below normal; ↑ Above normal; for all other parameters, only below normal values are presented. DLco Diffusing capacity of the lung for carbon monoxide; FEV1 Forced expiratory volume in 1 s; FVC Forced vital capacity; LR+ Positive likelihood ratio; LR- Negative likelihood ratio; NPV Negative predictive value; PPV Positive predictive value; RV Residual volume; Sen Sensitivity; Spe Specificity; TLC Total lung capacity

TABLE 6
Diagnostic characteristics of the per cent predicted method, using the percentile method as a reference standard, using Crapo et al (5-7) and Gutierrez et al (12) equations for men (n=2176)
Columns: Parameter; Crapo et al; Gutierrez et al
↓ Below normal; ↑ Above normal; for all other parameters, only below normal values are presented. DLco Diffusing capacity of the lung for carbon monoxide; FEV1 Forced expiratory volume in 1 s; FVC Forced vital capacity; LR+ Positive likelihood ratio; LR- Negative likelihood ratio; NPV Negative predictive value; PPV Positive predictive value; RV Residual volume; Sen Sensitivity; Spe Specificity; TLC Total lung capacity

TABLE 7
Diagnostic characteristics of the per cent predicted method, using the percentile method as a reference standard, using Crapo et al (5-7) and Gutierrez et al (12) equations for all pulmonary function tests (n=3834)
Columns, reported separately for Crapo et al and Gutierrez et al: Parameter; Sen, %; Spe, %; PPV, %; NPV, %; LR+; LR-
↓ Below normal; ↑ Above normal; for all other parameters, only below normal values are presented. DLco Diffusing capacity of the lung for carbon monoxide; FEV1 Forced expiratory volume in 1 s; FVC Forced vital capacity; LR+ Positive likelihood ratio; LR- Negative likelihood ratio; NPV Negative predictive value; PPV Positive predictive value; RV Residual volume; Sen Sensitivity; Spe Specificity; TLC Total lung capacity