Comparison of Semiparametric, Parametric, and Nonparametric ROC Analysis for Continuous Diagnostic Tests Using a Simulation Study and Acute Coronary Syndrome Data

We aimed to compare the performance of three different individual ROC methods (one from each of the broad categories of parametric, nonparametric and semiparametric analysis) for assessing continuous diagnostic tests: the binormal method as a parametric method, an empirical approach as a nonparametric method, and a semiparametric method using generalized linear models (GLM). We performed a simulation study with various sample sizes under normal, skewed, and monotone distributions. In the simulations, we used estimates of the ROC curve parameters a and b, estimates of the area under the curve (AUC), the standard errors and root mean square errors (RMSEs) of these estimates, and the 95% AUC confidence intervals for comparison. The three methodologies were also applied to an acute coronary syndrome dataset in which serum myoglobin levels were used as a biomarker for detecting acute coronary syndrome. The simulation and application studies suggest that the semiparametric ROC analysis using GLM is a reliable method when the distributions of the diagnostic test results are skewed and that it provides a smooth ROC curve for obtaining a unique cutoff value. A sample size of 50 is sufficient for applying the semiparametric ROC method.


Introduction
Receiver operating characteristic (ROC) curve analysis is used in many scientific fields, for example, signal detection theory and medicine, to determine the accuracy of a diagnostic test [1][2][3][4][5][6][7]. An ROC curve plots the true positive rate against the false positive rate across the possible cutoff values of the diagnostic test result. The most commonly used summary of accuracy is the area under the ROC curve (AUC). The AUC can take values between 0 and 1, and greater AUC values denote better accuracy [6].
The result of a diagnostic test may be binary, ordinal, or continuous. Most medical diagnostic tests, such as biomarkers for myocardial injury or cancer, have continuous test results [5,8]. The most common ROC analyses are nonparametric. Nonparametric ROC methods do not require any assumptions about the distributions of the diagnostic test results, but they do not provide a smooth ROC curve. Parametric methods, in contrast, assume that some function of the diagnostic test measurements is normally distributed in both the diseased and nondiseased populations, with different means, and they yield a smooth ROC curve. For comparison, we chose the empirical ROC method [9,10] as an example from the nonparametric category and the binormal ROC method, which has been popularized by Metz and other researchers [11][12][13], as an example from the parametric category. The empirical and binormal ROC methods are included in most major statistical packages and are the most popular methods within the nonparametric and parametric categories, respectively. The third alternative to these two traditional ROC categories is semiparametric ROC analysis. Semiparametric ROC methods do not make any distributional assumptions about the results of the diagnostic test and also yield a smooth ROC curve. The main difference between the semiparametric and nonparametric methods is that the semiparametric methods estimate ROC curve parameters, still without making any assumptions about the distribution of the diagnostic test results. Many semiparametric ROC approaches have been developed. For example, Gang et al. [14] developed a semiparametric ROC method using a nonparametric approach for the test result distribution of the nondiseased group and a parametric methodology for the test results of the diseased group. Cai and Moskowitz [15] proposed using two methods, profile likelihood and pseudomaximum likelihood, to estimate the ROC curve parameters.
In addition to these semiparametric approaches, Wan and Zhang [16] used a kernel distribution function estimator. In this study, Pepe's [17] application of the generalized linear model (GLM) to ROC curves was used as the semiparametric ROC method because of its highly efficient estimators. In this approach, inferences are made using binary regression techniques applied to indicator variables constructed from pairs of test results (one component from a diseased subject and the other from a nondiseased subject). ROC curve parameter estimates can be easily obtained using the GLM binary regression framework, and the effects of covariates can be evaluated. Thus, we chose this semiparametric method for our comparison.
In this study, our specific objectives were to compare the performance of three different individual ROC methods for assessing a continuous diagnostic test and to determine which method is efficient under which conditions. To achieve these goals, we generated simulated random datasets of various sample sizes from normal, skewed, and monotone distributions using SAS/IML and the GENMOD procedure in SAS 9.1. The three methods were also applied to an actual acute coronary syndrome dataset. For comparison, we used the estimates of the ROC curve parameters a and b, the AUC estimates, the standard errors of these estimates, the 95% AUC confidence intervals, and the root mean square errors (RMSEs) of the AUC estimates.

Materials and Methods
Let $Y$ denote a random variable representing a continuous diagnostic test result. The diagnosis according to any cutoff value $c$ is positive if $Y \ge c$ and negative if $Y < c$. Let $D_0$ and $D_1$ denote the nondiseased and diseased populations, respectively. The true and false positive rates at the cutoff value $c$, $TP(c)$ and $FP(c)$, are
$$TP(c) = P(Y \ge c \mid D_1), \qquad FP(c) = P(Y \ge c \mid D_0).$$
The ROC curve is denoted by
$$ROC(t) = F_1\left(F_0^{-1}(t)\right), \qquad t \in (0, 1),$$
where $TP(c) = F_1(c)$, $FP(c) = F_0(c)$, and $t$ ranges over all possible FP rates as $c$ varies in $(-\infty, \infty)$ [5].

Parametric ROC.
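In this study, the binormal method was used for the parametric ROC analysis. The main assumption of this method is that the results of the continuous diagnostic test in the diseased ($Y_1$) and nondiseased ($Y_0$) populations are normally distributed with different means:
$$Y_1 \sim N(\mu_1, \sigma_1^2), \qquad Y_0 \sim N(\mu_0, \sigma_0^2).$$
The ROC curve is modeled by the following function:
$$ROC(t) = \Phi\left(a + b\,\Phi^{-1}(t)\right),$$
where $\Phi$ is the cumulative normal distribution function, and
$$a = \frac{\mu_1 - \mu_0}{\sigma_1}, \qquad b = \frac{\sigma_0}{\sigma_1};$$
see [11][12][13].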
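The AUC equals the probability that a randomly selected diseased subject has a higher diagnostic test result than a randomly selected nondiseased subject:
$$AUC = P(Y_1 > Y_0) = \int_{-\infty}^{\infty} \Phi(a + bz)\,\varphi(z)\,dz = \Phi\left(\frac{a}{\sqrt{1 + b^2}}\right),$$
where $\varphi$ is the normal probability density function [5,12,13]. Thus, the estimates of $a$, $b$, and AUC (denoted by $\hat{a}$, $\hat{b}$, and $\widehat{AUC}$, respectively) are computed using the estimates of $\mu_1$, $\mu_0$, $\sigma_1$, and $\sigma_0$. The respective variances of $\hat{a}$ and $\hat{b}$ are
$$V(\hat{a}) = \frac{n_1(\hat{a}^2 + 2) + 2 n_0 \hat{b}^2}{2 n_0 n_1}, \qquad V(\hat{b}) = \frac{(n_1 + n_0)\,\hat{b}^2}{2 n_0 n_1},$$
where $n_1$ and $n_0$ are the numbers of diseased and nondiseased study subjects, respectively. The variance of $\widehat{AUC}$ can be derived using the delta method [6].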
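As an illustration of the binormal formulas, the following minimal Python sketch computes a, b, the AUC, and points on the ROC curve. It is not the authors' SAS implementation: the function names are hypothetical, and plugging in sample means and standard deviations for the population quantities is an assumption made for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def binormal_estimates(diseased, nondiseased):
    """Return (a, b, auc) using sample moments (an assumption of this sketch)."""
    mu1, sd1 = mean(diseased), stdev(diseased)
    mu0, sd0 = mean(nondiseased), stdev(nondiseased)
    a = (mu1 - mu0) / sd1          # a = (mu_1 - mu_0) / sigma_1
    b = sd0 / sd1                  # b = sigma_0 / sigma_1
    return a, b, binormal_auc(a, b)


def binormal_auc(a, b):
    """AUC = Phi(a / sqrt(1 + b^2))."""
    return STD.cdf(a / sqrt(1 + b * b))


def binormal_roc(a, b, t):
    """ROC(t) = Phi(a + b * Phi^{-1}(t)) for a false positive rate 0 < t < 1."""
    return STD.cdf(a + b * STD.inv_cdf(t))
```

With a = 1.400 and b = 0.900, `binormal_auc(1.4, 0.9)` evaluates to about 0.851, which agrees with the AUC of approximately 0.850 used in the simulation design.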

Nonparametric ROC.
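In our study, the empirical method was used for the nonparametric ROC analysis. This method is popular because it does not make any distributional assumptions about the diagnostic test measurements. In this approach, each possible diagnostic test result is considered as a cutoff value $c$, and the corresponding true and false positive rates are calculated by
$$\widehat{TP}(c) = \frac{s_1(c)}{n_1}, \qquad \widehat{FP}(c) = \frac{s_0(c)}{n_0},$$
where $s_1(c)$ is the number of subjects with test results greater than or equal to $c$ ($Y \ge c$) among the diseased subjects and $s_0(c)$ is the number of subjects with test results greater than or equal to $c$ ($Y \ge c$) among the nondiseased subjects. The ROC curve is subsequently created by connecting these points with straight lines [9,10]. The AUC of the nonparametric ROC curve is obtained using the trapezoidal rule and is estimated by
$$\widehat{AUC} = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0} \psi(Y_{i1}, Y_{j0}),$$
where
$$\psi(Y_{i1}, Y_{j0}) = \begin{cases} 1, & Y_{i1} > Y_{j0}, \\ 1/2, & Y_{i1} = Y_{j0}, \\ 0, & Y_{i1} < Y_{j0}, \end{cases}$$
and $Y_{i1}$ and $Y_{j0}$ are the diagnostic test results for the diseased and nondiseased individuals, respectively. The variance of the estimated AUC is computed using the Mann-Whitney statistic [10]:
$$V(\widehat{AUC}) = \frac{\widehat{AUC}\,(1 - \widehat{AUC}) + (n_1 - 1)\left(Q_1 - \widehat{AUC}^2\right) + (n_0 - 1)\left(Q_2 - \widehat{AUC}^2\right)}{n_1 n_0}.$$
$Q_1$ and $Q_2$ are defined as
$$Q_1 = \frac{1}{n_0 n_1^2} \sum_{y} n_0^{=y}\left[\left(n_1^{>y}\right)^2 + n_1^{>y}\, n_1^{=y} + \frac{1}{3}\left(n_1^{=y}\right)^2\right],$$
$$Q_2 = \frac{1}{n_1 n_0^2} \sum_{y} n_1^{=y}\left[\left(n_0^{<y}\right)^2 + n_0^{<y}\, n_0^{=y} + \frac{1}{3}\left(n_0^{=y}\right)^2\right],$$
where $n_0^{=y}$ is the number of true negative subjects with test results equal to $y$, $n_1^{=y}$ is the number of true positive subjects with test results equal to $y$, $n_0^{<y}$ is the number of true negative subjects with test results less than $y$, and $n_1^{>y}$ is the number of true positive subjects with test results greater than $y$ [10].
Computational and Mathematical Methods in Medicine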
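The empirical estimators can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the variance function assumes a commonly used count-based form of the Mann-Whitney variance built from the four counts defined in the text, which may differ in detail from the expression the authors used.

```python
def roc_points(diseased, nondiseased):
    """(FP, TP) at every observed cutoff c, calling a result positive if Y >= c."""
    n1, n0 = len(diseased), len(nondiseased)
    points = [(0.0, 0.0)]  # cutoff above every observation
    for c in sorted(set(diseased) | set(nondiseased), reverse=True):
        tp = sum(y >= c for y in diseased) / n1      # s_1(c) / n_1
        fp = sum(y >= c for y in nondiseased) / n0   # s_0(c) / n_0
        points.append((fp, tp))
    return points


def trapezoid_area(points):
    """Area under the straight-line segments joining consecutive ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def mann_whitney_auc(diseased, nondiseased):
    """Mean pair score: 1 if Y_i1 > Y_j0, 1/2 if tied, 0 otherwise."""
    score = sum(1.0 if y1 > y0 else 0.5 if y1 == y0 else 0.0
                for y1 in diseased for y0 in nondiseased)
    return score / (len(diseased) * len(nondiseased))


def auc_variance(diseased, nondiseased):
    """Count-based variance of the empirical AUC (assumed form, see lead-in)."""
    n1, n0 = len(diseased), len(nondiseased)
    auc = mann_whitney_auc(diseased, nondiseased)
    q1 = q2 = 0.0
    for y in set(diseased) | set(nondiseased):
        eq0 = sum(v == y for v in nondiseased)  # nondiseased results equal to y
        eq1 = sum(v == y for v in diseased)     # diseased results equal to y
        lt0 = sum(v < y for v in nondiseased)   # nondiseased results below y
        gt1 = sum(v > y for v in diseased)      # diseased results above y
        q1 += eq0 * (gt1 ** 2 + gt1 * eq1 + eq1 ** 2 / 3)
        q2 += eq1 * (lt0 ** 2 + lt0 * eq0 + eq0 ** 2 / 3)
    q1 /= n0 * n1 ** 2
    q2 /= n1 * n0 ** 2
    return (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
            + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0)
```

For any input, `trapezoid_area(roc_points(d, nd))` and `mann_whitney_auc(d, nd)` return the same value, which is the equivalence between the trapezoidal rule and the pairwise score that the text relies on.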

Semiparametric ROC.
The semiparametric ROC methods were represented by Pepe's [17] generalized linear model (GLM) approach. Like the nonparametric method, this approach does not make any distributional assumptions about the diagnostic test results; like the parametric method, it estimates the parameters $a$ and $b$ and the corresponding AUC. Therefore, this method can be described as a semiparametric ROC analysis. To implement the semiparametric ROC approach using the GLM, a binary indicator variable is defined by
$$U_{ij} = I[Y_{i1} \ge Y_{j0}]$$
for all $n_1 \times n_0$ possible pairs of diagnostic test results. Next, the false positive rates $t_j$ are calculated for all of the possible pairs using the test results of the nondiseased subjects. That is, for any pair $(Y_{i1}, Y_{j0})$, $t_j$ is obtained by
$$t_j = \widehat{FP}(Y_{j0}) = \frac{1}{n_0} \sum_{k=1}^{n_0} I[Y_{k0} \ge Y_{j0}].$$
The ROC curve is constructed parametrically as
$$g\left(ROC_\beta(t)\right) = \sum_{k=1}^{K} \beta_k h_k(t),$$
where $g$ is the specified link function, $h_1, \ldots, h_K$ are basis functions, and $\beta_1, \ldots, \beta_K$ are unknown parameters. Applying GLM procedures, a linear model can be derived by using the expectation of the binary variable $U_{ij}$ and the functions of $t_j$. This model is defined as
$$g\left(E[U_{ij}]\right) = \sum_{k=1}^{K} \beta_k h_k(t_j).$$
If the probit link function $\Phi^{-1}$ is used, and $h_1(t_j) = 1$ and $h_2(t_j) = \Phi^{-1}(t_j)$, the linear model is denoted by
$$\Phi^{-1}\left(E[U_{ij}]\right) = \beta_1 + \beta_2\, \Phi^{-1}(t_j);$$
see [17]. We used the probit link function because our aim was to construct the ROC model with the same parametric form as the parametric method while estimating the parameters without making any assumptions about the diagnostic test results, so that the methods could be compared. The parameter estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are calculated using the generalized linear model binary regression framework and can be used for $a$ and $b$, respectively. The AUC for the semiparametric model is estimated using $\hat{\beta}_1$ and $\hat{\beta}_2$:
$$\widehat{AUC} = \Phi\left(\frac{\hat{\beta}_1}{\sqrt{1 + \hat{\beta}_2^2}}\right).$$
The variances of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\widehat{AUC}$ are computed using bootstrap techniques [17,18].
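The pairwise construction and the probit regression can be illustrated with a self-contained Python sketch. Several choices here are assumptions rather than part of the original method description: the function names are hypothetical, pairs with t_j = 1 are skipped because the probit of 1 is infinite, and the regression is fitted with a plain Fisher scoring loop instead of the SAS GENMOD procedure. Bootstrap variance estimation is not shown.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}, pdf is phi


def build_pairs(diseased, nondiseased):
    """Return x = Phi^{-1}(t_j) and u = U_ij for every usable (i, j) pair."""
    n0 = len(nondiseased)
    x, u = [], []
    for y0 in nondiseased:
        t = sum(v >= y0 for v in nondiseased) / n0  # empirical FP rate at cutoff y0
        if t >= 1.0:
            continue  # assumption: skip t_j = 1, where Phi^{-1}(t_j) is infinite
        xj = STD.inv_cdf(t)
        for y1 in diseased:
            x.append(xj)
            u.append(1.0 if y1 >= y0 else 0.0)  # U_ij = I[Y_i1 >= Y_j0]
    return x, u


def fit_probit(x, u, max_iter=50, tol=1e-8):
    """Fisher scoring for Phi^{-1}(E[U]) = b1 + b2 * x; returns (b1, b2)."""
    b1, b2 = 0.0, 1.0  # starting values chosen for this sketch
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, ui in zip(x, u):
            eta = b1 + b2 * xi
            p = min(max(STD.cdf(eta), 1e-10), 1 - 1e-10)  # keep weights finite
            d = max(STD.pdf(eta), 1e-10)
            w = d * d / (p * (1 - p))   # working weight
            z = eta + (ui - p) / d      # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        det = sw * swxx - swx * swx
        new_b1 = (swxx * swz - swx * swxz) / det
        new_b2 = (sw * swxz - swx * swz) / det
        converged = abs(new_b1 - b1) + abs(new_b2 - b2) < tol
        b1, b2 = new_b1, new_b2
        if converged:
            break
    return b1, b2


def semiparametric_roc(diseased, nondiseased):
    """Return (a, b, auc) with a = beta_1, b = beta_2, auc = Phi(a / sqrt(1 + b^2))."""
    x, u = build_pairs(diseased, nondiseased)
    a, b = fit_probit(x, u)
    return a, b, STD.cdf(a / sqrt(1 + b * b))
```

Because every nondiseased result is paired with every diseased result, the regression runs on roughly n_1 × n_0 rows, so the pure-Python loop is only practical for modest sample sizes.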

Simulation Algorithm.
To compare the performances of the parametric, nonparametric, and semiparametric ROC methods, we generated continuous datasets from the normal, lognormal, and uniform distributions and applied the following simulation steps.
(1) The normally distributed diagnostic test results were generated from normal distributions for both the diseased and nondiseased subjects (specifically $Y_1 \sim N(a/b, 1/b)$ and $Y_0 \sim N(0, 1)$, where $a = 1.400$ and $b = 0.900$, with the corresponding $AUC \cong 0.850$). Next, the three ROC methods were applied to this dataset, and the parameter estimates and AUCs (with their standard errors, RMSEs, and 95% confidence intervals) obtained from the methods were recorded.
(2) To represent diagnostic test results from a skewed distribution, a dataset was generated from the lognormal distribution for both the diseased and nondiseased subjects. As in step (1), the methods were applied to this dataset, and the parameter estimates and AUCs (with their standard errors, RMSEs, and 95% confidence intervals) were recorded.
(3) To represent diagnostic test results from a monotone distribution, datasets for both the diseased and nondiseased subjects were generated from the uniform distribution $U(l, r)$, where $l$ is the left location parameter and $r$ is the right location parameter, with $a = 1.400$ and $b = 0.900$ and the corresponding $AUC \cong 0.850$. As in steps (1) and (2), the methods were applied to this dataset, and the parameter estimates and AUCs (with their standard errors, RMSEs, and 95% confidence intervals) were recorded.
(4) The first three steps were independently replicated 1000 times. Thus, 1000 different parameter estimates and AUCs with their standard errors, RMSEs, and 95% confidence intervals were obtained for each method and each diagnostic test result distribution.
(5) The means of the 1000 different parameter estimates and AUCs with their standard errors, RMSEs, and 95% confidence intervals were calculated.
The three ROC methods were compared by evaluating how close the means of the parameter estimates were to the values determined for a, b, and AUC.
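The replication loop in steps (1), (2), (4), and (5) can be sketched as follows. This is a simplified Python illustration, not the SAS/IML program used in the study, and it rests on labeled assumptions: 1/b is treated as the standard deviation of the diseased group, the two groups have equal size, the skewed scenario is obtained by exponentiating the normal draws, the standard error is taken as the standard deviation of the estimates across replicates, and only 200 replicates are run by default to keep the example fast.

```python
import random
from math import exp, sqrt
from statistics import NormalDist, mean, stdev

A, B = 1.400, 0.900
TRUE_AUC = NormalDist().cdf(A / sqrt(1 + B * B))  # about 0.851


def binormal_auc(d, nd):
    """Parametric AUC from sample means and standard deviations."""
    a = (mean(d) - mean(nd)) / stdev(d)
    b = stdev(nd) / stdev(d)
    return NormalDist().cdf(a / sqrt(1 + b * b))


def empirical_auc(d, nd):
    """Nonparametric AUC: pairs score 1 for a win and 1/2 for a tie."""
    wins = sum((y1 > y0) + 0.5 * (y1 == y0) for y1 in d for y0 in nd)
    return wins / (len(d) * len(nd))


def simulate(method, n_per_group=50, replicates=200, skewed=False, seed=1):
    """Return (mean estimate, SD across replicates, RMSE) for one AUC method."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        # assumption: 1/B is used as the standard deviation of the diseased group
        d = [rng.gauss(A / B, 1 / B) for _ in range(n_per_group)]
        nd = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if skewed:
            # assumption: lognormal data are the exponentiated normal draws
            d, nd = [exp(y) for y in d], [exp(y) for y in nd]
        estimates.append(method(d, nd))
    rmse = sqrt(mean((e - TRUE_AUC) ** 2 for e in estimates))
    return mean(estimates), stdev(estimates), rmse
```

For example, `simulate(binormal_auc, skewed=True)` and `simulate(empirical_auc, skewed=True)` can be compared against `TRUE_AUC` to see how each estimator behaves when the normality assumption fails; the empirical AUC depends only on the ordering of the results, so it is unaffected by the exponential transformation.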

Application Data.
The data set consisted of 62 patients who had been diagnosed with non-ST elevation acute coronary syndrome (NSTE-ACS) on the basis of an acute chest pain episode and electrocardiographic changes manifested by ST depressions or T wave inversions within 12 h of the symptom onset. The levels of cardiac troponin-I (cTnI) and the MB isoenzyme of creatine kinase (CK-MB) were measured at the time of emergency department arrival. A single test for myoglobin was obtained if the cTnI level was elevated. A non-ACS group consisted of 20 subjects who had atypical chest pain with normal cTnI and normal CK-MB levels. Myoglobin levels were obtained from both the NSTE-ACS and non-ACS groups. Figure 1 shows the distribution plot of the myoglobin levels for the NSTE-ACS and non-ACS groups. The parametric, nonparametric, and semiparametric ROC analyses were applied to the data set, with the myoglobin levels serving as a biomarker for detecting ACS. Next, the results of the three ROC methods were compared. The study was approved by the Eskisehir Osmangazi University School of Medicine Ethics Committee, and the data set was collected between October 4, 2004 and September 4, 2005.

Results and Discussion
Table 1 shows the simulation results when the distributions of the continuous diagnostic test measurements were normal in both the diseased and nondiseased subjects. As the total sample size increased (especially beyond 50), both the parametric and semiparametric methods provided parameter estimates with negligible bias and similar standard errors. Additionally, the three methods gave similar AUC estimates with negligible bias. The standard errors, RMSEs, and 95% confidence intervals for the AUCs were similar across the methods, with negligible differences at larger sample sizes.
Table 2 shows the simulation results when the diagnostic test results followed a skewed distribution in the diseased and nondiseased subjects. The parametric method yielded biased parameter and AUC estimates at each sample size. However, the semiparametric method provided parameter and AUC estimates with negligible bias as the sample size increased. The nonparametric method produced negligible AUC bias at each sample size. As the sample size increased, the nonparametric and semiparametric AUC estimates and their standard errors became similar, and the two methods had similar RMSEs for the AUC estimates at each sample size. For small sample sizes, the 95% AUC confidence intervals from the nonparametric method were narrower than those from the semiparametric method.
The diagnostic test simulation results using a monotone distribution for the diseased and nondiseased subjects are shown in Table 3. These results indicate that the parametric method gave parameter estimates with negligible bias for each sample size. However, the semiparametric method provided estimates for the a parameter with negligible bias for smaller samples and larger bias for larger samples. The standard errors of the parameter estimates were similar at larger sample sizes for the parametric and semiparametric methods. The parametric method provided less biased AUC estimates at each sample size than did the semiparametric and nonparametric methods. The AUC estimates of each method had similar standard errors. However, the parametric method had a smaller RMSE for the AUC estimates than the other two ROC methods. Additionally, the parametric method had narrower 95% AUC confidence intervals. Table 4 indicates that the myoglobin levels are skewed and nonnormally distributed for each group. Table 5 shows the results of applying the three ROC methods to the ACS data set. These results were similar to the results in Table 2. The ROC curves obtained from the ACS dataset for each method are shown in Figure 2.
Table 1: Means of the parameter estimates and AUCs with their standard errors, RMSEs, and 95% confidence intervals (CI) obtained from the parametric, semiparametric, and nonparametric ROC methods using various sample sizes from 1000 simulated data sets generated from the normal distribution.
The semiparametric ROC method is an alternative to the traditional parametric and nonparametric ROC methods. The parametric method is restricted by its distributional assumption: the diagnostic test results, or some transformation of them, must be normally distributed. The nonparametric method, on the other hand, has the disadvantage that it does not yield a smooth curve, especially in small samples. The semiparametric method makes no assumption about the distribution of the diagnostic test results and still yields a smooth curve. For this reason, the performance of the semiparametric method relative to the other two methods was investigated in this study. This paper argues that semiparametric ROC analysis using GLM is a reliable method that can be used instead of parametric and nonparametric ROC methods for continuous diagnostic test results with skewed distributions and sample sizes greater than 50. Additionally, the semiparametric ROC analysis yields a smooth ROC curve, which is important when determining a unique optimal cutoff value.
As shown in the results of the simulation and application studies, the parametric method yielded unreliable, biased, and inconsistent parameter and AUC estimates when the distribution of the diagnostic test results was skewed rather than normal. We can conclude that applying [...]
Table 3: Means of the parameter estimates and AUCs with their standard errors, RMSEs, and 95% confidence intervals (CI) obtained from the parametric, semiparametric, and nonparametric ROC methods using various sample sizes from 1000 simulated data sets generated from the uniform distribution.
Additionally, determining a unique optimal cutoff value for a diagnostic test using a jagged ROC curve is notably difficult in real clinical applications [7,18]. A smooth nonparametric estimate of the ROC curve can be obtained by applying kernel smoothing, and it was demonstrated that the smooth nonparametric ROC curve is superior to the jagged ROC curve in terms of deficiency [19][20][21]. Although this estimator is smooth and robust, the approach is not as efficient as other nonparametric ROC methods [16].
We chose the semiparametric GLM model proposed by Pepe [17] and Alonzo and Pepe [18] for the comparison because its estimator has high statistical efficiency [15,17,18]. The simulation and application studies demonstrated that the semiparametric GLM method provided reliable, unbiased, and consistent estimates for the parameters and AUC when the sample size was over 50. Additionally, it yielded a smooth ROC curve. This result was also confirmed by the application study.

Conclusions
We propose using the semiparametric GLM ROC method to assess the accuracy of a continuous diagnostic test if the test results have a skewed distribution. The robust estimators of this method provide a smooth ROC curve, which is important when determining the optimal cutoff value. The model also has the advantage of being easy to implement in certain statistical packages. If the results of a continuous diagnostic test have a strictly normal or monotone distribution in both the diseased and nondiseased groups, however, the parametric method should be used. Alternatively, the semiparametric ROC method can be used for large sample sizes with normally distributed diagnostic test results. The parametric method is unreliable under other circumstances, even when the data are nearly normal. In this situation, determining the optimal cutoff value is best achieved using the semiparametric model because it has a smooth ROC curve. When applying the semiparametric ROC method, a total sample size of 50 is adequate for obtaining reliable unbiased estimates and a smooth ROC curve.
This study has some limitations. The main limitation is that the simulations were performed using only continuous diagnostic test results; the comparisons could be extended to include ranked data as diagnostic test results. Campbell and Ratnaparkhi [22] used the Lomax distribution as a model for rating data in ROC analysis. Metz et al. [23] proposed algorithms that use truth-state runs in ranked data in ROC analysis. These models may work well for nonnormal data. Future research should include these models in the comparisons.