Diagnostic Accuracy of Procalcitonin Compared to C-Reactive Protein and Interleukin 6 in Recognizing Gram-Negative Bloodstream Infection: A Meta-Analytic Study

Objective. Gram-negative bloodstream infections (GNBSIs), especially those caused by antibiotic-resistant species, have become a public health challenge. Procalcitonin (PCT) has shown promise in the early diagnosis of GNBSI; however, little is known about its performance under different clinical settings. We systematically assessed the diagnostic accuracy of PCT in recognizing GNBSI and made direct comparisons with C-reactive protein (CRP) and interleukin 6 (IL-6). Methods. PubMed, Embase, ISI Web of Knowledge, and Scopus were searched from inception to March 15th, 2019. The area under the summary receiver operating characteristic curve (AUC), pooled sensitivity, specificity, and diagnostic odds ratio (DOR) were calculated. A hierarchical summary receiver operating characteristic (HSROC) model was used to investigate heterogeneity and to compare markers. Results. Twenty-five studies incorporating 50933 suspected BSI episodes were included. Pooled sensitivity and specificity for PCT were 0.71 and 0.76, respectively. The overall AUC was 0.80. The lowest AUCs were found in patients with febrile neutropenia (0.69) and hematological malignancy (0.69). The highest AUC was found in groups using electrochemiluminescence immunoassay (0.87). In direct comparisons, PCT showed better overall performance than CRP, with an AUC of 0.85 (95% CI 0.81–0.87) for PCT and 0.78 (95% CI 0.74–0.81) for CRP, but the relative DORs varied with thresholds between PCT and CRP (p < 0.001). No significant difference was found in either threshold or accuracy (p = 0.480 for accuracy) between PCT and IL-6. Conclusions. PCT was helpful in recognizing GNBSI, but the test results should be interpreted carefully with knowledge of patients' medical condition and should not serve as the only criterion for GNBSI. Further prospective studies are warranted for comparisons between different clinical settings.


Introduction
Gram-negative bloodstream infection (GNBSI) is a common type of bacterial infection and also the leading cause of septic shock [1]. Missed identification of GNBSI delays treatment, increasing the risk of disability and mortality. On the other hand, the overuse of antibiotic agents in patients without GNBSI usually leads to antibiotic resistance. GNBSI caused by antibiotic-resistant species has become a public health challenge with substantial morbidity and mortality [2,3]. Therefore, early diagnosis of GNBSI is crucial for disease management. Blood culture is the gold standard in identifying causative pathogens for bloodstream infection (BSI); however, standard incubation processes would take nearly 5 days and false negatives often occur [4]. Though advanced techniques were proposed for pathogen identification, including high-throughput polymerase chain reaction (PCR), microarray-based assays, and matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS), their demands for skills and equipment were too strict to be widely satisfied, especially in less developed regions [5][6][7].
Procalcitonin (PCT), a 116-amino-acid peptide biomarker, has been extensively investigated for differentiating bacterial infection from systemic inflammatory response syndrome (SIRS) of noninfectious origin [8]. Recent studies suggested that a highly elevated blood PCT level was associated with Gram-negative infection [9]. In healthy volunteers, PCT was found to increase within 4 hours after the injection of endotoxin, a specific pathogenic factor of Gram-negative bacteria, and to fall rapidly during recovery [10]. This feature makes PCT a promising candidate for early identification of GNBSI, with further potential in guiding antibiotic treatment. Some studies have compared PCT with other markers that also show potential in recognizing GNBSI, e.g., C-reactive protein (CRP) and interleukin 6 (IL-6) [9]. However, the results of these comparisons were inconsistent, and the patients' medical conditions varied greatly between studies [9].
So far, the value of PCT in early identification of GNBSI is still debated and is poorly covered in guidelines [11]. Two meta-analyses on this topic have been published, but their clinical utility was limited either by insufficient investigation of the underlying heterogeneity or by not examining the appropriate diagnostic indices [12,13]. Therefore, we systematically assessed the diagnostic accuracy of PCT in recognizing GNBSI in patients with suspected BSI and examined the factors associated with threshold and diagnostic accuracy. We also made direct comparisons between PCT and other markers showing potential in recognizing GNBSI, including CRP and IL-6.

Materials and Methods
This meta-analysis was conducted in accordance with the Cochrane Collaboration's Diagnosis Test Accuracy Working Group protocol [14]. Findings were reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline (Supplementary Table S1) [15]. The protocol was registered with the PROSPERO database (registration number CRD42018092664).

Search Strategy and Selection Criteria.
Databases including PubMed, Embase, ISI Web of Knowledge, and Scopus were searched from inception to March 15th, 2019. The searched MeSH terms (for Medline), EMTREE terms (for Embase), and text words (for others) were "(procalcitonin OR PCT) AND (bloodstream infection OR BSI OR bacteremia) AND (sensitivity OR specificity OR diagnose OR predict) AND Gram negative". Reference lists of previous reviews and included original articles were also checked.
Studies were independently reviewed by two investigators (YL and NZ). Eligible studies had to (1) assess the diagnostic accuracy of PCT in recognizing GNBSI in a context of suspected bloodstream infection (BSI), (2) provide a clear culture result, and (3) be written in English. The exclusion criteria were (1) animal experiments, reviews, case reports, conference abstracts, and expert opinions; (2) information insufficient for calculating the number of true positives, false positives, false negatives, and true negatives; (3) analysis with mixed culture results; and (4) case-control studies with healthy controls. When markers are compared, heterogeneity in the estimated accuracy of a diagnostic test across studies is likely to occur and would confound the comparison. Therefore, in comparing the performance between markers, we only included studies that made a direct comparison of the tests of interest, either by applying both tests to each individual or by randomizing each individual to receive one of the tests [14].

Data Extraction.
Two investigators independently extracted the following data: author, year, region, assay methods for PCT, cutoffs, study design, settings, true positives, false positives, false negatives, and true negatives. Since there were no established criteria for the optimal cutoff in this diagnostic theme and the proposed optimal cutoff varied greatly between studies, we extracted the data with the highest Youden's index if multiple cutoffs were presented in a study for the index test. We referred to the corresponding authors if further information was needed.
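The cutoff selection rule described above can be illustrated with a minimal sketch. It assumes each candidate cutoff in a study is reported with its own 2x2 counts; the function names and all numbers below are hypothetical and are not taken from any included study. Youden's index is computed as sensitivity + specificity − 1, and the cutoff with the highest value is kept.

```python
def youden_index(tp, fp, fn, tn):
    """Youden's index J = sensitivity + specificity - 1 for one 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cutoff(candidates):
    """Return the (cutoff, tp, fp, fn, tn) row with the highest Youden's index."""
    return max(candidates, key=lambda row: youden_index(*row[1:]))


# Hypothetical cutoffs (ng/mL) with hypothetical counts: (cutoff, TP, FP, FN, TN)
reported = [
    (0.5, 90, 60, 10, 40),
    (2.0, 75, 30, 25, 70),
    (10.0, 40, 5, 60, 95),
]

chosen = best_cutoff(reported)
print(chosen[0])  # cutoff whose 2x2 table would be extracted
```

In this illustration the low cutoff favors sensitivity and the high cutoff favors specificity, so the intermediate cutoff is the one extracted.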

Quality Assessment.
Methodological quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) [14]. Modifications and redefinitions were made to the rules in the QUADAS-2 tool as described in Supplementary Tables S2 and S3. The assessment was performed independently by two authors (YL and HW). Discrepancies were resolved in a consensus meeting.

Statistical Analysis.
Bivariate mixed-effects regression model was used to calculate the pooled estimates of sensitivity, specificity, and diagnostic odds ratio (DOR) with their standard errors and 95% CIs. Hierarchical summary receiver operating characteristic (HSROC) curves were constructed to assess the overall diagnostic performance. The area under the summary receiver operating characteristic curve (AUC) was used to reflect the overall predictive power. The unit of the primary analysis of this review is a suspected BSI episode. As the optimal cutoffs varied greatly from 0.291 ng/mL to 16.9 ng/mL among the included studies, we used the scatter of points and prediction ellipse to depict the observed heterogeneity graphically [14].
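As a rough illustration of the indices named above, the following sketch computes sensitivity, specificity, and the DOR with a 95% CI from a single 2x2 table. It is a per-study calculation with a standard log-scale (Woolf-type) interval, not the bivariate mixed-effects pooling used in this review; the function name and the counts are hypothetical, and the sketch assumes that no cell is zero.

```python
import math


def accuracy_indices(tp, fp, fn, tn, z=1.96):
    """Per-study sensitivity, specificity, and DOR with a log-scale 95% CI.

    Assumes every cell count is positive (no continuity correction is applied).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)
    # Standard error of log(DOR) from the four cell counts
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log_dor)
    upper = math.exp(math.log(dor) + z * se_log_dor)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": (lower, upper),
    }


# Hypothetical study: 100 GNBSI episodes and 100 non-GNBSI episodes
result = accuracy_indices(tp=71, fp=24, fn=29, tn=76)
print(result)
```

A pooled DOR from a bivariate model need not equal the value obtained by plugging pooled sensitivity and specificity into this formula, because the model accounts for between-study variation and the correlation between the two measures.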
The direct comparisons were implemented with the Rutter and Gatsonis HSROC model. We also explored the effect of covariates on heterogeneity in test thresholds (or cutoff values) and diagnostic accuracy with this model [14]. In metaregression, a p value based on the likelihood ratio χ² statistic was calculated. The χ² statistic is computed as the change in the −2 log-likelihood when a covariate is added to (or removed from) the logistic regression model. When statistical significance is found in the test threshold between two or three conditions of a certain covariate, it is suggested that the SROC curves of these conditions have different shapes and that the ratio of diagnostic odds ratios (rDOR) will not be constant along the entire length of the curve, which means that the relative diagnostic accuracy under these different conditions varies with thresholds [14]. If no statistical significance was found in the test threshold, then the HSROC model could be further simplified by removing the parameters for threshold, leaving only parameters for accuracy [14].

Results
As shown in Figure 1, the search retrieved 1003 records. After screening titles and abstracts, 131 full-text articles were assessed and 25 were included. If PCT was used to discriminate GNBSI from two different types of BSI with overlapping populations in a study, the dataset with the largest sample size was adopted [26,27,36].

Figure 1: Study selection. * Beyond the topic of this review: once the article types were qualified, studies were further checked for their topic; ineligible studies were excluded.

Quality Assessment.
The quality assessment of the overall and individual datasets according to our tailored QUADAS-2 checklist in four domains ("patient selection," "index test," "reference standard," and "flow and timing") is summarized in Supplementary Figures S1 and S2. All included studies used blood culture as the reference standard for diagnosis of GNBSI.
In general, the included studies showed moderate (without high-risk items) risk of bias in three of the four domains and high applicability, but high risk of bias in "index test" domain was found in 11 studies [17-23, 28, 30, 31, 33]. The high risks of bias were mainly caused by using a data-driven method, namely ROC analysis, for calculation of optimal cutoff in a relatively small number of patients [42].

Diagnostic Accuracy of PCT.
For recognizing GNBSI in a context of BSI, the median optimal cutoff value of PCT was 1.3 (IQR 0.5-8.06) ng/mL, the pooled sensitivity and specificity were 0.71 (95% CI 0.66-0.76) and 0.76 (95% CI 0.71-0.80) (Figure 2), respectively, and the pooled DOR was 7.60 (95% CI 5.51-10.48) (Supplementary Figure S3). The value of AUC was 0.80 (95% CI 0.76-0.83) (Figure 3). As substantial heterogeneity was indicated by the scatter of points and prediction ellipse, we further conducted subgroup and metaregression analysis. In the subgroup analysis, the lowest values of AUC were found in patients with febrile neutropenia (0.69) and hematological malignancy (0.69), and the highest value of AUC was found in groups using electrochemiluminescence immunoassay (ECLIA) (0.87). The lowest sensitivity was found in patients with hematological malignancy (0.52); the highest sensitivity was found in discriminating GNBSI from Gram-positive BSI (0.77). The lowest specificity was found in groups using BRAHMS-KRYPTOR assay (0.65); the highest specificity was found in groups using ECLIA (0.84) (Table 2).
Among the covariates describing medical context, the diagnostic accuracy of PCT was not found to vary with thresholds (p1 > 0.05). With further simplification of the model, the diagnostic accuracy of PCT was found to be significantly lower in patients with hematological malignancy (p2 = 0.032, Figure 4(a)). In the comparison between studies with adult populations and mixed populations (adult and pediatric patients), the rDOR of PCT was suggested to vary with thresholds (p1 = 0.043, Figure 4(b)). None of the remaining covariates, including types of BSI, sepsis status, febrile neutropenia status, culture positivity, region, settings, assay method for PCT, and sample type, had a statistically significant impact on either threshold or accuracy (p1 > 0.05, p2 > 0.05, Table 2).
Supposing the pretest probability of GNBSI in all patients with suspected BSI to be 47% (the median prevalence of GNBSI in patients with suspected BSI), Fagan's nomogram for likelihood ratios indicated that the posttest probability increased to 72% when the PCT test result was positive and decreased to 25% when the result was negative (Supplementary Figure S4) [16]. Deeks' funnel plot suggested potential publication bias (t = 2.48, p = 0.02, Supplementary Figure S5).
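The arithmetic behind a Fagan nomogram can be sketched as follows. This is an illustration only: it derives the likelihood ratios directly from the pooled sensitivity (0.71) and specificity (0.76) reported above, which is an assumption about how the nomogram inputs were obtained, and the function name is hypothetical. Pretest probability is converted to odds, multiplied by the likelihood ratio, and converted back to a probability.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


sensitivity, specificity, pretest = 0.71, 0.76, 0.47

lr_positive = sensitivity / (1 - specificity)   # about 2.96
lr_negative = (1 - sensitivity) / specificity   # about 0.38

after_positive = posttest_probability(pretest, lr_positive)
after_negative = posttest_probability(pretest, lr_negative)

print(round(100 * after_positive), round(100 * after_negative))  # 72 25
```

Under these assumptions the calculation reproduces the rounded values of 72% and 25% given in the text.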

Comparisons of PCT with CRP and IL-6.
In 13 studies simultaneously assessing the performance of CRP and PCT for discriminating GNBSI from BSI of other origins in a total of 7371 episodes, the pooled DORs of PCT and CRP were 11.40 (95% CI 6.13-21.21) and 6.39 (95% CI 3.40-11.99) (Table 3). In 5 studies simultaneously assessing the performance of IL-6 and PCT in a total of 3455 episodes, the pooled DORs of IL-6 and PCT were 11.86 (95% CI 3.95-35.64) and 17.98 (95% CI 4.47-72.41). Additionally, these latter five studies also investigated the performance of CRP, with a pooled DOR of 11.86 (95% CI 3.29-42.74).
In direct comparisons between biomarkers, PCT showed higher overall performance than CRP, with an AUC of 0.85 (95% CI 0.81-0.87) for PCT and 0.78 (95% CI 0.74-0.81) for CRP. However, the shape of the summary curve differed between studies using PCT and CRP (χ² = 446.4 − 434.2 = 12.2, p < 0.001), which indicated that the relative accuracy of the tests would vary with threshold (Figure 5(a)). Focusing on the region of the plot covering the observed data, the interpretation of which marker showed higher accuracy depended on the threshold: when the specified threshold defined a sensitivity > 0.42 or a specificity < 0.85, the diagnostic accuracy was higher for the PCT test than for CRP [14]. In the comparison between PCT and IL-6, the two curves can be assumed to have the same shape (χ² = 125.2 − 125 = 0.2, Figure 5(b)). Though the bivariate model showed a higher diagnostic odds ratio for PCT than for IL-6, further simplification of the HSROC model showed no significant difference in diagnostic accuracy between PCT and IL-6 (χ² = 125.7 − 125.2 = 0.5, p = 0.480).
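The likelihood ratio tests quoted above can be checked with a short sketch. It assumes that each comparison adds a single parameter, so that the χ² statistic has 1 degree of freedom; the degrees of freedom are not stated in the text, and the function name is hypothetical. For 1 degree of freedom the upper-tail probability of a χ² value x equals erfc(sqrt(x / 2)), which avoids any dependency beyond the standard library.

```python
import math


def lrt_p_value_1df(neg2ll_reduced, neg2ll_full):
    """Likelihood ratio test assuming 1 degree of freedom.

    The statistic is the drop in -2 log-likelihood from the reduced model
    to the full model; the p value is the chi-square upper tail with 1 df.
    """
    chi2 = max(neg2ll_reduced - neg2ll_full, 0.0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# -2 log-likelihood values as quoted in the text
shape_pct_vs_crp = lrt_p_value_1df(446.4, 434.2)     # chi2 of about 12.2
accuracy_pct_vs_il6 = lrt_p_value_1df(125.7, 125.2)  # chi2 of about 0.5

print(shape_pct_vs_crp)
print(accuracy_pct_vs_il6)
```

Under the 1-degree-of-freedom assumption, the first test gives p < 0.001 and the second gives a p value of about 0.48, in line with the values reported for the PCT versus CRP shape comparison and the PCT versus IL-6 accuracy comparison.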

Discussion
Recent original studies and meta-analyses highlighted the effectiveness of PCT protocols in early diagnosis of bacterial infection and in assisting the initiation and termination of antibiotic treatment [43][44][45][46][47]. Though the value of PCT in recognizing GNBSI has been explored, the utility of the results in most studies is hampered by either small sample size or limited clinical information. Only two meta-analyses have been published on this topic [12,13]. He et al. estimated the overall accuracy of PCT for diagnosing GNBSI and reported a sensitivity of 0.73 (95% CI 0.68 to 0.78), a specificity of 0.74 (95% CI 0.64 to 0.81), a DOR of 7.59 (95% CI 5.31 to 10.85), and an AUC of 0.79 [13]. In their study, pairs of sensitivity and specificity were transformed into a single indicator (diagnostic odds ratio) to investigate heterogeneity; this simplified the analysis, but the merits of the two-dimensional nature of the data were lost [48]. Furthermore, the analyzed covariates were so limited that differences between specific conditions, including age, background diseases, and PCT test methods, could not be revealed. In the other meta-analysis, Tang et al. compared concentrations of PCT in patients with Gram-negative and Gram-positive bloodstream infections; however, diagnostic indices such as sensitivity and specificity were not investigated [12]. The results of the present meta-analysis indicated that PCT is potentially helpful in recognizing GNBSI, with an overall AUC of 0.80. This diagnostic value corresponds to an increase to 72% in the posttest probability after a positive result and a decrease to 25% after a negative result, compared to a pretest probability of GNBSI of 47%. The diagnostic value varied between patient populations, with AUC values ranging from 0.69 in febrile neutropenia and hematological malignancy patients to 0.87 in groups using electrochemiluminescence immunoassay.
To our knowledge, this is the first meta-analysis to provide direct comparisons of the diagnostic value of PCT with CRP and IL-6 in recognizing GNBSI. We herein identified a trend indicating PCT being superior to CRP in recognizing GNBSI, while the relative diagnostic ratio changes across thresholds.  [49]. In healthy individuals, PCT found in the circulation would be ≤0.1 ng/mL [50]. Normal or slightly elevated PCT level in critically ill septic patients was more likely to be a result of viral infection or systemic inflammatory response of noninfectious origin rather than bacteremia (including both Gram-negative and Gram-positive infection) or fungemia [12,22,48,51]. In a previous meta-analysis, the mean concentration of PCT was found to be around 6 ng/mL in patients with Gram-positive and/or fungal infections, which is significantly higher than in healthy controls [12]. However, in Gram-negative infections, the PCT level was found to be even higher with its value being around 13 ng/mL, which indicates the level of induced PCT concentration differs among pathogens even in bacteremia [12]. Though the proposed optimal cutoffs varied greatly from 0.291 ng/mL to 16.9 ng/mL in our included studies, the results consistently indicated a higher level of PCT in Gram-negative infections than in Gram-positive and/or fungal infections [17-20, 22, 23, 26-28, 30, 31, 34, 40, 41]. Therefore, with algorithms based on staged cutoffs, e.g., 6 ng/mL for differentiation between Gram-positive (and/or fungal) infections and healthy controls and 13 ng/mL for differentiation between Gram-negative infections and Gram-positive (and/or fungal) infections, PCT was potentially helpful in differential diagnosis among bloodstream infections or sepsis arising from diverse pathogens [8]. 
However, it should be noted that the cutoffs should be carefully selected based on the population characteristics and assay techniques, because significant heterogeneity was identified between different clinical settings in our meta-analysis. Though our study failed to identify statistically significant differences in the diagnostic performance (thresholds and accuracies) of PCT either between different types of BSIs or between different states of culture positivity (culture positive or negative), there were nonsignificant trends indicating that PCT could be more useful for diagnosing GNBSI in patients with bacterial infections and positive cultures than in the opposite conditions.
The metaregression results suggested that the diagnostic accuracy was relatively low in patients with hematological malignancies (acute leukemia, lymphoma, and other hematologic malignancies), implying that the PCT test is unreliable for diagnosing GNBSI in these patients. Given that the optimal cutoffs reported in these studies were 0.5-1.52 ng/mL, fairly close to the cutoff used to discriminate bacterial from nonbacterial infection, patients with hematological malignancy may have lost part of their ability to respond to Gram-negative bacteria or their products [8]. Our results also identified a nonsignificant trend indicating that PCT could be of greater value in sepsis patients than in patients without sepsis. However, it should be noted that PCT is reported to correlate with the severity of infection, and the diagnostic accuracy could therefore be affected. Unfortunately, we were not able to evaluate the impact of severity of infection because few of the included studies documented PCT values along with individual severity [52]. As Gram-negative infections are usually associated with increased disease severity, whether PCT concentration is affected by severity or by pathogen remains to be further discussed [53].
Demographic and technical characteristics are also crucial aspects in clinical practice. Although excellent performance has been reported in some East Asian studies, with both sensitivity and specificity over 85%, our pooled results failed to identify a significant difference between East Asian and European populations [22,23]. In our subgroup analysis, we could not obtain exact age and sex information from some of the studies, which prevented further analysis [21,26,31,33,36,37,40,41,54]. Our results on the performance of different PCT assays were in line with previous studies that demonstrated equivalence among 3 different PCT assays (Kryptor, Vidas, or Elecsys/Cobas), as the threshold and accuracy were found to be consistent across these three tests in our study [55,56]. A study comparing recent popular PCT assay systems showed that the results from these systems correlated well, but their regression lines varied considerably. In future research, preexperimental calibration could possibly help reduce heterogeneity when diagnostic tests using different assay systems are compared in a single study [57].

PCT in Comparison with Other Biomarkers.
CRP and IL-6 were the markers most frequently compared with PCT, but the results were somewhat inconsistent [17, 18, 21-24, 27, 30, 33-35, 38, 39]. This issue was further explored in our study by direct comparison; the findings suggested variation in diagnostic accuracy across thresholds, meaning that the diagnostic accuracy of PCT was superior to that of CRP at certain thresholds and inferior at others, whereas no significant difference was found between PCT and IL-6. Under most circumstances, PCT should be recommended over CRP, as the overall diagnostic accuracy of PCT was higher than that of CRP. Though the diagnostic accuracy of IL-6 was found to be higher than that of PCT in some studies, the direct comparison failed to identify a statistically significant difference [39]. In clinical practice, IL-6 has potential to serve together with PCT as a marker for GNBSI, and research is needed on the comparative effectiveness of IL-6 under different clinical settings [33,39]. Endotoxemia was another widely investigated marker for GNBSI and was also systematically reviewed for prediction of GNBSI [58]. The pooled DORs of endotoxemia were 3.2 and 5.8 in association with GNBSIs with Escherichia coli and those with Pseudomonas aeruginosa, which were both lower than the DORs of PCT, CRP, and IL-6 derived in our study [58]. However, because none of the studies assessed PCT and endotoxemia tests in the same population, direct comparison between PCT and endotoxemia was not feasible. Increased leukocyte count has also been reported in some studies as a feature of GNBSI and showed potential in differentiating GNBSI from other types of bloodstream infection, but few studies analyzed the corresponding diagnostic indices, such as specificity, sensitivity, and AUC [59][60][61][62][63][64][65].
Additionally, promising results of TNF-α and IL-8 tests were reported in predicting GNBSI in abdominal sepsis patients, with AUCs of 0.912 and 0.999, sensitivities of 90.2% and 97.6%, and specificities of 87.5% and 100%, respectively [66]. The performance of presepsin and IL-10 in recognizing GNBSI was found to be superior to that of PCT in certain contexts, including adult patients after HSCT and children with hematology-oncology disease [35,38]. Although these markers were found valuable in diagnosing GNBSI, the number of studies was not sufficient for a meta-analysis [66][67][68][69][70][71][72][73].
Alternatively, the use of comprehensive sets of markers, especially those correlated with severity of the disease, together with PCT may help improve its performance in recognizing GNBSI [22,66].

Limitations.
Our meta-analysis has several limitations. First, information on patients' medical condition is extremely limited. Patients with suspected BSI could have diverse comorbidities, while most studies only recorded comorbidities of interest, e.g., sepsis and hematological malignancy. Changes in patients' medical conditions could cause fluctuations in PCT level and therefore affect the diagnostic performance. Also, PCT levels could be influenced by some drugs, such as antithymocyte globulin (ATG) [74]. Second, the timing of measurement was seldom mentioned in our included studies. Once triggered by toxins, PCT increases in a sigmoid manner, so false negatives might occur at an early stage if toxin levels are not yet sufficient to trigger a surge in PCT [8,10]. Third, since there were no established criteria for selecting the optimal cutoff in this diagnostic theme, 11 studies used ROC analysis to derive optimal cutoffs. A predefined cutoff could help reduce the bias in sensitivity and specificity possibly caused by this data-driven method [42]. Additionally, in the present study, we were not able to calculate a specific cutoff for clinical use, because individual patient data on PCT concentration were not available in most studies.

Conclusions
PCT was helpful in recognizing Gram-negative bloodstream infection, but the results should be carefully interpreted with full knowledge of patients' medical condition. In patients with hematological malignancy, the use of PCT as a marker for GNBSI should not be encouraged. Also, results of PCT tests should be interpreted separately in adult and pediatric populations. Though PCT showed a higher diagnostic odds ratio compared to CRP and IL-6, the optimal biomarker should be selected carefully, considering the required range of sensitivity and specificity. In future research, features of medical context, demographics, and demands for sensitivity and specificity should be taken into consideration. Further prospective studies are warranted for comparisons between different clinical settings.

Data Availability
The dataset can be requested by sending an email to the corresponding author.

Conflicts of Interest
The authors have declared that no conflicts of interest exist.