Differentiating Small (≤1 cm) Focal Liver Lesions as Metastases or Cysts by means of Computed Tomography: A Case-Study to Illustrate a Fuzzy Logic-Based Method to Assess the Impact of Diagnostic Confidence on Radiological Diagnosis

Purpose. To quantify the impact of diagnostic confidence on radiological diagnosis with a fuzzy logic-based method. Materials and Methods. Twenty-two oncologic patients with 20 cysts and 30 metastases ≤1 cm in size found at 64-row computed tomography were included. Two readers (R1/R2) expressed diagnoses as a subjective level of confidence P(d) in malignancy within the interval [0,1] rather than on a “crisp” basis (malignant/benign); confidence in benignancy was 1 − P(d). When cross-tabulating data according to the standard of reference, 2 × 2 table cells resulted from the aggregation between P(d)/1 − P(d) and final diagnosis. We then assessed (i) readers' diagnostic performance on a fuzzy and crisp basis; (ii) the “divergence” δ(F, C) (%) as a measure of how confidence affected crisp diagnosis. Results. Diagnoses expressed with lower confidence increased fuzzy false positives compared to crisp ones (from 0 to 0.2 for R1; from 1 to 2.4 for R2). Crisp/fuzzy accuracy was 94.0%/93.6% (R1) and 94.0%/91.6% (R2). δ(F, C) (%) was larger in the case of the less experienced reader (R2) (up to +7.95% for specificity). According to simulations, δ(F, C) (%) was negative/positive depending on the level of confidence in incorrect diagnoses. Conclusion. Fuzzy evaluation shows a measurable effect of uncertainty on radiological diagnoses.


Introduction
The confidence underlying medical diagnosis has a pivotal impact on clinical decisions [1]. Based on different levels of diagnostic confidence (DC), a specific therapy can be promptly instituted or withheld while waiting for the results of additional investigations. This translates into important consequences in terms of appropriateness, efficacy, and costs of therapies [2]. Similar concepts can be extended to radiological diagnosis. The widespread use of conceptual instruments such as the Breast Imaging Reporting and Data System (BI-RADS) exemplifies the need for taking into account the DC (e.g., expressed as a diagnostic probability) in order to guide patients' management [3].
Quantifying the effect that a diagnostic test has on DC serves as an assessment of the efficacy of that test [4,5]. To our knowledge, a number of analytic methods have been proposed for this purpose [6], especially in order to take into account the impact of incorrectly confident diagnoses on patients' management [2,7]. In general, these methods assess the changes between pre- and posttest confidences of referring physicians by using proportions or scoring systems (e.g., 1 to 5 scale) rather than a continuum of information. Consequently, these methods (i) do not reflect the variability of DC inherent to test interpretation, and therefore do not express the radiologist's point of view, and (ii) do not measure the direct effect of DC levels on diagnostic performance (i.e., the "radiologist efficacy" rather than "test efficacy"). As previously emphasized by Castanho et al. [8], fuzzy logic, which is cognate to the fuzzy set theory introduced by Zadeh in 1965 [9], has the potential to contribute to this field. Fuzzy logic is successfully used in many technology systems and has been repeatedly investigated for clinical applications [10]. By definition, this approach is used when it is difficult to classify objects in collections through a binary ("crisp") process. Accordingly, intermediate membership degrees are defined in the interval [0, 1] [11]. For instance, a man with 1000 hairs can be viewed as belonging both to the set of bald men (e.g., with a membership degree of 0.8) and to the set of not bald men (with a membership degree of 0.2). Using the same conceptual framework, a radiological diagnosis (e.g., liver metastasis) might be viewed as belonging to both sets of correct and incorrect diagnoses at the same time, depending on the uncertainty with which the reader achieved it on the basis of the available radiological signs. One (1) or zero (0) values (i.e., the lesion "is" or "is not" a metastasis, resp.) then become a special case of a continuum of (un)certainty rather than absolute values.
Computational and Mathematical Methods in Medicine
To our knowledge, no previous studies compared crisp and fuzzy diagnostic performances. In this study, we assumed that levels of DC in a given diagnosis are equivalent to the fuzzy membership degrees expressing how much that diagnosis belongs to the set of correct diagnoses. We then adjusted the diagnostic performance of radiologists for the level of DC, based on the method used by Castanho et al. [8], to "fuzzify" sensitivity and specificity (and, by extension, predictive values and accuracy). Crisp and fuzzy diagnostic performances of radiologists were compared accordingly, introducing a measure that we named "divergence." The method was tested in a simplified, dichotomous clinical scenario, in order to verify whether the effect of DC on diagnostic performance is measurable on real readers. Additionally, we provided two simulations from clinical data in order to emphasize the effect of confidently incorrect diagnoses on diagnostic accuracy.

Materials and Methods
Because of the retrospective design and the theoretical nature of the study (leading to the absence of clinical implications for patients), approval by the Institutional Review Board was not required, according to the laws and regulations of our country. However, patients' data were managed according to the ethical principles for medical research as stated by the Declaration of Helsinki, and patients gave informed consent to undergo computed tomography (CT).

Clinical Model and Patients Population.
We searched our institutional database for all oncologic patients who underwent a baseline abdominal or thoracoabdominal CT between March 2009 and March 2010. Of them, we included those presenting up to five focal liver lesions matching the following criteria: (i) a maximum diameter ≤1 cm on axial images; (ii) hypodense appearance on venous and/or equilibrium phases as compared to the surrounding liver; (iii) assessment as metastases or cysts by a consensus panel of two experienced radiologists who reviewed images of baseline and follow-up CTs, magnetic resonance imaging (MRI), and/or ultrasonography (US) examinations. They assessed as metastases those lesions showing any modification in dimensions and attenuation characteristics at CT, whereas lesions remaining stable were assessed as cysts (Figure 1). Alternatively, they assessed the lesions based on their appearance at MRI or US. An additional inclusion criterion was the absence of extrahepatic findings in the upper abdomen scan. Excluded were patients who did not match the above criteria, including those with additional focal liver lesions of any nature.

CT Protocol.
Patients underwent examinations on a multidetector CT scanner (LightSpeed CT750 HD, GE Healthcare, Milwaukee, WI, USA) with 64 sections at a detector collimation of 0.625 mm, a table feed speed of 39.37 mm per rotation, a pitch of 0.98, and a gantry rotation time of 0.5 s. Reconstructed images were displayed as 1.25 and 4 mm thick images. Both image sets were accessible to radiologists for image analysis.
All patients received an i.v. injection of 600 mg iodine/kg of nonionic iodinated contrast material at a concentration of 400 mg iodine/mL (Iomeron 400, Bracco SpA, Milan, Italy), using a commercially available power injector (CT-Injector Missouri XD 2001, Ulrich Medical, Ulm, Germany). After the acquisition of unenhanced images, a bolus-tracking program (Smart Prep; GE Healthcare, Milwaukee, WI, USA) was used to determine the time to initiate diagnostic scanning, after placing a circular Region of Interest in the aorta just above the diaphragmatic dome and fixing a threshold of 70 Hounsfield Units. When the bolus-tracking threshold was reached, triple-phase contrast-enhanced diagnostic scans were acquired with acquisition delays of 35 s for the arterial, 110 s for the venous, and 180 s for the equilibrium phases.

Fuzzy Logic Basic Principles and Comparison with the Theory of Probability.
The notion of fuzzy set has been introduced by Zadeh [9] in order to formalize the concept of gradedness in class membership, in connection with the representation of human knowledge.
Fuzzy sets seem to be relevant in all the information-driven tasks where we need to make a classification based on data analysis and approximate reasoning. Gradedness in class membership is specified by the "membership function" μ_F(u), which measures the degree of membership of an element u in a fuzzy set F, defined on a referential U. The literature provides at least three different interpretations of the concept of membership function, namely degree of similarity, preference, and uncertainty [12]. In the present study we used the semantics of uncertainty (in the general sense of incomplete knowledge, rather than only randomness), which is captured by fuzzy sets and fuzzy logic in the framework of possibility theory. This interpretation was proposed by Zadeh [13] when he introduced fuzzy logic, which is quite akin to "possibility theory," and developed his theory of approximate reasoning [14]. μ_F(u) is then the degree of possibility that a parameter x has value u (the degree up to which the proposition "x = u" is true), given that all that is known about it is that "x is F" (e.g., "x is bald"). The extreme values of the membership function are mutually exclusive, and the membership degrees rank these values in terms of their respective plausibility. Uncertainty is often measured in terms of frequency of precise observed situations in a random experiment, and this approach leads to probability theory. However, uncertainty can also emerge when it is related not to the frequency of random outcomes but to the imprecise nature of observations. When the repeated observations are precise, the probability assignments to the elements of U can be viewed as special membership functions such that the sum of membership grades is 1 ("singleton" fuzzy sets). On the other hand, when repeatedly observed situations are imprecise, more general kinds of membership functions are necessary. The degree of membership μ_F(u) can then be computed as the proportion of observations that do not rule out the situation u. In this case, the membership function is interpreted as a "plausibility" function, since μ_F(u) = 1 means that u is ruled out by no observation.
In the context of medical diagnosis, the degree of membership μ_F(u), that is, the proportion of observations that do not rule out the situation u, could be effectively measured (at least in principle) by the proportion of times the situation u has been tagged F in a random experiment where individuals presented with situation u are asked whether or not to put the tag "F" on u. This situation happens, for example, when (i) we show the same (blinded) report to the same reader at different times, or (ii) we show many copies of the same (blinded) report dispersed among a huge number of reports. Even if we are using relative frequencies, their meaning is not that of probabilities, because no randomness is involved; rather, a decision is relevant.
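As a minimal sketch of the frequency reading of membership described above, the degree of membership can be estimated as the share of repeated, blinded presentations in which the reader applies the tag. The function name and the example tags are hypothetical and serve only as an illustration; they are not data from the study.

```python
from typing import Sequence


def membership_from_tags(tags: Sequence[bool]) -> float:
    """Estimate a membership degree as the fraction of presentations
    in which the reader applied the tag (True) to the situation."""
    if len(tags) == 0:
        raise ValueError("at least one presentation is required")
    return sum(1 for tagged in tags if tagged) / len(tags)


# Hypothetical example: the same blinded report shown 10 times
# and tagged on 8 of those occasions.
example_tags = [True] * 8 + [False] * 2
example_membership = membership_from_tags(example_tags)
print(example_membership)  # 0.8
```

The result is a relative frequency of decisions, not a probability of a random outcome, which is the distinction the paragraph draws.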

Fuzzy Logic Method and Image Analysis.
Anonymized CT images from the venous phase were independently evaluated on a dedicated workstation (Osirix Aycan Workstation Osirix Pro, Rochester, NY, USA) by two radiologists with 15 (R1) and 5 (R2) years of experience in abdominal radiology, respectively. R1 and R2 were not the radiologists who established final diagnoses by consensus. Both R1 and R2 were blinded to final diagnosis but not to the clinical status of patients, in order to reflect the real clinical scenario as closely as possible. For the same purpose, R1 and R2 were left free to examine the whole liver in the case of patients with multiple lesions. However, the visible CT examination was limited to the upper abdomen to prevent any collateral findings from acting as confounders.
To simplify the model, readers were aware that lesions were metastases or cysts at final diagnosis. Let X be a nonempty set indicating the universe of all possible test results and P and N two subsets of X containing the "positive test results" and "negative test results," respectively. For each lesion, readers were asked to express their subjective DC in the positive diagnosis (d) of metastasis by means of the fuzzy term P(d); its value is included in the interval [0, 1] (0 ≤ P(d) ≤ 1). Values were limited to the first decimal position, except for 0.51 and 0.49 indicating a P(d) near to perfect uncertainty (0.50). In accordance with Castanho et al. [8], P(d) can be interpreted as the membership degree (or membership function) μ_P(d) of a given radiological diagnosis to the fuzzy set P. By assuming a unitary value for the whole DC, the confidence in the alternative diagnosis of cyst (negative test result) corresponds to the complementary membership degree of that radiological diagnosis to the fuzzy set N. So N(d) = 1 − P(d) can be interpreted as the membership degree μ_N(d) of a given radiological diagnosis to the fuzzy set N [8].
In other words, we can express R1 and R2 diagnoses on a continuous interval as functions of DC, rather than on a classical "crisp," mutually exclusive 0 versus 1 basis. Accordingly, a given diagnosis might belong at the same time to the sets P and N with complementary membership degrees. For example, a P(d) = 0.8 for a given diagnosis indicates that the lesion was interpreted as being a metastasis with a DC = 0.8 and a cyst with a DC = 0.2 (and vice versa for a P(d) = 0.2).

Analysis of Readers' Diagnostic Performance.
At the end of R1 and R2 readings, fifty pairs of P(d) and N(d) values (one for each reading) were available. Analysis of the impact of DC on readers' performance was articulated in three steps. First, we cross-tabulated fuzzy data into a 2 × 2 table according to the results of the standard of reference. Because there is no gradation between a cyst and a metastasis, the standard of reference was expressed on a crisp basis. In accordance with rules provided by Parasuraman et al. [15], the 2 × 2 table was interpreted as the result of the mathematical operation of aggregation, based on the min[a, b] operator, between the fuzzy subsets P and N (expressing the test results) and the subsets "test results associated with malignancy" (M) and "test results associated with benignancy" (B); that is, each cell takes the minimum of the two membership degrees involved (e.g., min[μ_P(d), μ_M(d)] for the subset PM). Tables 1(a) and 1(b) illustrate how each pair of complementary fuzzy values was entered in the 2 × 2 table, according to the standard of reference result. Given all values, the resulting subsets PM, PB, NM, and NB corresponded, in the 2 × 2 table, to those of fuzzy true-positive (fTP), fuzzy false-positive (fFP), fuzzy false-negative (fFN), and fuzzy true-negative (fTN) cases, respectively. It has been demonstrated elsewhere [8] that global fTP, fTN, fFP, and fFN values correspond to the algebraic sum of complementary fuzzy membership degrees expressed for 30 patients with liver metastases (in the case of fTP and fFN diagnoses) and 20 patients with cysts (in the case of fFP and fTN diagnoses, resp.) (Table 2).
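The cross-tabulation step can be sketched as follows, under the assumption that the reference standard is crisp, so that the min-based aggregation reduces to adding P(d) or N(d) = 1 − P(d) to the cell selected by the final diagnosis. The function name `fuzzy_confusion` and the five example readings are hypothetical and are not taken from the study tables.

```python
from typing import Dict, Iterable, Tuple


def fuzzy_confusion(readings: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Aggregate (P(d), is_malignant) pairs into fuzzy 2x2 cells.

    With a crisp reference standard, the minimum of P(d) and a 0/1
    reference membership is P(d) for malignant lesions and 0 otherwise,
    so every cell is a plain sum of membership degrees.
    """
    cells = {"fTP": 0.0, "fFN": 0.0, "fFP": 0.0, "fTN": 0.0}
    for p_d, is_malignant in readings:
        if not 0.0 <= p_d <= 1.0:
            raise ValueError("P(d) must lie in [0, 1]")
        n_d = 1.0 - p_d  # complementary confidence in benignancy
        if is_malignant:
            cells["fTP"] += p_d
            cells["fFN"] += n_d
        else:
            cells["fFP"] += p_d
            cells["fTN"] += n_d
    return cells


# Hypothetical readings: three metastases and two cysts.
example_readings = [(1.0, True), (0.8, True), (0.2, True), (0.0, False), (0.2, False)]
example_cells = fuzzy_confusion(example_readings)
print({name: round(value, 2) for name, value in example_cells.items()})
```

Each lesion contributes a total of 1 to its row, split between the two cells, which is why the fuzzy cells are generally not integers.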
Second, we calculated per-lesion sensitivity, specificity, positive-predictive value (PPV), negative-predictive value (NPV), and accuracy for malignancy (together with 95% C.I.s) (i) on a crisp basis, that is, by assuming that 2 × 2 table cells were filled with mutually exclusive 1/0 values; (ii) on a fuzzy basis, that is, by using fTPs, fFNs, fFPs, and fTNs resulting from the above method [8,14]. Crisp diagnosis of malignancy or benignancy was assumed, for a given lesion, when P(d) or N(d) was equal to or larger than 0.51, respectively. In the case of perfect uncertainty (P(d) = 0.5), we assumed a crisp diagnosis of metastasis.
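A minimal sketch of the crisp rule and of the performance estimates follows. It assumes the threshold can be written as P(d) ≥ 0.5, which is equivalent to the rule stated above for the values readers were allowed to use; all function names and the example readings are hypothetical, and the 95% C.I.s are omitted.

```python
from typing import Dict, Iterable, Tuple


def crisp_is_malignant(p_d: float) -> bool:
    """Crisp call: metastasis for P(d) >= 0.51, cyst for P(d) <= 0.49;
    perfect uncertainty (0.5) is counted as metastasis."""
    return p_d >= 0.5


def crisp_confusion(readings: Iterable[Tuple[float, bool]]) -> Dict[str, int]:
    """Count crisp TP, FN, FP, and TN from (P(d), is_malignant) pairs."""
    cells = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for p_d, is_malignant in readings:
        called_malignant = crisp_is_malignant(p_d)
        if is_malignant:
            cells["TP" if called_malignant else "FN"] += 1
        else:
            cells["FP" if called_malignant else "TN"] += 1
    return cells


def performance(tp: float, fn: float, fp: float, tn: float) -> Dict[str, float]:
    """Proportions from a 2x2 table; works for crisp counts or fuzzy sums."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical readings: three metastases and two cysts.
example_readings = [(1.0, True), (0.8, True), (0.2, True), (0.0, False), (0.2, False)]
crisp_cells = crisp_confusion(example_readings)
crisp_performance = performance(
    crisp_cells["TP"], crisp_cells["FN"], crisp_cells["FP"], crisp_cells["TN"]
)
print(crisp_cells, crisp_performance)
```

Because `performance` only needs the four cell totals, the same function can be fed the fuzzy sums to obtain the fuzzy counterparts of each proportion.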
Finally, we assessed the impact of DC on diagnosis as the "divergence" δ(F, C) between fuzzy and crisp proportions, calculated as follows:
\[ \delta(F, C)\,(\%) = \frac{\text{crisp value} - \text{fuzzy value}}{\text{fuzzy value}} \times 100. \tag{3} \]
Table 2: Formulas for determining the total number of fuzzy true positives (fTPs), fuzzy false negatives (fFNs), fuzzy false positives (fFPs), and fuzzy true negatives (fTNs) used to estimate fuzzy sensitivity, specificity, PPV, NPV, and accuracy. Of note, those values need not be integer numbers.
In the classical crisp approach for diagnosis, the reader is forced to compress his or her own uncertainty (i.e., fuzzy DC levels) onto a "flat" dichotomous 0-1 basis. This leads to a skewed construction of the 2 × 2 table based on the standard of reference. So the divergence δ(F, C) is here assumed to represent the "error" inherent to crisp values of diagnostic performance. In other words, δ(F, C) measures how much crisp evaluation differs from "the real state" of uncertainty (unexpressed in the crisp setting) with which radiologists achieve a diagnosis, that is, the "real" diagnostic performance as adjusted for DC levels.
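The divergence, that is, (crisp value − fuzzy value) / fuzzy value × 100, can be sketched as a one-line function. The function name and the two example proportions are hypothetical and only illustrate the sign convention.

```python
def divergence(crisp_value: float, fuzzy_value: float) -> float:
    """Relative difference (%) of a crisp proportion from its fuzzy counterpart."""
    if fuzzy_value == 0:
        raise ValueError("fuzzy value must be nonzero")
    return (crisp_value - fuzzy_value) / fuzzy_value * 100.0


# Hypothetical proportions: the crisp estimate exceeds the fuzzy one,
# so the divergence is positive (crisp performance is "overestimated").
overestimated = divergence(0.90, 0.80)
# Reversed case: the crisp estimate is lower, so the divergence is negative.
underestimated = divergence(0.80, 0.90)
print(round(overestimated, 2), round(underestimated, 2))
```

Normalizing by the fuzzy value reflects the assumption stated above that the fuzzy proportion is the reference against which the crisp "error" is measured.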
Based on simulated membership degrees, we estimated fuzzy sensitivity, specificity, PPV, NPV, and accuracy (on a per-lesion basis) for both scenarios, together with the divergence δ(F, C) between crisp and fuzzy proportions.

Results
Readers' Diagnostic Performance.
According to the crisp evaluation of CT images, R1 and R2 had 27 and 28 TPs, 3 and 2 FNs, 0 and 1 FPs, and 20 and 19 TNs, respectively. The sets of readers' fuzzy diagnostic attributions are shown in Table 3, whereas 2 × 2 tables built on fuzzy membership degrees of corresponding fuzzy subsets are shown in Table 4(a) for R1 and Table 4(b) for R2. In the case of 30 patients with liver metastases, R1 achieved the same number of crisp and fuzzy TP and FN diagnoses (27 and 3, resp.). This was related to the high DC with which R1 made the majority of fuzzy diagnoses. In particular, P(d) and N(d) ranged from 0.8 to 1 in 27 and 2 patients, respectively (Table 4(a)). Concerning R2, (s)he was quite confident both in the majority of correct diagnoses (28/30 with P(d) = 1) and in one incorrect diagnosis (N(d) = 1). However, since P(d) for the other FN diagnosis was 0.2 (i.e., N(d) = 0.8), fTPs slightly increased and fFNs slightly decreased compared to crisp values (28.2 and 1.8 versus 28 and 2, respectively) (Table 4(b)).
In the case of patients with cysts, R1 attributed a P(d) = 0 (i.e., N(d) = 1) to 19 of 20 of them, but (s)he was less confident in one single diagnosis of benignancy (P(d) = 0.2). Compared to the crisp scenario, the effect was to slightly increase the FPs (from 0 to 0.2) and decrease the TNs (from 20 to 19.8).
Table 3: Set of P(d) and N(d) values attributed to 50 focal liver lesions by reader R1 and reader R2, respectively, with the calculations necessary to evaluate fTPs, fFNs, fFPs, and fTNs based on final diagnosis. Numbers in the brackets represent a fuzzy DC value multiplied (×) by the number of lesions to which it was attributed.
When estimating sensitivity, specificity, PPV, and NPV with the above fuzzy values, variations with respect to crisp proportions occurred as shown in Table 5. δ(F, C) (%) values are reported as well. In the case of R1, crisp diagnostic performance was slightly "overestimated" as compared to the fuzzy one in terms of specificity, PPV, NPV, and accuracy. In other words, positive δ(F, C) (%) values indicate that crisp performance was higher than that adjusted for the level of DC, that is, when R1 was not forced to express diagnoses dichotomously. The effect of DC level was more complex for R2, showing that the crisp performance was (i) slightly "underestimated" in the case of sensitivity and NPV and (ii) more largely overestimated in the case of the remaining estimates, especially specificity (δ(F, C) = +7.95%).
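As a worked check of the figures quoted for R2, the sketch below recomputes crisp and fuzzy proportions from the reported cells. The fuzzy TN sum is derived here as 20 − 2.4, on the assumption that each cyst contributes P(d) + N(d) = 1; the helper names are hypothetical.

```python
# Cells reported for reader R2: crisp counts and fuzzy sums.
# Assumption: fuzzy TN = 20 - 2.4, since P(d) + N(d) = 1 for each of the 20 cysts.
crisp = {"tp": 28, "fn": 2, "fp": 1, "tn": 19}
fuzzy = {"tp": 28.2, "fn": 1.8, "fp": 2.4, "tn": 20 - 2.4}


def sensitivity(cells):
    return cells["tp"] / (cells["tp"] + cells["fn"])


def specificity(cells):
    return cells["tn"] / (cells["tn"] + cells["fp"])


def npv(cells):
    return cells["tn"] / (cells["tn"] + cells["fn"])


def accuracy(cells):
    return (cells["tp"] + cells["tn"]) / sum(cells.values())


def divergence_pct(crisp_value, fuzzy_value):
    """(crisp - fuzzy) / fuzzy, expressed as a percentage."""
    return (crisp_value - fuzzy_value) / fuzzy_value * 100.0


for name, metric in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("NPV", npv),
    ("accuracy", accuracy),
]:
    c, f = metric(crisp), metric(fuzzy)
    print(f"{name}: crisp {c:.1%}, fuzzy {f:.1%}, divergence {divergence_pct(c, f):+.2f}%")
```

Under these assumptions the sketch reproduces the divergences quoted in the text for R2 (−0.71% for sensitivity, −0.27% for NPV, and +7.95% for specificity) as well as the crisp and fuzzy accuracies of 94.0% and 91.6%.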
Difference between crisp and fuzzy proportions was not statistically significant (using an alpha of 0.05), as can be inferred from the overlap in 95% C.I.s.

Simulations.
Results of simulations are shown in Table 6. According to the first scenario, confidently incorrect diagnoses contributed to increasing crisp diagnostic performance compared to the fuzzy one, except for specificity (δ(F, C) (%) = 0). On the contrary, a lower level of DC in incorrect diagnoses led to a decrease of crisp performance in the second scenario (δ(F, C) (%) as low as −16.7% in the case of specificity). It is easy to demonstrate that δ(F, C) might have assumed a positive sign by entering fewer incorrect diagnoses in the model. Difference between crisp and fuzzy proportions was not statistically significant (using an alpha of 0.05), as can be inferred from the overlap in 95% C.I.s.

Discussion
Fuzzy logic has been used in medicine to define complex epidemiological and clinical models for health decisions or risk prediction [16]. Applications in radiology have mainly focused on testing algorithms for image processing [17] or computer-aided diagnosis accounting for the uncertainty inherent to lesion variability [18], as well as on fuzzifying the features of image findings in the attempt to provide more effective diagnoses [19]. In the case of clinical applications, the basic principle has been to define to what extent one or more radiological features belonged to different diagnoses. Of these, the one showing radiological features with the highest membership degree is assumed to represent the "true" diagnosis. In summary, fuzzy logic has been applied as an instrument to facilitate diagnostic reasoning, whereas the fuzziness underlying the performance of a diagnostic test has been poorly investigated [20]. We used the method proposed by Castanho et al. [8] to express diagnoses on a fuzzy rather than binary basis, thus estimating fuzzy sensitivity, specificity, NPV, PPV, and accuracy values. Unlike these authors, we investigated an imaging modality involving radiologists' decisions rather than objective results of a diagnostic tool. Accordingly, fuzzification was performed by using levels of DC as a measure of readers' uncertainty. One could argue that this choice is arbitrary and/or inappropriate. However, linguistic variables underlying fuzzy sets are a matter of interpretation [12], and our choice was founded on a mathematically consistent model, regardless of specific semantic attributions and specific causes of readers' uncertainty [8,15]. It might be speculated that this is a potential advantage of this method, since uncertainty can be evaluated regardless of its origin (e.g., before and after an educational program or in different environmental conditions affecting readings).
It should be pointed out that, compared to previous works we presented in this field at European and American radiological congresses in the form of oral presentations [21]
Given the above premises, we asked two radiologists to quantify the diagnosis of liver metastasis as any value P(d) ranging from 0 to 1, based on the degree of DC at a single-time reading. A simplified clinical model was set for this purpose, providing only one complementary alternative (N(d) = 1 − P(d)) to the main diagnosis (hepatic cyst versus metastasis at MDCT, resp.). Both radiologists were shown to be highly confident in making diagnoses, regardless of their correctness: they used P(d)/N(d) values of 1 or 0 in the majority of cases (44/50 and 40/50 lesions for R1 and R2, resp.). The degree of DC was relatively high even in more uncertain cases, corresponding to P(d)/N(d) no less than 0.8 for R1 and 0.7 for R2. Moreover, R2 showed lower average DC compared to R1, as expected on the basis of differential effects of experience (5 versus 15 years, resp.) [22]. In summary, we showed that differences in levels of DC have a measurable effect on the nominal, crisp diagnostic performance of radiologists, and that experience has a key role in such a determination. Fuzzy evaluation could then be used to define whether readers involved in radiological studies fulfil ideal criteria for image interpretation, that is, making correct diagnoses with high DC.
DC has been advocated to impact on diagnostic efficacy [2]. It is reasonable to assume that readers' efficacy will depend on how informative a diagnosis is, that is, on how much diagnostic "truth" it contains in order to guide the clinical workup. Given the uncertainty that unavoidably affects subjective image interpretation, we assumed that crisp diagnostic performance cannot fully represent such a "truth," that is, only fuzzy diagnosis is really representative of the "real state" of information carried by a diagnosis. Thus, we tested readers' efficacy by estimating the divergence δ(F, C) between fuzzy and crisp sensitivity, specificity, PPV, and NPV. In other words, we measured how much crisp performance is affected by levels of DC underlying radiological diagnoses, that is, how much information diagnoses carry in the light of their subjective origin. Positive δ(F, C) values indicate that crisp diagnoses are overestimated as compared to fuzzy ones, because of lower levels of DC. In the case study, R2 showed larger positive δ(F, C) (up to +7.95% in the case of specificity) compared to R1. One can suppose that a clinician aware of δ(F, C) values would probably find R1 diagnoses more informative than R2 ones and would probably change (or not) patients' management based on the level of DC underlying radiological diagnoses, for example, by adding (or not) further diagnostic procedures before any treatment. It should be pointed out that the estimate of δ(F, C) refers to a set of readings given the "truth" established by a standard of reference, that is, to the body of evidence provided by one radiologist on a large series of readings, as occurs, for example, in an experimental study.
The effect on the referring physician of expressing each single diagnosis as a fuzzy value in clinical practice, where the "truth" about the patient only comes at the end of the work-up process, should be a matter of a specific study, and its estimate was beyond the purpose of our work. In the case of negative values, δ(F, C) expresses how much crisp diagnosis is underestimated as compared to the fuzzy one. This occurs when incorrect diagnoses are expressed with lower levels of DC, as shown in the case of R2 in the clinical setting. Since N(d) for one FN diagnosis was 0.8, the number of fFNs slightly decreased compared to crisp ones (from 2 to 1.8, resp.). Accordingly, δ(F, C) for sensitivity and NPV was −0.71% and −0.27%, respectively. This effect was emphasized by supposing more incorrect diagnoses with low DC, as we did in the second simulation. The diagnostic performance of the simulated reader (R3) was systematically lower on a crisp rather than a fuzzy basis, that is, when accounting for DC levels (δ(F, C) as low as −16.67% in the case of specificity). Thus, in a context of low accuracy readers should be unconfident in incorrect diagnoses in order to be more efficacious. On the other hand, high DC in incorrect diagnoses in the first simulation had the main effect of increasing crisp diagnostic performance relative to the fuzzy one (up to +5.26% in the case of sensitivity). In other words, DC level and δ(F, C) are able to communicate how confidently wrong diagnoses impact readers' performance, leading to potential consequences in terms of clinical practice or evaluation of study results. Finally, δ(F, C) is equal to zero when the algebraic sums of fuzzy and crisp elements in the 2 × 2 table are equivalent, as occurred for TP/fTP and FN/fFN cases of R1 in the clinical scenario (crisp and fuzzy sensitivities of 90.0% each) and FP/fFP and TN/fTN cases of R3 in the first simulation (crisp and fuzzy specificity of 50.0%).
δ(F, C) might also be zero if readers expressed all correct and incorrect diagnoses with P(d) = 1 and N(d) = 0 (or vice versa), that is, if fuzzy membership degrees corresponded to crisp ones. We did not observe this result in our clinical series, confirming that readings of an experimental set are unavoidably affected by uncertainty, even if low. Accordingly, fuzzy diagnosis is expected to be more informative than the crisp one.
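The dependence of the sign of the divergence on where the uncertainty lies can be illustrated with a small sketch restricted to sensitivity. It assumes a simplified reader who assigns one confidence level to all correctly called metastases and another to all metastases wrongly called cysts; the function name and the counts (28 correct, 2 incorrect) are hypothetical and do not reproduce the simulations of Table 6.

```python
def sensitivity_divergence(
    n_correct: int,
    conf_correct: float,
    n_incorrect: int,
    conf_incorrect: float,
) -> float:
    """Divergence (%) between crisp and fuzzy sensitivity.

    conf_correct:   P(d) given to metastases correctly called metastases.
    conf_incorrect: N(d) given to metastases wrongly called cysts.
    Both must exceed 0.5 so that the crisp calls match the description.
    """
    for conf in (conf_correct, conf_incorrect):
        if not 0.5 < conf <= 1.0:
            raise ValueError("confidence in the crisp call must lie in (0.5, 1]")
    total = n_correct + n_incorrect
    crisp = n_correct / total
    # Each wrong call still contributes 1 - N(d) to the fuzzy true positives.
    fuzzy_tp = n_correct * conf_correct + n_incorrect * (1.0 - conf_incorrect)
    if fuzzy_tp == 0:
        raise ValueError("fuzzy sensitivity is zero; divergence is undefined")
    fuzzy = fuzzy_tp / total
    return (crisp - fuzzy) / fuzzy * 100.0


# All diagnoses fully confident: fuzzy degrees equal crisp ones.
fully_confident = sensitivity_divergence(28, 1.0, 2, 1.0)
# Incorrect diagnoses expressed with low confidence: negative divergence.
hesitant_errors = sensitivity_divergence(28, 1.0, 2, 0.6)
# Correct diagnoses expressed with lower confidence: positive divergence.
hesitant_hits = sensitivity_divergence(28, 0.8, 2, 1.0)
print(fully_confident, round(hesitant_errors, 2), round(hesitant_hits, 2))
```

In this simplified setting the divergence vanishes only when every diagnosis is fully confident, becomes negative when errors are made hesitantly, and becomes positive when correct diagnoses are made hesitantly, which mirrors the pattern discussed above.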
Our work has some limitations. First, although differential diagnosis between small cysts and metastases is relatively easy during imaging and clinical follow-up of oncologic patients, no histological examination was available to establish final diagnosis. However, general assumptions and applicability of our model do not depend on the presence of incorrect final diagnoses and subsequent variations in δ(F, C) potentially related to our series. Of note, we did not aim to test diagnostic accuracy of fuzzy evaluations per se but rather to verify whether effects of uncertainty are measurable in terms of δ(F, C). For this purpose, we used a simple, dichotomous model, regardless of its clinical importance. It is reasonable to infer that differences between crisp and fuzzy sensitivity, specificity, predictive values, and accuracy would be larger and statistically significant in the case of a more complex clinical scenario (e.g., screening mammography). Nonetheless, the absence of statistical significance does not imply a lack of clinical significance, as can be inferred from large δ(F, C) values in R3 simulations. Second, one might argue that two readers only might have been insufficient to achieve adequate statistical power in our analysis. However, we did not mean to estimate some variant of inter- or intrareader agreement and to establish how precise the estimate was. Divergence consists in the algebraic difference between crisp and fuzzy diagnostic performance as determined by readers' DC (see above); thus statistical power has no influence in its determination. Third, we gave emphasis to the concept of δ(F, C) in this study. One might argue that area under the curve (AUC) at receiver operating characteristic (ROC) analysis is more helpful in assessing uncertainty than δ(F, C). However, the role for fuzzy ROC curves in refining crisp evaluations was investigated elsewhere [8]. Our aim was to preliminarily determine whether (a) the method proposed by Castanho et al. [8] is applicable to real readers; (b) fuzzy and crisp performances are comparable in a clinical scenario involving human readers.
In conclusion, we tested a fuzzy logic-based method accounting for readers' uncertainty (DC levels) on a real, simplified clinical scenario and two simulations using CT. According to this approach, a diagnosis can belong to positive or negative test results at the same time, differently from what occurs when considering diagnosis with binary (i.e., crisp) logic. By operating with the 2 × 2 table cells as fuzzy subsets, we calculated the divergence δ(F, C) between fuzzy and crisp diagnostic accuracy as a measure of the effect of DC levels on the nominal, crisp diagnostic performance (i.e., as a measure of diagnostic information). Potential applications of our method are in the context of clinical practice and/or experimental studies, and should be a matter for targeted investigation: first, by adjusting test results for readers' subjective interpretation; second, by testing the efficacy of readers involved in a study; third, by showing whether confidently incorrect diagnoses significantly affect readers' interpretation; and fourth, by evaluating how training or educational programs improve the knowledge of less experienced radiologists. We expect that major application for this method would be, at least initially, in the setting of experimental