Thyroid nodules are a common clinical problem, and it is important to differentiate benign from malignant nodules. Fine needle aspiration (FNA) is a safe, simple, and cost-effective preoperative diagnostic technique for triaging patients with thyroid nodules [
Proper communication among pathologists, clinicians, radiologists, and surgeons, together with cytohistological correlation, is essential when reporting thyroid FNA. Consistent diagnostic terminology is therefore vital.
To standardize the diagnostic terminology, morphologic criteria, and risk of malignancy used in reporting thyroid FNA, the National Cancer Institute (NCI) organized the NCI Thyroid Fine Needle Aspiration State of the Science Conference in 2007. The conference proposed a six-tier system named The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC). Its categories are I—nondiagnostic, II—benign, III—atypia of undetermined significance (AUS)/follicular lesion of undetermined significance (FLUS), IV—follicular neoplasm (FN)/suspicious for follicular neoplasm (SFN), V—suspicious for malignancy (SM), and VI—malignant, with implied risks of malignancy of 1–4%, 0–3%, 5–15%, 15–30%, 60–75%, and 97–99%, respectively [
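The six categories and their implied risk of malignancy (ROM) ranges can be kept in a small lookup table, which makes it easy to check whether an institution's observed ROM for a category falls below, within, or above the TBSRTC range. The following is a minimal Python sketch; the names `IMPLIED_ROM` and `compare_to_implied_rom` are hypothetical, and the ranges are those quoted above.

```python
# Implied risk of malignancy (ROM, %) for each TBSRTC category, as quoted above.
# The names used here are illustrative and not part of any published tool.
IMPLIED_ROM = {
    "I": ("nondiagnostic", 1.0, 4.0),
    "II": ("benign", 0.0, 3.0),
    "III": ("AUS/FLUS", 5.0, 15.0),
    "IV": ("FN/SFN", 15.0, 30.0),
    "V": ("suspicious for malignancy", 60.0, 75.0),
    "VI": ("malignant", 97.0, 99.0),
}


def compare_to_implied_rom(category: str, observed_rom: float) -> str:
    """Say whether an observed ROM (%) is below, within or above the implied range."""
    _label, low, high = IMPLIED_ROM[category]
    if observed_rom < low:
        return "below"
    if observed_rom > high:
        return "above"
    return "within"


if __name__ == "__main__":
    for category, (label, low, high) in IMPLIED_ROM.items():
        print(f"{category:>3}  {label:<26} {low:g}-{high:g}%")
```

For example, `compare_to_implied_rom("II", 8.5)` returns `"above"`, since 8.5% exceeds the 0–3% range implied for category II.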
The study aimed to evaluate the diagnostic utility and reproducibility of “The Bethesda System for Reporting Thyroid Cytopathology” at our institute.
All thyroid FNA smears and thyroidectomy specimens received in the Department of Pathology at our institute from January 2013 to June 2018 were included in the study after approval from the Institute Ethics Committee. The FNA smears were reviewed and categorized according to the Bethesda system. Cytohistological correlation was done for cases with surgical follow-up.
Statistical analysis was performed using R software version 3.5.1 (R Core Team) and Microsoft Office Excel 2007. Mean, median, and standard deviation (SD) were calculated for continuous variables such as age. Categorical variables were expressed as frequencies and percentages. The ANOVA test was used to calculate the
The diagnostic values (sensitivity, specificity, positive predictive value, negative predictive value, and accuracy) and risk of malignancy for FNAs using the Bethesda system were calculated for cases with surgical follow-up. FNA smears interpreted as nondiagnostic were excluded. True negative cases were defined as nodules with benign FNA cytology and surgical pathology. Follicular neoplasm/suspicious for follicular neoplasm, suspicious for malignancy, and malignant cases confirmed to be malignant upon final histology were considered true positive. Nodules with cytological results of FN/SFN or suspicious for malignancy or malignant diagnosed as benign on surgical excision were interpreted as false positive. False negative samples included cases with benign cytology that were found to be malignant upon histopathology.
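The case definitions above amount to a simple rule that maps each operated nodule's Bethesda category and final histology to a true or false positive or negative. The sketch below is a minimal Python illustration under stated assumptions: the helper names `classify_case` and `tally` are hypothetical, and placing category III with the negative group is an assumption that follows the two-group split used when the diagnostic values were calculated in the Results.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Bethesda categories counted as a positive (surgery-indicating) FNA result.
POSITIVE = {4, 5, 6}  # FN/SFN, suspicious for malignancy, malignant
# Categories counted as negative. Putting category III here is an assumption
# that follows the two-group split used for the diagnostic values in this study.
NEGATIVE = {2, 3}


def classify_case(bethesda: int, malignant_on_histology: bool) -> Optional[str]:
    """Label one operated nodule TP, FP, TN or FN; return None if it is excluded."""
    if bethesda == 1:  # nondiagnostic smears are excluded
        return None
    if bethesda in POSITIVE:
        return "TP" if malignant_on_histology else "FP"
    if bethesda in NEGATIVE:
        return "FN" if malignant_on_histology else "TN"
    raise ValueError(f"unknown Bethesda category: {bethesda}")


def tally(cases: Iterable[Tuple[int, bool]]) -> Counter:
    """Count TP/FP/TN/FN over (bethesda, malignant_on_histology) pairs."""
    labels = (classify_case(b, m) for b, m in cases)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    # Hypothetical example cases, not study data.
    example = [(6, True), (4, False), (2, False), (2, True), (1, False)]
    print(tally(example))
```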
Cross tabulation and Cohen’s Weighted Kappa (
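Cohen's weighted kappa suits the Bethesda scale because the categories are ordered, so a disagreement between adjacent categories is penalized less than one between distant categories. The paper does not state which weighting scheme was used, and Cohen's kappa is defined for two raters, so with three observers it would presumably be computed pair by pair. The Python sketch below assumes linear weights by default; the function names `weighted_kappa` and `landis_koch` are hypothetical, and the verbal bands follow the commonly used Landis and Koch scale.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   categories: Sequence[int], weighting: str = "linear") -> float:
    """Cohen's weighted kappa for two raters on an ordered scale.

    weighting is "linear" or "quadratic"; which one the study used is not stated.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulation of the two raters' calls, as proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the extreme categories
        return d if weighting == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs / exp


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

On this scale a kappa of 0.99 is labelled "almost perfect" and 0.23 "fair", matching the wording used for those values in the comparison of interobserver studies.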
The study included 646 patients who presented with thyroid swelling and were evaluated by FNA. Patient age ranged from 7 to 85 years, with a mean of 41.78 years. The male to female ratio was 1 : 6.3.
Of the 646 cases, 75.9% were benign; nodular goitre was the most frequent benign diagnosis, accounting for 34.7% of all cases. Scant cellularity was the most common reason for a nondiagnostic result (7.8% of all cases). AUS/FLUS (III) and FN/SFN (IV) accounted for 1.2% and 3.7% of cases, respectively. In category V, 2.4% of all cases were suspicious for papillary carcinoma. Papillary carcinoma (2.0%) was the most common malignancy in category VI (Table
Distribution of cases according to the Bethesda system.
| Bethesda category | Bethesda category percentage (%) | FNA diagnosis | No. of cases (total = 646) | Percentage of all cases (%) |
|---|---|---|---|---|
| I—nondiagnostic (89) | 13.8 | Cyst fluid | 6 | 0.9 |
| | | Scant cellularity | 50 | 7.8 |
| | | Obscuring blood | 33 | 5.1 |
| II—benign (490) | 75.9 | Nodular goitre | 224 | 34.7 |
| | | Adenomatoid nodule | 37 | 5.7 |
| | | Colloid nodule | 70 | 10.8 |
| | | Graves’ disease | 3 | 0.5 |
| | | Lymphocytic (Hashimoto) thyroiditis | 156 | 24.2 |
| III—AUS/FLUS (8) | 1.2 | AUS/FLUS | 8 | 1.2 |
| IV—FN/SFN (24) | 3.7 | FN/SFN | 24 | 3.7 |
| V—suspicious for malignancy (17) | 2.6 | Suspicious for papillary carcinoma | 16 | 2.4 |
| | | Suspicious for medullary carcinoma | 1 | 0.2 |
| VI—malignant (18) | 2.8 | Papillary carcinoma | 13 | 2.0 |
| | | Medullary carcinoma | 3 | 0.4 |
| | | Poorly differentiated carcinoma | 1 | 0.2 |
| | | Undifferentiated carcinoma | 1 | 0.2 |
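Each category percentage in the table is the category count divided by the 646 aspirates, rounded to one decimal place. The short Python sketch below reproduces the category-level figures; `CATEGORY_COUNTS` and `percentages` are hypothetical names, and the counts are transcribed from the table above.

```python
# Number of aspirates in each Bethesda category, from the distribution table above.
CATEGORY_COUNTS = {"I": 89, "II": 490, "III": 8, "IV": 24, "V": 17, "VI": 18}


def percentages(counts: dict, decimals: int = 1) -> dict:
    """Express each count as a percentage of the total, rounded for reporting."""
    total = sum(counts.values())
    return {key: round(100 * value / total, decimals) for key, value in counts.items()}


if __name__ == "__main__":
    print(sum(CATEGORY_COUNTS.values()))  # total number of aspirates
    for category, pct in percentages(CATEGORY_COUNTS).items():
        print(f"{category:>3}: {pct}%")
```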
Cytohistological correlation was done for 100 patients with surgical follow-up. On histopathology, 71 cases were benign, the most common diagnosis being nodular goitre, and 29 were malignant. Papillary carcinoma (17%) was the most common malignancy, followed by follicular carcinoma (6%) (Table
Cytohistological correlation with assessment of risk of malignancy and risk of neoplasm.
| Bethesda category | No. of cases (total = 646) | Cases that underwent surgery (total = 100) | Histopathology: benign nonneoplastic | Histopathology: benign neoplastic | Histopathology: malignant lesion | Risk of neoplasm (%) | Risk of malignancy including papillary microcarcinoma (%) | Risk of malignancy excluding papillary microcarcinoma (%) |
|---|---|---|---|---|---|---|---|---|
| I—nondiagnostic | 89 (13.8%) | 1 | Colloid nodule (1) | 0 | 0 | 0 | 0 | 0 |
| II—benign | 490 (75.9%) | 71 | Nodular goitre (42); Adenomatoid hyperplasia (10); Colloid nodule (5); Lymphocytic/Hashimoto thyroiditis (4) | Follicular adenoma (4) | Follicular carcinoma (1); Papillary carcinoma (2); Papillary microcarcinoma (2); Hurthle cell carcinoma (1) | 14.1 | 8.5 | 5.6 |
| III—AUS/FLUS | 8 (1.2%) | 3 | | Follicular adenoma (1) | Follicular carcinoma (1); Papillary carcinoma (1) | 100 | 66.7 | 66.7 |
| IV—FN/SFN | 24 (3.7%) | 11 | Nodular goitre (2) | Follicular adenoma (1); Hurthle cell adenoma (1) | Follicular carcinoma (4); Papillary carcinoma (2); Medullary carcinoma (1) | 81.8 | 63.6 | 63.6 |
| V—suspicious for malignancy | 17 (2.6%) | 7 | | | Papillary carcinoma (5); Papillary microcarcinoma (1); Medullary carcinoma (1) | 100 | 100 | 85.7 |
| VI—malignant | 18 (2.8%) | 7 | | | Papillary carcinoma (7) | 100 | 100 | 100 |
Risk of malignancy was assessed for the 100 cases with surgical follow-up, of which one was excluded because it had been reported as nondiagnostic on cytology. To calculate the risk of neoplasm, the surgical resections were divided into three groups: benign nonneoplastic lesions, benign neoplasms, and malignant lesions (Table
The remaining 99 cases were divided into two groups. One group comprised Bethesda categories II and III, for which surgery is not recommended because of low malignancy risk; the other comprised Bethesda categories IV, V, and VI, for which surgery is recommended because of high malignancy risk. The resulting sensitivity, specificity, positive predictive value, negative predictive value, and diagnostic accuracy were 72.4%, 94.3%, 84%, 89.2%, and 87.9%, respectively (Table
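With the resections split into these three groups, the risk of neoplasm (RON) for a category is (benign neoplasms + malignant lesions) divided by the number of operated nodules, and the risk of malignancy (ROM) is malignant lesions divided by the same denominator. The Python sketch below recomputes the figures in the cytohistological correlation table; the class `SurgicalOutcome` and its method names are hypothetical. When papillary microcarcinomas are excluded, they are removed from the numerator only, because that is what reproduces the tabulated values (for example, 4/71 = 5.6% for category II).

```python
from dataclasses import dataclass


@dataclass
class SurgicalOutcome:
    """Histopathology counts for the operated nodules in one Bethesda category."""
    nonneoplastic: int
    benign_neoplasm: int
    malignant: int
    microcarcinoma: int = 0  # papillary microcarcinomas, already counted in `malignant`

    @property
    def operated(self) -> int:
        return self.nonneoplastic + self.benign_neoplasm + self.malignant

    def risk_of_neoplasm(self) -> float:
        """Benign neoplasms plus malignancies, as % of operated nodules."""
        return 100 * (self.benign_neoplasm + self.malignant) / self.operated

    def risk_of_malignancy(self, include_microcarcinoma: bool = True) -> float:
        """Malignancies as % of operated nodules.

        Excluded microcarcinomas are dropped from the numerator only.
        """
        malignant = self.malignant
        if not include_microcarcinoma:
            malignant -= self.microcarcinoma
        return 100 * malignant / self.operated


# Counts transcribed from the cytohistological correlation table.
OUTCOMES = {
    "I": SurgicalOutcome(1, 0, 0),
    "II": SurgicalOutcome(61, 4, 6, microcarcinoma=2),
    "III": SurgicalOutcome(0, 1, 2),
    "IV": SurgicalOutcome(2, 2, 7),
    "V": SurgicalOutcome(0, 0, 7, microcarcinoma=1),
    "VI": SurgicalOutcome(0, 0, 7),
}

if __name__ == "__main__":
    for category, outcome in OUTCOMES.items():
        print(f"{category:>3}: n={outcome.operated:>2}  "
              f"RON={outcome.risk_of_neoplasm():5.1f}%  "
              f"ROM={outcome.risk_of_malignancy():5.1f}%  "
              f"ROM excl. microcarcinoma={outcome.risk_of_malignancy(False):5.1f}%")
```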
Determination of diagnostic values.
Test | Histopathology malignant | Histopathology benign | Total |
---|---|---|---|
FNA Bethesda categories IV, V, VI | 21 | 4 | 25 |
FNA Bethesda categories II, III | 8 | 66 | 74 |
Total | 29 | 70 | 99 |
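The five diagnostic values follow directly from the four cells of this table (true positives 21, false positives 4, false negatives 8, true negatives 66). A minimal Python sketch of the standard formulas is given below; the function name `diagnostic_values` is hypothetical.

```python
def diagnostic_values(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic measures, returned as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # malignant cases called positive
        "specificity": 100 * tn / (tn + fp),   # benign cases called negative
        "ppv": 100 * tp / (tp + fp),           # positive calls that were malignant
        "npv": 100 * tn / (tn + fn),           # negative calls that were benign
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Cells of the table above: categories IV-VI vs II-III against histopathology.
    for name, value in diagnostic_values(tp=21, fp=4, fn=8, tn=66).items():
        print(f"{name}: {value:.1f}%")
```

Run on the cells above, it reproduces the reported 72.4%, 94.3%, 84%, 89.2%, and 87.9%.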
Cross tabulation and Cohen’s Weighted Kappa (
The goal of thyroid FNA is to differentiate benign from malignant lesions reliably and to triage patients who require surgery. The six-tiered Bethesda system provides a standardized nomenclature for reporting thyroid FNA smears, which enables better communication and understanding between clinicians and pathologists. The advantage of this systematic approach is that each of the six Bethesda categories carries an implied risk of malignancy, which helps clinicians plan appropriate therapy for the patient [
A nondiagnostic (ND) thyroid FNA result remains a major obstacle to arriving at a definitive diagnosis and is the most common cause of false negative reports [
Gunes et al. noted that the clinical expertise of the person performing the FNA, the use of ultrasound guidance, and rapid on-site evaluation of specimen adequacy were not uniform between studies, which contributes to the wide range of reported malignancy rates. All these determinants make comparison between studies difficult and should be taken into consideration when labelling a specimen as nondiagnostic and when assessing the risk of malignancy [
In our study, the nondiagnostic rate was 13.8%, which is high compared with the TBSRTC consensus. Sampling error and poor technical quality, for the reasons mentioned above, together with strict adherence to the adequacy criteria, explain the high rate of ND smears.
Mondal et al. and Nandedkar et al. found a high incidence of category II lesions because patients visited the tertiary care center directly for primary diagnosis without referral, as was also the case in our study [
The incidence of benign lesions in our study was 75.9%, compared with 64% to 66% in studies done in the USA. The difference can be attributed to regional variation in the incidence of thyroid disorders and to the fact that, in those studies, most patients were seen only on referral and hence were not fully representative of the general population [
The implied risk of malignancy for category II is 0% to 3% with the recommended management being clinical follow-up of patients [
The indeterminate AUS/FLUS category has led to confusion because of its inconsistent use by pathologists at different institutions. TBSRTC proposes that this category be used as a last resort, with no more than 7% of cases expected to receive this diagnosis. Layfield et al. reported rates ranging from 2.5% to 28.6% among individual pathologists and from 3.3% to 14.9% among three academic institutions [
Relatively few cases (1.2%) were diagnosed as AUS/FLUS in our study because of rigid adherence to the diagnostic criteria and the pathologists' effort to avoid ambiguity and keep the use of AUS/FLUS to a minimum. This is similar to a study by Nandedkar et al., in which 0.8% of 606 FNAs were placed in category III [
Mondal et al. reported a lower percentage (1%) of AUS/FLUS cases, which resulted from performing ultrasound-guided FNA in small and heterogeneous nodules with suspicious features on palpation and radiological evaluation, so that the aspirate could be obtained from the exact site of the lesion; this is also routine practice at our institute [
The actual risk of malignancy of category III is difficult to determine, since a confirmatory diagnosis is available only for the subset of patients selected for surgery because of suspicious clinical or USG features. These patients are also subject to selection bias, which leads to overestimation of the prevalence of malignancy [
In a study by Park et al., the risk of malignancy of AUS/FLUS cases was 69%, which was higher than in our study and than the TBSRTC estimate. This was because patients with a high index of clinical suspicion for malignancy undergo surgery without a repeat FNA. Patients also tend to be more concerned about false positive than false negative results, which might have pressured cytopathologists to underdiagnose cases to avoid false positive diagnoses [
Our study was conducted in a teaching hospital, where FNAs were performed by different operators with varying levels of experience during their training. This could have resulted in hemodilution and artefactual changes during smear preparation, which might have contributed to the higher ROM in category III (Figure
Atypia of undetermined significance (Bethesda category III). Smear shows clotting artefact with crowding of follicular cells hindering the interpretation (MGG stain ×400).
Based on cytology it is difficult to distinguish follicular carcinoma from follicular adenoma [
Follicular neoplasm/suspicious for follicular neoplasm (Bethesda category IV). (a) Highly cellular smear with cells arranged predominantly in a microfollicular pattern (MGG ×100). Histopathology of the same lesion showed follicular carcinoma with capsular invasion (b) and vascular invasion (c) (H&E ×100).
The high ROM in categories III and IV in our study compared with other studies may be explained as follows. Firstly, the indeterminate categories III and IV are heterogeneous and subject to variation in interpretation across institutions [
Our study had 2.4% of cases suspicious for papillary thyroid carcinoma (PTC), which was similar to the lower end of the range reported for the suspicious-for-PTC category in the following study [
The ROM for categories V and VI in a study by Partyka et al. correlated well with our study, at 100% for each category after inclusion of papillary microcarcinoma [
Suspicious for papillary carcinoma (Bethesda category V). (a) One of the follicular cells shows a nuclear groove (arrow) (H&E ×400). (b) Intranuclear cytoplasmic inclusion (arrow) in an occasional follicular cell (H&E ×400). (c) Smear shows a focal papillaroid structure (H&E ×400). (d) Histopathology of the same lesion showed papillary microcarcinoma (H&E ×100).
The risk of neoplasm (RON) is the proportion of operated nodules that prove to be neoplastic, whether benign or malignant. Our study had no risk of neoplasm in the nondiagnostic category (Table
The RON of category II was similar to that in the study by Wu et al. (Table
Comparison of risk of neoplasm of our study with another study by Wu et al. [
Bethesda category | Risk of neoplasm in our study (%) ( | Risk of neoplasm in the study by Wu et al. (%) ( |
---|---|---|
I—nondiagnostic | 0 | 24 |
II—benign | 14.1 | 14 |
III—AUS/FLUS | 100 | 44 |
IV—FN/SFN | 81.8 | 67 |
V—SM | 100 | 77 |
VI—malignant | 100 | 100 |
Compared with the study by Wu et al., our study predicted the RON of categories III, V, and VI more accurately, which could be attributed to the routine practice at our institute of correlating cytology with clinical, biochemical, and radiological features (Table
The FN/SFN category had a RON of 81.8%, which was higher than that in the study by Wu et al. The remaining 18.2% (2 of 11 operated cases) were nodular goitres that had been classified as category IV lesions (Table
Mehra and Verma found that the method of statistical analysis can alter the calculated diagnostic values. If suspicious lesions are considered positive, sensitivity increases while specificity decreases; if suspicious lesions are excluded, sensitivity decreases and the false negative rate increases. In their study, diagnostic values were calculated either by excluding FN/SFN or by grouping it with the benign or the malignant diagnoses, to highlight the effect on the diagnostic values [
Shi et al. suggested that eliminating the category III diagnosis substantially decreases the sensitivity of thyroid FNA (the sensitivity for detecting PTC dropped from 100% to 27%) and increases both false positive and false negative rates. The authors concluded that the AUS/FLUS category should not be eliminated but should be used minimally [
Our findings indicate that sensitivity, specificity, positive predictive value, negative predictive value, and diagnostic accuracy calculated for thyroid FNAs reported with the Bethesda system are less reliable because the assignment of cases to categories III (AUS/FLUS) and IV (FN/SFN) is arbitrary (Table
The main purpose of TBSRTC was to eliminate ambiguity and bring uniformity to the reporting of thyroid FNAs, thereby easing communication between pathologists and clinicians and helping to plan appropriate treatment for patients [
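The effect Mehra and Verma describe can be illustrated by moving the positive/negative cut-off across the Bethesda scale and recomputing sensitivity and specificity. The Python sketch below applies this to the counts in the cytohistological correlation table of this study; the names `HISTOLOGY_BY_CATEGORY` and `sensitivity_specificity` are hypothetical, and only the cut-off at category IV corresponds to the analysis actually reported. The cut-offs at III and V are illustrative recalculations, not results reported in the study.

```python
from typing import Dict, Tuple

# (malignant, benign) on histopathology for each operated Bethesda category,
# from the cytohistological correlation table (nondiagnostic case excluded).
HISTOLOGY_BY_CATEGORY: Dict[int, Tuple[int, int]] = {
    2: (6, 65), 3: (2, 1), 4: (7, 4), 5: (7, 0), 6: (7, 0),
}


def sensitivity_specificity(counts: Dict[int, Tuple[int, int]],
                            positive_from: int) -> Tuple[float, float]:
    """Call categories >= positive_from test-positive; return (sensitivity, specificity) %."""
    tp = sum(m for cat, (m, _) in counts.items() if cat >= positive_from)
    fn = sum(m for cat, (m, _) in counts.items() if cat < positive_from)
    fp = sum(b for cat, (_, b) in counts.items() if cat >= positive_from)
    tn = sum(b for cat, (_, b) in counts.items() if cat < positive_from)
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


if __name__ == "__main__":
    # positive_from=4 is the split used in this study; 3 and 5 are illustrative
    # alternatives (AUS/FLUS counted positive, or FN/SFN counted negative).
    for cutoff in (3, 4, 5):
        sens, spec = sensitivity_specificity(HISTOLOGY_BY_CATEGORY, cutoff)
        print(f"categories >= {cutoff} positive: "
              f"sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

On these counts, lowering the cut-off to include category III raises sensitivity and lowers specificity, while counting FN/SFN as negative does the reverse, which is the pattern described above.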
Comparison of interobserver reproducibility among various studies.
Study | No. of observers | Interobserver agreement |
---|---|---|
Awasthi et al. [ | 2 | Good (Cohen’s kappa score 0.613) |
Padmanabhan et al. [ | 7 | Fair (Fleiss kappa score 0.23) |
Pathak et al. [ | 3 | Strong (Fleiss kappa score 0.6561) |
Our study | 3 | Almost perfect (Cohen’s kappa score 0.99) |
Our study differed from that of Padmanabhan et al., who assessed interobserver reproducibility in reporting the AUS/FLUS category among seven cytopathologists, found only fair agreement (Fleiss kappa score 0.23), and recommended review of AUS/FLUS cases for more definite categorization [
Reporting thyroid FNA smears with the Bethesda system helped achieve a more precise cytological diagnosis. Our study supports the greater reproducibility among pathologists of TBSRTC for reporting thyroid FNA. The Bethesda system has the added advantage of predicting the risk of malignancy, which helps the clinician plan follow-up or surgery as well as the extent of surgery.
The raw data supporting the findings of this study have not been made available because of patient confidentiality and privacy rules.
The nondiagnostic yield was high because of the varied experience of the persons who performed the thyroid FNAs. Repeat USG-guided FNA would have reduced the number of nondiagnostic aspirates, but it was feasible only for patients whose clinical and radiological features were highly suspicious for malignancy.
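Cohen's kappa compares two raters at a time; for more raters, such as the seven cytopathologists in the study by Padmanabhan et al., Fleiss' kappa is the usual generalization. The following is a minimal Python sketch of the standard Fleiss formula; the function `fleiss_kappa` and its count-table input format are illustrative and are not taken from any of the cited studies.

```python
from typing import List


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss' kappa for several raters.

    table[i][j] = number of raters who put subject (smear) i in category j;
    every subject must be rated by the same number of raters.
    """
    if not table:
        raise ValueError("need at least one subject")
    n_subjects = len(table)
    n_raters = sum(table[0])
    k = len(table[0])
    if any(len(row) != k for row in table):
        raise ValueError("every row needs one count per category")
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each subject needs the same number (>= 2) of raters")

    # Overall proportion of assignments to each category.
    p_j = [sum(row[j] for row in table) / (n_subjects * n_raters) for j in range(k)]
    # Agreement among rater pairs for each subject.
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    p_bar = sum(p_i) / n_subjects   # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For instance, two smears each placed in the same category by all three raters, but in different categories from each other, give `fleiss_kappa([[3, 0], [0, 3]]) == 1.0`.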
The authors declare that they have no conflicts of interest.
The authors would like to thank the Department of Pathology, Pondicherry Institute of Medical Sciences for their guidance and support, and Dr. Anand Mariaselvam, Medical Officer, Indira Gandhi Government General Hospital and Post Graduate Institute, Puducherry, for his technical support in the preparation of the manuscript.