The effectiveness of ultrasound surveillance for hepatocellular carcinoma in a Canadian centre and determinants of its success

1Department of Medical Imaging; 2Department of Gastroenterology, University of Toronto, Toronto, Ontario
Correspondence: Dr Korosh Khalili, Department of Medical Imaging, University Health Network, Princess Margaret Hospital, 610 University Avenue, Room 3-964, Toronto, Ontario M5G 2M9. Telephone 416-946-4501 ext 4833, fax 416-946-6564, e-mail korosh.khalili@uhn.ca
Received for publication July 3, 2014. Accepted February 10, 2015
K Khalili, R Menezes, TK Kim, et al. The effectiveness of ultrasound surveillance for hepatocellular carcinoma in a Canadian centre and determinants of its success. Can J Gastroenterol Hepatol 2015;29(5):267-273.

Surveillance of populations at risk for hepatocellular carcinoma (HCC) using ultrasound (US) has been suggested by a randomized controlled trial to improve survival, is cost effective, and is recommended by American and international associations dedicated to the study of liver disease (1-5). It is, nevertheless, an imperfect test and its effectiveness has been questioned in the North American setting, in which patient size may limit sound transmission (6,7). The method of HCC surveillance in North America is variable, with serum alphafetoprotein (AFP), computed tomography (CT) and magnetic resonance imaging (MRI) also being used as primary surveillance tools (8,9).

Approximately 3000 patients undergo active HCC surveillance in our combined hospitals. We aimed to determine the proportion of patients whose tumour was detected at a potentially curable stage using US surveillance alone. Furthermore, we wanted to know whether any patient-, liver- or tumour-related factors were associated with success of surveillance (ie, detection of the tumour at a curative stage). The aim of the present study was, therefore, twofold: to determine the effectiveness of US surveillance for HCC in a specialized Canadian liver disease centre; and to determine independent variables that were associated with success of surveillance.


METHODS
The present study was approved by the institution's research ethics board and informed consent was waived. Electronic patient charts, available at one of the surveillance centres (Toronto Western Hospital, Toronto, Ontario), were retrospectively reviewed to identify all 236 patients with a new diagnosis of HCC between January 2000 and June 2010, and whose tumour was discovered first at the authors' centre. Patients who presented symptomatically or whose tumour was found using other modes of imaging were excluded (Figure 1). The reasons for other modes of imaging were: long-term follow-up of a different, ultimately benign nodule detected by US surveillance (10); symptomatic presentation (7); imaging for unrelated disease (5); surveillance by CT for previous HCC >2 years from the original tumour (3).
A total of 201 patients were included in the analysis; their characteristics are summarized in Table 1. The diagnosis of HCC was confirmed for all patients using the following criteria: the updated American Association for the Study of Liver Diseases (AASLD) HCC management guidelines' imaging diagnostic criteria of one positive contrast-enhanced imaging scan in at-risk patients; positive histopathology from core biopsy or explant specimens; or recurrence of tumour after treatment (3). All relevant imaging was directly and retrospectively reviewed by a fellowship-trained abdominal imager with expertise in hepatobiliary imaging (12 years of practice) to ensure compliance with the latest imaging criteria and to confirm staging. The reviewer was blinded to the surveillance intervals.

Determining effectiveness
The Milan criteria for treatment of HCC by transplantation were used as an outcome measure, with successful US surveillance defined as fulfilling the criteria of one nodule <5 cm in size, or three nodules <3 cm in size, with no vascular invasion or distant metastases (10). The Barcelona Clinic Liver Cancer (BCLC) staging classification was used as an additional outcome measure (11). Successful surveillance was defined as detection of tumour using US in potentially curative stages 0 (one nodule <2 cm) or A (one resectable nodule of any size or three nodules <3 cm) of disease. Failure was defined as presentation in the palliative stages B to D.
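The two outcome definitions above can be expressed as a short Python sketch. It is illustrative only and not the authors' analysis code: the names `Tumour`, `within_milan` and `bclc_success` are hypothetical, "three nodules" is read here as "two or three nodules" (an assumption), and the BCLC stage is taken as an already-assigned input because full BCLC staging depends on clinical factors not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tumour:
    """Findings on the detection scan (hypothetical structure for illustration)."""
    nodule_sizes_cm: List[float]      # diameter of each nodule
    vascular_invasion: bool = False
    distant_metastases: bool = False


def within_milan(t: Tumour) -> bool:
    """Milan criteria as worded in the text: one nodule <5 cm, or up to
    three nodules each <3 cm, with no vascular invasion or metastases.
    Reading 'three nodules' as 'two or three' is an assumption."""
    if t.vascular_invasion or t.distant_metastases:
        return False
    n = len(t.nodule_sizes_cm)
    if n == 1:
        return t.nodule_sizes_cm[0] < 5.0
    return 2 <= n <= 3 and all(size < 3.0 for size in t.nodule_sizes_cm)


def bclc_success(stage: str) -> bool:
    """Success = curative stages 0 or A; failure = palliative stages B to D."""
    if stage not in {"0", "A", "B", "C", "D"}:
        raise ValueError(f"unknown BCLC stage: {stage!r}")
    return stage in {"0", "A"}
```

For example, a single 4.5 cm nodule without vascular invasion would count as a success under the Milan end point, whereas two nodules of 2.0 cm and 3.5 cm would not.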
For each patient, reports and images of all US scans were reviewed to determine the date of the earliest positive surveillance US of a nodule. This scan was noted as the 'detection scan' and was used to determine tumour stage. If a nodule ≥1 cm had been detected by a surveillance US but followed by imaging and subsequently proven to be malignant, the tumour stage at the time of the original detection scan, rather than at the time of proof of malignancy, was used. The surveillance US images were reviewed to ensure that the nodule(s) detected corresponded to that characterized as malignant on contrast-enhanced imaging; if they did not, the malignant nodule was designated either as 'missed' by surveillance (if found within three months of the surveillance US) or as detected by other means and excluded (if found ≥3 months after the surveillance US). Nodules <1 cm were not examined for the purposes of the present study because these were conventionally followed (3).

Missed tumours
If a surveillance US was negative and an HCC was detected within three months by other means (imaging or AFP, including a repeat US scan), it was defined as 'missed' by surveillance. Three months was used as a cut-off because a fast-growing malignant tumour may be truly undetectable on one surveillance image but present on the next. 'Missed' nodules were omitted from statistical analysis of effectiveness of surveillance (above) and also of potential determinants of successful surveillance (below) because it would not be possible to determine what proportion of these would be detected by the next surveillance US while still in a curable stage.
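The three-month rule can be sketched as a small Python function. This is a hypothetical illustration, not the authors' code: the function name and the approximation of "three months" as 90 days are assumptions (the AFP window described in the next section is stated in days, which motivates the choice).

```python
from datetime import date

# Assumption: "three months" is approximated as 90 days.
MISSED_WINDOW_DAYS = 90


def classify_non_us_detection(negative_us: date, detected_by_other: date) -> str:
    """Classify an HCC found by other means after a negative surveillance US.

    Returns 'missed' if found within the window, otherwise 'excluded'
    (detected by other means and left out of the study).
    """
    gap_days = (detected_by_other - negative_us).days
    if gap_days < 0:
        raise ValueError("detection date precedes the surveillance US")
    return "missed" if gap_days < MISSED_WINDOW_DAYS else "excluded"
```

Under these assumptions, a tumour found on CT six weeks after a negative US would be labelled 'missed', and one found five months later would be labelled 'excluded'.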

The contribution of AFP
The charts of all patients were reviewed and the levels and dates of serum AFP measurements were recorded. Only measurements within 90 days of the final surveillance US were included (from 90 days before to 90 days after). If multiple measurements were available before the last surveillance scan, then the highest level was used. If multiple measurements were available after the detection of tumour, the closest measurement to the date of US was used. An AFP level >20 ng/mL was defined as positive.
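The selection rule for AFP measurements can be sketched in Python as follows. The names `select_afp` and `afp_positive` are hypothetical. The text does not say which value takes precedence when measurements exist both before and after the scan; the sketch assumes that pre-scan measurements are preferred, and that a measurement on the day of the scan counts as 'before'. Both choices are assumptions, not the authors' stated method.

```python
from datetime import date
from typing import List, Optional, Tuple

WINDOW_DAYS = 90               # from 90 days before to 90 days after the US
AFP_POSITIVE_NG_ML = 20.0      # levels above this are positive

Measurement = Tuple[date, float]   # (date drawn, AFP level in ng/mL)


def select_afp(us_date: date, measurements: List[Measurement]) -> Optional[float]:
    """Pick the AFP level to analyse for one patient, or None if none qualifies."""
    in_window = [(d, v) for d, v in measurements
                 if abs((d - us_date).days) <= WINDOW_DAYS]
    # Assumption: measurements on or before the scan take precedence.
    before = [v for d, v in in_window if d <= us_date]
    if before:
        return max(before)                       # highest pre-scan level
    after = [(d, v) for d, v in in_window if d > us_date]
    if after:
        # closest measurement to the date of the US
        return min(after, key=lambda m: (m[0] - us_date).days)[1]
    return None


def afp_positive(level: Optional[float]) -> Optional[bool]:
    """True if the level exceeds 20 ng/mL; None if no level is available."""
    return None if level is None else level > AFP_POSITIVE_NG_ML
```

A patient with pre-scan levels of 15 and 30 ng/mL would be analysed at 30 ng/mL (positive); a patient whose only measurement lies outside the 90-day window would have no AFP result.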

Potential determinants of successful surveillance
For the present analysis, patients whose tumours were detected on the first surveillance US scan were omitted because they were representative of a screening (prevalence) population and not of a surveillance (incidence) population. On the first surveillance scan, a tumour may be detected in an advanced stage simply because there was no previous surveillance. On subsequent surveillance scans, a tumour may be detected in an advanced stage due to patient- or tumour-related variables that led to failure of surveillance.
The following variables were tested for an association with successful surveillance: age, sex, ethnicity, cause of liver disease, severity of liver disease, year of detection, location of residence and surveillance frequency. Due to the limited size of the study population, variables were broadly grouped where appropriate. Ethnicity was divided into two groups: Caucasian (European descent) and others (including East/Southeast Asian, African, Middle-Eastern, South-Asian, Caribbean and Latin American). Causes of liver disease included chronic hepatitis B virus infection, chronic hepatitis C virus infection, other causes and multiple causes (ie, patients with more than one cause of disease). This latter category (multiple causes) was created because these patients are more likely to be closely followed and screened. The Child-Pugh score was used for severity of liver disease, with noncirrhotic patients grouped with Child's A versus patients with Child's B and C scores. Location of residence was divided into metropolitan versus beyond metropolitan groups to determine whether there was a difference between urban and rural populations. Finally, surveillance frequency was grouped into ≤12 months (regular) and >12 months (irregular). The authors acknowledge that a six-month surveillance interval is the standard of care; however, because the present study covers a substantial period before publication of the AASLD guidelines in 2005, a 12-month follow-up interval may still have been considered appropriate by some clinicians. Therefore, for the sake of consistent terminology, both within the present study and with other studies that cover a similar period of examination (9), a frequency of ≤12 months has been defined as 'regular surveillance'.
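Two of the groupings described above lend themselves to a brief Python sketch. The function names and the use of `None` to represent a noncirrhotic patient are hypothetical conventions chosen for illustration.

```python
from typing import Optional


def surveillance_group(interval_months: float) -> str:
    """'regular' for intervals of 12 months or less, 'irregular' otherwise."""
    if interval_months <= 0:
        raise ValueError("interval must be positive")
    return "regular" if interval_months <= 12 else "irregular"


def severity_group(child_pugh_class: Optional[str]) -> str:
    """Group noncirrhotic patients (None, by assumption) with Child's A,
    and Child's B with Child's C."""
    if child_pugh_class is None or child_pugh_class == "A":
        return "noncirrhotic/A"
    if child_pugh_class in ("B", "C"):
        return "B/C"
    raise ValueError(f"unknown Child-Pugh class: {child_pugh_class!r}")
```

With this coding, both a six-month and a 12-month interval fall in the 'regular' group, which matches the terminology adopted in the text.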

Statistical methods
A two-tailed Fisher's exact test was used to compare effectiveness between regular and irregular surveillance groups. χ² analysis, t test (for age) and unadjusted logistic regression were used to identify potential determinants of successful surveillance. Patients' regions of residence were derived using the Postal Code Conversion File + Version 5 (12). Postal Code Conversion File + Version 5 is a series of files created by Statistics Canada based on the most recent Canadian census data; it assigns geographical identifiers based on postal codes. P<0.05 was considered to indicate a significant association. Analyses were performed using SAS version 9.2 (SAS Institute, USA) and SPSS version 17.0 (IBM Corporation, USA) for Windows (Microsoft Corporation, USA).
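For readers unfamiliar with these tests, the following Python sketch shows how a two-tailed Fisher's exact test and an unadjusted odds ratio with a 95% CI can be computed from a 2×2 table (a, b in the first row; c, d in the second). It is not the authors' SAS or SPSS code, and the function names are hypothetical. The CI uses the log (Woolf) method, which for a single binary predictor coincides with the Wald interval from unadjusted logistic regression. The study's actual cell counts are not reproduced here.

```python
from math import comb, exp, log, sqrt
from typing import Tuple


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed P value: sum of the probabilities of all tables with the
    same margins that are no more likely than the observed table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))   # small tolerance for float ties
    return min(1.0, total)


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio (ad/bc) with a Woolf 95% CI; all cells must be non-zero."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)
```

In use, the rows would be the regular and irregular surveillance groups and the columns would be tumours within and outside the Milan criteria.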

RESULTS
Effectiveness of US surveillance
The stage of tumour at detection by US using end points of Milan criteria and BCLC staging system are listed in Tables 2 and 3, and graphically depicted in Figures 2 and 3, respectively. Seventy-seven percent of tumours were discovered within the Milan criteria through regular US surveillance. Similarly, 80% of patients undergoing regular surveillance had their tumour detected by US in BCLC curative stages of 0 (33%) and A (47%).
When comparing the proportion of patients within and outside the Milan criteria for transplantation, there was a statistically significant difference between patients undergoing regular surveillance versus […] Table 4. The overall sensitivity of AFP detection of tumour within Milan criteria was 32% using a threshold of 20 ng/mL. AFP was able to detect tumour within three months of a negative surveillance US (ie, missed) in four patients, all of whom were undergoing regular surveillance. Because 70 of 129 patients with available serum AFP were undergoing regular US surveillance, the addition of AFP resulted in detection of HCC in four of 70 (6%) patients undergoing regular US surveillance.
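The incremental yield of AFP quoted above is a simple proportion, with the patients on regular US surveillance who had an AFP result as the denominator. A minimal Python sketch of the arithmetic, using a hypothetical helper name:

```python
def incremental_yield_percent(detected_by_afp_only: int, patients_with_afp: int) -> float:
    """Percentage of patients in whom AFP found a tumour that US had missed."""
    if patients_with_afp <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * detected_by_afp_only / patients_with_afp


# Figures from the text: four of 70 patients on regular surveillance.
afp_yield = incremental_yield_percent(4, 70)
```

Here `afp_yield` is approximately 5.7, which rounds to the 6% reported.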

Potential determinants of success of surveillance by US
The results of univariate analysis using Milan transplantation criteria are summarized in Table 5. The results of univariate analysis using the BCLC staging system as the end point are summarized in Table 6. In none of the variables tested was there a statistically significant correlation with detection in the palliative stages of tumour (BCLC B to D).

DISCUSSION
The present study showed that in a diverse Canadian population, regular US surveillance performed in a hepatobiliary centre resulted in detection of tumour in curative stages in at least 80% of patients. In addition, at least 77% of patients met the Milan criteria for liver transplantation. Furthermore, regular surveillance resulted in detection of tumour in the curable stages in a significantly higher proportion of patients than in irregular surveillance or screening (first surveillance) populations. In other words, regular surveillance for HCC resulted in stage migration. Tumours detected by other means within three months of a negative surveillance US were designated as 'missed' in the present study; because some of these would likely have been detected at a curable stage on a future surveillance scan, the true effectiveness of US in our centre is likely higher.
Our results show that US surveillance in North America is as effective as the best published rates. A recent meta-analysis of prospective studies assessing the effectiveness of US surveillance for HCC reported a combined sensitivity of 63% for detection of tumour within Milan criteria for transplantation (13). However, most studies used in this particular analysis were old, with only three of 12 study analyses performed in the past decade, and only one after 2006. Imaging technology is continually improving and important sonographic innovations, such as harmonic and compound imaging, have become part of standard US scanners over the past decade (14). A review of prospective and retrospective US surveillance studies with or without AFP published over the past two years shows a sensitivity of 69% to 88% for detection of tumour within Milan criteria (15-19). Furthermore, detection of very early HCC (BCLC stage 0, <2 cm) is reported to be 8% to 43% (15-19). Our rates of 77% for detection of HCC within Milan criteria and 33% for detection of very early HCC are well within this reported range.
Regular surveillance was the only independent variable associated with detection of disease in early, curable stages. Regular surveillance increased the odds of detecting tumour within Milan criteria by a factor of 2.76 (95% CI 1.10 to 6.88). Just as importantly, there was no significant difference in sensitivity of US surveillance between Caucasian and non-Caucasian ethnicity. The present study was the first to directly demonstrate similar sensitivities within different ethnic groups; previously, this was inferred by comparing effectiveness rates published in Europe and Asia (1,3). Furthermore, a significantly higher proportion of tumours were detected in the very early BCLC stage of 0 (<2 cm) in the latter half of our study period (2006 to 2010) than in the first half (2000 to 2005) (P=0.02). This is an important finding because treatment of tumours <2 cm in size results in five-year survival rates of up to 90% (5). The reason for increased detection of smaller tumours in 2006 to 2010 may be improved US technology; all of our scanners were upgraded in 2006 to the latest generation of high-end equipment. It was also due to increased frequency of surveillance because the proportion of patients undergoing six-month surveillance intervals rose from 38% to 44% after the publication of the first HCC management guidelines from the AASLD (20).
In the present study, we used tumour stage at time of detection by US rather than tumour stage at time of treatment. This was done to assess US alone as a surveillance test in a Canadian setting. In a companion study involving the same cohort to assess effectiveness of our surveillance program compared with patients referred to our own institution, we used tumour stage at time of treatment (21). Occasionally, the surveillance test is effective in detection but the work-up strategy fails to diagnose an HCC as malignant. Sometimes a false-positive surveillance test for a benign nodule leads to incidental detection of an HCC by the work-up imaging (such as CT or MRI). Finally, some HCCs are detected on CT/MRI for patients undergoing follow-up imaging for indeterminate nodules detected elsewhere by surveillance US. These scenarios explain why, in the present study, in which US surveillance effectiveness was assessed, the proportion of tumours at BCLC stage 0/A was 83% (Table 1), whereas in our companion study, in which effectiveness of the comprehensive surveillance program was measured, it was 75%.
The present study demonstrates the limited role of serum AFP levels in a surveillance setting. Using a threshold of 20 ng/mL, AFP was able to detect 73% of tumours beyond Milan criteria, but only 32% of tumours within Milan. More importantly, in only 6% of patients did AFP detect a tumour when US was negative. The present study demonstrates that when US is performed effectively, most tumours are detected before reaching a differentiation sufficiently advanced, or a volume sufficiently large, to produce even a low threshold of AFP. The present study supports the AASLD recommendation of elimination of AFP in the setting of regular US surveillance (3). We hope that this result further promotes the abandonment of AFP as a surveillance test, because even in our own practice, we have found multiple examples of noncompliance with the AASLD guidelines in this regard.
Studies that assess clinical performance of US in HCC surveillance demonstrate acceptable sensitivity (15-19). However, studies that assess sensitivity of US with explant correlation show an unacceptably low performance, leading some to advocate its replacement by CT or MRI (7,22). Why the discrepancy? First, the per-patient sensitivity, rather than the per-lesion sensitivity, is relevant to US surveillance because all patients with a positive US undergo CT or MRI, in which additional nodules can be detected. Second, the assumption with explant correlation studies is that every HCC is significant and, therefore, its detection affects patient survival. However, this is a faulty assumption; it is tumour biology, rather than the mere presence of tumour, that affects patient survival (23). Multiple studies have shown that markers of aggressive tumour behaviour, such as poor histological differentiation, serum AFP, vascular invasion and tumour size, are all stronger predictors of poor survival outcomes than tumour number (23-26). The fact that multiple proposed expansions of the Milan criteria for transplantation allow for inclusion of additional tumours is a reflection of the relatively lower clinical significance of tumour number (27). Finally, the slow growth rate of early HCC allows for multiple chances for US to detect a nodule; repeated application of the test improves its overall sensitivity.
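The last point, that repeated testing raises overall sensitivity, can be illustrated with a deliberately simplified Python sketch. It assumes that each scan has the same per-scan sensitivity and that scans are independent, so the chance of at least one detection in n scans is 1 - (1 - s)^n. This independence assumption is ours, not the study's, and is unlikely to hold exactly because factors such as body habitus persist between scans. The function name and the example value of 0.6 are hypothetical.

```python
def cumulative_sensitivity(per_scan_sensitivity: float, n_scans: int) -> float:
    """Probability of at least one positive scan in n independent scans."""
    if not 0.0 <= per_scan_sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    if n_scans < 1:
        raise ValueError("at least one scan is required")
    return 1.0 - (1.0 - per_scan_sensitivity) ** n_scans


# Hypothetical per-scan sensitivity of 0.6, not a figure from the study.
after_two_scans = cumulative_sensitivity(0.6, 2)   # about 0.84
```

Under these assumptions, a nodule that remains within a curable size range across two consecutive scans would be detected more often than the single-scan sensitivity suggests.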

Strengths and limitations
The present study was the first to investigate HCC surveillance effectiveness using US in Canadian patients with HCC. A major strength was the direct correlation of tumour to the original surveillance scan. By anatomically correlating the exact location of every HCC with the surveillance US, we ensured that every HCC was truly detected by the US or otherwise designated as missed. Furthermore, the stringent inclusion criteria for the diagnosis of HCC and staging through review of the imaging and clinical features ensured that every nodule included fulfilled current criteria for malignancy and staging systems. Moreover, we accounted for missed tumours that were found by other means. However, the retrospective nature of the present study subjects it to inherent bias. For example, HCCs detected by surveillance in our centre but characterized elsewhere could not be included because the imaging was not available for staging. Our study shows a stage migration through regular surveillance, but we cannot demonstrate a direct survival benefit. In addition, the sample size of 135 patients used for uni- and multivariate analyses limited our ability to demonstrate small but significant differences. Finally, our results are reflective of an academic US department in a hepatobiliary centre with the latest generation of scanners, uniformity of scanning standards, continual training for sonographers and direct physician supervision.

DISCLOSURES:
The authors have no financial disclosures or conflicts of interest to declare.