Colorectal Cancer Screening in Average Risk Populations: Evidence Summary

Introduction. The objectives of this systematic review were to evaluate the evidence for different CRC screening tests and to determine the most appropriate ages of initiation and cessation for CRC screening and the most appropriate screening intervals for selected CRC screening tests in people at average risk for CRC. Methods. Electronic databases were searched for studies that addressed the research objectives. Meta-analyses were conducted with clinically homogenous trials. A working group reviewed the evidence to develop conclusions. Results. Thirty RCTs and 29 observational studies were included. Flexible sigmoidoscopy (FS) prevented CRC and led to the largest reduction in CRC mortality with a smaller but significant reduction in CRC mortality with the use of guaiac fecal occult blood tests (gFOBTs). There was insufficient or low quality evidence to support the use of other screening tests, including colonoscopy, as well as changing the ages of initiation and cessation for CRC screening with gFOBTs in Ontario. Either annual or biennial screening using gFOBT reduces CRC-related mortality. Conclusion. The evidentiary base supports the use of FS or FOBT (either annual or biennial) to screen patients at average risk for CRC. This work will guide the development of the provincial CRC screening program.


Introduction
Canada has one of the highest rates of colorectal cancer (CRC) in the world, with an estimated 25,100 cases in 2015 [1,2]. CRC is also one of the leading causes of cancer-related death for men and women combined in Canada, with an estimated 9300 deaths in Canada in 2015. However, if CRC is found in its early stages, there is a 90% chance that it can

Systematic Review Objectives
The purpose of this systematic review is to evaluate the existing evidence concerning screening of adults at average risk for CRC in the context of an organized, population-based screening program. The main objectives are to identify the following: (i) The benefits and harms of screening in this population.
(ii) The optimal primary CRC screening test(s) for this population.
(iii) The appropriate ages of initiation and cessation for screening in this population.
(iv) The intervals at which people at average risk should be recalled for CRC screening.

Target Population
The target population includes primary care providers, endoscopists, policy-makers, and program planners in Ontario.

Primary Research Question. The primary research question is as follows:
(1) How do different screening tests, individually or in combination, perform in average risk people in preventing CRC-related mortality or all-cause mortality or in decreasing the incidence of CRC? Secondary outcomes include the detection of cancer or its precursors, screening participation rate, adverse effects of tests, and test characteristics, such as sensitivity, specificity, positive predictive value, negative predictive value, and proportion of false-positives or of false-negatives.
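The test characteristics named among the secondary outcomes can be illustrated with a minimal sketch that derives them from a two-by-two table of screening result against disease status. The class and function names are hypothetical, the counts in the usage lines are invented for illustration only, and the false-positive and false-negative proportions are assumed here to mean 1 minus specificity and 1 minus sensitivity, which the review does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a screening test with a reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def screening_test_characteristics(t: TwoByTwo) -> dict:
    """Return the usual accuracy measures for one screening test."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Assumed definitions: proportions within the disease-absent
        # and disease-present groups, respectively.
        "false_positive_proportion": t.fp / (t.fp + t.tn),
        "false_negative_proportion": t.fn / (t.fn + t.tp),
    }


# Hypothetical counts, not data from any included study.
example = screening_test_characteristics(TwoByTwo(tp=8, fp=90, fn=2, tn=900))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on disease prevalence in the screened population, so they are not transferable between populations in the way sensitivity and specificity are.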

Secondary Research Questions. Secondary research questions are as follows:
(1) What are the appropriate ages of initiation and cessation for screening in people at average risk for CRC? Is there a relationship between age and the effectiveness of CRC screening?
(2) What are the appropriate intervals between CRC screening tests (by test)? Is there a relationship between screening intervals and the effectiveness and risks of screening?

Methods
The authors of this evidentiary base (working group) consisted of one primary care physician, one colorectal surgeon, one expert in public health screening, one policy analyst from the Ontario CRC screening program, two methodologists, and three gastroenterologists. The Program in Evidence-based Care (PEBC), a provincial program of Cancer Care Ontario, is supported by the Ontario Ministry of Health and Long-Term Care. All work produced by the PEBC and any associated programs is editorially independent from the ministry. A two-stage method was used. It is summarized as follows: (1) Search and evaluation of existing systematic reviews: if existing systematic reviews were identified that addressed the research questions and were of reasonable quality, then they were included as a part of the evidentiary base.
(2) Original systematic review of the primary literature: this review focused on areas not covered by existing and accepted reviews.

Study Selection Criteria and Protocol.
Systematic reviews were included if (i) they addressed at least one of the research questions, (ii) they evaluated randomized or nonrandomized controlled trials of asymptomatic average risk subjects undergoing CRC screening, (iii) the literature search strategy for the existing systematic review was reproducible (i.e., reported) and appropriate, and (iv) the existing systematic review reported the sources searched, as well as the dates that were searched.
Identified systematic reviews were assessed using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) tool [82]. In cases where multiple systematic reviews were identified for a particular outcome, only evidence from the most recent systematic review with the highest quality was used in the evidence base. The literature was searched for new primary studies published after the end search date of included systematic reviews. The quality of the individual studies included in the systematic reviews, as well as of any new primary studies, was assessed in order to complete the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) tables for risk of bias [9].
If no existing systematic review was identified for a given test or question, or if identified reviews were incomplete, a systematic review of the primary literature was performed. Articles in reference lists from included studies were also searched. The scope of the primary literature review was tailored to address the gaps in the incorporated existing systematic reviews (e.g., subject areas covered and time frames covered). The criteria for the primary literature are described as follows.
Inclusion Criteria. Inclusion criteria are as follows: (1) Randomized controlled trials (RCTs) (primary research question and secondary research questions 1 and 2) that could be identified directly from the search or from reference sections of systematic reviews.
(3) Evidence from nonrandomized prospective comparative studies with historical or contemporaneous controls, with the consensus of the working group, when there were gaps in available evidence from RCTs.
(4) Studies of asymptomatic average risk subjects were preferred; population-based studies that did not oversample adults with symptoms of CRC or a family history of CRC were also considered acceptable.
Exclusion Criteria. Exclusion criteria are as follows: (2) Studies that included a population enriched with subjects with symptoms of CRC or a family history of CRC. (3) Nonsystematic reviews. (4) Non-English-language publications.
One of the two reviewers (NI and EV) independently reviewed the titles and abstracts resulting from the search. For items that warranted full-text review, NI or EV reviewed each item independently. However, in uncertain cases, a second reviewer (JT) was asked to review them.

Data Extraction and Assessment of Study Quality and Potential for Bias.
Data from the included studies were independently extracted by NI and EV. If there was more than one publication for the same study, only the most updated or recent versions of the data were reported in the results. All extracted data and information were audited by an independent auditor. Important quality features, such as randomization details, sample size and power, intention-to-screen (ITS) analysis, length of follow-up, and funding, for each RCT, were extracted. The quality of observational studies was assessed using a modified Newcastle-Ottawa Scale [83]. The quality of diagnostic studies was assessed using a modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [84]. The GRADE method for assessing the quality of aggregate evidence was used for each comparison using the GRADEpro Guideline Development Tool [9,85]. The working group used the GRADE system for ranking outcomes and scored each outcome from the evidence review on a scale from 1 to 9. Outcomes with a score from 1 to 3 were considered of limited importance, from 4 to 6 important, and from 7 to 9 critical in the development of recommendations for the CRC screening program. Only outcomes that were considered critical or important were included in the GRADE evidence tables.
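The outcome ranking rule described above (scores of 1 to 3, 4 to 6, and 7 to 9) can be sketched as a small classification function. This is an illustration only: the function names are hypothetical, and the scores in the usage lines are invented rather than taken from the working group's actual ratings.

```python
def classify_outcome_importance(score: int) -> str:
    """Map a 1-9 GRADE importance score to its category."""
    if not 1 <= score <= 9:
        raise ValueError("importance score must be between 1 and 9")
    if score <= 3:
        return "limited importance"
    if score <= 6:
        return "important"
    return "critical"


def included_in_grade_tables(score: int) -> bool:
    """Only critical or important outcomes enter the GRADE evidence tables."""
    return classify_outcome_importance(score) != "limited importance"


# Hypothetical scores, for illustration only.
for outcome, score in {"CRC-related mortality": 9, "participation rate": 5, "example minor outcome": 2}.items():
    print(outcome, "->", classify_outcome_importance(score), "| tabulated:", included_in_grade_tables(score))
```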

Synthesizing the Evidence.
When clinically homogeneous results from two or more trials were available, a meta-analysis was conducted using Review Manager software (RevMan 5.3) provided by the Cochrane Collaboration [86]. For all outcomes, the dichotomous model with random effects was used. The number of person-years, rather than the total number of subjects, was used, if available, because person-years take into account the fact that different people in the study may have been followed up for different lengths of time.
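As an illustration of the pooling step, the following minimal sketch computes a log rate ratio for each trial from events and person-years and combines the trials with an inverse-variance DerSimonian-Laird random-effects model. It is a simplified approximation under stated assumptions, not a reproduction of the RevMan 5.3 computation used in this review; the function names and the trial counts in the usage lines are hypothetical.

```python
import math


def log_rate_ratio(events_s, py_s, events_c, py_c):
    """Log rate ratio (screening vs control) and its approximate variance."""
    log_rr = math.log((events_s / py_s) / (events_c / py_c))
    variance = 1.0 / events_s + 1.0 / events_c  # large-sample approximation
    return log_rr, variance


def dersimonian_laird(effects, variances):
    """Pool effects with a random-effects model; returns (pooled, se, tau2)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures dispersion of trial effects around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-trial variance tau2.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2


# Hypothetical trials: (events screened, person-years, events control, person-years).
trials = [(100, 100_000, 120, 100_000), (80, 90_000, 110, 95_000), (150, 200_000, 170, 198_000)]
pairs = [log_rate_ratio(*t) for t in trials]
pooled, se, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled rate ratio {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 = {tau2:.4f}")
```

Using person-years in the denominators means that each trial contributes a rate rather than a simple proportion, which is the reason the text gives for preferring person-years when follow-up lengths differ.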
In order to have comparable control rates across all gFOBT and FS trials, the control rates for the no screening groups in the gFOBT and FS trials were combined and calculated from the total number of cases across all gFOBT and FS trials over the total number of person-years across all gFOBT and FS trials.
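The combined control rate described above is total cases over total person-years across the no-screening arms. A minimal sketch follows; the function name and the counts in the usage line are hypothetical, and the rate is expressed per 1000 person-years purely as an illustrative choice of scale.

```python
def pooled_control_rate(control_arms, per=1000):
    """Combined event rate across control arms.

    control_arms: iterable of (cases, person_years) tuples, one per trial.
    per: scale of the returned rate (events per `per` person-years).
    """
    arms = list(control_arms)
    total_cases = sum(cases for cases, _ in arms)
    total_person_years = sum(person_years for _, person_years in arms)
    if total_person_years <= 0:
        raise ValueError("total person-years must be positive")
    return per * total_cases / total_person_years


# Hypothetical control arms, not data from the included trials.
rate = pooled_control_rate([(100, 50_000), (300, 150_000)])
print(f"{rate:.2f} events per 1000 person-years")
```

Summing cases and person-years before dividing weights each trial by its follow-up time, which differs from averaging the individual trial rates.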
Statistical heterogeneity was calculated using the χ² test for heterogeneity and the I² percentage. A probability level for the χ² statistic less than or equal to 10% (p ≤ 0.10) and/or an I² greater than 50% was considered indicative of statistical heterogeneity.

Table 1: Description of the quality of evidence grades according to Grading of Recommendations, Assessment, Development and Evaluations (GRADE) [9].
Grade: Definition
High: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect
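The heterogeneity criteria used in the meta-analyses (a χ² probability of 0.10 or less and/or an I² above 50%) can be sketched as follows. This is a minimal illustration that assumes Cochran's Q statistic and its degrees of freedom (number of trials minus one) are already available; the function names are hypothetical, and the χ² tail probability is computed with a standard series expansion rather than with any particular statistics package.

```python
import math


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic with df >= 1."""
    if q <= 0:
        return 1.0
    a, x = df / 2.0, q / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def i_squared(q: float, df: int) -> float:
    """I-squared as a percentage, truncated at zero."""
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def is_heterogeneous(q: float, df: int, p_threshold: float = 0.10, i2_threshold: float = 50.0) -> bool:
    """Apply the 'p <= 0.10 and/or I-squared > 50%' rule."""
    return chi2_sf(q, df) <= p_threshold or i_squared(q, df) > i2_threshold


# Hypothetical Q statistics for meta-analyses of five trials (df = 4).
for q in (3.0, 10.0):
    print(f"Q={q}: p={chi2_sf(q, 4):.3f}, I2={i_squared(q, 4):.0f}%, heterogeneous={is_heterogeneous(q, 4)}")
```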

Process for Developing Conclusions.
The working group members met in person on four occasions to develop evidence-based conclusions through consensus. For each comparison (e.g., gFOBT versus no screening) the working group assessed the quality of the body of evidence for each outcome using the GRADE process [9]. Five factors were assessed for each outcome in each comparison: the risk of bias, inconsistency, indirectness, imprecision, and publication bias. Observational studies were initially graded as low quality and RCTs as high quality; the quality of the evidence was downgraded when serious threats were identified to one or more factors. At the in person meetings, the working group discussed each comparison and agreed on the overall certainty of the evidence across outcomes (Table 1), whether the desirable anticipated effects were large, whether the undesirable anticipated effects were small, and whether the desirable effects were large relative to the undesirable effects. For each comparison, conclusions were developed that reflected these working group discussions.
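The grading logic described above can be sketched as a small function: RCT evidence starts at high and observational evidence at low, and the grade is lowered for concerns in any of the five factors. The sketch assumes the conventional GRADE step sizes (one level for a serious concern, two for a very serious one) and ignores upgrading of observational evidence; the function name and inputs are hypothetical and the working group's judgments were reached by consensus, not by a formula.

```python
LEVELS = ["very low", "low", "moderate", "high"]
FACTORS = ("risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias")
PENALTY = {"none": 0, "serious": 1, "very serious": 2}  # assumed step sizes


def grade_certainty(design: str, concerns: dict) -> str:
    """Return the GRADE level for one outcome in one comparison.

    design: "rct" or "observational".
    concerns: maps a factor name to "none", "serious", or "very serious".
    """
    start = {"rct": 3, "observational": 1}[design]
    total_penalty = 0
    for factor, severity in concerns.items():
        if factor not in FACTORS:
            raise ValueError(f"unknown GRADE factor: {factor}")
        total_penalty += PENALTY[severity]
    # The grade cannot fall below "very low".
    return LEVELS[max(0, start - total_penalty)]


# Hypothetical judgments, for illustration only.
print(grade_certainty("rct", {}))
print(grade_certainty("rct", {"indirectness": "serious"}))
print(grade_certainty("observational", {"risk of bias": "serious", "imprecision": "serious"}))
```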

Literature Search Results.
A total of 7538 studies were identified and 378 were selected for full-text review. Of those, 48 met the predefined eligibility criteria for this systematic review. An additional 27 articles were found from the reference lists. After our literature search, we became aware of and included an updated publication for one of the FS screening RCTs that had already been identified [21]. A total of 76 articles were included of which eight were systematic reviews [10,29,30,59,68,87-89], 39 [6, 11-28, 60-66, 69-81] were from 30 RCTs, 19 were prospective studies [31-43, 52, 53, 55-57, 67], five were retrospective studies [44-48], and five were case-control studies [49-51, 54, 58]. Evidence from five of the eight systematic reviews was included either because the reviews were the most recent systematic review with the highest quality evidence for a particular outcome or because they included an outcome of interest not covered by other high-quality reviews [10,29,30,59,68]. After the search process and quality assessment, a total of 73 articles were included in this systematic review. The search flow diagram, the characteristics and quality of the included systematic reviews and studies, and the meta-analyses can be found online or in Supplementary Figures 1 to 19 (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/2878149) [8]. Table 2 provides a summary of the number and type of studies used for each comparison.
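The article counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are hypothetical, and reading the final figure of 73 as 76 minus the three systematic reviews whose evidence was not used is an inference from the text rather than something the review states outright.

```python
# Articles reaching the evidence base, as reported in the text.
met_criteria = 48
from_reference_lists = 27
updated_fs_publication = 1
total_articles = met_criteria + from_reference_lists + updated_fs_publication

# Breakdown of the same articles by study type.
by_type = {
    "systematic reviews": 8,
    "articles from 30 RCTs": 39,
    "prospective studies": 19,
    "retrospective studies": 5,
    "case-control studies": 5,
}

# Inference: reviews identified but not used are dropped from the final count.
systematic_reviews_used = 5
final_included = total_articles - (by_type["systematic reviews"] - systematic_reviews_used)

print("total articles:", total_articles)
print("sum by type:", sum(by_type.values()))
print("final included:", final_included)
```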

Interpretation and Conclusions
The following are the conclusions developed by the working group based on the review of the evidence and meta-analyses. When discussing the effects of various screening tests, the reported outcomes vary by test. There was strong agreement among the members of the working group that CRC-related mortality and complications from screening tests were critical outcomes for recommendation development. All-cause mortality, CRC incidence, participation rate, and diagnostic outcomes were considered important outcomes of interest.

Fecal Tests for Occult Blood.
There was strong evidence to support the use of fecal tests for occult blood to screen people at average risk for CRC.

Guaiac Fecal Occult Blood Test (gFOBT) versus No Screening
(i) The overall certainty of the evidence was high, suggesting a definite reduction in CRC-related mortality (Table 3 and Supplementary Figures 1 to 3) [91]. The anticipated harms associated with gFOBT (including follow-up colonoscopy for people with positive tests) are small and outweighed by the benefits.

Fecal Immunochemical Test (FIT) versus gFOBT
(i) The overall certainty of the evidence was moderate (Table 4 and Supplementary Figures 7 to 10). The magnitude of the desirable anticipated effects was at least equivalent to gFOBT, and it is likely that the desirable effects of FIT are greater than for gFOBT. The anticipated undesirable effects associated with FIT (including follow-up colonoscopy for people with positive tests) are small and outweighed by the benefits.
(ii) While there were well-designed randomized controlled trials (RCTs) comparing FIT with gFOBT, the outcomes of these trials (participation and detection rates) were considered to be less important than CRC-related mortality. However, it was anticipated that the reduction in CRC-related mortality and the complications resulting from screening with FIT would be at least equivalent to those observed from screening with gFOBT. FIT's greater sensitivity for detection of CRC and advanced adenomas compared with gFOBT suggests that the reduction in CRC incidence with FIT could be greater than with gFOBT; however, the magnitude and significance of any additional benefit of FIT over gFOBT are unknown. It is important to highlight that the FIT positivity threshold selected would be an important determinant of the magnitude of the benefits and harms of FIT relative to gFOBT.

Table notes: Major complication defined as bleeding, perforation, or death within 30 days of screening, follow-up colonoscopy, or surgery. Holme et al. 2013 [10] reported a major complication rate of 0.03%. The Goteborg trial used sigmoidoscopy and double-contrast barium enema as the reference standard; other trials used colonoscopy.

Canadian Journal of Gastroenterology and Hepatology

Lower Bowel Endoscopy.
There was strong evidence to support the use of flexible sigmoidoscopy (FS) to screen people at average risk for CRC. There was no direct evidence to support the use of colonoscopy to screen people at average risk for CRC, but evidence from FS informed the assessment of the benefits and harms of colonoscopy in screening people at average risk for CRC.

FS versus No Screening
(i) The overall certainty of the evidence was high, suggesting that FS has a definite effect on CRC-related mortality and incidence when compared with no screening [92]. The anticipated harms associated with FS (including follow-up colonoscopy for people with positive tests) were small and outweighed by the benefits.

Colonoscopy versus No Screening
(i) The overall certainty of direct evidence supporting the use of colonoscopy to screen people at average risk for CRC was very low when compared with no screening (Table 6). The desirable and undesirable anticipated effects were uncertain.
(ii) It is anticipated that the benefit of screening with colonoscopy would be at least equivalent to that observed for screening with FS; however, the magnitude of additional benefit over FS, if any, is unknown. The magnitude of additional undesirable effects of colonoscopy relative to FS is also unknown.

Fecal Tests for Occult Blood versus Lower Bowel Endoscopy.
There was insufficient evidence to determine how fecal tests for occult blood perform compared with lower bowel endoscopy to screen people at average risk for CRC (Supplementary Tables 1 to 5 and Supplementary Figures 11 to 19).
(i) The studies that compared one-time fecal tests for occult blood to lower bowel endoscopy were heterogeneous, with few comparisons where data could be pooled. However, in general, the evidence suggested that participation was higher and detection rate was lower with fecal-based tests compared with endoscopic tests.
(ii) The overall certainty of the evidence was low. CRC-related mortality was not evaluated and the design of the studies favoured endoscopic tests because the comparison was to one-time fecal-based testing (rather than repeated testing over time, which is how these tests are used in usual practice). There was significant heterogeneity in participation. The undesirable anticipated effects of endoscopy (including follow-up endoscopy for people with positive fecal tests) are probably small. It is uncertain whether the desirable effects are large relative to the undesirable effects.

Computed Tomography Colonography versus Colonoscopy.
There was insufficient evidence to determine how computed tomography colonography performs compared with colonoscopy to screen people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was low. The desirable and undesirable anticipated effects were uncertain.

Capsule Colonoscopy versus Colonoscopy.
There was insufficient evidence to determine how capsule colonoscopy performs compared with colonoscopy to screen people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. The desirable and undesirable anticipated effects were uncertain.

Table notes: Major complication rate included bleeding, perforation, or death within 30 days of screening, follow-up colonoscopy, or surgery. Holme et al. 2013 [10] reported a major complication rate of 0.08%. Mixed study designs included case-control and retrospective. The risks of perforation or bleeding were less than 1%, ranging from 0% to 0.22% for perforations and 0% to 0.19% for bleeding.

Stool DNA versus gFOBT or FIT.
There was insufficient evidence to determine how stool DNA performs compared with gFOBT or FIT to screen people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. The desirable and undesirable anticipated effects were uncertain.

Other DNA Tests.
There was insufficient evidence to support the use of mSEPT9 to screen people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. The desirable and undesirable anticipated effects were uncertain.

Metabolomic Tests

Fecal M2-PK.
There was insufficient evidence to support the use of fecal M2-PK to screen people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. The desirable and undesirable anticipated effects were uncertain.

Age of Initiation/Cessation with gFOBT.
[...] (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. There was insufficient evidence to demonstrate differences in reduction of CRC mortality using gFOBT across age groups. The desirable and undesirable anticipated effects across age groups were uncertain.

Age of Initiation/Cessation with FS.
There was insufficient evidence to recommend ages of initiation or cessation when screening with FS in people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. There was insufficient evidence to demonstrate differences in reduction of CRC mortality or incidence using FS across age groups. The desirable and undesirable anticipated effects across age groups were uncertain. (ii) Of the four large FS RCTs, three examined "once in a lifetime" FS between the ages of 55 and 64, while the fourth RCT examined baseline FS between the ages of 55 and 74 with a second FS after three or five years.

Age of Initiation/Cessation with Colonoscopy.
There was insufficient evidence to recommend an age of initiation or cessation to screen with colonoscopy in people at average risk for CRC (results not shown; see [8]).
(i) The overall certainty of the evidence was very low. There was insufficient evidence to demonstrate differences in CRC detection using colonoscopy across age groups. The desirable and undesirable anticipated effects across age groups were uncertain. (ii) Currently, the Ontario CRC screening program does not recommend colonoscopy to screen persons at average risk for CRC. The program does recommend colonoscopy in people at increased risk (one or more first-degree relatives with CRC) starting at 50 years of age or 10 years younger than the age at which the relative was diagnosed, whichever occurred first.
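The starting-age rule quoted above for people at increased risk (50 years, or 10 years younger than the age at which the relative was diagnosed, whichever comes first) can be written as a one-line calculation. The function name is hypothetical, and using the youngest diagnosis age when several first-degree relatives are affected is an assumption made for this sketch, not something the text specifies.

```python
def increased_risk_start_age(relative_diagnosis_ages):
    """Age at which colonoscopy screening would start for an increased-risk person.

    relative_diagnosis_ages: ages at CRC diagnosis of first-degree relatives.
    """
    ages = list(relative_diagnosis_ages)
    if not ages:
        raise ValueError("at least one affected first-degree relative is required")
    # Assumption: with several affected relatives, the youngest diagnosis governs.
    return min(50, min(ages) - 10)


for ages in ([65], [52], [70, 45]):
    print(ages, "->", increased_risk_start_age(ages))
```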

Age of Initiation/Cessation with FIT.
There were no studies that met our inclusion criteria for age of initiation/cessation for FIT.

Screening Intervals

gFOBT Intervals.
There was evidence to suggest that either annual or biennial screening using gFOBT in people at average risk for CRC reduces CRC-related mortality (results not shown; see [8]).
(i) The overall certainty of the evidence was moderate. The desirable anticipated effects on CRC mortality were small and similar for annual or biennial screening. The undesirable anticipated effects were not reported for each interval group. Anticipated harms associated with gFOBT (including follow-up colonoscopy for people with positive tests) were small for biennial screening and were likely to be greater for annual screening. In addition, annual screening is anticipated to increase burden to the participant.

FIT Intervals.
There was insufficient evidence to recommend an interval to screen people at average risk for CRC using FIT (results not shown; see [8]).

Summary and Next Steps
This evidentiary base summarizes the known clinical effectiveness and safety of CRC screening tests. Concurrently, the Canadian Task Force on Preventive Health Care (CTFPHC) [99] and the US Preventive Services Task Force (USPSTF) have published guidelines on CRC screening [100]. The audience for the three reviews differed slightly: the current review seeks to provide guidance to the CRC screening program in light of emerging evidence in CRC screening while the CTFPHC and the USPSTF specify "primary care physicians" and "primary care clinicians and patients" as their target audiences, respectively. All three reviews support the use of fecal testing (gFOBT or FIT) and flexible sigmoidoscopy but treat colonoscopy slightly differently. The CTFPHC recommends against colonoscopy while the USPSTF endorses it as one of a range of options. In the current review, we grade the evidence only (recommendations for Ontario's CRC screening program are released separately; see below). Our interpretation of the evidence acknowledges that the strong evidence in favour of flexible sigmoidoscopy does inform the assessment of colonoscopy. However, this evidence is both indirect and incomplete (the magnitude of additional benefit from colonoscopy and of additional harms is unknown). In order to have a complete understanding of the benefits and harms of colonoscopy, we await the results of ongoing randomized controlled trials in colonoscopy for average risk screening, anticipated in the 2020s.
The evidence from the current review is central to the ongoing development of Ontario's CRC screening program. However, this evidentiary base is necessary but not sufficient to guide program development as other context-specific criteria such as cost-effectiveness, existing program design, and public acceptability and feasibility (from an organizational and economic perspective) must be considered. In addition, the program must also consider the balance between choice and informed decision making and issues not well addressed by the evidence such as how to best implement CRC screening when there is more than one CRC screening test supported by high-quality evidence. An expert panel comprising members from national and international screening programs, primary care physicians, general surgeons, gastroenterologists, pathologists and laboratory medicine professionals, nurse endoscopists, and members of the public was convened to provide guidance on how to incorporate this evidence in light of the other issues listed above. Their level of agreement with the conclusions is reflected in Table 7. The ColonCancerCheck (CCC) program will use findings from the evidence summary as well as expert panel recommendations to guide its ongoing development. The specific recommendations resulting from this process have been released recently and can be accessed online at https://www.cancercare.on.ca/common/pages/UserFile.aspx?fileId=358486.

Competing Interests
Jill Tinmouth was a lead scientist at Cancer Care Ontario for ColonCancerCheck and paid as a consultant for this work. Catherine Dubé published an editorial in Can J. Gastro 2012 26 417-18 [101]. Michael Gould has an endoscopy clinic in Toronto, is a consultant for Abbott Laboratories, is on the Board of Directors for the Ontario Association of Gastroenterology and the Canadian Digestive Health Foundation, and is the Clinical Lead for ColonCancerCheck. For the remaining authors none were declared. The Program in Evidence-based Care (PEBC) is a provincial initiative of Cancer Care Ontario supported by the Ontario Ministry of Health and Long-Term Care. All work produced by the PEBC is editorially independent from the Ontario Ministry of Health and Long-Term Care.