Mutation Profiling of Premalignant Colorectal Neoplasia

Accumulation of allelic variants in genes that regulate cellular proliferation, differentiation, and apoptosis may result in expansion of the aberrant intestinal epithelium, generating adenomas. Herein, we compared the mutation profiles of conventional colorectal adenomas (CNADs) across stages of progression towards early carcinoma. DNA was isolated from 17 invasive adenocarcinomas (ACs) and 58 large CNADs, including 19 with low-grade dysplasia (LGD), 21 with LGD adjacent to areas of high-grade dysplasia and/or carcinoma (LGD-H), and 28 with high-grade dysplasia (HGD). Ion AmpliSeq Comprehensive Cancer Panel libraries were prepared and sequenced on the Ion Proton. We identified 956 unique allelic variants; of these, 499 were considered nonsynonymous variants. Eleven genes (APC, KRAS, SYNE1, NOTCH4, BLNK, FBXW7, GNAS, KMT2D, TAF1L, TCF7L2, and TP53) were mutated in at least 15% of all samples. Out of frequently mutated genes, TP53 and BCL2 had a consistent trend in mutation prevalence towards malignancy, while two other genes (HNF1A and FBXW7) exhibited the opposite trend. HGD adenomas had significantly higher mutation rates than LGD adenomas, while LGD-H adenomas exhibited mutation frequencies similar to those of LGD adenomas. A significant increase in copy number variant frequency was observed from LGD through HGD to malignant samples. The profiling of advanced CNADs demonstrated variations in mutation patterns among colorectal premalignancies. Only limited numbers of genes were repeatedly mutated while the majority were altered in single cases. Most genetic alterations in adenomas can be considered early contributors to colorectal carcinogenesis.


Introduction
Cancers are highly heterogeneous, polygenic disorders that arise in multistep microevolutionary processes involving the selection of successive cellular clones that occur in response to specific environmental factors, as well as genetic influences. Colorectal cancer (CRC) is a common epithelial neoplasia worldwide and a leading cause of cancer-related morbidity and mortality [1]. As a result of the selective growth advantage of dysplastic cells over their normal neighbors, the morphological consequences of molecular alterations lead to progressive cytological and architectural disorganization, recognizable as the adenoma-carcinoma sequence, first described by Fearon and Vogelstein [2]. There are multiple colorectal neoplastic pathways, including the chromosomal instability (CIN) pathway, the microsatellite instability (MSI) pathway, and the CpG island methylator pathway (CIMP, also referred to as the serrated neoplasia pathway) [3].
Histologically, conventional colorectal adenomas (CNADs) are classified based on their proportions of villous components (tubular, tubulovillous, or villous adenoma) and the severity of dysplasia (low grade or high grade) [7]. Villous growth and high-grade dysplasia (HGD) are the most important determinants associated with the risk of adenomas transforming into malignant growths and are also closely related to adenoma size [8]; however, most adenomas stabilize their growth progression or even regress [9]. Mutations of APC, KRAS, and β-catenin represent key events in the development of adenomas, while mutations of PIK3CA and TP53 occur during progression to invasive CRC [10][11][12]. The earliest clinically relevant CRCs are tumors that invade the submucosa, but not the muscular layer. It remains unknown which molecular alterations induce the final transition towards invasive growth; therefore, to prevent CRC, both adenomas and premalignant serrated polyps should be resected [3].
While large-scale sequence profiling of CRCs has advanced the understanding of their genetic characteristics, our knowledge of benign and premalignant CNADs remains limited. Recently determined mutation profiles, comprising limited numbers of genes, can clearly distinguish CNADs and CRCs [13,14]. In this study, we compared the mutation profiles and abundance during progression of CNADs towards early carcinoma by deep sequencing the coding regions of 409 "cancer genes." Consistent with previous reports [5,13,14], total numbers of nonsynonymous variants were significantly higher in adenomas exhibiting HGD than in those with low-grade dysplasia (benign adenomas); however, they did not differ between benign adenomas and carcinomas.

Material and Methods
This was a retrospective study using a collection of paraffinembedded colorectal polyps removed by endoscopic polypectomy at the Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland, between 2010 and 2016. Based on pathology reports, 58 adenomas ≥ 2 cm were reevaluated by referral pathologists (second opinion), 18 polyps were excised in one piece, and 40 were removed using a piecemeal technique. Patient characteristics are detailed in Table 1. . All procedures  were performed in accordance with the ethical standards  of the local bioethical committee who gave permission  for this retrospective study (approval ID: 13/2008 and  3/2019) and according to the principles of the 1964 Declaration of Helsinki. 2.2. DNA Isolation and Sequencing Using the Ion AmpliSeq Comprehensive Cancer Panel. Several series of sections were prepared from different parts of each specimen, and the upper and lower sections from each group were evaluated by pathologists to control for the relative cell type content. DNA was isolated from sections representative of a given component (tubular or villous adenoma) that contained the highest percentages of epithelial cells. In addition, fragments of polyps containing HGD and/or a carcinoma invading the submucosa were macrodissected. In total, DNA was extracted from 85 tumor samples using a QIAamp DNA FFPE (formalin-fixed paraffin-embedded) Tissue Kit (Qiagen), following the manufacturer's protocol.

Compliance with Ethical Standards
DNA sample concentrations were measured using a Qubit fluorometer (Invitrogen), following the manufacturer's instructions, and stored at -20°C. Ion AmpliSeq Comprehensive Cancer Panel libraries were prepared from DNA for analysis of the coding regions of 409 oncogenes and tumor suppressor genes by sequencing using the Ion Proton system (Thermo).

Postsequencing Data Analyses and Variant Calling.
Raw sequence reads were processed using the Torrent Suite analysis pipeline and mapped to the human genome assembly hg19 using TMAP. Variant calls were made with Torrent Variant Caller (version 5.6.0), using default parameters for somatic variants. Called variants were filtered with bcftools (version 1.3) using the following parameters: phred-scaled genotype quality ðGQÞ ≥ 5, read depth ðDPÞ ≥ 20, flow evaluator alternate allele observation count ðFAOÞ ≥ 4 for indels and ≥2 for SNPs, flow evaluator read depth ðFDPÞ > 6 for SNPs and >10 for indels, strand bias in a variant relative to reference ðSTBÞ ≤ 0:9 for SNPs, and number of consecutive repeats of the alternate allele in the reference genome ðHRUNÞ ≤ 6 for indels. Filtered variants were further filtered using the fpfilter tool with default parameters except for the following: -min-strandedness, 0.05; -max-mapqual-diff, 10; -max-readlen-diff, 10; and -max-mm-qualsum-diff, 50. Variants with alternate allele observations < 20% of total allele observations were discarded. Annotation of variants and prediction of their consequences for mature proteins were conducted using ANNOVAR [15], while deleteriousness was assessed using the SIFT [16] and PolyPhen [17] algorithms. Variants with population frequencies > 0:001, according to the 1000 Genomes Project database (European and global), the Exome Sequencing Project of the National Heart, Lung, and Blood Institute [18], and the Exome Aggregation Consortium database [19], were discarded. To reduce the probability of listing a germline variant specific for the local population, we removed all variants present in more than 35% of samples. Copy number variations (CNVs) were called using Contra (version 2.0.8, [20]) with default parameters (except -minReadDepth, which was set to 32), and a reference baseline was created using sequencing results from 78 blood samples collected from patients with pancreatic cysts analyzed in parallel for another project. CNVs called in ≥20% of samples were discarded as possible false-positive results.

Driver Mutation and Nonsynonymous Variant Analysis.
Two types of mutation were specified: nonsynonymous rare mutations and the so-called "driver" mutations. Nonsynonymous variants were considered driver mutations if they fulfilled at least one of the following conditions: (i) a known driver gene (as designated by CRAVAT [21]), (ii) cancer driver FDR of ≤0.1 (computed using CHASM [22]), and (iii) a gene with a cancer driver FDR value of ≤0.1.
The trend for changes in mutation frequencies in benign polyp samples through to those in malignant specimens was assessed using the Cochran-Armitage trend test. Changes in the numbers of driver and nonsynonymous mutations were assessed by linear regression, with benign polyps as the reference.
A random forest classifier was prepared for three groups, benign adenomas, HG adenomas, and carcinomas, taking into account the presence or absence of nonsynonymous and driver mutations in genes. The significance of p values was assessed using the rfPermute package (https://cran.r-project .org/web/packages/rfPermute/index.html, Eric Archer).

Results
To establish genetic profiling across a spectrum of colorectal neoplasias, we sequenced 409 cancer-related gene coding regions in 85 samples dissected from 58 CNADs > 2 cm. Among these, 19 samples were from adenomas (ten tubular and nine tubulovillous) containing only low-grade dysplasia (LGD, benign adenomas), 21 were from adenomas with LGD associated with synchronous high-grade dysplasia adenoma and/or carcinoma components (premalignant-related adenomas) (LGD-H), 28 were from high-grade dysplasia (HGD) adenomas, and 17 were from submucosal invasive adenocarcinoma (AC). For the purpose of statistical analysis of mutation frequencies, LGD and LGD-H groups were treated as one.
The median sequencing depth was 750×, and >95% of targeted sequences were covered at ≥50× in 76 samples (90%). We identified 956 unique single-nucleotide variants across the 409 genes included in the Comprehensive Cancer Panel (Table S1). Among them, 499 were considered nonsynonymous allelic variants. The average genetic variant rate was 15.7/Mb, and the nonsynonymous variant rate was 8.0/Mb; the mean number of nonsynonymous variants discovered per sample was 10.3. In total, 1632 point mutations were detected in sequences included in the Comprehensive Cancer Panel. The average rate of mutations in The Cancer Genome Atlas (TCGA) CRC dataset was 8.8/Mb. While in five (5.9%) samples, nonsilent mutation rates were 20-30/Mb, none could be considered hypermutated. The structure of genetic variance among the investigated samples is presented in Figure 1.
Eight hundred and forty-three unique CNVs were identified across 292 genes. The mean number of CNVs in LGD samples (12.4) was significantly lower than that in HGD samples (43.6, p = 0:0004). Similarly, the mean number of CNVs in CA samples (76.1) was significantly higher than that in HGD samples (p = 0:027).
Nonsynonymous variants (Table S1) were submitted to further analysis. In total, 92 driver genes were selected (Table S2). Nine genes were consistently mutated; i.e., they carried a nonsynonymous mutation in at least one sample per group and were mutated in ≥15% of samples on average (Table 2(a)), and two other genes were mutated in ≥15% of samples on average but not mutated in one of the tumor groups (Table 2(b)).
Six genes demonstrated a statistically significant trend in changes for nonsynonymous variant frequencies from nonmalignant to malignant groups (nominal p value < 0.05), of which HNF1A, TP53, FBXW7, and BCL2 were mutated in ≥10% of samples on average. The mutation proportions for TP53 and BCL2 rose, while those of FBXW7 and HNF1A declined along progression to CA. These four genes also exhibited a significant trend in the proportion of driver mutations since all nonsynonymous mutations were considered driver mutations (Tables 2(a) and 2(b), Table S3). LGD: low-grade dysplasia; HGD: high-grade dysplasia.
A random forest classifier designated 10 genes, in which the presence or absence of a driver mutation was statistically significant in the classification of at least one group (nominal p < 0:05, Table 3). Three of these genes were also important for the construction of the entire model: KRAS, HNF1A, and TP53.

Discussion
Stepwise, nonrandom accumulation of allelic variants in genes that regulate cellular proliferation, differentiation, and apoptosis can cause expansion of the aberrant intestinal epithelium into adenomas. Sequence alterations in specific genes, including APC and KRAS, contribute to the development of early polypoid lesions, while other genetic aberrations, such as inactivating mutations of TP53, can promote malignancy and are observed in more advanced stages of CRC development [23]. Intratumor heterogeneity, resulting from the presence of different subclones, can lead to discordance in the mutation landscapes of tumor cells isolated from different components of adenomatous polyps. For example, 43% of adenoma components and 51% of carcinoma compo-nents from 70 tumor samples comprising both adenoma and carcinoma were positive for KRAS mutations, while 23% generated discordant results [24]. Furthermore, KRAS mutations were identified in a small number of samples from the histologically normal colonic mucosa adjacent to carcinomas [25].
Mutations that provide a growth advantage are driver alterations, while those that occur coincidentally alongside drivers are referred to as passenger events [26]. Allelic variants in adenomas can be considered early driver events that contribute to the initiation of tumorigenesis, while those enriched in carcinomas can be classified as later driver mutations involved in tumor progression. Despite the high mutation loads of adenomas, the contributions of various mutations to oncogenesis differ [27]. Consequently, only a small percentage of adenomas will progress to become CRC, with the majority remaining stable over time or even regressing. While <5% of small tubular adenomas with LGD will transform into CRC [28], the 10-year cumulative risk of an advanced adenoma (≥25% villous component, HGD, or size ≥ 10 mm) developing into cancer is 25% for patients aged 55 years and increases to approximately 40% for those aged 80 years [29]; thus, advanced adenomas have  a higher malignancy potential than nonadvanced adenomas [3]. Nevertheless, it remains unknown which molecular alterations induce the final transition towards malignancy. We analyzed mutation profiles in components of CNADs comprised entirely of either LGD or LGD with synchronous HGD and/or carcinoma components. As corresponding nor-mal samples were unavailable, we employed highly stringent filtering, based on existing databases of variants in the general population. All variants present in >0.1% of populations were discarded, resulting in a >10-fold reduction in the variant dataset. Using the Ion AmpliSeq Comprehensive Cancer Panel, which provides multiplexed targeted selection of all LGD, n (%) LGD LGD: low-grade dysplasia; HGD: high-grade dysplasia; LGD-H: low-grade dysplasia adjacent to areas of high-grade dysplasia and/or carcinoma; AC: adenocarcinoma. LGD: low-grade dysplasia; HGD: high-grade dysplasia; AC: adenocarcinoma. exons of 409 tumor suppressor genes and oncogenes implicated in cancer, we identified 956 unique variants (after all filtering steps), of which 499 were considered nonsynonymous allelic variants in 214 "cancer genes." Among these mutated genes, consistent with previous studies [5,13,14], APC, KRAS, and SYNE1 were mutated in 76.5%, 62.3%, and 35.3% of samples, respectively, and another 11 genes (BCL2, BLNK, FBXW7, GNAS, LRP1B, MLL2/KMT2D, MLL3/KMT2C, PKHD1, RNF213, TAF1L, TCF7L2, and TP53) were mutated in ≥10% of all samples. While the majority of allelic variants were found in individual cases, all genes that were mutated in two or more carcinoma components were also altered at least in one adenoma component. "Private" variants likely arise at an early stage of adenoma development, generating multiclonal adenomas [30].
In a study of the mutation profiles of synchronous colon adenoma and CRC using whole exome sequencing (WES), Lee et al. [14] found nonsilent allelic variants in the cancer census genes, APC, KRAS, TP53, GNAS, NRAS, SMAD4, ARID2, and PIK3CA, in 12 HGD adenomas, which matched sequences in classical adenoma-carcinomas, and reported allelic variants in MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET, and MAP2K4 in colon adenomas for the first time [2,10]. Lin et al. discovered four additional affected genes (CTNNB1, KRTAP4-5, GOLGA8B, and TMPRSS13) in adenomas that represented potential new somatic driver mutations with characteristics of oncogenes [13]. The majority of mutated genes previously reported in adenomas [5,13,14] were also found in our study; however, most were present at a low frequency.
Of 138 potential driver genes (74 tumor suppressor genes and 64 oncogenes), a typical sporadic CRC may only contain 2-8 driver gene alterations, making each tumor unique [31][32][33]. By covering 9423 tumor exomes using 26 computational tools, 299 driver genes were cataloged recently in the contexts of their anatomical sites and cancer/cell types [34]. APC, KRAS, BRAF, PIK3CA, SMAD4, and TP53 are the most common "drivers" among late CRCs [35]; however, distinguishing rare driver mutations from passengers remains challenging. Based on strict criteria for driver gene classification, we selected 92 affected genes in advanced adenomas that could be considered early drivers in colorectal tumorigenesis.
Genes displaying a consistent trend in mutation prevalence from nonadvanced to advanced adenomas and CRC could reflect progress towards malignancy [13]. While mutations in TP53 and PIK3CA are characteristic of late-stage CRC [36], pathogenic TP53 allelic variants have also been reported in colon adenomas [10,14]. In this study, 0%, 21.4%, and 35.3% of LG adenomas, HG adenomas, and synchronous carcinoma components, respectively, were affected by TP53 mutations. TP53 was one of the two genes with a significant consistent trend in mutation prevalence towards malignancy, while four other genes (HNF1A, KAT6B, FBXW7, and NFKB1) exhibited the opposite trend, with mutation frequencies decreasing towards carcinoma. Similar increases followed by decreases in the frequency of a particular gene mutation during colon tumorigenesis have been noted previously [37]. These may represent mutations acting as drivers during a specific phase of colorectal carcinogenesis, followed by loss when their role is no longer essential for growth advantage [37,38]. It is also possible that the observed decreased or increased frequencies in mutation profiles result from different levels of development towards malignancy across the adenoma tissue.
Genes involved in WNT signaling are highly conserved through evolution, and recurrent mutations in multiple genes encoding key proteins in the canonical WNT/βcatenin signaling pathway occur in a wide spectrum of human cancers [39,40]. Among these genes, APC and CTNNB1 (which encodes β-catenin) are key factors in the reprogramming of the nuclear T cell factor/lymphoid enhancer factor (TCF/LEF) transcriptional network. The majority of APC mutations in colorectal neoplasia are truncating and affect the WNT pathway [35]. As expected, we found that APC was the most frequently mutated gene across the adenomas studied, while other WNT pathway genes (CTNNB1, EP300, TCF7L2, and AMER1) were altered less frequently. All of these genes are implicated as drivers of WNT-dependent tumor growth [39]. Both APC and CTNNB1 are classic oncogenes, while AMER1 (also known as the Wilms tumor gene on the X chromosome, WTX) is a tumor suppressor gene [41]. In addition to WNT-related genes, other mutated genes, including BCL2, FBXW7, GNAS, HNF1A, KRAS, MLL2/KMT2D, MLL3/KMT2C, SYNE1, TCF7L2, TP53, NOTCH1, PBRM1, RET, RARA, and FN1, can be considered drivers of colorectal tumorigenesis.
Structural variations in the human genome, including deletions, insertions, duplications, and large-scale copy number variants, are collectively termed CNVs. CNVs are influential factors in gene expression in both normal and various cancer tissues [42][43][44]. As expected, we detected a progressive increase in CNVs from LGD through HGD to malignant samples. Most sporadic CRCs are CIN and may be partly attributable to somatic APC mutations [45]. However, the first large-scale genome-wide analysis investigating rare CNVs in sporadic CRC indicated that rare CNVs increased the risk of CRC and that the assembly of chromatin or nucleosome-related or olfaction-associated genes might contribute to this elevated risk [46]. Thus, the genomic instability noted in CRC tumorigenesis may not be associated with APC mutations or its associated alterations within the WNT signaling pathway. Interestingly, comparisons of genetic aberrations detected in normal colon samples from patients with CRC with those found in corresponding polyp tissue and peripheral blood indicate that CNVs were present, not only in tumor tissues but also in the normal colon and blood [47].
In summary, we confirmed an increase in the mutation load in HGD compared with LGD adenomas, while carcinoma components of adenomas had mutation loads similar to those of LGD adenomas (Figure 1). Furthermore, the number of CNVs progressively increased in samples from adenomas representing LGD, HGD, and CA samples. Most genetic alterations detected in this study, including those involved in the WNT signaling pathway, can be considered early contributors to colorectal carcinogenesis [48]; however, only a limited number of genes were consistently mutated in ≥10% of cases, while most gene changes affected single cases. The ultimate contribution of these mutations to the process of CRC tumorigenesis remains unclear.

Data Availability
The sequencing datasets are not publicly available due to a concern to protect individual patient confidentiality but are available from the corresponding author on reasonable request.