Whole Exome Sequencing Identifies Two Novel Mutations in a Patient with UC Associated with PSC and SSA

Background Patients diagnosed with ulcerative colitis (UC) associated with primary sclerosis cholangitis (PSC) and sessile serrated adenoma (SSA) are rare. The present study aimed to identify the potential causative gene mutation in a patient with UC associated with PSC and SSA. Methods DNA was extracted from the blood sample and tissue sample of SSA, followed by the whole exome sequencing (WES) analysis. Bioinformatics analysis was utilized to predict the deleteriousness of the identified variants. Multiple sequence alignment and conserved protein domain analyses were performed using online software. Sanger sequencing was used to validate the identified variants. Expression and diagnostic analysis of identified mutated genes was performed in the GSE119600 dataset (peripheral blood samples of PSC and UC) and GSE43841 dataset (tumor samples of SSA). Results In the present study, a total of 842 single nucleotide variants (SNVs) in 728 genes were identified in the blood sample. Two variants, integrin beta 4 (ITGB4) (c.C2503G; p.P835A) and a mucin 3A (MUC3A) (c.C1019T; p.P340L), were further analyzed. MUC3A was associated with inflammatory bowel disease. Sanger sequence in blood revealed that the ITGB4 mutation was fully cosegregated with the result of WES in the patient. Additionally, a variant, tumor protein p53 gene (TP53) (c.86delA; p.N29Tfs∗15) was identified in the tissue sample of SSA. Compared to that in normal controls, ITGB4 was upregulated in both UC and PSC, MUC3A was, respectively, upregulated and downregulated in PSC and UC, and TP53 was downregulated in SSA. ITGB4 and TP53 had a potential diagnostic value for UC, PSC and SSA. Conclusions The present study demonstrated that the ITGB4 (c.C2503G; p.P835A) and MUC3A (c.C1019T; p.P340L) mutations may be the potential causative variants in a patient with UC associated with PSC and SSA. TP53 (c.86delA; p.N29Tfs∗15) mutation may be associated with SSA in this patient.


Background
Primary sclerosis cholangitis (PSC) is a chronic disease with inflammation and fibrotic obliteration of the intra and extrahepatic bile duct, which can lead to bile stasis and hepatic fibrosis [1,2]. PSC is associated with underlying inflammatory bowel disease (IBD), mainly ulcerative colitis (UC) [1]. ere is no effective medical treatment for this condition, and liver transplantation represents the treatment of choice for patients with end-stage [3]. In Western countries, approximately 52-90% of patients with PSC have concurrent UC [4][5][6][7][8]. Meanwhile, the number of patients with UC, as well as with other types of IBDs, continues to increase in Western countries [9,10]. Several current reports suggest that patients with PSC associated with UC (PSC-UC) have increased gradually in some European populations [11]. e involvement of gut bacteria in PSC-UC patients is widely accepted [12,13]. Accumulating evidence suggested that gut dysbiosis and abnormal bile acid metabolites are involved in PSC [14]. However, the pathogenetic mechanism of PSC is far from clear. Furthermore, patients with PSC-IBD have an increased risk of colon cancer and cholangiocarcinoma compared with those with IBD alone [15].
Sessile serrated adenomas (SSA) give rise to 20-30% of colorectal cancer (CRC). SSA displays endoscopic features which present challenges in diagnosis and surveillance. SSA are increasingly found in colectomy specimens from CRC patients. SSA differs from conventional adenomas in several respects. ey arise at any age and are over represented in young patients [5,6]. Histologically, they vary little in appearance for the majority of their dwell time prior to the development of dysplasia. However, once dysplasia develops, SSA rapidly progresses to CRC [7]. SSAs are thought to be responsible for "interval" CRCs which arise within the colonoscopy surveillance interval [7]. Genetic factors that contribute to development of SSA are not well understood, but microsatellite instability (MSI) is thought to be involved. Rajagopalan et al. [16] used the HumanMethylation 450 array to identify methylated genes in SSA compared with normal mucosa, followed by RNA sequencing. ree genes were methylated and downregulated in SSA, including bone morphogenetic protein 3 (BMP3), erythrocyte membrane protein band 4.1 like 3 (EPB41L3), and cystathionine betasynthase (CBS). Using a customised methylation microarray, Beach et al. [17] identified 32 genes which were methylated in SSAs compared with normal mucosa and performed immunohistochemistry for the six most promising candidates.
Whole exome sequencing (WES) is widely used to explore the genetic mechanism of rare diseases [18][19][20]. e present study investigated a patient with UC associated with PSC and SSA. e results on the inheritance of mutant genes may provide novel information regarding the pathological mechanism underlying this unusual condition.

Patients.
For the purpose of this study, a patient with UC associated with PSC and SSA was recruited. e diagnosis of UC was made due to clinical symptoms, endoscopic examination, and biopsy of the colon mucosa. e diagnosis of PSC was based on radiological imaging and liver biopsy. e diagnosis of SSA was based on endoscopic features and histological investigation. e patient's peripheral blood was collected. Genomic DNA was extracted for WES. e tissue sample of SSA from a colonoscopy biopsy was also used for WES analysis. Our study was approved by the Research Ethics Committee of the Peking Union Medical College Hospital (approval number 2603, 2898). e written informed consent was obtained from this patient before the study.

Analysis of Exome Capture.
e genomic DNA was extracted from the blood sample according to the standard procedures. e 3 μg of genomic DNA was fragmented with approximately 200 bp, ligated with adapters, and amplified by ligation-mediated polymerase chain reaction (PCR). e qualified genomic DNA was used for exome capture and high-throughput sequencing. SureSelect XT Library Prep Kit was used to perform exome target enrichment. e captured library was sequenced on the Illumina HiSeq X-Ten sequencer with paired-end 125 bp and mean coverage of 200x [21].

Analysis of Basic Biological Information.
e FastQC and SeqPrep were utilized to detect the quality of primitive data of exome sequencing. Raw data were filtered by removing the adapter, contaminating reads and low-quality reads via SeqPrep and sickle, and the remains were the clean ones. e exome sequencing clean reads were mapped to the reference human genome sequence (hg19) (http://hgdownload.soe. ucsc.edu/goldenPath/hg19/bigZips/) using the Burrows-Wheeler Alignment (BWA) tool (http://bio-bwa. sourceforge.net/bwa.shtml), which can do short reads alignment to a reference genome and support paired-end mapping. e sequence alignment/map (SAM) file was then generated. Picard tool (http://picard.sourceforge.net/) was used to mark and exclude the duplicate reads. Variants (single nucleotide variants (SNVs), insertions, and deletions) calling was performed using the Genome Analysis Toolkit (GATK) [22].

Functional Annotation of Hotspot Mutation Genes.
In order to investigate the biological function of hotspot mutation genes, 73 hotspot mutation genes were analyzed in the Gene Ontology (GO) functional categories, the Online Mendelian Inheritance in Man (OMIM), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) biochemical pathway using the GeneCodis3 (http://genecodis.cnb.csic. es/analysis).

Sanger Sequencing for Variant Validation.
After the systematic filtering, two candidate variants ITGB4 (c.C2503G; p.P835A) and MUC3A (c.C1019T; p.P340L) were selected for variant validation. Genomic DNA was prepared from the patient. After oligonucleotide primer was designed by using Primer-Premier 5.0 (Premier Biosoft International, Palo Alto, CA, USA). PCR analyses and Sanger sequencing were then performed. e amplification process was 15 min at 95°C followed by 40 cycles of 10 sec at 95°C, 30 sec at 55°C, 32 sec at 72°C, and 15 sec at 95°C, 60 sec at 60°C, and 15 sec extension at 95°C. e PCR products were used for Sanger sequencing.

Expression and Diagnostic Analysis of Mutated Genes.
e GSE119600 dataset (peripheral blood samples, involving 45 PSC cases, 93 UC cases, and 47 normal controls) and GSE43841 dataset (tumor samples, involving 6 SSA cases and 6 normal controls) were used for expression and diagnostic analysis of mutated genes. e ROC analysis was used to assess the diagnostic value of mutated genes. e expression result of mutated genes was shown by box plots.

Clinical Manifestations.
At present study, one male patient (63 years old) who was diagnosed with UC associated with PSC and SSA was included in the study. He had bloody diarrhea and lower abdominal pain for 8 years before admission. On colonoscopic examination, diffuse edema, erosion, and petechial hemorrhage were noted along the whole colon and rectum. e diagnosis of UC was made based on clinical, endoscopic, and histological data. A flat polyp with a size of 0.6 cm was detected in the cecum and was resected ( Figure 1). Histological diagnosis of SSA was made according to serrated hyperplasia and basal dilation of crypts ( Figure 2). PSC was diagnosed based on radiological image ( Figure 3) and liver biopsy (Figure 4(a)). Histological examination demonstrated neutrophilic microabscesses in the crypts and infiltration with mononuclear cells and plasma cells in the lamina propria ( Figure 4(b)). Figure 4(a) discloses onion skin-like changes, lymphocyte infiltration plasmacytes, and scattered eosinophils with fibrosis and pseudolobulation. Two portal areas had hyperplastic ductule.

Genetic and In Silico
Analysis. Genomic DNA samples in the blood sample from the patient were analyzed. A total of 842 single nucleotide variants (SNVs) in 728 genes were identified. Among which, 26 genes had three or more rare missense mutations, and 47 genes had two rare missense mutations. Based on the WES in the patient and subsequent Sanger sequencing results, 6 SNVs, the c.C2503G mutation (a substitution from P to A) in ITGB4 (Chr17: 73736495), the c.C1019T mutation (a substitution from P to L) in MUC3A (Chr7: 100550438), the c.G418C mutation (a substitution from D to H) in HLA-A, the c.C40664T mutation (a substitution from P to L) in MUC16, the c.T869A mutation (a substitution from F to Y) in TAS2R46, and c.C3414A mutation (a substitution from D to E) in MUC5B were likely to be associated with UC associated with PSC and SSA. In the tissue sample of the patient when diagnosed with SSA, TP53 mutation (c.86delA; p.N29Tfs * 15) (Chr17: 7579710) was identified. Interestingly, the missense mutation of ITGB4, MUC16, and TP53 is found in the tumor tissue of CRC in the Cancer Genome Atlas (TCGA) dataset (involving 471 CRC cases and 41 normal controls). It is suggested that mutations of ITGB4, MUC16, and TP53 are significantly associated with both SSA and CRC (Supplementary Figure 1).

In Silico Analysis.
e c.C2503G mutation in ITGB4 resulted in a protein alteration of p.P835A. In addition, the c.C40664T mutation in MUC16 resulted in a protein alteration of p.P13555L. Furthermore, it was highly conserved across several species, including humans, Macaca mulatta, Mus musculus, Ovis aries, and Castor canadensis ( Figure 5). Furthermore, the amino acid at position 13555 in the MUC16 protein sequence was located in the SEA domain, and it was highly conserved across several species, including humans, Pan troglodytes, Cricetulus griseus, Eumetopias jubatus, and eropithecus gelada ( Figure 6).

Sanger Sequencing of the Candidate Causative Variants.
In order to further confirm the c.C2503G mutation in UC associated with PSC and SSA, Sanger sequencing was performed on the patient. e results demonstrated that the c.C2503G mutation in ITGB4 was fully cosegregated with patient recruited for WES (Figure 7).

Expression and Diagnostic Analysis of Mutated Genes.
First, we investigated the expression of ITGB4 and MUC3A in peripheral blood of UC and PSC in the GSE119600 dataset and the expression of TP53 in tumor of SSA the GSE43841 dataset ( Figure 8). e result showed that ITGB4 was upregulated in both UC and PSC, compared to that in normal controls. MUC3A was upregulated in PSC and downregulated in UC, compared to that in normal controls. TP53 was downregulated in SSA, compared to that in normal controls. It is suggested that mutation of above genes may be associated with gene expression. Second, we performed the ROC analysis to evaluate the diagnostic value of ITGB4, MUC3A, and TP53 (Figure 9). e ROC result showed that the AUC of ITGB4 and TP53 was more than 0.6, which suggested that they had a diagnostic value for UC, PSC, and SSA.

Discussion
Patients who were diagnosed with UC associated with PSC and SSA were clinically rare. UC is currently affecting more than 3.5 million people in Europe and North America [23]. e disease is recrudescent, and the main symptoms during a relapse are diarrhea, rectal bleeding, and abdominal pain [24]. Approximately every 20 patients with IBD are affected by PSC, a chronic bile duct disease characterized by obliterative fibrosis. is is an incurable, progressive condition associated with an increased risk of cholangiocarcinoma and colon cancer [25,26]. At present, genetic mechanisms   underlying UC associated with PSC pathogenesis remain to be fully elucidated. As previously reported, David Ellinghaus identified two novel risk locis, G protein-coupled receptor 35 (GPR35) and transcription factor 4 (TCF4), at genome-wide significance levels. GPR35 shows associations in both UC and PSC, whereas TCF4 represents a PSC risk locus not associated with UC [27]. WES technology is an effective method for identifying potential causative genes in disease phenotypes [28]. In the present study, WES was performed to identify potential causative genes in the patient who was diagnosed with UC associated with PSC and SSA. Two mutations (c.C2503G and c.C1019T) in ITGB4 and MUC3A genes, respectively, were revealed to be associated with UC associated with PSC and SSA.   Figure 9: e ROC curves of ITGB4 and MUC3A in UC and PSC and TP53 in SSA. epidermal growth factor-like (EGF). Due to their localization and structure at the cell surface, they may participate in cell signaling, cell-matrix, and cell-cell interactions, modulating biological properties under normal and pathological conditions [30]. MUC3A was located in chromosome 7q22 with tandem repeats of 17 amino acids and categorized as a membraneassociated mucin [31]. MUC3A includes two extracellular cysteine-rich epidermal growth factor-like domains that maybe implicated in cell proliferation through growth factors [31].
Poor prognosis with abnormal high MUC3A expression has been researched in breast, pancreatic, gastric, colorectal, appendiceal, and prostate cancer [32][33][34][35][36][37]. Zimmer et al. revealed an aberrant expression of MUC2 and MUC3 of the gastric epithelium in Menetrier's disease during remission of UC [38]. Previous study reported that B-Raf protooncogene, serine/threonine kinase (BRAF), and KRAS protooncogene, GTPase (KRAS) mutations, are early molecular alterations in serrated lesions and are mutually exclusive in colorectal neoplasms [16,17]. In the present study, a mutation (c.C1019T; p.P340L) in the MUC3A gene was detected in the patient with UC associated with PSC and SSA. Moreover, MUC3A was, respectively, upregulated and downregulated in the peripheral blood of PSC and UC patients. It is suggested that mutation of MUC3A may be associated with its expression. Interestingly, ITGB4 had a potential diagnostic value for UC and PSC. In addition, MUC3A was highly conserved among different species, including humans, Ovis aries, Pteropus alecto, eropithecus gelada, and Octodon degus. MUC3A was enriched in the maintenance of gastrointestinal epithelium and extracellular matrix structural constituent and associated with inflammatory bowel disease. erefore, this MUC3A mutation may be considered one candidate molecule in the pathogenesis of UC associated with PSC and SSA.
Microsatellites are short tandem repeats that are found throughout the human genome [39]. Compared with normal tissue, the microsatellite of tumor tissue changes the length of the microsatellite due to the insertion or deletion of repeating units, called microsatellite instability (MSI) [40]. Numerous studies have shown that MSI is caused by defects in the mismatch repair (MMR) gene and is closely related to tumorigenesis [41,42]. MSI has been clinically used as an important molecular marker for the prognosis of CRC and the development of adjuvant therapy and is used to assist in the screening of Lynch syndrome [43]. SSA shares molecular features with colon tumors that have MSI and a methylated phenotype, indicating that these lesions are precursors that progress via the serrated neoplasia pathway. But the genetic mutation in the pathogenesis of SSA is needed to be fully clarified. As a member of the integrin family, ITGB4 generally forms heterodimers with integrin 6 to achieve its biological functions, and integrin molecules can also coordinate with receptor tyrosine kinases (RTKs) to regulate downstream signaling pathways. Recent studies demonstrated that ITGB4 played an important role in promoting tumorigenesis in prostate cancer [44], breast cancer [45], gastric cancer [46], and lung squamous cell carcinoma [47]. In the present study, a mutation (c.C2503G; p.P835A) in the ITGB4 gene was detected in the patient with UC associated with PSC and SSA. Additionally, ITGB4 was upregulated in peripheral blood of both UC and PSC patients. It is indicated that mutation of ITGB4 may affect its expression. e mutation is involved in the pathogenesis of colon cancer [48]. ITGB4 gene was highly conserved among different species, including humans, Macaca mulatta, Mus musculus, Ovis aries, and Castor canadensis. erefore, ITGB4 mutation may be considered another candidate molecule in the pathogenesis of this patient.
In addition, a variant, TP53 (c.86delA; p.N29Tfs * 15) was identified in the tissue sample of the patient when diagnosed with SSA. Moreover, TP53 was downregulated in tumor of SSA. It is supposed that mutation of TP53 may be related to the expression level of TP53. Significantly, TP53 had a potential diagnostic value for SSA, which is considered as a diagnostic biomarker for SSA. TP53 plays an important role in senescence or apoptosis in cells with damaged genomes [49]. e association between TP53 mutations and IBDassociated CRC has been reported [50]. A higher frequency of TP53 mutation is found in UC-CRC compared to sporadic CRC. An increase in the frequency of TP53 in the bile of PSC patients with the malignancy has been demonstrated. In addition, subsets of serrated adenomas showed TP53 mutation. Our study indicated that the TP53 (c.86delA; p.N29Tfs * 15) mutation may be associated with disease progression of UC associated with PSC and SSA.
In conclusion, mutations in ITGB4 (c.C2503G; p.P835A) and MUC3A (c.C1019T; p.P340L) may predispose patients with UC to PSC and SSA. MUC3A is associated with inflammatory bowel disease. ese findings may reveal the high degree of genetic heterogeneity in patient with UC associated with PSC and SSA. Additionally, TP53 (c.86delA; p.N29Tfs * 15) may be regarded as a risk diagnostic factor for patients developed to SSA. Future work will be performed to improve understanding of this disorder. However, there is a limitation of the present study. First, the sample size is small. Larger peripheral blood and tumor samples from patients with UC associated with PSC and SSA are further needed for analysis. Second, identified mutations are further needed to be verified in colon tumors or precancerous lesions. ird, the in vitro study is further needed to verify the molecular biological changes caused by mutations, such as cell proliferation, dedifferentiation, and tumorigenesis phenotype of intestinal epithelium in cell lines or animal models. Fourth, analysis of gut dysbiosis and differential metabolites is further needed to study in PSC.

Data Availability
e datasets used and/or analyzed during the current study are available from the corresponding author upon request.