Computational Identification of Tumor Suppressor Genes Based on Gene Expression Profiles in Normal and Cancerous Gastrointestinal Tissues

Cancer is prevalent in various gastrointestinal (GI) organs, such as the esophagus, stomach, and colon. However, the small intestine has an extremely low cancer risk. It is therefore of interest to investigate the molecular cues that could explain the marked difference in cancer incidence among GI tissues. Using several large-scale normal and cancer tissue genomics datasets, we compared gene expression profiles between the small intestine and other GI tissues and between GI cancers and normal tissues. We identified 17 tumor suppressor genes (TSGs) that showed significantly higher expression levels in the small intestine than in other GI tissues and significantly lower expression levels in GI cancers than in normal tissues. These TSGs were mainly involved in metabolism, immune, and cell growth signaling-associated pathways. The expression of many TSGs correlated positively with survival prognosis in various cancers, confirming their tumor suppressive function. We demonstrated that the downregulation of many TSGs was associated with their hypermethylation in cancer. Moreover, we showed that the expression of many TSGs correlated inversely with tumor purity and positively with antitumor immune response in various cancers, suggesting that these TSGs may exert their tumor suppressive function by promoting antitumor immunity. Furthermore, we identified a transcriptional regulatory network of the TSGs and their master transcriptional regulators (MTRs). Many of the MTRs have been recognized as tumor suppressors, such as HNF4A, ZBTB7A, p53, and RUNX3. The TSGs could provide new molecular cues associated with tumorigenesis and tumor development and have potential clinical implications for cancer diagnosis, prognosis, and treatment.


Introduction
The gastrointestinal (GI) tract is a highly organized organ system in humans with many important functions, such as the absorption of food and nutrients, endocrine secretion, and resistance to microbial invasion. GI cancers are prevalent and account for a large number of cancer deaths worldwide; colorectal, stomach, esophageal, liver, and pancreatic cancers are the most common [1]. However, the small intestine, an important GI organ for food and nutrient absorption, has a much lower cancer incidence than other GI organs. Therefore, it is of interest to investigate which molecular features may explain the significantly lower cancer incidence in the small intestine than in other GI organs. With the advancement of genomics technology, a large volume of cancer and normal tissue genomics data has been produced, enabling investigation of the association between genomic features and the markedly different cancer incidence rates among human tissues.
In this study, we compared gene expression profiles between a tissue with a low cancer risk (small intestine) and tissues with a high cancer risk (colon, stomach, and esophagus) using Genotype-Tissue Expression (GTEx) data [2]. We identified the genes that were significantly differentially expressed between the two groups of tissues. Moreover, we compared gene expression profiles between cancer and normal tissues in colon, stomach, and esophagus cancer cohorts using The Cancer Genome Atlas (TCGA) data and identified differentially expressed genes (DEGs) between cancer and normal tissues. We obtained the DEGs common to both analyses and divided them into tumor promoter genes (TPGs) and tumor suppressor genes (TSGs). The TPGs were those expressed at higher levels in colon, stomach, and esophagus than in small intestine and at higher levels in colon, stomach, and esophagus cancers than in their normal tissues. By contrast, the TSGs were those expressed at lower levels in colon, stomach, and esophagus than in small intestine and at lower levels in colon, stomach, and esophagus cancers than in their normal tissues. Figure 1 summarizes the analysis pipeline for identifying TPGs and TSGs. Furthermore, we verified these results using several Gene Expression Omnibus (GEO) datasets [3]. The downstream analysis of the identified genes was performed on TCGA data.

Materials.
Gene expression profiles of normal tissues (small intestine, colon, stomach, and esophagus) were downloaded from the GTEx (https://gtexportal.org/home/) and GEO (https://www.ncbi.nlm.nih.gov/gds) databases. Gene expression profiles of colon, stomach, and esophagus cancers and their normal tissues were downloaded from TCGA (https://portal.gdc.cancer.gov/). In addition, we obtained gene expression profiles and clinical immunotherapy response data for two melanoma cohorts (Nathanson et al. [4] and Roh et al. [5]) from the associated publications. The sample sizes of cancer and normal tissues are presented in Supplementary Table S1.

Identification of Differentially Expressed Genes.
We identified differentially expressed genes (DEGs) between two classes of samples using Student's t-test. The false discovery rate (FDR), estimated by the Benjamini and Hochberg (BH) method [6], was used to adjust for multiple testing. Genes with FDR < 0.05 and a mean gene-expression fold change > 1.5 were identified as DEGs between the two classes of samples.

Identification of TPGs and TSGs.
Based on the GTEx datasets, we identified three sets of genes expressed at higher levels and three sets of genes expressed at lower levels in small intestine by comparing small intestine tissue versus colon tissue, small intestine tissue versus gastric tissue, and small intestine tissue versus esophagus tissue. We obtained the set of genes common to the three higher-expressed sets (termed SI-HEGs) and the set of genes common to the three lower-expressed sets (termed SI-LEGs). Furthermore, based on the TCGA datasets, we identified three sets of upregulated genes and three sets of downregulated genes in cancer by comparing colon cancer versus colon tissue, gastric cancer versus gastric tissue, and esophagus cancer versus esophagus tissue. We termed the set of genes consistently upregulated in the three cancer types Ca-HEGs and the set of genes consistently downregulated in the three cancer types Ca-LEGs. We defined TPGs as the genes shared by SI-LEGs and Ca-HEGs, and TSGs as the genes shared by SI-HEGs and Ca-LEGs.
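The definitions of SI-HEGs, SI-LEGs, Ca-HEGs, Ca-LEGs, TPGs, and TSGs reduce to set intersections. The sketch below is a minimal illustration, assuming each tissue comparison has already produced a pair of gene sets (for example, from a DEG step like the one described above); the function names and the toy gene labels are hypothetical and do not represent results of this study.

```python
def intersect_all(sets):
    """Genes present in every gene set of an iterable of sets."""
    sets = list(sets)
    return set.intersection(*sets) if sets else set()


def classify_tpgs_tsgs(si_vs_tissue, cancer_vs_normal):
    """
    si_vs_tissue: tissue -> (higher_in_SI, lower_in_SI) gene sets (GTEx comparisons).
    cancer_vs_normal: tissue -> (up_in_cancer, down_in_cancer) gene sets (TCGA comparisons).
    Returns (TPGs, TSGs).
    """
    si_hegs = intersect_all(hi for hi, _ in si_vs_tissue.values())
    si_legs = intersect_all(lo for _, lo in si_vs_tissue.values())
    ca_hegs = intersect_all(up for up, _ in cancer_vs_normal.values())
    ca_legs = intersect_all(down for _, down in cancer_vs_normal.values())
    tpgs = si_legs & ca_hegs  # lower in small intestine, higher in cancer
    tsgs = si_hegs & ca_legs  # higher in small intestine, lower in cancer
    return tpgs, tsgs
```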

Survival Analyses.
We compared overall survival (OS) and disease-free survival (DFS) between cancer patients with higher gene expression levels (>median) and those with lower gene expression levels (<median), and between cancer patients with higher tumor immunity (>median) and those with lower tumor immunity (<median). The tumor immunity was calculated by the ABSOLUTE algorithm [7]. The significance of survival time differences was evaluated by the log-rank test with a threshold of P < 0.05. Kaplan-Meier curves were used to show the survival time differences. The survival analyses were performed on the TCGA datasets.
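The median split and log-rank comparison can be illustrated with a small, dependency-free sketch. It is not the implementation used for the TCGA analyses: the function names are hypothetical, samples exactly at the median are dropped to mirror the >median/<median definition (an assumption about tie handling), and the p-value uses the chi-square distribution with one degree of freedom, P(χ² > x) = erfc(√(x/2)). Kaplan-Meier plotting is omitted.

```python
import math
import statistics


def median_split(values):
    """Indices of samples strictly above and strictly below the median."""
    med = statistics.median(values)
    high = [i for i, v in enumerate(values) if v > med]
    low = [i for i, v in enumerate(values) if v < med]
    return high, low


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (death/recurrence) or 0 (censored).
    Returns (chi-square statistic with 1 df, p-value)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk in group A
        n = sum(1 for tt, _, _ in data if tt >= t)                 # at risk overall
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in A
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * ((n - n_a) / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

A typical use is to call `median_split` on one gene's expression values, then pass the survival times and event indicators of the two index groups to `logrank_test`.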

Correlation of Gene Expression Levels with Tumor Purity, Tumor Immune Cell Infiltration Levels, and Immunotherapy Response.
We evaluated the correlations of gene expression levels [8] with tumor purity and with the infiltration levels of tumor immune cells (CD8+ T cells and dendritic cells) using TIMER [9]. In both melanoma cohorts (Nathanson et al. [4] and Roh et al. [5]), we divided cancer samples into two groups based on the median expression levels of TSGs and compared the immunotherapy response rates between them.
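TIMER performs these correlation analyses internally. As a hedged illustration only, the sketch below computes the plain Spearman rank correlation and applies the cutoffs quoted in the Results (cor ≤ −0.3 for tumor purity, cor ≥ 0.3 for immune cell infiltration); it does not reproduce any adjustment TIMER may apply. The function names are hypothetical, and tied values receive average ranks.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def meets_cutoff(cor, cutoff=0.3, direction=-1):
    """direction=-1: cor <= -cutoff (e.g., tumor purity);
    direction=+1: cor >= cutoff (e.g., immune cell infiltration)."""
    return cor <= -cutoff if direction < 0 else cor >= cutoff
```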

Comparison of the Methylation Levels of TSGs between Tumor and Normal Tissues.
We compared the mean gene promoter methylation levels (β values) between tumor and normal tissues in 18 TCGA cancer types and used linear regression to evaluate the correlation between gene expression levels and mean gene promoter methylation levels in these cancer types using MethHC [10].
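The methylation comparisons were performed with MethHC. As a hedged illustration of the two quantities involved, the sketch below computes the tumor-minus-normal difference in mean promoter β value and an ordinary least-squares fit of expression on methylation, whose Pearson r can be checked against the absolute-correlation cutoff of 0.3 used in the Results. The significance tests applied by MethHC are not reproduced, and the function names are hypothetical.

```python
import math
import statistics


def delta_beta(beta_tumor, beta_normal):
    """Mean promoter beta value in tumor minus normal; > 0 suggests hypermethylation."""
    return statistics.fmean(beta_tumor) - statistics.fmean(beta_normal)


def linear_fit(methylation, expression):
    """Ordinary least squares of expression on methylation.
    Returns (slope, intercept, Pearson r)."""
    mx, my = statistics.fmean(methylation), statistics.fmean(expression)
    sxx = sum((x - mx) ** 2 for x in methylation)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(methylation, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / math.sqrt(sxx * syy)
```

A negative slope with r ≤ −0.3 would correspond to the inverse methylation-expression relationship reported for RASGRP2.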

Identification of Master Transcriptional Regulators (MTRs) of TSGs.
We used iRegulon [11], a Java-based plugin in Cytoscape, to identify the MTRs that regulate the TSGs. iRegulon uses a large collection of transcription factor (TF) motifs and ChIP-seq tracks to identify the target genes of TFs on the basis of the normalized enrichment score (NES).
These results indicate that these gene signatures play important roles in tumor suppression through the regulation of metabolism, immune, and cell growth signaling-associated pathways.
Moreover, we found that most of the 17 TSGs showed significantly lower expression levels than in normal tissues in various cancers beyond colon, stomach, and esophagus cancers. For example, RASGRP2 was downregulated in 20 TCGA cancer types (Figure 2). CCL21 had lower expression levels in 18 TCGA cancer types (Supplementary Figure S1). CBFA2T3 and XPNPEP2 were downregulated in 15 and 10 TCGA cancer types, respectively (Supplementary Figures S2 and S3). These results suggest that the TSGs may play crucial roles in tumor suppression in a wide variety of cancer types.
We compared the DNA methylation levels of TSGs between cancer and normal tissues and found that many TSGs exhibited significantly higher methylation levels in various cancers. For example, the methylation levels of the RASGRP2 promoter were significantly higher in 18 TCGA cancer types than in their normal tissues (Supplementary Figure S5). The promoter region of CCL21 had significantly higher methylation levels in 17 TCGA cancer types (Supplementary Figure S7). Furthermore, linear regression analysis showed that the methylation levels of TSGs were significantly inversely correlated with their expression levels in many cancer types. For example, RASGRP2 methylation levels were significantly inversely correlated with its expression levels in stomach adenocarcinoma (STAD), pancreatic adenocarcinoma (PAAD), and prostate adenocarcinoma (PRAD), with an absolute correlation coefficient of at least 0.3 (Supplementary Figures S6 and S7). In addition, previous studies have shown that several TSGs, such as CBFA2T3 [13] and TMEM25 [14], have significantly higher methylation levels in cancer. These results suggest that the downregulation of many TSGs is associated with their elevated methylation levels in cancer.
Higher expression levels of TSGs are associated with lower tumor purity and more active antitumor immune response in cancer.
We found that the expression levels of TSGs tended to correlate inversely with tumor purity in various cancers. For example, RASGRP2 expression levels were inversely associated with tumor purity in 25 TCGA cancer types/subtypes, with a Spearman rank correlation coefficient (cor) ≤ −0.3 (Figure 4). The expression levels of CCL21 had a significant inverse correlation with tumor purity in 17 TCGA cancer types (cor ≤ −0.3) (Supplementary Figure S8).
These results indicate that higher expression levels of TSGs may correlate with more nontumor components in cancer. Indeed, the expression levels of TSGs tended to correlate positively with antitumor immune signatures in various cancers. For example, RASGRP2 expression levels correlated positively with the enrichment levels of CD8+ T cells in 14 TCGA cancer types/subtypes and with the enrichment levels of dendritic cells in 26 TCGA cancer types/subtypes (cor ≥ 0.3) (Figure 5). The expression levels of CCL21 had a significant positive correlation with the enrichment levels of CD8+ T cells and dendritic cells in 5 and 6 TCGA cancer types, respectively (cor ≥ 0.3) (Supplementary Figure S9). Previous studies also showed that the expression of certain TSGs, such as CCL21 [15][16][17], MADCAM1 [18], and FCER2 [19], could promote antitumor immunity in diverse cancers. Because higher levels of tumor-infiltrating lymphocytes (TILs) are associated with improved survival in cancer patients [20,21], these results again suggest that elevated TSG expression is associated with a favorable prognosis in cancer.
Since the elevated expression of TSGs was associated with higher levels of TILs in tumors, and TIL levels are a positive predictor of cancer immunotherapy response (ITR) [22], the expression levels of TSGs could be positively associated with ITR in cancer. To test this hypothesis, we examined the correlation between the expression levels of TSGs and ITR in two melanoma cohorts treated with anti-CTLA-4/PD-1 immunotherapy (Nathanson et al.'s cohort [4] and Roh et al.'s cohort [5]). We found that higher GSTA2 expression levels were associated with a significantly higher ITR in Nathanson et al.'s cohort (Table 2). These results support the hypothesis that the expression of TSGs is positively associated with ITR in cancer.
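The text does not state which statistical test compared response rates between the high- and low-expression groups. The sketch below assumes a two-sided Fisher's exact test on the resulting 2×2 table, with responders and non-responders coded as booleans; samples exactly at the median are excluded (also an assumption), and the function names are hypothetical.

```python
import statistics
from math import comb


def response_table(expression, responded):
    """2x2 counts: [[responders, non-responders] in the high group,
    [responders, non-responders] in the low group]. Median ties are excluded."""
    med = statistics.median(expression)
    high = [r for e, r in zip(expression, responded) if e > med]
    low = [r for e, r in zip(expression, responded) if e < med]
    return [[sum(high), len(high) - sum(high)], [sum(low), len(low) - sum(low)]]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

In practice, `scipy.stats.fisher_exact` provides the same test; the version here only keeps the example dependency-free.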

Master Transcriptional Regulators (MTRs) of TSGs.
To understand the regulatory mechanism underlying the different cancer risks across tissues, we identified 34 MTRs that significantly regulated the 17 TSGs (NES > 3) (Figure 6). Among the 34 MTRs, HNF4A (hepatocyte nuclear factor 4 alpha) was the most highly enriched, regulating 14 TSGs. HNF4A plays a role in the development of multiple organs, including the intestines, liver, and kidney, and functions as a repressor of cell proliferation [23]. Reduced expression of HNF4A has been associated with tumorigenesis in various cancers [24,25].
This is consistent with the tumor suppressive function of the TSGs regulated by HNF4A. The second most highly enriched MTR of TSGs was ZBTB7A, which regulates 11 TSGs. ZBTB7A acts as a tumor suppressor by repressing the expression of key genes in tumor glycolysis [26] and negatively regulating the TGF-β pathway [27]. Previous studies have revealed its tumor suppressor role in a wide variety of cancers [28][29][30]. Again, this is in line with the tumor suppressive function of the TSGs regulated by ZBTB7A. Some other MTRs, such as p53 [31][32][33] and RUNX3 [34][35][36], have also been identified as tumor suppressors. Collectively, these results suggest that identifying the MTRs of TSGs may provide insights into the regulatory mechanism underlying the different cancer risks in different tissues and into cancer development.
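Ranking MTRs by the number of TSGs they regulate after the NES > 3 filter can be sketched as below. iRegulon itself reports enriched motifs and ChIP-seq tracks, each with an NES and a set of predicted targets; the flat (TF, target, NES) rows used here are a hypothetical export format, the function name is hypothetical, and the TF and gene names in the example are placeholders rather than results from this study.

```python
from collections import defaultdict


def rank_mtrs(links, nes_cutoff=3.0):
    """links: iterable of (tf, target_gene, nes) rows.
    Keeps links with NES above the cutoff and ranks TFs by their number
    of distinct target genes (ties broken alphabetically)."""
    targets = defaultdict(set)
    for tf, gene, nes in links:
        if nes > nes_cutoff:
            targets[tf].add(gene)
    return sorted(((tf, len(genes)) for tf, genes in targets.items()),
                  key=lambda item: (-item[1], item[0]))
```

Counting distinct targets (a set per TF) avoids double-counting a gene supported by several motifs or tracks for the same TF.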

Discussion
Cancer is prevalent in many different human organs, such as the lung, colon, stomach, esophagus, liver, pancreas, brain, head and neck, breast, and kidney. However, some human organs, such as the small intestine, spleen, and heart, have an extremely low cancer risk. Although a recent study proposed that the total number of stem cell divisions largely explains the cancer risk in different tissues [37], this cannot explain why some tissues with a large number of stem cell divisions, such as the small intestine, have a low cancer risk. Thus, it is necessary to investigate other molecular cues that could explain the different cancer risks among tissues. We identified 17 TSGs that showed significantly higher expression levels in the small intestine than in other GI tissues, including the esophagus, stomach, and colon. Moreover, these genes were expressed at lower levels in GI cancers than in GI normal tissues and were also downregulated in many other cancer types relative to their normal controls, suggesting a tumor suppressor role. The tumor suppressive function of these TSGs was further confirmed by the finding that elevated expression of many TSGs was associated with better survival prognosis in various cancers. Furthermore, we revealed that the downregulation of many TSGs was associated with their promoter hypermethylation in cancer, highlighting the important relationship between DNA methylation and cancer [38]. Pathway analysis showed that these TSGs were mainly involved in metabolism, immune, and cell growth signaling-associated pathways. Interestingly, the expression of many TSGs correlated inversely with tumor purity and positively with antitumor immune cell infiltration levels in a wide variety of cancers, suggesting that these TSGs may exert their tumor suppressive function by promoting antitumor immunity. Moreover, higher expression levels of certain TSGs, including GSTA2, CCL21, and MADCAM1, were associated with a significantly higher ITR in cancer.
This could be attributed to the higher levels of TILs in the tumors highly expressing these TSGs. In addition, we identified 34 MTRs of TSGs, many of which have been recognized as tumor suppressors.
Our results showed that elevated expression of TSGs was likely associated with better survival prognosis in cancer. A possible explanation is that TSG expression is positively associated with TIL infiltration, which may improve the immune response against cancer cells. Our results also showed that elevated expression of TSGs was likely associated with lower tumor purity, whose association with prognosis is controversial. To address this, we analyzed the correlation between tumor purity and survival prognosis in 33 TCGA cancer types. Tumor purity had a significant positive correlation with OS in COAD, KIRP, and PRAD and with DFS in kidney renal clear cell carcinoma (KIRC), UCEC, and uveal melanoma (UVM) (log-rank test, P < 0.05), but a significant negative correlation with DFS in HNSC (P = 0.035) (Supplementary Figure S10). These results indicate that the correlation between tumor purity and survival prognosis was not significant in most cancer types, although it was positive or negative in a few cancer types.
Based on another gene expression profiling dataset from GEO [3], we found that three TSGs (G6PC, XPNPEP2, and TREH) had significantly higher expression levels in small intestine tissue than in colon and stomach tissues, confirming the results obtained from the GTEx analysis. Using COMPARTMENTS [39], we found that G6PC is localized to the endoplasmic reticulum, whereas XPNPEP2 and TREH are localized to the plasma membrane.
On the basis of the connections between pathways and the proteins' subcellular locations, we found that both G6PC and TREH are associated with glucose metabolism and contribute to elevated glucose levels. In addition, XPNPEP2 acts as a proline-specific aminopeptidase that regulates proline concentration, which can affect glucose concentration [40]. Thus, all three proteins may increase glucose concentration both inside and outside cells, thereby inhibiting glucose metabolism in cells (Figure 7). Again, these results suggest that metabolic regulation may play a crucial role in controlling tumorigenesis and tumor development.
A literature review showed that many of the 17 TSGs have been associated with tumor suppression. For example, BMP5 (bone morphogenetic protein 5) was downregulated in breast tumors relative to normal tissues, and its downregulation was associated with cancer recurrence [41]. This gene contributes to tumor suppression by repressing TGF-β1-induced epithelial-to-mesenchymal transition [41]. CCL21 (C-C motif chemokine ligand 21) is a cytokine gene involved in immunoregulation and inflammation; it can exert antitumor immunity by activating both innate and adaptive immune responses [42,43]. Other TSGs, such as SEPP1 [44], TMEM25 [45], XPNPEP2 [46], and G6PC [47], have also been reported to play a role in tumor suppression. These prior studies support our results that these TSGs are likely to be tumor suppressor genes, although further experimental verification is needed.
In conclusion, this study provides new molecular cues associated with tumorigenesis and tumor development. The identified TSGs have potential clinical implications for cancer diagnosis, prognosis, and treatment.
Data Availability
The gene expression profiles of normal tissues (small intestine, colon, stomach, and esophagus) were downloaded from the GTEx (https://gtexportal.org/home/) and GEO (https://www.ncbi.nlm.nih.gov/gds) databases. The gene expression profiles of colon, stomach, and esophagus cancers and their normal tissues were downloaded from TCGA (https://portal.gdc.cancer.gov/). In addition, we obtained the gene expression profiles and clinical immunotherapy response data of two melanoma cohorts (Nathanson et al.'s cohort [4] and Roh et al.'s cohort [5]) from the associated publications.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
This project was supported by the Key Laboratory of Disease Proteomics of Zhejiang Province.

Supplementary Materials
Table S1. Sample size of cancer and normal tissues in the datasets used in this study.
Figure S1. CCL21 tends to be downregulated in cancer.
Figure S2. CBFA2T3 tends to be downregulated in cancer.
Figure S3. XPNPEP2 tends to be downregulated in cancer.
Figure S4. Downregulation of tumor suppressor genes (TSGs) is associated with a worse survival prognosis in various cancers.
Figure S5. The methylation levels of the RASGRP2 promoter are higher in various cancer types than in normal tissues.
Figure S6. The methylation levels of the RASGRP2 promoter are inversely associated with the expression levels of RASGRP2 in cancer.
Figure S7. The CCL21 promoter methylation levels are significantly upregulated in 16 TCGA cancer types compared to their normal tissues.
Figure S8. Correlations of the expression levels of CCL21 with tumor purity in cancers.
Figure S9. Correlations of the expression levels of CCL21 with immune cell infiltration levels in cancers.
Figure S10. Correlation of tumor purity with survival prognosis in cancer.