MicroRNA-Related Prognosis Biomarkers from High-Throughput Sequencing Data of Colorectal Cancer

Background Colorectal cancer (CRC) is the third most common cancer in the world, and most of them are adenocarcinomas. CRC could be classified as colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) according to the original tumorigenesis position. Increasing evidences indicated that microRNAs (miRNAs) play an important role in the occurrence of multiple tumors. Methods In this study, we firstly downloaded miRNA (COAD, 8 controls vs. 455 tumors; READ, 3 controls vs. 161 tumors) and mRNA (COAD, 41 controls vs. 478 tumors; READ, 10 controls vs. 166 tumors) data from The Cancer Genome Atlas (TCGA) database and then used DESeq2, RegParallel, miRDB, TargetScanHuman 7.2, DAVID 6.8, STRING, and Cytoscape software to identify the potential prognosis biomarkers. Results We identified 175 differential expression miRNAs (DEMs) and 3747 differential expression genes (DEGs) in COAD and 184 DEMs and 3928 DEGs in READ. And then, we obtained 21 (13 in COAD and 8 in READ) DEMs associated with the survival rates, which correlated with 440 (217 in COAD and 223 in READ) overlapping DEGs. Through survival analysis for those overlapping DEGs, we found 11 (8 in COAD and 3 in READ) overlapping DGEs associated with survival rates of patients, which were correlated with 9 (7 in COAD and 2 in READ) DEMs significantly. Conclusion In this study, we found several candidate prognostic biomarkers which have been identified in various cancers and also found several new prognosis biomarkers of COAD and READ. In conclusion, this analysis based on theoretical knowledge and clinical outcomes we have done needs further confirmation by more researches.


Introduction
Colorectal cancer (CRC) is the third most prevalent cancer, accounting for 10.2% of all cancers, and the second leading cause of cancer mortality, making up 9.2% of total cancer deaths [1]. In 2018, there are approximately 1.8 million new cases around the world with 881,000 deaths [1]. Depending on the original tumorigenesis position, CRC could be classified as colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ). Until now, the main efficiency therapy for CRC is limited to surgery. However, for those patients with CRC in an advanced phase, neoadjuvant therapy is indispensable to reduce the recurrence rate [2][3][4][5]. Despite the evolution of cancer pharmacological treatments providing advanced therapeutic strategies, new diagnostic and prognostic biomarkers are urgently needed [6][7][8].
With the development of The Cancer Genome Atlas (TCGA) database, which was based on the high-throughput sequencing, clinicians and researchers have increasingly understood the pathogenesis of various cancers. The data from TCGA is significantly important for guidance prevention, diagnosis, and treatment of various cancers. TCGA database characterized over 20,000 primary cancers and matched normal samples covering 33 cancer types, the information of which includes gene expression data, microRNA (miRNA) expression data, DNA methylation data, and standardized clinical data [9]. These are of great importance for clinicians and researchers to understand the mechanisms and find the potential prognostic biomarkers of related cancers. miRNAs were first identified in Caenorhabditis elegans by Lee et al. in the 1990s [10]. They are a key component of the noncoding RNA family with 19-24 nucleotides in length and regulate the expression of target genes by partial complement binding and degradation. Increasing evidences indicate that miRNAs are involved in multiple cancers by regulating target gene expression [11][12][13]. In different cancers, miRNA expression patterns are different. Therefore, establishment of correlation between mRNA and miRNA expression in cancer patients is important for understanding cancer pathogenesis and discovering potential prognostic biomarkers.

Data Downloading and Differential Expression Screening.
All data used in the present study were downloaded from TCGA-COAD and TCGA-READ project (https://portal .gdc.cancer.gov/), including an open-access raw count table of miRNA-seq (isoform expression quantification) and RNA-seq (HTSeq-Counts) and the corresponding clinical information. The differential expression of mRNA and miRNA was analyzed under a standard workflow of the DESeq2 package installed in R (R version 3.6.2) [14]. Technical replication samples in COAD were collapsed by the collapseReplicates function of DESeq2. The significantly different expression of mRNAs and miRNAs was selected according to basemean ≥ 50, padj < 0:05, and |logFC | ≥1:0 for COAD and READ. All normalized expression data were output for further survival analysis and pathologic TNM correlation analysis.

Survival
Analysis for DEMs and DEGs. The miRNAs whose expression significantly changed in tumor samples were used for survival analysis by RegParallel and survival packages installed in R. Before analysis, all DEM data were transformed to a low-expression and a high-expression group according to the medium value of each normalized miRNA expression. Then, a Cox proportional hazards regression method was selected for the survival analysis combined with samples' overall survival data downloaded from TCGA. miRNAs which significantly correlated with the survival rate were selected according to p < 0:05 and used to predict downstream target genes. For those miRNApredicted target genes, survival analyses were also performed by RegParallel and survival packages in the same way.

Target Gene Prediction of miRNAs.
We utilized two different web-based tools miRDB and TargetScanHuman 7.2 for miRNA target gene prediction; the target genes only enriched in the two databases were selected as putative target genes [15,16]. The Venn diagram was applied to visualize the overlapping DEGs of DEM target genes and DEGs in COAD and READ. To further understand the biological functions of the overlapping DEGs, we performed GO and KEGG pathway enrichment by DAVID 6.8 through input targeted gene official symbols [17].

Protein Interaction Network Construction and Topology
Analysis. The official gene symbols were imported into the Search Tool for the Retrieval of Interacting Genes (STRING (version 11)) to assess the interactions [18]. A normal medium confidence interval of 0.4 was used as a threshold. Cytoscape software was utilized to identify and visualize central genes.
2.5. Pathologic TNM Correlation Analysis. Expression data of candidate DEGs were obtained from the normalized analysis of DESeq2 and normalized by the average of the control group. The pathologic TNM data were obtained from the clinical information. The classification of pathologic M is divided into two groups, M0 and M1, according to whether distant metastasis is transferred or not. The classification of pathologic N is divided into two groups, N0 and N1, according to whether lymph node metastasis is transferred or not. The classification of pathologic T is divided into two groups, T1+2 and T3+4, according to the tumor size and infiltrating range.
2.6. Statistical Analysis. IBM SPSS Statistics 22 was applied for the Kaplan-Meier analysis. A repeated measures ANOVA followed by unpaired two-tailed Student's t-test was used as indicated. All results are expressed as mean ± SEM. All figures were created using the Prism 6.01 and Cytoscape 3.7.2 software.

Prediction of DEM Target Genes and Functional Analysis.
Followed by DEM identification in COAD and READ, the target genes of those DEMs were predicted by using two different online tools, miRDB and TargetScanHuman 7.2. Combining those DEM-targeted genes with DEGs, we identified 217 overlapping DEGs, of which 91 overlapping DEGs were downregulated and 126 overlapping DEGs were upregulated in COAD (Figure 2(a)). And 223 overlapping DEGs were identified, of which 193 overlapping DEGs were downregulated and 30 overlapping DEGs were upregulated in READ (Figure 2(b)).

Discussions
CRC is the third most prevalent cancer and the second leading cause of cancer mortality worldwide. Up to now, surgery is still the main treatment method in clinical practice [1]. In addition, for those patients in advanced cancer stages, the implementation of neoadjuvant therapy to reduce the recurrence rate and sphincter malfunction rate of colorectal cancer is indispensable [3]. Therefore, an accurate preoperative assessment of CRC is of great importance. In the present study, we analyzed the COAD and READ sequencing data from TCGA in order to reveal the key genes correlated with miRNAs and to evaluate their potential for diagnosis and treatment as biomarkers.
Increasing studies have shown that miRNAs play an important role in the diagnosis and treatment of cancers [19][20][21][22]. Through data mining of sequencing results deposited in TCGA, we identified 21 DEMs (13 DEMs in COAD and 8 DEMs in READ) associated with the survival rates and simultaneously negatively correlated with 440 (217 DEGs in COAD and 223 DEGs in READ) overlapping DEGs. For those 21 DEMs in COAD and READ, previous studies demonstrated that several miRNAs are indeed associated with survival in cancer patients, such as has-miR-328 in pancreatic cancer, has-miR-486 in non-small-cell lung cancer, and has-miR-145 in breast cancer and laryngeal       [23][24][25][26]. For those overlapping DEGs, we then analyze their correlation with the survival status. Finally, we found 8 and 3 DEGs which were correlated with 7 and 2 DEMs significantly associated with patients' overall survival in COAD and READ, respectively.

BioMed Research International
For those DEMs used in the miRNA-mRNA network, miR-144-5p, miR-145-5p, and miR-486-5p have been reported as biomarkers in bladder cancer, gastric cancer, and lung cancer, respectively [27][28][29]. Our results indicated that the three miRNAs have potential to be prognostic biomarkers in COAD as well. It is reported that other four miRNAs, miR-29c-5p, miR-15b-3p, miR-130a-3p, and miR-500a-3p, were associated with various cancers but not recommended as prognostic markers [30][31][32][33]. Our results suggest that those DEMs (miR-144-5p, miR-145-5p, miR-486-5p, miR-29c-5p, miR-15b-3p, miR-130a-3p, and miR-500a-3p) may serve as potential biomarkers, but their feasibility remains to be investigated. For those DEMs in READ, miR-455-5p was associated with oral squamous cancer [34]. However, few evidences indicate the association between  BioMed Research International miR-194-3p and cancers. Our results suggested that those DEMs (miR-455-5p and miR-194-3p) may also serve as potential biomarkers, but their feasibility remains to be further validated. As for miRNA-targeted genes in COAD, TUBB6, FZD3, and SERPINE1 were reported to be prognostic biomarkers in colorectal cancer potentially [35][36][37]. According to the previous literature, one of the causes of sporadic colorectal cancers is RPS6KA5 frameshift mutation, which leads to the same consequence as being targeted by miRNAs [38]. At last, the four genes, DTNA, PRR11, TMEM178B, and CACNA1D, were mainly related to gastric cancer or prostate cancer [39][40][41][42]. In miRNA-targeted genes of READ, ZFPM2 was recom-mended as a diagnostic biomarker for malignant pleural mesothelioma and PLXNA1 and PTPRU were related to lung cancer or colon cancer [43][44][45][46]. Combining the expression levels of those target genes with TNM stage classification, SER-PINE1 is the most important gene associated with tumor size and metastasis status in COAD. And miR-145-5p, targeting SERPINE1, could be another strong prognosis biomarker for COAD. Further research is urgently needed to verify the detailed regulation pattern between miR-145-5p and SER-PINE1 in a COAD patient, and this might be a valuable target in COAD therapy since therapy using small RNA is safer than other methods in gene therapy [47]. For READ, no DEGs have been found through all TNM stages. It is most likely that the   BioMed Research International stringent criteria we set cannot screen out ideal DEMs and their target genes. There are many reports presenting potential prognostic miRNAs in colorectal cancer, but they just analyzed the microarray data without comparison with gene expression data or focused on hypoxia stress to cancer progression [48,49]. Another research analyzed the differentially expressed miRNA and its potential target in COAD without considering the survival effect on the screened genes [50], which cannot be sufficiently used in clinical practice as prognostic biomarkers.
In the present study, we downloaded TCGA-COAD (miRNA: 8 controls vs. 455 tumors; mRNA: 41 controls vs. 478 tumors) and TCGA-READ (miRNA: 3 controls vs. 161 tumors; mRNA: 10 controls vs. 166 tumors) project data from TCGA. Through a series of bioinformatics analyses, we found that several potential DEMs could be used as prognostic biomarkers in COAD and READ. However, due to the small number of samples in the control group, false-positive or falsenegative results might be inevitable in our results. In addition, by the own complexity of cancers, the combined analysis strategy may be more accurate than that using the individual miRNA or mRNA as the prognostic indicator [23]. Therefore, further analysis in READ and COAD is necessary to improve the accuracy of prediction results.

Conclusions
In conclusion, we identified that 7 DEMs and 8 DEGs were correlated with the survival of patients with COAD, which may be the potential biomarkers for the prognosis of patients with COAD, and 2 DEMs and 3 DEGs were correlated with the 232survival of patients with READ, which may be the potential biomarkers for the prognosis of patients with READ through bioinformatics analysis for COAD-and READ-related miR-NAs and mRNAs systematically. Our research not only deepened the understanding of the pathogenesis but also contributed to the diagnosis and prognosis of patients with COAD and READ. However, it is worth emphasizing that the regulatory network of miRNA-mRNA is particularly com-plex and the number of case and control data used in the study was not sufficient. Therefore, more scientific researches are needed to confirm our findings while we just provide an analysis direction depending on theoretical knowledge and clinical outcomes.

Data Availability
The data that support the findings of this study are openly available in TCGA at https://portal.gdc.cancer.gov/.

Conflicts of Interest
The authors declare no competing interests.