Construction of ceRNA co-expression network and screening of molecular targets in colorectal cancer

Objective to screen some RNAs that correlated with colorectal cancer (CRC).Methods Differentially expressed miRNAs, lncRNAs, and mRNAs between cancer tissues and normal tissues in CRC were identified using data from the Gene Expression Omnibus (GEO) database. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and protein-protein interactions (PPIs) were performed to do the functioal enrichment analysis. And a lncRNA-miRNA-mRNA network was constructed wich correlated with CRC. RNAs in this network were subjecte to analyze the relationship with the patient prognosis. Results A total of 688, 241, and 103 differentially expressed genes (diff-mRNA), diff-lncRNA, and diff-miRNA were obtained. between cancer tissues and normal tissues. A total of 315 edges were obtained in the ceRNA network. lncRNA RP11-108K3.2 and mRNA ONECUT2 correlated with prognosis. Conclusion The identified RNAs and constructed ceRNA network could provide great sources for the reasearches of therapy the CRC. And the lncRNA RP11-108K3.2 and mRNA ONECUT2 may serve novel prognostic predictor of CRC. several CRC-related diff-miRNAs, diff-lncRNAs, and diff-mRNAs A ceRNA also constructed to analyze the crosstalk among the identified diff-miRNAs, diff-lncRNAs, and diff-mRNA. The lncRNA RP11–108K3.2 and mRNA ONECUT2 in the network in CRC, their low expression levels being correlated with better prognosis. the mechanism of RP11–108K3.2 and ONECUT2 remain to be reveald, both of two may use as a novel clinical predictors of CRC. Further experimental studies are needed to reveal the mechanism of RP11–108K3.2 and ONECUT2 in CRC in


Introduction
Colorectal cancer (CRC) is a common malignant tumor in the gastrointestinal tract [1] The initial syndrome of CRC are not conspicuous, but with the development of the tumor, the patient will show changes in bowel habits, blood in the stool, diarrhea, alternating diarrhea and constipation, local abdominal pain and other symptoms. Patients with advanced symptoms often show syndromes such as anemia and weak [2] . CRC is the third most common malignant tumor around the world and its mortality rate is extremely high [3] . Considering the great threat to human health of CRC, a series of new diagnostic and therapeutic methods have emerged. CRC is a complex disease involving the expression and structure of genes [4] . More and more studies have shown that miRNA can play a crutial aspect in cancer progression by regulating its related targets, including mRNA, lncRNA, etc [5] .
LncRNA is a non-coding RNA with a length of more than 200 nucleotides, which acts a pivotal part in abounding actions such as dose compensation effect, epigenetic regulation, cell differentiation regulation and cell cycle regulation [6,7] . It has been reported that upregulation of lncRNA FOXD3-AS1 suggests a lower survival rate in CRC patients. Experimental results showed that FOXD3-AS1 was overexpressed in CRC tissues and cells. Down-regulation of FOXD3-AS1 expression in vitro can promote cell proliferation, invasion and migration, and promote apoptosis [8] . lncRNA-ATB and lncRNA-CCAT have strong accuracy in distinguishing CRC patients from healthy individuals [9] . miRNAs are endogenous non-coding RNAs with managerial actions, with a length of approximate 22 nucleotides that are involving post-transcriptional gene expression regulation in animals and plants [10] . miRNA-149 can be used as a target miRNA for identifying single bases in the serum of healthy and cancer patients. This method is direct and sensitive and can be used as an early diagnostic tool for colorectal cancer [11] . The expression of miR-139-5p was down-regulated in the CRC cell line compared to the ordinary human colonic mucosal epithelial cell line NCM460. The study subsequently also demonstrated that overexpression of miR-139-5p in colon tumor cell lines significantly inhibited proliferation of cells in vivo and in vitro. The final study found that miR-139-5p regulates chronic inflammation by inhibiting NF-kB activity, thereby inhibiting cell proliferation and invasion in CRC [12] .
A series of studies have shown that miRNAs can silence gene by binding to mRNA, and lncRNA can regulate gene expression level relying on competitively binding miRNAs [13,14] . At this study, a lncRNA-miRNA-mRNA co-expression network was constructed using two GEO datasets to screen RNAs that may associated with CRC, and supply a novel method for the diagnosis and therapy of CRC ( Figure 1).

Data collection
The LncRNA/mRNA profile data GSE126092 was downloaded from Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/gds/?term =). And the extracted data were produced by the platform of Agilent-074348 Human LncRNA v6 4 180K (GPL21047, Probe Name Version). This dataset contained the data of lncRNA/mRNA expression profiles in 10 colorectal cancer (CRC) tissues and their corresponding normal-appearing tissues (NATs). The miRNA profile data GSE126093 was also extracted from GEO database. The data were produced by the platform of Exiqon miRCURY LNA microRNA array, 7th generation (GPL18058, miRBase v18, condensed Probe_ID version). This dataset contained ten cases of CRC tissues and their corresponding NATs.
Data preprocessing and screenning of differentially expressed mRNA, lncRNA and miRNA (diff-mRNA, diff-lncRNA, diff-miRNA) Limma pakage was used to preprocess the downloaded raw data and screen the differentially expressed mRNA, lncRNA and miRNA. The preprocess process included background correction, normalization, and concentration prediction. The matrix data was combineded with the chip platform annotation file to map the probe to the symbol. For multiple probes corresponding to a equal symbol, the final expressin was decided by the mean of probes. Differential expression analysis of tumor vs.
control was performed on the samples using classical Bayesian test, and corrected with Benjamini/Hochberg. P-value 0.05 and log 2 FC 2 were taken as the cuts-off for the screenning of diff-mRNA, diff-lncRNA, diff-miRNA.

Functional enrichment analysis
The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for diff-mRNA, mRNAs in modules that were significantly clustered in PPIs, mRNAs in ceRNA network were carried out, via the R pakage clusterprofiler (Version 2.4.3, http://bioconductor.org/packages/3.2/bioc/html/clusterProfiler.html). In KEGG pathway enrichment analysis, p-value < 0.05, and gene count > 2 were chosen as the cuts-off criteria.
Co-expression analysis of dif-lncRNA and dif-mRNA The Pearson correlation coefficient (PCC) was calculated between lncRNA and mRNA. And the PPC 0.95 and p-value 0.05 were considered tobe meaningful correlation. Cytoscape was used to illustrate the co-expression network.

Survival analysis
The colon cancer tumor samples from which mRNA, lncRNA, miRNA and survival prognosis messages were extracted from the TCGA database, and the expression values and survival prognosis information of all node elements in the lncRNA-miRNA-mRNA network were extracted from the TCGA colon cancer samples. K-M survival curve was performed using Survival (Version: 2.42-6, https://cran.r-project.org/web/packages/survival/index.html). P-value 0.05 was considered as the significant threshold.  (Table 1).

PPIs network with regard to Differentially expressed mRNAs in CRC
PPIs network was constructed, which intergrates large amount of known and predicted interactions of proteins. As shown in Figure 3, 354 nodes and 1,500 edges were included in the network. Module analysis was performed using MCODE, and two modules were obtained (module-A, module-B).
Module-A (score = 32.778) was constructed with 37 nodes adn 590 edges, and the module-B (score = 15) was constructed with 15 nodes and 105 edges ( Figure 4A, 4B). KEGG analysis was also performed. lncRNA-miRNA-mRNA network was shown in Figure 5, and the nodes in the network was shown in Supplementary Table 4. The diff-mRNA in lncRNA-miRNA-mRNA network were analyzed using KEGG pathway enrichment analysis, of which 15 pathways (Table 3).
Differentially expressed RNAs with regard to the prognosis of CRC All the nodes in the Figure 6 were used to do the survival analysis, of which a lncRNA RP11-108K3.2 and one mRNA ONECUT2 correlated with prognosis.

Discussion
In this study, to screen for RNAs and pathways associated with CRC, a lncRNA-miRNA-mRNA expression network was constructed using two datasets downloaded from the GEO database. A total of 688 diff-mRNA, 241 diff-lncRNA, and 103 diff-miRNA were identified at the present study. A functional enrichment analysis of these 688 diff-mRNA revealed that they were mainly enriched in 21 pathways. A PPI network analysis was also performed on these diff-mRNA. Finally, a ceRNA network was constructed and a survival analysis was performed on the nodes to obtain two prognostic-related RNAs. The expression levels of RP11-108K3.2 and ONECUT2 were both up-regulated in CRC, and low levels were associated with better prognosis, suggesting that they may play a positive role in CRC.

LncRNA can participate in the initiation and development of many diseases by directly regulating
proteins or indirectly regulating the target genes of related miRNA. The result of ceRNA showed that [15] . In a study on lung adenocarcinoma, the researchers study 346 differentially expressed lncRNAs, of which RP11-108K3.2 is highly expressed in lung cancer tissues [16] . Furthermore, hsa-miR-224-5p is up-regulated both in adeoma and cancer tissues [17] . A study has been revealed that hsa-miR-224-5p could suppress the cell growth of two oral squamous cell carcinoma cell lines (SCC4, SCC15).

RP11-108K3.2 correlated with low overall survival of CRC, and this result was consisten with our study
However, when hsa-miR-224-5p and pcDNA3.1-CT-GFP-FTH1P3 are co-transfected, the growth inhibition of hsa-miR-224-5p on oral squamous cell carcinoma will be reversed [18] . In this study, the expression of RP11-108K3. 2  ONECUT2 is a member of the onecut2 transcription factor family that interacted with hsa-miR-139-5p, hsa-miR-188-5p, hsa-miR-19b-1-5p, hsa-miR-31-5p, and hsa-miR-497-5p in ceRNA network in this study [19] . It has been reported that ONECUT2 acts a critial position on CRC gene network and is significantly associated with the development of cance. Interfernce with endogenous ONECUT2 expression inhibited the CRC cell line SW620 cell migration [20] . A series of experiments have shown that hsa-miR-139-5p acts as a crutial part in colon tumor. Compared with normal tissues, hsa-miR-139-5p expression level is significantly declined in colon tumor, and its expression level is significantly correlated with tumor stage. Subsequently, experiments have shown that the direct object of hsa-miR-139-5p is BCL2, and the expression of BCL2 is negatively correlated with the expression of hsa-miR-139-5p. The tumor metastasis and drug sensitivity of CRC could be diminished by hsa-miR-139-5p targeting the BCL2 pathway [21] . In present study, hsa-miR-139-5p was interacted with ONECUT2, and its expression level was negatively connected with the expression level of the later one. These two RNAs may have a similar mechanism with hsa-miR-139-5p and BCL2. hsa-miR-31-5p as a infrequently expressed miRNA, was associated with stage when assessing CRC cases.
The up-regulated hsa-miR-31-5p makes a more advanced disease stage more likely than a lower disease stage [22] . At this research, the expression level of hsa-miR-31-5p was higher in CRC tissues than in control (logFC = 9.153176). In addition to being significantly up-regulated in CRC tissues, it has been reported to be significantly up-regulated in uterine cervical cancer tissues [23] .
In conclusion, at the present study, several CRC-related diff-miRNAs, diff-lncRNAs, and diff-mRNAs were obtained. A ceRNA network was also constructed to analyze the crosstalk among the identified diff-miRNAs, diff-lncRNAs, and diff-mRNA. The lncRNA RP11-108K3.2 and mRNA ONECUT2 in the network may play crutial role in CRC, with their low expression levels being correlated with better prognosis. Although the mechanism of RP11-108K3.2 and ONECUT2 remain to be reveald, but both of    Diagram of bioinformatics analysis  The protein-protein interaction network of differentially expressed mRNAs. The purple nodes represented the up-regulated mRNAs, and the green nodes represented the down-regulated mRNAs. The higher the degree value is, the larger the nodes.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.