Novel lncRNA Panel as for Prognosis in Esophageal Squamous Cell Carcinoma Based on ceRNA Network Mechanism

Background The competitive endogenous RNA (ceRNA) mechanism has been discovered recently and regulating cancer-related gene expressions. The ceRNA network participates in multiple processes, such as cell proliferation and metastasis, and potentially drives the progression of cancer. In this study, we focus on the ceRNA networks of esophageal squamous cell carcinoma and discovered a novel biomarker panel for cancer prognosis. Methods RNA expression data of esophageal carcinoma from the TCGA database were achieved and constructed ceRNA network in esophageal carcinoma using R packages. Results Four miRNAs were discovered as the core of the ceRNA model, including miR-93, miR-191, miR-99b, and miR-3615. Moreover, we constructed a ceRNA network in esophageal carcinoma, which included 4 miRNAs and 6 lncRNAs. After ceRNA network modeling, we investigated six lncRNAs which could be taken together as a panel for prognosis prediction of esophageal cancer, including LINC02575, LINC01087, LINC01816, AL136162.1, AC012073.1, and AC117402.1. Finally, we tested the predictive power of the panel in all TCGA samples. Conclusions Our study discovered a new biomarker panel which may have potential values in the prediction of prognosis of esophageal carcinoma.


Introduction
Esophageal squamous cell carcinoma is one of the most frequently diagnosed malignancies among women worldwide. Despite the substantial research findings in investigating the intracellular mechanisms of esophageal cancer, it remains to be a significant cause of cancer death to women. One reason for this urgent situation is that we have not found effective methods for the early diagnosis and prognosis of esophageal cancer. Previous work has proved that some noncoding RNAs (NcRNAs), including circular RNA (circRNA), long noncoding RNAs (lncRNAs), and micro RNAs (miRNAs) [1], could interact with each other [2].
Based on the functions of NcRNAs, most researches focus on the functional NcRNAs, which also play essential roles in the ceRNA network [3], in brief, miRNAs bind with mRNAs via MREs, and the functions of mRNAs are thus inhibited by miRNAs, at the same if functional lncRNAs bind with the corresponding miRNAs, the functions of miR-NAs are thus inhibited which indirectly release the mRNAs which were inhibited by miRNAs, and the interaction network are called ceRNETs [4]. Recently, more and more evidence have proved that the ceRNA network has key roles in the progression of cancer, and some ceRNAs could be further used to construct biomarkers for the prediction of patient prognosis [5,6].
Esophageal cancer is one of the most aggressive cancer types and sixth most common cause of cancer death worldwide [7]. Esophageal cancer has two major subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC). The incidence rate of ESCC is high in the Asian regions including Japan, China, and India [8,9]. ESCC is one of the most deadly forms of human malignancy characterized by late metastasis, stage diagnosis, therapy resistance, and frequent recurrence [10]. Earlier diagnosing ESCC could improve treatment outcomes of patients as well as prolong patients' OS. ESCC has a close relation with aberrant expressions of genes, which makes clinical diagnosis more complicated to patients [11]. The ceRNA network could improve our insight into the progression of esophageal carcinoma, and it may help us to build a better biomarker panel for the diagnosis and prognosis of patients [12].
Numerous experimental studies have demonstrated ceRNA network had a crucial role in modulating the pathogenesis of human diseases. For example, lncRNA PART1 suppresses ESCC cell proliferation by regulating the miR-18a-5p/SOX6 signaling axis [13]. Linc00941 regulates ESCC via functioning as a competing endogenous RNA for miR-877-3p to modulate PMEPA1 expression [14]. Moreover, analysis of the ceRNA network has been utilized in biomarker construction, drug target discovery, and even drug resistance studies [15]. Based on the ceRNA network theory, Cantile et al. discovered that a single lncRNA, HOTAIR, was highly expressed in ESCC, which was the first breast cancer hallmark and a biomarker to predict the patient's OS [16]. Since a single lncRNA was not enough considering poor sensitivity and specificity in a statistic perspective [17], taking multiple dysregulated RNAs into evaluation of the prognosis of patients were gradually accepted by researchers.
Taken together, it has been generally approved that lncRNAs have great predict powers in the prediction of cancer stages and patients' prognosis. In this study, we discovered a ceRNA network by analyzing RNA expressions in esophageal squamous cell carcinoma. Through bioinformatic procedures, we further explored novel ESCC progression-related lncRNAs, then construct a prediction model, and evaluated the predictive power in TCGA samples.

Material and Methods
2.1. RNA-Seq Data Procession. The study was done in R environment, and RTCGAToolBox was used to download mRNA and miRNA database. GDC Data Portal was used to download the lncRNA database. The following R packages were selected for data filtration and analysis in this paper, including Hmisc, plyr, hash, ggplot2, R.utils, ggbiplot, pROC, gplots, survival, ggbiplot, edgeR, survminer, RColor-Brewer, magrittr, GenomicDataCommons, rtracklayer, grid, corrplot, scales, and limma.

RNA-Seq Data
Procession. ESCC RNA-seq data containing 161 tumor samples and 11 normal samples from the TCGA database were retrieved with the software R. All databases were opensource and free to use.

RNA Expression Analysis.
All RNA expression profiles were divided into high and low expression groups with Edge R and limma as previously described by Anders and Huber [18]. GENCODE was used to analyze lncRNA expressions. In this study, log2 FC ≥ 1:5 and the FDR < 0:01 were set as threshold.
2.4. ceRNA Network Construction. MiRWalk (http://mirwalk .umm.uni-heidelberg.de/) was selected to screen miRNAs and corresponding target genes. The miRNA-lncRNA correlation > 0:45 and P < 0:05 was set to filter out powerful correlations [19]. Cytoscape 3.6.3 was used to construct the network model. Cluster Profiler was used to do GO and KEGG analysis followed by methods previously described [20].
A schematic view of this study is shown in Supplementary Figure 1 ( Figure S1). In brief, the key procedures of this study could be listed as follows: (1) Download mRNA, miRNA, and lncRNA data from the TCGA portal

Survival Analysis
We referred to a previously described method for the construction of CESC prognosis prediction model [21]. Survival package was selected to generate survival plots, which were then used to evaluate the prediction power of the biomarker panel. Cox regression statistic procedures were processed to test the predictive value of the panel. ROC and AUC curves were plotted with the Daim package to evaluate model specificity and sensitivity [22].

Database Download and Clinical
Features. All ESCC TCGA samples with level 2 RNA expression info were downloaded. The sample information was shown in Table S1. The transcriptome data of 161 ESCC samples and 11 normal samples were achieved. Log2 FC > 1:5 and P < 0:01 set as threshold. To help our understanding of the intracellular mechanisms which were involved in the development of tumor progression, KEGG analysis was also performed in the R-based ProfilerCluster package. The pathways in cancer (P < 0:01) were shown in Figure 1(a). Heatmap package was used to plot the heatmap with the top 1000 differentially expressed genes ( Figure 1(b)), followed by PCA plots in Figure 1(c). Expression data were gathered in Table S2.
4.2. lncRNA and miRNA Profiles. Log2 FC > 1:5 and P < 0:01 were set as threshold, as shown in Figure 2(a) (DEGs were shown in Table S3). Next, PCA plots were shown in Figure 2(b). Expression data were grouped into multiple categories based on clinical features to evaluate prognostic significance.
To generate ceRNA network, we further processed the miRNA expression data. Dysexpressed miRNAs  Table S4. Heatmap was shown in Figure 2(c). The PCA plots were shown in Figure 2(d).
4.3. miRNA Survival Analysis. As miRNAs form the core of ceRNA network, survival plots were generated to filter out the miRNAs with clinical significance. Significant correlation among four miRNAs, namely, miR-93, miR-191, miR-99b, and miR-3615 and the patients' OS were found (P < 0:05). To be more specific, all the four miRNAs were positively correlated with patients' OS (Figures 3(a)-3(d)).

Construction of ceRNA Network in Esophageal
Carcinoma. In the first step, we tried to find a hubLncRNA panel without training the lncRNA data. But the result showed that the smallest panel we could find contained as many as 50 hubLncRNAs, which were not appropriate for future application. The initial ceRNA network was shown in Figure 4(a). Then, we trained the lncRNAs data by unsupervised hierarchical clustering strategy. After unsupervised clustering, the number of lncRNAs was limited to a proper amount, and we constructed an initial ceRNA network that directly yielded four key miRNAs (Figure 4(b)). For instance, two lncRNAs, namely, LINC02575 and LINC01087 could competitively bind with miR-191. Two lncRNAs, namely, LINC01816 and AL136162.1, could competitively bind with miR-3615. One lncRNA, AC012073.1, could competitively bind with miR-93. One lncRNA, AC117402.1, could competitively bind with miR-99b. Next, DElncRNAs and DERNAs were used to build a linear regression model. Results proved a strong correlation between miRNAs, lncRNAs, and mRNAs (Figure 4(c)).  Table S5.
To clearly investigate the predictive power of the six hub genes, by unsupervised clustering strategy, all 161 patients were grouped into two clusters ( Figure 5(e)). Survival analysis showed statistical significance (median OS, 494 vs. 247 days) (log-rank test P = 0:012; Figure 5(f)).
The survival data of the six HublncRNAs were taken to computing the patients' risk score which comes as below:  When combined all together, the six lncRNAs constructed a powerful biomarker panel. For example, in male patients, the OS curves showed statistical significance (log-rank test P < 0:05, Figure 6(e)). AUC curve was shown in Figure 6(d). Furthermore, similar results could be found in stages I-II/III-IV as well (log-rank test P < 0:05, Figures 6(g)-6(h)).

Discussion
The ceRNA model is found on the basis of miRNA competitive binding. More and more evidence shows that these miRNAs and their competitive endogenous targets can form complex ceRNA networks [23].
Thanks to the free access to the comprehensive TCGA database, this study managed to develop an ESCC prognostic prediction biomarker panel. Our ceRNA network-based prediction model provided a deeper perspective into the intracellular mechanisms of ESCC.
As discussed in many studies, lncRNAs have great predict power in prognosis and diagnosis, which are sometimes even stronger than miRNAs and mRNAs [6,17,24,25], some lncRNAs such as Fender have already been applied in practice. In esophageal cancer, several lncRNAs had been revealed to be related to the cancer prognosis and progression [26]. For example, overexpression of LEF1-AS1 predicts a poor prognosis in ESCC [26]. Li et al. reported an eight-lncRNA signature could predict the overall survival of ESCC [27]. Moreover, upregulation of HOTAIR was related to the poor prognosis of ESCC [28]. However, few studies on ceRNA have focused on predicting ESCC prognosis. Moreover, there are rare lncRNAs, mRNAs, or miRNAs related to ESCC could be treated as reliable biomarkers to detect ESCC and stratify the ESCC risk. Under this background and ceRNA network hypothesis, our study identified 95 dysexpressed lncRNAs, such as DIAPH3-AS1, NCBP2-AS1, ALG9-IT1, PRR7-AS1, SLC25A5-AS1, and TBL1XR1-AS1, which were strongly correlated with the progression of esophageal carcinoma. One lncRNA worth noting is LINC01087, it is the only lncRNA that has been reported to promote the cell division and metastasis of cancer [29]. As far as we know, the rest 5 lncRNAs are not reported to have prognostic value or molecular insights. One thing worth noting, at the beginning of this study, first, we tried to find a hub lncRNA panel without training the lncRNA data. But the result showed that the smallest panel we could find contained as many as 50 hub lncRNAs, which were not appropriate for future application. So we trained the lncRNA data by unsupervised hierarchical clustering strategy. Unsupervised clustering strategy has been adopted in many researches, for example, a multi-institutional study of the International Mesothelioma Panel from MESOPATH Reference Center proved that unsupervised clustering strategy is a powerful tool in achieving the predict accuracy of biomarker panel [30].
Given that any transcripts harbouring MREs could theoretically function as ceRNAs, as previously reported, they may represent in a widespread multiple forms of regulation in both pathology and physiology [31]. In this study, as shown in Figure 4(b), we found two lncRNAs, namely, LINC02575 and LINC01087, could competitively bind with miR-191. Two lncRNAs, namely, LINC01816 and AL136162.1, could competitively bind with miR-3615. One lncRNA, AC012073.1, could competitively bind with miR-93. One lncRNA, AC117402.1, could competitively bind with miR-99b. Among the core miRNAs found in this study, miR-191 has been previously reported in esophageal carcinoma which predicted poor prognosis and promoted cell proliferation and invasion [32]. Furthermore, miR-191 has been proved to be emerged as key participants of p53 signaling pathways in breast cancer [33]. miR-93 has been proved to promote cervical cancer progression by targeting THBS2/MMPS signal pathway [34]. miR-99b was previously reported to be a tumor suppressor by targeting IGF-1R in gastric cancer [35], while in this study, miR-99b is highly expressed in cancer patients and is supposed to be a tumor promoter. Moreover, miR-99b and miR-375 as a combination are found to be potential predictive response biomarker panel for preoperative chemoradiotherapy in rectal cancer [36]. miR-3615 was previously reported to be a potential  9 Computational and Mathematical Methods in Medicine oncogene in human cancer. For example, miR-3615 was overexpressed in liver cancer and related to the high TNM stage [37]. Although the ceRNA network we found in this study identifies many ESCC-related miRNAs, lncRNAs, and mRNAs, the correlation and the extent of ceRNA effects remain to be investigated by in vivo experiments. Recent studies have proved that by constitutive posttranscriptional regulations, binding free energy and repression feedbacks are pivotal factors for crosstalk between ceRNAs [38,39], which means that there are many criteria for validating ceRNA networks, such as miRNAs and RBPs, kinetic parameters, the target ratio of miRNAs, the quantitative measurements of miRNAs, and the size and affinities of the competing targets [40,41]. Moreover, we reported six lncRNAs which could be taken together as a panel for prognosis prediction of esophageal cancer, including LINC02575, LINC01087, LINC01816, AL136162.1, AC012073.1, and AC117402.1. Of note, several previous reports had implied the functional importance of these lncRNAs in cancers. For instance, LINC02575 was reported to promote the proliferation of laryngeal squamous cell carcinoma cells [42]. LINC01087 is highly expressed in breast cancer [43]. LINC01816 was found to promote the migration of thyroid carcinoma cells [44]. Therefore, our findings still need to be verified through in the future in vivo and in vitro experiments as well as clinical practice.
This study showed that the six lncRNAs together can be taken as a biomarker panel for the prognosis of esophageal cancer.

Conclusion
This study provides a novel biomarker for the prediction of prognosis to patients diagnosed with esophageal squamous carcinoma.

Data Availability
The data used to support the findings of this study are included within the article. The data and materials in the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflict of interest.

Authors' Contributions
CL designed the project. JZ proposed the research concept and wrote the manuscript. FX, GQ, and ZZ carried out the data search and downloading and literature collection. QM and YH conducted the bioinformatic analysis. HX constructed the graphic images and data charts and performed the statistical processing.

Acknowledgments
This study is supported by the China-Japan Friendship Hospital under grant number 2016-2-QN-20.

Supplementary Materials
Supplementary