Construction and Analysis of Survival-Associated Competing Endogenous RNA Network in Lung Adenocarcinoma

Increasing evidence has shown that noncoding RNAs play significant roles in the initiation, progression, and metastasis of tumours via participating in competing endogenous RNA (ceRNA) networks. However, the survival-associated ceRNA in lung adenocarcinoma (LUAD) remains poorly understood. In this study, we aimed to investigate the regulatory mechanisms underlying ceRNA in LUAD to identify novel prognostic factors. mRNA, lncRNA, and miRNA sequencing data obtained from the GDC data portal were utilized to identify differentially expressed (DE) RNAs. Survival-related RNAs were recognized using univariate Kaplan-Meier survival analysis. We performed functional enrichment analysis of survival-related mRNAs using the clusterProfiler package of R and STRING. lncRNA-miRNA and miRNA-mRNA interactions were predicted based on miRcode, Starbase, and miRanda. Subsequently, the survival-associated ceRNA network was constructed for LUAD. Multivariate Cox regression analysis was used to identify prognostic factors. Finally, we acquired 15 DE miRNAs, 49 DE lncRNAs, and 843 DE mRNAs associated with significant overall survival. Functional enrichment analysis indicated that survival-related DE mRNAs were enriched in cell cycle. The survival-associated lncRNA-miRNA-mRNA ceRNA network was constructed using five miRNAs, 49 mRNAs, and 21 lncRNAs. Furthermore, seven hub RNAs (LINC01936, miR-20a-5p, miR-31-5p, TNS1, TGFBR2, SMAD7, and NEDD4L) were identified based on the ceRNA network. LINC01936 and miR-31-5p were found to be significant using the multifactorial Cox regression model. In conclusion, we successfully constructed a survival-related lncRNA-miRNA-mRNA ceRNA regulatory network in LUAD and identified seven hub RNAs, which provide novel insights into the regulatory molecular mechanisms associated with survival of LUAD, and identified two independent prognostic predictors for LUAD.


Introduction
Lung cancer is the most commonly diagnosed and lethal malignancy worldwide [1]. Lung adenocarcinoma (LUAD) is a common subtype of lung cancer [2]. Despite the recent advances in targeted therapeutic strategies, the outcomes of the available treatment strategies for LUAD remain unsatisfactory owing to the drug resistance and relapse, and the five-year overall survival is less than 20% [3]. Therefore, there is an urgent need to understand the molecular mechanisms underlying the pathogenesis of LUAD and identify novel potential prognostic biomarkers to improve prognosis of the disease.
Genetic mutations and dysregulation that can contribute to the pathogenesis of cancer are served as biomarkers. Mutations in epidermal growth factor receptor (EGFR) occur in approximately 20% cases of lung cancer [4], and epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs) are indispensable in the treatment of EGFR-mutant advanced LUAD. Next-generation sequencing technology has been used to study the role of various RNAs in greater depth. Long noncoding RNAs (lncRNAs) have been considered as potential biomarkers and therapeutic targets due to their unique expression in various cells [5]. Several studies have indicated that dysregulation of lncRNAs, such as MIR31HG [6] and LINC01512 [7], promotes the progression and proliferation of tumour cells in LUAD. microRNAs (miRNAs) play a crucial role in the regulation of protein expression and therefore are considered potential biomarkers in cancer diagnosis. Wang et al. [8] identified a four-miRNA signature comprising miR-142-5p, miR-409-3p, miR-223-3p, and miR-146a-5p, for an early detection of LUAD. Xu et al. [9] reported that miRNA-21, miRNA-125b, and miRNA-224 are associated with chemotherapy sensitivity in patients with LUAD. However, the RNA biomarkers require a critical review before their application for clinical decision-making.
The competing endogenous RNA (ceRNA) network hypothesis, which states that noncoding RNAs (ncRNAs), miRNAs, and mRNAs communicate with each other through microRNA response elements (MREs), has been implicated in posttranscriptional regulation [10,11]. miR-NAs repress the translation of target mRNAs by partial or complete complementary binding to MREs on their target RNA transcripts [12,13]. ncRNAs can act as endogenous miRNA sponges to competitively bind miRNAs through shared MREs to regulate the expression levels of mRNAs, thereby forming specific ceRNA regulatory network comprising ncRNA-miRNA-mRNA interactions [14]. In the past decade, the study of ceRNAs has gained increased attention and several studies have reported their involvement in tumorigenesis [15], migration [16], and prognosis [17]. For example, lncRNA MAFG-AS1 regulates the expression of MAFG to facilitate proliferation of LUAD cells via miR-744-5p [18]. The ceRNA hypothesis provides novel insights into tumorigenesis [19] and biomarker identification [11] at the system biology level.
Bioinformatics techniques are used to integrate and analyse large-scale genomic data, such as RNA-Seq and microarray, to discover potential molecular mechanisms and identify biomarkers, and to guide further experiments. Kumar et al. identified hub genes as potential biomarkers from a large number of differentially expressed (DE) genes by protein-protein interaction (PPI) network [20] and enrichment analysis [21,22], providing valuable ideas for further study. Wan et al. [23] identified a prognosisassociated ceRNA axes in prostate cancer based on RNA sequencing data using bioinformatics approaches, and validated their regulatory mechanisms by cell proliferation and dual luciferase reporter assay.
In this study, we aimed to construct a ceRNA network associated with survival in DE genes to reveal the molecular mechanisms underlying LUAD and initially identify prognostic factors, thereby to provide new ideas for further biological experiments. In addition to identifying the relationship among various RNAs based on the RNA interaction database, we also performed three statistical tests on lncRNA-mRNA pairs to screen for significant ceRNA interactions based on the ceRNA network hypothesis. The concise LUAD ceRNA network proposed by us would provide accurate and reliable results for subsequent studies.

Materials and Methods
2.1. Data Source and Preprocessing. The GDC data portal (https://portal.gdc.cancer.gov/) [24] is an accessible highquality cancer genome data-sharing platform that provides primary processed genomic data (level 3 data). We acquired level 3 RNA-Seq (including mRNA and lncRNA) and miRNA-Seq RNA expression data (HTSeq-counts), and clinical information of LUAD patients, who were part of TCGA project from GDC on October 15, 2019. After excluding duplicate samples and other tissue samples, mRNA and lncRNA dataset included 524 cancer samples and 59 adjacent nontumour tissue samples and isoform quantification data from miRNA-Seq included 516 cancer samples and 46 adjacent nontumour tissue samples.
The RNA-Seq and miRNA-Seq data were processed using the R package GDCRNATools [25]. The raw RNA counts were normalised using the trimmed mean of M value (TMM) method [26] and transformed via the voom method [27], wherein the RNAs with lower expression, where the log CPM was found to be lower than 1 in more than half of the samples, were filtered out. The procedure followed in this study is demonstrated in Figure 1.

Differential Expression
Analysis. The limma [28] method was used to identify DE RNAs. Fold change (FC) refers to the differences in RNA expression within samples, and the |FC | >2 as the threshold value was set based on previous studies on the ceRNA network [29,30]. |FC | >2 and false discovery rate ðFDRÞ < 0:01 were considered statistically significant. Compared to adjacent nontumour tissue samples, RNAs with a higher expression level (FC > 2) in tumour tissue samples were considered upregulated DE RNAs, whereas RNAs with lower expression level (FC < −2) were considered downregulated DE RNAs.

Survival Analysis of DE RNAs.
Univariate Kaplan-Meier survival analysis was performed to determine the correlation between the expression level of each DE RNA and the survival time of patients with LUAD. LUAD patients were categorised into high-and low-expression groups based on the median expression of certain DE RNAs. The hazard ratio (HR) of the two groups was evaluated using the Kaplan-Meier plot, and their difference was assessed by performing the log-rank test using the survival package of R [31]. Results with p < 0:05 were considered statistically significant.

Functional Enrichment
Analysis. The clusterProfiler package [32] of R software is a widely used method for functional enrichment analysis. This package performs overrepresentation and hypergeometric tests to identify DE mRNAs enriched in biological functions or processes. Several enrichment methods ignore the numerical information of DE mRNAs. However, the STRING database (https:// string-db.org/) [33] provides another platform to analyse the numerical data via the two-sided Kolmogorov-Smirnov test and aggregate fold change test that perform well in various settings [33,34]. We used aforementioned two tools to perform functional enrichment analysis on survival-related DE mRNAs and their FC value (for STRING). The cut-off value was set as p adjusted < 0.01 for clusterProfiler and FDR < 0:01 for STRING.

Construction of Survival-Associated ceRNA Networks and
Identification of Prognostic Predictors. We used miRcode (http://www.mircode.org/) [35] to predict the potential interactions between survival-related miRNAs and lncRNAs. Starbase (http://starbase.sysu.edu.cn/) [36] and miRanda (http://www.microrna.org/microrna/home.do) [37] were used to predict target genes of survival-related miRNAs. Starbase  3 BioMed Research International uses multiple algorithms and Ago-binding sites to predict miRNA target sites and their target genes. miRanda predicts the miRNA-mRNA interactions with optimal sequence complementarity using a weighted dynamic programming algorithm and thermodynamic analysis. Therefore, we chose these two databases to improve the reliability of the prediction outcomes.
The ceRNA hypothesis proposed that lncRNAs and their target mRNAs had a positive correlation and they shared miRNAs. Therefore, the competing endogenous interactions between lncRNA and mRNA were evaluated by performing three different statistical tests using GDCRNATools package to select ceRNA pairs matching the ceRNA hypothesis. First, a hypergeometric test was performed to test whether lncRNA and mRNA significantly share a number of miRNAs. Second, Pearson correlation analysis was performed to test the positive correlation between lncRNA and mRNA expressions. Third, regulation pattern analysis [38] was used to measure the regulatory role of miRNAs on lncRNAs and mRNAs. The test criteria were p < 0:05, and regulation similarity was not equal to 0.
Nuclear chromosome segregation

BioMed Research International
The lncRNA-miRNA-mRNA ceRNA regulatory network associated with the survival of LUAD was constructed using Cytoscape 3.7.1 [39]. Hub nodes of the ceRNA network were identified using Cytoscape plugin cytoHubba [40]. We imported the mRNAs from the ceRNA network into the STRING database [33] and selected "Homo sapiens" in organism and medium confidence in the minimum required interaction score to obtain a PPI network. The hub RNAs were subjected to multivariate Cox regression analysis to identify independent prognostic predictors.

Results and Discussion
Cell cycle process

Functional Enrichment
Analysis of Survival-Related mRNAs. The 843 mRNAs with significant overall survival were analysed using clusterProfiler and STRING for Gene Ontology (GO) enrichment analysis and identified the first 10 terms with p values among three different categories. GO included three different aspects: biological process (BP), cellular component (CC), and molecular function (MF). Figure 3 shows that 36 upregulated mRNAs with high log 2 FC values were enriched in six terms associated with BP and four cellular components as identified via clusterProfiler. The four terms associated with BP (sister chromatid segregation, nuclear chromosome segregation, mitotic nuclear division, and mitotic sister chromatid segregation) were involved in cell cycle. The four terms associated with CC were involved in chromosome. Figure 4 shows 60 mRNAs enriched in eight terms associated with BP and the two terms associated with CC as identified by STRING. Similar to the clusterProfiler results, three terms associated with BP involved in cell cycle (mitotic cell cycle, cell cycle process, and mitotic cell cycle process) and two terms associated with CC involved in chromosome. For the MF ontology, the catalytic activity acting on DNA was identified using two different methods.
The clusterProfiler Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway results ( Figure 5) indicated that mRNAs were mainly enriched in cell cycle, human T cell leukaemia virus 1 infection, and cell adhesion molecules (CAMs). STRING identified two different pathways, namely, cell cycle and oocyte meiosis.
The two methods revealed that DE mRNAs were enriched in the cell cycle process and downstream terms. Cell cycle involves progression of cell and nuclear replication and dysregulation of cell replication and division contributes to tumorigenesis. Several studies have reported that overexpression of cell division cycle-associated genes is associated with tumour cell proliferation indicating poor survival in lung cancer patients [41][42][43]. Rac3 induces apoptosis of LUAD cells via cell cycle pathway and is associated with longer survival [44]. Other KEGG pathways have rarely been mentioned in earlier studies on LUAD. A previous study indicated that human T cell leukaemia virus type I infection induces gene expression of CAMs in lung epithelial cells [45]. However, the association between these pathways and LUAD has not been reported previously.
3.3. Survival-Related lncRNA-miRNA-mRNA ceRNA Network in LUAD. To investigate the regulatory interaction in survival-related RNAs, we acquired information on lncRNA-miRNA interactions from miRcode and miRNA-mRNA pairs from Starbase and miRanda. Three statistical tests were performed on lncRNA-mRNA pairs to confirm significant ceRNA pairs. The aforementioned results intersected with survival-significant RNAs. Finally, we considered five miRNAs, 49 mRNAs, and 21 lncRNAs (Table 1) to construct a survival-related ceRNA network ( Figure 6) comprising 37 pairs of miRNA-lncRNA interaction and 61 pairs of miRNA-mRNA interaction. This network suggests a potential regulatory relationship between lncRNA-miRNA-mRNA in LUAD prognosis. Several RNAs in the  BioMed Research International  [46]. lncRNA MNX1-AS1 regulates the progression of oesophageal squamous cell carcinoma by targeting the miR-34a/SIRT1 axis [47]. However, most of these interactions have not been previously reported to be associated with LUAD.
The number of mRNAs in the ceRNA network was insignificant to perform functional enrichment analysis via clusterProfiler or STRING. Therefore, we used the PAN-THER classification system [48] available on the GO website (http://geneontology.org/). The cellular components and KEGG pathways showed no significant results. All 48 genes, except ZFPM2-AS1, were identified and enriched in 38 terms  (Table 2). Major genes were enriched in the regulation of cellular process (GO:0051244), response to stimulus (GO:0050896), signalling (GO:0023052), and their subclass. The regulation of cellular process involves the regulation of the rate, frequency, and extent of cellular processes. The signalling is a process that transmits information in biological systems. Moreover, the end of signal transduction (GO:0007165) regulates the initiation of transcription [49,50]. In general, genes in the ceRNA network regulate the activity of various enzymes, participate in signal transduction, and indirectly regulate the initiation of transcription.

Hub RNAs of ceRNA Network and Prognostic Predictor.
A subnetwork with 15 hub nodes (Figure 7, Table 3) was identified using maximal clique centrality (MCC) in cytoHubba plug-ins. There were a total of six ceRNA pairs in the subnetwork, among which LINC01936-TNS1 exhibited the highest correlation coefficient (Figure 8) indicating they might have the same expression patterns. LINC01936 and TNS1 were the highest scoring nodes in their respective categories. The ceRNA network suggested LINC01936 and TNS1 interacted with miR-20a-5p and miR-31-5p. Our study demonstrated that lower expression of LINC01936 was associated with longer overall survival (Figure 9(a)). However, the role of LINC01936 in LUAD remains unclear. miR-20a-5p exhibited highest topological parameters, indicating that it plays a crucial role in the ceRNA network. Overexpression of miR-20a-5p promotes the migration and invasion of tumour cells [51] and correlates with a shorter survival [52], which is consistent with the findings of our study (Figure 9(c)). The low expression group of miR-31-5p showed better survival (Figure 9(b)). Wei et al. reported that miR31-5p was upregulated in LUAD patients with lymph node metastasis, and low expression of miR-31 was associated with good prognosis in patients with T2N0 stage [53]. TNS1 participates in fibrillar adhesion formation and cell migration [54] and is involved in signal transduction [55]. However, the role of TNS1 in tumours remains controversial. TNS1 negatively regulates tumour migration and invasion, and its high expression is associated with longer metastasis-free survival in breast   [56]. Moreover, higher expression of TNS1 is associated with worse prognosis in colon adenocarcinoma [57].
In this study, we observed that TNS1 was downregulated in LUAD tissues and that higher expression was associated with better prognosis (Figure 9(d)). Based on our analysis, we predicted that LINC01936 regulates TNS1 via miR-20a-5p and miR31-5p. However, the role of TNS1 and its regulatory interaction in LUAD remains unclear and warrants further in vitro and in vivo studies. The scale of the ceRNA network constructed in this study was small. Some information may have been lost during the identification of hub genes using network topological parameters alone. Therefore, we included genes from the ceRNA network into the STRING to analyse protein interactions to   Figure 10) demonstrated the protein interactions of the ceRNA network. TGFBR2, SMAD7, and NEDD4L with the highest degree were identified as hub genes in the PPI network. TGFBR2 is the crucial receptor for transforming growth factor-β 1 (TGF-β1). The TGF-β1 ligand binding to TGFBR2 depends on the serine and threonine residues of the receptor, which in turn binds to the TGF-β receptor I to initiate downstream signalling such as Smad and non-Smad signalling pathways to regulate cell proliferation, migration, and apoptosis [58,59]. TGFBR2 is downregulated in various cancers [60]. Borczuk et al. reported that low expression of TGFBR2 associated with lymph node metastasis in patients with LUAD and increased risk of death [61]. Smad complexes translocate to the nucleus to initiate gene transcription. SMAD7 is an inhibitory Smad molecule that inhibits the formation of Smad complex [62]. Inhibition of miR-21 leads to SMAD7 upregulation, which inhibits cell invasion via TGF-β receptor signalling in non-small-cell lung cancer [63]. A previous study demonstrated that NEDD4L can limit TGF-β signalling by activating SMAD2/3 [64]. Downregulated NEDD4L enhances tumour metastasis and results in poor prognosis [65].
The three hub genes were downregulated in LUAD, and their high expression levels indicated a longer survival (Figures 9(e)-9(g)). Moreover, the ceRNA network showed that LINC01936 is a ceRNA of the three hub genes and mediated through its interaction with miR-20a-5p. In conclusion, several interactions regulate the three hub genes by competitive binding of LINC01936 to miR-20a-5p, which in turn regulate TGF-β signalling and downstream signalling pathways. This affects LUAD progression and patient prognosis. To the best of our knowledge, these interactions have not been reported earlier. Thus, our study outcomes lay a strong    Multifactorial Cox regression analysis was performed to identify independent prognostic factors from the abovementioned seven hub RNAs. Our results (Table 4), based on the multifactorial Cox regression model, indicate that LINC01936 and miR-31-5p are independent prognostic predictors of LUAD. LINC01936 was identified as a protective predictor for LUAD, while miR-31-5p was identified as a risk factor. The potential of miR-31-5p as a biomarker has been reported in oral carcinoma [66], colorectal cancer [67], and lung cancer [68]. This study is the first to demonstrate the prognostic potential of LINC01936 in LUAD.
Our study has several limitations. The three RNA expression data and clinical data for this study were based on TCGA database, and our findings lack biological validation. Computational prediction is only a preliminary step in ceRNA research. Therefore, these results need to be verified by studies involving large-scale clinical samples and laboratory methods such as qRT-PCR, luciferase reporter assay, and western blotting. The regulatory mechanism of the ceRNA network needs to be validated by further in vivo and in vitro research.

Conclusions
In summary, our study constructed a survival-associated lncRNA-miRNA-mRNA ceRNA network in LUAD using bioinformatics approaches and identified seven hub RNAs (LINC01936, miR-20a-5p, miR-31-5p, TNS1, TGFBR2, SMAD7, and NEDD4L). LINC01936 and miR-31-5p were identified as independent prognostic predictors of LUAD. The ceRNA network identified in this study provides novel insights into the molecular regulatory mechanisms associated with LUAD progression. Further studies are required to explore the biological mechanisms of ceRNAs in LUAD and validate the prognostic value of LINC01936 and miR-31-5p in other cohorts.

Data Availability
The RNA-Seq and miRNA-Seq data used to support the findings of this study have been deposited in the GDC data portal (https://portal.gdc.cancer.gov/).

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments
We would like to thank Editage (http://www.editage.com) for the English-language editing. This study is supported by the project sponsored by SRF for ROCS, SEM and supported by the Scientific Research Starting Foundation for Returned Overseas Chinese Scholars, Ministry of Education, China, and the Guangdong Province Innovative Strong School Project of Guangdong Pharmaceutical University, China (No. 2020KZDZX1126). The results published or shown here are in whole or part based upon data generated by TCGA Research Network: https://www.cancer.gov/tcga. Figure S1: heat map of differentially expressed mRNAs. Figure S2: heat map of differentially expressed lncRNAs. Figure S3: heat map of differentially expressed miRNAs.