Identification of EMT-Related lncRNAs as Potential Prognostic Biomarkers and Therapeutic Targets for Pancreatic Adenocarcinoma

Epithelial-mesenchymal transition (EMT) can promote carcinoma progression by multiple mechanisms; many studies demonstrated the invasiveness of pancreatic adenocarcinoma (PAAD) associated with the EMT, but how it acts through an lncRNA-dependent manner is unknown. Here, we investigated 146 samples from The Cancer Genome Atlas (TCGA) and 92 samples from the International Cancer Genome Consortium (ICGC). By gene set variation analysis (GSVA) and weighted correlation network analysis (WGCNA), we explored the EMT-related long noncoding RNAs (EMTlnc). Then, we performed univariate Cox regression analysis to screen their prognostic value for PAAD. The least absolute contraction and selection operator (LASSO) Cox regression was used to establish EMT-related lncRNA prognostic signal (EMT-LPS). In addition, we established a competitive endogenous ceRNA network. Then, we identified 33 prognostic EMTlnc as prognostic lncRNAs and established an EMT-LPS which showed strong prognostic ability in stratification analysis. By corresponding risk scores, patients were divided into low-risk and high-risk subgroups. Principal component analysis (PCA) showed that these subgroups had individual EMT status. Enrichment analysis showed that in the high-risk subgroup, biological processes, pathways, and hallmarks related to malignant tumors are more common. What is more, we constructed a nomogram that had powerful ability to predict the overall survival rate (OS) of PAAD patients in two datasets. So, EMT-LPS are a principal element in PAAD's carcinoma progression and may help us in choosing the way of prognosis assessment and provide some clues to design the new drugs for PAAD.


Introduction
Pancreatic adenocarcinoma (PAAD) is a malignant disease with poor prognosis, and the cure rate for this disease is only 9%, moving to be the third principal cause of cancer death; if untreated, the median survival was only 3 months of those patients with metastatic disease [1]. Thus, it is very urgent to search for more therapeutic targets for PAAD.
Long noncoding RNAs (lncRNAs) are a ribonucleic acid molecule with a length of more than 200 nucleotides, which does not have any protein coding function and participates in various cellular processes. An accumulating body of research indicates that the dysregulation of lncRNA expression is implicated in many kinds of cancer [2], including proliferation, apoptosis, migration, and invasion [3][4][5]. Moreover, lncRNAs can also act as diagnostic or prognostic markers in a range of cancer types, for an example, hepatocellular carcinoma and prostate cancer [6][7][8].
Epithelial-mesenchymal transition plays a prominent role in the formation of body plan and the differentiation of multiple tissues and organs. It is defined as the phenotypic transition from an epithelial state to a mesenchymal state in the morphological cell program. Study reveals that EMT can not only cause organ fibrosis but also promote carcinoma progression [9]. It involves a complex network including epigenetic modifications, transcriptional control, alternative splicing, protein stability, and subcellular localization [10][11][12]. Given that EMT are documented in many cancer cell models, the importance of EMT in cancer progression and the correlation in cancer tissues are still a matter of controversy. A multitude of researches have proven the invasiveness of PAAD associated with the EMT [13][14][15][16], while during the PAAD progression, how it works in an lncRNA-dependent way is unclear.
In our research, relying on The Cancer Genome Atlas (TCGA) dataset (n = 146) and the International Cancer Genome Consortium (ICGC) dataset (n = 92) and through the bioinformatics and statistical analysis of the PAAD patient data, the prognostic significance of the EMTlnc is determined. The results showed that 33 EMTlnc had prog-nostic value for PAAD patients in these two datasets. What is more, on the strength of the ability of 33 EMTlnc, we constructed an EMT-dependent predictive sign of lncRNA (EMT-LPS) that can predict the OS of PAAD patients. In the meantime, we found that in patients with PAAD, there exist different prognoses between low-risk and high-risk subgroups, and tumor features are more common in highrisk subgroups. Finally, we establish an accurate nomogram to predict the OS for PAAD patients, and a ceRNA network to search for target miRNA and mRNA was built.   Xena Database (http://xena.ucsc.edu/) and the International Cancer Genome Consortium Data Portal (https://dcc.icgc .org/). For the purpose of reducing statistical bias, we excluded the PAAD patients with missing OS values or OS < 30 days. Finally, 146 PAAD patients from TCGA were selected to construct the EMT-LPS, and 92 PAAD patients from ICGC were included to test the EMT-LPS. Then, the FPKM data underwent a log2 transformation. The gene annotation file "gencode.v22.annotation," which was downloaded from TCGA database, was utilized to transform ENSEMBL ID to GENE SYMBOL. On the basis of the GENCODE website (https://www.gencodegenes.org/human/), 9 types of transcripts (3prime_overlapping_ncRNA, antisense, bidirec-tional_promoter_lncRNA, lincRNA, macro_lncRNA, non_ coding, processed_transcript, sense_intronic, and sense_ overlapping) were defined as lncRNAs. The ENSEMBL ID with the max average expression level was used to represent the expression level of this gene when there were multiple ENSEMBL ID annotated to the same gene symbol. Ultimately, in TCGA cohort, there were 14805 lncRNAs and 19712 mRNAs, and in the ICGC cohort, the data were 1440 lncRNAs and 15146 mRNAs.     [17,18]. The EMT pathway values for each PAAD patient were calculated using gene set variation analysis (GSVA) in TCGA cohort. Then, weighted gene coexpression network analysis (WGCNA) was applied to find significantly relevant lncRNA binding with lncRNA modules and find the relationship between the modules. Here, to provide a scale-free network, 50 were set to minimum cut size, with cut height = 0:3 combined with similar height modules. With different lncRNA modules marked with different colors, the gray modules represent lncRNA that cannot be combined; then, we applied the Pearson correlation analysis to evaluate the correlation of lncRNA and EMT values in each module. Finally, the lncRNAs in the model with absðcorÞ > 0:5 and p value < 0.05 were defined as EMTlnc.

Establishment and Verification of EMT-LPS.
The univariate Cox was used to choose prognostic EMTlnc in both TCGA cohort and ICGC cohort based on EMTlnc. Through taking intersection, 33 lncRNAs with p < 0:05 were defined as shared prognostic EMTlnc. Then, we used the LASSO Cox regression to select the significant prognostic lncRNAs and construct EMT-LPS involved 11 EMTlnc by the "glmnet" package [19] in R. Here, to facilitate parameter selection, the"10-fold cross-validation" approach was applied. The risk score of TCGA cohort and ICGC cohort patients was calculated as the following formula: Coef ðiÞ is the estimated regression coefficient obtained from LASSO Cox regression analysis and χ i is the expression value of each selected EMTlnc. PAAD patients were classified into low-risk and high-risk groups based on the median risk score. The OS results were then compared using the log-rank test and Kaplan-Meier survival analysis.
To verify the accuracy of the identified EMT-LPS, the receiver operating characteristic (ROC) curve analysis in the package "Survival ROC" was applied [20].

Principal Component Analysis (PCA) and Nomogram
Construction. Based on the expression of 200 EMT-related genes in the HALLMARK_EPITHELIAL_MESENCHYMAL_ TRANSITION pathway, we used PCA to evaluate the differences between low-risk and high-risk subgroups and constructed a nomogram using the R package "RMS" [21] to evaluate the 1-, 2-, and 3-year survival possibility of patients in the cohorts. We then generated the calibration curve of the nomogram and evaluated the consistency between the predicted value and the actual observed value through the rms package.

Enrichment Analysis.
We inputted differentially expressed genes between low-risk and high-risk subgroups and targeted mRNAs differentially expressed in the ceRNA network into the "Metascape" website (http://metascape .org/) for function and pathway enrichment analysis, including typical pathway, Reactome gene set, Gene Otology (GO) biological process, and Kyoto Gene and Genome Encyclopedia pathway (KEGG pathway). In addition, in order to study which tumor features are more common in high-risk subgroups, GSEA software (http://software.broadinstitute.org/ gsea/index.jsp) was used.      Journal of Oncology analyses were used to evaluate the independent predictive value of EMT-LPS for OS prognosis.

Identification of EMTlnc in PAAD Patients.
First, 14805 lncRNAs from TCGA dataset and 1440 lncRNAs from the ICGC dataset were identified for the next analysis. Then, we analyzed the genes related to the EMT pathway in TCGA PAAD samples by GSVA and got the GSVA variation. After applying WGCNA on the database of PAAD patients in TCGA, the coexpression network by WGCNA analysis showed that the EMT-related lncRNA models in PAAD samples were grouped into 48 models, containing MElightcyan and MEturquoise models with absðcorÞ > 0:5 and p value < 0.05, which were both included into ulterior analysis in this research. The lncRNAs in those two models were defined as EMTlnc.
With the combination of the prognostic information, we implement univariate Cox regression (p < 0:05) in two datasets, screening out the prognostic reactions associated with EMT from the EMTlnc. Ultimately, in these two datasets, by cross-analysis, we found that EMTlnc was significantly correlated with the OS of PAAD patients. The workflow is shown in Figure 1(a), and WGCNA in the PAAD sample is shown in Figure 1(b). The Pearson correlation analysis showed the correlation between the membership in the light cyan module and membership of characteristic genes in the light cyan module in the EMT pathway and the Pearson correlation analysis showed the correlation between the membership in the turquoise module and characteristic genes in the turquoise module in the EMT pathway as shown in Figures 1(c) and 1(d), respectively. The univariate Cox analysis results of 33 EMTlnc are shown in Table 1.

Construction of the EMT-LPS in TCGA Dataset and
Validation of the EMT-LPS in the ICGC Dataset. In TCGA cohort, on the basis of 33 EMT-related prognostic lncRNAs, we performed a LASSO Cox analysis to build the EMT-LPS for the prediction of PAAD patients' OS, and coefficients containing 11 EMT-LPS were generated (Figures 2(a) and 2(b)). The EMT-LPS included 11 lncRNAs, and the risk assessment is calculated from the coefficient of each lncRNA (Figure 2(c)). Then, according to the median risk score, the patients were divided into low-risk and high-risk subgroups. The Kaplan-Meyer survival curve showed that PAAD patients with poor clinical outcomes usually have higher risk scores, which means lower OS incidence and shorter OS time (Figure 2(d)). Figure 2(f) plots the risk score and survival status. The ROC curve (Figure 2(h)) showed that EMT-LPS had a strong ability to predict OS in TCGA cohort (AUC = 0:86, 0.86, and 0.9 for 1, 2, and 3 years, respectively).
To verify the predictive ability of EMT-LPS, we calculated the same formula for these patients in the ICGC cohort. Based on the average risk score, we divided PAAD patients into lowrisk and high-risk groups. The results were consistent with TCGA dataset: in the ICGC dataset, patients with higher risk scores had lower OS rates and shorter OS time (Figure 2(e)). Figure 2(g) shows the assessment of risk and survival status distribution, which indicated that the total survival time and death status were shorter in patients with the higher risk score. ROC analysis also showed that EMT-LPS has a powerful predictive value for PAAD patients (AUC = 0:895, 0.85, and 0.89 for 1, 2, and 3 years, respectively; Figure 2(i)). These results comprehensively implied that the EMT-LPS had a strong and stable OS prediction ability.

Stratification Analysis of the EMT-LPS.
We tried to determine if the clinicopathological features were related to the risk score or not. In TCGA dataset, the results showed the PAAD patients whose WHO grade at III+IV were in a higher risk scores, but the risk score had no association with age, gender, N_stage, and T_stage (Figures 4(a)-4(e)). To have a better assessment on the prognostic ability of EMT-LPS, we conducted a stratification analysis to verify if it retains the ability to predict OS in different subgroups. On

11
Journal of Oncology the contrary to the PAAD patients with lower risk, patients with higher risk had worse OS when they were ≥60 years old (Figures 4(f) and 4(g)). In the same way, we proved that EMT-LPS can predict the OS of female or male PAAD patients (Figures 4(h) and 4(i)), patients with grade I+II or grade III+IV, and patients with TNM stage N0, N1, or T3+T4 (Figures 4(j)-4(n)). In the ICGC dataset, the results showed that PAAD patients with N1_stage, T3_stage, and T4_stage had higher risk scores (Supplementary Figure  S2D-G). These data testified that the EMT-LPS might be a potential predictor of PAAD patients.
3.5. Principal Component Analysis. We conducted principal component analysis (PCA) on the basis of the expression value of the 200 EMT-related genes to evaluate the differences between the low-risk and high-risk groups (Supplementary Figure S5A, B). The results showed that the low-risk and high-risk patients were distributed in distinct directions in both TCGA and ICGC datasets. These results may suggest that different risk subgroups have different EMT statuses.

Pathway and Process Enrichment Analysis and Gene Set
Enrichment Analysis (GSEA). We identified 710 differentially expressed genes (DEGs) (|log2 ðfold changeÞ | >1 and p < 0:05) between two subgroups in TCGA cohort to study the potential biological process and pathway concerning molecular heterogeneity between the low-risk and high-risk subgroups. These DEGs were mainly focused on the following aspects: chemical synaptic transmission, neuronal system, membrane potential regulation, behavior, projection morphogenesis of plasma membrane-binding cells, and GABAergic synapse (Supplementary Material S1A). GSEA showed that the interferon alpha response and the interferon gamma response (two tumor hallmarks) were enriched in the high-risk subgroup (Supplementary Material S1B, C). These results may be helpful to elucidate the cell biological effects of EMT-LPS.

Construction and Validation of the Nomogram Based on
EMT-LPS. By using risk status (based on EMT-LPS), gender, age, T_stage, N_stage, and WHO grade in TCGA dataset, we constructed a nomogram to create a clinically applicable quantitative tool to predict the OS of PAAD patients and tested in the ICGC dataset (Supplementary Material S2C). Calibration plots indicated that 1-, 2-, and 3-year OS ratios are completely consistent with the predicted ratios in these two cohorts (Supplementary Material S3A-C and Supplementary Material S4B-D, respectively). Then, we used the time-dependent ROC curves to evaluate the predictive ability of the nomogram and other predictors in TCGA (risk score, gender, age, T_stage, N_stage, and WHO grade, Supplementary Material S3D-F and Supplementary Figure S1E-G, respectively); the results showed that the nomogram had excellent accuracy in 1-, 2-, and 3-year OS (AUC of TCGA was 0.79, 0.83, and 0.86; AUC of ICGC was 0.8, 0.84, and 0.89). These data suggested that the nomogram has strong and stable capabilities to predict the OS for PAAD patients.
3.9. Construction of the ceRNA Network and Functional Enrichment Analysis. Based on the EMTlnc, we established a ceRNA network to have a deeper insight on how EMTlnc regulates mRNA expression by sponging miRNAs in PAAD. Two of 12 lncRNAs were extracted from the miRcode database, and 13 pairs of interactions between 2 lncRNAs and 34 miRNAs were identified. Then, we used three databases (miRTarBase, miRDB, and TargetScan) to search for target mRNAs on the basis of the 34 miRNAs, and 1539 mRNAs were identified in these three databases.
Furthermore, these target mRNAs were crossed with DEGs to obtain differentially expressed target mRNA. Finally, two lncRNAs, twelve miRNAs, and thirteen mRNAs were included in the ceRNA network ( Figure 5(a)). Additionally, functional enrichment in the network tool using 1539 mRNA targets revealed that these genes are enriched in vascular system development, pathway in cancer, regulation of cellular response to stress, Wnt signaling pathway, tissue morphogenesis, insulin signaling, regulation of cellular protein localization, response to growth factor, and negative regulation of cell differentiation (Figures 5(b)-5(d)). These data can provide some clues for us to discover the potential functionality of EMTlnc in PAADs.

Discussion
In this study, 268 PAAD patients in TCGA and ICGC datasets were included to explore the prognostic significance of EMTlnc. In these two datasets, 33 EMTlnc were proven to have predictive value and 11 EMTlnc among them were used to establish EMT-LPS to predict PAAD OS. On the basis of the median risk score, we divided PAAD patients into low-risk and high-risk subgroups and found that the latter group had worse clinical outcomes, richer tumor characteristics, and exact malignant-related pathways. Multivariate Cox regression analysis revealed the EMT-LPS was an independent risk factor for OS.
In addition, we set up a nomogram of EMT-LPS combined with gender, age, and World Health Organization grade, T_stage and N_stage, which has a strong predictive ability for OS of PAAD patients on TCGA and ICGC datasets. Finally, we established a ceRNA network that includes 12 Journal of Oncology two EMTlnc, twelve miRNAs, and thirteen mRNAs for observing the latent functions of these EMTlnc. More and more evidences show that lncRNAs can coordinate multiple cell processes by regulating EMT in different types of cells. MALAT1 and lnc-ATB can stimulate the EMT process by competitively binding miR-503 and miR-200c and then promote silica-induced pulmonary fibrosis [22,23]. In EMT of the breast, bladder, and nasopharynx, lncRNA ROR can regulate various signal pathways [24][25][26]. More importantly, by inducing EMT to accelerate the growth and progression of cancer, hypoxia can enhance the mutual movement of lncRNA UCA1 into the exonmediated bladder cancer cells [27].
Studies had revealed that EMT had impact on cancer invasion and progression, and lncRNAs may play the role of ceRNAs, targeting EMT regulators so as to influence the aggressive progression of the tumor. Liu et al. [28] found that TGFBI acts as the ceRNA of miR-21 and FN1 acts as miR-200c to regulate EMT. What is more, the richness of ceRNA can determine the reversibility of EMT. By the way, it was reported that the mutations in BRCA2 significantly increase the risk of pancreatic cancer [29]; whether it is associated with EMT is a question that needs to be further explored. miR-330 might be of potential therapeutic value for pancreatic cancer pain because it participated in the genesis of pancreatic carcinoma-induced pain hypersensitivity [30]; perhaps we can explore the mechanism of pancreatic cancer pain from the perspective of EMT. Here, we implemented a ceRNA-based functional enrichment analysis and found that genes were mainly enriched in vascular system development, pathway in cancer, cell stress response regulation, Wnt signaling pathway, and some other pathways. Taking all these evidences into consideration, we bet that EMT is targeted at lncRNAs, and to identify potential prognostic markers or therapeutic targets, we should better place more emphasis on the interactions and functions of lncRNAs and EMT.
From 268 PAAD patients, we identified 33 EMT-related prognostic lncRNAs, eleven of which were included in the EMT-LPS. Yang et al. uncovered that RP11-400N13.3 reacts as an oncogenic lncRNA in colorectal cancer, and by modulating the miR-4722-3p/P2RY8 axis, it can accelerate colorectal cancer progression [31]. LINC00152 was firstly found overexpressed in gastric cancer and acted as an oncogene in gliomas and liver, lung, and colorectal cancer [32][33][34]. It can promote cell proliferation and invasion capability of these cancer cells by targeting GFR, EZH2, miR-16, and miR-139-5p [35][36][37][38][39]. Wang et al. confirmed that the overexpression of LINC01116 was related to the clinicopathological characteristics and survival in glioma patients and can accelerate tumor proliferation and neutrophil recruitment by regulating IL-1β [40]. Li et al. found that LINC01128 can resist acute myeloid leukemia by regulating miR-4260/NR3C2 [41]. Tang et al. reported that miR-135a can downregulate DANCR by regulating the downstream of NLRP3 in pancreatic cancer [42], which is consistent with our results.
Some of the 11 lncRNAs have been reported to be related to cancer progression, but little is reported about PAAD, let alone how lncRNAs interact with EMT-related genes. Hence, we hope that our results will help determine the pre-dictive responses that EMT regulators may target, thus providing insight into its potential role in carcinogenesis and PAAD development.
In a word, this study included two PAAD datasets (TCGA and ICGC datasets) through which our results have been obtained and validated, but there also are some limitations. In the future, we should use more independent PAAD cohorts to verify the established prognostic EMTlnc, design new drugs or strategies to manage PAAD, and advocate more research to reveal the specific mechanism and genes of PAAD regulating EMT.

Conclusions
EMT-LPS played an important role in PAAD's carcinoma progression and may help us in choosing the way of prognosis assessment and provide some clues to design the new drugs for PAAD.

Data Availability
The datasets during the current study are available from the corresponding author upon reasonable request.