Identification of Three lncRNAs as Potential Predictive Biomarkers of Lung Adenocarcinoma

Background Lung cancer is the most common cancer and the most common cause of cancer-related death worldwide. However, the molecular mechanism of its development is unclear. It is imperative to identify more novel biomarkers. Methods Two datasets (GSE70880 and GSE113852) were downloaded from the Gene Expression Omnibus (GEO) database and used to identify the differentially expressed genes (DEGs) between lung cancer tissues and normal tissues. Then, we constructed a competing endogenous RNA (ceRNA) network and a protein-protein interaction (PPI) network and performed gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and survival analyses to identify potential biomarkers that are related to the diagnosis and prognosis of lung cancer. Results A total of 41 lncRNAs and 805 mRNAs were differentially expressed in lung cancer. The ceRNA network contained four lncRNAs (CLDN10-AS1, SFTA1P, SRGAP3-AS2, and ADAMTS9-AS2), 21 miRNAs, and 48 mRNAs. Functional analyses revealed that the genes in the ceRNA network were mainly enriched in cell migration, transmembrane receptor, and protein kinase activity. mRNAs DLGAP5, E2F7, MCM7, RACGAP1, and RRM2 had the highest connectivity in the PPI network. Immunohistochemistry (IHC) demonstrated that mRNAs DLGAP5, MCM7, RACGAP1, and RRM2 were upregulated in lung adenocarcinoma (LUAD). Survival analyses showed that lncRNAs CLDN10-AS1, SFTA1P, and ADAMTS9-AS2 were associated with the prognosis of LUAD. Conclusion lncRNAs CLDN10-AS1, SFTA1P, and ADAMTS9-AS2 might be the biomarkers of LUAD. For the first time, we confirmed the important role of lncRNA CLDN10-AS1 in LUAD.


Introduction
Lung cancer has the highest incidence of all cancers (11.6% of the total cases) and the highest death rate (18.4% of the total cancer deaths) [1]. It comprises two main types, small cell lung cancer (SCLC, 15% of cases) and non-small-cell lung cancer (NSCLC, 85% of cases). Lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the main histological subtypes of NSCLC [2]. The underlying biological and molecular mechanisms of lung cancer have gradually become better understood over the past decades [3]. Molecular biomarkers may improve early lung cancer detection [4]. Furthermore, traditional chemotherapy based on histopathology is being replaced by individualized precision treatment based on carcinogenic factors [5].
Protein-coding genes have been widely studied in lung cancer over the past decades, but they make up less than 2% of what the human genome transcribes, whereas 85% is noncoding RNA, including long noncoding RNAs (lncRNAs) [6]. lncRNAs account for a large part of the human genome [7], and they were once considered to be insignificant "noise" in the genome's repertoire of non-protein-coding transcripts. However, recent studies have revealed the roles of lncRNAs in many biological processes, including transcriptional regulation and cell differentiation [8,9]. It is well recognized that dysregulation of lncRNAs plays an important role in cancer, including lung cancer [10,11]. Nevertheless, a large number of lncRNAs are still unexplored [12]. Therefore, it is imperative to recognize more lncRNAs as biomarkers of lung cancer for better diagnosis, therapy, and prediction of the prognosis.
This study aimed to explore more biomarkers of lung cancer via integrated bioinformatics analysis. Gene Expression Omnibus (GEO) is NCBI's publicly available genomics database (https://www.ncbi.nlm.nih.gov/gds/), which provides a large amount of genomic data about lung cancer. Two datasets (GSE70880 and GSE113852) were downloaded from GEO and used to identify the differentially expressed genes (DEGs) between lung cancer tissues and normal tissues. Then, we constructed a competing endogenous RNA (ceRNA) network and a protein-protein interaction (PPI) network and performed gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and survival analyses to identify potential biomarkers that are related to the diagnosis and prognosis of lung cancer. We verified the expression differences and measured the diagnostic roles of lncRNAs through The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).

Gene Expression Microarray Datasets.
We downloaded gene expression microarray datasets GSE70880 and GSE113852 from the GEO database. The datasets met the following criteria: (1) the studies were about lung cancer, (2) tissue samples included the tumor and corresponding adjacent tissues or normal tissues, (3) the total number of samples was not less than 40, and (4) the datasets included both lncRNAs and mRNAs.

Integrated Analysis of Microarray Datasets.
We merged the two datasets both to increase the sample size and to facilitate subsequent analyses. Because the samples came from different platforms and were processed on different days, in different environments, and by different people, they were heterogeneous and subject to confounding variables, which might introduce bias. Therefore, we batch-normalized the merged dataset using the limma and sva packages in R (v3.6.1). Next, we performed differential expression analysis between tumor and normal tissues with the limma package. Genes with |logFC| > 1 and an adjusted P value < 0.05 (P values were corrected by the Benjamini-Hochberg method) were considered statistically significant DEGs.
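The selection rule above can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study: it only shows how the Benjamini-Hochberg adjustment and the |logFC| > 1, adjusted P < 0.05 thresholds combine. The function names and the toy input are hypothetical.

```python
from typing import List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    genes: List[str],
    log_fc: List[float],
    p_values: List[float],
    lfc_cut: float = 1.0,   # threshold on |logFC| stated in the text
    alpha: float = 0.05,    # threshold on the adjusted P value stated in the text
) -> List[Tuple[str, str]]:
    """Return (gene, direction) for genes passing both thresholds."""
    adj = benjamini_hochberg(p_values)
    return [
        (gene, "up" if lfc > 0 else "down")
        for gene, lfc, q in zip(genes, log_fc, adj)
        if abs(lfc) > lfc_cut and q < alpha
    ]


if __name__ == "__main__":
    # Toy values only; these are not data from GSE70880 or GSE113852.
    print(select_degs(["A", "B", "C", "D"],
                      [2.0, -1.5, 0.5, 3.0],
                      [0.01, 0.04, 0.03, 0.20]))
```

In this toy input, gene B has a raw P value below 0.05 but fails after adjustment, which is the situation the correction is meant to catch.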

Construction of ceRNA Network.
To identify the competing endogenous regulatory network mediated by lncRNAs and miRNAs, we constructed a ceRNA network. ceRNA networks link the functions of protein-coding mRNAs with those of noncoding RNAs such as microRNA, long noncoding RNA, pseudogenic RNA, and circular RNA [12]. The construction of the ceRNA network included two steps. (1) Interactions between differentially expressed lncRNAs (DElncRNAs) and miRNAs were predicted by the miRcode database (http://www.mircode.org/) [13]. lncRNAs could competitively bind to the shared miRNAs.
The interactions had to be present in all three databases. From the target mRNAs of the miRNAs, we selected those that were differentially expressed in our study for the construction of the ceRNA network. We used Cytoscape (v3.7.1) to visualize the network.
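The filtering logic described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: the three target databases are passed in as generic dictionaries because they are not named in this section, and the function name and toy identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_cerna_edges(
    lnc_mirna: Dict[str, Set[str]],          # lncRNA -> miRNAs predicted to bind it
    target_dbs: List[Dict[str, Set[str]]],   # one miRNA -> target mRNAs map per database
    de_mrnas: Set[str],                      # differentially expressed mRNAs
) -> Set[Tuple[str, str]]:
    """Return lncRNA-miRNA and miRNA-mRNA edges of a ceRNA network."""
    edges: Set[Tuple[str, str]] = set()
    if not target_dbs:
        return edges
    mirnas = set().union(*lnc_mirna.values())
    for mirna in mirnas:
        # Keep only targets reported by every database.
        shared = set.intersection(*(db.get(mirna, set()) for db in target_dbs))
        # Keep only targets that are also differentially expressed.
        targets = shared & de_mrnas
        if not targets:
            continue  # a miRNA without a retained target adds no edges
        for lnc, bound in lnc_mirna.items():
            if mirna in bound:
                edges.add((lnc, mirna))
        for mrna in targets:
            edges.add((mirna, mrna))
    return edges


if __name__ == "__main__":
    # Toy identifiers only; not interactions from this study.
    lnc_mirna = {"lncA": {"miR-1", "miR-2"}, "lncB": {"miR-2"}}
    dbs = [
        {"miR-1": {"G1", "G2"}, "miR-2": {"G3"}},
        {"miR-1": {"G1"}, "miR-2": {"G3", "G4"}},
        {"miR-1": {"G1", "G2"}, "miR-2": {"G4"}},
    ]
    print(sorted(build_cerna_edges(lnc_mirna, dbs, {"G1", "G3"})))
```

The resulting edge list could be exported as a two-column table for visualization in a network tool; dropping miRNAs that have no retained target is a design choice of this sketch.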

Functional Enrichment Analysis and PPI Network.
Functional enrichment analysis included GO and KEGG analyses. Genes in the ceRNA network were analyzed with the clusterProfiler package in R. In the GO analysis, functions were divided into biological processes (BP), molecular functions (MF), and cellular components (CC). The PPI network was constructed using STRING (http://string-db.org/cgi/input.pl).

Online Survival Analyses in LUAD and LUSC.
One aim of our study was to determine whether the lncRNAs were associated with survival in LUAD and LUSC. Because the datasets lacked clinical information, we performed Kaplan-Meier (KM) analyses of the lncRNAs in the ceRNA network and the genes in the PPI network through the Kaplan-Meier Plotter (http://kmplot.com/analysis/) [18]. We performed the survival analyses of these lncRNAs and mRNAs only in LUAD and LUSC. Log-rank P < 0.05 was considered statistically significant. Genes related to the prognosis of LUAD or LUSC were considered hub genes.
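For readers unfamiliar with the method, the Kaplan-Meier estimate underlying such survival curves can be sketched in a few lines of Python. This is a generic product-limit estimator, not the implementation of the Kaplan-Meier Plotter, and it omits the log-rank test; the function name and toy follow-up times are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i]
    and 0 if the observation was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy follow-up data only; not patient data from this study.
    print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Comparing the curves of a high-expression and a low-expression group with a log-rank test would give the P value reported by the online tool.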

The Differences of Expression Levels and Diagnostic Roles of lncRNAs Selected from ceRNA Network.
To confirm whether the expression levels of lncRNAs in the ceRNA network differed between normal and tumor tissues, we downloaded the gene expression profile from the TCGA database and performed differential expression analysis of the lncRNAs selected from the ceRNA network. P < 0.05 was considered to indicate a statistically significant difference. To measure the diagnostic values of these lncRNAs, we used the data from the TCGA database to plot receiver operating characteristic (ROC) curves in GraphPad Prism 7.0, and the values of cutoff, specificity, sensitivity, and area under the curve (AUC) were also calculated. The cutoff value is the best

Gene Expression Profile Data.
There were two datasets with a total of 47 lung cancer tissue samples and 47 normal tissue samples in this study (Table 1). We merged the two datasets into one dataset and then batch-normalized it to reduce variability. A total of 846 genes (321 upregulated and 525 downregulated) were confirmed as DEGs. We divided the DEGs into an mRNA group (805 mRNAs in total, 307 upregulated and 498 downregulated) and a lncRNA group (41 lncRNAs in total, 14 upregulated and 27 downregulated) (Table 2). The volcano plot shows the results of the differential expression analysis (Figure 1). The heatmaps show the gene expression changes (Figure 2).

GO and KEGG Pathway Analysis.
GO term and KEGG pathway analyses were performed with the clusterProfiler package in R. The results showed that biological processes were mainly associated with epithelial cell migration, regulation of epithelial cell migration, ameboidal-type cell migration, endothelial cell migration, and endothelium development, while molecular functions were associated with carbohydrate binding, transmembrane receptor protein serine/threonine kinase activity, transmembrane receptor protein tyrosine phosphatase activity, transmembrane receptor protein phosphatase activity, carboxylic acid binding, and organic acid binding. Cellular components were concentrated in the basal part of cell, MCM complex, pore complex, cell-cell adherens junction, and basal plasma membrane (Figure 4). KEGG pathway analysis revealed that the genes were enriched only in cell adhesion molecules (CAMs) and axon guidance (Figure 5(a)).

Expression Levels and Diagnostic Values of Four lncRNAs in LUAD Confirmed through TCGA.
A total of 551 LUAD samples (54 normal samples and 497 tumor samples) were downloaded from the TCGA database. After differential expression analysis, we observed that, in tumor samples, CLDN10-AS1 was upregulated, while SFTA1P, SRGAP3-AS2, and ADAMTS9-AS2 were downregulated (Figure 9). The results were consistent with our previous conclusions. Furthermore, the ROC curves showed that all four lncRNAs had diagnostic value in LUAD (Table 3, Figure 10).
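The quantities reported from a ROC analysis (AUC, cutoff, sensitivity, and specificity) can be illustrated with a short Python sketch. The analysis in this study was run in GraphPad Prism; the code below is only a generic illustration. It assumes that higher values indicate tumor (for a downregulated lncRNA the values would be negated first) and that the cutoff is chosen by maximizing Youden's index, which is an assumption because the selection rule is not stated here. Function names and toy values are hypothetical.

```python
from typing import List, Tuple


def roc_auc(tumor: List[float], normal: List[float]) -> float:
    """AUC as the probability that a tumor value exceeds a normal value.

    Ties count as half, which matches the area under the empirical ROC curve.
    """
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def best_cutoff(tumor: List[float], normal: List[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's index.

    A sample is called tumor when its value is >= cutoff.
    Youden's index is an assumed criterion, not one stated in the source.
    """
    best = (float("nan"), 0.0, 0.0)
    best_j = -1.0
    for c in sorted(set(tumor) | set(normal)):
        sens = sum(v >= c for v in tumor) / len(tumor)
        spec = sum(v < c for v in normal) / len(normal)
        j = sens + spec - 1
        if j > best_j:  # on ties, the lowest cutoff is kept
            best_j = j
            best = (c, sens, spec)
    return best


if __name__ == "__main__":
    # Toy expression values only; not TCGA data.
    tumor = [0.9, 0.8, 0.7, 0.6, 0.3]
    normal = [0.5, 0.4, 0.2, 0.1]
    print(roc_auc(tumor, normal))
    print(best_cutoff(tumor, normal))
```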

Discussion
In the present study, we investigated the gene expression patterns of lncRNAs and mRNAs in lung cancer tissues and corresponding adjacent tissues or normal tissues. We found that 525 genes were downregulated and 321 genes were upregulated in lung cancer tissues. The ceRNA network revealed the correlation among lncRNAs, miRNAs, and mRNAs. Four lncRNAs (CLDN10-AS1, SFTA1P, SRGAP3-AS2, and ADAMTS9-AS2) were involved in the ceRNA network, and survival analyses showed that CLDN10-AS1, SFTA1P, and ADAMTS9-AS2 were associated with the prognosis of LUAD.
The PPI network showed the interactions among the mRNAs in the ceRNA network, and five mRNAs (DLGAP5, E2F7, MCM7, RACGAP1, and RRM2) had the highest connectivity. Survival analyses also revealed the relationship between the five mRNAs and the prognosis of LUAD, and they were finally confirmed as hub genes of LUAD. However, survival analyses indicated that these lncRNAs and mRNAs were not related to the prognosis of LUSC. To increase the credibility of our results, we verified the expression levels and measured the diagnostic values of the four lncRNAs in LUAD through the TCGA database. The results confirmed our conclusions and revealed that all four lncRNAs were associated with the diagnosis of LUAD.
Previous studies have explored the functions of lncRNAs as diagnostic and prognostic biomarkers [6]. HOTAIR is a well-known cancer-related lncRNA, and it is highly expressed in NSCLC and SCLC [10]. MALAT1 is another important lncRNA, and in patients with non-small-cell lung cancer, it is significantly related to metastatic potential and poor prognosis [19,20]. Schmidt et al. indicated that MALAT1 could be considered an independent prognostic parameter for both LUAD and LUSC [19,20,22]. Because of its high expression in LUAD, CCAT2 is a potential diagnostic biomarker for LUAD [22]. lncRNAs have also been studied as potential drug targets [6]. HOTAIR might be a therapeutic target in NSCLC because of its role in chemoresistance to cisplatin [6]. Liu et al. indicated that MEG3 might be a potential therapeutic target in lung cancer, because tumor cells became sensitive to cisplatin when MEG3 was overexpressed in A549 cells [23].
lncRNA SFTA1P has been reported previously. Using integrated bioinformatics analysis, Huang et al. demonstrated that SFTA1P and CASC2 were associated with the regulation and development of LUSC and could be used as prognostic and predictive indicators of LUSC [24]. Zhang et al. reported that the downregulation of SFTA1P affected LUAD patients' survival time but had no influence on LUSC patients. SFTA1P exerted tumor inhibition in LUAD [25]. So far, there have been no experiments on the mechanism of SFTA1P in lung cancer, which leaves many avenues for further study. The number of miRNAs connected with lncRNA ADAMTS9-AS2 was the largest among the four lncRNAs, suggesting that it was a critical lncRNA. Studies of lncRNA ADAMTS9-AS2 in lung cancer are rare. Liu et al. demonstrated by qRT-PCR that lncRNA ADAMTS9-AS2 was lowly expressed in lung cancer tissues. In their study, high expression of lncRNA ADAMTS9-AS2 reduced the proliferation ability of the cells, inhibited their migration, and elevated their apoptosis rate. They also verified the relationship among lncRNA ADAMTS9-AS2, mRNA TGFBR3, and miRNA miR-223-3p. ADAMTS9-AS2 increased TGFBR3 expression, but miR-223-3p decreased both of them. miR-223-3p targeted TGFBR3 to enhance the proliferation, migration, and invasion of lung cancer. They concluded that ADAMTS9-AS2, TGFBR3, and miR-223-3p might provide potential therapeutic targets in lung cancer [26].
miRNAs miR-125a-5p and miR-125b-5p connected only with lncRNA CLDN10-AS1, indicating that CLDN10-AS1 had a binding pattern different from those of the other three lncRNAs. CLDN10-AS1 might play a role in the development and progression of lung cancer by regulating the G1/S transition of the mitotic cell cycle through miR-125b-5p/PPAT and by regulating endothelium development, angiogenesis, and cell-cell adherens junction through miR-125b-5p/CDH5. Furthermore, CLDN10-AS1 was associated with the prognosis of LUAD (log-rank P = 0.026), indicating a potential function in the prognosis of LUAD patients. It is noteworthy that CLDN10-AS1 has not been reported in lung cancer. For the first time, we confirmed the crucial role of lncRNA CLDN10-AS1 in LUAD. It might be related to the diagnosis, ability of migration, and prognosis of LUAD.
Our study had limitations. (1) We only included two datasets. Although there were some other datasets, their sample sizes were too small. To increase the quality of our study, we excluded datasets with a sample size of less than 40. (2) The two datasets are mRNA and lncRNA microarrays, so we used the probe reannotation method to annotate gene symbols, which might have dropped some genes whose probes failed to match. (3) Neither dataset contained clinical information. We could only perform survival analyses through the Kaplan-Meier Plotter (http://kmplot.com/analysis/). (4) We did not validate our results experimentally, which gives us a direction for future research.

Conclusion
In our study, we identified the differentially expressed lncRNAs CLDN10-AS1, SFTA1P, SRGAP3-AS2, and ADAMTS9-AS2 by analyzing gene expression profiles from GEO. Among them, CLDN10-AS1, SFTA1P, and ADAMTS9-AS2 and their related mRNAs DLGAP5, E2F7, MCM7, RACGAP1, and RRM2 were associated with the prognosis of LUAD, suggesting that they were more critical in LUAD. Moreover, the four lncRNAs had diagnostic value in LUAD. lncRNAs CLDN10-AS1, SFTA1P, and ADAMTS9-AS2 and their related mRNAs DLGAP5, E2F7, MCM7, RACGAP1, and RRM2 might be biomarkers of LUAD. For the first time, we confirmed the important role of lncRNA CLDN10-AS1 in LUAD. It might be related to the diagnosis, ability of migration, and prognosis of LUAD.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Donghui Jin designed the study, analyzed the data, and wrote the manuscript. Yuxuan Song designed the study and analyzed the data. Yuan Chen revised the paper critically for important intellectual content. Peng Zhang designed the study and revised the paper critically for important intellectual content.