Identification of Key Biomarkers and Candidate Molecules in Non-Small-Cell Lung Cancer by Integrated Bioinformatics Analysis

Background. Non-small cell lung cancer (NSCLC) is the most prevalent type of lung cancer, and its molecular mechanisms remain unclear. In this study, we used bioinformatics analysis to identify novel biomarkers associated with the pathogenesis of NSCLC, aiming to provide new diagnostic and therapeutic approaches.
Methods. The GSE118370 and GSE10072 microarray datasets were obtained from the Gene Expression Omnibus database, and the differentially expressed genes (DEGs) between lung adenocarcinoma and normal samples were identified. Using bioinformatics tools, a protein-protein interaction (PPI) network was constructed, modules were analyzed, and enrichment analyses were performed. The expression and prognostic values of 14 hub genes were validated with the GEPIA database, and the correlation between hub genes and survival in lung adenocarcinoma was assessed with the UALCAN, cBioPortal, STRING, Cytoscape, and TIMER tools.
Results. We found three genes (PIK3R1, SPP1, and PECAM1) that have a clear correlation with overall survival (OS) in lung adenocarcinoma patients. SPP1 is highly expressed in lung adenocarcinoma, and its high expression is associated with poor prognosis, while low expression of PECAM1 and PIK3R1 is associated with poor prognosis (P < 0.05). We also found that the expression of SPP1 was associated with miR-146a-5p, and high expression of miR-146a-5p was related to good prognosis (P < 0.05). In contrast, lower expression of miR-21-5p, which lies upstream of PIK3R1, is associated with a higher survival rate in cancer patients (P < 0.05). Finally, we found that the immune checkpoint genes CD274 (PD-L1) and PDCD1LG2 (PD-L2) were also related to SPP1 in lung adenocarcinoma.
Conclusions. The results indicated that SPP1 is a cancer promoter (oncogene), while PECAM1 and PIK3R1 are cancer suppressor genes. These genes take part in the regulation of biological activities in lung adenocarcinoma, which provides a basis for improving detection and identifying immunotherapeutic targets for lung adenocarcinoma.


Introduction
There were 234,030 newly diagnosed lung cancer patients in 2018. Lung cancer is the leading cause of morbidity and mortality among malignant tumors, and its incidence and mortality rates are increasing year by year [1]. Approximately 85% of lung cancer cases are non-small cell lung cancer (NSCLC) [2]. Despite the progress made in targeted drugs and immunotherapy, it is still difficult to predict the prognosis of lung cancer. The reported 5-year survival rate of NSCLC patients is less than 20%. The main treatments for NSCLC are surgery, chemotherapy, radiotherapy, targeted therapy, and immunotherapy. Chemotherapy and targeted drug therapy are the most common and effective treatments for NSCLC, especially for patients with recurrence and metastasis [3][4][5]. However, chemical antitumor drugs have severe side effects and are poorly tolerated by patients. Targeted drugs combined with chemotherapy can reduce the side effects of chemotherapy, but drug resistance still develops after a period of use [6][7][8]. Molecular diagnosis and treatment have become increasingly important in the management of non-small cell lung cancer in recent years [9,10]. As a result, it is imperative to pinpoint the precise molecular mechanisms of the occurrence, development, invasion, and metastasis of NSCLC. In view of this, the development of new molecular biomarkers is vital for early diagnosis, prevention, and precision treatment.
It is increasingly common to use bioinformatics to identify biomarkers associated with specific diseases. Bioinformatics provides a basis for further research on disease pathways and cellular activity networks by exploring the underlying genetic and molecular mechanisms of disease. In this study, the microarray datasets GSE118370 and GSE10072 were obtained from the Gene Expression Omnibus (GEO) database to identify differentially expressed genes (DEGs) between lung adenocarcinoma and adjacent normal tissues. Together, the datasets comprise gene expression data for 64 lung adenocarcinoma samples and 55 normal lung tissue samples. To gain a deeper understanding of how these DEGs function biologically, enrichment analyses using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were conducted. In addition, we created a protein-protein interaction (PPI) network of the DEGs.
Using the STRING online tool, the protein interaction network of the DEGs was constructed and then visualized with Cytoscape. Then, we searched for hub genes using the CytoHubba plugin in Cytoscape. For this study, we used four different ranking methods to identify the most significant hub genes. In the last step, we used tools such as UALCAN, cBioPortal, STRING, Cytoscape, and TIMER to characterize the target genes and their roles in the biological system. We demonstrated that SPP1 and PIK3R1 were possible biomarkers of lung adenocarcinoma. Further investigation revealed that both SPP1 and PIK3R1 were associated with lung adenocarcinoma prognosis and immunotherapy. In conclusion, this study aims to develop promising new biomarkers for the diagnosis, prognosis, and molecular targeted therapy or immunotherapy of NSCLC.

Data Resources.
The NCBI-GEO database (https://www.ncbi.nlm.nih.gov/geo/) [11] is an open high-throughput functional genomics database that contains microarray, chip, and gene expression data. The microarray data of GSE10072 (58 lung adenocarcinoma samples and 49 normal lung tissue samples) and GSE118370 (6 lung adenocarcinoma samples and 6 normal lung tissue samples), based on the GPL96-57554 and GPL570-55999 platforms, were obtained from the NCBI-GEO database (Figure 1).
The limma package in R was used to identify genes that differ in expression between lung cancer and normal lung tissues [12]. The raw microarray data of GSE10072 and GSE118370 were processed in RStudio with the affy package and normalized by the RMA method, and the DEGs were then screened with the limma package, which performs log2 transformation and normalization of the matrix data. The cutoff criteria were adjusted P < 0.05 and |logFC| > 1 (that is, logFC > 1 or < −1). DEGs were visualized using a hierarchical clustering heat map and a volcano plot.
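The screening rule described above can be illustrated with a minimal Python sketch. It is not the limma workflow itself: it assumes that per-gene logFC and adjusted P values have already been computed, and the `GeneStat` record, the function name, and the example values are hypothetical.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Per-gene statistics as they might be exported from a limma-style table."""
    gene: str
    log_fc: float   # log2 fold change, tumor vs. normal
    adj_p: float    # multiple-testing adjusted P value


def screen_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated DEGs.

    A gene passes when adj_p < p_cutoff and |log_fc| > lfc_cutoff.
    """
    up = [s.gene for s in stats if s.adj_p < p_cutoff and s.log_fc > lfc_cutoff]
    down = [s.gene for s in stats if s.adj_p < p_cutoff and s.log_fc < -lfc_cutoff]
    return up, down


# Hypothetical values, for illustration only.
example = [
    GeneStat("GENE_A", 2.3, 0.001),    # passes: up-regulated
    GeneStat("GENE_B", -1.8, 0.010),   # passes: down-regulated
    GeneStat("GENE_C", 0.4, 0.0005),   # fails the fold-change cutoff
    GeneStat("GENE_D", 1.5, 0.20),     # fails the P value cutoff
]
up_genes, down_genes = screen_degs(example)
print(up_genes, down_genes)
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the DEG list is to the chosen cutoffs.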
We used the GEPIA website (https://gepia.cancer-pku.cn) to probe the DEGs of lung adenocarcinoma in the TCGA database [13]. In this research, DEGs were defined by a P value <0.05 and |logFC| > 1 (logFC > 1 or < −1). Then, the FunRich Venn tool was used to integrate the two data sets and obtain the common up-regulated and down-regulated DEGs.
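The Venn-style integration of two DEG lists amounts to a set intersection carried out separately for each direction of regulation. The following is a minimal sketch under that assumption; the function name and gene symbols are placeholders, and it does not reproduce FunRich itself.

```python
def common_degs(up_a, down_a, up_b, down_b):
    """Return DEGs shared by two datasets, keeping direction consistent.

    A gene counts as common only if it is up-regulated in both datasets
    or down-regulated in both datasets.
    """
    common_up = sorted(set(up_a) & set(up_b))
    common_down = sorted(set(down_a) & set(down_b))
    return common_up, common_down


# Placeholder gene symbols, for illustration only.
up_1, down_1 = ["G1", "G2", "G3"], ["G7", "G8"]
up_2, down_2 = ["G2", "G3", "G4"], ["G8", "G1"]
shared_up, shared_down = common_degs(up_1, down_1, up_2, down_2)
print(shared_up, shared_down)
```

Note that "G1" is excluded in this example because it is up-regulated in one dataset and down-regulated in the other.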

GO and Pathway Enrichment Analysis.
Pathway enrichment analysis was performed with FunRich software to determine the biological functions of the overlapping DEGs. We used the online WebGestalt tool (https://www.webgestalt.org/) for Gene Ontology (GO) enrichment, an extensively used method for investigating the molecular function (MF), cellular component (CC), biological process (BP), and site of expression of genes or gene products [14]. The KEGG pathways were analyzed with the ClueGO [15] plugin in Cytoscape software (https://www.cytoscape.org/) [16]. PANTHER and REACTOME [17,18], widely used databases for the systematic analysis of high-level gene functions, were also analyzed with KOBAS and the ClueGO plugin in Cytoscape. Significant pathways with a P value <0.05 were selected, and the top 20 were plotted in R.
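Over-representation analysis of this kind is commonly based on a one-sided hypergeometric test. The sketch below shows that calculation in plain Python. It is an assumption that this matches the statistic used by the tools named above, which may apply different tests or multiple-testing corrections; the function name and the numbers are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(background, in_pathway, n_degs, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    background: number of genes in the background set
    in_pathway: number of background genes annotated to the pathway
    n_degs:     number of DEGs tested
    overlap:    number of DEGs annotated to the pathway
    """
    upper = min(in_pathway, n_degs)
    total = comb(background, n_degs)
    tail = sum(
        comb(in_pathway, i) * comb(background - in_pathway, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers: 10 background genes, 4 in the pathway,
# 3 DEGs of which 2 fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_value, 4))
```

In practice the P values for all tested pathways would then be adjusted for multiple testing before applying the 0.05 threshold.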

PPI Network Establishment and Module Analysis.
To better illustrate the potential interactions between the overlapping DEGs, we used the online database search tool STRING (https://string-db.org/) to retrieve interacting genes [19]. First, the PPI network of the DEGs was drawn using the STRING website. Then, the CytoHubba [20] plugin in Cytoscape was applied to identify the top 30 genes according to each of the Degree, MCC, DMNC, and MNC methods. The genes common to the top lists of all four methods were identified with FunRich software and selected as hub genes.
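The consensus step, ranking genes by several centrality scores and keeping those in every top-N list, can be sketched as follows. Only the degree score is computed here; the second score table stands in for any other CytoHubba method and is hypothetical, as are all names and values in the example.

```python
from collections import Counter


def degree_scores(edges):
    """Count the number of interaction partners of each node in an edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def top_n(scores, n):
    """Return the n highest-scoring genes (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:n]}


def consensus_hubs(score_tables, n):
    """Genes that appear in the top-n list of every scoring method."""
    tops = [top_n(scores, n) for scores in score_tables]
    return sorted(set.intersection(*tops))


# Toy network and a hypothetical second score table.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
other_method = {"A": 5, "B": 1, "C": 4, "D": 3, "E": 2}
hubs = consensus_hubs([degree_scores(edges), other_method], n=3)
print(hubs)
```

Requiring agreement across methods trades sensitivity for robustness: a gene ranked highly by only one centrality measure is dropped.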

Expression and Survival Analysis of Hub Genes.
To confirm the differential expression of hub genes between lung adenocarcinoma and normal lung tissues, we validated each hub gene in the TCGA database using the GEPIA online tool (https://gepia.cancer-pku.cn). The correlation between DEG expression and overall survival (OS) in lung adenocarcinoma was also analyzed with GEPIA. A P value <0.05 was considered statistically significant.
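Survival comparisons of this kind typically split patients into high- and low-expression groups and estimate a Kaplan-Meier curve for each. The sketch below assumes a median split and implements only the Kaplan-Meier estimator; it does not include the log-rank test, and the function names and data are hypothetical rather than taken from GEPIA.

```python
from statistics import median


def split_by_median(expression):
    """Return indices of samples above the median (high) and at or below it (low)."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Running `kaplan_meier` separately on the index sets returned by `split_by_median` gives the two curves that a log-rank test would then compare.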

Upstream miRNA Prediction and Survival Analysis.
Upstream miRNAs of the three genes (SPP1, PIK3R1, and PECAM1) were predicted with miRTarBase, and the miRNA survival analysis was performed on the OncoLnc website (https://www.oncolnc.org/) [21]. A P value <0.05 was considered statistically significant.

Analysis of the Target Gene.
Based on the survival analysis, we analyzed the target genes identified from the hub genes using the UALCAN (https://ualcan.path.uab.edu/index.html) [22], GEPIA, STRING, and DAVID [23] online websites. To obtain the distribution and relationships of the proteins, we used the Human Protein Atlas (https://www.proteinatlas.org/) [24], in which the protein expression levels of the target genes were also verified. A P value <0.05 was considered statistically significant.

mRNA Expression Correlates with Immune Cell Infiltration and Immune Checkpoint Activation.
The TIMER website (https://cistrome.shinyapps.io/timer/) [25] focuses on analyzing tumor-immune relationships. TIMER was used to analyze the mRNA expression data of SPP1 in the TCGA database and its correlation with tumor immune infiltration and immune checkpoints. A P value <0.05 and |logFC| > 1 were considered statistically significant.
PANTHER analysis shows that the DEGs are mostly enriched in signaling pathways related to integrin signaling, inflammation mediated by chemokines, chemokine signaling, and the plasminogen activating cascade, among others (Figure 3(g)). Finally, REACTOME pathway analysis shows that the DEGs are mostly enriched in extracellular matrix organization, signal transduction, and metabolism, among others (Figure 3(h)).

PPI Construction and Screening for Hub Genes.
Using the STRING online tool and Cytoscape software, we constructed a PPI network of the DEGs with 94 nodes and 224 edges, as shown in Figure 4. Further analysis of the genes and pathways in the PPI network showed that CPB2, SERPING1, CFD, A2M, PROS1, C7, C5AR1, CLU, and THBD are in the complement and coagulation cascades pathway. Genes such as HMMR, VWF, SPP1, CD36, LAMC3, THBS2, and SDC1 are in the ECM-receptor interaction pathway, while some genes hold a core position across many pathways. For example, PIK3R1 participates in 12 pathways, including insulin resistance, the relaxin signaling pathway, signaling pathways regulating pluripotency of stem cells, fluid shear stress and atherosclerosis, the AGE-RAGE signaling pathway in diabetic complications, and focal adhesion. To study the protein networks associated with these genes, we created a DEG PPI network based on the STRING online database and Cytoscape (Figure 5(a)). The top 20 genes were identified as the most promising hub genes using the CytoHubba plugin in Cytoscape according to the degree, MNC, closeness, and betweenness methods, respectively (Figures 5(b)-5(e)). The genes shared among these top-ranked lists were selected by Venn analysis, yielding 14 hub genes: CDH1, PECAM1, VWF, SPP1, CDH5, TIMP1, ACE, CAV1, CTGF, A2M, CLU, PIK3R1, EPCAM, and ANGPT1 (Figure 5(f)).

Expression of Hub Genes and Survival Analysis.
To validate the differential expression of the 14 hub genes obtained from the previous analysis between lung adenocarcinoma (LUAD) and normal tissues, we used the GEPIA website to analyze the expression level of each hub gene.
We then carried out a survival analysis of the hub genes on the GEPIA website to investigate their relevance to survival in lung adenocarcinoma patients. Three genes (PIK3R1, SPP1, and PECAM1) have a clear correlation with OS in lung adenocarcinoma patients (Figure 7(a)). Among the three genes, SPP1 is highly expressed in lung adenocarcinoma compared with normal lung tissue, whereas PECAM1 and PIK3R1 have low expression in lung adenocarcinoma compared with normal lung tissue. High expression of SPP1 is associated with poor prognosis (Figure 7(d)), while high expression of PECAM1 and PIK3R1 is associated with a favorable prognosis (Figures 7(b) and 7(c)).

Upstream microRNA Prediction and Survival Significance Analysis.
The miRTarBase prediction tool was used to predict upstream miRNAs of the three genes (SPP1, PIK3R1, and PECAM1), and two miRNAs related to two of the hub genes were found: miR-146a-5p is associated with SPP1, while miR-21-5p is associated with PIK3R1. However, we did not find any miRNAs significantly related to PECAM1 (Figures 8(a)-8(c)). Then, we analyzed the survival curves of the miRNAs on the OncoLnc website. We found that higher expression of miR-146a-5p, upstream of SPP1, is associated with a higher survival rate in cancer patients (Figure 8(d)). In contrast, lower expression of miR-21-5p, upstream of PIK3R1, is associated with a higher survival rate in cancer patients (Figure 8(e)), P < 0.05.

The Biological Role of PIK3R1 in Tumors.
PIK3R1 was identified as one of the target genes by the preceding survival analysis. Here, we analyzed PIK3R1 through the UALCAN website and found that PIK3R1 was highly expressed in lung adenocarcinoma compared with normal lung tissue (Figure 9(a)). Pan-cancer analysis of PIK3R1 expression in different tumors and normal tissues on the TIMER website revealed that PIK3R1 was highly expressed in almost all the tumors (Figure 9(b)). Protein interaction analysis through the STRING website found 10 genes closely related to PIK3R1: AKT1, PTEN, ERBB2, PIK3CD, PIK3CA, EGFR, CBL, PIK3CB, IRS1, and SHC1 (Figure 9(c)). KEGG analysis of PIK3R1 showed enrichment mainly in the EGF-receptor signaling pathway, the insulin/IGF pathway-protein kinase B signaling cascade, the p53 pathway, and p53 pathway feedback loops 2 (Figure 9(d)). The mutations and mutation rates of the 11 genes, including PIK3R1, were obtained with the cBioPortal TCGA online analysis tool, and each gene was found to be mutated (Figure 9(e)). A pathway network analysis of the 10 related genes was performed, with white representing tumor-targeted drugs and yellow representing oncology drugs approved by the FDA (Figure 9(f)). We next analyzed the correlation between PIK3R1 and seven genes associated with targeted drugs. We found that PIK3R1 has a strong coexpression relationship with PIK3CB, CBL, EGFR, PIK3CA, PTEN, and PIK3CD, while it has a negative relationship with SHC1 (Figure 9(g)).

The Biological Role of SPP1 in Tumors.
The differential expression of SPP1 between lung adenocarcinoma and normal tissues was analyzed with the HPA online tool. SPP1 was highly expressed in lung adenocarcinoma compared with normal lung tissue (Figure 10(a)). Pan-cancer analysis of SPP1 expression in different tumors and normal tissues on the TIMER website revealed that SPP1 is highly expressed in almost all tumors (Figure 10(b)). The methylation level of SPP1, determined with the UALCAN online tool, is significantly decreased in lung adenocarcinoma (Figure 10(c)). SPP1 is also associated with clinical lymph node metastasis: its expression is highest in N3 lymph node metastasis compared with N0, N1, and N2 (Figure 10(d)). Protein interaction analysis through the STRING website found 10 genes closely related to SPP1 (Figure 10(e)). KEGG analysis revealed that these genes were mainly enriched in the ECM-receptor interaction pathway (Figure 10(f)). The mutations and mutation rates of the 11 genes, including SPP1, were obtained with the cBioPortal TCGA online analysis tool, and each gene was found to be mutated (Figure 10(g)). A pathway network analysis of the 11 genes was performed, with white representing tumor-targeted drugs and yellow representing oncology drugs approved by the FDA (Figure 10(h)). The relationship between these genes and SPP1 was also analyzed. We found that SPP1 has a strong coexpression relationship with TIMP1, FAM20C, IL6, ITGAV, MMP3, FN1, and CD44 (Figure 10(i)).

SPP1 Acts as an Immune-Related Gene in Lung Adenocarcinoma.
To explore the relationship between lung adenocarcinoma and tumor immunity, we analyzed immune cell infiltration and found that SPP1 is expressed in many immune cells, such as macrophages. The association of SPP1 with macrophages, neutrophils, and dendritic cells in lung adenocarcinoma was analyzed using the GEPIA website (Figure 11(a), P < 0.05), and analysis on the TIMER website showed that SPP1 was involved in the infiltration of CD4+ T cells, macrophages, and dendritic cells (Figure 11(b), P < 0.05). As immunotherapy currently focuses mainly on inhibitors of immune checkpoints such as CD274, PDCD1, PDCD1LG2, and CTLA4, we further analyzed the coexpression relationship between SPP1 and immune checkpoint-related genes.
We found that SPP1 has a significant coexpression relationship with CD274 and PDCD1LG2 (Figures 11(c) and 11(d)).
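Coexpression between two genes is often quantified with a rank correlation across samples. The sketch below computes Spearman's coefficient in plain Python. Whether the web tools used here report exactly this statistic (for example, with or without a tumor purity adjustment) is an assumption, and the expression values are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values for two genes across five samples.
gene_1 = [2.1, 3.4, 1.0, 5.6, 4.2]
gene_2 = [10.0, 14.0, 9.0, 30.0, 13.0]
print(round(spearman(gene_1, gene_2), 3))
```

A rank-based measure is less sensitive to outlying expression values than Pearson correlation on the raw data, which is one reason it is a common default for this kind of analysis.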

Discussion
Lung cancer is one of the most common malignant tumors in the world and poses a serious threat to human health. Worldwide, lung cancer accounts for about 17% of all new cases of malignant tumors and about 23% of all tumor-related deaths [26]. Its incidence and mortality are increasing year by year, ranking first among malignant tumors [27,28]. Identifying the molecular mechanisms and biomarkers related to the occurrence and development of lung cancer has always been a focus and a difficulty of clinical and scientific research, and it has important value for improving the diagnosis, treatment efficacy, and prognosis of lung cancer patients. The aim of the present study was to systematically clarify, through bioinformatics analysis, the key roles of candidate genes and pathways in NSCLC. Gene expression data (expression profiles GSE118370 and GSE10072) were obtained from the Gene Expression Omnibus database.
Subsequently, 1,635 and 633 potential DEGs were obtained, and 88 up-regulated genes and 234 down-regulated genes were found to overlap in the two GSE datasets in lung adenocarcinoma. Then, we performed enrichment analyses, including GO analysis (three categories: CC, MF, and BP), biological pathway, site of expression, KEGG, PANTHER, and REACTOME, to analyze the up-regulated and down-regulated genes.
The results indicated that the DEGs in the CC category are mainly located in the extracellular part, the DEGs in the MF category are mainly involved in cell adhesion activity, and the DEGs in the BP category are markedly enriched in cell communication. The epithelial-to-mesenchymal transition had the highest enrichment score in the biological pathway analysis, while the cerebrospinal parts and lungs had the highest enrichment scores in the site of expression analysis. The complement and coagulation cascades had the highest enrichment score in the KEGG pathway analysis, the integrin signaling pathway had the highest enrichment score in the PANTHER pathway analysis, and extracellular matrix organization had the highest enrichment score in the REACTOME pathway analysis.
Furthermore, we constructed a PPI network, which included 94 nodes and 224 edges, to analyze the interactions between the DEGs. Fourteen hub genes were identified, comprising three up-regulated genes (CDH1, EPCAM, and SPP1), ten down-regulated genes (A2M, ACE, ANGPT1, CAV1, CDH5, CLU, CTGF, PECAM1, PIK3R1, and VWF), and one gene with no significant difference. The survival analysis revealed that three genes (PIK3R1, SPP1, and PECAM1) have a clear correlation with OS in lung adenocarcinoma patients. Among these three genes, SPP1 is highly expressed in lung adenocarcinoma compared with normal lung tissue, and its high expression is associated with poor prognosis, whereas low expression of PECAM1 and PIK3R1 is associated with poor prognosis. Therefore, we chose these three genes (PIK3R1, SPP1, and PECAM1) as target genes for further analysis. The results indicated that SPP1 is a cancer promoter (oncogene), while PECAM1 and PIK3R1 are cancer suppressor genes. The subsequent analysis examined the survival curves of the upstream miRNAs of the three genes (PIK3R1, SPP1, and PECAM1). The results showed that higher expression of miR-146a-5p, upstream of SPP1, is associated with a higher survival rate in cancer patients (Figure 8(d)). In contrast, lower expression of miR-21-5p, upstream of PIK3R1, is associated with a higher survival rate in cancer patients (Figure 8(e)), P < 0.05.
Platelet endothelial cell adhesion molecule-1 (PECAM1), a cell adhesion and signaling receptor, is located on chromosome 17q23.3 and encodes a protein that is found on the surface of monocytes, platelets, neutrophils, and certain types of T cells and makes up a large portion of the intercellular junctions in endothelial cells.
PECAM1 is a member of the immunoglobulin superfamily and is expressed in monocytes, neutrophils, macrophages, and other types of immune cells, as well as in endothelial cells [29]. It may be involved in leukocyte migration, angiogenesis, and integrin activation. It was previously reported that PECAM-1 is also associated with advanced metastatic tumor progression [30]. Anti-PECAM-1 antibodies have been reported to inhibit late metastatic progression of various tumors without blocking tumor-platelet and tumor-endothelial interactions, events associated with the initial establishment of metastatic tumor foci. Studies of the molecular mechanism of PECAM-1 have shown that PECAM-1 mediates the release of soluble mediators that stimulate tumor cell proliferation in vitro [31]. Yu et al. determined that PECAM-1 plays a key role in the tumorigenesis of LUAD by regulating vascular endothelial growth factor (VEGF) expression [32]. In this study, based on the survival and hub gene analyses, we found that PECAM-1 expression was low in lung adenocarcinoma tissues and, importantly, that low PECAM-1 expression was associated with worse prognosis in lung adenocarcinoma. It may be an important predictor of prognosis.
Secreted phosphoprotein 1 (SPP1), also known as osteopontin (OPN), is a secreted acidic glycoprotein with multiple functions. It contains the plant homeodomain (PHD) finger, which is a reader domain that typically binds unmethylated H3K4 (H3K4me0), dimethylated H3K4 (H3K4me2), or trimethylated H3K4 (H3K4me3) [33]. It has been reported to be a key extracellular matrix protein involved in tumor progression and metastasis and is considered a promising prognostic biomarker and therapeutic target [34,35]. Patients with higher cytoplasmic SPP1 expression levels had a significantly better prognosis than those with lower SPP1 levels. However, when expressed in the nucleus, SPP1 did not show prognostic value in colorectal cancer [36]. This implies that SPP1 overexpression in the cytoplasm is an important and favorable prognostic biomarker. SPP1 enhances resistance to second-generation EGFR TKIs in lung cancer treatment, and inhibition of SPP1 may be a therapeutic strategy to overcome afatinib resistance [37]. At the same time, studies have found that SPP1 can regulate the expression of PD-L1 to mediate the immune escape of lung adenocarcinoma cells. In addition, SPP1 is also considered a marker of early lymphatic metastasis in lung cancer [38,39]. This indicates that SPP1 plays an extremely important role in the progression of lung cancer and also provides strong supporting evidence for our findings.
Silencing SPP1 can inhibit the proliferation, invasion, migration, and EMT process of gastric cancer cells by inhibiting the PI3K/AKT signaling pathway and can promote the apoptosis of gastric cancer cells [40]. SPP1 mediates chemotherapy resistance in prostate cancer [41]. In addition, inhibition of SPP1 expression has been shown to inhibit the progression of melanoma [42]. This evidence suggests that SPP1 plays a role in promoting the progression of various cancers. Therefore, we analyzed the expression of SPP1 in lung cancer and in other cancers.
To investigate the relationship between SPP1 and lung adenocarcinoma, we examined immunohistochemistry data using the HPA online tool. SPP1 was highly expressed in lung adenocarcinoma compared with normal lung tissue. The results of the TIMER website analysis showed that SPP1 was highly expressed in almost all cancers. The methylation level of SPP1 is significantly decreased in lung adenocarcinoma, and SPP1 is clearly associated with clinical lymph node metastasis. We also found that the expression of SPP1 was associated with miR-146a-5p and that high expression of miR-146a-5p was related to good prognosis.
MicroRNAs (miRNAs) are a class of small noncoding RNA molecules of about 22 nucleotides that induce post-transcriptional mRNA degradation after binding to the 3′-untranslated regions (3′ UTRs) of target mRNAs [43]. The expression level and regulatory mechanism of miR-146a-5p are diverse and have been reported in a variety of tumors. miR-146a-5p is expressed at low levels in lung cancer, gastric cancer, and breast cancer [44][45][46], and high expression of exogenous miR-146a-5p can inhibit the proliferation of tumor cells. This contrasts with the high expression level of miR-146a-5p in melanoma and thyroid carcinoma [47,48]. MiR-146a-5p was a powerful inhibitor in cervical cancer and epithelial ovarian cancer and was associated with a better prognosis for patients [49,50]. This is consistent with our finding that miR-146a-5p expression is associated with lung adenocarcinoma. In addition, miR-146a-5p has good clinical application value: its serum level is lower in patients with non-small cell lung cancer, and a low expression level indicates a poor prognosis [51]. MiR-146a-5p can also be used as a diagnostic indicator of non-small cell lung cancer [52].
The PIK3R1 gene encodes the regulatory subunit of PI3K. PIK3R1 has increasingly been identified as differentially expressed in many human cancers and associated with tumor progression and metastasis [53]. PIK3R1 is abnormally expressed in a variety of tumors and is associated with increased cell proliferation and invasion. Mutation of the PIK3R1 gene in breast cancer, endometrial cancer, and urothelial carcinoma can contribute to pathogenesis [54][55][56]. However, it functions differently in different types of cancer. In ovarian cancer and colon cancer, PIK3R1 acts as an oncogene [57], while in hepatocellular carcinoma and breast cancer, it acts as a tumor suppressor gene [58,59]. Down-regulated expression of PIK3R1 can inhibit PTEN function and reduce the degradation of PIP3, which activates PI3K/AKT signaling; in this context, PIK3R1 plays the role of a tumor suppressor gene [60]. Changes in the PI3K/AKT signaling pathway are also related to the occurrence of breast cancer [61]. Studies have found that miR-21 targeting of PIK3R1 can inhibit the migration and invasion of tumor cells by reducing PI3K/AKT signaling and reversing EMT in breast cancer patients [62]. Although we identified a role of PIK3R1 in NSCLC progression, the underlying mechanisms driving tumor progression need to be further elucidated. We therefore analyzed the biological processes in which it might participate. We also found in lung adenocarcinoma that the overexpression of PIK3R1 was associated with miR-21-5p and that low expression of miR-21-5p indicates good prognosis.
MiR-21-5p is the strand retained from the mature miR-21 duplex, while the other strand, miR-21-3p, is degraded. MiR-21 is considered an onco-miRNA that participates in oncogenesis through the regulation of many tumor suppressors [63]. The expression level of miR-21 in colorectal cancer tissues was found to be higher than that in adjacent tissues, and miR-21-5p promoted proliferation and invasion in colon adenocarcinoma cells [64,65]. MiR-21-5p induces cell proliferation by targeting TGFBI in non-small cell lung cancer cells [66]. It can inhibit the sensitivity of hepatocellular carcinoma cells to cisplatin [67]. MiR-21-5p has also been reported to be involved in a variety of signaling pathways. It promotes the progression of lung adenocarcinoma by targeting the SET/TAF-Iα fraction [68]. It has also been reported that miR-21-5p promoted the occurrence of lung cancer by inhibiting the RAS/MEK/ERK pathway and inhibiting apoptosis [69]. Downregulation of miR-21-5p inhibited both proliferation and apoptosis in oesophageal squamous cell carcinoma cells via the CADM2/Akt pathway [70]. It promotes EGF-induced pancreatic cancer cell proliferation by targeting Spry2, and mechanistic studies revealed that miR-21 targeted the MAPK/ERK and PI3K/AKT signaling pathways to modulate cell proliferation [71]. In our study, we found that PIK3R1 was highly correlated with miR-21-5p and affected the prognosis of patients. Therefore, we believe that PIK3R1 may be regulated by miR-21-5p and participate in the malignant progression of non-small-cell lung cancer.
In our study, we found that SPP1 acted as an immune-related gene. We used the TIMER website to analyze its association with the infiltration of macrophages, neutrophils, and dendritic cells. SPP1 expression was associated with the infiltration of CD4+ T cells, macrophages, and dendritic cells in lung adenocarcinoma (LUAD). Then, we further analyzed the coexpression relationship between SPP1 and immune checkpoint-related genes. SPP1 was found to have a significant coexpression relationship with CD274 (PD-L1) and PDCD1LG2 (PD-L2). Immunotherapy targeting PD-1/PD-L1 has been a prominent topic in recent years, and its therapeutic effect is encouraging. Studies have shown that PD-L1 is highly expressed in ovarian cancer, melanoma, non-small-cell lung cancer, renal cell cancer, and other tumors, and the expression of PD-1 in CD8+ T cells infiltrating tumor sites is higher than that in peripheral blood [72][73][74][75]. The binding of PD-L1 to PD-1 inhibits glycolysis, amino acid metabolism, and mitochondrial metabolism. In addition, it can promote the accumulation of polyunsaturated fatty acids (PUFA) and the oxidation and activation of fatty acids, ultimately changing the metabolic mode of T cells [76,77].
PD-1 was named the programmed death receptor because of its participation in apoptosis [78]. It is mainly expressed on the surface of T cells, B cells, and natural killer cells. PD-1 consists of 288 amino acid residues, belongs to the CD28 family, and is encoded by the PDCD1 gene. PD-L1, also known as CD274, is the major ligand of PD-1 (the other ligand is PD-L2). It is a 290-amino acid transmembrane protein that is mainly expressed in mature immune cells, such as CD4+ T cells, CD8+ T cells, B cells, macrophages, and dendritic cells, as well as in endothelial cells and other nonimmune cells [79]. These findings suggested that the PD-1/PD-L1 pathway was abnormally activated in tumors. Studies have shown that PD-L1 expression in lung adenocarcinoma is associated with many genes and activates multiple pathways. KRAS up-regulated PD-L1 through the p-ERK pathway rather than the p-AKT pathway, and a PD-1 blocker or ERK inhibitor can rescue the antitumor function of T cells and reduce the survival rate of KRAS-mutated non-small-cell lung cancer cells [80]. The expression of PD-L1 can be down-regulated by the EGFR inhibitor erlotinib in lung cancer with EGFR mutation [81]. Patients with positive PD-L1 were also more sensitive to EGFR tyrosine kinase inhibitors [82]. Studies have shown that chemotherapy, radiotherapy, cytotoxic T-lymphocyte-associated antigen 4 (CTLA4) blockers, and other therapeutic methods that produce infiltrating T cells can up-regulate PD-L1 expression through IFN-γ produced by T cells, which is related to the STAT pathway [83][84][85].

Conclusions
This study aims to develop promising novel biomarkers for NSCLC diagnosis, prognosis, and molecular targeted therapy or immunotherapy. We reveal two potential biomarkers for lung adenocarcinoma, PIK3R1 and SPP1. In addition, two microRNAs, miR-21-5p and miR-146a-5p, which lie upstream of PIK3R1 and SPP1, respectively, are related to the prognosis of lung adenocarcinoma. All of these genes take part in the regulation of biological activities in lung adenocarcinoma, providing a basis for improving detection and identifying immunotherapeutic targets for lung adenocarcinoma. Finally, we found that the immune checkpoint-related genes CD274 (PD-L1) and PDCD1LG2 (PD-L2) were related to SPP1 in lung adenocarcinoma. With the development of tumor immunology, tumor immunotherapy is expected to become the successor to surgery, chemotherapy, radiotherapy, and targeted therapy. These results also lay a solid foundation for our future research and clinical application. However, this study has some limitations, notably the lack of in vivo and in vitro experimental verification. In future work, we will study the underlying mechanisms further on the basis of these bioinformatics findings.

Data Availability
All data generated or analyzed during this study are included within this article.

Consent
The authors have obtained consent to publish this paper from all the participants of this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.