Identified GNGT1 and NMU as Combined Diagnosis Biomarker of Non-Small-Cell Lung Cancer Utilizing Bioinformatics and Logistic Regression

Non-small-cell lung cancer (NSCLC) is one of the most devastating diseases worldwide. This study aimed to identify reliable prognostic biomarkers and to improve understanding of the mechanisms of cancer initiation and progression. RNA-Seq data were downloaded from The Cancer Genome Atlas (TCGA) database. Subsequently, comprehensive bioinformatics analysis incorporating gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interaction (PPI) network analysis was conducted to identify differentially expressed genes (DEGs) closely associated with NSCLC. Eight hub genes were screened out using Molecular Complex Detection (MCODE) and cytoHubba. The prognostic and diagnostic values of the hub genes were further confirmed by survival analysis and receiver operating characteristic (ROC) curve analysis. Hub genes were validated using other resources, namely the Oncomine, Human Protein Atlas, and cBioPortal databases. Ultimately, logistic regression analysis was conducted to evaluate the diagnostic potential of the two identified biomarkers. Screening identified 1,411 DEGs, including 1,362 upregulated and 49 downregulated genes. Pathway enrichment analysis showed that the DEGs were involved in the Ras signaling pathway, alcoholism, and other pathways. Ultimately, eight prioritized genes (GNGT1, GNG4, NMU, GCG, TAC1, GAST, GCGR, and NPSR1) were identified as hub genes. High hub gene expression was significantly associated with worse overall survival in patients with NSCLC. The ROC curves showed that these hub genes had diagnostic value. The mRNA expressions of GNGT1 and NMU were high in the Oncomine database. Their protein expressions and genetic alterations were also revealed. Finally, logistic regression analysis indicated that combining the two biomarkers substantially improved the ability to discriminate NSCLC. GNGT1 and NMU identified in the current study may empower further discovery of the molecular mechanisms underlying NSCLC's initiation and progression.


Introduction
As one of the most devastating diseases worldwide, lung cancer causes nearly 1.6 million deaths each year [1][2][3]. Approximately 85% of lung cancers are characterized as non-small-cell lung cancer (NSCLC) [4][5][6], which is typically classified into two subtypes, squamous cell carcinoma (SCC) and adenocarcinoma (AD), using standard pathology methods [7][8][9][10]. Tobacco smoking is the most common risk factor for lung cancer. Smoking is also associated with multiple risks, including worse tolerance of treatment, higher risk of treatment failure and second primary tumors, and poorer quality of life. Indeed, it has become clear that a significant reduction in tobacco consumption would prevent a large fraction of lung cancer cases and other smoking-related diseases [11][12][13].
In addition, other factors such as air pollution, poor diet, occupational exposure, and hereditary factors have been reported in association with NSCLC in nonsmokers [14][15][16].
Despite recent advances in cancer treatment, the five-year survival rate of NSCLC remains unsatisfactory [34][35][36][37]. Thus, it is imperative to identify potential biomarkers and explore NSCLC's underlying biological mechanisms.
In recent years, bioinformatics analysis has been utilized as a powerful tool to explore novel prognostic and therapeutic biomarkers and to unveil the potential mechanisms of NSCLC [38][39][40][41]. For instance, a novel seven-gene model built with integrated bioinformatics methods was reported to be a promising prognostic biomarker for lung SCC patients [41][42][43]. In addition, studies using comprehensive bioinformatics analysis have shown that the cell cycle pathway may play a significant role in NSCLC in nonsmokers [44][45][46][47].
In the present study, RNA-Seq data were downloaded from The Cancer Genome Atlas (TCGA) database. Then, the edgeR package was applied to uncover differentially expressed genes (DEGs) between NSCLC tissues and normal tissues. Using the resulting data, this study aimed to unveil the underlying molecular mechanism of NSCLC onset and progression through gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and protein-protein interaction (PPI) network analysis. Subsequently, cytoHubba, a Cytoscape plugin, was used to reveal the hub genes using 12 topological analysis methods. The prognostic and diagnostic values of the hub genes were then confirmed by survival analysis and receiver operating characteristic (ROC) curve analysis.
The screening revealed two key genes, GNGT1 and NMU, and the protein expressions of these genes were validated by the Human Protein Atlas online database at the system level. Their genetic alteration and coexpression were also revealed. Finally, a logistic regression model was built to evaluate the combined diagnostic capability of GNGT1 and NMU.

Materials and Methods
2.1. Downloading of TCGA Datasets and DEG Screening. The mRNA expression data of NSCLC patients were downloaded from the TCGA database (https://cancergenome.nih.gov/) [48]. The criteria used were as follows: primary site (lung), data category (Transcriptome Profiling), project ID (TCGA-LUAD and TCGA-LUSC), experimental strategy (RNA-Seq), and workflow type (HTSeq-counts). The other filters were kept as default. Practical Extraction and Reporting Language (Perl) was utilized to extract the sample information, generate the mRNA expression matrix, and annotate gene symbols. Finally, data from a cohort of 1,145 samples were obtained from TCGA, comprising 108 normal tissue samples and 1,037 NSCLC samples. The edgeR package from Bioconductor was used to screen the DEGs between normal tissue and NSCLC [49][50][51]. An adjusted P < 0.001 and fold change (FC) > 4 were set as the cutoff criteria.
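The cutoff step can be illustrated with a minimal Python sketch. It is not the authors' edgeR workflow: the gene names and values are hypothetical, and it assumes that "FC > 4" is applied symmetrically, so that a gene counts as upregulated when log2(FC) > 2 and as downregulated when log2(FC) < -2.

```python
import math

# Hypothetical per-gene results: (gene, fold change tumor/normal, adjusted P value).
results = [
    ("GENE_A", 9.5, 1e-6),   # strongly up, significant
    ("GENE_B", 0.2, 5e-5),   # strongly down, significant
    ("GENE_C", 4.5, 0.01),   # large change, but adjusted P too high
    ("GENE_D", 1.8, 1e-8),   # significant, but change too small
]

def screen_degs(rows, fc_cutoff=4.0, padj_cutoff=0.001):
    """Split genes that pass both cutoffs into up- and downregulated lists.

    Assumption: the fold-change cutoff is symmetric on the log2 scale.
    """
    log_cutoff = math.log2(fc_cutoff)
    up, down = [], []
    for gene, fc, padj in rows:
        if padj >= padj_cutoff:
            continue  # fails the adjusted P value cutoff
        log_fc = math.log2(fc)
        if log_fc > log_cutoff:
            up.append(gene)
        elif log_fc < -log_cutoff:
            down.append(gene)
    return up, down

up_genes, down_genes = screen_degs(results)
print("up:", up_genes, "down:", down_genes)
```

Working on the log2 scale keeps the up and down thresholds mirror images of each other, which is one common way to read a single fold-change cutoff.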

2.2. DEG Functional Enrichment Analysis. Gene ontology (GO) analysis provides a standardized description of gene products in terms of molecular function (MF), biological process (BP), and cellular component (CC) [52]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database that links genes and their products to higher-level functions such as pathways [53]. GO and KEGG enrichment analyses were conducted using the online tool DAVID (https://david.ncifcrf.gov/) and visualized with the R package "ggplot2" [54]. P < 0.05 was considered to indicate statistical significance.
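The idea behind term enrichment can be sketched as an over-representation test. The following is a plain hypergeometric tail probability, shown only as an illustration; DAVID reports its own modified statistic, so this sketch is not a reproduction of the tool, and the function name and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: genes in the background set
    n_term:       background genes annotated to the term
    n_degs:       genes in the DEG list
    n_overlap:    DEGs annotated to the term
    """
    upper = min(n_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 20 background genes, 5 in the term, 4 DEGs, 2 of them in the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p_value, 4))
```

A term would then be called enriched when this probability falls below the chosen threshold, usually after correcting for the number of terms tested.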

2.3. Constructing the Protein-Protein Interaction Network. The Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/) database, which integrates known and predicted functional interactions between proteins, was used to build the PPI network [55]. Interactions with a score of ≥0.4 were retained.
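As an illustration of the score cutoff, the sketch below filters a list of scored interactions and builds an undirected network from the survivors. The protein names, scores, and the `build_network` helper are hypothetical; a real STRING export has more columns and its own score scaling.

```python
from collections import defaultdict

# Hypothetical rows: (protein_a, protein_b, interaction score between 0 and 1).
interactions = [
    ("P1", "P2", 0.92),
    ("P1", "P3", 0.41),
    ("P2", "P3", 0.15),  # below the cutoff, dropped
    ("P3", "P4", 0.40),  # exactly at the cutoff, kept
    ("P4", "P5", 0.39),  # below the cutoff, dropped
]

def build_network(rows, min_score=0.4):
    """Return an adjacency map holding only interactions scoring >= min_score."""
    adjacency = defaultdict(set)
    for a, b, score in rows:
        if score >= min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

network = build_network(interactions)
n_nodes = len(network)
# Each undirected edge appears in two neighbor sets, so halve the total.
n_edges = sum(len(neighbors) for neighbors in network.values()) // 2
print(n_nodes, "nodes,", n_edges, "edges")
```

Proteins whose every interaction falls below the cutoff drop out of the network entirely, which is why a PPI network can have fewer nodes than the gene list it was built from.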

2.4. Hub Gene Selection and Analysis. A Cytoscape plugin, Molecular Complex Detection (MCODE), was utilized to screen modules of the PPI network with a node score cutoff of 0.2, degree cutoff of 2, k-core of 2, and max depth of 100. A P value of <0.05 was considered statistically significant. Next, the DEGs were ranked by cytoHubba [56], which contains 12 algorithms: Maximal Clique Centrality, Edge Percolated Component, Betweenness, Density of Maximum Neighborhood Component, Degree, Bottleneck, Eccentricity, Closeness, Radiality, Maximum Neighborhood Component, Stress, and Clustering Coefficient. The MCODE and cytoHubba results were combined to identify the hub genes.
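One way to combine several rankings is to keep only the genes that appear near the top of every method. The sketch below shows that idea under stated assumptions: the per-method scores are hypothetical, the top-k intersection rule is an assumption rather than the documented cytoHubba procedure, and the helper names are invented for illustration.

```python
def top_k(scores, k):
    """Return the k highest-scoring genes; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])

def consensus_hubs(method_scores, k):
    """Genes that are in the top k for every ranking method."""
    tops = [top_k(scores, k) for scores in method_scores.values()]
    return set.intersection(*tops) if tops else set()

# Hypothetical scores from three of the ranking methods.
method_scores = {
    "Degree":      {"G1": 5,    "G2": 4,   "G3": 1,   "G4": 3},
    "Closeness":   {"G1": 0.9,  "G2": 0.5, "G3": 0.8, "G4": 0.7},
    "Betweenness": {"G1": 12.0, "G2": 3.0, "G3": 0.0, "G4": 7.0},
}

hubs = consensus_hubs(method_scores, k=3)
print(sorted(hubs))
```

Requiring agreement across methods trades sensitivity for robustness: a gene that is central under only one definition is excluded.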
2.5. Survival Analysis of Hub Genes. The association between hub gene expression and overall survival was investigated using the Kaplan-Meier plotter (http://www.kmplot.com/), an online database capable of assessing the effect of 54,675 genes on survival using 10,461 cancer samples, including samples from 2,437 lung cancer, 1,065 gastric cancer, 1,816 ovarian cancer, and 5,143 breast cancer patients. P < 0.05 (Cox) was considered statistically significant.
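For readers unfamiliar with the survival curves produced by such tools, the Kaplan-Meier estimator itself can be sketched in a few lines. This is a generic illustration with hypothetical follow-up data; it does not reproduce the Kaplan-Meier plotter's grouping of patients or its Cox-based P values.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for an observed death and 0 for a censored observation.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time point.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve

# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

In a biomarker setting, one such curve would be computed for the high-expression group and one for the low-expression group, and the two would then be compared with a separate test.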
2.6. ROC Curve. The ROC curve analysis was applied to evaluate the specificity and sensitivity of the hub genes. The area under the curve (AUC) and P value were calculated. P < 0.05 was considered to denote statistical significance.
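The AUC has a simple interpretation that can be computed directly: it is the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. The sketch below uses that pairwise definition with hypothetical expression scores; it does not compute the P value reported alongside the AUC.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison; labels are 1 (positive) or 0 (negative).

    A positive outranking a negative counts 1, a tie counts 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical marker values; label 1 = NSCLC, label 0 = normal.
auc = roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0])
print(round(auc, 4))
```

An AUC of 0.5 corresponds to a marker that carries no information, and 1.0 to one that separates the two groups perfectly.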

2.7. Validation of Hub Genes. The expression level of hub genes in LUAD was validated by Oncomine (https://www.oncomine.org/resource/login.html) [57]. The thresholds were set as follows: P < 1E-4, fold change > 2, and gene ranking in the top 10%.
2.8. Human Protein Atlas. The Human Protein Atlas (https://www.proteinatlas.org) is an online resource that includes immunohistochemical data of nearly 20 types of tumors [58]. In our study, immunohistochemical images were used to directly compare the expression of biomarkers in normal and NSCLC tissues. The intensity of antibody staining indicated the protein expression of hub genes.
2.9. Genetic Alteration of Hub Genes. The cBio Cancer Genomics Portal (http://www.cbioportal.org/) is an open platform that provides visualization, analysis, and downloads of large-scale cancer genomic datasets for various cancer types [59]. Complex cancer genomic profiles can be easily obtained using the portal's query interface, enabling researchers to explore and compare genetic alterations across samples. cBioPortal was used to explore genetic alterations, coexpression, and overall survival of two hub genes, GNGT1 and NMU.
2.10. Statistical Analysis. SPSS version 23.0 (SPSS Inc., Chicago, IL, USA) was used to perform logistic regression analysis. ROC curves were generated to evaluate the diagnostic accuracy of GNGT1 and NMU, and AUC was used to evaluate sensitivity and specificity.
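To show how a two-marker logistic model can discriminate better than either marker alone, the sketch below fits a logistic regression by plain gradient descent on a small hypothetical dataset and compares AUCs. It is not the SPSS procedure used in the study: the data, learning rate, iteration count, and helper names are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(features, labels, lr=0.1, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def roc_auc(scores, labels):
    """Pairwise AUC; ties between a positive and a negative count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical (marker_1, marker_2) values; label 1 = tumor, 0 = normal.
samples = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (0.5, 0.5),
           (3.0, 1.0), (1.0, 3.0), (2.0, 2.0), (2.5, 2.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(samples, labels)
combined = [sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) for x in samples]

auc_marker_1 = roc_auc([x[0] for x in samples], labels)
auc_marker_2 = roc_auc([x[1] for x in samples], labels)
auc_combined = roc_auc(combined, labels)
print(auc_marker_1, auc_marker_2, auc_combined)
```

In this toy dataset neither marker separates the groups on its own, but their weighted sum does, which is the situation in which a combined model is expected to raise the AUC.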

Identification of DEGs in NSCLC.
The workflow is shown in Figure 1(a). DEGs were identified using the criteria of P < 0.001 and FC > 4. A total of 1,411 DEGs were screened out between NSCLC and normal samples, including 1,362 upregulated genes and 49 downregulated genes (Figures 1(b) and 1(c)).

Functional and Pathway Analysis of DEGs.
To further investigate the specific function of these genes, all DEGs were uploaded to the online tool DAVID. GO analysis revealed that in terms of BP, the DEGs were associated with nucleosome assembly, transcription from RNA polymerase II promoter, telomere organization, flavonoid glucuronidation, and DNA replication-dependent nucleosome assembly. When examined in terms of MF, DEGs were enriched in protein heterodimerization activity, retinoic acid-binding, hormone activity, glucuronosyltransferase activity, and extracellular ligand-gated ion channel activity. Regarding CC, the DEGs were mainly enriched in the extracellular region, cornified envelope, nucleosome, extracellular space, and intermediate filament. KEGG analysis found that the DEGs were predominantly involved in the Ras signaling pathway, nicotine addiction, steroid hormone biosynthesis, alcoholism, and systemic lupus erythematosus (Figure 2(a)).

PPI Network Construction, Module Analysis, and Hub Gene Selection.
The PPI network was constructed from the 1,411 DEGs (1,362 upregulated and 49 downregulated) using the STRING database and visualized in Cytoscape. The resulting network consisted of 787 nodes and 2,104 edges. The overlapping genes of different algorithms selected by cytoHubba were GNGT1, GNG4, NMU, GCG, TAC1, GAST, NPSR1, and GCGR (Figure 2(b)). The top modules were then extracted from the PPI network (Figure 2(c)).

Human Protein Atlas.
After studying the mRNA expression of hub genes in NSCLC, we explored their protein expression using the Human Protein Atlas. The results revealed that NMU protein was not expressed in normal lung tissues, whereas medium expression of NMU protein was observed in NSCLC tissues. However, GNGT1 was not detected in either normal lung tissues or NSCLC tissues (Figure 4(c)).
Notably, according to the ROC curve analysis, the AUC of GNGT1 was 0.903 (P < 0.0001). For NMU, the AUC was

Discussion
Elucidating the molecular mechanisms of the initiation and development of NSCLC would benefit early diagnosis and targeted therapy [60][61][62][63]. In this study, we identified 1,362 upregulated genes and 49 downregulated genes and selected GNGT1, GNG4, NMU, GCG, TAC1, GAST, NPSR1, and GCGR as hub genes using Molecular Complex Detection (MCODE) and cytoHubba. The DEGs were primarily enriched in the Ras signaling pathway, nicotine addiction, steroid hormone biosynthesis, alcoholism, and systemic lupus erythematosus.
Several studies have investigated the association between alcohol and lung cancer. Some have reported that alcohol is linked to a number of human diseases, including cancers [101][102][103]. In contrast, another report found no association between alcohol and lung cancer [104]. Thus, further experiments are necessary to confirm whether lung cancer is attributable to alcohol abuse. Overall, the studies that report such a link are consistent with the enrichment of the alcoholism pathway in our results.
In the current study, the expressions of GNGT1 and NMU were high in both the Oncomine and TCGA databases, indicating that GNGT1 and NMU may play a role as oncogenes. The transducin γ-subunit gene (GNGT1) has been localized to human chromosome 7 [104] and is associated with various forms of cancer [105][106][107][108]. GNGT1 acts in different tissues to regulate cell proliferation, migration, adhesion, and apoptosis [109][110][111]. One study showed that GNGT1 could serve as a marker of medulloblastoma [112]. GNGT1 can be utilized to differentiate gastrointestinal stromal tumor from leiomyosarcoma, two cancers that have very similar histopathology but require very different treatments [113][114][115]. In the current study, GNGT1 was significantly upregulated, and high mRNA expression of GNGT1 was associated with poor overall survival in NSCLC patients. Furthermore, KEGG analysis showed that GNGT1 was involved in the Ras signaling pathway. Therefore, it is reasonable to regard GNGT1 as a hub gene of NSCLC. Further studies are needed to better understand GNGT1's association with NSCLC.
Neuromedin U (NMU) has been reported to exhibit early alterations associated with cancer, including lung cancer, pancreatic cancer, breast cancer, renal cancer, and endometrioid endometrial carcinoma, through promoting migration, invasion, glycolysis, a mesenchymal phenotype, a stem cell phenotype of cancer cells, and resistance to the antitumor immune response [116][117][118]. It is overexpressed in pancreatic cancer and increases cancer invasiveness through the hepatocyte growth factor c-Met pathway [119][120][121]. NMU has also been implicated in human breast cancer and endometrial cancer [122][123][124]. The protein encoded by NMU can amplify ILC2 to drive allergic lung inflammation [125]. NMU is regulated by RhoGDI2, a metastasis inhibitor, which can be used as a target for lung metastasis. The expression of NMU is negatively correlated with prognosis in most types of cancer [126][127][128]. In the present study, higher mRNA and protein expression of NMU was negatively correlated with overall survival. Therefore, our results are in line with these previous studies, which indicated that NMU may be directly or indirectly important in NSCLC development.
Moreover, to explore the predictive ability of GNGT1 and NMU, logistic regression analysis was performed. Logistic regression is a probabilistic nonlinear regression method used for discrimination and prediction. Notably, according to the logistic regression analysis, the AUC of the ROC curve of GNGT1 was 0.903 (P < 0.0001), and the AUC of NMU was 0.932 (P < 0.0001). Combining the two biomarkers enabled a relatively high capacity for discrimination between NSCLC and normal samples, with an AUC of 0.969, indicating that the combined test of GNGT1 and NMU was superior to testing for either gene individually, with better clinical accuracy and higher diagnostic value. Therefore, a logistic regression model combining the two genes is a promising diagnostic model for NSCLC.
In conclusion, our results identified two hub genes, GNGT1 and NMU, as prognostic target genes and highlighted their probable role in NSCLC. Nevertheless, a limitation of this study should be acknowledged. Because all the data analyzed in the current study were retrieved from online databases, further independent experiments are required to validate our findings and to explore the molecular mechanism of the hub genes in NSCLC development and progression.

Data Availability
All data generated or analyzed during this study are included in this article.

Additional Points
Impact Statement. GNGT1 and NMU identified in the current study may empower further discovery of the molecular mechanisms underlying NSCLC's initiation and progression.

Conflicts of Interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.