Independent Prognostic Potential of GNPNAT1 in Lung Adenocarcinoma

Background Glucosamine-Phosphate N-Acetyltransferase 1 (GNPNAT1) is a critical enzyme in the biosynthesis of uridine diphosphate-N-acetylglucosamine. It has many important functions, such as protein binding, monosaccharide binding, and embryonic development and growth. However, the role of GNPNAT1 in lung adenocarcinoma (LUAD) remains unclear. Methods In this study, we explored the expression pattern and prognostic value of GNPNAT1 in LUAD across TCGA and GEO databases and assessed its independent prognostic value via Cox analysis. LinkedOmics and GEPIA2 were applied to investigate coexpression and functional networks associated with GNPNAT1. The TIMER web tool was deployed to assess the correlation between GNPNAT1 and the main six types of tumor-infiltrating immune cells. Besides, the correlations between GNPNAT1 and the LUAD common genetic mutations, TMB, and immune signatures were examined. Results GNPNAT1 was validated upregulated in tumor tissues in TCGA-LUAD and GEO cohorts. Moreover, in both TCGA and GEO cohorts, high GNPNAT1 expression was found to be associated with poor overall survival. Cox analysis showed that high GNPNAT1 expression was an independent risk factor for LUAD. Functional network analysis suggested that GNPNAT1 regulates cell cycle, ribosome, proteasome, RNA transport, and spliceosome signaling through pathways involving multiple cancer-related kinases and E2F family. In addition, GNPNAT1 correlated with infiltrating levels of B cells, CD4+ T cells, and dendritic cells. B cells and dendritic cells could predict the outcome of LUAD, and B cells and CD4+ T cells were significant independent risk factors. The TMB and mutations of KRAS, EGFR, STK11, and TP53 were correlated with GNPNAT1. At last, the correlation analysis showed GNPNAT1 correlated with most of the immune signatures we performed. Conclusion Our findings showed that GNPNAT1 was correlated to the prognosis and immune infiltration of LUAD. In particular, the tight relationship between GNPNAT1 and B cell marker genes may be the epicenter of the immune response and one of the key factors affecting the prognosis. Our findings laid the foundation for further research on the immunomodulatory role of GNPNAT1 in LUAD.


Introduction
Lung cancer is still the most common cancer (accounting for 11.6% of all cancers) and the leading cause of cancer death, with over 1.7 million deaths worldwide in 2018 [1][2][3]. The survival rate of lung cancer depends mainly on the stage of diagnosis. In general, the current 5-year survival rate is about 18%, but if found early, the prognosis can be improved [4]. Unfortunately, at the time of diagnosis, only about 15% of cases were at the early stage, while the vast majority (57%) were already at the advanced stage [4]. Lung adenocarcinoma (LUAD) is a subclass of non-small-cell lung cancer (NSCLC), which develops along the outer edge of the lungs within glandular cells in the small airways. LUAD accounts for approximately 40% of all lung cancer cases being the most common type of histology [5].
However, due to the combination of adverse factors that span a range of different biological and clinical behaviors and the increased resistance to antilung cancer drugs, existing targeted drugs have shown unsatisfactory efficacy [6].
In NSCLC, little is known about the genomic and host factors that drive the progression of preinvasive lesions. Investigating these factors can enhance our understanding of lung cancer biology, help to develop better screening strategies, and improve patient prognosis [7]. Furthermore, the lack of specific markers for disease stages or tumor types represents a key gap in the current understanding and treatment of LUAD.
Glucosamine-Phosphate N-Acetyltransferase 1 (GNPNAT1), a member of the GCN5-related Nacetyltransferase superfamily, is a key enzyme in the pathway toward biosynthesis of uridine diphosphate-Nacetylglucosamine (UDP-GlcNAc), an important donor substrate for N-linked glycosylation [8]. The gene encoding GNPNAT1 has been characterized in many eukaryotes, such as the murine gene EMeg32 [9]. It is worth noting that EMeg32 is essential for embryonic development, and the level of UDP-GlcNAc that depends on EMeg32 affects sensitivity to apoptosis stimulation and cell cycle progression [10]. One recent study indicated that the expression of GNPNAT1 is associated with the progression of castration-resistant prostate cancer via the phosphatidylinositol3-kinase/protein kinase B signaling pathway [11]. However, whether GNPNAT1 is a biomarker of LUAD and the biological function of GNPNAT1 in LUAD remains to be determined.
In this study, we examined the expression and prognostic value of GNPNAT1 in LUAD patients in the Cancer Genome Atlas (TCGA) and GEO cohorts. Moreover, using multidimensional analysis, we assessed the coexpression and functional network associated with GNPNAT1 in LUAD and learned its role in tumor immunity. The present study may potentially reveal new biological targets and strategies for the diagnosis, treatment, and prognosis assessment of LUAD.

Materials and Methods
2.1. Data Mining from TCGA and GEO Databases. 513 LUAD patients' gene expression profiles, along with their clinical data, and survival status were downloaded from the GDC Xena Hub (v2019-08-28, https://gdc.xenahubs.net) with cohort ID: GDC TCGA Lung Adenocarcinoma (TCGA-LUAD). For finding a suitable cohort in GEO for differential gene express validation, the inclusion criteria of the dataset were as follows: (I) expression level of GNPNAT1 in both LUAD and healthy tissues should be available, (II) the size of the dataset should be greater than 100 samples (of which the normal tissue sample should be greater than 50), and (III) the dataset should be released later than the year 2010. GSE19188 (n = 45 for LUAD, n = 65 for normal) and GSE32863 (n = 58 for LUAD, n = 58 for normal) were chosen. As for survival validation, the inclusion criteria were (I) expression level of GNPNAT1 of LUAD tissues should be available; (II) the numbers of samples with survival data should be higher than 200; (III) each of the four tumor stages should be available; (IV) the dataset should be released later than 2010. Finally, GSE72094 (n = 442) was chosen for survival validation. In our research, TCGA-LUAD, GSE19188, and GSE32863 cohorts were applied to evaluate the difference in expression of GNPNAT1 in normal tissues and tumor tissues. TCGA-LUAD and GSE72094 were used for survival analysis to value how GNPNAT1 affects the prognosis of LUAD.

Differential
Expression of GNPNAT1. The distributions of expression of GNPNAT1 in tumor and healthy tissues were examined by unpaired and paired t-test in TCGA-LUAD cohort and unpaired t-test in GSE19188 and GSE32863 cohorts. R package "beeswarm" was for visualization.
2.3. Survival Analysis. Survival analysis was conducted between high and low GNPNAT1 expression groups in cohorts of TCGA-LUAD and GSE72094 through Kaplan-Meier analysis with log-rank test, using "survminer" and "survival" packages in R. In addition, univariate and multivariate Cox analyses were performed on GNPNAT1 and clinical characteristics to assess the potential independent prognostic value of GNPNAT1 in LUAD.

LinkedOmics and GEPIA2 Databases
Analysis. LinkedOmics (http://www.linkedomics.org) is a publicly available portal that includes multiomics data from all 32 TCGA cancer types [12]. In the LinkFinder module, the Pearson test was applied to perform statistical analysis on GNPNAT1 coexpression. LinkInterpreter module was used to conducted analyses of Gene Ontology (Biological Process), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, kinasetarget enrichment, miRNA-target enrichment, and transcription factor-target enrichment through Gene Set Enrichment Analysis (GSEA). The rank criterion was false discovery rate ðFDRÞ < 0:05, and simulation was set as 500. The Gene Expression Profiling Interactive Analysis (GEPIA2) (http:// gepia2.cancer-pku.cn/) is a web server for analyzing the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and the GTEx projects, using a standard processing pipeline [13]. GEPIA2 was applied to plot survival heat maps and survival curves.
2.5. The Correlation between GNPNAT1 and Six Types of Infiltrating Immune Cells. The Tumor Immune Estimation Resource (TIMER, https://cistrome.shinyapps.io/timer/) is a comprehensive resource for systematical analysis of immune infiltrates across diverse cancer types [14,15]. Gene module was applied to explore the correlation between GNPNAT1 expression and abundance of six types of immune cells infiltrates, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells, by tumor puritycorrected partial Spearman's correlation. The Kaplan-Meier analysis was conducted to assess the prognostic capacity of each immune infiltrate. Multivariate Cox analysis was used to evaluate how GNPNAT1 and these six types of immune cells together affect outcomes. p value < 0.05 is the threshold of a significant correlation.
2.6. Correlation between GNPNAT1 and KRAS Mutation, EGFR Mutation, STK11 Mutation, TP53 Mutation, TMB, and Immune Signatures. The mutations of KRAS, EGFR, STK11, and TP53 were detailed in GSE72094 cohort. The correlation between GNPNAT1 and the above mutations was tested using Spearman's correlation. TMB is defined as the total number of somatic gene coding errors, base substitutions, insertions, or deletions detected per million bases [16]. In our research, the somatic mutation data of the TCGA-LUAD cohort was downloaded from the GDC Data Portal (https://portal.gdc.cancer.gov/). The mutation frequency with the number of variants/the length of exons (38 million) for each sample was calculated via Perl scripts based on the JAVA8 platform (Table S1) [16]. Spearman's rank correlation coefficient was applied to examine the correlation between TMB and GNPNAT1 in TCGA-LUAD cohort. TISIDB (http:// cis.hku.hk/TISIDB/) is a central portal for tumor and immune system interactions, which integrates multiple heterogeneous data types [17], and contained various immune gene signatures categorized by type of immune or their function. Gene signatures of chemokine, receptor, major histocompatibility complex (MHC), immunoinhibitor, immunostimulator, and 28 tumor-infiltrating lymphocytes (TILs) [18] were downloaded. The correlations between GNPNAT1 and these gene signatures above were calculated via the "Correlation" module of TIMER with tumor purity-corrected partial Spearman's correlation. The cutoff of p value < 0.05 indicates the significance of correlation.

Clinical Characteristics.
The flowchart of the present research is shown in Figure 1. TCGA-LUAD (n = 513 for LUAD, n = 59 for normal), GSE19188 (n = 45 for LUAD, n = 65 for normal), and GSE32863 (n = 58 for LUAD, n = 58 for normal) were chosen for GNPNAT1 differential expression comparison between tumor and healthy tissues. 513 LUAD cases that came from the TCGA-LUAD cohort and 442 LUADs of the GSE72094 dataset were used as the survival validation purpose (Table 1).

High GNPNAT1 Expression in LUAD.
In TCGA-LUAD cohort, we compared the expression of GNPNAT1 in tumor and adjacent tissues, and the unpaired (p value = 3.28e−29, Figure 2(a)) and paired (p value = 4.425e−19, Figure 2(b)) tests both indicated that the expression of GNPNAT1 in the tumor was elevated. Moreover, we examined GNPNAT1 differential expression in independent datasets of GSE19188 (p value = 1.802e−10, Figure 2(c)) and GSE32863 (p value = 2.116e−10, Figure 2(d)), finding the consistent results with that in TCGA-LUAD cohort.

High GNPNAT1 Expression Indicated Worse Survival an
Acted as an Independent Risk Factor in LUAD. Then, to understand the correlation between GNPNAT1 expression and patients' outcomes, we used the Kaplan-Meier curves to evaluate and compare the survival differences between patients with high and low expression of GNPNAT1 ( Figure 3). In TCGA-LUAD cohort, the high GNPNAT1 expression group had significantly shorter overall survival, and the median overall survival of group of the high GNPNAT1 expression vs. the low expression was 3.33 years vs. 4.93 years (log-rank test, p value = 2.566e−05, Figure 3(a)). In addition, we also checked how GNPNAT1 performing in disease-specific survival (Figure 3(b)) and progression-free survival (Figure 3(c)) in TCGA-LUAD, finding the high expression of GNPNAT1 predicted a worse prognosis. Consistently, in GSE72094 cohort, the high expression group had significantly unfavorable overall outcomes than the low expression group (p value = 0.0015, Figure 3(d)). To assess the risk potential of GNPNAT1 in LUAD, the Cox proportional-hazards model was constructed. In TCGA-LUAD cohort, the overall survivalbased Cox analysis showed GNPNAT1 having potential predict value in univariate (HR = 1:68, 95% CI: 1.38-2.05, p value = 3.60E-07) and multivariate (HR = 2:81, 95% CI: 1.48-5.36, p value = 0.00166) test (Table 2). Similarly, the independent prognosis capacity of GNPNAT1 was also confirmed in the disease-specific survival and progression-free survival Cox model (Table S2, Table S3). Additionally, in the GSE72094 cohort, Cox analyses identified the important value of GNPNAT1 in independently predicting the overall survival (Table 3; univariate analysis: HR = 2:02, 95%CI = 1:36-3, p value = 0.000521; multivariate analysis: HR = 1:76, 95%CI = 1:17-2:65, p value = 0.00667).

GNPNAT1 Coexpression and Regulatory Networks in LUAD.
In order to better understand the biological meaning of GNPNAT1 in LUAD, the LinkFinder module in the Lin-kedOmics web portal was deployed to check the coexpression pattern of GNPNAT1 in TCGA-LUAD. As is plotted in Figure 4(a), it showed that 4825 genes (red dots) positively correlated with GNPNAT1, and 7679 genes (green dots) negatively correlated (FDR < 0:05). Figures 4(b) and 4(c) show the heat maps of the top 50 genes positively and negatively associated with GNPNAT1, respectively. Moreover, Table S4 detailed lists the coexpressed genes. Significant Gene Ontology term annotation by GSEA showed that GNPNAT1 coexpressed genes involved mainly in chromosome segregation, ncRNA processing, translational initiation, telomere organization, and RNA 3 ′ -end processing. In contrast, the activities like regulation of metal ion transport, heart morphogenesis, cell-substrate adhesion, and protein localization to cilium were inhibited ( Figure 4(d) and Table S5). KEGG analysis showed genes were primarily enriched in the cell cycle, ribosome, proteasome, RNA transport, ribosome biogenesis in eukaryotes, spliceosome pathways, etc. (Figure 4(e) and Table S6). These results To understand the regulatory factors of GNPNAT1 in LUAD, we further analyzed the enrichment of kinases, miR-NAs, and transcription factors of GNPNAT1 coexpressed genes. The top 5 kinases related mainly to CDK1, PLK1, AURKB, CDK2, and ATM (Table 4 and Table S7). In fact, 3 of top 5 kinase genes include CDK1, PLK1, and AURKB were significantly highly expressed in tumor tissues and significantly related to the overall survival of LUAD ( Figure S1). Interestingly, the coexpressed genes of GNPNAT1 were not enriched on any miRNA targets significantly (Table S8). Transcription factor enrichment results showed that the coexpressed genes of GNPNAT1 were mainly enriched in the E2F transcription factor family (Table S9), including V$E2F1_Q6, V$E2F_Q6, V$E2F_Q4, V$E2F4DP1_01, and V$E2F1DP1_01. A recent study revealed the biological function of the E2F family gene in the development of cancer, and the possibility of this family gene becoming a potential biomarker of further therapeutic studies in patients with LUAD [19].     (Table 5). Taking together, the significantly infiltrating with B cells seemed like one of the critical factors that GNPNAT1 holds to influence the outcome of LUAD pronounced.
3.6. Correlation between GNPNAT1 and the Mutations of KRAS, EGFR, STK11, and TP53, Tumor Mutation Burden (TMB), and Immune Signatures. The mutations of KRAS, EGFR, STK11, and TP53 are correlated with the expression of GNPNAT1 in GSE72094 cohort based on our results. And TMB also had a significant correlation with GNPNAT1 (Table 6).
To expand the understanding of the crosstalk between GNPNAT1 and multiple immune marker genes of 28 TILs, immune inhibitory or stimulatory, cytokine-related, cancertestis antigen, and MHC, we did correlation analysis between them. The analysis showed that GNPNAT1 was significantly correlated with 66.14% (582/880) immune marker genes (Table S10)   In the previous section, B cell infiltration was found potential to be one of the key reasons that caused GNPNAT1 to become a prognostic factor. Thus, the correlation between GNPNAT1 and B cell marker genes was notable. Table 7, which was extracted from Table S10, shows the puritycorrected partial Spearman's correlation between GNPNAT1 and B cell markers. In B cells, GNPNAT1 is highly correlated with CDKN3 (#1, r = 0:626, p value = 4.29E-55), CCNA2 (#2, r = 0:616, p value = 7.89E-53), and GNG7 (#3, r = −0:448, p value = 9.65E-26). In total, 38/57 of the B cell marker genes associated significantly to GNPNAT1, of which the number of positive correlations was 8/38 (21.05%) and the negative was 30/38 (78.95%). We plotted the survival heat maps of the significant B cell markers correlated significantly with the GNPNAT1 expression on Figure S2. Interestingly, all the positively markers showed a high probability of becoming high-risk factors in LUAD, of which 3/8 markers had elevated HR (p value < 0.05) ( Figure S2A). In comparison, there were 22/30 genes with low HR (p value < 0.05) in the negatively markers ( Figure S2B).

Discussion
The present study found that GNPNAT1 was highly expressed in LUAD tumor tissue and significantly predicts a poor prognosis. Univariate and multivariate Cox analyses indicated the GNPNAT1 might be a potential independent biomarker for LUAD prognosis. Then, the profiles of coex-pression and regulator networks of GNPNAT1 were analyzed. At last, we conducted a correlation analysis between GNPNAT1 and immune infiltration, common gene mutations, TMB, and immune signatures, finding that GNPNAT1 was related to gene mutations, TMB, most of the immune marker genes. The infiltration of GNPNAT1 in B cells may be one of the contributions of GNPNAT1 prognostic ability. Such work we have done aimed to guide future research in LUAD.
Early studies have shown that GNPNAT1 deficiency may reduce insulin secretion associated with type 2 diabetes [20]. In prostate cancer, both GNPNAT1 and UAP1 are highly expressed at the RNA and protein levels. In addition, high UDP-GlcNAc levels correlate with increased UAP1 levels in prostate cancer cells [21]. A recent study showed that prostate cancer contains higher levels of GNPNAT1 and UAP1 transcripts than benign tissue [11]. In addition, there were few other studies on GNPNAT1. In our story, the investigation of differential expression in LUAD indicated that GNPNAT1 was highly expressed in tumor tissues, which was subsequently validated in two independent GEO datasets. Thus, we carried out overall survival analysis in TCGA-LUAD, revealing that the high GNPNAT1 expression was associated with poor outcomes, which was also examined in GSE72094 cohorts. Besides, the Cox analyses further proved GNPNAT1 was an independent risk factor in LUAD. Therefore, our results indicate that GNPNAT1 upregulation occurs in LUAD, and as a potential diagnostic and prognostic marker, it is worthy of further clinical verification.
We explored the regulators responsible for GNPNAT1 dysregulation and found that GNPNAT1 was related to kinase networks, such as CDK1, PLK1, AURKB, CDK2, and ATM. These kinases mainly regulated mitosis, genome Z-score
In this study, we found that the E2F family was the main transcription factor constituting GNPNAT1 dysregulation. E2F is a group of genes that encodes a family of transcription factors in advanced eukaryotes. They participate regulating the cell cycle and DNA synthesis in mammalian cells [33]. Our analysis did not find miRNAs that are significantly  associated with GNPNAT1, which may be due to the fact that GNPNAT1 is mainly involved in the role of mRNA spliceosomes and is far away from miRNA cellular. Our results indicate that E2F1 is a vital regulator of GNPNAT1, and GNPNAT1 may play a role in regulating the cell cycle and proliferation ability of LUAD through this factor. KRAS mutation is the most common gain-of-function alteration, accounting for~30% of lung adenocarcinomas in Western countries and about 10% of Asian lung adenocarcinomas [34]. An EGFR mutation causes rapid cell growth, which helps lung cancer spread. Gene testing can identify it and help tailor lung cancer treatment [35]. STK11, mutated or deleted in a third of non-small-cell lung cancer patients, fosters an immunologically "cold" tumor microenvironment, with minimal penetration of tumors by T cells, rendering anti-PD1/PDL1 drugs ineffective [36]. The tumor suppressor gene TP53 is frequently mutated in human cancers. Abnormality of the TP53 gene is one of the most significant events in lung cancers and plays an important role in the tumorigenesis of lung epithelial cells [37]. TMB is defined as the number of mutations per DNA megabases. It was first assessed as a biomarker for ICI based on the observation of successful immune checkpoint inhibition in solid tumors with high TMB in NSCLC [38]. In our research, the above mutations and TMB were all correlated with GNPNAT1, which suggested that GNPNAT1 has the potential to utilize these correlations to obtain the possibility of biological treatment.
This study found that the GNPNAT1 expression had significant negative correlations with B cells infiltrating. Moreover, the subsequent analysis found that B cells could independently predict the outcome of LUAD. These findings indicated that B cell infiltration may be one of the important contributors to GNPNAT1 with prognostic value. Notable, we detailed analyzed the correlation between GNPNAT1 and B cell signatures finding 38/57 (66.67%) of the B cell marker genes associated significantly to GNPNAT1, including CDKN3, CCNA2, and GNG7. It is well known that CDKN3 is overexpressed in multiple human tumor tissues and cell lines [39,40]. The highly expression of CDKN3 in human cancer tissue may reflect the increased proportion of mitotic cells in the tumor [41]. The elevated CDKN3 expression is associated with the adverse outcome of LUAD. CCNA2, also known as cyclin A2, belongs to the highly conserved cyclin family and plays a key role in cell cycle control [42]. A recent study demonstrated that CCNA2 is a crucial regulator of NSCLC cell metastasis promoting invasion. It has been speculated that GNG7 may be involved in cell contact-induced growth arrest and thus block uncontrolled cell proliferation in multicellular organisms [43]. Correlate analysis provides an exhaustive characterization of the association between GNPNAT1 and immune signatures in LUAD patients, indicating that GNPNAT1 is a crucial player in immune escape in the tumor microenvironment. In addition, the correlation between GNPNAT1 and B cell markers is particularly vital to the prognosis of LUAD patients. It is worth noting that GNPNAT1 may be a key factor mediating B cell therapy, which is needed to be further studied in further research.
The present research also has some limitations. In this study, the cohorts included come from TCGA and GEO databases, which own undoubted academic recognition by  13 BioMed Research International most scholars. However, such sample distribution in these cohorts may not be consistent with the clinical population. Therefore, our research may have a selection bias for database selection. Besides, there is currently no wet experimental data explaining the relationship between GNPNAT1 and its mechanism in LUAD samples. Therefore, between GNPNAT1 and the prognosis of LUAD, more effort is needed to clarify the potential relationship.

Conclusion
This study provided multiple levels of evidence for the importance of GNPNAT1 in the development of lung cancer and its potential as a biomarker and prognostic predictor of LUAD. Our results indicate that the upregulation of GNPNAT1 in LUAD indicates a poor prognosis, which may be caused by multiple steps that affect RNA splicing and genomic stability and cell cycle. Besides, we found that GNPNAT1 has a significant correlation with most immune signatures. In particular, the relationship between GNPNAT1 and B cell marker genes needs to be noted, which might be a new target for future LUAD research.

Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.