Comprehensive Analysis of the Potential Prognostic Value of 11 Glycosylation-Related Genes in Head and Neck Squamous Cell Carcinoma and Their Correlation with PD-L1 Expression and Immune Infiltration

Background Head and neck squamous cell carcinoma (HNSCC) is one of the worst and most common malignant tumors. This study is aimed at studying the complex interaction between glycosylation-related genes and HNSCC. Methods The Cancer Genome Atlas (TCGA) contains gene expression profile data of HNSCC and normal tissues, as well as patient survival and clinical data. Combining five glycosylation-related gene sets, bioinformatics was used to analyze the expression of glycosylation-related genes in TCGA-HNSCC datasets and to identify prognostic risk markers, analyze their prognostic value, and the influence of glycosylation-related genes on the tumor immune microenvironment. Results Gene expression profiles and corresponding clinical information of 499 cases of HNSCC and 44 cases of adjacent tissues were obtained. Using 11 glycosylation-related genes to construct a prognostic risk score, the Kaplan-Meier curve analysis found that the overall survival of the high-risk group was significantly different than that of the low-risk group (P < 0.001). ROC analysis was used to evaluate the prognostic efficacy of prognostic risk markers, and the results showed that the prognostic risk markers had good efficacy in predicting the prognosis of patients. We also found that there is a correlation between glycosylation-related genes, PD-L1, and immunocyte infiltration, and there is a dynamic effect between the change in the copy number of glycosylation-related genes and the number of tumor-infiltrating immune cells. Conclusions Our research shows that glycosylation-related prognostic risk markers may be independent risk factors for the prognosis of HNSCC. We have found that there may be subtle links between glycosylation-related genes, PD-L1, and immunocyte infiltration, which has certain significance for exploring the occurrence and development of HNSCC and exploring the research of targeted therapy.


Introduction
Head and neck cancer is one of the main causes of global morbidity and mortality; among which more than 90% are squamous cell carcinomas, and most are derived from the stratified squamous mucosa of the lips, oral cavity, oropharynx, hypopharynx, and epithelial larynx [1]. Current studies have shown that the occurrence and development of head and neck squamous cell carcinoma (HNSCC) are related to smoking, long-term chewing of betel nuts, heavy drinking, and increasingly, high-risk human papillomavirus (HPV) infections [2,3]. Although with the improvement of medical standards and the development of advanced medical equipment, an increasing number of patients can be diagnosed and treated at an early disease stage. As one of the most common malignant tumors, the 5-year overall survival (OS) period for patients with HNSCC is only approximately 50% [4]. Tumor biomarkers are considered markers that can assist in the diagnosis and understanding of the occurrence and development of malignant tumors and can predict and identify specific cancer types, improve clinical diagnosis and treatment, and provide predictive information about the response to treatment [5]. Therefore, it is of great significance to explore effective tumor markers and design new treatment strategies for HNSCC patients.
The research found that mechanisms related to abnormal glycosylation play an important role in promoting the occurrence and development of HNSCC [6]. The study also found that there are many cytokines and immunosuppressive cells related to tumor immune escape in the HNSCC microenvironment [7]. Programmed death ligand 1 (PD-L1) positive advanced and metastatic HNSCC patients are likely to be sensitive to immunotherapy, immune checkpoints such as programmed death 1 (PD-1) treatment have received widespread attention, and related inhibitory antibodies can improve the prognosis of patients with HNSCC [8].
Abnormal glycosylation is one of the unique characteristics of cancer cells. Specific glycan changes and abnormal glycosylation processes are crucial in tumorigenesis and metastasis [9]. Studies show that glycan structures, glycosylated proteins, and glycosylation enzymes have influence on different steps of the metastatic process, including epithelialmesenchymal transition (EMT), migration, invasion/intravasation, and extravasation of tumor cells [10]. Studies have found that abnormal glycosylation plays a key role in promoting the occurrence and development of HNSCC. These changes promote the proliferation, invasion, and metastasis of HNSCC [11]. Therefore, identifying reliable prognostic markers is very important for selecting appropriate targeted therapy and improving the prognosis of HNSCC patients. In addition, the correlation between glycosylation-related genes and PD-L1, the expression of PD-L1 in HNSCC, and the abundance of immune infiltrating cells all need to be further studied. This study intends to analyze the transcriptome sequencing data, clinical data, and immune cell data of HNSCC, construct glycosylation-related genes as prognostic risk markers, and study its potential clinical application value. In addition, we also studied the infiltration of immune cells, the expression of PD-L1 in HNSCC, and the correlation with glycosylation-related genes.   Journal of Oncology expression profiles and clinical data of patients with HNSCC were extracted from The Cancer Genome Atlas (TCGA) dataset. Gene profile expression comparing HNSCC tissues (n = 499) and normal tissues (n = 44) and the corresponding clinical-related information were obtained.

Prognostic Glycosylation-Related Genes and Construction
of New Predictive Risk Markers. Using gene set enrichment analysis, five glycosylation-related gene sets were obtained. The gene set files are from the Molecular Signatures Database and were in GMP format. We analyzed the expression differences of the glycosylation-related genomes in HNSCC tissues and adjacent tissues. P < 0:05 was considered to have a significant difference in gene expression, and the core gene signature was constructed. Through single-factor and multifactor Cox regression analysis based on core genes, prognostic risk indicators were selected and prognostic risk indicators established.

Statistical Methods and Survival
Analysis. R language was used to screen for glycosylation-related genes with differential expression and to analyze the correlation between potential prognostic risk markers and other clinical variables. The R language loaded with the limma (linear model of microarray data) package was necessary for the statistical analysis of this study. Each patient was assigned a unique risk score based on the linear combination of gene expression levels: Risk score = expression value of gene 1 × β1 + expression value of gene 2 × β 2 + ⋯ + expression value of gene n × β n expression.
Using the risk score value to obtain the median as a cutoff value, 499 patients with HNSCC were divided into highand low-risk groups. The Kaplan-Meier curve was used for survival analysis, and ROC analysis was used to predict the prognostic efficacy of the labels. Univariate and multivariate Cox regression analysis was used to analyze the correlation between predictive signatures of glycosylation-related genes and other clinical variables. A P value less than 0.05 was considered statistically significant.
cBioPortalc is an online analysis database that provides visualization tools for research and analysis of cancer genetic data and uses molecular data obtained from cancer tissue and cytology to understand heredity, epigenetics, gene expression, and proteomics. STRING version 11.0 is an online tool that can modify and integrate information from many sources to analyze interactions between glycosylation-related genes.

Results
3.1. Glycosylation-Related Gene Set. The datasets included gene profiles of throat squamous cell carcinoma and oral squamous cell carcinoma. All information is derived from the Cancer Genome Atlas database (https://portal.gdc .cancer.gov/repository). We obtained and screened glycosylation-related gene sets on the GSEA website (https://www.gsea-msigdb.org/gsea/login.jsp), and we used five different gene sets (GO_PROTEIN_N_LINKED_GLY-COSYLATION, GO_ROUGH_ENDOPLASMIC_RETICU-LUM, HP_ABNORMAL_GLYCOSYLATION, HP_ ABNORMAL_PROTEIN_GLYCOSYLATION, and REAC-TOME_ASPARAGINE_N_LINKED_GLYCOSYLATION) to explore whether these five glycosylation-related gene sets are different between the paracancer sample and the tumor sample. We found that the abovementioned gene set in HNSCC was significantly different than that of the adjacent tissues and tumor tissues (FDRs were 0.003, <0.001, 0.007, 0.002, and 0.007, respectively, Figure 1). Next, P value < 0.05 considered that the expression of glycosylation-related genes in HNSCC and normal tissues was significantly different, the glycosylation gene set was screened, and the core genes are thus obtained. Table 1 shows genes with significant differences in expression in tumor tissues.     Journal of Oncology To verify whether the core genes were related to glycosylation, we used GO analysis and KEGG pathway enrichment analysis for more in-depth analysis. Figure 2 shows that the most enriched biological process (BP) is related to a variety of glycosylation processes, and molecular functions (MF) are related to multiple glycosylation pathways including glycosyl transfer, UDP-glycosyltransferase activity, O-Glycosyl compounds, and other related; KEGG pathway enrichment analysis involves protein processing in the endoplasmic reticulum, N-glycan biosynthesis, and glycosphingolipid biosynthesis, suggesting that the core genes screened are related to glycosylation.

Construction of Prognostic Risk Markers Based on Core
Genes. To define the OS data of HNSCC patients in association with core genes, single-factor Cox regression analysis was used to screen glycosylation genes related to OS of patients. Multivariate Cox regression analysis further verified the correlation between glycosylation-related core genes and patient OS. After the analysis, prognostic risk markers related to OS were screened, and the result was a prognostic risk marker composed of OS-related glycosylation-related genes, namely, PSMC1, NAGK, AREG, DDOST, ATP6V1E1, KDELR1, PLOD2, TMED10, ALG5, ARF3, and OST4 ( Table 2).
The regression coefficient (β) of each prognostic risk index was calculated by multivariate Cox analysis. The unique risk   Based on unique risk score, HNSCC patients were divided into a high-risk group and a low-risk group according to the median value of the risk score, with a risk score greater than the median being classified as a high-risk group (n = 249) and a risk score less than the median being classified as low-risk risk group (n = 250). The Kaplan-Meier (KM) survival curve was used to assess the prognostic difference between the two groups. The results showed that the OS of the high-risk group was significantly different from that of the low-risk group (P value < 0.001) (Figure 3(a)). To test the effectiveness of 11-gene glycosylation-related prognostic risk markers in predicting OS in patients with HNSCC at 3, 5, and 10 years in the diagnosis of HNSCC, we used receiver operating characteristic curve (ROC) analysis that further verified its diagnostic efficacy. The results showed that the respective area under the curve (AUC) was 0.725, 0.669, and 0.766, respectively (Figure 3(b)), indicating that prognostic risk markers perform well in predicting OS in patients with HNSCC. Based on the risk score, we drew a risk curve ( Figure 4) and found that the higher the risk score, the shorter the patient's survival time.

Analysis of the Expression and Correlation of 11
Glycosylation-Related Genes in Head and Neck Squamous Cell Carcinoma. In the cBioPortal online database, we analyzed the mutations of 11 glycosylation-related genes in HNSCC through clinical samples. The results showed that 78 patients (15.6%) had genetic mutations. Among them, the rate of mutation of the PLOD2 gene was 9%, including 37 cases of amplification and 8 cases of missense mutations. PSMC1 and ATP6V1E1 gene mutations accounted for 1.4%, TMED10 gene mutations accounted for 1.2%, and ALG5 gene mutations accounted for 1%; in addition, NAGK, AREG, DDOST, KDELR1, ARF3, and OST4 gene mutation rates were all lower than 1% ( Figure 5(a)).
We further analyzed the differential expression of PSMC1, NAGK, AREG, DDOST, ATP6V1E1, KDELR1, PLOD2, TMED10, ALG5, ARF3, and OST4 in HNSCC and normal head and neck tissues and found that compared with normal tissues, there are 11 glycosyl groups in HNSCC tissues. The expression of metabolism-related genes is significantly different (P < 0:001 for PSMC1, NAGK, DDOST, ATP6V1E1, KDELR1, PLOD2, TMED10, and ARF3; P = 0:013 for AREG; P = 0:002 for ALG5; P = 0:03 for OST4) ( Figure 5(b)). We classified the expression of each gene in tumor samples from HNSCC patients and then divided these patients into two subgroups based on the median expression value, namely, the high-expression group and the low-expression group. The Kaplan-Meier curve was used to verify whether the high or low expression of each gene was associated with the OS in patients with HNSCC. PSMC1, NAGK, AREG, DDOST, ATP6V1E1, KDELR1, PLOD2, TMED10, ALG5, ARF3, and OST4 were found to be related to a poor prognosis of HNSCC (P = 0:004, <0.001, 0.027, 0.035, 0.030, and 0.009, respectively) ( Figure 6). For PSMC1, NAGK, AREG, DDOST, ATP6V1E1, KDELR1, PLOD2, TMED10, ALG5, ARF3, and OST4, their role as independent biomarkers of prognosis needs to be further verified. Thus, we used ROC curve analysis to further confirm their prognostic effects (shown in Table 3), but the AUC values of the 11 glycosylationrelated genes that predicted OS in HNSCC patients were below 0.725, 0.669, and 0.766, respectively, indicating that they were predictors of poor prognosis.
The Pearson correlation coefficient was used to test the strength of the coexpression of glycosylation-related genes. As shown in Figure 7, we found that 11 gene pairs showed a coexpression relationship. Among these gene pairs, the  7 Journal of Oncology TMED10-PSMC1 pair has the strongest correlation (0.52); while ATP6V1E1 and ALG5 showed the lowest correlation with other glycosylation-related genes, and NAGK was negatively related to the other glycosylation-related genes. We used the online tool STRING version 11.0 to construct a protein-protein interaction (PPI) network. We found that the genes TMED10, ARF3, and KDELR1 were closely related and seemed to be associated with other genes.

Correlation between Prognostic Risk Markers and
Clinical Characteristics. We then evaluated whether the glycosylation-related prognostic risk markers composed of 11 oncogenes were related to clinical parameters including sex, age, grade, and clinical stage of patients with HNSCC. We used clinical parameters as covariates, analyzed the entire data set using univariate and multivariate Cox regression, and analyzed the correlation between prognostic markers and covariates. The results showed that in univariate Cox regression analysis, age, tumor clinical stage, and risk score were significantly correlated with OS, while sex    In summary, prognostic risk markers can be applied to predict OS in patients with HNSCC ( Figure 8).
We carried out a stratified analysis of each clinical feature and further evaluated the correlation between the risk score and the clinical feature. The result is shown in Figure 9. We found that when patients were stratified according to age, patients older than 65 years old have high-risk subgroups (n = 93) and low-risk subgroups (n = 82); while patients younger than 65 years old have high risk subgroup (n = 156) and low-risk subgroup (n = 168), there was a significant difference in OS between the two subgroups (P < 0:001). Patients were stratified by sex; for the females, 64 were in the high-risk subgroup and 69 were in the low-risk subgroup, while for males, there were 185 cases in the high-risk subgroup and 181 cases in the low-risk subgroup. The analysis showed that there were significant differences in OS rates between the two subgroups (P < 0:001).
All patients were stratified according to tumor grade (G) and were divided into two subgroups: GI/GII (n = 359) and GIII/GIV (n = 121). According to the risk score, there were 180 cases in the high-risk subgroup and 179 cases in the low-risk subgroup for GI/GII. We found a significant difference in OS between the two subgroups (P < 0:001). The same results were also observed in the two subgroups of patients with GIII/GIV tumors (P = 0:003). Patients were divided into T1 and T2 groups (n = 177) and T3 and T4 groups (n = 266) based on the T stage of the tumor. The survival rate of the low-risk subgroup in the T1 and T2 groups was significantly different from that of the high-risk subgroup (P = 0:003), and similar survival rates of the highand low-risk subgroups in the T3 and T4 groups were also found (P < 0:001). Depending on the presence or absence of lymph node metastases, these patients were divided into two groups. We found that the survival rates of the low- 10 Journal of Oncology risk and high-risk subgroups were significantly different between the two groups (P = 0:018 and P < 0:001, respectively). We divided patients into stage I and II groups (n = 94) based on the clinical stage of the tumor, which was high-risk subgroups (n = 36) and low-risk subgroups (n = 58). Stages III and IV were grouped together (n = 337 ), for the high-risk subgroup (n = 184) and low-risk subgroup (n = 153). In tumor stage III and IV groups, the 11 Journal of Oncology survival rate of patients in the high-risk subgroup was significantly different from that of patients in the low-risk subgroup (P < 0:001). In the tumor clinical stages I and II groups, there was no significant difference in the survival rate of patients in the two subgroups (P = 0:125). Since there was only 1 patient with distant metastasis, we could only stratify patients without distant metastasis to obtain a high-risk subgroup and a low-risk subgroup. The results suggested that there was a significant difference in survival rate between the two groups (P < 0:001).
3.5. The Relationship between PD-L1 and Glycosylation-Related Genes. We assess the difference in expression of PD-L1 in tumor tissue and normal tissues of HNSCC patients (Figure 10(a)). Compared to normal tissues, the expression of PD-L1 in HNSCC tissue increased significantly. We analyzed the relevance of PD-L1 and 11 glycosylation-related genes (Figure 10(b)). We have found that the expression of PD-L1 is significantly positively correlated with the expression of NAGK, and the expression of PD-L1 is significantly negatively correlated with the expression of OST4. HNSCC patients were divided into high expression groups and low expression groups according to the expression level of PD-L1, and the two groups were divided into four subgroups according to the risk score. We found that there is a significant difference in overall survival between 4 subgroups (Figure 10(c)).

The Effects of Genetic Changes in Glycosylation-Related
Prognostic Risk Markers on Immune Cell Infiltration. We analyzed the relationship between prognostic risk markers and the level of infiltration of six immune cell types ( Figure 11). The results showed that there was a significant negative correlation between the risk score and the level of infiltration of B cells, CD4+ T cells, CD8+ T cells, dendritic cells, and neutrophils. There was no significant correlation between the risk score and the level of macrophage infiltration. The results confirmed that signatures based on glycosylation-related genes were related to the TIME of HNSCC.
The effects of somatic copy number alteration (CNA) based on glycosylation-related genes on immune cell infiltration were further analyzed. The CNA of identified glycosylation-related genes significantly influenced the infiltration of B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells in HNSCC. These results indicated that glycosylation-related genes had a key regulatory effect on the TIME in HNSCC patients ( Figure 12).

Discussion
The occurrence, development, and invasion of tumors are accompanied by changes in glycosylation of related glycoproteins. The changes in the structure of carbohydrates on the surface of cancer cells play an important role in the course of cancer [12]. The process of glycosylation affects the complexity of the regulation of protein function and promotes tumor proliferation, invasion, and angiogenesis [13,14]. Abnormal glycosylation mediated by glycoproteins such 13 Journal of Oncology as E-cadherin, PD-1/PD-L1, EGFR, and CD44 can have a vital impact on the epithelial-mesenchymal transition and immune escape of HNSCC [15]. However, the specific mechanism of abnormal protein glycosylation on the occurrence and development of HNSCC has not been fully elucidated. We constructed 11 glycosylation-related prognostic