A Metabolic Gene Signature to Predict Overall Survival in Head and Neck Squamous Cell Carcinoma

Background Head and neck squamous cell carcinoma (HNSCC) is a common malignancy that emanates from the lips, mouth, paranasal sinuses, oropharynx, larynx, nasopharynx, and from other pharyngeal cancers. The availability of high-throughput expression data has made it possible to use global gene expression data to analyze the relationship between metabolic-related gene expression and clinical outcomes in HNSCC patients. Method In this study, we used RNA sequencing (RNA-seq) data from the cancer genome atlas (TCGA), with validation in the GEO dataset to profile the metabolic microenvironment and define potential biomarkers for metabolic therapy. Results We extracted data for 529 patients and 327 metabolic genes (198 upregulated and 129 downregulated genes) in the TCGA database. Carbonic anhydrase 9 (CA9) and CA6 had the largest logFCs in the upregulated and downregulated genes, respectively. Our Cox regression model data showed 51 prognostic-related genes with lysocardiolipin acyltransferase 1 (LCLAT1) and choline dehydrogenase (CHDH) being associated with the highest risk (HR = 1.144, 95% CI = 1.044 ~ 1.251) and the lowest risk (HR = 0.580, 95% CI = 0.400 ~ 0.839) in HNSCC, respectively. We next used the ROC curve to evaluate whether the differentially expressed metabolic-related genes could serve as early predictors of HNSCC. The findings showed an AUC of 0.745 and 0.618 in the TCGA and GEO analysis, respectively. Besides, the ability for the genes to predict clinicopathological HNSCC status was analyzed and the data showed that the AUC for age, gender, grade, stage, T, M, and N was 0.520, 0.495, 0.568, 0.606, 0.577, 0.476, and 0.673, respectively, in the TCGA dataset. On the other hand, the AUC for age, gender, stage, T, M, N, smoking, and HPV16-pos was 0.599, 0.531, 0.622, 0.606, 0.616, 0.550, 0.614, 0.519, and 0.397, respectively, in the GEO dataset. Conclusion Taken together, our study unearths a novel metabolic gene signature for the prediction of HNSCC prognosis based on the TCGA dataset. Our signature might point out the metabolic microenvironment disorders and provides potential treatment targets and prognostic biomarkers.


Background
Head and neck squamous cell carcinoma (HNSCC) is a malignancy that originates from the lips, mouth, paranasal sinuses, larynx, nasopharynx, and from other pharyngeal cancers [1]. The HNSCC is the sixth most common type of malignant tumors, with more than 655,000 new cases and 90,000 deaths every year [2]. Smoking, drinking, and human papillomavirus (HPV) infections are considered risk factors for the occurrence and development of HNSCC [3]. Worryingly, due to the lack of early manifestation of symptoms or diagnosis, local recurrence, and metastasis, the 5-year survival rate still lags at below 50% [4]. The occurrence and development of HNSCC are complex processes that are mediated by multiple molecules and pathways. Kim et al. reported the mechanistic and functional roles of CXCR7 as a key regulator of oncogenic TGF-β1/Smad2/3 signaling in HNSCC [5]. In addition, Hsu et al. defined the oncogenic driver role of atypical cadherin 1 (FAT1) in the mediation of proliferation, cell-death evasion, and chemoresistance in oral squamous cell carcinoma (OSCC) [6]. Moreover, another study revealed that the expression of E6 and E7, the HPV virus oncogenes, inactivates p53 and retinoblastoma (RB), respectively [7]. However, there is a wide spectrum of histological tumor markers for HNSCC and multiple anatomical sites. Therefore, it is possible to identify more valuable HNSCC drug targets by screening for changes in gene function networks linked to tumor formation and progression.
The activation of oncogenes and lack of tumor suppressors contribute to metabolic reprogramming in cancer, leading to improved nutrient uptake that feed biosynthetic pathways [8]. In the 1920s, Otto Warburg first reported that tumors took up distinctly more levels of glucose compared with normal tissues, indicating that these cells were biased towards shuttling glucose via the glycolytic pathway [9]. Recent studies have shown that immune cells have unique metabolic characteristics that affect their immune function. For example, macrophage polarization is associated with unique metabolic characteristics related to iron, energy, and lipid metabolism [10,11]. Whereas a number of studies have investigated that the prognostic role of these metabolic genes in cancer, data on the role, and mechanism of metabolism still remains scant. In their studies, Hu et al. found that mutationally activated KRAS robustly increased the glutathione biosynthesis and intracellular cystine level in lung adenocarcinoma [12]. On the other hand, Yoo et al. found that the    [13]. There is, however, no data on systematic assessment of the metabolic-related genes that could reliably predict the overall survival (OS) in HNSCC patients or characterize the patient response to immunotherapy. The availability of high-throughput expression data has made it feasible to utilize global gene expression data to analyze the relationships between the metabolic-related gene expression and clinical outcomes in HNSCC patients. In this study, we used RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA) with validation from the Gene Expression Omnibus (GEO) dataset to profile the metabolic molecular microenvironment and assess their importance as biomarkers for metabolic therapy.

Data Collection.
We extracted RNA-seq data for HNSCC patients from TCGA (http://www.cancergenome.nih.gov), a web-based resource that provides a user-friendly interface and depository for mRNA expression data. We validated all the data from the GSE65858 data set obtained from the GEO database and then extracted all the metabolic-related genes contained in the Gene set enrichment analysis (GSEA) database. One millionth transcript normalization and log2 transformation were used for expression profiling. The selection of metabolic-related genes for prognostic analysis was not only consistent with the expression patterns in the TCGA cohort but also listed in the GSE65858 data set.

Development of the Metabolic-Related Prognostic Gene
Signature. Lasso-penalized Cox regression and Univariate Cox regression analyses were used to build the metabolicrelated prognostic gene signature [14]. The signature was defined as risk score = ðcoefficient mRNA1 × expression of mRNA1Þ + ðcoefficient mRNA2 × expression of mRNA2Þ + ⋯+ ðcoefficient mRNAn × expression mRNAnÞ. The related clinical data for HNSCC patients were also downloaded and evaluated. Based on the median, we denoted the data as either low-risk (<median number) or high-risk (≥median number) group. We used Kaplan-Meier survival analysis to analyze the survival rate for both the study and control groups.

Building and Validating a Predictive Nomogram.
Here, we developed a nomogram [15] for the prediction of the occurrence of cancer events, such as recurrence or death. We then used the time-dependent receiver operating characteristic (ROC) curve to assess the predictive accuracy of the developed prognostic signatures for patients with HNSCC. Univariate and multivariate Cox regression analyses were employed to analyze the relationship between immunerelated genes and clinicopathological manifestations.

External Validation of the Prognostic Gene Signatures.
We downloaded the validated GSE65858 dataset in the GEO database. Following the assessment of the risk scores for the patients with genetic characteristics and carrying out the ROC analysis as well as the Kaplan-Meier analysis, we robustly demonstrated the similarity between the constructed nomogram and the TCGA-HNSCC cohort. To understand the mechanisms underlying defining the gene signatures in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we used GSEA to search for rich terms in C2 in the TCGA-HNSCC or GSE65858 cohort. A P < 0:05 and a false discovery rate q < 0:25 were considered to be statistically significant. The mRNA expression level (Oncomine and TIMER database) and protein expression profile (The Human Protein Atlas database)        Figure 3: The result of univariate and multivariate Cox regression analysis showed that our prognostic model is an independent prognostic factor for OS.  Mediators of Inflammation risk scores ( Table 2). The samples were then divided into high-and low-risk groups using the median risk score value as a cut-off.

Survival Results and Multivariate Examination.
Our OS analysis demonstrated that, unlike the low-risk group, the high-risk HNSCC group was associated with a worse prognosis (P < 0:01) (Figure 1). We showed that the mortality rate was higher in the high-risk HNSCC patients, and the increase in the patients' risk score was proportional to the death rate ( Figure 2). Next, we used the univariate and multivariate COX analyses to determine the risk factors which defined the prognostic model based on thirty metabolic-related genes. We demonstrate that the 30 metabolic-related gene signatures could robustly and independently predict prognosis and OS (Figure 3). On the other hand, we evaluated whether the metabolic-related gene patterns could serve as an early predictor of incidence in HNSCC.  Figure 4).

Gene Set Enrichment Analyses.
Here, we split the samples into high-and low-risk groups to distinguish the potential functions and elucidate the significant survival differences in the GSEA. Annotated gene set c2.cp.kegg.v6.0.symbols.gmt was selected as the reference gene sets, which included terms with NOM < 0:05. Gene set permutations were executed multiple times for every examination. A great majority of the metabolic-related pathways such as galactose metabolism, nicotinate/nicotinamide, and pantothenate/-COA biosynthesis or metabolic disease-related perturbations were enriched in the high-risk group. On the other hand, most of the nonmetabolic-related pathways such as base excision repair, spliceosome, homologous recombination, nucleotide excision, and DNA replication were enriched in the low-risk group ( Figure 6 and Table 3).

Online Database Analysis.
To provide new insights into the potential functions, expression patterns, molecular mechanisms, and distinct prognostic value, we used multidimensional survey techniques to explore CA6, CA9, LCLAT1, and CHDH based on variations in the copy numbers or gene expression profile in the HNSCC patients. In agreement with our findings, data from both the TIMER database and Oncomine showed that CA6 was significantly downregulated, while CA9 was significantly overexpressed in HNSCC patients (Figures 7 and 8). Despite the limited data in the Oncomine, the LCLAT1 mRNA expression was overexpressed while the CHDH expression was downregulated in HNSCC in the TIMER database. Representative protein expression levels for CA9, LCLAT1, and CHDH were explored in the HPA database as shown in Figure 9. We showed that CA9 has the most frequent genetic variations (10%), and the most pronounced changes were amplification of mutations ( Figure 10). In summary, we verified the abnormal expression profiles for these genes in HNSCC, and the genetic changes might explain the abnormal expression.

Discussion
Proliferating cancer cells must maintain sufficient energy and a library of metabolic intermediates to build the macromolecules required for growth. The molecules include DNA, pro-teins, and lipids [16]. Because the metabolic profile could distinguish the tumor cells from the normal cells, metabolic signaling pathways have become ideal targets for therapeutic intervention for cancer patients. In this study, we identified a novel and effective metabolic-related prognostic gene  Figure 7: Differential expression of CA6, CA9, LCLAT1, and CHDH between tumors and normal tissues based on TIMER database. 9 Mediators of Inflammation signature based on the TCGA dataset and validated it in the GSE65858 dataset. Our constructed signature had a strong prognostic value and may represent the metabolic status of patients with HNSCC. Therefore, the signature could be used as a potential biomarker and therapeutic target in the metabolic signaling pathways.
In our study, we downloaded transcriptome data from the TCGA database and verified it using the GEO dataset as well as the metabolic-related genes extract from the GSEA metabolic signaling pathways. We first evaluated the relationship between the differentially expressed RNA, immune-related genes, and transcription factors in HNSCC patients. Univariate Cox regression model found 51 survival-related genes, whereby LCLAT1 was associated with the highest risk while CHDH denoted the least risk. Cardiolipin (CL) types of polyunsaturated fatty acids, especially DHA (C22: 6n3), increased in ALCAT1-expressing cells, while C16-C18 fatty acids significantly decreased [17]. A recent study showed that ALCAT1 is critical for coupling mitochondrial respiration and metabolic plasticity [18]. Wang et al. reported that forced expression of ALCAT1 in primary hepatocytes led to multiple defects including steatosis, defective autophagy, and mitochondrial dysfunction [19]. Meanwhile, ALCAT1 can promote ROS production and is critical for coupling mitochondrial respiration and metabolic plasticity [20]. However, there was limited data on the role of ALCAT1 in tumors. Here, we hypothesize that the ALCAT1 might play a regulatory role in cardiolipin remodeling in response to oxidative stress and stimulate mitochondrial activity in HNSCC cancers. Choline dehydrogenase (CHDH) localizes to the mitochondrion, and variations in this gene can affect susceptibility to choline deficiency. CHDH strongly predicted clinical outcome in breast cancer patients receiving tamoxifen monotherapy [21]. Choline is an essential nutrient required for methyl group metabolism, and CHDH is associated with an increased risk of breast cancer [22]. There is no available data on the role of CHDH in HNSCC.
By the use of ROC curves, we next interrogated whether the metabolic-related gene patterns could serve as an early predictor for the incidence of HNSCC. Our model demonstrated an AUC of 0.745 and 0.618 in the TCGA and GEO datasets. Integrating our prognostic model with the clinicopathological indicators enhanced the prediction sensitivity and specificity for the 1-, 2-, and 3-year OS, thus, better clinical management. Our further analysis of the survival difference using the GSEA showed that the majority of the metabolic-related pathways such as galactose and nicotinamide metabolism were enriched in the high-risk group while most of the nonmetabolic-related pathways were enriched in the low-risk group. Galactose is an essential molecule and plays a pivotal role in energy transfer and galactosylation of complex molecules. On the other hand, nicotinamide adenine dinucleotide (NAD) plays a central role in energy metabolism and integrates cell metabolism with signaling and gene expression [23]. NAD biosynthesis is dependent on nicotinamide/nicotinate single-nucleotide adenylate transferase [24]. Therefore, high-risk patients may benefit from metabolic therapy, while low-risk patients may benefit from nonmetabolism-targeted therapy. However, there is a need for more studies on the relationship between gene signatures, metabolic microenvironment, and metabolic therapies. Our data provide a promising direction in elucidating the underlying molecular mechanisms for the interrogated signatures. In conclusion, our signatures may reflect metabolic microenvironment disorders and provide potential biomarkers for metabolic therapy and prediction of prognosis after treatment.
Our analysis of the 327 metabolic genes in the TCGA database showed that the CA9 had the largest logFC in the upregulated category, while CA6 was downregulated. Carbonic anhydrases (CAs) are a large class of zinc metal enzymes that catalyze the reversible hydration of carbon dioxide. They are involved in various biological processes, including bone resorption, respiration, calcification, and acid-base balance. A previous study showed that pancreatic ductal adenocarcinoma (PDACs) cells that express an activated KRAS increase the expression of CA9, via stabilization of hypoxia-inducible factor 1 subunit alpha (HIF1A) and HIF2A, which eventually regulates the pH and glycolysis [25]. Similarly, CA9 is an independent prognostic factor for OSCC patients and therefore a potential therapeutic target [26]. CA6 encodes several isoenzymes which are only found in salivary glands. Saliva and proteins may play a role in the reversible hydration of carbon dioxide. CA6 is a specific marker for salivary gland serous acinar cells and acinar cell carcinoma (AciCC). CA6 has the same sensitivity and  specificity as DOG1 in the differential diagnosis of AciCC and breast analogs (MASC) [27]. However, data on the role of CA9 and CA6 in the prognosis of human HNSCC remains scant. Our study identified a novel metabolic gene signature for the prediction of HNSCC prognosis based on the TCGA data set. Our signatures might reflect the disorders in the metabolic microenvironment and provide potential biomarkers for metabolic therapy and monitoring of the treatment response. However, the metabolic gene signatures for prediction must be verified in more independent cohorts and functional experiments. It is, however, important to mention that our study was limited by the relatively small sample size and the fact that our results were not verified in clinical samples.

Conclusion
Taken together, our data defined a novel metabolic gene signature for the prediction of HNSCC prognosis based on the TCGA dataset. Our signatures reflect the metabolic microenvironment disorders and provide useful biomarkers for metabolic therapy and prediction of the response to the treatment.  Figure 10: CA6, CA9, LCLAT1, and CHDH gene expression and mutation analysis.