Analyzing Prognostic Hub Genes in the Microenvironment of Cutaneous Melanoma by Computer Integrated Bioinformatics

Cutaneous melanoma (CM) is attracting increasing attention due to high mortality. In response to this, we synthetically analyze the CM dataset from the TCGA database and explore microenvironment-related genes that effectively predict patient prognosis. Immune/stromal scores of cases are calculated using the ESTIMATE algorithm and are significantly associated with overall patient survival. Then, differentially expressed genes are identified by comparing the immune score and stromal score, also prognostic genes are subsequently screened. Functional analysis shows that these genes are enriched in different activities of immune system. Moreover, 19 prognosis-related hub genes are extracted from the protein-protein interaction network, of which four unreported genes (IL7R, FLT3, C1QC, and HLA-DRB5) are chosen for validation. A significant negative relationship is found between the expression levels of the 4 genes and pathological stages, notably T grade. Furthermore, the K-M plots and TIMER results show that these genes have favorable value for CM prognosis. In conclusion, these results give a novel insight into CM and identify IL7R, FLT3, C1QC, and HLA-DRB5 as crucial roles for the diagnosis and treatment of CM.


Introduction
Cutaneous melanoma (CM) is highly malignant, accounting for more than 80% of deaths from skin disease [1,2]. As of late, the incidence and mortality of CM are increasing. Fortunately, innovative immunotherapy strategies, including therapies using anti-PD1, anti-PD-L1, and anti-CTLA4, have effectively improved patient prognosis [3][4][5]. However, only approximately 40% of patients who receive immunotherapy are able to achieve effective long-term remission [6]. Besides, some patients experience adverse effects due to the complicated interaction between the tumor microenvironment and cancer cells [7]. erefore, identifying more effective prognostic biomarkers is of extraordinary importance for the therapy of CM. e tumor microenvironment is composed of immune cells and stromal cells, which is of great significance for tumor diagnosis and prognostic assessment [8,9]. High levels of immune cell infiltration can be regarded as an indicator of a favorable prognosis in cancers [10], implying that effectively assessing the condition of the microenvironment might hold promise for further treatment. In addition, techniques for anticipating tumor purity utilizing gene expression information from e Cancer Genome Atlas (TCGA) database have been created [11,12]. For instance, Yoshihara et al. [11] designed an algorithm called ESTIMATE, which has been used to analyze cancer data in numerous studies, showing the efficiency of the algorithm [13][14][15]. As a frontier study, we want to explore the microenvironment-related genes that effectively predict CM development by integrating the TCGA database and immune-related scores. quantified expression levels of mRNA in the form of fragments per kilobase per million (FPKM). Dataset was submitted by the Genomic Data Commons using the Illumina platform ( August 15, 2017). e corresponding clinical information included gender, age, TNM grade, and New-American Joint Committee on Cancer (AJCC) grade; data on survival and prognosis were also acquired from the TCGA database. Moreover, information on identified gene alterations was obtained from the cBioPortal database (https://www.cbioportal.org/). Immune scores and stromal scores were determined with the ESTIMATE algorithm [11] on the downloaded data. Because no other valid CM datasets could be found in other public databases, 470 patients were randomly divided into an experimental group and a testing group for analysis and validation.

Differential Analysis of Expressed Genes.
e data processing was carried out using the 'limma' package [16] in R software. e thresholds for definite differentially expressed genes (DEGs) were set to adj. p < 0.05 and fold change >2.

Heatmaps and Clustering Analysis.
Heatmaps and clustering of DEGs were created utilizing the 'heatmap' (R package). Correlation heatmaps of the identified genes were drawn with the 'corrplot' (R package). e Pearson correlation coefficient was used for correlation analyses, in which >0 shows a positive relationship and >0.5 shows a strong relationship.

Visualization of Gene Expression Levels and Chromosome
Locations.
e R package 'OmicCircos' [17] was utilized to visualize the expression levels and chromosome areas of the top 100 significant DEGs.

Functional Analysis of DEGs.
Functional enrichment analysis of the DEGs utilizing the David online dataset [18] was performed to identify Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/) pathway and GO categories, the later including biological processes (BPs), molecular functions (MFs), and cellular components (CCs). A false discovery rate (FDR) <0.05 was used as cut-off criterion.
e outcome of the enrichment analyses was pictured by the 'GOplot' package [19].

Clinical Correlation and Survival Analysis of Genes.
e R package 'ggstatsplot' (https://cran.rproject.org/web/ bundles/ggstatsplot/) was used to reveal the relationships between the gene expression levels and patient clinical features. Independent samples T test or one-way analysis of variance (ANOVA) was used as fitting. e survival analysis was led utilizing the 'survial' (https://CRAN.R-project.org/ package�survival) package to evaluate the correlation between the overall survival (OS) and the gene expression level; the K-M curves were drawn using the 'survminer' (https:// CRAN.Rproject.org/package�survminer) package. e correlation was determined by the log-rank test, with p < 0.05 as the threshold.

Protein-Protein Interaction Analysis.
A protein-protein interaction (PPI) network was constructed to demonstrate the association among the proteins encoded by the identified DEGs. e network was created through STRING version 10.5 (https://string-db.org/), and the outcomes with a minimum correlation score of 0.4 were reconstructed via Cytoscape programming [20]. e connectivity degree of each node in the organization was also calculated. In addition, cytoHubba, a Cytoscape module application that provides an easy way to analysis significant hubs in biological networks [21], was used for comprehensive evaluation.

TIMER Online Analysis of Identified Genes.
To analyze the relationship of the expression level of the identified genes with tumor purity and the infiltration of immune cells (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells), we used the web-based instrument TIMER (https://cistrome.shinyapps.io/clock/) [22].

Immune/Stromal Scores Are Both Significantly Associated
with Different Pathological Types. We obtained the gene data and clinical profiles of 470 CM patients from the TCGA database. Among these patients, 180 (38.3%) were female, and 290 (61.7%) were male. e average age of the patients was 58.2 years old, and 45.1% were over 60 years of age. All CM patients with complete gene data and clinical data in the TCGA database were included for this study. rough the ESTIMATE algorithm, immune scores went from −1,481.02 to 3,768.83, and stromal scores were distributed between −886.07 and 584.93 (Figures 1(a) and 1(b)). e average immune score of stage I patients was the highest among the four grades, and the average scores of stage II and stage IV patients were the two least. Differences in immune scores among the four stages were statistically significant ( Figure 1(a), p < 0.001). Interestingly, the order of the stromal scores of the different stages from highest to lowest, is as follows: stage IV > stage III > stage I > stage II (Figure 1(b), p < 0.01), showing that immune scores might be more significant in the relationship of pathological types than stromal scores, despite they all had obvious differences between various stages.
In light of the AJCC melanoma guideline, BRAF mutations are frequent in CM patients and play a significant part in the development of CM [23]. We mapped the immune/stromal scores dependent on the BRAF mutation status in CM patients. Cases with BRAF mutation had lower immune scores and stromal scores, though this distinction was not measurably huge (Figures 1(c) and 1(d)).
To investigate the relationship between OS and the immune scores and/or stromal scores, we divided the 470 patients into high-and low-score groups as indicated by their scores. As shown in the Kaplan-Meier (K-M) survival analysis (Figure 1(e)), the middle OS of patients with a high immune score was higher than that of patients with a low score (p � 0.002). Consistently, patients in the high score group of stromal scores had a longer middle OS than those in the low-score group ( Figure 1(f ), p � 0.04).

Differential Analysis of Expressed Genes with Immune/ Stromal Scores.
To explore the DEGs related to the immune scores and/or stromal scores in CM, we analyzed each of the 235 cases in the experimental group. Immune-related genes were screened by contrasting the high score group and the lowscore group; 981 genes were upregulated and 94 genes were downregulated in the gathering with high immune scores. A comparison of the stromal scores show that 1,427 genes were upregulated and 28 genes were downregulated in the high score group (Figures 2(a) and 2(b)). Moreover, there were 833 coupregulated genes and 4 co-downregulated genes in both the high immune and stromal groups (Figures 2(c) and 2(d)). To get more accurate results, we focused on 837 common DEGs for subsequent analysis in this study.

Visualization of Gene Expression Levels and Chromosome
Locations. e gene levels and chromosomal locations of the top 100 significant DEGs ( Figure 3) were visualized in 10 randomly chose cases from the experimental group based on survival time (1-10 years). ese DEGs were distributed in chromosomes other than chromosomes 13/14/18/Y, and chromosomes 3/6 contained the most genes. Besides, the top 5 genes as indicated by the adj. p, i.e., SASH3, GNGT2, SNX20, WAS, and CORO1A, were situated in X, 17, 16, X, and 16, separately. e top 5 genes (CD8B, SIRPG, NKG7, PRF1, and CD3E) according to the fold change were distributed on chromosomes 2, 20, 19, 10, and 11, respectively.

Functional Enrichment and Survival Analysis of DEGs.
To reveal the potential function of 837 DEGs, GO and KEGG pathway investigations were conducted. e most significant GO terms for BPs, MFs, CCs, and KEGG pathways are shown in Figure 4. To investigate the value of each DEGs in patient prognosis, we plotted K-M survival curves based on the patient data. Within 837 DEGs, a sum of 277 genes were demonstrated to be significantly correlated with prognosis in the log-rank test (p < 0.05); the top 10 significant genes are shown in Figure 5.

Functional Analysis of Prognostic DEGs.
To better comprehend the function of the prognostic DEGs, which were chosen for further functional analysis. We found that their function was enriched in various GO terms, like the immune response, inflammatory response, T cell costimulation, the regulation of the immune response and the adaptive immune response (Figure 6(a)). As far as cellular components, the external side of the plasma membrane was the top GO term ( Figure 6(b)). In addition, some molecular component GO terms, like receptor activity and transmembrane signaling receptor activity, were enriched ( Figure 6(c)). rough the KEGG pathway, we found the cytokine-cytokine receptor interaction was highly associated with these genes (Figure 6(d)).

Protein-Protein Interactions and Correlation Analysis of
Genes. To elaborate the interplay among the proteins encoded by the 277 prognostic DEGs, we utilized the STRING program to construct a PPI network, which included 237 nodes and 2,676 edges. Additionally, 40 of the 277 DEGs were excluded from the PPI network because their interaction scores were lower than 0.4. Among the 237 genes in the network, 19 hub genes (CD3E, CD69, CD274, KLRD1, PDCD1, LAG3, HLA-DRB5, IL7R, LILRB4, TNFRSF18, TRAT1, PDCD1LG2, C1QB, GZMK, C1QC, CXCL13, FLT3, IDO1, and IRF1) were screened according to the comprehensive assessment with 12 algorithms in the cyto-Hubba plugin, and there were strong positive correlations between these genes (Figures 7(a) and 7(b)). We also obtained more information about their alterations through cBioPortal (https://www.cbioportal.org/), as shown in Table 1. Moreover, four (IL7R, FLT3, C1QC, and HLA-DRB5) of the 19 genes were selected for subsequent analysis because they have not been recently reported to be correlated with clinical outcome of CM.

Correlation of the Four Hub Genes' Expression with Clinical Features.
To further reveal the value of the 4 hub genes, we explored the correlations between their expression levels and pathological types as well as OS. All 4 hub genes were essentially differentially expressed in CM patients with various T grades, although three (IL7R, FLT3, and C1QC) of them showed significant differences between pathological types, with lower expression levels indicating advanced pathological degrees and higher T grades (Figures 8(a) and 8(b)). In addition, Figure 8(c) showed that a high expression level of four genes is favorable for patient prognosis. e above results demonstrated that the 4 hub genes have great clinical value for CM.

Correlation of the Four Hub Genes' Expression with the Tumor Microenvironment.
e tumor microenvironment comprises of cancer cells, infiltrating immune cells, and stromal cells. We used TIMER to reveal relationships between the expression levels of the 4 genes and both the proportion of immune cells and tumor purity. Remarkably, IL7R, FLT3, C1QC, and HLA-DRB5 were all negatively correlated with cancer purity. Similarly, higher expression levels of the 4 genes obviously associated with a high infiltration of B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells (Figures 9(a)-9(d)).

Validation of the Four Hub Genes in the Testing Group.
To verify whether the 4 hub genes identified from the experimental group also have prognostic value in the testing group, we plotted K-M survival curves of each gene based on the data of the testing group. Notably, each of the 4 hub genes was approved to be essentially linked to the clinical outcome of CM patients ( Figure 10).

Discussion
CM is one of the most aggressive tumors, which has drawn increasing attention worldwide. Although some advanced treatments have been used in clinical practice, the results are still unsatisfactory [5]. erefore, it is urgent to look for better biological biomarkers that could be used to guide clinical treatment. Melssen et al. [24] have demonstrated the significant correlation between the tumor microenvironment and melanoma, so we hope to discover new biomarkers that are related with patient prognosis by analyzing CM data from the TCGA database. Interestingly, through comparing expression level of genes in the CM dataset with high versus low immune/stromal scores, 277 prognostic genes were implicated in Tcell activation and receptor ligand activity. Besides, 4 of the 19 hub genes have not recently been reported and were validated in the testing group.
To start with, we got 837 DEGs through a differential analysis between the parts of high and low immune/stromal score, and most of the DEGs were selected to participate in immune-related processes in the cancer microenvironment, as shown by the functional enrichment analysis (Figure 3). e results indicated that the interaction of immune cells and extracellular matrix molecules is critical for constructing the tumor microenvironment of CM, which is steady with past reports [25][26][27]. Figure 3 shows a positive association between expression level and prognosis of CM, and most genes were located on chromosomes 6/19. References [28,29] demonstrated that the change of chromosome 6 is the most frequent karyotypic abnormality in melanoma, which is significantly associated with primary tumor development. en, 277 prognostic genes were screened through the OS analysis of 837 DEGs in CM patients. A PPI network was constructed with the identified genes that were significantly associated with the immune/inflammation response. By constructing the network, 19 highly interconnected genes were identified, of which 15 genes (e.g., CD3E, CD69, CD274, KLRD1, PDCD1, LAG3, LILRB4, TNFRSF18, TRAT1, PDCD1LG2, C1QB, GZMK, CXCL13, IDO1, and IRF1) were shown to be correlated with the outcome of CM patients in past reports [30][31][32], further indicating that our results based on big data mining are valid. e other four genes not previously discovered to be correlated with the outcome of CM may likewise be crucial for the development of tumor, which include the interleukin receptor encoding gene IL7R, cytokine receptor FLT3, complement component C1QC, histocompatibility complex HLA-DRB5. rough the analysis of the correlations between these four genes and clinical features, we observed a critical negative relationship among gene expression levels with pathological stages, particularly T grade. Furthermore, the TIMER results for each identified gene showed that all of them were negatively associated with tumor purity, while their expression levels were positively correlated with the infiltration of immune cells in CM (Figures 9(a)-9(d)).
Finally, by cross validation with the testing group, a valid dataset of 235 CM cases, the expression of the IL7R, FLT3, C1QC, and HLA-DRB5 genes was significantly correlated with patient prognosis (Figure 10). In previous report, IL7R            CXCL13  TRAT1  GZMK  CD3E  LILRB4  C1QB  C1QC  PDCD1  LAG3  IRF1  KLRD1  IL7R  CD69  FLT3  IDO1  CD274  PDCD1LG2  TNFRSF18  HLA-DRB5   CXCL13  TRAT1  GZMK  CD3E  LILRB4  C1QB  C1QC  PDCD1  LAG3  IRF1  KLRD1  IL7R  CD69  FLT3  IDO1  CD274  PDCD1LG2  TNFRSF18 HLA-DRB5   T1  T2  T3  T4  TNM   IL7R   T1  T2  T3  T4  TNM   T1  T2  T3  T4  TNM   2   4   6   8   10   C1QC   T1  T2  T3  T4  TNM   2  has been distinguished as an important part in the development of immune cells, like lymphocytes [33]. Chandran et al. [34] observed that selectively expressed IL7R of T effector clones could significantly give rise to long-live memory cells after receptive exchange to cancer patients and considered IL7R as a vital part in the immune system of tumor. FLT3 propagates signals in the cell to maintain cellular functional activity, and its ligand is also important for inducing cancer relapse and antitumor immune reactions [35]. Powell et al. [36] reported that FLT3 could contribute on the penetration of CD8+ T cells and stimulatory dendritic cells in the microenvironment, further affecting the growth of cancers. C1QC, a crucial part of the classical complement pathway, yet there are few studies on C1QC in CM. erefore, we hope to explore the biological value of C1QC through C1QB, basing on the high relationship between them (Figure 7(b)). Recent studies announced C1QB as a strong biomarker for the classification of CM patients, which could be utilized for both early discovery of melanoma and follow-up monitoring of patients. [37,38]. Also, the protein encoded by HLA-DRB5 constitutes the HLA class II molecule, which assumes a focal part in the activity of immune cells [39]. Bierer et al. [40] found that the HLA class II genotype can effectively predict the reaction of renal cancer to combined immunochemotherapy and achieve a better prognosis of patients. Other studies also proposed that HLA class II expression can be seen as a good prognostic factor for malignant, while the lack of its expression might lead to bad outcome of patients [41]. ese reports further confirm the validity of our results. e interaction between CM and its tumor microenvironment significantly influences cancer development and further affects its diagnosis, treatment and prognosis. Previous studies have provided much evidence on how the activity of cancer characteristic genes contributes to the status of the tumor microenvironment. Within our research, we focused on the features of microenvironment-related genes that can influence patient prognosis by affecting the development of CM. Critically, our findings could provide effective data for elaborating the intricate interaction between the CM and its microenvironment.

Conclusion
In this research, we discovered 19 hub genes closely related to CM prognosis, of which 4 genes (IL7R, FLT3, C1QC, and HLA-DRB5) unreported were validated to be significantly favorable for patient outcome in the CM testing dataset and may become crucial biomarkers for CM. Furthermore, it would be fascinating to investigate whether this new group of genes could bring more prominent prognostic value than a single gene. us, more studies are needed to explore the potential association between CM prognosis and tumor microenvironment in a more comprehensive way.
Data Availability e raw data are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.

Authors' Contributions
Ke Sha and Guangyao Li made substantial contributions to the design of the present study. Guangyao Li conducted statistical analysis. Guangyao Li wrote the manuscript which was revised by Ke Sha. All the authors read and approved the final manuscript.