Identification of Melanoma Subsets Based on DNA Methylation Sites and Construction of a Prognosis Evaluation Model

Background Melanoma is a lethal skin malignant tumor, and its formation or development is regulated by various genetic and epigenetic molecules. Although there are traditional methods provided for the doctors to evaluate the patients' prognosis or make the diagnosis, the novel method based on epigenetic markers is still needed to make the early diagnosis. Results We identified 256 melanoma-independent prognosis-related methylation sites (P < 0.0001) and divided patients into seven methylation subgroups. Methylation levels and survival time in the C2 subgroup were lower than that of other clusters (P < 0.05). We established the predicted model of prognosis risk for melanoma using the significantly changed methylation sites in C2. The model efficiently divided patients into high- and low-risk groups (area under the receiver operating characteristic curve, 0.833). Risk scores and patient survival time were negatively correlated (rs = −0.325, P < 0.0001). Genes corresponding to the independent prognosis-associated methylation sites were enriched in cancer- and immunology-related pathways. We identified 35 hub genes. DOK2, GBP4, PSMB9, and NLRC5 were significantly changed according to methylation subgroups, survival, tumor stages, and T categories and were positively correlated, which was validated in the testing group (P < 0.05). The levels of DOK2, GBP4, PSMB9, and NLRC5 had an opposite trend to their methylation sites in patients with poor prognosis. Conclusions We identified seven DNA methylation subtypes and constructed a highly effective prognosis risk assessment model. The transcript levels of key genes corresponding to the independent prognosis-related methylation sites were significantly changed in patients according to prognosis and positively correlated with each other, indicating they may collaboratively promote melanoma formation. These findings further our understanding of the mechanism of melanoma and provide new targets for diagnosis and treatment.


Background
Melanoma is the most aggressive skin cancer and originates from malignant melanocytes [1]. Once melanoma has metastasized, it is generally associated with a poor prognosis [2]. e classification of melanoma has traditionally been based on histological features, and this approach divides to give appropriate treatment advice, early diagnosis of melanoma is still difficult. Furthermore, an accurate or consistent model to predict patient prognosis and identify personalized treatment strategies remains lacking [2,4,[6][7][8].
DNA methylation is a common epigenetic change involved in multiple cellular processes. ree DNA methyltransferases (DNMTs) have been identified in advanced eukaryotes, and several studies have linked aberrant protein structures of DNMTs with abnormal embryonic development and cancer development [9,10]. Previous research has shown that cancer-related DNA methylation events occur on CpG (5′-cytosine-phosphate-guanine-3′) islands and in 70% of mammalian promoter regions [9]. Methylation of CpG islands plays an important role in the regulation of gene transcription and is a critical factor of cellular malignant transformation [10,11]. CpG hypermethylation in the promoter region can influence the transcription of genes. Many actively transcribed genes show high DNA methylation levels, indicating the background or spatial distribution of DNA methylation was important for the regulation of gene transcription and the formation of malignant disease [11].
Cancer genomics studies identified a recurrent mutation in DNMT3a in 25% of patients with acute myeloid leukemia [12] that affects the prognosis of patients [13]. ese mutations were heterozygous and interfere with the catalytic activity of the enzyme. At present, the hypomethylating agent 5-azacytidine (azacytidine) shows good curative effect in myelodysplastic syndrome [9,14]. In addition, one study showed that a prediction model for colon adenocarcinoma patient prognosis built by the DNA methylation sites identified patients with poor prognosis [15]. Few studies have examined DNA methylation in melanoma.
In this study, the methylation levels were examined in samples from 475 melanoma patients and the patients were divided into different methylation subtypes. A prognostic model was built based on the prognosis-related DNA methylation sites. ese results may help provide a new method to assess the prognosis of patients and lay a theoretical foundation for researchers to understand this disease from a uniquely epigenetic perspective.

Download and Preprocessing of Data.
e study flow chart is shown in Figure 1. We downloaded transcriptome files of 471 patients with melanoma from e Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/ ) on May 7, 2021. e platform of transcriptome file was Illumina. Detailed information of patients is shown in Table 1. e detection platform for the 475 methylation data files was the Illumina Human Methylation 450; we acquired data from the University of California Santa Cruz (UCSC) cancer browser (https://xena.ucsc.edu/) on May 8, 2021. e testing data were from GSE98394. e exclusion criteria for the DNA methylation sites were as follows: there were more than 70% missing data in the whole sample; the sites located on sex chromosome; single nucleotide polymorphisms; the sites were not on the gene promoter region (2 kb upstream to 0.5 kb downstream of the transcription start site); and they were cross-reactive genome sites [16]. Clinical samples were excluded based on the following exclusion criteria: less than 30 days of follow-up data or no recorded follow-up data; no survival status; and critical clinical information such as tumor stage was missing or unknown. We used the R package impute and sva to perform the batch correction [17][18][19].

Division of DNA Methylation Subtypes of Melanoma.
e univariate Cox proportional risk regression model was built by DNA methylation sites, patients' age, stage, gender, TNM classifications, grade, and the follow-up data; after calculating, we got the prognosis-related sites (Table S1).
ose sites were used in the multivariate Cox regression models to get the independent prognosis-related methylation sites (Table S2), and they were analyzed by the Con-sensusClusterPlus package [20] in R software to determine the melanoma subtypes. Based on the k-means, we divided each sample into k groups, and the repeated times was used to check the classifications' stability. Pairwise consensus values were calculated and recorded for each k value. We used the Euclidean squared distance metric to calculate the k-means, and the results matrix included over 100 iterations. e k value was determined if there were high consistency and low variation in the cluster matrix. Pheatmap package [21] was used to draw the heatmap. If the squares were diagonally distributed, the matrix consensus was perfect.

Construction of the Prognosis Prediction Model and
Evaluation. We selected the subcluster with dramatically high or low survival probability and significantly changed methylation levels compared with other subclusters; if there were more sites in the targeted subcluster, those sites would   Journal of Oncology be chosen to build the Cox proportional hazard model by coxph function in R software [22,23].

Analysis of Gene Pathway Enrichment and Hub Genes.
We used CytoHubba to predict or discover important genes. We drew the interaction map of genes using the String tool (https://string-db.org) and imported it into CytoHubba to calculate the scores of the total genes. e main parameters in this research were maximal clique centrality (MCC), depth, edge percolated component (EPC), and maximum neighborhood component (MNC). We selected the top 50 genes in each method and constructed a Venn diagram. e overlapping genes were identified as key genes and used for subsequent analysis.

Statistical Analyses.
Comparison of the continuous data in three or more groups was performed by the Kruskal-Wallis test or analysis of variance according to whether it met normal distribution and homogeneity of variance. e comparison of numerical data in different groups was analyzed by the Chi-test. Pearson's or Spearman's correlation analysis was used to calculate the coefficients of two measurement data. Comparison of the levels of the methylation sites in different melanoma subtypes was performed by the Wilcoxon test. e survival analysis was performed by Survival package [24,25] in R software. All statistical analyses were performed using IBM SPSS statistics 21.0 or the R software, and P < 0.05 was defined as statistically significant.
Other methods in this research were performed as the same as our published article [26].

Clinical Characteristics of Patients with Melanoma and Filtering of Independent Prognostic Methylation Sites.
is study included 290 male patients and 180 female patients (Table 1). e mean patient age was 58.2 years (with 250 cases ≤60 years old and 212 cases >60 years old, Table 1). e survival time was less than 1 year for 63 cases, 1 to 5 years for 246 cases, and more than 5 years in 151 cases (Table 1). Regarding TNM staging system, there were 23 cases in T0 stage, 42 cases in T1 stage, 78 cases in T2, 90 cases in T3 stage, and 153 cases in T4 stage (Table 1). Regarding distant metastasis, there were 418 patients in M0 stage and 24 patients in M1 stage; in terms of lymph node involvement, 235 patients were in N0 stage, 74 patients were in N1 stage, 49 patients were in N2 stage, and 55 patients were in N3 stage (Table 1). Among the 470 total patients, 77 cases were in stage I, 140 cases were in stage II, 171 cases were in stage III, and 23 cases were in stage IV (Table 1).
Combined with the clinical data, we performed Cox univariate regression analysis on selected 206,635 methylation sites in the melanoma samples and extracted 783 prognostic-related methylation sites (P < 0.0001, Table S1), and those sites were not independently associated with patients' prognosis. We then performed multivariate Cox regression analysis on the above 783 sites to identify the independent prognostic methylation sites and obtained 256 Journal of Oncology 3 independent prognostic methylation sites (P < 0.0001, Table S2). Levels of the 256 sites and the follow-up were merged in Table S3. ese sites were used for the construction of the subsequent risk assessment model and further analysis.

DNA Methylation Subtypes of Patients with Melanoma and the Clinical Features.
e study flow chart is shown in Figure 1. Considering the CDF (consensus cumulative distribution function)-consensus index graph and the delta area curve, we observed that when k � 7, there was a low variation coefficient and high consistency in the cluster graph, and the changes of the area under CDF curve were relatively small (Figures 2(a)-2(b)). erefore, we divided the melanoma patients into seven subtypes. e seven subtypes were almost on a diagonal (Figure 2(c)), indicating there was good consistency. From the heatmap of the methylation levels for the seven subtypes, we found that the C2 subcluster had lower methylation levels than the other subclusters ( Figure 3(a)). As shown in the survival curve in Figure 3(b), there was a statistical difference between the seven subclusters in terms of survival (P � 2.51 × 10 −11 ) and the C2 subcluster had a lower survival rate than the others. ese results indicated that these could distinguish patients with different prognostic status.
We further compared the clinical parameters of the different subclusters and found no significant differences in patient age and gender among the subclusters ( Figure S1).

Comparison of the Methylation Levels of the Seven Methylation Subtypes.
Comparison of the methylation levels of the 256 independent prognostic DNA methylation sites in the seven subclusters revealed that the methylation levels of DNA methylation sites in the C2 subcluster were lower than those of the other subclusters (Table S4, Figures 3(c)-3(d)). We identified 99 sites that had changed in at least one subtype compared with others, and most of these sites were present in the C2 subcluster (95 sites; Table S4, Figure 3(c)).

Construction of the Cox Prognostic Risk Regression Model and Detection
Efficiency. As the C2 subtype had the lowest survival rate and methylation levels compared with other subclusters, and as this subcluster had the most significantly changed sites, we selected the significantly changed sites in the C2 subtype to draw the boxplot and construct the prognosis prediction model.
We used the coxph function in the R software to process the significantly changed methylation sites in the C2 subcluster and built the prediction model of clinical prognosis risk for patients with melanoma using the formula: risk score � Id1 × Co.1 + Id2 × Co.2 + Id3 × Co.3. . .. . .+Idn × Co.n; the Id value and Co. are shown in Table 2. Using this prediction model, we calculated the risk scores for every patient and ranked them by the risk score (Table S5). e median risk score was −2.305 (Table S5).
We then divided patients according to the median risk score: patients with a risk score higher than the median were high-risk cases, and those with lower risk scores than the median were low-risk cases. e survival rate of high-risk patients was significantly lower than that of patients with a low-risk score (P �1.11 × 10 −15 , Figure 4(a)). With the increasing of the risk score, the number of patients with melanoma increased and the methylation levels of the sites in the risk prediction model decreased (Figure 4(b)). ere was a negative correlation between patient survival time and risk scores, and patients in the high-risk group had poor prognosis (r s � −0.325, 95% confidence interval, −0.420 to 0.224, P < 0.0001, Figure 4(c)). ese results indicate that the risk score obtained from this risk prediction model could predict the prognosis of patients with melanoma.
We further used ROC (receiver operating characteristic) curve analysis to evaluate the efficiency of this method and found that the method had high efficiency. Area under the curve (AUC) was 0.833 (P < 0.05), demonstrating that this model could distinguish patients in the high-risk group from patients in the low-risk group (Figure 4(d)).
en, we randomly selected 60% of the samples to test the prediction model, we did this for 100 times, the mean AUC was 0.82, and the P value for the comparison of survival probabilities (high-risk group vs. low-risk group) was less than 0.05 in almost all the 100 trials (96% , Table S6), which confirmed the test efficiency of the prognosis prediction model we built in this research.

Pathway Enrichment of Genes Corresponding to the Prognostic Methylation
Sites. Our pathway enrichment results of genes corresponding to the prognostic-related methylation sites are shown in Table S7 and Figures 5(a)-5(b). e results identified pathways such as choline metabolism in cancer, colorectal cancer, and melanoma. e correlation between the pathways is shown in Figure 5(a); the pathways were focused on two groups, i.e., autoimmunity and immune-related pathways and cancer and its related pathways.
We next scored genes in the significant pathways using the CytoHubba plug-in of Cytoscape software. We used four methods to score the genes and selected the top 50 genes in every method to draw a Venn diagram. We identified 35 genes among the top 50 genes in all of the four methods ( Figure 5(c)) and selected these 35 genes as the hub genes. e expression levels of the hub genes in the seven subgroups are shown in Figure 5(d). We integrated the expression of the 35 hub genes with patient information and found that the gene expressions of docking protein 2 (DOK2), G protein-coupled bile acid receptor 1 (GPBAR1), guanylate-binding protein 4 (GBP4), proteasome 20S subunit beta 9 (PSMB9), and NLR family CARD domain containing 5 (NLRC5) were significantly altered in different DNA methylation subgroups, patients with different survival status, different stages, and T categories ( Figure 5(e), Table 3, P < 0.05). To explore the correlation of the 35 hub genes, we conducted Pearson correlation analysis and found that there was positive correlation between DOK2, GPBAR1, GBP4, PSMB9, and NLRC5; further, we validated their correlation in the testing group (GSE98394) and confirmed the positive correlation between DOK2, GBP4, PSMB9, and NLRC5 (P < 0.05, Tables S8-S9, Figure 5(f )). en, we searched the independent prognosis-associated methylation sites on their promoters' regions and found cg00533183 and cg07156249 were on the promoters' of PSMB9, cg07839457 was on NLRC5, cg21163717 was on DOK2, and cg27285720 was on GBP4, the methylation levels of those sites were higher in dead than the survival patients, patients in stages II ∼ IV than stage I, patients in T2∼4 than T0∼1, and the gene expression of their corresponding genes had an opposite trend to them just as we expected ( Figure 5(g)).

Discussion
Melanoma is a malignant cancer with an increasing incidence worldwide [2]. A patient diagnosed at stage IV (based on American Cancer Federation Staging) has very few treatment options and a predicted survival of less than 2 years [2,27,28]. erefore, identifying methods for early      Type  cg02736280  cg07343703  cg02717339  cg13206063  cg04020309  cg04638014  cg00622799  cg13646917  cg00637477  cg23075364  cg26418434  cg00089550  cg09468328  cg14809332  cg04803153  cg24408057  cg01328833  cg11274940  cg23288103  cg13857119  Journal of Oncology 7 diagnosis and developing more effective treatment strategies for patients are critical. e pathogenesis of melanoma is mediated by a series of genetic and epigenetic changes [2]. Epigenetic modifications silence the expression of melanin-related genes [27,29]. Aberrant DNA methylation is an epigenetic hallmark of melanoma and plays critical roles in the formation and progression of melanoma [30]. Changes in DNA methylation sites in tumor suppressor genes are found in patients with metastatic melanoma [31]. erefore, in this study, we performed an in-depth analysis of DNA methylation in melanoma to better understand the molecular mechanism of melanoma and potentially identify key sites that may be new diagnostic markers or therapeutic treatment targets.
In this study, we successfully built a prognostic risk assessment model based on the independent prognosisassociated DNA methylation sites, and this model could efficiently distinguish low-risk patient from high-risk patients ( Figure 4, Table S6). Patients with a high-risk score showed a low survival probability, and there was significant negative correlation between the risk score and patient survival time (Figures 4(a)-4(c)).
is indicated that this model could be used to predict the prognosis of patients with melanoma and thus provides a new method for doctors to evaluate patient prognosis and provide appropriate personalized treatments. Research has shown that abnormal DNA methylation changes occur before the disordered translation of the protein [32][33][34]. erefore, the predicted model built with these prognosis-related DNA methylation sites may be useful for early diagnosis. e enriched pathways of genes corresponding to the prognostic methylation sites were mainly in autoimmune or immune-related disease and tumor-related pathways. ere was a correlation between the melanoma pathway and pathways such as glioma and colorectal cancer. e tumor pathways may be connected to each other through signaling pathways such as the central carbon metabolism, phospholipase D signaling, or sphingolipid signaling pathways. Alternatively, they may affect the immune homeostasis of melanoma patients and further contribute to the   AKT3  ARL9  BACE1  BTBD6  CDC25B  CEL  CPEB1  DOK2  EIF4G1  ELOVL6  FABP3  GBP4  GHRL  GPBAR1  HERC3  KRAS  LIPE  NCK2  NLRC5  PLEKHG5  PRDM2  PSMB8  PSMB9  PSMD8  PTK2  RAC3  RLN1  RNF123  SPSB1  SREBF1  TAP1  TNNT1  TP53  TSC2 10.00 8.00 Journal of Oncology 9   SPSB1  RNF123  TP53  TSC2  EIF4G1  TAP1  RAC3  LIPE  TNNT1  RLN1  SREBF1  KRAS  ABCA1  HERC3  AKT3  PSMD8  BACE1  ARL9  BTBD6  PSMB8  NLRC5 BACE1  AKT3  PTK2  ELOVL6  KRAS  RAC3  RNF123  TSC2  NCK2  TP53  CEL  FABP3  CDC25B  TNNT1  BTBD6  LIPE  CPEB1  EIF4G1  PSMD8  DOK2  GPBAR1  GBP4  NLRC5  PSMB8  PSMB9  ABCA1  RLN1  PLEKHG5  GHRL  HERC3  PRDM2  ARL9  SPSB1  SREBF1  TAP1   survival  time BACE1 AKT3  PTK2 ELOVL6 KRAS RAC3  RNF123 TSC2 NCK2 TP53 CEL FABP3 CDC25B TNNT1 BTBD6 LIPE  CPEB1  EIF4G1 PSMD8 DOK2 GPBAR1 GBP4 NLRC5  PSMB8 PSMB9  ABCA1 RLN1  PLEKHG5 GHRL  HERC3 PRDM2 ARL9  SPSB1 SREBF1 TAP1   DOK2   GBP4   NLRC5   PSMB9   DOK2  GBP4  NLRC5 (Figures 5(a)-and 5(b)). e immune-related pathways were mainly concentrated on the natural killer cell-mediated cytotoxicity, antigen processing, and presentation pathways, suggesting that the independent prognosis-related DNA methylation sites may affect the expression of genes encoding key factors that influence the immune function, resulting in abnormal activities of natural killer or antigen-presenting cells in melanoma patients. ese alterations may affect the development of the disease and impact the prognosis of the patient.
Further, we found that the gene expression levels of DOK2, GBP4, PSMB9, and NLRC5 were changed according to different methylation subclusters, patient state, tumor stage, and clinical T categories; the transcript level of PSMB8 was changed in different methylation subgroups and patients with different survival states (Figures 5(d)--5(e)). DOK2 is an aptamer protein that regulates the tyrosine kinase signaling pathway, including tyrosine kinase receptors such as epidermal growth factor receptors [35]. e expression level of DOK2 was decreased in gastric cancer, indicating that DOK2 may be a potential tumor suppressor in solid tumors [36]. Based on the abnormal expression of DOK2 in digestive tract tumors such as gastric and colorectal cancers, some researchers established a prognostic evaluation model including DOK2 that effectively identified patients with poor prognosis [36,37]. In addition, DOK2 deficiency in ovarian cancer induced carboplatin resistance by inhibiting the apoptosis of tumor cells [38], while the expression of DOK2/Ras p21 protein activator 1 was associated with prognosis and quality of life of breast cancer patients. e deficiency of these proteins may result in tumor enlargement or progression and lymph node metastasis [39]. Deletion of DOK2 in a mouse model led to the accelerated formation of lung tumors [40]. GBP4 is a guanylate-binding protein that is involved in pathological processes such as tumor formation and progression. A prognosis predictive model that was constructed using GBP4 was used to evaluate the prognosis of melanoma patients [41,42]. A high expression of GPB4 was correlated with the favorable overall survival in skin cutaneous melanoma patients for more than 30 years [42]. Proteasome 20S subunit beta 8/9 (PSMB8/9) proteins are critical immune proteasome subunits. PSMB8/9 overexpression indicated a good prognosis in patients with melanoma and a good response to immune-checkpoint inhibitors [43]. Increased expression of NLRC5 was associated with the slower growth of the tumor in a mouse melanoma model and prolonged survival time in patients with melanoma [44]. NLRC5 was considered a potential immune molecule in antitumor treatment that could be used to improve tumor immunogenicity and restore antitumor immunity [45]. As the above description, DOK2, GBP4, PSMB9, and NLRC5 were tumor suppressors, in this research, their expressions were decreased in patients with poor prognosis and the levels of the methylation sites on their promoter regions were increased ( Figure 5(g)), the latter might hinder the expression of the former. e weak expressions of those tumor suppressors could promote the complicated formation or the development of melanoma.
Furthermore, we found a positive correlation between the gene expression levels of DOK2, GBP4, PSMB9, and NLRC5, and their correlation had been confirmed in the testing group ( Figure 5(f )), indicating that they may have a cooperative relationship. e expression of one gene might drive the expression of other genes to form a positively regulated cluster. In addition, the previously reported critical functions of these genes and encoded proteins in various cancers including melanoma help support and validate our findings. We speculate that the alterations in the independent prognosis-related methylation sites affect the expression of the corresponding genes, especially the genes in this cluster (Figures 5(f ) and 5(g)), and therefore influence the development or progression of melanoma. e positive relationships between these genes indicate they may be coordinately involved in melanoma development. ere are also some limitations in this research, which are as follows: we used 475 methylation files to build the predicted model of patients' prognosis and found the independent prognosis sites' corresponding genes as DOK2, GBP4, PSMB9, and NLRC5 were critical and positively correlated with each other, which was validated in the testing group, but there are few data sets to be found to retest the efficiency of the predicted model. In the future, we will reexamine them clinically.

Conclusion
Here, we established a prediction model for the prognosis of patients with melanoma based on prognosis-related DNA methylation sites in gene promoter regions.
is model efficiently distinguished high-risk patient from low-risk patients. Among the 35 hub genes corresponding to the prognosis-related DNA methylation sites, the gene expression levels of DOK2, GBP4, PSMB9, and NLRC5 were significantly changed in different patient subgroups according to DNA methylation subtypes, patient states, tumor stages, and T categories. e genes were positively correlated with each other, and the altered DNA methylation levels on the gene promoter regions of these genes might affect their expression. Furthermore, these genes might be involved in the pathological process of melanoma. We plan to focus future studies on these five identified hub genes and explore their potential mechanisms in DNA methylation and their interactions. Our findings may help provide new targets for clinicians to treat patients with melanoma and enable early diagnosis based on these critical DNA methylation sites. Data Availability e data that support the findings of this study are available from the corresponding author or the first author upon reasonable request.

Conflicts of Interest
e authors declare they have no conflicts of interest.

Authors' Contributions
Tengda Li downloaded and processed the data and wrote the paper; Cheng Qian provided some advices for this research; Yi Sun edited the paper; Tengda Li discussed with Cheng Qian, Jiaxuan Mei, and Yi Sun and provided the suggestions for the designation of this study.