Identification of Three Genes Associated with Metastasis in Melanoma and Construction of a Predictive Model: A Multiracial Identification

The aim of this study was to identify hub genes associated with metastasis and prognosis in melanoma. Weighted gene coexpression network analysis (WGCNA) was performed to screen and identify hub genes. ROC and K-M analyses were used to verify the hub genes in the internal and external data sets. The risk score model and nomogram model were constructed based on the IHC result. Through WGCNA, the three hub genes, SNRPD2, SNRPD3, and EIF4A3, were identified. In the external data set, the hub genes identified were associated with the worse prognosis (TCGA, SNRPD2, P ≤ 0.02; SNRPD3, P = 0.12; EIF4A3, P = 0.11; GSE65904, SNRPD2, P = 0.04; SNRPD3, P = 0.10; EIF4A3, P < 0.01; GSE19234, SNRPD2, P < 0.01; SNRPD3, P < 0.01; EIF4A3, P < 0.01). In the GSE8401, we found that the hub genes were highly expressed in the metastasis compared with the nonmetastasis group (SNRPD2, 988.5 ± 47.83 vs. 738.4 ± 35.35, P < 0.01; SNRPD3, 502.7 ± 25.7 vs. 416.4 ± 23.88, P = 0.02; EIF4A3, 567.6 ± 19.56 vs. 495.2 ± 21.1, P = 0.01). Moreover, the hub genes were identified by the IHC in our data set. The result was similar with the external data set. The hub genes could predict the metastasis and prognosis in the Chinese MM patients. Finally, the GSEA and Pearson analysis demonstrated that the SNRPD2 was associated with the immunotherapy. The three hub genes were identified and validated in MM patients in external and internal data sets. The risk factor model was constructed and verified as a powerful model to predict metastasis and prognosis in MM patients.


Introduction
Melanoma (MM) is a highly aggressive malignant tumor that is prone to metastasis at an early stage and has the highest mortality rate among skin cancers; it leads to more than 90% of skin cancer-related deaths [1,2]. The American Cancer Society reports that about 106,110 new cases are diagnosed in 2021 in the US [3]. Nowadays, the surgery combined with the immunotherapy, radiotherapy, and chemotherapy had been increased the prognosis of the MM patients [4]. However, the prognosis of the MM with metastasis patients was still poor [5]. Recently, several studies have paid attention on metastatic MM, and many potential biomarkers have been reported to predict the metastasis of the MM [6][7][8]. However, the majority of the biomarkers were according to the skin MM, and the mucous MM has been ignored. Thus, to identify the biomarkers which could predict the metastasis of the skin MM and mucous MM is needed.
In Europe and the US, most of the MM occurred in the skin. But it was different in China; the location of MM often occurred in other mucous tissues other than the skin, including the head and neck (nasal pharyngeal and oral), gastrointestinal (upper GI and lower GI), gynecological, and urological [9]. To search the effective biomarkers which could predict the metastasis not only in skin MM but also in the mucous MM was important. Currently, the weighted gene coexpression network analysis (WGCNA) is a widely used algorithm, and it was unbiased systematic biological approach which can be used to screen the effective biomarker in the multiple diseases including various cancers. Moreover, it could bridge the gap between individual genes and clinical information [10][11][12]. Thus, the WGCNA could be used as an efficient algorithm to screen the hub genes for the metastatic MM.
In this context, this study was aimed at screening the relevant hub genes for metastatic MM, including skin MM and mucous MM, by using WGCNA. Then, the hub genes were verified using patient tissue samples and internal and external testing data sets.

Data Collection from the GEO and TCGA Databases.
Three microarray data sets were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi .nlm.nih.gov/geo/). The data set (GSE8401) [13,14] from Austria included mRNA expression profiles of 31 primary MM samples and 52 metastatic MM samples treated as a training set for WGCNA. Data set GSE19234 [15] from the USA included mRNA expression profiles of 44 MM patients, and GSE65904 [16,17] from Sweden included mRNA expression profiles of 210 MM patients which were used to verify the hub gene prognosis in MM. In addition, The Cancer Genome Atlas (TCGA) data set from the US contained with 462 MM patients was also used to verify the prognosis of the hub genes. A flow diagram of the present study is demonstrated in Figure 1.

Data Collection and Immunohistochemical Analysis.
A total of 147 MM patients between January 2010 and December 2018 were enrolled in our study. The patients were diagnosed with MM by the two individual pathologists. The protocol was approved by the Committee of the Fujian Medical University Union Hospital.
The protein expression of hub genes in 147 MM patients was assessed using the immunohistochemical streptavidinbiotin complex method [18]. The IHC score was described in our previous study [18,19]. The score between 0 and 4 was defined as the low expression and >4 was defined as high expression. All analyses were performed in a doubleblind manner.

Coexpression Network Construction.
The WGCNA algorithm was described in detail previously [12]. Briefly, firstly the coexpression network was constructed by "WGCNA" package in R software [20,21]. Next, the correlation matrix GSE8401 from the GEO data base Weight co-expression network construction Identification of real hub gene in both coexpression network, PPI network and MCODE score The hub genes were screened Validate the relationship between the hub genes and metastasishe using the test set of GSE8401 and our data Go functional enrichment and KEGG pathway analysis Validate the prognosis of the hub genes in the GSE19234, GSE65904 and TCGA data base

GSEA analysis and correlation analysis
The risk score model was constructed basing on our data Validate the efficiency of the risk score modelin the GES8401, GSE19234, GSE65904 and TCGA data base The Nomogram was build to predict the prognosis of the melanoma patients Identification of metastasis associated hub module     Journal of Oncology was established and the soft threshold power was determined. Then, the topological overlap matrix (TOM) was established [22][23][24]. Based on the metastasis or not, we calculated each module P value by the t-test gene significance. The module which was most associated with the metastasis was selected as the hub module. The green module was selected.

Hub Gene
Identification. The hub genes should be considered as the maximum specific weight and core genes of the interaction in the green module of all genes. To identify hub genes, we uploaded all genes in the green module to the Search Tool for the Retrieval of Interacting Genes (STRING) to construct protein-protein interaction (PPI) network. Cytoscape was used to perform PPI network analysis to screen hub gene in the maximum specific weigh modules within the PPI network. The Molecular Complex Detection (MCODE) analysis was performed to screen the hub genes. Based on the PPI, Pearson analysis, and MCODE analysis, we screened three hub genes, SNRPD2, SNRPD3, and EIF4A3.

GSEA GO Enrichment and KEGG Pathway Analysis.
To explore the potential function of hub genes in MM patients, GSEA was performed in MM patients from GSE8401 data sets. P < 0:05 and jenrichment score ðESÞ j > 0:3 were set as the cutoff criteria. The Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed using standard enrichment computation method.
2.6. Statistical Analysis. Statistical analysis was performed using the SPSS software (24 SPSS), GraphPad Prism 7, and R software (4.0.3). Continuous variables were reported in means and standard deviation via the analysis of variance test. Survival outcomes were assessed using the Kaplan-Meier method and log-rank test. The optimal cutoff values for hub gene expression were determined using the X-tile program (http://www.tissuearray.org/rimmlab/). The TIMER database was used to analyze the relationship between the hub genes and tumor-infiltrating immune cells [25,26]. The risk factor model was constructed based on the Cox proportional hazard model [27]. P < 0:05 was considered statistically significant.

Construction of Weighted Coexpression Network and
Identification of Key Modules. To identify the hub gene, we used the weighted coexpression network to analyze the GSE8401 (Figure 2(a)). A total of 12 modules were identified (Figure 2(c)), and the green module has the highest positive association with the metastasis (Figure 2(b)) (r = 0:83, P < 0:01). Thus, the green module was selected as the hub modules for further analysis. KEGG analysis was conducted to investigate the pathway of the green module. Genes in the green module were mainly enriched for MAPK signaling pathway, cell cycle signaling pathway, and spliceosome signaling pathway (Figure 2(d)). GO analysis was conducted to investigate the pathway of the green module. Genes in the green module were mainly enriched for mRNA splicing, cell proliferation, and cell-cell adhesion (Figure 2(e)).

Hub Gene Identification and Validation in the GSE8401
Data Set. There were 715 genes in the green module considered as candidate genes. Firstly, the genes were analyzed by the PPI network and 214 genes were considered as the   Journal of Oncology candidate hub genes, based on the PPI degree score and MCODE score (all selected the top10 genes) (Figure 2(f)). Finally, we chose the most associated genes, SNRPD2, SNRPD3, and EIF4A3, as the "real" hub genes.

Hub Gene Validation in the External Data Sets.
To further explore the prognostic impact of SNRPD2, SNRPD3, and EIF4A3 on the MM patients, the X-tile program was used to determine the cutoff value of SNRPD2, SNRPD3, and EIF4A3. In the TCGA data set, the X-tile result identified that the best cutoffs of the hub genes were as follows: SNRPD2, 6.492; SNRPD3, 5.490; and EIF4A3, 4.805, respectively. The result demonstrated that the higher expression of the SNRPD2 had worse OS compared with the lower group (SNRPD2, P ≤ 0:02 Figure 4(a)), but there is no significance in the SNRPD3 and EIF4A3 (SNRPD3, P = 0:12; EIF4A3, P = 0:11; Figures 4(b) and 4(c)). Similarly, in the GSE65904, the X-tile result identified that the best cutoffs of the hub genes were as follows: SNRPD2, 7500; SNRPD3, 2450; and EIF4A3, 2950, respectively. The result demonstrated that the higher expression of the SNRPD2 and EIF4A3 had worse OS compared with the lower group (SNRPD2, P = 0:04; EIF4A3, P < 0:01; Figures 4(d) and 4(f )), but there is no significance in the SNRPD3 (SNRPD3, P = 0:10; Figure 4(e)). In the GSE19234, the X-tile result identified that the best cutoffs of the hub genes were as follows: SNRPD2, 14099.5; SNRPD3, 509.2; and EIF4A3, 3334.0, respectively. The result demonstrated that the higher expression of the hub genes had worse OS compared with the lower
An analysis of the relationship between the three hub genes and prognosis in MM patients further revealed that the high expression of SNRPD2, SNRPD3, and EIF4A3 was associated with worse OS (all P < 0:01, Figures 5(a)-5(c)). We found similar result in the DFS analysis; the lower expression of hub genes had better prognosis compared with the higher group (all P < 0:01, Figures 5(d)-5(f)).  Table 1). Although in the multivariate Cox regression the SNRPD3 has no significance, it may attribute to the similar expression with the SNRPD2. To prevent missing the information of the SNRPD3, the SNRPD3 was included in the risk score model. The risk score of the hub genes was calculated using the formulae risk score = ð0:48Þ × SNRPD2 expression + 0:16 × SNRPD3 + 0:52 × EIF4A3 expression, based on the Cox regression analysis. The patients were subsequently divided into the high-and low-risk groups (Figure 6(a)). Notably, patients in the low-risk group had an improved DFS and OS than those in the high-risk group (both log-rank P < 0:01, Figures 6(b) and 6(c)). The risk score can also predict the metastasis (AUC = 0:70, P < 0:01, Figure 7(d)) and disease recurrence (AUC = 0:87, P < 0:01, Figure 7(e)) in MM patients. As shown in Figure 7(f), the result demonstrated that the risk score was higher in the metastasis group compared with the nonmetastasis group (2:172 ± 0:327 vs. 1:322 ± 0:1047, P < 0:01). Similar result was also been found in the recurrence group compared with the disease-free group (1:966 ± 0:143 vs. 0:6503 ± 0:05832, P < 0:01). Time-dependent AUC curves showed that the risk score model had the best powerful ability to predict the MM patient prognosis (Figures 7(g) and 7(h)).

The Association between Risk Score and Clinical
Characteristics in MM Patients. The 147 MM patients according to the risk score were divided into the low-(n = 78) and the high-risk score groups (n = 69). Notably, there were no significant differences in gender, age,   Table 1). The high-risk group had a significantly higher pathology T stage (P < 0:001), pathology N stage (P < 0:001), and pathology M stage (P = 0:018) than the low-risk group.
The Cox regression analysis was subsequently performed to determine the prognostic factors in MM patients. The univariate Cox regression analysis revealed that higher pathological T stage (HR = 2:129, P < 0:001), pathological N stage (HR = 2:613, P < 0:001), and a higher risk score (HR = 1:533, P < 0:001) were independently associated with the DFS in MM patients. A higher pathological N stage (HR = 2:034, P = 0:006) and higher risk score (HR = 1:338, P < 0:001) remained significantly associated with the DFS in MM patients in the multivariate Cox regression analysis (Table 2). Moreover, based on the results from multivariate Cox regression, a predicting nomogram for DFS was developed, as shown in Figures 8(a) and 8(b). By summing up the score of each variable, a straight line could be drawn to obtain the predicted the DFS rate.
The Cox univariate analysis further revealed that a higher pathological T stage (HR = 2:030, P < 0:001), pathological N stage (HR = 2:459, P < 0:001), pathological M stage (HR = 3:703, P < 0:001), and a higher risk score (HR = 1:539, P < 0:001) were independently associated with the OS in MM patients. A higher pathological N stage (HR = 1:899, P < 0:001), pathological M stage (HR = 2:486, P = 0:002), and higher risk score (HR = 1:400, P < 0:001) remained significantly associated with increased risk of OS in MM patients in the multivariate Cox regression analysis (Table 3). Moreover, based on the results from the multivariate Cox regression, a predicting nomogram for OS was developed, as shown in Figures 8(c) and 8(d). By summing up the score of each variable, a straight line could be drawn to obtain the predicted the OS rate.

Validation of the Risk Score in the External Data Set.
To further explore the risk score model can predict the prognosis and metastasis in MM patients. Based on the X-tile result, we calculated the risk score in each external data set. Then, the patients were divided into two groups, high-risk score group and low-risk score group. The K-M analysis demonstrated that the risk score can distinguish the prognosis in the each external data set (GSE19234, P < 0:01, Figure 9(a); GSE65904, P < 0:01, Figure 9(b); TCGA, P = 0:04, Figure 9(c)). In the GSE8401 data set, the metastasis group had higher risk score compared with the nonmetastasis group (P < 0:01, Figure 9(f)). And the ROC curve also demonstrated that risk score model could efficiently predict the metastasis in GSE8401 data set (P = 0:02, AUC = 0:65, Figure 9(e)).
3.8. GSEA Analysis and Correlation Analysis. GSEA was conducted to determine the potential mechanism for hub gene involvement in metastasis in MM patients. As shown in Figures 10(a)-10(c), the three hub genes were all involved in the immunodeficiency. To further explore the relationship between hub genes and immunotherapy sites, the Pearson analysis was performed to analyze the PD1, PD-L1, and CTLA4 genes. The result demonstrated that the SNRPD2 was associated with the PD-L1in the TCGA data set (r = −0:27, P < 0:01, Figure 10(e)). However, we could not find the similar result in the other data set (Figures 10(d), 10(f), and 10(g)). Moreover, the TIMER database was employed to analyze the hub gene association with the tumor-infiltrating immune cells. The result demonstrated that the SNRPD2 was associated with the tumor-infiltrating immune cells (all P < 0:05, Figure 10(h)), and EIF4A3 was associated with the B cells, CD8+ T cells, neutrophil, and dendritic cells (P < 0:05, Figure 10(j)). However, the SNRPD3 was uncorrelated with the tumor-infiltrating immune cells (all P > 0:05, Figure 10(i)).

Discussions
Metastasis is a tough problem in the treatment of cancer patients. In the MM patients with metastasis, the 5-year survival rate only remains 20%-30% [5]. Thus, to explore the efficiency biomarkers that could precisely predict the patients who have distant metastasis is important. In the present study, based on the WGCNA, we identified the three hub genes, SNRPD2, SNRPD3, and EIF4A3. Then, we identified that the hub genes were associated with the metastasis and prognosis in MM patients in the internal and external data sets. Finally, a risk score model was constructed to distinguish the metastasis and prognosis in the MM patients. The early/local MM could be curable by the adequate surgery, and the 5-year survival rate for patients with early/local MM is over 90% [28]. However, most MM patients had metastasis at the time of diagnosis. In our data, for the patients diagnosed with MM, the rate of the pathology stage II/III was 72.8% (107/147). Thus, to explore the efficiency biomarkers that could precisely predict the patients who have distant metastasis is important. However, to screen the hub genes basing on the traditional differential expressed genes would miss important information. WGCNA is an effective method to discover the relationship between individual genes and the clinical characteristics in cancer [10][11][12]. Thus, the WGCNA was performed to screen the hub genes associated with the metastasis in the MM from the GSE8401 which contained the metastatic and nonmetastatic MM patients. The result from the WGCNA revealed that the green module is the hub module. And the GO and KEGG analysis demonstrated that the green module was associated with the MAPK signaling pathway and cell cycle signaling pathway which has already been reported associated with the metastasis in the MM [29,30]. Then, based on the PPI and MCODE analysis, the three hub genes, SNRPD2, SNRPD3, and EIF4A3, had been screened out and been identified as the useful biomarkers to predict the metastasis in the MM patients by ROC curve and expression value analysis in the GSE8401.
The MM could occur in the skin and other mucous. In Europe and the US, the MM often triggered by ultraviolet and occurred in the skin [1]. In China, the MM occurred in the mucous except the skin nearly 50% (51.0%, 75/147, in our data) [31]. Several studies have been reported the useful biomarkers to predict the metastasis in the MM. Wu et al. [32] described that the EIF3B acted as the oncogene in the melanoma and affected the progression and immunotherapy resistance development in European and US patients. Meng et al. [33] introduced that increased PRRX1 expression is independently a prognostic factor of poorer OS and metastasis-free survival in patients with uveal melanoma in US patients. A single-cell analysis was performed and constructed in a prognostic model in European and US patients [34]. However, the biomarkers were not identified in the Asian. Whether the biomarkers could be used in the mucous MM or Chinese is still unknown. To explore effective biomarkers that can predict the metastasis in the skin MM and mucous MM is important. In the present study, we searched the GEO database and found the different source MM data to identify the hub genes which had strong ability to predict the metastasis and prognosis in the skin MM and mucous MM. However, the four external data sets which we enrolled in the present study were all came from Europe and the US, and Chinese data set is lacking. To further identify the hub genes could predict the metastasis and prognosis in the Chinese MM. The IHC analysis was performed to detect the hub gene expression in the Chinese MM patients. The results were consistent with the Europe and US result. The above result revealed that the hub genes had a strong ability to predict the metastasis and prognosis in the multiracial MM patients and different site MM.

Journal of Oncology
The three hub genes, SNRPD2, SNRPD3, and EIF4A3, had been reported in the several studies. SNRPD2 and SNRPD3 belong to the small nuclear ribonucleoprotein Sm family, which participate in the RNA splicing reaction [35]. Accumulating evidence indicates that Sm core proteins influence the profile of alternative splicing [36,37]. There are several studies reporting that the SNRPD3 was the important composition of the spliceosome [38]. The depletion of the SNRPD3 would cause lethal defects in key steps of mitosi [39,40]. EIF4A3 is a member of the DEAD-box protein family [41][42][43], and it is also a core component of the splicing-dependent multiprotein exon junction complex [44]. However, there have been few study reporting the SNRPD2 and SNRPD3 functions in the cancer. In the      Journal of Oncology present study, we found that the SNRPD2 and SNRPD3 acted as the oncogenes in the MM patients, and high expressions of the SNRPD2 and SNRPD3 were associated with the metastasis and worse prognosis. In the present study, we found that the SNRPD2 was associated with the immunodeficiency function, PD-L1 expression in the TCGA data set, and tumor-infiltrating immune cells in the TIMER. The above result suggests that the SNRPD2 may affect the immune response to regulate prognosis and metastasis in MM patients. Previous studies have demonstrated that EIF4A3 is also highly expressed in many tumors, such as lung cancer [45], breast cancer [46], pancreatic cancer [47], colorectal cancer [48], and MM [49]. The result was similar to the present study. We identified that the high expression of the EIF4A3 was associated with the metastasis and worse prognosis in the several data sets. Moreover, the function of the EIF4A3 was explored in the present study, and the result revealed that the EIF4A3 associated with the several tumorinfiltrating immune cells, including B cells, CD8+ T cells, neutrophil, and dendritic cells. The result indicated that the hub genes may affect the immune response to regulate the prognosis and metastasis in the MM patients. The risk factor model and nomogram model have been reported in several cancers, such as MM, lung, and rectal cancers [27,33,50]. In the present study, based on the IHC result, the risk factor model was constructed and was identified as the effective model to predict the metastasis and prognosis in Chinese MM patients. Additionally, the risk model was verified in the external data sets which came from Europe and the US. Moreover, the time-dependent ROC curve further demonstrated that the model had the best AUC value than the single hub gene in predicting DFS and OS in MM patients. Notably, the risk score model also had powerful ability to predict the metastasis and prognosis in external data sets. Succinctly, the risk factor model and nomogram had a strong predictive ability in predicting the prognosis in MM patients.
Despite the notable findings of this study, it was limited by several factors. The WGCNA algorithm was based on the external data set from Europe and it may be different with the Chinese patients. Secondly, the hub genes were identified by the external microarray profiling and IHC analysis. It should be verified by the more data sets. Thirdly, the function of the hub genes was explored by the bioinformatics methods without any validation assays to verify their correctness. Future studies should thus include larger sample sizes and conduct validation assays using in vitro and in vivo experiments to enhance their comprehensiveness.

Conclusion
In conclusion, through the WGCNA, we screened the three hub genes, SNRPD2, SNRPD3, and EIF4A3. Then, the three hub genes were identified and validated as effective