Data Mining Analysis of Gene Prognostic Markers of Metastatic Skin Cancer Based on the Elastic Network Method

Skin cancer is a typical cancer tumor, which occurs all over the world and has a relatively high recurrence rate, including metastatic tumors that occur in other tissues and metastases to the skin, thus jeopardizing the personal life satisfaction and soundness of patients. Due to individual differences, the traditional treatment methods cannot adapt to every patient accurately, so it is difficult to achieve the desired treatment effect for each individual. Nowadays, with the development of gene chip, many new therapies based on gene are more targeted and flexible for the treatment of skin cancer patients. Therefore, it is necessary to mine and analyze appropriate gene biomarkers according to patients' genes. Because of the high cost of gene chip technology and the large number of human genes, there are few samples of gene data and high dimensions. It is a key problem to mine effective genetic biomarkers from the sample data. In this paper, we firstly performed the preliminary analysis using the difference expression analysis and proportional hazards model, then used the elastic network method to reduce the range of genetic data selection, and screened 26 gene prognostic markers closely related to the recurrence of metastatic skin cancer. Finally, the 26 gene biomarkers were analyzed by functional analysis and verified using a test sample. Research findings have shown that the obtained genetic markers have certain value in the clinical prognostic treatment of metastatic skin cancer.


Introduction
Skin cancer, which is a threatening tumor that torments people everywhere on the world, is partitioned into nonmelanoma skin cancer (NMSC) and malignant melanoma [1,2]. ere were 65,155 relevant NMSC deaths, representing 0.7 percent of worldwide cancer mortality, and 142,056 new instances of NMSC, representing 5.8 percent of worldwide malignancy cases, reported by the 2018 global cancer statistics. ere were 60,712 skin-melanoma-related passing, representing 0.6 percent of worldwide malignancy mortality, and 779,723 new instances of skin dangerous melanoma, representing 1.6 percent of worldwide disease cases [3].
Skin cancer has a higher recurrence rate in contrast with other types of cancer, with a 35 percent likelihood of recurrence in the initial 3 years and a half likelihood of recurrence in the initial 5 years. Repetitive skin malignancy is normally a similar subtype as the original cancer [4,5]. e rate of skin disease in China was 8 per 1,000 and the death rate was 3.22 per 1,000 reported by the Chinese Cancer Statistics in 2015 [6,7]. Even though the death pace of skin disease is very low and the cure rate is high, its high occurrence and repeat rate comprise a critical monetary weight to wellbeing administrations. Besides, skin malignancy situated on the head and other profoundly obvious regions may influence a patient's mental prosperity and personal satisfaction [8].
Metastatic skin cancer is brought about by metastasis and the spread of threatening tumor cells in different parts.
ese metastatic disease cells are normally metastasized by the blood or lymph, by surgical implantation by spreading in the space of various tissues. e likelihood of metastasis is more prominent, and the end phase of threatening tumor regularly takes the event of metastatic skin as a typical indication, as the harmful level of the tumor keeps on expanding. e emergence of this phenomenon frequently shows the decay of the prognostic impact. Simultaneously, some metastatic skin disease is likewise the primary clinical sign of harmful tumor [9]. us, great prognostic impact is significant for the recurrence therapy of metastatic skin cancer.
Conventional statistical techniques regularly yield insecure outcomes and inordinate blunders when applied to gene expression examination [4]. With the advancement of next-generation sequencing innovation and gene chip innovation [6] and the consistent amendment of exactness clinical consideration and the perspectives of individualized clinical therapy, comprehending skin cancer from proposing more compelling genetic biomarkers and the viewpoint of the genome gives more significant information to medicine advancement and clinical decision making [5]. Besides, the quantity of factors is commonly a lot higher than the sample size in high-throughput gene expression analyses, which is known as the Scourge of Dimensionality [10]. We ever used the LASSO (Least Absolute Shrinkage and Selection Operator) method to achieve available results [11]. However, due to the strong bundle ability of LASSO regression, only one of the characteristic variables is often selected in the face of the characteristic variables with high degree of correlation, which leads to the poor effect of model prediction or classification. e stability of the model is reduced. Compared with the LASSO regression, the elastic network can select the variables with higher degree of connection in the data samples with larger correlation of feature variables and can ensure the stability of the model while achieving the simplified model of screening feature variables. Subsequently, the elastic network method was utilized in this study to mine genomic information in the current investigation to limit the insecurity brought about by high-dimensional information [12].

Materials
We utilized e Cancer Genome Atlas and obtained the data on skin disease from the Xena Functional Genomics Explorer (xenabrowser.net/datapages/), which includes the quantity of genes surveyed (n � 20, 530) and the quantity of patients (n � 481). ere were 368 metastatic (called metastatic) samples, 105 primary samples (called primary tumor), and 8 other sorts of samples in the available data [13][14][15][16].
Firstly, metastatic samples and 255 patient samples were held after 218 samples which missed survival attribute values were eliminated from 473 primary samples. rough coordinating the patients' samples in the gene expression spectrum, these data were defined into metastatic samples (n � 183) and primary tumor samples (n � 72). At last, the metastatic samples (n � 183) were haphazardly isolated into two groups by utilizing the "sample" function in R version 3.5.2 [17]: training samples (n � 91) and test samples (n � 92).
It is essential for information examination to judge different trait values in the samples as affecting components because a few clinical variables influence the anticipation of patients with skin cancer. Clinical factors, which included age, radiation therapy, cancer status, and gender, were utilized to examine the relevant clinical components in the resulting information mining analysis based on past investigations [18][19][20].

Methods
Firstly, the differentially expressed genes of the 91 training samples and 72 primary tumor samples were analyzed through utilizing the "Limma" package version 3.42.2 [21] in R. 765 genes were viewed as differentially expressed genes in light of the adjusted P value (adj.P.val < 0.01).
en, the Cox coefficient and Cox regression analysis of the differentially expressed genes were carried out by utilizing the "survival" package (rdocumentation.org/packages/ survival; version 3.1-8) in R. At the same time, the P values of the Wald test and the hazard ratio (HR) of each gene were determined by utilizing the Kaplan-Meier method [22]. Also, using the "survivalROC" package (version number 1.0.3; rdocumentation.org/packages/survivalROC), the 169 genes that were essentially connected with the survival of the patients (P < 0.05) were screened out.
At last, utilizing the elastic network method in the R package "glmnet" (rdocumentation.org/packages/glmnet; version 3.0-2), the selected 169 genes were analyzed again in order to acquire more vital genes. 26 risk genes which were firmly identified with survival were distinguished through 10-fold cross validation [23,24]. In addition, the pathway involved in protein activity was investigated by using the Gene Ontology (GO) Cell Component Ontology Method [25].
During the process of this study, the PI value was a significant marker of the integration of risk genes and can be resolved for every patient with skin cancer. After directly fitting the result of the expression and the coefficient revised by the elastic network method for every gene, the PI was obtained. e formula for computing the PI is where β i is the coefficient amended by the elastic network method of the ith gene and X i is the declaration of the ith gene.

Results
Cancer status and age were fundamentally connected with the repeat of skin cancer, which indicated that these two clinical components might be utilized as independent prognostic pointers (Table 1). 44 genes were deemed essentially connected with the survival of patients through using survival analysis in the wake of deciding differential gene expression. erefore, 26 risk genes related with skin cancer were recognized (Table 2) after the elastic network analysis and Cox regression analysis were performed. e Box-Plot map was obtained by QC detection of gene expression profile, in which the horizontal axis was each gene and the longitudinal axis was the gene expression level. We can see that the data distribution of each gene expression of the expression matrix is relatively flat, and the differential expression analysis can be carried out. e Box-Plot diagram of QC detection is shown in Figure 1: en, the genes of the primary patient sample were compared with those of the metastatic patient by the method of gene differential expression analysis. On the basis of statistics and other principles, the results, which simultaneously satisfied adj.P.value < 0.01, |LogFC| > 0.7, AveExpr > 5, and B > 5, were screened out, and 765 differentially expressed genes were obtained.
According to the results of differential analysis, the Volcano Plot map based on the differentially expressed genes was drawn, in which the adj.P.value value after log10 conversion was the longitudinal axis, the LogFC value was the horizontal axis, the blue part was the significantly downregulated differentially expressed gene, the red part was the significantly upregulated gene, and the gray part was the nonsignificant differentially expressed gene. e results are shown in Figure 2.  Using the coxph function (version number 3.1-8), the Cox univariate regression analysis was performed on the obtained differential genes. Based on the P < 0.05 statistical principle, 169 genes were selected which were significantly related to metastatic skin cancer. e elastic network regression was used to analyze the obtained genes. So as to gradually reduce the scope of gene selection, cross validation was combined to dynamically contract the regression coefficients of each gene variable, and the dynamic process of screening the gene variables is shown in Figure 3.
rough analysis and comparison, the penalty parameter λ � 0.6155182 and the mixing ratio parameter r � 0.5 were finally chosen. e parameters λ and r were brought into the elastic network regression, and the genes which were significantly related to the recurrence of metastatic skin cancer were analyzed and processed. Finally, 26 gene prognostic markers were obtained through analysis and mining. e gene prognostic markers obtained are shown in Table 2.
e PI of every patient was determined and the patients were arranged from least to most according to their PI value through straight fitting of the result of regression coefficient and expression of the 26 genes in every sample. e patients were partitioned into low-risk and high-risk groups (Figure 4) according to the median PI value. e higher the PI value, the higher the risk of recurrent survival, and the lower the PI value, the lower the risk of recurrent survival. e survival curves for the two groups of patients are shown in Figure 5 by utilizing the Kaplan-Meier method. Patients were considered low risk that displayed essentially longer overall survival times (P < 0.001; HR � 26.321). A Receiver Operating Characteristic (ROC) curve with the 5year survival rates was drawn (Figure 6), and investigation was performed through utilizing the "survivalROC" package. According to the Area Under the Curve (AUC) value, the merits and demerits of the model developed utilizing the 26 gene biomarkers were confirmed. e outcomes demonstrated that AUC was equivalent to 0.977 (AUC > 0.5 shows an appropriate model).
We used the 26 gene prognostic biomarkers to fundamentally arrange the patients with skin cancer in the training samples into two groups: low risk and high risk based on the exploratory outcomes.
Using the obtained 26 gene prognostic markers and their regression coefficients, combined with the validation samples to calculate the prognostic index of patients in each validation sample, the prognostic index was established  according to the obtained prognostic index, as shown in Figure 7. According to the obtained prognostic indicators, combined with the gene expression profile of patients in the validation sample, the heatmap of risk gene expression profiles in the validation sample was drawn, as shown in Figure 8, from which we can see the different gene expression in each patient. e 26 genetic biomarkers were used to approve the test samples so as to further confirm the precision of the test outcomes. e patients with skin cancer in the test sample could be classified into low-risk and high-risk categories based on these genetic biomarkers (P � 0.03, HR � 3.029), and the results are shown in Figure 9. e AUC of the ROC curve was 0.78, as shown in Figure 10, indicating that the obtained gene prognostic markers are firmly identified with the recurrence of metastatic skin cancer, which greatly affected patient prognostic investigation. e association between the identified gene biomarkers and their associated protein synthesis pathways was also analyzed and studied through using the online genetic analysis tool STRING [26] (as shown in Figure 11). A pathway including ten proteins (CLIC5, PPL, EVPL, KRT6B, NMU, and P2RY14, etc.), which involved a cyclin-dependent protein kinase holoenzyme complex, was identified by Gene Ontology (GO) cellular component ontology analysis.
STRING gene function analysis shows that the main functions of the analyzed genes are focused on epidermal cell differentiation, keratinization, intermediate fibrin, and keratin formation. While keratin is the main protein that forms the skin epidermis and fur hair follicles, which plays a  Based on the abovementioned experimental results, we will further detect the performance of 26 genetic biomarkers in clinical subtypes. Age and cancer status among the clinical factors were fundamentally connected with the recurrence status of patients with skin disease (as shown in Figure 12-15). us, these 26 genes ought to be believed in various clinical types to figure out which clinical state they are more appropriate for.
As shown in Figure 12-15, the outcomes indicate the prescient impact of these 26 gene biomarkers. Patients  Mathematical Problems in Engineering without a tumor had enhanced survival contrasted with those who had a tumor. Additionally, patients who were less than and equal to 57 years of age had enhanced survival contrasted with those who were more than 57 years of age. erefore, 26 prognostic markers are closely related to the recurrence of metastatic skin cancer and still have a good prognostic risk assessment effect among the patient samples with different clinical factors, which can clearly distinguish between high-risk and low-risk patient samples.

Discussion
Multiple statistical analysis methods (including single-factor survival analysis, the multifactor Cox proportional hazards regression model, elastic network regression, and ROC curve analysis) were utilized to distinguish contrasts in the gene expression profiles of patients with skin cancer in the current investigation. After using a directed cluster analysis strategy, we identified 26 prognostic genes from the training samples and validated them on a test sample. e outcomes indicated that patients with skin cancer could be stratified as low risk and high risk based on these 26 genes, which suggests the achievability of the mining technique utilized in the current investigation. e 26 prognostic genes distinguished in the current examination effectively separated the patients with skin cancer as indicated by the danger of recurrence. From a biological perspective, these genes may have significant reference value for the therapy of repeat of metastatic skin cancer and may impact future clinical investigations of skin cancer and medication advancement.
Several genes among above 26 prognostic genes are closely associated with skin cancer and metastatic skin cancer, which included SSH3, CLIC5, and ACER1 genes. For example, CLIC5 was associated with HCC cell development, metastasis, and invasion of other parts of HCC [27]. SSH3 promotes the invasion and metastasis of cancer cells by affecting the signaling cascade involving LIMK1/Rac1 [28]. e PKHD1L1 gene has been shown to have a high frequency of mutations in various types of epithelial cancers, such as skin cancer [29]. ACER1 can regulate the state of epidermal cells and the occurrence of tumors [30]. DOCK9 plays an induction role in epithelial cancer [31]. FERMT1 can promote the metastasis of cancer cells [32]. High expression of KRT6C promotes the spread and metastasis of cancer [33]. e TRIM16 gene is inhibited in the process of skin canceration and can reduce the metastasis of skin cancer cells [34]. e SERPINB family genes repress the intrusion and metastasis of cancer cells [35].
MUM1 was positive in 33/36 (92%) melanoma cases (21/ 22 [95%] conventional primary melanoma and 12/14 [86%] metastatic melanoma) and may be a supplement to melanoma markers [36]. e abnormal expression of S100A2 is relevant to the canceration of epidermal cells. Compared with normal cells, the expression of S100A2 in tumor cells increases [37]. Although the remaining genes have not been directly correlated with metastatic skin cancer, many studies have found that these genes are correlated with other different cancers or cancer-related life activities. For example, GLTP expression regulates cell shape and participates in cell proliferation and other activities by interacting with other proteins [38]. SCN4A gene expression was found in both strong and weak metastatic PC cells [39]. e KRT6B gene can be used to separate lung adenocarcinoma from squamous cell carcinoma [40]. RHOV is expressed in lung cancer cell lines and upregulated in most studied lung tumors [41].
ere is a strong correlation between the activity level of FGL-2 and malignant proliferative diseases, which can be used as a diagnostic tool for malignant tumors [42].
In a word, the 26 gene biomarkers distinguished dependent on the elastic network method can availably forecast the risk of patients who had skin cancer and may be built into a useful model of prognosis which can provide some reference value for later research.

Data Availability
Using e Cancer Genome Atlas, data on skin cancer from the Xena Functional Genomics Explorer (xenabrowser.net/ datapages/) were obtained, including the number of patients (n � 481) and the number of genes assessed (n � 20,530).

Conflicts of Interest
e authors declare that they have no conflicts of interest.