Identification of Potential Biomarkers Associated with Basal Cell Carcinoma

Purpose This work is aimed at identifying several molecular markers correlated with the diagnosis and development of basal cell carcinoma (BCC). Methods The available microarray datasets for BCC were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified between BCC and healthy controls. Afterward, the functional enrichment analysis and protein-protein interaction (PPI) network analysis of these screened DEGs were performed. An external validation for the DEG expression level was also carried out, and receiver operating characteristic curve analysis was used to evaluate the diagnostic values of DEGs. Result In total, five microarray datasets for BCC were downloaded and 804 DEGs (414 upregulated and 390 downregulated genes) were identified. Functional enrichment analysis showed that these genes including CYFIP2, HOXB5, EGFR, FOXN3, PTPN3, CDC20, MARCKSL1, FAS, and PTCH1 were closely correlated with the cell process and PTCH1 played central roles in the BCC signaling pathway. Moreover, EGFR was a hub gene in the PPI network. The expression changes of six genes (CYFIP2, HOXB5, FOXN3, PTPN3, MARCKSL1, and FAS) were validated by an external GSE74858 dataset analysis. Finally, ROC analysis revealed that CYFIP2, HOXB5, PTPN3, MARCKSL1, PTCH1, and CDC20 could distinguish BCC and healthy individuals. Conclusion Nine gene signatures (CYFIP2, HOXB5, EGFR, FOXN3, PTPN3, CDC20, MARCKSL1, FAS, and PTCH1) may serve as promising targets for BCC detection and development.


Introduction
Basal cell carcinoma (BCC) is one of common malignant epithelial neoplasms that derive from basal cells and accounts for nearly 75% of skin cancers [1]. It was reported that the BCC incidence rates have been increasing and roughly reached 2.75 million cases around the world [2]. BCC was generally categorized into several types such as superficial, nodular, and nevoid BCC on the basis of its histological and clinical characteristics. Notably, the majority of lesions for BCC occurred on the head and neck [3]. The convincing evidence has suggested that gender, age, sunlight exposure (especially ultraviolet radiation), smoking, chemicals, and fair skin were all BCC risk factors [4,5]. Although BCC recurrence was extremely frequent due to local tissue destruction, it rarely metastasizes or leads to death [6]. The imiquimod application, surgical excision, and radiation therapy have been unsatisfactory for BCC treatment. Therefore, it is imperative to develop promising strategies for early diagnosis and intervention of BCC.
Fortunately, the next-generation high-throughput sequencing provides fascinating opportunities for bioinformatics analysis. A growing number of studies have demonstrated that gene expression profiling could effectively screen molecular makers and predict pathogenesis for tumorigenesis in the last decades, thereby offering novel therapeutic strategies for cancer treatment [7]. Fang et al. previously found that the expression level of EGR1 (early growth response 1) was decreased in BCC based on a microarray analysis. Furthermore, they also noted that EGR1 might suppress epidermal proliferation via modulating CDC25A that is critical for cell cycle progression from G1 to S phase [8]. Heller et al. conducted a global gene expression analysis and identified BCC-associated differentially expressed gene (DEG) targets, providing several new diagnostic signatures for the susceptibility of BCC [9]. Later, Yunoki et al. identified three BCC-correlated genes including BCL2 (B-cell lymphoma 2), PTCH1 (Patched 1), and SOX9 (SRY-box 9) and unique gene networks involved with tumorigenesis through a microarray analysis [10]. Additionally, Jee et al. classified BCC into squamous cell carcinoma-(SCC-) like BCC, normal-like BCC, and classical BCC subtypes by a comparative analysis of gene expression profiles from BCC, SCC, and normal skin tissues. Moreover, functional analysis revealed that hedgehog signaling pathways were possibly related to classical BCC development [11]. Although numerous investigations have extracted several BCC-related gene makers, a deeper insight into the underlying molecular mechanisms of BCC has not been completely achieved.
In this study, the eligible mRNA microarray datasets from BCC tissues were retrieved and downloaded from the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database. Then, the DEGs between BCC and normal tissues were screened followed by functional enrichment analyses. The protein-protein (PPI) network was also constructed to explore underlying interactions among key genes. Finally, the expression levels of DEGs were further examined by an external dataset, and diagnostic performance of these genes were also evaluated. This work provided novel gene signatures for BCC diagnosis and promoted a deeper understanding for BCC progression.

Materials and Methods
2.1. Data Acquisition. The publicly accessible microarray datasets in BCC were searched and downloaded from the NCBI-GEO [12] database (https://www.ncbi.nlm.nih.gov/ geo/). The keyword terms of carcinoma, basal cell (MeSH Terms) OR basal cell carcinoma (All Fields) AND "Homo sapiens" (porgn) AND "gse" (Filter) were used for precise searching. The inclusive criteria for datasets were as follows: (i) the datasets were gene expression profiles, (ii) the data was obtained from tumor tissues of patients with BCC or normal skin tissues from healthy individuals (normal controls), and (iii) the patients did not receive medication or other treatments. Finally, five datasets (GSE103439, GSE53462, GSE42109, GSE39612, and GSE7553) were chosen according to the abovementioned screening criteria as shown in Table 1. Of these, four datasets (GSE103439, GSE42109, GSE39612, and GSE7553) were generated by the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform, and the platform for GSE53462 was GPL10558 Illumina Human HT-12 V4.0 expression beadchip. These datasets would be used for the following integrated analyses, which consisted of 48 BCC tissues and 85 normal tissue samples.

Identification of Differentially Expressed Genes (DEGs).
A meta-analysis of five microarray datasets from different platforms was carried out according to the guidelines provided by Ramasamy et al. [13]. The preprocessing of raw data across different platforms, mainly including background correction, normalization, log 2 transformation, and gene expression calculation, was performed. The overlapped genes in five datasets were selected. Then, the R metaMA package was used to calculate effect sizes from unpaired data by either classical or moderated t-tests (Limma) for each study and combine these effect sizes [14]. The DEGs between BCC and normal controls were screened according to the threshold of the multiple comparison correction false discovery rate ðFDRÞ < 0:05. The whole R sentences for data processing and identification of DEGs in this study were shown in Supplementary Materials (available here). Finally, the hierarchical clustering analysis of identified DEGs was performed by the pheatmap package (https://cran.r-project.org/package= pheatmap) in R language.

Gene Ontology (GO) and Pathway Enrichment Analyses.
To explore the possible biological roles of the DEGs, the functional analyses were conducted. Firstly, the GO analysis involving three categories of molecular function (MF), cellular component (CC), and biological process (BP) was conducted with BiNGO plugin in cytoscape software [15]. The parameters were set as follows: (i) hypergeometric test was used as the selected statistical test; (ii) Benjamini and Hochberg FDR correction was set as selected correction; (iii) selected significance level was 0.05; and (iv) for testing option, we used whole annotation as the reference set. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was undertaken using KOBAS (http://kobas.cbi.pku.edu.cn/index.php), which is a webaccessible tool and could identify enriched pathways for an input set of genes by mapping to genes using known Here, we performed a PPI analysis for screened DEGs based on this database to identify significant protein pairs. The cytoscape software (http://apps .cytoscape.org/apps/cytonca) was utilized to construct the PPI network. Moreover, the CytoNCA [17] (http://apps .cytoscape.org/apps/cytonca), a cytoscape plugin, was used to analyze topological characteristics of PPI nodes using the without weight parameter. Several metrics such as Degree Centrality (DC), Betweenness centrality (BC), and Closeness centrality (CC) were estimated. In this study, hub protein was determined on the basis of high scores. In addition, the MCODE plugin (http://apps.cytoscape.org/apps/mcode) in cytoscape software was employed to further identify several PPI submodules using the default parameters of degree cutoff = 2, node score cutoff = 0:2, K-core = 2, and max: depth = 100 [18]. Notably, the PPI score > 3 was considered as the cutoff for screening significantly enriched functional modules from the PPI network.

Validation of DEG Expression Levels by a Noncoding
RNA Expression Profile. The GSE74858 dataset, including 3 BCC patients and 3 healthy individuals, was obtained from the GEO database and used as a verification dataset to evaluate the expression levels of DEGs. Furthermore, the receiver operating characteristic (ROC) analysis was also carried out to assess the performance of DEGs in this research using the "pROC" package (https://cran.r-project .org/web/packages/pROC/index.html) in R language. The area under the ROC curve (AUC) was then computed. Herein, if AUC values of the genes were greater than 0.9, these genes were considered to discriminate patients undergoing BCC from healthy controls with high specificity and sensitivity.

DEG Screening.
The meta-analysis of five gene expression profiles was conducted, and the DEGs were identified. Consequently, a total of 804 DEGs were extracted between BCC and normal controls according to screening criteria described in Materials and Methods. Of these, there were 414 upregulated genes and 390 downregulated genes in patients with BCC. Moreover, the clustering analysis of these DEGs indicated that they could clearly distinguish BCC and healthy controls as exhibited in Figure 1, suggesting that the differential gene expression was possibly responsible for BCC occurrence. Table 2, the GO functional annotation analysis showed that identified DEGs were enriched in 53 GO-BP terms including cellular component organization, cell cycle, cell proliferation, and ectoderm development. Meanwhile, they were focused on 54 GO-CC terms such as intracellular part, intracellular, and cytoplasm. It was also observed that there were three GO-MF terms (protein binding, binding, and ribonuclease H activity). Notably, numerous genes were closely associated with the cellular process, including CYFIP2 (cytoplasmic FMR1 interacting protein 2; upregulated), HOXB5 (homeobox B5; downregulated), EGFR (epidermal growth factor receptor; downregulated), FOXN3 (forkhead box N3; upregulated), PTPN3 (protein tyrosine phosphatase nonreceptor type 3; downregulated), CDC20 (cell division cycle 20; upregulated), MARCKSL1 (myristoylated alanine rich protein kinase C substrate like 1; upregulated), FAS (fas cell surface death receptor; downregulated), and upregulated PTCH1. More specifically, FOXN3, CDC20, and EGFR played essential roles in the cell cycle process, and PTCH1 and EGFR were involved in cell proliferation. Additionally, KEGG pathway enrichment analysis revealed that these genes were primarily enriched in pathways in cancer, p53 signaling pathway and BCC pathway. We also noted that PTCH1 exerted critical biological roles in the BCC pathway ( Figure 2).

Functional Enrichment Analyses of DEGs. As shown in
3.3. PPI Analysis. PPI network analysis was conducted to explore the underlying interactions of DEGs. In total, 549 gene nodes (296 upregulated genes and 254 downregulated genes) and 1462 protein pairs were obtained in the PPI network (Figure 3(a)). The top 5 gene nodes with a high degree were EGLN3 (Egl-9 family hypoxia inducible factor 3; degree = 82), XPO1 (exportin 1; degree = 81), EGFR (degree = 77), CFTR (CF transmembrane conductance regulator; degree = 48), and MCM2 (minichromosome maintenance complex component 2; degree = 47), and they were regarded as hub genes. Furthermore, the PPI submodule analysis suggested that 3 significantly enriched submodules . Interestingly, we found that the genes in submodule 1 were all upregulated.

Verification of DEG Expression Levels and ROC Analysis.
A noncoding RNA profile GSE74858 was downloaded from the GEO repository, and there were 145 overlapped genes between this dataset and the integrated datasets. We found that 25 overlapped genes exhibited significant differential expression, and their expression trends were also consistent with our integration analysis. More notably, six genes (CYFIP2, HOXB5, FOXN3, PTPN3, MARCKSL1, and FAS) were predominately correlated with the cellular biological process, and their expression patterns were validated by an external GSE74858 dataset ( Figure 4). Besides, the ROC curve analysis was conducted to assess the diagnostic values of DEGs in BCC by five microarray datasets. There were 439 genes according to AUC > 0:9, and 129 genes were overlapped with DEGs by integrated analysis and then extracted. Our findings revealed that CYFIP2 (AUC = 0:949), HOXB5 (AUC = 0:908), PTPN3 (AUC = 0:952), MARCKSL1 (AUC = 0:962), PTCH1 (AUC = 0:981), and CDC20 (AUC = 0:956) could significantly distinguish BCC samples and healthy controls, implying that these genes might serve as diagnostic makers for BCC detection ( Figure 5).
Overwhelming evidence has suggested that the patched gene (PTCH) family played vital roles in BCC induction and progression [19,20]. A previous study highlighted that PTCH could encode a receptor for the hedgehog signaling pathway which was important for vertebrate development and tumorigenesis [21]. Undén et al. found that the expression level of PTCH mRNA was overexpressed in BCC cells compared with nontumor epidermal cells, which was similar to our finding that PTCH1 was upregulated in BCC patients [22]. Interestingly, numerous researchers pointed out that approximately 90% of function mutations in PTCH1 were  [24]. A recent investigation has reported that PTCH1 had two mutational statuses (germinal and somatic mutation). Moreover, there was a higher expression level of PTCH1 in BCC with germinal and somatic PTCH1 mutations than that only with germinal PTCH1 mutation [25]. Additionally, our functional analyses showed that PTCH1 played significant roles in cell proliferation and BCC pathway. However, few reports investigated the detailed molecular mechanisms of the biological roles of PTCH1 on BCC cell proliferation and growth. Herein, we performed a ROC curve analysis of PTCH1, and the result indicated that this gene was capable of discriminating BCC patients and healthy controls with a relatively high AUC value (AUC = 0:981). Taken together, our findings suggested that PTCH1 might be involved in the pathogenesis of BCC and has a great diagnostic value for BCC detection. We found that four genes (upregulated CYFIP2 and MARCKSL1 and downregulated HOXB5 and PTPN3) were also key players in cell biological processes. Furthermore, their expression patterns were validated by the differential expression analysis based on an external dataset. Besides, these four genes could clearly differentiate BCC patients from normal controls. CYFIP2 was reported to be a direct p53 target gene, and its overexpression was closely linked with the development of various cancers [26,27]. Ling et al. analyzed p53 mutations in sporadic and hereditary BCC tumors, and genetic alterations of p53 were responsible for BCC progression [28]. Later on, Huang et al. stated that p53 activation by imiquimod contributed to cell apoptosis of a skin BCC/KMC1 cell line [29]. These findings provided indirect evidence for the hypothesis that CYFIP2 participated in the pathogenic mechanism of BCC. HOXB5 is a member of the homeobox (HOX) gene B cluster and plays key roles in several cancers [30,31]. Lee et al. suggested that HOXB5 increased cell proliferation and invasiveness in estrogen receptor-(ER-) positive breast cancer [32]. More interestingly, this research group also found that HOXB5 could upregulate EGFR expression and thereby promote HOXB5driven invasion in ER-positive breast cancer [33]. In this work, we noted that EGFR was involved in cell cycle and proliferation and served as a pivotal hub gene in PPI analysis. Whether the HOXB5/EGFR axis participated in BCC development remains to be illuminated in the following analysis. PTPN3 (also known as PTPH1) was also reported to be strongly correlated with diverse cancers including colorectal cancer and gastric adenocarcinoma [34][35][36]. Furthermore, Li et al. unraveled that the PTPN3 depletion could block the degradation of EGFR, which accelerated cell proliferation and tumorigenicity in lung cancer cells [37]. The potential roles of the PTPN3/EGFR axis on BCC occurrence also need to be clarified in the future. MARCKSL1 is a cytoskeletal regulator and implicated with the initiation of multiple cancers [38]. Our results implied that MARCKSL1 might be a promising diagnostic maker for BCC prediction. However, the  The FAS gene contains a highly conserved cytoplasmic death domain and encodes a transmembrane protein of the tumor necrosis factor receptor superfamily. Existing evidence has shown that FAS could bind to a death ligand, FAS ligand (FASL), to induce the cell apoptotic pathway [39]. The early research indicated that FAS normally was underexpressed and even undetectable in BCC while there was a high FASL level, possibly inducing cell death and contributing to cancerization [40]. A later study examined the expression level of FAS/FASL in BCCs using the immunohistochemical method. The results implied that FASL was located on the cell membrane of keratinocytes at the basal cell layers and FAS/FASL was markedly decreased in BCC [41]. Wang et al. revealed that FAS/FASL mRNA expression and protein levels were reduced in the BCC compared to the normal skin samples and FASL immunostaining levels were strongly related to the ability of tumor invasiveness and metastasis [42]. Similarly, we found that there was a lower FAS expression in BCC tissues than healthy controls, which was also confirmed by a bioinformatics analysis with another BCC dataset. Therefore, we speculated that FAS might be a key gene driver in BCC development. The other two genes' (CDC20 and FOXN3) levels were increased in BCC tissues compared to normal tissues, and functional analysis showed that they were mainly involved in the cell cycle process. We also observed that CDC20 exhibited a good performance for BCC diagnosis based on a ROC analysis. Therefore, we inferred that the aberrant expression of CDC20 and FOXN3 was probably associated with BCC development. However, the influence of these two genes on BCC pathogenesis has not been fully investigated.
There are still limitations in our analysis. Firstly, only one external noncoding RNA dataset GSE74858 was obtained and used to evaluate expression levels of identified DEGs due to lack of gene expression profile of BCC. Notably, the expression patterns of six genes (CYFIP2, HOXB5, FOXN3, PTPN3, MARCKSL1, and FAS) were consistent with our initial differential expression analysis in the training dataset. However, the expression levels of other key genes still need to be further evaluated by using a larger sample size. Secondly, the corresponding experimental assays such as cell and animal experiments were not performed to validate our conclusion primarily due to lack of enough patients' samples and limited research funding. Thirdly, the underlying molecular mechanism between several signaling pathways involving key genes and BCC development remains to be deciphered. Fourthly, the relevant clinical characteristics are also required to be collected to assess BCC prognosis.
In summary, nine gene signatures (CYFIP2, HOXB5, EGFR, FOXN3, PTPN3, CDC20, MARCKSL1, FAS, and PTCH1) may play central roles in the initiation and  progression of BCC, which provides deeper insights into BCC management. However, experimental verification and integrated bioinformatics analyses still need to be carried out in the future.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Supplementary Materials
The whole R sentences for data processing and identification of differentially expressed genes in this study.