Identification of Diagnostic Markers in Infantile Hemangiomas

Background Infantile Hemangiomas (IHs) are common benign vascular tumors of infancy that may have serious consequences. The research on diagnostic markers for IHs is scarce. Methods The “limma” R package was applied to identify differentially expressed genes (DEGs) in developing IHs. Plugin ClueGO in Cytoscape software performed functional enrichment of DEGs. The Search Tool for Retrieving Interacting Genes (STRING) database was utilized to construct the PPI network. The least absolute shrinkage and selection operator (LASSO) regression model and support vector machine recursive feature elimination (SVM-RFE) analysis were used to identify diagnostic genes for IHs. The receiver operating characteristic (ROC) curve evaluated diagnostic genes' discriminatory ability. Single-gene based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) was conducted by Gene Set Enrichment Analysis (GSEA). The chemicals related to the diagnostic genes were excavated by the Comparative Toxicogenomics Database (CTD). Finally, the online website Network Analyst was used to predict the transcription factors targeting the diagnostic genes. Results A total of 205 DEGs were singled out from IHs samples of 6-, 12-, and 24-month-old infants. These genes principally participated in vasculogenesis and development-related, endothelial cell-related biological processes. Then we mined 127 interacting proteins and created a network with 127 nodes and 251 edges. Furthermore, LASSO and SVM-RRF algorithms identified five diagnostic genes, namely, TMEM2, GUCY1A2, ISL1, WARS, and STEAP4. ROC curve analysis results indicated that the diagnostic genes had a powerful ability to distinguish IHs samples from normal samples. Next, the results of GSEA for a single gene illustrated that all five diagnostic genes inhibited the “valine, leucine, and isoleucine degradation” pathway in the development of IHs. WARS, TMEM2, and STEAP4 activated the “blood vessel development” and “vasculature development” in IHs. Subsequently, inhibitors targeting TMEM2, GUCY1A2, ISL1, and STEAP4 were mined. Finally, 14 transcription factors regulating GUCY1A2, 14 transcription factors regulating STEAP4, and 26 transcription factors regulating ISL1 were predicted. Conclusion This study identified five diagnostic markers for IHs and further explored the mechanisms and targeting drugs, providing a basis for diagnosing and treating IHs.


Introduction
Infants with infantile hemangiomas (IHs), with a morbidity rate of 4-10%, have developed one of the most prevalent benign vascular neoplasms in pediatrics [1]. IHs are clinically divided into the proliferation and involution stages and show a rapid proliferation during the frst year of life. Where most of them multiply within 6 to 10 months, reaching the maximum size by 3 to 5 months old, and slowly regress before 4 years old in 90% of cases [2], while the remaining 10% of IHs may require intervention resulting from location, size, and complications, involving obstruction, dysfunction, ulcers, hypothyroidism, disfgurement, and other syndromes with life-threatening [3]. Head and neck ulcers are the most common severe anatomical deformations which heavily burden the child and their families [4,5].
Te formation of hemangiomas might be evoked by secondary physiological events in patients carrying germline risk mutations, such as perinatal hypoxia or mechanical stress during delivery [6]. Although the potential pathogenesis has not been fully elucidated, the aberrant responses of pluripotent stem cells to stimuli such as hypoxia [7], abnormal glucose metabolism, and the renin-angiotensin system [8] are considered critical pathogenic factors. Te injury elicited by IHs depends on the depth and distribution of hemangioma-like lesions. IHs can occur in any organ system, and diagnosis may be delayed up to 3 months after birth in cases where deep local IHs present as a blue tumor with ill-defned boundaries [9].
Te high expression of glucose transporter-1 (GLUT-1) protein draws a distinction between IHs and other vascular tumors or malformations in pathology [10]. As a classical pathological diagnostic marker of IHs, GLUT-1 is present at all stages of IHs and has limited signifcance for the early diagnosis of IHs. Clinical events summary proved if propranolol is used early in the IHs proliferation period, which could minimize the demand for surgical reduction and reconstructive surgery in the future [11]. Terefore, the diagnostic markers contributing to an early diagnosis of IHs are eager to be explored and will be a crucial component in disease treatment.
Recently, various new biomarkers were studied and examined for IHs diagnosis, prediction of therapeutic efect, and nosogenesis investigation [12][13][14][15][16]. Diferentially expressed genes (DEGs) between normal and 6-, 12-, and 24month-old IHs samples were screened from the Gene Expression Omnibus (GEO) database in the present study, and the front-rank genes were identifed by DEGs coordinated with PPI network construction. Eventually, we confrmed TMEM2, GUCY1A2, ISL1, WARS, and STEAP4 for diagnosis and development of IHs based on receiver operator characteristic (ROC) curves analysis Gene Set Enrichment Analysis (GSEA), Gene-chemical interaction, and transcription factor prediction.

Data Source.
Te transcriptome data of the GSE127487 dataset [12] were obtained by downloading from the GEO database (https://www.ncbi.nlm.nih.gov/gds). Six normal skin samples, fve hemangioma samples from 6-month-old infants, six hemangioma samples from 12-month-old infants, and six hemangioma samples from 24-month-old infants were selected for subsequent analysis. Te hemangioma samples were all from infants who were not treated with propranolol.

Identifcation of DEGs in the Development of IHs.
To acquire genes that were involved in the development of IHs, we frst fltered genes that were diferentially expressed between normal and 6-month-old IHs samples, normal and 12-month-old IHs samples, normal and 12-month-old IHs samples, respectively by "limma" R package (version 3.44.3), employing P-value <0.05 and |Log2FC| >2 as screening thresholds [17]. Ten, the up-and down-regulated genes from the three groups were intersected separately. Finally, the common up-and down-regulated genes were obtained and considered candidate genes for the following analysis.

Functional Enrichment of DEGs. Te above conjoint
DEGs were submitted to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis in the plugin ClueGO of Cytoscape software (version 2.5.8) [18].

PPI Network Construction.
Te Search Tool for Retrieving Interacting Genes (STRING) database (https:// string-db.org/) was utilized to construct the PPI network [19] after concealing the independent nodes. Nodes with an interaction confdence level greater than 0.4 were incorporated into the network. Te data were introduced into Cytoscape software (version 2.5.8) for visual display.

Diagnostic Genes Screening and Verifcation.
Te least absolute shrinkage and selection operator (LASSO) is a highdimensional data analysis method that synchronously conducts regularization and variable selection, which would enhance the prediction accuracy and validity [20]. Te "glmnet" R package was utilized to execute the LASSO algorithm. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is a feature selection algorithm based on a support vector machine, which ranks the features based on deleting the recursive feature sequence [21]. Te e1071 R package applied the SVM algorithm. Intersection genes identifed from the two algorithms were deemed as characteristic genes. In addition, the area under the curve (AUC) of receiver operating characteristic (ROC) was computed by using the pROC package [22] to evaluate the diagnostic ability of potential diagnostic genes in discriminating hemangioma samples from normal samples.

Gene Set Enrichment Analysis.
To explore the mechanisms of diagnostic genes in the development of IHs, we performed GSEA for a single gene in "clusterProfler" and "org.Hs.eg.db" in R, took the expression value of each gene as the phenotype fle and ranked the correlation coefcients between each gene and all genes in the gene sets. Te threshold for enrichment signifcance was |NES| >1, and adjust P-value was <0.05.

Gene-Chemical Interaction and Transcription Factor
Prediction. Te chemicals related to the diagnostic genes were excavated by the Comparative Toxicogenomics Database (CTD) (https://ctdbase.org/) [23]. Te online website Network Analyst (https://www.networkanalyst.ca/faces/ home.xhtml) was used to predict the transcription factors targeting the diagnostic genes.

Statistical
Analysis. All analyses were conducted using the R programming language, and the Wilcoxon test compared the data from diferent groups. In all analyses, P-values less than 0.05 were regarded as statistically signifcant.  Table S4). Tese 205 common DEGs among the 6-, 12-, and 24-month-old infants were considered candidate genes for subsequent analysis. To investigate the function of these genes in the genesis of IHs, we performed functional enrichment analysis on these genes. As a result, 62 GO items and 9 KEGG pathways were enriched (Supplementary Table S5). Interestingly, we found that these genes were mainly involved in biological processes such as "blood vessel development," "vasculogenesis," "stress fber," "regulation of endothelial cell proliferation," "blood vessel maturation," "regulation of systemic arterial blood pressure by circulatory renin-angiotensin," "sprouting angiogenesis," "cell-cell junction," "regulation of angiogenesis," "regulation of adaptive immune response," "vascular process in the circulatory system," "endothelium development" and et al. (Figure 3(a)). At the same time, these genes were also associated with KEGG pathways such as "purine metabolism," "cGMP-PKG signaling pathway," "Notch signaling pathway," "cell adhesion molecules," "leukocyte transendothelial migration," "adipocytokine signaling pathway," "renin secretion" and et al. (Figure 3(b)). Hence, we speculated that these genes might impact the development of IHs via these processes.

Diagnostic Genes Identifcation.
A PPI network was constructed to explore the connections between the proteins encoded by the 205 DEGs. First, we dug up 127 interacting proteins and created a network with 127 nodes and 251 edges ( Figure 4). In the following, we ran the LASSO logistic regression analysis based on these 127 genes and yielded the gene coefcients graph and the cross-validation graph, fltering out seven characteristic genes according to lambda close to 0 and the lowest error rate, namely PCDH17, TMEM2, GUCY1A2, ISL1, RUNX2, WARS, and STEAP4 (Figures 5(a) and 5(b)). Meanwhile, we conducted the SVM-RFE algorithm to rank the 127 genes for characteristics, and the top 20 genes were listed in Supplementary  Table S6. As shown in Figure 5(c), 14 characteristic genes were identifed using a 10-foldcross-validation approach at the optimal point "0.0189," the top 14 genes in Supplementary Table S6. Subsequently, we obtained the characteristic genes for IHs by intersecting the distinct genes predicted by LASSO and SVM-RFE, namely TMEM2, GUCY1A2, ISL1, WARS, and STEAP4 ( Figure 5(d)). After that, we proceeded with expression analysis and ROC curve analysis to further assess the diagnostic ability of the characteristic genes. As shown in Figure 5(e), the expression of all fve distinct genes was increased in the infant hemangioma samples compared to the normal samples (all the P-value lower than 0.001). Furthermore, we found that except for GUCY1A2, which had an AUC value of 0.956, the other four diagnostic genes had an AUC value of 1, indicating the adequacy of fve hub genes as diagnostic markers to accurately distinguish IHs samples from normal samples ( Figure 5(f ).

Transcription Factors Regulating the Diagnostic Genes.
Via Network Analyst, we predicted 14 transcription factors regulating GUCY1A2, 14 transcription factors regulating STEAP4, and 26 transcription factors regulating ISL1, as revealed in the connection network ( Figure 9). ISL1 and STEAP4 were both regulated by SOX2, NANOG, SUZ12, and AR in the network. ISL1 and GUCY1A2 were both regulated by BMI1 and SMAD4. ERG1 and SETDB1 regulated STEAP4 and GUCY1A2 simultaneously. Based on the potential drugs and the transcription factors of the target diagnostic marker identifed, the linking of these drugs to the transcriptional factors was shown in the network diagram ( Figure 10). Ulteriorly focusing on the relationship between diagnostic biomarkers-related drugs and corresponding major transcription factors (Supplementary Table S10), we found that GUCY1A2 inhibitors, aristolochic acid I, and entinostat could respectively inhibit the corresponding transcription factors SETDB1 and ERG1. Te inhibitors of STEAP4, 7, 8-dihydro-7, 8-dihydroxybenzo(a)pyrene 9, 10oxide simultaneously inhibited the expression of ERG1, SETDB1, and SUZ12, antirheumatic agents inhibited ERG1, sulforaphane and troglitazone inhibited AR. ISL1 inhibitors, cyclosporine inhibited AR, and sunitinib inhibited both AR and NANOG.

Discussion
IHs are characterized by abnormalities in the proliferation of vascular endothelial cells and vascular structure. Distinguished from congenital vascular malformations, IHs generally do not appear until a few weeks after birth [5],          reporting a predominant prevalence (male-female ratio of 2.4 : 1) in females [24] and a higher incidence in premature infants and low birth weight (<2500 g). Current academic fndings revealed a 40% increased risk for every 500 g birth weight loss [25], a history of chorionic villus sampling was accompanied by a two-fold raised risk of IHs [26], and siblings with IHs may elevate the incidence of subsequent siblings [27]. In addition, multiple pregnancies, Caucasians, preeclampsia, advanced maternal age, and placental abnormalities (such as placenta previa, placental abruption, and abnormal insertion of the umbilical cord) are also existing as risk factors for IHs occasion. According to guidelines published in multiple countries and territories, pediatricians and primary care clinicians are supposed to regularly monitor IHs in infants during the frst months after birth and refer infants requiring intervention to specialists as early as possible [3]. Hence, early diagnosis is of great beneft for the subsequent treatment of IHs.
In this study, bioinformatics analyses were adopted for potential biomarkers identifcation in the development of IHs, providing a particular reference for the diagnosis and treatment of IHs. We frstly obtained 205 candidate genes by diferential gene expression analysis and intersection. Te results of GO enrichment analysis revealed that vascular development was the primary function of the candidate genes, which was primarily associated with cardiac development, vascular maturation, and the blood circulation system. Ulteriorly, we used the STRING tool to construct a PPI network, Lasso analysis, and SVM regression analysis, subtracting above 205 candidate genes, and fnally acquiring fve hub genes, i.e., TMEM2, GUCY1A2, ISL1, WARS, and STEAP4.
Acting as a transmembrane protein, TMEM2 is associated with tumor invasion and angiogenesis [28,29] and manipulates cell adhesion and metastasis by degrading nonprotein components of the extracellular matrix [30]. Gastric cancer prognosis is positively related to GUCY1A2 [31], which plays a vital role in rheumatoid arthritis development [32], and afects SARS-CoV 2 infections and drug repurposing [33]. ISL1 is identifed as a biomarker of oral squamous cell carcinoma [34], up-regulator of vascular development [35], and activator of EMT to induce drug resistance in prostate cancer [36], serving to regulate reproductive system development [37]. WARS, an essential enzyme catalyzing the ligation of tryptophan to its cognate tRNA tryptophan during translation via aminoacylation, promoting cancer metastasis by controlling angiogenesis and immune response in the tumor microenvironment, which plays a pathological role in autoimmune disease and Alzheimer's disease concurrently. In addition, high expression of WARS could be referred to as a prognostic biomarker for sepsis [38]. STEAP4 is a metal reductase that reduces Fe 3+ to Fe 2+ and Cu 2+ to Cu + , promoting iron and copper transmembrane transport and maintaining their homeostasis, as well as playing a benefcial role against damage from infammatory diseases and metabolic disorders. Nevertheless, its abnormal expression may accelerate cancer proliferation or progression [39]. Furthermore, the expressions were signifcantly higher for fve hub genes in IHs samples than in normal samples. And we fnally defned these fve genes as diagnostic markers for IHs by the ROC analysis.
Similar functional efects of fve diagnostic markers were obtained through further biofunctional analyses. It is well known that the dramatic and disorganized growth of blood vessels is a defning feature of IHs, which represents a vascular perturbation comprising pathologic angiogenesis and vasculogenesis during post-natal growth [40]. Among the top fve with highly signifcant enrichment, TMEM2, WARS, and STEAP4 were related to angiogenesis and vascular development. Valine, leucine, and isoleucine are collectively referred to as Branched-chain amino acids (BCAAs), which are essential for the growth of human tumors by donating nitrogen to generate macromolecules such as nucleotides. Cellular autonomous and involuntary efects of BCAA metabolic transformations play a role in cancer progression, indicating that the crucial proteins in BCAA metabolic pathway can be prognostic or diagnostic biomarkers for human cancer [41]. Consistently, all fve biomarkers were involved in the degradation of valine, leucine, and isoleucine. In addition, several drugs were recognized to predict the diagnostic makers' inhibitor using the CTD database. Tereinto, most of these drugs are involved in angiogenesis inhibition, e.g., arbutin [42], entinostat [43], sulforaphane [44], and sunitinib [45], which have been demonstrated to suppress angiogenesis in tumor treatments efectively. Triclosan [46] and ribavirin [47] had been identifed as efectual angiogenesis inhibitors. Cyclosporine promoted osteogenic diferentiation of periodontal membrane stem cells and curbed angiogenesis [48]. Troglitazone can subdue angiogenesis by inhibiting endothelial cells (EC) proliferation and vascular endothelial growth factor (VEGF) expression [49]. Triacsin C inhibits angiogenesis manipulated by fatty acid metabolism [50]. Te results were expected to make new progress in the treatment of IHs.
Te function of hub genes is infuenced by their expression level and regulated by transcription factors [51]. Terefore, we predicted the transcription factors regulating these screened biomarkers through the online website "Network Analyst," reporting that only three genes can obtain new work analysis results, indicating that diferent transcription factors regulated each gene. Te cotranscription factors involved in the regulation of 2 hub genes simultaneously involved SOX2, NANOG, SUZ12, AR, BMI1, SMAD4, ERG1, and SETDB1. of IHs are cellularly composed of EC, pericytes, and stem cells [52]. EC works as the main driving force of angiogenesis, which responds to VEGF stimulation, critical signal transduction for angiogenesis [53], and rapidly transforms from a static state to an active form in the proliferation of IHs [54]. Notably, most co-transcription factors regulate the VEGF pathway, affecting the angiogenesis of various tumors or organs. For example, high expression of SOX2, AR, and BMI1 stimulates angiogenesis in multiple tumors [55][56][57][58]. Over-expressed NANOG can promote EC proliferation and angiogenesis by inducing transcription of liver kinase-1 (FLK1) [59]. On the contrary, a low expression level of NANOG is required for the maintenance of EC homeostasis and angiogenesis [60].
EC development is also intervened by SMAD4 [61]; loss of SMAD4 leads to decreased expression of VEGFR2 and regulates angiogenesis [62].
It is interesting to note that there are some connections in potential drugs and transcription factors of target diagnostic markers, which were both involved in the regulation of angiogenesis. In addition to this study's results, recent literature also shows that aristolochic acid [63], and entinostat [64] induce the down-regulation of SMAD4 in human cells. Sulforaphane weakens the NANOG and SOX2 in respiratory tumors [65]. Sunitinib therapy can regulate SMAD4 expression by altering the expression of Mir-452-5p in renal cancer [66]. Overall, regulation of transcription factors may play a role in drug inhibition of diagnostic marker genes.
Tis study also has limitations, mainly refected in a small sample size as source databases are derived. And the molecules as the diagnostic markers of IHs are only based on bioinformatic analysis data. On this account, we plan to collect as sufcient samples as possible to verify the clinical diagnosis value of Hub genes for IHs. More specifcally, the discriminant capacity and potential action mechanism of hub genes on IHs development were further clarifed by Single-cell sequencing. Te feasibility of drug inhibitors and related transcription factors obtained in this study will be the primary objects discussed in the following survey on IHs therapy.

Conclusions
Bioinformatics methods were utilized to identify fve hub genes (including TMEM2, GUCY1A2, ISL1, WARS, and STEAP4) with power diagnostic capacity for IHs in this study. Tese diagnostic markers may regulate IHs development by participating in tumorigenesis-related biological processes, such as angiogenesis and BCAAs metabolism. We also determined inhibitors and transcription factors associated with these diagnostic markers. Despite as a preliminary study that requires further research to validate these results, the analysis outcomes provide new insights into the diagnosis and treatment of IHs.

Data Availability
Te data used to support the fndings of this study are available from the corresponding author upon request.

Conflicts of Interest
Te authors declare that there are no conficts of interest regarding the publication of this article.

Authors' Contributions
HJ, GS, and YH were responsible for conception, design and administrative support. SH, RC, and SG were equal contributors in manuscript preparation. All authors were responsible for provision of study materials, literature research, collection, assembly of the data, the data analysis, interpretation, and fnal approval of the manuscript. Sicong Huang, Ruiqi Chen, and Shihua Gao contributed equally to this work.

Acknowledgments
Tis work was supported by Guangdong Medical Science and Technology Research Fund Project (A2021053 and A2019189) and the Guangzhou Health Science and Technology Project (20211A011079 and 2019A010057).

Supplementary Materials
Table S1: DEGs of IHs in the 6-month-old compared to normal samples. Table S2: DEGs of IHs in the 12-month-old compared to normal samples. Table S3: DEGs of IHs in the 24-month-old compared to normal samples. Table S4: common up-and down-regulated genes among the 6-, 12-, and 24-month-old IHs samples. Table S5: GO and KEGG analysis of candidate genes. Table S6: the top 20 signifcant genes listed by the SVM-RFE algorithm ranked in 127 candidate genes for characteristics. Table S7: GO items relevant to diagnostic genes. Table S8: all functional annotation enrichment analysis results of the identifed diagnostic genes. Table S9: all potential compounds are associated with the identifed diagnostic genes. Table S10: potential compounds are associated with the major transcription factors. (Supplementary Materials)