Bioinformatic Identification of Neuroblastoma Microenvironment-Associated Biomarkers with Prognostic Value

The microenvironment plays a vital role in the tumor recurrence of neuroblastoma. This research aimed at exploring prognostic genes that are involved in neuroblastoma microenvironment. We used “estimate” R package to calculate the immune/stromal/ESTIMATE scores of each sample of ArrayExpress dataset E-MTAB-8248 based on the ESTIMATE algorithm. Then we found that immune/stromal/ESTIMATE scores were not correlated with age/chromosome 11q, but tumor stage, MYCN gene amplifications, and chromosome 1p. Samples were then divided into high- and low-score groups, and 280 common differentially expressed genes (DEGs) were identified. 64 potential prognostic genes were harvested through overall survival analysis from the common DEGs. 14 prognostic genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) were obtained after screening in two independent cohorts. GO and KEGG analysis discovered that common DEGs and 64 potential prognostic genes are mainly involved in T-cell activation, lymphocyte activation regulation, leukocyte migration, and the interaction of cytokines and cytokine receptors. Correlation analysis showed that all prognostic genes were negatively correlated with MYCN amplification. Cox analysis identified 5 independent prognostic genes (ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C).


Introduction
Neuroblastoma is a cancer that develops from immature nerve cells found in multiple parts of the body, including neuroblastomas, ganglioblastomas, and ganglion neuromas [1]. Neuroectodermal cells containing neuroblastomas originate from the neural crest during fetal development and are destined for the adrenal medulla and sympathetic nervous system [1]. Neuroblastoma accounts for 97% of all neuroblastic tumors, is heterogeneous, and differs in location, histopathological appearance, and biological characteristics [1]. Neuroblastoma comprises 6% to 10% of all childhood cancers and causes 15% of all pediatric cancer deaths [2]. Although its molecular basis is still unknown, clinical diversity is closely related to numerous clinical and biological factors, including patient age, tumor stage and histology, and genetic and chromosomal abnormalities [1]. Better learning the molecular mechanism of neuroblastoma could provide a crucial message related to prognosis [3]. e tumor microenvironment is the environment around a tumor, including the surrounding blood vessels, immune cells, fibroblasts, signaling molecules, and the extracellular matrix. e tumor and the surrounding microenvironment are closely related and interact constantly [4]. e tumor microenvironment has been recognized as a complex milieu where cancer cells interact with immune and stromal cells via numerous biochemical and physical signals that are crucial for cancer progression and metastasis [4][5][6]. e activities of immune cells and stromal cells have been demonstrated owing to the capacity to predict cancer outcomes [7].
ESTIMATE is a method that uses gene expression characteristics to infer the ratio of stromal cells to immune cells in a tumor sample. ESTIMATE scores are correlated with tumor purity based on DNA copy number in samples from 11 different tumor types. ese samples have been profiled on Agilent and Affymetrix platforms, or based on RNA sequencing, which can be obtained through e Cancer Genome Atlas. e use of 3,809 transcription profiles provided elsewhere in the public domain further confirmed the accuracy of the prediction. e ESTIMATE method can be used to evaluate the presence of stromal cells and immune cell infiltration in tumor samples using gene expression data [8]. According to the ESTIMATE algorithm, researchers have performed a prognostic assessment and are exploring the genetic changes of many malignant tumors [9][10][11]. But the immune and stromal scores of neuroblastoma have not yet been elucidated. No research has clearly shown whether the ESTIMATE algorithm can be used to predict the prognosis of neuroblastoma.
To understand the molecular pathogenesis of neuroblastoma involved in the tumor microenvironment, in this work, we used the ESTIMATE algorithm in conjunction with multiple cohorts to explore underlying genetic factors in the tumor microenvironment of neuroblastoma and identify prognostic genes.

ArrayExpress Microarray Data.
We searched the ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) database [12] using "neuroblastoma" as the keyword and filtered the results by checking "Homo sapiens" as the "Organism" and selecting "ArrayExpress data only." In the filtered results, we dug into each detailed description of each dataset to see if they contain survival data. Finally, dataset E-MTAB-8248 that comes from the platform of Agilent-020382 Human Custom Microarray 44k (Feature Number version) was selected for this study. e processed data of the annotation table were obtained from the ArrayExpress website.

Pretreatment of Microarray Data.
To begin with, the annotation procedure was performed by using the downloaded annotation table. en, "estimate" R package was used to figure immune, stromal, and ESTIMATE scores of the dataset by ESTIMATE algorithm [8].

Differentially Expressed Gene Analysis.
Patients were then stratified according to the median of immune and stromal scores. Differentially expressed genes (DEGs) were explored between high and low immune/stromal score groups using the "limma" R package [13]. e cutoff was set as (1) |log2 (fold-change) | > 1 and (2) false discovery rate (FDR) < 0.05. e heatmaps of DEGs were generated from "pheatmap" R package.

Enrichment Analysis of Common DEGs.
e "cluster-Profiler" R package was applied to perform functional enrichment analysis of common DEGs, including Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG).

Identification of Potential Prognostic Genes.
Kaplan-Meier analysis was conducted by the "survival" R package.
e Kaplan-Meier curve illustrated the overall survival difference between low and high expressions of each common DEG gene. is relationship was examined by the log-rank test.

Establishment of the Protein-Protein Interaction (PPI)
Network of Potential Prognostic Genes. To better study the potential prognostic genes, we constructed a PPI network using the STRING online database (http://string-db.org) and the Cytoscape software (http://www.cytoscape.org/). e degree distribution was calculated using "cytoHubba" plug-in and Network Analyzer from Cytoscape. e clusters (highly interconnected regions) in this PPI network were discovered by a Cytoscape plug-in named "MCODE."

Screening for Prognostic Genes in Two Independent
Cohorts. Neuroblastoma patients with expression and survival data were available in the datasets of GSE85047 and GSE49710 from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database. e gene expression profiles were measured experimentally using platforms of GPL5175 and GPL16876, respectively (Table 1). We screened the identified potential prognostic genes using the two cohorts by overall survival. p value <0.05 (log-rank test) was considered statistically significant. e genes that are in the intersection of the above two results were considered prognostic genes.

Identification of the Independent Prognostic Value of
Prognostic Genes. MYCN amplification was strongly correlated with a poor prognosis in neuroblastoma cases [14]. e status of MYCN and the expression of prognostic genes were extracted from GSE49710, and their correlation was checked using the Spearman test. In addition, to explore the independence of the prognostic genes, we performed univariate and multivariate Cox analysis on prognostic genes and MYCN in the GSE49710 cohort. In the end, we reviewed previous studies related to crucial prognostic genes and neuroblastoma.

Characteristics of Cohorts Included in
is Study. E-MTAB-8248 (n � 223) was chosen for identifying potential prognostic genes in the TME of neuroblastoma. GSE85047 (n � 283) and GSE49710 (n � 498) were taken as screening tools to obtain prognostic genes from potential genes. e clinical characteristics of the cohorts included in this study are shown in Table 2. e distributions of immune/stromal/ESTIMATE scores did not vary with age/chromosome 11q ( Figure 1). However, in the tumor stage section, the distribution of stromal scores was significantly different in the stages, but immune and

Immune/Stromal/ESTIMATE Scores and Prognosis.
223 neuroblastoma patients were divided into high-and low-score groups according to their median scores. Kaplan-Meier survival curves showed that high immune score patients had a better trend of overall survival than that the low-score group (p value � 0.091 in the log-rank test) (Figure 2(a)). Analogously, as shown in Figures 2(b) and 2(c), patients with high stromal/ESTIMATE scores achieved a better trend on overall outcomes than those with low scores (p value >0.216). Not noticeable significant statistical results simply predict that fewer microenvironment-related genes may be found in the next steps and do not negate the role of the microenvironment in neuroblastoma. So, we still will focus on the genes related to both the immune and stromal scores.

Identification of Common DEGs from Immune and Stromal
Scores. 223 neuroblastoma patients were grouped based on their median scores. As shown in Figure 3(a), 601 DEGs were found among immune score groups (Table S1). 503 DEGs among stromal score groups were discovered, which are shown in Figure 3(b) and Table S2. 279 common upregulated DEGs (Figure 3(c)) and 1 common   (Table S3) then entered into the next steps. GO enrichment analysis (Table S4) showed top GO terms identified in the 280 common DEGs mainly involved with lymphocyte differentiation, T-cell activation, regulation of lymphocyte activation, external side of the plasma membrane, collagen-containing extracellular matrix, G proteincoupled receptor binding, and the peptide binding ( Figure 3(e)). KEGG analysis (Figure 3(f) and   Journal of Oncology common DEGs was mainly observed for cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, Staphylococcus aureus infection, 17 cell differentiation, and chemokine signaling pathway.

Identification of Potential Prognostic Genes.
Kaplan-Meier survival curves were built for the 280 common DEGs based on overall survival. According to the logrank test (p value <0.05), 64 common DEGs had survival predictive ability (all were upregulated in TME), of which 60 and 4 were positively and negatively correlated with survival, respectively ( Figure 4 and Table S3).

Functional Enrichment Analysis and Protein-Protein Interaction Construction of Potential Prognostic Genes.
GO enrichment was conducted using the 64 prognostic genes, and the enriched items in GO were involved largely in T-cell activation and the regulation of lymphocyte activation, MHC class II protein complex and the MHC protein complex, and antigen binding and the cytokine activity ( Figure 5(a)). KEGG pathways were mainly enriched for hematopoietic cell lineage, herpes simplex virus 1 infection, influenza A, and cytokine-cytokine receptor interaction ( Figure 5(b)). A PPI network was constructed by the online STRING tool and Cytoscape software. After hiding disconnected nodes, there were 50 nodes and 239 edges in this network ( Figure 5(c)). All genes positively correlated with the overall outcomes of neuroblastoma cases. CD69, CD3D, CD2, CCL5, CCR2, CD48, LCK, IL7R, IRF8, and HLA-DRB1 were the top ten genes in this network sorted by degree. RGS1 had the strongest predictive ability in this network for overall survival; however, FYB had the weakest ability to do so. e interaction between HLA-DRA and CD74 was the tightest in this PPI network, with a combine score of 0.999. e loosest relationship happened between CD52 and CCR2, with a combine score of 0.409. en, we explored the PPI network using the Cytoscape "MCODE" plug-in and found 3 main clusters. e top two modules had more than 9 nodes, which were plotted by us (Figures 5(d) and 5(e)). In cluster one, CCL5, HLA-DRA, HLA-DRB1, IRF8, and CD74 were the top five genes sorted by degree, and HLA-DMA, IGHV4-38-2, HLA-DPB1, CCL5, and HLA-DMB were the top five genes sorted by prognostic ability. In cluster two, CD69, CCR2, CD48, LCK, and TLR8 were on the top five sorted by degree distribution,  and CXCR6, CXCL14, CD69, TLR8, and GPR183 were on the top five sorted by prognostic ability.

Screening in Two GEO Cohorts.
We then screened the 64 potential prognostic genes described above using the GSE85047 and GSE49710 cohorts from the GEO database. p value <0.05 in the log-rank test was set as the cutoff. 14 genes were screened from GSE85047 cohort (Table S6), 56 genes were found from GSE49710 (Table S7), and 14 in the intersection were identified as prognostic genes (Table 1).

Identification of the Independent Prognostic Value of
Prognostic Genes. MYCN amplification was strongly correlated with a poor prognosis in neuroblastoma cases [14].   e status of MYCN and the expression of prognostic genes were extracted from GSE49710, and their correlation was checked using the Spearman test. e correlation results shown in Table 3 indicated that all prognostic genes were negatively correlated with the amplification of MYCN. In univariate Cox analysis, all prognostic genes and MYCN amplification were related to prognosis. In multivariate Cox analysis, MYCN amplification, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C could still predict prognosis (Table 4). e above results suggested that ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C were potential independent prognostic factors for neuroblastoma. Finally, we did a literature review, finding that, among 14 prognostic genes identified in the present study, 3 had been reported having experimental evidence involving in the progression of neuroblastoma, none of them had been previously experimentally proofed affecting neuroblastoma prognosis (Table 5).

Discussion
In this study, we attempted to identify tumor microenvironment-related genes that contribute to neuroblastoma outcome from an ArrayExpress dataset. By comparing global gene expression between high-and low-score patients, 280     common DEGs were found. en 64 genes were identified owning potential prognostic value via survival analysis. Importantly, we screened the 64 genes in two independent GEO cohorts, identifying 14 genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) as prognostic genes. Moreover, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C were found via Cox analysis having independent prognostic value ( Figure 6). We deployed the ESTIMATE algorithm to figure the immune/stromal/ESTIMATE scores of neuroblastoma cases and found these scores were not correlated with age/chromosome 11q, but tumor stage, MYCN gene amplifications, and chromosome 1p. e MYCN, a member of the MYC family, has been associated with high-risk disease and poor prognosis in approximately 25% of cases of neuroblastoma. Currently, the amplification of MYCN is still one of the most risky genetic markers in neuroblastoma [18]. Neuroblastoma tumor cells lacking parts of chromosome 1 or 11 (known as 1p deletion or 11q deletion) may be predictive of poor prognosis [19]. In our study, we found that the scores were related to the MYCN gene amplifications and chromosome 1p, which may reveal a new understanding of this disease, but need further research. And we also found the scores were related to the prognosis of neuroblastoma. Immune cells and stromal cells are essential components active in the tumor microenvironment, which affect the survival, proliferation, and treatment resistance of neuroblastoma [20][21][22][23][24]. e crosstalk between tumor and microenvironment influences the inflammatory response: cancer cells interact with both the innate and the adaptive immune system and use immune cells for tumor survival and protection from immunological attacks [21,[24][25][26].
GO analysis showed the common DEGs were largely enriched in T-cell activation, external side of the plasma membrane, and G protein-coupled receptor binding. Besides, the KEGG analysis demonstrated that cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, and Staphylococcus aureus infection were mostly enriched. Previous studies have shown that the biological processes of the immune system are critical to the formation of a complex tumor microenvironment, which is consistent with our findings [21,22,24]. In the last few years, the immunological  characteristics of neuroblastoma have been more deeply understood, and the development of effective neuroblastoma immunotherapy strategies has attracted widespread attention [27][28][29].
Overall survival analysis of the common DEGs found 64 genes owned potential prognosis capacity in neuroblastoma cases. In the PPI network of these 64 potential prognostic genes, we are particularly interested in CCL5 in cluster one ( Figure 5(d)) and CD69 and TLR8 in cluster two ( Figure 5(e)), of whom hold the top five degree and prognostic ability in their own cluster. CCL5 is an inflammatory chemokine that is widely secreted from natural killer cells, T cells, fibroblasts, epithelial cells, and platelets [30] and promotes chemotaxis on cells involved in the immune/inflammatory response [31]. Studies demonstrated that CCL5 is related to certain tumor cells, such as malignant melanoma cells [32], ovarian [33], prostate [34], and breast cancer cells [35]. e exact functions of CCL5 in tumor biology are still unclear. CCL5 production is relevant to inducing proper immune responses against tumors [36].
e current research showed that CCL5 promotes tumor migration and invasion. CCL5 secreted by tumor cells and other cells in the breast cancer tumor microenvironment can recruit tumor-associated macrophages into the tumor microenvironment [31]. Evidence on the direct relationship between CCL5 and neuroblastoma is still lacking and requires further study. CD69, a type II glycoprotein known to regulate inflammation through T-cell migration and retention in tissues, plays an important role in inducing the exhaustion of tumor-infiltrating T cells [37]. CD69 expression is readily upregulated upon activation in most leukocytes, which mainly underlies its widespread use as a marker of activated lymphocytes and NK cells [38]. Mita et al. reported that the use of anti-CD69 monoclonal antibodies to treat tumor-bearing mice significantly reduced tumor-infiltrating T-cell failure and enhanced protection against metastasis, indicating the efficacy of anti-CD69 antibodies in the treatment of malignant tumors [37]. CD69 expression is associated with hypoxia in the tumor microenvironment [39]. e research on the relationship between CD69 and tumors is still scarce, is still in infancy, and needs more in-depth research. TLR7 and TLR8 are phylogenetically and structurally closely related members of the TLR family; together with TLR9, they constitute one of the six major TLR clades [40]. TLR8 has been identified as a natural receptor for singlestranded RNA and is thought to act as an effective activator of the innate immune response after viral infection [41][42][43]. TLR8 is the only TLR that has been shown to be necessary and sufficient to reverse the suppressive function of Treg cells, resulting in strong tumor-suppressive effects [44]. Numerous reports have described the Toll-like receptor (TLR) expression in the tumor microenvironment as it relates to cancer progression, as well as their involvement in inflammation [45]. Inflammation triggered by TLR signaling can directly influence tumor-induced senescence in tumor microenvironments, and the effects are variable depending on different TLR signaling and tumor types [46].
In this study, GSE85047 and GSE49710 cohorts were used for the screening tools for prognostic genes. We screened the prognostic value of these 64 genes based on the cohorts, of which 14 genes were identified as prognostic genes. ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C could predict prognosis independently. We then conducted a small-scale review of these 14 genes, finding IL7R, CCL19, and GZMA have been demonstrated having experimental evidence with the progression of neuroblastoma by the previous research [15][16][17], while no gene has been experimentally confirmed having roles in prognosis to neuroblastoma. Several gene expression studies indicate that IL7 has increased expression in NB specifically in tumors with a better prognosis, making IL7 a key candidate for this soluble factor. Prasad's finding implicated IL7 within the Schwannian stroma of the neuroblastoma tumor architecture as having a paracrine signaling effect on neighboring neuroblasts which may provide the antiproliferative and differentiation signals postulated [15]. Walker and colleagues found that changes in the CCL19 signal transduction pathway can lead to defects in the migration of dendritic cells in neuroblastoma, which in turn affects tumor progression [16]. Yarmarkovich demonstrated that GZMA has a strong correlation with T-cell factors, and GZMA is very active in the neuroblastoma tumor microenvironment [17]. Most of the prognostic genes obtained in our study have not been clarified the mechanism in neuroblastoma, and more efforts need to be implemented in the future.

Conclusion
In summary, by applying ESTIMATE algorithm, and mining from ArrayExpress dataset and other two independent GEO cohorts, we got 14 prognostic genes (ABCA6, SEPP1, SLAMF8, GPR171, ABCA9, ARHGAP15, IL7R, HLA-DPB1, GZMA, GPR183, CCL19, ITK, FGL2, and CD1C) having the capacity to illustrate the prognosis of neuroblastoma patients. Besides, ARHGAP15, ABCA9, CCL19, SLAMF8, and CD1C had independent prognostic value. Interestingly, only 3 of the 14 genes have been previously experimentally identified to be involved in the progression of neuroblastoma. It would be anticipated to test whether this set of genes can provide better survival prediction capabilities than individual genes. More efforts on the further research of the 14 genes may reveal a potential relationship among tumor microenvironment and neuroblastoma prognosis in a novel and comprehensive means.

Data Availability
Publicly available datasets were analyzed in this study.

Authors' Contributions
Yi Wang organized and wrote the manuscript. Huan Luo and Jing Cao contributed to the literature search for the manuscript. Chao Ma designed and produced the figures and revised the manuscript. All authors reviewed the manuscript and approved the manuscript for publication. Table S1: 601 DEGs found between immune score groups (FDR < 0.05). Table S2: 503 DEGs found between stromal score groups (FDR < 0.05). Table S3: the performance of 280 common DEGs in Kaplan-Meier analysis from the E-MTAB-8248 cohort with p value tested in the log-rank test. Table S4: the gene ontology term enrichment of 280 common differentially expressed genes with p.adjust <0.05. Table S5: the Kyoto Encyclopedia of Genes and Genomes pathway enrichment of 280 common differentially expressed genes with p.adjust <0.05. Table S6: the list of 14 prognostic genes screened from the GSE85047 cohort with p value <0.05 in the log-rank test. Table S7: the list of 56 prognostic genes screened from the GSE49710 cohort with p value < 0.05 in the log-rank test. Table S8: the main scripts used in this study. (Supplementary Materials)