Transcription Factors Leading to High Expression of Neuropeptide L1CAM in Brain Metastases from Lung Adenocarcinoma and Clinical Prognostic Analysis

Background There is a lack of understanding of the development of metastasis in lung adenocarcinoma (LUAD). This study aimed to explore the upstream regulatory transcription factors of L1 cell adhesion molecule (L1CAM) and to construct a prognostic model to predict the risk of brain metastasis in LUAD. Methods Differences in gene expression between LUAD and brain metastatic LUAD were analyzed using the Wilcoxon rank-sum test. The GRNdb (http://www.grndb.com) was used to reveal the upstream regulatory transcription factors of L1CAM in LUAD. Single-cell expression profile data (GSE131907) were obtained from the transcriptome data of 10 metastatic brain tissue samples. LUAD prognostic nomogram prediction models were constructed based on the identified significant transcription factors and L1CAM. Results Survival analysis suggested that high L1CAM expression was significantly associated with poorer overall survival, disease-specific survival, and progression-free interval (p < 0.05). The box plot indicated that high expression of L1CAM was associated with distant metastases in LUAD, while ROC curves suggested that high expression of L1CAM was associated with poor prognosis. FOSL2, HOXA9, IRF4, IKZF1, STAT1, FLI1, ETS1, E2F7, and ADARB1 are potential upstream transcriptional regulators of L1CAM. Single-cell data analysis revealed that the expression of L1CAM was significantly and positively correlated with the expression of ETS1, FOSL2, and STAT1 in brain metastases. L1CAM, ETS1, FOSL2, and STAT1 were used to construct the LUAD prognostic nomogram prediction model, and the ROC curves suggest that the constructed nomogram possesses good predictive power. Conclusion Using bioinformatics methods, ETS1, FOSL2, and STAT1 were identified as potential transcriptional regulators of L1CAM in this study. This will help to facilitate the early identification of patients at high risk of metastasis.


Introduction
Lung cancer is a malignant tumor with increasing incidence and high mortality rates worldwide in recent years [1]. Non-small-cell lung cancer (NSCLC) accounts for approximately 85% of lung cancers, and lung adenocarcinoma (LUAD) accounts for 60% of diagnosed NSCLC, making it the most common type of NSCLC [2]. LUAD is a malignant tumor of the glandular epithelium. Current studies suggest that most LUADs progress through a sequence of atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), microinvasive adenocarcinoma (MIA), and finally invasive adenocarcinoma (IA). However, the exact mechanism of disease progression remains unclear [3].
Tumor infiltration and metastasis are important factors in the low overall survival of patients with LUAD [4]. Most patients with LUAD are diagnosed at an advanced stage or have distant metastases. The brain is one of the most common sites of hematogenous metastases of LUAD, and metastasis at this site is associated with high morbidity and mortality [5]. Approximately 10% to 20% of non-small-cell lung cancer patients have brain metastases at the initial diagnosis, and the majority are LUAD patients [5,6]. Additionally, about 40% to 60% of LUAD patients develop brain metastases during treatment [7]. According to the latest prognostic assessment model for brain metastases from lung cancer, the median survival of LUAD patients with brain metastases is approximately 15 months, which significantly affects patient prognosis [8]. Although TNM staging plays an important role in assessing the prognosis of patients with LUAD, some patients with similar staging and identical treatment courses have significantly different prognoses.
L1 cell adhesion molecule protein (L1CAM) is a transmembrane glycoprotein with a molecular weight of 200 to 220 kDa and is a member of the immunoglobulin superfamily of cell adhesion molecules, which plays an important role in the development and regeneration of neural tissue [9,10]. As an adhesion molecule, L1CAM can increase the migration and invasion abilities of tumor cells, specifically by promoting tumor cells to cross the endothelium, invade the basement membrane, and metastasize to other sites, thus playing an important role in tumor development and bloodstream metastasis [11]. In recent years, L1CAM has been found to be highly expressed in many tumor cell lines and tumor tissues, for example, in glioblastoma, metastatic brain tumors, endometrioid adenocarcinoma, colorectal cancer, and lung cancer [12][13][14][15][16]. Its high expression often indicates a poor prognosis, and it is thus a valuable diagnostic or prognostic marker; in addition, it may be a new target for cancer therapy [12,13,17-23].
In recent years, with the boom in single-cell technology, a large number of single-cell datasets have been generated for the analysis of gene regulatory networks in individual cells [14][15][16]. A large number of single-cell data mining studies have been carried out, contributing to the flourishing of research in the field of tumor microenvironment and cell development [24][25][26][27]. A study published in Nat Commun in 2020 revealed the transcriptome signature of LUAD brain metastases [28]. This dataset was used in this study to explore the potential regulatory network of L1CAM.
There remains a lack of understanding of the development of LUAD metastases resulting from L1CAM. We explored the potential mechanisms of L1CAM-related LUAD metastasis, and our results may provide new targets and ideas for the treatment of LUAD patients.

Materials and Methods
2.1. Data Retrieval. RNAseq data and clinical information were downloaded from the Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) LUAD project. The R package DESeq2 (version 1.26.0) was used for the differential expression analysis. Single-cell expression profile data were obtained from the transcriptome data of 10 metastatic brain tissue samples; information on single-cell annotations was downloaded from the same Gene Expression Omnibus (GEO) dataset, GSE131907 [28]. Differences in gene expression in each of the two datasets were analyzed using the Wilcoxon rank-sum test.
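As a rough illustration of the Wilcoxon rank-sum comparison mentioned above, the following Python sketch implements the test with a tie-corrected normal approximation and no continuity correction. The study itself worked in R; the function names and the expression values below are hypothetical and serve only to show the mechanics.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U for x, z score, p value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])               # rank sum of group x
    u = w - n1 * (n1 + 1) / 2         # Mann-Whitney U statistic for group x
    mean_u = n1 * n2 / 2
    # Tie correction: each tie group of size t contributes t^3 - t.
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                    # all values identical: no evidence of a shift
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p


# Hypothetical log2(TPM + 1) values for one gene in two groups of samples.
primary = [1.2, 0.8, 1.5, 1.1, 0.9]
metastasis = [2.4, 1.9, 2.8, 1.6, 2.2]
print(rank_sum_test(primary, metastasis))
```

The normal approximation is only reasonable for moderately large groups; for very small groups an exact test would be the safer choice.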

2.2. Enrichment Analysis. Enrichment analysis is an important means of demonstrating gene function. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were done with the clusterProfiler package (version 3.14.3) of R software (version 3.6.3). The org.Hs.eg.db package (version 3.10.0) was used for ID conversion. p < 0.05 was defined as statistically significant. In addition, we performed gene set enrichment analysis (GSEA), an analysis method for genome-wide expression profiling microarray data that compares genes to a predefined gene set and allows for an understanding of the expression status of target genes in a specific set of functional genes. False discovery rate < 0.25 and adjusted p value < 0.05 were defined as significant enrichment.
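To make the idea behind GO/KEGG over-representation analysis concrete, the sketch below computes a one-sided hypergeometric p value for a single gene set and applies a Benjamini-Hochberg adjustment. This is a minimal illustration in Python, not the clusterProfiler implementation; the choice of Benjamini-Hochberg as the adjustment method is an assumption (the text does not name it), and all counts and p values are hypothetical.

```python
from math import comb


def enrichment_p_value(background, in_set, query, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    background: number of genes in the background universe
    in_set:     number of background genes annotated to the gene set
    query:      number of genes in the query list (e.g. differential genes)
    overlap:    number of query genes that fall in the gene set
    """
    upper = min(in_set, query)
    total = comb(background, query)
    tail = sum(
        comb(in_set, i) * comb(background - in_set, query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 20000 background genes, 150 in the pathway,
# 300 query genes, 12 of which fall in the pathway.
p = enrichment_p_value(20000, 150, 300, 12)
print(p, benjamini_hochberg([p, 0.20, 0.03]))
```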
2.3. Immune Infiltration and StromalScore Analysis. Different immune cells play different roles in tumorigenesis, and the composition of immune cells varies from tumor to tumor [29]. Therefore, quantitative immune infiltration analysis of different types of immune cells is often carried out in the study of tumor mechanisms. The specific methods of immune infiltration analysis were similar to those of previous studies, and cell markers were derived from previous studies [30]. The immune infiltration analysis was performed on the retrieved dataset using ssGSEA, a built-in algorithm of the GSVA package in R. Immune infiltration and StromalScore analyses were performed using the CIBERSORT and ESTIMATE algorithms to calculate the degree of immune cell infiltration and the immune score, stromal score, and tumor purity in TCGA. The differences in immune cell infiltration and tumor purity between subtypes were further compared [31].

2.4. Prediction of Transcription Factors. Transcription factors and their downstream target genes form a gene regulatory network that plays a key role in regulating gene expression. The Gene Regulatory Network database (GRNdb) (http://www.grndb.com/) is a freely accessible database that provides a reliable way to explore gene expression profiles, correlations, and expression levels [32]. In this study, the GRNdb was used to reveal the upstream regulatory transcription factors of L1CAM in LUAD.

2.5. Construction of a LUAD Prognostic Nomogram Prediction Model. The nomogram is a visualization of the regression model results, which can be easily and quickly applied to the clinical assessment of patient prognosis [33][34][35][36]. The Cox regression analysis included risk genes for LUAD to construct a lung cancer risk prediction model, which was then used to predict the probability of survival at 1, 2, and 3 years for LUAD patients. The time-dependent receiver operating characteristic curve (ROC) and the area under the curve (AUC) values at 3 and 5 years were used to evaluate the independent predictive ability of the nomogram factors. In addition, calibration curves were plotted to check the accuracy of the nomogram model.
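The way a Cox-based nomogram turns covariates into a survival probability can be sketched as follows: the linear predictor is the coefficient-weighted sum of the covariates, and the predicted survival at time t is the baseline survival raised to the power exp(linear predictor). The Python sketch below assumes a standard proportional hazards model; every coefficient and baseline value in it is hypothetical and none is taken from the study.

```python
import math

# Hypothetical Cox coefficients (log hazard ratios) for illustration only.
COEFS = {"L1CAM": 0.20, "ETS1": 0.10, "FOSL2": 0.25, "STAT1": 0.15}

# Hypothetical baseline survival S0(t) at 1, 2, and 3 years, i.e. the
# survival of a reference patient whose covariates are all zero.
BASELINE_SURVIVAL = {1: 0.90, 2: 0.78, 3: 0.65}


def linear_predictor(expression):
    """Coefficient-weighted sum of the (centred) expression values."""
    return sum(COEFS[gene] * expression[gene] for gene in COEFS)


def survival_probability(expression, years):
    """S(t | x) = S0(t) ** exp(linear predictor) under proportional hazards."""
    return BASELINE_SURVIVAL[years] ** math.exp(linear_predictor(expression))


# Hypothetical patient with centred log2 expression values.
patient = {"L1CAM": 1.0, "ETS1": 0.5, "FOSL2": -0.2, "STAT1": 0.8}
for t in (1, 2, 3):
    print(t, round(survival_probability(patient, t), 3))
```

A printed nomogram encodes the same calculation graphically: each variable's contribution to the linear predictor is rescaled to points, and the total points axis is aligned with the survival probability axes.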
2.6. Statistical Analysis. All RNAseq data in fragments per kilobase of transcript per million mapped reads (FPKM) format were converted to transcripts per million reads (TPM) format and log2 transformed. Statistical analysis of survival data was done with the survival R package, and visualization was done using the survminer R package. Correlation [...]
[...] Figure 1(f), suggesting that high L1CAM expression is associated with distant metastasis, while the derived ROC curves suggest that high expression of L1CAM is associated with poor prognosis in LUAD. GSEA enrichment analysis suggested that the functions of L1CAM were significantly enriched in KEGG terms related to ECM receptor interaction, hematopoietic cell lineage, natural killer cell-mediated cytotoxicity, and pathways in cancer (Figure 1(j)).

3.2. Potential Upstream Regulatory Targets of L1CAM in LUAD Bloodstream Metastases. In this study, the GRNdb was used to reveal the upstream regulatory transcription factors of L1CAM. The predicted potential upstream transcriptional regulators are FOSL2, HOXA9, IRF4, IKZF1, STAT1, FLI1, ETS1, E2F7, and ADARB1. Heat map analysis of the correlation between these transcription factors and L1CAM expression in the TCGA-LUAD dataset suggested that all these transcriptional regulators were significantly and positively correlated with L1CAM expression (p < 0.001). The correlation analysis of these transcription factors with L1CAM expression in the GSE131907 LUAD brain metastasis malignancy cell dataset, meanwhile, is shown in Figure 2(c). Of the above regulators, ETS1, FOSL2, and STAT1 were found to be key transcriptional regulators in LUAD brain metastases. Analysis of the proportion of L1CAM-positive cells in LUAD and LUAD cerebrovascular metastases suggested that L1CAM was significantly more highly expressed in the brain tissue at the metastasis site (Figure 2(e)). Scatter plot correlation analysis suggested that the expression of L1CAM was significantly and positively correlated with the expression of ETS1, FOSL2, and STAT1 (Figure 2(f)). Survival analysis suggested that high expression of FOSL2 and STAT1 was significantly associated with poor prognosis of LUAD (p < 0.05), and high expression of ETS1 was also potentially correlated with poor prognosis of LUAD (p = 0.053) (Figure 2(g)).
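The correlation between a transcription factor and L1CAM can be illustrated with a rank-based coefficient. The Python sketch below computes a Spearman correlation as the Pearson correlation of midranks; the use of Spearman here is an assumption, since this subsection does not name the coefficient, and the expression vectors are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    return pearson(midranks(a), midranks(b))


# Hypothetical per-cell expression of a transcription factor and L1CAM.
tf_expr = [0.1, 0.4, 0.3, 0.9, 1.2, 0.7]
l1cam_expr = [0.2, 0.5, 0.1, 1.1, 0.9, 0.8]
print(spearman(tf_expr, l1cam_expr))
```

Because it works on ranks, this coefficient is insensitive to monotone transformations such as the log2 scaling of expression values.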
3.3. KEGG Analysis of the Upstream Regulatory Targets of L1CAM. To further clarify the potential functions of ETS1, FOSL2, and STAT1 in LUAD metastasis, we performed KEGG analysis on these regulators using GSEA (Figure 3). The functions of ETS1 are mainly related to vascular smooth muscle contraction, the Wnt signaling pathway, and long-term depression (Figure 3(a)). The functions of FOSL2 are mainly related to neuroactive ligand-receptor interaction, the NOD-like receptor signaling pathway, ECM receptor interaction, small cell lung cancer, and focal adhesion (Figure 3(b)). The functions of STAT1 are mainly related to cell adhesion molecules (CAMs), ECM receptor interaction, focal adhesion, Parkinson's disease, and Alzheimer's disease (Figure 3(c)).

3.4. LUAD Prediction Model Constructed Based on L1CAM, ETS1, FOSL2, and STAT1. Based on the strong correlations found in our analysis, L1CAM, ETS1, FOSL2, and STAT1 were used to construct the LUAD prognostic nomogram prediction model. Nomogram models predicting the probability of survival at 1, 2, and 3 years postdiagnosis for LUAD patients were constructed (Figure 4(a)). The total score was obtained by summing the points assigned to each variable for a given LUAD patient, and the probability of survival was read from the scale at that total score. The ROC curves of L1CAM, ETS1, FOSL2, and STAT1 associated with LUAD are shown in Figure 4(b). The results suggest that L1CAM, ETS1, FOSL2, and STAT1 have predictive power for LUAD prognosis. The calibration curve of the nomogram model is shown in Figure 4(c), suggesting that the nomogram has good predictive power.
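For readers unfamiliar with how an AUC summarizes a ROC curve, the Python sketch below computes the AUC of a binary outcome as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. This is a simplification: it ignores censoring and is not the time-dependent ROC used for survival data in the study. The function name, scores, and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted risk scores, higher meaning higher predicted risk
    labels: 1 for an event (e.g. death by the time horizon), 0 otherwise
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes for eight patients.
risk = [0.82, 0.35, 0.61, 0.15, 0.90, 0.40, 0.55, 0.20]
event = [1, 0, 1, 0, 1, 1, 0, 0]
print(roc_auc(risk, event))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.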

Discussion
LUAD is a common pathological type of lung cancer, and its incidence and mortality rates are on the rise [3,38,39]. Despite new advances in research into the diagnosis and clinical management of LUAD, the average 5-year survival rate for patients with LUAD is only 15%, and the related deaths account for nearly 30% of cancer-related deaths worldwide [40,41]. The mechanism of LUAD brain metastasis is still unclear, which hinders early detection and interruption of metastasis. Therefore, the active search for biological markers of LUAD brain metastases is of great clinical importance for early warning.
In this study, the TCGA database was used to identify differentially expressed genes in LUAD patients, which were then subjected to GO and KEGG enrichment analyses. The GRNdb database was used to reveal the upstream regulatory transcription factors of L1CAM in LUAD. Based on this, L1CAM, ETS1, FOSL2, and STAT1 were incorporated to construct a prognostic nomogram prediction model to assess the risk of LUAD metastasis.

Disease Markers
The functions of specific sets of genes in the tumor system are increasingly being revealed [42,43]. Previous studies have revealed that L1CAM contributes to the malignant phenotype of tumors through various signaling pathways: classically through the Wnt/β-catenin/TCF pathway, which promotes tumor metastasis, but also through activation of the Ras/Raf/Mek/Erk signaling pathway, which promotes epithelial-mesenchymal transition [44,45]. Inhibition of L1CAM expression has been shown to reduce the motility and invasiveness of NSCLC cells in vitro and tumorigenesis and distant metastasis in vivo [18]. Similar to previous studies, the present study found that high expression of L1CAM was associated with distant LUAD metastases. In addition, L1CAM was significantly more highly expressed in metastases from brain tissue than in lung tissue in LUAD, suggesting that it could be a potential marker of LUAD metastasis. Enrichment analysis showed that the genes associated with high L1CAM expression were mainly enriched in recombination of immune receptors built from immunoglobulin superfamily domains in BPs, mitochondrial inner membrane in CCs, and antigen-binding in MFs. In addition, high L1CAM expression was significantly associated with the enrichment of NK cells, Th1 cells, Tregs, T cells, cytotoxic cells, ADC, NK CD56dim cells, and B cells, suggesting that L1CAM may participate in the progression of LUAD by regulating the role of immune cells in the microenvironment. Among the pathways, cell adhesion molecules were enriched. This suggests that the pathway may be involved in the development of LUAD and affect patient prognosis.
Many different transcriptional regulators regulate L1CAM, and among these, ETS1, FOSL2, and STAT1 were found to be significantly correlated with L1CAM expression in LUAD. E26 transformation-specific-1, or ETS1, is a member of the ETS transcription factor family and is involved in the degradation of extracellular matrix proteins and cellular hypoxia tolerance through self-regulation and bypass regulation [46]. It can activate or repress the transcription of certain target genes and is involved in the growth and differentiation of a wide range of immune cells and the regulation of the expression of many cytokines [47,48]. The ETS1 signaling pathway was found to promote tumor cell migration, invasion, and secretion of matrix metalloproteinases (MMPs), which was closely associated with lymph node metastasis and distant metastasis in patients with lung, colon, ovarian, and breast cancers [49][50][51][52]. As in previous studies, we found that ETS1 was one of the key transcriptional regulators of brain metastasis in LUAD, and the expression of L1CAM was significantly and positively correlated with the expression of ETS1.
KEGG analysis showed that the function of ETS1 was enriched in pathways related to immune cell maintenance and renewal, suggesting that ETS1 may influence the development of LUAD through relevant pathways. High expression of ETS1 was also potentially correlated with poor prognosis in LUAD, and the ROC curve suggested that ETS1 had some predictive power for lung cancer prognosis. However, to our knowledge, no other studies have identified a potential link between ETS1 and L1CAM. In the present study, ETS1 was found to function as a potential transcription factor for L1CAM in LUAD brain metastases.
FOS-related antigen 2 (FOSL2) belongs to the AP-1 family of transcription factors and plays an important role in tumor proliferation and cell cycle regulation [53]. FOSL2 was found to be aberrantly expressed in non-small-cell lung cancer, ovarian cancer, liver cancer, and other malignant tumors and is involved in the growth and metastasis of tumor cells as a prooncogene [54][55][56]. In this study, FOSL2 was found to be an upstream regulatory transcription factor of L1CAM in LUAD, and the expression of L1CAM was significantly and positively correlated with that of FOSL2. KEGG analysis showed that the function of FOSL2 was mainly enriched in the neuroactive ligand-receptor signaling pathway, Nod-like receptor signaling, ECM receptor interaction, small cell lung cancer, and focal adhesion, suggesting that FOSL2 may play a role in LUAD through these pathways. The ROC curves suggest that FOSL2 has a predictive power for LUAD prognosis, similar to the findings of Wang et al., who found that FOSL2 promoted TGF-β1-induced migration in NSCLC, and that patients with higher FOSL2 expression had a significantly higher risk of premature death [54].
Figure 4(c): Calibration curves for this nomogram to predict the probability of survival at 1, 2, and 3 years for LUAD patients. "Observed fraction survival probability" means the actual observed survival rate. "Nomogram predicted survival probability" refers to the survival rate that was predicted.
The signal transducer and activator of transcription (STAT) family is a group of proteins that have transcriptional activity and transmit signals from the cell membrane into the nucleus, thereby activating gene transcription. The most important functions lie in activating the body's immune response and the regulation of cell proliferation and transformation [57,58]. STAT1 was the first member of the STAT family to be identified and is commonly regarded as a tumor suppressor protein in malignant tumors such as breast cancer, melanoma, and leukemia [59][60][61]. However, the role of STAT1 in the progression of different tumors remains controversial. It has been suggested that the IFN/STAT1 signaling pathway may promote the growth of tumor cells [62,63]. STAT1 is overexpressed in specific cellular environments and is associated with poor prognosis in cancer patients [64]. In this study, STAT1 was found to be an upstream regulatory transcription factor of L1CAM in LUAD, and the expression of L1CAM was significantly and positively correlated with that of STAT1. KEGG analysis showed that high STAT1 expression was associated with the enrichment of cell adhesion and neural function pathways. High expression of STAT1 was significantly associated with poor prognosis in LUAD. The present study revealed that STAT1 may act as a potential transcriptional regulator of L1CAM in the brain metastasis of LUAD. The relevance of STAT1 to L1CAM has been revealed in several previous studies. For example, STAT1 and L1CAM expressions were found to be jointly downregulated in diabetes-related skin disorders [65]. In colorectal cancer, L1CAM caused upregulation of clusterin expression in cancer cells as a result of the transactivating effect of STAT1 on clusterin [66]. Previous studies have also suggested that L1CAM may activate STAT1, but only revealed a correlation between them. The interaction between the two has yet to be verified by further experiments [67].
Thus, high L1CAM levels may indicate immune response abnormalities, tumorigenesis, and the dysregulation of neuron functions. High L1CAM expression was significantly associated with poor prognosis for OS, DSS, and PFI in LUAD patients. The ROC curves suggest that L1CAM has some predictive power for LUAD prognosis. This is similar to the findings of Yu et al., who concluded that L1CAM was a predictor of PFS in non-small-cell lung cancer patients and that positive expression of L1CAM suggested a poorer survival outcome [68]. A nomogram model was thus constructed using L1CAM, ETS1, FOSL2, and STAT1 to predict the survival probability of LUAD patients at 1, 2, and 3 years. The calibration curves suggest that the nomogram has a good predictive ability and is expected to be a valid indicator for assessing the prognosis of LUAD patients.
Based on validation across the TCGA, GRNdb, and GSE131907 scRNA datasets, ETS1, FOSL2, and STAT1 were identified as potential transcriptional regulators of L1CAM. However, as this study was based on the TCGA database and the mRNA expression data of GSE131907, its population specificity is evident, and its applicability to other populations remains to be further investigated. In addition, the genes used in this study were taken from public databases, and the genes used in the model construction were selected only on the basis of statistically significant associations; their correlation with etiology and clinical immunotherapy outcomes cannot be confirmed at this time. Further experimental validation of the clinical value of the model in this study is needed.

Conclusion
By bioinformatics methods, ETS1, FOSL2, and STAT1 were identified as potential transcriptional regulators of L1CAM in this study. The nomogram prediction model based on L1CAM, ETS1, FOSL2, and STAT1 can be used as an intuitive and noninvasive quantitative tool to predict the risk of LUAD metastasis. This would facilitate the early identification of patients at high risk of metastasis for early clinical intervention and guide individualized treatment planning to improve the prognosis of LUAD patients.

Data Availability
RNAseq data and clinical information were obtained from the TCGA (https://portal.gdc.cancer.gov/) LUAD project; information on single-cell annotations was downloaded from the GEO dataset GSE131907.

Conflicts of Interest
The authors declare that they have no conflict of interest.

Authors' Contributions
XF and NG came up with and designed the research. EX, YM, and CL conducted the research. XF and NG analyzed the data. XF, NG, EX, YM, and CL contributed the analysis tools. XF and NG wrote the paper. All the authors read, revised, and approved the final version of the manuscript.