Integrated Bioinformatics Analysis Identifies ELAVL1 and APP as Candidate Crucial Genes for Crohn's Disease

Immune imbalance and barrier destruction of intestinal mucosa are the central pathogenic factors of Crohn's disease (CD). In this study, three independent microarray studies of CD were integrated and 9912 differentially expressed genes (DEGs) were analysed by NetworkAnalyst to screen candidate crucial genes. NetworkAnalyst identified ELAV-like RNA binding protein 1 (ELAVL1) as the most crucial upregulated gene and amyloid-β precursor protein (APP) as the most crucial downregulated gene in peripheral blood of CD patients. By computing significance with hypergeometric test based on the KEGG pathway database, upregulated DEGs highlight the pathways of T cell receptor signaling and the differentiation of T helpers. Downregulated DEGs were found enriched in pathways in multiple cancers, MAPK signaling, Rap1 signaling, and PI3K-AKT signaling. Further taking all DEGs together, Gene Set Enrichment Analysis (GSEA) brought out the NOD-like receptor (NLR) signaling pathway which could be regulated by ELAVL1. xCell found decreased naïve and differentiated T cell proportions in the peripheral blood of CD patients suggesting T cell migration to the intestinal tissue and/or exhaustion. Further, ELAVL1 expression correlating with multiple T cell proportions suggests that ELAVL1 may regulate T cell activation. These findings illustrated that ELAVL1 and APP were candidate crucial genes in the peripheral blood of CD patients. ELAVL1 possibly acts as a key regulator of T cell activation via the NLR signaling pathway. APP might be a downstream effector of infliximab treatment connecting with MAPK signaling.


Introduction
Crohn's disease (CD), as a systemic inflammatory disease, mainly influences the gastrointestinal tract with a wide range of contributing factors including host genetics, immune system, environmental exposures, and the gut microbiome [1]. Although the pathogenesis is complex, the decades of studies have illustrated that CD is caused by environmental factors that broke the mucosal barrier and increased the luminal antigens into the lamina propria [2]. Different innate immune cells, like dendritic cells, recognize the antigens via pattern recognition receptors, such as Toll-like receptors (TLR) and NLR, and then regulate the activation of T cells [3].
Among the factors associated with CD in immune system, T cells were highlighted in CD pathology because about 200 CD risk loci are involved in T cell signaling [4,5]. Furthermore, the CD4 + T cell, including Th1, Th17, and regulatory T (Treg) cells, were associated with the severity of CD, particularly with active inflammation [2]. In addition, CD8 + T cell transcriptional signatures were identified as reliable prognostic biomarkers in the blood of CD patients [6,7]. Likewise, numerous studies are devoted to the development of diagnostic methods related to T cells for CD. For example, four CD-related differentially methylated regions were identified in the whole blood [8]. In addition, one T cell subtype from the blood of CD patients is enriched for the CD-risk gene [9]. Nevertheless, the key regulators and mechanisms regulating T cell activation in the blood of CD have not been fully described. Thus, we integrated three independent microarray studies to screen crucial genes in the blood of CD patients.
We performed an integrated bioinformatics analysis to find key regulators by NetworkAnalyst [10], a web-based visual analytics platform. Indeed, the utility of NetworkAnalyst to identify DEGs and the pathway has been demonstrated recently. For example, network analyses identified HNF4A and PTBP1 as effective biomarkers for Parkinson's disease [11] and 873 DEGs genetically related to insulin resistance [12].
In this study, we launched a network-based bioinformatics analysis in an attempt to screen DEGs in the blood of CD patients, followed by the KEGG pathway enrichment analysis, GSEA, and interaction network of DEGs. We identified ELAVL1 and APP, previously implicated in regulating the activation of innate immunity [13,14] and T cells [15], the most significant up-and downregulated genes. Furthermore, GSEA data show that NOD-like receptor signalingassociated gene signatures are enriched and the xCell analysis indicated that most T cells are significantly decreased in the three microarray studies, which could help in understanding the role of ELAVL1 and APP in the blood of CD patients.

Microarray Source.
The microarray studies of CD were downloaded from the Gene Expression Omnibus (GEO) by using the terms "Crohn's disease" and "blood". Three microarray studies were selected for subsequent integrative analysis. GSE86434 included 23 newly diagnosed CD patients and 24 healthy controls (HC). GSE94648 included 22 healthy controls, 9 inactive CD patients and 41 active CD patients. GSE119600 included 47 adult healthy controls, 48 adult CD patients, and 47 child CD patients. PRJEB28822 was chosen to be the verified datasets from the European Nucleotide Archive (ENA), which included 102 pediatric IBD patients and 51 controls, and 95 adult IBD patients and 46 controls, separately.

Gene Expression and Connectivity Correlation Assay.
To assess the comparability of two datasets, measures of average gene expression and overall connectivity between two datasets were correlated. The higher the correlations, the better the chance of finding similarities between the two datasets at subsequent stages of the analysis. All enrolled datasets with or without batch effect adjustment were examined for average expression correlation (correlation between expression ranks of genes in two studies) and connectivity correlation (correlation between connectivity ranks of genes in two studies) by R package WGCNA [16]. Better comparable datasets were indicated by greater positive correlations and more significant p values.

Integrated
Network-Based Bioinformatics Analysis. We performed integrated bioinformatics analysis using Networ-kAnalyst [10] in accordance with the protocol [17]. After annotating the gene probes to a common Entrez ID, these datasets were normalized per platform requirement and uploaded to the website. DEGs of each dataset were defined by p < 0:05 and log2 fold change > 1. The Venn diagram of DEGs was generated by the FunRich tool (version 3.1).
Network-based bioinformatics analysis was performed by NetworkAnalyst according to the pipeline described.
The whole blood-specific protein-protein interaction (PPI) of the top 20 DEGs and up-or downregulated genes was constructed per protocol [17]; then, the KEGG pathway enrichment analysis was performed by applying hypergeometric test.
2.4. Gene Set Enrichment Analysis. Preranked GSEA was analysed using a NetworkAnalyst module powered by R package fgsea [10]. All DEGs ranked by fold changes were put into the analysis, and the results was visualized as interactive heatmaps.
2.5. In Silico Immune Cell Type Enrichment Analysis. xCell, a gene signature-based method reliably portraying the cellular heterogeneity landscape of tissue expression profiles, was performed to explore immune cell types [18], and the proportion of T cells were obtained for each CD and HC sample per instruction.
2.6. Statistical Analysis. All basic statistical analyses, including the Mann-Whitney test, Pearson correlation, and Spearman correlation were calculated by R software. A p value < 0.05 was considered to be statistically significant. Data were represented by mean and standard deviation (SD) or median and quantile depending on distributions.

Screening DEGs by Integrated Bioinformatics Analysis.
Three microarray studies (Table 1) were analysed using Net-workAnalyst to screen DEGs in the blood of CD patients. Firstly, these datasets were preprocessed to ensure they are comparable enough. The correlations were positive and the p values were significant in all cases before batch effect adjustments (Figures 1(a)-1(c)). The correlations (cor) for the gene expression and connectivity of GSE86434 and GSE119600 (expression, cor = 0:97, p < 1e − 200; connectivity, cor = 0:74, p < 1e − 200) were better than GSE86434 and GSE94648 (expression, cor = 0:76, p < 1e − 200; connectivity, cor = 0:34, p < 1:6e − 135) or GSE94648 and GSE119600 (expression, cor = 0:75, p < 1e − 200; connectivity, cor = 0:34, p < 1:6e − 135). GSE86434 and GSE119600 were from the same platform (Illumina HumanHT-12 V4.0 expression bead chip). Thus, it is consistent with the notion that datasets from the same platform are more comparable than datasets from different platforms. To remove this batch effect, parametric empirical Bayes frameworks provided by ComBat function were applied by NetworkAnalyst. And the results after batch effect moving were shown in Figures 1(d)-1(f), indicating that the processed data are more comparable. PCA plots with or without batch effect adjustment were visualized in Figure 1(g).
Integrated bioinformatics analysis identified 9912 DEGs in the three microarray studies, in which 4705 genes were upregulated and 5207 were downregulated in CD compared with HC. A Venn diagram of integrative analysis DEGs and individual DEGs are shown in Figure 1(h). There were 1361 genes specifically classified by integrated bioinformatics analysis that are weakly but consistently expressed among the three datasets. 2511 genes were identified as lost genes, which are expressed in individual datasets but not in the integrated datasets.

Top 20 Hub Genes Are Identified by Network-Based
Analysis. Firstly, we put all DEGs into network-based analysis per instruction [10]. According to the degree of centrality (DC) and betweenness (BC), the most highly ranked node was APP (DC = 1056; BC = 4730653:29) followed by EGFR (DC = 580; BC = 1711464:65) and ELAVL1 (DC = 574; BC = 1999915:6) and the top 20 hub genes were presented ( Table 2). The results of the KEGG pathway enrichment analysis of these 20 genes were shown in Table 3. The whole blood-specific PPI (Figure 2(a)) and enrichment network ( Figure 2(b)) were generated by NetworkAnalyst. The zeroorder interaction network of these genes contained 20 nodes and 38 edges (Figure 2(a)). The results of pathway enrichment indicated that variables in the blood transcriptome of CD were closely connected to several cancers including colorectal cancer, and also with the PI3K-AKT and MAPK signaling pathways.

ELAVL1 and APP Are Hub Genes Measured by Network
Analysis. Then, we analysed pathway enriched in up-or downregulated DEGs separately to explore key regulators in the blood of CD patients, by NetworkAnalyst [10]. Upregulated genes in the blood of CD were enriched in the pathways, including T cell receptor signaling pathway, primary immunodeficiency, Th1 and Th2 cell differentiation, Th17 cell differentiation, and NF-kappa B signaling pathway (Figure 3(a)). The most crucial gene among upregulated DEGs was ELAVL1 (DC = 357; BC = 709523:17) (Figure 3(b)). Downregulated genes in the blood of CD were enriched in the pathways, including pathways in cancer, MAPK signaling pathway, Rap1 signaling pathway, and PI3K-AKT signaling pathway (Figure 3(c)). The most cru-cial gene among downregulated DEGs was APP (BC = 1244517:75; DC = 492) (Figure 3(d)).

NOD-like Receptor Signaling Pathway Is Enriched in the
Blood of CD by GSEA Analysis. To further clarify the possible mechanism of CD, preranked GSEA was performed to analyse the three microarray datasets separately or integrally selecting fold change as the gene ranking methods ( Table 4). As shown in Table 4 and Figures 4(a)-4(d), GSEA analysis suggested that NOD-like receptor signaling pathway is the only enriched gene set we can find in the top 10 enriched gene sets of all three datasets. In addition, the integrated dataset analysed by GSEA also indicated that NODlike receptor signaling pathway was enriched in CD patients (Figures 4(e) and 4(f)). These results revealed that NOD-like receptor signaling may act as a crucial role in the development of CD.

ELAVL1 Expression Correlates with Genes in NLR
Signaling Pathway, and APP Expression Is Associated with MAPK Signaling Cross Multiple Datasets. To further explore and validate the association between candidate crucial genes and top enriched signaling pathways in CD patients, we analysed the direct correlation between ELAVL1 and major genes in the NLR pathway and correlation between APP and major genes in MAPK signaling by the Pearson correlation. As shown in Figure 5(a), genes in the NLR signaling pathway, including NLRP1, NLRP3, CXCL8, TRAK4, and TNF, were correlated with the expression of ELAVL1 at least in one of the datasets. Additionally, GRB2, FOS, MYD88, MAP3K3, and EGF were found associated with the expression of APP ( Figure 5(b)). These results strongly supported the direct correlation between ELAVL1 and NLR signaling, as well as correlations between APP and MAPK signaling in the peripheral blood of CD patients. A slight inconsistency of correlations and significances among three datasets may be due to varied sample sizes and markable heterogeneity of the disease.     Previous studies have indicated that distinct T cell subsets, such as Th1/Th17, Tregs, memory, and naïve T cells, are activated in the gut of CD patients. Therefore, we performed xCell to explore the signature of T cells in the peripheral blood of CD from the three datasets and found that the altered T cell proportions were one of the hallmarks in the peripheral blood of CD ( Figure 6). Specifically, CD4 + T cells were obviously decreased in GSE86434 (Figure 6(a)) and GSE94648 ( Figure 6(b); similar trends were also shown in GSE119600 ( Figure 6(c)). Because of the essential function of CD4 + T cells in helping the activation of CD8 + T cells, these data also show the decrease of CD8 + T cells in GSE86434 (Figure 6(a)) and GSE94648 ( Figure 6(b)), as well as in GSE119600 (Figure 6(b). Similarly, the central memory CD8 + T (Tcm) cells were clearly decreased in all three datasets. Of note, the effector memory CD8 + T (Tem) cells were apparently decreased in the three datasets, which may be a new hallmark in the blood of CD patients. In addition, Th1 cells, the cell type mostly relevant to CD, were disordered in the three datasets. Since T cells are activated and recruited 7 Journal of Immunology Research to the intestinal tissue in CD to resist bacterial infections and continuously activated T cells can be exhausted, most T cell subsets decreased in the peripheral blood of CD can be interpreted by T cell migration to the intestinal tissue and/or exhaustion.

ELAVL1 Expression Correlates with Various T Cell Subset
Proportions in the Peripheral Blood of CD Patients. Since we have found that ELAVL1 is closely associated to NLR signaling, which has an important impact on priming the activation of T cells, we further investigate the association between ELAVL1 expression and the xCell scores of T cells in CD patients by the Spearman correlation. As shown in Figures 7(a)-7(c), different T cell subsets, including CD4 + naive T cells, CD4 + Tcm, CD4 + Tem, CD8 + naive T cells, CD8 + Tcm, and CD8 + Tem, were found positively correlated with ELAVL1 expression in dataset GSE119600 including 95 CD patients (Figure 7(c)). These results suggest that the upregulated expression of ELAVL1 is associated with T cell activation, probably through promoting the NLR signaling pathway. While in GSE86434 and GSE94648, the absence of a significant correlation may due to limited CD sample sizes of the two datasets.

Discussion
Up to date, the immunological pathogenesis of CD continues to be complex, and finding the key regulators of the immune response is essential for knowing the disease and improving the clinical management of CD. Specifically, investigations focusing on exploring CD pathogenesis and identifying therapeutic targets are warranted for the discovery of effective drugs. Here, we identified the candidate crucial genes in the peripheral blood of CD by a network-based analysis of NCBI GEO datasets GSE86434, GSE94648, and GSE119600. Integrative analysis identified 9912 DEGs among these datasets. ELAVL1 and APP were confirmed as the most significant up-and downregulated genes by NetworkAnalyst. Furthermore, the differential expression significances of the two hub genes were validated by an AmpliSeq dataset PRJEB28822, indicating that ELAVL1 and APP may be potential biomarkers for CD patients.
ELAVL1, also known as Hu antigen R (HuR), is an abundant RNA binding protein that can affect the stability and translation of many RNAs and participate in the regulation of chronic inflammation and cancer progression [22,23]. It has been identified that ELAVL1 was increased after cellular stress with protective activities [24], which is positively regulated by NF-κB and Smads [25]. Furthermore, the increase of ELAVL1 was confirmed to suppress inflammatory responses in mice [23], suggesting the important role of ELAVL1 as a posttranscriptional mediator for inflammation. However, it is worth noting that the increased expression of ELAVL1 is identified to promote the overexpression of COX-2, and thus contributing to the growth of colon cancer, whose risk is increased in the setting of CD [26,27]. Moreover, the influence of ELAVL1 in promoting malignant transformation has been well documented in multiple cancers [28][29][30][31][32]. That is in consistent with our pathway enrichment of top 20 hub genes indicating that variables in blood transcriptome of CD were closely connected to several cancers including colorectal cancer. The pathway enrichment of top 20 hub genes also shed light on the PI3K-AKT and MAPK signaling pathways, that is in line with investigations that phosphorylation by p38 MAPK results in the accumulation of ELAVL1 in the cytoplasm [33] and the elevation of ELAVL1 is essential for enhancing the proliferation of gastric cancer cells, which depends on the activation of PI3K-AKT and NF-κB signaling [34]. Our results highlight the crucial role of ELAVL1 in connecting inflammatory meditation with tumorigenesis, suggesting the potential role of ELAVL1 in carcinogenesis of colorectal cancer in the background of CD.
APP has been known as central to the pathogenesis of Alzheimer's disease (AD) and has been confirmed as the potential biomarker in predicting brain amyloid-β burden [35]. Although the specific mechanism coexisting in AD and CD is still unclear, it has been demonstrated that there   [36]. Besides, the immune response is considered one of major contributors in both AD and CD. A recent review highlights the influence of the peripheral immune system in AD [37], such as monocytes and lymphocytes, which have been linked to CD in a number of studies [38]. This overlap in the pathogenesis between the two diseases may suggest the potential central role of APP in CD pathogenesis. Though literature about APP in CD is very limited, we can still note that Apolipoprotein E (APOE), widely found increased in serum of CD patients who were primary nonrespondents or had responded clinically and serologically after infliximab treatment of CD [39], can increase transcription of APP significantly [40]. Next, APP can active the MAPK signaling pathway [41,42], which is one of the top enriched gene sets in our pathway enrichment analysis of top 20 hub genes.
Since detailed mechanism about infliximab treatment in CD has not been fully elucidated, APP may be predicted as a downstream effector of the treatment which is worthy of further investigation.     Journal of Immunology Research The results from GSEA of the individual and overall databases highlight the NOD-like receptor signaling pathway in the blood of CD patients. NOD1 and NOD2, members of the NOD-like receptor family, are two important mediators of inflammation induced by endoplasmic reticulum stress, which is a major contributor to CD [43,44]. In addition to the roles in regulating innate immune responses, there are plenty of evidences that NOD1 and NOD2 signaling have an impact on adaptive immune responses. In mice, NOD1 and NOD2 signaling are involved in the activation of Th1, Th2, and Th17 cells [45]. NOD2 also can drive CD8 + T cell activation via the cross-presentation pathway [46]. Our     14 Journal of Immunology Research integrative analysis also highlights the pathway of the T cell receptor signaling pathway and the differentiation of Th1, Th2, and Th17, which act essential roles in the pathogenesis of CD [38,47,48]. It is reasonable for us to assume that the NLR signaling pathway may work as a key regulator of CD pathology by activating T cells. Moreover, the highest ranking of the integrated dataset analysed by GSEA is the MAPK signaling pathway, followed by the Rap1 signaling pathway. As known, Rap1 plays an essential role in regulating the activation of MAPK [49], which is definitely involved in the development of CD [50] and is an effective target for the treatment of CD [51]. Therefore, it will be interesting to focus linkage between APP and MAPK and investigation of their role in the progression of CD seems to be promising. The adaptive immune system is considered the key regulator of the pathogenesis of CD [3]. When the intestinal barrier is broken, pattern recognition receptors, such as TLR [52] and NLR [53], recognize the microbe-associated molecular patterns, thus promoting the activation and differentiation of T cells [54]. Especially, CD4 + T cells are activated and differentiated into Th1/Th17, help the development of memory T cells, and then recruited to the gut to fight against  Figure 5: The transcriptome association between candidate crucial genes and top enriched signaling pathways in the peripheral blood of CD patients. The gene expression correlation between ELAVL1 and selected genes in NLR signaling (a) and correlation between APP and selected genes in MAPK signaling (b) in CD patients analysed using the expression data from the three datasets were represented by scatterplots with regression lines. Pearson correlation coefficients (r) and p values were calculated and shown.    the infection of bacteria, fungi, and viruses [55]. However, continuous T cell activation will aggravate inflammatory damage [2] and will even lead to the exhaustion of T cells [56][57][58]. Of note, the exhaustion of T cells regularly impedes the ability to defeat viral infection [59] and promote the occurrence of cancer [60]. Interestingly, the data of xCell indicated that CD4 + and CD8 + T cell proportions are obviously reduced in the blood, which may be caused by the increased migration of T cells into the gut and/or the exhaustion of T cells. Consistent with FACS data from other literatures [61][62][63], our xCell analysis highlight the decrease of CD8 + Tem as a general signature in the blood of CD, which is conventionally considered a cytotoxic cell type to defect the infection of virus. Although little is known about the detail mechanism, these phenomena suggest that the decrease of T cells, especially the CD8 + Tem, in the peripheral blood may be a novel feature for certain CD patients. As known, ELAVL1/HuR acts as a central posttranscriptional regulator of NOD2 expression, and HuR silencing can reduce NOD2 expression and mRNA stability [64]. In addition, HuR stimulated by integrin engagement and the level of HuR nuclear export are definitely involved in the activation of T cells [65]. Interestingly, we found that ELAVL1 expression showed strong positive correlations with multiple T cell subset proportions in GSE119600. The association sup-ports the idea that ELAVL1 can modulate the immune response by activating the T cells. However, we cannot demonstrate the regulatory roles of ELAVL1 on T cell activation in the current transcriptome analysis; further investigation in an animal model will be meaningful and straightforward.
In conclusion, our study suggests that ELAVL1 and APP are candidate crucial genes in the blood of CD and highlights the function of the NLR signaling pathway in priming the activation of T cells of CD. ELAVL1 may modulate the immune response of CD via the NLR signaling pathway and in turn regulate T cells status. APP could be a downstream effector of infliximab treatment connecting with MAPK signaling (a schematic diagram representing a possible mechanism is shown in Figure 8). Our analysis will be helpful for further investigation and understanding of the mechanism of ELAVL1 and APP in CD pathogenesis.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.

18
Journal of Immunology Research