Investigation of Potential Genetic Biomarkers and Molecular Mechanism of Ulcerative Colitis Utilizing Bioinformatics Analysis

Objectives. To reveal the molecular mechanisms of ulcerative colitis (UC) and provide potential biomarkers for UC gene therapy. Methods. We downloaded the GSE87473 microarray dataset from the Gene Expression Omnibus (GEO) and identified the differentially expressed genes (DEGs) between UC samples and normal samples. Then, a module partition analysis was performed based on a weighted gene coexpression network analysis (WGCNA), followed by pathway and functional enrichment analyses. Furthermore, we investigated the hub genes. Finally, data validation was performed to ensure the reliability of the hub genes. Results. Between the UC group and the normal group, 988 DEGs were identified. The DEGs were clustered into 5 modules using WGCNA. These DEGs were mainly enriched in functions such as the immune response, the inflammatory response, and chemotaxis, and in KEGG pathways such as cytokine-cytokine receptor interaction, the chemokine signaling pathway, and complement and coagulation cascades. The hub genes, including dual oxidase maturation factor 2 (DUOXA2), serum amyloid A (SAA) 1 and SAA2, TNFAIP3-interacting protein 3 (TNIP3), C-X-C motif chemokine 1 (CXCL1), solute carrier family 6 member 14 (SLC6A14), and complement decay-accelerating factor (CD antigen CD55), were revealed as potential tissue biomarkers for UC diagnosis or treatment. Conclusions. This study provides supportive evidence that DUOXA2, A-SAA, TNIP3, CXCL1, SLC6A14, and CD55 might be used as potential biomarkers for tissue biopsy of UC, especially SLC6A14 and DUOXA2, which may be new targets for UC gene therapy. Moreover, the DUOX2/DUOXA2 and CXCL1/CXCR2 pathways might play an important role in the progression of UC through the chemokine signaling pathway and inflammatory response.


Introduction
Ulcerative colitis (UC) is a chronic nonspecific inflammation of the rectum and colon whose etiology and pathogenesis are not yet well defined [1]. UC has a high incidence in western countries, and its incidence is increasing in developing countries [2]. The etiology of UC is considered to be multifactorial, including genetic and environmental factors such as urban lifestyles, dietary factors, high levels of hygiene, and gut microbiota, all of which are associated with disease progression; however, the pathogenesis of UC remains unclear [3]. Bioinformatics can be effectively used to analyze UC microarray data, providing a theoretical reference for further exploration of the mechanisms of inflammatory bowel disease and helping to identify potential target genes. As a recent bioinformatics research method, WGCNA is commonly used to reveal differences between genes in different samples [4].
In this study, UC gene expression data uploaded by Li et al. were downloaded. We identified the DEGs between UC samples and normal samples. Then, a module partition analysis was performed based on WGCNA, followed by pathway and functional enrichment analyses. Finally, data validation was performed to ensure the reliability of the hub genes. This study explores the molecular mechanism of UC and potential biomarkers for UC therapy.

Microarray Data.
The gene expression profile of GSE87473 was obtained from the GEO database [5] (http://www.ncbi.nlm.nih.gov/geo/). A total of 127 mucosal biopsy samples were obtained from 106 UC patients and 21 control subjects for subsequent analysis. The UC samples consisted of adult UC samples (n = 87) and pediatric UC samples (n = 19). Adult UC patients (44 male and 43 female) were enrolled from all geographic regions of the USA and from both metropolitan and rural settings, with an average age of 41 years (race information was not available) [6]. Pediatric UC patients (8 male and 11 female) were obtained from a phase 1b clinical trial of golimumab in pediatric patients, with an average age of 15 years, and only subjects of European ancestry were included [7]. Normal samples (n = 21) were obtained from the Department of Gastroenterology, Perelman School of Medicine at the University of Pennsylvania (Philadelphia, PA) and the Department of Gastroenterology, University Hospital Gasthuisberg (Leuven, Belgium) [6]; information on age, gender, and race was not available.

Data Preprocessing and DEG Analysis.
There were a total of 20741 probes in the present dataset. GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/), an R-based web tool provided with the GEO database, was used to identify DEGs between UC and control samples. |log-fold change (LFC)| > 1 and P values < 0.05 were selected as the thresholds for DEG screening.
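The screening rule above can be illustrated with a minimal sketch. It assumes the GEO2R output has already been exported into (gene, LFC, P value) records; the record type, function name, and toy values are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log_fc: float   # log-fold change, UC versus control
    p_value: float  # P value as reported by the DEG tool


def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split probes into up- and downregulated DEGs.

    A probe passes when |LFC| > lfc_cutoff and P < p_cutoff,
    mirroring the thresholds stated in the text.
    """
    up, down = [], []
    for r in results:
        if r.p_value < p_cutoff and abs(r.log_fc) > lfc_cutoff:
            (up if r.log_fc > 0 else down).append(r.gene)
    return up, down


# Toy records for illustration only; not real expression data.
toy = [
    ProbeResult("GENE_A", 2.3, 0.001),   # passes, upregulated
    ProbeResult("GENE_B", -1.4, 0.01),   # passes, downregulated
    ProbeResult("GENE_C", 0.4, 0.0001),  # fails the LFC cutoff
    ProbeResult("GENE_D", 1.8, 0.2),     # fails the P value cutoff
]
up, down = select_degs(toy)
```

Both conditions must hold at once, so a probe with a very small P value but a modest fold change (GENE_C here) is not counted as a DEG.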

WGCNA Analysis.
The coexpression network analysis was performed using WGCNA (version 1.63) [8]. WGCNA is a systems biology method for constructing scale-free networks from gene expression data. First, we selected the soft threshold for network construction. The soft threshold was used to transform the similarity matrix of gene expression into an adjacency matrix, which strengthens strong correlations and weakens weak correlations exponentially. Second, the adjacency matrix was transformed into a topological overlap matrix (TOM). Based on the TOM, we used the average-linkage hierarchical clustering method to cluster genes. Following the hybrid dynamic tree cut criterion, we set the minimum module size to 30 genes. After determining the gene modules by the dynamic tree cut method, we calculated the eigenvector of each module in turn, clustered the modules, and merged closely related modules into new modules with the height set to 0.25 [9].
Third, we calculated the module eigengene (ME) of each module, which represents the expression level of that module. We also calculated the correlation between the clinical traits and the ME of each module. Finally, we calculated the gene significance (GS) of each gene in the module, which represents the correlation between the gene and the sample trait.
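The first two steps above (soft thresholding and the topological overlap transformation) can be sketched in plain Python. This is an illustration rather than the WGCNA package itself: it assumes an unsigned network, where the adjacency is |cor|^β, and the commonly used topological overlap formula; the text does not state which network type was used. All function names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency_matrix(expr, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(i, j)| ** beta.

    The diagonal is left at 0 so that connectivity excludes self-links.
    """
    genes = sorted(expr)
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[genes[i]], expr[genes[j]])) ** beta
            adj[i][j] = adj[j][i] = a
    return genes, adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


# Toy expression profiles (4 samples per gene); illustrative values only.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],  # strongly correlated with g1
    "g3": [4.0, 1.0, 3.0, 2.0],  # weakly correlated with g1 and g2
}
genes, adj = adjacency_matrix(expr, beta=6)
tom = topological_overlap(adj)
```

Raising the correlation to the power β keeps the strong g1/g2 link close to 1 while pushing the weak links towards 0, which is the contrast-enhancing effect described in the text. Hierarchical clustering would then be run on the dissimilarity 1 - TOM.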

Function and Pathway Enrichment Analysis.
We used the DAVID 6.8 software (https://david.ncifcrf.gov) for the GO biological process (GO-BP) and KEGG pathway analyses of the genes in the main modules. A false discovery rate (FDR)-adjusted P value < 0.05 was selected as the threshold for identifying significant GO-BP terms and KEGG pathways.
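For readers unfamiliar with FDR control, the following sketch shows the Benjamini-Hochberg adjustment, one common way of obtaining FDR-adjusted P values. It is offered only as an illustration: the text does not state how DAVID derives its reported FDR, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy enrichment P values for four terms; illustrative only.
raw = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

Terms whose adjusted value falls below 0.05 would be retained under the threshold described above.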

Hub Genes Investigation.
Based on the eigengene of each module, the correlation of gene expression within the module was analyzed by WGCNA. Genes with correlations greater than 0.9 in each module were considered hub genes.
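A minimal sketch of this selection rule is given below. It assumes that the correlation in question is the Pearson correlation between each gene's expression profile and the module eigengene, and that the eigengene has already been computed elsewhere. The function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hub_genes(module_expr, eigengene, cutoff=0.9):
    """Return genes whose correlation with the eigengene exceeds cutoff."""
    membership = {g: pearson(v, eigengene) for g, v in module_expr.items()}
    return sorted(g for g, r in membership.items() if r > cutoff)


# Toy module with a precomputed eigengene over 4 samples; illustrative only.
eigengene = [-0.6, -0.2, 0.1, 0.7]
module_expr = {
    "gene_a": [1.0, 2.1, 2.9, 4.6],  # tracks the eigengene closely
    "gene_b": [3.0, 1.0, 4.0, 2.0],  # does not track the eigengene
}
hubs = hub_genes(module_expr, eigengene)
```

Only genes that follow the module's dominant expression pattern very closely pass the 0.9 cutoff, so the resulting list is typically a small subset of the module.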

DEGs between UC Samples and Normal Samples.
We identified 988 DEGs, including 466 upregulated and 522 downregulated DEGs, with FDR-adjusted P < 0.05 and |LFC| > 1. The heatmap and volcano plot are shown in Figures 1(a) and 1(b). The heatmap showed that these DEGs could be used to distinguish UC from control samples.

WGCNA Analysis.
We performed WGCNA using the 988 DEGs. A coexpression network is considered scale-free when the logarithm of the connectivity, log(k), is negatively correlated with the logarithm of the probability of observing a node with that connectivity, log(P(k)), with a correlation coefficient greater than 0.8. The R software package WGCNA was used to build a weighted coexpression network. To ensure that the network was scale-free, we chose a soft threshold of β = 6 (Figure 1(c)).
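The scale-free criterion can be checked with a simple log-log regression, sketched below. The sketch assumes that connectivities have already been grouped into bins with known counts and that the fit is summarized by R²; both are assumptions made for illustration, and the function name and toy counts are hypothetical.

```python
import math


def scale_free_fit(degree_counts):
    """Fit log10(P(k)) against log10(k) by ordinary least squares.

    degree_counts maps a connectivity value k to the number of nodes
    with that connectivity. Returns (slope, r_squared).
    """
    total = sum(degree_counts.values())
    xs = [math.log10(k) for k in degree_counts]
    ys = [math.log10(c / total) for c in degree_counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Toy counts that follow P(k) proportional to k^-2; illustrative only.
toy_counts = {1: 1024, 2: 256, 4: 64, 8: 16}
slope, r_squared = scale_free_fit(toy_counts)
is_scale_free = slope < 0 and r_squared > 0.8
```

In practice this check is repeated for a range of candidate β values, and a β at which the fit first exceeds the chosen cutoff is selected.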

Functional and Pathway Enrichment for DEGs.
The top 3 GO-BP and KEGG terms enriched by DEGs are shown in Table 1 and Figure 4. The DEGs in the brown module were mainly involved in functions such as inflammatory response (P = 4.88E-07) and pathways such as the chemokine signaling pathway (P = 0.004195).
The DEGs in the turquoise module were mainly involved in functions such as the oxidation-reduction process (P = 9.70E-03) and pathways such as metabolic pathways (P = 2.8E-09).

Hub Genes.
The brown module was most relevant to the disease; therefore, we analyzed the correlation of gene expression in the brown module in the subsequent analysis. Figure 5 shows that dual oxidase maturation factor 2 (DUOXA2), serum amyloid A (SAA) 1 and SAA2, TNFAIP3-interacting protein 3 (TNIP3), C-X-C motif chemokine 1 (CXCL1), solute carrier family 6 member 14 (SLC6A14), and complement decay-accelerating factor (CD antigen CD55) were selected as hub genes.

Data Validation.
To verify the robustness of the hub genes, the validation dataset GSE75214 was obtained from the GEO database. We performed ROC curve analysis using GraphPad Prism 7.00. The results of the analysis showed that the hub genes related to UC, including DUOXA2, SAA1, SAA2, TNIP3, CXCL1, SLC6A14, and CD55, were identified as potential tissue biopsy biomarkers for UC diagnosis (Table 2 and Figure 6).
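The quantity summarized by an ROC curve, the area under the curve (AUC), can be computed directly from expression values. The sketch below is not the GraphPad Prism workflow used in the study; it uses the rank-based identity that the AUC equals the probability that a randomly chosen case has a higher value than a randomly chosen control. The function name and expression values are hypothetical.

```python
def roc_auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case value
    is higher; ties count as half.
    """
    wins = 0.0
    for c in case_values:
        for n in control_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Toy expression values of one gene in UC and control biopsies.
uc_expr = [8.1, 7.4, 9.0, 6.2]
control_expr = [5.0, 6.5, 4.8]
auc = roc_auc(uc_expr, control_expr)
```

An AUC of 0.5 corresponds to no discrimination between groups and 1.0 to perfect separation. In this toy example, 11 of the 12 case-control pairs are ordered correctly.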

Discussion
UC is a type of inflammatory bowel disease that is difficult to treat, recurs easily, and is prone to malignant transformation [11,12]. Recently, many potential biomarkers for the early diagnosis or treatment of UC have been identified with advances in biotechnology; however, the mechanism of UC is still unknown. In this study, UC gene expression data were analyzed by WGCNA. We screened a total of 988 DEGs between UC samples and control samples and identified 5 modules. Based on the correlation between the modules and the occurrence or development of UC, we identified 7 hub genes after data verification. Combined with previous research, SLC6A14 and DUOXA2 might be critical biomarkers for UC diagnosis.
SLC6A14 and DUOXA2 are involved in the development and carcinogenesis of UC. Multiple sequencing or microarray studies have shown that SLC6A14 is upregulated in UC patients [5,10,13]. SLC6A14 may be involved in colonic inflammation by regulating glutamine (a substrate for SLC6A14) and nitric oxide synthase 2 (which is upregulated in coordination with SLC6A14 in inflamed cells) [14,15]. Furthermore, SLC6A14 is a cancer-specific amino acid transporter that is essential for tumor growth [16]. DUOXA2, an ROS-generating enzyme expressed in the lower gastrointestinal tract, plays a critical role in host mucosal defense [17] and can be induced by changes in the gut microbiota [18]. DUOXA2 is the maturation partner of DUOX2, which participates in the signaling pathways against inflammation and regulates reactive oxygen species (ROS), mucin, IL-8, and matrix metalloproteinase-9 against invading microbial pathogens [19]. However, overproduction of H2O2 could lead to oxidative stress, resulting in oxidative injury and mucosal barrier impairment [20]. In addition to its role in the persistent and recurrent inflammation of UC, the DUOXA2/DUOX2 pathway is also involved in the development of UC-associated adenomas and colorectal cancer [21][22][23]. We therefore hypothesize that aberrant expression of SLC6A14 and DUOXA2 might promote the initiation and development of UC.
Our research also found that SAA, TNIP3, CD55, and CXCL1 were potential biomarkers for UC. SAA can reflect inflammation in UC at an early stage owing to its higher sensitivity and specificity [24][25][26]. CXCL1 acts by specifically binding to its receptor, C-X-C chemokine receptor type 2 (CXCR2) [27]. Recent studies have shown that the CXCL1/CXCR2 signaling pathway regulates the inflammatory response; moreover, the pathway drives tumor cell proliferation, angiogenesis, and lymphangiogenesis and promotes tumor invasion and vascular metastasis [28]. Previous studies have shown increased CD55 in the stool and colonic mucosa of patients with active UC [29,30], and CD55, as the decay-accelerating factor, can reflect the carcinogenesis of UC [31][32][33]. TNIP3 is a negative regulator of nuclear factor (NF)-κB signal transduction in response to multiple stimuli [34]. A study by Ishani Majumdar demonstrated that the expression of TNIP3 negatively correlates with disease severity in UC [35], which is contrary to our results. The discrepancy might be related to differences in disease severity and in the genetic testing method.
The results of the functional and pathway enrichment analyses of the DEGs in this study show that the biological functions involved in the pathogenesis of UC include the inflammatory response, innate immune response, and chemotaxis, indicating that the pathogenesis of UC is multifactorial, involving epithelial barrier defects, genetic predisposition, environmental factors, and dysregulated immune responses. The 7 hub genes screened in this study are not only related to mucosal inflammation but may also accelerate the progression of colon cancer, so they should be given proper attention in the treatment of UC.
We found 7 hub genes closely related to UC and confirmed the robustness of their diagnostic value, which may improve our understanding of the molecular mechanism of UC and provide potential prognostic and diagnostic biomarkers. However, this study has some limitations, such as the small sample size and the lack of experimental verification; thus, larger samples and broader verification analyses are still needed to confirm our hypothesis.

Conclusions
In conclusion, DUOXA2, A-SAA, TNIP3, CXCL1, SLC6A14, and CD55 might be used as potential biomarkers for UC tissue biopsy, especially SLC6A14 and DUOXA2, which may be new targets for UC gene therapy. Furthermore, DUOXA2/DUOX2 and CXCL1/CXCR2 pathways may play important roles in UC progression via the inflammatory response.

Data Availability
The datasets analyzed during the current study are available in the Gene Expression Omnibus under the accession numbers GSE87473 and GSE75214.

Conflicts of Interest
The authors declare that they have no conflicts of interest.