Identification of Transcriptional Modules and Key Genes in Chickens Infected with Salmonella enterica Serovar Pullorum Using Integrated Coexpression Analyses

Salmonella enterica Pullorum is one of the leading causes of mortality in poultry. Understanding the molecular response in chickens in response to the infection by S. enterica is important in revealing the mechanisms of pathogenesis and disease progress. There have been studies on identifying genes associated with Salmonella infection by differential expression analysis, but the relationships among regulated genes have not been investigated. In this study, we employed weighted gene coexpression network analysis (WGCNA) and differential coexpression analysis (DCEA) to identify coexpression modules by exploring microarray data derived from chicken splenic tissues in response to the S. enterica infection. A total of 19 modules from 13,538 genes were associated with the Jak-STAT signaling pathway, the extracellular matrix, cytoskeleton organization, the regulation of the actin cytoskeleton, G-protein coupled receptor activity, Toll-like receptor signaling pathways, and immune system processes; among them, 14 differentially coexpressed modules (DCMs) and 2,856 differentially coexpressed genes (DCGs) were identified. The global expression of module genes between infected and uninfected chickens showed slight differences but considerable changes for global coexpression. Furthermore, DCGs were consistently linked to the hubs of the modules. These results will help prioritize candidate genes for future studies of Salmonella infection.


Introduction
Chickens are an important component in the global agricultural economy by serving as one of the primary sources of proteins for humans. However, the poultry industry has been consistently threatened by various diseases, including those caused by viral, bacterial, and parasitic infections. Salmonella enterica serovar Pullorum (S. Pullorum) is one of the most important pathogens of poultry causing severe systemic disease [1,2]. To prevent and control . Pullorum in chickens, the host responses against this pathogen have been studied for decades. Although significant advances have been made, especially in the identification of molecules and genes involved in the host immune response [3,4] and mucosal inflammation [5,6], as well as their differential expression during infection [7][8][9][10], the precise pathways regulating immunity to Salmonella infection using a systems biology approach have not been investigated. Although gene differential expression analysis (DEA) provides important information, such as identification of genes that are expressed at different times during infection, which inform our understanding of pathogenesis, identifying gene interactions using a systems biology approach greatly enhances our knowledge at the mechanistic and regulatory levels. A large amount of information regarding gene interactions is available in microarray datasets and by applying network approaches the gap between individual genes and systems can be bridged [11][12][13]. The modularity in biological systems allows for both the study of independent components and identification of gene relationships within modules. Modern approaches, such as weighted gene coexpression network analysis (WGCNA) [14], can identify modules with expression levels that are 2 BioMed Research International highly correlated across samples and have been used to identify new candidate regulatory molecules and networks in Salmonella-infected pigs [15]. Differentially coexpressed modules (DCMs) can also be identified [16]. The holistic changes in modules would be reflected in transcriptional and coexpression changes for individual genes. In general, gene expression levels change during disease or infection, but some have reported that seemingly nonsignificant DEGs may also play a key role in a disease because their interactions with other genes change considerably [17]. These genes can be identified via differential coexpression analysis (DCEA), which can mine individual genes using a holistic approach [17][18][19]. Hence, combining the WGCNA and DCEA methods can identify interacting modules and differentially coexpressed genes (DCG) during infection, compared with controls. Here, we mined the molecular network relationships of the differential coexpression modules and genes using microarray data from spleens of . Pullorum-infected and uninfected chickens using WGCNA and DCEA ( Figure 1). The results complement traditional DEA and add to our understanding of the regulatory mechanisms that occur during Salmonella infection.

Microarray Data Harvesting and Processing.
A comprehensive transcriptomics dataset derived from microarray analysis of spleens from chickens challenged with 10 8 CFU of Salmonella enterica serovar Pullorum or mock-challenged with the same volume of distilled water (controls) was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) (accession number: GSE59663). The dataset was generated with the Agilent oligo microarray chips containing 43,663 probe sets. In this study, we first streamlined the dataset by excluding 14,920 probe sets that were either unmappable to any gene IDs or mapped to multiple gene IDs. In the case of multiple probe sets mapped to one identical gene, the probe set, which is most often associated with the highest expression level, was maintained to ensure that only one probe set was left to investigate one gene. If more than one probe set was left after the above steps, their intensities were averaged. Finally, a one-to-one match between 13,538 probe sets and 13,538 genes was achieved.
Three biological replicates (chips) for each time point were available in the challenged group for these datasets. However, at each time point in the control group, only one chip was used to hybridize with the equally mixed mRNA sample containing the three control samples. We averaged the replicates for each time point, except at day 21, with the two replicates included and forming the dataset for the challenged group with 10 samples; this dataset was equivalent to the dataset of the control group. The dataset was quantile normalized by the function of normalizeQuantiles in R package limma [20].

Construction of Weighted Gene Coexpression Network
and Identification of Modules. Weighted gene coexpression network analysis (WGCNA) was used to detect coexpression modules from the dataset of challenged samples [14,21]. The R function of blockwise modules was implemented with the following parameters: power = 12, minModuleSize = 100, and networkType = "signed." Microarray data were processed as described below.
The pairwise Pearson's correlation coefficients were calculated for all the genes in the challenged groups, followed by the construction of an adjacency matrix using the power function: where and were the th and jth gene expression traits, respectively, which formed a signed weighted correlation network; and used default value (i.e., = 12). The topological overlap measure (TOM) was calculated as follows: where .total is the sum of connection strengths for a gene with the other network genes. is the other network genes. Afterwards, 1-TOM was calculated as a biological important measure for network interconnectedness. Genes with highly similar coexpression relationships were grouped together by performing hierarchical clustering on the topological overlap. Subsequently, genes were hierarchically clustered using 1-TOM as the distance measure and modules were determined by choosing a height cutoff of 0.995 for the resulting dendrogram. Highly similar modules were identified by clustering and merged together using a dynamic treecutting algorithm [14]. Eigengene refers to the first principal component for a given module and could be calculated to draw a module trajectory curve [14].

Identification of Differentially Coexpressed
Modules. Differentially coexpressed modules (DCMs) were identified using gene-set coexpression analysis (GSCA) that adopted the length-normalized Euclidean distance to measure the coexpression difference for the pairwise correlations between infected and control groups [16].
where was the number of gene pairs from the pairwise correlation for all the module genes. and were the correlation coefficients for a gene pair in the control and infected groups, respectively.
The null distribution for distance was constructed by permuting samples across conditions for 10,000 times to yield gene-set specific values. Modules with value < 0.01 were considered as significantly differentially coexpressed.

Identification of Differentially Coexpressed Genes.
The differential coexpression analysis (DCEA) was implemented by using R package DCGL, which is a useful tool to identify  differentially coexpressed genes (DCGs) and differentially coexpressed links (DCLs) [17][18][19]. The R function DCe was applied and then the values were adjusted for a false discovery rate (FDR) using the Benjamini-Hochberg method to reduce a large amount of false positive results [22]. The genes with FDR < 0.001 were selected as DCGs.

Gene Ontology (GO) and
Pathway Enrichment for Coexpression Modules. GO enrichment and KEGG pathway analyses for network modules were performed using Database for Annotation, Visualization, and Integrated Discovery (DAVID, v6.7) program using all chickens genes as the background [23,24]. The modified Fisher's exact test with an adjustment for multiple tests by Benjamini-Hochberg method was used to identify significantly enriched terms for module genes [22].

Network Visualization.
The complex network bioinformatics software Cytoscape (v3.1.1) was used to visualize the pairwise relationships between genes [25].

Weighted Gene Coexpression Network Analysis.
Using blockwiseModules R function ( = 12), a total of 19 modules ranging from 100 to 3,000 genes were recovered for the 13,538 Note. The column "Size" gives the gene numbers contained in every module. " summary" gives the score of module preservation. "Function" gives the module functions enriched by DAVID.
distinct genes in the S. Pullorum-infected group (Table 1). Each module was assigned a unique color, including gray color for the 373 unassigned genes. Genes in the same module shared the same or similar expression patterns that were catalogued by the trajectory curves ( Figure 2). Subsequent analysis using DAVID identified biological features in modules that were potentially associated with the infection by . Pullorum (Figure 6 and Table 1), such as the Jak-STAT signaling pathway (module lightgreen) [26], the extracellular matrix (ECM) (module grey60) [27],    cytoskeleton organization (module green), regulation of the actin cytoskeleton (module blue) [28], G-protein coupled receptor activity (module magenta), Toll-like receptor signaling pathways (module purple), and immune system processes (module blue). ECM genes and cell adhesion genes are significantly enriched in the module grey60 and cyan (FDR = 5.10 − 4 and 2.75 − 3), respectively ( Table 1). The grey60 and cyan modules also displayed significant similarity in expression patterns (eigengenes' correlation = 0.76; = 0.01).
These observations were in congruent with those reported earlier by others on the crucial role of host cell ECM proteins and bacterial outer membrane structures in the adhesion and invasion of Salmonella [27].

Module Stability.
To test the reproducibility of the identified modules, we performed a sampling test, in which we randomly selected half of the samples to calculate the new intramodule connectivity. The sampling was repeated 100 times and then the module stability was expressed as the correlation of intramodule connectivity between the original and sampled ones [29]. Most modules displayed good stability; module salmon was the least stable ( Figure 3).

Module Preservation Analysis.
We investigated whether the S. Pullorum-infected module was preserved in the corresponding controls by testing whether the infection-associated coexpression network can be replicated in the control groups. The preservation scores for all the modules were listed in Table 1, in which summary scores <2, between 2 and 10, and >10 indicate no evidence, weak-to-moderate evidence, and strong evidence for module preservation, respectively. Preservation analysis provided strong evidence to support the conservation of modules turquoise, brown, blue, yellow, green, red, pink, and purple, which all contained considerably large numbers of genes, but no evidence to support the preservation of modules lightcyan, lightgreen, magenta, and black associated with the membranes, the Jak-STAT signaling pathway, G-protein coupled receptor activity, and synapses, respectively (Table 1).

Module Gene Expression and Coexpression Comparison.
We compared the module genes' expression and coexpression level between the infected and control groups. The violin plot in Figure 4(a) showed that the gene expression for modules in the infection versus control groups is not significantly different, and the distribution for the expression intensities is similar. Subsequently, we compared the gene coexpression level by calculating the gene connectivity for each module. The module turquoise exhibits the largest connectivities since it includes the largest number of genes (2,998 genes). Modules blue (2,581 genes), yellow (1,122 genes), brown (1,349 genes), and green (1,056 genes), which include a considerable number of genes, display the next highest connectivities. In addition, the coexpression levels are different between modules in the two conditions. The coexpressions are strengthened in the infected state (Figure 4(b)).

Identification of Differentially Coexpressed Genes.
A total of 2,856 differentially coexpressed genes (DCG) were selected with a false discovery rate (FDR) of less than 0.001 using the DCe method in the DCGL package. And a total of 284,213 differentially coexpressed links (DCLs) were same signed, 82,619 were differently signed, and 272,491 were switched links. Furthermore, we mapped the DCGs for each module and found that the DCMs enrich the DCGs. For example, a total of 152 DCGs appeared in the module magenta with summary of 1.82 ( = 0), 231 DCGs in module black with summary of 1.97 ( = 0), and 147 DCGs in module salmon with summary of 2.14 ( = 0). In network biology, a hub gene is a good representative of a module. We identified the hub genes for all of the modules. Table 2 gives the gene names which are not only hub genes but also DCGs in each module.

Discussion
We constructed a gene network for the . Pullorum-infected chickens using weighted gene coexpression network analysis (WGCNA) from the data of time-series microarray. This module detection strategy utilizes the biological variability inherent in the prospective cohort study to reveal the modular organization and function of transcriptional systems. The time series expression profiles allow the study of the transcriptional regulation of these gene coexpression networks during infection. A network-based analysis provides a systems-level understanding of the relationships between members of a network by focusing on genomewide gene modules rather than individual genes [30]. Differential expression analysis (DEA) aims to identify genes that are expressed significantly higher or lower in one group compared with another. By contrast, WGCNA is not biased toward genes with significant changes in expression. Moreover, the dimensionality of microarray data in the present study was reduced from 13,538 genes to 19 modules, which significantly increased the ability to identify concordant changes in the expression of multiple genes.
The module expression analysis showed that module salmon was the least abundant but exhibited the largest variation in gene expression causing the instability in module construction. Gene expression was the most stable within module midnightblue (Figure 4(a)). The expression distribution for module genes in different conditions (infected versus control) was the same. The coexpression level was further compared. The coexpression comparison showed significant changes for different conditions. We investigated  the key modules and genes resulting in the differences. The GSCA analysis identified ten significantly differentially coexpressed modules (DCMs), which are in accordance with the module preservation results that are significant in coexpression differences with little evidence for preservation. Regulatory relationships among genes can be parsed as the pairwise correlations between gene expression levels, so the changes in coexpression patterns between two conditions may indicate dysfunctional regulatory systems in disease [31]. Module lightgreen associated with the Jak-STAT signaling pathway and lightcyan associated with membrane anchoring functions were the two weakest preserved modules (Table 1 and Figure 5). Thus, these modules may be associated with S. Pullorum infection in chickens. Furthermore, we investigated the driven genes leading to the coexpression difference. The differential coexpression analysis (DCEA) method was applied, and 2,856 differentially coexpressed genes (DCGs) were identified. Compared to the differential expression analysis (DEA), it was found that the overlapping of DCGs with the 234 DEGs (t-test value less than 0.01) was significant (hypergeometric test = 1.07 − 07), indicating that differential expression and differential coexpression are somewhat related to each other, which is consistent with a previous report [17]. However, there are many Salmonella infection-related genes identified by the DCEA method. The top one DCG identified is WASF1, which is an important gene in the Salmonella infection pathway and was not identified as a DEG (expression fold change: 1.08; t-test value of 0.34). The protein encoded by WASF1, a member of the Wiskott-Aldrich syndrome protein family, plays a critical role downstream of Rac, which is a Rho family of small GTPases, in regulating the actin cytoskeleton required for membrane ruffling. This gene associates with an actin nucleation core Arp2/3 complex while enhancing actin polymerization in vitro [32]. Another gene, CDC42 (fold change = 1.02; = 0.6), a member of the Rho subfamily of actin-organizing small GTP-binding proteins, interacts with WASF1 and is essential for . Typhimurium entry into host 8 BioMed Research International  Note. The column "Size" gives the gene numbers contained in every module. "GSCA.p" gives the value calculated by GSCA for modules between the two different conditions and the value less than 0.05 indicates that the module is significantly differentially coexpressed. "Hub and DCGs" shows the genes which are DCGs in the top ten hub genes. cells [33,34]. CDC42 was not selected as DCG with a false discovery rate (FDR) of 1.55 − 3, but it interacts with genes PAK7 (fold change = 1.95; = 0.64) [35,36], CDC42EP3 (fold change = 1.05; = 0.81) [37,38], PAK1 (fold change = 0.97; = 0.67) [39], PARD6B (fold change = 0.76; = 0.05) [40,41], PARD6A (fold change = 1.75; = 0.90) [29,40], and IQGAP2 (fold change = 1.58; = 0.12) [42], which are all identified as DCGs. Carow and Rottenberg reported that gene SOCS3, which was also identified as a DCG (fold change = 1.54; = 0.15), is a major regulator of infection and inflammation and controls immune homeostasis in physiological and pathological conditions such as infection and autoimmunity [43]. SOCS3 is a hub gene in the module lightgreen associated with the Jak-STAT signaling pathway, an important pathway for Salmonella infection [44]. It is well known that the Jak-STAT pathway can regulate cell growth, apoptosis, immunity, and inflammatory responses and because of its significance in the immune response, the Jak-STAT pathway is often exploited by pathogens [45]. In our study, we found that the Jak-STAT pathway genes were significantly enriched in the module lightgreen which is not detectable in controls. So we think that SOCS3 and the other Jak-STAT pathway genes may together regulate the activity of the organism in infection, which leads this module to be differentially coexpressed.
The above results showed some specific subnetworks for infection, in spite of a common network existing whether in the control or infected group. We constructed two coexpression networks from the top ten hub genes' expression profiles for each module from the two different conditions. As shown in Figures 6(a) and 6(b), common core networks, including the most preserved modules between infected and control groups, were present. However, some closely interacted subnetworks seen during infection disappeared in the control. These infection-specific subnetworks included genes that are members of the Jak-STAT signaling pathway (module lightgreen); others associated with membrane anchoring (module lightcyan), neuroactive ligand-receptor interaction (module salmon), and lysosomal processing (module tan), which suggested that these subnetworks dysregulated the systems during infection. Although only one dataset was used here, due to the lack of published related microarray datasets, these present results advance our understanding of the cell biology and immunoregulatory pathways involved in Salmonella infection in the chicken host.

Abbreviations
DAVID: Database for Annotation, Visualization, and Integrated Discovery DCEA: Differential coexpression analysis DCG: Differentially coexpressed gene DCL: Differentially coexpressed link DCM: Differentially coexpressed module DEA: Differential expression analysis ECM: Extracellular matrix FDR: False discovery rate GO: Gene Ontology GSCA: Gene-set coexpression analysis TOM: Topological overlap measure WGCNA: Weighted gene coexpression network analysis.

Conflicts of Interest
The authors declare that there are no conflicts of interest.