Comparative Analysis of Glycogene Expression in Different Mouse Tissues Using RNA-Seq Data

Glycogenes regulate a wide array of biological processes in the development of organisms as well as different diseases such as cancer, primary open-angle glaucoma, and renal dysfunction. The objective of this study was to explore the role of differentially expressed glycogenes (DEGGs) in three major tissues such as brain, muscle, and liver using mouse RNA-seq data, and we identified 579, 501, and 442 DEGGs for brain versus liver (BvL579), brain versus muscle (BvM501), and liver versus muscle (LvM442) groups. DAVID functional analysis suggested inflammatory response, glycosaminoglycan metabolic process, and protein maturation as the enriched biological processes in BvL579, BvM501, and LvM442, respectively. These DEGGs were then used to construct three interaction networks by using GeneMANIA, from which we detected potential hub genes such as PEMT and HPXN (BvL579), IGF2 and NID2 (BvM501), and STAT6 and FLT1 (LvM442), having the highest degree. Additionally, our community analysis results suggest that the significance of immune system related processes in liver, glycosphingolipid metabolic processes in the development of brain, and the processes such as cell proliferation, adhesion, and growth are important for muscle development. Further studies are required to confirm the role of predicted hub genes as well as the significance of biological processes.


Introduction
Cell function within an organism has huge variation due to gene expression pattern although all cells in an individual mammal have almost identical DNA [1].It is important to know how cells and tissues differ in gene expression which is regulated during developmental changes in different tissues, consequently affecting their biological functions.The pattern of glycogene expression is one of important factors that provide clues about many biological functions, developmental changes, and diseases in human, mouse, and tissues of other organisms [2][3][4].Glycogene is a gene that is responsible for the glycosylation of proteins, lipids, and proteoglycans and includes genes associated with the synthesis of glycans such as glycosyltransferase, sugar-nucleotide transporters, and sulfotransferases [5,6].Similarly, the term glycogenome involves all genes that play a role in glycosylation and represents more than 600 genes in the mouse genome [7].Glycosylation is one of the most common posttranslational modification reactions [8].The glycogene expression is relatively weak as compared to other molecules; however, different glycogenes are upregulated by specific tissues depending on the local conditions [5].
In order to understand the pathology of various diseases, recently, many studies have targeted and explored the role of glycogenes expression in various diseases [9][10][11][12].Alteration in the glycogene expression is also associated with drug resistance of various types of cancers including pancreatic International Journal of Genomics cancer epithelial-mesenchymal transition [13], breast cancer [14], and hepatocarcinoma [15].The change in the glycogene expression is also associated with conjunctival epithelium [2] and primary open-angle glaucoma (POAG) [16].Moreover, the defects in the normal expression of glycogenes in tonsillar B lymphocytes are also correlated with the proteinuria and renal dysfunction in IgA nephropathy [17,18].
Several high-throughput microarray studies have been applied to investigate the functional roles of various glycogenes [11,19,20].However, to the best of our knowledge, no study till date has utilized RNA-Seq data in the existing databases to specifically explore the role of glycogenes.Previously, we identified the role of glycogenes in skeletal muscle development in MYOG kd cells by using RNA-Seq data.Therefore, in this work, we intend to investigate the role of glycogenes in various biological processes that are involved in the development of brain, muscle, and liver tissues using the precomputed expression values from mouse RNAseq data.Functional and pathway analysis of differentially expressed glycogenes (DEGGs) between BvL579, BvM501, and LvM442 were enriched with gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, respectively.Additionally, the interaction networks of DEGGs were constructed to identify the hub genes as well as to detect the functional modules in these networks by using community analysis.We expect that the findings of our study may shed new lights on the roles of glycogenes in the development of these tissues and their pathogenesis.

Datasets.
The precomputed expression values with no spike for the brain, liver, and muscles C57BL6 mouse tissues were downloaded from WoldLab (http://woldlab.caltech.edu/∼alim/RNAseq/)for our computational analysis.These precomputed expression values represent the RNA-seq analysis initially performed by Mortazavi et al., using Illumina Genome Analyzer and expression values computed by using ERANGE package [21].To remove low expressed genes the dataset was filtered by removing genes with RPKM value <1 in all the three tissues.

Fold Change Analysis.
Fold change is often used in analysis of gene expression data of RNAseq experiments, for measuring change in the expression level of a gene and comparing the expression of genes between two sets of arrays, for example, case and control sets.Fold change analysis of genes commonly expressed between two different samples was done to compare the gene expression in glycogenes [22,23].Fold change was calculated as the ratio of brain versus liver, brain versus muscle, and liver versus muscle groups.Additionally, to further investigate the role of glycogenes in different tissues, the differentially expressed genes were manually verified in the UniProt database [24] to check whether they represent a glycogene or not.A gene was considered as a glycogene if it was annotated as a "glycoprotein" in the "Keywords" section of the UniProt database.
2.3.Functional Analysis.DAVID (http://david.abcc.ncifcrf.gov/home.jsp)functional annotation cluster analysis was performed on the list of differentially expressed glycogenes (DEGGs) with a fold change of ≥2.Only those terms that reported a  value of ≤0.05 and count number ≥5 genes were selected for analysis.The gene ontology (GO) term biological process (BP) in DAVID was used to categorize enriched biological themes in the list of DEGGs.Pathway mapping was performed using the KEGG Automatic Annotation Server (KAAS) (http://www.genome.jp/kegg/kaas/)[25].The amino acid sequences of these DEGGs were uploaded to the KAAS web server as an input using single-directional best hit (SBH) method to assign orthologs.KAAS offers functional annotation of genes in a genome via a BLAST similarity searches against a manually curated set of ortholog groups in the KEGG GENES database.KAAS assigned a KEGG orthology (KO) number to genes in the data sets, which were mapped to one of KEGG's reference pathways.

Network Analysis. The functional interactions between
DEGGs were analyzed by GeneMANIA webserver [26].The GO term biological process was used to create the interaction network between the DEGGs and 50 additional genes by using mouse as a source species.The relationship between the genes in the network includes coexpression, physical and genetic interactions, pathways, colocalization, protein domain similarity, and predicted interactions.The network was filtered by removing all the interactions where weights <0.1.

Identification of Hub Genes.
Generally, the biological networks exhibit the scale-free property [27] where only a few nodes in the network have many connections that represent hubs in the network.Hub genes were identified by calculating the node degree distribution [28] by using the NetworkAnalyzer plugin of Cytoscape.A glycogene with the highest degree distribution was considered as a hub in the current network.

Community Analysis.
The functional modules within the network were further detected by using the greedy community-structure detection algorithm, implemented in the Cytoscape plugin GLay (http://brainarray.mbni.med.umich.edu/sugang/glay/) [29].GLay offers Cytoscape users a diverse group of community structure algorithms and graph layout functions for network clustering and structured visualization [29].To identify the overrepresented biological functions within each cluster, the detected clusters were subjected to functional enrichment analyses by DAVID functional analysis tool.For enrichment analysis, only those communities were analyzed which have at least 10 nodes.

Identification of Differentially Expressed Glycogenes (DEGGs).
We downloaded the precomputed RPKM values for mouse transcriptome that was mapped and quantified previously by RNA-Seq analysis [21].This RNA-Seq data In addition to DAVID functional analysis, we also identified the biological pathways of BvL579, BvM501, and LvM442 DEGGs annotated in the present study.FASTA formatted amino acid sequences of DEGGs in these three sets were fed into the KAAS for prediction of distinct pathways.A total of 210 pathways were predicted for BvL579, whereas 193 and 198 pathways were predicted for BvM501 and LvM442, respectively.The top 10 KEGG pathways for each of the three categories are shown in Table 3 and a complete list of all pathways is provided in Supplementary Table S1 (Supplementary Material is available online at http://dx.doi.org/10.1155/2014/837365).From these tables, it can be observed that the majority of the DEGGs were found to be associated with important biological processes, many being classified in signaling (e.g., PI3K-Akt and Rap1 signaling) pathways, cytokine-cytokine receptor interaction, or ECM-receptor interaction or being involved in adhesion related functions.additional genes that are related to a set of query genes by using a very large set of functional interaction data [30].The genes showing significant interactions with weights higher than 0.1 were selected only for the network analysis.By integrating these relationships, a network between DEGGs and additional related genes was constructed for all the three groups, namely, BvL579, BvM501, and LvM442 (Figure 2).A GeneMANIA network analysis for interactions among the glycogene products suggested enrichment of GO terms related to positive regulation of locomotion, cell motility, and migration for BvL579 (Table 4(a)), as well as LvM442 samples (Table 4(c)).On the contrary, the processes related to extracellular matrix were overrepresented in BvM501 samples (Table 4(b)).

Interaction Network Construction and
The three interaction networks created from GeneMA-NIA were then exported to Cytoscape 2.8.2, a bioinformatics package for biological network visualization and data integration [31].The initial network for BvL579 consists of 395 nodes and 1149 edges which were further filtered to 395 nodes and 1109 edges by removing duplicate edges (Figure 2(a)).In case of BvM501, the initial network consists of 352 nodes and 794 edges, whereas the filtered network has 352 nodes and 748 edges (Figure 2(b)).Similarly, the initial network for LvM442 network has 307 nodes and 695 edges, while the final network is represented by 307 nodes and 653 edges (Figure 2(c)).All the genes in the network are represented by circles and the interactions between them are represented as edges.Additionally, each query gene is shown in red whereas the additional related genes predicted by GeneMANIA are shown in cyan (Figure 2).
The node degree was then calculated for all the nodes in each network by using Network Analyzer plugin of Cytoscape.The higher the node degree, the more important the gene was, and the gene was denoted as a hub gene.The genes with highest node degree include PEMT (BvL579) and IGf2 (BvM501), having node degree of 29 and 14, respectively.Both PEMT and IGF2 were predicted by GeneMANIA as related genes in their respective networks and do not represent the glycogenes.However, the genes with second highest node degrees of 23 and 13 are represented by glycogenes HPXN and NID2 in BvL579 and BvM501 networks (Figures 3(a) and 3(b)).Additionally, NID2 was also predicted by GeneMANIA as a related gene in BvM501 network because it was removed from the query list as it showed a fold change of less than 2 in our analysis.In LvM442 network, the top node degree genes are STAT6, and the glycogene FLT1 (Figure 3(c)), both having the node degree of 13.

Community Analysis and Functional Annotation of the Detected Modules.
In order to identify the biologically related genes in the three networks, we employed Fast Greedy community-structure identification algorithm, implemented in the Cytoscape plugin GLay (http://brainarray.mbni.med.umich.edu/sugang/glay/)[29] to identify functional modules, 31 (Figure 4(a)), 40 (Figure 4(b)), and 31 (Figure 4(c)) clusters were detected for BvL579, BvM501, and LvM442 networks, respectively.However, only those communities were selected for enrichment analyses which have at least 10 nodes.Therefore, based on this criterion, 8, 9, and 8 communities for BvL579, BvM501, and LvM442 networks were finally analyzed for overrepresentation of GO terms.To biologically categorize these clusters, DAVID functional analysis tool was used to classify the genes in each module and observed the enrichment of GO term "Biological Process" in all the selected modules.The enrichment analysis for the modules for three networks is described as follows.
BvL579.The top 10 statistically significant enriched GO terms for DEGGs in all 8 clusters for BvL579 community analysis are summarized in Table 5(a).The most statistically significant GO terms that were enriched in cluster 1 were related to immune response, complement activation, and protein maturation.Clusters 2, 3, and 7 show enrichment for GO terms response to wounding, blood coagulation, and hemostasis.Clusters 2 and 3 also show overrepresentation of terms such as blood vessel development and vasculature development.The GO terms of cluster 4 were mostly related to hair cycle, development of epidermis, and ectoderm.The GO functions of cluster 5 were most related to localization, motility and migration of cells, and regulation of cell proliferation.Cluster 6 was significantly enriched with functions related to homeostasis, transport, and metabolism of cholesterol, and lipid as well as sterol.Cluster 8 showed overrepresentation of functional terms BvM501.The top 10 statistically significant enriched GO terms for DEGGs in all 9 clusters for BvM501 community analysis are summarized in (Table 5(b)).Cluster 1 of this network shows overrepresentation of functions related to regulation of cell growth, protein maturation, and immune response.Cluster 2 is enriched in GO terms related to proteoglycan metabolic process as well as the development of blood vessel and vasculature.Cluster 3 is enriched in the functions related to the development of bone, cartilage, and skeletal system.The GO terms that are overrepresented in cluster 4 show enrichment for functions related to adhesion, such as biological adhesion, cell adhesion, cell-substrate adhesion, and cell-matrix adhesion.The processes related to localization, motility, and migration of cells are also overrepresented in cluster 4. Cluster 5 showed overrepresentation of functional terms related to the metabolism of glycosphingolipid, glycolipid, sphingolipid, and ganglioside.Additionally, the functions related to the metabolism of lipid and oligosaccharides are also enriched in this group.In cluster 6, functions related to homeostasis, remodelling, and regulation of transport are overrepresented.Response to hypoxia and oxygen levels, apoptosis, and metabolic processes such as phospholipid and organophosphate metabolic process are enriched in cluster 7. Similar to cluster 4, the GO terms that are overrepresented in Cluster 9 show enrichment  for functions related to biological adhesion.Protein folding and metal ion transport are the other important enriched functions present in this cluster.Cluster 8 did not yield the enrichment of any statistically significant GO category.
LvM442.Table 5(c) summarizes the top 10 statistically significant enriched GO terms for DEGGs in all 8 clusters for LvM442 community analysis.The GO terms related to development and morphogenesis of blood vessels, vasculature, and tube are enriched in cluster 1. Cluster 2 shows enrichment of functions related to various types of responses, such as response to wounding, external stimulus, and defense as well as inflammatory response.Other enriched functional groups in this cluster are regulation of cell growth, proliferation, and cell activation.One of the overrepresented functional groups in cluster 3 shows enrichment for functions related to biological adhesion, lipid catabolic process, and 4-hydroxyproline metabolic process.Cluster 4 represents the enrichment of functions related to protein folding and bone remodeling and resorption.Cluster 5 shows overrepresentation of GO terms such as regulation of T cell mediated cytotoxicity or immunity, regulation of leukocyte mediated cytotoxicity, and regulation of cell killing.Clusters 6 and 7 are enriched in functions related to immune or inflammatory response and complement activation.Finally, cluster 8 exhibited overrepresentation of signaling pathways such as hepatocyte growth factor receptor signaling and mesenchymal-epithelial cell signaling, in addition to processes concerning MAP kinase activity.

Discussion
The current study offers the first thorough insight into the glycogene analysis of brain, muscle, and liver tissues from mouse RNA-Seq data.Understanding the structure and function of these glycogenes is essential for studying the development of various tissues as well as their functional roles.Most of the serum glycoproteins originates from liver suggesting that liver diseases associated with aberrant glycosylation can be reflected by the changes in serum glycoproteins [32].Recently, the significance of glycomics of central nervous system (CNS) to identify potential glycobiomarkers in neurological diseases [33] and alterations in brain glycoproteins resulting from the aging process was shown [34].Recent studies in the glycomics field also offered insights into the biological significance of the glycome in the pathogenesis of diseases in humans [32].Additionally, many studies have highlighted the significance of glycoconjugates during skeletal muscle development [7,23,35].
With the evolution of bioinformatics during the past decades, molecular target discovery and targeted therapeutics have become a critical remedial treatment for diseases [36].In this work, we screened DEGGs between brain, muscle, and liver tissues using RNA-Seq data and performed functional and pathway enrichment analysis using DAVID and KEGG tools.The most significant enriched GO terms for BvL579 are processes related to response to wounding or inflammatory response.These processes are represented by genes that play a prominent roles in innate immunity such as Mannan-binding lectin serine protease 1 (MASP1), Mannan-binding lectin serine protease 2 (MASP2), Mannosebinding protein A (MBL1), and Mannose-binding protein C (MBL2) [37,38].BvM501 is enriched in functions related to glycoprotein metabolic processes and includes genes that code for various transferases such as N-acetylglucosamine-1-phosphotransferase subunit gamma (GNPTG) [39], Betagalactoside alpha-2,6-sialyltransferase 1 (ST6GAL1) [40], and Beta-1,3-galactosyltransferase 6 (B3GALT6) [41].Similarly, the top statistically significant enriched processes in LvM442 are related to protein maturation or processing and are represented by genes such as Battenin (CLN3) and Methionine aminopeptidase 2 (METAP2).The CLN3 protein is involved in the late endosomal/lysosomal membrane transport [42], whereas METAP2 protein catalyzes the hydrolytic cleavage of the N-terminal methionine from newly synthesized polypeptides [43].
Additionally, three interaction networks of DEGGs were constructed for BvL579, BvM501, and LvM442 samples, and node degrees of genes were calculated.HPXN, NID2, and FLT1 were the glycogenes with the highest degree in BvL579, BvM501, and LvM442 networks, respectively.However, these genes represent the second highest node degree in their International Journal of Genomics  respective networks as genes (PEMT, IGF2, and STAT6) with top node degree that do not code for glycoproteins.HPXN (∼60 kDa glycoprotein) is mainly synthesized by liver cells [44], secreted to the plasma where it binds free heme or hemin and inhibits its role in free radical reactions [45].HPXN is also reported to be expressed by the cells of immune systems, ganglionic and photoreceptor cells of the retina, cells of the peripheral nervous system, and the mesangial cells of kidney [46][47][48].Recently, it was shown that HPX protein is also present in various regions of mouse brain [49].NID2 is homologous to another member of the nidogen family, NID1, and both are found in all basement membranes (BMs) [50].Both nidogens have a similar distribution in various organs during development; however, in International Journal of Genomics 11 Cluster number Term P value GO:0051450∼myoblast proliferation 0.002206368 GO:0048012∼hepatocyte growth factor receptor signaling pathway 0.002206368 GO:0060665∼regulation of branching involved in salivary gland morphogenesis by mesenchymal-epithelial signaling 0.003674845 GO:0060638∼mesenchymal-epithelial cell signaling 0.004408354 GO:0060693∼regulation of branching involved in salivary gland morphogenesis 0.006605966 GO:0060688∼regulation of morphogenesis of a branching structure 0.018252999 GO:0001889∼liver development 0.031208893 GO:0000187∼activation of MAPK activity 0.036205594 GO:0043406∼positive regulation of MAP kinase activity 0.042595839 adult tissues nidogen-2 distribution becomes more confined [51][52][53].Additionally, these nidogens show a broad range of interacting partners including other BM proteins such as laminin, collagen IV, and perlecan [51,[53][54][55].They are involved in various functions including the regulation of cell attachment [56], neutrophil chemotaxis [57], trophoblast outgrowth [58], angiogenesis [59], osteoblast, and myogenic differentiation [60,61].Vascular endothelial growth factors (VEGFs) constitute a family of six polypeptides (VEGF-A, -B, -C, -D, -E, and PlGF) that regulate blood and lymphatic vessel development [62].VEGF signaling occurs by binding to various cellular receptors such as VEGFR1 (FLT1), VEGFR2 (FLK1), and VEGFR3 (FLT4) [63,64] and neuropilin-1 [65] and -2 (NRP1 and NRP2) [66] and heparan sulfate proteoglycans (HSPG) [67].FLT1 and FLK1 are closely related receptor tyrosine kinases and both share common and specific ligands [68].FLT1 has weaker kinase activity than FLK1 [68]; however, FLT1 is essential for normal development and angiogenesis as reported in previous FLT1 null mutant mice studies [69][70][71].
Recognizing the structure and function of biological networks is indispensable for the investigation of biological processes.In this work, we identified 8, 9, and 8 functional modules or communities for BvL579, BvM501, and LvM442 networks using fast greedy algorithm implemented as GLAY [29], plugin for cytoscape.Furthermore, the function of International Journal of Genomics each module in all the three networks was explored by using functional annotation tool DAVID.Our analysis shows that the modules that are common in BvL579 and LvM442 networks show enrichment for processes related to immune and inflammatory response and response to wounding.Liver is known to play an important role in innate immunity, an important primary line of defense against infection [72].Additionally, Kupffer cells in the liver are one of the earliest to be affected by bacterial or sterile insults and add to the inflammatory response [73].This sterile inflammation can be responsible for liver injury; it may also play a role in liver repair [72].In case of the modules of BvM501 and LvM442 networks, the enrichment of processes related to cell adhesion, proliferation, and regulation of cell growth is observed.The expression of myogenic regulatory factors (MRFs) describes the different stages of skeletal muscle development that includes myoblast proliferation, cell-cycle exit, cell fusion, and the maturation of myotubes to form myofibers [23].Cell adhesion molecules such as cadherins are glycoproteins that mediate homotypic cell-cell adhesion through their extracellular domain [74], and this cadherindependent adhesion is necessary for diverse types of cellular functions [75].Finally, glycosphingolipid or glycolipid metabolic processes show overrepresentation in modules of BvL579 and BvM501 networks.Glycosphingolipids are universally expressed in all vertebrate cells as well as body fluids, but they are richer in the nervous system [76].Previous studies have established the role of glycosphingolipids in the development of the brain among various species [77,78].

Conclusion
Overall, in this work we have used a systems biology approach to identify the DEGGs, their functional enrichment, and identified potential hub genes by constructing three glycogene interaction networks.We also identified functional modules in these networks and indicate the significance of immune system related processes in liver, and glycosphingolipid metabolic processes in the development of brain.Similarly, we observe that the processes such as cell proliferation, adhesion, and growth are important for muscle development.The findings deduced from the current work are in consistence with the previous studies.Experimental validation will be required to confirm the predictions made in this study that will establish the role of predicted hubs as well as enriched functional processes in these tissues.

Figure 2 :
Figure 2: Interaction networks between DEGGs and additional related genes: (a) BvL579, (b) BvM501, and (c) LvM442.In each of these networks, red indicates a DEGG whereas the GeneMANIA predicted genes are shown in cyan.The top two hubs are shown as purple diamonds.The size of the nodes represents the node degree.

Figure 3 :
Figure 3: Top glycogenes as hubs and their first neighbors.(a) HPXN gene as hubs in BvL579 interaction network.(b) NID2 as hub gene in BvM501 interaction network.(c) FLT1 gene as hub in LvM442 interaction network.

Figure 4 :
Figure 4: Communities generated by fast greedy (GLay) clustering algorithm are shown.Clusters for (a) BvL579 network, (b) BvM501 network, and (c) LvM442 network are shown.In each community, DEGG nodes are represented by red circles whereas the cyan nodes represent GeneMANIA predicted genes.Hub genes are shown as purple diamonds.

Table 1 :
Gene expression summary.(a)Total number of genes in each sample before and after filtering.Each tissue sample consists of 33598 genes that were filtered by removing all those genes that showed RPKM value < 1 consists of RNA from adult mouse brain, liver, and skeletal muscle tissues and represents 33598 genes in each tissue sample.The data was further filtered by excluding the genes with RPKM values <1 from the analysis.As a result, 15,237, 11,920, and 12,202 genes were identified from brain, liver, and muscle samples, respectively (Table1(a)).Then three groups (brain versus liver, brain versus muscle, and liver versus muscle) were created and the number of genes that were common in each group were identified and summarized in Table1(b).From the table, it can be observed that the number of common genes between brain versus liver is 10653, brain versus muscle is 10820, and liver versus muscle is 9833, respectively.These shared genes in each group were then used to calculate the fold change, which was defined as the ratio of RPKM values of brain versus liver, brain versus muscle, and liver versus muscle groups.In this study, the total fold change of ≥2 was considered to classify the differentially expressed genes (DEGs).

Table 2 :
Top 10 significantly enriched gene ontology (GO) terms detected by FAC in differentially expressed glycogenes between (a) BvL579, (b) BvM501, and (c) LvM442.Only those terms which reported a P value of ≤0.05 and count number ≥5 genes were selected for the analysis.

Table 4 :
10 enriched gene ontology (GO) terms for DEGGs and additional related genes as reported by GeneMANIA for each category (a) BvL579, (b) BvM501, and (c) LvM442.

Table 5 :
Enrichment of GO terms for BvL579, BvM501, and LvM442 cluster analysis.