In recent years, proteomic studies have revealed several interesting findings in experimental sepsis models and septic patients. However, most studies investigated protein alterations only in single organs or in whole blood. To identify possible sepsis biomarkers and to evaluate the relationship between protein alterations in sepsis-affected organs and blood, proteomics data from the heart, brain, liver, kidney, and serum were analysed. Using functional network analyses in combination with hierarchical cluster analysis, we found that protein regulation patterns in organ tissues as well as in serum are highly dynamic. In the tissue proteome, the main functions and pathways affected were oxidoreductive activity and cell energy generation or metabolism, whereas in the serum proteome, functions were associated with lipoprotein metabolism and, to a minor extent, with coagulation, inflammatory response, and organ regeneration. Proteins from network analyses of organ tissue did not correlate with statistically significantly regulated serum proteins or with predicted proteins of serum functions. In this study, the combination of proteomic network analyses with cluster analyses is introduced as an approach to handling high-throughput proteomics data and evaluating the dynamics of protein regulation during sepsis.
Proteomic studies and broad analyses of protein alterations in experimental and clinical sepsis allow evaluating the systemic host response to a hit or injury and offer comprehensive information about the complex host response to infection [
Furthermore, other proteomic studies identified peptides as possibly useful sepsis biomarkers [
To identify possible protein regulation patterns and to evaluate the interaction between protein alterations in sepsis-affected organs and blood, proteomics data from the heart, brain, liver, kidney, and serum from previous studies were analysed [
Modern technologies make it possible to identify and quantify a large number of different proteins in proteomic experiments. Thus, big data analyses have become a bottleneck and represent a great challenge in proteomics [
In five previous studies, male Wistar rats were randomly assigned to a sepsis group (cecal ligation and puncture, CLP) or a control group (sham) [
Significantly altered proteins were identified by mass spectrometry (MALDI-TOF MS) and used for further bioinformatical analysis to identify underlying networks, signalling cascades, and pathways affected.
In summary, as a first step, statistically significantly regulated proteins from blood and organ tissues of previous studies were identified and analysed by network analyses (GeneMania®). In a second step, these proteins were grouped according to their regulation at 12, 24, and 48 hours using hierarchical cluster analysis (Perseus®). As a third step, proteins of similarly early upregulated clusters underwent further network analysis to evaluate possible corresponding proteins or functions in blood and organ tissues. This approach to handling pooled proteomic data is described in detail below.
Sixty proteins from sepsis related organs (liver, kidney, heart, and brain) and twenty proteins from a serum analysis which were significantly altered, at least at one time point (12, 24, and 48 hours), were used for further bioinformatical analysis to identify underlying networks, signalling cascades, and pathways affected.
Biological functions of statistically significantly regulated proteins were identified using functional network analysis. GeneMania (
As these software programs use different algorithms, we decided to perform the bioinformatical analyses with all of them in order to retrieve the highest number of predicted interactions, maintaining an acceptable level of confidence (0.400).
The associated functions detected by the software were downloaded in TAB-separated-values format and exported to Microsoft Excel® (Microsoft, Redmond, USA; version 2007), where they were filtered into subgroups that were reanalysed using GeneMania.
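The filtering of exported functions can be sketched in a few lines of code. This is a minimal illustration and not the Excel workflow used in the study: the column names (`function`, `genes_in_network`, `genes_in_genome`) and the helper functions are assumptions made for the example, and the three rows are taken from the organ-tissue function table. The sketch applies the two filters used in this analysis, prevalence (annotated genes in the network divided by annotated genes in the genome) and absolute number of annotated genes in the network.

```python
import csv
import io

# Hypothetical TAB-separated export; the column names are assumptions,
# the three rows are taken from the organ-tissue function table.
TSV_EXPORT = (
    "function\tgenes_in_network\tgenes_in_genome\n"
    "NADH metabolic process\t6\t12\n"
    "Glycolysis\t7\t46\n"
    "Gluconeogenesis\t6\t44\n"
)


def load_functions(tsv_text):
    """Parse the export and compute the prevalence (network / genome) per function."""
    rows = []
    for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        in_network = int(rec["genes_in_network"])
        in_genome = int(rec["genes_in_genome"])
        rows.append({
            "function": rec["function"],
            "genes_in_network": in_network,
            "genes_in_genome": in_genome,
            "prevalence": in_network / in_genome,
        })
    return rows


def filter_by_prevalence(rows, cutoff):
    """Keep functions whose prevalence is at least `cutoff` (a fraction, e.g. 0.12)."""
    return [r for r in rows if r["prevalence"] >= cutoff]


def filter_by_absolute_number(rows, cutoff):
    """Keep functions with at least `cutoff` annotated genes in the network."""
    return [r for r in rows if r["genes_in_network"] >= cutoff]


functions = load_functions(TSV_EXPORT)
# Cutoffs reported for the organ-tissue dataset: prevalence >= 12%, absolute number >= 7.
by_prevalence = [r["function"] for r in filter_by_prevalence(functions, 0.12)]
by_number = [r["function"] for r in filter_by_absolute_number(functions, 7)]
```

Because the example rows come from a table that was already filtered at 12%, all three pass the prevalence filter, while only glycolysis passes the absolute-number filter. The two criteria therefore select different subsets of the same export.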
Heat maps are an efficient method of visualizing complex datasets organized as matrices [
On the basis of the cluster analysis, further subgroup network analyses of similarly upregulated proteins at 12 hours or 12 and 24 hours after sepsis induction in sepsis related organs (liver, kidney, heart, and brain) were performed to find regulation patterns and identify possible biomarkers.
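The grouping idea behind the cluster analysis can be illustrated with a small sketch. The study used Perseus, whose distance and linkage settings are not stated here, so the Euclidean distance and average linkage below are assumptions. The protein labels and log2 fold changes are invented for illustration and are not data from this study.

```python
import math

# Hypothetical log2 fold changes (sepsis vs. sham) at 12, 24 and 48 hours.
# Protein labels and values are invented for illustration only.
PROFILES = {
    "protein_A": (1.2, 1.0, 0.1),    # up at 12 and 24 h
    "protein_B": (1.1, 0.9, 0.0),    # up at 12 and 24 h
    "protein_C": (-0.9, -1.1, 0.2),  # down at 12 and 24 h
    "protein_D": (-1.0, -1.0, 0.1),  # down at 12 and 24 h
    "protein_E": (0.0, 0.1, 1.3),    # up only at 48 h
}


def euclidean(a, b):
    """Euclidean distance between two time-course profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    distances = [euclidean(profiles[p], profiles[q])
                 for p in cluster_a for q in cluster_b]
    return sum(distances) / len(distances)


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


subclusters = hierarchical_clusters(PROFILES, 3)
```

With these invented profiles, the two proteins that are upregulated at 12 and 24 hours end up in one subcluster. A subcluster of this kind is what was passed on to the subgroup network analysis.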
Collecting data from the 5 previous studies [
Using GeneMania, separate network analyses regarding serum proteins (Figure
Thirty-eight functions filtered by prevalence (cutoff ≥ 12%) from the original 159 functions derived from the GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome; column 4 shows the ratio of the two (prevalence). In column 5, names in bold letters represent the genes predicted by the software.
Function | Genes in network | Genes in genome | Ratio | Names |
---|---|---|---|---|
NADH metabolic process | 6 | 12 | 50.00% | gpd1, dlst, ogdh, |
Oxaloacetate metabolic process | 4 | 11 | 36.36% | got1, |
Tricarboxylic acid cycle | 5 | 15 | 33.33% | dlst, ogdh, aco2, suclg2, |
Tricarboxylic acid cycle enzyme complex | 3 | 11 | 27.27% | dlst, ogdh, suclg2 |
NAD metabolic process | 6 | 24 | 25.00% | gpd1, dlst, ogdh, |
Aerobic respiration | 5 | 25 | 20.00% | dlst, ogdh, aco2, suclg2, |
Fatty-acyl-CoA binding | 3 | 15 | 20.00% | acadl, pitpna, |
Succinate metabolic process | 2 | 10 | 20.00% | aldh5a1, suclg2 |
Pentose-phosphate shunt | 2 | 10 | 20.00% | g6pd, |
Ribonucleoside diphosphate biosynthetic process | 2 | 10 | 20.00% | |
Pentose metabolic process | 2 | 10 | 20.00% | g6pd, |
NADPH regeneration | 2 | 10 | 20.00% | g6pd, |
2-Oxoglutarate metabolic process | 3 | 16 | 18.75% | got1, dlst, ogdh |
Nicotinamide nucleotide metabolic process | 8 | 43 | 18.60% | gpd1, dlst, ogdh, g6pd, |
Pyridine nucleotide metabolic process | 8 | 43 | 18.60% | gpd1, dlst, ogdh, g6pd, |
MHC class I protein binding | 2 | 11 | 18.18% | |
ADP metabolic process | 2 | 11 | 18.18% | |
Positive regulation of glycolysis | 2 | 11 | 18.18% | gpd1, |
Oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor | 4 | 25 | 16.00% | aldh5a1, ogdh, gapdh, aldh7a1 |
Monosaccharide catabolic process | 10 | 63 | 15.87% | aldoa, akr1a1, gapdh, eno1, g6pd, fbp1, gpd1, |
Glucose catabolic process | 9 | 57 | 15.79% | fbp1, gpd1, aldoa, gapdh, eno1, g6pd, |
Neurotransmitter metabolic process | 3 | 19 | 15.79% | aldh5a1, glul, pebp1 |
Pyridine-containing compound metabolic process | 8 | 51 | 15.69% | gpd1, dlst, ogdh, g6pd, |
Monosaccharide biosynthetic process | 8 | 51 | 15.69% | gnmt, akr1a1, gapdh, g6pd, fbp1, gpd1, |
Oxidoreduction coenzyme metabolic process | 8 | 51 | 15.69% | gpd1, dlst, ogdh, g6pd, |
Acetyl-CoA metabolic process | 4 | 26 | 15.38% | acss1, fasn, |
Glycolysis | 7 | 46 | 15.22% | fbp1, gpd1, aldoa, gapdh, eno1, |
Glutamate metabolic process | 3 | 20 | 15.00% | aldh5a1, got1, glul |
Hexose catabolic process | 9 | 61 | 14.75% | fbp1, gpd1, aldoa, gapdh, eno1, g6pd, |
Gluconeogenesis | 6 | 44 | 13.64% | gnmt, gapdh, fbp1, gpd1, |
Dicarboxylic acid metabolic process | 10 | 76 | 13.16% | gnmt, ogdh, glul, suclg2, aldh5a1, got1, dlst, |
Hexose biosynthetic process | 6 | 46 | 13.04% | gnmt, gapdh, fbp1, gpd1, |
Purine nucleoside triphosphate biosynthetic process | 3 | 23 | 13.04% | adk, aldoa, |
Oxidoreductase activity, acting on the aldehyde or oxo group of donors | 4 | 31 | 12.90% | aldh5a1, ogdh, gapdh, aldh7a1 |
Single-organism carbohydrate catabolic process | 11 | 90 | 12.22% | cps1, aldoa, akr1a1, gapdh, eno1, g6pd, fbp1, gpd1, |
Regulation of glycolysis | 3 | 25 | 12.00% | fbp1, gpd1, |
Proton-transporting two-sector ATPase complex | 3 | 25 | 12.00% | atp6v1b1, |
Hydro-lyase activity | 3 | 25 | 12.00% | uroc1, aco2, eno1 |
Network analysis of serum proteins. In a GeneMania network analysis, each circle represents a gene. The input proteins/genes are depicted as striped circles of the same size. The monochromatic circles, whose size is proportional to the number of interactions according to the software, can be considered “relevant” related genes that GeneMania found by searching many large, publicly available biological datasets (including protein-protein, protein-DNA, and genetic interactions, pathways, reactions, gene and protein expression data, protein domains, and phenotypic screening profiles). Lines linking the circles are distinguished by their colour: violet represents coexpression (when expression levels are similar across conditions in a gene expression study); light orange represents predicted functional relationships between genes.
Most of the functions were associated with oxidoreductive activity and cell energy generation or metabolism (ATP production, tricarboxylic metabolism, glycolysis, gluconeogenesis, cell respiration, etc.) and nucleotide or nucleoside metabolism. One-third of the proteins found are usually located in the mitochondria.
The functions identified with statistically significant altered serum proteins using 2% as cutoff for prevalence are shown in Table
Network analysis of serum functions by prevalence. Twenty-nine functions filtered by prevalence (cutoff ≥ 2%) from the original 166 functions derived from the GeneMania® network analysis of the serum-protein dataset. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome; column 4 shows the ratio of the two (prevalence). In column 5, names in bold letters represent the genes predicted by the software.
Function | Genes in network | Genes in genome | Ratio | Names |
---|---|---|---|---|
Blood microparticle | 22 | 97 | 22.68% | apcs, hp, c3, tf, apoa1, cfb, apoe, serping1, fga, alb, itih4, gc, |
Glycerolipid metabolic process | 9 | 211 | 4.27% | c3, apoa1, apoe, |
Phospholipid binding | 9 | 222 | 4.05% | apoe, apoa1, |
Negative regulation of hydrolase activity | 9 | 264 | 3.41% | fetub, kng2, apoa1, serping1, |
Lipid transport | 8 | 174 | 4.60% | apoe, apoa1, |
Regeneration | 8 | 184 | 4.35% | fga, hp, apoa1, apoe, |
Enzyme inhibitor activity | 8 | 197 | 4.06% | fetub, apoa1, serping1, |
Wound healing | 8 | 287 | 2.79% | fga, c3, apoe, |
High-density lipoprotein particle | 7 | 15 | 46.67% | apoe, apoa1, |
Plasma lipoprotein particle | 7 | 19 | 36.84% | apoe, apoa1, |
Protein-lipid complex | 7 | 20 | 35.00% | apoe, apoa1, |
Acylglycerol metabolic process | 7 | 75 | 9.33% | c3, apoe, |
Neutral lipid metabolic process | 7 | 77 | 9.09% | c3, apoe, |
Acute inflammatory response | 7 | 96 | 7.29% | hp, c3, tf, itih4, serping1, |
Lipid localization | 7 | 136 | 5.15% | apoe, apoa1, |
Regulation of lipid metabolic process | 7 | 229 | 3.06% | c3, apoa1, apoe, |
Regulation of body fluid levels | 7 | 246 | 2.85% | c3, apoe, gc, fga, |
Extracellular matrix | 7 | 262 | 2.67% | apcs, alb, tf, rbp3, |
Triglyceride-rich lipoprotein particle | 6 | 14 | 42.86% | apoe, apoa1, |
Very-low-density lipoprotein particle | 6 | 14 | 42.86% | apoe, apoa1, |
Triglyceride metabolic process | 6 | 67 | 8.96% | c3, apoe, |
Organ regeneration | 6 | 92 | 6.52% | hp, apoa1, |
Blood coagulation | 6 | 110 | 5.45% | c3, apoe, fga, |
Hemostasis | 6 | 112 | 5.36% | c3, apoe, fga, |
Coagulation | 6 | 115 | 5.22% | c3, apoe, fga, |
Negative regulation of endopeptidase activity | 6 | 156 | 3.85% | fetub, kng2, serping1, |
Lipid catabolic process | 6 | 157 | 3.82% | apoe, |
Negative regulation of peptidase activity | 6 | 159 | 3.77% | fetub, kng2, serping1, |
Steroid metabolic process | 6 | 200 | 3.00% | gc, apoa1, apoe, |
Alcohol metabolic process | 6 | 211 | 2.84% | gc, apoa1, apoe, |
Regulation of endopeptidase activity | 6 | 276 | 2.17% | fetub, kng2, serping1, |
Organic anion transport | 6 | 279 | 2.15% | dpysl2, apoa1, apoe, |
Regulation of peptidase activity | 6 | 288 | 2.08% | fetub, kng2, serping1, |
Quantitative information concerning proteins that had statistically significant altered expression in the liver, kidney, heart, and brain at 12, 24, and 48 hours from the induction of sepsis was analysed using Perseus (Max Planck Institute of Biochemistry, Martinsried, Germany; v. 1.5.8.5) which performed the hierarchical cluster analysis (Figure
Heat map of the hierarchical cluster analysis of significantly regulated proteins of sepsis-related organs. Three subclusters with significantly upregulated proteins at 12 or 12 and 24 hours are highlighted. Bricks become progressively darker as the fold change approaches 1; a completely black brick represents a fold change of 1 (that is, no change between sepsis and sham groups). By contrast, a green brick represents a protein whose expression at a particular time point was decreased compared with the same protein in the sham group at that time point.
The cluster analysis revealed several groups of regulation patterns with different combinations of proteins up/downregulated or unchanged at different time points. Three subclusters of similarly upregulated proteins at 12 or 12 and 24 hours were identified. Since these early upregulated subclusters may contain possible candidates for sepsis biomarkers, further network analyses were conducted for these subgroups highlighted in Figure
Heat map of the hierarchical cluster analysis of significantly regulated serum proteins. Two subclusters with significantly upregulated proteins at 12 or 12 and 24 hours are highlighted. Bricks become progressively darker as the fold change approaches 1; a completely black brick represents a fold change of 1 (that is, no change between sepsis and sham groups). By contrast, a green brick represents a protein whose expression at a particular time point was decreased compared with the same protein in the sham group at that time point.
In the same way, a cluster analysis of statistically significantly regulated serum proteins was performed (Figure
The subclusters of similarly up- and downregulated proteins in the first 24 hours after sepsis induction for both sepsis related organs and serum underwent further GeneMania analyses to identify networks and predicted proteins within these networks and their associated functions. By identifying predicted proteins, we expected a higher likelihood of finding statistically significantly regulated proteins both in organ tissues and in serum. For subcluster 1 in the organ tissue cluster analysis, we found no network using GeneMania.
The network for subcluster 2 revealed 19 functions filtered by absolute number (cutoff ≥ 5) and 17 functions filtered by prevalence (cutoff ≥ 10%) (Suppl. Tables
Using the same cutoff values in subcluster 3, 27 functions filtered by absolute number and 20 functions filtered by prevalence were found (Suppl. Tables
In serum proteins, a network analysis of subcluster 1 (Figure
In this study, proteomic data of various experiments all using the same experimental sepsis model (i.e., cecal ligation and puncture, CLP) were analysed using bioinformatical methods to identify protein regulation patterns altered by sepsis [
The study reveals several major findings. By using protein network analysis software (GeneMania), we demonstrated that most of the statistically significantly regulated proteins from the heart, liver, kidney, and brain were associated with oxidoreductive activity, cell energy generation or metabolism (ATP production, tricarboxylic metabolism, glycolysis, gluconeogenesis, cell respiration, etc.), and nucleotide or nucleoside metabolism (Table
It appears plausible that in the clinical setting of sepsis there is an alteration of proteins involved in energy generation in tissues since an imbalance between oxygen delivery and consumption is a hallmark of sepsis and particularly septic shock [
Concerning lipoprotein expression, which was found to be altered in serum in our study, there is an evolving interest in the use of lipoproteins, especially high-density lipoprotein, both as a biomarker [
Hierarchical cluster analysis confirmed that protein regulation in sepsis-related organs and tissues is a dynamic process. We found that proteins can be up- or downregulated or even remain unchanged at different time points (12 hours, 24 hours, or 48 hours) after induction of sepsis. Regarding the early phase of sepsis, that is, up to 24 hours after sepsis induction, three subclusters of organ proteins were identified which were upregulated at 12 or at 12 and 24 hours (Figure
Another major finding of our analysis was that proteins in early upregulated subclusters of the serum (Figure
In our bioinformatical analysis we sought to assess whether the dynamic process of sepsis-associated alterations in the tissue proteome is reflected in serum proteome changes. Several subclusters of early upregulated tissue proteins were detected; these would be interesting candidates for sepsis biomarkers if they were also detected in blood. Furthermore, functions and pathways in organ tissues associated with early upregulated protein clusters could be compared to altered functions in blood. However, none of the tissue proteins was found in the serum, and none of the predicted proteins from the GeneMania network functions correlated with serum proteins. Even though no identical proteins were detected in both serum and organ tissues, our bioinformatical approach could improve our understanding of the pathophysiology of sepsis. For example, the cluster analyses revealed which proteins and functions were regulated at different stages during the course of sepsis. Furthermore, one-third of the statistically significantly regulated proteins are located in the mitochondria, underlining the importance of altered mitochondrial function and even mitochondrial damage in the host response to sepsis [
Even though no common protein was found in both serum and organ tissue, this does not necessarily rule out the detected proteins as potential sepsis biomarkers. The organ-related proteins were probably not found in the serum because their concentrations were below the detection limit, so more sensitive techniques are needed. By using network analyses we were able to predict proteins possibly involved in functions and pathways of upregulated clusters, which increased the number of possible biomarker candidates. The detection of a single protein or a set of proteins upregulated in organ tissue as well as in serum would warrant further research on those proteins.
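The comparison between tissue and serum proteins amounts to a set intersection on gene names. The sketch below is a minimal illustration under stated assumptions: the helper `shared_proteins` is hypothetical, and the two sets contain only a handful of non-predicted gene names taken from the organ-tissue and serum function tables shown in this article, not the complete datasets.

```python
# A few non-predicted gene names from the two function tables shown in this
# article; these are examples, not the complete datasets.
tissue_genes = {"gpd1", "dlst", "ogdh", "got1", "aco2", "suclg2", "gapdh", "eno1", "fbp1"}
serum_genes = {"apoe", "apoa1", "c3", "fga", "hp", "tf", "serping1", "fetub", "kng2"}


def shared_proteins(tissue, serum):
    """Return gene names present in both sets, compared case-insensitively."""
    tissue_norm = {g.strip().lower() for g in tissue}
    serum_norm = {g.strip().lower() for g in serum}
    return sorted(tissue_norm & serum_norm)


# An empty list means that no candidate appears in both compartments.
overlap = shared_proteins(tissue_genes, serum_genes)
```

The same check can be repeated with the predicted network proteins added to either set, which is how network analysis widens the pool of candidates that could be matched between tissue and serum.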
In blood plasma, numerous tissue proteins can be found. However, most of them do not contribute to the genuine blood plasma functions [
In a recent septic mouse model, the authors introduced an MS-based strategy to monitor the dynamics of tissue and cell-specific proteins in the blood plasma and constructed a proteome-wide tissue atlas to demonstrate how the surrounding tissue and cells influence the blood plasma in severe infectious diseases [
In a recent review article the authors stated that “in case of the proteomic investigation, the challenges occur at all levels ranging from sample preparation and data gathering over the raw data integration and database searching to the functional interpretation of large datasets” [
Some limitations of our study have to be mentioned. Statistically significantly regulated tissue proteins from different organs were mixed in the network analyses. Thus, we cannot be sure that the derived functions and pathways in fact correspond to these functions in the respective organs. However, the previous organ proteomics analyses of this sepsis model confirm that most of the functions are associated with energy metabolism, mitochondrial function, and lipid metabolism [
The number of functions presented in this analysis was limited by applying arbitrary cutoffs for prevalence and for the absolute number of proteins involved in the network. As a result, only functions in which a representative number of proteins was present were retained.
Interestingly, we found no typical acute phase proteins in our analysis. This is probably due to technical limitations of proteomic analyses. Common inflammation biomarkers are relatively small proteins whose concentrations may remain low even after upregulation, which could explain why they were missed in our analysis. With further advances in proteomic techniques and more sensitive methods, small and low-abundance proteins might also be detected in the future.
In summary, in our stepwise comparison of dynamic organ tissue proteome changes to serum proteome changes we were able to demonstrate that regulation patterns in organ tissues as well as in serum are highly dynamic. Subclusters of proteins can be upregulated or downregulated or even remain unchanged at different stages of sepsis. The main functions and pathways affected in the tissue proteome were oxidoreductive activity and cell energy generation or metabolism, whereas in the serum proteome, functions were associated with lipoprotein metabolism and, to a minor extent, with coagulation, inflammatory response, and organ regeneration. Using hierarchical cluster analyses and functional network analyses (GeneMania) including predicted network proteins, we were not able to detect correlating proteins or functions in organ tissues and blood. Furthermore, we were not able to identify promising candidates for sepsis biomarkers. Nonetheless, this analysis provides new insights into protein regulation during sepsis, and this bioinformatical approach could be helpful for handling high-throughput proteomic data.
The authors declare that there are no conflicts of interest regarding the publication of this article.
Andreas Hohn and Ivan Iovino contributed equally to the manuscript.
Suppl. Table 1: fifty-one functions filtered by absolute number (cutoff ≥ 7) from the original 159 functions derived from GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 2: network analysis of serum functions by absolute number. Thirty-three functions filtered by absolute number (cutoff ≥ 6) from the original 166 derived from GeneMania network analysis of the serum-protein dataset. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 3: subcluster 2 (gpd1, eno1, aldh5a1, coro1a, atp6v1b2, ckb, alb, fasn, acy1, fbp1, fscn1, aldh7a1, cct3, gpd1, ogdh, oxct1, and ca1). Seventeen functions filtered by prevalence (cutoff ≥ 10%) from the original 51 functions derived from GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 4: subcluster 2 (gpd1, eno1, aldh5a1, coro1a, atp6v1b2, ckb, alb, fasn, acy1, fbp1, fscn1, aldh7a1, cct3, gpd1, ogdh, oxct1, and ca1). Nineteen functions filtered by absolute number (cutoff ≥ 5) from the original 51 functions derived from GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 5: subcluster 3 (gapdh, cps1, aldoa, glul, myh6, myh7, oplah, got1, and acss1). Twenty functions filtered by prevalence (cutoff ≥ 10%) from the original 90 functions derived from GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 6: subcluster 3 (gapdh, cps1, aldoa, glul, myh6, myh7, oplah, got1, and acss1). Twenty-seven functions filtered by absolute number (cutoff ≥ 5) from the original 90 functions derived from GeneMania network analysis of the whole dataset without the serum proteins. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 7: subcluster of similarly upregulated proteins from the serum-protein dataset (c3, kng2, dpysl2, igh-6, apoa1, hp, alb, tf, gc, apoe, and cfb). Forty-four functions filtered by prevalence (cutoff ≥ 15%) from the original 190 functions derived from GeneMania network analysis of this subcluster dataset. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Table 8: subcluster of similarly upregulated proteins from the serum-protein dataset (c3, kng2, dpysl2, igh-6, apoa1, hp, alb, tf, gc, apoe, and cfb). Fifty-nine functions filtered by absolute number (cutoff ≥ 5) from the original 190 functions derived from GeneMania network analysis of this subcluster dataset. Column 1 shows the function names. Columns 2 and 3 show, respectively, the number of annotated genes in the displayed network and the number of genes with that annotation in the genome. In column 5, names in bold letters represent the genes predicted by the software.

Suppl. Figure 1: network analysis of organs without serum. In a GeneMania network analysis, each circle represents a gene. The input proteins/genes are depicted as striped circles of the same size. The monochromatic circles, whose size is proportional to the number of interactions according to the software, can be considered “relevant” related genes that GeneMania found by searching many large, publicly available biological datasets (including protein-protein, protein-DNA, and genetic interactions, pathways, reactions, gene and protein expression data, protein domains, and phenotypic screening profiles). Lines linking the circles are distinguished by their colour: violet represents coexpression (when expression levels are similar across conditions in a gene expression study); light orange represents predicted functional relationships between genes; light blue represents colocalization (when genes are expressed in the same tissue or proteins are found in the same location); light yellow represents shared protein domains (when two gene products have the same protein domain).