Integrative Bioinformatics Analysis of Proteins Associated with the Cardiorenal Syndrome

The cardiorenal syndrome refers to the coexistence of kidney and cardiovascular disease; cardiovascular events are the most common cause of death in patients with chronic kidney disease. Both cardiovascular and kidney diseases have been extensively analyzed at the molecular level, revealing molecular features and associated processes that indicate a cross-talk between the two disease etiologies at the pathophysiological level. To gain a comprehensive picture of the molecular factors contributing to the bidirectional interplay between the kidney and the cardiovascular system, we mined the scientific literature for molecular features reported as associated with the cardiorenal syndrome, resulting in 280 unique genes/proteins. These features were then analyzed at the level of molecular processes and pathways using various types of protein interaction networks. In addition to well-established molecular features associated with the renin-angiotensin system, numerous proteins involved in signal transduction and cell communication were found, covering specific molecular functions such as receptor binding, with the natriuretic peptide receptors and their ligands as a well-known example. An integrated analysis of the identified features pinpointed a protein interaction network involving mediators of hemodynamic change and an accumulation of features associated with the endothelin and VEGF signaling pathways. Some of these features may serve as novel therapeutic targets.


Introduction
The risk of developing cardiovascular disease (CVD) is dramatically increased in patients with chronic kidney disease (CKD). Mortality as a consequence of cardiovascular events is 10 to 30 times higher in patients on dialysis than in the general population [1]. With CVD recognized as the leading cause of morbidity and mortality in patients with reduced kidney function, a growing body of literature has addressed this link between CKD and CVD, termed the cardiorenal syndrome (CRS).
CRS can be classified into five subtypes depending on the origin of damage (either the cardiovascular system or the kidney) and the course of disease (either acute or chronic) [2,3]. Major mechanisms leading to CRS1 and CRS2 (acute and chronic cardiorenal syndrome) include hemodynamically mediated damage, hormonal factors, immune-mediated damage, low cardiac output, endothelial dysfunction, and chronic hypoperfusion. Hallmarks of kidney dysfunction leading to CRS3 and CRS4 (acute and chronic renocardiac syndrome), on the other hand, are volume expansion, a drop in the glomerular filtration rate, humoral signaling, anemia, uremic toxins, and inflammation. The fifth subtype (CRS5) describes the secondary cardiorenal syndrome, in which systemic diseases such as diabetes ultimately lead to simultaneous cardiovascular and kidney dysfunction.
International Journal of Nephrology
The cardiac risk profile of patients with chronic kidney disease is complex, and risk increases with age, the stage of kidney disease, and the level of proteinuria. Another powerful risk factor is hypertension, which is accompanied by sodium retention and activation of the renin-angiotensin system. Atherosclerosis results from an impairment of endothelial function, which in turn is associated with albuminuria. Changes in blood-lipid composition and oxidative stress as a consequence of inflammation due to renal dysfunction also contribute to endothelial dysfunction and subsequent CVD [4].
Management and therapy of the CRS are challenging, since drugs used to treat cardiovascular diseases may impair kidney function and vice versa. Examples include diuretics, inotropes, angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, and natriuretic peptides; treatment decisions must therefore be based on a combination of individual patient information and an understanding of the individual treatment options [5].
Biomarkers relevant to the CRS are mainly proteins established in either nephrology or cardiology. Cardiac markers include, for example, the family of natriuretic peptides and the troponins, whereas frequently reported renal markers include neutrophil gelatinase-associated lipocalin (NGAL), kidney injury molecule 1 (KIM1), Cystatin C, interleukin 18 (IL18), and N-acetyl-β-D-glucosaminidase [6]. Levels of circulating fibroblast growth factor 23 (FGF-23), for example, have been shown to be independently associated with left ventricular mass index and left ventricular hypertrophy in patients with CKD [7]. Chung and colleagues described the relationship between activation of matrix metalloproteinase 2 (MMP2) and elastic fiber degeneration, stiffening, medial calcification, and vasomotor dysfunction in the macroarterial vasculature of dialyzed CKD patients [8]. Beyond these proteins, a multitude of other molecular features has been mentioned in the literature in the context of the cardiorenal syndrome. Perco et al. reported a list of 31 CVD biomarkers that were extracted from the literature and characterized with respect to biological function, gene expression in CKD, and known protein-protein interactions [9].
Literature mining approaches have the potential to reveal such biomarkers, thus providing a more global picture of the genes, proteins, and metabolites associated with a specific disease. The biomedical literature can be seen as the condensed result of the combined effort of the scientific community and, as such, represents the primary resource upon which further investigations may be based. PubMed, for instance, presently holds close to 20 million abstracts. Computational literature mining tools that help researchers keep pace with this ever-growing amount of rapidly changing information have thus become indispensable [10,11].
In the context of drug discovery, the most prevalent approach is based on concept co-occurrence [12,13]. Here, a disease profile consisting of the concepts (e.g., drugs, genes) that are frequently mentioned together with the disease under analysis can be derived via text mining. Likewise, literature-based profiles can be generated for drugs or genes. Beyond providing a convenient overview of biomarkers, this information base may additionally yield hints about as yet undiscovered dependencies between diseases, drugs, and potential drug targets.
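The co-occurrence idea can be illustrated with a minimal Python sketch. It assumes that each abstract has already been reduced to a set of recognized concepts (for example, by a named entity recognition step); the toy documents and concept names are hypothetical. A concept's weight in the disease profile is simply the number of documents in which it co-occurs with the disease concept; published methods typically add statistical weighting on top of such raw counts.

```python
from collections import Counter
from typing import Iterable


def cooccurrence_profile(docs: Iterable[set], disease: str) -> Counter:
    """Count, for every concept, the documents in which it co-occurs with `disease`.

    Each document is assumed to be a set of concept identifiers already
    recognized in its text (e.g., gene symbols, drug names).
    """
    profile: Counter = Counter()
    for concepts in docs:
        if disease in concepts:
            # Every other concept in the same document co-occurs once.
            profile.update(c for c in concepts if c != disease)
    return profile


# Hypothetical toy corpus: one set of recognized concepts per abstract.
docs = [
    {"cardiorenal", "REN", "AGT"},
    {"cardiorenal", "REN", "NPPA"},
    {"cardiorenal", "NPPA"},
    {"atherosclerosis", "REN"},  # does not mention the disease, so it is ignored
]
profile = cooccurrence_profile(docs, "cardiorenal")
print(profile.most_common(3))
```

The same function yields a drug or gene profile if the corresponding concept is passed instead of the disease.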
To further enhance text mining efforts, several "controlled vocabularies" ("ontologies") have been developed to allow a precise definition of the employed concepts [14]. The most popular ones are maintained by the U.S. National Library of Medicine, namely, the Unified Medical Language System (UMLS) and the Medical Subject Headings (MeSH). Given that the majority of PubMed articles are indexed with MeSH terms, fast and accurate extraction of biomedical concepts has become feasible [13,15]. With the advent of literature mining approaches, also in combination with high-throughput Omics experiments, a number of bioinformatics tools and ontologies have been developed for the analysis of the resulting large sets of genes or proteins. Analyzing extended sets of biomarker candidates at the level of molecular pathways and processes, represented as protein interaction networks, adds another layer of information for the interpretation of molecular feature (biomarker) sets.
A recent review by Lusis and colleagues summarized studies dealing with network analyses in cardiovascular disease [16]. Networks based on prior knowledge, such as existing pathway sources, literature cocitations, or other correlation measures such as coexpression and sequence similarity, were outlined by Ashley et al. [17], who mapped genes differentially regulated between patients with de novo atherosclerosis and in-stent restenosis onto a cocitation network obtained by literature mining of Medline abstracts. Similar concepts can be followed using networks derived from physical protein interactions or networks generated by measuring the response to experimental perturbations. Further approaches include systems genetics and detailed analyses of metabolic systems, such as flux balance analysis, a constraint-based method often used to characterize enzymatic reaction fluxes in models of metabolism. Some of these approaches, especially highly abstracted network models at the level of phenotypes, have predicted comorbidity patterns for myocardial infarction using a "human disease network", thus narrowing the gap to clinical applications [18].
Diez et al. presented another application of the network paradigm to reveal the mechanisms of cardiovascular disease, identifying a set of differentially expressed genes separating asymptomatic from symptomatic carotid stenosis patients [19]. Based on these transcriptomics data, a correlation network was generated. In addition, an association network of the differentially regulated genes was derived by mining the literature for gene associations, resulting in an interaction network that combines Omics data with associated features extracted from the literature. Subnetworks characterized by enriched lipid-, immune-, and atherogenesis-related pathways and Gene Ontology terms were identified. At this level of representation, the interplay involving APOC1 (a gene linked to coronary heart disease) became evident. Weiss et al. investigated networks of cardiovascular metabolism, pointing out aspects of network structure, namely, the differences between networks designed in engineering and networks that have undergone an evolutionary process [20]. Depending on the level of abstraction, three types of networks of cardiovascular metabolism were proposed: first, at the very abstract level of nodes and edges, metabolite networks described by topological characteristics [21,22]; second, physical, spatially compartmentalized networks including a description of the energy fluxes in the network [23,24]; and third, dynamic networks [25][26][27].
The present knowledge regarding the mechanisms leading to the formation of the CRS suggests a critical role for hemodynamic changes originating either from the kidney or from the cardiovascular system. In the following analysis, we used a literature mining approach to extract genes and proteins reported in the context of the cardiorenal syndrome and analyzed these features at the level of protein interaction networks.
Specific focus was placed on secreted proteins specifically expressed in either renal or vascular tissue, with the aim of identifying molecular mediators that potentially contribute to the cross-talk between the kidney and the cardiovascular system and, thereby, novel therapeutic targets addressing both systems.

Materials and Methods
The general analysis strategy applied in this work is outlined in Figure 1. Major components include feature extraction via literature mining, followed by a range of bioinformatics analysis procedures for deciphering characteristics of individual features as well as joint interpretation on the level of protein interaction networks.

Literature Mining.
The strength, but also the challenge, of biomedical text mining lies in the fact that the scientific literature encompasses a variety of concepts (genes, drugs, diseases, etc.), which in turn are interrelated in a variety of ways. Carefully designed text mining methods are therefore needed to extract "meaningful" information and reduce the amount of noise in the final results.
In general, text mining consists of two steps: Information Retrieval (IR) and Information Extraction (IE) [10]. The former consists of identifying documents that are relevant to a certain research objective (e.g., a PubMed query for "cardiorenal"), whereas the latter is used to extract facts from these documents. Named Entity Recognition (NER), which aims to identify biological entities such as genes, cell types, or drugs, can be seen as the most prevalent type of IE used in real-world applications.
Even though NER might appear almost trivial at first glance, it actually represents a challenging computational problem, as the existence of over fifty available tools demonstrates [28]. The key obstacle when extracting genes or proteins from free text lies in term ambiguity, which is present at multiple levels. Some genes are spelled like normal English words (e.g., "WAS", NCBI GeneID: 7454), and there is even a gene with the official Gene Symbol "T" (NCBI GeneID: 6862). The same gene may additionally be referred to in various ways owing to different naming conventions.
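Why naive dictionary lookup struggles can be shown with a minimal Python sketch of exact, case-sensitive token matching against a gene-symbol dictionary. The two dictionary entries reuse the examples from the text (WAS, GeneID 7454; T, GeneID 6862); the tokenizer and the all-caps example title are illustrative assumptions, and this is not how FABLE or the other cited tools work.

```python
import re

# Tiny illustrative dictionary: official Gene Symbol -> NCBI GeneID (from the text).
GENE_SYMBOLS = {"WAS": 7454, "T": 6862}


def naive_gene_tagger(text: str) -> list:
    """Tag every alphanumeric token that exactly matches an official gene symbol."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [(tok, GENE_SYMBOLS[tok]) for tok in tokens if tok in GENE_SYMBOLS]


# Hypothetical title written in capitals, as in some older abstracts.
title = "RENIN ACTIVITY WAS INCREASED IN T CELL DEPLETED RATS"
print(naive_gene_tagger(title))  # [('WAS', 7454), ('T', 6862)]
```

Both hits are false positives: neither the verb "WAS" nor the "T" in "T cell" refers to a gene, which is why practical taggers take the textual context into account.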
Ultimately, these ambiguities lead to two types of errors that all methods face: erroneously assuming that a certain gene was mentioned in a paper (false positive) or erroneously assuming that it was not mentioned, even though it actually was (false negative) [29]. The balance between these two error types determines the precision of a method (i.e., the fraction of predicted genes that were actually mentioned in the document) and its recall (i.e., the fraction of all actually mentioned genes that were also identified as such).
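The two definitions translate directly into code. The minimal Python sketch below compares a hypothetical set of genes reported by a tagger for one abstract with a hypothetical curated gold standard.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted gene mentions against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example for a single abstract.
predicted = {"REN", "AGT", "WAS"}        # "WAS" is a false positive
gold = {"REN", "AGT", "NPPA", "NPPB"}    # "NPPA" and "NPPB" were missed (false negatives)
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

A method that favors precision, as chosen below, accepts a lower recall in exchange for fewer false positives.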
We chose a method favoring precision over recall for mining genes/proteins in Medline/PubMed abstracts. The Fast Automated Biomedical Literature Extraction (FABLE) tool, available at http://fable.chop.edu/, was used for this task. The algorithm consists of two steps. First, a statistical classifier was used to train a probabilistic model that served as the basis for gene tagging, that is, for identifying possible occurrences of a gene while taking the textual context into account. Second, if such an occurrence had a sufficient likelihood of actually representing a gene, it was normalized to the official Gene Symbol. This normalization step was based on gene synonym lists, which were compared with the predicted occurrence using both exact and relaxed pattern matching. This approach has been shown to be competitive with alternative methods, such as standard information extraction techniques and direct pattern matching, in terms of both precision and recall [30,31]. We applied this procedure to all papers retrieved from PubMed for "cardiorenal" (PubMed status as of March 2010).
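The normalization step can be sketched in Python as a lookup of a tagged mention in a synonym list, first exactly and then after a relaxed transformation. The synonym entries and the relaxation rule (case folding and removal of spaces, hyphens, and other punctuation) are assumptions for illustration; FABLE's actual matching rules are not reproduced here.

```python
import re
from typing import Optional

# Hypothetical synonym list: surface form -> official Gene Symbol.
SYNONYMS = {
    "NPPA": "NPPA",
    "ANP": "NPPA",
    "atrial natriuretic peptide": "NPPA",
    "REN": "REN",
    "renin": "REN",
    "Cystatin C": "CST3",
}


def relax(term: str) -> str:
    """Relaxed form: lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


# Precompute the relaxed lookup table once.
RELAXED = {relax(form): symbol for form, symbol in SYNONYMS.items()}


def normalize(mention: str) -> Optional[str]:
    """Map a tagged mention to an official symbol: exact match first, then relaxed."""
    if mention in SYNONYMS:
        return SYNONYMS[mention]
    return RELAXED.get(relax(mention))


print(normalize("ANP"))                         # NPPA (exact match)
print(normalize("Atrial Natriuretic-Peptide"))  # NPPA (relaxed match)
print(normalize("cystatin-C"))                  # CST3 (relaxed match)
print(normalize("troponin"))                    # None (no synonym entry)
```

Trying the exact match first keeps case-sensitive distinctions where the synonym list provides them, while the relaxed fallback absorbs spelling variants such as hyphenation.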

Functional Annotation of Identified Genes/Proteins.
The list of genes and proteins identified by the literature mining approach was first annotated using the Stanford Source tool [32]. The genes were then assigned to biological processes, pathways, and molecular functions using the PANTHER (Protein Analysis Through Evolutionary Relationships) Classification System [33,34]. Significantly enriched categories were identified using the whole human genome as the reference dataset. Biological processes, pathways, and molecular functions with P-values below .0001 were considered statistically significant in terms of feature enrichment.
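To make the enrichment criterion concrete, the following Python sketch computes an over-representation P-value for one category as a one-sided binomial test: the probability of observing at least k category members among n list genes when each gene falls into the category with its genome-wide frequency. Using the binomial test here is an assumption (PANTHER offers binomial and Fisher's exact statistics), and the category and genome sizes are hypothetical, not the values behind Table 2.

```python
from math import comb


def binomial_enrichment_p(k: int, n: int, category_size: int, genome_size: int) -> float:
    """One-sided binomial P(X >= k) with X ~ Binomial(n, category_size / genome_size)."""
    p = category_size / genome_size
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 280 list genes, 28 of them in a category that covers
# 400 of 20,000 reference genes (expected count 280 * 0.02 = 5.6).
n, k = 280, 28
expected = n * 400 / 20_000
pval = binomial_enrichment_p(k, n, 400, 20_000)
print(f"expected={expected:.1f} observed={k} P={pval:.2e}")
print("significant" if pval < 1e-4 else "not significant")
```

The expected count is the quantity reported alongside the observed count in enrichment tables; the P-value measures how surprising the observed excess is.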
The subcellular location of proteins was determined using experimental data provided by SwissProt [35]. For proteins not covered in SwissProt, in silico predictions were made using WoLF PSORT [36]. Based on the sequence of a given protein, WoLF PSORT computes probabilities for ten subcellular locations. Subcellular location tags from SwissProt were mapped to the ten locations defined by WoLF PSORT. Only assignments that were either reported in SwissProt or had a probability of 1 according to WoLF PSORT were considered for the subcellular location enrichment analysis. The significance of enrichment was calculated with Fisher's exact test against a reference dataset of 45,008 proteins assigned to one of the WoLF PSORT categories. P-values below .01 were considered statistically significant.
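The subcellular-location test can be sketched as a one-sided Fisher's exact test, which for a 2x2 table reduces to the upper tail of the hypergeometric distribution. In this Python sketch, the 45,008 reference proteins and the 81 extracellular list proteins come from the text, whereas the number of extracellular reference proteins and the size of the annotated list are hypothetical placeholders.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for enrichment.

    N: proteins in the reference set, K: reference proteins in the compartment,
    n: proteins in the study list,     k: study proteins in the compartment.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n), computed with exact integers.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


N_REF = 45_008      # reference proteins (from the text)
K_EXTRA = 9_000     # hypothetical: reference proteins annotated as extracellular
N_LIST = 250        # hypothetical: list proteins with an accepted location
K_LIST = 81         # extracellular list proteins (from the text)

p = fisher_enrichment_p(K_LIST, N_LIST, K_EXTRA, N_REF)
print(f"P = {p:.2e}", "(significant)" if p < 0.01 else "(not significant)")
```

A depletion test, as applied to the nuclear compartment, would use the lower tail P(X <= k) instead.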
Information on tissue-specific expression patterns was extracted from NCBI UniGene EST profiles, with EST counts for a total of 45 tissues extracted for each gene. Tissue-specific expression was calculated for each gene and tissue from the normalized transcripts-per-million counts provided by UniGene [37].
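UniGene reports the normalized values directly; the Python sketch below recomputes them from raw counts only to show what the normalization does. Transcripts per million (TPM) is the gene's EST count in a tissue divided by the total number of ESTs sequenced from that tissue, times one million. The sketch then expresses each tissue's TPM as a share of the gene's summed TPM, which is one plausible way to obtain relative expression levels such as those in Table 1; this relative-expression definition and all counts are assumptions for illustration.

```python
def tpm(gene_counts: dict, library_sizes: dict) -> dict:
    """Transcripts per million for one gene in each tissue library."""
    return {t: 1e6 * gene_counts.get(t, 0) / library_sizes[t] for t in library_sizes}


def relative_expression(tpm_values: dict) -> dict:
    """Share of the gene's summed TPM contributed by each tissue (sums to 1)."""
    total = sum(tpm_values.values())
    return {t: (v / total if total else 0.0) for t, v in tpm_values.items()}


# Hypothetical total ESTs per tissue library and EST counts for one gene.
libraries = {"blood": 200_000, "heart": 100_000, "vascular": 50_000, "kidney": 250_000}
counts = {"heart": 40, "kidney": 5}

values = tpm(counts, libraries)
shares = relative_expression(values)
print(values)
print({t: round(s, 3) for t, s in shares.items()})
```

Normalizing by library size matters because EST libraries differ greatly in depth; raw counts would otherwise favor heavily sequenced tissues.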

Network Analysis Framework.
For network analysis, we used an extended version of the protein dependency network "omicsNET" described in Bernthaler et al. [38]. The network comprises information from protein-protein interactions, tissue-specific reference coexpression, shared pathway information, Gene Ontology distance, and subcellular colocalization, and it was extended by networks generated from shared transcription factor binding sites and shared miRNA target sites. In omicsNET, these sources were consolidated into a single human protein reference interaction network, in which edges represent pairwise dependencies between proteins.
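How several evidence layers can be consolidated into one network can be sketched in Python: each source contributes undirected protein pairs, and the consolidated network keeps every pair together with the set of sources supporting it. The source names loosely follow the text, but the toy edges and the idea of using the number of supporting sources as a confidence proxy are assumptions; omicsNET's actual scoring is described in [38].

```python
from collections import defaultdict


def consolidate(sources: dict) -> dict:
    """Merge edge lists from several sources into one undirected network.

    Returns a mapping from protein pair (frozenset) to the names of the
    sources that support that pair.
    """
    network = defaultdict(set)
    for source_name, edges in sources.items():
        for a, b in edges:
            if a != b:  # ignore self-loops
                network[frozenset((a, b))].add(source_name)
    return dict(network)


# Hypothetical toy evidence layers; edge direction is irrelevant.
sources = {
    "protein_interaction": [("REN", "AGT"), ("AGT", "ACE")],
    "shared_pathway": [("AGT", "REN"), ("NPPA", "NPR1")],
    "coexpression": [("REN", "AGT")],
}
net = consolidate(sources)
for pair, support in net.items():
    print(sorted(pair), len(support), sorted(support))
```

Representing each pair as a frozenset makes ("AGT", "REN") and ("REN", "AGT") the same edge, so evidence from differently ordered sources accumulates correctly.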
Protein-protein dependencies were calculated between proteins in the list resulting from the literature mining approach. Furthermore, highly connected subgraphs were identified and functionally annotated. We only considered dependencies with high confidence in the network construction process and focused on genes reported at least twice in the scientific literature in the context of the cardiorenal syndrome in order to reduce the number of false positive assignments.
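The filtering and subgraph steps can be sketched in plain Python: keep only genes reported at least twice, restrict the network to them, and extract a densely connected core. Defining "highly connected subgraph" as a k-core (every remaining node has at least k neighbors inside the core) is an assumption made for illustration, since the text does not name the algorithm; graph libraries such as networkx provide the same operation as `k_core`. Gene names, counts, and edges are hypothetical.

```python
from collections import defaultdict


def k_core(nodes: set, edges: list, k: int) -> set:
    """Iteratively remove nodes with fewer than k neighbours among the remaining nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in nodes and b in nodes and a != b:
            adj[a].add(b)
            adj[b].add(a)
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(adj[node] & core) < k:
                core.remove(node)
                changed = True
    return core


# Hypothetical literature counts (articles per gene) and high-confidence edges.
article_counts = {"GENE_A": 5, "GENE_B": 4, "GENE_C": 3, "GENE_D": 2, "GENE_E": 2, "GENE_F": 1}
edges = [
    ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_A", "GENE_C"),
    ("GENE_C", "GENE_D"), ("GENE_D", "GENE_B"), ("GENE_E", "GENE_D"),
    ("GENE_F", "GENE_A"),
]
# Keep genes reported in at least two articles to reduce false positives.
kept = {gene for gene, n in article_counts.items() if n >= 2}
core = k_core(kept, edges, k=2)
print(sorted(core))  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
```

GENE_F is dropped by the literature filter, and GENE_E is peeled off because it has only one neighbor inside the filtered network.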

Identification of Drug Targets.
Drug targets were identified in our set of 280 literature-derived proteins using information from DrugBank [39,40]. DrugBank combines information on drugs and their molecular targets and currently contains around 4800 drug entities with more than 1350 FDA-approved small molecule drugs and more than 2500 protein drug targets.
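Matching literature-derived proteins against drug-target annotations amounts to a join on the gene symbol. The Python sketch below assumes that DrugBank target data have already been reduced to (drug, target gene symbol) pairs; the pairs shown are a handful of well-known drug-target relationships chosen for illustration, not a DrugBank export, and the protein list is a hypothetical subset.

```python
from collections import Counter

# Illustrative (drug, target gene symbol) pairs; a real analysis would parse DrugBank.
drug_target_pairs = [
    ("aspirin", "PTGS1"),
    ("aspirin", "PTGS2"),
    ("celecoxib", "PTGS2"),
    ("ramipril", "ACE"),
    ("aliskiren", "REN"),
    ("irbesartan", "AGTR1"),
]
crs_proteins = {"REN", "AGT", "ACE", "PTGS1", "PTGS2", "NPPA"}  # hypothetical subset

# Keep only pairs whose target is in the literature-derived list.
hits = [(drug, gene) for drug, gene in drug_target_pairs if gene in crs_proteins]

drugs_per_target = Counter(gene for _, gene in hits)
targets_per_drug = Counter(drug for drug, _ in hits)

print(f"{len(drugs_per_target)} of {len(crs_proteins)} proteins are drug targets")
print(drugs_per_target.most_common(1))
print(targets_per_drug.most_common(1))
```

The two counters correspond to the rankings reported in the Results: proteins with the most associated drugs, and drugs with the most targets in the list.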

Results and Discussion
Literature Mining.
In PubMed, 825 papers associated with the term "cardiorenal" were identified. From this set of 825 papers, 280 genes could be extracted using FABLE, 132 of which were reported at least twice. The top-ranked gene, mentioned in 156 articles, was the aspartyl protease renin (REN), followed by natriuretic peptide precursor A (NPPA) and angiotensinogen (AGT), with 122 and 64 reports, respectively.
The 54 genes mentioned in at least 5 articles together with the term "cardiorenal" are listed in Table 1 (the complete list of 280 genes is given in Supplementary Table 1 in the Supplementary Material available online at doi:10.4061/2011/809378). In addition to the number of articles, Table 1 provides the relative expression levels in four tissues (blood, heart, vascular tissue, and kidney) based on UniGene expressed sequence tag counts.
The top-ranked feature in the list of 280 literature-derived genes is renin (REN), which is secreted by cells of the juxtaglomerular apparatus of the kidney and plays a key role in the renin-angiotensin system (RAS), which regulates blood pressure and water balance. The connection between CRS and increased activity of this hormone system was first reported in 1971 [41], and its consequences, such as renal hypoxia, vasoconstriction, intraglomerular hypertension, glomerulosclerosis, tubulointerstitial fibrosis, and proteinuria, continue to be demonstrated in clinical practice. Conventional therapy for blocking RAS activity is the administration of angiotensin-converting enzyme inhibitors and angiotensin receptor blockers, but recent studies demonstrate the benefit of combining these with direct renin inhibitors [42].
Further genes frequently reported in association with CRS are the components of the natriuretic peptide system (NPS), NPPA and NPPB, as well as their receptors NPR1, NPR2, and NPR3. Functions of the NPS include the counterregulation of the RAS, and it has been suggested that NPS activation provides organ protection in cardiorenal disease, especially in diabetic patients [43].

Functional Annotation.
According to the PANTHER Classification System, the biological processes "signal transduction" and "cell communication" were the most significantly enriched, with 135 and 136 genes assigned to these categories, respectively. In total, 28 processes showed an enrichment P-value below .0001, including "blood circulation", "regulation of vasoconstriction", and "angiogenesis". The most significantly enriched molecular functions are "receptor binding" and "protein binding" (Table 2).
The two enriched categories "receptor binding" and "receptor activity" indicate that numerous receptors and ligands are involved in the cardiorenal syndrome. These receptors form the first line of molecules in a number of signaling cascades, and signal transduction is itself among the categories enriched in genes associated with the cardiorenal syndrome. We therefore took a closer look at receptor-ligand interactions, searching for receptors mainly expressed in the cardiovascular system whose ligands are predominantly secreted by renal tissue, and vice versa. The natriuretic peptide receptor NPR3 showed high expression in kidney tissue, whereas its ligands NPPA and NPPB were found to be almost exclusively expressed in the heart. Thus, deregulation of blood pressure maintenance and extracellular fluid volume by heart-derived ligands of the natriuretic peptide system directly affects the kidney and may contribute to the formation of CRS.
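The receptor-ligand search can be sketched as a filter over ligand-receptor pairs using relative tissue expression values. The expression values, the 0.5 "predominant expression" cut-off, and the placeholder receptor RECEPTOR_X are hypothetical, chosen only to mirror the qualitative pattern described above (NPPA and NPPB mainly cardiac, NPR3 high in kidney); they are not the UniGene-derived values from Table 1.

```python
# Hypothetical relative expression per gene (shares over the four tissues).
REL_EXPR = {
    "NPPA": {"blood": 0.00, "heart": 0.98, "vascular": 0.01, "kidney": 0.01},
    "NPPB": {"blood": 0.00, "heart": 0.95, "vascular": 0.02, "kidney": 0.03},
    "NPR3": {"blood": 0.05, "heart": 0.10, "vascular": 0.15, "kidney": 0.70},
    "RECEPTOR_X": {"blood": 0.10, "heart": 0.30, "vascular": 0.30, "kidney": 0.30},
}
LIGAND_RECEPTOR_PAIRS = [("NPPA", "NPR3"), ("NPPB", "NPR3"), ("NPPA", "RECEPTOR_X")]
CARDIOVASCULAR = {"heart", "vascular"}
RENAL = {"kidney"}


def predominant(gene: str, tissues: set, cutoff: float = 0.5) -> bool:
    """True if the gene's summed relative expression in `tissues` reaches the cut-off."""
    return sum(REL_EXPR[gene][t] for t in tissues) >= cutoff


def cross_talk_pairs(pairs: list) -> list:
    """Pairs whose ligand and receptor are predominant in opposite organ systems."""
    hits = []
    for ligand, receptor in pairs:
        heart_to_kidney = predominant(ligand, CARDIOVASCULAR) and predominant(receptor, RENAL)
        kidney_to_heart = predominant(ligand, RENAL) and predominant(receptor, CARDIOVASCULAR)
        if heart_to_kidney or kidney_to_heart:
            hits.append((ligand, receptor))
    return hits


print(cross_talk_pairs(LIGAND_RECEPTOR_PAIRS))  # [('NPPA', 'NPR3'), ('NPPB', 'NPR3')]
```

Checking both directions covers heart-to-kidney signaling, as in the natriuretic peptide example, and the reverse kidney-to-heart route.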
Enrichment of the process "regulation of vasoconstriction" reflects the consequences of impaired heart function, including decreased cardiac output and thus hypoperfusion of organs. Since glomerular filtration is controlled by blood pressure, hypoperfusion of the kidney leads to activation of the RAS and subsequent vasoconstriction, which in turn causes systemic hypertension and an increased cardiac preload [2].
Twenty-two PANTHER pathways were identified as significantly enriched in the list of 280 literature-derived genes: 28 genes could be assigned to "angiogenesis", 21 genes to "endothelin mediated signaling", and 15 genes to the "VEGF signaling pathway" (Table 3).
The connection between angiogenic processes and cardiovascular disorders is well understood: decreased cardiac output is accompanied by decreased organ perfusion, and vascularization is the natural response to a diminished blood supply. Apart from the negative effects of hypoperfusion on organ function, extensive microvascularization occurs at sites of inflammation, which explains the role of angiogenesis in diseased kidney tissue. Conversely, decreased vascularization and loss of capillaries lead to kidney fibrosis. The regulation of angiogenesis thus appears to be crucial for kidney function, and a key regulatory mechanism of angiogenic processes is the VEGF signaling pathway [44][45][46]. A third enriched pathway is the "endothelin signaling pathway", which is known to regulate the renin-angiotensin system and is thus a further player in the hemodynamic cross-talk between the kidney and the cardiovascular system.
Table 3: List of enriched biological pathways. Given are the total number of genes assigned to a process/function, the number of genes assigned as derived from literature mining for CRS, the number of genes expected from a statistical perspective, and the significance level of enrichment.
Following the rationale that features secreted from kidney cells may damage vessels and vice versa, literature-derived proteins were classified in terms of subcellular location. The most significantly enriched compartment was "extracellular, including cell wall", with 81 genes assigned to this category, whereas "nuclear" was significantly depleted, with 48 genes, as indicated in Figure 2.
The list of 81 secreted genes included components of the renin-angiotensin system (REN, AGT, ACE) and the natriuretic peptide system (NPPA, NPPB), as well as other regulators of vascular tone. Kininogen 1 (KNG1), for example, is essential for the assembly of the blood pressure-regulating kallikrein-kinin system. Another vasoactive molecule is the peptide hormone calcitonin-related polypeptide alpha (CALCA), which serves as a vasodilator.

Network Analysis.
A subset of 40 of the 132 proteins mentioned in at least two publications in the context of the cardiorenal syndrome formed a highly connected protein interaction network, as shown in Figure 3. The main components of this protein network are mediators of hemodynamic change. An accumulation of features involved in previously described signaling pathways, such as the endothelin signaling pathway and the VEGF signaling pathway, is evident. In addition to these two pathways, a number of members of the blood pressure-regulating kallikrein-kinin system and the renin-angiotensin system are part of this network.
Another highly connected cluster holds genes associated with leukocyte transendothelial migration. The process of leukocyte migration from blood into tissues is vital for inflammation, and it is known that inflammation is an important cardiorenal connector and a hallmark of kidney and heart diseases [5].

Identification of Drug Targets.
Of the 280 proteins associated with the CRS, 116 were listed as a target of at least one drug in DrugBank (see Supplementary Table 1). The proteins with the largest number of associated drugs were PTGS1, PTGS2, and NOS3, with 49, 43, and 41 drugs, respectively. The drug with the most targets in our list of 280 proteins was NADH.
Standard therapeutic options in the context of cardiovascular and kidney disease include aliskiren (a direct renin inhibitor), irbesartan (an angiotensin receptor blocker), and ramipril (an angiotensin-converting enzyme inhibitor). Another drug candidate is nesiritide, a recombinant B-type natriuretic peptide that counter-regulates the RAS and is used in the treatment of acute decompensated heart failure (ADHF). However, on the basis of a prospective, randomized, double-blind, placebo-controlled clinical trial, Witteles et al. concluded that nesiritide therapy does not affect renal function in patients with ADHF and preexisting renal dysfunction [47].
It is known that reducing blood pressure has beneficial effects on renal function, and a multitude of antihypertensive agents act on the RAS. Administration of angiotensin receptor antagonists in combination with angiotensin-converting enzyme inhibitors significantly reduced the urinary albumin-to-creatinine ratio in patients with hypertension and microalbuminuria and thereby reduced the risk of myocardial infarction [48].
Members of the endothelin signaling pathway are further potential targets for the regulation of hemodynamics. Endothelin receptor antagonists are used in the treatment of a variety of cardiovascular conditions, but less is known about their effects in patients with concomitant kidney dysfunction. Ding et al. showed in animal models that chronic endothelin receptor blockade is beneficial in the treatment of progressive renal dysfunction and sodium retention associated with chronic heart failure [49]. Studies in humans are required to fully elucidate the effects and risks of endothelin receptor antagonist treatment in patients with CRS.

Conclusions
In this work, we provide a comprehensive list of genes/proteins associated with the cardiorenal syndrome, identified by a literature mining approach. From 825 articles identified in the context of CRS, 280 unique genes were extracted and further characterized with respect to molecular function, biological processes, cellular pathways, subcellular location, and tissue-specific expression, as well as at the level of protein interaction networks. The most frequently reported genes are involved in blood pressure-regulating systems, particularly the renin-angiotensin system (REN, AGT, ACE) and the antagonistic natriuretic peptide system (NPPA, NPPB). Enriched molecular functions include "receptor binding" and "receptor activity". Of special note in this context are, again, players of the natriuretic peptide system, namely, the two ligands NPPA and NPPB and their receptor NPR3. Tissue-specific expression patterns showed that NPPA and NPPB are mainly expressed in the heart, whereas their receptor NPR3 is highly expressed in kidney tissue, suggesting that this regulatory system is part of the cross-talk between the kidney and the cardiovascular system.
Therapy of the CRS is largely focused on natriuretic peptides or the renin-angiotensin system, with a number of other molecular targets, such as the endothelin signaling pathway, holding promise for future therapeutic strategies.
Altogether, the results of the present study strongly indicate a critical role of hemodynamic changes, blood pressure-regulating hormone systems, and inflammatory processes in the formation of the CRS. Our analyses provide a comprehensive picture of the molecular features involved in the functional interplay between the kidney and the cardiovascular system. One limitation of this automated literature mining approach is the lack of experimental data on the expression levels of the reported molecules during disease development. An obvious next step would therefore be to integrate the findings of this work with Omics datasets on kidney and vascular diseases. Such a combined approach could identify deregulated features and, thereby, novel players for diagnostic or therapeutic approaches in the field of kidney and cardiovascular diseases.