Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

The importance of short membrane sequence motifs has been shown in many studies and underlines the value of sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps to understand how membrane proteins fold and how they retain their three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information on interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, ultimately, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. The method is suited to large-scale membrane protein data analysis. The presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization for representing evolutionary influenced interaction pattern pairs (EIPPs), adapted to mutagenesis investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder nephrogenic diabetes insipidus, and of membrane proteins in general. Furthermore, we present a method to derive new evolutionary variations within EIPPs that can be used for further laboratory mutagenesis investigations.


Introduction
Integral membrane proteins are encoded by 20-30% of all open reading frames of known genomes [1][2][3]. Because membrane proteins carry out numerous molecular processes, for example signal transduction and passive and active transport of a large number of chemical compounds and ions, mutations in the genes coding for them are often linked to diseases [4]. Despite their biological importance, relatively little is known about the folding, functional mechanics, and synthesis of membrane proteins [1]. This is due to costly and complex experimental procedures, since membrane proteins are difficult to handle in lab experiments [5]. To understand the correspondence between genetic mutations and their effects on protein mechanics, novel theoretical approaches are much needed. In our work we demonstrate a high-throughput in silico approach for investigating the influence of genetic variations within interacting sequence parts of membrane proteins that are directly linked to nephrogenic diabetes insipidus. Nephrogenic diabetes insipidus (NDI) is a disorder that can be acquired as a side effect of excessive drug intake or be caused by inherited genetic mutations. Autosomal recessive and dominant inherited NDI are linked to mutations in the gene encoding the integral membrane aquaporin-2 water channel [6,7]. X-linked inherited NDI is caused by mutations in the gene encoding the AVP type-2 receptor membrane protein (V2R) [8,9]. In the general population, inherited NDI shows a low prevalence of one case per 20,000-30,000 people [10][11][12]. Aquaporin-2 water channels and V2R are essential elements in water reabsorption through the apical cell membrane. This water makes up the main part of preurine, a product of ultrafiltration in the kidney. The reabsorption of water from the preurine is essential for maintaining the body's fluid balance and is realised by membrane-integrated aquaporin-2 water channels.
The insertion of aquaporin-2 into the human kidney cell membrane is triggered by the antidiuretic hormone, also referred to as arginine vasopressin (AVP). The AVP blood concentration is regulated by the controlled release of AVP in the pituitary gland, which is adapted to the body's fluid balance. The binding of AVP to V2R activates the receptor. In this state, V2R is able to interact with the guanine nucleotide-binding protein G(s) subunit alpha [13,14]. Subsequently, adenylyl cyclase 6 is activated, leading to cAMP synthesis and an increase of the cAMP concentration in the cytoplasm [15,16]. By means of protein kinase A, cAMP triggers the phosphorylation of aquaporin-2 molecules, which are stored in cytoplasmic vesicles that have bound to the endoplasmic reticulum. The phosphorylation induces the translocation of the cytoplasmic vesicles to the plasma membrane and their fusion with it, and finally leads to the insertion of aquaporin-2 molecules into the apical membrane [17]. Inactive mutants of V2R and aquaporin-2 cause reduced water reabsorption in the kidneys [18]. The consequences are the typical symptoms of NDI, for example, sensorineural deafness, urinary tract anatomy, ataxia, peripheral neuropathy, mental retardation, psychiatric illness, a daily output of 15-20 L of highly dilute (<100 mOsmol/kg) urine (polyuria), and compensatory excessive liquid intake [18][19][20]. In newborn infants, NDI is characterized by dehydration symptoms, irritability, and poor feeding as well as poor weight gain. A schematic illustration of these molecular relationships is given in Figure 1. Direct inspection of the aquaporin-2 gene and the V2 receptor gene (AVPR2) has become feasible in clinical practice [21] for differential NDI diagnosis and has been replacing dehydration testing in recent years [18].

Materials and Methods
As a first step, we address a task involved in predicting homologous sequence parts within transmembrane α-helices. To this end, aquaporin-specific evolutionary interaction pattern pairs (EIPPs) were generated as described in [22]. In that work, Grunert and Labudde show that combining interaction information and sequence motifs with evolutionary variation can be used for 3D structure prediction. They obtained key information from homologous sequences to separate and predict membrane protein structures in the context of interacting patterns and their evolutionary variation. Patterns, as motif representatives, are investigated for evolutionary covariation. Here, a motif is defined as in [23] and can be written in a generalized, regular expression-like form XYn (for example, TG9), where X and Y correspond to amino acids separated by n − 1 highly variable positions. Interaction information contributes to detecting interacting patterns with an evolutionary background. Evolutionary variation at pattern positions is marked as X, and different mutation types, as described in [22], may apply at a specific X-position. Subsequently, the recently published proteins with PDB-Ids 4nef and 4oj2 were used to transfer family-specific EIPPs to these aquaporin-2 representative proteins. The protein structures (PDB-Ids: 4nef, 4oj2) were published by Frick et al. [24] and Vahedi-Faridi et al. [25]. Both protein structures were treated as unknown structures at the time of EIPP generation because their Pfam entries were missing; consequently, neither protein was considered during EIPP generation. Aquaporin-specific EIPPs were derived from known structures of the corresponding PF00230 family. After the EIPPs had been obtained, they were employed to generate interaction block schemes (IBSs).
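The motif notation described above can be illustrated with a short sketch. The Python snippet below is only a minimal illustration and not the implementation used in [22] or [23]. It assumes that a motif written as X, Y and a distance n means residue X followed n positions later by residue Y, with any residues in between. The function name find_motif and the toy sequence are hypothetical.

```python
import re


def find_motif(sequence: str, x: str, y: str, n: int) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each occurrence of residue x
    followed n positions later by residue y (n - 1 arbitrary residues between)."""
    # A lookahead is used so that overlapping occurrences are reported too.
    pattern = re.compile(rf"(?=({re.escape(x)}.{{{n - 1}}}{re.escape(y)}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Toy sequence (hypothetical, not a real aquaporin sequence).
toy_sequence = "AATLIAVFFAAGKK"
print(find_motif(toy_sequence, "T", "G", 9))  # [(3, 'TLIAVFFAAG')]
```

Reporting 1-based start positions keeps the output comparable with residue numbering as it is commonly used for protein sequences.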
Here we illustrate that IBSs are a useful graphical visualisation medium for representing different interacting patterns that differ evolutionarily. More specifically, we can show whether a mutation within a pattern influences the evolutionary variability of the interacting counterpart. Ultimately, IBSs can support the understanding of the three-dimensional fold of the respective interaction partners and of the whole protein structure. Transmembrane helical information for the proteins under investigation (PDB-Ids: 4nef, 4oj2) was derived from PDBTM [26]. Afterwards, EIPPs were applied to the helices, and the sequence similarity of the resulting interacting ranges to known structures of the other family members was calculated. For further investigation, mutations occurring in nephrogenic diabetes insipidus patients were collected from recently published works [27][28][29][30][31][32][33][34] and registered on the sequences of the proteins under investigation. These natural variants of NDI can also be obtained from UniProt (http://www.uniprot.org/uniprot/P41181). Finally, IBSs were applied to similar sequence parts that include NDI mutational effects.
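Registering natural variants on a sequence range can be sketched as follows. This is a hedged illustration and not the authors' pipeline: the names Variant, parse_variant and variants_in_range are hypothetical, the residue range 20-29 is assumed purely for demonstration, and the second variant in the usage example is invented to show the filtering and is not taken from the cited works.

```python
import re
from typing import NamedTuple

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


class Variant(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based residue number
    mutant: str     # one-letter code of the substituted residue


def parse_variant(text: str) -> Variant:
    """Parse a substitution written like 'Leu28Pro'."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", text)
    if match is None:
        raise ValueError(f"not a substitution in Xxx123Yyy form: {text!r}")
    wild, pos, mutant = match.groups()
    return Variant(THREE_TO_ONE[wild], int(pos), THREE_TO_ONE[mutant])


def variants_in_range(variants: list[Variant], start: int, end: int) -> list[Variant]:
    """Keep the variants whose position lies in the inclusive range start..end."""
    return [v for v in variants if start <= v.position <= end]


# 'Leu28Pro' is mentioned in the text; 'Ala150Val' is a made-up second entry.
variants = [parse_variant("Leu28Pro"), parse_variant("Ala150Val")]
print(variants_in_range(variants, 20, 29))
```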

Evolutionary Variations within EIPPs.
In this section we describe a method to derive variation at X-positions from the evolutionary sequence record. To this end, the full seed dataset of unknown structures (9641 proteins) of the representative family (PF00230) was derived from the Pfam database [35]. Transmembrane helical information was obtained using TMHMM Server v.2.0 [36]. A variety of methods have been developed to predict structural features from sequence, such as α-helical membrane-spanning helices and extra/intracellular domains. In essence, TMHMM predicts intra/extracellular regions and integral membrane helices from sequence. Besides per-residue predictions, TMHMM also lists the underlying per-residue assignment probabilities as an indicator of prediction uncertainty. The helical information was then used to apply our derived EIPPs to unknown structures. X-positions were investigated more closely when both EIPP counterparts were registered in different helices. To detect new evolutionary variations, the amino acid occupancy at specific X-positions in unknown structures was compared with the amino acids of known structures. New amino acids at variable X-positions were then registered and added. With this method we are able to extend the evolutionary information within interacting sequence parts, which can be used for further mutational investigations.

Figure 1: (a) In normally regulated water absorption in kidney cells, the antidiuretic hormone arginine vasopressin (AVP) is released in the pituitary gland, binds to the V2 receptor (V2R), and subsequently induces a series of phosphorylation reactions which lead to the insertion of aquaporin-2 water channels in the apical membrane that allow water molecules to pass the membrane. (b) Genetic mutations in the gene encoding V2R lead to reduced binding affinity and protein stability in V2R. Dysfunctional V2R mutants cause a significantly reduced amount of inserted aquaporin-2 proteins and thus decrease the water flux through the apical membrane. On the other hand, dysfunctional aquaporin-2 mutants decrease the water reabsorption as well (see (c)). Reduced water reabsorption is directly linked to an increased output of highly diluted urine (polyuria) and excessive drinking (polydipsia), which are the most severe symptoms observable in nephrogenic diabetes insipidus patients [12,18,20].
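The comparison of amino acid occupancy at X-positions described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name new_variations, the zero-based position indexing, and all residues and segments in the usage example are hypothetical. The sketch also leaves out the requirement that both EIPP counterparts lie in different predicted helices.

```python
from collections import defaultdict


def new_variations(
    pattern: str,
    known: dict[int, set[str]],
    segments: list[str],
) -> dict[int, set[str]]:
    """Collect residues seen at X-positions that are absent from `known`.

    pattern  -- pattern string with 'X' at variable positions
    known    -- 0-based X-position -> residues observed in known structures
    segments -- candidate sequence windows from sequences without structure
    """
    found: dict[int, set[str]] = defaultdict(set)
    for segment in segments:
        if len(segment) != len(pattern):
            continue
        # Every fixed (non-X) pattern position must match the segment.
        if any(p != "X" and p != s for p, s in zip(pattern, segment)):
            continue
        for index, (p, s) in enumerate(zip(pattern, segment)):
            if p == "X" and s not in known.get(index, set()):
                found[index].add(s)
    return dict(found)


# Hypothetical input data, for illustration only.
known_residues = {3: {"A"}, 7: {"S"}, 8: {"I"}}
windows = ["TLIFVFFGLG", "TLIAVFFSIG", "TLLFVFFGLG", "TLILVFFAVG"]
print(new_variations("TLIXVFFXXG", known_residues, windows))
```

In this toy input, the third window is skipped because its third residue does not match the fixed pattern position, and the second window contributes nothing because all of its X-position residues are already known.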

Results and Discussion
Our structure prediction shows that, if an unknown structure belongs to a family, family-specific EIPPs reappear on its protein sequence. Here, EIPPs were derived from known crystal structures of the aquaporin family (PF00230) and mapped onto the α-helical structure of the recently published aquaporin-2 representative proteins (PDB-Ids: 4nef, 4oj2). As mentioned before, the aquaporin-2 representatives were not considered in the EIPP generation process, which makes them effectively unknown structures. The similarity results are shown in Figures 2 and 3 and confirm the already established family affiliation and the predicted structures: EIPPs were found in all TM helices and cover up to 100% of the helical range, as listed in Table 1.
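One way to express how much of a helix is covered by pattern occurrences is sketched below. The definition used here is an assumption and may differ from the measure behind Table 1: coverage is taken as the percentage of helix residues that fall inside at least one pattern occurrence. The function name helix_coverage and the residue ranges are hypothetical.

```python
def helix_coverage(helix: tuple[int, int], pattern_ranges: list[tuple[int, int]]) -> float:
    """Percentage of helix residues lying in at least one pattern range.

    All ranges are inclusive (start, end) residue numbers.
    """
    start, end = helix
    covered: set[int] = set()
    for p_start, p_end in pattern_ranges:
        # Only the part of the pattern that overlaps the helix counts.
        covered.update(range(max(start, p_start), min(end, p_end) + 1))
    return 100.0 * len(covered) / (end - start + 1)


# Hypothetical helix spanning residues 10-29 with two overlapping patterns.
print(helix_coverage((10, 29), [(10, 19), (15, 24)]))  # 75.0
```

Collecting covered residues in a set means that overlapping patterns are not counted twice.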
These prediction results illustrate the strength of EIPPs. On the one hand, they provide useful information for predicting α-helical structures within the transmembrane environment of homologous membrane proteins. On the other hand, they allow us to describe selected interacting areas that are constrained by evolution. To evaluate this assumption, different IBSs were generated and applied to highly conserved interacting sequence parts derived from Pfam HMM-logos [35]. One IBS example, illustrating two interacting patterns, is shown in Figure 4. These patterns are part of the aquaporin-2 representative protein with PDB-Id 4nef. Within a pattern containing X-positions, we can register evolutionarily variable positions. Our IBSs additionally show that an interaction with another pattern takes place. The goal of this work was not to show which pattern position is involved in the spatial interaction but rather to show that two patterns build an interacting block. Figure 4 shows examples of variable positions that can be occupied by different natural variants. With our IBSs, we can show that an interaction between two blocks is given when the respective positions are occupied by the listed amino acids (natural variants causing NDI; UniProt-Id: P41181). This implies that a TG9-GL8 interaction is given with Phe23 and Ala101 or with Phe26 and Ala101; in the case of NDI caused by Leu28Pro [28], a TG9-GL8 interaction is given with Leu28 and Val102 (amino acid occupation coloured red) at specific positions. IBSs thus give a quick overview of whether evolutionary changes take place within or across a block. That a destabilizing amino acid substitution is compensated by a substitution at another position over the evolutionary time scale has been explained in detail by Morcos et al. [37]. Mutational variation information at specific X-positions can close this gap.
This leads to a further result of our work: the detection of new X-position variations arising from evolution. Many new possible amino acid substitutions within different sequence patterns were obtained. As one example, the TG9 motif representative pattern TLIXVFFXXG is given. Here, X-positions can be occupied by the following amino acids: X3F, X3L, X7G, X8V, X7A, X8P, and X8L, which leads to a final regular expression similar to PROSITE [38][39][40][41].
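How such a list of X-position occupancies could be turned into a PROSITE-style pattern is sketched below. This is an illustration only and does not reproduce the authors' final expression, which may also contain residues already known at these positions. The sketch assumes that the index in a label such as X3F is zero-based, which lines up with the X-positions in TLIXVFFXXG, and the function name to_prosite is hypothetical.

```python
def to_prosite(pattern: str, occupancies: list[str]) -> str:
    """Build a PROSITE-style pattern from labels such as 'X3F'.

    A label 'X3F' is read as: the X at 0-based position 3 may be F.
    X-positions without any label are written as 'x' (any residue).
    """
    allowed: dict[int, list[str]] = {}
    for label in occupancies:
        index, residue = int(label[1:-1]), label[-1]
        allowed.setdefault(index, []).append(residue)

    elements = []
    for index, symbol in enumerate(pattern):
        if symbol != "X":
            elements.append(symbol)
        elif index in allowed:
            residues = allowed[index]
            # A single allowed residue needs no brackets.
            elements.append(residues[0] if len(residues) == 1
                            else "[" + "".join(residues) + "]")
        else:
            elements.append("x")
    return "-".join(elements)


labels = ["X3F", "X3L", "X7G", "X8V", "X7A", "X8P", "X8L"]
print(to_prosite("TLIXVFFXXG", labels))  # T-L-I-[FL]-V-F-F-[GA]-[VPL]-G
```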

Conclusion
In the present work, we have applied a new approach for extracting short, spatially interacting amino acid sequence parts, so-called evolutionary interaction pattern pairs (EIPPs), from known structures of membrane proteins, more specifically aquaporin water channel proteins. Based on EIPPs, the structural similarity of recently published aquaporin-2 representative proteins was determined, and this in silico analysis confirms their aquaporin family affiliation. EIPPs were employed to generate interaction block schemes of highly conserved sequence parts annotated with natural variants causing nephrogenic diabetes insipidus. New amino acid variations have been discovered, and we will examine their relevance in future work. Disease patterns play an important role in membrane proteins, but currently few of the structures involved are available. Different works have shown that mutations in a membrane protein sequence influence disease patterns. In addition, mutations are used as diagnostic biomarkers. The application of interaction block schemes can lead to better indicators, and this in silico analysis can support laboratory mutagenesis investigations.