Interplay between Peptide Bond Geometrical Parameters in Nonglobular Structural Contexts

Several investigations performed in the last two decades have revealed that the geometrical parameters of the protein backbone show a remarkable variability. Although these studies have provided interesting insights into one of the basic aspects of protein structure, they have been conducted on globular and water-soluble proteins. We report here a detailed analysis of backbone geometrical parameters in nonglobular proteins/peptides. We considered membrane proteins and two distinct fibrous systems (amyloid-forming and collagen-like peptides). The present data show that in these systems the local conformation plays a major role in dictating the amplitude of the bond angle N-Cα-C and the propensity of the peptide bond to adopt planar/nonplanar states. Since the trends detected here are in line with the concept of the mutual influence of local geometry and conformation previously established for globular and water-soluble proteins, our analysis demonstrates that the interplay of backbone geometrical parameters is an intrinsic and general property of protein/peptide structures that is also preserved in nonglobular contexts. For amyloid-forming peptides, significant distortions of the N-Cα-C bond angle, indicative of hidden steric strain, may occur in correspondence with side chain interdigitation. The correlation between the dihedral angles Δω/ψ in collagen-like models may have interesting implications for triple helix stability.


Introduction
Protein three-dimensional structures are characterized by a high level of complexity [1]. Intriguingly, this complexity is coupled with marginal thermodynamic stabilities. Indeed, these intricate structures are the result of the delicate balance of a variety of different contributions. Even minor changes may undermine the overall organization of these macromolecules. In this framework, a comprehensive description of protein structures should not neglect subtle details that may become fundamental in specific situations.
In the last two decades there has been considerable interest in the analysis of protein backbone geometry. Several independent investigations have provided a plastic view of the protein backbone geometry [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. These statistical and quantum chemical investigations have highlighted an interesting interplay among backbone geometrical parameters. In particular, it has been shown that many backbone bond angles strongly depend on the local conformation [2,10]. Similarly, the local conformation also affects peptide bond distortions from planarity and the pyramidalization of the carbonyl carbon atom [5,19]. Finally, a correlation between bond distances such as CO and CN has been detected in ultrahigh resolution protein structures [4]. It is important to note, however, that correlations between backbone geometrical parameters have been disclosed by performing analyses on protein structure databases essentially made of globular proteins. Since protein structures are extremely sensitive to the local context, we here evaluated the occurrence of such correlations in nonglobular systems. In particular, we considered the transmembrane regions of membrane proteins and two different fibrous models: the collagen triple helix [20][21][22] and the steric-zipper motif exhibited by amyloid-like peptides [23,24]. These systems were selected to evaluate the impact on protein geometry variability exerted by (a) the polarity of the medium in which proteins are immersed and (b) the structural strain present in fibrous proteins. The present study has been made feasible by recent progress in crystallographic techniques, which has increased the number and the accuracy of structures of these nonglobular systems. It is worth mentioning that, although these proteins are underrepresented in current structure databases, they are very frequent in nature. Indeed, collagen is the most abundant protein in vertebrates [25]. 
Moreover, it has been estimated that membrane proteins may represent nearly 40% of the total protein content in the human genome [26]. Finally, recent data show that the tendency to form amyloid-like fibers is a rather common property of proteins [27].

Ensembles and Definition.
Statistical surveys of peptide bond geometrical parameters were performed on specific classes of protein/peptide structures reported in the Protein Data Bank (PDB) (release of March 2013) [28]. As detailed below, for each class different selection criteria were applied.
(b) Amyloid-Like Peptides. The following twenty-eight steric-zipper structures present in the PDB were identified by manually searching the database on the basis of literature information: 1yjo, 1yjp, 2okz, 2ol9, 2olx, 2on9, 2onv, 2onw, 2onx, 3dg1, 3dgj, 3fod, 3fpo, 3fqp, 3ftk, 3ftl, 3ftr, 3fva, 3hyd, 3nhc, 3loz, 3nvh, 3nvg, 3nvf, 3nve, 3ow9, 3q2x, and 3pzz. Only structures refined at a resolution better than 1.8 Å were considered. A single copy of each peptide was considered when multiple copies were present in the asymmetric unit of the crystal. The distribution of the resolution of these structures is reported in Figure S1B. From these structures 114 residues were selected by omitting terminal amino acids, which are generally charged in these peptides.

Results and Discussion
To evaluate the occurrence of correlations between peptide geometry and conformation in nonglobular contexts we here analyzed structures of membrane and fibrous (amyloid-like and collagen) peptides/proteins. Although, in principle, Δω, the carbonyl C pyramidalization, and the N-Cα-C angle depend on both φ and ψ, the analysis of the literature trends clearly shows a crucial role of the ψ angle in modulating these parameters [2,6,7]. Therefore, all subsequent analyses were carried out as a function of ψ.
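The quantities analyzed as a function of ψ can all be derived from backbone atomic coordinates. The following is a minimal sketch in plain Python, not the procedure used in this study: the function names (`bond_angle`, `dihedral`, `delta_omega`) are illustrative, and the convention of expressing Δω as the signed deviation of ω from the planar trans value of 180° is an assumption made here for the example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex (e.g. N, CA, C)."""
    u, v = _sub(a, b), _sub(c, b)
    cos_v = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_v))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, range [-180, 180].

    psi(i)   = N(i), CA(i), C(i), N(i+1)
    omega(i) = CA(i), C(i), N(i+1), CA(i+1)
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def delta_omega(omega):
    """Signed deviation of omega from 180 degrees (assumed convention).

    omega = 175 gives -5; omega = -175 (equivalent to 185) gives +5.
    """
    return (omega % 360.0) - 180.0

# Toy coordinates (hypothetical, not taken from any PDB entry).
right_angle = bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
quarter_turn = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(right_angle, quarter_turn, delta_omega(175.0))
```

With such helpers, each residue contributes one (ψ, N-Cα-C) pair and one (ψ, Δω) pair, which is the form of data the analyses in this section rely on.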
(a) Membrane Proteins. The detection of correlations between peptide geometry and conformation typically requires highly accurate protein structures that are refined against ultrahigh resolution diffraction data. However, the conformational dependency of some geometrical parameters can also be detected in structures determined at medium-high resolution (∼2.0 Å). This holds for (a) the N-Cα-C bond angle and for (b) the Δω/ψ correlation [2,5]. This observation offers the possibility to check the occurrence of these correlations also in proteins whose structures were generally determined at moderate resolution, such as membrane proteins.
As reported in detail in the Methods section, we conducted these analyses on a nonredundant set of membrane proteins, sharing sequence identities lower than 40%. Only the transmembrane segments of these proteins were selected. As expected, most of the residues of these regions adopt either α-helical (2415 residues) or β-sheet (1133 residues) conformations (Figure S2). A minor fraction of residues (75 residues) adopting the structure of a 3-10 helix was also observed. As shown in Figure 2, there is a clear dependency of N-Cα-C on the ψ angle. In particular, the N-Cα-C angle assumes the largest values in 3-10 helices (112.8° ± 2.2°) and the smallest ones in β-sheets (108.6° ± 2.9°) (Figure S3). Intermediate values are detected for α-helices (111.3° ± 2.0°) (Figure S3). The significance of these differences has been evaluated by using the two-sided two-sample Student's t-test.
These analyses indicate that the differences between mean values of the pairs α-helix/3-10 helix, α-helix/β-sheet, and 3-10 helix/β-sheet are statistically significant at the 0.001 level. Previous studies carried out on globular proteins have shown that the N-Cα-C angle assumes the largest values when ψ approaches 0° [2,6]. A second minor maximum is found for ψ values of 150°-180°, whereas minimal values are observed for ψ ∼ 90°. Our data show that the average N-Cα-C values of the regions characterized by −5° < ψ < 5°, 160° < ψ < 170°, and 85° < ψ < 95° are 114.2° ± 2.6°, 110.5° ± 2.4°, and 109.0° ± 4.3°, respectively. Therefore, the N-Cα-C variability observed in membrane proteins closely follows the scheme derived for globular proteins. This suggests that the local polar/apolar environment does not play a significant role in modulating this geometrical parameter.
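The pairwise comparison of the three conformational classes can be reproduced approximately from the summary statistics quoted above. The sketch below is illustrative only: it assumes that the ± values are standard deviations, uses the pooled-variance form of the two-sample Student's t statistic, and approximates the two-sided p value with a normal distribution, which is a rough simplification that is reasonable only because the numbers of residues are large. The names `pooled_t` and `two_sided_p_normal` are hypothetical.

```python
import math
from itertools import combinations

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df

def two_sided_p_normal(t):
    """Two-sided p value from a normal approximation (assumption: large df)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# (mean N-CA-C in degrees, assumed standard deviation, number of residues)
groups = {
    "alpha-helix": (111.3, 2.0, 2415),
    "beta-sheet": (108.6, 2.9, 1133),
    "3-10 helix": (112.8, 2.2, 75),
}

results = {}
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    t, df = pooled_t(*g1, *g2)
    results[(name1, name2)] = (t, df, two_sided_p_normal(t))
    print(f"{name1} vs {name2}: t = {t:.2f}, df = {df}")
```

Under these assumptions all three t statistics lie far beyond the value needed for significance at the 0.001 level, in agreement with the statement in the text.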
The analysis of the Δω/ψ values indicates the occurrence of this correlation in membrane proteins (Figure 3(a)). In particular, residues assuming a β-conformation, whose ψ falls in the interval 120°-180°, show a tendency to adopt negative Δω values (average Δω value −2.0°) (Figure 3(b)). This is in line with data observed for the general ensemble of protein structures, which indicate that residues with ψ values in the interval 120°-180° display negative Δω values [5,7,11].
The analysis of the Δω distribution in α-helical residues does not display, on average, any significant deviation from planarity (Figure 3(c)). This is also in line with previous analyses conducted on globular proteins, which showed that helical residues display minimal deviations from planarity, with both positive and negative Δω values [5,7,11].
Altogether, these findings suggest that a large variation of the local polarity, going from soluble globular proteins to membrane proteins, does not affect peptide bond distortion from planarity. They also imply that the electronic effects that dictate peptide bond deformations [7] also operate in apolar contexts.
We also checked the occurrences of specific trends of carbonyl C-pyramidalization in these membrane protein structures. These analyses do not highlight any clear trend (data not shown). This observation may be ascribed to the limited resolution of the structures available.

(b) Steric-Zipper Motif in Amyloid-Like Peptides.
It has been recently discovered that amyloidogenic fibril-forming peptides assume a rather unusual structure characterized by the tight association of the side chains belonging to two facing β-sheets (steric-zipper motif) [23,24]. It is believed that this motif represents a reliable surrogate of amyloid-like fibrils formed by proteins and peptides associated with the onset of neurodegenerative diseases. It has been proposed that the tight interdigitation of side chains in the steric-zipper models confers an elevated structural stability to this motif. We evaluated here the impact of this tight interdigitation on the geometrical parameters of 28 high resolution steric-zipper structures, in relation to their dependence on the conformation. The average value of the N-Cα-C angle is 109.4°. This value is only slightly higher than that observed for residues in β-structure of globular proteins (109.2°). Interestingly, the evaluation of the N-Cα-C/ψ correlation clearly indicates that the N-Cα-C angle rises when ψ increases from 90° to 180° (correlation coefficient of 0.55 with a p value < 10⁻⁴) (Figure 4(a)). This is the trend typically observed in globular proteins. It is worth mentioning, however, that some of the larger deformations of the N-Cα-C angle (>112°) observed in these fibrillar structures occur when the interdigitation of residues between facing sheets is stronger. In Figure S4, some examples are reported in which the tight interdigitation leads to a significant deformation of the N-Cα-C angle. A rough estimation of the energetic cost associated with the deformation of this angle was obtained by plotting the distribution of its values and by deriving energies according to the Maxwell-Boltzmann relationship (Figure 5). The analysis of the resulting diagram indicates that the energy cost for a deformation of N-Cα-C to 114°, a value that may be observed in concomitance with tight interdigitation (Figure S4), is about 4 kJ per mol at the temperature of 298 K (Figure 5). 
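The strength of the N-Cα-C/ψ correlation quoted above can be put in context with a short calculation. The sketch below is illustrative: `pearson_r` shows how a correlation coefficient is obtained from paired values, and `r_to_t` converts a coefficient into a t statistic. The sample size n = 114 is an assumption taken from the number of selected residues (the text does not state how many points enter the correlation), and the normal approximation used for the p value is a rough simplification.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def two_sided_p_normal(t):
    """Two-sided p value from a normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical (psi, N-CA-C) pairs, only to show how pearson_r is called.
toy_psi = [95.0, 110.0, 125.0, 140.0, 155.0, 170.0]
toy_angle = [108.0, 108.9, 108.7, 110.1, 110.4, 111.6]
toy_r = pearson_r(toy_psi, toy_angle)

# Reported coefficient; n = 114 is assumed, not stated in the text.
t_stat = r_to_t(0.55, 114)
p_approx = two_sided_p_normal(t_stat)
print(f"toy r = {toy_r:.2f}, t = {t_stat:.2f}, approximate p = {p_approx:.1e}")
```

With these assumptions the t statistic is close to 7, which is consistent with the reported p value below 10⁻⁴.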
Figure 5: The histogram represents the distribution of N-Cα-C angles for residues located in β-sheets derived from 5139 protein chains extracted from the PDB. These chains were selected by applying the following criteria: resolution better than 2.2 Å, R-factor lower than 0.20, and mutual sequence identities lower than 25%. The black points represent energies derived from the Maxwell-Boltzmann relationship $N_i/N_0 = \exp(-\Delta E / k_B T)$, where $N_i$ is the number of observations in state $i$, $N_0$ is the number of observations in the most populated state, $k_B$ is the Boltzmann constant, $T$ is the temperature of the system, and $\Delta E$ is the energy difference between the two states.

Although the absolute value of this energetic term is limited, it may be important in repetitive systems such as amyloid-like fibrils. Therefore, the cost associated with the backbone geometry deformation is one of the factors that should be considered in the energetic balance of the process that leads to amyloid-like fiber formation. It is worth mentioning that some β-branched residues (Ile and Val) have a lower propensity to assume large N-Cα-C values [18]. This implies that, for these residues, a larger energetic cost is associated with the deformations required for these tight associations. Indeed, the energetic cost, estimated by using the Maxwell-Boltzmann approach described above, for the Ile residue to adopt an N-Cα-C angle of 114° is about 5 kJ per mol at 298 K (data not shown). These considerations suggest that specific residue substitutions at key points of amyloid-like proteins/peptides may favor or disfavor the process of fibril formation. Although this analysis represents one of the first quantifications of the energetics involved in the geometry distortion caused by the steric zipper, future systematic studies using quantum chemistry approaches are likely to provide further interesting insights into this topic.
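The Maxwell-Boltzmann estimate used above amounts to inverting a population ratio into an energy difference. A minimal sketch follows, assuming energies expressed per mole so that the gas constant R replaces the Boltzmann constant; the function names and the example counts are hypothetical and are not the histogram data of Figure 5.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)

def boltzmann_energy(n_i, n_0, temperature=298.0):
    """Energy difference (kJ/mol) from N_i/N_0 = exp(-dE / RT)."""
    if n_i <= 0 or n_0 <= 0:
        raise ValueError("counts must be positive")
    return -R_KJ * temperature * math.log(n_i / n_0)

def population_ratio(delta_e, temperature=298.0):
    """Inverse relation: expected N_i/N_0 for an energy difference in kJ/mol."""
    return math.exp(-delta_e / (R_KJ * temperature))

# Hypothetical histogram counts: most populated bin versus a strained bin.
n_most_populated = 1000
n_strained = 200
cost = boltzmann_energy(n_strained, n_most_populated)
print(f"estimated cost: {cost:.2f} kJ/mol")
print(f"ratio for 4 kJ/mol at 298 K: {population_ratio(4.0):.3f}")
print(f"ratio for 5 kJ/mol at 298 K: {population_ratio(5.0):.3f}")
```

Read in this way, a cost of about 4 kJ per mol at 298 K corresponds to an angle bin that is roughly one-fifth as populated as the most populated one, which illustrates why the term is small for a single residue yet may add up in a repetitive assembly.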
The present data indicate that the Δω/ψ correlation is also observed in these peptides. Indeed, as shown in Figure 4(b), Δω values are positive and negative when ψ approaches values close to 90° and 150°, respectively. Since these peptides present elevated resolutions and good crystallographic R-factors, we also evaluated the dependence of the carbonyl C pyramidalization on ψ. As shown in Figure 4(c), there is a clear dependence of the C pyramidalization on ψ, which closely follows the one observed in high resolution structures of globular proteins. Altogether, these findings demonstrate that the interplay of different geometrical parameters is preserved even in these highly strained systems.
(c) Collagen-Like Polypeptides. Collagen is characterized by an uncommon abundance of glycine and imino acid residues that are regularly distributed in the sequence in repetitive triplets of the type Gly-X-Y [20,21,25,33,34]. Although the three residues of this motif assume a polyproline PPII conformation, they present distinct backbone dihedral angle values. Indeed, structural studies carried out on model peptides of the type Pro-Pro-Gly or Pro-Hyp-Gly have shown that ψ adopts values close to 175°, 165°, and 150° for Gly, Pro in X, and Pro in Y (or Hyp), respectively. Although crystals of collagen-like peptides frequently show significant disorder and/or twinning, very recently ultrahigh resolution (better than 1.1 Å) models have been reported. Correlations among geometrical parameters were initially analyzed here by considering the structure of (Pro-Pro-Gly)9. Given the peculiar amino acid composition of these peptides, the distribution of N-Cα-C values was not considered; the analyses were, therefore, restricted to the correlations of Δω and of the C pyramidalization with ψ. As shown in Figure 6, the values of Δω clearly depend on ψ. In particular, Gly residues, which display a ψ angle close to 180°, present a virtually planar peptide bond (Δω = −1.3° ± 2.2°). Significant distortions from planarity are displayed by Pro residues both in position X (Δω = −5.8° ± 2.8°) and in position Y (Δω = −7.2° ± 4.1°). Notably, Pro residues in Y, which show ψ angles close to 150°, display larger deviations from planarity than Pro residues in X. Similar results are obtained from the analyses carried out on Gly, Pro-X, and Pro-Y residues isolated from other ultrahigh resolution structures of collagen-like peptides (data not shown). The evaluation of the dependence of C pyramidalization on ψ leads to less clear results. Although, as expected, the C pyramidalization presents average negative values for these residues, its magnitude (∼ −1°) is likely not significant (Figure S5).
The trends detected here are in line with the Δω/ψ dependence reported for globular proteins. Not only do these findings attest to the high accuracy of these collagen-like structures, but they also hold important implications for the triple helix buildup. Indeed, previous studies have shown that the interplay between dihedral angles plays a major role in the stabilization/destabilization of the triple helix [20,21]. In particular, limited variations of the main-chain dihedral angles associated with different proline puckering have dramatic effects on triple helix stability. The present findings further extend the notion of dihedral angle correlations in the triple helix to Δω/ψ. By influencing φ and ψ, proline puckering also dictates the propensity of the peptide bond to undergo distortions. It is important to note that the CO group of the Pro in X is involved in the H-bond interactions with the backbone N atom of the Gly residues, which stabilize the triple helix. In this scenario, it can be suggested that the positional preference for the down pucker of Pro in X is related to the interplay of main-chain dihedral angles that leads to the optimal position of the CO for the H-bond formation. Although the magnitude of the peptide bond deviations is limited, its significance is increased in a repetitive system such as the collagen triple helix. These considerations support the concept that the stability of protein structures may rely on subtle effects, which can be highlighted only by accurate protein structure determinations.

Conclusions
Present data clearly indicate that the interplay of backbone geometrical parameters is an intrinsic and general property of protein/peptide structure that is preserved also in highly nonglobular contexts. The detection of these features in protein structures is therefore exclusively related to their accuracy. In line with previous suggestions [7,11], these correlations should be systematically checked in protein structure validation protocols. Since the subtle correlations of the geometrical parameters become more evident with the increase of resolution, their evaluation may represent a tool particularly indicated for the validation of high/ultrahigh resolution protein structures. The general validity of these subtle observed trends also prompts their use in protein refinement protocols [35] as well as in the development of accurate force fields [2] for computational studies.