The potential energy landscape of pentapeptides was mapped in a collective coordinate principal conformational subspace derived from principal component analysis of a nonredundant representative set of protein structures from the PDB. Three pentapeptide sequences that are known to be distinct in terms of their secondary structure characteristics, (Ala)5, (Gly)5, and Val.Asn.Thr.Phe.Val, were considered. Partitioning the landscapes into different energy valleys allowed for calculation of the relative propensities of the peptide secondary structures in a statistical mechanical framework. The distribution of the observed conformations of pentapeptide data showed good correspondence to the topology of the energy landscape of the (Ala)5 sequence where, in accord with reported trends, the
Our understanding of sequence-structure relationships of proteins is increasing in both depth and breadth [
The conformational space of a single amino acid in a peptide can be effectively described by the backbone
A more complete understanding of sequence-structure relationships in polypeptides requires us to move beyond statistical/geometrical considerations to include physical interactions [
The relatively small number of folds adopted by experimentally determined protein structures suggests that native folds are confined to a low dimensional manifold (subspace) of the nominally high dimensional protein fold space [
In a previous work [
In this paper, we build on the approach developed in our previous work [
Representative structures from the Protein Data Bank were created using PDBSELECT [
In order to reduce the dimensionality of the conformational space of the pentapeptide segments, the data matrix (
To avoid problems, such as periodicity and nonlinearity, in the calculation of variance for angular data [
Despite the duality of PCA and MDS in terms of the scores of the data matrix [
The potential energy surface (PES) of a pentapeptide segment within the PCS was mapped via systematic energy evaluation on a grid defined by discrete points along the first three PCs. The coordinates of a grid point
At each grid point, the backbone torsion angles were restrained to their desired values using a force constant of 100 kcal mol−1 degree−2 and the system was energy minimised using 200 steps of steepest descent followed by 2000 steps of the ABNR method [
As a consequence of the discretization, the structure and energy of any point in the PCS are considered to be the structure and energy corresponding to the closest grid point within the resolution limit of the grid. Characterisation of the features of the PES and calculation of the thermodynamic properties for the different conformational states were conducted as described previously [
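The nearest-grid-point convention described above can be illustrated with a minimal sketch. It assumes a regular, axis-aligned grid along the first three PCs; the function name, grid origin, spacing, and shape are hypothetical values chosen for illustration and are not taken from this work.

```python
import math
from typing import Sequence, Tuple


def nearest_grid_index(
    point: Sequence[float],
    origin: Sequence[float],
    spacing: float,
    shape: Sequence[int],
) -> Tuple[int, ...]:
    """Snap a point in (PC1, PC2, PC3) to the index of the closest grid point.

    Assumes a regular grid: node i along an axis sits at origin + i * spacing.
    Points outside the grid are clamped to the nearest boundary node.
    """
    index = []
    for x, x0, n in zip(point, origin, shape):
        i = math.floor((x - x0) / spacing + 0.5)  # round to the nearest node
        i = min(max(i, 0), n - 1)                 # clamp to the grid bounds
        index.append(i)
    return tuple(index)


# Hypothetical grid: 9 nodes per axis, from -2.0 to 2.0 in steps of 0.5.
idx = nearest_grid_index((0.1, -0.3, 1.9), (-2.0, -2.0, -2.0), 0.5, (9, 9, 9))
```

With such an index in hand, the structure and energy stored for that node would stand in for any point within the grid resolution.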
The energy levels,
PCA, therefore, provides a basis set of linear collective coordinates describing the variance of observed structures within the PCS. The PCs allow for the construction of conformers within the PCS, interpolating and, optionally, extrapolating from the observed conformational distribution. Calculating the energy of conformers within the PCS results in a smooth continuous energy landscape. The collective nature of the basis set also allows for a straightforward definition of conformational reaction coordinates for complex biomolecules, in contrast to the use of proxy coordinates ("order parameters"), which are arbitrarily or heuristically chosen [
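As a rough illustration of how a conformer can be constructed from coordinates in the PCS, the sketch below rebuilds a feature vector as the mean plus a score-weighted sum of PC vectors, and projects a vector back onto the PCs. It assumes the PC vectors are orthonormal and that the features are whatever representation of the backbone torsions was fed to the PCA; the function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def reconstruct(
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> List[float]:
    """Build a feature vector from PC scores: x = mean + sum_k t_k * v_k."""
    x = list(mean)
    for t, v in zip(scores, components):
        for j, vj in enumerate(v):
            x[j] += t * vj
    return x


def project(
    x: Sequence[float],
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature vector onto the PCs: t_k = (x - mean) . v_k."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * vk for c, vk in zip(centred, v)) for v in components]


# Toy example: four features, three orthonormal PCs (illustrative values only).
mean = [1.0, 2.0, 3.0, 4.0]
pcs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
conformer = reconstruct(mean, pcs, [0.5, -1.0, 2.0])
```

Scores lying between observed points interpolate within the observed distribution, while scores outside its range extrapolate from it.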
We generated a conformational space for pentapeptide segments extracted from protein crystal structures achieving a similar result to that of Sims et al. [
(a) The probability distribution (in percentage) of the X-ray crystal structure data superposed on the potential energy slice in the PC1-PC2 plane. (b) Slices of the potential energy surface in the PC1-PC3 and PC2-PC3 planes. In (a) and (b), energy is offset relative to the global minimum and the colour ramp changes linearly from dark blue (lowest energy; 0 kcal/mole) to red (30 kcal/mole) in 1 kcal/mole increments. (c) Ramachandran plots of selected secondary structure classes of the X-ray crystal structure dataset. Selected secondary structure classes are coloured in the three panels as follows;
The PESs underlying the PCS were mapped for three selected pentapeptide sequences: (Ala)5, (Gly)5, and Val.Asn.Thr.Phe.Val. The (Ala)5 sequence represents the canonical sequence for sequence-structure studies; (Gly)5 represents the extreme case of removing local steric hindrance along the polypeptide chain; and the Val.Asn.Thr.Phe.Val sequence was chosen as a test case for the hypothesis that the secondary structure of the Val.Asn.Thr.Phe.Val sequence is context dependent [
The energy landscapes were assessed in terms of the correspondence of the distribution of observed conformers to the features of the underlying PES. The utility of the PES is demonstrated via computation of sequence-specific relative propensities of different secondary structure conformers. These propensities, computed on a statistical mechanical basis, are compared to the observed statistical distributions.
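To make the statistical mechanical calculation concrete, the following minimal sketch computes relative propensities from Boltzmann-weighted sums over the grid points assigned to each energy valley. It assumes that every grid point represents the same volume of the PCS and that potential energies in kcal/mol are used directly; the function name, valley labels, and energies are hypothetical and do not reproduce the values obtained in this work.

```python
import math
from typing import Dict, List

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def valley_propensities(
    valleys: Dict[str, List[float]],
    temperature: float = 298.0,
) -> Dict[str, float]:
    """Relative propensity of each valley from its local partition function.

    valleys maps a valley label to the energies (kcal/mol) of its grid points.
    Z_v = sum_i exp(-E_i / RT); propensity_v = Z_v / sum over all valleys.
    """
    rt = R_KCAL * temperature
    # Offset by the global minimum so the exponentials stay well scaled.
    e_min = min(e for energies in valleys.values() for e in energies)
    local_z = {
        name: sum(math.exp(-(e - e_min) / rt) for e in energies)
        for name, energies in valleys.items()
    }
    total = sum(local_z.values())
    return {name: z / total for name, z in local_z.items()}


# Illustrative input: two valleys of equal depth but different width.
p = valley_propensities({"wide": [0.0, 0.0], "narrow": [0.0]})
```

In this toy case the wider valley receives two thirds of the population even though both minima have the same energy, which shows how valley volume and well depth both enter the computed propensity.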
Inspection of the projection of the observed conformational distribution (incorporating many different sequences) onto the PES of (Ala)5 reveals that the data are confined to the low energy regions of the computed PES. Further, the positions of the predominant peaks of this distribution match well with the locations of the energy valleys within that surface (see Figure
Within the alpha helical regions, it is interesting to note that points whose secondary structure assignment is uniformly alpha helix (i.e., H.H.H.H.H) are found in the minimum of the
Analysis of the PESs of the selected three pentapeptide sequences reveals that their detailed topographical and topological features are strongly dependent on sequence composition (see Figure
The disconnectivity graphs of the potential energy surfaces in the principal conformational space of pentapeptide segments (a), thermodynamic propensities (at 298 K) superposed on the potential energy slice corresponding to the dominant energy valley (b), superposition of respective sequences on these slices (c) (in green), and Ramachandran plots of dominant energy valleys (d) for the penta-alanine (1), pentaglycine (2), and Val.Asn.Thr.Phe.Val (3) sequences. The disconnectivity graphs (a) are offset relative to the respective global minima; minima are labelled with their secondary structure classification based on position on the Ramachandran plot (per residue: H indicates helix and E an extended sheet while n indicates nonavailable secondary structure classification). Minima corresponding to the two valleys with the highest propensities are coloured in green and orange. Inset: the
Stratification of the disconnectivity trees of the three selected sequences into different energy intervals helps to highlight the sequence-specific distribution of conformational states over the energy funnel. The densities of the
At the highest energy band of the energy funnels (Figure
It is interesting to note that at the lowest band of the potential energy funnels (see Figure
Quantitative analysis of the PESs via partitioning into different energy valleys and calculation of the local partition functions (see Methods) reveals that the relative propensities of the conformational states at 298 K depend strongly on the sequence composition of each pentapeptide segment (Figure
Projection of pentapeptide fragments corresponding to the three selected sequences (X-ray crystal structures with resolution <2.0 Å from the Protein Data Bank) [
The observed distribution of the (Gly)5 sequence (Figure
The conformational propensities for the three selected pentapeptide sequences can be ascribed to two contributions to the free energy bias: enthalpic and entropic. The enthalpic contribution stems from the relative energetics of the energy minima while the (conformational) entropic contribution is related to the relative volumes of the corresponding energy valleys. These two contributions can be intuitively deduced from inspection of the disconnectivity graphs and the topography of the surfaces respectively (see Figures
The large energetic difference of the minima of the (Ala)5
The use of a collective coordinate basis set for mapping the potential energy surface of pentapeptide segments in a low dimensional principal conformational subspace allowed for a quantitative assessment of sequence-specific conformational preferences within a statistical thermodynamic framework. The calculated thermodynamic propensities (at 298 K) of the (Ala)5 and Val.Asn.Thr.Phe.Val sequences are in accord with their statistically derived secondary structure preferences based on structures in the Protein Data Bank. The (Ala)5 sequence showed a predominant propensity for an
Analysis of the topography and topology of the energy landscapes reveals that preference for the
The method therefore provides a powerful and general framework for investigating the sequence-specific thermodynamic and dynamic properties of polypeptide segments derived from underlying conformational energy landscapes. For example, our approach may be useful for investigating specific sequence-dependent structural properties, such as the identification of protein folding initiation sites [
The author declares that there are no competing interests.
The author is very grateful to Dr. Leo Caves for insightful discussions and helpful remarks. The author gratefully acknowledges Qassim University, represented by the Deanship of Scientific Research, for the material support for this research under no. 3041 during the academic year 1436 AH/2015 AD.