Toward Structure Prediction for Short Peptides Using the Improved SAAP Force Field Parameters

Based on the observation that Ramachandran-type potential energy surfaces of single amino acid units in water are in good agreement with statistical structures of the corresponding amino acid residues in proteins, we recently developed a new all-atom force field called SAAP, in which the total energy function for a polypeptide is expressed basically as a sum of single amino acid potentials (ESAAP) and electrostatic (EES) and Lennard-Jones (ELJ) potentials between the amino acid units. In this study, the SAAP force field (SAAPFF) parameters were improved, and classical canonical Monte Carlo (MC) simulation was carried out for short peptide models, that is, Met-enkephalin and chignolin, at 300K in an implicit water model. Diverse structures were reasonably obtained for Met-enkephalin, while three folded structures, one of which corresponds to a native-like structure with three native hydrogen bonds, were obtained for chignolin. The results suggested that the SAAP-MC method is useful for conformational sampling for the short peptides. A protocol of SAAP-MC simulation followed by structural clustering and examination of the obtained structures by ab initio calculation or simply by the number of the hydrogen bonds (or the hardness) was demonstrated to be an effective strategy toward structure prediction for short peptide molecules.


Introduction
In conformational analysis of short peptides, Monte Carlo (MC) and molecular dynamics (MD) simulation techniques have been widely applied [1,2], in which a set of potential energy functions (a so-called force field), such as ECEPP [3,4], AMBER [5][6][7][8], CHARMM [9], OPLS [10,11], and GROMOS [12], is employed to define the relation between the structure and the potential energy.However, it is still not practical to search for the possible conformers comprehensively within a limited computing time without using advanced sampling methods, such as multicanonical [13,14] and replica-exchange [15][16][17] methods, which have been developed to overcome the sampling problem.In recent years, various types of coarse-grained force fields, such as AMBER-UA [18], UNRES [19][20][21][22][23], and MARTINI [24,25], have also been developed in order to reduce time to compute the potential energy.For instance, the UNRES force field successfully simulated the aggregation process and stable structure of -amyloid protein oligomers that are regarded as pathologic factors of Alzheimer's disease [26].
On the other hand, we recently discovered the interesting feature of protein structures that Ramachandran-type potential energy surfaces of single amino acid units in water obtained by ab initio molecular orbital calculation are almost identical with statistical distributions of the Ramachandran plots obtained from protein databank (PDB) [27,28].In other words, the amino acid residues in proteins seem to statistically follow Boltzmann distributions on the single amino acid potential (SAAP) surfaces in water.Although physicochemical implications of this unexpected similarity are not yet clear, the finding strongly suggests prominent importance of SAAP as a determinant of protein structures.This point would also be supported by the previous experimental result by Dobson and coworkers [29,30] that  and  dihedral angle distributions of individual amino acid residues of a polypeptide in a random-coil state can be expressed by use of the Ramachandran plots, as well as by the theoretical approaches by Sakae and Okamoto [31] and Kamiya et al. [32] to the optimization of conventional all-atom force field parameters by using the Ramachandran plots.
A discovery of the similarity between the SAAP in water and the statistical structure of the amino acid residues in folded proteins has prompted us to develop a new force field called SAAP for polypeptide molecules [33,34].The SAAP force field (SAAPFF) is entirely different from conventional force fields in that a polypeptide is divided to the amino acid units, not to the atomic units, solvent effects are implicitly included in the parameters, and the atomic charges are not constant but variable depending on the conformation of the amino acid units.In the SAAPFF, the total potential energy ( TOTAL ) for a polypeptide is expressed by (1) as the following: where  SAAP is a sum of potential energies (SAAP) for the individual amino acid units,  ES and  LJ are electrostatic and Lennard-Jones potentials between the amino acid units, and  OTHERS is the other correlation term, which is ignored in a current version of SAAPFF.Detailed description of these terms and the simulation program was given previously (see also Supplementary Material available online at http://dx.doi.org/10.1155/2013/407862)[33,34].Canonical MC simulation using the SAAPFF reasonably reproduced randomly fluctuating conformation of Metenkephalin.However, for chignolin the SAAP-MC simulation did not afford the native -hairpin structure when the simulation was started from the extended structure [34].Thus, the previous SAAPFF parameters were a little too rough to be applied for structure prediction of the short peptide.In this study, the SAAPFF parameters have been improved in several points, and accuracy of the improved SAAP simulation suite to search for possible conformers of short peptides has been evaluated by performing the molecular simulation for Met-enkephalin and chignolin.The results were then compared with those obtained by the conventional AMBER-MD simulation method using the amber99sb force field [1] combined with the generalized Born solvent model [35][36][37].A protocol of SAAP-MC simulation followed by structural clustering is suggested to be an effective strategy toward structure prediction for short peptide molecules.

Improvement of the SAAPFF Parameters.
The SAAPFF parameters are comprised of a potential energy, atomic coordinates, atomic charges, and Lennard-Jones potential parameters for all possible conformations of the single amino acid unit (HCO-Xaa-NH 2 ).The conformations are defined by dihedral angles, , ,  1 ,  2 , and so forth, each with an interval of 15 degrees.In this study, the SAAPFF parameters previously developed [33,34] were improved in the following points.
First, the parameters for main-chain, Pro, and Val in water, which were obtained in the previous version [34] by single-point calculation with the IEFPCM [65][66][67] solvent model using the structures optimized in vacuo, were replaced with those obtained by geometry optimization in water at the HF/6-31+G(d) level using the IEFPCM model in a Gaussian03 program (rev.E.01) [68].The similar calculation level has been widely applied for small biomolecules [69,70].Relaxation of the potential surfaces in the implicit water model should improve accuracy of the SAAPFF.
Second, atomic charges of the SAAPFF parameters, which are variables as a function of the structure of an amino acid unit, were switched for all amino acids from Mulliken charges to electrostatic potential (ESP) charges [71] that were obtained at the recommended higher MP2/6-31G(d,p) level [72] using the IEFPCM solvent model.The ESP charges are defined to reproduce the electrostatic potential surface and are used in conventional force fields as reliable atomic charges.Therefore, the use of ESP charges would reproduce electrostatic interactions, that is, the  ES term, more appropriately than the Mulliken charges.
Third, the parameters of potential energies for the mainchain unit were further improved as follows.The SAAPFF parameters for most amino acids, except for Gly, Ala, Pro, and Val, are divided into the main-chain and side-chain units according to the side-chain separation approximation method [34].In this approximation, the main-chain unit was capped with a tert-butyl group, which was fixed eclipsed to the main-chain part during geometry optimization to obtain the SAAPFF parameters.However, this method brought a statistical error that the SAAP energies thus obtained are slightly larger than those obtained without the approximation, due probably to distortion of the bond angles derived from the fixed geometry.Indeed, such a statistical error was evident for the case of Val [34].Therefore, the energies were replaced with those calculated for the relaxed structures, which were obtained by geometry optimization with the tert-butyl group relaxed in water, while for structural parameters (i.e., atomic coordinates and atomic charges) the geometry with the tertbutyl group fixed eclipsed was still employed in order to avoid abnormally short atomic contacts between the main-chain and side-chain units after the connection.

Simulation Conditions. Conformational properties of
Met-enkephalin and chignolin in water were studied by SAAP-MC and AMBER-MD [73] simulations.The SAAP-MC simulation was carried out at 300 K in water by using the improved SAAPFF parameters, in which solvation effects were implemented implicitly.The conventional Metropolis method [74,75] and the Mersenne Twister random number generator [76] were employed for the MC simulation.At each MC step, one dihedral angle, randomly chosen from all dihedral angles including main-chain and side-chain dihedral angles, was changed randomly with a maximum displacement angle of ±32 degree.The new structure was constructed by the interpolation of the SAAPFF parameters and subsequent connection of the single amino acid units, and the potential energy was calculated by (1).The potential energy thus obtained was then compared with the previous one.Ten trajectories with total MC steps of 100 million were obtained for Met-enkephalin, while twenty trajectories with total MC steps of 2 billion were obtained for chignolin.
As an initial structure of Met-enkephalin for the SAAP-MC simulation, the extended structure from PDB (PDB ID: 1PLW) [77] was employed.On the other hand, extended (defined by all main-chain  and  dihedral angles of −180 degree) and native folded (PDB ID: 1UAO) [54] structures were selected for chignolin as the initial structures.An Intel Xeon W3550 3.06 GHz processor with 12 GB memory was employed as a platform for calculation.The computing time of SAAP-MC simulation for Met-enkephalin was about 6 hours and for chignolin was about 11 days.For comparison, ten trajectories of AMBER-MD [73] simulation were also calculated for Met-enkephalin and chignolin, respectively, in water at 300 K.The simulation time was 1 s with a time step of 2 fs, and a cut-off distance of nonbonded interactions was 12 Å.Solvation effects were implicitly considered by using the generalized Born model (igb = 1) [35][36][37] with a dielectric constant of 78.5, following the recent study by Götz et al. [78].The AMBER10 program [73] with the amber99sb force field [1] was used.The computing time of the AMBER-MD simulation to obtain one trajectory was about 165 hours for Met-enkephalin, and that for chignolin was about 18 days, on the same calculation platform used for the SAAP-MC simulation.

Data Analysis.
The 10,000 structures of Met-enkephalin that were extracted in every 10,000 step from the SAAP-MC simulation trajectory were classified to twenty structural clusters by using a clustering algorithm called the -means method [79] based on the RMSD for the main-chain C, O, and N atoms or all heavy atoms except for the hydrogen atoms.Similarly, ten structural clusters were obtained for chignolin from the 20,000 structures extracted in every 100,000 step from the SAAP-MC simulation trajectories.On the other hand, 10,000 structures were extracted in every 100 ps from the AMBER-MD trajectories for both Met-enkephalin and chignolin.The obtained structures were classified to ten or twenty clusters by application of the -means method under the same conditions to those applied for the SAAP-MC simulation results.
For chignolin, the RMSD values with respect to the native structure were also calculated for the structures extracted from the SAAP-MC and AMBER-MD trajectories by using the amber-ptraj program [7,8] based on the main-chain atoms or all heavy atoms.The free-energy surfaces of chignolin were further analyzed by using all trajectories obtained from the SAAP-MC simulation according to (2): where  is a probability of the structures with the corresponding structural parameters,  min is the probability of the structure that had the lowest energy, Δ is a difference in the free energy,  is a gas constant, and  is an absolute temperature.Ramachandran-type - free-energy surfaces were obtained for each amino acid residue by dividing the surface into 10 × 10 degree units, while main-chain RMSD versus hydrogen bond or hydrogen bond versus hydrogen bond free-energy surfaces were obtained by dividing the surface into 0.1 × 0.1 Å units.

Ab Initio Calculation.
Relative energies of the representative structures obtained for chignolin by the SAAP-MC simulation were calculated in water by the ab initio molecular orbital method.The calculation was performed at the HF/IEFPCM/6-31+G(d,p) level by using a Gaussian03 program [68].The polarizable continuum model using the integral equation formalism variant (IEFPCM), which is equipped as the default self-consistent reaction field method, was employed for the calculation in water.

Met-Enkephalin by SAAP-MC and AMBER-MD.
Energetic trajectories of SAAP-MC and AMBER-MD simulations for Met-enkephalin were obtained in an implicit water model at 300 K (see Figure S1).In SAAPFF, the total energy  TOTAL is basically expressed as a sum of  SAAP ,  ES , and  LJ .The values of these energetic terms were maintained stable during the SAAP-MC simulation with large fluctuation for  SAAP and  LJ , while the value of  ES was almost zero, as observed previously [34]: the large dielectric constant of water would make the electrostatic interaction between the amino acid residues ignorable.Similarly, the total energy was maintained stable for the AMBER-MD simulation.However, the distribution of the distance between the terminal C atoms ( Y1⋅⋅⋅M5 ) was significantly different for the two simulation methods.The value of  Y1⋅⋅⋅M5 dispersed in a range from 5 to 12 Å in the SAAP-MC simulation, while those converged at around 6 Å in the AMBER-MD simulation (Figure 3).The results suggested that the structures obtained from the SAAP-MC simulation contain diverse conformations and are more extended than those obtained from the AMBER-MD simulation on the average.Indeed, when the structures were classified into twenty clusters based on the RMSD values for the main-chain atoms, various types of structures were obtained from the SAAP-MC trajectory.The representative examples are shown in Figure 4.The structures are different from each other with a diverse  Y1⋅⋅⋅M5 value, and there is no hydrogen bond in the three representative structures.In contrast, the structures of Met-enkephalin obtained from the AMBER-MD trajectory seemed to be trapped in local energy minimums during the simulation (see Figure S2).

Chignolin by SAAP-MC and AMBER-MD.
Twenty trajectories were obtained for chignolin by SAAP-MC simulation.The structures in each trajectory were classified to ten clusters based on the main-chain RMSD by using the -means method.According to the main-chain folds, that is, combination of the - dihedral angles for the amino acid residues, the obtained clusters were further classified to three representative structures, which are called here A, B, and C (Figure 5).Populations of the three structures are summarized in Table 1.Structure A has a native-like main-chain fold stabilized by three hydrogen bonds, which are the same as those observed in the native structure (i.e., H-bonds I-III).However, the hydrophobic interaction between the aromatic side chains of Tyr2 and Trp9 is not present.Structures B and C have nonnative conformation with a common type II turn around Thr6 and Gly7.A weak hydrogen bond exists between Glu5(O) and Thr8(N) with a distance about 4.0 Å.However, the conformations of the C-terminals are different from each other.Other minor structures have various local and global conformations, such as a type I -turn, a helical conformation, and other conformations.
In the trajectories from 1 to 12, the existence ratio of structure A was high (31.2 to 58.5%), whereas the ratio was much lower than that of the misfolded structure B in the remaining trajectories.The mean ratio for structure A averaged for all trajectories was 29.2%.On the other hand, when the clustering analysis was performed based on all-atom RMSD, the native-like structure (A  ) was obtained from 0 to 19.5% (Table 1).The existence ratio of structure A  averaged over all trajectories was 6.7%.Structure A  maintains three native H-bonds as well as a hydrophobic interaction between Tyr2 and Trp9 (Figure 5).The all-atom and main-chain RMSD values of structure A  from the native structure were 2.6 and 2.0 Å, respectively, suggesting that structure A  keeps the native fold.The trace of the main-chain RMSD with respect to the native structure is shown in Figure 6(a) for the case of trajectory 10, where structures A and B are involved in almost equal amounts.The RMSD values fluctuated between 1 and 6 Å, and repeats of the structural transitions were observed.Similarly, in the AMBER-MD simulation, the main-chain RMSD value fluctuated between 1 and 6 Å (Figure 6(b)).The structural clustering analysis showed that the native structure with three native hydrogen bonds was involved about 30% (Figure S5).
The convergence of the SAAP-MC simulation is not clear from Table 1.However, the similar relative populations were obtained for structures A (∼30% including 6% of A  ) and B (∼60%) when the simulation was started from the native structure.Moreover, in our preliminary results of the SAAP-MC simulation using ten trajectories, almost the same mean populations were obtained for the representative structures (data not shown).Based on these observations, we employed the twenty trajectories shown in Table 2 for further analysis.
The structure with the smallest all-atom RMSD (1.3 Å) obtained from the twenty trajectories is superimposed on the native structure in Figure 7.The main-chain and C atom RMSD values of this structure with respect to the native structure were 0.9 and 0.5 Å, respectively, confirming that the structure is almost identical to the native structure.
The Ramachandran-type free-energy surfaces obtained for Tyr2-Trp9 residues from all trajectories of the SAAP-MC simulation are shown in Figure 8.The profiles for Tyr2, Pro4, Glu5, and Thr6 fit well to the native structures, while the free-energy minimums for Asp3, Gly7, and Thr8 are slightly moved from the positions of the native structures.For Trp9, the stable areas corresponding to  and  regions tend to merge.This is roughly consistent with diversity in the Cterminal conformation of the native structure.The dihedral angles for structures A and A  are located around the energy minimum points on the free-energy surfaces.In contrast, structure B has different geometry from structures A and A  for Glu5 and Thr6: the  angles are ca.+110 ∼ 130 degree.Structure C has the geometry with  angles of Glu5, Thr6, Thr8, and Trp9 different from those of structures A and A  .During the SAAP-MC simulation, mismatch of the mainchain dihedral angles for Glu5, Thr6, and Gly7 to the native structure occurred at high ratios.
The free-energy surface of chignolin projected on Hbonds I versus III plane and that projected on the main-chain RMSD versus H-bond II plane obtained from all trajectories are shown in Figure 9.In Figure 9(a), the most stable area was found at  (H-bond I) ∼ 6.5 Å and  (H-bond III) ∼ 4.5 Å.This energy minimum corresponds to the misfolded structure B. Another potential energy minimum was found at  (H-bond I) ∼ 3.5 Å and  (H-bond III) ∼ 3.5 Å, corresponding to structures A and A  .Similarly, two stable minimums were located in Figure 9(b);  (H-bond II) ∼ 6.0 Å and RMSD ∼ 2.3 Å, corresponding to structure B, and  (H-bond II) ∼ 3.7 Å and RMSD ∼ 3 Å, corresponding to structure A. The free-energy surfaces showed that structure A maintains three native hydrogen bonds, but it has larger main-chain RMSD from the native structure than misfolded and the most stable structure B. This conflict is ascribed to the N-and C-terminals of structure A, which are significantly apart due to the absence of a hydrophobic interaction between Tyr2 and Trp9 side chains.It should be noted that structure A  is separated from structure A in Figure 9(b).

Evaluation of the Relative Energies of Structures A-C by Ab
Initio Calculation.To evaluate relative stabilities of structures

Trp9
(h) Figure 8: Ramachandran-type free-energy surfaces for Tyr2-Trp9 residues of chignolin obtained from all trajectories of the SAAP-MC simulation at 300 K in water along with the plots of the native structures determined by NMR [58].Contour lines are drawn in an interval of 1 kcal/mol.
A, A  , B, and C more accurately, single-point ab initio calculation was carried out.Table 2 lists the SCF and relative energies calculated in the IEFPCM solvent model.Although structure B was most populated in the SAAP-MC simulation, it is significantly unstable compared to the other structures, suggesting that structure B would not be populated in practical solutions.On the other hand, structure A  with a native fold was found to be the most stable.According to the ab initio calculation results, it is likely that the structures having more hydrogen bonds and/or a hydrophobic interaction between the aromatic rings are more stable.The energies of the main-chain unit were replaced with those obtained for the geometry with the tertbutyl cap being relaxed during the optimization.The effects of these modifications were found to be significant as the SAAP-MC simulation using the new parameters reproduced the native structure of chignolin (structure A  ), while it was not obtained when the simulation was performed by using the previous parameters [34].

Discussion
The clustering of the structures obtained for chignolin by SAAP-MC simulation using the improved parameters showed that structures A and B are dominant in the implicit water model as shown in Table 1.Structure A (29.2%) maintains three native hydrogen bonds, indicating the native main-chain fold, while nonnative structure B (43.2%) is stabilized by a type II -turn.In addition, the structural clustering based on all-atom RMSD demonstrated that native structure A  , which maintains not only the three native hydrogen bonds but also a hydrophobic interaction between the aromatic side chains of Tyr2 and Trp9, is involved in 6.7% of the total structures.The free-energy analysis clearly indicated that the main-chain dihedral angles of the each amino acid residue and the distances of H-bonds I-III for structures A and A  are consistent with the native structure (see Figures 8 and 9), although their populations were less than that of the misfolded structure B (vide infra).
In the meantime, randomly fluctuating structure was obtained for Met-enkephalin by using both the previous [34] and improved SAAPFF parameters (see Figures 3 and 4).This would be due to the two Gly residues, whose SAAP parameters have been modified only for the atomic charges in this study, being involved in the mid of the amino acid sequence.
There are a number of reports in the literature on the molecular simulation for chignolin by using conventional force fields to obtain the native structure from the unfolded state [55][56][57][58][59][60][61][62][63][64].In these studies, various state-of-theart techniques of molecular simulation have been employed, including a multicanonical MD method [55,56], a replica exchange MD method [57][58][59][60], a long-term conventional MD method by using a high-speed computer system [62], and originally developed MHOP and MSES methods [63,64].The values of all-atom RMSD for the obtained native structure were in a range from 1.3 to 1.8 Å with respect to the experimentally determined structure, which is smaller than that of structure A obtained as a major native-like structure by the SAAP-MC simulation in this study.Moreover, the AMBER-MD simulation carried out in this study showed that the native structure of chignolin is a little more frequently obtained by using the amber99sb force field than SAAPFF.Thus, it is obvious that further improvement of the SAAPFF parameters is required in future works.

Efficiency of SAAP-MC Simulation for Conformational Sampling.
Previous studies applying an explicit water molecule model [43] or a generalized-ensemble method with RISM theory [39][40][41][42] have shown that Met-enkephalin has extended conformations in water.This feature was reasonably reproduced in this study by application of SAAPFF as shown in Figures 3 and 4. Clustering of the structures obtained from the SAAP-MC simulation indicated diverse conformations of Met-enkephalin in water.On the other hand, the AMBER-MD simulation for Met-enkephalin carried out in this study produced rather uniform folded conformations with intramolecular hydrogen bonds (see Figure S2), probably due to insufficient computing time.Thus, the SAAP-MC method would be useful for sampling possible conformations for short peptide molecules in an implicit water model, although the accuracy is not satisfactory in terms of the reproducibility of the relative energies.
The efficiency for conformational sampling arises probably from a less number of variables used in the SAAPFF than that in the conventional force fields.In SAAPFF, the structure of peptides is defined only by the dihedral angles of the each amino acid unit, not by the cartesian coordinates of each atom.Acceleration of conformational sampling by reducing the number of structural parameters is a common strategy of coarse-grained force fields, such as united-atom force fields [18-26, 80, 81].Efficiency of the SAAP-MC simulation for conformational sampling can also be explained by the fact that the solvent effects are implicitly included in the parameters of SAAPFF.This enables the SAAP-MC simulation in an implicit water model to be performed in the same computing time as that in vacuo.

Prediction of Native Structures.
Although the accuracy of SAAPFF is not yet satisfied, the efficiency for conformational sampling would be advantageous for prediction of the stable structures of peptides in water.Therefore, we subsequently explored the application of the SAAP-MC method to predict the native structure of chignolin.
Indeed, it was found by single-point ab initio calculation in water at the HF/IEFPCM/6-31+G(d,p) level that structure A  is the most stable one in water among the structures shown in Figure 5 (see Table 2).The procedure should involve a large error in the energy because the structure is not optimized.However, the tendency of the relative energies would be reliable to some extent as previously demonstrated for HCO-Ala 2 -NH 2 in vacuo [33].Thus, a protocol of SAAP-MC simulation followed by structure clustering and examination of the obtained structures by ab initio calculation would be a useful strategy toward the prediction of the native structure for short peptides.The relative energies obtained by ab initio calculation also suggested the possibility that structure B can be produced as an artifact of the force field because the relative energy was quite large (>20 kcal/mol).
As for prediction of the native structure, another simpler method would be possible based on the hardness (i.e., the number of interamino acid interactions or the depth of the potential hole on the free-energy surfaces) of the structures.As seen in Figure 5, structure B of chignolin maintains only one hydrogen bond, while structures A and A  hold three hydrogen bonds.In terms of the number of hydrogen bonds, it can be assumed that structures A and A  are more nativelike than structure B, although the population of structure B was larger than those of structures A and A  .Indeed, the areas around the free-energy minimums corresponding to structures A and A  in Figures 9(a) and 9(b) are slightly narrower than those corresponding to structures B and C. Such robust structures on the conformational surface would be predicted to be more native-like than the structures with more conformational flexibility.Thus, we suggest that a protocol of SAAP-MC simulation followed by structural clustering and examination of the obtained structures by the number of hydrogen bonds (or the hardness) may be another simple strategy toward structure prediction for short peptide molecules.

Conclusions
The parameters of SAAPFF, which was previously developed to analyze the structures and folding of polypeptides, have been improved in several points in this study.(1) The mainchain, Pro, and Val parameters in water were replaced with those obtained by geometry optimization in the IEFPCM solvent model.(2) The atomic charges were replaced from Mulliken to ESP charges.(3) The energies of the main-chain unit were replaced with those obtained for the geometry with the tert-butyl cap being relaxed during the optimization.To investigate the accuracy of the improved SAAPFF parameters, the SAAP-MC simulation was carried out for Metenkephalin and chignolin as model peptides.Analysis of the obtained structures revealed that the SAAP-MC simulation reasonably reproduced randomly fluctuating conformation for Met-enkephalin, while three folded structures, one of which corresponds to a native-like structure with three native hydrogen bonds, were obtained for chignolin.The latter result supported that the accuracy of the SAAPFF parameters has been remarkably improved from the previous ones, although the native structure of chignolin could not be assigned to the most stable structure in the conformational space defined by SAAPFF.
In the meantime, efficiency of SAAP-MC simulation for conformational sampling was demonstrated for Metenkephalin as the SAAP-MC simulation afforded diverse structures.The feature, combined with the structural clustering analysis, was subsequently applied to the structure prediction of chignolin.Among the representative structures obtained by the clustering, structure A  with a native fold was assigned to the most stable structure according to ab initio calculation.Thus, a protocol of SAAP-MC simulation followed by structural clustering and examination of the obtained structures by ab initio calculation would be a useful strategy toward prediction of the native structure for short peptides.Alternatively, the structure with a maximum number of hydrogen bonds or with the hardest conformation on the potential surface may be considered to be the most nativelike among the clustered structures.Applications of this method to structure prediction of other peptide molecules as well as further improvement of the SAAPFF parameters are being conducted by our group.

3 H 3 CFigure 2 :
Figure 2: Structure of chignolin.Three native hydrogen bonds (H-bonds I, II, and III) and a hydrophobic interaction between Tyr2 and Trp9 are indicated by arrows [58].

3 ) 3 )Figure 3 :
Figure 3: Histograms of the distance between C atoms of Tyr1 and Met5 for 10,000 structures of Met-enkephalin obtained by the SAAP-MC (a) and AMBER-MD (b) simulations at 300 K in water.The results from ten trajectories were averaged.See the text for details of simulation conditions.

Figure 4 :
Figure 4: Representative structures of Met-enkephalin obtained by the SAAP-MC simulation at 300 K in water (trajectory 1).The obtained 10,000 structures were analyzed by a structural clustering algorithm using the -means method based on the main-chain RMSD.A distance of the hydrogen bond is given in a unit of Å.

Figure 5 :Figure 6 :
Figure 5: Representative structures of chignolin obtained by the SAAP-MC simulation at 300 K in water.The obtained 20,000 structures were analyzed by a structural clustering algorithm using the -means method.Structures (A), (B), and (C) were obtained based on the main-chain RMSD.Structure (A  ) was obtained based on the all-atom RMSD.Distances of the hydrogen bonds and the hydrophobic interaction are given in a unit of Å.

Figure 7 :
Figure 7: The structure of chignolin with the smallest all-atom RMSD (1.3 Å) obtained by the SAAP-MC simulation at 300 K in water (white) superimposed on the reference native structure (gray).

4. 1 .
Improvement of SAAPFF Parameters.In this study, we have modified the SAAPFF parameters in the following points.(1) The main-chain, Pro, and Val parameters in water were replaced with those obtained by geometry optimization in water.(2) The atomic charges were replaced from Mulliken to ESP charges.(3)

Figure 9 :
Figure 9: Free-energy surfaces of chignolin projected on the hydrogen bonds I versus III plane (a) and on the hydrogen bond II versus mainchain RMSD plane (b) obtained from all trajectories of the SAAP-MC simulation at 300 K in water.Contour lines are drawn in an interval of 1 kcal/mol.

Table 1 :
Existence ratios (%) of representative structures of chignolin obtained by the SAAP-MC simulation at 300 K in water.

Table 2 :
Relative energies of structures A-C determined by ab initio calculation at the HF/IEFPCM/6-31+G(d,p) level in water.