Identification of Protein Folding Cores Using Charge Center Model of Protein Structure

METHODS. This presentation introduces a novel method for identifying folding cores in proteins. Positive and negative charge centers, defined solely from the structure of the folded protein, are likely to indicate the protein's folding core. The residues and sequences identified by the procedure are highly consistent with the available biochemical data on folding, as seen for a set of about 20 proteins [1]. Here, the folding of chymotrypsin inhibitor-2 (CI2) from barley is analyzed in some detail.
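The abstract does not spell out how the charge centers are computed, so the following is only a minimal sketch under stated assumptions, not the procedure of [1]. It assumes that each charge center is the unweighted centroid of the formally charged side-chain atoms (Lys NZ, Arg NH1/NH2 for CC+; Asp OD1/OD2, Glu OE1/OE2 for CC-), and that "residues around a charge center" means hydrophobic residues with any atom within a cutoff distance. The atom tuple layout, the hydrophobic residue set, the 8 Å cutoff, and the function names are all hypothetical.

```python
import math

# Assumption: formal charges sit only on these side-chain atoms.
POSITIVE_ATOMS = {("LYS", "NZ"), ("ARG", "NH1"), ("ARG", "NH2")}
NEGATIVE_ATOMS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
# Assumption: illustrative set of hydrophobic residue names.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "GLY"}


def centroid(points):
    """Unweighted mean of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def charge_centers(atoms):
    """Return (CC+, CC-) for atoms given as
    (resname, resnum, atomname, x, y, z) tuples.
    A center is None if the structure has no atoms of that charge."""
    pos = [a[3:] for a in atoms if (a[0], a[2]) in POSITIVE_ATOMS]
    neg = [a[3:] for a in atoms if (a[0], a[2]) in NEGATIVE_ATOMS]
    return (centroid(pos) if pos else None,
            centroid(neg) if neg else None)


def hydrophobic_neighbors(atoms, center, cutoff=8.0):
    """Hydrophobic residues with at least one atom within `cutoff`
    (same length unit as the coordinates) of `center`."""
    near = set()
    for resname, resnum, _atomname, x, y, z in atoms:
        if resname in HYDROPHOBIC and math.dist((x, y, z), center) <= cutoff:
            near.add((resnum, resname))
    return sorted(near)
```

Given atoms parsed from a PDB entry, `charge_centers(atoms)` would return the two centers and `hydrophobic_neighbors(atoms, center)` the residue cluster around each one. A charge-weighted or partial-charge definition would change only the `centroid` step.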

Fig. 1. Stages of protein folding using CI2 as an example: (a) local hydrophobic collapse; (b) formation of the residue cluster(s) around the charge centers, followed by electrostatic collapse; (c) global hydrophobic collapse and formation of the active/binding site; (d) slow annealing to the native state. Shown are the HX (hydrogen exchange) residues (cyan), continuous hydrophobic sequences (green dots), residues of the folding nucleus (yellow), charge centers (positive, blue sphere; negative, red sphere) and the binding site (loop colored in red). See the text for details on the biochemistry of CI2 folding.

TABLE 1
Residues around the positive and negative charge centers (CC+, CC-), in the hydrogen exchange core (HX) and in the folding nucleus (NMR) of CI2 from barley (PDB 1ciq). Only hydrophobic residues are listed. Conserved residues are marked with '*'. The residues in this table make up less than 1/3 of all hydrophobic residues in CI2.

Let us consider the folding of CI2 stage by stage.

1. The analysis [1] suggests that local collapse of the continuous hydrophobic sequences (CI2 has three: 19-21 VIL, 29-35 IIVLPVG, and 49-51 LFV) is likely to be the earliest folding event. Indeed, NMR experiments and molecular dynamics simulations of CI2 unfolding [3] suggest "hydrophobic clustering" of the sequence 29-35 as well as 19-21 (which is part of the only helix, Fig. 1a).

2. The nucleation site for CI2 folding was suggested to include the regions of the protein around residues Ala-16, Leu-49 and Ile-57 [3]. These residues are found in the clusters around the charge centers (Table 1). The major event in folding seems to be the closing of the helix onto the beta-sheet [3], which agrees with the charge centers being located between the helix and the beta-sheet (Fig. 1b).

3. Hydrogen exchange cores are likely to be an important intermediate in folding [2]. Global hydrophobic collapse supposedly leads to formation of the active/binding site [1], which at least correlates with the structure of CI2 (Fig. 1c: the binding loop 54-64 [4] [35-35 in 1ciq] lies between the strands that are brought together in this stage).

4. Slow annealing (Fig. 1d) is perhaps the longest stage in folding and would include minor repacking of the half-folded protein.
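The continuous hydrophobic sequences of stage 1 can be located with a simple scan of the amino acid sequence. The sketch below is illustrative only: the hydrophobic alphabet is an assumption (it includes P and G because the text counts IIVLPVG as one continuous stretch), the minimum run length of 3 is inferred from the shortest examples above, and the function name and the `first_number` parameter are hypothetical.

```python
# Assumption: one-letter codes treated as hydrophobic; P and G are included
# because the text lists IIVLPVG as a single continuous hydrophobic sequence.
HYDROPHOBIC = set("AVLIMFWPG")


def hydrophobic_runs(sequence, first_number=1, min_len=3):
    """Return (start, end, subsequence) for every maximal run of
    hydrophobic residues of at least `min_len` residues.
    `first_number` is the residue number of sequence[0]."""
    runs = []
    start = None
    # The trailing "-" is a sentinel that closes a run ending at the C-terminus.
    for i, aa in enumerate(sequence + "-"):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((first_number + start,
                             first_number + i - 1,
                             sequence[start:i]))
            start = None
    return runs
```

For a toy sequence such as "KEVILDKIIVLPVGTRLFVE" (not the CI2 sequence), the function reports the stretches VIL, IIVLPVG and LFV with their residue ranges. For a real structure, `first_number` would be set to match the numbering of the PDB entry.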

DISCUSSION.
Although the role of electrostatics in folding has not been studied in detail [1], electrostatic interactions may be particularly important for the folding of CI2 [5]. For 16 out of 20 proteins, at least 80% of the core sequences could be correctly identified using the charge center model [1]. Note that this result is based on an "ab initio"-type calculation, which suggests that the model rests on solid physical chemistry. Basic physico-chemical considerations do suggest that electrostatic interactions are more effective at bringing together distant sequence regions than a "blind search" for the most efficient packing of the hydrophobic core [1].

CONCLUSION.
The analytical method, based on the charge center model, provides a detailed picture of the folding states. This picture is in good agreement with the available experimental data and also provides additional details that are not always available from experimental studies. Owing to the extreme simplicity of the calculation, the procedure is computationally highly efficient. Therefore, a method based on the charge center model could be used to identify potential folding cores in large sets of protein structures.