HomoSAR: An Integrated Approach Using Homology Modeling and Quantitative Structure-Activity Relationship for Activity Prediction of Peptides

3D-QSAR of peptides is a daunting task. The difficulty arises from the sheer number of conformational degrees of freedom of peptides, which makes alignment in a 3D grid an overwhelming task. In this paper, we propose a method of QSAR in which the alignment of peptides is shifted from 3D space to 1D space, making the alignment a very simple proposition. The method, called HomoSAR, is based on an integrated approach that uses the principles of homology modeling in conjunction with the QSAR formalism to predict and design new peptide sequences. The peptides to be studied are subjected to a multiple sequence alignment, which is followed by scoring every position in the peptide sequence against a reference peptide in the alignment through calculation of similarity indices. The similarity indices obtained for each position (amino acid residue) in the peptide form the “descriptor” values (independent variables), which are then correlated to the biological activity of the peptide by G/PLS techniques. As an application, the methodology is illustrated for the dataset of nonamer peptides that bind to the Class I major histocompatibility complex (MHC) molecule HLA-A*0201, as this dataset has been extensively studied. The models generated have statistically significant correlation coefficients and predictive $r^2$. The cross-validated coefficients ($q^2$) are in an acceptable range. The HomoSAR approach identifies amino acids and properties that are preferred or detrimental at every position in the peptide sequence. The approach is simple to use and is able to extract the information contained in the dataset to explain the underlying structure-activity relationships. The approach is also applicable to peptide sequences that are not all of uniform length.


Introduction
Peptides and proteins form one of the important components of all biological systems. Peptides are nature's choice to maintain homeostasis and combat disease conditions, and nature has endowed them with high potency (low dose), specificity, and selectivity (reduced side effects). The peptide backbone is highly flexible, and the side chains of amino acids can adopt a conformation complementary to the active site of the receptor so as to match the steric bulk and the hydrophobic and electrostatic forces. For example, in the case of alpha-conotoxins, which are a class of nicotinic acetylcholine receptor (nAChR) antagonists, a single amino acid substitution in alpha-conotoxin PnIA shifts the selectivity for the mammalian neuronal nicotinic acetylcholine receptor subtypes [1]. The flexibility of the backbone and the side chains also sets up difficulties in the rational design of peptide drugs. The process of experimental lead optimization of peptides becomes exponentially more cumbersome as the peptide length increases. For example, the design of a dipeptide would require an experimental scan through 20 × 20 combinations of amino acids to unravel the entire SAR of the dipeptide. Thus, the rational design of peptides is still a daunting task.
QSAR techniques have the power to quickly optimize a peptide sequence given a dataset of peptides with known biological activity. However, this is mostly restricted to the 2D-QSAR methods. There are very few examples in the literature that deal with the design and SAR of peptides by 3D-QSAR approaches, for example, the 3D-QSAR studies of DPP-IV dipeptides [2], MHC binding peptides [3][4][5][6][7], and, recently, studies on the δ opioid peptides rubiscolins [8]. The 3D-QSAR techniques are not without their inherent problems. The difficulty of obtaining a unique alignment of peptides for 3D-QSAR analysis makes the application of CoMFA or CoMSIA, the preferred methods for small-molecule 3D-QSAR, a daunting proposition. The alignment procedure becomes more complex and uncertain as the degrees of freedom increase with an increasing number of rotatable bonds. Receptor-based alignment methods may provide a way out, albeit at a high computational expense. It is for these reasons that 2D-QSAR techniques, being much simpler and quicker than the 3D techniques, are often used for studying peptides [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24]. The power of 2D-QSAR results is dictated by the property space spanned.
In view of these shortcomings, we propose a method for QSAR of peptides in which the central element of alignment is translated from 3D space to 1D space, where it can be executed easily and accurately while preserving the information content. The approach is called HomoSAR and is distantly related to the kernel-based approach of Salomon and Flower [23]. There are three basic steps in HomoSAR. The first step is the sequence alignment of the peptides in the training and test sets separately, as followed in homology modeling or comparative protein modeling procedures. Multiple sequence alignment is the method of choice since it takes into account all the peptides in the dataset. The second step scores all the peptides of the dataset against a reference peptide in the alignment using similarity indices. In the third step, the similarity indices calculated for each position in the peptide sequence are related to the binding activity by a suitable statistical algorithm. The three individual steps in HomoSAR are discussed in some detail below.
The central step in this approach is the sequence alignment. Sequence alignment is used for the detection of correspondences between amino acids of a reference peptide/protein and those of the query peptide/protein, and can be related to the structure and activity of the peptides. The alignment of amino acid sequences is a crucial step in homology modeling, and for this reason many different methods and programs have been published and are still being developed. The earliest attempt to clarify the structural similarity between protein sequences was by Needleman and Wunsch [25]. Variants of this algorithm have been developed independently by others and applied in many fields. ALIGN, BESTFIT, and GAP [26] are some of the computer programs that are widely used for sequence alignment. The original Needleman and Wunsch algorithm was written to handle only a pair of sequences, whereas several other programs have been developed to handle multiple sequence alignment. Recent ones in this category are CLUSTALW [27], MAXHOM [28], and so forth. HomoSAR uses multiple sequence alignment rather than pairwise alignment because of the ability of the algorithm to handle multiple sequences and thus reduce the bias of a single reference.
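The pairwise global alignment of Needleman and Wunsch mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the scoring values (match = 1, mismatch = -1, gap = -1) and the function name `needleman_wunsch` are assumptions made for the example, and HomoSAR itself relies on a multiple sequence alignment program rather than on this pairwise routine.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring scheme is a simple illustrative choice, not a homology matrix.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

In practice, the match and mismatch scores would be replaced by entries of a homology matrix such as those discussed in the next paragraph.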
The second step following sequence alignment is scoring or weighting the aligned sequences. In homology modeling, this is provided by the so-called homology matrices, which make use of the most probable amino acid substitutions according to physical, chemical, or statistical properties. Of the various available matrices [29][30][31][32][33][34][35], the following are frequently applied: the identity matrix, the codon substitution matrix, the mutation matrix (Dayhoff or PAM 250 matrix), and physical property matrices. HomoSAR uses one of the above-mentioned scoring matrices for the multiple sequence alignment; however, for the QSAR analysis, similarity indices calculated from specific amino acid properties are used. These indices are calculated for every amino acid in the peptide sequence in relation to the amino acid at that position in a reference peptide, as identified by the alignment procedure.
The third step involves relating the similarity indices for the amino acids in the sequences with the biological activity through the use of a robust statistical method which is efficient enough to identify relationships with statistical significance. The G/PLS algorithm is the statistical method used in the third step of HomoSAR, which through its evolutionary nature is able to pick out descriptor variables that have the closest relation to the biological activity.
Homology modeling helps in identifying the similarity between different peptide/protein sequences on the basis of mutation, identity, or hydrophobic pattern of the sequences. This means that similar/related sequences will have similar structures and in turn similar function. It is well accepted that activity is related to structure; therefore, variation in peptide sequences can be related to the variation in their activity distribution. Thus, the procedure of sequence alignment of peptides/proteins does establish a relationship between the activities and the sequences/structures, but is unable to quantify this relationship. On the other hand, QSAR, which deals with the relationship of structure with activity, establishes this relationship in a mathematical formulation. HomoSAR attempts to draw on the strengths of comparative protein modeling, also called homology modeling, to overcome some of the limitations inherent in peptide QSAR approaches. Thus, a union of the principles of homology modeling and the QSAR formalism can establish a novel means of understanding, in a quantitative fashion, the variation in peptide sequence with activity. HomoSAR could also be used to address the difficulties of correlating both sequence diversity and variation in length with activity. The relationship between the length of the peptide chain and activity is not very obvious. An increase or decrease in peptide length often has a variable effect on the activity. For every biological effect, there is an optimum length of the peptide for which the activity is the highest, and deviation from this optimal length reflects directly on the activity. Thus, identifying the optimum length for peptides is not always easy, though recognition of residues for affinity and activity may be somewhat simpler.
If all the peptides binding to a given receptor are of uniform length, then the overlay of the peptides is straightforward, provided the active site permits a snug fit of the peptides. However, if the active site encloses a large space, then peptides of varying length could be translated in relation to each other so as to attain a tighter binding in the active site. In such cases, a simple overlay of peptides cannot be used to impose the condition that the peptides share the same binding mode, but a good understanding of the binding mode can be gained through the sequence alignment technique in HomoSAR.

Computational Details
We demonstrate the HomoSAR methodology on a dataset of 128 nonameric peptides belonging to the HLA-A*0201 series. This dataset was chosen because it is one of the established peptide datasets in terms of structural diversity and wide distribution of activity, and it has been well characterized by both theoretical and experimental studies [3][4][5][6][7][16][17][18][19][20][21][22][23]. It is therefore a suitable test bed for the validation of the HomoSAR methodology. The dataset was divided randomly into a training set (87 molecules) and a test set (41 molecules) on the basis of the activity values, as shown in Table 1. For the present QSAR studies, the binding affinities of the peptides in the dataset were compiled from the literature [36][37][38][39][40][41][42][43][44][45][46][47][48] and transformed to $\mathrm{pIC}_{50}$ ($-\log \mathrm{IC}_{50}$) values, with $\mathrm{IC}_{50}$ expressed as a molar concentration.
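As a small illustration of the activity transformation described above, the following Python sketch converts an IC50 value to pIC50. The assumption that the raw value is reported in nanomolar units, and the function name `pic50_from_nanomolar`, are ours and are made only for the example.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in nM.

    The nM input unit is an assumption made for this example.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)
```

Under this convention, a stronger binder (smaller IC50) receives a larger pIC50, so the regression models are fitted to a quantity that increases with affinity.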

Multiple Sequence Alignment of the Peptides. The first step in HomoSAR involves an alignment of all the peptides in the dataset, shown in Figure 1. The alignment was executed using the DNASIS Max [49] sequence alignment software running on a Windows platform. The peptide sequences in the training and test sets were aligned separately by the multiple sequence alignment strategy. Peptide 102 in the dataset was chosen as the reference peptide for scoring (vide infra) following the alignment step. The eight nonapeptides that have been cocrystallized with the MHC protein and whose structures have been solved by X-ray crystallography (PDB codes 1AKJ, 1DUZ, 1HHG, 1HHJ, 1OGA, 1QEW, 1QSE, and 1QSF) were also included in the peptide alignment, as a check on the results obtained by the multiple sequence alignment protocol.

Similarity Indices.
Following alignment of the peptides in the dataset, the second step in HomoSAR involves calculating a similarity index for every amino acid position in the peptide sequence against the amino acid in the same position in the reference peptide (see Figure 1), as established by the alignment rule.
The similarity index ($S$) between peptide A (the reference peptide) and peptide B for the $i$th position in the sequences, for a given physicochemical property, is given by (1), where $S_{AB}[P]_i$ is the similarity between peptides A and B at the $i$th position in the peptide sequences for the physicochemical property $P$; $P_A^i$ and $P_B^i$ are the values of the physicochemical property of the amino acids at the $i$th position in the respective peptide sequences A and B. The denominator is a normalizing factor.

Physicochemical Properties [P] for Computing Similarity Indices. The properties [P] used to calculate the similarity indices (S) (1) are the properties of amino acids such as isotropic surface area (ISA), electronic charge index (ECI), hydrophobicity (HS), molar refractivity (MR), total dipole moment (TDM), and total lipole moment (TLM).
The similarity values for the peptides (1) are used as the X-variables (descriptors) for derivation of the QSAR models. These properties were selected as they describe the steric, electronic, and hydrophobic nature of the amino acids, which are key descriptors of the binding process. The significance of the properties used to calculate the similarity indices is discussed below.

Isotropic Surface Area (ISA). Isotropic surface area (ISA) is the portion of the solute molecule which is accessible for nonspecific interactions with water. The nonspecific interactions are those between water and solute molecules other than hydrogen bond interactions. The ISA is calculated as the sum of the surfaces over the side chain atoms accessible to nonspecific solvent interactions. Surfaces which interface the waters of hydration and the solute are excluded from the ISA [13]. Thus, ISA provides a means to quantify the hydrophobic nature of the solute molecules.

Electronic Charge Index (ECI). Electronic charge index (ECI) is the sum of the absolute values of the CNDO/2 charges of the side-chain atoms [13]. It is a measure of the local polarity at the amino acid side chain. A significant contribution of ECI to activity may indicate the presence of dipolar interactions of the side chain with the receptor site. It is calculated by the following formula:
$$\mathrm{ECI} = \sum_i |q_i|, \qquad (2)$$
where $q_i$ is the atomic charge of the $i$th atom in the amino acid side chain.
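The definition of ECI as a sum of absolute side-chain atomic charges is straightforward to express in code. The sketch below is illustrative only; the function name `electronic_charge_index` is an assumption, and any charges passed to it in an example are hypothetical placeholders rather than CNDO/2 results from this study.

```python
def electronic_charge_index(side_chain_charges):
    """Sum of the absolute atomic charges over the side-chain atoms."""
    return sum(abs(q) for q in side_chain_charges)
```

Because absolute values are summed, a side chain with large charges of opposite sign (a polar group) gives a large ECI even when its net charge is zero.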

Valence Relative Chirality Index ($^{v}$RCI).
The valence relative chirality index allows distinction between the R and S chiral isomers, a difference that is reflected in the activity of the molecule but that the regular physicochemical properties cannot capture. In the calculation of the relative chirality indices [50], the three groups in descending priority attached to the chiral center are viewed from a reference point to calculate the new chirality metric.
The groups/atoms a, b, c, and d are then assigned valence delta values ($\delta^v$) according to the method of Hall and Kier. The group delta value for any group ($\delta^v_i$) attached to a chiral carbon is calculated by (3), where n1 is the atom attached directly to the chiral center (nearest neighbor), n2 is attached to n1, n3 to n2, and so on. The relative chirality indices ($^{v}$RCIs) for a pair of enantiomers are calculated by (4).

Hydrophobicity Scale (HS).
The estimated hydrophobic effects [51] (kcal/mol) are values based on the contribution of the hydrophobic effect to the burial of each type of amino acid residue and side chain, obtained by analyzing the multitude of hydrophobicity scales. The scale estimates the free energy for transferring a residue from water to a nonaqueous solvent, that is, the affinity of a residue for the solvent. The hydrophobic scale for the amino acid side chains is calculated as the difference between the estimated hydrophobic effect for the individual amino acid burial and that for the glycine residue. It describes the thermodynamics of the partitioning of nonpolar compounds between water and a nonaqueous phase. This scale has been calculated to overcome the flaws of a set of previous hydrophobic scales which account for the partitioning of the amino acid residue between aqueous and organic solvents.

Total Dipole Moment (TDM).
It is a partial-charge-dependent parameter calculated on the basis of the center of charge over the substitution as the origin [52][53][54][55]. Tsar3.3 [56] uses an empirical procedure called Charge-2 for the rapid evaluation of partial atomic charges, which utilizes two fundamental chemical concepts: the inductive effect in saturated molecules and Hückel molecular orbital calculations for π systems. The total dipole moment along the amino acid side chain describes the electrostatic interaction at the receptor site. It is calculated as follows:
$$\mathrm{TDM} = \sum_i r_i q_i, \qquad (5)$$
where $r_i$ is the distance of the $i$th atom from the origin and $q_i$ is the atomic charge of the $i$th atom.

Total Lipole Moment (TLM).
The lipole of a molecule is a measure of the lipophilic distribution [57]. It is calculated from the sum of atomic log P values. This property has been calculated for the amino acid side chains using Tsar3.3 [55]. It is calculated using
$$\mathrm{TLM} = \sum_i r_i l_i, \qquad (6)$$
where $r_i$ is the distance of the $i$th atom from the origin and $l_i$ is the atomic log P of the $i$th atom.

Molar Refractivity (MR).
The molar refractivity of a molecule is a combined measure of its size and polarizability [57]. This fragment constant thermodynamic descriptor relates the effect of substituents on a reaction center from one type of process to another. The basic idea behind the use of such a descriptor is that similar changes in structure are likely to produce similar changes in reactivity, ionization, and binding. It can be experimentally determined or theoretically calculated using empirical rules. This property has been calculated using the method described by Viswanadhan et al. as implemented in Tsar3.3 [56]. It is calculated as
$$\mathrm{MR} = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{\mathrm{MW}}{d}, \qquad (7)$$
where $n$ is the refractive index, MW is the molecular weight, and $d$ is the density of the substituent group.
A few other similarity indices were derived from the properties described above. These new similarity indices were derived for "dipeptide" pairs, that is, neighboring amino acids ($i$, $i+1$), denoted $S_{AB}[P]_{ij}$, and for "tripeptide" segments, that is, amino acids in a 1-3 relationship ($i$, $i+1$, $i+2$), denoted $S_{AB}[P]_{ijk}$, using one of the above-mentioned properties [P].
The similarity indices for peptides "A" and "B" for "dipeptide" pairs, that is, neighboring residues $i$ and $i+1$, are given by (8), where $j = i + 1$ and $S_{AB}[P]_i$ and $S_{AB}[P]_j$ are the similarity indices calculated using (1) for positions $i$ and $j$, using property [P].
Likewise, the similarity between peptides "A" and "B" computed for "tripeptide" segments, that is, three successive amino acids $i$, $j$, and $k$, is given by (9), where $j = i + 1$ and $k = i + 2$, and $S_{AB}[P]_i$, $S_{AB}[P]_j$, and $S_{AB}[P]_k$ are the similarity indices calculated using (1) for positions $i$, $j$, and $k$ using property [P].
Likewise, three other variables were calculated. The total similarity between peptides A and B is given by (10), where $S_{AB}[P]_i$ is the similarity index for position $i$ in the two sequences according to (1). Likewise, (11) is the sum of the similarity indices for all dipeptides in the sequences A and B, as defined by (8), and (12) is the sum of the similarity indices for all tripeptide motifs in the sequences A and B, as defined by (9).
Every amino acid in the query sequence is assigned a similarity index (1) on the basis of a particular amino acid property [P] against the amino acid at that particular position in the reference peptide (see Figure 1), as defined by the sequence alignment rule. When there is a gap in the alignment because no amino acid in the query sequence can be matched, the position in the query is assigned a zero (0) value for the similarity index (see Figure 1). When a gap occurs in the alignment because no amino acid match occurs in the reference sequence but an amino acid is found in the query sequence, this position in the query sequence is penalized with a negative value of its similarity index calculated against glycine. The matrix containing the similarity indices calculated for a particular property [P] for all sequences in the training set forms the X-variables in the QSAR table, which is correlated with the biological activity (Y-variable).
During the multiple sequence alignment, there are peptide sequences which translate to the right or left of the reference peptide. The amino acids in the query peptides which are aligned to the left of the first amino acid in the reference peptide are marked by additional position numbers with a negative sign, while the amino acids in the query peptide which are aligned to the right of the last amino acid in the reference peptide are marked with positive position numbers, as seen in Figure 1.
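The gap-handling rules described above can be sketched as a short Python routine that turns one aligned query peptide into a row of the QSAR table. This is a sketch under stated assumptions: the similarity function is passed in as a parameter, and the `toy_similarity` shown here is a hypothetical stand-in for equation (1), not the expression used in this work. The function names, the gap symbol "-", and any property values supplied by the caller are likewise illustrative.

```python
GAP = "-"  # assumed gap symbol in the aligned sequences


def toy_similarity(p_ref, p_qry):
    """Hypothetical placeholder for equation (1); assumes properties scaled to [0, 1]."""
    return 1.0 - abs(p_ref - p_qry)


def descriptor_row(reference, query, prop, similarity):
    """Per-position similarity descriptors for one aligned query peptide.

    reference, query: aligned sequences of equal length, with GAP for gaps.
    prop: mapping from one-letter residue code to a property value.
    similarity: function (p_ref, p_qry) -> similarity index.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    row = []
    for ref_res, qry_res in zip(reference, query):
        if qry_res == GAP:
            # No residue in the query at this position: index is zero.
            row.append(0.0)
        elif ref_res == GAP:
            # Residue in the query but none in the reference: penalize with
            # the negative of the similarity computed against glycine.
            row.append(-similarity(prop["G"], prop[qry_res]))
        else:
            row.append(similarity(prop[ref_res], prop[qry_res]))
    return row
```

Stacking such rows for all training peptides, one block of columns per property, would give the descriptor matrix that is passed to the regression step.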

QSAR Models and Statistics.
The regression procedure, the third step in HomoSAR, was carried out with the program Cerius2 (v4.11, Accelrys Inc., San Diego, Calif, USA) [58] running on a RedHat Linux Enterprise WS 4.0 workstation and on an SGI Fuel workstation (Silicon Graphics Inc., Calif, USA). Other modeling and computations were carried out using InsightII (v2005L, Accelrys Inc., USA) [58] running on a RedHat Linux Enterprise WS 4.0 workstation. All QSAR equations were generated with the genetic function approximation/partial least squares (G/PLS) method [59,60] as implemented in Cerius2, with 10 000 generations, a population size of 500, a smoothness value (d) of 1.0, 6 PLS components, and no scaling of descriptors. The models were generated with equation lengths varying from 7 to 11. The rest of the parameters were set at their default values. The QSAR models were generated for similarity indices calculated for all properties described in Sections 2.3.1 to 2.3.7 collectively. The total number of X-variables (the similarity indices) was 315.

Results and Discussion
In a previous paper, we reported the Hansch approach, using specific properties of the amino acids as descriptors, and the Free-Wilson method to understand the SAR of HLA-A*0201 nonamer peptides [24]. The approaches were able to throw light on how the variation in amino acids at the nine positions influences the activity, but could not shed light on why minor similarities or dissimilarities in the peptide sequences cause large variation in the activity. The methods also fall short in explaining whether all the peptides have the same binding pose within the MHC protein. It is not always true that peptides of the same length have the same binding mode. There is a possibility that one sequence may glide or translate in the binding pocket relative to another amino acid sequence of the same length, thus affecting the binding affinity. Thus, simply overlaying peptides in the active site (the atom-based alignment in CoMFA) may be insufficient for understanding peptide QSAR. HomoSAR is a QSAR technique based on homology modeling, which is an efficient tool for identifying peptide/protein sequences that have a strong underlying relationship in terms of structure and function (activity). The method also uses similarity indices that are based on amino acid properties reflecting important binding attributes (electrostatic, steric, and hydrophobic) to score the peptide sequences aligned against the reference sequence.
The training set and the test set were separately aligned against the 8 peptides whose crystal structures have been solved. The 8 peptides show perfect sequence alignment without any gaps, which is in harmony with their identical binding modes as seen in the X-ray structures. This places confidence in the alignment results for both the training and test set peptides.

$S_{AB}$: relative valence chiral index over the 1st and 2nd position dipeptide pair after the C-terminal
$S_{AB}[\mathrm{TDM}]_{78}$: total dipole moment over the 7th and 8th position dipeptide pair
$S_{AB}[\mathrm{TDM}]_{678}$: total dipole moment over the 6th, 7th, and 8th position tripeptide segment
$S_{AB}[\mathrm{ECI}]_{678}$: electronic charge index over the 6th, 7th, and 8th position tripeptide segment
$S_{AB}[\mathrm{TDM}]_{45}$: total dipole moment over the 4th and 5th position dipeptide pair
$S_{AB}[\mathrm{ECI}]_{5}$: electronic charge index over the 5th position
$S_{AB}$: total dipole moment over the 9th position and the 1st and 2nd positions after the C-terminal
$S_{AB}[\mathrm{HS}]_{89}$: hydrophobic scale over the 8th and 9th position dipeptide pair
$S_{AB}[\mathrm{ISA}]_{78}$: isotropic surface area over the 7th and 8th position dipeptide pair
$S_{AB}[\mathrm{HS}]_{23}$: hydrophobic scale over the 2nd and 3rd position dipeptide pair
$DS_{AB}[\mathrm{HS}]$: hydrophobic scale over the entire peptide
$S_{AB}[\mathrm{ISA}]_{234}$: isotropic surface area over the 2nd, 3rd, and 4th position tripeptide segment
$S_{AB}[\mathrm{ISA}]_{123}$: isotropic surface area over the 1st, 2nd, and 3rd position tripeptide segment

The models derived by HomoSAR along with their statistical data are presented in Table 2. All models constructed are statistically significant. The models were internally validated using cross-validation by the leave-one-out (LOO) and leave-group-out (LGO) protocols and by bootstrapping. The models were also tested for their predictive power on a test set. The predictive $r^2$ ($r^2_{\mathrm{pred}}$) for the models is given in Table 2. The plot of the experimental versus predicted binding affinities for the best model is given in Figure 2. The affinities predicted by the best HomoSAR model are given in Table 1. All 500 equations were analyzed to identify the properties associated with each position in the peptide sequence that best account for the biological activity. The frequency of appearance of each property at the different positions in the peptide sequence in the QSAR equations is shown by the bar graph in Figure 3. The results of the QSAR models for the HLA-A*0201 dataset, indicating the preferred nature and type of the amino acid at each position in the sequence, are discussed below.
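The validation statistics mentioned above can be illustrated with a short Python sketch. It assumes the commonly used definitions $q^2 = 1 - \mathrm{PRESS}/\mathrm{SS}$ for leave-one-out cross-validation and $r^2_{\mathrm{pred}} = 1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y}_{\mathrm{train}})^2$ for the test set; the exact conventions used by Cerius2 may differ. For brevity, a one-descriptor least-squares line stands in for the G/PLS model, and all function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out cross-validated q^2 for the one-descriptor model."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for k in range(len(xs)):
        # Refit the model without sample k, then predict sample k.
        train_x = xs[:k] + xs[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[k] - (slope * xs[k] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


def r2_pred(train_mean, y_test, y_test_pred):
    """Predictive r^2 on an external test set, relative to the training mean."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_test_pred))
    ss = sum((y - train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss
```

A model that predicts the test set no better than the training-set mean gives $r^2_{\mathrm{pred}} = 0$, which is why values well above zero are taken as evidence of external predictive power.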

The term $DS_{AB}[\mathrm{HS}]$ appears with high frequency in the QSAR equations; it is the sum of the similarity indices for hydrophobicity of "dipeptide" pairs in the sequence, thus indicating the prevalence of hydrophobic character over the entire length of the peptide as a significant attribute for activity. This is in line with the nature of the binding cavity of the MHC protein [61]. The models also emphasize hydrophobic character for residues at the 2nd and the 3rd positions of the nonamer peptide. This is in harmony with the QSAR studies reported on this dataset [3,19,24]. Further, at position 4, a small increase in the hydrophobic nature is predicted to improve the affinity of the peptide.
The models point to the need to strike a balance for amino acids at positions 7 and 8; these should be residues with sufficient hydrophobic character as well as a capacity for dipolar interaction with the receptor. This is supported by the X-ray crystal structures of the HLA-A*0201 complexes, which show residues like tyrosine, tryptophan, and phenylalanine at these positions making dipolar contacts in the binding pocket. The term $S_{AB}[\mathrm{ECI}]_{456}$, the electronic similarity index for the "tripeptide" segment spanning positions 4, 5, and 6, emerges with a negative frequency. This means that the electronic character of the amino acids at the three positions 4, 5, and 6 needs to be lowered to an optimal level to enhance binding; this is more so for the 5th position in the sequence. This insight into the requirements for positions 4, 5, and 6 was not revealed in the "descriptor-based QSAR" study [24], but the observations are in line with earlier papers [3,19].
It is appealing to note from the terms appearing in the HomoSAR models that there needs to be a considerable increase in the electronic property of the amino acid occupying positions 6 and 7, while maintaining sufficient hydrophobic character at these positions. This requirement is in agreement with the "binary QSAR" approach [24].
There is a nominal appearance of the similarity terms for the extended positions 10 and 11 (see Figure 1) at the C-terminal end of the peptide. These terms show that an increase in the chain length at the C-terminal end is possible; however, there can be no extension at the N-terminal end. This is in accordance with the fact that decapeptides do show decent levels of biological activity [61]. The standard QSAR methods are unable to extract this information about the peptide length and activity.
The analysis of the HomoSAR models has led to the design of some new peptides with predicted affinity higher than that of the peptides listed in Table 1. The peptide sequences with their predicted affinities are given in Table 3.

Conclusions
The complexity in peptide design by 3D-QSAR methods arises because of several variables. First, the large number of degrees of freedom makes secondary structure determination difficult. Second, as the peptide length increases from two to ten, the probability of arriving at the optimal alignment becomes very remote. The problem is aggravated when peptides of varying length have the same level of activity. For this reason, while 2D/3D QSAR has been very successful in the design and discovery of small molecules, successful applications to peptides are few and far between. The HomoSAR approach is an attempt to solve the problem of peptide QSAR primarily by moving the crucial step of alignment in 3D-QSAR from 3D space to the less complex 1D space. This has been achieved by adopting the principles of homology modeling into the QSAR formalism. As an application to the MHC class of peptides, the technique was able to extract the known SARs reported for this class as well as reveal a few that were hitherto unknown. The HomoSAR approach is also able to give an idea of the relative binding mode the query peptides can have in relation to the reference peptide. Thus, this technique can be gainfully employed to understand and optimize the relationship between activity and the position and nature of amino acids in any peptide sequence, without resorting to the cumbersome 3D spatial analysis. In conclusion, HomoSAR, as a union of homology modeling and QSAR principles, is a useful tool in the medicinal chemist's armamentarium for designing peptide ligands.

Figure 1: A picture of the alignment of HLA-A*0201 peptides along with position-wise similarity indices calculated by (1). $S_{AB}[\mathrm{ISA}]_i$: similarity indices for the query peptide [B] aligned against the reference peptide [A] for positions $P_{-4}$ to $P_{+2}$; the similarity indices have been calculated by (1) using the property isotropic surface area (ISA).

Figure 2: (a) Plot of experimental versus predicted activity for the training set. (b) Plot of experimental versus predicted activity for the test set.

Figure 3: Frequency of appearance of the physicochemical property associated with different positions in the sequence in the HomoSAR models.

Table 1: HLA-A*0201 dataset (used for studying QSAR by the HomoSAR approach) with the experimental [$\mathrm{pIC}_{50}$] and predicted affinity [$\mathrm{pIC}_{50}$].

Table 2: HomoSAR models with the statistical data.

Table 3: Some of the newly designed peptides with their affinities for the HLA-A*0201 molecule as predicted by the best HomoSAR model.