Model for Vaccine Design by Prediction of B-Epitopes of IEDB Given Perturbations in Peptide Sequence, In Vivo Process, Experimental Techniques, and Source or Host Organisms

Perturbation methods add variation terms to a known experimental solution of one problem to approach a solution for a related problem without a known exact solution. One problem of this type in immunology is predicting whether a peptide will act as an epitope after a perturbation or variation in the structure of a known peptide and/or in other boundary conditions (host organism, biological process, and experimental assay). However, to the best of our knowledge, there are no reports of general-purpose perturbation models to solve this problem. In a recent work, we introduced a new quantitative structure-property relationship theory for the study of perturbations in complex biomolecular systems. In this work, we developed the first model able to classify more than 200,000 cases of perturbations with accuracy, sensitivity, and specificity >90% in both training and validation series. The perturbations include structural changes in >50,000 peptides determined in experimental assays with boundary conditions involving >500 source organisms, >50 host organisms, >10 biological processes, and >30 experimental techniques. The model may be useful for the prediction of new epitopes or the optimization of known peptides towards computational vaccine design.


Introduction
The National Institute of Allergy and Infectious Diseases (NIAID) supported the launch, in 2004, of the Immune Epitope Database (IEDB), http://www.iedb.org/ [1][2][3][4]. The IEDB system has extracted information from approximately 99% of all papers published to date that describe immune epitopes. In doing so, the IEDB system analyzed over 22 million PubMed abstracts and subsequently curated ≈13 K references, including ≈7 K manuscripts about infectious diseases, ≈1 K about allergy topics, ≈4 K about autoimmunity, and 1 K about transplant/alloantigen topics [5]. IEDB lists a huge amount of information about the molecular structure as well as the experimental conditions ($c_j$) in which different $i$th molecules were determined to be immune epitopes or not. This explosion of information makes necessary both query/display functions for retrieval of known data from IEDB and predictive tools for new epitopes. Salimi et al. [5] reviewed advances in epitope analysis and predictive tools available in the IEDB. In fact, the IEDB analysis resource (IEDB-AR: http://tools.iedb.org/) is a collection of tools for prediction of molecular targets of T- and B-cell immune responses (i.e., epitopes) [6,7].
On the other hand, Quantitative Structure-Activity/Property Relationships (QSAR/QSPR) techniques are useful tools to predict new drugs, RNA, drug-protein complexes, and protein-protein complexes. In general, QSAR/QSPR-like methods transform molecular structures into numeric molecular descriptors in a first stage and later fit a model to predict the biological process. For example, DRAGON [8][9][10], CODESSA [11,12], MOE [13], TOPS-MODE [14][15][16][17], TOMO-COMD [18,19], and MARCH-INSIDE [20] are among the most widely used software packages to calculate molecular descriptors based on quantum mechanics (QM) and/or graph theory [21][22][23][24][25][26][27]. The software packages STATISTICA [28] and WEKA [29] are often used, with applications in areas like protein spectroscopy and others [53][54][55][56][57]. In a very recent work, Gonzalez-Diaz et al. [58] formulated a general-purpose perturbation theory or model for multiple-boundary QSPR/QSAR problems. However, there is no report in the immunoinformatics literature of a general QSPR perturbation model for IEDB B-epitopes. Here we report the first example of a QSPR-perturbation model for B-epitopes reported in IEDB, able to predict the probability of occurrence of an epitope after a perturbation in the sequence, the experimental technique, the exposure process, and/or the source or host organisms.

Materials and Methods
2.1. Molecular Descriptors for Peptides. We calculated the molecular descriptors of the structure of peptides using the software MARCH-INSIDE (MI), based on the algorithm of the same name [59]. The MI approach uses a Markov Chain method to calculate the $k$th mean values of different physicochemical molecular properties for the $i$th molecules ($m_i$). These molecular values are calculated as an average of the values for all atoms placed at topological distance $d \le k$, which are in turn the means of the atomic properties for all atoms in the molecule and its neighbors placed at $d = k$. For instance, it is possible to derive average estimations of molecular refractivities $^{k}\mathrm{MR}(m_i)$, partition coefficients $^{k}P(m_i)$, and hardness $^{k}\eta(m_i)$ for atoms placed at different topological distances $d \le k$. In this first work, we calculated only one type of these values. For all peptides, we calculated the average value $\chi(m_i)$ of the atomic electronegativities $\chi_j$ of all atoms connected to a given atom and of their neighbors placed at a distance $d \le 5$ [59]. We calculate the probabilities for any atomic property, including the probabilities $^{k}p(\chi_j)$ for electronegativity, using a Markov Chain model for the gradual effects of the neighboring atoms at different distances in the molecular backbone. This method has been explained in detail in many previous works, so we omit the details here [59].
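The averaging scheme described above can be illustrated with a minimal sketch. It is not the MARCH-INSIDE implementation: the function name, the transition weighting (an atom moves to itself or to a bonded atom with probability proportional to that atom's electronegativity), and the initial probabilities are assumptions made only for illustration, since the text defers those details to [59]. The sketch computes the mean electronegativity after k = 0 to 5 steps of the chain and then averages those six values.

```python
def mean_electronegativity(chi, bonds, k_max=5):
    """Average of the k-th Markov-chain mean electronegativities, k = 0..k_max.

    chi   : list of atomic electronegativities, one per atom
    bonds : list of (a, b) index pairs for bonded atoms
    The weighting scheme below is an assumption, not taken from the source.
    """
    n = len(chi)
    # neighbors[i] holds atom i itself plus every atom bonded to it
    neighbors = [{i} for i in range(n)]
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Row-stochastic matrix: transition[i][j] is proportional to chi[j] (assumed)
    transition = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(chi[j] for j in neighbors[i])
        for j in neighbors[i]:
            transition[i][j] = chi[j] / total
    total_chi = sum(chi)
    prob = [x / total_chi for x in chi]  # assumed initial (k = 0) probabilities
    means = []
    for _ in range(k_max + 1):
        # k-th mean: probability-weighted sum of atomic electronegativities
        means.append(sum(p * x for p, x in zip(prob, chi)))
        # advance the chain by one topological step
        prob = [sum(prob[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return sum(means) / len(means), means


# Toy three-atom chain C-C-O with Pauling electronegativities (illustrative only)
chi_avg, chi_k = mean_electronegativity([2.55, 2.55, 3.44], bonds=[(0, 1), (1, 2)])
```

Because every k-th mean is a probability-weighted combination of the atomic values, the result always lies between the smallest and largest atomic electronegativity in the molecule.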

2.2. Electronegativity Perturbation Model for Prediction of B-Epitopes. Very recently, Gonzalez-Diaz et al. [58] formulated a general-purpose perturbation theory or model for multiple-boundary QSPR/QSAR problems. Here we adapted this new theory or modeling method to approach the peptide prediction problem from the point of view of perturbation theory. Let there be a set of $i$th peptide molecules denoted as $m_i$, each with a value of efficiency $\varepsilon_{ij}$ as an epitope experimentally determined under a set of boundary conditions $c_j \equiv (c_0, c_1, c_2, c_3, \ldots, c_n)$.
We put the main emphasis here on peptides reported in the database IEDB. In this sense, the boundary conditions $c_j$ used here are the same as those reported in this database: $c_0$ is the specific peptide, $c_1$ = so, $c_2$ = ho, $c_3$ = ip, and $c_4$ = tq. In general, so is the organism that expresses the peptide (but it can also include artificial peptides, cellular lines, etc.) and ho is the host organism exposed to the peptide by means of the biological process ip, with the outcome detected using the experimental technique tq. One state function is defined for a peptide measured under the set of boundary conditions of the output, final, or new state. The conjugated state function is defined for a peptide measured under the set of boundary conditions of the input, initial, or reference state. The difference Δ between the new (output) state and the reference (input) state is the additive perturbation [58]. Equation (3) opens the door to testing different hypotheses. A simple hypothesis is H0: the existence of one small and constant value of the perturbation function Δ for all the pairs of peptides, together with a linear relationship. We can use elementary algebraic operations to obtain from these equations an expression for the efficiency as epitope of the new peptide, $\lambda(\varepsilon_{ij})_{new}$. In this case, under one further approximation, we can obtain different expressions; the last may be very useful for solving the QSPR problem for the large datasets formed by IEDB B-epitopes. Quantities marked with * are average values of the mean electronegativity $\chi(m_i)$ over all the peptides in IEDB that are epitopes for the same boundary condition.
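The displayed equations of this subsection are not reproduced here, so the following is only a sketch of one plausible reading, not the published model. It assumes that each input term compares a peptide's mean electronegativity with the average over all epitopes sharing the same boundary condition, and that a perturbation term is the change in that deviation between the reference state and the new state. The record fields (`chi`, `so`, `ho`, `ip`, `tq`, `epitope`), the function names, and the toy values are all hypothetical.

```python
from collections import defaultdict

# source organism, host organism, process, technique
CONDITIONS = ("so", "ho", "ip", "tq")


def condition_averages(records):
    """Average chi over epitope records for every value of every condition."""
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        if rec["epitope"] == 1:
            for cond in CONDITIONS:
                acc = sums[(cond, rec[cond])]
                acc[0] += rec["chi"]
                acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def deviations(rec, averages):
    """Deviation of one peptide's chi from the epitope average of its conditions.

    Raises KeyError if a condition value has no epitope record to average over.
    """
    return {cond: rec["chi"] - averages[(cond, rec[cond])] for cond in CONDITIONS}


def perturbation_terms(ref, new, averages):
    """Change in each deviation going from the reference to the new state."""
    d_ref, d_new = deviations(ref, averages), deviations(new, averages)
    terms = {"seq": new["chi"] - ref["chi"]}  # change in the peptide itself
    terms.update({cond: d_new[cond] - d_ref[cond] for cond in CONDITIONS})
    return terms


# Made-up records, for illustration only
records = [
    {"chi": 2.6, "so": "so_A", "ho": "ho_1", "ip": "ip_1", "tq": "tq_1", "epitope": 1},
    {"chi": 2.8, "so": "so_A", "ho": "ho_2", "ip": "ip_1", "tq": "tq_1", "epitope": 1},
]
averages = condition_averages(records)
terms = perturbation_terms(records[0], records[1], averages)
```

Each term is taken as new minus reference, following the statement that the additive perturbation is the difference between the new (output) state and the reference (input) state.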

Results and Discussion
We propose herein, for the first time, a QSPR-perturbation model able to predict variations in the propensity of a peptide to act as a B-epitope, taking into consideration the propensity of a reference peptide and the changes in peptide sequence, immunological process, host organism, source organism, and the experimental technique used.
In our analysis, based on the data reported by IEDB, we are unable to work with continuous values of epitope activity $\varepsilon_{ij}$. Consequently, we have to predict the discrete function of B-epitope efficiency, $\lambda(\varepsilon_{ij}) = 1$ for epitopes reported in the conditions $c_j$ and $\lambda(\varepsilon_{ij}) = 0$ otherwise. Our main aim is to predict the shift or change in this function of the output efficiency, $\Delta\lambda(\varepsilon_{ij}) = \lambda(\varepsilon_{ij})_{ref} - \lambda(\varepsilon_{ij})_{new}$, that takes place after a change, variation, or perturbation (Δ) in the structure and/or boundary conditions of a reference peptide. However, we already know the efficiency of the reference process, $\lambda(\varepsilon_{ij})_{ref}$, in addition to the molecular structure and the set of conditions $c_j$ for the initial (reference) and final (new) processes. Consequently, to predict $\Delta\lambda(\varepsilon_{ij})$ we only have to predict $\lambda(\varepsilon_{ij})_{new}$, the efficiency function of the new state obtained by a change in the structure of the peptide and/or the boundary conditions. Let Δ be a perturbation in a function; we can define a state information function for the reference and new states. According to our recent model [58], we can write this function in terms of the conditions and the structure of the peptide $m_i$. In fact, the variational state functions have to be written in pairs in order to describe the initial (reference) and final (new) states of a perturbation.
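The prediction task described above (known 0/1 label for the reference state, unknown label for the new state) can be sketched as a simple linear scoring rule followed by the usual classification statistics. This is an illustration under stated assumptions: the linear form, the cutoff, the coefficient values, and all function and parameter names are hypothetical, and no fitted coefficients from the paper are used.

```python
def predict_new_state(label_ref, terms, coef, intercept=0.0, cutoff=0.5):
    """Score the new state from the reference label and the perturbation terms.

    label_ref : 0/1 epitope label of the reference state
    terms     : dict of perturbation terms (hypothetical names)
    coef      : dict of coefficients, with key "ref" for the reference label
    Returns (score, predicted 0/1 label, shift = reference label - new label).
    """
    score = intercept + coef["ref"] * label_ref
    score += sum(coef[name] * value for name, value in terms.items())
    label_new = int(score >= cutoff)
    return score, label_new, label_ref - label_new


def classification_stats(observed, predicted):
    """Accuracy, sensitivity, and specificity for paired 0/1 label lists."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up coefficients and a single perturbation term, for illustration only
score, label_new, shift = predict_new_state(1, {"seq": 0.2}, {"ref": 0.8, "seq": -1.0})
stats = classification_stats([1, 1, 0, 0], [1, 0, 0, 0])
```

Returning the shift alongside the predicted label mirrors the point made in the text: once the reference label is known, predicting the label of the new state is enough to obtain the change between the two states.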

Table 2: Average values and count of input-output cases for different organisms, processes, and techniques.

Table 3: Top 100 values of p1 for positive perturbations in training series.