Defining immunogenic domains of viral proteins capable of eliciting a protective immune response is crucial in the development of novel epitope-based prophylactic strategies. This is particularly important for the selective targeting of conserved regions shared among hypervariable viruses. Studying postinfection and postimmunization sera, as well as cloning and characterizing monoclonal antibodies (mAbs), still represents the best approach to identifying protective epitopes. In particular, a protective mAb directed against conserved regions can play a key role in immunogen design and in human therapy as well. Experimental approaches aiming to characterize protective mAb epitopes or to identify T-cell-activating peptides are often burdened by technical limitations and can take a long time to complete. Thus, in the last decade many epitope predictive algorithms have been developed. These algorithms are continually evolving, and their use to guide empirical research is steadily increasing. Here, we review several strategies based on experimental techniques alone or addressed by
The development of vaccines directed against clinically relevant viral pathogens is perhaps the most important contribution of immunology to public health. Traditional vaccine preparations are based on attenuated or inactivated whole viruses or partially purified viral proteins. These strategies, although effective against a large number of pathogens, present drawbacks due to intrinsic viral characteristics such as poor or null
In order to overcome these issues, quite a number of novel approaches have been developed, one of the most promising focusing on epitope-based vaccine preparation.
The possibility to use minimal structures such as peptides, or a mixture of them, as the main constituent of a vaccinal preparation, presents many advantages. Firstly, peptides can be easily produced
This strategy also offers safety benefits, eliminating the problems related to back mutations in attenuated viruses and reducing side effects due to possible improper immune responses against viral antigenic determinants.
Perhaps the most important aspect of using well-characterized synthetic peptides as immunogens is related to the specific triggering of both humoral and cell-mediated immune responses against a fundamental domain of a viral protein. Moreover, the possibility to remove antigen (Ag) domains activating suppressor mechanisms may elicit only a protective response targeting conserved functional regions shared among hypervariable viruses [
Despite these advantages, to date no epitope-based vaccines have been used in clinical practice. This is mainly due to low immunogenicity and difficulties related to the fine identification of protective epitopes and/or properly folded antigen structural motifs to be included in a vaccinal preparation. The latter is fundamental to properly activate an effective immune response. Furthermore, a main goal for a successful epitope-based vaccine approach is the identification of epitopes capable of eliciting both humoral and cell-mediated responses [
Different strategies, spanning from antigen presentation techniques to
The described approaches to characterize protein structural motifs to be included in new vaccines targeting hypervariable viruses. The synergistic use of techniques combining experimental and
A crucial step in epitope-based vaccine design is the identification of antigens capable of eliciting a protective immune response specific for a pathogen of interest. Depending on the characteristics of the virus to be targeted, the relative importance of the humoral and cellular responses changes. As an example, the former plays a crucial role in conferring specific immunity against influenza virus infection. Many studies have focused on the characterization of protective monoclonal antibodies (mAbs) targeting hemagglutinin (HA) regions widely conserved among different influenza subtypes [
Different methods, either exclusively based on experimental approaches or involving the use of
Structural resolution of a specific mAb in complex with its target through X-ray crystallography or nuclear magnetic resonance (NMR) is to date the only procedure that provides interaction information at the atomic level [
Mass spectrometry (MS)-based techniques allow mAb epitopes to be defined at medium resolution. All MS approaches aim to identify the mAb footprint on the targeted antigen [
Mimotopes are small peptides able to mimic antigenic conformational structures recognized by an antibody (Ab) paratope. The most frequently used approach to isolate specific mimotopes recognized by a mAb is based on the screening of a random peptide phage display library through biopanning techniques [
Selected peptides are then sequenced, aligned to the antigen sequence, and, if available, superimposed on its three-dimensional (3D) structure, allowing the identification of the immunogenic domain. This process often requires the use of specific
Identification of mimotopes is a powerful technique, as it makes it easy to map many antigenic determinants at the same time using a polyclonal serum or to identify a single mAb epitope at medium resolution [
Continuous epitopes account for ~10% of all known antibody epitopes; although they are a minority of the epitopes found in nature, many computational methods focus on mapping them [
Sequence-based algorithms represent the first attempt to predict B-cell epitopes located on a protein surface without
Considering the amino acid scale-based methods as a starting point, novel algorithms combining different propensity scales and machine-learning methods have been developed. While the former strategy did not lead to substantial improvements, machine-learning methods have proven their efficacy when tested, exceeding the
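As a rough illustration of the amino acid scale-based idea, the sketch below averages a per-residue propensity value over a sliding window and flags windows above a cutoff. It is a minimal sketch under stated assumptions, not the implementation of any published tool: the hydrophilicity values are those commonly tabulated for the Hopp-Woods scale and should be checked against the original table, while the window size, the threshold, and the function names are arbitrary choices made for this example.

```python
# Hydrophilicity values as commonly tabulated for the Hopp-Woods scale.
# Treat them as illustrative and check them against the original table.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_scores(sequence, scale, window=7):
    """Return (center_index, mean_scale_value) for every full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[residue] for residue in segment) / window
        scores.append((start + half, mean))
    return scores


def candidate_positions(sequence, scale, window=7, threshold=1.0):
    """Return 0-based window centers whose mean value reaches the threshold.

    The threshold is an arbitrary assumption made for this example.
    """
    return [
        center
        for center, mean in window_scores(sequence, scale, window)
        if mean >= threshold
    ]


if __name__ == "__main__":
    toy_sequence = "LLLLLLL" + "KKKKKKK"  # toy input, not a real antigen
    print(candidate_positions(toy_sequence, HOPP_WOODS))
```

Any other per-residue scale (flexibility, accessibility, and so on) can be passed in place of the hydrophilicity dictionary, which is how several scales are typically compared on the same sequence.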
In the last few years several machine-learning algorithms exploiting Support Vector Machine (SVM) have been implemented as well, leading to a progressive prediction improvement in terms of accuracy, sensitivity, and specificity [
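For reference, the three performance measures mentioned above can be computed from per-residue labels as in the following minimal sketch. The label encoding (1 for an epitope residue, 0 for a non-epitope residue) and the function names are assumptions made for this example.

```python
def confusion_counts(true_labels, predicted_labels):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fp = tn = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == 1 and prediction == 1:
            tp += 1
        elif truth == 0 and prediction == 1:
            fp += 1
        elif truth == 0 and prediction == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def performance(true_labels, predicted_labels):
    """Return accuracy, sensitivity, and specificity as a dictionary."""
    tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the true labels")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # fraction of epitope residues found
        "specificity": tn / (tn + fp),  # fraction of non-epitope residues rejected
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    print(performance(truth, predicted))
```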
Recently Lin et al. developed the algorithm BEEPro, an SVM-based machine-learning method which uses fourteen physicochemical scales to generate a hybrid propensity scale including antigenicity, hydrophilicity, flexibility, composition, volume, charge transfer and donor capability, hydrogen bond donor capability, and secondary structure features. This scale is then combined with an amino acid ratio propensity scale, representing the propensity of each amino acid to be part of an epitope, and with a position-specific scoring matrix (PSSM), which reflects the evolutionary information of a peptide [
Using these parameters, BEEPro was trained with the Sollner dataset, which comprises many non-redundant linear epitopes, and proved able to predict both linear and conformational epitopes efficiently, outperforming other prediction algorithms [
Conformational epitope mapping represents a challenging goal in different biological and medical fields. In the last few years many algorithms capable of predicting conformational epitopes have been developed. They can be divided into structure-based and sequence-based algorithms.
Structure-based algorithms work on three-dimensional (3D) protein structures obtained through either X-ray crystallography or NMR and exploit different spatial parameters as well as amino acid statistics. CEP [
DiscoTope is a method oriented to conformational epitope prediction; the algorithm bases its prediction on the combination of hydrophilicity, an amino acid propensity score derived from a dataset of resolved antibody/antigen structures, the spatial neighborhood of each residue, and relative solvent accessibility [
After CEP and DiscoTope, many other machine-learning methods to predict conformational epitopes starting from a 3D structure have been developed; PEPITO (
Moreover, new algorithms try to improve analysis and broaden targets using linear sequences when structures are not available. ElliPro (
Despite the effort, none of the structure-based methods has reached high efficiency in terms of accuracy, sensitivity, and specificity. This might be due to several factors; first of all, the number of resolved antibody/antigen structures is too small to provide a robust statistical sampling of all possible epitopic patches. Moreover, datasets are affected by the low resolution of some structures. Another issue is the lack of consideration of proteins as complexes
Considering these efficiency issues and the limited availability of antigen structures, novel sequence-based methods have been developed. The first attempt is represented by the CBTOPE (
Recently two more sequence-based algorithms, the aforementioned BEEPro and the method published by Zhang et al., outperformed CBTOPE. The results achieved by these three algorithms are related to the use, in addition to many physicochemical properties, of matrices that try to identify specific nonlinear patterns for epitopic and non-epitopic patches.
Considering the results achieved by CBTOPE, Zhang et al. explored further sequence-derived features potentially relevant to conformational epitope prediction. Besides physicochemical characteristics and the propensity of amino acids to be part of an epitope, residue side chains were clustered into thirteen classes to compute the propensity for each of them; moreover, a PSSM was used, as in BEEPro, to calculate evolutionary conservation. A term representing the secondary structure is included as well. The random forest machine-learning algorithm is then used to classify each query protein patch on the basis of every feature, creating an output ensemble, and then to rank the results. It is interesting to note that Zhang et al. determined the PSSM to be the most effective feature in predicting epitopes, explaining BEEPro performance [
While moving towards an epitope-based vaccine strategy, both humoral and cell-mediated responses have to be taken into account (Figure
Characterizing protective T-cell epitopes involves several issues related to the complexity of their processing and presentation on MHC I and MHC II; merely screening all possible MHC-binding peptides does not directly correlate with their role in inducing immunity. Physiological pathogen-specific T-cell activation involves several steps, comprising antigen digestion by the proteasome/immunoproteasome, interaction with the transporter associated with antigen processing (TAP) protein for MHC I binding, binding to MHC, and TCR recognition. Efficient T-cell epitope prediction has to take all these aspects into account; ideal immunogenic peptides must thus be efficiently processed by the immunoproteasome and delivered by TAP into the endoplasmic reticulum to bind to MHC I. Moreover, considering the human leukocyte antigen (HLA) allelic diversity, effective vaccine peptides have to be recognized by haplotypes widely shared among the population [
To date many online tools are available to predict cleavage, TAP translocation, and HLA specificity for MHC I and MHC II binding. Several databases reporting binding peptides are available online as well. The synergistic use of these tools can noticeably restrict the number of peptides to be experimentally analyzed. Here we describe
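The way successive predictions restrict the candidate list can be sketched as a simple chain of filters. This is a minimal sketch under stated assumptions: the three predicates below are toy stand-ins invented for this example and do not reproduce the rules of any real cleavage, TAP, or MHC binding predictor; in practice each would wrap the thresholded output of a dedicated tool.

```python
from typing import Callable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    start: int      # 0-based start position in the protein
    peptide: str


def enumerate_peptides(protein: str, length: int = 9) -> List[Candidate]:
    """List every overlapping peptide of the given length."""
    return [
        Candidate(i, protein[i:i + length])
        for i in range(len(protein) - length + 1)
    ]


def filter_candidates(
    protein: str,
    filters: List[Tuple[str, Callable[[str], bool]]],
    length: int = 9,
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply each named filter in order; report how many peptides survive."""
    kept = enumerate_peptides(protein, length)
    report = [("all peptides", len(kept))]
    for name, keep in filters:
        kept = [candidate for candidate in kept if keep(candidate.peptide)]
        report.append((name, len(kept)))
    return kept, report


# Toy stand-ins (hypothetical rules, for illustration only).
TOY_FILTERS = [
    ("cleavage", lambda p: p[-1] in "LVIMFY"),  # assumed C-terminal preference
    ("TAP", lambda p: p[0] != "P"),             # assumed transport rule
    ("MHC I", lambda p: p[1] in "LM"),          # assumed position-2 anchor
]

if __name__ == "__main__":
    survivors, report = filter_candidates("SLYNTVATLGG", TOY_FILTERS)
    print(report)
    print(survivors)
```

Reporting the count after each step makes it easy to see which stage removes most candidates before any peptide is synthesized.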
As described previously, protective T epitopes prediction has to take into account different aspects.
A first analysis can be easily done using databases of well-characterized peptides recognized by T cells (Table
Examples of the most commonly used databases and sequence-based algorithms for T-cell epitopes prediction.
Databases | Link | Algorithms used (cited ones) |
---|---|---|
Immune Epitope Database (IEDB) |  | Stabilized Matrix Method, NetMHC, NetMHCIIpan, NetChop |
SYFPEITHI |  | SYFPEITHI |
HIV Molecular Immunology Database |  |  |
IMGT/HLA Database |  |  |

Sequence-based algorithms | Link | Brief description |
---|---|---|
SYFPEITHI |  | Use of anchor residues |
BIMAS |  | MHC I epitopes predictor |
Stabilized Matrix Method |  |  |
NetMHC |  | Artificial neural network |
NetMHCIIpan |  | Artificial neural network |
PROPRED |  | Use of quantitative matrices derived from the literature |
NetChop |  | Artificial neural network |
FragPredict |  | Proteasomal cleavage sites and proteolytic fragments predictor |
Another example of a database comprising a large number of characterized peptides available in the literature is SYFPEITHI (
Other more specific databases are available to date, most notably the HIV-dedicated B- and T-cell epitope database (
Selecting target HLAs is another crucial step in epitope-based vaccinology, as an effective preparation has to include protective epitopes capable of binding MHCs in the majority of individuals; the IMGT HLA database (
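To illustrate why allele selection matters, the sketch below estimates the fraction of individuals carrying at least one allele from a chosen set at a single HLA locus. It is a simplified calculation under stated assumptions: Hardy-Weinberg proportions, one locus considered in isolation, and hypothetical allele names and frequencies that do not come from any database.

```python
def locus_coverage(covered_allele_frequencies):
    """Fraction of individuals with at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: if the covered alleles have a summed
    frequency F, an individual lacks all of them with probability (1 - F)**2.
    """
    total = sum(covered_allele_frequencies.values())
    if not 0.0 <= total <= 1.0:
        raise ValueError("summed allele frequencies must lie between 0 and 1")
    return 1.0 - (1.0 - total) ** 2


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    frequencies = {"allele_1": 0.25, "allele_2": 0.25}
    print(locus_coverage(frequencies))
```

Because each individual carries two alleles per locus, a set of alleles with a summed frequency of 0.5 already covers three quarters of the population under these assumptions.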
Several algorithms are currently used in T-cell epitopes prediction. Considering the increasing importance of
Structure-based MHC binding prediction methods can be grouped into three main categories, based on protein threading, homology modeling, or protein-protein docking. Protein-threading methods use a known peptide/MHC complex structure to predict the binding features of other peptides to the same MHC; this process involves the substitution of the original peptide with the one to be tested, followed by optimization of the side-chain orientations [
Homology modeling has been used to predict MHC-binding peptides and potentially represents an improvement over threading methods, since it allows the modeling of both novel peptides and homologous MHC molecules starting from a crystallographic structure [
Docking techniques differ from protein threading and homology modeling since they do not rely on a template peptide; their aim is in fact to explore all possible orientations of the query peptide in binding to MHC molecules. Many different docking-based approaches have been extensively used, based either on rigid docking evaluation or on molecular dynamics and Monte Carlo simulations performed to find the best-fitting geometry and evaluate binding strength [
Sequence-based methods have been far more developed, considering their low computational cost and their independence from available crystallographic structures. As happened for B-cell epitope prediction algorithms, in the last decade these methods have significantly improved and, starting from simple statistical sequence analysis, have moved towards machine-learning methods.
The first attempts were based on the evidence that the MHC binding groove presents pockets with specific residues that require a certain degree of complementarity with specific epitope residues, defined as anchor residues; these algorithms thus search for this type of residue at specific positions, which make the largest contribution to MHC/epitope binding. However, this strategy completely ignores the contribution of nonanchor residues, resulting in predictions lacking specificity and sensitivity [
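An anchor-residue search of this kind can be sketched as a motif scan over all peptides of fixed length. The motif below is illustrative only, loosely modeled on the commonly reported HLA-A*02:01 preference for L or M at position 2 and V or L at the C-terminus; the function names and the toy sequence are assumptions made for this example.

```python
# Illustrative motif: 1-based peptide position -> allowed anchor residues.
# Loosely modeled on commonly reported HLA-A*02:01 anchors; an assumption here.
EXAMPLE_MOTIF = {2: "LM", 9: "VL"}


def matches_motif(peptide, motif):
    """True if every anchor position carries one of its allowed residues."""
    return all(peptide[position - 1] in allowed
               for position, allowed in motif.items())


def scan_for_anchors(protein, motif, length=9):
    """Return (1-based start, peptide) for each peptide matching the motif."""
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        if matches_motif(peptide, motif):
            hits.append((start + 1, peptide))
    return hits


if __name__ == "__main__":
    print(scan_for_anchors("GGSLYNTVATLGG", EXAMPLE_MOTIF))
```

Because only two positions are inspected, any peptide with the right residues there is reported regardless of the other seven, which is the loss of specificity and sensitivity noted above.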
From a simple search for specific residues, new algorithms moved towards a binding matrix-based strategy that takes into account residue frequencies at each epitope position; scoring matrices are built on the sequences of experimentally known binders and comprise information about position-specific frequencies and binding affinity. Binding-matrix algorithms return more reliable results, and some of them, such as SYFPEITHI (
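The matrix-based idea can be sketched as follows: position-specific residue frequencies are estimated from a set of known binders of equal length, converted to log-odds values, and summed along a query peptide. This is a minimal sketch under stated assumptions (uniform background frequencies, a fixed pseudocount, and made-up toy binders); it does not include binding affinity weighting and is not the scoring scheme of any specific tool.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(binders, pseudocount=1.0):
    """Build a log-odds matrix (one dict per position) from aligned binders.

    Assumes a uniform background frequency of 1/20 for every residue.
    """
    length = len(binders[0])
    if any(len(peptide) != length for peptide in binders):
        raise ValueError("all binders must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for position in range(length):
        column = [peptide[position] for peptide in binders]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            residue: math.log(
                ((column.count(residue) + pseudocount) / total) / background
            )
            for residue in AMINO_ACIDS
        })
    return matrix


def score_peptide(peptide, matrix):
    """Sum the position-specific log-odds values along the peptide."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the matrix length")
    return sum(matrix[i][residue] for i, residue in enumerate(peptide))


if __name__ == "__main__":
    toy_binders = ["ALAAAAAAV", "ALAAAAAAV", "AMAAAAAAL"]  # made-up peptides
    matrix = build_matrix(toy_binders)
    print(score_peptide("ALAAAAAAV", matrix))
    print(score_peptide("GGGGGGGGG", matrix))
```

Unlike a pure anchor search, every position contributes to the score, so nonanchor residues can raise or lower the ranking of a peptide.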
Novel algorithms evolved and adopted machine-learning approaches such as artificial neural networks (ANNs), hidden Markov models (HMMs), and SVMs; these algorithms have the advantage of handling nonlinear data. ANN algorithms are among the best predictors; they represent epitope features as amino acid descriptors and perform complex pattern recognition after being trained with a dataset of epitopic and nonepitopic peptides. Their main drawback is that they can predict epitopes only when the query peptides and the peptides in the training dataset are of the same length. Considering the length variability of MHC II epitopes, the peptides contained in the dataset must be aligned to search for a pattern in a sequence core of defined length [
To date there are tens of online tools to predict MHC I and MHC II epitopes; considering the lack of standardization in datasets, the heterogeneity of output features, and the highly variable performance of the same algorithm depending on the HLA type, defining the most reliable predictor is not trivial. Lin et al. defined a standard benchmark protocol for both MHC I and MHC II predictors and tested the performance of the most used algorithms [
Although MHC binding prediction algorithms have reached high performance, they do not take into account the biological processes involved in epitope production; predicted epitopes might not actually be produced from antigen degradation [
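The search for a fixed-length core inside peptides of variable length can be sketched as a register scan: every window of the core length is scored and the best one is kept. This is a minimal sketch under stated assumptions; the scoring function below is a toy stand-in invented for this example, whereas a real method would score each window with a trained matrix or network and refine the alignment iteratively.

```python
def best_core(peptide, score_core, core_length=9):
    """Return (0-based offset, core) of the highest-scoring window.

    score_core is any callable mapping a core string to a number.
    When several windows tie, the first one is returned.
    """
    if len(peptide) < core_length:
        raise ValueError("peptide is shorter than the core length")
    offsets = range(len(peptide) - core_length + 1)
    best = max(offsets,
               key=lambda i: score_core(peptide[i:i + core_length]))
    return best, peptide[best:best + core_length]


def toy_core_score(core):
    """Toy stand-in scorer (an assumption, not a real binding model)."""
    return int(core[0] in "FWY") + int(core[-1] in "LV")


if __name__ == "__main__":
    toy_peptide = "GGG" + "F" + "A" * 7 + "L" + "GGG"  # made-up 15-mer
    print(best_core(toy_peptide, toy_core_score))
```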
Among the others, the ANN algorithm NetChop-3.0 (
Experimental techniques for T-cell epitope mapping can be roughly divided into two main groups, cell-based and cell-free.
Cell-based techniques mainly involve the screening of synthetic peptides on T-cell populations to evaluate binding specificity. The aforementioned computational methods play a fundamental role in focusing the analysis on a selected cohort of peptides, reducing the number of potential ligands to be tested. Hereafter, we review the most common approaches used to date [
A broadly used cell-based approach is the enzyme-linked immunospot (ELISPOT) assay [
Other cell-based assays rely on flow cytometry techniques that allow the selection of activated T cells. A widely used approach involves culturing T cells in the presence of putative epitopes and a secretion inhibitor [
Lymphoproliferation assays also rely on cytometric detection; they are based on the uptake of the CFSE dye by T cells before activation [
The use of cell-based techniques presents several advantages, most notably the possibility of testing the putative T-cell-activating peptides directly against target cells. The main drawback is the need for preliminary computational studies to guide the experiments and reduce the time and resources required.
Many cell-free methods have been developed to identify a defined antigen region potentially able to stimulate an effective T-cell response. Here, we briefly review one of the most promising approaches adopted in this research field [
Several approaches combining the use of computational analysis with laboratory techniques have been widely described in the scientific literature [
The first example concerns the epitope characterization of PN-SIA28, a mAb endowed with potent neutralizing activity against highly phylogenetically divergent isolates of influenza A virus and directed against a conserved region of the surface glycoprotein hemagglutinin. PN-SIA28 has been characterized through different experimental and
As previously described, T-cell epitope prediction requires the use of databases and bioinformatic tools to guide experimental studies. Predictive algorithms are employed to significantly reduce the number of putative peptides to be tested against T cells. As an example, Wang et al. used the NetCTL server, which relies on ANN-based algorithms to predict proteasomal cleavage, TAP interaction propensity, and MHC binding, to obtain a limited number of putative HLA-binding peptides derived from influenza A proteins [
Hypervariable viruses still represent a major threat to world health. The identification of conserved protein domains, shared among the different viruses and able to elicit a protective immune response, opens new perspectives in the development of epitope-based vaccines. In particular, the discovery of protective mAbs able to target these broadly shared protein motifs makes it possible to identify peptides able to mimic these epitopes and, hopefully, to elicit a similarly protective immune response. Moreover, the possibility of identifying peptides able to elicit an effective T-cell response against these viruses can greatly improve the efficacy of a new vaccine formulation able to elicit both T- and B-cell protective responses (Figure