Accurate measurement of B and T cell responses is a valuable tool to study autoimmunity, allergies, immunity to pathogens, and host-pathogen interactions and to assist in the design and evaluation of T cell vaccines and immunotherapies. In this context, it is desirable to establish a method for selecting validated reference sets of epitopes to allow detection of T and B cells. However, the ever-growing information contained in the Immune Epitope Database (IEDB) and the differences in quality and subjects studied between epitope assays make this task complicated. In this study, we develop a novel method to automatically select reference epitope sets according to a categorization system employed by the IEDB. From the sets generated, three epitope sets (EBV, mycobacteria, and dengue) were experimentally validated by detection of T cell reactivity
Adaptive immunity is based on the recognition of specific molecular structures, named epitopes, by either antibodies/B cell receptors or T cell receptors. Antibodies and B cell receptors bind a wide variety of structures, including proteins and carbohydrates. In the case of protein ligands, antibodies can recognize either a series of contiguous residues (linear epitopes) or a set of residues encoded in disparate regions of the protein sequence and brought together in the three dimensional structure of the protein ligand (discontinuous epitopes).
T cells recognize a complex between MHC molecules (named HLA in humans and H-2 in mouse) and, in most cases, a peptidic epitope of 8–16 residues in length [
Accurate measurement of B and T cell responses is a valuable tool to study autoimmunity, allergies, immunity to pathogens, and host-pathogen interactions and assist in the design and evaluation of T cell-based vaccines and immunotherapies [
As immune investigations proceed over time, many different epitopes from various organisms have been identified. Alternatively, large-scale epitope identification can reveal hundreds of potential epitopes [
This large amount of information might, in some cases, pose a challenge for the identification and selection of appropriate sets of epitopes for use in specific contexts. Thus, it is clearly desirable, given the ever-growing body of information contained in the IEDB, to develop tools to enable the efficient generation of sets of validated reference epitopes for any antigenic source of interest.
While an epitope, according to classic definitions, is any structure capable of interacting with T and B cell receptors, in practice the consensus in the scientific community is that certain types of assays identify the most relevant and validated epitopes. In the context of antibody reactivity, by way of example, epitopes identified on the basis of X-ray structures of Ag/antibody complexes, biological activity, and
Indeed, for many applications, it is desirable to study T cells
An important consideration in defining reference sets of epitopes is how to factor in the number of individual donors or experiments in which a given structure is reported to elicit a positive response, and particularly whether this validation is provided in multiple independent studies. For example, different studies often report on essentially the same epitope but use different nested, truncated, or frame-shifted versions of the same sequence, leaving uncertainty about how to combine the data or which particular version of the epitope to select for testing. Clarifying a general approach for combining data from such disparate studies would greatly facilitate the generation of nonredundant sets of epitopes.
In the present study, we sought to define an automated process to generate reference sets of high quality epitopes for various disease indications. The resulting tool, made available to the scientific community, provides a standardized and reproducible platform to automatically extract and process relevant data from the IEDB without requiring complex analysis or judgment calls from the user. At the same time, the tool offers the flexibility for the end user to design sets meeting specific user-defined criteria. We have also analyzed the data currently available in the IEDB to determine how many sets of pathogen- or autoantigen-specific epitopes could be identified on the basis of the data available to date.
Epitope data were derived from the IEDB as of October 2014. Queries were run directly against the database using MySQL. The web application is written in PHP/HTML, with a MySQL connection linking the database and the user interface.
Two independent scoring systems were developed to allow ranking and sorting of the epitopes. The first was based on the type of assays used to characterize the epitopes and the second on the frequency with which each epitope was recognized.
Regarding the assay type scoring system for MHC class I or class II epitopes, in our selection we included epitopes defined by multimer/tetramer staining, ELISPOT, and ICS assays. We arbitrarily associate a numerical parameter value of 3, 2, and 1 to these assay types, respectively. Each of these assay types can be used in either an
To score each epitope on the basis of the frequency with which it was recognized, we used a previously described Response Frequency (RF) score [
Both the assay score and RF score are calculated for each epitope and provided in the results. This allows further ranking or selecting epitopes based on different thresholds for these criteria.
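As an illustration of this kind of threshold-based selection, the following minimal Python sketch filters and ranks epitopes by the two scores. The `Epitope` class, the `select_epitopes` function, and the threshold parameter names are hypothetical and are not part of the IEDB tool; the example values are taken from the Parvoviridae example set reported in the Results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    """Hypothetical record holding the two scores reported for each epitope."""
    sequence: str
    rf_score: float
    assay_score: int


def select_epitopes(epitopes, min_rf=0.0, min_assay=0):
    """Keep epitopes meeting both thresholds, ranked by RF score, then assay score."""
    kept = [
        e for e in epitopes
        if e.rf_score >= min_rf and e.assay_score >= min_assay
    ]
    return sorted(kept, key=lambda e: (e.rf_score, e.assay_score), reverse=True)


# Values copied from the Parvoviridae example table (sequence, RF score, assay score).
EXAMPLE = [
    Epitope("FYTPLADQF", 0.51, 12),
    Epitope("GLCPHCINV", 0.46, 4),
    Epitope("QPTRVDQKM", 0.29, 3),
    Epitope("LLHTDFEQV", 0.21, 4),
    Epitope("RMTENIVEV", 0.0, 12),
]

if __name__ == "__main__":
    for e in select_epitopes(EXAMPLE, min_rf=0.2, min_assay=4):
        print(e.sequence, e.rf_score, e.assay_score)
```

With `min_rf=0.2` and `min_assay=4`, this keeps FYTPLADQF, GLCPHCINV, and LLHTDFEQV, in that order; QPTRVDQKM is dropped by the assay threshold and RMTENIVEV by the RF threshold.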
For MHC class I epitopes, it is generally observed that a length of about 8–11 residues is optimal for T cell recognition and use in assays. Because of the structure of the class I binding groove, distinct class I sequences typically represent unique epitopes, even if they are nested within a longer sequence that is also recognized by T cells. Accordingly, for the present study, we have not subjected class I epitopes of nested or overlapping character to further processing.
For MHC class II epitopes, however, optimal epitopes are usually longer than the minimal T cell recognized 9-mer core. In general, class II epitopes are optimally of 13–20 residues in length [
Example of a dataset reduction of MHC class II epitopes.
Epitopes before being processed by the clustering tool; epitopes forming a potential consensus sequence or cluster are in bold
Epitope/cluster | RF score | Assay score |
---|---|---|
MLVLLVAVLVTAVYAFVHA | 0.67 | 8 |
|
|
|
VPSPSMGRDIKVQFQSGGAN | 0.65 | 12 |
NVTSIHSLLDEGKPT | 0.63 | 12 |
|
|
|
AQAAVVRFQEAANKQKQELD | 0.47 | 12 |
|
|
|
FAGIEAAASAIQGNV | 0.42 | 12 |
The cluster generated by combining the sequences and associated information is in bold
Epitope/cluster | RF score | Assay score |
---|---|---|
MLVLLVAVLVTAVYAFVHA | 0.67 | 8 |
VPSPSMGRDIKVQFQSGGAN | 0.65 | 12 |
|
|
|
AQAAVVRFQEAANKQKQELD | 0.47 | 12 |
FAGIEAAASAIQGNV | 0.42 | 12 |
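The kind of reduction shown in the tables above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the clustering procedure used by the tool: it assumes that two peptides belong to the same cluster when one is nested in the other or when they overlap end to end by at least 9 residues (the 9-mer core length mentioned above), and it merges only the sequences without combining RF or assay scores. The function names and the `min_overlap` parameter are hypothetical, and the two overlapping 15-mers in the example are fragments constructed from a 20-mer in the table purely for demonstration.

```python
def merge_pair(a, b, min_overlap=9):
    """Return the merged sequence if a and b are nested or overlap end to end
    by at least min_overlap residues; otherwise return None."""
    if b in a:
        return a
    if a in b:
        return b
    for left, right in ((a, b), (b, a)):
        max_k = min(len(left), len(right)) - 1
        # Try the longest possible suffix/prefix overlap first.
        for k in range(max_k, min_overlap - 1, -1):
            if left[-k:] == right[:k]:
                return left + right[k:]
    return None


def cluster_peptides(peptides, min_overlap=9):
    """Greedily merge peptides until no remaining pair can be merged."""
    clusters = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                combined = merge_pair(clusters[i], clusters[j], min_overlap)
                if combined is not None:
                    clusters[i] = combined
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    # Hypothetical nested and frame-shifted fragments of VPSPSMGRDIKVQFQSGGAN,
    # plus one unrelated sequence from the table.
    peptides = [
        "VPSPSMGRDIKVQFQ",
        "GRDIKVQFQ",
        "MGRDIKVQFQSGGAN",
        "FAGIEAAASAIQGNV",
    ]
    print(cluster_peptides(peptides))
```

A greedy pairwise merge is used here for brevity; depending on the data, a different grouping rule (for example, one that caps the length of the merged sequence) might be preferable.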
In the case of donors with latent tuberculosis infection (LTBI), leukapheresis or whole unit blood samples from 10 adults were obtained from the University of California, San Diego, Antiviral Research Center (AVRC) clinic. Donors were classified as LTBI based on a positive QuantiFERON-TB Gold In-Tube test (Cellestis), as well as a physical exam and/or chest X-ray that was not consistent with active tuberculosis. Because Dengue virus (DENV) prevalence is low in the San Diego area, most LTBI donors are DENV naïve.
To obtain DENV seropositive samples, anonymous blood donations from healthy adults were obtained by the National Blood Center, Ministry of Health, in the area of Colombo, Sri Lanka. Plasma of the associated donation was tested for serology using the flow-based U937+DC-SIGN neutralization assay (conducted at the University of North Carolina, Chapel Hill) as previously described [
All samples were collected and used following guidelines from the Institutional Review Boards (IRB) of LJI and the Medical Faculty, University of Colombo (serving as National Institutes of Health-approved IRB for Genetech Research Institute).
15-mer peptides were synthesized as crude material on a small (1 mg) scale by Mimotopes (Victoria, Australia) and/or A and A (San Diego). PBMCs were purified by density gradient centrifugation (Ficoll-Hypaque, Amersham Biosciences) from 100 mL of leukapheresis sample or 450 mL of whole blood, according to manufacturer’s instructions. Cells were cryopreserved in liquid nitrogen suspended in fetal bovine serum (Gemini Bio-products) containing 10% dimethyl sulfoxide.
PBMCs (
As a preliminary step towards deriving sets of reference epitopes associated with preferred validated assays, we processed the data contained in the IEDB relating to T cell epitopes. As of October 2014, a total of 28370 epitopes are associated with positive results in at least one T cell assay.
As an example of filtering strategies, we first considered only peptidic epitopes associated with infectious agents and allergies (Figure
Diagram of the filtering steps towards the generation of the validated sets of epitopes, including the number of epitopes found in each step.
The next step in our process was to filter the results further by selecting epitopes that have been tested in “high quality” assays. This is possible because the IEDB curates the specific assays that are used to define and characterize the specific epitopes reported in the literature or provided to the database by direct submission. While obviously any desired assay set could be used, here we selected for inclusion the multimer/tetramer staining, ELISPOT and ICS assays. This assay-based filtering resulted in a final total of 6345 epitopes, 2512 and 3833 for class I and class II epitopes, respectively (Figure
We surveyed the epitope data in the IEDB in terms of the species and antigens of provenance (epitope sources). For this purpose we adapted the categorization adopted by Seymour et al. [
In Table
(a) Number of epitopes per category for viruses and bacteria. (b) Number of epitopes per category for eukaryotes (nonhuman). (c) Number of epitopes per category for autoimmune epitopes (human and mouse).
Class I | Class II | B cell | ||||||
---|---|---|---|---|---|---|---|---|
Linear | Discontinuous | |||||||
HLA | H-2 | HLA | H-2 | Human | Mouse | Human | Mouse | |
Virus | ||||||||
ssRNA (−) strand virus | ||||||||
H1N1 subtype influenza A | 41 | 77 | 207 | 206 | 15 | 11 | 27 | |
H3N2 subtype influenza A | 18 | 18 | 92 | 17 | 11 | 22 | 81 | |
Other influenza A subtypes | 116 | 43 | 179 | 25 | 43 | 16 | 81 | |
Influenza B/C | 13 | 36 | ||||||
Paramyxoviridae | 47 | 80 | 28 | 27 | 33 | 71 | ||
Hantavirus | 14 | |||||||
ssRNA (+) strand virus | ||||||||
Dengue virus | 432 | 116 | 58 | 139 | 24 | 29 | 97 | |
Hepatitis C virus | 405 | 65 | 241 | 20 | 53 | 32 | 27 | 12 |
West Nile virus | 33 | 99 | 103 | 24 | ||||
Yellow fever | 18 | 34 | 94 | 118 | ||||
Japanese encephalitis virus | 33 | |||||||
Picornaviridae (coxsackie, | 14 | 12 | 52 | 77 | 94 | |||
Coronaviruses | 22 | 38 | 38 | 25 | 23 | 28 | 27 | |
Retrotranscribing virus | ||||||||
Hepatitis B virus | 59 | 72 | 38 | 18 | 13 | |||
| 25 | |||||||
dsDNA virus | ||||||||
Adenoviruses | 13 | 45 | 10 | |||||
Alphaherpesvirinae | 91 | 52 | 35 | 13 | 32 | 32 | ||
Betaherpesvirinae (CMV, | 204 | 48 | 141 | 20 | ||||
Gammaherpesvirinae | 237 | 63 | 59 | |||||
Papillomaviridae | 80 | 44 | 72 | 30 | ||||
Poxviridae (vaccinia, pox) | 228 | 343 | 76 | 30 | ||||
Polyomavirus | 31 | 14 | ||||||
Parvoviridae | 21 | 24 | 10 | |||||
Bacteria | ||||||||
Actinobacteria/proteobacteria | ||||||||
Alphaproteobacteria | 31 | |||||||
Betaproteobacteria | 324 | 158 | 33 | |||||
Mycobacterium | 129 | 33 | 478 | 33 | 11 | |||
Firmicutes/other bacteria | ||||||||
Chlamydiales (chlamydia) | 15 | 38 | 37 | |||||
Clostridiales | 70 | |||||||
Other Bacilli | 106 | 19 |
Class I | Class II | B cell | ||||||
---|---|---|---|---|---|---|---|---|
Linear | Discontinuous | |||||||
HLA | H-2 | HLA | H-2 | Human | Mouse | Human | Mouse | |
Alveolata | ||||||||
Plasmodium | 64 | 29 | 186 | 33 | 16 | 49 | ||
Euglenozoa | ||||||||
Trypanosomatidae | 91 | 30 | 14 | |||||
Fungi | ||||||||
| 50 | 50 | 73 | |||||
Other fungi | 77 | 54 | 11 | |||||
Plants | ||||||||
Fabaceae (peas, | 17 | 419 | ||||||
Betulaceae (birch family) | 30 | 24 | ||||||
Cupressaceae (cypress, | 21 | 22 | ||||||
Gluten, coeliac | 23 | 245 | 36 | |||||
Timothy-grass | 474 | 19 | ||||||
Other grass | 98 | 124 | ||||||
Amaranthaceae | 20 | |||||||
Animals | ||||||||
Insects | 67 | 23 | ||||||
Arachnid | 97 | 33 | 13 | 14 | ||||
Mammals | 119 | 41 | 69 | 707 | 80 | 15 |
Class I | Class II | B cell | ||||||
---|---|---|---|---|---|---|---|---|
Linear | Discontinuous | |||||||
HLA | H-2 | HLA | H-2 | Human | Mouse | Human | Mouse | |
Rheumatoid arthritis | 27 | 11 | ||||||
Diabetes | 73 | 76 | 46 | 17 | ||||
Multiple sclerosis | 11 | 13 |
As a result of the processes described above, we generated sets of epitopes for the various categories. As an example, Table
Parvoviridae virus validated epitope set downloaded from the web tool.
Epitope/cluster | Epitope ID | Source organism | Source protein | MHC restriction | RF score | Assay score | Assay type | Effector origin |
---|---|---|---|---|---|---|---|---|
FYTPLADQF | 18474 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-A*24:02 | 0.51 | 12 | Multimer/tetramer, 51 chromium, ELISPOT | Direct |
GLCPHCINV | 20786 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-A*02:01, HLA-A2 | 0.46 | 4 | ELISPOT, 51 chromium | Direct |
QPTRVDQKM | 51981 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-B35 | 0.29 | 3 | ELISPOT, 51 chromium, multimer/tetramer | Cell line/clone |
LLHTDFEQV | 37397 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-A*02:01, HLA-A2 | 0.21 | 4 | ELISPOT, 51 chromium | Direct |
TAKSRVHPL | 62900 | Human parvovirus B19 | Viral protein 2 | HLA-B8 | 0.12 | 4 | ELISPOT, 51 chromium | Direct |
TEADVQQWL | 63285 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-B40 | 0.1 | 4 | ELISPOT, 51 chromium | Direct |
SSHSGSFQI | 61077 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | ELISPOT, 51 chromium | Direct |
SESSFFNLI | 57628 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-B40 | 0 | 4 | ELISPOT | Direct |
VQQWLTWCN | 70634 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | 51 chromium, ELISPOT | Direct |
VPQYGYLTL | 70458 | Adeno-associated virus - 2 | Major coat protein VP1 | HLA-B*07:02 | 0 | 2 | ICS, biological activity, ELISA | Short term restimulated |
SALKLAIYKA | 56861 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 8 | RNA/DNA detection, ICS | Direct |
TEADVQQWLTW | 63286 | Human parvovirus B19 | Non-capsid protein NS-1 | HLA-B44 | 0 | 4 | ELISPOT | Direct |
QSALKLAIYK | 52287 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 8 | ICS | Direct |
IDTCISATFR | 25677 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | ELISPOT | Direct |
HAKALKERMV | 23542 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | ELISPOT | Direct |
GLFNNVLYH | 20861 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | 51 chromium, ELISPOT | Direct |
LHTDFEQVM | 36432 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | ELISPOT, 51 chromium | Direct |
LLHTDFEQVM | 37398 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-A*02:01 | 0 | 8 | ICS | Direct |
GLCPHCINVG | 20787 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 8 | ICS, RNA/DNA detection | Direct |
EADVQQWLT | 11014 | Human parvovirus B19 | Noncapsid protein NS-1 | HLA-Class I | 0 | 4 | ELISPOT, 51 chromium | Direct |
RMTENIVEV | 145986 | Human parvovirus 4 | ORF1 | HLA-A*02:01 | 0 | 12 | Multimer/tetramer, ICS | Short term restimulated, direct |
Having established the conceptual framework for selection of epitope sets, we next expanded its applicability. Autoimmune epitopes are identified by the fact that the source antigen and the host organism are the same (e.g., both the T cells and the epitope originate from a human source). As listed in Table
We considered expanding the scope of the study to also select epitopes recognized by species other than humans. In this case, the second most frequently represented host species is mouse. Accordingly, an option was created in the web application (next section) to allow selection of murine epitopes. The number of murine epitopes identified is listed as a separate column in Table
Finally, we also expanded our analysis to allow selection of B cell/antibody epitopes. In this case, we set a 5 to 20 residue size window and initially selected X-ray structure, biological activity, and
We then developed a web tool, which will be hosted by the IEDB as an additional link on the search results page and will be part of the next IEDB update release in fall 2015. This tool generates specific epitope sets following the default criteria described above but also allows users to customize the generation of novel sets.
A sample screen shot of the main interface is shown in Figure
(a) Web application main page interface. (b) Web application “advanced options” page interface.
An “advanced options” webpage can be accessed from the main page, and a sample screen shot of this option is shown in Figure
To experimentally validate the usefulness of the tool, we synthesized some of the peptide sets it identified and tested them for recognition by human T cells. One of the main challenges for testing large pools for T cell recognition is that
However, in many cases the solubility of one peptide is not drastically influenced by the presence of other peptides (especially if the sequences, isoelectric point, and general solubility are different). For this reason, we predicted that it might be possible to make pools of peptides already dissolved in a solvent like DMSO, mix the solutions, and relyophilize the pool of pools. Indeed, we routinely find these “sequentially lyophilized” pools, once resuspended, to be much more soluble than the individual components.
Accordingly, we synthesized a set of 207 EBV human CD8/class I epitopes, identified by the default settings described above (Supplemental Table 1A). In addition, we also synthesized a set of 92 CD8/class I epitopes derived from DENV, obtained by selecting only peptides with
Peptides corresponding to these three sets of epitopes were pooled and tested with human PBMC as a source of T cells. For these experiments we selected PBMC from 5 individuals infected with DENV and likely uninfected with TB (see Methods for details) and PBMC from 5 LTBI individuals likely uninfected with DENV. Because of the high incidence of EBV infection worldwide [
PBMC were stimulated with the DENV CD8 pool, MTB CD4 pool, and EBV CD8 pool. After
Predicted epitope pools induce a detectable
We devised a strategy that automatically filters datasets to select epitopes of appropriate size, defined restriction, and assay type for use in characterizing responses to specific indications. While these sets can also be generated by querying the IEDB directly, doing so requires fairly complex queries and the setting of multiple parameters. In our application, the epitope sets are generated automatically, while the user can still change the default settings to generate validated epitope sets matching specific criteria.
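The idea of default criteria that a user can override can be sketched in Python as follows. This is a minimal illustration, not the web application's implementation (which is written in PHP with MySQL): the `FilterSettings` class, its field names, and the record layout are hypothetical. The default values are assumptions based on the text (class I peptides of 8–11 residues; multimer/tetramer, ELISPOT, and ICS assays), and the class II override uses the 13–20 residue range mentioned in the Methods. The first two records come from the Parvoviridae example table; the third is an invented placeholder, and the ELISPOT assay assigned to the class II peptide is likewise assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterSettings:
    """Hypothetical default criteria; every field can be overridden by the user."""
    mhc_class: str = "I"
    min_length: int = 8
    max_length: int = 11
    allowed_assays: frozenset = frozenset({"multimer/tetramer", "elispot", "ics"})


def passes(record, settings):
    """True if the record matches the MHC class and length range and has
    at least one assay from the allowed set."""
    assays = {a.strip().lower() for a in record["assays"]}
    return (
        record["mhc_class"] == settings.mhc_class
        and settings.min_length <= len(record["sequence"]) <= settings.max_length
        and bool(assays & settings.allowed_assays)
    )


def build_set(records, settings=FilterSettings()):
    """Return the sequences of all records that pass the given settings."""
    return [r["sequence"] for r in records if passes(r, settings)]


RECORDS = [
    {"sequence": "FYTPLADQF", "mhc_class": "I",
     "assays": ["Multimer/tetramer", "51 chromium", "ELISPOT"]},
    {"sequence": "TEADVQQWLTW", "mhc_class": "I", "assays": ["ELISPOT"]},
    # Invented placeholder: excluded because its only assay is not in the allowed set.
    {"sequence": "AAAAAAAAA", "mhc_class": "I", "assays": ["51 chromium"]},
    # Class II 15-mer from the clustering example; the assay is assumed here.
    {"sequence": "NVTSIHSLLDEGKPT", "mhc_class": "II", "assays": ["ELISPOT"]},
]

if __name__ == "__main__":
    print(build_set(RECORDS))  # default class I criteria
    class_ii = replace(FilterSettings(), mhc_class="II", min_length=13, max_length=20)
    print(build_set(RECORDS, class_ii))  # user-defined class II criteria
```

Keeping the defaults in one immutable settings object means a customized set differs from the default set only in the fields the user explicitly changes.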
We further identified which epitope categories are supported by current IEDB data, and found that reference epitope sets could be produced for 43 categories with data currently available in the IEDB. The number of such categories, broadly based on previous epitope classification work [
While these actual epitope sets are provided as tables within the paper, we implemented a web application to automatically generate epitope sets, based on the fact that the IEDB content is rapidly growing and new epitopes are added to the IEDB in each of its biweekly updates. We plan to continuously gather feedback on this web application from the scientific community, and to implement changes and modifications through the main IEDB website [
Finally, to illustrate applicability in an actual experimental setting, we selected and synthesized peptide sets corresponding to EBV, DENV, and MTB epitopes. These epitope sets were used to measure immune reactivity in human cells. The experimental testing of these epitope sets demonstrated the applicability of these sets as a valuable resource to allow detection of T cell responses
The authors declare that they have no conflicts of interest regarding this research.
This project has been funded with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, and Department of Health and Human Services under Contract nos. HHSN272201200010C, HHSN272200900044C, and HHSN27220140045C and Grant no. U19 AI100275.