A Method for Individualizing the Prediction of Immunogenicity of Protein Vaccines and Biologic Therapeutics: Individualized T Cell Epitope Measure (iTEM)

The promise of pharmacogenomics depends on advancing predictive medicine. To address this need in the area of immunology, we developed the individualized T cell epitope measure (iTEM) tool to estimate an individual's T cell response to a protein antigen based on HLA binding predictions. In this study, we validated prospective iTEM predictions using data from in vitro and in vivo studies. We used a mathematical formula that converts DRB1* allele binding predictions generated by EpiMatrix, an epitope-mapping tool, into an allele-specific scoring system. We then demonstrated that iTEM can be used to define an HLA binding threshold above which immune response is likely and below which immune response is likely to be absent. iTEM's predictive power was strongest when the immune response is focused, such as in subunit vaccination and administration of protein therapeutics. iTEM may be a useful tool for clinical trial design and preclinical evaluation of vaccines and protein therapeutics.


Introduction
Peptide binding to HLA (MHC) is the critical first step required for a T cell response. HLA binding enables antigen presenting cells to engage T cells via the T cell receptor to initiate a cascade of events that stimulate proinflammatory responses [1,2]. Indeed, one of the most critical determinants of protein immunogenicity is the strength of peptide binding to MHC molecules [3]. Binding of antigenic peptides to HLA is desirable for vaccine design because immunogenic antigens produce protective T cell and antibody responses. However, the same interaction is often undesired in the context of biologic drug therapies, such as monoclonal antibodies and replacement proteins, because neutralizing antibodies raised against the therapy lower drug efficacy. In several cases, immune responses to proteins administered as drugs or vaccines have been linked to a particular HLA allele, which is better able to bind peptides derived from the antigen [4-6]. Consequently, the ability to predict this relationship might be useful in clinical trial design; for example, subjects who carry specific HLA alleles could be excluded from a protein therapeutic trial. We set out to develop a statistical analysis tool, the individualized T cell epitope measure (iTEM), which estimates the likelihood that a particular antigen will generate an immune response in a specific subject. As shown in the six case studies reported here, iTEM can be used as a benchmark to determine whether or not an individual subject is likely to respond to a given epitope or subunit protein. We conclude that iTEM scores can be used as a binary test, with a threshold above which a peptide or protein is likely to bind an individual's HLA and could potentially trigger an immune response, and below which a response is unlikely.

iTEM Calculations.
To calculate an iTEM score, we first identify putative HLA ligands and T cell epitope clusters using the EpiMatrix system [7,8]. Input amino acid sequences are parsed into overlapping 9-mer frames. Each frame is then evaluated for binding potential against a panel of eight common Class II alleles (DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, and DRB1*1501) [9]. We call each frame-by-allele evaluation an EpiMatrix "assessment". EpiMatrix raw scores are normalized and reported on a "Z" scale. Assessments with a score of at least 1.64, theoretically the top 5% of any given sample, are considered highly likely to bind HLA and are called "hits" [10]. The resulting dataset is then screened for regions containing more hits than would be expected by chance alone. For regions with a high density of hits, an EpiMatrix Cluster Score is calculated: we sum the scores of all the hits contained in a given cluster and deduct the sum of scores we would expect to find in a randomly generated sequence of similar length. In other words, the EpiMatrix Cluster Score is the deviation between the observed and the expected EpiMatrix sum of scores. The expected sum of scores is the number of 9-mer frames contained in the target sequence, times the number of alleles screened, times 0.05 (the expected hit rate), times 2.06 (the expected value of a Z score above 1.64).
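The cluster-score arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EpiMatrix implementation: the function names, the example peptide, and the Z-scores are hypothetical, and only the constants (9-mer frames, the 1.64 hit threshold, the 0.05 hit rate, and the 2.06 expected hit value) come from the text.

```python
# Constants taken from the text; everything else here is illustrative.
FRAME_WIDTH = 9
HIT_THRESHOLD = 1.64       # Z-score at or above which an assessment is a "hit"
EXPECTED_HIT_RATE = 0.05   # expected fraction of assessments that are hits
EXPECTED_HIT_SCORE = 2.06  # expected value of a Z-score above 1.64


def nine_mer_frames(sequence):
    """Parse an amino acid sequence into overlapping 9-mer frames."""
    return [sequence[i:i + FRAME_WIDTH]
            for i in range(len(sequence) - FRAME_WIDTH + 1)]


def expected_sum_of_scores(n_frames, n_alleles):
    """Sum of hit scores expected by chance for this many assessments."""
    return n_frames * n_alleles * EXPECTED_HIT_RATE * EXPECTED_HIT_SCORE


def cluster_score(z_scores, n_frames, n_alleles):
    """Observed sum of hit scores minus the expected sum of scores."""
    hits = [z for z in z_scores if z >= HIT_THRESHOLD]
    return sum(hits) - expected_sum_of_scores(n_frames, n_alleles)


# Hypothetical 21-residue peptide: 21 - 9 + 1 = 13 frames.
peptide = "ACDEFGHIKLMNPQRSTVWYA"
frames = nine_mer_frames(peptide)

# Hypothetical Z-scores; assessments not listed are assumed to be non-hits.
z_scores = [2.40, 1.90, 1.70, 1.65, 2.10, 1.80, 0.90, 1.20]
score = cluster_score(z_scores, len(frames), n_alleles=8)
print(f"{len(frames)} frames, cluster score = {score:.3f}")
```

Only assessments at or above the threshold contribute to the observed sum, so a region must contain more (or stronger) hits than chance predicts before its cluster score becomes positive.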
T cell epitope clusters are promiscuous but they are not universal, and human APCs present only two DR alleles. We have observed that certain peptides stimulate immune responses in some subjects better than in others. To explain part of this observed variation, we have developed the iTEM score. iTEM scores are a special case of the EpiMatrix Cluster Score. They describe the relationship between a particular patient's HLA haplotype (considering only two HLA-DR alleles) and the amino acid sequence of a given epitope cluster, and they are used to predict the likelihood that an antigenic peptide will be presented by a given subject's antigen presenting cells and in turn stimulate that subject's T cells. To calculate an iTEM score for a given individual, we calculate an EpiMatrix Cluster Score for each HLA allele in the haplotype. Allele-specific cluster scores of less than zero are discarded (literally set to zero), and the two allele-specific cluster scores are then added together to form an iTEM score. Negative allele-specific cluster scores are discarded because the binding relationship between a given peptide and a given allele is independent of the relationship between that peptide and another allele. In other words, the failure of one allele to present a given peptide does not negatively affect the relationship between that peptide and any other allele, and we therefore felt it would be wrong to allow negative allele-specific cluster scores to detract from accompanying positive scores. Higher iTEM scores indicate an increased likelihood of immunogenicity.
An example of an EpiMatrix report from which an iTEM score can be calculated is shown in Figure 1.
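The iTEM calculation for one subject can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names and the hit scores are hypothetical, while the constants (0.05 and 2.06), the 13-frame peptide length, and the rule that negative allele-specific scores are set to zero come from the text and from Figure 1. How homozygous subjects are scored is not described in the text and is not modeled here.

```python
EXPECTED_HIT_RATE = 0.05   # expected frequency of hits (from the text)
EXPECTED_HIT_SCORE = 2.06  # expected value for a hit (from the text)


def allele_cluster_score(hit_scores, n_frames):
    """Cluster score for a single allele: observed hits minus the peptide constant."""
    peptide_constant = n_frames * EXPECTED_HIT_RATE * EXPECTED_HIT_SCORE
    return sum(hit_scores) - peptide_constant


def item_score(hits_by_allele, n_frames):
    """Sum the allele-specific cluster scores, setting negative scores to zero."""
    return sum(max(0.0, allele_cluster_score(hits, n_frames))
               for hits in hits_by_allele.values())


# Hypothetical hit Z-scores (all >= 1.64) for a 13-frame peptide.
# Assumes the subject carries two distinct DRB1 alleles.
subject_hits = {
    "DRB1*0701": [2.10, 1.90, 1.70],  # 5.70 - 1.339 = 4.361
    "DRB1*1101": [],                  # -1.339, discarded (set to zero)
}
score = item_score(subject_hits, n_frames=13)
print(f"iTEM score = {score:.3f}, predicted responder: {score >= 2.06}")
```

Because the second allele contributes zero rather than a negative value, the subject's score rests entirely on the allele that is predicted to bind.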

Experimental Data Sources.
The six case studies reported here are based on immunogenicity measurements made in enzyme-linked immunospot (ELISpot) assays that measure interferon-gamma secretion from antigen-stimulated peripheral blood mononuclear cells (PBMCs) from humans or splenocytes from mice. Four are vaccine candidate studies for Mycobacterium tuberculosis [11], Variola major [12], and Francisella tularensis [13,14]; one involves a type 1 diabetes (T1D) autoantigen [15]; and one involves an angiogenesis-inhibiting protein therapeutic known as FPX [16]. ELISpot responses were considered positive if the number of spots detected was greater than 50 spots per one million cells over background (1 response over background per 20,000 cells).
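The positivity criterion can be made concrete with a short sketch. It assumes wells seeded at 200,000 cells (the density used in the case studies below), triplicate wells averaged before comparison, and background taken as the mean of the unstimulated wells; the function names and spot counts are hypothetical.

```python
from statistics import mean


def spots_per_million(spots_per_well, cells_per_well=200_000):
    """Scale a per-well spot count to spots per one million cells."""
    return spots_per_well * 1_000_000 / cells_per_well


def is_elispot_positive(peptide_wells, background_wells,
                        cells_per_well=200_000, cutoff_per_million=50):
    """Positive if the mean count exceeds background by more than the cutoff."""
    net_spots = mean(peptide_wells) - mean(background_wells)
    return spots_per_million(net_spots, cells_per_well) > cutoff_per_million


# Hypothetical triplicate counts. At 200,000 cells/well, 50 spots per million
# corresponds to 10 spots per well over background.
print(is_elispot_positive([15, 14, 16], [3, 2, 4]))  # net 12/well = 60 per million
print(is_elispot_positive([12, 13, 11], [3, 2, 4]))  # net 9/well = 45 per million
```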

Correlation of iTEM Scores and ELISpot Results.
Antigens from each study were grouped into one of four categories based on overall iTEM score and ELISpot result. True positives are peptide-HLA pairs with both positive iTEM scores and positive ELISpot results. True negatives are peptide-HLA pairs with both negative iTEM scores and negative ELISpot results. False positives have positive iTEM scores and negative ELISpot results, and false negatives have negative iTEM scores and positive ELISpot results. The cutoff for a positive iTEM score was initially set at 2.06, as described above.
Analyses were rerun with a higher cutoff, which we hoped would adjust for factors of immunogenicity not predicted by EpiMatrix, since peptides that are more likely to bind HLA would probably be less affected by other factors during antigen processing and presentation. We arbitrarily chose 2.5 as the higher cutoff. Standard chi-squared and linear regression analyses of iTEM scores against ELISpot results were performed by hand and using Microsoft Excel 2003, respectively. Statistical significance was defined as P < .05. In linear regression analysis, the intercept was constrained to zero because peptides that do not bind HLA cannot stimulate T cells above background levels. Humans bearing HLA-DRB1* alleles not covered by EpiMatrix predictions were excluded from this study.
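The classification and the two statistical tests described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' worksheet: the function names and the example pairs are hypothetical, scores exactly at the cutoff are treated as positive, and the chi-squared statistic is the standard 2 × 2 form without continuity correction. Applied to the true/false positive and negative counts reported below for the smallpox and FPX case studies, that form reproduces the reported statistics of 4.20 and 13.66.

```python
from collections import Counter

CHI2_CRITICAL_DF1_P05 = 3.841  # chi-squared critical value for df = 1, P = .05


def classify(item_score, elispot_positive, cutoff=2.06):
    """Label one peptide-HLA pair as TP, FP, FN, or TN."""
    predicted_positive = item_score >= cutoff  # boundary handling is an assumption
    if predicted_positive:
        return "TP" if elispot_positive else "FP"
    return "FN" if elispot_positive else "TN"


def chi_squared_2x2(tp, fp, fn, tn):
    """Pearson chi-squared statistic for a 2 x 2 table (no continuity correction)."""
    n = tp + fp + fn + tn
    numerator = n * (tp * tn - fp * fn) ** 2
    denominator = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    return numerator / denominator


def slope_through_origin(item_scores, spot_counts):
    """Least-squares slope with the intercept constrained to zero."""
    sum_xy = sum(x * y for x, y in zip(item_scores, spot_counts))
    sum_xx = sum(x * x for x in item_scores)
    return sum_xy / sum_xx


# Hypothetical (iTEM score, ELISpot positive?) pairs.
pairs = [(5.1, True), (3.0, False), (1.2, False), (0.0, True)]
counts = Counter(classify(score, result) for score, result in pairs)

# Counts as reported in the smallpox and FPX case studies.
smallpox = chi_squared_2x2(tp=16, fp=26, fn=5, tn=26)
fpx = chi_squared_2x2(tp=15, fp=1, fn=4, tn=10)
print(dict(counts), round(smallpox, 2), round(fpx, 2))
```

Fitting through the origin reduces the regression to the ratio of the cross-product sum to the sum of squared iTEM scores, so a peptide with a score of zero is always predicted to give no spots above background.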

Case Study 1: Tuberculosis Vaccine.
In the tuberculosis vaccine study, two groups of six HLA A2/DR1 transgenic mice were immunized intranasally once with a pool of 25 peptides at 2 μg/peptide and CpG oligodeoxynucleotide 1826 inside liposomes in 50 μL, twice two weeks apart. A third group received the same set of injections except that there were only 16 peptides in the pool. Two weeks after the second immunization, the mice were sacrificed and lymphocytes were isolated from the spleens. Cells were plated on an IFN-γ ELISpot plate at 200,000 cells/well and stimulated with individual peptides in triplicate at 10 μg/mL for two days. Peptides that induced an average spot count of 50 spots/million cells over background were considered to have generated a positive response in this assay.
Figure 1: Calculating an iTEM score. Using the EpiMatrix report for a peptide, the iTEM score is equal to the difference between the sum of significant scores for an allele (shown in the black boxes above) and the expected score for a peptide of that length. Assessments outside the top 10% (Z-scores below 1.28) are hidden for ease of viewing the more significant scores, and assessments are shaded with different levels of darkness to highlight Z-scores that are in the top 5% (1.64-2.32) or top 1% (greater than 2.32). For a subject with two alleles, the iTEM score for each allele is calculated and then added together, as shown in the three equations within the figure. The peptide constant is equal to the product of the number of frames (13 in this example), the expected frequency of hits (0.05), and the expected value for a hit (2.06). For this peptide, an HLA of DRB1*0701/DRB1*1101 would be expected to respond (iTEM score: 6.02).
There were 66 peptides tested in the DR1 mice, whose splenocytes were pooled, and therefore there were 66 peptide-HLA pairs for which we calculated iTEM scores. In the peptide immunization study, when the iTEM threshold was 2.06, there were 54 positives and 12 negatives. Of the 54 positives, 38 (70%) were ELISpot positive, and of the 12 negatives, 11 (92%) were ELISpot negative. Using chi-squared analysis, we found the association between iTEM score and ELISpot result to be statistically significant (χ² = 16.79, df = 1, P < .001). Although the linear regression did not predict the number of spot-forming cells (SFC) given the iTEM score (R² = 0.39), the analysis yielded a statistically significant positive slope of 39.35 ± 6.12 (P < .001).
Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Case Study 2: Smallpox Vaccine.
In the smallpox vaccine study, HLA-DR1 or DR3 transgenic mice were immunized using a DNA-prime/peptide-boost strategy. Mice were injected intramuscularly, twice, two weeks apart, with 100 μg of a plasmid DNA vaccine bearing a multiepitope gene. Two weeks following the second DNA injection, mice received 50 μg of peptides corresponding to the epitopes contained in the DNA vaccine via a subcutaneous injection of peptides in liposomes delivered intranasally, twice, two weeks apart. Two weeks after the final immunization, mice were sacrificed and spleens removed for splenocyte preparation. Cells were transferred to an IFN-γ ELISpot plate at 200,000 cells/well and stimulated with individual peptides in triplicate at 10 μg/mL for two days. Peptides that induced an average spot count of 50 spots/million cells over background were considered positive.
DR1 mice were immunized with 32 different peptides while DR3 mice were immunized with 41 different peptides; therefore, there were 73 peptide-HLA pairs for which we calculated iTEM scores. When the iTEM threshold was set to 2.5, there were 42 positives and 31 negatives. Of the 42 positives, 16 (38%) had positive ELISpot results, and of the 31 negatives, 26 (84%) had negative ELISpot results. Using chi-squared analysis, we found the association between iTEM score and ELISpot result to be statistically significant (χ² = 4.20, df = 1, P < .05). Again, the regression did not predict the number of SFC given the iTEM score (R² = 0.20), but the linear regression analysis yielded a statistically significant positive slope of 23.92 ± 5.58 (P < .001).
Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Case Study 3: Tularemia Antigenicity.
The tularemia antigenicity study involved 23 F. tularensis survivors. PBMCs were isolated from whole blood and cultured for 5-20 days with 10 μg/mL of pooled putative T cell epitope peptides, with IL-2 and IL-7 added every other day until 3 days before the IFN-γ ELISpot assay. Cells were then transferred to ELISpot plates at 200,000 cells/well and stimulated with individual peptides or pools at 10 μg/mL in triplicate. Peptides that induced average spot counts greater than 50/million cells and more than double the negative control wells were considered positive responses.
Due to varying cell yields, PBMCs from the 23 subjects were not all exposed to all 27 peptides, and only 510 peptide-HLA pairs were tested via ELISpot. Since EpiMatrix only predicts for certain HLA alleles, there were only 232 peptide-HLA pairs for which an iTEM score could be calculated. Because the study consisted of human subjects who had been naturally infected with tularemia, no cutoff produced a statistically significant association between iTEM score and ELISpot result, owing to a very large number of false positives. Using the 2.5 iTEM cutoff, there were 181 positives and 51 negatives. Of the 181 positives, 34 (19%) had positive ELISpot results, and of the 51 negatives, 47 (92%) had negative ELISpot results (χ² = 3.48, df = 1, P < .10). We could not perform linear regression analysis because the values of the negative spot counts were not reported. Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Case Study 4: Tularemia Vaccine.
In the tularemia vaccine study, six HLA-DR1 transgenic mice were immunized three times, at two-week intervals, with 25 μg in 50 μL intratracheally with a plasmid DNA vaccine encoding a multiepitope gene. Two weeks after the final DNA immunization, mice were immunized intratracheally, twice, two weeks apart, with 50 μg of peptides, corresponding to the epitopes contained in the vaccine construct, formulated in liposomes together with CpG oligodeoxynucleotide. Two weeks following the final immunization, mice were sacrificed and their spleens removed for splenocyte isolation. Cells were transferred to an IFN-γ ELISpot plate at 200,000 cells/well, and stimulated with individual peptides in triplicate at 10 μg/mL for two days. Peptides that induced an average spot count of 50 spots/million cells over background were considered positive.
The DNA plasmid administered to the DR1 mice contained 78 different peptide sequences; therefore, 78 peptide-HLA pairs were tested via ELISpot. There were 18 positives and 60 negatives using an iTEM score cutoff of 2.5. Of the 18 positives, 11 (61%) had positive ELISpot results, and of the 60 negatives, 38 (63%) had negative ELISpot results. Because of the large number of false negatives, the association between iTEM score and ELISpot results was not statistically significant (χ² = 3.39, df = 1, P < .10). Even though the regression did not predict the number of SFC given the iTEM score (R² = 0.41), the linear regression analysis yielded a statistically significant positive slope of 163.43 ± 22.35 (P < .001).
Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Case Study 5: Immunogenicity of an Autoantigen.
We investigated the autoantigenicity of glutamic acid decarboxylase (GAD65) T cell epitopes in diabetic patients. PBMCs, isolated from whole blood samples of six human subjects, were cultured for seven days with a pool of 14 GAD65 peptides, with IL-2 and IL-7 added to the growth medium every two days. Cells were transferred to an IFN-γ ELISpot plate at 200,000 cells/well, and stimulated with individual peptides in triplicate at 10 μg/mL for two days. Peptides that induced an average spot count of 50 spots/million cells over background were considered positive.
Due to varying cell yields, PBMCs from the six subjects were not all exposed to all 14 peptides, and only 67 peptide-HLA pairs were tested via ELISpot; moreover, since EpiMatrix only predicts for certain HLA alleles, there were only 56 peptide-HLA pairs for which an iTEM score could be calculated. Although this setting may seem different from the immunization studies described above, T1D is mechanistically similar to a peptide immunization in that a single autoantigen is considered to be the target of autoreactive T cells. Accordingly, an iTEM cutoff of 2.06 had the most predictive power, with 43 positives and 13 negatives. Of the 43 positives, 36 (84%) had positive ELISpot results, and of the 13 negatives, 7 (54%) had negative ELISpot results. Using chi-squared analysis, we found the association between iTEM score and ELISpot result to be statistically significant (χ² = 7.51, df = 1, P < .01). Again, the regression did not yield a clear link between the number of SFC and the iTEM score (R² = 0.38), but the linear regression analysis yielded a statistically significant positive slope of 25.57 ± 4.42 (P < .001).
Journal of Biomedicine and Biotechnology
Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Case Study 6: Immunogenicity of Protein Therapeutic.
In the FPX immunogenicity study, 36 human subjects received FPX peptides intravenously and 40 others received FPX subcutaneously. PBMCs from 15 subjects were isolated from whole blood samples on day 1 before dosing and then again on days 42 and 180. Cells were cultured for 7 days with 10 μg/mL of individual FPX T cell epitope peptides (three different peptides) and then transferred to ELISpot plates to detect levels of IFN-γ and IL-4 upon restimulation with 10 μg/mL of test peptide at 200,000 cells/well. Peptides that induced an average spot count of 50 spots/million cells over background were considered positive.
A total of 45 peptide-HLA pairs were tested via ELISpot; however, since EpiMatrix only predicts for certain HLA alleles, there were only 30 peptide-HLA pairs for which an iTEM score could be calculated. Since this was a peptide immunization study, an iTEM threshold of 2.06 was used. There were 16 positives and 14 negatives according to iTEM scores. Of the 16 positives, 15 (94%) had positive ELISpot responses, and of the 14 negatives, 10 (71%) had negative ELISpot results. Using chi-squared analysis, we found the association between iTEM score and ELISpot result to be statistically significant (χ² = 13.66, df = 1, P < .001). Linear regression analysis yielded a statistically significant positive slope of 70.37 ± 11.17 (P < .001); however, the regression could not accurately predict the number of SFC given the iTEM score (R² = 0.57).
Characteristics of this study are summarized in Table 1, and the results are summarized in Table 2.

Discussion
In this study, we developed an immunoinformatic tool to predict the outcomes of experimental measurements of vaccination and immunogenicity analysis. Two methods for predicting the association between iTEM and ELISpot response gave significant correlations. These were the linear model and the threshold model.
Of the studies that could be fit to a linear model, the linear slopes derived were rather varied. The average of the five slopes was 64.51 ± 58.34. While the linear model did reach statistical significance in most of the data, its accuracy was moderate at best. The average R² value, which is a measurement of the accuracy of the linear model, was 0.39. The low R² values as well as the different slopes for each study lead us to reject the use of a linear equation to predict experimental results from in silico analysis. Given that the average negative predictive value (NPV) and positive predictive value (PPV) of all the data when a cutoff of 2.06 was used were 77% and 61%, respectively, we believe that the relationship between iTEM scores and experimental results is more accurately described by the threshold model, in which the magnitude of an immune response cannot be predicted beyond "positive" or "negative". In the future, if iTEM can account for more complex issues (e.g., the likelihood that an epitope will be processed), a more linear correlation between iTEM scores and experimental results may emerge.
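The summary statistics for the linear model can be checked directly from the slopes and R² values reported in the five case studies that were fit. The short sketch below uses only those published values; the variable names are illustrative. The sample standard deviation reproduces the reported ± 58.34, which suggests (as an inference, not a statement from the text) that the ± term is a sample standard deviation, and the mean agrees with the reported 64.51 up to rounding of the published slopes.

```python
from statistics import mean, stdev

# Slopes and R^2 values as reported for the TB, smallpox, tularemia vaccine,
# T1D, and FPX case studies (tularemia antigenicity could not be fit).
slopes = [39.35, 23.92, 163.43, 25.57, 70.37]
r_squared = [0.39, 0.20, 0.41, 0.38, 0.57]

mean_slope = mean(slopes)
sd_slope = stdev(slopes)  # sample standard deviation (n - 1 denominator)
mean_r2 = mean(r_squared)

print(f"slope = {mean_slope:.2f} +/- {sd_slope:.2f}, mean R^2 = {mean_r2:.2f}")
```

A standard deviation nearly as large as the mean is what motivates rejecting a single linear equation across studies.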
iTEM scores correlated best with immunogenicity for studies in which a protein or small peptide was administered, but the correlation did not reach statistical significance in studies of immune responses to peptides following exposure to a pathogen. The strongest correlations were observed for the TB, T1D, and FPX studies. In these studies, the average negative predictive value (NPV) was 72% and the average positive predictive value (PPV) was 83% when the iTEM threshold was set at 2.06. Thus, in these settings, it appears that processing and antigen presentation do not introduce significant variation unaccounted for by EpiMatrix predictions. Since iTEM scores are generated only by examining specific peptide-HLA interactions, iTEM was a more accurate predictor of immune responses when these variables were minimized by using protein or peptide primes and boosts. While HLA presentation is necessary for immunogenicity, it is not sufficient. A number of factors related to the expression and processing of the antigen also influence immunogenicity. If the protein is not expressed or secreted by the pathogen during infection, or is not properly cleaved or transported by the host, it will not be immunogenic. Even with proper presentation, the peptide may be homologous to self, in which case either the corresponding T cell has been deleted during thymic selection or T cells that respond to the peptide have been anergized in the periphery.
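The averages quoted for the TB, T1D, and FPX studies follow from the counts reported in the corresponding case studies at the 2.06 cutoff. The sketch below recomputes them; the dictionary layout and function names are illustrative, and the averages are taken as unweighted means across studies, which is an assumption that matches the reported 83% and 72%.

```python
from statistics import mean

# Counts as reported in the case studies (iTEM cutoff of 2.06).
studies = {
    "TB":  {"true_pos": 38, "item_pos": 54, "true_neg": 11, "item_neg": 12},
    "T1D": {"true_pos": 36, "item_pos": 43, "true_neg": 7,  "item_neg": 13},
    "FPX": {"true_pos": 15, "item_pos": 16, "true_neg": 10, "item_neg": 14},
}


def ppv(study):
    """Fraction of iTEM-positive pairs that were ELISpot positive."""
    return study["true_pos"] / study["item_pos"]


def npv(study):
    """Fraction of iTEM-negative pairs that were ELISpot negative."""
    return study["true_neg"] / study["item_neg"]


mean_ppv = mean(ppv(s) for s in studies.values())
mean_npv = mean(npv(s) for s in studies.values())
print(f"mean PPV = {mean_ppv:.0%}, mean NPV = {mean_npv:.0%}")
```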
Adding a step between immunization and antigen presentation, such as expression of the protein immunogen from a DNA vaccine, lowered the predictive accuracy of iTEM for a data set. For example, in those studies where DNA vaccination was used, such as smallpox, the predictive accuracy of iTEM was not significant if the cutoff of 2.06 was used, but was improved by increasing the threshold to 2.5. This suggests that, following DNA vaccination, T cell responses were more likely for those peptides that are very likely to bind HLA, as indicated by higher iTEM scores. Perhaps the addition of steps beyond HLA binding (such as protein processing following expression of the pseudoprotein from the DNA vaccine) can eliminate peptides otherwise capable of activating the immune system. Thus, in the smallpox vaccine study, iTEM had a PPV of only 38%; on the other hand, its NPV was 84%.
The relationship between immune response and T cell epitope is more complicated in natural infection studies, where exposure of T cells to the antigen is affected by many factors, including gene expression, protein processing, dose, and route. Thus, for subjects exposed to F. tularensis, iTEM was not an accurate predictor of immune response to tularemia peptides in vitro. In this case, for example, the proteins from which the peptide epitopes were derived might not have been expressed during infection, or the bacteria might have generated factors that downregulated HLA expression, thereby narrowing the repertoire of HLA ligands [17].
iTEM's accuracy is greatest when it is used as a negative predictor of immune response, a feature which may be very useful for the interpretation of failed vaccine efficacy studies and for clinical trials of protein therapeutics. Using a threshold of 2.06, iTEM had an average NPV across the six case studies of nearly 80%, but a PPV of only 61%. The high NPV may be due to the fact that peptides that are unable to bind to HLA (as indicated by a low iTEM score) have a very small chance of being immunogenic. In contrast, the low PPV can be explained by factors affecting the protein's processing and presentation, despite its constituent epitopes' capacity to bind HLA.

Uses of iTEM in Vaccine Efficacy Studies.
iTEM's ability to predict the absence of a T cell response suggests useful applications in vaccine design. iTEM analysis might be useful for vaccine studies to explain both interpeptide and intersubject variability in T cell assays and clinical trials. In theory, iTEM analysis could be used to predict whether or not a subject would fall into the 5%-20% of the population that does not respond to commonly used vaccines. If a lack of response is predicted, a patient could be exempted, thus sparing him/her from unnecessary vaccinations.

Application of iTEM to Studies of Autoimmune Diseases.
Others have examined the role that an individual's HLA DR genotype plays in regulating immune responses [18]. Several autoimmune disorders have demonstrated an association with specific HLA types. T1D is associated with DR4-DQ8 and DR3-DQ2 [19], multiple sclerosis with DR15 [6], and ankylosing spondylitis and the other spondyloarthropathies are associated with B27 [20]. It would be interesting to consider whether the antigenic targets of certain autoimmune diseases, such as the acetylcholine receptor in myasthenia gravis, contain T cell epitopes that are more likely to be presented in the context of the HLA alleles that are associated with manifestation of autoimmunity, and less likely to be presented on alleles that have an inverse association with autoimmunity. Such a correlation would be evidence supporting the use of iTEM to identify auto-antigens.

Application of iTEM to Clinical Studies of Protein Therapeutics.
In the context of protein therapeutics, drug developers generally agree that T cell response, which is associated with the development of antidrug antibodies, is undesirable. The ability to predict that an immune response in a particular subject is likely would be useful for identifying individuals at higher risk of developing antidrug antibodies. These individuals could be excluded from clinical trials and/or advised to avoid the use of the protein therapeutic, increasing the safety of protein therapeutics for human use.
For example, in a prospective study, Koren et al. identified selected HLA types that were associated with a stronger immune response against a novel antitumor therapeutic. These HLA alleles were associated with higher iTEM scores, higher T cell responses, and higher antibody responses in the Phase I trial [16]. The same HLA restriction effect can be seen in the response to IFN-β treatment in human subjects with MS, where DRB1*0701 was overrepresented in subjects with anti-IFN-β antibodies [6].

Conclusion
In summary, the iTEM tool appears to be a useful method for predicting the absence of T cell response to a given vaccine or protein therapeutic. While this study has only examined CD4 epitopes, we expect that a similar, and possibly stronger, correlation may exist for CD8 epitopes, given that the constraints imposed by the closed-end HLA Class I binding groove [21] make class I predictions inherently much more accurate than class II predictions [22]. Further studies of immune response using the iTEM tool will be needed before it can be adapted for use by drug developers. Tools such as iTEM are likely to play an important role in the development of safer vaccines and therapeutics in the future.