Proposing Low-Similarity Peptide Vaccines against Mycobacterium tuberculosis

Using the currently available proteome databases and based on the concept that a rare sequence is a potential epitope, we examined epitopic sequences derived from Mycobacterium tuberculosis for their similarity to the proteins of the host in which the epitopes were defined. We found that: (i) most of the bacterial linear determinants contained peptide fragment(s) that were rarely found in the host proteins and (ii) the relationship between low similarity and epitope definition appears potentially applicable to T-cell determinants. The data confirm the hypothesis that low sequence similarity shapes or determines epitope definition at the molecular level, and they provide a potential tool for designing new approaches to prevent, diagnose, and treat tuberculosis and other infectious diseases.


Introduction
Mycobacterium tuberculosis is the causative agent of tuberculosis (TB) and is a major infectious pathogen. A resurgence of tuberculosis has marked the beginning of the third millennium, with about 8 million new cases and 1.8 million deaths annually worldwide attributed to M. tuberculosis [1]. More recently, the World Health Organization's Global Tuberculosis Control Report 2009 estimated 9.27 million cases of TB in 2007 [2], an increase from 9.24 million cases in 2006, 8.3 million cases in 2000, and 6.6 million cases in 1990. These numbers underscore the need to develop and characterize rational and effective TB therapies and diagnostic reagents. Currently, the M. bovis bacillus Calmette-Guérin (BCG) vaccine is used against TB. BCG has excellent immuno-potentiating properties, but it is not uniformly effective [3], and efforts to develop a more effective M. bovis-derived vaccine against TB have not had satisfactory results [4,5]. Moreover, the emergence of multiple-drug-resistant TB [6] and the increasing prevalence of mycobacterial disease in people with acquired immunodeficiency syndrome [7] add urgency to the need for an effective vaccine against TB.
Recent studies have provided convincing evidence that antibodies can protect against mycobacterial infections [8]. Passive immunization using mono- and polyclonal antibodies, and humoral immune responses against specific mycobacterial antigens in experimental animals, have shown that antibodies may play an anti-TB role [9][10][11]. The epitopic characterization of mycobacterial antigens is performed antigen-by-antigen, searching for specific peptide features and epitope-paratope interaction domains for each mycobacterial protein antigen [12][13][14]. However, the common molecular basis and the fundamental rules underlying the immune function of specific peptide units remain unexplained [15,16]. Little is known about the structure-function relationship(s) of protein sequences in immunology. This lack of knowledge may underlie the confusion surrounding neutralizing versus nonneutralizing antibodies, harmful vaccine side effects, and imprecise diagnostic tools.
Within this framework, we propose that rare peptide sequences, that is, peptide motifs rarely found in host proteins, elicit an immune response. Unique protein sequences are more likely to evoke an immune response than are highly repeated motifs, which would be immunogenically silenced by tolerance. We have previously validated the relationship between sequence rarity and peptide immunoreactivity in cancer, autoimmunity, and infectious disease models. We reported several examples of specific targeting of peptide regions with no or limited sequence similarity to the host proteome, including mono- and polyclonal antibodies raised against EC Her-2/Neu oncoprotein [17], desmoglein-3 [18], melan-a/MART-1 [19], high-molecular-weight melanoma-associated antigen [20,21], tyrosinase [22,23], prostate-specific antigen [24], HPV16 E7 [25], HCV [26], and influenza A [27] proteins. In addition, a review of the epitope mapping literature revealed that most peptide epitopes obey the low-similarity rule [28], further supporting our hypothesis. Taken together, our data [17][18][19][20][21][22][23][24][25][26][27] and findings from other research [28] suggest that unique peptide sequences are more likely to be immunogenic than are highly repeated peptide sequences, which may be immuno-tolerated.
In the present study, we investigated the low-similarity hypothesis in M. tuberculosis antigen epitopes with the aim of defining immunogenic peptide sequences that may be effective for anti-TB immunotherapy. We studied validated M. tuberculosis epitopes currently cataloged in the Immune Epitope Database and Analysis Resource (IEDB) (http://www.immuneepitope.org/) [29,30]. We report that most of the mycobacterial epitopes involved in the immune response are characterized by a low level of similarity to the host proteome.

Methods
The IEDB contains hundreds of mycobacterial epitopes derived from various strains and with different characteristics [30]. Specifically, the IEDB contains B- and T-cell mycobacterial epitopes that (a) are linear or conformational; (b) are derived from M. tuberculosis, M. bovis, M. avium, or M. leprae; (c) are of different lengths (up to 36 aa) [31]; and (d) gave negative, positive, or mixed negative/positive results in immunoassays. The present report focused on linear B-cell epitopes derived from M. tuberculosis that were positive in immunoassays.
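The selection criteria above amount to a simple filter over the cataloged records. The sketch below assumes a simplified, hypothetical record type (`EpitopeRecord` and its field names are illustrative and are not the IEDB export schema) and uses made-up toy sequences; it only shows the logic of keeping linear, immunopositive B-cell epitopes from M. tuberculosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpitopeRecord:
    # Hypothetical, simplified record: these fields are illustrative and do
    # not reproduce the column names of any actual IEDB export.
    sequence: str
    organism: str      # e.g. "M. tuberculosis", "M. bovis", "M. avium", "M. leprae"
    cell_type: str     # "B" or "T"
    structure: str     # "linear" or "conformational"
    assay_result: str  # "positive", "negative" or "mixed"


def select_study_epitopes(records):
    """Keep linear B-cell epitopes from M. tuberculosis that tested positive."""
    return [
        r for r in records
        if r.organism == "M. tuberculosis"
        and r.cell_type == "B"
        and r.structure == "linear"
        and r.assay_result == "positive"
    ]


# Toy records (made-up sequences), one per exclusion criterion plus one keeper.
toy_records = [
    EpitopeRecord("ACDEFGHIKL", "M. tuberculosis", "B", "linear", "positive"),
    EpitopeRecord("MNPQRSTVWY", "M. leprae", "B", "linear", "positive"),
    EpitopeRecord("ACDEFGHIKL", "M. tuberculosis", "T", "linear", "positive"),
    EpitopeRecord("MNPQRSTVWY", "M. tuberculosis", "B", "conformational", "positive"),
    EpitopeRecord("KLMNPQRSTV", "M. tuberculosis", "B", "linear", "mixed"),
]
selected = select_study_epitopes(toy_records)
print([r.sequence for r in selected])  # ['ACDEFGHIKL']
```

Only the first toy record passes all four criteria; the others are excluded by organism, cell type, structure, and assay result, respectively.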
We analyzed these mycobacterial epitopes for similarity to the host proteins, with the aim of identifying amino acid groupings shared identically by the epitopes and the host sequences [32]. The amino acid groupings were linear pentapeptides. Pentapeptides were used because the canonical literature indicates that five to six amino acids are a sufficient minimal determinant for an epitope-paratope interaction; thus, a pentapeptide can act as an immune unit and play a crucial role in cellular immunoreactivity and antigen-antibody recognition [33,34].
Fifty-six M. tuberculosis epitopes cataloged as immunopositive in the IEDB were dissected into pentamer motifs offset by one residue, that is, WDEDG, DEDGE, EDGEK, DGEKR, and so forth. Each 5-mer was then used as a probe to scan the entire host proteome for identical matches, using the PIR protein database and the PIR perfect peptide match program (http://pir.georgetown.edu/pirwww/) [35]. The number of perfect matches of a mycobacterial pentamer in the host proteome defines its similarity score, which can range from zero to hundreds of matches. On the basis of previous experimental data, a pentapeptide fragment with about five (or fewer) perfect matches to the host proteome was considered a low-similarity sequence, that is, a rare fragment [17][18][19][20][21][22][23][24][25][26][27][28].
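The dissection and scoring steps can be sketched in Python as follows. This is a minimal illustration, not the PIR perfect peptide match program used in the study: the hypothetical `count_matches` helper is a plain substring search over an in-memory list of protein sequences that counts every (including overlapping) occurrence, which is an assumption about how matches were tallied, and the cutoff of five matches is one reading of "about five (or fewer)". Loading a real host proteome (for example from a FASTA file) is left out.

```python
def pentamers(sequence, k=5):
    """Dissect a sequence into overlapping k-mers offset by one residue."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_matches(probe, proteome):
    """Count exact, possibly overlapping, occurrences of `probe` across all
    protein sequences in `proteome` (a stand-in for a perfect-match search)."""
    total = 0
    for protein in proteome:
        start = protein.find(probe)
        while start != -1:
            total += 1
            start = protein.find(probe, start + 1)
    return total


RARE_THRESHOLD = 5  # assumption: "about five (or fewer)" read as <= 5 matches


def similarity_scores(epitope, proteome, threshold=RARE_THRESHOLD):
    """Return (pentamer, match count, is_rare) for every pentamer of an epitope."""
    scores = []
    for p in pentamers(epitope):
        n = count_matches(p, proteome)
        scores.append((p, n, n <= threshold))
    return scores


print(pentamers("WDEDGEKR"))  # ['WDEDG', 'DEDGE', 'EDGEK', 'DGEKR']

# Toy "proteome" of two made-up sequences, for illustration only.
toy_proteome = ["AAWDEDGAA", "DEDGEKRDEDGE"]
for pentamer, n, rare in similarity_scores("WDEDGEKR", toy_proteome):
    print(pentamer, n, "rare" if rare else "repeated")
```

A linear scan like this is fine for a handful of probes; for a full proteome and many epitopes, an index of all host pentamers (e.g., a dictionary from 5-mer to count built in one pass) would avoid rescanning every protein for each probe.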

Low-Similarity Sequences and B-Cell Epitopes.
We examined the linear M. tuberculosis-derived B-cell epitopes cataloged in the IEDB [30] using the pentapeptide probes described above; Table 1 reports the similarity scores.
As shown in Table 1, most of the mycobacterial B-cell epitopes contain pentapeptides that are rare or absent in the host proteins. Furthermore, preliminary screening revealed that a few B-cell epitopes were also T-cell epitopes (Table 1, footnote 5) [36,49-52].
The relationship between low-similarity sequences and mycobacterial epitopes was further confirmed by extending the similarity analysis to the COOH domain of the mycobacterial DNAK protein. The DNAK chaperone protein is a member of the highly conserved heat shock protein family and is a protein antigen involved in the immune response to mycobacteria [53,54]. In particular, the DNAK carboxy-terminus domain is highly immunogenic [41,55] and hosts several of the epitopes described in the IEDB database (Table 1) [41]. Thus, the COOH domain of the mycobacterial DNAK protein was suitable for verifying the similarity-immunogenicity link. In brief, the primary sequence of the DNAK COOH-terminus domain (aa 500-609) was dissected into pentamers that were probed against the human proteome by the computer-assisted similarity analysis described in the Methods section. The epitopic sequences were then mapped onto the mycobacterial-versus-human similarity profile. Figure 1 shows the similarity profile of the DNAK COOH-terminus domain against the human proteome and the localization of the linear epitopes along the antigenic sequence. The carboxy-terminal sequence of the mycobacterial chaperone protein contains both pentapeptides that occur frequently in the human proteome and unique fragments. Five pentamers (i.e., TWRIG, WRIGY, VTGHW, TGHWR, and GHWRC) were unique to the DNAK COOH-terminus domain and accounted for 5% of the total DNAK COOH-terminus pentamers. The remaining 100 pentapeptides were repeated to different extents throughout the human proteome and occurred a total of 2468 times. Most importantly, Figure 1 shows that the peptide epitopes recognized by human sera, as demonstrated by Elsaghier et al. [41], were among the mycobacterial peptide fragments with low similarity to the human proteome.
That is, the unique pentapeptide signatures of the DNAK carboxy-terminus domain coincided with the epitopic motifs involved in immune recognition, indicating a direct relationship between low similarity and the humoral antibody response. Conversely, M. tuberculosis peptides not recognized by the human sera had a high number of pentamer matches to the human proteome (Figure 1, black arrows numbered from 6 to 15).

Journal of Biomedicine and Biotechnology

Table 1 notes:
1 IEDB ID: Reference Dataset of Mycobacterial Immune Epitopes, http://iedb.zendesk.com/entries/18171-reference-datasets-of-mycobacterial-immune-epitopes.
2 Epitope amino acid sequence in one-letter code; low-similarity pentapeptides in capital letters.
3 Number of times the low-similarity pentapeptide fragment(s) occur(s) in the set of proteins that together form the host proteome (see Methods and [17][18][19][20][21][22][23][24][25][26][27]).
4 Host proteome: H, human; M, murine.
5 Sequences entirely or partially contained in M. tuberculosis T-cell epitopes [36,49-52].
6 See Table 1 for amino acid sequence details.
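Once every pentamer of a region has a match count, summarizing the profile and comparing epitope positions with it reduce to simple bookkeeping. As a check on the reported DNAK figures, five unique pentamers out of 5 + 100 = 105 gives 5/105 ≈ 4.8%, i.e., the roughly 5% stated above. The sketch below assumes a list of per-pentamer match counts ordered by start position; the helper names are hypothetical and the counts in the example are made up, not the DNAK data.

```python
def profile_summary(counts):
    """Summarize a similarity profile.

    `counts[i]` is the host-proteome match count of the pentamer starting at
    residue i + 1 of the profiled region. Returns the number of unique
    (zero-match) pentamers, their fraction, and the total number of matches
    of all the remaining (repeated) pentamers."""
    unique = sum(1 for n in counts if n == 0)
    repeated_total = sum(n for n in counts if n > 0)
    return unique, unique / len(counts), repeated_total


def epitope_mean_matches(counts, start, end):
    """Mean match count of the pentamers lying wholly inside residues
    start..end (1-based, inclusive, numbered within the profiled region)."""
    if end - start + 1 < 5:
        raise ValueError("an epitope must span at least one pentamer")
    # Pentamer starting at residue p covers p..p+4, so p runs from start to end-4.
    window = counts[start - 1:end - 4]
    return sum(window) / len(window)


# Reported DNAK figures: 5 unique pentamers out of 5 + 100.
print(round(5 / (5 + 100), 3))  # 0.048

# Made-up profile of 8 pentamers (a 12-residue region), for illustration only.
toy_counts = [12, 0, 0, 7, 30, 0, 3, 25]
print(profile_summary(toy_counts))             # (3, 0.375, 77)
print(epitope_mean_matches(toy_counts, 2, 7))  # 0.0
```

In this toy profile, a putative epitope spanning residues 2-7 sits entirely on zero-match pentamers, which is the kind of coincidence between epitope position and low similarity that the profile comparison is meant to expose.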

Low-Similarity Sequences and T-Cell Epitopes.
The low-similarity M. tuberculosis B-cell epitopes contained T-cell epitopes [36,49-52] in several cases (Table 1). These data suggest that low similarity influences the T- as well as the B-cell immune response, a finding that might be used in defining T-cell epitopes. Characterization of T-cell epitopes is complicated by the different sets of rules governing the immune response. The problems include, but are not limited to, epitope hierarchy, immunoprevalence, and immunodominance [56]; variable epitope length; the major histocompatibility complex (MHC) binding potential of the epitope and the difficulty of correctly predicting MHC binding activity for the majority of test-set peptides [57]; and degenerate T-cell receptor recognition [22]. In this intricate context, the low-similarity hypothesis may provide insight into the structural characterization of immunogenic T-cell epitopic sequences. The optimal peptide length for MHC binding has been reported to be between nine and fifteen amino acids, with the central 5-6 residues contributing the majority of the specific contacts [59]. Thus, as previously proposed [19], an immunogenic T-cell epitope may be structurally characterized by MHC-binding terminal residues flanking a central low-similarity core of 5-6 residues. This model is illustrated in Figure 2 using an immunodominant mycobacterial T-cell epitope validated and described by Ivanyi et al. [58]. The immunodominant p61-80/PT19 mycobacterial epitope (VTGSVVCTTAAGNVNIAIGG) has a critical core formed by five residues (AGNVN, aa 71-75): a single amino acid substitution (N73A) impairs the T-cell immunogenicity of the p61-80/PT19 epitope. Our similarity analysis reveals that this substitution shifts the proteomic similarity of the five core residues (aa 71-75) from one match (AGNVN) to eight matches (AGAVN), thereby "hiding" the 5-mer core from the immune system.
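The effect of the N73A substitution on the epitope core can be reproduced with a few lines of string handling. The sketch below takes the p61-80/PT19 sequence and its numbering from the text and, rather than querying a proteome, looks up the two core match counts (1 for AGNVN, 8 for AGAVN) from the values reported above; the helper names `substitute` and `core` and the five-match cutoff are illustrative assumptions.

```python
def substitute(sequence, first_position, mutation):
    """Apply a point substitution written like "N73A" to a peptide whose
    first residue is numbered `first_position`."""
    original, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - first_position
    if sequence[index] != original:
        raise ValueError(f"expected {original} at {position}, found {sequence[index]}")
    return sequence[:index] + replacement + sequence[index + 1:]


def core(sequence, first_position, start, end):
    """Return residues start..end (inclusive, in the peptide's own numbering)."""
    return sequence[start - first_position:end - first_position + 1]


# Match counts reported in the text for the p61-80/PT19 core; a real analysis
# would obtain them by searching the host proteome.
REPORTED_MATCHES = {"AGNVN": 1, "AGAVN": 8}
RARE_THRESHOLD = 5  # assumption: "about five (or fewer)" read as <= 5 matches

p61_80 = "VTGSVVCTTAAGNVNIAIGG"  # residues 61-80
wild_core = core(p61_80, 61, 71, 75)
mutant_core = core(substitute(p61_80, 61, "N73A"), 61, 71, 75)

for label, c in (("wild type", wild_core), ("N73A", mutant_core)):
    n = REPORTED_MATCHES[c]
    print(label, c, n, "rare" if n <= RARE_THRESHOLD else "not rare")
# wild type AGNVN 1 rare
# N73A AGAVN 8 not rare
```

The residue check in `substitute` guards against numbering errors: if the stated wild-type residue does not match the sequence at that position, the function refuses to apply the mutation instead of silently editing the wrong residue.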
The model illustrated in Figure 2 is supported by reports that the HFMPT pentapeptide is a minimal antigenic determinant for MHC class I-restricted T lymphocytes [60] and that the KYVKQ pentapeptide is a minimal antigenic determinant for CD4(+) T-cell clones [61]. Moreover, HFMPT and KYVKQ have proteomic similarity values of zero matches and one match, respectively.

Discussion
In this study, we explored the IEDB resources to define and characterize immunogenic mycobacterial epitopes that could be used for effective anti-TB immunotherapy. We observed that the immunological information carried by M. tuberculosis antigens was localized in rare peptide fragments, confirming the relationship between low similarity and immunogenicity. Sequences containing pentapeptide fragment(s) with low similarity to the host proteome appear to be those involved in humoral antibody recognition of mycobacterial antigens and may also influence T-cell immunoreactivity.
Our findings provide a method for investigating the immune potential of the mycobacterial proteome, and thus, may help elucidate M. tuberculosis immunobiology, one of the primary challenges in current tuberculosis research. This is an important issue, particularly because the epitopic characterization of mycobacterial antigens advances slowly, antigen-by-antigen [12][13][14].
Moreover, mycobacterial antigenic motifs rarely found in host proteins are more likely to serve as immunogenic antigens. Use of short peptide modules rather than full-length mycobacterial antigens in vaccines may increase specificity and efficacy. It has been known since the 1980s that chemically synthesized small peptides can induce antibodies that react with intact proteins [62]. Furthermore, peptide-based vaccines have been used successfully against several pathologies and infectious diseases. For example, tumour antigen-derived peptides evoked potent antitumour immunity in the murine melanoma M-3 [63]; cytotoxic CD4+ and CD8+ T lymphocytes generated by mutant p21-ras (12Val) peptide vaccination selectively killed autologous tumour cells carrying this mutation [64]; Her-2/neu peptide (aa 657-665) is an immunogenic epitope of the Her-2/neu oncoprotein with potent antitumour properties [65]; synthetic peptides identified as antigenic sites on the S1 subunit of pertussis toxin induced especially high antibody titres against native pertussis toxin in mice [66]; and a linear peptide containing minimal T- and B-cell epitopes of the Plasmodium falciparum circumsporozoite protein provided protection against a transgenic sporozoite challenge [67]. These findings suggest that using the exact immunogenic mycobacterial peptide sequences may provide the most effective active and passive anti-TB immunotherapeutic approaches. In addition, a major obstacle to antibody-based anti-TB therapy is the risk of adverse effects ranging from lupus vulgaris to granulomatous hepatitis [68][69][70][71][72], possibly caused by cross-reactivity [73]. The use of low-similarity peptides may allow the development of effective vaccines without adverse side effects [74]. In conclusion, the findings of the present study may provide guidance in the analysis, identification, and utilization of the B- and T-cell response in vaccination against mycobacterial and other infectious diseases.