Genome-Wide Prediction of Vaccine Candidates for Leishmania major: An Integrated Approach

Despite the wealth of information on the genetics of the causative parasite and the experimental immunology of cutaneous leishmaniasis, there is currently no licensed vaccine against the disease. In the current study, a two-level data mining strategy was employed to screen the Leishmania major genome for promising vaccine candidates. First, we selected a set of 25 potential antigens from 8312 protein coding sequences, based on the presence of signal peptides, GPI anchors, and consensus antigenicity predictions. Second, we conducted a comprehensive immunogenic analysis of the 25 antigens based on epitopes predicted by the NetCTL tool. Interestingly, the results revealed that candidate antigen number 1 (LmjF.03.0550) had a greater number of potential T cell epitopes than five well-characterized control antigens (CSP from Plasmodium falciparum, M1 and NP from Influenza A virus, core protein from Hepatitis B virus, and PSTA1 from Mycobacterium tuberculosis). To determine an optimal set of epitopes among the highest scoring predicted epitopes, the OptiTope tool was employed for populations susceptible to cutaneous leishmaniasis. The epitope 127SLWSLLAGV from antigen number 1, predicted to bind the most prevalent allele HLA-A*0201 (25% frequency in Southwest Asia), was predicted to be the most immunogenic for all the target populations. Thus, our study reasserts the potential of genome-wide screening of pathogen antigens and epitopes for the identification of promising vaccine candidates.


Introduction
Leishmaniases are a group of complex diseases caused by protozoan parasites of the genus Leishmania and transmitted to humans by hematophagous sandflies [1]. There are at least 20 species of the parasite, which vary according to geographical location and cause a variety of clinical manifestations, ranging from self-limiting cutaneous lesions to potentially fatal infection of the viscera [2,3]. Leishmaniasis is a disease of tropical and subtropical areas, with more than 12 million cases in 88 countries and 2 million new cases annually, including 1.5 million cases of cutaneous leishmaniasis (CL) and 0.5 million cases of visceral leishmaniasis (VL). The cutaneous disease is particularly prevalent in Afghanistan, Algeria, Brazil, Iran, Peru, Saudi Arabia, and Syria, which together account for 90% of the global CL burden [4].
Although high-cost chemotherapeutics are available, they show high toxicity and are prone to drug resistance development due to prolonged treatment periods [5]. Despite substantial effort spent in developing effective vaccines, there is currently no licensed vaccine against human leishmaniasis [6]. A large number of proof-of-principle studies have clearly demonstrated that different vaccine formulations, ranging from killed/live-attenuated parasites to recombinant DNA/protein vaccines, can provide significant protection against infection with Leishmania spp. in a variety of animal models [7,8]. However, the efficacy of these prophylactic or therapeutic vaccines remains partial, and it is therefore necessary to develop novel and effective vaccines [1].
In this regard, antigen identification represents the most important roadblock in vaccine development against any pathogen, as it is usually achieved through rather empirical, time-consuming, and labour-intensive in vivo and in vitro experiments. Efforts have thus been devoted to the development of novel strategies for a more rational and faster identification of antigens among the large number of pathogen proteins [9]. More recently, the reverse vaccinology approach has made it possible to predict, from the pathogen genomic sequence, those antigens that are most likely to be vaccine candidates [10].
Moreover, the genomic information, which contains the sequences of all known and unknown potential antigens of each pathogen, has enabled the prediction and analysis of the entire repertoire of potential cytotoxic T lymphocyte (CTL) epitopes using bioinformatics tools. This strategy allows the development of vaccines that were previously difficult or impossible to make and can lead to the discovery of unique antigens that may improve existing vaccines [11]. The recent completion of the genomic sequences of L. major, L. braziliensis, L. infantum, and L. donovani and the availability of immunoinformatics tools have opened new opportunities for the identification of novel vaccine targets against CL [9]. Additionally, the presence of genetically stable and highly conserved antigens among most of the species, including antigens with little or no homology to human proteins, offers hope for the development of a single vaccine for multiple disease indications [12].
As reported in the literature, effective vaccines must invoke a strong response from both T and B cells; therefore, CTL epitope mapping is crucial in any vaccine design strategy. Many immunoinformatics algorithms and resources are available to predict T and B cell epitopes for peptide-based vaccine design and development [13]. Thus, the prediction of T cell epitopes followed by their in vitro/in vivo validation appears to be a very powerful strategy for rational antigen identification, particularly for a pathogen with a large genome such as Leishmania [9]. Hence, the current study deals with the analysis of the L. major genome (33.6 Mb), considered to express about 8300 proteins, all of which are potential antigens containing effective CTL epitopes with respect to the susceptible populations [14].

Retrieval of Proteome Sequence
Dataset. The complete proteome of L. major (strain Friedlin), consisting of 8312 protein coding sequences, was retrieved from the GeneDB database [15]. We also retrieved five well-characterized control antigens (CSP-401GLIMVLSFL from Plasmodium falciparum, M1-58GILGFVFTL and NP-265ILRGSVAHK from Influenza A virus, core-141STLPETTVV from Hepatitis B virus, and PSTA1-41FVVALIPLV from Mycobacterium tuberculosis) from the AntigenDB database in order to compare and validate the prediction results. These known antigens have been tested and verified in various experimental studies and reported to be capable of eliciting CTL responses [16].

Methodology Used for Prediction and Characterization of Candidate Antigens/Epitopes. Initially, the L. major proteome (8312 proteins) was screened for the presence of both signal peptides and GPI anchors using SignalP [17] and DGPI [18], respectively, and then consensus antigenicity predictions were made using the VaxiJen [19] and ANTIGENpro [20] programs. The selected candidate antigens were further characterized using TMHMM [21], the SCRATCH protein predictor [22], and the BetaWrap program [23]. Thereafter, these candidate antigens were searched for potential sequence similarity with other closely related species and with human and/or mouse proteins, using the OrthoMCL database [24]. Furthermore, CTL epitope prediction was carried out using the NetCTL1.2 tool [25,26], which integrates predictions of proteasomal cleavage, TAP transport efficiency, and binding to 12 MHC class I supertypes. Finally, OptiTope (http://etk.informatik.uni-tuebingen.de/optitope) was used to determine, from the top scoring naturally processed T cell epitopes, the optimal set of vaccine epitopes for each population susceptible to cutaneous leishmaniasis (Figure 1) [27].
OptiTope requires the following input data from the user: (i) sequences of known/predicted antigens, (ii) a target human population, that is, MHC alleles and their corresponding frequencies, and (iii) the epitope set to be optimized. The input given by the user is transformed into an optimization problem. If the problem is feasible, OptiTope returns an optimal set of epitopes along with the fraction that each epitope contributes to the overall immunogenicity. Otherwise, the program proposes changes to the user's input that might yield a feasible optimization problem. The information on MHC allele frequencies in susceptible human populations and geographic areas was retrieved from the dbMHC database (http://www.ncbi.nlm.nih.gov/gv/mhc). A good vaccine displays a high overall immunogenicity, meaning that it is capable of inducing potent immunity in a large fraction of the target population, together with high mutation tolerance and a certain degree of allele and antigen coverage. Furthermore, the finally selected epitopes should display a high probability of passing through the natural antigen processing pathway. From all possible epitope combinations, the ones with a maximum overall immunogenicity are called "optimal" (there may be more than one optimal epitope combination). Hence, the search for an optimal epitope set for a good vaccine can be considered as an optimization problem: out of a given set of epitopes, choose a subset $E$ which, out of all subsets meeting the other input requirements, displays maximum overall immunogenicity $I(E)$, which can be written (1) as a weighted sum over the immunogenicities of the epitopes with respect to the given MHC alleles $A$:

$$I(E) = \sum_{e \in E} \sum_{a \in A} p_a \, i_{e,a}, \qquad (1)$$

where $p_a$ is the frequency of allele $a$ in the target population and $i_{e,a}$ measures the immunogenicity of epitope $e$ when bound to allele $a$ (either predicted or experimentally determined).
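The optimization problem described above can be illustrated with a small Python sketch. It scores an epitope set as the frequency-weighted sum of per-allele immunogenicities, then searches all subsets of a fixed size by brute force. This is only a toy illustration under stated assumptions: OptiTope's actual solver and constraint set are not described in this section, the function names are hypothetical, the `min_alleles` constraint stands in for "the other input requirements", and all frequencies and immunogenicity values are invented.

```python
from itertools import combinations

# Hypothetical toy inputs (not data from the study).
ALLELE_FREQ = {"A1": 0.5, "A2": 0.3, "A3": 0.2}           # allele frequency p_a
IMMUNO = {("e1", "A1"): 1.0, ("e2", "A1"): 0.8,           # immunogenicity i_{e,a}
          ("e3", "A2"): 0.5, ("e4", "A3"): 0.4}
CANDIDATES = ["e1", "e2", "e3", "e4"]

def overall_immunogenicity(epitopes, allele_freq, immuno):
    """Weighted sum over epitopes e and alleles a of p_a * i_{e,a}."""
    return sum(allele_freq[a] * immuno.get((e, a), 0.0)
               for e in epitopes for a in allele_freq)

def covered_alleles(epitopes, allele_freq, immuno):
    """Alleles for which at least one epitope has nonzero immunogenicity."""
    return {a for a in allele_freq
            for e in epitopes if immuno.get((e, a), 0.0) > 0.0}

def best_epitope_sets(candidates, allele_freq, immuno, k, min_alleles=0):
    """Brute-force search over all size-k subsets (assumed toy constraint:
    at least min_alleles covered). Returns (None, []) if infeasible."""
    best_score, best_sets = None, []
    for subset in combinations(sorted(candidates), k):
        if len(covered_alleles(subset, allele_freq, immuno)) < min_alleles:
            continue  # subset violates the coverage requirement
        score = overall_immunogenicity(subset, allele_freq, immuno)
        if best_score is None or score > best_score + 1e-12:
            best_score, best_sets = score, [subset]
        elif abs(score - best_score) <= 1e-12:
            best_sets.append(subset)  # more than one optimal set may exist
    return best_score, best_sets

def contributions(epitopes, allele_freq, immuno):
    """Fraction of the overall immunogenicity contributed by each epitope."""
    total = overall_immunogenicity(epitopes, allele_freq, immuno)
    if total == 0.0:
        raise ValueError("overall immunogenicity is zero")
    return {e: overall_immunogenicity([e], allele_freq, immuno) / total
            for e in epitopes}

if __name__ == "__main__":
    print(best_epitope_sets(CANDIDATES, ALLELE_FREQ, IMMUNO, k=2, min_alleles=2))
```

An infeasible combination of constraints (for example, requiring three covered alleles from only two epitopes in this toy data) yields no solution, which mirrors the case where the program asks the user to relax the input. Exhaustive search is only practical for very small candidate sets.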

Results and Discussion
The present study was divided into two major steps: (i) we screened the L. major genome, consisting of 8312 protein coding sequences, and predicted 25 antigens (Table 1) through successive screening and consensus antigenicity predictions; (ii) we conducted a comprehensive analysis of the epitopes predicted from these 25 candidate antigens (Figure 1). The present strategy is similar to the reverse vaccinology approach adopted by John et al. [28] for identifying common vaccine candidates from the L. major and L. infantum genomes. Additionally, Singh et al. [29,30] used a similar approach of MHC supertype based epitope identification as a strategy to mine proteomic data for novel CTL epitopes in Plasmodium falciparum.

Screening of L. major Genome for Identification and Characterization of Antigens. Previous studies revealed that surface-exposed proteins such as secretory/outer membrane proteins are ideal targets for vaccine development against those pathogens for which a strong B cell response (for antibody production) is critical. However, for vaccine development against pathogens for which the T cell response is critical, subcellular localization is not an issue, since a T cell response could be directed to any protein target [31]. In addition, GPI anchored proteins are abundantly expressed in the infective and intracellular stages of Trypanosoma cruzi (another kinetoplastid protozoan) and have been recognized as antigenic targets by both humoral and cellular immunity [32]. Herein, the entire protein repertoire of L. major, consisting of 8312 protein coding sequences, was screened for the presence of signal peptides and GPI anchors. Out of these, 265 proteins were predicted to be GPI anchored proteins using the DGPI tool [18], and 1798 proteins were found to contain signal peptides/signal anchors using the SignalP 3.0 tool [17]. Of these, 151 proteins were predicted to contain both signal peptides/signal anchors and GPI anchors (data not shown). Further screening of these 151 proteins, based on consensus antigenicity predictions from the VaxiJen [19] and ANTIGENpro [20] tools above a predefined threshold of 0.6, provided a set of 27 antigenic proteins (data not shown). Interestingly, three candidate antigens (GeneDB id: LmjF.04.0130, LmjF.04.0140, and LmjF.04.0170) were found to share high sequence similarity (more than 99.6%), and thus the latter two antigens were excluded from further analysis. Finally, 25 candidate antigens were retained for further characterization as vaccine candidates (Table 1). Protein insolubility is known to be a major obstacle in many experimental studies. Thus, we used the SOLpro tool (of the SCRATCH protein predictor [22]) to predict the propensity of a protein to be soluble upon overexpression.
Out of the 25 antigens, 8 (numbers 3, 5, 7, 12, 13, 14, 16, and 18) were predicted to be soluble upon overexpression while control antigens M1, core, and PSTA1 were predicted to be insoluble upon overexpression (Table 1).
Similarly, proteins with more than one transmembrane (TM) region have been found to be difficult to clone, express, and purify. Thus, we predicted TM regions using the TMHMM web server. Out of the 25 predicted antigens, 19 were found to contain fewer than two TM regions, 5 (numbers 6, 11, 12, 13, and 24) were found to contain two each, and antigen number 1 was found to contain five. In comparison, PSTA1 was found to contain 6 TM regions (Table 1). The literature also indicates that many bacterial and fungal proteins such as toxins, virulence factors, adhesins, and surface proteins have parallel beta-helices, which play an important role in human infectious disease [33]. Therefore, the BetaWrap program [23] was used to predict this supersecondary structural motif in the primary amino acid sequences of the 25 antigens. A total of 9 candidate antigens (numbers 2-6, 9, 11, 13, and 24) were predicted to contain a right-handed parallel beta-helix.
In addition, heterologous immunity may arise from cross-reactive epitopes in other strains of the same organism. Thus, we identified the potential orthologs in the available Leishmania genome annotations using the OrthoMCL database [24] through the BLASTP homology prediction program. All of the 25 selected candidate antigens except antigen number 11 showed orthologs in other related species, namely, L. braziliensis, L. infantum, and L. mexicana. One of the greatest barriers in vaccine development is the possibility that a particular vaccine may cross-react between host and parasite antigens [34]. Vaccine candidates showing sequence similarity with host (e.g., human or mouse) proteins are likely to cause autoimmunity in the host and should therefore be discarded. Out of the 25 antigens, 3 (numbers 3, 9, and 18) showed orthologs in human as well as mouse.
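The successive filters described in this subsection (signal peptide and GPI anchor, consensus antigenicity above 0.6, and flagging of host orthologs) can be sketched in Python as set operations. This is a minimal sketch, not the authors' pipeline: the function names, protein identifiers, and scores are hypothetical, and "consensus" is assumed here to mean that both the VaxiJen and the ANTIGENpro score exceed the threshold.

```python
# Hypothetical toy inputs standing in for parsed tool outputs.
SIGNAL = {"P1", "P2", "P3", "P4"}            # proteins with a predicted signal peptide
GPI = {"P2", "P3", "P4", "P5"}               # proteins with a predicted GPI anchor
VAXIJEN = {"P2": 0.72, "P3": 0.55, "P4": 0.81}
ANTIGENPRO = {"P2": 0.65, "P3": 0.90, "P4": 0.61}
ORTHOLOGS = {"P2": {"L. infantum", "human", "mouse"},
             "P4": {"L. infantum", "L. braziliensis"}}
HOSTS = {"human", "mouse"}

def screen_candidates(signal_ids, gpi_ids, vaxijen, antigenpro, threshold=0.6):
    """Keep proteins that carry both features and whose two antigenicity
    scores both exceed the threshold (assumed meaning of 'consensus')."""
    both = set(signal_ids) & set(gpi_ids)
    return sorted(p for p in both
                  if vaxijen.get(p, 0.0) > threshold
                  and antigenpro.get(p, 0.0) > threshold)

def flag_host_orthologs(candidates, orthologs, hosts=HOSTS):
    """Return candidates that have an ortholog in any host species."""
    return sorted(p for p in candidates if orthologs.get(p, set()) & hosts)

if __name__ == "__main__":
    kept = screen_candidates(SIGNAL, GPI, VAXIJEN, ANTIGENPRO)
    print(kept, flag_host_orthologs(kept, ORTHOLOGS))
```

Keeping each filter as a separate function makes it easy to report the count surviving each stage, as the text does for the 8312, 151, 27, and 25 protein sets.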

Epitope Based Analysis of the Selected Antigens Using NetCTL. For elicitation of T cell responses, the subcellular location and function of the target protein are less important than the presence of appropriate MHC binding epitopes in the protein sequence [35]. In the past 15 years, significant efforts have been made toward the generation of procedures/algorithms for accurate prediction of MHC binding affinity and T cell epitopes. Using clustering methods, the majority of HLA molecules have been classified into relatively few HLA supertypes on the basis of their peptide binding specificities [36,37]. One approach to identifying targets of CTL responses in an antigen is based on the prediction of high affinity MHC class I restricted T cell epitopes using computerized algorithms [38]. It has also been demonstrated that peptides with in vitro binding affinity (IC50) values of ≤500 nM are more immunogenic in vivo [39].
Thus, in the current study, the immunogenicity screening was limited to the predicted peptides that were able to bind HLA class I supertypes with binding affinities (IC50) ≤500 nM [25]. Furthermore, it is important to consider whether each MHC binding peptide is correctly processed from the native antigen and subsequently displayed on the surface of antigen presenting cells. At present, it is possible to predict naturally processed peptides using the NetCTL algorithm above a combined epitope processing score of 0.75, which includes predictions of proteasomal cleavage, TAP binding, and HLA binding [26]. Thus, in order to identify candidate CD8+ T cell epitopes, the 25 candidate antigens selected from L. major were screened using NetCTL.
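As a rough illustration of the scanning step, the Python sketch below enumerates all 9-mer windows of a protein sequence, labels each with its start position in the style used in this paper (for example, 127SLWSLLAGV, assumed here to denote a 1-based start position), and applies a combined-score cutoff of 0.75. The weighted-sum form of the combined score and the weights 0.15 (cleavage) and 0.05 (TAP) are assumptions based on the commonly cited NetCTL 1.2 server defaults, not values stated in this section; the function names are hypothetical, and the component scores would come from the prediction tool itself.

```python
def ninemers(sequence, k=9):
    """Yield (label, peptide) for every k-mer window; the label prefixes
    the peptide with its 1-based start position."""
    for start in range(len(sequence) - k + 1):
        peptide = sequence[start:start + k]
        yield f"{start + 1}{peptide}", peptide

def combined_score(mhc, cleavage, tap, w_cleavage=0.15, w_tap=0.05):
    """Assumed combination: MHC binding score plus weighted
    proteasomal cleavage and TAP transport scores."""
    return mhc + w_cleavage * cleavage + w_tap * tap

def is_predicted_epitope(mhc, cleavage, tap, threshold=0.75):
    """True if the combined score exceeds the threshold used in the text."""
    return combined_score(mhc, cleavage, tap) > threshold

if __name__ == "__main__":
    # Placeholder sequence, not a real protein.
    for label, peptide in ninemers("ABCDEFGHIJK"):
        print(label, peptide)
    print(is_predicted_epitope(0.7, 0.9, 1.0))
```

A protein of length n yields n - 8 overlapping 9-mers, so a proteome-scale scan is dominated by the cost of the per-peptide predictions rather than the enumeration.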
The tool NetCTL1. Apart from this, their experimentally validated CTL epitopes were also predicted by NetCTL and ranked in the top five. From the analysis, antigen number 1 was found to have the largest number of predicted CTL epitopes for HLA-A and HLA-B supertypes, while antigens numbers 4, 16, 19, and 21 had a higher number of CTL epitopes for HLA-A supertypes and antigens numbers 13, 16, 19, and 21 had a higher number of CTL epitopes for HLA-B supertypes in comparison to the control antigens. Overall, test antigen number 1 showed the highest number of supertype epitopes and was thus predicted to be the best antigen. Among the predicted CTL epitopes, those which displayed the top processing score for the respective MHC supertypes are presented in Tables 2 and 3. These 624 potential CTL epitopes were also checked for potential similarity with human proteins, using the Human Protein Reference Database (http://www.hprd.org/). None of the epitopes were found to be similar to any human protein [40].

Selection of Optimal Epitopes Set Based on Population Coverage Analysis. MHC is highly polymorphic; hence each individual possesses a set of MHC molecules of differing specificities; that is, different patients typically bind different repertoires of peptides. Thus, a crucial step in the design of an effective peptide-based vaccine is the selection of the epitope set which yields the best immune response in a given population or individual. Furthermore, the frequency with which an MHC allele occurs within the target human population directly affects the allele's contribution to the overall immunogenicity. Sette et al. have also demonstrated a correlation between immunogenicity and MHC class I binding affinity [39]. It is therefore reasonable to use MHC class I binding affinity prediction methods for calculation of the overall immunogenicity [40]. Hence, the present study employed OptiTope (using the BIMAS method [41]) to determine, from the selected epitopes, the optimal set of epitopes which yields the best immune response in the target populations susceptible to cutaneous leishmaniasis, namely, Southwest Asia, North Africa, and South America. Initially, the top scoring epitope set predicted by NetCTL for each of the 25 candidate antigens was submitted to OptiTope, and the sets predicted by BIMAS to be HLA nonbinders (negative) for the target populations were screened out. The remaining 10 epitope sets (from antigen numbers 1, 2, 4, 5, 6, 8, 15, 21, 22, and 25), which were predicted to be HLA binders (positive), were pooled to form a set of 120 candidate epitopes. However, when this combined epitope set was further optimized for the three different target populations, no optimization solution was obtained for any population. Therefore, the epitope set of antigen number 21 was randomly excluded from the combined set, and optimized results were then obtained with the combined set of 108 epitopes (derived from the 9 remaining positive antigens) for the different target populations.
For the target population of Southwest Asia, out of the set of 108 candidate epitopes, OptiTope selected a subset of 60 epitopes restricted by 19 MHC class I alleles covering 96.58% of the human population. The most immunogenic epitope, 127SLWSLLAGV, from antigen number 1, was found to bind the allele HLA-A*0201, contributing 7%, while the least immunogenic epitope, 246KSSALAHKL, from antigen number 25, contributed <1% to the overall immunogenicity (Table 4).
Similarly, for the target population of North Africa, out of the set of 108 candidate epitopes, OptiTope selected a subset of 45 epitopes restricted by 13 MHC class I alleles covering 88.48% of the human population. Here again, the epitope 127SLWSLLAGV, from antigen number 1, was the most immunogenic, covered the allele HLA-A*0201, and contributed 9%, while the least immunogenic epitope, 4LLPRLFLAF, from antigen number 8, contributed <1% to the overall immunogenicity (Table 5).
Also, for the target population of South America, out of the set of 108 candidate epitopes, OptiTope selected a subset of 127SIMSLQIRY, from antigen number 8, was found to contribute <1% to the overall immunogenicity (Table 6). Thus, overall, it was found that 6 antigens (numbers 1, 4, 13, 16, 19, and 21) had a larger number of predicted CTL epitopes than the control antigens and could be tested in vivo for validation. Similarly, the epitope 127SLWSLLAGV, from antigen number 1, binds to the HLA-A*0201 molecule and was predicted to be the most immunogenic for all three populations susceptible to leishmaniasis [42,43]. Hence, these antigens/peptides may be considered suitable candidates for vaccine and diagnostics design [44]. Further, these protective epitopes conform to the anchor-based MHC binding motif concept used for T cell epitope identification by many researchers such as Sette et al. [45] and Rötzschke et al. [46].
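The population coverage percentages reported above depend on the frequencies of the alleles that restrict the selected epitopes. The text does not state how OptiTope or dbMHC derive these figures, so the Python sketch below shows only one common way to estimate such a number: it assumes Hardy-Weinberg proportions at each locus and independence between loci, and counts an individual as covered if at least one of their alleles is in the covered set. The function names are hypothetical. The HLA-A*0201 frequency of 0.25 is the Southwest Asia value quoted in this paper; the HLA-B allele and its frequency are invented for illustration.

```python
# HLA-A*0201 frequency as quoted for Southwest Asia; the HLA-B entry is hypothetical.
ALLELE_FREQ = {"HLA-A*0201": 0.25, "HLA-B*0702": 0.10}

def locus_of(allele):
    """'HLA-A*0201' -> 'HLA-A'."""
    return allele.split("*")[0]

def population_coverage(covered, allele_freq):
    """Estimated fraction of individuals carrying at least one covered allele,
    assuming Hardy-Weinberg proportions and independent loci."""
    per_locus = {}
    for allele in covered:
        locus = locus_of(allele)
        per_locus[locus] = per_locus.get(locus, 0.0) + allele_freq[allele]
    uncovered = 1.0
    for total in per_locus.values():
        # Probability that both chromosomes carry an uncovered allele at this locus.
        uncovered *= (1.0 - min(total, 1.0)) ** 2
    return 1.0 - uncovered

if __name__ == "__main__":
    print(population_coverage({"HLA-A*0201"}, ALLELE_FREQ))
    print(population_coverage({"HLA-A*0201", "HLA-B*0702"}, ALLELE_FREQ))
```

Under these assumptions a single allele at 25% frequency already covers a little under half of the population, which is consistent with a prevalent allele such as HLA-A*0201 dominating the contribution to overall immunogenicity.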

Conclusions
The current study aimed to mine the L. major genome for the selection and characterization of antigens as vaccine components, based on criteria such as the presence of transmembrane domains and ortholog analysis. Furthermore, the immunogenic epitopes predicted from these antigens can be analyzed for HLA supertype binding and optimized into sets of vaccine epitopes for human populations susceptible to L. major infection. In light of the results obtained, it can be concluded that the combined use of reverse vaccinology and immunoinformatics, along with in vitro/in vivo validation strategies, has emerged as the most promising approach to designing successful vaccines against tropical diseases. In future, it would be helpful to use modeling and simulation systems in which critical experiments can be performed in a computer to predict the effects of experimental modifications on the immune system, thus offering a criterion for selecting the most meaningful experimental tests to be conducted in vivo or in vitro.