Identification of Novel Potential Vaccine Candidates against Tuberculosis Based on Reverse Vaccinology

Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis and is considered the second leading cause of death from an infectious agent worldwide. The limited efficacy of the bacillus Calmette-Guérin (BCG) vaccine against pulmonary TB and the emergence of multidrug-resistant TB warrant the need for more efficacious vaccines. Reverse vaccinology uses the entire proteome of a pathogen to select the best vaccine antigens by in silico approaches. The M. tuberculosis H37Rv proteome was analyzed with NERVE (New Enhanced Reverse Vaccinology Environment) prediction software to identify potential vaccine targets; the 331 proteins identified were further analyzed with VaxiJen to determine their antigenicity value. Only candidates with an antigenicity value ≥0.5 and an adhesin probability ≥50%, and without homology with human proteins or transmembrane regions, were selected, resulting in 73 antigens. These proteins were grouped by family into seven groups and analyzed by amino acid sequence alignments, selecting 16 representative proteins. For each candidate, a search of the literature and protein analysis with different bioinformatics tools, as well as a simulation of the immune response, were conducted. Finally, we selected six novel vaccine candidates from M. tuberculosis, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, that can be used to improve or design new TB vaccines.


Introduction
Tuberculosis (TB) is a chronic infectious disease caused by an acid-fast bacillus, Mycobacterium tuberculosis [1]. TB is the second leading cause of death from an infectious agent worldwide [2,3]; in 2012, there were an estimated 8.6 million incident cases of TB globally, which is equivalent to 122 cases per 100,000 people, and the absolute number of cases continues to increase slightly from year to year [4].
The current vaccine against tuberculosis, bacillus Calmette-Guérin (BCG), exerts different levels of protection: from 46 to 100% against the disseminated disease form and from 0 to 80% against pulmonary disease [5,6]. In addition to this low efficacy, reemergence of the disease caused by the appearance of the acquired immunodeficiency syndrome (AIDS) and multidrug-resistant (MDR) strains has generated requirements for a new and more efficient vaccine against TB [7].
The development of new vaccines starts with the identification of unique components of the microorganism capable of generating a protective immune response [3]. With traditional techniques, this could be a long and arduous process, aside from the difficulty of cultivating the microorganism in the laboratory [8][9][10].
Advances in sequencing technology and bioinformatics have resulted in an exponential growth of genome sequence information that has contributed to the development of software that aids genomic analysis in a short period of time and at a low cost. Reverse vaccinology (RV) applied to the genome of a pathogen aims to identify in silico the complete repertoire of immunogenic antigens that an organism is capable of expressing without the need to culture the microorganism. Additionally, RV can help to discover novel antigens that might be less abundant, not expressed in vitro, or less immunogenic during infection and that are likely to be missed by conventional approaches [8][11][12][13][14].

BioMed Research International
The RV process begins with the proteomic information in a database; then, the selection of vaccine candidates is performed by means of different bioinformatics tools that analyze the properties of each protein and the human immune response generated by them [8][9][10][15]. Good vaccine candidates are those that show no homology with human proteins, to avoid the generation of a potential autoimmune response [16]; these candidates must also lack transmembrane regions, in order to facilitate their expression. In addition, it is necessary to verify the lack of cross-reaction with other pathogenic antigens [14]. A good vaccine candidate should also possess good antigenic and adhesin properties, which are important for the pathogenesis of the microorganism and for protection against the disease [13,17]. Extracellular or cell surface localized proteins are good vaccine candidates due to their increased accessibility to the immune system [14,16]. Software for simulating the immune response has also been developed and could help in the search for novel vaccine candidates [15]. In this work, we have applied RV to the M. tuberculosis proteome with the purpose of selecting new antigens that could be used in a novel and more efficient vaccine against TB.

Materials and Methods
Proteome Analysis.
New Enhanced Reverse Vaccinology (NERVE) software was downloaded, installed, and used to determine vaccine candidates employing the default parameters for Gram-positive bacteria [13]. The proteome sequences of M. tuberculosis H37Rv (NC_000962.2), Mycobacterium bovis AF2122/97 (NC_002945.3), and M. bovis BCG str. Pasteur 1173P2 (NC_008769.1) were downloaded from the Genome Project database of the National Center for Biotechnology Information (NCBI) [18]. Each proteome was analyzed individually by NERVE; conservation values for all proteins were determined by comparing the M. bovis and BCG proteomes against the M. tuberculosis proteome using the comparative option.

Antigenicity Determination.
The antigenicity value was calculated for each protein using its amino acid sequences and the VaxiJen server, which predicts whether a protein could be a protective antigen. VaxiJen is based on auto cross covariance (ACC) and has a threshold of 0.5 in the antigenicity value [19].
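As a rough illustration of the auto cross covariance (ACC) idea mentioned above, the following sketch converts a variable-length list of per-residue descriptor vectors into a fixed-length feature vector. The function name `acc_transform`, the toy descriptor values, and the chosen maximum lag are assumptions made for illustration only; this is not the VaxiJen implementation, which uses its own amino acid descriptors and a trained classification model.

```python
from typing import List


def acc_transform(descriptors: List[List[float]], max_lag: int) -> List[float]:
    """Return ACC terms for every descriptor pair (j, k) and lag 1..max_lag.

    descriptors[i][j] is the value of descriptor j at residue position i.
    Each term is sum_i d[i][j] * d[i + lag][k] / (n - lag), with n residues.
    """
    n = len(descriptors)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_desc = len(descriptors[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


# Toy input (hypothetical values): 4 residues, 2 descriptors per residue.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
acc = acc_transform(toy, max_lag=2)
print(len(acc))  # number of features = max_lag * n_desc * n_desc
```

The output length depends only on the number of descriptors and the maximum lag, not on the protein length, which is what allows proteins of different sizes to be compared against a single threshold.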

Selection of Representative Proteins.
With the parameters calculated with NERVE and VaxiJen, we selected proteins that had an antigenicity value ≥0.5 and an adhesin probability ≥50% and that lacked homology with human proteins and transmembrane regions. The selected proteins were grouped according to the protein family to which they belong. In this manner, we obtained seven groups: ESX family proteins, PPE family proteins, PE family proteins, PE_PGRS family proteins, lipoproteins, hypothetical proteins, and a last group, denominated "others," composed of proteins with miscellaneous characteristics. The amino acid sequence of each protein was downloaded from the NCBI protein database, and an alignment was made for each group of proteins using Clustal X software [20] in order to select representative proteins from each group.
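The selection criteria described above can be sketched as a simple filter. The `Candidate` record, its field names, and the example values are hypothetical and only illustrate the thresholds stated in the text; they do not reproduce the output format of NERVE or VaxiJen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    antigenicity: float          # antigenicity value (VaxiJen-style score)
    adhesin_probability: float   # between 0.0 and 1.0
    human_homologs: int          # number of similar human proteins
    transmembrane_regions: int   # number of predicted transmembrane regions


def passes_filter(c: Candidate,
                  min_antigenicity: float = 0.5,
                  min_adhesin: float = 0.5) -> bool:
    """Apply the four criteria stated in the text to one candidate."""
    return (
        c.antigenicity >= min_antigenicity
        and c.adhesin_probability >= min_adhesin
        and c.human_homologs == 0
        and c.transmembrane_regions == 0
    )


# Hypothetical records, not real predictions for any protein.
candidates: List[Candidate] = [
    Candidate("protein_A", 0.72, 0.81, 0, 0),
    Candidate("protein_B", 0.43, 0.90, 0, 0),  # antigenicity too low
    Candidate("protein_C", 0.65, 0.55, 1, 0),  # has a human homolog
    Candidate("protein_D", 0.80, 0.60, 0, 1),  # has a transmembrane region
]
selected = [c.name for c in candidates if passes_filter(c)]
print(selected)
```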

Immune Response Simulation.
With the amino acid sequences of the proteins selected, a human immune response simulation was performed using the C-ImmSim software to predict whether these proteins could generate a protective immune response against TB [15]. C-ImmSim simulates a portion of a lymph node but is not set up to simulate a realistic concentration of antigen; however, we adjusted the antigen concentration simulation to a high dose, comparable to a vaccination event. Different immunizations were simulated with each protein in the following two different schemes: first, a single immunization with each protein individually at time zero and, second, three immunizations at 0, 2, and 4 weeks with each protein separately. The level of Th1 cells stimulated 80 days after the first injection was identified.
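The two immunization schemes can be expressed as injection time points in simulation steps. The sketch below assumes that one simulation step corresponds to 8 hours (three steps per day) and that steps are counted from zero; both are assumptions that should be checked against the C-ImmSim version in use, and the helper name `days_to_step` is hypothetical.

```python
STEPS_PER_DAY = 3  # assumption: one simulation step represents 8 hours


def days_to_step(days: int) -> int:
    """Convert a time in days to a simulation step index (counted from 0)."""
    return days * STEPS_PER_DAY


# Scheme 1: a single immunization at time zero.
single_dose = [days_to_step(0)]

# Scheme 2: three immunizations at 0, 2, and 4 weeks.
three_doses = [days_to_step(7 * week) for week in (0, 2, 4)]

# Read-out: Th1 cell level 80 days after the first injection.
readout_step = days_to_step(80)

print(single_dose, three_doses, readout_step)
```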

Protein Analysis.
The bioinformatics programs used to study the vaccine candidates' amino acid sequences included Phobius [21], to calculate and confirm protein subcellular localization more precisely; ANTHEPROT [22], Expasy [23], and IEDB software [24] and their different models, to localize protein regions with greater hydrophilicity and greater solvent accessibility, which are related to antigenic regions; and the SYFPEITHI ver. 1.0 program [25], to determine the frequency of presentation of peptides to 35 different alleles of the major histocompatibility complex (MHC). In the case of lipoproteins, we employed only ProPred software [26] to determine the frequency of presentation of 25 amino acid peptides to different alleles of the MHC-II.
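To illustrate how hydrophilic regions of a sequence can be located, the sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale, where lower values indicate more hydrophilic segments. The choice of scale, the window size, the function names, and the toy sequence are assumptions for illustration; the tools cited above offer several scales and models and may give different results.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> List[float]:
    """Mean hydropathy of every window of the given size along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 7) -> Tuple[int, float]:
    """Start index (0-based) and mean score of the most hydrophilic window."""
    profile = hydropathy_profile(sequence, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


# Toy sequence (not a real protein): a lysine stretch flanked by isoleucines.
start, score = most_hydrophilic_window("IIIIKKKKIIII", window=4)
print(start, round(score, 2))
```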

Bibliographic Study.
Bibliographic information was sought for each protein in different online databases regarding its putative function, its use in vaccines, its role in virulence, its corresponding evaluated mutants, its induction of an immune response, and its level of conservation in mycobacterial proteomes.

Vaccine Candidate Selection.
The vaccine candidates were selected using all the results, simulations, and bibliographic information obtained. The candidates possess the best values of the parameters calculated and exert diverse functions that render them useful as different targets in the microorganism (Figure 1).

Results and Discussion
Proteome Analysis.
RV can reduce the time and cost of developing a new vaccine, with the aim of obtaining a safer and more effective product. With the purpose of designing a new vaccine against TB with a greater protection level against pulmonary disease, we used RV to select vaccine candidates from the M. tuberculosis proteome.
The selection of potential vaccine candidates in this study was based on the analysis of several important properties [13,19,21,27]. (1) Surface or secreted proteins were selected because they are good targets of the effector molecules of the immune system. (2) Proteins with multiple transmembrane helices were discarded because they are not recommended for vaccine development, especially DNA vaccines, as they are difficult to clone, express, and purify. (3) Adhesin probability was considered an important factor because the first step in bacterial invasion is contact with host molecules through adhesion structures, which makes adhesins good vaccine candidates capable of eliciting an immune response that blocks infection. (4) Proteins with similarity to those of the human proteome were avoided, because the use of proteins, or of the genes that encode them, with similarity to human proteins or DNA sequences can generate an autoimmune response or recombination and integration events in the host genome, respectively. (5) Proteins with the best antigenicity values were chosen. Antigenicity is the property of a protein to be recognized by the immune system; hence, the highest antigenicity values are desirable when selecting the best potential vaccine candidates.
The M. tuberculosis proteome was studied with NERVE software, which identifies vaccine candidates in silico by analyzing the biological characteristics that influence vaccine design. NERVE analyzed the M. tuberculosis H37Rv proteome, composed of 3989 proteins; candidates were selected considering the following characteristics: ≥50% adhesin probability, fewer than two transmembrane regions, and fewer than five proteins similar to the human proteome. In addition, the candidates had to have neither membrane nor cytoplasmic localization. Finally, NERVE selected 331 proteins as vaccine candidates (Additional file 1) (see Supplementary Material available online at http://dx.doi.org/10.1155/2015/483150).
The results were compared with the information deposited in the VIOLIN database [28], and we found several important matches in some antigens. Those coincidences provided support for the results obtained with NERVE. The vaccine candidates selected have diverse putative functions and different conservation values. Moreover, this software tool has the option of comparing the proteomes of two different organisms and of determining a conservation value among all the proteins. In this case, we compared the M. tuberculosis H37Rv proteome against the M. bovis and the BCG proteome [13,14].
The proteome analysis was finalized by determining the antigenic value of the 331 vaccine candidates using the VaxiJen server to obtain protective antigens prediction.

Selection of Representative Proteins.
Using the calculated characteristics and the stricter selection criteria described in the Methods section, the number of vaccine candidates was reduced to 73 proteins. These proteins were grouped into seven clusters according to their type and the family to which they belong, as follows: the ESX protein family (3 proteins), the PPE protein family (7 proteins), the PE protein family (8 proteins), the PE_PGRS protein family (16 proteins), lipoproteins (5 proteins), hypothetical proteins (21 proteins), and the final group, denominated "others" (13 proteins) (Table 1). This grouping was carried out because members of the same protein family are known to be closely related in sequence and function.
With the purpose of selecting representative proteins from each group, we used amino acid sequences from their members to perform alignments using the Clustal X program, with the exception of those from the "others" and hypothetical proteins groups.
In the case of the hypothetical proteins, an alignment was carried out using the PSI-BLAST tool from the NCBI website in order to assign them a putative function; in some cases no match was found, but in others we could infer a probable function (Figure 2).
For the selection of representative proteins, we took into account the similarity among the sequences of the group, the best antigenicity values, a high probability of acting as an adhesin, and conservation in the M. bovis and BCG proteomes. At this step, we chose 12 representative candidates from the seven protein families (Table 1).
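One way to make the "similarity among the sequences of the group" criterion concrete is to compute pairwise percent identity over an existing alignment and identify the member closest, on average, to the rest of its group. The sketch below is only an assumption about how such a criterion could be scored; the function names and the toy aligned sequences are hypothetical, and the actual selection also weighed antigenicity, adhesin probability, and conservation.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def most_central(aligned: Dict[str, str]) -> str:
    """Name of the member with the highest mean identity to the other members."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")

    def mean_identity(name: str) -> float:
        others = [n for n in aligned if n != name]
        total = sum(percent_identity(aligned[name], aligned[n]) for n in others)
        return total / len(others)

    return max(aligned, key=mean_identity)


# Toy alignment (hypothetical sequences, "-" marks a gap).
toy_alignment = {
    "member_1": "MKT-AYLG",
    "member_2": "MKTQAYLG",
    "member_3": "MRTQSYIG",
}
print(most_central(toy_alignment))
```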

Immune Response Simulation.
The 12 proteins selected were used to simulate the human immune response under different conditions with C-ImmSim software. For each simulation, the C-ImmSim server produced the following nine graphs: B-cell population, B-cell population per state, Th cell population, Th cell population per state, Tc population, Tc population per state, CD population, EP population, and Ab production (Figure 3). We found the same pattern for all the selected proteins, with slight differences in levels. We focused mainly on the Th1 cell level including all states, because it has been reported that protective immunity against TB is conferred mainly by Th1 cells [14,29]. The levels of stimulated cells were most similar among the selected proteins when one immunization was performed; these levels increased with the number of immunizations, and there was also a marked differentiation among the proteins at the final simulation step. PE_PGRS family proteins showed the highest levels of stimulated Th cells and also generated a good level of B-cell stimulation, which is important for complementing the immune response (Figure 4).

Protein Analysis.
The proteins were analyzed individually with several bioinformatics tools to determine whether the protein sequences had a region with important antigenic characteristics, that is, a region where there are matches with hydrophobicity, solvent accessibility, presentation to MHC, and antigenicity.
We did not find a protein that clearly possesses an antigenic region that could be used as an epitope or as a fusion in a vaccine. Conversely, we determined that all the proteins had high values of antigenicity in different parts of the sequence; thus, we recommend the use of complete proteins in a vaccine formulation because using only a fragment could eliminate some epitopes necessary for a complete and protective human immune response against the whole microorganism.
We also found that all of the proteins selected as vaccine candidates could be presented to several MHC with a high probability value, resulting in good probability of immune response induction against these components of M. tuberculosis.
In this analysis process, we identified protein subcellular localizations using Phobius tools to confirm the results emitted by NERVE, because Phobius software is more accurate than the program (HMMTOP) utilized by NERVE [21]. This characteristic is important in a vaccine candidate because proteins with cytoplasmic or membrane localization are less antigenic than extracytoplasmic proteins.

Bibliographic Study.
We wanted to know whether the proteins would be safe for use in a vaccine formulation prior to preclinical trials; thus, we reviewed the published information on characteristics related to their impact on virulence.
We observed that some proteins have not been studied, but we found information about other members of their family groups, such as PE, PPE, and PE_PGRS proteins, suggesting an influence on immune system evasion and antigenic variation, an important feature to consider for a protein that will be included in a vaccine [39,47,59]; in addition, PE_PGRS family proteins are restricted to pathogenic mycobacteria and, in particular, PE_PGRS11 and PE_PGRS17 have been reported to induce maturation and activation of human dendritic cells, enhancing the ability of the latter to stimulate Th cells [26,60].
In the case of the antigen LppN, we did not find specific information; moreover, almost all the proteins in the M. tuberculosis genome lack conserved regions, which means that they are unique proteins with different characteristics. Some lipoproteins are major antigens in the Mycobacterium genus that can also generate a cellular and humoral immune response, but without an immune memory response [50][61][62][63][64].
Erp protein is a virulence factor present only in the Mycobacterium genus [51][52][53]; it is an immunodominant antigen related to pathogenicity and is strongly induced under nutrient starvation, which is related to the latency phase [53]. On the other hand, PBP1 protein is important in the replication phase because it catalyzes the final steps of bacterial cell wall peptidoglycan synthesis [54,65,66].
The EsxL candidate is an ESX-like protein with characteristics very similar to those of Esat-6, an immunodominant secreted protein used in research associated with the diagnosis of TB and with new TB vaccines [30]. Esat-6 is a strong T-cell antigen, and its family members are involved in virulence and in host-pathogen interplay via either antigenic variation or antigenic drift [31,32,67].
We did not find any information related to the hypothetical proteins Rv3207c and Rv3718c, and, to date, it is unknown whether these proteins could pose any risk within a vaccine formulation; thus, further studies are needed to elucidate the role of these proteins in this kind of biological product. Regarding the remaining candidates, we found no information relating them to high levels of bacterial virulence; conversely, we identified notable features that make these proteins good vaccine candidates.
These proteins play different roles in mycobacterial pathogenicity and possess notable values of antigenicity and immune response induction; in addition, the PE_PGRS49 candidate is not present in the M. bovis or BCG proteome, an important characteristic that could be useful for improving the specific immune response of the current BCG vaccine against pulmonary disease.

Conclusions
The application of bioinformatics programs for the identification of proteins that could be used as vaccine candidates is a very useful, easier, and shorter process compared with traditional vaccinology, which is important for the research concerning public health.
Using RV, we selected six novel vaccine candidates from the M. tuberculosis H37Rv proteome, employing mainly in silico studies. The six proteins selected, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, belong to several protein families and possess different characteristics that are useful and important in vaccine design. The bibliographic information also indicated that they might be safe.
The potential vaccine candidates selected in this work could be used in different vaccine designs to conduct experiments in order to validate them as DNA vaccines, rBCG, or as recombinant proteins, to improve protection against TB, rendering the new vaccines more effective against the pulmonary disease.
We did not propose a specific and unique antigenic region within these protein structures; however, the bioinformatics and bibliographic analyses showed characteristics that make them valuable putative vaccine candidates that could be investigated experimentally to determine whether they are suitable for the development of a new vaccine against tuberculosis.

Figure 4: Levels of Th cells stimulated with vaccine candidates assessed with the C-ImmSim server. Simulation with C-ImmSim was performed for each vaccine candidate with one and three immunizations, and the number of stimulated Th1 cells per microliter was identified 80 days after the first immunization. The best stimulation was induced by the PE_PGRS49 protein, followed by the PE_PGRS56, PE26, PPE65, and PBP1 proteins.