EPIPOX: Immunoinformatic Characterization of the Shared T-Cell Epitome between Variola Virus and Related Pathogenic Orthopoxviruses

Concerns that variola viruses might be used as bioweapons have renewed interest in developing new and safer smallpox vaccines. Variola virus genomes are now widely available, allowing computational characterization of the entire T-cell epitome and the use of such information to develop safe yet effective vaccines. To this end, we identified 124 proteins shared between various species of pathogenic orthopoxviruses, including variola minor and major, monkeypox, cowpox, and vaccinia viruses, and we targeted them for T-cell epitope prediction. We identified 8,106 and 8,483 unique class I and class II MHC-restricted T-cell epitopes, respectively, that are shared by all of these orthopoxviruses. Subsequently, we developed an immunological resource, EPIPOX, based on the predicted T-cell epitome. EPIPOX is freely available online and has been designed to facilitate reverse vaccinology. Thus, EPIPOX includes key epitope-focused protein annotations: time point of expression, presence of leader and transmembrane signals, and known location on outer membrane structures of the infective viruses. These features can be used to select specific T-cell epitopes suitable for experimental validation, restricted by single MHC alleles, by combinations thereof, or by MHC supertypes.


Introduction
Smallpox was a devastating contagious disease that ravaged humankind for millennia, wiping out entire civilizations [1]. The disease was caused by two types of variola virus (VARV), major and minor, which differed greatly in their average mortality rates: 30% versus 1%, respectively. VARV major was the most prevalent form [2,3]. Systematic vaccination against smallpox began in the early 19th century but the disease lingered until the World Health Organization (WHO) initiated worldwide vaccination campaigns in 1967. The last case was reported in Somalia in 1977 and in May 1980 the WHO declared that smallpox had been eradicated, ceasing vaccination [1,2]. Eradication was facilitated because there are no animal reservoirs for the virus, as it only infects humans [4].
VARV belongs to the Orthopoxvirus genus of the Poxviridae family, consisting of large double-stranded DNA viruses that replicate in the cytoplasm of infected cells [5,6]. Poxviruses are large and complex, with ∼250 genes and a multistage life cycle that produces different infective forms, including intracellular mature virions (IMV) and extracellular enveloped virus (EEV) [5,6]. Humans can be infected by several poxviruses; the closest to VARV that are also pathogenic to humans are vaccinia (VACV), cowpox (CPXV), and monkeypox (MPXV) viruses [7,8]. The primary reservoir of MPXV is rodents [9], while CPXV has the broadest animal reservoir range of all poxviruses, including cats, dogs, elephants, and rodents [10]. Historically, VACV has been considered to have emerged after repeated passages from an ancestral CPXV [11]. However, phylogenetic studies question that view and there is some speculation that VACV could be a horsepox virus (HPXV) [12]; yet both the host and the origin of VACV remain unknown [13]. VACV and CPXV infections in humans are generally mild and self-limiting and can induce cross-protective immunity [14]. The observation that CPXV sufferers did not get smallpox led Edward Jenner in 1798 to introduce a method of vaccination through scarification with Variolae Vaccinae, Latin for CPXV [15]. Immunization with CPXV was eventually displaced by the VACV vaccine, which was subsequently used for global smallpox vaccination [12].
Journal of Immunology Research
As smallpox was eradicated and vaccination ceased, the global population has become increasingly susceptible to both smallpox and zoonoses caused by orthopoxviruses [8,9]. People under 30 have no immunity against these viruses and VACV-induced immunity is waning in those who were vaccinated [16]. Despite recommendations by the WHO, stockpiles of smallpox virus have never been destroyed and there are concerns that unregistered stocks could be used as a weapon of bioterrorism [17]. Several features make smallpox a major terrorist threat. It replicates easily, is aerosolizable, and is highly contagious before, during, and after disease onset. Moreover, smallpox is lethal and disfiguring and has already been used as a biological weapon in North America during the French and Indian Wars [18]. Thus, there is renewed interest in the development of vaccines against smallpox, particularly safer ones, since immunization with VACV can result in serious adverse events and is considered risky in immunocompromised or immune-suppressed individuals [19].
Immune protection against orthopoxviruses requires both B and T cells [20] but the relevance of T cells is paramount. CD8 T cells are required to eliminate infected cells, while help by CD4 T cells is essential to elicit effective humoral responses [21]. Thus, people with dysfunctional humoral responses (e.g., agammaglobulinemia) can be vaccinated with VACV, while those with loss of T cells cannot as they can suffer severe disease [22]. T-cell immune responses are triggered by the recognition of foreign peptides bound to cell surface-expressed major histocompatibility complex (MHC) molecules, also known as human leukocyte antigens, HLA, in humans. CD4 T cells recognize peptides presented by MHC class II (MHC II) molecules while CD8 T cells recognize peptides presented by MHC class I (MHC I) molecules.
Advances in both immunology and genomic analysis offer new possibilities for eliciting immune protection without the requirement for live-virus vaccination and its attendant complications. The identification of HLA class I and class II restricted T-cell epitopes (CD8 and CD4 T-cell epitopes, respectively) from poxviruses may allow us to develop safe yet immunogenic peptide-based vaccines. Here, we describe the identification of protein antigens that are shared between several pathogenic orthopoxviruses, including VARV, MPXV, CPXV, and VACV, and of T-cell epitopes that are identical in all selected proteins. This information was used to create a freely accessible web resource, EPIPOX (URL: http://imed.med.ucm.es/epipox/), intended to facilitate the design of epitope-based vaccines against orthopoxviruses.

Orthopoxvirus Sequences and Experimentally Defined T-Cell Epitopes.
In this study, we used the entire proteomes of 8 orthopoxviruses: VARV major, strain Bangladesh-1975,  (Table 1). We also used experimentally defined poxvirus-specific HLA I and HLA II-restricted T-cell epitopes that were retrieved from the IEDB [24] and EPIMHC [25] databases. We only considered unique T-cell epitope sequences with a size of 9 amino acids that were reported to be identified in humans infected with orthopoxviruses or who were vaccinated. We provide a list of experimentally defined T-cell epitopes as supplementary material in Additional File S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/738020.

Protein Sequence Analyses and Annotations.
We took VARV major, strain Bangladesh-1975, as the reference for subsequent sequence analyses. We identified proteins with leader signals using SIGNALP [26] and transmembrane regions using TMHMM [27]. We identified protein orthologs using BLAST [27]. Briefly, we first ran BLAST searches of the reference proteins against formatted databases of each of the remaining orthopoxvirus proteomes. We performed the searches with default settings and considered only the description of the first hit and the corresponding alignment. Subsequently, we selected those reference proteins that gave hits in every proteome with identities greater than 60% and took the corresponding hits as orthologs. We used BIOPERL to parse BLAST hits [23].
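The ortholog selection rule described above can be illustrated with a minimal Python sketch. It assumes that BLAST results were exported in tabular form, with query ID, subject ID, and percent identity in the first three columns and the hits for each query listed best first. The function names top_hits and shared_orthologs and all identifiers in the toy data are hypothetical; the original analysis parsed BLAST reports with BIOPERL, not with this code.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (subject ID, percent identity)


def top_hits(tabular_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep only the first reported hit for each query protein."""
    best: Dict[str, Hit] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        best.setdefault(query, (subject, identity))  # first hit wins
    return best


def shared_orthologs(
    reference_ids: List[str],
    hits_by_proteome: Dict[str, Dict[str, Hit]],
    min_identity: float = 60.0,
) -> Dict[str, Dict[str, str]]:
    """Return reference proteins whose top hit exceeds min_identity in every proteome."""
    shared: Dict[str, Dict[str, str]] = {}
    for ref in reference_ids:
        orthologs: Dict[str, str] = {}
        for proteome, hits in hits_by_proteome.items():
            hit = hits.get(ref)
            if hit is None or hit[1] <= min_identity:
                break  # missing or too divergent in this proteome
            orthologs[proteome] = hit[0]
        else:
            shared[ref] = orthologs  # passed the filter in all proteomes
    return shared


# Toy data (hypothetical IDs; real tabular lines carry more columns).
cpxv = top_hits(["VARV_A\tCPXV_1\t95.0", "VARV_A\tCPXV_9\t40.0", "VARV_B\tCPXV_2\t55.0"])
mpxv = top_hits(["VARV_A\tMPXV_1\t97.5", "VARV_B\tMPXV_2\t90.0"])
result = shared_orthologs(["VARV_A", "VARV_B", "VARV_C"], {"CPXV": cpxv, "MPXV": mpxv})
print(result)
```

In this toy case only VARV_A is retained: VARV_B falls below the identity threshold in one proteome and VARV_C has no hit at all.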
Information on the temporal expression of VACV genes was kindly provided by Dr. Lefkowitz from the Poxvirus Bioinformatics Resource Center [28]. The information consisted of annotations identifying those genes that are expressed early (E), intermediate (I), and late (L) during the life cycle of VACV. This information is provided as supplementary material in Additional File S2. In addition, we identified, from the data provided by Dr. Lefkowitz   and EEV infective forms, as well as those proteins that are part of the VACV virion or CORE. This information is also included as supplementary material in Additional File S2. Protein annotations obtained for VACV were transferred to protein orthologs.

Prediction of T-Cell Epitopes.
We predicted MHC I and MHC II peptide binding to anticipate potential CD8 and CD4 T-cell epitopes, respectively. Specifically, we predicted peptide-MHC binding from VARV Bangladesh proteins that are shared between all selected orthopoxviruses using 32 HLA I- and 33 HLA II-allele-specific position-specific scoring matrices (PSSMs) [29-31]. For a given protein, we considered the top 2% and 4% of scoring peptides to constitute HLA I- and HLA II-binding peptides, respectively. We only predicted binding for peptides of nine residues: most HLA I-restricted peptides are 9 residues in length and, while HLA II-restricted peptides vary in length (9-22 amino acids), they have a core of 9 residues that anchors the peptide in the binding groove of HLA II molecules [30,32]. We also used N-gram language models to predict whether peptides can be generated from the source antigen by proteasomal cleavage [33]. This information is only relevant to HLA I-binding peptides, since most peptides presented by MHC I are derived from antigens degraded by the proteasome [34].
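As an illustration of the scoring scheme, the following Python sketch scores every 9-residue window of a protein with a PSSM and keeps a top fraction of the windows. It is a sketch under stated assumptions, not the published predictor: the PSSM values, the toy protein, the zero default for residues absent from a column, and the rounding rule (at least one peptide, rounding up) are all hypothetical choices made for the example.

```python
import math
from typing import Dict, List, Tuple

PSSM = List[Dict[str, float]]  # one {residue: score} column per peptide position


def score_peptide(peptide: str, pssm: PSSM, default: float = 0.0) -> float:
    """Sum the position-specific scores of a peptide."""
    return sum(column.get(residue, default) for residue, column in zip(peptide, pssm))


def top_scoring_peptides(protein: str, pssm: PSSM, fraction: float) -> List[Tuple[int, str, float]]:
    """Score all 9-mers of a protein and return the top fraction as (start, peptide, score)."""
    width = len(pssm)
    scored = [
        (start, protein[start:start + width], score_peptide(protein[start:start + width], pssm))
        for start in range(len(protein) - width + 1)
    ]
    scored.sort(key=lambda item: item[2], reverse=True)  # best score first
    n_keep = max(1, math.ceil(fraction * len(scored)))   # assumed rounding rule
    return scored[:n_keep]


# Hypothetical PSSM that rewards Leu at position 2 and Val at position 9 only.
toy_pssm: PSSM = [{} for _ in range(9)]
toy_pssm[1] = {"L": 2.0}
toy_pssm[8] = {"V": 2.0}

# Toy 30-residue protein with a single window matching both anchors.
toy_protein = "G" * 10 + "L" + "G" * 6 + "V" + "G" * 12

binders = top_scoring_peptides(toy_protein, toy_pssm, fraction=0.02)  # HLA I threshold
print(binders)
```

With the 4% threshold used for HLA II, the same function would simply be called with fraction=0.04.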

Database Building and Web Server Implementation.
Predicted T-cell epitopes and obtained protein annotations were incorporated into a POSTGRES relational database. The database consists of 3 tables (peptides, predictions, and proteins) that are linked through unique keys (Figure 1). Briefly, table predictions contains peptide sequences and their MHC restriction elements; table peptides includes the peptide molecular weight, its protein accession number, and whether the peptide is cleaved by the proteasome; and table proteins contains gene product information including temporal expression (E: early, I: intermediate, and L: late), location in the virus (IMV, EEV, and CORE), and the existence of leader and transmembrane regions. We also developed a web front end or GUI to allow ready access to EPIPOX. Behind the interface is a Python script that handles database queries through underlying SQL. The EPIPOX resource is implemented on an Apache Web server under the Mac OS X operating system.
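The three-table organization can be sketched as follows. This is only an approximation for illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for POSTGRES, and every column name, key, accession, peptide, and allele in it is a hypothetical placeholder, not the actual EPIPOX schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the POSTGRES database
conn.executescript("""
CREATE TABLE proteins (
    accession   TEXT PRIMARY KEY,
    expression  TEXT,     -- E, I, or L
    location    TEXT,     -- IMV, EEV, CORE, or NULL
    has_leader  INTEGER,  -- 1 if a leader signal was predicted
    has_tm      INTEGER   -- 1 if a transmembrane region was predicted
);
CREATE TABLE peptides (
    peptide_id          INTEGER PRIMARY KEY,
    sequence            TEXT,
    accession           TEXT REFERENCES proteins(accession),
    mol_weight          REAL,
    proteasome_cleaved  INTEGER
);
CREATE TABLE predictions (
    peptide_id  INTEGER REFERENCES peptides(peptide_id),
    hla_allele  TEXT,
    PRIMARY KEY (peptide_id, hla_allele)
);
""")

# Toy rows; molecular weights are left NULL rather than invented.
conn.executemany("INSERT INTO proteins VALUES (?, ?, ?, ?, ?)", [
    ("TOY_EARLY", "E", None, 0, 0),
    ("TOY_LATE", "L", "IMV", 0, 1),
])
conn.executemany("INSERT INTO peptides VALUES (?, ?, ?, ?, ?)", [
    (1, "AAAAAAAAA", "TOY_EARLY", None, 1),
    (2, "CCCCCCCCC", "TOY_EARLY", None, 0),
    (3, "DDDDDDDDD", "TOY_LATE", None, 1),
])
conn.executemany("INSERT INTO predictions VALUES (?, ?)", [
    (1, "HLA-A*02:01"), (1, "HLA-A*02:06"), (2, "HLA-A*02:01"), (3, "HLA-DRB1*01:01"),
])

# Peptides from early proteins that are predicted to be proteasome products.
rows = conn.execute("""
    SELECT pe.sequence, pr.hla_allele
    FROM predictions AS pr
    JOIN peptides AS pe ON pe.peptide_id = pr.peptide_id
    JOIN proteins AS p  ON p.accession = pe.accession
    WHERE p.expression = 'E' AND pe.proteasome_cleaved = 1
    ORDER BY pe.sequence, pr.hla_allele
""").fetchall()
print(rows)
```

Linking the tables through keys lets a single query combine restriction elements with protein annotations, which is the kind of selection the web interface exposes.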

Epitope-Vaccine Design against Orthopoxviruses.
T-cell adaptive immunity is required for clearance of poxviruses during infection and/or vaccination and can also contribute to protective immunity from subsequent exposures [35,36]. Moreover, peptides corresponding to VACV-specific CD8 T-cell epitopes can confer protection to mice subjected to lethal VACV challenges [37]. Fueled by the need to develop safer smallpox vaccines, such knowledge has led to the recent identification of many VACV-specific T-cell epitopes [37,38]. These T-cell epitopes are scattered across various specialized databases, including IEDB [24], EPIMHC [25], TEPIDAS [39], and AntiJen [40]. Of relevance for epitope-vaccine design, CD8 T cells primarily target early and nonstructural gene products [41,42]. CD4 T cells target late and the most abundant gene products (IMV and EEV membrane proteins and CORE proteins), as do antibodies [42,43]. While some of the identified VACV-specific T-cell epitopes are conserved in VARV, a rational approach to identifying all potential T-cell epitopes eliciting cross-protective immunity is still required.

Shared Orthopoxvirus Proteins for Cross-Protective Immunity.
Nearly all orthopoxviruses can protect against challenge with another orthopoxvirus [14]. This exquisite cross-protective immunity is likely a result of direct antigenic similarity between poxviruses. Therefore, prior to defining potential T-cell epitopes we identified shared antigens between pathogenic orthopoxviruses. Identification of shared antigens is also relevant to reducing the experimental burden associated with T-cell epitope identification. Orthopoxviruses pathogenic to humans have large genomes encompassing over 180 open reading frames (ORFs), with the exception of the VACV Ankara strain, which has only 157 genes and lacks the ability to replicate [44] (Table 1). Using VARV major, strain Bangladesh-1975, as a reference, we identified 124 ORFs that are shared between 8 different complete genomes from several orthopoxviruses, including VARV minor, CPXV, MPXV, and several VACV strains (Additional File S3). Although the criterion for selection was 60% identity, all 124 selected proteins have an average identity ≥ 85%, as shown in Additional File S3. These proteins are prime candidates to induce cross-protective immunity, provided that they are targeted by the immune system. Interestingly, within the selected proteins there are 8 known immunogens that conferred >60% protection against VACV in animal models (Table 2) [45]. Six of these immunogens are IMV or EEV proteins that carry transmembrane regions and/or are late gene products. Among the selected 124 proteins we also found 26 additional proteins with transmembrane regions that could be prime vaccine subunit candidates (Table 3). Some of these proteins also have leader signal sequences (Table 3). Viral proteins with leader sequences follow the cell secretory pathway and are thus also important targets to consider for vaccine design [46,47].

T-Cell Epitome from Pathogenic Orthopoxvirus Proteins.
We targeted the shared orthopoxvirus proteins for T-cell epitope prediction using 32 HLA I-specific and 33 HLA II-specific profile matrices (details in Material and Methods). The alleles targeted for peptide binding prediction are shown in Additional File S4. We selected these alleles because there are experimental peptide-binding data for them, which are required to build accurate peptide-MHC binding predictors [48]. Incidentally, these HLA alleles are frequently expressed in the general population and targeting them for epitope prediction permits the development of epitope-based vaccines covering the entire population. These HLA allelic variants can have overlapping peptide-binding repertoires and can be clustered accordingly in supertypes [49,50]. Selecting promiscuous peptides that bind to multiple HLA molecules facilitates the development of vaccines with a minimum number of peptides [49-51].
We predicted a total of 18726 HLA I-restricted and 32722 HLA II-restricted orthopoxvirus-specific T-cell epitopes, all being identical between all orthopoxviruses considered in this study. In Additional File S4 we provide the numbers of T-cell epitopes predicted by each HLA-specific profile used in this study. We predicted more CD4 than CD8 T-cell epitopes because we used a more permissive peptide-binding threshold for MHC II molecules (4% of top scoring peptides) than for MHC I molecules (2% of top scoring peptides), since peptide-binding prediction to MHC II molecules is considerably less accurate than to MHC I molecules [46]. Interestingly, we identified only 8106 unique HLA I-restricted T-cell epitope sequences and a few more (8483) unique HLA II-restricted T-cell epitope sequences. Therefore, there is a considerable overlap between the peptide-binding repertoires of HLA molecules, which is larger for HLA II molecules than for HLA I molecules. HLA I-restricted peptides bound on average to 2.3 distinct HLA I molecules, while HLA II-restricted peptides bound on average to 3.8 distinct HLA II molecules. This is because peptide binding to MHC II molecules is more degenerate than to MHC I molecules [29,30]. In Additional File S5, we provide all distinct predicted peptides with the HLA molecules that they were predicted to bind. Interestingly, there is also some overlap between HLA I- and HLA II-restricted peptides. In particular, we find that 2452 peptides are predicted to be restricted by both HLA I and HLA II molecules. Thus, in total the predicted T-cell epitome consisted of just 14137 unique sequences among all predicted T-cell epitopes.
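The counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in the text; the variable names are ours, and the averages are assumed here to be the number of peptide-HLA pairs divided by the number of unique sequences, which may differ from how the reported averages were actually computed.

```python
# Counts taken from the text above.
hla1_pairs, hla2_pairs = 18726, 32722    # predicted peptide-HLA pairs
hla1_unique, hla2_unique = 8106, 8483    # unique peptide sequences per class
restricted_by_both = 2452                # sequences predicted for both classes

# Inclusion-exclusion: sequences shared by both classes are counted once.
total_unique = hla1_unique + hla2_unique - restricted_by_both

# Assumed definition of the average number of HLA molecules bound per peptide.
avg_hla1 = hla1_pairs / hla1_unique
avg_hla2 = hla2_pairs / hla2_unique

print(total_unique)        # size of the predicted T-cell epitome
print(round(avg_hla1, 2))  # HLA I molecules per unique HLA I-restricted peptide
print(round(avg_hla2, 2))  # HLA II molecules per unique HLA II-restricted peptide
```

The union reproduces the 14137 unique sequences, and the HLA I ratio rounds to the reported 2.3. The HLA II ratio computed this way is about 3.86, close to the reported 3.8.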
We compared the predicted T-cell epitome with experimentally defined poxvirus-specific HLA-restricted T-cell epitopes deposited in the IEDB [24] and EPIMHC [25]. We retrieved 170 HLA I- and 9 HLA II-restricted T-cell epitopes meeting our criteria (see Additional File S1), but we only considered for comparison the 85 HLA I- and 8 HLA II-restricted T-cell epitopes that we identified here to be conserved in all orthopoxviruses considered in this study. Of those, 72 HLA I- and 6 HLA II-restricted T-cell epitopes were found within our predicted T-cell epitome. Moreover, we predicted the experimentally verified restriction element for >80% of them. The experimentally determined and shared epitopes that were not predicted (a minority) either are restricted by noncovered alleles or were simply not predicted. In Table 4, we summarize the data showing the verified and predicted HLA restriction elements. In sum, we readily predicted most of the experimentally verified T-cell epitopes. Considering that on average 10% of predicted T-cell epitopes can be experimentally verified [52], we expect that many more valid T-cell epitopes remain to be validated within the T-cell epitome predicted in this study.

EPIPOX Database and Web Server.
We developed a relational database based upon the predicted T-cell epitome and a web-based resource to facilitate online access and to query the database. We named this resource EPIPOX and made it available for free public use (URL: http://imed.med.ucm.es/epipox/). EPIPOX is a de facto analysis pipeline of viral T-cell epitomes. The content of the EPIPOX database is organized in three tables (peptides, predictions, and proteins) (Figure 1). We only included annotations that are relevant to epitope vaccine design, such as temporal expression of gene products and location in relevant structures of the virus such as the EEV and IMV membranes and CORE. Early expressed proteins and highly expressed proteins are generally thought to be more immunogenic, particularly with regard to CD8 T cells [53,54]. On the other hand, highly abundant late proteins that are located in membrane structures of the poxvirus appear to be the main focus of the antibody and CD4 T-cell response [42,43]. In the table proteins, we also provide annotations on whether the proteins have a transmembrane region or a leader signal sequence, as proteins with these features often interact with host cells and are important targets for subunit vaccine design [46,47]. The EPIPOX web interface (Figure 2) allows querying of the database combining any annotation field in the database, as described above. For intuitive use, the interface is divided into two main sections. In the first section (SEARCH), users select proteins and restriction elements for epitope retrieval. In this section, EPIPOX also provides the option to query the database for promiscuous T-cell epitopes binding to three HLA I supertypes (A2, A3, and B7). The alleles belonging to these supertypes are present in 88% of the population regardless of ethnic group. Selecting promiscuous peptides restricted by these 3 supertypes facilitates maximizing the population coverage of vaccines with minimum numbers of peptides [49,50,55]. In the second section (LIMIT), users can select annotation criteria to restrict the results. In addition, users can select only those peptides with a relative score above some selectable value. HLA-specific profiles used to score T-cell epitopes can reach a maximum score, which is used to set the relative score in percentage of each peptide. For HLA I-restricted epitopes, users can also restrict the search to those epitopes potentially generated by the proteasome.
Table 4: Experimentally identified T-cell epitopes within the shared T-cell epitome predicted from pathogenic orthopoxvirus proteins.
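The relative score and the supertype-based selection offered by the interface can be sketched in Python as follows. This is a hedged illustration, not the EPIPOX code: it assumes that the maximum score of a profile is the sum of its column maxima, that the relative score is the peptide score as a percentage of that maximum, and that a peptide counts as promiscuous when it is predicted to bind at least two alleles of a supertype. The function names, the toy profile, the peptide labels, and the short allele list are hypothetical.

```python
from typing import Dict, Iterable, List, Set

PSSM = List[Dict[str, float]]  # one {residue: score} column per peptide position


def max_profile_score(pssm: PSSM) -> float:
    """Highest score any peptide can reach: the sum of the column maxima."""
    return sum(max(column.values()) for column in pssm)


def relative_score(score: float, pssm: PSSM) -> float:
    """Express a raw score as a percentage of the profile maximum."""
    return 100.0 * score / max_profile_score(pssm)


def promiscuous_peptides(
    binding: Dict[str, Set[str]],
    supertype_alleles: Iterable[str],
    min_alleles: int = 2,  # assumed definition of promiscuity
) -> List[str]:
    """Peptides predicted to bind at least min_alleles alleles of one supertype."""
    members = set(supertype_alleles)
    return sorted(
        peptide for peptide, alleles in binding.items()
        if len(members & alleles) >= min_alleles
    )


# Toy profile: nine identical columns, so the maximum score is 9.0.
toy_pssm: PSSM = [{"A": 1.0, "G": 0.5} for _ in range(9)]
half_max = relative_score(4.5, toy_pssm)

# Illustrative subset of A2 supertype alleles and toy predictions.
a2_supertype = ["HLA-A*02:01", "HLA-A*02:06", "HLA-A*68:02"]
toy_binding = {
    "PEPTIDE_1": {"HLA-A*02:01", "HLA-A*02:06"},
    "PEPTIDE_2": {"HLA-A*02:01"},
    "PEPTIDE_3": {"HLA-A*02:01", "HLA-B*07:02"},
}
selected = promiscuous_peptides(toy_binding, a2_supertype)
print(half_max, selected)
```

A relative-score cutoff, as in the LIMIT section, would then amount to keeping peptides whose relative_score exceeds the value chosen by the user.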
As an example, in Figure 3, we show the page resulting from a sample query consisting of promiscuous peptides from CORE binding to the A2 supertype. From the EPIPOX output, users can also access additional information available from the Virus Pathogen Resource database (Figure 3) [56]. EPIPOX is related to several existing databases. On the one hand, it shares features with generic epitope databases such as EPIMHC [25], AntiJen [40], and IEDB [24]; on the other hand, it shares features with poxvirus genome annotation-oriented databases such as the Poxviridae database [28] (no longer operating) and the Virus Pathogen Resource (http://www.viprbrc.org/) [56]. This latter resource contains information on virus sequences, functional annotations, and epitopes derived from IEDB [24]. However, the Virus Pathogen Resource does not allow selection of epitopes or antigens by criteria that are relevant to epitope vaccine design. In fact, EPIPOX is the only dedicated immunologic resource that has been designed to facilitate the rational selection of epitopes and antigens for subunit vaccine design.

Conclusions and Future Development
The availability of the VARV genomes enables the use of predictive tools that reveal entire T-cell epitomes and facilitate the development of epitope-based vaccines. However, in large and complex viruses, such as VARV, the potential T-cell epitome can be so sizeable that it will challenge experimental validation. Therefore, in this work we applied a rational strategy to limit the list of potential T-cell epitopes. First, we reduced the number of antigens by half by simply selecting those that are conserved among pathogenic orthopoxviruses related to VARV. Second, we enriched the antigens with annotations such as temporal expression and location. Lastly, we created a resource and de facto analysis pipeline (EPIPOX) with which to interrogate the resulting T-cell epitome and enable users to select immunologically relevant subsets of T-cell epitopes suitable for experimental validation.
We expect that this work and EPIPOX will be instrumental in developing safer smallpox vaccines and thereby in preventing zoonoses caused by other orthopoxviruses, including MPXV, which is also a potential terrorist bioweapon. In the future, we plan to enhance EPIPOX with validated and/or experimentally determined epitopes, upgrade protein annotations with functional information, and include additional features such as TAP transport [57], ERAAP cleavage [58], and T-cell epitope immunodominance. In sum, we expect EPIPOX to become a useful resource for, among other things, the immunoinformatic characterization of viral genomes and computational reverse vaccinology.