Immunoinformatics Approach for Multiepitope Vaccine Prediction from H, M, F, and N Proteins of Peste des Petits Ruminants Virus

Background. Small ruminant morbillivirus, also known as peste des petits ruminants virus (PPRV), causes an acute and highly contagious viral disease of goats, sheep, and other livestock. This study aimed to predict an effective multiepitope vaccine against PPRV from the immunogenic proteins haemagglutinin (H), matrix (M), fusion (F), and nucleoprotein (N) using immunoinformatics tools. Materials and Methods. The sequences of the immunogenic proteins were retrieved from GenBank of the National Center for Biotechnology Information (NCBI). BioEdit software was used to align the retrieved sequences of each protein for conservancy. Immune Epitope Database (IEDB) analysis resources were used to predict B and T cell epitopes. For B cells, the criteria for selecting epitopes were epitope linearity, surface accessibility, and antigenicity. Results. Nine epitopes from the H protein, eight epitopes from the M protein, and ten epitopes from each of the F and N proteins were predicted as linear epitopes. The surface accessibility method proposed seven surface epitopes from each of the H and F proteins, in addition to six and four epitopes from the M and N proteins, respectively. For antigenicity, only two epitopes, 142PPERV146 and 63DPLSP67, were predicted as antigenic, from H and M, respectively. For T cells, MHC-I binding prediction tools showed multiple epitopes that interacted strongly with BoLA alleles. For instance, the epitope 45MFLSLIGLL53 from the H protein interacted with four BoLA alleles, while 276FKKILCYPL284, predicted from the M protein, interacted with two alleles. Although the F and N proteins demonstrated no favorable interaction with B cells, they strongly interacted with T cells. For instance, 358STKSCARTL366 from the F protein interacted with five alleles, followed by 340SQNALYPMS348 and 442IDLGPAISL450, which interacted with three alleles each. The epitopes from the N protein displayed strong interaction with BoLA alleles: 490RSAEALFRL498 interacted with five alleles, followed by two epitopes, 2ATLLKSLAL10 and 304QQLGEVAPY312, that interacted with four alleles each. In addition, four epitopes, 3TLLKSLALF11, 356YFDPAYFRL364, 360AYFRLGQEM368, and 412PRQAQVSFL420, interacted with three alleles each. Conclusion. Fourteen epitopes were predicted as promising vaccine candidates against PPRV from four immunogenic proteins. These epitopes should be validated experimentally through in vitro and in vivo studies.


Introduction
Small ruminant morbillivirus (previously called peste des petits ruminants virus (PPRV)) causes one of the most damaging diseases of ruminants. It is among the priority diseases indicated in the FAO-OIE Global Framework for the Progressive Control of Transboundary Animal Diseases (GF-TADs) in the 5-year Action Plan [1,2]. PPR is one of the top ten diseases of sheep and goats with a high impact on poor rural small ruminant farmers [3]. The disease is considered an acute and highly contagious viral disease with high morbidity and mortality rates in small ruminants, such as goats and sheep, and in related wild animals [4,5]. It is characterized by high fever, depression, anorexia, ocular and nasal discharge, pneumonia, necrosis and ulceration of mucous membranes, and inflammation of the gastrointestinal tract leading to severe diarrhea [6,7]. Death rates can reach 100% in goats and 90% in sheep. However, sheep can be subclinically infected and play a major role in the silent spread of PPRV over large distances and across borders [1]. The disease is widely distributed in Africa, on the Arabian Peninsula, and in the Middle East and Asia [5,8,9]. Morbilliviruses are rapidly inactivated at environmental temperature by solar radiation and desiccation. This indicates that transmission occurs by direct contact with infected animals or their excretions. Transmission of PPRV occurs primarily by droplet infection but may also occur by ingestion of contaminated feed or water [6].
PPRV is an enveloped, single-stranded, negative-sense RNA virus belonging to the genus Morbillivirus in the family Paramyxoviridae, and it is closely related to rinderpest virus (RPV), canine distemper virus (CDV), and measles virus (MeV) [5,10,11]. The genome of morbilliviruses is organized into six transcriptional units encoding six structural proteins. These structural proteins include the nucleoprotein (N protein), matrix protein (M protein), polymerase or large protein (L protein), phosphoprotein (P protein), and two envelope glycoproteins, the haemagglutinin protein (H protein) and the fusion protein (F protein) [12][13][14]. The N protein plays an important role in the viral life cycle, interacting with both viral and cellular proteins. It also interacts with the viral RNA to form the nucleocapsid structures seen in both the virions and infected cells [13]. The viral L and P proteins interact with the nucleocapsids to form the functional transcription/replication unit of the virion [13]. The C-termini of morbillivirus N proteins also interact with cellular regulatory proteins such as heat shock protein Hsp72, interferon regulatory factor 3 (IRF-3), and a novel cell surface receptor (genetically engineered receptor) [13]. The F protein facilitates virus penetration of the host cell membrane. This protein is also critical for the induction of an effective protective immune response [15]. The M protein of paramyxoviruses forms an inner coat to the viral envelope and thus serves as a bridge between the surface viral glycoproteins and the ribonucleoprotein core. By virtue of its position, M appears to play a central role in viral assembly, in which new virions form and are liberated from the infected cell by budding [16,17]. Interaction of the PPRV H and F proteins with the host plasma membrane leads to viral entry, which begins with binding of the H protein to receptors [17].
Generally, the protective cell-mediated and humoral immune responses against morbilliviruses are directed mainly against H, F, M, and N proteins. Moreover, PPRV is genetically grouped into four distinct lineages (I, II, III, and IV) based on the analysis of the fusion (F) gene. This classification of PPRV into lineages has broadened the understanding of the molecular epidemiology and worldwide movement of PPR viruses [7,18,19,20].
Vaccination is the main tool for controlling and eradicating the PPR virus [12]. Although live attenuated vaccines have been widely used to protect small ruminants against circulating PPRV [1,3,7], the continuous spread of PPR disease suggests two possible explanations. The first is the emergence of new PPRV strains with new genetic makeup and greater fitness in the face of vaccine-elicited protection. The second is lapses in regulatory control that ultimately lead to movement of diseased or infected animals across regions, states, and countries without proper monitoring and surveillance [1].
Advances in immunoinformatics tools, together with growing knowledge of the host immune response, have led to new approaches to vaccine design through in silico epitope prediction. The epitope-driven vaccine is a new concept that is being successfully applied in multiple studies, particularly to the development of vaccines targeting conserved epitopes in variable or rapidly mutating pathogens [21][22][23]. The identification of specific epitopes derived from infectious agents has significantly advanced the development of peptide-based vaccines. Peptides allow a more desirable manipulation of the immune response through the use of B and T cell epitopes. These epitopes mainly induce antibody production from B cells and cellular responses and cytokine secretion from T cells. Immunoinformatics strongly supports this approach by describing the molecular basis of antigen recognition and the HLA binding motifs of host class I and class II MHC proteins, which aids in designing epitope-based vaccine motifs that serve as therapeutic candidates for many infectious diseases [24].
The main objective of this study was to analyze multiple immunogenic proteins from the PPR genome for designing a safe multiepitope vaccine using immunoinformatics tools present in the Immune Epitope Database (IEDB). These proteins include haemagglutinin protein (H), matrix protein (M), fusion protein (F), and nucleoprotein (N) sequences of PPRV strains reported in the NCBI database.

Each protein tree was constructed using the maximum likelihood parameter in the software.

Epitope Prediction. Several immunoinformatics tools were used to predict multiple epitopes from the four immunogenic proteins of PPRV. Tools from the Immune Epitope Database analysis resource (http://www.iedb.org/) [27] were used to analyze the immunogenic proteins. The input was the reference sequences of the H protein (YP_133827.2), M protein (YP_133825.1), F protein (YP_133826.1), and N protein (YP_133821.1). These sequences were submitted to the Epitope Analysis Resources to predict B and T cell epitopes. The predicted epitopes were then checked for conservancy against the aligned retrieved sequences to identify the proposed candidate epitopes.
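The conservancy check described above can be sketched in a few lines. This is a minimal illustration, assuming that an epitope counts as conserved in a strain when it occurs as an exact substring of that strain's sequence. The function name `conservancy` and the toy sequences are hypothetical and are not taken from the study, which inspected BioEdit alignments for this step.

```python
def conservancy(epitope: str, sequences: list[str]) -> float:
    """Return the fraction of sequences that contain the epitope exactly."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in sequences if epitope in seq)
    return hits / len(sequences)


# Hypothetical toy fragments, NOT real PPRV strain sequences.
toy_strains = [
    "MSAQRERINAFYKDNPPERVHGS",
    "MSAQRERINAFYKDNPPERVHGT",
    "MSAQRERINAFYKDNPAERVHGS",  # carries a P -> A substitution in the epitope
]

print(f"{conservancy('PPERV', toy_strains):.2f}")  # 0.67
```

An exact-match rule is the strictest possible choice; a real analysis might instead tolerate conservative substitutions, which this sketch does not attempt.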

B Cell Epitope Prediction.
B cell epitopes are discrete parts of an antigenic molecule that are recognized by the B cell receptor and elicit immunoglobulin production. These epitopes are characterized by their surface accessibility and their antigenic reactivity with the immunoglobulins of humoral immunity [24]. Epitope prediction tools of the Immune Epitope Database (IEDB) at http://tools.iedb.org/bcell/ [27] were used for this purpose. Linear B cell epitopes were predicted by BepiPred linear epitope prediction (http://tools.iedb.org/bcell/result/) [28]. The Emini surface accessibility prediction tool was used to detect the surface-accessible epitopes (http://tools.iedb.org/bcell/) [29], while antigenic determinants on the proteins were predicted from the physicochemical properties of amino acid residues using the Kolaskar and Tongaonkar antigenicity method.

MHC class I binding prediction was used to identify cytotoxic T cell (CTL) epitopes that bind to the major histocompatibility complex class I alleles (MHC class I) [31]. Analysis was done using cow alleles (BoLA-D18.4, BoLA-HD6, BoLA-JSP.1, BoLA-T2a, BoLA-T2b, and BoLA-T2c). An artificial neural network (ANN) was used to predict the binding affinity [32,33]. The peptide length for all selected epitopes was set to 9 amino acids (9mers). The percentile rank required for a peptide's binding to the specific MHC-I molecules was set in the range from 1 to 3.

The three-dimensional (3D) structures of the H, M, and F protein reference sequences of PPRV were predicted using the RaptorX structure prediction server (http://raptorx.uchicago.edu/StructurePrediction/predict/) [34][35][36], while the N protein sequence was submitted to the SPARKS-X server (http://sparks-lab.org/yueyang/server/SPARKS-X/) [37]. The 3D structure of each protein reference sequence was later treated with Chimera software 1.8 to show the position of the proposed epitopes [38].
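The 9-mer peptide length used for the MHC-I predictions implies a sliding window over each protein, and the epitope labels used throughout this paper (for example 2ATLLKSLAL10) give the 1-based start and end positions around the peptide. The sketch below shows that enumeration and labeling only; it does not reproduce the IEDB binding prediction itself. The names `Peptide`, `sliding_peptides`, and `label` are hypothetical, and the 11-residue fragment is an illustrative string assembled from two reported N protein epitopes with an assumed leading methionine.

```python
from typing import Iterator, NamedTuple


class Peptide(NamedTuple):
    start: int      # 1-based position of the first residue
    sequence: str
    end: int        # 1-based position of the last residue (inclusive)


def sliding_peptides(protein: str, length: int = 9) -> Iterator[Peptide]:
    """Yield every overlapping peptide of the given length."""
    for offset in range(len(protein) - length + 1):
        yield Peptide(offset + 1, protein[offset:offset + length], offset + length)


def label(peptide: Peptide) -> str:
    """Format a peptide as start + sequence + end, e.g. 2ATLLKSLAL10."""
    return f"{peptide.start}{peptide.sequence}{peptide.end}"


# Illustrative fragment (assumption), not the full reference sequence.
fragment = "MATLLKSLALF"
labels = [label(p) for p in sliding_peptides(fragment)]
print(labels)
```

A protein of length n yields n - 8 candidate 9-mers, each of which would then be scored against every allele by the prediction tool.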

Results and Discussion
The validity and benefits of peptide vaccines designed with bioinformatics tools have been verified by a considerable body of research [24]. The availability of complete genome and proteome sequences, together with knowledge of the pathogenesis of many pathogenic microorganisms, has contributed to vaccine production through bioinformatics [24,39]. In this study, the predicted B and T cell epitopes could help in the development of a more effective and reliable preventive and therapeutic vaccine against PPRV than conventional methods allow.

Phylogenetic Evolution.
A phylogenetic tree was constructed using MEGA7.0.26 (7170509). The evolutionary divergence among the strains of each protein was analyzed. As shown in Figure 1, the retrieved strains of the H protein revealed that the Asian strains were clustered together, as were the European and African strains. However, strains from the United Arab Emirates and Oman were closely related to African strains (namely to Ethiopian strains). With regard to the phylogeny of the M protein strains, the African strains were also clustered together, but among them, the Oman and United Arab Emirates strains were observed to be close to the Ethiopian strains, as for the H protein. This result may indicate the transfer of the H and M strain segments between these countries. Also, some European and Turkish strains were clustered together. As shown in Figure 2, the retrieved F and N protein strains from Asia were clustered together with molecular divergence among them, as were the strains retrieved from the African countries. Also, the Omani and Emirati strains showed a close relationship to the African strains. These results indicated that these strain segments were widely distributed in Africa, Asia, Europe, and the Arab region.
3.2. Sequence Alignment. Multiple sequence alignment was performed using ClustalW in BioEdit software. As shown in Figure 3, the aligned sequences of each of the four analyzed proteins (H, M, F, and N proteins) showed considerable conservancy among the retrieved strains. However, some regions exhibited differences (mutations) in some amino acids in various sequences.

Figure 5: The prediction of the three-dimensional (3D) structure of H, M, and F protein reference sequences of PPRV was performed using the RaptorX structure prediction server, while the N protein sequence was submitted to the SPARKS-X server.

Prediction of B Cell Epitopes. B cell epitope prediction methods aim at identifying the antigens recognized by B lymphocytes to initiate humoral immunity [24]. The important criteria for selecting a potential epitope for vaccine development are surface accessibility, hydrophobicity, flexibility, and antigenicity [40]. The predicted epitopes should be located on the surface so that they are more accessible to both the humoral and the cellular immune systems. Antigenicity is also one of the important features of an antigen for vaccine development [40]. Depending on binding affinity to B lymphocytes, the BepiPred linear epitope prediction method predicted nine linear epitopes from the H protein, eight epitopes from the M protein, and ten epitopes from each of the F and N proteins. Analysis of these linear epitopes for surface accessibility proposed seven surface epitopes from each of the H and F proteins, six epitopes from the M protein, and four epitopes from the N protein.
As shown in Figure 4, the threshold values were 0.350 and 1.000 for all epitopes predicted through the BepiPred linear epitope (conserved epitopes) and Emini (surface accessibility) prediction methods, respectively. The antigenicity prediction method proposed only two epitopes across all tested immunogenic proteins of PPRV. Figure 4 also shows the antigenicity predictions for the H, M, F, and N proteins using the Kolaskar and Tongaonkar antigenicity method under threshold values of 1.014, 1.037, 1.054, and 1.014, respectively. However, no epitopes passed the threshold for the F and N proteins.
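Applying a threshold to a per-residue score profile, as done with the BepiPred, Emini, and Kolaskar and Tongaonkar outputs, amounts to collecting the contiguous stretches whose scores exceed the cutoff. The following sketch illustrates that step under stated assumptions: the function name `regions_above` is hypothetical, the score values are invented for illustration, and the IEDB tools compute their own windowed scores that are not reproduced here.

```python
def regions_above(scores: list[float], threshold: float) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    regions: list[tuple[int, int]] = []
    run_start = None  # 0-based index where the current run began
    for index, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            regions.append((run_start + 1, index))  # run ended at index - 1
            run_start = None
    if run_start is not None:  # run extends to the final residue
        regions.append((run_start + 1, len(scores)))
    return regions


# Invented per-residue scores; 1.014 is the H protein threshold quoted above.
toy_scores = [0.90, 1.02, 1.05, 1.03, 0.98, 1.00, 1.04, 1.06]
print(regions_above(toy_scores, 1.014))  # [(2, 4), (7, 8)]
```

Candidate B cell epitopes are then the regions that survive all three thresholds at overlapping positions.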
Only one epitope from each of the H and M proteins passed all the B cell prediction methods. These epitopes were 142PPERV146 from the H protein and 63DPLSP67 from the M protein. The 3D structure of the four proteins (H, M, F, and N) is shown in Figure 5. The positions of the best predicted B cell epitopes from the H and M proteins are shown in Figure 6. The overall predicted epitopes from the four proteins are listed in Table 5.

Prediction of CTL Epitopes That Interacted with MHC Class I (BoLA Alleles). CD8+ and CD4+ T cells have a principal role in the stimulation of the immune response as well as in antigen-mediated clonal expansion of B cells [14]. Unfortunately, the bovine genome project did not assemble a complete sequence of the bovine MHC-II locus [41][42][43]. Thus, the analysis was completed with BoLA MHC-I alleles only. Cell-mediated immunity induced by cytotoxic T lymphocytes (CTLs) is vital for the defense against viral diseases. CTLs are responsible for the immune elimination of intracellular pathogens such as viruses because these cells recognize the endogenous antigenic peptides presented by MHC class I molecules [44].
In this study, MHC-I binding prediction methods using the IEDB database predicted different CTL epitopes that strongly interacted with various BoLA alleles. For the F protein, the top epitope was 358STKSCARTL366, which interacted with five alleles, followed by 340SQNALYPMS348 and 442IDLGPAISL450, which interacted with three alleles each. The nucleoprotein (N) also displayed strong interaction activity with BoLA alleles. Seven epitopes were proposed with strong interaction with BoLA alleles. The top N protein epitope was 490RSAEALFRL498, which was associated with five alleles, followed by two epitopes, namely, 2ATLLKSLAL10 and 304QQLGEVAPY312, that were linked to four alleles each. In addition, four epitopes, 3TLLKSLALF11, 356YFDPAYFRL364, 360AYFRLGQEM368, and 412PRQAQVSFL420, interacted with three bovine alleles each. Surprisingly, these two proteins (F and N) achieved promising results in CTL prediction methods, although they yielded no epitope carrying all the ideal traits of a B cell epitope.
Five CTL epitopes were predicted from the haemagglutinin (H) protein. The best peptide was 45MFLSLIGLL53, which was linked to four BoLA alleles, followed by four peptides that interacted with two alleles each: 113DLVKFISDK121, 405GRIPAYGVI413, 52LLAIAGIRL60, and 44VMFLSLIGL52. Overall, this protein showed somewhat satisfactory results in both B and T cell prediction methods. In contrast to its B cell results, the M protein showed unsatisfactory results in CTL prediction methods: only one epitope, 276FKKILCYPL284, was suggested, and it interacted with only two alleles. The overall epitopes that were proposed to interact with BoLA alleles are listed in Table 6 for all proteins. The positions of the best predicted CTL epitopes in their immunogenic protein structures are shown in Figure 7.
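The ranking of CTL epitopes by the number of BoLA alleles they bind can be made explicit with a short script. The sketch below uses only the allele counts reported in the text of this paper; the variable and function names (`REPORTED`, `best_per_protein`) are hypothetical, and the per-allele binding data behind each count are not reproduced.

```python
# (protein, peptide, number of BoLA alleles bound), as reported in the text.
REPORTED = [
    ("H", "MFLSLIGLL", 4), ("H", "DLVKFISDK", 2), ("H", "GRIPAYGVI", 2),
    ("H", "LLAIAGIRL", 2), ("H", "VMFLSLIGL", 2),
    ("M", "FKKILCYPL", 2),
    ("F", "STKSCARTL", 5), ("F", "SQNALYPMS", 3), ("F", "IDLGPAISL", 3),
    ("N", "RSAEALFRL", 5), ("N", "ATLLKSLAL", 4), ("N", "QQLGEVAPY", 4),
    ("N", "TLLKSLALF", 3), ("N", "YFDPAYFRL", 3), ("N", "AYFRLGQEM", 3),
    ("N", "PRQAQVSFL", 3),
]


def best_per_protein(records):
    """Keep, for each protein, the peptide bound by the most alleles.

    Ties are resolved in favor of the record listed first.
    """
    best = {}
    for protein, peptide, n_alleles in records:
        if protein not in best or n_alleles > best[protein][1]:
            best[protein] = (peptide, n_alleles)
    return best


top = best_per_protein(REPORTED)
for protein, (peptide, n_alleles) in top.items():
    print(protein, peptide, n_alleles)
```

Counting alleles is a simple proxy for breadth of coverage; it treats every allele as equally important, which is an assumption of this sketch rather than a claim of the study.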
Vaccination is considered the most effective way of controlling PPR. Morbillivirus infection is associated with severe immunosuppression that is accompanied by a massive virus-specific immune response. Protection is mediated by cell-mediated and humoral immune responses directed mainly against particular proteins in the viral structure. These proteins include the H, F, and N proteins [45][46][47]. It has been reported that the envelope glycoproteins H and F of PPRV elicit a protective and neutralizing antibody response [3,48,49,50]. In this study, using the immunoinformatics prediction methods, the H protein demonstrated affinity to interact with B cells, which is associated with antibody production. This result agrees with the previously published reports [3,48,49,50]. The F protein, in contrast, failed to interact with B cells; i.e., no epitopes from the F protein passed the thresholds of all the B cell prediction methods. However, this protein revealed multiple predicted epitopes that demonstrated high affinity to the BoLA alleles in the CTL predictions. The M protein, which is believed to play a very significant role in morbillivirus assembly and budding by concentrating the F, H, and N proteins at the virus-assembly site [16,17], showed moderate affinity to B cells. One epitope from the M protein, as from the H protein, was predicted as a B cell epitope. Moreover, the M protein revealed multiple epitopes that interacted with CTLs of the cell-mediated immunity. This result indicates that the M protein, besides its role in virus assembly, may also contain antigenic determinants that could be selected as vaccine candidates.
In addition, cell-mediated immunity plays a role in protection against the viral infection. Although the N protein is the most abundant viral protein in PPRV, it does not induce a neutralizing antibody response in the host [50]. However, it has been found to induce a strong cell-mediated immune response, which is believed to contribute to protection. The same result was obtained in this study: the N protein demonstrated no affinity to elicit the humoral immune response but showed favorable affinity to interact with the cell-mediated response. It is noteworthy that five of the seven epitopes predicted from the nucleoprotein of PPRV in this study were also proposed by another in silico study that used mouse alleles and NetMHCI methods [51]. The proposed epitopes from that study were ATLLKSLAL, TLLKSLALF, YFDPAYFRL, AYFRLGQEM, and RSAEALFRL. Thus, the predictions for the different epitopes that bound to different alleles, particularly from the N protein of PPRV, were somewhat in agreement regardless of the alleles (cow and mouse) and the algorithm used (ANN, NetMHCI).
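The agreement between the two in silico studies is a plain set comparison, which can be checked directly from the peptide lists given in this paper. The set names below are hypothetical; the peptides themselves are the seven N protein epitopes reported here and the five listed for the earlier study [51].

```python
# N protein CTL epitopes predicted in this study (cow alleles, ANN).
THIS_STUDY_N = {
    "RSAEALFRL", "ATLLKSLAL", "QQLGEVAPY", "TLLKSLALF",
    "YFDPAYFRL", "AYFRLGQEM", "PRQAQVSFL",
}

# Epitopes listed for the earlier study (mouse alleles, NetMHCI).
OTHER_STUDY_N = {
    "ATLLKSLAL", "TLLKSLALF", "YFDPAYFRL", "AYFRLGQEM", "RSAEALFRL",
}

shared = THIS_STUDY_N & OTHER_STUDY_N           # predicted by both studies
unique_here = THIS_STUDY_N - OTHER_STUDY_N      # predicted only in this study

print(f"{len(shared)} of {len(THIS_STUDY_N)} epitopes shared")
print(sorted(unique_here))
```

The two epitopes unique to this study are 304QQLGEVAPY312 and 412PRQAQVSFL420.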
In general, epitope-based vaccines that are chemically well-characterized have become desirable candidate vaccines due to their relative ease of production and construction, chemical stability, and lack of infectious potential [52]. Many in silico studies have shown the value of using prediction programs to evaluate the efficiency of binding of putative epitopes to various human and animal alleles [33,52,53,54,55].

Conclusion
This study focused mainly on the design of a peptide vaccine based on the H, M, F, and N proteins of PPRV using immunoinformatics tools. Epitopes that showed conservancy and high binding affinities to many MHC alleles are considered the best candidates for in vitro and in vivo testing. Epitopes that were predicted by the B cell prediction methods, such as 142PPERV146 and 305TVTL308 from the H protein and 63DPLSP67 and 64PLSP67 from the M protein, could act as good B cell epitopes to induce humoral immunity. Although the F and N proteins failed to fulfill all the B cell indexes used in this study for the prediction of promising epitopes, these proteins yielded epitopes that interacted with various BoLA MHC-I alleles. For instance, the best epitopes were predicted from the F (358STKSCARTL366) and N (490RSAEALFRL498) proteins, as they interacted with five BoLA MHC-I alleles, followed by 45MFLSLIGLL53 from the H protein, which was linked to four alleles, while the 276FKKILCYPL284 epitope from the M protein was linked to only two alleles. Although bioinformatics studies have been established to facilitate peptide design, not all peptides that are predicted in silico are optimally immunogenic in vivo, and it remains necessary to test the predicted peptides in vivo to ensure that T cell responses are elicited.