A Highly Conserved GEQYQQLR Epitope Has Been Identified in the Nucleoprotein of Ebola Virus by Using an In Silico Approach.

Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently, it caused another outbreak that resulted in thousands of cases. No effective, approved vaccine or therapeutic treatment against this virus is yet available. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins that may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. Multiple sequence alignment (MSA) was performed to identify conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, VP30, and VP24) of EBOV. Next, consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP yielded a 100% conserved B-cell epitope, GEQYQQLR, that fulfills the ideal features of an effective B-cell epitope and could pave the way for Ebola treatment. However, successful in vitro and in vivo studies are prerequisites for determining the actual potency of our predicted epitope and establishing it as a preventive medication against all the fatal strains of EBOV.


Introduction
EBOV is a major member of the viral family Filoviridae and is known to be a highly lethal pathogen responsible for hemorrhagic fever [1]. According to the Centers for Disease Control and Prevention (CDC), the disease symptoms include fever (greater than 101.5 °F), unexplained hemorrhage (bleeding or bruising), muscle pain, abdominal (stomach) pain, severe headache, vomiting, and diarrhea (http://www.cdc.gov/vhf/ebola/symptoms/). The EBOV genome is a single negative-stranded RNA molecule with linearly arranged genes that encode seven structural proteins (NP-VP35-VP40-GP-VP30-VP24-L), where NP, VP, GP, and L stand for nucleoprotein, viral structural protein, glycoprotein, and RNA-dependent RNA polymerase, respectively [2]. EBOV comprises five distinct species: Bundibugyo, Zaire, Reston, Sudan, and Taï Forest. Among them, the Reston species is not known to cause disease in humans, but the fatality rates in outbreaks of the other four species have ranged from 25 to 90% [3]. In 2014, West Africa has been experiencing the largest outbreak of Ebola, caused by the Zaire species and affecting Guinea, Sierra Leone, Liberia, Senegal, and Nigeria [4]. According to the World Health Organization (WHO), as of December 31, 2014, a total of 20,206 cases of EBOV disease and 7905 deaths had been reported in the current outbreak (http://www.who.int/csr/disease/ebola/situationreports/en/).
As there is currently no proven therapeutic or vaccine against EBOV and outbreaks are reported frequently, identifying therapeutics is a high priority (http://www.who.int/mediacentre/factsheets/fs103/en/). In addition, rapid and reliable Ebola virus-specific assays are required for diagnosis and outbreak control. The large amount of available sequence information has made the identification of potential B- and T-cell epitopes an attractive approach for developing therapeutics and vaccines against infectious diseases. Computational methods now make epitope prediction and vaccine design faster and less costly, and computer-aided vaccine design has proven to be a promising approach for combating diseases such as malaria, tumors, and multiple sclerosis [5][6][7].
In this investigation, we report a highly conserved B-cell epitope, GEQYQQLR, in the nucleoprotein of EBOV as a candidate for preventing infection by all the fatal strains of this virus. We used bioinformatics analysis of the viral structural proteins to find conserved peptide regions and, within them, to identify a highly immunogenic, accessible, and conserved epitope.

Materials and Methods
The flowchart in Figure 1 summarizes the steps followed in this study.

Multiple Sequence Alignment.
Multiple sequence alignment of the retrieved protein sequences was performed using the EBI Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/) [8,9]. For each protein, the gap-free stretch covering the highest number of identical and similar amino acids was chosen as the conserved region. The conserved region was then used to predict linear B-cell epitopes, antigenic sites, and surface accessible epitopes.
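The conserved-region criterion can be sketched in Python as follows. This is a minimal illustration, assuming the Clustal Omega alignment has already been parsed into equal-length strings with `-` marking gaps. Here a column counts as conserved if it is gap-free and either fully identical or restricted to one of Clustal's "strong" similarity groups (the residue sets Clustal marks with `:`); this is our simplification of "identical and similar", not necessarily the exact rule applied in this study, and the toy alignment is hypothetical.

```python
# Clustal "strong" similarity groups: residue sets that Clustal marks with ':'
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def column_is_conserved(column: str) -> bool:
    """A column is conserved if it has no gap and is identical or strongly similar."""
    if "-" in column:
        return False
    residues = set(column)
    if len(residues) == 1:
        return True
    return any(residues <= set(group) for group in STRONG_GROUPS)


def longest_conserved_block(aligned: list[str]) -> tuple[int, int]:
    """Return the 0-based, half-open column range of the longest conserved run."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    best = (0, 0)
    start = None
    # Iterate one column past the end so that a run reaching the end is closed.
    for i in range(length + 1):
        conserved = i < length and column_is_conserved("".join(s[i] for s in aligned))
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


if __name__ == "__main__":
    # Hypothetical toy alignment, not real EBOV data
    toy = ["MKT-GEQYQQLRAA", "MRTAGEQYQQLRSA", "MKTAGEQYQQLRWA"]
    start, end = longest_conserved_block(toy)
    print(start, end, toy[0][start:end])  # prints: 4 12 GEQYQQLR
```

Restricting the criterion to identical columns only would simply mean dropping the `STRONG_GROUPS` check.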

Prediction of Linear B-Cell Epitope and Antigenic Site.
BepiPred (v1.0), which is based on a hidden Markov model, was employed to predict linear B-cell epitopes from the conserved region with the default threshold of 0.35 [10]. In addition, the B-cell epitope prediction tool of the Immune Epitope Database (IEDB) (http://tools.immuneepitope.org/bcell/) was used to predict antigenic sites [11], applying the Kolaskar and Tongaonkar antigenicity method with the default threshold value of 1.0 [12].
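Both BepiPred and the IEDB antigenicity tool report a score per residue, and residues scoring above the threshold (0.35 for BepiPred and 1.0 for Kolaskar and Tongaonkar in this study) are taken as predicted. A minimal Python sketch of turning such a score profile into candidate peptides is shown below; the scores are hypothetical, and the `min_length` filter is our own illustrative addition rather than a documented option of either tool.

```python
def segments_above(sequence: str, scores: list[float], threshold: float,
                   min_length: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, peptide) for runs of residues scoring above threshold.

    Positions are 1-based and inclusive, matching how epitope positions are
    usually reported.
    """
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    segments = []
    start = None
    # A -inf sentinel closes a run that extends to the last residue.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i, sequence[start:i]))
            start = None
    return segments


if __name__ == "__main__":
    # Hypothetical per-residue scores for a toy peptide
    seq = "SGEQYQLA"
    scores = [0.10, 0.40, 0.50, 0.20, 0.36, 0.37, 0.38, 0.10]
    print(segments_above(seq, scores, threshold=0.35))
    # prints: [(2, 3, 'GE'), (5, 7, 'YQL')]
```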

Surface Accessible Epitope.
The Emini surface accessibility prediction tool of the IEDB (http://tools.immuneepitope.org/tools/bcell/iedb_input) was used to predict surface accessible epitopes from the conserved region, with the default threshold value of 1.0 [13].
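Emini's method scores each hexapeptide as the product of the fractional surface probabilities δ of its six residues, normalized by 0.37^6 (0.37 being the mean fractional surface probability), so that scores above 1.0 indicate a greater-than-average chance of lying on the surface. The Python sketch below computes that quantity under a hypothetical set of δ values. It does not reproduce the published surface probability table, and indexing each score by its window's start position is our simplification rather than the IEDB convention.

```python
import math


def emini_scores(sequence: str, surface_prob: dict[str, float],
                 window: int = 6, mean_prob: float = 0.37) -> list[float]:
    """Surface probability of each hexapeptide: the product of its residues'
    fractional surface probabilities divided by mean_prob ** window.

    Score k belongs to the window sequence[k:k + window] (0-based start).
    """
    scores = []
    for start in range(len(sequence) - window + 1):
        peptide = sequence[start:start + window]
        product = math.prod(surface_prob[aa] for aa in peptide)
        scores.append(product / mean_prob ** window)
    return scores


if __name__ == "__main__":
    # Hypothetical fractional surface probabilities, for illustration only
    toy_prob = {"G": 0.37, "E": 0.74}
    print(emini_scores("EGGGGGG", toy_prob))  # approximately [2.0, 1.0]
```

With the default threshold of 1.0, only the first toy window would be called surface accessible.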

Prediction of Epitope Conservancy.
To assess the conservancy of the candidate epitopes, the IEDB epitope conservancy analysis tool (http://tools.immuneepitope.org/tools/conservancy/iedb_input) was used [14]. It measures epitope conservancy based on the sequence identity between the epitopes and the given protein sequences.
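The idea behind this conservancy measure can be sketched as follows: for each protein, find the best ungapped match of the epitope and record its percent identity; the degree of conservancy is then the percentage of proteins whose best match reaches a chosen identity level (100% for the fully conserved epitopes reported here). This is a simplified Python illustration, not the IEDB tool's own code, and the sequences are hypothetical.

```python
def max_identity(epitope: str, protein: str) -> float:
    """Highest percent identity of the epitope against any ungapped window of the protein."""
    k = len(epitope)
    best = 0.0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, 100.0 * matches / k)
    return best


def degree_of_conservancy(epitope: str, proteins: list[str],
                          identity_threshold: float = 100.0) -> float:
    """Percentage of proteins containing the epitope at >= identity_threshold identity."""
    hits = sum(max_identity(epitope, p) >= identity_threshold for p in proteins)
    return 100.0 * hits / len(proteins)


if __name__ == "__main__":
    # Hypothetical protein fragments; the third carries one substitution (Q -> H)
    proteins = ["MSGEQYQQLRAK", "TTGEQYQQLRSS", "AAGEQYHQLRAA"]
    print(max_identity("GEQYQQLR", proteins[2]))                # prints: 87.5
    print(degree_of_conservancy("GEQYQQLR", proteins, 80.0))    # prints: 100.0
```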

Prediction of Epitope Hydrophilicity.
The hydrophilicity of the conserved regions was determined using the Parker hydrophilicity prediction tool of the IEDB (http://tools.immuneepitope.org/bcell/) [15], with the default threshold value of 1.571.
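Parker-style hydrophilicity profiles are obtained by averaging a per-residue hydrophilicity scale over a sliding window and comparing each window average with the threshold. A minimal Python sketch follows; the scale values are hypothetical placeholders rather than the published Parker scale, and the window length of 7 and the assignment of each average to the window's central residue are our assumptions.

```python
def window_profile(sequence: str, scale: dict[str, float],
                   window: int = 7) -> dict[int, float]:
    """Map each central residue (1-based position) to its window average."""
    if window % 2 == 0:
        raise ValueError("window length must be odd so that it has a central residue")
    half = window // 2
    profile = {}
    for start in range(len(sequence) - window + 1):
        values = [scale[aa] for aa in sequence[start:start + window]]
        profile[start + half + 1] = sum(values) / window
    return profile


def positions_above(profile: dict[int, float], threshold: float) -> list[int]:
    """1-based positions whose window average exceeds the threshold."""
    return [pos for pos, score in sorted(profile.items()) if score > threshold]


if __name__ == "__main__":
    toy_scale = {"E": 7.0, "G": 0.0, "L": -7.0}  # hypothetical values
    profile = window_profile("EEEGGGLLL", toy_scale)
    print(profile)                          # prints: {4: 2.0, 5: 0.0, 6: -2.0}
    print(positions_above(profile, 1.571))  # prints: [4]
```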

Prediction and Structure Analysis of NP.
As no experimental 3D structure of NP was available in the PDB, the 3D structure of Ebola NP was modeled using Phyre2 (Protein Homology/analogY Recognition Engine V 2.0) (http://www.sbg.bio.ic.ac.uk/phyre2/html/) [16]. A Ramachandran plot of the modeled structure was generated with RAMPAGE (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) to assess its quality and validity [17]. The conserved epitope was then mapped onto the modeled NP structure using the PyMOL molecular graphics software (version 1.5.0.3) [18].
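A Ramachandran plot places each residue by its backbone dihedral angles φ (atoms C(i-1), N(i), Cα(i), C(i)) and ψ (atoms N(i), Cα(i), C(i), N(i+1)). The Python sketch below computes a dihedral angle from four atom positions using the standard atan2 formulation. It illustrates the quantity being plotted, not how RAMPAGE itself classifies residues into favored and allowed regions, and the coordinates in the example are arbitrary.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Arbitrary points: looking down p1 -> p2, p3 sits 90 degrees clockwise of p0
    angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
    print(round(angle, 6))  # prints: 90.0
```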

Conservancy of the Identified Epitope in the Marburg Virus (MBG) NP Protein.
A total of 42 primary amino acid sequences of MBG NP were retrieved from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/). Multiple sequence alignment of the retrieved sequences, together with one Ebola NP sequence, was conducted with the EBI Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/) to determine whether the identified epitope is conserved within MBG [8,9].
Table 1.

Prediction and Selection of B-Cell Epitopes.
To be considered a potential B-cell epitope, a peptide must be both immunogenic and antigenic. Additionally, in an antibody-mediated immune response, the binding of an antibody to a surface accessible antigenic epitope is important, so surface accessibility is a prerequisite for a peptide to become a potential B-cell epitope [19]. With these conditions in mind, we predicted and selected B-cell epitopes from the identified conserved sequences of the EBOV proteins using various computational tools.
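The selection rule amounts to keeping only the sequence positions on which the linear epitope, antigenicity, and surface accessibility predictions all agree. A minimal Python sketch, assuming each tool's output has already been reduced to 1-based inclusive (start, end) segments; treating "overlap" as the positions shared by all three predictions is our reading, not a rule stated by the tools, and the segment values are hypothetical.

```python
Interval = tuple[int, int]  # 1-based, inclusive (start, end)


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Overlapping parts of every pair of intervals from a and b."""
    result = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start <= end:
                result.append((start, end))
    return sorted(result)


def consensus(*predictions: list[Interval]) -> list[Interval]:
    """Regions covered by every prediction passed in."""
    common = predictions[0]
    for other in predictions[1:]:
        common = intersect(common, other)
    return common


if __name__ == "__main__":
    # Hypothetical predicted segments from three tools
    linear = [(220, 235), (300, 310)]
    antigenic = [(224, 240)]
    accessible = [(226, 233), (305, 309)]
    print(consensus(linear, antigenic, accessible))  # prints: [(226, 233)]
```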

Glycoprotein (GP).
Predicted linear B-cell epitopes, antigenic sites, and surface accessible epitopes from the conserved sequence of the GP protein are summarized in Table 2. Only one linear B-cell epitope, HDWTKN, was predicted from this region. This epitope overlapped with the surface accessible epitope PHDWTKNITDKI but showed no antigenicity in the Kolaskar and Tongaonkar prediction. Hence, none of the predicted epitopes from the conserved region of GP fulfilled the criteria for a potential B-cell epitope.

Nucleoprotein (NP).
A total of ten linear B-cell epitopes, ten antigenic sites, and five surface accessible epitopes have been predicted from the conserved region of NP protein (Table 3). Among them, GEQYQQLR was the only one to satisfy the immunogenic, antigenic, and surface accessible criteria.

Matrix Protein (VP40) and Transcription Factor (VP30).
The predicted linear B-cell epitopes, antigenic sites, and surface accessible epitopes from the conserved sequences of the VP40 and VP30 proteins are summarized in Tables 4 and 5, respectively. For both proteins, no predicted peptide overlapped across all three predictions.

Polymerase Cofactor (VP35).
For the VP35 protein, a total of two linear B-cell epitopes, two antigenic sites, and one surface accessible epitope were predicted (Table 6). Only the PVPPSP epitope from the conserved sequence of VP35 overlapped across all three predictions.

Conservancy Analysis.
GEQYQQLR and PVPPSP showed 100% conservancy among all the fatal strains of EBOV when all NP and VP35 protein sequences, respectively, were used for conservancy analysis (Figure 2).

GEQYQQLR Epitope Demonstrated Hydrophilicity.
The results of the Parker hydrophilicity prediction tool for the conserved sequences of the NP and VP35 proteins are given in Figure 3. The average hydrophilicity score of the GP protein was 1.122. Most importantly, our predicted GEQYQQLR epitope (amino acid positions 226-233 in Figure 3(a)) exceeded the threshold value (1.571). In contrast, the PVPPSP epitope (amino acid positions 19-24 in Figure 3(b)) had a hydrophilicity score below the threshold value (1.571). Because it passed all the filters and criteria, GEQYQQLR was selected as the final B-cell epitope.

Homology Modeling of Ebola NP and Epitope Mapping.
The Phyre2-generated 3D structure of Ebola NP is presented in Figure 4(a). Ramachandran plot analysis (Figure 4(b)) showed that 86.3% of the amino acid residues of the modeled Ebola NP were in the acceptable region of the plot. Figure 4(c) shows the conserved epitope GEQYQQLR mapped onto the surface of the modeled Ebola NP.

Discussion
EBOV infection, which is endemic in central Africa, has a high fatality rate in humans. The lack of proper treatment and of an effective vaccine, combined with the lethality of EBOV, has made this virus an important public health pathogen. Although several reports on vaccine development against EBOV have appeared in recent years, to date neither effective vaccines nor drugs are licensed for human use against EBOV [20][21][22]. Because early-stage clinical symptoms caused by other pathogens resemble those of EBOV hemorrhagic fever, rapid, sensitive, reliable, and virus-specific diagnostic tests are urgently needed. In this regard, an EBOV antigen detection assay using specific monoclonal antibodies (mAb) would be one of the best ways to detect viral infection at an early stage in the field [23].
In rodent models, multiple monoclonal antibodies against EBOV, including neutralizing antibodies, were identified in 2000 [24], and epitopes from the viral glycoprotein were identified in 2007 [25]. More recently, in 2014, Becquart et al. identified continuous B-cell epitopes in the GP, NP, VP40, and VP35 proteins of EBOV [26]. Despite these findings, information on human B-cell epitopes from the structural proteins of EBOV remains insufficient. In addition, work on the humoral response to EBOV has targeted specific strains, mostly Zaire, and to date we are not aware of any work that includes all the fatal strains of EBOV. In this study, we have tried to identify highly conserved B-cell epitopes from the GP, NP, VP40, VP35, VP30, and VP24 proteins of EBOV.
The multiple sequence alignment showed that the GP, NP, VP40, VP35, and VP30 proteins are less mutation-prone than VP24. Because no conserved region was found in VP24, it was excluded from further analysis. The remaining conserved regions were analyzed in terms of B-cell antigenicity, surface accessibility, persistent conservancy, and hydrophilicity to select the final B-cell epitope. None of the predicted epitopes from the conserved regions of the GP, VP40, VP35, and VP30 proteins possessed all these properties. Only in NP did we identify an epitope, GEQYQQLR, that displays all the key properties mentioned above. We further performed structural analysis using homology modeling and epitope mapping to determine whether this epitope is present on the surface of the modeled EBOV NP. GEQYQQLR was found on the surface of the modeled EBOV nucleoprotein, which further supports its potential as a successful B-cell epitope.
Conventional molecular immunoassay techniques for virus detection are based on conserved epitopes of specific viral proteins. On this principle, EBOV NP, owing to its abundance in EBOV, could be the most important antigenic protein for immunological detection [27,28]. In this study, the identified GEQYQQLR epitope was found to be fully conserved in the C-terminal region of NP in all the EBOV strains included. Although considerable variability in the C-terminal region has been reported [28], our predicted epitope was well conserved.
Finally, multiple sequence alignment of MBG and EBOV NP showed that the identified epitope GEQYQQLR has 100% conservancy among the MBG NP sequences. This suggests that the conserved epitope could be used to design a common vaccine against both viruses. It could also aid the development of new viral antigen detection assays, such as immunochromatography-based rapid detection methods.

Advances in Bioinformatics 7
Because this epitope was identified by a computational approach, its real efficacy, immunogenicity, and stability must be determined by further in vitro and in vivo studies. The peptide may need to be conjugated with an adjuvant if it shows poor immunogenicity or stability when tested in recipients [29].
Generally, the characterization of antigenic sites in viral proteins accelerates the development of diagnostic tools, vaccines, and therapeutics against viral diseases [23,30]. In this study, by employing sequence analysis and computational prediction, we have identified the GEQYQQLR epitope from EBOV NP as the best B-cell epitope target for designing new mAb-based rapid detection assays and therapeutics against EBOV, pending successful in vitro and in vivo studies.