Design of multi epitope-based peptide vaccine against E protein of human COVID-19: An immunoinformatics approach

Background: A new endemic disease spread across Wuhan City, China, in December 2019. Within a few weeks, the World Health Organization (WHO) announced a novel coronavirus, designated as coronavirus disease 2019 (COVID-19). In late January 2020, WHO declared the outbreak a “public-health emergency of international concern” due to the rapid and increasing spread of the disease worldwide. Currently, there is no vaccine or approved treatment for this emerging infection; thus, the objective of this study is to design a multi-epitope peptide vaccine against COVID-19 using an immunoinformatics approach. Method: Several techniques combining immunoinformatics and comparative genomic approaches were used to determine the potential peptides for designing a T cell epitope-based peptide vaccine using the envelope protein of 2019-nCoV as a target. Results: Extensive mutations, insertions and deletions were discovered in the COVID-19 strain by comparative sequencing. Additionally, ten peptides binding to MHC class I and MHC class II were found to be promising candidates for vaccine design, with adequate world population coverage of 88.5% and 99.99%, respectively. Conclusion: A T cell epitope-based peptide vaccine was designed for COVID-19 using the envelope protein as an immunogenic target. Nevertheless, the proposed vaccine urgently needs to be validated clinically to ensure its safety and immunogenic profile and to help stop this epidemic before it leads to devastating global outbreaks.


Introduction:
Coronaviruses (CoV) are a large family of zoonotic viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). In the last decades, six strains of coronaviruses were identified; however, in December 2019, a new strain spread across Wuhan City, China [1,2]. It was designated as coronavirus disease 2019 (COVID-19) by the World Health Organization (WHO) [3]. In late January 2020, WHO declared the outbreak a global pandemic with cases in more than 45 countries. COVID-19 is a positive-sense single-stranded RNA virus (+ssRNA). Its RNA sequence is approximately 30,000 bases in length [4]. It belongs to the subgenus Sarbecovirus, genus Betacoronavirus, within the family Coronaviridae. The coronavirus envelope (E) protein is a small, integral membrane protein involved in several aspects of the virus's life cycle, such as pathogenesis, envelope formation, assembly and budding, alongside its interactions with both other CoV proteins (M, N and S) and host cell proteins (release of infectious particles after budding) [5][6][7][8][9].
An infected person is characterized by fever, upper or lower respiratory tract symptoms, or diarrhoea, lymphopenia, thrombocytopenia, and increased C-reactive protein and lactate dehydrogenase levels, or a combination of all these, within 3-6 days after exposure. Further molecular diagnosis can be made by real-time PCR for genes encoding the internal RNA-dependent RNA polymerase and the Spike protein's receptor-binding domain, which can be confirmed by Sanger sequencing and full genome analysis by NGS, multiplex nucleic acid amplification and microarray-based assays [10][11][12][13][14]. A phylogenetic tree of the mutation history of a family of viruses can be reconstructed given a sufficient number of sequenced genomes. Phylogenetic analysis indicates that COVID-19 likely originated from bats [15]. It also showed that the sequenced genomes are highly related, with at most seven mutations relative to a common ancestor [16].
The sequence of the COVID-19 receptor-binding domain (RBD), together with its receptor-binding motif (RBM) that contacts the receptor angiotensin-converting enzyme 2 (ACE2), was found to be similar to that of SARS coronavirus. In January 2020, a group of scientists demonstrated that ACE2 could act as the receptor for COVID-19 [17-21].
However, COVID-19 differs from previous strains in having several critical residues in the 2019-nCoV receptor-binding motif (particularly Gln493) which provide advantageous interactions with human ACE2 [15]. This difference in affinity possibly explains why the novel coronavirus is more contagious than those other viruses.
At present, there is no vaccine or approved treatment for humans, but traditional Chinese medicines, such as ShuFengJieDu Capsules and Lianhuaqingwen Capsule, could be possible treatments for COVID-19. However, no clinical trials have confirmed the safety and efficacy of these drugs [22].
The main concept behind all immunizations is the ability of the vaccine to initiate an immune response faster than the pathogen itself. Although traditional vaccines, which depend on biochemical trials, induce potent neutralizing and protective responses in immunized animals, they can be costly, allergenic and time-consuming, and they require in vitro culture of pathogenic viruses, which raises serious safety concerns [23, 24]. Thus, safe and efficacious vaccines are greatly needed.
Peptide-based vaccines do not need in vitro culture, making them biologically safe, and their selectivity allows accurate activation of immune responses [25,26]. The core mechanism of peptide vaccines is built on chemically synthesizing the recognized B-cell and T-cell epitopes that are immunodominant and can induce specific immune responses. A B-cell epitope of a target molecule can be linked with a T-cell epitope to make it immunogenic. T-cell epitopes are short peptide fragments (8-20 amino acids), whereas B-cell epitopes can be proteins [27,28]. Therefore, in this study, we aimed to design a peptide-based vaccine by predicting epitopes from the coronavirus envelope (E) protein using immunoinformatics analysis [29][30][31][32][33][34]. Rapid further studies are recommended to prove the efficiency of the predicted epitopes as a peptide vaccine against this emerging infection.

Materials and Methods
Workflow summarizing the procedures for the epitope-based peptide vaccine prediction is shown in

The Artemis Comparison Tool (ACT):
ACT is an in silico analysis software for visualization of comparisons between complete genome sequences and associated annotations [35]. It is also applied to identify regions of similarity, rearrangements and insertions at any level from base-pair differences to the whole genome. (https://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act).

VaxiJen server:
VaxiJen is the first server for alignment-independent prediction of protective antigens. It allows antigen classification based solely on the physicochemical properties of proteins, without recourse to sequence alignment. It predicts the probability of antigenicity of one or multiple proteins based on auto cross covariance (ACC) transformation of the protein sequence. The structural CoV-2019 proteins (N, S, E and M) were analyzed by VaxiJen with a threshold of 0.4 [36]. (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html)
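The ACC transformation named above can be illustrated with a minimal sketch. It assumes that each residue has already been mapped to a vector of numeric physicochemical descriptors; the function name `acc_transform`, the `max_lag` parameter and the toy descriptor values are illustrative assumptions and do not reproduce VaxiJen's actual descriptors, lags or classifier.

```python
def acc_transform(descriptors, max_lag):
    """Auto cross covariance (ACC) of a descriptor-encoded sequence.

    descriptors: list of per-residue vectors (n residues x d descriptors).
    Returns one fixed-length feature vector of size max_lag * d * d,
    independent of the sequence length n.
    """
    n = len(descriptors)
    d = len(descriptors[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")

    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):          # descriptor taken at position i
            for k in range(d):      # descriptor taken at position i + lag
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


# Toy example: three residues, one hypothetical descriptor each.
toy = [[1.0], [2.0], [3.0]]
features = acc_transform(toy, max_lag=2)
```

Because the output length depends only on `max_lag` and the number of descriptors, proteins of different lengths can be compared without aligning them, which is the property the server relies on.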

BioEdit:
BioEdit is a software package intended to provide a single program that can handle nearly any sequence operation as well as a few basic alignment analyses. The sequences of the E protein were retrieved from UniProt and run in BioEdit to determine the conserved sites through ClustalW in the application settings [37].
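Identifying conserved sites from an alignment can be sketched as follows. This is only an illustration of the idea, assuming the sequences have already been aligned (for example by ClustalW) to equal length with `-` marking gaps; the function name `conserved_positions` and the toy sequences are hypothetical and are not BioEdit's implementation.

```python
def conserved_positions(aligned):
    """Return 1-based alignment columns where every sequence has the same residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        # A column is conserved if it holds exactly one residue type and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col + 1)
    return conserved


# Toy alignment (hypothetical fragments, not the real E protein sequences).
toy_alignment = ["MYSF", "MYAF", "MYSF"]
sites = conserved_positions(toy_alignment)
```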

The Molecular Evolutionary Genetics Analysis (MEGA):
MEGA (version 10.1.6) is software for the comparative analysis of molecular sequences. It is used for pairwise and multiple sequence alignment as well as the construction and analysis of phylogenetic trees and evolutionary relationships. The gap penalty was 15 for opening and 6.66 for extending the gap for both pairwise and multiple sequence alignment. A bootstrap of 300 replicates was used in the construction of the maximum likelihood phylogenetic tree [38,39]. (https://www.megasoftware.net).
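The bootstrap step can be sketched as resampling alignment columns with replacement, once per replicate. The sketch below only generates the resampled alignments; building a maximum likelihood tree from each replicate is left to the phylogenetics software. The function names, the fixed seed and the toy alignment are assumptions for illustration, not MEGA's internal procedure.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in aligned]


def bootstrap_replicates(aligned, n_replicates=300, seed=0):
    """Generate n_replicates pseudo-alignments (300 matches the setting in the text)."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [bootstrap_alignment(aligned, rng) for _ in range(n_replicates)]


# Toy alignment of two hypothetical sequences.
replicates = bootstrap_replicates(["ACGT", "ACGA"], n_replicates=300)
```

The support for a branch is then the fraction of replicate trees in which that branch appears.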

Prediction of T-cell epitopes:
IEDB tools were used to predict the conserved sequences (10-mer sequences) from HLA class I and class II T-cell epitopes using an artificial neural network (ANN) approach [40][41][42]. ANN version 2.2 was chosen as the prediction method, as it depends on the median inhibitory concentration (IC50) [40][43][44][45]. For the binding analysis, all the alleles were carefully chosen, and the length was set at 10 before prediction was done. Binding of epitopes to MHC class I and II molecules was assessed with the IEDB MHC prediction server at (http://tools.iedb.org/mhci/) and (http://tools.iedb.org/mhcii/), respectively. All conserved immunodominant peptides binding to MHC I and II molecules with an IC50 equal to or less than 100 and 1000, respectively, were selected for further analysis, while epitopes with IC50 values above these thresholds were eliminated [46].
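The selection logic described above can be sketched in two steps: enumerate the 10-mer windows of a protein, then keep only the predictions at or below the IC50 cutoff for the relevant MHC class. The binding predictions themselves come from the IEDB server; the tuples below are hypothetical placeholders, and the function and constant names are assumptions for illustration.

```python
MHC_I_CUTOFF = 100.0    # IC50 cutoff stated in the text for MHC class I
MHC_II_CUTOFF = 1000.0  # IC50 cutoff stated in the text for MHC class II


def kmers(sequence, k=10):
    """All overlapping peptides of length k, in sequence order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def select_binders(predictions, cutoff):
    """Keep (peptide, allele, ic50) records whose IC50 is at or below the cutoff."""
    return [record for record in predictions if record[2] <= cutoff]


# Hypothetical sequence and prediction records, for illustration only.
peptides = kmers("ABCDEFGHIJKL", k=10)
toy_predictions = [
    (peptides[0], "HLA-A*02:01", 35.2),
    (peptides[1], "HLA-A*02:01", 100.0),
    (peptides[2], "HLA-A*02:01", 480.0),
]
class_i_binders = select_binders(toy_predictions, MHC_I_CUTOFF)
```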

Population coverage analysis:
Population coverage for each epitope was determined with the IEDB population coverage calculation tool. Because epitopes have diverse binding sites with different HLA alleles, population coverage of the most promising epitope candidates was calculated for the whole world, China and Europe to ensure a universal vaccine [47,48]. (http://tools.iedb.org/population/)
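The idea behind a population coverage figure can be sketched with a deliberately simplified model. It assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, and it treats an individual as covered if they carry at least one allele that binds an epitope in the set. The IEDB tool's actual calculation may differ in detail, and the allele frequencies below are hypothetical.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at one locus.

    allele_freqs: frequencies of the alleles bound by the epitope set.
    Under Hardy-Weinberg, the chance that both chromosomes carry an
    uncovered allele is (1 - sum of covered frequencies) squared.
    """
    covered = min(sum(allele_freqs), 1.0)
    return 1.0 - (1.0 - covered) ** 2


def population_coverage(freqs_by_locus):
    """Combine loci, assuming they are independent of one another."""
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies for alleles bound by a candidate epitope set.
toy_freqs = {"HLA-A": [0.2, 0.1], "HLA-B": [0.5]}
coverage = population_coverage(toy_freqs)
```

Epitopes that bind many common alleles raise the covered frequency at each locus, which is why the number of alleles bound is a natural selection criterion.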

Tertiary structure (3D) Modeling:
The reference sequence of the E protein, retrieved from GenBank, was used as input to RaptorX to predict the 3D structure of the E protein [49,50]. The obtained 3D protein structure was visualized in UCSF Chimera (version 1.8) [51].

Ligand Preparation:
To estimate the binding affinities between the epitopes and the molecular structures of MHC I and MHC II, in silico molecular docking was used. Sequences of the proposed epitopes were selected from the COVID-19 reference sequence using UCSF Chimera 1.10 and saved as PDB files.
The obtained files were then optimized and energy minimized. The HLA-A*02:01 was selected as the macromolecule for docking. Its crystal structure (4UQ3) was downloaded from the RCSB Protein Data Bank (http://www.rcsb.org/pdb/home/home.do), which was in a complex with an azobenzene-containing peptide [52].
All water molecules and heteroatoms in the retrieved target file 4UQ3 were then removed.
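Removing waters and heteroatoms from a PDB file can be sketched as a simple text filter, since the PDB format stores them in `HETATM` records. This is an illustrative stand-in for whatever tool was actually used; the function name and the sample records are assumptions, and the sketch does not handle related records such as `CONECT`.

```python
def strip_waters_and_heteroatoms(pdb_text):
    """Drop HETATM records (ligands, ions, waters) and any water listed as ATOM."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()          # columns 1-6 hold the record name
        if record == "HETATM":
            continue
        if record == "ATOM" and line[17:20] == "HOH":  # columns 18-20: residue name
            continue
        kept.append(line)
    return "\n".join(kept)


# Hypothetical, truncated records for illustration (not taken from 4UQ3).
sample = "ATOM      1  N   GLY A   1\nHETATM    2  O   HOH A 101\nEND"
cleaned = strip_waters_and_heteroatoms(sample)
```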

Molecular docking
Molecular docking was performed using AutoDock 4.0 software, based on the Lamarckian Genetic Algorithm, which combines energy evaluation through grids of affinity potential to find the suitable binding position for a ligand on a given protein [54,55]. Polar hydrogen atoms were added to the protein targets and Kollman united atomic charges were computed. The target's grid map was calculated and set to 60×60×60 points with a grid spacing of 0.375 Å. The grid box was then positioned in the target so as to include the active residue in the center. The number of genetic algorithm runs was set to 100. The docking algorithm parameters were set to default. Finally, results were retrieved as binding energies, and the poses that showed the lowest binding energies were visualized using UCSF Chimera.
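The physical size of the grid box follows directly from the two grid settings. The sketch below assumes that the edge length equals the number of points times the spacing, so that 60 points at 0.375 Å give an edge of 22.5 Å; the box center used here is a hypothetical placeholder, since the coordinates of the active residue are not given in the text.

```python
def grid_box(center, npts=(60, 60, 60), spacing=0.375):
    """Return the (lower, upper) corners in Å of a box centered on `center`.

    Assumes edge length = npts * spacing along each axis.
    """
    half = [n * spacing / 2.0 for n in npts]
    lower = tuple(c - h for c, h in zip(center, half))
    upper = tuple(c + h for c, h in zip(center, half))
    return lower, upper


# Hypothetical center; in practice this would be the active residue's coordinates.
lower, upper = grid_box((0.0, 0.0, 0.0))
edge_lengths = tuple(u - l for l, u in zip(lower, upper))
```

A box of this size must enclose the whole peptide-binding groove plus the ligand in any plausible pose, so checking the extents against the receptor coordinates is a useful sanity test before docking.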

Results:

The Artemis Comparison Tool:
The reference sequence of the envelope protein was aligned with the HCov-HKU1 reference protein using the Artemis Comparison Tool, as illustrated in Fig. 2.

Figure 2: Artemis analysis of the envelope protein displaying three windows. The upper window represents the HCov-HKU1 reference sequence, with its genes highlighted in blue, starting from the orf1ab gene and ending with the N gene. The middle window describes the similarities and differences between the two genomes: red lines indicate matches between genes from the two genomes, and blue lines indicate inversions, which represent the same sequences in the two genomes organized in opposite directions. The lower window represents COVID-19, with its genes starting from orf1ab and ending with the N gene.

VaxiJen server:
The mutated proteins were tested for antigenicity using VaxiJen software, and the envelope protein was found to be the best immunogenic target (Table 1).

BioEdit:
Sequence alignment of the COVID-19 envelope protein was done using BioEdit software, which showed total conservation across four sequences retrieved from China and the USA (

The Molecular Evolutionary Genetics Analysis:
To study the evolutionary relationship between all seven strains of coronavirus, a multiple sequence alignment (MSA) was performed using ClustalW in MEGA software. This alignment was used to construct a maximum likelihood phylogenetic tree, as seen in Fig. 4.

Prediction of T-cell epitopes and population coverage:
The IEDB website was used to analyze the 2019-nCoV envelope protein for T cell related peptides. Results show ten MHC class I and II associated peptides with high population coverage (Tables 2 and 3; Fig. 5). The most promising peptides were visualized using UCSF Chimera software (Fig. 6a and b).

Discussion:
Designing a novel vaccine is crucial for countering the rapid and endless growth of the global burden of disease [56][57][58][59]. In the last few decades, biotechnology has advanced rapidly, alongside the understanding of immunology, which has assisted the rise of new approaches towards rational vaccine design [60]. Peptide-based vaccines are designed to elicit immunity against particular pathogens by selectively stimulating antigen-specific B and T cells [61]. By applying advanced bioinformatics tools and databases, various peptide-based vaccines can be designed in which the peptides act as ligands [62][63][64]. This approach has been used frequently, for example for Saint Louis encephalitis virus [65], dengue virus [66] and chikungunya virus [67], proposing promising peptides for designing vaccines.
COVID-19 is an RNA virus, and RNA viruses tend to mutate more commonly than DNA viruses [68]. These mutations lie on the surface of the protein, which gives COVID-19 an advantage over previous strains by promoting its persistence and leaving the immune system with a blind spot [69].
In the present work, different peptides were proposed for designing a vaccine against COVID-19 (Fig. 1). First, the whole genome of COVID-19 was analyzed by a comparative genomic approach to determine the potential antigenic target [70]. The Artemis Comparison Tool (ACT) was used to compare the human coronavirus (HCov-HKU1) reference sequence with Wuhan-Hu-1 COVID-19. The results (Fig. 2) revealed extensive mutation among the tested genomes. New genes, ORF8 and ORF6, which were absent in HCov-HKU1, were found inserted in COVID-19 and might have been acquired by horizontal gene transfer [71]. A high rate of mutation between the two genomes was observed in the region from 20000 bp to the end of the sequence. This region encodes the four major structural proteins of coronavirus, which are the envelope (E) protein, nucleocapsid (N) protein, membrane (M) protein and spike (S) protein, all of which are required to produce a structurally complete virus [72,73].
These conserved antigenic sites were revealed in previous studies through sequence alignment between MERS-CoV and Bat-coronavirus [74] and analyzed in SARS-CoV [75].
The four proteins were then analyzed with VaxiJen software to test the probability that they are antigenic. Protein E was found to be the most antigenic protein, with the highest probability, as shown in Table 1. A literature survey supported this result: protein E was investigated in severe acute respiratory syndrome (SARS) in 2003 and, more recently, in Middle East respiratory syndrome (MERS) [72]. Furthermore, the conservation of this protein across the seven strains was tested and confirmed using the BioEdit package (Fig. 3).
As highly related to each other (Fig. 4).
The T cell immune response is considered longer lasting than the B cell response, in which the antigen can easily escape the antibody memory response [76]. Vaccines that effectively generate a cell-mediated response are needed to provide protection against the invading pathogen. Moreover, CD8+ and CD4+ T cell responses play a major role in antiviral immunity [77]. Thus, designing a vaccine that targets T cells is of greater importance.
With protein E chosen as the antigenic site, its binding affinity to MHC molecules was then evaluated. The protein reference sequence was submitted to the IEDB MHC prediction tool. Twenty-one peptides were found to bind MHC class I with different affinities (Table 1), from which ten peptides were selected for vaccine design based on the number of alleles and world population percentage (Table 2; Fig. 5). Analysis with the IEDB MHC II binding prediction tool resulted in the prediction of 61 peptides (Table 2), from which ten peptides were selected for vaccine design based on the number of alleles and world population percentage ( where the envelope (E) protein is mutated or deleted have been described [80][81][82][83][84][85][86]. To the best of our knowledge, this is the first study to identify certain peptides in the envelope (E) protein as candidates for COVID-19. Accordingly, these epitopes are strongly recommended as promising epitope-based vaccine candidates targeting T cells.

Conclusion:
Extensive mutations, insertions and deletions were discovered in the COVID-19 strain using comparative sequencing. In addition, a number of MHC class I and II related peptides were found to be promising candidates. Among these, the peptides YVYSRVKNL, SLVKPSFYV and LAILTALRL show high potential for vaccine design with adequate world population coverage. A T cell epitope-based peptide vaccine was designed for COVID-19 using the envelope protein as an immunogenic target; nevertheless, the proposed vaccine urgently needs to be validated clinically to ensure its safety and immunogenic profile and to help stop this epidemic before it leads to devastating global outbreaks.

Acknowledgement:
The authors acknowledge the Deanship of Scientific Research at University of Bahri for the supportive cooperation.

Data Availability:
All data underlying the results are available as part of the article, and no additional source data are required.