Coronaviruses (CoV) are a large family of zoonotic viruses that cause illnesses ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). In recent decades, six strains of human coronaviruses were identified; however, in December 2019, a new strain spread across Wuhan City, China [
COVID-19 is a positive-sense single-stranded RNA virus (+ssRNA). Its RNA sequence is approximately 30,000 bases in length [
Infected individuals present with fever, upper or lower respiratory tract symptoms, diarrhea, lymphopenia, thrombocytopenia, and elevated C-reactive protein and lactate dehydrogenase levels, or a combination of these, within 3-6 days after exposure. Further molecular diagnosis can be made by real-time PCR targeting the genes encoding the internal RNA-dependent RNA polymerase and the receptor-binding domain of the spike protein, and can be confirmed by Sanger sequencing, full-genome analysis by NGS, multiplex nucleic acid amplification, and microarray-based assays [
With a sufficient number of sequenced genomes, it is possible to reconstruct a phylogenetic tree of the mutation history of a virus family. Phylogenetic analysis indicates that COVID-19 likely originated from bats [
The sequence of the COVID-19 receptor-binding domain (RBD), including the receptor-binding motif (RBM) that contacts the receptor angiotensin-converting enzyme 2 (ACE2), was found to be similar to that of SARS coronavirus. In January 2020, a group of scientists demonstrated that ACE2 could act as the receptor for COVID-19 [
At present, there is no vaccine or approved treatment for humans, although traditional Chinese medicines such as ShuFengJieDu capsules and Lianhuaqingwen capsules could be possible treatments for COVID-19. However, no clinical trials have established the safety and efficacy of these drugs [
The principle underlying all immunization is the ability of a vaccine to initiate an immune response faster than the pathogen itself. Although traditional vaccines, which depend on biochemical trials, have induced potent neutralizing and protective responses in immunized animals, they can be costly, allergenic, and time-consuming, and they require in vitro culture of pathogenic viruses, which raises serious safety concerns [
Peptide-based vaccines do not require in vitro culture, which makes them biologically safe, and their selectivity allows precise activation of immune responses [
The workflow summarizing the procedures for the epitope-based peptide vaccine prediction is shown in Figure
Descriptive workflow for the epitope-based peptide vaccine prediction.
Complete-genome GenBank files and annotations of COVID-19 (NC_04551), SARS-CoV (FJ211859), MERS-CoV (NC_019843), HCoV-HKU1 (AY884001), HCoV-OC43 (KF923903), HCoV-NL63 (NC_005831), and HCoV-229E (KY983587) were retrieved from the National Center for Biotechnology Information (NCBI). The FASTA sequences of the 2019-nCoV envelope (E) protein (YP_009724392.1), spike (S) protein (YP_009724390.1), nucleocapsid (N) protein (YP_009724397.2), and membrane (M) protein (YP_009724393.1), as well as the E protein sequences of two Chinese and two American isolates (YP_009724392.1, QHQ71975.1, QHO60596.1, and QHN73797.1), were also obtained from the NCBI (
The Artemis Comparison Tool (ACT) is in silico software for visualizing comparisons between complete genome sequences and their associated annotations [
VaxiJen is the first server for alignment-independent prediction of protective antigens. It classifies antigens solely on the basis of the physicochemical properties of proteins, without recourse to sequence alignment, and predicts the probability that one or more proteins are antigenic using an auto cross covariance (ACC) transformation of the protein sequence. The structural COVID-19 proteins (N, S, E, and M) were analyzed by VaxiJen with a threshold of 0.4 [
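The ACC transformation turns a sequence of any length into a fixed-length vector, which is what allows proteins to be compared without aligning them. The sketch below assumes three numeric descriptors per residue; the values in `TOY_DESCRIPTORS` are hypothetical placeholders, not the z-scale values VaxiJen actually uses, and the maximum lag of 5 is an illustrative choice. It computes ACC(j, k, lag) = sum over i of z[i][j] * z[i + lag][k], divided by (n - lag); the trained classifier that turns these terms into a probability is not reproduced.

```python
from itertools import product

# HYPOTHETICAL placeholder descriptors: three numbers per residue.
# These are NOT VaxiJen's z-scale values; only the residues used below are defined.
TOY_DESCRIPTORS = {
    "Y": (1.0, 0.5, -0.5),
    "V": (-0.5, 1.0, 0.0),
    "S": (0.5, -1.0, 0.5),
    "R": (2.0, 0.0, -1.0),
    "K": (1.5, -0.5, 0.0),
    "N": (0.5, 0.5, 1.0),
    "L": (-1.0, 0.5, -0.5),
}


def acc_transform(sequence, descriptors, max_lag=5):
    """Auto cross covariance terms keyed by (j, k, lag).

    ACC[j, k, lag] = sum_i z[i][j] * z[i + lag][k] / (n - lag)
    j == k gives auto covariance terms; j != k gives cross covariance terms.
    """
    z = [descriptors[aa] for aa in sequence]
    n = len(z)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(sequence) - 1")
    p = len(z[0])
    terms = {}
    for j, k in product(range(p), repeat=2):
        for lag in range(1, max_lag + 1):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            terms[(j, k, lag)] = total / (n - lag)
    return terms


features = acc_transform("YVYSRVKNL", TOY_DESCRIPTORS, max_lag=5)
print(len(features))  # 3 descriptors x 3 descriptors x 5 lags = 45 terms
```

Because the number of terms depends only on the number of descriptors and the maximum lag, a 9-residue peptide and a full-length protein yield vectors of the same size.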
BioEdit is a software package designed to provide a single program that can perform nearly any basic sequence manipulation as well as a few basic alignment analyses. The E protein sequences retrieved from UniProt were aligned in BioEdit using its built-in ClustalW option to determine the conserved sites [
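Once the sequences are aligned, finding conserved sites amounts to scanning the alignment column by column. A minimal sketch, assuming the sequences are already aligned to equal length (as ClustalW output would be) and counting a column as conserved only when every sequence carries the same non-gap residue; the variant sequence in the example is hypothetical, and this is not BioEdit's own implementation.

```python
def conserved_sites(aligned):
    """Return 1-based columns where all sequences share the same residue.

    Gap characters ('-') never count as conserved.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for column, residues in enumerate(zip(*aligned), start=1):
        first = residues[0]
        if first != "-" and all(r == first for r in residues):
            sites.append(column)
    return sites


def percent_conserved(aligned):
    """Share of alignment columns that are fully conserved, in percent."""
    return 100.0 * len(conserved_sites(aligned)) / len(aligned[0])


# Hypothetical toy alignment: the third sequence carries an invented S->A change.
toy_alignment = ["SLVKPSFYV", "SLVKPSFYV", "SLVKPAFYV", "SLVKPSFYV"]
print(conserved_sites(toy_alignment))  # [1, 2, 3, 4, 5, 7, 8, 9]
print(percent_conserved(["SLVKPSFYV"] * 4))  # 100.0
```

Total conservation across a set of sequences corresponds to `percent_conserved` returning 100.0.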
MEGA (version 10.1.6) is software for the comparative analysis of molecular sequences. It is used for pairwise and multiple sequence alignment as well as for constructing and analyzing phylogenetic trees and evolutionary relationships. The gap opening and gap extension penalties were 15 and 6.66, respectively, for both pairwise and multiple sequence alignment. The maximum likelihood phylogenetic tree was constructed with 300 bootstrap replicates [
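The bootstrap replicates behind a tree's support values come from resampling alignment columns with replacement. The sketch below only generates the pseudo-alignments; building a maximum likelihood tree for each replicate, which MEGA does internally, is not reproduced, and the toy alignment is hypothetical. A clade's bootstrap support is then the percentage of replicate trees that recover that clade.

```python
import random


def bootstrap_replicates(aligned, n_replicates, seed=None):
    """Yield bootstrap pseudo-alignments.

    Each replicate samples alignment columns with replacement and keeps the
    original alignment length, so some sites appear repeatedly and others not at all.
    """
    rng = random.Random(seed)
    n_cols = len(aligned[0])
    for _ in range(n_replicates):
        columns = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield ["".join(seq[c] for c in columns) for seq in aligned]


# Hypothetical toy alignment; 300 replicates as stated in the Methods.
toy_alignment = ["ACGTAC", "ACGTTC", "AGGTAC"]
replicates = list(bootstrap_replicates(toy_alignment, n_replicates=300, seed=1))
print(len(replicates), replicates[0])
```

Resampling whole columns (rather than individual residues) keeps the residues at one site together, which is what makes each replicate a plausible alternative alignment.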
IEDB tools were used to predict conserved sequences (10-mer sequences) among HLA class I and class II T-cell epitopes using an Artificial Neural Network (ANN) approach [
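Epitope predictors score candidate peptides that are read off the protein with a sliding window. The sketch below enumerates those windows only; it does not reproduce the IEDB ANN binding predictions. The input is a fragment assembled from the overlapping MHC class II peptides listed in the results tables, not the full-length E protein, and the window lengths of 9 and 15 residues are chosen to match the peptides reported there.

```python
def candidate_peptides(protein, length):
    """All overlapping peptides of a given length (sliding window, step 1)."""
    if length > len(protein):
        return []
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Fragment assembled from the overlapping MHC class II peptides in the results
# (CNIVNVSLVKPSFYV ... SFYVYSRVKNLNSSR); it is not the full-length E protein.
fragment = "CNIVNVSLVKPSFYVYSRVKNLNSSR"

nonamers = candidate_peptides(fragment, 9)       # MHC class I-sized candidates
fifteen_mers = candidate_peptides(fragment, 15)  # MHC class II-sized candidates
print(len(nonamers), len(fifteen_mers))  # 18 12
print("YVYSRVKNL" in nonamers)  # True
```

Each window would then be passed to a binding predictor; only the highest-scoring, conserved windows are kept as epitope candidates.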
Population coverage for each epitope was determined using the IEDB population coverage calculation tool. Because epitopes bind different HLA alleles, the population coverage of the most promising epitope candidates was calculated for the populations of the whole world, China, and Europe to help ensure a universal vaccine [
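The coverage calculation can be sketched under two simplifying assumptions: Hardy-Weinberg equilibrium within each HLA locus, so that a cumulative frequency f of recognized alleles covers 1 - (1 - f)^2 of individuals, and independent inheritance across loci. The allele names and frequencies below are hypothetical, and the IEDB tool's exact procedure may differ in detail.

```python
def locus_coverage(allele_freqs, recognized):
    """Fraction of people carrying at least one recognized allele at one locus.

    Assumes Hardy-Weinberg equilibrium: with cumulative allele frequency f,
    the chance that neither of a person's two alleles is recognized is (1 - f) ** 2.
    """
    f = sum(allele_freqs[a] for a in recognized if a in allele_freqs)
    f = min(f, 1.0)  # guard against floating-point overshoot when frequencies sum to 1
    return 1.0 - (1.0 - f) ** 2


def combined_coverage(per_locus):
    """Combine loci, assuming they are inherited independently."""
    uncovered = 1.0
    for coverage in per_locus:
        uncovered *= 1.0 - coverage
    return 1.0 - uncovered


# Hypothetical allele names and frequencies, for illustration only.
hla_a = {"A_allele_1": 0.20, "A_allele_2": 0.10, "A_allele_3": 0.30}
hla_c = {"C_allele_1": 0.25, "C_allele_2": 0.40}

cov_a = locus_coverage(hla_a, {"A_allele_1", "A_allele_2"})  # f = 0.30 -> 0.51
cov_c = locus_coverage(hla_c, {"C_allele_1"})                # f = 0.25 -> 0.4375
print(round(cov_a, 4), round(cov_c, 4), round(combined_coverage([cov_a, cov_c]), 4))
```

This is why a set of peptides binding alleles at several loci can reach a much higher combined coverage than any single peptide.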
The E protein reference sequence retrieved from GenBank was used as input to RaptorX to predict the 3D structure of the E protein [
To estimate the binding affinities between the epitopes and the molecular structures of MHC I and MHC II, in silico molecular docking was used. The sequences of the proposed epitopes were selected from the COVID-19 reference sequence using UCSF Chimera 1.10 and saved as PDB files. The resulting files were then optimized and energy minimized. The HLA-A
All water molecules and heteroatoms in the retrieved target file 4UQ3 were then removed. The target structure was further optimized and energy minimized using Swiss PDB Viewer V.4.1.0 software [
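Removing waters and heteroatoms from a PDB file can be done by filtering on the record name in columns 1-6. A minimal sketch, assuming a standard fixed-column PDB file: it drops every HETATM record (waters such as HOH included) and the ANISOU records that belong to those atoms; CONECT records are left untouched, so a real preparation step may need to clean those as well. The example records are hypothetical and not taken from 4UQ3.

```python
def strip_heteroatoms(pdb_lines):
    """Drop HETATM records (waters such as HOH and other heteroatoms) and the
    ANISOU records that belong to them; keep all other records unchanged."""
    removed_serials = set()
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()    # record name occupies columns 1-6
        serial = line[6:11].strip()  # atom serial number occupies columns 7-11
        if record == "HETATM":
            removed_serials.add(serial)
            continue
        if record == "ANISOU" and serial in removed_serials:
            continue
        kept.append(line)
    return kept


# Hypothetical fixed-column records, not taken from any real entry.
example = [
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ANISOU    1  N   MET A   1     2406   1892   1614    198    519   -328       N",
    "HETATM  500  O   HOH A 301      20.000  10.000   5.000  1.00 30.00           O",
    "ANISOU  500  O   HOH A 301     3000   3000   3000      0      0      0       O",
    "TER       2      MET A   1",
    "END",
]
for line in strip_heteroatoms(example):
    print(line)
```

Filtering on record names rather than residue names removes bound ligands and ions along with waters, which matches removing "all water molecules and heteroatoms".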
Molecular docking was performed using AutoDock 4.0, which is based on the Lamarckian genetic algorithm and combines it with energy evaluation on grids of affinity potentials to find a suitable binding position for a ligand on a given protein [
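A Lamarckian genetic algorithm differs from an ordinary genetic algorithm in one step: individuals are refined by local search, and the refined pose is written back into the genotype so that the improvement is inherited. The toy below illustrates only that idea on a hypothetical two-variable energy function standing in for a docking score. It uses a simple pattern search where AutoDock uses its own Solis-Wets local search, and it has none of AutoDock's grid maps or pose representation; population size, rates, and generation count are illustrative assumptions.

```python
import random


def energy(pose):
    """HYPOTHETICAL toy score standing in for a docking energy (minimum at (1, -2))."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def local_search(pose, step=0.5, iterations=60):
    """Pattern search: try +/- step on each coordinate, keep improvements,
    and halve the step whenever a full sweep finds none."""
    best = list(pose)
    best_e = energy(best)
    for _ in range(iterations):
        improved = False
        for d in range(len(best)):
            for delta in (step, -step):
                trial = best.copy()
                trial[d] += delta
                trial_e = energy(trial)
                if trial_e < best_e:
                    best, best_e = trial, trial_e
                    improved = True
        if not improved:
            step /= 2.0
    return best


def lamarckian_ga(pop_size=30, generations=40, ls_rate=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=energy)
        next_gen = [population[0].copy()]  # elitism: carry the best pose forward
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            parent1 = min(rng.sample(population, 2), key=energy)
            parent2 = min(rng.sample(population, 2), key=energy)
            # Uniform crossover, then Gaussian mutation of each gene with p = 0.2.
            child = [a if rng.random() < 0.5 else b for a, b in zip(parent1, parent2)]
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < 0.2 else g for g in child]
            # Lamarckian step: the locally refined pose replaces the genotype,
            # so the improvement is inherited by later generations.
            if rng.random() < ls_rate:
                child = local_search(child)
            next_gen.append(child)
        population = next_gen
    return min(population, key=energy)


best = lamarckian_ga()
print([round(v, 3) for v in best], energy(best))
```

In a Darwinian variant, the refined energy would only be used for selection while the unrefined genes were passed on; writing the refined genes back is the defining Lamarckian choice.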
The reference sequence of the envelope protein was aligned with the HCoV-HKU1 reference protein using the Artemis Comparison Tool as illustrated in (Figure
Artemis analysis of the envelope protein displaying 3 windows. The upper window represents the HCoV-HKU1 reference sequence, and its genes are highlighted in blue starting from
The mutated proteins were tested for antigenicity using VaxiJen, and the envelope protein was identified as the best immunogenic target, as shown in Table
VaxiJen overall prediction of probable COVID-19 antigen.
Protein | VaxiJen score | VaxiJen prediction |
---|---|---|
Protein E | 0.6025 | Probable antigen |
Protein M | 0.5102 | Probable antigen |
Protein S | 0.4646 | Probable antigen |
Protein N | 0.5059 | Probable antigen |
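The VaxiJen decision rule is a threshold on the score. A short sketch applying the 0.4 threshold from the Methods to the scores reported in this table and ranking the proteins; treating a score exactly equal to the threshold as antigenic is an assumption.

```python
VAXIJEN_THRESHOLD = 0.4  # threshold stated in the Methods

# Scores as reported in the VaxiJen table.
scores = {"E": 0.6025, "M": 0.5102, "S": 0.4646, "N": 0.5059}


def classify(score, threshold=VAXIJEN_THRESHOLD):
    # Assumption: a score equal to the threshold is counted as antigenic.
    return "Probable antigen" if score >= threshold else "Probable non-antigen"


ranking = sorted(scores, key=scores.get, reverse=True)
for protein in ranking:
    print(f"Protein {protein}: {scores[protein]:.4f} -> {classify(scores[protein])}")
```

All four structural proteins clear the threshold, so the choice of protein E rests on its ranking first rather than on being the only antigenic candidate.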
Sequence alignment of the COVID-19 envelope protein using BioEdit showed complete conservation across the four sequences retrieved from China and the USA (Figure
Sequence alignment of the envelope protein of COVID-19 using BioEdit software (complete conservation across the 4 strains: 2 from China and 2 from the USA).
To study the evolutionary relationship among all seven coronavirus strains, a multiple sequence alignment (MSA) was performed with ClustalW in MEGA. This alignment was used to construct the maximum likelihood phylogenetic tree shown in Figure
Maximum likelihood phylogenetic tree which describes the evolutionary relationship between the seven strains of coronavirus.
The IEDB website was used to analyze the 2019-nCoV envelope protein for T-cell-related peptides. The results show ten MHC class I-associated and ten MHC class II-associated peptides with high population coverage (Tables
The most promising MHC class I-related peptides in the envelope protein-based vaccine of COVID-19 along with the predicted coverage of the world, China, Europe, and East Asia.
Peptide | Alleles | Coverage | Combined coverage of 10 peptides |
---|---|---|---|
YVYSRVKNL | HLA-C | 50.02% | World: 88.5% |
SLVKPSFYV | HLA-A | 42.53% | China: 78.17% |
SVLLFLAFV | HLA-A | 42.53% | Europe: 92.94% |
FLAFVVFLL | HLA-A | 40.60% | East Asia: 80.78% |
VLLFLAFVV | HLA-A | 39.08% | |
RLCAYCCNI | HLA-A | 39.08% | |
FVSEETGTL | HLA-C | 28.22% | |
LTALRLCAY | HLA-A | 26.34% | |
LVKPSFYVY | HLA-B | 21.72% | |
NIVNVSLVK | HLA-A | 20.88% | |
The most promising MHC class II-related peptides in the envelope protein-based vaccine of COVID-19 along with the predicted coverage of the world, China, Europe, and East Asia.
Peptide sequence | Alleles | World coverage | Combined coverage of 10 peptides |
---|---|---|---|
KPSFYVYSRVKNLNS | HLA-DPA1 | 99.93% | World: 99.99% |
VKPSFYVYSRVKNLN | HLA-DPA1 | 99.92% | China: 99.96% |
LVKPSFYVYSRVKNL | HLA-DPA1 | 99.90% | Europe: 100.0% |
PSFYVYSRVKNLNSS | HLA-DPA1 | 99.86% | East Asia: 99.91% |
NIVNVSLVKPSFYVY | HLA-DPA1 | 99.77% | |
LLVTLAILTALRLCA | HLA-DPA1 | 99.72% | |
SFYVYSRVKNLNSSR | HLA-DPA1 | 99.72% | |
LVTLAILTALRLCAY | HLA-DPA1 | 99.69% | |
VTLAILTALRLCAYC | HLA-DPA1 | 99.56% | |
CNIVNVSLVKPSFYV | HLA-DPA1 | 99.53% | |
Schematic diagrams (a) and (b) showing world population coverage of the envelope protein of COVID-19 binding to the MHC class I and MHC class II molecules, respectively.
3D structures visualized by UCSF Chimera: (a) and (b) show the most promising peptides in the envelope protein of COVID-19 (yellow colored) binding to MHC class I and MHC class II, respectively, while (c), (d), and (e) show the molecular docking of the YVYSRVKNL, LAILTALRL, and SLVKPSFYV peptides of coronavirus docked in HLA-A
Designing novel vaccines is crucial for defending against the rapidly growing global burden of disease [
COVID-19 is an RNA virus, and RNA viruses tend to mutate more frequently than DNA viruses [
In the present work, several peptides were proposed for designing a vaccine against COVID-19 (Figure
These conserved antigenic sites were revealed in previous studies through sequence alignment between MERS-CoV and bat coronavirus [
The four proteins were then analyzed with VaxiJen to estimate the probability that each is antigenic. Protein E was found to be the most antigenic protein, with the highest probability score, as shown in Table
Phylogenetic analysis is a powerful tool for determining the evolutionary relationships between strains. Multiple sequence alignment (MSA) was performed using ClustalW for the seven coronavirus strains: COVID-19 (NC_04551), SARS-CoV (FJ211859), MERS-CoV (NC_019843), HCoV-HKU1 (AY884001), HCoV-OC43 (KF923903), HCoV-NL63 (NC_005831), and HCoV-229E (KY983587). The maximum likelihood phylogenetic tree showed that COVID-19 falls in the same clade as SARS-CoV, indicating that the two strains are closely related (Figure
The T-cell immune response is considered longer lasting than the B-cell response, in which the antigen can easily escape the antibody memory response [
Having selected protein E as the antigenic target, we then evaluated its binding affinity to MHC molecules. The protein reference sequence was submitted to the IEDB MHC prediction tool. Twenty-one peptides were found to bind MHC class I with different affinities (Table
Peptides recognized by a large number of HLA molecules are more likely to induce an immune response. Based on the aforementioned results, and taking into consideration high binding affinity to both MHC class I and II, conservancy, and population coverage, three peptides are proposed as strong candidates for formulating a new vaccine against COVID-19.
These findings were further confirmed by the results obtained for the molecular docking of the proposed peptides and HLA-A
Although both anti-influenza and anti-HIV drugs are currently used in China to treat COVID-19, chloroquine phosphate, an old antimalarial drug, has recently been found to have apparent efficacy and acceptable safety against COVID-19 [
Extensive mutations, insertions, and deletions were identified in the COVID-19 strain by comparative sequence analysis. In addition, a number of MHC class I- and II-related peptides were found to be promising candidates. Among them, the peptides YVYSRVKNL, SLVKPSFYV, and LAILTALRL show high potential for vaccine design, with adequate world population coverage. The T-cell epitope-based peptide vaccine was designed for COVID-19 using the envelope protein as an immunogenic target; nevertheless, the proposed vaccine urgently needs clinical validation of its safety and immunogenicity to help stop this epidemic before it leads to a devastating global outbreak.
WHO: World Health Organization
CoV: Coronaviruses
MERS: Middle East Respiratory Syndrome
SARS: Severe Acute Respiratory Syndrome
2019-nCoV: Novel coronavirus
HCoV-HKU1: Human coronavirus HKU1
HCoV-OC43: Human coronavirus OC43
HCoV-NL63: Human coronavirus NL63
HCoV-229E: Human coronavirus 229E
ACE2: Angiotensin-converting enzyme 2
RBM: Receptor-binding motif
RBD: Receptor-binding domain
vs.: Versus
ACT: Artemis Comparison Tool
ACC: Auto cross covariance
MEGA: Molecular Evolutionary Genetics Analysis
ANN: Artificial Neural Network
IEDB: Immune Epitope Database
IC50: Median inhibitory concentrations
MHC I: Major histocompatibility complex class I
MHC II: Major histocompatibility complex class II
PDB: Protein Data Bank
MSA: Multiple sequence alignment
All data underlying the results are available as part of the article, and no additional source data are required.
The authors declare that they have no conflicts of interest.
The contributions of the authors involved in this study are as follows: MIA: conceptualization, formal analysis, investigation, methodology, validation, visualization, and writing (original draft); AHA: formal analysis, investigation, and methodology; MIM: methodology, writing (original draft), and writing (review and editing); NME: formal analysis, methodology, and visualization; NSM: conceptualization, resources, and writing (review and editing); SWS: visualization, validation, and writing (review and editing); and AMM: data curation, conceptualization, project administration, supervision, and writing (review and editing). All authors have read and approved the final manuscript.
The authors acknowledge the Deanship of Scientific Research at University of Bahri for the supportive cooperation.