Annotating Spike Protein Polymorphic Amino Acids of Variants of SARS-CoV-2, Including Omicron

The prolonged global spread and community transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to the emergence of variants and raised questions regarding disease severity and vaccine effectiveness. We conducted a simple bioinformatic analysis of the spike gene of a representative of each variant. The data show that polymorphic amino acids are located mostly on the amino-terminal side of the S1/S2 cleavage site. The Omicron variant diverges from the others, with the highest number of amino acid substitutions, including in the receptor-binding site (RBS), epitopes, S1/S2 cleavage site, fusion peptide, and heptad repeat 1. The current sharp global increase in the frequency of the Omicron genome constitutes evidence of its high community transmissibility. In conclusion, the proposed guideline could give immediate insight into the probable biological nature of any variant of SARS-CoV-2. As Omicron has diverged the farthest from the original pandemic strain, Wuhan-Hu-1, we expect different epidemiological and clinical patterns in Omicron cases. Regarding vaccine efficacy, slight changes in some epitopes while others are conserved should not lead to a significant reduction in the effectiveness of an approved vaccine.


Introduction
The emergence of variants of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to questions about disease severity and vaccine effectiveness. As of 19 December 2021, over 273 million cases and over 5.3 million deaths had been reported globally (https://www.who.int).
The World Health Organization classifies SARS-CoV-2 variants as variants of concern (VOCs), variants of interest (VOIs), and variants under monitoring (VUMs). The VOCs listed in the GISAID database, which was accessed on December 26, 2021, are Omicron, and ORF10 [1,2]. ORF1a and 1b are also translated following -1 ribosomal frameshifting to produce ORF1ab protein [3]. Proteins translated from ORF1a, 1b, and 1ab form the viral polymerase complex [4], while proteins of other ORFs are also known as accessory proteins [5]. However, it is generally believed that the spike protein is a major determinant of coronavirus pathogenicity. This surface protein possesses major immunogenic domains, and most gene-based vaccines target only the spike gene of SARS-CoV-2. The protein is highly glycosylated and cysteine rich, with two cleavage sites: S1/S2 and S2′ [6]. The glycosylation pattern of the SARS-CoV-2 spike involves N-linked and O-linked glycosylation [7]. The two protease cleavage sites are critical for virus activation and replication, and protease cleavage of the spike has been established as a critical determinant of coronavirus tropism and pathogenesis [8].
The epitopes of the spike protein appear to be both linear and conformational. One group [9] mapped nine linear epitopes along the spike protein, designated IdA-IdI; another group [10] identified 16 epitopes. Some epitopes overlap. Conformational epitopes arising from folding and trimerization of the SARS-CoV-2 spike protein have also been predicted [11]. The biological consequences of each variant are poorly understood. In this study, we performed functional annotation of amino acid changes in the spike protein of variants based on existing knowledge of coronaviruses as well as rapidly accumulating knowledge about SARS-CoV-2.

Materials and Methods
The sequence of the original SARS-CoV-2 strain, Wuhan-Hu-1 (Accession Number NC_045512), was downloaded from GenBank. The open reading frame (ORF) of the spike protein was selected as annotated in the database. Ten complete sequences of each annotated variant were selected randomly from GISAID and downloaded. The first 15 nucleotides at the 5′-terminus of the Wuhan-Hu-1 spike gene were searched for in each sequence, and all nucleotides preceding the match were deleted. Likewise, the last 15 nucleotides of the Wuhan-Hu-1 spike gene were used to mark the 3′-end, and all nucleotides after the match were deleted. We then manually selected sequences without any undefined ("N") residues. When no such "clean" sequence was available for a variant, we evaluated another set of sequences for that variant. The selected sequences were translated into amino acid sequences and aligned using MEGA X software [12]. Using the same software, the data were exported in MEGA format and analyzed further for polymorphic (variable) amino acids. The evolutionary history of the variants was inferred using the neighbor-joining method [13]. Evolutionary distances were computed using the Kimura 2-parameter method [14]. The probable biological function of each residue was annotated using the guidelines shown in Supplementary Material 1.

Results
The data (Table 1) show that polymorphic amino acids are located mostly in the S1 domain, on the N-terminal side of the S1/S2 cleavage site. The number of polymorphic amino acids in this region is 73; the S2 domain has only 15. Glycosylation motif loss (GML) occurred once in Delta, due to the T19R substitution, and once in Lambda, due to the T76I substitution. Additional glycosylation motif (AGM) gain occurred twice in the Gamma variant (T20N and R191S) and once in Lambda (R249N). Cysteine residue loss (CRL) occurred once, in the GH/490R variant, due to the deletion of C136.
Ten residues of the receptor-binding site (RBS) are polymorphic. The Alpha and GH/490R variants differ from Wuhan-Hu-1 at a single RBS residue, the other variants except Omicron differ at two, and Omicron differs at nine. The number of amino acid substitutions in the various linear epitopes is 19; that in probable conformational epitopes is 16. The number of amino acid changes from Wuhan-Hu-1 at the mapped linear epitopes is 2, 6, 4, 2, 3, 3, 10, and 3 for Alpha, Beta, Gamma, Delta, Lambda, Mu, Omicron, and GH/490R, respectively. At probable conformational epitopes, a single amino acid difference from Wuhan-Hu-1 occurred in Beta, Gamma, Delta, Mu, and GH/490R, three occurred in Alpha, and seven each occurred in Lambda and Omicron. At the S1/S2 cleavage site, the Alpha, Beta, Delta, Lambda, and GH/490R variants differ from Wuhan-Hu-1 in one residue, and Omicron differs in two. All changed residues at this site went from the nonbasic amino acids Q/N/P to the basic amino acids H/R/K. In the fusion peptide, a single amino acid substitution occurred, in Omicron only. In heptad repeat 1 (HR1), a single amino acid alteration occurred in the Alpha, Gamma, and Mu variants, with Omicron displaying three alterations. In heptad repeat 2 (HR2) and the transmembrane domain, a single amino acid change occurred, in the Gamma variant only.
The phylogenetic analysis presented in Figure 1 shows two clusters of variants, with good bootstrap support of 88%. The Alpha, Delta, Mu, and Omicron variants form one cluster, and GH/490R, Beta, and Gamma form another. Lambda appears to have emerged directly from Wuhan-Hu-1. In the first cluster, Omicron forms a long branch from the other members of the group.
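The identification of polymorphic positions and of glycosylation motif gain or loss can be illustrated with a short script. The following is a minimal Python sketch, not the MEGA X workflow used in this study. It assumes gap-free aligned fragments of equal length and takes the N-glycosylation motif to be N-X-S/T with X not proline. The function names are hypothetical, and the 25-residue fragment is a toy example (an assumption, not downloaded data) arranged so that positions 17-22 read NLTTRT.

```python
import re
from typing import Dict, List, Set, Tuple

# Lookahead so that overlapping N-X-S/T motifs are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def substitute(seq: str, old: str, pos: int, new: str) -> str:
    """Apply a single substitution such as T19R (1-based position)."""
    if seq[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


def polymorphic_columns(aligned: Dict[str, str], ref_name: str) -> List[int]:
    """1-based columns where at least one sequence differs from the reference."""
    ref = aligned[ref_name]
    if any(len(s) != len(ref) for s in aligned.values()):
        raise ValueError("sequences must be aligned to equal length")
    return [i + 1 for i in range(len(ref))
            if any(s[i] != ref[i] for s in aligned.values())]


def sequon_starts(protein: str) -> Set[int]:
    """1-based positions of the N in each N-X-S/T motif (X is not proline)."""
    return {m.start() + 1 for m in SEQUON.finditer(protein)}


def sequon_changes(ref: str, variant: str) -> Tuple[List[int], List[int]]:
    """Return (lost, gained) motif positions; assumes no insertions or deletions."""
    r, v = sequon_starts(ref), sequon_starts(variant)
    return sorted(r - v), sorted(v - r)


# Toy fragment (assumption): positions 17-22 read NLTTRT.
REF = "MFVFLVLLPLVSSQCV" + "NLTTRTQLP"
DELTA_LIKE = substitute(REF, "T", 19, "R")   # T19R
GAMMA_LIKE = substitute(REF, "T", 20, "N")   # T20N

ALIGNED = {"Wuhan-Hu-1": REF, "Delta-like": DELTA_LIKE, "Gamma-like": GAMMA_LIKE}

print(polymorphic_columns(ALIGNED, "Wuhan-Hu-1"))
print(sequon_changes(REF, DELTA_LIKE))
print(sequon_changes(REF, GAMMA_LIKE))
```

In this toy fragment, T19R removes the motif at N17 and T20N creates a new motif at N20, which mirrors the GML call for Delta and one of the AGM calls for Gamma. Sequences with deletions, such as the C136 deletion in GH/490R, would need gap-aware position mapping that this sketch does not attempt.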

Discussion
The volume of whole-genome sequence data for SARS-CoV-2 submitted to international databases poses a major computational challenge for anyone seeking an indication of the possible impact of each strain before clinical and experimental data are available, especially in resource-limited countries. At the time of writing, approximately 6.5 million whole-genome SARS-CoV-2 sequences had been submitted. Here, we offer a simple bioinformatic protocol for gene mining and predicting the possible biological meaning of genetic changes in strains.
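The sequence-preparation step of the protocol, as described in Materials and Methods, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure as actually executed: the function names are hypothetical, the anchors are matched exactly (a mutation inside a 15-nucleotide anchor would cause the sequence to be skipped), and the toy sequences in the usage lines are invented for demonstration.

```python
from typing import Optional

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def trim_to_orf(genome: str, ref_orf: str, anchor_len: int = 15) -> Optional[str]:
    """Keep only the region delimited by the first and last anchor_len
    nucleotides of the reference ORF; return None if either anchor is absent."""
    genome = genome.upper()
    head = ref_orf[:anchor_len].upper()
    tail = ref_orf[-anchor_len:].upper()
    start = genome.find(head)
    if start == -1:
        return None
    # First occurrence of the 3' anchor downstream of the 5' anchor.
    end = genome.find(tail, start + anchor_len)
    if end == -1:
        return None
    return genome[start:end + anchor_len]


def is_clean(seq: str) -> bool:
    """True when the sequence holds only A, C, G, T (no N or ambiguity codes)."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")


def translate(orf: str) -> str:
    """Translate codon by codon; unknown codons become X, a partial last codon is dropped."""
    orf = orf.upper()
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X")
                   for i in range(0, len(orf) - 2, 3))


# Toy data (assumption): a 15-nt "reference ORF" and a flanked variant genome.
TOY_REF_ORF = "ATGGCTAAATGCTAA"
TOY_GENOME = "cccATGGCTAGATGCTAAggg"

orf = trim_to_orf(TOY_GENOME, TOY_REF_ORF, anchor_len=6)
if orf is not None and is_clean(orf):
    print(orf, translate(orf))
```

Exact anchor matching keeps the sketch close to the manual search described in the text. A production version would probably align against the reference instead, so that variants carrying changes near the ends of the gene are not lost.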
The GISAID initiative has enabled data mining by providing a variant tag for each submitted sequence. We propose a guideline to the probable biological function of each residue in the spike protein of SARS-CoV-2 based on current knowledge, which can be adjusted as new research reports appear. It is generally believed that the phenotypic nature of viruses is mostly polygenic, whereby the entire genetic composition of a strain determines the biology of the virus. This should also be true for coronaviruses, including SARS-CoV-2. The pathogenesis of the Middle East respiratory syndrome coronavirus (MERS-CoV) mainly involves interaction between spike and cellular receptors, papain-like protease PLpro, and accessory proteins such as p4a and membrane M protein [15]. For SARS-CoV-2, various nonstructural proteins, such as PLpro [16], and various accessory proteins [5] have been described as contributing to virus biology and pathogenesis. However, focusing on the spike gene is also important, as a body of literature provides evidence of the key functions of this protein.
Figure 1: Evolutionary relationships of variants of SARS-CoV-2. The evolutionary history was inferred using the neighbor-joining method [13]. The evolutionary distances were computed using the Kimura 2-parameter method [14]. Evolutionary analyses were conducted in MEGA X [12]. The tree was rooted to Wuhan-Hu-1.
The S1 domain, which carries major antigenic determinants, mediates receptor recognition and viral attachment to initiate host cell entry [17]. The N-terminal domain (NTD) contributes to the host range [18]. Binding of the receptor-binding domain to receptors initiates infection [19], and the S2 domain mediates membrane fusion [17,20,21].
Polymorphic amino acids are located mostly in the S1 domain, upstream of the S1/S2 cleavage site. The number of polymorphic amino acids in this region is 73, with only 15 in the S2 domain. We believe that the S1 domain tolerates many substitutions because it is located on the surface of the virion, whereas the S2 domain must be conserved to preserve virus integrity. Glycosylation and cysteine residues are also crucial for maintaining virus integrity. Our data show that glycosylation motif loss occurred once each in the Delta and Lambda variants and that additional glycosylation motif gain occurred twice in Gamma and once in Lambda. Cysteine residue loss (CRL) occurred only once, in the GH/490R variant.
The newest variant, Omicron, exhibits nine amino acid changes in the RBS, whereas the other variants are more conserved, with only one or two residue differences from Wuhan-Hu-1. Therefore, it is plausible to expect biological changes in Omicron that differ from those of other variants as well as the original strain. Tracking variant occurrence on the GISAID website revealed a sharp global increase in Omicron from under 1% of circulating strains on November 29, 2021, to 50% on December 29, 2021. Higher transmissibility is evident.
The Omicron linear and conformational epitopes diverge most from the original strain, Wuhan-Hu-1, with ten and seven amino acid substitutions, respectively. Seven amino acid substitutions at conformational epitopes also occur in the Lambda variant. The other variants have only 2-6 amino acid changes in linear epitopes and 1-3 substitutions in conformational epitopes of spike. The identified SARS-CoV-2 spike epitopes consist of 10-20 residues or more. As MHC-I-presented B- and T-cell epitopes are limited to 9-11 residues and MHC-II-presented epitopes to 9-22 residues [22], the identified epitopes of the spike protein of SARS-CoV-2 need to be refined.
Moreover, because spike carries multiple epitopes (more than 20), a slight change at some epitopes while others are conserved should not lead to a significant reduction in the effectiveness of existing vaccines. Reports of reduced vaccine efficacy against variants based on in vitro experiments [23] should not cause concern, as such findings might not translate into significantly reduced vaccine efficacy in vivo. Indeed, the immune system consists of an array of components, such as cytokines, complement activation, and macrophage opsonization [24][25][26], as already reported for SARS-CoV-2 [27][28][29][30]. Moreover, cell-mediated immunity must play a critical role in complete immune protection against SARS-CoV-2 [31], and it is not measured by in vitro neutralization testing.
The change at the S1/S2 cleavage site from nonbasic (Q/N/P) to basic (H/R/K) amino acids might be critical to the biology of the virus, especially for Omicron. Whereas other variants show a single amino acid change, Omicron exhibits two. Moreover, all variants carry more basic S1/S2 cleavage sites, and Omicron has the most basic one. It is well documented for influenza viruses that a polybasic cleavage site allows activation by ubiquitous cellular proteases, enabling the virus to initiate infection [32][33][34]. The cleavage site of the original SARS-CoV-2 strain is indeed polybasic, with an NH-PRRAR-COOH motif. A P681H change occurred in Alpha, Mu, and Omicron, and P681R in Delta and GH/490R. Omicron acquired a still more basic cleavage site through the N679K substitution. For the Delta variant, the more basic cleavage site might have contributed to its transmissibility and clinical outcomes [35,36]. Therefore, it is plausible to expect higher transmissibility of Omicron, as its cleavage site is more basic than that of the Delta variant. However, its clinical consequences among nonimmune people, who are neither vaccinated nor previously infected, are expected to be reported soon.
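The comparison of cleavage-site basicity can be made concrete by counting basic residues (H, R, K) in a window around the PRRAR motif. The Python sketch below is illustrative only. The window sequence for residues 675-685 (QTQTNSPRRAR) is an assumption that should be checked against the Wuhan-Hu-1 reference, the function names are hypothetical, and only the substitutions named in the text are applied. Histidine is counted as basic, following the classification used in this paper.

```python
from typing import Dict, List

BASIC = set("HKR")

# Assumed reference window: spike residues 675-685, ending in the PRRAR motif.
WINDOW_START = 675
REF_WINDOW = "QTQTNSPRRAR"

# Substitutions at the S1/S2 site as named in the Discussion.
VARIANT_SUBS: Dict[str, List[str]] = {
    "Wuhan-Hu-1": [],
    "Alpha": ["P681H"],
    "Delta": ["P681R"],
    "Omicron": ["N679K", "P681H"],
}


def apply_substitutions(window: str, subs: List[str],
                        start: int = WINDOW_START) -> str:
    """Apply substitutions written as <old><position><new>, e.g. P681H."""
    residues = list(window)
    for sub in subs:
        old, pos, new = sub[0], int(sub[1:-1]), sub[-1]
        idx = pos - start
        if not 0 <= idx < len(residues):
            raise ValueError(f"{sub} lies outside the window")
        if residues[idx] != old:
            raise ValueError(f"{sub}: window has {residues[idx]} at {pos}")
        residues[idx] = new
    return "".join(residues)


def basic_count(window: str) -> int:
    """Number of H, K, or R residues in the window."""
    return sum(residue in BASIC for residue in window)


for name, subs in VARIANT_SUBS.items():
    site = apply_substitutions(REF_WINDOW, subs)
    print(f"{name:11s} {site}  basic residues: {basic_count(site)}")
```

A simple residue count is a crude proxy for basicity, since it ignores the weaker basicity of histidine and the position of each residue relative to the cleavage point. It does reproduce the ordering argued in the text, with Omicron above Delta and Alpha, and all three above Wuhan-Hu-1.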
Despite its many amino acid changes, Omicron has a relatively conserved S2 domain. Omicron shows amino acid alterations in the fusion peptide and HR1, with one and three substitutions, respectively, which might have an impact on fusion capability. As described previously, these domains mediate membrane fusion in infected cells [17,20,21].
Phylogenetic analysis showed that Omicron forms a long branch from the other members of the cluster. This phenomenon is most frequently observed when a lineage is isolated and has no known close relatives [37]. The ancestor of this variant might have been circulating without notice, or the number of genome sequences from the area of circulation was limited. Another explanation is that dramatic changes might have occurred shortly before its identification.
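Branch lengths in a neighbor-joining tree derive from pairwise distances, so a long branch reflects large corrected distances to every other sequence. As a hedged illustration of the Kimura 2-parameter correction, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, the Python sketch below computes the distance for one pair of aligned nucleotide sequences. It assumes pairwise deletion of gaps and ambiguous bases, it is not the MEGA X implementation, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguity codes (pairwise deletion)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy example (assumption): one transition in ten compared sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

The corrected distance always exceeds the raw proportion of differing sites, and the gap widens as sequences diverge, which is one reason a highly substituted lineage such as Omicron stands out on the tree.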
We expect some biological changes due to amino acid substitutions in the spike protein of SARS-CoV-2, especially for Omicron. The largest number of amino acid substitutions occurred in Omicron, with divergence in the RBS, epitopes, S1/S2 cleavage site, fusion peptide, and HR1. Its sharp global increase to 50% of circulating viruses is evidence of high community transmissibility. As multiple epitopes of more than 20 residues exist on spike, a slight change in some epitopes while others are conserved should not lead to a significant reduction in existing vaccine effectiveness. The emergence of Omicron with dramatic changes and an unknown recent ancestor should draw attention from the global community for resource mobilization.
Biochemistry Research International
In conclusion, the proposed guideline could give immediate insight into the probable biological nature of any variant of SARS-CoV-2. Omicron has diverged the farthest from the original pandemic strain, Wuhan-Hu-1, with divergence in the RBS, epitopes, S1/S2 cleavage site, fusion peptide, and HR1. Therefore, we expect different epidemiological and clinical patterns in Omicron cases. Regarding vaccine efficacy, slight changes in some epitopes while others are conserved should not lead to a significant reduction in existing vaccine effectiveness.

Data Availability
All the data are provided in the text as well as in the supplementary materials.

Additional Points
Statement of significance of the study. The volume of whole-genome sequence data for SARS-CoV-2 submitted to international databases poses a major computational challenge for anyone seeking an indication of the possible impact of each strain before clinical and experimental data are available, especially in resource-limited countries. Here, we offer a simple bioinformatic protocol for gene mining and predicting the possible biological meaning of genetic changes in strains. We propose a guideline to the probable biological function of each residue in the spike protein of SARS-CoV-2 based on current knowledge, which can be adjusted as new research reports appear. Applying the guideline to representatives of various SARS-CoV-2 variants, we find it remarkable that polymorphic amino acids are located mostly on the amino-terminal side of the S1/S2 cleavage site. The Omicron variant diverges from the others at sites including the receptor-binding site (RBS), epitopes, S1/S2 cleavage site, fusion peptide, and heptad repeat 1. The unique RBS and S1/S2 cleavage site could contribute to the sharp global increase in Omicron cases. However, a slight change in some epitopes while others are conserved should not lead to a significant reduction in the effectiveness of existing vaccines.

Conflicts of Interest
The authors have declared no conflicts of interest.

Acknowledgments
This study was supported by Research, Technology and Higher Education (RISTEKDIKTI) of Indonesia through World Class Research.
The English expression of the manuscript has been copy-edited by a global professional copy-editing service.

Supplementary Materials
The probable biological function of each residue was annotated using the guidelines shown in Supplementary Material 1. The dataset containing the representative of each variant used is available in Supplementary Material 2.