Sars-CoV-2 Envelope and Membrane proteins: differences from closely related proteins linked to cross-species transmission?

Coronavirus disease (COVID-19) is a new viral infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was initially reported in the city of Wuhan, China, and afterwards spread globally. Genomic analyses revealed that SARS-CoV-2 is phylogenetically related to specific severe acute respiratory syndrome-like (SARS-like) Pangolin and Bat coronavirus isolates. In this study we focused on two proteins of the Sars-CoV-2 surface: the Envelope protein and the Membrane protein. Sequences from Sars-CoV-2 isolates and other closely related viruses were collected from GenBank through TBlastN searches. The retrieved sequences were multiply aligned with MAFFT. The Envelope protein is identical to its counterparts from the Pangolin CoV MP798 isolate and the Bat CoV isolates CoVZXC21, CoVZC45 and RaTG13. However, a substitution at position 69, where Arg replaces Glu, and a deletion at position 70, corresponding to Gly or Cys in other Envelope proteins, were found. The Membrane glycoprotein appears more variable with respect to the SARS CoV proteins than the Envelope: heterogeneity at the N-terminal position, which is exposed on the virus surface, was found between the Pangolin CoV MP798 isolate and the Bat CoV isolates CoVZXC21, CoVZC45 and RaTG13. The mutations observed in the Envelope protein are drastic and may have significant implications for conformational properties and possibly for protein-protein interactions. Mutations in the Membrane protein may also be relevant because this protein cooperates with the Spike during cell attachment and entry. Therefore, these mutations may influence interaction with host cells. The mutations detected in these comparative studies may reflect functional peculiarities of the Sars-CoV-2 virus and may help explain the epizootic origin of the COVID-19 epidemic.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 7 April 2020 doi:10.20944/preprints202004.0089.v1


Introduction
COVID-19 has become a planetary emergency which is seriously threatening human health (Benvenuto, Giovanetti, Salemi, et al., 2020; Lai, Shih, Ko, Tang, & Hsueh, 2020). Many aspects of the structure and biology of the Sars-CoV-2 virus are yet to be elucidated. Development of effective therapeutic and prevention strategies is significantly hampered by the lack of detailed structural information on virus proteins, although a few crystallographic structures of virus proteins are now available (Walls et al., 2020; Zhang et al., 2020). A contribution to the deciphering of virus properties may also come from careful comparative analysis of protein sequences and structures, aimed at detecting significant differences from similar viruses. In this report, we describe the results of a comparison of the Sars-CoV-2 surface proteins from different isolates of the virus with homologous proteins from the most closely related viruses, such as Bat and Pangolin coronaviruses. Our work has focused on the Envelope (E) and Membrane (M) proteins which, along with the Spike, form the virus protein interface to the external environment, through which the virus initially interacts with target human cells. The Spike glycoprotein has already been extensively studied and a crystallographic structure is available in the Protein Data Bank (Walls et al., 2020); for this reason, the protein has not been specifically addressed within this note. Identification of local structural differences, even minimal ones, with respect to the closest virus proteins may suggest the mutations that enabled Sars-CoV-2 to cross species and to acquire its peculiar pathogenic properties (Ji, Wang, Zhao, Zai, & Li, 2020). In fact, a number of examples have been published in the scientific literature showing how even single point mutations in virus proteins can significantly alter their biology and pathogenesis (André, Cossic, Davies, Miller, & Whittaker, 2019; Sakai et al., 2017).
Therefore, comparative studies may shed light on the molecular mechanisms through which epidemics of epizootic origin can emerge and may also suggest molecular targets for therapeutics or reverse vaccinology experiments.

Databank searches and modelling
The Sars-CoV-2 E and M protein sequences (Table 1) were used as TBlastN queries to search the GenBank nucleotide database, restricted to the Viruses taxonomic division. Sequences were collected separately for each Sars-CoV-2 protein and aligned with MAFFT. At the time of access to GenBank, 102 genomes from different Sars-CoV-2 isolates were retrieved.
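When many isolates are retrieved, it can be convenient to collapse identical sequences into groups before alignment, so that each distinct variant appears only once. The following is a minimal sketch of that idea and is not the pipeline used in this work: the function names and the toy FASTA records are hypothetical, and the sequences are invented placeholders rather than real viral proteins.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_identical(records):
    """Map each distinct sequence to the list of headers that share it."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


# Toy input (hypothetical): three records, two of which are identical
# once the line-wrapped sequence of isolate_A is joined.
TOY_FASTA = """>isolate_A
MKTAY
IAKQR
>isolate_B
MKTAYIAKQR
>isolate_C
MKTAYIAKHR
"""

records = parse_fasta(TOY_FASTA)
groups = group_identical(records)
for seq, headers in groups.items():
    # One representative header per distinct sequence, plus the group size.
    print(len(headers), headers[0], seq)
```

The representative of each group could then be written to a FASTA file and passed to the aligner, while the full header lists keep track of which isolates share each sequence.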

Envelope protein
The TBlastN search confirms that, in general, the E protein is well conserved across β-coronaviruses and particularly across SARS CoVs. In particular, the Sars-CoV-2 E protein is identical to that of the Pangolin CoV MP798 isolate and the Bat CoV isolates CoVZXC21, CoVZC45 and RaTG13 (Table 1). Prediction of the transmembrane helices and topology is difficult in such a short protein, and therefore the internal and external portions cannot be assigned unambiguously. Experiments have not definitively clarified this point (Schoeman & Fielding, 2019).
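Once two homologous sequences have been aligned, substitutions and deletions such as those examined in this comparison can be listed position by position. The sketch below illustrates one way to do this under stated assumptions: the function name is hypothetical, gaps are assumed to be written as "-", positions are numbered along the reference sequence starting from 1, and the two toy sequences are invented and do not correspond to any real E protein.

```python
def aligned_differences(reference, query):
    """List the differences between two aligned sequences of equal length.

    Returns tuples (reference_position, reference_residue, query_residue, kind).
    Positions are 1-based and count only non-gap reference residues; for an
    insertion, the position is that of the last reference residue before it.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_pos = 0
    for ref_res, qry_res in zip(reference, query):
        if ref_res != "-":
            ref_pos += 1
        if ref_res == qry_res:
            continue
        if qry_res == "-":
            kind = "deletion"
        elif ref_res == "-":
            kind = "insertion"
        else:
            kind = "substitution"
        differences.append((ref_pos, ref_res, qry_res, kind))
    return differences


# Toy aligned pair (hypothetical): one substitution followed by one deletion.
toy_reference = "MKLVEGSA"
toy_query = "MKLVR-SA"

for position, ref_res, qry_res, kind in aligned_differences(toy_reference, toy_query):
    print(position, ref_res, qry_res, kind)
```

Numbering along the reference keeps the reported positions comparable with those of the reference protein even when the alignment contains gaps.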

Membrane glycoprotein
The GenBank search confirms that, similarly to the E protein, the M glycoprotein is generally conserved across β-CoVs and especially across SARS CoVs. However, this protein appears more variable with respect to the SARS CoV proteins than the Envelope (Figure 3). The multiple sequence alignment indicates that there is a remarkable similarity among the Sars-CoV-2 sequences and those from the same Bat and [...]

The three-dimensional model for the membrane protein was taken from the I-Tasser server (code QHD43419), since other methods failed to find any suitable template. However, it should be mentioned that HHpred found a weak local affinity, well below the statistical significance level, to 4N31, a peptidase-like protein from Streptococcus pyogenes essential for pilus polymerisation. Mapping of the relevant sites onto the three-dimensional model is displayed in Figure 4. According to the transmembrane helix topology predictions, the N-terminal portion is located outside the virus particle while the C-terminal portion is inside (Figure 4). As this model has been predicted by ab-initio techniques, it should be considered with great caution and only as a low-resolution approximation of the real structure.
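The relative variability of two protein families, as compared at the start of this section, can be quantified per alignment column. The sketch below computes a column percentage of identity, assumed here to be the share of sequences carrying the most common residue in each column, with gaps counted in the denominator; other definitions exist and may give different values. The function name and the toy alignment are hypothetical and are not taken from the alignments in Figures 1 and 3.

```python
from collections import Counter


def column_identity(alignment):
    """Return, for each column, the percentage of sequences that carry
    the most common non-gap residue of that column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    identities = []
    for column in zip(*alignment):
        residues = [res for res in column if res != "-"]
        if not residues:
            # A column made only of gaps carries no identity information.
            identities.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Gaps stay in the denominator, so gapped columns score lower.
        identities.append(100.0 * top_count / len(column))
    return identities


# Toy alignment (hypothetical): four sequences, five columns.
toy_alignment = ["MADSN", "MADTN", "MS-TN", "MADTN"]

per_column = column_identity(toy_alignment)
mean_identity = sum(per_column) / len(per_column)
print(per_column, mean_identity)
```

Averaging the per-column values over each alignment gives a single figure per protein, which offers a rough way to compare the conservation of two proteins across the same set of viruses.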

Discussion
Previous studies highlighted that the E and M proteins could be important for viral entry, replication and particle assembly within human cells (J Alsaadi & Jones, 2019; Schoeman & Fielding, 2019).
According to the most accepted theories, the current COVID-19 pandemic has been caused by the cross-species transmission to humans of a Coronavirus normally hosted by Bats and, perhaps, Pangolins (Lu et al., 2020). In this paper, we have [...] It has been demonstrated that the M glycoprotein is more prevalent within the virus membrane and it is deemed to be important for the budding process of Coronaviruses. Indeed, during the process of virus particle assembly, this protein interacts with the Nucleocapsid, the Envelope, the Spike and the Membrane glycoprotein itself (J Alsaadi & Jones, 2019). Moreover, in Alphacoronaviruses it has been demonstrated that this protein cooperates with the Spike during cell attachment and entry (Naskalska et al., 2019). Therefore, mutations occurring in the N-terminal region, which is exposed on the virus surface, could play a key role in the interaction with the host cell.
In conclusion, with these analyses we have investigated the potential epizootic origin of the SARS-[...]

Figure 1
Multiple sequence alignment among Sars-CoV-2 envelope proteins and a selection of the most similar homologous proteins. The single Sars-CoV-2 sequence is identical to all the isolates. The variant sequence is reported separately. The Pangolin sequence is the representative of the identical sequences listed in Table 1. Red lines indicate the variant sites discussed in the text. Alignment blue hue is proportional to column percentage of identity.

Figure 2
Three-dimensional model of the viroporin-like tetrameric assembly of the envelope protein from Sars-CoV-2 represented as cartoon model. Residues corresponding to the mutated sites indicated in Figure 1 are displayed as transparent space filling spheres and labelled. The C-terminal segments are reported for completeness even though they have no conformational meaning for lack of a corresponding segment in the structural template.

Figure 3
Multiple sequence alignment among Sars-CoV-2 Membrane glycoproteins and a selection of the most similar homologous proteins. The single Sars-CoV-2 sequence is identical to all the isolates. The Sars-CoV-2 variant sequences are reported separately. The red box indicates the variant sites at the N-terminus discussed in the text. Red bars under the multiple alignment mark the consensus prediction of transmembrane helices. The location of the connecting loops with respect to the virion surface is indicated as "in" or "out". Alignment blue hue is proportional to column percentage of identity.

Figure 4
I-Tasser model of the Membrane glycoprotein represented as cartoon model. Relevant residues are displayed as transparent space filling spheres and labelled.