In Silico Analysis of High-Risk Missense Variants in Human ACE2 Gene and Susceptibility to SARS-CoV-2 Infection

SARS-CoV-2 coronavirus uses for entry to human host cells a SARS-CoV receptor of the angiotensin-converting enzyme (ACE2) that catalyzes the conversion of angiotensin II into angiotensin (1-7). To understand the effect of ACE2 missense variants on protein structure, stability, and function, various bioinformatics tools were used including SIFT, PANTHER, PROVEAN, PolyPhen2.0, I. Mutant Suite, MUpro, SWISS-MODEL, Project HOPE, ModPred, QMEAN, ConSurf, and STRING. All twelve ACE2 nsSNPs were analyzed. Six ACE2 high-risk pathogenic nsSNPs (D427Y, R514G, R708W, R710C, R716C, and R768W) were found to be the most damaging by at least six software tools (cumulative score between 6 and 7) and exert deleterious effect on the ACE2 protein structure and likely function. Additionally, they revealed high conservation, less stability, and having a role in posttranslation modifications such a proteolytic cleavage or ADP-ribosylation. This in silico analysis provides information about functional nucleotide variants that have an impact on the ACE2 protein structure and function and therefore susceptibility to SARS-CoV-2.


Introduction
During December 2019 in Wuhan, China, a novel infectious disease called the coronavirus disease 2019 or COVID-19 was detected and found to be caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Later on, it was declared as a pandemic by the World Health Organization (WHO) on March 2020 [1,2]. The most common symptoms were fever, cough, shortness of breath, sputum production, and fatigue [3]. SARS-CoV-2 infects the respiratory tract and engenders acute respiratory disease [3]. SARS-CoV-2 utilizes angiotensin-converting enzyme 2 (ACE2) as an entry receptor and the cellular serine protease TMPRSS2 for SARS-CoV-2 spike (S) protein priming [4].
The ACE2 gene is located on the chromosome X, specifically at the position Xp22. It contains 18 exons that show a striking resemblance in exons size and organization to those of the ACE gene [5,6]. The complete cDNA sequence of ACE2 encodes an 805 amino acid protein, which is composed mainly of an N-terminal signal peptide (17 amino acid residues), a peptidase domain (positioning from 19-615 amino acids), and a C-terminal collectrin, which plays a significant role as a regulator of renal amino acid transportation, insulin exocytosis, and β-cell proliferation [5,7].
ACE2 is expressed mostly in the vascular endothelial cells of the kidney and constitutes a vital site of blood pressure regulation in the renin-angiotensin system (RAS) and heart function. In the heart, ACE2 constitutes the first pathway for the metabolism of angiotensin II and is the most essential factor for progressive cardiac disease [8][9][10]. Interestingly, it seems that ACE2 has diverse biological functions, such as regulation of blood pressure by the renin-angiotensin aldosterone system (RAAS), and metabolizing angiotensin II (Ang II) to Ang (1-7), a biological active peptide that binds to the Mas receptor to exert many beneficial vasodilatory, antifibrotic, antithrombotic, and antiproliferative actions [11,12]. Very recently, it remains possible that ACE2 gene polymorphism may influence both the susceptibility to SARS-CoV-2 infection and COVID-19 disease outcome [5]. There are a few reports related to computational analysis of missense variants on the ACE2 gene. Hence, the present study is aimed at identifying deleterious variants in the ACE2 gene by using various bioinformatics tools to extrapolate the possible associations of ACE2 polymorphisms with COVID-19 susceptibility.

Prediction of Deleterious nsSNPs of ACE2 Gene.
To predict the deleterious effect of nonsynonymous SNPs (nsSNPs) of the human ACE2 gene on protein function, four tools were used, namely, SIFT, PolyPhen 2.0, PANTHER, and PRO-VEAN. SIFT (Sorting Intolerant from Tolerant) (https://sift .bii.a-star.edu.sg/) is a program that employs sequence homology to predict the impact of an AA substitution on protein function. It classifies AA substitutions with a score less than 0.05 as deleterious [13]. PolyPhen 2.0 (Polymorphism Phenotyping v2), (http://genetics.bwh.harvard.edu/ pph2/) evaluates the impact of an AA substitution on the protein function and structure. It classifies the substitution as probably damaging (score = 1:0), and possibly damaging or benign (score = 0:0) [14]. PANTHER cSNP (Protein ANalysis THrough Evolutionary Relationship_coding_ SNP) (http://pantherdb.org/tools/csnpScoreForm.jsp) calculates the likelihood of a single AA change on protein function, and it is based on the PANTHER-PSEP (Posi-tion_Specific Evolutionary Preservation) method [15]. PROVEAN (Protein Variation Effect Analyzer) (http:// provean.jcvi.org/index.php) is a web server that was used to predict the impact of AA substitutions on the biological function of a protein [16].
2.3. Effect on the Stability of Protein. The stability of the protein was checked using the MUpro bioinformatics tool and I. Mutant Suite. MUpro (http://mupro.proteomics.ics.uci.edu/) predicts if a mutation increases or decreases the stability of protein structure [17]. I. Mutant Suite, a support vector machine-based algorithm, is available at (http://gpcr2 .biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0 .cgi). It predicts the mutant protein stability starting from protein sequence alone [18].

Phylogenetic Conservational Analysis of ACE2.
Conservation prediction of the ACE2 protein sequence was analyzed with ConSurf (https://consurf.tau.ac.il/), a web server used for identifying functional regions in proteins by analysing the evolutionary dynamics of AA substitutions among homologous sequences. The conservation score of 1-3 is variable, 5-6 as an intermediate scale, and 7-9 as a highly conserved AA positions [19]. 2.6. Statistical Analysis and Cumulative Score Calculation of Pathogenic nsSNPs. In order to predict the high-risk pathogenic nsSNPs of human ACE2, the cumulative score for all software tools (SIFT, PolyPhen2.0, PROVEAN, PANTHER, MUpro, I. Mutant, and ModPred) was calculated by using the Sum function in Excel [21]. Then, we set a restricted cumulative score value; when the result of the seven software tools were combined, the amino acid substitution that was evaluated to be deleterious by at least 6 tools would be identified as ACE2 high-risk pathogenic nsSNPs. For the final, correlation analysis was performed using SPSS v19 software. ANOVA and Student t-tests were applied to compare the predictions of the different tools. A p value less than 0.05 was significant [22].

2.7.
Modelling. The 3D structure of full-length ACE2 protein was retrieved from The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB-PDB, http://rcsb .org), (ID 6M17) with a resolution of 2.9 Å. The 3D structure of the wild-type and mutant protein was generated using SWISS-MODEL (https://swissmodel.expasy.org/), and the quality of the model was checked using the Qualitative Model Energy Analysis (QMEAN) server (https://swissmodel .expasy.org/qmean/). SWISS-MODEL is a fully automated program that was used to predict the 3D structure of proteins. It generates 3D models by using homology modelling techniques [23]. The FASTA AA sequence of ACE2 protein was an input for the SWISS-MODEL. The predicted model of ACE2 from SWISS-MODEL was an input for the QMEAN analysis. QMEAN is a server that provides access to three scoring functions (QMEAN [24], QMEANBrane [25], and QMEANDisco) [26]. It estimates the quality of protein structure models in protein structure prediction [27]. PyMol is an open source program used for the three-dimensional visualization of macromolecules including proteins, nucleic acids, and small molecules [28]. The align command in PyMol is used to superpose two or more protein structures, and the superposition was evaluated based on RMSD (root mean square deviation) calculation [29].

Prediction of Structural Effect of Point Mutation on ACE2.
Project HOPE (Have (y) Our Protein Explained) is a next-generation web server available at "https://www3.cmbi .umcn.nl/hope/" that is used to analyze the single point mutations. HOPE collects information from data sources and produces a mutation report enriched with figures that illustrates the effects of the mutation [30].

Prediction of Protein
Interactions. STRING (Search Tool for Recurring Instances of Neighbouring Genes) is a web server available at "https://string-db.org/" that provides a platform for searching functional associations between proteins [31].

SNP Dataset.
In this study, 242 missense variants were collected from Ensembl and gnomAD A3 databases. SIFT, PolyPhen 2.0, PROVEAN, and PANTHER algorithms were used to predict the functional effects of mutation on the protein. MUpro and I. Mutant Suite tools were used to identify the mutation effects on protein stability. HOPE, ConSurf, ModPred, SWISS-MODEL, and STRING were used to predict the mutation effects on protein structure, function, and protein-protein interactions. In total, 13 different bioinformatics programs and web servers were used to assess the mutation effects on the variants in this investigation, because the prediction of the effect of a mutation using one algorithm is not sufficient for assessing that mutation effect.

Effect on the Stability of Protein. MUpro and I. Mutant
Suite were used to predict change in protein stability. The  , and R768W) were recorded as decreasing the stability of the ACE2 protein, with a reliability index value ranged between 0 and 9, while the mutations (R219C, M383T, P389H, and L731F) were predicted to increase the stability of the ACE2 protein ( Table 2). The result of stability predicted by MUpro showed that all the twelve nsSNPs (R219C, R219H, M383T, P389H, D427Y, R514G, R708W, R710H, R710C, R716C, L731F, and R768W) have decreased the stability of the ACE2 protein (Table 3).

Phylogenetic Conservation.
According to ConSurf analysis, R514, R708, R710, and R768 are highly conserved residues with conservation score equal to 9 and also predicted to be exposed and functional. M389 and L731 are slightly conserved residues and buried. R219, P389, and D427 are variable residues and exposed, and finally one residue (R716) is average (conservation score equal to 5). Results of ConSurf prediction of ACE2 SNPs are summarized in Figure 1.
Among these variants, six nsSNPs (D427Y, R514G, R708W, R710C, and R768W) are newly evaluated as ACE2 high-risk pathogenic nsSNPs and were selected for further investigation. The prediction from SIFT, PROVEAN, PolyPhen 2.0, PANTHER, ModPred, I. Mutant, and MUpro were shown to be significant with a p value equal to 6.1308E-29 of ANOVA test and correlated (Supplementary figure 1). Student t-test results between the software tools were significant with a p value less than 0.0001, suggesting that the selected algorithms are accurate enough to evaluate the pathogenicity of these nsSNPs.

3D
Modelling and Biophysical Validation of ACE2. The crystal structure of the ACE2 protein was retrieved from the Protein Data Bank (PDB ID: 6 M17, resolution at 2.9 Å) in complex with SARS-CoV-2 receptor-binding domain and sodium-dependent neutral amino acid transporter B(0)AT1. The structure was used as a template for the comparative modelling of mutant structures through SWISS-MODEL by submitting the FASTA AA sequence of ACE2 (percent identity equal to 100). The QMEAN server was used to predict the quality of the models, where the global QMEAN scores were 0:83 ± 0:05. This indicated that the predicted models were of good quality ( Table 6). The six tested mutants, namely, D427Y, R514G, R708W, R710C, R716C, and R768W, were subjected to PyMol using "Align command" and compared to the native structure with regard to conformational variations by calculating the RMSD (root mean square deviation) ( Table 7 and Figure 2). Project HOPE was used to identify the structural effects of mutations of interest. Results of project-HOPE of ACE2 SNPs were displayed in Table 8.

Discussion
Angiotensin-converting enzyme 2 (ACE2) plays an important role in the renin-angiotensin aldosterone system by metabolizing angiotensin II to angiotensin (1-7) [11]. This important enzyme was identified as a functional receptor for the severe acute respiratory syndrome coronavirus (SARS-CoV) and the novel coronavirus (SARS-CoV-2) [11]. Some studies suggest that the receptor-binding domains of SARS-CoV-2-S and SARS-CoV-S bind with identical affinities to ACE2 [25]. It has been shown that ACE2 genomic variants may play a key role in susceptibilities to COVID-19 [33]. The SARS-CoV-2 starts its infection by binding to ACE2 via its receptor-binding domain (RDB) [ -A predicted functional residue (highly conserved and exposed).
-A predicted structural residue (highly conserved and buried).
-Insufficient data -the calculation for this site was performed on less than 10% of the sequences.  been reported that the SARS-CoV-2 uses two mechanisms of host cell entry: the first mechanism on the ACE2 mediated virus endocytosis by using the clathrin-and caveolaedependent pathways. The second one is dependent on the transmembrane serine protease 2-(TMPRSS2-) mediated membrane fusion [34].
A 2020 study by Fahd Al-Mulla et al. has shown that the arginine residues at 708 and 716 positions play an important role in ACE2 cleavage by TMPRSS2 and TMPRSS11D, whereas mutations on R708 and R716 seem to reduce directly ACE2 cleavage by TMPRSS2 [32]. Another line of research (Hossein Lanjanian et al. 2021) suggests that the arginine residue at 710 position of the ACE2 receptor plays a crucial role in mediating ACE2-TMPRSS2 interaction by involving hydrogens bonds [38]. A second study achieved by Behrooz Darbani (2020) has identified the rs1316056737 (R514G) as an interaction inhibitor variant that might impact the interaction between the ACE2 receptor and the viral spike protein [39].  CBeta interaction energy: distance-dependent potential using CBeta atoms as interaction center; All-atom pairwise energy: assessment of long-range interactions; Solvation energy: description of the burial status of the residues; Torsion energy: analysis of the local backbone geometry [32]. In the current study, various algorithms, namely, SIFT, PolyPhen 2.0, PROVEAN, PANTHER, I. Mutant Suite, MUpro, and ConSurf, were used to identify the most deleterious nsSNPs of the ACE2 gene. In our in silico analysis, we identified six nsSNPs (D427Y, R514G, R708W, R710C, R716C, and R768W) from twelve nsSNPs. The six nsSNPs were predicted to be deleterious by at least six algorithms, decreased the stability of ACE2 by both MUpro and I. Mutant, and located in conserved regions with a score conservation of 9, expected D427Y, which may affect the structure and function of ACE2 protein.
ModPred was used to determine the posttranslational modification (PTM) sites of ACE2 identified R710 and R716 as PTM sites for proteolytic cleavage and R708 for ADP-ribosylation and proteolytic cleavage. Consequently, mutations at R708, R710, and R716 might affect PTM of the ACE2 gene.
The structural deviations between the native and modelled mutants were analyzed using the RMSD by measuring the average distance between the atoms of the superimposed proteins [40]. It has also been shown that RMSD values greater than 0.15 were evaluated to be significant and would have an impact on protein function and structure [40]. Therefore, the six models R427Y, R514, R708W, R710C, R716C, and R768W showed lower RMSD values, i.e., 0.010, 0.112, 0.112, 0.112, 0.112, and 0.010, respectively, which indicate minimal structural dissimilarity between the native and mutant models of ACE2.
According to HOPE, the AA substitution (R514G) introduces a glycine (flexible) and could disturb the required rigidity of the ACE2 protein and could affect the binding site where the mutation being located, while the mutation R716C is located in a region essential for cleavage by TMPRSS11D and TMPRSS2 proteases, and the difference in AA properties can disturb this region and its function.
Furthermore, by using the STRING server, ACE2 protein interacts with ten various proteins, namely, AGTR1, AGTR2, PRCP, REN, AGT, MME, DPP4, MEP1A, MEP1B, and XPNPEP2. The majority of those proteins play a crucial role in the regulation of blood pressure and kidney pathways, suggesting the association of ACE2 in kidney diseases. All the ten various proteins had a score ranging from 0.991 to 0.858, suggesting that proteins are partially biologically connected with an PPI enrichment p value equal to 1.29E-09. The proteins AGT, REN, AGTR1, and AGTR2 were found to have an important role in RAS signalling pathways such as regulation of blood pressure, body fluid and electrolyte homeostasis, and sodium retention by the kidney. The interaction (i) The WT is predicted to be located in its preferred secondary structure, a turn the mutant prefers to be in another secondary structure, therefore the local conformation will be slightly destabilized (ii) Mutation of the WT into none has the following effect SPD-PSN: slightly inhibits interaction with SARS-CoV spike gp (iii) The mutation is possibly damaging to the protein R514G (i) The mutant is smaller than the WT (ii) The mutation is located within a stretch of residues annotated in UniProt as a special region: essential of cleavage by TMPRSS11D and TMPRSS2 (iii) The difference in amino acid properties can disturb this region and its function (i) The WT is very conserved. In some rare cases, the mutation might occur without damaging the protein (ii) The mutant is bigger than the WT; this might lead to bumps (iii) The mutation introduces a more hydrophobic residue at this position; this can result in loss of hydrogen bonds and or disturb correct folding 8 BioMed Research International analysis using the STRING server suggested that the ACE2 protein not only participate in hypertension and cardiovascular diseases but also a candidate in COVID-19 infection. However, these findings showed that D427Y, R514G, R708W, R710C, R716C, and R768W, are the structurally and functionally most significant variants in the human ACE2 gene. (Table 7). It has been found that the residues ARG 708, 710, and 716, are located in the dimeric interface of human ACE2, and they are found to be important for its cleavage by the protease TMPRSS2, and this processing is indispensable for augmentation of SARS-S-driven entry into host cells [9]. The limitation of this in silico analysis is that the six deleterious variants should be confirmed with future extensive experiments and clinical wet lab approaches to figure out the mechanism of these mutations in susceptibility to COVID-19.

Conclusion
Our present in silico analysis of nsSNPs of the human ACE2 gene concluded that the mutations D427Y, R514G, R708W, R710C, R710H, and R716C are the most deleterious nsSNPs among the reported ACE2 gene variants. All six nsSNPs were predicted to be damaging and also affecting conserved positions, protein stability, and the posttranslational modification sites. Therefore, these mutations would likely affect the function on ACE2 with regard to host susceptibility to SARS-CoV-2 infection. Hence, the results of this study confirm previous findings and may be helpful for further understanding the role of these six ACE2 nsSNPs in susceptibility to COVID-19.

Data Availability
All data used in this study are included within the article and supplementary information file.

Conflicts of Interest
The authors declare that they have no conflicts of interest.