Probing the Peculiarity of EhRabX10, a pseudoRab GTPase, from the Enteric Parasite Entamoeba histolytica through In Silico Modeling and Docking Studies

Entamoeba histolytica (Eh) is a pathogenic eukaryote that often resides silently in humans during asymptomatic stages. Upon an indeterminate stimulus, it develops into fulminant amoebiasis that causes severe hepatic abscesses with 50% mortality. This neglected tropical pathogen relies heavily on membrane modulation to flourish and cause disease; these modulations range from phagocytosis for food acquisition to the complex mechanism of trogocytosis for tissue invasion. Rab GTPases form the largest branch of the Ras-like small GTPases, with a diverse set of roles across the eukaryotic kingdom. They are vital for orchestrating membrane transport and the secretory pathway that carries pathogenic effectors, such as the cysteine proteases (EhCPs) that help in tissue invasion. Rab GTPases thus play a crucial role in executing the cytolytic effect of E. histolytica: first, they interact with the Gal/GalNAc lectins required for adhering to host cells, and then they assist in the secretion of EhCPs. Additionally, amoebic Rab GTPases are vital for encystation because substantial vesicular trafficking is required to create dormant amoebic cysts. These cysts are the infective agent and help to spread the disease. The absence of a "bona fide" vesicular transport machinery in Eh and the existence of a diverse repertoire of amoebic Rab GTPases (EhRab) hint at their contribution in supporting this atypical machinery. Here, we provide insights into a pseudoRab GTPase, EhRabX10, through physicochemical analysis, predictive 3D structure modeling, protein-protein interaction studies, and in silico molecular docking. Our group is the first to classify EhRabX10 as a pseudoRab GTPase with four nonconserved G-motifs. It possesses the basic fold of the P-loop containing nucleotide hydrolases. Through this in silico study, we provide an introduction to the characterization of the atypical EhRabX10 and set the stage for future explorations into the mechanisms of nucleotide recognition, binding, and hydrolysis employed by the pseudoEhRab GTPase family.


Introduction
Membrane dynamics are indispensable not only for the transport of molecules through the parasite and its environment but also for the host-parasite relationship. They come into play at the first point of contact with the host and eventually support the creation of multiple copies of the parasite inside the host [1,2]. Phagocytosis is a well-known phenomenon of membrane dynamics observed universally, from single-celled amoebas to the complex defences of the human body. Entamoeba histolytica, a deadly tropical pathogen, adopts not one but two membrane modulation phenomena, namely, phagocytosis and trogocytosis, to create a severe prognosis of invasive amoebic colitis [3].
This seemingly simple organism takes more than 55,000 human lives each year and is the leading cause of death by diarrhoea in children below 5 years of age in low-income countries. It often lurks silently inside the human body until, under the right stimulus, it manifests as severe amoebic colitis with a mortality rate of more than 50%. This makes it imperative to understand more about the pathogenesis of E. histolytica, focusing on the attachment, invasion, and cytolytic abilities of the pathogen; all these processes are governed by vesicular trafficking [4,5].
The various steps of membrane dynamics in a cell are coordinated by small GTPases, which are essentially GTP/GDP molecular switches. According to their sequence, structure, and reactivity to botulinum toxin C3, they are divided into five families and subsequent subfamilies. Rab is the largest subfamily of the Ras-like GTPase superfamily. Different Rab GTPases are localized in different cellular compartments and orchestrate vesicular trafficking seamlessly [3,6,7,8]. The genome of the pathogenic amoeba E. histolytica has 102 annotated genes encoding 91 Rab GTPases, the highest number documented so far among the available genomes of the eukaryotic kingdom. Some of these are close homologues of classical GTPases, whereas others are largely divergent. The vigorous and atypical endocytosis machinery observed in E. histolytica (Eh) is consistent with the existence of a wide variety of Rab GTPases required to regulate this delicate machinery [9,10].
The G-domain in the Rab GTPases facilitates shuttling between the GTP- and GDP-bound states. It belongs to the most common protein fold in the natural world, the P-loop containing fold family of nucleotide hydrolases [11]. The five motifs of this universal fold are designated G1 to G5 and have highly conserved sequences that are pivotal to the functioning of the GTPase: GDSGVGKS (G1/P-loop), T (G2/switch I), DTAGQ (G3/switch II), GNKCDL (G4), and SAK (G5). Previous studies have reported that 52 out of the 91 EhRab enzymes have more than 40% identity to human, yeast, and amoebic homologues. However, the remaining 39 are peculiar in their G-domain sequences and do not show substantial identity to other eukaryotic small GTPases. Thus, a separate group was created for these 39 Rab GTPases. Three of these were previously discovered (EhRabA, EhRabB, and EhRabH) and 36 were newly identified proteins (EhRabX members) [12,13]. Surprisingly, only nine amoebic GTPases have been characterized to date and even fewer are well understood [14][15][16][17][18]. It is therefore important to investigate this huge repertoire of small molecular switches in amoebic protozoans, which underpin the complex vesicular arrangements seen in E. histolytica and probably aid in making it a notorious pathogen [19]. It was initially believed that E. histolytica lacks an endoplasmic reticulum and Golgi system; however, more recent studies have provided proof of the presence of an ER-Golgi system, albeit not of the conventional type. The ER is distributed uniformly in the amoebic cytosol and is not clustered near the nucleus. Another unexpected observation was the continuous synthesis and movement of new proteins through the cell even after the collapse of the Golgi. This indicates the presence of an atypical membrane transport system involving the diversified collection of Rab GTPases in the pathogenic amoeba [9,20,21].
An understanding of vesicular dynamics can have a direct impact on creating therapeutics to control amoebiasis, which is currently managed only by the nitroimidazoles, drugs that carry a barrage of toxic side effects [4]. One such application is described in a recent paper reporting that proanthocyanidins, via manipulation of the multivesicular body (MVB) pathway, act as effective antitrypanosomal compounds displaying low toxicity in humans [22]. Taking the lead from here, we decided to characterize the grossly underrepresented master membrane regulators, the amoebic Rab GTPases. We selected a greatly divergent protein from the atypical EhRabX family, EhRabX10, and subjected it to computational analysis comprising physicochemical characterization, protein structure modeling, molecular docking, and protein-protein interaction (PPI) analysis, to explore how this atypical GTPase might function in the vesicular transport machinery of pathogenic E. histolytica.

Physicochemical Characterization. The physicochemical properties of the target protein sequence were assessed in terms of molecular weight, isoelectric point (pI), extinction coefficient, instability index, aliphatic index, and the GRAVY (grand average of hydropathicity) index using the Expasy ProtParam tool (https://web.expasy.org/protparam/) [23].
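Three of the indices listed above follow simple closed-form definitions, and the sketch below illustrates them. It is not the ProtParam implementation: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY, the usual aliphatic index weights (2.9 for Val, 3.9 for Ile and Leu, applied to mole percentages), and the commonly used 280 nm coefficients of 5500 (Trp), 1490 (Tyr), and 125 (cystine) M^-1 cm^-1. The function names are hypothetical.

```python
# Illustrative sketch of three sequence-derived indices (not the ProtParam code).
# Assumption: standard Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient_280(seq, cystines=0):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the number of disulphide-bonded Cys pairs; it defaults
    to 0, i.e. all cysteines are assumed to be reduced.
    """
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines
```

A negative value returned by `gravy` indicates an overall hydrophilic sequence, and a higher `aliphatic_index` is usually read as an indicator of greater thermostability.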

Sequence Alignment and Conserved Motif Identification.
The Clustal W 2.0 [24] tool at the NPS server was used to obtain multiple alignments of the EhRabX10 sequence with the other proteins of relevance. The identification of conserved domains was done through the NCBI Conserved Domain Database [25].

Structure Modeling and Validation. The information regarding the structure of EHI_096440 is not available in the RCSB PDB database [26]. Thus, we utilized homology modeling to generate a protein structure of EHI_096440 (EhRabX10). Protein templates for in silico modeling of EhRabX10 were obtained by running a BLAST search through the PDB, NCBI, I-TASSER, and SWISS-MODEL databases [26][27][28]. The sequence similarity and query coverage cut-offs were 30% and 70%, respectively. The normalised Z-score (template-target alignment score) and global model quality estimate (GMQE) cut-offs were 1.0 and 0.5, respectively. The templates above the cut-off values were used for homology modeling of EhRabX10 through SWISS-MODEL via the Expasy web server [29]. The homology models were then analyzed for their quality via the SWISS-MODEL server. Further quality analysis was done by assessing the structural deviation (RMSD) between the template and the model using the PyMOL Molecular Graphics System (http://www.pymol.org). The final selected model of EhRabX10 was subjected to validation of its structural quality using the Ramachandran plot [30] through the PDBSUM server (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) [31,32].
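For reference, the RMSD used above is the square root of the mean squared distance between paired atoms. The following is a minimal sketch of that formula only. It assumes that the two coordinate sets have already been superposed and that atoms are paired in the same order (PyMOL performs the superposition and pairing itself, which this sketch does not attempt). The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already superposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

With coordinates in Å, the returned value is also in Å; a value close to zero means the model backbone closely follows the template.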

Protein-Protein Interaction Study.
To identify the interacting partners of our protein of interest, EhRabX10 (EHI_096440), we used STRING version 11.0 (http://string-db.org/). It generated a network view of the interacting proteins.
Note on template selection: the normalised Z-score cut-off was 1.0 (the higher, the better). PDB structures with a Z-score > 3.0 were selected as templates; these templates had the best structural and sequence alignment based on this score.

BioMed Research International
These interacting proteins were displayed as nodes, with the node edges indicating the confidence scores. The confidence scores were generated by integrative STRING algorithms that collect and compute information from seven evidence parameters; these are experiments, text mining, cooccurrence, gene fusion, coexpression, conserved neighbourhood, and databases. The multiple coloured lines in the network map represent the seven evidence parameters supporting the interaction [33,34].
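To illustrate how several evidence channels can be merged into one confidence value, the sketch below combines per-channel scores probabilistically. It is a simplified illustration and not the STRING code: it assumes the channels are treated as independent, that a prior of 0.041 is removed from each channel before combining and added back once afterwards, and it omits the homology correction that STRING applies to some channels. The function name and the example scores are hypothetical.

```python
# Assumption: prior probability of interaction used when combining channels.
PRIOR = 0.041


def combined_score(channel_scores, prior=PRIOR):
    """Combine per-channel confidence scores (each between 0 and 1).

    Each score has the prior removed, the channels are combined as
    independent evidence (1 - product of (1 - s)), and the prior is
    added back once at the end.
    """
    product = 1.0
    for score in channel_scores:
        score = max(score, prior)            # scores below the prior carry no evidence
        without_prior = (score - prior) / (1.0 - prior)
        product *= 1.0 - without_prior
    return (1.0 - product) * (1.0 - prior) + prior


# Hypothetical channel scores for one interaction (not taken from the study).
example = combined_score([0.60, 0.45, 0.30])
is_high_confidence = example > 0.70
```

Under this scheme a single channel returns its own score unchanged, and every additional supporting channel can only raise the combined value.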
2.6. Docking Studies. The proteins with high confidence interactions were subjected to molecular docking via the ClusPro server 2.0 (https://cluspro.org) [35]. None of the interacting partners of EhRabX10 had their three-dimensional structure defined in the databases. Since only the PDB format is accepted as input in ClusPro, we subjected the interacting proteins to homology modeling via the SWISS-MODEL server. We performed the docking of EhRabX10 with the 3D structures of the interacting partners and the algorithm generated eighty docked sets. These docked sets were based on the ClusPro PIPER algorithm energy terms: hydrophobic and electrostatic interactions, Van der Waals forces, and interatomic charges. The low-energy docked structures were

Molecular Modeling of EhRabX10 (EHI_096440), a Protein from Entamoeba histolytica. To better understand the relevance of EHI_096440, a structure was needed, and the absence of a three-dimensional X-ray structure of EHI_096440 in the PDB database necessitated homology modeling of the protein.
A thorough search for ideal templates was done via the servers of NCBI, PDB, I-TASSER, and SWISS-MODEL. The templates were selected based on the highest similarity to the sequence and the structural folds (secondary protein structures), predicted model quality, and percent query coverage (QC) (Figure 1(b)). The following templates scored higher than the cut-off values as mentioned in Section 2.4 and were selected for homology modeling: Rab GTPase Sec4p of Candida albicans (PDB ID: 6O62) [40], Ras-related protein Rab39B of Homo sapiens (PDB ID: 6S5F) [41], GTPase Kras isoform 2B of Homo sapiens (PDB ID: 4DSU) [42], GTPase Kras isoform 2B of Homo sapiens (PDB ID: 4DST) [42], and GTP-binding protein YPT7P of Saccharomyces cerevisiae (PDB ID: 1KY3) [43]. Homology models, based on all the above templates, were subjected to further scrutiny by assessing the QMEAN, GMQE (global model quality estimate), and RMSD (root mean square deviation) values (Table 1). Models with QMEAN scores closer to zero are of better quality, and models with QMEAN scores below −4.0 are rejected. Additionally, models with higher GMQE scores are considered of higher quality. Furthermore, a good model has a very low (closer to zero) root mean square deviation (RMSD) from the template.
Here, we observed the highest GMQE score (0.61) and the lowest RMSD value (0.124 Å) for the model built on the template Rab GTPase Sec4p of Candida albicans (PDB ID: 6O62). The query coverage (QC) was good (86%), covering all the crucial G-motifs of the G-domain and excluding only the terminal residues (1-8 and 174-188) of the EhRabX10 sequence. The model also had a good QMEAN score of −1.51, which is close to zero (Table 1). All these factors indicate that the 6O62-based EhRabX10 model is the most credible [44]. This model, of size 18.15 kDa, was selected as final, and all further assessments were done using it (Figures 1(c)-1(e)). The structural quality of the EhRabX10 model was analyzed through the Ramachandran plot (Figure 2(a)). The plot showed that 90.8% of the residues of the EhRabX10 structure lay in the most-favoured regions, 7.9% in the additionally allowed regions, 1.3% in the generously allowed regions, and none in the disallowed regions, indicating that the modeled structure is of high quality when compared to the stereochemistry of experimentally determined protein structures [30,45].
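The selection logic described above can be summarised in a short sketch. It assumes that a model is acceptable when its GMQE is at least 0.5 and its QMEAN is not below −4.0, and that acceptable models are ranked by GMQE first, with RMSD used only to break ties; this ordering is an assumption made for illustration, since the text considers the criteria jointly. Only the 6O62 values are taken from the text; the other two entries are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Model:
    template: str   # PDB ID of the template used
    gmqe: float     # global model quality estimate (higher is better)
    qmean: float    # QMEAN score (closer to zero is better)
    rmsd: float     # RMSD to the template in angstroms (lower is better)


def acceptable(model, gmqe_min=0.5, qmean_min=-4.0):
    """Apply the GMQE and QMEAN cut-offs."""
    return model.gmqe >= gmqe_min and model.qmean >= qmean_min


def best_model(models):
    """Return the acceptable model with the highest GMQE (ties: lowest RMSD)."""
    candidates = [m for m in models if acceptable(m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (m.gmqe, -m.rmsd))


candidates = [
    Model("6O62", gmqe=0.61, qmean=-1.51, rmsd=0.124),    # values reported in the text
    Model("HYPO_A", gmqe=0.55, qmean=-2.00, rmsd=0.300),  # hypothetical placeholder
    Model("HYPO_B", gmqe=0.70, qmean=-4.50, rmsd=0.200),  # hypothetical, fails QMEAN
]
selected = best_model(candidates)
```

In this example the hypothetical `HYPO_B` entry is rejected by the QMEAN cut-off despite its higher GMQE, which shows why the cut-offs are applied before ranking.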
The topological model generated by PROCHECK (Figure 2(b)) revealed our protein to possess the classical fold of the Rab GTPase family. This fold is composed of a six-stranded β-sheet, with one antiparallel and five parallel strands, interspersed by five distinct α-helices. In agreement with convention, the four G-domain motifs present in EhRabX10 (G1, G3, G4, and G5; the G2 box is absent) are located in the loops connecting the α-helices and β-strands, as can be observed in the modeled structure (Figure 2(b)).

3.3. Interactome. The STRING v11.0 server provided an interactive network (Figure 2(c)) composed of ten predicted interactions with the protein of interest, EhRabX10 (EHI_096440). Each of these interacting proteins had a combined confidence score computed from the individual scores of the seven evidence parameters mentioned in Section 2.5. The combined confidence scores of all the interactions are listed in Table 2. We selected the protein partners with high confidence scores (>70%) for molecular docking studies (Table 3). Since none of the predicted interacting partners had a resolved 3D structure, we resorted to homology modeling, and the partners that yielded the highest quality structures were used for the docking studies in the ClusPro 2.0 server. Among the six high-confidence interactions, the Myb-like DNA-binding domain-containing protein (EHI_000550) was not amenable to predictive modeling; however, the other five produced protein structures of a reliable standard (Table 2). Out of these five, we selected two proteins for further studies: the Rab family GTPase EhRabC8 (EHI_170390) and the syntaxin-binding protein EhSec1 (EHI_093130). These proteins were subjected to docking after a final structural quality check, in which both fared well, with GMQE scores of 0.62 and 0.60 for EHI_093130 and EHI_170390, respectively (Table 3). The functional links of these docked partners of EhRabX10 were supported by the STRING evidence parameters of in vitro experiments, text mining, homology, and coexpression.

Assessment of the Docked Complexes.
The modeled 3D structures of the interacting partners, EHI_170390 and EHI_093130, were fed into the ClusPro 2.0 server as receptor proteins, with EHI_096440 (EhRabX10) as the ligand. The low-energy docked complexes were rendered into four groups of energy coefficients: balanced, electrostatically favoured, hydrophobic favoured, and van der Waals. We assessed the top 10 complexes from each group using PyMOL and supplemented the assessment with quantitative data from the Protein Interaction Calculator (PIC) server for both the receptor proteins (Figures 3(a) and 3(b)). After careful examination of ~80 docked sets, we identified the interacting residues of EHI_096440 to fall mainly in the switch II region, composed of the G3 box (DTQDME), RabF3 box (DISYIT), and RabF4 box (YY), for both the partners, EHI_170390 and EHI_093130 (Figure 3(c)). As established for classical Rab GTPases, the switch II region is involved in interaction with Rab effectors and regulators [11,46]. Thus, we can speculate from the molecular docking data that this region might also serve the same function in EhRabX10.
The interacting residues of EHI_170390 (EhRabC8) are positioned at residue 3 Gln (N-terminus) and residues 45-79 (mainly in the RabF1 and RabF3 regions) (Figure 1(a), Table 4). The interacting residues of EHI_093130 are positioned at the C-terminus (485-521) (Table 5). Notably, we also found residues 41 Phe and 42 Asp of EhRabX10 to participate in hydrophobic and ionic interactions with the partner proteins. Based on the multiple alignment data (Figure 1(a)), these two residues fall in the region annotated as RabF1 in canonical Rab GTPases, although this region has not yet been annotated for EhRabX10. This observation should be explored in future studies.
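As an illustration of how interface residues can be listed from a docked complex, the sketch below reports receptor-ligand residue pairs that have any atoms within a distance cut-off. It is a generic geometric filter and not the PIC server method, which applies separate criteria for each interaction type; the 5 Å cut-off, the function name, and the coordinates in the example are assumptions chosen for illustration.

```python
import math


def interface_residues(receptor_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted (receptor_residue, ligand_residue) pairs in contact.

    Each atom is given as (residue_label, (x, y, z)). Two residues are
    reported as in contact when any pair of their atoms lies within
    `cutoff` angstroms (an illustrative threshold).
    """
    contacts = set()
    for receptor_residue, receptor_xyz in receptor_atoms:
        for ligand_residue, ligand_xyz in ligand_atoms:
            if math.dist(receptor_xyz, ligand_xyz) <= cutoff:
                contacts.add((receptor_residue, ligand_residue))
    return sorted(contacts)


# Hypothetical coordinates; only the residue labels echo the text.
receptor = [("GLN3", (0.0, 0.0, 0.0))]
ligand = [("PHE41", (3.0, 0.0, 0.0)), ("ASP42", (10.0, 0.0, 0.0))]
pairs = interface_residues(receptor, ligand)
```

The double loop is quadratic in the number of atoms, which is acceptable for a single pair of small proteins; a spatial grid or neighbour search would be preferable when screening many docked poses.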

Discussion
Rab family members, though sharing the conventional G-domain of P-loop containing nucleotide hydrolases, are distinguished from the other small GTPases by virtue of the Rab family regions (RabF) flanking the G-domain motifs. Moreover, these regions are bracketed by Rab subfamily (RabSF) regions [46]. The discovery of these regions aided in the identification and classification of new Rab GTPases, among which lies our enzyme of interest, EhRabX10, of the pathogen Entamoeba histolytica [12].
With nothing known about this novel GTPase, we started with a physicochemical characterization based on the amino acid sequence of EhRabX10, in which the instability, aliphatic, and GRAVY indices were calculated. The instability index (II) indicates the stability of a protein in a test tube. EhRabX10 has an II of less than 40.0 and is therefore predicted to be stable in vitro. The aliphatic index (AI) is the relative volume occupied by the aliphatic side chains, and the high value of 76.65 suggests that our protein is thermostable. Apart from being stable, a protein must also be pure for its study and applications, which underlines the importance of knowing its extinction coefficient. This coefficient conveys the amount of light a protein absorbs at a certain wavelength. The extinction coefficient of EhRabX10 was 19160 M⁻¹ cm⁻¹ at 280 nm, measured in water, a value of importance when considering the spectrophotometric analysis of purified protein at 280 nm. The negative GRAVY index of hydropathicity defines our protein as globular and hydrophilic, which endorses the small GTPase annotation of EhRabX10.
However, sequence-based evaluation revealed an interesting contrast: the absence of the prominent G2 box threonine, which is vital for binding the guanine nucleotide and the Mg2+ ion in small GTPases and is the most conserved motif of the GTPase fold [37]. In addition, sequence assessment showed massive divergence in the other conserved motifs of the globular G-domain. Multiple alignment of EhRabX10 with classical Rab GTPases of humans (HRas [37] and HRab5 [38]) and Entamoeba histolytica (Rab5 [47]) and another nonclassical GTPase (EhRabX3 [48]) revealed maximum deviations in the G3, G4, and G5 motifs (Figure 1(a)). The conserved residues of G3 (DxxGQE), G4 (NKxD), and G5 (SAK) are crucial for GTP/GDP state switching and hydrolysis, and such significant alterations in the G-domain might greatly affect the enzyme's functioning [37,49]. We have thus labelled EhRabX10 as a pseudoRab GTPase possessing noncanonical G-motifs.
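The consensus motifs named above can be checked against a sequence with simple pattern matching, as in the sketch below. The G3, G4, and G5 patterns follow the consensus sequences given in the text; the G1 pattern (GxxxxGK followed by S or T) is the widely used P-loop consensus and is an assumption here, as are the function name and the toy sequence, which is synthetic and is not EhRabX10 or any real protein.

```python
import re

# Consensus patterns; "." stands for any residue (the "x" in DxxGQE, NKxD).
G_MOTIFS = {
    "G1": r"G.{4}GK[ST]",   # assumption: generic P-loop consensus
    "G3": r"D..GQE",
    "G4": r"NK.D",
    "G5": r"SAK",
}


def scan_g_motifs(seq):
    """Return {motif: (1-based start, matched text)} or None when absent.

    Only the first (leftmost) match of each pattern is reported.
    """
    hits = {}
    for name, pattern in G_MOTIFS.items():
        match = re.search(pattern, seq)
        hits[name] = (match.start() + 1, match.group()) if match else None
    return hits


# Synthetic toy sequence carrying all four canonical motifs.
toy = "MKVGDSGVGKSLLDTAGQEAANKCDLLSAKT"
toy_hits = scan_g_motifs(toy)

# The G3 box reported for EhRabX10 (DTQDME) does not fit the DxxGQE consensus.
divergent_g3 = re.search(G_MOTIFS["G3"], "DTQDME") is None
```

A scan of this kind only flags the presence or absence of a canonical pattern; locating divergent motifs such as those of EhRabX10 still requires the multiple alignment described above.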
Homology modeling was necessitated by the absence of a crystal structure of EHI_096440 (EhRabX10). It was a rigorous process aimed at creating the best possible predictive 3D model. After iterative examination of several templates and their corresponding models, Rab GTPase Sec4p of Candida albicans (PDB ID: 6O62) was used as the template to model EHI_096440. The crystal structure of the 6O62 protein was itself resolved at a high resolution of 1.88 Å, thus providing a detailed and reliable template for creating our model. Consequently, the predicted structure was of a high standard, as mentioned in Section 3.2; none of the residues lay in the disallowed region of the Ramachandran plot, illustrating the feasibility of the phi and psi dihedral angles of all residues in our structure. Further support for the credibility of the template came from the low structural deviation (RMSD 0.124 Å) between the template and the model. The model was approximately 18.15 kDa in size, which supports its annotation as a small Rab GTPase. The EhRabX10 structure was also aligned with other notable Rab GTPases, among which EhRabX3 was of consequence (Figure 4), being the only documented pseudoRab GTPase with its structure defined through X-ray diffraction, by Srivastava et al. [48]. Quite unexpectedly, the least deviation was observed in the alignment of EhRabX10 with HRab5 (RMSD 1.978 Å), not EhRabX3 (Table 6), which implies that not all noncanonical Rab enzymes are identical in their structural arrangements. More research is needed to better understand the pseudoGTPase family.
In agreement with the classical Rab fold [37], EhRabX10 houses six β-strands inside a partial shell of five α-helices, with loops connecting the secondary structures (Figure 1(c)). These loops contain the functional G-motifs, although the motifs are quite divergent compared to the conserved sequences of conventional Rab GTPases. To explore the effects of these altered sequences on the enzyme functionality and interaction with other biomolecules, we generated an interactome of EhRabX10 (EHI_096440) through the STRING v11.0 server. The interacting partners predicted with high confidence fell mostly in the category of fellow Rab GTPases, with one in the syntaxin-binding protein (STXBP) family, whose members regulate vesicle docking and fusion [50,51] (Table 3). High-quality structures of EhRabC8 (EHI_170390) and EhSec1 (EHI_093130) were built through SWISS-MODEL and were then used for docking studies in ClusPro 2.0. Experimental evidence acquired from STRING v11.0 demonstrated a strong functional link between EHI_170390 and EhRabX10 (EHI_096440). It was based on affinity chromatography and tandem affinity purification assays. Coexpression evidence in the interactome showed that the STXBP partner (EHI_093130) is often coexpressed with Rab GTPases and controls late-stage vesicular trafficking. Thus, we selected these two proteins for docking studies [50]. As described in the opening of this discussion, the RabF and RabSF regions are diagnostic for the diverse Rab GTPases found in the eukaryotic kingdom. These regions are clustered in the switch I and II regions of the GTPase [46]. In-depth tracking of the residues of the protein-protein interaction complexes showed that the G3 box, RabF3, and RabF4 regions comprising the switch II loop were engaged in the docked interface of EHI_096440 (Figure 3(c)).
Accordingly, it can be speculated that the diverged G-motifs may not render EhRabX10 inactive and that it may function as an atypical GTPase, similar to the known functionally active hydrolase EhRabX3, which also lacks the G2 motif. However, the altered G4 motif in EhRabX10 could considerably affect the nucleotide recognition potential of this pseudoGTPase [16,48,52]. A closer inspection is required to validate these hypotheses.
Previous literature has shown that the yeast homologue of Rab8, the Sec4 protein, plays a regulatory role in the late-stage vesicular secretory pathway [1]; EhRabX10 was modeled using a template that is a Sec4 protein (PDB: 6O62). Additionally, EhRabX10 is predicted to interact with a syntaxin-binding protein of the Sec1 family (EHI_093130). These observations suggest the putative involvement of EhRabX10 in the cascade of Rab-Sec signalling in the late-stage vesicular pathway [53]. There is limited knowledge about the role of amoebic Rab GTPases, except EhRab5, in the early stages of the endocytic pathway [15,47]. In addition, there is a huge gap in understanding the orchestration of the secretion of various virulence factors such as amoebic cysteine proteases (EhCPs). E. histolytica (Eh) houses a huge arsenal of EhCPs, and these are the major contributors to its cytolytic potential [8,54]. Eh also contains a complex and unique system of vesicle-based de novo protein trafficking that continues to function even after the collapse of the Golgi complex [20]. This compels us to explore the role of these noncanonical Rab GTPases in regulating the stages of vesicle trafficking and the de novo synthesis of virulence factors in the peculiar vesicular transport system of E. histolytica [9,20]. The absence of these pseudoRab GTPases (EhRabX family) in humans marks them as targets for future exploration and justifies extensive research into their mechanisms of nucleotide recognition, binding, and hydrolysis, which may throw new light on the complex membrane dynamics of enteric amoebic protozoa.

Data Availability
Others will be able to access these data in the same manner as the authors. The authors did not have any special access privileges that others would not have.

Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this review.