Homology Modeling and Virtual Screening Studies of Antigen MLAA-42 Protein: Identification of Novel Drug Candidates against Leukemia—An In Silico Approach

Monocytic leukemia-associated antigen-42 (MLAA-42) is associated with excessive cell division and progression of leukemia. Thus, human MLAA-42 is considered as a promising target for designing of new lead molecules for leukemia treatment. Herein, the 3D model of the target was generated by homology modeling technique. The model was then evaluated using various cheminformatics servers. Moreover, the virtual screening studies were performed to explore the possible binding patterns of ligand molecules to MLAA's active site pocket. Thirteen ligand molecules from the ChemBank™ database were identified as they showed good binding affinities, scaffold diversity, and preferential ADME properties which may act as potent drug candidates against leukemia. The study provides the way to identify novel therapeutics with optimal efficacy, targeting MLAA-42.


Introduction
Cancer is a result of a progressive accumulation of epigenetic changes and genetic aberrations, which lead to uncontrolled accumulation of blood cells [1]. Leukemia is an incurable disease, characterized by the unrestrained proliferation of abnormal white blood cells, readily spread to the entire body through blood stream and lymphatic system [2]. It starts in blood-forming organs and directs the formation of abnormal blood cells which can divide to produce copies of themselves [3]. Acute monocytic leukemia (AMoL) is an irremediable disease in older adults [4]. e present need is to identify novel and more effective therapies for AMoL to alleviate the suffering of patients.
MLAA-42 is one of immunogenic antigens and a novelidentified monocytic leukemia-associated antigen which is overexpressed in acute monocytic leukemia. In an attempt to uncover the mechanism of MLAA-42 overexpression, it significantly works as an elongation factor which plays an important role in polypeptide elongation in leukemia as shown in the biochemical pathway ( Figure 1) [5]. Furthermore, MLAA-42 plays a critical role in the transformation of normal cells to malignant phenotype through binding with GTP bound protein leading to unrestrained proliferation of the abnormal white blood cells which lead to AMoL. In the meantime, inhibition of proliferation of bone marrow cells, which takes place if there is no binding between MLAA-42 and GTP, is caused directly to prevent formation of AMoL [5].
Heterocyclic compounds have been found to gain extensive interest due to their considerable wide range applications in medicinal, pharmaceutical, and pharmacological activities especially as anticancer agents [6,7]. In the present study, computer aided virtual screening studies were carried out using ChemBank ™ database to report new molecular entities which can be used as potent inhibitors against MLAA-42 protein. is study warrants further investigations to indicate that MLAA-42 is a novel target for designing newer drug-like molecules for leukemia treatment, using in silico approaches.  For the case where the target MLAA-42 is not experimentally determined by means of X-ray crystallography and nuclear magnetic resonance (NMR), the computationalbased techniques are used for 3D structure prediction [8].

Generation of the ree-Dimensional Structure of
e FASTA sequence of the target protein (ID : Q6W6M8) with 151 amino acid residues was identified from Universal Protein Resource "UniProtKB" database (https://www. uniprot.org/uniprot/Q6W6M8) [9]. e template was recognized by subjecting the obtained sequence to BLAST, Jpred3, and Domain Fishing servers, and Table 1 represents the resulted maximum E-value. e 1N0V "Crystal structure of elongation factor 2" was selected as a template protein, on the basis of parameters such as query cover (77%), maximum identity (63%), and statistical E-value (8 × 10 −47 ) with the target [10]. ClustalW server was used to perform the alignment of both elongation factor proteins MLAA-42 and 1N0V (541-720 aa), as shown in Figure 2. e conserved amino acid residues are elucidated as ( * ) (39.3%), highly similar residues as (:), and weakly similar residues as (.) (51.7%). e individual percentages, resulted from the server, refer to the identity and similarity of the target residue sequence with its template sequence, respectively.
Twenty-five models of MLAA-42 were generated by using Modeller 9.11 software, depending on satisfying spatial restraints in terms of probability density function [11]. e 3D model with fewest restraint violations, lowest probability density function (855.33), and acceptable geometry was selected for further validation [12]. Figure 3 represents the 3D model of the target, which consists of five alpha-helices and six beta-strands. e secondary structures of both 1N0V and MLAA-42, which represent the amino acid sequences of helices and strands in both proteins, are represented in Figures 4 and 5 and ( Figure S1 and Tables S1 and S2) in the supplementary data section, using PDBsum server [13]. To obtain a stable confirmation, energy minimization and protein loop modeling were performed by GROMOS96 Force-Field set, using Swiss-Pdb Viewer [14]. To get insight of the superimposition between the proteins MLAA-42 and 1N0V, the root-mean-square deviation RMSD for the backbone of MLAA-42 and 1N0V is 0.51Å (within the permissible range i.e. ≤2Å), which indicates close homology and ensures reliability of the model, as declared in Figure 6.  e physicochemical properties of MLAA-42 were predicted using ProtParam tool [15]. e protein sequence has 151 amino acid residues with molecular weight of 16.871 kDa. Furthermore, the most abundant amino acid residues are LEU, SER, GLU, ALA, VAL, LYS, GLY, and ASP, respectively, in high percentages in the target protein, as shown in Figure 7. Lucine has the highest abundance (9.9%), and methionine has the lowest abundance (0.7%). e physicochemical parameters predicted a       Computational and Mathematical Methods in Medicine

Validation of the Model and Active Site Analysis.
e 3D model was validated by different tools such as PROCHECK (Ramachandran plot) and ProSA to check its structural integrity. Ramachandran plot (Figure 8) of the modeled protein represents 87.3% (117 aa) of the total residues in the most favored regions and 11.9% (16 aa) in additionally allowed regions, indicating a good quality model.
Moreover, ProSA Z-score (dark spot) is −2.85, which falls within the values range of the known proteins determined by X-ray (light blue) and NMR (dark blue) (see Figure S2, in the supplementary data section), predicting a reasonable quality model. Figure S3, in the supplementary data section, shows the 3D model quality by plotting energies (X-axis) as a function against residue sequence position (Y-axis). e figure shows that most of residues of MLAA-42 have negative energies, which indicates an acceptable model. In view of the facts mentioned above about the unavailability of 3D structure of MLAA-42 in protein data bank PDB, thus, there are no previous data obtained for the binding regions of the target. erefore, to overcome the lack of experimental data, herein, CASTp and SiteMap cheminformatics tools were used to identify the binding site cavities of the target (Figures S4 and S5 and Tables S3 and  S4), in the supplementary data section, respectively. CASTp server gives information about hydrophobic pocket regions; in addition, it measures cavity volume and area. e results declared four sites, site 1 has volume of 40.9Å 3 , site 2 with volume of 28.2Å 3 , site 3 with volume of 50.7Å 3 , and site 4 has volume of 105.4Å 3 . Among the four different cavities obtained from CASTp, the amino acids from PHE44 to ILE96 are consistently present in all sites. SiteMap gives information about the hydrophilic and hydrophobic regions present on the protein surface. It identified two binding cavities on MLAA-42 protein.
e site 1 with volume 1222.79Å 3 and size of 4.38Å and site 2 has volume of 46.30Å 3 and size of 32Å. e active sites predicted by SiteMap are from TYR38 to ILE100. In conclusion, these computational prediction techniques give information about the binding site regions that are composed of TYR38 to ILE100 residues are responsible for MLAA-42 docking with ligand molecules. e docking interactions show that the residues TYR38, LYS40, THR89, ILE96, ASP99, and ILE100 are observed to be crucial to identify the antagonists against MLAA-42 protein. Figure S6, in the supplementary data section, represents the grid with dimensions 80Å × 80Å × 80Å around the binding site, to perform further studies.

Molecular Docking Analysis and ADMET Profile.
e compounds with heterocyclic core structures are pharmaceutically important class of compounds because of their diverse range of biological importance such as anticancer  activity against cancer cell lines [6,7]. In view of the facts mentioned above and as a part of our efforts to identify new anticancer drug candidates [16], a ChemBank ™ library of 2344 small molecules with an anticancer activity was downloaded [17] for specific docking. Computer-aided screening protocol at the binding regions of the target was performed using virtual screening workflow docking program of Glide, Schrodinger suite. e glide docking program identifies specific structural motifs and provides exceptionally large contribution to enhance binding affinity [18].
e docking approach was run in a flexible docking mode which automatically generates different confirmations for each ligand molecule. e total isomeric structures (13,354) obtained after ligand preparation was subjected to high throughput virtual screening (HTVS) mode and 10% hit structures (1335) obtained as output, followed by 10% of them, 133, screened in standard precision (SP) mode, and finally the 10% of the resultant were screened in the Extra precision (XP) mode to obtain 13 hits. e docked compounds were prioritized according to their binding energies (in range of −12.19 to −9.22 Kcal/mol) after docking to the active site pocket of the target protein, as tabulated in Table 2.
A sample of six best docking interactions between the ligands and protein, arranged rankwise based on their binding energies, is depicted in Figure 9 and Table 2. e top six ligand-protein complexes were visually checked in Discovery Studio Visualizer (for 3D representation) and Maestro (for 2D representation), which exhibited the noncovalent interactions such as hydrogen bonding and pipi stacking [19,20]. In 3D representation, the binding residues of MLAA-42 are shown in yellow-colored stick model, the ligands in purple one, H-bonds are in black dotted lines, and pi-pi interactions are in orange lines Lucine has the highest abundance (9.9%), and methionine has the lowest abundance (0.7%).   ( Figure 9). While in 2D representation, the residues are shown in a 3-letter code, the hydrogen bonds are represented in pink lines, and pi-interactions are shown in green lines. e ligand molecules with heterocyclic moieties such as thiophene, pyrazole, and pyrolidine and amide groups such as carboxamide (-CONH-) and sulphonamide (-SO 2 NH-) in ligands L1-L6 are observed to be common pharmacophore groups which interact with the active site region (TYR38, LYS40, TRP 85, ILE96, THR98, ASP99, and ILE100) of MLAA-42 through a network of hydrogen bonds and pi-interactions, as represented in Table 2. e carboxamide group, which is present in all ligand molecules, formed intermolecular hydrogen bonds with the residues TYR38, LYS40, TRP 85, ILE96, THR98, ASP99, and ILE100. In addition, the sulphonamide group, which is present in ligand L4, formed a hydrogen bond interaction (N---H---O) with the residue THR98 at the distance of 2.65Å. Furthermore, the tryptophan (TRP85) formed π-π interactions (face to face) with the benzene ring of ligand L2 on its side chain plus the pyrazole ring of ligand L5, at the distance of 4.95 and 4.87Å, respectively. e intermolecular hydrogen bonds and pi-interactions add more stability to the docked complexes [21]. e other docked compounds to the target are included in supplementary data section as Figure S7. SASA values of MLAA-42, before and after docking, were analysed using Discovery Studio Visualizer (see Figure S8, in the supplementary data section). e decrease in SASA values confirms that the amino acid residues (TYR38 to ILE100) are involved in bonds formation with ligand molecules (a)-(f). e ADMET properties for the resulted ligand molecules were determined in silico by using the Qik-Prop module of the Schrödinger suite. e molecules with agreeable ADME properties are considered as new novel drug candidates, as shown in Table 3. Interestingly, all the compounds have a molecular weight in the acceptable range of 349.7 to 511.3 Daltons (less than 725 Dalton). BBB + value describes the ability of the compounds to cross the blood-brain barrier, which is in the permissible ranges for all compounds. ey had <10 hydrogen bond acceptors and <5 hydrogen bond donors, and log P values of <5. ese properties are in the reasonable range of Lipinski's rule of five (LORF) [22]. e human oral absorption, partition coefficient (QPlogPo/w), and water solubility (QPlogS) values are in the agreeable range defined for human use, which indicates their possible ability to be drug candidates, as shown in Table 3. Also, the values show that the compounds can be absorbed by the human intestines. Furthermore, all molecules (a)-(f ) displayed negative AMES toxicity test which means that the ligands are nonmutagenic. Also, they displayed negative carcinogenicity test.
ese ligands can be considered as highly potent inhibitors against leukemia.

3D Homology Model of MLAA-42.
e protein's threedimensional structure is required to understand its function [23]. Herein, an in silico homology modeling generates a three-dimensional (3D) for an unknown structure of protein (target) depending on one or more proteins of known structures (templates) as reported earlier by Aboubakr et al. in 2016 [24,25]. e generation of (3D) structure of MLAA-42 was carried out using comparative modeling techniques. e amino acid sequence of MLAA-42 is retrieved from the UniProtKB database in FASTA format. e resulted sequence is submitted to BLASTp, Jpred3, and Domain fishing servers, to identify suitable templates [26][27][28]. E-value is identified the similarity between the target and template proteins [29]. e template with maximum identity is selected for generating the 3D structure of the target. e pairwise sequence alignment of MLAA-42 with its template is performed using the ClustalW tool to recognize the structurally conserved regions [30]. ClustalW server uses the Gonnet matrix algorithm to make certain the presence of conserved motifs and to minimize the atomic gaps. e % identity of alignment is measured as a ratio number of the identical residues in the alignment and the total number of residues of the target. e % similarity is measured as a ratio of the total number of identical and similar residues to the total number of residues of the target. e 3D structure of the target is generated using Modeler 9.11. en, the model is energy minimized using SPDBV and Impref in Schrödinger to obtain a stable conformation [31]. e RMSD value, which is calculated using the Swiss-Pdb viewer, is used to find the best superimposition for protein structures.
e lower value (≤2Å) means the best alignment between the structures.

3D Structure Validation.
Subsequently, quality of the generated model is evaluated by computational protocols such as PROCHECK and Protein Structural Analysis (ProSA) [32,33]. PROCHECK, a program that relies on Ramachandran plots for structure verification, figures out the stereochemical quality of the model. Furthermore, ProSA is applied to check for energy criteria in comparison with a large set of known protein structures with similar size [34].

Binding Site Prediction and Ligand Preparation.
It is the hydrophobic cavities on the surface of a protein, which is responsible for its specificity [35]. It is theoretically determined by means of tools based on recent theoretical and algorithmic results of computational geometry such as CAST-p and SiteMap of Schrödinger suite [36,37]. ese computational prediction tools analytically furnished the area and volume of cavities. e grid is created around the binding cavity to perform further structure-based virtual screening studies [38]. LigPrep module of Schrödinger suite is used to generate various conformers of small molecule depending upon its structural features [39].

3.4.
In silico Molecular Docking, SASA, and ADME Analysis. Molecular docking protocol is used to predict the preferred orientation of a ligand molecule to the target protein to form a stable complex [40,41]. e docking accuracy is determined by finding how closely the binding confirmation with Computational and Mathematical Methods in Medicine the lowest energy of the cocrystalized ligand molecule predicted by the object scoring function; G-score (Glide score) resembles an experimental binding modes determined by X-ray crystallographic technique [42]. Once the 3D model of MLAA-42 is validated and the target binding pockets are defined, the docking-based virtual screening study is performed using the Glide docking tool incorporated in the Schrödinger package by Maestro [43]. e virtual screening approach is performed through hierarchal flexible docking methods: High roughput Virtual Screening (HTVS), Standard Precision (SP), and Extra Precision (XP). e ligand molecules are prioritized on the basis of the docking score, docking energy in each step, and by default 10% of the molecules selected and considered for the next hierarchical step [18]. Solvent Accessible Surface Area (SASA) of the target, before and after docking, is computed using Accelrys Discovery Studio Visualizer 3.5 software. Prediction of drug-like profiles, such as physicochemical, pharmacokinetic, and safety of the compounds, is performed using the QikProp module [44].

Conclusion
In this work, the computer-aided drug design protocols were used to identify novel leads against MLAA-42 protein. e homology model of the target was evaluated by homology modeling techniques. In silico molecular docking were also adopted to identify the lead compounds. e resulted molecules with heteroscaffolds and amide groups (-CONHand-SO 2 NH-) exhibited better estimated binding energy values and agreeable pharmacokinetic properties and were ranked as potent drug-like candidates against the target protein. Hence, MLAA-42 has emerged as a therapeutic target for treatment of leukemia carcinoma.

Data Availability
e data used to support the findings of this study are included within the article.  Figure S1: the secondary structure of MLAA-42. e 3D model of the target protein consists of five helices as red cylinders and six strands as pink arrows. Figure S2: e 3D model quality of MLAA-42. e score (−2.85) is falling in the range of PDB proteins identified by NMR (dark-blue region) and X-ray crystallography (light-blue region). Figure    quality by plotting energies as a function against amino acid sequence position. Figure S4: binding site of MLAA-42 from the CAST-p server. Figure S5: binding site of MLAA-42 by SiteMap module. Figure S6: receptor grid generation by Schrodinger Suite. e grid generated with 80Å × 80Å × 80Å dimensions using receptor grid generation of Glide module. Figure S7: two-dimensional interactions between MLAA-42 with other ligand molecules. Figure S8: the surface accessibility of MLAA-42 and docked complex (L1). Red-colored peaks represent SASA value of MLAA-42 (after docking), blue-colored peaks represent before docking. X-axis represents the amino acid residues and Y-axis represents SASA value. Tables S1 and S2: the secondary structure details of MLAA-42 protein (S1) the α-helices in MLAA-42 protein. (S2) the β-strands in MLAA-42 protein. Table S3: binding site of the MLAA-42 by CAST-p server.