Cystathionine 𝛽 -Lyase-Like Protein with Pyridoxal Binding Domain Characterized in Leishmania major by Comparative Sequence Analysis and Homology Modelling

Cystathionine 𝛽 -lyase-like protein (CBLP), one of the key enzymes involved in methionine biosynthesis utilising pyridoxal phosphate (PLP) as a cofactor, has recently been reported in Leishmania major . Its presence in the parasite and absence in humans warrant its full characterisation and fruition as a potent, selective, and inevitable druggable target. Due to the unavailability of X-ray 3D structure of CBLP, a homology model for this protein was developed for the first time. The model was evaluated for PLP binding siteandvariousconservedomainresiduesoftheproteinrecommendedbycomparativesequenceanalysesbydifferentproteinanaly-sistools.Themodelwasvalidatedanddiscoveredtoberobustandstatisticallysignificant.Thefinalmodelwassuperimposedontemplateof Arabidopsis thaliana (PDBID:1IBJ)and RMSD wasfound tobe0.486. ThePLPbinding siteresiduesof boththeproteins were ensued to be highly conserved indicated by Gly71, Met72, Tyr95, Asp169, and Ser193 as well as formation of aldimine bond with Lys194. This was further verified through molecular simulation of PLP into the cofactor binding site of the modelled protein. The present study may therefore play a directing role in the designing of novel, potential, and selective antileishmanial agents.


Introduction
Cystathionine -lyase (CBL, EC 4.4.1.8) found in plants and microbes is a pyridoxal-5 -phosphate (PLP)-dependent homotetrameric enzyme of approximately 35-40 kDa and catalyses the penultimate step of methionine biosynthesis, that is, , -elimination reaction, where cystathionine is hydrolysed into pyruvate, ammonia, and homocysteine ( Figure 1) [1,2]. Subsequently, homocysteine gets methylated to methionine. The overexpression of CBL encoding metC gene in D-alanine gene knockout models of several bacteria was found to be significant for their sustenance in host cell environment [3]. In addition, CBL also dictated the virulence of Salmonella enterica [4] thus playing a decisive role in its pathogenicity. Although, in 2007, Ejim et al. explored CBL as a drug target in bacteria [5], its role in protozoan is yet to be delved into.
Recently, the decoding of the Leishmania genome [6,7] facilitated in extracting significant information regarding the presence of a new homolog of CBL, cystathionine -lyaselike protein (CBLP), a 50 kDa translated peptide of 459 amino acids which shares 48.05% structural resemblance with CBL of Arabidopsis thaliana. This information also revealed that 64-447 amino acids make up the core domain sequence which comprises the active-cofactor binding site [8]. Leishmaniasis or Kala-azar is a neglected, major tropical vectorborne protozoan disease endemic in different parts of about 88 countries, where it is identified to have claimed the lives of approximately 350 million people [9][10][11][12]. The current status of available treatments for leishmaniasis is highly concerning as depicted by their lack of selectivity, adverse effects, toxicity, and persistent development of resistance [12][13][14]. The presence of CBLP in the parasite and absence in humans [  warrant its full characterisation and fruition for it to be exploited as a potent, selective, and indispensable druggable target. X-ray 3D structure of CBLP of Leishmania is not known so far. Therefore, we have made an attempt to develop a homology model for this protein for the first time. This 3D homology model will not only unwind the structural features and assist in rational drug design of the compounds, but also resolve the toxicity-related issues of the existing antileishmanial drugs.

Materials and Methodology
Sequence of cystathionine -lyase-like protein in Leishmania (Leishmania major) was retrieved from NCBI (Accession: XP 00101687646, GI: 157866509).

Template Selection and Sequence Alignment.
Protein homology modelling of target sequence required a template which was provided by MOE Protein Data Bank (PDB) search tool. Preliminary FASTA scanning calculated the E-value for every sequence [15]. After scanning, only those sequences whose E-value persisted between cutoff and accepted E-value were selected and were consequently subjected for Z-score calculation which would in turn determine statistical significance of the alignment score. The resulting sequences were then checked for percentage residue identity alignment with the target sequence by MOE Alignment tool. In the following alignment step, the chosen sequences were aligned and computed by optimise function of substitution matrix (BLO-SUM62) and position-specific gap penalties. Similarly, for every template sequence, the alignment score was estimated [16]. The freezing of chain was noteworthy to perform a sequence-to-group alignment between the target sequence and the resulted templates.

Homology Modelling.
Only the target sequence of CBLP with no associated atomic data and 1IBJ as template sequences of known associated atomic coordinates were selected for designing the final model. Subsequently, target sequences were copied from the regions of conserved residues of template sequence which generally are heavy atom, backbone coordinates, or disulfide bonds. Some residues might still lack assigned backbone coordinates and be either in loops, outgaps, or in deletion regions, collectively referred to as indels, and their model was generated from fragments of highresolution chains retrieved from the PDB [17] which superimpose well onto anchor residues on either side of the insertion area. If no segment matching a specified RMSD criteria is found, then the segment search is also permitted to back off from the indel regions [18]. This resulted in acquiring a set of independent models. It is then followed by modeling of loops in a random manner. For each loop, a contact energy function was calculated and further evaluated by Boltzmann function. Once modeling of all loops is complete, side chains were modeled using information collected from an extensive rotamer library generated by systematic clustering of highresolution PDB data. Consequently, optimal packing was also attained by Unary Quadratic Optimization. Hydrogen atoms were added to intermediate model, and a series of energy minimizations was done to alleviate the steric clashes and other structural peculiarities by preloaded Amber99 force field [19] along with reactive field salvation [20]. The intermediate models were subjected to steepest optimisation (2500 times) followed by conjugate energy minimization function until RMS gradient of the potential energy fell below the 0.01 KJ-mol −1 -Å −1 that provided an optimised model. (ii) Electrostatic solvation energy, estimated by generalized Born/volume integral (GB/VI) methodology [21]. (iii) Knowledge-based residue packing quality function which allows for a statistical determination of the molecular dynamic environment of an individual amino acid residue where each residue is classified according to its solvent accessibility, hydrogen bonded contacts, and polar versus nonpolar ratio contacting atoms in its own molecular ambient surrounding. (iv) Effective atomic contact energy (ACE) comprises the effective energy difference resulting from replacing the residue-solvent contacts with residue-residue contacts based on the observed contact frequencies in a database of high-resolution protein structures [22,23]. The model was then further evaluated for protein geometry [24] by Structural Analysis and Verification Server tools (SAVES) (PROCHECK (Ramachandran plot) [25], ERRAT [26], VERIFY3D [27], and ProSA [28,29]). The PROCHECK analyses provide an idea of the stereochemical quality of all protein chains in a given PDB structure. These tools highlight the region of proteins which appear to have unusual geometry and provide an overall assessment of the structure as a whole. The parameters generally checked by PROCHECK are covalent geometry, planarity, dihedral angles (Ramachandran plot), chirality, nonbonded interactions, main-chain hydrogen bonds, disulphide bonds, and stereochemical parameters. The Ramachandran plot displays the and backbone conformational angles for each residue in a protein. In Ramachandran plot, the core or allowed regions are the areas that show the preferred regions for / angle pairs for residues in a protein ( Figure 2). If the determination of protein structure is reliable, most pairs will be in the favoured region of the plot, and only a few will be in the disallowed region. For those amino acids which did not comply with the consistent stereochemistry designated by Ramachandran plot, further energy minimisation was done.
The ERRAT program analyses the statistics of nonbonded interactions between different atom types, to give the plot of value of the error function versus position of a 9-residue sliding window. By comparison with statistics from highly refined structures, the error values have been calibrated to give confidence limits. Those regions which have an error values cutoff greater than 95% in ERRAT plot were selected for loop modelling one by one (see Supplementary Material available online at http://dx.doi.org/10.155/2013/520435). Finally, a model exhibiting good PROCHECK verification and ERRAT plot was subjected to native protein fold evaluation using ProSA (Figure 3) [28]. The program examines a PDB file and generates a score based on the quality of the local structure surrounding each residue, based on the typical ranges of dihedral angles and side chain contacts observed in real proteins. This program generates a plot which gives a measure of structure error at each residue in the protein. It also calculates an overall score for structural quality. The overall measure of quality is given at the top of the plot, and each bar in the histogram is shaded according to the significance of the local structural error. Higher the significance, with greater confidence a particular amino acid can be rejected. In ERRAT plot, two lines are present on the error axis to indicate the confidence with which the residues that exceed error value can be rejected. The overall quality factor is expressed as the percentage of protein for which the calculated error value falls below the 95% rejection limit. In Verify 3D, the accuracy of the 3D model is ascertained by comparing it with its own sequence using a 3D profile calculated using atomic coordinates of its structure. 3D profiles of correct protein structure have high S scores in contrast with poor structures with low S scores. The S score/3D-1D profile score is the sum over all residue positions of 3D-1D score for amino acid sequence of the protein (Figure 4) [27].

Characterization of PLP-Binding Domain.
The necessity to validate the precision of the homology model in physiological context resulted in carrying out comparative sequence analysis by Conserved Domain search tool of NCBI (CDsearch tool) [30,31], where we characterized PLP conserve interactions against cofactor binding domain ( Figure 5). In this comparative sequence search, different PLP-dependent enzymes were taken into account which were involved in cysteine-methionine metabolism in different species and also possessed alignment similarity with our query sequence (see supporting information). This step was quite helpful in establishing that the conserve residue interactions remain conserved throughout the different species and also helped us to validate the homology model against the PLP-conserved domain interactions to confirm the authenticity of the model 4 ISRN Computational Biology as an enzyme. The conserved residues interactions with PLP can also be used to check the acceptability of the homology model. If the corresponding conserved amino acid residue interaction were found to be retained in the homology model and exhibit similar interactions with PLP as that of reported in vitro studies, then it would validate the model. This was further observed by docking the PLP in the homology model.  Amber99 potential model in reactive field solvation. Subsequently, the PLP was also built through MOE molecular builder tool where partial charges were added and further minimised by MMFF94x force field [32]. Finally, the PLP structure was saved as database PLP.mdb extension for further computational studies. Here, MOE site finder tool was utilised to find the active site of interest. Based on the geometric methods, it identifies the hydrophobic and hydrophilic regions. Alpha spheres (geometrical 3D point) on the sites were calculated using a computational geometric algorithm, modified Delaunay triangulation. Finally, a small number of alpha spheres were retained on the basis of ambient hydrogen bonding. Clustering of the alpha spheres using a single-linkage clustering algorithm (i.e., connection distance 2.5Å) was finally performed [33]. This process resulted in series which contained possible amino acid residues, potentially able to become an active site. The series containing Gly134, Met136, Tyr158, Ser254, Thr256, and Lys257 was selected as it possessed the most conserve amino acid residues. Further, PLP was also docked to characterize the conserved cofactor (PLP) binding domain and subsequently evaluated for the conserve amino acid interactions within the same cofactor binding cavity. In docking 10,00,000 poses, scoring-London DG and Placement-Triangle matcher as energy functions were chosen. The 5000 top scoring poses were retained.

Homology Modelling and Model Refinement.
For homology modelling, CBL of Arabidopsis thaliana (PDB: 1IBJ) was selected as template, having 47.4% pairwise sequence identity and 48.01% similarity. The prepared model was further evaluated against the protein geometry by using different tools of SAVES. PROCHECK identifies 8 amino acids having stereochemical clashes with respect to Ramachandran plot. Among them, 2 were in the active-cofactor binding domain (Asn131 and Lys194) and were subsequently energy optimised ( Figure 2).
The backbone dihedral distribution of all the amino acids residues of homology model of CBLP, calculated by Ramachandran plot, highlighted that there are 84.5% residues in the most favoured region, 13.5% in the allowed region, 0.9% in the generously allowed region, and 1.2% in the disallowed region, while ERRAT plot (see supporting information) and Verify3D also exhibited 75.401 overall quality factor, and 81.77% of the residues had an average 3D-1D score less than 0.2 (passed), respectively. Consequently, ProSA displayed quite similar overall model quality (Z-score) for homology model (−8.34) and template (−9.55). The Z-score is the structural attribute of any protein, exhibiting the quality of model in terms of measuring the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations. The negative Z-score of homology model is mandatory to hold the characteristic of being a good model as indicated in Figure 3. These results exhibited high quality of homology modelled protein and its reasonable accuracy in terms of dihedral distribution and stereochemistry features. Meanwhile, ERRAT plot and comparative similarity in Z-score (estimated through ProSA) further increase the confidence of the acceptability of homology model. Finally, this was also confirmed by superposition of optimized homology model against the template and the RMSD surprisingly found to be 0.486 ( Figure 6).

Characterization of Pyridoxal Binding Domain.
With a view to validate the model, we decided to characterise conserve PLPbinding domain of different enzymes in various species exclusively involved in cysteine-methionine metabolism, achieved successfully by comparative sequence analysis. This kind of consideration gave different types of generalized interactions which never varied from enzymes to enzymes in different species (see supporting information). Comparative sequence analysis gave rise to the following conclusions. (i) YXR interacts with the phosphate head of PLP, and X could be majorly Ser, Thr, or Gly or it might also be Lys or Ile. This interaction was found to be significant for stable conformation although they were not considered during our investigation owing to the fact that these conserved amino acids were present on the next chain of the enzyme; but later on this was proved by docking PLP against the template. (ii) SGX, GX interact with the PLP where X could be mutated, while mostly conserved Ser assists both amino residues to attain a better pose with PLP binding domain. (iii) Y, Tyr belong tostacking and/or arene-cation interaction and help in holding pyridinium ring at particularly one conformation, while the NCBI designated GI: 16081640 and GI: 15899124 sequences that exhibited mutated Phe amino residue at the place of Tyr still exhibited the same interaction. (iv) D, Asp amino acid neutralise the physiological active ionised nitrogen of pyridinium ring. (iv) SXTK, most peculiar PLP binding site contains Lys which forms aldimine bond with the carbonyl group of the PLP, while serine and Thr were mostly conserved and have interactions with phosphate head. Through comparative sequence analysis, we got a better view of PLP binding conserved residues and their position in the peptide chain of different proteins. This proved to be significant in the validation of designed homolog of CBLP against the conserved amino acid residue interactions.
The results of docking studies of PLP against template (CBL of A. thaliana; PDB ID: 1IBJ) revealed that PLP has a similar binding pose as in X-ray structure (Figures 7 and 8). Also, it was concluded from this docking protocol (provided by MOE program) that adopted methodologies were enabled to generate similarly binding pose as experimentally derived. This was the part of validation of docking protocol. The RMSD between docked and cocrystallized PLP was found to be 0.4483 (Figure 7). Similarly, this docking protocol was applied for docking of PLP into the cofactor binding site of modeled protein (Figures 9 and 10). The results of these studies are compared in Table 1 with docking of PLP with template protein. During docking analysis, it was found that 7 best docked poses of PLP-modelled protein were having conserved amino acid interactions. The intensity of PLP interactions with conserved amino acid of PLP-binding domain ( Figure 5) in homology model was evaluated ( Table 2). The interactions with Tyr95, Asp169, Ser191 and Lys194 were consistently found in most of the docking poses, while interactions with Gly71 and Met72 were less frequent. Although the Thr193 was missing in these dockings, but it was present in other 14 docking poses suggesting its presence in conserve domain. Moreover, the most peculiar interactions confirming the validation was observed and found to be a highly frequent interaction in most docking poses, where aldehyde group of PLP and -amino group of Lys194 were in optimum molecular distance to form an aldimine bond, as this bond formation is the trait of PLP-dependent enzyme.

Conclusions
Presence of CBLP in leishmanial protozoan and its absence in human make it a selective and rational target for antileishmanial drugs. The homology model of CBLP Leishmania major was built using high homologous protein (Cystathioninelyase of Arabidopsis thaliana) for the first time. The generated model was subjected to various validation methods which proved that it held significant degree of authenticity. The high sequence identity was found between cofactor (PLP) site residues of both the proteins, which commanded the use of template structure as a reference for comparing the results  Tyr95 (OP1, -interaction, arene-cation interaction), Asp169(N-H) Ser191(OP1, OP2) Lys(OP1, -CHO) * These conserved residues are present on the next chain of the enzyme hence not considered here. # Susceptible to undergo an aldimine bond formation. E conf: energy of PLP conformation in that particular pose; E plac: energy of the PLP-protein pose at particular place; E-score: energy scoring functions emphasize favorable hydrophobic, ionic, and hydrogen bond contacts. of homology modelling and docking studies. The docking studies elucidated that PLP binds in a similar manner as in the template and interacts with highly conserved residues. These interventions also recommended that the observed interactions were of paramount importance for binding of ligand (PLP) into the active site of CBLP. These outcomes may be helpful in understanding the mechanism of action of CBLP and provide a platform for designing novel inhibitors of CBLP as antileishmanial agents. Lys194 -CHO, OP1, OP2, OP3 High * * On the basis of the number of times the conserved residue was found in a docking pose out of 100 poses. Less: not more than 15 times; mild: not more than 30 times; high: not less than 50%, that is, out of 100 docking poses minimum, we get this interaction50 times.
Bathinda, for providing the facilities to carry out the present work. The Authors are also thankful to Professor P. Ramarao, Dean Academic Affairs, Central University of Punjab, Bathinda, for his constant encouragement and to Central University of Punjab (CUPB) Library Communication Number P33.