In Silico Phylogenetic Analysis and Molecular Modelling Study of 2-Haloalkanoic Acid Dehalogenase Enzymes from Bacterial and Fungal Origin

2-Haloalkanoic acid dehalogenase enzymes, which are widely distributed in fungi and bacteria, have a broad range of applications, from bioremediation to the chemical synthesis of useful compounds. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from the NCBI database. Sequence analyses, namely multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition, and phylogenetic tree construction, were performed on these primary sequences. The MSA showed that the sequences share conserved lysine (K) and aspartate (D) residues. The phylogenetic tree also indicated a subcluster comprising both fungal and bacterial species. Because no experimental 3D structure of a fungal 2-haloalkanoic acid dehalogenase is available in the PDB, molecular modelling was performed for both the fungal and bacterial enzymes present in the subcluster. Further structural analysis revealed a common evolutionary topology shared by the fungal and bacterial enzymes. Studies on the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface-exposed tryptophan was conserved in all of the selected models.


Introduction
2-Haloalkanoic acid dehalogenase enzymes (EC 3.8.1.2) are present in many bacteria and fungi. In the presence of water, they catalyze the conversion of (S)-2-haloacid to (R)-2-hydroxyacid, with halide as a product [1][2][3][4]. The basic scheme for the reaction is given as follows: (S)-2-haloacid + H2O → (R)-2-hydroxyacid + halide. Consequently, 2-haloalkanoic acid dehalogenase may be valuable for the bioremediation of different haloacid pollutants. Many microorganisms can break down halogenated compounds by cleaving their carbon-halogen bonds via dehalogenase-catalyzed reactions and may therefore aid in the removal of organohalide pollutants from the environment [5][6][7]. These dehalogenase enzymes are broadly classified as haloalkane dehalogenases, halohydrin dehalogenases, haloacetate dehalogenases, dichloromethane dehalogenases, and D- and L-haloalkanoic acid dehalogenases based on the nature of their cleavage [8,9]. Several microorganisms may produce more than one dehalogenase, which might give them a survival advantage under fluctuating environmental conditions [10]. Although various dehalogenases have been grouped together, the classification may not indicate sequence similarity among the proteins. These enzymes differ in their optimum pH for activity, size and subunit structure, electrophoretic mobility under nondenaturing conditions, and substrate specificity [11,12]. Currently, haloacid dehalogenase enzymes from both bacterial and fungal sources are receiving greater attention because of their potential biotechnological use in the bioremediation of haloacid environmental pollutants [13,14].
A structure-based analysis is also important for a proper understanding of the enzyme. Unfortunately, no experimental 3D structures of haloacid dehalogenase from fungal sources are available to date. The objective of the present study is to analyse the sequence and structural relationships of 2-haloalkanoic acid dehalogenases from different bacterial and fungal sources by applying several computational methods to the retrieved primary protein sequences.

Materials and Methods
The full-length primary protein sequences of 2-haloalkanoic acid dehalogenase from bacterial and fungal sources were retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/protein/). The amino acid composition of these sequences was computed using the PEPSTATS module integrated in the EMBOSS software [20]. Multiple sequence alignment for individual profiles was performed using MUSCLE, and phylogenetic analysis was performed using MEGA 6 software [21]. The discovered motifs were further used to search for their protein family using Pfam at the DDBJ MOTIF server (http://www.genome.jp/tools/motif/). The UPGMA and neighbour joining tools from the MEGA 6 package were employed for visualizing the phylogenetic tree pattern. The phylogenetic tree was tested for statistical reliability by bootstrapping the analyses with 200 replications. From the cluster observed, the 3D structures of the bacterial and fungal sequences were predicted using the I-TASSER server [22]. The models were validated by Ramachandran plot, ERRAT, and Verify-3D computation. Conservation of amino acid residues was computed with the ConSurf server [23]. The core amino acids of the fungal and bacterial structural models were computed with the IPFP tool [24], and the conservation pattern of the core and surface amino acid residues was analysed.
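The composition step can be illustrated with a minimal sketch. This is not the PEPSTATS implementation; the function name `composition_percent` and the toy sequence are assumptions introduced only for illustration. It computes the percentage of each of the 20 standard amino acids in one sequence.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequence):
    """Return the percentage of each standard amino acid in a sequence.

    Characters outside the 20 standard codes (gaps, X, etc.) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy sequence, not taken from the study's data set.
example = "MKDLAAKW"
comp = composition_percent(example)
print(comp["A"], comp["K"], comp["W"])
```

Applying such a function to every retrieved sequence would give one percentage per amino acid per sequence, which is the kind of table a composition boxplot summarises.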

Results
The average percentage amino acid composition of the 2-haloalkanoic acid dehalogenase sequences is shown in Figure 1. In the boxplot, the uneven distribution indicates that different amino acids contribute differently to the composition of the 2-haloalkanoic acid dehalogenase enzymes. The amino acids with values close to zero are cysteine, histidine, lysine, asparagine, tryptophan, and glutamine. There is little variation in the rarest amino acids, such as cysteine (C), methionine (M), and tryptophan (W). Since the hydrophobic amino acids occur in small numbers in the proteins, they do not contribute significantly to the occupancy/diversity of the selected enzymes of either fungi or bacteria. The highest variability was observed for alanine (A). Glycine (G) and aspartic acid (D) show the same median level and hence might have a similar effect in their distributions. The distribution of isoleucine (I) in the enzyme sequences was anomalous, as it contains many outliers, followed by threonine (T) and asparagine (N).
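The boxplot quantities discussed above (median, quartiles, and outliers) can be sketched as follows. This assumes the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles, and the "inclusive" quartile method; the plotting software used for Figure 1 may apply a different convention. The function name and the toy percentages are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return quartiles, median, and 1.5*IQR outliers for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the full population of interest.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical percentages of one amino acid across nine sequences.
toy_percentages = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = boxplot_summary(toy_percentages)
print(summary)
```

Two amino acids with the same median can still differ in spread, so comparing `q1`, `q3`, and the outlier list gives a fuller picture than the median alone.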

Protein Motif and Family Detection.
All fungal and bacterial enzyme sequences were associated with the haloacid dehalogenase-like hydrolase motif. Thirty motifs unique to the group of enzymes selected for this study were identified. Detailed results are given in Supplementary Material 2 (blue highlight).

Multiple Sequence Alignment and Phylogenetic Analysis.
The alignment of all selected sequences was analysed using the freely available Accelrys DS Visualizer software (http://accelrys-discovery-studio-visualizer.software.informer.com/). From this computation, a conserved pattern of 4 amino acids was obtained across all groups of sequences (Figure 2).
Further, phylogenetic analysis of the bacterial and fungal sequences showed major clusters separated by fungal or bacterial origin. However, one subcluster of the NJ (neighbour joining) tree comprised two fungal (Metarhizium robertsii and Fusarium oxysporum f. sp. cubense race 1) and two bacterial (Staphylococcus massiliensis and Solemya velum gill symbiont) species (Figures 3 and 4). Two outgroup sequences were also obtained, one for bacteria (Thermus scotoductus) and one for fungi (Beauveria bassiana D1-5). Almost the same pattern was obtained, with very few exceptions, when the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method was used to construct the phylogenetic tree. With this method, two bacterial outgroups and one fungal outgroup were obtained. Then, as a case study to revisit the homology among the bacterial and fungal species, the above 4 enzyme sequences were further analysed by molecular modelling.
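The idea of a fully conserved alignment position can be made concrete with a short sketch. It assumes the alignment is available as equal-length strings with "-" as the gap character; the function name and the toy alignment are hypothetical and do not reproduce the study's data or the visualizer's own conservation scoring.

```python
def conserved_columns(alignment, gap="-"):
    """Return (index, residue) pairs for fully conserved, gap-free columns.

    `alignment` is a list of aligned sequences of identical length.
    Indices are 0-based positions in the alignment.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        column = {row[i].upper() for row in alignment}
        # A column is fully conserved if it holds exactly one residue type
        # and that residue is not a gap.
        if len(column) == 1 and gap not in column:
            result.append((i, column.pop()))
    return result


# Hypothetical toy alignment of three sequences.
toy_alignment = ["MK-DLC", "MKADLY", "AK-DIC"]
conserved = conserved_columns(toy_alignment)
print(conserved)
```

Partially conserved positions, such as the last column of the toy alignment, would need a frequency threshold rather than the strict identity test used here.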

Structural Modelling and Analysis of Conserved Core and Exposed Amino Acids.
The initial search for homologous structures in the PDB using the BLAST tool returned no hits with ≥40% identity; therefore, the I-TASSER server (a threading program) was used for 3D structure prediction. Four suitable models for the given species of fungi and bacteria were obtained, and their topology diagrams were generated using the Pro-origami tool (http://munk.csse.unimelb.edu.au/pro-origami/porun.shtml). The results showed a similar topological pattern that is highly conserved in both bacteria and fungi (Figure 5). The models were then validated for steric clashes using a Ramachandran plot from the Rampage server (Table 1) and for reliability using the ERRAT and Verify-3D profiles available in the SAVES server (Figure 6).
ERRAT is a sensitive method for protein 3D structure validation. It computes the statistics of nonbonded interactions among atoms in the model structure, compares them with a database of high-resolution structures, and reports the result as an overall quality factor. The error values are also plotted as a function of the position of a sliding 9-residue window. In general, the higher the quality factor, the better the quality of the protein structure [25]. Similarly, Verify-3D is a program that assesses the compatibility of a protein 3D structure with its own amino acid sequence by assigning each residue a particular structural class, namely, alpha, beta, loop, polar, nonpolar, and so forth, based on its position and environment. The output of Verify-3D is a plot with the amino acid residues on the x-axis and the 3D-1D compatibility score on the y-axis [26]. The computed results for the four protein models (presented in Figure 6) indicate their structural reliability.
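The sliding-window idea mentioned above can be sketched generically. This is not ERRAT's error function, which is built from nonbonded atom-pair statistics; the sketch only shows how a per-residue value can be averaged over a 9-residue window as the window moves along the chain. The function name and the toy scores are assumptions.

```python
def sliding_window_means(values, window=9):
    """Return the mean of each consecutive window of `window` values.

    The result has len(values) - window + 1 entries, or is empty when the
    input is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide by one: add the entering value, drop the leaving one.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


# Hypothetical per-residue scores for an 11-residue stretch.
toy_scores = list(range(1, 12))
window_means = sliding_window_means(toy_scores)
print(window_means)
```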
Core and Exposed Residue Conservation Study.
The four predicted models were then submitted to the ConSurf server to study the conserved amino acid residues (Figure 7). The conserved amino acids in the protein core were then computed using the IPFP software. IPFP is a free integrated software tool consisting of several modules, of which the core finder module was used to compute the core amino acid residues (http://mcbi.mitsbiotech.org.in/software/ipfp.rar). First, the IPFP software computes the solvent accessible surface area of all residues with the Naccess program [27] from the given Protein Data Bank (PDB) file, using a user-defined probe size. Residues whose computed solvent accessible surface area is zero are then predicted as core residues. Results from both tools are summarized in Table 2.
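The core-residue rule described above (zero solvent accessible surface area) can be sketched as a simple filter. This is not the IPFP or Naccess code; it assumes per-residue areas have already been computed and are supplied as a dictionary. The function name, the `threshold` parameter, and the toy values are hypothetical.

```python
def core_residues(sasa_by_residue, threshold=0.0):
    """Return residues whose solvent accessible surface area is <= threshold.

    `sasa_by_residue` maps (residue_name, residue_number) to an area in
    square angstroms. The result is sorted by residue number.
    """
    selected = [
        residue
        for residue, area in sasa_by_residue.items()
        if area <= threshold
    ]
    return sorted(selected, key=lambda residue: residue[1])


# Hypothetical areas; these are not values from the study's models.
toy_sasa = {
    ("LEU", 130): 0.0,
    ("LYS", 20): 85.3,
    ("ILE", 14): 0.0,
    ("TRP", 45): 40.1,
}
core = core_residues(toy_sasa)
print(core)
```

Raising `threshold` slightly above zero would give a more permissive definition of buried residues, which is a design choice rather than something stated in the text.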
Similarly, the positions of aromatic amino acids on the protein surface (those not present in the core) were analysed and are presented in Table 3.
ERRAT plot footnotes: * … it is possible to reject regions that exceed that error value. ** Expressed as the percentage of the protein for which the calculated error value falls below the 95% rejection limit. Good high-resolution structures generally produce values around 95% or higher. For lower resolutions (2.5 to 3 Å), the average overall quality factor is around 91%.

Discussion
The current study found a clear similarity at both the sequence and structural levels among sequences from the different source organisms described above. Of the four conserved residues obtained from the multiple sequence alignment, lysine and aspartic acid were fully conserved, while cysteine and tyrosine were partially conserved in all bacterial and fungal sequences (Figure 2). Previous computational studies and crystallographic structure prediction suggest the presence of partially conserved cysteine residues in haloacid dehalogenase enzymes of bacterial species, which are also responsible for thermostability in archaea [28,29]. However, owing to the lack of a crystal structure of 2-haloalkanoic acid dehalogenase from fungi, no such information is available in the literature. The results also suggest that lysine and aspartate play a very important role in the evolution of 2-haloalkanoic acid dehalogenase sequences from prokaryotic organisms (bacteria) to eukaryotic organisms (fungi). Fully conserved lysine and aspartic acid residues have been reported previously in the haloacid dehalogenase superfamily, and it has been proposed that they might be involved in the catalytic site of these enzymes, which carries out the dehalogenation of xenobiotics [30]. Previously reported site-directed mutagenesis experiments also confirmed the importance of these two residues [31,32].
Advances in Bioinformatics. Table fragment (as extracted): -198; ILE-14, ILE-36, MET-126, LEU-130, VAL-180, VAL-190; ALA-7, LEU-10, THR-13, LEU-15, ALA-71, LEU-104, SER-119, GLY-121, SER-129, SER-150, SER-184, ALA-193.
Functional similarities, reflected in common motifs unique to the group, were observed. Above all, the presence of clusters for bacteria and fungi gives a clear indication of the evolutionary relationship between the species at the molecular level, which was confirmed by structural analysis using the ConSurf server. Usually, the core region contains more hydrophobic residues, which distinguishes it from the rest of the protein architecture. This arrangement corresponds to different contributions to binding energy, stability, and so forth [33]. In a protein, the substrate- and solvent-interacting sites are commonly more conserved than other sites, such as the core region. In our analysis of core conservation, however, buried serine was frequently observed. In addition, alanine was more conserved in the fungal enzyme structures, whereas leucine residues were conserved in the bacterial enzymes. The presence of a conserved surface-exposed tryptophan in the structures suggests multifunctional roles. Exposed aromatic residues have at times been found to be involved in substrate binding and activity [34,35].
One reason for this conservation could be resistance to differential evolutionary pressure in order to keep the protein stable [36,37]. In other cases, this aromatic amino acid plays a major role in the dimerization of proteins owing to its hydrophobicity, and the dimerization of 2-haloalkanoic acid dehalogenase enzymes reported in other studies is consistent with this phenomenon [38].

Conclusion
Patterns of sequence conservation in 2-haloalkanoic acid dehalogenase indicate a clear evolutionary relationship between bacteria and fungi at both the sequence and structural levels. Sequences from bacteria and fungi have a fundamental functional relationship, as they share motif identity. Because no 3D structures of fungal 2-haloalkanoic acid dehalogenase enzymes are available, structural modelling was performed to predict the 3D structure. The results illuminate structure-function relationships in 2-haloalkanoic acid dehalogenase, suggesting roles for conserved residues in the mechanism of conformational change during catalysis of haloacid pollutants. The phylogeny provides a rational evolutionary framework for classifying these enzymes. This in silico analysis of 2-haloalkanoic acid dehalogenase enzyme sequences revealed sequence-level similarity that could be used to design a strategy for cloning putative genes by PCR amplification with degenerate primers. In our follow-up study, the role of the exposed tryptophan in these enzymes will be analysed experimentally.