Specific and Nonhomologous Isofunctional Enzymes of the Genetic Information Processing Pathways as Potential Therapeutical Targets for Tritryps

Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryps) are unicellular protozoa that cause leishmaniasis, sleeping sickness, and Chagas' disease, respectively. Most drugs against them were discovered by screening large numbers of compounds against whole parasites. Nonhomologous isofunctional enzymes (NISEs) may offer good opportunities for identifying new putative drug targets because, although they share the same enzymatic activity, they possess different three-dimensional structures, which allows the development of molecules that act on one isoform but not the other. From public data on the Tritryps' genomes, we reconstructed the Genetic Information Processing Pathways (GIPPs). We then used AnEnPi to search for NISEs between Homo sapiens and the Tritryps, as well as for parasite-specific enzymes. We identified three candidates (ECs 3.1.11.2 and 6.1.1.-) in these pathways that may be further studied as new therapeutic targets for drug development against these parasites.


Introduction
Recent estimates indicate that more than one billion people living in tropical and subtropical regions of developing countries are at risk of contracting diseases, mostly endemic to these regions, caused by the protozoans Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi [1][2][3]. These three microorganisms, together known as the Tritryps (family Trypanosomatidae, order Kinetoplastida), also kill thousands of people every year [4]. Despite these facts, the diseases they cause are still classified as neglected diseases by health agencies [5].
The control of the diseases caused by these parasites currently depends on chemotherapy, since no vaccines are commercially available. Moreover, only a very limited set of drugs is available: most of them were discovered approximately 50 years ago, and they have disadvantages such as high toxicity, low efficacy, or high cost; the development of resistance is also a concern [6][7][8]. However, the recent publication of the Tritryps' genomes [9][10][11] has created new opportunities to understand several biological processes that were, until then, poorly understood or even unknown in these organisms [7,12].
Cellular functions are based on complex networks of interacting chemical reactions that produce observable results. The rapid development of DNA sequencing techniques has provided a huge amount of information, leading to a new understanding of how cellular processes are organized. As a first step, annotation data are used to classify genes into groups according to their functions. Some of the gene products are enzymes, proteins that catalyze cellular reactions and form part of complex biochemical pathways. In the postgenome era, the study of these pathways is gaining importance, both to improve our understanding of their dynamics and regulation and to discover previously unknown steps [13,14].
The reconstruction of biochemical pathways is considered an essential step in the study of cellular processes [15]. Applications of these reconstructions range from drawing the biological system to generating testable hypotheses about the structure and operation of a pathway, and from elucidating complex properties that cannot be inferred from a simple description of the individual components to recognizing potential drug targets in pathogenic organisms by identifying essential steps in these processes [16]. Several methods and databases are available for reconstructing these pathways from genome information; one of the main resources for this task is the KEGG database [13,17,18]. One way to link biological processes to genomic information is through EC numbers, which represent the reaction each enzyme catalyzes. Other functional classification schemes exist (reviewed by Ouzounis and collaborators [19]), but the EC system is certainly one of the most widely used by the scientific community.
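Because EC numbers are the link between pathway maps and genes in this kind of reconstruction, it helps to handle them as four-level identifiers (class, subclass, sub-subclass, serial number) in which an unassigned level is written as "-", as in the partial EC 6.1.1.- discussed later. The following is a minimal sketch under that assumption; the helper names parse_ec, is_complete, and ec_matches are hypothetical and not part of AnEnPi or KEGG.

```python
from typing import Optional, Tuple

# Four EC levels: class, subclass, sub-subclass, serial number.
# None stands for an unassigned level written as "-" (e.g. "6.1.1.-").
ECLevels = Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]


def parse_ec(ec: str) -> ECLevels:
    """Parse '3.1.11.2', 'EC 3.1.11.2' or 'ec:6.1.1.-' into four levels."""
    text = ec.strip()
    if text.lower().startswith("ec"):
        text = text[2:].lstrip(" :")  # drop an 'EC ' or 'ec:' prefix
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    levels = []
    for part in parts:
        if part == "-":
            levels.append(None)
        elif part.isdigit():
            levels.append(int(part))
        else:
            raise ValueError(f"invalid EC field {part!r} in {ec!r}")
    return (levels[0], levels[1], levels[2], levels[3])


def is_complete(ec: str) -> bool:
    """True when all four levels are assigned ('2.7.11.22' but not '6.1.1.-')."""
    return all(level is not None for level in parse_ec(ec))


def ec_matches(pattern: str, ec: str) -> bool:
    """True when 'ec' falls under 'pattern'; unassigned levels in 'pattern' match anything."""
    return all(p is None or p == e for p, e in zip(parse_ec(pattern), parse_ec(ec)))


print(is_complete("2.7.11.22"), is_complete("6.1.1.-"))  # True False
print(ec_matches("6.1.1.-", "6.1.1.27"))                  # True
```

Treating "-" as a wildcard lets a partial template such as 6.1.1.- be compared against a later, fully assigned number such as 6.1.1.27.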
Enzymes have a high degree of specificity for their substrates and are fundamental for any biochemical process. They act in an organized sequence, catalyzing successive reactions in enzymatic pathways, guaranteeing the maintenance of life in all organisms [20]. A particular group of enzymes, the nonhomologous isofunctional enzymes (NISE or analogous enzymes), executes the same function in different organisms, but without detectable similarity between their primary structures and, possibly, between their tertiary structures as well. Once analogy is detected between a pathogen's enzyme and its human counterpart, it may be possible to use this analog as a potential target for drug development, provided it belongs to an essential biochemical step of the pathogen. However, only a few studies have been done to identify and annotate isofunctional nonhomologous enzymes as such [21][22][23][24][25].
Maintenance of the genome depends on the efficiency and accuracy of DNA replication and of the repair systems. Through a series of complex interactions, the genome is transcribed and largely translated to produce the RNAs and proteins the organism needs. These molecules form its structure or participate in important reactions. For these reasons, the pathways of DNA replication and repair, transcription, and translation (part of the Genetic Information Processing Pathways, GIPPs) are among the most important processes for the organism's survival [26,27] and were therefore chosen as the targets of this study.
Analyses of genomic data from L. major, T. brucei, and T. cruzi have provided a global view of the protein-coding genes that encode enzymes of important pathways, identifying several processes that these parasites share with other species. A thorough examination of this information may reveal steps of the GIPPs that are particularly amenable to therapeutic intervention. New drugs may also be developed from inhibitors of biochemical processes that are essential to the parasite but absent from its hosts. In this work, we employed computational methods to identify both specific and nonhomologous isofunctional enzymes in the genetic information processing pathways of the Tritryps; these enzymes could serve as candidates for further studies aimed at validating them as drug targets.

Predicted Protein Sequences of Tritryps.
The dataset of predicted proteins of Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi was obtained from TriTrypDB (http://tritrypdb.org/tritrypdb/), as shown in Table 1.
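Predicted proteomes of this kind are normally distributed as FASTA files. As a minimal sketch of the first step of such a pipeline, the reader below loads one FASTA file per organism into a dictionary keyed by protein identifier. It assumes that the identifier is the first word of each header line; the function names and any file names you pass in are hypothetical, not taken from TriTrypDB or AnEnPi.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def load_proteomes(paths: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Read one FASTA file per organism into {organism: {protein_id: sequence}}."""
    proteomes = {}
    for organism, path in paths.items():
        with open(path) as handle:
            proteomes[organism] = dict(read_fasta(handle))
    return proteomes
```

A call such as load_proteomes({"L. major": "lmajor_proteins.fasta"}) (a hypothetical file name) would return the L. major protein sequences keyed by their identifiers.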

Pathways and Enzyme Classes.
A set of pathways (maps) covering replication and repair, transcription, and translation was obtained from KEGG (http://www.genome.jp/kegg/pathway.html#genetic). This dataset contains a complete biochemical description of the pathways related to genetic information processing in different organisms. The functions comprising each pathway were extracted from these descriptions as a collection of EC numbers and used as templates for reconstructing the corresponding pathways in the Tritryps. Each pathway is associated with a set of proteins, usually a list of enzyme families with their EC numbers. KEGG has a total of 10 maps distributed among these pathways: 6 for replication and repair; 2 for transcription, of which only one has an associated EC number; and 2 for translation, of which only one has an associated EC number.
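The extraction of EC templates per map can be sketched as a simple grouping step. This sketch assumes input in the two-column, tab-separated "path:map…" / "ec:…" layout that KEGG's REST link operation returns (for example, for a request such as https://rest.kegg.jp/link/ec/map03410, where map03410 is KEGG's base excision repair map). That format is an assumption to be checked against the current KEGG API documentation, and the authors' own extraction may have worked differently. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def ec_templates(link_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group EC numbers by map from lines such as 'path:map03410<TAB>ec:3.1.11.2'."""
    templates: Dict[str, Set[str]] = defaultdict(set)
    for line in link_lines:
        line = line.strip()
        if not line:
            continue
        source, target = line.split("\t")
        map_id = source.split(":", 1)[1]  # drop the 'path:' prefix
        ec = target.split(":", 1)[1]      # drop the 'ec:' prefix
        templates[map_id].add(ec)
    return dict(templates)


def maps_without_ec(map_ids: List[str], templates: Dict[str, Set[str]]) -> List[str]:
    """Maps with no associated EC number; these cannot serve as EC templates."""
    return [m for m in map_ids if not templates.get(m)]
```

A helper like maps_without_ec would flag transcription or translation maps that lack EC numbers, which corresponds to the maps counted above as having none.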

Clustering.
To group homologous enzymes with the same activity, we used the AnEnPi pipeline (http://www.dbbm.fiocruz.br/AnEnPi/) [22], which is based on a previous study in which enzymes were considered analogous (i.e., of different evolutionary origins) according to differences in their primary structures [24]. After clustering, enzymes within a given cluster are considered homologous, whereas enzymes of the same function in different clusters are considered analogous. Because the cutoff used in AnEnPi was derived from experimental data on enzymes, other values should probably be used for other types of proteins.
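The rule "same cluster means homologous, different clusters of the same EC mean analogous" can be illustrated with a small clustering sketch. The code below is a simplification: it assumes that pairwise similarity scores (for example, BLAST bit scores) among the sequences annotated with one EC number are already available, and it groups them by single linkage using a union-find structure. AnEnPi's actual procedure and cutoff are described in [22] and [24]; the threshold here is a placeholder parameter, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def single_linkage(ids: Iterable[str],
                   hits: Iterable[Tuple[str, str, float]],
                   min_score: float) -> List[Set[str]]:
    """Cluster sequences of one EC number: any pair scoring >= min_score is linked."""
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parents to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score and a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, Set[str]] = {}
    for i in parent:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=lambda c: sorted(c))


def has_analogs(clusters: List[Set[str]]) -> bool:
    """More than one cluster for the same activity suggests analogous forms."""
    return len(clusters) > 1
```

With links a–b and c–d above the threshold but b–c below it, the four sequences split into two clusters, so the activity would be flagged as having analogous forms.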

Protein Function Inference.
Using another module of AnEnPi, we inferred the functions of the predicted trypanosomatid proteins from the clusters obtained in the previous step. In this module, EC numbers are assigned on the basis of the sequence similarity reported by BLASTP [28], with the predicted proteins of the Tritryps as queries and the sequences of each AnEnPi cluster as subjects, as described in detail in [22]. The cutoff used for functional inference was an e-value of 10⁻²⁰.
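A minimal sketch of the EC assignment step follows. It assumes BLASTP results in the standard 12-column tabular format (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a hypothetical dictionary mapping each cluster sequence to its EC number. It keeps, for each query, the best-scoring hit that passes the e-value cutoff. This best-hit rule is a simplification; the exact rules used by AnEnPi are those described in [22].

```python
import csv
from typing import Dict, Iterable, Tuple


def infer_ec(blast_tab_lines: Iterable[str],
             subject_ec: Dict[str, str],
             max_evalue: float = 1e-20) -> Dict[str, str]:
    """Assign each query the EC number of its best-scoring hit with e-value <= max_evalue."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue or subject not in subject_ec:
            continue
        current = best.get(query)
        if current is None or bitscore > current[0]:
            best[query] = (bitscore, subject_ec[subject])
    return {query: ec for query, (_, ec) in best.items()}
```

Queries whose only hits are weaker than the cutoff (for example, e-value 10⁻⁵) receive no EC number.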

Genetic Information Processing Pathways Reconstruction and Search for NISE and Specific Enzymes.
The GIPPs were reconstructed from the functions inferred by the AnEnPi pipeline. After functional inference, the enzymatic activities shared by the Tritryps were identified using scripts written in Perl. NISE and specific enzymes were obtained by examining the clusters produced in the clustering step, where sequences of Tritryps and …
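The decision applied when examining the clusters can be sketched as follows (in Python rather than the Perl used by the authors). For one EC number, each cluster is represented by the set of organisms that have a sequence in it. An activity is "specific" when the parasite has it and the host does not, and "analogous" (a candidate NISE) when both have it but never in the same cluster. The function and label names are hypothetical; treating a parasite that appears both in a shared cluster and in a separate one as "homologous" is a conservative choice of this sketch, not a rule stated in the source.

```python
from typing import List, Set

HOST = "Homo sapiens"


def classify_activity(clusters: List[Set[str]], parasite: str, host: str = HOST) -> str:
    """Classify one enzymatic activity (EC number) for a parasite relative to the host.

    `clusters` holds, for each homology cluster of this EC number, the set of
    organisms that have at least one sequence in that cluster.
    """
    parasite_ids = {i for i, members in enumerate(clusters) if parasite in members}
    host_ids = {i for i, members in enumerate(clusters) if host in members}
    if not parasite_ids:
        return "absent"      # the parasite has no sequence with this activity
    if not host_ids:
        return "specific"    # activity found in the parasite but not in the host
    if parasite_ids.isdisjoint(host_ids):
        return "analogous"   # same activity, no shared cluster: candidate NISE
    return "homologous"      # parasite and host share at least one cluster
```

For example, clusters [{"L. major", "T. brucei"}, {"Homo sapiens"}] classify the activity as analogous for L. major, which is the pattern sought for a candidate target such as EC 3.1.11.2.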

Results and Discussion
KEGG, Clustering, and Enzymatic Activity Inference.
The Tritryps' genomes were first sequenced in 2005 [9][10][11], and all chromosomes are well characterized except those of T. cruzi, whose genome is highly repetitive. However, some of the GIPPs still present gaps [29]. The computational reconstruction of these processes in this work is an attempt to obtain a better representation of them, with emphasis on analogous and specific enzymes. Analogous enzymes catalyze the same reaction despite little or no significant similarity between their primary structures, which is reflected in differences in their 3D structures [24]. For this reason, recent efforts have been made to include this phenomenon in functional annotation [21,22,30]. Function inference based only on sequence similarity may be insufficient, since such methods usually cannot detect nonhomologous isofunctional enzymes.
The Tritryps share a series of features, such as the presence of subcellular structures like the kinetoplast and glycosomes. Each trypanosomatid is transmitted by a different vector and has a distinct life cycle, tissue specificity, and pathogenesis in its mammalian hosts [31,32]. In addition, they are considered "ancient" from an evolutionary perspective; indeed, they present peculiar mechanisms in some genetic information transmission processes, many of which still have gaps to be filled [33]. In this context, we compared the number of enzymatic activities shared among the three microorganisms (taking all pathways into account) with the number of unique activities, based on the results obtained after clustering (Figure 1). It is worth noting that some activities occur in the same isoform (or, more precisely, analog form) in all three microorganisms. Ideally, and depending on several other factors, this may serve as a basis for a single drug against the three pathogens or, much more likely, for a family of related molecules.
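The comparison of shared and unique activities described above amounts to partitioning EC numbers into Venn regions. The sketch below shows one way to compute those regions from per-organism sets of inferred EC numbers. The input sets and function names are hypothetical, and the code does not reproduce the counts in Figure 1.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(activities: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each combination of organisms to the EC numbers found in exactly those organisms."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for ec in set().union(*activities.values()):
        owners = frozenset(org for org, ecs in activities.items() if ec in ecs)
        regions.setdefault(owners, set()).add(ec)
    return regions


def region_sizes(regions: Dict[FrozenSet[str], Set[str]]) -> Dict[str, int]:
    """Readable counts keyed like 'L. major & T. brucei'."""
    return {" & ".join(sorted(owners)): len(ecs) for owners, ecs in regions.items()}
```

The region keyed by all three organisms holds the activities shared by the Tritryps, and single-organism regions hold the unique ones.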
KEGG has its own annotation protocol, which to our knowledge is not described in detail anywhere; only its general outline is known [17,34]. We opted to perform functional inference on all the predicted proteins of the Tritryps in order to obtain a unified and comparable dataset. For this, we ran BLASTP with the predicted proteins available in TriTrypDB against the clusters obtained. This made it possible to infer functions not detected by KEGG in almost all the pathways studied. Even with a very restrictive cutoff (e-value < 10⁻²⁰), more enzymes were identified (data not shown), supporting the validity of this approach. In fact, even with more restrictive e-values, such as 10⁻⁴⁰ or 10⁻⁸⁰, the results did not differ for several ECs (data not shown). With this information, some of the GIPPs were reconstructed. The enzymatic activities found by AnEnPi for each Tritryp are listed in Table 2.
… 1.16), which catalyzes the direct production of Cys-tRNACys, is lacking [35,36]. However, we could not identify the second enzyme that completes the alternative route of Cys-tRNACys formation, SepCysS. One possible explanation is that, while this route is essential in archaea that lack the direct pathway of Cys-tRNACys formation, it is not essential in the Tritryps. Alternatively, this enzyme may have a peculiar sequence or structure that has not yet been examined experimentally. The enzymatic activity represented by EC 2.1.2.9 (methionyl-tRNA formyltransferase), which is also part of the aminoacyl-tRNA biosynthesis map, was identified by KEGG only in L. major and T. brucei; AnEnPi also identified it in T. cruzi. This enzyme adds a formyl group to the initiator Met-tRNA, which starts the polypeptide chain during translation in bacteria. It has the same function in eukaryotes, acting in mitochondria [27]. Since mitochondria have a bacterial evolutionary origin, their translational apparatus follows the bacterial model.
The genomic data of the organisms studied in this work consist mainly of nuclear DNA. The presence of this enzyme's gene in the nuclear genome is consistent with the observed absence of tRNA genes from the mitochondrial DNA (kDNA) of the Tritryps, whose mitochondrial tRNAs are imported from the cytoplasm [37][38][39].
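The robustness check mentioned above, re-running the inference at e-value cutoffs of 10⁻²⁰, 10⁻⁴⁰, and 10⁻⁸⁰, can be sketched as follows. The sketch assumes that (protein, EC number, e-value) triples have already been extracted from the BLASTP results. It reports which EC numbers survive each cutoff and which are lost when moving to a stricter one. The function names and example data are hypothetical; the source reports these results only as "data not shown".

```python
from typing import Dict, Iterable, List, Set, Tuple


def ecs_at_cutoffs(hits: Iterable[Tuple[str, str, float]],
                   cutoffs: List[float]) -> Dict[float, Set[str]]:
    """For each e-value cutoff, the EC numbers supported by at least one hit at or below it.

    `hits` are (protein_id, ec_number, evalue) triples, e.g. best hits from BLASTP.
    """
    hits = list(hits)  # allow a generator to be scanned once per cutoff
    return {cutoff: {ec for _, ec, evalue in hits if evalue <= cutoff} for cutoff in cutoffs}


def lost_between(results: Dict[float, Set[str]], loose: float, strict: float) -> Set[str]:
    """EC numbers inferred at the loose cutoff that disappear at the strict one."""
    return results[loose] - results[strict]
```

An EC number that appears at all three cutoffs is supported by hits with e-values of 10⁻⁸⁰ or lower. This is the sense in which results "did not differ" for several ECs.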

Computational Reconstruction of the GIPPs.
DNA in cells is often under attack by mutagens, oxygen radicals, and ionizing radiation, and even normal cellular processes can create mutagenic and cytotoxic DNA lesions that may be lethal to the cell. Organisms possess broad DNA repair mechanisms to fix damaged DNA and thereby maintain viability and genomic stability [40]. In this context, we identified four enzymatic activities with complete (four-digit) EC numbers in three DNA repair pathways: base excision repair (EC 3.1.11.2), nucleotide excision repair (EC 2.7.11.22), and nonhomologous end-joining (EC 2.7.11.1 and EC 2.7.7.7) (Table 2).
The enzyme exodeoxyribonuclease III (EC 3.1.11.2; Figure S3 and Table S4) catalyzes the progressive degradation of double-stranded DNA in the 3′ to 5′ direction, releasing 5′-phosphomononucleotides, in the base excision repair (BER) pathway. The enzymes of this pathway are conserved from bacteria to humans, but mammalian enzymes frequently embed the catalytic core domains of their bacterial counterparts within a larger structural framework [40,41].
Cyclin-dependent kinase (EC 2.7.11.22) of the nucleotide excision repair (NER) pathway is associated with the holo-TFIIH complex (Figure S4 and Table S5), a multiprotein complex required not only for transcription but also for nucleotide excision repair. This enzyme phosphorylates the carboxy-terminal domain (CTD) of RNA polymerase II in the absence of promoter opening [42].
Nonhomologous end-joining (NHEJ) is a form of recombination that joins the ends of broken chromosomes without requiring a homologous template. The core NHEJ components are conserved from yeast to mammals and consist of the XRCC4/DNA ligase IV complex and the Ku70/Ku80 heterodimer, both of which protect exposed DNA from degradation. First, the catalytic subunit, formed by DNA-PKcs (EC 2.7.11.1, nonspecific serine/threonine protein kinase) and Artemis, is recruited. DNA-PKcs phosphorylates the Ku heterodimer and also Artemis, which is a nuclease. Interactions between these protein complexes bring the chromosomal ends together. Another enzyme essential to this process is DNA-directed DNA polymerase (EC 2.7.7.7), which fills in the gaps when the ends are joined (Figure S6 and Table S7) [43][44][45].

Specific Enzymes and Functional Analogs between the Tritryps and Homo sapiens as Potential Therapeutic Targets.
Data produced by the Tritryps genome projects have allowed researchers to establish new strategies against these diseases, which affect a large percentage of the world's population [46]. Most of the drugs proposed so far were discovered many years ago; several of them are toxic or have low efficacy, and resistance may also develop [7]. To search for functional analogs that could serve as candidate drug targets, we compared the primary structures of the enzymes present in both the Tritryps and H. sapiens. One case meeting these criteria was identified: exodeoxyribonuclease III (EC 3.1.11.2) from the BER pathway.
Exodeoxyribonuclease III is an exonuclease that cleaves the 5′ side of an AP (apurinic/apyrimidinic) site, acting in the base excision repair pathway [47]. In Escherichia coli, this enzyme is a DNA-modifying enzyme widely used in molecular biology; it acts on double-stranded DNA and is essentially inactive on single-stranded DNA. We searched for information about inhibitors of this enzyme in the BRENDA database (http://www.brenda-enzymes.org/). According to Hoheisel [48], double-stranded DNA was found to be a competitive inhibitor of the enzyme's activity. Other known inhibitors are EDTA (ethylenediaminetetraacetic acid) [49], Mn²⁺ at concentrations above 5 mM [50], NaCl [48], p-chloromercuribenzoate [51], PNAs (peptide nucleic acids) [52], and ZnCl₂ [51,53].
Apurinic/apyrimidinic sites are very toxic to cells if not repaired. These sites can be generated by normal aerobic metabolism, UV light, or H₂O₂. Exodeoxyribonuclease III (encoded by the xthA gene) can be considered a relevant target in the Tritryps because it plays an essential role in the BER pathway, a key repair system for neutralizing oxidative DNA damage. E. coli xthA mutant strains retain residual AP endonuclease activity owing to endonuclease IV (Endo IV), the protein encoded by the nfo gene. Mutants of the nfo or xthA genes are generally sensitive to oxidizing agents [54]. Several authors have pointed out that Exo III protects E. coli cells against the toxic effects of UV light and H₂O₂ [54][55][56][57] and is necessary to induce DNA damage repair [58].
Moreover, we also identified a potential therapeutic target unique to L. major, DNA-3-methyladenine glycosylase II (EC 3.2.2.21). This glycosylase cleaves the N-glycosidic bond between an alkylated base and the deoxyribose, removing the base and leaving an AP site [59,60].
O-phosphoseryl-tRNA synthetase (EC 6.1.1.-), assigned to the aminoacyl-tRNA biosynthesis map, was identified as an activity specific to the Tritryps when compared with H. sapiens. This enzyme, now designated EC 6.1.1.27, catalyzes the first step of the alternative route of Cys-tRNACys formation [61], as described above.
The TDR Targets database (http://tdrtargets.org/) integrates genetic and biochemical information with pharmacological data, primarily for tropical pathogens. Its main objective is to assist the search for targets through an integrative platform [62]. Neither of the two ECs identified (EC 3.1.11.2 and EC 6.1.1.27) had any information related to the Tritryps in this database, which suggests that the approach used in this work may increase the number of possible drug targets. However, exodeoxyribonuclease III (EC 3.1.11.2) is listed in this database as a potential target for other organisms. In addition, DNA-3-methyladenine glycosylase II (EC 3.2.2.21), which in this work was identified only in L. major, is also listed as a potential target, again for other organisms and not for Leishmania.
None of the enzymatic functions disclosed in this work has a resolved 3D structure in the PDB for any of the Tritryps. Resolved 3D structures, together with other information such as functional studies, are paramount to advancing research on these enzymes and to confirming that they are indeed viable drug targets. In the present work, we studied only some of the pathways assigned to the GIPPs in KEGG, leaving aside other important ones, such as those related to protein folding, sorting, and degradation, which comprise about 7 additional maps with several enzymes. Moreover, KEGG is updated weekly and has already integrated more information and maps into the GIPPs. In the future, a thorough reevaluation of the available data may disclose new cases of analogy and/or new specific enzymes.
The use of computers in drug discovery is constantly increasing because of their great potential to speed up the identification of suitable targets and useful compounds and also, arguably most importantly, to reduce costs. In this work, the development and use of computational methods allowed us to identify specific and nonhomologous isofunctional enzymes (NISE) in the genetic information processing pathways of the Tritryps. The identification of NISE allowed us to build an enriched list of proteins (containing not only organism-specific enzymes) that must be studied further to be validated as drug targets. These studies include (i) obtaining crystals of the selected proteins, or building 3D models of them by molecular modeling; (ii) molecular dynamics and docking studies, to obtain a refined representation of their structure, including its motion and possibly other interacting molecules; and (iii) functional studies to determine their kinetics, expression patterns, stability, essentiality, and so forth.