A Survey of Nucleotide Cyclases in Actinobacteria: Unique Domain Organization and Expansion of the Class III Cyclase Family in Mycobacterium tuberculosis

Cyclic nucleotides are well-known second messengers involved in the regulation of important metabolic pathways or virulence factors. There are six different classes of nucleotide cyclases that can accomplish the task of generating cAMP, and four of these are restricted to the prokaryotes. The role of cAMP has been implicated in the virulence and regulation of secondary metabolites in the phylum Actinobacteria, which contains important pathogens, such as Mycobacterium tuberculosis, M. leprae, M. bovis and Corynebacterium, and industrial organisms from the genus Streptomyces. We have analysed the actinobacterial genome sequences found in current databases for the presence of different classes of nucleotide cyclases, and find that only class III cyclases are present in these organisms. Importantly, prominent members such as M. tuberculosis and M. leprae have 17 and 4 class III cyclases, respectively, encoded in their genomes, some of which display interesting domain fusions seen for the first time. In addition, a pseudogene corresponding to a cyclase from M. avium has been identified as the only cyclase pseudogene in M. tuberculosis and M. bovis. The Corynebacterium and Streptomyces genomes encode only a single adenylyl cyclase each, both of which have corresponding orthologues in M. tuberculosis. A clustering of the cyclase domains in Actinobacteria reveals the presence of typical eukaryote-like, fungi-like and other bacteria-like class III cyclase sequences within this phylum, suggesting that these proteins may have significant roles to play in this important group of organisms.


Introduction
Cyclic AMP and cGMP are involved in diverse signalling networks in all life forms. In bacteria, cAMP is an important second messenger that regulates several operons and regulons (Cases et al. 1998). There are six different families of proteins that convert NTPs to either cAMP or cGMP (Danchin 1993;Shenoy et al. 2002), and these are distinguished by differences in amino acid sequence and catalytic mechanism, suggesting that these six classes have evolved independently in diverse organisms during evolution.
The class I enzymes are present in E. coli and related enteric bacteria (Danchin, 1993), and are represented by single-copy genes that code for an adenylyl cyclase involved in catabolite repression (Cases et al., 1998). Class II cyclases are represented by toxins secreted by Bacillus anthracis (Leppla, 1982), Bordetella pertussis (Weiss et al., 1984) and Pseudomonas aeruginosa (Yahr et al., 1998) that elevate cAMP levels within the host cells on infection. The crystal structure of the anthrax oedema factor, a calmodulin-activated adenylyl cyclase, shows that the catalytic mechanism of the class II cyclases appears to require a single divalent metal ion (Drum et al., 2002).
The class III enzymes are the most widely distributed and are found in bacteria, archaea and eukaryotes (Danchin, 1993; Shenoy et al., 2002). The mammalian G-protein coupled, receptor-activated adenylyl cyclases are members of the class III adenylyl cyclase family (Defer et al., 2000), as are the receptor and soluble guanylyl cyclases (Wedel et al., 2001). Crystal structures of the class III adenylyl cyclases (Tesmer et al., 1997; Bieger et al., 2001) and subsequent homology modelling and mutational analysis of the guanylyl cyclases (Liu et al., 1997) have led to the identification of amino acid residues responsible for substrate selectivity amongst these enzymes (Sunahara et al., 1998; Tucker et al., 1998; Hannenhalli et al., 2000). Classes IV, V and VI of the cyclase family have only a single representative each, found in Aeromonas (Sismeiro et al., 1998), Prevotella (Cotta et al., 1998) and Rhizobium (Tellez-Sosa et al., 2002), respectively.
Class III cyclases have been the most extensively studied, both structurally and biochemically. The mammalian 12-transmembrane adenylyl cyclases act as functional dimers of the C1 and C2 regions contained within a single polypeptide chain (Tang et al., 1998). The C1 and C2 regions are class III cyclase domains, hence mammalian membrane-bound adenylyl cyclases have two class III domains within their polypeptide chains. Crystal structures have revealed that the C1-C2 dimer interface gives rise to the active site, and hence dimerization and correct juxtaposition of residues from both protomers are essential for activity of these enzymes (Tang et al., 1998). The guanylyl cyclases, on the other hand, have only a single class III cyclase domain per polypeptide and act by homodimerization, as is seen in the receptor guanylyl cyclases, or as heterodimers of α and β subunits in the case of the soluble guanylyl cyclases (Wedel et al., 2001). Mutagenesis studies (Zimmermann et al., 1998) and structural similarity of the adenylyl cyclase core to the palm fold of DNA polymerases suggested that the cyclase reaction mechanism involves two-metal-ion catalysis. Two metal atoms are bound by two conserved aspartate residues, and the reaction involves the abstraction of the proton from the 3′-OH group of ATP and the nucleophilic attack by the negatively charged oxygen on the phosphodiester bond (Tesmer et al., 1997). The abstraction of the proton is facilitated by one of the two metal atoms in the class III cyclases (Tesmer et al., 1997), and this function is interestingly carried out by a suitably placed histidine residue in the class II cyclases (Drum et al., 2002). A pentavalent phosphate group is believed to be the transition state species that is stabilized by critical asparagine and arginine residues, which have been shown to be essential for activity by mutational analysis (Tesmer et al., 1997; Yan et al., 1997).
The binding of ATP requires a lysine and an aspartate, which are replaced by glutamate and cysteine in the guanylyl cyclases to allow utilization of GTP as substrate (Liu et al., 1997).
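The substrate-selectivity rule stated above can be written as a small lookup. This is a minimal sketch only: the function name and the returned labels are hypothetical, and the rule covers just the two residue pairs named in the text (Lys/Asp for ATP, Glu/Cys for GTP); any other pair is reported as undetermined rather than guessed.

```python
# Hypothetical helper; the name and labels are illustrative, not from the paper.
def predict_substrate(res1: str, res2: str) -> str:
    """Classify a class III cyclase by its two substrate-specifying residues.

    res1 and res2 are one-letter amino acid codes at the two positions that
    are Lys/Asp in adenylyl cyclases and Glu/Cys in guanylyl cyclases.
    """
    pair = (res1.upper(), res2.upper())
    if pair == ("K", "D"):
        return "ATP (adenylyl cyclase-like)"
    if pair == ("E", "C"):
        return "GTP (guanylyl cyclase-like)"
    # Any other combination cannot be assigned from sequence alone.
    return "undetermined"


print(predict_substrate("K", "D"))
print(predict_substrate("E", "C"))
print(predict_substrate("R", "L"))  # an Arg/Leu pair matches neither pattern
```

A sequence-based call of this kind is only a prediction; substrate preference still has to be confirmed biochemically.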
The Actinobacteria include several bacteria of medical or industrial interest. Streptomyces is a ubiquitous soil organism known for its versatile metabolic prowess, especially in forming useful secondary metabolites, the production of some being regulated by cAMP (Susstrunk et al., 1998; Horinouchi et al., 2001). Mycobacterium tuberculosis is the aetiological agent of tuberculosis and belongs to an order that includes other pathogens such as M. leprae and several members in other genera, such as Corynebacterium, Nocardia and Actinomyces (NCBI; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy). Interestingly, M. tuberculosis infection of macrophages has been shown to lead to increased cAMP in macrophages (Lowrie et al., 1979) and M. microti has been thought to escape macrophage killing by the release of cAMP (Lowrie et al., 1975). Since studies with Bordetella pertussis (Confer et al., 1982), Bacillus anthracis (Hoover et al., 1994) and Trypanosoma (Wirth et al., 1982) have demonstrated the importance of cAMP in reducing phagocyte activity, it would be interesting to investigate whether cAMP plays any role in compromising the functioning of the macrophage in the case of Mycobacterium infection. An analysis of the M. tuberculosis genome showed the presence of several cyclases and cNMP binding proteins (Cole et al., 1998; McCue et al., 2000) and recently at least two gene products that code for adenylyl cyclases from M. tuberculosis have been the focus of biochemical studies (Guo et al., 2001; Reddy et al., 2001; Linder et al., 2002; Shenoy et al., 2003). However, no comprehensive analysis of the genomes from these bacteria has been performed to date in terms of the domain fusions and critical catalytic residues present in these proteins.
In this study, we have analysed the completed genome sequences of members of the Actinobacteria for the presence of nucleotide cyclases from all six classes. Interestingly, only sequences similar to the class III cyclases are present in these bacteria, and prominent pathogens such as M. tuberculosis and M. leprae have more than one class III cyclase encoded in their genomes. Many of the putative cyclase genes have interesting domain fusions that have not been identified in other class III cyclases to date, suggesting that biochemical analysis of these proteins and their regulation could be a fruitful area of research in the future.

Searches
PSI-BLAST (Altschul et al., 1997) was performed with an inclusion (h) value cut-off of 10⁻⁴ until convergence, using catalytic domain sequences of representative adenylyl and guanylyl cyclases as queries. The catalytic domain (amino acids 212-443; Shenoy et al., 2003) of the M. tuberculosis Rv1625c gene product, shown to have adenylyl cyclase activity (Guo et al., 2001; Reddy et al., 2001), was also used as the seed sequence in PSI-BLAST. This seed performed better than other cyclase domain sequences. A position-specific scoring matrix (PSSM) generated using the alignment of the cyclase domains of the 16 M. tuberculosis cyclases was also used in the searches. The large number of pseudogenes in M. leprae (Cole et al., 2001) prompted us to search for the possible existence of cyclase pseudogenes. This was carried out using the same PSSM in a PSI-TBLASTN search on the nucleotide sequence of the M. leprae genome. The amino acid sequences were then mapped onto the annotated genome, and all eight of the pseudogenes identified corresponded to genes labelled as pseudogenes earlier (Cole et al., 2001).
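As a rough illustration of the seed preparation and search set-up described above, the sketch below cuts a 1-based, inclusive residue range out of a protein sequence and assembles a command line for the `psiblast` program of the present-day NCBI BLAST+ suite. These are assumptions: the original searches were run with the PSI-BLAST release available at the time, and the file name, database name and placeholder sequence here are hypothetical. The command is only built, not executed.

```python
def extract_region(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region lies outside the sequence")
    return seq[start - 1:end]


def psiblast_command(query_fasta: str, db: str, inclusion_evalue: float = 1e-4):
    """Build (but do not run) a BLAST+ psiblast command line.

    -inclusion_ethresh sets the E value threshold for including a hit in the
    profile; -num_iterations 0 asks psiblast to iterate until convergence.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-inclusion_ethresh", str(inclusion_evalue),
        "-num_iterations", "0",
    ]


# Placeholder sequence of 500 residues, standing in for a real protein.
toy_protein = "A" * 500
seed = extract_region(toy_protein, 212, 443)
print(len(seed))  # 232 residues, i.e. 443 - 212 + 1

# "seed.fasta" and "actinobacteria_proteins" are hypothetical names.
print(" ".join(psiblast_command("seed.fasta", "actinobacteria_proteins")))
```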
Hidden Markov model (HMM) based searches were performed using HMMer (Eddy, 2001; http://hmmer.wustl.edu/) in the global (ls) and local (fs) modes using the class III cyclase HMM (Accession No. PF00211) from the Protein Families (Pfam) database of alignments and HMMs (Pfam; http://www.sanger.ac.uk/Software/Pfam/; Bateman et al., 2002) at an Expect (E) value cut-off of 0.1. Additional cyclase sequences from bacteria and eukaryotes were introduced or removed to generate additional models for the searches. The cyclase domains of the 16 M. tuberculosis cyclases were also used to generate a model for HMM search.
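The E value cut-off of 0.1 amounts to a simple filter over the search output. The sketch below assumes the whitespace-delimited per-sequence table written by `hmmsearch --tblout` in current HMMER3 releases, where the target name is the first column, the full-sequence E value the fifth and the bit score the sixth; the searches reported here used an earlier HMMer release with a different output layout. The target names, query name and scores in the example are invented placeholders, not results.

```python
def filter_hits(lines, max_evalue=0.1):
    """Keep hits with full-sequence E value <= max_evalue, best first.

    Each data line is split on whitespace; column 0 is the target name,
    column 4 the full-sequence E value and column 5 the bit score.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return sorted(hits, key=lambda hit: hit[1])


# Invented example lines: seqA/seqB/seqC and cyclase_hmm are placeholders.
example = [
    "# target  accession  query  accession  E-value  score  bias",
    "seqA - cyclase_hmm PF00211 1e-40 140.0 0.1",
    "seqB - cyclase_hmm PF00211 0.05 12.0 0.0",
    "seqC - cyclase_hmm PF00211 2.5 3.0 0.0",
]
for target, evalue, score in filter_hits(example):
    print(target, evalue, score)
```

With these placeholder lines, seqC is discarded because its E value exceeds the cut-off.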
Domain organizations of the cyclases were predicted using the HMMs from the Pfam (Bateman et al., 2002) and the Simple Modular Architecture Research Tool (SMART) websites (SMART; http://smart.embl-heidelberg.de/; Letunic et al., 2002), and the graphical outputs, with modifications, have been used in the figures. RPS-BLAST (Schaffer et al., 1999) using the Cluster of Orthologous Groups (Tatusov et al., 2001), Pfam and SMART matrices (NCBI; ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/) was also used, along with the Conserved Domain Architecture Retrieval Tool (CDART; http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps), for assessing domain organization of the sequences. Transmembrane spanning helices were predicted using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). Multiple sequence alignments were computed using HMMalign (Eddy, 2001; http://hmmer.wustl.edu/), Clustal X (Jeanmougin et al., 1998) and T-Coffee (Notredame et al., 2000) and were subsequently manually edited. Clustal X was used to generate Figures 2 and 5. Alignments of the cyclase domains were then transferred to the Molecular Evolutionary Genetics Analysis software (Kumar et al., 2001) for generating and bootstrapping the neighbour-joining tree.
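Two ingredients of a bootstrapped distance tree can be illustrated without the tree-building step itself: a pairwise distance computed from aligned sequences, and resampling of alignment columns with replacement to make one bootstrap replicate. The sketch below is an assumption-laden toy, not the procedure implemented in the software used here: it uses the uncorrected p-distance, ignores columns with a gap in either sequence, and runs on an invented three-sequence alignment.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns without a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping rows in register."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Invented toy alignment (equal-length rows, '-' marks a gap).
toy = {"s1": "MKVLA-GD", "s2": "MKILAAGD", "s3": "MRILSAGE"}

print(round(p_distance(toy["s1"], toy["s2"]), 3))  # 1 difference in 7 columns

rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_replicate(toy, rng)
print({name: len(seq) for name, seq in replicate.items()})
```

In a full analysis, a distance matrix would be computed for each replicate, a tree built from each matrix, and the support for a branch reported as the fraction of replicate trees that contain it.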

Cyclases in Actinobacteria
The presence of multiple classes of cyclases, without sequence similarity to each other, indicates the independent evolution of proteins for catalysing the conversion of ATP to cAMP. PSI-BLAST searches in Actinobacteria with query sequences from all classes of adenylyl cyclases yielded hits above the E value cut-off defined in Materials and methods for only the class III cyclases. PSI-BLAST and HMM searches (see Materials and methods) for sequences similar to the currently known class III cyclases revealed the presence of several class III cyclase members within the genomes of many Actinobacteria (Tables 1-3). We identified 62 proteins in this study, each having a single class III cyclase domain. The genomes of Bifidobacterium longum and Tropheryma whipplei did not reveal the presence of members of any of the currently described classes of nucleotide cyclases, and therefore these organisms may have no need of cAMP as a regulatory molecule, or have novel proteins that need to be identified through classical screening procedures.

Table 1 (notes). The genome of M. bovis is highly similar to that of M. tuberculosis and also has identical cyclase genes. The primary annotations for the M. tuberculosis CDC1551 and M. bovis genomes for corresponding genes from M. tuberculosis H37Rv are listed. The genes are classified based on the domain analysis that has been performed and described in this study. The protein lengths in amino acids (aa) for the M. tuberculosis H37Rv proteins are given, as per the primary annotation. M. tuberculosis CDC1551 has one cyclase more than the other two mycobacteria (MT1360).

Table 2 (notes). The full-length genes (gene identifiers and primary annotations are listed) represent those that code for proteins that match genes in M. tuberculosis (M. tb). The ML2341 cyclase is unique to M. leprae. The pseudogenes corresponding to cyclases were identified by PSI-TBLASTN and have been identified in the original genome annotation (Cole et al., 2001). The protein length is given in amino acids (aa).

Table 3 (notes). Gene identifiers and primary annotations for completed genome sequences are shown. The protein lengths are given in amino acids (aa).

Cyclases in M. tuberculosis and M. bovis
An analysis of the M. tuberculosis H37Rv genome for the presence of cNMP binding proteins and cNMP metabolizing enzymes has been reported earlier, and predicted 15 cyclases in the genome (McCue et al., 2000). Recently, the genome sequence of the CDC1551 strain has been made available (Fleischmann et al., 2002) and we have analysed this genome as well for the members of the nucleotide cyclase family, using methods described above. Searches with queries from families other than the class III cyclase family did not yield hits with scores above cut-off, as described in Materials and methods. The largest number (17) of cyclases is seen in the genome of M. tuberculosis CDC1551, followed by the M. tuberculosis H37Rv genome which, in our analysis, shows 16 putative cyclase genes ( Table 1). The presence of one extra cyclase in the CDC1551 strain compared to the H37Rv strain is explained by expansion of the Rv1318c cyclase gene family (Fleischmann et al., 2002). All M. tuberculosis putative cyclase genes have only a single class III cyclase domain per polypeptide chain, in contrast to the eukaryotic adenylyl cyclases. Highly similar genes (>99% sequence identity) are present in the CDC1551 strain (the gene names from the CDC1551 strain are mentioned in parentheses in the text). A variety of domains are found fused to the cyclase domain in these 16 genes ( Figure 1) and some of these fusions are unique to the actinobacterial cyclases (see below). The genome of M. bovis reveals the presence of 16 cyclases identical to those in M. tuberculosis H37Rv ( Table 1).

Genes that code for proteins with only an identifiable cyclase domain
There are six genes in the M. tuberculosis genome that, upon prediction, seem to code for only a class III cyclase domain, with no additional domains identifiable from current databases. These are Rv0891c (MT0915), Rv1120c (MT1152), Rv1264 (MT1302), Rv1359 (MT1403), Rv1647 (MT1685), and Rv2212 (MT2268) (see Table 1, Figure 1). As shown in Table 1, Rv0891c and Rv1359 contain the number of amino acids minimally required for a functional cyclase domain. Rv1264, Rv1647 and Rv2212 are longer proteins and contain >100 amino acids adjacent to the cyclase domain. However, these sequences do not contain additional domains that are listed in Pfam currently. Rv0891c appears to have all the residues required for catalytic activity in class III cyclases. However, it has non-conserved residues (arginine and leucine) at positions responsible for substrate selectivity, in contrast to those seen in the mammalian enzymes ( Figure 2). A clustering based on the sequence alignment of representative actinobacterial cyclase domains identified in this study and other class III cyclases ( Figure 3) shows that the Rv0891c cyclase domain is more related to the fungal and parasite cyclases and a group of Actinobacteria-specific NB-ARC (nucleotide-binding-common to Apaf1, plant resistance gene products and CED4) domain (van der Biezen et al., 1998) containing cyclases as described below.
The Rv0891c (MT0915) gene is adjacent to the Rv0890c gene, which is predicted to have an NB-ARC domain and a C-terminal helix-turn-helix (HTH) DNA-binding domain, and the operonic nature of these two genes (TIGR; http://www.tigr.org/tigr-scripts/operons/pairs.cgi?taxon id=89) hints at their possible functional interplay.
A close examination of Rv1120c (MT1152) reveals that the protein is only 164 amino acids long, which is smaller than a typical class III cyclase catalytic domain (∼200-250 amino acids). A number of conserved class III cyclase-like residues are present in Rv1120c, but it lacks the more C-terminal residues responsible for substrate selectivity and transition-state stabilization ( Figure 2). A BLASTN search on the M. avium genome reveals the presence of an orthologue (80% identity to the first 149 amino acids of Rv1120c) whose protein length is long enough to encode a complete cyclase domain. We therefore aligned the nucleotide sequence in the H37Rv genome, beyond the predicted stop codon of the Rv1120c, against the corresponding region of the unfinished genome of M. avium and found that a single base deletion in the M. tuberculosis genome has led to a frame shift and premature truncation of the Rv1120c polypeptide ( Figure 4). The putative Rv1120c orthologue in M. avium extends to the end of an ORF that is contained within the Rv1119c gene in M. tuberculosis. Therefore, the putative gene in M. avium possesses functional class III cyclase C-terminal residues that are lacking in the M. tuberculosis protein. It is possible therefore, that a loss of the full-length cyclase, seen in the Rv1120c cyclase gene in M. tuberculosis H37Rv (and CDC1551), has occurred and that Rv1120c could represent the only cyclase pseudogene in M. tuberculosis. The status of the Rv1120c orthologue in M. bovis is identical to that in M. tuberculosis.
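The effect of a single-base deletion on a reading frame can be shown with a toy open reading frame. The sequence below is invented for illustration and is not taken from Rv1120c or from M. avium; the codon table is the standard genetic code. Deleting one base shifts every downstream codon, and in this example the shifted frame runs into a stop codon almost immediately.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(dna: str, index: int) -> str:
    """Return the sequence with the base at 0-based position index removed."""
    return dna[:index] + dna[index + 1:]


# Invented toy ORF: ATG AAA CTG ACC GGT CGT AAA TAA
wild_type = "ATGAAACTGACCGGTCGTAAATAA"
mutant = delete_base(wild_type, 3)  # frame becomes ATG AAC TGA ...

print(translate_orf(wild_type))  # MKLTGRK
print(translate_orf(mutant))     # MN
```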
The Rv1264 (MT1302) adenylyl cyclase was recently characterized biochemically and its N-terminal region was found to downregulate the activity of the catalytic domain (Linder et al., 2002). Rv1264 is more closely related to the Streptomyces cyclases in having a short deletion just N-terminal to the substrate-selective aspartate residue (Figure 2). Another related cyclase is the Rv2212 (MT2268) cyclase, which is as yet uncharacterized biochemically. The clustering shown in Figure 3 reveals that it positions in a group of cyclases that includes proteins from Streptomyces, Brevibacterium and Thermobifida. Rv2212 has all the residues required for catalytic activity and appears to be specific for ATP as a substrate (Figure 2). Rv1264 has sequence similarity (29% identity in the cyclase domain) to the cyclase from Brevibacterium, which has been shown to be activated by pyruvate. However, Rv1264 was not stimulated by pyruvate or other small molecules (Linder et al., 2002).
Both Rv1359 (MT1403) and the adjacent Rv1358 (MT1402) genes contain the class III cyclase domain. However, Rv1359 has higher sequence similarity to the Rv0386 (MT0399) cyclase than to its neighbour (47% identity to Rv0386 compared to 38% to Rv1358). In fact, Rv0386 is an enzyme that as far as domain organization is concerned, is identical to Rv1358 (Figure 1 and see below). With the absence of one of the metal binding aspartates as well as the critical arginine in its gene product, Rv1359 is likely to be inactive as a homodimer ( Figure 2).
The catalytic domain of Rv1647 (MT1685) is similar (28-34% sequence identity) to several cyclases from nitrogen-fixing soil bacteria such as Sinorhizobium meliloti and Mesorhizobium loti. The gene appears to harbour the residues required for catalytic activity and therefore is likely to possess cyclase activity (Figure 2). The same region of the protein also picks up the GGDEF domain (Galperin et al., 2001) when the SMART database PSSM profiles are used during RPS-BLAST (E value 0.01). The GGDEF domain is thought to be homologous to the class III cyclase domain. However, with a very robust hit (E value 10⁻²⁸) to the class III cyclase domain, and the presence of all residues required for catalysis, it is probable that its gene product could be an active adenylyl cyclase.

Membrane-bound cyclases
Several putative cyclases in the M. tuberculosis genome have transmembrane helices and hence could be localized to the membrane (Figure 1). There are six predicted membrane-bound cyclases in the H37Rv genome and seven in the CDC1551 genome (accounted for by the addition to the Rv1318c family). Interestingly, a majority of these putative cyclases contain six transmembrane spanning domains. In addition, except Rv1625c and its orthologues in M. tuberculosis CDC1551 (MT1661) and M. bovis (Mb1651c), all the other putative membrane-associated cyclases have an intracellular juxtamembrane HAMP (present in histidine kinases, adenylyl cyclases, methyl accepting chemotactic receptors and phosphatases) domain (Aravind et al., 1999b; Galperin et al., 2001), followed by the C-terminal cyclase domain. The Rv1625c adenylyl cyclase, with six transmembrane helices and a single catalytic domain, is the adenylyl cyclase gene product that has been the most studied to date (Guo et al., 2001; Reddy et al., 2001; Shenoy et al., 2003). Homodimerization of the Rv1625c gene product is seen, and would give rise to a 12-transmembrane-spanning domain containing enzyme that is similar to the mammalian enzymes.
The high sequence similarity of the Rv1625c gene product with the mammalian enzymes (35-40% identity to eukaryotic adenylyl and guanylyl cyclases) does not, however, translate into identity in its biochemical behaviour (Shenoy et al., 2003). The protein could not be converted into a guanylyl cyclase by replacement of ATP-specifying residues with those for GTP present in guanylyl cyclases, probably due to alterations in the dimeric status and improper juxtaposition of the critical interfacial residues (Shenoy et al., 2003). This indicates a difference in the dimer interface of a homodimeric cyclase such as Rv1625c and that of the heterodimeric mammalian adenylyl cyclases.
The most common domain found to occur in fusion with the cyclase domain in the M. tuberculosis strains is the HAMP domain (Figure 1). The HAMP domain is believed to act as a structural linker between a sensor domain and the C-terminal effector domain, e.g. the kinase, phosphatase or, as in the study here, a cyclase domain (Galperin et al., 2001). Mutations in the HAMP domain usually cause constitutive activation of fused downstream effector domains (Appleman et al., 2003). The proline and glutamate residues conserved in HAMP domains are also conserved in the HAMP domains of actinobacterial cyclases (Figure 5A). There are four members of this type of cyclase in the H37Rv strain and five in CDC1551 (Fleischmann et al., 2002), where the six transmembrane helices are followed by a HAMP domain. Expression of one of these representatives, the Rv1320c cyclase, in E. coli led to the production of protein that was inactive (Reddy et al., 2001). However, sequence alignment reveals that Rv1320c, Rv1319c and Rv1318c all have residues required for catalytic activity (Figure 2), and it is anticipated that the catalytic domains alone of these proteins may have adenylyl cyclase activity, with the HAMP domain providing a regulatory role. The likelihood that this family of proteins could be important for the physiology of the organism is suggested by the fact that the Rv1319c transcript has been detected by microarray analysis as being repressed during hypoxia (Sherman et al., 2001). The Rv3645 (MT3763) cyclase also has six transmembrane helices similar to the Rv1318c family of proteins, and all residues required for catalysis are present in the catalytic domain (Figures 1, 2). Interestingly, as described later, M. leprae and Corynebacterium have orthologues of this cyclase rather than those from the Rv1318c cyclase family. A multiple sequence alignment of the HAMP domain of Rv3645 with other HAMP domains is shown in Figure 5A.
The Rv2435c (MT2509) cyclase has two transmembrane domains and intracellular HAMP and cyclase domains (Figure 1), and therefore has similarity in domain organization with the two-transmembrane bacterial chemotaxis receptors (Mowbray et al., 1998). This gene product could therefore represent a receptor that senses some unknown ligand. Its N-terminal region up to the catalytic domain is similar to a methyl-accepting chemotaxis protein from Vibrio vulnificus (27% identity to gi 27366778 with an E value of 4 × 10⁻⁴³). The similarity extends within the HAMP domain as well, and it appears that the fusion of this module to a cyclase domain gave rise to this protein in mycobacteria. However, the HAMP domain in Rv2435c differs significantly in sequence from the HAMP domain in the Rv1318c family (Figure 5A), suggesting that the event that brought together the HAMP domain and a cyclase domain in a single protein occurred independently more than once during the evolution of the M. tuberculosis genome.
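The membrane-bound architectures described for the H37Rv strain can be tabulated and counted as a consistency check. The record type and field names below are hypothetical, and the assignment of six helices plus a HAMP domain to each of Rv1318c, Rv1319c, Rv1320c and Rv3645 is our reading of the text (the four members of the six-helix HAMP type), not a reproduction of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MembraneCyclase:
    gene: str          # H37Rv gene identifier
    tm_helices: int    # predicted transmembrane helices
    has_hamp: bool     # juxtamembrane HAMP domain present


# Architectures as read from the text above; an interpretation, not Table 1.
H37RV_MEMBRANE_CYCLASES = [
    MembraneCyclase("Rv1318c", 6, True),
    MembraneCyclase("Rv1319c", 6, True),
    MembraneCyclase("Rv1320c", 6, True),
    MembraneCyclase("Rv3645", 6, True),
    MembraneCyclase("Rv2435c", 2, True),
    MembraneCyclase("Rv1625c", 6, False),
]

six_tm_hamp = [c.gene for c in H37RV_MEMBRANE_CYCLASES
               if c.tm_helices == 6 and c.has_hamp]
without_hamp = [c.gene for c in H37RV_MEMBRANE_CYCLASES if not c.has_hamp]

print(len(H37RV_MEMBRANE_CYCLASES))  # 6 predicted membrane-bound cyclases
print(len(six_tm_hamp))              # 4 with six helices followed by HAMP
print(without_hamp)                  # ['Rv1625c']
```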
At the nucleotide level, the stop codon of Rv2435c overlaps by one base with the start codon of Rv2434c, suggesting that the Rv2435c cyclase exists in an operon with Rv2434c (TIGR; http://www.tigr.org/tigr-scripts/operons/pairs.cgi?taxon id=89). The latter gene encodes a transmembrane channel with a C-terminal cNMP-binding domain (Cole et al., 1998; McCue et al., 2000) and might thus be functionally regulated by Rv2435c. One report suggests that the Rv2435c and Rv2434c transcripts are repressed during hypoxia (Sherman et al., 2001). Despite this interesting insight, it is puzzling to note that Rv2435c lacks one of the metal binding sites and both the transition state-stabilizing asparagine and arginine residues (Figure 2).
Rv1900c (MT1951) is annotated as lipJ (Cole et al., 1998) and has an N-terminal lipid esterase domain that is identified by PSI-BLAST (Figures 1, 5B). The corresponding gene in the M. bovis genome is annotated as a probable lignin peroxidase (Mb1935c). The N-terminal region of Rv1900c is similar to the 3-oxoadipate enol-lactone hydrolase from Pseudomonas sp. (22% identity to gi 17736948 at E value 5 × 10⁻⁹¹), which is a protein amongst the top hits that has been biochemically characterized (Gobel et al., 2002). This domain is also identified as an αβ-hydrolase domain in the COG database (COG0596, MhpC, predicted hydrolase or acyltransferase). The C-terminal region of Rv1900c is similar to class III cyclases (Figure 2) and therefore this unique domain combination makes it interesting to investigate whether both domains are enzymatically functional. The cyclase domain, however, lacks the critical asparagine residue (Figure 2) and might therefore play only a regulatory role, by providing a site for binding of an allosteric small molecule such as a nucleotide. An alignment of the N-terminal region of Rv1900c that encompasses the αβ-hydrolase domain reveals the presence of the conserved active site serine residue that is known in other enzymes of this family. However, the other two residues in the catalytic triad (Ollis et al., 1992), an aspartate or glutamate and a histidine, are not present (Figure 5B).

Figure 5 (legend, partial). The numbers 1-4 stand for alpha helices. (D) Alignment of the NB-ARC domain from actinobacterial cyclases and representative proteins. The proteins shown are Arabidopsis thaliana (A.thal) resistance proteins, such as resistance to Pseudomonas syringae protein 2 (RPS2), resistance to P. syringae protein 3 (RPM1) and putative resistance RPP13-like protein 4 (R134). In addition there are Homo sapiens (H.sap) Apaf1 and Caenorhabditis elegans (C.ele) CED4 proteins. The regions within the NB-ARC domain are labelled as identified earlier (Neuwald et al., 1999; Jaroszewski et al., 2000).

Three cyclases (Rv0386, Rv1358 and Rv2488c) are found in fusion with an NB-ARC domain that is C-terminal to the cyclase domain, and are the only proteins with a cyclase domain at the extreme N-terminus. The NB-ARC domain is a nucleotide binding domain of the AAA+ superfamily (ATPases Associated with several cellular Activities), which has Walker motifs required for nucleotide binding (Neuwald et al., 1999). The NB-ARC domain has been shown in some proteins to bind and hydrolyse ATP, thereby acting as a switch that regulates the protein's function. In some proteins ATP hydrolytic activity is absent, and the domain has been shown to be involved in oligomerization and the formation of large protein assemblies (Jaroszewski et al., 2000).
These soluble cyclases also possess an extreme C-terminal LuxR-type HTH domain (Bateman et al., 2002). A similar domain organization is absent outside the mycobacteria, based on an analysis with available genome sequences (Shenoy AR, unpublished observations), and suggests the importance of this family of cyclases in the biology of mycobacteria. Several transcription factors are known to have an ATPase domain and the HTH domain (Yeats et al., 2003), and therefore the presence of a class III cyclase domain N-terminal to these domains is suggestive of an interesting regulation of a possible DNA-binding activity of these proteins. The HTH domain in these cyclases is similar to the GerE/LuxR family of transcription regulators. Figure 5C shows the multiple sequence alignment of HTH domains from the actinobacterial cyclases and other members of this family.
As described earlier, the Rv0891c soluble cyclase exists in an operon with the NB-ARC and HTH-containing Rv0890c protein. The observation that the gene for the NB-ARC and HTH-containing protein is found adjacent to a gene encoding a putative cyclase domain in the genome suggests an interplay between these domains, which may have led to the ultimate evolution of a protein containing all three domains fused together in M. tuberculosis species. This is further supported by BLAST searches with the Rv0891c and Rv0890c proteins that show the Rv0386, Rv1358 and Rv2488c cyclases as the top three hits (data not shown). The NB-ARC domains of these proteins show the presence of typical Walker A and B motifs (Figure 5D) identified in the AAA+ superfamily of proteins (Neuwald et al., 1999; Jaroszewski et al., 2000). These domains were also identified earlier in a study of ATPases involved in apoptosis (Aravind et al., 2001).
Amongst the soluble cyclases, only the Rv0386 cyclase seems to possess all residues required for catalysis (Figure 2). The Rv1358 cyclase lacks the second metal binding aspartate as well as the critical asparagine residue. Rv2488c lacks the critical arginine residue (Figure 2). The absence of these residues might indicate the lack of activity, unless there are other compensatory mutations to overcome the effect of these substitutions in the overall structure of these proteins. However, these cyclases do not have any of the typical adenylyl or guanylyl cyclase-like residues at the positions known to determine substrate selectivity. Thus, the nucleotide that these cyclases bind can only be determined through biochemical studies, although it seems more likely that this variant of the class III cyclase might be better suited to bind stretches of DNA sequences, due to the lack of well-defined nucleotide selectivity.
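The residue-based reasoning used in this and the preceding sections can be summarized as a lookup of which catalytic residues each gene product is reported to lack. The dictionary below is a hypothetical encoding of statements made in the text; in particular, the "one of the metal binding sites" missing from Rv2435c is recorded here as a missing metal-binding aspartate, which is a simplifying assumption. A complete residue set predicts, but does not demonstrate, activity.

```python
# Catalytic residues reported as absent, transcribed from the text.
# The labels are illustrative; an empty set means none are reported missing.
MISSING_RESIDUES = {
    "Rv0386": set(),
    "Rv1358": {"metal-binding Asp", "Asn"},
    "Rv2488c": {"Arg"},
    "Rv1359": {"metal-binding Asp", "Arg"},
    "Rv2435c": {"metal-binding Asp", "Asn", "Arg"},  # "metal binding site" in the text
    "Rv1900c": {"Asn"},
}


def has_full_catalytic_set(gene: str) -> bool:
    """True when no catalytic residue is reported missing for this gene product."""
    return not MISSING_RESIDUES[gene]


def lacking(residue: str):
    """Genes whose products lack the named residue, in sorted order."""
    return sorted(g for g, missing in MISSING_RESIDUES.items() if residue in missing)


complete = sorted(g for g in MISSING_RESIDUES if has_full_catalytic_set(g))
print(complete)        # ['Rv0386']
print(lacking("Arg"))  # ['Rv1359', 'Rv2435c', 'Rv2488c']
print(lacking("Asn"))  # ['Rv1358', 'Rv1900c', 'Rv2435c']
```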

Cyclases in Mycobacterium leprae
The M. leprae genome has a large number of pseudogenes (Cole et al., 2001). Genome biologists have studied pathogens with smaller genomes so as to identify the minimum set of genes that are required for virulence, and comparative studies on the genomes of M. leprae and M. tuberculosis would also help to narrow down the list of genes that are required for virulence of the tubercle bacilli (Vissa et al., 2001). The putative cyclase genes are no exception to the general degeneration of genes observed in the M. leprae genome. There are only four genes in the M. leprae genome that encode putative cyclases, viz. ML0201, ML1399, ML1753 and ML2341 (Table 2). There are at least eight identifiable pseudogenes (Cole et al., 2001).

Membrane-bound cyclase
The ML0201 cyclase in M. leprae has six transmembrane helices, a HAMP domain and a C-terminal cyclase domain (Figure 6). This is a cyclase similar to those of the Rv1318c cyclase family. However, the enzyme has high sequence similarity to the Rv3645 cyclase (79% identity across the full-length sequences) and is in a region syntenic in M. tuberculosis. ML0201 has all the residues required for catalytic activity (Figure 2). Its presence in the genome indicates the importance of the HAMP domain cyclases (alignment shown in Figure 5A) in mycobacteria, since M. leprae has retained at least one member from this family.
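Percentage identity values such as those quoted here depend on how the alignment columns are counted. The sketch below shows one common convention, assumed here because the text does not state which was used: identical residues divided by the number of columns in which neither sequence has a gap. The aligned strings are hypothetical toy sequences.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical pre-aligned toy sequences (gaps shown as '-').
pid = percent_identity("ACDE-FG", "ACDQKFG")
print(round(pid, 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and these give lower values for the same alignment.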

Soluble cyclases
Three of the four cyclases in M. leprae are predicted to be soluble enzymes (Table 2). The ML1399 cyclase is very similar (74% identical) to the Rv1647 cyclase and lies in a syntenic region of the genome. It possesses all residues required for catalytic activity, and its substrate selectivity residues classify it as an adenylyl cyclase (Figure 2). The ML1753 and ML2341 cyclases are from the interesting family of cyclases that have an NB-ARC domain fused to the cyclase domain (Figures 5C, D and 6). The ML1753 cyclase is the orthologue of M. tuberculosis Rv1358, as judged by its position in a syntenic region on the chromosome and its domain architecture. However, it lacks the second metal-binding residue and the transition state stabilization residues and therefore might be inactive as a cyclase, much like Rv1358.
The ML2341 cyclase, with only a cyclase domain and an NB-ARC domain, does not have a corresponding cyclase in M. tuberculosis (Figures 5D, 6). The ML2341 cyclase seems to have all the residues required for catalytic activity, except the critical asparagine, and might bind ATP in preference to GTP (Figure 2). This gene lies in a region of the M. leprae genome that is flanked by pseudogenes possibly corresponding to Rv3728 and Rv3730c in M. tuberculosis. However, the Rv3729 gene does not correspond to the ML2341 gene in M. leprae.

Cyclases in related bacteria
The Corynebacterium cyclases
The two Corynebacterium genomes analysed here, C. efficiens YS-314 and C. glutamicum ATCC13032, each have a single putative cyclase (Table 3, Figure 7). These proteins appear to have six transmembrane-spanning domains with HAMP and cyclase domains, and contain all the residues required for cyclase activity (Figures 2, 5A). The proteins are similar (∼40% identity) to the Rv3645 cyclase (Figures 3, 5A), and analysis of the region of the genome in these species indicates that these genes are flanked by genes similar to those found adjacent to Rv3645 in M. tuberculosis, viz. Rv3646 (DNA topoisomerase) and Rv3644c (subunit of DNA polymerase III). These genes could thus represent orthologues of the Rv3645 cyclase.
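The flanking-gene argument used above to propose orthology can be expressed as a simple check on gene order. This is a minimal sketch, not the analysis performed here: the M. tuberculosis identifiers are those named in the text, whereas the Corynebacterium gene names and the orthologue mapping are hypothetical placeholders. Chromosome circularity and gene orientation are ignored.

```python
def flanking(gene_order, gene):
    """Return the (left, right) neighbours of a gene in an ordered gene list."""
    i = gene_order.index(gene)
    left = gene_order[i - 1] if i > 0 else None
    right = gene_order[i + 1] if i + 1 < len(gene_order) else None
    return left, right


def flanks_conserved(order_a, gene_a, order_b, gene_b, orthologue_of):
    """True if both neighbours of gene_b map onto the neighbours of gene_a.

    orthologue_of maps gene names in genome B to gene names in genome A.
    Comparing sets makes the check independent of strand orientation.
    """
    flanks_a = set(flanking(order_a, gene_a)) - {None}
    flanks_b = {orthologue_of.get(g) for g in flanking(order_b, gene_b)} - {None}
    return bool(flanks_a) and flanks_a == flanks_b


# Gene order in M. tuberculosis as named in the text.
mtb = ["Rv3644c", "Rv3645", "Rv3646"]
# Hypothetical placeholder names for a Corynebacterium locus.
cory = ["cory_topo", "cory_cyclase", "cory_polIII"]
mapping = {"cory_topo": "Rv3646", "cory_polIII": "Rv3644c"}

conserved = flanks_conserved(mtb, "Rv3645", cory, "cory_cyclase", mapping)
print(conserved)
```

Conserved neighbours support, but do not prove, orthology; sequence similarity and domain architecture are considered alongside gene order.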

The Streptomyces cyclases
The Streptomyces are in a separate suborder within the Actinomycetales, the Streptomycineae, unlike the Corynebacterium and Mycobacterium species, both of which are within the suborder Corynebacterineae (NCBI; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy). The genomes of Streptomyces coelicolor and S. avermitilis reveal the presence of only one cyclase gene each. However, unlike in Corynebacterium, these single cyclases are soluble cyclases that cluster with the Rv1264 and Rv2212 cyclases of M. tuberculosis (Table 3, Figure 7). A single cyclase each has been found in S. griseus and S. avermitilis, whose full-length sequences are 66% and 71% identical, respectively, to the cyclase from S. coelicolor. However, as seen in the alignment (Figure 2), there is a significant deletion near the second substrate selectivity residue and an insertion in the C-terminus of these cyclases, when compared to all other actinobacterial cyclases. The cyclases from   and S. griseus (Horinouchi et al., 2001) have been shown biochemically to be adenylyl cyclases (Figure 2).

Other actinobacterial cyclases
We identified cyclase genes through a search of other actinobacterial sequences deposited in nrdb, such as Brevibacterium liquefaciens, Thermobifida fusca, Arthrobacter nicotinovorans and Actinosynnema pretiosum subsp. auranticum. All these cyclases are potential soluble enzymes and, except for the cyclase from A. pretiosum subsp. auranticum, the other three have only an identifiable cyclase domain (Table 3, Figure 7).
The T. fusca cyclase is 39% identical to the cyclase from B. liquefaciens. The Brevibacterium cyclase has been studied extensively and is activated by pyruvic acid (Lynch et al., 1975;Peters et al., 1991). These cyclases also have a deletion immediately N-terminal to the second substrate selectivity residue but do not have the insertion seen in the Streptomyces cyclases. They cluster with the Streptomyces cyclases, as seen in Figure 3.
An interesting cyclase was identified in Arthrobacter nicotinovorans, which is a soil bacterium and in that respect is like Streptomyces. The cyclase domain, although divergent (only 17% identical to the cyclase domain of Rv1625c), has all residues required for catalysis, with a conservative substitution (arginine to lysine) of the transition state stabilization residue (Figure 2), and could therefore be active.
The most unusual domain organization of any cyclase within the Actinobacteria is probably that of A. pretiosum subsp. auranticum, which has two additional domains involved in regulation of transcription: a domain similar to the transcription regulatory protein, C-terminal (Pfam Accession No. 00486) and the bacterial transcriptional activator domain (BTAD; Pfam Accession No. 03704; Figure 7). The alignment of this cyclase domain with other cyclases reveals the absence of almost all residues required for catalysis (Figure 2). This protein lies in a region of the genome that is implicated in the biosynthesis of a secondary metabolite from A. pretiosum (Yu et al., 2002). This domain is 19% identical to the cyclase domain of Rv1625c. Based on the other domains found in the protein and the known similarity of the class III cyclase domain to the DNA polymerase, this domain might also be a DNA-binding domain.

Discussion
The presence of different classes of cyclases in a single bacterial genome is rare, and so far only Pseudomonas aeruginosa seems to have adenylyl cyclases represented by class I, class II and class III (Yahr et al., 1998;Wolfgang et al., 2003). cAMP was recently shown to regulate virulence pathways in P. aeruginosa (Wolfgang et al., 2003). The class III cyclases seem to be far more widely distributed in bacterial genomes (Danchin, 1993;Shenoy et al., 2002). As evident in our study, the Actinobacteria contain only the class III cyclases, and completely lack representatives of all the other five classes of cyclases. Interestingly, only the class III cyclases have been found to exist in multiple copies within bacteria (e.g. Anabaena and Myxococcus), as is seen in the eukaryotes (Defer et al., 2000). The number and variations in domain fusions that are seen in the cyclase genes of Mycobacterium spp. suggest that this organism could utilize these signalling proteins in its obligate intracellular parasitic lifestyle. No putative cyclase genes are present within any of the regions of difference (Brosch et al., 2001;Cole, 2002) used to classify M. tuberculosis strains and related species (Kato-Maeda et al., 2001), suggesting that all organisms in this species retain identical copies of the cyclase genes.
It has been suggested that the class II cyclases originated from the DNA polymerase β family and the class III evolved from the DNA polymerase α family (Aravind et al., 1999a). We find here that the cyclase domain exists along with domains seen in subunits of the DNA polymerase, again suggesting a common evolutionary origin for these proteins. The fact that the NB-ARC domain is also found in subunits of the DNA clamp loader (Jaroszewski et al., 2000) has led to the identification of possible new roles for the cyclase domain, such as transcription, as indicated by fusion to domains such as C-terminal transcription regulator, BTAD and HTH DNA-binding domains. The presence of putative inactive cyclase domains, fused along with other domains, also indicates an allosteric regulatory role for the cyclase domain, in being possibly responsive to nucleotide-like molecules. of E. coli (Shenoy and Visweswariah, unpublished observations). This could reflect the presence of a large number of cyclases in its genome, and possibly that more than one cyclase is expressed at any given time.
In conclusion, the cyclases that are predicted within the Actinobacteria all appear to be adenylyl cyclases, and we have not identified any guanylyl cyclases, based on the substrate selectivity residues identified in mammalian class III cyclases. The cyclase genes that we have identified here thus appear to encode a large and interesting group of proteins that might be involved in important signalling events. Further biochemical and structural studies of these proteins in Actinobacteria might improve our understanding of the biology of several important pathogenic bacteria.