In Silico Analysis for Transcription Factors With Zn(II)2C6 Binuclear Cluster DNA-Binding Domains in Candida albicans

A total of 6047 open reading frames in the Candida albicans genome were screened for Zn(II)2C6-type zinc cluster proteins (or binuclear cluster proteins) involved in DNA recognition. These fungal proteins are transcription regulators of genes involved in a wide range of cellular processes, including metabolism of different compounds such as sugars or amino acids, as well as multi-drug resistance, control of meiosis, cell wall architecture, etc. The selection criteria used in the sequence analysis were the presence of the CysX2CysX6CysX5-16CysX2CysX6-8Cys motif and a putative nuclear localization signal. Using this approach, 70 putative Zn(II)2C6 transcription factors have been found in the genome of C. albicans.


Introduction
Biological systems contain an important group of proteins characterized by their ability to bind DNA and their participation in important processes such as DNA replication and repair and the transcriptional control of genes. Gene expression can be controlled at various levels, including transcription, mRNA splicing, mRNA stability, translation and even post-translational events such as protein stability and modification. Genes contain many regulatory sequences that bind various transcription factors. These regulatory sequences are mainly located upstream (5′) of the transcription initiation site, although some elements occur downstream (3′) or even within the genes themselves. The number and type of regulatory elements vary from gene to gene. Moreover, different cell types express characteristic combinations of transcription factors; this is the major mechanism of cell-type specificity in the regulation of gene expression.

Materials and methods
The BLAST utility provided by the C. albicans genome database (http://www.pasteur.fr/recherche/unites/GalarFungail; Altschul et al., 1997) was used to search for putative transcription factors containing the Zn(II)2C6 binuclear motif. The C. albicans putative Zn(II)2C6 proteins were aligned using the ClustalW online interface (http://www.ebi.ac.uk/clustalw; Thompson et al., 1994) and manual alignment. After alignment, the output data were submitted to the Phylip drawtree web interface at the Institut Pasteur (http://bioweb.pasteur.fr/seqanal/interfaces/drawgram.html; Lim and Zhang, 1999) to obtain the phenogram. Comparative analysis between S. cerevisiae and C. albicans Zn(II)2C6 putative transcription factors was carried out by reciprocal comparison of the SGD (http://www.yeastgenome.org) and C. albicans database entries. SCANPROSITE (http://www.expasy.org/tools/scanprosite; Gattiker et al., 2002) was also used to search for proteins matching the consensus sequence. PSORTII (http://psort.ims.u-tokyo.ac.jp; Horton and Nakai, 1997) was used for subcellular localization prediction. The potential for dimerization via coiled-coil structures was investigated using the COILS program (http://www.ch.embnet.org/software/COILS_form.html; Lupas et al., 1991) as described by Taylor and Zhulin (1999).

Figure: … C. albicans Zn(II)2C6 domain. The structure was produced by threading it to the S. cerevisiae Cyp1 (Hap1) DNA-binding domain (PDB: 1PYC) using the JIGSAW utility (http://www.bmm.icnet.uk/servers/3djigsaw; Bates and Sternberg, 1999). The position of the six cysteines is annotated. The figure was generated using the Swiss-PdbViewer (Guex and Peitsch, 1997).
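SCANPROSITE takes patterns written in PROSITE syntax, in which the cysteine criterion can be expressed as C-x(2)-C-x(6)-C-x(5,16)-C-x(2)-C-x(6,8)-C. This is our transcription; the exact pattern string submitted is not given above. A minimal sketch for reproducing the search locally converts such a pattern into a Python regular expression. The helper name `prosite_to_regex` is hypothetical, and it handles only a subset of the syntax (single residues, `x`, `[..]`, `{..}`, repeat counts and the `<`/`>` terminal anchors).

```python
import re

# Hypothetical helper: covers single residues, x, [..], {..}, (n)/(n,m) repeats
# and the < > terminal anchors; other PROSITE features are not handled.
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset) PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                   # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                  # single residue or [..] class as-is
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


ZN2C6_PROSITE = "C-x(2)-C-x(6)-C-x(5,16)-C-x(2)-C-x(6,8)-C"
print(prosite_to_regex(ZN2C6_PROSITE))  # C.{2}C.{6}C.{5,16}C.{2}C.{6,8}C
```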

Results and discussion
In silico screening for potential Zn(II)2C6 transcription factors
The determination of the complete genomic sequence of Candida albicans (http://www-sequence.stanford.edu/group/candida), annotated by the European Galar Fungail Consortium (http://www.pasteur.fr/recherche/unites/GalarFungail), has allowed us to search for new putative transcription factors containing the Zn(II)2C6 binuclear motif. The selection criterion was the presence of the cysteine pattern CysX2CysX6CysX5-16CysX2CysX6-8Cys. All 6047 C. albicans ORFs were screened with this criterion, yielding a set of 70 potential Zn(II)2C6 transcription factors (Table 1), including the four previously known Zn(II)2C6 proteins, viz. CaFcr1p, CaSuc1p, CaCzf1p and CaCwt1p. In the complete genome of S. cerevisiae, a total of 58 Zn(II)2C6 proteins have been reported (Akache et al., 2001).
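The screening criterion maps directly onto a regular expression. The sketch below assumes the ORFs are already loaded as a dict from hypothetical ORF identifiers to protein sequences (FASTA parsing omitted). The toy sequences are synthetic, not real C. albicans ORFs. It applies only the cysteine pattern; the nuclear localization filter mentioned in the abstract would be a separate step.

```python
import re

# Selection criterion from the text: CysX2CysX6CysX5-16CysX2CysX6-8Cys
ZN2C6 = re.compile(r"C.{2}C.{6}C.{5,16}C.{2}C.{6,8}C")


def screen(orfs: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Return {orf_id: (start, end)} of the first motif match in each ORF."""
    hits = {}
    for orf_id, protein in orfs.items():
        match = ZN2C6.search(protein.upper())
        if match:
            hits[orf_id] = match.span()
    return hits


# Synthetic toy input (hypothetical IDs and sequences, not C. albicans data)
toy_orfs = {
    "orfA": "MSE" + "CASCRKRKVKCDEKRPACSNCRKRGLEC" + "TYQ",
    "orfB": "MSTLLNNAVQ",
}
print(screen(toy_orfs))  # {'orfA': (3, 31)}
```

Over a real genome, `len(screen(orfs))` would give the candidate count before any further filtering.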

Structure of the Zn(II)2C6 domain
The characteristic DNA-binding domain of Zn(II)2C6 proteins contains a highly conserved CysX2CysX6CysX5-16CysX2CysX6-8Cys motif, which was first described in S. cerevisiae (Pan and Coleman, 1990). In this motif the six cysteine residues maintain the structure by binding two zinc atoms (Todd and Andrianopoulos, 1997). Cys1 and Cys4 each bind both zinc ions, whereas the remaining cysteine residues are terminal ligands (Figure 2) (Pan and Coleman, 1990). The metal-binding domain is composed of two substructures, each containing three cysteine residues. Cys1-Cys2 and Cys4-Cys5 are canonically separated by two amino acid residues, and Cys2-Cys3 by six. The Cys3-Cys4 separation is highly variable (5-16 amino acid residues), while the Cys5-Cys6 separation is 6-8 residues long. The 70 ORFs found in C. albicans as putative Zn(II)2C6 proteins were aligned using the ClustalW program (Thompson et al., 1994); 25 of them fit exactly the most restrictive pattern, CysX2CysX6CysX6CysX2CysX6Cys.
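The spacer rules above can be checked per sequence by capturing each inter-cysteine segment and comparing it with the most restrictive pattern. This is a minimal sketch; the two domains are synthetic examples, not C. albicans sequences. When a domain carries extra cysteines, the regex returns only one of the possible parses, so ambiguous cases would still need checking against the alignment.

```python
import re
from typing import Optional

# Capture the five inter-cysteine spacers of the permissive pattern
LOOSE = re.compile(r"C(.{2})C(.{6})C(.{5,16})C(.{2})C(.{6,8})C")
# Most restrictive pattern: CysX2CysX6CysX6CysX2CysX6Cys
STRICT = re.compile(r"C.{2}C.{6}C.{6}C.{2}C.{6}C")


def cys_spacings(domain: str) -> Optional[tuple]:
    """Return the (Cys1-2, Cys2-3, Cys3-4, Cys4-5, Cys5-6) spacer lengths, or None."""
    match = LOOSE.search(domain)
    if match is None:
        return None
    return tuple(len(spacer) for spacer in match.groups())


def fits_strict(domain: str) -> bool:
    """True if the domain matches the most restrictive spacing."""
    return STRICT.search(domain) is not None


d1 = "CASCRKRKVKCDEKRPACSNCRKRGLEC"    # synthetic, 6-residue Cys3-Cys4 spacer
d2 = "CASCRKRKVKCDEKRPAGLCSNCRKRGLEC"  # synthetic, 8-residue Cys3-Cys4 spacer
print(cys_spacings(d1), fits_strict(d1))  # (2, 6, 6, 2, 6) True
print(cys_spacings(d2), fits_strict(d2))  # (2, 6, 8, 2, 6) False
```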
Another amino acid residue important for DNA binding is a lysine located between Cys2 and Cys3 (Figure 2). In some S. cerevisiae Zn(II)2C6 proteins, such as Gal4p and Pdr1p (Laughon and Gesteland, 1984), this lysine makes the specific contact with the CGG triplet in the DNA. This lysine is conserved in 57 of the C. albicans putative Zn(II)2C6 transcription factors (Figure 2), whereas 13 sequences contain arginine or histidine instead.
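Counts such as the 57 lysines versus 13 arginine or histidine substitutions come from reading a single column of the alignment. A small sketch of that tally follows; the three aligned sequences and the column index 5 are hypothetical stand-ins for the real alignment and the real position of the conserved lysine.

```python
from collections import Counter


def column_tally(alignment: dict[str, str], column: int) -> Counter:
    """Count which residue each aligned sequence carries at one alignment column."""
    return Counter(seq[column] for seq in alignment.values())


def lysine_vs_substitutions(alignment: dict[str, str], column: int) -> tuple[int, int]:
    """Return (number of Lys, number of Arg or His) at the given column."""
    counts = column_tally(alignment, column)  # missing residues count as 0
    return counts["K"], counts["R"] + counts["H"]


# Hypothetical gapped alignment rows
toy_alignment = {"p1": "CASC-KVK", "p2": "CAAC-RVK", "p3": "CSSC-KLK"}
print(lysine_vs_substitutions(toy_alignment, 5))  # (2, 1)
```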
The subregion between Cys3 and Cys4 is highly variable. Although most of the motifs have a six-residue spacer here, its length ranges from five to 16 residues. A proline residue is present in almost all the ORFs identified (Figure 2) and may be involved in the turn required between the two α-helical subregions. The subregions between Cys2 and Cys3 and between Cys5 and Cys6 are rich in basic amino acids. No consensus sequence has been found in the N- and C-terminal regions flanking the six-cysteine domain. However, basic amino acid residues (Lys and Arg) predominate at the N-terminus and, to a lesser extent, at the C-terminus. These basic regions could permit or enhance DNA recognition.

S. Maicas et al.

Figure 2. Alignment of the C6 zinc cluster region in C. albicans proteins. The colour code is as follows: Cys, yellow and highlighted; Arg, His and Lys, blue (residues making specific DNA contact are also highlighted and in italics); Met, Val, Leu, Ala and Ile, grey; Glu and Asp, pink; Phe, Tyr and Trp, purple; Gln, Ser, Asn and Thr, green; Gly and Pro, red.
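The observation that Lys and Arg predominate in the regions flanking the six-cysteine core can be quantified as the basic-residue fraction on each side of the motif match. In this sketch, the 10-residue flank width is a hypothetical choice and the test protein is synthetic.

```python
import re
from typing import Optional

ZN2C6 = re.compile(r"C.{2}C.{6}C.{5,16}C.{2}C.{6,8}C")


def basic_fraction(segment: str) -> float:
    """Fraction of Lys/Arg residues in a segment (0.0 for an empty segment)."""
    return sum(aa in "KR" for aa in segment) / len(segment) if segment else 0.0


def flank_basicity(protein: str, width: int = 10) -> Optional[tuple[float, float]]:
    """Return (N-flank, C-flank) basic fractions around the first cysteine core."""
    match = ZN2C6.search(protein)
    if match is None:
        return None
    start, end = match.span()
    n_flank = protein[max(0, start - width):start]
    c_flank = protein[end:end + width]
    return basic_fraction(n_flank), basic_fraction(c_flank)


# Synthetic protein: basic N-flank, cysteine core, mostly non-basic C-flank
protein = "GGGGGRKRKK" + "CASCRKRKVKCDEKRPACSNCRKRGLEC" + "KAAAAAAAAA"
print(flank_basicity(protein))  # (0.5, 0.1)
```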
Structure of the C. albicans Zn(II)2C6 proteins
Zn(II)2C6 transcription regulators are composed of two clearly distinct domains responsible for DNA binding and activation (Todd and Andrianopoulos, 1997, and references therein). In C. albicans, the Zn(II)2C6 domain is usually found in the N-terminal region of the protein, with a few exceptions: five at the C-terminus and one in the middle of the protein (Figure 3). Whiteway et al. (1992) identified the gene CZF1, which conferred moderate pheromone resistance on S. cerevisiae. Czf1p has an overall structure resembling that of a transcription factor, with a glutamine-rich region in the central part and a cysteine-rich region at the C-terminus of the protein.
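Classifying each domain as N-terminal, C-terminal or central needs an explicit rule, which the text does not state. This sketch assumes a hypothetical "thirds" rule based on the midpoint of the motif match, and borrows the Nt/Ct labels used in the tables.

```python
def domain_position(protein_length: int, start: int, end: int) -> str:
    """Label a domain span 'Nt', 'Ct' or 'middle' using a hypothetical thirds rule."""
    midpoint = (start + end) / 2
    if midpoint < protein_length / 3:
        return "Nt"
    if midpoint >= 2 * protein_length / 3:
        return "Ct"
    return "middle"


print(domain_position(900, 20, 60), domain_position(900, 820, 860), domain_position(900, 430, 470))
# Nt Ct middle
```

A stricter alternative would count a domain as terminal only within a fixed number of residues of either end; the thirds rule is simply easy to state.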
The similarity outside the zinc cluster region reported for some S. cerevisiae Zn(II)2C6 proteins has also been found in C. albicans. Some putative fungal proteins contain a characteristic domain involved in transcription control (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF04082), although its function has not been elucidated to date (Marczak and Brandriss, 1991; Hedges et al., 1995; Kasten and Stillman, 1997; van Peij et al., 1998). A search of the C. albicans database for Zn(II)2C6 proteins containing this fungal domain revealed at least 12 ORFs (Figure 3). Moreover, some of these ORFs (PUT3, STB5, CAT8, PPR1, IPF20023 and DAL81) show a high level of identity with their respective S. cerevisiae homologues.
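"Level of identity" depends on how gaps are treated, and the text does not specify a definition. The sketch below assumes one common convention: identical residues as a percentage of aligned columns in which neither sequence has a gap. The input rows are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(round(percent_identity("CASC-KVK", "CAAC-RVK"), 1))  # 71.4
```

Dividing by the full alignment length, or by the shorter sequence, would give lower values for gapped pairs, so reported identities are only comparable under one convention.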
SCANPROSITE analysis (Gattiker et al., 2002) revealed the presence of other interesting motifs (Figure 3): (a) glutamine-rich regions, resembling those previously described for other transcription factors (Aro et al., 2001), which may form hydrogen bonds with target factors (Courey and Tjian, 1988); (b) proline-rich regions, which may fold into a unique structure that contacts the transcription machinery (Mermod et al., 1989); IPF10079 and IPF9499 have such a region at the C-terminus, and RGT1 and IPF13024 at the N-terminus, close to the Zn(II)2C6 motif; (c) histidine-, serine- and threonine-rich regions (Figure 2); and (d) basic leucine zipper domains, which are frequent in both S. cerevisiae (Fernandes et al., 1997) and C. albicans (Yang et al., 2001) and are implicated in protein dimerization (Busch and Sassone-Corsi, 1990).
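Residue-rich regions like those listed above can also be located with a simple composition window. This is a swapped-in technique, not what SCANPROSITE does internally. In this sketch the window length and minimum count are hypothetical thresholds, and overlapping qualifying windows are merged into one span.

```python
def rich_regions(seq: str, residues: str, window: int = 20, min_count: int = 8) -> list[tuple[int, int]]:
    """Merge all windows of `window` residues holding >= min_count residues from `residues`."""
    spans: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        count = sum(aa in residues for aa in seq[start:start + window])
        if count >= min_count:
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], start + window)  # extend overlapping span
            else:
                spans.append((start, start + window))
    return spans


toy = "A" * 10 + "Q" * 12 + "A" * 10  # synthetic glutamine-rich stretch
print(rich_regions(toy, "Q", window=10, min_count=6))  # [(6, 26)]
```

For the other classes, the same function is called with `"P"` for proline-rich regions or `"HST"` for histidine/serine/threonine-rich regions.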
Global analysis of these proteins with the PSORTII program (Horton and Nakai, 1997) revealed a nuclear localization signal (Talibi and Raymond, 1999; Moreno et al., 2003) in most of them, suggesting a nuclear localization (Table 2).
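PSORT II's documentation describes nuclear localization signal patterns that include a 4-residue pattern (four K/R, or three K/R plus one H or P) and a bipartite pattern (two basic residues, a 10-residue spacer, then at least three K/R among the next five). The sketch below is a simplified reimplementation of those two rules as we understand them. It is not PSORT II, which combines such patterns with other features before predicting localization, and the function names are hypothetical.

```python
BASIC = set("KR")


def pat4_hits(seq: str) -> list[int]:
    """Start positions of 4-residue windows: 4 K/R, or 3 K/R plus one H/P."""
    hits = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        n_basic = sum(aa in BASIC for aa in window)
        n_hp = sum(aa in "HP" for aa in window)
        if n_basic == 4 or (n_basic == 3 and n_hp == 1):
            hits.append(i)
    return hits


def bipartite_hits(seq: str) -> list[int]:
    """Start positions of: 2 K/R, a 10-residue spacer, then >= 3 K/R in the next 5."""
    hits = []
    for i in range(len(seq) - 16):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            if sum(aa in BASIC for aa in seq[i + 12:i + 17]) >= 3:
                hits.append(i)
    return hits


def has_nls(seq: str) -> bool:
    return bool(pat4_hits(seq) or bipartite_hits(seq))


print(pat4_hits("AAKRKRAA"), pat4_hits("AKRHK"))  # [2] [1]
print(bipartite_hits("KR" + "A" * 10 + "KKAKA"))  # [0]
```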
Coiled-coil elements were searched for in the 70 sequences with the COILS program (Taylor and Zhulin, 1999) to investigate the potential for dimerization via this structure. The program described by Lupas et al. (1991) assigns each amino acid residue a score, within a window of 14, 21 or 28 residues (two, three or four heptads), based on its probability of being involved in a coiled-coil structure. Positive scores were reported only when probability values were >0.9 within the 150 residues C-terminal to the Zn(II)2C6 domain (Table 2). Of the 65 putative proteins with an N-terminal Zn(II)2C6 domain, a high peak was detected in 28, 21 and 17 with the two-, three- and four-heptad windows, respectively. This analysis shows that a coiled-coil region C-terminal to the Zn(II)2C6 domain is quite frequent in these putative transcription factors and is probably involved in dimerization events.
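The windowing logic can be sketched as follows: every window of 14, 21 or 28 residues gets a score, each residue keeps the highest score among the windows that contain it, and a peak is called if any residue in the 150 residues after the domain exceeds 0.9. This is a simplification. COILS itself compares windows with a heptad-position profile and converts raw scores to probabilities. Here the input values are hypothetical per-residue numbers, already on the 0-1 scale, and the window score is a plain mean.

```python
def residue_scores(values: list[float], window: int) -> list[float]:
    """Give each residue the highest mean over all windows of `window` residues that contain it."""
    scores = [0.0] * len(values)
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        for i in range(start, start + window):
            scores[i] = max(scores[i], mean)
    return scores


def has_peak(values: list[float], domain_end: int, window: int,
             threshold: float = 0.9, region: int = 150) -> bool:
    """True if any residue in the `region` residues after the domain scores above `threshold`."""
    scores = residue_scores(values, window)
    return any(s > threshold for s in scores[domain_end:domain_end + region])


long_block = [0.1] * 50 + [0.95] * 30 + [0.1] * 50   # synthetic profile
short_block = [0.1] * 50 + [0.95] * 10 + [0.1] * 50
print([has_peak(long_block, 40, w) for w in (14, 21, 28)])  # [True, True, True]
print(has_peak(short_block, 40, 14))                          # False
```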
A transcription factor required for cell wall integrity, CaCwt1p, has recently been characterized by our group (Moreno et al., 2003). Structural analysis of CaCwt1p predicted the presence of another family of C-terminal motifs. This region, named PAS, is presumed to be involved in eukaryotic signal transduction or dimerization events (Taylor and Zhulin, 1999). The putative protein IPF6554 also possesses this characteristic domain. A multiple structure-based alignment, also including the S. cerevisiae sequences for CaCwt1p and IPF6554 (YPL133c and YJL103c, respectively), is shown in Figure 4. The biological function of the PAS domain in yeast proteins has not yet been elucidated. Although many of the transcription factors described to date can be depicted as three-component proteins, in which the DNA recognition motif is linked to a dimerization region by a variable spacer, this is not a general rule: some well-characterized proteins lack this dimerization motif (Anderson et al., 1995). The function of most C. albicans Zn(II)2C6 proteins remains uncharacterized, although a few cluster proteins involved in several biological functions have been reported. CaSuc1p, a transcription factor involved in sucrose utilization through its effect on an inducible α-glucosidase, was the first Zn(II)2C6 zinc finger protein described in C. albicans (Kelly and Kwon-Chung, 1992).

Table note: Ct, located at the C-terminus of the protein. Nuclear localization was predicted with PSORT-II (Horton and Nakai, 1997). Coiled-coil probability within the region C-terminal to the Zn(II)2C6 domain was calculated by COILS (Lupas et al., 1991).

Figure 4. The alignment was performed as described by Taylor and Zhulin (1999), with modifications. (B) The secondary structure of CaCwt1p was deduced from that of Rhizobium meliloti FixL (PDB: 1EW0). Similar residues in at least four sequences are highlighted. The N-terminal link region (blue), PAS core (red), helical connector (green) and β-scaffold (yellow) are shown. The figure was generated using RasMol (Sayle and Milner-White, 1995).

Table references: a, Tong et al., 2004; b, El Alami et al., 2002; c, Kren et al., 2003; d, Kim et al., 2003; e, Akache et al., 2001; f, Kasten and Stillman, 1997; g, Axelrod et al., 1991; h, Giaever et al., 2002; i, Moreno et al., 2003.
The CaFCR1 gene, encoding a Zn(II)2C6 protein, was isolated by its ability to complement the fluconazole hypersensitivity of an S. cerevisiae mutant lacking the transcription factors Pdr1p and Pdr3p (Talibi and Raymond, 1999). An atypical protein with a C-terminal Zn(II)2C6 motif, CaCzf1p, has also been reported previously (Whiteway et al., 1992). Some other C. albicans putative Zn(II)2C6 transcription regulators have been tentatively assigned functions by comparison with their S. cerevisiae homologues (Table 3). However, the functional significance of these similarities remains unclear and could be related to evolutionary aspects. For example, several S. cerevisiae zinc cluster proteins have homologues in C. albicans with high sequence identity: Cat8p, which controls the expression of genes required for gluconeogenesis (Hedges et al., 1995), and Arg81p (Messenguy, 1976), Lys14p (Ramos et al., 1988) and Ppr1p (Marmorstein and Harrison, 1994), which are involved in the metabolism of arginine, lysine and pyrimidines, respectively. Another member of the Zn(II)2C6 protein family, CaPut3p, is the homologue of S. cerevisiae Put3p, previously characterized as controlling the enzymes required for proline use as a nitrogen source (Marczak and Brandriss, 1989).

Evolutionary relationships among Zn(II)2C6 clusters
Phylogenetic analysis was performed using the entire DNA-binding region, including the Zn(II)2C6 cluster and the N- and C-terminal flanking sequences of the 70 predicted C. albicans proteins. After alignment using the ClustalW program (Figure 2), the output data were submitted to the Phylip drawtree web interface and a phenogram was obtained (Figure 5).

Figure 5. Phenogram of Zn(II)2C6 domains. Proteins containing Zn(II)2C6 domains were obtained from the C. albicans genome database at http://www-sequence.stanford.edu/group/candida. The 10 N- and C-terminal flanking amino acid residues were included in the input sequences.
In some cases, bootstrap analysis strongly supports the groupings. All the Zn(II)2C6 proteins containing the fungal domain described above, with the single exception of LEU3, cluster in a single branch. This consistent association of uncharacterized proteins could reflect regulation via a common pathway, although this remains to be elucidated. CWT1 and IPF6554 also cluster consistently on the basis of their Zn(II)2C6 domains. These data, together with the presence of the PAS domain in both putative proteins, suggest their possible involvement in similar functional processes, which should be investigated.
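The text does not state how distances were clustered to build the phenogram; Phylip drawtree only draws a given tree. As a stand-in, the sketch below applies UPGMA, a standard phenetic clustering method and our substitution, to p-distances between aligned rows. It returns a topology-only Newick string and performs no bootstrap resampling. The three aligned sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned rows by UPGMA; return a Newick topology without branch lengths."""
    clusters = {name: 1 for name in aligned}  # Newick label -> number of leaves
    dist = {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}
    while len(clusters) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for other in clusters:  # size-weighted average distance to the new cluster
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a
                + dist[frozenset((b, other))] * size_b) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


toy = {"p1": "CASCKVK", "p2": "CASCKVR", "p3": "CGGCRLL"}
print(upgma(toy))  # ((p1,p2),p3);
```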