Emerging Functions of Transcription Factors in Malaria Parasite

Transcription is the process by which the genetic information stored in DNA is converted into mRNA by enzymes known as RNA polymerases. Bacteria use a single RNA polymerase to transcribe all of their genes, whereas eukaryotes contain three RNA polymerases that transcribe different classes of genes. RNA polymerase also requires other proteins to produce the transcript. These proteins, generally termed transcription factors (TFs), either associate directly with RNA polymerase or aid in building the transcription apparatus. TFs are the most common tools that cells use to control gene expression. Plasmodium falciparum causes the most lethal form of malaria in humans. It shares most of the characteristics of eukaryotic transcription, but the mechanisms of transcriptional control in P. falciparum are thought to differ from those of other eukaryotes. In this article we describe studies on the main TFs of the malaria parasite: the Myb protein, the high mobility group proteins, and the ApiAP2 family proteins. These studies show that defined roles for these TFs in the regulation of parasite gene expression are slowly emerging.


Introduction
Transcription is the synthesis of an RNA molecule complementary to the DNA template by the action of several enzymes. Transcription, whether prokaryotic or eukaryotic, proceeds through three main sequential events: initiation, elongation, and termination [1,2]. Initiation is the most important step and involves the binding of RNA polymerase to double-stranded DNA. Elongation is the covalent addition of nucleotides to the 3′ end of the growing polynucleotide chain, and termination involves the recognition of the transcription termination sequence and the release of RNA polymerase [1,2]. Transcription is the first checkpoint in gene expression. The sequence of DNA that is transcribed into an RNA molecule is called a "transcription unit," and it usually encodes at least one gene. Transcription is considerably more complex in eukaryotic cells than in bacteria. In bacteria, all genes are transcribed by a single RNA polymerase, whereas eukaryotic cells contain multiple RNA polymerases that transcribe different classes of genes [3]. This added complexity most probably facilitates the sophisticated regulation of gene expression needed to direct the activities of the many different cell types of multicellular organisms [4]. Three distinct nuclear RNA polymerases are responsible for transcribing the different classes of genes in eukaryotic cells. The ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) are transcribed by RNA polymerases I and III, and the protein-coding genes are transcribed by RNA polymerase II to yield mRNAs [3,5].
For efficient transcription, RNA polymerase requires other proteins, commonly known as transcription factors (TFs), to produce the transcript. A TF, or sequence-specific DNA-binding factor, is a protein that binds to a specific DNA sequence and controls the flow of genetic information from DNA to mRNA [4]. TFs, together with other proteins in a complex, control transcription by promoting (activators) or blocking (repressors) the recruitment of RNA polymerase to specific genes (Figure 1). The main functions of TFs are to bind to RNA polymerase, to bind other TFs, and to bind to cis-acting DNA sequences. TFs usually work in groups or complexes, forming multiple interactions that allow varying degrees of control over the rate of transcription. In eukaryotes, genes are generally in an "off" state, so TFs mainly work to turn gene expression "on," whereas in bacteria genes are expressed constitutively until a TF turns them "off." Eukaryotic genes generally contain a promoter region upstream of the gene and/or an enhancer region upstream or downstream of the gene, each carrying motifs that are specifically recognized by different types of TFs [3]. One distinct feature of TFs is their DNA-binding domains (DBDs), which give them the ability to bind to specific promoter or enhancer sequences. The binding of one TF triggers other TFs to bind, creating a complex that ultimately facilitates binding by RNA polymerase and thus initiates transcription [3,4].
Journal of Biomedicine and Biotechnology
The basal transcription complex and RNA polymerase II bind to the core promoter of a protein-coding gene, which normally lies within about 50 bases upstream of the transcription initiation site [3]. Further transcriptional regulation is exerted by upstream control elements (UCEs), generally located within about 200 bases upstream of the initiation site (Figure 1). The core promoter for Pol II sometimes contains a TATA box, the highly conserved DNA recognition sequence for the TATA-box-binding protein (TBP), which helps in the assembly of the transcription complex at the promoter (Figure 1). General transcription factors (GTFs) are an important class of TFs essential for transcription (Figure 1). GTFs generally do not bind to DNA themselves; instead they form part of the large transcription preinitiation complex, which interacts with RNA polymerase directly (Figure 1). The most common GTFs are transcription factors IIA (TFIIA), TFIIB, TFIIE, TFIIF, and TFIIH [3,6].
Malaria caused by the mosquito-transmitted parasite Plasmodium falciparum is the most serious and widespread parasitic disease of humans. Malaria causes an enormous number of deaths every year in the tropical and subtropical areas of the world. Among the four species of Plasmodium, P. falciparum causes the most fatal form of malaria [7,8]. According to the World Health Organization's 2010 World Malaria Report, approximately 225 million people become infected with malaria each year and around 781,000 die as a result [9]. The malaria parasite has 14 chromosomes, more than 7000 genes, and a four-stage life cycle as it passes from humans to mosquitoes and back again [7,10]. It is very efficient at evading the human immune response. P. falciparum has a complex life cycle that involves defined morphological stages accompanied by stage-specific gene expression in both the human and the mosquito host, but the mechanisms of transcriptional control in this parasite are not well understood [11]. P. falciparum shares characteristics common to eukaryotic transcription [12], but it also has distinctive patterns of gene expression and an AT-rich genome. Genome analysis reveals a relative paucity of transcription-associated proteins and specific cis-regulatory motifs [11]. These observations have been taken to reflect a reduced role for TFs in transcriptional control in the parasite.
It has been suggested that protein levels during the life cycle of the malaria parasite are controlled through posttranscriptional mechanisms, so posttranscriptional regulation may play a major role in the control of gene expression in P. falciparum. mRNA and protein levels were compared across seven major developmental stages of the P. falciparum life cycle [13]. Although reasonably high correlations were observed between the transcriptome and proteome of each stage, a considerable fraction of genes displayed a delay between the peak abundance of mRNA and that of protein [13]. Quantitative protein expression profiling during the schizont stage of P. falciparum development revealed that extensive posttranscriptional regulation and posttranslational modification occur in malaria parasites. These observations further support the view that posttranscriptional gene regulation events are widespread and presumably of great biological significance during the intraerythrocytic development of P. falciparum [14]. A recent study reported that PfCLKs (cyclin-dependent kinase-like kinases) play a crucial role in the erythrocytic replication of malaria parasites, presumably by participating in gene regulation through the posttranscriptional modification of mRNA [15]. The rate of mRNA decay is also an essential aspect of posttranscriptional regulation in all organisms. The half-life of each mRNA is closely related to its physiological role and thus plays an important role in determining levels of gene expression. A genome-wide analysis of mRNA decay in P. falciparum showed that the rate of mRNA decay decreases markedly, that is, mRNA half-lives lengthen, during the asexual intraerythrocytic developmental cycle [16].
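The relation between an mRNA's decay rate and its half-life can be made concrete with a short calculation. The sketch below assumes simple first-order (exponential) decay, which is a modeling assumption and not something stated in the studies cited here. The function names are hypothetical, and the rate constants are illustrative values, not measured data.

```python
import math


def half_life(decay_rate: float) -> float:
    """Half-life under first-order decay: t_half = ln(2) / k."""
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return math.log(2) / decay_rate


def fraction_remaining(decay_rate: float, t: float) -> float:
    """Fraction of an mRNA pool left after time t: N(t)/N(0) = exp(-k * t)."""
    return math.exp(-decay_rate * t)


# Illustrative rate constants (per minute); hypothetical, not measured values.
fast_k, slow_k = 0.10, 0.01
for label, k in (("fast decay", fast_k), ("slow decay", slow_k)):
    print(f"{label}: k = {k:.2f}/min, half-life = {half_life(k):.1f} min")
```

Under this model a tenfold lower decay rate gives a tenfold longer half-life, so a transcript with a low decay rate persists, and can be translated, for much longer after its synthesis stops.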
As global efforts to eradicate malaria have been unsuccessful, there is a vital need to decipher the biology of Plasmodium, and in particular the mechanisms of gene regulation that govern its developmental cycle, in order to propose novel strategies to fight malaria. Only a few TFs from P. falciparum have been well characterized: the Myb1 protein, the high mobility group box (HMGB) proteins, and the Apetala2 (AP2) domain-containing proteins [17]. The studies on these factors are described in the following sections.

Myb Protein
Myb is an abbreviation derived from "myeloblastosis," an old name for a type of leukemia. This family of proteins was first characterized in the avian myeloblastosis virus (AMV). Myb proteins, which are highly conserved in eukaryotes, belong to the tryptophan cluster family and regulate gene expression by binding to DNA [18]. Their characteristic Myb DBD contains three tandem repeats (R1, R2, and R3) of approximately 50 residues each, and each repeat carries three regularly spaced tryptophan residues [18,19]. The structure of the Myb domain is similar to the helix-turn-helix motif of prokaryotic transcriptional repressors and eukaryotic homeodomains. Myb proteins bind DNA in a sequence-specific manner and regulate the expression of genes involved in differentiation and growth control [18].
A Myb protein was identified in the P. falciparum genome by aligning about 200 nonredundant eukaryotic Myb proteins and generating a consensus sequence corresponding to the characteristic DNA-binding domain [20]. This consensus was used to query the Plasmodium database, which resulted in the annotation of a 414-amino-acid open reading frame, PfMyb1 [20]. Initially, only one Myb domain (R2) was identified in PfMyb1, but alignment of the complete sequence of PfMyb1 (PlasmoDB number PF13_0088) with the DBDs of three Dictyostelium discoideum proteins, DdMybH, DdMyb2, and DdMyb3, led to the recognition of three Myb domains. These are situated in the C-terminus of the protein, as in DdMyb2 and DdMyb3, whereas in most Myb proteins the DBD is located in the N-terminus [20]. However, the repeats in PfMyb1 are imperfect, with a tyrosine or a phenylalanine in place of tryptophan. Moreover, a critical cysteine residue, which is as well conserved as the tryptophan residues, was also found in R1 and R2, and it most likely plays a role in redox regulation [20]. This detailed computational analysis confirmed that PfMyb1 is a genuine Myb protein conserved in all Plasmodium species [20]. It was further reported that PfMyb1 is expressed throughout all the erythrocytic developmental stages of the parasite (rings, early and late trophozoites, and early and late schizonts) [20]. The expression was analyzed in two different clones of P. falciparum, 3D7 and the gametocyte-less F12 derived from 3D7. The mRNA profiles differed in that the Pfmyb1 transcript was expressed at a lower level in the ring stage of F12 than in 3D7, followed by a quick increase in early F12 trophozoites. Myb-DNA-binding activity was observed with a prototype Myb regulatory element (mim-1) and with two putative Plasmodium Myb regulatory elements from the pfmap1 (MAP kinase) and pfcrk1 (cdc2-related protein kinase) genes. These genes were originally reported to be expressed preferentially during erythrocytic asexual and sexual stages, respectively [21,22]. The interaction was confirmed to be specific, since it was inhibited by specific competitors and by anti-PfMyb1 antibody in band-shift assays. During erythrocytic development, the band-shift profiles were clearly different in the 3D7 and the gametocyte-less F12 clones, in contrast to the transcript level [20]. In a follow-up study, the same group used long double-stranded RNA (dsRNA) to reduce the levels of the cognate messenger and the encoded protein and reported that parasite cultures treated with PfMyb1 dsRNA showed growth inhibition [23]. As a result of this dsRNA inhibition, parasite mortality occurred during the trophozoite-to-schizont stages of development, suggesting that PfMyb1 is essential for parasite growth [23].
They also showed that, within the parasite nuclei, PfMyb1 binds to a number of promoters, including those of the genes for phosphoglycerate kinase, calcium-dependent kinase, TATA-binding protein, proliferating cell nuclear antigen, phosphatase, histones, and cyclin-dependent kinase, and therefore directly regulates key genes involved in cell cycle regulation and progression [23].

High-Mobility-Group Box Protein
In eukaryotes, the high-mobility-group (HMG) box nuclear factors are highly conserved throughout evolution. The HMG box domain is composed of around 80 amino acids folded into three α-helices arranged in an L shape, and it is involved in DNA binding [24]. HMG box proteins can bind to non-B-type DNA structures, such as cruciform DNA and distorted AT-rich DNA sequences, in a non-sequence-specific fashion [25]. This binding triggers DNA bending and assists the binding of nucleoprotein complexes that in turn repress or activate transcription [24]. HMG box domains are also involved in a variety of protein-protein interactions. HMG box proteins actively participate in chromatin remodeling by increasing nucleosome sliding and the accessibility of the chromatin [25].
A P. falciparum gene encoding a typical HMG box protein was reported [26]. The gene for PfHMG consisted of one putative DNA-binding domain contained within a single exon. The amino acid sequence revealed that PfHMG lacks the acidic C-terminal domain, which is present in the HMG of higher eukaryotes and interacts with basic proteins such as histones [24,26]. This domain is also absent in the HMG of yeast and Babesia bovis [26]. The northern blot analysis of PfHMG RNA expression showed that HMG is expressed in all the stages of the asexual erythrocytic life cycle, with the highest level of transcript at early schizont stages [26].
In another study, four putative P. falciparum HMG box proteins, including the one previously reported, were predicted by sequence homology [26,27]. PfHMGB1 was annotated on chromosome 12, PfHMGB2 and PfHMGB3 on chromosomes 8 and 12, respectively, and PfHMGB4 on chromosome 13 [27]. PfHMGB1 (PlasmoDB number PFL0145c) and PfHMGB2 (PlasmoDB number MAL8P1.72) are small proteins of under 100 amino acids and contain one characteristic HMG box domain similar to the B-box of mammalian HMGB1 [27,28]. PfHMGB4 (PlasmoDB number MAL13P1.290) is a 160-amino-acid protein, whereas PfHMGB3 is a much larger protein (2,284 amino acids) with two HMG box domains and several additional putative functional motifs, including one Myb domain [27]. Sequence analysis showed that PfHMGB1 shares 45, 23, and 18% homology, and PfHMGB2 42, 21, and 17% homology, with Saccharomyces cerevisiae, human, and mouse HMG box proteins, respectively [28]. In vitro studies performed with both recombinant proteins showed that they were able to interact with distorted DNA structures and to bend linear DNA. Both proteins were expressed in asexual- and gametocyte-stage cells, with PfHMGB1 preferentially expressed in asexual erythrocytic stages and PfHMGB2 in gametocytes. Subcellular localization studies revealed that both factors were present in the nucleus, but PfHMGB2 was also detected in the cytoplasm of gametocytes [27]. On the basis of the differences in their levels of expression, subcellular localization, and capacity for binding and bending DNA, these factors most likely have a role in the transcriptional regulation of Plasmodium development [27]. It was also reported that PfHMGB1 and PfHMGB2 are effective inducers of proinflammatory cytokines such as TNFα from mouse peritoneal macrophages [28]. These observations imply that secreted PfHMGB1 and PfHMGB2 are most likely responsible for producing the host inflammatory immune responses associated with malaria infection [28].
The role of the HMGB2 protein in the regulation of sexual stage gene expression was evaluated by disrupting the Plasmodium yoelii gene encoding HMGB2 and studying its in vivo function in the vertebrate host (the mouse) and in the mosquito Anopheles stephensi [29]. Parasites lacking PyHMGB2 develop into gametocytes but are severely impaired in oocyst formation [29]. It was also shown that PyHMGB2 is not required for asexual growth but is involved in controlling genes that are important for oocyst development in the mosquito. These results suggest that protein expression in sexual stages is regulated both transcriptionally and translationally, with PyHMGB2 acting as an important regulator of sexual stage development [29].

ApiAP2 Family
The activator protein-2 (AP-2) or Apetala2 transcription factors constitute a family of closely related and evolutionarily conserved proteins that bind to the DNA consensus sequence GCCNNNGGC and stimulate target gene transcription. Four different isoforms of AP-2 have been identified in mammals, termed AP-2 α, β, γ, and δ [30,31]. These proteins share a characteristic helix-span-helix motif at the carboxyl terminus, which, together with a central basic region, mediates dimerization and DNA binding. The amino terminus contains a proline/glutamine-rich domain, which is responsible for transcriptional activation. The general functions of the family appear to be the cell-type-specific stimulation of proliferation and the suppression of terminal differentiation during embryonic development. The proteins are able to form hetero- as well as homodimers [30,31]. AP-2 factors are primarily localized in the nucleus, where they bind to their target sequences and regulate target gene transcription.
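A consensus such as GCCNNNGGC is written in IUPAC nucleotide notation, where N stands for any base. As a minimal sketch of how such a consensus can be located in a DNA sequence, the example below converts it to a regular expression. The function names and the test sequence are hypothetical and made up for illustration; only a subset of IUPAC codes is handled, and exact matching of this kind ignores the graded binding affinities of real TF-DNA interactions.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string into a regex.

    The pattern is wrapped in a lookahead so overlapping sites are all found.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every site on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence, invented for this example.
seq = "TTGCCATAGGCAAGCCGGGGGCTT"
for start, site in find_sites(seq, "GCCNNNGGC"):
    print(start, site)
```

Because GCCNNNGGC is identical to its own reverse complement, scanning one strand is sufficient for this consensus; a non-palindromic motif would also need to be searched on the reverse-complement strand.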
A comparative genomic analysis has shown that apicomplexans possess this AP2 family of proteins, commonly known as ApiAP2. About 20-27 members of the ApiAP2 family are present in different genomes; the P. falciparum ApiAP2 gene family has 27 members, which are largely conserved across Plasmodium species [17]. All of these are expressed throughout the four stages of the intraerythrocytic development cycle [31,32]. Each of these proteins contains one to four copies of the AP2 DNA-binding domain. As in plants, the AP2 domains in ApiAP2 proteins are approximately 60 amino acids long and are found in both single- and tandem-domain arrangements [31,32].
Using protein-binding microarrays, the DNA-binding specificities of two P. falciparum ApiAP2 proteins representing different classes of AP2 domain architecture were demonstrated [33]. The gene with PlasmoDB number PF14_0633 encodes an 813-amino-acid protein that is highly expressed during the ring stage of development [34]. It contains a single 60-amino-acid AP2 domain and an adjoining AT-hook DNA-binding domain [33]. PF14_0633 has orthologues in all the sequenced Plasmodium genomes and all the other sequenced apicomplexan genomes. The other ApiAP2 gene, with PlasmoDB number PFF0200c, is highly expressed in late-stage parasites and encodes a 1,979-amino-acid protein containing two AP2 domains in tandem, linked to each other by a conserved 17-amino-acid sequence [33]. Across Plasmodium spp., the orthologous tandem AP2 domains of PFF0200c share ∼95% amino acid sequence identity, whereas the two individual AP2 domains of PFF0200c share only 35% identity with each other [33]. These AP2 domains bind specifically to unique DNA sequence motifs found in the upstream regions of distinct sets of genes that are coregulated during asexual development. Interestingly, despite the sequence divergence between ApiAP2 proteins from distantly related apicomplexan species (P. falciparum and Cryptosporidium parvum), the DNA-binding specificities of orthologous pairs of AP2 domains are highly conserved (that is, TGCATGCA), although their downstream targets may vary. This work demonstrated an interaction between Plasmodium transcription factors and their putative target sequences [33].
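Percent identity figures such as those quoted above are computed from a sequence alignment. The sketch below shows one common way to do this for two already-aligned sequences, counting identical residues over the columns where neither sequence has a gap. This choice of denominator is an assumption, since other conventions exist and the cited study may have used a different one. The function name and the peptide strings are hypothetical and are not real AP2 domain sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up aligned peptides for illustration only.
print(percent_identity("MKRLSAWTEE", "MKRLTAWSEE"))
print(percent_identity("MK-LSA", "MKRLTA"))
```

Both calls compare sequences that differ at a minority of ungapped positions; in the second call the gapped column is excluded from the denominator rather than counted as a mismatch.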
Using a PEXEL/VTS search, it was reported that AP2 proteins lack motifs for apicoplast targeting, mitochondrial transit, endoplasmic reticulum trafficking, and host cell surface targeting, as well as transmembrane domains, whereas classical lysine- and arginine-rich nuclear localization signals were identified, leading to the conclusion that this protein family consists of TFs [33]. A Plasmodium-based global yeast two-hybrid study suggested that ApiAP2 proteins interact with each other and with chromatin-remodeling factors such as the Plasmodium histone acetyltransferase GCN5 [35]. Binding to chromatin-remodeling factors may help recruit these complexes to specific chromosomal locations and facilitate interaction with the core transcription machinery. The crystal structure of the DNA-bound dimer of the AP2 domain of PF14_0633 exhibits many of the canonical features of similar DNA-binding domains [36]. The structure shows that the domain dimerizes through a three-dimensional domain-swapping mechanism in which the α-helix of one protomer is packed against the β-sheet of its dimer mate. It was further reported that dimerization of the AP2 domain of PF14_0633 aligns the Cys76 residues of the two monomers in sufficient proximity to permit disulfide bond formation. Interestingly, the Cys76 residue is conserved in all the orthologues of PF14_0633 in Plasmodium spp. but not in other related apicomplexan species [36]. This DNA-induced dimerization of the AP2 domain of PF14_0633 facilitates the conformational rearrangement of the rest of the protein or of its interaction partners and concurrently loops out the intervening DNA between pairs of binding sites enriched in the upstream regions of a set of sporozoite-specific genes [36].
In a recent comprehensive study, the global DNA-binding specificities of the entire P. falciparum ApiAP2 family of DNA-binding proteins were characterized biochemically and computationally [37]. The results revealed that the majority of these proteins bind diverse DNA sequence motifs that occur in functionally related sets of genes. For a number of proteins, multiple AP2 domains within the same ApiAP2 protein were reported to bind distinct DNA sequences. In addition to high-affinity interactions with primary motifs, interactions with secondary motifs were also observed [37]. By mapping these sequences throughout the parasite genome, this study provides a basis for developing a regulatory network underlying parasite development [37]. Taken together, the studies on the ApiAP2 family in P. falciparum suggest that these proteins are major components of gene regulation in the parasite. Further work is needed to determine how ApiAP2 proteins function as transcriptional regulators, but the DNA-binding sequence specificity of these proteins, their conservation across Apicomplexa, and the highly consistent expression patterns of their predicted downstream targets suggest that they have a vital function in regulating parasite development.
During its asexual life cycle, P. falciparum develops into several distinct morphological forms, occupies various compartments in its human host, and often faces drug treatment. Microarray-based transcriptomic studies of these stages reported remarkable changes in the steady-state mRNA levels of several genes, suggesting that differential gene expression is essential for development. It is now well established that gene regulation in P. falciparum consists of a bulk transcriptional event characteristic of the majority of genes, from which the differential expression of a minority of genes is selected by a combination of pretranscriptional and posttranscriptional mechanisms. The modulation of expression of the targeted genes is therefore the outcome of the combination of these diverse interactions.
Some of these important TFs have also been characterized from other protozoan parasites. For example, a member of the HMGB family was identified in Entamoeba histolytica [38], and some Myb family members were characterized from Trichomonas vaginalis (reviewed in [39]). The information compiled in this paper suggests that P. falciparum indeed contains a small number of TFs that are responsible for gene regulation at the various stages of its development. Relatively little is known about how the parasite uses these few TFs to regulate transcription globally in order to produce the proteins essential for its development and pathogenesis, and future work is required to resolve this question. It has been suggested that combinatorial gene regulation might be the general mode of transcriptional regulation in P. falciparum; alternatively, the effects of the various TFs on gene expression may be additive. A potentially useful antiplasmodial approach would therefore be to target and inactivate one or more of these TFs with drugs. Such a strategy would directly or indirectly affect gene regulation and consequently the function of several downstream genes and crucial biological processes. Because numerous genes would be affected, it should be relatively difficult for the parasite to develop resistance to this line of drugs.