Characterization of Putative cis-Regulatory Elements in Genes Preferentially Expressed in Arabidopsis Male Meiocytes

Meiosis is essential for plant reproduction because it is the process during which homologous chromosome pairing, synapsis, and meiotic recombination occur. The meiotic transcriptome is difficult to investigate because meiocytes are small and enclosed within anther lobes. Recently developed isolation techniques have enabled the characterization of transcriptional profiles in male meiocytes of Arabidopsis, and gene expression in these cells shows unique features. The direct interaction of transcription factors (TFs) with DNA regulatory sequences forms the basis for the specificity of transcriptional regulation. Here, we used in silico tools to identify putative cis-regulatory elements (CREs) associated with genes expressed in male meiocytes. The 1 kb upstream regions of the 50 genes most preferentially expressed in Arabidopsis meiocytes contained conserved motifs. These motifs are putative binding sites of TFs, some of which share common functions, such as roles in cell division. Combined with cell-type-specific expression analysis, our findings could substantially aid the identification and experimental verification of the protein-DNA interactions through which specific TFs drive gene expression in meiocytes.


Introduction
Meiosis is a specialized type of cell division that, through two consecutive rounds of nuclear division, leads to the production of haploid gametes. Homologous chromosome pairing, synapsis, and meiotic recombination all occur during meiosis. Meiotic recombination is essential for plant reproduction and breeding because it ensures the accurate segregation of homologous chromosomes and genetic exchange between them [1][2][3][4]. The male meiocytes of Arabidopsis occupy only a small fraction of the anther tissue and are surrounded by somatic anther lobes [5]. An effective method for collecting meiocytes was established only recently, enabling investigations of the meiotic transcriptome [5,6]. Genome-wide gene expression analyses revealed unique transcriptome landscapes during male meiosis [5,6].
Gene expression in eukaryotic cells is regulated by transcription factors (TFs). There are around 2000 TFs in the Arabidopsis genome [7], and the interactions of their DNA-binding domains with specific cis-regulatory elements (CREs) can activate the expression of anywhere from several to many thousands of target genes. The transcriptional domains of regulatory genes are critically important in many developmental processes [8]. Meiosis takes place in a highly specified cell cluster and thus requires precise spatial and temporal control [3]. In Arabidopsis, the expression of many meiotic genes, such as AtDMC1 [9,10], SDS [11], MMD1 [12], and RCK [13], is tightly regulated. Studying the prevalence and distribution of CREs in the promoters of coexpressed genes can facilitate the identification of signaling networks in specific cell types (e.g., [14][15][16][17]). For example, CREs or promoter motifs have been investigated in sperm cells (mature pollen) of both rice and Arabidopsis [18,19].
Transcriptome profiling experiments have shown that more than 1,000 genes are preferentially expressed in meiocytes [5], and a high proportion of the promoters of these genes were sufficient to drive green fluorescent protein (GFP) reporter activity in meiocytes [20]. These studies laid the foundation for mining meiotically active promoters and examining their shared structural features. In this study, the sequences of 50 meiotically active promoters were analyzed.

BioMed Research International
We identified putative CREs in these promoters that may be responsible for their high activity in male meiocytes.

Materials and Methods
We selected candidate genes from data generated in a previous mRNA deep-sequencing study of meiosis-specific genes in Arabidopsis [5], focusing on the genes most highly expressed in male meiocytes. From the list of genes with ≥4-fold higher expression in meiocytes than in anthers, we chose the top 50 genes in the meiocyte-versus-seedling comparison, excluding transposable element genes. For these 50 genes, the expression differences between meiocytes and anthers and between meiocytes and seedlings, the annotated functions, and the GO (gene ontology) functional categories are presented in Supplemental File 1, available online at http://dx.doi.org/10.1155/2014/708364. As a negative control, 50 genes randomly selected from an Affymetrix ATH1 microarray experiment deposited in the NASC database were analyzed [21]; see Supplemental File 2 for descriptions of these control genes.
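The selection criteria above can be expressed as a simple filter-and-rank step. The sketch below assumes each gene is represented by a hypothetical `GeneRecord` holding two fold-change values and a transposable-element flag, and it assumes that "top 50" means the 50 largest meiocyte/seedling fold changes. The field names, toy gene IDs, and the seeded `random.sample` draw for the negative control are illustrative only and are not the authors' actual pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """Hypothetical per-gene record; field names are illustrative."""
    gene_id: str
    meiocyte_vs_anther: float    # fold change, meiocytes / anthers
    meiocyte_vs_seedling: float  # fold change, meiocytes / seedlings
    is_transposable_element: bool


def select_top_genes(records, min_anther_fold=4.0, top_n=50):
    """Keep genes >= min_anther_fold higher in meiocytes than in anthers,
    drop transposable element genes, then take the top_n genes ranked by
    meiocyte/seedling fold change (assumed ranking criterion)."""
    candidates = [
        r for r in records
        if r.meiocyte_vs_anther >= min_anther_fold
        and not r.is_transposable_element
    ]
    candidates.sort(key=lambda r: r.meiocyte_vs_seedling, reverse=True)
    return [r.gene_id for r in candidates[:top_n]]


def random_control(all_gene_ids, n=50, seed=0):
    """Draw n distinct genes uniformly at random as a negative control."""
    rng = random.Random(seed)
    return rng.sample(list(all_gene_ids), n)


# Toy data with hypothetical IDs
records = [
    GeneRecord("G1", 5.0, 100.0, False),
    GeneRecord("G2", 3.0, 500.0, False),  # fails the >= 4x anther filter
    GeneRecord("G3", 8.0, 50.0, True),    # transposable element, excluded
    GeneRecord("G4", 4.0, 200.0, False),
    GeneRecord("G5", 10.0, 10.0, False),
]
print(select_top_genes(records, top_n=2))  # ['G4', 'G1']
```

Fixing the random seed makes the control set reproducible; any other sampling scheme would serve equally well as long as it is documented.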
One kilobase (1 kb) of sequence upstream of each transcription start site was retrieved using Regulatory Sequence Analysis Tools (RSAT, http://rsat.ulb.ac.be/rsat/) [22]. Known CREs were first identified with the SIGNALSCAN program of the plant cis-acting regulatory DNA elements database (PLACE, http://www.dna.affrc.go.jp/PLACE/) [23,24]. Statistically overrepresented elements were identified with Pscan (http://159.149.160.51/pscan/) [25]: the TAIR gene identifiers of the 50 genes were submitted, the source organism was set to Arabidopsis thaliana, and the region analyzed was −1000 to +0 relative to the annotated transcription start site. Significance was assessed using the P values that Pscan computes with a z-test, which associates with each profile the probability of obtaining the same score on a random sequence set [25]; an element was considered significantly overrepresented if P < 0.01. Novel motifs were searched for with Promzea (http://promzea.org) [26]. Promoter regions 1000 bp long were analyzed, and each predicted motif was assigned a mean normalized conditional probability (MNCP); an MNCP score greater than 1 indicates that the motif is more represented in the input data set than in a random set of promoters/first introns [26]. Motifs predicted by Promzea were compared with experimentally defined motifs in the PLACE database using STAMP [27]. Strand bias of putative CREs was analyzed with AthaMap (http://www.athamap.de/) [28][29][30][31][32] over the −1000 to 0 region relative to the transcription start site; the total strand distribution of CREs was the sum of the individual CRE counts in each promoter in the "overview" search result.
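RSAT retrieves upstream sequences directly, but the underlying coordinate logic is easy to get wrong for genes on the minus strand. The following sketch assumes 1-based coordinates, a TSS given as the first transcribed base, and an in-memory chromosome string; the function name `upstream_region` is hypothetical. It returns the −1000 to −1 region read 5′ to 3′ on the gene's own strand and truncates at chromosome ends. Whether the TSS base itself is included is a convention that differs between tools; here it is excluded.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, tss: int, strand: str,
                    length: int = 1000) -> str:
    """Return up to `length` bp upstream of a TSS, read 5'->3' on the gene's strand.

    Assumes 1-based coordinates with `tss` = first transcribed base; the
    returned region is -length..-1 (TSS excluded) and is truncated at
    chromosome ends.
    """
    if strand == "+":
        start = max(0, tss - 1 - length)  # 0-based slice start
        return chrom_seq[start:tss - 1].upper()
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        end = min(len(chrom_seq), tss + length)
        return reverse_complement(chrom_seq[tss:end])
    raise ValueError("strand must be '+' or '-'")


chrom = "AAAACCCCGGGGTTTT"  # toy 16-bp "chromosome"
print(upstream_region(chrom, tss=9, strand="+", length=4))  # CCCC
print(upstream_region(chrom, tss=8, strand="-", length=4))  # CCCC
```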
DOFCOREZM was the most abundant CRE in the 50 putative promoter sequences. It is a core site for the binding of Dof proteins in maize. The Dof proteins are a family of plant-specific TFs that includes Dof1, Dof2, Dof3, and PBF [33,34]. Maize Dof1 has been proposed as a regulator of the expression of the C4 photosynthetic phosphoenolpyruvate carboxylase (C4PEPC) gene [35]. Dof1 also enhances transcription of the cytosolic pyruvate orthophosphate dikinase (cyPPDK) genes and the nonphotosynthetic PEPC gene [33]. Maize Dof2 suppresses the promoter of C4PEPC [35]; PBF is an endosperm-specific Dof protein that binds to the prolamin box of a native B-hordein promoter in barley endosperm [36]. CACTFTPPCA1 is a key component of Mem1 (mesophyll expression module 1) and is found in the distal promoter region of the C4 isoform of phosphoenolpyruvate carboxylase (ppcA1) in the C4 dicot Flaveria trinervia; it determines the mesophyll-specific expression of ppcA1 [37]. ARR1AT is the binding element of ARR1, a response regulator found in Arabidopsis [38]. CAATBOX1 is responsible for the tissue-specific promoter activity of a pea legumin gene [39]. GATABOX is required for light-dependent and nitrate-dependent control of transcription in plants [40]. The GATA motif has been found in the promoter of the Cab22 gene, which encodes a Petunia chlorophyll a/b-binding protein; this motif is the specific binding site of ASF-2 [41].
In addition to the five CREs found in all 50 promoters, 13 CREs were found in at least 80% of the promoters (Figure 1). Among these, seven elements are found in genes specifically expressed in particular organs. POLLEN1LELAT52 is one of two codependent regulatory elements responsible for pollen-specific activation of the tomato (Lycopersicon esculentum) LAT52 gene [42]. GTGANTG10 is found in the promoter of the tobacco late pollen gene g10 [43]. EBOXBNNAPA is a motif associated with storage protein genes [44]. TAAAGSTKST1 is a target site in the control of guard cell-specific gene expression [45].
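Counting how many promoters contain a given PLACE element amounts to scanning each sequence, on both strands, with the element's IUPAC consensus. The sketch below assumes the consensus strings recorded for these elements in PLACE (AAAG for DOFCOREZM, YACT for CACTFTPPCA1, NGATT for ARR1AT, CAAT for CAATBOX1, and GATA for GATABOX; these should be checked against the database). It is a simplified stand-in for SIGNALSCAN, not a reimplementation of it, and the toy promoter sequences are hypothetical. Overlapping matches are counted, and a site that reads the same on both strands is counted once per strand.

```python
import re

# IUPAC nucleotide codes -> regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Consensus strings assumed from PLACE; verify before use.
PLACE_CONSENSUS = {
    "DOFCOREZM": "AAAG",
    "CACTFTPPCA1": "YACT",
    "ARR1AT": "NGATT",
    "CAATBOX1": "CAAT",
    "GATABOX": "GATA",
}


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_regex(motif):
    # Zero-width lookahead so that overlapping matches are all counted.
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")


def count_sites(seq, motif):
    """Return (sense, antisense) match counts of an IUPAC motif in seq."""
    pattern = iupac_regex(motif)
    sense = len(pattern.findall(seq.upper()))
    antisense = len(pattern.findall(reverse_complement(seq)))
    return sense, antisense


def fraction_containing(promoters, motif):
    """Fraction of promoters with at least one site on either strand."""
    hits = sum(1 for s in promoters.values() if sum(count_sites(s, motif)) > 0)
    return hits / len(promoters)


# Hypothetical toy promoters
promoters = {"p1": "TTAAAGCCGATATC", "p2": "CCCCGGGG"}
for name, consensus in PLACE_CONSENSUS.items():
    print(name, fraction_containing(promoters, consensus))
```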
The involvement of these CREs in responses to environmental factors suggests that they may integrate signals from the meiotic program with environmental cues, especially light and stress. The PLACE motifs described above represent basic CREs required for promoter function but are not necessarily statistically overrepresented relative to their average frequency in the Arabidopsis genome. Of the 18 PLACE motifs present in at least 80% of the promoters we examined (Figure 1), 17 are also present in rice sperm cell-specific genes [18]; the only exception to this striking similarity was the EECCRCAH1 CRE.
We further searched for motifs that were statistically overrepresented, that is, elements whose frequency in the 50 examined promoters is above the genome-wide average in Arabidopsis. Pscan identified six overrepresented putative TF binding site motifs (Figure 2 and Table 1). When 50 randomly selected genes (the negative control) were used as input, only one such overrepresented motif was detected (Supplemental File 3), indicating that the meiotically active promoters share more conserved motifs than random promoters do.
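Pscan's test, as described in the Methods, compares how well a binding profile scores in the input promoters with how well it scores across a background promoter set. The sketch below is a simplified version in that spirit, not Pscan's code: it assumes a log-odds weight matrix (a dict mapping each base to per-position weights), takes the best-scoring window on either strand of each promoter, and applies a one-sided z-test of the input mean against the background mean and standard deviation. The example matrix and score lists are hypothetical.

```python
import math
from statistics import mean, pstdev

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_match_score(seq, pwm):
    """Highest window score of a weight matrix on either strand.

    pwm: dict mapping 'A','C','G','T' to lists of per-position weights
    (e.g., log-odds). Windows containing N are skipped.
    """
    width = len(pwm["A"])
    best = -math.inf
    for strand_seq in (seq.upper(), reverse_complement(seq)):
        for i in range(len(strand_seq) - width + 1):
            window = strand_seq[i:i + width]
            if "N" in window:
                continue
            score = sum(pwm[base][j] for j, base in enumerate(window))
            best = max(best, score)
    return best


def z_test_enrichment(input_scores, background_scores):
    """One-sided z-test: is the mean input score above the background mean?

    Assumes a non-constant background (standard deviation > 0).
    """
    n = len(input_scores)
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    z = (mean(input_scores) - mu) / (sigma / math.sqrt(n))
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p


# Hypothetical +1/-1 matrix favouring the consensus GATA
pwm = {b: [1 if b == c else -1 for c in "GATA"] for b in "ACGT"}
print(best_match_score("CCGATACC", pwm))  # 4
z, p = z_test_enrichment([1.5] * 4, [0.0, 2.0] * 50)
print(round(z, 3), round(p, 4))  # 1.0 0.1587
```

Under this sketch, a profile would be called overrepresented when `p < 0.01`, matching the threshold stated in the Methods.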
The most significantly overrepresented motif detected by Pscan, CTCAGCG, is the binding sequence of Arabidopsis CELL DIVISION CYCLE 5 (AtCDC5), which is expressed extensively in shoot and root meristems and may function in cell cycle regulation [58,59]. This result suggests that similar regulatory machinery may operate in meiocytes and meristems, possibly underlying their high mitotic or meiotic division activity. Another overrepresented motif contains the core motif GTAC recognized by AtSPL14, a plant-specific SQUAMOSA promoter binding protein (SBP) domain transcription factor involved in plant development and resistance to programmed cell death [60]. The binding motif of the DNA-binding protein RAV1 (RAV, related to ABI3/VP1) is also overrepresented in the Pscan results [61]; RAV1 regulates plant development and is involved in plant responses to biotic and abiotic stress [62][63][64][65]. Another overrepresented motif is recognized by ERF1, a TF of the EREBP/AP2 family that regulates plant responses to jasmonate, ethylene, and fungi [66][67][68][69][70]. The statistically overrepresented CE1-like sequence CACCG is an ABA response sequence found in a number of ABA-related genes and is the target of the maize abscisic acid insensitive 4 (ABI4) protein [71]. Finally, Pscan detected the binding site of SCHLAFMÜTZE (SMZ), an AP2-like transcription factor that acts as a repressor of flowering [72].
As a complement to the Pscan analysis, we searched for novel promoter DNA motifs associated with upregulation in Arabidopsis meiocytes using the Promzea motif discovery tool [26]. Promzea detected nine overrepresented motifs (MNCP > 1) in the promoters of the 50 meiotically active genes and five in the promoters of the 50 randomly selected control genes (Supplemental File 4). This result supports the Pscan finding that meiotically active promoters possess more conserved motifs than randomly selected promoters. The 14 motifs were matched to different experimentally defined motifs in the literature (Figure 3 and Supplemental File 5).
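STAMP compares motifs column by column after aligning them, and Pearson correlation between nucleotide-frequency columns is one of the column-comparison metrics it offers. The sketch below is a simplified, hypothetical illustration of that idea, not STAMP itself: it slides a query motif across a target motif (in both orientations) without gaps, requires a minimum overlap, and returns the best average column correlation. It does not estimate match significance. Motifs are lists of [A, C, G, T] frequency columns, and `from_consensus` is a hypothetical helper that builds such a matrix from a plain consensus string.

```python
import math

BASES = "ACGT"


def from_consensus(consensus):
    """Hypothetical helper: one-hot frequency columns [A, C, G, T]."""
    return [[1.0 if b == c else 0.0 for b in BASES] for c in consensus.upper()]


def column_pcc(a, b):
    """Pearson correlation between two 4-element frequency columns."""
    ma, mb = sum(a) / 4, sum(b) / 4
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a uniform column carries no information
    return cov / math.sqrt(va * vb)


def revcomp_matrix(m):
    """Reverse column order and swap A<->T, C<->G."""
    return [[col[3], col[2], col[1], col[0]] for col in reversed(m)]


def best_ungapped_similarity(query, target, min_overlap=4):
    """Best mean column PCC over all ungapped offsets and both orientations.

    Returns -inf if no offset gives at least min_overlap aligned columns.
    """
    best = -math.inf
    for q in (query, revcomp_matrix(query)):
        for offset in range(-(len(q) - min_overlap), len(target) - min_overlap + 1):
            pairs = [(q[i], target[i + offset]) for i in range(len(q))
                     if 0 <= i + offset < len(target)]
            if len(pairs) < min_overlap:
                continue
            score = sum(column_pcc(x, y) for x, y in pairs) / len(pairs)
            best = max(best, score)
    return best


print(best_ungapped_similarity(from_consensus("GATAA"), from_consensus("TTGATAAG")))  # 1.0
```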
Motif1 from the Promzea analysis was statistically similar to the TATABOX1 element, which is critical for the initiation of tissue-specific transcription (Figure 3) [73,74]. Motif4 matched the phosphate response domain GMHDLGMVSPB [75]. Motif3 matched the experimentally defined motif PIIATGAPB, which is responsible for light-activated gene expression [75]. Motif2 matched the E2FAT motif, the binding site of E2F. The E2F transcription factors control the cell cycle by regulating the transcription of genes required for cell-cycle progression and DNA replication [76], both of which are central to meiosis. Motif8 was similar to the pathogen/elicitor-related element TL1ATSAR [77]. Of the nine motifs predicted by Promzea, four (Motif5, Motif6, Motif7, and Motif9) were enriched in CG dinucleotides, a property of regulatory elements that is linked to DNA methylation; CpG methylation is known to suppress transcription [78]. The presence of these CG-enriched motifs suggests that gene repression, like gene activation, may be important in meiotically active gene regulatory networks, for example, to suppress meiosis-restricted processes in somatic tissues. Motif comparison with STAMP also matched these four motifs to other PLACE elements: Motif9 matched INTRONLOWER, which is involved in 3′ intron-exon splice junctions in plants [79]; Motif5 matched REGION1OSOSEM, which is involved in the control of transcription by ABA [80]; Motif6 matched the tissue-specific expression element BS1EGCCR [81]; and Motif7 matched the ammonium response element AMMORESVDCRNIA1 [82].
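One common way to quantify CG enrichment in a sequence is the CpG observed/expected ratio, (number of CG dinucleotides × sequence length)/(number of C × number of G), where values well above 1 indicate more CG than expected from base composition alone. The sketch below applies this ratio, together with GC fraction, to consensus strings. It illustrates the measure and is not necessarily how enrichment was assessed in this study; the example sequences are hypothetical, not the Promzea motifs.

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cg * len(s) / (c * g)


def gc_fraction(seq):
    """Fraction of G + C bases in seq."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


# Hypothetical consensus strings, not the Promzea motifs themselves
for motif in ("CGCGAT", "AATTGC"):
    print(motif, round(cpg_obs_exp(motif), 2), round(gc_fraction(motif), 2))
```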
In the promoters of the negative control genes, CREs are almost equally distributed between the sense and antisense strands (CREs on sense strand/CREs on antisense strand = 2742/2706 = 1/0.987). In the meiotically active promoters, however, more CREs are located on the antisense strand than on the sense strand (CREs on sense strand/CREs on antisense strand = 2758/2941 = 1/1.066). Interestingly, a similar bias of CRE distribution toward the antisense strand has been observed in the promoters of rice sperm cell-specific genes [18].
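The strand ratios above follow from summing per-promoter sense and antisense CRE counts (as taken from the AthaMap "overview" output) and dividing. The sketch below assumes those counts are available as hypothetical (sense, antisense) pairs per promoter; feeding in the reported totals reproduces the stated ratios.

```python
def strand_totals(per_promoter_counts):
    """Sum (sense, antisense) CRE counts over all promoters."""
    sense = sum(s for s, _ in per_promoter_counts)
    antisense = sum(a for _, a in per_promoter_counts)
    return sense, antisense


def antisense_per_sense(sense, antisense):
    """Express the strand distribution as 1 : (antisense / sense)."""
    return antisense / sense


# Hypothetical per-promoter counts for illustration
toy = [(60, 55), (48, 62), (51, 50)]
print(strand_totals(toy))  # (159, 167)

# Totals reported in the text
print(round(antisense_per_sense(2742, 2706), 3))  # control promoters: 0.987
print(round(antisense_per_sense(2758, 2941), 3))  # meiotic promoters: 1.066
```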
The information from this study can be used to characterize interactions between regulatory elements and TFs in meiocytes. Cell-type-specific analysis of TF expression is one strategy for distinguishing true protein-DNA interactions from numerous potentially spurious candidates [83]. For example, one of the PLACE motifs identified in this study, the GATABOX (Figure 1(a)), is the binding motif of the conserved C2C2-GATA TFs, which have two GATA zinc fingers [40]. Twenty-nine C2C2-GATA family members have been identified in Arabidopsis; they are highly expressed in early flower domains, and a few are involved in flower development [84,85]. In our analysis, two members of this TF family (AT5G47140 and AT1G08000, Table 2) are highly expressed in male meiocytes. These two genes are therefore better candidates than other C2C2-GATA family members for proteins that bind GATABOX CREs in meiocytes. E2F transcription factors are essential for the regulation of the cell cycle and DNA replication. Three classical E2F proteins (E2Fa-c) and three atypical E2F proteins (E2Fd-f) have been characterized in Arabidopsis [86,87]. Of these, E2Fa (AT2G36010) and E2Fe (AT3G48160) are highly expressed in meiocytes (Table 2); they may therefore be better candidates than other E2Fs for proteins that bind E2FAT-like CREs in meiocytes, which may link the E2Fs to the control of meiotic processes such as the meiotic cell cycle and DNA replication.
Figure 3: Sequence logos of novel motifs in the promoters of genes preferentially expressed during meiosis, detected using Promzea. The best match of each motif in the PLACE database is indicated in the panels to the right. Letters in the logos represent the nucleotides (A, C, G, and T) and are sized in proportion to their frequency. The significance of each STAMP match is given as the false discovery rate (FDR).
More than half of the overrepresented CREs identified in this study are binding sites of TFs that function in plant responses to environmental factors. We therefore speculate that, during meiosis, exogenous signals, especially light and stress signals, are integrated into gene regulation largely through particular CREs [89][90][91][92].

Conclusions
In this study, which aimed to identify CREs associated with genes preferentially expressed during meiosis, we analyzed the 1 kb upstream regions of 50 genes that are highly expressed in Arabidopsis meiocytes. Whereas our previous study analyzed the CREs in 10 promoters of meiotically active genes [20], here we performed a more comprehensive in silico analysis with a larger number of genes. The CREs identified in the promoters of these 50 genes may be responsible for the high activity of the corresponding promoters in male meiocytes. The information obtained from this study can be used to identify TFs that regulate meiotically active gene expression and, perhaps more attractively, to synthesize artificial promoters that drive high gene expression in meiocytes. Because meiosis is evolutionarily conserved, information on transcriptional domains obtained from the model system Arabidopsis is valuable not only for assessing the conservation of functional pathways in meiosis of other eukaryotes but also for applications aimed at improving crop plants.