Evidence for Directed Evolution of Larger Size Motif in Arabidopsis thaliana Genome
The Scientific World Journal

Transcriptional control of gene expression depends on a variety of interactions mediated by the core promoter region, sequence-specific DNA-binding proteins, and their cognate promoter elements. A prominent group of cis acting elements in plants contains an ACGT core. Cis elements with this core have been shown to be involved in abscisic acid, salicylic acid, and light responses. In this study, we carried out a genome-wide comparison of the frequency of occurrence of two ACGT elements without any spacer and of two ACGT elements separated by spacers of different lengths. In the first step, the frequency of occurrence of the cis element sequences across the whole genome was determined using the BLAST tool. In a second approach, the spacer sequence was randomized before making the query. As expected, the sequence ACGTACGT had the maximum occurrence in the Arabidopsis thaliana genome. As we increased the spacer length, one nucleotide at a time, the frequency of occurrence in the genome decreased. This trend continued until an unexpectedly sharp rise in the frequency of (ACGT)N25(ACGT). The higher-than-expected frequency of this larger motif suggests its directed evolution in the Arabidopsis thaliana genome.


Introduction
Gene expression in eukaryotic organisms has been a topic of great interest. Careful regulation and recruitment of transcription factors (TFs) to cis regulatory elements in promoter regions generate specificity and diversity [1] in genetic regulation. Promoters are arrays of cis regulatory elements located upstream of a gene, each arranged in combination with other specific cis elements. At present, 469 cis elements have been reported in the plant cis regulatory element (PLACE) database. A prominent group of cis acting elements in plants contains an ACGT core. Several cis elements with this core have been shown to respond to abscisic acid [2][3][4], salicylic acid [5], and light signals [6]. Foster et al. [7] reported that the bZIP class of transcription factors binds to this core motif. In an elegant study, Krawczyk et al. [8] showed that deletion of two base pairs between activator sequence-1 (as1) palindromes does not affect binding of activator sequence binding factor (ASF-1) and TGA factors (which bind to the TGACG sequence), whereas insertion decreases factor binding in vitro. In their study the distance between palindromic centers was 12 base pairs. Mehrotra et al. [9,10] have shown that these motifs function even when they are placed out of their native context. R. Mehrotra and S. Mehrotra [11] have shown that promoter activation by ACGT in response to salicylic and abscisic acids is differentially regulated by the spacing between these motifs. The ACGT element contributes synergistically to gene expression by stabilising the transcription complex formed on the minimal promoter [10]. The present study is an extension of the aforementioned work. In this study, we carried out a genome-wide comparison of the frequency of occurrence of two ACGT elements without any spacer and of two ACGT elements separated by spacers of different lengths. Based on the data obtained, we report that there is directed evolution of a larger motif in the Arabidopsis thaliana genome.

Materials and Methods
The objective was to determine the frequency of the recurring sequences and then to use these sequences together with a random minimal promoter to predict the transcription factors likely to interact with them. The genomic sequence database of Arabidopsis thaliana at http://www.arabidopsis.org/ (The Arabidopsis Information Resource, TAIR) was analyzed using BLASTn (available at the NCBI website). All sequences were run in BLASTn against the whole Arabidopsis thaliana genome to find their frequency of occurrence. Accession numbers of the Arabidopsis thaliana chromosomes are as follows: chromosome 1: NC_003070.9, chromosome 2: NC_003071.7, chromosome 3: NC_003074.8, chromosome 4: NC_003075.7, and chromosome 5: NC_003076.8.
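The counting step can be illustrated with a minimal Python sketch. This is not the BLASTn procedure used in the study: it counts exact, overlapping matches in a sequence that is assumed to be already loaded as an upper-case string, whereas BLASTn reports heuristic local alignments, so the two kinds of counts need not agree. All function names are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_exact(genome: str, query: str) -> int:
    """Count overlapping exact matches of `query` on both strands."""
    def hits(pattern: str) -> int:
        # A zero-width lookahead lets overlapping matches be counted.
        return len(re.findall(f"(?={re.escape(pattern)})", genome))

    rc = reverse_complement(query)
    # A palindromic query (e.g. ACGTACGT) reads the same on both strands,
    # so it is counted once rather than twice.
    return hits(query) if rc == query else hits(query) + hits(rc)


def count_spaced_pairs(genome: str, spacer_len: int) -> int:
    """Count ACGT ... ACGT pairs separated by ANY spacer of a given length."""
    pattern = f"(?=ACGT[ACGT]{{{spacer_len}}}ACGT)"
    return len(re.findall(pattern, genome))
```

Note that `count_spaced_pairs` accepts any spacer of the given length, while `count_exact` looks for one specific query; the frequencies reported in this paper refer to specific query sequences.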
Randomization of the sequence was carried out using the SHUFFLE program [12]. The different sequences obtained are listed in Table 1. In the next step, we identified the transcription factors that bind to these cis elements when they are separated by spacers of different lengths. A 139 bp long minimal promoter, Pmec [13], was used in this study. The minimal promoter sequence shown below was suffixed to the sequences shown in Table 1: TCACTATATATAGGAAGTTCATTTCATTTGGAATGGACACGTGTTGTCATTTCTCAACAATTACCAACAACAACAAACAACAAACAACATTATACAATTACTATTTACAATTACATCTAGATAAACAATGGCTTCCTCC.
These extended sequences were submitted to the JASPAR CORE database [14] to scan for transcription factors, and the resulting TFs were then cross-checked against the results obtained from CONSITE [15].
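The construction of the query sequences described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the shuffling uses Python's `random` module rather than the SHUFFLE program [12], the function names are hypothetical, and the spacer in the usage lines is a placeholder, not a sequence from Table 1. The promoter string is the 139 bp Pmec sequence given in the text.

```python
import random

CORE = "ACGT"

# Minimal promoter Pmec as given in the text (139 bp), split for readability.
PMEC = (
    "TCACTATATATAGGAAGTTCATTTCATTTGGAA"
    "TGGACACGTGTTGTCATTTCTCAACAATTACCAACA"
    "ACAACAAACAACAAACAACATTATACAATTACTATT"
    "TACAATTACATCTAGATAAACAATGGCTTCCTCC"
)


def shuffle_spacer(spacer: str, rng: random.Random) -> str:
    """Permute the spacer bases; base composition is preserved."""
    bases = list(spacer)
    rng.shuffle(bases)
    return "".join(bases)


def build_query(spacer: str, promoter: str = PMEC) -> str:
    """Return (ACGT) + spacer + (ACGT) with the minimal promoter suffixed."""
    return CORE + spacer + CORE + promoter


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    placeholder_spacer = "AAGGC"    # hypothetical 5-nucleotide spacer
    print(build_query(shuffle_spacer(placeholder_spacer, rng)))
```

Shuffling only the spacer leaves both ACGT cores and their separation unchanged, which is the property the spacer randomization in this study relies on.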

Promoters with Greater Length between ACGT Motifs Are More Frequent.
It has been reported that ACGT cis elements function even when they are placed out of their native sequence context [9,10]. When two ACGT elements are separated by 5 base pairs and by 10 base pairs, they are induced in response to salicylic acid (SA) and abscisic acid (ABA), respectively. Interestingly, SA mimics the biotic stress response and ABA mimics the abiotic stress response in plants, and both are thus of great interest to plant biologists. Paixão and Azevedo [16] showed that multiplicity of cis elements evolved through transitional forms showing redundant cis regulation. In this study, when we examined the frequency of occurrence of two ACGT elements without any spacer and separated by spacers of different lengths, we found that two ACGT elements in tandem occur 1885 times in total (Table 1), while the e value was the same for all alignments obtained on a particular chromosome. When two ACGT elements were separated by spacers of 5, 10, and 25 nucleotides, their frequencies of occurrence were 72, 39, and 62, respectively. The frequency observed for the 25-base-pair spacer was unexpectedly high: according to the rule of probability, two ACGT elements separated by 25 base pairs should occur less often than two elements separated by 10 base pairs or fewer. Hobo et al. [17] have earlier reported that in ABA responsive promoters the distance between ACGT elements is 30 base pairs. To address this discrepancy in the data obtained, we randomized the spacer sequence while keeping the ACGT motifs unchanged. The logic of this randomization was to assess how important the distance between the binding sites is for transcription factors. After randomization of the spacer, the frequency of occurrence dropped from 72, 39, and 62 to 23, 14, and 21 for (ACGT)N5(ACGT), (ACGT)N10(ACGT), and (ACGT)N25(ACGT), respectively. This means that, along with the distance between binding motifs, the sequence of the spacer has been positively selected in transcriptional regulation. In the next step we completely randomized the sequence and observed a drop in the frequency of occurrence of two ACGT elements separated by 10 and 25 base pairs, while there was an unexpected increase in the frequency when the ACGT elements were separated by five base pairs. This happened because randomization generated a motif that has been positively selected in evolution.

Table 3: Alterations in transcription factor binding sites when spacer sequence length between two ACGT palindromes is gradually increased from 5 to 25 nucleotides.

Transcription factor  MPS  ACGT  (ACGT)(MPS)  (ACGT)2(MPS)  (ACGT)N5(ACGT)(MPS)  (ACGT)N10(ACGT)(MPS)  (ACGT)N25(ACGT)(MPS)
[label missing]       1    0     1            1             1                    1                     2
EmBP-1                2    0     2            1             2                    2                     2
Gamyb                 5    0     5            5             5                    5                     5
HAT5                  2    0     2            2             2                    2                     2
HMG-1                 6    0     6            6             6                    6                     6
HMG-I/Y               6    0     6            6             6                    6                     6
id1                   5    0     5            5             5                    5                     5
myb.Ph3               1    0     1            1             2                    1                     1
PEND                  1    0     1            1             1                    1                     1
squamosa              2    0     3            3             3                    3                     3
TGA1A                 1    0     1            1             2                    2                     2
Total                 35   0     36           36            39                   38                    40
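The probability argument used in this section can be made concrete under a deliberately simple null model in which bases are drawn independently and uniformly, an assumption that real genomes do not satisfy. Under that model, one fixed query of length L is expected about (G - L + 1) x 4^(-L) times on one strand of a sequence of length G, so every additional spacer base divides the expectation by four. The sketch below is illustrative only: the function name is hypothetical and the genome length in the usage lines is a round figure assumed for demonstration.

```python
def expected_occurrences(genome_len: int, query_len: int, strands: int = 2) -> float:
    """Expected exact matches of ONE fixed query in a uniform random sequence.

    Assumes independent bases with P(A) = P(C) = P(G) = P(T) = 0.25.
    Use strands=1 for a query that is its own reverse complement.
    """
    positions = max(genome_len - query_len + 1, 0)
    return strands * positions * 0.25 ** query_len


if __name__ == "__main__":
    assumed_genome_len = 120_000_000  # round illustrative value, not a measured size
    for spacer_len in (0, 5, 10, 25):
        query_len = 4 + spacer_len + 4  # ACGT + spacer + ACGT
        print(spacer_len, expected_occurrences(assumed_genome_len, query_len, strands=1))
```

Under this model the expectation falls monotonically with spacer length, which is why a higher count at 25 base pairs than at 10 base pairs stands out.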

A and G Are the Preferred Bases.
We increased the spacer length one residue at a time and looked up the frequency of each resulting sequence in the database. As shown in Table 2, A and G are the preferred bases in the spacer region between two ACGT sequences.
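The one-residue-at-a-time extension can be sketched as follows. This is a simplified illustration, not the database search used in the study: it counts exact matches on a single strand of a sequence held in memory, and it assumes that the new base is appended at the 3' end of the existing spacer, a detail the text does not specify. Function names are hypothetical.

```python
import re


def count_overlapping(genome: str, query: str) -> int:
    """Count overlapping exact matches of `query` on the given strand."""
    return len(re.findall(f"(?={re.escape(query)})", genome))


def extension_counts(genome: str, spacer: str) -> dict:
    """Count ACGT + spacer + base + ACGT for each candidate base."""
    return {
        base: count_overlapping(genome, f"ACGT{spacer}{base}ACGT")
        for base in "ACGT"
    }


def best_extension(genome: str, spacer: str) -> str:
    """Return the spacer extended by the most frequent base (ties: first in ACGT order)."""
    counts = extension_counts(genome, spacer)
    return spacer + max("ACGT", key=lambda base: counts[base])
```

Comparing the four counts returned by `extension_counts` at each step is one way to express a base preference such as the one summarized in Table 2.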

Increasing Spacing between Motifs Increases Transcription Factor Binding Sites.
Potential transcription factor binding sites for all experimental sequences were predicted using the JASPAR CORE database and subsequently cross-checked with CONSITE. The minimal promoter sequence possesses 35 potential TF binding sites (Table 3, MPS). Interestingly, the sequence ACGT by itself has no transcription factor binding site, but when the minimal promoter is suffixed to it, an extra site for squamosa is generated and the total number of transcription factor binding sites increases from 35 in the minimal promoter alone to 36 (Table 3, (ACGT)(MPS)). When two ACGT elements in tandem are placed ahead of the minimal promoter sequence, no extra transcription factor binding site is generated (Table 3, (ACGT)2(MPS)). However, when the ACGT elements are separated by five base pairs (Table 3, (ACGT)N5(ACGT)(MPS)), four additional transcription factor binding sites are generated, while the ATHB-5 binding site that existed in the earlier cases is lost. The new sites are for the transcription factors bzip9-10, EmBP-1, myb.Ph3, and TGA1a. Placement of two ACGT elements separated by 10 base pairs, however, resulted in the loss of one myb.Ph3 site, and the total number of transcription factor binding sites decreased to 38 (Table 3, (ACGT)N10(ACGT)(MPS)). When the ACGT elements are separated by 25 base pairs and followed by the minimal promoter, additional sites for ARR10 and dof3 are generated (Table 3, (ACGT)N25(ACGT)(MPS)). Based on the data obtained in this study, we report that there has been directed evolution of a larger motif in the Arabidopsis thaliana genome.

Conclusions
The central question in promoter evolution is how cis regulatory element multiplicity evolved. The promoter regions of many genes contain multiple binding sites for the same transcription factor. Multiplicity may have evolved through transitional forms showing redundant cis regulation. In this paper, we focused on the multiplicity of the ACGT cis element and on the distances between copies of this element that occur in natural promoters. We found that ACGT elements separated by 25 base pairs are more frequent than those separated by 10 base pairs, which is against the law of probability. This signifies that under some evolutionary forces this interval was favoured, since this distance may cause changes in the level of gene expression or in its robustness against variation in transcription factor concentration. Selection for different levels of expression of certain genes in certain environments could, over time, generate a positive association between cis element multiplicity and expression level.