The WRKY Transcription Factor Genes in Lotus japonicus

WRKY transcription factor genes play critical roles in plant growth and development, as well as stress responses. WRKY genes have been examined in various higher plants, but they have not been characterized in Lotus japonicus. The recent release of the L. japonicus whole genome sequence provides an opportunity for a genome wide analysis of WRKY genes in this species. In this study, we identified 61 WRKY genes in the L. japonicus genome. Based on the WRKY protein structure, L. japonicus WRKY (LjWRKY) genes can be classified into three groups (I–III). Investigations of gene copy number and gene clusters indicate that only one gene duplication event occurred on chromosome 4 and no clustered genes were detected on chromosomes 3 or 6. Researchers previously believed that group II and III WRKY domains were derived from the C-terminal WRKY domain of group I. Our results suggest that some WRKY genes in group II originated from the N-terminal domain of group I WRKY genes. Additional evidence to support this hypothesis was obtained by Medicago truncatula WRKY (MtWRKY) protein motif analysis. We found that LjWRKY and MtWRKY group III genes are under purifying selection, suggesting that WRKY genes will become increasingly structured and functionally conserved.


Introduction
Transcription factors are crucial in regulating gene expression. Transcription factors present sequence-specific DNA binding sites and are able to modulate the transcription rate of downstream target genes [1]. WRKY genes have primarily been located in plants, where they are one of the most important transcription factor families [2]. WRKY genes are defined by having a unique WRKY domain of approximately 60 amino acid residues [2]. The WRKY domain contains a highly conserved amino acid sequence WRKYGQK at the N-terminal and a metal chelating zinc finger motif (C-X 4-5 -C-X 22-23 -H-X-H, (C 2 H 2 ) or C-X 5-8 -C-X 25-28 -H-X 1-2 -C, (C 2 HXC)) at the C-terminal end [2][3][4][5]. In some WRKY genes, the WRKY domain can be characterized as WRRY, WSKY, WKRY, WVKY, or WKKY [6]. WRKY transcription factors interact with the W-box (TTGAC[T/C]) sequence in promoter regions to modulate gene expression [4,7,8]. In addition, WRKY transcription factors bind SURE, a novel ciselement in higher plants, to regulate sugar response [9].
Members of the WRKY family can be classified into three groups according to the number of WRKY domains and the pattern of the zinc finger motif [2]. Generally, group I WRKY transcription factors contain two WRKY domains with distinct functions. Previous studies have demonstrated that the C-terminal WRKY domain mediates sequencespecific binding to the target DNA [8,10,11]. It has been proposed that the N-terminal WRKY domain increases the affinity or specificity of these proteins to the target sites [2]. Group II and III WRKY transcription factors contain one WRKY domain with a C 2 H 2 zinc finger motif and C 2 HXC zinc finger motif [2]. Based on a phylogenetic analysis of the WRKY family, the members of group II can be divided into five subgroups: IIa, IIb, IIc, IId, and IIe [2].

Sequence Database
Search. The L. japonicus genome sequence (build 2.5) was downloaded from http://www .kazusa.or.jp/lotus/. The complete set of WRKY gene sequences was identified using a deliberative process. First, a Hidden Markov Model (HMM) profile of the WRKY domain (PF03106) was downloaded from the Pfam database (http:// pfam.sanger.ac.uk/) [50]. We employed the WRKY domain as a query to identify all possible WRKY gene sequences in the L. japonicus genome database using the BLASTp program ( value = 0.001). Subsequently, a search on the Pfam database was used to confirm and classify each putative WRKY sequence. We located overlapping genes by aligning all of the candidate WRKY gene sequences using Clustal W [51]. Only the nonoverlapping WRKY sequences were used for further analysis.

Gene Duplication and Chromosomal Locations of WRKY
Genes. To detect potential gene duplications, we aligned and calculated all of the relevant genes identified in L. japonicus genomes. We defined gene duplication between any two loci such that [52]: (1) the alignable nucleotide sequence covered >70% of the longer sequence; (2) the amino acid identity between the sequences was >70% identical.
In order to determine the physical locations of WRKY genes in chromosomes, we blasted each WRKY gene as a query against the L. japonicus genome (http://www.kazusa .or.jp/lotus/) to determine the initiation site of each gene. MapInspect software was used to draw the location images of the WRKY genes (http://www.plantbreeding.wur.nl/uk/ software mapinspect.html).

Multiple Sequence Alignments, Phylogenetic, and Gene
Structure Analysis. To examine the domain organization of WRKY proteins in detail, multiple sequence alignments of WRKY domain sequences were performed using Clustal W [51]. Phylogenetic and molecular evolution analyses of WRKY proteins were conducted in MEGA 4.0 [53]. Phylogenetic trees were estimated with the Neighbor-Joining method. Bootstrap analyses of 1,000 repetitions were obtained for each tree to analyze statistical support for nodes. The complete amino acid sequences of group III WRKY genes were analyzed for evidence of selection pressure [54]. Gene structure display server (GSDS) software [55] was used to illustrate exon-intron organization for individual WRKY genes by comparing the cDNA sequence with the corresponding genomic DNA sequence.

Identification of Conserved
Motifs. The program MEME 4.9 [56] was used to predict motifs in the WRKY domain with the following parameters: (1) any number of repetitions, (2) an optimum motif width between 6 and 200 residues, (3) and a maximum of 20 motifs. Structural motif annotation was performed using the Pfam database.

Selection Pressure in Group III WRKY Proteins.
The amino acid sequences of group III Medicago truncatula (MtWRKY) and L. japonicus WRKY (LjWRKY) proteins were used to estimate phylogenetic trees and then the trees were used to detect evidence of selection. The PAL2NAL program [57] was used for conversion of a protein sequence into the corresponding nucleotide sequence. We used PAML 4.7 [58] to analyze codon substitution patterns in a maximum likelihood framework, implementing a site-specific model. The program CODEML was employed to calculate the N / S ratio (or ), the ratio of nonsynonymous/synonymous distances. Generally, = 1, >1, and <1 indicate neutral, positive, and purifying selection, respectively. We detected variation in among sites by employing a likelihood ratio test (LRT) between M0 versus M3 and M7 versus M8 [54]. The nodes were considered to have undergone positive selection, if they satisfied the following criteria [59]: (1) an estimate of > 1 under M8, (2) sites identified to be under positive selection by Bayes Empirical Bayes (BEB) analysis, (3) and a statistically significant LRT.

Identification and Classification of WRKY Genes.
In this study, a total of 71 WRKY genes in the L. japonicus genome (build 2.5) were identified (Table 1). Among these sequences, 10 WRKY genes were excluded from this study due to the divergent structures of the putative proteins in these genes, a lack of specific domains or motifs, and the short length International Journal of Genomics 3  (Tables 1 and 2). To obtain a better classification within group II WRKY proteins, we constructed a phylogenetic tree with 42 group II LjWRKY using the Arabidopsis WRKY sequence as a reference [2] ( Figure 1). It was found that LjWRKY genes in group II could be divided into six subgroups, including 5 members in subgroup IIa, 8 in subgroup IIb, 13 in subgroup IIc, 5 in subgroup IId, 9 in subgroup IIe, and 2 in subgroup IIx    (Tables 1 and 2 and Figure 1). Among them, LjWRLY19 and LjWRKY37 genes did not cluster with Arabidopsis, and we temporarily named this group IIx ( Figure 1).
We found 9 potential pseudogenes among 61 WRKY genes, with either a premature stop codon or a frame shift mutation. Among these pseudogenes, 3 genes were found in group I, 2 in group IIa, 2 in group IIc, and 2 in groups IIx and IIe (Table 1).

Gene Duplication and Chromosomal Locations of WRKY
Genes. Gene duplication, including tandem and segmental duplication events, plays a crucial role in genomic expansions. Two or more duplicated genes located in the same chromosome are defined as tandem duplication, while other types of gene duplication are defined as segmental duplication events. We detected only one tandem duplication (LjWRKY34 and LjWRKY35) on chromosome 4 ( Figure 2), suggesting that WRKY genes in L. japonicus are not recently generated by gene duplication.
A total of 48 WRKY genes could be mapped to chromosomes 1-6, and the others could not be conclusively mapped to any chromosomes, because some WRKY genes have no precise location information. Fourteen WRKY genes including two group I, 10 group II, and two group III genes were located on chromosome 1. Thirteen genes were mapped to chromosome 4. Four genes were mapped to chromosome 3 (one group I, two group II, and one group III genes) and chromosome 6 (two group I and two group II genes), respectively. Eight (two group I, four group II, and two group III genes) and five WRKY genes (five group II genes) were found on chromosome 2 and chromosome 5, respectively ( Figure 2).
A gene cluster is defined as a chromosome region with two or more genes located within 200 kb sequence [60]. Using this criterion, we found 13 WRKY genes forming six gene clusters. Chromosomes 2 and 4 each contain two gene clusters, while only one gene cluster was found on chromosomes 1 and 5, respectively ( Figure 2). No clusters were found on chromosomes 3 and 6. Through chromosomal location and gene cluster analysis, we found that the number of genes on chromosomes is disproportionate to the number of gene clusters. For example, 14 WRKY genes were found on chromosome 1, which contains only one gene cluster.

Phylogenetic Analysis and Gene
Structure. Amino acid residues of WRKYGQK are the distinguishing regions of the WRKY transcription factor [2,27]. Multiple alignment analysis of LjWRKY domains found that mutations occurred at R, Y, and Q in the conserved WRKYGQK sequence ( Figure 3). Further study showed that the variation arose from amino acid substitutions of R to K or L or from K to C and from Q to Y, E, L, or K ( Figure 3). On the other hand, we identified a CX 4 CX 22 HXH zinc finger motif in subgroup In and IIx genes, a CX 4 CX 23 HXH motif in subgroup Ic and IIe genes, and a CX 5 CX 23 HXH motif in subgroup IIa, IIc, IId, and IIe genes ( Figure 3). We found that the WRKY domain was replaced by WKKY in subgroup In and IIx genes, suggesting that subgroup IIx genes originated from 6 International Journal of Genomics        Figure 4(a)). The group I proteins contain two WRKY domains located at the N-terminal domain (In) or the Cterminal domain (Ic). Although In and Ic belong to group I, members of In and Ic were clustered in different clades, representing the sequence divergence of WRKY domain in In and Ic.

S S T K G C L A R K Q V S D S W A W R K Y G Q N: P S K V P H I P G H T K G Y Y R C S S S K G C P A R K Q V S D L W A W R K Y G Q K P I K G S P Y P ---R G Y Y R C S S S K G C S A R K Q V -D S W A W R K Y G K K S I K G T P Y P ---R N: Y F K C T S S K N: C P A L K R V -D L W A W R K Y G Q K P I K G S P Y P ---R N: Y Y R C S S S K G C A A R K Q V S D I W A W R K Y G Q K P I K G S P Y P ---R G Y Y R C S S S K G C L A R K Q V A D M W E W R K Y G Q K P I K G S T Y P ---R S Y Y K C N: N: M K K C D A K K Q V E R C R T D S S M L I I T Y T S T H N: H P -E R N: R S D P T M F I V T Y T A E H N: H P -E R S H L D P A V F L V T Y T A E H N: H S -E R S R A D P T M L V V T Y S S E H N: H P -E R S R T D P N: M L V I T Y T S E H N: H P -E Q S N: I K S H M F I V T Y K G E H S H L -E R S N: T E D D I F I V T Y T G D H S H P -E R N: K S E P A M F I V T Y T G E H N: H P -E R S R T D L S M F I V T Y I G D H D H E K L D D G Y G W R K Y G Q K S I L G A K F P R G Y Y R C T H R N: V Q G C L A T K Q V -D D D H S W R K Y G Q K D I L N: S Q F P R S Y F R C T R K H D Q G C R A N: K Q V -E D G Y N: W R K Y G Q K D I L G S K H P R S Y Y R C T F R K S K G C W A T K Q V -E D G F T W R K Y G Q K V I H A S K Y P R N: Y Y R C T R ---Q K C L A K K R V L D D G Y S W R K Y G Q K D I L G A K F P R G Y Y R C T H R N V Q G C Q A T K Q V -D D G Y N: W R K Y G Q K D I L G A K Y P R S Y Y R C T F R N T Q N: C W A T K Q V L D D G Y S W R K Y G Q K D I L G A I Y P R G Y Y R C T H R N V Q G C L A T K Q V Q R S D E D P T T I E V T Y R G R H T C K Q R I Q E N: P D M Y Q I T Y I G F H T C K Q R S D E D P T I L D I S Y H G K H T C S Q R L D D T P E I Y E V T Y R N: E H T C -Q K S D K D P T M Y E I N: Y R G K H T C T Q R S D E D P T I F D I T Y R G K H T C S Q R S D E D S T V F E I T Y R G N
Clade Ic contained 14 members in L. japonicus, including 12 members with WRKY domains at the C-terminal region and two group II members (LjWRKY20 and LjWRKY63) (Figure 4(a)). These two group II members were clustered with LjWRKY24c and LjWRKY32c, respectively, indicating a common origin of their domains. Moreover, two group II members (LjWRKY19 and LjWRKY37) were also found in the In clade.
Group II can be divided into five clades with high bootstrap values (Figure 4(a)). Further analysis revealed that IIa and IIb form a single clade and IId and IIe form a clade, while IIc contained 11 LjWRKY members form a clade (Figure 4(a)). This suggests that the domains had a recent gene ancestor or formed under similar selective pressures. Clade III contained seven members which are more similar to clade IId and clade IIe than any other members (Figure 4(a)), suggesting that they may have shared an ancestor before divergence of group II and group III.
During analysis of the cDNA and DNA sequences, we found that most of the LjWRKY genes contained two types of introns in their WRKY domains. The phase-2 intron is spliced exactly after the R position, similar to the splicing position observed in Arabidopsis [2]. We designate the phase-2 intron as an R-type intron. A phase-0 intron is located before the V position, at the sixth amino acid after the second C residue in the C 2 H 2 zinc finger motif [22]. We designate the phase-0 intron as a V-type intron. Interestingly, in subgroups Ic, IIc, IId, IIe, and III, the R-type intron is located before the zinc finger motif region in the WRKY domain of genes, while in subgroups IIa and IIb, the Vtype intron is found within the zinc finger motif region in the WRKY domain (Figure 4(b)). Furthermore, introns have been lost from LjWRKY28 (subgroup IIe), LjWRKY51c (subgroup Ic), and subgroups In and IIx (Figure 4(b)). Intron loss can be considered as the result of intron turnover, the result of homologous recombination between an introncontaining allele and a mature mRNA [22].

Conserved Motifs in LjWRKY Proteins.
With the exception of the conserved 60 amino acid residues, no functional or structural homologies were previously known from the remainder of the WRKY protein sequences [2]. Analysis of the 20 motifs revealed that LjWRKY motif lengths ranged from 11 to 113 and the distribution of 20 motifs in each amino acid sequence varied greatly (Table 3 and Figure 5). In addition, the function of the majority of LjWRKY motifs could not be predicted. Unexpectedly, we observed a herpes virus glycoprotein motif (motif14) in LjWRKY34 and LjWRKY35, which has not been reported in previous studies of WRKY genes. It will be interesting to analyze the function of this motif in LjWRKY genes in the future.
LjWRKY47n  Name Combined P value Figure 5: Distribution of 20 predicted conserved motifs in Lotus japonicus WRKY proteins. Motifs of Lotus japonicus WRKY proteins were identified by MEME program. The conserved amino acid sequences and length of each motif are shown in Table 3.

Evolutionary Analysis of Group III Genes in Plants to
Determine Selection Pressure in L. japonicus and M. truncatula. In order to study the phylogenetic relationships of group III genes, a phylogenetic tree was estimated using the WRKY domain of seven species, including monocots and dicots. Group III LjWRKY genes did not form a clade; on the contrary, group III LjWRKY genes formed a clade with MtWRKY genes (Figure 6). This suggests that we did not find any paralogs of group III LjWRKY genes, but it suggests that group III LjWRKY genes are orthologous to MtWRKY genes. Paralogous relationships were observed among WRKY genes in other species (i.e., MtWRKY, AtWRKY, PtWRKY, OsWRKY, and BdWRKY). Gene duplication events are considered as the most likely process to result in paralogous copies of genes [59].
To detect whether selection pressure affected group III LjWRKY genes, ( N / S ) was calculated for phylogenetic nodes in PAML (Table 4 and Figure 7). In L. japonicus and M. truncatula, the ML estimations of values for all nodes under the model M0 were <1 ( Table 4), suggesting that group III LjWRKY and MtWRKY genes have been under purifying selection during evolution. Nevertheless, the log likelihood ratio differences between models M3 and M0 were statistically significant for all nodes tested, except nodes 1 and 2 in MtWRKY (Table 4). This indicates that some genes may be under positive selection. Interestingly, we further analyzed the positive selection in group III genes with models M8 and M7. The values for all nodes were ≥1 under M8. However, only one node identified one positive selection site in group III MtWRKY genes under model M8 (Table 4). This result shows that group III WRKY genes in L. japonicus and M. truncatula have not undergone positive selection.

Discussion
WRKY genes are commonly found in land plants and many WRKY genes have been identified and classified in Arabidopsis [2], Oryza sativa [6,22,62], Hordeum vulgare [63], Cucumis sativus [59], Brachypodium distachyon [64], and Populus trichocarpa [61]. However, little information has been reported on WRKY genes in leguminous forage crops. In 2008, approximately 67% of the L. japonicus genome (472 Mb) sequences were available on public databases, representing 91.3% coverage of the gene space [49]. In our current work, we conducted an analysis of 61 WRKY genes in the L. japonicus genome. One hundred and four WRKY genes were identified in P. trichocarpa, while only 55 WRKY genes were discovered in C. sativus and Vitis vinifera genomes, respectively ( Table 2). The relative number of WRKY genes was not associated with genome size. For example, the number of WRKY genes was about 2x greater in P. trichocarpa (104) than in C. sativus (55), while these two plants have an approximately equal genome size (458 Mb and 487 Mb; Table 2). By analyzing the number of WRKY gene groups or subgroups (except for subgroup IIx) in each species where they have been characterized, we discovered an uneven distribution of the number of WRKY genes in each group. In rice, the largest number of WRKY genes (36) was found in group III, but only four genes were found to belong to subgroup IIa (Table 2). In C. sativus, however, more WRKY genes (16) are classified as subgroup IIc and fewer WRKY genes (4) are categorized in subgroup IIa (Table 2).
Although gene duplication events seem to have led to the expansion of WRKY genes in the Arabidopsis [2] and Oryza [22] genomes, duplicated WRKY genes were not detected in C. sativus [59]. It is not yet clear whether gene duplication typically occurs during LjWRKY gene evolution. Among the 61 WRKY genes, we found that, in L. japonicus, only 2 of them were involved in duplication events. In contrast, 11 gene duplication events were identified in the model plant M. truncatula. In addition, there were 42 PtWRKY gene duplication events identified in the P. trichocarpa genome [61], with 29 out of 42 PtWRKY genes arising from segmental duplication. This comparison suggests that the expansion of LjWRKY genes is not necessarily dependent on gene duplication events. We found few duplicated LjWRKY genes in the L. japonicus genome and there are at least three possible explanations: (1) most duplicated genes have been lost after segmental duplication events in LjWRKY genes [65]; (2) LjWRKY genes that are nonfunctional or duplicate the function of other copies are inclined to disappear to avoid fitness cost; and (3) the draft sequenced genome in L. japonicus may not yet contain all WRKY genes present in the genome.
The WRKY gene sequences contain two types of conserved introns (R-type and V-type) [22], and we found them in this study. In Arabidopsis [2] and C. sativus [59] subgroup IIa and IIb WRKY genes, R-type introns are inserted in domains located at the fourth amino acid (K residue) after the second C residue in the zinc finger region and there are no Vtype introns. These results suggest that introns in subgroup IIa and IIb WRKY genes may have different origins.
The average length of the conserved introns (597 bp) in L. japonicus WRKY domains was longer than that in Arabidopsis (241 bp) but shorter than that in M. truncatula (705 bp). In nematodes and mammals, there is a dramatic decline in average intron size when there is increased gene expression [66]. However, the correlation between expression of WRKY genes in various plants and the intron length needs to be tested.
We found an interesting phenomenon which may provide context to interpret group I, II, and III WRKY gene origin and evolutionary relationships. Two conserved motifs (motif4 and/or motif9) were observed after motif1 (contains the conserved sequence WRKYGQK) in the N-terminal region, while, in the C-terminal region, other conserved motifs (motif5 or/and motif13) occur before motif1 ( Figure 5). In addition, analysis of group II and III LjWRKY gene motifs revealed that most LjWRKY genes contain motif1, motif5, and/or motif13. However, motif1, motif4, and/or motif9 in the N-terminal region of group I genes are distributed in LjWRKY19, LjWRKY37, (group II) and LjWRKY25 and LjWRKY27 (group III), respectively ( Figure 5). Previous research showed that group II and III WRKY genes evolved from group I through the loss of the N-terminal WRKY domain [2,5]. However, our results indicate that some WRKY genes in groups II and III may have originated from the N-terminal region of group I. From the phylogenetic relationship, we found that LjWRK19 and LjWRKY37 (group II) clustered with group In with well-supported bootstrap values, suggesting that they have a common origin or ancestry. Moreover, we consider that some group III WRKY genes could have resulted from a mutation in the zinc finger motif in the N-terminal of group I genes. To confirm our inference, 13 we analyzed the conserved motifs in MtWRKY genes. The same phenomenon was detected in MtWRKY genes based on analysis of location of motifs (data not shown) and 13 MtWRKY genes might have evolved from N-terminal of group I genes.
The natural selection pressure imposed by pathogens is expected to be diverse in different plants [67] and WRKY genes and NBS-LRR genes forming a fused gene may effectively resist a wide variety of pathogens. Fusion genes contain the C-terminal WRKY motif and a NBS-LRR (nucleotidebinding site-leucine-rich repeat) motif in the gene was identified in AtWRKY [68,69] and OsWRKY [22] genes. The gene is mainly involved in a pathogen response pathway in plants [70]. Fusion gene, one gene included WRKY and NBS-LRR domains and/or motifs, was not detected in L. japonicus.
The majority of group III AtWRKY genes under positive selection are expressed in response to various abiotic stresses [59]. In contrast, the expression of group III CsWRKY genes shows that these genes are under purifying selection and are specialized to respond as single type of stress [59]. Ling et al. showed that positive selection may have resulted in the functional divergence of duplicated genes during the expansion of group III WRKY genes in Arabidopsis [59]. Similarly, our selection pressure study of group III LjWRKY and MtWRKY genes found that expansion of these genes may be under purifying selection, although gene duplication events occurred within these genes. Purifying selection may generate genes with conserved functions or pseudogenization [71] in duplicated group III MtWRKY genes. Therefore, we speculate that group III LjWRKY and MtWRKY genes may be more conservative in their response to stress.