Significant Microsynteny with New Evolutionary Highlights Is Detected through Comparative Genomic Sequence Analysis of Maize CCCH IX Gene Subfamily

CCCH zinc finger proteins, which are characterized by a motif of three cysteine residues and one histidine residue, play important roles in RNA processing in plants. Subfamily IX CCCH proteins were recently shown to function in stress tolerance. In this study, we analyzed CCCH IX genes in Zea mays, Oryza sativa, and Sorghum bicolor. These genes, which are almost intronless, were divided into four groups based on phylogenetic analysis. Microsynteny was detected in the regions surrounding several gene pairs, indicating that segmental duplication has played an important role in the expansion of this gene family. In addition, we dated the duplications by Ks analysis and found that all microsynteny blocks were formed after the monocot-eudicot divergence. Deletions, multiplications, and inversions have occurred over the course of evolution. Moreover, the Ka/Ks ratios indicated that the genes in these three grass species are under strong purifying selection. Finally, we investigated the evolutionary patterns of several gene pairs associated with tolerance to abiotic stress, laying the foundation for future functional studies of these transcription factors.


Introduction
Transcription factors (TFs) are critical regulators of gene expression that control many important biological processes, such as cellular morphogenesis, signal transduction, and environmental stress responses [1]. Zinc finger TFs form one of the largest TF families in plants and can be categorized into at least 14 families, including the RING finger, WRKY, DOF, and LIM families [2][3][4]. These TFs are known to regulate gene expression by binding DNA or interacting with other proteins. However, previous reports described a distinct type of Arabidopsis zinc finger protein, the CCCH zinc finger family, which is involved in mRNA binding and processing [5]. CCCH-type proteins are TFs with a characteristic motif consisting of three cysteine residues and one histidine residue. CCCH proteins contain one to six copies of the CCCH-type zinc finger motif, which was originally defined as C-X6–14-C-X4–5-C-X3-H (X denotes any amino acid); following a genome-wide analysis of rice and Arabidopsis thaliana CCCH proteins, a recent study redefined the motif as C-X4–15-C-X4–6-C-X3-H [5].
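To make the two motif definitions concrete, the sketch below encodes each as a Python regular expression (X = any residue) and scans a protein string. It is illustrative only: the pattern names and the synthetic test sequence are hypothetical, and because regex quantifiers are greedy and matches are non-overlapping, proteins with densely packed cysteines may need a more careful scan (for example, testing every cysteine as a possible motif start).

```python
import re

# X (any amino acid) is written as "." in a regular expression.
CCCH_ORIGINAL = re.compile(r"C.{6,14}C.{4,5}C.{3}H")  # C-X6-14-C-X4-5-C-X3-H
CCCH_REVISED = re.compile(r"C.{4,15}C.{4,6}C.{3}H")   # C-X4-15-C-X4-6-C-X3-H


def find_ccch_motifs(protein, pattern=CCCH_REVISED):
    """Return 0-based, end-exclusive (start, end) spans of non-overlapping motif matches."""
    return [(m.start(), m.end()) for m in pattern.finditer(protein.upper())]


# Synthetic sequence (not a real protein) carrying C-X8-C-X5-C-X3-H,
# which satisfies both definitions.
seq = "MAC" + "K" * 8 + "C" + "G" * 5 + "C" + "SSS" + "HPP"
print(find_ccch_motifs(seq))                 # [(2, 22)]
print(find_ccch_motifs(seq, CCCH_ORIGINAL))  # [(2, 22)]
```

A spacing such as C-X4-C-X6-C-X3-H is accepted only by the revised definition, which illustrates why the redefinition enlarges the set of recognized CCCH proteins.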
Recent studies have revealed that CCCH proteins participate in the regulation of plant growth, development, and environmental responses. In rice, a novel nuclear-localized CCCH-type zinc finger protein, OsDOS, delays leaf senescence by integrating developmental cues into the jasmonate pathway [6]. In pepper and rice, the CCCH-domain proteins CaKR1 and OsC3H12, respectively, were shown to protect plants from bacterial blight [7,8]. During senescence in Arabidopsis, HUA1, a CCCH-type zinc finger protein with six tandem CCCH motifs, likely participates in regulating flower development [9]. In addition, some CCCH zinc finger proteins are involved in the abiotic stress response. Two closely related Arabidopsis proteins, AtSZF1 and AtSZF2, both modulate salt stress tolerance [10]. Recently, ZmC3H4 and ZmC3H28, which are indirectly regulated by ABA or drought, and 10 other maize CCCH IX genes were found to be responsive to abiotic stress [11]. The CCCH-type zinc finger protein family has been studied phylogenetically in some model organisms, but its evolutionary history remains poorly understood. Gramineae, which evolved approximately 60-70 mya (million years ago) from a common ancestor, includes a number of agronomically important crops, such as Oryza sativa, Zea mays, and Sorghum bicolor [12]. The origin of these crops dates back approximately 50 to 65 mya, and the family now comprises over 10,000 species [13]. Whole-genome analyses have revealed high levels of genetic conservation in grasses over the course of evolution, yet these studies have revealed no trace of microsynteny (preservation of a specific local gene order) across grasses [14,15]. The goal of this study was to identify stress-responsive genes in Z. mays, O. sativa, and S. bicolor and to analyze the evolutionary relationships of the CCCH IX subfamily at the molecular level using microsynteny analysis.
Specifically, we searched for CCCH IX subfamily genes and predicted their structures. To determine the expression patterns of CCCH IX genes in maize tissues, we used publicly available microarray data from Sekhon et al. [16]; the expression map is shown in Figure S1 (in Supplementary Material available online at http://dx.doi.org/10.1155/2015/824287). In addition, we identified duplication events and calculated Ka/Ks values. Finally, we constructed microsynteny maps to identify CCCH IX genes that have been conserved during evolution. The results of this study lay the foundation for future functional studies of CCCH IX subfamily genes.

Identification of Genes Encoding CCCH IX Proteins.
The most recent versions of the genome, protein, and cDNA sequences for the following three grass species were downloaded from their respective genome sequence sites: Oryza sativa (version 7.0) from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/), Zea mays (version 2.0) from the B73 Maize Genome Project (http://www.maizesequence.org/index.html), and Sorghum bicolor (version 1.0) from the DOE-JGI Community Sequencing Program (CSP) (http://www.phytozome.net/sorghum.php). These nucleotide and protein sequences were used to build local databases with DNATOOLS software [17]. The Hidden Markov Model (HMM) profile of the conserved CCCH domain (PF00642) was obtained from the Pfam database (http://pfam.sanger.ac.uk/) [18]. This HMM profile was used as a query to search the protein databases with the BLASTp program (version blast-2.2.9-ia32-win32; E-value = 0.001), a permissive step intended to retrieve as many similar sequences as possible. All predicted protein sequences were then analyzed against the Pfam HMM database and with the SMART tool (http://smart.embl-heidelberg.de/) to confirm the presence of CCCH domains, and proteins lacking these domains were excluded from the dataset [19]. All remaining sequences were aligned using Clustal W, and identical sequences were checked manually to remove redundant genes prior to subsequent analysis [20].
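The E-value filtering and redundancy-removal steps can be sketched in Python, assuming the BLASTp hits were saved in the standard 12-column tabular format (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall). The function names and gene IDs are hypothetical, and the Pfam/SMART domain check is not reproduced here.

```python
def candidate_hits(blast_tab_lines, max_evalue=1e-3):
    """Subject IDs from BLAST tabular output with E-value <= max_evalue.

    Tabular columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (E-value is index 10).
    """
    hits = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def remove_identical(sequences):
    """Keep one ID per identical protein sequence; `sequences` maps ID -> sequence."""
    first_id_for_seq = {}
    for seq_id in sorted(sequences):
        first_id_for_seq.setdefault(sequences[seq_id], seq_id)
    return sorted(first_id_for_seq.values())
```

Keeping the alphabetically first ID per identical sequence is an arbitrary, reproducible choice; the manual check described above could prefer a different representative.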
The molecular weight (kDa) and isoelectric point (pI) of each protein were calculated using the online ExPASy tools (http://www.expasy.org/tools/) [21]. Intron distribution patterns and intron/exon boundaries of the CCCH IX genes were deduced with the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/) by comparing the predicted full-length cDNA or coding sequences with the corresponding genomic sequences [22].
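For readers who want approximate values offline, the sketch below computes average molecular weight from standard average residue masses and estimates pI by bisection on the Henderson-Hasselbalch net charge. The pKa set is an illustrative textbook assumption, not the set used by ExPASy ProtParam, so pI values from this sketch will differ somewhat from those in Table 1.

```python
# Average residue masses (Da) of the 20 standard amino acids within a chain;
# one water is added back for the free N- and C-termini.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative textbook pKa values (assumption; ExPASy uses a different set).
PKA_POS = {"Nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEG = {"Cterm": 2.34, "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}


def molecular_weight_kda(protein):
    return (sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER) / 1000


def net_charge(protein, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    counts = {aa: protein.upper().count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(protein, tol=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid  # still positively charged: pI lies above mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection works here because the net charge decreases monotonically as pH rises.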

Phylogenetic Analysis of CCCH IX Genes.
Phylogenetic analysis of the CCCH IX genes was performed by aligning their protein sequences with Clustal W. The phylogenetic tree was constructed in MEGA 6.0 by the neighbor-joining (NJ) method with the following parameters: Poisson correction, pairwise deletion, and bootstrapping (1,000 replicates) [23]. To confirm the robustness of the NJ tree, a maximum likelihood (ML) tree was constructed (MEGA 6.0; 1,000 bootstrap replicates; Jones-Taylor-Thornton amino acid substitution matrix), and a maximum parsimony (MP) tree was constructed (MEGA 6.0; 1,000 bootstrap replicates).
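The Poisson correction with pairwise deletion listed above corresponds to the distance d = -ln(1 - p), where p is the proportion of differing amino acids among the sites compared for a given pair. The minimal Python sketch below assumes gaps are written as '-' in the Clustal W alignment; it builds only the distance matrix, not the NJ tree or the bootstrap.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned proteins.

    Pairwise deletion: a column is skipped for this pair only if either
    sequence has a gap there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 1:
        return math.inf  # saturated: the correction is undefined
    return -math.log(1 - p)


def distance_matrix(alignment):
    """alignment maps name -> aligned sequence; returns sorted names and a square matrix."""
    names = sorted(alignment)
    return names, [[poisson_distance(alignment[i], alignment[j]) for j in names] for i in names]
```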

Detection of CCCH IX Gene Expansion.
The following analysis was performed to gain in-depth knowledge of the evolutionary relationships among CCCH IX genes and to determine whether these genes were derived from segmental or tandem duplication events. Tandem duplication is characterized by the presence of multiple gene family members within the same or neighboring intergenic regions. To be defined as a segmental duplication event, each pair of protein-coding genes (excluding noncoding RNA genes, pseudogenes, and so on) must reside within a duplicated block in each genome, and their neighboring protein-coding genes must show high similarity at the amino acid level [24]. First, all identified CCCH IX genes were used as the original anchor points. Next, the 100 kb sequences upstream and downstream of each anchor point were compared by pairwise BLASTp analysis (E-value ≤ 10^−10) to identify duplicated genes between two independent regions. The number of protein-coding genes flanking each anchor point was then counted [25]. When three or more gene pairs with syntenic relationships were identified in two regions, the regions were considered to have been derived from a large-scale duplication event [26,27].
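A minimal Python sketch of the duplicated-block rule, assuming gene coordinates and pairwise BLASTp E-values are already available. The `Gene` record, the E-value dictionary, and all IDs are hypothetical; the 100 kb window, the E-value ≤ 10^−10 cutoff, and the three-pair threshold follow the text.

```python
from dataclasses import dataclass

WINDOW_BP = 100_000   # flanking window on each side of an anchor
MIN_GENE_PAIRS = 3    # syntenic gene pairs needed to call a duplicated block


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # bp
    end: int    # bp


def flanking_ids(anchor, genes, window=WINDOW_BP):
    """IDs of protein-coding genes overlapping anchor +/- window on the same chromosome."""
    lo, hi = anchor.start - window, anchor.end + window
    return {
        g.gene_id
        for g in genes
        if g.chrom == anchor.chrom and g.gene_id != anchor.gene_id
        and g.end >= lo and g.start <= hi
    }


def is_duplicated_block(anchor_a, anchor_b, genes, blastp_evalues, max_evalue=1e-10):
    """blastp_evalues maps (query_id, subject_id) -> E-value from pairwise BLASTp."""
    flank_a = flanking_ids(anchor_a, genes)
    flank_b = flanking_ids(anchor_b, genes)
    pairs = {
        (q, s)
        for (q, s), evalue in blastp_evalues.items()
        if evalue <= max_evalue and q in flank_a and s in flank_b
    }
    return len(pairs) >= MIN_GENE_PAIRS
```

Counting distinct (query, subject) pairs is a simplification; a stricter version might require one-to-one best matches so that a single flanking gene cannot contribute several pairs.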

Microsynteny Analysis.
Microsynteny analysis across the three species was carried out by comparing the specific regions containing CCCH IX genes. Similarity between the flanking genes of each CCCH IX gene in one species and those in the other species was determined by pairwise BLASTp comparisons. A syntenic block was defined as a region in which three or more conserved homologs (BLASTp E-value ≤ 10^−20) were located within 100 kb between genomes [28,29]. The relative syntenic quality of a region was calculated from the total number of genes in both conserved regions, excluding tandem duplicates. A circular microsynteny map was also constructed with Circos-0.54, a program written in Perl [30].
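The text does not state the syntenic-quality formula explicitly. One common formulation, assumed here, expresses the genes taking part in conserved pairs as a percentage of all genes in the two regions: quality = 2 × (conserved pairs) / (genes in region A + genes in region B) × 100, counted after collapsing tandem duplicates to a single representative. A minimal Python sketch with a hypothetical function name:

```python
def syntenic_quality(n_conserved_pairs, n_genes_a, n_genes_b):
    """Percent of genes in two regions that belong to conserved pairs.

    Assumed formula: 2 * pairs / (genes_a + genes_b) * 100, with tandem
    duplicates already collapsed before counting.
    """
    total = n_genes_a + n_genes_b
    if total == 0:
        raise ValueError("regions contain no genes")
    return 200.0 * n_conserved_pairs / total


# Example: 3 conserved pairs between regions of 10 and 8 genes -> about 33.3%.
print(round(syntenic_quality(3, 10, 8), 1))
```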

Ks Analysis of Homologous Segments.
The divergence time of duplicated gene pairs within each duplicated block, or of homologous segments, was estimated by calculating Ks values between homologous genes using DnaSP (version 5.10) [31][32][33]. Sliding-window analysis of the ratio of nonsynonymous substitutions per nonsynonymous site (Ka) to synonymous substitutions per synonymous site (Ks) was conducted with a window size of 150 bp and a step size of 9 bp [34]. Ks values can also be used to date large-scale duplications. For each pair of duplicated regions, the mean Ks of the conserved flanking homologs was calculated and converted to an approximate divergence time using the Gramineae substitution rate. The divergence time (T) was calculated as $T = \mathrm{Ks} / (2 \times 6.5 \times 10^{-9}) \times 10^{-6}$ mya [35].
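The window layout for the sliding-window Ka/Ks analysis can be sketched as below; the helper names are hypothetical, and the per-window Ka and Ks themselves were computed with DnaSP. Because 150 and 9 are both multiples of 3, every window starts on a codon boundary when the alignment begins at the first codon position. Dropping any trailing stretch shorter than one window is an assumption about how the alignment end is handled.

```python
def sliding_windows(length, window=150, step=9):
    """Yield 0-based, end-exclusive (start, end) windows over an alignment of `length` bp.

    A trailing stretch shorter than `window` does not get its own window.
    """
    for start in range(0, length - window + 1, step):
        yield start, start + window


def window_slices(aligned_a, aligned_b, window=150, step=9):
    """Corresponding slices of two codon-aligned sequences, one pair per window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (aligned_a[s:e], aligned_b[s:e])
        for s, e in sliding_windows(len(aligned_a), window, step)
    ]
```

For a 300 bp alignment this yields 17 windows, from (0, 150) to (144, 294).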

Phylogenetic and Sequence Structure Analysis of CCCH IX.
In a previous study, 67 CCCH genes were identified in Z. mays. Here, we identified 55 genes encoding CCCH zinc finger proteins in S. bicolor (Supplementary Table 1) using the BLASTp program and, for convenience, named them SbC3H1-SbC3H55 according to their chromosomal positions. Based on previous studies, six of these S. bicolor genes were assigned to the CCCH IX subfamily [5,11]. A total of 27 CCCH IX genes (nine from O. sativa, six from S. bicolor, and twelve from Z. mays; Table 1) were subjected to further analysis. The 27 encoded CCCH IX proteins range from 225 to 764 aa in length, with an average of 476 aa. Other information, including the clone number, chromosomal location, molecular weight (Mw), and isoelectric point (pI) of each CCCH IX gene/protein, is listed in Table 1. To determine the organization and distribution of CCCH IX genes on different chromosomes, we constructed a chromosome map; the 27 CCCH IX genes are randomly distributed across the chromosomes (Figure 1).
To explore the evolutionary relationships among members of the CCCH IX zinc finger subfamily, we constructed a phylogenetic tree with the neighbor-joining (NJ) method based on a protein sequence alignment. The tree is divided into four clades (Figure 2(a)). The clades differ in size; clades 1-3 contain proteins from all three grass species, whereas S. bicolor proteins are absent from clade 4. This difference may reflect partial gene loss after large-scale duplication events that accompanied speciation. The ML and MP trees are largely consistent with the NJ tree (Figure S2). We examined gene structure by exon-intron analysis (Figure 2(b)) using GSDS (http://gsds.cbi.pku.edu.cn/) [36]. The most closely related CCCH IX members within a clade share similar gene lengths and exon lengths. The only exception was SbC3H44 from S. bicolor, which contains one intron and one CCCH domain (Figure 3). The remaining 26 genes are entirely intronless, and all of them contain two CCCH domains. These findings imply that this subfamily has remained structurally conserved, in both exon-intron structure and number of CCCH domains, over millions of years of evolution.

Complicated Duplication Events Have Contributed to CCCH IX Expansion.
We mapped the chromosomal locations of CCCH IX genes in these grasses and examined the evolutionary relationships among them. Because CCCH IX genes are irregularly distributed, no tandem duplications were found on the relevant chromosomes. To determine whether the regions flanking CCCH IX genes have undergone large-scale duplication events during evolution, we compared the flanking genes of every pair of CCCH IX genes. If three or more flanking genes had a best non-self match in BLASTp (E-value ≤ 10^−10 within species and ≤ 10^−20 between species), we considered the two members to belong to a duplicated block. Using these criteria, we investigated the evolutionary origins and relationships of CCCH IX genes within and between grass species with a Perl script (Figure 4). Initially, we found 11 duplicated gene segments forming a network within species, including five maize genes and four rice genes. We subsequently identified five groups containing 23 genes with tight microsynteny relationships across species (the five groups are shown in Figure 5, with red arrows marking the 23 genes).

In rice, we found four highly similar genes in the sequences flanking both sides of OsC3H2/OsC3H35 (Figure 5(a)), suggesting that a duplication event occurred during the evolutionary history of this gene pair. Accordingly, the two genes in this pair should be more closely related to each other than to other members, which is consistent with the phylogenetic analysis (Figure 2). The gene pair OsC3H10/OsC3H37 shares six conserved flanking genes, compared with four for OsC3H2/OsC3H35, indicating a closer microsyntenic relationship (Figure 5). In addition, the collinear gene pair OsC3H2/OsC3H35 is located on OsChr1/OsChr5, and the pair OsC3H10/OsC3H37 lies on the same two chromosomes (Figure 1).
These results suggest that these two chromosome segments were generated by whole-genome duplication during the evolution of the rice genome, and that large-scale duplication has shaped the evolution of CCCH IX genes in rice. In sorghum, the gene pair SbC3H12/SbC3H47 showed a microsyntenic relationship (Figure 5(a)); with only six members in this subfamily, sorghum CCCH IX genes exhibit relatively sparse microsynteny. In the phylogenetic tree (Figure 2), each S. bicolor branch is highly similar to the corresponding branches of the other grasses.
In maize, five CCCH IX genes are located in duplicated sections of the genome and share microsyntenic relationships (Figures 5(c) and 5(d)). ZmC3H10/ZmC3H34/ZmC3H43 exhibit tight microsynteny, suggesting that a large-scale duplication event occurred during the evolution of the maize genome; the gene pair ZmC3H4/ZmC3H28 points to the same conclusion. Pairwise comparison of the 100 kb sequences upstream and downstream of each anchor point revealed many duplicated regions in all three species, as described above. These results indicate that large-scale duplication events have produced paralogous genes throughout evolutionary history.
Microsynteny analysis can be used to predict the locations of homologous genes in different species [37]. Regions in which 80% of close homologs are in the same order and transcriptional orientation are considered to exhibit conserved microsynteny [38]. We used this analysis to deduce the molecular evolutionary origins and orthologous relationships of the chromosome regions containing CCCH IX genes in the three grass species, performing a stepwise, gene-by-gene reciprocal comparison to assess the linkages between CCCH IX regions. The resulting map of conserved microsynteny regions agrees with the phylogenetic tree, but analysis of the flanking fragments provides a more thorough view than phylogenetic analysis alone.
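One way to operationalize the 80% criterion is sketched below: the fraction of homolog pairs lying on the longest chain that is conserved in both order and orientation, with inverted chains allowed. This is an illustrative Python sketch under stated assumptions (each pair is given as gene indices within the two regions plus a same-strand flag; function names are hypothetical), not the procedure used in [38].

```python
from bisect import bisect_left


def _longest_increasing(values):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def collinear_fraction(pairs):
    """pairs: (index_in_region_a, index_in_region_b, same_strand) per homolog pair.

    Forward chains need same_strand=True; inverted chains (order reversed)
    need same_strand=False, because an inversion also flips orientation.
    """
    if not pairs:
        return 0.0
    ordered = sorted(pairs)
    forward = [b for _, b, same in ordered if same]
    inverted = [-b for _, b, same in ordered if not same]
    best = max(_longest_increasing(forward), _longest_increasing(inverted))
    return best / len(pairs)


def is_conserved_microsynteny(pairs, threshold=0.8):
    return collinear_fraction(pairs) >= threshold
```

Treating an inverted block as collinear when both order and orientation are reversed matches the opposite-direction microsynteny described for several groups below.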

Conserved Microsynteny of CCCH IX Genes between Species.
The genes from the three grasses that exhibit tight microsyntenic relationships were divided into five groups (Figure 5). Genome segments in the same group were likely derived from a single ancestral segment that diverged as the species separated, so genes in the same group are considered homologs. In group (a), we observed marked opposite-direction microsynteny among the OsC3H2/OsC3H35, SbC3H12/SbC3H47, and ZmC3H38/ZmC3H51 segments (Figure 5); these gene fragments were derived from a duplication event and are orthologous to each other. The same situation exists in group (b), where microsynteny was observed among OsC3H10, OsC3H37, ZmC3H39, and ZmC3H53 (Figure 5). Chromosome inversions reverse gene order, which would explain these opposite-direction relationships. Group (c) showed more compact microsynteny, especially among OsC3H50, SbC3H10, ZmC3H10, and ZmC3H43 (Figure 5), whereas SbC3H10, ZmC3H34, and ZmC3H43 showed opposite-orientation microsynteny.
In group (d), OsC3H24, SbC3H2, ZmC3H4, and ZmC3H28 appeared microsyntenic, especially the pair OsC3H24/SbC3H2, which has nine best-matching flanking genes according to BLASTp alignment (Figure 5). Group (e) genes OsC3H33 and SbC3H45 showed successive same-direction microsynteny (Figure 5). In CCCH IX regions, a chromosomal segment in one species often corresponds to two or more successive segments in another. For example, fragment SbChr3 appears to match OsChr1 and OsChr5, and fragment SbChr1 appears to match OsChr3 and ZmChr1. Chromosomal translocation therefore appears to have been common during the divergence of the CCCH IX subfamily. A previous study identified 12 stress-responsive CCCH IX genes in maize [11]; based on the microsynteny maps (Figure 5), OsC3H24, SbC3H2, SbC3H45, and the other genes mentioned above may also be involved in regulating abiotic stress responses.

Figure 5: Microsynteny maps of CCCH IX genes in grasses. Red arrows represent anchor (CCCH IX) genes, and upstream and downstream genes are represented by black arrows. All genes are numbered from left to right for each segment. Black lines connect conserved gene pairs.

Estimating the Dates of Duplication Events.
Assuming that the synonymous substitution rate at each site is constant over time [39], we dated each duplication event using the conserved flanking protein-coding genes. Each pair of proteins in a microsynteny block was aligned at the amino acid level, and codons from gap-free aligned regions were used to calculate Ks values with CodeML [24]. Ks values > 2.0 were removed because of the risk of saturation [40]. The approximate date of each duplication event was then calculated from the mean Ks and an estimated silent-site substitution rate of 6.5 × 10^−9 substitutions per synonymous site per year. The divergence time (T) was calculated as $T = \mathrm{Ks} / (2 \times 6.5 \times 10^{-9}) \times 10^{-6}$ mya [35]. The mean Ks values for each duplication event and the estimated dates are listed in Table 2. The results reveal that few of these genes and their flanking fragments expanded before Gramineae speciation (60-70 mya) [40]. The subsequent whole-genome duplication played an important role in the expansion of genes in CCCH IX regions, leading to genome diploidization along with gene rearrangement and loss. Related duplication events occurred frequently thereafter, leading to further duplication of these genes in maize and sorghum approximately 15 mya (Figure 6(a)). Duplication events in different groups showed similar dates, suggesting that these segments were duplicated synchronously.
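The dating step reduces to a few lines. This Python sketch applies the saturation cutoff (Ks > 2.0 removed), averages the remaining Ks values for a block, and converts the mean with the rate of 6.5 × 10^−9 given above; the function names are hypothetical. As a check, a mean Ks of 0.195 corresponds to 0.195 / (2 × 6.5 × 10^−9) = 1.5 × 10^7 years, or about 15 mya.

```python
from statistics import mean

SUBSTITUTION_RATE = 6.5e-9  # substitutions per synonymous site per year
KS_SATURATION = 2.0         # Ks values above this are discarded


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """T = Ks / (2 * rate) years, converted to million years."""
    return ks / (2 * rate) * 1e-6


def block_divergence_mya(ks_values, max_ks=KS_SATURATION):
    """Date one duplicated block from the mean Ks of its homolog pairs."""
    usable = [ks for ks in ks_values if ks <= max_ks]
    if not usable:
        raise ValueError("all Ks values exceed the saturation cutoff")
    return divergence_time_mya(mean(usable))
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after a duplication.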

Selection Pressure on CCCH IX Genes.
Because large-scale duplication has contributed to genome evolution, we also estimated the selection pressure on duplicated CCCH IX genes [41]. We calculated Ka/Ks ratios for 31 pairs of conserved homologous genes, together with their flanking segments. Apart from two flanking genes, all Ka/Ks ratios of the duplicated gene pairs were less than 1 (Figure 6(b)), suggesting that these genes were subjected to purifying selection over the course of evolution [42]. To examine selection pressure on different regions of these genes, we performed sliding-window analysis of Ka/Ks ratios (window size, 150 bp; step size, 9 bp). The Ka/Ks values show that selection pressure varied along the sequences. Purifying selection was stronger in the CCCH domains, except in ZmC3H10/ZmC3H34 and SbC3H45/ZmC3H54 (Figure S3). The Ka/Ks ratios of most sequences were < 1, suggesting that these gene pairs evolved under purifying selection. Purifying selection removes deleterious mutations and has probably kept CCCH IX sequences conserved throughout their evolutionary history, suggesting that CCCH IX genes are important for plant growth and development.
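The interpretation of Ka/Ks used in this section can be summarized as a small classifier: ratios below 1 indicate purifying selection, ratios of 1 neutral evolution, and ratios above 1 positive selection. The Python sketch below is illustrative; the `tolerance` parameter and the treatment of windows with Ks = 0 as undefined are assumptions.

```python
def selection_regime(ka, ks, tolerance=0.0):
    """Classify a gene pair or window by omega = Ka/Ks."""
    if ks == 0:
        return "undefined"  # no synonymous change, so the ratio cannot be computed
    omega = ka / ks
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"


def fraction_purifying(windows):
    """windows: iterable of (Ka, Ks) per sliding window; share classified as purifying."""
    labels = [selection_regime(ka, ks) for ka, ks in windows]
    defined = [lab for lab in labels if lab != "undefined"]
    if not defined:
        return 0.0
    return sum(lab == "purifying" for lab in defined) / len(defined)
```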

Discussion
CCCH IX genes are thought to play a variety of roles in plant growth, development, and stress resistance [6,9,10]. In this study, we selected 27 abiotic stress-responsive CCCH IX genes in Z. mays, O. sativa, and S. bicolor based on phylogenetic analysis and gene structure, as described in previous reports [5,11]. Our interspecific phylogenetic tree divided the CCCH IX subfamily into four clades (Figure 2). These genes are almost intronless, and they respond to various adverse environmental factors throughout the plant's life cycle [43].
By dating the duplications of homologous segments and examining the phylogenetic tree, we determined that the most recent duplication events (approximately 15 mya) likely occurred in maize and sorghum. Maize and sorghum have therefore probably undergone a series of additional evolutionary events and evolved faster than rice. We observed strong microsynteny among rice, maize, and sorghum genes, but the underlying process is not simple: translocation, inversion, loss, and segmental duplication have occurred to varying degrees and have acted as driving forces in evolution. Such processes are necessary for the expansion of gene families over the course of evolution. For example, microsynteny analysis revealed segmental duplication between OsChr1 and OsChr5 and between ZmChr6 and ZmChr8. The CCCH IX microsynteny maps suggest that these genes have been conserved over the course of evolution. Chromosomes contain many syntenic segments that have undergone translocation, inversion, deletion, and duplication, yet gene order has been retained within these segments. Consequently, key genes in such segments can be located in closely related species at the same relative positions on homologous chromosomes. In the CCCH IX subfamily, ZmC3H4/ZmC3H28/ZmC3H54 are stress-responsive genes, and ZmC3H54 shows the highest expression level (Figure S1). Based on the detailed microsynteny analysis (Figure 5), we therefore infer that OsC3H33, SbC3H45, OsC3H24, and SbC3H2 might also respond to abiotic stress [11]. This conclusion is supported by the close relationship among AtSZF1, AtSZF2, and CaKR1, which were previously identified as stress-response genes [7,10]. Our analysis also showed that the CCCH IX genes and their flanking protein-coding genes are subject to purifying selection.
We also conducted sliding-window analysis to detect sequence regions under unusual selection pressure, which offers further insight into the effects of the environment and abiotic stress on this subfamily.
In the current study, comparisons among the three Gramineae genomes demonstrated that extensive large-scale genome duplication occurred in the CCCH IX subfamily before the species separated 60-70 mya [39]. CCCH IX genes underwent dramatic expansion following whole-genome duplication, which led to speciation approximately 60 mya (Figure 6(a)). In general, CCCH IX genes evolved through multiple large-scale duplication events, similar to those that have driven the evolution of other protein-coding genes, yet the structure, order, and transcriptional orientation of the CCCH IX genes have remained largely unchanged. We then analyzed the evolutionary history of CCCH IX genes over the subsequent tens of millions of years at the molecular level and performed detailed microsynteny analysis of the abiotic stress-responsive gene pairs. The results of this study provide a foundation for further investigating the molecular evolution and functions of CCCH IX genes, particularly members with potentially important roles in regulating abiotic stress responses in plants. However, further experiments are needed to directly explore the functions of CCCH IX genes.

Conclusion
In this study, we identified and analyzed stress-responsive members of the conserved CCCH IX subfamily through comparative genomic analysis. Microsynteny maps showed that several pairs of regions have retained microsyntenic relationships during evolution. In addition, we estimated duplication dates by Ks analysis and assessed selection pressure using Ka/Ks ratios. Through microsynteny analysis, we investigated the evolutionary patterns of OsC3H24/OsC3H33 and SbC3H2/SbC3H45, which may function in the response to abiotic stress. The results of this study lay a foundation for future functional analyses of these TFs.