A Cotton-Fiber-Associated Cyclin-Dependent Kinase A Gene: Characterization and Chromosomal Location

A cotton fiber cDNA and its genomic sequences encoding an A-type cyclin-dependent kinase (GhCDKA) were cloned and characterized. The encoded GhCDKA protein contains the conserved cyclin-binding, ATP-binding, and catalytic domains. Northern blot and RT-PCR analyses revealed that the GhCDKA transcript level was high in fibers at 5–10 days post-anthesis (DPA), moderate in 15 and 20 DPA fibers and roots, and low in flowers and leaves. GhCDKA protein levels in fibers increased from 5 to 15 DPA, peaked at 15 DPA, and decreased from 15 to 20 DPA. The differential expression of GhCDKA suggested that the gene might play an important role in fiber development. The GhCDKA sequence data were used to develop single nucleotide polymorphism (SNP) markers specific for the CDKA gene in cotton. A primer specific to one of the SNPs was used to locate the CDKA gene to chromosome 16 by deletion analysis using a series of hypoaneuploid interspecific hybrids.


Introduction
Cotton fibers are unicellular seed trichomes differentiated from the outer integument of a developing seed. The regulation of cell division is thus an important aspect of fiber initiation and development. About 25% of the ovule epidermal cells of commercial cotton stop dividing and develop into fibers [1]. It has been reported that the cell cycle in fiber cells is arrested in the G1 phase during the early stages of fiber development [2]. Cyclin-dependent kinases (CDKs) and their regulatory cyclin subunits play a central role in the regulation of cell division [3–5]. Eleven types of cyclins (A, B, C, D, H, CycJ18, L, T, U, SDS (solo dancers), and P) have been identified in plants [6,7]. Plant CDKs, identified in 23 species of algae, gymnosperms, and angiosperms, contain three functional domains: an ATP-binding domain, a cyclin-binding domain, and a catalytic domain. They are classified into five types (A, B, C, D, and E) based on sequence differences in the cyclin-binding domain [8]. The A-type CDK (CDKA) proteins are characterized by the presence of the PSTAIRE motif, which is essential for cyclin binding [9]. Plant CDKAs, but not CDKBs, have been shown to complement yeast CDK mutants [10–13], suggesting that plant CDKAs are functional homologues of the yeast CDK. Plant CDKAs not only control cell cycle progression from the G1 to S phase and from the G2 to M phase [5,14] but also participate in cell proliferation and the maintenance of cell division competence in differentiated tissues during development [15]. Since the CDKA gene is expressed in both dividing and differentiated tissues [15,16], it has been suggested that the gene is involved in both cell division and differentiation [17,18].
To dissect the possible functional role of CDKA in fiber cell differentiation and development, we have cloned and characterized a fiber CDKA cDNA and its corresponding genomic sequences. The expression levels of the CDKA transcript and the CDKA protein were also determined in elongating cotton fibers from 5 to 20 DPA ovules and in other tissues. The CDKA sequence data were then used to develop single nucleotide polymorphism (SNP) markers specific for the CDKA gene(s) in cotton. Lastly, a primer specific to one of the SNPs was used with single primer extension technology to locate the CDKA gene to chromosome 16 by deletion analysis using a series of hypoaneuploid interspecific hybrids.
International Journal of Plant Genomics

Materials and Methods
2.1. Cloning of Fiber GhCDKA cDNA.
Two degenerate primers (CDK1: 5′-ATHGGDGARGGHACHTAYGG-3′ and CDK2: 5′-CKATCWATCARYARRTTYTG-3′; H: A + C + T, D: A + G + T, R: A + G, Y: C + T, K: G + T, W: A + T), designed from the conserved ATP-binding and catalytic domains of plant CDKA genes, were used to PCR-amplify cDNA with homology to the CDKA gene, with total cDNA from a cotton (Gossypium hirsutum L. cv. DES119) fiber cDNA library as template. The cDNA library was constructed from 10 DPA (days post-anthesis) fiber RNA with a Marathon cDNA Amplification kit (BD Biosciences, San Jose, CA, USA). A 383 bp DNA fragment was amplified, purified using a QIAEX II gel extraction kit (Qiagen), cloned into the pGEM-T Easy Vector (Promega), and sequenced with an ABI PRISM 310 Genetic Analyzer. The DNA sequencing data were analyzed using the BLAST program (NCBI) and LASERGENE software (DNASTAR), which showed that the 383 bp DNA fragment encoded an A-type CDK. Two gene-specific primers, CDKC-1 (5′-GGCGTTGTTTATAAGGCTCGTGATCGTG-3′) and CDKC-2 (5′-CATTCCTTTATCAAATTCTCCGTGGTG-3′), were designed from the 383 bp DNA fragment and used to amplify a full-length GhCDKA cDNA by the Rapid Amplification of cDNA Ends (RACE) method with the Marathon cDNA Amplification kit. In the 3′ RACE reaction, CDKC-1 and the adaptor primer AP1 (5′-CCATCCTAATACGACTCACTATAGGGC-3′, 10 μM) were used in the first PCR, and CDKC-2 and the adaptor primer AP2 (5′-ACTCACTATAGGGCTCGAGCGGC-3′) were used in the second (nested) PCR. The 5′ RACE was performed in the same way as the 3′ RACE, except that primers CDK5-1 (5′-GACACTTTCTCAGGAAGATAGTTG-3′) and CDKC-3 (5′-CCCTATGAGAGTGACAATAAGCAATG-3′) were used in the first and second RACE amplifications, respectively. A full-length GhCDKA cDNA was assembled from the 5′ and 3′ RACE products and subsequently confirmed by PCR using Pfu DNA polymerase (Stratagene).
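The degenerate-primer notation above can be made concrete with a short sketch. The code below is illustrative only and is not part of the original protocol: it uses the IUPAC code table given in the text to count how many distinct oligonucleotides each degenerate primer represents and to enumerate them. The function names `degeneracy` and `expand` are hypothetical helpers introduced here.

```python
from itertools import product
from math import prod

# IUPAC codes as defined in the text (H: A+C+T, D: A+G+T, R: A+G,
# Y: C+T, K: G+T, W: A+T); unambiguous bases map to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT", "D": "AGT", "R": "AG",
    "Y": "CT", "K": "GT", "W": "AT",
}

# Degenerate primer sequences quoted from the Methods.
CDK1 = "ATHGGDGARGGHACHTAYGG"
CDK2 = "CKATCWATCARYARRTTYTG"


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)


if __name__ == "__main__":
    for name, seq in (("CDK1", CDK1), ("CDK2", CDK2)):
        print(name, "length:", len(seq), "degeneracy:", degeneracy(seq))
```

Under these definitions CDK1 is a pool of 324 sequences and CDK2 a pool of 128, which gives a rough sense of how dilute any single exact-match primer is in the reaction.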

Isolation of the Genomic Sequence of the GhCDKA Gene.
Two primers, CDKC-1 and CDK5-1, were used in LA (long and accurate) PCR to amplify DES119 genomic DNA with the Takara LA PCR kit ver. 2.1. The PCR was conducted with an initial denaturation at 94 °C for 4 min, followed by 30 cycles of 94 °C for 30 sec and 68 °C for 4 min, and a final extension at 68 °C for 5 min. A 7547 bp DNA fragment containing the GhCDKA gene was amplified. The PCR product was gel purified and cloned, and both DNA strands were sequenced as described above.

Expression Analyses of the GhCDKA Gene.
Total RNA (10 μg) isolated from various cotton tissues was electrophoresed in a formaldehyde/agarose gel, transferred onto a nylon membrane, and fixed by UV-crosslinking. A 618 bp DNA fragment corresponding to the C-terminal and 3′-UTR region of the GhCDKA cDNA was amplified by PCR using the primers CDKC-2 and CDK5-1, labeled with [α-32P]dCTP by the random priming method, and used as a probe for Northern hybridization. After hybridization, the membrane was stringently washed and exposed to X-ray film for autoradiography. The relative GhCDKA transcript levels were determined as the ratio of the radioactive intensity of the hybridized band of the 1.2 kb GhCDKA mRNA to the intensity of the EtBr-stained 28S rRNA, using Scion Image for Windows (Scion Corporation). The GhCDKA transcript level was also determined by RT-PCR. First strand cDNA, labeled with [α-32P]dCTP, was synthesized with SuperScript II reverse transcriptase (Invitrogen) using oligo-dT primer and total RNA (2 μg) isolated from flowers, leaves, roots, and 5, 10, and 15 DPA fibers. An equal amount of the synthesized first strand cDNA (based on scintillation counting) from each sample was diluted 1×, 5×, 10×, and 20× with sterile distilled water and used as template for PCR amplification with primers CDKC-2 and CDK5-1. Five microliters of each PCR product was analyzed by electrophoresis in a 1% agarose gel.
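The normalization described above (hybridization signal divided by the 28S rRNA signal of the same lane) can be sketched as follows. This is a minimal illustration, not the Scion Image workflow itself; the function names and the densitometry readings are hypothetical placeholders, not measured data.

```python
def relative_transcript_level(band_intensity, rrna_intensity):
    """Ratio of the hybridized mRNA band to the 28S rRNA loading control."""
    if rrna_intensity <= 0:
        raise ValueError("rRNA intensity must be positive")
    return band_intensity / rrna_intensity


def normalize_to_max(levels):
    """Scale a dict of relative levels so the highest tissue equals 1.0."""
    peak = max(levels.values())
    return {tissue: value / peak for tissue, value in levels.items()}


if __name__ == "__main__":
    # Hypothetical (band, 28S rRNA) readings in arbitrary units.
    readings = {
        "tissue A": (800.0, 1000.0),
        "tissue B": (300.0, 1000.0),
        "tissue C": (90.0, 900.0),
    }
    ratios = {
        tissue: relative_transcript_level(band, rrna)
        for tissue, (band, rrna) in readings.items()
    }
    print(normalize_to_max(ratios))
```

Dividing by the rRNA signal corrects for unequal loading between lanes; scaling to the highest tissue is an optional extra step that only makes the values easier to compare.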
An SNP primer (5′-GCCCAACTATAGAAATGAAA-3′), designed from a single-nucleotide difference among the sequences of lines from three Gossypium species (G. hirsutum, G. barbadense, and G. tomentosum), was used to screen the genetic stocks for SNP markers with the ABI Prism SNaPshot multiplex kit following the method of Buriev et al. [23]. Briefly, the Pfu-amplified PCR products were incubated with SAP and Exo I (5 units of SAP and 2 units of Exo I per 15 μL of PCR product) at 37 °C for 1 hr, followed by 75 °C for 15 min. The primer-extension reaction mixture contained 5 μL of SNaPshot Multiplex Ready Reaction Mix, 3 μL of purified PCR product, 1 μL of SNP primer (10 μM), and 1 μL of distilled water. Thermal cycling was carried out for 25 cycles of 96 °C for 10 sec, 50 °C for 5 sec, and 60 °C for 30 sec. After treatment with SAP, 0.5 μL of SNaPshot product was mixed with 0.5 μL of size standard and 9 μL of Hi-Di formamide, denatured at 95 °C for 5 min, and run on a 3100 Genetic Analyzer (Applied Biosystems).

Cloning and Characterization of GhCDKA Gene.
A 383 bp DNA fragment was amplified by PCR from a 10 DPA cotton fiber cDNA library using two degenerate primers designed from the conserved ATP-binding and catalytic domains of plant A-type CDK genes. BLAST searching of the GenBank databases indicated that the 383 bp cDNA encoded a protein with extensive homology to plant A-type CDKs. A full-length fiber CDKA cDNA (1211 bp), named GhCDKA, was subsequently cloned by 5′ and 3′ RACE using gene-specific primers designed from the 383 bp fragment. The GhCDKA gene and its 5′ flanking region [...] gene were high in 5 and 10 DPA fibers, moderate in 15 DPA fibers and roots, and low in flowers and leaves. The RT-PCR result was consistent with the Northern analyses. Total protein isolated from 5, 10, 15, and 20 DPA cotton fibers, flowers, and leaves was separated by SDS-PAGE, electroblotted onto a nitrocellulose membrane, and probed with anti-PSTAIRE antibody. Western analysis showed that the antibody recognized a 34 kDa protein in all cotton tissues (Figure 4). The GhCDKA protein was present at a moderate level in leaves but at a low level in flowers. The GhCDKA protein level in fibers increased from 5 DPA, peaked at 15 DPA, and decreased from 15 to 20 DPA. The Western and Northern results suggest that the GhCDKA gene is differentially expressed and developmentally regulated.
The 0.9 kb 5′ flanking sequences of the CDKA gene amplified from genomic DNA of CMD-01 (TM-1, G. hirsutum), CMD-02 (3-79, G. barbadense), and CMD-11 (G. tomentosum) were aligned with that of G. hirsutum cv. DES119 (Figure 5) for SNP identification. The incidence of SNPs was about 1% in the -1 to -913 nt region of the CDKA gene. Specifically, we observed two indels, four transversions, and three transitions in the 5′ flanking sequences of the CDKA gene (Figure 5). Two SNPs occurred between the two G. hirsutum genotypes and six SNPs occurred between G. barbadense and G. hirsutum. These results suggested that a putative CDKA locus with at least four different haplotype variants was present in the tetraploid cotton species. [...] 3-79 CDKA sequence from those of the other tetraploids. The sequence of this specific SNP primer was 5′-GCCCAACTATAGAAATGAAA-3′. Two SNPs corresponding to the TM-1 (G. hirsutum) and 3-79 (G. barbadense) alleles were identified by the single primer extension technology and designated here as CDKA cg (black) and CDKA at (green) (Figure 6). F1 hybrids between TM-1 and 3-79 exhibited peaks of both alleles, showing codominance. We also detected the CDKA cg allele in G. tomentosum and both the CDKA cg and CDKA at alleles in the diploid species G. raimondii (D5). We did not find any bases other than G or T as SNP markers specific to this SNP primer, suggesting that this locus was biallelic. We did not find any CDKA-specific SNP marker using the genomic DNA of G. arboreum (A2), suggesting either the absence of such a locus in G. arboreum or a major change in the primer annealing site of this marker in G. arboreum. This result was concordant with the absence of amplified products specific to the CDKA gene in G. arboreum (A2), confirming the absence of the CDKA gene in G. arboreum (A2).
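The variant counts reported above can be tied to the "about 1%" figure with a short calculation. The sketch below is illustrative only: it classifies a single-base difference as a transition, transversion, or indel, and computes the incidence from the counts given in the text (two indels, four transversions, and three transitions across the 913 nt region). The function names are hypothetical, and treating "-" as a gap character is an assumption about how an alignment would be encoded.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(ref, alt):
    """Classify one aligned position; '-' is assumed to mark an alignment gap."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if "-" in (ref, alt):
        return "indel"
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def incidence(n_variants, region_length):
    """Fraction of positions in the region that carry a variant."""
    return n_variants / region_length


if __name__ == "__main__":
    # Counts and region length as reported in the text.
    counts = {"indel": 2, "transversion": 4, "transition": 3}
    total = sum(counts.values())
    print(total, "variants;", f"{incidence(total, 913):.2%}", "of the region")
```

With nine variants in 913 nt the incidence is just under 1%, in line with the value stated in the text.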

Discussion
As a first step toward understanding the mechanisms of fiber cell division and differentiation, a fiber cDNA, GhCDKA, and its corresponding gene have been cloned and characterized. The deduced aa sequence of GhCDKA shows high identity (more than 86%) to the CDKAs from 10 diverse plant species. The alignment of the 11 plant CDKAs indicates that they all contain 294 aa residues (except for 302 aa in AmCDKA) and that their three functional domains (ATP-binding, cyclin-binding, and catalytic) have identical aa sequences (data not shown). These results indicate that A-type CDKs are highly conserved in higher plants. Comparison of the cotton CDKA gene with the Arabidopsis cdc2a (CDKA;1) gene revealed that both genes contain 7 introns within their ORFs (Figure 1). Although the two CDKA genes encode proteins with identical molecular mass, the intron sizes of the two genes are quite different. It will be interesting to examine whether there are any differences in transcriptional regulation or RNA splicing between the two genes. A genome-wide analysis of cell cycle genes indicated that a single CDKA gene (AtCDKA;1) exists in Arabidopsis thaliana [24]. In contrast, multiple copies of two genes (LeCDKA1 and LeCDKA2) encoding A-type CDKs have been found in tomato [25]. Nicotiana tabacum contains a single copy of the CDKA gene (NtCDKA) and at least one gene similar to NtCDKA in the genome [26]. In this study, Southern analysis revealed that one or two copies of the GhCDKA gene are present in cotton (Gossypium hirsutum) (data not shown). Gossypium hirsutum is a tetraploid plant that contains A and D genomes. Further work is needed to determine whether the GhCDKA gene is located in the A genome, the D genome, or both. The Arabidopsis and rice CDKA genes have been shown to be expressed not only in dividing tissues of the root apex but also in differentiated tissues, such as sclerenchyma, pericycle, and parenchyma of the vascular cylinder [15,16].
These results suggest that A-type CDKs are involved not only in cell division but also in cell differentiation, which is important for the integration of cell division and differentiation in meristems to produce new organs during plant development. In contrast, no cdc2 (CDKA) transcripts have been detected in differentiated adult tissues of chicken and Drosophila [27,28]. These findings suggest that plant CDKAs may have different functions from those of animals. The Arabidopsis CDKA;1 gene (AtCDKA;1) has been shown to participate in trichome morphogenesis and development [29]. Fiber cells grown in planta do not divide after initiation; however, some [...] [1]. These observations suggest that fiber cells retain the competence to divide after initiation. In this study, the GhCDKA gene has been shown to be strongly expressed in elongating fibers (Figure 3). Western analysis revealed that the fiber GhCDKA protein level increased from 5 DPA, peaked at 15 DPA, and, despite a decrease thereafter, remained at a high level at 20 DPA (Figure 4), which coincided with primary and secondary cell wall synthesis during fiber development. The expression analysis results suggest that GhCDKA may play a role in fiber development.
The low GhCDKA transcript level versus the high amount of GhCDKA protein in 20 DPA fibers suggests the possible existence of posttranscriptional regulation of the GhCDKA gene. In Arabidopsis, the transcript and protein levels of AtCDKB;1 (but not AtCDKA;1) have been shown to correlate with cell division rate [30]. The normal plant cell cycle consists of an S phase (a round of DNA replication) and an M phase, which are separated by two gap phases (G1 and G2). Previous studies demonstrated that some plant cells follow a different cell cycle mode, including endoreduplication, in which cells undergo iterative DNA replication without any subsequent cytokinesis [34]. Endoreduplication is usually considered to provide a mechanism for increasing cell size [35] and involves modulation of the levels of CDKA activity [36,37]. Cotton fibers are unique cells that differentiate from epidermal cells of the ovule. Regulation of cell cycle genes during the very early stages of fiber development triggers some specific epidermal cells in the ovule to stop cell division and then elongate into fiber cells. Previous experiments using 5-aminouracil (5-AU), an inhibitor of DNA replication, demonstrated that cotton fiber cells were arrested at the G1 stage [2]. Our Northern blot and RT-PCR analyses revealed that the GhCDKA transcript level was high in 5–10 DPA fibers and moderate in 15 and 20 DPA fibers. Further studies are needed to reveal whether GhCDKA is a regulator of the cell cycle and DNA endoreduplication in fiber cells.
Duplications through polyploidization and/or segmental duplication and retrotransposon activity have contributed extensively to the extant genomes of the Malvaceae, including those of Gossypium [31–33]. Duplicated loci pose significant challenges in virtually all aspects of genomics research, including specific gene mapping in tetraploid cotton [23]. Locus-specific markers are thus particularly important for addressing these challenges, and the means to develop them are crucial to the advancement of structural genomics. One possible solution for marker development is to exploit the sequence conservation of a specific gene and identify the gene in a locus-specific manner. The CDK gene is of special interest because of its possible importance to cotton fiber development, which entails major modifications of cell division and growth. Although cotton is clearly of polyploid origin, agarose gel analyses of PCR product(s) amplified from genomic DNAs of diverse cotton taxa using primers from conserved CDKA sequence regions showed no size polymorphisms. Such a result could be due to uniformity across duplicated loci or to the existence of just one locus. This predicament led us to seek SNP markers that could be used to define cotton CDK gene(s) and alleles in a locus-specific manner. This approach may be generally applicable for SNP development in cotton and is of particular value for genes that are highly conserved.
Deficiency tests with interspecific hypoaneuploid F1s provide a quick and fairly robust means of localizing various types of loci to specific chromosomes and chromosome arms of cotton. When we examined the hypoaneuploid F1 hybrids used here, all but one exhibited a heterozygous banding pattern of CDKA at and CDKA cg alleles, suggesting that the CDKA locus was not in any of the respective chromosomes or chromosome arms. However, although CDKA at was present in the monotelodisomic Te16Lo interspecific hybrid, it was differentially absent from the quasi-isogenic Te16sh hybrid. These results concordantly localized the CDKA gene to the long arm of chromosome 16. In lieu of a monosomic interspecific F1 hybrid, we examined DNA from a euploid disomic backcross (BC5Sn) substitution line, CS-B16 [38]. The disomic chromosome substitution line is euploid but has one pair of chromosome 16 from G. barbadense line 3-79, whereas the other 25 chromosome pairs are largely or completely derived from TM-1. Accordingly, CS-B16 is expected to be devoid of TM-1 chromosome 16 alleles, homozygous for all loci on the alien (3-79) chromosome 16, and homozygous for TM-1 alleles at nearly all (∼99%) other loci of the genome. We observed that only the 3-79 CDKA at allele is present in CS-B16, strongly supporting the finding from the monotelodisomic interspecific F1 plants that the CDKA gene is located on chromosome 16. Our localization of the CDKA SNP marker to chromosome 16 was concordant with the cytogenetic evidence that chromosome 16 originated from an ancestral D genome diploid species [39].
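The logic of the deficiency test can be summarized in a small sketch: a hybrid that still shows both alleles excludes the chromosome segment it lacks, whereas a hybrid missing one allele points to its deficient segment as the location of the locus. The code below is a minimal illustration under stated assumptions, not the authors' analysis pipeline. The allele calls for Te16Lo and Te16sh follow the text; the mapping of each stock to its deficient segment is an assumption based on reading the stock names (Te16Lo retaining the long arm, Te16sh retaining the short arm), and `candidate_segments` is a hypothetical helper.

```python
# Assumed deficient segment for each hypoaneuploid hybrid (interpretation of
# the stock names, not stated explicitly in the text).
DEFICIENT_SEGMENT = {
    "Te16Lo": "chromosome 16 short arm",
    "Te16sh": "chromosome 16 long arm",
}

# Alleles detected by single primer extension, as reported in the text.
OBSERVED_ALLELES = {
    "Te16Lo": {"at", "cg"},
    "Te16sh": {"cg"},
}


def candidate_segments(observed, deficient_segment, expected=("at", "cg")):
    """Return (segment, missing alleles) for hybrids lacking an expected allele."""
    hits = []
    for hybrid, alleles in observed.items():
        missing = set(expected) - set(alleles)
        if missing:
            hits.append((deficient_segment[hybrid], sorted(missing)))
    return hits


if __name__ == "__main__":
    for segment, missing in candidate_segments(OBSERVED_ALLELES, DEFICIENT_SEGMENT):
        print("locus assigned to", segment, "(missing:", ", ".join(missing) + ")")
```

In a full screen, every hybrid in the series would be added to the two dictionaries; only the segments whose hybrids lose an allele remain as candidates.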
The identification of SNP markers enables facile tracking of the CDKA gene in cotton, and this gene has been successfully mapped to the long arm of chromosome 16. Our results indicate that single-primer extension technology can be used to identify SNP markers in cotton genes, including the 5′-upstream regions of the genes, and thus facilitate the mapping and investigation of candidate genes for their effects on fiber development.

Disclaimer
Mention of trademark or proprietary product does not constitute a guarantee or warranty of the product by the United States Department of Agriculture and does not imply its approval to the exclusion of other products that may also be suitable. The nucleotide sequence of GhCDKA has been submitted to GenBank and assigned an accession number EU006765.