Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in genic regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads generated by deep transcriptome sequencing on a Solexa/Illumina platform, to identify and develop EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs of at least 12 bp were identified and characterized, of which 2744 are new and 2317 are known loci that differ in repeat number from previously reported SSRs. A total of 7877 PCR primer pairs were designed for 1561 EST-SSR loci, and primer pairs for twenty-four EST-SSRs were selected for evaluation. Nineteen of these loci (79.2%) yielded high-quality amplicons, and seventeen of the nineteen (89.5%) were polymorphic among twenty-four cultivars of Chinese cabbage. Sequencing of the polymorphic alleles showed that most polymorphisms were due to variation in the SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.


Introduction
Chinese cabbage (Brassica rapa L. ssp. pekinensis) is a diploid (2n = 2x = 20) dicot with a genome size of 550 Mb (http://www.brassica.info/resource/). It is a subspecies of B. rapa, which carries the A genome [1]. The species originated in China and has become one of the most important and widely cultivated leaf vegetables in Asia. Chinese cabbage has rosette leaves (RLs) and folding leaves (FLs), and the tight leafy head is the main edible part. Over a long history of domestication, Chinese cabbage has evolved into diverse cultivars that vary in rosette leaf morphology, heading leaf morphology, leafy head shape, size, and structure, flowering time, nutrient composition, and resistance to biotic and abiotic stresses. A better understanding of the molecular mechanisms underlying the evolution of Chinese cabbage, together with further development of marker-assisted selection (MAS), will accelerate the breeding of improved cultivars that meet growing consumer and environmental demands. Although progress has been made in elucidating these molecular mechanisms [2][3][4][5], many aspects remain unclear.

Plant Materials.
For EST-SSR identification and primer design, the typical heading Chinese cabbage cultivar FuShan-BaoTou was used. For primer assessment and SSR polymorphism analysis, a panel of twenty-four cultivars was used, comprising nineteen morphologically diverse cultivars of Brassica rapa L. ssp. pekinensis (B. pekinensis L.) and five of Brassica rapa L. ssp. chinensis (B. chinensis L.). All plants were grown in a greenhouse under a 16 h light/8 h dark photoperiod at 22 ± 2 °C. Leaves were collected from ten two-week-old seedlings of each cultivar and pooled for DNA extraction.

De Novo Assembly.
We assembled the clean read dataset reported by Wang et al. [37] from the RL and FL libraries following the methods described by Wang et al. [38], using the Trinity software (http://trinityrnaseq.sourceforge.net/). Contigs and unigenes were obtained separately for each library. Redundant sequences were then removed, and overlapping unigenes were assembled into continuous sequences with the TIGR Gene Indices Clustering (TGICL) tools [39], using a minimum similarity of 94% and a minimum overlap length of 100 bp.
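To make the two clustering thresholds concrete, the sketch below checks whether the 3′ end of one unigene and the 5′ end of another overlap by at least 100 bp at ≥94% identity and, if so, merges them. It is a simplified, ungapped illustration under our own assumptions, and all function names are hypothetical; TGICL itself relies on pairwise alignment and CAP3 assembly rather than this naive scan.

```python
import random


def random_seq(n: int, seed: int) -> str:
    """Reproducible random DNA; used only to build toy example data."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def best_overlap(a: str, b: str, min_len: int = 100, min_identity: float = 0.94):
    """Longest suffix of `a` matching a prefix of `b` (ungapped, position by
    position) with identity >= min_identity over >= min_len bases.
    Returns (overlap_length, identity) or None."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


def merge_if_overlapping(a: str, b: str, min_len: int = 100, min_identity: float = 0.94):
    """Join `a` and `b` across their overlap, keeping a's bases in the overlap."""
    hit = best_overlap(a, b, min_len, min_identity)
    if hit is None:
        return None
    length, _ = hit
    return a + b[length:]
```

Scanning from the longest possible overlap downward returns the longest overlap that satisfies both thresholds; a real clustering tool would also tolerate indels, which this ungapped comparison does not.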

Identification of EST-Derived SSRs and Primer Design.
SSRs were detected with the MicroSAtellite software (MISA; http://pgrc.ipk-gatersleben.de/misa/). The minimum numbers of repeat units were set to 12, 6, 5, 5, 4, and 4 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively. Primers were designed with Primer3, with no SSRs allowed within the primer sequences. Primer length ranged from 18 to 28 bp (optimum 23 bp), annealing temperature from 55 to 65 °C (optimum 60 °C), and PCR product size from 80 to 300 bp.
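The repeat thresholds above can be illustrated with a small regular-expression scanner. This is a simplified sketch, not MISA itself: it reports perfect repeats only, assigns each repeat to its shortest unit, ignores compound SSRs and interruptions, and scans left to right, so the reported phase of a motif (e.g. AG versus GA) depends on where the run begins. The names `MIN_REPEATS`, `SSR`, and `find_ssrs` are our own.

```python
import re
from dataclasses import dataclass

# Minimum repeat numbers per motif length (1-6 nt), as listed in the Methods
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 4, 6: 4}


@dataclass
class SSR:
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit as it appears at the start of the run
    repeats: int  # number of perfect, uninterrupted repeat units

    @property
    def length(self) -> int:
        return len(self.motif) * self.repeats


def is_periodic(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % p == 0 and motif == motif[:p] * (k // p) for p in range(1, k))


def find_ssrs(seq: str) -> list:
    """Report perfect SSRs that meet MIN_REPEATS, sorted by start position."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # one k-mer followed by at least (min_rep - 1) identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_periodic(motif):
                continue  # e.g. (AA)6 is skipped; the run is reported as (A)12
            hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits, key=lambda s: s.start)
```

For example, on `"TT" + "AG" * 8 + "TTC" + "A" * 12 + "C" + "CAT" * 5 + "G"` the scanner reports (AG)8, (A)12, and (CAT)5; with these thresholds the shortest reportable SSRs are 12 bp, matching the minimum length reported in the Results.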

Mapping EST-SSRs.
The physical positions of the EST-SSRs identified in this study were determined by aligning the SSRs and their flanking sequences (50 bp on each side) to the Brassica rapa (Chiifu-401) reference genome (http://brassicadb.org/brad/) using BLASTN. New EST-SSRs were identified by comparison with SSRs previously reported in the SSR marker database for Brassica (http://oilcrops.info/SSRdb) [25].
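Building the BLASTN queries amounts to cutting each SSR out with up to 50 bp of flanking sequence on each side and writing the results as FASTA. The helper names and the record naming scheme below are hypothetical. The resulting file could then be searched against a BLAST database built from the Chiifu-401 genome, for example with `blastn -query ssr_queries.fa -db brapa_chiifu -outfmt 6`, where the file and database names are placeholders.

```python
def flanked_query(seq: str, start: int, end: int, flank: int = 50) -> str:
    """SSR seq[start:end] (0-based, end exclusive) plus up to `flank` bp on
    each side, clipped at the ends of the unigene."""
    left = max(0, start - flank)
    right = min(len(seq), end + flank)
    return seq[left:right]


def to_fasta(records, width: int = 60) -> str:
    """records: iterable of (name, sequence) pairs -> FASTA text."""
    lines = []
    for name, s in records:
        lines.append(f">{name}")
        # wrap the sequence at `width` characters per line
        lines.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(lines) + "\n"
```

SSRs close to a unigene end get shorter flanks rather than being dropped, which keeps every SSR queryable at the cost of less alignment context.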

SSR Amplification and SSR Polymorphism Analysis.
DNA was extracted following a CTAB DNA extraction protocol [40]. DNA of the Chinese cabbage FuShan-BaoTou was used as the template to test the SSR primers designed above, and DNA of the twenty-four cultivars described above was used as the template for SSR polymorphism analysis. EST-SSR polymorphisms were validated by 6% denaturing polyacrylamide gel electrophoresis, 12% nondenaturing polyacrylamide gel electrophoresis, and sequencing.

De Novo Assembly.
High-quality clean reads from the RL and FL libraries reported by Wang et al. [37] were assembled using the Trinity software package [41]. A total of 99,684 and 95,411 contigs were obtained from the RL and FL libraries, respectively, with average lengths of 333 and 342 bp and N50 lengths of 531 and 536 bp (Table 1).
Paired-end reads were used to identify contigs derived from the same transcript and to estimate the distances between them. Using the Trinity software package, we then assembled these contigs into unigenes, from which Ns were removed; these unigenes could not be extended further at either end. A total of 46,294 and 48,473 unigenes were obtained from the RL and FL libraries, respectively, with average lengths of 707 and 680 bp and N50 lengths of 1000 and 980 bp (Table 1). The size distributions of contigs and unigenes were consistent between the RL and FL libraries (Figure 1), indicating that our Illumina sequencing workflow is reliable and reproducible. Unigenes from the two samples were combined and redundant unigenes were removed, yielding 51,694 nonredundant unigenes (Figure 1).
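Because N50 is often confused with the median length, the sketch below computes it from a list of sequence lengths: after sorting lengths in decreasing order, N50 is the length at which the running sum first reaches half of the total assembled bases. The toy lengths in the usage note are hypothetical.

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L together contain
    at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")
```

For the toy lengths 2, 3, ..., 10 (total 54 bp), the running sum 10 + 9 + 8 = 27 reaches half the total, so the N50 is 8 bp, whereas the median length is 6 bp. N50 is therefore weighted toward longer sequences and is generally larger than the median.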

Characterization of EST-SSRs in Chinese Cabbage.
A total of 10,420 EST-SSRs were detected with the MicroSAtellite software (MISA; http://pgrc.ipk-gatersleben.de/misa/) in 8571 unigenes, which account for 16.6% of all nonredundant unigenes (Tables 2 and s2). The mean SSR density was one SSR per 3.9 kb, or one per 5.0 nonredundant unigenes. Of the SSR-containing unigenes, 1502 (17.5%) harbored more than one SSR, and 666 SSRs (6.4%) were compound SSRs containing more than one repeat type (Table 2).
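The density figures above follow from simple ratios. The sketch below reproduces them from the reported counts; the 40.7 Mb total length of the nonredundant unigenes is taken from the Discussion, and the variable names are ours.

```python
# Counts reported in the text; 40.7 Mb is the total length of the
# nonredundant unigenes given in the Discussion.
TOTAL_BP = 40.7e6
N_UNIGENES = 51_694
N_SSR = 10_420
N_SSR_UNIGENES = 8_571        # unigenes containing at least one SSR
N_MULTI_SSR_UNIGENES = 1_502  # unigenes with more than one SSR
N_COMPOUND_SSR = 666

stats = {
    "kb_per_ssr": TOTAL_BP / N_SSR / 1_000,                                      # ~3.9 kb
    "unigenes_per_ssr": N_UNIGENES / N_SSR,                                      # ~5.0
    "pct_unigenes_with_ssr": 100 * N_SSR_UNIGENES / N_UNIGENES,                  # ~16.6 %
    "pct_multi_among_ssr_unigenes": 100 * N_MULTI_SSR_UNIGENES / N_SSR_UNIGENES, # ~17.5 %
    "pct_compound": 100 * N_COMPOUND_SSR / N_SSR,                                # ~6.4 %
}

for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Note that the 17.5% figure is relative to the 8571 SSR-containing unigenes, not to all 51,694 unigenes.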
The size of SSR repeat units ranged from one to six nucleotides, and the number of SSRs differed considerably among repeat-unit types. Tri- and dinucleotide repeat motifs were the most common (4,405, 42.27%; and 4,043, 38.80%, respectively), followed by mono- (1,644, 15.78%), hexa- (126, 1.21%), penta- (112, 1.07%), and tetranucleotide (90, 0.86%) repeat motifs (Figure 2). The two most common motif types together accounted for 81.07% of all SSRs detected, and the remaining types for only 18.93%. The number of repeat units (iterate number) in an EST-SSR ranged from 4 to 25, and its frequency distribution was also uneven. EST-SSRs with 5 repeat units were the most common (2832, 27.18%), followed by those with 6 (2739, 26.29%), 7 (1368, 13.13%), 8 (703, 6.75%), 12 (542, 5.20%), and 9 (480, 4.61%) repeat units (Table s3). The maximum of 25 repeat units was found in a dinucleotide EST-SSR. Among EST-SSRs with more than 10 repeat units, mononucleotide repeats were the most abundant, accounting for 93.46%. EST-SSR lengths ranged from 12 to 65 bp (Table s4); the longest was a 65 bp pentanucleotide EST-SSR. Most EST-SSRs were 12 to 20 bp long (91.47% of the total), followed by those 21–30 bp long (874 SSRs, 8.39%). Only 13 EST-SSRs (0.12%) were longer than 30 bp.
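The motif-type percentages can be checked directly from the counts in Figure 2. The sketch below recomputes each share and the combined share of the two most frequent classes; the dictionary keys are our own labels.

```python
# SSR counts per motif type, as reported in the text (Figure 2)
motif_counts = {"mono": 1644, "di": 4043, "tri": 4405,
                "tetra": 90, "penta": 112, "hexa": 126}

total = sum(motif_counts.values())
share = {m: round(100 * n / total, 2) for m, n in motif_counts.items()}

# the two most frequent classes and their combined share
top_two = sorted(motif_counts, key=motif_counts.get, reverse=True)[:2]
top_two_share = round(100 * sum(motif_counts[m] for m in top_two) / total, 2)

# SSR length = motif length x repeat number, so the 65 bp
# pentanucleotide SSR consists of 65 // 5 = 13 repeat units
longest_penta_units = 65 // 5

print(total, share, top_two, top_two_share)
```

The six counts sum exactly to the 10,420 EST-SSRs reported, so the percentages are internally consistent.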

Primer Design and Evaluation of EST-SSRs in Chinese Cabbage.
A total of 7877 PCR primer pairs were designed from the unique sequences flanking 1561 EST-SSR loci using Primer3, according to the criteria described in Section 2 (Table s5). For each EST-SSR locus, a maximum of five alternative primer pairs was designed. For the other 8859 EST-SSRs, no suitable primer pairs could be designed because their flanking sequences did not fulfill the primer design criteria mentioned above. From the 1561 EST-SSRs with designed primers, primer pairs for 24 loci with SSRs of ≥ 20 bp were selected for synthesis and evaluation of amplification in the Chinese cabbage FuShanBaoTou. Nineteen (79.2%) of these 24 EST-SSR loci successfully yielded PCR amplicons in FuShanBaoTou. Sequencing of these nineteen amplicons showed that thirteen were exactly as expected, two were longer than expected, and four were shorter (Table 3). For five loci (BR-es6, BR-es7, BR-es8, BR-es12, and BR-es18), the deviation from the expected size was due to variation in the SSR repeat motifs (Table s6). The remaining amplicon (BR-es16) contained an additional 86 bp, including a (TC)9 motif, near the SSR repeat motif region (Table s6).
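The selection of up to five primer pairs per locus can be sketched as a filter on the Primer3 criteria listed in the Methods, followed by ranking. This is an illustrative assumption, not the authors' pipeline: the `PrimerPair` record, its coordinate convention, and the penalty (distance from the 23 bp and 60 °C optima) are hypothetical, and Primer3 computes melting temperatures and its own penalty internally.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    left_start: int   # 0-based start of the forward primer on the unigene
    left_seq: str
    right_end: int    # 0-based exclusive end of the reverse-primer site
    right_seq: str
    left_tm: float    # Tm values assumed to come from the design tool
    right_tm: float

    @property
    def product_size(self) -> int:
        return self.right_end - self.left_start


def passes(pair: PrimerPair, ssr_start: int, ssr_end: int) -> bool:
    """Apply the length, Tm, product-size and no-SSR-in-primer criteria."""
    seqs = (pair.left_seq, pair.right_seq)
    lengths_ok = all(18 <= len(s) <= 28 for s in seqs)
    tm_ok = all(55.0 <= t <= 65.0 for t in (pair.left_tm, pair.right_tm))
    size_ok = 80 <= pair.product_size <= 300
    # both primers must lie entirely in the flanks, outside the SSR
    left_end = pair.left_start + len(pair.left_seq)
    right_start = pair.right_end - len(pair.right_seq)
    flanks_ok = left_end <= ssr_start and right_start >= ssr_end
    return lengths_ok and tm_ok and size_ok and flanks_ok


def penalty(pair: PrimerPair) -> float:
    """Hypothetical score: distance from the 23 bp and 60 °C optima."""
    return (sum(abs(len(s) - 23) for s in (pair.left_seq, pair.right_seq))
            + sum(abs(t - 60.0) for t in (pair.left_tm, pair.right_tm)))


def select_pairs(candidates, ssr_start: int, ssr_end: int, max_pairs: int = 5):
    """Keep passing pairs, best (lowest penalty) first, at most max_pairs."""
    ok = [p for p in candidates if passes(p, ssr_start, ssr_end)]
    return sorted(ok, key=penalty)[:max_pairs]
```

Loci for which `select_pairs` returns an empty list correspond to the 8859 EST-SSRs whose flanks could not satisfy the criteria.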

Validation of Polymorphism of EST-SSRs.
The nineteen effective primer pairs were used to validate polymorphism among the 24 Chinese cabbage cultivars described above. Seventeen loci (89.5%) were polymorphic (Figure 5). A total of 56 alleles were identified at the 17 polymorphic loci, with an average of 3.29 alleles per locus (range 2–6). The maximum of six alleles was detected at BR-es16 and BR-es18. BR-es6 and […] (Table s8). Sequencing of the polymorphic alleles at the 17 polymorphic loci showed that the polymorphisms at nine loci (BR-es1, BR-es4, BR-es7, BR-es8, BR-es10, BR-es14, BR-es17, BR-es18, and BR-es19) were due to different numbers of SSR repeat units. At another six polymorphic loci (BR-es2, BR-es3, BR-es12, BR-es13, BR-es15, and BR-es16), most polymorphic alleles differed in the repeat motifs, with additional changes in other regions (Table s7). For example, compared with the 160 bp allele of BR-es3 in FuShanBaoTou, the polymorphic alleles of 163 bp and 145 bp differed in the number of TAG/ATC repeat units, whereas the 99 bp allele differed not only in repeat number but also carried a deletion in another region (Table s7). At the remaining two polymorphic loci, BR-es5 and BR-es9, the polymorphisms were unrelated to the repeat numbers of the SSR motifs (Table s7).
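A quick size-based screen can separate alleles whose size shift is consistent with a change in repeat number from those that need another explanation: a shift that is not a multiple of the motif length cannot be due to repeat-number variation alone. The sketch below applies this to the BR-es3 sizes quoted above (reference 160 bp, 3 bp TAG/ATC motif). It is a heuristic added only for illustration, and the function names are hypothetical; a shift that is a multiple of the motif length may still reflect an indel elsewhere, which is why sequencing is needed to confirm the cause.

```python
def repeat_shift(ref_size: int, allele_size: int, motif_len: int):
    """Change in repeat number implied by the size difference, or None if the
    difference is not a whole multiple of the motif length."""
    diff = allele_size - ref_size
    if diff % motif_len != 0:
        return None  # needs another explanation, e.g. an indel outside the SSR
    return diff // motif_len


def screen_alleles(ref_size: int, allele_sizes, motif_len: int) -> dict:
    """Map each allele size to its implied repeat-number shift (or None)."""
    return {size: repeat_shift(ref_size, size, motif_len) for size in allele_sizes}


# BR-es3: 160 bp reference allele in FuShanBaoTou, trinucleotide motif
print(screen_alleles(160, [163, 145, 99], 3))  # {163: 1, 145: -5, 99: None}

# Summary figures from the text
mean_alleles = 56 / 17            # alleles per polymorphic locus, ~3.29
pct_polymorphic = 100 * 17 / 19   # polymorphic among amplified loci, ~89.5
```

The 99 bp allele is flagged (a 61 bp shift is not a multiple of 3), which agrees with the sequencing result that it also carries a deletion outside the repeat.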

High-Throughput RNA Sequencing Provides Substantial Knowledge for EST-SSRs.
Illumina paired-end RNA sequencing is one of the rapidly emerging next-generation sequencing (NGS) technologies. Because of its high throughput, high accuracy, and low cost, it has been widely used for de novo transcriptome sequencing and assembly and for qualitative and quantitative transcriptome analysis in many plants [37,38,42,43]. In our previous study, the transcriptomes of rosette and folding leaves of Chinese cabbage were analyzed using Illumina paired-end RNA sequencing, and abundant high-quality clean reads and ESTs were obtained [37]. Such a large quantity of clean reads increases the coverage depth of the transcriptome, improves sequencing accuracy, and provides useful information for developing new tools for genetic mapping and molecular breeding of Chinese cabbage. In this study, we further assembled the clean reads into contigs and unigenes separately for the RL and FL libraries. Contig and unigene statistics did not differ significantly between the two libraries (Table 1), indicating that our Illumina sequencing workflow is highly reliable and reproducible. The unigenes of the two libraries were then combined and assembled, yielding 51,694 nonredundant unigenes from 40.7 Mb of sequence data. This is more nonredundant unigenes than reported in previous studies [35,36]; they represent a large portion of the Chinese cabbage transcriptome and are important for a comprehensive understanding of EST-SSRs.

New EST-SSRs Identification.
Of the 10,420 EST-SSRs identified in this study, more than 70% had already been identified and deposited in the SSR marker database (http://oilcrops.info/SSRdb), and over half of these were identical to SSRs previously reported from the Brassica rapa (Chiifu-401) genomic sequence (Table s2) [25]. This indicates that our method for EST-SSR identification is highly reliable. The 2317 EST-SSRs (22.2%) that differ in repeat number from the reported SSRs could be used to distinguish Chiifu-401 from FuShanBaoTou and to construct genetic linkage maps using these two cultivars as parents. In addition, 2744 new EST-SSRs (26.3%) were identified in this study; together with previously discovered EST-SSRs, they could be used for high-density genetic linkage map construction, gene/QTL mapping, cultivar identification, and other applications.
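The three-way split into identical, polymorphic, and new EST-SSRs can be sketched as a comparison of genome-mapped SSRs with database records. The tuple layout, the 20 bp position window, the chromosome names in the usage note, and the grouping of motifs into classes by rotation and reverse complement are all our own assumptions, not the authors' documented procedure.

```python
# Complement table for reverse-complementing motifs
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(motif: str) -> str:
    """Smallest string among all rotations of the motif and of its reverse
    complement, so that e.g. AG, GA, CT and TC fall into one class."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, rc) for i in range(len(m)))


def classify(ssr, db_ssrs, window: int = 20) -> str:
    """ssr and each db entry: (chromosome, start, motif, repeats) on the
    reference genome. Returns 'identical', 'polymorphic' or 'new'."""
    chrom, start, motif, repeats = ssr
    for d_chrom, d_start, d_motif, d_repeats in db_ssrs:
        same_locus = d_chrom == chrom and abs(d_start - start) <= window
        if same_locus and canonical(d_motif) == canonical(motif):
            return "identical" if d_repeats == repeats else "polymorphic"
    return "new"
```

For instance, with a hypothetical database entry `("A01", 1000, "AG", 8)`, an SSR mapped as `("A01", 1005, "CT", 8)` is classed as identical, `("A01", 1000, "GA", 10)` as polymorphic, and an SSR on another chromosome as new. A linear scan is enough for illustration; at the scale of 10,420 SSRs one would index database entries by chromosome and position.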

High Polymorphism of Chinese Cabbage EST-SSRs.
In the present study, 79.2% of the EST-SSR primer pairs selected for evaluation successfully generated high-quality amplicons, indicating that ESTs from high-throughput RNA sequencing of the Chinese cabbage transcriptome are suitable for specific primer design. Failure of the remaining primer pairs may be due to splice sites, large introns, chimeric primers, or poor-quality sequences [27]. We sequenced all 19 PCR amplicons obtained from Chinese cabbage FuShanBaoTou and found that all contained the expected SSRs, and that the SSRs in 13 amplicons were exactly as predicted (Table s6). Deviations of EST-SSR amplicons from the expected size are likely due to introns, large insertions, repeat number variation, a lack of specificity, or assembly errors [43]. Here, five of the six amplicons with unexpected sizes differed in the number of SSR repeat units, while the remaining one carried an 86 bp insertion near the expected SSR repeat motif region (Table s6). These results suggest that the unigenes assembled from high-throughput RNA sequencing of the Chinese cabbage transcriptome are reliable and that the EST-SSRs identified in our dataset can be used in further studies such as genetic mapping and cultivar identification.
Most of the tested EST-SSR loci (89.5%) were polymorphic among the 24 tested Chinese cabbage cultivars. The mean number of alleles per locus was 3.29, ranging from 2 to 6 (Table 3), indicating relatively high polymorphism of EST-SSRs in Chinese cabbage. Most polymorphisms at the tested loci were due to variation in the SSR repeat motifs; at only two loci were the polymorphisms unrelated to repeat motif variation (Table s6). These results indicate that the EST-SSRs identified and the PCR primers designed in this study could be used for constructing high-density genetic linkage maps, mapping quantitative trait loci, assessing germplasm polymorphism and evolution, marker-assisted selection, and cloning functional genes in Chinese cabbage.
In summary, we assembled a large set of high-quality clean reads derived from the Chinese cabbage transcriptome using high-throughput RNA sequencing on a Solexa/Illumina platform. A total of 51,694 nonredundant unigenes were obtained from 40.7 Mb of sequence data, providing a substantial resource for EST-SSR identification and characterization. We identified and characterized 10,420 EST-SSRs and designed PCR primer pairs for 1561 of them. By comparison with SSRs previously reported in the SSR marker database for Brassica (http://oilcrops.info/SSRdb), we identified 2744 new EST-SSRs. Primer pairs for 24 EST-SSRs were selected for evaluation, and 79.2% of these 24 loci generated high-quality amplicons. Of these effective primer pairs, 89.5% revealed polymorphism among 24 cultivars of Chinese cabbage. The EST-SSRs developed in this study, together with previously reported EST-SSRs, will provide valuable resources for constructing high-density genetic linkage maps, mapping quantitative trait loci, assessing germplasm polymorphism and evolution, marker-assisted selection, and cloning functional genes in Chinese cabbage. To our knowledge, this is the first successful attempt to develop a large number of high-quality EST-SSRs from the Chinese cabbage transcriptome using high-throughput RNA sequencing technology.