De Novo Sequencing and Characterization of the Transcriptome of Dwarf Polish Wheat (Triticum polonicum L.)

Constructing and characterizing a Polish wheat transcriptome is a crucial step toward studying useful traits of Polish wheat. In this study, a transcriptome comprising 76,014 unigenes was assembled with Trinity from dwarf Polish wheat (DPW) roots, stems, and leaves. Of these unigenes, 61,748 (81.23%) were functionally annotated in public databases and classified into different functional categories. When the transcriptome was aligned against the draft wheat genome released by the International Wheat Genome Sequencing Consortium (IWGSC), 57,331 (75.42%) unigenes, including 26,122 AB-specific and 2,622 D-specific unigenes, mapped to the A, B, and/or D genomes. Compared with the transcriptome of T. turgidum, 56,343 unigenes matched 103,327 unigenes of T. turgidum. Compared with the genomes of barley and rice, 14,404 and 7,007 unigenes matched 14,608 barley genes and 7,708 rice genes, respectively. In addition, 2,148, 1,611, and 2,707 unigenes were expressed specifically in roots, stems, and leaves, respectively. Finally, 5,531 SSRs were identified in 4,531 unigenes, and 518 primer pairs were designed.

With advances in next-generation sequencing technology, high-throughput RNA sequencing (RNA-Seq) generates reads that can either be mapped onto a reference genome or assembled de novo, giving a better depiction of the transcriptome [9,10,13-15]. RNA-Seq has been widely used in both model and nonmodel organisms to study biological processes and for applications such as SNP and gene discovery, SSR mining, and identification of differentially expressed genes [15-17]. Although the draft genome and transcriptome of T. aestivum and the transcriptome of tetraploid wheat have been released [9-12], no transcriptome of Polish wheat has yet been constructed or reported. Constructing and characterizing a Polish wheat transcriptome is therefore a crucial step toward studying useful traits in Polish wheat.
Dwarf Polish wheat (DPW), which carries a recessive dwarfing gene [3], was originally collected from Tulufan, Xinjiang province, China. Previous studies indicate that the genetic similarity between Polish wheat and T. durum, T. turgidum, and T. aestivum is low [7,8], so DPW is expected to be genetically distinct from these species. In this study, the transcriptome of DPW was constructed and characterized. It was also compared with the genomes of barley, rice, and common wheat and with the transcriptome of T. turgidum. Finally, SSR markers were mined from the transcriptome.

Transcriptome Assembly and CDS (Coding Sequence) Prediction.
Reads containing adapters or poly-N stretches, as well as low-quality reads, were removed with Perl scripts written by Novogene to produce clean reads. The GC content and sequence duplication level of the clean data were also calculated. All unigenes were assembled with Trinity (version 2012-10-15) [19] using a minimum k-mer coverage of 2, with all other parameters at their default values. Unigenes were defined following the methods of Zhang et al. [14] and Krasileva et al. [10].
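The read-cleaning step can be illustrated with a minimal Python sketch. The criteria used by the Novogene scripts are not given here, so the adapter sequence and every threshold below (a read is removed if more than 10% of its bases are N, or if more than 50% of its bases have Phred quality 5 or lower) are hypothetical placeholders, and Phred+33 quality encoding is assumed. GC content is computed over the called (non-N) bases of the clean reads.

```python
from dataclasses import dataclass

# All thresholds below are illustrative assumptions, not the study's settings.
ADAPTER = "AGATCGGAAGAGC"     # assumed adapter sequence (common Illumina prefix)
MAX_N_FRACTION = 0.10         # assumed: drop reads with more than 10% N bases
LOW_QUAL_CUTOFF = 5           # assumed: Phred score at or below this is "low quality"
MAX_LOW_QUAL_FRACTION = 0.50  # assumed: drop reads with more than 50% low-quality bases


@dataclass
class Read:
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string, same length as seq


def phred_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def is_clean(read: Read) -> bool:
    """True if the read has no adapter, few N bases and few low-quality bases."""
    seq = read.seq.upper()
    if not seq or ADAPTER in seq:
        return False
    if seq.count("N") / len(seq) > MAX_N_FRACTION:
        return False
    low = sum(1 for q in phred_scores(read.qual) if q <= LOW_QUAL_CUTOFF)
    return low / len(seq) <= MAX_LOW_QUAL_FRACTION


def gc_content(reads: list[Read]) -> float:
    """Percentage of G + C among all called (non-N) bases of the given reads."""
    gc = called = 0
    for r in reads:
        seq = r.seq.upper()
        gc += seq.count("G") + seq.count("C")
        called += len(seq) - seq.count("N")
    return 100.0 * gc / called if called else 0.0


if __name__ == "__main__":
    reads = [
        Read("r1", "ACGTGCGCAT", "IIIIIIIIII"),     # passes every filter
        Read("r2", "ACGTNNNNAT", "IIIIIIIIII"),     # 40% N bases
        Read("r3", "ACGTACGTAC", "##########"),     # every base is Q2
        Read("r4", "TTAGATCGGAAGAGCTT", "I" * 17),  # contains the adapter
    ]
    clean = [r for r in reads if is_clean(r)]
    print([r.name for r in clean])  # ['r1']
    print(gc_content(clean))        # 60.0
```

Filtering each read independently keeps the sketch streaming-friendly; real pipelines usually also handle read pairs together so that mates stay synchronized.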

Tissue-Specific Expression Analysis.
Clean reads were aligned against the assembled transcriptome with RSEM [21] to obtain read counts. The read count of each unigene was converted into an RPKM value to normalize gene expression [13]. A unigene with an RPKM value of 0 (N/A) was considered not expressed. Tissue-specific unigenes were identified by comparing RPKM values among roots, leaves, and stems.
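RPKM normalizes a read count C for transcript length L (in bp) and library size N (mapped reads): RPKM = 10^9 × C / (N × L). The sketch below applies this formula per tissue and then picks out unigenes with RPKM > 0 in exactly one tissue. That selection rule, and taking N as the sum of a tissue's unigene read counts, are assumptions; the exact criterion is not stated above.

```python
def rpkm(count: float, length_bp: int, total_mapped: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads:
    count * 10^9 / (length_bp * total_mapped)."""
    return count * 1e9 / (length_bp * total_mapped)


def rpkm_table(counts: dict, lengths: dict) -> dict:
    """counts: {tissue: {unigene: read count}}; lengths: {unigene: length in bp}.

    Assumption: the library size of each tissue is the sum of its unigene
    read counts (i.e. reads that mapped to the assembly).
    """
    table = {}
    for tissue, per_gene in counts.items():
        total = sum(per_gene.values())
        table[tissue] = {
            g: (rpkm(c, lengths[g], total) if total else 0.0)
            for g, c in per_gene.items()
        }
    return table


def tissue_specific(table: dict) -> dict:
    """Assumed rule: a unigene is specific to a tissue if its RPKM is > 0
    there and 0 (or missing) in every other tissue."""
    genes = set().union(*(t.keys() for t in table.values()))
    result = {tissue: [] for tissue in table}
    for g in sorted(genes):
        expressed_in = [t for t in table if table[t].get(g, 0.0) > 0]
        if len(expressed_in) == 1:
            result[expressed_in[0]].append(g)
    return result


if __name__ == "__main__":
    counts = {  # hypothetical read counts
        "root": {"u1": 100, "u2": 0, "u3": 50},
        "stem": {"u1": 80, "u2": 0, "u3": 0},
        "leaf": {"u1": 120, "u2": 30, "u3": 0},
    }
    lengths = {"u1": 1000, "u2": 500, "u3": 2000}
    table = rpkm_table(counts, lengths)
    print(tissue_specific(table))  # {'root': ['u3'], 'stem': [], 'leaf': ['u2']}
```

RSEM can report fractional expected counts, which is why `count` is typed as a float.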

SSR Mining and Primer Design.
Simple sequence repeats (SSRs) were identified with MIcroSAtellite (MISA, http://pgrc.ipk-gatersleben.de/misa/) as described by Zhang et al. [15]. SSRs were defined as motifs of one to six nucleotides with at least five contiguous repeat units. Based on these SSRs, primers were designed with Primer3.

[...] categories, three GO functional categories [molecular function (15,684), biological process (4,637), and cellular components (7,783)], KEGG, KOG, Pfam, Nr, and Nt, respectively. All annotation information was also deposited at GenBank under the accession GEDT00000000. Previous transcriptome studies have reported that many unigenes could not be functionally annotated, for example, 30% in T. turgidum [10], 32.12% in peanut [15], and 45.10% in Dendrocalamus latiflorus [14]. In this study, 14,266 (18.77%) unigenes were not annotated in any database. As proposed by Krasileva et al. [10], these unigenes might be (1) wheat-specific or highly divergent genes; (2) expressed pseudogenes; (3) noncoding transcribed sequences; (4) fragments of 5' and 3' UTRs; or (5) general assembly artifacts. Indeed, some of these unannotated unigenes, such as noncoding RNAs, may regulate various cellular processes in wheat [25].
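The SSR criteria stated under SSR Mining and Primer Design (motifs of one to six nucleotides, at least five contiguous repeat units) can be sketched as a simple left-to-right scan. This is not MISA: MISA is configured with per-motif-size repeat thresholds and can report compound SSRs, and neither feature is modelled here. At each position the shortest qualifying motif is reported, and the scan resumes after the repeat.

```python
import re

MIN_REPEATS = 5            # at least five contiguous repeat units (from the text)
MOTIF_SIZES = range(1, 7)  # motifs of one to six nucleotides (from the text)

# For motif size k: a k-mer of A/C/G/T followed by at least MIN_REPEATS - 1 copies of itself.
PATTERNS = {
    k: re.compile(r"([ACGT]{%d})\1{%d,}" % (k, MIN_REPEATS - 1)) for k in MOTIF_SIZES
}


def find_ssrs(seq: str) -> list[tuple[str, int, int, int]]:
    """Return (motif, repeat count, 1-based start, 1-based end) for each SSR.

    Shorter motif sizes are tried first at each position, so a run such as
    ATATAT... is reported as (AT)n rather than (ATAT)n; after a hit the scan
    resumes at the end of that repeat, so reported SSRs never overlap.
    """
    seq = seq.upper()
    ssrs = []
    pos = 0
    while pos < len(seq):
        for k in MOTIF_SIZES:
            m = PATTERNS[k].match(seq, pos)
            if m:
                ssrs.append((m.group(1), len(m.group(0)) // k, m.start() + 1, m.end()))
                pos = m.end()
                break
        else:
            pos += 1  # no SSR starts here
    return ssrs


if __name__ == "__main__":
    unigene = "GG" + "AT" * 6 + "C" + "A" * 7 + "TTGCA" + "CAG" * 5 + "T"
    for ssr in find_ssrs(unigene):
        print(ssr)
    # ('AT', 6, 3, 14)
    # ('A', 7, 16, 22)
    # ('CAG', 5, 28, 42)
```

The reported start and end coordinates are what a primer-design step would use to place primers in the flanking sequence.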

Results and Discussion
Annotation efficiency also increases with unigene length [14]. In the present study, 99.67% of unigenes longer than 2,000 bp, 98.34% of unigenes of 1,500-1,999 bp, and 95.02% of unigenes of 1,000-1,499 bp were annotated in at least one public database, compared with only 85.08% of unigenes of 500-999 bp and 71.39% of unigenes of 201-499 bp (Figure 1).

[...] (Figure 2). Of the 26,122 A/B genome-specific unigenes, 7,785 and 11,291 mapped specifically to the A and B genomes, respectively (Figure 2). All unigenes were also compared with the transcriptome of T. turgidum [10]: 56,343 (74.12%) DPW unigenes matched 103,327 (73.74%) unigenes of T. turgidum (SFile 2). Approximately 25% of DPW unigenes could not be matched to the draft wheat genome or to the T. turgidum transcriptome. This may be because Polish wheat has low genetic similarity with T. durum, T. turgidum, and T. aestivum [7,8], or because the different tissues used to construct the transcriptomes yielded some tissue-specific unigenes [10,11]. Interestingly, 2,622 unigenes mapped specifically to the D genome (Figure 2, SFile 1), although Polish wheat, a tetraploid (AB genome) species, may be a hybrid of T. ispahanicum and T. durum [5,6]. This result suggests that the D genome might have arisen from the A and B genomes through homoploid hybrid speciation [26]. All unigenes were also searched with BLAST against the published genomes of barley [23] and rice [24], using an E-value cutoff of 1e-5 and requiring more than 100 matched amino acids. In total, 14,404 (18.95%, SFile 3) and 7,007 (9.21%, SFile 4) unigenes matched 14,608 barley genes and 7,708 rice genes, respectively. These proportions are much lower than the 70% of bread wheat unigenes that matched rice and barley genes [11].
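The barley and rice comparison reduces to filtering BLAST hits and counting distinct queries and subjects; because one unigene can hit several genes, the number of matched unigenes (e.g. 14,404) and matched genes (e.g. 14,608) can differ. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default columns) from a protein-level search and approximates "more than 100 matched amino acids" by the alignment-length column. The E-value cutoff of 1e-5 is a reading of the text, and all IDs in the example are hypothetical.

```python
import csv

EVALUE_CUTOFF = 1e-5  # E-value threshold as read from the text
MIN_ALIGNED_AA = 100  # "more than 100 matched amino acids"

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_pairs(lines):
    """Yield (unigene, gene) pairs whose hit passes both thresholds.

    Assumption: 'matched amino acids' is approximated by the alignment
    length column of a protein-level search (e.g. blastx).
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) < EVALUE_CUTOFF and int(rec["length"]) > MIN_ALIGNED_AA:
            yield rec["qseqid"], rec["sseqid"]


def match_summary(lines, total_unigenes):
    """Return (matched unigenes, % of all unigenes, distinct matched genes)."""
    pairs = list(significant_pairs(lines))
    unigenes = {q for q, _ in pairs}
    genes = {s for _, s in pairs}
    percent = 100.0 * len(unigenes) / total_unigenes if total_unigenes else 0.0
    return len(unigenes), percent, len(genes)


if __name__ == "__main__":
    hits = [  # hypothetical unigene and gene IDs
        "u1\tbarley_g1\t88.0\t150\t18\t0\t1\t450\t1\t150\t1e-40\t250",
        "u1\tbarley_g2\t70.0\t120\t36\t0\t1\t360\t5\t124\t2e-20\t150",
        "u2\tbarley_g3\t60.0\t80\t32\t0\t1\t240\t1\t80\t1e-10\t90",   # too short
        "u3\tbarley_g4\t55.0\t130\t58\t1\t1\t390\t1\t130\t0.01\t40",  # E-value too high
    ]
    print(match_summary(hits, total_unigenes=4))  # (1, 25.0, 2)
```

`csv.reader` accepts an open file as well as a list of lines, so the same functions work directly on a BLAST result file.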

Tissue-Specific Unigenes.
Because the transcriptome was assembled from roots, leaves, and stems, some unigenes were expected to be tissue-specific. Of the 76,014 unigenes, 39,083 were expressed in all three tissues (Figure 3, SFile 5); these unigenes were involved in basic development and life-cycle processes, such as translation; secondary metabolite biosynthesis; DNA replication, recombination, and repair; transcription; signal transduction; carbohydrate transport and metabolism; cell cycle control, cell division, and chromosome partitioning; chromatin structure and dynamics; coenzyme transport and metabolism; defense mechanisms; energy production and conversion; and RNA processing and modification. In addition, 5,160, 3,403, and 3,183 unigenes were shared between leaves and stems, roots and stems, and leaves and roots, respectively (Figure 3, SFile 5).
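The shared and tissue-specific counts correspond to the regions of a three-set Venn diagram (Figure 3). Assuming that the pairwise counts (for example, 5,160 for leaves and stems) exclude unigenes expressed in all three tissues, each expressed unigene can be assigned to the exact set of tissues in which it has RPKM > 0, as in this minimal sketch with hypothetical unigene IDs.

```python
def expression_partition(expressed: dict) -> dict:
    """expressed: {tissue: set of unigene IDs with RPKM > 0 in that tissue}.

    Returns {frozenset of tissues: set of unigenes expressed in exactly those
    tissues}, i.e. the exclusive regions of a Venn diagram.
    """
    membership = {}
    for tissue, genes in expressed.items():
        for g in genes:
            membership.setdefault(g, set()).add(tissue)
    partition = {}
    for g, tissues in membership.items():
        partition.setdefault(frozenset(tissues), set()).add(g)
    return partition


if __name__ == "__main__":
    expressed = {  # hypothetical unigene IDs
        "root": {"u1", "u2", "u4"},
        "stem": {"u1", "u2", "u5"},
        "leaf": {"u1", "u3", "u5"},
    }
    partition = expression_partition(expressed)
    # Print larger tissue combinations first, then alphabetically by tissue names.
    for tissues, genes in sorted(partition.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
        print(" + ".join(sorted(tissues)), len(genes))
```

Keying regions by `frozenset` makes the same function work for any number of tissues, not just three.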

Competing Interests
The authors declare no competing interests.

[...] Chao Wang, and Xiaolu Wang wrote the paper. Yi Wang and Yonghong Zhou supervised the entire study. Yi Wang, Chao Wang, and Xiaolu Wang contributed equally to this work.