Transcriptome Analysis of Two Different Developmental Stages of Paeonia lactiflora Seeds

Paeonia lactiflora is a herbaceous flower in the family Paeoniaceae with both hypocotyl and epicotyl dormant seeds. We used high-throughput transcriptome sequencing on two different developmental stages of P. lactiflora seeds to identify seed dormancy and germination-related genes. We performed de novo assembly and annotated a total of 123,577 unigenes, which encoded 24,688 putative proteins with 47 GO categories. A total of 10,714 unigenes were annotated in the KEGG database, and 258 pathways were involved in the annotations. A total of 1795 genes were differentially expressed in the functional enrichment analysis. The key genes for seed germination and dormancy, such as GAI1 and ARF, were confirmed by quantitative reverse transcription-polymerase chain reaction analysis. This is the first report of sequencing the P. lactiflora seed transcriptome. Our results provide fundamental frame work and technical support for further selective breeding and cultivation of Paeonia. Our transcriptomic data also serves as the basis for future genetics and genomics research on Paeonia and its closely related species.


Introduction
Paeonia lactiflora is a herbaceous perennial flower plant in the family Paeoniaceae, which is native to Central and Eastern Asia and widely grown in China. Paeonia lactiflora is one of horticulturally important flower species. It has been primarily grown for in use in the horticultural industry as home garden plants and is also cultivated as commercially cut flower. Furthermore, P. lactiflora, as a temperament plant species, is highly cold resistant and can normally grow and bloom under temperature −46.5°C. Therefore, P. lactiflora plants have not only become the main source of peonies for the cut flower business but they are also valuable coldresistant genetic resources for breeding and cultivation [1].
Plant seeds usually experience dormancy where they are unable to germinate in a specified period of time. Seed dormancy is a very important mechanism to inhibit germination during unsuitable ecological conditions, for example, low temperature [2]. It has been known that dormancy is caused by two categories of factors: exogenous and endogenous. Exogenous factors include physical barriers of impermeable seed coat, which prevent the seeds from taking up water or gases. As results, the seed is unable to germinate until the physical impermeable layer is broken [3]. Endogenous dormancy is caused by embryonic conditions. For example, physiological immature embryos, lack of growth hormone, or presence of inhibiting chemicals all can retard embryo growth and prevent seed germination [4,5]. In P. lactiflora, for example, abscisic acid (ABA) has been identified to be one of the major endogenous physiological factors to inhibit seed germination and root growth [6].
Paeonia lactiflora seeds display both hypocotyl and epicotyl dormancy and time from sowing to fully germination takes six to seven months under natural conditions. Hypocotyls of P. lactiflora start to elongate and stimulate the root growth when temperature goes down in later fall after sowing. After experiencing long winter, the dormancy of epicotyl is broken and starts to grow during spring [7]. Furthermore, incomplete removal of dormancy during seed reproduction leads to a decrease in germination rate. Therefore, such long process of dormancy and low germination rate greatly slow the Paeonia lactiflora breeding and cultivation. Due to these dual and sequential dormancy scenarios in P. lactiflora, understanding the hypocotyl dormancy is fundamental to tackle the epicotyl dormancy and further to unravel the dormancy of P. lactiflora [8]. Previous studies have been mainly focused on morphological and physiological perspective of seed dormancy in P. lactiflora. However, molecular mechanism of seed dormancy in P. lactiflora remains unexplored.
Previous studies have demonstrated that many genes regulate seed dormancy and germination and especially genes involving in ABA and gibberellic acid (GA) pathway [9][10][11]. Using whole genomic and transcriptomics analyses in Arabidopsis, numerous genes with various functions have been identified and shown differential expression between dormant seeds and dormancy-releasing seeds [12,13]. To better understand the regulatory mechanisms and identify the genes underlying seed dormancy in P. lactiflora, which display dual and long dormancy process, here, we utilized "-omics" approach to obtain transcriptomes of two different germination stages of peony seeds. We extracted total RNA, sequenced the RNA using the Illumina/HiSeq™2000/ MiSeq™ platform, assembled de novo, and annotated unigenes [14]. We identified P. lactiflora genes that were differentially expressed by seed hypocotyls during germination and analyzed their functions and the mechanism of differential expression. We ultimately aim to identify the key seed germination and dormancy genes to improve the breeding process.

Results and Discussion
2.1. Sequence Read Mapping. Based on RNA-seq data from stratification 0 day and stratification 40 days of P. lactiflora seeds before germination of down-hypocotyl (PDB) and after germination of up-hypocotyl (PDA), we obtained a total of 120,181,964 unigene sequences. 48% (58,107,876) and 5.2% (6,274,088) unigene sequences were present in RNAs of PDB and PDA, respectively. A total of 123,577 contigs were obtained from sequence assembly, and 30% of the contigs have the length larger than 2000 bases.

Simple Sequence Repeat (SSR) Analysis.
A total of 9225 SSR loci were identified in the 68,054 contigs from P. lactiflora transcriptomes, accounting for 13.56% of all contigs. SSR types were abundant, including mononucleotide repeat to hexanucleotide repeat (Table 1). Among 6 types of SSR, mononucleotide repeat motifs are the most common, which account for 58.71%, following by dinucleotide repeat, trinucleotide repeat, tetranucleotide repeat, hexanucleotide repeat, and pentanucleotide repeat. These SSR markers are useful resources that can be utilized for the development of universal molecular markers and the construction of genetic map of P. lactiflora.

Prediction of Unigene Coding DNA Sequence (CDS).
Compared with the NCBI nonredundant protein sequences (Nr), Swiss-Prot, KEGG, and KOG databases, we obtained that 30,803 unigenes contain CDS and encoded proteins. The lengths of predicted CDS are shown in Figure 4. The lengths of amino acids translated from predicted CDS are shown in Figure 5. Overall, we identified 8203 genepredicted proteins with more than 300 (26.6%) amino acids and 745 gene-predicted proteins with more than 1000 (2.42%) amino acids.      Figure 6. The PDB sample had 834 upregulated and 960 downregulated genes relative to those in the PDA sample.

DEGs Significantly Enriched in the GO Functional
Results. We identified DEGs that were significantly enriched in GO entries and identified their biological processes, cellular components, and molecular functions at three levels of gene function (Figure 7). We enriched the analysis by using all DEGs for each combination and separately analyzed each combination of differences in the gene enrichment analysis based on up-or downregulation to better understand the gene functions (Figures 8 and 9).
"Oxidation-reduction processes" was the most significantly enriched biological process GO term for DEGs, accounting for 13.99%. "Cell periphery" was the most significantly enriched GO term in cellular components, accounting for 7.85%. "Carbohydrate binding" was the most significantly enriched GO term in molecular functions, accounting for 2.39%.

Significantly Enriched Pathways in the DEGs.
In organisms, different genes are coordinated to exercise their biological functions. The most important biochemical metabolic pathways and signal transduction pathways of differentially expressed genes can be identified by the DEGs. The KEGG is the largest public pathway database [15]. We performed an enrichment analysis using the KEGG pathway unit and the hypergeometric test to identify the pathways for all annotated genes.
As the results of the analysis shown in Table 2, we observed 19 differential genes in the plant hormone signal transduction pathway (ko04075). Previous investigations reported that plant hormone signal transduction pathway (ko04075) is the plant hormone-regulating pathway, involving gibberellin, abscisic acid, cytokinin, auxin, ethylene and jasmonic acid, and other hormonal regulatory pathway [16]. Plant germination and growth is closely related to the regulation of hormones, so we lock this pathway as the research object.

Real-Time Polymerase Chain Reaction (PCR) Analysis of the Genes Involved in Plant Hormone Signal Transduction
Pathways. Four plant hormone signal transduction unigenes were chosen for qRT-PCR analysis to confirm differences in expression levels between accessions found in the FPKM analysis. These unigenes were GAI1, ARF, and BRI1, which were upregulated in seeds, and JAZ, which was downregulated ( Figure 10). The qRT-PCR data confirmed the expression patterns of these unigenes determined by the FPKM analysis.

Conclusion
This is the first study to apply RNA-seq transcriptomics profiling to investigate the sequences and transcript abundances of genes expressed in P. lactiflora seeds. This transcriptome analysis provided 68,054 unigenes, among which 45.86% were aligned to the Nr database, although no P. lactiflora reference genome sequence is available. The PDA and PDB identified a 1794 differentially expressed unigenes, including key dormancy and germination genes, such as GAI 1 and ARFs. GAI 1 inhibits elongation of Arabidopsis hypocotyl cells under dark and light conditions [17]. ARFs are transcription factors involved in auxin signaling pathway during many plant growth and developmental stages. ARF10 mutant shows upregulation of ABA-responsive genes during germination [18]. The transcriptomes of P. lactiflora seeds provide us a basis for further exploration of P. lactiflora seed germination-related genes.

Materials and Methods
4.1. Materials. P. lactiflora was grown at Shenyang Agricultural University, Liaoning Province, China (41°80′N, 123°45 ′ E) under field conditions. The male parent was "Fen Yulou," and the female parent was "Fen Yunu." We harvested the seeds annually every August and used two different seed developmental stages (0 and 40 days) as material. We extracted total RNA from seeds in the two stages and performed an RNA-seq transcriptome analysis.  x-axis represents the next level GO term of the three major GO categories. y-axis represents the number of annotated genes for each term (including the subterm) and the proportion to the total number of annotated genes. Three different GO categories are biological processes, cell components, and molecular functions.   Figure 8: Upregulated genes of enriched GO terms. [19]. The total extracted RNA was detected with the Agilent 2100 instrument (Agilent Technologies, Palo Alto, CA, USA) after passing sample testing using oligo (dT) magnetic bead-enriched eukaryotic mRNA. The constructed cDNA library was sequenced with the Illumina HiSeq 2000.

Raw Sequencing Data
Processing. The image data file obtained by sequencing was converted into raw reads using CASAVA base calling. Joint reads were removed from raw reads, and an N ratio > 2% was used to remove the low-quality reads and obtain the clean reads. All subsequent analyses were based on the clean reads.

De Novo
Assembly. Because P. lactiflora has no reference genome, we carried out de novo assembly using Trinity. First, the contigs were assembled using the overlapping area of the reads. Second, connected to the contig assembly sequence into the ends cannot be extended again   (unigene). The unigenes were compared with the Nr, NCBI nonredundant nucleotide sequences (Nt), Swiss-Prot, KEGG, and KOG databases to determine the direction of the unigenes.

Unigene Functional Annotation, GO Classification, and
Pathway Enrichment Analysis. The GO functional classification of the unigenes was performed using Blast2GO [20], and the pathway analyses were performed using the KEGG annotation service [21]. 4.6. Screening of Differentially Expressed Unigenes, GO Classification, and Pathway Analysis. Prior to differential gene expression analysis, for each sequenced library, the read counts were adjusted by edgeR program package through one scaling normalized factor. Differential expression analysis of two samples was performed using the DEGseq (2010) R package. P value was adjusted using q value. q value < 0.005 and │log 2 (fold change)│ > 1 were set as the threshold for significantly differential expression.
The analytic formula is In the formula, N is the number of genes with pathway annotations in all genes; n is the number of differentially expressed genes in N; M is the number of genes annotated for a particular pathway in all genes; m is the number of differentially expressed genes for a particular pathway.
GO enrichment analysis of the DEGs was implemented by the GOseq R package-based Wallenius noncentral hypergeometric distribution [22], which can adjust for gene length bias in DEGs.
KEGG [23] is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism, and the ecosystem, from molecular level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www. genome.jp/kegg/). We used KOBAS [24] software to test the statistical enrichment of differential expression genes in KEGG pathways.

SSR Locus Search and Analysis.
SSRs of the transcriptome were identified using MISA (http://pgrc.ipk-gatersleben.de/ misa/misa.html), and primer for each SSR was designed using Primer3 (http://primer3.sourceforge.net/releases.php). The standard used for a SSR was 10 single-nucleotide repeats, six dinucleotide repeats, and three, four, five, and six nucleotides repeated at least five times [25].   Table 3). Total RNA was extracted with the RNA prep pure Plant Kit and reverse transcribed into cDNA using the Pri-meScritH RT reagent kit with the gDNA Eraser (Perfect Real Time) (Takara Bio, Dalian, China). qRT-PCR was performed with a Bio-Rad CFX-96 Real-Time PCR System (Bio-Rad, Hercules, CA, USA) in a final volume of 20 μl containing 2 μl cDNA, 10 μl 26SYBR premix Ex taq™ (Takara Bio, Shiga, Japan), 0.4 μl each of 10 mM forward and reverse primers, and 7.2 μl RNase-free water. The thermal cycling conditions were 95°C for 5 min, 45 cycles at 95°C for 5 s for denaturation, and 56°C for 25 s for annealing and extension.