Characterization of the Complete Chloroplast Genome of Acer truncatum Bunge (Sapindales: Aceraceae): A New Woody Oil Tree Species Producing Nervonic Acid

Acer truncatum, which is a new woody oil tree species, is an important ornamental and medicinal plant in China. To assess the genetic diversity and relationships of A. truncatum, we analyzed its complete chloroplast (cp) genome sequence. The A. truncatum cp genome comprises 156,492 bp, with the large single-copy, small single-copy, and inverted repeat (IR) regions consisting of 86,010, 18,050, and 26,216 bp, respectively. The A. truncatum cp genome contains 112 unique functional genes (i.e., 4 rRNA, 30 tRNA, and 78 protein-coding genes) as well as 78 simple sequence repeats, 9 forward repeats, 1 reverse repeat, 5 palindromic repeats, and 7 tandem repeats. We analyzed the expansion/contraction of the IR regions in the cp genomes of six Acer species. A comparison of these cp genomes indicated the noncoding regions were more diverse than the coding regions. A phylogenetic analysis revealed that A. truncatum is closely related to A. miaotaiense. Moreover, a novel ycf4-cemA indel marker was developed for distinguishing several Acer species (i.e., A. buergerianum, A. truncatum, A. henryi, A. negundo, A. ginnala, and A. tonkinense). The results of the current study provide valuable information for future evolutionary studies and the molecular barcoding of Acer species.


Introduction
Acer truncatum Bunge, which is a member of the order Sapindales and the family Aceraceae, is a new versatile oil-producing woody tree that is widely distributed in northern China, Korea, and Japan, where it is a native species, but it has also been detected in Europe and North America [1]. This tree species represents a potential source of medicinal compounds. Many highly bioactive compounds have been extracted from Acer species, such as flavonoids, tannins, alkaloids, and terpenoids [2]. Acer truncatum seeds are processed to extract the seed oil, which was listed as a new food resource by the Ministry of Health of the People's Republic of China in 2011. Approximately 5-6% of the A. truncatum seed oil is nervonic acid (C24:1) [3]. Nervonic acid, which is a key component of brain nerve cells and tissues, promotes the repair and regeneration of nerve cells and damaged tissues, and has been detected in the seed oil of a number of plants. Thus, A. truncatum seed oil represents a novel plant resource with potential applications for treating human cerebral and neurological problems [4].
Chloroplasts (cps) have important functions related to some essential metabolic pathways, including photosynthesis and glycometabolism [5,6]. In plants, the DNA-replication mechanism associated with the cp genome is independent of the nuclear DNA-replication mechanism. Moreover, the cp genome is more highly conserved than the nuclear genome. In 1986, the liverwort (Marchantia polymorpha) cp genome became the first such genome to be described [7]. The subsequent emergence of rapid and cost-effective genome-sequencing technologies has led to more cp genomes being sequenced, with the resulting data deposited in the GenBank database.
These sequences indicate that the angiosperm cp genomes typically form a circular DNA molecule comprising 120-170 kb that encode 120-130 genes [8]. The circular cp genome structure consists of the following four segments: two inverted repeat (IR) regions separated by large single-copy (LSC) and small single-copy (SSC) regions [9,10]. However, genome size variations [11], rearrangement events [12][13][14], and gene losses [15] have been detected in some plant species. There is also considerable diversity in the IR size, possibly because the expansion and contraction of the IR regions have been very common events during the evolution of plant species, including those belonging to Fabaceae [16] and Poaceae [17]. The complete cp genome has been used in investigations of phylogenetic relationships, molecular markers, and evolution [18].
Insertions/deletions (indels) and single-nucleotide polymorphisms (SNPs) within the cp genome have been used to rapidly distinguish species [19][20][21]. Additionally, cp markers have been developed to identify closely related species, including buckwheat and the species of Solanum, Angelica, and other genera [20][21][22]. For example, Park et al. [23] used two indel markers (trnK-trnQ and ycf1-ndhF) to differentiate three Aconitum species. Additionally, indels in the trnL-F, trnG-trnS, and trnL introns have been used to analyze the molecular evolution of the Silene species cp genome [23]. Thus, indel and SNP cp markers are important for identifying species and investigating molecular evolution.
Several Aceraceae species have recently had their cp genome sequences published, including Acer morrisonense [24], Dipteronia sinensis and Dipteronia dyeriana [8], and Acer griseum [25]. Chen et al. [26] were the first to report the complete A. truncatum cp genome; however, they focused only on the genome composition and phylogenetic relationships. Thus, the A. truncatum cp genome was not comprehensively characterized. Consistent with the results of Chen et al. [26], we found that A. truncatum is closely related to A. miaotaiense. We additionally analyzed the repeats, the contraction and expansion of the IR regions, and the synonymous and nonsynonymous substitution rates. Highly divergent regions and potential indels were detected via a comparative analysis of six available cp genome sequences. Additionally, on the basis of the results of our comparative analysis of cp genomes, we developed the ycf4-cemA indel marker to distinguish six Acer species (i.e., A. buergerianum, A. truncatum, A. henryi, A. negundo, A. ginnala, and A. tonkinense). The data presented herein will enrich the genetic information available for the genus Acer, provide novel insights into A. truncatum evolution, and form an important theoretical basis for increasing the A. truncatum seed yield.

DNA Sequencing and Chloroplast Genome Assembly.
We collected fresh leaves from A. truncatum plants, which were obtained from the Acer germplasm collection of the Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu, China. The leaves were frozen in liquid nitrogen and stored at −80°C. Total DNA was extracted from the frozen leaves with the DNA Isolation Kit (Aidlab, China). We prepared 350-bp shotgun libraries, which were sequenced according to the double-terminal sequencing method of the Illumina HiSeq X™ Ten platform.
A total of 16.30 GB high-quality clean data (Q30 > 95.23%) were used for assembling the sequence as described by Ferrarini et al. [27]. The cp DNA reads were extracted with SMALT, using the A. buergerianum (GenBank accession NC_034744), A. miaotaiense (GenBank accession NC_030343), and A. morrisonense (GenBank accession KT970611) cp genomes as queries. The reads with 90% similarity were considered to be derived from the cp genome. The data were trimmed with Sickle (https://github.com/najoshi/sickle) (using q = 30 as the threshold for trimming and l = 50 as the threshold for keeping a read based on length) and assembled with the default parameters of ABySS [28]. Redundant contigs were removed with the CD-Hit program [29] (threshold of 100%) and the unique contigs were merged with the default parameters of Minimus2. The boundary regions of LSC/IRB, IRB/SSC, SSC/IRA, and IRA/LSC of the completed cp genomes were validated with PCR-based sequencing. Details regarding the primers are provided in Supplementary Table S1.
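The effect of the two trimming thresholds can be illustrated with a minimal Python sketch. It is a simplified stand-in and not Sickle's actual algorithm: the function names, the window size of 5, and the Phred+33 quality encoding are assumptions made for illustration only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(seq, qual, q=30, min_len=50, window=5):
    """Simplified 3'-end trimming (illustrative only, not Sickle itself).

    The read is cut at the start of the first window whose mean quality
    drops below q; the trimmed read is kept only if it is at least
    min_len bases long, otherwise None is returned.
    """
    scores = phred_scores(qual)
    if not scores:
        return None
    cut = len(seq)
    for start in range(0, max(len(scores) - window + 1, 1)):
        win = scores[start:start + window]
        if sum(win) / len(win) < q:
            cut = start
            break
    trimmed = seq[:cut]
    return trimmed if len(trimmed) >= min_len else None


if __name__ == "__main__":
    # Toy read: 55 high-quality bases followed by 5 low-quality bases.
    read = "A" * 60
    quality = "I" * 55 + "#" * 5
    print(trim_read(read, quality))
```

With q = 30 and l = 50 as in the text, a read survives only if at least 50 bases remain after the low-quality tail has been removed.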

Annotation and Comparative Analysis.
The A. truncatum cp genome was annotated with DOGMA (http://dogma.ccbb.utexas.edu/). The start and stop codons were coupled manually. All tRNA genes were identified with the default settings of tRNAscan-SE 1.21 [30]. The OGDRAW program was used to visualize the circular A. truncatum cp genome map [31]. Codon usage was analyzed with MEGA 6.0 [32]. The cp genomes of six Acer species (A. truncatum, A. buergerianum, A. davidii, A. griseum, A. miaotaiense, and A. morrisonense) were compared with mVISTA [33,34], with the annotated A. morrisonense sequence used as the reference.

Analysis of Synonymous and Nonsynonymous Substitution Rates.
The A. truncatum, Citrus platymamma, Dimocarpus longan, and Spondias mombin cp genome sequences were compared to determine the synonymous (Ks) and nonsynonymous (Ka) substitution rates. The protein-coding exons were separately aligned with MEGA 6.0. The Ka and Ks substitution rates were estimated with DnaSP [38].
The A. truncatum cp genome and 22 other complete cp genome sequences (Supplementary Table S5) were used for elucidating the evolutionary status of A. truncatum, with Euonymus hamiltonianus (order Celastrales) serving as the outgroup. The 64 single-copy orthologous genes common among the 23 analyzed genomes were aligned with the default parameters of ClustalW 2.0 [39]. The maximum likelihood (ML) analyses of phylogenetic relationships were completed with RAxML using the GTRGAMMA model [40].
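The substitution rates estimated with DnaSP are usually read through their ratio, Ka/Ks: values below 1 are taken to indicate purifying selection, values above 1 positive selection, and values near 1 neutral evolution. The following Python sketch shows that interpretation only. It does not reproduce the DnaSP estimation, and the function names and numerical inputs are hypothetical.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_regime(ratio):
    """Map a Ka/Ks ratio onto the conventional interpretation."""
    if ratio is None:
        return "undefined"
    if ratio < 1.0:
        return "purifying"
    if ratio > 1.0:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical per-gene rates, for illustration only.
    toy_rates = {"geneA": (0.02, 0.10), "geneB": (0.15, 0.10), "geneC": (0.01, 0.0)}
    for gene, (ka, ks) in toy_rates.items():
        print(gene, selection_regime(ka_ks_ratio(ka, ks)))
```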

Estimation of the Divergence Time.
For the divergence time, we first removed ambiguously aligned sites in the 23-genome data set using GBLOCKS v.0.91b [41] with the following parameters: minimum sequences per conserved position, 15; minimum sequences per flank position, 20; maximum number of contiguous nonconserved positions, 8; minimum block length, 10; allowed gap positions, none. Then, the divergence time was estimated with the MCMCTree program of PAML (version 4.9a) [42], with the following parameters: burnin 100000, sampfreq 200, and nsample 10000. Moreover, E. hamiltonianus was constrained to be the outgroup, and the root age was constrained by the divergence time of E. hamiltonianus from A. truncatum (98-117 million years ago) (http://www.timetree.org/).
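The parameter values listed above can be sanity-checked with a few lines of Python. The sketch rests on two assumptions drawn from our reading of the program documentation rather than from the text: that an MCMCTree run comprises burnin + sampfreq × nsample iterations, and that Gblocks expects the conserved-position minimum to exceed half the number of sequences and the flank-position minimum to be at least as large. The function names are hypothetical.

```python
def mcmc_iterations(burnin, sampfreq, nsample):
    """Total chain length, assuming burn-in plus sampfreq iterations per sample."""
    return burnin + sampfreq * nsample


def gblocks_settings_valid(n_seqs, min_conserved, min_flank):
    """Check the assumed Gblocks constraints:
    n_seqs / 2 < min_conserved <= min_flank <= n_seqs."""
    return n_seqs / 2 < min_conserved <= min_flank <= n_seqs


if __name__ == "__main__":
    # Values reported in the text: 23 genomes, 15 and 20 sequences.
    print(gblocks_settings_valid(23, 15, 20))
    print(mcmc_iterations(burnin=100000, sampfreq=200, nsample=10000))
```

Under these assumptions the chain would run for 2,100,000 iterations, of which 10,000 samples are retained.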

Development and Validation of the ycf4-cemA Indel Marker.
The indel regions were selected based on the results of a similarity search with mVISTA. Additionally, primers were designed with Primer 5. The PCR amplification was performed as described by Ma et al. [43]. To confirm the accuracy of the PCR product sizes, three samples per species were sequenced by the General Biology Company (Nanjing, Jiangsu, China).

Features of the A. truncatum Chloroplast Genome.
The A. truncatum genome sequence was submitted to the GenBank database (accession number MH638284). Chen et al. [26] were the first to describe the A. truncatum cp genomic features. Specifically, they reported that the A. truncatum cp genome comprises 156,262 bp, with an overall GC content of 37.9%. In the current study, we revealed similar structural features, with the A. truncatum cp genome consisting of 156,492 bp and forming a typical quadripartite structure (Figure 1 and Table 1). The LSC region (86,010 bp) and SSC region (18,050 bp) were separated by a pair of inverted repeats (IRA and IRB; 26,216 bp each). The GC content may be an important factor for assessing species similarity. The GC content of the complete A. truncatum cp genome was 37.90%, which was the same as the result of Chen et al. [26]. The GC contents of the LSC, SSC, and IR regions were 36.10%, 32.10%, and 42.80%, respectively, which are similar to the GC contents reported for other Acer species (Table 1) [24,25]. The IR regions had the highest GC content across the complete cp genome, which reflects the GC-rich rRNA and tRNA genes they contain and is a phenomenon that is very common among plant species [44,45].
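The quadripartite structure can be checked arithmetically: one LSC, one SSC, and two IR copies must sum to the total genome length (86,010 + 18,050 + 2 × 26,216 = 156,492 bp). The Python sketch below derives region coordinates from the reported lengths and computes GC content for a sequence string. The region order (LSC, IRb, SSC, IRa) follows the junctions named in the assembly section, whereas starting the coordinates at the first base of the LSC and the function names are assumptions.

```python
# Region lengths reported in the text, in the assumed order LSC, IRb, SSC, IRa.
REGION_LENGTHS = [("LSC", 86010), ("IRb", 26216), ("SSC", 18050), ("IRa", 26216)]


def region_coordinates(regions):
    """Return 1-based inclusive (start, end) coordinates for each region."""
    coords = {}
    start = 1
    for name, length in regions:
        coords[name] = (start, start + length - 1)
        start += length
    return coords


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    coords = region_coordinates(REGION_LENGTHS)
    for name, (start, end) in coords.items():
        print(name, start, end)
    print(gc_content("ATGCGC"))  # toy sequence, not genome data
```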
We detected 134 genes in the A. truncatum cp genome, including 20 duplicated genes in the IR regions, 112 unique functional genes, and 2 pseudogenes. The 112 functional genes comprised 4 rRNA genes, 30 tRNA genes, and 78 protein-coding genes (Table 2). Among the 134 genes in the cp genome, 17 genes contained introns, of which three genes (ycf3, clpP, and rps12) contained two introns and the remaining genes contained one intron (i.e., eight protein-coding genes and six tRNA genes) (Table 2). The rps12 gene was trans-spliced, with its 3′ exon duplicated in the IRs and its 5′ exon located in the LSC region. Interestingly, trnK-UUU had the largest intron (2,487 bp) because of the presence of the matK gene. The infA and ycf1 genes were designated as pseudogenes. The infA gene contained several internal stop codons and the ycf1 gene was located at the boundary region of IR and SSC (Figure 1).
In this study, we assessed the relative synonymous codon usage (RSCU), which represents the nonuniform synonymous codon usage in coding sequences. Generally, RSCU values >1.00 and <1.00 indicate the codon is used more and less frequently than expected, respectively [46]. The codon usage frequency in the A. truncatum cp genome was estimated based on the protein-coding gene sequences (Table 3). The protein-coding genes comprised 77,796 bp encoding 25,932 codons. Leucine and cysteine were the most and least prevalent amino acids encoded by the codons, accounting for 10.82% and 1.17% of the codons, respectively. With the exception of the methionine and tryptophan codons, most of the amino acid codons had sequence biases [e.g., UUA (RSCU = 1.80) for leucine, UCU (RSCU = 1.56) for serine, and UAU (RSCU = 1.60) for tyrosine] (Table 3). Codon usage was generally biased toward A or T (U) with high RSCU values, which is a phenomenon that is very common among the cp genomes of land plant species [47,48].
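RSCU is commonly computed as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, that is, the codon count multiplied by the number of synonymous codons and divided by the total count for the amino acid. The Python sketch below applies this definition with the standard genetic code. It is an illustration rather than the MEGA implementation, the function name is hypothetical, and the input is a toy sequence.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for a coding sequence read in frame from position 0.

    Stop codons are treated as one synonymous family; amino acids that do
    not occur in the sequence are omitted from the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    toy_cds = "TTA" * 3 + "CTG" + "ATG"  # toy sequence, not genome data
    for codon, value in sorted(rscu(toy_cds).items()):
        print(codon, round(value, 2))
```

Methionine and tryptophan each have a single codon, so their RSCU is always 1.00, which is why they are excluded from the bias statement in the text.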

Analysis of the Repeats in the A. truncatum Chloroplast Genome.
An analysis of the repeats in the A. truncatum cp genome revealed 22 long repeats (i.e., one reverse, nine forward, five palindromic, and seven tandem repeats). The only reverse repeat was 35 bp long. The forward and palindromic repeats were mainly longer than 30 bp (Supplementary Table S2 and Figure 2), whereas the tandem repeats were mainly 13-28 bp long (Supplementary Table S3). Most repeats were located in the intergenic spacers, with the rest located in protein-coding regions and introns. Short dispersed repeats are important for promoting cp genome rearrangements [49].
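Repeats built from a short motif copied back to back, such as the simple sequence repeats (microsatellites) counted in this study, can be located with a simple pattern search. The Python sketch below is one way to do so. The minimum repeat numbers (10, 5, 4, 3, 3, and 3 copies for mono- to hexanucleotide motifs) are commonly used defaults and an assumption here, because the thresholds applied in this study are not stated in this section; the function name is hypothetical.

```python
import re

# Assumed minimum copy numbers per motif length (not taken from the text).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, copies) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len in sorted(min_repeats):
        copies = min_repeats[unit_len]
        # A motif of unit_len bases followed by at least copies - 1 more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start() + 1, motif, len(match.group(1)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "CGCC" + "AT" * 6 + "GGC"  # toy sequence
    print(find_ssrs(toy))
```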
Simple sequence repeats (SSRs) are useful molecular markers for studying genetic diversity and identifying species [43]. In the current study, we detected 78 perfect microsatellites in the A. truncatum cp genome, including 67, 6, 1, and 4 mono-, di-, tri-, and tetranucleotide repeats, respectively; no hexanucleotide repeats were identified (Figure 3(a) and Supplementary Table S4). Most of these repeats were located in noncoding regions. Additionally, A or T accounted for 94.03% of the mononucleotide repeats, whereas all of the dinucleotide repeats were AT. An examination of the distribution of the SSRs in the A. truncatum cp genome indicated that 73.08%, 21.79%, 3.85%, and 1.28% of the SSRs were in the intergenic spacer, protein-coding, intron, and tRNA regions, respectively (Figure 3(b)). Moreover, our data suggest that the A. truncatum cp genome contains fewer SSRs than the A. miaotaiense cp genome [24]. However, in both of these Acer species, the SSRs generally comprise A or T, which contributes to the A/T richness of their cp genomes. These results represent useful information regarding the cp SSR markers that can be applied to investigate the genetic diversity of A. truncatum as well as the relationships among species. These markers may also be relevant for selecting germplasms with high nervonic acid contents.

Comparative Analysis of Six Acer Chloroplast Genomes.
A comparative analysis of cp genomes is important for elucidating phylogenetic relationships and identifying species [52,53]. With the annotated A. morrisonense cp genome as the reference, the overall sequence identities among the six analyzed Acer cp genomes were determined and visualized with mVISTA (Figure 5). The comparative cp genome analysis indicated that the noncoding regions were more diverse than the coding regions, which is consistent with the findings in other plant species [54]. The IR regions were more conserved than the LSC and SSC regions, and four rRNA genes were essentially identical in the six Acer species. The intergenic spacers were relatively diverse (e.g., trnH-psbA, matK-rps16, petN-psbM, petA-psbJ, and ycf4-cemA). The most diverse coding regions were the matK, rps2, rpoC2, rpoB, rps19, and ycf1 sequences. Similar results were observed in previous studies [55,56]. The highly diverse regions identified in the current study may be relevant for developing markers or genetic barcodes useful for exploring the genetic differentiation among Aceraceae species.

Analysis of Synonymous and Nonsynonymous Substitution Rates.
In a previous study, the nonsynonymous and synonymous substitution ratio (Ka/Ks) was used to evaluate the evolutionary forces on some genes [49]. In this study, the Ka/Ks ratio was determined for 78 protein-coding genes following the comparison of the A. truncatum cp genome with the cp genomes of C. platymamma, D. longan, and S. mombin (Figure 6). Nearly all of the Ka/Ks ratios were less than 1.0, implying most of the protein-coding genes were under purifying selection during evolution. However, the Ka/Ks ratio of eight genes (atpF, matK, psbD, rps16, rps18, rpl36, ndhB, and ycf1) was between 0.5 and 1.0. Moreover, the Ka/Ks ratio was greater than 1 for psaI, clpP, rps4, rpl22, and ycf2, which indicated these genes were under positive selection during evolution. High Ka/Ks ratios have been reported for some genes, including ndhC, rps16, and ycf2 [49]. These results clearly indicate that cp genes in different plant species may be subjected to diverse selection pressures.

Contraction and Expansion of the IR Regions.
The number and order of genes were highly conserved among the cp genomes of six Acer species. However, structural changes were detected in the IR boundaries (Figure 4). These changes represent a common evolutionary event and a major factor influencing the size differences among the cp genomes, implying they have an important evolutionary role in plants [50,51]. We also compared the boundary regions of IR/LSC and IR/SSC in the cp genomes of A. buergerianum, A. davidii, A. griseum, A. miaotaiense, A. morrisonense, and A. truncatum. In the A. buergerianum, A. miaotaiense, and A. truncatum cp genomes, the rps19, ycf1, and rpl2 genes were detected at the junctions of the LSC/IRb, SSC/IR, and LSC/IRa boundary regions, respectively (Figure 4). However, the rps19 gene was located entirely in the LSC region in the A. miaotaiense cp genome, but not in the other cp genomes. Additionally, in the A. buergerianum and A. truncatum cp genomes, the ycf1 gene was located in the SSC/IRa border regions, which resulted in a pseudogene in the IRb region. The cp genomes of the other three species (i.e., A. davidii, A. griseum, and A. morrisonense) exhibited a similar trend regarding the IR contraction and expansion. The rpl22 and ndhF genes were located in the LSC/IRb and SSC/IRb regions, respectively. The rpl22 gene extended 376 bp into the IRb region. In all cp genomes, the trnH gene was located in the LSC region. Overall, we detected the contraction and expansion of the IR regions in all six analyzed Acer cp genomes.

Table 2 (fragment). Gene name; Number.
1. Photosystem I: psaA, B, C, I, J; 5
2. ATP synthase: atpA, B, E, F^a, H, I; 6
3. Photosystem II: psbA, B, C, D, E, F, H, I [...]
Pseudogenes: infA, ycf1; 2
Total: 134

Phylogenetic Analysis.
Chloroplast genome sequences are valuable genomic resources for elucidating evolutionary history and have been widely applied in phylogenetic studies [55][56][57][58][59]. In the current study, to determine the phylogenetic position of A. truncatum, 22 complete cp genome sequences of Sapindales species were obtained from the GenBank database (Supplementary Table S5). A set of 64 single-copy orthologous genes present in the 23 analyzed cp genomes was used to construct phylogenetic trees, with E. hamiltonianus serving as the outgroup. All Aceraceae species, including Acer and Dipteronia species, were grouped in one clade, which was consistent with the results of earlier investigations [25,60,61]. In a previous study, Chen et al. [25] reported that A. truncatum and A. miaotaiense are closely related. In our study, we obtained a similar phylogenetic topology; the ML trees also strongly supported the close phylogenetic relationship between A. truncatum and A. miaotaiense among the Aceraceae species, with 100% bootstrap support (Figure 7). Overall, the results of our analysis of cp genomes provide a valuable foundation for future analyses of the phylogenetic affinities of Acer species.
[...] node (107.2 Mya), which is the divergence time of E. hamiltonianus from A. truncatum (98-117 million years ago) (http://www.timetree.org/). The divergence dates for some of the observed clades, together with the upper and lower bounds of the 95% highest posterior density intervals, are shown in Figure 8. According to the MCMCTree time estimates, the estimated divergence dates for Burseraceae and Anacardiaceae and for Meliaceae and Simaroubaceae were 75.9 (52.9-95.8) Mya and 73.2 (53.9-91.9) Mya, respectively. These results are in agreement with a recent study [62]. Additionally, the Sapindaceae [...] Moreover, a recent divergence event between A. truncatum and A. miaotaiense was estimated at around 1.6 (0.7-3.6) Mya. The results of our study will provide insights into the evolution of Aceraceae species.

Development of the ycf4-cemA Indel Marker.
Because indel regions are relatively easy to detect, they are often used to develop markers for identifying species [63]. In the current study, the sequence variability of the large indel regions, which was revealed by sequence alignments with mVISTA, was used to develop markers. A comparison with the A. truncatum cp genome sequence detected a 91-bp deletion in the ycf4-cemA region of the A. buergerianum cp genome. The following six Acer species were selected to characterize the ycf4-cemA sequence: A. tonkinense, A. ginnala, A. negundo, A. henryi, A. truncatum, and A. buergerianum.
To develop indel markers, sequence-specific primers were designed to anneal to the conserved regions flanking ycf4 and cemA (Table 4). The predicted products were successfully amplified with the ycf4-cemA-F/R primers for all 24 tested samples (Figure 9(a)). The length of the amplified ycf4-cemA sequence was similar for A. tonkinense, A. ginnala, A. negundo, A. henryi, and A. truncatum. In contrast, the corresponding sequence in A. buergerianum was shorter because of the 91-bp deletion (Figures 9(a) and 9(b)). As presented in Figure 9 [...]
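The way an indel such as the 91-bp ycf4-cemA deletion translates into a PCR product size difference can be sketched in Python. The example scans a pairwise alignment for gap runs and adjusts a reference amplicon length accordingly. The function names, the toy alignment, and the 500-bp reference amplicon are hypothetical and are not taken from the study.

```python
def find_indels(ref_aln, query_aln, gap="-"):
    """Return (kind, 0-based alignment column, length) for each gap run.

    A gap in the query is a deletion relative to the reference; a gap in
    the reference is an insertion in the query.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    run_kind, run_start = None, None
    for i, (r, q) in enumerate(zip(ref_aln, query_aln)):
        if q == gap and r != gap:
            kind = "deletion"
        elif r == gap and q != gap:
            kind = "insertion"
        else:
            kind = None
        if kind != run_kind:
            if run_kind is not None:
                indels.append((run_kind, run_start, i - run_start))
            run_kind, run_start = kind, i
    if run_kind is not None:
        indels.append((run_kind, run_start, len(ref_aln) - run_start))
    return indels


def amplicon_length(ref_length, indels):
    """Expected product length in the query given the reference product length."""
    delta = sum(n if kind == "insertion" else -n for kind, _, n in indels)
    return ref_length + delta


if __name__ == "__main__":
    print(find_indels("ACGTACGTAC", "ACG---GTAC"))    # toy alignment
    print(amplicon_length(500, [("deletion", 120, 91)]))  # hypothetical 500-bp product
```

A length difference of this size is large enough to be resolved on a standard agarose gel, which is what makes such an indel usable as a marker.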

Supplementary Materials
Supplementary Table S1: PCR-based sequence validation of junctions between the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IRa and IRb) regions of the A. truncatum chloroplast genome. Supplementary