The Short ITS2 Sequence Serves as an Efficient Taxonomic Sequence Tag in Comparison with the Full-Length ITS

An ideal DNA barcoding region should be short enough to be amplified from degraded DNA. In this paper, we discuss the possibility of using a short nuclear DNA sequence as a barcode to identify a wide range of medicinal plant species. First, the PCR and sequencing success rates of ITS and ITS2 were evaluated using only materials from dry medicinal products and herbarium voucher specimens, including some samples collected as long as 90 years ago. With a single primer pair, the PCR and sequencing success rate was 91% for ITS2 but only 23% for ITS. Second, 12861 ITS and ITS2 plant sequences were used to compare the identification efficiency of the two regions. Four identification criteria (BLAST, inter- and intraspecific divergence, Wilcoxon signed rank tests, and TaxonDNA) were evaluated. Our results support the hypothesis that ITS2 can be used as a minibarcode to effectively identify species in a wide variety of specimens and medicinal materials.


Introduction
1.1. DNA Barcoding of Degraded DNA Materials. DNA barcoding takes advantage of short standard sequences to discover and identify species [1]. An ideal DNA barcode should be short enough to be amplified from archival specimens using universal primers. The term "minimalist barcode" was first defined by Herbert as a tool to overcome the low PCR efficiency of cytochrome c oxidase subunit 1 (CO1) in archival animal specimens in museums, and the possibility of identifying animal specimens using a region of approximately 200 bp was discussed. The results of that study showed that minibarcodes can be recovered from different types of specimens, including museum samples, trace tissue samples with degraded DNA, and other specimens from which a full-length barcode (CO1) cannot be obtained [2]. The amplification of DNA from herbarium specimens is also important for barcoding studies because it is often necessary to confirm the species identification of fresh specimens by comparing their sequences with those of older museum specimens [3]. Additionally, most of the medicinal materials available in the market are dry and have been stored for long periods; thus, it is very difficult to amplify long DNA regions from some of these materials, which prevents the use of DNA barcodes for herb identification.

The Trend of Core Plant DNA Barcodes. The Plant Working Group of the Consortium for the Barcode of Life (CBOL) recommended the combination of matK and rbcL as a barcode for land plants [4], and the internal transcribed spacer (ITS)/internal transcribed spacer 2 (ITS2) was proposed as a supplemental marker for further study. The ITS sequence contains enough variable sites for species identification in many samples [5-9], but ITS could not be amplified from approximately 12% of herbarium samples [3]. In addition, ITS1 is too variable to guarantee reliable alignments and contains variable indels (insertions/deletions) at this taxonomic level, and multiple functional copies exist in many taxa. Thus, ITS was excluded as a universal land plant barcode in the earlier stages. In contrast, ITS2 is considered to have evolved in concert, which homogenizes all copies of this region throughout the genome, so in most organisms ITS2 is treated as a single locus. Therefore, the ITS2 region might be a suitable marker for taxonomic classification [10-12]. Recently, ITS2 has been suggested as a useful barcode for medicinal plants [13-17], as a universal DNA barcode to identify plants, and as a complementary locus to CO1 for identifying animals [18]. The China Plant Barcode of Life Group considered ITS2 to be a useful alternative to ITS because it is more easily amplified and sequenced [19]. In addition, the secondary structure of ITS2 was shown to be an efficient tool for biological species identification [20,21].
Here, we demonstrated the effectiveness of ITS2 as a minibarcode in comparison with the full-length ITS for the identification of a wide range of archived plant species. An initial set of 100 medicinal samples from museum specimens and the herb market was tested to determine the PCR and sequencing efficiencies of ITS and ITS2. A second set of 12861 sequences, representing 8313 species, was retrieved from GenBank to compare the identification abilities of ITS and ITS2. This work aims to provide an evaluation of ITS2 as a minibarcode for large sample sets.

Plant Material. The initial set of 100 museum medicinal specimens and herbal products from 92 species representing 5 orders (see Table 1 of the Supplementary Material available online at http://dx.doi.org/10.1155/2013/741476) was collected from the Buozhou herbal market and from specimens at the Institute of Medicinal Plant Development, some of which had been collected as long as 90 years ago, to test the efficiency of PCR and sequencing. All the samples were authenticated at the species level by Professor Yulin Lin (Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences). A second set of sequences for the identification efficiency analysis was obtained from the GenBank nucleotide sequence database. We carried out a bioinformatics analysis using all ITS sequences in GenBank matching the search pattern "18S ribosomal RNA gene; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, and 28S ribosomal RNA gene." Partial sequences, fungal sequences, and sequences shorter than 100 bp were removed. A flowchart is shown in Figure 1. The complete ITS2 regions were annotated using a hidden Markov model (HMM) [22], and the full-length ITS regions were annotated using the ITS plant model; both annotations rely on highly similar, correctly annotated reference sequences in the public database. Ultimately, 12861 sequences representing 8313 species from 1699 genera were obtained (GenBank accession numbers are listed in Table 2S) and used to analyze the identification efficiencies of ITS and ITS2.

DNA Extraction, PCR Amplification, and Sequencing.
Total genomic DNA was extracted from specimens using the Plant Genomic DNA Kit (Tiangen Biotech Beijing Co., Ltd., China) according to the manufacturer's instructions. The primer sequences for ITS2 were described by Chen et al. [13]. ITS was amplified using the primers ITS5 and ITS4 [23]. The PCR and sequencing conditions used for the two regions (ITS and ITS2) were based on the methods described by Kress et al. and Chen et al. [1,13,24,25].

Analysis Method.
Six parameters were used to characterize the interspecific and intraspecific divergences, according to a previously described method [13]. Three of the parameters were used to estimate the interspecific variability: average interspecific distance, average theta prime, and smallest interspecific distance. The other three parameters were used to evaluate the intraspecific divergence: average intraspecific difference, theta, and average coalescent depth. The Wilcoxon signed rank test was used as described previously [13,26,27]. The Basic Local Alignment Search Tool (BLAST1) was used to identify the species [13]. The TaxonDNA software was used to calculate the identification efficiency [28,29].
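The six divergence parameters can be sketched in Python as follows. This is a hedged illustration, not the implementation of [13]: it assumes Kimura two-parameter (K2P) distances on pre-aligned sequences and assumes the usual definitions (theta prime as the mean congeneric interspecific distance per genus averaged over genera, smallest interspecific distance as the per-genus minimum averaged over genera, theta as the mean conspecific distance per species averaged over species, and coalescent depth as the per-species maximum). The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    Saturated pairs can make the log argument non-positive (ValueError).
    """
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def divergence_parameters(samples):
    """samples: list of (genus, species, aligned_sequence) tuples.

    Needs at least one genus with two or more species and one species with
    two or more sequences; otherwise statistics.mean raises an error.
    """
    inter_by_genus = defaultdict(list)    # congeneric, heterospecific pairs
    intra_by_species = defaultdict(list)  # conspecific pairs
    for (g1, s1, seq1), (g2, s2, seq2) in combinations(samples, 2):
        d = k2p_distance(seq1, seq2)
        if s1 == s2:
            intra_by_species[s1].append(d)
        elif g1 == g2:
            inter_by_genus[g1].append(d)
    inter_all = [d for ds in inter_by_genus.values() for d in ds]
    intra_all = [d for ds in intra_by_species.values() for d in ds]
    return {
        # interspecific parameters
        "average_interspecific_distance": mean(inter_all),
        "average_theta_prime": mean(mean(ds) for ds in inter_by_genus.values()),
        "smallest_interspecific_distance": mean(min(ds) for ds in inter_by_genus.values()),
        # intraspecific parameters
        "average_intraspecific_difference": mean(intra_all),
        "theta": mean(mean(ds) for ds in intra_by_species.values()),
        "average_coalescent_depth": mean(max(ds) for ds in intra_by_species.values()),
    }
```

Averaging per species (theta) or per genus (theta prime) rather than over all pairs keeps heavily sampled taxa from dominating the summary, which is why both kinds of average appear among the six parameters.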

PCR and Universal Primers.
To evaluate the efficiency of PCR and sequencing, 100 medicinal samples from the herbal market and museum specimens, including 91 species from 5 orders, were tested; 16% of the samples were obtained from the herb market, and the remaining 84% were obtained from the Institute of Medicinal Plant Development. The ITS primer pair yielded a recovery rate of only 23%, compared with 91% for ITS2. All sequences were submitted to GenBank (the accession numbers are listed in Table 1S, Supplementary Material). The small size of ITS2 facilitates its amplification by universal primers, even in samples with partially degraded DNA.

Comparison of Inter- and Intraspecific Divergences.
Comparison of inter- and intraspecific sequence variation is an important aspect of barcoding identification. For the 12861 ITS and ITS2 sequences, which represented 8313 species from 1699 genera, the average lengths of ITS and ITS2 were 634 bp and 233 bp, respectively. The comparison of the inter- and intraspecific genetic distances revealed that the ITS2 region exhibited a higher interspecific divergence according to all three interspecific parameters (Table 1). Another advantage of ITS2 is that its conserved secondary structure is associated with relatively low intraspecific variation. The combination of a conserved secondary structure with a variable sequence appears to be a major benefit of using ITS2 [30].
The differences in percent sequence divergence between loci were tested using the Wilcoxon signed rank test. The results showed that ITS2 was the more variable barcode (Table 2). ITS contains the conserved 5.8S region, which decreases its comparative divergence. Based on these results, ITS2 shows sufficient variation to differentiate plants. To estimate the identification efficiency per genus, genera that contain at least 20 species were selected and analyzed individually (Table 4). In 85% (68/80) of these genera, the success rates of ITS and ITS2 are identical. ITS had a higher identification efficiency than ITS2 in the remaining 12 genera: Gunnera, Luzula, Strobilanthes, Nepeta, Dionysia, Adenia, Clidemia, Sedum, Indigofera, Kalanchoe, Pilea, and Melampodium. Of the 603 genera that contain at least 3 samples, ITS2 and ITS had the same identification efficiency in 394 genera (65.3%), and ITS and ITS2 both achieved 100% identification efficiency at the species level in 345 genera (57.2%) (Table 3S).
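A paired Wilcoxon signed rank test of the kind described above can be sketched in plain Python. This is a hedged sketch: the text does not say how the pairs were formed or which software was used, so here it is simply assumed that `x[i]` and `y[i]` are the same comparison (for example, one congeneric species pair) scored at ITS and at ITS2, and a normal approximation is used for the p-value. In practice `scipy.stats.wilcoxon` would be the usual choice; the function below is a hypothetical, dependency-free stand-in.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Two-sided Wilcoxon signed rank test using the normal approximation.

    x[i] and y[i] are paired observations. Zero differences are dropped,
    tied absolute differences receive average ranks, and no tie correction
    is applied to the variance (a simplification). Returns (W+, W-, z, p).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, z, p
```

With this orientation, a positive z means the second locus (here, hypothetically, ITS2) tends to show the larger divergence. The normal approximation is reasonable for large numbers of pairs such as those available here, but not for very small samples.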
TaxonDNA Identification. We also used TaxonDNA to assess the accuracy of species identification based on ITS and ITS2. TaxonDNA is an alignment-based parametric clustering program that determines the closest match of a sequence by comparing it with all other sequences in the aligned data set. If the best-matching sequences were from the same species as the query, the identification was considered successful, whereas mismatched names were counted as failures. Cases with several equally good best matches from different species were considered ambiguous [29]. In this study, the successful identification rates under "best match" were 67.88% and 60% for ITS and ITS2, respectively. The ambiguous identification rates of ITS and ITS2 were 14.9% and 0%, respectively, and the misidentification rates were 17.2% and 40%, respectively. The dataset contained 8607 sequences, including duplicates.
We used TaxonDNA to set the threshold value. All sequences without a match within the 97% similarity threshold remained unidentified. If the compared sample names were identical, the identification was considered correct; if the sequence names were mismatched, the identification was considered a failure. When several equally good best matches belonging to at least two species were found, the identification was considered ambiguous [29,31]. The successful identification rates under "best close match" were 62.53% and 32% for ITS and ITS2, respectively. The ambiguous identification rates of ITS and ITS2 were 14.0% and 0%, respectively. The misidentification rates of ITS and ITS2 were 7.28% and 0%, respectively. The remaining samples were considered unidentified because they had no matches within the threshold. The nonmatch ratios of ITS and ITS2 were 16.2% and 68%, respectively (Table 5). ITS provided higher successful identification rates than ITS2 under both criteria and a lower misidentification rate under "best match," but ITS2 provided a lower ambiguous identification rate (0% versus 14.9% and 14.0% for ITS under "best match" and "best close match," respectively).
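The "best match" and "best close match" rules described in the two paragraphs above can be sketched in Python. This is not TaxonDNA's code: it assumes uncorrected pairwise distances on pre-aligned sequences, and it assumes that the 97% similarity threshold corresponds to a distance threshold of 0.03. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def classify(i: int, data: List[Tuple[str, str]], threshold: Optional[float] = None) -> str:
    """Classify query data[i] = (species, sequence) against all other sequences.

    threshold=None  -> "best match"
    threshold=float -> "best close match": if even the best match is farther
                       than the threshold, the query stays unidentified.
    """
    query_species, query_seq = data[i]
    hits = [(p_distance(query_seq, seq), species)
            for j, (species, seq) in enumerate(data) if j != i]
    best = min(d for d, _ in hits)
    if threshold is not None and best > threshold:
        return "no match"
    best_species = {species for d, species in hits if d == best}
    if len(best_species) > 1:
        return "ambiguous"  # equally good best matches from different species
    return "correct" if query_species in best_species else "incorrect"


def summarize(data: List[Tuple[str, str]], threshold: Optional[float] = None) -> Dict[str, float]:
    """Fraction of queries falling into each outcome category."""
    counts = Counter(classify(i, data, threshold) for i in range(len(data)))
    return {k: counts[k] / len(data) for k in ("correct", "ambiguous", "incorrect", "no match")}
```

Under this sketch, `summarize(data)` gives "best match" rates and `summarize(data, threshold=0.03)` gives "best close match" rates. Species represented by a single sequence can never be scored as correct, because no conspecific sequence is available to match.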

PCR and Sequencing Success Rates.
Many museum specimens are very useful for DNA barcoding studies. However, high-quality DNA can be difficult to obtain from these specimens, making PCR amplification and sequencing inefficient. In this study, we recovered short ITS2 sequences from more than 90% of the herbal specimens representing 5 orders, whereas the recovery rate for ITS with a single primer set was only 23%. One explanation for this discrepancy is that ITS is very long relative to ITS2, and ITS requires a variety of PCR conditions and additives for successful amplification [32]. Another potential explanation is that intact DNA was difficult to extract from these samples because of the degradation that occurred in the museum specimens during the long storage period and in the market herbs during harvesting, processing, and storage. In contrast, the ITS2 region can be easily amplified and sequenced with conserved primers. Because of its relatively short length, the ITS2 minibarcode was amplified with greater success than the full-length ITS sequence in almost all groups.
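The length argument can be made concrete with a purely illustrative toy model that is not part of this study. Assume that strand breaks occur independently at each nucleotide with a small probability b, so that a template of length L survives intact with probability (1 - b)^L. Taking the average region lengths reported above (634 bp for ITS and 233 bp for ITS2; real amplicons also include primer-binding flanks) and a hypothetical b = 0.005:

```latex
P_{\text{intact}}(L) = (1-b)^{L} \approx e^{-bL}

\frac{P_{\text{intact}}(233)}{P_{\text{intact}}(634)} = (1-b)^{233-634} = (1-b)^{-401}
\approx e^{0.005 \times 401} \approx 7.4
```

Under this model, an intact ITS2 template is several times more likely to survive than an intact full-length ITS template, and because the ratio grows as e^{401b}, the advantage of the shorter region increases with the degree of degradation. The value of b is hypothetical, and the numbers are not fitted to the recovery rates observed here.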

Identification Efficiency of ITS and ITS2.
To determine whether barcode gaps were present in this study, the relationships between the inter- and intraspecific divergences were compared for each species. For the 12861 samples, ITS and ITS2 identified 97.5% and 93.8% of genera, respectively, by the BLAST method. The full-length ITS identified approximately 89.2% of the species, and the mini-DNA barcode ITS2 successfully identified approximately 79.2% of the species, which is higher than the rate reported for the CBOL-proposed plant combination of matK and rbcL (70%) [4,5]. TaxonDNA was also used to compare the identification efficiencies of ITS and ITS2, and the results were similar to those obtained by the BLAST method. ITS had higher successful identification rates than ITS2, but the ambiguous identification rate of ITS2 was 0%, whereas that of ITS was 14.9% and 14.0% under the "best match" and "best close match" algorithms, respectively. The zero ambiguous identification rate of ITS2 may be due to its conserved secondary structure. The secondary structure of ITS2 has proven useful for diagnostic purposes at the species level [21], which might reduce the ambiguous identification rates and increase the accuracy of the barcoding analysis. Evidence has shown that combining nucleotide and secondary structure data can overcome some of the limitations of ITS2 [33] and that analyses based on both the ITS2 sequence and its secondary structure (sequence-structure) provided the most accurate results [30,34]. Thus, the use of ITS2 secondary structures should be very helpful in addressing the challenges of species identification and classification.
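A per-species barcode-gap check of the kind mentioned at the start of the paragraph above can be sketched as follows. This is a hedged illustration rather than the procedure used here: the distance function is passed in by the caller (any pairwise distance, such as a K2P or uncorrected distance, is assumed), each species is compared against all other species in the data set rather than only congeners, and a species with a single sequence is assigned a maximum intraspecific distance of 0. The function name is hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def barcode_gaps(samples: List[Tuple[str, str]],
                 dist: Callable[[str, str], float]) -> Dict[str, bool]:
    """Return {species: has_gap} for (species, aligned_sequence) samples.

    A species has a barcode gap when its smallest distance to any other
    species exceeds its largest intraspecific distance.
    """
    max_intra: Dict[str, float] = {}
    min_inter: Dict[str, float] = {}
    for (s1, q1), (s2, q2) in combinations(samples, 2):
        d = dist(q1, q2)
        if s1 == s2:
            # track the largest distance within each species
            max_intra[s1] = max(max_intra.get(s1, 0.0), d)
        else:
            # track the smallest distance from each species to any other species
            for s in (s1, s2):
                min_inter[s] = min(min_inter.get(s, float("inf")), d)
    return {s: min_inter.get(s, float("inf")) > max_intra.get(s, 0.0)
            for s, _ in samples}
```

For example, with a simple mismatch-proportion distance, a species whose closest heterospecific sequence is exactly as distant as its most divergent conspecific pair is reported as having no gap.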

ITS2 versus ITS: Advantages and Limitations. ITS2 has many advantages that make it superior to ITS. First, it is important that species be defined correctly for DNA barcoding by systematic analysis [3]. The secondary structure of ITS2 is more conserved than its primary sequence, which could provide information useful for the cladistic inference of relationships [35], and ITS2 sequence-structure information enables compensatory base change (CBC) analysis, whose results correlate with the biological species concept [21]. Thus, ITS2 has been considered a double-edged tool for evolutionary comparisons in eukaryotes [12].
Second, millions of species will need to be sequenced for a global barcode project, which would be extremely costly using standard sequencing methods. The read lengths provided by high-throughput sequencing would be sufficient to build a database of ITS2 mini-DNA barcode sequences. High-throughput platforms such as 454 use an emulsion PCR approach to simultaneously amplify several thousand 100-200 bp DNA molecules in one reaction and yield a large number of short sequences at a lower cost than standard approaches. Mello showed that ITS2 reads obtained by high-throughput 454 sequencing provided adequate information for taxon assignment [36]. Song et al. used high-throughput 454 sequencing to obtain a large number of ITS2 sequences in a single reaction [37]. Its amenability to high-throughput approaches and its high identification efficiency make the ITS2 minibarcode useful for projects involving large numbers of environmental samples.
Third, although ITS2 was less powerful than ITS for resolving some closely related species, it showed many advantages, especially in identifying herbs and specimens containing degraded DNA. ITS2 sequences could be used to design taxon-specific probes for the rapid identification of plants [38], and an ITS2 microarray has been used to successfully separate species with sequence identities of up to 97% [39]. Considering the short length and high identification efficiency of the ITS2 sequence, we confirmed that this very short barcode sequence is valuable for the identification of old specimens and medicinal materials.
Finally, although ITS is present in hundreds of copies within a genome, ITS2 can be considered a single locus in the whole genome of most organisms [10,12,37], including Panax ginseng and Panax quinquefolius (unpublished data), which makes ITS2 more suitable as a barcode than ITS.
This study demonstrated the potential of the ITS2 minibarcode for DNA barcoding analyses. ITS2 showed high sequence variability among 12861 samples from 8313 species. An ideal DNA barcoding marker for taxonomic classification should evolve quickly enough to allow classification at the species level, but it must also contain highly conserved priming sites and be highly reliable for DNA amplification and sequencing [40]. The ITS2 region meets the expected criteria of a global DNA barcode. Our analysis supports the use of the ITS2 minibarcode as a "universal DNA barcode" for the rapid identification of medicinal materials and specimens.
Conflict of Interests
The authors declare that no conflict of interests exists in this paper.