An ideal DNA barcoding region should be short enough to be amplified from degraded DNA. In this paper, we discuss the possibility of using a short nuclear DNA sequence as a barcode to identify a wide range of medicinal plant species. First, the PCR and sequencing success rates of ITS and ITS2 were evaluated using only dried medicinal products and herbarium voucher specimens, including some samples collected up to 90 years ago. With a single primer pair, PCR and sequencing succeeded for 91% of the samples with ITS2 but for only 23% with ITS. Second, 12861 ITS and ITS2 plant sequences were used to compare the identification efficiency of the two regions. Four identification criteria (BLAST, inter- and intraspecific divergence, Wilcoxon signed rank tests, and TaxonDNA) were evaluated. Our results supported the hypothesis that ITS2 can be used as a minibarcode to effectively identify species in a wide variety of specimens and medicinal materials.
DNA barcoding takes advantage of short standard sequences to discover and identify species [
The Plant Working Group of the Consortium for the Barcode of Life (CBOL) recommended the use of a combination of
Here, we demonstrated the effectiveness of ITS2 as a minibarcode in comparison with the full-length ITS for the identification of a wide range of archived plant species. An initial set of 100 medicinal samples from museum specimens and the herb market was tested to determine the PCR and sequencing efficiencies of ITS and ITS2. A second set of 12861 sequences, representing 8313 species collected from GenBank, was examined to compare the identification abilities of ITS and ITS2. This work aims to provide an evaluation of ITS2 as a minibarcode for large samples.
The initial set of 100 museum medicinal specimens and herbal products from 92 species representing 5 orders (see Table 1 of the Supplementary Material available online at http://dx.doi.org/10.1155/2013/741476) was collected from the Buozhou herbal market and from specimens at the Institute of Medicinal Plant Development, some of which were collected 90 years ago, to test the efficiency of PCR and sequencing. All the samples were authenticated at the species level by Professor Yulin Lin (Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences). A second set of sequences for the identification efficiency analysis presented in this paper was obtained from the GenBank nucleotide sequence database. We carried out a bioinformatics analysis using all ITS sequences present in GenBank matching the search pattern “18S ribosomal RNA gene; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, and 28S ribosomal RNA gene.” Partial sequences, fungal sequences, and sequences of less than 100 bp were removed. A flowchart is shown in Figure
Flowchart of data analysis.
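The three exclusion rules described above (partial sequences, fungal sequences, and sequences shorter than 100 bp) can be illustrated with a minimal sketch. The `Record` class and its fields are hypothetical stand-ins for a parsed GenBank entry and are not taken from the original analysis; in particular, how an entry is flagged as partial or fungal is assumed to be decided upstream.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 100  # sequences shorter than this are removed, as stated in the text


@dataclass
class Record:
    """Hypothetical, simplified stand-in for one parsed GenBank entry."""
    accession: str
    kingdom: str      # e.g. "Viridiplantae" or "Fungi" (assumed annotation)
    is_partial: bool  # assumed flag: True if the ITS region is incomplete
    sequence: str


def keep_record(rec: Record) -> bool:
    """Return True if the record passes all three filters."""
    is_fungal = rec.kingdom.lower() == "fungi"
    too_short = len(rec.sequence) < MIN_LENGTH_BP
    return not (rec.is_partial or is_fungal or too_short)


def filter_records(records):
    """Keep only the records that survive the filters, preserving input order."""
    return [rec for rec in records if keep_record(rec)]
```

A sequence of exactly 100 bp is kept under this reading, because only sequences of less than 100 bp are removed.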
Total genomic DNA was extracted from specimens using the Plant Genomic DNA Kit (Tiangen Biotech Beijing Co., Ltd., China) according to the manufacturer’s instructions. The primer sequences for ITS2 were described by Chen et al. [
Six parameters were used to characterize the interspecific and intraspecific divergences, according to a previously described method [
To evaluate the efficiency of PCR and sequencing, 100 medicinal samples from herbal market and museum specimens, including 91 species from 5 orders, were tested; 16% of the samples were obtained from the herb market, and the remaining 84% were obtained from the Institute of Medicinal Plant Development. The ITS primer pair yielded a recovery rate of only 23%, compared with the 91% recovery rate for ITS2. All sequences were submitted to GenBank (the GenBank accession numbers are listed in Table 1S, Supplementary Material). The small size of ITS2 facilitates its amplification by universal primers, even in samples with partially degraded DNA.
Comparing inter- and intraspecific sequence variation is an important part of evaluating a barcode. The 12861 ITS and ITS2 sequences represented 8313 species from 1699 genera, and the average lengths of ITS and ITS2 were 634 bp and 233 bp, respectively. The comparison of the inter- and intraspecific genetic distances revealed that the ITS2 region exhibited a higher interspecific divergence according to the three interspecific parameters (Table
Analysis of interspecific divergence and intraspecific variation of candidate barcodes.
Marker | ITS | ITS2 |
---|---|---|
Avg_intra_avg | | |
Avg_intra_max | | |
Avg_intra_between_intra-species | | |
Avg_interbyG_avg | | |
Avg_interbyG_min | | |
Avg_between_interbyGenus | | |
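To make the kind of quantity summarized in this table concrete, the following minimal sketch computes an average intraspecific distance and the average and minimum interspecific distances for the samples of a single genus. It uses the uncorrected p-distance on pre-aligned sequences as a simplifying assumption; the distance model and the exact definitions of the six parameters used in the study are not reproduced here, and all function names are illustrative.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_summary(samples):
    """samples: list of (species, aligned_sequence) pairs from ONE genus.

    Returns (mean intraspecific, mean interspecific, minimum interspecific)
    distances; an entry is None when no pair of that kind exists.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return (mean(intra) if intra else None,
            mean(inter) if inter else None,
            min(inter) if inter else None)
```

A barcode gap in this simplified setting corresponds to the minimum interspecific distance exceeding the largest intraspecific distance.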
The differences in the percent sequence divergence between loci were tested using the Wilcoxon signed rank test. The results showed that ITS2 was a more variable barcode (Table
Wilcoxon signed rank tests of inter- and intraspecific divergences among loci.
Divergence | Interrelative ranks, | Result |
---|---|---|
Interspecific | | ITS2 > ITS |
Intraspecific | | ITS2 > ITS |
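The rank sums behind a Wilcoxon signed rank comparison of two loci can be sketched as follows. This is a minimal pure-Python illustration with hypothetical names, not the software used in the study: it returns only the sums of positive and negative ranks, without a P value. Zero differences are dropped and tied absolute differences receive average ranks, which is one common convention and is assumed here.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y.

    x[i] and y[i] are the divergences of the same pair measured at two loci.
    Zero differences are discarded; ties in |difference| get average ranks.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired and of equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)
```

With `x` holding ITS2 divergences and `y` holding ITS divergences, a positive rank sum much larger than the negative one points in the direction "ITS2 > ITS"; judging significance would additionally require a P value, for example from a statistics library such as SciPy's `scipy.stats.wilcoxon`.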
BLAST1 was used to evaluate the efficiencies of ITS2 and ITS. ITS and ITS2 successfully identified 89.2% and 79.2% of specimens, respectively, at the species level and 97.5% and 93.8%, respectively, at the genus level (Table
Identification efficiency of ITS and ITS2 by using BLAST.
Marker | Samples | Genus | Species | Length | Identification success at genus level | Identification success at the species level |
---|---|---|---|---|---|---|
ITS | 12861 | 1699 | 8313 | 633.7 | 97.5% | 89.2% |
ITS2 | 12861 | 1699 | 8313 | 232.6 | 93.8% | 79.2% |
To estimate the respective identification efficiency per genus, genera that contain at least 20 species were selected independently (Table
Comparison of the identification rates of ITS and ITS2 in genera with at least 20 species.
| Genus | Species | Samples | Unidentified species (ITS) | Unidentified species (ITS2) |
|---|---|---|---|---|
| | 57 | 341 | 0.30% | 0.30% |
| | 245 | 506 | 0.20% | 0.20% |
| | 37 | 39 | 0.00% | 2.60% |
| | 39 | 93 | 0.00% | 1.10% |
| | 42 | 71 | 1.40% | 1.40% |
| | 166 | 175 | 0.60% | 0.60% |
| | 64 | 64 | 1.60% | 1.60% |
| | 20 | 24 | 0.00% | 0.00% |
| | 28 | 39 | 2.60% | 2.60% |
| | 67 | 67 | 1.50% | 1.50% |
| | 73 | 80 | 1.30% | 1.30% |
| | 31 | 31 | 0.00% | 3.20% |
| | 21 | 21 | 4.80% | 4.80% |
| | 31 | 43 | 2.30% | 2.30% |
| | 20 | 23 | 4.30% | 4.30% |
| | 76 | 92 | 1.10% | 1.10% |
| | 39 | 40 | 0.00% | 2.50% |
| | 168 | 185 | 0.50% | 0.50% |
| | 30 | 33 | 3.00% | 3.00% |
| | 29 | 39 | 2.60% | 2.60% |
| | 26 | 63 | 1.60% | 1.60% |
| | 39 | 55 | 1.80% | 1.80% |
| | 79 | 106 | 0.90% | 0.90% |
| | 26 | 27 | 3.70% | 3.70% |
| | 45 | 75 | 1.30% | 1.30% |
| | 64 | 82 | 1.20% | 1.20% |
| | 27 | 27 | 3.70% | 3.70% |
| | 63 | 66 | 0.00% | 1.50% |
| | 45 | 45 | 2.20% | 2.20% |
| | 37 | 42 | 2.40% | 2.40% |
| | 44 | 69 | 1.40% | 1.40% |
| | 68 | 88 | 1.10% | 1.10% |
| | 32 | 42 | 2.40% | 2.40% |
| | 44 | 56 | 1.80% | 1.80% |
| | 34 | 58 | 1.70% | 1.70% |
| | 31 | 32 | 3.10% | 3.10% |
| | 35 | 37 | 0.00% | 2.70% |
| | 24 | 27 | 0.00% | 0.00% |
| | 20 | 24 | 4.20% | 4.20% |
| | 20 | 47 | 2.10% | 2.10% |
| | 122 | 273 | 0.40% | 0.40% |
| | 21 | 28 | 3.60% | 3.60% |
| | 35 | 98 | 1.00% | 1.00% |
| | 98 | 116 | 0.90% | 0.90% |
| | 20 | 23 | 4.30% | 4.30% |
| | 25 | 33 | 3.00% | 3.00% |
| | 147 | 152 | 0.70% | 0.70% |
| | 62 | 124 | 0.80% | 0.80% |
| | 38 | 38 | 2.60% | 2.60% |
| | 31 | 37 | 0.00% | 0.00% |
| | 72 | 141 | 0.70% | 0.70% |
| | 42 | 48 | 2.10% | 2.10% |
| | 43 | 92 | 1.10% | 1.10% |
| | 34 | 41 | 0.00% | 0.00% |
| | 44 | 51 | 2.00% | 2.00% |
| | 54 | 56 | 0.00% | 1.80% |
| | 20 | 20 | 0.00% | 5.00% |
| | 21 | 22 | 4.50% | 4.50% |
| | 32 | 81 | 1.20% | 1.20% |
| | 76 | 89 | 1.10% | 1.10% |
| | 52 | 63 | 0.00% | 1.60% |
| | 42 | 42 | 2.40% | 2.40% |
| | 49 | 67 | 0.00% | 1.50% |
| | 20 | 28 | 3.60% | 3.60% |
| | 51 | 96 | 1.00% | 1.00% |
| | 71 | 91 | 1.10% | 1.10% |
| | 26 | 26 | 3.80% | 3.80% |
| | 29 | 29 | 0.00% | 3.40% |
| | 22 | 31 | 3.20% | 3.20% |
| | 22 | 30 | 0.00% | 0.00% |
| | 25 | 29 | 0.00% | 3.40% |
| | 27 | 215 | 0.50% | 0.50% |
| | 38 | 38 | 2.60% | 2.60% |
| | 27 | 303 | 0.30% | 0.30% |
| | 46 | 81 | 1.20% | 1.20% |
| | 21 | 23 | 4.30% | 4.30% |
| | 29 | 33 | 0.00% | 0.00% |
| | 222 | 257 | 0.40% | 0.40% |
| | 34 | 34 | 2.90% | 2.90% |
| | 132 | 153 | 0.70% | 0.70% |
We also used TaxonDNA to assess the accuracy of species identification based on ITS and ITS2. TaxonDNA is an alignment-based parametric clustering program that determines the closest match of a sequence by comparing it with all other sequences in the aligned data set. If the compared sequences were from the same species, the identification was considered successful, whereas mismatched names were counted as failures. Cases with several equally good best matches from different species were considered ambiguous [
We used TaxonDNA to set the threshold value. All sequences without a match below the 97% threshold value remained unidentified. If the compared sample names were identical, the identification was considered correct; if the sequence names were mismatched, the identification was considered a failure. When several equally good best matches that belonged to a minimum of two species were found, the identification was considered ambiguous [
Identification success based on “best match” and “best close match.”
Outcome | Best match (ITS) | Best match (ITS2) | Best close match (ITS) | Best close match (ITS2) |
---|---|---|---|---|
Correct identification (%) | 67.88 | 60 | 62.53 | 32.00 |
Ambiguous identification (%) | 15 | 0 | 14.0 | 0.00 |
Incorrect identification (%) | 17 | 40 | 7.28 | 0.00 |
Without any match closer than 3.0% (%) | — | — | 16.20 | 68.00 |
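The "best match" and "best close match" decision rules described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not TaxonDNA itself: distances are uncorrected p-distances on pre-aligned sequences, the 3% cutoff is taken from the table, and the function names and return labels are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def classify(query_species, query_seq, reference, threshold=None):
    """Classify one query against a reference set that excludes the query.

    reference: list of (species, aligned_sequence) pairs.
    threshold=None gives "best match"; threshold=0.03 gives "best close match".
    Returns "correct", "ambiguous", "incorrect", or "no match".
    """
    if not reference:
        raise ValueError("reference set is empty")
    dists = [(p_distance(query_seq, seq), sp) for sp, seq in reference]
    best = min(d for d, _ in dists)
    # Best close match only: reject queries whose nearest hit is too distant.
    if threshold is not None and best > threshold:
        return "no match"
    # Species of every reference sequence tied for the smallest distance.
    best_species = {sp for d, sp in dists if d == best}
    if len(best_species) > 1:
        return "ambiguous"  # equally good best matches from different species
    return "correct" if best_species == {query_species} else "incorrect"
```

Running every sequence in a data set through `classify` against all the others, once with `threshold=None` and once with `threshold=0.03`, and tallying the four labels would give percentages of the same form as the table.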
Museum specimens are a valuable resource for DNA barcoding studies. However, high-quality DNA can be difficult to obtain from these specimens, making PCR amplification and sequencing inefficient. In this study, we recovered short ITS2 sequences from more than 90% of the herbal specimens representing 5 orders, whereas the recovery rate for ITS with a single primer set was only 23%. This discrepancy arises because ITS is much longer than ITS2 and requires a variety of PCR conditions and additives for successful amplification [
To determine whether barcode gaps are present in this study, the relationships between the inter- and intraspecific divergences were compared for each species. For the 12861 samples, ITS and ITS2 could identify 97.5% and 93.8% of genera, respectively, by the BLAST method. The full-length ITS could identify approximately 89.2% of the species, and the mini-DNA barcode ITS2 successfully identified approximately 79.2% of the species, which is higher than the CBOL proposed plant combination of
TaxonDNA was also used to compare the identification efficiencies of ITS and ITS2, and the result appeared to be similar to that obtained by the BLAST method. ITS had slightly superior successful identification and misidentification rates compared with ITS2, but the ambiguous identification rate of ITS2 was 0%, whereas that of ITS was 14.9% and 14.0% under the “best match” and “best close match” algorithms, respectively. The zero ambiguous identification rate of ITS2 may be due to its conserved secondary structure. The secondary structure of ITS2 has proven useful for diagnostic purposes at the species level [
ITS2 has many advantages that make it superior to ITS. First, it is important that species be defined correctly for DNA barcoding by systematic analysis [
Second, millions of species will need to be sequenced for a global barcode project, and this would be extremely costly using standard sequencing methods. The read lengths provided by high-throughput sequencing would be sufficient to build a database of ITS2 mini-DNA barcode sequences. High-throughput sequencing technology uses an emulsion PCR approach to simultaneously amplify several thousand 100–200 bp DNA molecules in one reaction and yields a large number of short sequences with a lower cost than standard approaches. Mello proved that the ITS2 read length obtained by high-throughput 454 sequencing provided adequate information for taxon assignment [
Third, although ITS2 was less powerful than ITS for resolving some closely related species, it showed many advantages, especially in identifying herbs and specimens containing degraded DNA. ITS2 sequences could be used to design taxon-specific probes for the rapid identification of plants [
Finally, there are hundreds of copies of ITS within a genome. Nonetheless, ITS2 can be considered a single locus in the whole genome of most organisms [
This study demonstrated the potential of the ITS2 minibarcode for DNA barcoding analyses. ITS2 showed high sequence variability among 12861 samples from 8313 species. An ideal DNA barcoding marker for taxonomic classification should be fast-evolving to allow classification at the species level but must also contain highly conserved priming sites and be highly reliable for DNA amplification and sequencing [
The authors declare that they have no conflict of interest regarding this paper.
This work was supported by the National Natural Science Foundation of China (Grant nos. 81001608 and 81130069). The authors thank their colleagues who helped in the sample collection, identification, laboratory work, and paper preparation, including Professor Yulin Lin, Chang Liu, and many others.