Integrated Analysis for Identifying Radix Astragali and Its Adulterants Based on DNA Barcoding

Radix Astragali is a popular herb used in traditional Chinese medicine for its proimmune and antidiabetic properties. However, methods are needed to help distinguish Radix Astragali from its varied adulterants. DNA barcoding is a widely applicable molecular method used to identify medicinal plants. Yet, its use has been hampered by genetic distance, base variation, and limitations of the bio-NJ tree. Herein, we report the validation of an integrated analysis method for plant species identification using DNA barcoding that focuses on genetic distance, identification efficiency, inter- and intraspecific variation, and barcoding gap. We collected 478 sequences from six candidate DNA barcodes (ITS2, ITS, psbA-trnH, rbcL, matK, and COI) from 29 species of Radix Astragali and adulterants. The internal transcribed spacer (ITS) sequence was demonstrated as the optimal barcode for identifying Radix Astragali and its adulterants. This new analysis method is helpful in identifying Radix Astragali and expedites the utilization and data mining of DNA barcoding.

Due to the high market demand for Radix Astragali, a diverse group of adulterants with similar morphological characteristics from genera such as Astragalus, Hedysarum, and Malva are often used in its stead [7]. The traditional methods used to identify Radix Astragali for use as a medicinal material, such as morphological and microscopic identification [8], thin-layer chromatography and ultraviolet spectroscopy [9], Fourier transform infrared spectroscopy (FTIR) [10], and high-performance liquid chromatography (HPLC) [11], all require specialized equipment and training.
Several PCR-based molecular methods have been developed, providing an alternative means of identification. However, multiplex PCR methods of DNA fragment analysis, such as randomly amplified polymorphic DNA (RAPD) [12] or amplified fragment length polymorphism (AFLP) [13], give unstable results that are difficult to use for identification. DNA barcoding is a widely used molecular marker technology, first proposed by Hebert et al. [14,15]. It uses a standardized and conserved, but diverse, DNA sequence to identify species and uncover biological diversity [16,17]. In previous studies, various DNA regions have been used to identify Radix Astragali and its adulterants, such as the 5S-rRNA spacer domain [18], the 3′ untranslated region (3′ UTR) [19], ITS (internal transcribed spacer region) and 18S rRNA [3,20,21], ITS2 [22], ITS1 [6], matK (maturase K) and rbcL (ribulose-1,5-bisphosphate carboxylase) of the chloroplast genome, and coxI (cytochrome c oxidase 1) of the mitochondrial genome [23]. However, sequence analysis was mainly focused on genetic distance, variable sites, amplified polymorphisms, and the use of a modified neighbor-joining (NJ) algorithm, the Bio-NJ tree, which were basic analyses limited to particular species. A more effective

Materials Information.
A total of 77 specimens were collected from two origins of Radix Astragali, along with seven adulterants. Radix Astragali specimens were collected from Inner Mongolia, Shaanxi, and Gansu provinces in the People's Republic of China, which are the main producing areas. The collection information is shown in Table 1. All corresponding voucher specimens were deposited in the Herbarium of the Institute of Medicinal Plant Development at the Chinese Academy of Medical Sciences in Beijing, China. The GenBank accession numbers of the sequences generated in this experiment are KJ999296-KJ999344 for ITS2, KJ999345-KJ999416 for ITS, and KJ999256-KJ999295 for psbA-trnH. The sequences added in the subsequent analysis, including ITS, ITS2, psbA-trnH, matK, and rbcL, were downloaded from the GenBank database.

DNA Extraction, PCR Amplification, and Sequencing.
The specimens were naturally dried, and 30 mg of dried plant material was used for the DNA extraction. Samples were ground for two minutes at a frequency of 30 r/s in a FastPrep bead mill (Retsch MM400, Germany), and total genomic DNA was isolated from the crushed material according to the manufacturer's instructions (Plant Genomic DNA Kit, Tiangen Biotech Co., China). We made the following modifications to the protocol: chloroform was diluted with isoamyl alcohol (24:1 in the same volume) and buffer solution GP2 with isopropanol (same volume). The powder, 700 µL of 65 °C GP1, and 1 µL β-mercaptoethanol were mixed for 10-20 s before being incubated for 60 minutes at 65 °C. Then, 700 µL of the chloroform:isoamyl alcohol mixture was added and the solution was centrifuged for 5 minutes at 12000 rpm (∼13400 ×g). The supernatant was removed and placed into a new tube before adding 700 µL isopropanol and blending for 15-20 minutes. The mixture was centrifuged in CB3 spin columns for 40 s at 12000 rpm. The filtrate was discarded and 500 µL GD (with the specified amount of anhydrous ethanol added before use) was added before centrifuging at 12000 rpm for 40 s. The filtrate was discarded and 700 µL PW (with the specified amount of anhydrous ethanol added before use) was used to wash the membrane before centrifuging for 40 s at 12000 rpm. This step was repeated with 500 µL PW, followed by a final centrifugation for 2 minutes at 12000 rpm to remove residual wash buffer. The spin column was dried at room temperature for 3-5 minutes and then centrifuged for 2 minutes at 12000 rpm to obtain the total DNA. General PCR reaction conditions and universal DNA barcode primers were used for the ITS, ITS2, and psbA-trnH barcodes, as presented in Table 2 [24][25][26]. PCR amplification was performed on 25-µL reaction mixtures containing 2 µL DNA template (20-100 ng), 8.5 µL ddH2O, 12.5 µL 2× Taq PCR Master Mix (Beijing TransGen Biotech Co., China), and 1 µL each of the forward and reverse (F/R) primers (2.5 µM). 
The reaction mixtures were amplified in a 9700 GeneAmp PCR system (Applied Biosystems, USA). Amplicons were visualized by electrophoresis on 1% agarose gels. Purified PCR products were sequenced in both directions using the ABI 3730XL sequencer (Applied Biosystems, USA).
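As a quick arithmetic check of the reaction setup described above, the following minimal sketch sums the listed components and estimates the final primer concentration. It assumes the volumes are in microlitres and the primer stock is 2.5 µM; the function and variable names are illustrative and are not taken from the original protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text (units assumed to be µL).
REACTION_UL = {
    "DNA template": 2.0,
    "ddH2O": 8.5,
    "2x Taq PCR Master Mix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
}

PRIMER_STOCK_UM = 2.5  # assumed primer stock concentration in µM


def total_volume(components):
    """Sum the component volumes of one reaction (µL)."""
    return sum(components.values())


def final_concentration(stock_conc, added_ul, total_ul):
    """Dilution formula C1*V1 = C2*V2, solved for C2 (same unit as stock_conc)."""
    return stock_conc * added_ul / total_ul


if __name__ == "__main__":
    total = total_volume(REACTION_UL)
    primer = final_concentration(PRIMER_STOCK_UM, REACTION_UL["forward primer"], total)
    print(f"total volume: {total} µL")
    print(f"final concentration of each primer: {primer} µM")
```

Under these assumptions the components add up to the stated 25 µL, and each primer is diluted 25-fold relative to its stock.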

Sequence Assembly, Alignment, and Analysis.
Sequencing peak diagrams were obtained and proofread, and then contigs were assembled using CodonCode Aligner 5.0.1 (CodonCode Co., USA). Complete ITS2 sequences were obtained using the HMMer annotation method, based on the Hidden Markov model (HMM) [27]. All of the sequences were aligned using ClustalW, in combination with 317 sequences from six commonly used barcodes (ITS2, ITS, psbA-trnH, matK, rbcL, and COI), which were downloaded from the GenBank database (Table 3). Sequence genetic distance and GC content were calculated using the maximum composite likelihood model. Maximum likelihood (ML) trees were constructed based on the Tamura-Nei model, and bootstrap tests with 1000 replicates were conducted to assess the confidence of the phylogenetic relationships using MEGA 6.0 software [28].
The barcoding gap, defined as the separation between the distributions of intra- and interspecific genetic variation, and the identification efficiency, based on the BLAST1 and K2P nearest distance methods, were calculated using Perl scripts (Putty) [25,29,30].
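The Perl scripts used in the study are not reproduced here. As an illustration of the nearest distance idea only, the following Python sketch computes the Kimura 2-parameter (K2P) distance between two aligned sequences and assigns a query to the species of its closest reference. The function names, the skipping of gaps and ambiguous bases, and the rule that a tie between species counts as a failed identification are assumptions made for this sketch, not details confirmed by the source.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError (math domain error) for saturated sequence pairs.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def nearest_distance_id(query, references):
    """Return the species of the closest reference, or None on a tie.

    references: list of (species_name, aligned_sequence) pairs.
    """
    dists = [(k2p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in dists)
    closest = {species for d, species in dists if d == best}
    return closest.pop() if len(closest) == 1 else None


if __name__ == "__main__":
    refs = [("species_a", "ACGTACGTAC"), ("species_b", "ACGTACGTTT")]
    print(nearest_distance_id("ACGTACGTAC", refs))
```

Identification efficiency could then be estimated as the fraction of query sequences whose returned species matches their known label, again as an assumption about how such a rate is tallied.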

Sequence Information and Identification Efficiency.
A total of 478 sequences for six barcodes were analyzed, of which 161 sequences were obtained from Radix Astragali and its adulterants. Sequence information and identification success rates are listed in Table 4. The average GC content differed among the six barcodes, and the ITS and ITS2 regions from nuclear ribosomal DNA had a higher GC content than the other barcodes (52.97% versus 50.80%). Among the six barcodes, ITS2 provided the largest average genetic distance (1.0792), and rbcL the smallest (0.0349). All six barcodes had a minimum genetic distance of zero. In terms of identification efficiency, the nearest distance method was superior to the BLAST1 method for all six barcodes. The ITS, psbA-trnH, and matK regions provided a higher rate of success than the other three barcodes using the BLAST1 method, and the same three regions also performed better than the other three barcodes based on the nearest distance method. Because ITS and psbA-trnH also obtained higher genetic distances, the matK, ITS, and psbA-trnH barcodes were the preferable options for identifying Radix Astragali and its adulterants based on superior sequencing efficiency and identification efficiency.

Intra- and Interspecific Variation Analysis Using Six Parameters.
Six parameters for analyzing intraspecific variation and interspecific divergence were employed to assess the utility of the six DNA barcodes (Table 5). We expected the "minimum interspecific distance" to be higher than the "coalescent depth" (maximum intraspecific distance). Therefore, we first used the "gap rate" to indicate distinctness, calculated by the formula: (minimum interspecific distance − maximum intraspecific distance)/minimum interspecific distance. The results show that the ITS2, COI, matK, and rbcL regions outperformed the ITS and psbA-trnH regions for gap rates. However, when we compared all of the average inter- and intraspecific distances, the ITS2, rbcL, matK, and psbA-trnH regions performed better than the ITS and COI regions. Therefore, in terms of intra- and interspecific variation, ITS2, matK, and rbcL are the preferable options for identifying Radix Astragali and its adulterants.
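The gap rate formula given above can be written as a short function. This is a minimal sketch: the function name and the input values are illustrative, and raising an error for a non-positive minimum interspecific distance (where the ratio is undefined) is a choice made here, since the source does not state how that case was handled.

```python
def gap_rate(min_interspecific, max_intraspecific):
    """(minimum interspecific - maximum intraspecific) / minimum interspecific."""
    if min_interspecific <= 0:
        # The ratio is undefined when the minimum interspecific distance is zero.
        raise ValueError("minimum interspecific distance must be positive")
    return (min_interspecific - max_intraspecific) / min_interspecific


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(gap_rate(0.05, 0.01))  # positive: interspecific minimum exceeds intraspecific maximum
    print(gap_rate(0.02, 0.03))  # negative: the two ranges overlap
```

A positive value means the closest pair of different species is still further apart than the most divergent pair within a species; a negative value means the two ranges overlap.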

Barcoding Gap Analysis.
Analysis of the DNA barcoding gap compares the distributions of inter- and intraspecific divergence; in an ideal situation, the two distributions are separate and nonoverlapping [25]. In our study (Figure 1), the rbcL, COI, ITS, and matK regions showed less overlap in the relative distributions of inter- and intraspecific variation than psbA-trnH and ITS2, although none of the six barcodes showed completely nonoverlapping distributions. Hence, the rbcL, COI, ITS, and matK regions are more successful at identifying Radix Astragali and its adulterants, from the standpoint of barcoding gap analysis.
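One simple way to quantify the overlap discussed above is to split all pairwise distances into intraspecific and interspecific sets and measure how many interspecific distances fall at or below the largest intraspecific distance. The sketch below assumes a precomputed symmetric distance matrix and a list of species labels; the function names and the overlap measure itself are illustrative choices, not the procedure used to produce Figure 1.

```python
from itertools import combinations


def split_distances(labels, matrix):
    """Split pairwise distances into intraspecific and interspecific lists.

    labels: species label of each sequence; matrix: symmetric distance matrix.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(matrix[i][j])
    return intra, inter


def overlap_fraction(intra, inter):
    """Fraction of interspecific distances not exceeding the largest intraspecific one."""
    if not intra or not inter:
        raise ValueError("need at least one intra- and one interspecific distance")
    ceiling = max(intra)
    return sum(d <= ceiling for d in inter) / len(inter)


if __name__ == "__main__":
    # Illustrative labels and distances only, not data from the study.
    labels = ["sp_a", "sp_a", "sp_b", "sp_b"]
    matrix = [
        [0.00, 0.01, 0.05, 0.06],
        [0.01, 0.00, 0.04, 0.05],
        [0.05, 0.04, 0.00, 0.02],
        [0.06, 0.05, 0.02, 0.00],
    ]
    intra, inter = split_distances(labels, matrix)
    print(overlap_fraction(intra, inter))
```

An overlap fraction of zero corresponds to the ideal case of fully separated distributions; larger values indicate more interspecific pairs that cannot be told apart from intraspecific variation by distance alone.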

ML Tree Analysis.
Maximum likelihood (ML) is a general statistical criterion in widespread use for the inference of molecular phylogenies [31]. An ML tree visually reveals the relationships between species. As the results show (Figure 2), psbA-trnH successfully differentiated Radix Astragali and its adulterants. Furthermore, it produced areas of obvious separation for Radix Astragali. The remaining five barcodes also differentiated Radix Astragali and its adulterants. Each species clustered together, separate from other species. Considering the difficulty of amplification and sequencing, and that DNA barcoding is intended for fast and accurate identification, we did not include all of the ITS2 and psbA-trnH sequence data in the ML trees and subsequent analyses.

Discussion and Conclusions
Radix Astragali is reported to possess 47 bioactive compounds and has many bioactive properties [32][33][34][35][36][37]. Various Radix Astragali preparations are commercially available, not only in China as a TCM component, but also in the United States, as dietary supplements [38]. However, due to increasing demand, substitutes and adulterants have flooded the market. Traditional identification methods, such as morphological and microscopic methods, are limited by the lack of explicit criteria for character selection or coding and, thus, mainly depend on subjective assessments. Although chemical methods are able to distinguish between different species, it is difficult to differentiate sibling species that possess similar chemical compositions. In addition, chemical methods are unable to provide accurate species authentication. Several types of molecular markers for characterizing genotypes are useful in identifying plant species. For example, RAPD has been used to estimate genetic diversity in plant populations based on amplification of random DNA fragments and comparisons of common polymorphisms. DNA barcoding is advocated for species identification, due to its universal applicability, simplicity, and scientific accuracy. However, the analysis methods for DNA barcodes have been limited. With the development of molecular biology and bioinformatics, an improved analytical method for DNA barcoding can be established to identify Radix Astragali and closely related species.
In this study, we validated a new analytical method for identifying Radix Astragali using DNA barcoding. Seventy-seven specimens of Radix Astragali and its adulterants were collected, and the sequences of 29 species reported in the literature were downloaded from the GenBank database. Based on the 478 sequences for six barcodes (ITS2 and ITS from the nuclear genome; psbA-trnH, rbcL, and matK from the chloroplast genome; COI from the mitochondrial genome), genetic distances and ML trees were calculated using MEGA 6.0 software, and identification efficiency, intra- and interspecific variation, and barcoding gap were calculated using Perl scripts. Results of the six indicators assessed are shown in Table 6. ITS and psbA-trnH outperformed the other barcodes in terms of identification efficiency. ITS2 performed better in terms of genetic distance, gap rate, and inter- and intraspecific variation. rbcL performed better in terms of barcoding gap and inter- and intraspecific variation. Although ITS2 is part of the ITS sequence, it performed poorly in identification efficiency. Therefore, we suggest that the ITS sequence is the optimal barcode, and that the psbA-trnH region is a complementary barcode for identifying Radix Astragali and its adulterants.
In conclusion, we describe a new analytical method for the use of DNA barcoding in the identification of Radix Astragali. Six indicators, including average genetic distance, BLAST1 and the nearest distance method for identification efficiency, inter- and intraspecific variation, and gap rate, were tested to evaluate six DNA barcodes using bioinformatics software and Perl scripts. The ITS sequence was the optimal barcode for identifying Radix Astragali and its adulterants. This method provides a novel means for accurate identification of Radix Astragali and its adulterants and improves the utilization of DNA barcoding in identifying medicinal plant species.
* Table 6 note: The total scores of the six parameters were set to 10, 30, 30, 10, 10, and 10, in order. Identification efficiency based on each of the two methods was assigned a score of 30 because of its importance for identification.