SSR Loci Analysis in Transcriptome and Molecular Marker Development in Polygonatum sibiricum

To characterize SSR loci and develop molecular markers, a total of 435,858 unigenes from the transcriptome of Polygonatum sibiricum were screened for SSRs. The distribution frequency of SSRs and the basic characteristics of the repeat motifs were analyzed using MISA software, and SSR primers were designed with Primer 3.0 software and then validated by PCR. In addition, the functions of the SSR-containing unigenes were annotated by BLAST. The results showed that 112,728 SSR loci were found in the transcriptome of Polygonatum sibiricum, distributed among 435,858 unigenes with a distribution frequency of 25.86%. Mononucleotide and dinucleotide repeats were the main types, accounting for 83.83% of all SSRs. A/T and AC/GT were the predominant repeat motifs among the mononucleotide and dinucleotide repeats, respectively. A total of 113,305 pairs of SSR primers with the potential to produce polymorphism were designed for marker development. Of 500 randomly selected primer pairs, 154 not only produced fragments of the expected molecular size but also showed high polymorphism and could accurately separate the tested varieties. The functions of the SSR-containing unigenes were mostly assigned to the molecular function category. The SSR markers in the transcriptome of Polygonatum sibiricum are rich in type, highly specific, and have high potential for polymorphism, which will benefit candidate gene mining and marker-assisted breeding. The developed markers can also provide technical support for the molecular identification of species within Polygonatum Mill. and for marker-assisted breeding of superior varieties of Polygonatum Mill.


Introduction
Polygonatum sibiricum is a perennial herb in Liliaceae. Its genus, Polygonatum Mill., comprises more than 60 species globally, mainly distributed in the north temperate and north subtropical zones [1]. In China, 31 species of Polygonatum Mill. have been recorded, and only three species (P. cyrtonema Hua, P. kingianum Coll. et Hemsl., and P. sibiricum Red.) are included in the Chinese Pharmacopoeia (2020 edition). With the clarification of the effective components of Polygonatum, its medicinal and economic value has gradually been recognized by the market. However, counterfeit and inferior Polygonatum products have entered the market, and cases of accidental poisoning have even occurred, which seriously affects the clinical application value of Polygonatum [2]. China has abundant germplasm resources of Polygonatum Mill., most of which currently remain in the wild. However, unreasonable logging and habitat destruction have become increasingly serious, resulting in the loss of Polygonatum germplasm resources, some of which are on the verge of extinction [3]. Therefore, it is necessary to systematically collect and protect Polygonatum Mill. germplasm resources, so as to provide a reference for germplasm evaluation, breeding of improved varieties, classification, and protection strategies. The traditional procedure for identifying Polygonatum Mill. plants depends on morphological characteristics such as the length-width ratio of the leaves, the presence or absence of short hairs on the back of the leaves, the length of the pedicels, and the upper ends of the filaments [4][5][6]. However, classification within Polygonatum Mill. based on phenotypic characteristics is blurred by morphological variation and interspecific hybridization [7].
Molecular markers can reveal the genetic relationships between species and subspecies at the level of the genetic material (DNA) and have the advantages of being unaffected by the environment, high heritability, and easy detection [8][9][10]. In recent years, molecular markers including random amplified polymorphic DNA (RAPD), inter-simple sequence repeat (ISSR), and DNA barcoding have been used to study genetic relationships, identify germplasm resources, and analyze the genetic diversity of Polygonatum Mill. [11][12][13][14][15]. Unfortunately, the available molecular markers do not satisfy the demands of identifying Polygonatum Mill., which may be due to the limited types and number of markers [12,16]. For example, a few accessions identified as P. cyrtonema Hua by morphological methods could not be resolved using ITS2 and psbA-trnH markers [12]. To date, only 225 SSR molecular markers and 43 EST-SSR molecular markers have been published [11,17]. As a result, these molecular markers are far from sufficient for species identification and genetic diversity analysis of Polygonatum Mill. [2]. With the rapid development of high-throughput sequencing technology and the continuous reduction of sequencing costs, abundant transcriptome data have been used to develop molecular markers in medicinal plants, such as Panax ginseng C. A. Meyer, Glycyrrhiza uralensis, and Pharbitis purpurea (L.) Voigt [18][19][20][21][22][23]. Simple sequence repeats (SSRs), also known as microsatellites, are tandem repeat sequences that mainly have 2 to 5 nucleotides as the basic repeat unit. Microsatellite markers are codominant markers that can distinguish homozygotes from heterozygotes and detect multiple alleles, and they have the advantages of rich polymorphism, simple operation, reliable results, and good repeatability [24].
In this study, the transcriptome data of P. sibiricum were used to analyze the composition, distribution, and characteristics of EST-SSR loci in P. sibiricum. Moreover, potential EST-SSR markers were designed, and the polymorphism levels of the polymorphic markers were preliminarily verified in different Polygonatum Mill. germplasms. These molecular markers might provide a powerful tool for interspecific identification, genetic diversity analysis, and genetic map construction in Polygonatum Mill.

Materials and Methods
2.1. Experimental Materials. The transcriptome sequences of 10 Polygonatum Mill. germplasms were assembled and used for SSR discovery, categorization, and marker development (Table 1). Genomic DNA was extracted using the modified CTAB method [8][9][10][25]. DNA quality was checked by 1.5% agarose gel electrophoresis, and DNA concentration and purity were measured using a NanoDrop 2000. Each sample was diluted to 50 ng·μL⁻¹ and stored at -20°C.

SSR Extraction from Transcriptome Data and Primer Design. The MISA software was used to search for repeat sequence sites in the P. sibiricum transcriptome. The search criteria were minimum repeat numbers of 10, 6, 5, 5, 5, and 5 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. Primers for each SSR were designed using Primer 3.0 software. The optimal primer length was 17-27 bp, and the expected product size ranged from 100 bp to 300 bp. Five hundred pairs of primers were randomly selected to validate their polymorphism in 10 germplasms of Polygonatum Mill. (Supplementary Table S1).
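The search criteria above can be illustrated with a minimal Python sketch. It is not MISA itself: it only finds perfect repeats with a regular expression, ignores compound SSRs and ambiguous bases, and the function name find_ssrs and the example sequence are made up for illustration. Only the minimum repeat numbers (10, 6, 5, 5, 5, 5) come from the text.

```python
import re

# Minimum repeat numbers per motif length, as stated in the search criteria
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper for illustration; MISA additionally reports
    compound SSRs, which this sketch does not attempt.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures a candidate motif; \1{m,} demands m more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA" is A)
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# Made-up sequence: (A)12 and (AG)7 pass the thresholds, (ACG)4 does not
example = "GGC" + "A" * 12 + "CCT" + "AG" * 7 + "TTC" + "ACG" * 4
print(find_ssrs(example))
```

Running the search from the shortest motif length upward and discarding motifs that reduce to a shorter unit keeps each perfect repeat from being reported twice under different motif lengths.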

PCR Analysis and Nondenatured Polyacrylamide Gel Electrophoresis. Three Polygonatum Mill. samples (DH-1-AH, D-1-HN, and HJ-2-HN) were used to optimize annealing temperature. Polymerase chain reaction (PCR) of each sample was performed in 10 μL volume containing 5 μL Premix Taq™ (Takara Biomedical Technology, Beijing, China), 0.3 μL forward primer (20 μM), 0.3 μL reverse primer (20 μM), 1 μL DNA (50 ng·μL⁻¹), and 3.4 μL ddH2O. PCR amplification was performed using the following steps: initial denaturation at 94°C for 5 min, 45 cycles of denaturation at 94°C for 30 s, optimal gradient annealing for 30 s, and extension at 72°C for 1 min, and finally an elongation step at 72°C for 10 min. The PCR product was detected by 2% agarose electrophoresis, and the primers with clear bands at 50-500 bp were selected to characterize polymorphism among 10 germplasm of Polygonatum Mill. The amplified products and DL50 DNA marker (Takara Biomedical Technology, Beijing, China) were electrophoresed on 8% nondenaturing polyacrylamide gels [acrylamide-bisacrylamide (39 : 1), 1 × TBE] in the 1 × TBE buffer system at a voltage of 180 V and a time of 1.5 h. Electrophoresis gels were stained with Cell Red Nucleic acid dye solution.
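As a quick arithmetic check of the reaction setup described above, the following Python sketch sums the listed component volumes, derives the final primer concentration, and scales the recipe to several reactions. The per-reaction volumes come from the text; the helper names and the 10% pipetting overage are assumptions added for illustration, and in a primer screen the primers and templates would normally be added per tube rather than pooled.

```python
# Per-reaction volumes in microlitres, as listed in the protocol text
PER_REACTION_UL = {
    "Premix Taq": 5.0,
    "forward primer (20 uM)": 0.3,
    "reverse primer (20 uM)": 0.3,
    "template DNA (50 ng/uL)": 1.0,
    "ddH2O": 3.4,
}
TOTAL_UL = sum(PER_REACTION_UL.values())  # should come to 10 uL

def final_concentration(stock_conc, added_ul, total_ul=TOTAL_UL):
    """Concentration after dilution into the reaction (same unit as stock)."""
    return stock_conc * added_ul / total_ul

def master_mix(n_reactions, overage=0.10):
    """Scale every component to n_reactions plus an overage.

    The 10% default overage is a hypothetical convenience value,
    not something stated in the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}

print(f"total volume: {TOTAL_UL:.1f} uL")
print(f"final primer concentration: {final_concentration(20, 0.3):.2f} uM")
print(f"template per reaction: {50 * 1.0:.0f} ng")
print(master_mix(10))
```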

Data Processing. According to the electrophoretogram, clear and repeatable amplified bands were counted. The amplified fragments of markers were designated as 0 in the absence of a band and 1 in the presence of a band. Based on the characterization of a matrix, POPGENE 1.31 software was used to evaluate population genetic parameters, including the number of alleles (Na) and Shannon Information Index (I). The expected heterozygosity (He) and locus Polymorphism Information Content (PIC) were calculated using CERVUS v3.0 software. The marker index (MI) was calculated as MI = NPB × PICav, where PICav = ∑PICi/NPB (PICi: PIC value of no. i marker; NPB: number of polymorphic bands) [26,27]. The NTSYS-pc version 2.0 software was used to calculate the genetic distance matrix, and an unweighted pair group method analysis (UPGMA) tree was constructed.
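For readers who want to see how these parameters relate to allele frequencies, the sketch below implements the textbook formulas: He = 1 − Σp², I = −Σp·ln p, and the PIC of Botstein et al., PIC = 1 − Σp² − ΣΣ 2p_i²p_j² (i < j). It is an illustration only and is not the POPGENE or CERVUS implementation, which may apply sample-size corrections or treat band data differently; the function names and the example frequencies are made up. The marker index follows the formula given in the text.

```python
import math

def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without any sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)

def shannon_index(freqs):
    """Shannon Information Index I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homo = sum(p * p for p in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1.0 - homo - cross

def marker_index(pic_values, npb):
    """MI = NPB * PICav with PICav = sum(PIC_i) / NPB, as defined in the text."""
    pic_av = sum(pic_values) / npb
    return npb * pic_av

# Made-up allele frequencies at one locus (they must sum to 1)
freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(freqs), 4))
print(round(shannon_index(freqs), 4))
print(round(pic(freqs), 4))
```

Note that, with PICav defined as ∑PICi/NPB, the product NPB × PICav reduces algebraically to the sum of the PICi values.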

Validation of EST-SSR Molecular Markers and Genetic Diversity Analysis in Polygonatum Mill.
3.3.1. Polymorphism Analysis of the Newly-Developed EST-SSR Molecular Markers. To evaluate the amplification efficiency of the newly developed EST-SSR markers, a total of 500 markers based on SSR-containing sequences were randomly selected for validation and assessment of polymorphism in different Polygonatum Mill. samples (DH-1-AH, D-1-HN, and HJ-2-HN). Of the selected primer pairs, 241 produced clear and reproducible amplification products. One hundred and fifty-four EST-SSR markers showed polymorphism and high amplification efficiency among the tested germplasms. The motifs, primer information, and product sizes of the tested EST-SSRs are listed in Supplementary Table S1.
The polymorphic EST-SSR markers were used to evaluate the genetic diversity of 10 Polygonatum Mill. germplasm resources. All primer pairs produced amplified fragments, and a total of 845 alleles were obtained from 154 EST-SSRs in the 10 germplasms. The nondenaturing polyacrylamide gel results for some primers are shown in Figure 2. The number of alleles (Na) ranged from 3 to 9, with an average of 5.4870. The Shannon Information Index (I) ranged from 0.3406 to 0.6929, with an average of 0.6177. The Polymorphism Information Content (PIC) value varied from 0.163 to 0.849, with a mean of 0.6005 (PIC > 0.5), which indicated that these loci contained a considerable amount of genetic information and could be used to analyze the genetic diversity of Polygonatum Mill. The expected heterozygosity (He) ranged from 0.177 to 0.908, with an average of 0.6740 (Table 3).

Cluster Analysis.
A dendrogram was constructed by UPGMA analysis based on the genetic similarity coefficients of the tested germplasm resources (Figure 3). In the dendrogram, the ten germplasm resources were divided into three clusters at a coefficient of 0.53. All the germplasm resources of Polygonatum Mill. clustered by species. Cluster I consisted of four P. cyrtonema Hua accessions (DH-1-AH, DH-2-GX, DH-3-ZJ, and DH-4-HN). Cluster II comprised the P. kingianum Coll. et Hemsl. accessions (D-1-HN and D-2-GZ). All four P. sibiricum Red. accessions (HJ-1-JX, HJ-4-SX, HJ-3-SC, and HJ-2-HN) were grouped in Cluster III (Figure 3).
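The clustering step can be sketched in plain Python as follows. This is a toy illustration of UPGMA on a 0/1 band matrix, not the NTSYS-pc procedure: the text does not state which similarity coefficient was used, so the Jaccard coefficient here is an assumption, and the accession names and band profiles are invented.

```python
def jaccard_distance(a, b):
    """1 - (shared bands / bands present in either profile)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - shared / either

def upgma(names, profiles):
    """Return (nested-tuple tree, list of distances at which merges occurred)."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): jaccard_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    next_id, merges = n, []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest
        new = {}
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            new[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append(d)
        next_id += 1
    return next(iter(clusters.values()))[0], merges

# Invented band profiles for three hypothetical accessions
names = ["A1", "A2", "B1"]
profiles = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
tree, merges = upgma(names, profiles)
print(tree, [round(d, 3) for d in merges])
```

Cutting such a tree at a chosen coefficient, as was done at 0.53 in Figure 3, amounts to keeping only the merges that occur before that threshold is reached.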

The Characteristics of SSR.
With the rapid development of sequencing technology, more and more transcriptome data of Chinese herbal medicines such as S. miltiorrhiza, Dendrobium catenatum, Polygonatum Mill., and G. uralensis have been released [28], which provides a practical basis for developing genomic-SSR [29], EST-SSR, SNP [8][9][10], InDel [30], and KASP molecular markers. In this study, a total of 165,912 SSR loci with a frequency of 25.86% were identified from 435,858 unigenes of the P. sibiricum transcriptome under the MISA screening conditions. Compared with other plants, the occurrence frequency of SSR loci was higher than that of P. cyrtonema Hua (7.89%) and P. ginseng C. A. Meyer (7.3%) [19], but lower than that of G. uralensis (60.10%) [21] and Gentiana macrophylla (30.73%) [23]. It has been reported that differences in the bioinformatics tools, search criteria, and database sizes used to identify microsatellites in different studies may result in different SSR loci frequencies [31,32].
With the increase in the number of repeat units, the distribution frequency of genomic SSRs decreases gradually [33], which was consistent with our results (Table 2). We also found that mononucleotide and dinucleotide repeats were the main repeat types, with mononucleotide repeats being the most abundant (56.52%), followed by dinucleotide repeats (27.31%). The highest proportion of mononucleotide repeat units was also identified in G. uralensis (60.73%) [34], S. splendens Ker-Gawler (41.6%) [35], Punica granatum L. (51.95%) [36], and Eucommia ulmoides (54.34%) [37], whereas dinucleotide repeat units were the dominant motifs in Fagopyrum tataricum (L.) Gaertn (69.72%) [38], Rhododendron simsii Planch (94.58%) [39], and Gastrodia elata Bl (78.88%) [40]. In this study, there were 228 types of SSR repeat motifs in P. sibiricum. The dominant motifs were A/T (58.28%), AG/CT (10.48%), AT/AT (10.48%), and AC/GT (5.12%). The prevalence of A/T was also identified in G. uralensis [34], G. elata Bl [40], P. granatum L. [36], P. cyrtonema Hua, and E. ulmoides [37]. Gur-Arie et al. [41] suggested that this phenomenon may be related to the fact that repeat sequences rich in A/T bases melt more easily in DNA. In addition, this biased result may also be related to the parameter settings of the SSR locus finding tools. Furthermore, according to previous studies, the abundance of dinucleotide repeat sequences may be attributed to the overrepresentation of UTRs compared with open reading frames [42,43]. AG/CT motifs frequently appear in plant EST datasets because the AG/CT motif can represent UCU and CUC codons in an mRNA population, which translate to the amino acids Ala and Leu, which are found in proteins at a higher frequency than other amino acids [11].

SSR Primer Validity and Polymorphism. The length of an SSR is an important factor affecting its polymorphism. Based on their length, SSRs can be categorized as having low (<12 bp), medium (12-20 bp), or high (≥20 bp) polymorphism potential. The total number of SSR loci longer than 12 bp in the transcriptome of P. sibiricum was 105,186, of which 37,290 were more than 20 bp and 67,896 were 12-20 bp. These results indicated that the SSR loci of the P. sibiricum transcriptome were moderately polymorphic. However, the frequency is thought to be influenced by species differences, as well as by the SSR search parameters, database size, and database-mining techniques used in different studies.
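The length-based categories above can be written as a small Python helper, which also makes the arithmetic behind the reported counts explicit. The function names are hypothetical. Because the stated ranges overlap at exactly 20 bp (12-20 bp versus ≥20 bp), this sketch assumes that a 20 bp SSR falls in the high class.

```python
def ssr_length(motif, repeats):
    """Total SSR length in bp = motif length x number of repeats."""
    return len(motif) * repeats

def length_class(length_bp):
    """Classify polymorphism potential by SSR length.

    Assumption: exactly 20 bp is treated as 'high', since the stated
    ranges (12-20 bp and >=20 bp) overlap at that value.
    """
    if length_bp >= 20:
        return "high"
    if length_bp >= 12:
        return "medium"
    return "low"

# Examples at the MISA thresholds used in this study
print(length_class(ssr_length("A", 10)))    # 10 bp
print(length_class(ssr_length("AG", 6)))    # 12 bp
print(length_class(ssr_length("AAG", 5)))   # 15 bp
print(length_class(ssr_length("AAGG", 5)))  # 20 bp

# The two reported classes add up to the reported total
print(37_290 + 67_896)
```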
The degree of polymorphism of molecular markers can be measured by the number of alleles (Na), the expected heterozygosity (He), and the Polymorphism Information Content (PIC). A total of 845 alleles were produced by 154 pairs of primers, with an average of 5.4870 alleles per locus. The abundant alleles of the developed markers indicated that the EST-SSRs were suitable for detecting the genetic diversity of Polygonatum Mill. PIC reflects the allele frequencies present at a single locus or summed over multiple loci and serves as a measure of the discriminatory power of molecular markers. PIC values are generally categorized as low (PIC < 0.25), medium (0.25 < PIC < 0.5), or high (PIC > 0.5) [44]. In this study, the average PIC ... [45]. Therefore, the developed markers exhibited high polymorphism in the tested germplasm. This indicated that the newly screened EST-SSR markers are a useful and informative tool for research on genetics and evolutionary adaptability across a wide variety of Polygonatum Mill. at the species level. The Shannon Information Index (I) and expected heterozygosity (He) also demonstrated that these markers could distinguish these Polygonatum Mill. germplasms well.

Genetic Diversity Analysis.
It has been reported that species identification within Polygonatum Mill. based on morphology is complicated, possibly due to interspecific hybridization in Polygonatum Mill. [2,12].
Recently, molecular markers have played an increasingly important role in the identification of Polygonatum Mill. species. Most species of Polygonatum Mill. can be identified by molecular markers, although several accessions identified as P. cyrtonema Hua by morphological methods could not be resolved using markers derived from the chloroplast genome [12], which reflects the limitations of chloroplast DNA markers. The phylogenetic relationships among Polygonatum Mill. revealed by the SSR markers were highly consistent with the classification of species. These findings indicated that the newly developed EST-SSR molecular markers could separate germplasm from different species and accurately reflect the genetic relationships of different germplasm. Although the polymorphism of EST-SSRs is lower than that of genomic SSR markers, sequences derived from coding regions are more conserved and have better transferability [8][9][10]. In addition, the polymorphism of EST-SSRs may be directly related to gene function, and the identification of germplasm resources by such markers is not affected by environmental factors or material sources. Therefore, EST-SSR molecular markers can provide an effective tool for identifying medicinal plants [11]. These findings indicated not only that Polygonatum Mill. has active metabolic processes but also that it synthesizes a variety of compounds. There are many varieties of Polygonatum Mill. resources across the country, which are easily mixed up, and the prescribed varieties of Polygonatum account for only 7.5% of the total varieties of Polygonatum Mill. [3]. In this study, these three varieties were selected as samples for the ... Interestingly, the molecular markers we developed can accurately separate these three varieties. This result represents a clear advance for our research.
Based on our current results, the three kinds of Polygonatum circulating in the market can be identified efficiently and quickly with the molecular markers developed here. At present, few of the SSR molecular markers reported in the literature combine high amplification efficiency with high polymorphism [2], whereas our research screened out 154 SSR molecular markers with both properties, which exceeds the total number of such SSR molecular markers reported in the literature at present and provides a strong resource for the future development of Polygonatum Mill. molecular markers. In addition, the unigenes with SSRs in the P. sibiricum transcriptome were mostly annotated in the molecular function category. Therefore, we speculate that unigenes with SSR sites in P. sibiricum might be related to molecular function, which points to a direction for later targeted research on specific gene functions based on SSRs.

Conclusion
This study demonstrated comprehensive mining and characterization of specific codominant EST-SSR markers using the P. sibiricum transcriptome. All the tested Polygonatum Mill. resources clustered by species in the UPGMA analysis derived from the data matrices of the developed polymorphic EST-SSR markers. Therefore, the developed EST-SSRs offer great potential for the identification of Polygonatum Mill. and for facilitating marker-assisted selection in Polygonatum Mill. These results would be a valuable resource for future Polygonatum Mill. genetic and genomic studies, as well as a potent molecular tool for studies of evolutionary adaptation and genetic relationships in other species.

Data Availability
The data that support this study are available in the article and accompanying supplementary material.