Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis

Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using the selectively amplified microsatellite (SAM) assay, 86 SSR-containing sequences were obtained from a pineapple genomic library. Of the 94 simple sequence repeat (SSR) loci they contained, 91 (96.8%) were dinucleotide repeats (39 AC/GT and 52 GA/TC repeats, accounting for 42.9% and 57.1%, respectively), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of the expected sizes, and 13 showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; 1397 of them contained SSRs, and clustering these yielded 843 nonredundant sequences containing 1110 SSR loci (217 sequences contained more than one SSR locus). The frequency of SSRs in pineapple ESTs was one per 3.73 kb, and 44 motif types were found. Mononucleotide, dinucleotide, and trinucleotide repeats predominated, together accounting for 95.6%. AG/CT and AGC/GCT were the dominant dinucleotide and trinucleotide repeat types, accounting for 83.5% and 24.1%, respectively. Thirty primer pairs were designed, one for each of 30 randomly selected sequences; 26 of them generated clear and reproducible bands, and 22 showed polymorphism. Eighteen polymorphic primer pairs obtained by either method were selected for a genetic diversity analysis of 48 pineapple breeds; similarity coefficients among these breeds ranged from 0.59 to 1.00, and the breeds could be divided into four groups. Amplification products of five SSR markers were recovered and sequenced; the corresponding repeat loci were identified, and mutations at these loci consisted mainly of changes in repeat copy number and base mutations in the flanking regions.


Introduction
Pineapple (Ananas comosus (L.) Merr.), a member of the genus Ananas in the family Bromeliaceae, is a perennial evergreen herbaceous fruit crop and one of the four best-known tropical fruits, alongside banana, coconut, and mango. Because propagators and local cultivators follow different naming habits, homonyms and synonyms are very common, pineapple nomenclature has become chaotic, and breeds vary greatly within the major groups. This not only hinders the rational use of pineapple germplasm resources but also impedes the breeding of improved pineapple cultivars.
Molecular marker techniques such as RFLP, RAPD, and AFLP have been used in pineapple germplasm analysis. For example, Duval et al. [1] used RFLP markers to study the germplasm diversity of pineapple. De Fátima Ruas et al. [2] analyzed 18 pineapple germplasms with RAPD markers and concluded that the cultivated germplasms in their study had similarity coefficients lower than 0.85. Duval et al. [3] determined pineapple chloroplast DNA polymorphism by RFLP analysis. Kato et al. [4] analyzed intraspecific DNA polymorphism of pineapple with AFLP. Popluechai et al. [5] assessed the genetic diversity of nine pineapple germplasms and divided them into three groups at a similarity coefficient of 0.77. Wöhrmann and Weising [6] developed EST-SSR markers and studied their cross-amplification across bromeliad species, genera, and subfamilies. These studies showed that most of the markers used had low polymorphism, especially among closely related materials. Microsatellite markers have attracted considerable interest and are now widely used because of their comparatively high polymorphism and genome specificity [7]. SSR markers are detected by PCR amplification with specific primers, which can be developed mainly by classical library screening [8], microsatellite enrichment [9,10], 5′-anchored PCR [11], sequence-tagged microsatellite profiling (STMP) [12], selectively amplified microsatellite (SAM) analysis [13], and bioinformatics methods [6,14,15]. Among these methods, SAM generates multilocus SSR fingerprints with only one pair of primers and is highly efficient in developing informative SSRs. In this study, we designed SSR primers using both the SAM method and a bioinformatics method.
Highly informative and reproducible SSR primers were then used to analyze the germplasm diversity of 48 pineapple breeds, with three aims: to reveal the genetic relationships among them, to provide a reference for resolving the current confusion in pineapple nomenclature, and to characterize the mutation patterns of pineapple SSR loci by amplifying, recovering, and sequencing them.

The pineapple materials used in this study are listed in Table 1. DNA was extracted using a modified CTAB method [16]. E. coli strain DH5α, used for transformation, was maintained in our laboratory.

Development of Genomic SSR Markers.
Genomic library construction followed the SAM method [13]. PstI (15 U/μL, 0.3 μL), MseI (10 U/μL, 0.5 μL), 10× NEB buffer II (5 μL), and BSA (10 μg/μL, 0.5 μL) were added to 1 μg of genomic DNA. The digestion was carried out at 37 °C for 1 h and terminated by incubation at 65 °C for 10 min. Then 5 pmol of PstI adaptor and 50 pmol of MseI adaptor were added and incubated at 45 °C for 5 min, after which T4 DNA ligase (0.5 U), dATP (100 mM, 1.8 μL), and reaction buffer were added to a total volume of 30 μL, and the mixture was incubated at 16 °C for 12 h for ligation. SAM-PCR products were separated by denaturing polyacrylamide gel electrophoresis. Building on the work of Hayden and Sharp [13], we increased the number of adaptors; the adaptor and primer sequences used in this study are shown in Table 2. Target bands were recovered, cloned, and sequenced.

Table 2: Adaptor sequences.
PstI adaptor, sense strand: CTC GGA AGC CTC AGT CCC AGA CTG CGT ACA TGC A-OH
PstI adaptor, antisense strand: phos-TGT ACG CAG TCT GGG ACT GAG GCT TCC GAG A-OH
MseI adaptor, sense strand: GAG CAA GGC TCT CAC AAG GAC GAC CGA CGA G-OH
MseI adaptor, antisense strand: phos-TAC TCG TCG GTC GTC CTT GTG AGA GCC TTG CT-

The sequences were screened for SSRs with the MISA software (http://pgrc.ipk-gatersleben.de/misa/) using the following criteria: mononucleotide motifs repeated at least 10 times, dinucleotide and trinucleotide motifs repeated at least six times, and motifs of four or more nucleotides repeated at least five times. Compound SSRs, in which two SSRs are separated by no more than 100 bases, were also included. Equivalent motifs such as AT/TA or CT/AG were regarded as the same type.
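The screening thresholds above can be illustrated with a short scanner. This is a minimal sketch, assuming plain A/C/G/T input, and it is not MISA's own code; the names `find_ssrs`, `canonical`, and `group_compound` are hypothetical. It reports perfect repeats that meet the minimum copy numbers, groups SSRs separated by at most 100 bases into compound SSRs, and collapses rotations and reverse complements into one motif type (so GA, CT, and TC are all counted as AG).

```python
from dataclasses import dataclass
from typing import List

# Minimum copy number per motif length (1-6 nt), following the criteria above.
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 5, 5: 5, 6: 5}
# SSRs separated by at most this many bases are grouped as one compound SSR.
MAX_COMPOUND_GAP = 100

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class SSR:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def canonical(motif: str) -> str:
    """Collapse rotations and the reverse complement into one type (GA, CT, TC -> AG)."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, rc) for i in range(len(m)))


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect repeats meeting MIN_REPEATS; hits nested in longer hits are dropped."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                j = i + k
                while seq[j:j + k] == motif:  # extend over consecutive copies
                    j += k
                if (j - i) // k >= min_rep:
                    hits.append(SSR(i, j, motif, (j - i) // k))
                    i = j  # resume scanning after this repeat tract
                    continue
            i += 1
    # Longest hit first at each start; skip hits lying inside an already-kept hit.
    hits.sort(key=lambda h: (h.start, -(h.end - h.start)))
    kept: List[SSR] = []
    for h in hits:
        if not any(o.start <= h.start and h.end <= o.end for o in kept):
            kept.append(h)
    return kept


def group_compound(ssrs: List[SSR], max_gap: int = MAX_COMPOUND_GAP) -> List[List[SSR]]:
    """Group SSRs whose gap to the previous SSR is <= max_gap (compound SSRs)."""
    groups: List[List[SSR]] = []
    for s in sorted(ssrs, key=lambda h: h.start):
        if groups and s.start - groups[-1][-1].end <= max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

For example, `find_ssrs("GGCC" + "AG" * 8 + "TTCC" + "A" * 12 + "CGCG")` reports an (AG)8 and an (A)12 repeat, and `group_compound` places them in one compound SSR because only 4 bases separate them. A stretch of (AG)5 is not reported, since it falls below the six-copy threshold for dinucleotides.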
For EST-SSR development, ESTs shorter than 100 bp, or containing a poly-T or poly-A stretch (≥5 repeats) within 50 bp of the 5′ end or of the 3′ end, were excluded using the EST-trimmer software (http://pgrc.ipk-gatersleben.de/misa/download/est_trimmer.pl); ESTs longer than 700 bp were truncated to their first 700 bp from the 5′ end. SSRs were then screened with the MISA software using the same criteria as for genome-based development.
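The filtering rules above can be sketched as a small function. This is an illustration of the stated criteria under an assumed reading ("within 50 bp" taken as the first or last 50 bp of the read), not a reimplementation of est_trimmer.pl, whose behavior may differ; `preprocess_est` and `preprocess_all` are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

MIN_LEN = 100      # ESTs shorter than this are excluded
MAX_LEN = 700      # longer ESTs keep only their first 700 bp (5' end)
END_WINDOW = 50    # window checked at each end for poly-A / poly-T
POLY_AT = re.compile(r"A{5,}|T{5,}")  # poly-A or poly-T stretch of >= 5 bases


def preprocess_est(seq: str) -> Optional[str]:
    """Return the EST to keep (clipped to MAX_LEN) or None if it is excluded."""
    seq = seq.upper()
    if len(seq) < MIN_LEN:
        return None
    # Exclude reads with a poly-A/poly-T stretch near either end.
    if POLY_AT.search(seq[:END_WINDOW]) or POLY_AT.search(seq[-END_WINDOW:]):
        return None
    return seq[:MAX_LEN]


def preprocess_all(ests: Iterable[str]) -> List[str]:
    """Apply preprocess_est to a batch and drop the excluded ESTs."""
    kept = (preprocess_est(s) for s in ests)
    return [s for s in kept if s is not None]
```

Under this reading, a poly-A stretch in the middle of a read does not cause exclusion; only stretches in the terminal 50 bp do.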
Cluster analysis was performed with stackPACK, and primers were designed and synthesized in the same way as for the genomic SSRs.

Genetic Diversity Analysis and Detection of Locus Mutations.
Eighteen pairs of the newly developed primers that were highly informative and reproducible were selected for genetic diversity analysis of 48 pineapple breeds. After silver staining, electrophoretic bands were scored with the Banscan software: at each migration distance, presence of a band was recorded as "1," absence as "0," and amplification failure as "9." The genetic distance matrix was calculated with NTSYSpc ver. 2.1 (http://www.exetersoftware.com/), and a dendrogram was constructed with the unweighted pair group method with arithmetic mean (UPGMA). The polymorphism information content of each primer was calculated as $\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2$, where $p_i$ is the frequency of the $i$th of the $n$ alleles at the locus [21]. Primers for five repeat types, (CA)n, (GCAGGA)n, (AG)n, (TCGCAG)n, and (TCT)n, were used to amplify 10 samples that together carried bands of all five repeat types; the bands were then recovered, sequenced, and analyzed for SSR locus mutations. ClustalX was used to align the original sequences with the sequencing results.
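The scoring and statistics above can be illustrated with a short sketch. The `pic` function follows the formula PIC = 1 − Σ p_i². Two further points are assumptions, because the text does not state them: the similarity coefficient is taken to be Dice's (a common choice for band data), ignoring any position scored 9 in either sample, and the UPGMA distance is taken as 1 − similarity. The function names are hypothetical, and this is not NTSYSpc's implementation.

```python
from collections import Counter
from itertools import combinations

MISSING = 9  # failed amplification, scored as "9"


def pic(alleles):
    """PIC = 1 - sum(p_i^2) over the allele frequencies observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def dice(x, y):
    """Dice similarity 2a/(2a+b+c) of two 1/0 band profiles, skipping 9s (assumed coefficient)."""
    a = b = c = 0
    for u, v in zip(x, y):
        if MISSING in (u, v):
            continue
        if u == 1 and v == 1:
            a += 1          # band shared by both samples
        elif u == 1:
            b += 1          # band only in x
        elif v == 1:
            c += 1          # band only in y
    return 2 * a / (2 * a + b + c) if a + b + c else 1.0


def upgma(names, sim):
    """UPGMA (size-weighted average linkage) on d = 1 - similarity.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = {i: (n,) for i, n in enumerate(names)}
    dist = {(i, j): 1.0 - sim[i][j] for i, j in combinations(range(len(names)), 2)}
    merges, next_id = [], len(names)
    while len(clusters) > 1:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        merges.append((merged[:ni], merged[ni:], d))
        del dist[(i, j)]
        for k in list(clusters):
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # Distance to the new cluster is the size-weighted mean of the old distances.
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = merged
        next_id += 1
    return merges
```

Usage: build `sim = [[dice(p, q) for q in profiles] for p in profiles]` from the 1/0/9 band matrix and pass it with the sample names to `upgma`. SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` applies the same average-linkage rule if a library implementation is preferred.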

Development of Genomic SSR Markers.
SAM-PCR products were separated on a denaturing polyacrylamide gel, and bands of 200–750 bp were recovered after silver staining. A total of 99 bands were cloned and sequenced, of which 86 contained SSR loci. The numbers of bands obtained with each combination of anchoring primer and adaptor primer are shown in Table 3. Cluster analysis revealed 68 singletons and eight groups of redundant sequences, giving a total of 76 sequences available for primer design. The PstI SAM primer combined with the 5′-anchored primers PAC/PGT yielded 44 sequences, whereas its combination with PCT/PGA yielded 55, indicating that CT/AG repeats are more abundant than AC/GT repeats in the pineapple genome (Table 3). Screening all sequences with MISA identified 52 GA/CT and 39 AC/GT repeat loci, in agreement with the results obtained with the different anchoring primers. Three mononucleotide repeats were also found, but no trinucleotide or longer repeat loci.
Thirty-six primer pairs flanking the SSR loci were designed, one for each of 36 SSR-containing sequences; 24 of them generated clear, reproducible bands of the expected size, and 13 showed polymorphism when amplifying the selected samples.

Development of EST-SSR Markers.
A total of 5659 EST sequences with a combined length of 4,141.084 kb were downloaded from the NCBI database. MISA analysis identified 1397 EST sequences containing 1839 microsatellite loci (Figure 1). SSR-containing sequences thus made up 24.69% of all sequences (one SSR-containing EST in every 4.05 ESTs), with one microsatellite locus every 2.25 kb. Cluster analysis of the 1397 EST sequences with stackPACK v2.2 yielded 843 nonredundant SSR-containing sequences, comprising 620 singletons and 223 clusters of redundant sequences. MISA identified 1110 SSR loci in these sequences, and 217 sequences contained more than one SSR locus. Of the 1110 SSR loci, 952 were simple SSRs and 158 were compound SSRs.
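The frequency figures above are simple ratios of the reported counts. The sketch below recomputes them; `ssr_density` is a hypothetical helper, and the only inputs are the counts stated in the text.

```python
def ssr_density(n_seqs: int, total_kb: float, n_ssr_seqs: int, n_loci: int) -> dict:
    """Summary frequencies of SSRs in a sequence set."""
    return {
        "pct_seqs_with_ssr": 100 * n_ssr_seqs / n_seqs,  # share of sequences with >= 1 SSR
        "seqs_per_ssr_seq": n_seqs / n_ssr_seqs,         # one SSR-containing sequence every N
        "kb_per_ssr": total_kb / n_loci,                 # one SSR locus every N kb
    }


if __name__ == "__main__":
    stats = ssr_density(5659, 4141.084, 1397, 1839)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

With these counts the ratios come to about 24.7% of ESTs, one SSR-containing EST per 4.05 ESTs, and one locus per 2.25 kb. Dividing the same 4,141.084 kb by the 1110 nonredundant loci instead gives the one SSR per 3.73 kb quoted in the abstract.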

Genetic Diversity Analysis and Detection of Locus Mutations.
Eighteen pairs of highly informative primers developed from ESTs or from the genomic library were selected for PCR amplification and genetic diversity analysis of … the other one was developed from the genomic library. After PCR amplification and sequencing, these five primer pairs generated 73 sequences, each carrying the repeat corresponding to its SSR marker. Alignment with ClustalX revealed insertions, deletions, transitions, and transversions in these SSR loci and their flanking sequences (Figure 3).

Efficiency of SAM Method to Develop SSR Markers.
We used the SAM method developed by Hayden and Sharp [13] to obtain SSR-positive clones from the pineapple genome for sequencing. Of the 99 clones sequenced, 86 contained a total of 94 SSR loci. Thirty-six of these sequences were selected and one primer pair flanking the SSR locus was designed for each; 24 of the 36 pairs generated clear and reproducible bands of the expected size, and 13 showed polymorphism when amplifying the selected samples. Overall, 86.9% of the sequenced clones were SSR-positive, and the rate of polymorphic SSR markers was 13.1%, lower than the rates reported for rubber tree (24.6%) [22] and for banana (19.5%) by Wang et al. [23]. This may reflect differences between materials: although SSRs are widely distributed in eukaryotic genomes, their content, type, and copy number vary between species and even within a species. Another possible factor is our modification of primer design: primers were designed on the microsatellite repeat sequence together with only part of the flanking sequence, rather than on the whole original 5′-anchored primer. This may have improved primer reproducibility but reduced polymorphism. Even so, the SAM method is far more efficient for developing SSRs than conventional screening of small-insert genomic libraries or the STMS method. For example, Ujino et al. [8] obtained only three SSR-positive sequences from 6000 clones (0.05%) with the conventional method, and Rajora et al. [24] obtained 71 positive clones from 4028 (1.8%) with the STMS method.
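The efficiency figures in this comparison are simple ratios, and recomputing them shows which denominator each uses. The reported 13.1% matches 13 polymorphic markers out of all 99 sequenced clones (13 of the 36 primer pairs would be 36.1%), so that denominator is an inference rather than something the text states. The helper `rate` is a hypothetical name.

```python
def rate(hits: int, total: int, digits: int = 1) -> float:
    """Percentage of `total` represented by `hits`, rounded to `digits` places."""
    return round(100 * hits / total, digits)


# Counts from the text; using 99 as the denominator of the 13.1% rate is an inference.
RATES = {
    "SAM: SSR-positive clones": rate(86, 99),
    "SAM: polymorphic markers per sequenced clone": rate(13, 99),
    "Conventional library [8]: positive clones": rate(3, 6000, digits=2),
    "STMS [24]: positive clones": rate(71, 4028),
}

if __name__ == "__main__":
    for label, pct in RATES.items():
        print(f"{label}: {pct}%")
```

The printed values are 86.9%, 13.1%, 0.05%, and 1.8%, matching the figures given above.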

Comparison of the Development of Genomic SSR and EST-SSR.
EST-SSR markers have unique advantages [27], including the ability to detect polymorphism in expressed regions of the genome, high versatility, and relatively low development cost. They are therefore valuable for genetic mapping, assessment of genetic resource diversity, discovery and localization of functional genes, and studies of species origin, evolution, and comparative genomics [28]. Wöhrmann and Weising [6] screened the NCBI database for SSRs with the criteria of at least 15 repeats for mononucleotides, at least seven for dinucleotides, and at least five for 3–6 nucleotide motifs. They found 42 SSR types in 5659 ESTs, with one SSR every 4.1 kb on average; trinucleotide repeats were the most common, followed by dinucleotide repeats. Ong et al. [19] also developed SSR markers from 5931 ESTs using the SynaRex tool. To ensure comparability between EST-based and genome-based SSR development, the same analysis software and the same SSR screening criteria were used for both in this study. Across EST-based studies, the distribution, frequency, and abundance of SSRs vary with database size, screening criteria, and development tools (Table 6), as also concluded by Varshney et al. [28].
In this study, the rate of polymorphic SSR markers was 13.1% for SAM-developed SSRs and 2.1% for EST-derived SSRs, indicating that SSR development by the SAM method was more efficient than by the EST-based method, consistent with results reported for soybean [29,30] and rubber tree [22,31].

Genetic Diversity Analysis for Pineapple Germplasms Using SSR Markers.
In the present study, the 48 pineapple germplasms were divided into four subgroups, rather than the three groups (Cayenne, Queen, and Spanish) of conventional morphological classification. In the cluster analysis, among the 25 germplasms in the first group, Kallara local, Phuket, New Phuket, Natal Queen, MacGregor, Common Rough, Alexandria, Riply Queen, and others belong to the Queen group, and the others such as Tainong-6, Tainong-18, Tainong … Some taxonomists regarded Perolera as a new breed; in this study it was clustered into the second group, which is closely related to the Cayenne group. The Sarawak germplasm was thought to belong to the Cayenne group but was actually clustered into the Queen group. These discrepancies may have several causes: the lack of unified classification standards for pineapple; confusion of germplasm names, for example one germplasm having multiple names or one name being used for multiple germplasms, owing to frequent regional and international exchange of germplasm and the different naming habits of propagators and local cultivators; and the inherent limitation of morphological classification, since the characteristics of a germplasm are easily affected by environmental conditions. In addition, SSRs reveal variation at the DNA level, that is, differences in genotype between germplasms. The genome contains not only structural genes but also silent genes with no known function, and the observable phenotype results from the expression of functional genes under the influence of both internal and external environments. Differences in DNA structure therefore do not necessarily lead to differences in morphology.

Analysis of SSR Mutation.
SSR mutations arose mainly from base changes in the flanking sequences and the repeat regions. In this study, no insertion or deletion mutations were found at the EP-11 or EP-20 loci, whereas at the EP-12 locus, Alexandria-a, Tainong-17-b, and Red Spanish-b had "T" and "AA" insertions, and the flanking sequences of Bp and EP contained deletions (Figure 3).
Insertion mutations have also been reported in the flanking sequences of maize SSRs [32]. In their study of M. truncatula, Gutierrez et al. [33] found that sequence variation arose mainly from variation in the copy number of repeats in the SSR region, together with insertion, deletion, and nucleotide substitution mutations. Symonds and Lloyd [34] pointed out that interruptions within the SSR region shorten the SSR. In this study, nucleotide substitutions likewise reduced repeat copy number, either dividing a single long repeat into several shorter ones or shortening it. For example, the Golden pineapple had a CAGGAG insertion at the EP-11 "b" locus, increasing the repeat number; in the EP-20 sequence of Tainong-18, a "T" was replaced with "C," reducing the number of TCT repeats; and in Red Spanish-b, an "A" at the EP-12 locus was replaced with "G," dividing the repeat into smaller units.
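The substitution and insertion effects described above can be illustrated by counting tandem copies before and after a mutation. The sketch below uses hypothetical sequences built only for illustration, not the sequenced pineapple alleles: a single T→C substitution splits a (TCT)8 tract into runs of 4 and 3 copies, and inserting CAGGAG (a rotation of the GCAGGA unit) inside a (GCAGGA)3 tract yields (GCAGGA)4. The function name `repeat_tracts` is hypothetical.

```python
from typing import List


def repeat_tracts(seq: str, motif: str, min_copies: int = 2) -> List[int]:
    """Copy numbers of maximal perfect tandem runs of `motif`, left to right."""
    k, i, tracts = len(motif), 0, []
    while i + k <= len(seq):
        j = i
        while seq[j:j + k] == motif:  # extend over consecutive copies
            j += k
        copies = (j - i) // k
        if copies >= min_copies:
            tracts.append(copies)
            i = j  # continue after this tract
        else:
            i += 1
    return tracts


# Hypothetical sequences for illustration only (not the alleles in Figure 3).
ref_tct = "GA" + "TCT" * 8 + "GG"
sub_tct = "GA" + "TCT" * 4 + "TCC" + "TCT" * 3 + "GG"  # one T -> C substitution
ref_hex = "TT" + "GCAGGA" * 3 + "TT"
ins_hex = ref_hex[:3] + "CAGGAG" + ref_hex[3:]         # CAGGAG inserted inside the tract

if __name__ == "__main__":
    print(repeat_tracts(ref_tct, "TCT"), repeat_tracts(sub_tct, "TCT"))        # [8] [4, 3]
    print(repeat_tracts(ref_hex, "GCAGGA"), repeat_tracts(ins_hex, "GCAGGA"))  # [3] [4]
```

Because the scan counts only perfect copies, the substituted base ends the first tract, which is the mechanism by which a point mutation lowers the apparent repeat number.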