Microsatellites or SSRs (
SSRs can be found in different regions of genes, that is, coding sequences, untranslated
sequences (5′-UTR and 3′-UTR), and introns, where expansions and/or
contractions can lead to gain or loss of gene function [
Protocols for
isolating SSR loci in a new species have always been very labor-intensive. Currently,
with the accumulation of biological data from whole-genome sequencing
initiatives, bioinformatics tools help to maximize the
identification of these sequences and, consequently, the number of
markers generated [
The first in silico studies of SSRs were carried out
using FASTA [
SSR detection is generally followed by a second program that designs primers
anchored on the flanking sequences. In some applications, a third step using e-PCR
[
In the present work, a computing tool with an interface for Windows users, called SSR Locator, was developed. The
application integrates the following functions: (i) detection and
characterization of SSRs and minisatellite motifs between 1 and 10 base pairs; (ii)
primer design for each locus found; (iii)
simulation of PCR (polymerase chain reaction), amplifying fragments
with different primer pairs from a given set of fasta files; (iv) global alignment
between amplicons generated by the same primer pair; and (v) estimation of global
alignment scores and identities between amplicons, which indicate
primer specificity and redundancy. The described tool is publicly available at
the site
The algorithms used for the searches, alignment, and homology estimates are described separately.
The algorithm for perfect and imperfect micro-/minisatellite searches was written in Perl. It generates a matrix combining A (adenine), T (thymine), C (cytosine), and G (guanine) in all possible arrangements of 1 to 10 nucleotides. The script then reads the fasta files and searches each database sequence for every one of these arrangements.
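The motif-generation and search step can be illustrated with a short Python sketch. The original Perl script is not reproduced here, so the function names (`all_motifs`, `find_perfect_repeats`) are hypothetical.

The sketch also makes two assumptions of its own:
- It skips non-primitive motifs such as AGAG, which are repeats of a shorter unit and would otherwise report the same run twice.
- It does not reconcile phase-shifted calls: a run such as AGAGAG is matched by both AG and GA.

Enumerating every motif of 1 to 10 nt yields more than a million candidates, so the example uses small lengths.

```python
import re
from itertools import product

BASES = "ACGT"

def is_primitive(motif: str) -> bool:
    """True if motif is not a tandem repeat of a shorter unit (AGAG -> False)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True

def all_motifs(min_len: int = 1, max_len: int = 10):
    """Yield every primitive motif of min_len..max_len nucleotides."""
    for k in range(min_len, max_len + 1):
        for combo in product(BASES, repeat=k):
            motif = "".join(combo)
            if is_primitive(motif):
                yield motif

def find_perfect_repeats(seq: str, motif: str, min_repeats: int):
    """Return (start, end, n_repeats) for each run of >= min_repeats copies of motif."""
    # e.g. motif "AG", min_repeats 3 -> regex "(?:AG){3,}" (greedy, so runs are maximal)
    pattern = re.compile(f"(?:{motif}){{{min_repeats},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((m.start(), m.end(), (m.end() - m.start()) // len(motif)))
    return hits

print(find_perfect_repeats("TTAGAGAGAGCC", "AG", 3))  # [(2, 10, 4)]
```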
Several instructions in the algorithm used in SSRLocator resemble those from MISA [
The SSR Locator software provides windows for selecting and configuring SSR and minisatellite types (mono- to 10-mers) and the minimum number of repeats for each selected type. The algorithm calls a perfect repeat when the nearest adjacent locus, upstream or downstream, lies more than 100 bp away.
The algorithm calls an imperfect repeat when the same motif occurs on both sides of an interrupting fragment of up to 5 base pairs.
The algorithm
identifies a composite locus when two or more adjacent
loci are found at distances between 6 and 100 bp [
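The three distance rules above can be expressed as a small classifier for a pair of neighbouring loci. This is a minimal sketch under stated assumptions:
- The `Locus` record and `classify_adjacent` function are hypothetical.
- Coordinates are 0-based with exclusive ends.
- The text does not say how two different motifs separated by 0 to 5 bp are treated, so that case returns `None`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Locus:
    motif: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def classify_adjacent(left: Locus, right: Locus) -> Optional[str]:
    """Classify two neighbouring loci (left lies before right) by the gap between them."""
    gap = right.start - left.end
    if gap < 0:
        raise ValueError("loci overlap or are out of order")
    if gap > 100:
        return "separate"    # more than 100 bp apart: each can be called a perfect repeat
    if gap >= 6:
        return "composite"   # 6 to 100 bp apart
    if left.motif == right.motif:
        return "imperfect"   # same motif interrupted by up to 5 bp
    return None              # different motifs within 5 bp: not covered by the rules above
```

For example, two AG runs separated by 3 bp are merged into one imperfect locus, whereas an AG run followed by a CAT run 50 bp downstream forms a composite locus.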
In this study, only “Class I” (≥20 bp) repeats are shown. These repeats
have been described as the most efficient loci for use as molecular markers
[
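The "Class I" criterion (total repeat length of at least 20 bp) implies a minimum number of tandem copies for each motif length. The sketch below derives those minima. For mono- to 9-mers they match the lowest repeat counts in the distribution table, but the per-type thresholds actually configured in SSR Locator are user-set. The function names here are illustrative only.

```python
import math

CLASS_I_MIN_BP = 20  # "Class I" threshold used in this study (>= 20 bp)

def min_repeats_class_i(motif_len: int) -> int:
    """Smallest number of tandem copies of a motif_len-bp motif that reaches 20 bp."""
    return math.ceil(CLASS_I_MIN_BP / motif_len)

def is_class_i(motif: str, n_repeats: int) -> bool:
    """True if motif repeated n_repeats times spans at least 20 bp."""
    return len(motif) * n_repeats >= CLASS_I_MIN_BP

print([min_repeats_class_i(k) for k in range(1, 11)])
# [20, 10, 7, 5, 4, 4, 3, 3, 3, 2]
```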
To validate the efficiency of SSRLocator in finding SSRs and minisatellites, the same database was analyzed with MISA and SSRIT, using the same minimum numbers of repeats.
A module written in Delphi calls Primer3 [
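Preparing Primer3 input amounts to writing one Boulder-IO record per SSR locus, with the repeat marked as the target region that the primers must flank. The sketch makes the following assumptions:
- It uses tag names from Primer3 2.x (`SEQUENCE_ID`, `SEQUENCE_TEMPLATE`, `SEQUENCE_TARGET`, `PRIMER_PRODUCT_SIZE_RANGE`). The Primer3 version called by SSR Locator may have used older tag names.
- The default product size range is a placeholder.
- The helper names and the tuple layout of `loci` are hypothetical.

```python
def primer3_record(seq_id: str, template: str, ssr_start: int, ssr_len: int,
                   product_size_range: str = "100-300") -> str:
    """Build one Boulder-IO record asking Primer3 to flank the SSR."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Region the primer pair must flank, as "start,length" (0-based by default)
        f"SEQUENCE_TARGET={ssr_start},{ssr_len}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size_range}",
        "=",  # a line containing only '=' terminates the record
    ]
    return "\n".join(lines) + "\n"

def write_primer3_input(path: str, loci) -> None:
    """loci: iterable of (seq_id, template, ssr_start, ssr_len) tuples."""
    with open(path, "w") as fh:
        for seq_id, template, start, length in loci:
            fh.write(primer3_record(seq_id, template, start, length))
```

The resulting file can then be passed to `primer3_core` on standard input.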
The module that simulates a PCR reaction was written in Delphi. It reads the file generated by the previous module (SSR locus, forward and reverse primers, and original amplicon) and then searches for sequences containing primer annealing sites. When annealing sites are found for both primers, the flanked region and the primer sequences are copied to a new variable called “paralog amplicon.”
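A minimal Python sketch of this virtual PCR step follows; it is not the Delphi implementation. It makes three assumptions not stated in the text:
- Annealing requires an exact match.
- Only the plus strand of each template is searched.
- Each forward site is paired with the nearest downstream reverse site, within a hypothetical `max_len` limit.

The reverse primer is converted to its reverse complement, which is how it appears on the plus strand.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_all(text: str, sub: str):
    """Yield every start index of sub in text (overlapping matches allowed)."""
    pos = text.find(sub)
    while pos != -1:
        yield pos
        pos = text.find(sub, pos + 1)

def virtual_pcr(template: str, forward: str, reverse: str, max_len: int = 2000):
    """Return the amplicons (primers included) that the primer pair yields on template."""
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)  # reverse primer as seen on the plus strand
    amplicons = []
    for i in find_all(template, fwd):
        j = template.find(rev_site, i + len(fwd))  # nearest downstream reverse site
        if j == -1:
            continue
        end = j + len(rev_site)
        if end - i <= max_len:
            amplicons.append(template[i:end])
    return amplicons

print(virtual_pcr("GGGATGCAAAATTTTCCCGCATTT", "ATGC", "CGGG"))
# ['ATGCAAAATTTTCCCG']
```

Running this against every sequence in a second fasta file, and keeping the amplicons that do not come from the original locus, gives the candidate "paralog amplicons."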
For the global alignment between paralog and original amplicon sequences
and the calculation of scores (match, mismatch, gaps),
a routine was written in Delphi implementing
the algorithm of Needleman and Wunsch (1970) [
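For illustration, a textbook Needleman–Wunsch global alignment with linear gap penalties is sketched below in Python. Two choices are assumptions rather than the values used by SSR Locator, which are not given here:
- The scoring values (match = 1, mismatch = -1, gap = -2).
- The identity definition (identical columns divided by alignment length).

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b, identity_percent)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring diagonal moves
    i, j = n, m
    al_a, al_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            al_a.append(a[i - 1]); al_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            al_a.append(a[i - 1]); al_b.append("-"); i -= 1
        else:
            al_a.append("-"); al_b.append(b[j - 1]); j -= 1
    al_a.reverse(); al_b.reverse()
    matches = sum(x == y for x, y in zip(al_a, al_b))
    identity = 100.0 * matches / len(al_a) if al_a else 0.0
    return F[n][m], "".join(al_a), "".join(al_b), identity

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT', 75.0)
```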
The two-language hybrid design was adopted because (i) Perl handles large text files faster than Delphi, and (ii) Perl is better suited to generating the combinatorial strings to be searched. The Perl module was compiled into an executable file, so Perl libraries need not be installed with the program. The graphical interface, which integrates input and output windows with the Windows operating system, was built with the Turbo Delphi suite; a menu system calls each of the modules described above.
A total of 28 469 rice (
A flow chart representing the different steps performed by the software is shown in Figure
Flow-chart showing the functional structure of SSR Locator. (A) Perl script to search SSRs; (B) text file where information from detected SSRs is stored; (C) module for the statistical calculations for SSR motif occurrence; (D) module that formats text files into standard Primer3 input files; (E) running of Primer3; (F) module for running Virtual-PCR (using a second sequence file as a template); (G) module performing global alignment between homologous amplicons; (H) identity and alignment score calculations between homologous amplicons; and (I) file containing SSR, primer, homologous amplicons, identity, and score information.
A total of 3907 micro- and minisatellites were detected by SSRLocator in the 28 469 analyzed cDNA sequences. The same database searched with MISA and SSRIT yielded 3913 and 3917 loci, respectively. The three programs found identical numbers of mono-, 4-mer, 6-mer, 7-mer, 8-mer, 9-mer, and 10-mer repeats. For 2-mer repeats, SSRLocator detected 594 elements, whereas MISA and SSRIT each detected 596. For 3-mer repeats, SSRLocator detected 1990 loci and the other two programs 1994. For 5-mer repeats, SSRLocator and MISA found the same number of repeats (426), whereas SSRIT found 430.
The results obtained with SSRLocator
indicate that, of the 28 469 cDNA sequences, 3765 (13.22%) contained one or
more micro-/minisatellite loci. In other studies, microsatellites were found in
the following proportions of ESTs: 3% in Arabidopsis [
Considering the 3765
The loci detected by SSRLocator comprised 138 monomers,
594 2-mers, 1990 3-mers, 251 4-mers, 426 5-mers, 390 6-mers, 82 7-mers, 6 8-mers, 25 9-mers, and 5 10-mers, corresponding to rates of 3.53%, 15.20%, 50.93%,
6.42%, 10.90%, 9.98%, 2.10%, 0.15%, 0.64%, and 0.13%, respectively (see Table
Distribution of SSR/minisatellite motifs according to the number of repeats.
Repeats | Mono- (%) | 2-mer (%) | 3-mer (%) | 4-mer (%) | 5-mer (%) | 6-mer (%) | 7-mer (%) | 8-mer (%) | 9-mer (%) | 10-mer (%) | Total (%)
---|---|---|---|---|---|---|---|---|---|---|---
3 | — | — | — | — | — | — | 95.12 | 100 | 96 | 100 | 2.89
4 | — | — | — | — | 81.69 | 82.82 | 4.88 | 0 | 4 | 0 | 17.30
5 | — | — | — | 72.11 | 16.20 | 11.54 | 0 | 0 | 0 | 0 | 7.55
6 | — | — | — | 16.33 | 1.64 | 3.33 | 0 | 0 | 0 | 0 | 1.56
7 | — | — | 61.31 | 3.59 | 0 | 1.28 | 0 | 0 | 0 | 0 | 31.58
8 | — | — | 22.16 | 3.59 | 0.23 | 0.26 | 0 | 0 | 0 | 0 | 11.57
9 | — | — | 8.69 | 1.59 | 0 | 0.26 | 0 | 0 | 0 | 0 | 4.56
10 | — | 21.04 | 3.42 | 0.40 | 0 | 0.51 | 0 | 0 | 0 | 0 | 5.02
11 | — | 13.80 | 1.61 | 1.20 | 0 | 0 | 0 | 0 | 0 | 0 | 2.99
12 | — | 12.79 | 0.90 | 0.40 | 0 | 0 | 0 | 0 | 0 | 0 | 2.43
13 | — | 11.95 | 0.25 | 0.40 | 0 | 0 | 0 | 0 | 0 | 0 | 1.97
14 | — | 6.57 | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.05
15 | — | 7.41 | 0.25 | 0 | 0.23 | 0 | 0 | 0 | 0 | 0 | 1.28
16 | — | 5.05 | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.82
17 | — | 5.56 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.87
18 | — | 2.53 | 0.15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.46
19 | — | 2.86 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.49
20 | 15.22 | 2.36 | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.95
21 | 13.77 | 1.35 | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.74
22 | 10.87 | 1.01 | 0.15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.61
23 | 5.80 | 1.18 | 0.15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.46
24 | 2.17 | 0.84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.20
25 | 6.52 | 0.84 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.38
26 | 3.62 | 0.67 | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.28
27 | 2.17 | 0.17 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.13
28 | 0.72 | 0.51 | 0.15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.18
29 | 2.90 | 0 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.13
30 | 1.45 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05
31 | 6.52 | 0.34 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.31
32 | 2.17 | 0.51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.15
33 | 2.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.08
34 | 0.72 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05
35 | 4.35 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.18
36 | 0.72 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05
37 | 0.72 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03
38 | 2.90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.10
39 | 0 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03
40 | 0.72 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03
41 | 0.72 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03
42 | 1.45 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05
43 | 1.45 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05
44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00
≥45 | 10.14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36
Total | 138 | 594 | 1990 | 251 | 426 | 390 | 82 | 6 | 25 | 5 | 3907
(%) | 3.53 | 15.20 | 50.93 | 6.42 | 10.90 | 9.98 | 2.10 | 0.15 | 0.64 | 0.13 | 100
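The class rates quoted above are each class count divided by the 3907 detected loci. Likewise, 3765 of the 28 469 sequences gives 13.22%. A short Python check using only the counts reported in this section (variable names are illustrative):

```python
# Loci per motif length (1 = mononucleotide ... 10 = 10-mer), as reported in the text
counts = {1: 138, 2: 594, 3: 1990, 4: 251, 5: 426, 6: 390, 7: 82, 8: 6, 9: 25, 10: 5}

def class_percentages(counts):
    """Return the total number of loci and each class's share, in percent (2 decimals)."""
    total = sum(counts.values())
    return total, {k: round(100 * n / total, 2) for k, n in counts.items()}

total, pct = class_percentages(counts)
share_with_ssr = round(100 * 3765 / 28469, 2)  # cDNAs carrying at least one locus

print(total, pct[3], share_with_ssr)  # 3907 50.93 13.22
```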
For the
remaining SSR classes, average percentages have been reported as 17–40% for 2-mer,
54–78% for 3-mer, 2.6–6.6% for 4-mer, 0.4–1.3% for 5-mer, and less than 1% for 6-mer
repeats [
The frequency of micro/minisatellite
locus occurrence for each million nucleotides (loci/Mb) [
On Table
Distribution of SSR/minisatellite repeats in the rice cDNA collection. (1) and (2) refer to the first and second motif of each complementary pair (e.g., A and T in A/T).
Class | Motif | Occur. (1) | (%) (1) | Occur. (2) | (%) (2) | Total | (%) Group | (%) Overall
---|---|---|---|---|---|---|---|---
Mono- | A/T | | 88.80 | | 11.20 | | 90.58 | 3.20
Mono- | C/G | | 76.92 | | 23.08 | | 9.42 | 0.33
2-mer | AG/CT | | 36.06 | | 63.94 | | 45.29 | 6.89
2-mer | GA/TC | | 61.37 | | 38.63 | | 39.23 | 5.96
2-mer | CA/TG | | 35.71 | | 64.29 | | 4.71 | 0.72
2-mer | AT | | 100.00 | | — | | 4.04 | 0.61
2-mer | AC/GT | | 31.58 | | 68.42 | | 3.20 | 0.49
2-mer | TA | | 100.00 | | — | | 3.20 | 0.49
2-mer | CG | | 100.00 | | — | | 0.34 | 0.05
3-mer | CCG/CGG | | 53.68 | | 46.32 | | 18.44 | 9.39
3-mer | CGC/GCG | | 61.24 | | 38.76 | | 17.89 | 9.11
3-mer | GCC/GGC | | 53.08 | | 46.92 | | 10.60 | 5.40
3-mer | CTC/GAG | | 42.69 | | 57.31 | | 8.59 | 4.38
3-mer | AGG/CCT | | 30.91 | | 69.09 | | 5.53 | 2.82
3-mer | GGA/TCC | | 62.50 | | 37.50 | | 4.82 | 2.46
3-mer | CAG/CTG | | 76.32 | | 23.68 | | 3.82 | 1.95
3-mer | AAG/CTT | | 50.75 | | 49.25 | | 3.37 | 1.71
3-mer | CGA/TCG | | 54.10 | | 45.90 | | 3.07 | 1.56
3-mer | AGC/GCT | | 62.07 | | 37.93 | | 2.91 | 1.48
3-mer | GCA/TGC | | 83.93 | | 16.07 | | 2.81 | 1.43
3-mer | AGA/TCT | | 62.26 | | 37.74 | | 2.66 | 1.36
3-mer | CCA/TGG | | 75.00 | | 25.00 | | 2.61 | 1.33
3-mer | ACC/GGT | | 48.89 | | 51.11 | | 2.26 | 1.15
3-mer | GAA/TTC | | 63.64 | | 36.36 | | 2.21 | 1.13
3-mer | CAC/GTG | | 65.12 | | 34.88 | | 2.16 | 1.10
3-mer | GAC/GTC | | 54.55 | | 45.45 | | 1.66 | 0.84
3-mer | ACG/CGT | | 42.31 | | 57.69 | | 1.31 | 0.67
3-mer | ATC/GAT | | 45.45 | | 54.55 | | 0.55 | 0.28
3-mer | TCA/TGA | | 50.00 | | 50.00 | | 0.50 | 0.26
3-mer | CAA/TTG | | 50.00 | | 50.00 | | 0.40 | 0.20
3-mer | ACT/AGT | | 42.86 | | 57.14 | | 0.35 | 0.18
3-mer | TAA/TTA | | 14.29 | | 85.71 | | 0.35 | 0.18
3-mer | CTA/TAG | | 66.67 | | 33.33 | | 0.30 | 0.15
3-mer | AAT/ATT | | 20.00 | | 80.00 | | 0.25 | 0.13
3-mer | CAT/ATG | | 100.00 | | 0 | | 0.20 | 0.10
3-mer | AAC/GTT | | 75.00 | | 25.00 | | 0.20 | 0.10
3-mer | ATA/TAT | | 50.00 | | 50.00 | | 0.10 | 0.05
3-mer | GTA/TAC | | 100.00 | | 0 | | 0.05 | 0.03
4-mer | GATC | | 100.00 | | 0 | | 7.17 | 0.46
4-mer | ATTA/TAAT | | 52.94 | | 47.06 | | 6.77 | 0.44
4-mer | ATCG/CGAT | | 20.00 | | 80.00 | | 5.98 | 0.38
4-mer | CATC/GATG | | 40.00 | | 60.00 | | 3.98 | 0.26
4-mer | AGAA/TTCT | | 25.00 | | 75.00 | | 3.19 | 0.20
4-mer | GCTA/TAGC | | 75.00 | | 25.00 | | 3.19 | 0.20
4-mer | GATA/TATC | | 14.29 | | 85.71 | | 2.79 | 0.18
4-mer | GCGA/TCGC | | 42.86 | | 57.14 | | 2.79 | 0.18
4-mer | GCAC/GTGC | | 33.33 | | 66.67 | | 2.39 | 0.15
4-mer | AGGG/CCCT | | 33.33 | | 66.67 | | 2.39 | 0.15
5-mer | AGGAG/CTCCT | | 15.00 | | 85.00 | | 4.69 |
5-mer | CTCTC/GAGAG | | 89.47 | | 10.53 | | 4.46 |
5-mer | GAGGA/TCCTC | | 56.25 | | 43.75 | | 3.76 |
5-mer | CCTCC/GGAGG | | 80.00 | | 20.00 | | 3.52 |
5-mer | AGAGG/CCTCT | | 26.67 | | 73.33 | | 3.52 |
5-mer | GGAGA/TCTCC | | 18.18 | | 81.82 | | 2.58 |
5-mer | CTCGC/GCGAG | | 77.78 | | 22.22 | | 2.11 |
5-mer | AGCTA/TAGCT | | 44.44 | | 55.56 | | 2.11 |
5-mer | GAAAA/TTTTC | | 25.00 | | 75.00 | | 1.88 |
5-mer | AGGCG/CGCCT | | 25.00 | | 75.00 | | 1.88 |
6-mer | CGCCTC/GAGGCG | | 85.71 | | 14.29 | | 3.59 | 0.36
6-mer | CGGCGA/TCGCCG | | 28.57 | | 71.43 | | 3.59 | 0.36
6-mer | CCTCCG/CGGAGG | | 81.82 | | 18.18 | | 2.82 | 0.28
6-mer | AGGCGG/CCGCCT | | 10.00 | | 90.00 | | 2.56 | 0.26
6-mer | CCGTCG/CGACGG | | 44.44 | | 55.56 | | 2.31 | 0.23
6-mer | CGTCGC/GCGACG | | 77.78 | | 22.22 | | 2.31 | 0.23
6-mer | ACCGCC/GGCGGT | | 12.50 | | 87.50 | | 2.05 | 0.20
6-mer | CCACCG/CGGTGG | | 85.71 | | 14.29 | | 1.79 | 0.18
6-mer | GGCGGA/TCCGCC | | 71.43 | | 28.57 | | 1.79 | 0.18
6-mer | CTCCAT/ATGGAG | | 100.00 | | 0 | | 1.54 | 0.15
7-mer | CCGCCGC/GCGGCGG | | 66.67 | | 33.33 | | 7.32 | 0.15
7-mer | CTCTCTC/GAGAGAG | | 80.00 | | 20.00 | | 6.10 | 0.13
7-mer | CCTCTCT/AGAGAGG | | 100.00 | | 0 | | 4.88 | 0.10
7-mer | CTCTCTT/AAGAGAG | | 100.00 | | 0 | | 4.88 | 0.10
7-mer | CCCAAAT/ATTTGGG | | 100.00 | | 0 | | 3.66 | 0.08
7-mer | GCCGCCG/CGGCGGC | | 100.00 | | 0 | | 3.66 | 0.08
7-mer | GCGGCGC/GCGCCGC | | 100.00 | | 0 | | 2.44 | 0.05
7-mer | AATAAAA/TTTTATT | | 100.00 | | 0 | | 2.44 | 0.05
7-mer | GTGTGCG/CGCACAC | | 100.00 | | 0 | | 2.44 | 0.05
7-mer | CGCCGTC/GACGGCG | | 100.00 | | 0 | | 2.44 | 0.05
8-mer | TTGGTTTC/GAAACCAA | | 100.00 | | 0 | | 33.33 | 0.05
8-mer | TGGGCTTG/CAAGCCCA | | 100.00 | | 0 | | 16.67 | 0.03
8-mer | GCTTCTTG/CAAGAAGC | | 100.00 | | 0 | | 16.67 | 0.03
8-mer | ACGGGCGA/TCGCCCGT | | 100.00 | | 0 | | 16.67 | 0.03
8-mer | ATGATGTA/TACATCAT | | 100.00 | | 0 | | 16.67 | 0.03
9-mer | TCGGCGGCG/CGCCGCCGA | | 100.00 | | 0 | | 8.00 | 0.05
9-mer | AGGTGGTGG/CCACCACCT | | 100.00 | | 0 | | 8.00 | 0.05
9-mer | CCGGTGCGA/TCGCACCGG | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | ACGAGGAGG/CCTCCTCGT | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | TCCCTTTTC/GAAAAGGGA | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | CGGCATGAA/TTCATGCCG | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | CGGCAGCGA/TCGCTGCCG | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | ACCATCCCG/CGGGATGGT | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | ATGGGCGGC/GCCGCCCAT | | 100.00 | | 0 | | 4.00 | 0.03
9-mer | ATGCAGGGT/ACCCTGCAT | | 100.00 | | 0 | | 4.00 | 0.03
10-mer | AGCCCCAACG/CGTTGGGGCT | | 50.00 | | 50.00 | | 40.00 | 0.05
10-mer | TTTTTTTCTT/AAGAAAAAAA | | 100.00 | | 0 | | 20.00 | 0.03
10-mer | CCTGCTTTGC/GCAAAGCAGG | 1 | 100 | 0 | 0 | 1 | 20 | 0.03
10-mer | ATCTCCGCCG/CGGCGGAGAT | 1 | 100 | 0 | 0 | 1 | 20 | 0.03
The A/T monomer
repeats were found in 125 loci, with 111 (88.80%) and 14 (11.20%) loci formed
by A and T nucleotides, respectively. The C/G motifs were found in 13 loci,
with ten (76.92%) and three (23.08%) loci formed by C and G, respectively. A/T
SSRs were predominant, comprising 90.58% of monomer loci. In the
overall distribution, monomers represent 3.53% of the 3907 detected loci. Motifs
AG/CT and GA/TC were the most frequent, adding up to 84.52% of 2-mer SSRs and
representing 6.89% and 5.96%, respectively, of all 3907 detected occurrences. The motifs CT, GA, and TC were
the most abundant, with 172, 143, and 90 loci, respectively. In maize,
barley, rice, sorghum, and wheat ESTs, the motif AG was described as the most
frequent [
Among 4-mers, 100 different
arrangements were found, of which GATC (7.17%), ATTA/TAAT (6.77%), and
ATCG/CGAT (5.98%) were the most frequent. These motifs add up to 19.92% of the 4-mer
repeats found and represent 1.28% of the overall content of micro-/minisatellites.
In barley ESTs, ACGT was reported as the most abundant motif [
Among 5-mers, 188
different arrangements were detected, the most frequent being CTCCT, CTCTC, and
CCTCC, with 17, 17, and 12 occurrences, respectively. In the analysis of CDS
regions, the ACCCG motif was the most frequent in Arabidopsis, AAAAG in
Primer design for the 3907 detected micro-/minisatellites resulted in 3329 primer pairs, covering 85.20% of loci. Running the “Virtual PCR” module generated a total of 4610 amplicons. A module in SSRLocator checks for primer redundancy. A total of 2397 primer pairs amplified only the fragment from their original locus (specific amplicons), and 932 pairs amplified one or more regions besides the original locus. Of these, 692 pairs amplified two fragments, one from the original site and a second from another region (paralogous), yielding 692 specific plus 692 redundant amplicons. A total of 143, 90, 2, and 5 primer pairs generated three (two redundancies), four (three redundancies), five (four redundancies), and six (five redundancies) fragments, respectively. Altogether, the 932 primer pairs with more than one annealing region produced 932 specific and 1281 redundant amplicons, adding up to 2213 fragments.
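The bookkeeping behind these figures can be checked with a short sketch. A primer pair that yields n fragments contributes one specific amplicon (its original locus) and n - 1 redundant ones. The dictionary encodes the distribution reported above; the function name `tally` is illustrative.

```python
# Number of fragments per primer pair -> number of primer pairs (from the text)
pairs_by_amplicons = {1: 2397, 2: 692, 3: 143, 4: 90, 5: 2, 6: 5}

def tally(pairs_by_amplicons):
    """Summarize specific vs. redundant amplicons from a fragments-per-pair distribution."""
    pairs = sum(pairs_by_amplicons.values())
    multi = sum(c for n, c in pairs_by_amplicons.items() if n > 1)
    redundant = sum((n - 1) * c for n, c in pairs_by_amplicons.items())
    return {
        "pairs": pairs,                                # primer pairs designed
        "multi_site_pairs": multi,                     # pairs with >1 annealing region
        "redundant": redundant,                        # amplicons outside the original locus
        "multi_site_fragments": multi + redundant,     # all fragments from multi-site pairs
        "total": pairs + redundant,                    # one specific amplicon per pair + redundant
    }

print(tally(pairs_by_amplicons))
# {'pairs': 3329, 'multi_site_pairs': 932, 'redundant': 1281, 'multi_site_fragments': 2213, 'total': 4610}
```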
To investigate the ability of these primers to amplify genomic sequences, an additional experiment was performed against the whole rice genomic sequence available at NCBI. The different groups of redundant and nonredundant primer sets, that is, those amplifying one, two, three, or more times in the cDNA database, were tested against the genomic sequence. Of the 2397 nonredundant primers, only 924 amplified a locus in the genomic sequence. This difference was expected because genomic regions are harder to amplify in silico from cDNA-derived primers: if a primer anneals across the boundary between two exons in the cDNA, the intron present in the genomic sequence makes that annealing site unavailable. Notably, of the 924 primers that amplified, 914 (99%) amplified only one genomic locus, in agreement with the cDNA results. When the primer sets that amplified two different cDNAs were run against the genomic sequence, only 294/692 (42.5%) amplified, and 14.5% were able to amplify two different loci. Only one primer set amplified more than two loci. These results indicate that SSR Locator performance was consistent between the two databases for the nonredundant loci; that is, loci that could be amplified in both databases kept their nonredundant status. The changes observed for the redundant loci may have many causes, including redundancy in the cDNA database and biological factors related to primer positioning.
Results of a global
alignment between amplicons from original and redundant sites are shown in Table
Distribution of amplicon alignments for specific and redundant amplicons with varying identity levels.
Identity (%) | 100 | 99 | 98 | 97 | 96 | 95–90 | 89–80 | 79–70 | 69–60 | <60 | Total
---|---|---|---|---|---|---|---|---|---|---|---
Amplicons | 787 | 261 | 151 | 29 | 11 | 8 | 8 | 6 | 5 | 15 | 1281
% | 61.44 | 20.37 | 11.79 | 2.26 | 0.86 | 0.62 | 0.62 | 0.47 | 0.39 | 1.17 | —
SSRLocator was successfully implemented, integrating steps for (1) SSR discovery, (2) primer design, and (3) PCR simulation of the primers obtained from the original sequences against other fasta files. The software also produces reports on frequency of occurrence and nucleotide arrangement, primer lists with all the standard information needed for PCR, and global alignments. The PCR simulation made it possible to identify nonredundant primer pairs, suggesting that these primers are more appropriate for mapping purposes. However, wet-lab experiments are needed to confirm the advantage of nonredundant over redundant primers for mapping.
The micro-/minisatellite frequencies (loci/Mb) obtained in this study may diverge from those reported in the literature. This can be explained by differences in the databases used (redundant ESTs, nonredundant ESTs, and/or full-length cDNAs), in algorithm configurations, and in the minimum requirements set for counting motifs. Another explanation for some contrasting results is that only “Class I” repeats were analyzed in our study.
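The density measure itself is simple: loci divided by the number of analysed nucleotides in millions. Both the numerator and the denominator depend on study choices. The numerator depends on which repeats are counted (for example, only Class I). The denominator depends on how much sequence is analysed, which differs between redundant and nonredundant databases. The sketch below uses hypothetical numbers only.

```python
def loci_per_mb(n_loci: int, total_bp: int) -> float:
    """SSR density: detected loci per million analysed nucleotides."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return n_loci / (total_bp / 1_000_000)

# Hypothetical example: 250 loci found in 2 Mb of sequence
print(loci_per_mb(250, 2_000_000))  # 125.0
```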
The results showed that 932 (27.99%) primer pairs
amplified more than one gene sequence. This is probably due mostly to
primer pairs derived from a specific gene (cDNA) annealing to similar
sites in duplicated genes, since 5607 of the 28 469 genes (19.70%) were described as paralogs in the
annotation of the database used [
Finally, this tool can be used in data mining strategies to find SSR primers in genomic or expressed sequences (ESTs/cDNAs). It can also support microsatellite discovery in databanks of related species, by anchoring primers in orthologous or paralogous regions shared between the databases of two species.
The authors are thankful to the Brazilian Council for Research and Development (CNPq) and the Coordination for Support to Superior Studies (CAPES/Brazil) for grants and fellowships.