Repertoire of SSRs in the Castor Bean Genome and Their Utilization in Genetic Diversity Analysis in Jatropha curcas

Castor bean and Jatropha contain seed oil of industrial importance and share taxonomical and biochemical similarities, which can be exploited by identifying SSRs in the whole genome sequence of castor bean and utilizing them in Jatropha curcas. Whole genome analysis of castor bean identified 580,986 SSRs with a frequency of 1 per 680 bp. Genomic distribution of SSRs revealed that 27% were present in the non-genic regions whereas 73% were present in the putative genic regions, with 26% in 5′UTRs, 25% in introns, 16% in 3′UTRs, and 6% in exons. Dinucleotide repeats were more frequent in introns, 5′UTRs, and 3′UTRs, whereas trinucleotide repeats were predominant in exons. The transferability of 302 randomly selected SSRs from castor bean to 49 J. curcas genotypes and 8 Jatropha species other than J. curcas showed that 211 (∼70%) amplified in Jatropha, of which 7.58% showed polymorphisms in J. curcas genotypes and 12.32% in Jatropha species. The high rate of transferability of SSR markers from castor bean to Jatropha, coupled with a good level of polymorphic information content (PIC; 0.2 in J. curcas genotypes and 0.6 in Jatropha species), suggests that these SSRs would be useful in germplasm analysis, linkage mapping, diversity studies, and phylogenetic analysis in J. curcas as well as other Jatropha species.


Introduction
Biofuel is a renewable fuel that can be used as an alternative or an addition to fossil-derived fuels, with a multitude of environmental benefits. Of late, various oil seed plants suited to a wide range of agroclimatic conditions have been explored as sources of future fuels, owing to the fear that fossil fuels may be exhausted and to the environmental concerns associated with them. Jatropha curcas is a promising bioenergy crop with more than 35% oil content in its seeds, and the chemical characteristics of the oil make it suitable for use in modern combustion engines. The species is native to tropical America and has a heterozygous genome [1][2][3][4]. Taxonomic studies of the genus Jatropha have shown that J. curcas is a primitive ancestral species owing to its morphological distinctness and that other Jatropha species evolved from J. curcas with changes in growth habit [5]. J. curcas crosses readily with other Jatropha species, forming natural hybrid complexes (J. curcas-gossypifolia).
The narrow genetic base of crop plants has been a major limitation in their genetic improvement for desirable traits [6,7]. Previous studies based on RAPD, SSR, and AFLP analysis have indicated that the genetic base of J. curcas is narrow [8][9][10][11]. Basha et al. [11] demonstrated polymorphisms of 61.8% and 35.5% with RAPDs and ISSRs, respectively. They identified 12 microsatellite primers differentiating the toxic and non-toxic Mexican accessions. Sudheer Pamidimarri et al. [12] identified RAPD, AFLP, and one SSR marker differentiating toxic and non-toxic varieties of J. curcas. J. curcas lacks basic genome resources such as a genetic map, molecular markers, and genome libraries, which necessitates the development of additional molecular markers to accelerate genetic improvement programmes. The recent sequencing of the J. curcas genome will enable further progress in its genomics [13].
SSRs occur as frequently as 1 in every 6 kb in plant genomes [14]. The functional role of SSRs varies with their location in the genome [14,15]. Variations in SSRs in 5′UTRs and 3′UTRs are known to affect gene expression [16]. For example, SSRs in the 5′UTRs affect gene regulation and/or gene transcription, and SSRs in the 3′UTRs may cause transcription slippage [14,15]. Large numbers of SSRs have been detected and documented in the transcribed regions of genomes [17,18] and have been used as genetic markers for genotyping, mapping, and positional cloning of genes in different plant species [19][20][21][22].
Conservation in structure and function of genetic loci has been documented and utilized in the development of anchor markers such as in grass genomes [23], crucifers [24], and solanaceous plants [25]. The availability of public genome sequence databases provides an easier alternative for the identification of anchor markers, including SSRs, using bioinformatics, thereby reducing the cost and time span for their development [26][27][28]. Wen et al. [29] identified 241 EST-SSRs and genomic SSR markers in cassava and demonstrated their transfer and polymorphisms among J. curcas accessions.
The castor bean (Ricinus communis) is a perennial shrub with 50-55% seed oil and is mainly cultivated in tropical and subtropical areas of India, China, and Brazil. It is taxonomically and biochemically related to J. curcas because both belong to Euphorbiaceae. A high level of synteny can, therefore, be expected between the two plant species, which can be exploited to develop anchor markers. The genome sequence of castor bean was surveyed for SSRs, which were then utilized in J. curcas and other Jatropha species [30]. The extent of polymorphisms was investigated in SSRs from putative genic (5′ and 3′UTRs, exons, introns) versus non-genic genome regions and in SSRs of different types and repeat numbers.

Annotation of Castor Bean Genome for SSRs.
The castor bean genome sequence (∼400 Mb), consisting of 25,828 contigs (4X coverage), was downloaded from the JCVI website (http://castorbean.tigr.org/), and SSRs were identified using an in-house designed Perl script. The Perl script used regular expressions to locate SSR patterns in the FASTA-formatted sequence files and reported the sequence contig ID, SSR motif, number of repeats, and sequence coordinates for each SSR. The minimum number of repeat units was defined as six for dinucleotides and five for all higher-order motifs, including tri-, tetra-, penta-, and hexanucleotides. The FASTA-formatted sequence file was searched for all possible combinations of dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide repeats. Castor bean genome sequence contigs harboring SSRs were annotated for putative open reading frames, including 5′UTRs and 3′UTRs, using the gene prediction algorithms of FGenesH (http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&sub-group=gfind), because it was cited as the most accurate gene prediction tool [31,32]. SSR motifs were identified in exons, introns, 3′UTRs, 5′UTRs, and non-genic regions of castor bean. Primers were designed from the sequences flanking each SSR repeat motif by using Primer 3.0 (http://frodo.wi.mit.edu/primer3/). The target amplicon size was set to 300-400 bp, with an optimal annealing temperature of 60 °C and an optimal primer length of 20 bp. From the total SSRs identified in the castor bean genome, primer pairs were designed for 302 randomly selected SSRs with >10 repeats from different genome regions: 70 from 5′UTRs, 70 from exons, 42 from 3′UTRs, 57 from introns, and 63 from the non-genic regions.
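The regular-expression search described above can be illustrated with a short sketch. The original Perl script is not reproduced in the text, so the Python code below is only an illustration under stated assumptions: the function names (`parse_fasta`, `find_ssrs`), the 1-based inclusive coordinates, and the filter that skips motifs reducible to a shorter unit are all hypothetical choices, not the authors' implementation. The minimum repeat numbers follow the thresholds given in the text, applied to the di- to pentanucleotide motifs that were searched.

```python
import re

# Minimum repeat numbers from the text: six for dinucleotides, five for
# higher-order motifs. Only di- to pentanucleotides are searched here.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5}


def parse_fasta(text):
    """Yield (contig_id, sequence) pairs from FASTA-formatted text."""
    contig_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if contig_id is not None:
                yield contig_id, "".join(chunks)
            contig_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def find_ssrs(contig_id, seq):
    """Return (contig, motif, repeats, start, end) tuples (1-based, inclusive)."""
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k}) captures one unit; \1{n,} demands at least n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Assumption: skip units that are repeats of a shorter motif,
            # e.g. "ATAT" (really AT) or "AAA" (a homopolymer run).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            repeats = len(m.group(0)) // unit_len
            hits.append((contig_id, motif, repeats, m.start() + 1, m.end()))
    return sorted(hits, key=lambda h: h[3])
```

Calling `find_ssrs` on each record yielded by `parse_fasta` gives one row per SSR with the same four kinds of fields the text lists (contig ID, motif, number of repeats, coordinates).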
2.2. Plant Material, DNA Extraction, and PCR.
The J. curcas genotypes were obtained from the National Bureau of Plant Genetic Resources (NBPGR), New Delhi, India (see Table S1 in supplementary material available online at doi:10.1155/2011/286089), and Jatropha species other than J. curcas were obtained from Dr. K.T. Parthiban of Forest College and Research Institute, Tamil Nadu Agriculture University, Mettupalayam, India. A representative set of 49 genotypes of J. curcas and 9 species of Jatropha, from different geographical regions of India, was used in diversity analysis. Total genomic DNA was isolated from unfurled leaves according to a modified CTAB-based procedure [33]. The quality of DNA was checked in 1% agarose gels. The PCR reactions were performed in a 25 µL reaction volume with annealing temperatures of 55-57 °C (189 markers), 52-54 °C (112 markers), and 51 °C (1 marker). Each PCR reaction consisted of 30 ng genomic DNA, varying amounts of primer pairs (0.3-0.4 µM), 1.5 mM Mg2+, 200 µM dNTPs, and 0.5 units Taq DNA polymerase. Amplification programs included 94 °C for 5 min; 30 cycles of 94 °C for 45 sec, annealing temperature (51-57 °C) for 45 sec, and 72 °C for 2 min; and a final extension of 7 min at 72 °C. Ten µL of each PCR product was mixed with 2 µL of 10X gel loading dye (0.2% bromophenol blue, 0.2% xylene cyanol dye, and 30% glycerol in a TA buffer) and electrophoresed in a 4% agarose gel prepared in 0.5X Tris Borate-EDTA (TBE) buffer (0.05 M Tris, 0.05 M boric acid, 1 mM EDTA, pH 8.0). The gel was run at a constant voltage of 80 volts for 1.5 to 2 h, stained with ethidium bromide, and analyzed using the gel documentation system AlphaImager EP (Alpha Innotech Corp., USA).

Statistical Analysis.
PowerMarker version 3.25 [34] and GenAlEx version 6.1 [35] were used to measure variability at each locus: the observed heterozygosity (HO), the expected heterozygosity (HE), the polymorphism information content (PIC), and the deviation from Hardy-Weinberg equilibrium (HW). Deviations from HW and tests for linkage disequilibrium were evaluated using Fisher's exact tests and sequential Bonferroni corrections. The PIC of each microsatellite locus was determined as described by Weir [36]: PIC = 1 − Σ P_i², where P_i is the frequency of the ith allele in the genotypes examined. The presence or absence of amplicons in the genotypes was scored as 1 or 0, respectively. Pairwise similarity matrices were generated by Jaccard's coefficient of similarity [37] by using the SIMQUAL module of NTSYS-pc [38]. A dendrogram was constructed by using the unweighted pair group method with arithmetic average (UPGMA) with the SAHN module of NTSYS-pc to show a phenetic representation of genetic relationships as revealed by the similarity coefficient [39].
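As a rough illustration of the two per-locus and per-pair quantities named above, the sketch below computes PIC and Jaccard's similarity from toy data. It assumes the commonly used form PIC = 1 − Σ P_i² attributed to Weir, and it assumes band profiles are lists of 1/0 scores in the same marker order for both genotypes. The function names are hypothetical and this is not the PowerMarker or NTSYS-pc code.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its relative frequency in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(alleles):
    """PIC in the assumed form 1 - sum(P_i^2) over all alleles at one locus."""
    return 1.0 - sum(p * p for p in allele_frequencies(alleles).values())


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 1/0 band profiles.

    Bands present in both genotypes divided by bands present in at least
    one; shared absences (0/0) are ignored.
    """
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0
```

Under this form, a locus with a single allele has PIC 0 and a locus with two equally frequent alleles has PIC 0.5, which is why a high major allele frequency goes together with a low PIC.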
Comparative and Functional Genomics

Results
Computational analysis of 25,828 contigs (4X coverage) of the castor bean genome identified 580,986 SSRs with a frequency of 1 per 680 bp. The location of SSRs in the putative genic (exons, introns, UTRs) and non-genic regions of the castor bean genome was inferred by annotation of the 25,828 contigs with the FGenesH gene prediction algorithm. A total of 31,221 genes were predicted in the 25,828 contigs of castor bean.
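The reported SSR frequency can be checked roughly from the figures given in the text. The sketch below assumes a genome size of exactly 400 Mb, whereas the Methods give only an approximate value (∼400 Mb), so the result is expected to be close to, not identical with, the reported 1 per 680 bp. The variable and function names are illustrative.

```python
GENOME_BP = 400_000_000  # assumption: the "~400 Mb" assembly taken as exactly 400 Mb
TOTAL_SSRS = 580_986     # SSRs identified in the castor bean genome
CONTIGS = 25_828         # contigs analysed
GENES = 31_221           # genes predicted by FGenesH


def mean_spacing_bp(genome_bp, n_ssrs):
    """Average number of base pairs per SSR."""
    return genome_bp / n_ssrs


spacing = mean_spacing_bp(GENOME_BP, TOTAL_SSRS)
ssrs_per_contig = TOTAL_SSRS / CONTIGS
genes_per_contig = GENES / CONTIGS

print(f"one SSR per {spacing:.0f} bp")
print(f"{ssrs_per_contig:.1f} SSRs and {genes_per_contig:.2f} genes per contig on average")
```

With these inputs the spacing comes out at roughly 688 bp, in line with the reported frequency; the small difference presumably reflects the exact length of the assembly that was analysed.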

Occurrence and Distribution of SSRs in the Castor Bean Genome.
The abundance of SSRs in different regions of the castor bean genome showed that 73% were located in the putative genic regions and 27% in the non-genic regions. Comparison of SSR densities in the putative genic regions showed that SSRs were more frequent in 5′UTRs (26%) and introns (25%), followed by 3′UTRs (16%) and exons (6%). Analysis of SSRs in the castor bean genome showed that 51% of SSRs were dinucleotide repeats, 29% trinucleotide, 12% tetranucleotide, and 8% pentanucleotide repeats. Dinucleotide repeats were more frequent in the non-genic regions, introns, 5′UTRs, and 3′UTRs, whereas trinucleotide repeats were more common in the exons. The tetra- and pentanucleotide repeats were randomly distributed. Among dinucleotide repeats, SSRs with the (AT)n repeat motif were the most common (43%), with repeat numbers ranging from 7 to 48. The frequency of repeat motifs differed among genomic regions; for example, (AT)n and (AG)n were predominant in 5′UTRs, (TA)n and (AATA)n in 3′UTRs, (AT)n and (TC)n in introns, and (AT)n and (TA)n in the non-genic regions. Analysis of trinucleotide repeat frequencies indicated predominance in the order TCT > GAA > CGC > TTC. Trinucleotide repeats in coding regions encode runs of particular amino acids. The most frequent amino acid runs identified in the castor bean SSRs were serine (TCT)n (16.5%), glutamate (GAA)n (13.6%), arginine (CGC)n (12.3%), and phenylalanine (TTC)n (9.7%).
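The correspondence between trinucleotide motifs and amino acid runs can be made concrete with a small sketch. The codon assignments below are from the standard genetic code; the `frame` parameter is an assumption added for illustration, since the text does not state how the reading frame of each exonic SSR was determined. The example shows that the amino acid encoded by a repeat depends on the frame in which it is read.

```python
# Standard genetic code entries for the four motifs named in the text
# and for their two frame-shifted readings.
CODON_TABLE = {
    "TCT": "Ser", "CTT": "Leu", "TTC": "Phe",
    "GAA": "Glu", "AAG": "Lys", "AGA": "Arg",
    "CGC": "Arg", "GCC": "Ala", "CCG": "Pro",
}


def translate_run(motif, repeats, frame=0):
    """Translate (motif)n read from offset `frame` (0, 1, or 2).

    Returns the list of residues encoded by the complete codons in the run.
    """
    seq = motif * repeats
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(translate_run("TCT", 5))           # serine run when read in frame
print(translate_run("TCT", 5, frame=2))  # the same repeat read as TTC codons
```

Read in frame, (TCT)5 encodes five serines; shifted by two bases, the same sequence is read as TTC codons and encodes phenylalanine, so the motif name alone identifies the amino acid run only when the reading frame is known.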
Out of all SSRs that were successfully transferred to J. curcas and other Jatropha species, 50% contained 15 to 30 repeat units whereas 20% had >30 repeat units. The majority of SSRs with successful amplification and polymorphisms contained more than 15 repeat units (Table 3). The PIC values for polymorphic SSRs varied from 0.1 to 0.5 (average 0.2) in J. curcas genotypes and from 0.3 to 0.7 (average 0.5) in Jatropha species. The SSRs with dinucleotide repeat motifs showed higher allele numbers (average 2.7 per locus), followed by trinucleotide repeats (average 2.3 alleles per locus). To understand the possible relationship between polymorphism of SSR markers and repeat unit length in J. curcas genotypes and Jatropha species, a line graph was plotted between repeat unit length and the number of alleles detected (Figure 1). Wide variation in the number of alleles was seen for SSRs with 16 and 25 repeat units compared to SSRs with low or high numbers of repeat units. An exception to this observation was SSR JM15, which contained the maximum number of repeat units, (TA)42, but showed only two alleles, whereas SSR JM20, with fewer repeat units, (TA)23, showed the highest number of alleles (7).

Genetic Diversity Analysis among J. curcas Genotypes and Jatropha Species.
The major allele frequency (MAF) ranged from 0.4 to 0.9 for J. curcas genotypes and 0.1 to 0.5 for Jatropha species. The observed heterozygosity (HO) ranged from 0.1 to 0.5 (mean = 0.2) in J. curcas genotypes and 0.4 to 0.7 (mean = 0.6) in Jatropha species, and the expected heterozygosity (HE) ranged from 0.1 to 0.5 (mean = 0.2) in J. curcas genotypes and 0.4 to 0.7 (mean = 0.6) in Jatropha species. Hardy-Weinberg probability tests revealed no significant deviations from expected genotype proportions (P > .004). There was no evidence of linkage disequilibrium among loci (P > .001) after corrections for multiple tests. Phylogenetic relationships among different genotypes of J. curcas and 9 species of Jatropha were inferred based on SSR analysis. Jaccard's genetic coefficient for J. curcas genotypes varied from 1.08 to 9.02. The highest genetic dissimilarity coefficient (9.02) was observed between 16 polymorphic SSRs in J. curcas genotypes while the lowest value (1.08) was measured between eight combinations. The UPGMA cluster analysis of Jaccard's coefficient generated a dendrogram for J. curcas genotypes, which illustrated the overall genetic relationship among genotypes (Figure 2). Cluster analysis indicated six distinct clusters of J. curcas genotypes. The J. curcas genotypes 1 (Urli-Kanchan) and 32 (Hissar local) remained as outliers and formed the first and sixth clusters, respectively.
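The UPGMA clustering step can be sketched in a few lines. The code below is a minimal illustration and not the SAHN module of NTSYS-pc: it assumes pairwise distances are supplied as a dictionary keyed by `frozenset` pairs (for example, 1 minus Jaccard's similarity), and the genotype labels and distance values in the usage example are invented for demonstration only.

```python
from itertools import combinations


def average_distance(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances between members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def upgma(labels, dist):
    """Return the merge history as (members_a, members_b, distance) tuples.

    At every step the two clusters with the smallest average pairwise
    distance are joined, which is the UPGMA (average linkage) criterion.
    """
    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], dist))
        merges.append((a, b, average_distance(a, b, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical distances between three invented genotype labels.
toy_dist = {
    frozenset(("G1", "G2")): 0.2,
    frozenset(("G1", "G3")): 0.6,
    frozenset(("G2", "G3")): 0.8,
}
for left, right, d in upgma(["G1", "G2", "G3"], toy_dist):
    print(left, right, round(d, 2))
```

In the toy example G1 and G2 join first, and G3 then joins the pair at the average of its distances to the two members. A genotype that joins the tree only at a large distance, as described for the outlier genotypes above, appears as one of the last merges in this history.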

Discussion
Genomic resources of castor bean have been successfully used for the development and utilization of SSR markers in J. curcas and other Jatropha species, thereby establishing that these SSR markers are a valuable genetic resource for investigating relationships and comparative mapping in Euphorbiaceae. The availability of whole genome sequences and comparative genomics has opened up several avenues for the identification of anchor markers through computational approaches, thus avoiding tedious, costly, and time-consuming techniques.
Frequent occurrence of trinucleotide SSRs in the coding regions has been attributed to their advantages in codon usage, whereas the suppression of non-trinucleotide SSRs in the coding regions may be due to the possible risk of their involvement in frameshift mutations [15,44,46]. Although biased distribution of codon repeats has been demonstrated in several eukaryotic genomes [15,41,44,46,47], the over-representation of specific amino acid runs varies.
The most frequent amino acid runs in A. thaliana are serine, proline, glycine, glutamate, glutamine, and aspartate, and those in O. sativa are alanine, glycine, proline, serine, arginine, and glutamate [41]. The most frequent amino acid runs in Brassica rapa are serine, glutamic acid, aspartic acid, glycine, lysine, and asparagine [45]. The most frequent amino acid runs in the castor bean SSRs were serine, glutamate, arginine, and phenylalanine, which are also the most frequent amino acid runs in the SSRs of other plant genomes [46,48,49]. It has been reported that SSRs with longer repeat motifs are more informative for detection of polymorphisms [43,[50][51][52][53][54]. For example, Sharma and Chauhan [55] identified an SSR with a long repeat motif, (TTC)31, in the iron transporter genes of maize that showed high polymorphism among maize inbreds compared to other repeat motifs. In contrast, we found that SSRs with 16-30 repeat units showed higher polymorphisms than longer SSRs with >30 repeat units. Other studies have also found no relationship or only a weak correlation between SSR polymorphisms and repeat unit length [56][57][58]. High polymorphisms have been detected in SSRs with dinucleotide repeat motifs in pearl millet [59] and white clover [60].
Overall, a low level of genetic diversity was detected among J. curcas genotypes compared to Jatropha species. Basha and Sujatha [9] have reported low levels of diversity among Indian accessions of J. curcas, indicating a narrow genetic base. Ganesh Ram et al. [1] have shown that polymorphisms with 26 RAPD primers were considerably higher (80.2%) in 8 Jatropha species compared to J. curcas genotypes. Sun et al. [10] reported one SSR with two alleles and 14.3% polymorphism with 7 AFLP primers in J. curcas.
The study concludes that SSRs with dinucleotide repeat motifs from 5′UTRs, with 16-30 repeat units, showed higher polymorphisms, suggesting that additional primers designed from such SSRs would have a higher probability of detecting polymorphisms in castor bean, J. curcas, and other Jatropha species (Table S2). The SSR markers developed in this study would be very useful for analysis of germplasm, population genetic structure, and taxonomic relationships in J. curcas and related taxa.

Figure 1: Number of alleles per locus for SSRs of different repeat units in J. curcas genotypes (black) and Jatropha species (gray).

Figure 2: Dendrogram based on allele sharing genetic distances of 49 genotypes of J. curcas on the basis of Jaccard's similarity coefficient.

Table 1: SSRs used to detect polymorphisms in Jatropha curcas genotypes.

Table 2: SSRs used to detect polymorphisms in Jatropha species.

Table 3: Amplification and polymorphism among SSRs of varying repeat units in J. curcas.

[...] (AT)n and (AC)n repeats were more abundant, the castor bean genome contained more (AT)n dinucleotide repeats, which was analogous to the B. rapa genome [41,44,45]. Trinucleotide SSRs were more frequent in the exonic regions of the castor bean genome, analogous to other genomes. The majority of trinucleotide repeats were in the coding regions of the castor bean genome, which may encode amino acid runs.