Definition of Soybean Genomic Regions That Control Seed Phytoestrogen Amounts

Soybean seeds contain large amounts of isoflavones or phytoestrogens such as genistein, daidzein, and glycitein that display biological effects when ingested by humans and animals. In seeds, the total amount of isoflavone, and the amount of each type, varies fivefold between cultivars and locations. Isoflavone content and quality are key to the biological effects of soy foods, dietary supplements, and nutraceuticals. Previously we had identified 6 quantitative trait loci (QTL) controlling isoflavone content using 150 DNA markers. This study aimed to identify and delimit loci underlying heritable variation in isoflavone content with additional DNA markers. We used a recombinant inbred line (RIL) population (n = 100) derived from the cross of “Essex” by “Forrest,” two cultivars that contrast for isoflavone content. Seed isoflavone content of each RIL was determined by HPLC and compared against 240 polymorphic microsatellite markers by one-way analysis of variance. Two QTL that underlie seed isoflavone content were newly discovered. The additional markers confirmed and refined the positions of the six QTL already reported. The first new region, anchored by the marker BARC_Satt063, was significantly associated with genistein (P = 0.009, R² = 29.5%) and daidzein (P = 0.007, R² = 17.0%). The region is located on linkage group B2 and derived the beneficial allele from Essex. The second new region, defined by the marker BARC_Satt129, was significantly associated with total glycitein (P = 0.0005, R² = 32.0%). The region is located on linkage group D1a+Q and also derived the beneficial allele from Essex. Jointly the eight loci can explain the heritable variation in isoflavone content. The loci may be used to stabilize seed isoflavone content by selection and to isolate the underlying genes.


INTRODUCTION
Soybean seed products contain a wide variety of bioactive phytochemicals (eg, isoflavones, saponins, phytic acids, phytosterols, trypsin inhibitors, phenolic acids, and peptides) [1,2,3]. Among these, the plant estrogens genistein, daidzein, and glycitein are a major focus of medical research. Isoflavonoids and other phytoestrogenic compounds have been shown to inhibit various processes during carcinogenesis [1] such as mutation [2], proliferation of cancer cell lines [4], viral promotion [5], and angiogenesis [6]. Tumor numbers are reduced in animals after treatment with soy products [7] or after treatment with soy isoflavonoids [8,9]. Soy phytoestrogens have been implicated in a reduced risk of breast and prostate cancer, cardiovascular disease, and osteoporosis [10,11,12,13,14,15,16,17]. Control of phytoestrogen content could also be used to develop low-estrogen soybean of benefit to young females (aged 0-6), mammary carcinoma patients, zoological animals on soybean supplements, and human males.
Additionally, different phytoestrogens can have opposite effects on animal reproduction and growth [18,19]. The biological effects of these isoflavones appear to depend on species, genotype, age, purity, dose, and cofactors [20].
All soybean organs can produce isoflavones in response to stress, but the seeds are the major site of synthesis for storage. In inheritance studies with diverse cultivars, heritabilities of seed isoflavone amounts range from 20% to 40% [21]. In a recombinant inbred line (RIL) population, the heritability of isoflavones ranged from 22% for genistein to 79% for daidzein and 88% for glycitein [22]. Environmental interactions cause the amount of isoflavones in soybean seed to vary fivefold among years, environments, and genotypes [21,22]. However, DNA markers closely linked to the 6 loci controlling isoflavone content can be used to double content and reduce variability in isoflavone content by 2- to 4-fold [22].
Many genes involved in the production of isoflavones from phenylalanine have been isolated and overexpressed in transgenic plants [23]. A key enzyme in pathway regulation seems to be isoflavone synthase (P450), which is involved indirectly in both the genistein and daidzein biosynthesis pathways. The gene was isolated and shown to be differently regulated in seeds and vegetative organs [23,24]. Transgenic soybean and Arabidopsis plants overexpressing the enzyme produced a significant increase in leaf and seed phytoestrogen content [23], but the increase was less than the fold increase in protein abundance. Therefore, the loci that control the amount of phytoestrogen produced in nontransgenic plants may also regulate production during transgene-driven synthesis.
The purpose of this study is to identify new genetic markers that are linked to loci conditioning variation in daidzein, genistein, and glycitein. This genetic information would then be used in plant breeding to develop soybean varieties that have high and reliable concentrations of beneficial isoflavones.

Plant material
The study involved 100 RILs from "Essex × Forrest" [22,25]. Soybean seeds were planted in two different environments in southern Illinois over two years (1996, 1997) in a randomized complete block (RCB) array as described in [22].

Isoflavone extraction
Isoflavones were extracted from each RIL. Within location, replicates were pooled before extraction. Isoflavone contents were determined by high performance liquid chromatography (HPLC) as described in [22].

DNA isolation
RILs were grown in the greenhouse; 3 g of leaves were collected from 5 to 6 two-week-old seedlings and immediately frozen in liquid nitrogen. The leaves were ground in liquid nitrogen into a very fine powder and DNA was extracted as described in [22]. DNA concentration was measured with a fluorometer and the DNA was diluted to 15-30 ng/µL for use in PCR reactions. The two subpopulations were assayed separately and the results pooled.

Microsatellite amplification
Microsatellite markers from the 20 linkage groups of the soybean genetic map were tested for polymorphism between Essex and Forrest [26]. The primer pairs were purchased from Research Genetics, Inc (Huntsville, Ala, USA). The PCR amplifications were performed in 96-well microtitre plates using a Perkin Elmer (Foster City, Calif) GeneAmp 9600 as described [22]. Two negative controls (with no template DNA), along with DNA of the two parents as positive controls, were included in all the amplification sets. Marker scoring was repeated on 3 occasions by 2 operators. Only unequivocal scores were accepted, to reduce or eliminate marker scoring errors.

Data analysis
Polymorphic DNA markers were compared against all traits by a one-way analysis of variance (SAS Institute Inc, Cary, NC; see [25]). The probability of association of each marker with each trait was determined, and a significant association with an interval was declared if marker(s) in that interval were associated at P ≤ 0.0005. This threshold is an approximate Bonferroni correction, calculated as P < 0.05/100, for the set of independent (unlinked or greater than 10 cM apart) DNA markers. Allowing for the gaps greater than 10 cM in the map, we also accepted P < 0.009 as a significant association if the interval was flanking a single marker. Precedents with mapping other traits have shown these criteria to be valid [22,25,27].
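The single-marker test described above can be sketched in a few lines. This is a minimal illustration in Python, not the SAS procedure used in the study: the function names and the six trait values are hypothetical, and the sketch returns only the F statistic and R², leaving the P value (which comes from the F distribution) to a statistics package.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, R2) for a one-way ANOVA over groups of trait values.

    Each group holds the trait values of the lines that share one marker
    genotype (for RILs, usually the two parental allele classes).
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    # Variation explained by the marker classes.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation within each marker class.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r2 = ss_between / (ss_between + ss_within)
    return f_stat, r2

def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_independent_tests

# Hypothetical trait values for lines carrying each parental allele.
essex_allele = [1.0, 2.0, 3.0]
forrest_allele = [4.0, 5.0, 6.0]
f_stat, r2 = one_way_anova([essex_allele, forrest_allele])

# Threshold stated in the text: 0.05 / 100 independent markers.
threshold = bonferroni_threshold(0.05, 100)
```

Here R² is the share of trait variation explained by the marker classes, which is the quantity reported alongside each P value in the text.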
MapMaker-EXP 3.0 was used to estimate map distances (cM, Haldane units) between linked markers [22]. The log10 of the odds ratio (LOD) for grouping markers was set at 2.0 and the maximum distance was 30 cM. Trait data and marker map were simultaneously analyzed with MapMaker/QTL 1.1 using the F2-backcross genetic model for trait segregation [22,25] to identify the approximate position of QTL within intervals. Putative QTL were inferred when the LOD score exceeded 2.0 at some point in each interval. The position of the QTL was inferred from the LOD peaks at individual loci detected by maximum likelihood test at positions every 2 cM between adjacent linked markers.
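For readers unfamiliar with these units, the following minimal Python sketch shows the standard Haldane mapping function and the meaning of an LOD score. It does not reproduce MapMaker's internals, and the function names are hypothetical.

```python
import math

def haldane_cm(r):
    """Map distance in cM (Haldane) for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def recombination_fraction(d_cm):
    """Inverse Haldane function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))

def lod(log10_lik_with_locus, log10_lik_without_locus):
    """LOD score: log10 of the likelihood ratio (locus present vs absent)."""
    return log10_lik_with_locus - log10_lik_without_locus

def odds_from_lod(lod_score):
    """Odds ratio implied by an LOD score."""
    return 10.0 ** lod_score

# The 30 cM maximum linkage distance expressed as a recombination fraction.
r_at_30_cm = recombination_fraction(30.0)

# The LOD threshold of 2.0 expressed as an odds ratio.
odds_at_threshold = odds_from_lod(2.0)
```

Under the Haldane function, the 30 cM limit corresponds to a recombination fraction of roughly 0.23, and an LOD of 2.0 corresponds to odds of 100 to 1 in favor of a locus.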

Polymorphism and linkage
A total of 600 microsatellite markers were tested and 240 were found to be polymorphic within the Essex × Forrest population. Two hundred and one markers formed 22 linkage groups. By assuming all microsatellite markers map to a single locus in all soybean cultivars, the markers could all be anchored to the 20 linkage groups of soybean [26]. The linked markers encompassed about 2990.4 cM with an average distance of 26.4 cM between markers [28]. The remaining 39 markers were not linked but do map to known locations in the soybean genome [26]. This coverage is comparable to the known recombination distances of about 2500-3000 cM within 20 linkage groups [26,29]. The order of markers was consistent with that reported for the soybean genome [26]. Marker content ranged from three on linkage groups J and A1 to 22 on linkage group G [28].

Genomic regions associated with isoflavone content
To test the effect of seed storage on soybean isoflavone content, seeds from two subpopulations, 40 and 60 RILs, from the same generation were analyzed by HPLC in two consecutive years. All QTL that were detected in any subpopulation are reported because the marker map is sparse in some regions and real QTL may otherwise not be reported [25].

Table 1. DNA markers and intervals most likely to be associated with QTL for seed isoflavone content across two environments. LG: linkage group. a: LOD indicates how much more probable the data are to have arisen assuming the presence of a locus than assuming its absence; LOD threshold = 2.0. b: QTL Var is the amount of variation in the phenotype explained by the DNA marker using MapMaker/QTL. Isoflavone amounts for the aglycones daidzein (Daid), genistein (Gen), and glycitein (Gly) are given in micrograms per gram dry weight.


Subpopulation of 40 RILs
Using the subpopulation of 40 RILs with data from pooled plot replicates, one new chromosomal region was found to contain a QTL for seed isoflavone content (Table 1 and Figure 1). The new region, defined by the marker BARC_Satt129, was significantly associated with total glycitein (P = 0.0005, R² = 32.0%). The region had an LOD score of 3.6 and explained about 34.3% of the total variation in glycitein content. The region was located on linkage group D1a+Q and derived the beneficial allele from Essex. We also confirmed the positions of the two regions previously reported (Meksem et al [22]) on linkage group B1, and we added a new marker, BARC_Satt415, that was significantly (P = 0.0001, R² = 50.0%) associated with total glycitein content. The region had an LOD score of 5.3 and explained nearly 50.3% of the total variation in glycitein content (Table 2 and Figure 2).

Subpopulation of 60 RILs
Using the subpopulation of 60 RILs, with data from pooled plot replicates, a second new chromosomal region was found to contain QTL for seed isoflavone content (Table 1 and Figure 1). This second new region, defined by the marker BARC_Satt063, was significantly associated with genistein (P = 0.009, R² = 29.5%) and daidzein (P = 0.007, R² = 17.0%). The region had an LOD score of 2.9 and explained about 28.0% of the total variation in genistein content. The same region had an LOD score of 2.0 and explained about 16.9% of the total variation in daidzein content. The region is located on linkage group B2 and derived the beneficial allele from Essex. The positions of the regions previously reported (Meksem et al [22]) on linkage groups B1, H, and N (Table 2 and Figure 2) were confirmed.

Table 2. DNA markers and intervals containing QTL for seed isoflavone content across two environments (Meksem et al [22]). * New markers (BARC_Satt415 and BARC_Sct26) mapped recently on linkage group B1 and associated with the QTL that underlie seed isoflavone content. Isoflavone contents for the aglycones daidzein (Daid), genistein (Gen), and glycitein (Gly) are given in micrograms per gram dry weight.

Figure 2. Locations of microsatellite markers and QTL that underlie soybean seed isoflavone amount in regions of the soybean genetic map. The QTL LOD score is the peak LOD score of the interval showing association with seed isoflavone content. LOD scores for the intervals are given. * New microsatellite markers mapped [28] compared to the previous map (Meksem et al [22]). Where marker orders disagree with the Soybase composite map (B1 and K), the most likely map from the Essex × Forrest data was used. Gen: genistein; Daid: daidzein; TGly: total glycitein.

The pooled population of 100 RILs
Using the whole population of 100 RILs, with data from pooled plot replicates, one additional chromosomal region was found to contain QTL for seed isoflavone content (Table 1 and Figure 1). This was the same region found using the subpopulation of 60 RILs with data from each plot replicate. The region defined by the marker BARC_Satt063 was significantly associated with daidzein (P = 0.007, R² = 17.0%) and genistein (P = 0.009,

DISCUSSION
Testing a larger number of microsatellite markers increased the sampling of the soybean genome. Thereby, two additional chromosomal regions containing QTL for seed isoflavone amount were discovered on linkage groups B2 and D1a+Q. The positions of the regions previously reported [22] on linkage groups A1, B1, H, K, and N were confirmed and delimited. The identification of eight loci controlling the accumulation of specific isoflavones suggests that it is possible to combine a high glycitein amount (Essex trait) with high daidzein and genistein amounts (Forrest traits) in a single cultivar (Table 3).
Seed isoflavone content was reported to be variable among soybean cultivars grown in different locations and crop years [31]. High temperature during the seed filling stage was also linked to the amount of isoflavones in seeds [32]. In order to identify stable QTL that underlie stable isoflavone amounts across environments and crop years, seeds from the Essex × Forrest RIL population were analyzed in two consecutive years. In the first year we determined the isoflavone content of a subpopulation of 40 individuals at each of 2 years and locations and then produced a mean isoflavone content value. In the second year we analyzed 60 individuals with pooled samples from 2 years and 2 locations. All QTL identified in the subpopulation of 60 individuals were confirmed in the subpopulation of 40. However, the QTL on linkage group D1a+Q, identified by BARC_Satt129, failed to show an association with the glycitein amount in the subpopulation of 60. We expect that this is caused by sampling error in a small population, not by changes in isoflavone amount due to storage.
Five of the eight regions were found to control soybean seed glycitein content, four controlled seed daidzein content, and two controlled genistein. Two loci controlled two traits. The new region defined by the marker BARC_Satt063 and located on linkage group B2 controlled both seed genistein and seed daidzein. The region located on linkage group B1 and defined by the markers BARC_Satt197, BARC_Satt251, and BARC_Satt415 controlled both seed glycitein and total glycitein contents. These two regions had a consistently strong effect on soybean seed genistein, daidzein, and glycitein contents and could be candidates for positional gene isolation [33].
Considering the possibility of interacting genes and QTL (Figure 3), we examined the Soybase for candidates in the intervals associated with seed isoflavone content. Associations of the loci with other traits were noted. The region located on linkage group N defined by the marker BARC_Satt080, significantly associated with daidzein [22], was also reported to contain a QTL for soybean sudden death syndrome and the pod gene L2 [21]. The region located on linkage group N defined by the marker BARC_Satt237, significantly associated with glycitein [22], was also reported to contain three seed storage protein genes Gy1, Gy2, and Gy6 [3]. The region located on linkage group K, identified by the marker BARC_Satt337 and significantly associated with daidzein, was also found to be associated with high soybean yield [34]. The region located on linkage group H defined by the marker BARC_Satt237, significantly associated with glycitein and genistein [22], was also reported to contain two genes controlling pubescence density and seed hydroperoxide lyase. QTL for resistance to sclerotinia stem rot [35] and iron efficiency [36] were identified and mapped 5 cM from the QTL identified by the marker BARC_Satt129 on linkage group D1a+Q. QTL for plant height [37] and lodging [38] were also identified and mapped 20 and 40 cM, respectively, from the same region. At 5 cM around the region on linkage group B2, a gene for linolenic acid amount in seed and QTL for iron efficiency of plants [39] and oleic acid amount in seed [40] were identified and mapped. No evidence for logical relationships between the QTL for seed isoflavone and the other traits could be inferred (Figure 3), although several seed trait genes and QTL seemed clustered with the isoflavone QTL. No evidence for ESTs that encoded enzymes in the biosynthetic pathways (Figure 3) mapping to the eight intervals was found (in October 2003).

(Figure caption) … [46]. Genes encoding the underlined enzymes have been shown to be activated in maize by C1 and R. Dotted arrows represent multiple steps. Enzymes are indicated in italics (adapted from [24,46]). Glycitein (a glyceollin) is hypothesized to be made from daidzein in a multistep process (hydroxylation, oxidation, and acetylation); see Figure 4.
The detection of QTL with large effects for glycitein and daidzein content and fewer QTL with smaller effects for genistein content of soybean may reflect the heritability, environmental stability [22,36], or the number of loci controlling the traits. Glycitein has been shown to be an active phytoestrogen [41], in some assays exceeding the activity of genistein. Therefore, increased glycitein amounts in seed can have utility. The ability to stabilize [31] phytoestrogen amount is important to soybean value (Table 3). The ability to manipulate both the profile and the total isoflavone content of soybean using the eight markers reported here, and at the same time manipulate the amounts of other bioactive factors [3,42,43,44], will become increasingly important. Improved soybean varieties that produce high amounts of beneficial chemicals can lead both to a new understanding of nutraceuticals and dietary supplements and to new types of them [45,46].
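Selection with linked markers, as discussed above, amounts to keeping the lines that carry the desired parental allele at every chosen marker. The Python sketch below is illustrative only: the two markers and their Essex-derived beneficial alleles follow the text, while the line names, genotype calls, and the function name are hypothetical.

```python
# Desired parental allele at each marker. Both new loci derived their
# beneficial allele from Essex according to the text.
DESIRED = {"BARC_Satt063": "Essex", "BARC_Satt129": "Essex"}

def select_lines(genotypes, desired):
    """Return sorted names of lines with the desired allele at every marker.

    Lines with a missing call (None or absent) at any marker are excluded.
    """
    return sorted(
        line
        for line, calls in genotypes.items()
        if all(calls.get(marker) == allele for marker, allele in desired.items())
    )

# Hypothetical genotype calls for three lines.
genotypes = {
    "RIL_01": {"BARC_Satt063": "Essex", "BARC_Satt129": "Essex"},
    "RIL_02": {"BARC_Satt063": "Forrest", "BARC_Satt129": "Essex"},
    "RIL_03": {"BARC_Satt063": "Essex", "BARC_Satt129": None},
}
selected = select_lines(genotypes, DESIRED)
```

Excluding lines with missing calls is a conservative choice; a breeding program might instead rescore those lines or use flanking markers.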

ACKNOWLEDGMENTS
This research was funded in part by a Grant from the NSF 9872635, ISPOB 02-127-03, and CFAR. The continued support of SIUC, College of Agriculture, and Office of the Vice Chancellor for Research to MJI is highly appreciated. We thank Dr. Patricia Murphy for assistance with the isoflavone analysis. We thank Jeffry Shultz and Kanokporn Triwitayakorn for assistance with marker mapping and data replication and analysis.