Quantitative trait locus (QTL) mapping is usually performed using markers that follow a Mendelian segregation ratio. We developed a new method of QTL mapping that can use markers with segregation distortion (non-Mendelian markers). An EM (expectation-maximization) algorithm is used to estimate QTL and SDL (segregation distortion locus) parameters. The joint analysis of QTL and SDL is particularly useful for selective genotyping. Application of the joint analysis is demonstrated using real data from a wheat QTL mapping experiment.
Segregation distortion is a phenomenon in which the genotypic frequency array of a locus does not follow a typical Mendelian ratio. Depending on the population under investigation, the Mendelian ratio of a locus varies from 1 : 1 for a backcross to 1 : 2 : 1 for an F2 and to 1 : 1 : 1 : 1 for a four-way cross. These ratios hold for codominant markers. For various reasons, a marker may not follow a typical Mendelian ratio. Such a marker is called a distorted marker. For a long time, the effects of distorted markers on the results of QTL mapping were not known. As a precaution, researchers simply discarded all distorted markers in QTL mapping. Recently, we found that distorted markers can be safely used for QTL mapping with no detrimental effect on the results [
Marker segregation distortion is only a phenomenon. The underlying cause is one or more segregation distortion loci (SDL). These loci are subject to gametic selection [
We investigate only interval mapping, in which the model contains a single QTL at a time and the entire genome is scanned by repeatedly calling the same program for different locations of the genome. The technical difference between joint mapping and QTL mapping occurs in only one place. In traditional interval mapping of QTL, the conditional probabilities of genotypes for a QTL are calculated using flanking marker genotypes, with the prior probabilities of QTL genotypes set to the Mendelian ratio. For joint mapping, the genotypic frequencies (segregation ratios) are treated as unknown parameters that are subject to estimation. We use an F2 population as an example to demonstrate the method. Extension to other populations is discussed subsequently.
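The role of the prior genotypic frequencies in the conditional probabilities can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes an F2 population, fully informative codominant flanking markers, the Haldane map function, and the standard F2 transition matrix between linked loci; the function names (`haldane_r`, `f2_transition`, `qtl_genotype_probs`) and the example frequencies are hypothetical.

```python
import math

def haldane_r(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def f2_transition(r):
    """P(genotype at locus B | genotype at locus A) in an F2.

    Genotypes are coded 0 = A1A1, 1 = A1A2, 2 = A2A2; rows index locus A.
    """
    s = 1.0 - r
    return [
        [s * s, 2.0 * r * s, r * r],
        [r * s, s * s + r * r, r * s],
        [r * r, 2.0 * r * s, s * s],
    ]

def qtl_genotype_probs(prior, left_marker, right_marker, d_left_cM, d_right_cM):
    """Conditional QTL genotype probabilities given the two flanking markers.

    `prior` holds the three genotypic frequencies at the putative QTL:
    the Mendelian ratio in traditional interval mapping, or the current
    frequency estimates in the joint analysis (assumption of this sketch).
    """
    t_left = f2_transition(haldane_r(d_left_cM))
    t_right = f2_transition(haldane_r(d_right_cM))
    weights = [
        prior[k] * t_left[k][left_marker] * t_right[k][right_marker]
        for k in range(3)
    ]
    total = sum(weights)
    return [w / total for w in weights]

mendelian = [0.25, 0.5, 0.25]
distorted = [0.08, 0.43, 0.49]  # hypothetical distorted frequencies
p_mendel = qtl_genotype_probs(mendelian, 0, 0, 5.0, 5.0)
p_distort = qtl_genotype_probs(distorted, 0, 0, 5.0, 5.0)
```

With both flanking markers of type A1A1 and 5 cM away on either side, the first genotype dominates under either prior, but its probability is slightly lower when the prior frequency of that genotype is reduced. Only the `prior` argument changes between the two calls, which is the single place where the two analyses differ.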
Let
Let
Let
Let
The MLE (maximum likelihood estimate) of the parameters is solved via an EM algorithm [
The expectation step of the EM algorithm requires computing the expectation of
The maximization step of the EM algorithm requires taking the partial derivatives of
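As an illustration of the two steps, the following Python sketch runs an EM iteration for a deliberately simplified model. It is an assumption-laden toy, not the authors' implementation: each genotype has its own free mean, the residual variance is common, there are no covariates, and `marker_lik[j][k]` is a hypothetical precomputed likelihood of plant j's flanking markers given QTL genotype k. The names `em_step`, `omega`, `mu`, and `var` are likewise hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(y, marker_lik, omega, mu, var):
    """One EM iteration; returns updated parameters and the log likelihood
    evaluated at the parameters that were passed in."""
    n = len(y)
    post, loglik = [], 0.0
    # E-step: posterior probability of each QTL genotype for each plant.
    for j in range(n):
        w = [omega[k] * marker_lik[j][k] * normal_pdf(y[j], mu[k], var)
             for k in range(3)]
        total = sum(w)
        loglik += math.log(total)
        post.append([wk / total for wk in w])
    # M-step: genotypic frequencies, genotype means, residual variance.
    col = [sum(p[k] for p in post) for k in range(3)]
    new_omega = [c / n for c in col]
    new_mu = [
        sum(p[k] * yj for p, yj in zip(post, y)) / col[k] if col[k] > 0 else mu[k]
        for k in range(3)
    ]
    new_var = sum(
        p[k] * (yj - new_mu[k]) ** 2 for p, yj in zip(post, y) for k in range(3)
    ) / n
    return new_omega, new_mu, new_var, loglik

# Toy data (hypothetical): 40 plants with distorted genotype counts 4 : 17 : 19.
true_geno = [0] * 4 + [1] * 17 + [2] * 19
true_means = [1.0, 2.0, 3.0]
y = [true_means[g] + ((7 * j) % 11 - 5) / 10.0 for j, g in enumerate(true_geno)]
marker_lik = [[0.9 if k == g else 0.05 for k in range(3)] for g in true_geno]

# Start from the Mendelian ratio and iterate.
omega, mu, var = [0.25, 0.5, 0.25], [0.5, 2.0, 3.5], 1.0
logliks = []
for _ in range(50):
    omega, mu, var, ll = em_step(y, marker_lik, omega, mu, var)
    logliks.append(ll)
```

The frequency update is simply the average posterior genotype probability across plants, so starting from the Mendelian ratio the estimates drift toward the distorted proportions in the toy data. The recorded log likelihood should not decrease from one iteration to the next, which is a convenient check on any EM implementation.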
Hypothesis tests are complicated when QTL segregate in a non-Mendelian fashion. There are many different hypotheses we can test here. Although the Wald test can be performed for testing the presence of QTL, such a test is not justified for testing the null hypothesis of Mendelian segregation. Therefore, likelihood ratio tests are more justifiable. Regardless of which hypothesis is tested, the full model joint log likelihood function given in (
The null hypothesis is
The null hypothesis is
The null hypothesis is
This example demonstrates the application of the method to joint mapping of QTL and SDL in a wheat QTL mapping experiment. The experiment was conducted by Dou et al. [
The likelihood ratio test statistics were divided by 4.61 to obtain their corresponding LOD scores. The LOD score profiles across the genome are shown in Figure
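The divisor 4.61 is 2 ln 10 (approximately 4.605), the factor that converts a likelihood ratio statistic on the natural-log scale to a base-10 LOD score. A small Python sketch of the conversion follows; the function names are hypothetical.

```python
import math

TWO_LN_10 = 2.0 * math.log(10.0)  # approximately 4.605

def lrt_to_lod(lrt):
    """Convert a likelihood ratio test statistic to a LOD score."""
    return lrt / TWO_LN_10

def lod_to_lrt(lod):
    """Convert a LOD score back to a likelihood ratio test statistic."""
    return lod * TWO_LN_10
```

For example, a LOD threshold of 3 corresponds to a likelihood ratio statistic of about 13.8.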
Estimated parameters for eight loci (QTL/SDL) from the wheat QTL and SDL analysis, using an F2 family derived from two inbred lines of wheat.
Locus | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Type | SDL | QTL | QTL | QTL | QTL | SDL | SDL | SDL |
Chromosome | 1 | 1 | 2 | 2 | 2 | 3 | 5 | 5 |
Position (cM) | 0.00 | 19.8 | 15.78 | 29.57 | 34.79 | 67.32 | 32.11 | 79.79 |
Interval (cM) | 0.00–5.45 | 0.00–35.15 | 12.40–18.18 | 27.41–31.74 | 34.35–36.10 | 59.21–67.32 | 22.00–35.25 | 72.56–102.84 |
LOD score | 3.5 | 3.13 | 15.77 | 29.87 | 21.86 | 3.35 | 18.12 | 8.6 |
| | 0.2099 | 0.1986 | 0.2334 | 0.2684 | 0.2637 | 0.1852 | 0.0763 | 0.1071 |
| | 0.4239 | 0.5083 | 0.5456 | 0.5054 | 0.494 | 0.4568 | 0.4291 | 0.4801 |
| | 0.3663 | 0.2931 | 0.221 | 0.2262 | 0.2423 | 0.358 | 0.4946 | 0.4128 |
0.1435 | 0.2076 | 0.3856 | 0.4438 | 0.3963 | 0.0307 | |||
0.0494 | 0.1594 | 0.3071 | 0.4159 | 0.3542 | 0.0269 | |||
| | 0.2796 | 0.2622 | 0.2032 | 0.16 | 0.1847 | 0.2917 | 0.2913 | 0.2837 |
| | 0.0375 | 0.0962 | 0.3252 | 0.4697 | 0.373 | 0.0022 | 0.005 | 0.0495 |
| | 1.1247 | 1.0619 | 0.9509 | 0.8943 | 0.9398 | 1.1163 | 1.1023 | 1.2503 |
LOD score profiles for the wheat genome. The 5 chromosomes of the genome are separated by the gray reference lines. (a) The top panel shows the LOD profile for testing the significance of QTL for female sterility in wheat (regardless of whether segregation is distorted). (b) The middle panel shows the LOD profile for testing the significance of SDL (regardless of whether a QTL is present). (c) The bottom panel shows the LOD profile for testing both QTL and SDL (joint test, with the null model being no QTL and no SDL).
Estimated genotypic frequencies for the wheat genome. Frequencies of the three genotypes are represented by areas with different patterns. Chromosomes are separated by the gray reference lines.
LOD score profiles for the simulated genome (single chromosome). The horizontal line at LOD = 3 represents the threshold. (a) The top panel shows the LOD profile for testing the significance of QTL for the simulated trait (regardless of whether segregation is distorted). (b) The middle panel shows the LOD profile for testing the significance of SDL (regardless of whether a QTL is present). (c) The bottom panel shows the LOD profile for testing both QTL and SDL (joint test, with the null model being no QTL and no SDL).
One of the major theoretical contributions of this study is the development of the variance-covariance matrix of the estimated QTL-SDL parameters. The covariances between pairs of estimated parameters are not of interest, but the variances of the estimated parameters are important. We report the standard errors for two selected loci, locus 4 (QTL) and locus 7 (SDL). These standard errors are listed in Table
Standard errors of the estimated parameters for loci 4 (QTL) and 7 (SDL) of the wheat F2 mapping population (see Table
Parameter | Locus 4 (QTL): Estimate | Locus 4 (QTL): StdErr (EM) | Locus 4 (QTL): StdErr (Boots) | Locus 7 (SDL): Estimate | Locus 7 (SDL): StdErr (EM) | Locus 7 (SDL): StdErr (Boots) |
---|---|---|---|---|---|---|
| | 0.8943 | 0.03705 | 0.06233 | 1.1023 | 0.06934 | 0.07434 |
| | 0.4438 | 0.03721 | 0.07086 | | 0.06934 | 0.07653 |
| | 0.4159 | 0.05258 | 0.08066 | | 0.09138 | 0.09763 |
| | 0.1600 | 0.01515 | 0.04082 | 0.2913 | 0.02644 | 0.02771 |
| | 0.2684 | 0.02905 | 0.02857 | 0.0763 | 0.01767 | 0.01749 |
| | 0.5054 | 0.03274 | 0.03437 | 0.4291 | 0.03336 | 0.03511 |
Statistical methods for mapping quantitative trait loci are well developed for Mendelian populations. Methods are also available for mapping viability loci or segregation distortion loci when markers do not segregate in a typical Mendelian ratio [
An obvious situation where the joint analysis can be more powerful is QTL mapping with selective genotyping. In most designed selective genotyping experiments, two groups of extreme phenotypes are selected for genotyping. The power increase under selective genotyping has been demonstrated [
Segregation distortion may be caused by gametic selection, zygotic selection, or both. Our model was developed under zygotic selection because we are dealing with the genotypic frequencies. However, if the true cause of segregation distortion is gametic selection, we can still detect the distortion as long as the gametic selection causes the genotypic frequencies to deviate from the expected Mendelian ratio. A model specifically for gametic selection has not yet been developed, but developing one would not be difficult. Like the zygotic selection model, a gametic selection model requires known marker linkage phases. Let us take the F2 population as an example to show the gametic selection model. Denote the frequencies of
The joint analysis developed in this study applies only to line-crossing data where the marker linkage phases are known. It cannot be applied to pedigree data. How to extend the method to pedigrees is not obvious to us at this moment and warrants further investigation. However, extension to other line-crossing families is possible. We have already extended the method to BC (backcross), RIL (recombinant inbred line), DH (doubled haploid), and FW (four-way cross) populations and incorporated them into our QTL mapping program, which is described in the next paragraph. The extension also handles dominant markers and missing marker genotypes.
The proposed joint mapping applies to interval mapping only. Extension to multiple QTL/SDL mapping is difficult. However, interval mapping is still the quickest method of QTL mapping, even though multiple QTL mapping programs are available. Compared with traditional QTL interval mapping, the joint analysis involves one additional step: updating the genotypic frequencies. This step introduces a complication, because the conditional genotypic frequencies given the flanking marker genotypes can no longer be calculated prior to QTL mapping. They must be calculated from the phenotypic values together with the flanking marker genotypes. This complication makes modification of existing QTL mapping programs difficult. Fortunately, we have incorporated the joint QTL/SDL mapping into our QTL mapping program. This program is a SAS procedure called PROC QTL [
Let us define the individual-wise complete-data log likelihood for plant
We now present the score vector and the Hessian matrix. The score vector is denoted by
This appendix provides methods for estimating parameters under various null models that are required for constructing likelihood ratio test statistics.
The log likelihood for the null model is
The log likelihood function for the null model is
The log likelihood function for the null model is
The authors are grateful to Dr. Dou and his research group for making their wheat QTL mapping data available. They also thank three anonymous reviewers for their comments and suggestions on an early version of the manuscript. This project was supported by the National Research Initiative (NRI) Plant Genome of the USDA Cooperative State Research, Education and Extension Service (CSREES) 2007-02784 to SX.