Genome-wide association studies (GWAS) have extensively analyzed single-SNP effects on a wide variety of common and complex diseases and have found many genetic variants associated with these diseases. However, a large portion of the genetic variation remains unexplained. This missing heritability problem might be due to an analytical strategy that limits analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions; it is based on constructive induction, classifying high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, for case-control studies. Many modifications of MDR have been proposed, and the method has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare them with earlier MDR methods through comprehensive simulation studies.
In early genome-wide association studies (GWAS), massive amounts of results have been reported on the associations between single-nucleotide polymorphisms (SNPs) and diseases. By now, 2,051 studies and 14,836 causal variants (
Traditional statistical methods are not well suited for detecting such interactions since the number of SNPs and their interactions increase exponentially. To address these issues, many bioinformatics methods for identifying gene-gene interactions have been proposed and one such method is multifactor dimensionality reduction (MDR) [
In this study, we focus on gene-gene and/or gene-environment interactions associated with the survival phenotype. In prospective cohort studies, survival time has been one of the important phenotypes in studies of associations with gene expression levels measured by high-throughput microarray technology. Similarly, it has been important to identify the effect of SNPs on the survival phenotype in GWAS. A series of extensions of MDR to the survival phenotype has recently been proposed, which includes SurvMDR [
Recently, a simple approach to MDR analysis of gene-gene interactions for quantitative traits, called QMDR, has been proposed [
We compare the power of the proposed methods over various parameters, including heritability, minor allele frequency (MAF), and censoring proportion, with and without adjustment for covariates. We find that the improvements of AFTMDR are less sensitive to the censoring fraction than the original AFTMDR but tend to have less power as the effect of the covariate increases. On the other hand, the improvement of CoxMDR is relatively robust to the censoring fraction and tends to have reasonable power across many combinations of parameters.
The MDR method was originally proposed for a binary phenotype in case-control studies and has since been extended to quantitative traits and various sampling designs. Among those, the SurvMDR was first proposed [
To overcome the drawback of SurvMDR, the CoxMDR method was proposed [
Similarly, the AFTMDR method has also been proposed by using the standardized residual as a new classifier under the accelerated failure time model [
As mentioned in the previous section, the improvement of AFTMDR is needed to make it more robust to the fraction of censoring. Based on the simulated data in [
We first transform the continuous standardized residual into a binary variable instead of taking the sum of the residuals, as is done in AFTMDR. In other words, an individual with a positive standardized residual is regarded as a control, whereas an individual with a negative standardized residual is regarded as a case. As a result, all data are discretized into 0 or 1, and the original MDR algorithm is then applied; we call this method dAFTMDR (discretized AFTMDR). Although dAFTMDR is based on a binary value, like the original MDR, it can adjust for the covariate effect using the standardized residual of the AFT model, whereas the original MDR cannot.
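As a rough illustration of the discretization step, the following Python sketch labels each individual from the sign of the standardized residual, applies an MDR-style high/low-risk labeling to each two-locus genotype cell, and scores the result by balanced accuracy. The function names, the 0/1 coding, the treatment of a residual of exactly zero as a control, and the use of the overall case-to-control ratio as the threshold are assumptions made for illustration, not details taken from the method's definition.

```python
from collections import defaultdict


def discretize_residuals(residuals):
    """Map standardized residuals to case (1) / control (0) labels.

    Negative residual -> case (1); positive residual -> control (0).
    Assumption: a residual of exactly zero is treated as a control.
    """
    return [1 if r < 0 else 0 for r in residuals]


def mdr_high_risk_cells(genotype_pairs, labels):
    """Return the set of two-locus genotype cells labeled high risk.

    Assumption: a cell is high risk when its case/control ratio exceeds the
    overall case/control ratio. The comparison is cross-multiplied so that
    cells without controls do not cause a division by zero.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for cell, y in zip(genotype_pairs, labels):
        counts[cell][y] += 1
    return {
        cell
        for cell, (ctrl, case) in counts.items()
        if case * n_ctrl > ctrl * n_case
    }


def balanced_accuracy(genotype_pairs, labels, high_risk):
    """Mean of sensitivity and specificity of the high/low-risk attribute.

    Assumes at least one case and one control are present.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    tp = sum(1 for c, y in zip(genotype_pairs, labels) if y == 1 and c in high_risk)
    tn = sum(1 for c, y in zip(genotype_pairs, labels) if y == 0 and c not in high_risk)
    return 0.5 * (tp / n_case + tn / n_ctrl)
```

In a full analysis this scoring would be repeated for every candidate SNP pair, and the pair with the largest balanced accuracy would be reported as the best model.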
Secondly, we specify lower and upper bounds for the standardized residuals and replace any value beyond these bounds with the corresponding bound. We then apply the AFTMDR algorithm; we call this method rAFTMDR (restricted AFTMDR). By replacing the extreme values with the prespecified thresholds, the effect of outliers on the standardized residual may be weakened when the distribution of the standardized residual is extremely skewed under heavier censoring. However, the choice of the lower and upper bounds is somewhat arbitrary and should take into account the behavior of the standardized residuals.
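The restriction step amounts to clipping each residual to a fixed interval. A minimal Python sketch is given below; the function name and the bounds used in the usage note are hypothetical, since the choice of thresholds is left open above.

```python
def restrict_residuals(residuals, lower, upper):
    """Clip standardized residuals to the interval [lower, upper].

    Values below `lower` are replaced by `lower`, values above `upper`
    are replaced by `upper`, and all other values are left unchanged.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(r, lower), upper) for r in residuals]
```

For example, with illustrative bounds of -2 and 2, the residuals `[-5.0, -1.0, 0.3, 4.2]` become `[-2.0, -1.0, 0.3, 2.0]`, and the clipped values would then be passed to the usual AFTMDR procedure.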
Recently, a simple MDR approach called QMDR for the quantitative trait has been proposed [
For CoxMDR, we obtain the mean value of the martingale residual for each genotype combination and then compare it with the overall mean of the martingale residual. If the mean value of the martingale residual for a specific genotype combination is greater than the overall mean, the corresponding genotype combination is considered a high-risk group. Otherwise, it is considered a low-risk group, since a larger martingale residual indicates a higher risk than expected. Once all of the genotype combinations are classified as high risk or low risk, a new binary attribute is created by pooling the high-risk genotype combinations into one group and the low-risk combinations into another group. Then we use a
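The mean-comparison rule for labeling genotype combinations can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the "high"/"low" string labels are hypothetical, the residuals are assumed to have been computed beforehand from a fitted Cox model, and a cell whose mean equals the overall mean is assigned to the low-risk group.

```python
from collections import defaultdict


def classify_by_mean_residual(genotype_pairs, residuals):
    """Label each genotype combination as "high" or "low" risk.

    A combination is "high" when the mean residual of the individuals
    carrying it is strictly greater than the overall mean residual.
    """
    overall_mean = sum(residuals) / len(residuals)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, r in zip(genotype_pairs, residuals):
        sums[cell] += r
        counts[cell] += 1
    return {
        cell: "high" if sums[cell] / counts[cell] > overall_mean else "low"
        for cell in sums
    }
```

The resulting dictionary defines the pooled binary attribute: every individual inherits the label of the genotype combination it carries.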
We propose various improvements of AFTMDR and CoxMDR to increase the power for detecting gene-gene interactions with the survival phenotype. We conduct comprehensive simulation studies to compare the power of these improvements with that of the original AFTMDR and CoxMDR.
For the simulation studies, two disease-causal SNPs are considered among 20 unlinked diallelic loci under the assumptions of Hardy-Weinberg equilibrium and linkage equilibrium. For the covariate adjustment, we consider only one covariate, which is associated with the survival time but has no interactions with any SNPs. The simulation datasets are generated from different penetrance functions, which define a probabilistic relationship between high- or low-risk status and SNPs. We consider eight different combinations of two minor allele frequencies (0.2 and 0.4) and four heritabilities (0.1, 0.2, 0.3, and 0.4). For each of the eight heritability-MAF combinations, a total of 5 models are generated, which yields 40 epistatic models with various penetrance functions, as described in [
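To make the genotype-generation assumptions concrete: under Hardy-Weinberg equilibrium with minor allele frequency p, the probabilities of carrying 0, 1, or 2 copies of the minor allele are (1 - p)^2, 2p(1 - p), and p^2, and unlinked loci in linkage equilibrium can be sampled independently. The Python sketch below illustrates this; the function names, the 0/1/2 genotype coding, the sample size, and the seed are assumptions for illustration, and the penetrance-based phenotype step is not shown.

```python
import random


def hwe_genotype_probs(maf):
    """Genotype probabilities for 0, 1, 2 minor alleles under HWE."""
    q = 1.0 - maf
    return [q * q, 2.0 * maf * q, maf * maf]


def simulate_genotypes(n_individuals, n_loci, maf, seed=0):
    """Sample an n_individuals x n_loci genotype matrix.

    Loci are drawn independently, which corresponds to unlinked loci
    in linkage equilibrium. All loci share the same MAF in this sketch.
    """
    rng = random.Random(seed)
    probs = hwe_genotype_probs(maf)
    return [
        rng.choices([0, 1, 2], weights=probs, k=n_loci)
        for _ in range(n_individuals)
    ]
```

For instance, `simulate_genotypes(400, 20, 0.2)` would return 400 rows of 20 genotype codes each; the sample size of 400 is purely illustrative.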
Suppose that SNP1 and SNP2 are the two disease-causal SNPs and let
To generate the survival time, we consider three different models: lognormal, Weibull, and Cox model. For each model, the effect size of the genetic factor is fixed as 1.0 and the effect sizes of adjusted covariate are given as
First, we check whether the false detection rate is close to the expected value when there is no gene-gene interaction effect, because the best model is selected using the maximum balanced accuracy in the MDR algorithm. To do this, we generate 100 datasets from each of the 40 models, for a total of 4000 null datasets. Here the false detection rate is estimated as the percentage of times that the method chooses the two disease-causal SNPs as the best model by chance out of each set of 100 datasets for each model. Table
The false detection rate of AFTMDR, dAFTMDR, rAFTMDR, CoxMDR, qCoxMDR, and qAFTMDR for the lognormal distribution with
MAF   Censoring fraction   AFTMDR   dAFTMDR   rAFTMDR   CoxMDR   qCoxMDR   qAFTMDR
0.2   0                    0.008    0.004     0.006     0.006    0.008     0.003
0.2   0.3                  0.002    0.005     0.008     0.006    0.007     0.005
0.2   0.5                  0.007    0.005     0.004     0.005    0.004     0.003
0.4   0                    0.003    0.006     0.006     0.004    0.008     0.006
0.4   0.3                  0.004    0.004     0.005     0.003    0.006     0.003
0.4   0.5                  0.007    0.006     0.005     0.006    0.005     0.008
MAF: minor allele frequency;
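As a reference point for reading the table, one natural benchmark is the rate at which a specific SNP pair would be picked if, under the null, the best model were chosen uniformly at random among all two-way pairs. With 20 loci there are 190 pairs, so this benchmark is 1/190, or about 0.0053. The uniform-selection assumption and the function name below are illustrative choices.

```python
from math import comb


def chance_detection_rate(n_loci, order=2):
    """Probability of picking one specific SNP set of size `order`
    when all sets of that size are equally likely to be selected."""
    return 1.0 / comb(n_loci, order)
```

Here `chance_detection_rate(20)` evaluates to 1/190, which is of the same order as the entries reported in the table.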
For the power, we consider 100 simulated datasets for each of the 40 models, each including two disease-causal SNPs, and we select the best model over all possible two-way interaction models, without and with adjustment for covariates. The power of dAFTMDR is estimated as the percentage of times dAFTMDR correctly chooses the two disease-causal SNPs as the best model out of each set of 100 datasets for each model. The power of the other improvements is defined in the same way.
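The power estimate described above is simply the fraction of replicate datasets in which the selected best pair coincides with the causal pair. A small Python sketch follows; the function name and the representation of SNPs as integer indices are assumptions for illustration.

```python
def estimate_power(selected_pairs, causal_pair):
    """Fraction of replicates whose best two-way model is the causal pair.

    `selected_pairs` holds one selected SNP pair per simulated dataset.
    Pairs are compared as unordered sets, so (0, 1) matches (1, 0).
    """
    target = frozenset(causal_pair)
    hits = sum(1 for pair in selected_pairs if frozenset(pair) == target)
    return hits / len(selected_pairs)
```

With 100 replicates per model, a returned value of 0.65 would mean that the causal pair was selected in 65 of the 100 datasets.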
Figures
Comparison of the power of AFTMDR, dAFTMDR, and rAFTMDR for the lognormal distribution when
Comparison of the power of AFTMDR, dAFTMDR, and rAFTMDR for the lognormal distribution when
On the other hand, the power of AFTMDR, dAFTMDR, and rAFTMDR behaves similarly when the effect of the covariate increases from
Figures
Comparison of the power of CoxMDR, qCoxMDR, AFTMDR, and qAFTMDR for a Cox model when
Comparison of the power of CoxMDR, qCoxMDR, AFTMDR, and qAFTMDR for a lognormal distribution when
Comparing the simulation results shown in Figures
On the other hand, for a lognormal model, the power of CoxMDR decreases from 0.650 to 0.458 as the censoring fraction increases to 0.3 when the MAF is 0.2 and the heritability is 0.2, whereas the power of qCoxMDR changes from 0.958 to 0.960. In addition, the power of CoxMDR decreases to 0.360 as the censoring fraction increases to 0.5, but the power of qCoxMDR is 0.95, which implies that qCoxMDR is very robust to the censoring fraction. Under the same setting, however, the power of AFTMDR decreases from 0.738 to 0.302 and the power of qAFTMDR decreases from 0.998 to 0.564 as the censoring fraction increases to 0.3. As the censoring fraction increases to 0.5, the power of AFTMDR and qAFTMDR decreases to 0.098 and 0.232, respectively. This result is consistent for both the Cox model and the lognormal model, which implies that only the power of qCoxMDR is robust to heavy censoring, though the power of qAFTMDR is somewhat higher for the lognormal model than for the Cox model. These trends are similar for the Weibull distribution.
In summary, the simulation results show that AFTMDR, dAFTMDR, rAFTMDR, and qAFTMDR are more sensitive to heavy censoring (more than 0.5) than CoxMDR and qCoxMDR across various situations. However, under moderate censoring (less than 0.3), dAFTMDR, rAFTMDR, and qAFTMDR perform much better than the original AFTMDR.
Although many findings from GWAS have been published over the last decade, there is still a missing heritability problem. To search for the missing heritability, we focus on gene-gene interactions, because most common diseases may be due to the complexity of gene-gene and/or gene-environment interactions rather than a single gene effect. Many plausible approaches have been developed by extending existing methods into a more general framework.
In this paper, we propose various improvements to AFTMDR and CoxMDR, which include dAFTMDR, rAFTMDR, qAFTMDR, and qCoxMDR. The motivation for dAFTMDR and rAFTMDR is to improve the power of AFTMDR, whose performance is poor when the censoring fraction exceeds 0.3. To reduce the effect of heavily censored observations, we discretize the standardized residual into a binary value, which yields dAFTMDR. Alternatively, we truncate the extreme values and replace them with specified lower and upper bounds, which leads to rAFTMDR. As shown in the simulation results, both dAFTMDR and rAFTMDR have larger power than the original AFTMDR under moderate censoring but still have low power under heavy censoring.
In addition, we considered the improvement of QMDR, which has been recently proposed in [
In conclusion, the improvement of CoxMDR, namely qCoxMDR, has reasonable power and is robust to heavy censoring, whereas the improvements of AFTMDR, namely dAFTMDR, rAFTMDR, and qAFTMDR, perform better than AFTMDR but are not robust to heavy censoring. More studies on the behavior of the standardized residuals are needed to improve the power of AFTMDR under heavier censoring.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research was supported by Basic Science Research Program through the National Research Foundation (NRF) funded by the Ministry of Education, Science and Technology of Korea (MEST) (NRF: 2013R1A1A3010025 and 2013M3A9C4078158).