Detecting disease-associated SNP-SNP interactions is an important task in genome-wide association studies (GWAS). Because of the heavy computational burden and the diversity of disease models, existing methods suffer from low detection power and long running times. To address these drawbacks, this paper proposes a fast self-adaptive memetic algorithm (SAMA). The crossover, mutation, and selection operators of the standard memetic algorithm are modified to suit the detection of disease-associated SNP-SNP interactions, and a self-adaptive local search is introduced to increase detection power. SAMA is evaluated on a variety of simulated datasets and on a real-world biological dataset, and it is compared with four recently developed methods based on evolutionary algorithms (FHSA-SED, AntEpiSeeker, IEACO, and DESeeker). Extensive experiments show that SAMA outperforms the four compared methods in both detection power and running time.
The development of high-throughput sequencing technology makes it possible to analyze single-nucleotide polymorphisms (SNPs) from thousands of individuals [
In the past few years, many methods have been proposed for detecting two-locus disease models. These algorithms can be categorized into exhaustive search, stochastic search, heuristic search, and swarm intelligence optimization algorithms [
The random search uses probabilistic methods to find the optimal solution [
In recent years, swarm intelligence optimization algorithms inspired by natural phenomena and biological systems have attracted considerable attention for detecting disease-associated SNP-SNP interactions [
One promising approach for tackling the drawbacks mentioned above is to use a fast local search in the evolutionary algorithm. Hybridization of genetic algorithms (GAs) with local search (LS) has already been studied in various optimization problems [
The paper is organized as follows. In Section
A set of SNPs is represented by
Detecting disease-associated SNP-SNP interactions is time-consuming when all possible two-locus interactions among hundreds of thousands of SNPs are considered on a genome-wide scale. In this paper, a fast self-adaptive memetic algorithm (SAMA) is proposed to increase the detection power for two-locus SNP-SNP interactions in an efficient way.
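To make the size of the search space concrete, the number of unordered two-locus combinations among n SNPs is n(n-1)/2. The short sketch below evaluates this count for the two simulated dataset sizes used later in the paper (200 and 2000 SNPs) and for an illustrative genome-wide size of 500,000 SNPs; the last figure is an assumption chosen only to represent "hundreds of thousands", and the function name is ours.

```python
from math import comb


def num_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs among n_snps SNPs, i.e. n(n-1)/2."""
    return comb(n_snps, 2)


# 200 and 2000 match the simulated datasets; 500_000 is an illustrative
# genome-wide size (assumption, not a figure taken from the paper).
for n in (200, 2000, 500_000):
    print(f"{n:>7} SNPs -> {num_pairs(n):,} candidate pairs")
```

The count grows quadratically, which is why an exhaustive scan quickly becomes impractical and a guided search is attractive.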
Memetic algorithm (MA) [
The framework of MA.
The SAMA algorithm randomly generates an initial population with
The crossover operator, a fundamental genetic search operator, takes advantage of the information available in the search space. In the SAMA algorithm, we use a hybrid crossover (HC) to cross two individuals. HC can be considered a hybrid of the current best individual and the individuals in the current iteration. The pseudocode of HC is shown in Algorithm
1:
2: Initialize
3:
4: Find the current optimal solution
5:
6:
7:
8:
9:
10:
11:
12: Find the current optimal solution
13: Calculate
14:
15: Record
16:
17:
In the algorithm, the current best individual
The mutation operator introduces random diversity among the individuals in a population. We use a mutation called distributed breeder mutation (DBM) in the SAMA algorithm. DBM, inspired by the breeder genetic algorithm proposed by Mühlenbein and Schlierkamp-Voosen [
If the mutated individual
1: Compute
2: Select
3: Determine the range
4:
5:
6:
7:
8:
9:
10:
11:
Local search (LS) is a simple iterative method for finding approximate solutions. If a candidate solution has better or equal fitness, LS moves the search from the current solution to the candidate solution. If LS is applied to every solution many times, the running time becomes very long, because the additional fitness evaluations required for LS are expensive. Thus, a self-adaptive LS (SLS) is introduced, which uses a probability to reduce the number of times local search is applied. The probability that each individual is selected to apply the SLS operation is
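The idea of gating local search by a probability can be illustrated with a minimal sketch. It is not the authors' SLS: the selection probability `p_ls` is a fixed hypothetical parameter (the paper's self-adaptive formula is not reproduced here), the neighbourhood move (re-drawing one locus of a SNP pair) is our assumption, and higher fitness is assumed to be better. Only the acceptance rule, moving when the candidate is no worse, follows the description above.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[int, int]  # an individual: indices of two distinct SNPs


def local_search_step(ind: Pair, fitness: Callable[[Pair], float],
                      n_snps: int, rng: random.Random) -> Pair:
    """Re-draw one locus (assumed move); accept if fitness is no worse."""
    cand = list(ind)
    cand[rng.randrange(2)] = rng.randrange(n_snps)
    if cand[0] == cand[1]:  # a pair must contain two distinct SNPs
        return ind
    cand_pair = (min(cand), max(cand))
    return cand_pair if fitness(cand_pair) >= fitness(ind) else ind


def gated_local_search(pop: List[Pair], fitness: Callable[[Pair], float],
                       n_snps: int, p_ls: float, steps: int,
                       rng: random.Random) -> List[Pair]:
    """Apply local search to each individual only with probability p_ls."""
    out = []
    for ind in pop:
        if rng.random() < p_ls:  # hypothetical fixed gate, not the paper's rule
            for _ in range(steps):
                ind = local_search_step(ind, fitness, n_snps, rng)
        out.append(ind)
    return out


# Toy fitness (assumption): closeness to a planted pair (10, 42).
def toy_fitness(p: Pair) -> float:
    return -(abs(p[0] - 10) + abs(p[1] - 42))


population = [(1, 5), (20, 30), (7, 90)]
refined = gated_local_search(population, toy_fitness, n_snps=100,
                             p_ls=0.5, steps=50, rng=random.Random(0))
```

Because a move is accepted only when the candidate is no worse, no individual's fitness decreases, and individuals that fail the gate cost no extra fitness evaluations.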
In the SAMA algorithm, an elitist selection (ES) is introduced to select the individuals that survive into the next iteration. After HC, DBM, and SLS, the ES operation is performed according to
If the fitness value of the individual
In this subsection, we give a running instance of SAMA in Figure
A running instance of SAMA.
First, we perform the HC operation. Suppose
Next is the DBM operation. We assume that
After completing HC and DBM, the SLS operation is executed.
Finally, the selection operation is performed. We suppose that
To evaluate the performance of the SAMA algorithm, we test it on both simulated and real-world biological datasets, and we compare it with FHSA-SED, AntEpiSeeker, IEACO, and DESeeker on these datasets. For the simulated datasets, we adopt three two-locus disease models. For the real-world biological dataset, we run SAMA on an age-related macular degeneration (AMD) dataset [
In this subsection, we carry out experiments on three simulated disease models (Models 1-3) [
Details of three two-locus disease models.
MAF | 0.05 | | | | 0.10 | | | | 0.20 | | | | 0.50 | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Genotype | AA | Aa | aa | | AA | Aa | aa | | AA | Aa | aa | | AA | Aa | aa |
Model 1 ( | | | | | | | | | | | | | | | |
BB | 0.098 | 0.098 | 0.098 | BB | 0.096 | 0.096 | 0.096 | BB | 0.092 | 0.092 | 0.092 | BB | 0.078 | 0.078 | 0.078 |
Bb | 0.098 | 0.299 | 0.522 | Bb | 0.096 | 0.197 | 0.282 | Bb | 0.092 | 0.145 | 0.181 | Bb | 0.078 | 0.105 | 0.122 |
bb | 0.098 | 0.522 | 0.912 | bb | 0.096 | 0.282 | 0.408 | bb | 0.092 | 0.181 | 0.227 | bb | 0.078 | 0.122 | 0.142 |
Model 2 ( | | | | | | | | | | | | | | | |
BB | 0.096 | 0.096 | 0.096 | BB | 0.092 | 0.092 | 0.092 | BB | 0.084 | 0.084 | 0.084 | BB | 0.052 | 0.052 | 0.052 |
Bb | 0.096 | 0.533 | 0.533 | Bb | 0.092 | 0.319 | 0.319 | Bb | 0.084 | 0.210 | 0.210 | Bb | 0.052 | 0.138 | 0.138 |
bb | 0.096 | 0.533 | 0.533 | bb | 0.092 | 0.319 | 0.319 | bb | 0.084 | 0.210 | 0.210 | bb | 0.052 | 0.138 | 0.138 |
Model 3 ( | | | | | | | | | | | | | | | |
BB | 0.080 | 0.192 | 0.192 | BB | 0.072 | 0.164 | 0.164 | BB | 0.061 | 0.146 | 0.146 | BB | 0.067 | 0.155 | 0.155 |
Bb | 0.192 | 0.080 | 0.080 | Bb | 0.164 | 0.072 | 0.072 | Bb | 0.146 | 0.061 | 0.061 | Bb | 0.155 | 0.067 | 0.067 |
bb | 0.192 | 0.080 | 0.080 | bb | 0.164 | 0.072 | 0.072 | bb | 0.146 | 0.061 | 0.061 | bb | 0.155 | 0.067 | 0.067 |
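As a worked check on how such a penetrance table can be read, the sketch below computes the population prevalence implied by the Model 1 entries at MAF 0.05. It rests on several assumptions that are ours rather than statements from the paper: both loci share the same MAF, the lowercase allele is the minor allele, genotypes follow Hardy-Weinberg proportions, and the two loci are independent. The function names are hypothetical.

```python
from typing import List


def hwe_freqs(maf: float) -> List[float]:
    """Genotype frequencies (major homozygote, heterozygote, minor
    homozygote) under Hardy-Weinberg equilibrium."""
    p = 1.0 - maf
    return [p * p, 2.0 * p * maf, maf * maf]


def prevalence(penetrance: List[List[float]], maf_a: float, maf_b: float) -> float:
    """Sum over the 3x3 genotype grid of P(genotype) * P(disease | genotype).

    Rows are BB, Bb, bb and columns are AA, Aa, aa, as in the table;
    the loci are assumed independent.
    """
    fa, fb = hwe_freqs(maf_a), hwe_freqs(maf_b)
    return sum(fb[r] * fa[c] * penetrance[r][c]
               for r in range(3) for c in range(3))


# Model 1, MAF = 0.05 (values copied from the table above).
MODEL1_MAF_005 = [
    [0.098, 0.098, 0.098],  # BB
    [0.098, 0.299, 0.522],  # Bb
    [0.098, 0.522, 0.912],  # bb
]

print(round(prevalence(MODEL1_MAF_005, 0.05, 0.05), 3))
```

Under these assumptions the implied prevalence is approximately 0.100. The baseline penetrance of 0.098 dominates because the risk genotypes are rare at this MAF.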
In the experiments, we set the same maximum number of iterations for the five algorithms, that is, the maximum iteration number for datasets with
Parameter setting of five algorithms.
Algorithm | Parameters |
---|---|
SAMA | The crossover probabilities |
FHSA-SED | The harmony memory considering rate |
AntEpiSeeker | The size of large SNP sets |
IEACO | The switch parameter |
DESeeker | The number of SNPs in a large size SNP combination |
To evaluate the algorithms comprehensively, we use two measures: detection power and running time. The detection power is defined below:
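In this literature, detection power is commonly computed as the fraction of simulated datasets in which the planted disease-associated SNP pair is recovered. Assuming that reading (the paper's exact formula is not reproduced here), a minimal sketch with hypothetical names is:

```python
from typing import Iterable, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def detection_power(detected_pairs: Sequence[Optional[Pair]],
                    true_pair: Iterable[int]) -> float:
    """Fraction of datasets whose reported pair equals the planted pair.

    detected_pairs holds one entry per dataset; None means that nothing
    was reported. Pair order is ignored.
    """
    if not detected_pairs:
        raise ValueError("at least one dataset is required")
    target = frozenset(true_pair)
    hits = sum(1 for p in detected_pairs
               if p is not None and frozenset(p) == target)
    return hits / len(detected_pairs)


# Four hypothetical runs; the planted pair is recovered in two of them.
runs = [(3, 7), (7, 3), None, (1, 2)]
power = detection_power(runs, (3, 7))
```

Comparing pairs as unordered sets avoids penalizing a method that reports the same two SNPs in the opposite order.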
Figures
Power comparison of five compared algorithms on the datasets with 200 SNPs.
Power comparison of five compared algorithms on the datasets with 2000 SNPs.
Tables
Running time of five compared algorithms on the datasets with
Model | MAF | SAMA | FHSA-SED | AntEpiSeeker | IEACO | DESeeker |
---|---|---|---|---|---|---|
Model 1 | 0.05 | | | | | |
Model 1 | 0.10 | | | | | |
Model 1 | 0.20 | | | | | |
Model 1 | 0.50 | | | | | |
Model 2 | 0.05 | | | | | |
Model 2 | 0.10 | | | | | |
Model 2 | 0.20 | | | | | |
Model 2 | 0.50 | | | | | |
Model 3 | 0.05 | | | | | |
Model 3 | 0.10 | | | | | |
Model 3 | 0.20 | | | | | |
Model 3 | 0.50 | | | | | |
Running time of five compared algorithms on the datasets with
Model | MAF | SAMA | FHSA-SED | AntEpiSeeker | IEACO | DESeeker |
---|---|---|---|---|---|---|
Model 1 | 0.05 | | | | | |
Model 1 | 0.10 | | | | | |
Model 1 | 0.20 | | | | | |
Model 1 | 0.50 | | | | | |
Model 2 | 0.05 | | | | | |
Model 2 | 0.10 | | | | | |
Model 2 | 0.20 | | | | | |
Model 2 | 0.50 | | | | | |
Model 3 | 0.05 | | | | | |
Model 3 | 0.10 | | | | | |
Model 3 | 0.20 | | | | | |
Model 3 | 0.50 | | | | | |
According to the results of the simulated experiments, SAMA performs well for detecting two-locus SNP-SNP interactions. In this section, we conduct experiments on a real-world biological dataset [
The number of two-locus SNP-SNP interactions detected by five algorithms.
Table
Results of two-locus SNP-SNP interactions detected by SAMA on AMD dataset.
SNP 1 | Gene | SNP 2 | Gene | |
---|---|---|---|---|
rs380390 | CFH | rs1363688 | NA | <1.0e-08 |
rs380390 | CFH | rs2224762 | KDM4C | |
rs380390 | CFH | rs555174 | NA | |
rs380390 | CFH | rs1374431 | NA | |
rs380390 | CFH | rs1740752 | NA | |
rs1329428 | CFH | rs7467596 | MED27 | <1.0e-07 |
rs1329428 | CFH | rs9328536 | MED27 | |
rs1329428 | CFH | rs3922799 | NA | |
rs1329428 | CFH | rs10489076 | NA | |
rs1740752 | NA | rs3009336 | NA | |
rs380390 | CFH | rs718263 | NCALD | |
rs380390 | CFH | rs223607 | NA | |
rs380390 | CFH | rs620511 | NA | |
rs380390 | CFH | rs2178692 | COPS7A | |
rs380390 | CFH | rs34512 | NA | |
rs380390 | CFH | rs3853728 | EGFEM1P | |
rs380390 | CFH | rs210758 | NA | |
rs380390 | CFH | rs2446023 | ZNF518A | |
rs380390 | CFH | rs2167167 | NA | |
rs380390 | CFH | rs956275 | PPAT | |
rs380390 | CFH | rs1896373 | NA | <1.0e-06 |
rs380390 | CFH | rs1896373 | NA | |
rs380390 | CFH | rs143627607 | DDX3X | |
rs1329428 | CFH | rs10504043 | ANK1 | |
rs1329428 | CFH | rs10272438 | BBS9 | |
rs1329428 | CFH | rs2695214 | PPP3CA | |
rs1329428 | CFH | rs78812154 | NA | |
rs1329428 | CFH | rs74412587 | NA | |
rs1329428 | CFH | rs1363688 | NA | |
rs1329428 | CFH | rs9328536 | MED27 | |
rs1740752 | NA | rs943008 | NEDD9 |
Number of two-locus SNP-SNP interactions detected by SAMA under different parameters.
 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
---|---|---|---|---|---|---|---|---|---|
0.1 | 9 | 12 | 14 | 17 | 19 | 18 | 17 | 13 | 10 |
0.2 | 12 | 14 | 17 | 20 | 23 | 21 | 18 | 16 | 11 |
0.3 | 13 | 13 | 16 | 19 | 21 | 18 | 20 | 16 | 13 |
0.4 | 13 | 15 | 16 | 20 | 24 | 21 | 21 | 18 | 18 |
0.5 | 16 | 17 | 17 | 23 | 30 | 25 | 23 | 20 | 19 |
0.6 | 15 | 17 | 18 | 24 | 28 | 25 | 25 | 22 | 17 |
0.7 | 15 | 13 | 18 | 25 | 27 | 26 | 27 | 21 | 19 |
0.8 | 14 | 14 | 22 | 28 | 31 | 30 | 27 | 25 | 26 |
0.9 | 12 | 13 | 17 | 23 | 29 | 25 | 26 | 22 | 21 |
In this paper, we propose the SAMA algorithm to detect two-locus SNP-SNP interactions associated with disease. The global search ability of SAMA is greatly increased by using HC, DBM, and ES. The self-adaptive behavior of SLS enhances the local search ability of SAMA without significantly increasing the running time. On the simulated datasets, the experimental results indicate that SAMA is more effective than FHSA-SED, AntEpiSeeker, IEACO, and DESeeker in terms of detection power and running time. On the real-world biological dataset, the experiments show that the proposed algorithm successfully detects known disease-associated SNP-SNP interactions and some new suspected interactions. However, the SAMA algorithm still has some limitations. First, the detection power of SAMA is low for the disease models with small
ACO: Ant colony optimization
AntEpiSeeker: Two-stage ant colony optimization algorithm
AMD: Age-related macular degeneration
DE: Differential evolution
DBM: Distributed breeder mutation
DESeeker: Two-stage differential evolution algorithm
ES: Elitist selection
FHSA-SED: Harmony search algorithm with two scoring functions
GA: Genetic algorithm
GWAS: Genome-wide association study
IEACO: Self-adjusting ant colony optimization based on information entropy
HC: Hybrid crossover
LS: Local search
MA: Memetic algorithm
MAF: Minor allele frequency
SAMA: Self-adaptive memetic algorithm
SNP: Single-nucleotide polymorphism
SLS: Self-adaptive local search
The data used to support the findings of this study are included within the article, which are described in detail in [
The authors declare that they have no conflicts of interest.
This work was supported in part by the National Natural Science Foundation Program of China under grant 61772124.