Many genetic association studies have used single nucleotide polymorphism (SNP) data to identify genetic variants for complex diseases. Although SNP-based association analysis is the most common approach in genome-wide association studies (GWAS), gene-based association analysis has received increasing attention as a way to understand the genetic etiology of complex diseases. While both methods have been applied to the same data, few studies have compared their results or examined the connection between them. We performed a comprehensive analysis of the data from the Study of Addiction: Genetics and Environment (SAGE) and compared the results of the SNP-based and gene-based analyses. Our results suggest that the gene-based method complements the individual SNP-based analysis and that the two approaches are conceptually closely related. In terms of gene findings, our results validate many genes that have been reported either from analyses of the same dataset or from animal studies of substance dependence.
Genome-wide association studies (GWAS) have become a powerful tool for identifying susceptibility loci for numerous diseases [
Gene-based methods have been successfully applied to GWAS of complex diseases, including Crohn’s disease [
Recent studies have reported many candidate genes associated with substance dependence. For example, GABRA2, CHRM2, ADH4, PKNOX2, GABRG3, TAS2R16, SNCA, OPRK1, and PDYN have been well studied in relation to alcohol addiction and have been replicated in many samples [
Based on the analysis of the SAGE data, we report a number of susceptibility loci at the SNP and/or gene levels, many of which have previously been reported to be associated with substance dependence [
The dataset included 4,121 SAGE subjects with data on six categories of substance dependence: alcohol, cocaine, marijuana, nicotine, opiates, and other drugs. The data were downloaded from dbGaP (study accession phs000092.v1.p1) [
Following the conventional standards, we used
We took several steps to test the associations between genetic variants (SNPs or genes) and substance dependence. First, the
To compare the performance of the SNP-based and gene-based methods, for the SNP-based method we selected those SNPs whose
Table
Summary statistics for susceptibility loci identified by gene-based method and SNP-based method.
Group | Alcohol G | Alcohol S | Cocaine G | Cocaine S | Marijuana G | Marijuana S | Nicotine G | Nicotine S | Opiates G | Opiates S | Other G | Other S |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Black men | 4 | 3 | 4 | 1 | 6 | 2 | 5 | 2 | 8 | 2 | 9 | 5 |
Black women | 4 | 3 | 8 | 5 | 9 | 3 | 7 | 3 | 3 | 1 | 6 | 3 |
White men | 16 | 3 | 9 | 2 | 10 | 3 | 4 | 1 | 11 | 3 | 3 | 1 |
White women | 20 | 5 | 12 | 2 | 10 | 2 | 11 | 1 | 4 | 5 | 24 | 3 |
G refers to gene-based method. S refers to SNP-based method.
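As a simple consistency check on this summary table, the per-cell counts can be summed by method. The Python sketch below transcribes the table's counts; the dictionary layout and the function names are illustrative choices, not part of the original analysis. Summing gives 207 (gene-based) and 64 (SNP-based), the same totals quoted later in the text for genes passing the relaxed thresholds, and the last function locates any cell where the SNP-based count exceeds the gene-based count.

```python
# Counts transcribed from the summary table: for each sex-racial group,
# one (gene-based G, SNP-based S) pair per substance category.
SUBSTANCES = ["Alcohol", "Cocaine", "Marijuana", "Nicotine", "Opiates", "Other"]

COUNTS = {
    "Black men": [(4, 3), (4, 1), (6, 2), (5, 2), (8, 2), (9, 5)],
    "Black women": [(4, 3), (8, 5), (9, 3), (7, 3), (3, 1), (6, 3)],
    "White men": [(16, 3), (9, 2), (10, 3), (4, 1), (11, 3), (3, 1)],
    "White women": [(20, 5), (12, 2), (10, 2), (11, 1), (4, 5), (24, 3)],
}


def per_group_totals(counts):
    """Sum the G and S counts within each group."""
    return {
        group: (sum(g for g, _ in row), sum(s for _, s in row))
        for group, row in counts.items()
    }


def overall_totals(counts):
    """Sum the G and S counts over all groups and substances."""
    groups = per_group_totals(counts).values()
    return sum(g for g, _ in groups), sum(s for _, s in groups)


def snp_exceeds_gene(counts):
    """List the (group, substance) cells where the SNP-based count is larger."""
    return [
        (group, substance)
        for group, row in counts.items()
        for substance, (g, s) in zip(SUBSTANCES, row)
        if s > g
    ]


if __name__ == "__main__":
    print(per_group_totals(COUNTS))
    print(overall_totals(COUNTS))    # (207, 64)
    print(snp_exceeds_gene(COUNTS))  # [('White women', 'Opiates')]
```

The only cell in which the SNP-based count is larger is opiate dependence in white women (4 versus 5), so the gene-based method reports more loci in 23 of the 24 group-by-substance comparisons.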
Next, we searched the literature for the genomic regions containing the identified genes and retained, for further investigation, those regions that have previously been reported to be associated with substance dependence. In Table
Summary of the candidate genes identified by the gene-based and SNP-based methods.
Chr | Gene | Source | | Min | Detected SD^c | Reported SD | Reference |
---|---|---|---|---|---|---|---|
1 | KIAA0040 | White women | | | Alcohol | Alcohol | [ |
2 | HAAO | White women | | | Cocaine | Alcohol | [ |
2 | NCK2 | Black men | | | Opiates | NA | NA |
3 | SH3BP5 | White men | | | Cocaine | Alcohol | [ |
4 | MANBA | White men | | | Alcohol | Alcohol | [ |
7 | RELN | White men | | | Cocaine | Smoking | [ |
8 | CSMD1 | Black women | | | Nicotine | Smoking | [ |
11 | LRP5 | White men | | | Opiates | Smoking | [ |
11 | PKNOX2 | White women | | | Alcohol | Alcohol | [ |
12 | IFNG | White women | | | Opiates | Smoking | [ |
18 | FAM38B | Black women | | | Cocaine | Smoking | [ |
18 | PTPRM | Black women | | | Marijuana | Alcohol | [ |
22 | MAPK1 | Black women | | | Marijuana | Alcohol | [ |
^a
^b min
^c SD: substance dependence.
In Figure
Comparison of candidate genes associated with substance dependence by the SNP- and gene-based analyses. A triangle represents the
Overall, five genes, NCK2 (opiate dependence in black men), SH3BP5 (cocaine dependence in white men), LRP5 (opiate dependence in white men), KIAA0040 (alcohol dependence in white women), and PKNOX2 (alcohol dependence in white women), met the relaxed significance thresholds of both the SNP-based and the gene-based methods for a specific dependence within a sex-racial group. Four genes, MAPK1 (marijuana dependence in black women), MANBA (alcohol dependence in white men), HAAO (cocaine dependence in white women), and IFNG (opiate dependence in white women), met the relaxed threshold only under the gene-based method. We found that the significant signal of gene MAPK1 was mainly driven by SNPs: rs7290469 (
Furthermore, four other genes, FAM38B (cocaine dependence in black women), PTPRM (marijuana dependence in black women), CSMD1 (nicotine dependence in black women), and RELN (cocaine dependence in white men), each contain at least one SNP that met the relaxed SNP-based significance threshold. The gene-based
Because no SNP attained genome-wide significance for any dependence under the SNP-based method, this section focuses only on the results from the gene-based method.
Table
Summary of genome-wide significant genes at the gene level (
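Population | Substance dependence | Gene | Gene's | Top SNPs | SNP's |
---|---|---|---|---|---|
Black men | Opiates | NCK2 | 2.70 | rs2377339 | |
Black men | Opiates | NCK2 | | rs7589342 | |
Black men | Opiates | NCK2 | | rs12995333 | |
Black men | Opiates | NCK2 | | rs12053259 | |
Black men | Opiates | NCK2 | | rs6747023 | |
Black men | Opiates | NCK2 | | rs879900 | |
White men | Nicotine | DSG3 | 1.99 | rs6701037 | |
White men | Nicotine | DSG3 | | rs1057302 | |
White men | Nicotine | DSG3 | | rs6425323 | |
White men | Nicotine | DSG3 | | rs1057239 | |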
In this paper, we analyzed the SAGE data using both SNP-based and gene-based methods and compared the results obtained from the two. Specifically, for each sex-racial group, we performed association analyses for the six categories of substance dependence separately. The gene-based method appears to be more powerful in detecting susceptibility loci: across groups and dependencies, it identified more candidate loci than the SNP-based method.
Most of the genes identified in our study are supported by various reports in the literature related to the genetics of substance dependence [
Overall, we did not detect any genome-wide significant SNP using the SNP-based method. However, one gene, DSG3, is genome-wide significantly (
The SNP-based method and gene-based method are closely related. In fact, the SNP-based method can be viewed as a gene-based method using the extreme function, namely, the minimal
Both the SNP-based and gene-based methods have their own advantages and disadvantages. The SNP-based method is particularly suited to identifying genes with only a small number of significant SNPs. However, because it tests one SNP at a time, it has less power to detect a gene whose SNPs have weak marginal effects but a strong joint effect. In our analysis, 207 genes passed the relaxed gene-based threshold, whereas only 64 genes passed the relaxed SNP-based threshold.
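The trade-off described above can be made concrete by treating a gene-level p-value as a function of the p-values of the gene's SNPs. The sketch below is illustrative only and rests on stated assumptions: the text does not say which combining function the gene-based method uses, so Fisher's combination is swapped in here as an example of a sum-type statistic, and the min-p function (Šidák-corrected for the k SNPs) stands in for the SNP-based view. Both functions assume the SNPs within a gene are independent; real SNPs in linkage disequilibrium are correlated, so in practice the null distribution would have to come from permutation or an LD-aware method. The function names are hypothetical.

```python
import math


def _check(pvalues):
    """Require a non-empty list of p-values in (0, 1]."""
    if not pvalues or not all(0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be a non-empty list in (0, 1]")


def minp_gene_p(pvalues):
    """Min-p view of a gene: keep only the smallest SNP p-value and apply a
    Sidak correction for the k SNPs, 1 - (1 - p_min)^k.
    Assumes the k SNPs are independent."""
    _check(pvalues)
    k = len(pvalues)
    p_min = min(pvalues)
    if p_min >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p_min is tiny
    return -math.expm1(k * math.log1p(-p_min))


def fisher_gene_p(pvalues):
    """Sum-type gene statistic (Fisher's method, used only as an example):
    X = -2 * sum(ln p_j) is chi-square with 2k degrees of freedom under the
    null if the SNPs are independent. For even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    _check(pvalues)
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, tail_sum = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (x/2)^i / i! incrementally
        tail_sum += term
    return math.exp(-half_x) * tail_sum


if __name__ == "__main__":
    weak_but_consistent = [0.01] * 5          # five modest signals
    one_strong_signal = [1e-6] + [0.5] * 9    # one strong SNP, nine nulls
    for name, ps in [("weak", weak_but_consistent), ("strong", one_strong_signal)]:
        print(name, minp_gene_p(ps), fisher_gene_p(ps))
```

For five SNPs at p = 0.01, the Fisher-type gene p-value is about 1.4 × 10^-6 while the min-p value is about 0.049; for one SNP at p = 10^-6 among nine SNPs at p = 0.5, the order reverses (about 1.0 × 10^-5 for min-p versus roughly 0.005 for Fisher). The min-p statistic rewards genes driven by a few strong SNPs, whereas a sum-type statistic accumulates many weak marginal effects.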
Both the SNP-based and gene-based methods can be conducted conveniently in commonly available software, such as PLINK [
X. Guo and Z. Liu contributed equally to this work.
This work was supported by Grant R01 DA016750-09 from the National Institute on Drug Abuse. Funding support for the Study of Addiction: Genetics and Environment (SAGE) was provided through the NIH Genes, Environment and Health Initiative (GEI) (U01 HG004422). SAGE is one of the genome-wide association studies funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for the collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (COGA; U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (COGEND; P01 CA089392), and the Family Study of Cocaine Dependence (FSCD; R01 DA013423). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). The datasets used for the analyses described in this paper were obtained from dbGaP at