Meta-Analysis of Genome-Wide Association Studies Identifies Novel Functional CpG-SNPs Associated with Bone Mineral Density at Lumbar Spine

Osteoporosis is a serious public health issue, which is mostly characterized by low bone mineral density (BMD). To search for additional genetic susceptibility loci underlying BMD variation, an effective strategy is to focus on testing of specific variants with high potential of functional effects. Single nucleotide polymorphisms (SNPs) that introduce or disrupt CpG dinucleotides (CpG-SNPs) may alter DNA methylation levels and thus represent strong candidate functional variants. Here, we performed a targeted GWAS for 63,627 potential functional CpG-SNPs that may affect DNA methylation in bone-related cells, in five independent cohorts (n = 5905). By meta-analysis, 9 CpG-SNPs achieved a genome-wide significance level (p < 7.86 × 10−7) for association with lumbar spine BMD and additional 15 CpG-SNPs showed suggestive significant (p < 5.00 × 10−5) association, of which 2 novel SNPs rs7231498 (NFATC1) and rs7455028 (ESR1) also reached a genome-wide significance level in the joint analysis. Several identified CpG-SNPs were mapped to genes that have not been reported for association with BMD in previous GWAS, such as NEK3 and NFATC1 genes, highlighting the enhanced power of targeted association analysis for identification of novel associations that were missed by traditional GWAS. Interestingly, several genomic regions, such as NEK3 and LRP5 regions, contained multiple significant/suggestive CpG-SNPs for lumbar spine BMD, suggesting that multiple neighboring CpG-SNPs may synergistically mediate the DNA methylation level and gene expression pattern of target genes. Furthermore, functional annotation analyses suggested a strong regulatory potential of the identified BMD-associated CpG-SNPs and a significant enrichment in biological processes associated with protein localization and protein signal transduction. Our results provided novel insights into the genetic basis of BMD variation and highlighted the close connections between genetic and epigenetic mechanisms of complex disease.


Introduction
Osteoporosis is a complex disease mainly characterized by low bone mineral density (BMD) and microarchitectural deterioration of bone tissue, which results in an increased risk of bone fragility and susceptibility to fracture [1]. It is an increasingly serious public health issue in the aging population; the prevalence of osteoporosis at lumbar spine in the elderly is over 20% in the United States [2]. Genetic studies have demonstrated that BMD is under strong genetic control, with heritability ranging between 50 and 85% [3,4].
Genome-wide association studies (GWAS) and metaanalyses of these studies have successfully identified over 250 genetic loci associated with BMDs at different skeletal sites [5][6][7][8][9][10][11]. However, these loci explained approximately 12% of BMD variation [11] and the specific functional variants at these loci were generally unknown. To search for additional genetic loci and to enhance our understanding of the biological mechanisms underlying BMD variation, one effective strategy is to focus on testing of specific variants with high potential of functional effects, such as exonic/nonsynonymous variants [5,12] or variants that may potentially affect regulatory factors [13][14][15][16]. Such strategy can alleviate the multiple testing problem of the conventional GWAS approach and consequently enhance the power to identify novel functional variants associated with the phenotype of interest. In addition, because the hypothesis testing is based on SNPs with potential functions, false positive findings may be minimized due to that the information on prior functional evidence is used.
DNA methylation is an essential epigenetic mechanism for the regulation of transcription. It has profound impacts on chromatin structure, genomic imprinting, embryonic development, X-chromosome inactivation, and the pathogenesis of several human genetic disorders [17]. Although epigenetic regulation by DNA methylation is generally thought to be transcriptionally repressed in gene promoters and transcriptionally activated when occurring in gene bodies [18,19], recent studies suggested a much more complex relationship between DNA methylation and the gene expression pattern. Both positive and negative associations between DNA methylation and gene expression have been revealed across all genomic regions of a gene, and DNA methylation can also modulate alternative RNA splicing via regulation of the RNA Pol II elongation rate [20][21][22][23][24], demonstrating that DNA methylation can have diverse, chromatin cell type-and context-dependent regulatory effects of transcription.
Single nucleotide polymorphisms (SNPs) may introduce or disrupt cytosine-phosphate-guanine dinucleotides (CpG sites), the major substrate for methyl transfer reactions, and therefore dramatically alter the methylation status at the affected loci [25]. These so-called CpG-SNPs have been suggested as an important mechanism through which genetic variants can affect gene function via epigenetics [25,26]. Shoemaker et al. performed genome-wide allele-specific methylation analysis in 16 human cell lines and found that a significant proportion (38-88%) of allele-specific methylation regions relied on the presence of CpG-SNP variations [27]. Similarly, Zhi et al. conducted genome-wide correlation analysis between genetic variants and DNA methylation levels in human blood CD4+ T cells and found that over 80% of CpG-SNPs were local methylation quantitative trait loci (cis-meQTLs) and CpG-SNPs accounted for over 2/3 of the strongest meQTL signals [28]. The effect of CpG-SNPs often extended beyond the directly affected CpG sites to surrounding regions, likely via correlated proximal methylation patterns and genetic linkage disequilibrium (LD) [25,28,29]. These evidences strongly suggested that CpG-SNPs are a crucial type of cis-regulatory polymorphic variants connecting genetic variation to the individual variability in epigenome. By focusing on CpG-SNPs in selected candidate genes, several studies have identified significant associations between CpG-SNPs with human complex disorders, such as breast cancer [30], type 2 diabetes [29], alcohol dependence [31], and suicide attempt in schizophrenia [32], implying that focusing on CpG-SNPs is an efficient strategy to identify novel functional variants underlying human complex disorders/traits.
In this study, we performed a targeted GWAS analysis for BMD on CpG-SNPs. As DNA methylation profiles are often cell-type specific [33], we further narrowed down to CpG-SNPs that are also meQTLs in an osteoclast-lineage cell, specifically, human peripheral blood monocytes (PBMs). PBMs can act as precursors of osteoclasts, produce cytokines important for osteoclast differentiation and function, serve as a major target cell of sex hormones for bone metabolism [34][35][36][37][38], and have been demonstrated as an excellent cell model for studying osteoporosis-related gene/protein expression patterns and their regulatory mechanisms [39][40][41][42][43][44][45][46][47][48][49][50]. Therefore, our targeted potential functional CpG-SNPs represent prominent candidates that can regulate BMD variation by affecting gene activity via epigenetic mechanisms in bone-related cells.

Study Cohorts.
The discovery dataset incorporated a total of 5905 subjects from five GWAS, of which three studies were "in-house" studies: (1) Omaha Osteoporosis Study (Caucasian ancestry, n = 987), (2) Kansas City Osteoporosis Study (Caucasian ancestry, n = 2250), and (3) China Osteoporosis Study (Han Chinese ancestry, n = 1547), and two studies were "external" studies obtained from the Database on Genotypes and Phenotypes (dbGaP): (1) Women's Health Initiative Observational Study African-American Substudy (African ancestry, n = 712) and (2) Women's Health Initiative Observational Study Hispanic Substudy (Hispanic ancestry, n = 409). The basic characteristics of the five study cohorts were shown in Supplementary Table 1. All studies were reviewed and approved from respective institutional review boards, and each eligible participant provided written informed consent for enrolment. The replication dataset included the summary statistics for the association of approximately 10 million SNPs with BMD by the Genetic Factors for Osteoporosis Consortium (GEFOS) [5,8]. To our knowledge, it is the largest GWAS metaanalysis dataset for BMD association to date in the bone field [5,8].
2.2. Selecting Potential Functional CpG-SNPs. The CpG-SNPs that are potentially functional in PBMs were selected according to the following steps: (1) We identified CpG-SNPs in the human genome by interrogating the extensive catalog of common and rare genetic variants from the 1000 Genomes reference panel [51] and our in-house whole-genome high-coverage deep resequencing study [52]. A SNP was defined as a CpG-SNP if it introduces or disrupts a CpG site. A total of 3,363,517 CpG-SNPs was identified throughout the human genome.
(2) We retrieved 39,859 PBM meQTLs at a stringent significance threshold (FDR < 0.001) from the previous study that assessed the association of over 7 million SNPs with methylome of PBMs in 200 unrelated individuals [53]. We then used SNiPA [54] to identify proxy SNPs in strong LD with retrieved PBM meQTLs. The search was depended on genotype information from the 1000 Genomes Project with the European samples [51]. The inclusion criteria for proxy SNPs were set as a pairwise r 2 threshold > 0.9 and a distance limit of 10 kb from the query meQTL. A total of 175,710 potential PBM DNA methylation-associated SNPs (reported meQTLs and proxy of meQTLs) were identified.
(3) By finding common SNPs between the CpG-SNPs and PBM DNA methylation-associated SNPs, we identified a total of 68,041 CpG-SNPs that are potentially functional in PBMs.

BMD Measurements.
The lumbar spine BMD was determined by either the Hologic Inc. (Bedford, MA, USA) or GE Lunar Corp. (Madison, WI, USA) dual-energy X-ray absorptiometry (DXA) scanner following the respective manufacturer's scan protocols. For each GWAS, multiple potential covariates such as scanner ID, sex, height, weight, age, and age 2 were screened using a forward stepwise linear regression. The significant covariates were used to adjust for raw BMD measurements. Correction of potential population stratification was performed with principal component analysis (PCA), and the top five PCs (i.e., PC1-PC5) were also included as covariates. Residual scores of adjusted phenotypes were normalized by inverse quantile of the standard normal distribution, which was analyzed subsequently.

Genotyping and Quality Control.
For each GWAS, genome-wide genotyping was performed by either Affymetrix Inc. (Santa Clara, CA, USA) or Illumina Inc. (San Diego, CA, USA) high-density SNP genotyping platforms following respective manufacturer's assay protocols. Quality control was implemented by PLINK (http://pngu.mgh.harvard.edu/ purcell/plink/) with the following criteria: individual missingness < 5%, SNP with successful call rate > 95%, and Hardy-Weinberg equilibrium p value > 1.0 × 10 −5 . PCs derived from genome-wide genotyping analysis were used to monitor the population outliers.

Genotype Imputation.
To allow for the merging of datasets from different types of genotyping platform to obtain higher depth of genome coverage, we performed extensive genotype imputation analysis. Generally, haplotype inference of each GWA study was initially phased by a Markov Chain Haplotyping algorithm (MACH) [55] and Minimac [56] was then used to impute genotypes at untyped variants based on haplotype data from the 1000 Genomes reference panel [51]. For each GWA study, the haplotype reference panel of relevant population was used to impute genotypes at untyped variants. SNPs with imputation quality score (r 2 ) > 0.3 and minor allele frequency (MAF) > 0.05 in no less than 2 studies were retained in the subsequent analyses. Imputation with the 1000 Genomes Project reference panels generated genotype data for more than 11.2 million SNPs. Among the 68,041 potential functional CpG-SNPs, 63,627 CpG-SNPs had qualified genotype data (genotyped + imputed) and thus were tested in the following GWAS meta-analyses.

Association Tests and Meta-Analyses.
For each GWAS, we test the association between directly typed/imputed SNPs and lumbar spine BMD using an additive genetic model. The association of unrelated subjects in each GWAS was tested by fitting a linear regression model with MACH2QTL [55] in which allele dosage was considered as a phenotype predictor. The genomic inflation factor (λ GC ) [57] was also estimated for each individual GWAS. We performed meta-analysis using software METAL [58] which based on weights proportional to the square root of the number of subjects in each sample, and between-study heterogeneity was estimated by Cochran's Q statistic and I 2 . Genome-wide significance threshold was defined as a p value < 7.86 × 10 −7 (Bonferroni correction for testing 63,627 selected CpG-SNPs).

Function
Annotation of the CpG-SNPs. CpG-SNPs were annotated with SNPnexus [59] based on reference genome GRCh37 and assigned to candidate genes (±2 kb upstream and downstream). In order to test the potential functional importance of the identified CpG-SNPs, we applied Hap-loReg [60] to annotate selected CpG-SNPs to enhancer histone marks (H3K4me1/H3K27ac) across diverse tissue/ cell types from the Roadmap Epigenomics Projects and test the effect of SNPs on changing the regulatory motifs and the effect of SNPs on the regulation of gene expression of target genes. We employed the software GOEAST [61] to identify significant gene ontology terms among genes associated with identified novel functional CpG-SNPs in lumbar spine.

Results
In this study, we identified 68,041 potential functional CpG-SNPs that may both affect DNA methylation by introducing or disrupting CpG sites and influence DNA methylation levels in human PBMs. Interestingly, although over 50% of these potential functional CpG-SNPs were mapped to introns, we observed a significant enrichment of potential functional CpG-SNPs in 5 ′ /3 ′ -UTR regions (fold change > 2) and underrepresentation in intergenic regions (Supplementary Figure 1), when comparing to the overall profile of CpG-SNPs in the human genome.
To further explore the potential functional significance of the identified significant/suggestive CpG-SNPs, we annotated these CpG-SNPs to various chromatin states and other possible regulatory elements with data from Roadmap Epigenomics and GTEx projects through the HaploReg program [60]. The chromatin state and histone modification data suggested the evidence of regulatory potential in the identified CpG-SNPs. 20 CpG-SNPs altered the regulatory motif, along with 14 CpG-SNPs involved enhancer histone markers. Notably, the novel SNPs rs9535889, rs9526841, and rs2408611 in NEK3 gene were all located in regions with strong transcription and enhancer activities in PBMs as well as various other tissues and cell types (Table 2), highlighting strong regulatory potential of these CpG-SNPs. In addition, many identified CpG-SNPs may affect binding of various transcription factors and have numerous reported eQTL evidences in various tissue/cell types ( Table 2). We also conducted gene ontology analysis for the genes related to the identified CpG-SNPs and revealed significant enrichment of biological processes which are closely associated to protein localization and protein signal transduction (Table 3), such as protein localization to plasma membrane/cell periphery and regulation of Ras/Rho protein signal transduction gene ontology terms.

Discussion
Our study represents the first targeted GWAS testing CpG-SNPs that are potentially functional in bone-related cells for association with BMD variation. As epigenomic and transcriptomic profiles are often tissue-/cell-type specific, we speculated that only a subset of CpG-SNPs in the human genome will have functional impact on DNA methylation levels in specific tissues/cells. Therefore, it is necessary to select out those CpG-SNPs that are potentially functional in disease-/trait-related tissues/cells when performing CpG-SNP-focused association studies. One reasonable and efficient filtering strategy is to leverage the enormous available meQTL data in diverse tissues/cells. Unfortunately, meQTL data in skeletal cells were scarce; therefore, we used meQTL data from PBMs to select out 68,041 candidate CpG-SNPs that may be functional in regulating bone mass, considering the direct and close connections between PBMs and bone metabolism. These potential functional CpG-SNPs were enriched in 5 ′ /3 ′ -UTR regions but underrepresented in intergenic regions (Supplementary Figure 1). This result is largely in line with the recent findings in tissue-/ cell-type specific DNA methylation profiles, suggesting that methylation-mediated regulatory effects often occur beyond the promoter areas [18,62].
By using the data from five independent GWAS cohorts and the summary statistics from the GEFOS study, we identified significant/suggestive associations for 24 CpG-SNPs with lumbar spine BMD. These BMD-associated CpG-SNPs were mapped to six genes; some of which have not been reported for association with BMD in previous GWAS, such as NEK3 and NFATC1 genes. Our finding highlighted the enhanced power of targeted association analysis for identification of novel associations that were missed by traditional GWAS. Interestingly, several genomic regions, such as LRP5 and NEK3 regions, contained multiple significant/suggestive CpG-SNPs, suggesting that multiple neighboring CpG-SNPs may synergistically mediate the DNA methylation and gene expression of the target genes. This is consistent with the fact that methylation signals among neighboring CpG sites are often strongly correlated and regulatory elements that were mediated by methylation usually extend across various genomic regions [63]. LRP5 gene encodes a transmembrane protein which acts as a receptor for low-density lipoprotein. This transmembrane receptor initializes the process of receptor-mediated endocytosis by binding and internalizing their corresponding ligands [64]. It is well known for the critical role in bone homeostasis and several skeletal disorders [65]. Several Note: CpG-SNPs reached a genome-wide significance level (p value ≤ 7.86 × 10 −7 ) in discovery meta-analysis and/or joint analysis of discovery, and replication studies are marked in bold. Gene/CpG-SNP reported in previous GWAS for BMD is marked in italics. NA: SNPs were not available in the GEFOS 2015 data release. * This result was based on the GEFOS 2012 data release because this SNP is not available in the 2015 release.
common genetic variants of LRP5 gene have been demonstrated as potential risk factors in osteoporosis and fracture by previous GWAS [66,67]. For example, gain of functional variations in LRP5 gene leads to extremely high BMD [64] and loss of functional variations in LRP5 gene results in osteoporosis-pseudoglioma syndrome [68]. Interestingly, the recent study showed that the differentiation of monocytes can be negatively regulated by LRP5 gene through abrogation of the Wnt pathway which has an essential role in bone remodeling in both physiological and pathological conditions [69]. The other interesting gene is the NEK3, which also contained several significant/suggestive CpG-SNPs and enriched with strong transcription and enhancer histone modification marks in PBMs and a variety of other tissue/cell types. NEK3 gene encodes a member of the NimA-related serine/threonine kinases [70]. These kinases have been implicated as the significant regulators of cell migration [71] and also regulate microtubule acetylation in neurons [72]. Although most of these CpG-SNPs were annotated to introns of the NEK3 gene, the eQTL data from GTEx project suggested that these CpG-SNPs were strongly associated with the expression of NEK3 gene in diverse tissues. Notably, the previous study [73] that assessed the association of over 675,000 SNPs with transcriptome of PBMs in 1490 unrelated individuals showed that SNP rs2408611 in NEK3 gene has a strong cis-eQTL effect in PBM. This evidence may support that CpG-SNP-mediated epigenomic alterations may be an important mechanism underlying the association between NEK3 and BMD variation. However, its function in other tissues, including bone, remains largely uncharacterized. Another interesting gene is NFATC1. This gene encodes a transcription factor involved in T cell maturation. Importantly, NFATC1 can also regulate activity of a number of osteoclast-specific enzymes and/or other molecules, such as osteoclast-associated receptor, TRAP, calcitonin receptor, and cathepsin K through cooperation with MITF and c-Fos [74][75][76][77]. The important role of this gene in differentiation of osteoclast has been well established by several studies performed on genetically modified mutant mice [78,79]. For example, Winslow et al. [78] identified that the transgenic mice generated by crossing NFATC1-knockout mice with mice that express Tie2 promoter-driven NFATC1 exhibit an osteopetrotic bone phenotype, which may result from a severe defect in the osteoclastogenesis process. Therefore, understanding the molecular basis underlying the functional regulation of NFATC1 in osteoclasts may provide novel therapeutic strategies for bone diseases. Several potential limitations of this study should be concerned and addressed in the future. First, the selection of an appropriate cell model is crucial. Due to the limited meQTL studies in the bone cell models, here, we focused on CpG-SNPs that are also meQTLs in an osteoclastlineage cell, specifically, human PBMs. Although PBMs act as precursors of osteoclasts and act as the major target cells of sex hormones for bone metabolism, the ideal model cells for the osteoporosis study are bone cells, such as osteoblast,  osteoclast, and osteocyte. Second, the results of functional annotation exclusively depend on computationally predicted regulation features and further experimental validation should be conducted to confirm the biological significance of these potential functional CpG-SNPs. In summary, we performed a targeted GWAS analysis for potential functional CpG-SNPs and identified 2 novel BMD-associated genes, NEK3 and NFATC1. Our results highlighted the power of targeted analysis of potential functional variants for the identification of novel disease susceptibility loci that have been missed by a conventional GWAS approach. More importantly, our findings suggested that CpG-SNP-mediated DNA methylation changes may be a crucial biological mechanism to be considered in the interpretation of associations between common genetic variants, epigenetic process, and phenotypes of human diseases.

Data Availability
The chromatin data used to support the findings of this study have been deposited in the Roadmap Epigenomics Project repository. The genotype data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This manuscript was not prepared in collaboration with investigators of the WHI, has not been reviewed and/or approved by WHI investigators, and does not necessarily reflect the opinions of the WHI investigators or the NHLBI.

Conflicts of Interest
The authors declare that there is no conflict of interest. SNP selection, data cleaning, meta-analyses, data management and dissemination, and general study coordination was provided by the PAGE Coordinating Center (U01HG004801-01). The authors thank the authors who generously shared their data.

Supplementary Materials
Supplementary Table 1: basic characteristics of the studied samples. Supplementary Figure 1: distribution of CpG-SNPs in distinct genomic features. Supplementary  Figure 2: a regional association plots of significant/suggestive CpG-SNPs at LRP5 regions. (Supplementary Materials)