Influence of Genetic Variants in EGF and Other Genes on Hematological Traits in Korean Populations by a Genome-Wide Approach

Hematological traits are important health indicators and are used as diagnostic clinical parameters for human disorders. Recently, genome-wide association studies (GWAS) identified many genetic loci associated with hematological traits in diverse ethnic groups. However, additional GWAS are necessary to elucidate the breadth of genetic variation and the underlying genetic architecture represented by hematological metrics. To identify additional genetic loci influencing hematological traits (such as hematocrit, hemoglobin concentration, white blood cell count, red blood cell count, and platelet count), we conducted GWAS and meta-analyses on data from 12,509 Korean individuals from population-based cohorts. Of particular interest is EGF, a factor that plays a role in the proliferation and differentiation of hematopoietic progenitor cells. We identified a novel EGF variant that was associated with platelet count in our study (P_combined = 2.44 × 10^−15). Our study also replicated 16 genetic associations related to five hematological traits with genome-wide significance (P < 5 × 10^−8) that were previously established in other ethnic groups. Of these, variants influencing platelet count are distributed across several genes and have pleiotropic effects in coronary artery disease and dyslipidemia. Our findings may aid in elucidating molecular mechanisms underlying not only hematopoiesis but also inflammatory and cardiovascular diseases.


Introduction
Hematological metrics are used as essential medical indicators [1]. Maintenance of homeostasis is linked to physiological pathways that can be tested via blood chemistry panels [2]. Variation in hematological traits is heritable [3,4]. Recently, genome-wide association studies (GWAS) have revealed hundreds of genetic loci associated with hematological traits [5][6][7][8]. Many of the loci associated with hematological traits are shared between different ethnic groups. Despite the successful discovery of a large number of disease-associated variants, less than 10% of the heritability is explained by the identified variants [9].
In addition, previous studies illustrated that significant differences in hematological traits exist between ethnic groups. For example, African Americans tend to have lower white blood cell counts, whereas persons of Japanese descent generally have fewer red blood cell-related anomalies than typically seen in other populations [10,11]. These observations suggest that many hematological traits have a genetic basis and that variants not yet discovered still need to be investigated [1]. And also, previously identified common loci have yet

Study Subjects.
We performed GWAS on five hematological traits (Hb, Hct, WBC, RBC, and PLT) with data from 12,509 subjects from two population-based cohorts that are part of the Korean Genome Epidemiology Study (KoGES). In the discovery stage, we analyzed data for 8,842 subjects from the Korea Association Resource (KARE) project of KoGES [12]. To validate our discovery stage results, 3,667 healthy subjects in the Cardio Vascular Disease Association Study (CAVAS) of KoGES were used for the replication stage. For further replication of a novel locus, 8,053 subjects taking part in the Health2 study of KoGES and 23,032 Japanese subjects from the BioBank Japan project were selected for analyses. The descriptive statistics of each cohort are given in Supplementary Table 1 (available online at http://dx.doi.org/10.1155/2015/914965). More detailed explanations of each cohort were described previously [6,12,13].
This study was approved by the ethics committee of the Korea Centers for Disease Control and Prevention's Institutional Review Board, and all study subjects provided written informed consent prior to taking part in the study.

Phenotype Determination.
Hematological trait values were available for up to 20,562 subjects (8,842 KARE subjects, 3,667 CAVAS subjects, and 8,053 Health2 subjects) taking part in KoGES. Fasting blood samples were drawn from study subjects into a test tube containing an anticoagulant (e.g., EDTA), and relevant traits were measured or calculated using an automated electronic cell counter, the ADVIA 120 hematology system (Bayer Diagnostics, USA).

Genotyping and Quality Control.
In the discovery stage, 10,004 KARE study samples were genotyped with the Affymetrix Genome-Wide Human SNP array 5.0. Quality control was applied as follows. Samples were excluded if they (i) had a high missing genotype call rate (>4%), (ii) showed excessive heterozygosity (>30%), (iii) had gender inconsistencies, or (iv) came from subjects with cancer. SNPs were excluded if they had (i) a high missing genotype call rate (>5%), (ii) a low MAF (<0.01), or (iii) a significant deviation from Hardy-Weinberg equilibrium (P < 1 × 10^−6). Following quality control, data for 8,842 subjects and 352,228 SNPs were retained for further study. For in silico replication data, the 4,034 CAVAS samples were genotyped using the Illumina HumanOmni1-Quad BeadChip. The report file containing input signal intensity of samples was converted using the Illumina BeadStudio software package. Following quality control, 3,667 samples and 730,073 SNPs were deemed appropriate for analyses. Detailed quality control sample criteria and the genotypes from the two cohorts were described previously [12,13].
For further replication of a novel locus, two approaches were used: de novo and in silico replication. In de novo replication, we genotyped a SNP with the GoldenGate assay (Illumina Inc.) using 8,053 samples from the Health2 study. The genotype success rate was 99.9%. In in silico replication, we used imputed data for the 4q25 region based on genotype data from the Illumina Human610-Quad BeadChip in 23,032 samples from the BioBank Japan project at the Center for Integrative Medical Sciences, RIKEN. The quality control criteria for Japanese samples and SNPs were described previously [6].
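The SNP-level exclusion rules can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the genotype-count interface, and the use of a 1-df chi-squared test for Hardy-Weinberg equilibrium (instead of an exact test, which dedicated tools may use) are all assumptions made for illustration. Only the three thresholds are taken from the text.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-squared test for Hardy-Weinberg equilibrium.

    Assumption: a chi-squared approximation is used here for brevity.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the three SNP filters quoted in the text (hypothetical helper)."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    if n_missing / (n_called + n_missing) > max_missing:
        return False
    if minor_allele_frequency(n_aa, n_ab, n_bb) < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p
```

With made-up counts, `passes_snp_qc(810, 180, 10, 5)` keeps the SNP, whereas a SNP with no heterozygotes among 1,000 called genotypes (`passes_snp_qc(500, 0, 500, 0)`) fails the Hardy-Weinberg filter.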

Statistical Analyses.
To identify genetic variants associated with the five hematological traits, we carried out GWAS using a linear regression model in the PLINK program (http://pngu.mgh.harvard.edu/~purcell/plink/) [16]. Phenotypes used in the analyses were approximately normally distributed, and age and gender were incorporated into the analyses as covariates. We conducted meta-analyses for selected SNPs that met our criteria of P < 1 × 10^−5 in the discovery stage and P < 0.05 in the replication stage, with the inverse variance method using the METAL program (http://genome.sph.umich.edu/wiki/METAL) [17]. After meta-analyses, SNPs that reached the accepted genome-wide significance level (P < 5 × 10^−8), which reflects testing of one million SNPs [18], were considered statistically significant.
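The inverse variance method and the genome-wide threshold can be sketched in a few lines of Python. This is a minimal fixed-effects illustration and not the METAL implementation; the function name and the normal approximation for the P value are assumptions, and any effect sizes passed to it in examples are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    betas, ses: per-cohort effect estimates and standard errors.
    Returns the pooled effect, its standard error, and a two-sided P value
    based on a normal approximation (an assumption of this sketch).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p

# Bonferroni-style threshold for one million tests at alpha = 0.05.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000

def is_genome_wide_significant(p):
    return p < GENOME_WIDE_ALPHA
```

Each cohort is weighted by the reciprocal of its squared standard error, so a large cohort with a precise estimate dominates the pooled effect. The threshold 0.05 / 1,000,000 = 5 × 10^−8 matches the significance level stated above.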

Association Analyses with Related Traits (Coronary Artery Disease and Lipid Profiles).
Because we were interested in the effects of the genome-wide significant PLT-associated SNPs on related traits, we tested each SNP for association with the lipid profile metrics (total cholesterol (TC), triglyceride (TG), LDL-cholesterol (LDL), and HDL-cholesterol (HDL)) and with CAD, using previously published data from 8,842 KARE subjects and from 2,123 CAD cases and 2,690 controls, respectively [15,19]. Age and gender were used as covariates in all analyses.
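For a quantitative trait such as a lipid measure, a covariate-adjusted test of this kind amounts to regressing the trait on allele dosage together with age and gender. The Python sketch below shows that idea with ordinary least squares solved through the normal equations. It is an illustration only: the function names and the additive 0/1/2 dosage coding are assumptions, and a binary outcome such as CAD would typically call for a logistic model, which is not shown.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def snp_association(dosages, ages, sexes, phenotypes):
    """Return (beta, se) for the SNP term in y ~ 1 + dosage + age + sex."""
    X = [[1.0, g, a, s] for g, a, s in zip(dosages, ages, sexes)]
    n, k = len(X), 4
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, phenotypes)) for i in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(c * x for c, x in zip(coef, r))
             for r, y in zip(X, phenotypes)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = math.sqrt(sigma2 * inv[1][1])            # SE of the dosage coefficient
    return coef[1], se
```

The returned beta is the estimated change in the trait per copy of the coded allele after adjusting for age and gender, which corresponds to the "beta ± S.E." form in which effect sizes are reported.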

Results
We conducted GWAS on 1,590,162 common SNPs (minor allele frequency (MAF) > 1%) and five hematological traits, namely, Hb, Hct, WBC, RBC, and PLT, for 8,842 subjects of the KARE project [12]. We carried forward the SNPs among our top association results that satisfied the threshold (P < 1 × 10^−5) for replication in 3,667 subjects of the CAVAS study, which represents a rural population-based cohort. Thirty-two variants were validated with statistical significance (P < 0.05) in the CAVAS study (Supplementary Table 3). Descriptive information for the study samples and the inflation of test statistics (genomic control) are shown in Supplementary Table 1 and Supplementary Table 2, respectively. The quantile-quantile plots for the five hematological traits are presented in Supplementary Figure 1.
Across the 12,509 subjects, we identified 17 genetic regions that reached our threshold for genome-wide significance (P < 5 × 10^−8): one novel genetic association for PLT (4q25, in the EGF gene), one region for Hb, six for RBC, two for WBC, six for PLT, and one region (6q23.3) associated with three traits (Hct, RBC, and PLT) (Table 1 and Figure 1).
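The inflation of test statistics is commonly summarized by a genomic control factor, lambda. The text does not state how it was computed here, so the Python sketch below uses one common definition as an assumption: the median of the observed 1-df chi-squared statistics divided by the expected median under the null (about 0.4549). The function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Genomic control lambda from two-sided P values (each in (0, 1])."""
    nd = NormalDist()
    # A two-sided P value p corresponds to z = inv_cdf(p / 2); chi2 = z ** 2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A lambda close to 1 indicates that the bulk of the test statistics follows the null distribution, which is also what a quantile-quantile plot hugging the diagonal shows; values well above 1 point to inflation, for example from population stratification.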

Pleiotropic Effect of PLT-Related Variants on CAD and Lipid Profile.
We examined associations between seven PLT-associated variants with genome-wide significance and other traits related to CAD and lipid profile, including TC, TG, LDL, and HDL (Table 2). Two variants, near HBS1L-MYB and PNPLA3, were associated with lipid traits (TC, TG, and LDL). The variant rs739496, located in the 3′-UTR of SH2B3, was associated with both decreased platelet count and a decreased risk for CAD (Table 2). Other variants did not show compelling associations with these five traits.

Discussion
Recently, numerous genetic loci for hematological traits were discovered through several GWAS of European, African American, and Japanese populations [5][6][7]22]. Using a similar approach, we screened data for 12,509 Korean individuals, confirmed 16 known loci associated with hematological traits, and identified one novel genetic locus affecting PLT. The SNP rs2282786, located in EGF at 4q25, showed a strong association with PLT at genome-wide significance (P_combined = 2.44 × 10^−15) in a combined meta-analysis of 43,594 individuals (20,562 Koreans and 23,032 Japanese) (Table 1, Figure 2, and Table 3). This SNP also showed an ethnicity-based difference in allele frequency (Supplementary Figure 2). This is a compelling discovery and provides evidence of a divergent genetic background underlying the ethnic differences seen in hematological traits.

BioMed Research International
Additionally, to examine the association between genetic variants and the level of gene expression, the novel PLT-associated locus was cross-referenced with expression quantitative trait loci (eQTL) associations using genetic variation and gene expression profiling data from Gene Expression Variation (GENEVAR) (http://www.sanger.ac.uk/resources/software/genevar) [23]. These data were based on lymphoblastoid cell lines (LCLs) from 162 HapMap3 individuals (80 CHB + 82 JPT) [24]. An intronic SNP in EGF, rs4698756, which is in weak linkage disequilibrium (LD) with rs2282786 (r² = 0.229, D′ = 0.787), showed a statistically significant cis-regulatory effect on EGF expression levels in Chinese populations (P = 0.0401) (data not shown). Furthermore, to elucidate the regulatory function of the locus, we surveyed Encyclopedia of DNA Elements (ENCODE) features such as regulatory chromatin states, DNase hypersensitivity, and ChIP-seq experiments using the UCSC Genome Browser (http://genome.ucsc.edu/). According to the functional annotation based on ENCODE data, rs4698756 lies within regulatory functional elements comprising transcription factor binding sites, DNase clusters, and binding sites of proteins required for chemical modification of histones. Even though the LD between rs2282786 and rs4698756 was not strong enough to use rs4698756 as a direct surrogate of rs2282786, this functional information suggests that regulation of EGF expression may modulate platelet counts.
The EGF gene encodes epidermal growth factor; the encoded protein acts as a potent mitogenic factor, playing an important role in the growth, proliferation, and differentiation of numerous cell types [MIM: 131530]. It may play a role in the growth, proliferation, and differentiation of megakaryocytes and in platelet production. Previous studies reported that platelets activated by inflammation may secrete EGF and proinflammatory substances for subsequent thrombus formation in an inflammation-hemostasis cycle, a tightly interrelated pathophysiologic process [25,26]. It is well known that platelets play an important role in CAD, both in the pathogenesis of atherosclerosis and in the development of acute thrombotic events. Moreover, high blood lipid levels can enhance platelet aggregation, contributing to CAD [27]. The associations we observed between the PLT-associated loci and CAD or lipid profiles suggest that these loci also have pleiotropic effects in CAD and dyslipidemia (Table 2). Two PLT-associated variants located at HBS1L-MYB and PNPLA3 were also significantly associated with lipids (TC, TG, and LDL), and one PLT-associated variant in SH2B3 was also significantly associated with CAD. Among these genes, SH2B3 encodes a member of the SH2B adaptor family of proteins [MIM: 605093] that plays a critical role in hematopoiesis involving blood coagulation and erythropoietin signaling pathways [28]. This gene has previously been found to be associated with type 1 diabetes [29], cardiovascular diseases [30], and hypertension [31].
To date, many studies have reported genetic factors associated with hematological traits via GWAS across diverse ethnic groups [5][6][7]22]. According to previous transethnic studies, the most commonly revealed variants were replicated in all ethnic groups and have a species-wide role in biological pathways of hematopoiesis. However, these common genetic loci may account for only a low percentage of hematological trait heritability [32]. Therefore, to identify genetic mechanisms underlying these traits, additional large-scale population-based studies incorporating multiple rare variants or gene-gene and gene-environment interactions should be undertaken. Functional experiments may also help to validate the results of statistical analyses for human hematological traits.
In summary, we showed that a genome-wide approach identified genetic variants contributing to phenotypic variation of hematological traits in Korean populations. We identified one novel ethnic-specific variant associated with PLT that localized to a key regulator of hematopoiesis and confirmed previously implicated loci that were associated with five hematological traits. We also described pleiotropic effects of PLT-associated variants that may support the biological role of genetic determinants for hematological traits. Our findings may help identify biological pathways that contribute not only to hematopoiesis but also to inflammatory and cardiovascular diseases in humans.
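The two LD measures quoted for this SNP pair can diverge considerably, which is why a fairly high D′ can coexist with a low r². The Python sketch below computes both from haplotype and allele frequencies using the standard definitions. The function name and the example frequencies are hypothetical and are not the frequencies of rs2282786 or rs4698756.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical example: every B-carrying haplotype also carries A.
d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.5, p_b=0.1)
```

In this made-up example D′ equals 1 because one of the four possible haplotypes is absent, yet r² is only about 0.11 because the allele frequencies differ strongly. Since r² is the quantity that governs how well one SNP predicts the other, a SNP in low r² with the lead variant is a weak surrogate for it even when D′ is fairly high.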

Figure 1: Manhattan plots of the GWAS for five hematological traits in the discovery stage. The vertical axis indicates −log10 P values of SNPs in the GWAS for Hb, Hct, RBC, WBC, and PLT, and the horizontal axis represents chromosomal position. The genetic loci that exceeded the genome-wide significance threshold of P < 5.0 × 10^−8 are marked in red for each trait.

Figure 2: A regional association plot of the novel genetic locus associated with PLT. Round dots represent −log10 P values of SNPs from combined meta-analyses (discovery and replication stages). The purple diamond located in the center of the 0.8 Mb 4q25 genomic region indicates the strongest signal associated with PLT. The color of each dot indicates the level of linkage disequilibrium (LD), r², relative to the SNP rs2282786. At the bottom, the locations of RefSeq genes are shown. The plot was generated from the JPT + CHB panel based on hg18 HapMap phase 2 using LocusZoom.

Table 1: Results of genome-wide association analyses for hematological traits. CHR, chromosome; BP, base position; MAF, minor allele frequency; Hb, hemoglobin; Hct, hematocrit; RBC, red blood cell count; WBC, white blood cell count; PLT, platelet count. Effect sizes are shown as beta ± S.E. A test of heterogeneity (P_het) was conducted; Q, Cochran's Q value based on chi-squared statistics. Age and gender were used in analyses as covariates.

Table 2: Associations with CAD and lipid profiles for PLT-associated loci with genome-wide significance.

Table 3: Association results of rs2282786 for platelet count.