Association Study of Coronary Artery Disease-Associated Genome-Wide Significant SNPs with Coronary Stenosis in Pakistani Population

Genome-wide association studies (GWAS) of coronary artery disease (CAD) have revealed multiple genetic risk loci. We assessed the association of 47 genome-wide significant single-nucleotide polymorphisms (SNPs) at 43 CAD loci with coronary stenosis in a Pakistani sample comprising 663 clinically ascertained and angiographically confirmed cases. Genotypes were determined using the iPLEX Gold technology. All statistical analyses were performed using R software. Linkage disequilibrium (LD) between significant SNPs was determined using SNAP web portal, and functional annotation of SNPs was performed using the RegulomeDB and Genotype-Tissue Expression (GTEx) databases. Genotyping comparison was made between cases with severe stenosis (≥70%) and mild/minimal stenosis (<30%). Five SNPs demonstrated significant associations: three with additive genetic models PLG/rs4252120 (p = 0.0078), KIAA1462/rs2505083 (p = 0.005), and SLC22A3/rs2048327 (p = 0.045) and two with recessive models SORT1/rs602633 (p = 0.005) and UBE2Z/rs46522 (p = 0.03). PLG/rs4252120 was in LD with two functional PLG variants (rs4252126 and rs4252135), each with a RegulomeDB score of 1f. Likewise, KIAA1462/rs2505083 was in LD with a functional SNP, KIAA1462/rs3739998, having a RegulomeDB score of 2b. In the GTEx database, KIAA1462/rs2505083, SLC22A3/rs2048327, SORT1/rs602633, and UBE2Z/rs46522 SNPs were found to be expression quantitative trait loci (eQTLs) in CAD-associated tissues. In conclusion, five genome-wide significant SNPs previously reported in European GWAS were replicated in the Pakistani sample. Further association studies on larger non-European populations are needed to understand the worldwide genetic architecture of CAD.


Introduction
Coronary artery disease (CAD) is more prevalent among South Asians than any other ethnic groups [1,2]. Strong familial aggregation, twin studies, and established genetic associations have confirmed the important role of genetics in CAD [3][4][5][6][7]. Dyslipidemia has been identified as a major risk factor for CAD [8], and growing evidence shows that endothelial dysfunction, persistent inflammatory response, and impaired coagulation cascade also play central roles in the initiation and progression of the disease [9][10][11]. Genome-wide association studies (GWAS) for CAD carried out in populations of European descent have identified multiple loci [12,13] that highlight the potential involvement of many pathways in CAD pathogenesis. It is of great importance to replicate reported GWAS associations in independent samples and different populations to assess the generalization of such associations.
The aim of this study was to perform a replication study of CAD-associated genome-wide significant single-nucleotide polymorphisms (SNPs) reported among Europeans in a Pakistani sample comprising clinically ascertained and angiographically confirmed cases. We evaluated 47 genome-wide significant SNPs at 43 CAD loci that are involved in lipid metabolism, inflammation, coagulation, and endothelial function. Since most of the genome-wide significant SNPs are located in noncoding regions which are important for gene regulation [14][15][16], we also performed functional annotations of significant SNPs using the RegulomeDB and Genotype-Tissue Expression (GTEx) databases.

Materials and Methods
The study cohort consisted of 663 ethnically Pathan subjects (22% female; mean age ± SD: 54 ± 11 years) who presented with chest pain to the Cardiology Unit of the Lady Reading Hospital, Peshawar, Pakistan, and were enrolled consecutively for one year after obtaining written informed consent. All subjects were assessed by coronary angiography to either confirm or rule out CAD. We stratified the study participants based on the coronary angiographic findings: those with ≥70% stenosis were classified as having severe coronary stenosis (n = 506), and those with <30% stenosis were considered as having mild/minimal coronary stenosis (n = 157). Only those CAD patients who had a first episode of disease and had not started lipid-lowering, antihypertensive, anti-inflammatory, or anticoagulant drugs were included in the study. In both groups, 98% of the subjects had a sedentary lifestyle, and only 2% exercised regularly. In both groups of subjects, carbohydrates were the major energy source (60%), along with saturated fats (30%) and proteins (10%). The Institutional Review Board approved the study. Average blood pressure was obtained from two readings. Height and weight were measured, and electrocardiography (ECG) was performed. All enrolled subjects were also asked about family history of CAD and smoking.
2.1. Biochemical Profile. Five ml of blood was drawn in a plain tube, and serum was analyzed for high-density lipoprotein cholesterol (HDL-C), triglycerides (TG), total cholesterol (TC), blood sugar fasting (BSF), and creatinine. LDL-C was calculated using the Friedewald equation in samples with TG < 400 mg/dl.
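The Friedewald calculation mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' own code: the function name and the example values are hypothetical, and the formula shown is the standard mg/dl form (LDL-C = TC − HDL-C − TG/5), applied only when TG < 400 mg/dl as stated in the text.

```python
from typing import Optional


def friedewald_ldl(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Estimate LDL-C in mg/dl with the Friedewald equation.

    Returns None when TG >= 400 mg/dl, where the estimate is not applied.
    """
    if tg >= 400:
        return None
    # TG/5 approximates VLDL cholesterol when all values are in mg/dl.
    return tc - hdl - tg / 5.0


# Illustrative values only (not study data).
print(friedewald_ldl(200.0, 45.0, 150.0))  # 125.0
```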

2.2. Genetic Variant Selection. Information on 47 genome-wide significant SNPs (p < 5 × 10⁻⁸) that were selected for replication in our Pakistani sample is provided in Table S1. The selected loci/genes had putative roles in lipid metabolism, coagulation, inflammation, and endothelial dysfunction [17].
2.3. Genotyping. A sample of blood (5 ml) was taken in an EDTA tube from each subject to be processed for DNA extraction. Genomic DNA was isolated from leukocytes using the Qiagen DNeasy Kit. Five to 10 ng of DNAs dried on 384-well plates was processed for genotyping using the iPLEX Gold technology at the University of Pittsburgh Genomics and Proteomics Core Laboratories. The quality of the genotype data was checked for reproducibility by repeating 10% of the samples.

2.4. Statistical Analysis. The basic quantitative traits of cases and controls were compared by an independent-samples t-test, and qualitative traits by the Chi-square test. The Hardy-Weinberg Equilibrium (HWE) test was performed on all genotype data, and variants with a HWE p value < 0.001 were removed from the analysis. The association of SNPs with CAD was determined by logistic regression analysis under additive and recessive genetic models using sex, age, and family history of CAD as covariates (Tables S2 and S3). Family history of CAD was used as a covariate because it is an independent risk factor for CAD, especially in South Asians [18]. Statistical analyses for genetic association were performed using R software (https://www.r-project.org/). The Benjamini-Hochberg false discovery rate (FDR) method [19] was employed for multiple testing corrections, and an FDR value (q value) of <0.20 was considered statistically significant along with nominal p < 0.05.
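The quality-control, genotype-coding, and multiple-testing steps described above can be sketched in a few lines. The study's analyses were run in R; the Python below is only an illustrative sketch under stated assumptions. All function names are hypothetical, the HWE check is assumed to be a 1-degree-of-freedom chi-square test (the text does not say whether a chi-square or exact test was used), and the numbers in the usage lines are made up for illustration.

```python
import math
from typing import List


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def code_genotype(risk_allele_count: int, model: str) -> int:
    """Encode a genotype (0, 1, or 2 risk alleles) as a regression predictor."""
    if model == "additive":
        return risk_allele_count  # per-allele effect
    if model == "recessive":
        return 1 if risk_allele_count == 2 else 0  # two copies required
    raise ValueError("model must be 'additive' or 'recessive'")


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg q values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Illustrative inputs only (not study data).
keep_snp = hwe_chi2_pvalue(25, 50, 25) >= 0.001  # passes the HWE filter
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [p < 0.05 and q < 0.20 for p, q in zip([0.01, 0.04, 0.03, 0.5], qvals)]
```

The coded genotype would then enter a logistic regression alongside sex, age, and family history of CAD; that model fit is left out here to keep the sketch free of third-party dependencies.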
2.5. Linkage Disequilibrium (LD) and Functional Annotation of Significant SNPs. Significant SNPs and their proxies were assessed for their putative functional effects on gene expression using the SNAP web portal (https://data.broadinstitute.org/mpg/snpsnap/app/bootface2.py) to determine the closely linked SNPs (r² ≥ 0.80) followed by their regulatory effect assessment in RegulomeDB (online database: http://regulome.stanford.edu/). The eQTLs of significant SNPs were assessed by the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/). The RegulomeDB scoring scheme consists of 6 categories: category 1 indicates the strongest evidence for variants to be regulatory by affecting transcription factor (TF) binding and linking to the expression of a gene target; category 2 indicates variants likely to affect TF binding; category 3 indicates variants less likely to affect TF binding; and categories 4-6 indicate variants with minimal TF-binding evidence [20,21].
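For readers unfamiliar with the r² threshold used to define proxy SNPs, the quantity can be illustrated with a short calculation. This is a sketch under assumptions, not the SNAP portal's procedure: it assumes phased haplotype counts for two biallelic SNPs are available, and the function name and example counts are hypothetical.

```python
def ld_r_squared(n_ab_hap: int, n_a_only: int, n_b_only: int, n_neither: int) -> float:
    """Compute r^2 between two biallelic SNPs from phased haplotype counts.

    n_ab_hap:  haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    n_a_only:  haplotypes carrying A but not B
    n_b_only:  haplotypes carrying B but not A
    n_neither: haplotypes carrying neither allele
    """
    n = n_ab_hap + n_a_only + n_b_only + n_neither
    p_a = (n_ab_hap + n_a_only) / n   # frequency of allele A
    p_b = (n_ab_hap + n_b_only) / n   # frequency of allele B
    d = n_ab_hap / n - p_a * p_b      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Illustrative counts only: two SNPs whose alleles always travel together.
r2 = ld_r_squared(50, 0, 0, 50)
is_proxy = r2 >= 0.80  # proxy threshold used in the text
```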

Demographic Data and Biochemical Profile. The basic characteristics and biochemical profiles of 506 subjects with severe coronary stenosis and 157 subjects with mild/minimal coronary stenosis are provided in Table 1. We observed statistically significant differences between the two groups for age, body mass index (BMI), blood pressure (BP), LDL-cholesterol (LDL-C), and family history of CAD (Table 1).

Functional Analysis.
We used the RegulomeDB and GTEx databases to determine the functional nature of the five significant SNPs and those in LD with these SNPs. Proxy SNPs and eQTLs for the five significant SNPs (rs2505083, rs4252120, rs2048327, rs602633, and rs46522) are listed in Table S4. PLG/rs4252120, with a RegulomeDB score of 6, was not functional itself. However, it was in complete LD (r² = 1) with two PLG variants located in intron 11 (rs4252126) and intron 12 (rs4252135), each with a RegulomeDB score of 1f, indicating that these SNPs likely affect TF binding and are also potentially linked to the expression of gene targets (eQTL). Of these two PLG SNPs, rs4252126 affects the binding of CTCF, RUNX3, TEAD4, and RAN21, while rs4252135 affects the binding of CTCF, FOXA1, NFKB1, RAD21, ZNF263, SMC3, ZNF143, and FOXA2. KIAA1462/rs2505083, with a RegulomeDB score of 5, was also not functional. Rather, it was in LD with rs3739998 (r² = 0.87), which has a RegulomeDB score of 2b, indicating that it likely affects TF binding. rs3739998 is located in exon 2 of KIAA1462 and affects the binding of CTCF, MYC, PAX5, and ZNF143. SLC22A3/rs2048327 was neither functional itself nor in LD with any functional SNP. SORT1/rs602633 was not functional itself but was in LD with three functional SNPs (rs646776, r² = 0.86; rs12740374, r² = 0.86; and rs629302, r² = 0.85). UBE2Z/rs46522 was functional itself and also in LD with 12 functional SNPs (Table S4).

Discussion
In this study, we sought the replication of 47 previously GWAS-implicated CAD risk variants among Europeans in the Pakistani population and found a significant association of five of the variants with coronary stenosis (PLG/rs4252120, KIAA1462/rs2505083, SLC22A3-LPAL2-LPA/rs2048327, SORT1/rs602633, and UBE2Z/rs46522).
The comparison of frequency, type, and direction of effect alleles of the five significant SNPs with Europeans showed a difference only for PLG/rs4252120, where C was the risk allele in Pakistanis but T was the risk allele in European populations [17]. The reason for this discrepancy is not clear. A possible explanation is that patterns of LD differ between Pakistanis and Europeans. Based on the RegulomeDB score, this variant was not functional but rather in LD with two functional variants. Plasminogen, encoded by the PLG gene on chromosome 6, breaks down the fibrin clot to ward off the coagulation process. Although the precise mechanism by which PLG variants may affect atherosclerosis needs to be elucidated, genetic variation in this gene may delay and disrupt the process of fibrin resolution, leading to clot buildup. Hence, myocardial infarction (MI) becomes almost inevitable in the presence of overwhelming clot formation. PLG is located in close proximity to the LPA gene on chromosome 6 [22,23]. PLG/rs4252120 has been shown to be associated with plasma levels of Lp(a), and a higher Lp(a) level is a risk factor for MI [24,25]. Additionally, PLG has been shown to be part of one of four CAD risk loci (APOA1, MRAS, IL6R, and PLG) that are involved in the acute inflammatory response signaling pathway [26].
The second significant variant, KIAA1462/rs2505083, is also intronic and has shown association with CAD independent of lipid levels, smoking, hypertension, and diabetes mellitus [27,28]. Consistent with prior observations, the association of this variant in our study was also independent of the examined and established nongenetic risk factors, signifying the involvement of a novel pathway in CAD pathogenesis besides hyperlipidemias. According to RegulomeDB, KIAA1462/rs2505083 is not a functional SNP itself, but it is in LD with KIAA1462/rs3739998, which is a functional SNP located in exon 2 of KIAA1462. It harbours a missense mutation that results in a serine-to-threonine change at position 1002 of the JCAD protein [27]. In the GTEx eQTL analysis, KIAA1462/rs2505083 was associated with differential RNA expression of the KIAA1462 gene in the aorta. JCAD is considered a novel component of VE-cadherin (a cell-to-cell junction protein) [28]. The interendothelial cell junctions play an important role in vessel wall integrity. Tight junctions (TJs) regulate the paracellular transport of the cell, while adherens junctions (AJs) are responsible for cell adhesion. VE-cadherin is a major adhesion molecule of AJs, responsible for cell adhesion, and JCAD has been identified as an integral part of VE-cadherin [28]. Although it may be premature to predict a causative role of the JCAD protein in CAD pathogenesis, it is noteworthy that the loss of junctional integrity of endothelial cells is the first step in the initiation of atherosclerotic plaque formation. The JCAD-attributed endothelial cell-cell junctional integrity may be critical in maintaining the mechanical strength of the vessel wall, and any dysfunction can lead to the loss of this support and may promote the atherosclerotic process.
Endothelial dysfunction due to the loss of adhesion is a relatively new concept in disease pathogenesis, and its potential role in CAD could provide new research avenues. SLC22A3/rs2048327 is located in intron 8 of SLC22A3. SLC22A3 is a solute carrier family membrane protein and has a critical role in the elimination of endogenous and organic cations. It may affect blood pressure, which is a significant risk factor for CAD. This SNP is most likely functional, as it is an eQTL in the left ventricle of the heart. Sallinen et al. [29] have also provided evidence of the association of SLC22A3/rs2048327 with diabetic nephropathy and hypertension.
Because of the relatively high consanguinity rate in the population, the use of the recessive model enabled us to detect two additional associations in the SORT1 and UBE2Z regions. SORT1/rs602633 is located downstream of the gene cluster CELSR2-PSRC1-MYBPHL-SORT1. SORT1 encodes sortilin. Kjolby et al. [30] provided the first link between atherosclerosis and sortilin. They demonstrated the deletion of the sortilin protein in the LDL receptor in mice and correlated it with reduced atherosclerotic plaque size [31]. This observation was reinforced by Patel et al. [32]. Sortilin also plays an important role in the inflammatory reaction and foam cell occurrence during atherosclerotic plaque formation, as revealed by some animal studies [33]. Since our study as well as GWAS showed an association of coronary artery disease with this gene, it would be interesting to obtain mechanistic insights into the role of sortilin in humans.
UBE2Z/rs46522 is located in intron 2 of the ubiquitin-conjugating enzyme E2 Z gene. Lu et al. showed an association of UBE2Z/rs46522 with CAD in the Chinese Han population [34]. Although no study has described a direct role of this protein in CAD pathogenesis, the ubiquitin protein is vital in intracellular cell signaling and regulates the important pathways implicated in cell growth and viability [35]. Aberrations in ubiquitin signaling can lead to the pathogenesis of many human diseases, including CAD. Two studies in Pakistani samples have used a number of CAD samples comparable to ours but a much smaller number of SNPs. While one study examined 13 CAD risk SNPs in relation to premature CAD (mean age ≈ 40) assessed by angiography in a total sample of 650 [36], the other study screened 6 SNPs in 624 subjects where CAD cases were assessed only clinically (no angiography) and controls were apparently healthy subjects [37]. While no significant association was observed in the latter case-control study, 5 nominally significant associations were seen in the angiography-assessed premature CAD sample, including APOE, which we have also previously reported in our angiography-assessed sample [38]. Of the other 4 significant SNPs reported by Ansari et al. [36], only one SNP (MIA3/rs17465367) overlapped with our SNPs and those of Shahid et al. [37], and it was not significant in either study. Ansari et al. [36] also reported a nominally significant association with SORT1/rs646776 (p = 0.02), and we also found a significant association in SORT1, although with a different SNP, rs602633 (p = 0.005). Thus, 1 of 5 associations reported by Ansari et al. [36] in an angiography-assessed sample is confirmed in our comparable sample. On the other hand, none of our remaining 4 significant SNPs was examined by either of the reported studies.
The following limitations should be considered before generalizing our results in the Pakistani population. The sample size was small, and subjects were collected from a cardiology hospital where patients present to the hospital in the advanced stage of their disease. Only a small number of cases with mild/minimal coronary stenosis were available for comparison with severe coronary stenosis. Despite this limitation, we have previously reported a significant association of APOE with coronary stenosis in our sample [38] which has been confirmed by another study that also used an angiography-assessed small sample [36]. Recently, more than 160 loci have been implicated with CAD [4,39]. Due to funding constraints, we used mostly the top GWAS hit in each of the 43 gene regions in our replication study. Often, GWAS help to identify a gene region rather than a specific gene and the identified SNPs are rarely causal. For this reason, more SNPs need to be genotyped in a region to replicate a locus in a different population. Genotyping of additional SNPs in each gene region may have replicated more loci in our population. In summary, we have replicated 5 of the 47 previously reported genome-wide significant SNPs in our sample. The five genes have a role in endothelial dysfunction, coagulation disorder, and hypertension which are critical processes that initiate and sustain CAD. The screening of SNPs in all known CAD gene regions in a much larger sample may help to better understand the genetic risk profile in the Pakistani population that will help improve risk prediction, prevention, and treatment approaches.

Conclusions
Five genome-wide significant SNPs previously reported in European GWAS were also replicated in the Pakistani sample. Further association studies on larger non-European populations are needed to understand the worldwide genetic architecture of CAD.

Data Availability
The data supporting the results of our study are included in this research paper.

Disclosure
We acknowledge that an earlier version of this research has been presented as a conference abstract in the 12th Asia-Pacific Conference on Human Genetics and is available on the following link: http://atm.amegroups.com/article/view/16658/16900.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Acknowledgments
Dr. Asma Naseer Cheema was a visiting scholar at the University of Pittsburgh supported by the Higher Education Commission of Pakistan. This research paper is written as part of a PhD research project, and it was partially funded by the Higher Education Commission of Pakistan.

Supplementary Materials
Table S1: list of selected variants (identified in Europeans) replicated in the Pakistani population. Table S2: logistic regression analysis of all the SNPs (additive genetic model). Table S3: logistic regression analysis of all the SNPs (recessive genetic model). Table S4: functional annotation of significant SNPs.