Investigating the GWAS-Implicated Loci for Rheumatoid Arthritis in the Pakistani Population

Rheumatoid arthritis (RA) is a complex and multifactorial autoimmune disorder with the involvement of multiple genetic and environmental factors. Genome-wide association studies (GWAS) have identified more than 50 RA genetic loci in European populations. Given the anticipated overlap of RA-relevant genes and pathways across different ethnic groups, we sought to replicate 58 GWAS-implicated SNPs reported in Europeans in Pakistani subjects. 1,959 unrelated subjects comprising 1,222 RA cases and 737 controls were collected from three rheumatology facilities in Pakistan. Genotyping was performed using iPLEX or TaqMan® methods. A total of 50 SNPs were included in the final association analysis after excluding those that failed assay design/run or postrun QC analysis. Fourteen SNPs (LINC00824/rs1516971, PADI4/rs2240336, CEP57/rs4409785, CTLA4/rs3087243, STAT4/rs13426947, HLA-B/MICA/rs2596565, C5orf30/rs26232, CCL21/rs951005, GATA3/rs2275806, VPS37C/rs595158, HLA-DRB1/rs660895, EOMES/rs3806624, SPRED2/rs934734, and RUNX1/rs9979383) were replicated in our Pakistani sample at false discovery rate (FDR) of <0.20 with nominal p values ranging from 4.73E-06 to 3.48E-02. Our results indicate that several RA susceptibility loci are shared between Pakistani and European populations, supporting the role of common genes/pathways.


Introduction
Rheumatoid arthritis (RA) is a complex autoimmune disorder due to the involvement of many genetic, environmental, and stochastic elements in its etiology [1,2]. It is characterized by continuous inflammation associated with altered expression of different proinflammatory (TNFα, IL-1, IL-17) and anti-inflammatory (IL-4, IL-10, IL-13, IL-35) cytokines and activation of B and T cells, which leads to the destruction of synovial joints and ultimately physical disability [3,4]. Worldwide, RA affects 0.5 to 1% of the population [5]. RA prevalence does not differ significantly between rural and urban areas [6]. It can affect both sexes at any age, but data suggest that women are three times more prone than men [7]. There is a significant genetic susceptibility to RA with 50-60% heritability estimates [8]. Twin studies also revealed high concordance rates in monozygotic twins (12.3-15.4%) relative to dizygotic twins (3.5%), strongly indicating the presence of a genetic component in RA [9].
During the last two decades, many genetic studies, including genome-wide association studies (GWAS), have been conducted to understand the genetic basis of RA and this has led to the identification of more than 50 risk loci [10]. Besides HLA-DRB1, many non-HLA loci have been implicated that are involved in multiple functions/pathways including immune activation (NF-κB signaling, JAK-STAT pathway, regulation, and activation of CD4 + T-cells, 2 Disease Markers complement activation), fibroblast differentiation and dedifferentiation, and bone modeling and repair [11,12]. Thus far, the major focus of genetic studies on RA has largely been on Europeans or European-derived populations, with only limited data available on other populations [13], especially in South Asians [14]. In an effort to expand genetic studies on RA in South Asians and to further delineate its genetic basis, we sought to replicate 58 genome-wide significant single nucleotide polymorphisms (SNPs) from 58 loci previously implicated in Europeans or European-derived populations [15][16][17][18] in Pakistani population where there is the paucity of such data.

Study Subjects.
A total of 1,959 subjects (1,222 cases, 737 controls) were included in our study. Blood samples and relevant clinical information were collected from three major public or private rheumatology clinics in Pakistan: Pakistan Institute of medical sciences (PIMS), Military hospital, and Rehmat Noor Clinic. All cases included in this study (mean age ± SD = 43:1 ± 12:33, 78.6% women) were clinically diagnosed by rheumatologists and met the American College of Rheumatology (ACR) 1987 classification criteria for RA [19]. All controls included in this study (mean age ± SD = 40:7 ± 12:49, 39.5% women) were recruited from the general population and were free of any autoimmune disease at the time of recruitment. A screening questionnaire was filled out and a written informed consent was obtained from each subject at the time of the recruitment. All blood samples were collected in EDTA tubes to avoid coagulation and processed shortly after the collection. The study was approved by the Institutional review board (IRB) of the University of Pittsburgh, USA (IRB no. PRO12110472).
2.2. Genomic DNA Extraction. Genomic DNA was extracted from whole blood using either the standard phenolchloroform extraction method or GeneJET Whole Blood Genomic DNA Purification (Thermo Scientific USA) and quantified using the NanoDrop™ 2000 spectrophotometer (Thermo Scientific USA).

Genotyping.
A total of 58 RA-associated genome-wide significant SNPs (p < 5E − 08) was selected from the previously published GWAS in subjects of European ancestry (Table 1). SNP genotyping was performed using either the TaqMan® (Applied Biosystems, Thermo Fisher Scientific) or iPLEX® Gold (Agena Bioscience) methods and following the manufacturer's design/order instructions and protocols. After the thermal cycling of the TaqMan® assays and DNAs on 384-well plates, the endpoint fluorescence reading was performed on a QuantStudio™ 12K Flex system (Applied Biosystems, Thermo Fisher Scientific). The iPLEX® Gold genotyping was performed in the Core laboratories of the University of Pittsburgh. 18% replicates were used to test genotyping consistency.

Statistical Analysis. Concordance to Hardy-Weinberg
Equilibrium (HWE) was tested using the Chi-square goodness of fit test. Departure from HWE was considered at arbitrarily p < 1E − 05. Logistic regression using an additive model and minor allele as the effect allele was employed for case-control association analysis using sex and age as covariates. The Benjamin Hochberg false discovery rate (FDR) was applied to correct for multiple testing [20]. p < 0:05 was considered as suggestive evidence of association and FDR (q value) of <0.20 as statistically significant as used in previous reports [21,22]. All analyses were implemented in R, version 3.4.4.

Functional Annotations.
To evaluate the potential biological significance of reported genome-wide significant SNPs, we used the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/) to search for expression quantitative trait loci (eQTL) in RA-relevant tissues and whole blood. We also used the RegulomeDB online database

Results
A total of 1,222 unrelated RA cases and 737 controls were recruited for this research study. The prevalence of RA was higher in females (78%) than males (22%), supporting the earlier data that females are more prone to RA [14]. Eight of the 58 genotyped SNPs failed the QC (quality control) during assay design (either iPLEX® Gold/Taq-Man® or both). The genotype distribution of all QCpassed 50 SNPs adhered to the HWE. The association analyses results in our Pakistani sample are presented in Table 2. Fourteen SNPs showed nominal significance at p < 0:05 and FDR of ≤0.20.
We found the same direction of association where the minor allele showed the same risk or protective effect as compared to the previous findings on the tested allele (major or minor). In our data, LINC00824/rs1516971 showed the most significant association (p = 4:73E − 06). The second most significant SNP was PADI4/rs2240336 (p = 5:00E − 05)  Figure 1 shows the distribution of tested SNPs across the genome where the SNPs with p value < 0.05 are labeled.
Next, we examined the functional significance of all 50 SNPs using the GTEx and RegulomeDB databases. Table 3 shows the category summaries of RegulomeDB scores, and Table 4 shows 14 SNPs that had a RegulomeDB score of ≤3, indicating strong evidence of potential regulatory role. SNPs falling in this category have more likelihood to affect the binding of transcriptional factors. Out of these fourteen SNPs, only five (HLA-B/MICA/rs2596565, HLA-DRB1/rs660895, C5orf30/rs26232, CTLA4/rs3087243, and RUNX1/rs9979383) had p < 0:05 in our association results.

Discussion
There have been a number of genome-wide association studies on RA over the past decade that have resulted in the identification of many RA susceptibility loci, thus improving our understanding of the complex genetic underpinning of RA [15][16][17]. Most of the recently published GWAS have been conducted on Europeans and North Americans, which have identified many new RA risk loci and replicated and confirmed the previously reported putative risk loci. Many of these reported risk loci are shared among different ethnic groups, while some are specific to certain populations [23]. Replication studies across different ethnic groups are necessary to better understand the globewide population specificity of these established RA risk loci and guide future directions. Since South Asians, including the Pakistani population, have significant European ancestry [24,25], we chose European originated GWAS-implicated SNPs for replication in Pakistanis. For this purpose, we enrolled 1,959 unrelated RA cases and controls from two public and private Rheumatology facilities, where people from different demographic regions of Pakistan visit for treatment, and we successfully genotyped 50 GWAS-implicated SNPs in those subjects.
In our data set, rs1516971 was the most significant SNP (p = 4:73E − 06), which is an intronic variant in the LINC00824 gene at chromosome 8q24.21. Previously, this variant has also been reported in a trans-ethnic association study of RA (p = 1:3E − 10) [16]. Another SNP (rs6651252) Disease Markers from the same region, which is in complete LD (r 2 = 1:00) with rs1516971, has also been reported to be associated with RA and Crohn's disease [15,26,27]. GTEx expression data suggests that rs1516971 affects the expression of RP11-89M16.1 at chromosome 8 in transformed fibroblast cells (p − eQTL = 0:0157).

Conclusions
We were able to successfully replicate 14 of 50 SNPs selected from previously published GWAS results in European populations in our unique Pakistani population. These findings suggest that there is a sharing of RA risk loci among different population groups. A weakness of our study is the availability of relatively small-sized sample, which may have prevented 36 SNPs from achieving the significance threshold, although they showed a trend for the same directional effects as the reported ones. Further studies using larger samples may help to identify and replicate more RA risk loci in the Pakistani population.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
All authors declared no competing interests.