Whole-Genome Sequencing Identifies the Egl Nine Homologue 3 (egln3/phd3) and Protein Phosphatase 1 Regulatory Inhibitor Subunit 2 (PPP1R2P1) Associated with High-Altitude Polycythemia in Tibetans at High Altitude

Background The hypoxic conditions at high altitudes are great threats to survival, causing pressure for adaptation. More and more high-altitude denizens are not adapted with the condition known as high-altitude polycythemia (HAPC) that featured excessive erythrocytosis. As a high-altitude sickness, the etiology of HAPC is still unclear. Methods In this study, we reported the whole-genome sequencing-based study of 10 native Tibetans with HAPC and 10 control subjects followed by genotyping of selected 21 variants from discovered single nucleotide variants (SNVs) in an independent cohort (232 cases and 266 controls). Results We discovered the egl nine homologue 3 (egln3/phd3) (14q13.1, rs1346902, P = 1.91 × 10−5) and PPP1R2P1 (Protein Phosphatase 1 Regulatory Inhibitor Subunit 2) gene (6p21.32, rs521539, P = 0.012). Our results indicated an unbiased framework to identify etiological mechanisms of HAPC and showed that egln3/phd3 and PPP1R2P1 may be associated with the susceptibility to HAPC. Egln3/phd3b is associated with hypoxia-inducible factor subunit α (HIFα). Protein Phosphatase 1 Regulatory Inhibitor is associated with reactive oxygen species (ROS) and oxidative stress. Conclusions Our genome sequencing conducted in Tibetan HAPC patients identified egln3/phd3 and PPP1R2P1 associated with HAPC.


Introduction
In excess of 140 million humans have primarily lived in the high-altitude regions of various locations around the world, including the Qinghai-Tibet plateau in Asia, the Andes Mountains of South America, and the Ethiopian plateau in East Africa [1]. All dwellers in plateaus like native Tibetans of the Qinghai-Qinghai-Tibet plateau are regarded as one of the best dwellers adapted to high altitudes. They possess heritable adaptations to the hypoxic condition [2], as they display lower hemoglobin and hematocrit [3], higher oxygen saturation of blood [4], and better work efficiency [5], which are conducive to adapting the high-altitude and hypoxic situations. However, some Tibetans living at high altitude with genetic adaptations remain maladapted and possess symptoms of high-altitude polycythemia (HAPC).
HAPC is diagnosed with excessive erythrocytosis encountered by around 10% of the population residing at the Qinghai-Tibet plateau [6,7]. HAPC shows several impacts on blood viscosity with various harmful consequences such as dysfunction of microcirculation, vascular thrombosis, extensive organ damage, sleep disturbances, and even death [7,8]. In high-altitude dwellers, hypobaric hypoxia is thought to be one of the primary initiators of HAPC. However, the mechanisms underlying the pathogenesis and genetic basis of HAPC have not been well understood.
In recent years, the genetic basis underlying adaptation to high altitude in Tibetans has been elucidated by using exome and single nucleotide polymorphism (SNP) array data [9][10][11]. Furthermore, gene candidate approaches in Tibetan have identified that genomic variants in EPAS1, CYP17A1, CYP2E1, and mitochondrial DNA 10609T are associated with HAPC [12][13][14][15]. However, comprehensive genome-wide scanning to detect the variations associated with HAPC has not been performed. It has been known that the variants that occurred in the genome-wide association study (GWAS) may account for some diseases with only relatively modest proportions and numerous deleterious variants with low frequency in the population.
With the technical development of high-throughput sequencing, it provides an opportunity to resequence multiple genetic regions and recognize novel risk alleles in diseases including high-altitude sickness [1,[16][17][18]. To dissect the genetic mechanisms underlying HAPC, we compared genetic variations between native Tibetans with HAPC and adapted subjects without HAPC by whole-genome sequencing (WGS) of 20 individuals, and selected variants were further genotyped in an additional larger case-control cohort. Our investigation revealed the egl nine homologue 3 (egln3/phd3) (14q13.1) and PPP1R2P1 (Protein Phosphatase 1 Regulatory Inhibitor Subunit 2) gene (6p21.32) associated with HAPC.

Patients and Healthy Control Subjects.
As the same ethnicity, HAPC was collected in healthy control subjects and matched with native high-altitude Tibetans. In this study, 10 patients with HAPC and 10 control subjects without HAPC that were performed in Tibet area (Table 1) were recruited. Based on the International Consensus Statement [7], (1) HAPC was diagnosed with an Hb concentration of at least 19 g/dL and 21 g/dL for women and men, respectively, and (2) HAPC patients were local Tibetans normally living at 3600 to 4500 m. Blood samples for WGS were collected, and each subject was informed and gave written consent.

Whole-Genome
Sequencing. Blood DNA was isolated from each subject, and the desired length fragments were gel purified. According to the manufacturer's instruction, DNA cluster preparation and adaptor ligation were performed with the library preparation kit (Illumina). WGS was performed on Illumina HiseqX ten Platform with 150 base pairs (bps) paired-end reads.

Read Alignment and Variant
Calling. All reads were mapped to human reference genome by BWA [19] with default parameter. For downstream filtering, picard tools (http://broadinstitute.github. io/picard/) were used to mark PCR duplicates. We used Genome Analysis Toolkit (GATK) to adjust the alignments via GATK indel realignment and score recalibration modules [20]. We finally called and filtered SNVs using the GATK Unified Genotyper under default parameters. With the Ensembl annotated database, the resulting SNVs were annotated by ANNOVAR [21] according to the locations and expected effect on encoded gene products. According to dnSNP, SNVs were identified into "newly identified" and "known" variants.

SNP Discovery
Stage. The allele frequency distribution of 10 cases and 10 controls and variants with significant difference (P < 0:01) were analyzed by the Fisher exact test in allele frequency. To select the SNVs for association mapping, the pathway enrichment analysis using WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt/) was performed by the distribution of genes with different allele frequencies. We used the hypergeometric test and Benjamini-Hochberg FDR to determine the significance of enrichment and adjust the multiple testing, respectively. This analysis revealed pathways including the HIF-1 signaling pathway and hematopoietic cell lineage [22] were associated with high-attitude adaption. We selected 21 SNVs in the two pathways from SNVs with different allele frequencies for genotyping in the stage of the association study.

Genotyping and Statistical
Analysis. Genotyping of the selected 21 variants was performed with Sequenom MassAR-RAY or Snapshot system in an additional 232 HAPC cases and 266 controls. The primer of genotyping was designed by use of Primer 5.0. The frequency of alleles was calculated by genotype frequency in HAPC patients and control subjects, and the intergroup differences were counted using the Fisher exact test. P value for multiple testing was adjusted by Benjamini-Hochberg FDR. QQ plot was produced using the man statistics package of R (http://cran.r-project.org/ web/packages/qqman/). We use the chi-square test to assess deviations from Hardy-Weinberg equilibrium.

Results
3.1. Whole-Genome Sequencing and Single Nucleotide Variants (SNVs). After extracting DNA from the peripheral blood, the whole-genome sequencing of 10 native Tibetans with HAPC and 10 adapted subjects without HAPC was performed by using next-generation sequencing technology (Supplementary Table S1). As a result, there were more than 360 million pairedend reads for each subject (Table 2). More than 99.7% of the reference genome was covered, and we sequenced the genome with a mean depth of 39x per individual. After applying stringent quality controls, we obtained 69,591,555 SNVs (single nucleotide variants), of which 191,734 (0.28%) were nonsilent mutations (nonsynonymous and splice site mutations). Comparing with 1000 Genome Project [23] gave 94% of SNVs known in 1000 Genomes Phase 1 ASN.
To discover the susceptibility loci associated with HAPC, two-stage studies including SNP discovery study and replication study were performed. To select the candidate SNVs for association mapping, we applied the Fisher exact test to WGS data and identified 33,662 SNVs (P < 0:01) in allele frequency between native Tibetans with HAPC and adapted subjects without HAPC, of which 72 were nonsilent SNVs (Supplementary Table S2). Pathway enrichment analysis was performed on the genes with significant difference in allele frequency and revealed pathways including the HIF-1 signaling pathway [24] and hematopoietic cell lineage [22] that were associated with high-attitude adaption (Supplementary  Table S3). To explore the association between HAPC and the two pathways, we subjected 21 SNVs that are in the two pathways and also showed allele frequency difference in the above Fisher exact test for genotyping in the stage of the association study (Supplementary Table S4).

Association Study.
Associations between the 21 selected variants and HAPC were evaluated by genotyping in an  Table S5). Most of the observed associations showed close to the expected distribution based on the quantile-quantile (QQ) plot, which is consistent with the null hypothesis of no association (Figure 1). Two loci exceeded significance in association with HAPC (Table 3): egln3/phd3 at rs1346902 in EGLN3 on 14q13.1 (P = 1:91 × 10 −5 , odds ratio ðORÞ = 0:57, 95% confidence intervals (CI): 0.44-0.74) and PPP1R2P1 at rs521539 in the region between HLA-DRB1 and HLA-DQA1 on 6p21.32 (P = 0:012, OR = 1:41, 95% CI: 1.08-1.84). The allele frequencies of other variants between these two groups were not significantly different. Furthermore, the overlap of these two associated SNPs was evaluated with the Encyclopedia of DNA Elementsannotated genomic elements (Supplementary Table S6). There were histone modifications at promoter or enhancer; DNase hypersensitivity sites, binding proteins, and motifs changed in both SNVs.

Discussion
In the present study, we used whole-genome sequencing of native Tibetans with HAPC and adapted subjects without HAPC to identify alleles' associated risk in previously unidentified HAPC susceptibility genes. We selected 21 variants from our WGS data for further evaluation in an additional larger cohort of Tibetans with HAPC and controls. The differences in the SNPs of tegln3/phd3 (C/G) and PPP1R2P1 (A/G) between the two groups are found for the first time in this present paper. SNP rs1346902 is located in the intron region (transcript ID: ENST00000487915 and ENST00000551935) of EGLN3 (also known as PHD3) that belongs to the Egl-9 (EGLN) family. EGLN3, along its paralogous genes EGLN1 and EGLN2, can catalyze the hydroxylation of hypoxiainducible factor subunit α (HIFα), leading to increased ubiquitination and proteasomal degradation of HIFα and decreased HIFα activity [25]. Upregulated EGLN3 is related to apoptosis in sympathetic neurons [26], p53-induced growth arrest in RAS-transformed embryo fibroblasts [27], and differentiation of C2C12 myoblasts [28]. EGLN3 also plays important roles in human endothelial cells [29] and cancers, including pancreatic cancer [30] and glioblastoma [31]. Previous study also identifies that EGLN3 can negatively regulate the NF-κB pathway, which is a crucial regulator of cell differentiation, inflammation, stress responses, and immunity [32]. Although association analysis with microsatellite markers in an Andean high-altitude population shows a possible association between EGLN3 (marker D14S1049) and severe polycythemia [33], no susceptible association between SNPs in EGLN3 and HAPC is reported. Here, we found a susceptibility locus in EGLN3 for HAPC. We also found significantly lower Hb and RBC in the subjects with rs1346902 C/C allele (Figures 2(a) and 2(b)), supporting and extending the potential role of EGLN3 in HAPC. Therefore, the significant difference of genotypes between two groups for SNP rs1346902 (C/C) was the beneficial allele of HAPC and it might contribute to high-altitude adaption in Tibetans with lower Hb and RBC.
A large number of genomic surveys about nucleotide polymorphism among humans in high-altitude environments have offered evidence for genes in the HIF oxygen signaling pathway, among which EPAS1 and EGLN1 are notable [9,10,[34][35][36]. EPAS1, EGLN1, VEGFA, CYP17A1, and CYP2E1 in the HIF pathway are also shown to be associated with high-altitude sickness, including HAPC [12][13][14][37][38][39]. The finding of susceptibility SNP in EGLN3 for HAPC in our study further supported the key role of the HIF oxygen signaling pathway in high-altitude adaption and relevant diseases.
Human leukocyte antigen (HLA) loci have been associated with more than 100 diseases [40]. HLA-DRB1 and HLA-DQA1 play an important role in the immune system, which belong to the classical HLA class II molecules. They can take part in recognizing and presenting antigens to T cells and are mainly expressed in B cells, dendritic cells, and macrophages. In addition, several studies have found a connection between HLA class II molecules and immune- Expected −log 10 (P) Figure 1: QQ plot indicates the differences between observed and expected -log 10 (P value).  . PPP1R2P1 was also significantly correlated with the albumin to globulin ratio (e). The association analysis was examined by analysis of variance method, and differences between genotypes were evaluated with a t-test.
mediated disorders, such as susceptibility loci in HLA-DRB1 for sarcoidosis [41] and pemphigus [42] and susceptibility loci in HLA-DQA1 for achalasia [43] and celiac diseases [44]. In our study, we also found that there were significant differences (P = 0:012) in the allele frequency of PPP1R2P1 near HLA-DRB1/DQA1 between the HAPC group and the control group, with high-frequency A allele among HAPC (57%) than controls (47%). Our finding was consistent with positive selection of HLA loci (HLA-DRA, HLA-DQB1/HLA-DQA2) in Ethiopian population [45] and significantly associated with the high-altitude pulmonary edema of HLA-DR6 and HLA-DQ4 [46]. Our results also revealed a significant correlation between PPP1R2P1 (A/G) and albumin/globulin ratio (Figure 2(e)), which is an indicator of liver and kidney disorders. The implication of HLA suggests a role of inflammation perhaps associated with the kidneys or livers. Previous studies show that chronic hypoxia induced hypertension and erythrocytosis leads to renal injury and inflammation in rats [47]. The similar effect between HLA and HAPC may exist in humans. Protein Phosphatase 1 Regulatory Inhibitor has been reported to be associated with reactive oxygen species (ROS) and oxidative stress [48].
The primary limitation of our research was the small case-control cohort in the discovery phase, which might result in low power of detection of candidate susceptibility loci. Moreover, the functional role of newly identified susceptibility loci was needed to be verified by some experiments. We will pay more attention to the expression pattern of these genes and related regulatory pathways in HAPC patients.

Conclusions
In summary, our genome sequencing conducted in Tibetan HAPC patients identified two new susceptibility loci at 14q13.1 and 6p21.32 and offered to our knowledge of the genetic architecture of HAPC. Further study about these two loci was likely to improve our knowledge on the etiology of HAPC.

Data Availability
Our datasets are presented within the manuscript and additional supporting files.

Ethical Approval
The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and approved by the ethics committee of People's Hospital of Tibet Autono-mous Region, Lhasa, China. Besides, all experiments were performed in accordance with relevant guidelines and regulations

Consent
Written informed consent was obtained from the individual participants.

Supplementary Materials
Supplementary Table S1: sequence data for HAPC and control subjects. Supplementary Table S2: list of 72 nonsilent SNPs with P < 0:01 in allele frequency between cases and healthy controls in the discovery phase. Supplementary Table  S3: pathway enrichment analysis on the genes with significant difference in allele frequency between 10 cases and 10 controls. Supplementary