Identification of Susceptibility Genes for Peritoneal, Ovarian, and Deep Infiltrating Endometriosis Using a Pooled Sample-Based Genome-Wide Association Study

Characterizing genetic contributions to endometriosis might help to shorten the time to diagnosis, especially in the most severe forms, but represents a challenge. Previous genome-wide association studies (GWAS) made no distinction between peritoneal endometriosis (SUP), endometrioma (OMA), and deep infiltrating endometriosis (DIE). We therefore conducted a pooled sample-based GWAS and distinguished histologically confirmed endometriosis subtypes. We performed an initial discovery step on 10-individual pools (two pools per condition). After quality control filtering, a Monte-Carlo simulation was used to rank the significant SNPs according to the ratio of allele frequencies and the coefficient of variation. Then, a replication step of individual genotyping was conducted in an independent cohort of 259 cases and 288 controls. Our approach was very stringent but probably missed a lot of information due to the Monte-Carlo simulation, which likely explained why we did not replicate results from “classic” GWAS. Four variants (rs227849, rs4703908, rs2479037, and rs966674) were significantly associated with an increased risk of OMA. Rs4703908, located close to ZNF366, provided a higher risk of OMA (OR = 2.22; 95% CI: 1.26–3.92) and DIE, especially with bowel involvement (OR = 2.09; 95% CI: 1.12–3.91). ZNF366, involved in estrogen metabolism and progression of breast cancer, is a new biologically plausible candidate for endometriosis.


Introduction
Endometriosis is an inflammatory estrogen-driven condition, defined as misplaced endometrium outside of the uterine cavity and causing chronic pelvic pain and infertility [1,2].Endometriosis is a major women's health concern that dramatically impairs the quality of life.With a prevalence reaching up to 10% of women of reproductive age, endometriosis has a strong socioeconomic impact [3].It makes sense to consider endometriosis as a public health priority and to assign substantial human and financial resources to improve the management of patients [4].
Endometriosis is held as a complex heritable trait, with additive genetic effects accounting for about one half of the variance [5].In recent years, two genome-wide association studies (GWAS) have been conducted in individuals from Japan, Australia, and UK.The first Japanese study reported a significant association of endometriosis with rs10965235 located in CDKN2BAS in 9p21 and with rs16826658m in the linkage disequilibrium block including WNT4 in 1p36 [6].CDKN2BAS regulates P16, a tumor suppressor genes repressed in endometriosis [7], possibly by promoter hypermethylation [8].WNT4 has a role in the development of the genitourinary system, steroidogenesis, and folliculogenesis 2 BioMed Research International [9,10].The second UK/Australian GWAS found an association of endometriosis with rs12700667 located in 7p15.2 in an intergenic region upstream of NFE2L3, HOXA10, and HOXA11 [11].The role of HOX genes in endometriosis-related infertility has been largely debated [7].After pooling the data from the two studies, rs12700667 remained significantly associated with endometriosis and also with stages III/IV [12].Consistent with these associations, the latest GWAS implicated a 150 kb region around WNT4 that also include LINC00339 and CDC42 [13].An independent set of 305 laparoscopically proven endometriosis patients and 2710 controls confirmed WNT4, CDKN2BAS, and FN1 as the first identified common loci for endometriosis [14].
With odds ratios below 1.5, these associations are nonetheless not strong enough to suggest a causal relation or to consider a potential clinical application [15].This could be due to different genetic backgrounds across populations: rs10965235 for instance is monomorphic in Caucasians.Most likely, this reflects the heterogeneity of endometriosis [16].Indeed, endometriosis has inconstant symptoms spanning from no symptoms to severe chronic pelvic pain and infertility [1,2].Three distinct clinical forms of endometriosis have been identified.(i) Peritoneal superficial endometriosis (SUP) consists of lesions lying on the surface of the peritoneum or the ovaries.(ii) Endometrioma (OMA) is an ovarian endometriotic "chocolate" cyst.(iii) Deeply infiltrating endometriosis (DIE) comprises lesions infiltrating the muscularis propria of structures surrounding the uterus (vagina, bladder, bowel, or ureters) [17].The phenotypic distinction between SUP, OMA, and DIE may also be raised in terms of their sex hormone responsiveness.SUP, OMA, and DIE have different responsiveness to progesterone [18], differential gene expressions [19,20], and specific putative variants that may influence one form and not the others [21][22][23].
Therefore, we initiated a pooled sample-based GWAS with a distinction between SUP, OMA, and DIE.As this was not a classic GWAS approach, we conducted an initial discovery step on 10-individual pools in biological duplicates using the Affymetrix GenChip 250K Nsp chip.After quality control filtering and SNPs ranking based on a false discovery rate (FDR) below 5%, we performed a replication step of individual genotyping to validate the top-ranked SNPs in an independent cohort of 259 endometriosis and 288 controls.Endometriosis was confirmed histologically in cases and invalidated surgically in controls.

Study Population.
The data were obtained from 627 unrelated women of Caucasian origin treated in our tertiary referral center for endometriosis and gynecologic conditions.The 319 endometriotic patients underwent a complete surgical resection of all visible lesions allowing confirmation of the diagnosis by expert pathologists in all the cases.We classified the patients into SUP, OMA, and DIE, considering the most severe lesion.DIE was histologically defined when endometriotic lesions involved the muscularis of the vagina, the bladder, the intestine, or the ureter [17].As these three types of lesions were frequently concomitant [24,25], patients were arbitrarily classified as per the worst findings [26].Hence, a patient presenting with SUP lesions associated with OMA and DIE was classified as DIE.As well, a patient with OMA and concomitant SUP lesions was classified as OMA, whereas a patient classified as SUP only presented SUP lesions.The 308 controls (CTR) consisted of women without any lesion suggestive of endometriosis, as checked during a comprehensive surgical exploration.Indications for surgery in these patients were infertility work-up, symptomatic uterine fibroids, benign ovarian cysts, or chronic pelvic pain.
The discovery group was composed of 60 cases (20 SUP, 20 OMA, and 20 DIE) and 20 CTR randomly selected from the entire cohort for DNA-pooling experiments.The DIE group included the most severe patients, that is, those presenting at least one lesion infiltrating the bowel and associated bilateral OMAs more than 3 cm.This was done intentionally in order to have a homogeneous group since DIE might encompass a large variety of lesions.The replication cohort was composed of the remaining 259 cases (42 SUP, 121 OMA, and 96 DIE) and 288 CTR.The ethical review board of our center (Comité Consultatif de Protection des Personnes dans la Recherche Biomédicale de Paris-Cochin) approved the study design.All subjects provided written informed consent before entering the study.

DNA Extraction and
Quantification, Pool Construction, and SNP Allelotyping.Five millimeters of venous blood were collected from individuals into EDTA tubes.Genomic DNA was subsequently extracted with the MagNa Pure Compact Nucleic Acid Isolation Kit (Roche Applied Science, Indianapolis, IN, USA).DNA was quantified using a spectrophotometer (NanoDrop 1000, Thermo Scientific, Wilmington, DE, USA) and diluted to a concentration of 50 ng/L.Afterwards each DNA was quantified using fluorimetry (Quant-It DNA Assay Kit, Invitrogen, Carlsbad, CA, USA) and checked for quality using 1% agarose gel electrophoresis.Each pool was composed of 10 samples of 100 ng DNA aliquots from the same category (SUP, OMA, DIE, and CTR).In all, we built eight 10-patient DNA pools, two per category in biological replicates (2 SUP, 2 OMA, 2 DIE, and 2 CTR).DNA quantification and quality were checked several times after the pooling procedure to make sure each pool had the same amount of DNA.Genotyping analysis with GenChip Human Mapping 250K Nsp Array Set (Affymetrix, Santa Clara, CA, USA) was performed for each pool following the manufacturer's guidelines.Briefly, genomic DNAs were restricted with NspI.NspI adaptors were then ligated to restricted fragments and subjected to PCR using the universal primer PCR002 provided by the kit.PCR fragments were then purified and 90 g used for fragmentation and end-labeling with biotin using Terminal Transferase.Labeled targets were then hybridized overnight to Genechip human 250K NspI array (Affymetrix) at 49 ∘ C. Chips were washed on the fluidic station FS450 following specific protocols (Affymetrix) and scanned using the GCS3000 7G.The image was then analyzed with the GCOS software to obtain raw data.
Quality controls were performed using Affymetric GType software and the MPAM algorithm (Modified Partitioning Around Medoids).All samples had a call rate >94% and a detection rate >99%.

Analysis of Microarrays Raw
Data.Raw data, presenting as fluorescence intensities for each 25-base perfect match probe (PM) and mismatch probe (MM), were reported and analyzed using Excel 2008 (Microsoft, Redmond, WA).For a given SNP, we computed the amount of fluorescence () of each allele ( and ) as  =  − .Allele frequency for allele  was estimated by the formula   =   /(  +   ) and reciprocally for allele ,   =   /(  +   ).Then, we calculated the ratio of allele frequencies () between the cases (SUP, OMA, or DIE) and the controls (CTR).For a given category, since we had 2 pools per category (SUP 1 and SUP 2 , OMA 1 and OMA 2 , DIE 1 and DIE 2 , and CTR 1 and CTR 2 ), we obtained 4 ratios of allele frequencies.As instance for DIE: Finally, we calculated the mean of the 4 ratios of allele frequencies and the coefficient of variation (defined as the ratio of the standard deviation to the mean).

Genotypes Calling.
SNPs were sorted according to the mean of the four ratios of allele frequencies in each category.We selected the SNPs where the mean of the ratios of allele frequencies was >20.Then, for each selected SNP on a given chromosome, we submitted the mean of the four ratios of allele frequencies and the coefficient of variation to a Monte-Carlo simulation [27].Briefly, we simulated artificial chromosomes with a number of SNPs identical to that of the chip.For each SNP, we produced random allele frequencies for cases and controls (two of each).These frequencies were used to generate a mean of the four ratios of allele frequencies and a coefficient of variation that were compared to the actual values of each real SNP.This procedure was repeated thousand times, enabling to define a probability of random occurrence of a given pair (mean and coefficient of variation).When this  value was <0.05, corresponding to a FDR of 5%, the SNP was considered as significantly biased between cases and controls.For instance, for chromosome 1, four frequencies corresponding to 19,865 SNPs were randomly simulated and used to generate ratios, means, and coefficients of variation, along the lines of what was done with the real dataset.Then, for a real given SNP we performed a test to check if its values (mean and coefficient of variation) could occur and how many times in the artificial database following its random generation.This Monte-Carlo procedure allowed us to eliminate the SNPs whose coefficient of variation was too high and to select the SNPs with a FDR  value of 0.05.After this procedure, we obtained a list of 280 SNPs.

Individual Genotyping.
To validate high-ranking SNPs from the GWAS, we made another selection by using different criteria.(1) We first selected SNPs from gene regions containing at least 2 SNPs positioned among the top-280 SNPs and/or common to at least two endometriosis subtypes.In this way, we selected 16 SNPs that were individually genotyped by pyrosequencing in 188 cases (42 SUP, 50 OMA, and 96 DIE) and 96 controls from the replication cohort.
(2) We also selected the top-5 SNPs, those with a  value after Monte-Carlo testing below 0.0001.These SNPs were genotyped in 71 OMA and 192 controls.For technical reasons (degraded DNA), samples used for genotyping the top-5 SNPs were different from those used to test the 16 SNPs first selected.Primers for PCR amplification and pyrosequencing were purchased from BioTeZ (Buch, Germany) (see Supplemental Table 1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/461024).Regions of interest were amplified using 20 ng of human genomic DNA and 5 pmol of forward and reverse primer, one of them being biotinylated.Reaction conditions were 10x Platinum Taq DNA Polymerase High Fidelity buffer supplemented with 0.5 mM MgCl 2 , 200 mM dNTPs, and 1.5 U Platinum Taq DNA Polymerase High Fidelity (Invitrogen, Cergy Pontoise, France) in a 25 L volume.The PCR program consisted of a denaturing step of 4 min at 95 ∘ C followed by 50 cycles of 30 s at 95 ∘ C, 30 s at the respective annealing temperature, and 15 s at 72 ∘ C, with a final extension of 4 min at 72 ∘ C. Ten L of each amplification product were purified and rendered single-stranded on a pyrosequencing workstation (Qiagen, Uppsala, Sweden).Beads were released into 12 L annealing buffer containing 4 pmol of the respective sequencing primer.Primers were annealed to the target by incubation at 80 ∘ C for 2 min.Genotyping was carried out on a PyroMark MD system with the PyroGold SQA Reagent Kit (Pyrosequencing) and results were analyzed using the PyroMark MD software (Pyrosequencing).A genetic association study was performed.Statistical analysis was performed using chisquare statistics for assessing single SNP association.A  value of <0.05 was considered statistically significant.

Baseline Characteristics of the Cohorts.
Three hundred and nineteen (319) endometriosis cases and 308 diseasefree controls were recruited for the study.Basic clinical characteristics of the discovery group ( = 60) and the replication cohort ( = 567) are described in Supplemental Tables 2 and 3.All patients were of Caucasian origin and presented similar mean age, BMI, parity, gravidity, and infertility.In the discovery group, DIE patients ( = 20) were chosen among the patients having a bowel involvement with associated bilateral OMAs, in order to improve the clinical homogeneity of the patients.Indeed, patients with bowel DIE are considered to have the most severe form of endometriosis.In the replicative cohort, DIE group was comprised of patients having lesions of the uterosacral ligaments ( = 24), vagina ( = 23), bladder ( = 12), or intestinal ( = 37) lesions.Most of the DIE patients had a bowel involvement (38.5%).

Pooled Sample-Based GWAS.
Eight DNA pools of 10 samples each (two per endometriosis subtype and two for controls, in biological duplicate) were hybridized to the Affymetrix 250K chip.For the analysis of pooled samplebased array data, we used Monte-Carlo simulations of the mean of ratios of allelic frequencies and coefficient of variation to rank the 262,000 array SNPs.Two hundred and eighty SNPs were significantly associated with at least one endometriosis group.The complete list of top-ranked SNPs is provided in Supplemental Table 4.The number of significant SNPs was not different according to the subtype (102, 108, and 104 for SUP, OMA, and DIE, resp.).Thirty-two (11.4%)SNPs were common to at least two subtypes (Table 1).Two (0.7%) SNPs (rs3746192 and rs776108) were common to the three subtypes.Fifteen (5.4%) SNPs were common to SUP and OMA, six (2.1%) to SUP and DIE, and nine (3.2%) to OMA and DIE.Among the 280 top-ranked SNPs, SUP had 79 (77.4%) specific SNPs, OMA 82 (75.9%), and DIE 87 (83.7%).This distribution, although not significant ( 2 test,  = 0.347), suggests that DIE with bowel involvement may be more homogeneous than other forms of the disease.
We examined the expression of the genes corresponding to the 108 top-ranked SNPs in the OMA group according to our transcriptome analysis of OMA [7] (Table 1, Supplemental Table 4).Twenty-seven genes (25.0%) were modified at the expression level (i.e., induced or repressed more than 2-fold), while 16.8% of all the genes were modified in OMA as compared to eutopic endometrium ( < 0.001) [7].This suggests that these 108 variants might affect gene expression in OMA.Such an association is quite original and could give independent clues about the importance of a given gene in a complex disease.Another way to obtain independent clues is the extensive use of bioinformatics that can test whether groups of genes were obtained randomly or not.We performed nonsupervised gene functional clustering separately for each subtypes by using DAVID Bioinformatics Resources (http://david.abcc.ncifcrf.gov/home.jsp).We observed a strong bias of endometriosis-related genes towards transmembrane proteins affecting cell/cell interactions and cell adhesion (Supplemental Table 5).

Individual Genotyping.
For this replication step, the average genotyping rate was 97.5%.Hardy-Weinberg equilibrium was confirmed in the control group for all SNPs tested.We performed individual genotyping of 192 cases (96 DIE, 50 OMA, and 46 SUP) and 96 controls for selected SNPs.We first settled on the 16 SNPs that presented the following characteristics: (i) common to the 3 subtypes (rs3746192, rs776108) or (ii) common to 2 subtypes and in genes containing other significant SNPs (rs988147, rs227849, rs10733833, rs322609, rs1884779, rs4703908, rs6946871, rs11865033, rs247004, rs11647011, rs7477459, rs5913038, rs17056959, and rs16986515).Rs227849 and rs4703908 were significantly associated with OMA (ORs = 2.31 (1.41-3.78) = 0.003 and 2.22 (1.26-3.92) = 0.009, resp.)(Table 2, Supplemental Table 6).Rs227849, introducing A instead of a G at position 44698708 on chromosome 6, is located near RUNX2, SUPT3H, and CDC5L.In the SUP group, a small difference was observed for rs227849 ( = 0.08).In the DIE group, rs4703908 was significantly associated with a higher risk of bowel endometriosis (OR = 2.09 (1.12-3.91) = 0.012) (Table 2, Supplemental Table 7).Rs4703908, related to a G to C change, is located on chromosome 5 in an intronic region near ZNF366 (zinc finger protein 366).The results for DIE+OMA were not significant for any of the SNPs tested, expect for rs4703907 that was just over the limits of statistical significance ( = 0.09).We then focused on OMA, which is regarded as the most well-defined endometriosis phenotype [28].For this attempt we used even more stringent criteria to select 5 SNPs (rs11865033, rs16913217, rs966674, rs7333155, and rs2479037).Those criteria were the following: (i) probability of random occurrence in the Monte-Carlo simulation ≤1/1000 and (ii) allelic frequency of the SNP in controls similar to that given by HapMap.Rs2479037 and rs966674 were found significantly associated with OMA (OR = 4.36 (1.32-14.43) = 0.005 and 2.95 (1.60-5.42) = 0.002, resp.)(Table 2, Supplemental Table 8).

Discussion
In this Caucasian population-based study, we performed a pooled sample-based GWAS in endometriosis with a distinction between SUP, OMA, and DIE.Selection of the cases was as rigorous as possible.Systematic histologic confirmation was obtained allowing undoubted classification into one of the three groups.All controls were surgically checked for the absence of endometriosis.We found four new variants (rs227849, rs4703908, rs2479037, and rs966674) significantly associated with an increased risk of OMA.Rs4703908, located close to ZNF366, provided a higher risk of OMA (OR = 2.22 (1.26-3.92))and DIE, especially with bowel involvement (OR = 2.09 (1.12-3.91)).
This study demonstrated the feasibility of the pooled sample-based method.In "classic" GWAS, the cost associated with the purchase of microarrays and dedicated reagents for tens of thousands of patients put such studies beyond the reach of most research groups.An alternative approach of genotyping pooled instead of individual genomic DNA samples has been developed to reduce the overall cost of GWAS.Our study confirmed the feasibility of DNA pooling on SNP genotyping microarrays and that the pooled sample-based approach presents an attractive and cost-effective alternative to classic GWAS, like other reports [29][30][31][32].The study design has been completed for thousands of dollars whereas individual genotyping requires millions of dollars.For this reasons, other research groups could easily initiate replication and validation of our results.This possibility counterbalances the limitations associated with the pooling strategy, mainly due to the small number of DNA samples.Using 10-individual pools carries a risk of population stratification, as suggested by the discordance in allelic frequencies between our study and the HapMap data for some SNPs.However, two procedures have been used in the present study in order to minimize the biases and remove SNPs for which strong intergroup differences were observed in terms of allelic frequencies: (i) the analysis in parallel of biological duplicates (two pools of 10 patients per condition) and (ii) the Monte-Carlo simulation.In other words, our approach was certainly very stringent but not very sensitive.The SNPs we selected were likely Table 1: Top ranked SNPs significantly associated with at least two endometriosis subtypes.Gene name is provided if the genotyped SNP is located in upstream, 3  UTR, intronic, exonic, 5  UTR, or downstream regions of that gene as annotated by Ensembl Genome Browser GRCh37 (Feb 2009).Inter denotes that a SNP is located in an intergenic region.Chr: chromosome.CV: coefficient of variation.In bold are figured the SNPs common to at least two endometriosis subtypes.Induction ratio is the ratio of gene expression in the endometriotic lesion as compared to the expression in the eutopic endometrium (data from Borghese, 2008)  to be valid, but we probably missed some SNPs associated with endometriosis.This point is highlighted by the fact that we did not replicate the SNPs reported by "classic" GWAS [6,11,13,14].Indeed, it seems difficult to compare the results from "classic" GWAS to our results.First, the SNPs spotted on the Affymetrix chip are not exactly the same as those on the Illumina chip used by other groups.Some genomic regions are not covered by the Affymetrix chip and reciprocally.For example the Affymetrix chip did not cover a genomic region of >5 kb around the variant reported by the Japanese GWAS (rs16826658m).In addition, the patients included in the "classic" GWAS consisted of all-coming endometriosis without distinction between the different phenotypes.Even when a distinction between stages I/II and III/IV was carried out, the rAFS classification was not systematically correlated with the phenotype or the extension of the lesions [33].In other words, it is quite possible that a patient scored stage I or II might have a bowel endometriosis [34].Rationale for using the rAFS classification rather than the clinical subtypes (SUP, OMA, and DIE) is truly questionable [35].
It is noteworthy that the four individually validated polymorphisms (rs4703908, rs227849, rs966674, and rs2479037) were found in patients suffering from OMA, not from SUP or DIE.In light of the known heterogeneous character of endometriosis, OMA probably represents the most "clearcut" phenotype of all endometriosis subtypes.Our team recently emphasized this point [28].Moreover rs4703908 was significantly associated with bowel endometriosis, a specific form of DIE, considered as the most severe one.This might indicate that bowel DIE could be more genetically driven than the other forms.This observation reinforced the importance of the clinical phenotype of endometriosis, which is, in our opinion, a crucial parameter in GWAS.It might be interesting to collect precise and homogeneous phenotypes, in a multicentric international effort, to conduct targeted GWAS in those populations [17,34,36].
Beyond the above-mentioned limitations, we introduced a new tool in the fields of genetics of endometriosis.Two SNPs were common to all subtypes: rs3746192 and rs776108.Rs3746192 is located in an intronic region near KCNN1 (potassium intermediate/small conductance calcium-activated channel, subfamily , member 1) on chromosome 19 and related to a G to A change.KCNN1 is a member of the KCNN family of potassium channel genes that are thought to regulate neuronal excitability.Rs776108, related to an A to G change, is located in a noncoding region of chromosome 3, near ROBO2 (roundabout, axon guidance receptor, and homolog 2).ROBO2 is a receptor for SLIT2, known to function in axon guidance and cell migration [37].Defects in this gene are the cause of vesicoureteral reflux type 2 [38].With regard to their function, KCCN1 and ROBO2 are good candidate genes for endometriosis, since neoneurogenesis frequently occurs in endometriosis [39].In the OMA group, we have identified four new polymorphisms: rs4703908, rs227849, rs966674, and rs2479037, with robust ORs.Rs2479037 is close to VTI1A, possibly involved in colorectal adenocarcinoma [40].Rs4703908, associated with both OMA and intestinal DIE, is located near ZNF366 that plays an important role in regulating the expression of target genes in response to estrogen [41].ZNF366 could be an independent prognostic factor for breast tumors [42].ZNF366 could also act as a tumor suppressor in breast cancer development [43,44].These findings make it a plausible candidate gene for endometriosis, which is an estrogendependent disease [45].
To conclude, the present study clearly indicated some areas of the genome to focus on and confirmed the heterogeneity of endometriosis.The combined use of several polymorphisms identified in this study and in others could eventually permit to define Bayesian probabilities enabling to categorize the patients and predict the risk of endometriosis in future life, as shown on other subjects [46].Such predictive tool could contribute to a better follow-up of every woman and move towards a person-centered care of infertility and pain associated with endometriosis.
. A ratio above 2.0 denotes an upregulation of the gene expression.A ratio below 0.50 denotes a downregulation of the gene expression.

Table 2 :
Distribution of genotype and allele frequencies in cases and controls.NA: not applicable, OR: odds ratio, 95% CI: 95% confidence interval, CTR: controls, SUP: superficial peritoneal endometriosis, OMA: endometrioma, and DIG: deep infiltrating endometriosis (DIE) with intestinal involvement.valuesareforthe Pearson's chi-square test.OR and 95% CI are computed for comparison between OMA or DIG and CTR.Phet:  value for heterogeneity across discovery and replication steps calculated using the Breslow-Day test.Detailed results for the 21 SNPs tested are available as Supplemental Tables6 and 8. Detailed results for all subtypes of DIE are available as Supplemental Table7.