Potentially Functional Polymorphisms in POU5F1 Gene Are Associated with the Risk of Lung Cancer in Han Chinese

POU5F1 is a key regulator of self-renewal and differentiation in embryonic stem cells and may be involved in cancer initiation, promotion, and progression. We hypothesized that functional polymorphisms in POU5F1 may modify lung cancer risk. To test this hypothesis, we conducted a case-control study of 1,341 incident lung cancer cases and 1,982 healthy controls in a Chinese population to explore the association between 17 potentially functional SNPs in the POU5F1 gene and lung cancer risk. The variant alleles of rs887468 and rs3130457 were significantly associated with increased risk of lung cancer after correction for multiple comparisons (OR = 1.29, 95% CI: 1.11–1.51, FDR-adjusted P = 0.017 for rs887468; OR = 1.29, 95% CI: 1.10–1.51, FDR-adjusted P = 0.034 for rs3130457). In addition, we detected a significant interaction between rs887468 genotypes and smoking status on lung cancer risk (P = 0.017). Combined analysis of the two SNPs showed a significant allele-dosage association between the number of risk alleles and increased risk of lung cancer (P for trend < 0.001). These findings indicate that potentially functional polymorphisms in the POU5F1 gene may contribute to lung cancer susceptibility in a Chinese population.


Introduction
Lung cancer is the leading cause of cancer-related mortality worldwide. Over 80% of lung cancers can be attributed to cigarette smoking [1]. However, only 10% to 15% of chronic smokers develop lung cancer, indicating that other factors (e.g., genetic factors) may also play a pivotal role in lung cancer risk [2]. Recently, genome-wide association studies (GWAS) have identified dozens of loci related to lung cancer risk [3][4][5][6][7][8][9][10][11]. These loci account for only a small fraction of the risk of developing lung cancer because of the stringent screening criteria of GWAS [8]. Candidate gene studies may therefore help to explain the missing heritability.
The Pit-Oct-Unc (POU) homeodomain transcription factor POU5F1 (also known as OCT-3, OCT-4, and OCT-3/4) is a key regulator of self-renewal and differentiation in embryonic stem cells [12][13][14][15]. The POU5F1 gene is expressed in adult human stem cells, immortalized nontumorigenic cells, and tumor cells and cell lines, and its expression decreases with the onset of differentiation and loss of pluripotency in these cells [16][17][18]. According to the cancer stem cell (CSC) hypothesis, reactivation of early embryonic stem cell genes such as POU5F1 in somatic stem cells and/or differentiating progenitor cells may lead to their transformation into CSCs, which may in turn drive cancer initiation, promotion, and progression [19][20][21]. To date, high expression levels of POU5F1 have been detected in various types of cancer cells [22,23]. In particular, Karoubi et al. observed higher expression of the POU5F1 gene and an atypical cytoplasmic distribution of POU5F1 in lung adenocarcinoma cell lines, suggesting an oncogenic role in lung adenocarcinoma [24]. Polymorphisms in the POU class 5 homeobox 1 pseudogene 1 gene (POU5F1P1), a highly homologous pseudogene of POU5F1, have been associated with the risk of gastric cancer [25]. We therefore postulated that potentially functional genetic variation within POU5F1 might modify susceptibility to lung cancer. To test this hypothesis, we conducted a case-control study of 1,341 incident lung cancer cases and 1,982 healthy controls in a Han Chinese population.
After providing written informed consent, each participant donated a 5 mL venous blood sample and underwent a face-to-face interview that collected information on demographic characteristics (e.g., age and sex) and health-related behaviors (e.g., smoking). Participants who had smoked one or more cigarettes per day for more than 1 year were considered smokers; smokers who had quit smoking for more than 1 year were defined as former smokers; all others were classified as never smokers [6].
A minor allele frequency (MAF) threshold of ≥0.05 was used to filter out low-frequency variants. The remaining variants were annotated with the SNPinfo Web Server (http://snpinfo.niehs.nih.gov/), and 27 SNPs were selected as potentially functional variants. We then performed linkage disequilibrium (LD) analysis with an r² threshold of 0.8, and 19 SNPs were retained for genotyping. However, rs1265163 and rs3132517 were excluded because of probe design failure, leaving 17 SNPs in the final set (Table 2).
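The LD step can be illustrated with a minimal Python sketch. It computes r² between two biallelic SNPs from a haplotype frequency and the two allele frequencies, then greedily keeps SNPs whose r² with every already-kept SNP is below 0.8. The source does not say which software performed the LD analysis or how SNPs were prioritized, so the function names, the input format (a dictionary of pairwise r² values), and the greedy ordering are assumptions for illustration only.

```python
# Minimal LD-pruning sketch (hypothetical helpers; not the authors' pipeline).

def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    Assumes both SNPs are polymorphic (a MAF >= 0.05 filter guarantees this),
    so the denominator is non-zero.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def greedy_prune(snps, r2, threshold=0.8):
    """Greedy LD pruning in the given priority order.

    r2 maps frozenset({snp_x, snp_y}) to the pairwise r^2; missing pairs are
    treated as unlinked (r^2 = 0). A SNP is kept only if its r^2 with every
    previously kept SNP is below the threshold.
    """
    kept = []
    for snp in snps:
        if all(r2.get(frozenset((snp, k)), 0.0) < threshold for k in kept):
            kept.append(snp)
    return kept


# Usage with made-up SNP IDs and r^2 values:
pairs = {frozenset(("snp1", "snp2")): 0.9, frozenset(("snp1", "snp3")): 0.2}
print(greedy_prune(["snp1", "snp2", "snp3"], pairs))  # ['snp1', 'snp3']
```

Here snp2 is dropped because it is in strong LD (r² = 0.9) with snp1, which was kept first; a different priority order could retain a different tag SNP from the same LD block.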

Genotyping.
Genomic DNA was extracted from leukocyte pellets by proteinase K digestion, followed by phenol-chloroform extraction and ethanol precipitation. Genotyping was performed with the Illumina Infinium BeadChip (Illumina Inc.), and genotypes were called with the GenTrain version 1.0 clustering algorithm in GenomeStudio V2011.1 (Illumina Inc.). Technicians performing the genotyping were blinded to the case or control status of participants.

Statistical Analysis.
Deviation of the genotype distribution of each SNP from Hardy-Weinberg equilibrium was tested with a goodness-of-fit χ² test. Student's t test for continuous variables and the χ² test for categorical variables were applied to compare the distributions of demographic characteristics and genotypes between cases and controls. The association between each SNP and lung cancer risk was examined under an additive model, using logistic regression to estimate odds ratios (ORs) [26] and 95% confidence intervals (CIs) with adjustment for age, sex, and pack-years of smoking when appropriate. We used the χ²-based Q test to assess heterogeneity between corresponding subgroups. Multiplicative interactions were tested with a general logistic regression model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \boldsymbol{\gamma}^{\top}\mathrm{Covar},$$
where $Y$ is the logit of case-control status, $X_1$ and $X_2$ are the two factors (SNP or smoking status), $\beta_0$ is the constant, $\beta_1$ and $\beta_2$ are the main effects of $X_1$ and $X_2$, respectively, $\beta_3$ is the interaction term, and Covar denotes the covariates adjusted for (age and sex), with coefficient vector $\boldsymbol{\gamma}$.
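The two χ²-based tests named above can be sketched in Python with only the standard library. The sketch assumes genotype counts for a single SNP (HWE test) and subgroup ORs with 95% CIs (Q test). It back-calculates the standard error of each log OR from the CI width, a common approximation that is not necessarily what the authors did. The helper `chi2_sf` and the other function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic x with integer df >= 1,
    from closed forms of the regularized upper incomplete gamma function."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def hwe_chi2(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test (1 df) of one SNP's genotype counts
    (homozygous AA, heterozygous AB, homozygous BB) against HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, 1)


def cochran_q(ors, lowers, uppers):
    """Chi-square-based Q test for heterogeneity of subgroup ORs.
    The SE of each log(OR) is back-calculated from its 95% CI (an approximation)."""
    betas = [math.log(o) for o in ors]
    ses = [(math.log(u) - math.log(lo)) / (2 * 1.959964) for lo, u in zip(lowers, uppers)]
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)  # inverse-variance mean
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(ors) - 1
    return q, df, chi2_sf(q, df)
```

For example, `hwe_chi2(30, 40, 30)` returns a statistic of 4.0 with P ≈ 0.0455, because the expected counts under HWE are 25, 50, and 25.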
All analyses were performed using R software (version 3.1.1, The R Foundation for Statistical Computing, http://www.cran.r-project.org/).
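The multiplicative interaction model in the Statistical Analysis paragraph can be illustrated with a minimal, dependency-free Python sketch. It fits the logistic model by Newton-Raphson and applies a Wald test to the interaction coefficient β3. The paper states only that R was used; the Newton-Raphson fit, the Wald test (rather than, e.g., a likelihood-ratio test), and all function names are assumptions. Each design row is [1, X1, X2, X1·X2] followed by any covariates such as age and sex.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def interaction_row(x1, x2, covars):
    """Design row for Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + gamma'Covar."""
    return [1.0, x1, x2, x1 * x2] + list(covars)


def fit_logistic(rows, y, iters=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (beta, information matrix)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # X^T W X with W = diag(p(1 - p))
        for row, yi in zip(rows, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, info


def wald_test(beta, info, j):
    """Wald z-test of coefficient j (j = 3 is the interaction term b3)."""
    e = [1.0 if i == j else 0.0 for i in range(len(beta))]
    var_j = solve(info, e)[j]  # j-th diagonal element of the inverse information
    z = beta[j] / math.sqrt(var_j)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

With two binary factors and no covariates the model is saturated, so β3 equals the log of the ratio of the two stratum-specific ORs; this gives a convenient check of the fit. Real data would append age and sex to each row via `covars`.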

Results
The demographic characteristics of participants are summarized in Table 1. The distributions of age (P = 0.980) and sex (P = 0.179) were comparable between cases and controls. However, cases had larger proportions of smokers and heavy smokers than controls (P < 0.001). Of the cases, 481 (35.87%) had squamous cell carcinoma and 860 (64.13%) had adenocarcinoma.
To further examine the associations of these two SNPs with lung cancer risk, we conducted stratified analyses by selected variables (Table 4). Overall, the risk-increasing effect of the variant alleles was more pronounced in the older age group (>60 years), males, never smokers, and patients with adenocarcinoma. No significant heterogeneity in ORs was found between subgroups. We also explored interactions between these SNPs and smoking status. As shown in Table 5, there was an interaction between rs887468 genotype and smoking status (P for multiplicative interaction = 0.017), whereas no significant multiplicative interaction was observed for rs3130457 (data not shown). Furthermore, we investigated the cumulative effect of the risk alleles of the two SNPs and observed a significant allele-dosage association between the number of risk alleles and lung cancer risk (P for trend < 0.001). Compared with individuals carrying no risk allele, those carrying one or two risk alleles had an OR of 1.33 for lung cancer, and those carrying more than two risk alleles had an OR of 1.46. These data indicate that individuals who carry more risk alleles have a higher risk of lung cancer.
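The cumulative analysis can be sketched as counting risk alleles across the two SNPs, grouping individuals as 0, 1–2, or more than 2 risk alleles, and testing for a trend in the case proportion across the ordered groups. The source does not name the trend test; the Cochran-Armitage test below is a stand-in illustration, and a logistic regression with the group score as a covariate (which allows adjustment for age, sex, and pack-years) is an equally common choice. The genotype input format (a dictionary of risk-allele dosages coded 0, 1, or 2) and the function names are hypothetical.

```python
import math

RISK_SNPS = ("rs887468", "rs3130457")


def risk_allele_count(genotypes):
    """Total risk-allele dosage over the two SNPs (each coded 0, 1, or 2)."""
    return sum(genotypes[snp] for snp in RISK_SNPS)


def dosage_group(n_risk):
    """Ordered groups used in the text: 0, 1-2, and more than 2 risk alleles."""
    if n_risk == 0:
        return 0
    return 1 if n_risk <= 2 else 2


def cochran_armitage(cases, totals, scores):
    """Cochran-Armitage test for a linear trend in case proportions across
    ordered groups; returns (z, two-sided P)."""
    n = sum(totals)
    p_bar = sum(cases) / n  # overall case proportion
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, cases, totals))
    s1 = sum(m * s for m, s in zip(totals, scores))
    s2 = sum(m * s * s for m, s in zip(totals, scores))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

In use, each participant's `dosage_group` would be tallied separately for cases and controls, and the per-group case counts and totals passed to `cochran_armitage` with scores 0, 1, 2.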

Discussion
In the current study, we explored the association of 17 potentially functional SNPs in the POU5F1 gene with the development of lung cancer in 1,341 cases and 1,982 healthy controls. We found that the variant alleles of rs887468 and rs3130457 were associated with increased lung cancer risk. To the best of our knowledge, this is the first study of the association between polymorphisms in the POU5F1 gene and lung cancer, and it provides further evidence for the role of POU5F1 in lung cancer risk. POU5F1, a transcription factor, is involved in regulating the pluripotent state of stem cells, and its level decreases with the onset of differentiation in these cells [15][16][17][18]. Amini et al. found that cancer cell lines and cancer tissues had significantly higher expression levels of early embryonic stem cell genes, including POU5F1, SOX2, and CD133 [21]. Under hypoxic conditions, POU5F1 can promote expression of CD133, a specific cell surface marker of cancer stem cells, in lung cancer cells [27]. In addition, parallel elevated expression of POU5F1 and Nanog in lung adenocarcinoma (LAC) increases the percentage of the CD133-expressing subpopulation, enhances drug resistance, and promotes epithelial-mesenchymal transition (EMT); their coexpression also activates Slug and enhances the tumor-initiating capability of LAC [28,29]. Furthermore, knockdown of POU5F1 impeded the tumorigenic and metastatic ability of lung adenocarcinoma and reversed the EMT process [28,29]. For squamous cell carcinoma, Chen et al. measured POU5F1 expression in non-small cell lung cancer tissues and found that, although the proportion of squamous cell carcinoma tissues with elevated expression was smaller than that of adenocarcinoma tissues, increased POU5F1 expression was associated with poor differentiation of cancer cells and shorter overall survival in both histologic subtypes [30].
Taken together, POU5F1 expression is crucial for the self-renewal and oncogenic potential of lung cancer stem cells. Our findings further suggest that POU5F1 may be a lung cancer susceptibility gene, which is consistent with the observations mentioned above.
The two SNPs, rs887468 and rs3130457, are located upstream of the transcription start site of the POU5F1 gene. According to the web-based SNP analysis tool SNPinfo (http://snpinfo.niehs.nih.gov/), both SNPs lie in potential transcription factor binding sites, and rs887468 may influence an exonic splicing enhancer (ESE) [31] or exonic splicing silencer (ESS) (Supplemental Table 1) (see Supplementary Material available online at http://dx.doi.org/10.1155/2015/851320). Variations in ESEs or ESSs can disrupt the balanced interplay of ESE- and ESS-binding proteins, thereby resulting in missplicing and deficient expression products of nearby genes [31][32][33][34]. We speculate that the variant genotype of rs887468 might lead to alternative splicing events and an imbalance between POU5F1 isoforms, suggesting a biologically plausible mechanism for lung cancer risk. Variation at rs887468 may also influence interactions with transcription factors such as Myc-associated zinc-finger protein (MAZ), as predicted by RegulomeDB (http://www.regulomedb.org/). The MAZ gene is normally expressed in the lung [35] and is involved in regulating oncogene transcription from promoters [36]. One possible explanation for the association between rs887468 and lung cancer risk is therefore that the variant genotype alters the interactions of this locus with transcription factors, resulting in aberrant POU5F1 function and promoting carcinogenesis.
According to DNase-seq data in RegulomeDB (http://www.regulomedb.org/), rs887468 and rs3130457 fall within DNase I peaks. DNase I hypersensitive regions are potential genetic regulatory loci and correlate with binding sites of sequence-specific DNA-binding proteins [37], suggesting a possible mechanism for the effect of these variants on lung cancer risk. Additionally, a multiplicative interaction was observed between rs887468 in the POU5F1 gene and smoking status, indicating that the harmful effect of the rs887468 variant allele was weaker among current smokers. However, our results are very preliminary, and further experimental studies are warranted to uncover the mechanisms underlying these observations.
In summary, our case-control study identified two potentially functional SNPs in the POU5F1 gene that may affect lung cancer risk in a Chinese population. Given that our results are very preliminary, larger studies in diverse ethnic populations are required to validate our findings and to clarify the molecular basis of these observations.