Glial Cell-Derived Neurotrophic Factor Functions as a Potential Candidate Gene in Obstructive Sleep Apnea Based on a Combination of Bioinformatics and Targeted Capture Sequencing Analyses

Background Obstructive sleep apnea (OSA) is a prevalent chronic disease that increases the risk of cardiovascular disease and metabolic and neuropsychiatric disorders, resulting in a considerable socioeconomic burden. The present study was aimed at identifying potential key genes influencing the mechanisms and consequences of OSA. Methods Gene expression profiles associated with OSA were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in subcutaneous adipose tissues from patients with OSA and normal tissues were screened using R software, followed by gene ontology and pathway enrichment analyses. Subsequently, a protein-protein interaction (PPI) network was constructed and hub genes were extracted using Cytoscape plugins. The intersected core genes derived from different topological algorithms were considered hub genes, and the potential candidate gene was selected from them for further analyses of expression variations using another GEO dataset and targeted capture sequencing in 100 subjects (50 with severe OSA and 50 without OSA). Results A total of 373 DEGs were identified in OSA samples relative to normal controls, which were primarily associated with olfactory transduction and neuroactive ligand-receptor interaction. Upon analyses of nine topological algorithms and available literature, we finally focused on glial cell-derived neurotrophic factor (GDNF) as the candidate gene and validated its low expression in OSA samples. Two rare nonsynonymous variants (p.D56N and p.R93Q) were identified among the 100 cases through targeted sequencing of GDNF, which could be potentially deleterious based on pathogenicity prediction programs; however, no significant association was detected in single nucleotide polymorphisms. Conclusion The present study identified GDNF as a promising candidate gene for OSA and its two rare and potentially deleterious mutations through a combination of bioinformatics and targeted capture sequencing analyses.


Introduction
Obstructive sleep apnea (OSA) is a sleep-related respiratory disease characterized by breathing problems at night such as snoring, apnea, and daytime sleepiness, which leads to intermittent hypoxemia, transitory hypercapnia, and sleep structure disturbances. The principal clinical risk associated with OSA is multiple organ system impairment, such as cardiocerebrovascular diseases, metabolic syndrome, and cognitive dysfunction [1][2][3]. The prevalence of OSA has increased among the general population, and moderate to severe OSA has been estimated to be as high as 49.7% in males and 23.4% in females [4]. Unfortunately, the vast majority of patients with OSA (70-90%) remain undiagnosed, resulting in a heavy health and socioeconomic burden [5]. Therefore, it is imperative to study the etiology and pathogenesis of OSA and to identify indicators for early diagnosis and treatment targets.
Strong genetic influence has been reported for OSA, with more than 1.5-fold increased risk in first-degree relatives of patients [6]. Approximately 35% to 40% of variations in apnea-hypopnea index (AHI), which determines apnea severity, can be explained by genetic factors [7]. Genetic research has provided new insights into understanding the underlying mechanisms of OSA to develop a more effective targeted therapy and optimize treatment strategies. To date, most efforts focusing on identification of the genetic causes of OSA have used the candidate gene approach, which is based on four intermediate pathogenic pathways: obesity and body fat distribution, craniofacial morphology, ventilatory control, control of sleep, and circadian rhythm [7,8]. However, the method limits validity of the findings because such studies depend on prior hypotheses on disease mechanisms, which precludes the determination of genetic variations in previously unknown pathways [9]. Only a few genome-wide association studies (GWAS) on OSA, which could provide further information with regard to the candidate gene approach, have been conducted to date [9][10][11]. However, data obtained from such study designs are currently limited. Furthermore, the findings on significant or potentially relevant genes that could influence susceptibility to OSA have been controversial, and the results from candidate gene studies and GWAS have not been consistent [8][9][10][11].
Recently, a genome-wide DNA microarray based on highthroughput platforms for gene expression analysis emerged as an efficient and relatively economical tool for studying the genetic basis of complex diseases [12]. In addition, the recent application of next-generation sequencing technologies, such as targeted capture sequencing, could be very instrumental in the identification of new pathogenic genes or new pathogenic sites of known pathogenic genes [13]. In the present study, we made the first attempt to identify susceptibility gene/locus for OSA by combining bioinformatics and targeted capture sequencing. We compared gene expression profiles in subcutaneous adipose tissues of patients with OSA and control samples obtained from the Gene Expression Omnibus (GEO) database to screen for differentially expressed genes (DEGs). Subsequently, the DEGs were identified using a combination of functional enrichment and protein-protein interaction (PPI) network analyses. Hub genes were identified based on the different topological algorithms and a potential candidate gene was selected from the hub genes for validation of gene expression using an additional independent dataset. Moreover, the associations between common and rare functional variants in the candidate gene with OSA were explored using targeted sequencing analysis to reveal potential genetic variants of OSA.

Materials and Methods
2.1. Microarray Dataset Source. A systematic retrieval of gene expression microarray datasets from the National Center for Biotechnology Information GEO (http://www.ncbi.nlm.nih .gov/geo/) was performed to evaluate DEGs between OSA and normal samples. The keyword "obstructive sleep apnea" was used for the screening. The gene expression profile of GSE135917 (https://www.ncbi.nlm.nih.gov/geo/query/acc .cgi?acc=GSE135917) [14] was included in the present study and was based on GPL6244 platform (HuGene-1_0-st; Affymetrix Human Gene 1.0 ST Array). The dataset contained 18 subcutaneous adipose tissue samples obtained from abdominal subcutaneous fat biopsies of 10 patients with OSA and 8 normal controls during a ventral hernia repair surgery [14]. OSA severity was assessed using the ARES™ Unicorder (Watermark Medical®, Boca Raton, FL, USA), a previously validated portable sleep monitor worn for two consecutive nights prior to surgery [14].

Data Preprocessing and Differential Expression Analysis.
We used the R software (version 3.6.2; https://www.rproject.org/) in addition to packages available from Bioconductor (http://www.bioconductor.org/) to perform statistical analyses. Processing, normalization, and quality control of the microarray datasets with raw data (.CEL files) were performed using the Affy package [16] in R on the Affymetrix platform. The robust multichip average (RMA) method was used for background correction, quantile normalization, and median polish summarization. Probe IDs were converted into gene symbols based on the annotation platform. Probe sets without corresponding gene symbols or genes with more than one probe set were removed or averaged.
The Linear Models for Microarray Data (LIMMA) package [17] in R Bioconductor was used to identify DEGs (|log 2 fold change | ≥1; false discovery rate ðFDRÞ < 0:05) between OSA samples and normal controls. FDR was controlled using the Benjamini-Hochberg method, and empirical Bayes-modified t-tests were performed to select sets of DEGs. Heat map and volcano plot of DEGs were generated in R using the ggplot2 and pheatmap packages.

Biological Functions and Pathway Enrichment Analyses.
The identified DEGs were subjected to gene ontology (GO) term enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses using clusterProfiler package (version 3.10.0) [18] and org.Hs.eg.db annotation package (version 3.10.0) [19] in R to further elucidate the biological functions of DEGs identified in OSA and normal control samples. The significant GO terms or KEGG pathways were enriched by more than two genes with a threshold of FDR < 0:05. , which is a plugin of Cytoscape, was used to search for the most dense and significant module in the PPI network using the following criteria: degree cutoff = 2, node score cutoff = 0:2, K − core = 2, and max depth = 100. cytoHubba (version 0.1), a Cytoscape plugin was used to select and identify the top ten ranking genes based on 11 topological analysis algorithms [20]. The top ten nodes were defined as core genes for each algorithm in the topological network. The intersected core genes derived from nine topological algorithms were considered as hub genes. The intersection of the top gene sets revealed the hub genes, which were presented using a Venn diagram generated with OmicShare tools (http://www.omicshare.com/tools). Furthermore, functional annotation of hub genes was carried out using Database for Annotation, Visualization and Integrated Discovery (DAVID database version 6.8; https://david .ncifcrf.gov/). The potential candidate was selected from the hub genes based on available literature and through data mining.

Validation and Analysis of the Candidate
Gene. An additional independent microarray dataset, GSE75097, was used to verify the differential expression of the candidate genes in OSA and control samples. The analyses and visualization of gene expressions were performed using ggpubr and ggplot2 R packages. The DEGs (P value < 0.05) were used for subsequent targeted sequencing.  [21]). Subjects without OSA and severe OSA (exhibiting extreme phenotype based on AHI) were enrolled for an additional genomic study. In the present study, we selected AHI (either very low or very high) as the extreme phenotype. Extreme phenotype sampling that involved selection of subjects from the extremes of trait distribution was applied, which could enhance the ability and statistical power to detect rare variants. Following the inclusion and exclusion criteria as detailed in our previous study [21], while excluding smokers, a total of 100 subjects (50 without OSA and 50 with severe OSA) were enrolled for the study.

Interventions.
Overnight PSG monitoring, clinical data acquisition, blood sample collection, and genomic DNA extraction from all subjects were performed according to the standard procedures as previously described [21,22]. All subjects underwent overnight PSG monitoring (Compumedics E series, Australia) and were evaluated by a registered polysomnographic technologist based on the American Academy of Sleep Medicine criteria for scoring sleep and associated events [23]. http://annovar.openbioinformatics.org/en/ latest/). Frequency of variants was evaluated based on publicly available databases (1000 Genomes, ESP6500, and ExAC03). A combination of pathogenicity prediction software packages (SIFT [24], POLYPhen V2 [25], MutationTaster [26], CADD [27], and DANN [28]) was used to predict the potential impact of each genetic variant on gene function. Requiring at least two of the software packages to support the variant may be damaging.

Statistical Analyses
Continuous data were expressed as means ± standard deviations or medians (interquartile range), while categorical data were expressed as n (%). Independent Student's t-test or Mann-Whitney U-test was used to analyze continuous variables based on the normality of data distribution. Chi-3 BioMed Research International squared test or Fisher's exact test was used to analyze categorical variables as appropriate. Hardy-Weinberg equilibrium, single nucleotide polymorphism (SNP) association analyses, and multiple comparison correction were performed using PLINK (version 1.0.7; http://pngu.mgh.harvard.edu/ purcell/plink/). A P value < 0.05 was considered statistically significant.

Identification of DEGs.
Raw microarray data from the GSE135917 dataset was subjected to RMA preprocessing and subsequently normalized to the median of all samples. A box plot of each sample before and after normalization demonstrated that the chip data had been normalized and were available for DEG selection (Supplementary Material 1, Supplemental Figure S1). A total of 23281 gene expression values were obtained from the 18 samples after data preprocessing. Based on the calculation criteria of absolute log 2 FC ≥ 1 and FDR < 0:05, 373 DEGs, which included 342 downregulated genes and 31 upregulated genes were identified in OSA samples relative to normal controls (Supplementary Material 2). A volcano plot (Figure 1(a)) was used to illustrate the distribution of DEGs, and a heat map (Figure 1(b)) used to present the results of bidirectional hierarchical clustering of DEGs and samples based on the expression level of DEGs.

Functional and Pathway Enrichment Analyses of DEGs.
GO term enrichment and KEGG pathway analyses were performed using the clusterProfiler package in R, and only the GO terms and KEGG pathways enriched with an adjusted P value < 0.05 (Benjamini-Hochberg correction for multiple testing) were considered. The GO functional enrichment analysis results revealed that 373 DEGs were mapped to four GO terms including three biological process (BP) terms and one molecular functional (MF) term (Figure 2(a)). Enriched GO terms were predominantly associated with olfactory receptor activity and detection of chemical stimulus involved in sensory perception of smell. Furthermore, KEGG pathway analyses demonstrated that DEGs were significantly enriched in olfactory transduction and neuroactive ligand-receptor interaction pathways (Figure 2(b)).  (Figure 3(a)). We screened 11 upregulated and 79 downregulated genes in patients with OSA. Subsequently, we combined the results of MCODE and cytoHubba analyses and selected the top ten ranking genes based on nine topological analysis algorithms (Table 1). A Venn diagram-based analysis was used to determine the intersection of the nine gene sets, as presented in Figure 3(b). Finally, the key hub genes including glial cell line-derived neurotrophic factor (GDNF), SLC2A2, PRL, and SST were identified and are presented in Supplementary Material 1 (Supplemental Table S1); a functional annotation of the genes was processed using DAVID. Strikingly, the hub gene, GDNF, has been previously reported to be a key candidate gene associated with OSA in a large candidate gene study [8], but it remains controversial. Combining our data mining analyses, we finally focused on the GDNF gene and used a different independent dataset to validate GDNF gene expression.

Validation of GDNF Gene Expression.
Gene expression level analysis of GDNF based on data obtained from an independent dataset, GSE75097, was performed to validate the reliability of the results obtained from the GSE135917 dataset. In the validation dataset, the non-OSA group, which comprised of six subjects with primary snoring, and an equal number of patients with OSA selected from the MSO or VSO groups based on age, gender, and major comorbidities were included in the analyses (see Supplementary Material 1, Supplemental Table S2 for details). A comparison of baseline data of enrolled subjects from the GSE75097 dataset is summarized in Table 2. The expression level of the GDNF gene was significantly lower in the OSA group than in the non-OSA group (P = 0:015), which was consistent with the results obtained from the GSE135917 dataset ( Figure 4). Therefore, we speculated that the GDNF gene could be an indicator of OSA and further applied the targeted capture sequencing technology to explore the association between common and rare variants in the GDNF gene with OSA.

Targeted
Capture Sequencing of GDNF Gene. The clinical characteristics of subjects without OSA and patients with severe OSA are summarized in Table 3. The severe OSA group had more male, older, and obese individuals than the non-OSA group. Patients with OSA had a significantly higher BMI, abdominal and neck circumference, apnea index, hypopnea index, and AHI, as well as lower oxygen saturation than the patients without OSA (all P < 0:001). In addition, no significant differences were observed in alcohol consumption (P = 0:086), systolic blood pressure (SBP; P = 0:748), diastolic blood pressure (DBP; P = 0:389), sleep efficiency (P = 0:329), or total sleep time (P = 0:608) between the two groups. The target region of GDNF gene was sequenced at an average depth of 411X in the present study. Based on the minor allele frequency (MAF) of the control group, all variations were designated as common (MAF > 1%) or rare (MAF < 1%). A total of 28 SNPs and one short tandem repeat were present in the GDNF gene fragment and were subjected to association analyses under additive, dominant, recessive, and allele genetic models using unconditional logistic regression models. All SNPs satisfied the Hardy-Weinberg equilibrium. The genotype and allele distributions of the 29 polymorphisms between the two groups under the four genetic models were not significantly different. However, we identified two rare nonsynonymous variants (p.D56N and p.R93Q), which could be potentially deleterious based on the prediction by SIFT, POLYPhen V2, MutationTaster, and DANN software packages (Table 4).

Discussion
Currently, bioinformatics has become increasingly crucial in the study of the pathogenesis of multifactorial disorders [29].
The identification of variations in gene expression in tissues relevant to a disease is a key step toward enhancing our understanding of pathogenesis and could eventually lead to improved diagnosis and treatment [29]. In the present study, we identified 31 upregulated and 342 downregulated genes in subcutaneous adipose tissues of patients with OSA when compared with normal controls, which suggested that the occurrence and development of OSA are a complex biological process involving multiple genes and steps. Biological function and pathway enrichment analyses indicated that  BioMed Research International DEGs were primarily involved in the GO terms or KEGG pathways associated with olfactory receptor activity and olfactory transduction, as well as neuroactive ligandreceptor interaction. Furthermore, GDNF, SLC2A2, PRL, and SST were identified as the hub genes from the PPI network. Of note, GDNF, which has been previously reported to be a key candidate gene associated with OSA [8], was also screened and initially validated in our microarray data mining. Nevertheless, the association between GDNF and OSA remains controversial [8,30] and warrants further studies. As such, the association between genetic variants of the GDNF gene with OSA was further explored using targeted capture sequencing. Although we could not find significant SNPs, we observed two rare nonsynonymous variants. The c.166G>A mutation results in an Asp-to-Asn amino acid change (p.D56N) of a conserved Asp, and the c.278G>A mutation results in an Arg-to-Gln amino acid change (p.R93Q) of a conserved Arg residue in GDNF, which has not been previously implicated in OSA. The mutation could influence the biochemical properties of proteins, which requires further functional studies and validation.
GDNF is a member of the transforming growth factor family, which is crucial for motor neurons, dopaminergic neurons, and peripheral neurons [31]. A large candidate gene study on OSA suggested a potential pathogenic role of polymorphisms in the GDNF gene (rs2910705, rs2975100, rs2973042, and rs2973041) in European Americans [8]. The associations of GDNF with OSA remained after adjustment for BMI, implying that the genetic variants influence susceptibility to OSA through obesity-independent pathways [8]. Larkin et al. argued that GDNF variants are associated with the pathogenesis of OSA via the influence of ventilatory control abnormalities due to the following reasons [8]. First, ventilatory control abnormalities may predispose individuals to OSA by exacerbating ventilatory instability, impairing arousal response to airway obstruction, or promoting an imbalanced activation of the upper airway muscles when compared with the chest wall muscles. Additionally, GDNF   is a key factor influencing the development of neural pathways (e.g., development of noradrenergic neurons and differentiation), which are vital for normal respiration. Moreover, GDNF seems to play a pivotal role in sensory afferent neurons in the carotid body, which could be essential in the development of hypoxic responses. Finally, knockout of GDNF gene results in an abnormal central respiratory output and severe mutations in GDNF are associated with the congenital central hypoventilation syndrome. Similarly, the present study revealed that GDNF expression decreased in OSA samples. Notably, the results of Larkin et al. were not successfully replicated in a large Icelandic OSA cohort [30], and the SNPs in the study were not identified in our study, which could be because of the variation in study designs, sample sizes, study populations, or environmental factors in the studies. In addition, due to the increasingly recognized adverse impacts of OSA on cardiovascular disease (CVD), metabolic syndrome, and neuropsychiatric disorders, Mukherjee et al. [7] suggested that a key subject for consideration in future research should be to identify the genetic links between OSA and adverse health outcomes, that is, to examine if genetic variants are shared between OSA and comorbid disorders. GDNF is one of the most potent neurotrophic factors for catecholaminergic neurons and its involvement in the survival, proliferation, differentiation, and migration of nigrostriatal dopaminergic neurons is well-known [32]. Previous animal studies have demonstrated that GDNF is intimately associated with learning and memory [33,34] and that its heterozygous mutation (GDNF+/−) in mice causes anomalies in hippocampal synaptic transmission, which suggests the role of GDNF in cognitive performance [35]. Furthermore, GDNF has been reported to be associated with aging [36] and various diseases such as Alzheimer's, Huntington's, and Parkinson's diseases [37], as well as several neuropsychiatric disorders [38]. The role of GDNF in the treatment of Parkinson's disease is currently being investigated through clinical trials [39]. OSA has been strongly associated with CVD, metabolic syndrome, and neuropsychiatric disorders, and some genetic variants could also be critical in determining whether pathways disrupted during the pathogenesis of OSA causally contribute to related consequences [7,8]. Consequently, it can be presumed that a mutation in the GDNF gene could cause OSA and cognitive dysfunction through different mechanistic effects, which warrants further research.
The present study had several limitations. First, the two microarray datasets used were obtained from public databases, which were of poor quality, and the sample size for targeted sequencing was relatively small due to financial constraints. Consequently, we cannot rule out the fact that lack of statistical power influenced our results; replication studies should be conducted in future using a large sample size. Second, the results of the present study are subject to confounding factors due to the use of different original samples and conditions of the training and validation datasets, and the external validity of our results could be limited. Therefore, more appropriate and robust validations are required in future studies. Third, the number of patients with severe OSA enrolled in the study may not fully represent the general OSA population. Fourth, we did not investigate the interaction between gene and environmental exposure, which underlies the pathogenesis of many complex diseases. Finally, the exact mechanisms of the identified rare nonsynonymous variants and the effect of GDNF gene mutation on the cognitive function of patients with OSA remain indeterminate and require further functional studies.
In conclusion, the present research is a first attempt at studying potential genetic mechanisms influencing OSA through a combination of bioinformatics and targeted capture sequencing analyses. GDNF was identified as a promising candidate gene associated with OSA and its two rare