Identification of Crucial lncRNAs, miRNAs, mRNAs, and Potential Therapeutic Compounds for Polycystic Ovary Syndrome by Bioinformatics Analysis

Background This study was aimed at mining crucial long noncoding RNAs (lncRNAs), microRNAs (miRNAs), and messenger RNAs (mRNAs) for the development of polycystic ovary syndrome (PCOS) based on the coexpression and the competitive endogenous RNA (ceRNA) theories and investigating the underlying therapeutic drugs that may function by reversing the expression of lncRNAs, miRNAs, and mRNAs. Methods RNA (GSE106724, GSE114419, GSE137684, and GSE138518) or miRNA (GSE84376 and GSE138572) expression profile datasets of PCOS patients were downloaded from the Gene Expression Omnibus database. The weighted gene coexpression network analysis (WGCNA) using four RNA datasets was conducted to construct the lncRNA-mRNA coexpression networks, while the common differentially expressed miRNAs in two miRNA datasets and module RNAs were used to establish the ceRNA network. A protein-protein interaction (PPI) network was created to explore the potential interactions between genes. Gene Ontology and KEGG pathway enrichment analyses were performed to explore the functions of genes in networks. Connectivity Map (CMap) and Comparative Toxicogenomics Database (CTD) analyses were performed to identify potential therapeutic agents for PCOS. Results Three modules (black, magenta, and yellow) were identified to be PCOS-related after WGCNA analysis, in which KLF3-AS1-PLCG2, MAPKAPK5-AS1-MAP3K14, and WWC2-AS2-TXNIP were important coexpression relationship pairs. WWC2-AS2-hsa-miR-382-PLCG2 was a crucial ceRNA loop in the ceRNA network. The PPI network showed that MAP3K14 and TXNIP could interact with hub genes PLK1 (degree = 21) and TLR1 (degree = 18), respectively. These genes were enriched into mitosis (PLK1), immune response (PLCG2 and TLR1), and cell cycle (TXNIP and PLK1) biological processes. Ten small molecule drugs (especially quercetin) were considered to be therapeutical for PCOS. Conclusion Our study may provide a novel insight into the mechanisms and therapy for PCOS.


Introduction
Polycystic ovary syndrome (PCOS), characterized by polycystic ovary morphology, hyperandrogenism, and ovulatory disturbance, is the most frequent endocrine disorder in women of the reproductive age, affecting approximately 6%-9% of women worldwide [1]. PCOS is not only a leading cause of female infertility [2] but also an important risk factor for insulin resistance, diabetes, obesity, hypertension, metabolic syndrome, cardiovascular disease, and endometrial cancer [3]. However, the treatment of PCOS remains a challenge; thus, it is of significance to investigate the mechanisms and develop more effective preventative and therapeutic strategies.
Although the pathogenesis of PCOS has not been fully understood, apoptosis of ovarian granulosa cells (OGCs) is considered to be a potential contributor [4,5]. By binding to the receptor on the OGCs, a follicle-stimulating hormone induces the expression of cholesterol side-chain cleavage cytochrome P450 (CYP19A1) and 17β-hydroxysteroid dehydrogenase (CYP17A1) which are rate-limiting enzymes to catalyze conversion from estrogen to estradiol [6]. Apoptosis of OGCs leads to the deficiency of estradiol and blocks the follicle development and maturation, ultimately forming more numbers of hypogenetic secondary follicles undergoing atresia and antral follicles [7]. Therefore, exploration of molecular mechanisms associated with the apoptosis of OGCs may provide underlying targets for developing drugs.
Accumulating evidence has suggested that long noncoding RNAs (lncRNAs), microRNAs (miRNAs), and messenger RNA (mRNAs) are important regulators of cell physiological and pathological processes (such as proliferation and apoptosis): mRNAs that encode the proteins were directly functional; miRNAs, approximately 22 nucleotidelong noncoding RNAs, act by binding to complementary sequences in the 3 ′ untranslated region (UTR) of mRNAs and then to negatively regulate the expressions of these mRNAs [8]; lncRNAs, >200 bp in length, can interact with the miRNA as a competing endogenous RNA (ceRNA) to regulate the expression of target genes [9] or directly regulate the transcription of mRNAs via a coexpression manner [10], indicating that aberrant expression of lncRNAs, miRNAs, and mRNAs in OGCs may be the primary contributors for PCOS development and potential targets for the treatment of PCOS. This theory has been demonstrated in some studies. For example, Yang et al. found that the expression level of lncRNA BANCR was significantly higher in OGCs of patients with PCOS compared with non-PCOS patients. Transfection with BANCR expression vector significantly inhibited proliferation, promoted apoptosis of ovarian granulosa-like tumor cell line KGN, and significantly enhanced the expression of proapoptotic Bax and p53 [5]. Li et al. reported that lncRNA SRLR was upregulated in PCOS patients compared with heathy females. Overexpression of lncRNA SRLR promoted apoptosis of KGN cells by upregulation of interleukin-6 (IL-6) [11]. The study of Fu et al. revealed that miR-16 expression was downregulated in ovarian cortex tissues of PCOS patients. Overexpression of miR-16 inhibited apoptosis of OGC in vitro and alleviated PCOS in vivo, which was associated with its function to downregulate its direct target programmed cell death 4 (PDCD4). Enforced expression of PDCD4 reversed the effects of miR-16 on OGC apoptosis [12]. Liu et al. demonstrated that lncRNA PVT1 may regulate the progression of PCOS by modulating miR-17-5p-PTEN: overexpressed PVT1 could bind with and inhibit the expression of miR-17-5p in OGCs of PCOS, thereby preventing the inhibitory effects of miR-17-5p for its target gene PTEN and leading to the elevated expression of PTEN. shRNA-mediated silencing of PVT1 and transfection with miR-17-5p mimics could decelerate apoptosis, while accelerate colony formation ability and proliferation of OGCs [13].
Wang et al. observed that cotreatment with metformin and sitagliptin attenuated the apoptosis in PCOS model cells by inducing lncRNA H19 expression and proposed the threecombined strategy (metformin-sitagliptin-H19-expressing lentiviruses) for the treatment of PCOS [14]. However, crucial lncRNAs, miRNAs, mRNAs, and drugs that target these molecules remain rarely reported for PCOS.
Recently, with the development of the sequencing technique, there were some studies to mine the key lncRNAs [15], miRNAs [16][17][18], and mRNAs [15,17,19] in OGCs of PCOS patients using the gene or miRNA expression profile data. However, all of them focused only on the differentially expressed mRNAs (coding genes) (DEGs) [15,19], lncRNAs (DELs) [15], miRNAs (DEMs) alone [16,18], or DEM-DEG [17]. Also, the small molecule drugs that targeted them were not investigated. In this study, we aimed to screen pivotal lncRNAs, miRNAs, and mRNAs based on lncRNA-mRNA coexpression and lncRNA-miRNA-mRNA ceRNA mechanisms via integrated analysis of multiple high-throughput datasets. In addition, the Connectivity Map (CMap) and Comparative Toxicogenomics Database (CTD) analyses were performed to acquire bioactive compounds that may have potential for the treatment of PCOS by regulating the expression of DEGs, DELs, and DEMs.

Differential Expression
Analysis. The mRNAs and lncRNAs in the above four datasets were reannotated according to the official nomenclature in the HUGO Gene Nomenclature Committee (HGNC; http://www.genenames.org/) [21]. Only the shared mRNAs and lncRNAs in all of the included datasets were obtained for the differential 2 BioMed Research International analysis. The DELs and DEGs between PCOS and normal controls were identified using the MetaDE.ES function in the MetaDE package (version 1.0.5, https://cran.r-project .org/web/packages/MetaDE/). The threshold was set based on three aspects: (1) no heterogeneity was present among these datasets, which was assessed by tau 2 statistic (=0) and chi-square-based Q-test (p > 0:05); (2) false discovery rate (FDR) was less than 0.05; and (3) the differential trend was consistent based on log 2 FC (fold change). The Linear Models for Microarray Data (LIMMA) package (version 3.34.7; https://bioconductor.org/packages/ release/bioc/html/limma.html) [22] for R was used for identification of DEMs in GSE138572 and GSE84376 datasets, with |logFC | >0:263 and false discovery rate ðFDRÞ < 0:05 defined as the cut-off point. The DEMs commonly screened in two datasets were shown by Venn diagram. The volcano plot was visualized using the ggplot2 package (version 3.3.0; https://cran.r-project.org/web/packages/ggplot2) in R. A heat map was created using the "pheatmap" package (version: 1.0.8; https://cran.r-project.org/web/packages/pheatmap) in R.

Crucial lncRNA and mRNA Identification for PCOS.
Weighted gene coexpression network analysis (WGCNA) package in R (version 1.61; https://cran.r-project.org/web/ packages/WGCNA/index.html) [23] was used to cluster mRNAs and lncRNAs into several coexpression modules in which genes were highly correlated and considered to be crucial. During this WGCNA analysis, four steps were included: (1) calculating the expression and connectivity correlations of lncRNAs and mRNAs between any two datasets (GSE106724, GSE114419, GSE137684, and GSE138518), with correlation coefficient > 0:6 and p value > 0.001 defined as significantly correlated; (2) using the GSE106724 dataset to select the soft threshold power (β) based on the scalefree topology criterion (that is, when the R 2 reached 0.9 for the first time); (3) generating the dendrogram by calculation of the topological overlap matrix dissimilarity between genes in the GSE106724 dataset and identifying gene modules (cutHeight = 0:99 and minSize ≥ 100) by the dynamicTreeCut method [24]; (4) assessing the preservation (Z-score > 5 and p < 0:05) of the identified modules in the GSE114419, GSE137684, and GSE138518 datasets using the modulePreservation statistics [25]; (5) mapping the DEGs and DELs into the modules and screening the modules with more DEGs and DELs (p < 0:05, fold enrichment > 1) by using the hypergeometric algorithm [f ðk, N, M, nÞ = Cðk, MÞ * Cðn − k, N − MÞ/Cðn, NÞ] [26]; and (6) correlating coexpression gene modules with PCOS development by moduleTraitCor and moduleTraitPvalue algorithms.
2.4. lncRNA-mRNA Coexpression Network Visualization. The associations between the DEGs and DELs screened in the crucial WGCNA modules were evaluated by using the cor.test function (https://stat.ethz.ch/R-manual/R-devel/library/ stats/html/cor.test.html) in R, and then the coexpression network was constructed based on their Pearson correlation coefficients (PCC). The coexpression network was visualized using Cytoscape (version 3.6.1; http://www.cytoscape.org/).

Protein-Protein Interaction (PPI) Network Construction.
The interactions between proteins coding by the module DEGs were predicted using the STRING (Search Tool for the Retrieval of Interacting Genes; version 10.0; https:// string-db.org) database [29]. PPIs with a confidence score ≥ 0:4 were chosen to construct the network. The PPI network was visualized by Cytoscape software (version 3.6.1; http://www.cytoscape.org/). Furthermore, the CytoNCA plugin in Cytoscape software (http://apps.cytoscape.org/ apps/cytonca) [30] was applied to explore the hub genes by calculating the topological features of each protein in the PPI network, including degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC). The proteins with high DC, BC, and CC were considered as hubs and essential for PCOS.

Function Enrichment Analysis. The Gene Ontology (GO) annotation, Kyoto Encyclopedia of Genes and Genomes
(KEGG), and REACTOME pathway analyses were conducted for the DEGs in the PPI network using the online Database for Annotation, Visualization and Integrated Discovery (DAVID) (version 6.8; http://david.abcc.ncifcrf.gov) [31]. p value < 0.05 under hypergeometric test was considered as statistically significant.

Small
Molecule Drug Analysis. The CMap (https://portals .broadinstitute.org/cmap/) was used to identify candidate small molecule drugs that may have potential to treat PCOS patients. The identified module DEGs were uploaded into the CMap web tool, and then the resultant small molecules with negative connectivity scores were considered to be therapeutic agents because it could reverse the expression of the query signature. The small molecules with the enrichment score approximate to -1 may be especially effective. Furthermore, CTD (http://ctdbase.org) was also searched using the gene name, after which a serial of chemical-gene interaction pairs were obtained. The small molecules identified by CMap and CTD were then compared with these chemical-gene interaction pairs to screen the common. The gene-drug interaction networks were visualized using Cytoscape (version 3.6.1; http://www.cytoscape.org/).     (Table S1). Heat map analysis showed that these DELs and DEGs can obviously cluster the samples into two groups in any one dataset (Figure 1(a)). Furthermore, LIMMA analysis was used to screen DEMs in GSE138572 and GSE84376, respectively. The results showed that 80 DEMs (upregulated, 16; downregulated, 64) were identified for the GSE138572 dataset (Figure 1 (Table S3). These DEMs also can obviously distinguish the PCOS patients from normal controls in the GSE138572 (Figure 1(c)) and GSE84376 (Figure 1(e)) datasets. The Venn diagram revealed that there were 11 DEMs (including 4 upregulated and 7 downregulated) consistently expressed in these two datasets (Figure 1(f)).

Construction of lncRNA-mRNA Coexpression Network
Using Module Genes. Based on GSE106724, GSE114419, GSE137684, and GSE138518 datasets, WGCNA was applied to detect the PCOS-related modules which included the potential interactions between lncRNAs and mRNAs. These four datasets were considered to be comparable because there were significantly positive correlations of RNAs (coefficient > 0:6 and p value < 1e-200) between any two datasets irrespective of their expression levels (Figure 2(a)) or the connectivity (Figure 2(b)). As shown in Figures 2(c) and 2(d), power = 20 was chosen as the soft-thresholding to ensure a scale-free network (R 2 = 0:9; mean connectivity = 1). A cluster dendrogram showed that eleven colors of modules (black, blue, brown, green, grey, magenta, pink, purple, red, turquoise, and yellow, ranging in size from 136 to 2,812) were identified using the GSE106724 dataset (Figure 3(a); Table 1). These modules were also observed in the analysis of GSE138518, GSE137684, and GSE114419 datasets (Figures 3(b)-3(d)). Among these eleven modules, black, blue, magenta, purple, turquoise, and yellow may be preserved because of their Z-score > 5 and p value < 0.05 (Table 1). Black, magenta, and yellow modules were further considered to be crucial for PCOS because there were relatively many DEGs or DELs enriched in them (enrichment fold > 1 and p value < 0.05) ( Table 1). This conclusion was also demonstrated according to the  (Figure 3(e)). Subsequently, the expressions of DELs and DEGs in these three modules were extracted and the PCC were calculated to construct the coexpression network. As shown in Figure 4, all the included genes in the black module were DEGs (26) (Figure 4(a)), while the magenta module (Figure 4 3.3. Construction of the ceRNA Regulatory Network. The ceRNA regulatory network was also constructed using the DELs and DEGs in the black, magenta, and yellow modules. A total of 4 interaction relationship pairs between 4 DELs (LINC00910, MAPKAPK5-AS1, SMCR5, and WWC2-AS2) and 4 DEMs (hsa-miR-455, hsa-miR-4467, hsa-miR-4665, and hsa-miR-382) were predicted by the DIANA-LncBase database, while 12 paired interactions were predicted between 4 DEMs and 9 DEGs by the starBase database. Then, the lncRNA-miRNA-mRNA network was established based on these ceRNA loops ( Figure 5), including SMCR5-hsa-miR-4665-G protein signaling modulator 2 (GPSM2), LINC00910-hsa-miR-455-polo like kinase 1 (PLK1), MAP-KAPK5-AS1-hsa-miR-4467-PLEKHG3, and WWC2-AS2hsa-miR-382-PLCG2. However, only WWC2-AS2-hsa-miR-382-PLCG2 may be believable because the expressions of WWC2-AS2 (upregulated) and PLCG2 (upregulated) were opposite to hsa-miR-382 (downregulated).

Construction of the PPI Network.
In order to provide the potential interpretations for the DEGs that may have no function enrichment, a PPI network was constructed for the DEGs in crucial modules. As a result, a total of 280 interaction pairs were identified, such as MAP3K14-PLK1 and TXNIP-toll-like receptor 1 (TLR1) (Figure 6). PLK1 and TLR1 were considered as hub genes in the PPI because they belonged to the top 20 genes ranked by DC, BC, and CC ( Table 2).

Discussion
Although previous studies have attempted to reveal the molecular mechanisms of PCOS by using high-throughput microarray or sequencing data [15][16][17][18][19], all of them focused on the single dataset (mRNA: GSE95728 [15], miRNA: GSE84376 [16,18], miRNA: GSE84376 and mRNA: GSE34526 [17], and mRNA: GSE34526 [19]) which may result in a high probability of false positive results. In order to prevent this shortcoming, in this study, we analyzed four lncRNA-mRNA expression profile datasets (GSE106724, GSE114419, GSE137684, and GSE138518) using the MetaDE package and only selected the DELs and DEGs that overlapped in the four datasets. The crucial DELs and DEGs were screened via WGCNA analysis which used GSE106724 as the training dataset and GSE114419, GSE137684, and GSE138518 as the validation datasets. Similarly, two miRNA expression profile datasets were used. Although some key miRNAs and mRNAs identified in the previous studies (miR-3135, miR-3188 [16,18], and miR-486 [17]; aquaporin 9, free fatty acid receptor 2, and S100 calcium binding protein A8 [15]) were also found in our study, they were not the focus because they were only identified in one dataset (miRNA, GSE84376) or not included in preserved modules. More interestingly, we, for the first time, integrated the DEL-DEG and DEL-DEM-DEG data to construct the coexpression and ceRNA network. As expected, we identified three new coexpression relationships between 3 DELs and their target genes (KLF3-AS1-PLCG2, MAPKAPK5-AS1-MAP3K14, and WWC2-AS2-TXNIP) and one ceRNA axis among WWC2-AS2, hsa-miR-382, and PLCG2. These mRNAs were predicted to be involved in PCOS by influencing cell mitosis, cell cycle, and immune response. These three DELs (KLF3-AS1, MAP-KAPK5-AS1, and WWC2-AS2), one DEM (hsa-miR-382), and five DEGs [including 3 in the coexpression or ceRNA axes (PLCG2, MAP3K14, and TXNIP) and 2 interacted genes (PLK and TLR1) in the PPI network] were predicted to be regulated by 10 small molecular drugs for the treatment of PCOS via CMap and CTD analyses. Among all these DELs, DEMs, and DEGs, only TXNIP, TXNIP-interacted TLR1, and miR-382-5p were directly reported to be associated with PCOS. TXNIP expression was downregulated during the time of in vitro culture of the OGCs [32][33][34]. Upregulation of TXNIP in the OGCs resulted in the patients suffering PCOS [35]. Higher serum TXNIP was also indicated to be associated with impaired β cell function and insulin resistance in PCOS patients [36]. The expressions of toll-like receptor signaling pathway genes (including TLR1, TLR2, TLR4, TLR8, and CD14) were found to be significantly increased in OGC samples of PCOS patients compared with controls [19,37]. Thus, TXNIP and TLR1 may be involved in PCOS by activating the inflammatory pathway and then inducing cell apoptosis as reported in other diseases. For example, Shan et al. proved that knockdown of TXNIP decreased the inflammatory damage in kidney tissues induced by 2,2′,4,4′-tetrabromodiphenyl ether [38]. The study of Chen et al. revealed that hypoxia induced pancreatic β cell death by upregulating the reactive oxygen species-(ROS-) TXNIP-NLR family pyrin domain containing 3 (NLRP3) inflammasome axis [39]. Also, the traditional Chinese medicine Lycium barbarum polysaccharide was demonstrated to attenuate hepatocyte apoptosis induced by ethanol through inhibiting the TXNIP-NLRP3 inflammasome pathway [40]. By using the littermate model, Mohamed et al. verified that TXNIP deletion ameliorated the inflammatory response in high fat diet-induced nonalcoholic steatohepatitis via deactivation of the TLR2-NLRP3 inflammasome axis [41]. Furthermore, miR-382-5p was shown to be negatively correlated with a free androgen index [42]. The PCOS group had a significantly higher free androgen index than controls [43]. In line with these studies, we also identified

11
BioMed Research International that TXNIP and TLR1 were expressed higher, while miR-382-5p was expressed lower, in the OGCs of PCOS patients than in normal controls.
Furthermore, in vitro exposure of human luteinized mural granulosa cells to dibutyl phthalate was found to induce the high expression of PLK1 [44]. Gestational

12
BioMed Research International exposure to dibutyl phthalate was revealed to induce polycystic ovaries and a hormonal profile similar to PCOS [45]. Mass spectrometry analysis showed that culture of OGCs with 10 ng/mL fibroblast growth factor-(FGF-) 8 and FGF-18 significantly triggered upregulation of several proteins, including PLCG2 [46]. Increased intrafollicular FGF-13 levels were positively correlated with elevated total testosterone and increased ovarian volume, but negatively associated with the MII oocyte rate in PCOS patients [47]. These findings indirectly demonstrated their possible roles in PCOS. In agreement with these studies, we also identified that PLK1 and PLCG2 were upregulated in the OGCs of PCOS patients than in normal controls. Although KLF3-AS1, WWC2-AS, MAPKAPK5-AS1, and MAP3K14 were not previously verified to be associated with PCOS, they may also be important for PCOS because they could, respectively, interact with PCOS-related PLCG2, TXNIP, miR-382-PLCG2, and PLK1 as described above. Also, the expression trend of KLF3-AS1, WWC2-AS, MAPKAPK5-AS1, and MAP3K14 (upregulated) was similar to their interacted genes, while the expression of WWC2-AS2 was inversely correlated with miR-382. These findings indirectly explain the possible regulatory relationships between them.
More importantly, valproic acid, doxorubicin, methotrexate, quercetin, thapsigargin, etoposide, irinotecan, imatinib, gentamicin, and promethazine were predicted to treat PCOS by reversing the expressions of the above crucial DELs, DEMs, and DEGs in this study. Some of them were confirmed to have therapeutic potential for PCOS previously. For example, Khorshidi et al. observed that supplementation of 1,000 mg/day quercetin for 12 weeks significantly decreased testosterone, fasting blood glucose, insulin, and homeostatic model assessment of insulin resistance of PCOS patients compared with the placebo group [48]. Quercetin alleviated the PCOS by suppressing the levels of inflammatory cytokines (IL-1β, IL-6, and tumor necrosis factor α) [49], increasing the activity of antioxidant enzymes (superoxide dismutase, catalase, glutathione-Stransferase, and reduced glutathione) and improving granulosa cell apoptosis (upregulation of Bcl-2 and downregulation of Bax) [50]. In this study, we predicted that quercetin targeted PLK1, TXNIP, and MAP3K14. Thus, we recommended combining quercetin supplementation and siRNA knockdown of PLK1, TXNIP, and MAP3K14 for the treatment of PCOS, which may be more effective than quercetin-alone treatment.
Some limitations of our study should be acknowledged. First, all of the lncRNAs were not previously confirmed to be associated with PCOS by wet experiments. Clinical samples should be collected to further validate their expressions and associations with androgen, MII oocyte rate, and infertility rate. In vitro and in vivo studies should be performed to explore their influence on apoptosis and proliferation of OGCs. Second, the coexpression relationship between lncRNAs and mRNAs and the ceRNA axes among lncRNAs, miRNAs, and mRNAs should be verified by immunoprecipitation, cooverexpression, or coknockdown experiments. Third, the target mechanisms of small molecular drugs on lncRNAs, miRNAs, and mRNAs and the effects of their combination therapy also needed further investigation. Fourth, the significantly differential lncRNAs, miRNAs, and mRNAs that are not commonly expressed in multiple datasets or enriched in WGCNA modules also should be given attention to confirm their importance.

Conclusion
Based on coexpression and ceRNA network analyses, the present study identified that several lncRNAs (KLF3-AS1, WWC2-AS, and MAPKAPK5-AS1), miRNAs (miR-382), and DEGs (PLK1, PLCG2, TXNIP, TLR1, and MAP3K14) were associated with the development of PCOS. In addition, ten small molecular drugs (such as quercetin) were predicted to be therapeutic agents for PCOS by reversing the expressions of these crucial genes. Our study may provide a novel insight into the mechanisms and therapy for PCOS.

Conflicts of Interest
The authors declare that they have no competing interests.

Authors' Contributions
ZZ, XL, XHT, and MCL conceived of the study and participated in its design. ZZ and XL downloaded, processed the data, and performed the related bioinformatic analyses. TTX and WXL contributed to the interpretation of the results. ZZ and XL drafted the article. XHT and MCL revised the manuscript. All authors have read and approved the final version of the article.

Supplementary Materials
Supplementary 1. Table S1: all differential lncRNAs and miR-NAs in four datasets.  14 BioMed Research International