Construction and Investigation of an LINC00284-Associated Regulatory Network in Serous Ovarian Carcinoma

The low survival rate associated with serous ovarian carcinoma (SOC) is largely due to the lack of relevant molecular markers for early detection and therapy. Increasing experimental evidence has demonstrated that long noncoding RNAs (lncRNAs) are involved in cancer initiation and development, and a competitive endogenous RNA (ceRNA) hypothesis has been formulated. Therefore, the characterization of new lncRNAs and lncRNA-related networks is crucial for early diagnosis and targeted therapy of SOC. Data on lncRNAs, mRNAs, and miRNAs differentially expressed in SOC compared to normal ovarian tissue were obtained from the Gene Expression Omnibus (GEO) database. lncRNA expression data and clinical data for SOC were obtained from The Cancer Genome Atlas (TCGA). lncRNA-miRNA interactions were predicted using the miRBase database. Several online tools, i.e., TargetScan, RNA22, miRmap, microT, miRanda, StarBase, and PicTar, were used in combination to predict the mRNAs targeted by miRNAs. The BiNGO plugin in Cytoscape and KOBAS 3.0 were used to conduct the functional and pathway enrichment analyses. The lncRNA, miRNAs, and mRNAs found to be expressed at significantly different levels between SOC and healthy fallopian tube tissues were further validated using qRT-PCR. A total of 4 lncRNAs (LINC00284, HAGLR, HCAT158, and BLACAT1) and 111 mRNAs were found to be upregulated in SOC tissues compared to normal tissues, based on the GEO database. LINC00284 was found to be highly expressed in SOC, in association with the upregulation of the transcription factor SOX9. High LINC00284 expression was associated with poor prognosis and proved to be an independent risk factor in patients with SOC, based on TCGA database. The qRT-PCR validation results closely recapitulated the expression profiles and prognostic scores of the aforementioned bioinformatic analyses. The LINC00284-related ceRNA network was found to be associated with SOC carcinogenesis by biofunctional analysis. In conclusion, the LINC00284-related ceRNA network may provide valuable information on the mechanisms of SOC initiation and progression. Importantly, LINC00284 proved to be a new potential prognostic biomarker for SOC.


Introduction
Ovarian carcinoma (OC) is one of the most common malignancies of the female genital organs, the eighth most lethal female cancer worldwide, and the most lethal gynecological malignancy in developed countries [1]. Serous ovarian carcinoma (SOC) is the most common subtype, accounting for 75-80% of epithelial ovarian carcinomas (EOCs). Due to the lack of effective biomarkers for early detection, approximately 75% of SOC patients present with advanced-stage disease at diagnosis, which results in poor prognosis [2]. Thus, exploring novel biomarkers of SOC progression and prognosis, as well as alternative therapeutic targets, is crucial to improving patient management.
Long noncoding RNAs (lncRNAs) are a class of noncoding transcripts greater than 200 nt in length, which are involved in many biological processes such as chromatin recombination, transcriptional gene expression, and posttranscriptional regulation. lncRNAs play various roles in the regulation of gene expression, serving as "signals," "decoys," "guides," and "scaffolds" [3,4]. There is accumulating evidence that lncRNAs are involved in the initiation and development of many types of carcinoma, including EOC. For instance, lncRNA TPT1-AS1 [5,6], lncRNA TPT1-AS1 [7,8], and HOXD-AS1 [9] were reported to be upregulated in EOC and to promote EOC proliferation and migration. The most common mechanism by which lncRNAs are believed to regulate the expression of target genes involves their role as ceRNAs [10].
In the last decade, signaling networks formed by lncRNA and miRNA molecules were found to coordinate the regulation of gene expression. According to the ceRNA hypothesis, mammalian lncRNAs function as "miRNA sponges," which competitively bind to miRNAs to antagonize them. This represents one of the "decoy" mechanisms [11]. The ceRNA hypothesis suggests that a variety of RNA molecules form interaction networks, in which lncRNAs, miRNAs, and mRNAs are in a dynamic equilibrium. Alterations in the level of one or more of these molecules affect the expression of the target gene(s), which could lead to tumorigenesis [11].
Here, we comprehensively investigated lncRNA, miRNA, and mRNA sequencing data of SOC and control samples from the Gene Expression Omnibus (GEO) data matrix, to identify aberrantly expressed species. Next, the prognostic value of overexpressed lncRNAs (LINC00284, HAGLR, HCAT158, and BLACAT1) was assessed in patients with SOC, based on TCGA database. Finally, transcription factors (TFs) positively associated with LINC00284 expression were identified.

Materials and Methods
2.1. Data Collection. Gene Expression Omnibus (GEO) datasets including GSE18520, GSE36668, GSE119055, and GSE83693 were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Specifically, 61 serous ovarian carcinomas and 14 normal ovarian surface epithelium tissues were used for lncRNA and mRNA data analysis in the GSE18520 and GSE36668 datasets; 22 serous ovarian carcinomas and 7 normal ovarian surface epithelium tissues were used for miRNA data analysis in the GSE119055 and GSE83693 datasets.

Analysis of Differentially Expressed Genes. Differential expression analysis was carried out to identify differentially expressed lncRNAs, mRNAs, and miRNAs between SOC and normal tissues using the R/Bioconductor package edgeR, with a cutoff of |log2FC| > 2 (FC, fold change) and a P value < 0.01 as the statistical significance threshold.
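The thresholding step described above can be illustrated with a minimal Python sketch. It is not the edgeR workflow itself: it assumes that log2 fold changes and P values have already been computed by the statistical package, and it only shows how the two cutoffs (|log2FC| > 2 and P < 0.01) are combined. The `GeneResult` type, the function name, and the gene names are hypothetical and used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fc: float   # log2 fold change, SOC versus normal
    p_value: float   # P value reported by the statistical test


def select_differential(results, lfc_cutoff=2.0, p_cutoff=0.01):
    """Split genes passing both cutoffs into up- and downregulated lists."""
    up, down = [], []
    for r in results:
        # A gene must pass BOTH the fold-change and the P value threshold.
        if r.p_value >= p_cutoff or abs(r.log2_fc) <= lfc_cutoff:
            continue
        (up if r.log2_fc > 0 else down).append(r.gene)
    return up, down


# Toy table with made-up values (not data from the study).
example = [
    GeneResult("GENE_A", 3.1, 0.0004),   # passes, upregulated
    GeneResult("GENE_B", -2.6, 0.002),   # passes, downregulated
    GeneResult("GENE_C", 1.4, 0.0001),   # fold change too small
    GeneResult("GENE_D", 2.8, 0.03),     # P value too large
]
up, down = select_differential(example)
print("up:", up, "down:", down)
```

Applying both cutoffs jointly is stricter than either alone, which is why only two of the four toy genes are retained.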

Survival Analysis Based on TCGA Data. For survival analysis in TCGA SOC patients, high-throughput sequencing LINC00284 expression data (ending date: January 28, 2016) from 371 SOC samples were downloaded using R software (R 3.4.2) with the "RTCGAToolbox" library. The optimal cutoff value of LINC00284 RNA expression was used to divide the samples into high- and low-expression groups. The median, minimum, and maximum LINC00284 expression values were 1.11, 0, and 29.65, respectively. The publication guidelines of TCGA Research Network were followed in this study (https://cancergenome.nih.gov/publications/publicationguidelines). Thus, no further ethical approvals were required.
2.4. Kaplan-Meier Plotter Online Platform. TCGA and GEO SOC datasets were selected using the Kaplan-Meier plotter online platform (http://kmplot.com/analysis/). LINC00284 RNA expression was determined using the 232318_s_at probe (the same probe was used for the GEO database, so that the datasets were comparable). The best cutoff value for LINC00284 RNA expression was automatically selected by the online platform. A total of 614 SOC samples were analyzed for progression-free survival, whereas 356 and 380 samples from SOC patients treated with a combination of taxol and platin were employed for overall survival and progression-free survival analyses, respectively.
2.6. Functional Annotation. The BiNGO plugin in Cytoscape (version 3.5.1) and KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/) were used to conduct the functional and pathway enrichment analyses. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to assess the potential biological functions and pathways of the overexpressed mRNAs included in the network (P value <0.05).

Preparation of Human SOC Samples.
In total, 40 formalin-fixed, paraffin-embedded SOC specimens and 20 healthy fallopian tube tissue specimens (one from each patient) were obtained from the Department of Pathology of the First Affiliated Hospital of Shihezi University School of Medicine. The collection of specimens was approved and supervised by the Ethics Committee of the First Affiliated Hospital of Shihezi University School of Medicine. Clinical data of patients with SOC, including age, recurrence-free survival, and overall survival, were collected from the on-paper medical records at the First Affiliated Hospital of Shihezi University School of Medicine.
2.9. Statistical Analysis. A nonparametric test was used to analyze the differences in LINC00284 and SOX9 expression between normal ovarian surface epithelium and SOC tissues. Univariate and multivariate analyses using the Cox regression model were conducted to determine the independent significance of relevant clinical covariates. Survival analysis was performed using the Kaplan-Meier method, and the log-rank test was used to analyze the correlation between LINC00284 expression and SOC patient prognosis. All tests were two-sided. P < 0.05 was considered significant, and all analyses were performed using the Statistical Product and Service Solutions (SPSS) software (version 20.0; SPSS, Chicago, IL).
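To make the survival methodology concrete, the following is a minimal pure-Python sketch of the Kaplan-Meier estimator and the two-group log-rank test named above. The analyses in this study were run in SPSS; the function names and the toy follow-up times below are hypothetical and serve only to illustrate the calculations.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    """
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    All four arguments are expected to be Python lists.
    """
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n        # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance


def chi2_p_value_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Toy data in months (made up, not patient data from the study).
high_t, high_e = [5, 8, 12, 20], [1, 1, 1, 0]
low_t, low_e = [15, 22, 30, 34], [1, 0, 1, 0]

km_high = kaplan_meier(high_t, high_e)
chi2 = logrank_chi2(high_t, high_e, low_t, low_e)
p_value = chi2_p_value_1df(chi2)
print(km_high, chi2, p_value)
```

The log-rank statistic compares, at every event time, the observed number of events in one group with the number expected if both groups shared the same hazard, which is what allows it to handle censored observations.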

Results
3.1. Screening of lncRNAs in GEO Databases. The differential expression of lncRNAs and mRNAs between SOC and normal tissues was analyzed separately in 2 datasets of the GEO database. Genes with a fold change > 2 and P value < 0.01 were considered differentially expressed. Four lncRNAs were identified in both datasets (LINC00284, HAGLR, HCAT158, and BLACAT1), and 111 mRNAs were found to be upregulated in SOC compared to normal tissues (Figure 1(a)). Expression heatmaps were constructed based on the above lncRNAs (Figure 1(b)). The results suggested that the expression profiles of the upregulated species could distinguish SOC tissues from normal tissues.

Screening of Survival-Related lncRNAs.
Survival information on SOC samples from 371 patients was available in TCGA. Receiver operating characteristic (ROC) analysis was used to determine the area under the curve. The point nearest (0, 1), which maximizes both sensitivity and specificity, could be clearly identified on the ROC curve of each lncRNA expression profile (Supplementary Fig). Therefore, we assigned expression scores of 1.27, 19.75, 0.26, and 4.58 to LINC00284, HAGLR, HCAT158, and BLACAT1, respectively, as optimal cutoffs for survival analyses. The relationship between these 4 lncRNAs and patient prognosis was evaluated by Kaplan-Meier survival analysis (Table 1). The results indicated that LINC00284 overexpression (differential expression of LINC00284; Figures 1(c) and 1(d)) was associated with significantly reduced overall survival (P < 0.05; Table 1 and Figure 2(a)). Moreover, based on Kaplan-Meier plotter analysis of TCGA and GEO data, patients with LINC00284 overexpression had shorter progression-free survival than those with low LINC00284 expression (P < 0.001, Figure 2(b)). SOC patients with LINC00284 overexpression who were treated with a chemotherapy regimen combining taxol and platin displayed significantly reduced overall and progression-free survival compared to patients with low LINC00284 expression (P < 0.01 and P < 0.0001, respectively; Figures 2(c) and 2(d)).
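The ROC-based choice of an expression cutoff can be sketched as follows. This is an illustrative pure-Python example, not the software used in the study: it assumes that the optimal cutoff is the threshold whose ROC point has the smallest Euclidean distance to (0, 1), and it computes the area under the curve through the equivalent rank-based (Mann-Whitney) formulation. The function names, expression scores, and outcome labels are hypothetical.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) for each distinct score used as a cutoff.

    A sample is called positive when its score is >= the threshold.
    labels: 1 for the outcome of interest (e.g., death), 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def best_cutoff(scores, labels):
    """Threshold whose ROC point lies closest to the ideal corner (0, 1)."""
    closest = min(roc_points(scores, labels),
                  key=lambda p: math.hypot(p[1], 1.0 - p[2]))
    return closest[0]


def auc(scores, labels):
    """Area under the ROC curve as the probability that a positive sample
    scores higher than a negative one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy expression scores and outcomes (made up, not TCGA data).
scores = [0.2, 0.5, 0.9, 1.3, 1.6, 2.4, 3.0, 4.1]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff = best_cutoff(scores, labels)
area = auc(scores, labels)
print(cutoff, area)
```

Other criteria, such as maximizing the Youden index (sensitivity + specificity - 1), can select a different threshold on the same curve, so the distance-to-corner rule used here should be read as one possible convention.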

LINC00284 Is an Independent Risk Factor for and Prognostic Predictor of SOC. Based on univariate analysis using the Cox regression model, LINC00284 overexpression was found to be a significant prognostic factor of poor overall survival (P = 0.044; Table 2). In addition, advanced stage (P = 0.038) and age (P = 0.046) were associated with shorter overall survival. For multivariate Cox regression analysis, only variables that were statistically significant in univariate Cox regression analysis were considered, and the results identified LINC00284 overexpression (P = 0.020), advanced stage (P = 0.038), and age (P = 0.009) as independent prognostic factors (Table 2). ROC analysis (Figure 3) revealed that the area under the curve of LINC00284 expression (AUC = 0.568, P = 0.028; Figure 3(a)) was the same as that of the FIGO stage (AUC = 0.568, P = 0.029; Figure 3(d)). Thus, LINC00284 expression exhibited prognostic discrimination comparable to that of the FIGO stage.
(Figure 4(a)). Potential interactions within lncRNA-miRNA-mRNA networks were predicted. Two specific downregulated miRNAs, hsa-miR-195-5p and hsa-miR-497-5p, were predicted to interact with LINC00284 through miRNA response elements, using the miRBase (http://www.mirbase.org/) online tools (Table 3). To improve the predictive accuracy, the TargetScan, RNA22, miRmap, microT, miRanda, StarBase, and PicTar databases were combined to identify candidate mRNA targets of the 2 downregulated miRNAs; mRNAs with at least 3 binding sites were selected. As a result, 15 candidate mRNA targets were identified. Finally, a ceRNA network including 1 lncRNA, 2 miRNAs, and 15 mRNAs was visualized using the Cytoscape software, based on the interactions among LINC00284, miRNAs, and mRNAs indicated in Table 3 (Figure 4(b)).
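The assembly of a lncRNA-miRNA-mRNA network from several target prediction resources can be illustrated with a short Python sketch. It rests on an assumption that is not confirmed by the text: each miRNA-mRNA pair is kept only when at least three prediction tools support it, which is one possible reading of the selection criterion above. The tool names, miRNA names, and gene names in the example are placeholders, not predictions from the study.

```python
from collections import defaultdict


def consensus_targets(predictions, min_support=3):
    """Keep (miRNA, mRNA) pairs predicted by at least `min_support` tools.

    predictions: dict mapping a tool name to a set of (miRNA, mRNA) pairs.
    """
    support = defaultdict(set)
    for tool, pairs in predictions.items():
        for pair in pairs:
            support[pair].add(tool)
    return {pair for pair, tools in support.items() if len(tools) >= min_support}


def build_cerna_edges(lncrna, lnc_mirnas, predictions, upregulated, min_support=3):
    """Return network edges: lncRNA-miRNA links, then miRNA-mRNA links.

    Only mRNAs that are upregulated in the disease tissue are retained,
    in line with the sponge model (more lncRNA, less free miRNA, more target).
    """
    edges = [(lncrna, mirna) for mirna in sorted(lnc_mirnas)]
    for mirna, mrna in sorted(consensus_targets(predictions, min_support)):
        if mirna in lnc_mirnas and mrna in upregulated:
            edges.append((mirna, mrna))
    return edges


# Toy predictions with placeholder names (hypothetical).
predictions = {
    "tool_A": {("miR-X", "GENE_1"), ("miR-X", "GENE_2")},
    "tool_B": {("miR-X", "GENE_1"), ("miR-Y", "GENE_3")},
    "tool_C": {("miR-X", "GENE_1"), ("miR-Y", "GENE_3"), ("miR-X", "GENE_2")},
    "tool_D": {("miR-Y", "GENE_3")},
}
edges = build_cerna_edges(
    lncrna="LNC_EXAMPLE",
    lnc_mirnas={"miR-X", "miR-Y"},
    predictions=predictions,
    upregulated={"GENE_1", "GENE_2"},
)
print(edges)
```

An edge list of this form can be written to a two-column text file and imported into Cytoscape as a network table for visualization.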

Functional Analysis of Upregulated mRNAs in the LINC00284-Related ceRNA Network. Functional analysis revealed that the 15 upregulated mRNAs in the above ceRNA network were enriched in 64 GO biological process categories and 15 KEGG categories (P < 0.05). The significant GO terms for the dysregulated genes included regulation of the macromolecule metabolic process (GO:0060255), regulation of the metabolic process (GO:0019222), and transcription activator activity (GO:0016563) (Figure 4(c)). Figure 4(d) shows the pathways significantly enriched among these upregulated mRNAs according to KEGG analysis. Two cancer-related pathways were included, i.e., the TGF-beta signaling pathway and the chemical carcinogenesis pathway.
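The idea behind an enrichment P value can be shown with a short over-representation calculation. The text does not state which statistical test was selected in BiNGO or KOBAS, so the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of analysis. The function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric P value for over-representation.

    n_background -- number of annotated genes in the background set
    n_in_term    -- background genes annotated to the GO term or pathway
    n_selected   -- genes in the list of interest (e.g., network mRNAs)
    n_overlap    -- genes of interest annotated to the term

    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (made up): 3 of 15 genes fall in a term covering
# 200 of 20000 background genes.
p_term = hypergeom_enrichment_p(20000, 200, 15, 3)
print(p_term)
```

Because many terms are tested at once, enrichment tools usually also report P values corrected for multiple testing; that correction is omitted here for brevity.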

Discussion
Long noncoding regulatory elements, accounting for most of the genome components, are transcribed into lncRNAs located in the nucleus and the cytoplasm. lncRNAs are involved in the regulation of gene expression [4] and affect chromatin modification, X-chromosome silencing, genomic imprinting, transcriptional interference and activation, mRNA splicing, mRNA stabilization, and protein translation [14]. Alterations in the expression profile of lncRNAs may be associated with the initiation of specific lesions and may therefore serve as early disease indicators. Indeed, a growing number of lncRNAs have been found to be suitable biomarkers for diagnosis and prognosis [15]. Moreover, lncRNAs are also regarded as new potential therapeutic targets.
In the present study, 111 mRNAs and 4 lncRNAs were found to be upregulated in SOC compared to normal tissues, based on the GEO database. Patients with LINC00284 overexpression experienced significantly reduced overall survival compared to patients with low LINC00284 expression, based on TCGA database, which was consistent with the results of Kaplan-Meier plotter analysis of TCGA and GEO data. Based on multivariate analysis using the Cox regression model, LINC00284 overexpression was identified as an independent prognostic factor and was related to SOC development and poor prognosis. In addition, ROC analysis revealed that the area under the curve of LINC00284 expression was the same as that of FIGO staging, demonstrating comparable prognostic sensitivity and specificity. Notably, it has been reported that LINC00284 overexpression in triple-negative breast cancer (TNBC) and cancer stem cells (CSCs) contributes to cancer cell survival and tumor growth [16], which is consistent with our results.
The ability of lncRNAs to regulate mRNA stability and protein translation was also demonstrated [14]. We hypothesized that the mRNAs found to be overexpressed in SOC may be regulated by LINC00284. No protein was predicted to directly bind to LINC00284 by the RPISeq machine learning tool (http://pridb.gdcb.iastate.edu/RPISeq/) and the LncTar software (http://www.cuilab.cn/lnctar). Therefore, we reasoned that LINC00284 could act indirectly on target genes by upregulating the expression of specific mRNAs. miRNAs have been reported to bind to the 3′ UTR region of their target genes, thereby decreasing the stability of the target mRNA or downregulating the expression of the related protein [10]. The ceRNA hypothesis postulates that lncRNAs recruit free miRNAs, thereby reducing their abundance and affecting the expression of downstream target genes [10]. We used the GEO database to select miRNAs that were downregulated in SOC, identified potential lncRNA-miRNA-mRNA interaction networks based on the presence of specific binding sites, and reconstructed a comprehensive ceRNA network. Several recent studies demonstrated that ceRNA-based mechanisms may operate in many types of carcinoma [5,6,17-23]. In the present study, among the miRNAs found to be downregulated in SOC, hsa-miR-195-5p and hsa-miR-497-5p were predicted to bind to LINC00284. Notably, miRNA-195-5p was also found to be downregulated in human prostate cancer and to inhibit cell proliferation and angiogenesis by downregulating PRR11 expression [24]. Moreover, miRNA-497-5p is downregulated in breast cancer, which results in PTEN upregulation and promotion of cell proliferation by competitive binding to HOXC13-AS [7,8]. Our results predict that both hsa-miR-195-5p and hsa-miR-497-5p could bind to 11 of the 15 mRNAs that were found to be upregulated in SOC.
In addition, functional analysis revealed that these upregulated mRNAs may relate to tumor occurrence and development, as also previously reported for many cancers, including EOC. For example, SOX9 [25], MYB [26], and ESRP1 [27] promote ovarian cancer cell proliferation. Therefore, we speculated that a dual modulation by miR-497 and miR-195 could underlie SOC pathogenesis. Vidovic and colleagues found that LINC00284 is mainly expressed in the nucleus of breast cancer cells [16].
However, based on our hypothesized ceRNA mechanism, LINC00284 would mainly function in the cytoplasm. The pathogenesis and microenvironment of these two tumors are different, which may account for a different intracellular distribution of LINC00284. Further research is needed to directly verify the intracellular localization of LINC00284 in SOC.
It was reported that the transcription of lncRNAs is regulated by TFs [13]. We hypothesized that LINC00284 overexpression could be induced by specific TFs. Therefore, we screened TFs that were upregulated in SOC and found that the binding of one of them, SOX9, to the LINC00284 promoter region positively correlated with LINC00284 expression. Of note, in gastric cancer, the upregulation of the transcription factor EGR1 results in enhanced transcription of lncRNA-HNF1A-AS1 and in the promotion of cell proliferation [28]. Based on the present results, we hypothesized that LINC00284 may promote initiation and progression of SOC through the SOX9-LINC00284-miRNA-195/497-5p-mRNA network (Figure 7).
Subsequently, qRT-PCR validation of LINC00284, miR-195-5p, miR-497-5p, MYB, ESRP1, and SOX9 expression and correlation analyses between SOX9 and LINC00284 in 40 SOC tissue samples and 20 healthy fallopian tube tissues were performed. The results of the qRT-PCR validation showed consistent agreement with the expression data available in the GEO and TCGA databases. Next, we analyzed the association between LINC00284 expression and prognosis of the patients with SOC, and the results were similar to the aforementioned bioinformatic analysis results. Therefore, the bioinformatic analysis used in this study can be deemed reliable.
Figure 6: qRT-PCR analysis of the expression of identified molecules. As compared with healthy fallopian tube tissues, (a-d) LINC00284, SOX9, MYB, and ESRP1 are overexpressed in SOC tissues, and (e, f) miR-195-5p and miR-497-5p are expressed at a low level in SOC tissues. (g, h) Kaplan-Meier analysis suggested that patients with LINC00284 overexpression had shorter overall and recurrence-free survival than those with low LINC00284 expression. (i) The expressions of SOX9 and LINC00284 were significantly and positively correlated.

Conclusions
In conclusion, genome-wide analysis in a cohort of patients with SOC identified various dysregulated lncRNA, miRNA, and mRNA networks from the GEO database. LINC00284 was found to be highly expressed in SOC. LINC00284 upregulation was positively correlated with, and possibly induced by, SOX9 expression and was associated with poor prognosis, proving to be an independent risk factor in SOC. Therefore, LINC00284 could be a new biomarker for predicting the prognosis of SOC. Further in-depth functional characterization of the LINC00284-related ceRNA network may provide valuable insights into the molecular events responsible for SOC initiation and progression.

Data Availability
All datasets are included in the manuscript.

Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.