Immune-Related Four-lncRNA Signature for Patients with Cervical Cancer

Cervical cancer (CC) is a common gynecological malignancy for which prognostic and therapeutic biomarkers are urgently needed. The signature based on immune-related lncRNAs (IRLs) of CC has never been reported. This study is aimed at establishing an IRL signature for patients with CC. A cohort of 326 CC and 21 normal tissue samples with corresponding clinical information was included in this study. Twenty-eight IRLs were collected according to the Pearson correlation analysis between the immune score and lncRNA expression (p < 0.01). Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) with the most significant prognostic values (p < 0.05) were identified which demonstrated an ability to stratify patients into the low-risk and high-risk groups by developing a risk score model. It was observed that patients in the low-risk group showed longer overall survival (OS) than those in the high-risk group in the training set, valid set, and total set. The area under the curve (AUC) of the receiver operating characteristic curve (ROC curve) for the four-IRL signature in predicting the one-, two-, and three-year survival rates was larger than 0.65. In addition, the low-risk and high-risk groups displayed different immune statuses in GSEA. These IRLs were also significantly correlated with immune cell infiltration. Our results showed that the IRL signature had a prognostic value for CC. Meanwhile, the specific mechanisms of the four IRLs in the development of CC were ascertained preliminarily.


Introduction
Cervical cancer (CC) is a malignant gynecologic tumor threatening the health of women. The morbidity and mortality for CC rank fourth worldwide among women [1]. Infection with high-risk human papillomavirus (HPV), especially HPV16 and HPV18, is the main etiologic risk factor for CC and plays an important role in diagnostic tests [2]. Surgery is the main treatment method for CC in early stages while advanced-stage CC can be treated with radiotherapy, chemotherapy, or concurrent chemoradiation, thereby improving the survival rate of CC patients [3]. However, a considerable number of CC patients have poor prognosis due to metastasis or recurrence within two years after treat-ment [4]. Hence, effective prevention to reduce morbidity and individual treatments to improve the prognosis of CC are important for obstetricians and gynecologists.
The immune system can recognize tumor antigens expressed on the surface of tumor cells. It generates an immune response via the activation of effector cells and triggers the release of a series of effector molecules to attack and eliminate tumor cells and to inhibit tumor growth [5]. The immune imbalance in the tumor microenvironment plays an important role in the occurrence and development of cancer. With the development of cellular molecular biology and immunology, immunotherapy has become a new treatment approach for cervical cancer [6]. Tumor immunotherapy acts mainly by increasing the immunogenicity of tumor cells and

Datasets and Preprocessing.
The RNA sequencing profiles associated with CC were obtained from The Cancer Genome Atlas (TCGA) [16] (https://toil.xenahubs.net), which consisted of 306 CC and 13 normal tissue samples. The low-expression genes were filtered, and the genes whose expression level was greater than 0 in more than a third of the samples were retained. Additionally, the RNAs were identified as mRNAs or lncRNAs based on their annotation information in the GENCODE database [17] (https://www .gencodegenes.org/). GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array platform was used to obtain the microarray dataset GSE6791 from the Gene Expression Omnibus (GEO) repository (Gene Expression Omnibus (GEO), http://www.ncbi.nlm.nih.gov/geo/) [18], of which twenty CC and eight normal tissue samples were included. All 326 CC samples and 21 normal tissue samples were contained with corresponding clinical information. For the RNA sequencing profiles obtained from TCGA TAR-GET GTEx, empirical Bayes and linear regression along with Benjamini and Hochberg multiple comparison methods from the limma package [19] (version 3.10.3, http://www .bioconductor.org/packages/2.9/bioc/html/limma.html) were performed to gain adjusted p value and |logFC| (adj:p:value < 0:05, |logFC | >0:585). For GSE6791, GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) was used with the SeqMap [20] tool to map probes to mRNA and lncRNA sequences. The immune-related genes (IRGs) were downloaded from the InnateDB database (http://www.innatedb .com) [21]. Foc`using on screening genes that were up-and downregulated consistently, we used Venn analysis to select the intersection genes of the aforementioned datasets. Single-sample gene set enrichment analysis (ssGSEA) [22] was used to identify the immune scores (IS) of each sample. Pearson's correlation coefficient between lncRNAs and IS was calculated for each corresponding samples to identify immune-related lncRNAs (IRLs) (p < 0:01).

Signature Development of IRLs.
Univariate Cox regression analysis with hazard ratio (HR) was gained from overall survival (OS) and OS time from the candidate IRLs. HR > 1 indicated that expression was higher and the risk and the survival rate were lower. lncRNAs that are upregulated in the tumor should, in principle, have an HR greater than 1. The Kaplan-Meier analysis was generated by survminer (version 0.4.3) in the R package based on the expression value, survival time, and survival status to determine the optimal cut point. The log-rank test was performed based on survival (version 2.42-6) in the R package to sort the IRLs with a significant prognostic value (p < 0:05); then, the survival curves were drawn. Multivariate Cox regression analysis was performed based on the expression value, OS, and OS time of IRLs in each sample. Subsequently, the individual prognostic risk model for corresponding samples was established. Expr gene indicated the expression of corresponding IRL for each sample. All the samples were divided into the low-risk and high-risk groups according to the median of risk scores of the following study. All the samples in TCGA were regarded as a total set. Training and valid sets were 2.6. Evaluation of Infiltrating Immune Cells. To further observe the differences in the abundance of infiltrating immune cells of the risk groups, we used the CIBERSORT algorithm, a deconvolution method [25], coupled with LM22 that distinguished 22 immune cell subpopulations from CIBERSORT (a web server), and a heat map for all the samples was drawn using ggplot2 (version 3.2.1). Student's t-test was applied to find significant immune cell subpopulations in risk groups to chart violin plot by boxplot (version 0.3.2) in the R package.

2.7.
Construction of an IRL-Associated ceRNA Network. The ceRNA network was constructed to explore the association among IRLs, miRNAs, and mRNAs based on the ceRNA hypothesis [26]. First, the Pearson correlation coefficient analysis between the IRLs and immune-related DEGs was performed to obtain lncRNA-mRNA pairs (r > 0 and adj:p: value < 0:01). The target miRNAs of IRLs were predicted using DIANA-LncBase v2 [27] [29].
Sorting the mRNA with a significant prognostic value (p < 0:05) in the ceRNA network, the Kaplan-Meier analysis was generated using survminer (version 0.4.3) to determine the optimal cut point, and the log-rank test was performed based on survival (version 2.42-6) in the R package.

Differential Analysis of Genes.
In the aggregate, 1995 differentially expressed lncRNAs, with corresponding clinical information, were extracted from the TCGA and GSE6791 databases mentioned above. Of these, 567 lncRNAs were highly expressed and 1428 lncRNAs were expressed at low levels in the CC samples (Table 1). A total of 64 lncRNAs were obtained from the intersection of the two databases ( Figure 1(a)), among which 10 lncRNAs were consistently upregulated and 54 lncRNAs were consistently downregulated in the two databases. A total of 1040 IRGs were identified from the InnateDB. The ssGSEA analysis was based on 201 IRGs obtained from the intersection of the TCGA and InnateDB databases (Figure 1(b)) to compute IS. Finally, a cohort of 28 IRLs was obtained based on the Pearson correlation coefficient analysis between the 64 lncRNAs and IS of 201 IRGs (p < 0:01).

Univariate Cox Regression and Kaplan-Meier Survival
Analysis. Univariate Cox regression and Kaplan-Meier survival analysis were performed for the cohort of 28 IRLs. Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) with low expression were found to be in accordance with our expectation and were then included in the signature development. The results are shown in Table 2. The Kaplan-Meier survival analysis revealed that four IRLs were related to OS in CC significantly (p < 0:05) ( Figure 2). The four IRLs were defined as protective factors due to their HR value < 1, which showed that the high expression of the four IRLs was associated with lower OS.

Construction of the Prognostic Risk Model and Validation
Using Four IRLs. A total of 304 TCGA samples were regarded as the total set so that there were both 152 samples in the training set and the valid set as described in the methods. The regression coefficient β was first generated from the training set (β ZNF667−AS1 = −0:152565, β EMX2OS = −0:019887, β BZRAP1−AS1 = −0:17831, and β CTC−429P9:1 = 0:0096755; Table 3) to establish the risk score (RS) formula. Hence, the prognostic risk model for corresponding samples was established using the following formula: Expr gene indicated the expression value of the corresponding IRL for each sample. An RS higher than the median was identified as a high-risk group while an RS lower than the medium was identified as a low-risk group. Using this approach, a risk model signature based on four IRLs was constructed, which was further validated in the total set and valid set using the same β to confirm the prediction potential of the 4-IRL signature. The Kaplan-Meier survival curves based on the log-rank test revealed that OS in the low-risk group was markedly longer than that in the other group in all three sets (p Training−set = 0:0068, p Valid−set = 0:02, and p Total−set = 0:0015; Figures 3(a)-3(c)). The one-year, two-year, and three-year survival ROC curves predicted by the risk model indicated that the AUCs were larger than 0.65 (0.695, 0.66, and 0.676; Figure 3(d)), thus predicting that the risk score model could efficiently forecast over 65% of OS for CC patients. Therefore, the risk model signature based on four IRLs was accurate in predicting OS of CC patients.  Figure 4(b)). The scatterplot for the distribution of IS in risk groups showed that the IS of the high-risk group was significantly lower than that of the low-risk group (Figure 4(c)). Furthermore, the high expression of the four IRLs could be seen with low RS, which suggested that the upregulation of the four IRLs were associated with better prognosis. In contrast, the low expression of the four IRLs had the opposite consequence. An RS contrast of the neoplasm histologic grade showed that the RS in stages IIB-III-IV was significantly greater than that in stages I-II-IIA (Figure 4(c)). This part of the result suggests that a high-risk score has an adverse effect on prognosis, which may be caused by a decrease in immune score.

Nomogram Model Construction and Visualization.
The univariate and multivariate Cox regression analyses of clinicopathological characteristics and the four-IRL signature for the total TCGA dataset demonstrated that the four-IRL signature was an independent risk factor for CC patients (p < 0:05, Table 4). In univariate Cox analysis, FIGO stage and TNM stages were risk factors for CC patients (p < 0:05 ), whereas their prognostic values were not validated in the multivariate Cox analysis. Nonetheless, radiation therapy and a history of tobacco smoking were not correlated with prognosis independently. To better predict prognosis at one-, three-, and five-year OS of CC patients, we constructed a nomogram of variables such as the four-IRL signature, age, and FIGO stage ( Figure 5).
3.6. Gene Set Enrichment Analysis. GSEA analysis of the two risk groups was carried out to predict enrichment status disparities of molecular mechanism functions. The enrichment analysis showed that 118 biological functions were markedly enriched in the low-risk group, whereas only one biological function (GO_ATP_SYNTHESIS_COUPLED_ELECTRON_ TRANSPORT) was enriched in the high-risk group. In the enrichment status enriched in the low-risk group, we sought out couple immune-related responses as shown in Figure 6. to the prognosis and treatment of malignant tumors [29]. From the GSEA analysis in our study, we discovered that the four-IRL signature was associated with many immune characteristics. Hence, the abundance of twenty-two infiltrating immune cells (Figure 7     BioMed Research International naïve, B cell memory, T cell CD4 memory resting, NK cells resting, macrophages M1, dendritic cells activated, mast cells resting, mast cells activated, and neutrophils) was significantly different (p < 0:05) between the risk groups (Figure 7(b)). The abundance of four infiltrating immune cells (B cell naïve, T cell CD4 memory resting, macrophages M1, and mast cells resting) in the low-risk group were significantly higher than that in the high-risk group as shown by Student's t-test (p < 0:05).

Construction of an IRL-Associated ceRNA Network.
Four-IRLs, forty-five hub mRNAs, and thirty-eight miR-NAs were involved in the ceRNA network (Figure 8). A total of 46 lncRNA-miRNA pairs, 232 miRNA-mRNA pairs, and 55 lncRNA-mRNA pairs were identified. The downregulation of 12 mRNAs in the ceRNA network was significantly related to OS in CC. Among them, the downregulation of seven mRNAs (CXCL12, FREM1, IGF1, IRF4, NFATC2, NTN1, and STAT6) had an adverse effect on  Figure 5: A nomogram based on the signature and clinical information. 8 BioMed Research International prognosis, which was in line with what was expected ( Figure 9).

Discussion
In this study, a cohort of 326 CC and 21 normal tissue samples from two datasets (TCGA, GSE6791) were included to identify the differential lncRNAs for patients with CC. A total of 1040 IRGs were collected from the InnateDB. Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) were identified after ssGSEA analysis, Pearson correlation coefficient analysis, univariate Cox regression analysis, and Kaplan-Meier survival analysis between the lncRNAs and IS. The prognostic risk model based on the four IRLs could divide CC patients into two risk groups according to the median RS, which was validated by dividing the total samples equally. The univariate and multivariate cox regression analyses of clinicopathological characteristics showed that age, AJCC stage, and the four-IRL signature were all independent prognostic factors. A nomogram was constructed based on age, AJCC stage, and four-IRL signature to predict OS for CC patients more clearly. We also found that RS and IS showed significant negative correlation, which indicated that high RS had an adverse effect on prognosis due to a decrease in IS. In our study, the four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) were identified to be protective against CC; only ZNF667-AS1 had been previously reported in CC. In the existing literature, ZNF667-AS1 with low expression has been identified to be negatively correlated with the OS, tumor size, and FIGO stage in CC [30]. Additionally, ZNF667-AS1 can competitively bind to miR-93-3p, which targets PEG3, to regulate the progression of CC [31]. Recent research has shown that inhibiting PEG3 would promote the immune escape of cancer cells [32]. BZRAP1-AS1 was found to be a novel biomarker associated with prostate cancer (PC), being downregulated in PC samples [33]. It was shown, however, to be highly expressed in hepatocellular carcinoma and inhibited the transcription of THBS1 by recruiting DNMT3b to its promoter region [34]. Previous research has revealed that the downregulation of EMX2OS in classical papillary thyroid cancer might independently predict shorter recurrence-free survival [35], while the overexpression of EMX2OS in ovarian cancer and EMX2OS/miR-654/AKT3 axis may target PD-L1 (programmed cell death protein 1) to suppress the initiation and progression of cancer [36]. Accumulating evidences have suggested that Thrombospondin-1 (THBS1) may affect tumor immunity [37]. PD1-PDL1 (PD1 ligand) has already been shown to be an important immune checkpoint pathway, which can be used by cancer cells to evade immune attacks [38]. Thus, BZRAP1-AS1 and EMX2OS may play a dual role in cancer and directly or indirectly regulate tumor immunology. By contrast, no reports concerning CTC-429P9.1 in cancer have been published, and therefore, the role of CTC-429P9.1 remains unclear.
In recent years, a small number of researches have reported that lncRNAs can directly or indirectly affect the tumor microenvironment of cervical cancer. LOC105374902 induced by TNF-α, a multiple functional cytokine which can regulate inflammation and immunity of cancer, was found to promote the malignant behavior of cervical cancer cells by acting as a sponge of miR-1285-3p [39]. lncRNA SNHG14 was shown to be associated with the activation of the JAK-STAT pathway in Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset 0 500 1,000 1,500 2,000 2,500 3,000 Rank in ordered dataset    BioMed Research International cervical tumor cells [40]. STAT3-binding sequence in the enhancer region of lncRNA MALAT1 was demonstrated to be crucial for the IL-6-or STAT3-induced MALAT1 promoter activation in cervical cancer HeLa cells [41]. HOTAIR was identified to promote the overactivation of the Wnt/β-catenin signaling pathway by the downregulation of PCDH10, SOX17, AJAP1, and MAGI2 and also TET [42]. lncRNA HIPK1-AS has been proved to regulate the inflammatory process of cervical cancer [43]. Generally, there are still relatively few immune-related lncRNAs in CC.
Our GSEA analysis showed that certain immunerelated enrichment statuses were dramatically enriched in the low-risk group. Noncanonical WNT signaling has been demonstrated to be closely associated with cancer stem cell survival, bulk tumor expansion, and invasion/metastasis, which have the potential for tumor immunology [44]. Ubiquitination has been shown to be crucial for tumor immunity [45].
Compared to a certain number of studies carried out to explore IRL signatures in several human malignancies [53][54][55][56][57][58][59], this was the first study based on a comprehensive analysis that focused on IRL signature in CC. However, our study has some limitations. Additional datasets are needed to validate our study. The specific molecular mechanisms in which these genes may be involved should be verified through in vitro and in vivo experiments.

Conclusions
We established a four-IRL-based signature with a prognostic value for CC, which could stratify CC patients into the lowand high-risk groups. Meanwhile, the specific mechanisms of the four IRLs in the development of CC were preliminarily ascertained. This may provide the basis for tumor prevention and immunotherapy in the future.