Construction of Immune-Associated Nomogram for Predicting the Recurrence Survival Risk of Stage I Cervical Cancer

Background Various studies reported that the prognosis of patients with cervical cancer (CC) was significantly associated with immunity, whereas limited studies have explored whether immune-associated genes could be classifiers for recurrence-free survival (RFS) of stage I CC. Thus, an improved immune-related gene signature for stage I CC patients' prognosis is urgently required. Materials and Methods We retrospectively analyzed the gene expression profiles of stage I CC patients in the GSE44001 set from the Gene Expression Omnibus (GEO) database. The stage I CC patients were randomly divided into the training group and the internal validation group. The training patients were adopted to develop a prognostic immune gene-based signature; meanwhile, the internal validation patients were used to validate the power of the selected immune gene-related signature using univariate Cox proportional hazard analysis, least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression analysis. The accuracy and reliability of the immune gene-related signature were evaluated based on Kaplan-Meier analysis and time-dependent receiver operating characteristic (ROC) curves. Results High power of the 8-immune gene signature was found on the basis of ROC analysis (AUC at 1, 3, and 5 years were exhibited in the internal validation group (0.702, 0.715, and 0.728, respectively), external validation group (0.702, 0.825, and 0.842, respectively), and entire GEO dataset (0.840, 0.894, and 0.852, respectively)). Besides, C-index, ROC, calibration plots, and decision curve analysis (DCA) also acted well in our nomogram, suggestive of a high ability of the nomogram to elevate the prognostic prediction of stage I CC patients. Conclusions In this study, we successfully constructed an integrated 8-immune gene-based signature which could accurately identify patients with low prognostic risk from those with high prognostic risk. In addition, we developed an immune-related nomogram which can elevate the prognostic prediction of stage I CC patients.


Introduction
Cervical cancer (CC) is the fourth most frequently diagnosed malignancy in women worldwide [1]. CC accounts for about 10% of cancer-associated deaths in women worldwide, and there are an estimated 560,000 new cases in 2018 [2,3]. Standard therapies which include chemotherapy, radiotherapy, and surgical resection have improved the prognostic management of early-stage CC, whereas it is hard to prevent metastasis and recurrence of CC, which results in the majority of CC deaths [4,5]. Therefore, new therapeutic methods and novel hallmarks that provide prognostic information for CC patients are urgently required.
Previous studies suggested that the immune system was a determining index during carcinoma initiation and progression [6,7]. Accelerating evidence has showed that immune genes may serve as a hallmark of cancer. For example, Shen et al. developed and validated an immune gene set-based prognostic signature in ovarian cancer [8]. Tsakonas et al. showed that an immune gene expression signature distinguished central nervous system metastases from primary tumors in non-small-cell lung cancer [9]. Li et al. developed and validated an individualized immune prognostic signature in early-stage nonsquamous non-small-cell lung cancer [10]. Cheng et al. identified an immune-related risk signature for glioblastoma based on bioinformatic profiling [11]. Zhou et al. revealed an immune-related six-lncRNA signature to improve prognosis prediction of glioblastoma multiforme [12]. Various studies reported that the prognosis of patients with CC was significantly correlated with immunity. For instance, Yang et al. identified a prognostic immune signature for CC to predict survival and response to immune checkpoint inhibitors [13]. Huang et al. revealed the prognostic value of the preoperative systemic immune-inflammation index in patients with CC [14]. Wang et al. showed the prognostic landscape of tumor-infiltrating immune cells in CC [15]. Chen et al. revealed the correlation between subsets of tumorinfiltrating immune cells and risk stratification in patients with cervical cancer [16]. Cui et al. identified the TCR repertoire as a novel indicator for immune monitoring and prognosis assessment of patients with CC [17], whereas limited studies have explored whether immune-associated genes could be predictors for the prognosis of stage I CC. Therefore, an improved immune-associated gene signature for stage I CC patients' prognosis is urgently needed.
In this study, we investigated the gene expression data and related clinical information of stage I CC patients in GSE44001 from the GEO database to develop an individualized prognostic signature for stage I CC patients based on bioinformatic methods. Then, the power of the 8-immune gene signature was validated via ROC analysis and Kaplan-Meier analysis. We developed a nomogram on the basis of the risk score, cancer status, and race to improve the predicted value of the 8-immune gene signature. The outcomes indicated that our nomogram was an accurate prognostic prediction tool.

Materials and Methods
2.1. Data Acquisition. We retrospectively analyzed the gene expression profiles of stage I CC samples in the GSE44001 set of the GEO database via the GEOquery package [18].
Only samples with both survival information and expression data available were included in the present study. In addition, genes with a lack of expression values or samples with a lack of prognostic information for recurrence were removed. Following this, immune gene data were downloaded from the ImmPort database [19]. Consequently, 1811 immune genes and 258 stage I CC cases were included to develop the immune gene-based prognostic classifier for stage I CC. Besides, 153 samples with stage I CC from the TCGA database were searched across the TCGAbiolinks package [20] which was employed as an external validation group. The detailed information of enrolled samples in GEO and TCGA databases was presented in Table S1. In addition, our article was about data mining analysis through TCGA and GEO database and not associated with ethical approval.

Development and Validation of the Immune-Related
Signature for Stage I CC. We first performed univariable Cox regression analysis using the collected immune gene to identify the immune genes which generated a close (P < 0:05) correlation with RFS of stage I CC patients. Following that, the identified immune genes were adopted for LASSO analysis to determine stage I CC patients' RFSrelated candidate immune genes (P < 0:05). Finally, we implemented multivariate Cox regression analysis according to the candidate immune genes to select RFS signature genes for stage I CC patients, and an 8-metabolic gene signature was identified as a prognostic predictor.
The stage I CC samples in the entire GEO dataset were randomly assigned into the training and internal validation groups with a ratio of 7 : 3. The training patients were adopted to build a prognostic immune gene-related signature; meanwhile, the internal validation patients were exploited to validate the reliability of the selected immune gene-related signature. After that, we established a formula by using the 8-immune gene-based signature to calculate the immune risk score for each patient. Patients were then separated into high-and low-risk cohorts based on the median cutoff of the risk score. We assessed the prognostic performance of the immune signature by comparing the sensitivity and specificity of ROC curves via R (version 4.0.0), and the AUC value was employed as an evaluation indicator. The comparison of RFS between the high-risk and low-risk cohorts was assessed by the Kaplan-Meier curve through R (version 4.0.0).

Quantitative Reverse Transcription-Polymerase Chain
Reaction (qRT-PCR). Total RNA was extracted from stage I CC tissues via TRIzol (Invitrogen) according to the  Table S2.

Single Sample Gene Set Enrichment Analysis (ssGSEA).
ssGSEA was carried out based on the TCGA-CESC mRNA dataset across the GSVA package [21] to find the immune gene signature-based signaling pathways. We investigated the top 20 key immune gene-related pathways that had a positive association with the risk score. The stage I CC patients in the entire GEO dataset were randomly divided into the training and internal validation groups with a ratio of 7 : 3. p values of <0.05 were considered statistically significant.
2.5. Identification of the Nomogram. We implemented univariate and multivariate Cox proportional hazard analysis on the basis of the risk score and several clinic-related factors. Cox proportional hazard models were utilized for calculating hazard ratios (HR) and corresponding 95% confidence interval (CI). To elevate the predictive reliability of the 8-immune gene-based signature for stage I CC patients' RFS, a nomogram was established on the basis of the risk score, cancer status, and race via the "rms" R package. The predictive capability of our nomogram for stage I CC patients' RFS was assessed according to C-index, ROC, calibration plots, and DCA. The predicted results of the nomogram were illustrated in the calibrate curve, and the 45°line stood for the ideal prediction.

Patient Characteristics.
A cohort containing 258 stage I CC patients with available expression data and related clinical information in the GEO database was analyzed. The clinicopathological features of the analyzed samples are showed in Table 1. A flowchart which manifested the entire process of the study is showed in Figure 1.

Construction of Immune-Associated Risk Signature.
Univariate and LASSO Cox regression analyses were adopted to examine the association between the 2059 immune-related genes and RFS of the stage I CC patients (   Figure 3 and Figure S1 illustrate that the high immunecorrelated gene expression of HFE, OSMR, PLXNA3, PLXNC1, TNFRSF11B, and UCN had a distinctly miserable survival; nevertheless, the low immune-associated gene expression of CCL14 and MAP3K14 produced an evidently gloomy survival. In addition, 28 immune checkpoints and tumor mutation burden (TMB) from the TCGA database were also analyzed between the two risk groups ( Figure S2). The result showed that the expression levels of CD4, CXCR4, LGALS9, TNFRSF4, and TNFSF4 were elevated in the high-risk group, while the expression levels of CD48, CD83, and KLRG1 were raised in the low-risk group. Importantly, the expression of TMB was raised in the lowrisk group ( Figure S3). Otherwise, the expression of the immune genes in CC tissues was analyzed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) and immunohistochemical (IHC) assay ( Figure S4). The result exhibited that the expression levels of CCL14 were reduced in CC tissues; meanwhile, the expression levels of HFE, TNFRSF11B, OSMR, and PLXNA3 were increased in CC tissues.

3.3.
Relationship between the Immune-Associated Signature and Stage I CC Patients' RFS. The stage I CC samples were separated into the high-and low-risk groups according to the median cutoff of the risk score. Kaplan-Meier analysis was adopted to explore the difference in RFS between two groups. Survival analysis showed that the cohorts with lower risk scores had longer RFS than the high-risk cohorts, which was illustrated in the internal validation group (p = 0:012) (Figure 4(a)). A similar outcome was implied in the external validation group (p = 2e − 04) (Figure 4(c)) and the entire GEO group (p = 2e − 05) (Figure 4(e)).

Evaluation of the Predicted Capacity of the 8-Immune
Gene-Correlated Signature Based on ROC Analysis. Time-   (Figure 5(a)), and the survival status of stage I CC cases in the training and internal testing cohorts was illustrated on the dot plot ( Figure 5(b)). We found that the groups with lower risk scores had longer RFS than the high-risk groups. Heatmap distribution of the 8 immune-correlated genes clustered through the immune risk score is exhibited in Figure 5(c), which was concordant with the result in Figure 3. A similar outcome was found in the external validation group (Figure S5), which supported the result in Figure 3.

Exploration of the 8-Immune Gene Signature-Correlated
Biological Pathways. The stage I CC samples were divided into the high-and low-risk groups according to the median cutoff of the risk score. The top 20 core immune geneactivated pathways that produced a positive association with the immune risk score are illustrated in Figure 6(a) ( Table S4). A significantly positive association between the enriched pathways and immune risk score is further exhibited in Figure 6  BioMed Research International 6:94e − 08) ( Table 2). To elevate predicted robustness of the 8-immune gene-based signature for stage I CC patients' RFS across a quantitative method, we built a nomogram on the basis of the risk score, cancer status, and race ( Figure 7) to predict 1-, 3-, and 5-year stage I CC patients' RFS. The importance between the risk score and the clinic-related variables is depicted in Figure 8(a). The result exhibited that C -index (0.911, 95% CI: 0.876-0.936), AUC (0.929, 0.954, and 0.974) (Figure 8(b)), and calibration plot illustrated that the nomogram served as a perfect predictive model (Figures 8(c)-8(e)). Besides, the DCA manifested that the nomogram produced a crucial clinical application for prognosis prediction of stage I CC patients than that in the treat all or treat none cluster. Net benefit was verified for stage I CC patients' 3-year recurrent risks (Figure 8(f)), illustrating the good capacity of our tool.

Discussion
We constructed and validated an immune gene-based signature for CC using GEO and TCGA dataset. The signature consisted of 8 immune genes with prognostic ability. A    [29]. We speculated that the above 8 immune genes were also related to stage I CC.
Various studies manifested that nomograms may elevate prognostic prediction for tumors on the basis of several clinical variables via a quantitative method. For example, Chen et al. reported the transcription factor profiling to predict recurrence-free survival in breast cancer: development and validation of a nomogram to optimize clinical management [30]. Xiong et al. suggested that nomogram integrating genomics with clinicopathologic features improved prognosis prediction for colorectal cancer [31], whereas fewer studies developed a nomogram to predict the prognosis of stage I CC patients. Our nomogram was built according to clinical variables to predict the prognosis of stage I CC patients in a quantitative method; in other words, our model can predict specific survival percentages of stage I CC patients, which may elevate prognostic prediction for stage I CC patients.
Considerable researches demonstrated that the LASSO Cox regression model can be adopted to select prognostic predictors of various cancers, For example, Lan et al. identified prognostic factors and constructed a prognostic miRNA signature based on univariate Cox regression analysis and LASSO [32]. Luo et al. developed a three-miRNA signature as a novel potential prognostic hallmark in patients with clear cell renal cell carcinoma [33]. The LASSO method can minimize the log partial likelihood subject to the sum of the absolute values of the parameters which are bounded by a constant. Thus, it shrinks coefficients and generates some coefficients which are precisely zero. Consequently, it can reduce the assessment variance when providing an explicable final tool [34]. We adopted the LASSO method to select the candidate immune-associated genes significantly related to RFS of stage I CC patients for eliminating the interference of the possible multicollinearity.
Therefore, the application of LASSO analysis can elevate the prognostic prediction for RFS of stage I CC patients.
Although the 8-immune gene-based signature appears to be a potential prognostic predictor in clinical application, there are also some limitations. Firstly, the IHC results for PLXNA3 and PLXNC1 were not available. Secondly, the sample size in our external validation set was not large enough. The third limitation was that the prognostic value of the 8-immune gene-based signature was tested only by online databases, and more prospective studies should be further performed. Fourthly, we developed the nomogram according to retrospective data from the TCGA database, which may produce a hazard of selection bias.

Conclusion
We constructed an integrated 8-immune gene-based signature that was significantly related to RFS of stage I CC patients which could accurately identify patients with low prognostic risk from those with high prognostic risk. Furthermore, we evaluated the accuracy and reliability of the above signature based on Kaplan-Meier analysis and ROC curves. These results suggested that the 8-immune genebased signature could potentially serve as a prognostic tool in stage I CC. The indicators indicated that our nomogram can elevate the prognostic prediction of stage I CC patients.

Data Availability
The gene expression and clinical data of stage I cervical cancer patients in our study were obtained from TCGA (http:// tumorsurvival.org/) and GEO (https://www.ncbi.nlm.nih .gov/geo). Table S1: the detailed clinical information of the samples enrolled in the present study. Table S2: all primer sequences used in this study. Table S3: hazard ratios and 95% CIs and p values of 1811 immune genes via univariate Cox regression analysis. Table S4: the enriched pathways related to the immune risk score. Figure S1: boxplots of 8 immune gene expression values against the risk group in the TCGA dataset. Figure S2: 28 immune checkpoints from the TCGA database were explored between the two risk groups. Figure S3: tumor mutation burden (TMB) from the TCGA database was explored between the two risk groups. Figure S4: the expression of the immune genes in CC tissues was assessed by qRT-PCR and IHC analysis. Figure