Identification of Key Genes of Prognostic Value in Clear Cell Renal Cell Carcinoma Microenvironment and a Risk Score Prognostic Model

Objective We aimed at identifying the key genes of prognostic value in clear cell renal cell carcinoma (ccRCC) microenvironment and construct a risk score prognostic model. Materials and Methods Immune and stromal scores were calculated using the ESTIMATE algorithm. A total of 539 ccRCC cases were divided into high- and low-score groups. The differentially expressed genes in immune and stromal cells for the prognosis of ccRCC were screened. The relationship between survival outcome and gene expression was evaluated using univariate and multivariate Cox proportional hazard regression analyses. A risk score prognostic model was constructed based on the immune/stromal scores. Results The median survival time of the low immune score group was longer than that of the high immune score group (p = 0.044). Ten tumor microenvironment-related genes were selected by screening, and a predictive model was established, based on which patients were divided into high- and low-risk groups with markedly different overall survival (p < 0.0001). Multivariate Cox analyses showed that the risk score prognostic model was independently associated with overall survival, with a hazard ratio of 1.0437 (confidence interval: 1.0237–1.0641, p < 0.0001). Conclusions Low immune scores were associated with extended survival time compared to high immune scores. The novel risk predictive model based on tumor microenvironment-related genes may be an independent prognostic biomarker in ccRCC.


Introduction
Renal cancer, the most common lethal genitourinary cancer, accounts for up to 2-3% of all adult malignancies worldwide [1]. Clear cell renal cell carcinoma (ccRCC) is the most common pathological subtype, accounting for approximately 65-70% of renal neoplasms [2]. It is a relatively low-grade cancer, and patients are generally asymptomatic in the early stages. Symptoms are obvious when tumor volume is large and the cancer has reached the advanced stages [3]. With rapid advancements in clinical diagnosis as well as treatment strategies, the overall survival rate of ccRCC patients has increased. However, due to a delay in diagnosis, and local or distant metastasis, the overall survival of 30% of the newly diagnosed ccRCC patients, who present with metastasis at the time of diagnosis, is approximately 13 months [4,5]. Currently, prognosis is based on the pathological stage and grade of cancer in patients with renal cancers [6]. Despite significant advances in early detection, surgery, and medical treatment, the prognosis of ccRCC remains unsatisfactory. Therefore, more effective biomarkers and therapeutic targets are urgently needed.
It is widely known that tumor tissues consist of tumor cells as well as tumor-related normal epithelial, immune, stromal cells, and vascular cells [7]. In ccRCC, cells in the tumor microenvironment (TME) promote growth and metastasis of tumor cells and suppress the immune system via several elaborate mechanisms [8]. Constituents of the TME include tumor cells, immune cells, and various types of stromal cells; their interactions have a significant effect on treatment response and disease prognosis. TME also significantly influences carcinogenesis, gene expression in the cancer tissues, and clinical prognosis [7,[9][10][11]. TME cells constitute important components of the cancer tissue. TME is the cellular milieu where the tumor is located. Tumors are usually formed by aggregates of various tumor cell types, and immune and stromal cells, which constitute the two main nontumor cell types in the TME [12,13]. To quantify the cellular composition of the TME, an algorithm called ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data) was used to determine the composition of immune and stromal cells in tumor samples [7]. ESTIMATE is a tool that predicts tumor purity and the level of infiltrating immune/stromal cells in tumor tissues using gene expression data. ESTIMATE scores are significantly related to tumor purity in various cancer samples. In this algorithm, immune and stromal cell levels were chosen to predict the infiltration of nontumor cells by analyzing their specific gene expression signatures. Stromal score indicates the presence of stroma in the tumor tissue, while the immune score indicates the infiltration of immune cells in tumor tissue; the estimated score indicates tumor purity. A positive correlation was reported between immune and stromal scores, and samples with high tumor purity showed low immune and stromal scores [7]. Previous studies have used the ESTIMATE algorithm to evaluate the prognostic value of immune and stromal cells in several cancers [14][15][16]. The TME is increasingly being considered to play a vital role in tumor growth. Therefore, understanding the components of stromal and immune microenvironments may contribute to improve prognosis and customized therapies. However, whether TME contributes to ccRCC survival has not been well studied. In this study, we aimed at identifying key TME-related genes of prognostic value in ccRCC and generate a risk score prognostic model.

Materials and Methods
Level 3 raw microarray mRNA expression data pertaining to 539 ccRCC samples were downloaded from the TCGA database (https://portal.gdc.cancer.gov/repository) on April 3, 2019. The clinicopathological data of 537 tumor patients, including sex, age, clinical stage, T-stage, lymph node involvement, survival outcome, and survival (duration in days) were also downloaded and reorganized for further analysis. ESTI-MATE is known to predict the immune and stromal scores of cells by performing single sample Gene Set Enrichment Analysis (ssGSEA), which forms the basis of the ESTIMATE score [17,18]. The scores for all TCGA tumor types are available online in the MD Anderson Cancer Center website (https://bioinformatics.mdanderson.org/estimate/). Stromal and immune scores of each ccRCC sample were determined by using the ESTIMATE algorithm using the estimate package provided on R-Forge for all tumor samples [7].

Identification of Differentially Expressed Genes (DEGs).
We categorized all patients into high and low-score groups according to their median immune/stromal scores. The limma package in R software with multiple testing correc-tions based on the Benjamini and Hochberg method was employed to screen the DEGs [19]. The criteria for screening of upregulated DEGs were defined as log 2 FC > 1 and adjusted p value < 0.05, while that for downregulated DEGs were defined as log 2 FC < −1 with adjusted p value < 0.05. For comparison based on immune and stromal scores, upregulated and downregulated genes were identified for both. Lastly, the common upregulated and downregulated DEGs in the stromal and immune score groups were identified using an online Venn diagram and defined as upregulated and downregulated DEGs.

Overall Survival Curve.
Kaplan-Meier plots, tested using the log-rank method, were used to establish a potential relationship between prognostic values and gene expression levels of the identified DEGs between the high and low immune/stromal score groups in patients.

Construction of the Risk Score Prognostic Model in the
Training Cohort. The genes tested using the Kaplan-Meier plots with a p value < 0.05 were considered for further analysis. A total of 530 patients with complete clinical information and mRNA expression data were enrolled in the study. We first randomly assigned 264 patients to the training cohort, while the entire TCGA cohort was used as the validation cohort. Next, a total of 43 genes associated with overall survival were subjected to further selection using univariate Cox proportional hazard regression analysis in the training cohort. Only genes with a p value < 0.05 were regarded as possible variables and used in multivariate Cox regression analysis to build the risk predictive model.
where Expr is the expression level of gene and Coef is the regression coefficient derived from the multivariate Cox regression model. Each patient was assigned a risk score according to the above risk predictive model. Patients were then further divided into high-and low-risk groups using the median value as the cutoff point. Time-dependent receiver operating characteristic (ROC) curve analysis was performed to determine the power of the predictive model.

Enrichment Analysis of the DEGs.
To identify the potential gene ontology (GO) categories by their biological processes (BP), molecular functions (MF), cellular components (CC), and to determine the Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways associated with all the upregulated and downregulated DEGs, clusterProfiler and DOSE package in Bioconductor (https://bioconductor .org/) were used to perform GO and KEGG pathway analyses. Results of the GO and KEGG analyses were regarded as statistically significant at p < 0:05. . The 537 ccRCC patients in the study were classified into low and high immune/stromal score groups using their median immune/stromal scores as cutoff points. Kaplan-Meier survival curve of the immune scores revealed that the median survival time of the highscore group was associated with shorter survival time than that of the low-score group (p = 0:044), as evidenced by log-rank test (Figure 1(a)). Similarly, the Kaplan-Meier survival curve of stromal scores revealed that the high-score group was associated with shorter median survival time than that of the low-score group (although the difference was not statistically significant), as indicated by log-rank test (p = 0:258; Figure 1(b)). Furthermore, the average immune score of stage IV patients ranked highest among all stages, followed by stages III and II. Stage I patients received the lowest immune scores, as revealed by the Wilcoxon signed-rank test ( Figure 1(c)). Although the results were statistically significant, there was a significant overlap in the median values as well as the range of the immune scores among the four groups of tumor stages. Moreover, these differences were only marginally significant, and therefore the results must be interpreted cautiously. However, this association was not found in the stromal scores (Figure 1(d)).

Gene Expression Profiles in the Stromal and Immune
Score Groups of ccRCC Patients. A total of 512 upregulated and 147 downregulated DEGs were identified in the immune score group, while 259 upregulated and 152 downregulated DEGs were identified in the stromal score group. Furthermore, Venn diagrams revealed that 48 upregulated genes and 47 downregulated genes were common between the two high-scores groups (Figures 2(a) and 2(b)). To investigate the potential biological functions of the 95 identified DEGs, KEGG pathway enrichment and GO functional analyses were performed. The top GO terms identified included humoral immune response, cell chemotaxis, chemokine activities, cytokine-cytokine receptor interaction, and primary immunodeficiency (Figures 2(d) and 2(d)).

Identification of the Effect of the Individual DEGs on
Overall Survival. To explore the potential roles of the 95 identified DEGs in overall survival, Kaplan-Meier survival curves were used to establish the potential relationship between the prognostic roles and gene expression levels. Among the 95 DEGs, a total of 43 DEGs (Supplementary Table S1) were found to be significantly related to overall survival in the log-rank test.

Establishment of Risk Score Prognostic Model. Univariate
Cox regression analysis demonstrated that 33 DEGs were significantly associated with overall survival (p < 0:05) among the 43 DEGs. To construct a predictive model with optimal efficacy and sufficient information, the 33 candidate genes, including APCDD1L, GJB6, CASP5, SLN, HSD11B1,  PPARGC1A, ZPLD1, SLC22A12, SLC22A6, HMGCS2,  CPA4, ADGRV1, GPAT3, PAEP, MZB1, RORB, IGLL5,  OGDHL, AQP9, LDHD, FDCSP, HSD11B2, TNFSF13B,  FREM1, FCRL5, POU2AF1, MUC20, VSIG4, RAP1GAP,  MIXL1, GREM1, PAH, and SLC22A8, were fitted into a multivariable Cox proportional hazards regression analysis in the training cohort. Ten survival-related genes displayed a significant prognostic value for ccRCC. We then constructed a prognostic signature based on the expression levels of these ten genes and their coefficients derived from the multivariable Cox model. The risk score prognostic model was com- The value of their respective coefficients indicated the extent of their impact on survival prediction. MIXL1 presented the highest risk, while ADGRV1 had the most protective effect. Patients with a ten-gene signature risk score higher than the median risk score were classified as high-risk, while those with a risk score lower than the median risk score were classified as low-risk. The Kaplan-Meier overall survival curve of the high-risk group was significantly lower than that of the low-risk group (log-rank p < 0:0001; Figure 3(a)). Univariate analysis revealed that the ten-gene signature correlated significantly with poor overall survival (hazard ratio [HR]: 1.0509; 95% confidence interval [CI]: 1.0283-1.0740; p < 0:0001). Multivariate Cox analyses showed that the tengene signature remained independently associated with overall survival, with an HR of 1.0437 (CI: 1.0237-1.0641, p < 0:0001), along with age, grade, and stage (Table 2), revealing that the risk score prognostic model may be an independent predictor of overall survival. Furthermore, to estimate the prognostic risk score model for 3 and 5 years, the area under the receiver operating characteristic (ROC) curve (AUC) was calculated. As shown in Figure 3(b), the AUC of the ten-gene model for survival prediction was 0.748 at 3 years of overall survival and 0.756 at 5 years of overall survival. The  Disease Markers distribution of risk score, survival status, and ten-gene signature expression of individual patients was further analyzed. Patient subgroups with high and low-risk scores were classified using the optimal cut-off. An increased risk score was associated with higher patient death rate (Figure 3(c)).

Validation of the Risk Score Prognostic Model in TCGA
Cohort. To test the robustness of the prediction power of the ten-gene signature, we extended the testing to the TCGA validation cohort entirely. The risk score of each patient in the validation cohort was calculated based on the expression value of the ten-gene signature and was divided into the highor low-risk groups according to the cutoff point of the median risk score (0.8768) derived from the training cohort.
Using the same risk score model and cutoff point, 230 patients were classified into the low-risk group and 300 patients were classified into the high-risk group. Kaplan-Meier survival curves of the two groups based on the tengene signature were notably different in the validation cohort. Patients in the high-risk group had obviously shorter overall than those in the low-risk group (p < 0:0001, Figure 4(a)). By calculating the AUC of the ROC curve of the risk score, we could predict the 3-year and 5-year survival rates of patients with ccRCC (0.666 for 3 years and 0.658 for 5 years; Figure 4(b)). Distribution of the ten-gene signature risk scores, expression pattern of prognostic signature, and survival status are shown in Figure 4(c) and are consistent with findings in the training cohort. Combining the results of the training and total cohorts, the ten-gene signature was demonstrated to be an effective independent prognostic biomarker in patients with ccRCC.

Discussion
Increasing evidence has revealed a close interplay between the tumor and nontumor components that contribute to increased angiogenesis, invasion, progression, and metastasis in several tumors, such as pancreatic, breast, and lung cancers, and glioblastoma [20][21][22][23]. Stromal and immune microenvironments suppress the tumor and prevent the metastasis as well as permeability of several drugs into the tumor [24]. The tumor microenvironment can stimulate angiogenesis and tumor cell survival, resulting in poor prognosis [25,26]. Our study results were consistent with previous findings and revealed an association between immune and stromal microenvironments and tumor progression. These mechanisms have not been completely explored and warrant future investigations.
In this study, we aimed at determining the role of TMErelated genes in the overall survival in ccRCC based on the TCGA database. We classified all patients with ccRCC into low and high immune score groups based on their median immune scores. The Kaplan-Meier survival analysis revealed that the median survival time of the high immune score group was shorter than that of the low-score group, indicating that TME-related immune cells can be used to categorized patients into high and low-score groups with notably different overall survival. This finding was inconsistent with the previous view that tumor immunity suppresses tumor cells. The TME is a mixture of fluids, stromal cells, immune cells, extracellular matrix molecules, and numerous cytokines and chemokines, coupled with their significant interactions. However, cells and molecules in this environment are in a dynamic state and contribute to tumor immune evasion, tumor growth, and metastasis, thereby showing the evolutionary nature of tumors [8]. TME-induced metabolic stress on infiltrating immune cells can result in local immunosuppression and limited reinvigoration of antitumor immunity and lead to impaired antitumor immune responses [27]. TME is a factor that drives the carcinogenesis of various cancers. In pancreatic adenocarcinoma (PAAD), high-and low-immune score groups were identified using the ESTI-MATE score. Kaplan-Meier curves revealed significantly worse survival of patients with high-risk scores in both the training and validation groups [15]. Higher scores were associated with worse survival outcomes for immune scores (p = 0:0167), stromal scores (p = 0:0035), and ESTIMATE scores (p = 0:0190) in lower-grade glioma [14]. The potential prognostic value of immune and stromal scores in stomach adenocarcinoma has been confirmed. High stromal and immune scores were reported to be associated with poor overall survival (p = 0:0032 and p = 0:05, respectively). The estimated score was also related to overall survival (p = 0:0359) [16]. In ccRCC, a higher immune score was associated with shorter overall survival (p = 0:04), and a close association was reported between immune scores, clinical  characteristics, as well as prognosis in ccRCC [28]. These data indicate that TME plays an important role in tumorigenesis and prognosis. KEGG pathway enrichment and GO functional analyses of these genes revealed that they were mainly involved in humoral immune response, cell chemotaxis, chemokine activities, cytokine-cytokine receptor interaction, and primary immunodeficiency. We also performed a Kaplan-Meier analysis of these genes and identified 33 immunerelated genes that were associated with different outcomes in patients with ccRCC. Of these, several genes including CASP5, RAP1GAP, and GREM1 have been reported to be involved in the pathogenesis of renal cancer or significant in predicting overall survival, suggesting that the present analysis using the TCGA database has potential prognostic value [29][30][31][32]. Further, using univariate and multivariate Cox proportional hazard regression analysis, a ten immune-related gene risk score prognostic model was established. The Kaplan-Meier overall survival of the risk score prognostic model in the high-risk group was significantly shorter than that in the low-risk group, revealing that the risk score prognostic model based on immune-related genes may be an independent predictor of overall survival. The prognostic power of the risk score model was evaluated by calculating the AUC of the ROC curve. Higher AUC demonstrates good model performance. The AUC for the ten-gene model for survival prediction was 0.703 at 3 years of overall survival and 0.715 at 5 years of overall survival. The performance of the prognostic model was further confirmed in the entire cohort, revealing its superior performance in predicting ccRCC patient survival.
Several limitations of the study need to be noted. First, the TME signature was analyzed and validated only in the TCGA data set, and no other ccRCC-related expression profiles including prognostic clinical information were available for further validation. Second, no experimental data were obtained. Further experimental studies are needed to improve our understanding of the functional role of immune-related genes in ccRCC.

Conclusion
Based on a comprehensive analysis of TME-related genes, our results indicate that the assessment of the immune and stromal status using the TME signature is an optimal predictor of survival in ccRCC patients. The novel risk predictive model proposed in this study constitutes an effective independent prognostic biomarker in patients with ccRCC. These findings have promising clinical implications for improving the outcome prediction for ccRCC patients contingent upon future validation.

Data Availability
The raw data of this study are derived from the TCGA data portal (http://tcga.cancer.gov), which is a publicly available database.

Conflicts of Interest
The authors declare that they have no conflicts of interest.