Identification of Tumor Microenvironment Scoring Scheme Based on Bioinformatics Analysis of Immune Cell Infiltration Pattern of Ovarian Cancer

Background Tumor microenvironment (TME) is the crucial mediator of tumor progression, and the TME model based on immune cell infiltration to characterize ovarian cancer is considered to be a promising strategy. Methods Sample data of three ovarian cancer cohorts were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. The scores of 22 kinds of immune cells were calculated based on CIBERSORT, and the TME clusters (TMECs) of ovarian cancer was determined by ConsensusClusterPlus. Genomic subtype was identified by non-negative matrix factorization (NMF). A TME scoring scheme was constructed using k-means algorithm and principal component analysis (PCA) to quantify the TME infiltration pattern of individuals. Results Four TME subtypes of ovarian cancer samples were defined: TMEC1, TMEC2, TMEC3, and TMEC4. There were also significant differences in overall survival (OS) among the four TMEC, and the OS of TMEC3 was the longest. The difference analysis of TMEC3 and the other three TMECs respectively identified the DEGs and took the intersection, and 585 DEGs were obtained. Two genomic subtypes were identified by NMF based on the expression of 585 genes, which were called GeneC1 and GeneC2. The TME scoring scheme constructed by k-means and PCA algorithm was used to calculate the TME score of ovarian cancer in TCGA. High-TME score was significantly correlated with shorter survival time, older age, lower immunoactivated molecules, and immune checkpoint gene expression. Conclusions This study highlighted the complexity and diversity of TME infiltration patterns in ovarian cancer and constructed a set of TME scoring scheme to reveal TME infiltration patterns and provided new insights into the landscape of TME.


Introduction
Ovarian cancer is the fifth deadliest malignant tumor among women, with more than 310,000 newly diagnosed cases and more than 200,000 deaths each year [1]. Most ovarian tumors originate from the epithelial surface of the ovary, others come from germ cells or stromal cells. e main subtypes of epithelial carcinoma are serous carcinoma, endometrioid carcinoma, mucinous carcinoma, clear cell carcinoma, and undifferentiated carcinoma [2]. e tumor heterogeneity among these subtypes is very high, and the etiology, molecular biology, treatment response, and many other characteristics are different [3]. For the past few years, tremendous efforts have been made to characterize these subtypes and identify tumoral pathways and potential biomarkers for therapeutic strategies [4]. Given that ovarian cancer is among the first cancers with an established association of immune cell infiltration [5]. It is considered promising to classify ovarian cancer according to the composition of infiltrating immune cells in tumor microenvironment (TME). However, whether particular immune cell types are associated with a greater or lesser risk of disease progression or death and how these effects differ by ovarian cancer subtype remain unclear [6]. To this end, several available software tools have been developed. ese tools characterize the composition of immune and nonimmune stromal cells in large tumors from their entire transcriptome. For example, the microenvironment cell population (MCP)counter tool provides a quantitative method for calculating the abundance of eight immune cell groups and two stromal cell groups to help depict the global image of tumor immune infiltration [8]. Estimating the proportion of immune and cancer cells (EPIC) estimates the ratio of immune cells to cancer cells by combining the RNA-Seq spectra of all major immune and other nonmalignant cell types established from circulating cells and tumor infiltrating cells, as well as information on cell morphology and algorithm development [9]. xCell is a method that integrates the advantages of gene set enrichment with deconvolution approaches, which can calculate 64 immune and stromal cell types [10]. CIBERSORT is a deconvolution support vector regression algorithm that calculates the proportion of 22 types of cells in TME, including seven T-cell types, immature and memory B cells, plasma cells, NK cells, and bone marrow subsets [11]. ese tools facilitate the study of immune cell infiltration in TME.
In this study, we classified ovarian cancer based on the scores of 22 immune cells estimated by CIBERSORT, and constructed a set of TME scoring scheme to define different types of TME infiltration patterns. By comparing the differences in prognosis and clinical and molecular characteristics among different TME invasion patterns, we expect to find some characteristics related to TME phenotype.

Collection and Preprocessing of RNA-Seq and
Clinicopathological Data of Ovarian Cancer. TCGA was visited to download RNA-seq data and clinical features of 379 ovarian cancer samples. e cancer tissue samples with complete clinical annotations were retained and readcount was converted into TPM through the annotated information of gencode v22. Two ovarian cancer transcriptome datasets GSE14764 (n � 79) and GSE26712 (n � 184) in MINiML format were downloaded from gene expression omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). For the chip probe, the bioconductor package [12] was used to map the human gene SYMBOL. e research shows the logic route of the total research approach utilizing the form of the flow chart ( Figure S1).

Evaluation of Immune Cell Infiltration in Ovarian Cancer.
e leukocyte gene signature matrix (LM22) data of TCGA and GEO were combined and the RNA-seq data of TPM format were uploaded to CIBERSORT (https://cibersort. stanford.edu/) portal network. With the expression values of 547 genes representing different immune cell types as a reference, the scores of 22 immune cells were analyzed and the immune cell infiltration matrix was drawn.

Consensus Clustering Infiltrating Immune Cells.
ConsensusClusterPlus toolkit was used according to the unsupervised cluster of the three combined ovarian cancer cohorts [13]. e toolkit used "Pam" method with the Euclidean and Ward linkage methods to identify stable subtypes 1000 times, and the resulting subtype was defined as TMEC. e final number of clusters was determined by the area under the cumulative distribution function (CDF) curve.

Non-negative Matrix Factorization Algorithm Based on
Differentially Expressed Genes among TME Subtypes. Among the obtained TME subtypes, the differentially expressed genes (DEGs) among TME clusters in TCGA were analyzed by DESeq2 and used in unsupervised clustering of non-negative matrix factorization (NMF) to identify new subtypes, which was defined as GeneC.
is process was performed in the R package "NMF" [14], the standard "brunet" was selected, and 50 iterations were performed.

Development of TME Scoring Scheme.
e genes related to the prognosis of ovarian cancer were screened from DEGs among TME subtypes by univariate Cox regression analysis, and their importance was evaluated by inputting them into R package "randomForest". e mtry parameters (random variables) of each segmentation were set to 1-101 and ntree � 500, and the mtry value with the lowest error rate was selected as the optimal mtry value of the random forest algorithm. Furthermore, the above prognosis-related genes were also clustered by k-means algorithm, and the obtained category (Signature G) performed principal component analysis (PCA) using R psych package for 100 iterations. e best principal component number (PCs) was obtained and the score of each PC was calculated, and the PC1 score was selected as the signature score. Each signature G was given a regression coefficient β by multivariate Cox regression analysis, and the established TME scoring scheme was as follows: TME score � PC1 * βi.

Analysis of Clinical Results of Ovarian Cancer According to TME Scoring Scheme.
e TME score of each sample was computed according to TME scoring scheme, and the median value was used as the boundary between the high-TME score group and the low-TME score group. e Kaplan-Meier curve was generated by R package "survival," and the survival differences between groups were compared by log-rank test. TME score was used as an indicator to assess differences in different clinical characteristics (age, stage, and grade) within groups.

Statistical
Analysis. All the statistical analyses were completed in R. Two groups of continuous variables were compared by Wilcoxon test, and three groups and more than three groups of continuous variables were compared by Kruskal-Wallis test.
e survival curve was drawn by Kaplan-Meier method and the survival differences were compared by log-rank test. P < 0.05 was defined as statistically significant.

e Immune Cell Infiltration Pattern in Ovarian Cancer TME.
e LM22 data of TCGA and GEO were combined, and by calculating the correlation of 22 immune cells in the sample, we found the clustering of three kinds of positive correlation (Figure 1 Table 1). e results showed that activated mast cells, resting CD4 memory T cells, and neutrophils were significantly correlated with poor prognosis of ovarian cancer. M0 macrophages, gamma delta T cells, and M1 macrophages were significantly correlated with good prognosis of ovarian cancer (Figure 1(b)). To eliminate the heterogeneity between datasets, we analyzed the relationship between 22 immune cells and prognosis of ovarian cancer in TCGA and GEO datasets, respectively. e results showed that there were similarities in the relationship between several immune cells and prognosis in the two datasets, but there were also some differences, for example, the HR of monocytes, eosinophils, and activated NK cells was >1 in the TCGA dataset and <1 in the GEO dataset. e HR of resting cells, CD8 T cells, and regulated T cells in TCGA dataset was <1 in TCGA dataset and >1 in GEO dataset. One of the reasons for this may be that GEO dataset is mainly composed of advanced and highlevel samples (Figures S2(a) and S2(b)).
To reveal the different patterns of immune cell infiltration, we clustered the combined ovarian cancer samples based on LM22 scores and selected the optimal number of clusters from k � 2-10. e consensus matrix of k � 2-5 was generated. e area under the CDF curve did not increase significantly with the increased k. e best choice was to divide the merged ovarian cancer samples into four clusters ( Figures S3(a)-S3(f )). erefore, four subtypes of ovarian cancer samples were defined: TMEC1 (n � 134), TMEC2 (n � 202), TMEC3 (n � 156), and TMEC4 (n � 118). ere was a high content of M0 macrophages in TMEC1, and more activated dendritic cells and activated mast cells were enriched in TMEC2. M1 macrophages and gamma delta T cells, and activated memory CD4 T cells in TMEC3 had higher scores, and the immune cell with the highest score in TMEC4 was resting CD4 memory T cells (Figure 2(a)). e LM22 score difference analysis of four TMEC showed that there were significant differences in the LM22 signature score of 17 immune cells among the four TMEC ( Figure 2(b)). And there were significant differences in overall survival (OS) among the four TMEC, and the OS of TMEC3 was the longest (Figure 2(c)).

Two Genomic Subtypes
Were Identified Based on the DEGs among TMECs. To investigate the differences in gene expression patterns among different TMEC, we used DESeq2 to analyze the differential expression between the TMEC3 with the best survival results and the other three kinds of TMECs. DEGs (585) were shared among the DEGs of MEC3/TMEC1, TMEC3/TMEC2, and TMEC3/TMEC4 (Figure 3(a), Supplementary Table 2). NMF clustered the ovarian cancer samples in TCGA according to the expression of 585 DEGs, and the minimum number of members of each subclass was set to 10. According to the indicators such as cophenetic, dispersion, and silhouette, the optimal number of clusters was determined to be 2, which was called GeneC1 and GeneC2, respectively ( Figure 3(b), Figures S4  and S5). Survival analysis showed that GeneC1 was significantly related to longer survival time, and GeneC2 was significantly related to poor prognosis (Figure 3(c)). By comparing the scores of two kinds of GeneC on 22 kinds of immune cells, we found that there was a complex relationship between the GeneC with different prognosis and their corresponding TME scores. For example, the GeneC1 with the best prognosis had significantly higher scores on CD8 T cell, activated NK cells, and M1 macrophages, and significantly lower scores on M0 macrophages, M2 macrophages, and resting CD4 memory T cells relative to the GeneC2 with poor prognosis (Figure 3(d)).
In addition, GO and KEGG enrichment analyses of Signature G1 and Signature G4 indicated that Signature G1 was mainly involved in lipid metabolism regulation, including regulation of phosphatidylinositol 3 kinase activity, regulation of phospholipid metabolic process, regulation of lipid kinase activity, regulation of lipid metabolic process, phospholipid metabolic process, and cellular lipid catabolic process. And Signature G1 was also linked to immune pathways for cancer, such as 17 cell differentiation, 1 and 2 cell differentiation, and T-cell receptor signaling pathway ( Figure S6(a)). While Signature G4 mainly mediates calcium (Ca 2+ ) signal transduction, the related pathways of enrichment were positive regulation of cAMP-mediated signaling, positive regulation of release of sequestered calcium ion into cytosol, positive regulation of calcium ion transport into cytosol, positive regulation of calcium ion transmembrane transport, etc. Signature G had also been significantly associated with multiple diseases and viral infections, such as Alzheimer's disease, thyroid cancer, Epstein−Barr virus infection, human immunodeficiency virus 1 infection, and human cytomegalovirus infection ( Figure S6(b)).   4 *** *** *** *** *** *** *** *** *** *** *** ** ** ** ** ** ** ***p<1e−5 **p<0.01 *p<0.05 TMECluster  TMEC1  TMEC2  TMEC3  obtained. e importance of 102 DEGs was assessed using random forest. According to our set parameters and the plot of the random forest, the ntree � 100 was selected ( Figure S7(a)). And each DEG was ranked according to its importance ( Figures S7(b) and S7(c)). en, according to the TPM expression levels of 102 genes, k-means algorithm was used to classify these genes into four categories: Signature G1-Signature G4 ( Figure S8(a)), including 41, 36, 15, and 10 genes, respectively ( Figure S8(b)). rough PCA analysis and multivariate Cox regression analysis, we got a TME scoring scheme. Heatmap gave the expression patterns of 102 genes in different TME score, TMEC, and GeneC (Figure 4(a)). e TME score of the two GeneC samples was calculated and sorted by TME scoring scheme. e TME score of GeneC2 with poor prognosis was significantly higher than that of GeneC1 with good prognosis by analyzing the relationship between TME score and GeneC (Figures 4(b) and 4(c)). e prognostic performance of the two TME score groups was significantly different according to median of TME score, the survival advantage of the high-TME score group was much worse than that of the low-TME score group (Figure 4(d)).

Clinical and Mutation Characteristics of TME Score.
TCGA dataset provides mutation data and clinical information of ovarian cancer samples, including age, grade, and stage. We analyzed the relationship between TME score and these mutation and clinical features.
e box plots showed that there was no significant correlation between TME score and grade and stage (Figures 5(a) and 5(b)). TME score was correlated with the age of ovarian cancer samples, and TME score was significantly upregulated in patients aged 50-60 years compared with patients under 50 years old   Journal of Oncology 5 ( Figure 5(c)). We screened a group of important genes related to TME score. Fisher's test was used to obtain 15 genes with significant differences in mutation frequency (excluding intron and silent mutations) between high-TME score and low-TME score. e waterfall plot showed that the mutation frequency of CACNA1D and BIRC6 in high-TME score group was significantly higher than that in low-TME score group, while BRCA1 gene showed the opposite trend ( Figure 5(d)).

Immune Gene Expression of TME Score.
To study the immune status of different TME score, we analyzed the expression of immune-related genes, including immune activation genes, immune checkpoint genes, and TGF/EMT pathway genes under different TME score, different TMEC, and different GeneC. All the immunoactivated genes detected were differentially expressed in high-TME score group and low-TME score group. e expression of CXCL10, CXCL9, GZMA, GZMB, PRF1, CD8A, and TNF in the low-TME score group was significantly higher than that in the high-TME score group with poor prognosis ( Figure S9). Although we only observed significant changes in CXCL10 and CXCL9 in the heatmap (Figure 6(a)). According to the generated heatmap, the expression of immune checkpoint gene IDO1 decreased with the increase of TME score (Figure 6(b)). Gene differential expression analysis of immune checkpoints in high-TME score and low-TME score groups showed that the expression levels of all seven immune checkpoints in low-TME score group were significantly higher than those in high-TME score group ( Figure S10). In addition, the genes VIM, ACTA2, COL4A1, TME score Type TME score Type TME score-H TME score-L TME score Type TME Score -H TME Score -L (d) Figure 4: Construction of TME scoring scheme. (a) e expression heatmap of 102 genes screened by random forest. (b) TME score of two GeneCs (two-tailed t-test). (c) e TME score distribution of the samples in two GeneCs (two-tailed t-test). (d) Prognostic outcomes in two TME score groups (log-rank test).     Figure 6: Immune gene expression of TME score. (a) Heatmap expression of immunoactivated genes under different TME score, different TMEC, and different GeneC. (b) e correlation between immune checkpoints expression and TME score, TMEC types, and GeneC clusters. (c) Expression trends of genes in TGF/EMT pathway in TME score group, TMEC group, and GeneC group.

Event
TGFBR2, ZEB1, and SMAD9 and TWIST1 in TGF/EMT pathway also showed significant differences in expression between the two TME score groups ( Figure S11). is trend was not evident in heatmap (Figure 6(c)).

Discussion
Ovarian cancer is a gynecological malignant tumor with high mortality. One of the main characteristics that distinguishes this cancer from other solid tumors is the specific TME in the ovary [15]. In the pathological process of ovarian cancer, various types of immune cells infiltrate into TME, which is related to the clinical outcome of ovarian cancer [16]. Previous studies have shown that tumor immune infiltration in ovarian cancer is cohort and subtype dependent, and activated CD4+T and CD8+T tumor infiltrating lymphocytes are associated with good OS of ovarian cancer [5]. A high number of M0 and M1 macrophages were strongly associated with a better prognosis, and M2 macrophages led to worse OS [17]. Treg cells inhibit tumor-specific T-cell immunity and contribute to the growth of ovarian cancer in vivo, which is associated with a higher risk of death and reduced survival [18]. Activated mast cells and neutrophils were negatively correlated with the prognosis of ovarian cancer [19]. In fact, in recent years, several studies have classified multiple subtypes of ovarian cancer based on tumor immune cell infiltration [6,[20][21][22]. e different results obtained by these studies also remind us of the heterogeneity and great variation of tumor immune infiltration patterns. In this study, we selected a total of three ovarian cancer cohorts from TCGA and GEO, and combined the data of all cohorts to identify several immune cells related to the prognosis of ovarian cancer from 22 immune cells in TME. e score of immune cell infiltration in ovarian cancer TME was evaluated by CIBERSORT tool. Four TME subtypes of ovarian cancer samples were defined according to the unsupervised clustering of immune cell infiltration scores, and the main immune cells in each TME were detected.
ere was a high content of M0 macrophages in TMEC1, and more activated dendritic cells and activated mast cells were enriched in TMEC2. M1 macrophages, gamma delta T cells, and activated memory CD4 T cells in TMEC3 had higher scores, and resting CD4 memory T cells were the highest immune cells in TMEC4. M1 macrophages have always been regarded as antitumor cells [23]. In many studies, tumor infiltrating gamma delta T cells exhibited immunosuppressive activity [24]. High expression of activated memory CD4 T cells was associated with better clinical prognosis in patients with bladder cancer [25]. erefore, it is not difficult to detect that TMEC3 has a good prognosis.
Furthermore, in this study, the difference analysis of TMEC3 and the other three TMECs, respectively, identified the DEGs and took the intersection, and 585 DEGs were obtained. NMF based on 585 genes defined two genomic subtypes with different OS and immune cell infiltration in ovarian cancer samples. erefore, distinct genomic subtypes should be managed differently. Considering the heterogeneity of TME, we constructed a TME scoring scheme to quantify the TME infiltration pattern of individual ovarian cancer patients. TME score was higher in GeneC2 with poor prognosis than in GeneC1 with good prognosis. e TME score assessed by TME scoring scheme was related to the age of individuals with ovarian cancer. Higher TME score was also associated with higher frequency of CACNA1D and BIRC6 mutation and lower frequency of BRCA1 mutation. Mutations of CACNA1D and BIRC6 in ovarian cancer were first reported in our study. e mutation of BRCA1 in ovarian cancer has been widely reported, and it is considered to be an important marker of hereditary ovarian cancer [26,27].
In addition to cellular components, TME also includes extracellular matrix and a variety of signal molecules, such as chemokines and cytokines [28]. In our work, we evaluated whether TME score can characterize TME status by calculating the expression of several types of immune genes under two TME phenotypes. As we speculated, the expression of cytotoxic genes and immune checkpoint genes were significantly upregulated in low-TME score. In addition, most of the genes in TGF/EMT pathway also showed significant differences in expression between the two TME phenotypes.
In summary, based on the infiltration scores of 22 immune cells, we evaluated the TME infiltration status of ovarian cancer samples, and analyzed the complexity and heterogeneity of individual ovarian cancer immune microenvironment with different TME infiltration states. e TME scoring scheme was constructed to quantify the TME infiltration pattern of individual ovarian cancer, and the complex correlation and synergistic effect between TME score and prognosis, gene mutation, and immune gene expression were clarified.

Data Availability
e dataset analyzed in this study could be found in GSE14764 at https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc�GSE14764 and in GSE26712 at https://www.ncbi. nlm.nih.gov/geo/query/acc.cgi?acc� GSE26712.  Figure S4: e optimal number of clusters was determined according to cophenetic, dispersion, evar, residuals, rss, silhouette and sparseness. Figure S5: Consensus matrix heatmap with clustering number 2-10 respectively. Figure S6: GO and KEGG enrichment analysis for (A) Signature C1 and (B) Signature C4. Figure S7: Importance evaluation of 102 DEGs A: Random forest plot of ntree � 100. B: Distribution of 102 DEGs in GeneC. C: Order of importance of 102 DEGs. Figure S8: K-means classification based on 102 genes. A: 102 DEGs were divided into 4 categories according to the TPM expression level of 102 genes by k-means algorithm. B: e number of genes contained in each signature G1. Figure S9: e expression levels of immunoactivated genes in TMEC group, GeneC group and TMEscore group, respectively. Figure S10: e expression levels of immune checkpoint genes in TMEC group, GeneC group and TMEscore group, respectively. Figure S11: e expression levels difference of genes in TGF/EMT pathway in TMEC group, GeneC group and TME score group, respectively. (Supplementary Materials)