Integrated Analysis Identifies an Immune-Based Prognostic Signature for the Mesenchymal Identity in Gastric Cancer.

Background Gastric cancer (GC) has been divided into four molecular subtypes, of which the mesenchymal subtype has the poorest survival. Our goal is to develop a prognostic signature by integrating the immune system and molecular modalities involved in the mesenchymal subtype. Methods The gene expression profiles collected from 6 public datasets were applied to this study, including 1,221 samples totally. Network analysis was applied to integrate the mesenchymal modalities and immune signature to establish an immune-based prognostic signature for GC (IPSGC). Results We identified six immune genes as key factors of the mesenchymal subtype and established the IPSGC. The IPSGC can significantly divide patients into high- and low-risk groups in terms of overall survival (OS) and relapse-free survival (RFS) in discovery (OS: P < 0.001) and 5 independent validation sets (OS range: P = 0.05 to P < 0.001; RFS range: P = 0.03 to P < 0.001). Further, in multivariate analysis, the IPSGC remained an independent predictor of prognosis and performed better efficiency compared to clinical characteristics. Moreover, macrophage M2 was significantly enriched in the high-risk group, while plasma cells were enriched in the low-risk group. Conclusions We propose an immune-based signature identified by network analysis, which is a promising prognostic biomarker and help for the selection of GC patients who might benefit from more rigorous therapies. Further prospective studies are warranted to test and validate its efficiency for clinical application.


Introduction
Gastric cancer (GC) is ranked as the third cause of cancerrelated death; each year, there is about one million newly diagnosed GC [1,2]. In the early stage of GC, surgery can prolong the survival of patients [3]. However, more than half of the patients with advanced-stage GC have local recurrence or distant metastasis which eventually leads to poor prognosis (5-year survival rate is about 5-10%) [3,4]. Therefore, researchers and clinicians need to focus on targeted prognostic and treatment strategies and accurately identify and personalize treatments to extend GC patient survival.
Gene expression-based biomarkers in tumor tissue are reliably associated with cancer prognosis [5,6]. Large-scale public cohorts with tumor gene expression data provide a broader opportunity to search for reliable prognostic markers for gastric cancer. Several studies have developed markers based on gene expression for GC prognosis prediction [7][8][9][10]. However, due to the heterogeneity of GC, most of the markers have low prognostic efficacy and cannot be directly used in clinical practice. Recently, four gastric cancer subtypes with different molecular and clinical characteristics were found [11], among which the mesenchymal subtype had the poorest prognosis. Thus, the intrinsic modalities of the more malignant mesenchymal subtype could potentially be used for risk assessment in GC patients and for development of more precise and targeted treatment strategies.
There is growing evidence that the immune system plays an important role in the development and progression of cancer [12,13]. Many previous studies have shown that immunotherapy targeting immune checkpoint is strongly pursued [14,15]. In addition, previous studies have tentatively shown that the immune system has a prognostic value in gastric cancer [16,17]. Therefore, it is possible to develop prognostic markers based on immune genes for clinical application in gastric cancer.
In this study, we applied network analysis to integrate mesenchymal modalities and immune signature genes from the ImmPort database underlying the mesenchymal subtype. The master regulator analysis showed that six immune genes were the key factors of the mesenchymal subtype. We pooled and analyzed six public cohorts containing 1,221 GC patients to develop and validate an immune gene-based prognostic signature for GC (IPSGC) using these six immune genes. Although immune prognostic markers for gastric cancer have been reported [18], no research has been done for risk stratification by integrating the characteristics of the mesenchymal subtype. The robustness and reliability of our model were proved by sufficient verification in 5 independent datasets. In addition, we conducted a comprehensive analysis to investigate the intrinsic biological and clinical relevance of IPSGC. Our signature combines the molecular modalities involved in the mesenchymal subtype and would be used to screen GC patients who may benefit from more rigorous treatment.

Materials and Methods
2.1. Ethical Approval. The researchers were authorized to conduct the study by the Ethics Committee of the Beilun People's Hospital, Ningbo, China. All procedures were implemented in accordance with the Declaration of Helsinki and relevant policies in China.

Patient Series.
We retrospectively collected and comprehensively analyzed the gene expression profiles (GEPs) from 6 independent datasets, containing 1,221 cases. The complete lists of all GEPs are shown in Table S1. These datasets involved patients from GSE15459 (n = 192), GSE13861 (n = 65), GSE84437 (n = 433), GSE62254 (n = 300), GSE26901 (n = 97), and GSE29272 (n = 134). The expression data of all cohorts together with the corresponding clinical parameters were downloaded from Gene Expression Omnibus (GEO). The molecular subtyping information for GSE15459 and GSE62254 was retrieved from Cristescu et al.'s study [11]. The detailed clinical characteristics of the 6 datasets were described in Table 1. The design and workflow of this study are illustrated in Figure 1(a).

Expression Data
Preprocessing. GEPs were downloaded from GEO by "GEOquery" (R package, version 1.0.7) [19] and normalized with the robust multiarray analysis (RMA). For each cohort, the GEPs were collapsed from probe IDs to gene symbols; if multiple probe IDs correspond to the same gene symbol, the one with the highest mean value was kept as the representative of the corresponding gene [20].
2.4. Integrated Network Analysis. Immune genes (IRGs) were downloaded from the ImmPort database [21]. IRGs measured by all cohorts were kept. Network analysis was applied to integrate mesenchymal modalities and immune genes underlying the mesenchymal subtype. Together, we used the GSE15459 dataset as the training cohort. 46 immune genes (|log 2FC| > 1:5, FDR < 0:05) and 1,865 target genes (log 2FC > 0:5, FDR < 0:05) were differentially expressed by comparing the mesenchymal subtype with the other three subtypes (MSI, TP53-, and TP53+). Integrated network analysis was performed by "RTN" (R package, version 2.10.0) [22]. Master regulator analysis (MRA) was done to examine significantly overrepresented epithelial-mesenchymal transition (EMT) genes [23] within each immune gene's regulon. Six immune genes of top significance were kept as the key factors of the mesenchymal subtype.
2.6. Validation of the IPSGC. The IPSGC score was further evaluated in the 5 independent validation cohorts in terms of OS and RFS by the log-rank test, respectively. The IPSGC then was evaluated with other clinical parameters in the uniand multivariate Cox analyses. In the multivariable Cox regression, tumor stages, histology, and gender were included as covariates.

Profiling of Immune Cell Infiltration.
To analyze the immunobiological characteristics of the high-and low-risk groups, we used CIBERSORT [24], to characterize immune cells' abundance of tumor tissue GEPs. Based on a set of reference immune cell GEPs, CIBERSORT used support vector regression [24] to deconvolute tumor tissue gene expression profile with each type of immune cell enrichment. More specifically, standardized gene expression profiles were submitted to the CIBERSORT Web portal (http://cibersort. stanford.edu/) with 1,000 permutations. For each sample, CIBERSORT quantified the relative proportions of 22 infiltrated immune cell types.

Gene Ontology (GO) Analysis and Gene Set Enrichment
Analysis (GSEA). GO analysis was conducted for the significantly upregulated genes in the high-risk group using g:Profiler [25]. GSEA [26] was conducted using "fgsea" (Bioconductor package, version 1.12.0) with 1,000 permutations. Gene sets were retrieved from the Molecular Signature Database (MSigDB hallmark and KEGG, version 7) [26]. A P value below 0.05 was used to choose significant gene sets.
2.9. Statistical Analysis. Continuous variables were compared using the Wilcoxon signed-rank test or Kruskal-Wallis ranksum test. Kaplan-Meier analysis was performed using the log-rank test using "survival" (R package, version 2.41.3). Uni-and multivariable analyses were conducted by the Cox proportional hazards model. For all tests, a P value below 0.05 was used to choose significant gene sets. Statistical significance is presented as follows: * P < 0:05, * * P < 0:01, and 2 BioMed Research International

Integrative Analysis Reveals Seven Immune Genes as Master Regulators for the Mesenchymal Subtype of GC.
GC is a molecularly heterogeneous disease. In recent studies [11], four molecular subtypes have been identified, among which the mesenchymal subtype has the worst prognosis ( Figure S1(A, B)). To investigate the immune system role underlying the mesenchymal subtype, a total of 1,221 patients with GC from 6 independent public cohorts were included (Table 1). We applied network analysis to integrate mesenchymal modalities and immune genes in the GSE15459 cohort (Figure 1(a)). We investigated the differences between the mesenchymal subtype and the other three subtypes, as the mesenchymal subtype has shown the worst prognostic outcome. The networks consist of 46 immune genes (| log 2FC| > 1:5, FDR < 0:05) and 1,865 target genes (log 2FC > 0:5, FDR < 0:05) by comparing the mesenchymal subtype with the other three subtypes (Figure 1(b)). Master regulator analysis (MRA) identified six immune genes (CLEC11A, NRP2, TPM2, ANGPTL2, FGF7, and FABP4) as the key factors of the mesenchymal subtype (Table S2). These six immune genes are significantly upexpressed in the mesenchymal subtype in the training cohort and one validation dataset containing molecular subtyping information ( Figure S2). FN1 [27], SNAI1 [28], TGFB1 [29], and CDH1 [30], which are epithelial-mesenchymal transition (EMT) signature [23] genes, are significantly correlated with the six immune genes, showing that the mesenchymal property is indeed governed by these six immune genes ( Figure S3). Results from the univariable Cox proportional analysis demonstrated strong prognostic values of the six immune genes for OS ( Figure S4). Therefore, these six immune genes are key factors of the mesenchymal modalities and can be potentially applied for risk assessment of GC patients.  (Table S3). The risk score plots clearly demonstrate the difference between alive and dead patients ( Figure S5(A, B)). The median risk value was set as the cutoff to separate patients into high-and low-risk groups    (Figure 2(a) and Table S3).

In Silico Functional Assessment of the IPSGC.
To gain insight into the biological differences between risk groups, we performed immune cell infiltration, GO, and GSEA anal-yses. Immune cell types, such as macrophages M2, T cell CD4 memory resting, and T cell CD8, were enriched in training and validation cohorts ( Figure S6). We observed a significantly higher proportion of macrophage M2 in the high-risk group and a significantly higher enrichment of plasma cells in the low-risk group (Figures 4(a) and 4(b)). Furthermore, these two risk groups' specific immune cell infiltration was also validated in validation cohorts ( Figure S7). GO analysis showed that the differentially expressed genes between risk groups were mostly involved in immune responses and tumor metastasis ( Figure 5(a)). Enrichment analysis between high-and low-risk groups identified that many mesenchymal phenotype-related pathways, including the TGF-beta signaling, epithelialmesenchymal transition, angiogenesis, and focal adhesion, were positively enriched in the high-risk group (P < 0:01) ( Figure 5(b) and Table S4). We observed that the risk score levels were significantly increased with tumor stages,  BioMed Research International of which stage IV patients have the highest risk score ( Figure S8(A)). Moreover, the risk scores were significantly higher in diffuse GC than in other GC histologies ( Figure S8(B)).

Discussion
Gastric cancer (GC) is the third leading cause of cancer death with pathological and molecular heterogeneity characteristics [1,2,11]. In the past, clinicopathological indicators have been used for risk stratification of GC. However, some patients with advanced GC remained stable for several years after surgery, while some patients with early GC progressed rapidly [31]. At present, various multigene prognostic markers have been developed [7][8][9][10], but their prediction efficiencies were still uncertain. Therefore, a new signature that can accurately recognize patients with poor GC prognosis is urgently needed to give more rigorous treatments. The GC transcriptome was unsupervised classified into four molecular subtypes with different molecular and clinical characteristics [11]. Prognostic signature screened based on molecular portraits specific to the worst prognosis subtype may be used for risk stratification of GC patients [32,33]. Recent studies have shown that the tumor microenvironment plays an important role in the occurrence and develop-ment of tumors [34]. The occasion, growth, and forecast of GC are closely related to the crosstalk between different immune cells and GC cells [35]. In this study, we established an immune gene-based prognostic signature for GC (IPSGC) by integrating mesenchymal modalities and the immune system underlying the mesenchymal subtype and validated it in 5 independent validation cohorts which covered most common microarray platforms. The large sample size provided sufficient validation for the IPSGC and makes it more robust. To our knowledge, no research has been done for risk stratification by integrating the immune system and the characteristics of the mesenchymal subtype in GC. The IPSGC was constructed by six immune genes as the key factors of the mesenchymal subtype and could stratify patients into different risk groups. Within these six immune genes, CLEC11A was the driver gene in multiple myeloma (MM) [36]. NRP2 could promote gastric adenocarcinoma lymphatic invasion with VEGF-C stimulation [37]. High expression of ANGPTL2 is associated with tumor malignancy, early recurrence, and poor prognosis in GC patients [38]. FGF7 promotes gastric cancer invasion and migration [39]. Elevated expression of FABP4 correlates with poor prognosis in non-small-cell lung cancer (NSCLC) [40]. The defined high-risk group showed a worse OS and RFS than the lowrisk group. The IPSGC remained an independent prognostic   Previous studies have described that the infiltration of plasma cells contributes to prolonged survival in GC [41], while mac-rophage M2 indicates a poor prognosis in GC [42]. We observed a significantly higher proportion of macrophage M2 in the high-risk group and a significantly higher enrichment of plasma cells in the low-risk group. Moreover, some mesenchymal phenotype-related pathways, such as EMT, angiogenesis, and focal adhesion, were positively enriched This study still has some limitations. First of all, the prognostic signature was screened from gene expression profiles generated from microarray platforms, which are expensive, are difficult to operate, and involve professional bioinformatics expertise, so it is difficult to be popularized in daily clinical application. Second, the training and validation datasets were all from retrospective studies in the study, including fresh frozen samples. Therefore, the efficiency and stability of FFPE samples are still in doubt. In the following improvement process, more datasets containing more clinical characteristics need to be included for more extensive screening and validation.

Conclusion
Taken together, our network analysis established an immune gene-based signature, which could effectively predict GC patients' survival. Our study is the first attempt to integrate tumor heterogeneity and the immune system to develop the prognostic signature for GC.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
All authors declare that they have no conflicts of interest.