Identification of Prognosis-Related Molecular Subgroups and Construction of a Prognostic Prediction Model Using Immune-Related Genes in Pancreatic Cancer

Background Pancreatic cancer patients with similar clinicopathological status exhibit substantially different therapeutic responses, which might be caused by the vast molecular heterogeneity of tumors. In this study, we attempted to identify specific molecular subgroups and construct a prognostic prediction model based on the expression level of immune-related genes in pancreatic cancer. The transcriptome profiling, single nucleotide variation, copy number variation, clinicopathological information, and follow-up data of pancreatic cancer patients were obtained from The Cancer Genome Atlas database. Thereafter, the immune-related genes with prognostic significance were identified for further consensus cluster analysis. The molecular characteristics and clinicopathological information were compared between the identified subgroups, and a weighted correlation network analysis was performed to identify the hub genes associated with the subgroups. Finally, the prognostic prediction model based on immune-related genes was established using the least absolute shrinkage and selection operator (LASSO) analysis. Results A total of 67 immune-relevant genes with prognostic significance were selected and used for the consensus cluster analysis. The total samples were divided into two groups, C1 and C2. The subgroup C1 had a significantly worse prognosis than C2, as well as lower levels of immune cell infiltration, which indicate an immunosuppressed state. The mutational rate of the cancer-related genes including KRAS, TP53, and RNF43 was higher in the C1 subgroup. The C1 subgroup was associated with more advanced tumor grade and T stage and with higher mortality. Using LASSO regression, we developed a prognostic prediction model based on the expression levels of 19 immune-related genes, which we validated in three external data sets. In addition, we identified four potential therapeutic and prognostic biomarkers (TNNT1, KCNN4, SH2D3A, and PHLDA2). Conclusion We identified two novel molecular subgroups of pancreatic cancer and developed a prognostic prediction model based on the expression levels of immune-related genes, which could be used in a clinical setting and could aid in unraveling the molecular processes leading to the development of pancreatic cancer.


Introduction
Pancreatic cancer is one of the most lethal malignancies affecting the digestive system. It is characterized by an insidious onset with nonspecific early symptoms and early metastasis [1,2]. As a result, pancreatic cancer patients often are at an advanced stage or present distant metastases when they are diagnosed. is greatly hinders surgical management, which is the most effective therapeutic approach for pancreatic cancer nowadays [3]. In addition, pancreatic cancer is associated with a high risk of relapse. Altogether, these factors contribute to the high mortality associated with this disease, which is the fourth leading cause of cancerrelated deaths in America, and that has a five-year survival rate of less than 9% [4]. e traditional tumor staging methods that merely rely on basic clinicopathological information such as the tumor-node-metastasis classification or the age group in the same category individuals with substantial phenotypic differences have a considerable impact on prognosis. erefore, traditional staging methods are not sensitive enough to provide individualized diagnosis and treatment for pancreatic cancer patients [5,6]. In addition, classical serum tumor biomarkers such as carbohydrate antigen 19-9, carbohydrate antigen 125, or carcinoembryonic antigen perform poorly in the early diagnosis and outcome prediction of pancreatic cancer [7,8]. Improving early diagnosis and developing new risk stratification methods will contribute to more effective management and a better prognosis for pancreatic cancer patients. With the rapid development and progress of sequencing methods, huge amounts of high-throughput data from different omics technologies are now available in public databases such as e Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Wise use of this information may help us further investigate the molecular mechanisms underlying the development of pancreatic cancer.
Previous studies have already investigated the molecular classification and the internal heterogeneity of pancreatic cancer using omics data [9,10]. Integrating genomics data, Bailey et al. established four distinct molecular subtypes of pancreatic cancer (squamous, pancreatic progenitor, immunogenic, and aberrantly differentiated endocrine-exocrine), a new classification system that might contribute to understanding the molecular evolution of pancreatic cancer, as well as to developing new therapeutic methods [11]. Using transcriptional data, Collisson et al. distinguished three pancreatic ductal adenocarcinoma subtypes (classical, quasimesenchymal, and exocrine-like) that are associated with different survival and progress rates, as well as distinct therapeutic responses [12]. Finally, a classifier constructed using the expression levels of 19 miRNAs was reported to accurately predict the prognosis of pancreatic cancer patients with high sensitivity [13]. ese studies highlight the potential ways in which the use of omics data can help molecularly characterize tumors, contributing to the discovery of new diagnostic and prognostic biomarkers. e malignant behaviors of pancreatic cancer largely depend on the complex cross-talk between tumor cells and the tumor immune microenvironment [14]. A deeper understanding of the immune landscape in pancreatic cancer could help elucidate these interactions and develop new immunotherapy approaches [15]. A previous study identified four molecular subtypes of prostate cancer that markedly differed in their prognosis by analyzing the expression levels of immune-related genes. Six of these immune-related genes were used to develop a prognostic prediction model using least absolute shrinkage and selection operator (LASSO) regression [16]. To our knowledge, no molecular subgroups of pancreatic cancer have been identified using immune-related genes to date.
In this study, we attempted to identify specific molecular subtypes of pancreatic cancer that are closely associated with the immune system signature. Analyzing the expression levels of immune-related genes with prognostic value using consensus cluster analysis, we identified two specific molecular subgroups, C1 and C2. Significant differences in the overall survival (OS) and clinical features including tumor grade, T stage, and survival status were observed between the C1 and C2 subgroups. To molecularly characterize these two subgroups, we annotated the differentially expressed genes (DEGs) according to gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) and performed gene set enrichment analysis (GSEA). In addition, we also performed immune infiltration, mutation spectrum, and copy number variation analyses. Weighted correlation network analysis (WGCNA) identified four hub genes associated with the molecular subgroups. In the future, it is necessary to further study the function of the four genes identified, TNNT1, KCNN4, SH2D3A, and PHLDA2, and evaluate their potential use as biomarkers. Finally, we constructed a prognostic prediction model using the expression levels of 19 immune-related genes and validated it using three external data sets. In addition, uni-and multivariate Cox analyses revealed that the constructed model was an independent prognostic factor in pancreatic cancer.
ese results indicate that the model could be used in a clinical setting in the near future to aid clinicians in making management-related decisions.

Data Download and Preprocess.
Transcriptome data (RNA-Seq, HTSeq-Counts type), single nucleotide variation data (MuTect2, Annotation type), copy number variation (Copy Number Segment, Masked type), clinicopathological information, and the latest follow-up data of pancreatic cancer patients from TCGA database were downloaded from the Genomic Data Commons Data Portal (https://portal. gdc.cancer.gov) on December 8, 2020 [17]. Genes with RNAsequencing missing values in more than half of the total samples and data from patients with overall survival (OS) of less than 30 days were excluded from the subsequent analyses. A total of 171 pancreatic cancer samples met the selection criteria and were used for consensus cluster analysis. In addition, three GEO data sets, GSE62452 [18], GSE71729 [19], and GSE78229 [20], were used for external validation [21,22]. e detailed information on pancreatic cancer samples from TCGA and the three validation data sets are provided in Supplementary Tables 1 and 2, respectively.

Identification of Prognosis-Related Molecular Subgroups
Based on the Immune-Related Genes. To study the existence of distinct molecular subgroups in pancreatic cancer, we selected immune-related genes with prognostic significance for further analyses. First, 1,811 immune-relevant genes were obtained from the Immunology Database and Analysis Portal (ImmPort) website (https://www.immport.org/) [23].
ereafter, log-rank and Cox survival analyses were performed to identify survival-relevant genes based on the gene expression data of pancreatic cancer samples from the TCGA data set. A total of 1,350 genes had associated p-values < 0.01 in both the log-rank and Cox analyses and were selected for further study. Immune-related genes with prognostic significance were defined as the intersection of potential prognosis-related markers and immune-related genes, and a total of 67 genes were selected. Detailed information about the selected genes is provided in Supplementary Table 3. Subsequently, consensus cluster analysis based on the above 67 genes was performed using the R package "ConsensusClusterPlus" [24]. e number of consensus clusters was set to 2, based on the package guidelines. Finally, the pancreatic cancer samples were separated into groups C1 (n � 103) and C2 (n � 68). OS and principal component analysis (PCA) between the two subgroups were performed with the R packages "survival," "survminer," and "DESeq2" [25][26][27][28].

Molecular Characteristics and Clinicopathological Information Comparisons between the Two Subgroups.
To further elucidate the mechanisms underlying the two subgroups, the DEGs between the C1 and C2 subgroups were identified and selected using the criteria |Log2 Fold Change| > 1 and adjusted p-value < 0.05. A total of 2,698 DEGs were identified using the R package "DESeq2" [28]. Furthermore, GO, KEGG, and GSEA were performed using the R packages "clusterProfiler" and "enrichplot" to identify the related molecular signaling pathways of the specific subgroups [29,30]. Detailed information on the DEGs is provided in Supplementary Table 4. In addition, the TIMER2.0 database was used to estimate and compare the immune infiltrate levels between the two subgroups [31][32][33]. Next, mutation spectrum analysis of the two subgroups was performed using the R package "maftools" [34]. e function "mafComapre" of the R package "maftools" was used to perform Fisher's test on all genes to detect differentially mutated genes between C1 and C2 subgroups. Correlation analysis between gene expression levels and copy numbers was analyzed. Genes with different frequencies of copy number variations were compared between C1 and C2 subgroups. Finally, the clinicopathological features were compared between the C1 and C2 subgroups.

Identification of Genes Associated with the Molecular
Subgroups Using WGCNA. We performed WGCNA using the R package "WGCNA" to identify the genes associated with each molecular subgroup [35,36]. e hub genes were selected and visualized using the software Cytoscape (version 3.8.2) [37,38] and the website Metascape [39]. e webbased tool gene expression profiling interaction analysis (GEPIA) was used to visualize the expression levels of the hub genes in 179 pancreatic tumor samples and 171 normal tissue samples [40]. Overall survival (OS) and relapse-free survival (RFS) analyses of the selected hub genes in pancreatic cancer patients were performed based on the Kaplan-Meier Plotter website [41].

Construction of a Prognostic Prediction Model Based on 19
Immune-Related Genes. To aid clinicians in stratifying pancreatic cancer patients based on the risk level, we developed a prognostic predictor model based on the expression data of 67 immune-relevant genes via LASSO analysis using the R package "glmnet" according to the official recommendations [42,43]. First, the total samples from the TCGA data set were randomly divided into the training (70% of the total sample, n � 120) and validation sets (30% of the total sample, n � 51). We developed a prognostic prediction model based on 19 immune-related genes using the training set. Detailed information on the 19 immunerelated genes is provided in Table 1. A risk score was assigned to each patient in the training set, according to the following formula: Risk score � 19 n�1 (Coefficient n * Expression of gene n ). Samples were categorized as high or low risk, considering the threshold of the median value of the overall risk scores. OS analysis was performed between the high-and low-risk groups. e predictive efficiency of the model was assessed by determining the receiver operating characteristic (ROC) curve and the area under the curve (AUC) using the R package "timeROC" [44]. e validation set and three external data sets (GSE62452, GSE71729, and GSE78229) were used to assess the validity of the model. e expression data of all data sets used were standardized using z-scores.

Uni-and Multivariate Cox Analyses of the Prognostic
Factors. We performed univariate analysis using the constructed model and common clinicopathological factors such as age, gender, tumor grade, clinical stage, T stage, M stage, and N stage, to identify the prognosis-related factors (p < 0.05). Next, these significantly prognostic factors were extracted for further multivariate Cox analysis to identify the independent prognostic factors (p < 0.05) in pancreatic cancer.

Statistical Analysis.
In this study, the R (version 3.6.3) and RStudio software were utilized to carry out the statistical analysis and figure preparation. p-values less than 0.05 were defined as statistically significant.

Identification of Specific Molecular Subgroups Based on Immune-Related Genes Using Consensus Cluster Analysis.
To identify the molecular subgroups associated with immune-relevant genes in pancreatic cancer, we extracted 1,811 immune-relevant genes from the ImmPort website, which provides user-friendly bioinformatic analysis tools for basic and clinical immunology. ereafter, log-rank and Cox survival analyses were performed to identify survival-relevant genes based on the gene expression data of pancreatic cancer samples from the TCGA data set. e genes with p < 0.01 in both log-rank and Cox analyses were selected and defined as genes with a prognostic value. Immune-related genes with prognostic significance were defined as the intersection of potential prognosis-related markers and immune-related genes, and a total of 67 genes were selected (Figure 1(a)). Expression data of the immune-related genes with prognostic significance were selected for further consensus cluster analysis. Samples of patients with survival time inferior to 30 days were excluded for further analyses to avoid possible disturbances. e consensus cumulative distribution function (CDF) plot showed that the slope of the CDF curve changes the most when the consensus index is set at 0 or 1 and the number of clusters is set as 2, as per official recommendations (Figure 1(b)). e relative change in the area under the CDF curve also indicated no substantial changes occur when the k value is higher than 2 ( Figure 1(c)). A tracking plot was constructed to reflect the distribution of the samples when k ranges from 2 to 10 ( Figure 1(d)). According to the cluster-consensus and consensus matrix plots, the total sample is divided into two distinct subgroups when k � 2 (Figures 1(e) and 1(f )). e pancreatic cancer samples were divided into groups C1 (n � 103) and C2 (n � 68). e expression level of the 67 immune-related genes among C1 and C2 subgroups was represented in a heat map (Figure 2(a)). e subgroup C1 has a significantly worse prognosis in comparison to the subgroup C2 (Figure 2(b); p < 0.0001). e existence of two distinct subgroups was further confirmed via PCA (Figure 2(c)).
ese results indicate the existence of two distinct molecular subgroups in pancreatic cancer according to the expression levels of prognosis-associated immune genes.

Identification of Signaling Pathways
Related to the Molecular Subgroups. Since the molecular subgroups of pancreatic cancer samples were significantly correlated to the OS, the DEGs between the two subgroups might be related with the initiation and development of pancreatic cancer. erefore, we first determined the DEGs between the C1 and C2 subgroups, using the standard criteria |Log 2 Fold Change| > 1 and adjusted p-value < 0.05. A total of 2,698 DEGs were identified, of which 827 were upregulated and 1,871 were downregulated (Figures 3(a) and 3(b)). To determine the function of the DEGs, GO, KEGG, and GSEA were used. e top 10 enriched terms of the GO analysis including the biological process, cellular component, and molecular function are shown in Figure 3(c) (Supplementary Table 5). Several important molecular mechanisms were enriched in the GO analysis, including regulation of transsynaptic signaling, multicellular organismal signaling, signal release, synaptic membrane, postsynaptic membrane, ion channel activity, ion gated channel activity, and potassium ion transmembrane transporter activity.
ese results indicate that there is a close relationship between cellular signal transduction pathways and the molecular subgroups. KEGG enrichment analysis revealed several molecular pathways that may play a vital role in the development of pancreatic cancer, such as cytokine-cytokine receptor interaction, cAMP signaling pathway, cell adhesion molecules, pancreatic secretion, and primary immunodeficiency (Figure 3(d) and Supplementary Table 6). GSEA showed that apical junction, glycolysis, mitotic spindle, mTORC1 signaling, and p53 pathway were significantly activated, whereas bile acid metabolism and pancreas β cell pathways were significantly inhibited (Figures 3(e) and 3(f )). Further study of these alterations might contribute to a better understanding of the distinct molecular mechanisms underlying the two molecular subgroups identified.

Comparisons of Immune Infiltration, Mutation Spectrum, Clinical Features, and Copy Number Variation between the C1 and C2 Subgroups.
As the subgroups identified based on the immune-related genes were significantly associated with prognosis, we compared the molecular characteristics and some clinical features between the C1 and C2 subgroups. e infiltration level of six immune-cell populations was higher in the C2 subgroup compared with the C1 subgroup ( Figure 4(a)). e difference was significant for the infiltration level of macrophages, myeloid dendritic cells, T cells CD4+, and T cells CD8+ (Figure 4(b)). ese results indicate an immunosuppression state in the C1 subgroup, which might contribute to the poor prognosis associated with this group. Mutational spectrum analysis of the two subgroups was performed. e top 20 mutated genes in the C1 and C2 subgroups are shown in Figures 4(c) and 4(d), respectively. e mutational frequency of KRAS, T753, RNF43, FLG, PCDH15, and ADAMTS16 was significantly higher in the C1 subgroup compared with the C2 subgroup (Supplementary Table 7; Figure 4(e)). A comparison of common clinicopathological characteristics indicated that the two subgroups significantly differed in tumor grade, T stage, and survival (Table 2). e C1 subgroup, with a worse prognosis, was associated with a more advanced tumor grade and T stage  Further analyses of genes with the difference in copy number variation were performed between the C1 and C2 subgroups. Interestingly, our results demonstrated that PTK2 and PLEC expression levels were significantly correlated with their copy number (Figures 5(a) and 5(d)). e PTK2, also known as FAK, upregulation of its expression could accelerate progression and contribute to an immunosuppressive environment of pancreatic cancer [45,46]. PTK2 gene expression was higher in the C1 than the C2 subgroup ( Figure 5(b)). And the frequencies of amplification and single gain were also higher in the C1 subgroup ( Figure 5(c)).
In addition, previous research proved that the PLEC gene could serve as a novel biomarker to identify preinvasive, primary, and metastatic pancreatic ductal adenocarcinoma, and its expression was continuously increasing along with tumor progression [47]. Similarly, PLEC expression was also significantly higher in the C1 subgroup compared with the C2 subgroup, and frequencies of amplification and single gain were lower in the C2 subgroup than the C1 subgroup (Figures 5(e) and 5(f )). e above results might contribute to elucidating the underlying mechanisms behind the heterogeneity between different molecular subgroups.

Hub Genes Associated with the Molecular Subgroups Were
Identified via WGCNA. We performed WGCNA of 2,698 DEGs to identify specific gene coexpression modules, and subsequently, we identified the gene coexpression module most correlated with the clinical traits. e complete clinical information is shown in the clustering dendrogram with the trait heat map (Figure 6(a)). e soft threshold of WGCNA was defined as 6 to maintain the balance between scale independence and mean connectivity ( Figure 6(b)). Genes were separated into eight modules depicted in different colors in Figure 6(c). e correlation analysis between the gene modules and clinical traits identified that the module labeled as brown was the one most significantly correlated with the C1 subgroup ( Figure 6(d)). In addition, this module was also significantly correlated with the T stage and the clinical stage. Gene significance, defined as the correlation between gene expression and clinical traits, was put in relation to module membership, defined as the correlation between the module and the gene expression profile. e correlation between module membership and gene significance for the Tstage, the clinical stage, or the subgroup in the brown module is shown in Figures 6(e)-6(g). Detailed information on the genes in the brown module is provided in Supplementary Table 8. To further identify the hub genes in the brown module, the hub genes were selected using the software Cytoscape (Figure 7(a)) and identified using the database Metascape (Figure 7(b)). Analysis of the expression level of the hub genes indicated that the genes TNNT1, KCNN4, SH2D3A, and PHLDA2 were differentially expressed between 179 pancreatic tumors and 171 normal tissue samples (Figure 7(c)). e expression level of the four genes was significantly correlated with the OS, with higher expression levels indicating a worse prognosis (p < 0.05; Figure 7(d)), and with RFS (p < 0.05; Figure 7(e)). ese results suggest that these four genes might play an important 2 3 3 4 4 5 5 5 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 9 10 10   Journal of Oncology

Construction of the Prognostic Prediction Model Based on the 19 Immune-Related Genes.
In this study, a prognostic prediction model was developed to accurately stratify pancreatic cancer patients according to the risk level, which could be a tool of great importance in a clinical setting. To select the genes to construct the model, LASSO regression was performed using the 67 immune-relevant genes with prognostic significance. Samples in the TCGA data set were randomly divided into the training (70% of the total samples, n � 120) and validation sets (30% of the total samples, n � 51). e prognostic prediction model was constructed using the training data set using the LASSO regression model. e c-index was highest when log (λ) � −3.08 (Figure 8(a)). With these parameter values, a total of 19 immune-related genes were selected to construct the model. e coefficients of different genes corresponding to various combination models are shown in Figure 8  Frequency % (f ) Figure 5: Comparisons of genes with copy number variation between the C1 and C2 subgroups: (a) correlation analysis between PTK2 gene expression level and its copy number variation, (b) PTK2 gene expression level between C1 and C2 subgroups, (c) frequency of copy number variation of PTK2 gene between C1 and C2 subgroups, (d) correlation analysis between PLEC gene expression level and its copy number variation, (e) PLEC gene expression level between C1 and C2 subgroups, and (f ) frequency of copy number variation of PLEC gene between C1 and C2 subgroups.  Journal of Oncology 13 Detailed information on 19 genes and their coefficients is listed in Table 1. A risk score was calculated for each patient according to the following formula: Coefficient n * Expression of gene n . (1) e distribution plots of risk score and OS between the high-and low-risk groups in the training set are shown in Figure 8(c). A heat map of the expression levels of the selected genes in the training set is depicted in Figure 8(e). e OS analysis between the high-risk (n � 60) and low-risk (n � 60) groups showed that the high-risk group had a significantly worse prognosis (log-rank p < 0.001, Cox p < 0.001, HR � 4.7, 95% CI: 3 − 7.3; Figure 8(f )). e efficiency of the model constructed based on the expression levels of the 19 genes was assessed using a ROC curve. e AUC of 1-, 3-, and 5-year survival time were 0.80, 0.89, and 0.96, respectively (Figure 8(g), c-index: 0.76, 95% CI: 0.66-0.87). ereafter, the samples in the validation set were classified as high or low risk based on the median value of the overall risk scores. e risk score and OS distribution plots and the heat map of the genes for the internal validation set are provided in Figures 8(d) and 8(h), respectively. A significant difference between the high-risk (n � 25) and low-risk (n � 26) groups was also identified in the internal validation set (log-rank p < 0.014, Cox p < 0.001, HR � 3, 95% CI: 1.7 − 5.2; Figure 8(i)). e AUC of 1, 3, and 5 years in the ROC curve were 0.84, 0.91, and 1.00 in the internal validation set, respectively (Figure 8(j), c-index: 0.79, 95% CI: 0.65-0.94).
ese results indicated that the model based on the expression levels of the selected immunerelated genes could serve as an accurate prognostic prediction tool in pancreatic cancer.

Uni-and Multivariate Cox Analyses of the Prognostic Factors.
To identify the prognosis-associated factors for pancreatic cancer patients, univariate Cox regression analysis was performed on the risk score model and common clinicopathological information, including age, gender, tumor grade, clinical stage, T stage, M stage, and N stage based on the data from TCGA data set. Risk score, age, tumor grade, T stage, and N stage could serve as prognosis-related factors (p < 0.05). Multivariate Cox regression analysis indicated that from these five prognosis-related factors, risk score and N stage were independent prognostic predictor factors (p < 0.05). Detailed results of the uni-and multivariate Cox analysis are provided in Table 3. ese results demonstrate that the prognostic prediction model based on the 19 immune-related genes could serve as an independent prognostic factor.

Discussion
Pancreatic cancer patients with similar tumor morphology and clinicopathological status often show considerable differences in responses to the same therapeutic method, which may be caused by the vast molecular heterogeneity of the tumor tissue [10,48,49]. Accurately stratifying patients in molecular subgroups with specific OS, clinical outcome, and therapeutic responses would aid clinicians in making accurate decisions and administering individualized treatment, leading to a better prognosis in pancreatic cancer [50][51][52]. As the immune landscape is closely associated with the development of pancreatic cancer and immunotherapy stands as a promising therapeutic option in the near future [53,54], we attempted to identify distinct molecular subgroups in pancreatic cancer by analyzing the expression level of immune-related genes.
For this purpose, we selected the expression data of the immune-related genes with p < 0.01 in both log-rank and Cox survival analyses for further analysis. To identify the distinct molecular subgroups, we performed consensus cluster analysis, an unsupervised technique that allows the grouping of similar objects and division of the data and has been widely used to study the existence of distinct subgroups in various cancer types [55,56]. Using this approach, the samples were divided into two distinct molecular subgroups, C1 (n � 103) and C2 (n � 68), which presented significant differences in the prognosis.  IL1RAP  IL18  IL31RA  TNFSF10  CXCL11  IL20RB  PLAU  ERAP2  PIK3CB  EIF2AK2  IL6R  NPR1  GDF11  INSL5  LRSAM1  TYK2 MET  SLURP1  IL1RAP  IL18  IL31RA  TNFSF10  CXCL11  IL20RB  PLAU  ERAP2  PIK3CB  EIF2AK2  IL6R  NPR1  GDF11  INSL5  LRSAM1  TYK2 MET  SLURP1  IL1RAP  IL18  IL31RA  TNFSF10  CXCL11  IL20RB  PLAU  ERAP2  PIK3CB  EIF2AK2  IL6R  NPR1  GDF11  INSL5  LRSAM1  TYK2 Figure 9: Validation of the prognostic prediction model in the three external validation sets from the GEO database: (a) distribution plots of risk score and survival time between the high-and low-risk groups in the GSE62452 data set, (b) heat map of 19 genes in the GSE62452 data set, (c) overall survival (OS) analysis between the high-and low-risk groups in the GSE62452 data set, (d) the ROC curve was used to evaluate the efficiency of the 19 gene model in the GSE62452 data set, (e) distribution plots of risk score and survival time between the high-and lowrisk groups in the GSE71729 data set, (f ) heat map of 19 genes in the GSE71729 data set, (g) OS analysis between the high-and low-risk groups in the GSE71729 data set, (h) the ROC curve was used to evaluate the efficiency of the 19 gene model in the GSE71729 data set, (i) distribution plots of risk score and survival time between the high-and low-risk groups in the GSE78229 data set, (j) heat map of 19 genes in the GSE78229 data set, (k) OS analysis between the high-and low-risk groups in the GSE78229 data set, and (l) the ROC curve was used to evaluate the efficiency of the 19 gene model in the GSE78229 data set.
To further elucidate the underlying molecular mechanisms behind the established molecular subgroups, we utilized GO, KEGG, and GSEA by employing the DEGs to identify the signaling pathways specifically associated with each subtype. Immune infiltration analysis revealed that the immune scores of six immune cell types were higher in the C2 subgroup than in the C1 subgroup. ese differences were significant for macrophages, myeloid dendritic cells, and the T cell populations CD4+ and CD8+. ese results suggest a state of immune suppression in the C1 subgroup, which might underlie the poor prognosis associated with this subgroup. Mutation analysis indicated that classical cancer-related genes such as KRAS, TP53, and RNF43 were more frequently mutated in the C1 subgroup [57][58][59]. Analyses of genes with the difference in copy number variation were performed between the C1 and C2 subgroups. e expressions of the cancer-promoting PTK2 and PLEC genes were significantly higher in the C1 subgroups than the C2 subgroups. Besides, the two genes' frequencies of amplification and single gain were lower in the C2 subgroup than the C1 subgroup. Finally, analysis of the clinical features in the two subgroups revealed that a more advanced tumor grade and T stage and higher mortality were associated with the C1 subgroup. Our analysis provides a comprehensive perspective of the differential characteristics of the two identified groups.
To identify the DEGs specifically associated with each molecular subgroup, we performed WGCNA. e coexpression gene module labeled in brown in Figure 6(d) was the one most significantly correlated with the molecular subgroups. e hub genes in this module were identified using the Cytoscape software and the Metascape database. e expression level of genes TNNT1, KCNN4, SH2D3A, and PHLDA2 was significantly different between the 179 pancreatic tumors and 171 normal tissue samples. In addition, the expression level of these genes was significantly correlated with the OS and RFS of pancreatic cancer patients. A previous study reported that TNNT1 is significantly upregulated in breast tumor samples, and it facilitated their uncontrolled proliferation of tumor cells [60]. In addition, another study reported TNNT1 overexpression in colorectal cancer cells, where it enhances their proliferation, migration, and invasion capacities [61]. KCNN4 has been shown to modulate epithelialmesenchymal transition and cell apoptosis, increasing the malignant behavior of papillary thyroid cancer cells [62].
Finally, high expression of SH2D3A has been reported to enhance the progression of ovarian cancer [63], and downregulation of PHLDA2 has been reported to significantly inhibit the development of colorectal cancer through the PI3K/AKT signaling pathway [64]. In the future, we aim to study in detail the function of these four genes using in vitro and in vivo approaches.
We constructed a risk model using the expression levels of 19 immune-related genes using LASSO regression. e resulting model performed well in the internal training and validation TCGA subsets, as well as in three external validation data sets, although the AUC for 3-and 5-year OS were not all available in the three external data sets due to time limitations. Finally, uni-and multivariate Cox analyses demonstrated that the prognostic-prediction model developed using these genes could serve as an independent prognostic factor, indicating its potential use in a clinical setting.

Conclusions
In this study, we established two specific molecular subgroups based on the immune-related genes with prognostic significance using consensus cluster analysis. e two subgroups demonstrated significant differences in the OS and clinical features, including tumor grade, T stage, and survival status. e signaling pathways identified using GO, KEGG, and GSEA could contribute to understanding the underlying mechanisms behind the molecular classifications. e immune infiltration, copy number variation, and mutation spectrum analysis provided novel insights into the molecular subgroups. Four potential biomarkers, TNNT1, KCNN4, SH2D3A, and PHLDA2, were identified using WGCNA. e prognostic prediction model based on 19 immune-related genes could serve as an effective tool to predict the overall survival of pancreatic cancer patients.

Data Availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this article.

Supplementary Materials
Supplementary Tables 1-8 were uploaded to the submission system. Supplementary Table 1: Detailed information on pancreatic cancer samples from the TCGA data set. Supplementary Table 2: Detailed information on pancreatic cancer samples from the three validation data sets including GSE62452, GSE71729, and GSE78229. Supplementary Table 3: A total of 67 immune-related genes with prognostic significance (p-values < 0.01 in both the log-rank and Cox analyses) were selected for further analysis. Supplementary Table 4: Detailed information on the differentially expressed genes between the C1 and C2 subgroups. Supplementary