Identification of Key Prognostic Biomarker and Its Correlation with Immune Infiltrates in Pancreatic Ductal Adenocarcinoma

Pancreatic ductal adenocarcinoma (PDAC) is an extremely malignant tumor. The immune profile of PDAC and the immunologic milieu of its tumor microenvironment (TME) are unique; however, the mechanism of how the TME engineers the carcinogenesis of PDAC is not fully understood. This study is aimed at better understanding the relationship between the immune infiltration of the TME and gene expression and identifying potential prognostic and immunotherapeutic biomarkers for PDAC. Analysis of data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) databases identified differentially expressed genes (DEGs), including 159 upregulated and 53 downregulated genes. Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes enrichment were performed and showed that the DEGs were mainly enriched for the PI3K-Akt signaling pathway and extracellular matrix organization. We used the cytoHubba plugin of Cytoscape to screen out the most significant ten hub genes by four different models (Degree, MCC, DMNC, and MNC). The expression and clinical relevance of these ten hub genes were validated using Gene Expression Profiling Interactive Analysis (GEPIA) and the Human Protein Atlas, respectively. High expression of nine of the hub genes was positively correlated with poor prognosis. Finally, the relationship between these hub genes and tumor immunity was analyzed using the Tumor Immune Estimation Resource. We found that the expression of SPARC, COL6A3, and FBN1 correlated positively with infiltration levels of six immune cells in the tumors. In addition, these three genes had a strong coexpression relationship with the immune checkpoints. In conclusion, our results suggest that nine upregulated biomarkers are related to poor prognosis in PDAC and may serve as potential prognostic biomarkers for PDAC therapy. Furthermore, SPARC, COL6A3, and FBN1 play an important role in tumor-related immune infiltration and may be ideal targets for immune therapy against PDAC.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is an extremely malignant tumor of the digestive system, with a 5-year survival rate of only 8% [1]. GLOBOCAN 2018 estimated that pancreatic cancer ranked as the seventh leading cause of cancer death worldwide, with approximately 458,918 new incidence cases and 432,242 deaths [2]. Because of its poor prognosis, newly diagnosed cases and deaths from pancreatic cancer have significantly increased at the same pace over the last few decades [2]. Currently, surgical resection is still the most effective treatment, which significantly improves the 5-year survival rate to 20-30%. However, less than 20% of all patients are eligible for resection because the majority of patients have locally advanced or metastatic disease at the time of diagnosis [3]. Therefore, there is an urgent need for more effective therapeutic strategies.
In recent years, cancer immunotherapy has become an active area of cancer research. This approach is designed to improve the immunogenicity of tumor cells, stimulate and enhance the antitumor immune response, and inhibit tumor growth and progression. Since the FDA approval of the first targeted drug ipilimumab (humanized anti-CTLA-4 IgG1 monoclonal antibody) in 2011, six more immune checkpoint inhibitors have been approved for cancer therapy [4,5]. At present, immune checkpoint inhibitors have achieved significant therapeutic effects in melanoma [6], squamous nonsmall-cell lung cancer [7], renal cell carcinoma [8,9], and urothelial cancer [10,11]. However, the immune profile of PDAC and the immunologic milieu of its tumor microenvironment (TME) are unique relative to other malignant tumors, and the mechanism of how the TME engineers the carcinogenesis of PDAC is not entirely clear. Therefore, it is necessary to find new possible prognostic and immunotherapeutic biomarkers for this disease.
In the current study, we analyzed pancreatic cancer gene expression data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases to identify differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment and protein-protein interaction (PPI) network analyses were performed to reveal the interactive relationships between the DEGs to explore the underlying molecular mechanisms involved in the carcinogenesis and progression of PDAC. Subsequently, we used the cyto-Hubba plugin of Cytoscape to search for hub genes and identify the most significant hub genes using four different models (Degree, Maximal Clique Centrality (MCC), Density of Maximum Neighborhood Component (DMNC), and Maximum Neighborhood Component (MNC)). We further validated the expression levels of the hub genes and their clinical relevance using Gene Expression Profiling Interactive Analysis (GEPIA) and the Human Protein Atlas, respectively. Finally, we evaluated the associations between the prognosis-related genes and several immune cell types based on the Tumor Immune Estimation Resource      Disease Markers GSE62452 datasets. P < 0:05 and |log 2FC | >1 were set as the cut-off criteria for identifying DEGs.

Functional Enrichment Analysis.
Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment were performed on the DEGs, and the results were visualized using the R package. A P value < 0.05 was set as the cut-off criterion.

PPI Network Construction and Module
Analysis. The STRING database (https://string-db.org/) was used to draw the PPI network diagram of the DEGs. The hub genes were identified and visualized using Cytoscape [14]. Specifically, the hub genes were identified using the Degree, MCC, DMNC, and MNC models with the Cytoscape plugin cytoHubba [15].

Validation of Hub Genes.
We analyzed the expression of the hub gene using GEPIA (http://gepia.cancer-pku.cn/index .html) [16]. Because there were few normal pancreatic tissue samples in the TCGA database, the expression level of specific DEGs was validated using the TCGA PDAC tumor data and matched data for normal tissue in the TCGA and Genotype-Tissue Expression (GTEx) databases. |log 2FC | > 1 and P < 0:05 were considered statistically significant. Protein expression of the DEGs was evaluated in the pancreatic tumor and nontumor tissues using the Human Protein Atlas tool (https://www.proteinatlas.org/) [17]. Mutation data were  Disease Markers obtained from the cBioPortal for Cancer Genomics (https:// www.cbioportal.org/) [18].

Association of Hub Gene Expression with the Survival of
Patients with PDAC. We analyzed the association of patient prognosis with the hub genes using the human protein atlas website. Based on the fragments per kilobase million (FPKM) value of each gene, pancreatic cancer patients were classified into two expression groups (high expression and low expression), and the correlation between the expression level and patient survival was examined. The prognosis of each group of patients was examined using Kaplan-Meier survival estimators, and the survival outcomes of the two groups were compared by the log-rank test. P < 0:05 was considered statistically significant.

Associations between Hub Genes and Immune Cell
Infiltration and Immune Checkpoints. TIMER is an online website that uses RNA-Seq expression profiling data to detect immune cell infiltration in more than 30 cancer types [19].
We used this approach to analyze the correlation between the expression levels of the hub genes and the abundance of immune infiltrates (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) and four immunological checkpoints (CTLA4, CD274, PDCD1, and PDCD1LG2).

Screening for DEGs in PDAC.
We identified the differentially expressed genes in PDAC in the TCGA database using GEPIA (Figure 1(a)). Based on the thresholds of P < 0:05 and |log 2FC | >1, a total of 9202 DEGs (8724 upregulated and 478 downregulated) were identified. DEGs between normal and cancer tissues in the GEO datasets GSE15471 and GSE62452 were screened using GEO2R. A total of 295 DEGs (189 upregulated and 106 downregulated genes) were identified in profile GSE62452, and 1794 DEGs (1561 upregulated and 233 downregulated genes) were found in profile GSE15471 (Figure 1(b)). The overlapping DEGs within the three datasets consisted of 159 upregulated genes (Figure 1(c)) and 53 downregulated genes (Figure 1(d)).

Functional Analysis of the DEGs.
To analyze the underlying interplay of the DEGs, GO analysis and KEGG pathway enrichment were performed using the R package. Based on the GO analysis, the upregulated DEGs mainly participated in the extracellular matrix organization, extracellular  Figure 4: Expression analysis of ten hub genes in PDAC based on GEPIA. The mRNA expression levels in TCGA pancreatic tumors (n = 179) and matching normal tissue (n = 171) from the TCGA and GTEx databases. P < 0:05 was considered statistically significant. 5 Disease Markers structure organization, positive regulation of cell migration, skeletal system development, and cell-substrate adhesion (Figure 2(a)). KEGG pathway analysis revealed that the DEGs were mainly enriched in the PI3K-Akt signaling pathway, focal adhesion, and ECM-receptor interaction (Figure 2(b)). Genes specifically enriched in each KEGG term were visualized (Figures 2(c) and 2(d)).    Disease Markers the STRING database to study the relationship between the various DEGs. We identified ten hub genes based on the Degree (Figure 3(a)), MCC (Figure 3(b)), DMNC (Figure 3(c)), and MNC (Figure 3(d)) Cytoscape models. The ten hub genes were BGN, COL5A2, COL6A3, COL11A1, COL12A1, FBN1, POSTN, SPARC, THBS2, and VCAN (Figure 3(e)).

Validation of Expression and Alteration of Hub Genes in PDAC.
To verify the differential expression of the hub genes between PDAC and normal pancreatic tissue, we analyzed the ten hub genes using the GEPIA-based TCGA database. We found that the mRNA expression levels of the hub genes were significantly increased in PDAC compared to normal pancreatic tissue (Figure 4). Protein expression levels were similarly compared using the Human Protein Atlas database.
Typical immunohistochemistry images (https://www .proteinatlas.org/) for the protein expression of eight of the ten genes (COL5A2 and COL11A1 were not included in the database) in tumor and normal pancreatic tissue are shown in Figure 5(a). Genetic alterations found in the ten genes in pancreatic cancer are shown in Figure 5(b). Missense mutations were the most common type of mutation in the upregulated genes.

Identification of the Clinical Prognostic Value of Hub Genes in PDAC.
To estimate the influence of hub gene expression on the prognosis of PDAC, we performed survival analysis for the hub genes using the Human Protein Atlas online tool for differential analysis. High expression of nine of the hub genes was positively correlated with poor prognosis (COL5A2 P = 0:044; COL6A3  3.6. Immune Infiltration Analysis. To explore whether there was a correlation between immune cell infiltration into the tumors and hub gene expression, we analyzed the relationships between the ten hub gene signatures and tumor purity and six important immune cell types (CD4+ T cells, CD8+ T cells, B cells, neutrophils, macrophages, and dendritic cells) using TIMER. We observed that most of these prognosisrelated genes were positively correlated with the infiltrating levels of the different immune cell types but negatively related to tumor purity (Supplementary Figure 1, Supplementary Table 1). In particular, SPARC, COL6A3, and FBN1 showed remarkable positive correlations with the infiltrating levels of the six immune cell types (Figure 7). Since immunotherapy is currently focused on immunological checkpoint inhibitors (e.g., CTLA4, PDCD1, PDCD1LG2, and CD274), we chose SPARC, COL6A3, and FBN1 as potential target genes and further analyzed the coexpression relationship of these three genes with these immune checkpoint-related genes by TIMER. We found that all three of these potential target genes had strong coexpression relationships with CTLA4, PDCD1, PDCD1LG2, and CD274 ( Figure 8).

Discussion
Pancreatic cancer is one of the most malignant and aggressive tumors. Traditional treatment methods (e.g., surgery, chemotherapy, radiotherapy, and other locoregional therapies) provide low survival rates. Currently, several clinical studies have focused on immunotherapeutic strategies in pancreatic cancer [20]. Therefore, a better understanding of the immune infiltration into pancreatic tumors and the identification of novel PDAC immune-related biomarkers may prove useful for immunotherapy.
In the current study, we found 212 reliable DEGs in PDAC by the comprehensive analysis of three datasets. These differentially expressed genes were then subjected to GO and KEGG pathway enrichment analysis, which indicated that these DEGs were mainly enriched in pathways involved in the extracellular matrix (ECM) and extracellular structure organization and the PI3K-Akt signaling pathway. The ECM, a major component of the tumor microenvironment, is known to undergo significant changes during angiogenesis and tumor progression [21]. Activation of the PI3K-AKT signaling pathway can enhance pancreatic cancer cell proliferation and invasion [22,23]. Our results can help elucidate the underlying mechanisms of TME in pancreatic cancer proliferation and invasion. Furthermore, we identified ten hub genes through PPI network analysis. According to the results  , high expression of COL5A2,  COL6A3, COL11A1, COL12A1, FBN1, POSTN, THBS2, SPARC, or VCAN was associated with poor prognosis for patients with pancreatic cancer. These results suggest that these genes could serve as potential prognostic biomarkers for PDAC.
Immune cell infiltration in the tumor microenvironment has received much attention and has become a promising therapeutic target. To further investigate the relationship between these prognosis-related genes and different immune cell types, we predicted gene-immune cell interactions using TIMER. We found that the expression levels of SPARC, COL6A3, and FBN1 were significantly correlated with six infiltrating immune cell types (CD4+ T cells, CD8+ T cells, B cells, neutrophils, macrophages, and dendritic cells).
SPARC is a multifunctional calcium-binding glycoprotein that is usually secreted into the extracellular matrix and plays a key role in proliferation, migration, adhesion, and differentiation [24,25]. Previous studies have shown that SPARC is localized in the tumor stroma and overexpressed in various cancers, such as breast [26], lung [27,28], and melanoma [29]. Moreover, SPARC expression is dramatically increased in gastric tumors and associated with poor outcome [30]. Stromal SPARC expression is observed in almost 40% of pancreatic adenocarcinoma patients who have undergone curative resection, and this expression is an independent prognostic factor [31]. Consistent with our research, high SPARC expression in PDAC is associated with poor prognosis.
COL6A3 is an extracellular matrix protein that is typically found in most connective tissues, including muscle, skin, tendon, and vessels. Many recent studies have demonstrated the important role of COL6A3 in the diagnosis and prognosis of colorectal, lung, and prostate cancer [32,33]. A high level of expression of COL6A3 has been observed in pancreatic tumors, where it was correlated with negative prognostic factors [34]. This finding is consistent with the results obtained in our study. FBN1 encodes fibrillin, which is the primary component of microfibrils in the extracellular matrix. A previous study reported that FBN1 overexpression plays a key role in the development of germ cell tumors [35]. FBN1 is also a target gene for microRNA-(miR-) 133b. miR-133b inhibits the proliferation, migration, and invasion of gastric cancer cells by increasing FBN1 expression [36]. Moreover, hypermethylated FBN1 is found in tissue samples from colorectal cancer patients but not in healthy controls, suggesting that hypermethylated FBN1 may be a sensitive biomarker for this disease [37]. The role of FBN1 in the development of pancreatic cancer and its correlation with the prognosis of patients has not been reported previously. In summary, we speculate that SPARC, COL6A3, and FBN1 might be involved in the development of PDAC by regulating the functions of the TME.
Previous studies have suggested that the tumor microenvironment is one of the significant factors determining the prognosis of PDAC and may even play a role in the resistance to treatment [38,39]. However, the tumor microenvironment is complex and determined by many factors. We com-prehensively analyzed the correlation between immune checkpoints and identified three potential prognosis-related genes (SPARC, COL6A3, and FBN1). We were surprised to find that these genes had a significant coexpression relationship with CTLA4, PDCD1, PDCD1LG2, and CD274. These results suggest that the three prognosis-related genes may affect the development of pancreatic cancer by affecting immune checkpoints. Future studies should identify the specific roles of these genes in the regulation of PDAC development and progression.

Conclusion
Taken together, our results suggest that the nine high expression hub genes (COL5A2, COL6A3, COL11A1, COL12A1, FBN1, POSTN, THBS2, SPARC, and VCAN) may be associated with the development and prognosis of PDAC. All these genes may be potential prognostic biomarkers for PDAC. In addition, three hub genes (SPARC, COL6A3, and FBN1) may also play a vital role in the microenvironment of pancreatic cancer through the regulation of tumor-infiltrating immune cells, suggesting they could serve as potential therapeutic targets for the modulation of the antitumor immune response. Nevertheless, further investigation is needed to confirm the mechanisms underlying the potential roles of these biomarkers in the immune microenvironment.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.