Pyroptosis-Related Gene Signature Is a Novel Prognostic Biomarker for Sarcoma Patients

Sarcoma is a rare and an extremely aggressive form of cancer that originates from mesenchymal cells. Pyroptosis exerts a dual effect on tumours by inhibiting tumour cell proliferation while creating a microenvironment suitable for tumour cell development and proliferation. However, the significance of pyroptosis-related gene (PRG) expression in sarcoma has not yet been evaluated. Here, we conduct a retrospective analysis to examine PRG expression in 256 sarcoma samples from The Cancer Genome Atlas database. We identified the PRGs that had a significant correlation with overall patient survival in sarcoma by performing a univariate Cox regression analysis. Subsequently, we conducted a LASSO regression analysis and created a risk model for a six-PRG signature. As indicated from the Kaplan–Meier analysis, this signature revealed a significant difference between high- and low-risk sarcoma patients. A receiver operating characteristic curve analysis confirmed that this signature could predict overall patient survival in sarcoma patients with high sensitivity and specificity. Gene ontology annotation and Kyoto Encyclopaedia of Genes and Genomes pathway enrichment analyses revealed that five independent PRGs were closely associated with increased immune activity. Moreover, we also deciphered that increased number of immune cells infiltrated the tumour microenvironment in sarcoma. In brief, the PRG signature can effectively act as novel prognostic biomarker for sarcoma patients and is associated with the tumour immune microenvironment.


Introduction
Sarcomas are a heterogeneous group of uncommon mesenchymal malignancies that originate from the mesodermal tissue. They amount to approximately 1-2% of all malignancies, with 4-6 estimated incidence of sarcoma per 100,000 cases of cancer per year [1]. The World Health Organization classifies major cancer types into over 100 subtypes based on their morphological and genetic attributes [2]. Subcohort studies on the morphological and molecular heterogeneity of sarcomas revealed the biologically complex processes that govern sarcoma development and lead to an unfavourable prognosis and limited treatment choices for people with sarcomas.
Since current systemic therapy options have limited effectiveness, metastatic progression is observed in approximately 50% of sarcoma patients during the first five years of treatment [3]. Only 16% of these patients with distant metastasis have a five-year relative survival rate [4]. Thus, pretreatment assessment of sarcomas by molecular biomarkers potentially facilitates the development of a risk-adapted approach for individualized treatment strategies in the future [5]. As a result, identifying novel prognostic biomarkers for accurate prognostic evaluation of sarcoma patients and the development of potential targeted treatments is of great significance.
Pyroptosis is a novel form of programmed necrosis that is behaviourally similar to the inflammatory necrosis of cells. In inflammatory necrosis, gasdermin is cleaved via classical and nonclassical pathways and causes continuous cell expansion until the cell membrane ruptures. This results in the release of cell contents and triggers an intense inflammatory response [6,7]. An inflammatory response attributed to pyroptosis reduces the effects of immune surveillance and suppression on malignant cells, thereby accelerating tumour growth and progression [8,9]. Moreover, malignant cells can escape immune surveillance by immunoediting, called "immune escape," thus, promoting metastasis and cell proliferation. Few studies have linked a substantial proinflammatory impact of pyroptosis to the modulation of the tumour immune microenvironment (TIME) [10]. Increasing numbers of studies have suggested that pyroptosis impacts the proliferation, invasion, and metastasis of tumour cells and in turn affects cancer prognosis [11,12]. Nevertheless, the prognostic value of pyroptosis-associated gene (PRG) signatures in sarcoma patients has not yet been determined.
This study is aimed at building a risk-score model to predict patient prognoses by conducting retrospective bioinformatic analysis of PRG expression profiles in sarcoma patients. Moreover, we explore a potential connection between pyroptosis, patient prognoses, and the TIME in sarcoma.

Datasets.
We extracted RNA-sequencing (RNA-seq) expression profiles and related clinical follow-up parameters, mainly survival status and period, for the sarcoma cohort from The Cancer Genome Atlas (TCGA) database. The filtered data was provided (Supplementary Figure 1-3). We analysed this data using the R (version 4.1.1) and R Bioconductor software. Prior to performing an in-depth analysis, we normalized the expression data to the values of fragments per kilobase of exon model, per Million mapped fragments, and eliminated samples that were in duplicates and those with missing clinical information. On the whole, 33 PRGs, gathered from prior reports, were analysed in this study [13][14][15].

Identification of PRG Signatures and Development of Risk
Model. We performed a univariate Cox regression anal-ysis to identify a correlation between PRG expression and overall patient survival in sarcoma as a potential biomarker. This analysis was conducted via the "survminer" and "survival" packages and candidate genes with P < 0:05 were selected for further analyses. Moreover, the "forestplot" package was applied to visualize the results of the univariate prognostic analysis.
The Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression analysis simultaneously analyses all independent variables and identifies the most crucial regulatory factors.
Subsequently, we conducted a LASSO regression analysis to identify the optimal prognostic PRGs using the "glmnet" and "survival" packages and built a multigene signature. Thus, we identified candidate genes to build a future risk model. Following that, we generated a risk score to predict the overall survival of sarcoma patients using the regression coefficients derived from the LASSO regression model. After centralization and standardization (using the R "scale" function), the risk score was determined.
The risk model is expressed below: ∑ n i=1 = exp i × βi. In this formula, the β i denotes the regression coefficient for a gene, exp i is the expression value of each prognostic gene for each TCGA sample, and n expresses the number of candidate genes.

Development of Independently Prognostic PRGs.
To assess the value of PRGs as independent prognostic biomarkers in different risk score subcohorts, we performed the Kaplan-Meier survival analysis. A nomogram was plotted using the "rms" package obtained from the multivariate Cox regression analysis. A calibration square plot was plotted to assess the prognostic accuracy of the nomogram and to calculate a relatively corrected C-index. Additionally, we performed a bootstrap validation (1000 bootstrap resamples) for the PRG nomogram.

Survival Investigation of Risk Model.
For subsequent analysis, we classified all sarcoma patients into either highrisk (Risk-H) or low-risk (Risk-L) subcohorts using the mean risk score of a patient as the threshold. We performed the Kaplan-Meier survival analysis to estimate and compare the differences in survival status and risk scores in the respective subcohorts. To estimate the prognostic accuracy of the PRG signature, we generated a time-dependent receiver operating characteristic (ROC) curve. Thereafter, using the "survminer" and "timeROC" packages of the R software, we plotted the Kaplan-Meier and ROC survival curves, respectively. We then initiated internal validation to test our risk model and randomly divided the cohort of sarcoma patients into two groups (as internal validation) for 259 TCGA sarcomas samples. The R package "caret" was employed for internal validation of risk score model (Supplementary Figure 4).  Disease Markers functions (MF) ontologies, cellular components (CC), and biological processes (BP) [16]. The DAVID [17] (Gene Functional Classification Tool, http://david.abcc.ncifcrf.gov/ ) and KOBAS databases [18] (http://kobas.cbi.pku.edu.cn/) are online tools that analyse KEGG signalling pathways and GO terms, respectively. Significant enrichment was defined as a P value < 0.05 and count ≥ 2.
2.6. Analysis of Immune Cell Infiltration in TIME. The CIBERSORT algorithm is an analytical method that assesses the overall expression cellular components by comparing them with their corresponding characteristics in a cell type. Therefore, we used CIBERSORT to determine the proportion of immune cell that infiltrated the TIME in sarcoma. A P < 0:05 was the threshold value to filter the results of the CIBERSORT analysis. Moreover, the proportion of each immune cell type in the samples was computed and plotted as a bar graph. The "pheatmap" package was used to create a heat map that presented the relative levels of 22 immune cells in the respective samples from each risk subcohort. The difference in the level of infiltration of these cells between the Risk-L and Risk-H subcohorts was analysed and visually compared. Additionally, "corrplot" package was used to conduct a correlation heat map analysis of the TIME infiltration by the 22 immune cell types.

Statistical Analysis.
All data were statistically analysed with the R (version 4.1.0) software. We used the "survminer" package to perform a log-rank test to compare the differences in the survival curves between the Risk-H and Risk-L subcohorts. We specified 1.258 as the optimal cut-off value [19], and the difference was considered to be statistically significant for P < 0:05.
We categorized 129 sarcoma samples into Risk-H and 130 into Risk-L cohorts based on the mean risk scores (Figure 1(d)). Based on the data obtained from principal component analysis, we categorized sarcoma patients into two distinct clusters by their risk scores (Figure 1(e)). Patients in the Risk-L cohort were more likely to survive lon-ger and had lower mortality rates than in the Risk-H cohort ( Figure 1(f)).
Our Kaplan-Meier survival analysis corroborated with the finding of the risk point distribution plot and revealed a significant difference in the overall survival period between the two risk score clusters (Figure 1(g)). In addition, the ROC curve revealed that the AUC value of the risk score in predicting the overall survival of sarcoma patients at 1/ 2/3-year was 0.736, 0.687, and 0.694, respectively. This indicated that the PRG risk model had a considerable prognostic sensitivity and specificity (Figure 1(h)).

Prognostic Significance of Individual Six-PRG Signature.
We determined the correlation between PRG expression levels and patient survival in the Risk-L and Risk-H cohorts using the six-PRG risk model and assessed the prognostic significance of each PRG. Only sarcoma patients with high expression levels of IL18, NLRP2, PYCARD, and TNF genes had favourable clinical outcomes, implicating that these PRGs are possible biomarkers of protective pyroptosis in sarcoma (Figures 2(a)-2(d)). However, sarcoma patients with high expression of PLCG1 had poor clinical outcomes, suggesting that it is a PRG biomarker that predicts poor prognosis of sarcoma patients (Figure 2(e)).

Development and Validation of Five-PRG Nomogram
Model. Subsequently, we constructed a nomogram model of the five PRGs (PLCG1, IL18, NLRP2, PYCARD, and TNF) based on the results of the multiple-variate Cox analysis ( Figure 3(a)). We then used a calibration plot to validate the prognostic capability of the nomogram model and to evaluate the accuracy of the predicted risk compared to the actual risk. In the nomogram model, as the expression levels of PLCG1 increased, the probability of survival of sarcoma patients decreased; thus, PLCG1 gene expression had a negative correlation with patient survival in sarcoma. On the other hand, the higher the expression levels of TNF, the higher the possibility of survival in sarcoma patients; the TNF gene had a positive correlation with patient survival in sarcoma. The calibration plot reflected the accuracy of the nomogram model in sarcoma patients in predicting the actual risk of death by these five genes, providing a valuable guide for clinical application (Figure 3(b)). And the risk score of PLCG1 and TNF could potentially serve as prognosis markers for sarcoma patients, in clinic.

Enrichment Analyses of Five Differentially Expressed
PRGs in Two Risk Subcohorts. We carried out GO annotation and KEGG pathway enrichment for the five PRGs to elucidate the biological functions and pathways associated with them. Additionally, these five PRGs were differentially expressed in the two risk subcohorts (Figure 4(a)). Remarkably, they were found to be associated with several immune and inflammatory-related BP terms (P. adjust < 0.05, Figure 4(b)). Furthermore, they were involved in CC related to inflammatory complexes, such as the NLRP1 inflammasome complex, AIM2 inflammasome complex, and NLRP3 inflammasome complex (P. adjust < 0.05, Figure 4(b)). The five PRGs had MF that was related to the terms "cysteine-3 Disease Markers type endopeptidase activity involved in the process," "cysteine-type endopeptidase activator activity participating in the apoptosis," "protease binding," "cytokine activity," and "protein binding" (Figure 4(b)).
The correlation analysis revealed that the main immune cell pairs with a negative correlation among the immune cells were as follows: memory resting CD4+ T cells and    Figure 7). The interaction parameters were P < 0:05 and | Correlation coefficient | >0:15. Notably, a significant difference in the percentage of infiltrating immune cells was observed among the two risk subcohorts that resulted in different clinical outcomes and risk statuses for the sarcoma patients.

Discussion
Molecular markers that were associated with different clinical outcomes have been identified in various solid tumours, underpinning individualized therapies to facilitate diagnosis and treatment [21][22][23]. Owing to biotechnology and bioinformatics, genetic analysis methods have been exploited to screen for vital cancer and tumour biomarkers. Molecular biomarkers are of great prognostic significance for sarcoma patients, as they provide additional information and insight into the mechanisms of carcinogenesis. It is noteworthy that loss of tumour suppressor genes may present some information on patient prognosis [24].
Since pyroptosis exerts a dual effect on cancer patients, the most direct and effective way to explain its significance is to build diagnostic and prognostic models related to pyroptosis [25]. The prognostic PRG signature was developed for multiple types of cancers (e.g., lung adenocarcinoma [13], gastric cancer [19], ovarian cancer [10], and skin cutaneous melanoma [25]). Since sarcomas exhibit considerable heterogeneity with respect to the anatomical location and the age of a patient and origin of the mesenchymal cells, the exact function of PRG signature in sarcoma patient remains unknown and deserves further study.
In this study, the mRNA levels of 33 PRGs were examined in samples from sarcoma patients to explore their     Disease Markers multiple-gene fitting approach is used in machine learning for the prognosis of various tumours [26] because multigene signatures are more accurate and reliable as diagnostic and predictive biomarkers of sarcoma than single-gene signa-tures. In the present study, the Kaplan-Meier survival analysis confirmed the diagnostic and prognostic significance of multi-PRG signatures in patients with early-stage sarcoma. Furthermore, the ROC curve validated the accuracy and

Group Group
Risk were stronger predictors of overall patient survival at 3 and 5 years, respectively, than the ideal model for the entire cohort, in accordance with multigene prediction nomograms. The constructed prognostic six-PRG risk model performed well in the TCGA sarcoma cohort and provided a reliable prognosis.
In the present study, PLCG1 was one of the PRGs identified. This gene is involved in apoptosis, differentiation, and cell growth through a receptor tyrosine kinase-mediated signal transduction channel [27]. Previously, pyroptosis was suppressed in GSDMD-N-induced cells by knocking down the PLCG1 gene and was found to be critical to facilitate the GSDMD-mediated pyroptosis [28]. However, the correlation between PLCG1-mediated pyroptosis and tumour development remains largely unknown. High expression of PLCG1 in our model was correlated with poor patient survival outcomes and was, to a certain extent, a negative regulator of pyroptosis and had a negative correlation with survival of sarcoma patients. Enrichment analysis of five independently prognostic PRGs revealed their involvement in immune-related signalling pathways and biological processes.
Pyroptosis is caspase-dependent inflammatory cell death characterized by swelling of the cells, the formation of holes and ruptures in the cell membrane, and the release of intracellular content causing pericellular inflammation and immune response [12]. It can induce the release of various inflammatory cytokines and is a consequence of inflammasome activation [29]. Interaction of cytokines and cytokine receptors involved in inflammasome activation is critical for sarcoma development and prognosis [30]. As a result, it is reasonable to presume that PRGs are linked to the immune system and inflammatory response.

Disease Markers
Interestingly, besides the KEGG analysis results, we also identified several signalling pathways associated with immune rejection and autoimmune-related diseases. This is probably because in sarcoma, tumour cells predominantly express PD-1/PD-L1. Though the expression of the PD-1/PD-L1 checkpoint pathway in sarcomas has been reported to be complex, tumour cells in sarcomas usually express low PD-1 levels. On the other hand, PD-L1 has been conformed to be expressed on tumour-infiltrating lymphocytes [31]. In parallel, pyroptosis-induced inflammation synergizes with the checkpoint blockade to trigger the strong antitumour immunity in the TIME [32]. Given the findings of our GO and KEGG analyses, it is reasonable to hypothesize that pyroptosis impact the composition of the TIME in sarcoma.
To examine the correlation between the proportion of tumour-infiltrating immune cells and patient prognoses, sarcoma cases were classified into high and low-risk cohorts.
The type of tumour-infiltrating immune cell in the sarcoma tissues of patients belonging to both cohorts was then assessed using the CIBERSOTR algorithm. Theories that hypothesize that the immune system can alter the formation, expansion, of tumour development [33]. Consistent with these theories, the better the prognosis, the more immune cells that infiltrate the TIME. In the six-PRG signature model, high-risk scores determined from the expression levels of the six PRGs were associated with poor patient outcomes. Furthermore, an overlap with the genetic signature found using the CIBERSORT platform implied that there may be some association with the risk score and the percentage of the TIME infiltrating immune cells. The immune cell infiltration analysis revealed that sarcoma cases in the Risk-L cohort had a reliable prognosis with increased levels of memory B cells, activated natural killer (NK) cells, M0 macrophages, and resting dendritic cells infiltrating the TIME.