PADI1 and Its Co-Expressed Gene Signature Unveil Colorectal Cancer Prognosis and Immunotherapy Efficacy

Peptidyl arginine deiminase 1 (PADI1) catalyzes protein citrullination and has a role in regulating immune responses. The tumor immune microenvironment has been reported to be important in colorectal cancer (CRC), which was correlated with the ability of CRC patients to benefit from immunotherapy. However, there is a lack of molecular markers for matching CRC immunotherapy. Previously, single-gene risk models have only considered the effect of individual genes on intrinsic tumor properties, ignoring the role of genes and their co-expressed genes as a whole. In this study, we analyzed the differential expression of PADI1 in colorectal cancer (CRC). We found that PADI1 was highly expressed in CRC. Subgroup survival analysis revealed a prognostic survival difference for PADI1 in CRC patients aged less than 65 years, male, T stage, N0, M0, and stage I-II (p < 0.05). In addition, we analyzed the functions and signaling pathways associated with PADI1 in CRC and found that it was highly enriched in several immune-related functions and pathways. Then, a set of PADI1 co-expressed genes (PCGs) risk-prognosis scores was developed with PADI1 as the core, which could accurately predict the prognosis of CRC (p < 0.05). PCGs risk score can be an independent prognostic factor for CRC. A new set of Norman plot models were developed for clinical characteristics with age, sex, and TNM stage, which can accurately predict CRC 1, 3, and 5 years survival, and calibration curves and decision curve analysis (DCA) validated the accuracy of the models. The risk score assessed the immune microenvironment of CRC and found that the immune score was higher in the low-risk group, and CD4+ T cells, helper T cells, and eosinophils were more infiltrated in the low-risk group (p < 0.05). Immunotherapy efficacy was better in the low-risk group (p < 0.05). The underlying mechanism may be that the high-risk group of PCGs was enriched in some pathways that promote immune escape and immune dysfunction. In conclusion, PCGs may better predict CRC prognosis and immunotherapeutic response.


Introduction
Colorectal cancer (CRC) is the second leading cause of death from cancer worldwide, accounting for approximately 10% of all cancer diagnoses and cancer-related deaths worldwide [1]. Its age of onset is advancing each year [2]. Colorectal cancer can usually be cured by surgery in the early stages of the disease, with a 5 years relative survival rate of about 90% [3]. Still, once colorectal cancer progresses to the middle and late stages, its 5 years survival rate is less than 50%, and the key to a good prognosis is early diagnosis [4]. Once colorectal cancer is diagnosed, the preferred treatment is still surgical resection and a postoperative combination of chemotherapy, radiotherapy, antiangiogenic therapy, immunotherapy, etc., but drug resistance still inevitably occurs [5][6][7][8]. Terefore, it is essential to understand the mechanism of colorectal cancer, fnd new tumor markers, and develop new targeted drugs to detect colorectal cancer more accurately, which has important research prospects and clinical signifcance. PADI1, peptidyl arginine deiminase 1, is a calciumdependent cysteine hydrolase that can mediate the citrullination of post-translational proteins [9]. It is a member of the PADs family and its primary function is to catalyze the conversion of arginine residues to citrulline residues in post-translational proteins [10]. A close association with the progression of oral mucosal cancer, breast cancer, and pancreatic ductal carcinoma has been reported in several publications but not in colorectal cancer [11][12][13]. In the subsequent studies, where genes regulate cellular traits in the form of networks, single-gene studies do not reveal the intrinsic properties of cancer more accurately, and a system consisting of a combination of the target gene and its coexpressed genes is required better to predict the biological properties of tumors and survival prognosis [14,15].
In this study, we frst investigated the expression of PADI1 in CRC and its impact on CRC survival prognosis and analyzed the function and enriched signaling pathways of PADI1 in CRC. Te PADI1-relatedco-expression gene network was mapped, and a risk-prognosis model was developed. By using this model, diferences in immune microenvironment status of CRC and diferences in the efcacy of immunotherapy could be accurately predicted.

Pan-Cancer Expression and Survival Analysis of PADI1.
TIMER is a web tool created by Harvard University's Professor of Immunoinformatics, TIMER (Tumor Immune Estimation Resource), at https://cistrome.shinyapps.io/ timer/. UALCAN is a comprehensive, user-friendly, and interactive web resource for analyzing cancer OMICS data. We entered the gene name "PADI1" in UACALN and selected "COAD" and "READ" for tumor type to observe the expression of PADI1 in colorectal cancer. Te diference in PADI1 expression in colorectal cancer was observed.

Analysis of Gene Ontology and Kyoto Encyclopedia of
Genes and Gene Sets. We performed the diferential analysis of the PADI1 high and low expression groups defned in the previous PADI1 survival analysis by selecting |logFC| > 0.585, p < 0.05, to obtain diferential genes, including 69 upregulated genes and 128 downregulated genes. Te molecular function (MF), biological process (BP), and cytological components (CC) of the diferential genes were analyzed using the "clusterProfler" in the R package. Components (CC) and KEGG pathway analysis.

Gene Co-Expression Analysis.
Enter the search term "PADI1" in "Protein Name" on the STRING website, and select "Homo sapiens" for "Organization." A total of 200 proteins interacting with PADI1 were identifed. By using Cytoscape software, we analyzed the protein interactions with diferent color lines and formed a visual graph.

Construction and Validation of PCGs Risk Model.
Tree independent co-expressed genes were integrated, resulting in 237 co-expressed genes. After univariate Cox regression analysis, we used the R package "caret" to open a prognostic model for colorectal cancer with ten gene signatures by least absolute shrinkage and selection operator (LASSO) regression, and the formula of the risk score model was established as follows: risk score � (βi * Expi). i index represented the signifcant prognostic genes analyzed by Lasso regression. βi represents the beta coefcient of these genes.

2.5.
Plotting of Norman Diagrams. Cox univariate and multivariate regression analysis of PCGs was performed with the "survival" R package. Age, sex, PCGs, and clinical stage were included as variables, and the prognostic model was constructed by using the R package "rms." 2.6. Te Relationship between the Construction of the CRC Immune Landscape and Risk Scores. Bioinformatics analysis based on transcriptomic data, several reliable algorithms have been established to quantify the relative proportions of immune swelling cells in individual samples, including ESTI-MATE and cell classifcation methods [16]. In the present study, we calculated and compared the content of immune swelling cells between the high-risk and low-risk groups. Briefy, estimation allowed us to calculate the immune score and estimated score for each patient. We also assessed the correlation between risk score and immune swelling cells using Spearman's correlation analysis. Finally, we measured the CIBERSORT to measure 22 immune cells in tissues, including seven T cell types, naive B cells and memory B cells, plasma cells, NK cells, and myeloid subpopulations [17].

Te Relationship between Risk Score and Immunotherapy.
Te TCIA database was developed mainly based on the nextgeneration sequencing data from TCGA. Each patient is analyzed separately. ID, disease, gender, and age information, we focus on the information in the IPS column. Te IPS (immunophenoscore) column has four items with diferent attributes that can be good predictors of CTLA-4 and PD-1 responsiveness [18]. Te tumor immune dysfunction and exclusion (TIDE) algorithm models other tumor immune escape mechanisms, including tumorinfltrating cytotoxic T lymphocyte (CTL) dysfunction and immunosuppressive factor exclusion of CTL [19]. A higher TIDE score indicates that tumor cells are more likely to induce immune escape, thus indicating a lower response rate to ICI therapy.

Gene Set Enrichment Analysis.
Gene set enrichment analysis (GSEA) was performed on risk genes to obtain this model's HALLMARK and KEGG pathways for MsigDB (c2.cp.kegg.v7.5.1.symbols.GMT; h.all.v7.5.1.symbols.GMT). Te genes expressed between the high-and low-risk categories were studied for gene set enrichment. Te alignment of this gene set was tested 1000 times to demonstrate its ability to sustain function.

PADI1 Expression Levels and Survival Prognosis in CRC.
We frst observed the diference in mRNA expression levels of PADI1 in pan-cancer, and the TIMER database showed that PADI1 was expressed at high and at low to medium levels ( Figure 1(a)). Next, we looked at the expression of PADI1 in COAD and READ on the UACLAN website, respectively. Te results showed that PADI1 expression was higher in cancer tissues than in paraneoplastic tissues in COAD and READ (p < 0.05, Figures 1(b)-1(c)). To further clarify the survival prognosis diferences of PADI1 in different clinical features of CRC, we observed the survival diferences of PADI1 by six clinical features, namely, age, gender, tumor size, lymph node metastasis, distant metastasis, and TNM stage, respectively ( Figure 2).

GO and KEGG Analysis of PADI1.
Tere were 96 differential genes in the PADI1 high and low expression groups (|logFC| > 0.5, p < 0.05), including 80 upregulated genes and 80 downregulated genes. GO analysis showed that diferential genes were mainly enriched in BP for functions such as epithelial cell diferentiation and keratinization, CC for extracellular matrix remodeling and signaling receptor activation, and MF for cell membrane fxation components (Figure 3(a)). KEGG pathway analysis showed that the diferential genes were enriched in several functions that promote intercellular interactions, extracellular matrix receptor interactions, and PI3K-AKT signaling pathway (Figure 3(b)).

Co-Expression Network Construction of PADI1.
Te top 50 genes co-expressed with PADI1 were selected (Figure 4(a)). Next, the top 20 genes that interacted most closely with PADI1 were further mapped using Cytoscape software (Figure 4(b)). Te GenneMANIA website further predicted the specifc interactions of PADI1 with some genes with physical binding, co-expression, etc., (Figure 4(c)). Finally, we used the colorectal cancer microarray in the TCGA database to map 11 genes in which PADI1 was associated (p < 0.05, Figure 4(d)). Te abovementioned results reveal the core value of PADI1 in the cancer regulatory network.

Construction and Validation of a Risk Score for PADI1
Co-expression Genes. We conducted a univariate Cox regression analysis using a total of 30 genes co-expressed with CRC prognosis. Subsequently, we extracted 13 PCGs obtained from Lasso regression analysis. A prognostic model of CRC risk with PADI1-associated gene (PCGs) signature was constructed, and the CRC cohort in TCGA was randomly divided into two cohorts in the ratio of 7 : 3, the training cohort and the internal validation cohort, and GSE39582 was used as the external validation cohort. In both the training cohort and the internal validation cohort, a higher proportion of high-risk patients died, while a higher proportion of low-risk patients survived long-term (p < 0.05, , indicating that the model has a good predictive efect. Te established prognostic features were then further validated using GSE39582 as an external validation cohort. Te results showed that the Kaplan-Meier survival curve showed a worse prognosis in the high-risk group than in the low-risk group (p < 0.05, Supplementary Figure 3A), and the proportion of risk varied with increasing risk score (Supplementary Figure 3B-3C), and the area under the ROC curve showed AUC of 0.628, 0.661, and 0.648 at 1, 3, and 5 years, respectively (Supplementary Figure 3D).

Establishment of Norman Diagrams for PCGs.
To validate these candidate prognostic genes as independent biomarkers, univariate and multivariate Cox regression analyses were used to assess whether the predicted value was an independent prognostic factor (Figures 6(a)-6(b)). Te risk scores of age and stage combined were selected to construct the Nomogram model, as shown in Figure 6(c). Te corrected plots of the Nomogram ( Figure 6(d)) show better agreement between the predicted OS results and the actual observations, indicating the good predictive performance of the PCGs prognostic model. Te results show that age, gender, stage, and risk score prognostic characteristics incorporated in the model.

PCGs Were Highly Expressed in the Immune
Microenvironment. Immune scores were higher in the lowrisk group than in the high-risk group, with no diference in stromal scores (Figure 7(a)). Tere was a positive correlation with M0 macrophages and a negative correlation with CD4+ T cells, plasma cells, eosinophils, dendritic cells, and helper T cells (Figure 7(b)). CD4+ T cells were more abundantly infltrated in the low-expression group, and Treg cells and M0 macrophages were more abundantly implanted in the high-risk group (Figure 7(c)).

Correlation of PCGs with Immune Checkpoints and
Chemokines. To clarify the correlation of genes in PCGs with immune checkpoints, we correlated PADI1 and coexpressed genes with 47 immune checkpoints and 42 chemokines. Te results were presented as heat maps, which showed that PADI1, MUC12 and CRACR2B were negatively correlated with immune checkpoints overall, CA2, CLDN23, EDEM1 ITLN1, PNRC1, SPINK4 and TDRD7 were positively correlated with immune checkpoints (Figure 8(a)); PADI1, MUC12 and CRACR2B were negatively correlated with chemokines in general, and CA2, CLDN23, EDEM1, ITLN1, PNRC1, SPINK4 and TDRD7 were positively correlated with chemokines in general (Figure 8(b)).

Relationship between PCGs Risk Scores and
Immunotherapy. Next, we further analyzed the relationship between IPS scores and risk scores, and Figures 9(a)-9(d) show that the overall IPS scores were higher in the low-risk group than in the high-risk group (p < 0.05, Figures 9(a)-9(d)). Te TIDE algorithm assessed the immunodefciency, immune escape, MSI and TIDE scores of CRC, and the results showed that the immunodefciency and immune escape ability of the low-risk group were lower than those of the high-risk group; Te MSI level was higher than that of the high-risk group; and the TIDE score was lower than that of the low-risk group p < 0.05, Figures 9(e)-9(h)). Te abovementioned results indicated that the immunotherapy efcacy was better in the low-risk group than in the high-risk group in the risk score constructed by PCGs.

Identifcation of PCGs Risk Scores for Functional
Enrichment Analysis. GSEA was employed to identify the pathways enriched in the HALLMARK and KEGG databases, showing the top fve pathways in the NES score. Te top fve signaling pathways in the high-risk group in the KEGG database were mainly pathways for intercellular interactions, and the low-risk group was mainly enriched in some molecular metabolic pathways (Figures 10(a)-10(b)); the top fve pathways in the high-risk group in the HALLMARK database were the top fve pathways in the high-risk group in the HALLMARK database were angiogenesis, epithelial-mesenchymal transition, KRAS, and WNT signaling pathways; the low-risk group was enriched in cell cycle molecules and oxidative phosphorylation pathways (Figures 10(c)-10(d)).

Discussion
Tere is growing evidence linking the PADs family to carcinogenesis and tumor immune tolerance [20]. However, apart from a previous study that identifed PAD1 as an EMT that can regulate TNBC and is a biomarker for early oral squamous carcinoma, no studies have been conducted to correlate the tumorigenic potential of PAD1 [12]. Previous studies have always studied single genes as a starting point but ignored the related regulatory role between gene-gene [21,22]. Tis study was the frst to integrate single genes with their co-expressed genes to develop a CRC risk prediction model for PCGs.
To expand our understanding of the role of PAD1 in CRC, we evaluated the expression of TCGA in colorectal cancer. We showed that PAD1 expression was upregulated in colorectal cancer patients and positively correlated with CRC. We mapped co-expression regulatory networks (PCGs) with PADI1 as the core. We used these coexpressed genes to construct a risk prediction model for CRC. the model constructed by PCGs has the value of predicting CRC prognosis. Te constructed Norman diagram can better predict CRC 1, 3, and 5 years survival rates than TNM staging. Moreover, there is a relationship between PCGs and CRC immune microenvironment. Indeed, the immune score was higher in the low-risk group of   PCGs, which may be related to the fact that PADI1 has been reported to have an immunosuppressive function, an ability that the tumor may exploit to promote its ability to escape immune cells. Te abundance of immune cell infltration calculated from CIBSORT showed that PADI1 was negatively correlated with CD4+ T cells, plasma cells, and helper T cells and that CD4+ T cell infltration was less in the low-risk group. In contrast, Tregs cell infltration was more abundant. Nine genes in the PCGs model correlated with PADI1, so we looked at these genes separately concerning the immune. We, therefore, performed an analysis of the relationship between these genes and immune checkpoints and chemokines separately, which contains both positive and negative correlations, precisely due to the complexity of the PADI1 regulatory network. Finally, we evaluated the relationship between PCGs risk scores and IPS and TIDE. Both immunotherapy predictions suggested better efcacy in the low-risk group than in the high-risk group. Tis indicates that PADI1 and its co-expressed genes may serve as new markers for clinical immunotherapy and improve clinicians' predictions for CRC immunotherapy. In exploring the mechanisms associated with poor prognosis and poor immunotherapy in the highrisk group of PCGs, we found extracellular matrix remodeling, PPAR signaling pathway, angiogenic signaling pathway, EMT signaling pathway, KRAS signaling pathway, and WNT signaling pathway were highly enriched [23][24][25]. Tese pathways were reported to have     VTCN1  TNFSF9  TNFSF4  TNFSF18  TNFSF15  TNFSF14  TNFSF9  TNFSF8  TNFSF4  TNFRSF25  TNFRSF18  TNFRSF14  TMIGD2  TIGIT  PDCD1LG2  PDCD1  NRP1  LGALS9  LAIR1  LAG3  KIR3DL1  IDO2  IDO1  ICOSLG  ICOS  HHLA2  HAVCR2 CTLA4 a relationship with the immune escape of tumors. It is suggested that low-risk patients are immunogenetically "hot" tumors and high-risk patients are immunogenetically "cold" tumors [26]. PAD gene family is all located on the short arm of human chromosome 1, region 3, band 6 (1p36), in a highly clustered gene cluster, hence, the name PADI. Interestingly, this locus is expected to contain a novel, as yet undefned, protein associated with tumorigenesis. In recent years, PADsmediated protein guanylation has received much attention due to its diference from traditional phosphorylation and acetylation modifcations [27]. For example, PADI2 and PADI4 can catalyze the guanylation of histones H3 and H4 at the gene promoter, leading to local alterations in chromatin structure and regulation of tumor-associated gene transcription in human breast cancer cells. Following PADI1-mediated guanylation, the loss of charge on target protein substrates is thought to lead to the breakdown of cytokeratin-polyserin complexes and protein degradation of these target proteins [20]. Apart from its role in epidermal function, we are poorly informed about the potential functions of PADI1 in other physiological or pathological activities.
Despite the merits of the PCGs signature, our study has some limitations that need to be addressed. First, due to the retrospective nature of this study, our views should be interpreted with caution. Second, sampling bias may be unavoidable due to genetic heterogeneity within tumors. Tird, although we validated the predictive value of the new signature for prognosis, immune cell infltration, and treatment response using various methods, external validation is needed for other independent CRC cohorts.

Conclusions
Genes from PADI1-related co-expression as a newly developed signature show great potential as prognostic biomarkers and immunotherapy predictors in colorectal cancer patients. Prospective studies are essential to further validate the predictive accuracy of this signature before applying it to the individualized management of CRC in a clinical setting.

Data Availability
Publicly available datasets were used in this study. Tese data can be found in the cancer genome Atlas (TCGA) database and gene expression omnibus (GEO) database.

Supplementary Materials
Supplementary