Identification of CD8+ T Cell-Related Genes: Correlations with Immune Phenotypes and Outcomes of Liver Cancer

Purpose Treatment outcomes for advanced liver cancer are poor. Immunotherapy is a treatment strategy that has been widely used to treat other cancers. Studies have shown that CD8+ T lymphocytes are essential factors affecting the efficacy of immunotherapy. We used computational biology methods to determine the coexpressed gene network that promotes CD8+ T lymphocyte infiltration. Method We obtained the liver cancer gene matrix and clinical follow-up information data from TCGA liver hepatocellular carcinoma FPKM. We obtained single nucleotide polymorphism (SNP) data to evaluate the tumor mutation burden. The “estimate” package and the CIBERSORT algorithm were used to evaluate tumor purity and the proportion of CD8+ T lymphocytes in the liver cancer cohort. We used the gene expression matrix of liver cancer and the relative proportion of CD8+ T lymphocytes as input files and performed WGCNA based on this analysis. The weighted coexpression network identified the most CD8+ T lymphocyte-related coexpression modules in liver cancer. Then, we analyzed the biological processes involved in the module. We determined the coexpression module with CD8+ T lymphocyte infiltration in terms of data and function. We then screened the factors in the coexpression module correlated with CD8+ T lymphocyte content greater than 0.4. Finally, the expression levels of these factors were verified at the protein level using immunohistochemistry and single-cell sequencing. Results We determined the CD8+ T lymphocyte proportions that correlated with coexpression networks. Four coexpressed genes (C1QC, CD3D, GZMA, and PSMB9) were identified as CD8+ T cell coexpression genes that promoted infiltration of CD8+ T cells. Because the factors in the coexpression network often participate in similar biological processes, we found that these factors were most related to antigen processing and presentation of peptide antigen through functional enrichment. In the clinical phenotype analysis, we found that 18 factors can be used as independent prognostic protective factors. We found that these factors were significantly negatively correlated with tumor purity and negatively correlated with M2 macrophages in the immunophenotyping analysis. Using immunohistochemistry and single-cell sequencing analysis, we found that CD3D antibody staining was weaker in tumor tissues than normal tissues and was related to CD8+ T cells. Conclusion These coexpressed genes were positively related to the high infiltration proportion of CD8+ T lymphocytes in an antigen presentation process. The biological process might provide new directions for patients who are insensitive to immune therapy.


Introduction
In recent years, breakthroughs have been made in using immune checkpoint inhibitors [1], ushering in a new era to treat advanced tumors. The emergence of immunotherapy provides options for the treatment of liver cancer; these include immune checkpoint inhibitors, adoptive cell transfer, tumor vaccines, and cytokine therapy. Immune checkpoint inhibitors enhance antitumor immune responses by revers-ing the exhaustion of T cell function and restoring immune recognition and immune attack. Immune checkpoint inhibitor targets include programmed death-ligand 1 (PD-L1) and its receptor PD-1 (programmed cell death protein 1) and cytotoxic T lymphocyte-related antigen 4. PD-1 is a member of the CD28 family and is expressed on the surface of most immune cells, mainly on CD8+ T cells [2]. It binds to PD-L1 and PD-L2 to cause inhibitory signals to be transmitted to T cells and induce tolerance [3]. PD-L1 is abnormally expressed in various tumors, including liver cancer; tumor cells achieve immune escape by abnormally expressing PD-L1 or PD-L2 [4]. Nivolumab (PD-1 monoclonal antibody) [5], pembrolizumab (PD-1 monoclonal antibody), and atezolizumab (PD-L1 monoclonal antibody) have been used to treat advanced liver cancer. Although immunotherapy has achieved gratifying results and has been approved as a treatment option for liver cancer, the objective response rate of the PD-1/PD-L1 antibody alone rarely exceeds 40%. The objective response rate of nivolumab and pembrolizumab in liver cancer was not significantly effective [6]. It is currently believed that the main reason for the low objective response rate of immunotherapy is the emergence of drug resistance [7]. The drug resistance of immunotherapy is a complex and multimechanism interdependent dynamic process, explained by impaired immune infiltration in tumors, depletion of T cells, or recruitment of immunosuppressive cells.
Studies have shown that low PD-1 expression in tumor tissues and low CD8+ T lymphocyte infiltration can cause this insensitivity [8]. According to the results of a metaanalysis of non-small-cell lung cancer, increased numbers of CD8+ tumor-infiltrating lymphocytes are associated with better overall survival [9]. In patients with advanced melanoma treated with pembrolizumab, the densities of CD8+ T cells in the invasion margin and tumor center of the tissue specimens of responders were higher than those of nonresponders [10].
Therefore, this article intends to identify the coexpression modules and functions related to the content of CD8+ T lymphocytes in the tumor microenvironment and define the coexpression network related to the content of CD8+ T cells, to provide a basis for improving the efficacy of immunotherapy.

Data Collection and
Preprocessing. TCGA expression matrix data from LIHC samples were downloaded from The Cancer Genome Atlas (http://cancergenome.nih.gov/). Tumor transcriptomic profiles of 19530 mRNA were measured in 377 liver hepatocellular carcinoma patients and were brought into the subsequent analysis. The hepatocellular carcinoma single-cell mice sequencing cohort GSE129516 [11] was also downloaded from the GEO database, whose platform is GPL24247.

CD8+ T Cell Proportion Evaluation.
To obtain the relative proportion of CD8+ T lymphocytes in each liver hepatocellular carcinoma sample, we used the CIBERSORT [12] method. CIBERSORT evaluated the proportion of 22 tumor-infiltrating immune cells in each sample. The samples with p < 0:05 were brought into the WGCNA.

Tumor Microenvironment Score and Tumor Mutation
Burden (TMB). The estimation of stromal and immune cells in malignant tumor tissues [13] is a method that evaluates the proportion of stromal and immune cells by gene expression signatures [13]. To perform a correlation analysis with the tumor mutation burden [14,15], we obtained TCGA-LIHC SNP data and evaluated the tumor mutation burden of each sample.

Weighted Gene Coexpression Network Analysis
(WGCNA). We used WGCNA [16] to explore the correlations between gene expression and CD8+ T lymphocytes. First, we matched the mRNA matrix of TCGA-LIHC and the corresponding CD8+ T lymphocyte content of the       sample and included them in the subsequent WGCNA. To ensure the nonscale of the network, we built a scale-free topology network [17,18] and set the soft threshold at 5 (R-squared = 0:78, slope = −1:93) and the number of genes in the minimum module at 30.

Gene
Ontology Functional Analysis. Gene ontology (GO) [19] analysis was performed to show the biological processes and molecular functions based on the different modules. In this study, we performed GO analysis based on the cluster-Profiler [20] package in R3.6.2.

Cox Hazard Proportion Regression Model and Subgroup
Evaluation. We applied the Cox risk proportional regression model for the factors in the grey module. The prognostic risk model related to CD8+ T lymphocytes in liver cancer was determined using the forward screening method. We used survival analysis and the area under the curve (AUC) to evaluate the accuracy of the prognostic risk model after rain. Next, we divided TCGA-LIHC cohort into various subgroups according to age, gender, clinical stage, and tumor purity and evaluated the CD8+ T lymphocyte-related risk models in various subgroups.
2.7. Gene Set Enrichment Analysis. Gene set enrichment analysis (GSEA) [21] was used to calculate the most involved pathway to these coexpression genes.
2.8. Immunohistochemistry. The extracted human tissues were fixed with a 4% formaldehyde buffer. Deparaffinized specimens were then sectioned into 4 μm slices. Tissue slices were incubated at 60°C for 2 h before dewaxing; the sections were autoclaved at 115°C for 3 min for antigen retrieval in a citric acid buffer (pH 6.0) and quenched for endogenous peroxidase activity with 0.3% H 2 O 2 solution for 15 min.
Then, the slices were blocked for nonspecific binding with normal goat serum for 45 min and incubated with the specific primary antibody against C1QC, GZMA, CD3D, and PSMB9 (dilution 1 : 200) overnight at 4°C. Subsequently, the sections were treated with the goat anti-mouse secondary antibody for 30 min at room temperature. Protein expression was visualized using 3,3′-diaminobenzidine. Images were captured using a Nikon Eclipse 80i microscope (Nikon Corporation).

Results
A flow chart is displayed in Figure 1, which illustrates the logic of our analysis.

CD8+ T Lymphocyte, Tumor Purity, and Tumor Mutation
Burden Evaluation. We measured the CD8+ T lymphocyte proportion, tumor purity, matrix score, immune score, and tumor mutation burden of each LIHC sample. Using the screening principle of p < 0:05, we obtained 377 liver cancer samples accurately evaluated by CD8+ T lymphocytes. By integrating the immune-related file with TCGA-LIHC mRNA expression files, we determined WGCNA phenotype entry files.

CD8+ T Lymphocyte Coexpression Network Conduction
in TCGA-LIHC. WGCNA was performed on TCGA liver hepatocellular carcinomas. To construct a CD8+ T lymphocyte coexpression network, we used a dynamic hybrid cutting method to construct a hierarchical clustering tree (Figure 2(a)). Each leaf on the tree represents a gene, and each branch represents a coexpression module. A total of 22 expression modules were obtained (Figure 2(b)). Then, we calculated the correlation coefficient between each module and CD8+ T lymphocytes and selected the grey60 and purple modules according to the correlation coefficient

CD8+ T Lymphocyte Coexpression Module Functional
Analysis. We determined the top 20 CD8+ T lymphocyte proportions positively coexpressing mRNA in TCGA-LIHC grey60 and magenta modules (Tables 1 and 2). The 20 CD8 + T lymphocyte proportions positively coexpressing mRNA in the magenta module were most significantly enriched in antigen processing and presentation. The 20 CD8+ T lymphocyte proportions positively coexpressing mRNA in the grey60 module were most significantly enriched in regulating lymphocyte activation, suggesting that these biological processes might promote CD8+ T lymphocyte infiltration in the liver hepatocellular cancer microenvironment (Figures 3(a) and 3(b)). The protein-protein interaction network of yellow and green modules is shown in Figure 3.      (Figure 4). These results suggest that these coexpression genes in grey60 and magenta modules protect against liver hepatocellular cancer. Next, we found that these factors were positively correlated with CD8+ T lymphocytes ( Figure 5) and negatively correlated with tumor purity (Figure 6).

Cox Regression
The samples in high-risk samples for liver hepatocellular cancer patients (TCGA: p < 0:001; HR = 23) (Figure 7) showed survival risk against low-risk groups, with AUC = 0:67 (Figure 7). The risk score was evaluated in various subgroups, including age, gender, stage, tumor purity, and tumor mutation burden. The results were significant in these subgroups.
3.6. GSEA. Antigen processing and presentation, the chemokine signaling pathway, the cytokine-cytokine receptor interaction pathway, and the T cell receptor signaling pathway were related to the high expression group in C1QC, CD3D, GZMA, and PSMB9 (Figure 8). The results showed that CD3D had lower staining intensity (Figure 9(a)). We also performed UMAP dimensional reduction clustering in the data of a single cell of liver cancer in GEO. After annotating the subsets with "SingleR," we obtained the subsets containing T cell macrophages and other cells. We found that the expression content of CD3D was relatively high in the T cell subsets, which confirmed our previous conclusion in TCGA-LIHC (Figure 9(b)). We analyzed the relationship between CD3D and currently known gene sets in the tumor microenvironment and verified the expression association between CD3D and CD8A in two other cohorts, GSE29721 [22] and GSE121248 [23]. We also marked the distribution of different T cell subtypes in single-cell cohorts, and the results showed that CD3D was more strongly associated with CD8A and less with CD4+ T lymphocytes (Supplementary Figure 1).

Discussion
Immunotherapy is a late-stage cancer treatment strategy widely used in clinical practice; nevertheless, there are many unsatisfactory results, including the low success rate of immunotherapy, complications associated with immunotherapy, and the super progression of immunotherapy. Studies have shown that the low success rate of immunotherapy is related to the low infiltration of CD8+ T lymphocytes, the weakening of the antigen presentation process, and the reduction of PD-1 expression. In this study, we explored the coexpression network that promotes the infiltration of CD8+ T lymphocytes in liver cancer by combining computational biology and experiments. The factors in the coexpression network are considered to have similar biological functions in organisms. Therefore, mining these coexpression factors helps us understand the biological processes most closely related to the infiltration of CD8+ T lymphocytes in liver cancer.
The grey60 module was the most relevant coexpression module for CD8+ T lymphocytes. The factors in this module are related to the regulation of T cell proliferation. To date, we have demonstrated that these factors are related to the infiltration of CD8+ T lymphocytes at the sequencing and functional levels. Based on our inferences, we believe that these factors may improve outcomes by increasing the infiltration of CD8+ T lymphocytes. Then, we conducted survival analysis on these factors and successfully constructed a prognostic risk score for liver cancer based on CD8+ T lymphocyte coexpression factors. The differences in protein levels of C1QC, CD3D, GZMA, and PSMB9 in 15 Journal of Immunology Research different liver cancer stages were determined using immunohistochemistry.
C1QC encodes the complement C1q C chain, which associates with C1r and C1s to yield the first component of the serum complement system [24,25]. C1q is composed of 18 polypeptide chains which include six A-chains, six B-chains, and six C-chains. Each chain contains an N-terminal collagen-like region and a C-terminal C1q globular domain. The protein encoded by CD3D is part of the T cell receptor/CD3 complex (TCR/CD3 complex) and is involved in T cell development and signal transduction [26,27]. The encoded membrane protein represents the delta subunit of the CD3 complex. Along with four other CD3 subunits, the encoded membrane protein binds either TCR alpha/beta or TCR gamma/delta to form the TCR/CD3 complex on the surface of T cells [28]. In this study, transcriptome, histological, and single-cell cohorts were used to demonstrate the importance of C1QC and CD3D in CD8+ T lymphocytes.
This article has some limitations. We used TCGA and liver cancer tissues from China Medical University to conduct a joint analysis. More external cohorts still need to be cross-validated. Due to the limited computing power of computers, we only included factors with variances in the top 25% when performing WGCNA. This may cause some false-negative results. More advanced computers need to be used to repeat this screening method. Although we hypothesize that these factors can improve the therapeutic effect of liver cancer immunotherapy by promoting the infiltration of CD8+ T lymphocytes, due to the lack of immunotherapy efficacy evaluation in the follow-up data of TCGA, we only explored the passage of these coexpression factors in liver cancer. More cohorts with immune follow-up data need to be added to promote the infiltration of CD8+ T lymphocytes and improve the outcome of patients.
In summary, we constructed a prognostic risk scoring model for liver cancer based on a CD8+ T lymphocyte content coexpression molecular network, determined that the factors in the risk scoring model can be used as independent prognostic factors for liver cancer, and determined the levels of these factors at the mRNA and protein levels. These prognosis-related CD8+ T lymphocyte coexpression factors and their related biological functions may provide new directions for improving the efficacy of immunotherapy.

Data Availability
The datasets TCGA-LIHC for this study can be found at http://cancergenome.nih.gov/. The datasets GSE129516, GSE29721, and GSE121248 in this study can be found at http://www.ncbi.nlm.nih.gov/geo/.

Consent
All the patients signed the informed consent.

Conflicts of Interest
The authors declare that they have no competing interests.