Integrated Single-Cell RNA-Sequencing Analysis of Gastric Cancer Identifies FABP1 as a Novel Prognostic Biomarker

Gastric cancer (GC) is usually diagnosed in an advanced stage at the first visit due to the atypical clinical symptoms. The low surgical resection rate and chemotherapy sensitivity result in dismal survival. Therefore, it is urgent to develop novel biomarkers with high sensitivity and specificity to accurately assess the prognosis of GC patients. In the present study, 3385 differentially expressed genes (DEGs) were obtained from the single-cell RNA sequencing data of GC specimens. Using the unsupervised dimensionality reduction, we further found 3 subsets of cells including gastric cells, plasmacytoid dendritic cells, and memory T cells. Based on the cell clustering, we explored the key regulatory genes for GC progression by pseudo-time analysis and functional enrichment analysis. According to the results, the significant differentially expressed fatty acid-binding protein 1 (FABP1) verified by pseudo-time analysis was identified as the hub gene of GC progression. FABP1 was shown to be closely related to the long-term survival and the age at diagnosis of patients with GC in analysis based on the TCGA (The Cancer Genome Atlas) database. To further verify the role of FABP1 in GC, we performed immunohistochemical (IHC) analysis using the GC tissue microarray and found that the expression level of FABP1 was higher in GC tissues than in the adjacent tissues. Moreover, GC patients with higher expression of FABP1 had a worse clinical outcome. In summary, our study revealed that FABP1 is a potential effective biomarker for the prognosis of GC, and high expression of FABP1 predicts unsatisfactory survival.


Introduction
Gastric cancer (GC) is one of the most common gastrointestinal cancers in the world, with the second-highest mortality rate [1]. Global GC incidence varies widely, with the highest incidence in East Asia [1]. e current treatment strategies for GC are mainly surgery, chemotherapy, radio-chemotherapy, and targeted therapy. However, chemotherapy resistance and high postoperative cancer recurrence rate usually lead to cancer-related death [2,3]. erefore, it is critical to study the molecular mechanisms of gastric cancer progress and to identify novel biomarkers for early gastric cancer. gene was first found in the liver, located on chromosome 2, encoding a protein of 127 amino acids (aa). FABP1 protein has a classical β structural fold and two short alpha-helices [8][9][10]. FABP1 has multiple functions, such as a protective agent against oxidative stress, and is involved in the regulation of adipogenesis and lipid metabolism [11]. In addition, FABP1 is associated with various diseases such as liver fibrosis, nonalcoholic steatohepatitis, acute kidney injury, renal ischemia/reperfusion injury, type I diabetes, and type II diabetes [12]. Studies have shown that the expression level of FABP1 is closely related to the occurrence and progression of various tumors. FABP1 is found highly expressed in Barrett's esophagus [13]. e expression of FABP1 is also positively correlated with the incidence of pancreatic cancer, especially the diabetes-related pancreatic cancer [14]. FABP1 has also been found to be downregulated in cancers. Low expression of FABP1 was found in the early-stage colorectal cancer and 93% of microsatellite unstable colorectal cancers [15,16].
In this study, we analyzed the single-cell sequencing data of gastric cancer and found that FABP1 was one of the most significant differentially expressed genes (DEGs) in GC tissues. FABP1 was identified as the hub gene in GC progression. e TCGA analysis also showed that FABP1 was closely related to the prognosis of GC patients. e IHC analysis based on GC tissue microarray further verified that FABP1 was highly expressed in GC tissues, confirming that GC patients with higher expression of FABP1 have a lower long-term survival rate. erefore, we proposed that FABP1 is a promising biomarker for GC.

Data Acquisition and Preprocessing.
After searching through the Gene Expression Omnibus (GEO, http://ncbi. nlm.nih.gov/geo/) database [17], we downloaded the reads per kilo base per million mapped reads (RPKM) scRNA-seq data from the GSE134520 [18]. e data were constructed by thirteen gastric antral mucosa biopsies with the pathologic diagnosis including nonatrophic gastritis (NAG), chronic atrophic gastritis (CAG), intestinal metaplasia [17], or early gastric cancer (EGC). ese data also match the platform annotation of GPL20795 HiSeq X Ten (Homo sapiens) [18]. e study flow chart is shown in Figure 1.
During data preprocessing, we read the original expression values by the Seurat function, and the number of genes and the Unique Molecular Identifiers (UMIs), representing the non-normalized expressional values within a cell, were automatically calculated [19,20]. e sum of the percentages for mitochondrial was calculated with the criterion of filtration of 5%. Here, cells with expressed genes <100 and genes expressed in <3 cells were removed from the dataset. e LogNormalize algorithm was used to normalize the original data, and the FindVariableGenes algorithm was used to find the variable features [19,20].
In addition, based on the orthogonal transformation algorithm, the principal component analysis (PCA) analysis was applied to the dimension reduction process of scRNAseq data to highlight data features in lower dimensions [21].
After selecting the key components, the important dedicator contributing to data differentiation, the tSNE method was selected to detect the cell subtypes with the data resolution of 1.0 (3,4). For the patterns of the gene expression matrix of a cell corresponding to different cell subtypes, we selected the SingleR and scCATCH methods to identify the cell subtypes [19,20,22].

Differential Gene Analysis.
e FindAllMarkers function is a frequently used method to detect the differentially expressed genes (DEGs), based on the Wilcox analysis, with the criterion of log |fold changes (FC)| over 0.25 and P < 0.05 after FDR correction [19].

Analysis of Biological Functions of Differential Genes.
Using the clusterProfiler package and MetaScape database (http://metascape.org/gp/index.html#/main/step1), we further analyzed the Gene Ontology (GO) [23] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, respectively [24]. e MetaScape database is an international authoritative functional database of gene annotation, visualization, and integrated exploration, facilitating the integration of genetic pathways and functional enrichment. A P-value < 0.05 was considered statistically significant in our GO function and KEGG pathway analysis.

Pseudo-Time Analysis and Pseudo-Time DEG Analysis.
Many of the cellular states of the various cell fate processes are not perfectly synchronized, some cells are at the beginning of a particular process, while others are already at the completion of that process, which is also known as "asynchronous." Based on the Monocle reverse embedding graph algorithm, we quantified the transformation under different cell states and the transcription state of the corresponding gene set and form a trajectory by sorting these cells according to this transcriptional process, thus tracking the process change function that accompanies the trajectory, called pseudo-time analysis. Pseudo-time is an abstract unit of differentiation, which is only a distance from the cell to the start of the trajectory, measured along the shortest path. By estimating SizeFactors algorithm, we evaluated the gene expression range of each cell and used it for subsequent normalization and calculation of gene variance. Differential GeneTest algorithm is used to calculate the differential core regulatory genes in pseudo-time analysis, and B-H correction P value <0.01 is considered to be pseudo-time DEGs.

Analysis of Protein Interaction Networks Based on DEGs.
After deriving the DEGs of the hub pathway, we performed the core expression analysis in a volcano map based on the average Log |FC| and Negative Log10 (adjusted P value). Subsequently, the hub tagged target proteins were calculated based on the COMPPI database (http://comppi.linkgroup. hu/) [25]. e COMMPI platform was used to integrate the subcellular localizations, protein-protein interactions, and scores of localizations and interactions, with the locations of cytosol, mitochondrion, nucleus, extracellular secretory pathway, and membrane.

Functional Enrichment and Prognosis Analysis of Hub
Genes. We used the ToppGene database (https://toppgene. cchmc.org/) to annotate and visualize the hub marker's functional enrichment based on Gene Ontology [23] and pathways [26]. e terms with P < 0.05 were significantly enriched. To gain further insight into the hub gene expression and its association with prognosis, we applied a friendly online web tool Wanderer (http://maplab.imppc. org/wanderer/) to identify the important clinical features correlated with transcriptional expression. Additionally, the SurvExpress (http://bioinformatica.mty.itesm.mx:8080/ Biomatec/SurvivaX.jsp) online database was applied to validate the prognostic relationship among the FABP1 expressions [23]. Besides, the statistical difference was considered with the P value <0.05.

Immunohistochemical Analysis and Statistical Analysis.
Gastric cancer tissue microarray (TMA, HStmA180Su19) was obtained from Shanghai Outdo Biotech, including 94 cases of gastric cancer tissues and 86 cases of adjacent tissues; with complete case data and follow-up information, more detailed sample information and clinical features of colorectal cancer are shown in Table 1. e IHC assays and IHC scores were performed with a previously described protocol [27], and the antibodies against FABP1 were purchased from Abcam (MA, US, ab171739). e statistical analysis was conducted by the GraphPad Prism software 8.0 (GraphPad Software, Inc., San Diego, CA, USA). Data are represented as means ± standard deviations. e expression level of FABP1 in gastric cancer tissues and adjacent tissues was analyzed by Student's t-test, the Chi-square test was used for the analysis of clinicopathological features, and the Kaplan-Meier method and the log-rank test were used for survival analysis. P < 0.05 was considered statistically significant.

Differential Gene Expression Analysis.
Using the analytical methods described in Methods, we filtered out cells with unique feature counts over 4000 or less than 200. e sum of 3385 variable features expressed in 4110 early gastric cancer (EGC) cells was subjected to scRNA-seq background correction, normalization, and differentially expressed (DE) analysis (Figures 2(a) and 2(b)). A total of 15 PCs were found to simultaneously meet the selection criteria of the contribution degree model and PCA analysis (Figures 2(c) and 2(d)).

tSNE Cell Subtype Detection. After annotation and identification, we detected the gastric cells, plasmacytoid dendritic cells, and memory T cells. e corresponding top
DEGs expression profile is presented in Figure 3(b). e expression levels of the top 8 hub genes in the corresponding cell clusters are shown in Figure 3(c).

Biological Function Analysis of Differential Genes.
After selecting the appropriate cellular principal components ( Figure 4(a)), we performed a pseudo-time analysis based on the monocle algorithm with the data reduction by the DDtree method (Figures 4(b) and 4(c)). In addition, we further analyzed the most significant GO functions and KEGG pathways of DEGs by the pseudo-time analysis. e significant GO functions mainly involved three main aspects: biological processes (BP), cellular components (CC), and molecular functions (MF). In terms of BP-related functions, the DEGs were mainly associated with protein targeting (enriched genes � 68, P value � 3.85E-36), the establishment of protein localization to the membrane (enriched genes � 62, P value � 2.78E-26), and neutrophil degranulation (enriched genes � 58, P value � 4.01E-22). For CC functions, the DEGs are closely related to an adherent junction (enriched genes � 48, P value � 1.05E-10), cellsubstrate junction (enriched genes � 43, P value � 1.85E-10), and focal adhesion (enriched genes � 38, P value � 2.23E-10). In terms of MF, DEGs are mainly associated with cell adhesion molecule binding (enriched genes � 51, P value � 5.12E-06), structural constituent of ribosome (enriched genes � 42, P value � 4.87E-06), and enzyme inhibitor activity (enriched genes � 28, P value � 2.91E-05). Visualization of the GO functions occupied by DEGs is shown in Figure 4(d). In terms of their KEGG pathways, the DEGs were significantly correlated with oxidative phosphorylation (enriched genes � 45, P value � 3.21E-12), protein processing in the endoplasmic reticulum (enriched genes � 38, P value � 2.75E-10), and TNF signaling pathway (enriched genes � 31, P value � 1.85E-09). e KEGG pathway enrichment results are shown in Figure 4(e).

Hub Marker Detection and COMPPI Network Analysis.
e volcano map in Figure 4(f ) shows the average fold changes in the expression of the marker and adjusted P value (the fold change value is mainly based on ComPPI to analyze the interaction regulation of target proteins at the subcellular level to explore the possible regulation or interaction proteins at the levels of the nucleus, cytoplasm, mitochondria, and cell membrane). Using the COMPPI database and Cytoscape software, network diagrams were generated and FABP1 targeted genes in the network were screened based on the differential cellular locations of correlated connectivity. In this part, we targeted the FABP1, and thus, 7 correlated proteins, located in the cytosol, mitochondrion, nucleus, extracellular secretory pathway, and membrane, were intercepted to construct the PPI network in Figure 4(g).
In Figure 5(a), results indicated that the biological functions of FABP1 were significantly correlated with hormone-sensitive lipase (HSL)-mediated triacylglycerol hydrolysis (P value � 0.0032), fat digestion and absorption (P value � 0.0047), and mechanism of gene regulation by peroxisome proliferators via PPAR-alpha (P value � 0.0051). Figure 5(b) shows that the expression of FABP1 had a significant influence on the overall survival (OS) of STAD patients (P value � 0.046) in the SurvExpress database. Besides, the expression level of FABP1 was also correlated with age at initial pathologic diagnosis (P < 0.05 and R � 0.31) and OS (P < 0.05 and R � 0.44) in the Wanderer database ( Figure 5(c)).

Upregulation of FABP1 in GC and Its Correlation with
Poor Prognosis. To further investigate the expression of FABP1 in GC tissues and its relationship with the prognosis of GC patients, we analyzed the expression of FABP1 in GC tissue microarray by IHC. e results showed that the expression level of FABP1 in GC tissues was higher than that in the noncancer tissues (Figures 6(a) and 6(b)), and the high expression rate of FABP1 in gastric cancer tissues was higher than that in noncancer tissues (Figure 6(c)). However, the expression of FABP1 was not significantly correlated with clinical-pathological features such as age, gender, tumor size, histopathological type, lymph node positive, TNM stage, and HER2 positive (Figure 6(d) and Table 1). More importantly, Kaplan-Meier analysis indicated that upregulation of FABP1 was consistently correlated with a worse prognosis ( Figure 6(d)), suggesting a tumor promotion role and prognostic value of FABP1 in GC.

Discussion
Studies have shown that about 70% of GC patients have already developed liver and peritoneal metastasis at their first visit [28]. As a result, improving the early diagnosis rate of GC is critical to promoting the survival rate of GC patients. Most GC patients are failed to accept medical examinations in time due to the atypical symptoms. e popularization and promotion of GC screening methods are particularly important. Currently, the diagnostic methods of GC are mainly endoscopy, imaging examination, and serum markers. However, many patients cannot accept the    Journal of Oncology endoscopy because it is an invasive examination. In addition, the lack of skills and experience of endoscopists and pathologists lead to the early missed diagnosis. It is also difficult to detect small lesions by imaging examination, and the disease may be already in an advanced stage when positive symptoms are detected. Nowadays, the serum markers used for the diagnosis of GC are mainly CEA, CA 19-9, and CA72-4. However, these serum markers are not or less expressed in some GC, leading to false negatives and early misdiagnosis of GC. erefore, it is important to explore and develop new biomarkers with high sensitivity and specificity for the early diagnosis of GC.
Recently, as an emerging sequencing technology, singlecell RNA sequencing can further explore the heterogeneity of malignant tumors, tumor evolution, clinical diagnosis, and treatment at different omics levels of single cells [29].
e key regulatory factors of various tumor cells have been identified using single-cell RNA sequencing technology, including factors in the immune microenvironment, drug resistance, and metastasis. Some researchers conducted single-cell RNA sequencing analysis of human liver cancer T cells and found that there are many dysfunctional CD8 + T cells and regulatory T cells in tumor tissue. By analyzing the DEGs of the two types of cells, they found that the gene Layilin can inhibit the killing function of CD8 + T cells and may become a potential target for liver cancer immunotherapy [30]. In esophageal cancer cell lines resistant to paclitaxel, single-cell RNA sequencing results showed that proteasome genes and HIF-1 signaling genes were associated with acquired paclitaxel resistance in esophageal cancer cells [31]. In a colorectal cancer study, single-cell RNA sequencing analysis was performed, respectively, on the primary, metastatic, and circulating tumor cells of the metastatic colorectal cancer. Results showed that circulating tumor cells have not only the same mutated driver genes (such as APC, KRAS, or PIK3CA) with the primary and metastatic lesions but also new variant genes [32].
In this study, we analyzed the single-cell RNA sequencing data of GC. Firstly, we analyzed 3385 variable cell features expressed by 4110 EGC cells. After annotation and identification, we detected gastric cells, plasmacytoid dendritic cells, and memory T cells. Expression levels of the top 8 (OLFM4, TFF3, TTR, CHGA, SRGN, CCL5, KRT7, and FABP1) hub genes of the corresponding cell clusters were also identified. In addition, using the pseudo-time analysis, we further analyzed the most significant GO functions and KEGG pathways of DEGs, revealing the regulatory effects of DEGs on the biological function of GC according to biological processes, cellular components, and molecular functions, as well as the closed relationship between the DEGs and oxidative phosphorylation, endoplasmic reticulum protein processing, and TNF signaling pathways. Moreover, we identified the significant DEG-FABP1 and found that FABP1 may regulate the PPAR signaling pathway, hormone-sensitive lipase (HSL)-mediated triacylglycerol hydrolysis, fat digestion, and absorption in gastric cancer progression. Survival analysis showed that higher FABP1 expression predicts a lower survival rate in GC patients. e expression of FABP1 is also correlated with the age of patients at initial pathological diagnosis. Singlecell RNA sequencing can obtain genomic and transcriptome information of cancer center cells, pericancerous cells, and distant metastasis cancer cells, so as to find effective therapeutic targets for cancer. For our analysis, we found the abnormal expression of FABP1 in early gastric cancer tissues by analyzing the data of single-cell RNA sequencing, which plays a very important role in the treatment of gastric cancer. In recent years, a number of studies have reported the use of single-cell RNA sequencing to find treatment for gastric cancer. Immune cells and stromal cells were found to exhibit cellular heterogeneity in tissues with distant metastases from gastric cancer, and genes regulating CD8+ cell depletion were screened [33]. Both inflammatory cancer-associated fibroblasts and extracellular matrix cancer-associated fibroblasts can mobilize surrounding immune cells to build a microenvironment conducive to the growth of gastric cancer cells [34]. ese results obtained by single-cell RNA sequencing undoubtedly can provide new ideas for the treatment of gastric cancer. FABP1 is a low-molecular-weight protein composed of 127 amino acids. As a lipid chaperone, each FABP1 molecule can bind to two long-chain fatty acid molecules. FABP1 can  Journal of Oncology 7 also bind to other hydrophobic ligands to regulate various biological processes such as cell growth, differentiation, and apoptosis [35]. Studies have shown that FABP1 is detected in about 38% of GC patients, mainly in gastric papillary adenocarcinoma, female cases, and patients with age less than 50. FABP1 is highly expressed in gastric intestinal metaplasia and gastric adenocarcinoma tissues, but not or less expressed in gastric tissues [36]. We performed IHC by GC tissue microarray and found that the expression level of FABP1 was significantly higher in the GC tissues than in the adjacent tissues. Furthermore, GC patients with higher expression of FABP1 had a worse prognosis. Previously, researchers had reported that FABP1 is expressed in earlystage GC, with a specificity of 95% and a sensitivity of 67% for the diagnosis of early recurrence, and patients with multiple positive results of this gene have a worse prognosis [37]. Interestingly, the expression of FABP1 has no significant correlation with clinicopathological features such as age, sex, tumor size, histopathological type, lymph node positivity, TNM stage, and HER2 positivity. However, some researchers reported that FABP1 expression was detected in the peritoneal lavage fluid of GC patients, and the prognosis of FABP1-positive patients was worse than that of CEA. At least half of them had a peritoneal recurrence, and the recurrence rate was 67% [38].
e results reported are inconsistent with ours, which may be caused by the diversity of clinical samples. In our future study, we will expand the sample size to clarify these issues, and the content of FABP1 can be detected in serum and feces of patients with a confirmed diagnosis of GC, to prove that FABP1 can be used as a marker for the diagnosis of gastric cancer. In addition, we need to conduct further research in other aspects, such as observing the biological effect of FABP1 on the GC cells after downregulating and upregulating the expression of FABP1 in vitro and in vivo and elucidating the molecular mechanism of FABP1 promoting cancer from different perspectives in terms of fat metabolism.
In conclusion, we found that FABP1 is a key regulatory gene of GC and is associated with poor prognosis based on the single-cell RNA sequencing data. Tissue microarray analysis also showed that FABP1 is highly expressed in GC      tissues, and the survival rate of patients with high FABP1 expression is lower. FABP1 is expected to become a promising marker for early diagnosis and targeted therapy of GC.

Data Availability
e data used to support the findings of this study are available from the corresponding authors upon request.

Conflicts of Interest
All the authors declare no conflicts of interest.

Authors' Contributions
Ling Huang and Fan Yang participated in the research design. Fan Yang, Lianfang Gan, Junhua Pan, Yaying Chen, Hong Zhang, and Ling Huang conducted experiments. Fan Yang and Lianfang Gan performed data analysis. Fan Yang, Lianfang Gan, Junhua Pan, Yaying Chen, Hong Zhang, and Ling Huang wrote or contributed to the writing of the manuscript. Fan Yang and Lianfang Gan contributed equally to this work.