Identification and Analysis of Crucial Genes in H. pylori-Associated Gastric Cancer Using an Integrated Bioinformatics Approach

Background. The relationship between H. pylori infection and gastric cancer (GC) has been widely studied, and H. pylori is considered the main risk factor. Using bioinformatics analysis, this study examined gene signatures related to the progression of H. pylori-associated GC.
Materials and Methods. The dataset GSE13195 was used to search for genes abnormally expressed between H. pylori-associated GC and normal tissues. The TCGA-STAD dataset was used to verify the expression of key genes in GC and normal tissues.
Results. In GSE13195, a total of 332 differentially expressed genes (DEGs) were screened. Weighted gene co-expression network analysis showed that the light cyan, plum2, black, and magenta4 modules were associated with stage (T3, T2, and T4), while the orangered4, salmon2, pink, and navajowhite2 modules were correlated with lymph node metastasis (N3, N2, and N0). Based on the DEGs and hub genes, a total of 7 key genes (ADAM28, FCER1G, MRPL14, SOSTDC1, TYROBP, C1QC, and C3) were screened out. Receiver operating characteristic curves showed that the mRNA levels of these genes could distinguish between normal and H. pylori-associated GC tissue. After transcriptional-level verification and survival analysis, ADAM28 and C1QC were excluded. Immune infiltration analysis revealed that the key genes were involved in regulating the infiltration levels of cells associated with the innate immune response, antigen presentation, the humoral immune response, or the T cell-mediated immune response. In addition, drugs targeting FCER1G and TYROBP have been approved or are under investigation.
Conclusion. Our study identified five key genes involved in H. pylori-associated GC tumorigenesis. Patients with higher levels of C3 expression had a poorer prognosis than those with lower levels. In addition, these key genes may serve as biomarkers and therapeutic targets for the diagnosis, targeted therapy, and immunotherapy of H. pylori-associated GC in the future.


Introduction
The incidence of gastric cancer (GC) is the sixth highest of all cancer types, with approximately 1,089,103 cases worldwide. GC is also the third leading cause of cancer death, with approximately 769,000 deaths each year [1]. The number of new cases of GC in China approaches 0.5 million each year [2]. Currently, the 5-year survival rate of GC patients is 32%, and more than 50% of patients are diagnosed with advanced cancer [3]. So far, surgery remains the only cure for GC [4]. With the Human Genome Project nearing completion and next-generation sequencing being widely applied, researchers have made great progress in studying the mechanisms of GC occurrence and development [5]. The new medical model arising from the joint development of sequencing technology and bioinformatics uses genomics and proteomics to guide targeted therapy, enabling GC patients to receive individualized and precise treatment [6]. Although technology has advanced considerably, efficient and timely diagnostic methods and new GC-specific biomarkers are still urgently needed to reduce the high incidence and mortality of GC.
Various risk factors affect the incidence of GC, including Helicobacter pylori (H. pylori) infection, gender, poor dietary habits, and smoking [7]. Of these, H. pylori infection, which often leads to gastritis, followed by gastric atrophy and gastrointestinal metaplasia, is most closely related to GC [8]. Currently, the detection of H. pylori and its eradication therapy can reduce the risk of GC [9]. Mechanistically, the toxic effects of the H. pylori-produced cytotoxin-associated gene A (CagA) and vacuolating cytotoxin A (VacA) proteins on gastric mucosal cells can trigger a series of complex biological effects, including the release of proinflammatory cytokines, recruitment of immune cells, and stimulation of the survival of gastric epithelial cells [10,11]. H. pylori inhibits phagocytic activity and T cell function during infection, while catalyzing the hydrolysis of urea to ensure its survival in harsh low-pH conditions. Furthermore, byproducts of H. pylori metabolism damage host epithelial cells and contribute to the carcinogenicity of H. pylori infection [12]. Despite numerous studies on H. pylori, it remains unclear whether H. pylori is involved only in the initiation of gastric tumors or whether it also affects the mechanisms of tumor progression.
In recent years, immunotherapy, a novel treatment that induces antitumor effects mainly by modulating the immune system, has made revolutionary progress in the treatment of gastric cancer [13]. The tumor microenvironment (TME) is a complex and markedly heterogeneous ecosystem consisting of many types of immune cells and acellular components of the extracellular matrix. In the TME, tumor cells and immunomodulators interact dynamically to produce positive immunotherapy responses [14]. The immune microenvironment of GC is itself in dynamic change, and whether H. pylori infection makes it more complicated remains unclear.
In this article, based on the GSE13195 dataset and the TCGA-STAD dataset, we used a series of bioinformatics methods to explore the dysregulated genes and mechanisms in H. pylori-associated GC tissues and to find possible biomarkers and targeted drugs.

Gene Set Enrichment Analysis (GSEA).
To determine the functions of the differential genes more accurately, we performed GSEA using Sangerbox Tools, comparing normal tissues with H. pylori-associated GC tissues [16]. The reference gene set was c2.cp.kegg.v7.0.

Screening for Tumor Progression-Related Modules and Central Genes by Weighted Gene Co-Expression Network Analysis (WGCNA).
Gene co-expression networks in H. pylori-associated GC tissues were constructed using Sangerbox Tools [16]. First, based on Pearson correlation analysis, 25 samples were clustered to identify outliers. Then, we set the soft threshold to 5 to achieve a scale-free topology. Subsequently, using a dynamic tree-cut approach, the genes were classified into different modules based on gene expression correlations. The expression similarity of module eigengenes was further used to merge similar modules at a height of 0.85. Module membership (MM) is the correlation of a gene's expression profile with the module eigengene, and genes with MM ≥ 0.8 were considered hub genes [18]. The protein interaction network was mapped using the STRING online database (https://string-db.org/).
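The module membership rule described above can be illustrated with a minimal Python sketch. It is not the Sangerbox implementation: the unsigned adjacency form |cor|^power is an assumption (the text does not state whether the network was signed), the module eigengene is taken as given rather than computed, and the gene names and expression values are made-up placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(x, y, power=5):
    """Soft-threshold adjacency |cor|^power (unsigned network assumed)."""
    return abs(pearson(x, y)) ** power


def hub_genes(expr, eigengene, mm_cutoff=0.8):
    """Genes whose module membership (correlation with the eigengene) is >= cutoff."""
    return [g for g, v in expr.items() if pearson(v, eigengene) >= mm_cutoff]


# Hypothetical toy data: one eigengene and two genes across five samples.
eigengene = [0.1, 0.4, 0.2, 0.9, 0.7]
expr = {
    "GENE_A": [1.0, 2.1, 1.4, 3.9, 3.1],  # tracks the eigengene closely
    "GENE_B": [3.0, 1.0, 2.5, 2.0, 1.2],  # negatively correlated with it
}
hubs = hub_genes(expr, eigengene)
```

In this toy case only GENE_A passes the MM ≥ 0.8 cutoff. In a full analysis the eigengene would be the first principal component of the module's expression matrix.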

Validation of Key Genes.
Key genes were selected from the abnormally expressed genes and hub genes. Receiver operating characteristic (ROC) curves were drawn to calculate specificity and sensitivity. To verify the accuracy and reliability of the screened key genes, the gene expression data of GC patients in the TCGA-STAD dataset (including 34 normal samples, 20 H. pylori-associated GC samples, 157 H. pylori-unassociated GC samples, and 153 other samples) were used for validation (including mRNA expression level and survival analysis) on the UALCAN website (https://ualcan.path.uab.edu/) [19].
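As an illustration of the ROC step, the following Python sketch computes the area under the curve as the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample, plus sensitivity and specificity at one threshold. The expression values and the threshold are hypothetical, and the software actually used to draw the curves in this study is not specified in the text.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when 'score >= threshold' is called positive."""
    tp = sum(p >= threshold for p in pos_scores)
    tn = sum(n < threshold for n in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


# Hypothetical expression values for one upregulated gene.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [3.9, 4.9, 4.1, 5.0]

gene_auc = auc(tumor, normal)
sensitivity, specificity = sens_spec(tumor, normal, threshold=5.05)
```

For a gene that is downregulated in tumors, the scores would be negated (or the two groups swapped) so that an AUC above 0.5 still indicates discrimination.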

Immune Infiltration Analysis.
Immune microenvironment analysis of H. pylori-associated GC tissues and normal tissues was performed using the CIBERSORT method for calculating immune microenvironment scores [20]. We calculated enrichment scores for each immune-related cell population using ssGSEA to examine the relationship between key genes and immune infiltration. In addition, Spearman correlations between the expression of each hub gene and immune enrichment scores were calculated and tested.
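The correlation step can be sketched in Python as follows. This is a minimal illustration of the Spearman coefficient (Pearson correlation of average ranks), not the pipeline used in the study; the expression values and enrichment scores are invented, and the significance test is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values: gene expression vs. an immune cell enrichment score.
gene_expr = [2.0, 3.5, 1.2, 4.8, 4.1]
cell_score = [0.10, 0.30, 0.05, 0.90, 0.40]
rho = spearman(gene_expr, cell_score)
```

Because the two toy series are ordered identically, rho is 1 here; real data would give intermediate values whose significance must be tested separately.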

Target Drug.
The DrugBank online analysis website (https://go.drugbank.com/) was used to find compounds that might act on key genes [21]. The flowchart of the study is provided in Figure 1.

Data Collection and Acquisition of Differential Genes.
The dataset GSE13195 from GEO was selected for this study. According to the screening conditions of P < 0.05 and |logFC| > 1, we found 332 differentially expressed genes (DEGs), including 198 that were upregulated and 134 that were downregulated (Figures 2(a) and 2(b)).
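The screening conditions can be expressed as a simple filter. The Python sketch below assumes that a log2 fold change and a P value have already been computed for each gene by some differential expression tool; the gene names and numbers are placeholders, not values from GSE13195.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log2 fold change, tumor vs. normal (assumed precomputed)
    p_value: float


def screen_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes with P < p_cutoff and |logFC| > lfc_cutoff into up/down lists."""
    up, down = [], []
    for s in stats:
        if s.p_value < p_cutoff and abs(s.log_fc) > lfc_cutoff:
            (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Hypothetical toy input for illustration only.
toy = [
    GeneStat("GENE_A", 2.3, 0.001),
    GeneStat("GENE_B", -1.8, 0.010),
    GeneStat("GENE_C", 0.4, 0.002),   # fails the fold-change cutoff
    GeneStat("GENE_D", 1.5, 0.200),   # fails the P-value cutoff
]
up_genes, down_genes = screen_degs(toy)
```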

Functional and Pathway Enrichment Analysis.
DAVID v6.8 was used for GO and KEGG enrichment analysis to better elucidate the functional and biological significance of the differential genes identified. GO analysis showed that, in terms of biological process, these differential genes were mainly related to cell adhesion, collagen fibril organization, response to drug, maintenance of gastrointestinal epithelium, and detoxification of copper ion; in terms of cellular component, they were mainly located in the extracellular space, extracellular exosome, extracellular region, cell surface, and basolateral plasma membrane; in terms of molecular function, they mainly participated in extracellular matrix structural constituent, identical protein binding, protein binding, integrin binding, and collagen binding (Figure 2(c)). Furthermore, KEGG analysis revealed that these differential genes were highly involved in the regulation of gastric acid secretion, mineral absorption, protein digestion and absorption, ECM-receptor interaction, and cell cycle (Figure 2(d)).

Differential Gene Set Enrichment Analysis.
GSEA was conducted to better elucidate how the differential genes function. The eight KEGG pathways associated with DEGs are shown in Figure 2(e). They were melanogenesis, thyroid cancer, bladder cancer, p53 signaling pathway, glycosphingolipid biosynthesis, renal cell carcinoma, basal cell carcinoma, and endometrial cancer. Moreover, compared with normal tissues, these pathways were hyperactivated in H. pylori-associated GC tissues.

Co-Expression Network Construction and Module Detection.
To find modules highly correlated with the progression of H. pylori-associated GC, cancer tissue samples were used to construct a co-expression network. To make the network scale-free, we investigated the relationship between the scale-free topology fit index R² and the soft threshold (power). As shown in Figures 3(a) and 3(b), we chose a soft threshold (power) of 5, at which R² first reached 0.85. After the adjacency matrix was constructed, we transformed it into a topological overlap matrix. Genes were then sorted into different modules using a dynamic tree-cutting method; genes whose expression was significantly correlated were assigned to the same module. In total, we obtained 66 modules; the module eigengenes and clustering dendrogram are shown in Figures 3(c) and 3(d). Then, to identify modules that were highly correlated with the progression of H. pylori-associated GC, the correlation between tumor characteristics and each module was examined. As shown in Figure 3(e), among the 66 modules, the light cyan, plum2, black, and magenta4 modules were most associated with stage (T3, T2, and T4), and the orangered4, salmon2, pink, and navajowhite2 modules were most associated with lymph node metastasis (N3, N2, and N0), all with P values below 0.05. We calculated MM for the genes in the selected modules, defined genes with MM ≥ 0.8 as central (hub) genes, and obtained a total of 318 hub genes. The protein interaction networks of these 318 hub genes in their respective categories are shown in Figure 4.
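Two of the steps above, choosing the first power whose fit index reaches 0.85 and converting an adjacency matrix into a topological overlap matrix, can be sketched in Python. The overlap formula used here is the common unsigned WGCNA definition, which is an assumption about what the Sangerbox tool computes, and the R² values and the 3×3 adjacency matrix are invented for illustration.

```python
def first_power_reaching(fit_by_power, r2_cutoff=0.85):
    """Smallest soft-threshold power whose scale-free fit R^2 is >= cutoff, else None."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= r2_cutoff:
            return power
    return None


def topological_overlap(adj):
    """Topological overlap for a symmetric adjacency matrix with a zero diagonal.

    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where k_i is the connectivity (row sum) of gene i.
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical fit indices, chosen only to mirror the selection rule in the text.
fit = {1: 0.20, 2: 0.45, 3: 0.66, 4: 0.79, 5: 0.86, 6: 0.90}
chosen_power = first_power_reaching(fit)

# Hypothetical adjacency among three genes.
adj = [[0.0, 0.5, 0.5],
       [0.5, 0.0, 0.5],
       [0.5, 0.5, 0.0]]
tom = topological_overlap(adj)
```

Hierarchical clustering with dynamic tree cutting would then operate on the dissimilarity 1 − TOM; that step is not shown.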

Acquisition and Specificity Analysis of Key Genes.
Seven genes obtained by intersecting the differential genes and hub genes were defined as key genes, namely, ADAM28, FCER1G, MRPL14, SOSTDC1, TYROBP, C1QC, and C3 (Figure 5(a)). Their expression in the tissues of the GSE13195 dataset is shown in Figure 5(b). Among them, FCER1G, MRPL14, TYROBP, C1QC, and C3 were significantly more highly expressed in H. pylori-associated GC tissues than in normal tissues, while ADAM28 and SOSTDC1 showed the opposite pattern. In addition, the ROC curves showed that the key genes discriminated well between the two tissue types (AUC values: 0.957, 0.902, 0.934, 0.925, 0.862, 0.826, and 0.726, respectively) (Figure 5(c)). This suggested that the seven key genes had the potential to be diagnostic markers for H. pylori-associated GC.

Validation and Survival Analysis of Key Genes.
Based on the TCGA database, boxplots of tumor samples and normal samples (including 34 normal samples, 20 H. pylori-associated GC samples, 157 H. pylori-unassociated GC samples, and 153 other samples) were generated for further validation of the key genes. As shown in Figure 6(a), the mRNA expression levels of five key genes (FCER1G, MRPL14, C3, SOSTDC1, and TYROBP) were significantly different between tumor tissues and normal tissues, while ADAM28 and C1QC showed no significant differences. In addition, FCER1G, MRPL14, and C3 were abnormally high in both H. pylori-associated and H. pylori-unassociated GC tissues compared to normal tissues, whereas SOSTDC1 was abnormally low in both. Interestingly, TYROBP was abnormally high in H. pylori-associated GC tissues compared to normal tissues but not in H. pylori-unassociated GC tissues. Furthermore, the expression of TYROBP was significantly increased in H. pylori-associated GC tissues relative to H. pylori-unassociated GC tissues. The relationship between the expression levels of key genes and the prognosis of GC patients was then examined by survival analysis. According to the median expression value, GC patients were divided into a high expression group and a low expression group. We found that patients with GC who expressed high levels of C3 had poorer overall survival, while the results of survival analysis of the other genes were not statistically significant (Figures 6(b) and S1). Therefore, we removed ADAM28 and C1QC from the key genes.
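The median split and survival comparison can be illustrated with a small Python sketch. It is not the UALCAN implementation: the Kaplan-Meier estimator shown here is the standard product-limit form, placing samples exactly at the median in the low group is an assumption (the text does not say how ties were handled), and the patient identifiers, expression values, and follow-up times are invented. The significance test between the two curves is omitted.

```python
import statistics


def median_split(expr_by_patient):
    """Divide patients into high/low groups by the median expression value.

    Patients exactly at the median are placed in the low group (an assumption).
    """
    cutoff = statistics.median(expr_by_patient.values())
    high = [p for p, v in expr_by_patient.items() if v > cutoff]
    low = [p for p, v in expr_by_patient.items() if v <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical expression values and follow-up data.
expr = {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 4.0}
high_group, low_group = median_split(expr)
curve = kaplan_meier(times=[5, 8, 8, 12], events=[1, 1, 0, 1])
```

In practice one curve would be estimated per group and the two compared with a log-rank test.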

Immune Infiltration Analysis.
We performed immune microenvironment analysis on H. pylori-associated GC and normal tissues using the CIBERSORT method for calculating immune microenvironment scores. As shown in Figures 7(a) and 7(b), compared with normal tissues, H. pylori-associated GC tissues had stronger infiltration of activated NK cells, M0 macrophages, M1 macrophages, and M2 macrophages, but less infiltration of plasma cells and CD8 T cells; the other cell types showed no difference. We used ssGSEA to determine enrichment scores for immune-related cells. Spearman correlations between the expression of each hub gene and immune enrichment scores were calculated and tested (Figure 7). [...] plasma cell infiltration. C3 was positively correlated with infiltration of M2 macrophages, gamma delta T cells, and M1 macrophages, and negatively correlated with infiltration of monocytes and plasma cells.

Possible Targeted Drugs.
We used the DrugBank online website to search for possible drugs targeting the key genes. As shown in Table 1, for FCER1G, the drugs currently approved and under investigation were benzylpenicilloyl polylysine and fostamatinib. Of these, benzylpenicilloyl polylysine acted as an agonist, while fostamatinib functioned as an inhibitor. For TYROBP, the drug currently approved and under investigation was dasatinib; however, it acts on multiple targets, and its specific mechanism remains to be studied further. The remaining compounds targeting key genes were poorly studied.

Discussion
Globally, GC has the sixth highest incidence among malignancies and is the third leading cause of cancer death [1]. Recent research showed that more than half of newly diagnosed patients were from developing countries (Eastern Europe, East Asia, and Central and South America) [22]. GC can occur due to a number of risk factors, including exposure to chemical carcinogens, environmental factors, genetic susceptibility, poor diet, and excessive alcohol intake [23]. However, infection with H. pylori remains the main cause of GC [24]. Despite the rapid development of targeted therapies and immunotherapies in recent years, clinical efficacy is still lacking in some patients with GC [25]. It would be beneficial if more methods and targets could be found for treating GC.
Based on transcriptome data analysis, our study identified DEGs associated with the occurrence and progression of H. pylori-associated GC and provided some potential targets for its treatment. Based on the GSE13195 and TCGA-STAD datasets, we identified five key genes, FCER1G, MRPL14, SOSTDC1, TYROBP, and C3, which presented different expression patterns in H. pylori-associated GC and normal tissues, of which C3 may affect the prognosis of GC patients.
FCER1G is located on chromosome 1q23.3 and encodes the gamma subunit of the receptor (FcR) for the fragment crystallizable (Fc) region of immunoglobulins, which is involved in various immune responses such as phagocytosis and cytokine release [26,27]. Cellular effector functions are activated by the interaction between the Fc region of immunoglobulins and the FcR of immune cells, which in turn triggers destructive inflammation, immune cell activation, phagocytosis, oxidative burst, and cytokine release [28,29]. FCER1G has been implicated in the progression of several cancers, such as squamous cell carcinoma, multiple myeloma, and clear cell renal cell carcinoma [27,29,30].
In renal cancer, the high expression of FCER1G may be a functional basis for the induction of M2 macrophages through increased secretion of IL-4. In addition, M2 macrophages can exert their immunosuppressive function in part by suppressing cytotoxic T cells. This may explain the relevance of FCER1G to macrophage and T cell function [31]. These findings were consistent with our results that high expression of FCER1G was positively correlated with infiltration of M2 macrophages and negatively correlated with CD8 T cells.
MRPL14 is a highly conserved protein. One protein-binding site and two RNA-binding sites are located in the C-terminal region of MRPL14, which consists of a five-stranded beta barrel and two small alpha helices [32]. MRPL14 was found to be closely related to mitochondrial metabolism [33]. The conserved interaction of C7orf30 with MRPL14 promoted biogenesis of the mitochondrial large ribosomal subunit and mitochondrial translation [32]. However, the role of MRPL14 in cancer has not yet been studied.
SOSTDC1 is a secreted protein with a glycosylated N-terminus that contains a C-terminal cysteine knot domain [34]. SOSTDC1 negatively regulates BMP (bone morphogenetic protein) signaling during cell proliferation, differentiation, and apoptosis, and also regulates various processes in development and cancer by regulating the Wnt pathway [35,36]. Researchers have found that a lack of SOSTDC1 in GC patients was associated with shorter survival. In gastric cancer, SOSTDC1 acts like a tumor suppressor, and its silencing can promote tumor growth and lung metastasis. SOSTDC1 significantly inhibits the SMAD-dependent BMP pathway, c-Jun activation, and transcription of c-Jun downstream targets [37]. In addition, SOSTDC1 from nonhematopoietic and hematopoietic sources regulates NK cell maturation and Ly49 receptor expression in a cell-extrinsic manner [38]. This seems to be contrary to the results we obtained in H. pylori-associated GC tissues, which needs to be explored further in follow-up studies.
TYROBP, also known as DAP12, can noncovalently bind to activating receptors on the surface of various immune cells and mediate signal transduction and cell activation [39,40]. There was evidence that patients with GC who overexpressed TYROBP had poorer survival. Furthermore, TYROBP can stimulate macrophage activation, regulate tumor necrosis factor production, and induce tolerance [41]. TYROBP is involved in the interaction between tumor cells and M2 macrophages to enhance TGF-β secretion in vitro [42]. Our research partially confirmed this, but these results still need to be verified in larger samples.
Complement is an important part of the innate immune system. Previously, it was thought to be a network of proteins that released inflammatory mediators in response to microbial invasion [43]. A growing number of studies have shown that complement activation in the tumor microenvironment can delay local T-cell immunosuppression and chronic inflammation, thereby exerting tumor-promoting effects and ultimately facilitating tumor immune escape, growth, and distant metastasis [44,45]. C3 and downstream signaling molecules are involved in multiple biological processes of tumor cells, including tumor cell anchoring, proliferation, tumor-associated angiogenesis, matrix remodeling, migration, and invasion [46-48]. In GC, monocytes, TAMs, M2 macrophages, DCs, Tregs, and T cell exhaustion were significantly associated with C3 expression. An immunotherapeutic approach based on C3 could provide a potential biological target for GC [49].
Although we identified and confirmed 5 key genes that were highly correlated with the progression of H. pylori-associated GC, we were unable to perform multifaceted validation due to the small sample size of GSE13195 and the lack of studies of the same type. In addition, we did not perform experimental tests on the key genes. It is critical to conduct larger sample studies as well as multicenter clinical trials to gain a deeper understanding of how these genes are involved in H. pylori-associated gastric cancer.

Conclusion
In conclusion, we identified five key genes, FCER1G, MRPL14, SOSTDC1, TYROBP, and C3, associated with the occurrence of GC in H. pylori infection. Among them, H. pylori-associated GC patients with higher C3 expression had a worse prognosis than those with lower expression. In the future, biomarkers and therapeutic targets related to these key genes may allow H. pylori-associated GC to be diagnosed and treated precisely.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.