Construction and Investigation of MicroRNA-mRNA Regulatory Network of Gastric Cancer with Helicobacter pylori Infection

Background Helicobacter pylori (H. pylori) is a common human pathogen, which is closely correlated with gastric cancer (GC). However, the mechanism of H. pylori-related GC has not been elucidated. This study aimed to explore the role of H. pylori infection in GC and find biomarkers for early diagnosis of H. pylori-related GC. Methods We identified differentially expressed microRNAs (DEMs) and genes (DEGs) from the Gene Expression Omnibus (GEO) dataset, constructed microRNA-(miRNA-)mRNA expression networks, analyzed the function and signal pathway of cross-genes, analyzed the relations between cross-genes and GC prognosis with the Cancer Genome Atlas (TCGA) data, and verified the expression of cross-genes in patients with H. pylori infection. Results 22 DEMs and 68 DEGs were identified in GSE197694 and GSE27411 dataset. 16 miRNAs and 509 genes were involved in the expression network, while the cross-genes of the network were mainly enriched in MAP kinase (MAPK) signaling pathway and TGF-beta signaling pathway. Patients with higher expression of hsa-miR-196b-3p, CALML4, or SMAD6 or lower expression of PITX2 or TGFB2 had better outcomes than those with lower expression of hsa-miR-196b-3p, CALML4, or SMAD6 or higher expression of PITX2 or TGFB2 (P < 0.05). Patients with H. pylori infection had a higher expression of hsa-miR-196b-3p and CALML4 than those without H. pylori infection (P < 0.05). Conclusion The study of miRNA-mRNA expression network would provide molecular support for early diagnosis and treatment of H. pylori-related GC.


Introduction
Gastric cancer (GC) is one of the most common cancers in China, with half a million deaths annually [1]. Because of population growth and life extension, the incidence and mortality rates of GC are increasing [2]. erefore, it is urgent and important to identify the key genes in its pathogenesis. Helicobacter pylori (H. pylori) is acknowledged as a class I carcinogen [3], which colonizes in the gastric mucosa and causes chronic gastric, atrophic gastric, and GC [4]. However, how H. pylori infection is involved in the pathogenesis and development of GC is unknown [4][5][6]. erefore, elucidating the molecular mechanism of H.
pylori-related GC is of great importance for its early diagnosis and targeted therapy. MicroRNAs (miRNAs) are 19∼25-nucleotide-long endogenous noncoding RNAs, which negatively regulate the expression of their targets at the posttranscription level and play significant roles in many biological processes, such as proliferation, differentiation, and apoptosis [7][8][9]. Evidence has been increasing that abnormal miRNAs expression is involved in the pathogenesis and development of many cancers, which suggests the promising biomarkers for early diagnosis and therapy of tumors [5,10]. Now, a high-throughput platform combined with bioinformatics analysis has become a new way to identify biomarkers of disease [11,12].
In this study, differentially expressed miRNAs (DEMs) and differentially expressed genes (DEGs) of gastric biopsy with H. pylori infection from the Gene Expression Omnibus (GEO) database [13,14] were identified by R software. After predicting the potential targets of DEMs, we constructed the coexpression network of miRNA-mRNA to identify hub genes. e hub genes were identified using bioinformatics methods including gene ontology (GO) annotation [15] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [16] signal pathway enrichment analysis, while the prognostic value of hub genes was analyzed from the Cancer Genome Atlas (TCGA) database, and the expression of hub genes was confirmed in 69 gastric specimens with or without H. pylori infection. We hope this study will provide new information for the molecular mechanism of H. pylori-related GC.

Microarray Data.
e miRNA and gene expression profiles were obtained from the GEO dataset (https://www. ncbi.nlm.nih.gov/geo/). e screening criteria for GEO were as follows: (1) human gastric mucosa samples with or without H. pylori infection; (2) datasets were raw or standardized. e miRNA expression profile (GSE19769) [7] and the gene expression profile (GSE27411) [17] were included in this study. e GSE19769 profile was from the platform of GPL9081 including 10 cases of H. pylori-negative and 9 cases of H. pylori-positive gastritis specimen, while the GSE27411 from the platform of GPL6255 containing 6 cases of H. pylori-negative and 6 cases of H. pylori-positive atrophic gastritis.

Identification of DEMs and DEGs.
e significance analysis of DEGs and DEMs in H. pylori-negative and -positive samples was performed using the R language software (version 3.6.0, https://www.r-project.org/) and a limma R package. e Benjamini and Hochberg false discovery rate (FDR) method was used to adjust the P value to reduce the false-positive risk. e raw data of miRNAs and mRNAs expression were averaged and normalized, and the data of miRNAs were also log2-transformed, while the data with a median expression value of zero or less than zero were removed. P value < 0.05 and |log2 fold change (FC)| > 1 were used as the filter threshold for identifying DEGs and DEMs. A pheatmap R package was used for hierarchical clustering analysis and for drawing heat maps of DEGs and DEMs, while a ggplot2 R package was used for drawing volcano plot [13,14].

Interactive Analysis and Construction Expression
Network of miRNA-mRNA. MiRNA-mRNA regulatory networks were constructed to predict the potential hub genes of DEMs and DEGs. First, TargetScan (http://www.targetscan.org/ vert_72/), miRDB (http://mirdb.org/), PicTar (https:// pictar.mdc-berlin.de/), and Miranda (http://miranda.org. uk/) were used to predict targets of DEMs, while different software programs with different algorithms and only genes predicted by at least 3 software programs were selected as the targets of DEMs. e overlapped genes of the targets of DEMs and differential expression genes of GSE27411 dataset were used to construct coexpression networks by Cytoscape software [18].

GO and KEGG Analysis of Cross-Genes
. GO analysis is a common approach for gene annotation and gene classification from three aspects of cellular component (CC), biological process (BP), and molecular function (MF) [15]. KEGG is a comprehensive database resource with 17 main databases, which are used to understand advanced gene functions and practical biological systems [16]. e org.Hs.eg.db, clusterProfiler, enrichplot, and ggplot2 R packages were used for GO and KEGG analysis with a cut-off criterion of a P value < 0.05 [19].

TCGA Data Processing.
e TCGA database is a comprehensive and open database, which contains a variety of human cancer types [20], and was used for validating the relations between the hub genes of the network and the GC prognosis.
e inclusion criteria were as follows: (1) the primary site is the stomach; (2) the disease type is the adenomas and adenocarcinomas; (3) the data category is transcriptome profiling; (4) gene expression quantification is used for gene data type, while miRNA expression quantification is used for miRNA data type. Eventually, the mature miRNA expression and clinical data of 452 GC cases (42 normal and 410 having tumors) were downloaded from the TCGA database, and the gene expression and clinical data of 374 GC cases (30 normal and 344 having tumors) were obtained. Survival analysis was performed with a survival R package, while the original data were standardized by the log2 (x + 1) method. e prognostic value of cross-genes was determined by Kaplan-Meier analysis, and P < 0.05 was considered as a significant difference. e expression of hsa-miR-196b-3p and hsa-miR-196b-5p was detected in the ABI7500 quantitative PCR system (Applied Biosystems, USA) instrument by SYBR Green Premix Ex Taq II (Tli RNase H Plus, Takara, Japan). U6 small nuclear RNA was used as the internal controls. e 20 μl reaction mixture included 10.0 μl SYBR Green Master Mix, 4 μl cDNA template, 0.4 μl ROX Reference Dye, 1.0 μl primer pairs (10 μm), and 4.6 μl deionized water. PCR cycle was performed as follows: initial degeneration for 30 s at 95°C, followed by 40 cycles of 5 s at 95°C and 34 s at 60°C. e relative expression of hsa-miR-196b-3p and hsa-miR-196b-5p was calculated by comparing the cycle threshold (CT) method using the 2 −△△ct method with U6 expression according to [21]. e primers of hsa-miR-196b-3p were 5′-TCGACAGCACGACACTGCCTTC-3′ (sense) and 5′-GACACGGACCCACAGACA-3′ (antisense), while the hsa-miR-196b-5p primers were 5′-GCACCAGCGTAGG-TAGTTTCC-3′ (sense) and 5′-TATGCTTGTTCTC-GTCTCTGTGTC-3′ (antisense).

Detecting Cross-Genes by Immunohistochemistry.
Immunohistochemistry (IHC) was performed in paraffinembedded sections on the basis of the standardized protocol. Briefly, paraffin-embedded sections (2-4 μm) were deparaffinized with xylene and rehydrated in series gradient ethanol (100%, 95%, and 85% for 1 min, respectively) at room temperature. Heat antigen repair was performed in an autoclave (121°C for 1.5 min) in a citrate sodium buffer (0.01 M), and then endogenous peroxidase was blocked using 3% hydrogen peroxide for 10 min. Sections were incubated with anti-human antibody SMAD6 (ab80049, Abcam), TGFB2 (ab53778, Abcam), PITX2 (ab98297, Abcam), CALML4 (A5086, Zhongshan Golden Bridge, Beijing), and NRP1 (ab25998, Abcam) for 1 h at 37°C and then incubated with a rabbit polyclonal antibody for 20 minutes. After dying with diaminobenzidine for 5-10 min at room temperature, sections were sealed with neutral balata, respectively. e sections were evaluated with semiquantitative method. Briefly, more than 400 cells were counted in each section, while some necrotic cells and peripheral-colored cells were elided. More than 10% of cells were nuclear; staining in all cells was defined as proteinpositive expression, while less than 10% was protein-negative expression.
2.9. Statistical Analysis. All data were expressed as mean-± standard deviation (SD) of 3 independent experiments, and statistical analyses were performed with SPSS 17.0 (SPSS Ins., Chicago, IL, USA). e difference of miRNA and protein expression was analyzed with two-tail unpaired ttest with P < 0.05 considered as significant difference.

Identification of DEMs and DEGs.
e dataset GSE19769 was selected to screen DEMs including 10 H. pylori-negative samples and 9 H. pylori-positive samples. 470 human miRNAs were analyzed in this dataset, and 22 DEMs have met the filtration criteria of logFC > 1 and P value < 0.05, including 11 upregulated and 11 downregulated miRNAs ( Table 2). Volcano map and heat map for the hierarchical clustering of the DEMs were carried out by a pheatmap R package (Figures 1(b) and 1(d)). 6 H. pylori-negative and 6 H. pylori-positive specimens of GSE27411 were analyzed in this study, while there are 18 samples in the dataset. A total of 18187 human mRNAs were expressed, and 68 genes reached the filtration criteria, among which 56 genes showed upregulated and 12 genes showed downregulated expression (Table S1). Volcano map and heat map for the hierarchical clustering of the DEGs were drawn by the pheatmap R package (Figures 1(a) and 1(c)).

GO and KEGG Analysis of Cross-Genes in the Network.
To explore the biological functions of the cross-genes of the network, we used an enrichplot and a ggplot2 R package to analyze the GO categories and KEGG signal pathways of the cross-genes. In the BP, the cross-genes were concentrated in outflow tract septum morphogenesis, epithelial cell migration, and endothelial cell migration. In the CC, the crossgenes were enriched in transcription factor complex, synaptic membrane, and adherence junction, but RNA polymerase II proximal promoter sequence-specific DNA binding, proximal promoter sequence-specific DNA binding, and enhancer binding in the MF (Table 3 and Figure S3(A)). KEGG analysis demonstrated that the crossgenes were prominently enriched in the mitogen-activated protein kinase (MAPK) signaling pathway, Ras signaling pathway, and TGF-β signaling pathway (Table 4 and Figure S3(B)).

Survival Analysis Verification in TCGA.
Survival analysis demonstrated that hsa-miR-196b-3p (P � 0.02162, Figure 3(c)) was significantly related to the prognosis of GC patients, while hsa-miR-196b-5p was not (P � 0.1065, Figure 3(f)), and the patients with lower hsa-miR-196b-3p expression had poor outcomes. CALML4, SMAD6, PITX2, and TGFB2 gene expression were also closely correlated with the GC patients' prognosis. e survival time of GC patients with high expression of CALML4 or SMAD6 was significantly longer than that of patients with low mRNA expression of CALML4 (P � 0.02146, Figure 3(a)) or SMAD6 (P � 0.03213, Figure 3(d)). In addition, patients with high expression of PITX2 or TGFB2 had a significantly poor prognosis than those with low mRNA expression of PITX2 (P � 0.01874, Figure 3(b)) or TGFB2 (P � 0.01272, Figure 3(e)).

Validation of miRNA and Cross-Gene Expression.
e expression of hsa-miR-196b-3p and hsa-miR-196b-5p was analyzed from the TCGA database to determine its prognostic value of GC. e results revealed that the expression of hsa-miR-196b-5p (log2FC = 3.269, P < 0.001) and hsa-miR-196b-3p (log2FC = 4.674216894, P < 0.001; Figure 4(c) and Table S3) in GC tissue was significantly higher than that in normal tissue. e expression of hsa-miR-196b-3p and hsa-miR-196b-5p was also detected by qPCR in 69 GC and gastritis patient specimens, which included 18 H. pylori negative and 51 positive. e results showed that hsa-miR-196b-3p and hsa-miR-196b-5p were overexpressed in the 69 patients with log2FC values of 2.01665 and 1.8458, respectively ( Figure 4(d)). Further analysis showed that the expression level of hsa-miR-196b-3p in H. pylori-positive group was significantly higher than that in the negative group (P < 0.05, Figure 4(e)); however, there was no significant difference in hsa-miR-196b-5p expression between the H. pylori-negative and -positive groups (P > 0.05, Figure 4(f )). e protein expression of CALML4, SMAD6, PITX2, and TGFB2 was detected by immunohistochemistry in 69 specimens. e positive rate of CALML4 in H. pyloripositive samples (84.3%) was significantly higher than that in negative samples (55%, P < 0.01), while there was no significant difference in SMAD6, PITX2, and TGFB2 between the two groups (P > 0.05, Figures 4(a) and 4(b)).

Discussion
Although H. pylori infection is closely related to GC, the pathogenesis of H. pylori-related GC has not been clarified [7,[24][25][26]. erefore, a comprehensive study of the molecular mechanism of H. pylori-related GC may be helpful to understand the disease and get better diagnosis and treatment methods. As an important part of bioinformatics, gene expression microarray, which has been widely used in tumor research [27,28], can analyze the expression of thousands of  Type  MT1E  SPINK2  TEF  dBP  SYT17  ACSS1  TPSAB1  GUSBL1  TMPRSS6  GLDN  C9orf66  TSPAN7  LOC400986  F13A1  IRAK3  PDZK1LP1  FCN1  LTF  E2F8  GBP6  SERPINB7  SAA2  CYP4F11  DEF84  IL19   (CALML4, SMAD6, PITX2, and TGFB2) in H. pylori-related GC which can be used as biomarkers for GC prognosis. miRNA participates in biological, pathological processes and infections by downregulating target expression [29]. e ectopic expression of miRNAs plays an important role in the pathogenesis of multiple cancers, including H. pylori-related GC [8,30]. Several studies have shown that hsa-miR-196b is upregulated in GC and related to the outcomes of GC patients [31,32], while other research showed that hsa-miR-196b expression in Epstein-Barr virus-(EBV-) positive GC  was significantly lower than that in EBV negative [33]. However, the role of hsa-miR-196b in GC, especially in EBV or H. pylori infection, is still unknown. Our study showed that the expression level of hsa-miR-196b in H. pylori-positive group was significantly higher than that in H. pylori-negative group. Analyzing TCGA data showed that the hsa-miR-196b-3p expression, rather than hsa-miR-196b-5p, could be used as a better biomarker for GC prognosis, which was consistent with the previous report [31]. And we also verified the hsa-miR-196b-3p and hsa-miR-196b-5p expression in clinical samples with or without H. pylori infection. e results showed that the expression of   hsa-miR-196b-3p in H. pylori-positive group was significantly higher than that in the negative group (P < 0.05), while there was no significant difference in hsa-miR-196b-5p expression (P > 0.05). miRNA target analysis showed that hsa-miR-196b and hsa-miR-326 could regulate the expression of SMAD6 involved in many biological activities through phosphorylation of the TGF-β signaling pathway [34,35]. en, the results suggested that H. pylori may be involved in the pathogenesis of GC with hsa-miR-196b by regulating the expression of many genes and activating the infectious immune pathways. However, the molecular mechanism of hsa-miR-196b in infection and GC needs further experimental study.
Studies have shown that the hsa-miR-200 family, including hsa-miR-200a, participates in the negative feedback loop formed by ZEB1, ZEB2, and SIP1, in which hsa-miR-200 suppresses the expression of ZEB1, ZEB2, and, SIP1 and then downregulates their expression [36,37]. ZEB2 and SIP1 have also been suggested to inhibit the transcription of cyclin D1 [38]. Some studies showed that, in H. pylori infection, the CagA gene can promote transformation from G1 into S/G2 in the cell cycle through activating AP-1 and cAMP, which suggested that hsa-miR-200 was involved in the transformation from gastric epithelial cells to EMT through ZEB loop [39]. Our study showed that hsa-miR-200a-3p showed low expression in H. pylori-related GC, and it could inhibit the expression of PITX2 and TGFB2, which participate in the TGF-β pathway. Analysis of the relationship between cross-genes and GC prognosis showed that PITX2 and TGFB2 were related to the prognosis of GC. e survival time of GC patients with overexpression of PITX2 and TGFB2 was remarkably shorter than those with underexpression, while hsa-miR-200a-3p had no prognostic value for GC.
Some results have shown that hsa-miR-223 was abnormally overexpressed in GC and was significantly upregulated in H. pylori-infected tissues, which could be involved in the pathogenesis of GC by targeting FBXW7 [40]. Some studies showed that hsa-miR-411 showed low expression in GC, and overexpression of hsa-miR-411 in GC cell led to decreasing proliferation and increasing apoptosis [40], while hsa-miR-411 could not be used as an independent predictor of GC prognosis [41]. However, the role of hsa-miR-411 in H. pylori infection has not been reported. Our results showed that the expression level of hsa-miR-223 was high in H. pylori-positive GC, while that of hsa-miR-411 was low, and both of them can regulate CALML4, which can activate the CGMP-PKG signaling pathway. Besides, patients with overexpression of CALML4 had better outcomes than those with underexpression. However, the mechanism of CALML4 regulated by hsa-miR-223 and hsa-miR-411 in the pathogenesis of H. pylori-related GC needs further experimental study.
In conclusion, we constructed a coexpression network of miRNA-mRNA and identified the key genes of hsa-miR-196b, CALML4, PITX2, TGFB2, and SMAD6 in H. pylorirelated GC, which may provide a new way for the diagnosis and treatment of H. pylori-related GC.