Screening of Hub Genes Associated with Pulmonary Arterial Hypertension by Integrated Bioinformatic Analysis

Background Pulmonary arterial hypertension (PAH) is a disease or pathophysiological syndrome which has a low survival rate with abnormally elevated pulmonary artery pressure caused by known or unknown reasons. In addition, the pathogenesis of PAH is not fully understood. Therefore, it has become an urgent matter to search for clinical molecular markers of PAH, study the pathogenesis of PAH, and contribute to the development of new science-based PAH diagnosis and targeted treatment methods. Methods In this study, the Gene Expression Omnibus (GEO) database was used to downloaded a microarray dataset about PAH, and the differentially expressed genes (DEGs) between PAH and normal control were screened out. Moreover, we performed the functional enrichment analyses and protein-protein interaction (PPI) network analyses of the DEGs. In addition, the prediction of miRNA and transcriptional factor (TF) of hub genes and construction miRNA-TF-hub gene network were performed. Besides, the ROC curve was used to evaluate the diagnostic value of hub genes. Finally, the potential drug targets for the 5 identified hub genes were screened out. Results 69 DEGs were identified between PAH samples and normal samples. GO and KEGG pathway analyses revealed that these DEGs were mostly enriched in the inflammatory response and cytokine-cytokine receptor interaction, respectively. The miRNA-hub genes network was conducted subsequently with 131 miRNAs, 7 TFs, and 5 hub genes (CCL5, CXCL12, VCAM1, CXCR1, and SPP1) which screened out via constructing the PPI network. 17 drugs interacted with 5 hub genes were identified. Conclusions Through bioinformatic analysis of microarray data sets, 5 hub genes (CCL5, CXCL12, VCAM1, CXCR1, and SPP1) were identified from DEGs between control samples and PAH samples. Studies showed that the five hub genes might play an important role in the development of PAH. These 5 hub genes might be potential biomarkers for diagnosis or targets for the treatment of PAH. In addition, our work also indicated that paying more attention on studies based on these 5 hub genes might help to understand the molecular mechanism of the development of PAH.


Background
Pulmonary arterial hypertension (PAH) is a disease or pathophysiological syndrome of abnormally elevated pulmonary artery pressure caused by known or unknown reasons [1,2]. PAH not only can be caused by the pathological changes of pulmonary vessels themselves but also be secondary to other heart, lung, or systemic diseases [3]. Pulmonary circulatory disturbance and high right heart load are the main characteristics of PAH [3]. Severe PAH can lead to right heart failure and even death. It is reported that the PAH prevalence rate is 15-60 cases/million people/year, and the incidence rate is 5-10 cases/million people/year [4]. In the past few decades, great progress has been made in understanding the basic pathobiology of PAHs and their underlying history, prognostic biomarkers, and treatment options; however, the mortality from PAH remains high. Recently, the US-REVEAL registry displayed a low 5-year survival rate of 61.2% in patients with newly diagnosed PAH [5]. Therefore, it has become an urgent matter to search for clinical molecular markers of PAH, study the pathogenesis of PAH, and contribute to the development of new science-based PAH diagnosis and targeted treatment methods.
Bioinformatic analysis is a tool that can be used to discover potential molecular markers in the pathology of disease by analyzing the differential gene expression between patients and healthy controls [6]. At present, the in-depth study of transcriptome data through bioinformatic analysis provides a new reference for finding new diagnostic molecular markers, prognostic monitoring markers, and therapeutic targets [7,8]. The primary diagnostic and evaluation method for pulmonary hypertension is invasive right heart catheterization, and the primary therapeutic agent is difficult to achieve a satisfactory therapeutic outcome due to its systemic effect on blood vessels. Therefore, we intend to use the information of PAH patients in the GEO database for bioinformatic analysis to find diagnostic markers and target genes for treatment, so as to reduce the harm caused by invasive diagnostic techniques and reduce the side effects caused by nonspecific treatments. We hypothesized that some genes or proteins found through bioinformatics could contribute to the diagnosis and treatment of PAH more specifically and help for molecular mechanism of PAH.
Through gene expression analysis by chip technology, more data of PAH expression profile are revealed, which is helpful for comprehensive basic research and understanding of biological function of differentially expressed genes (DEGs) of PAH. In this study, a microarray dataset about PAH was downloaded from the Gene Expression Omnibus (GEO) database, and the DEGs between PAH samples and normal samples were identified. Moreover, we performed the functional enrichment analyses including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses and protein-protein interaction (PPI) network analyses of the DEGs. In addition, the prediction of microRNA (miRNA) of hub genes and construction of the miRNA-hub gene network were performed. This study is aimed at exploring the hub genes related to PAH by bioinformatic analysis.

Materials and Methods
2.1. Microarray Data Analysis. The National Center for Biotechnology Information Gene Expression Omnibus (NCBI-GEO) (https://www.ncbi.nlm.nih.gov/geo/) is a public database created in the year 2000 [9,10]. It contains the transcriptome data of microarray chips that were submitted by all kinds of institutions around the world. One mRNA expression profiling (GSE117261) was downloaded from the GEO database for further analysis, whose data has been normalized and log2 transformed. The GSE117261 contained gene expression information of 58 PAH lung tissue samples and 25 normal samples. The demo data of the samples in GSE117261 was shown in Table S1. These microarray data were executed with the help of GPL6244 [HuGene-1_0st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]. The detail of GSE117261 was shown in Table 1.
To make this article better understand, the workflow of this study was shown in Figure 1.

Data
Processing. After GSE117261 was downloaded, probe identification numbers were transformed into gene symbols in R software (version 3.6.3 https://www.r-project .org) for further analyses, respectively [11]. The annotation library hugene10sttranscriptcluster.db was used to performed the probesets mapping to their respective gene symbol identifier, and probesets annotated to the same gene symbol identifier were aggregated according to their mean value [12,13]. In the GSE117261 dataset, the "limma" package in the R software (version 3.6.3) was used to screened out DEGs between PAH samples and normal samples [14]. DEGs with a threshold of |log 2 fold change ðFCÞ | >1 and P ad j value <0.05 as the cut-off criteria were selected for further analyses [15,16].

Functional Enrichment
Analyses. The DAVID database (https://david.ncifcrf.gov/) is an online bioinformatic tool that provides a comprehensive set of functional annotation tools for researchers to understand the biological meaning behind a large number of genes [17,18]. To further understand the function of DEGs in PAH, the DEGs were uploaded to the DAVID database to perform the GO enrichment analysis. The "clusterprofile" package in R software can perform statistical analysis and visualization of functional clustering of gene collections [19]. The KEGG pathway enrichment analysis of DEGs was performed using the "clusterprofiler" package [20]. Gene Ontology is a widely used ontology in the field of bioinformatics, which covers three aspects of biology: biological process (BP), cellular component (CC), and molecular function (MF) [21]. KEGG pathway annotation and analysis of DEGs can determine the major metabolic and signal transduction pathways involved in these genes [22]. The P value < 0.05 was regarded as the threshold values for remarkable enrichment.
2.4. Construction of the PPI Network. The Search Tool for the Retrieval of Interacting Genes (STRING) database (http:// string-db.org/) is a database for searching known proteinprotein interactions and predicting protein-protein interactions. It contains not only experimental data, data from text mining, but also results of protein-protein interactions predicted by using bioinformatic methods [23]. In this study, a PPI network of DEGs that was established with the minimum interaction score which was set as 0.4. Cytoscape is a very powerful tool for visualizing web data that can be used to demonstrate the interrelationships between a set of genes/proteins. Next, the PPI network was visualized by the Cytoscape software (version 3.7.2) [24]. CytoHubba from The genes scored in the top 10 by all 5 methods were selected as hub genes.

Prediction of miRNA and TF of Hub Genes and
Construction miRNA-TF-Hub Gene Network. The online prediction tool microRNA Data Integration Portal (mirDIP) (http://ophid.utoronto.ca/mirDIP) is a web-based computational database that integrates dozens of bioinformatic tools used to predict the target miRNA of genes [25]. We used the mirDIP database to predict potential miRNA targeting of hub genes. The hub genes were submitted with the threshold set as follows: minimum score = very high, and the top five predicted miRNAs of every gene were chosen and listed. TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-Based Text Mining) (version 2, http:// www.grnpedia.org/trrust/) is an artificially annotated transcriptional regulatory network database [26]. The TRRUST database contains 800 human transcription factors (TFs) and 828 mouse TFs, with 8444 human and 6552 mouse TFtarget regulatory relationships, respectively [26]. Then, the TRRUST database was used to predicted the TFs of the hub genes [27]. The TF-hub gene interaction pairs with P values <0.05 were selected to establish the regulatory network. Finally, the Cytoscape software was used to construct the miRNA-TF-hub gene network.
2.6. Drug-Gene/Protein Interactions. The Drug-Gene Interaction database (DGIdb) (http://dgidb.genome.wustl.edu/) was used to identify potential drug targets for the 5 identified hub genes. It contains data from 13 different sources on human drugs, drug-deliverable genes, and drug-gene interactions and currently contains more than fourteen thousand druggene interactions including more than six thousand drugs and two thousand human genes [28].

Diagnostic Significances of 5 Hub
Genes. The evaluation of the diagnostic value of hub genes was analyzing by establishing a receiver operating characteristic (ROC) according to the hub gene expression data in 58 PAH patients and 25 normal control samples. The area under the curve (AUC) value of the ROC curve was used to determine the diagnostic effect of the hub genes in distinguishing patients with PAH from normal subjects. Usually, an AUC value of >0.85 indicated excellent diagnostic value [29,30].

Statistical
Analysis. The moderate t-test was used to screen out DEGs; Fisher's exact test to analysis was used to performed function enrichment analysis including GO and KEGG analysis [31]. All statistical analyses were performed in R version 3.6.3 software.

Identification of DEGs in PAH.
To explore the drive genes of PAH, we first excavated the mRNA expression profiling (GSE117261) of PAH and normal tissue from GEO and filtrated DEGs compared to the normal tissue. In our result, 69 DEGs of PAH were obtained. Among them, 38 were upregulated expression (log 2 FC > 1), and 31 were downregulated expression (log 2 FC < −1) ( Table 2 and Table S2). These DEGs were shown as a volcano plot and heat map in Figures 2(a) and 2(b).

Functional Enrichment Analyses.
To further understand the function of 69 DEGs in PAH, these 69 DEGs were  3 BioMed Research International uploaded to the DAVID database to perform the GO enrichment analysis (Table S3 and Figures 3(a)-3(c)). The P value <0.05 was regarded as the threshold values for remarkable enrichment. The top five GO terms of DEGs based on the P value are shown in Table 3. In BP analysis, it was shown that the DEGs were mainly involved in the inflammatory response, neutrophil chemotaxis, cellular response to tumor necrosis factor, immune response, and cell chemotaxis. In CC analysis, it was significantly involved in the extracellular region, extracellular space, proteinaceous extracellular matrix, extracellular exosome, and extracellular matrix. In addition, MF analysis showed that DEGs were mainly involved in chemoattractant activity, chemokine receptor binding, RAGE receptor binding, integrin binding, and chemokine activity. To obtain more information about the crucial pathways of these DEGs, a KEGG pathway analysis was performed (Table S4). The top ten KEGG terms of DEGs according to the P value are shown in Table 4 and Figure 3(d). KEGG enrichment analysis showed that DEGs were mainly involved in hematopoietic cell lineage, cytokine−cytokine receptor interaction, fluid shear stress and atherosclerosis, viral protein interaction with cytokine and cytokine receptor, and malaria (Table 4).
3.3. Construction of the PPI Network. The PPI network of DEGs was constructed by using the STRING database and Cytoscape software. After removing the unconnected nodes, a PPI network that contained 51 nodes and 132 edges was constructed ( Figure 4). The cytoHubba in the Cytoscape software was used to identify the hub genes of PAH scored in the top 10 by all 5 methods. These genes were SPP1, CXCL12, CXCR1, VCAM1, and CCL5, which may play an important role in PAH progression.

Prediction of miRNA and TF of Hub Genes and
Construction miRNA-TF-Hub Gene Network. To further explore the mechanism of the hub genes, we investigated the potential interaction network of these hub genes. According to the mirDIP database and TRRUST database, there were 131 potential miRNAs (Table S5) and 7 TFs by targeting hub genes, and a miRNA-TF-hub gene interaction network was established. Finally, the miRNA-TF-hub gene interaction network was visualized via Cytoscape software ( Figure 5).

Discussion
PAH still is a severe disease which is difficult to diagnosed and continuously makes patients and social suffering. The rapid development of high-throughput microarray technology and bioinformatic could provide more references for finding diagnostic biomarkers and prognostic suggestions for diseases.
In our work, GSE117261, a mRNA expression profile which downloaded from the NCBI-GEO database, was used to explore potential biomarkers and molecular mechanisms of PAH. Firstly, 69 DEGs (38 upregulated genes and 31 downregulated genes) between PAH lung tissues and normal lung tissues were screened out. Then, GO and KEGG analysis was conducted to gain more insights into the function of these DEGs in PAH. Further, we constructed a PPI network via the STRING database and Cytoscape software and screened out 5 hub genes (CCL5, CXCL12, VCAM1, CXCR1, and SPP1) from the PPI network. Moreover, the miRNA-hub gene interaction network was established. Finally, the candidate drug targeting hub genes with PAH were screened out via the DGIdb database.
CCL5 (C-C motif chemokine ligand 5), also known as SCYA5, RANTES, and TCP228, is one of the chemokine system genes and belongs to the C-C chemokine subfamily [32,33]. CCL5 could be released from platelets, macrophages, fibroblasts, endothelium, and epithelial cells [34]. And it was reported that CCL5 plays multiple roles in Table 2: Screening DEGs in PAH by integrated analysis of microarray.

DEGs
Gene names   Upregulated   LTBP1, PDE3A, HBB, MFAP4, BMP6, GEM, SFRP2, CCL5, SLC18A2, RGS5, DEPP, WIF1, ASPN, VCAM1, AGBL1,  RGS1, CXCL12, POSTN, MS4A, MXRA5, DPT, OGN, CD69, CCL21, KIT, ENPP2, CPA3, THY1, EDN1, ESM1, GZMK,  FABP4, ROBO2, COL6A6, CCDC80, CHIAP2, PI15, HAS2 Downregulated CSF3R, RNASE2, SULT1B1, SAA2, S100A9, MGAM, NQO1, CR1, SIGLEC10, IL1R2, RIPOR2, S100A8, AQP9, S100A12, SAA1, CD163, LCN2, TIMP4, ANPEP, CXCR1, VNN2, HMOX1, BPIFA1, SLCO4A1, ELF5, BPIFB1, SPP1, SERPINA3, CHIT1, MS4A15, MSMB DEGs: differentially expressed genes; PAH: pulmonary arterial hypertension.   PI15  HAS2  GEM  VCAM1  CXCL12  CCL21  CCDC80  MXRA5  THY1  POSTN  ENPP2  SFRP2  OGN  FABP4  CHIAP2  WIF1  AGBL1  COL6A6  DPT  ASPN  ROBO2  KIT  CPA3  SLC18A2  MS4A2  RGS5  PDE3A  LTBP1  MFAP4  BMP6  ESM1  HBB  EDN1  CCL5  GZMK  DEPP1  RGS1  CD69  MS4A15  ELF5  CHIT1  NQO1  HMOX1  SPP1  SIGLEC10  SAA2  SAA1  MGAM  RIPOR2  SULT1B1  IL1R2  CXCR1  VNN2  AQP9  S100A12  S100A9  S100A8  ANPEP  RNASE2  CSF3R  CR1  CD163  SERPINA3  TIMP4  SLCO4A1  BPIFB1  MSMB  LCN2  CCL5 was a risk factor for the pathogenesis of PAH, and CCL5 could exert vasoconstriction and remodel effects on the lung tissue in PAH [41]. In our study, we found that CCL5 is overexpressed in PAH patients, and this is consistent with previous research. What role does CCL5 play in pulmonary hypertension? It is reported that there may be in the following aspects: CCL5 may play an indirect role in PAH by inducing endothelin-converting enzyme 1 and endothelin-1, and endothelin-1 is a powerful endothelin-derived factor with strong vasoconstriction effects [42]. CCL5 is one of the genes regulating the NF-κB signaling pathway [40]. Activation of NF-κB is a feature of many chronic inflammatory conditions such as asthma and chronic obstructive pulmonary disease [40,43,44]. In atherosclerosis, activation of NF-κB can be seen in macrophages endothelial cells and vascular smooth muscle cells in atherosclerotic plaques [45]. Studies show that NF-κB may play an important role in PAH via mediating the cytokine-induced release of endothelin-1 [46]. PAH animal models have demonstrated that proinflammatory cytokines and chemokines are involved in the development of wild liliine-induced pulmonary hypertension [47,48]. Therefore, we speculate that the crosstalk between the CCL5 and NF-κB pathway plays an important role in the development of PAH, doing further researches of which might make a great progress in PAH. However, we found in the DGIdb database that there are currently no FDA-approved drugs targeting CCL5. Therefore, the SPP1 (secretory phospho-protein1), also known as OPN or osteopontin, is a chemokine-rich matrix phosphoglycoprotein, which mostly exists in human body fluids, lungs, gastrointestinal tract, and other organs. Previous studies have found that the SPP1 gene is highly expressed in idiopathic pulmonary fibrosis (IPF), the occurrence, and metastasis of multiple tumors. High expression of SPP1 promotes the occurrence of lung fibers by regulating the expression of many genes, such as LEP and KCNJ5 [49]. Chronic hypoxia and repeated chronic airway inflammation will lead to lung tissue destruction, fibrosis, and then increase pulmonary vascular resistance and eventually induce exacerbating PAH. Pulmonary fibrosis is a common reason for the formation of PAH. The comprehensive analysis showed that the high expression of SPP1 should be a driver of PAH. It has been reported that SPP1 plays a role in PAH via enhancing pulmonary vascular smooth muscle cell (PVSMC) proliferation [50,51]. The expression level of SPP1 was related to the severity of PAH [52,53]. These results suggest that SPP1 may be a prognostic indicator and therapeutic target for PAH. Targeted SPP1 therapy might reverse the development of pulmonary fibrosis and prevent or delay the progression of pulmonary hypertension. Further study of the molecular mechanism of SPP1 in pulmonary hypertension will be beneficial to the majority of patients.
VCAM1 encodes adhesion molecules induced by proinflammatory cytokines, and it has been reported that VCAM1 is increased in systemic sclerosis complicated with PAH [54]. VCAM1 on endothelial cells is involved in leukocyte adhesion and activates intracellular calcium release and NADPH oxidase Nox2, further promoting leukocyte migration [55,  ]. It has been reported that the concentration of VCAM-1 only increases in PAH-CTD, but not in patients with IPAH, suggesting that VCAM1 may play a specific role in PAH subtypes and may be related to the degree of inflammation, vascular injury, and antiangiogenesis [57]. However, other studies have found that VCAM1 is increased in both patients and animal models of IPAH [58,59]. Therefore, the expression pattern and role of VCAM1 in pulmonary hypertension need to be further studied. CXCL12 belongs to the CXC subfamily of chemokines, and increased blood CXCL12 levels are associated with right ventricular dysfunction in patients with idiopathic pulmonary hypertension [60]. The expression level of CXCL12 is significantly increased in the pulmonary vascular endothe-lium and the endothelium of the vasa vasorum of larger pulmonary vessels removed from PAH patients when compared to normal control [61,62]. The neutralization effect of CXCL12 can reduce the infiltration of pulmonary macrophages, so as to improve pulmonary hypertension in rats [60]. The promoter region of CXCL12 contains binding sites for several transcription factors, including NF-κB [63]. CXCL12 may play an in PAH via the NF-κB signaling pathway. It is reported that the CXCL12/CXCR4/CXCR7 axis plays a central role in PAH [64]. These results also suggest that CXCL12 may be a therapeutic target for PAH. CXCR1, also known as IL8R1, encoded a protein that is a member of the G-protein-coupled receptor family [65]. This protein is a receptor for interleukin 8 (IL8) [65]. Bone   morphogenetic protein receptor-II (BMPR-II) expressed in pulmonary artery endothelial cells, which play an antiinflammatory role by regulating the release of proinflammatory cytokines and promote the barrier function by inhibiting the migration of white blood cells to the pulmonary vascular wall [66]. In mice with deficient expression of BMPR-II, the decreased barrier function and the resulting PAH are the results of increased leukocyte recruitment caused by 13 BioMed Research International CXCR1/2 increase and inhibiting CXCR1/2 could be slow down to the progress of PAH [66]. These results suggest that CXCR1 may be a therapeutic target for PAH.
However, there are some limitations in this study. First, the sample size was relatively small. Second, this study was based on bioinformatics, and the results were based on computer analysis; so, the results need to be verified by experiments in vivo and in vitro.

Conclusions
Through bioinformatic analysis of microarray data sets, 5 hub genes (CCL5, CXCL12, VCAM1, CXCR1, and SPP1) were identified from DEGs between control samples and PAH samples. Studies showed that the five hub genes might play an important role in the development of PAH. These 5 hub genes might be potential biomarkers for the treatment of PAH. In addition, our work also indicated that paying more attention on studies based on these 5 hub genes might help to understand the molecular mechanism of the development of PAH.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors have no conflicts of interest.

Authors' Contributions
Yu Zeng, Nanhong Li, Zhenzhen Zheng, Junfen Cheng, and Cheng Hong contributed to the study concept and design and drafted the manuscript. Yu Zeng, Nanhong Li, Zhenzhen Zheng, Min Peng, Wang Liu, and Jinru Zhu contributed to the collection and assembly of data. Yu Zeng, Nanhong Li, and Mingqing Zeng contributed to preparing the tables and figures. Yu Zeng, Nanhong Li, Zhenzhen Zheng, Junfen Cheng, and Cheng Hong contributed to the data analysis and interpretation. All authors reviewed and approved the final version manuscript. Yu Zeng, Nanhong Li, Zhenzhen Zheng, and Riken Chen contributed equally to this work.