Identification of Signal Pathways and Hub Genes of Pulmonary Arterial Hypertension by Bioinformatic Analysis

Pulmonary arterial hypertension (PAH) is a progressive and complex pulmonary vascular disease with poor prognosis. The aim of this study was to provide a new understanding of the pathogenesis of disease and potential treatment targets for patients with PAH based on multiple-microarray analysis.Two microarray datasets (GSE53408 and GSE113439) downloaded from the Gene Expression Omnibus (GEO) database were analysed. All the raw data were processed by R, and differentially expressed genes (DEGs) were screened out by the “limma” package. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed and visualized by R and Cytoscape software. Protein-protein interactions (PPI) of DEGs were analysed based on the NetworkAnalyst online tool. A total of 442 upregulated DEGs and 84 downregulated DEGs were identified. GO enrichment analysis showed that these DEGs were mainly enriched in mitotic nuclear division, organelle fission, chromosome segregation, nuclear division, and sister chromatid segregation. Significant KEGG pathway enrichment included ribosome biogenesis in eukaryotes, RNA transport, proteoglycans in cancer, dilated cardiomyopathy, rheumatoid arthritis, vascular smooth muscle contraction, focal adhesion, regulation of the actin cytoskeleton, and hypertrophic cardiomyopathy. The PPI network identified 10 hub genes including HSP90AA1, CDC5L, MDM2, LRRK2, CFTR, IQGAP1, CAND1, TOP2A, DDX21, and HIF1A. We elucidated potential biomarkers and therapeutic targets for PAH by bioinformatic analysis, which provides a theoretical basis for future study.


Introduction
According to the latest clinical guidelines, pulmonary hypertension (PH) is divided into five major categories, the first of which is pulmonary arterial hypertension (PAH). PAH is a progressive and fatal disorder accompanied by nonspecific clinical manifestations such as chest pain, dyspnea, and cyanosis [1]. e main pathophysiological changes in PAH are increases in pulmonary arterial pressure and pulmonary vascular resistance (PVR) because of vascular proliferation and adverse remodeling in the distal pulmonary arteriole, giving rise to right heart failure and even death [1,2]. Although basic discoveries and pivotal clinical trials have led to the development of medications in recent years [3], patients with PAH still have a poor prognosis. According to a previous study of Chinese people, the 1-and 3-year survival rates are only 68% and 39%, respectively, and even though survival rates for PAH have improved in China in the targeted therapy era, the mortality is still high [4]. e existing medications, including prostacyclin analogues and receptor agonists, phosphodiesterase type 5 inhibitors, and endothelin receptor antagonists, are aimed at correcting the imbalance of vasoactive factors in PAH [5,6]. ere is a shortage of new drugs to address other pathological mechanisms such as vascular remodeling [6]. erefore, it is important to elucidate the molecular mechanisms and to identify novel drug targets for potential therapy.
Microarrays are powerful and effective tools to detect the gene expression differences between health and disease conditions. Microarray technology has been used for more than two decades and is a well-established and mature biochemical technique [7]. A large amount of microarray data is freely available from the NCBI-Gene Expression Omnibus (GEO), which is an open access database. e objective of this study was to use transcriptomic microarray data analysis to find genes associated with the onset of pulmonary hypertension, and to provide a research basis for identifying new potential therapeutic targets. In this study, we searched for and screened the transcriptome chip data of PAH from GEO to find differentially expressed genes (DEGs) between PAH and controls using bioinformatics analyses. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to detect statistically significant signaling pathways of DEGs. Also, a protein-protein interaction (PPI) network was established to identify hub genes. is research will provide a direction for the discovery of diagnostic biomarkers and targeted therapy strategies related to PAH.

Microarray Data Source and Screening.
e GEO is a public database that stores microarray, next-generation sequencing, and other high-throughput sequencing data (https://www.ncbi.nlm.nih.gov/). Using the search terms "pulmonary hypertension" and "microarray," and after filtering by "Homo sapiens" and "Expression profiling by array," we identified 15 transcriptomic studies. en, only human transcriptomic studies from lung tissue and with the same GEO platform were included. We ultimately selected two GEO series (GSE) studies (GSE53408, GSE113439) for further analysis. e study information is shown in Table 1.
In the two original studies, lung tissue samples were from PAH patients undergoing lung transplantation and healthy lung tissue was derived from normal tissue adjacent to cancer during lung cancer resection. First, total RNA was extracted by the Trizol method and reverse transcribed into cDNA. Second, cRNA was obtained by in vitro transcription using cDNA as a template and labeled by biotin.
ird, GeneChips were hybridized with cRNA, washed, and stained in an Affymetrix Fluidics Station 400. Finally, GeneChips were scanned using a GeneArray scanner. e two GSE studies used two different scanners.

Data
Processing and Quality Control. GSE53408 [8] contains 11 normal and 12 severe PAH lung tissue samples, while GSE113439 [9] contains 11 normal and 15 PAH samples (one CTEPAH sample (GSM3106338) was removed because it does not belong to WHO Group 1 pulmonary hypertension). We downloaded the raw CEL files of GSE53408 and GSE113439 from the GEO database. R software (version 4.0.2) was used to process the raw data.
e "affy" package [10] was used to read the CEL files and extract Affymetrix GeneChip probe level data. After correcting background, normalizing, and summarizing by using the Robust Multichip Average (RMA) algorithm [11], we got the probe expression matrix. Finally, we converted probe IDs into gene symbols (international standard names of genes) using the GPL6244 annotation package ("huge-ne10sttranscriptcluster.db" [12]). Probe IDs without corresponding gene symbols were deleted, and the maximum value was regarded as the gene expression value when multiple probes corresponded to one gene. We further carried out quality control for the gene expression matrix obtained above using cluster analysis and principal component analysis (PCA).

Screening and Integration of DEGs.
We used the "limma" package [13] for DEG analysis. Gene expression values have been log2 transformed and normalized after RMA algorithm processing and were fitted to a linear model by using weighted least squares. Differential expression analysis was performed between PAH and controls. "Limma" package calculated the log2 fold change (log2FC) and false discovery rate (FDR) adjusted P value for each gene. FDR uses the Benjamini-Hochberg method to correct the P value in order to control the number of false positives for multiple tests. Adjusted P value <0.05 and |log2FC| ≥ 1.0 were considered as DEGs. Considering the large batch effect between GSE53408 and GSE113439, we analysed and obtained the DEGs for each dataset. Ultimate DEGs were integrated by using the Venn diagram tool (https://bioinformatics.psb. ugent.be/webtools/Venn/).

GO and KEGG Pathway Enrichment
Analysis. GO enrichment analysis of DEGs was performed using the "enrichGO" function of the "clusterProfiler" package [14]. Genes were divided into three categories in function including biological process (BP), molecular function (MF), and cellular components (CC). GO terms were considered significantly enriched for the DEGs while an adjusted P value <0.05. e Benjamini and Hochberg method was used to adjust the raw P values. e KEGG (https://www.genome. jp/kegg/) pathway enrichment analysis of integrated DEGs in our study was performed using the DAVID webtools (https://david.ncifcrf.gov/). P value <0.05 was considered to be significant statistically. e visualization of analytical results was achieved by using the R (4.0.2 version) and Cytoscape software (3.8.0 version). Mura et al. [9] Lung tissue GSE113439 GPL6244 11 14 Abbreviations: PAH, pulmonary arterial hypertension.

Tissue-Specific PPI Network Analysis and Hub Gene
Identification. Different gene lists were inputted into Net-workAnalyst for lung tissue-specific PPI network analyses [15]. NetworkAnalyst is a bioinformatic online analysis tool with power functions and friendly operation interface, which can be accessed in https://www.networkanalyst.ca/. PPI data from human tissues were obtained from the Dif-ferentialNet database (https://netbio.bgu.ac.il/diffnet/) [16]. Hub genes were further identified by a selected minimum connected network. Each node represents a protein encoded by a gene while connections between nodes represent the interaction of proteins. e nodes of maximum connectivity were considered as core proteins or hub genes with important biological regulatory functions in PAH.

Microarray Data Information.
To identify DEGs in PAH compared with healthy individuals, two gene expression profiles were selected in our study. ey were GSE53408 and GSE113439. Raw data were downloaded and processed by the "affy" package. e gene expression values were normalized by the RMA integrated preprocessing algorithm. Boxplots of data before and after normalization are shown in Figure 1(a). e results of cluster analysis and PCA indicate that the PAH group and the control group data can be readily distinguished, as shown in Figure 1(b). erefore, subsequent genic difference analysis could be carried out.

DEG Analysis in PAH.
DEGs between PAH and normal lung tissues were evaluated by the "limma" package. ere were 635 DEGs in GSE53408, including 525 upregulated genes and 110 downregulated genes ( Figure 2(a)). And 564 DEGs were screened in GSE113439, including 468 upregulated genes and 96 downregulated genes ( Figure 2(a)). And the top 100 DEGs are displayed in the form of heatmaps, as shown in Figure 2(b). Taking into account the original experimental processes such as different model Genechip scanners, the two sets of data have a large batch effect, so a Venn diagram was constructed to obtain the final DEGs, as shown in Figure 2(c). Finally, 526 DEGs were identified in total, with 442 upregulated genes and 84 downregulated genes (Table 2). e top twenty upregulated and downregulated DEGs sorted by average log2FC size of the two datasets are shown in Table S1.

GO Term Enrichment Analysis of DEGs.
GO enrichment analysis for integrated DEGs was performed using the "clusterProfiler" package. e enriched GO terms were divided into three ontologies including biological process (BP), molecular function (MF), and cell component (CC). Significant results of GO enrichment analysis are exhibited in Figure 3(a). In BP, DEGs were generally enriched in mitotic nuclear division, organelle fission, chromosome segregation, nuclear division, and sister chromatid segregation. In MF, DEGs were chiefly enriched in ATPase activity, helicase activity, DNA-dependent ATPase activity, catalytic activity acting on DNA, and DNA helicase activity.
In CC, DEGs were mainly enriched in condensed chromosome, spindle, chromosomal region, mitotic spindle, and microtubule (Table S2 and Figure S1).

KEGG Pathway Enrichment Analysis of DEGs.
After inputting the DEGs list into DAVID online tool, the most significantly enriched KEGG pathways of the DEGs were obtained (Table S3). e signaling pathways of DEGs were eventually enriched in ribosome biogenesis in eukaryotes, RNA transport, proteoglycans in cancer, dilated cardiomyopathy, rheumatoid arthritis, vascular smooth muscle contraction, focal adhesion, regulation of actin cytoskeleton, and hypertrophic cardiomyopathy (Figure 3(b)). Cytoscape software was used to calculate the network topological characteristics and draw each node and edge. Different pathway terms and genes are displayed by ovals in different colors, as shown in Figure 4(a).

Tissue-Specific PPI Network Analysis and Hub Gene
Identification. Protein-protein interactions among the DEGs were predicted by the NetworkAnalyst tool. Human lung tissue-specific networks were constructed based on the DifferentialNet database. We chose a minimum connected network for identifying hub genes in PAH. is PPI network contains 1030 nodes and 3364 edges (Figure 4(b)). e top ten hub genes evaluated according to the degree of connectivity in the PPI network were HSP90AA1, CDC5L, MDM2, LRRK2, CFTR, IQGAP1, CAND1, TOP2A, DDX21, and HIF1A ( Table 3). All of these identified hub genes were upregulated in PAH patients.

Discussion
Pulmonary arterial hypertension is a chronic and progressive disease with few typical manifestations and thus could be delayed in diagnosis and treatment [2]. Unfortunately, the fundamental pathogenesis of PAH preliminarily remains unclear warranting the necessity to explore basic mechanisms to delineate specific targets promising for therapeutic intervention.
In this study, we downloaded two microarray datasets, GSE53408 and GSE113439, from the GEO database and separately obtained the DEGs by bioinformatics analysis. en the upregulated and downregulated genes from two datasets were intersected by a Venn diagram webtool to obtain the final integrated DEGs. We identified 526 DEGs, including 442 upregulated genes and 84 downregulated genes. e integrated DEGs were further subjected to GO and KEGG pathway enrichment analysis and tissue-specific PPI network analysis.
ere has been a number of PAH studies using microarray analyses [17][18][19], and the signaling pathway or hub genes identified by each study varied widely. is was mainly due to different microarray platforms or different data processing procedures. In Li's study [18], they analysed the DEGs using the GEO2R online tool. Although the GEO2R tool is simple and convenient, the average gene expression value varies from one sample to another in the raw data, so it is hard to ensure the accuracy of the analysis  CHDPAH2 CHDPAH3  CHDPAH4  CTDPAH1  CTDPAH2  CTDPAH3  CTDPAH4  IPAH2  IPAH3  IPAH4  IPAH5  IPAH6  normal1  normal2  normal3  normal4  normal5  normal6  normal7  normal8  normal9  normal10  normal11  IPAH1   4   8   10   12   14   CHDPAH1  CHDPAH2  CHDPAH3  CHDPAH4  CTDPAH1  CTDPAH2  CTDPAH3  CTDPAH4  IPAH2  IPAH3  IPAH4  IPAH5  IPAH6  normal1  normal2  normal3  normal4  normal5  normal6  normal7  normal8  normal9  normal10   results. Our study started by analyzing raw CEL format microarray data and normalized it using the RMA algorithm to keep the gene expression levels of each sample at the same level, which ensured the reliability of downstream analysis. Luo et al. [17] combined the disease group and control samples of the two datasets and used the "sva" R package to remove batch effects for DEG analysis that may cause some deviation considering the use of different GeneArray scanners between GSE53408 and GSE113439. In this study, we analysed GSE53408 and GSE113439 and merged the DEGs of the two datasets, which may be more reasonable. In addition, one CTEPH sample in GSE113439 was not selected in our study because we only paid attention to the WHO group 1 PH. e DEGs associated with GO BP terms mainly consisted of mitotic nuclear division, organelle fission, chromosome segregation, nuclear division, and sister chromatid segregation, which all participated in the process of cell division and proliferation. In PAH progression, excessive proliferation of pulmonary artery smooth muscle cells plays an important role, which causes vascular medial thickening and pulmonary vascular remodeling [20]. DEGs associated with GO MF terms such as ATPase activity, helicase activity, DNA-dependent ATPase activity, catalytic activity acting on DNA, and DNA helicase activity indicated that DNA replication was active, which was consistent with the results of BP. e enriched KEGG pathways of DEGs such as vascular smooth muscle contraction, focal adhesion, and regulation of the actin cytoskeleton may be associated with the progression of PAH. As we all know, pulmonary vascular smooth muscle contraction is a critical pathological mechanism of PAH, especially in the state induced by hypoxia. Hypoxia promoted pulmonary vasoconstriction by interacting with a variety of protein kinases, calcium channels, and potassium channels [21]. Blocking these signal transduction pathways may provide novel therapeutic approaches for PAH depending on further research. e focal adhesion pathway was also identified in PAH by Zhao and Austin's study [22]. Focal adhesion dynamics is regulated by focal adhesion kinase (FAK) family signaling via integrins [23]. Several studies have demonstrated the association of FAK with vascular diseases including PAH [23,24]. Jia et al. [24] found that activation of FAK could stimulate pulmonary arterial smooth muscle cell proliferation in vitro. Paulin et al. [25] confirmed that FAK inhibitor improved hemodynamics, vascular remodeling, and right ventricular hypertrophy in monocrotaline induced PAH rat model. Based on the above research, FAK inhibition may open a new therapeutic way for PAH. Finally, regulation of the actin cytoskeleton was found to be involved in the process of smooth muscle cell contraction, proliferation, and migration [26]. Abelson tyrosine kinase (c-Abl) could modulate restructuring of actin cytoskeletal [27]. ere was some research [28,29] suggesting that c-Abl inhibitor imatinib attenuated the symptoms of patients with PAH. normal_15  IMPA1  ZNF33A  LYSMD3  ZNF654  ODF2L  ZNF277  USP8  BLZF1  EEA1  DLD  GALNT1  DNAJC21  COPB1  TPR  OPA1  CLOCK  ZNF28  ITPR2  KIF5B  USO1  CDC5L  UPF2  R3HCC1L  NBN  RPAP3  SMC4  AP3B1  RAD50  ATP6V1C1  IDE  WRN  IBTK  IDH1  BRCC3  VRK2  ZNF91  CEP290  SMC3  SMC6  LARP7  ZC3H13  ROCK1  HOOK3  AZI2  THOC2  USP34  PCM1  OSBPL8  BRWD3  STAG2  GCC2  BOD1L1  CEP350  DIAPH2  ROCK2  GOLGB1  PPP1R12A  NSRP1  SLTM  TXLNG  USP15  PNN  EIF5B  DHX36  GOLGA4  TMF1  SBNO1  DNM1L  KRR1  TAX1BP1  SREK1  USP16  RANBP2  SRFBP1  MAPK6  ARMC8  SCYL2  ZC3H15  WDR75  IWS1  MIOS  SOCS4  PHF20L1  STK38L  EPS8  CCDC88A  CCDC59  COPS2  SLC4A1AP  FRS2  GPR146  WFS1  CRIP2  SCN4B  ICAM2  SNORA37  SNORD94  SCARNA4  SNORD74   group_list   DLL4  SNORD74  SCARNA4  SNORA22  FAM167A  CRIP2  PTMS  CACNG4  GPR146  GDPD3  ZFX  JPX  ZNF33A  RPAP3  CWC22  RSRC1  DLD  EEA1  GCC2  ZNF91  PIBF1  ZC3H13  TTC3  LARP7  HOOK3  ROCK1  SMC3  SMC6  GOLGA5  ZNF28  SUPT16H  DNAJC21  NSRP1  CLOCK  EPC2  RABEP1  FRS2  USP8  SLU7  SNX2  PYROXD1  NOL8  SDAD1  DHX36  ZNF146  ESF1  WDR75  WRN  NUCB2  SMC4  DEK  LYRM2  IDH1  GFPT1  LDHB  SLC4A1AP  GALNT1  ATP6V1C1  RSL1D1  LTV1  EPS8  CCDC88A  COPB1  IWS1  STK38L  ITPR2  BLZF1  NBN  KIF5B  USO1  AZI2  SREK1  RANBP2  ROCK2  GOLGB1  SLTM  OSBPL8  TMF1  GOLGA4  TANK  PNN  PPP4R2  CCDC59   CCDC47  BZW1  ZC3H15  EIF3A  TPR  OPA1  EIF5B  PRPF40A  COPS2  USP16  KRR1  SBNO1  TAX1BP1  DNM1L GSM3106345 MPHOSPH10 PBDC1 (b)    Network analysis is a powerful tool that helps to identify hub genes and to discover novel targets for therapy [30]. We constructed a PPI network of proteins encoded by DEGs and screened 10 hub genes according to their connectivity degree: HSP90AA1, CDC5L, MDM2, LRRK2, CFTR, IQGAP1, CAND1, TOP2A, DDX21, and HIF1A. We searched for the relationship between each gene symbol and pulmonary hypertension in PubMed and found that four genes involved in PAH had been studied by previous researchers. Shen et al. [31] reported that MDM2-mediated ubiquitination of ACE2    [32]. In hypoxia-induced PAH mice, deficiency of CFTR signally attenuated right ventricular systolic pressure, pulmonary vessel medial wall thickness, and muscularization [33]. Grigolo et al. [34] found that antitopoisomerase II (TOP2A) autoantibodies were related to systemic sclerosis associated with pulmonary hypertension. Luo et al. [35] reported that the CD146-HIF-1α axis in PASMC derived vascular remodeling and PAH. Modulation of CD146-HIF-1α may be a new antiremodeling therapy strategy. e remaining six hub genes, HSP90AA1, CDC5L, LRRK2, IQGAP1, CAND1, and DDX21, were not found to have a direct connection with PAH previously. HSP90AA1, as a molecular chaperone, assisted the assembly of intracellular molecules and protein folding, and participated in multiple processes of fibroblast activation [36,37]. Upregulation of CDC5L could promote cell cycle progression and proliferation [38]. Hongge et al. [39] found that LRRK2 was expressed in endothelial cells and regulated the procedure of monocyte adhesion, which was involved in vascular inflammation and immune response. IQGAP1, as a scaffold protein, maintained the integrity of the endothelium barrier [40]. Gharib et al. [41] discovered that IQGAP1 was a novel candidate biomarker and played a crucial role in hypoxia-induced pulmonary hypertension and vascular remodeling by global gene annotation analysis. ere are several limitations in our study. At first, microarray technology can only detect known genes, and due to discrepancies in chip probes designed by different companies, the conclusions obtained above might vary. en, considering the relatively small sample size in the two public datasets (48 samples in total), there may be some bias on screened DEGs. Finally, all of our results were based on the forecasting of bioinformatic analysis, which should be further confirmed by in vitro and in vivo experiments.

Conclusion
In summary, we analysed the DEGs between patients and healthy individuals aiming to assist the understanding of the pathogenesis of PAH. Vascular smooth muscle contraction, focal adhesion, and regulation of the actin cytoskeleton pathways were found to be involved in PAH in this study. And ten hub genes, including HSP90AA1, CDC5L, MDM2, LRRK2, CFTR, IQGAP1, CAND1, TOP2A, DDX21, and HIF1A may be new therapeutic targets for patients with PAH to be further validated.

Data Availability
All data relevant to the study are included in the article and are available from the corresponding author on reasonable request.

Disclosure
Chunmei Piao and Guangfa Zhu are the co-first authors.

Conflicts of Interest
e authors have no conflicts of interest to declare.

Authors' Contributions
Chunmei Piao and Guangfa Zhu contributed equally to this work. Rui-Qi Wei, Chunmei Piao, and Guangfa Zhu contributed to the study design, analysis, and writing of the article. Wen-Mei Zhang and Zhe Liang contributed to data analysis. All authors read and approved the manuscript.  Table S1: top 20 upregulated and downregulated DEGs according to average gene expression fold change from two datasets. Table S2: GO enriched analysis of DEGs between PAH and normal tissues (top five terms in BP, MF, and CC of DEGs according to P value).