Identifying Key Biomarkers and Immune Infiltration in Female Patients with Ischemic Stroke Based on Weighted Gene Co-Expression Network Analysis

Stroke is one of the leading causes of death and disability worldwide. Evidence shows that ischemic stroke (IS) accounts for nearly 80 percent of all strokes and that the etiology, risk factors, and prognosis of this disease differ by gender. Female patients may bear a greater burden than male patients. The immune system may play an important role in the pathophysiology of females with IS. Therefore, it is critical to investigate the key biomarkers and immune infiltration of female IS patients to develop effective treatment methods. Herein, we used weighted gene co-expression network analysis (WGCNA) to determine the key modules and core genes in female IS patients using the GSE22255, GSE37587, and GSE16561 datasets from the GEO database. Subsequently, we performed functional enrichment analysis and built a protein-protein interaction (PPI) network. Ten genes were selected as the true central genes for further investigation. After that, we explored the specific molecular and biological functions of these hub genes to gain a better understanding of the underlying pathogenesis of female IS patients. Moreover, the “Cell type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT)” was used to examine the distribution pattern of immune subtypes in female patients with IS and normal controls, revealing a new potential target for clinical treatment of the disease.


Introduction
Stroke is one of the leading causes of death and long-term disability in the world. Every year, it is estimated that 96 million new cases of ischemic stroke and 41 million new cases of hemorrhagic stroke are diagnosed worldwide [1]. Arterial occlusion-related ischemic stroke is a major cause of the majority of strokes, accounting for 87 percent of stroke cases and nearly half of all deaths [2]. Current evidence shows that [3] the focus of IS treatment is on timely blood clot removal and long-term secondary prevention. However, the molecu-lar mechanism of IS remains unknown. In addition, gender differences in stroke patients may influence clinical manifestations, epidemiological features, pathophysiology, prognoses, and outcomes. Therefore, a study of stroke patients with no gender differences may result in biased results [4,5]. Previous research indicates that [6] women had a higher risk of stroke-related death than men, with six out of ten women dying from stroke. This increased risk could be attributed to a variety of factors, one of which is that women live averagely longer, which increases their risk of stroke. Meanwhile, other risk factors such as high blood pressure during pregnancy and certain types of birth control medication increase their overall risk of stroke [7]. As such, we only used data of female IS patients.
Researchers frequently use WGCNA to investigate the complex relationships between genes and phenotype; it converts gene expression data to co-express module, allowing a better understanding of the network signal responsible for phenotypic characteristics [8]. This approach has been widely used in a variety of biological processes whereby it plays a significant role in the comparison of differentially expressed genes and the discovery of co-expression module genes [9]. The WGCNA approach has provided functional interpretation tools in system biology, and it is widely used in stroke-related research [10][11][12]. In this study, we used genes in three GEO datasets from female patients with ischemic stroke and healthy controls to build a co-expression network by WGCNA. In addition, we looked at the relative proportions of 22 different types of immune cells in 72 blood samples from IS patients and 24 blood samples from healthy females. We built co-expression modules based on the expression data of female IS patients and perform integrated bioinformatics analysis of the modules of interest, whose gene expression was specifically co-related to female IS patients and could provide potential therapeutic targets for disease management.

Dataset Information.
The gene expression data used in this study were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) [13]. Data processing was divided into three stages. First, the one probe expression matrix files downloaded from the GEO database were normalized and log2 transformed. The platform annotation file was then matched with each probe expression matrix, and only well-annotated probes were retained. In order to ensure the accuracy of included data, we analyzed the average expression values of multiple probes corresponding to a gene. Finally, R package sva, installed from Bioconductor (https://bioconductor.org/), was applied to eliminate the heterogeneity caused by different experimental batches and platforms.

Construction of Co-Expression
Network. The WGCNA method assists in investigating gene set expression. Herein, the WGCNA R package was used at various stages for the construction and module division of different gene networks through the following major steps [14]. The samples were clustered to see if any obvious outliers were present. Next, automatic network construction was used to construct the co-expression network. Hierarchical clustering and a dynamic tree-cutting function detection module were employed. Gene salience (GS) and module membership (MM) were calculated to correlate modules with clinical characteristics. The module with the largest absolute value of Pearson's correlation of module membership (MM) and a p-value <0.05 was defined as the hub module. MM >0.8 and GS >0.2 represented high module connectivity and high clinical significance, respectively. The corresponding module gene information was extracted for further analysis.

Functional Enrichment Analysis.
The intersection genes of target modules were extracted from the network, and the enrichment analysis was performed to further investigate the functions of each module. Gene Ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were considered to be enriched with thresholds of p-value <0.05 and an enriched gene count >2.

Immune Infiltration through CIBERSORT Analysis.
CIBERSORT is a deconvolution algorithm that can analyze any immune cell subtype and accurately quantify the various immune cell components. We looked at analyzed immune infiltration in 72 female patients with ischemic stroke and 24 healthy females. These immune cells include immature B cells, memory B cells, and the other 19 types of immune cells. The percentage of each immune cell in the sample was calculated, with p < 0:05. The R software packages "ggplot 2" and "GGPUBR" were used to compare the levels of immune cell infiltration in female stroke patients and normal female subjects.

Protein-Protein Interaction (PPI) Network Analysis.
Genes were imported into the STRING website (version: 11.0) to explore the mutual relationship between proteins encoded by different genes [15]. We ensured that the lowest interaction score is greater than 0.4, and isolated nodes in the network were removed. The analysis results were output to a TSV format file, and Cytoscape software (version: 3.7.1) was used for details processing and module analysis. Cytohubba is a plug-in that can find closely connected nodes in a complex network based on topology. It can be downloaded from the Cytoscape App Store. We used this plug-in to find the most significant cluster of 10 nodes in a PPI network using the default parameters.
2.6. Animal Experiments. Female SD rats (3-4 months old); mean body weight =220 g) were provided by Shanghai Xipu Bikai experimental animal company (animal license No: SCXK (Shanghai) 2018-0006). Animal experiments were performed following the China legislation on the use and care of laboratory animals and were approved by the Medical Norms and Ethics Committee of Zhejiang Chinese Medical University. Female SD rats were randomly assigned to the normal group and the IS group (n =8 per group).

Model Preparation.
The middle cerebral artery occlusion (MCAO) model was established in 4-month-old female SD rats. First, the rats were anesthetized with 3% pentobarbital and then fixed on an operating table. A midline neck incision was used to expose the common carotid artery (CCA), external carotid artery (ECA), and internal carotid artery (ICA). A 6-0 nylon suture was inserted into the internal carotid artery through the external carotid artery stump and gently advanced to occlude the middle cerebral artery. Sham-operated animals underwent the same surgical operation but with no nylon suture insertion. Following that, the wound was then sutured and disinfected.

2
Neural Plasticity Building the weighted gene co-

Neural Plasticity
All animals received intraperitoneal injection of penicillin (100 U/d) to prevent infection.

Quantitative Reverse Transcription PCR (qRT-PCR).
qRT-PCR was used to confirm the expression of 10 hub genes in peripheral blood. qPCR was performed on the CFX96 Real-Time System (BioRad, USA) using the Fast Start Universal SYBR Green Master kit (TaKaRa Bio Inc., China) according to the manufacturer's protocol. The relative quantification was performed by the ΔΔCT method. Primer sequences are listed in Table 1. 2.9. Statistical Analyses. All statistical analyses between control and experimental groups were completed using the GraphPad Prism8 and data were analyzed with a oneway ANOVA followed by Tukey Kramer tests. The results are presented as mean ± SEM; p < 0:05 denote statistical significance. Figure 1 shows a schematic diagram of the workflow. A co-expression network was built in a sample of female IS patients and a sample of normal females, and several modules of clinical significance were identified. The function of the key modules was then examined to reveal the core differential genes in female stroke patients.

Constructing the Weighted Gene Co-Expression Network.
The GSE22255, GSE37587, and GSE16561 datasets were obtained from the GEO database. A total of 24 normal samples and 72 IS samples were examined. The samples were first clustered, and the obvious outlier samples were eliminated by setting the threshold as illustrated in Figure 2(a). The sample cluster tree included 96 samples. After that, we selected the soft threshold. As illustrated in Figures 2(b) and 2(c), when R^2 > 0:8, the fitting degree was high enough and the mean connectivity was relatively high. Furthermore, we used the WGCNA R software package to construct the gene network and determined the module based on a certain soft threshold. A weighted gene co-expression network was built to split the cluster, and the co-expression modules were divided using dynamic cutting and module merging.

Identification of Clinically Significant Modules.
The correlation analysis of gene expression and disease characteristics was performed by WGCNA, and four-gene expression modules were developed (Figure 3(a)). Next, we linked modules to clinical features and searched for the most important ones. As demonstrated in Figures 3(b) and 3(c), gene expression of the turquoise module was most closely related to the two groups of key factors (normal and IS) (r =0.44). Therefore, we selected the turquoise module for the subsequent analysis.   The GO analyses results demonstrated that the genes were primarily related to the regulation of T cell activation, mononuclear cell differentiation, and neutrophil-mediated immunity in biological processes (BP) (Figure 4(a) and Table 2). Furthermore, gene cell components were primarily enriched in the ribosome and secretory granule lumen. In terms of molecular function (MF), genes were primarily enriched in enzyme inhibitor activity and ribosome structural constituents. Next, we performed functional analysis (KEGG analysis) on the genes in the turquoise module (Figure 4(b) and Table 3). The findings revealed an association between the regulation pathway of the turquoise module with the ribosome, the HIF−1 signaling pathway, and natural killer cell-mediated cytotoxicity.  Figure 5(c)). F Neutrophils and macrophages M0, for example, was 0.48, and T cells CD8 and neutrophils was -0.53, and so on. Moreover, IS females had more monocytes and neutrophils when compared to healthy females ( Figure 6(d)).
3.6. Identification of Hub Genes in the Functional Modules and Protein-Protein Interaction Network Construction. A total of 211 genes were identified from the clinically significant module (turquoise module) of the co-expression network. A PPI network was then constructed using the STRING database ( Figure 6). Finally, cytoHubba was used to screen for and visualize the hub genes (RPS15A, RPS6, RPS28, RPL7, PABPC1, RPL31, RPL14, RPL9, PFDN5, TNF) in the network (Figure 7).

3.7.
Validation of the Hub Genes Using qRT-PCR. qRT-PCR was used to reverify the 10 hub genes to further demonstrate the reliability of the WGCNA results. First, the stroke model was constructed by performing MCAO as described in the methods. Next, the peripheral blood was used for qRT-PCR. The results showed that the expression of the RPS28, RPS6, RPS15A, RPL9, TNF, and RPL31 genes were significantly higher in the IS model, whereas the expression of RPL7, PABPC1, and PFDN5 genes was significantly lower in the IS model (p-value<0.01). However, no statistically significant differences in expression levels were found between the two groups (Figure 8(f)).

Discussion
Ischemic stroke, a neurological disease with a high morbidity and mortality rate, is one of the leading causes of permanent disability in adults [16]. Previous evidence indicates that [17] the etiology, risk factors, and prognosis of this disease all differ by gender. Of note, the risk of IS increases with an increase in the life expectancy of women. Ischemic stroke is also a common female complication during pregnancy and puerperium. Nearly 30 cases per 100000 cases of gestation

13
Neural Plasticity (including all subtypes) show stroke symptoms in the two periods, with puerperae in high-risk groups having a higher incidence of IS [18]. In this view, females may bear a heavier burden than males.
Three dates were used in this study to collect 96 peripheral whole blood samples, 72 IS females and 24 normal females. We built a co-expression module using three datasets from WGCNA and confirmed that the turquoise module was crucial for females with IS (Figures 2 and 3). GO analysis demonstrated that immune response and ribosomes played critical roles in the pathogenesis of females with IS. Furthermore, KEGG analysis revealed that the main pathways of this disease in IS female patients were ribosome, HIF−1 signaling pathway, and natural killer cell-mediated cytotoxicity (Figure 4 and Tables 2 and 3). Ribosomal proteins (RPs) play an important role in the regulation of gene expression and protein synthesis [19]. Elsewhere, an experimental study found that cytoplasmic ribosomes play a role in functional recovery after ischemic stroke [20]. In the present study, the GO and KEGG results revealed that immune response and ribosomes are critical links in the mechanism of female patients with ischemic stroke.
Based on the above findings, we looked into the difference in immune infiltration of 22 immune cell subsets between IS females and healthy females. Mechanistically, the immune system is activated following ischemic stroke. A series of changes occur during immune cell migration to the ischemic area, resulting in either beneficial or detrimental effects on ischemic outcomes [21]. More importantly, certain key immune cells could become a new target for IS prevention or treatment. Our findings demonstrated that neutrophils, NK cells resting, macrophages, and T cells CD8 were the primary immune infiltrating cells.
Previous research has shown [22] that neutrophil infiltration is involved in recruitment of other immune cells. While some studies suggest that neutrophil infiltration may exacerbate ischemic stroke injury, other studies have discovered [23,24] neutrophil involvement in tissue remodeling after stroke. NK cells are large granular lymphocytes of innate immunity that are required for IS immunosurveillance. NK cells regulate cellular immune response by block-ing CD8 + T cell activation [25]. Infiltration of NK cells into the periinfarct area, on the other hand, can hasten neuronal death [26]. Improving NK cell infiltration into the periinfarct area is thus an important therapy for female patients with ischemic stroke that will eventually improve clinical responses. Macrophages are critical regulators of host defense in organisms, and they play critical roles in the repair of the central nervous system. In addition to providing neuroprotection, macrophages are a major source of proinflammatory cytokines [27]. These proinflammatory factors prevent the repair of brain tissue [28]. T cells regulate both innate and adaptive immunity, which may play a role in the pathogenesis of some neurological diseases [29]. Previous evidence shows that [30] activated and infiltrated microglia/macrophages after cerebral ischemia can stimulate activated CD4 + T cells to differentiate into Th1 or Th2 cells, which then produce proinflammatory or anti-inflammatory cytokines to damage or protect the brain. On the other hand, CD8+ cytotoxic T cells cause neuronal death and exacerbated brain injury via cell interaction. Findings from the present study demonstrate that T cells, macrophages, NK cells, and neutrophils may be important immune targets for the treatment of female ischemic stroke patients. The PPI network yielded 10 hub genes of female IS patients, which is consistent with the results of the GO and KEGG analyses.
Ribosomes have a wide range of complex functions. They are made up of several ribosomal proteins (RP), ribosomal RNA (rRNA), and small nucleolar RNA [31]. In terms of functions, ribosomal proteins are involved in a variety of critical biological processes in diseases. RPS15A (ribosomal protein S15A) is a component of the ribosomal 40s subunit that is involved in a series of biological processes, including proliferation, apoptosis, differentiation, and DNA repair [32]. Increasing evidence suggests that [33] RPS15A exerts critical functions in the development and progression of cancer. Ribosomal protein S6 (RPS6), a component of the cell translation system, has been shown to play a role in the progression of 40s ribosome biogenesis [34]. rpS6 phosphorylation is commonly used as a marker of neuronal activity in neuroscience [35]. Recent evidence shows that 14 Neural Plasticity [36] RPS28 regulates MHC Class I peptide generation for immunosurveillance. However, the dysfunction of RPS15A, RPS6, and RPS28 in females with ischemic stroke patients has never been reported. Similarly, the present study found upregulated levels of RPS15A, RPS6, and RPS28 in IS samples (Figures 8(a)-8(c)). Ribosomal protein S7 (RPL7), ribosomal protein S31(RPL31), and ribosomal protein S9 (RPL9) are components of the large (60S) human ribosomal subunit. Overexpression of ribosome genes has been shown [37] to promote the translation and protein biosynthesis of Cardiac Allograft Rejection (AR) and cancer-related cytokines. Furthermore, ribosomes in stroke-induced peripheral immunosuppression may be a potential mechanism of sex disparities in outcome following IS [38]. Researchers implicate autoantigens, including RPL7, RPL31, RPL14, and RPL9, in the regulation of diseases such as systemic lupus erythematosus, rheumatoid arthritis, and systemic sclerosis [37]. Herein, we found higher levels of RPL31and RPL9 in IS samples compared to non-IS samples (Figures 8(e) and 8(g)). The IS model had significantly lower RPL7 genes, but there was no significant difference in RPL14 between the two groups (Figures 8(f) and 8(d)).
Our findings suggest a close association of PABPC1, PFDN5, and TNF with female ischemic stroke patients. PABPC1 (poly (A) binding protein cytoplasmic 1) is an important component of the RNA stabilization protein complex involved in germ cell development and mRNA translational regulation, and they have been linked to cancer [39]. Prefoldin (PFDN) is a co-chaperone protein widely accepted to play important roles in normal neuronal development and maintenance [40]. Mounting evidence had demonstrated that [41]inflammation is the principal cause of the pathology and physiology process of IS. Furthermore, the balance between proinflammatory cytokines and anti-inflammatory cytokines was linked to the progression and prognosis of IS patients [42]. Tumor necrosis factor (TNF) is a neuroinflammation cytokine and a potential target in future stroke therapy among the 10 core genes listed above. TNF regulates the size of ischemic injury. Following an ischemic stroke, TNF levels rise in cerebrospinal fluid (CSF) and blood [43]. This is consistent with our research. Additionally, furthermore, there is preliminary evidence that [44] targeting TNF may become a therapeutic approach in ischemic stroke. In addition, previous research has shown [45] a close association of ribosomes with immune cells. The 10 core genes identified in our present research were mostly ribosomal proteins. Therefore, we hypothesize that these genes are linked to T cells, macrophages, NK cells, and neutrophils in female ischemic stroke patients.
In conclusion, this study, for the first time, used microarray samples from IS females for WGCNA. We validated four immune-related gene expression modules and 10 central genes. The findings may be useful in further elucidating the pathogenesis of IS in female patients. Furthermore, we hypothesize that neutrophils and monocytes may play an important role in the disease progression of female IS patients. These findings could point to important biological targets for drug screening and drug design in IS females.

Data Availability
The data used to support the findings of this study are included within the article.