Identification of the Potential Molecular Mechanism of TGFBI Gene in Persistent Atrial Fibrillation

Background. Transforming growth factor beta-induced protein (TGFBI, encoded by the TGFBI gene) is an extracellular matrix protein that is widely expressed in a variety of tissues. It binds to type I, II, and IV collagens and plays important roles in cell-cell, cell-collagen, and cell-matrix interactions. It has been reported to be associated with myocardial fibrosis, which is an important pathophysiological basis of atrial fibrillation (AF). However, the mechanism of TGFBI in AF remains unclear. We aimed to explore the potential mechanism of TGFBI in AF via bioinformatics analysis.
Methods. The microarray dataset GSE115574 was examined to detect the genes coexpressed with TGFBI in 14 left atrial tissue samples from AF patients. TGFBI coexpression genes were screened using R. Using online analytical tools, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and Gene Ontology (GO) annotation and constructed the protein-protein interaction (PPI) network of TGFBI and its coexpression genes. The modules and hub genes of the PPI-network were then identified. Another dataset, GSE79768, was examined to verify the hub genes. DrugBank was used to identify potential target drugs.
Results. In the GSE115574 dataset, a total of 1818 coexpression genes (769 positive and 1049 negative) were identified, enriched in 120 biological processes (BP), 38 cellular components (CC), 36 molecular functions (MF), and 39 KEGG pathways. A PPI-network with an average node degree of 12.2 was constructed. The genes clustered in the top module of this network mainly play a role in the PI3K-Akt signaling pathway, viral myocarditis, inflammatory bowel disease, and platelet activation. CXCL12, C3, FN1, COL1A2, ACTB, VCAM1, and MMP2 were identified and verified as the hub genes, mainly enriched in pathways such as leukocyte transendothelial migration, PI3K-Akt signaling pathway, viral myocarditis, rheumatoid arthritis, and platelet activation. Pegcetacoplan, ocriplasmin, and carvedilol were the potential target drugs.
Conclusions. We used microarray datasets to identify the potential functions and mechanisms of TGFBI and its coexpression genes in AF patients. Our findings suggest that CXCL12, C3, FN1, COL1A2, ACTB, VCAM1, and MMP2 may be the hub genes.


Introduction
Atrial fibrillation (AF) is a complex arrhythmia that affected about 5.2 million people in the US in 2010, a number expected to increase to 12.1 million by 2030 [1], and it accounts for 25-30% of ischemic strokes [2]. In addition, persistent AF leads to mitral regurgitation and reduces left heart ejection function by up to 25%, aggravating heart failure and even resulting in death [3]. It is estimated that around 6-12 million people will suffer from this condition in the US by 2050 and 17.9 million people in Europe by 2060 [4,5]. In China, the lifetime risk of AF is estimated to be approximately 1 in 5 [6]. The large population of AF patients imposes a great economic burden on patients and countries worldwide. Previous research shows that, among AF patients aged 18-45 years, the mean cost of in-hospital AF management was $7924 in 2015 [7]. To make matters worse, the prevalence of AF and its economic burden remain high. To date, AF is considered a disorder of electrical activity in the atrium, and the driver is probably located at the left atrium-pulmonary vein junction. Although catheter ablation and balloon cryoablation based on circumferential pulmonary vein electrical isolation show encouraging results, the mechanism of AF remains unclear.
The transforming growth factor beta-induced gene (TGFBI, HGNC: 11771, Ensembl: ENSG00000120708, MIM: 601692), also known as CSD, CDB1, EBMD, and CDGG1, encodes an RGD-containing protein that binds to type I, II, and IV collagens. In an early study, Li et al. [8] reported that the expression of TGFBI can be upregulated by TGF-beta stimulation, activating the TGF-beta/BMP signaling pathway and inducing the differentiation of bone marrow stem cells into immature cardiomyocytes. A recent study confirmed that TGFBI is a candidate marker for human cardiac fibroblasts in vivo and in vitro [9]. In addition, Chen et al. [10] reported that TGFBI is a target of microRNA-21, which plays a role in the regulation of fibrosis. In miR-21 knockdown cells, TGFBI was significantly upregulated, which promoted the formation of fibrosis. It is well known that AF is an age-related disease; atrial fibrosis has emerged as an important pathophysiological contributor in aging and has been linked to recurrences and complications of AF [11]. Thus, TGFBI is likely to play a role in the process of AF. However, the precise mechanism of TGFBI in AF is largely unknown.
In recent decades, microarray and sequencing technology has developed rapidly and has significantly advanced basic and clinical medicine. The Gene Expression Omnibus (GEO) database is a large repository that integrates high-throughput microarray and next-generation sequencing functional genomic datasets and is freely available to researchers worldwide [12]. To date, the GEO database has helped numerous researchers to identify key mechanisms and hub targets of cardiovascular diseases, tumors, and other diseases [13][14][15]. The regulation of a pathway often involves several genes, and there are coexpression relationships among these genes. In the current study, we aimed to further understand the mechanism of TGFBI in AF patients by detecting TGFBI, its coexpression genes, and the pathways in which they are enriched.

Study Design and Dataset Selection.
Genes with similar functions or involved in the same pathways often have coexpression relationships. To explore the mechanism of TGFBI in AF, we aimed to screen out the genes coexpressed with TGFBI for analysis. This approach is similar to the coexpression module analysis in weighted correlation network analysis (WGCNA) [16]. The difference is that WGCNA screens coexpression among differentially expressed genes or RNAs, whereas the central idea of our study was to take TGFBI as the central gene and combine it with its coexpressed genes to detect their biological functions, cellular localization, and enrichment pathways in AF. The cor function in R can be used to compute the correlation coefficient between two variables (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html). The usage of the cor function is as follows: cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman")). The arguments are as follows: x: a numeric vector, matrix, or data frame; y: NULL (default) or a vector, matrix, or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient); use: an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs"; method: a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman"; it can be abbreviated. In a prior study, we used this method to explore the mechanism of NPPB and its coexpressed genes in different patients with heart failure [17]. Zhang et al. [18] also used the cor function to analyze the correlation between the methylation levels and expression levels of differentially methylated protein-coding genes in order to construct a nomogram survival model of lung squamous cell carcinoma.
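The quantity that cor returns with method = "pearson" can be illustrated with a minimal sketch. The authors worked in R; the version below is a Python analogue written only for illustration. The function name pearson and all expression values are hypothetical and are not taken from GSE115574.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values across five samples (illustration only).
target = [5.1, 6.3, 7.2, 8.0, 9.4]
gene_a = [2.0, 2.9, 3.1, 4.2, 4.8]  # rises with the target gene
gene_b = [9.0, 8.1, 7.5, 6.2, 5.0]  # falls as the target gene rises

r_a = pearson(target, gene_a)  # positive coexpression
r_b = pearson(target, gene_b)  # negative coexpression
```

A coefficient near +1 indicates positive coexpression with the target gene, and a coefficient near -1 indicates negative coexpression.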
We selected one microarray dataset for analysis and another for validation. The microarray dataset GSE115574 was retrieved from the GEO database. This dataset contained 14 left atrial tissue samples from persistent AF patients [19]. The expression of all genes in every sample is shown in Figure 1, and no significant outlier samples were found. Therefore, the included samples could be used for further analysis. Another dataset, GSE79768, containing 7 left atrial specimens from persistent AF patients, was retrieved from the GEO database for validation. The flowchart is shown in Figure 1. Further analyses were performed in RStudio (based on R, version 3.6.4) on a 64-bit Windows 10 system.

Identification of TGFBI Coexpression Genes and Pathway Enrichment Analyses.
Coexpression genes of TGFBI were screened from the samples with the cor function in R (version 3.6.4). In our study, the call was set as cor(x, y, method = "pearson"), where x and y represent the expression levels of TGFBI and of another gene, respectively. The strength of the correlation was represented by the calculated correlation coefficient. The screening criteria were P < 0.05 and |Pearson correlation coefficient| ≥ 0.6. The online Database for Annotation, Visualization, and Integrated Discovery (DAVID, version 6.8) was used for GO and KEGG enrichment analyses [20][21][22]. A P value of <0.05 was considered significant. The ggplot2 package was used to visualize the results in R (version 3.6.4).
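The screening criteria above can be sketched as follows. This is a Python analogue of the R workflow, not the authors' script. It assumes that the P value is the usual two-sided test of zero correlation based on t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom (the test used by R's cor.test; the text does not state how P was obtained). The helper names, the numerical integration, and the 14-sample toy data are all hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Two-sided P value for H0: correlation = 0 (assumed test, see lead-in)."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the t density integrated from 0 to t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def screen(target, others, r_min=0.6, alpha=0.05):
    """Keep genes with |r| >= r_min and P < alpha, labeled by direction."""
    hits = {}
    for name, values in others.items():
        r = pearson(target, values)
        p = two_sided_p(r, len(target))
        if abs(r) >= r_min and p < alpha:
            hits[name] = (r, p, "positive" if r > 0 else "negative")
    return hits

# Hypothetical data for 14 samples (illustration only, not GSE115574).
target = [float(i) for i in range(1, 15)]
others = {
    "GENE_UP": [2 * v + (1 if i % 2 else -1) for i, v in enumerate(target)],
    "GENE_DOWN": [20 - v + (1 if i % 2 else -1) for i, v in enumerate(target)],
    "GENE_FLAT": [(-1.0) ** i for i in range(14)],
}
hits = screen(target, others)
```

Under this assumed test and with 14 samples, any gene with |r| ≥ 0.6 already has P below 0.05, so in this setting the correlation threshold is the binding criterion.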

Integration of the PPI-Network.
The STRING (version 10.5) database was used to evaluate the interactions among the coexpression genes, and a combined interaction score of >0.4 was considered significant [23]. In addition, the top 10 hub genes were identified using the Cytoscape plugin cytoHubba (version 0.1) with the degree ratio ranking method. Furthermore, the MCODE and ClueGO apps in Cytoscape were used, respectively, to identify the modules of the PPI-network and to perform the GO annotation and KEGG pathway enrichment analyses of these modules [24].
The GO annotation of TGFBI and its coexpression genes is provided in Table S1.
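Degree-based hub ranking can be sketched as below. This is a simplified Python illustration of ranking nodes by their number of interaction partners after applying the score cutoff; it is not the cytoHubba implementation, and the node names, scores, and the function hub_genes are hypothetical.

```python
from collections import Counter

def hub_genes(edges, min_score=0.4, top_n=10):
    """Rank nodes by degree in the network of edges with score > min_score."""
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        if a == b or score <= min_score or pair in seen:
            continue  # skip self-loops, weak edges, and duplicate pairs
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties are deterministic.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical interaction list: (node, node, combined score).
edges = [
    ("G1", "G2", 0.95), ("G1", "G3", 0.90), ("G1", "G4", 0.70),
    ("G1", "G5", 0.65), ("G2", "G3", 0.80), ("G2", "G5", 0.55),
    ("G4", "G6", 0.60),
    ("G7", "G8", 0.30),  # below the cutoff, dropped
    ("G3", "G1", 0.90),  # same pair as G1-G3 in reverse order, counted once
]
ranking = hub_genes(edges, top_n=3)
```

In this toy network, G1 has four partners and ranks first; nodes that only appear in edges at or below the cutoff do not enter the ranking.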

Results
In addition, a total of 39 KEGG pathways were identified, such as antigen processing and presentation (hsa04612), cell adhesion molecules (CAMs, hsa04514), viral myocarditis (hsa05416), and PI3K-Akt signaling pathway (hsa04151). These KEGG pathways are visualized in Figure 2(b), and their full details are given in Table 1.

PPI-Network Construction and Hub Gene Identification.
As shown in Figure 3(a), the interactions between TGFBI and its coexpression genes were represented by a PPI-network with 1465 nodes. The average node degree was 12.2, and the PPI enrichment P value was <1.0E-16. The network was saved in TSV format and then imported into Cytoscape for visualization. With cutoff criteria of a degree of >5 and a K-core of >5, three clusters were identified (Figure S1). The top 10 hub genes of this PPI-network (CXCL12, C3, FN1, COL1A2, ACTB, VCAM1, MMP2, VWF, BMP4, and CD44) were also identified with the degree ratio ranking method, as shown in Figure 3(c).
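As a rough consistency check, the reported node count and average degree constrain the number of interactions in the network. The sketch below assumes the standard definition for an undirected network, average degree = 2 × edges / nodes; the edge count itself is not reported in the text, so the result is only an implied range, and the function names are hypothetical.

```python
import math

def average_degree(n_nodes, n_edges):
    """Average node degree of an undirected network."""
    return 2 * n_edges / n_nodes

def edge_count_range(n_nodes, avg_degree, decimals=1):
    """Edge counts consistent with an average degree rounded to `decimals` places."""
    half = 0.5 * 10 ** (-decimals)
    low = (avg_degree - half) * n_nodes / 2
    high = (avg_degree + half) * n_nodes / 2
    return math.ceil(low), math.floor(high)

# 1465 nodes and an average degree of 12.2 are the values reported above.
low_edges, high_edges = edge_count_range(1465, 12.2)
```

Under this assumption, the network would contain roughly 8900 to 8970 interactions.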
We selected the first cluster, which is depicted in Figure 3(b), for GO and KEGG pathway analyses and found that the coexpression genes in this cluster were enriched in 20 BPs, such as collagen catabolic process (GO:0030574), collagen fibril organization (GO:0030199), and endodermal cell differentiation (GO:0035987); 18 CCs, such as collagen trimer (GO:0005581), plasma membrane (GO:0005886), and platelet alpha granule lumen (GO:0031093); 5 MFs, such as collagen binding (GO:0005518) and platelet-derived growth factor binding (GO:0048407); and 47 KEGG pathways, such as viral myocarditis (hsa05416), PI3K-Akt signaling pathway (hsa04151), rheumatoid arthritis (hsa05323), and inflammatory bowel disease (IBD, hsa05321). These results are visualized in Figure 4, and their full details are given in Table S2.
Verification of the Hub Gene.
The correlations between the hub genes and TGFBI were verified in the GSE79768 dataset. We used the cor function to calculate the correlation between each coexpression gene and TGFBI in GSE79768. As shown in Table 2, except for VWF, BMP4, and CD44, the correlations of TGFBI with CXCL12, C3, FN1, COL1A2, ACTB, VCAM1, and MMP2 were consistent with the results in the GSE115574 dataset: all were positive, with P values <0.05.
As shown in Figure 5, the verified hub genes were enriched in different and/or the same KEGG pathways. CXCL12, ACTB, and MMP2 were enriched in leukocyte transendothelial migration (hsa04670). FN1 and COL1A2 were both enriched in PI3K-Akt signaling pathway (hsa04151), amoebiasis (hsa05146), and focal adhesion (hsa04510). In addition, CXCL12 was enriched in rheumatoid arthritis (hsa05323), and C3 was enriched in leishmaniasis (hsa05140).
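The verification of hub genes in the second dataset can be read as a simple rule: a hub gene is retained if its correlation with TGFBI has the same direction in both datasets and is significant in the validation dataset. The Python sketch below encodes that reading; the rule as coded is an interpretation of the text, and the gene labels and numbers are hypothetical placeholders, not the values in Table 2.

```python
def verify_hubs(discovery_r, validation, alpha=0.05):
    """Split candidate genes into verified and rejected lists.

    discovery_r: gene -> correlation coefficient in the discovery dataset
    validation:  gene -> (correlation coefficient, P value) in the validation dataset
    """
    verified, rejected = [], []
    for gene, r_disc in discovery_r.items():
        r_val, p_val = validation[gene]
        same_direction = (r_disc > 0) == (r_val > 0)
        if same_direction and p_val < alpha:
            verified.append(gene)
        else:
            rejected.append(gene)
    return verified, rejected

# Hypothetical candidates (illustration only).
discovery_r = {"GENE_A": 0.82, "GENE_B": 0.71, "GENE_C": 0.66}
validation = {
    "GENE_A": (0.90, 0.006),   # same direction and significant: verified
    "GENE_B": (0.40, 0.370),   # same direction but not significant: rejected
    "GENE_C": (-0.80, 0.030),  # significant but opposite direction: rejected
}
verified, rejected = verify_hubs(discovery_r, validation)
```

With only seven validation samples, as in GSE79768, a correlation must be strong to reach P < 0.05, so failing verification does not by itself show that a gene is unrelated to TGFBI.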

Potential Drugs Targeted by the Verified Hub Genes.
To identify potential drugs for the hub genes, we used the online database DrugBank (http://www.drugbank.ca) to search for drugs that target the verified hub genes. DrugBank is a free-access online database that integrates the mechanisms, targets, and interactions of drugs [25]. We selected the drugs that had a complete record of actions and had been approved, and they are presented in Table 3.

Discussion
The rapid development of sequencing technology has helped researchers to gain a deeper understanding of several complex diseases, such as cardiovascular diseases, tumors, and autoimmune diseases. The mechanism of AF remains poorly understood. In the current study, we identified the potential mechanism of TGFBI and its coexpression genes in AF patients and verified the hub genes, hoping to provide a reference for further study of AF.
As depicted in Figure 6, TGFBI is located at 5q31.1 and is expressed in several organs, such as the heart, liver, colon, and urinary bladder (https://www.ncbi.nlm.nih.gov/gene/7045). Early research on TGFBI mainly focused on corneal dystrophy, as it is a primary disease-causing gene of corneal dystrophy, in which protein deposits on the cornea cause blindness [26,27]. In addition, it has attracted considerable attention owing to its important effect on cancer outcomes and on sensitivity to chemotherapeutic drugs. It affects tumor progression mainly because it promotes cell proliferation and migration and changes the microenvironment [28,29]. However, the function of TGFBI in the heart remains unclear. An early study suggested that the expression of TGFBI mRNA was induced in the diseased heart [30]. Similarly, Schwanekamp et al. [31] reported that the expression of TGFBI in the heart increased after injury. Moreover, a plasma proteome profiling study suggested that TGFBI has a potential role in extracellular matrix remodeling in fibrosis [32]. A recent study further confirmed that TGFBI and ADAM19 were associated with the TGF-β1 pathway and cardiac fibrosis [33]. Thus, TGFBI probably plays a role in myocardial fibrosis.
Myocardial fibrosis is a key pathophysiological mechanism of heart failure and arrhythmia. Cardiac fibrosis is characterized pathologically by disordered cardiac muscle cells and a notable increase in collagen fibers. Disordered myocardium and interwoven collagen fibers lead to uncoordinated conduction of the force generated during systole and diastole and to anisotropy of action potentials, which decrease heart function and increase the risk of arrhythmias [34,35]. As is known, genes with similar functions often show coexpression relationships and coregulate biological functions. We found that TGFBI and its coexpression genes were enriched in several GO terms and pathways, such as collagen fibril organization, the PI3K-Akt pathway, and viral myocarditis. The PI3K-Akt pathway has been reported to regulate myocardial fibrosis. The FN1 and COL1A2 genes are both enriched in this pathway. Huang et al. [36] reported that the miR-144-3p/FN1 and miR-9-3p/FN1 pathways may play an important role in myocardial fibrosis. In addition, FN1 takes part in the cardiac endothelial cell dysfunction induced by myofibroblast-derived exosomes. Indeed, FN1 has been used as a myocardial fibrosis marker in research [37,38]. The gene ACTB, which encodes one of six different actin proteins involved in cell structure, integrity, motility, and intercellular signaling, was enriched in the viral myocarditis pathway. Viral myocarditis is the result of direct damage to the myocardium by the virus and indirect damage by the immune response. This damage leads to edema and necrosis of myocardial cells and to proliferation and fibrosis of interstitial cells [39]. Thus, these genes may play a role in the regulation of myocardial fibrosis and thereby increase the risk of arrhythmia.
Recent studies increasingly suggest an association between immune-related diseases (IRD) and AF. A nationwide population-based study of 37,696 patients with IBD showed that these patients had a 36% (95% confidence interval = 20%-54%) higher risk of AF than controls [40]. Similarly, patients with systemic lupus erythematosus (SLE) have been reported to have a higher risk of AF than controls (hazard ratio = 2.84, 95% CI = 2.50-3.23) [41]. Patients with systemic sclerosis (SSc) may have a risk of AF 1.75 times that of controls (95% CI = 1.51-2.04); moreover, SSc can affect the heart and lead to myocardial fibrosis, which would promote the formation of AF [42,43]. Rheumatoid arthritis (RA), another autoimmune disease, has also been suggested to increase the risk of AF [44]. In an early study, researchers reported that systemic inflammation can lead to epicardial adipose tissue expansion and inflammation, which cause the left atrium to become enlarged, fibrotic, and noncompliant and finally result in AF [45]. IL-6 is a crucial prerequisite for cardiac fibrosis: it decreases the expression of Cx40 and Cx43 and is strongly correlated with high expression of collagen I and collagen III via the pSTAT3 pathway [46]. The expression of IL-6 increases in several IRDs, which may be the reason that it has been suggested as a biomarker of AF [47][48][49]. In this study, we found that the hub gene C3 was enriched in SLE and that CXCL12 was enriched in rheumatoid arthritis. The expression of these genes may affect the activation of the autoimmune system and thereby promote the remodeling of the atrium, which leads to AF.
Drugs targeting the hub genes were also identified in this study. The complement system is an important part of the human immune system and is involved in the inflammatory response. Pegcetacoplan, a PEGylated peptide targeting C3, is an inhibitor of hemolysis used in clinical practice. It can improve hemoglobin levels and clinical and hematologic outcomes via effective control of both intravascular and extravascular hemolysis [50]. C3 has also been reported to be associated with age and hypertension, which are known risk factors for AF. In addition, dipeptidyl peptidase III can protect the heart from inflammatory cell infiltration and fibrosis via cleavage of a peptide that is a part of C3 [51,52]. However, little is known about the use of pegcetacoplan in the treatment of heart disease. Carvedilol is a nonselective beta-blocker with alpha-1-blocking activity, and it can alter circulating miR-1 and miR-214, which are implicated in myocyte hypertrophy and apoptosis, and relieve myocardial fibrosis [53,54]. In addition, it can help prevent chemotherapy-related cardiotoxicity [55]. Thus, carvedilol may be an ideal antifibrosis drug. Vitreomacular adhesion (VMA) is an eye disease that often leads to visual impairment and even to loss of vision when it progresses to vitreomacular traction (VMT). Pharmacological vitreolysis is an alternative treatment for VMA. Recently, a new drug named ocriplasmin, a recombinant truncated form of human plasmin, was developed to catalyze the breakdown of laminin and fibronectin [56]. The blood clots caused by AF are usually venous thrombosis, and fibrinolytic enzymes and clotting factor inhibitors are common treatments. However, whether ocriplasmin could be an effective component of anticoagulant or thrombolytic therapy requires further research.
With the progress of technology and the continuous evolution of algorithms, our understanding of complex diseases has deepened. Several related works have given a powerful boost to medical research. Su et al. [57] reported a framework using horizontal and vertical multiverse optimization that provides an effective segmentation method for diagnosing Coronavirus Disease 2019 (COVID-19). Similarly, Qi et al. [58] reported a directional mutation and crossover boosted ant colony optimization for diagnosing COVID-19. In addition, a saliency detection network with neutrosophic enhancement has been reported to be an effective approach to colorectal polyp region extraction [59]. Bioinformatics has developed rapidly in the last decade, owing not only to the development of sequencing technology but also to improvements in algorithms. Whatever the specific mechanism, the cause of AF is at present generally considered to be cardiac fibrosis induced by atrial remodeling. In the current study, we explored the potential mechanism of TGFBI and its coexpression genes in AF and found that they may induce cardiac fibrosis via several pathways. The identified hub genes may be potential targets for intervention in AF.
Based on our findings, our future work is designed as follows. First, we aim to cooperate with surgeons to collect left atrial appendages (LAAs) from AF patients and verify the correlation between the expression levels of TGFBI and the identified hub genes via quantitative reverse transcription polymerase chain reaction (RT-qPCR). Second, we will knock down or overexpress these genes and measure the expression of downstream proteins in the pathways in which they are enriched to verify their function. Third, after treating cardiomyocytes extracted from the LAAs with the predicted target drugs, we will evaluate the effect on the expression of these genes and proteins by RT-qPCR and western blot. In addition, we aim to construct an AF animal model by persistent rapid atrial stimulation with a pacemaker to repeat the experiments. Finally, several representative computational intelligence algorithms, such as monarch butterfly optimization (MBO) [60], earthworm optimization algorithm (EWA) [61], elephant herding optimization (EHO) [62], slime mould algorithm (SMA) [63], hunger games search (HGS) [64], RUNge Kutta optimizer (RUN) [65], colony predation algorithm (CPA) [66], and Harris hawks optimization (HHO) [67,68], could be used to optimize our analysis.

Conclusions
In the current study, we used TGFBI and its coexpression genes to identify the potential molecular mechanisms of AF. These findings may help elucidate the functions of these genes in AF and provide a target for AF management. However, the current study has several limitations. First, the sample size is not as large as in randomized controlled studies. Second, the results of this study were mainly based on bioinformatic analysis, and further experiments are needed to confirm them both in vivo and in vitro. Finally, several potential factors that participate in the formation of AF may not have been included. Fortunately, with the development of algorithms, several representative computational intelligence algorithms, such as MBO, EWA, EHO, SMA, HGS, RUN, CPA, and HHO, may be used to address these problems. In the future, the results of our study will be further verified with more optimized algorithms, expanded samples, and experiments both in vivo and in vitro.

AF: Atrial fibrillation
TGFBI: Transforming growth factor beta induced
GEO: Gene Expression Omnibus
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses
PPI: Protein-protein interaction
BP: Biological processes
CC: Cellular components
MF: Molecular functions
IBD: Inflammatory bowel disease
SLE: Systemic lupus erythematosus
IRD: Immune-related diseases
SSc: Systemic sclerosis
VMA: Vitreomacular adhesion.

Data Availability
The datasets used and/or analyzed during the current study are available from the Gene Expression Omnibus repository (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? GEO accession: GSE115574 and GSE79768).

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Yao-Zong Guan and Hao Liu conceived the study, participated in the design, performed the statistical analyses, and drafted the manuscript. Huan-Jie Huang and Dong-Yan Liang conceived the study, participated in the design, and
Table S1: GO annotation of TGFBI and coexpression genes enriched in AF.
Table S2: GO annotation and KEGG pathway of TGFBI and coexpression genes enriched in cluster 1.
Figure S1: the clusters constructed from the PPI-network.