Identification of Key CircRNAs Related to Pulmonary Tuberculosis Based on Bioinformatics Analysis

Pulmonary tuberculosis (TB) is a chronic infectious disease that is caused by respiratory infections, principally Mycobacterium tuberculosis. Increasingly, studies have shown that circular (circ)RNAs play regulatory roles in different diseases through different mechanisms. However, their roles and potential regulatory mechanisms in pulmonary TB remain unclear. In this study, we analyzed circRNA sequencing data from adjacent normal and diseased tissues from pulmonary TB patients and analyzed the differentially expressed genes. We then constructed machine learning models and used single-factor analysis to identify hub circRNAs. We downloaded the pulmonary TB micro (mi)RNA (GSE29190) and mRNA (GSE83456) gene expression datasets from the Gene Expression Omnibus database and performed differential expression analysis to determine the differentially expressed miRNAs and mRNAs. We also constructed a circRNA–miRNA–mRNA interaction network using Cytoscape. Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway analysis were used to predict the biological functions of the identified RNAs and determine hub genes. Then, the STRING database and cytoHubba were used to construct protein-protein interaction networks. The results showed 125 differentially expressed circRNAs in the adjacent normal and diseased tissues of pulmonary TB patients. Among them, we identified three hub genes associated with the development of pulmonary TB: hsa_circ_0007919 (upregulated), hsa_circ_0002419 (downregulated), and hsa_circ_0005521 (downregulated). Through further screening, we determined 16 mRNAs of potential downstream genes for hsa-miR-409-5p and hsa_circ_0005521 and established an interaction network. This network may have important roles in the occurrence and development of pulmonary TB. We constructed a model with 100% prediction accuracy by machine learning and single-factor analysis. We constructed a protein-protein interaction network among the top 50 hub mRNAs, with FBXW7 scoring the highest and SOCS3 the second highest. These results may provide a new reference for the identification of candidate markers for the early diagnosis and treatment of pulmonary TB.


Introduction
Pulmonary tuberculosis (TB) is a contagious disease that is caused by the slow-growing Mycobacterium tuberculosis (MTB), which is spread by aerosols [1]. The primary pathological feature of pulmonary TB is necrotizing granuloma infiltration [2]. TB infections mainly occur in the lungs but may also cause problems in other parts of the body [3]. Pulmonary TB has a long incubation period. When MTB enters the lungs, it usually forms capsules (granulomas), which are harmless in the lungs and cause latent pulmonary TB infection [4][5][6]. However, these capsules may be activated in the future and develop into active pulmonary TB [7]. Although pulmonary TB can be cured, it remains one of the ten leading causes of death worldwide, particularly in developing countries [8,9]. Latent pulmonary TB infections are asymptomatic, making this difficult to detect and treat on time [10]. Therefore, there is great interest in finding effective biomarkers for the early diagnosis and/or treatment of latent pulmonary TB [11]. Circular (circ)RNAs are a type of noncoding RNA molecule that lack a 5 ′ cap and 3 ′ tail and form a single-stranded circular structure by covalent bonds [12]. CircRNAs are formed by reverse splicing and are mainly divided into allexon circRNAs, circRNAs composed of introns and exons, and lasso-shaped circRNAs composed of introns [13,14]. Cir-cRNAs are abundant and present in bodily fluids such as blood, saliva, cerebrospinal fluid, and urine [15]. Compared with linear RNAs, circRNAs are more stable and more resistant to degradation by RNase R. Therefore, circRNAs can be selectively enriched during sample processing [16,17]. Compared with other types of RNAs, circRNAs are more suitable as candidate molecules for molecular diagnostic biomarkers and could be effective therapeutic targets [18].
Additionally, circRNAs can be used as micro (mi)RNA sponges to bind miRNAs through miRNA response elements, releasing the degradation or translational inhibition of downstream target mRNAs by disrupting the competitive endogenous (ce)RNA mechanism [19]. CircRNAs bind to and regulate the functions of RNA-binding protein (RBP), changing the stability of mRNAs. CircRNAs can also directly encode proteins [20][21][22]. According to a report by Li et al., who analyzed the expression profile of circRNAs in nonsmall cell lung cancer, a circRNA named CircNDUFB2 could inhibit nonsmall cell lung cancer progression by decreasing the stability of insulin-like growth factor 2 mRNA-binding proteins (IGF2BPs) and activating antitumor immunity [23]. Wang et al. studied the expression profile of circRNAs in the lung tissue of mice infected with influenza A virus (IAV) using high-throughput sequencing and RNA sequencing. They identified many differentially expressed circRNAs between the IAV group and the control group. Bioinformatic analysis revealed a potential role of cir-cRNAs in the IAV-induced lung injury model [24]. Zhou et al. conducted RNA sequencing and correlation analysis on liver biopsies of untreated patients with chronic hepatitis B and healthy controls. They found that there was a strong positive correlation between hsa_circ_0000650 and TGFB2 expression and a negative correlation between hsa_circ_ 0000650 and miR-6873-3p expression. They confirmed that there were different circRNAs and circRNA/miRNA interactions in patients with chronic hepatitis B [25].
To summarize, many studies have shown that circRNAs are associated with the occurrence and development of a variety of respiratory and infectious diseases [3,[26][27][28]. Currently, there are few studies on circular RNAs and pulmonary TB. Pulmonary TB-related circRNA research could be important for the early diagnosis and treatment of pulmonary TB [29]. In this study, we sought to identify differentially expressed circRNAs and their downstream mRNAs using tissues from pulmonary TB patients to identify hub genes involved in pulmonary TB infection and provide references for the early diagnosis and treatment of pulmonary TB.

Materials and Methods
2.1. Data Collection. The Shanghai (Fudan University) Public Health Clinical Center provided all patient data used in this study. They performed positron emission tomography (PET) scans on the lungs of nine pulmonary TB patients and whole transcriptome sequencing of lung tissues with high PET (high metabolic activity) and normal PET (low metabolic activity) uptake. For raw sequencing data, please refer to PRJNA795290 (4 * 2 samples) from the Sequence Read Archive (SRA) database and GSE158767 (5 * 2 samples) from the Gene Expression Omnibus (GEO) database. All patients signed an informed consent form before obtaining tissue samples. RNA extraction, library construction, and sequencing were performed in the same way as previously described [30]. In this paper, we analyzed 8286 differentially expressed genes (DEGs) by bioinformatics.

Differential Expression Analysis of CircRNAs.
First, the raw data were corrected, filtered, and normalized using the R package. Analyses of differentially expressed RNAs included fold change (FC), P value, and false discovery rate (FDR). The criteria for selecting differentially expressed cir-cRNAs were |log 2FC | >1 and P value < 0.05. Scatter plots, volcano plots, and heat maps were used to show the differently expressed circRNAs.

Feature Selection.
In machine learning applications, the number of features is often large, and there may be irrelevant features. With more features, the resulting model will be more complex. Feature selection can eliminate irrelevant or redundant features, reduce the number of features, improve model accuracy, reduce the running time, and simplify the model. We identified hub circRNAs using eight feature screening methods: CfsSubsetEval-BestFirst, PrincipalComponents-Ranker-T, CorrelationAttributeEval-Ranker-T, GainRatioAttributeEval-Ranker-T, InfoGainAttri-buteEval-Ranker-T, OneRAttributeEval-Ranker-T, ReliefFAt-tributeEval-Ranker-T, and SymmetricalUncertAttributeEval-Ranker-T.

2.4.
Building the Machine Learning Models. We used 13 algorithms to build the models: ZeroR, Logistic, SMO, IBK, AttributeSelectedClassifier, OneR, DecisionStump, Hoeffding-Tree, J48, LMT, RandomForest, RandomTree, and REPTree. Comparisons of the average accuracy of the machine learning models established by different feature selection methods were then conducted. We also performed statistical and univariate analyses of the number of occurrences of circRNAs in various models to identify hub circRNAs.

Preliminary Screening of Differentially Expressed
Genes in Pulmonary TB. We initially conducted a differential expression analysis of pulmonary TB patient sequencing data. The larger the difference in DEGs between normal  and diseased tissues, the stronger the association between the disease state and the DEG. A scatter plot was used to show gene expression. The genes clustered towards the middle show less differential expression, while those dispersed towards the sides show larger differential expression. Points on both sides are more likely to be disease hub genes (Figure 1(a)). The basic conditions for screening DEGs were a statistical significance measure P < 0:05 and the absolute change in differential gene expression (fold change, FC) > 2. There were 125 circRNAs expressed in the adjacent normal and diseased tissues of pulmonary TB patients that satisfied P value < 0.05 and |log 2FC | >1, among which 50 were upregulated and 75 were downregulated. There were more downregulated genes in normal tissues compared with diseased tissues (Figure 1(b)). Among the differentially expressed cir-cRNAs, the top 10 up-and downregulated circRNAs are listed in Table 1. Next, we drew a heat map for cluster analysis on basis of circRNA expression in the different samples (Figure 1(c)). The results showed that there were more upregulated genes in diseased tissues. These conclusions were consistent with the volcano plots.

Construction of the Whole Gene Prediction Models.
These machine learning models were built from the 125 differently expressed circRNAs. The prediction accuracy of the Logistic, SMO, and IBK algorithms were the highest, reaching 100%; the accuracy of the J48 algorithm also reached 89% (Table 2).

Feature Selection.
The pathogenesis of pulmonary TB is complex. Predictive modeling of pulmonary TB hub genes by machine learning was then performed with the 125 differentially expressed circRNAs. First, we performed feature selection to screen the major influencing factors among the 125 circRNAs; the hub circRNAs were determined by eight feature selection methods. We constructed a machine learning model on the basis of the 125 circRNAs and feature-screened circRNAs. We built models separately using 13 different algorithms: ZeroR, Logistic, SMO, IBK, Attribu-teSelectedClassifier, OneR, DecisionStump, HoeffdingTree, J48, LMT, RandomForest, RandomTree, and REPTree (Tables S1-S9).
CorrelationAttributeEval-Ranker-T feature screening had four algorithms with a 100% accuracy rate (Logistic, SMO, IBK, and HoeffdingTree). The CfsSubsetEval-BestFirst feature screening method was the next best, and its three algorithms (SMO, IBK, and HoeffdingTree) had a 100% accuracy rate. OneRAttributeEval-Ranker-T feature screening had two algorithms (SMO and IBK) that showed a 100% accuracy rate. CfsSubsetEval-BestFirst feature screening also had two algorithms with a 100% correct rate (Logistic and SMO). Only CorrelationAttributeEval-Ranker-T had a better accuracy rate than the model constructed with 125 circRNAs. Therefore, the CorrelationAttributeEval-Ranker-T feature screening method  BioMed Research International was more suitable for the 125 circRNAs (Table 2 and Figure 2). Among the eight feature screening methods, four reached 100% model correctness. These four methods contained 29 circRNAs, counting their occurrences. There were 14 circRNAs that appeared more than once in these four methods. Among them, hsa_circ_0007919, chr10:15590454|15628663, and hsa_circ_0002419 appeared in all four methods. This suggested that these genes may be closely related to the occurrence and development of pulmonary TB (Table 3).

Univariate Analysis and Confirmation of the Key CircRNAs
3.3.1. Univariate Analysis. To further clarify the effect of circRNAs on pulmonary TB, we built machine learning models for 14 circRNAs and performed univariate analysis. We calculated the average of the models with correct rates of over 80%, with random seeds taken from 1 to 10 ( Table 4). The results showed that each circRNA had at least one algorithm with an accuracy greater than 80%. Among them, hsa_circ_0002419 had a strong correlation with pulmonary TB, and the accuracy of the four algorithms was 94%. Hsa_circ_0005521 had a strong correlation, and the correct rate was 89%.

Confirmation of the Key CircRNAs.
To identify hub genes in the development of pulmonary TB, we made Venn diagrams that included 20 DEGs and 14 univariate analysis genes using Jvenn. We uncovered three hub circRNAs: hsa_ circ_0007919 (upregulated), hsa_circ_0002419 (downregulated), and hsa_circ_0005521 (downregulated). We identified these three circRNAs as hub genes in the development of pulmonary TB (Figure 3 and Table 1).

The Downstream Genes of the Identified CircRNAs.
The most interesting function of circRNAs is to serve as molecular sponges for miRNAs by binding and influencing miRNA expression. The potential miRNAs that interact with the hub circRNAs were predicted using the ENCORI database. The downstream mRNAs of the miRNAs were predicted by the miRTarBase database (Table 5). We used the miRNA Gene Chip GSE29190 of pulmonary TB from the GEO database and screened out 47 differentially expressed miRNAs (P < 0:05). We screened out potential miRNAs and differentially expressed miRNAs by Jvenn, which identified hsa-miR-409-5p (Figure 4(a)). The upstream molecule of hsa-miR-409-5p is hsa_circ_0005521, and the total number of potential downstream mRNAs is 31. We used the mRNA Gene Chip GSE83456 of pulmonary TB from the GEO database to screen out 9272 differentially expressed mRNAs (P < 0:05). We screened out 16 potential downstream mRNAs for hsa-miR-409-5p, and their differential expression was analyzed using Jvenn. These mRNAs included RPS4X, NBEA, GSK3B, RGL2, ZNF512B, SOD2, ZNF12, KPNA3, AKAP1, RPRD2, ACAP2, RSU1, FDXR, EIF4EBP2, SRRD, and ZBTB34 (Figure 4(b)). Finally, the circRNA-miRNA-mRNA interaction network was constructed using Cytoscape (Figure 4(c)).

Biological Function Analysis of the Identified Genes.
To study the effects of the identified mRNAs on pulmonary TB, we performed GO and KEGG analysis of the downstream mRNAs of the three hub circRNAs using the DAVID database. In BP analysis, DNA template, transcription, and positive regulation of transcription from RNA polymerase II promoter were the three terms with the highest enrichment ( Figure 5(a)). Regarding CC, genes were enriched in the nucleus, cytoplasm, cytosol, nuclear plasma, and other components ( Figure 5(b)). In MF analysis, protein binding, metal ion binding, and poly(A) RNA binding were the three terms with the highest enrichment ( Figure 5(c)). Additionally, KEGG pathway analysis showed that the genes were enriched in the cancer pathway, PI3K-Akt signaling pathway, proteoglycan in cancer, Ras signaling pathway, HTLV-I infection, and other pathways ( Figure 5(d)). Collectively, these results show that the downstream mRNAs of the three key circRNAs play a major immunomodulatory role in the development of pulmonary TB.

PPI Network Construction.
In this study, we have uncovered potential mRNAs downstream of three hub circRNAs. Among them, there were 1429 downstream mRNAs of hsa_circ_0007919, 2201 downstream mRNAs of hsa_circ_ 0002419, and 5401 downstream mRNAs of hsa_circ_ 0005521. We hypothesized that mRNAs simultaneously regulated by the three circRNAs may play important roles in the development of pulmonary TB. Therefore, we screened out 372 overlapping mRNAs using Jvenn and conducted further analysis (Figure 6(a)). To further find the hub genes among the 372 mRNAs in the network, we used the STRING database and Cytoscape. We found the top 50 hub genes according to the cytoHubba plugin in Cytoscape. We then constructed the PPI network on the basis of these top 50 genes (Table 6 and Figure 6(b)).

Discussion
Current laboratory tests for pulmonary TB include immunological-and molecular-based assays. Traditional smear staining has the disadvantages of a low positive rate and a long culture period [31]. With the further development of biomedical technologies, exosomal microRNA, real-time fluorescence quantitative PCR, and other techniques have been applied, but there is still a lack of rapid and reliable detection techniques for pulmonary TB [32,33]. CircRNAs have several advantages as a biomarker, and their roles in the pathological regulation of pulmonary TB have received increasing attention in recent years [34]. Many studies have shown that circRNA expression is altered in the tissues and peripheral blood of pulmonary TB patients [35]. In this study, we uncovered three hub circRNAs by using differential gene expression, machine learning, and univariate analysis, including hsa_circ_0007919 (upregulated in diseased tissues) and hsa_circ_0002419 and hsa_circ_0005521 (downregulated in diseased tissues). To date, there have been no reports about hsa-circ-0002419 or hsa_circ_0005521. However, it has been found that potential downstream genes of hsa-miR-409-5p and hsa_circ_0005521 may also interact with hsa_circ_0028883, which has potential diagnostic value for pulmonary TB [36]. Wang et al. showed that hsa_circ_ 0007919 plays a role in ulcerative colitis by binding to hsa-miR-138 and hsa-let-7a to regulate the expression of VIPR1    BioMed Research International and EPC1, respectively [37]. Pulmonary TB also includes chronic inflammation, and the two diseases may share common pathways in terms of inflammation. We further studied the effects of potential genes downstream of the hub circRNAs on pulmonary TB. According to GO and KEGG analysis, the genes were enriched in the PI3K-Akt signaling pathway, the proteoglycan pathway in cancer, and Ras signaling. Yang et al. showed that the inflammatory response of macrophage-like cells to MTB can be attenuated by modulating the PI3K/Akt/mTOR signaling pathway [38]. Other studies have shown that PI3K/ AKT/mTOR signaling pathways are suppressed in patients with active pulmonary TB [39]. Gill et al. explained that the mechanism by which proteoglycans modulate inflammatory responses in the lung and showed that they may be part of a new treatment for inflammatory lung diseases and lung infections [40]. It has also been shown that hsa_circRNA_ 103571, which is differently expressed in the plasma of patients with active pulmonary TB, is also involved in the Ras pathway [41].
We uncovered 50 hub genes and then constructed a PPI network using the STRING database and cytoHubba. FBXW7 scored the highest, and SOCS3 was the next highest. We concluded that these genes may be associated with the development of pulmonary TB. Additionally, some of these genes have been reported to be involved in the development of pulmonary TB and other diseases. FBXW7 is an important tumor suppressor. Ni et al. found that miR-92a plays an oncogene role in nonsmall cell lung cancer by regulating FBXW7 [42]. It has been reported that FBXW7 plays a key role in regulating colitis by inducing CCL2 and CCL7 expression in macrophages and promoting the accumulation of pro-inflammatory mononuclear macrophages [43]. Cui et al. found that inactivation of FBXW7 in cancer cells, especially those with wild-type p53, may improve the efficacy of radiotherapy or chemotherapy, and thus improve patient survival [44]. Pulmonary TB is a risk factor for lung cancer, and the probability of developing lung cancer is much higher in pulmonary TB patients than in the general population. Therefore, FBXW7 may play a role in pulmonary TB

Conclusions
In conclusion, we screened three hub circRNAs using the adjacent normal and diseased tissues of pulmonary TB patients. GO and KEGG enrichment were used to identify pathways associated with pulmonary TB. We identified hub genes through a PPI network. This study may provide a reference for finding candidate markers for the early diagnosis of pulmonary TB and provide new directions for possible pulmonary TB therapeutic targets.

Data Availability
The data can be seen in PRJNA795290, GSE158767, GSE29190, and GSE83456 and the Supplementary Materials.

Ethical Approval
All participants agreed to take part in this study; we received informed consent for the publication of data from the participants.