Cytotoxic Effect of Recombinant Mycobacterium tuberculosis CFP-10/ESAT-6 Protein on the Crucial Pathways of WI-38 Cells

To unravel the cytotoxic effect of the recombinant CFP-10/ESAT-6 protein (rCFES) on WI-38 cells, an integrative analysis approach, combining time-course microarray data and annotated pathway databases, was proposed with the emphasis on identifying the potentially crucial pathways. The potentially crucial pathways were selected based on a composite criterion characterizing the average significance and topological properties of important genes. The analysis results suggested that the regulatory effect of rCFES was at least involved in cell proliferation, cell motility, cell survival, and metabolisms of WI-38 cells. The survivability of WI-38 cells, in particular, was significantly decreased to 62% with 12.5 μM rCFES. Furthermore, the focal adhesion pathway was identified as the potentially most-crucial pathway and 58 of 65 important genes in this pathway were downregulated by rCFES treatment. Using qRT-PCR, we have confirmed the changes in the expression levels of LAMA4, PIK3R3, BIRC3, and NFKBIA, suggesting that these proteins may play an essential role in the cytotoxic process in the rCFES-treated WI-38 cells.


Introduction
Mycobacterium tuberculosis is a global infectious disease that has affected one-third of the world population, killing 2-3 million people and causing 7-8 million new infections annually [1]. M. tuberculosis (MTB) is a major cause of human tuberculosis. During the early stages of human tuberculosis, MTB induces an immune response [2] and subsequently leads to the development of lung granulomas consisting of macrophages, T cells, B cells, and fibroblasts [3]. Recent researches reveal that fibroblasts are not only essential in secreting chemokine for modulating inflammatory response to MTB infection and influencing the survival of MTB within macrophages [4] but also involved in the regulation of granuloma formation during MTB infection [4,5]. Despite the potentially vital role of fibroblasts in MTB infection, the detailed MTB-regulated mechanism in fibroblasts, especially its relationship to MTB secreted protein, remains unknown.
Two secreted proteins CFP-10 and ESAT-6, produced by the region of difference 1 (RD1) in MTB, have been identified to play important roles in the pathogenesis of tuberculosis [6][7][8] in primary pulmonary infection. These two proteins have also been shown to be virulence factors with cytotoxic effects on macrophages, lung epithelial cells, and dendritic cells [8][9][10]. Individually, the cytotoxic effect of ESAT-6 protein has been found to evoke apoptosis of macrophages, dendritic cells, and fibroblasts [11]. Recent report reveals that CFP-10 and ESAT-6 choose a stable structure, forming a 1 : 1 heterodimeric complex [12]-the CFP-10/ESAT-6 protein (CFES). It has been shown that CFES elicits immune response in the host organism [13,14]. However, the role and function of CFES in fibroblasts is not clear. Therefore, as our first attempt to unravel the effect of CFES on fibroblasts, an integrative analysis approach combining biological resources and bioinformatics was developed in this study.
Microarray is a biological resource, which has been often used to analyze gene expression profiles in biological experiments. Most analysis tools for microarray data, for example, SAM [15], LPE [16], Bayesian [17], and so forth, have been designed mainly for identification of important genes. Other tools like GenMAPP [18], PharmGKB [19], and KEGG [20] only show the positions of the genes on a known pathway. Although some softwares such as ArrayXpath [21] integrate the pathway resources and provide analysis and visualization tools for deciphering the important genes and pathway structures, no notion of chaining or aggregating regulatory effect in a biological process has been taken into account in identifying the crucial pathways. To characterize the regulation mechanism of recombinant CFP-10/ESAT-6 protein (rCFES) on WI-38 cells taking account of the chaining or aggregating regulatory effect, an integrative analysis approach was proposed in this study. By combining time-course microarray data and annotated pathway databases, a new composite score quantifying the average significance and topological properties of important genes in a pathway was proposed to identify the potentially crucial pathways. Biologically, a crucial pathway in this study is a pathway that is substantially influenced by the rCFES-treatment, the consequence of which is highly related to the observed response in the rCFES-treated WI-38 cells, for example, the increased cell death rate. Nevertheless, since the crucial pathways suggested by any computational analysis approach like the proposed one require further experimental verification, they are thus considered as "potentially crucial pathways". A pathway with a better composite score was considered to be potentially more crucial in an rCFES-treated WI-38 cell in the sense that (i) the important genes in this pathway were more significantly expressed, (ii) this pathway contained a higher percentage of important genes, and (iii) the important genes in this pathway interacted more closely with one another. Based on the composite scores, the pathways with the best composite scores were suggested as the potentially crucial pathways in an rCFES-treated WI-38 cell, which may serve as the basis for further experimental studies on unraveling the cytotoxic effect of rCFES on WI-38 cells.

Expression and Purification of rCFES.
The bacterial vector pET29b-CFES was created by cloning CFP-10 and ESAT-6 from the H37Rv strain of MTB and was transformed into Escherichia coli BL21 (DE3). IPTG was used to express abundant amounts of recombinant CFP-10/ESAT-6 protein (rCFES). Next, we purified recombinant rCFES using affinity chromatography with nickel ion characteristic and dialysis. The Bio-Rad DC Protein Assay Kit was used to measure rCFES concentrations. The purity of rCFES was controlled at a level higher than 95% as assessed by densitometry of 12% SDS-PAGE gels. In addition, MALTI-TOF mass spectroscopy was used to confirm that rCFES was composed of CFP-10 and ESAT-6. More detailed confirmation analysis may be found in Supplemental Data I (http://homepage.ntu.edu.tw/ ∼d91548013/CFES supp data1.pdf).

Cell Survival
Assay. WI-38 cells were seeded at 1.3 × 10 4 cells per well in 96-well plates containing Modified Eagle Medium with 10% FCS. The fibroblasts were arrested at the G0/G1 phase after treatment with serum-free medium for 24 hours, further treated with rCFES at different concentrations, including 0, 0.625, 1.25, 3.125, 6.25, 12.5, and 25 μM, and incubated for 48 hours. The cell viability was measured with Cell Proliferation Assay Kit (Promega). The absorbance was recorded at 490 nm using a 96-well-plated reader. The assay was performed in triplicate on control and rCFEStreated WI-38 cells at different concentrations.

Microarray
Analysis. WI-38 cells were arrested at the G0/G1 phase after treatment with serum-free medium for 24 hours, treated with rCFES (12.5 μM), and incubated for 0, 3,8,16,24,32,40, and 48 hours. Total RNA was extracted with Trizol reagent, according to the manufacturer's protocol (Invitrogen) and with RNeasy Mini Kit (Qiagen). RNA was also extracted from nontreated WI-38 cells as control at the same times. In this study, experiments on control and rCFEStreated cells were performed in triplicate at each time point. Purified RNA was quantified at OD260 with an ND-1000 spectrophotometer (Nanodrop) and qualitated with Agilent Bioanalyzer 2100 (Agilent).
Total RNA was amplified with a Fluorescent Linear Amplification Kit (Agilent) and labelled with Cy3-CTP or Cy5-CTP (CyDye, PerkinElmer). Next, cRNA was hybridized onto human whole genome oligo microarrays (Agilent), according to the manufacturer's protocols. A total of 43 931 probe sets on the arrays were analyzed. After washing and drying by blowing with a nitrogen gun, microarrays were scanned with an Agilent microarray scanner (Agilent) at 535 nm for Cy3 and 625 nm for Cy5. Feature Extraction Software (Agilent) was used to analyze the scanned images and to estimate differential gene expression by calculating statistical confidences. We selected 41 675 probe sets by rankconsistency-filtering using the LOWESS method.

Significance Analysis of Gene Expression.
The following is screening conditions for selecting the matched genes at every time point: rBGSubSignal >32, gBGSubSignal >32, and absolute value log2 ratio >0. The unions of genes passing this filter were used as the input data. Significance analysis of microarray (SAM) [15] was then performed to identify important genes using the false discovery rate (FDR) controlling procedures.
2.6. Pathway Topology Analysis. Pathway topology analysis aimed to identify the potentially crucial pathways involved in the biological process for WI-38 cells treated with rCFES. The potentially crucial pathways were selected in two steps based on a composite criterion characterizing multiple clustering properties of important genes. In the first step, called pathway enrichment analysis, a set of pathways were chosen as the candidates of the potentially crucial pathways, each of which consisted of a significant proportion of genes that were important genes in the pathway. In the second step, these candidates were ranked according to a scoring system combining several descriptors that quantify the average significance and topological properties of important genes.
More specifically, in the first step, important genes were mapped onto the human pathways constructed in the KEGG databases using LocusLink IDs. Supposed that there were T total genes with LocusLink IDs, out of which N genes were mapped to a pathway, say pathway P j . Moreover, supposed that there were t important genes in total, out of which k important genes were involved in pathway P j . To determine if pathway P j contained a significant proportion of important genes, hyper geometric statistic test was used with the null hypothesis that k/t was not greater than N/T, given that N and t are both greater than 0. The P-value for pathway P j was calculated as (1) While the P-value represented the significance of pathway P j having a significant proportion of genes that were important genes, it also stood for the Type I error rate, that is, the probability that k/t was mistakenly inferred to be significantly higher than N/T. Statistically, the more pathways were tested, the greater probability it might have to observe a false-positive result. With the hundreds of pathways to be tested, to control the overall Type I error rate of multiple testing, false discovery rate (FDR) [22] correction was employed in this study.
The second step was to further rank the importance of the candidates selected in the first step based on the average significance and topological properties of important genes. The average significance quantified the overall significance of the important genes identified in pathway P j . Two descriptors were computed to represent the average significance. The first one, denoted by Q s , was defined as the geometric mean of the q-values computed by SAM for the important genes in pathway P j . The geometric mean instead of arithmetic mean was adopted because the q-values of the genes in a pathway may span several orders of magnitude. If arithmetic mean is taken, the genes with large q-values, even if they are minority in the pathway, may dominate the mean value, yielding a misleading result. The second one, denoted by Q ps , was defined as the geometric mean of the cooccurrence significances of all pairs of important genes in pathway P j . A pair of important genes was composed of two directly connected important genes in the KEGG pathway database and the cooccurrence significance of a pair of important genes was defined as the product of the q-values of these two genes. While Q s measured the overall significance due to individual important genes, Q ps placed more emphasis on the significance characterizing the cooccurrence of a pair of important genes. It was assumed that a gene pair with two highly important genes would be more valuable than that with two genes of moderate significance.
Two classes of topological properties of important genes were quantified to represent the overall significance of pathway P j in the underlying biological process. One was density and the other was clustering. Density referred to the average compactness of important genes in pathway P j . Two density descriptors, named Q d and Q pd , were computed. Q d was defined as the ratio of the number of important genes to the total number of genes in pathway P j , whereas Q pd was defined as the ratio of the number of pairs of important genes to the total number of gene pairs in the pathway. A gene pair consisted of two directly connected genes in the KEGG pathway database.
Clustering was performed based on the important genes mapped onto the KEGG database and the connection information among the mapped important genes provided by the KEGG database. Clustering revealed how closely the important genes were connected. A cluster was an aggregation of connected important genes. Two types of clustering, that is, directed clustering and undirected clustering, were considered, which were characterized by two pairs of topological descriptors. In a directed cluster, one may traverse all important genes starting from at least one gene, called root gene, following the upstream-downstream relation between every pair of connected important genes. On the other hand, in an undirected cluster, every link connecting two important genes defined in the KEGG databases was regarded as an undirected link. That is, one may traverse from any gene in an undirected cluster to all the other genes in the same cluster without considering the upstreamdownstream relation. More detailed descriptions for the roles of the directed and undirected clusterings may be found in Supplemental Data I (http://homepage.ntu.edu.tw/ ∼d91548013/CFES supp data1.pdf).
The first pair of topological descriptors, denoted as N cd and Q cd , stood for the number of important genes and the geometric mean of the q-values computed by SAM for the important genes, respectively, in the maximum directed cluster of pathway P j . If more than one maximum directed cluster was identified in pathway P j , the maximum directed cluster with the smallest Q cd was chosen. The second pair of topological descriptors, denoted as N cu and Q cu , represented the number of important genes and the geometric mean of the q-values of the important genes, respectively, in the maximum undirected cluster of pathway P j . Similarly, if more than one maximum undirected cluster was identified in pathway P j , the maximum undirected cluster with the smallest Q cu was chosen. For convenience, the clusters quantified by the first, and second pairs of topological descriptors were termed as MD-cluster, and MUD-cluster, respectively.
To determine the potentially crucial pathways, the candidate pathways selected in the first step were ranked based on a composite score, R r , which was defined as the sum of the individual ranks of the descriptors for the average 4 Journal of Biomedicine and Biotechnology significance and topological properties. More specifically, R r = R a + R d , where R a = rank(S a (Q s )) + rank S a Q ps + rank(S a (Q cd )) + rank(S a (Q cu )), where S a (X) and S d (Y ) are two sorting functions that sort X and Y in the ascending order and descending order, respectively, and "rank" denotes the rank in the sorted sequence. The smaller the composite score was, the more likely a pathway was considered to be a potentially crucial pathway.

QRT-PCR Analysis.
To validate the microarray data, real-time qRT-PCR using SYBR Green I Kit (Roche) was performed to assess RNA expression level accuracy. A set of important genes were selected for qRT-PCR assessment. Some of them were from the highest-ranking potentially crucial pathway based on the pathway topology analysis and the others were the immediate downstream genes of the highest-ranking potentially crucial pathway. The same RNA samples as those isolated for microarrays at 3, 8, 16, 24, 32, 40, and 48 hours were used for RT-PCR. TATA box-binding protein (TBP) was used as the internal control gene. It was performed in triplicate for control and rCFES-treated WI-38 cells of the same time point.

Cytotoxic Effect of rCFES on WI-38 Cells.
To understand the effect of recombinant CFP-10/ESAT-6 protein (rCFES) treatment on the survival of human lung fibroblasts, rCFES was purified by affinity chromatography. To ensure the treatment response was due to rCFES, a purity of higher than 95% was achieved as determined by densitometric scanning of the 12% SDS-PAGE ( Figure 1). Then, WI-38 cells were treated with different concentrations of rCFES and incubated for 48 hours. Cell viability of WI-38 cells was determined by MTS assay. It was found that the survival rate of WI-38 cells decreased and the decrement varied with the different concentrations of rCFES. For example, only 62% WI-38 cells survived after 12.5 μM rCFES treatment (Figure 2), which suggested that rCFES influenced cell survival and consequently caused death in 48 hours. Based on these evidences, we inferred that rCFES could induce WI-38 cell death via cytotoxic process.

Identification of Important Genes.
To identify the important genes involved in the rCFES-regulated mechanism of WI-38 cells, the microarray data for 0, 3, 8, 16, 24, 32, 40, and 48 hours were analyzed by SAM. Totally, 6542 important genes were identified with the following parameters: data type = one class time course, time summary method = signed area, Δ = 0.511 at time course, and false discovery rate = 4.74%. Out of these 6542 important genes, 2074 genes were upregulated and 4468 genes were downregulated with rCFES treatment.

Potentially Crucial Pathways in rCFES-Induced WI-38
Cells. To unravel the cytotoxic effect of rCFES on WI-38 cells, an integrative analysis approach was proposed in this study with the emphasis on identification of the potentially crucial pathways. While the high-throughput biological experiments, for example, microarray data, may provide hundreds or thousands of important genes revealing changes of expressions induced by rCFES, the information at the gene level may be too overwhelming to extract the most important regulatory effects. Moreover, the analysis results Journal of Biomedicine and Biotechnology 5 are generally lack of the notion at the system level. To remedy these deficiencies, the proposed integrative analysis approach aimed at finding the pathways that played crucial roles in rCFES-induced WI-38 cells.
We integrated the significance of gene expressions and topological distribution of important genes to account for the regulatory effects at the gene and system levels. While important genes represented the regulatory effects of rCFES on individual genes, topological distribution of the important genes bore the notion of chaining or aggregating regulatory effects on genetic subnetworks of WI-38 cells. Suppose that there were two pathways containing the same number of important genes with similar significance, if the important genes distributed sporadically in one pathway and were highly connected in the other, the proposed approach preferred the latter to the former. This was because a cluster of connected genes implied a chaining or aggregating regulatory effect of rCFES on WI-38 cells, which deserved a further investigation.
With a control of the false discovery rate at 5%, 23 pathways as listed in Table 1 were identified to be the candidates of potentially crucial pathways in rCFES-induced WI-38 cells based on the hyper geometric statistic test. Ranking these 23 pathways according to the composite score, R r , the second step of pathway topology analysis further suggested that the focal adhesion pathway is the potentially most crucial pathway in rCFES-induced WI-38 cells. Other than the focal adhesion pathway, the proposed integrative analysis approach also suggested that several metabolism pathways of WI-38 cells (as listed in Table 1) might play crucial role in the response to the treatment of rCFES. While further investigation would be required to validate this finding, it was partially supported by some previous studies, in which antibacterial genes of the immune cells infected with MTB were found to be involved in metabolism [23,24].
To characterize the functional significance of the focal adhesion pathway, the topological and functional properties of the important genes involved in this pathway were elaborately analyzed. With the false discovery rate of 4.74%, 65 important genes, including 7 upregulated genes and 58 downregulated genes (Table 2), were selected by SAM for the focal adhesion pathway annotated by the KEGG database. The root gene of the MD-cluster identified by the pathway topology analysis was ITGB1. The functional interpretations of these 65 important genes were annotated by GOstat [25] using biological process terms at P-value < 2 * 10 −20 and numbers of gene >16, the results of which were shown in Figure 3. The GO terms that best described the functional disorders associated with these 65 important genes were developmental process, cell motility, localization of cell, biological adhesion, and cell adhesion. Most gene functions of 65 important genes were downregulated by cytotoxic effect of rCFES. Combining the analysis results of gene ontology, pathway topology analysis and annotation of KEGG database, overall speaking, the biological process of WI-38 cells treated with rCFES involved such functions as cell proliferation, cell motility, cell survival, amino acid metabolism, and so on. In particular, many of them were related to cell survival.  To check if the proposed approach is also applicable to other microarray data, the same approach is used to analyze a set of publicly available human neutrophil microarray data. The analysis results may be found in Supplemental Data II (http://homepage.ntu.edu.tw/∼d91548013/ CFES supp data2.pdf).

Validation of Important Genes by qRT-PCR.
Twentythree pathways were identified as the potentially crucial pathways in rCFES-induced WI-38 cells and the focal adhesion pathway was considered as the most crucial one. These findings were based on the correctness of the gene expressions revealed by microarrays. To further corroborate the basis of the pathway topology analysis, the gene expressions of 5 important genes involved in the focal adhesion pathway were validated by qRT-PCR. These 5 genes were LAMA4, ITGB1, PIK3R3, BIRC3, and NFKBIA. According to gene annotations from KEGG databases, ITGB1 was the root gene of the focal adhesion pathway. LAMA4 was an upstream gene of ITGB1. PIK3R3 was a downstream gene of ITGB1. BIRC3 and NFKBIA were the downstream genes of PIK3R3. LAMA4 was also related to the ECM receptor interaction pathway. BIRC3 and NFKBIA were directly involved in the apoptosis pathway. Figure 4 provided the fold changes measured by microarray data and qRT-PCR for these 5 important genes at each time point. The qRT-PCR results were generally in congruence with the microarray data for LAMA4, PIK3R3, BIRC3, and NFKBIA in terms of up-or downregulation (Pearson correlations: 0.66∼0.82). More specifically, the first two genes, LAMA4 and PIK3R3, were downregulated by rCFES treatment and the last two genes, BIRC3, and NFK-BIA, were upregulated by rCFES treatment. Nevertheless, the microarray data of ITGB1 was not in agreement with Table 1: The values of the descriptors constituting the composite score, R r , and the selected pathways sorted in the ascending order of R r derived by the pathway topology analysis. the readings of qRT-PCR (Pearson correlations <0), though ITGB1 was identified as the root gene in the focal adhesion pathway. One reasonable explanation for this inconsistency was that the microarray data were inherently noisy, and there was a false discovery rate associated with the microarray data analysis for ITGB1.
Among the confirmed genes, the downregulation of LAMA4 would affect cell survival via laminin-integrin interaction [26,27] and the downregulation of PI3K would affect the survival signal transduction via an integrin/PI3K/Akt pathway [28]. Furthermore, the upregulation of baculoviral IAP repeat-containing 3 (BIRC3), known as an inhibitor of   Cell survival [28] α β Figure 5: Inferred model for the rCFES-regulated focal adhesion pathway in WI-38 cells. LAMA4, PIK3R3, BIRC3, and NFKBIA identified in this study were included in the rCFES-regulated model in WI-38 cells. Upregulated genes were marked by "↑" and downregulated genes by "⊥". Dotted arrows were links requiring further confirmation and the numbers in "[ ]" were the supportive previous works listed in the reference. caspase family [29], would inhibit the activation of caspase genes and the upregulation of NFKBIA would induce cell apoptosis via interfering with NF-κB activation [30][31][32]. These results supported that rCFES could influence WI-38 cell survivability via the regulation of focal adhesion pathway at least. Based on these results, a working model was inferred for the rCFES-regulated pathway in WI-38 cells in Figure 5. Although this model needed further verification, it was believed that these results will aid the investigation of the CFES-regulated mechanism in human lung fibroblasts.