The Differential Gene Expression Pattern of Mycobacterium tuberculosis in Response to Capreomycin and PA-824 versus First-Line TB Drugs Reveals Stress- and PE/PPE-Related Drug Targets

Tuberculosis is a leading infectious disease, causing millions of deaths each year. How to eradicate mycobacterial persistence has become a central research focus for developing next-generation TB drugs. Yet knowledge in this area remains fundamentally limited, and only a few drugs, notably capreomycin and PA-824, have been shown to be active against nonreplicating persistent TB bacilli. In this study, we performed a new bioinformatics analysis of publicly available microarray gene expression data to explore genes that were differentially induced by capreomycin and PA-824 compared with a group consisting mainly of first-line TB drugs. We identified 42 genes specifically induced by capreomycin and PA-824, many of which are related to stress responses. When the distribution of the identified genes across functional categories was compared with that of the whole genome, only the PE/PPE and conserved hypothetical categories showed statistical significance. Six of the 42 genes identified in this study are on the list of the top 100 persistence targets selected by the TB Structural Genomics Consortium. Further biological elucidation of their roles in mycobacterial persistence is warranted.


Introduction
Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis. It typically affects the lungs (pulmonary TB) but may also occur in other organs (extrapulmonary TB), such as the central nervous system, lymphatic system, circulatory system, genitourinary system, bones, joints, and skin. M. tuberculosis infection is extremely difficult to treat, mainly because of the bacterium's ability to turn the hostile environment within human macrophages (phagocytes) into a favorable niche for its replication. The bacilli can persist in human tissues, at primary or secondary infection sites, for long periods without multiplying and can later be reactivated when the host immune system is compromised. Persistent bacilli are nongrowing bacilli, often derived from in vivo sources, that can grow immediately in fresh medium [1]. Bacterial persistence in vivo is analogous to a stationary-phase culture in vitro [2]. Nonreplicating persistence of tubercle bacilli can be induced under hypoxia [3]. Dormant bacilli, by contrast, are nongrowing bacilli that do not grow immediately in fresh medium but can be resuscitated [1]. Latent TB infection (LTBI) is a clinical condition defined by a positive tuberculin skin test (i.e., evidence of infection with M. tuberculosis) without clinical or radiographic signs of active disease. Persons with LTBI are at increased risk of developing active disease, which may occur after decades of latent infection [4]. From the above definitions, the distinction between persistent and dormant bacilli appears to lie in their rate of recovery from the nongrowing state. A recent study shows that M. tuberculosis replicates throughout the course of chronic TB and is restrained by the host immune system [5], a finding suggesting that the switch between the nonreplicating and slowly replicating states is a dynamic process subject to host immunity.
In this paper, we do not distinguish between "nonreplicating persistence" and "dormancy". The presence of dormant tubercle bacilli in animal tissues is best demonstrated by the well-known Cornell model [6]. In this model, some mice infected with M. tuberculosis and then treated with short-term chemotherapy (isoniazid (INH) plus pyrazinamide (PZA)) for 3 months relapsed after the therapy was discontinued. Dormancy can cause phenotypic drug resistance: in the Cornell model [6], the bacilli recovered from the relapsed mice were fully susceptible to INH and PZA, indicating phenotypic rather than genetic drug resistance associated with dormant bacilli.
Eradicating mycobacterial persistence would be an indispensable element in developing next-generation TB drugs. Recently, much research effort has focused on testing capreomycin and PA-824 for TB treatment, with emphasis on their unique bactericidal effect on persistent tubercle bacilli. Capreomycin is an old antibiotic for the treatment of pulmonary tuberculosis [7] that has recently attracted renewed interest. Capreomycin is effective against multidrug-resistant (MDR) and intracellular TB bacilli [8]. A recent study demonstrated that, among known anti-TB drugs, only capreomycin is active against nonreplicating M. tuberculosis bacilli in an in vitro model of persistence [9]. Capreomycin is thus important because it can address both MDR and latent TB. PA-824 exhibits bactericidal activity against both replicating and static (persistent) tuberculosis, and it also has potent bactericidal activity against MDR M. tuberculosis in animal infection models [10]. PA-824 shows better sterilizing activity than moxifloxacin [11] and can enhance the bactericidal activity of rifampin and/or pyrazinamide [12]. Both capreomycin and PA-824 inhibit protein synthesis [10,13]; PA-824 also inhibits the synthesis of cell wall lipids [10]. Yet how these two drugs eradicate persistent bacilli remains unclear.
In this work, we conducted an exploratory analysis of differential gene expression in response to capreomycin and PA-824 versus the current first-line TB drugs in an attempt to identify drug targets potentially linked to the unique drug action against nonreplicating TB bacilli. Our study found that the genes significantly differentially expressed in response to capreomycin and PA-824 were dominated by PE/PPE or conserved hypotheticals, and many of these genes were related to stress responses. These genes and their products may serve as new drug targets for TB drug development.

Materials and Methods
We performed differential gene expression analysis between two sets of samples: one set was derived from drugs that are bactericidal to nonreplicating persistent TB bacilli (defined as the goal property), and the other from drugs that lack this property. The first set served as the experimental group and the second as the control group. Only a few drugs have been experimentally shown to be bactericidal to persistent M. tuberculosis grown under anaerobic or hypoxic conditions. In the present study, the experimental group consisted of the two best-known drugs in this category: capreomycin and PA-824. All first-line TB drugs except pyrazinamide (PZA) were included in the control group, for the reason given in the Discussion. A second-line TB drug, ethionamide, and a non-TB drug, ampicillin, were also added to the control group. Fluoroquinolones were not included in this study because current evidence about their activity against persistent TB bacilli is inconsistent.

Data Collection
The microarray gene expression data were collected from the Gene Expression Omnibus (GEO) database at the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) at http://www.ncbi.nlm.nih.gov/geo/. GEO stores data in a relational database with three top-level entity types: platforms, samples, and series. A platform describes the format or model of the microarray (e.g., oligonucleotide or cDNA arrays). A sample describes the gene expression data of a single hybridization under a given experimental condition. A series consists of the related samples from one experiment. The original data were generated by Boshoff et al. [14], who treated M. tuberculosis H37Rv with a wide variety of metabolic inhibitors, including TB drugs, and profiled the responses on a cDNA platform; details of the original data collection can be found in that study. The present study used the samples with the following accession numbers, all from series GSE1642: GSM28106, GSM28096, GSM28246, GSM28245, GSM28062, GSM28055, GSM28037, GSM28060, GSM28065, and GSM28063.
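The GEO hierarchy described above can be sketched as a small data model holding the ten sample accessions used here. This is a minimal illustration, not GEO's actual schema: the class names `Platform`, `Sample`, and `Series` are hypothetical, the platform accession for GSE1642 is not stated in the text and is left as a placeholder, and the URL pattern is GEO's public accession-display page.

```python
"""Minimal sketch of GEO's three top-level entity types (hypothetical class names)."""
from dataclasses import dataclass, field

# GEO's accession-display page; {} is replaced by a GPL/GSM/GSE accession.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"


@dataclass
class Platform:
    """Array format, e.g. a cDNA or oligonucleotide array."""
    accession: str
    technology: str


@dataclass
class Sample:
    """One hybridization under a given experimental condition."""
    accession: str
    platform: Platform


@dataclass
class Series:
    """A set of related samples from one experiment."""
    accession: str
    samples: list = field(default_factory=list)

    def sample_urls(self):
        """Return the GEO lookup URL for every sample in the series."""
        return [GEO_QUERY_URL.format(s.accession) for s in self.samples]


# Sample accessions listed in the text (series GSE1642).
SAMPLE_IDS = ["GSM28106", "GSM28096", "GSM28246", "GSM28245", "GSM28062",
              "GSM28055", "GSM28037", "GSM28060", "GSM28065", "GSM28063"]

# The platform accession is not given in the text, so a placeholder is used.
cdna_platform = Platform(accession="GPL_PLACEHOLDER", technology="cDNA")
series = Series("GSE1642", [Sample(acc, cdna_platform) for acc in SAMPLE_IDS])

for url in series.sample_urls():
    print(url)
```

Keeping samples as children of a series, each pointing to its platform, mirrors the three-level organization described in the text; which sample corresponds to which drug is documented in the original study [14] and is not encoded here.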

Significance Analysis of Microarrays
The Significance Analysis of Microarrays (SAM) program can identify differentially expressed genes with high statistical significance from data sets of two or more classes by using balanced permutations of repeated measurements and by estimating the false discovery rate (FDR) [15]. The FDR, an alternative to the traditional p-value when many genes are tested simultaneously, has been widely accepted for gene selection from microarray data. Because it relies on permutations of repeated measurements, SAM is more robust than traditional statistical tests such as the t-test. In the present study, we applied SAM version 3.0 to identify genes that were significantly differentially expressed between the experimental and control groups. The data used in the present study, organized in Excel format ready to be processed by SAM, can be downloaded from our website (http://www.patcar.org/Research/IJMB-2009.html). The data were processed by SAM with the following parameter settings: two-class, unpaired, log-ratio data, and FDR set to 0.05.
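To make the procedure concrete, the following is a simplified sketch of a SAM-style two-class, unpaired analysis; it is not the SAM 3.0 implementation. It computes SAM's relative difference d = (mean_exp - mean_ctl) / (s + s0) for each gene with a fixed fudge factor s0 (an assumption; SAM chooses s0 from the data), and estimates the FDR for one cutoff on |d| as the median number of genes called in label-permuted data divided by the number called in the real data. It enumerates all relabelings rather than SAM's balanced permutations, and the data in the usage lines are synthetic.

```python
"""Simplified SAM-style analysis: a sketch, not the SAM 3.0 implementation."""
from itertools import combinations

import numpy as np


def relative_difference(x, is_exp, s0):
    """SAM relative difference d = (mean_exp - mean_ctl) / (s + s0) per gene (row)."""
    a, b = x[:, is_exp], x[:, ~is_exp]
    n1, n2 = a.shape[1], b.shape[1]
    # Pooled within-group sum of squares, scaled as in a two-sample t statistic.
    ss = ((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + \
         ((b - b.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    s = np.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (a.mean(axis=1) - b.mean(axis=1)) / (s + s0)


def sam_sketch(x, is_exp, cutoff, s0=0.1):
    """Observed d scores plus a permutation-based FDR estimate for |d| >= cutoff."""
    d_obs = relative_difference(x, is_exp, s0)
    n_called = int((np.abs(d_obs) >= cutoff).sum())
    n_samples, n_exp = x.shape[1], int(is_exp.sum())
    null_calls = []
    # Every way of labeling n_exp samples as "experimental" (all relabelings,
    # not SAM's balanced permutations).
    for idx in combinations(range(n_samples), n_exp):
        labels = np.zeros(n_samples, dtype=bool)
        labels[list(idx)] = True
        d_null = relative_difference(x, labels, s0)
        null_calls.append(int((np.abs(d_null) >= cutoff).sum()))
    fdr = float(np.median(null_calls)) / n_called if n_called else 0.0
    return d_obs, min(fdr, 1.0)


# Synthetic usage: 200 genes x 10 log-ratio samples (4 experimental, 6 control).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10))
is_exp = np.array([True] * 4 + [False] * 6)
x[0, is_exp] += 10.0  # gene 0 is strongly induced in the experimental group
d, fdr = sam_sketch(x, is_exp, cutoff=3.0)
print(int(np.argmax(d)), fdr)
```

With 4 experimental and 6 control samples, as in this study, there are only C(10, 4) = 210 distinct relabelings, so exhaustive enumeration is cheap; a positive d marks a gene more strongly induced in the experimental group.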

Significance Analysis of Genes in Functional Categories
The chance probability P of identifying n genes among which i genes belong to a functional category of size f is computed by the following adapted formula [16]:
$$P = \frac{\binom{f}{i}\binom{g-f}{n-i}}{\binom{g}{n}}$$
where g is the total number of genes in the genome.
International Journal of Microbiology

Results
SAM analysis of two classes of gene expression samples yields positive and negative significant genes. Positive significant genes are genes more strongly induced in the experimental group than in the control group, whereas negative significant genes are genes less strongly induced in the experimental group than in the control group. In this study, the four samples representing the responses of M. tuberculosis to capreomycin and PA-824, each at a low and a high dose, constituted the experimental group, and the remaining six samples, representing the responses of M. tuberculosis to the other selected drugs, constituted the control group. Consequently, positive significant genes are genes differentially induced in response to capreomycin and PA-824. Since capreomycin and PA-824 are bactericidal to nonreplicating TB bacilli, genes induced by these two drugs but not by the other drugs in the control group are likely linked to bactericidal activity against nonreplicating TB bacilli, as this is the main attribute that differentiates the two groups. SAM analysis of the gene expression data collected for this study yielded 42 positive significant genes and 196 negative significant genes (Figure 1).
[Figure 1: observed score versus expected score]
Genes were removed from consideration if they were not induced by capreomycin or PA-824. In drug-treated gene expression profiling experiments, induced genes are more informative than repressed genes, since the induction of certain genes reflects a feedback or compensatory mechanism that senses the interruption of the drug-targeted metabolic pathways in which those genes are involved [17].
Our prior data also suggest that, whereas the molecular characteristics of induced genes reflect a drug's mode of action, repressed genes are often nonspecific or secondary [18]. We therefore focused on the positive significant genes. The genes differentially induced by capreomycin and PA-824 are listed in Table 1, and their expression values across all the drugs are summarized in Table 2. Functional analysis of these genes may provide further insight into how these drugs act on nonreplicating bacilli.
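The selection rule described above (keep the positive significant genes, then drop any gene not induced by capreomycin or PA-824) can be sketched as follows. The function name, gene names, sample labels, and the criterion for "induced" (a log-ratio above a threshold, 0 by default, in at least one experimental sample) are illustrative assumptions, since the text does not state the exact criterion; the values are toy numbers, not data from this study.

```python
"""Sketch of the gene-selection rule (hypothetical names and toy values)."""


def select_induced(d_scores, log_ratios, significant, exp_samples, min_log_ratio=0.0):
    """Return positive significant genes induced in at least one experimental sample.

    d_scores:      gene -> SAM score (positive = stronger induction in the experimental group)
    log_ratios:    gene -> {sample label: log-ratio of treated vs. untreated}
    significant:   genes called significant by SAM at the chosen FDR
    exp_samples:   labels of the experimental-group samples
    min_log_ratio: assumed induction threshold (hypothetical)
    """
    selected = []
    for gene in sorted(significant):
        if d_scores[gene] <= 0:
            continue  # negative significant gene: weaker induction than in controls
        if any(log_ratios[gene][s] > min_log_ratio for s in exp_samples):
            selected.append(gene)
    return selected


# Toy example: hypothetical sample labels for the two drugs at two doses.
EXP = ["CAP_low", "CAP_high", "PA824_low", "PA824_high"]
scores = {"geneA": 3.1, "geneB": -2.4, "geneC": 2.2}
ratios = {
    "geneA": dict(zip(EXP, [1.2, 1.8, 0.9, 1.5])),      # induced by both drugs
    "geneB": dict(zip(EXP, [-1.0, -1.3, -0.8, -1.1])),  # negative significant
    "geneC": dict(zip(EXP, [-0.2, -0.1, -0.3, -0.2])),  # above controls, yet not induced
}
print(select_induced(scores, ratios, {"geneA", "geneB", "geneC"}, EXP))  # ['geneA']
```

The toy gene `geneC` shows why the extra filter matters: a gene can score as positive significant simply because the control drugs repress it more strongly, even though neither experimental drug induces it.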

Functional Analysis of Identified Genes
The genes differentially expressed in response to capreomycin and PA-824 fall into seven functional categories (Table 3). The number of genes identified in a functional category is considered statistically significant at the 0.05 level if the associated chance probability is less than 0.005, that is, 0.05 divided by the 10 categories tested (a Bonferroni adjustment for multiple comparisons). By this criterion, PE/PPE and conserved hypotheticals are the only categories with statistical significance; the number of genes identified in each of the other categories could be explained by chance.
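The chance probability and the 0.005 threshold can be computed with the Python standard library. This sketch assumes the formula is the hypergeometric point probability given in the Methods (a cumulative upper-tail version is a common alternative); the function names are hypothetical, and the genome size g = 4,000 used in the illustration is an assumed round figure for M. tuberculosis H37Rv, not the exact value used in the study.

```python
"""Category chance probability (hypergeometric) with a Bonferroni threshold: a sketch."""
from math import comb


def chance_probability(n: int, i: int, f: int, g: int) -> float:
    """Probability that exactly i of n randomly chosen genes fall in a
    category of f genes, out of g genes in total (requires 0 <= i <= n)."""
    return comb(f, i) * comb(g - f, n - i) / comb(g, n)


def is_significant(p: float, alpha: float = 0.05, n_categories: int = 10) -> bool:
    """Bonferroni-style test: p must be below alpha / n_categories (0.005 here)."""
    return p < alpha / n_categories


# Illustration: 42 identified genes, 6 of them in a list of 100 persistence
# targets. g = 4000 is an ASSUMED approximate gene count for H37Rv.
p_overlap = chance_probability(n=42, i=6, f=100, g=4000)
print(f"P = {p_overlap:.1e}, significant: {is_significant(p_overlap)}")
```

With these assumed inputs, the persistence-target overlap discussed later in the paper comes out at roughly 5 × 10⁻⁴, well below the 0.005 threshold. Exact integer binomials from `math.comb` avoid overflow even though C(4000, 42) is a very large number.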
The persistence-related genes identified in this study were interpreted in terms of their functions and roles on the basis of current research evidence (Table 4). These genes are involved in stress responses (hypoxia, starvation, or heat shock), fatty acid degradation for energy production, membrane function, survival (i.e., essential genes), growth, or sulfur metabolism. However, many of the genes identified here are conserved hypotheticals of unknown function, and further biological validation of these genes is warranted.

Discussion
In the present study, we collected two sets of gene expression samples from the GEO database: one set was derived from drugs that are active against nonreplicating persistent TB bacilli and the other from drugs that are not. We used a standard differential gene expression analysis method [25] developed in the field of functional genomics [26]. The gene expression data were generated in a study that aimed to gain new insight into drug mechanisms of action through large-scale gene expression profiling of M. tuberculosis responses to a variety of metabolic inhibitors [14]. That study, however, did not address the mechanisms of drug action against nonreplicating M. tuberculosis, which is the focus of the present study.
PZA is used in combination with other first-line TB drugs to shorten the required treatment duration [27]. In addition, PZA in combination with rifampin can be used to treat latent tuberculosis [28]. These two facts suggest that PZA may play a role in killing persistent TB bacilli, even though PZA alone did not show this property experimentally [9]. Because of this uncertainty, PZA was not included in the control group.
The data were generated from gene expression experiments carried out on aerobically growing bacilli rather than on anaerobic dormant bacilli. Research based on global gene expression profiling has identified genes that are differentially expressed by M. tuberculosis residing in macrophages compared with M. tuberculosis grown in standard broth culture [29]. Despite recent progress in this area, the genetic responses of intracellular M. tuberculosis exposed to drugs have not yet been studied, which limits the present study. Research shows that gene expression under hypoxic conditions reflects the repression of many metabolic pathways and the activation of specific hypoxic response pathways. In our prior study on capreomycin, the upregulation of isocitrate lyase (ICL) suggested that capreomycin can affect the glyoxylate shunt, a pathway alternative to the tricarboxylic acid (Krebs) cycle that is involved in intracellular mycobacterial persistence when fatty acids become a major source of carbon and energy in M. tuberculosis metabolism [30]. Although that study was conducted under aerobic conditions, the gene expression pattern found there also reflected responses related to the nonreplicating state, attributable to the drug's action.
Another limitation of the present study is that the observed bactericidal effect of capreomycin and PA-824 on nonreplicating bacilli is based on the Wayne model [3], which has been widely adopted for this purpose but whose accuracy in simulating the in vivo nonreplicating state has been questioned. Despite these limitations, the present findings may serve as a basis for future work.
Table 3: Functional classes of genes strongly induced by capreomycin and PA-824 relative to other TB drugs. P = the chance probability of the number of genes identified in a given functional category among all genes. Functional classes are based on TubercuList (http://genolist.pasteur.fr/TubercuList/).
The pathogenicity of M. tuberculosis involves a complex set of factors. The availability of the M. tuberculosis genome sequence, coupled with advances in molecular biology, has yielded a wide range of novel drug targets related to transcription, cell wall synthesis, signal transduction, information pathways, intermediary metabolism, virulence, and persistence [31]. The need for lengthy chemotherapy reflects our inadequate understanding of bacterial persistence. The finding that the persistence of M. tuberculosis in mice requires the glyoxylate shunt enzyme isocitrate lyase (ICL) [32] points to one new drug target. Other possible drug targets related to TB persistence are DosR, RelA, and PcaA [31]. Because only a handful of drug targets have so far been hypothesized as relevant to mycobacterial persistence, and their prospects as a basis for a practically useful new TB drug are uncertain, our study addresses this issue by taking a genome-wide approach to finding new TB drug targets.
The in vivo microenvironment of persistent TB bacilli is often located in lesions such as granulomas, cavities, or caseous necrotic tissue, where both oxygen and nutrients are scarce. Stress results from hypoxia and starvation as well as from host immunity. To survive, TB bacilli must activate specific metabolic pathways to cope with this stress. Such rescue pathways have been studied for hypoxia [33], starvation [20], and high temperature [21].
Six of the 42 genes identified in the present study are on the list of the top 100 persistence targets selected by the TB Structural Genomics Consortium (http://www.webtb.org/): Rv2557, Rv1285, Rv2878c, Rv2777c, Rv1929c, and Rv0834c. The chance probability of identifying 42 genes from the M. tuberculosis genome such that 6 of them fall among the 100 persistence-related genes is 0.0005, which is highly statistically significant. It is not clear how relevance to persistence was assessed for these top 100 targets, but the assessment is presumably based on evidence from long-term research on M. tuberculosis by the Consortium members. There is no guarantee that these top targets are the best drug targets for future TB drug development. Nevertheless, the overlap between our findings and the Consortium's list is significant and lends credence to our findings.
It is interesting to note that none of the identified genes fall in the information pathways or regulatory proteins categories. Some well-known TB drugs target information pathways: for example, rifampin targets RNA polymerase, fluoroquinolones target DNA gyrase, and streptomycin targets a ribosomal protein and 16S rRNA. These drugs are effective against actively growing TB bacilli, but they cannot eradicate persistent bacilli. In contrast, capreomycin and PA-824 target not only information pathways but also other metabolic pathways. Our analysis suggests that inhibiting information pathways alone is not sufficient to resolve mycobacterial persistence.
The distribution of the identified genes in the categories of virulence, lipid metabolism, cell wall and cell processes, and intermediary metabolism and respiration does not differ significantly from their distribution in the whole genome. Thus, the relevance of these gene categories to mycobacterial persistence is questionable. For example, isoniazid, the best-known first-line TB drug, works by inhibiting mycobacterial cell wall synthesis, but it is not effective against persistent TB bacilli [9].
PE/PPE and conserved hypotheticals are the only two functional categories in which the distribution of the identified genes differs significantly from that in the whole genome. This finding suggests that a substantial number of genes pertinent to eliminating persistence may be PE/PPE or conserved hypothetical proteins. If this hypothesis holds, it would help explain why the persistence problem is so difficult to solve, since current knowledge of genes in these two categories is quite limited. Elucidating the functions of these PE/PPE and hypothetical proteins could yield information useful for developing new TB drugs. Investigation of some of these persistence-related genes is currently underway, for example, Rv2557, Rv2777c, and Rv1929c in the conserved hypothetical category and Rv0834c in the PE family.
About 10% of the genes in the M. tuberculosis genome are dedicated to the production of two families of glycine-rich proteins, PE and PPE, named after their proline-glutamate (PE) and proline-proline-glutamate (PPE) motifs; their repetitive structure may represent a source of antigenic variation [24]. One may hypothesize that suppressing PE/PPE synthesis could reduce antigenic variation and thereby facilitate bacterial killing by host immunity; however, this hypothesis remains to be tested.

Conclusion
The treatment of M. tuberculosis infection is often difficult, mainly because of the bacterium's ability to persist in host tissue for long periods. Eradicating mycobacterial persistence would be an indispensable element in developing next-generation TB drugs. Yet our understanding of the mechanisms of persistence is still far from the point where an effective new drug can be developed to solve the problem. In fact, only a handful of genes have been proposed as potential drug targets pertinent to mycobacterial persistence, and only a few drugs, notably capreomycin and PA-824, exhibit bactericidal activity against nonreplicating TB bacilli. In the present study, we identified genes differentially expressed in response to capreomycin and PA-824 versus the first-line TB drugs. Six of the identified genes are on the list of the top 100 persistence targets selected by the TB Structural Genomics Consortium. A significant number of the identified genes are PE/PPE or conserved hypotheticals, and many are related to stress responses. Further biological validation of these genes is warranted.