Pseudomonas aeruginosa and a Proteomic Approach to Bacterial Pathogenesis

Pseudomonas aeruginosa is a Gram-negative bacterium that is ubiquitous in the environment and can cause a variety of diseases in compromised patients. The genome of P. aeruginosa strain PAO1 has been reported to encode 5570 potential proteins. The value of this genomic database is that it allows new proteins to be recognized for use as diagnostic markers and novel drug targets, and it helps us better understand the physiology of this organism. However, as in other sequenced bacterial genomes, approximately one third of the potential proteins have no known function. This is somewhat surprising given the long-standing interest in P. aeruginosa as an opportunistic pathogen. Clearly, new tools beyond sequence similarity analysis are needed to determine the roles of these proteins. Proteomics, in which two-dimensional gel electrophoresis is followed by mass spectrometry to detect and identify P. aeruginosa proteins, represents a novel approach to addressing this gap.


Introduction
Pseudomonas aeruginosa is an opportunistic pathogen that can cause acute infections in compromised patients, including those undergoing chemotherapy and those with burns or eye injuries [1]. P. aeruginosa also causes chronic lung infections in patients with cystic fibrosis (CF); this chronic colonization is the major cause of death in these patients [2,3]. Unfortunately, this bacterium is naturally resistant to high levels of many commonly used antibiotics, and it is nutritionally versatile, which allows it to compete and survive in a wide variety of niches. The antibiotic resistance of P. aeruginosa is based on its low outer membrane permeability together with mechanisms such as inducible β-lactamases and antibiotic efflux pumps [4]. In addition to its harmful interactions with humans, P. aeruginosa is a normal inhabitant of the environment, growing in aquatic ecosystems and on plants.
P. aeruginosa is a well-studied bacterium because of its ability to cause disease. It produces a large number of recognized virulence factors, including lipopolysaccharide (LPS), polysaccharides, toxins, proteases, and other enzymes [5]. Some of these are bound to the cell surface, some are released, others are secreted, and still others are injected directly into host cells by a type III secretion apparatus. This bacterium can also change its phenotype. Strains isolated early in CF lung infections have phenotypes similar to those of isolates from acute infections and from the environment, whereas strains isolated from chronic lung infections produce large amounts of an exopolysaccharide called alginate, which is responsible for the characteristic mucoid phenotype. In addition, these chronic CF isolates express lower levels of extracellular enzymes and toxins, and they have an altered lipid A and a defective LPS [2]. In infections where it grows on surfaces, P. aeruginosa can form multicellular structures called biofilms [6], whose formation depends on quorum sensing [7]. Growth in this mode confers increased resistance to host defenses and antibiotics.
Many virulence factors have been recognized by their effects in in vitro assays or in in vivo infection models that mimic the human infectious process. Traditionally, these factors have been identified by constructing mutations in the genes required for their production and then characterizing the mutants in various models of P. aeruginosa infection. These include infection of neutropenic mice, which reproduces infection after immunosuppression; the scratched mouse or rabbit eye model, which mimics P. aeruginosa bacterial keratitis; the burned mouse model, which simulates infection after thermal injury; and the mouse lung infection model, which is patterned after initial infections in cystic fibrosis patients. Aspects of these models mirror steps likely to be important in infection, including avoidance of the immune system, adherence, invasion, and dissemination [5]. Mutagenesis techniques such as in vivo expression technology (IVET) have been developed for P. aeruginosa to identify genes that are induced specifically during infection [8]. Recently, infection systems for P. aeruginosa that do not use mammals have been developed. These include plants [9], the nematode Caenorhabditis elegans [10,11], and insects [12,13]. Some P. aeruginosa virulence factors have been shown to be required in both mammalian and non-mammalian infection models, while others were important in only one system.

Sequence analysis
Because of the importance of P. aeruginosa as an opportunistic pathogen in both acute and chronic infections, and because of its environmental versatility, the genome sequence of the laboratory strain PAO1 was determined by the Pseudomonas Genome Project (www.pseudomonas.com) [14]. The release of this sequence represents a major breakthrough in the analysis of this important human pathogen. At 6.3 million base pairs (Mbp), the PAO1 genome is one of the largest bacterial genomes sequenced to date, and it includes 5570 open reading frames (ORFs) encoding potential proteins. The homology of these ORFs to other proteins provides vital information on their proposed activities and should provide a framework for testing these functions and determining their roles in virulence.
Proportionally, P. aeruginosa has more regulatory proteins than any other bacterium sequenced so far [14,15]. This observation implicates these regulators in the ability of this organism to compete and survive in a variety of environments. Surprisingly, nearly 50% of the "homology hits" of the P. aeruginosa ORFs were to proteins found in Escherichia coli; no other bacterium accounted for even 10% of the hits. In addition, significant conservation of gene order between P. aeruginosa and E. coli was noted, with greater genetic complexity in P. aeruginosa. This greater complexity likely underlies the extreme metabolic diversity of P. aeruginosa.
While the sequence of PAO1 provides scientists with an enormous amount of data on this organism, it is also appreciated that PAO1 is not representative of every P. aeruginosa strain. Overall, approximately 3% of the DNA differs between P. aeruginosa clinical isolates and PAO1, a proportion significantly lower than that between E. coli strains [16]. Pathogenic isolates of P. aeruginosa can contain an approximately 50 kb region of DNA that is not present in PAO1. This DNA may be considered a "pathogenicity island", an unstable region containing genes whose products are important for virulence or for survival in specific environmental niches [17]. At this locus, strain PAO1 contains different DNA [16].
Following the release of the genome sequence, a genome-wide mutagenesis approach has been developed for P. aeruginosa. This genomic analysis and mapping by in vitro transposition (GAMBIT) technology relies on allelic exchange mutagenesis to detect essential P. aeruginosa genes [18]. This functional genomic characterization should lead to the identification of essential gene products that may prove to be valuable drug targets.
As mentioned, P. aeruginosa has been well studied genetically because of its importance as a human pathogen. Even so, as in other bacterial genome sequences, about one third of the ORFs have no homology to any reported sequence. These potential gene products likely denote unique features of P. aeruginosa, and targeting them represents a rational approach to developing new drugs that specifically combat this bacterium. Similarly, ORFs encoding conserved hypothetical proteins, which resemble potential gene products in other organisms, may represent important targets that are not specific to P. aeruginosa. However, determining whether these ORFs actually encode proteins, and what functions those gene products have, remains a challenge.

Proteomic analysis
Genomic information coupled with protein analysis, referred to as proteomics, represents a new approach to addressing the gap in our understanding of previously undescribed gene products. Using the established techniques of two-dimensional (2-D) gel electrophoresis and mass spectrometry (MS), changes in protein expression or post-translational modification can be monitored between different samples, including samples representing different physiological conditions. 2-D gel electrophoresis separates protein mixtures by two techniques: isoelectric focusing in the first dimension and SDS-PAGE in the second. The resulting gel provides a high-resolution separation of a complex mixture of proteins. The staining intensity of individual spots reflects the relative amount of each protein, effectively providing a third dimension of information. The amino acid sequences of selected proteins can be determined by MS, and this information can be compared to any genetic database, including the Pseudomonas Genome Project database. In our experience, molecular weight (MW) and pI calculated from ORF sequences alone are at best moderately useful for assigning identities to proteins separated by 2-D gel electrophoresis; many proteins are products of processing, degradation, or post-translational modification, which can radically alter MW and/or pI.
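To illustrate why ORF-derived values can mislead, the sketch below computes the average MW and a theoretical pI of an unmodified polypeptide chain, then repeats the calculation after removing a hypothetical N-terminal signal-peptide-like segment. The residue masses are standard average values; the pKa set is an assumption (published sets differ, so tools can disagree on pI by a few tenths of a unit), and both sequences are invented for illustration, not PAO1 ORFs.

```python
"""Theoretical MW and pI of a polypeptide from its sequence (illustrative sketch)."""

# Average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.01528

# Assumed pKa values; other published sets give somewhat different pI estimates.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq):
    """Average MW (Da) of the unmodified chain: residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH of zero net charge, found by bisection (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical precursor = signal-peptide-like segment + mature chain.
    signal = "MKKTLLALAVLAASPAFA"
    mature = "DEQKPAEGHKDWSEKY"
    for label, seq in [("precursor", signal + mature), ("mature", mature)]:
        print(f"{label}: MW = {average_mass(seq):.1f} Da, pI = {isoelectric_point(seq):.2f}")
```

A processed or modified protein will migrate according to its actual mass and charge, so a spot can sit far from the position predicted for the full-length ORF product.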
We have already taken advantage of the availability of the completed P. aeruginosa PAO1 genome and of 2-D gel electrophoresis followed by MS. We optimized procedures for extracting proteins from P. aeruginosa and found that DNase and RNase treatment followed by phenol extraction improved the resolution, separation, and reproducibility of 2-D gel electrophoresis. This was true for P. aeruginosa CF isolates as well as for PAO1, and it has allowed us to load approximately 3 mg of protein onto each gel. After the gels were stained with Coomassie or silver, first-pass mass mapping procedures were used: 2-D gel electrophoresis was coupled with matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (mass mapping) to rapidly identify proteins. Proteins not identified by this method were automatically processed further by microcapillary column liquid chromatography tandem mass spectrometry (µLC/MS/MS) for identification or de novo sequencing. A local copy of MS-FIT was used to identify proteins against the P. aeruginosa genome or ORFs. Any proteins that failed to mass map were further analyzed by µLC/MS/MS and searched with the SEQUEST program (ThermoFinnigan) or BLAST (National Center for Biotechnology Information). Finally, any novel proteins or post-translationally modified peptides were de novo sequenced.
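The mass-mapping step can be sketched as peptide mass fingerprinting: digest each candidate ORF in silico with trypsin (cleaving after K or R unless the next residue is P), compute monoisotopic [M+H]+ masses, and count how many observed peaks fall within a ppm tolerance. This is a simplified illustration, not the MS-FIT scoring algorithm. The 50 ppm tolerance, one missed cleavage, minimum peptide length, and the ORF names and sequences are all hypothetical, and modifications such as methionine oxidation are ignored.

```python
"""Simplified peptide mass fingerprinting against a small ORF set (illustrative sketch)."""
import re

# Monoisotopic residue masses (Da).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565
PROTON = 1.007276


def tryptic_peptides(seq, missed=1, min_len=4):
    """In silico trypsin digest: cleave after K/R unless followed by P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            pep = "".join(pieces[i:j + 1])
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides


def mh_plus(pep):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in pep) + WATER_MONO + PROTON


def score_protein(peaks, seq, tol_ppm=50.0, missed=1):
    """Return (fraction of peaks matched, fraction of ion intensity matched)."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(seq, missed)]
    total_intensity = sum(inten for _, inten in peaks)
    n_hit, i_hit = 0, 0.0
    for mz, inten in peaks:
        if any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theoretical):
            n_hit += 1
            i_hit += inten
    return n_hit / len(peaks), i_hit / total_intensity


def rank_orfs(peaks, orfs, **kwargs):
    """Rank ORFs by fraction of peaks matched, then by fraction of intensity matched."""
    scored = [(name, *score_protein(peaks, seq, **kwargs)) for name, seq in orfs.items()]
    return sorted(scored, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical ORF sequences, not real PAO1 entries.
    orfs = {
        "orf_A": "MSEKLVAGTRPLDDLVKAYHEGFSQWNKTPEMLR",
        "orf_B": "MAQNGTTLIRDEFPGKSVYHAWLEEGKNQLTCMR",
    }
    # Simulated spectrum: fully cleaved orf_A peptides, equal intensities.
    peaks = [(mh_plus(p), 100.0) for p in tryptic_peptides(orfs["orf_A"], missed=0)]
    for name, frac_peaks, frac_int in rank_orfs(peaks, orfs):
        print(f"{name}: {frac_peaks:.0%} of peaks, {frac_int:.0%} of intensity matched")
```

Real spectra also contain trypsin autolysis peaks, keratin contaminants, and peptides from co-migrating proteins, which is why a dominant identification rarely matches every peak.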
We have detected variations in protein expression among P. aeruginosa strains with phenotypes corresponding to those initially and chronically infecting the lungs of a CF patient, and we identified 14 proteins during this analysis [19]. We noted proteins that were expressed at similar levels in both strains as well as proteins that were overexpressed in one strain or the other. Of particular interest, some protein spots represented specific degradation products. Whether this degradation is differentially regulated is currently under investigation. The eventual goal of this type of analysis is to determine how specific proteins may contribute to the observed phenotypic differences between these strains. Changes in protein expression detected through this analysis may reflect changes induced by the CF lung environment, changes that play a role in survival in the lung, or changes that contribute to the activities responsible for the organism's pathogenesis. Characterizing these differences will promote further studies on the function and regulation of these particular proteins, their role in virulence, and their potential as novel drug targets and vaccine candidates.
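Comparisons between strains like these typically rest on normalized spot volumes. A minimal sketch, assuming spot volumes have already been matched between two gels and that a twofold change is the cutoff (both the cutoff and the spot data are hypothetical): each spot volume is expressed as a percentage of the total spot volume on its gel, and the ratio between gels is used to flag differences.

```python
"""Flag differentially expressed 2-D gel spots from normalized spot volumes (sketch)."""


def normalize(volumes):
    """Express each spot volume as a percentage of the total spot volume on its gel."""
    total = sum(volumes.values())
    return {spot: 100.0 * v / total for spot, v in volumes.items()}


def differential_spots(gel_a, gel_b, fold=2.0):
    """Return {spot: (call, B/A ratio)} for spots changing by at least `fold`."""
    na, nb = normalize(gel_a), normalize(gel_b)
    calls = {}
    for spot in na.keys() & nb.keys():
        ratio = nb[spot] / na[spot]
        if ratio >= fold:
            calls[spot] = ("higher in B", ratio)
        elif ratio <= 1.0 / fold:
            calls[spot] = ("higher in A", ratio)
    # Spots detected on only one gel are reported separately.
    for spot in na.keys() - nb.keys():
        calls[spot] = ("only in A", 0.0)
    for spot in nb.keys() - na.keys():
        calls[spot] = ("only in B", float("inf"))
    return calls


if __name__ == "__main__":
    # Hypothetical spot volumes (arbitrary units) for two strains.
    initial_isolate = {"s1": 100, "s2": 100, "s3": 100, "s4": 100}
    chronic_isolate = {"s1": 100, "s2": 400, "s3": 100, "s4": 25}
    for spot, (call, ratio) in sorted(differential_spots(initial_isolate, chronic_isolate).items()):
        print(spot, call, round(ratio, 2))
```

Normalizing to total spot volume compensates for differences in protein load and staining between gels; a spot that looks "overexpressed" may instead be an intact form whose degradation products sit elsewhere on the gel.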
A few other groups have examined P. aeruginosa proteins by 2-D gel electrophoresis and other proteomic techniques. Quadroni et al. [20] detected 13 P. aeruginosa proteins that were induced during sulfate starvation. Most of these proteins were identified by N-terminal Edman sequencing and MS/MS sequencing. Three classes of proteins were identified: those involved in solute binding, in sulfonate and sulfate ester metabolism, and in the general stress response. These were identified by homology and localized to sulfate-controlled operons in the P. aeruginosa genome. The differences in expression levels of these proteins were confirmed by RT-PCR, which indicated that repression in the presence of sulfate occurred at the level of transcription. The value of this approach is that changes in protein expression under particular growth conditions can be monitored. This may help suggest the functions of unknown genes as well as steps in the response to varying environmental stimuli.
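The passage does not say how the RT-PCR data were quantified. One widely used approach for relative quantification is the comparative threshold-cycle (2^-ΔΔCt) method, sketched here as an assumption rather than as the method of Quadroni et al.; it presumes roughly equal, near-100% amplification efficiencies for the target and reference genes, and the Ct values are hypothetical.

```python
"""Relative transcript level by the 2^-ddCt method (illustrative; assumes ~100% PCR efficiency)."""


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript in the test condition relative to the control.

    Each target Ct is first normalized to a reference gene measured in the same sample.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_test - delta_ctrl)


if __name__ == "__main__":
    # Hypothetical Ct values: sulfate starvation (test) vs. sulfate-replete growth (control).
    print(fold_change_ddct(20.0, 15.0, 23.0, 15.0))  # 8.0, i.e. 8-fold higher under starvation
```

A lower Ct means the transcript crossed the detection threshold earlier, so a target that is repressed by sulfate shows a higher Ct in the sulfate-replete control.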
Malhotra et al. [21] detected protein differences between PAO1 and PDO300, a mucoid mutant of PAO1. PDO300 carries the mucA22 mutation in the anti-sigma factor that controls the activity of sigma-22; this mutation is responsible for the emergence of the mucoid phenotype and is seen in most isolates from CF patients [22]. N-terminal sequencing of proteins after 2-D gel electrophoresis revealed increased expression of 6 proteins, including AlgA, AlgD, DsbA, and OprF, as well as decreased expression of 3 other proteins in PDO300 compared to PAO1 [21]. Two of these proteins, AlgA and AlgD, are encoded by genes of the alginate biosynthetic operon (algA and algD), which were known to have increased transcriptional activity in mucoid strains [23]. These authors confirmed that transcription of dsbA, which encodes a disulfide bond isomerase, was increased in PDO300 compared to PAO1, verifying the protein expression results and suggesting regulation involving mucA22 [21]. Interestingly, oprF transcription was not increased in PDO300. Whether the apparent increase in OprF reflected degradation of OprF in PAO1 rather than overexpression in PDO300, as we observed for OprF in our analysis of initial and chronic CF isolates [19], was not determined.
An in-house resource for managing the enormous amount of data that such analyses can generate, called "The Microbial Proteomic Database", has been described [24]. The same group has also isolated membrane fractions of PAO1 and identified about 189 spots on a 2-D gel [25]. They determined that the genes for 104 of these were present in the PAO1 genome and classified these ORFs as "previously characterized in P. aeruginosa" (38%), those with "similarity to proteins in other organisms" (46%), and those with "unknown function" (16%). These authors also noted at least 15 proteins that may be glycosylated [25]. However, how any of these proteins correspond to the genes of the Pseudomonas Genome Project was not reported.
In an attempt to integrate the information available from the Pseudomonas Genome Project with our ability to detect proteins, we randomly chose 125 spots from a Coomassie-stained 2-D gel of soluble P. aeruginosa PAO1 proteins and performed mass mapping (Fig. 1 and Table 1). Phenol extraction, 2-D gel electrophoresis, and processing of spots prior to mass spectrometry were performed as described previously [19]. Unlike in the previous work, the digests were subjected to mass mapping (mass analysis alone, without amino acid sequencing). After tryptic digestion, 20% of each sample was micropurified using a ZipTip U-C18 (Millipore, Bedford, MA) according to the manufacturer's instructions for direct elution with matrix. Peptide mass maps of each digest were generated by MALDI-TOF mass spectrometry using a PE Biosystems Voyager DE-PRO (PE Biosystems, Foster City, CA). The system was operated with the following parameters: reflector mode, positive ion, 20 kV extraction, 76% grid, 0.002% guide wire, 180 nsec delay, and CHC (alpha-cyano-4-hydroxycinnamic acid) matrix (8 mg/mL in 70% acetonitrile/0.1% trifluoroacetic acid). Spectra were acquired by averaging ∼100 laser shots. Mass calibration was performed using trypsin autodigestion peptides. Proteins were identified automatically using PS1/Protein Prospector software (PE Biosystems, Foster City, CA) against the Pseudomonas ORF database from the Pseudomonas Genome Project (12/15/00; http://www.pseudomonas.com). For a confident identification, criteria such as the percentage of ion intensity matched, the percentage of ions matched, molecular mass, and pI were used. Spots considered not matched ("none") did not sufficiently meet these criteria upon manual inspection.
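Calibrating on trypsin autodigestion peptides can be sketched as a linear recalibration, assuming the autolysis peaks appear in each spectrum and serve as internal calibrants. The two reference masses below are commonly cited monoisotopic [M+H]+ values for porcine trypsin autolysis peptides; treat them as assumptions and check them against the trypsin actually used. The 0.5 Da search window and the simulated spectrum are hypothetical.

```python
"""Internal recalibration of a MALDI peak list with trypsin autolysis peptides (sketch)."""

# Commonly cited porcine trypsin autolysis [M+H]+ values (assumed; verify for your enzyme).
AUTOLYSIS_MH = [842.5100, 2211.1046]


def find_standards(peaks, references, window_da=0.5):
    """Pair each reference mass with the nearest observed peak inside window_da."""
    pairs = []
    for ref in references:
        nearest = min(peaks, key=lambda m: abs(m - ref))
        if abs(nearest - ref) <= window_da:
            pairs.append((nearest, ref))
    return pairs


def fit_linear(pairs):
    """Least-squares fit of reference = slope * observed + intercept."""
    if len(pairs) < 2:
        raise ValueError("at least two calibrant peaks are required")
    n = len(pairs)
    mean_obs = sum(o for o, _ in pairs) / n
    mean_ref = sum(r for _, r in pairs) / n
    sxx = sum((o - mean_obs) ** 2 for o, _ in pairs)
    sxy = sum((o - mean_obs) * (r - mean_ref) for o, r in pairs)
    slope = sxy / sxx
    return slope, mean_ref - slope * mean_obs


def recalibrate(peaks, slope, intercept):
    """Apply the fitted linear correction to every peak."""
    return [slope * m + intercept for m in peaks]


def ppm_error(measured, true):
    return (measured - true) / true * 1e6


if __name__ == "__main__":
    # Hypothetical spectrum with a systematic error: +100 ppm gain and +0.05 Da offset.
    true_masses = AUTOLYSIS_MH + [1500.0]
    observed = [m * (1 + 100e-6) + 0.05 for m in true_masses]
    slope, intercept = fit_linear(find_standards(observed, AUTOLYSIS_MH))
    corrected = recalibrate(observed, slope, intercept)
    print(f"analyte error before: {ppm_error(observed[2], 1500.0):.1f} ppm")
    print(f"analyte error after:  {ppm_error(corrected[2], 1500.0):.3f} ppm")
```

With only two calibrants the fit is exact at those points, so the residual error at the analyte depends on how linear the instrument's true error is across the mass range.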
Of the 125 spots chosen for analysis, 99 gave a dominant protein identification corresponding to an ORF in the PAO1 genome. In each case we identified the major protein component; however, we acknowledge that many spots probably also contain less abundant proteins that were not identified in this analysis. For 85% of the identified proteins, functions were defined in the PAO1 database annotation. These included membrane proteins, housekeeping proteins, regulators, and structural proteins. We recognized 6% of the spots as conserved hypothetical proteins, indicating that related ORFs have been identified in other organisms. Perhaps the most interesting group was the 9% that were hypothetical proteins. As with the conserved hypothetical proteins, these proteins had previously been identified only as ORFs in the genome sequence, and it was not known whether these genes were even translated into proteins. While the functions of these proteins are unknown, they are unique to P. aeruginosa, suggesting a function specific to this bacterium. Uncovering the conditions under which these proteins are expressed should help to elucidate the functions of these gene products.
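A tally like the one above can be sketched by assigning each identified spot to a category from its genome annotation text and reporting percentages of the identified spots. The keyword rule and the example spot list are hypothetical; real annotations may need a more careful classification than substring matching.

```python
"""Tally mass-mapping identifications by annotation category (sketch)."""
from collections import Counter


def categorize(annotation):
    """Assign an annotation string to one of three broad categories (keyword rule assumed)."""
    text = annotation.lower()
    if "conserved hypothetical" in text:  # check before the plain "hypothetical" case
        return "conserved hypothetical"
    if "hypothetical" in text:
        return "hypothetical"
    return "annotated function"


def summarize(spot_annotations):
    """Return (identification rate, {category: percent of identified spots})."""
    identified = [a for a in spot_annotations if a != "none"]
    counts = Counter(categorize(a) for a in identified)
    rate = len(identified) / len(spot_annotations)
    percents = {cat: 100.0 * n / len(identified) for cat, n in counts.items()}
    return rate, percents


if __name__ == "__main__":
    # Hypothetical spot list; "none" marks spots that failed the matching criteria.
    spots = [
        "outer membrane porin OprF precursor",
        "elongation factor Tu",
        "conserved hypothetical protein",
        "hypothetical protein",
        "none",
    ]
    rate, percents = summarize(spots)
    print(f"identified {rate:.0%} of spots; categories: {percents}")
```

With the counts reported above, the identification rate is 99/125, or about 79%, and the category percentages are fractions of those 99 identified spots.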

Future directions
The post-genomic analysis of P. aeruginosa remains a formidable task. The resolution and detection of proteins by 2-D gel electrophoresis followed by MS are improving. In general, a silver-stained 2-D gel spot containing ∼500 fmol of protein can be mass mapped (∼0.5-1 mg total protein in the gel). In addition to 2-D gel electrophoresis, other methods of separating proteins have been developed. Isotope-coded affinity tagging (ICAT) is a method for specifically labeling and comparing proteins between samples [26]. A "heavy" (deuterium-containing) ICAT reagent with a biotin tag is used to label all cysteines in one sample; in another sample, a "light" (hydrogen-containing) ICAT reagent is used. The samples are combined, fractionated, and proteolyzed. The ICAT-labeled peptides are isolated, quantified, and identified by MS/MS, allowing protein levels to be compared between samples. The ICAT technique has been used to measure differences in protein expression in Saccharomyces cerevisiae grown under varying conditions [26]. Another method, multidimensional protein identification technology (MudPIT), couples 2-D LC separation of peptides from a digested protein mixture with MS/MS; it has been used to detect and identify over 1400 proteins from S. cerevisiae [27]. With further standardization and optimization, both ICAT and MudPIT should be applicable to proteomic investigations of P. aeruginosa. Ongoing proteomic studies of P. aeruginosa include fractionation of PAO1 to identify proteins secreted by the bacteria and those located in the outer and inner membranes, periplasmic space, and cytoplasm under standard laboratory conditions. This should allow the components of PAO1 to be assigned to cellular compartments. In addition, identifying surface proteins may provide new candidates for vaccine development. Subsequent studies include detection of PAO1 proteins during different growth phases, at different temperatures, and in different media, all conditions shown to alter protein expression.
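The ICAT comparison can be sketched as finding light/heavy peak pairs separated by a multiple of the per-cysteine label mass difference and taking their intensity ratio. This assumes the original d0/d8 reagents (eight deuterium atoms, roughly 8.05 Da per labeled cysteine; later reagent versions differ), peaks already deconvoluted to singly charged masses, and a hypothetical 0.02 Da tolerance. Which sample the heavy/light ratio refers to depends on which sample received which reagent.

```python
"""Pair light/heavy ICAT-labelled peptide peaks and compute abundance ratios (sketch)."""

# Approximate mass shift per labelled cysteine, d8 (heavy) vs d0 (light) reagent:
# 8 x (mass of 2H - mass of 1H) = 8 x 1.006277 Da.
DELTA_PER_CYS = 8 * 1.006277


def icat_pairs(peaks, max_cys=3, tol_da=0.02):
    """peaks: (singly charged mass, intensity) tuples.

    Returns (light mass, number of cysteines, heavy/light intensity ratio) per pair.
    Each peak is used in at most one pair; the lower-mass member is treated as light.
    """
    peaks = sorted(peaks)
    used = set()
    pairs = []
    for i, (m_light, int_light) in enumerate(peaks):
        if i in used:
            continue
        for n_cys in range(1, max_cys + 1):
            target = m_light + n_cys * DELTA_PER_CYS
            j = next((k for k, (m, _) in enumerate(peaks)
                      if k != i and k not in used and abs(m - target) <= tol_da), None)
            if j is not None:
                used.update((i, j))
                pairs.append((m_light, n_cys, peaks[j][1] / int_light))
                break
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list: two labelled pairs plus one unpaired peak.
    peaks = [
        (1000.0, 50.0), (1000.0 + DELTA_PER_CYS, 100.0),
        (1500.0, 80.0), (1500.0 + 2 * DELTA_PER_CYS, 40.0),
        (1200.0, 10.0),
    ]
    for light, n_cys, ratio in icat_pairs(peaks):
        print(f"light {light:.2f}: {n_cys} Cys, heavy/light = {ratio:.2f}")
```

Because the spacing scales with the number of cysteines, the pairing also indicates how many cysteines a peptide contains, which narrows the database search.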
Conditions that mimic the biofilm mode of growth will be compared with growth in liquid media. Hypothetical proteins expressed under specific conditions may have critical roles in these circumstances, and defining their expression patterns may help suggest their functions. Protein expression in various PAO1 regulatory mutants will be compared with that in PAO1 to determine how proteins are controlled. Again, any data obtained on the regulation of hypothetical proteins will provide critical information about their function.
Proteins expressed by various P. aeruginosa clinical isolates will be compared with those of PAO1. Any differences between P. aeruginosa CF isolates and strains from other types of infection may suggest the importance of particular proteins in a specific infectious process and indicate potential drug targets. In addition, P. aeruginosa proteins expressed during in vitro and in vivo infections can be followed. To further identify proteins important during infection, 2-D gel electrophoresis followed by Western immunoblotting with patient sera will be used to detect which proteins are "seen" by the immune system. Antigens from the bacterium Helicobacter pylori have been identified in this manner, and it has been suggested that these may be useful for serological detection and monitoring of infection [28]. The release of the P. aeruginosa genome, coupled with microarray gene chip technology, will help to define the transcriptional organization of this important pathogen. While this method might seem to make the proteomic characterization of P. aeruginosa superfluous, post-translational modification and protein processing cannot be determined by transcriptome analysis. In fact, the correlation between mRNA levels and protein levels is often quite poor. Since protein localization and the environmental conditions regulating protein production are critical to understanding elaborate biological pathways and designing new drugs, the importance of proteomic analysis should not be underestimated. Similarly, defining patterns of protein expression should help to determine the functions of unknown gene products. Likewise, investigating protein-protein interactions using techniques such as yeast two-hybrid analysis will provide a means of integrating proteomics with the study of protein complex formation [29]. Such genome-wide approaches will be essential for defining the functions of unrecognized protein products.
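Statements that mRNA and protein levels correlate poorly are usually quantified by correlating paired measurements for the same genes. A minimal sketch with invented values: Pearson's r on log-transformed abundances and Spearman's rank correlation (tied values receive average ranks). Neither the data nor the choice of statistics comes from the studies cited here.

```python
"""Correlate paired mRNA and protein abundances (illustrative sketch)."""
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical paired measurements for five genes (arbitrary units).
    mrna = [120.0, 15.0, 300.0, 42.0, 8.0]
    protein = [900.0, 60.0, 350.0, 20.0, 45.0]
    log_m = [math.log10(v) for v in mrna]
    log_p = [math.log10(v) for v in protein]
    print(f"Pearson r (log10): {pearson(log_m, log_p):.2f}")
    print(f"Spearman rho: {spearman(mrna, protein):.2f}")
```

Log transformation keeps a few very abundant species from dominating Pearson's r; the rank-based Spearman coefficient is insensitive to that choice.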

Conclusions
Proteomic analysis of P. aeruginosa allows comparison of the proteins expressed by different bacterial strains or under varying growth or stress conditions. This examination should uncover proteins whose expression is critical in specific circumstances. This in turn will allow patterns of protein expression to be identified that should help define the functions of many hypothetical and conserved hypothetical proteins and focus attention on those relevant to infection. This combination of genomic, proteomic, and informatic technologies applied to P. aeruginosa should rapidly advance the field, bringing rational and unique targets for drug design within closer reach.