Helicobacter pylori and its genome: Lessons from the treasure map

Since Marshall and Warren (1) first described and cultured Helicobacter pylori and showed its association with gastritis and peptic ulceration in the early 1980s, interest in this microorganism has continued to grow. Its more recent association with the development of gastric cancer and mucosa-associated lymphoid tissue lymphoma in patients (2) has served only to increase the interest and the publications in this area. Studies on the molecular genetics of H pylori have increased exponentially during the past several years, with the publication of a number of exciting findings that relate to the pathogenicity of the organism and its ability to develop resistance to antibiotics. The genetics of H pylori, as well as of the related genus Campylobacter, was reviewed in 1992 (3), and readers are referred to this source for earlier work. The present review concentrates on recent findings on molecular genetic aspects of H pylori, with particular emphasis on data from the H pylori 26695 genome sequencing project (4).

THE COMPLETE GENOME SEQUENCE OF H PYLORI 26695
In August 1997, the complete genome sequence of H pylori strain 26695 was published by The Institute for Genomic Research (4). The sequence is available on the Institute's website at http://www.tgr.org/tdb/mdb/hpdp/hpdb.html, and it has been deposited in GenBank with accession number AE000511. Thus, it is possible to search for a gene of interest and to compare the sequence of a gene from another H pylori strain with that from H pylori 26695.
The random sequencing of H pylori 26695 DNA confirmed much of what had already been ascertained painstakingly during the previous 12 years by molecular geneticists working with other H pylori strains. A few examples are given below, and for more detail readers are referred to Tomb et al (4). H pylori 26695 has a genome size of 1,667,867 base pairs (bp) (1.67 megabases), consistent with the size range of 1.6 to 1.73 megabases determined for various H pylori genomes by pulsed field gel electrophoresis (PFGE) (5). The genome of H pylori 26695 is circular, and the 23S-5S and 16S rRNA genes are located separately, as had been determined previously for other H pylori strains (5,6). However, in H pylori 26695, an additional copy of the gene encoding 5S rRNA was identified. The urease gene cluster, previously characterized extensively by Cussac et al (7) and De Reuse et al (8), was identified. The flagella components specified by flaA and flaB (9) were located, and, in addition, Tomb et al (4) calculated that at least 40 different genes appear to encode proteins involved in the regulation, secretion and assembly of the flagella.
In total, 1590 predicted coding sequences were identified, 1091 of which were identified either unequivocally because of previous studies done on H pylori, such as those mentioned above, or from matches with sequences in various databases (4). The low level of homology in some cases may mean that the function of some genes may ultimately need to be reinterpreted. For 499 predicted coding sequences, no function was determined (4). The availability of the sequence of H pylori 26695 and the identification of many new putative genes should enable researchers to rigorously test functions of particular proteins and ultimately should provide new insights into the lifestyle and virulence mechanisms of H pylori. Throughout the present review, information concerning the complete genome sequence of H pylori 26695 will be discussed.
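The counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures reported in the text; the function name and dictionary keys are illustrative, and the last quantity is simply genome length divided by the number of predicted coding sequences, not a measured average gene length.

```python
# Figures reported in the text for H pylori 26695 (reference 4).
GENOME_BP = 1_667_867       # genome size in base pairs
TOTAL_CDS = 1590            # predicted coding sequences
CDS_WITH_FUNCTION = 1091    # identified from earlier work or database matches
CDS_NO_FUNCTION = 499       # no function determined


def annotation_summary():
    """Return simple ratios derived from the reported counts."""
    # The two categories should account for every predicted coding sequence.
    assert CDS_WITH_FUNCTION + CDS_NO_FUNCTION == TOTAL_CDS
    return {
        "fraction_assigned": CDS_WITH_FUNCTION / TOTAL_CDS,
        "fraction_unassigned": CDS_NO_FUNCTION / TOTAL_CDS,
        # Genome length per predicted coding sequence (a density, not a gene length).
        "bp_per_cds": GENOME_BP / TOTAL_CDS,
    }


if __name__ == "__main__":
    for name, value in annotation_summary().items():
        print(f"{name}: {value:.3f}")
```

On these figures, roughly two-thirds of the predicted coding sequences have an assigned function, and there is about one predicted coding sequence per kilobase of genome.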

THE H PYLORI PATHOGENICITY ISLAND
The cagA pathogenicity island (PAI) was first identified in H pylori NCTC 11638 by Censini et al (10) as a 40 kilobase DNA insertion in the chromosomal glutamate racemase gene (glr). Direct repeats of 31 bp flank the PAI, and, in some strains, the PAI is split into a left and a right segment by the insertion sequence IS605. Tomb et al (4) confirmed the presence of a PAI in H pylori 26695 as well. These PAIs contain cagA, which is frequently used as a marker for virulence because it is more often found in strains from patients with ulcers and gastric cancer than from those with mild gastritis or from asymptomatic individuals (11). It appears likely that the presence of the PAI in H pylori predisposes infected individuals to develop ulcers or even ultimately gastric cancer, whereas people who are infected by H pylori strains lacking the PAI may be affected by mild gastritis and may show no symptoms. Still, additional work is required to confirm the influence of the PAI because other factors, such as the vacuolating cytotoxin (12), may play a role in virulence. The cytotoxin is encoded by vacA, which is genetically unlinked to the PAI (4,6).
The genes and some of the deduced proteins encoded on the PAI are shown in Figure 1. Transposon inactivation of several of the genes within the PAI, although not of cagA, abolishes the induction of cytokine interleukin-8 expression in gastric epithelial cell lines (10). Sequence similarity searches of databases for proteins related to those in the H pylori PAI have demonstrated that a significant number of proteins are similar to those involved in toxin secretion in Bordetella pertussis as well as to those involved in transport of plasmid DNA, including transport between bacterial cells of the plant pathogen Agrobacterium tumefaciens and plant cells (4,10). Previous workers noted that proteins similar to those encoded by the H pylori PAI are involved in the generation of a particular secretion system, termed a 'type IV secretion system' (13,14). The proteins direct their own passage across the bacterial outer membrane by forming a pore through which they pass.
Like H pylori, a number of other microbial pathogens contain virulence genes in large contiguous blocks of DNA found as chromosomal inserts or PAIs (15). The PAI frequently has 31 bp direct repeat sequences at either end, suggesting that it may be gained or lost by site-specific or generalized recombination involving the terminal repeats (16). The cag PAI lies between two protein-coding genes, glr and an open reading frame with no known function called HP0519 by Tomb et al (4). Strains that contain no PAI contain a single empty site that may have been generated by excision of the PAI or may have existed before the PAI insertion (16). Besides H pylori, the following bacteria are believed to contain one or more PAIs: uropathogenic and enteropathogenic Escherichia coli, Salmonella typhimurium, Yersinia pestis, Clostridium difficile and Listeria monocytogenes (14). Of these bacteria, only H pylori appears to have a PAI that encodes a type IV secretion system.
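The proposed loss of the PAI by recombination between its terminal direct repeats can be illustrated with a toy model. The sketch below is an assumption-laden simplification: it treats the chromosome as a plain string, uses a short made-up repeat rather than the real 31 bp sequence, and models excision as leaving a single copy of the repeat at the empty site. The function name is hypothetical.

```python
def excise_between_repeats(chromosome, repeat):
    """Model excision of an island flanked by two copies of a direct repeat.

    Returns (empty_site_chromosome, island), or None if the chromosome does
    not carry two copies of the repeat. Recombination between the two copies
    is modelled as leaving exactly one copy behind.
    """
    first = chromosome.find(repeat)
    if first == -1:
        return None
    # Look for a second, non-overlapping copy downstream of the first.
    second = chromosome.find(repeat, first + len(repeat))
    if second == -1:
        return None
    island = chromosome[first + len(repeat):second]
    # Keep everything before the first copy, then resume at the second copy.
    empty_site = chromosome[:first] + chromosome[second:]
    return empty_site, island


if __name__ == "__main__":
    toy_repeat = "ACGTTGCA"  # illustrative only, not the real 31 bp repeat
    toy = "GGGG" + toy_repeat + "TTTTTTAAAA" + toy_repeat + "CCCC"
    print(excise_between_repeats(toy, toy_repeat))
```

Applying the function a second time to the resulting empty-site sequence returns None, because only one copy of the repeat remains.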

DIVERSITY OR CONSERVATION IN H PYLORI GENOMES?
Early studies of H pylori indicated that different H pylori strains were genetically diverse (17). The various methods used have been reviewed by Jiang et al (6) and include conventional gel electrophoresis of H pylori DNA using frequent cutting enzymes, and analysis using NotI and NruI. Polymerase chain reaction (PCR) technology, either alone or followed by Southern hybridization with various gene probes, also demonstrated diversity. The latter methods are not dependent on restriction enzyme site polymorphisms. In addition to these studies, macrorestriction maps of five unrelated strains of H pylori revealed almost complete heterogeneity in their physical-genetic maps (6).
The genetic maps of H pylori 26695 and four other strains of H pylori that were mapped using PFGE by Jiang et al (6) are compared in Figure 2. The map of NCTC 11637 has been reversed compared with a previous map (6) so that it appears more closely related to 26695 and UA861. However, maps of the two other strains appear to be quite distinct. There are limitations to the method of constructing maps by PFGE because the order of genes within a single restriction fragment usually cannot be determined. The order of known genes can be determined by cosmid mapping, as was done for NCTC 11638 (18), but H pylori DNA in E coli cosmids is sometimes unstable and may undergo rearrangement (19). Therefore, previously constructed H pylori genome maps are low resolution maps compared with maps constructed using complete random genome sequencing, which was used to construct the map of H pylori 26695 (4).
The genome of a second clinical isolate of H pylori (J99) has been sequenced in large part by the Genome Therapeutics Corporation (licensed to Astra Research Center Boston, Massachusetts). Although the NotI restriction patterns of J99 and H pylori 26695 are quite different (unpublished data), analysis of outer membrane proteins and genes that encode a family of 32 porins and adhesins suggests that J99 and 26695 proteins are very similar, and the genes encoding them are relatively conserved in location on their respective chromosomes (20). This has led to the suggestion that the H pylori genome is not as plastic as earlier studies suggested (20).
Other work also suggests that the order of specific genes may be conserved among many strains. For example, the order of genes in the PAI appears to be fairly well conserved (4,10,16). In addition, genes that make up a novel stress-responsive operon (SRO) in H pylori were linked in 14 of 15 clinical isolates examined by PCR (unpublished data). These genes include ftsH, the protein for which is involved in cell division and possesses an ATPase domain associated with diverse cellular activities (21), as well as genes involved in copper transport out of bacterial cells. The latter have been designated copA, which encodes a P-type ATPase (24), and copP, which encodes a small protein that is also associated with copper transport (25). These genes are arranged in the order ftsH-pss-copA-copP, where pss encodes phosphatidylserine synthase (24). The pss gene is believed to be required for motility and chemotaxis, and a mutant with pss disrupted by a chloramphenicol acetyltransferase cassette (CAT) could not be constructed (24), suggesting that phosphatidylserine synthase plays an essential role in the growth of H pylori.
Work by Beier et al (25) in 1997 identified the four genes in the order ftsH-pss-copA-copP as part of an SRO that contains additional upstream genes cheY, which encodes the flagella motor protein CheY, and hsm, believed to code for a heat shock protein with methyltransferase activity.
Beier et al proposed that the genes in the order cheY-hsm-ftsH-pss-copA-copP are coordinately expressed, and showed that transcription can be regulated by addition of copper to the medium and by raising the temperature from 37°C to 45°C (25). The coordinate expression of genes within the H pylori SRO locus may be a response to stress conditions encountered by this pathogen during the process of bacterial infection, thus providing an advantage in establishing a successful infection. Therefore, there may be enormous selective pressure on some blocks of genes to remain together during the course of bacterial evolution. Although variability may be seen in restriction endonuclease sites and in the location of certain genes, which gives an overall impression of genome diversity, selective pressure for the conservation of certain genes and operons within the H pylori genome is likely a major factor in reducing genome diversity. Tomb et al (4) reported the presence of 13 copies of insertion element IS605, five full length and eight partial copies, within the H pylori 26695 sequence. A second insertion element, IS606, was present in four copies, two full length and two partial copies. These insertion elements may be responsible for transposition of certain portions of the genome and may contribute to genome variability.
In addition, polymeric tracts of cytosine or guanine residues, or other simple nucleotide repeats, are present in a number of H pylori genes (4) (discussed below). Slipped-strand mispairing within these polymeric tracts of DNA may result in genotypic variation and be responsible for the genetic variability reported among H pylori strains.
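To make the idea of slipped-strand mispairing concrete, the following sketch scans a DNA string for runs of cytosine or guanine and shows how gaining or losing bases in such a run changes the downstream reading frame. It is a toy illustration only: the minimum tract length of 8 is an arbitrary assumption, the function names are hypothetical, and the sequence in the usage example is invented rather than taken from any H pylori gene.

```python
import re


def homopolymer_tracts(seq, bases="CG", min_len=8):
    """Return (start, base, length) for each run of one base from `bases`
    that is at least `min_len` long. Positions are 0-based."""
    # Builds a pattern such as "C{8,}|G{8,}".
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.group()[0], len(m.group()))
            for m in pattern.finditer(seq.upper())]


def slip(seq, start, length, delta):
    """Return seq with the tract at `start` lengthened (delta > 0) or
    shortened (delta < 0) by |delta| bases, as a model of slippage."""
    new_len = length + delta
    if new_len < 0:
        raise ValueError("tract cannot have negative length")
    return seq[:start] + seq[start] * new_len + seq[start + length:]


def frame_shift(delta):
    """Downstream reading-frame offset (0 means the frame is preserved)."""
    return delta % 3


if __name__ == "__main__":
    toy = "ATG" + "C" * 9 + "GATTACA" + "G" * 5 + "TT" + "G" * 10
    for start, base, length in homopolymer_tracts(toy):
        print(f"{base} tract of {length} at position {start}")
    print("frame offset after +1 slippage:", frame_shift(1))
```

In this model, only changes that are a multiple of three bases keep the downstream sequence in frame, which is the basis for the on/off switching discussed later in the review.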

GENES ENCODING PROTEINS INVOLVED IN MOLECULAR MIMICRY
Tomb et al (4) identified many genes that are likely to be involved in the synthesis of lipopolysaccharide in H pylori. H pylori lipopolysaccharide is highly unusual in that it contains Lewis X (Le(x)) and Lewis Y (Le(y)) epitopes, which are mono- and difucosylated glycoconjugates (26-31).
Most H pylori strains express either Le(x) or Le(y), or both, and levels of Le(x) and Le(y) in a single strain can vary depending on growth conditions (unpublished data). Both Le(x) and Le(y) are found in gastric epithelial tissue (32), indicating molecular mimicry between the bacterial surface and the gastric epithelium, which is the specific niche occupied by H pylori. Such mimicry may contribute to the considerable persistence of H pylori infection (2). Antibodies against H pylori have been demonstrated to cross-react with human gastric mucosa cells (33). Such an autoimmune reaction may induce or exacerbate H pylori gastritis and may be an important virulence mechanism.
Chan et al (34) demonstrated that the enzymes required for Le(x) biosynthesis are beta-1,4-galactosyltransferase and alpha-1,3-fucosyltransferase. The biosynthetic pathway is identical to that found in humans. Subsequently, Ge et al (35) cloned a complete alpha-1,3-fucosyltransferase gene (HpfucT) and expressed alpha-1,3-fucosyltransferase activity in E coli. The deduced amino acid sequence of H pylori alpha-1,3-fucosyltransferase (HpFucT) consisted of 478 residues with a calculated molecular mass of 56,194 Da and is approximately 100 amino acids longer than known mammalian alpha-1,3/1,4-fucosyltransferases. When expressed in E coli, a 52 kDa protein encoded by HpfucT gives rise to alpha-1,3-fucosyltransferase activity but not alpha-1,2-fucosyltransferase activity (required for Le(y) synthesis), as characterized by radiochemical assays and capillary zone electrophoresis. An approximately 72 amino acid long region of HpFucT exhibited significant sequence identity (40% to 45%) with the highly conserved C-terminal catalytic domain among known mammalian alpha-1,3-fucosyltransferases. Several structural features unique to HpFucT were observed, including 10 direct repeats of seven amino acids and the lack of a transmembrane segment typical for known eukaryotic alpha-1,3-fucosyltransferases. In addition, the repeat region contained a leucine zipper motif, responsible for dimerization of various basic region-leucine zipper proteins, suggesting that the HpFucT protein may form dimers. A leucine zipper motif is fairly uncommon in prokaryotic proteins and is not present in mammalian alpha-1,3-fucosyltransferase proteins.
Construction of mutants with inserts of CAT in cloned HpfucT followed by natural transformation back into H pylori and selection for chloramphenicol resistance did not result in any notable change in the alpha-1,3-fucosyltransferase activity in H pylori (unpublished data). There are two copies of HpfucT in various H pylori strains (Figure 2). In addition, the genome sequence of H pylori 26695 contains two nonadjacent copies of HpfucT (Figure 2), which are designated HP0379 and HP0651. The presence of a second copy likely explains why a knockout in a single copy of HpfucT did not inhibit alpha-1,3-fucosyltransferase activity in H pylori. Polymeric tracts of cytosine residues are present near the 5′ end of all HpfucT genes that have been sequenced (4,35, unpublished data). This DNA motif has been postulated to cause on/off phase variation by DNA slippage (4,36,37). Additional work is required to demonstrate unequivocally the relationship between these cytosine tracts and Le(x) phase variation in H pylori. It is also important to determine whether knockouts in both HpfucT genes can be constructed or whether they are lethal to H pylori.
Synthesis of the Le(y) structure by H pylori requires alpha-1,2-fucosyltransferase activity. Chan et al (34) were not able to detect such activity in H pylori cells that express Le(y). No gene for alpha-1,2-fucosyltransferase was identified in H pylori 26695 (4); however, it has been pointed out that insertion of a cytosine-guanine pair in HP0094 would yield a full length protein with homology to alpha-1,2-fucosyltransferase (36). Additional work is required with alpha-1,2-fucosyltransferase to clone the complete gene, and to optimize conditions for the assay of the protein's expression both in H pylori and in E coli clones.

DNA RESTRICTION AND MODIFICATION GENES
The H pylori genome is unusual in that it contains homologues of more than 20 genes associated with DNA restriction and modification systems (4). The role of these enzymes is unclear. They may be responsible for breakdown of intracellular foreign DNA, although there is limited evidence that such DNA is taken up into H pylori cells. The enzymes may act outside the cells and degrade foreign (eg, human) DNA. Modification of DNA likely occurs frequently in H pylori, leading to restriction site polymorphisms, but the reasons for this phenomenon are not clear. H pylori is naturally transformable with DNA from other H pylori cells (38,39), and some of the enzymes may be required for DNA fragmentation and stimulation of the formation of recombinants.

GENE TRANSFER IN H PYLORI
The most efficient method of transferring genes appears to be natural transformation (38,39). The majority of H pylori strains tested can be transformed using H pylori DNA, and this is an extremely convenient way to construct knockout mutants in a gene of interest. An antibiotic resistance cassette (such as CAT) is cloned into a specific gene, and the resulting recombinants, which must undergo homologous recombination between the incoming DNA and the resident chromosome, are selected on antibiotic-containing media. These recombinants are generally checked by PCR analysis to determine that the cassette is in the correct position within the gene of interest (42). Such recombinants appear to be extremely stable. If a recombinant cannot be obtained, then the gene product may be absolutely required for growth of H pylori cells. Genes such as ftsH and pss are absolutely required for growth because knockout mutations in these genes cannot be constructed (21,24).
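The PCR check described above amounts to comparing an observed product size with the sizes expected for the intact and the disrupted gene. The sketch below is a simplified illustration under stated assumptions: the primers flank the insertion site, the cassette inserts without deleting any target sequence, and the sizes and tolerance used are hypothetical values, not figures from the studies cited.

```python
def classify_amplicon(observed_bp, wild_type_bp, cassette_bp, tolerance_bp=50):
    """Classify a PCR product by size.

    observed_bp:  size of the band read from the gel
    wild_type_bp: product size expected from the undisrupted gene
    cassette_bp:  length of the inserted resistance cassette
    tolerance_bp: allowed sizing error (an arbitrary, hypothetical default)
    """
    # A simple insertion lengthens the product by the cassette length.
    expected_mutant_bp = wild_type_bp + cassette_bp
    if abs(observed_bp - expected_mutant_bp) <= tolerance_bp:
        return "insertion"
    if abs(observed_bp - wild_type_bp) <= tolerance_bp:
        return "wild type"
    return "unexpected"


if __name__ == "__main__":
    # Hypothetical sizes chosen only to exercise the three outcomes.
    for band in (1800, 1010, 1400):
        print(band, classify_amplicon(band, wild_type_bp=1000, cassette_bp=800))
```

A size match alone does not establish the orientation or exact position of the cassette; in practice that would require additional primers or sequencing.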
Natural transformation of H pylori DNA has not been demonstrated in vivo, and whether it may occur in a clinical setting can only be inferred from laboratory studies. Bacteriophages for H pylori appear to be relatively rare (3) and seem to play little part in H pylori gene transfer.
Plasmid isolation studies were also reviewed (3). No antibiotic resistance plasmids have been identified, and no conjugation, plasmid-mediated or otherwise, has been demonstrated. The PAI contains genes related to mating pair formation and for transfer of DNA between A tumefaciens (the crown gall-producing bacterium) and plant cells (4,10,13). However, there is no evidence that the PAI participates in transfer of genomic DNA from one H pylori cell to another.

CONCLUSIONS AND PERSPECTIVES
With the sequence of H pylori strain 26695 available (4) and that of a second strain, J99, expected shortly (20), molecular biologists who study H pylori have the opportunity to undertake incisive and informative analyses of the relatively small H pylori genome.
The availability of the H pylori genome sequences enables more rapid analysis of a gene of interest if the gene can be identified within the database. The genome sequence is useful as a reference with which other H pylori gene sequences, and predicted or isolated proteins, can be compared. Many questions remain about the lifestyle of this important pathogen. Why was no origin of replication (4) identified in H pylori 26695? How is H pylori transmitted within humans? How does the bacterium persist throughout the lifetime of an individual and avoid host immune defenses? Why does the bacterium cause few symptoms in some individuals but cause ulcers and, more rarely, gastric cancer in others? How have the interactions between H pylori and the human host evolved over time? Molecular genetics will help supply answers to these and other questions.
The recognition that the genomes of H pylori contain conserved elements may help in the development of new drug therapies and vaccines for the treatment and complete eradication of this pathogen. Molecular methods, such as the use of DNA transformation and genetic recombination, along with the available DNA sequence data, can now be exploited to make the next decade an extremely productive and exciting one for H pylori research.
Can J Gastroenterol Vol 13 No 3 April 1999 Taylor

Figure 1) Map of the pathogenicity island of Helicobacter pylori strain 26695

Figure 2) Maps of several unrelated Helicobacter pylori strains including 26695. The other strains were compared previously by Jiang et al (6). The maps have been displayed to maximize the similarities among gene locations, particularly among kat (catalase gene), vacA (vacuolating cytotoxin gene), hpaA (putative adhesin gene) and pfr (bacterial ferritin). See reference 6 for relevant references and complete gene designations. The gene encoding the 26 kDa protein (6) is now referred to as tsaA (which has homology with alkyl hydroperoxide reductase). The fucT gene refers to the alpha-1,3-fucosyltransferase gene. Genomic map of H pylori 26695 adapted from reference 4. Genomic maps of all other strains adapted from reference 6.