Distribution and phylogenies of enzymes of the Embden-Meyerhof-Parnas pathway from archaea and hyperthermophilic bacteria support a gluconeogenic origin of metabolism

Enzymes of the gluconeogenic/glycolytic pathway (the Embden-Meyerhof-Parnas (EMP) pathway), the reductive tricarboxylic acid cycle, the reductive pentose phosphate cycle and the Entner-Doudoroff pathway are widely distributed and are often considered to be central to the origins of metabolism. In particular, several enzymes of the lower portion of the EMP pathway (the so-called trunk pathway), including triosephosphate isomerase (TPI; EC 5.3.1.1), glyceraldehyde-3-phosphate dehydrogenase (GAPDH; EC 1.2.1.12/13), phosphoglycerate kinase (PGK; EC 2.7.2.3) and enolase (EC 4.2.1.11), are extremely well conserved and universally distributed among the three domains of life. In this paper, the distribution of enzymes of gluconeogenesis/glycolysis in hyperthermophiles--microorganisms that many believe represent the least evolved organisms on the planet--is reviewed. In addition, the phylogenies of the trunk pathway enzymes (TPIs, GAPDHs, PGKs and enolases) are examined. The enzymes catalyzing each of the six-carbon transformations in the upper portion of the EMP pathway, with the possible exception of aldolase, are all derived from multiple gene sequence families. In contrast, single sequence families can account for the archaeal and hyperthermophilic bacterial enzyme activities of the lower portion of the EMP pathway. The universal distribution of the trunk pathway enzymes, in combination with their phylogenies, supports the notion that the EMP pathway evolved in the direction of gluconeogenesis, i.e., from the bottom up.


Introduction
Modern day reconstruction of the ancient chemical and biological scenarios that led to the origins of life on Earth approximately 3.8 billion years ago is an inherently difficult task and is intimately tied to the origins of metabolism. There are three general theories or hypotheses that attempt to delineate the favorable conditions in which life is presumed to have arisen: (1) the Oparin-Haldane model or prebiotic soup hypothesis (expounded by Stanley Miller and colleagues); (2) the pyruvate-pulled chemoautotrophic origin of metabolism on the surface of metal sulfide minerals (Wächtershäuser 1988a, 1988b); and (3) the theory of extraterrestrial seeding of Earth, during the early formative years, with organic compounds (reviewed by Orgel 1998 and Maden 1995). The Wächtershäuser theory envisages a hyperthermophilic and autotrophic origin of life, whereas the Oparin-Haldane and extraterrestrial seeding theories presuppose a heterotrophic lifestyle (Wächtershäuser 1988a, 1988b, 1994). In addition, the notion of an "RNA world," in which RNA-based life preceded DNA-based life, has gained wide acceptance, although it has detractors. However, even a putative RNA-based "organism" could have arisen only from a prebiotic chemical environment conducive to its existence (Poole et al. 1999).
Although these theories and their associated predictions have done much to provide explanations for the potential array of prebiotic chemistries, it is uncertain which one of the three best explains the origin of life. However, there are clues that can be gleaned from the distribution of key enzymes of metabolism in present-day organisms and from phylogenetic analysis of universally distributed enzymes. A similar approach has been used with DNA-dependent RNA polymerases (Klenk et al. 1994), translation factors EF-Tu and EF-G, ATPase subunits, tRNA (tRNA-Met-E and tRNA Met-I; Iwabe et al. 1989), aminoacyl-tRNA synthetases (Brown and Doolittle 1995), RNA polymerases (Iwabe et al. 1991) and heat-shock proteins (Gupta 1998).
Universal phylogenies based on small subunit ribosomal sequence comparisons have been cited as evidence that extremely thermophilic (optimal growth temperatures above 70 °C) and hyperthermophilic microorganisms (optimal growth temperatures above 80 °C) are the least evolved of present-day microorganisms, and therefore most closely resemble the last common universal ancestor (Woese et al. 1990). Thus, hyperthermophiles represent the most appropriate microorganisms with which to investigate facets of the early evolution of life and metabolism. In accordance with the "thermophiles-first" hypothesis, deep sea hydrothermal vent ecosystems have been suggested to represent the most ancient, continuously inhabited microbial habitats on Earth (Reysenbach and Shock 2002). In addition, geological evidence supports the view that hot aquatic habitats existed throughout the early history of Earth, regardless of the temperature of the ocean (circa 4.3 Ga ago) or of the atmosphere (Baross 1998). It should be noted, however, that the notion that the first microorganisms were hyperthermophiles is not universally accepted. For example, others have argued that hyperthermophiles may have evolved from mesophilic ancestors, based on comparison of the enzyme reverse gyrase, which is found in all organisms with optimum temperatures over 70 °C (Forterre 1996). Furthermore, a hyperthermophilic origin of life is also contrary to the RNA world hypothesis because of the relative instability of RNA at high temperatures (Poole et al. 1999).
To acquire a clearer understanding of the origins of gluconeogenesis/glycolysis, it is informative to examine the distribution and phylogenetic origins of the enzymes of gluconeogenesis/glycolysis and their associated biochemical characteristics. This, in turn, could help provide supporting evidence for one or more of the competing evolutionary theories of metabolic development mentioned above. The central metabolic pathway of glycolysis is present in each of the three domains, although there is significant variation in enzymes, especially in the Archaea and in hyperthermophilic bacteria (Fothergill-Gilmore and Michels 1993, Schönheit and Schäfer 1995, Selig et al. 1997, Ronimus and Morgan 2001).
Several reviews of the distribution of glycolytic enzymes were published in 1999 or earlier (Fothergill-Gilmore and Michels 1993, Danson et al. 1998, Galperin et al. 1998a, Cordwell 1999, Dandekar et al. 1999, Koike et al. 1999), but the rapid increase in genome sequencing, which has resulted in 10 new published archaeal genomes, means that sufficient progress has been made to warrant the new summary presented in this paper. Furthermore, the most recent of these earlier reviews (1998-1999) examined only the distribution of enzymes, not the phylogenetic relationships between enzymes of representative organisms of the three domains. In this paper, we summarize the current state of knowledge regarding (1) enzymes of glycolysis/gluconeogenesis among hyperthermophilic microorganisms, both bacterial and archaeal, and (2) fully sequenced mesophilic and thermophilic archaeal genomes. In addition, we examine the evolution of the universally distributed enzymes of the lower trunk pathway of glycolysis. Our results provide strong support for a gluconeogenic origin for the Embden-Meyerhof-Parnas (EMP) pathway.

Data analysis and phylogenetic tree construction
Representative amino acid sequences from the glycolytic enzyme gene families were used in BLASTP, Gapped-BLAST and PSI-BLAST searches of the non-redundant database (Altschul et al. 1990, 1997) and retrieved for analysis. In addition, we used the Comprehensive Microbial Resource with the gene matrix option at The Institute for Genomic Research (TIGR: http://www.tigr.org/tdb/mdb/mdbcomplete.html), as well as the PEDANT database (http://pedant.gsf.de), the COG database (Tatusov et al. 2001), the BRENDA enzyme database (BRENDA: http://www.brenda.uni-koeln.de), the Conserved Domain Architecture Retrieval Tool (CDART at NCBI) and the KEGG enzymatic pathway database (http://www.genome.ad.jp/kegg/metabolism.html). Sequence alignments and phylogenetic trees were produced with full-length sequences as reported in the databases using the neighbor-joining method of ClustalW 1.7 and ClustalX 1.81, unless otherwise stated (Saitou and Nei 1987, Thompson et al. 1994; for brevity, the alignments are not shown). The alignments were examined for accuracy by comparison with secondary structure in the sequences derived from crystal structures in the Protein Data Bank (PDB).
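Neighbor-joining operates on a matrix of pairwise distances computed from the alignment. The following is a minimal sketch of that distance step only, assuming uncorrected p-distances (the fraction of differing residues over ungapped columns); it does not reproduce the distance calculation or options of ClustalW itself. The function names and the toy alignment are hypothetical and are used purely for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Symmetric dictionary of pairwise p-distances keyed by (name, name)."""
    names = sorted(alignment)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy alignment (not real enzyme sequences).
toy_alignment = {
    "seq_A": "MKT-LLVAG",
    "seq_B": "MKTALLVSG",
    "seq_C": "MRT-LIVSG",
}
matrix = distance_matrix(toy_alignment)
```

A tree-building method such as neighbor-joining would then repeatedly join the pair of taxa that minimizes its joining criterion on a matrix of this kind; that clustering step is omitted here.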
For maximum parsimony analyses, the alignments were first formatted for PAUP* (Phylogenetic Analysis Using Parsimony (*and other methods)) using the WebPhylip Clustal converter (Lim and Zhang 1999). Maximum parsimony analyses were conducted with PAUP* 4.0 Beta 10 Version by performing heuristic searches with random stepwise-addition (10 addition-sequence replicates) with the branch swapping algorithm tree-bisection-reconnection (Swofford et al. 1996, Swofford 2002). When more than one maximum parsimony tree was saved, using the strict guidelines option, a consensus tree was generated and statistically validated with 500 bootstraps. Statistical support for the tree branch nodes and lengths in all phylogenies was assessed using 500 bootstraps, and values were rounded to % values. The triosephosphate isomerase (TPI), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), phosphoglycerate kinase (PGK) and enolase analyses utilized 282, 466, 415 and 464 phylogenetically informative amino acid residues, respectively. The number of sequences available in the database was estimated using the BLASTP (non-redundant database with daily updates) and Pfam databases (Altschul et al. 1997, Sonnhammer et al. 1997). Quoted % similarities and % identities were determined using praze (http://www.tigr.org) with alignment by the Smith-Waterman algorithm, or alternatively with the BLAST two sequence option (Altschul et al. 1997).
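Two ideas in this paragraph, phylogenetically informative residues and bootstrap resampling, can be illustrated with a short sketch. It assumes the usual textbook definitions: a column is parsimony-informative when at least two different residues each occur in at least two sequences, and a bootstrap replicate is built by sampling alignment columns with replacement. This is not the PAUP* implementation, and the toy sequences and function names are hypothetical.

```python
import random
from collections import Counter


def is_parsimony_informative(column, gap="-"):
    """True if at least two residues each occur in at least two sequences."""
    counts = Counter(res for res in column if res != gap)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(sequences):
    """Number of parsimony-informative columns in an alignment."""
    return sum(is_parsimony_informative(col) for col in zip(*sequences))


def bootstrap_alignment(sequences, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(sequences[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in sequences]


# Hypothetical toy alignment of four sequences and five columns.
toy = ["MKTAL", "MKSAL", "MRTGL", "MRSGL"]
informative = count_informative_sites(toy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]
```

In a full analysis, a tree would be inferred from each of the 500 replicates, and the bootstrap value of a node would be the percentage of replicate trees that contain it.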

Distribution of glycolytic/gluconeogenic enzymes in hyperthermophiles
The distribution of genes encoding glycolytic/gluconeogenic enzymes, their Enzyme Commission (EC) numbers and four-digit identification numbers or database accession codes are shown in Table 1. Individual enzymes and the gene sequence families that encode them will be discussed sequentially starting with the first step of glycolysis, which is catalyzed by hexokinases/glucokinases. Table 2 summarizes the enzymatic activities associated with cell-free extracts of the organisms presented in Table 1 or with purified or heterologously expressed proteins from type strains or from closely related species or strains.

Glucokinases
Bacterial glucokinases (GLKs; EC 2.7.1.2; ATP:D-glucose-6-phosphotransferase) preferentially catalyze the phosphorylation of glucose in the EMP pathway, with high specificity. Sequence comparisons have shown that bacterial glucokinases have no apparent homology to the less specific hexokinases found throughout the higher Eukarya (EC 2.7.1.1; Cárdenas et al. 1998). However, a potential ancestral relationship has been suggested (Cárdenas et al. 1998) on the basis of the size of the bacterial enzyme, which is about half that of the larger eukaryal protein.
Thus, the current evidence demonstrates the presence of at least two unrelated sequence families for glucokinases/hexokinases, and most likely three, based on the lack of a convincing common ancestral sequence for bacterial GLKs and the hexokinases found in most higher eukaryotes.
The gene encoding the PGI from Pyrococcus furiosus (DSM 3638) has recently been cloned and sequenced (Hansen and Schönheit 2001b, Verhees et al. 2001b). The P. furiosus sequence possesses only 3-6% identity with the classic PGI enzyme family, and evidence from structure-based sequence analysis indicates that these PGIs are unlikely to be phylogenetically related (Hansen and Schönheit 2001b, Verhees et al. 2001b). When the P. furiosus sequence (PF0196) was used to probe the database, similar sequences were found in the P. horikoshii (strain OT3; PH1956), P. abysii (strain GE5; PA1199) and unfinished Methanosarcina barkeri and Methanosarcina mazei genomes, but not in the genome of Archaeoglobus fulgidus strain VC16. Lower scoring, paralogous sequences are present in P. furiosus, P. abysii, P. horikoshii, Methanosarcina acetivorans (strain C2A) and M. jannaschii (Table 1; Altschul et al. 1997, Verhees et al. 2001b). Using PSI-BLAST with a second iteration cycle, an ortholog to PF0196 was found in Methanopyrus kandleri that probably encodes a bona fide PGI (strain AV19; Table 1). Thus, the current evidence indicates the presence of at least two sequence families for PGIs (for species or strains in which PGI activity has been found; see Table 2).

Phosphofructokinases
Phosphofructokinases (PFKs) catalyze the third step of glycolysis: phosphorylation of fructose-6-phosphate at the 1 position (recently reviewed by Ronimus and Morgan 2001). The well-characterized, textbook, Family A ATP-dependent PFKs (ATP-PFKs; EC 2.7.1.11), which catalyze an irreversible reaction, and the pyrophosphate-dependent PFKs (PPi-PFKs; EC 2.7.1.90), which catalyze a reversible reaction, are widely distributed among bacteria and eukaryotes. The PPi-PFKs are found predominantly in anaerobic bacteria, some plants and lower eukaryotes, but also in the crenarchaeote Thermoproteus tenax (Morgan and Ronimus 1998, Siebers et al. 1998, Ronimus and Morgan 2001). Thermotoga has both a Family A ATP-PFK, allosterically regulated by PPi and polyphosphate (Ding et al. 2001), and a PPi-PFK. The ATP-PFK is also regulated by typical bacterial effectors such as ADP (activation) and phosphoenolpyruvate (inhibition; Hansen et al. 2002). Sequence analysis also indicates that a Family A ATP-PFK is present in Aquifex aeolicus (AE1708). A recent iterative PSI-BLAST sequence analysis suggests that diacylglyceride kinases, sphingosine kinases and NAD kinases may be distantly related to PFKs and may share a similar phosphate-donor binding site and reaction mechanism (Labesse et al. 2002).
Family B ATP-PFKs (also EC 2.7.1.11), which catalyze an essentially irreversible reaction, are related to Family B sugar kinases that catalyze various substrate phosphorylations, such as those of adenosine and ribose (Sigrell et al. 1998). Archaeal Family B ATP-PFKs have been described for Aeropyrum pernix (Hansen and Schönheit 2001a, Ronimus et al. 2001b) and Desulfurococcus amylolyticus (Ding 2000, Hansen and Schönheit 2000). Others may be present in Pyrobaculum aerophilum (Altschul et al. 1997, Ronimus and Morgan 2001). Other high scoring sequences with the PFKB/ribokinase signature (with E values less than 10⁻⁷) are present in all of the archaeal genomes (Altschul et al. 1997, Sonnhammer et al. 1997; PEDANT: http://pedant.gsf.de). However, the presence of ATP-PFK activity associated with any of these remains to be substantiated by biochemical data.
Adenosine diphosphate-dependent PFKs (ADP-PFKs; tentatively named Family C PFKs; EC 2.7.1.146) have been found in Thermococcus zilligii (Ronimus and Morgan 2001, Ronimus et al. 2001a), T. celer (Selig et al. 1997), T. litoralis (Selig et al. 1997, Koga et al. 2000), Pyrococcus furiosus (Tuininga et al. 1999, Koga et al. 2000) and M. jannaschii (Verhees et al. 2001a), although the last enzyme is apparently bifunctional, with ADP-dependent glucokinase activity (Sakuraba et al. 2002). It is also present in some mesophilic methanogens (Table 2). The crystal structure of T. litoralis ADP-glucokinase (467 amino acids) has shown that the enzyme has three-dimensional similarities to Family B sugar kinases, including the human adenosine kinase and the E. coli ribokinase (each approximately 33 kDa), although the level of sequence identity is low (only seven strictly conserved residues in a structure-based alignment; Ito et al. 2001). The lack of sequence identity was such that a BLASTP comparison with an expect value of 1000 revealed no significant sequence similarity between the T. litoralis ADP-glucokinase and the E. coli ribokinase (Altschul et al. 1997; results not shown). Nevertheless, the authors proposed a common evolutionary origin for the two sequence families (Ito et al. 2001), and a Family B sugar kinase motif was found in a sequence-based comparison (Ronimus et al. 2001b).
Halococcus saccharolyticus (Johnsen et al. 2001), Haloferax mediterranei and Haloarcula vallismortis utilize a 1-phosphofructokinase to degrade fructose, whereas H. saccharolyticus, H. saccharovorum and H. vallismortis use a modified Entner-Doudoroff pathway to degrade glucose (Altekar and Rangaswamy 1992, Rangaswamy and Altekar 1994). Thus, in summary, the current data (Tables 1 and 2 and above discussion) imply that there is a minimum of two sequence families to account for PFKs.
The discovery of a second FBPase in E. coli, encoded by the gene glpX and classified as FBPase II (Donahue et al. 2000), and the FBPase in B. subtilis encoded by fbp (Class III FBPase), which shares no significant homology with other FBPases, has significantly affected the classification of FBPases (Fujita et al. 1998). Following recent descriptions of the archaeal FBPases/IMPases and the P. furiosus FBPase, a new classification scheme has been proposed with four classes of FBPases and one class of IMPases (the latter with little or no FBPase activity). The Class I FBPases (classical FBPases of bacteria and eukarya), Class IV FBPases (including the P. furiosus enzyme) and Class V IMPases share significant sequence similarity (Verhees et al. 2002a). Unfortunately, no structures are yet available for the FBPase Class II or III enzymes to substantiate potential distant sequence similarities.
Another recent development in FBPase enzymology was the cloning, expression, purification and characterization of an FBPase from the heterotroph Thermococcus kodakaraensis KOD1 (Rashid et al. 2002). This FBPase is specific for fructose-1,6-bisphosphate, in contrast to FBPase/IMPases, and its expression is repressed when strain KOD1 is grown in a sugar-based medium. High scoring orthologs of the T. kodakaraensis FBPase (E values ranging from 10⁻¹⁵² to 10⁻¹¹⁷) are present in all of the archaea in Table 1 except Halobacterium NRC-1 and M. acetivorans. Whether the FBPase/IMPases (similar to the MJ0109 gene product) or orthologs of the T. kodakaraensis FBPase fill the role of an FBPase in other archaeal species in vivo needs to be determined by additional biochemical evidence. However, their wide distribution in hyperthermophiles and archaea is intriguing. Thus, the current evidence indicates that there are likely to be four separate sequence families encoding enzymes with FBPase activity in the three domains, with Class IV FBPases and the orthologs of the T. kodakaraensis enzymes common in hyperthermophilic bacteria and archaea (Rashid et al. 2002, Verhees et al. 2002a).

Fructose-1,6-bisphosphate aldolase
Genes encoding aldolases (EC 4.1.2.13) that catalyze the reversible aldol cleavage of fructose-1,6-bisphosphate into glyceraldehyde-3-phosphate and dihydroxyacetone phosphate are present in the Thermotoga and Aquifex genomes (Table 1; Cordwell 1999, Dandekar et al. 1999). Two classes of aldolases are distinguishable on the basis of their reaction mechanisms (Fothergill-Gilmore and Michels 1993): Class I aldolases form a Schiff-base intermediate between the substrate and a lysine residue, whereas Class II aldolases do not form a covalent intermediate. Class II enzymes are found in bacteria and lower eukaryotes (yeasts, fungi and algae), whereas Class I enzymes are found in bacteria, protists and metazoa (Fothergill-Gilmore and Michels 1993).
Recently, a Class I aldolase was identified in E. coli (dhnA gene product), and sequence analysis showed that orthologs were present in several archaeal genomes including Archaeoglobus, P. horikoshii, M. jannaschii, M. thermoautotrophicum and Aeropyrum pernix, and also in the bacterium Aquifex aeolicus (Dandekar et al. 1999, Galperin et al. 2000). Sequence analysis shows that orthologs to dhnA are also present in other archaea, excluding T. volcanium, T. acidophilum and P. aerophilum, and in some cases two or more paralogous sequences are present (Altschul et al. 1997, Siebers et al. 2001; see Table 1).
The lack of an obvious aldolase gene in P. aerophilum is interesting because this organism apparently has an otherwise complete glycolytic pathway. Pyrobaculum aerophilum does, however, contain a putative deoxyribose aldolase (PAE1231) that has been shown to possess sequence similarity to DhnA-like aldolases and which could supply aldolase activity (Galperin et al. 2000). Alternatively, aldolase activity may be provided by an L-fuculose-1-phosphate aldolase encoded by fucA, which, in E. coli, possesses up to 4% fructose-1,6-bisphosphate aldolase activity (Ghalambor and Heath 1966). More recently, the aldolases from P. furiosus and T. tenax have been cloned, expressed and characterized (Siebers et al. 2001). These archaeal dhnA-like aldolases are also present in some bacteria (including Aquifex and Thermotoga), but not in the Eukarya, and Siebers et al. (2001) have proposed the name archaeal type Class I aldolase.
Thorough iterative-based sequence analyses have revealed sequence signatures typical of classical Class II aldolases in Class I dhnA-encoded archaeal aldolases, highly suggestive of a distant relationship (Galperin et al. 2000). Furthermore, based on their common TIM-barrel structural folds, an ancestral origin for the Class I and II aldolases has been proposed as part of a phylogenetic evolutionary history of central metabolic TIM-barrel proteins (Copley and Bork 2000). In summary, the most parsimonious explanation for the current data, although still somewhat controversial (Fothergill-Gilmore and Michels 1993), is for a single Schiff-base ancestral gene having evolved into the present-day representatives of Class I and ultimately Class II aldolases after development of a metal-dependent reaction mechanism in the latter (Cooper et al. 1996, Galperin et al. 1998a, 2000).

Triosephosphate isomerases
Triosephosphate isomerase (TPI; EC 5.3.1.1) is the archetypical (β/α)8-barrel protein (TIM-barrel) that catalyzes the reversible inter-conversion of dihydroxyacetone phosphate and glyceraldehyde-3-phosphate, ensuring that, at least in glycolysis, two ATP molecules are eventually obtained from the breakdown of glucose (Fothergill-Gilmore and Michels 1993, Copley and Bork 2000). The enzyme itself has been described by some investigators as catalytically perfect because, based on the kcat/Km ratio, its reaction rate is limited only by substrate diffusion (Fothergill-Gilmore and Michels 1993).
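The link between the kcat/Km ratio and diffusion control can be made explicit with the standard Michaelis-Menten rate law. The derivation below is a textbook sketch, not a calculation from this paper; the symbols [E]_0 (total enzyme concentration), [S] (substrate concentration) and v (initial rate) are introduced here for illustration, and the quoted range for the diffusion limit is the commonly cited order of magnitude rather than a value measured for TPI.

```latex
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m + [S]},
\qquad
[S] \ll K_m \;\Rightarrow\;
v \approx \frac{k_{\mathrm{cat}}}{K_m}\,[E]_0\,[S]
```

At low substrate concentration, kcat/Km therefore behaves as an apparent second-order rate constant for the encounter of enzyme and substrate. It cannot exceed the rate at which the two molecules collide by diffusion, generally taken to be of the order of 10^8 to 10^9 M^-1 s^-1, and an enzyme whose kcat/Km approaches this ceiling is said to be diffusion-limited.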
A total of 127 TPI genes were found in the Pfam database (Sonnhammer et al. 1997), and the sequences are distributed throughout the three domains of life (Cordwell 1999, Dandekar et al. 1999, Koike et al. 1999). A total of 31 representative sequences from the three domains, including sequences from archaea and hyperthermophilic bacteria, were used to produce an alignment and construct an associated phylogenetic tree (Figure 1). When the sequence from M. jannaschii was used for sequence retrieval from the TIGR Comprehensive Microbial Database, % identities varied from 67% (Pyrococcus abysii) to 25% (Borrelia burgdorferi B31) in an analysis containing TPI sequences from 66 genomes. The archaeal sequences form a clearly separated domain-specific cluster with a particularly long branch length, with 100% bootstrap support from both maximum parsimony and neighbor-joining analyses. The eukaryal sequences group together, as do the bacterial sequences, but the domain-specific clusters are not nearly so marked or bootstrap-supported as that of the Archaea.
Triosephosphate isomerases of the hyperthermophiles Aquifex and Thermotoga do not branch as deeply as expected on the basis of their 16S rDNA sequence comparisons. However, the Thermotoga TPI was isolated as a fusion complex with both TPI and phosphoglycerate kinase activities (Schurig et al. 1995b, Beaucamp et al. 1997, Hensel et al. 2001), whereas the usual TPI quaternary structure is homodimeric with no associated PGK activity (Fothergill-Gilmore and Michels 1993, Hensel et al. 2001, Yu and Noll 2001). This could affect the rate of evolution of both the TPI and phosphoglycerate kinase sequences.
For the archaeal TPIs, the crenarchaeal enzymes are the most deeply rooted, which is also somewhat unexpected given that the organisms shown grow aerobically and, based on 16S rDNA sequence comparisons, are relatively late archaeal arrivals (Woese et al. 1990). Furthermore, T. acidophilum and S. solfataricus do not process glucose via glycolysis but instead use the Entner-Doudoroff pathway (Selig et al. 1997, Danson et al. 1998). Therefore, the evolutionary constraints operating on crenarchaea may have been different from those that existed throughout the evolution of, for example, anaerobic fermentative archaea. In general, TPIs are highly conserved and universally distributed, and a single sequence family can explain currently identified and characterized TPIs.
Thirty-five of the 591 GAPDH amino acid sequences found in the Pfam database (Sonnhammer et al. 1997) were used for the alignment and associated neighbor-joining tree shown in Figure 2. As in the TPI phylogeny, the archaeal sequences (except that of Haloarcula vallismortis) cluster together with strong bootstrap support (100% with both phylogenetic methods) on a relatively long branch. The Thermoplasma, S. solfataricus, Halobacterium and A. pernix sequences branch near the base of the main archaeal cluster, as observed with the TPI phylogeny. Two large clusters are shown, one with mostly bacterial and the other with mostly eukaryotic sequences. For example, the Haemophilus, Vibrio, E. coli and Synechocystis sequences group with predominantly eukaryotic sequences.
Several of the sequences have deep branchings but were not well resolved phylogenetically and do not have strong support from the bootstrapping analysis, including M. tuberculosis, Borrelia burgdorferi, H. vallismortis and the P. aeruginosa-Tritrichomonas lineage. Thus, regarding the phylogenetic distribution of GAPDHs, the evidence suggests that a single gene family is present in hyperthermophilic microorganisms and that GAPDHs are universally distributed throughout the three domains. It should be noted that the tungsten-containing enzyme glyceraldehyde-3-phosphate:ferredoxin oxidoreductase (GAP:OR), a novel site for regulation of glycolysis, is present in Pyrococcus furiosus and M. jannaschii and functions in the glycolytic reaction, leaving the classic GAPDH version to operate in gluconeogenesis (van der Oost et al. 1998, Verhees et al. 2001a). High scoring orthologs are also present in P. aerophilum, Aeropyrum pernix and Archaeoglobus fulgidus and in other Pyrococcus species (data not shown). Importantly, there is a clear phylogenetic separation, with strong bootstrap support (100%), of the archaeal GAPDH sequences and the bacterial/eukaryotic GAPDHs.

Phosphoglycerate kinase
Phosphoglycerate kinase (PGK; EC 2.7.2.3) catalyzes the reversible transfer of a phosphate group from 1,3-bisphosphoglycerate to ADP, resulting in net substrate-level phosphorylation of ADP to ATP and the formation of 3-phosphoglycerate. The enzyme is usually a monomer with a two-domain, lobed structure comprised of six β-strands surrounded by an equal number of α-helices, with the active site located at the hinge region between the domains (Fothergill-Gilmore and Michels 1993, Fleming and Littlechild 1997). A PGK-encoding gene homologous to the E. coli sequence is present in all archaea and hyperthermophilic bacteria shown in Table 1. The phylogenetic positions of representative archaeal, bacterial and eukaryal PGKs are shown in Figure 3. There were 202 PGK sequences in the Pfam database (Sonnhammer et al. 1997), and a pgk gene has been previously identified in Pyrobaculum aerophilum (Fitz-Gibbon et al. 1997), S. solfataricus (Jones et al. 1995), Methanobacterium bryantii and Methanothermus fervidus (Fabry et al. 1990). The archaeal sequences cluster together on a relatively long branch with high bootstrap support (100%) but on a slightly shorter offshoot compared to the TPI and GAPDH trees. This supports the monophyletic character of the domain Archaea as proposed on the basis of 16S rDNA sequence data akin to the phylogenies of TPIs and GAPDHs (Woese et al. 1990).
Similar to what was found for TPIs and GAPDHs, some of the individual PGK branchings of the archaeal sequences do not follow their respective 16S rDNA lineages. For example, Halobacterium strain NRC-1 is positioned toward the end of the 16S rDNA tree within the euryarchaeotic branch and is considered a relatively late arrival in evolutionary terms, but in this case the halobacterial PGK branches at the root of the archaeal cluster. Overall, the current evidence suggests that there is only one PGK-encoding sequence family and that it is universally distributed (Sonnhammer et al. 1997), although this enzyme is by-passed during glycolysis in favor of GAP:OR in Thermococcus and Pyrococcus species and in M. jannaschii (van der Oost et al. 1998, Verhees et al. 2001a).
Figure 2. Phylogenetic tree of glyceraldehyde-3-phosphate dehydrogenases (phosphorylating; EC 1.2.1.12) generated with the neighbor-joining method. Short stretches of amino acid residues were removed from the Borrelia (4) and Tritrichomonas (11) sequences to maintain alignment based on comparison with crystal structure data. See Figure 1 caption for further explanation.
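The statement that a domain is monophyletic in a given tree can be checked mechanically: a group of taxa is monophyletic if some clade of the rooted tree contains exactly those taxa and no others. The sketch below assumes a rooted tree written as nested tuples; the tree, the taxon names and the function names are hypothetical and do not reproduce Figures 1 to 3. For an unrooted neighbor-joining tree, the outcome depends on where the root is placed.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the taxa in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: one archaeal clade and one bacterial/eukaryal clade.
toy_tree = (
    (("archaeon_1", "archaeon_2"), "archaeon_3"),
    (("bacterium_1", "bacterium_2"), ("eukaryote_1", "eukaryote_2")),
)

archaea_ok = is_monophyletic(toy_tree, {"archaeon_1", "archaeon_2", "archaeon_3"})
mixed_ok = is_monophyletic(toy_tree, {"archaeon_1", "bacterium_1"})
```

Bootstrap support for such a clade would then be the percentage of bootstrap trees in which the same check succeeds.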
Both Thermotoga and Aquifex, as well as M. acetivorans, S. tokodaii, S. solfataricus, T. acidophilum and T. volcanium, appear to possess both iPGM and dPGM activities (see Table 1). A gene encoding an iPGM is present in all of the archaeal genomes sequenced thus far. In M. thermoautotrophicum and M. jannaschii, two high scoring, iPGM-encoding genes were found when the database was searched with the T. maritima TM0542 iPGM sequence. The M. jannaschii genes (MJ0010 and MJ1612; Table 1) have been cloned and expressed and it has been confirmed that both gene products have thermostable iPGM activity (Graham et al. 2002). The MJ1612 gene product and the P. furiosus gene product PF1959 also encode thermostable iPGMs (van der Oost et al. 2002).
Although there are, overall, two unrelated gene families for PGMs, all hyperthermophiles possess genes encoding iPGMs. Because of the lack of an all-encompassing distribution of iPGMs and dPGMs, no phylogenetic comparisons were undertaken for iPGMs and dPGMs. However, a recent phylogenetic analysis of bacterial and archaeal iPGMs, with an emphasis on the latter, has suggested that lateral gene transfer has had a significant effect on the distribution and that the iPGMs from Halobacterium NRC-1 and Methanosarcina barkeri may have been acquired laterally from bacteria (Graham et al. 2002). The phylogeny of archaeal iPGMs and dPGMs has also been recently examined by van der Oost et al. (2002). The analysis provides support for the notion that the dPGM orthologs present in some archaea (Table 1; SSO2236, ST2120, TA1347 and TV1358) act as bona fide dPGMs in these respective species. In addition, the bacterial and archaeal iPGMs could be distinguished easily at the sequence level by the presence of obvious indels. Overall, the phylogenetic distribution of iPGMs supports the notion that they could have preceded the arrival of dPGM based on their presence in all hyperthermophiles, particularly within the domain Archaea (Graham et al. 2002).

Enolases
Enolase (EC 4.2.1.11; phosphopyruvate dehydratase) catalyzes the reversible dehydration of 2-phosphoglycerate to phosphoenolpyruvate, which is usually then processed for ATP production in the next step of glycolysis (Fothergill-Gilmore and Michels 1993). Enolase possesses a structure similar to that of TPI (TIM eight-stranded α-β barrel), but has a slightly different topology (Fothergill-Gilmore and Michels 1993) and is usually active as a dimer (Hannaert et al. 2000). An enolase superfamily has been proposed on the basis of structure-function relationships, and the enzymes within this family, including glucarate dehydratase, galactonate dehydratase and perhaps carboxyphosphenolpyruvate synthase, can extract the α-proton from carboxylic acids (Babbit et al. 1996).
A total of 139 sequences were found in the Pfam database (Sonnhammer et al. 1997). These sequences are universally distributed throughout the domains and are present in all organisms listed in Table 1. Enolase-encoding genes have been identified previously in A. aeolicus, A. fulgidus, M. thermoautotrophicum, M. jannaschii, P. aerophilum and Pyrococcus horikoshii (Fitz-Gibbon et al. 1997, Cordwell et al. 1999, Dandekar et al. 1999, Koike et al. 1999, Makarova et al. 1999).
Two genes encoding enolase-related proteins were found in each of P. abyssi and P. horikoshii when the A. fulgidus (AF1132), T. maritima (TM0877) and M. jannaschii (MJ0232) genes were used to search the TIGR Comprehensive Microbial Resource. Compared with TM0877, one set of the genes (PAB1126 and PH1942; scores of 10⁻¹¹⁷ and 10⁻¹¹⁵, respectively) showed far stronger homology than the other set (PH1630 and PAB0367; 10⁻¹¹ and 10⁻¹⁰, respectively), raising the possibility that the lower scoring genes encode enzymes with a related activity or that, alternatively, they are expressed under particular circumstances.
As shown in the phylogram in Figure 4, the enolases of the Archaea cluster into a main group and a subgroup, both with strong bootstrap support. The single intein-containing M. jannaschii sequence (MJ0232) clusters with the sequences from P. abyssi (PAB1126) and P. horikoshii (PH1942), whereas the remainder of the archaeal sequences cluster together. Except for the sequence from Euglena gracilis, which clusters with enolases from the domain Bacteria, the bacterial and eukaryotic sequences also cluster in domain-specific arrangements. Thus, enolases are universally present in hyperthermophiles and are highly conserved and widely distributed among all organisms.

Conversion of phosphoenolpyruvate to pyruvate
The last recognized step of glycolysis, the conversion of phosphoenolpyruvate (PEP) to pyruvate, is usually catalyzed by pyruvate kinase (PK; EC 2.7.1.40), which is often a tetramer and carries out substrate-level phosphorylation of ADP to produce ATP (Fothergill-Gilmore and Michels 1993, Schramm et al. 2000). Pyruvate kinase normally requires a monovalent cation (usually K⁺, but not required for the PK from T. tenax) and two divalent cations for activity. The PK subunit contains four domains (N, A, B and C) and is thus more complex than other glycolytic enzymes (Fothergill-Gilmore and Michels 1993, Schramm et al. 2000). Interestingly, the enzyme from T. tenax is unaffected by the usual effectors and is apparently partially regulated by positive cooperative binding of PEP and divalent cations (Schramm et al. 2000).
A total of 108 PK sequences were found in the databases (Altschul et al. 1997), but not all archaea or hyperthermophilic bacteria possess PKs (Table 1). The lack of a PK-encoding gene in Archaeoglobus and M. thermoautotrophicum has been noted previously (Cordwell et al. 1999, Dandekar et al. 1999, Koike et al. 1999, Makarova et al. 1999). The lack of a PK in these organisms precludes the generation of a universally distributed phylogeny for the organisms shown in Table 1, similar to those presented here for TPIs, GAPDHs, PGKs and enolases.
A recent sequence analysis using the T. tenax PK sequence has shown that PKs fall into two broad groups, Types I and II, which are differentiated by their general allosteric regulatory properties (Schramm et al. 2000). Type I enzymes are found in eukaryotes and some bacteria and require either fructose-1,6-bisphosphate or other sugar phosphates for activity, whereas Type II enzymes are usually regulated by either AMP or ATP and are found in the archaea and some bacteria.
The reverse of the PK reaction, in the gluconeogenic direction, is usually catalyzed by phosphoenolpyruvate synthase (PEP synthase; EC 2.7.9.2; the systematic name is ATP:pyruvate, water phosphotransferase, but it is also known as pyruvate, water dikinase or PEP synthetase). Sequences that encode either PEP synthases or pyruvate, orthophosphate dikinases (PPDKs; EC 2.7.9.1) are distributed among the organisms presented in Table 1. Phosphoenolpyruvate synthases catalyze the formation of PEP, orthophosphate and AMP from pyruvate and ATP, whereas PPDKs catalyze the reversible formation of PEP, AMP and pyrophosphate from pyruvate, ATP and orthophosphate. Pyruvate, orthophosphate dikinases typically contain three domains: an N-terminal PEP/pyruvate binding domain (Pfam 01326.5); a centrally placed mobile domain found in PEP-utilizers (Pfam 00391.5); and a C-terminal TIM-barrel domain (Pfam 02896.5; Sonnhammer et al. 1997).
Pyruvate, orthophosphate dikinases are found in some anaerobic protists, such as Entamoeba; in numerous, mostly anaerobic bacteria; and in many plants. These enzymes can substitute for PK (Reeves 1968).
Recently, a PEP synthase was purified from P. furiosus that accounts for 5% of the cellular protein (PF0043 or mlrA gene; Hutchins et al. 2001). It was suggested that this enzyme operated in the gluconeogenic direction and that it might have an energy-spilling role when P. furiosus is grown in carbohydrate-rich medium. However, the idea that the enzyme acts preferentially in gluconeogenesis has been challenged by data obtained from the purification of an AMP-dependent kinase (catalyzing the reverse of the reaction catalyzed by PEP synthase, i.e., pyruvate production and ATP formation) from the same organism (Sakuraba et al. 1999) that was shown to have an N-terminal amino acid sequence identical to that of the mlrA-encoded product (PF0043; Sakuraba and Ohshima 2002). The enzyme, which was induced in the presence of maltose-containing media, is thought to utilize AMP that is generated by the ADP-dependent glucokinase and ADP-PFK for the generation of ATP (Sakuraba et al. 1999, Sakuraba and Ohshima 2002).
Because of the multiple pathway connections in pyruvate metabolism (including the tricarboxylic acid cycle, amino acid synthesis, and conversion to acetyl-CoA and acetyl-phosphate, among others) and the consequent evolutionary considerations (varying selective pressures dependent on an individual organism's physiology), an analysis of the phylogeny of PKs and PEP synthases/PPDKs was not undertaken. Overall, the nexus of interconnecting pathways leading to and between phosphoenolpyruvate, pyruvate and acetyl-CoA lends support to a pivotal role for these transformations in the origins of metabolism, and circumstantially a gluconeogenic origin for the Embden-Meyerhof-Parnas pathway.

Discussion
We have summarized the distribution of glycolytic enzymes in hyperthermophilic bacteria and archaea and examined the phylogeny of universally distributed TPIs, GAPDHs, PGKs and enolases. The identification of biochemical activities in archaea and hyperthermophilic bacteria (Table 2), which complements and supports the gene matrix in Table 1, has also been summarized. Because hyperthermophilic microorganisms form deep lineages within the three-domain model for extant life, the data provide insight into the nature of the early evolution of metabolism. The multiplicity of gene sequence families encoding GLKs, FBPases, PGIs, PFKs and perhaps aldolases at the top of the Embden-Meyerhof-Parnas pathway, combined with the fact that enzymes of the lower trunk pathway are universally present and highly conserved, can be taken as strong support for a gluconeogenic origin for the pathway (Romano and Conway 1996, Galperin et al. 1998a, Dandekar et al. 1999, Ronimus and Morgan 2001, Ronimus et al. 2001b). Phosphoglycerate mutases represent a special case and are encoded by two sequence families; however, the 2,3-bisphosphoglycerate-independent form is present in all hyperthermophiles, suggesting that it is the more ancient of the two forms (Graña et al. 1995, Dandekar et al. 1999).
A scenario describing the origin of the EMP pathway that fits with the current phylogenetic distributions is that the genes encoding TPIs, GAPDHs, PGKs and enolases were all present before the divergence of the archaeal and bacterial domains, i.e., in the presumed last universal common ancestor (LUCA; Woese et al. 1990, Yu et al. 1994, Dandekar et al. 1999). According to this scenario, the lower trunk portion of the EMP pathway, with its high degree of reversibility, is the best conserved and the most ancient in origin (Fothergill-Gilmore and Michels 1993, Schönheit and Schäfer 1995, Romano and Conway 1996, Selig et al. 1997, Dandekar et al. 1999, Ronimus and Morgan 2001). This portion of the pathway is indispensable given the role it plays in amino acid, pentose phosphate and purine synthesis, and is thus under strict evolutionary maintenance (Galperin and Koonin 1999). These enzymes of the trunk pathway are common to all three domains, and are also common to the Entner-Doudoroff pathway (Romano and Conway 1996), emphasizing their central nature in the origin of metabolism in the early Archaean time. In contrast, there are several instances where the upper glycolytic pathway (and the Entner-Doudoroff pathway) of archaea has evolved unique enzymes compared with the pathway in Thermotoga maritima, which resembles a more classical glycolytic sequence (Schönheit and Schäfer 1995, Selig et al. 1997, Danson et al. 1998). An alternative scenario for the evolution of the entire pathway is that the enzymes of the lower trunk pathway, being essential, would be immune to replacement by either recruitment or lateral gene transfer. The enzymes of the upper portion of the pathway, on the other hand, could have been lost in selected organisms and the activity restored through recruitment or lateral gene transfer.
In some cases, exemplified by the Family A PPi-PFK in T. tenax and the eukaryal-bacterial type of FBPase in Halobacterium, lateral gene transfer can be invoked to explain the distribution (Ronimus and Morgan 2001). In other cases, the distributions of enzymes such as the ADP-dependent kinases (glucokinases and PFKs) of Pyrococcus, Thermococcus and some methanogens are best explained by recruitment of an enzyme with similar activity and extensive modification of the parent sequence. Interestingly, a novel glycolytic pathway operates simultaneously in Thermococcus zilligii during glucose fermentation. This pathway utilizes pentose phosphoketolase activity that is present in some lactic acid bacteria (Xavier et al. 2000). The two pathways are used in approximately a 2:1 ratio (novel:glycolysis). The extent to which similar unexpected deviations from the archaeal glycolytic/gluconeogenic pathways occur remains to be determined. For some enzymes (TPIs, for example), the catalytic activity does not seem to have been improved upon significantly (ignoring thermal stability in enzymes of hyperthermophiles or solubility of enzymes of halobacteria); apparently, there is little evolutionary gain to be made by making changes to the sequence. In addition, early thermophilic microorganisms (or LUCA) could have gained an advantage by converting glyceraldehyde-3-phosphate, which is relatively unstable (half-life of 3.4 min at 80 °C), to fructose-1,6-bisphosphate (half-life of more than 3 h at 90 °C; De Rosa et al. 1984, Ronimus and Morgan 2001, Schramm et al. 2001). It has been suggested that the relative thermal instability of several lower trunk pathway intermediates, including phosphoenolpyruvate, glyceraldehyde-3-phosphate, 1,3-bisphosphoglycerate and dihydroxyacetone phosphate, played a role in the streamlining of glycolysis in heterotrophic Thermoproteus tenax (Schramm et al. 2000). Even pyruvate is well known to be relatively unstable (von Korff 1969), supporting the contention that a selective metabolic advantage would have been gained through the formation of more stable six-carbon phosphorylated sugars early in the evolution of metabolism.
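The stability argument can be made concrete with a short calculation. The following is a minimal sketch, assuming simple first-order decay (an assumption made here for illustration, not a claim from the cited studies) and using only the two half-lives quoted in the text; the function and constant names are hypothetical. Note that the two half-lives were measured at different temperatures, so the comparison is indicative only.

```python
def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of a compound left after elapsed_min, assuming first-order decay."""
    return 0.5 ** (elapsed_min / half_life_min)


# Half-lives quoted in the text (illustrative constants, names are hypothetical).
GAP_HALF_LIFE_MIN = 3.4       # glyceraldehyde-3-phosphate at 80 °C
HEXOSE_HALF_LIFE_MIN = 180.0  # six-carbon sugar at 90 °C; "more than 3 h", so a lower bound

for minutes in (5, 15, 30):
    gap = fraction_remaining(minutes, GAP_HALF_LIFE_MIN)
    hexose = fraction_remaining(minutes, HEXOSE_HALF_LIFE_MIN)
    # Because the hexose half-life is a lower bound, its remaining fraction is too.
    print(f"{minutes:>2} min: triose phosphate {gap:.3f}, six-carbon sugar >= {hexose:.3f}")
```

Under these assumptions, well under 1% of the triose phosphate survives 30 min, whereas most of the six-carbon sugar remains.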
The glycolytic/gluconeogenic pathway is ancient and central to the metabolism of all organisms (Fothergill-Gilmore and Michels 1993, Selig et al. 1997, Cordwell et al. 1999, Dandekar et al. 1999, Koike et al. 1999, Makarova et al. 1999, Galperin et al. 2000, Ronimus and Morgan 2001). The development of the pathway early in evolution appears to have occurred by chance assembly of enzymes that evolved independently, as there are only a few sequence similarities between individual enzymes, although some are structurally analogous (Fothergill-Gilmore 1986, Fothergill-Gilmore and Michels 1993, Copley and Bork 2000). For example, TPIs, Class I and Class II fructose-1,6-bisphosphate aldolases, enolase, pyruvate kinase and phosphoenolpyruvate synthetase all have TIM-barrel structures, or modifications of these, and a tentative phylogeny of all TIM-barrel proteins has been proposed (Copley and Bork 2000).
The control of glycolysis in hyperthermophilic microorganisms also raises some interesting questions with respect to the evolution of the pathway. For example, T. tenax possesses a non-allosteric PPi-PFK, and the first regulatory step in the pathway occurs at GAPDH/GAPN, at the juncture of the upper and lower portions of the pathway (Brunner et al. 2001). Likewise, in Pyrococcus furiosus, the transcriptional control of GAPDH and GAP:OR also represents a novel site for regulation, again in the lower trunk portion of the pathway (van der Oost et al. 1998).
The evolutionary rates of glycolytic enzymes have been estimated by Fothergill-Gilmore and Michels (1993) by comparison of amino acid sequences from groups of selected organisms with estimated times of divergence. Enzymes of the trunk pathway of glycolysis, in particular, have rates of evolution among the slowest known: rates for TPIs, PGKs and enolases varied between five and six accepted point mutations (PAMs) per 100 million years (Dayhoff 1978). Glyceraldehyde-3-phosphate dehydrogenase was the most slowly evolving of these, with only 3% of its catalytic domain changing in 10⁸ years (Fothergill-Gilmore and Michels 1993). In contrast, rates for ATP-hexokinases, PFKs (both N- and C-terminal halves of the Family A PFKs), dPGMs and pyruvate kinases are higher, although the enzymes are still well conserved (Fothergill-Gilmore and Michels 1993). Because of their slow rates of change, the lower glycolytic enzymes can serve as suitable molecular chronometers, albeit with somewhat lower information contents than 16S/18S rDNA (Ludwig et al. 1998).
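To give a sense of scale for these rates, the sketch below extrapolates a rate quoted in PAMs per 100 million years over a chosen time span. It assumes a constant rate over time, which is a simplification made here and is not asserted by the cited estimates; the function name and the 1000-million-year span are hypothetical choices for illustration. Accumulated PAMs also overstate the observed percentage difference between two sequences, because repeated substitutions at the same site are counted more than once.

```python
def expected_pams(rate_per_100_myr, elapsed_myr):
    """Accepted point mutations per 100 residues accumulated at a constant rate."""
    return rate_per_100_myr * elapsed_myr / 100.0


# Range quoted in the text for TPIs, PGKs and enolases: 5 to 6 PAMs per 100 million years.
ELAPSED_MYR = 1000  # hypothetical time span, chosen only for illustration

for rate in (5, 6):
    print(f"{rate} PAMs/100 Myr over {ELAPSED_MYR} Myr: {expected_pams(rate, ELAPSED_MYR):.0f} PAMs")
```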
The domain-specific clustering of archaeal TPIs is striking and provides support for early divergence of the archaeal sequences and, simultaneously, a closer relationship between the bacterial and eukaryal proteins (Brown and Doolittle 1997). This topology is supported by a recent study of eukaryal compartment-specific TPI isoforms and a study examining the evolutionary history of some glycolytic enzymes of the facultative intracellular parasite Bartonella henselae (Liaud et al. 2000, Canback et al. 2002). The TPI phylogeny incorporating the latest archaeal sequence data can also be compared, in part, with that obtained previously with primarily bacterial and eukaryal sequences and a single archaeal sequence (Keeling and Doolittle 1997). In that study, the eukaryal sequences were found to be most closely related to alpha-proteobacterial proteins, probably derived from a promitochondrial endosymbiont, and were most distantly related to the archaeal sequences.
The domain-specific clustering of the archaeal GAPDH sequences (Figure 2) and the long branch length are similar to those found for archaeal TPIs. The bacterial and eukaryal sequences do not group in a domain-specific fashion nearly as well as TPIs. As observed in three previous studies (Henze et al. 1995, Brown and Doolittle 1997, Liaud et al. 2000), the bacterial and eukaryal sequences, likely acquired through endosymbiotic gene transfer, are more similar to each other than to the archaeal sequences. The individual branching points within the clustering of the archaeal GAPDH sequences may be explained partially by several factors. First, the presence of the GAP:OR in Pyrococcus, Thermococcus and M. jannaschii, which operates only in the glycolytic direction, could help explain why the GAPDHs of these organisms branch later than those of other organisms such as S. solfataricus (van der Oost et al. 1998, Verhees et al. 2001a). When Pyrococcus and Thermococcus species are grown fermentatively on carbohydrates such as maltose, and when M. jannaschii is grown under starvation conditions (in which it is presumed to utilize glycogen; Verhees et al. 2001a), their GAPDHs function almost entirely in the gluconeogenic direction and would thus have been under different evolutionary constraints than those that would exist were the enzymes to function bidirectionally. In addition, Thermoproteus tenax possesses two GAPDHs: a traditional one, operative in gluconeogenesis; and a unique, non-phosphorylating form (GAPN; EC 1.2.1.9) that is unrelated to the archaeal, bacterial and eukaryal GAPDHs (Brunner et al. 2001). The latter is instead related to a group of aldehyde dehydrogenases and functions in glycolysis (Brunner et al. 1998, Brunner and Hensel 2001). Furthermore, Thermoplasma species and Sulfolobus are thought to utilize the Entner-Doudoroff pathway exclusively for the metabolism of glucose and generation of energy through respiration via the TCA cycle (Danson and Hough 1991, Selig et al. 1997, Danson et al. 1998). The overall topology of the GAPDH tree is similar to that obtained with only four archaeal sequences by Brown and Doolittle (1997), in that the archaeal sequences are the most distant and are on a separate lineage, indicative of early divergence of the archaeal sequences (and of a close relationship between the bacterial and eukaryal GAPDHs).
In the PGK tree shown in Figure 3, the archaea grouped together with 100% bootstrap support. One consideration with respect to phylogenetic interpretation of Figure 3 is the presence of a GAP:OR in Pyrococcus and Thermococcus species and in M. jannaschii that bypasses the PGK reaction. In these organisms, it is likely that PGKs have been optimized for gluconeogenesis over the course of evolution. In addition, T. maritima cells contain both a monomeric PGK and a homotetrameric PGK-TPI fusion complex with both TPI and PGK activity (only one PGK-encoding gene, pgk, has been identified in T. maritima; Crowhurst et al. 2001, Hensel et al. 2001). A TPI-PGK fusion complex from Thermotoga neapolitana has also been characterized (Yu and Noll 2001). The placement of the halobacterial sequence at the base of the archaeal sequences is surprising, but this placement is supported by three other phylogenetic analyses of PGKs. For instance, the pgk gene from H. vallismortis has been sequenced and was found to branch deeper than the PGK sequences from the methanogens M. bryantii and Methanothermus fervidus, yet still within the archaeal sequence cluster (Brinkmann and Martin 1996). This placement of the H. vallismortis sequence is similar to that reported in a parsimony-based phylogenetic analysis of PGKs (Fleming and Littlechild 1997) and in a neighbor-joining analysis by Brunner et al. (2001). Brown and Doolittle (1997) concluded that the archaeal sequences were highly divergent from the bacterial and eukaryal sequences, and that the association of eukaryal and bacterial PGKs is yet another example counter to the 16S/18S rDNA phylogeny.
Finally, in the enolase phylogeny (Figure 4), the archaeal sequences again cluster together with the exception of M. jannaschii and the extra low-scoring enolase-related genes of P. abyssi and P. horikoshii. If one assumes that the lower scoring homologs from P. abyssi and P. horikoshii are not true enolases, but serve another as yet unknown cellular function, then the general conclusion is that, like GAPDHs, TPIs and PGKs, the enolase sequences (except for MJ0232) form an archaea-specific cluster reflecting their monophyletic character based on 16S rDNA sequences. However, there are instances where a lack of agreement with the 16S/18S rDNA phylogeny exists. For example, both the Sulfolobus and Aeropyrum enolase sequences cluster together and are the deepest lineage within the main group of archaeal sequences, counter to their placement based on 16S rDNA sequences. However, the euryarchaeal sequences of M. thermoautotrophicum, A. fulgidus and Halobacterium strain NRC-1 cluster together, in agreement with 16S rDNA sequence comparisons.
Overall, the phylogenies presented here support the monophyletic character of the Archaea with strong bootstrap support, although in some instances, the positions of branching points of individual species did not directly correspond to the topologies based on 16S/18S rDNA trees. For the GAPDHs, PGKs and enolases, the eukaryal proteins were routinely more closely associated with their bacterial counterparts (but for TPIs they were separated). This in itself is contrary to the often quoted sister relationship of the Archaea and Eukarya based on transcription and translation-related sequence analyses, including 16S/18S rDNA analyses (Woese et al. 1990). In an extensive phylogenetic analysis of both information processing and general metabolic amino acid sequences (including GAPDH, enolase and PGK), the informational gene product phylogenies predominantly supported the close relationship between the Archaea and the Eukarya (Brown and Doolittle 1997; see also Koonin et al. 1997 and Rivera et al. 1998). Brown and Doolittle (1997) found that for enzymes from the lower trunk portion of glycolysis, including GAPDH, PGK and enolase, the bacterial and eukaryal sequences were closer and the archaeal sequences more distant. For the special case of iPGM, it has recently been suggested that the iPGM gene family arose before divergence of the domain Archaea (Graham et al. 2002). Additional studies using whole genome sequence comparisons have added extra support for this division between the information processing systems of the Eukarya and general metabolic processes, as well as proteins involved in cell division and cell wall synthesis (Koonin et al. 1997, Rivera et al. 1998, Makarova et al. 1999). Furthermore, recent evidence based on sequence analysis of compartment-specific isoforms of TPIs (and GAPDHs) supports the view that eukaryal cytosolic glycolytic enzymes have been acquired from mitochondrial genomes (Liaud et al. 2000 and references therein). The phylogenetic distinction between the bacterial and eukaryal lower glycolytic enzymes has been obscured by lateral gene transfer events, which are probably responsible for some of the low bootstrap support values obtained in some analyses. On the other hand, the separation of archaeal sequences was in all cases robust and strongly supported.
In conclusion, the high degree of conservation and universal distribution of the trunk pathway enzymes of glycolysis enable them to be used as viable and important molecular chronometers to investigate the origins of metabolism and life itself. Their evolutionary history as shown here, with an emphasis on enzymes from hyperthermophilic microorganisms, points to a gluconeogenic origin for the Embden-Meyerhof-Parnas pathway. This contrasts with the upper portion of the pathway, which is characterized by multiple origins for the enzymatic activities present in hyperthermophiles and other archaea. The Archaea have evolved some unique enzymatic solutions to produce the same sequence of biochemical transformations, and in some cases reaction bypasses, as exemplified by GAP:OR. A recent review of the thermodynamics of hydrothermal vent system chemistry strongly suggests that there was a favorable energetic environment for the origin of metabolism to have occurred at higher than mesophilic temperatures, supporting the central role that hyperthermophilic microorganisms are likely to have played in the origins of life (Amend and Shock 2001).

Figure 4. Phylogenetic tree of enolases (EC 4.2.1.11) generated with the neighbor-joining method. Three residues within the C-terminus of the Haemophilus influenzae sequence (419-421) were removed to maintain the alignment based on comparison with crystal structure data. See Figure 1 caption for further explanation.

Table 1. Enzyme Commission (EC) numbers and four-digit identification numbers or database accession numbers of genes identified in hyperthermophiles and mesophilic archaea that encode glycolytic enzymes.

Table 2. Detection of glycolytic enzyme activities in hyperthermophilic prokaryotes and mesophilic archaea.
