The archaeal origins of the eukaryotic translational system

Among the 78 eukaryotic ribosomal proteins, 11 are specific to Eukarya, 33 are common only to Archaea and Eukarya, and 34 are homologous (at least in part) to those of both Bacteria and Archaea. Several other translational proteins are common only to Eukarya and Archaea (e.g., IF2a, SRP19), whereas others are shared by all three phylodomains (e.g., EFTu/EF1A and SRP54). Although this and other analyses strongly support an archaeal origin for a substantial fraction of the eukaryotic translational machinery, especially the ribosomal proteins, there have been numerous unique and ubiquitous additions to the eukaryotic translational system besides the 11 unique eukaryotic ribosomal proteins. These include peptide additions to most of the 67 proteins with archaeal homologs, rRNA insertions, the 5.8S RNA and the Alu extension to the SRP RNA. Our comparative analysis of these and other eukaryotic features across the three cellular phylodomains supports the idea that an archaeal translational system was most likely incorporated by endosymbiosis into a host cell that was neither bacterial nor archaeal in any modern sense. Phylogenetic analyses support the view that this acquisition coincided with an ancient bottleneck in prokaryotic diversity.


Introduction
There have been a number of studies since the original work of Woese et al. (1990) on ribosomal RNA showing that many components of the eukaryotic translational system are closely related to those of the archaeal system. Given recent work showing that the ribosomal proteins common to both Archaea and Bacteria have an unusual phylodomain-specific block or segmental structure (Vishwanath et al. 2004), we have undertaken a detailed study of the eukaryotic ribosomal proteins. In particular, we have investigated their relationship to the ribosomal proteins of the two current major divisions of Archaea. Such an investigation has the potential to provide insight into fundamental questions about the origin of the complex structures of the eukaryotic cell. Although the origin of mitochondria via an ancient endosymbiotic event (Margulis and Bermudes 1985) is generally accepted, there are many competing theories on the origin and timing of the other eukaryotic components (Hartman 1984, Sogin 1991, Lake and Rivera 1994, Gupta and Golding 1996, Brown and Doolittle 1997, Moreira and Lopez-Garcia 1998, Horiike et al. 2001, Hartman and Fedorov 2002). Our results not only support the closeness of the eukaryotic ribosome to the archaeal ribosome but, in combination with other data (Hartman and Fedorov 2002, Vishwanath et al. 2004), support the idea that the eukaryotic ribosome was brought into a non-bacterial, non-archaeal host cell from a crenarchaeal ancestor very early in the history of Eukarya.

Materials and methods
Detailed multiple comparative sequence alignments across a wide taxonomic range of Eukarya (excluding parasites with reduced genomes), Archaea and Bacteria were carried out for the ribosomal proteins and several other translation-associated proteins. Only species for which a draft of the entire genome existed were included in the study (for a list of species, see the caption to Figure 4). Each sequence annotated as a ribosomal protein was searched against the set of genomes for both potential homologs and paralogs using PsiBlast (Altschul et al. 1997), and all suggestive matches were followed up. Next, all of the identified probable ribosomal proteins were clustered using CLUSTAL (Thompson et al. 1994). The primary results of these initial analyses were twofold. First, no additional ribosomal proteins common to all three major phylodomains were identified; and second, when all three phylodomains were included, most of the alignments were only partial. However, within each phylodomain the alignments were nearly full-length, except for a few of the N- and C-terminal regions. To generate alignments that were as complete as possible, initial statistical profiles (Das and Smith 2000) were constructed for each of the identified segments common within and between the three phylodomains and were used to generate initial full-species multiple alignments. These were then extended and refined by hand, exploiting the available structural and functional information (Ban et al. 2000, Brodersen et al. 2001, Klein et al. 2004) to constrain and adjust the computer-generated alignments. In particular, this information was used to restrict alignment gaps to surface loop regions, to maintain the alignment of probable rRNA-contacting, functionally equivalent residues, and to maintain likely structurally equivalent patterns across all sequences whenever possible. The latter included maintaining hydrophobicity profiles and regions containing amino acids of high turn propensity. These procedures are described in Vishwanath et al. (2004). Finally, new statistical profiles were constructed for all segments or blocks (see Results) that were clearly alignable, and thus were assumed to be truly homologous, within each phylodomain, for Bacteria and Archaea, for Eukarya and Archaea, and for all three phylodomains. This was followed by full-sequence genome searches to ensure that apparent conserved patterns were truly characteristic of the various phylodomains over the widest possible set of sequenced genomes.
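The profile step can be illustrated with a generic log-odds position-specific scoring matrix built from one aligned block and slid along a full-length sequence. This is a minimal sketch, not the profile method of Das and Smith (2000): it assumes uniform background amino acid frequencies, a flat pseudocount and ungapped scanning, and the function names `build_profile`, `score_window` and `best_hit` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(block, pseudocount=1.0):
    """Return one {amino acid: log2-odds score} dict per block column.

    block: equal-length aligned sequences for one conserved segment.
    Assumption: gaps ('-') are skipped when counting and the background
    distribution is uniform over the 20 standard amino acids.
    """
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in block if seq[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((residues.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum per-column scores; non-standard letters contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along a sequence and return (start, score) of the best window.

    Raises ValueError if the sequence is shorter than the profile.
    """
    width = len(profile)
    hits = [(i, score_window(profile, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1])
```

With the toy block `["GKVR", "GKIR", "GRVR"]`, `best_hit(profile, "MMGKVRMM")` places the segment at position 2. Positive column scores mark residues enriched relative to the assumed background, which is what lets a block-level profile pick out the same segment in distant genomes.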
Full phylogenetic reconstruction was carried out based on the sequence variation within the ribosomal protein alignment sets. Considerable effort was expended to ensure that the implied results are robust. This involved comparing the results of multiple methodologies (Strimmer and von Haeseler 1997, Katoh et al. 2001, Kolaczkowski and Thornton 2004) and various combinations of subsets of the alignments. In particular, trees were constructed from the sequence variation for each of the three classes of ribosomal taxon-specific segments or blocks concatenated into various subsets, ranging from those of individual proteins, to the two ribosomal subunits separately, to the full set of all common blocks among the 34 universal ribosomal proteins.
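Building the concatenated subsets amounts to joining per-protein block alignments into a single supermatrix. The sketch below assumes each protein's blocks are stored as a dictionary from species name to an aligned string and that species lacking a protein are padded with gaps; the function name and data layout are hypothetical.

```python
def concatenate_blocks(alignments, proteins, gap="-"):
    """Concatenate per-protein block alignments into one supermatrix.

    alignments: {protein: {species: aligned block string}}.
    proteins: ordered subset of protein names to include (e.g. one subunit).
    Assumption: a species missing a protein is padded with gaps so that
    columns stay aligned across species.
    """
    species = sorted({sp for p in proteins for sp in alignments[p]})
    matrix = {sp: [] for sp in species}
    for protein in proteins:
        block = alignments[protein]
        widths = {len(seq) for seq in block.values()}
        if len(widths) != 1:
            raise ValueError(f"block for {protein} has unequal sequence lengths")
        width = widths.pop()
        for sp in species:
            matrix[sp].append(block.get(sp, gap * width))
    return {sp: "".join(parts) for sp, parts in matrix.items()}
```

For example, with toy (invented) blocks `{"L2p": {"Hs": "AKV", "Sso": "AKI", "Pfu": "SKI"}, "S8p": {"Hs": "GR", "Sso": "GR"}}`, concatenating `["L2p", "S8p"]` gives `Hs: AKVGR`, `Pfu: SKI--` and `Sso: AKIGR`. Passing different protein lists yields the individual-protein, per-subunit or all-universal-block subsets.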
Implied trees were considered compatible if all the nested sets (clusters) of one were either identical to or contained within those of the other. For example, many of the trees constructed from single proteins showed little resolution within one or more of the phylodomains, yet did not cluster together members of different phylodomains. For each ribosomal protein, its universal concatenated blocks, or both, statistics were obtained on the length and on the number of conserved and informative positions (see Table 1). Here, an informative position was defined as any position with two or more distinct aligned amino acids, each present in at least two species, implying a division of the species into at least one cluster and all of the rest.
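The compatibility criterion can be made concrete by representing each tree as its set of clusters (the leaf sets below each internal node). This sketch takes one reading of the criterion, namely that every cluster of one tree must also occur in the other, and encodes trees as nested Python tuples; the encoding, the function names and the species labels (E, C, Y, B for Eukarya, Crenarchaea, Euryarchaea, Bacteria) are illustrative assumptions.

```python
def clusters(tree):
    """Collect the leaf sets of all internal nodes of a nested-tuple tree.

    Leaves are strings; an internal node is a tuple of children.
    Returns a set of frozensets, including the root cluster.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def compatible(tree_a, tree_b):
    """True if every cluster of one tree also occurs in the other tree."""
    a, b = clusters(tree_a), clusters(tree_b)
    return a <= b or b <= a


resolved = ((("E1", "E2"), "E3"), (("C1", "C2"), ("Y1", "Y2")), ("B1", "B2"))
poorly_resolved = (("E1", "E2", "E3"), ("C1", "C2", "Y1", "Y2"), ("B1", "B2"))
mixed = (("E1", "C1"), ("E2", "E3"), ("C2", "Y1", "Y2"), ("B1", "B2"))
```

Here `compatible(resolved, poorly_resolved)` is `True`, mirroring a single-protein tree that lacks resolution within a phylodomain, whereas `compatible(resolved, mixed)` is `False` because `mixed` groups a eukaryote with a crenarchaeon.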

Results
Several key points were made clear by the resulting multiple-sequence alignments and database searches. As previously reported (Lecompte et al. 2002), there are 11 unique and ubiquitous eukaryotic ribosomal proteins with no archaeal or bacterial homologs; 67 eukaryotic ribosomal proteins are clearly homologous to those of Archaea; and 33 of these 67 eukaryotic-archaeal homologs have no bacterial homologs (Figure 1). The remaining 34 of those 67 are universal ribosomal proteins found in all three phylodomains (Figure 2). What is new and supported by this work is that the eukaryotic members among these 34 clearly show closer similarity to the crenarchaeal division of Archaea (Table 1). These results were obtained by analyzing positional conservation in the alignments of the 34 universal proteins across all three phylodomains. The alignments show a well-defined block or segmental structure, with each block associated uniquely with one, two or all three of the phylodomains (Vishwanath et al. 2004) (Figure 3). Critically, the current analyses show that the eukaryotic homologs contain only the archaeal-specific blocks and no bacterial-specific blocks, with the possible but unclear exception of S3p (Figure 3). In addition, the vast majority of the 67 eukaryotic-archaeal ribosomal protein homologs contain eukaryotic-specific sequence additions, many of which are ubiquitous to all known Eukarya, as indicated in Figure 4. The full amino-acid-level alignments are available on the web at http://bmerc.bu.edu.
Using the alignable regions of all 67 ribosomal proteins common to Eukarya and Archaea, as well as various subsets, probable phylogenetic relationships were constructed as described in Materials and methods. First, as noted by others (Lecompte et al. 2002), there is a clear separation of the three major phylodomains for all subsets of the data, excluding the individual ribosomal proteins for which there were insufficient numbers of aligned informative positions (Table 1). In addition, concatenation of the alignable regions, for both the full 67-protein set and the 33-protein subset common only to Archaea and Eukarya, provided a clear three-way separation among Eukarya, Crenarchaea and Euryarchaea (Figure 4b). The same relationship was obtained using only the blocks common to both Eukarya and Archaea found in the 34 universal ribosomal proteins. The implied close association between Eukarya and Crenarchaea is seen more clearly in the numbers of conserved positions that Eukarya share with each of the two archaeal divisions, as displayed in Table 1. This is also supported by the presence of five proteins (S25e, S26e, S30e, L13e, L38e) that are apparently shared universally only by Crenarchaea and Eukarya (Lecompte et al. 2002). The concatenated alignable common sequence segments in the 34 universal ribosomal proteins provided a highly statistically significant and consistent separation among Eukarya, Crenarchaea, Euryarchaea and Bacteria for both the large and small subunits independently. For phylogenetic analyses based on many individual proteins, the limited number of informative variable positions either failed to fully resolve these divisions or produced low bootstrap values. The positional variation statistics and clustering resolutions are listed in Table 1. A number of other translation-related proteins, including the elongation and initiation factors EF2/EFG and IF2P/IF2, not only showed a similar taxon-specific block structure but also displayed significantly higher eukaryotic similarity to Crenarchaea than to Euryarchaea, as observed earlier (Rivera and Lake 1992). Phylogenetic reconstruction with these proteins, combined with other translational proteins (e.g., EFTu/EF1a, SRP54 and SRP19), again generated the same four-division clustering shown in Figure 4a, both for the concatenated set and for the individual proteins.
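The link between few informative positions and low bootstrap values can be illustrated with nonparametric bootstrapping, which resamples alignment columns with replacement. The sketch below does not reconstruct trees. As a simplified stand-in for the tree-based support values, it reports the fraction of replicates in which the mean p-distance from Eukarya to Crenarchaea is smaller than that to Euryarchaea. The statistic, the function names and the toy data are assumptions for illustration only.

```python
import random


def p_distance(seq1, seq2, gap="-"):
    """Fraction of differing sites among columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        return float("nan")
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_distance(alignment, group1, group2):
    dists = [p_distance(alignment[x], alignment[y]) for x in group1 for y in group2]
    return sum(dists) / len(dists)


def bootstrap_support(alignment, euk, cren, eury, replicates=1000, seed=0):
    """Fraction of column-resampled replicates in which Eukarya are, on
    average, closer to Crenarchaea than to Euryarchaea.

    A replicate with an undefined (NaN) distance counts as unsupported,
    because any comparison involving NaN is False.
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    wins = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {sp: "".join(seq[i] for i in cols) for sp, seq in alignment.items()}
        if mean_distance(resampled, euk, cren) < mean_distance(resampled, euk, eury):
            wins += 1
    return wins / replicates
```

When a protein contributes only a handful of informative columns, the resampled replicates differ strongly from one another and support of this kind drops. That is the same qualitative behavior reported above for many individual proteins.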
The two archaeal divisions coalesce for the concatenated set of ribosomal proteins L23p, L29p, L5p and L18p, which interact with the SRP complex (L23p and L29p) and the 5S rRNA (L5p and L18p) (Speek and Lind 1982), as happens with SRP54 and SRP19. These were the only small ribosomal protein sets with significant statistical resolving power that did not resolve the two major archaeal divisions relative to Eukarya. Other proteins that clearly support the closeness of Eukarya and Archaea (Puhler et al. 1989), such as the DNA clamp protein with its common six-domain trimer symmetry, also fail to resolve these two archaeal divisions. These results may provide information on the relative age of these components within Eukarya.
Only 23 of the 67 archaeal-eukaryotic ribosomal protein homologs contain neither N- nor C-terminal ubiquitous eukaryotic extensions, nor insertions relative to their archaeal homologs (Figure 3). Eleven proteins in the large subunit, L4p, L29p, L13p, L13e, L14e, L18e, L19e, L21e, L24e, L31e and L37e, show significant eukaryotic-specific C-terminal extensions, whereas only S6e and S17e have such C-terminal extensions within the small subunit. Large eukaryotic-specific N-terminal extensions are rarer and are found only in L23p, L30p, S27ae and S27e. As in many other protein families, there are also substantial additional length variations at both eukaryotic protein termini that show very limited eukaryotic phylogenetic correlation. With the exception of L13e, the eukaryotic-specific internal sequence insertions are relatively small. Finally, three eukaryotic-archaeal homologs, L21ae, S3p and S26e, appear to have commensurate insertions in both Eukarya and Archaea, that is, regions that show no sequence similarity yet lie between clearly alignable segments. Whether these blocks are actually homologous remains unclear without full comparative structural information.
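Classifying eukaryotic additions as N-terminal, C-terminal or internal follows directly from the alignment coordinates. The sketch below assumes a eukaryotic and an archaeal sequence taken from one alignment (same length, `-` for gaps) and treats any eukaryotic residue aligned to an archaeal gap as an addition. The function name, the return layout and the toy sequences are hypothetical.

```python
def classify_extensions(euk_aln, arch_aln, gap="-"):
    """Report eukaryotic N-terminal, C-terminal and internal additions.

    Returns (n_terminal_length, c_terminal_length, [(start, length), ...]),
    where internal insertion starts are 0-based alignment columns.
    """
    if len(euk_aln) != len(arch_aln):
        raise ValueError("aligned sequences must have equal length")
    arch_cols = [i for i, aa in enumerate(arch_aln) if aa != gap]
    if not arch_cols:
        raise ValueError("archaeal sequence contains no residues")
    first, last = arch_cols[0], arch_cols[-1]
    # A column is a eukaryotic addition if only the eukaryote has a residue there.
    is_insert = [e != gap and a == gap for e, a in zip(euk_aln, arch_aln)]
    n_term = sum(is_insert[:first])
    c_term = sum(is_insert[last + 1:])
    internal = []
    start = None
    # Columns first and last hold archaeal residues, so every internal run closes.
    for i in range(first, last + 1):
        if is_insert[i] and start is None:
            start = i
        elif not is_insert[i] and start is not None:
            internal.append((start, i - start))
            start = None
    return n_term, c_term, internal
```

For the toy pair `"MSTAKVLR--GGKEELLKKA"` / `"---AKVLRQQ--KE-LL---"`, the function reports a 3-residue N-terminal extension, a 3-residue C-terminal extension and internal insertions of lengths 2 and 1. Deciding whether such an addition is ubiquitous would then require repeating this across all eukaryotic species in the alignment.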
To complete the study, the mitochondrial ribosomal proteins homologous to the 34 universal ribosomal proteins were aligned and subjected to phylogenetic analysis. The 34 mitochondrial ribosomal proteins can be aligned with the bacterial proteins, although they display considerable modification and variation. In general, they are not alignable with their archaeal and eukaryotic homologs over any significant contiguous regions. This results not only from the archaeal-specific blocks in those homologs but also from increased variation within the blocks common to all. Thus, no consistent, statistically significant trees could be obtained when the mitochondrial ribosomal proteins were added to those of the three major phylodomains, owing largely to the limited variation in reliably alignable regions.

Discussion
That 67 of the 68 archaeal ribosomal proteins have clear eukaryotic homologs, and that the 34 eukaryotic ribosomal proteins within the universal set contain the archaeal-specific block structure, implies that these eukaryotic ribosomal proteins are predominantly of archaeal origin. The inclusion of the archaeal-specific blocks to the exclusion of bacterial-specific blocks also implies that the incorporation of the 34 universal proteins into the eukaryotic cell must have occurred long after the last common ancestor of modern Archaea and Bacteria. That ancestor is in part represented by the common blocks within these universal ribosomal proteins. A comparison of the sequence divergence within the three different phylodomain-specific block types suggests that, within the large uncertainties common to such analyses, the time from the last common ancestor of Archaea and Bacteria to the origin of the modern eukaryotic cell is nearly the same as the time from the origin of the eukaryotic cell to the present. The set of ribosomal proteins with no recognizable bacterial or archaeal homologs, unique and ubiquitous to modern eukaryotes, suggests an additional source.

Figure 2. A schematic representation of the multiple alignments of the archaeal and eukaryotic species for the 34 universal ribosomal proteins. Purple marks segments alignable, and thus homologous, across both the eukaryal and archaeal phylogenetic domains; blue marks segments unique and alignable only among Eukarya; red marks segments alignable only among Archaea. Dotted lines represent sequence regions of varying length that are unalignable across the full sets of either Eukarya or Archaea. The representative species are listed in the legend of Figure 4. Ribosomal protein L7ae has been identified in some, but not all, major representative Bacteria, and is therefore neither universal nor eukaryotic-archaeal specific.
It has long been noted that many eukaryotic metabolic proteins are closely related to those found in modern Bacteria. This has been explained as the result of genetic transfer from the endosymbiotic precursors of the mitochondria and chloroplasts. If this is correct, subtracting those components from the eukaryotic proteome provides insight into the makeup of an original host cell, perhaps one living largely by the phagocytosis of Bacteria, Archaea or both. The same subtractive logic should apply to the components most closely related to those of Archaea. The resulting remainders from the translational system are: (1) the 11 eukaryotic-specific ribosomal proteins; (2) the ubiquitous eukaryotic ribosomal protein extensions; (3) the insertions and additions to the eukaryotic rRNA; and (4) a number of other translational proteins. All suggest a non-archaeal, non-bacterial origin for these translational components.
This same subtractive logic was used to identify eukaryotic signature proteins (ESPs), which are found ubiquitously in eukaryotic cells and have no obvious bacterial or archaeal homologs (Hartman and Fedorov 2002). In addition to the eukaryotic-specific ribosomal proteins, these ESPs include a considerable number of RNA processing proteins, many signaling proteins, cytoskeleton proteins such as actin and tubulin, and motor proteins such as kinesin and myosin. Eukarya use membrane proteins and a calcium ion gradient to control their unique ability to phagocytize prokaryotes and to control motility (Berridge et al. 2003). This control by calcium ions is mediated through the Eukarya-unique cytoskeleton (microtubules and actin filaments) and sliding-filament motility driven by a motor protein (i.e., myosin). Bacteria and Archaea use membrane proteins and a proton gradient for motility and ATP synthesis (Harold 1977). This suggests that the direct ancestors of modern Archaea or modern Bacteria were not the source of these proteins and processes. One must assume that, like many differences between Bacteria and Archaea, the far more distinctive features of the eukaryotic cell accumulated over a long time period. Yet the mitochondria, and now the apparent crenarchaeal hybrid ribosome, support the view that much of the complexity of Eukarya arose through symbiosis on a short time scale. There are several current hypotheses concerning the origin of the eukaryotic cell and its various components, in particular its nucleus: (i) the nucleus formed autogenously, perhaps in a bacterial ancestral cell through a membrane surrounding the bacterial DNA, as currently observed in Planctomycetes (Lindsay et al. 2001); (ii) the nucleus formed through fusion of a bacterium and an archaeon (Lake and Rivera 1994, Gupta and Golding 1996, Martin and Müller 1998, Moreira and Lopez-Garcia 1998); and (iii) the nucleus formed through endosymbiosis, rather than fusion, of an archaeon or a bacterium into a distinct host cell. At least three candidates have been proposed for a potential host cell: a bacterium (Lake and Rivera 1994, Brown and Doolittle 1997, Horiike et al. 2001); an archaeon (Brown and Doolittle 1997, Martin and Müller 1998, the latter proposing a strictly hydrogen-dependent autotroph); and a chronocyte, an RNA-based cell (Hartman 1984, Sogin 1991, Hartman and Fedorov 2002). In all such hypotheses, it has been assumed that the appearance of mitochondria and chloroplasts was the result of later or possibly concurrent endosymbioses (Margulis and Bermudes 1985).
The simple fusion hypotheses, or mutual endosymbiosis of a bacterium and an archaeon, do not provide a direct explanation for the presence of ESPs. In particular, neither a bacterial host nor a bacterial membrane enclosing the DNA to form the nucleus would explain why the eukaryotic ribosomal protein set has no bacterial-like blocks independent of those shared with Archaea. Moreover, even though the majority of the mitochondrial ribosomal proteins are encoded in the modern eukaryotic nuclear genome, there is no detectable recombination between these mitochondrial riboproteins (or their bacterial precursors) and their nuclear-encoded archaeal or eukaryotic homologs. This appears to set limits on the interchangeability or compatibility of the two prokaryotic ribosomal protein components, and thus on the likely success of any horizontal transfers. It contrasts with the evidence for such transfers of tRNAs, their synthetases and some enzymes. It has been assumed that successful horizontal gene transfers require not only significant selective forces but also that the transferred proteins have few required complex interactions. Although the tRNAs interact with the full ribosomal complex, they are notable in being nearly identical in structure and function in all three domains. It is possible, but unlikely, that all modern Bacteria and Archaea acquired their unique and ubiquitous ribosomal components via massive horizontal gene transfer across their wide ranges of environments.
We therefore suggest that the eukaryotic ribosome is a modified ancient archaeal ribosome that was brought into a distinct host and was complemented by the addition of extra proteins, additional RNAs, and peptide and RNA insertions. It is now clear that archaeal endosymbiosis can happen, as observed in the endosymbiosis of a methanogenic archaeon by an anaerobic ciliate (Van Hoek et al. 2000). The reason for suggesting a possibly more ancient, original RNA-based host, a chronocyte, is the large role played by RNA in eukaryotic cells, as seen in retroposons, spliceosomes and the large number of small and large non-protein-coding RNAs, many of which are thought to be essential in gene regulation. However, such an RNA-based host would require a protein synthesis system compatible (e.g., having the same genetic code and tRNAs) with that of the acquired ancestral crenarchaeal ribosome at the time of such an endosymbiotic takeover. This in turn points to an early common ancestry for this shared protein-encoding system, among the earliest cellular life that later led to Bacteria, Archaea and the pre-eukaryotic host.
As shown in this work, the eukaryotic ribosomal proteins are more similar to those of Crenarchaea than to those of Euryarchaea. On the other hand, it is the Euryarchaea, which have an archaeal histone associated with their DNA, that are the probable precursors of the eukaryotic histones (Bailey et al. 2000). This implies that there were at least two endosymbioses of Archaea into a host cell, one leading to the formation of a nucleus with histone-packaged DNA (from a euryarchaeon as endosymbiont) and the other leading to the takeover of the cytoplasmic translational apparatus (from a crenarchaeon as endosymbiont). Although the majority of eukaryotic riboproteins clearly show closer relatedness to those of Crenarchaea than to those of Euryarchaea, a few informational proteins show no clear separation between Crenarchaea and Euryarchaea. These include SRP54 and the ribosomal proteins contacting the 5S RNA and the SRP. It is thus possible that SRP54 and its RNA-bound loop (Nagai et al. 2003), the 5S RNA and even the 5.8S RNA, together with their ribosome-contacting proteins, are more ancient and represent parts of an older protein synthesis system.
The block structure observed in the majority of proteins related to polypeptide synthesis suggests that the proposed endosymbiotic events occurred after a catastrophe that enveloped the cellular biosphere. In our recent paper on the block structure of the 34 universal ribosomal proteins among the Prokaryota (Vishwanath et al. 2004), we noted that phylogenetic analyses implied some series of catastrophic events, or the equivalent of an evolutionary bottleneck, that led to a major reduction in prokaryotic cellular diversity after the divergence of the lines of descent leading to the common ancestors of modern Bacteria and Archaea. Steitz and co-workers (Klein et al. 2004) recently identified a set of ribosomal proteins in Bacteria and Archaea that show no sequence or structural similarity, yet bind to the ribosome in identical positions and have loops or terminal extensions making nearly identical contacts with their respective rRNAs (L44e/L33, L21e/L27, L31e/L17, L37e/L34, L15e/L31, L24e/L19). We observed a similar situation in two universal ribosomal proteins, S8 and L4, where sequence blocks showed no similarity but the contacts of these blocks with the rRNA were almost identical (Vishwanath et al. 2004). This implies that multiple alternative solutions to the roles played by individual ribosomal proteins, such as stabilizing the structure and restricting the RNA fold space, were possible (Favaretto et al. 2005). Given the significant differences in these proteins and protein blocks, there appears to be little reason to assume that they were the only successful solutions. Yet only two types have survived to the present: one solution adopted by the last common ancestor of all modern Archaea, and the other by the ancestor of all modern Bacteria. Neither cross-species coalescence nor massive horizontal gene transfer seems as likely as one or more diversity-reducing events.
The present analysis suggests that the eukaryotic cell incorporated its modern translational machinery from a crenarchaeal ancestor at about the same time as the probable prokaryotic bottleneck noted above. It is possible that, besides the proposed radical cooling of the earth (the snowball hypothesis; Kirschvink et al. 2000), the introduction of oxygen was a major cause of this evolutionary bottleneck. Thus, because extreme selective pressures would have been acting at that time, only those pre-eukaryotic host organisms (i.e., chronocytes) that took up Bacteria (Martin et al. 2001, Esser et al. 2004) for oxygen protection and utilization, and that possessed the more selectively advantageous crenarchaeal translational system, may have survived. That pre-eukaryotic host would have been one of the last representatives of the very ancient RNA-based cellular organisms with an older protein coding system.

Table 1.
Phylogenetic information statistics for individual ribosomal proteins and translation factors. The statistics were calculated from the multiple alignments generated as described in Materials and methods, for the concatenation of the blocks within each protein that are common to all phylodomains considered. Five subsets of representative species were considered: AEB (Archaea, Eukarya, Bacteria), AE (Archaea, Eukarya), AB (Archaea, Bacteria), EC (Eukarya, Crenarchaea) and EY (Eukarya, Euryarchaea). For each subset, the table reports the percentages of informative (Inform.) and absolutely conserved (Abs. cons.) positions in the concatenated universal blocks (ccblk) of each protein separately. An informative position is defined as a position in the alignment where at least two different amino acids are observed, each in at least two different species. Phylogenetic tree reconstructions performed on each individual protein show resolution among the three phylodomains (B-A-E), resolution between the two archaeal divisions (C-Y) and clustering of the Eukarya with the Crenarchaea (E + C).
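The two percentages in Table 1 can be computed per species subset from a concatenated block alignment. In this sketch, an informative column follows the definition given in the caption. An absolutely conserved column is assumed to be one in which every selected species carries the same amino acid and none has a gap. The function name, the gap handling and the toy sequences below are assumptions, not the authors' code or data.

```python
from collections import Counter


def column_stats(alignment, species, gap="-"):
    """Percent informative and percent absolutely conserved columns.

    alignment: {species: aligned string}, all of the same length
    (e.g. the concatenated universal blocks of one protein).
    species: the subset to evaluate (e.g. Eukarya plus Crenarchaea).
    """
    rows = [alignment[sp] for sp in species]
    length = len(rows[0])
    informative = conserved = 0
    for col in range(length):
        residues = [row[col] for row in rows]
        counts = Counter(aa for aa in residues if aa != gap)
        # Informative: at least two distinct amino acids, each in >= 2 species.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
        # Absolutely conserved (assumed): one amino acid, no gaps, in every species.
        if gap not in residues and len(counts) == 1:
            conserved += 1
    return 100.0 * informative / length, 100.0 * conserved / length


toy = {"E1": "MKAV", "E2": "MKAI", "C1": "MRAV", "C2": "MRAI", "Y1": "MRSL", "Y2": "MKSL"}
ec = column_stats(toy, ["E1", "E2", "C1", "C2"])  # EC-style subset
ey = column_stats(toy, ["E1", "E2", "Y1", "Y2"])  # EY-style subset
```

On this invented example the EC subset gives (50.0, 50.0) and the EY subset (25.0, 25.0). This shows how a higher conserved fraction for Eukarya plus Crenarchaea would appear in such a table. It says nothing about the actual values in Table 1.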