The Distribution of eIF4E-Family Members across Insecta

Insects were among the earliest animals to invade terrestrial environments and were the first organisms to evolve controlled flight. Today, insects are the most diverse animal group on the planet and comprise the majority of described extant animal species. Moreover, they have a huge impact on the biosphere as well as on all aspects of human life and economy; therefore, understanding insect biology is of great importance. In insects, as in all cells, translation is a fundamental process for gene expression. However, translation in insects has been studied mostly in the model organism Drosophila melanogaster. We used all publicly available genomic sequences to investigate the distribution in insects of the genes encoding the cap-binding protein eIF4E, a protein that plays a crucial role in eukaryotic translation. We found multiple genes encoding eIF4E isoforms within the genus Drosophila. In striking contrast, insects outside this genus contain only a single eIF4E gene, related to D. melanogaster eIF4E-1. We also found that all insect species analyzed here contain only one Class II gene, termed 4E-HP. We discuss the possible evolutionary causes of the multiplicity of eIF4E genes within the genus Drosophila.


Introduction
Insects are the most diverse animal group on Earth and comprise over half of all extant described species, thus dominating all terrestrial ecosystems [1][2][3][4]. Winged insects were the first organisms to evolve controlled flight, some 120, 200, and 300 million years (Myr) before flying reptiles, birds, and bats, respectively. Indeed, wings are believed to have contributed largely to the spectacular diversification of insects, because they allowed insects to explore and invade all terrestrial ecosystems, escape predators, and exploit scattered resources [2,5]. Many studies show that insect diversity has also been strongly shaped by other evolutionary and ecological processes, including their relatively ancient geological age, low extinction rate, ecological niche occupancy, sexual selection, and sexual conflict [1].
Insects originated 434-421 Myr ago during the Silurian Period, and it has been suggested that the earliest terrestrial faunas already included wingless insects [2,5,6]. Indeed, the aquatic-terrestrial transition of insect ancestors is associated with the earliest vascular land plant fossils. Thus, it is thought that true insects evolved from an aquatic arthropod that formed an ecological association with the earliest vascular plants, and that both lineages subsequently coevolved [2,6]. By the Permian (299-251 Myr ago) nearly all extant insect orders had already emerged, and a second spectacular radiation occurred later, in the Jurassic. Insects have been diverging ever since [2,7,8]. Winged insects, which account for more than 98% of the class Insecta, emerged when early arborescent plants (pteridophytes, mostly ferns and horsetails) evolved 380-354 Myr ago, during the Devonian. It is hypothesized that insect flight arose as an adaptation to the increasing height of trees, and that a number of highly successful insect species coevolved with flowering plants [2,5,6,9].
Besides their crucial ecological importance in all terrestrial ecosystems, insects have a huge direct impact on all aspects of human life and economy. In agriculture, some species cause huge damage to crops (e.g., aphids and weevil beetles), whilst others are of great benefit to flowering plants, which depend on pollinating species (e.g., bees, wasps, and butterflies). Many species can spread human pathogens (e.g., mosquitoes, fleas, and bed bugs), and others are key model organisms for basic research (Drosophila). Furthermore, several species serve as research subjects in studies of social behavior (e.g., bees and ants). Because of their overall significance, immense efforts have been devoted for many years to studying all aspects of insect biology. However, many biological processes, including translation, are still poorly studied at the molecular level. Therefore, further characterization of insect translation is necessary.
Most eukaryotic mRNAs are translated by a cap-dependent mechanism, whereby the mRNA is recruited to the ribosome through recognition of the 5′ cap structure (m7GpppN, where N is any nucleotide) by the cap-binding protein eIF4E in complex with the scaffold protein eIF4G and the RNA helicase eIF4A [10,11]. Three-dimensional studies demonstrated that eIF4E bound to cap analogues resembles "cupped hands" in which the cap structure is stacked between two highly conserved tryptophan residues (Trp-56 and Trp-102 of mouse eIF4E) through π-stacking interactions. A third conserved tryptophan residue (Trp-166 of mouse eIF4E) binds the N7-methyl moiety of the cap structure [12][13][14][15]. Due to its pivotal role in translation, eIF4E activity is tightly regulated. Perhaps the most prominent regulatory mechanism is performed by eIF4E-binding proteins (4E-BPs), which bind eIF4E via an eIF4E-binding motif that is shared with eIF4G. 4E-BPs act as competitive inhibitors of the eIF4E-eIF4G interaction and therefore of translation [10,16,17]. Another mechanism regulating eIF4E activity in some metazoans, including human, Drosophila, and Aplysia, is phosphorylation of Ser-209 (mouse protein numbering; Ser251 in Drosophila eIF4E-1) [18][19][20].
Recent advances in sequencing technology allow comparative analysis of multiple genomes across a wide range of evolutionarily related species. Thus, gene and protein annotations from twelve different Drosophila species [43] and from other insect species [44,45] are now available. Here we investigated the distribution of the cap-binding proteins eIF4E and 4E-HP across the class Insecta.
Jagus and colleagues proposed a classification of eIF4Es from 230 species into three classes according to variations in the residues Trp-43 and Trp-56 (human eIF4E numbering) [45,49]. Class I members contain both Trp residues; Class II members contain Tyr, Phe, or Leu at the first position and Tyr or Phe at the second position; Class III proteins contain Trp at the first position and Cys or Tyr at the second position [45,49]. In the present study we follow this classification. Since D. melanogaster is the best-studied of all insects and one of the best-characterized model organisms (its entire genome has been available for over a decade; http://flybase.org/ [50]), and because among insects only the eIF4Es and 4E-HP from D. melanogaster have been characterized [19,, we chose the D. melanogaster eIF4E sequences, numbering, and nomenclature (http://flybase.org/ [25,26]) as a reference. To avoid confusion with another nomenclature [45,49], here we keep the fly database (http://flybase.org/) nomenclature, referring when necessary to the Class to which each eIF4E belongs.
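The three-class rule described above can be written down compactly. The following is only a minimal sketch in Python: the function name `classify_eif4e` and the fallback label "unclassified" are our own illustrative choices, not part of the published classification, and the two arguments are assumed to be the one-letter residues found at the positions equivalent to Trp-43 and Trp-56 of human eIF4E.

```python
# Residues allowed at the two diagnostic positions (equivalent to Trp-43 and
# Trp-56 of human eIF4E), in one-letter code, as stated in the text.
CLASS_RULES = {
    "Class I": ({"W"}, {"W"}),
    "Class II": ({"Y", "F", "L"}, {"Y", "F"}),
    "Class III": ({"W"}, {"C", "Y"}),
}


def classify_eif4e(first: str, second: str) -> str:
    """Return the class matching the residue pair.

    'unclassified' is a hypothetical fallback label for pairs that match
    none of the three rules.
    """
    first, second = first.upper(), second.upper()
    for name, (allowed_first, allowed_second) in CLASS_RULES.items():
        if first in allowed_first and second in allowed_second:
            return name
    return "unclassified"


if __name__ == "__main__":
    for pair in [("W", "W"), ("Y", "F"), ("W", "C"), ("A", "W")]:
        print(pair, classify_eif4e(*pair))
```

The three rules are mutually exclusive, so the order in which they are checked does not affect the result.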

Results and Discussion
3.1. eIF4E Proteins across the Genus Drosophila. Gene duplication of eIF4E is particularly striking in D. melanogaster, with seven different cognates of Class I eIF4Es (eIF4E-1 through eIF4E-7) and one Class II gene, termed 4E-HP [25,26]. Although sequence comparisons of all D. melanogaster eIF4Es are shown elsewhere [25,26], a comparison of these proteins including an extended version of eIF4E-6 (see below) is shown in Figure 1. and D. mojavensis contains three cognates (eIF4E-4, -5, and -7) (Table 1). It has been shown that D. melanogaster eIF4E-1 and eIF4E-2 arise by alternative splicing from the same gene (eIF4E-1/2), the two proteins differing only in amino acids at the N-terminus. While eIF4E-1 contains the peptide sequence MQSDFHRMKNFANPKSMF, eIF4E-2 contains MVVLETE instead [23,24] (Figure 1) Figure 2). The high variability in the eIF4E-1 N-terminus among Drosophila species suggests that this region of the protein has no biological relevance.
All residues involved in cap and eIF4G/4E-BP binding, as well as in phosphorylation, are conserved in eIF4E-1 across the genus Drosophila (Figure 2). In eIF4E-3, residues involved in eIF4G/4E-BP binding are mutated at two positions, namely, Trp103>Phe and Leu160>His (numbering according to D. melanogaster eIF4E-3; Figure 3). This significant alteration may explain the weak binding to eIF4G and 4E-BP shown in the yeast two-hybrid system [26]. Both changes are strongly conserved in eIF4E-3 across the genus Drosophila. Moreover, eIF4E-3 from all Drosophila species lacks the counterpart of the phosphorylatable Ser251 of D. melanogaster eIF4E-1, possessing a proline instead [31] Figure 4). eIF4E-5 varies considerably in length, ranging from 204 amino acids in D. persimilis to 271 amino acids in D. ananassae, and the N-terminus of eIF4E-5 (amino acids 1-53) is highly variable. However, eIF4E-5 is highly conserved from amino acid 54 onward (Figure 5). D. persimilis eIF4E-5 also diverges from its orthologs in at least ten functionally important amino acids (Figure 5).
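The substitution notation used above (e.g., Trp103>Phe, numbered according to the variant protein) can be generated from a pairwise alignment. The sketch below is illustrative only: the function `list_substitutions` is a hypothetical helper, and it assumes two pre-aligned sequences of equal length with "-" marking gaps; it is not the procedure used to build the alignments in the figures.

```python
# One-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def list_substitutions(canonical: str, variant: str) -> list[str]:
    """List differences between two aligned sequences as 'Trp103>Phe'.

    Positions are numbered along the variant sequence (gaps not counted).
    Columns with a gap in either sequence are not reported.
    """
    if len(canonical) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0
    for can_res, var_res in zip(canonical.upper(), variant.upper()):
        if var_res != "-":
            position += 1  # advance only on real variant residues
        if can_res == "-" or var_res == "-" or can_res == var_res:
            continue
        substitutions.append(
            f"{THREE_LETTER[can_res]}{position}>{THREE_LETTER[var_res]}"
        )
    return substitutions


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not real eIF4E fragments.
    print(list_substitutions("MK-WL", "MKAFH"))
```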

Recent experimental evidence supports an extended C-terminus of eIF4E-6 (Tettweiler, Hernández, Sonenberg, and Lasko, unpublished), which was not detected in previous studies [25,26]. This extended eIF4E-6 shows the highest similarity to eIF4E-3, and some of its functionally important residues have diverged from those of eIF4E-1 (Figure 1). One of the differences is the lack of the phosphorylatable Ser251 (numbering of eIF4E-1). Surprisingly, although the extended eIF4E-6 possesses all amino acids involved in eIF4G/4E-BP binding, experimental evidence showed that it binds neither of them (Tettweiler, Hernández, Sonenberg, and Lasko, unpublished).
The extended eIF4E-6 could be detected in only five species, in all of which the residues important for cap binding are conserved (Figure 6). In D. erecta, a conservative substitution His>Arg is observed at position 33 (numbering according to D. melanogaster eIF4E-6; Figure 6), a residue essential for eIF4G/4E-BP binding. As with eIF4E-3, no eIF4E-6 from any species has the counterpart of eIF4E-1 Ser251 (Figure 6). eIF4E-7 is the longest of the Class I family members, ranging from 301 amino acids in D. virilis to 458 amino acids in D. ananassae (Figure 7). The large variation in length is attributed to the variability of the N-terminal moiety of the protein (Figure 7). Although eIF4E-7 orthologs are most similar to eIF4E-1, eIF4E-7 proteins from all Drosophila species cluster together in separate phylogram branches (Figure 8). Several species lack functionally important residues in the eIF4E-7 C-terminus. In particular, in D. simulans eIF4E-7 the counterpart of eIF4E-1 Ser251 is replaced by Gln, although it is conserved in other Drosophila species (Figure 7).
Overall, our analyses indicate that the seven eIF4E cognates in the genus Drosophila form discrete clusters

eIF4E Proteins in Other Insects.
We analyzed protein annotations from all insect genomes that are publicly available. These include species representing non-Drosophila Diptera, as well as Hymenoptera, Coleoptera, Lepidoptera, and Hemiptera. Outside the genus Drosophila, eleven more Class I eIF4Es were identified in different insect species (Figures 9 and 10). In contrast to Drosophila species, which contain three to seven different Class I eIF4E cognates, we identified only a single Class I eIF4E gene in each of the other insect genomes analyzed, all of them related to D. melanogaster eIF4E-1 and with a highly variable N-terminal moiety (Figure 9). All amino acids described to be involved in cap and eIF4G/4E-BP binding are conserved in all insect eIF4Es analyzed. The exception is Leu174 (numbering according to D. melanogaster eIF4E-1), which is replaced by Lys in A. pisum eIF4E-1.
Several evolutionary forces could account for the multiplicity of eIF4E genes in the genus Drosophila, as opposed to the other insect lineages, which contain only one eIF4E gene. Diptera experienced three episodes of explosive radiation, one of which occurred during the emergence of Schizophora (the group that includes D. melanogaster and its close relatives) in the early Tertiary Period (65 Myr ago). The Schizophora radiation originated most of the family-level diversity in Diptera, accounting for more than a third of extant fly diversity [2,[51][52][53]. Interestingly, the temporal pattern of fruit fly speciation corresponds with the major periods of climate cooling and habitat fragmentation during the Cenozoic Era, which could be one of the causes of the rapid fruit fly speciation [52]. The vigorous burst of diversification of the Schizophora also coincided with the emergence of some developmental novelties, including the ptilinal sac, an improved mechanism for the fly to escape from its puparium [53]. Since flies originated in wet environments, it has been suggested that the emergence of a puparium impervious to its surroundings allowed flies to adapt to almost all substrates and to occupy a broad range of trophic niches [53]. The explosive diversification of the Schizophora could have favored the repeated events of eIF4E duplication in Drosophila species. It is conceivable that specific modes of temporal and spatial regulation of protein synthesis driven by different eIF4E isoforms conferred an adaptive advantage in the face of these environmental changes.
At the molecular level, genomic studies revealed that repeated tandem gene duplication generated ∼80% of the nascent genes during the evolution of the D. melanogaster subgroup, and that retroposition generated ∼10% of the new genes in these species [54,55]. Five to eleven new functional genes per million years originated during the evolution of this lineage [54,55]. These findings may explain why the D. melanogaster eIF4E-1/2, eIF4E-3, eIF4E-4, and eIF4E-5 genes lie within a narrow region of chromosome 3L and share exon/intron genomic structure [23,24,26]. Thus, it is conceivable that these genes originated by tandem duplication of an original eIF4E-1 gene. On the other hand, the eIF4E-6 and eIF4E-7 genes, which lie on different chromosomes and contain no introns in the core region of the genes [26], could have originated by retroposition events from eIF4E-3 and eIF4E-1, respectively. Notably, D. mojavensis encodes only eIF4E-4, -5, and -7, but not eIF4E-1. Since eIF4E-7 appears to be an extended eIF4E-1, we speculate that eIF4E-7 substitutes for eIF4E-1 in this species, which at a certain point in evolution lost the original eIF4E-1 gene. When it becomes available, the chromosomal location of the D. mojavensis eIF4E-7 gene could corroborate this hypothesis.
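The reasoning above (same chromosomal region plus shared intron structure suggests tandem duplication; a different chromosome plus an intronless core suggests retroposition) can be expressed as a simple heuristic. This is only a sketch under stated assumptions: `GeneRecord` and `infer_origin` are hypothetical names, the locations given for eIF4E-6 and eIF4E-7 are placeholders because the text states only that they lie on different chromosomes, and real inference of gene origin requires far more evidence than these two features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    name: str
    chromosome_arm: str      # e.g. "3L"; placeholder strings where unknown
    has_core_introns: bool   # introns present in the core coding region


def infer_origin(gene: GeneRecord, cluster_arm: str = "3L") -> str:
    """Apply the two-feature heuristic described in the text."""
    same_arm = gene.chromosome_arm == cluster_arm
    if same_arm and gene.has_core_introns:
        return "tandem duplication"
    if not same_arm and not gene.has_core_introns:
        return "retroposition"
    return "undetermined"


# Features as described in the text; "elsewhere-a"/"elsewhere-b" are
# placeholders, not real chromosome assignments.
GENES = [
    GeneRecord("eIF4E-3", "3L", True),
    GeneRecord("eIF4E-4", "3L", True),
    GeneRecord("eIF4E-5", "3L", True),
    GeneRecord("eIF4E-6", "elsewhere-a", False),
    GeneRecord("eIF4E-7", "elsewhere-b", False),
]

if __name__ == "__main__":
    for gene in GENES:
        print(gene.name, "->", infer_origin(gene))
```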

4E-HP in the Genus Drosophila.
We also analyzed the Class II eIF4E, namely, 4E-HP, in species of the genus Drosophila. In striking contrast to the Class I eIF4Es, a single copy of the 4E-HP gene was identified in each Drosophila species. Interestingly, 4E-HP displays unusually strong conservation in the N-terminal moiety of the protein, and residues important for eIF4G/4E-BP binding diverge considerably from eIF4E-1 in all Drosophila species (Figure 11). This is the case for Asn46, Gln82, Glu139, Asn140, and Met143 (positions refer to D. melanogaster 4E-HP), which are His, Glu, Leu, Asp, and Leu residues, respectively, in most D. melanogaster eIF4Es. Accordingly, D. melanogaster 4E-HP does not bind eIF4G [26] but interacts with Bicoid (bcd) and Brain Tumor (Brat) instead [27,33]. Many residues critical for cap binding underwent both conservative and nonconservative mutations in 4E-HP from all Drosophila species analyzed. Thus, Tyr68, Glu102, Gln124, Lys164, Pro166, and Ser169 of 4E-HP correspond to Trp, Asp, Arg, Arg, Lys, and Lys, respectively, in all D. melanogaster eIF4Es. The counterpart of the phosphorylatable Ser251 of D. melanogaster eIF4E-1 is conserved in most species of the genus Drosophila. Finally, D. ananassae and D. mojavensis 4E-HP are considerably shorter than 4E-HP in other species (Figure 11).
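To make the distinction between conservative and nonconservative changes concrete, the cap-binding residue pairs listed above can be sorted by a simple physicochemical grouping. The grouping below is an assumption chosen for illustration (several alternative schemes exist), and `substitution_type` and `classify_pairs` are hypothetical helper names; the sketch is not the criterion used in the original analysis.

```python
# Assumed physicochemical grouping of amino acids (three-letter codes).
# Gly, Pro, and Cys are each placed in their own group.
RESIDUE_GROUP = {
    "Phe": "aromatic", "Trp": "aromatic", "Tyr": "aromatic",
    "Asp": "acidic", "Glu": "acidic",
    "Lys": "basic", "Arg": "basic", "His": "basic",
    "Ser": "polar", "Thr": "polar", "Asn": "polar", "Gln": "polar",
    "Ala": "aliphatic", "Val": "aliphatic", "Leu": "aliphatic",
    "Ile": "aliphatic", "Met": "aliphatic",
    "Gly": "glycine", "Pro": "proline", "Cys": "cysteine",
}

# (4E-HP residue with its position, corresponding eIF4E residue), from the text.
CAP_BINDING_PAIRS = [
    ("Tyr68", "Trp"), ("Glu102", "Asp"), ("Gln124", "Arg"),
    ("Lys164", "Arg"), ("Pro166", "Lys"), ("Ser169", "Lys"),
]


def substitution_type(residue_a: str, residue_b: str) -> str:
    """Label a residue pair under the assumed grouping."""
    if residue_a == residue_b:
        return "identical"
    if RESIDUE_GROUP[residue_a] == RESIDUE_GROUP[residue_b]:
        return "conservative"
    return "nonconservative"


def classify_pairs(pairs):
    """Map each 4E-HP site (e.g. 'Tyr68') to its substitution type."""
    return {site: substitution_type(site[:3], eif4e_res)
            for site, eif4e_res in pairs}


if __name__ == "__main__":
    for site, kind in classify_pairs(CAP_BINDING_PAIRS).items():
        print(site, kind)
```

Under this particular grouping, three of the six pairs fall within one group and three span two groups, which is in line with the statement that both kinds of change occur.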

4E-HP in Other Insects.
Further BLAST searches again identified single-copy 4E-HP genes in other insect species. Sequence comparison showed strong conservation in the core region of the protein, although the N- and C-termini are less conserved (Figure 12). In contrast to 4E-HP from Drosophila species, all residues important for eIF4G/4E-BP binding in eIF4Es are conserved in 4E-HP from all analyzed insects outside the genus Drosophila. This might suggest that 4E-HP in non-Drosophila insects does bind eIF4G/4E-BP. As in 4E-HP from all Drosophila species, most residues critical for cap binding also show conservative changes in 4E-HP from the other Insecta species. The counterpart of the phosphorylatable Ser251 of eIF4E-1 is conserved only in 4E-HP from D. melanogaster, T. castaneum, and A. pisum (Figure 12). A phylogram showing the relationships among 4E-HPs from all insects analyzed is shown in Figure 13.
Phylogram construction including all Drosophila 4E-HP and eIF4E sequences showed that all 4E-HPs cluster separately from all eIF4Es (not shown). Moreover, 4E-HP is widespread across metazoa, plants, and some fungi [45], and the D. melanogaster and human 4E-HPs are able to bind the 5′ cap structure of the mRNA but not eIF4G [26,56], thereby acting as translational repressors of mRNAs associated with 4E-HP [27,33,34]. In addition, the A. thaliana ortholog (termed nCBP) [57] can compete with reticulocyte eIF4E to reduce m7GTP binding, and the C. elegans ortholog (termed IFE-4) [58] can be found associated with small ribosomal subunits; both observations are consistent with a regulatory function. Together, these findings led to the suggestion that 4E-HP diverged from a widespread ancestral Class I eIF4E into a translational repressor in mammals and in Drosophila [59]. This is supported by the observation that all residues important for eIF4G/4E-BP binding in eIF4Es are highly conserved in 4E-HP from non-Drosophila insects, but not in Drosophila species (Figure 12). Thus, 4E-HP from insects outside the genus Drosophila would be expected to bind eIF4G and promote translation. It is important to test this controversial hypothesis experimentally.
3.5. Class III eIF4Es. Among insects, only two partial Class III eIF4Es were identified, one in A. mellifera and one in H. coagulata. Both are missing the start methionine and were therefore not further analyzed.
Comparative and Functional Genomics

Concluding Remarks
Constant updating of genomic data and annotations, as well as improved search algorithms, provided a more comprehensive overview of insect eIF4E cognates than was previously possible. Here we presented an updated analysis of eIF4Es and 4E-HP across Insecta. This analysis revealed that eIF4E is a single-copy gene in all insects analyzed outside the genus Drosophila, whereas within this genus the gene underwent a striking multiplication, along with the explosive radiation this lineage went through in the early Tertiary. eIF4E diversification led to variability of biochemical properties and to physiological specialization, as documented for some D. melanogaster eIF4Es. It would be worthwhile to investigate whether this is also the case for other species with several eIF4E cognates, as sequence alignments showed how diverse this protein is in the genus Drosophila. It would also be interesting to search for novel, so far unknown, 4E-BPs in other Drosophila species. Moreover, it is possible that different eIF4Es translate specific target mRNAs. eIF4E from more insect species must be analyzed to obtain a better picture of the evolution and diversity of eIF4E in this group, and to see whether multiple eIF4E genes have also arisen in other insect lineages. If so, correlating eIF4E evolution with the natural history of those lineages might lead us to find general, underlying forces driving the evolution of the translation apparatus.