A substantial number of “retrogenes” that are derived from the mRNA of various intron-containing genes have been reported. A class of mammalian retroposons, long interspersed element-1 (LINE1, L1), has been shown to be involved in the reverse transcription of retrogenes (or processed pseudogenes) and non-autonomous short interspersed elements (SINEs). The
Gene duplication is a fundamental process of gene evolution [
Schematic representation of the formation of a processed pseudogene.
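The steps that give a processed pseudogene its defining features can be mimicked with a toy string model. Introns are spliced out, a poly(A) tail is appended, and the copy is inserted between two staggered nicks, so the short stretch between the nicks ends up on both flanks as a target site duplication (TSD). This is a minimal sketch under stated assumptions, not a model taken from the cited work. The exon coordinates, the poly(A) length, and the nick positions are hypothetical inputs.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as 0-based, half-open (start, end) intervals; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def processed_copy(pre_mrna: str, exons: list[tuple[int, int]], polya_len: int = 20) -> str:
    """Spliced mRNA plus a poly(A) tail (the tail length is an arbitrary assumption)."""
    return splice(pre_mrna, exons) + "A" * polya_len


def insert_with_tsd(genome: str, first_nick: int, second_nick: int, insert: str) -> str:
    """Insert a copy between staggered nicks at first_nick < second_nick.

    The overhang genome[first_nick:second_nick] is filled in on both sides after
    integration, so it appears twice: once on each flank of the insert.
    """
    if not 0 <= first_nick < second_nick <= len(genome):
        raise ValueError("nick positions must satisfy 0 <= first < second <= len(genome)")
    return genome[:second_nick] + insert + genome[first_nick:]


if __name__ == "__main__":
    # Hypothetical two-exon gene: exon 1, a GT...AG intron, exon 2.
    pre = "ATGGCC" + "GTAAGTAG" + "CCCTAA"
    copy = processed_copy(pre, [(0, 6), (14, 20)], polya_len=8)
    print(copy)  # intron-less and poly(A)-tailed
    # Lower-casing the insert makes the duplicated TTTT flanks easy to spot.
    print(insert_with_tsd("GGGGTTTTAAAACCCC", 4, 8, copy.lower()))
```

Real processed pseudogenes are often 5′-truncated and mutated, and their TSDs vary in length; the sketch captures only the intron loss, the 3′ poly(A), and the flanking duplication.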
Eukaryotic genomes generally contain an extraordinary number of retroposons such as long terminal repeat (LTR) retrotransposons, LINEs or non-LTR retrotransposons, and SINEs [
Schematic representation of a SINE and a LINE that have the same 3′-end sequence. Three-dimensional protein structures are taken from the L1-encoded ORF1 protein [
The Bombyx R2 LINE protein, which has sequence-specific endonucleolytic and RT activity, makes a specific nick in one of the DNA strands at the insertion site and uses the 3′ hydroxyl group that is exposed by this nick to prime the reverse transcription of its RNA transcript [
SINEs are non-autonomous retroposons, the 5′-end sequences of which are derived from tRNA, 5S rRNA, or 7SL RNA with promoter activity for RNA polymerase III (Figure
Eickbush’s group conducted comprehensive phylogenetic analysis of LINEs using extended sequence alignment of their RT domains [
The 3′-end sequences of various SINEs originated from a corresponding LINE (Figure
Identification of SINE/LINE pairs [
SINE | Species | Promoter | LINE tail | Description of SINE/LINE pair
---|---|---|---|---|
MIR (CORE-SINEs: …) | All mammals | tRNA | L2 |
CORE-SINEs (MIR3/Ther-2) | Mammals | tRNA | L3 |
CORE-SINEs (Mar-1/MAR1_MD) | Marsupials | tRNA | RTE-3_MD |
MAR4 | Opossum and wallaby | (5′-end of RTE) | RTE-2 (MD, ME) |
RTESINE1 | Opossum | (5′-end of RTE) | RTE-1_MD |
Ped-1 | Springhare | 5S rRNA | BovB_Pca |
Ped-2 | Springhare | tRNA (ID SINE) | BovB_Pca |
Bov-tA | Ruminants | tRNA-Glu | Bov-B |
Bov-A2 | Ruminants | (5′-end of BovB) | Bov-B |
SINE2-1_EC | Horse | tRNA | RTE-1_EC |
Afro SINEs | All Afrotherians | tRNA | RTE1 (LA, Pca) |
RTE1-N1_LA | Elephant | (5′-end of RTE) | RTE1_LA |
SINE2-1_Pca | Hyrax | tRNA | RTE1_Pca |
TguSINE1 | Zebra finch | tRNA-Ile | CR1-X |
Tortoise Pol III/SINE | Tortoises and turtles | tRNA-Lys | PsCR1 |
Sauria SINE | Lizard | tRNA | Anolis Bov-B |
Anolis SINE 2 | Lizard | (Box A & B) | Anolis LINE 2 |
SINE2-1B_Acar/… | Lizard | tRNA | Vingi-2_Acar |
V-SINEs (SINE2-1_XT) | Frog | tRNA | L2-4_XT |
CORE-SINEs (MIR_Xt) | Frog | tRNA | L2-5_XT |
Sma I | Chum and pink salmon | tRNA-Lys | SalL2 |
Fok I | Charr | tRNA-Lys | SalL2 |
SlmI | All salmonids | tRNA-Leu | RSg-1 |
CORE-SINEs (Hpa I) | All salmonids | tRNA | RSg-1 |
CORE-SINEs | Cichlid fish | tRNA | CiLINE2 |
CORE-SINEs (UnaSINE1, UnaSINE2) | Eel | tRNA | UnaL2 |
HAmo SINE | Carp | tRNA | HAmoL2 |
DeuSINEs | Mammals, chicken, … | 5S rRNA | CR1-4_DR (CR1-7, CR1-9, CR1-13) |
DeuSINEs | Coelacanth and dogfish shark | tRNA | CR1-4_DR-like |
DeuSINEs (OS-SINE1) | Salmon and trout | 5S rRNA | RSg-1 |
V-SINEs (HE1) | Sharks and rays | tRNA | HER1 |
V-SINEs (DANA) | Zebrafish | tRNA | CR1-3DR/ZfL3 |
V-SINEs (Lun1) | Lungfish | tRNA | LfR1 |
SINEX-1_CM/SINE2-1_CM | Elephant shark | tRNA | CR1-2_CM | DQ524334
DeuSINEs (BflSINE1) | Amphioxus | tRNA | Crack-16_BF |
SURF1/SINE2-4c_SP | Sea urchin | tRNA | CR1-4_SP |
DeuSINEs (SINE2-3_SP) | Sea urchin | tRNA | CR1Y_SP (CR1X_SP) |
SINE2-8_SP | Sea urchin | tRNA | L2-1_SP/CR1-3_SP |
Gecko | Mosquito | tRNA | I-74_AAe (MosquI, I-58, I-59, I-62, I-64, …) |
Nve-Nin-DC-SINE-1 | Sea anemone | tRNA | L2-22_NV |
Nve-Nin-DC-SINE-2 | Sea anemone | tRNA | CR1-5_NV |
Nve-Nin-DC-SINE-3 | Sea anemone | tRNA | CR1-15_NV |
SINE2-1_NV | Sea anemone | tRNA | CR1-16_NV |
SINE2-5_NV | Sea anemone | tRNA | Rex1-24_NV |
Mg-SINE | Rice blast fungus | tRNA | MgL/MGR583 | AF018033
SINE2-1_BG | Powdery mildew fungus | tRNA | Tad1-24_BG (HaTad1-3, 1-5) |
EdSINE1 (SINE-like) | Amoeba | Unknown | R4-1_ED |
R4-N1_ED (SINE-like) | Amoeba | Unknown | R4-1_ED |
EhLSINE1/ehapt2 (SINE-like) | Amoeba | Unknown | EhLINE1/EhRLE1 |
EhLSINE2 (SINE-like) | Amoeba | Unknown | EhLINE2/EhRLE3 |
TS | Tobacco | tRNA | RTE-1_Stu |
ZmSINE2/SINE2_SBi | Maize | tRNA | LINE1-1_ZM |
ZmSINE3 | Maize | tRNA | LINE1-1_ZM |
SINEX-1_CR | | Unknown | RandI-2/DualenCr3 |
SINEX-2_CR | | Unknown | RandI-2 (RandI-3) |
SINEX-3_CR | | tRNA | L1-1_CR |
SINEX-4_CR | | Unknown | RandI-2 (RandI-3) |
SINEX-5_CR/SINEX-6_CR | | tRNA | RandI-5 |
Sequence comparison of tobacco TS SINE with its partner LINE. The entire sequence of the TS SINE was aligned with the 3′-end sequence (~200 nucleotides) of a potato RTE-clade LINE. Dots and hyphens represent identical nucleotides and gaps, respectively. The tRNA-related region of the SINE is underlined, with the promoter sequences for RNA pol III (A & B boxes) highlighted in red. Nucleotide positions are shown on the right.
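The A and B boxes highlighted in this alignment are the internal RNA polymerase III promoter elements that tRNA-derived SINEs inherit from tRNA. One quick way to flag candidate boxes is to translate an IUPAC consensus into a regular expression and scan the sequence. The consensus strings used below (A box TRGCNNAGYNGG, B box GGTTCGANNCC) are commonly quoted forms and are assumptions here. Published consensuses differ slightly, and a motif hit alone does not demonstrate promoter activity.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed consensus strings; substitute whichever consensus you prefer.
A_BOX = "TRGCNNAGYNGG"
B_BOX = "GGTTCGANNCC"


def iupac_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif into a regex that matches exactly len(motif) bases."""
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))


def find_motif(seq: str, motif: str) -> list[int]:
    """Return every 0-based start of motif on the given strand (overlapping hits allowed)."""
    pattern = iupac_to_regex(motif)
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    last_start = len(seq) - len(motif)
    return [i for i in range(last_start + 1) if pattern.match(seq, i)]


if __name__ == "__main__":
    toy = "CC" + "TGGCTCAGTTGG" + "AAAA" + "GGTTCGAATCC" + "TT"  # hypothetical sequence
    print("A box at", find_motif(toy, A_BOX))
    print("B box at", find_motif(toy, B_BOX))
```

Only the given strand is scanned; a genome-wide search would also scan the reverse complement.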
Since the R2 LINE protein specifically recognizes the sequence near the 3′-end of the RNA transcript for the initiation of first-strand synthesis [
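Because the partner LINE recognizes the 3′ end of its RNA, a SINE/LINE pair is identified by a shared 3′ tail. A minimal sketch of that comparison is a global (Needleman–Wunsch) alignment of the two 3′ regions, followed by a percent-identity count and the dot-and-hyphen display used in the TS SINE/RTE comparison above. The scoring values (+1 match, −1 mismatch, −2 per gap, linear gap cost) and the identity definition (identical columns divided by alignment length) are assumptions. The original comparisons may have used other tools and parameters.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(x: str, y: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 0.0


def dot_notation(reference: str, query: str) -> str:
    """Show query against reference: '.' identical base, '-' gap, otherwise the query base."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    return "".join("-" if q == "-" else "." if q == r else q
                   for r, q in zip(reference, query))


if __name__ == "__main__":
    line_3p = "TTGATGGCACCTTTGTTG"  # hypothetical LINE 3' end
    sine_3p = "TTGAGGCACCTATGTTG"   # hypothetical SINE 3' end
    aln_line, aln_sine, s = needleman_wunsch(line_3p, sine_3p)
    print(aln_line)
    print(dot_notation(aln_line, aln_sine))
    print(f"score={s}, identity={percent_identity(aln_line, aln_sine):.1f}%")
```

The quadratic table is fine for tails of a few hundred nucleotides; genome-scale searches would use a heuristic aligner instead.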
Figure
Relationship between the number of SINE/LINE pairs and the number of LINEs in each clade. The vertical axis shows the number of SINEs with a LINE tail [
Mammalian PPs and retrogenes were probably mobilized by L1s because they end in poly(A), have L1-type target site duplications, and are inserted at L1-type endonuclease cleavage sites [
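A target site duplication is the short direct repeat that flanks an L1-mediated insertion. Given a genomic sequence and the coordinates of an inserted copy (hypothetical inputs here), the TSD can be searched for as the longest exact match between the end of the left flank and the start of the right flank. This is a simplified sketch. It requires an exact match and assumes the insertion boundaries are known, whereas real analyses must tolerate point mutations and uncertain boundaries, for example where the poly(A) tail ends.

```python
from typing import Optional


def find_tsd(genomic: str, ins_start: int, ins_end: int,
             min_len: int = 5, max_len: int = 30) -> Optional[str]:
    """Longest exact direct repeat immediately flanking genomic[ins_start:ins_end].

    The repeat must end exactly at ins_start on the left and begin exactly at
    ins_end on the right. Returns None if no repeat of at least min_len exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left = genomic[:ins_start].upper()
    right = genomic[ins_end:].upper()
    longest = min(max_len, len(left), len(right))
    for k in range(longest, min_len - 1, -1):  # try the longest candidates first
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


if __name__ == "__main__":
    # Hypothetical locus: flank + TSD + inserted copy + TSD + flank.
    locus = "CCCC" + "AAGTTTAC" + "GGGGGG" + "AAGTTTAC" + "TTTT"
    print(find_tsd(locus, ins_start=12, ins_end=18))
```

The min_len and max_len defaults are illustrative; short exact repeats can also occur by chance, so a hit is supporting evidence rather than proof of an L1-type insertion.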
The 3′-end sequences of mammalian L1 LINEs do not exhibit any similarity to SINEs, except for the presence of 3′-poly(A) repeats, although these L1s are thought to have mediated the retroposition of mammalian SINEs such as primate Alu and rodent B1 families [
L1-encoded proteins are
Schmitz et al. discovered a novel class of mammalian retroposons that lack poly(A) repeats. Termed tailless retropseudogenes, they are derived from truncated tRNAs and tRNA-related SINE RNAs [
Abundant PPs are a feature of mammalian genomes [
Roy-Engel’s group recently recreated and evaluated the retroposition capabilities of two ancestral L1 elements, L1PA4 and L1PA8, which were active ~18 and ~40 mya, respectively [
We hypothesized that many human retrogenes were created during this period and that such retrogenes were involved in generating new characteristics specific to simian primates [
Most new genes arise by the duplication of existing gene structures, after which relaxed selection on the new copy frequently leads to mutational inactivation of the duplicate; only rarely will a new gene with a modified function emerge. My collaborators and I described a unique mechanism of gene creation, whereby new combinations of functional domains are assembled at the RNA level from distinct genes, and the resulting chimera is then reverse-transcribed and integrated into the genome by the L1 retrotransposon [
Domain shuffling has provided extraordinarily diverse functions to proteins; nevertheless, how newly combined domains are coordinated to create novel functions remains a fundamental question of genetic and phenotypic evolution. My group presented the first evidence for the translation of PIPSL in humans [
We determined the evolutionary fate of PIPSL domains created by domain shuffling [
Molecular phylogeny and pattern of nucleotide substitutions of the
The SINE/LINE relationship in land plants is controversial. The first SINE/LINE pair of land plants was reported recently in maize [
I systematically analyzed the increasing wealth of genomic data to elucidate the SINE/LINE relationships in eukaryotic genomes, especially plants [
Figure
The number of LINE families belonging to each LINE clade according to biological taxa [
While a significant number of SINEs, more than half of which end in poly(A) repeats, have been identified in the genomes of flowering plants (Table
3′-Repeats of plant SINE families [
SINE | Species | 3′-Repeat | LINE tail | Reference for SINEs
---|---|---|---|---|
SINEX-1_CR | | (ATT) | RandI-2/DualenCr3 |
SINEX-2_CR | | (CTTT) | RandI-2 (RandI-3) |
SINEX-3_CR | | (A) | L1-1_CR |
SINEX-4_CR | | (ATT) | RandI-2 (RandI-3) |
SINEX-5_CR/SINEX-6_CR | | (ATT) | RandI-5 |
Au | Angiosperms and a gymnosperm | (T)2–5 | Nd |
ZmSINE1 (Au-like) | | (T) | Nd |
SINE2-1_ZM (Au-like) | | (T)3 | Nd |
SINE-5_Mad (Au-like) | | (T)3 | Nd |
p-SINE1 | | (T) | Nd |
p-SINE2 | | (T) | Nd |
p-SINE3 | | (T) | Nd |
ZmSINE2.1*/SINE2-1a_SBi | | (T) | LINE1-1_ZM |
ZmSINE2.2* | | (T) | LINE1-1_ZM |
ZmSINE2.3* | | (T) | LINE1-1_ZM |
SINE2-1_SBi (ZmSINE2-like) | | (T) | LINE1-1_ZM |
SINE2-1c_SBi (ZmSINE2-like) | | (T) | LINE1-1_ZM |
ZmSINE3 | | (A) | LINE1-1_ZM |
OsSN1/F524 | | (A) | Nd |
OsSN2/SINE2-12_SBi | | (A) | Nd |
OsSN3 | | (A) | Nd |
SINE9_OS/SINE2-11_SBi (OsSN-like) | | (A) | Nd |
TS | | (TTG) | RTE-1_STu |
SB1-15 (S1/AtSN/RAthE/BoS) | | (A) | Nd |
LJ_SINE-1 | | (A) | Nd |
LJ_SINE-2 | | (A) | Nd |
LJ_SINE-3 | | (A) | Nd |
MT_SINE-1 | | (A) | Nd |
MT_SINE-2 | | (A) | Nd |
MT_SINE-3 | | (A) | Nd |
SINE-1_Mad | | (A) | Nd |
SINE-2_Mad | | (A) | Nd |
SINE-4_Mad | | (A) | Nd |
SINE2-1_PTr | | (A) | Nd |
SINE2-2_PTr | | (A) | Nd |
*subfamilies. Nd: no data.
In accordance with this hypothesis, almost all L1-clade LINEs in flowering plants end in poly(A) repeats, while all RTE-clade LINEs end in (TTG)n or (TTGATG)n (Table
3′-Repeats of plant LINE families [
| Species | LINE clade | Families | 3′-Repeat: (A) | 3′-Repeat: other repeats | 3′-Repeat: none |
|---|---|---|---|---|---|
| Flowering plants | L1 | 233 | 224 | 0 | 9 |
| Flowering plants | RTE | 7 | 0 | 7 | 0 |
| | L1 | 15 | | | 5 |
| Green algae | RandI | 8 | 0 | 8 | 0 |
| Green algae | RTEX | 6 | 0 | 6 | 0 |
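The 3′-repeat columns in the plant SINE and LINE tables amount to classifying the repeat unit at the very end of each element: (A)n, (T)n, (ATT)n, (TTG)n, and so on, or none. A minimal sketch of that classification finds the shortest unit (up to a few bases) that is tandemly repeated at the 3′ terminus. The parameters max_unit and min_copies are illustrative assumptions, and the sketch requires perfect tandem copies, so a degenerate or interrupted tail is reported as shorter or absent.

```python
from typing import Optional


def terminal_repeat(seq: str, max_unit: int = 6,
                    min_copies: int = 3) -> Optional[tuple[str, int]]:
    """Shortest unit (length <= max_unit) repeated >= min_copies times at the 3' end.

    Returns (unit, number_of_perfect_copies), or None if no unit qualifies.
    """
    s = seq.upper()
    for k in range(1, max_unit + 1):
        if len(s) < k * min_copies:
            break  # longer units cannot fit min_copies copies either
        unit = s[-k:]
        copies = 1
        while s.endswith(unit * (copies + 1)):
            copies += 1
        if copies >= min_copies:
            return unit, copies
    return None


def classify_tail(seq: str, **kwargs) -> str:
    """Label the 3' tail in the tables' style, e.g. '(A)n', '(TTG)n', or 'none'."""
    hit = terminal_repeat(seq, **kwargs)
    return "none" if hit is None else f"({hit[0]})n"


if __name__ == "__main__":
    for name, tail in [("toy L1-like", "GCGC" + "A" * 12),   # hypothetical tails
                       ("toy RTE-like", "CCC" + "TTG" * 4),
                       ("toy tailless", "ACGTACGGCT")]:
        print(name, classify_tail(tail))
```

Trying the shortest unit first ensures that a poly(A) tail is labeled (A)n rather than (AA)n or (AAA)n.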
Comprehensive phylogenetic analysis of L1-clade LINEs revealed three important points [
One monocot L1 lineage (monocot 1a in ME1) consisted of a large number of L1-clade LINEs that were identified mainly in the recently released maize and sorghum genomes. Moreover, one group of LINEs in this lineage retained a conserved 3′-end sequence [
Sequence comparisons of the 3′-end sequences of L1-clade LINEs and monocot SINE families. The 3′-end sequences of the monocot 1a (consensus), LINE1-1_ZM, and SINE2 (consensus) were aligned [
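Consensus sequences such as those for the monocot 1a lineage and the SINE2 family are derived from multiple alignments of family members. A minimal majority-rule sketch: each alignment column contributes its most frequent base if that base is unique and exceeds a chosen fraction of the sequences. Columns dominated by gaps are dropped, and ties or weak majorities become N. The 50% threshold and the tie rule are assumptions; published consensuses may follow different rules.

```python
from collections import Counter


def majority_consensus(aligned: list[str], min_fraction: float = 0.5,
                       gap: str = "-") -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    A column yields its most common symbol only if that symbol is unique and
    occurs in more than min_fraction of the sequences; otherwise it yields 'N'.
    A column whose winning symbol is the gap character is omitted.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(ch.upper() for ch in column)
        (top, top_count), *others = counts.most_common()
        tied = bool(others) and others[0][1] == top_count
        if tied or top_count / len(column) <= min_fraction:
            consensus.append("N")
        elif top != gap:
            consensus.append(top)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical aligned 3' ends of three family members.
    print(majority_consensus(["ACGT-A", "ACGTTA", "AGGT-A"]))
```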
Furthermore, the transcript predicted from this region can fold into a hairpin structure (Figure
Secondary structure models for the 3′-end sequences of L1s and monocot SINEs. The predicted transcripts can fold into hairpin structures. Compensatory mutations (1–5) are shown by red rectangles. Conserved nucleotides are indicated by blue circles. The minimum free energies were −10.8 and −12.6 kcal/mol for the L1s (monocot 1a and LINE1-1, respectively) and −12.5 to −13.7 kcal/mol for the SINEs, except ZmSINE2.3 (−15.4 kcal/mol) and SINE2-1c (−17.7 kcal/mol). The structures were deduced using mfold [
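Compensatory mutations are the strongest comparative evidence for a conserved hairpin: both partners of a base pair change, yet they can still pair. Given a shared structure in dot-bracket notation and two aligned sequences, such pairs can be listed as sketched below. The sketch does not predict the structure or its free energy (that step relied on mfold). The dot-bracket string, the sequences, and the set of allowed pairs (Watson–Crick plus G·U wobble) are assumptions for illustration.

```python
# Watson-Crick pairs plus G-U wobble, in the RNA alphabet.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j) with i < j from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def compensatory_pairs(seq1: str, seq2: str, structure: str) -> list[tuple[int, int]]:
    """Pairs where both bases differ between seq1 and seq2 but both versions can pair."""
    if not len(seq1) == len(seq2) == len(structure):
        raise ValueError("sequences and structure must have equal length")
    s1 = seq1.upper().replace("T", "U")  # accept DNA or RNA input
    s2 = seq2.upper().replace("T", "U")
    hits = []
    for i, j in parse_dot_bracket(structure):
        both_changed = s1[i] != s2[i] and s1[j] != s2[j]
        if both_changed and (s1[i], s1[j]) in ALLOWED_PAIRS and (s2[i], s2[j]) in ALLOWED_PAIRS:
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Hypothetical 4-bp stem with a 4-nt loop; the outer pair changes G-C -> A-U.
    print(compensatory_pairs("GCGAAAAAUCGC", "ACGGAAAAUCGU", "((((....))))"))
```

A change on only one side that still pairs (here A·U becoming G·U at the innermost pair) is deliberately not counted, since it does not require a second, compensating substitution.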
The last example of a SINE/LINE pair in the L1-clade was found in a green alga. The 3′-end sequence (~80 nucleotides) of
Proposed model for the 3′-end recognition of L1-clade LINEs. The ancestral L1-clade LINE in the ancestral green plant possessed a stringent, nonmammalian-type RNA recognition property. During the course of plant evolution, an L1 lineage lost the ability to recognize specifically the RNA template for reverse transcription, thereby introducing relaxed 3′-end recognition in land plants. ME1–3: plant L1 lineages; M, F: vertebrate L1 lineages.
It is possible that the ancestral L1-clade LINE in the genome of the common ancestor of green plants possessed stringent, nonmammalian-type RNA recognition properties. During the course of plant evolution, an L1 lineage then lost the ability to recognize specifically the RNA template for reverse transcription, thereby introducing relaxed 3′-end recognition in land (flowering) plants as well as in mammals. This model assumes that rigid sequence specificity was an ancestral state, although the timing of its loss might be subject to debate. Since horizontal transfer of LINEs between eukaryotes is rare [
The ancestral L1-clade LINE might have required the 3′-end sequence and the terminal poly(A) repeats. A few L1 lineages might then have lost their specific interaction with the 3′-UTR of the template RNA, retaining some role for the 3′-repeats. As shown in Table
Alternatively, the ancestral L1-clade LINE may have possessed relaxed, mammalian-type RNA recognition properties. During the course of plant evolution, the L1 lineages of land plants (ME1) and green algae might then have gained specific stringent-type recognition of the RNA template. However, it is difficult to imagine that the molecular machinery for rigid sequence specificity, such as the particular conformation of the RNA-binding domain, could have arisen independently under reduced constraints.
L1 LINEs have contributed significantly to the architecture and evolution of mammalian genomes, whereas LTR retrotransposons predominate in the genomes of certain flowering plants. Understanding the independent origins of flexible 3′-end recognition may help us to determine what distinguishes the fate of a retroposon in the eukaryotic genome and why it has succeeded so well in certain genomes [
This work was supported by an institutional grant from the Nagahama Institute of Bio-Science and Technology to K. Ohshima.