The ORF1 Protein Encoded by LINE-1: Structure and Function During L1 Retrotransposition

LINE-1, or L1, is an autonomous non-LTR retrotransposon in mammals. Retrotransposition requires the function of the two L1-encoded polypeptides, ORF1p and ORF2p. Early recognition of regions of homology between the predicted amino acid sequence of ORF2 and known endonuclease and reverse transcriptase enzymes led to testable hypotheses regarding the function of ORF2p in retrotransposition. As predicted, ORF2p has been demonstrated to have both endonuclease and reverse transcriptase activities. In contrast, no homologs of known function have contributed to our understanding of the function of ORF1p during retrotransposition. Nevertheless, significant advances have been made such that we now know that ORF1p is a high-affinity RNA-binding protein that forms a ribonucleoprotein particle together with L1 RNA. Furthermore, ORF1p is a nucleic acid chaperone, and this nucleic acid chaperone activity is required for L1 retrotransposition.

L1 is an interspersed repeated DNA found in mammalian genomes that attained its high copy number by retrotransposition. It belongs to a large family of mobile elements that replicate via reverse transcription of an RNA intermediate. These elements, non-LTR retrotransposons, are distinct from retroviruses and LTR-containing retrotransposons, which also replicate via reverse transcription of an RNA intermediate. Non-LTR retrotransposons are widely distributed throughout eukaryotes and likely all share the novel mechanism of replication known as target-site-primed reverse transcription, or TPRT, whereby reverse transcription of the L1 RNA template is primed from a 3′ hydroxyl at the genomic insertion site (reviewed in [1]).
Close inspection of sequences of the >500,000 copies of either mouse or human L1 reveals a multigene family composed mainly of truncated, mutated, or rearranged copies of a small number of functional, full-length elements; only a subset of the full-length elements is capable of retrotransposition (see [2,3], for recent reviews). The functional retrotransposons are 6-7 kb in length and contain two long open reading frames (ORFs), both of which encode proteins that are required for retrotransposition [3]. ORF2 encodes a 146 kd polypeptide which provides the two known enzymatic activities that are required to convert the RNA intermediate into a new genomic DNA copy of L1 during TPRT: endonuclease [4] and reverse transcriptase [5]. These two activities of ORF2p were predicted from sequence similarity between L1 and known apurinic/apyrimidinic endonuclease [4] and reverse transcriptase [6,7] proteins, then verified biochemically [4,8]. In contrast, the role of ORF1p during retrotransposition has remained far more elusive because the amino acid sequence predicted by ORF1 lacks homology with any protein of known function (see [9], and results of October 2005 NCBI protein and nucleotide searches).

ORF1 AND RELATED SEQUENCES
The first intact coding sequence for an ORF1 protein was found by sequence analysis of a mouse L1 element called L1Md-A2 [7]. Comparison of the theoretical translation of this 41.2 kd protein from L1Md-A2 to that of a consensus primate L1 sequence revealed that the C-terminal half of ORF1 was evolving under selective pressure, whereas the N-terminal half was not. This early analysis also noted that the predicted ORF1 protein is quite basic, a common feature of nucleic acid-binding proteins [7]. Subsequently, many ORF1-like sequences have been determined from the L1 elements of different mammals, and from related elements found in fish [16]. The C-terminal, homologous, basic domain is a general feature of the ORF1 proteins from all of these elements (the conserved domain in Figure 1).
Figure 1. Thicker bars represent the coiled-coil (gray, based upon coils analysis [10]) and conserved domains (black, based upon multiple-sequence alignments using T-Coffee [11]). These two domains overlap in the human, rabbit, and fish, but not the two mouse or rat ORF1 protein sequences, as indicated. Sequences used were mouse A101 (Q91V68l, [12]), mouse L1spa (O54849, [13]), rat (Q63303), human L1rp (AAD39214, [14]), rabbit ([15], not in GenBank), and fish swimmer (AAD02927, [16]). The two mouse and the human ORF1 protein sequences are from retrotransposition-competent elements; the other sequences are from untested elements. A model for the trimeric structure of mouse L1 and its role in TPRT appeared in [17] by Martin et al.
A second predicted feature of all of these ORF1 amino acid sequences is the presence of a long coiled-coil domain upstream of the conserved domain (Figure 1). In human L1, this coiled-coil domain encompasses a leucine zipper [18]. In rabbit ORF1, this coiled-coil region appears similar to keratin [15]. The most likely explanation for the poor sequence similarity among the different ORF1 sequences in this region with one another and with other coiled-coil-containing proteins (eg, keratin) is that all share a coiled-coil domain with distinct evolutionary origins, probably brought into proximity with the conserved, basic C-terminal domain via recombination. Recombination to create novel sequence variants is often evident in L1 lineages [19-23]. With this scenario, the constraints imposed by a requirement for protein-protein interaction via a coiled-coil domain in ORF1 protein force a small degree of apparent similarity in the absence of homology among these diverse sequences. Conversely, it is possible that all of these ORF1 sequences evolved from a common ancestor, but extremely rapid divergence of the sequences towards the N-terminus of the protein has obscured the evidence for this homology. Interestingly, sequence variation in this N-terminal region of ORF1p is particularly great within subtypes of human [24], rat (see [2], and references therein), and mouse (see [25,26], and references therein) L1. Positive selection acting within this portion of the ORF1 protein is associated with the evolutionary success/extinction of human L1 lineages, perhaps reflecting drive for ORF1p to either attract or avoid an interacting factor [24]. Additional sequences from L1 elements in other species may shed light on whether the amino acid sequence of the N-terminal region is undergoing strong selective pressure for rapid sequence divergence by accumulating replacement substitutions, or if novel sequences are often acquired from nonhomologous sources.
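As a rough illustration of what a leucine zipper looks like at the sequence level, the following sketch scans a protein sequence for leucines spaced exactly seven residues apart, the heptad periodicity characteristic of such motifs. It is a toy scan, not the coils analysis used for Figure 1; the function name, the threshold of four repeats, and the example sequence are all hypothetical and do not come from any ORF1 protein.

```python
def leucine_heptad_runs(seq, min_repeats=4):
    """Find maximal runs of leucines spaced exactly 7 residues apart.

    Returns a list of (start_index, n_leucines) tuples, 0-based.
    min_repeats is an illustrative threshold, not a published criterion.
    """
    seq = seq.upper()
    runs = []
    for start in range(len(seq)):
        if seq[start] != "L":
            continue
        # Skip leucines that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        n = 1
        pos = start + 7
        while pos < len(seq) and seq[pos] == "L":
            n += 1
            pos += 7
        if n >= min_repeats:
            runs.append((start, n))
    return runs


# Hypothetical toy sequence built from four identical heptads;
# it is NOT a real ORF1p sequence.
toy = "MK" + "LEEKAQR" * 4 + "GS"
print(leucine_heptad_runs(toy))  # [(2, 4)]
```

A scan of this kind only flags candidate regions; assigning a coiled-coil requires a dedicated predictor and, ultimately, structural data.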
An unexpected feature of the L1 ORF1 sequence is revealed when its amino acid sequence is used as the query in a BLASTP search. The program reports that it has detected a putative conserved domain. This conserved domain is essentially the entire mammalian ORF1 protein sequence and has been annotated "transposase 22." Given that transposase is the enzyme responsible for the DNA breaking-joining reactions that occur during transposition of a wide variety of DNA elements [27], it seems likely to be a misnomer to call this domain a transposase for several reasons. Most significantly, the TPRT reaction used by L1 and the other non-LTR retrotransposons does not require a transposase enzymatic activity because cDNA is synthesized in situ using chromosomal primers [28]. Thus, L1 replication lacks any intermediate equivalent to the double-strand DNA substrate of transposases and the related integrases [27]. Furthermore, biochemical and mutational analyses demonstrate that the endonuclease activity of L1 ORF2p is responsible for the DNA cleavages that occur during TPRT [4,29,30]. Finally, the conserved domain in ORF1p of known functional mouse and human L1 elements lacks an apparent DDE motif [31], which is conserved in the active sites of transposases and integrases. Due to vast sequence divergence among members of the transposase/integrase superfamily of proteins, their DDE motifs are best recognized in structure [32] rather than sequence alignments; hence absolute resolution of the question of whether L1 ORF1p should be annotated "transposase 22" awaits atomic-level resolution of its structure.

ORF1p IS REQUIRED FOR RETROTRANSPOSITION
Even the relatively conserved C-terminal domain of ORF1 is more divergent than ORF2 when the sequences of human and mouse L1s are compared [7]. Hence, when it was finally possible to measure L1 retrotransposition activity using an autonomous retrotransposition assay [5,33], it was unexpected to learn that mutations in ORF1 were at least as severe, if not more so, than those that abolish reverse transcriptase activity. No retrotransposition events were detected in human L1 mutants in which either the serine at position 119 of ORF1p was replaced with a stop codon, or a highly conserved diarginine at 261/262 was replaced by dialanine; in both of these cases, the frequency of retrotransposition was less than 0.06% of the wild-type parental element. In contrast, mutation of a critical active-site residue in the reverse transcriptase domain of ORF2 (D702Y), which abolishes in vitro enzymatic activity [8], knocked retrotransposition down to 0.15% of wild type [5]. The other known enzymatic activity of ORF2 in L1 is endonuclease, which is also required for TPRT [34]. Several mutations that eliminate detectable endonuclease activity in an in vitro nicking assay again knock retrotransposition down to 0.2-1%, but do not eliminate it [34]. We observe similar effects of mutations in the ORF1 conserved domain compared to the endonuclease and reverse transcriptase domains in mouse L1.
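The residual activities quoted above are easier to compare as fold reductions relative to the wild-type element. The short calculation below is only a restatement of the percentages given in the text; the labels and the helper function name are ours, and the ORF1 value is an upper bound (less than 0.06%), so its fold reduction is a lower bound.

```python
def fold_reduction(percent_of_wild_type):
    """Convert residual activity (% of wild type) into a fold reduction."""
    return 100.0 / percent_of_wild_type


# Percentages as quoted in the text; labels are descriptive only.
reported = {
    "ORF1 mutants (S119 stop, RR261/262AA), upper bound": 0.06,
    "ORF2 reverse transcriptase mutant D702Y": 0.15,
    "ORF2 endonuclease mutants, low end of range": 0.2,
    "ORF2 endonuclease mutants, high end of range": 1.0,
}

for label, pct in reported.items():
    print(f"{label}: {pct}% of wild type, "
          f"about {fold_reduction(pct):.0f}-fold below wild type")
```

On this scale the ORF1 mutants are reduced by more than three orders of magnitude, whereas the ORF2 active-site mutants retain a measurable residue of activity.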
Thus, to date, the most stringent mutations of L1 are those in ORF1. As noted when the leakiness of the ORF2 mutations was originally observed, it is likely that ORF2p is more readily supplied in trans (albeit with substantially reduced efficiency, [5]), whereas ORF1p appears to be more stringently required in cis with the L1 RNA. These findings imply that ORF1 is critical for an earlier step in the retrotransposition cycle than reverse transcription itself, for example, regulating expression of ORF2 [35] or recruitment of ORF2p into the L1 ribonucleoprotein complex, and/or delivering the L1 RNP to the chromosomal DNA and facilitating the strand exchanges that are required during TPRT [17,36].
In light of the stringent cis-requirement for ORF1p during L1 retrotransposition, it is interesting that ORF1p appears to be dispensable when the L1 machinery provided by ORF2p is usurped by the human SINE, Alu, for its amplification; this surprising finding may be explained if the SRP9/14 protein can replace ORF1p function [37]. In contrast, ORF1p is required along with ORF2p for processed pseudogene formation by L1 [38,39].

FUNCTIONAL ANALYSIS OF ORF1p: PROTEIN-PROTEIN INTERACTION
Leucine zippers and coiled-coil domains are typically associated with protein-protein interactions. In cytoplasmic extracts from human cells that express high levels of L1, NTera2D1, the ORF1p (also called p40) partitions into a 160,000 x g pellet. Treatment of this pellet with increasing concentrations of glutaraldehyde to cross-link the protein shifts increasing amounts of the 40 kd ORF1 protein into complexes that run at 78, 89, 100, and 200 kd on SDS-PAGE, suggesting that the ORF1ps in these cytoplasmic particles are interacting closely with one another, or with other cellular proteins. This study also examined full-length p40 and various truncations expressed in E coli for protein-protein interactions, thereby mapping the multimerization domain to the N-terminal half of the protein, in the region of the predicted coiled-coil [9].
Similar findings were obtained with mouse L1 ORF1p using somewhat different experimental approaches. Recombinant protein purified from E coli coimmunoprecipitated 35S-labeled protein synthesized in vitro in rabbit reticulocyte lysate, demonstrating that mouse ORF1p is able to self-associate [40]. A combination of yeast 2-hybrid and GST pull-down assays was later used to map the region in mouse ORF1p responsible for multimerization; the predicted coiled-coil domain is both necessary and sufficient for protein-protein interaction [41]. More recently, overexpression of soluble ORF1p in baculovirus permitted analysis of its multimerization state by analytical ultracentrifugation. These studies revealed that the full-length protein forms a highly stable homotrimer, whereas a truncated ORF1p containing just the carboxy-terminal C-1/3 does not self-associate, even at relatively high protein concentrations [17]. All of the above findings consistently support the conclusion that the coiled-coil domain is wholly responsible for multimerization in both mouse and human L1 ORF1ps, with the trimer being the biologically relevant form of mouse ORF1p [17].
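The sizes quoted in this section can be put on a common scale by expressing each complex as a number of monomer equivalents. The sketch below does only that arithmetic, using the 40 kd human p40 and the 41.2 kd predicted mouse protein from the text; the function and constant names are ours. It should be read with caution: cross-linked species may migrate anomalously on SDS-PAGE and, as noted above, may contain other cellular proteins, so non-integer ratios are not evidence for or against any particular stoichiometry.

```python
HUMAN_P40_KD = 40.0    # human ORF1p (p40), as named in the text
MOUSE_ORF1_KD = 41.2   # predicted mass of mouse L1Md-A2 ORF1 protein


def monomer_equivalents(complex_kd, monomer_kd):
    """Apparent complex mass divided by monomer mass."""
    return complex_kd / monomer_kd


def homo_oligomer_mass(monomer_kd, n_subunits):
    """Expected mass of a homo-oligomer with n identical subunits."""
    return monomer_kd * n_subunits


# Cross-linked human species observed on SDS-PAGE (kd), from the text.
for band_kd in (78, 89, 100, 200):
    ratio = monomer_equivalents(band_kd, HUMAN_P40_KD)
    print(f"{band_kd} kd band: {ratio:.2f} p40 equivalents")

# Expected mass of the mouse homotrimer from the predicted monomer mass.
print(f"mouse homotrimer: {homo_oligomer_mass(MOUSE_ORF1_KD, 3):.1f} kd")
```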

FUNCTIONAL ANALYSIS OF ORF1p: NUCLEIC ACID BINDING
The bias towards highly basic amino acids in ORF1p led to the hypothesis that this protein interacts with nucleic acids [7]. Early evidence for such interaction was provided by cosedimentation of ORF1p with L1 RNA in sucrose gradients loaded with cytoplasmic extracts prepared from the mouse embryonal carcinoma cell line F9. The heavy complexes that formed were termed L1 ribonucleoprotein particles, or L1 RNPs. L1 RNPs are not sensitive to disruption by EDTA, but are sensitive to proteolysis [42]. Exposure of the RNPs to UV light rapidly cross-links the RNA to protein, indicating a close association between L1 RNA and protein [21]. The human ORF1p (p40) also associates with L1 RNA based upon a series of cosedimentation experiments. p40 remains in the supernatant upon centrifugation at 800 and 12,000 x g, but pellets at 160,000 x g. Treatment of the cytoplasmic extract (800 x g supernatant) with RNase but not DNase prior to centrifugation shifts p40 from the 160,000 x g pellet to the supernatant, indicating that the protein is pelleting because it is in a large complex with RNA, or an RNP. The human L1 RNPs are not dependent on divalent cations or disturbed by 10 mM EDTA; thus it appears that human ORF1p is bound to RNA in an RNP that is quite similar to the mouse L1 RNP. Further experiments indicated that the RNA present in these RNPs was L1 RNA and not actin or G3PDH RNA [9]. The presence of ORF1p in RNPs was found to be sensitive to high concentrations of monovalent cations as well as RNase treatment [43,44], leading to an enrichment procedure for RNA-free ORF1p from human cells [43], which was then used to provide evidence for one or two relatively high-affinity binding sites for ORF1p in L1 RNA [35]. It is important to note that all of the above experiments examined the interaction of L1 RNA with ORF1p in extracts from animal cells where L1 RNA and ORF1p were present as minority components of a complex mixture.
A more direct assessment of the nucleic acid-binding properties of the ORF1 protein is provided by studies of highly purified protein prepared after overexpression in either E coli or baculovirus-infected insect cells. As with ORF1p from mammalian cells, it is critical to take precautions against copurification of RNA with the protein: when protein is purified in standard, nondenaturing conditions without high concentrations of monovalent cation, RNA is coenriched and the protein is heavily contaminated with nucleic acid. This is readily apparent on a wavelength scan, or by examining purified protein by ethidium bromide staining after electrophoresis through agarose gels [41]. Our earliest experiments with protein expressed in E coli used denaturing conditions (8 M urea) to purify the protein from the insoluble inclusion body fraction, which simultaneously removed bound nucleic acid. That protein was used for UV cross-linking and electrophoretic mobility-shift assays (EMSAs), which demonstrated that ORF1p binds RNA and single-stranded DNA [40]. The affinities observed in those experiments, however, were lower than those obtained in subsequent experiments that used protein purified from the soluble fraction rather than the refolded denatured protein from inclusion bodies, probably because most of the protein was not correctly refolded to its native form after the denaturation. Interestingly, the RNA-binding region of the full-length ORF1p was mapped by simply examining various GST-ORF1p fusion constructs (containing full-length and a variety of truncated regions of ORF1p) for the presence of copurifying RNA. As long as the E coli extracts and affinity purification steps were kept in physiological concentrations of monovalent cation, RNA copurified with the protein if it contained the RNA-binding domain. All deletions containing the C-1/3 basic domain were contaminated with RNA and those that lacked it were free of RNA contamination. This same region of mouse ORF1p was found to be both necessary and sufficient for binding nucleic acid based upon transfer of 32P from RNA to protein by UV cross-linking [41].
The RNA-binding properties of the full-length mouse ORF1 protein purified from baculovirus were further assessed using coimmunoprecipitation and filter-binding assays. These experiments examined the affinity of ORF1p for a variety of transcripts, and tested whether a specific cis-acting sequence in mouse L1 RNA recruits ORF1p. The presence of a high-affinity site in human L1 RNA was suggested based upon preferential coimmunoprecipitation of a 41 nt T1 nuclease-resistant fragment with ORF1 antibody [35]. The mouse L1 RNA coimmunoprecipitation experiments revealed that efficient recovery of the 32P-labeled RNA required at least 38 nt, suggesting a length effect rather than a sequence requirement. All longer RNAs tested precipitated efficiently, independent of sequence. Further evidence that ORF1p is a nonsequence-specific RNA-binding protein was provided by results of nitrocellulose filter-binding assays using highly purified mouse ORF1p expressed in baculovirus. Transcripts that contained specifically the 38 nt sequence in either the sense or antisense orientation both bound with high affinity. Although there is a slight increase in the apparent binding affinity of ORF1p to RNA containing the sense 38 nt sequence compared to the same sequence in antisense orientation, it is only 4- to 7-fold and therefore too small to be considered specific binding for sense versus antisense L1 RNA [45].
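To give a sense of scale for a 4- to 7-fold difference in apparent affinity, the sketch below evaluates a simple one-site binding isotherm, fraction bound = [P] / (Kd + [P]), for two RNAs whose dissociation constants differ by those factors. This model is an assumption made for illustration: it ignores cooperativity and the trimeric state of ORF1p, it is not the analysis performed in [45], and the Kd values are arbitrary units chosen by us, not measured constants.

```python
def fraction_bound(protein_conc, kd):
    """One-site binding isotherm, assuming protein is in excess over RNA."""
    return protein_conc / (kd + protein_conc)


KD_SENSE = 1.0  # hypothetical Kd for the sense RNA, arbitrary units

for fold in (4, 7):
    kd_antisense = KD_SENSE * fold  # weaker binding by the stated factor
    print(f"Kd ratio (antisense/sense) = {fold}")
    for protein in (0.1, 1.0, 10.0, 100.0):
        sense = fraction_bound(protein, KD_SENSE)
        antisense = fraction_bound(protein, kd_antisense)
        print(f"  [ORF1p] = {protein:>5}: "
              f"sense {sense:.2f}, antisense {antisense:.2f} bound")
```

Under this simplified model both RNAs approach saturation within the same range of protein concentrations, and the two curves differ only by a horizontal shift equal to the Kd ratio.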
This discrepancy between the results with mouse and human L1 ORF1ps regarding the existence of a high-affinity binding site within L1 RNA has not been resolved. Possibly, it is due to differences between mouse and human L1, or, more likely, between the reagents used for the assays. For example, it is possible that another protein that is critical for the site-specificity was present in the partially purified preparation from human cells, but missing when the protein was purified from baculovirus-infected insect cells or E coli. The question of whether L1 RNA contains a specific, high-affinity binding site for ORF1p is important for L1 biology because it offers an attractive explanation for the cis-preference of ORF1p for L1 RNA that is evident from both the evolutionary pattern of L1 and experimental evidence from the autonomous retrotransposition assay (see [38,39], and references therein).

FUNCTIONAL ANALYSIS OF ORF1p: NUCLEIC ACID CHAPERONE ACTIVITY
Non-LTR retrotransposons are present throughout Eukaryota, but diverged long ago into five groups based upon the phylogenetic relationships of their reverse transcriptase region (the only sequence feature conserved among all non-LTR retrotransposons) and the type and organization of their protein domains [1]. Three of these groups, L1, I, and Jockey, each named for the first element of that group described, have a separate ORF upstream of their reverse transcriptase-containing ORF. In two of these three groups, I and Jockey, the upstream ORFs (ie, ORF1s) both contain zinc-finger domains, making their ORF1 proteins reminiscent of retroviral gag proteins. An important function associated with the zinc-finger-containing, nucleocapsid domain of gag is that of nucleic acid chaperone, which is critical for retroviral replication [46]. Nucleic acid chaperones are proteins that facilitate rearrangements of nucleic acids to their thermodynamically most stable form. A combination of at least three protein features contributes to nucleic acid chaperone activity: charge neutralization due to an excess of basic amino acids, a higher affinity for single-stranded than for double-stranded nucleic acids, and the ability to lower the cooperativity of the helix:coil transition [47]. These properties must be exquisitely balanced so that the chaperone can promote both melting and annealing of nucleic acids. The ORF1 protein from the non-LTR retrotransposon, I factor, shares several biochemical properties with retroviral nucleocapsid proteins, including the ability to accelerate annealing of complementary single-strand DNA sequences; these observations led to the suggestion that the I factor ORF1 protein functions as a nucleic acid chaperone during replication [48].
Mouse L1 ORF1 protein also accelerates annealing of complementary oligonucleotides. In addition, it lowers the Tm of mispaired duplex DNA, accelerates a strand displacement reaction if an imperfect duplex is challenged by the addition of the perfect complement, and alters the force required for the helix:coil transition in single-molecule studies using optical forceps [36,49]. Significantly, the nucleic acid chaperone activity of ORF1p is required for retrotransposition. A single-point mutation that destroys effective chaperone activity (R297K) without affecting RNA or single-stranded DNA binding affinity also destroys retrotransposition activity [49]. Consistent with this observation, the analogous mutation in human L1 also destroys retrotransposition, but not RNP formation [44].

SUMMARY
L1 is arguably the most significant dynamic force currently operating upon the mammalian genome. Retrotransposition is just one of many facets of L1's contribution to genetic plasticity and diversity [50], although it lies at the root of all of the others. Retrotransposition requires the proteins encoded by both of the two open reading frames in L1. The two known functions of the protein encoded by ORF2, endonuclease and reverse transcriptase, were readily predicted based upon sequence homology, whereas homology has so far failed to provide clues regarding the function of the ORF1 protein. In spite of this disadvantage, however, several significant advances have been made in establishing the structure and function of this critical retrotransposition protein through a series of in vivo and in vitro experiments. The protein binds both RNA and DNA, with a higher affinity for single-stranded than double-stranded nucleic acids. The RNA-binding function leads to RNP formation and safe delivery of the RNP to genomic DNA so that it can undergo TPRT. The nucleic acid chaperone activity of ORF1p likely contributes more directly to reverse transcription by TPRT, perhaps by facilitating the strand exchanges that place the DNA primer onto the RNA or cDNA template, or by melting secondary structure in the RNA, or both.