L1 Antisense Promoter Drives Tissue-Specific Transcription of Human Genes

Transcription of transposable elements interspersed in the genome is controlled by complex interactions between their regulatory elements and host factors. However, the same regulatory elements may be occasionally used for the transcription of host genes. One such example is the human L1 retrotransposon, which contains an antisense promoter (ASP) driving transcription into adjacent genes yielding chimeric transcripts. We have characterized 49 chimeric mRNAs corresponding to sense and antisense strands of human genes. Here we show that L1 ASP is capable of functioning as an alternative promoter, giving rise to a chimeric transcript whose coding region is identical to the ORF of mRNA of the following genes: KIAA1797, CLCN5, and SLCO1A2. Furthermore, in these cases the activity of L1 ASP is tissue-specific and may expand the expression pattern of the respective gene. The activity of L1 ASP is tissue-specific also in cases where L1 ASP produces antisense RNAs complementary to COL11A1 and BOLL mRNAs. Simultaneous assessment of the activity of L1 ASPs in multiple loci revealed the presence of L1 ASP-derived transcripts in all human tissues examined. We also demonstrate that L1 ASP can act as a promoter in vivo and predict that it has a heterogeneous transcription initiation site. Our data suggest that L1 ASP-driven transcription may increase the transcriptional flexibility of several human genes.


INTRODUCTION
Non-LTR and LTR retrotransposons are the two most abundant classes of transposable elements that contain regulatory regions (promoter, enhancer, and polyadenylation signal) necessary for their transcription and transposition [9]. Although most of the non-LTR retrotransposons and all the LTR retrotransposons in the human genome have lost their transpositional competence due to broken ORFs, a large number of them have retained regulatory sequences [10]. Scattered all over the chromosomes, retrotransposons can affect the regulation of host genes' transcription.
Recent studies carried out in several laboratories have revealed that LTR retrotransposons, such as an intracisternal A-particle in mice [11], endogenous retroviruses in humans and mice [12], and Wis 2-1A in wheat [13], can influence transcription of adjacent genes. Similarly, two families of non-LTR retrotransposons, L1 [3] and B2 SINE [14], have been shown to drive transcription of human and mouse genes, respectively. It has been shown that the effect of retrotransposons on host gene expression depends on their epigenetic status and thus may cause phenotypic variation between genetically identical individuals [15].
To reveal the possible function of L1 ASP as an alternative promoter of human genes, we carried out a systematic search for additional chimeric L1 ESTs/mRNAs deposited in GenBank. Here we describe 49 chimeric mRNAs generated by L1 ASP-driven transcription. Four of these chimeras differ from the bona fide mRNAs by their 5′ untranslated region (UTR) and another four (antisense RNAs) have regions complementary to exons of known mRNAs. Based on these bioinformatic data, we show that L1 ASP is capable of functioning as an alternative promoter in normal human tissues and drives tissue-specific transcription of several human genes.

Computational analysis
The search and analysis of chimeric L1 transcript sequences derived from the human subset of the EST division of GenBank, EMBL, and DDBJ were carried out using the strategy described earlier [2]. The alignment of EST and mRNA sequences to genomic contigs was done with SPIDEY [1] and confirmed with the human genome browser available at University of California, Santa Cruz [5]. BLAST [6], BLAST2 sequences [7], and SPIDEY programs, used in the analysis of sequences of RT-PCR products, were run on the National Center for Biotechnology Information BLAST network service using default parameters. The transcriptional start sites in the DBTSS [22] were mapped using BLASTN [6]. The accession numbers of the respective one-pass cDNA entries were OFR00417, CNR02292, KAR05296, TDR09332, T3R04859, TDR07820, KMR03236, HKR11044, KMR01202, COL02332, KMR02654, TDR05153, TDR04283, T3R08474, T3R07002, TDR08640, DMC04507, HKR03051, T7R06886, T3R04414, 29R05294, OFR01051, T3R00241, and HKR11121. Splice site search was done with NNSPLICE 0.9 [23] and NetGene2 [24].

Journal of Biomedicine and Biotechnology

RT-PCR, Southern blot, and sequence analysis
PCR amplification of the human cDNAs of the multiple tissue cDNA (MTC) panels I and II (BD Biosciences Clontech) was carried out using recombinant Taq polymerase and Taq buffer with (NH4)2SO4, 2.0 mM MgCl2, 0.2 mM dNTP (Fermentas), and 0.75 μM primers. Each reaction contained 0.5 μl cDNA and 0.5 units of Taq polymerase in a final volume of 10 μl. After cDNA denaturation at 95 °C for 1 minute, amplification (35-40 cycles) was carried out using the following cycling profile: 95 °C for 30 s, 55-65 °C for 30 s, and 72 °C for 30 s for products < 0.5 Kb or 1 minute for products > 0.5 Kb. Primers and annealing temperatures used are given in the supplementary table (Table 1). The locations of primers are shown in Figures 1 and 2. PCR products were sized on 1-2% agarose gels and analyzed by restriction mapping. After gel elution, their sequences were determined from both ends using BigDye Terminator cycle sequencing kit (Applied Biosystems).
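The cycling profile described above can be written out as a small Python sketch, which may help when adapting the protocol to a product of a given size. The function names are hypothetical, the treatment of a product of exactly 0.5 Kb (not specified in the text) is an assumption, and the time estimate counts hold times only, ignoring ramping.

```python
# Minimal sketch of the RT-PCR cycling profile described in the text.
# Function names are illustrative; they do not come from the original paper.

def extension_seconds(product_bp: int) -> int:
    """Extension time at 72 C: 30 s for products < 0.5 Kb, 60 s for > 0.5 Kb.

    Assumption: a product of exactly 500 bp is treated as a short product.
    """
    return 30 if product_bp <= 500 else 60


def cycling_profile(product_bp: int, anneal_c: float, cycles: int = 35):
    """Return a list of (step, temperature in C, hold time in s) tuples."""
    if not 55 <= anneal_c <= 65:
        raise ValueError("annealing temperature outside the 55-65 C range used")
    one_cycle = [
        ("denature", 95, 30),
        ("anneal", anneal_c, 30),
        ("extend", 72, extension_seconds(product_bp)),
    ]
    # Initial 1-minute denaturation, followed by the repeated three-step cycle.
    return [("initial denaturation", 95, 60)] + one_cycle * cycles


def total_hold_minutes(profile) -> float:
    """Sum of hold times in minutes (ramp times are not modelled)."""
    return sum(seconds for _, _, seconds in profile) / 60


if __name__ == "__main__":
    profile = cycling_profile(product_bp=315, anneal_c=60, cycles=35)
    print(len(profile), "steps,", total_hold_minutes(profile), "min of hold time")
```

For a short product such as the 315 bp L1-SLCO1A2 amplicon, each cycle holds for 90 s in total, so the number of cycles dominates the run time.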
First strand L1-MET cDNA was synthesized with a reverse primer positioned in MET exon 5 (TATGGTCAGCCTTGTCCCTC) using total RNA isolated from a human teratocarcinoma cell line (NTera2D1) and RevertAid H minus M-MuLV reverse transcriptase (Fermentas). This cDNA was denatured at 95 °C for 1 minute and amplified (30 cycles, see above) using one of the primer pairs (L1-MET-A-G) shown in Table 1. For Southern blot analysis, the RT-PCR products obtained were sized on an agarose gel, transferred to a nylon membrane, and hybridized with a riboprobe specific to MET exons 2-5. Hybridization-positive products were detected by autoradiography.

Figure 1: Distribution of chimeric mRNAs derived from the L1 ASP as an alternative promoter. The presence of native mRNAs derived from a gene predicted by (a) AL711955 and KIAA1797, (b) CLCN5 [25], (c) SLCO1A2, (d) MET proto-oncogene [26], and their corresponding chimeric transcripts is shown at the upper and lower RT-PCR panels. cDNAs were derived from the following human tissues: 1, thymus; 2, prostate; 3, spleen; 4, small intestine; 5, colon; 6, ovary; 7, testis; 8, peripheral blood leukocytes; 9, placenta; 10, skeletal muscle; 11, brain; 12, kidney; 13, heart; 14, lung; 15, pancreas; and 16, liver. GenBank accession numbers for each mRNA and chimeric L1 mRNA are shown.

L1 ASP is predicted to function as an alternative promoter
We have previously characterized 9 out of 25 ESTs representing the L1 ASP-driven transcription of human genes [2]. Using the strategy described earlier [2] and an updated version of the dbEST (12 May 2004), we extended our search to reveal chimeric transcripts derived from an L1 ASP acting as a sole/alternative promoter or driving antisense transcription of the host gene. Our search revealed 81 ESTs containing the opposite strand of the L1 5′ UTR, followed by a region identical to a cellular mRNA or random genomic sequence. Of this large number of chimeric transcripts, 49 ESTs represented mRNAs derived from the genes annotated in the RefSeq database [8] (see the supplementary table (Table 2)). The remaining 32 ESTs contained noncoding or repetitive DNA sequences (Alus, MIR, LTR, L1, etc.) spliced to the L1 5′ UTR. Since they contained only short ORFs (< 100 aa) and had no similarity to known proteins, as revealed by BLASTP analysis, they were not analyzed further. Because of our interest in the L1 ASP-driven transcription of human genes, we carried out a detailed analysis of the 49 chimeric ESTs (Table 2). While most of the ESTs (40 out of 49) corresponded to mRNAs generated from the L1 ASPs of full-length L1s located in introns, 7 ESTs/mRNAs (NM_017794, BP351387, BM557937, CF593264, BP358215, BX955947, and BU176833) were derived from L1 ASPs located upstream of genes. In these 7 cases, L1 ASP may function as an alternative promoter. Four of these cases (NM_017794, BP351387, BX955947, and BP358215) represented chimeric mRNAs that contained the first coding exon of the gene. Thus, their translation could produce proteins identical to those encoded by the respective gene (Table 3). These genes encoded hypothetical protein KIAA1797 (possibly involved in mitotic chromosome condensation), CLCN5 (chloride channel 5) [25], SLCO1A2 (solute carrier organic anion transporter family member 1A2), and RGS6 (regulator of G-protein signalling 6) [29].
For the remaining three ESTs, splicing occurred within the coding sequence, giving rise to the chimeric mRNA lacking bona fide translation initiation signals. Since translation initiation signals are commonly located in the second exon of mammalian mRNAs [30], an L1 ASP located in the first intron could also give rise to a translatable chimeric mRNA. Of the 3 ESTs (BM910612, BE735854, and BP352155) derived from such L1 ASPs, only one (BE735854) had translation initiation signals matching those of the bona fide mRNA.
Of the 49 ESTs/mRNAs analyzed, 45 chimeras matched the orientation of the respective gene, while 4 ESTs had regions complementary to the exons of known mRNAs and thus were derived from the opposite strand of the gene (Table 3). Two of these ESTs (CB960713 and AV693621) were derived from the L1 ASPs located in intron 25 of ABCA9 (ATP-binding cassette, subfamily A, member 9) [31] and intron 46 of COL11A1 (collagen type XI alpha 1) [27], respectively (Table 3). The remaining two ESTs (CD642260 and BE866323) were derived from L1 ASPs located downstream of the gene. One of these L1 ASPs resided 77 Kb downstream of the single-exon gene encoding olfactory receptor, family 56, subfamily B, member 4 (OR56B4) [32], and the other was located 34 Kb downstream of BOLL, homologous to the bol or boule-like gene of Drosophila [28].

Table notes (table body not recovered): ... [2]. Sixteen identical or similar ESTs described earlier by Nigumann et al [2] and Wheelan et al [44] are shown by + and ++, respectively. 2 Source of the EST as annotated in the EST division of GenBank. 3 EST similarity (≡) or identity (=) to a representative L1 genomic clone #11A [3]. Subfamily of L1 [4] and GenBank accession number were determined by genome browser [5]. For some ESTs the 5′ nucleotides (< 28 nt) were derived either from vector/adaptor or represented as low quality sequence. 4 Similarity/identity to known mRNA as determined by BLASTN [6] and BLAST2 sequences [7] programs. mRNA description is based on the RefSeq database [8]. If the mRNA has not been described, an EST (marked by an asterisk) is shown. This EST contains a putative first exon transcribed from the non-L1 (native) promoter. 5 Genomic contig (accession no), chromosome (chr), and position of the L1 ASP in the intron, upstream (5′) or downstream (3′)/total number of exons, as determined with MegaBLAST and SPIDEY programs. ND stands for not determined. 6 Orientation with respect to the gene's transcription.
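The counts reported in this section can be cross-checked with a short Python sketch. All numbers are taken from the text except the downstream count of 2, which is inferred here from the two ESTs (CD642260 and BE866323) described as derived from L1 ASPs located downstream of the gene; the variable and function names are illustrative only.

```python
# Consistency check of the EST counts reported in the text.

TOTAL_CHIMERIC_ESTS = 81

# First split: ESTs matching RefSeq-annotated genes vs. the remainder.
by_annotation = {"refseq_gene": 49, "noncoding_or_repetitive": 32}

# Position of the L1 ASP relative to the gene, for the 49 RefSeq-matching ESTs.
# "downstream" is inferred from the two ESTs named in the text (assumption).
by_location = {"intronic": 40, "upstream": 7, "downstream": 2}

# Orientation of the chimeric transcript relative to the gene.
by_orientation = {"sense": 45, "antisense": 4}


def is_partition(total: int, parts: dict) -> bool:
    """True if the category counts add up exactly to the stated total."""
    return sum(parts.values()) == total


if __name__ == "__main__":
    refseq_total = by_annotation["refseq_gene"]
    print("annotation split ok:", is_partition(TOTAL_CHIMERIC_ESTS, by_annotation))
    print("location split ok:", is_partition(refseq_total, by_location))
    print("orientation split ok:", is_partition(refseq_total, by_orientation))
```

Each of the three breakdowns sums to its stated total, so the figures quoted in this section are mutually consistent under the stated inference.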

L1 ASP provides an alternative promoter for several human genes
To reveal the potential of L1 ASP to function as an alternative promoter, we determined the expression profile of the chimeric mRNAs (containing bona fide translation initiation signals) in 16 different human tissues. For comparison, we also determined transcription from the native promoters (genes' true promoters). Results for the three chimeric mRNAs (KIAA1797, L1-CLCN5, and L1-SLCO1A2) which were detected in the tissues studied are presented in the following section. Figure 1(a) shows that both the chimeric KIAA1797 mRNA, derived from the L1 ASP located about 26 Kb upstream of the first exon of the gene, and the native mRNA (the 5′ end of the mRNA was predicted from EST AL711955) are expressed in lung and pancreas. In addition, native mRNA is expressed in testis, placenta, and liver. Figure 1(b) shows that the chimeric L1-CLCN5 mRNA is expressed exclusively in placenta, while the CLCN5 upstream and downstream promoters (located about 102 Kb and 44 Kb from the L1 ASP, respectively) produce mRNAs expressed strongly in lung. Translation of the chimeric mRNA could yield a protein identical to the one obtained from the CLCN5 mRNA derived from the downstream promoter. However, the latter is inactive in placenta, suggesting that the L1 ASP provides placenta-specific expression to one of the protein isoforms encoded by CLCN5. The other protein isoform has a 70 aa-long N-terminal extension and is derived from an mRNA generated from the CLCN5 upstream promoter. This promoter is active in a number of tissues. Figure 1(c) shows that the chimeric L1-SLCO1A2 mRNA predicted from the EST (BX955947) is derived from the L1 ASP located 61 Kb upstream of the SLCO1A2 first exon. Surprisingly, RT-PCR yielded a 315 bp product (instead of the expected 324 bp product) derived from another L1 ASP located about 24 Kb further upstream. This novel chimeric mRNA is expressed exclusively in placenta, while SLCO1A2 mRNA is present in a number of tissues, but not in placenta. Therefore, similarly to CLCN5, L1 ASP is responsible for the placenta-specific expression of SLCO1A2.
Since the multiple tissue cDNA panel has been produced using different donors for different tissues (brain and lung pooled from 2 donors and other tissues pooled from 4-45 donors, except leukocytes which were pooled from 550 donors; the total number of donors was ∼750), it is conceivable that an RT-PCR product represents a donor-specific L1 insertion rather than tissue-specific activity of the L1 ASP in that chromosomal position. Sequence analysis showed that only one of the L1 elements (L1-CLCN5), for which the tissue-specificity of L1 ASP activity was examined (Figures 1 and 2), belongs to the highly polymorphic L1Ta subfamily [33]. The rest of the L1 elements, depicted in Figures 1 and 2, belong to the L1PA2 subfamily that expanded before the divergence of hominids [34], although some polymorphic insertions have been reported in humans [35]. It is unlikely that an L1 insertion is found in only one of the ∼750 donors represented in the MTC panel while it is present in GenBank (Table 3) and the NTera2D1 cell line (data not shown). Therefore we believe that the RT-PCR products obtained represent tissue-specific L1 ASP activity of fixed or high-frequency L1 insertions.

Table notes (table body not recovered): ... [1]. ESTs are grouped according to splicing schemes [2]. An EST described earlier by Nigumann et al [2] is marked by +. 2 Source of the EST as annotated in the EST division of GenBank. 3 EST similarity (≡) or identity (=) to a representative L1 genomic clone #11A [3]. Subfamily of L1 [4] and GenBank accession number were determined by genome browser [5]. For some ESTs, the 5′ nucleotides (< 28 nt) were either derived from vector/adaptor or represented as low quality sequence. 4 Similarity/identity to known mRNA as determined by BLASTN [6] and BLAST2 sequences [7] programs. mRNA description is based on the RefSeq database [8]. If the mRNA has not been described, an EST (marked by an asterisk) is shown. This EST contains a putative first exon transcribed from the non-L1 (native) promoter. 5 Genomic contig (accession no), chromosome (chr), and position of the L1 ASP in the intron, upstream (5′), or downstream (3′)/total number of exons, as determined with MegaBLAST and SPIDEY programs. ND stands for not determined. 6 Orientation with respect to the gene's transcription.
In summary, the examples analyzed here provide evidence that L1 ASP can function as an alternative promoter in normal human tissues. Our results show that the L1 ASP-driven transcription correlates with that of the respective native promoter (Figure 1(a)) or expands the tissue-specific expression pattern of the respective gene (Figures 1(b) and 1(c)).
Although our primary goal was to reveal the potential of L1 ASP as an alternative promoter that generates translatable mRNAs, we also determined the distribution of the chimeric L1-MET mRNA derived from the L1 ASP located in the second intron of the MET proto-oncogene [26]. Figure 1(d) shows that the expression of the chimeric L1-MET mRNA correlates with that of the MET mRNA.

L1 ASP generates antisense transcripts complementary to different mRNAs
Of the 49 chimeric ESTs analyzed, only four corresponded to mRNAs that contained regions complementary to the exons of known mRNAs (see above). The expression data are presented for only those two so-called antisense RNAs which were detected in the human tissues examined. Figure 2(a) shows that the chimeric L1-COL11A1 mRNA, derived from the L1 ASP located in intron 46 of COL11A1, is expressed in testis and to a lesser extent in placenta. Similarly, COL11A1 mRNA is present in these tissues. It should be noted that L1-COL11A1 (EST: AV693621) contains a 90 nt region complementary to the entire exon 40 of COL11A1 (Table 3). Figure 2(b) shows that two alternatively spliced variants of the chimeric L1-BOLL, derived from the L1 ASPs located about 34 Kb and 87 Kb downstream of BOLL, are expressed in prostate and peripheral blood leukocytes, respectively. The 5′ ends of these transcripts are spliced according to splicing schemes III and V [2]. BOLL mRNA is expressed exclusively in testis. L1-BOLL contains a 60 nt region complementary to the 3′ part of exon 6 of BOLL (Table 3). These results suggest that L1 ASP-driven antisense transcription has no general correlation with the transcription of the host gene.
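What it means for a chimeric transcript to contain a region complementary to an exon can be illustrated with a minimal Python sketch. It is not the authors' procedure (the alignments in the paper were made with BLASTN, BLAST2 sequences, and SPIDEY); the function names and the toy sequences below are hypothetical and serve only to show the reverse-complement logic.

```python
# Illustration only: find the longest stretch of a transcript that is
# complementary (antisense) to an exon. Toy sequences, not real data.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def antisense_overlap(transcript: str, exon: str) -> int:
    """Longest stretch of `transcript` that is complementary to `exon`."""
    return longest_common_substring_len(transcript, reverse_complement(exon))


if __name__ == "__main__":
    exon = "ATGGCCATTGTAATGGGCCGC"                       # hypothetical exon
    transcript = "TTTT" + reverse_complement(exon) + "AAAA"  # hypothetical antisense RNA
    print(antisense_overlap(transcript, exon), "nt complementary to the exon")
```

In this toy case the whole exon is covered, analogous to the 90 nt region of L1-COL11A1 that is complementary to the entire exon 40 of COL11A1.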

L1 ASP-derived transcripts are present in all human tissues examined
Our study revealed that chimeric transcripts derived from the six unique genomic regions are present only in a few tissues. To examine the tissue specificity of L1 ASP activity more generally, we studied tissue-specific distribution of L1 ASP-derived transcripts, in which splicing occurs within the L1 5′ UTR (splice variants II and IV) [2]. The use of these splice variants allowed us to discriminate between the L1 ASP-derived spliced transcripts and transcripts passing through the whole L1 5′ UTR. Figure 3 shows that the splice variant II is expressed in most human tissues, except in thymus, skeletal muscle, and brain. The variant IV shows a more uniform expression pattern with minimal expression in placenta, skeletal muscle, and brain. In summary, these results show that L1 ASP-derived transcripts are present in all human tissues examined.

L1 ASP-driven transcription is characterized by heterogeneous start site
The fact that the sequence corresponding to the opposite strand of the L1 5′ UTR is present in the EST or mRNA sequence (Table 2) does not necessarily mean that transcription is initiated in the L1 ASP region, that is, in the L1 5′ UTR around positions +400 to +600 [3]. In order to find evidence that the L1 ASP region acts as a promoter in vivo, we analyzed the database of transcriptional start sites (DBTSS) [22] for the presence of transcriptional start sites (TSS) which map to the opposite strand of the L1 5′ UTR. It has been estimated that more than 80% of the TSS in the DBTSS represent true sites of transcription initiation, that is, they correspond to the full-length cDNAs [36]. Twenty-four of the 34 TSS, which mapped to the opposite strand of the L1 5′ UTR, resided between positions +386 and +503 (Figure 4(a)). The observed nonuniform distribution of the TSS (∼70% of TSS within ∼13% of the 5′ UTR) clearly shows that the region from +386 to +503, overlapping with the L1 ASP region, must contain a promoter. These results also suggest that transcription initiates at various positions within the L1 ASP region (Figure 4(a)).
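The strength of this enrichment can be gauged with a simple calculation, sketched below in Python. This is not an analysis performed in the paper: it assumes, purely for illustration, that under a null model each of the 34 TSS falls independently and uniformly along the 5′ UTR, so that the +386 to +503 window (∼13% of the UTR, as stated in the text) would capture each TSS with probability 0.13.

```python
# Illustrative enrichment estimate for TSS clustering in the L1 ASP region.
# Assumption (not from the paper): TSS positions are independent and uniform
# along the 5' UTR under the null model.
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TSS = 34                 # TSS mapped to the opposite strand of the L1 5' UTR
N_IN_WINDOW = 24           # TSS between positions +386 and +503
WINDOW_FRACTION = 0.13     # share of the 5' UTR covered by the window (from the text)

window_width_nt = 503 - 386 + 1            # inclusive width of the window
observed_fraction = N_IN_WINDOW / N_TSS    # about 0.71, the "~70%" in the text
p_value = binomial_tail(N_IN_WINDOW, N_TSS, WINDOW_FRACTION)

if __name__ == "__main__":
    print(f"window width: {window_width_nt} nt")
    print(f"observed fraction in window: {observed_fraction:.2f}")
    print(f"one-sided binomial tail probability: {p_value:.2e}")
```

Under these assumptions the tail probability is vanishingly small, which is consistent with the conclusion that the clustering of start sites is not a chance effect. The uniform null is a simplification, since DBTSS coverage along the UTR need not be even.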
To confirm the transcription initiation in the L1 ASP region, we analyzed the distribution of L1-MET chimeric transcripts (Figure 1(d)) by using RT-PCR and various oligonucleotide primers. Figure 4(b) shows that amplification of L1-MET cDNA can be carried out using primers A-F, but not by using primer G. This result indicates that the TSS is located in the L1 ASP region between the binding sites of primers A and F, while the region corresponding to primer G is absent from the L1-MET transcripts. Also, an in silico search for potential splicing signals [23,24] did not reveal any acceptor sites in the region between primers G and E, lending support to the conclusion that transcription is initiated in the L1 ASP region rather than read through the L1 5′ UTR. The difference in band intensities (Figure 4(b)) observed for different primer pairs is consistent with the predicted start site heterogeneity. In summary, our results show that the L1 ASP can act as a promoter in vivo and its activity is characterized by start site heterogeneity.

DISCUSSION
In this paper we show that L1 ASP can cause widespread transcription of human genes and its activity correlates with that of the native promoter in some cases, while in other cases it can expand the tissue-specific expression pattern of the respective gene. It is believed that two or more genes located in a single expression domain are coexpressed [37]. Accordingly, an L1 ASP located near or within a gene may behave like a "parasite" whose activity is dependent on the transcription of the gene. This is exemplified by the simultaneous transcription from the L1 ASP and native promoter (Figures 1(a), 1(d), and 2(a)). Surprisingly, in other cases the L1 ASP activity may be regulated independently, as observed here for L1-CLCN5, L1-SLCO1A2, and L1-BOLL mRNAs (Figures 1(b), 1(c), and 2(b)). Although the L1 ASP-driven transcripts were detected in all tissues examined (Figure 3), the results described suggest that the L1 ASPs at defined loci are not active in all tissues. The different tissue-specific activity of L1 ASPs can hardly be explained by their minimal sequence divergence, but could be explained by differences in their epigenetic state. In some cases, a transcriptionally active epigenetic state could be stochastically confined to some L1s in certain tissues.
Our results show that L1 ASP acts as an alternative promoter of several human genes (Figures 1(a)-1(c)). Alternative promoters, giving rise to alternative first exons, generate variation in gene expression by increasing transcriptional flexibility and translational diversity. For example, the human NOS1 gene, encoding the neuronal isoform of nitric oxide synthase, has 9 alternative promoters, which determine its tissue-specific transcription and translational efficiency of the resulting NOS1 mRNAs with different 5′ UTRs [38]. Another striking example is the human BDNF gene, encoding brain-derived neurotrophic factor, which has 6 promoters and first noncoding exons differentially used in different parts of the brain (A Kazantseva and T Timmusk, personal communication). The L1 ASP, acting as an alternative promoter, generates a chimeric mRNA whose translation could produce a protein identical to the genuine protein. However, the translatability of this transcript depends on the length of the 5′ UTR, the number of upstream ORFs, and the strength of initiation signals [39]. Comparison between the 5′ UTRs of the native and chimeric mRNA revealed no major differences in the above-mentioned factors that can abrogate the usage of the genuine ORF (data not shown). Therefore, it is likely that the chimeric L1 transcripts may be translated with efficiency comparable to that of the native transcripts.
Alternative promoters can also generate mRNAs with different 5′ coding exons, which may be used in the generation of N-terminal variants of the same protein [40]. Similarly, most L1 ASPs located in introns may, in principle, produce chimeric mRNAs and their translation could yield N-terminally truncated proteins. However, transcription from an L1 ASP located in an intron (39 examples described in Table 2) may be strongly inhibited because of the readthrough transcription from the upstream native promoter [41,42]. In addition, if transcripts from the intronic L1 ASPs are produced, they may not be readily translated because of the absence of proper initiation context. Although N-terminally truncated proteins with possible dominant negative effects have been shown to exist in normal and cancer cells ([40] and references therein), additional experiments are required to prove the translation of chimeric L1 transcripts.
We have detected two L1 ASP-derived antisense RNAs complementary to the exons of COL11A1 and BOLL mRNAs (Figure 2). The other two antisense RNAs predicted from the ESTs (Table 3) were not detected in the human tissues analyzed. Antisense RNAs and antisense transcription are known to cause downregulation of gene transcripts via RNAi-mediated mRNA degradation [43] and transcriptional collision [42], respectively. The possible regulatory interaction between sense and antisense RNAs or transcription may be revealed from the negative (or inverse) correlation of their expression. The partial positive correlation between COL11A1 mRNA and its antisense counterpart and the negative correlation between BOLL and L1-BOLL suggest that there is no general correlation between the L1 ASP-driven antisense transcription and the transcription of the gene.
In summary, we have demonstrated that L1 ASP is active in a wide variety of normal human tissues and it is capable of functioning as an alternative promoter by providing the tissue-specific expression of several human genes.