Intron Retention and TE Exonization Events in ZRANB2

The Zinc finger, RAN-binding domain-containing protein 2 (ZRANB2) contains arginine/serine-rich (RS) domains that mediate its function in the regulation of alternative splicing. The ZRANB2 gene contains 2 LINE elements (L3b and Plat_L3) between the 9th and 10th exons. We identified an exonization event of one of these LINE elements (Plat_L3). Using genomic PCR, RT-PCR amplification, and sequencing of primate DNA and RNA samples, we analyzed the evolutionary features of ZRANB2 transcripts. The results indicated that both LINE elements were integrated in human and in all of the tested primate samples (hominoids: 3 species; Old World monkeys: 8 species; New World monkeys: 6 species; prosimian: 1 species). Human, rhesus monkey, crab-eating monkey, African-green monkey, and marmoset harbor the exon derived from the Plat_L3 LINE element. RT-PCR amplification also revealed long transcripts with differential expression patterns. Intriguingly, these long transcripts were abundantly expressed in Old World monkey lineages (rhesus, crab-eating, and African-green monkeys) and were generated by intron retention (IR). Thus, the ZRANB2 gene produces 3 transcript variants whose C-termini differ as a result of transposable element (TE) exonization and IR. ZRANB2 is therefore a valuable model for investigating the evolutionary mechanisms of TE exonization and IR during primate evolution.

Alternative splicing (AS) of pre-messenger RNAs (pre-mRNAs) is an important molecular mechanism that increases the complexity and flexibility of the human transcriptome [10]. Genome-wide analyses of AS events suggest that 40-60% of human genes have alternatively spliced transcripts [11]. With the aid of accumulated transcriptome sequencing data, 5 distinct AS mechanisms have been identified: exon skipping, alternative 5′ splice sites, alternative 3′ splice sites, intron retention, and mutually exclusive exons. Intron retention (IR) events are rare, accounting for less than 3% of all AS events in the human and mouse genomes [10]. In humans, 14.8% of the 21,106 known genes showed at least 1 IR event, mostly involving untranslated regions (UTRs). Eighty-eight IR events appear to involve genes associated with various syndromes and with tumorigenic processes [12]. Although exonization is not categorized as one of the 5 distinct mechanisms, it is an AS process by which new exons are acquired from intronic DNA sequences [13]. The generation of canonical splice sites (splice acceptor and donor sites) by genomic insertions/deletions or mutations can cause exonization events [13]. Recent studies have indicated that exonization events derive from transposable elements (TEs), including LTR retrotransposons (e.g., human endogenous retroviruses [HERVs]) and non-LTR retrotransposons (e.g., short interspersed elements [SINEs] and long interspersed elements [LINEs]), in various species [14,15]. Many TE sequences contain potential splice sites [16]. TEs are a major component of the human genome, comprising more than 40% of it [17]. LINEs and LTR retrotransposons insert themselves into new genomic positions through a copy-and-paste mechanism, using the reverse transcriptase that they encode [18]. SINEs do not encode their own reverse transcriptase; instead, they are reverse transcribed and inserted into the genome by the LINE-encoded enzymatic machinery [13].
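The splice-site argument above can be made concrete with a small scan. The Python sketch below looks for segments bounded by an upstream AG (the intronic end of a candidate splice acceptor) and a downstream GT (the intronic start of a candidate splice donor). This is the minimal requirement for an intronic or TE-derived segment to be recognized as an exon. It is a naive dinucleotide scan under stated assumptions, not a splice-site predictor, because real splice sites also depend on the branch point, the polypyrimidine tract, and flanking consensus sequence. The function name `candidate_exons` and its `min_len`/`max_len` limits are hypothetical choices.

```python
def candidate_exons(seq: str, min_len: int = 50, max_len: int = 300) -> list[tuple[int, int]]:
    """Return (start, end) pairs such that seq[start:end] is flanked by AG ... GT.

    start: first base after an AG dinucleotide (candidate acceptor site).
    end:   index of the G in a downstream GT dinucleotide (candidate donor site).
    min_len/max_len are hypothetical exon-length limits, not values from the study.
    """
    seq = seq.upper()
    # A candidate exon starts immediately after the intronic AG of an acceptor.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # A candidate exon ends immediately before the intronic GT of a donor.
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    hits = []
    for start in acceptors:
        for end in donors:
            if min_len <= end - start <= max_len:
                hits.append((start, end))
    return hits


# Toy sequence: AG at index 3 and GT at index 11, so the candidate exon is seq[5:11].
print(candidate_exons("CCCAGAAAAAAGTCCC", min_len=3, max_len=20))  # [(5, 11)]
```

Every AG/GT pairing within the length window is reported. A real exonization analysis would then score each candidate against splice-site consensus models.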
TEs regulate gene activity by providing promoters, enhancers, new exons (via exonization), and new polyadenylation signals [19]. Consequently, many genes are regulated by TEs [20]. Nekrutenko and Li suggested that TEs are present in ∼4% of protein-coding regions in the human genome [21]. According to the genome-wide survey by Sela et al., 1824 human genes have TE-derived exons, and SINE and LINE elements account for approximately 68% and 18% of exonized TEs, respectively [22]. Although LINE elements are present in fewer copies and mediate exonization less frequently than SINE elements do, LINE exonization events still occur frequently in the human genome. LINEs comprise up to 20% of the human genome [23]. LINE1, LINE2, and LINE3/CR1 are 3 distantly related LINE families that represent approximately 17%, 3%, and 0.3% of the human genome, respectively [23]. Thus, transposable elements and intronic sequences may serve as transcript units that enrich the transcriptome with limited genomic resources [13].
We performed evolutionary and comparative analyses of human ZRANB2 and its counterparts in rhesus, crab-eating, and African-green monkeys and marmosets to investigate exonization events derived from LINE insertions and IR in the protein-coding region of ZRANB2.
We used a standard protocol to isolate genomic DNA from heparinized blood samples from the following species:

Molecular Cloning of Genomic PCR and RT-PCR Products and Sequencing Procedure.
RT-PCR products were separated on a 1.2% agarose gel, purified with the Gel SV extraction kit (GeneAll), and cloned into the pGEM-T Easy vector (Promega). The cloned DNA was isolated using the Plasmid DNA mini-prep kit (GeneAll). Sequencing of primate DNA samples and alternative transcripts was performed by a commercial sequencing company (Macrogen).

Integration Time of Plat L3 and L3b Elements.
To determine when the Plat L3 and L3b elements were integrated during primate radiation, we performed genomic PCR amplification in various primate samples using primer pairs specific to highly conserved regions (Figure 2). Randomly selected amplified products were validated by sequencing (Supplementary Figure 2). The Plat L3 and L3b elements in the ZRANB2 gene were present in all tested primate lineages, including hominoids (human, chimpanzee, bonobo, and gorilla), Old World monkeys (rhesus monkey, Japanese monkey, crab-eating monkey, pig-tailed monkey, African-green monkey, mandrill, colobus, and langur), New World monkeys (marmoset, tamarin, capuchin, squirrel monkey, night monkey, and spider monkey), and prosimians (ring-tailed lemur).
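The dating logic here follows the usual reasoning for retrotransposon insertions. An element found at the orthologous position in every tested lineage was most plausibly inserted once, before the most recent common ancestor (MRCA) of those lineages diverged. The Python sketch below encodes that reasoning on a simplified lineage tree. The tree, the function names, and the assumption that shared insertions are identical by descent are all illustrative. The only date included is the approximately 63 million year simian/prosimian figure quoted in this paper. Other node ages are deliberately left out rather than guessed.

```python
# Simplified primate lineage tree (child -> parent); an illustrative assumption,
# not a published phylogeny.
PARENT = {
    "hominoids": "catarrhines",
    "old_world_monkeys": "catarrhines",
    "catarrhines": "simians",
    "new_world_monkeys": "simians",
    "simians": "primates",
    "prosimians": "primates",
    "primates": None,
}

# Only the simian/prosimian divergence (~63 Mya) is taken from the text.
CROWN_AGE_MYA = {"primates": 63}


def lineage_path(node: str) -> list[str]:
    """Nodes from `node` up to the root, youngest first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def insertion_predates(carrier_lineages: list[str]) -> str:
    """Return the MRCA of all lineages carrying the insertion.

    Assuming the insertion is identical by descent, it must have occurred
    before the divergence at this node.
    """
    paths = [lineage_path(lineage) for lineage in carrier_lineages]
    shared = set(paths[0]).intersection(*paths[1:])
    # The first shared node on a path ordered youngest-first is the MRCA.
    return next(node for node in paths[0] if node in shared)


carriers = ["hominoids", "old_world_monkeys", "new_world_monkeys", "prosimians"]
mrca = insertion_predates(carriers)
print(mrca, CROWN_AGE_MYA.get(mrca))  # primates 63
```

Presence in all sampled lineages gives only a lower bound on insertion age. An upper bound would require outgroups that lack the element.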

RT-PCR Amplification and Sequencing Analysis of ZRANB2 in Human and Monkeys.
To investigate the expression pattern of the isoform b transcript (which contains the Plat L3-derived exon), we performed comparative RT-PCR analysis in 6 tissues from human and marmoset (cerebrum, colon, liver, lung, kidney, and stomach) and in 7 tissues from rhesus monkey, crab-eating monkey, and African-green monkey (cerebrum, colon, liver, lung, kidney, pancreas, and stomach). To amplify the isoform b-specific transcript in primate tissues, we designed primers specific to the 9th exon (sense primer) and the Plat L3-derived exon (antisense primer). We confirmed that these primer sequences were highly conserved in human and marmoset. The expected 215 bp RT-PCR products were detected in all tested samples (Figures 3(a)-3(e)). Remarkably, an unexpected larger product (upper bands) was also detected in all tissues of rhesus monkey, crab-eating monkey, and African-green monkey. Although these upper bands were barely detectable by gel electrophoresis in human and marmoset samples, we confirmed that very weak bands were present in these species. Sequencing revealed that the larger product was an intron-retained transcript variant (V1) (Figure 3(f)).
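With a single primer pair, the size difference between the expected band and the upper band follows directly from which segments lie between the two primers in each cDNA. The Python sketch below computes an amplicon length from segment lengths. All segment names, lengths, and primer offsets in the example are hypothetical placeholders, not the ZRANB2 coordinates used in this study.

```python
def amplicon_length(segments: list[str], lengths: dict[str, int],
                    fwd_offset: int, rev_offset: int) -> int:
    """Length of a PCR product across consecutive cDNA segments (5'->3').

    fwd_offset: 0-based position of the forward primer's 5' end in the first segment.
    rev_offset: 0-based position, on the sense strand, of the reverse primer's
                5' end in the last segment.
    """
    if len(segments) == 1:
        return rev_offset - fwd_offset + 1
    first, *middle, _ = segments
    return ((lengths[first] - fwd_offset)
            + sum(lengths[name] for name in middle)
            + (rev_offset + 1))


# Hypothetical segment lengths (bp), for illustration only.
LENGTHS = {"exon9": 100, "retained_intron": 500, "platl3_exon": 80}

spliced = amplicon_length(["exon9", "platl3_exon"], LENGTHS, 20, 59)
retained = amplicon_length(["exon9", "retained_intron", "platl3_exon"], LENGTHS, 20, 59)
print(spliced, retained, retained - spliced)  # 140 640 500
```

In this model, the intron-retained product is longer than the spliced product by exactly the retained intronic length between the primer sites. That is why it appears as a distinct, larger band with the same primer pair.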

TE-Exonization of LINE Elements during Primate Evolution.
A structural analysis revealed that Plat L3 is integrated in the intronic region between the 9th and 10th exons of ZRANB2. There, it provides alternative splice sites that generate a new Plat L3-derived exon via exonization (Figures 1 and 4). To determine when the Plat L3 element was integrated, PCR amplification was performed on various primate genomes (Figure 2). Plat L3 was present in all primate lineages, suggesting that it was integrated in a common ancestor before the divergence of simians and prosimians, possibly more than 63 million years ago. RT-PCR amplification detected the Plat L3-derived exon in human, Old World monkeys (rhesus monkey, crab-eating monkey, and African-green monkey), and a New World monkey (marmoset) (Figures 3(a)-3(e)). Sequencing analysis showed that the integrated Plat L3 sequences are highly conserved across all primate lineages compared with the adjacent intronic sequences and the L3b element (Supplementary Figure 2). Moreover, the canonical splice sites (splice acceptor and donor sites) are perfectly conserved from the hominoid to the prosimian lineages (Figure 4). Owing to these perfectly conserved splice sites and its well-conserved sequence, Plat L3 could be exonized in ZRANB2 transcripts in different primate lineages, including hominoids, Old World monkeys, and New World monkeys. We were unable to validate Plat L3-derived transcripts in prosimian samples. However, based on sequence analysis, we infer that the exonization event also occurs in the ring-tailed lemur (Supplementary Figure 2).
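The conservation comparison in this section (Plat L3 versus the adjacent intron and L3b) can be quantified as pairwise identity within alignment column ranges. The Python sketch below does this for a gapped multiple alignment stored as a dictionary. The toy sequences and column ranges are hypothetical, and gap columns are simply skipped. It illustrates the idea and does not reproduce the analysis behind Supplementary Figure 2.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def region_identity(alignment: dict[str, str], reference: str, start: int, end: int) -> float:
    """Mean identity of every other sequence to `reference` in columns [start, end)."""
    ref = alignment[reference][start:end]
    scores = [percent_identity(ref, seq[start:end])
              for name, seq in alignment.items() if name != reference]
    return sum(scores) / len(scores)


# Toy alignment: columns 0-3 stand in for a conserved element and
# columns 4-7 for less conserved flanking intron (hypothetical data).
ALIGNMENT = {
    "human": "AAAACCCC",
    "rhesus": "AAAACCGG",
    "marmoset": "AAAACGGG",
}
print(region_identity(ALIGNMENT, "human", 0, 4))  # 100.0
print(region_identity(ALIGNMENT, "human", 4, 8))  # 37.5
```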

Abundant IR Event in Old World Monkeys.
An unexpected large band (1225 bp) was detected by RT-PCR amplification in all tissues of Old World monkeys (rhesus monkey, crab-eating monkey, and African-green monkey) (Figure 3). Sequencing revealed this product to be an alternatively spliced variant in which the 9th intron is retained, that is, a product of an IR event [10]. Intriguingly, unlike in Old World monkeys, the intron-retained transcript was detected only at very low levels in human and marmoset. We propose 2 alternative hypotheses: lineage-specific IR events, or a mixed mechanism that combines IR events with lineage-specific protection from nonsense-mediated mRNA decay (NMD).
Previous models of IR hold that high GC content in an intron sequence reduces its excision rate [24] and that retained introns are significantly shorter than nonretained introns [25]. However, the 9th intron of ZRANB2 has a low GC content (approximately 30%) in human, rhesus monkey, and marmoset and is relatively long (1023 bp). These models therefore cannot explain the IR event in ZRANB2. The splice acceptor and donor sites of the 9th and 10th exons are highly conserved in primates, so the IR event is unlikely to be caused by weak splice-site signals (Figure 4). Recent studies have shown that binding of polypyrimidine tract-binding protein (PTB) to specific sequence motifs (CUCUCU, UUCUCU, UUCCUU, and CUUCUUC) represses splicing and induces IR in FOSB [26]. We found that the PTB-binding consensus sequence UUCUCU, located 16 bp upstream of the 3′ end of the 9th intron, is perfectly conserved from hominoids to prosimians (Supplementary Figure 2). Moreover, recent studies suggested that phosphorylation regulates the movement of PTB from the cytoplasm into the nucleus [27]. Taken together, these observations suggest a model for ZRANB2. In human, rhesus monkey, crab-eating monkey, African-green monkey, and marmoset, the IR event may be mediated by the PTB-binding sequence, and the phosphorylation status of PTB may regulate the extent of IR in each primate lineage.
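Both intronic features discussed above are easy to compute: GC content, and the position of PTB-binding motifs relative to the intron's 3′ end. The Python sketch below uses the 4 motifs listed in the text. The distance convention (number of intronic nucleotides 3′ of the motif) is an assumption. The toy intron is hypothetical and is not the ZRANB2 sequence.

```python
PTB_MOTIFS = ("CUCUCU", "UUCUCU", "UUCCUU", "CUUCUUC")  # motifs listed in the text


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def ptb_motif_hits(intron: str, motifs=PTB_MOTIFS) -> list[tuple[str, int, int]]:
    """Find PTB motifs in an intron given as DNA or RNA.

    Returns (motif, 0-based start, number of intronic nt 3' of the motif), sorted by start.
    """
    rna = intron.upper().replace("T", "U")
    hits = []
    for motif in motifs:
        start = rna.find(motif)
        while start != -1:  # overlapping occurrences are reported too
            hits.append((motif, start, len(rna) - (start + len(motif))))
            start = rna.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy intron: GT...AG with one UUCUCU motif near its 3' end.
intron = "GT" + "AAAA" + "TTCTCT" + "A" * 14 + "AG"
print(round(gc_content(intron), 3), ptb_motif_hits(intron))  # 0.143 [('UUCUCU', 6, 16)]
```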
To evaluate the mixed mechanism involving IR and lineage-specific protection from NMD, we first analyzed the relationship between IR events and NMD. In most cases, IR introduces premature termination codons (PTCs) into the mRNA, typically resulting in its degradation by NMD [28]. The exon junction complex (EJC) model of NMD is a well-established regulatory mechanism in mammals. In this model, the position of the PTC relative to the exon-exon junctions is critical. A PTC located more than 50-55 nucleotides (nt) upstream of the last exon-exon junction triggers mRNA decay by NMD, whereas a PTC located downstream of this boundary does not [29]. In the V1 transcript, retention of the 9th intron introduces a PTC near the 5′ end of the retained intron, adjacent to the 9th exon (Figure 3(f)). This PTC lies more than 50-55 nt upstream of the last exon-exon junction.
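Once the PTC and the exon structure of the mature mRNA are known, the EJC-dependent rule reduces to a simple coordinate comparison. The Python sketch below locates the first in-frame stop codon and applies a 50 nt threshold (the low end of the 50-55 nt range). Several choices are assumptions: the function names, the toy transcript, and measuring from the 3′ end of the stop codon to the last exon-exon junction. Real NMD outcomes also have known exceptions, including the NMD-escape cases cited in this paper.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """0-based mRNA position of the first stop codon in frame with cds_start."""
    mrna = mrna.upper()
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicts_nmd(exon_lengths: list[int], ptc_pos: int, threshold: int = 50) -> bool:
    """EJC rule of thumb: NMD if the stop codon ends more than `threshold` nt
    upstream of the last exon-exon junction.

    exon_lengths: lengths of the exons in the mature mRNA, 5'->3'.
    ptc_pos: 0-based mRNA position of the stop codon's first base.
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so no downstream EJC
    last_junction = sum(exon_lengths[:-1])  # first base of the last exon
    distance = last_junction - (ptc_pos + 3)  # nt between stop codon and junction
    return distance > threshold


mrna = "ATGAAAAAAAAATAAGGG"
print(first_in_frame_stop(mrna, 0))          # 12
print(predicts_nmd([30, 30, 100, 40], 30))   # True: stop ends 127 nt upstream
print(predicts_nmd([30, 30, 100, 40], 150))  # False: stop ends 7 nt upstream
```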
In principle, the V1 transcript should be degraded by NMD in all tested primates. However, it was detected in all tissues of rhesus monkey, crab-eating monkey, and African-green monkey (Figures 3(a)-3(e)). We therefore suggest that V1 is protected from NMD in a lineage-specific manner in Old World monkeys. No lineage-specific NMD protection mechanism has been clearly established. However, a few studies have shown that cytidine (C)-to-uridine (U) RNA-edited APOB mRNA is protected from NMD by the APOBEC1-ACF editing complex [30]. The uORF-containing thrombopoietin (TPO) mRNA and β-globin (HBB) transcripts carrying nonsense mutations in the first exon also escape NMD [31,32]. Old World monkey-specific sequence elements might act in a similar way. Supplementary Figure 2 shows 7 nucleotides that are specific to Old World monkeys, and we hypothesize that this sequence contributes to protection from NMD. However, further studies are needed to validate this mechanism, including the relationship between these specific nucleotides and NMD escape.
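Lineage-specific nucleotides, such as the 7 Old World monkey-specific positions, can be identified mechanically from a multiple alignment. A column qualifies if every ingroup sequence carries the same base and no outgroup sequence carries it. The Python sketch below applies that criterion. The toy alignment and species labels are hypothetical, and the criterion is one reasonable definition rather than the procedure used for Supplementary Figure 2.

```python
def lineage_specific_columns(alignment: dict[str, str], ingroup: set[str]) -> list[tuple[int, str]]:
    """Columns where all ingroup sequences share one base absent from every outgroup sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    inside = [seq.upper() for name, seq in alignment.items() if name in ingroup]
    outside = [seq.upper() for name, seq in alignment.items() if name not in ingroup]
    columns = []
    for i in range(lengths.pop()):
        in_bases = {seq[i] for seq in inside}
        out_bases = {seq[i] for seq in outside}
        # One shared, non-gap ingroup base that no outgroup sequence has.
        if len(in_bases) == 1 and "-" not in in_bases and in_bases.isdisjoint(out_bases):
            columns.append((i, next(iter(in_bases))))
    return columns


# Hypothetical toy alignment (not real ZRANB2 sequence).
ALIGNMENT = {
    "human": "ACGTACGT",
    "marmoset": "ACGTACGT",
    "rhesus": "ACCTACGA",
    "crab_eating": "ACCTACGA",
}
print(lineage_specific_columns(ALIGNMENT, {"rhesus", "crab_eating"}))  # [(2, 'C'), (7, 'A')]
```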
Species- and lineage-specific IR, NMD, and NMD escape mechanisms have not yet been demonstrated, and most IR studies have been performed in human and mouse. The abundant IR events that we observed in Old World monkeys (rhesus monkey, crab-eating monkey, and African-green monkey) therefore make these species an attractive subject for IR and NMD research, pending experimental validation of our results.
Isoform a, isoform b, and V1 encoded 330, 320, and 314 amino acids, respectively. The amino-acid sequences encoded by isoform b (SQVIGENTKQP) and V1 (FGFL) differ from the C-terminal sequence of isoform a. This region belongs to the RS domain, which is essential for the nuclear localization of ZRANB2 and is regulated by phosphorylation [33]. These functional features were demonstrated by an RS-domain deletion study, but more detailed deletion studies have not been performed. Although we did not perform a functional analysis, our results suggest that Plat L3 exonization and intron retention could yield various transcripts in the primate lineage. A previous study suggested that full-length L1 elements integrated in the intronic regions of host genes could cause transcriptional interference (TI) through IR and exonization [34]. However, transcription of ZRANB2 isoforms a and b and of the V1 variant was not prematurely terminated by IR or by exonization of the Plat L3 element (Figure 3(f)). Our results also show that the Plat L3-exonized isoform b was transcribed in all tested human and monkey samples. Thus, IR and exonization of the Plat L3 element did not cause a TI effect in ZRANB2. We therefore focused on IR and exonization as mechanisms of transcript diversity.
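Protein lengths and C-terminal differences such as those described above come from translating each transcript variant. The Python sketch below shows a generic six-frame translation with the standard genetic code and reports the longest ATG-initiated, stop-terminated open reading frame. The function names are illustrative and the toy sequence is hypothetical. This is not the specific software used in the study.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) laid out in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translation(seq: str) -> list[tuple[str, int, str]]:
    """(strand, frame, protein) for 3 forward and 3 reverse frames; unknown codons -> 'X'."""
    seq = seq.upper()
    frames = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = "".join(CODON_TABLE.get(s[i:i + 3], "X")
                              for i in range(frame, len(s) - 2, 3))
            frames.append((strand, frame, protein))
    return frames


def longest_orf(seq: str) -> str:
    """Longest Met-initiated peptide that is followed by a stop codon (stop not included)."""
    best = ""
    for _, _, protein in six_frame_translation(seq):
        # Every chunk except the last one is terminated by a stop codon.
        for chunk in protein.split("*")[:-1]:
            start = chunk.find("M")
            if start != -1 and len(chunk) - start > len(best):
                best = chunk[start:]
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))  # MKFG
```

Applied to full-length cDNA sequences of each isoform, comparing the resulting ORFs position by position would show where their C-terminal sequences diverge.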

Conclusion
We investigated and compared exonization of the Plat L3 element and IR events in the ZRANB2 gene in human and monkeys (rhesus monkey, crab-eating monkey, African-green monkey, and marmoset). First, we confirmed that the Plat L3 and L3b LINE elements in the intronic region of ZRANB2 were integrated in human and in all tested primate lineages (18 species, including hominoids, Old World monkeys, New World monkeys, and prosimian). RT-PCR experiments indicated that the Plat L3-derived exon was expressed in all tested tissues of human and the 4 monkeys, whereas the abundant IR event occurred only in Old World monkeys (rhesus monkey, crab-eating monkey, and African-green monkey). Six-frame translation analysis showed that the ZRANB2 transcript variants derived from these events encode products of different sizes. Based on our results, we propose that IR and TE-derived exonization events are intriguing evolutionary factors that could enhance transcriptome and protein diversity with limited genomic resources in the primate lineage.

Author's Contribution
Sang-Je Park, Jae-Won Huh, and Young-Hyun Kim contributed equally to this work.