Discovery of a Novel Long Noncoding RNA Lx8-SINE B2 as a Marker of Pluripotency

Pluripotency and self-renewal of embryonic stem cells (ESCs) are marked by core transcription regulators such as Oct4, Sox2, and Nanog. Long noncoding RNAs (lncRNAs) can also serve as markers of pluripotency. Here, we find that a novel lncRNA, Lx8-SINE B2, is a marker of pluripotency. LncRNA Lx8-SINE B2 is enriched in ESCs and downregulated during ESC differentiation. Using rapid amplification of cDNA ends, we identified the full-length sequence of lncRNA Lx8-SINE B2. We further showed that transposable elements upstream of lncRNA Lx8-SINE B2 can drive its expression. Furthermore, the ESC-specific expression of lncRNA Lx8-SINE B2 is driven by Oct4 and Sox2. In summary, we identified a novel lncRNA marker of ESCs that is driven by core pluripotency regulators.


Introduction
Most of the mammalian genome is composed of noncoding sequences. Among them, transposable elements (TEs) account for ~40% of the genome [1]. The majority of TEs are silenced; however, a small percentage are expressed during development and in disease [2]. TEs play multiple roles in these processes, functioning as enhancers, promoters, and long noncoding RNAs (lncRNAs) [3–6]. In vertebrates, about 70% of lncRNAs contain TEs [7]. TEs also confer tissue-specific expression on lncRNAs by recruiting transcription factors [3,4,6]. TE-derived lncRNAs actively participate in development. The TE-derived lncRNA ROR functions as a miRNA sponge and also works with hnRNPA1 to promote c-Myc expression during reprogramming [8–10]. LncRNAs derived from the endogenous retrovirus HERVH maintain the pluripotency of human embryonic stem cells [3,11–13]. Asymmetric expression of LincGET, an lncRNA derived from ERV1 and ERVK elements, in two- to four-cell mouse embryos biases cell fate toward the inner cell mass [14]. These findings all suggest an important role for TE-derived lncRNAs in development. However, most of these findings are based on human cell lines, and TE-derived lncRNAs in mouse embryonic stem cells (ESCs) remain poorly understood. In this study, we investigated the expression and regulation of one representative lncRNA, Lx8-SINE B2, in ESCs.
2.2. RNA Extraction, Reverse Transcription, and Quantitative PCR (qPCR). Total RNA was extracted with RNAiso Reagent (B9109, Takara) as described [15] and treated with DNase I in DEPC-treated water (B501005, Sangon Biotech) to remove genomic DNA. cDNA was synthesized in RNase-free tubes (401001, NEST Biotechnology) with the Transcriptor First Strand cDNA Synthesis Kit (4897030001, Roche) according to the manufacturer's instructions. qPCR was performed with the Hieff qPCR SYBR Green Master Mix (H97410, Yeasen) on a QuantStudio 6 Real-Time PCR System (Life Technologies). Primer sequences for qPCR analysis are listed in Table 1.

2.3. Depletion of Gene Expression with shRNAs. For gene knockdown, short hairpin RNAs (shRNAs) targeting luciferase (control) or the genes of interest were designed with an online tool (http://sirna.wi.mit.edu/) and synthesized by GENEWIZ. The shRNA plasmids were constructed in the pSuper-puro system and purified with a kit (1211-01, Biomiga). mESCs were transfected with the plasmids using PolyJet (SL100688, SignaGen) according to the manufacturer's protocol. Transfected ESCs were selected with 1 μg/ml puromycin starting 24 h after transfection and harvested after four days of selection. The shRNA sequences are listed in Table 2.

2.4. 5′ and 3′ Rapid Amplification of cDNA Ends (RACE) Analysis. For 3′ RACE, first-strand cDNA synthesis was primed from the poly(A) tail of total RNA with an oligo(dT)-containing RT adapter primer (AP) that anneals to the mRNA. The gene-specific primer pF1 was designed from the known sequence. The 3′ fragment was amplified with pF1 and the general primer gR1, and the RACE PCR products were separated on a 1.5% agarose gel.
For 5′ RACE, first-strand cDNA was synthesized from total RNA with a gene-specific primer (RT GSP1) designed from the known 3′ sequence. A homopolymer tail was then added to the 3′ end of the cDNA with a terminal deoxynucleotidyl transferase kit (2230A, Takara) according to the manufacturer's instructions. First-round PCR was performed with a dG adaptor primer complementary to the poly(C) tail to generate double-stranded cDNA. The general primer gP1 and the gene-specific primer pR2 were then used for second-round PCR to amplify the 5′ end of the cDNA. The RACE PCR products were separated on a 1.5% agarose gel and cloned into pEASY-T1 (TransGen Biotech) for Sanger sequencing. The gene-specific RACE primers used to map each end were obtained from Sangon Biotech and are listed in Table 3.
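Assembling the full-length transcript from the two RACE products amounts to joining the 5′ product and the 3′ product through the known sequence they share. The following is a minimal Python sketch under stated assumptions: both Sanger reads have already been trimmed of adaptor, homopolymer-tail, and poly(A) sequence, and both are written 5′→3′ on the transcript strand. The function name, the minimum overlap, and the toy sequences are hypothetical, and real reads would need mismatch-tolerant alignment rather than exact matching.

```python
def merge_race_products(five_prime: str, three_prime: str, min_overlap: int = 20) -> str:
    """Join a 5' RACE read and a 3' RACE read through their shared sequence.

    Hypothetical helper: assumes trimmed reads on the transcript strand and an
    exact (error-free) suffix/prefix overlap of at least `min_overlap` bases.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    five_prime = five_prime.upper()
    three_prime = three_prime.upper()
    # Try the longest possible overlap first and accept the first exact match.
    for k in range(min(len(five_prime), len(three_prime)), min_overlap - 1, -1):
        if five_prime[-k:] == three_prime[:k]:
            # Keep the 5' read whole and append only the 3' read's new bases.
            return five_prime + three_prime[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} nt")


if __name__ == "__main__":
    # Toy reads sharing the 8-nt segment GGGGTTTT (not real Lx8-SINE B2 sequence).
    print(merge_race_products("AAAACCCCGGGGTTTT", "GGGGTTTTACGTACGT", min_overlap=8))
    # prints AAAACCCCGGGGTTTTACGTACGT
```

Requiring a minimum overlap guards against joining the reads through a short chance match, which is a particular risk here because exons 1 and 3 overlap repetitive LINE1 and SINE B2 sequence.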
2.5. Dual-Luciferase Reporter Gene Assay. Mouse ESCs were seeded at a density of 8 × 10⁴ cells per well in a 24-well plate. The luciferase assay was performed as previously described [16]. Each well of E14 ESCs was transfected with a total of 200 ng of a reporter plasmid carrying one of the lncRNA Lx8-SINE B2 promoter fragments, or the empty pGL4.23 vector, together with 10 ng of pCMV-Renilla. The medium was changed 12 h after transfection. At 36 h after transfection, cells were collected and lysed in 1× passive lysis buffer. Luciferase activity was determined with the Dual-Luciferase Reporter Assay.

Table 1: Primer sequences for qPCR analysis.
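The normalization step of the reporter assay is not spelled out above. The sketch below assumes the conventional analysis for a co-transfected Renilla design: each well's firefly signal is divided by its Renilla signal to correct for transfection efficiency and cell number, and each ratio is then expressed relative to the mean ratio of the empty pGL4.23 wells. The function names and the luminometer readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def firefly_renilla_ratios(firefly, renilla):
    """Per-well firefly/Renilla ratio; Renilla corrects for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_empty_vector(test_ff, test_rl, empty_ff, empty_rl):
    """Express each test well relative to the mean ratio of empty-vector wells."""
    baseline = mean(firefly_renilla_ratios(empty_ff, empty_rl))
    return [ratio / baseline for ratio in firefly_renilla_ratios(test_ff, test_rl)]


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical luminometer readings, three wells per construct.
    empty_ff, empty_rl = [100.0, 120.0, 110.0], [10.0, 12.0, 11.0]
    orr1d2_ff, orr1d2_rl = [500.0, 600.0, 550.0], [10.0, 10.0, 10.0]
    folds = fold_over_empty_vector(orr1d2_ff, orr1d2_rl, empty_ff, empty_rl)
    print(folds)  # [5.0, 6.0, 5.5]
    print(mean_sem(folds))
```

Normalizing to the empty vector rather than to raw firefly counts makes promoter truncations comparable across experiments, since the baseline activity of pGL4.23 is measured in the same transfection.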

Results
3.1. Mapping the Full-Length Sequence of lncRNA Lx8-SINE B2. A previous study reported that lincRNA-1282 is expressed in ESCs and that its depletion leads to downregulation of c-Myc, an important reprogramming factor [17]. Because lincRNA-1282 is only a partial sequence of lncRNA Lx8-SINE B2, we performed 3′ and 5′ RACE with the primers shown in Figures 1(a) and 1(b) to identify the full-length transcript. Both the 5′ and 3′ RACE reactions yielded a single band without nonspecific products (Figures 1(a) and 1(b)). We then sequenced the amplicons and identified the 5′ and 3′ end sequences of lncRNA Lx8-SINE B2 (Figure 1(c)). With both ends defined, we designed primers to amplify full-length lncRNA Lx8-SINE B2 and subcloned it into a TA cloning vector (Figure 1(d)). The full-length lncRNA Lx8-SINE B2 is 734 bp long.

Stem Cells International

We then mapped this sequence to the mouse reference genome (mm10) and found that lncRNA Lx8-SINE B2 contains three exons and is located between the Adgrv1 and Lysmd3 genes (Figure 2(a)). Exon 1 overlaps an Lx8 element of the LINE1 family, and exon 3 overlaps a SINE B2 element (Figure 2(a)); we therefore named this lncRNA Lx8-SINE B2. To detect its expression, we designed primers in the nonrepetitive regions of exons 2 and 3. qPCR showed that lncRNA Lx8-SINE B2 was downregulated during ESC differentiation, similar to the pluripotency genes Oct4, Sox2, and Esrrb (Figure 2(b)). It was also expressed in ESCs but not in differentiated cells such as MEFs (Figure 2(c)). Furthermore, its expression was largely maintained across ESC culture conditions: it was only slightly higher in 2i/LIF or 2i than in serum/LIF (Figure 2(d)). Together, these results suggest that lncRNA Lx8-SINE B2 is a marker of ESCs.
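The section does not state how relative expression was computed from qPCR cycle thresholds (Ct). As a minimal sketch, assuming the widely used 2^−ΔΔCt method, with normalization to a stably expressed reference gene and approximately 100% amplification efficiency (so that the product doubles each cycle), the fold change relative to undifferentiated ESCs could be computed as follows. The reference gene, the function name, and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both primer pairs (product doubles
    each cycle) and a stably expressed reference gene (hypothetical here).
    """
    delta_sample = ct_target - ct_reference  # normalize the sample to the reference gene
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator  # same for the calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: undifferentiated ESCs serve as the calibrator.
    print(fold_change_ddct(27.0, 18.0, 24.0, 18.0))  # 0.125
```

In this example the target's Ct rises from 24 to 27 while the reference gene stays at 18, so ΔΔCt = 3 and the fold change is 2^−3 = 0.125, an eightfold decrease. The same calculation can estimate shRNA knockdown efficiency against the shLuciferase control.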

Promoter Structure of lncRNA Lx8-SINE B2. We next examined how the ESC-specific expression of lncRNA Lx8-SINE B2 is achieved. The 1 kb region upstream of lncRNA Lx8-SINE B2 contains ORR1D2 and SINE B1 elements (Figure 3(a)). To study how Lx8-SINE B2 is regulated in ESCs, we cloned the region spanning −623 bp to +327 bp of the lncRNA Lx8-SINE B2 gene into a luciferase reporter (Figures 3(a) and 3(b)). We also generated a series of truncations of this region to identify the core promoter of Lx8-SINE B2 (Figure 3(b)).
The fragment corresponding to the ERV element origin-region repeat 1 type D2 (ORR1D2; −157 bp to +3 bp) showed the strongest promoter activity among all truncations (Figure 3(c)). The promoter activity of ORR1D2 was specific to ESCs and was inactive in 3T3 fibroblasts (Figure 3(d)). These results indicate that lncRNA Lx8-SINE B2 is driven by the ERV ORR1D2, suggesting that TEs contribute not only to the exons of lncRNAs but also to their promoters.
To test whether Oct4 and Sox2 also activate the genes neighboring lncRNA Lx8-SINE B2, we examined the expression of Lysmd3 and Adgrv1 during ESC differentiation. Unlike lncRNA Lx8-SINE B2, neither Lysmd3 nor Adgrv1 was affected by LIF withdrawal (Figure 5(a)). Moreover, Lysmd3 and Adgrv1 were upregulated upon depletion of Oct4 or Sox2, indicating that they are regulated differently from Lx8-SINE B2 (Figures 5(b) and 5(c)). Finally, the expression of LINE1 and SINE B2 elements was not affected by Oct4 or Sox2 depletion (Figure 5(d)), confirming that Oct4 and Sox2 specifically activate lncRNA Lx8-SINE B2.
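The figure legends report mean ± s.e.m. of three biological replicates compared with a two-sided Student's t-test. As a hedged sketch of that test in its pooled-variance (equal-variance) form, the t statistic and its degrees of freedom can be computed with the Python standard library; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two replicates")
    # Pooled variance assumes both groups share the same population variance.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical relative-expression values from three biological replicates.
    control = [1.0, 1.2, 0.8]
    knockdown = [0.5, 0.4, 0.6]
    t, df = student_t(control, knockdown)
    print(round(t, 3), df)  # 3.873 4
```

The two-sided p-value is then read from the t distribution with n_a + n_b − 2 degrees of freedom (four for triplicates). If SciPy is available, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` corresponds to Student's test, returns both the statistic and the two-sided p-value.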

Discussion
In summary, we identified a novel pluripotency marker, lncRNA Lx8-SINE B2, whose expression is driven by the binding of Oct4 and Sox2 to ORR1D2. Oct4 and Sox2 are the core pluripotency regulators in ESCs [18,19], and they can drive the expression of lncRNAs in cancer cells and ESCs [20–22]. The binding profiles of OCT4 differ between human and mouse ESCs, which can be explained by differential binding to species-specific TEs [23]. Here, we found that Oct4 and Sox2 target the mouse TE ORR1D2 to drive ESC-specific lncRNA expression (Figure 4), further supporting an important role of TEs in driving the expression of species-specific lncRNAs. Although many pluripotency markers exist, Lx8-SINE B2 adds a novel one. It lies downstream of the key pluripotency genes Oct4 and Sox2 (Figure 4), and, being composed of TEs, it is distinct from traditional pluripotency markers. Compared with other ESC markers, Lx8-SINE B2 is unique as an ORR1D2-driven pluripotency marker, which demonstrates that transposable elements can function as cell type-specific lncRNAs and promoters, similarly to protein-coding genes.

Figure legend (partially recovered): … and Nanog (c) in ESCs. Data are presented as mean ± s.e.m. from three biological replicates (n = 3 dishes); ns, nonsignificant; **p < 0.01; ***p < 0.001 by two-sided Student's t-test. (d–f) Luciferase assay of core promoter activity of lncRNA Lx8-SINE B2 after depletion of the core transcription factors Oct4 (d), Sox2 (e), and Nanog (f) in ESCs; biological-triplicate data (n = 3 extracts) are presented as mean ± s.e.m. (g) Binding profiles of Sox2, Oct4, and Nanog on the promoter region of lncRNA Lx8-SINE B2 from published data, as described in the Methods; input was included as a control.
Finally, depletion of lincRNA-1282, which corresponds to part of Lx8-SINE B2, is associated with downregulation of Myc in ESCs [17]; Lx8-SINE B2 expression may therefore also reflect the Myc expression status of ESCs. Myc represses primitive endoderm differentiation [24] and maintains ESC pluripotency and self-renewal [25]. We therefore speculate that depletion of lncRNA Lx8-SINE B2 may cause a phenotype similar to that of Myc downregulation.
Our study demonstrates that different types of TEs combine to form lncRNAs and drive their expression (Figures 2 and 3), implicating TEs as important components of lncRNAs. TEs within lncRNAs act as important functional RNA domains [26,27] and regulate the tissue-specific expression of lncRNAs [4,28]. For example, HERVH-containing lncRNAs are specifically expressed in human ESCs [3,4,7]. TEs within lncRNAs also contribute to their functions. For example, the SINE B2 element in the antisense lncRNA of Uchl1 interacts with Uchl1 mRNA and promotes its translation by enhancing the association of the mRNA with polysomes [29]. These studies demonstrate that TEs are

Conclusion
In conclusion, we mapped the full-length sequence of lncRNA Lx8-SINE B2 and found that it is an ESC-specific lncRNA. We also found that its transcription is driven by ORR1D2, which is bound by Sox2 and Oct4. These findings support TEs as important components and promoters of lncRNAs.

Conflicts of Interest
We declare that there is no conflict of interest regarding this study.