The central dogma of gene expression considers RNA as the carrier of genetic information from DNA to protein. However, it has become increasingly clear that RNA plays more important roles than simply being the information carrier. Recently, whole genome transcriptomic analyses have identified large numbers of dynamically expressed long noncoding RNAs (lncRNAs), many of which are involved in a variety of biological functions. Even so, the functions and molecular mechanisms of most lncRNAs remain elusive. Therefore, it is necessary to develop computational methods to predict the function of lncRNAs in order to accelerate their study. Here, we review the recent progress in the identification of lncRNAs, the molecular functions and mechanisms of lncRNAs, and the computational methods for predicting the function of lncRNAs.
1. Introduction
Proteins and related protein-coding genes have been the main subject of biological studies for years. However, with the development of RNA sequencing technology and computational methods for assembling the transcriptome, it has become clear that, besides protein-coding genes, much of the mammalian genome is transcribed, and many noncoding RNA (ncRNA) transcripts tend to play important roles in a variety of biological processes. Understanding the function of ncRNAs has become one of the most important goals of modern biological studies [1–3]. ncRNAs can be classified into several distinct subclasses, including processed small RNAs [4], promoter-associated RNAs [5], and functional long noncoding RNAs (lncRNAs) [6]. The term lncRNA was introduced to distinguish this class of ncRNA from well-known small regulatory RNAs (e.g., miRNAs and siRNAs). lncRNAs are generally longer than 200 nucleotides [3, 7, 8]. Recent studies have shown that lncRNAs may act as important cis- or trans-regulators in various biological processes. Mutations in lncRNAs are associated with a wide range of diseases, especially cancers and neurodegenerative diseases. Even so, the functions and molecular mechanisms of most lncRNAs are unknown. Though several computational methods have been developed to predict the functions of lncRNAs, this remains a challenging task, partly owing to the lack of conservation in both the sequences and the secondary structures of lncRNAs [9–11]. In this paper, we summarize the recent progress and challenges in the identification, molecular mechanism, and function prediction of lncRNAs.
2. Definition and Classification of lncRNA
The definition of lncRNA is based on two criteria: size and the lack of protein-coding potential. In this paper, lncRNA refers to non-protein-coding RNA longer than 200 nt [7, 10–12], which distinguishes lncRNAs from mRNAs and small regulatory RNAs in a relatively satisfying way [11, 13]. Depending on their relationships with the nearest protein-coding genes, lncRNAs can be classified in three different ways [12, 14, 15]: (1) sense or antisense: lncRNAs located on the same strand as, or the opposite strand to, the nearest protein-coding gene [16]; (2) divergent or convergent: lncRNAs transcribed in the divergent or convergent orientation relative to the nearest protein-coding gene [12]; (3) intronic or intergenic: lncRNAs located inside the introns of a protein-coding gene, or in the regions between two protein-coding genes [12, 17].
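As an illustration, the three classification schemes above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a published annotation pipeline: the `Gene` record, its half-open coordinates, the extra "overlapping" label, and the reading of "divergent" as head-to-head and "convergent" as tail-to-tail for non-overlapping opposite-strand pairs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (coordinates are half-open: [start, end))."""
    chrom: str
    start: int
    end: int
    strand: str          # "+" or "-"
    introns: tuple = ()  # (start, end) pairs; only used for the coding gene


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify(lnc, coding):
    """Label an lncRNA relative to its nearest protein-coding gene."""
    # (1) sense or antisense: same or opposite strand
    strand_label = "sense" if lnc.strand == coding.strand else "antisense"

    # (3) intronic or intergenic ("overlapping" is an extra label assumed here
    # for lncRNAs that overlap the gene without lying inside a single intron)
    if overlaps(lnc, coding):
        in_intron = any(s <= lnc.start and lnc.end <= e for s, e in coding.introns)
        location = "intronic" if in_intron else "overlapping"
    else:
        location = "intergenic"

    # (2) divergent or convergent: assumed to apply only to non-overlapping
    # pairs on opposite strands
    orientation = None
    if strand_label == "antisense" and location == "intergenic":
        left = lnc if lnc.start < coding.start else coding
        # left gene on "-" means the pair is transcribed away from each other
        orientation = "divergent" if left.strand == "-" else "convergent"

    return strand_label, orientation, location
```

For example, an lncRNA on the minus strand lying upstream of a plus-strand coding gene would be labeled antisense, divergent, and intergenic under these assumptions.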
3. Identification of lncRNA
To identify lncRNAs, the first step is to obtain all transcripts including ncRNAs and mRNAs in cells, and then to distinguish lncRNAs from mRNAs and other types of ncRNAs. Traditional technologies, such as microarray, focus on the identification of protein-coding RNA transcripts. New technologies, such as RNA-Seq, are not limited to the identification of protein-coding RNA transcripts, and have led to the discovery of many novel ncRNA transcripts. The discrimination between lncRNAs and other small regulatory ncRNAs depends on their length. However, the length information alone is not enough to separate lncRNAs from mRNAs, and other criteria are needed for this purpose. Below, we will first briefly introduce new technologies in identifying RNA transcripts, especially ncRNA transcripts. Then, we will review current methods to distinguish lncRNAs from mRNAs.
3.1. Experimental Methods in Identifying lncRNA
Microarray
Traditional microarray technologies use predefined probes to determine the expression level of mRNA transcripts and are not well suited to identifying lncRNAs. However, it has been found that a few previously defined mRNAs or probe sequences actually correspond to lncRNAs; thus, earlier microarray datasets can be reannotated to study the expression of lncRNAs [60]. As more lncRNAs are discovered, new probes specific for lncRNAs can be designed. For example, Babak et al. designed probes from conserved intergenic and intragenic regions to identify potential ncRNA transcripts [61]. However, microarrays are not sensitive enough to detect RNA transcripts with low expression levels. Thus, the use of microarrays to identify lncRNAs is limited by the low expression level of many lncRNAs.
SAGE and EST
SAGE (serial analysis of gene expression) technology produces large numbers of short sequence tags and is capable of identifying both known and unknown transcripts. SAGE has been used and proved to be an efficient approach in studying lncRNAs. For example, Gibb et al. compiled 272 human SAGE libraries. By processing over 24 million tags, they were able to generate lncRNA expression profiles in human normal and cancer tissues [62]. Lee et al. also used SAGE to identify potential lncRNA candidates in male germ cells [63]. However, SAGE is much more expensive than microarray and is therefore not widely employed in large-scale studies. An EST (expressed sequence tag) is a short subsequence of cDNA generated from one-shot sequencing of a cDNA clone. The public database now contains over 72.6 million ESTs (GenBank 2011), making it possible to discover novel transcripts. For example, Furuno et al. clustered ESTs to find functional and novel lncRNAs in mammals [64]. Huang et al. used the public bovine-specific EST database to reconstruct transcript assemblies and found transcripts in intergenic regions that are putative lncRNAs [65].
RNA-Seq
With the development of next generation sequencing (NGS) technologies, RNA-Seq (also named whole transcriptome shotgun sequencing) has been widely used for novel transcript discovery and gene expression analysis. Compared to traditional microarray technology, RNA-Seq has many advantages in studying gene expression. It is more sensitive in detecting less-abundant transcripts and in identifying novel alternative splicing isoforms and novel ncRNA transcripts. The basic workflow for lncRNA identification using RNA-Seq is shown in Figure 1. RNA-Seq is currently the most widely used technology for identifying lncRNAs. For example, Li et al. applied RNA-Seq to identify lncRNAs during chicken muscle development [66]. Nam and Bartel integrated RNA-Seq, poly(A)-site, and ribosome mapping information to obtain lncRNAs in C. elegans [16]. Pauli et al. performed RNA-Seq experiments at eight stages during zebrafish early development and identified 1133 noncoding multiexonic transcripts [67]. Prensner et al. used RNA-Seq to study lncRNAs in human prostate cancer from 102 prostate tissues and cell lines, and concluded that lncRNAs may be used for cancer subtype classification [68].
Figure 1: Workflow of lncRNA identification from RNA-Seq.
RNA-IP
RNA-IP (RNA-immunoprecipitation) is a method developed to identify lncRNAs that interact with a specific protein. Antibodies against the protein are first used to isolate lncRNA-protein complexes. Then, a cDNA library is constructed, followed by deep sequencing of the interacting lncRNAs. Using RNA-IP, Zhao et al. discovered a 1.6-kb lncRNA within Xist that interacts with PRC2 [69].
Chromatin Signature-Based Approach
The above-mentioned methods target RNA transcripts directly. In contrast, the chromatin signature-based approach uses chromatin signatures, such as H3K4me3 (the marker of active promoters) and H3K36me3 (the marker of transcribed regions), to study actively transcribed genes including lncRNAs. In this approach, ChIP-Seq is used to generate genome-wide profiles of chromatin signatures [70], and the transcribed regions are mapped in the genome, where lncRNAs are determined and studied. For example, Guttman et al. identified 1,600 large multiexonic lncRNAs that are regulated by key transcription factors such as p53 and NFκB [71]. The advantage of this approach is its directness in investigating the mechanisms that regulate lncRNA expression.
3.2. Computational Methods in Identifying lncRNA
ORF Length Strategy
Unlike in protein-coding genes, the start codons and termination codons in lncRNAs tend to be distributed randomly. As a result, from a probabilistic point of view, the ORFs of lncRNAs can hardly extend beyond 100 codons. Based on this principle, one way to discriminate lncRNAs from mRNAs is by ORF length. For example, the FANTOM project used a maximum ORF length cutoff of 100 codons to differentiate noncoding RNAs from mRNAs [72]. However, some lncRNAs are known to have ORFs longer than 100 codons, while some protein-coding genes encode proteins of fewer than 100 amino acids, such as the RCI2A gene in Arabidopsis, which encodes a protein of 54 amino acids [73]. Thus, this approach may cause misclassification. To overcome the drawbacks of methods based on ORF length, Jia et al. utilized a comparative genomics method to refine ncRNA candidates. They defined RNA sequences as ncRNAs only if the cDNAs have no homologous proteins longer than 30 amino acids across the mammalian genomes [7]. However, this method relies largely on the completeness of the databases. Therefore, deficiencies in protein-coding annotation may also cause misclassification of lncRNAs.
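The ORF length criterion can be illustrated with a short Python sketch. It is a simplified example rather than the FANTOM implementation: it scans only the three forward reading frames, requires an ATG start and an in-frame stop codon, and treats a longest ORF strictly below the cutoff as putatively noncoding. The function names and the handling of the boundary case are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length in codons (start included, stop excluded) of the longest
    ATG...stop ORF in the three forward frames of a transcript."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = 0  # codons in the currently open ORF (0 means none open)
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current == 0:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = 0
            else:
                current += 1
    return best


def is_putative_noncoding(seq, max_orf_codons=100):
    """Apply an ORF length cutoff (100 codons by default, as in the text)."""
    return longest_orf_codons(seq) < max_orf_codons
```

As the surrounding text notes, such a cutoff misclassifies lncRNAs with long ORFs and genes encoding short proteins, so in practice it would be one filter among several rather than a standalone test.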
Sequence and Secondary Structure Conservation Strategy
Compared to protein-coding genes, noncoding genes are generally less conserved, meaning that they are more inclined to mutate [21, 67]. Thus, measuring the coding potential is considered a way of identifying lncRNAs. Codon Substitution Frequency (CSF) is one such criterion. For example, Guttman et al. used the maximum CSF score to assess the coding potential of an RNA sequence [71]. Clamp et al. and Lin et al. further combined CSF with reading frame conservation (RFC) to discriminate lncRNAs from mRNAs [74, 75]. Similar methods include PhyloCSF, which uses a phylogenetic framework to build two phylogenetic codon models that can distinguish coding from noncoding regions [76], and RNAcode, which combines amino acid substitution with gap patterns to assess the coding potential [77]. There are also methods that explore the conservation of RNA secondary structures to identify lncRNAs, including the programs QRNA [78], RNAz [79], and EvoFOLD [80]. However, this approach is limited by the lack of common conserved secondary structures specific to lncRNAs.
Machine Learning Strategies
Owing to the complex identities of lncRNAs, an increasing number of machine learning-based methods have recently been developed to integrate various sources of data to distinguish lncRNAs from mRNAs. Table 1 summarizes the machine learning methods and the features used to train the models for identifying lncRNAs. For instance, CONC utilizes a series of protein features, such as amino acid composition, secondary structure, and peptide length, to train an SVM model that distinguishes lncRNAs from mRNAs [18]. CPC (Coding Potential Calculator) also uses an SVM, extracting sequence features and comparative genomics features to assess the coding potential of transcripts [19, 20]. Lu et al. developed a machine learning method that integrates GC content, DNA conservation, and expression information to predict lncRNAs in C. elegans [21].
Although the above-described methods have shown their effectiveness in identifying lncRNAs, exceptional cases remain. For instance, whether an RNA transcript is translated or not may change during the course of evolution. As an example, Xist, a well-known lncRNA, evolved from a protein-coding gene [81]. Besides, some genes are bifunctional, with both coding and noncoding isoforms. The steroid receptor RNA activator (SRA) was previously characterized as a noncoding RNA, but a coding product was detected later [82]. Such ambiguity will be clarified when more is known about lncRNAs.
lncRNAs were once thought of as the “dark matter” of the genome because of our limited knowledge about their functions [83]. As more studies of lncRNAs have been conducted, it has become clear that lncRNAs have many specific functional features and are likely to be involved in many diverse biological processes in cells. Rather than “dark matter,” they may act as necessary functional parts of the genome. These functional features include but are not limited to the following: (i) lncRNAs have conserved splice junctions and introns [84]; (ii) the expression patterns of lncRNAs are tissue- and cell-specific [12, 67]; (iii) altered expression of lncRNAs can be found in neurodegeneration, cancer, and other diseases [9, 10]; (iv) lncRNAs are associated with particular chromatin signatures that are indicative of actively transcribed genes [11, 85]. Below, we briefly summarize the cellular functions of lncRNAs and the molecular mechanisms of their functions.
4.1. Cellular Functions of lncRNA
With thousands of lncRNAs identified in mammals and other vertebrates [16], a few lncRNAs have been extensively studied, which have shed light on their possible functions. Firstly, lncRNAs are involved in various epigenetic regulations through recruitment of chromatin remodeling complexes to specific genomic loci, such as Xist, Air, and Kcnq1ot1 [22, 43]. Secondly, lncRNAs can regulate gene expression by interacting with protein partners in biological processes like protein synthesis, imprinting (Kcnq1ot1, Air), cell cycle control (TERRA), alternative splicing (MALAT1), and chromatin structure regulation (DNMT3b, PANDA) [9, 10, 38, 71, 85–89]. Thirdly, lncRNAs are involved in enhancer-regulating gene activation (eRNAs), in which cases they may interact directly with distal genomic regions [90]. Fourthly, some lncRNAs serve as interacting partners or precursors for short regulatory ncRNAs [91]. For example, microRNAs (miRNAs) can be generated through sequential cleavage of lncRNAs, while Piwi-interacting RNAs (piRNAs) can be produced by processing a single lncRNA transcript [88].
Recent studies have shown that the expression of lncRNAs is tissue specific. Loewer et al. studied the expression of lncRNAs during global remodeling of the epigenome and reprogramming of somatic cells to induced pluripotent stem cells (iPSCs). They found that some lncRNAs have cell-type specific expression patterns [26, 92]. Loss-of-function studies on most intergenic lncRNAs expressed in mouse embryonic stem (ES) cells revealed that knockdown of intergenic lncRNAs has major consequences on gene expression patterns, which are comparable to the effects of knockdown of well-known ES cell regulators [93]. This indicates that lncRNAs might play important roles in regulating developmental processes. The ENCODE project analyzed the tissue-specific expression of lncRNAs in 31 cell types and found that many lncRNAs have brain-specific expression patterns [9, 12]. There are increasing lines of evidence that link dysregulation of lncRNAs to diverse human diseases ranging from neuronal diseases to cancer [9, 10], suggesting that the involvement of lncRNAs in human diseases may be far more prevalent than previously thought [94].
4.2. Molecular Mechanisms of lncRNA
The precise mechanism by which lncRNAs function remains largely unknown. Currently, there are several hypotheses, including (1) RNA:DNA:DNA triplex (trans-); (2) RNA:DNA hybrid; (3) RNA:RNA hybrid of lncRNA with a nascent transcript; (4) RNA-protein interaction (cis-/trans-). Although only (1), (2), and (4) have been experimentally demonstrated so far [14], it is generally thought that lncRNAs may function through interaction with their partners, such as DNA, RNA, or protein, and serve the following roles: signal, decoy, scaffold, and guide [11, 14]. Table 2 lists lncRNAs that use different mechanisms when carrying out their functions. Below, we give examples for the above-mentioned mechanisms.
Table 2: Function classification of lncRNAs.

| Archetype | lncRNA name | Length | Target | Function | cis-/trans- | References |
|---|---|---|---|---|---|---|
| Signal | KCNQ1ot1, Air, Xist | 91 kb, 108 kb, ~17 kb | G9a, PRC, YY1 | Transcriptional silencing of multiple genes; X inactivation (XCI) | cis- | [11, 14, 22, 23] |
| | HOTAIR, Frigidair, HOTTIP | 2.2 kb, N.A., 3.7 kb | LSD1-CoREST | Signals of anatomic position | trans- | [6, 11, 14] |
| | lincRNA-p21, PANDA | 3 kb; 1.5 kb | hnRNP-K | p53 targets in response to DNA damage | trans- | [14, 24, 25] |
| | lincRNA-RoR | 2.6 kb | Oct4, Sox2, Nanog | Pluripotency-associated | N.A.^b | [11, 26] |
| | COOLAIR, COLDAIR | Multiple spliced: 400 bp/750 bp; ~1.1 kb | FLC, PRC2 | Combinatorial transcriptional regulation | N.A. | [27, 28] |
| | eRNA | Various sizes | MLL-WDR5, TFs^a | Promotes mRNA synthesis | cis- | [29, 30] |
| | Gas5 | ~7 kb | Glucocorticoid receptor | Represses the glucocorticoid receptor | N.A. | [31] |
| | 1/2-sbsRNAs | N.A.^c | SMD | Formation of STAU1 binding sites | N.A. | [32] |
| Decoys | DHFR-Minor | 7.3, 5.0, 1.4, and 0.8 kb | TFIIB | Inhibits assembly of the preinitiation complex | N.A. | [33] |
| | TERRA | Various sizes | Telomerase | Regulation and protection of chromosome ends | N.A. | [34] |
| | PANDA | 1.5 kb | NF-YA | Inhibits expression of apoptotic genes | trans- | [35] |
| | PTENP1 | ~3.9 kb | PTEN | Sequestration of miRNAs | N.A. | [36, 37] |
| | MALAT1 | ~7 kb | SR splicing factors | Alters pattern of alternative splicing | N.A. | [38, 39] |
| Guides | Xist | ~17 kb | PRC2, YY1 | Inactivates X chromosome | cis- | [14, 40–42] |
| | Air, COLDAIR | 108 kb, | G9a, PRC2 | Silences transcription, affects histone acetylation and methylation states | cis- | [28, 43, 44] |
| | HOTTIP | ~3.8 kb | MLL-WDR5 | Chromosomal looping, chromatin modifications | cis- (looping) | [11, 45] |
| | HOTAIR | 2.2 kb | LSD1-CoREST | Alters and regulates epigenetic states | trans- | [14, 46, 47] |
| | Jpx | Multiple isoforms | polycomb complex^a | Activation of Xist RNA on the inactive X | trans- | [11, 48] |
| | lincRNA-p21 | 3 kb | hnRNP-K^a | p53 targets in response to DNA damage | trans- | [11, 24] |
| Scaffold | TERC | Various sizes | TERT | Telomerase catalytic activity | trans- | [49, 50] |
| | HOTAIR | 2.2 kb | PRC2, LSD1, CoREST, REST | Demethylates histone H3 on K4 to antagonize gene activation | trans- | [46, 51] |
| | ANRIL | Multiple spliced: 3.9 kb/34.8 kb | PRC1, PRC2 | Contributes to the functions of both PRC1 and PRC2 proteins | trans- | [52, 53] |
| | Alpha Satellite Repeat LncRNA | N.A. | SUMO-HP1 | Molecular scaffold for the targeting and local accumulation of HP1 | N.A. | [11, 54] |

^a Not yet understood.
^b Not clearly referred to as cis-action.
^c No length data available in any of the six databases listed in Table 3.

Signal
Some lncRNAs have been reported to respond to diverse stimuli, hinting that they may act as molecular signals [12, 24, 25, 27, 35]. For example, lncRNAs can act as markers for imprinting (Air and Kcnq1ot1), X inactivation (Xist), and silencing (COOLAIR). ChIP-Seq studies showed that gene-activating enhancers produce lncRNA transcripts (eRNAs) [29, 95], and that their expression level positively correlates with that of nearby genes, indicating a possible role in regulating mRNA synthesis. This is supported by a recent loss-of-function study which found that the knockdown of 7 out of 12 lncRNAs affects the expression of their cognate neighboring genes [8].
Decoy
An lncRNA can function as a molecular decoy to negatively regulate an effector. Gas5 contains a hairpin sequence motif that resembles the DNA-binding site of the glucocorticoid receptor [31]. It can serve as a decoy to release the receptor from DNA and prevent transcription of metabolic genes [14]. Another example is the telomeric repeat-containing RNA (TERRA). It interacts with the telomerase protein through a repeat sequence complementary to the template sequence of telomerase RNA [11, 34].
Guide
Upon interaction with the target molecule, an lncRNA may have the ability to guide it into the proper position either in cis (on neighboring genes) or in trans (on distantly located genes). The newly found eRNAs appear to exert their effects in cis by binding to specific enhancers and actively engaging in the regulation of mRNA synthesis [11, 29]. HOTAIR and HOTTIP are transcribed within the human HOX clusters and serve as signals of anatomic position, being expressed in cells that have distal and posterior positional identities; they both require their interacting partners to be properly localized to the site of action [6]. In this process, chromosomal looping of the 5′ end of HOXA brings HOTTIP into the spatial proximity of multiple HOXA genes, enforcing the maintenance of H3K4me3 and gene activation [14]. This long-range gene activation mechanism suggests that chromosome looping plays a central role in delivering lncRNA to its site of action [11, 45].
Scaffold
Recent studies found that several lncRNAs have the capacity to bind more than two protein partners, where the lncRNAs serve as adaptors to form functional protein complexes. The telomerase RNA TERC is a classic example of an RNA scaffold and is essential for telomerase function. HOTAIR binds the polycomb complex PRC2 to exert its “signal” function. A recent study found that the 3′ 700 nt of HOTAIR also interact with a second complex consisting of LSD1, CoREST, and REST to antagonize gene activation, further emphasizing its important role as the scaffold of the functional complex [11, 51].
Cis- and Trans-Action of lncRNAs
lncRNAs can be classified as cis- or trans-regulators depending on whether they exert their function on a neighboring gene on the same allele from which they are transcribed [96]. It was previously thought that many lncRNAs act as cis-regulators, as the expression of lncRNAs is significantly correlated with that of their neighboring protein-coding genes [97, 98]. However, recent studies have suggested that the positive correlation between lncRNAs and their neighboring genes may be due to shared upstream regulation (such as lincRNA-p21 [24] and lincRNA-Sox2 [6]), positional correlation (such as HOTAIR [6]), transcriptional “ripple effects” [98], or indirect regulation of neighboring genes, rather than to cis-regulation. This is supported by the finding that knockdown of a number of lncRNAs had little effect on the expression of neighboring genes [96]. In general, it has been accepted that some lncRNAs are cis-regulators [99, 100], while the vast majority may function as trans-regulators [6, 11, 93]. Recently, some cis-regulating lncRNAs were found to have the capacity to act in trans [33, 101, 102], highlighting the complexity of lncRNAs.
Although substantial research progress has been made since the discovery of lncRNAs, understanding their functions remains a challenge. One reason is that, unlike protein-coding genes, whose mutations may result in obvious and severe phenotypes, mutations in lncRNAs often do not cause significant phenotypes [85]. It is likely that lncRNAs function at specific stages of development or under specific conditions, and thus condition-specific studies of lncRNA phenotypes may be necessary. As more omics data about lncRNAs accumulate, computational prediction of the function of lncRNAs can help to design experiments and accelerate the understanding of lncRNAs.
5. lncRNA Database
The current lncRNA databases are summarized in Table 3. lncRNAdb is an integrated database specific for lncRNAs, covering annotation, sequence, structure, species, and function information [55]. NONCODE is a database of ncRNAs that have been experimentally confirmed. It covers almost all published 73,272 lncRNAs in human and mouse; it also includes expression profiles of lncRNAs and their potential functions predicted from the coding-noncoding coexpression network (see below) [56]. LNCipedia is another integrated lncRNA database, which includes 21,488 annotated human lncRNAs. It contains information about the coding potential, secondary structure, and microRNA binding sites of lncRNAs [57]. fRNAdb and NRED are databases for ncRNAs including lncRNAs [58, 59]. The above databases greatly facilitate further analysis and applications of lncRNAs.
Table 3: List of lncRNA databases.

| Tools | Source | Description | Reference |
|---|---|---|---|
| lncRNAdb | http://www.lncrnadb.org/ | Contains a comprehensive list of lncRNAs in eukaryotes, and mRNAs with regulatory roles | [55] |
| NONCODE | http://noncode.org/ | Integrative annotation of noncoding RNA (73,372 lncRNAs) | [56] |
| LNCipedia | http://www.lncipedia.org/ | 21,488 annotated human lncRNA transcripts with secondary structure information, protein coding potential, and microRNA binding sites | [57] |
| fRNAdb | http://www.ncrna.org/frnadb/ | A large collection of noncoding transcripts including annotated/unannotated sequences from H-inv database, NONCODE, and RNAdb | |

Computational prediction of lncRNA functions is still at an early stage of development. Unlike protein-coding genes, whose sequence motifs are indicative of their function, lncRNA sequences are usually not conserved and do not contain conserved sequence motifs [103, 104]. The secondary structures of lncRNAs are also not conserved [105]. Thus, it is difficult to infer the function of lncRNAs from their sequences or secondary structures alone. Since current knowledge suggests that lncRNAs function by regulating or interacting with their partner molecules, current methods focus on exploring the relationships between lncRNAs and protein-coding genes or miRNAs. Below, we describe several current approaches for predicting the functions of lncRNAs.
6.1. Comparative Genomics Approach
Although most lncRNAs are not conserved, there are lncRNAs that are conserved across species, indicating their essential functions. Amit et al. identified 78 lncRNA transcripts conserved in both human and mouse, and found that 70 are either located within or close (<1000 nt distance) to a coding gene that is also conserved in the two genomes [106]. They assumed that these lncRNAs might have close functional relationships with the nearby coding genes. However, this approach is limited by the poor conservation of lncRNAs and cannot be applied at genome scale.
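The proximity criterion used in this kind of analysis (an lncRNA located within, or less than 1000 nt away from, a coding gene) can be sketched as follows. This is only an illustration of the distance test: the tuple layout and function names are hypothetical, and the cross-species conservation filter that the study also applied is not modeled here.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in nt between two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def coding_genes_nearby(lnc, coding_genes, max_gap=1000):
    """Names of coding genes that overlap the lncRNA or lie closer than max_gap nt.

    Each record is a hypothetical (name, chrom, start, end) tuple.
    """
    _, chrom, start, end = lnc
    hits = []
    for g_name, g_chrom, g_start, g_end in coding_genes:
        if g_chrom == chrom and interval_gap(start, end, g_start, g_end) < max_gap:
            hits.append(g_name)
    return hits
```

The coding genes returned for a conserved lncRNA would then be the candidates for a shared functional relationship, which is the assumption made in the text above.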
6.2. Coexpression with Coding Genes Approach
Many of the lncRNAs studied so far play important regulatory roles, and it is likely that lncRNAs regulating a specific biological process are coexpressed with the genes involved in the same process. Thus, identifying coding genes that are coexpressed with lncRNAs may help to infer the function of lncRNAs. Based on this assumption, Guttman et al. developed a coexpression-based method to predict lncRNA functions at genome scale [71]. For each lncRNA, they ranked coding genes based on their coexpression level with the lncRNA, and then performed a Gene Set Enrichment Analysis (GSEA) for the top-ranked genes to identify enriched functional terms corresponding to the lncRNA. Out of 150 lncRNAs subjected to experimental validation, 85 exhibited the predicted functions, supporting the effectiveness of using coexpressed coding genes to infer the function of lncRNAs. According to their predictions, lncRNAs participate in a rather wide range of biological processes such as cell proliferation, development, and immune surveillance. Pauli et al. employed a similar approach to predict the function of lncRNAs during zebrafish embryogenesis [67].
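The idea of ranking coding genes by coexpression and then testing the top-ranked genes for functional enrichment can be sketched in Python as below. This is not the published method: Pearson correlation is assumed as the coexpression measure, and a simple hypergeometric overrepresentation test is used in place of GSEA. All function names, the `top_n` parameter, and the significance threshold are illustrative assumptions.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(vx * vy)


def rank_coexpressed(lnc_profile, coding_profiles):
    """Coding gene names sorted from most to least correlated with the lncRNA."""
    scores = {g: pearson(lnc_profile, p) for g, p in coding_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def enriched_terms(lnc_profile, coding_profiles, gene_sets, top_n, alpha=0.05):
    """Functional terms overrepresented among the top_n coexpressed coding genes."""
    ranked = rank_coexpressed(lnc_profile, coding_profiles)
    top = set(ranked[:top_n])
    results = {}
    for term, members in gene_sets.items():
        members = set(members) & set(ranked)
        p = hypergeom_pvalue(len(top & members), len(top), len(members), len(ranked))
        if p <= alpha:
            results[term] = p
    return results
```

With real data, the p-values would also need correction for multiple testing across terms and lncRNAs, which is omitted from this sketch.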
Liao et al. extended the coexpression idea by constructing a coding-noncoding (CNC) gene coexpression network [107]. In contrast to the GSEA method, which collects the coding genes coexpressed with each lncRNA, the CNC method considers not only the coexpression between lncRNAs and coding genes, but also the coexpression within the lncRNA group and within the coding gene group. When predicting the function of lncRNAs, the CNC method employs two different approaches: the hub-based and the network-module-based. In the hub-based approach, functions are assigned to each lncRNA according to the functional enrichment of its neighboring genes. In the network-module-based approach, the Markov cluster algorithm (MCL) is used to identify coexpressed functional modules in the CNC network; the functions of each module are then transferred to the lncRNAs inside the module. Liao et al. applied the CNC method to annotate the functions of 340 mouse lncRNAs, and found that these lncRNAs function mainly in organ or tissue development, cellular transport, and metabolic processes.
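A rough sketch of the hub-based idea is given below. It is a simplification and not the CNC implementation: edges are assumed to be drawn where the absolute Pearson correlation reaches a hypothetical threshold, and a simple count of annotation terms among the neighbors stands in for a proper functional enrichment test. The module-based variant using MCL is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def build_network(profiles, threshold=0.9):
    """Undirected coexpression graph over all genes, coding and noncoding alike."""
    neighbors = {g: set() for g in profiles}
    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def hub_based_annotation(lnc, neighbors, annotations, min_support=2):
    """Terms carried by at least min_support annotated neighbors of one lncRNA."""
    counts = Counter(
        term for g in neighbors[lnc] for term in annotations.get(g, ())
    )
    return [term for term, c in counts.most_common() if c >= min_support]
```

Because the graph includes lncRNA-lncRNA and coding-coding edges as well, the same `neighbors` structure could be passed to a clustering step for the module-based variant.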
6.3. Interaction with miRNAs and Proteins Approach
Recent analyses found that lncRNAs act synergistically with miRNAs in the regulatory network [108, 109]. It is likely that some lncRNAs function by binding miRNAs. Therefore, identifying well-established miRNAs that bind lncRNAs may help to infer the function of lncRNAs. Jeggari et al. developed an algorithm named miRcode that predicts putative microRNA binding sites in lncRNAs using criteria such as seed complementarity and evolutionary conservation [110]. Jalali et al. constructed a genome-wide network of validated RNA-mediated interactions, and uncovered previously unknown mediatory roles of lncRNAs between miRNAs and mRNAs (Saakshi Jalali, arXiv preprint). Besides the interaction with miRNAs, the interaction of lncRNAs with proteins can also be explored to predict their functions. Bellucci et al. developed a method called “catRAPID” that correlates lncRNAs with proteins by evaluating their interaction potential using physicochemical characteristics, including secondary structure, hydrogen bonding, van der Waals forces, and so forth [111]. However, unlike the coexpression-based approach, the above two approaches have been successful for only a small number of lncRNAs, partly because the mechanism by which lncRNAs interact with miRNAs and proteins remains unclear.
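Seed complementarity, one of the criteria mentioned above, can be illustrated with a small Python sketch. This is not the miRcode algorithm: it only reports perfect matches to the reverse complement of miRNA nucleotides 2 to 8 (a common definition of the seed, assumed here), and it ignores evolutionary conservation and other site types. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_site(mirna, seed_start=1, seed_end=8):
    """Target-side sequence (5'->3') that pairs perfectly with the miRNA seed.

    The default slice [1:8] selects miRNA nucleotides 2-8.
    """
    seed = mirna.upper().replace("T", "U")[seed_start:seed_end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_sites(lncrna, mirna):
    """0-based start positions of perfect seed matches in the lncRNA sequence."""
    target = lncrna.upper().replace("T", "U")
    site = seed_match_site(mirna)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping matches
    return positions
```

A fuller predictor would score each candidate site further, for instance by checking whether the site is conserved in aligned sequences from other species.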
6.4. Challenges
Computational prediction of lncRNA functions is still at an early stage. As the sequences and secondary structures of lncRNAs are generally not conserved, function prediction of lncRNAs mainly relies on their relationships with other molecules, such as protein-coding genes, miRNAs, and proteins. However, the molecular mechanism by which lncRNAs function through interaction with other molecules remains largely unknown, making it difficult to develop computational methods that precisely predict the functions of lncRNAs. In addition, there are currently only a small number of lncRNAs whose functions are well understood, which makes it difficult to validate and optimize computational algorithms for predicting lncRNA functions. Finally, unlike protein-coding genes, which have systematic functional annotation systems, lncRNAs lack an annotation system for their functions, making it difficult to evaluate computational algorithms for function prediction. Nevertheless, the success of predicting lncRNA functions using the coexpression-based approach is promising. With more functional genomics data about lncRNAs becoming available in the near future, more powerful and accurate methods will be developed to help decipher the functions of lncRNAs.
7. Perspectives
It has been widely accepted that lncRNAs play important functional roles in the cell, though the molecular mechanism by which lncRNAs function remains to be unraveled. In this paper, we have described several currently proposed models of the molecular mechanism of lncRNA functions. One commonality of these models is that lncRNAs function through interaction with other molecules, including DNA, RNA, and proteins. Given the abundance of lncRNAs in the genome, it is likely that the interactions between lncRNAs and other molecules are specific. This raises the possibility of developing novel methods to target certain lncRNAs for gene-specific regulation. However, phenotypic studies of lncRNAs suggest that knockdown of many lncRNAs does not result in obvious phenotypes, making it difficult to understand their functions. Computational prediction can provide hypotheses about the functions of lncRNAs and help to design experiments to test them under specific conditions. Yet, it remains a significant challenge to develop effective methods to accurately infer lncRNA functions, owing to the lack of detailed information about the molecular mechanisms of lncRNAs. In order to develop powerful computational methods, more studies on the derivation of lncRNAs, their molecular mechanisms, and their tissue-specific or development-specific expression are necessary.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Grant no. 31071113).