Evolutionary and Expression Analysis of miR-#-5p and miR-#-3p at the miRNAs/isomiRs Levels

We mainly discussed miR-#-5p and miR-#-3p under three aspects: (1) primary evolutionary analysis of human miRNAs; (2) evolutionary analysis of miRNAs from different arms across 10 typical vertebrates; (3) expression pattern analysis at the miRNA/isomiR levels using public small RNA sequencing datasets. We found no bias between the numbers of 5p-miRNAs and 3p-miRNAs, while miR-#-5p and miR-#-3p show variable nucleotide compositions. IsomiR expression profiles from the two arms are always stable, but isomiR expression in diseased samples tends to show a larger degree of dispersion. miR-#-5p and miR-#-3p have relatively independent evolution/expression patterns and sets of target mRNAs, which might also contribute to the phenomena of arm selection and/or arm switching. Simultaneously, miRNA/isomiR expression profiles may be regulated via arm selection and/or arm switching, and the dynamic miRNAome and isomiRome will adapt to functional and/or evolutionary pressures. Comprehensive analysis and further experimental study at the miRNA/isomiR levels are necessary for miRNA research.


Introduction
MicroRNAs (miRNAs) have been widely studied as a class of well-conserved negative regulatory molecules. They play an important role in biological processes by regulating gene expression at the posttranscriptional level [1,2]. As endogenous small noncoding RNAs (ncRNAs) (∼22 nt), miRNAs are generated by Drosha and Dicer cleavage of primary miRNAs (pri-miRNAs) and precursor miRNAs (pre-miRNAs) [3][4][5]. A miRNA may be generated from the 5p or 3p arm of the pre-miRNA, and the selection is believed to be influenced by hydrogen-bonding selection [6]. In the typical model of miRNA biogenesis, one arm produces abundant, active mature miRNA, while the other arm produces rare and inactive miRNA* (miRNA star, also called the passenger strand). However, increasing evidence indicates that both arms can generate mature miRNAs at specific developmental stages or in specific species [7][8][9][10][11][12][13]. Indeed, many pre-miRNAs have been reported to yield two kinds of mature miRNAs, although the two products, miR-#-5p and miR-#-3p, may vary in expression levels. This dynamic selection and expression is termed "arm switching" [8,14]. Evolutionary analysis demonstrates that both miR-#-5p and miR-#-3p are conserved, although the nondominant miRNA sequences are less well-conserved than the dominant miRNA sequences [15]. Increasing reports indicate that the nondominantly expressed miRNA sequences may act as potential regulatory molecules with unexpectedly abundant expression levels [16][17][18].
Although the typical miRNA is annotated and studied as a single sequence, accumulating evidence suggests that multiple sequences with varied 5′ and/or 3′ ends or varied lengths can be detected from a single miRNA locus. The annotated or canonical miRNA is only one specific member of these multiple sequences, which are termed miRNA variants or isomiRs [19][20][21][22][23]. The miRNA isoforms are mainly derived from imprecise cleavage by Drosha/Dicer and from 3′ addition events during miRNA processing and maturation. RNA editing and single nucleotide polymorphisms (SNPs) also contribute to the generation of these multiple isomiRs [22]. The occurrence of multiple isomiRs is quite common, and each miRNA locus can be associated with these various miRNA isoforms [9,19,21][23][24][25][26][27][28][29][30]. Although both miR-#-5p and miR-#-3p are generated from the same pre-miRNA and can form a miRNA:miRNA duplex through complementary base pairing, the two miRNA loci may yield different isomiR expression profiles and patterns [31].
This study aimed to explore the potential evolutionary and expression divergences and relationships between miRNAs from different arms of different/same pre-miRNAs. First, we characterized the origins and nucleotide compositions of all the annotated human miRNAs. Second, we performed evolutionary analysis on the miRNAs common to 10 typical vertebrates and then analyzed the nondominant miRNAs based on the pre-miRNAs. Finally, expression analysis was performed in samples from female patients using published small RNA sequencing datasets. Because gender difference can affect isomiR expression profiles [32], and common variation affects various diseases and medically relevant characteristics in a sex-dependent manner [33], we selected female patients to avoid potential effects of gender difference on miRNA/isomiR expression profiles. miRNA expression patterns were mainly estimated at the miRNA/isomiR levels, especially between homologous miRNAs and between miR-#-5p and miR-#-3p. This study provides insights into arm selection and/or arm switching in miRNAs from the evolutionary and expression angles, which should be informative for understanding the dynamic miRNAome and isomiRome and for characterizing miRNA and isomiR expression profiles. Study at the isomiR level may be necessary for understanding miRNAs, especially isomiRs from what was formerly termed the passenger strand, and will contribute to further exploration of miRNA biogenesis and function.
Location information of miRNA on pre-miRNAs was obtained according to the annotations in the miRBase database. Specifically, miRNA generated from 5p arm of pre-miRNA was named miR-#-5p (# indicated the detailed miRNA name, such as miR-100), and miRNA generated from 3p arm of pre-miRNA was named miR-#-3p. If there is no existing annotation, the detailed location distributions were determined using self-developed scripts. Many miRNAs may be generated from multicopy pre-miRNAs, and herein we only presented the detailed isomiR expression profiles based on location of the first pre-miRNA. In the study, miR-#-5p and miR-#-3p were defined as miRNA pairs generated from the 5p and 3p arm of pre-miRNA, respectively, and 5p-miRNA and 3p-miRNA were defined as the miRNAs generated from 5p or 3p arm of different pre-miRNAs.
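The arm assignment described above can be sketched as follows. This is only a minimal illustration, not the self-developed scripts used in the study: the function name `assign_arm` and the midpoint rule (a mature sequence whose center lies in the first half of the hairpin is called 5p) are assumptions made for the example.

```python
def assign_arm(pre_mirna: str, mature: str) -> str:
    """Return '5p' or '3p' for a mature miRNA located on a pre-miRNA.

    Assumption (not from the paper): the arm is decided by whether the
    center of the mature sequence falls in the first or the second half
    of the precursor sequence.
    """
    # Normalize to upper-case RNA letters so DNA-style input also works.
    pre = pre_mirna.upper().replace("T", "U")
    mat = mature.upper().replace("T", "U")

    start = pre.find(mat)  # 0-based offset of the first exact match
    if start < 0:
        raise ValueError("mature sequence not found in precursor")

    center = start + len(mat) / 2
    return "5p" if center < len(pre) / 2 else "3p"
```

For a multicopy miRNA, the same function would be applied to the first pre-miRNA only, in line with the choice stated above.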

Evolutionary Analysis of miRNAs in Ten Test Vertebrates.
Known annotated miRNAs from ten vertebrates were comprehensively surveyed for common miRNA members using self-developed scripts. These miRNAs were further classified by miRNA gene family, because many miRNAs belong to the same gene family based on homologous sequences with high sequence similarity. For pre-miRNAs that were not comprehensively annotated (miR-#-5p and miR-#-3p were not both annotated because of limited studies), the unannotated miRNA sequences were predicted from consensus sequences using pre-miRNAs and known human miRNAs. The reasons for this approach were as follows: (1) human miRNAs have been widely studied, and most miR-#-5p and miR-#-3p are reported and annotated; (2) most miRNAs are phylogenetically well-conserved across different animal species, and well-conserved consensus sequences are easily obtained using sequence alignment analysis; (3) although the miR-#-5p and miR-#-3p show different levels of evolutionary divergence, both of them are conserved; (4) according to the known miRNA sequences and pre-miRNAs, the detailed miR-#-5p and miR-#-3p sequences can be collected. The shared miRNAs were aligned using Clustal X 2.0 multiple sequence alignment [35]. Nucleotide divergence was analyzed using MEGA 5.10 software [36] and DnaSP 5.10.01 software [37]. Simultaneously, nucleotide diversity (π), haplotype diversity (Hd), and average number of nucleotide differences (k) were calculated using DnaSP software, treating the miRNAs from different animal species as special miRNA populations [38]. Evolutionary patterns were estimated from nucleotide divergence across the ten animal species using the percentage of nucleotide substitutions (transitions and transversions) and insertions/deletions at each position. The human miRNA was used as the reference sequence.
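To make the three diversity statistics concrete, the sketch below computes them from a small set of aligned sequences using their textbook definitions: the average number of pairwise nucleotide differences (commonly written k), nucleotide diversity (commonly written π, taken here as k divided by the number of compared sites), and haplotype diversity (Hd). It is not a reimplementation of DnaSP. The function name `diversity_stats` and the decision to drop every alignment column containing a gap are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(aligned_seqs):
    """Return (pi, hd, k) for a list of equal-length aligned sequences.

    Assumption: any column containing a gap ('-') in at least one
    sequence is excluded before the statistics are computed.
    """
    n = len(aligned_seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only gap-free columns.
    cols = [i for i in range(length) if all(s[i] != "-" for s in aligned_seqs)]
    if not cols:
        raise ValueError("no gap-free sites to compare")
    trimmed = ["".join(s[i] for i in cols) for s in aligned_seqs]

    # k: mean number of differences over all sequence pairs.
    pairs = list(combinations(trimmed, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # pi: k scaled by the number of compared sites.
    pi = k / len(cols)

    # Hd: n/(n-1) * (1 - sum of squared haplotype frequencies).
    freqs = [count / n for count in Counter(trimmed).values()]
    hd = n / (n - 1) * (1 - sum(f * f for f in freqs))
    return pi, hd, k
```

A fully conserved miRNA (one haplotype across all species) gives π = 0, Hd = 0, and k = 0 under these definitions.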
Because miRNA lengths may differ between species, we analyzed only the core sequences and excluded terminal nucleotides that were missing in some species (these missing nucleotides mostly resulted from length differences). Nucleotide divergence patterns were further estimated between 5p-miRNA and 3p-miRNA and between miR-#-5p and miR-#-3p.
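The per-position divergence measure can be illustrated with a short sketch. It assumes the sequences are already aligned and trimmed to the core region, with the human miRNA as reference; the function names and the returned data layout are hypothetical and chosen only for this example.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify_change(ref_nt: str, other_nt: str) -> str:
    """Classify one aligned position relative to the reference nucleotide."""
    if ref_nt == other_nt:
        return "match"
    if ref_nt == "-" or other_nt == "-":
        return "indel"
    pair = {ref_nt, other_nt}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def divergence_by_position(reference: str, others: list) -> list:
    """Percentage of species showing each change type at every position."""
    if not others:
        raise ValueError("need at least one non-reference sequence")
    ref = reference.upper().replace("T", "U")
    seqs = [s.upper().replace("T", "U") for s in others]
    if any(len(s) != len(ref) for s in seqs):
        raise ValueError("all sequences must match the reference length")

    result = []
    for i, ref_nt in enumerate(ref):
        counts = Counter(classify_change(ref_nt, s[i]) for s in seqs)
        result.append({
            kind: 100 * counts[kind] / len(seqs)
            for kind in ("transition", "transversion", "indel")
        })
    return result
```

Running the function separately on the 5p and 3p alignments of one pre-miRNA gives two profiles that can be compared position by position.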

Analysis of the miRNA/isomiR Expression Levels Using Public Sequencing Datasets.
In order to understand the expression patterns of miR-#-5p and miR-#-3p pairs, we analyzed them at the miRNA/isomiR levels using small RNA sequencing datasets generated by The Cancer Genome Atlas (TCGA) pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions constituting the TCGA research network can be found at http://cancergenome.nih.gov/. Available small RNA sequencing datasets associated with three kinds of women's diseases, breast cancer (BRCA), ovarian serous cystadenocarcinoma (OV), and uterine corpus endometrial carcinoma (UCEC), and their respective control samples were selected to investigate miRNA expression patterns at the miRNA/isomiR levels (see Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/168358). Using the datasets for these three diseases, we also analyzed the expression of some miRNAs (especially homologous miRNAs) identified in our evolutionary analysis. All of these high-throughput sequencing datasets were generated on the Illumina HiSeq sequencing platform.
Reads per million (RPM) were used to estimate the relative expression levels, and the relative expression rate (percentage) within the miRNA locus was used to assess the isomiR expression patterns across different samples. In order to track relative expression levels of miRNAs/isomiRs and reduce the influence of potential sequencing and mapping errors, only abundant miRNAs/isomiRs were selected for analysis, and larger sample sizes were used. The abundant expression and larger sample sizes could reduce error. Further, functional analysis was performed between miR-#-5p and miR-#-3p and between canonical miRNA sequences and their 5′ isomiRs (with novel 5′ ends and seed sequences). According to the seed sequences, target mRNAs were predicted and obtained from the TargetScan program (http://www.targetscan.org/).
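The two expression measures can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the choice of denominator for RPM (here the total number of mapped reads in the sample) is an assumption, since the text does not state which read total was used.

```python
def reads_per_million(count: int, total_mapped_reads: int) -> float:
    """Scale a raw read count to reads per million (RPM).

    Assumption: the denominator is the total mapped reads of the sample.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count * 1_000_000 / total_mapped_reads


def isomir_percentages(isomir_counts: dict) -> dict:
    """Relative expression rate (%) of each isomiR within one miRNA locus."""
    total = sum(isomir_counts.values())
    if total == 0:
        raise ValueError("locus has no reads")
    return {seq: 100 * n / total for seq, n in isomir_counts.items()}
```

Because the percentages are normalized within a locus, they can be compared across samples with very different sequencing depths, whereas RPM is the quantity to use when comparing overall abundance between loci or arms.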

Evolutionary Patterns of miR-#-5p/miR-#-3p and 5p-miRNA/3p-miRNA across Species.
There were 31 miRNA gene families (containing 43 miRNA members) shared by the 10 test animal species (Table S2). A family may be composed of two or more members with high sequence similarity, but these members were not always shared by all 10 species. A common miRNA might have different numbers of pre-miRNAs (also termed multicopy pre-miRNAs) in different species and might even have different numbers of homologous miRNAs (Figure S1).
Phylogenetic trees and networks were reconstructed using pre-miRNAs and miRNAs from Figure S1, respectively (Figure 3). The phylogenetic tree of let-7a was split into three clusters, and each cluster contained pre-miRNAs from different animal species (Figure 3(a)). Compared to the tree of the single miRNA gene of let-7a, the phylogenetic tree of homologous mir-30b, mir-30c, and mir-30d could be split (Figure 3(b)). mir-30d showed larger genetic distance with mir-30b and mir-30c. The pma-mir-30b and pma-mir-30c were clustered with mir-30d, which indicates that these should be members of pma-mir-30d (Figure S1 and Figure 3). The evolutionary networks of miR-#-5p and miR-#-3p showed various patterns (Figures 3(c) and 3(d)). Different types of sequences (termed miRNA haplotypes) were classified with different frequencies. For example, let-7a-5p was highly conserved across the ten animal species, and only one specific sequence was identified. However, let-7a-3p was associated with high nucleotide variation and showed a complex evolutionary network (Figure S1A and Figure 3(c)). Compared to let-7, both evolutionary networks of miR-30-5p and miR-30-3p showed clear module networks based on miRNA members (Figure 3(d)).

Expression Analysis of miR-#-5p/miR-#-3p at the miRNA/isomiR Levels.
We analyzed available miRNA datasets of 2,144 patients or volunteers with women's diseases (BRCA, OV, or UCEC) and their relevant controls (Table S1). Following evolutionary analysis, several miRNAs were selected to perform expression analysis using these sequencing datasets. Generally, in the miRNA locus, only several isomiRs were dominantly expressed (Figure 4 and Tables S6, S7, and S8). Homologous miRNAs were likely to show similar isomiR expression patterns, such as miR-30a and miR-30e (Figure 4). Dominant miRNAs and their multiple isomiRs were present at abundant expression levels, while most of the nondominant strands were not abundant. Abundantly expressed isomiRs were always near the most dominant isomiR sequence. Specifically, their 5′ or 3′ ends either were the same or differed by 1-2 nucleotides (Figure 4 and Tables S6, S7, and S8). The standard deviation (SD) of the average percentage of each isomiR showed diverse distributions (Figure 5 and Figures S2, S3, and S4). Different miRNAs showed different types of isomiRs with diverse expression distribution and SD (Figures 4 and 5 and Figures S2, S3, and S4) be detected larger SD (Figure 4 and Figures S2 and S3), and similar SD distributions could be found between diseased and normal samples (Figure 5 and Figure S4). Generally, at the isomiR level, the average percentages in samples from patients with disease showed larger divergence than those in control samples, and similar results can be detected based on all miRNAs (Figure 5 and Figure S4).
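The average percentage and SD reported for each isomiR can be sketched as a per-group summary. This is a minimal illustration rather than the analysis code behind Figure 5: the function name and input layout are hypothetical, missing isomiRs are assumed to contribute 0%, and the sample standard deviation is used because the text does not say whether the sample or population form was applied.

```python
from statistics import mean, stdev


def summarize_isomirs(percent_by_sample: list) -> dict:
    """Map each isomiR to (mean percentage, sample SD) across samples.

    percent_by_sample holds one dict per sample, mapping an isomiR
    identifier to its percentage within the miRNA locus.
    """
    if len(percent_by_sample) < 2:
        raise ValueError("need at least two samples to compute an SD")

    # Collect every isomiR seen in any sample of the group.
    isomirs = sorted(set().union(*percent_by_sample))

    summary = {}
    for iso in isomirs:
        # Assumption: an isomiR absent from a sample counts as 0%.
        values = [sample.get(iso, 0.0) for sample in percent_by_sample]
        summary[iso] = (mean(values), stdev(values))
    return summary
```

Calling the function once on the diseased samples and once on the controls gives two summaries whose SD values can be compared isomiR by isomiR.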

Functional Analysis of miR-#-5p/miR-#-3p at the miRNA/isomiR Levels.
Although miR-#-5p and miR-#-3p had different sequences and seed sequences, some common targets could be detected (Figure S5A). These miRNA pairs could bind different regions in the UTRs (untranslated regions) of target mRNAs, although the phenomenon was rare (larger numbers of arm-specific targets were detected). Common targets were more frequent between the canonical miRNA sequences and their 5′ isomiRs, despite the fact that "seed shifting" could be detected between them (Figures S5B and S5C). About half of the target mRNAs of the 5′ isomiRs were shared with the canonical miRNA sequences, although these 5′ isomiRs had novel seed sequences as a result of "seed shifting" events.
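Seed shifting and target sharing can be illustrated with a small sketch. The seed is taken here as nucleotides 2-8 of the miRNA, a common convention that is assumed rather than quoted from the text, and the helper names and gene identifiers in the usage example are hypothetical.

```python
def seed(mirna: str) -> str:
    """Return the seed, assumed here to be nucleotides 2-8 (1-based)."""
    seq = mirna.upper().replace("T", "U")
    if len(seq) < 8:
        raise ValueError("sequence too short to contain a seed")
    return seq[1:8]


def shared_fraction(targets_a: set, targets_b: set) -> float:
    """Fraction of targets_a that is also present in targets_b."""
    if not targets_a:
        raise ValueError("targets_a is empty")
    return len(targets_a & targets_b) / len(targets_a)
```

A usage example with placeholder data:

```python
canonical = "UGAGGUAGUAGGUUGUAUAGUU"
isomir = canonical[1:] + "U"   # 5' end shifted by one nucleotide (toy example)
print(seed(canonical), seed(isomir))

canonical_targets = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # placeholders
isomir_targets = {"GENE_B", "GENE_C", "GENE_E"}               # placeholders
print(shared_fraction(isomir_targets, canonical_targets))
```

A one-nucleotide change at the 5′ end shifts the seed by one position, so the two seeds overlap in six of seven nucleotides, which is consistent with a substantial but incomplete overlap between predicted target sets.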

Evolutionary Divergence between miRNAs from Different Arms.
miRNAs have been widely regarded as a class of crucial negative regulatory molecules with important biological roles, especially for their roles in tumorigenesis. Based on the current annotated human miRNAs, similar numbers of 5p-miR and 3p-miR show well-conserved sequences across different species, although they are involved in inconsistent length distributions and nucleotide compositions, including multiple repetitive nucleotides (Figures 1(a)-1(c), Figure 2, and Table 1). This difference may be influenced by larger sample sizes. Simultaneously, mirtrons have been reported as alternative precursors for miRNA biogenesis in vertebrates [43], which may lead to the difference of nucleotide compositions because of nucleotide biases in mirtrons. There are 849 pairs that are identified as miR-#-5p and miR-#-3p, and significant difference in length distributions and nucleotide compositions is detected between the two arms (Figures 1(b)-1(d), Table 1, and Table S3). Evolutionary analysis shows that both dominant and nondominant miRNAs are conserved, although the nondominant miRNA is associated with more nucleotide variation across homologous miRNAs and different species [15]. Phylogenetic relationship shows that these multicopy pre-miRNAs are located in different clusters (Figure 3), which suggests the similar distributions of miRNA genes across different species. The well-conserved sequence contributes to stable miRNA-mRNA regulatory network, and simultaneously, the evolutionary process is also controlled by functional pressures. The two arms of pre-miRNA showed various evolutionary patterns via different levels of nucleotide substitutions and insertions/deletions (Figure S1, Figure 2, and Table S3), which may influence stem-loop structure of pre-miRNA (Table S5). However, both of the two arms are always well-conserved in the functional region, termed the "seed sequences" (Figure 3(a) and Table S3).
These results suggest that both products from the two arms are regulatory molecules, although they always have various expression levels. Homologous and clustered miRNAs are commonly found in miRNAs [44]. No significant relationships between these homologous miRNAs can be detected (Figure 2(b) and Table S4). These findings indicate relatively rapid evolutionary patterns between homologous miRNAs, especially between the less well-conserved nondominant strands (Figure 2(b)). Despite the possibility that these miRNAs have evolved from the common ancient miRNA gene, varied nucleotides in miRNAs, especially in the "seed sequences," will generate novel miRNAs with novel candidate target mRNAs. Simultaneously, coevolution of miRNA and target mRNAs also contributes to the varied miRNAs across different species [45]. Taken together, homologous miRNAs may provide a method to generate novel miRNA genes via duplication events, and multicopy pre-miRNAs are probably transitional products. The driving force should be mainly derived from functional and evolutionary pressures, which largely contributes to the dynamic miRNAome, and enriches the potential relationships between different miRNAs.

Expression and Function between miRNAs from Different Arms.
Similar to our previous studies [21,46,47], we found that only several isomiRs (always 1-3) are dominantly expressed, and the others have lower expression rates (Figure 4 and Tables S6, S7, and S8). These distributions are consistent across individuals, including samples from patients with disease and healthy controls. The similar distributions suggest that isomiR expression patterns are always stable across different samples [21,26]. The characteristics of these dominant isomiRs point to imprecise cleavage by Drosha and Dicer during pre-miRNA processing and miRNA maturation. Indeed, due to the small size of the miRNA sequence (∼22 nt), degradation of hairpins may also be one factor that contributes to rare isomiRs [48]. Although the distribution of isomiR expression is similar across different samples, no significant correlations can be found between the isomiR expression profiles of miR-#-5p and miR-#-3p (Figure 4). Simultaneously, various standard deviation values can be found (Figure 5 and Figures S2, S3, and S4). Compared to control samples, samples from patients with disease may show larger expression divergence across different samples (Figure 5). This suggests that isomiR expression is more flexible across samples from patients with disease than across control samples. Functional analysis showed that some common target mRNAs between miR-#-5p and miR-#-3p can be detected, although they have different sequences and most target mRNAs are specific (Figure S5A). Simultaneously, more shared target mRNAs are obtained between the canonical miRNA and its 5′ isomiRs despite "seed shifting" events (Figures S5B and S5C). These results imply that multiple isomiRs may coordinately contribute to specific biological processes by binding different regions in the UTR.
Moreover, 3′ addition events (isomiRs with additional nontemplate nucleotides in 3′ ends) are quite common in isomiRome, while no further analysis is performed in the present study based on the previous TCGA datasets. The phenomenon of 3′ additions may have versatile biological roles, including affecting target selection or miRNA stability [22,24,26,49]. Collectively, analyzing multiple isomiRs and their expression patterns is the first step towards a systematic understanding of the miRNA world, including the genesis and regulatory roles of miRNAs.
miRNAs are likely to be members of miRNA gene families/clusters sharing high sequence similarity or close genomic location. These homologous/clustered miRNAs may have evolved from ancestor genes via partial or tandem historic duplication events [15][50][51][52]. A previous study reported that homologous miRNAs are likely to show similar isomiR expression patterns [47], and our results are consistent with this observation (Figure 4 and Table S7). The similarity in the expression patterns implies that the pre-miRNA processing and miRNA maturation processes are inherited from the ancestral gene, which may contribute to the potential interactions in the regulatory network [47]. Moreover, we found that deregulated miRNAs are likely to have different types of isomiRs (miR-30a, miR-30e, and miR-10b, Figure 4 and Tables S6, S7, and S8). These deregulated miRNAs have been reported in breast cancer [53,54], and moderate expression patterns can be detected. There is not enough evidence to indicate that a miRNA with moderate isomiR expression is likely to be abnormally expressed and to contribute to abnormal biological roles. More studies, especially experimental validation, are needed to further study the small noncoding RNAs at the isomiR level.

Selection of 5p and 3p or Switching between the Two Arms in miRNAome/isomiRome.
The phenomenon of arm selection shows that miRNAs may be derived from different arms, and the arm switching phenomenon suggests that the two arms may also show dynamic expression patterns. miRNAs from the two arms (which can form a miRNA:miRNA duplex) always show different evolutionary patterns and also have various expression levels and isomiR expression patterns. Most pre-miRNAs produce only one dominant and one rare miRNA in specific samples, although the expression rate of the two miRNAs may change in other samples (arm switching phenomenon). Indeed, the two arms of many pre-miRNAs are conserved (especially in the "seed sequences"), which makes it possible for both to act as regulatory molecules, and the arm switching phenomenon further enriches the dynamic miRNAome by controlling miRNA expression profiles to adapt to functional and/or evolutionary needs. Expression and evolution patterns in miR-#-5p and miR-#-3p are relatively independent, and they are prone to regulate different targets. Based on the phenomena of arm selection or arm switching, the dynamic miRNAome also represents the multiple and dynamic isomiRome at the isomiR level. These isomiRs provide more information towards further understanding of miRNAs, in that isomiR expression patterns may indicate the characteristics of pre-miRNA processing and miRNA maturation processes. Thus it is worth exploring the biological roles of miRNAs at the isomiR level and the origin of miRNAs (5p or 3p) and related miRNAs based on miRNA gene family/cluster. Taken together, arm selection and/or arm switching may be an important mechanism for regulating the miRNAome and isomiRome, and the dynamic miRNA and isomiR expression profiles will adapt to functional and/or evolutionary pressures.