Computational Identification of Novel MicroRNAs and Their Targets in Vigna unguiculata

MicroRNAs (miRNAs) are a class of endogenous, noncoding, short RNAs directly involved in regulating gene expression at the posttranscriptional level. High conservation of miRNAs in plant provides the foundation for identification of new miRNAs in other plant species through homology alignment. Here, previous known plant miRNAs were BLASTed against the Expressed Sequence Tag (EST) and Genomic Survey Sequence (GSS) databases of Vigna unguiculata, and according to a series of filtering criteria, a total of 47 miRNAs belonging to 13 miRNA families were identified, and 30 potential target genes of them were subsequently predicted, most of which seemed to encode transcription factors or enzymes participating in regulation of development, growth, metabolism, and other physiological processes. Overall, our findings lay the foundation for further researches of miRNAs function in Vigna unguiculata.


Introduction
MicroRNAs (miRNAs) are a class of endogenous, small, noncoding, single-stranded RNAs that act as posttranscriptional regulators in eukaryotes [1]. They have been reported to be located mostly within noncoding regions of genomes, and usually transcribed from RNA polymerase II promoters [2,3]. The generation of mature miRNA is a complicated enzyme-catalyzed process, from the initial transcript pri-miRNA to the precursor (pre-miRNA) with a characteristic hairpin structure, then a miRNA duplex (miRNA : miRNA * ) [4]. In the end, it is assembled to the RNA-induced silencing complex (RISC) to direct its activity on a target mRNA, depending on the degree of base-pairing between the miRNA and the responsive element and results in either cleavage or translational repression of the target mRNA. Perfect complementarity generally results in cleavage, such as in plants, whereas imperfect base-pairing leads to translational repression [4,5].
Considering the importance of miRNAs in gene regulation, two major categories of approaches have been applied for miRNA investigation [1]. Compared to the experimental approaches, computation (bioinformatics) methods have been proved to be faster, more affordable, and more effective, contributing mostly to today's plentiful storage in miRBase [1]. Different computational miRNA finding strategies have been developed based on a core principle of looking for conserved sequences between different species that can fold into extended hairpins [18]. The biogenesis of miRNAs suggests that it is possible to find miRNAs by searching Expressed Sequence Tags (ESTs) with known miRNAs. There have been more and more reports about the identification of miRNAs by mining the repository of available ESTs [19][20][21][22][23][24][25][26]. EST analysis makes it possible to rapidly study miRNAs and their functions in species whose genome sequences have not been well known [27].
Vigna unguiculata is a vital leguminous crop in tropical and subtropical areas of Asia, Africa, and Latin America, as well as parts of southern Europe and the USA [28].
High protein and carbohydrate contents make it not only important to the human diet, but also suitable as high protein feed and fodder to livestock. With its greater tolerance to heat, drought, and low soil fertility, Vigna unguiculata is a particularly valuable component of low-input farming systems of resource-poor farmers. Also, it is able to enhance soil fertility through nitrogen fixation [28,29]. Until recently limited progress has been made in basic gene discovery and gene regulation in Vigna unguiculata, and only two new miRNAs of it have been reported [29,30].
Despite the limited genome resources of Vigna unguiculata, publishing of EST and GSS databases in GenBank has provided the chance to get more genetic information. In this study, new miRNAs were mined for the purpose of understanding their roles in regulating growth, development, metabolism, and other physiological processes in Vigna unguiculata. RNA secondary structure and the free energy were calculated by web server mfold (http://mfold.bioinfo.rpi.edu/) [31]. The software MiRNAassist was applied to improve the analysis efficiency [20].

Prediction of Vigna unguiculata MiRNAs.
The prediction procedure was shown in Figure 1. The sequences of known plant miRNAs were used as query sequences for BLAST searches against the EST and GSS databases, with the BLAST parameters Evalue being 1000 and word-match size between the query and database sequences being 7. Mature miRNA sequences should be no less than 16 nt, and the mismatches should be less than 4. Wherever available, precursor sequences of 400 nt were extracted (200 nt upstream and 200 nt downstream to the BLAST hits) and used for the hairpin structure prediction. If the length of a sequence was less than 400 nt, the entire available sequence was used as a miRNA precursor sequence. These precursor sequences were then BLASTXed online to remove the protein coding sequences (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The retained precursor sequences underwent hairpin structure prediction through web server mfold. Only those meeting the following criteria were designated as miRNA homologs: (1) the RNA sequence folding into an appropriate stem-loop hairpin secondary structure, (2) a mature miRNA sequence located in one arm of the hairpin structure, (3) predicted mature miRNAs with no more than 3 nt substitutions as compared with the known miRNAs, (4) miRNAs having less than 6 mismatches with the opposite miRNA * sequence in the other arm, (5) no loop or break in miRNA * sequences, and (6) predicted secondary structures with higher minimal folding free energy (MFE) and minimal folding free energy index (MFEI), the MEFI usually being over 0.85 [32]. Also, the AU content of pre-miRNA within 30% to 70% was considered [20].

Computational Prediction of MiRNA Targets.
MiRNA targets prediction was performed by aligning the predicted miRNA sequences with EST sequences of Vigna unguiculata through the BLAST program. The targets were screened according to these criteria: the number of mismatches should be less than 4, the minimal free energy of the pairing between miRNA and its target mRNA was lower than −28.2 kcal/mol [33]. After removal of the repeated sequences, the potential target genes were BLASTed against protein databases to predict their function (identity > 25%), since there were no functional annotations for mRNA sequences of Vigna unguiculata.

Phylogenetic Analysis of the New MiRNAs.
Considering the conservation of miRNAs and their precursors, the precursor sequences of the novel and the known miRNAs in the same family were aligned and phylogenetically analyzed by ClustalW online to investigate their evolutionary relationships (http://www.clustal.org/).

Prediction of Vigna unguiculata MiRNAs.
Sequence and structure homologies are the main theory behind the computer-based approach for miRNAs prediction. As described in Materials and Methods, after BLASTN searches, all blasted hits except coding sequences were maintained for secondary structure analysis, only those in line with the screening criteria were selected as candidates. In the end, 47 potential Vigna unguiculata miRNAs belonging to 13 miRNA families were identified, including two known miRNAs (vun-MIR1507a and vun-MIR1507b), and they were named according to Ambros [33]. Information on predicted Vigna unguiculata miRNAs, including names, lengths, sources, and other aspects, were listed in Table 1. The length of the 47 predicted miRNAs ranged from 19 nt to 24 nt, while the predicted precursor sequences ranged in length from 68 nt to 188 nt, all forming into typical stem-loop structures, with the mature miRNA either on the 5 end or the 3 end ( Figure 2). All the MFEIs of these hairpin structures were over 0.85, which was thought to be the gold standard to differentiate miRNAs from other ones [32].
Vigna unguiculata miRNA precursors were diverse in structure and size, even if they were from the same family, such as those from MIR 156, MIR 172, MIR 319, and MIR 399 families ( Figure 2). The distribution of miRNAs in each famiy was different too, some miRNA families have more than 10 members, such as miRNA156 family whereas only one was predicted in other families, such as in miRNA164, miRNA482 and miRNA2118 families ( Figure 3). That was consistent with the diversity of miRNAs in other plants [27].
With the availability of sequence resources, computerbased miRNA identification is becoming more and more important than experimental approaches due to its advantages of low cost and high efficiency [1]. At present, three kinds of databases, genome, GSS, and EST, are mainly used for plant miRNA prediction. Considering the unavailability of genome sequences of Vigna unguiculata, both EST and GSS databases were searched for miRNA identification. The number and sorts of miRNAs predicted in this work showed that this software-based approach was as feasible and effective as in other work [19][20][21][22][23][24][25][26].
According to Zhang [27], about 10000 ESTs contained one miRNA, so about 20 miRNAs should be predicted theoretically from the total of 241854 ESTs and GSSs in this work. Why as many as 47 were predicted? Generally, plant miRNA clusters have not been frequently observed, so only one plant miRNA can come from the same transcript. In this work, different length of mature miRNAs from the same precursor were regarded as different ones, considering they corresponded to different target genes.

Prediction of Vigna Unguiculata MiRNA Targets.
Based on the complementarity between miRNAs and their target genes in plants, the Vigna unguiculata EST database was searched for homology to the new miRNA sequences with a BLASTN algorithm for the discovery of miRNA targets. A total of 30 potential targets for 47 Vigna unguiculata miRNAs were identified, and the miRNA target site could be found in 5 UTR, 3 UTR or open reading frame. The miRNAs and their putative targets with known functions were listed in Table 2. These potential miRNA targets belonged to a number of gene families that had different biological functions.
Many of these targets were transcription factors that controlled plant development and phase change from vegetative growth to reproductive growth. Several studies indicated that MIR156/157 targeted squamosa promoter binding protein (SBP)-like (SPL) genes, a plant-specific family of transcription factors involved in early flower development and vegetative phase changes [34,35]. Auxin response factors (ARF), a plant-specific family of DNA binding proteins, were involved in hormone signal transduction [36]. AP(2)APETALA2 transcription factors were involved in several developmental processes in Arabidopsis thaliana embryo, endosperm, and seed coat development [37]. TCP family transcription factors have been reported to play roles in various aspects of plant development [38,39]. Heat shock transcription factor was reported to participate in heat shock response to environmental stress [40].
Identification of genes targeted by miRNAs is widely believed to be an important step toward understanding the role of miRNAs in gene regulatory networks. EST database   Comparative and Functional Genomics   searches play a vital role for the discovery of miRNA targets in plants based on the homology between miRNA and its target sequences [45]. The protein databases applied to predict the functions of the target genes in this work included nonredundant protein sequences, reference protein sequences and so on. Though this method may be errorprone, it provides a fast way to know something about the gene function.

(j)
Our prediction of target genes for the 47 miRNAs also revealed that not just one gene might be regulated by individual miRNA (Table 2). This result was consistent with recent findings in other plant species. Wu et al. also reported that multiple miRNAs could target the same gene. All these suggested that miRNA research should focus on networks more than on individual connections between miRNA and strongly predicted targets [46].   3.3. Phylogenetic Analysis of the New MiRNAs. Plant miRNAs are highly conserved among distantly related plant species, both in terms of primary and mature miRNAs [27]. Comparison of the precursor sequences of the predicted miRNAs with other members in the same family showed that most members could be found to have a high degree of sequence similarity with others, for example, the precursor sequence similarity between vun-MIR156 and other MIR156 members was over 60%, that between vun-MIR157 and ath-MIR157a, gra-MIR157a, bol-MIR157a, or sly-MIR157b was over 80% (data not shown). The conservation of mature miRNAs and premiRNAs provides the chance to investigate their evolutionary relationships. Based on the premiRNA sequence comparisons, the evolutionary relationships of Vigna unguiculata miRNAs with other members from the same families were analyzed using the ClustalW online. It could be seen from the phylogenetic trees that in different families, the evolutionary relationships of Vigna unguiculata miRNAs with other species were different; for example, in MIR160 family, vun-MIR160 and mtr-MIR160 were on the same branch, while in MIR169 family, vun-MIR169 had close relationship with wi-MIR169, and in MIR482 family, vun-MIR482 was closely related to pvu-MIR482 and gma-MIR482 (Figures 4(a)-4(c)). Also, it could be seen that different Vigna unguiculata miRNA members in the same family were often distantly

Conclusions
In this paper, with a bioinformatic method, 47 miRNAs were identified from the EST and GSS databases of Vigna unguiculata; also about 30 potential targets of them were predicted, which appeared to be related to the development, growth, metabolism, and other physiological processes such as stress response of Vigna unguiculata. The findings from this study will contribute to further researches of miRNAs function and regulatory mechanisms in Vigna unguiculata.