MicroRNAs are a group of noncoding RNAs about 20–24 nucleotides in length. They are involved in the physiological processes of many diseases and regulate gene expression at the transcriptional and post-transcriptional levels. Therefore, the prediction of microRNAs is of great significance for basic biological research and disease treatment. MicroRNA precursors are a necessary stage of microRNA formation. RBF kernel support vector machines (RBF-SVMs) and shallow multiple kernel support vector machines (MK-SVMs) are often used in microRNA precursor prediction. However, RBF-SVMs cannot represent richer sample features, and MK-SVMs use only a simple convex combination of a few base kernels. This paper proposes a localized multiple kernel learning model with a nonlinear synthetic kernel (LMKL-D). The nonlinear synthetic kernel is trained by a three-layer deep multiple kernel learning model. The LMKL-D model was tested on 2,241 pre-microRNAs and 8,494 pseudo hairpin sequences. The experiments showed that the LMKL-D model achieved 93.06% sensitivity, 99.27% specificity, and 98.03% accuracy on the test set. The results show that the LMKL-D model can increase the complexity of the kernels and thereby better predict microRNA precursors. Compared with existing methods, our LMKL-D model predicts microRNA precursors with higher specificity and accuracy, and it provides a reference for further validation of potential microRNA precursors.

MicroRNAs are a class of highly conserved endogenous noncoding RNAs about 20–24 nucleotides in length. They are single-stranded and regulate gene expression at the post-transcriptional or translational level by binding specifically to target messenger RNAs [

MicroRNA precursors (pre-microRNAs) can fold into hairpin structures, which are considered the most important indicators of microRNA maturation [

Structure of the pre-microRNA.

The methods of finding new microRNAs mainly include biological experimental methods and computer prediction methods [

MicroRNA precursors have a unique hairpin structure and are easier to obtain than mature microRNAs. Thus, computational prediction methods mainly use machine learning to identify microRNA precursors among candidate hairpin sequences. The authors in [

Multiple kernel methods have been successful on small datasets. By mapping the samples into a high-dimensional reproducing kernel Hilbert space, they need only very few parameters to enable a classifier to learn a complex decision boundary. Determining the basic kernel functions is the key difficulty of multiple kernel learning. The localized multiple kernel learning [

The rest of the paper is organized as follows. In Section

The LMKL-D model proposed in this paper should be able to correctly distinguish pre-microRNAs from pseudo hairpin sequences in the candidate hairpin sequence dataset. Thus, the candidate hairpin sequence dataset has two parts. One is the positive set of real pre-microRNA sequences. We obtained a total of 4,028 annotated known pre-microRNA sequences spanning 45 species from miRBase 12 [

To select a better model, we randomly selected seventy percent of the candidate hairpin sequences as the training set and the remaining thirty percent as the test set. Specifically, we randomly selected 1,500 pre-microRNAs and 6,000 pseudo hairpin sequences as the training set. For the test set, 700 of the remaining positive real pre-microRNAs and 2,400 of the remaining negative pseudo hairpin sequences were randomly selected. Both the training set and the test set were normalized.
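As an illustrative sketch (not the authors' actual pipeline; the z-score normalization is one common choice), a 70/30 random split with normalization statistics fitted on the training portion could look like this:

```python
import numpy as np

def split_and_normalize(X, y, train_frac=0.7, seed=0):
    """Randomly split samples and z-score normalize using training-set statistics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    mu = X[tr].mean(axis=0)
    sigma = X[tr].std(axis=0) + 1e-12  # guard against zero variance
    return (X[tr] - mu) / sigma, y[tr], (X[te] - mu) / sigma, y[te]
```

Note that the test set is scaled with the training set's mean and standard deviation, so no information leaks from the test set into the model.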

There are many methods to select pre-microRNA features. Traditionally, sequence, secondary structure, and thermodynamic properties are considered. In this paper, we use the dinucleotide frequencies proposed in [

The kernels are the inner products of the mapping relationship. A kernel can be described by the dot product of its two basic mapping functions as follows [
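For instance (an illustrative sketch, not the specific kernels of this paper), the degree-2 homogeneous polynomial kernel in two dimensions can be written explicitly as the inner product of its feature maps:

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel in 2-D."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
k_implicit = np.dot(x, z) ** 2         # kernel evaluated directly
k_explicit = np.dot(phi(x), phi(z))    # inner product of the mapped features
# the two values agree, illustrating k(x, z) = <phi(x), phi(z)>
```

In practice the kernel is evaluated directly, so the (possibly very high dimensional) mapping never has to be computed explicitly.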

The mapping functions

Kernels are usually associated with SVMs. The basic principle of a single kernel SVM is, for a given dataset
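The resulting decision function has the familiar kernel-expansion form; the sketch below (with hypothetical, already-learned coefficients, not a trained model) is only meant to show the shape of f(x):

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def decision(x, svs, alphas, labels, b, gamma=0.5):
    """SVM decision value f(x) = sum_i alpha_i y_i k(x_i, x) + b."""
    return sum(a * y * rbf(sv, x, gamma)
               for a, y, sv in zip(alphas, labels, svs)) + b
```

A sample is assigned to the positive class when the decision value is positive and to the negative class otherwise.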

The multiple kernel learning model is a more flexible kind of kernel-based learning model. Recent theory and applications have shown that using multiple kernels instead of a single kernel can enhance the interpretability of the decision function and achieve better performance than a single-kernel model [

In the multiple kernel learning model,
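In its simplest linear form, the combined Gram matrix is a convex combination of base Gram matrices. The sketch below uses hand-chosen weights purely for illustration (in MKL the weights would be learned):

```python
import numpy as np

def linear_K(X, Z):
    """Linear kernel Gram matrix."""
    return X @ Z.T

def rbf_K(X, Z, gamma=0.5):
    """Gaussian kernel Gram matrix."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

def combined_K(X, Z, weights=(0.5, 0.5)):
    """k(x, z) = sum_m w_m k_m(x, z) with w_m >= 0 and sum_m w_m = 1."""
    w = np.asarray(weights, float)
    assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-9
    return w[0] * linear_K(X, Z) + w[1] * rbf_K(X, Z)
```

Because each base kernel is positive semidefinite and the weights are nonnegative, the combination is itself a valid kernel.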

The traditional multiple kernel learning method is just a simple linear combination of a set of basic kernels and cannot represent the deep features of the samples. Thus, we adopt a three-layer multiple kernel learning model to represent the deep features of the samples [

A deep multiple kernel model with

Although the increased complexity of the kernels can raise the risk of overfitting, Strobl et al. [

The leave-one-out error has shown better accuracy in multiple kernel learning [

We use a contracting function

Span is optimized using the gradient descent method. Now, we get the deep multiple kernel learning algorithm with the derivative of

In the deep synthetic kernel proposed in this paper, the number of layers was set to three, and each layer contained three kernel functions. The kernel functions of the first layer were the linear kernel, the polynomial kernel, and the Gaussian kernel.
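One simple way to sketch such a layered synthetic kernel (an assumption-laden illustration, not the paper's exact formulation or learned weights) is to re-apply the three base kernels to the Gram matrix produced by the previous layer, treating each row of that matrix as a sample representation:

```python
import numpy as np

def layer(X):
    """Average of linear, degree-2 polynomial, and Gaussian Gram matrices."""
    lin = X @ X.T
    poly = (lin + 1.0) ** 2
    sq = np.sum(X ** 2, axis=1)
    gauss = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2 * lin))
    return (lin + poly + gauss) / 3.0

def deep_kernel(X, depth=3):
    """Stack layers: each layer consumes the previous layer's Gram matrix."""
    K = X
    for _ in range(depth):
        K = layer(K)
    return K
```

The composition makes the final kernel a nonlinear function of the base kernels rather than a flat convex combination.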

While a single kernel function has only one characteristic, multiple kernel learning (MKL) gains flexibility by combining basic kernels. However, standard MKL assigns the same weight to each kernel across all samples. The localized multiple kernel learning (LMKL) algorithm instead uses a gating model to locally select an appropriate weight for each basic kernel. Compared with MKL, LMKL can select weights suited to the data. Experimental results on bioinformatics datasets show that LMKL with the gating model achieves better accuracy than single-kernel models [
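A minimal sketch of the gating idea, in the style of Gönen and Alpaydın's softmax gating over linear functions of the input (the parameters V and v0 here are placeholders, not learned values):

```python
import numpy as np

def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gating(X, V, v0):
    """eta_m(x) = softmax over kernels m of (v_m . x + v_m0)."""
    return softmax(X @ V.T + v0)

def lmkl_K(X, kernels, V, v0):
    """Locally combined Gram matrix: K_ij = sum_m eta_m(x_i) k_m(x_i, x_j) eta_m(x_j)."""
    E = gating(X, V, v0)
    return sum(np.outer(E[:, m], E[:, m]) * Km(X, X)
               for m, Km in enumerate(kernels))
```

Because the weights eta_m depend on the sample itself, different regions of the input space can favor different base kernels.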

Derivatives of equation (

Traditional multiple kernel learning methods only select a few simple basic kernels, such as linear kernel

LMKL-D model proposed in this paper.

We used grid search to find the parameters of the simple basic kernels. The parameters with the highest accuracy were adopted. Finally, the parameter of the Gaussian kernel
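Grid search itself is just an exhaustive sweep over parameter combinations; a generic sketch is below (the scoring function and grids are toy placeholders, not the paper's actual settings):

```python
import itertools

def grid_search(score_fn, grid):
    """Try every parameter combination and keep the highest-scoring one."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score_fn(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# toy scorer standing in for cross-validated accuracy
best, _ = grid_search(lambda gamma, C: -(gamma - 0.1) ** 2 - (C - 10) ** 2,
                      {"gamma": [0.01, 0.1, 1.0], "C": [1, 10, 100]})
```

In the real setting, `score_fn` would train the model on the training set with the given parameters and return its validation accuracy.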

We used multiple kernel learning models to obtain

For model selection, the dataset selection operations were repeated three times, and the average of the results on the test set was taken as the final performance of the model. Thus, for each training and test run, the training set had 7,500 samples in total and the test set had 3,100 samples in total. For the DMKL model, we used LIBSVM [

In order to evaluate the proposed localized multiple kernel learning model with the deep synthetic kernel (LMKL-D), its performance was measured by sensitivity (SE, the proportion of positive examples correctly classified), specificity (SP, the proportion of negative examples correctly classified), geometric mean (GM, the square root of the product of SE and SP), and accuracy (ACC, the percentage of correctly classified instances). SE, SP, GM, and ACC are defined in equation (
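These four metrics follow directly from the confusion-matrix counts (TP, FN, TN, FP); the counts in the example below are arbitrary illustrative values:

```python
def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, geometric mean, and accuracy from confusion-matrix counts."""
    se = tp / (tp + fn)                  # SE: true-positive rate
    sp = tn / (tn + fp)                  # SP: true-negative rate
    gm = (se * sp) ** 0.5                # GM: geometric mean of SE and SP
    acc = (tp + tn) / (tp + tn + fp + fn)  # ACC: overall correctness
    return se, sp, gm, acc
```

GM is useful here because the dataset is imbalanced (far more pseudo hairpins than real pre-microRNAs): a classifier cannot achieve a high GM by doing well on only one class.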

Comparison of the prediction performance of LMKL-D with other existing methods. The BPNN used here had three layers.

Comparison of prediction performance of LMKL-D with other existing methods.

Methods | SE (%) | SP (%) | GM (%) | ACC (%)
---|---|---|---|---
Triplet-SVM (libSVM) [ | 79.47 | 88.30 | 83.77 | 83.90
miPred (libSVM) [ | 84.55 | 97.97 | 91.01 | 93.50
MiPred (random forest) [ | 89.35 | 93.21 | 91.26 | 91.29
BPNN (3 layers) | 94.64 | 95.44 | 95.04 | 95.18
LMKL-D | 93.06 | 99.27 | 96.11 | 98.03

As shown in Figure

ROC curve of LMKL-D; AUC = 0.9611.

In order to better evaluate our LMKL-D model, we also compared LMKL-D with LMKL. For basic kernels, the LMKL-D model used four basic kernels,

Comparison of LMKL-D and LMKL on F1 and MCC.

Comparison of prediction performance of LMKL-D with LMKL.

Dataset | Methods | SE (%) | SP (%) | GM (%) | ACC (%) | AUC
---|---|---|---|---|---|---
Training set | LMKL | 87.47 | 99.60 | 93.34 | 97.17 | 0.9352
Training set | LMKL-D | 91.33 | 99.22 | 95.19 | 97.64 | 0.9535
Test set | LMKL | 88.71 | 99.60 | 94.00 | 97.42 | 0.9407
Test set | LMKL-D | 93.06 | 99.27 | 96.11 | 98.03 | 0.9611
Training and test set | LMKL | 87.83 | 99.60 | 93.53 | 97.24 | 0.9383
Training and test set | LMKL-D | 91.83 | 99.23 | 95.46 | 97.75 | 0.9574

From Figure 6 and Table 2, we can see that the LMKL-D model has 91.33

In this work, we have proposed a localized multiple kernel learning model with a three-layer deep synthetic kernel to improve the pre-microRNA prediction accuracy of existing methods. The experiments show that the proposed model yields better predictive performance and is more stable than existing classifiers for identifying known pre-microRNAs. After being trained on the hairpin sequence training set, the LMKL-D method obtains 93.06

The known pre-microRNA sequence data can be downloaded from the miRBase website at http://www.mirbase.org; the UCSC refGene annotation list and the human RefSeq genes were available through

The authors declare that they have no conflicts of interest regarding the publication of this paper.

This work was supported by the National Natural Science Funds (grant nos. 62072391, 61872419, and 61573166), Joint Funds of ShanDong Natural Science Funds (grant no. ZR2019LZH003), Taishan Scholars Program of Shandong Province, China (grant no. tsqn201812077), Shandong Provincial Natural Science Foundation (grant no. ZR2018LF005), University Innovation Team Project of Jinan (grant no. 2019GXRCO15), and Key Science and Technology Innovation Project of Shandong Province (grant no. 2019JZZY010324).