Associating microRNAs (miRNAs) with cancers is an important step in understanding the mechanisms of cancer pathogenesis and in finding novel biomarkers for cancer therapies. In this study, we constructed a miRNA-cancer association network (miCancerna) from more than 1,000 miRNA-cancer associations detected in millions of abstracts by text mining, covering 226 miRNA families and 20 common cancers. We further prioritized cancer-related miRNAs at the network level with a random-walk algorithm, achieving higher performance than previous miRNA-disease networks. Finally, we examined the top 5 candidate miRNAs for each cancer and found that 71% of them have been confirmed experimentally. miCancerna thus offers an alternative resource for identifying cancer-related miRNAs.
MicroRNAs (miRNAs) are a large class of small noncoding RNAs [
To this aim, the manually collected miRNA-disease association databases HMDD [
However, thousands of papers on miRNAs and cancer are published each year, making it difficult to check them manually. Automatic text-mining methods are therefore needed to extract reliable miRNA-disease associations [
In this paper, we collected 1,018 associations between 226 miRNA families and 20 common cancers, extracted from more than 7.1 million publications with an automatic text-mining method. All these relationships are recorded in a database named miCancerna, which can be freely accessed at
We used the abstracts in NCBI’s MEDLINE database as our target literature resource. MEDLINE is a comprehensive database containing the abstracts of millions of articles in the biomedical field. Since the full text of many papers is not accessible through PubMed, we considered only the abstracts, which are always available.
In 2000, Reinhart et al. [
Currently, the 20 most common cancers reported by National Cancer Institute (
With the selected abstracts, we first established relationships between miRNAs and cancers by text mining. The associations were estimated under the co-occurrence assumption, a fundamental assumption in text mining that is used to infer whether two terms are associated. In our case, if a particular miRNA appears frequently in the abstracts marked with a specific cancer, we can reasonably assume that the two co-occur and tend to be related. To establish the associations, we detected occurrences of miRNAs in the abstracts marked by cancer type. Regular expressions were applied to match miRNA names against the texts in the following steps. (1) miRNAs (such as “miR-1” and “miR-2”) were first extracted from the abstracts using the nomenclature of a “miR” prefix accompanied by a unique identifying number [
The significance levels of the associations of the miRNAs and the cancers extracted from the marked abstracts were estimated by one-sided Fisher’s exact tests [
We first queried PubMed with “MIR or MIRN or MIRNA or MICRORNA” and randomly selected 100 MEDLINE abstracts containing at least one miRNA identifier from the query results as our evaluation data. We then investigated the reliability of detecting miRNAs in texts using the
Based on the network constructed by the data from miCancerna, a random walk with restart (RWRA) method is applied to prioritize cancer-related miRNAs.
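As an illustration of how a random walk with restart ranks nodes, the following minimal sketch iterates the walk on a small, hypothetical miRNA-cancer graph until the visiting probabilities stabilize. The edges, the function name `random_walk_with_restart`, the degree-normalized transition step, and the convergence tolerance are all assumptions made for the example and are not taken from miCancerna; `alpha` is used as the restart probability.

```python
def random_walk_with_restart(adjacency, seeds, alpha=0.9, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected graph).
    seeds:     set of nodes at which the walk starts and restarts.
    alpha:     restart probability (assumed convention for this sketch).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seed nodes with probability alpha.
        nxt = {n: alpha * p0[n] for n in nodes}
        # Walk term: otherwise move to a uniformly chosen neighbour.
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue  # isolated nodes pass no probability on
            share = (1.0 - alpha) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network; these edges are illustrative only.
edges = [
    ("breast cancer", "miR-21"),
    ("breast cancer", "let-7"),
    ("lung cancer", "let-7"),
    ("lung cancer", "miR-155"),
    ("leukaemia", "miR-155"),
    ("leukaemia", "miR-15"),
]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

scores = random_walk_with_restart(adjacency, seeds={"breast cancer"}, alpha=0.9)

# Rank miRNAs that are not yet linked to the seed cancer.
candidates = ["miR-155", "miR-15"]
ranking = sorted(candidates, key=lambda m: scores[m], reverse=True)
print(ranking)
```

In this toy graph the candidate closer to the seed cancer receives the higher score, which is the behaviour a network-level prioritization relies on.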
RWRA is one of the random walk models widely used in disease gene discovery [
The performance of cancer-related miRNA prioritization by the random walk with restart algorithm on miCancerna was evaluated by calculating the area under the ROC curve (AUC) with leave-one-out cross-validation. Each training node was taken in turn as a candidate node, and 20 miRNAs not belonging to the same cancer were randomly picked as testing nodes; all of them were then prioritized as above. For each threshold, the sensitivity (SN) and specificity (SP) are defined as follows:
In the first release, miCancerna records 1,018 associations between 226 miRNA families and 20 common cancers extracted from 7.2 million papers. All the data in miCancerna can be freely accessed at
To check the text-mining quality, we randomly selected 100 MEDLINE abstracts that contained at least one miRNA identifier from the results of querying MEDLINE with “MIR or MIRN or MIRNA or MICRORNA.” A total of 739 miRNA identifiers were manually recognized in the texts of the evaluation data, of which our regular expression correctly matched 735 (true positives, TP) and missed 4 (false negatives, FN); it also miscalled 2 (false positives, FP). The miRNA annotation therefore achieved a recall of 0.9946, a precision of 0.9973, and
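In their standard form, which is assumed here to be the one intended, the two quantities are written in terms of the counts obtained at a given rank threshold: true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP).

```latex
\mathrm{SN} = \frac{TP}{TP + FN}, \qquad
\mathrm{SP} = \frac{TN}{TN + FP}
```

Plotting SN against 1 − SP over all thresholds gives the ROC curve, and the area under it is the AUC.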
According to these comparison results, we concluded that miCancerna is a high-quality resource of miRNA-cancer associations.
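The recall and precision reported for miRNA recognition follow directly from the TP, FP, and FN counts. The sketch below recomputes them and also shows a simplified regular expression for miRNA identifiers; the pattern and the function names are assumptions made for illustration, not the expression actually used to build miCancerna.

```python
import re

# Simplified, illustrative pattern (an assumption, not the authors' expression):
# a "miR" or "let" prefix, a hyphen, a number, and an optional letter suffix.
MIRNA_PATTERN = re.compile(r"\b(?:miR|let)-\d+[a-z]?\b", re.IGNORECASE)


def find_mirnas(text):
    """Return every miRNA-like identifier found in the text, in order."""
    return MIRNA_PATTERN.findall(text)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical sentence used only to exercise the pattern.
sample = "miR-21 and let-7a were upregulated, unlike miR-155."
print(find_mirnas(sample))

# Counts taken from the evaluation described in the text.
precision, recall = precision_recall(tp=735, fp=2, fn=4)
print(f"precision={precision:.4f} recall={recall:.4f}")
```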
To reveal the roles of miRNA in different cancers, we constructed a bipartite network with the top 5% associations based on Fisher’s exact test
Network illustrating significant associations between miRNAs and cancers. Red circles and green squares represent cancers and miRNAs, respectively, with sizes scaled (logarithmically) to the number of corresponding annotated papers. Each link represents a miRNA-cancer association, with colour and width reflecting the strength of the relationship.
As shown in Figure
It is interesting that four miRNA-cancer associations in top 10 (Table
Top 10 associations between miRNAs and cancers.
miRNA | Cancer | Papers | P value |
---|---|---|---|
miR-15 | Leukaemia | 35 | 6.804 × 10⁻⁴³ |
miR-16 | Leukaemia | 33 | 5.028 × 10⁻³⁶ |
miR-122 | Liver cancer | 22 | 9.742 × 10⁻²⁶ |
miR-181 | Leukaemia | 23 | 3.142 × 10⁻²⁵ |
miR-155 | Non-Hodgkin lymphoma | 22 | 7.393 × 10⁻²² |
Let-7 | Lung cancer | 34 | 1.110 × 10⁻¹⁹ |
miR-223 | Leukaemia | 16 | 1.987 × 10⁻¹⁸ |
miR-17 | Non-Hodgkin lymphoma | 19 | 3.772 × 10⁻¹⁸ |
miR-21 | Breast cancer | 31 | 1.659 × 10⁻¹⁶ |
miR-221 | Thyroid cancer | 11 | 1.607 × 10⁻¹⁴ |
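Significance values of this kind come from a one-sided Fisher's exact test on co-occurrence counts. The following sketch computes the upper-tail probability for a 2×2 table using only the standard library. The table layout (abstracts with or without the miRNA, marked or not marked with the cancer) and the counts are hypothetical assumptions for illustration, not figures from miCancerna.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail P(X >= a) for the 2x2 table [[a, b], [c, d]].

    a: abstracts mentioning the miRNA and marked with the cancer
    b: abstracts mentioning the miRNA only
    c: abstracts marked with the cancer only
    d: abstracts with neither
    Under the null hypothesis of independence, X is hypergeometric.
    """
    row1 = a + b          # abstracts mentioning the miRNA
    col1 = a + c          # abstracts marked with the cancer
    n = a + b + c + d     # all abstracts
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Hypothetical counts: 100 abstracts mention the miRNA, 1,000 are marked
# with the cancer, and 35 of them overlap, out of 100,000 abstracts.
p_value = fisher_one_sided(a=35, b=65, c=965, d=98935)
print(f"{p_value:.3e}")
```

A small P value means the miRNA appears in abstracts marked with the cancer far more often than independence would predict.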
We applied RWRA to the network established from miCancerna to prioritize candidate cancer-related miRNAs and evaluated the performance by leave-one-out cross-validation. With a restart probability alpha of 0.9, the AUC of the ROC curve reaches 0.798 (Figure
AUC value under different alpha.
Alpha | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
---|---|---|---|---|---|---|---|---|---|
AUC | 0.7952 | 0.7973 | 0.7974 | 0.7978 | 0.7981 | 0.7981 | 0.7983 | 0.7983 | 0.7984 |
ROC curves for RWRA on miCancerna and previous miRNA-cancer network.
Distribution of random AUC for miCancerna.
The top 5 potential miRNAs of each cancer are presented in Table
Top 5 potential miRNAs of 20 cancers.
Bladder cancer | | Brain cancer | | Breast cancer | | Cervix cancer | |
---|---|---|---|---|---|---|---|
miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm |
miR-15 | Null | let-7 | Ref. [ | miR-143 | dbDEMC | let-7 | Null |
miR-34 | Ref. [ | miR-145 | Ref. [ | miR-223 | dbDEMC | miR-221 | Null |
miR-16 | Ref. [ | miR-16 | Ref. [ | miR-203 | dbDEMC | miR-17 | Ref. [ |
miR-146 | Ref. [ | miR-155 | Ref. [ | miR-194 | dbDEMC | miR-125 | Null |
miR-155 | Ref. [ | miR-143 | Ref. [ | miR-100 | dbDEMC | miR-222 | Null |
Colorectal cancer | | Esophageal cancer | | Kidney cancer | | Leukemia | |
miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm |
miR-221 | dbDEMC | miR-17 | dbDEMC | miR-125 | dbDEMC | miR-200 | Ref. [ |
miR-146 | dbDEMC | miR-222 | dbDEMC | miR-222 | dbDEMC | miR-205 | Null |
miR-29 | dbDEMC | miR-15 | dbDEMC | miR-146 | dbDEMC | miR-193 | Null |
miR-199 | dbDEMC | miR-125 | dbDEMC | miR-16 | dbDEMC | miR-9 | Ref. [ |
miR-193 | Null | miR-200 | dbDEMC | miR-143 | dbDEMC | miR-31 | Ref. [ |
Liver cancer | | Lung cancer | | Melanoma | | Myeloma | |
miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm |
miR-205 | Null | miR-23 | dbDEMC | miR-21 | Ref. [ | miR-145 | Null |
miR-27 | dbDEMC | miR-148 | dbDEMC | miR-145 | Ref. [ | miR-200 | Null |
miR-124 | Ref. [ | miR-27 | dbDEMC | miR-26 | Null | miR-221 | Ref. [ |
miR-520 | dbDEMC | miR-203 | dbDEMC | miR-143 | Ref. [ | miR-34 | Null |
miR-203 | Ref. [ | miR-520 | dbDEMC | miR-126 | Ref. [ | miR-205 | Null |
Non-Hodgkin lymphoma | | Oral cancer | | Ovarian cancer | | Pancreatic cancer | |
miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm |
miR-200 | dbDEMC | miR-15 | Null | miR-26 | Null | miR-16 | Ref. [ |
miR-205 | dbDEMC | miR-205 | Ref. [ | miR-181 | Null | miR-125 | Ref. [ |
miR-126 | dbDEMC | miR-10 | Ref. [ | miR-143 | Ref. [ | miR-26 | Null |
miR-224 | dbDEMC | miR-182 | Null | miR-10 | Null | miR-126 | Ref. [ |
miR-23 | dbDEMC | miR-20 | Null | miR-23 | Null | miR-181 | Ref. [ |
Prostate cancer | | Stomach cancer | | Thyroid cancer | | Uterine cancer | |
miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm | miRNAs | Confirm |
miR-155 | dbDEMC | miR-155 | Ref. [ | miR-15 | Null | miR-17 | dbDEMC |
miR-29 | Null | miR-29 | Null | miR-34 | Null | miR-222 | dbDEMC |
miR-30 | dbDEMC | miR-30 | Null | miR-145 | Ref. [ | miR-224 | dbDEMC |
miR-10 | dbDEMC | miR-10 | Ref. [ | miR-16 | Null | miR-30 | dbDEMC |
miR-199 | dbDEMC | miR-199 | Null | miR-205 | Ref. [ | miR-106 | dbDEMC |
“Null” means we did not find experimental evidence.
We compared miCancerna with similar databases and networks. First, we compared miCancerna with the manually curated database miR2Disease in terms of the number of evidence papers. For most cancers, miCancerna provides many more evidence papers than miR2Disease (Table
Number of evidence papers compared with miR2Disease.
Cancer types | miCancerna | miR2Disease | Increase |
---|---|---|---|
Bladder cancer | 14 | 11 | 27.27% |
Brain cancer | 35 | 3 | 1067% |
Breast cancer | 137 | 58 | 136.2% |
Cervix cancer | 11 | 4 | 175% |
Colorectal cancer | 81 | 39 | 107.7% |
Esophageal cancer | 16 | 7 | 128.6% |
Kidney cancer | 14 | 4 | 250.0% |
Leukemia | 146 | 45 | 224.4% |
Liver cancer | 99 | 39 | 153.8% |
Lung cancer | 112 | 37 | 202.7% |
Melanoma | 21 | 9 | 133.3% |
Myeloma | 9 | 3 | 200.0% |
Non-Hodgkin lymphoma | 62 | 13 | 376.9% |
Oral cancer | 19 | 0 | — |
Ovarian cancer | 47 | 18 | 161.1% |
Pancreatic cancer | 47 | 16 | 193.8% |
Prostate cancer | 61 | 19 | 221.1% |
Stomach cancer | 48 | 16 | 200.0% |
Thyroid cancer | 21 | 9 | 133.3% |
Uterine cancer | 28 | 5 | 460.0% |
These results indicate that miCancerna provides an alternative resource of miRNA-cancer associations.
In this study, we constructed a reliable miRNA-cancer network based on a text-mining method and stored it in the database miCancerna. The current release contains 1,018 associations between 226 miRNA families and 20 common cancers. According to our test results, miCancerna provides a reliable and comprehensive resource of miRNA-cancer associations, which can be further used in the identification of cancer-related miRNAs.
For future development, we plan to consider more types of cancers, add regulation information to the miRNA-cancer associations, and integrate miCancerna into other related databases, such as MISIM [
The authors declare that there is no conflict of interests regarding the publication of this paper.
Lun Li, Xingchi Hu, and Zhaowan Yang contributed equally to this work.
This study was supported by the National Natural Science Foundation of China