Long noncoding RNAs (lncRNAs) play an important role in many biological processes, especially in cancer. However, current lncRNA–disease association prediction methods overlook disease prognosis. In this study, a multiple linear regression model for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp) was constructed, integrating clinical prognosis data for cancer with lncRNA transcript expression levels. MlrLDAcp supports both cancer survival prediction and lncRNA–disease association prediction. Sixty lncRNAs most closely related to prostate cancer survival were selected from 481 candidate lncRNAs, and the multiple linear regression relationship between these 60 lncRNAs and the prognostic survival of 176 patients with prostate cancer was derived. Compared with previous studies, MlrLDAcp showed superior survival prediction and effectively predicted lncRNA–disease associations, achieving an area under the curve (AUC) of 0.875 for survival prediction and 0.872 for lncRNA–disease association prediction. It may be an effective method for biomedical research.
Long noncoding RNAs (lncRNAs) are noncoding RNA molecules, including miRNAs [
Many computational methods have been applied to predict human lncRNA–disease associations in recent years. These methods fall into two broad categories: machine-learning-based and network-based.
Machine-learning-based methods train a model on a training dataset and then evaluate it on a test dataset. For instance, Zhao et al. [
Network-based methods build a network from known lncRNA–disease associations and infer new associations from it. For instance, Yang et al. [
The reviews of Chen et al. [
To overcome these issues, a multiple linear regression model for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp) was constructed to predict potential associations between lncRNAs and diseases. MlrLDAcp was also used to predict the survival time of patients with prostate cancer. The concepts of predictive correlation factor
Prostate cancer lncRNA expression data were obtained from the lncRNAtor database (http://lncrnator.ewha.ac.kr). The data comprised 44 normal samples and 176 prostate cancer samples; the prostate cancer samples were numbered in ascending order of sample ID (denoted by
The clinical prognosis data of 176 prostate cancer samples in Section
This study examined lncRNA–disease associations in two steps:
(a) The lncRNAs among the 481 candidates that were most closely related to prostate cancer were screened through prognostic survival analysis. Hence, a subset of
(b) The potential relationship between
In (
Schematic of
In (
The components with a value of 1 in
A component
Whether a new component was added to
Whether
A component
Whether a new component was removed from
The components in
The algorithm flow of computing
Algorithm flowchart of computing
The parameters were initialized.
The algorithm checked whether the maximum number of iterations had been reached; if so, it proceeded to Step 5; otherwise, it returned to Step 2.
The multivariate linear regression analysis on
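The add and remove steps described above resemble bidirectional stepwise regression, and the AIC values tracked over 10 iterations point to AIC as the selection criterion. The sketch below illustrates that generic technique under those assumptions; it does not reproduce the paper's variation-center or decay-coefficient computations, whose definitions are not recoverable from this section. The names `ols_aic` and `stepwise_aic`, the Gaussian AIC form, and the rule of applying the single best add-or-remove move per iteration are hypothetical choices.

```python
import numpy as np


def ols_aic(X, y):
    """Fit ordinary least squares with an intercept and return its AIC.

    Uses the Gaussian form n * log(RSS / n) + 2k (additive constants dropped),
    where k counts the intercept plus the columns of X.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def stepwise_aic(X, y, max_iter=10):
    """Bidirectional stepwise selection over the columns of X, driven by AIC.

    `selected` is a boolean inclusion vector (True = lncRNA in the model). Each
    iteration tries flipping every component (adding it if absent, removing it
    if present) and applies the single flip that lowers AIC the most. The loop
    stops at max_iter or when no flip lowers AIC.
    """
    p = X.shape[1]
    selected = np.zeros(p, dtype=bool)
    best_aic = ols_aic(X[:, selected], y)
    history = [best_aic]
    for _ in range(max_iter):
        best_move = None
        for j in range(p):
            trial = selected.copy()
            trial[j] = not trial[j]  # add column j if absent, remove it if present
            aic = ols_aic(X[:, trial], y)
            if aic < best_aic:
                best_aic, best_move = aic, j
        if best_move is None:
            break  # no single add/remove move lowers AIC
        selected[best_move] = not selected[best_move]
        history.append(best_aic)
    return selected, history


if __name__ == "__main__":
    # Synthetic data: only columns 0 and 3 actually drive the outcome.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)
    mask, aic_trace = stepwise_aic(X, y, max_iter=10)
    print("selected columns:", np.flatnonzero(mask))
    print("AIC per accepted step:", np.round(aic_trace, 2))
```

On the synthetic data, columns 0 and 3 should enter first because they carry nearly all of the signal; a noise column can occasionally be added as well when it lowers AIC by chance.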
The calculation results of variation center
Distribution of the variation center on
Figure
Distribution of the decay coefficients of the
Table
Variation position, interval, and result of
| | | LncRNA number in |
---|---|---|---|
0 | ---- | 1–107 | |
1 | (38–441) | (1–37) | |
2 | (1–384) | (38–107) | |
3 | (106–469) | (108–177) | |
4 | (1–312) | (178–214) | |
5 | (184–487) | (178–183) | |
6 | (1–227) | (184–214) | |
7 | (276–481) | (184–214) | |
8 | (1–126) | (108–126) | |
9 | (389–474) | (108–126) | |
10 | (1–24) | (1–24) | |
Results of multiple linear regression analysis on
Serial number | Gene name | Coefficients | Serial number | Gene name | Coefficients |
---|---|---|---|---|---|
Intercept | --------- | 1.486e+03 | X120 | AMZ2P1 | 7.286e+07 |
X1 | AC017048.3 | –1.808e+08 | X122 | A2M-AS1 | –2.177e+08 |
X2 | KCP | 1.669e+09 | X125 | RP11-399O19.5 | –7.673e+07 |
X3 | RP11-342C23.4 | –6.523e+07 | X126 | SNHG16 | –7.531e+06 |
X9 | FAM222A-AS1 | –9.817e+07 | X127 | MIR143HG | 6.360e+07 |
X10 | PCA3 | 3.460e+05 | X130 | GABPB1-AS1 | –1.999e+08 |
X11 | CYP4F8 | –4.604e+06 | X131 | GGTA1P | 9.237e+07 |
X13 | RP11-627G23.1 | –7.704e+07 | X134 | CTD-2284J15.1 | 7.787e+07 |
X14 | RP11-279F6.1 | –2.877e+07 | X135 | KB-431C1.4 | 2.647e+07 |
X16 | RP11-279F6.1 | 6.888e+07 | X137 | RP11-66B24.4 | –3.950e+07 |
X18 | RP1-163G9.1 | 2.957e+08 | X138 | CBR3-AS1 | –3.629e+07 |
X19 | AC003090.1 | –2.135e+08 | X139 | MIR22HG | –3.956e+07 |
X20 | AP001626.1 | –8.890e+08 | X140 | DANCR | 2.628e+06 |
X22 | AC073133.1 | 1.038e+08 | X145 | RRN3P2 | 4.049e+08 |
X23 | RP11-401F24.4 | 6.834e+08 | X146 | LINC00654 | –4.514e+08 |
X25 | AC073343.13 | 6.070e+08 | X149 | ARHGEF26-AS1 | 3.073e+07 |
X26 | MAGI2-AS3 | –7.389e+07 | X150 | RMST | –9.826e+07 |
X27 | BOLA3-AS1 | 1.865e+08 | X151 | LINC00086 | –8.181e+07 |
X29 | C1orf126 | –8.830e+08 | X152 | NBPF8 | 1.050e+08 |
X30 | CTD-3199J23.4 | 3.565e+08 | X153 | CTD-2126E3.1 | –1.185e+07 |
X33 | FBXL19-AS1 | 1.895e+08 | X154 | AP001258.4 | 2.169e+07 |
X34 | RPL13P5 | –2.848e+08 | X157 | LINC00312 | 6.236e+08 |
X35 | RP11-412D9.4 | –1.784e+08 | X163 | RAET1K | –7.308e+08 |
X36 | ADAMTS9-AS2 | 2.616e+08 | X164 | PCBP1-AS1 | –3.683e+08 |
X108 | XKR5 | –1.608e+09 | X165 | RP11-1000B6.3 | 3.913e+08 |
X110 | HOXA-AS2 | 1.159e+08 | X169 | CTBP1-AS1 | 3.419e+07 |
X112 | CTC-308K20.1 | 2.544e+09 | X171 | BX004987.4 | 1.583e+08 |
X113 | BX284650.3 | 1.066e+08 | X172 | GAS5 | –1.064e+07 |
X114 | AC002055.4 | –1.601e+08 | X173 | RP11-166D19.1 | –1.299e+08 |
X116 | CD27-AS1 | 3.876e+07 | X176 | GBP1P1 | –2.108e+08 |
X118 | ATG9B | –1.542e+08 | | | |
| Residual standard error | |
---|---|---|
Value | 46.42 on 115 degrees of freedom | 1.558e-10 |
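The residual degrees of freedom in the table follow directly from the model size: 176 samples minus 60 lncRNA coefficients minus 1 intercept leaves 115. A minimal sketch of such a fit with NumPy follows, assuming ordinary least squares with an intercept and survival time as the response; the names `fit_survival_model` and `predict_survival` are hypothetical, and the usage data are random numbers shaped like the study's, not its measurements.

```python
import numpy as np


def fit_survival_model(expr, survival):
    """Least-squares fit of survival time on lncRNA expression, with an intercept.

    expr: (n_samples, n_lncrnas) expression matrix; survival: (n_samples,) outcome.
    Returns (coefficients, residual standard error, residual degrees of freedom);
    coefficients[0] is the intercept.
    """
    n, p = expr.shape
    design = np.column_stack([np.ones(n), expr])
    coef, *_ = np.linalg.lstsq(design, survival, rcond=None)
    residuals = survival - design @ coef
    df = n - p - 1  # subtract one per lncRNA coefficient and one for the intercept
    rse = float(np.sqrt(residuals @ residuals / df))
    return coef, rse, df


def predict_survival(coef, expr):
    """Predicted survival = intercept + sum of (coefficient * expression)."""
    return coef[0] + expr @ coef[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(176, 60))  # same shape as the study: 176 patients, 60 lncRNAs
    survival = 100 + expr @ rng.normal(size=60) + rng.normal(scale=5, size=176)
    coef, rse, df = fit_survival_model(expr, survival)
    print("residual df:", df)  # 176 - 60 - 1 = 115
    print("residual standard error:", round(rse, 2))
```

The same fitted coefficients serve both purposes discussed in the evaluation: `predict_survival` applies them to new expression profiles, and their magnitudes indicate how strongly each lncRNA is tied to survival in the model.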
AIC values over the 10 iterations.
The proposed prediction model (MlrLDAcp) has two potential applications:
(a) Predicting the survival of cancer patients with the multiple linear regression model of MlrLDAcp.
(b) Predicting associations between lncRNAs and diseases with MlrLDAcp.
The performance evaluation covers both applications.
Receiver operating characteristic (ROC) analyses were performed on the prostate cancer samples to compare the predictive accuracy of MlrLDAcp with that of Huang’s method [
ROC curves of MlrLDAcp and Huang’s method for predicting 5-year biochemical recurrence survival. MlrLDAcp improved the prediction accuracy for 5-year biochemical recurrence survival by 4.2% over Huang’s method.
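The AUC values reported for MlrLDAcp summarize ROC curves like these. An AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counting one half, so it can be computed without drawing the curve. The sketch below shows that pairwise computation; the function name and toy scores are illustrative. For the survival comparison, labels would be a binary 5-year outcome, and how the study binarized survival is not stated in this section.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC = P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg).

    scores: predicted scores; labels: 1 for positive, 0 for negative samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count correctly ordered positive/negative pairs; a tie counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    # 8 of the 9 positive/negative pairs are ordered correctly, so AUC = 8/9.
    print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in the number of samples, which is negligible for a few hundred patients; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.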
Leave-one-out cross-validation (LOOCV) was performed on the gold-standard dataset to compare MlrLDAcp with two state-of-the-art methods: LRLSLDA [
ROC curves of MlrLDAcp and two state-of-the-art methods, LRLSLDA and KRWRH, for predicting lncRNA–disease associations. MlrLDAcp improved the prediction accuracy of lncRNA–disease associations by 3.4% over KRWRH and by 5.0% over LRLSLDA.
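In LOOCV for association prediction, each known lncRNA–disease association is hidden in turn, the model rescores the association matrix, and the hidden pair is ranked against all pairs with no known association. The sketch below shows only that evaluation loop under this assumed protocol. The scoring function is a deliberately simple placeholder (cosine similarity between lncRNA association profiles, propagated to diseases), not MlrLDAcp, LRLSLDA, or KRWRH; `loocv_auc` and `toy_scores` are hypothetical names.

```python
import numpy as np


def toy_scores(assoc):
    """Placeholder scorer: lncRNA-lncRNA cosine similarity times the association matrix."""
    norms = np.linalg.norm(assoc, axis=1, keepdims=True)
    profiles = np.divide(assoc, norms, out=np.zeros_like(assoc, dtype=float), where=norms > 0)
    sim = profiles @ profiles.T
    np.fill_diagonal(sim, 0.0)  # a lncRNA must not vote for its own associations
    return sim @ assoc


def loocv_auc(assoc, score_fn=toy_scores):
    """Hide each known association, rescore, and rank it against all unknown pairs.

    Returns the mean fraction of unknown pairs scored below the hidden pair (ties
    count one half): the AUC obtained when each held-out pair is compared with the
    unknown pairs of its own trial.
    """
    assoc = np.asarray(assoc, dtype=float)
    unknown = assoc == 0
    if not unknown.any() or not assoc.any():
        raise ValueError("need both known and unknown lncRNA-disease pairs")
    fractions = []
    for i, j in zip(*np.nonzero(assoc)):
        hidden = assoc.copy()
        hidden[i, j] = 0.0  # leave this known association out
        scores = score_fn(hidden)
        held, cand = scores[i, j], scores[unknown]
        fractions.append((np.sum(cand < held) + 0.5 * np.sum(cand == held)) / cand.size)
    return float(np.mean(fractions))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    assoc = (rng.random((30, 8)) < 0.2).astype(float)  # random toy association matrix
    print("LOOCV AUC:", round(loocv_auc(assoc), 3))
```

Passing a different `score_fn` lets the same loop evaluate any method that maps an association matrix to a score matrix, which is what makes a like-for-like comparison between methods possible.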
In this study, MlrLDAcp was constructed. MlrLDAcp takes lncRNA transcript expression levels as independent variables and clinical prognosis data as the dependent variable. Using MlrLDAcp, the 60 lncRNAs most closely related to cancer prognosis (survival time) were selected from 481 candidate lncRNAs. MlrLDAcp supports both cancer survival prediction and lncRNA–disease association prediction.
Future research directions for lncRNA–disease association prediction include the following.
(a) Future lncRNA–disease association prediction should take clinical prognostic data into account. Disease-associated lncRNAs may have clinical value as therapeutic targets, so clinical prognostic data are highly valuable for association prediction. The clinical implications and the mechanisms underlying lncRNA–disease associations merit further exploration.
(b) Building an effective computational model for an lncRNA similarity function that reasonably integrates similarity scores from different types of biological information warrants further research.
(c) As more lncRNA–disease associations become known, prediction accuracy can be further improved. Furthermore, most computational models rely heavily on negative samples, which are difficult to obtain reliably; this is an urgent problem to solve.
(d) New network-based computational models should operate on heterogeneous networks rather than single networks. Hence, more networks, such as the lncRNA–disease network, disease similarity network, lncRNA functional similarity network, and lncRNA interaction networks, should be integrated in the future.
The data used to support the findings of this study are included within the article.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported in part by NSFC (2017–2020, No. 51679058), the funds for the 2013–2016 China Higher Specialized Research Fund (PhD supervisor category) (No. 20132304110018), and the Fundamental Research Funds in Heilongjiang Provincial Universities (No. 135109249, No. 135109241).