Machine learning has become a powerful tool for analysing medical domains, assessing the importance of clinical parameters, and extracting medical knowledge for outcomes research. In this paper, we present a machine learning method for extracting diagnostic and prognostic thresholds, based on a symbolic classification algorithm called REMED. We evaluated the performance of our method by determining new prognostic thresholds for well-known and potential cardiovascular risk factors that are used to support medical decisions in the prognosis of fatal cardiovascular diseases. Our approach predicted 36% of cardiovascular deaths with 80% specificity and 75% general accuracy. The new method provides an innovative approach that might be useful to support decisions about medical diagnoses and prognoses.
Machine learning (ML) disciplines provide computational methods and learning mechanisms that can help generate new knowledge from large databases. Applications of ML are useful for constructing approaches to solving problems of classification, prediction, pattern recognition, and knowledge extraction, where the data take the form of a set of examples, and the output takes the form of predictions for new examples [
To successfully implement an ML application in problems related to clinical decisions, it is necessary to consider some specific requirements [
Another important aspect to consider is the need to obtain interpretable approximations, in order to provide medical staff with useful information about the given problem. This is typically achieved using symbolic learning methods (e.g., decision trees and rule systems), which allow decisions to be explained in an easily comprehensible manner. However, using a symbolic learning algorithm to obtain a more comprehensible model frequently sacrifices predictive accuracy.
Another problem that often hinders high overall performance in the analysis of medical datasets is that generally these exhibit an unbalanced class distribution [
In addition to achieving good overall performance and providing medical staff with interpretable prognostic information, an ML approach should support decisions and reduce the number of medical tests needed for a reliable prognosis. A measure of the reliability of the diagnosis or prognosis is also important, because it gives medical staff sufficient confidence to put the new approach into practice. It is likewise desirable for the approach to provide reliable predictions from a small amount of patient information, because collecting that information is often expensive, time consuming, possibly subject to privacy issues, and possibly harmful to the patient [
The present study focused on the implementation of an ML method to support medical decisions in the prognosis of fatal cardiovascular diseases, which are ranked among the top ten in the global disease burden [
Currently available ABP monitors are fully automatic and portable devices (Figure
Ambulatory blood pressure monitoring procedure.
The Maracaibo Aging Study (MAS) is an ongoing population-based, longitudinal study that includes 2500 subjects older than 55 years residing in Santa Lucia County, Maracaibo, Venezuela. All participants underwent extensive clinical and laboratory examinations, and randomly selected individuals also underwent ABP monitoring. Informed consent was obtained from the subjects who agreed to participate, and from a close family member when doubts existed about the competence of the subject. The ethical review board of the Institute of Cardiovascular Diseases of the University of Zulia approved the protocol.
The leading global risk factor for mortality is high BP, which is responsible for 13% of deaths globally. Eight changeable risk factors (alcohol use, tobacco use, high BP, high body mass index, high cholesterol, high blood glucose, low fruit and vegetable intake, and physical inactivity) account for 61% of cardiovascular deaths. Combined, these same risk factors account for over three quarters of ischaemic heart disease, the leading cause of death worldwide [
However, investigators continue to look for new and emerging risk factors for cardiovascular disease. Recent ABP monitoring studies using a novel variability index [
To estimate the 24-hour BP level, we computed the weighted mean of valid BP readings (WBP) using the time interval between successive valid measurements as weighting factors [
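As an illustration of the time-weighted summaries described above, the following Python sketch computes a weighted mean and a time-weighted average real variability (ARV) from a series of valid readings. It is a minimal sketch under stated assumptions, not the study's code: the function names are hypothetical, each reading is assumed to be weighted by the interval that precedes it (the first reading reuses the first interval), and the ARV is assumed to be the mean of absolute successive differences weighted by the interval between the two readings.

```python
from typing import List, Sequence


def _intervals(times_min: Sequence[float]) -> List[float]:
    """Time gaps (in minutes) between successive valid readings."""
    if len(times_min) < 2:
        raise ValueError("need at least two readings")
    gaps = [b - a for a, b in zip(times_min, times_min[1:])]
    if any(g <= 0 for g in gaps):
        raise ValueError("times must be strictly increasing")
    return gaps


def weighted_bp(times_min: Sequence[float], readings: Sequence[float]) -> float:
    """Time-weighted mean of BP readings (hypothetical helper).

    Assumption: reading k is weighted by the interval preceding it;
    the first reading has no preceding interval, so it reuses the first gap.
    """
    if len(times_min) != len(readings):
        raise ValueError("times and readings must have the same length")
    gaps = _intervals(times_min)
    weights = [gaps[0]] + gaps
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)


def average_real_variability(times_min: Sequence[float], readings: Sequence[float]) -> float:
    """Time-weighted mean of absolute successive differences (assumed ARV form)."""
    if len(times_min) != len(readings):
        raise ValueError("times and readings must have the same length")
    gaps = _intervals(times_min)
    diffs = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return sum(w * d for w, d in zip(gaps, diffs)) / sum(gaps)


# Illustrative (made-up) readings: minutes since start, systolic BP in mmHg.
times = [0, 30, 60, 120]
systolic = [120, 130, 125, 145]
print(weighted_bp(times, systolic))               # 133.0
print(average_real_variability(times, systolic))  # 13.75
```

Weighting by the time interval keeps densely sampled periods from dominating the 24-hour summary when the monitor records at uneven intervals.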
We implemented an interdisciplinary ML method that encompassed all stages of knowledge extraction from databases (data preprocessing, attribute selection, data mining, and knowledge extraction), to examine the application of ML to support clinical decisions (Figure
Machine learning method proposed.
To improve the accuracy of predictions for affected subjects (positive class), we used the Rule Extraction for MEdical Diagnosis (REMED) algorithm [
First, REMED attempts to select the best combination of relevant attributes, using a simple logistic regression model. This is a standard method of analysis in medical research that uses the odds ratio metric [
where <relation> is either ≥ or ≤ depending on whether
To avoid overfitting during the training and testing phase, REMED implements the k-fold cross validation technique, which is based on randomly shuffling sample vectors among training and testing spaces [
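The k-fold procedure mentioned above can be sketched in a few lines of Python. This is a generic illustration, not REMED's implementation: the function name `k_fold_indices`, the fixed seed, and the round-robin assignment of shuffled samples to folds are assumptions made for the example.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled once, then dealt into k folds; each fold serves
    as the test set exactly once while the other k - 1 folds form the
    training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing gives fold sizes that differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with the size of the dataset described in this paper (551 observations).
for train, test in k_fold_indices(551, k=10):
    assert len(train) + len(test) == 551
```

With unbalanced classes, a stratified variant that preserves the class ratio in every fold is often preferred; the sketch above does not stratify.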
Based on current medical guidelines [
Only continuous and changeable attributes were considered in the knowledge extraction process. Continuous attributes have a higher degree of uncertainty than discrete attributes, because discrete attributes are usually binary in the clinical environment (e.g., smoker versus nonsmoker), and their associations with specific diseases are almost always well known. We also excluded age, which is a nonchangeable attribute. The attributes considered in the initial ML analysis were body mass index (BMI), serum cholesterol level, 24-hour heart rate, and systolic and diastolic 24-hour WBP and ARV.
The minable dataset was composed of 551 observations with 7 attributes, with only 43 missing values (1.1%) in the serum cholesterol attribute. The missing data were estimated from the regression slope on sex and age, according to the criteria of physician specialists. The sample included 374 women (67.8%) and 170 patients (30.9%) undergoing treatment with antihypertensive drugs (Table
Baseline characteristics.
Characteristic | Frequency in percent or median |
---|---|
Demographic variables | |
Men, % (n) | 32.1 (177) |
Age, years | |
Race, % (n) | |
Mixed | 73.1 (404) |
Caucasian | 22.2 (122) |
African-Venezuelan | 4 (22) |
Natives | 0.5 (3) |
Use of antihypertensive drugs, % (n) | 30.9 (170) |
Use of anti-diabetic drugs, % (n) | 11.1 (61) |
History of cardiovascular disease, % (n) | 11.5 (63) |
Diagnosis of diabetes mellitus, % (n) | 18.1 (100) |
Lifestyle, physical and lipid factors | |
Current smoking status, % (n) | 15.6 (86) |
Current drinking status, % (n) | 31.6 (174) |
Body mass index, kg/m2 | |
Total serum cholesterol, mmol/L | |
24-hour ambulatory measurements | |
Systolic blood pressure, mm Hg | |
Diastolic blood pressure, mm Hg | |
Heart rate, bpm | |
The median follow-up period was
Using the simple logistic regression model, REMED found only two attributes significantly associated with the positive class: systolic WBP (
To provide medical staff with more information and comprehensible models, we used REMED to build several simple rule systems, which included individual and combined predictions of the more significant attributes (systolic WBP and ARV), as well as the combined predictions with the additional risk factors.
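A simple rule system of this kind can be represented as a conjunction of threshold conditions on the selected attributes. The Python sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, the attribute keys are invented for the example, and the ARV threshold is a placeholder rather than the value obtained in this study. The systolic WBP threshold of 137 mmHg is the one reported later in this paper.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, Mapping

_RELATIONS = {">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    """One rule condition of the form: attribute <relation> threshold."""
    attribute: str
    relation: str  # ">=" or "<="
    threshold: float

    def holds(self, patient: Mapping[str, float]) -> bool:
        if self.relation not in _RELATIONS:
            raise ValueError(f"unsupported relation: {self.relation}")
        return _RELATIONS[self.relation](patient[self.attribute], self.threshold)


def predict_positive(patient: Mapping[str, float], conditions: Iterable[Condition]) -> bool:
    """Positive class only if every condition holds; otherwise negative."""
    return all(c.holds(patient) for c in conditions)


# Placeholder only: NOT the ARV threshold reported by the authors.
ARV_THRESHOLD_PLACEHOLDER = 10.0

rules = [
    Condition("systolic_wbp", ">=", 137.0),  # threshold reported in this paper
    Condition("systolic_arv", ">=", ARV_THRESHOLD_PLACEHOLDER),
]

print(predict_positive({"systolic_wbp": 142.0, "systolic_arv": 12.5}, rules))  # True
print(predict_positive({"systolic_wbp": 130.0, "systolic_arv": 12.5}, rules))  # False
```

Because every condition is an explicit comparison against a threshold, the resulting model can be read directly by medical staff.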
The confusion matrix of the predictions of the rule system combining only high systolic ARV and WBP, obtained with 10-fold cross-validation, indicated that REMED achieved a sensitivity of 0.36, correctly predicting more than 35% of the cardiovascular deaths (Table
Confusion matrix of REMED predictions.
 | | Predicted class | |
---|---|---|---|
 | | Positive | Negative |
Actual class | Positive | 22 | 39 |
 | Negative | 98 | 392 |
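The sensitivity, specificity, and accuracy quoted for REMED follow directly from the four cells of this confusion matrix. The short Python sketch below reproduces the arithmetic; the function name is hypothetical and the code is only a worked check, not part of the original analysis.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard rates derived from a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),   # overall agreement
    }


# Cells taken from the confusion matrix above.
metrics = classification_metrics(tp=22, fn=39, fp=98, tn=392)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
# sensitivity: 36.1%
# specificity: 80.0%
# accuracy: 75.1%
```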
Performance of classifiers.
Classifiers | Sensitivity | Specificity | Accuracy |
---|---|---|---|
 | 55.7% | 60.4% | 59.9% |
 | 52.5% | 58.8% | 58.08% |
 | 36.1% | 80.0% | 75.1% |
 | 8.2% | 93.3% | 83.8% |
 | 9.8% | 93.3% | 84.0% |
 | 22.9% | 87.5% | 80.4% |
 | 11.48% | 95.92% | 86.57% |
Use of the REMED algorithm selecting only the more significant attributes provided some of the desired features for solving medical diagnosis/prognosis problems:
The ML approach generated a new prognostic threshold for cardiovascular mortality: systolic WBP ≥ 137 mmHg, which is lower than the threshold currently proposed by hypertension guidelines (≥140 mmHg) and in agreement with recent ABP studies [
Systolic WBP and ARV were both independent predictors of cardiovascular mortality. Used individually, each achieved a sensitivity above 50%, but at a substantial cost in specificity and general accuracy (about 60% or less). Adding other well-known cardiovascular risk factors considerably decreased the accuracy in predicting affected subjects (sensitivity <23%). Therefore, using logistic regression to select significant attributes (>99%) could be an effective strategy at this stage of ML analysis of medical datasets.
Undoubtedly, one of the most important goals of applying ML in the medical field is to generate new knowledge, providing the medical community with tools to develop novel points of view about any given problem. In our case, for example, although previous medical studies determined possible ranges of low and high BPV measured with ARV through statistical methods (median and quartile analysis) [
We do not claim that our ML analysis using REMED is the ultimate solution for medical diagnosis/prognosis problems from unbalanced datasets, because modifications are still needed to improve REMED’s predictive capacity in terms of sensitivity (≥50%) without significantly deteriorating its specificity. However, we obtained better results than the Naïve Bayes classifier (11.48% sensitivity), which is considered a benchmark algorithm that should be tried in any medical domain before any more advanced method [
The authors are grateful to the referees for their detailed review of the paper and their thoughtful comments. This paper was supported by the Secretaria de Educación Pública, México DF, México (PROMEP/103-5/11/4145). The Maracaibo Aging Study was funded by the Venezuelan Grant FONACIT G-97000726, FundaConCiencia, and by Award no. R01AG036469 from the National Institute on Aging.