Background and Objective. Current cardiovascular disease (CVD) risk models are typically based on traditional laboratory-based predictors. The objective of this research was to identify key risk factors that affect CVD risk prediction and to develop a 10-year CVD risk prediction model using the identified risk factors. Methods. A Cox proportional hazards regression method was applied to generate the proposed risk model. We used the dataset from the Framingham Original Cohort of 5079 men and women aged 30-62 years who had no overt symptoms of CVD at baseline; within the selected cohort, 3189 had a CVD event. Results. A 10-year CVD risk model based on multiple risk factors (age, sex, body mass index (BMI), hypertension, systolic blood pressure (SBP), cigarettes per day, pulse rate, and diabetes) was developed, in which heart rate was identified as a novel risk factor. The proposed model achieved good discrimination and calibration, with a C-index (area under the receiver operating characteristic (ROC) curve) of 0.71 in the validation dataset. We validated the model via statistical and empirical validation. Conclusion. The proposed CVD risk prediction model is based on standard risk factors, which could help reduce the cost and time required for clinical and laboratory tests. Healthcare providers, clinicians, and patients can use this tool to estimate the 10-year risk of CVD for an individual. Heart rate was incorporated as a novel predictor, which extends the predictive ability of existing risk equations.

Cardiovascular disease (CVD) describes various conditions that affect the functioning of heart/cardiovascular [

The majority of cardiovascular-related deaths are premature and preventable, and outcomes can be improved through effective health management, including diet plans, lifestyle interventions, and drug interventions [

In the past decades, a great deal of research has been done on CVD risk estimation, such as the Framingham risk scores from the Framingham Heart Study (FHS) [

However, challenges and issues regarding the development of CVD risk estimation models still exist. CVD risk models [

The study population selected from the Framingham Original Cohort study dataset [

CVD event distribution in male and female.

Sex | Count | CVD Events | Age Range |
---|---|---|---|
Male | 2294 | 1560 | 30 - 74 |
Female | 2785 | 1629 | 30 - 74 |
Total | 5079 | 3189 | 30 - 74 |

There are 32 exams in the Framingham Original Cohort study dataset, as shown in Appendix

Description of candidate predictors.

ORDER | PREDICTOR | UNITS OR CODING | TYPE |
---|---|---|---|
1 | AGE | YEARS | CONTINUOUS |
2 | SEX | 0001 MALE | CATEGORICAL |
3 | BMI | KG/M2 | CONTINUOUS |
4 | HYPERTENSION | 0000 NEGATIVE | CATEGORICAL |
5 | HISTORY OF NERVOUS HEART | 0000 NO | CATEGORICAL |
6 | HISTORY OF PERICARDITIS | 0000 NO | CATEGORICAL |
7 | HISTORY OF OTHER CVD | 0000 NO | CATEGORICAL |
8 | PREMATURE BEATS | 0000 NO | CATEGORICAL |
9 | HISTORY OF ATRIOVENTRICULAR BLOCK | 0000 NO | CATEGORICAL |
10 | HISTORY OF RHEUMATIC FEVER | 0000 NONE | CATEGORICAL |
11 | HISTORY OF ALLERGY OR ASTHMA | 0000 NEGATIVE | CATEGORICAL |
12 | HISTORY OF THYROID DISEASE | 0000 NEGATIVE | CATEGORICAL |
13 | HISTORY OF SUBACUTE ENDOCARDITIS | 0000 NO | CATEGORICAL |
14 | BLOOD PRESSURE SYSTOLIC | MM HG | CONTINUOUS |
15 | BLOOD PRESSURE DIASTOLIC | MM HG | CONTINUOUS |
16 | CIGARETTES PER DAY | LAPSE, FORM 8/50 | CONTINUOUS |
17 | CIGARS PER DAY | LAPSE, FORM 8/50 | CONTINUOUS |
18 | PIPES PER DAY | LAPSE, FORM 8/50 | CONTINUOUS |
19 | PULSE RATE | PER MINUTE | CONTINUOUS |
20 | DIABETES | 0000 NO | CATEGORICAL |

Cox proportional hazard regression analysis [

Statistical analyses were performed in R Studio platform [

For candidate predictors listed in Table

In the validation stage, two approaches were used to assess the predictive ability of the fitted model: statistical validation and empirical validation. Statistical validation was performed with respect to both discrimination and calibration. Empirical validation was defined as an empirical comparison with a general CVD risk prediction model (the Framingham office-based risk equation [

The risk factors included in the risk model are age, sex, body mass index (BMI), hypertension, systolic blood pressure (SBP), cigarettes per day, pulse rate, and diabetes status. Characteristics of the risk factors are listed in Table

Summary statistics for risk factors used in risk model.

Predictors | Variables | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|---|---|
AGE | Age | 28 | 37 | 44 | 44.15 | 51 | 74 |
SEX | Sex | 1 | 1 | 2 | 1.548 | 2 | 2 |
BMI | Bmi | 14.12 | 22.66 | 25.17 | 25.61 | 27.92 | 56.68 |
HYPERTENSION | Hyp | 0 | 0 | 0 | 0.147 | 0 | 1 |
BLOOD PRESSURE SYSTOLIC | Bps | 84 | 122 | 136 | 138.6 | 150 | 270 |
CIGARETTES PER DAY | Cgrpd | 0 | 5 | 20 | 16.26 | 20 | 60 |
PULSE RATE | Pr | 37 | 67 | 75 | 75.61 | 83 | 170 |
DIABETES | Dia | 0 | 0 | 0 | 0.0197 | 0 | 1 |

The regression coefficients, hazard ratios, and their corresponding upper and lower 95% confidence intervals (CI) were estimated, as presented in Table

Regression coefficients and hazard ratios in risk model.

Predictors | Variables | coef | Hazard Ratio | lower .95 | upper .95 |
---|---|---|---|---|---|
AGE | log of age | 2.083643 | 8.033686 | 6.4082 | 10.0716 |
SEX | sex | -0.469719 | 0.625178 | 0.5787 | 0.6754 |
BMI | log of bmi | 0.608864 | 1.838342 | 1.4368 | 2.3521 |
HYPERTENSION | hyp | 0.241461 | 1.273108 | 1.1342 | 1.429 |
BLOOD PRESSURE SYSTOLIC | log of bps | 1.682571 | 5.37937 | 3.7938 | 7.6277 |
CIGARETTES PER DAY | cgrpd | 0.009669 | 1.009716 | 1.0065 | 1.013 |
PULSE RATE | log of pr | -0.30209 | 0.739271 | 0.5879 | 0.9297 |
DIABETES | dia | 1.087501 | 2.96685 | 2.3244 | 3.7869 |
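The relationship between the coefficient and hazard ratio columns can be checked directly, since a hazard ratio is the exponential of its coefficient. The following minimal sketch copies the values from the table above; the function names are illustrative, and the use of the natural logarithm for the "log of" terms is an assumption.

```python
import math

# Coefficients and hazard ratios copied from the regression table above.
COEF = {
    "log(age)": 2.083643,
    "sex": -0.469719,
    "log(bmi)": 0.608864,
    "hyp": 0.241461,
    "log(bps)": 1.682571,
    "cgrpd": 0.009669,
    "log(pr)": -0.30209,
    "dia": 1.087501,
}
REPORTED_HR = {
    "log(age)": 8.033686,
    "sex": 0.625178,
    "log(bmi)": 1.838342,
    "hyp": 1.273108,
    "log(bps)": 5.37937,
    "cgrpd": 1.009716,
    "log(pr)": 0.739271,
    "dia": 2.96685,
}


def hazard_ratio(coef):
    """Hazard ratio for a one-unit increase in the model term."""
    return math.exp(coef)


def hazard_ratio_for_fold_change(coef, fold):
    """Hazard ratio when a log-transformed predictor is multiplied by `fold`.

    Assumes the predictor enters the model as a natural logarithm.
    """
    return fold ** coef


for term, coef in COEF.items():
    print(f"{term:>9}: HR = {hazard_ratio(coef):.6f} (reported {REPORTED_HR[term]})")

# Example: hazard ratio associated with a 10% higher systolic blood pressure.
print(hazard_ratio_for_fold_change(COEF["log(bps)"], 1.10))
```

Because age, BMI, SBP, and pulse rate enter the model on the log scale, their tabulated hazard ratios describe an e-fold (roughly 2.72 times) increase in the raw value rather than a one-unit increase.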

Baseline hazard and survival at 10 years.

Estimate | Covariates at mean value | Covariates equal to zero |
---|---|---|
Baseline hazard estimate | 0.1023354 | 0.001863652 |
Baseline survival estimate | 0.9027267 | 0.9981381 |
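The two rows of this table appear to be linked by the usual identity between cumulative hazard and survival, S0(t) = exp(-H0(t)). The short check below is a sketch under the assumption that the tabulated "baseline hazard" is the cumulative baseline hazard at 10 years; the variable names are illustrative.

```python
import math

# Values copied from the baseline table above.
baseline_hazard = {"mean": 0.1023354, "zero": 0.001863652}
reported_survival = {"mean": 0.9027267, "zero": 0.9981381}

for key, h0 in baseline_hazard.items():
    # Survival implied by a cumulative hazard h0 (assumed interpretation).
    implied = math.exp(-h0)
    print(f"{key}: implied S0 = {implied:.7f}, reported S0 = {reported_survival[key]}")
```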

The Cox model has an exponential form (see Equation (

So, the Cox model can be written as a survival function:

A general formula for computing risk estimates has the following form:

where H(t) is the CVD risk estimated for an individual; S0(t) is baseline survival rate at follow-up time t, where t = 10 years (see Table
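For reference, these relationships can be sketched in standard Cox notation. The symbols $h_0(t)$, $\beta_i$, and $X_i$ are notational assumptions and may differ from the symbols used in the original equations; $H(t)$ and $S_0(t)$ follow the definitions given in the text.

$$
\begin{aligned}
h(t \mid X) &= h_0(t)\,\exp\Big(\sum_{i=1}^{p} \beta_i X_i\Big) \\
S(t \mid X) &= S_0(t)^{\exp\left(\sum_{i=1}^{p} \beta_i X_i\right)} \\
H(t) &= 1 - S(t \mid X) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{p} \beta_i X_i\right)}
\end{aligned}
$$

If $S_0(t)$ is taken as the baseline survival at mean covariate values rather than at zero, each $X_i$ in the exponent is replaced by $X_i - \bar{X}_i$, where $\bar{X}_i$ is the cohort mean of the (transformed) predictor. This centred form is the one commonly used in Framingham-style risk functions.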

A nomogram is a two-dimensional diagram to represent a mathematical function involving several predictors [
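One common way to build the point scales of a nomogram from a fitted linear model is to rescale each predictor's contribution so that the predictor with the widest contribution range spans 0 to 100 points. The sketch below applies that convention to the coefficients and the minimum and maximum values reported in the regression and summary tables. The scaling rule, the natural-log transform, and the function names are assumptions for illustration, not a description of how the published nomogram was drawn.

```python
import math

# name: (coefficient, transform, observed minimum, observed maximum)
# Coefficients and ranges are copied from the tables in this paper.
PREDICTORS = {
    "age":   (2.083643, math.log, 28, 74),
    "sex":   (-0.469719, float, 1, 2),
    "bmi":   (0.608864, math.log, 14.12, 56.68),
    "hyp":   (0.241461, float, 0, 1),
    "bps":   (1.682571, math.log, 84, 270),
    "cgrpd": (0.009669, float, 0, 60),
    "pr":    (-0.30209, math.log, 37, 170),
    "dia":   (1.087501, float, 0, 1),
}


def contribution_span(coef, transform, low, high):
    """Width of a predictor's contribution to the linear predictor."""
    return abs(coef * (transform(high) - transform(low)))


# The predictor with the widest span defines the 100-point scale.
MAX_SPAN = max(contribution_span(*spec) for spec in PREDICTORS.values())


def points(name, value):
    """Nomogram points for one predictor value (0 at the lowest-risk end)."""
    coef, transform, low, high = PREDICTORS[name]
    lowest_risk_value = low if coef > 0 else high
    delta = coef * (transform(value) - transform(lowest_risk_value))
    return 100.0 * delta / MAX_SPAN


for name, (_, _, low, high) in PREDICTORS.items():
    print(f"{name:>5}: {points(name, low):6.1f} points at {low}, "
          f"{points(name, high):6.1f} points at {high}")
```

Summing the points over all predictors gives a total score that is a linear rescaling of the Cox linear predictor, which is why a nomogram can map the total directly onto a survival probability axis.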

Nomogram for predicting overall survival in 10 years.

In Figure

The validation of the proposed predictive risk model was performed using traditional statistics. C-index (also called receiver operating characteristic (ROC) area) [
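As an illustration of what the C-index measures, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is a minimal version that skips pairs with tied event times, and the toy data are hypothetical; the exact estimator used in the study is not specified here.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time for each subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        earlier, later = (i, j) if times[i] < times[j] else (j, i)
        if not events[earlier]:
            continue  # the earlier subject was censored, so order is unknown
        comparable += 1
        if risk_scores[earlier] > risk_scores[later]:
            concordant += 1.0
        elif risk_scores[earlier] == risk_scores[later]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.71 means that in roughly 71% of comparable pairs the subject with the earlier event had the higher predicted risk.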

We then performed an empirical validation by comparing our risk model with the Framingham Heart Study (FHS) model on an external dataset, both horizontally and longitudinally over time. In the horizontal validation, the external dataset contained 2786 samples, of which 1693 had a CVD event. Risk scores were computed separately using the FHS model and the proposed risk model. Statistics of

Horizontal comparison between Cox model and FHS model.

In the longitudinal validation process, we selected four sex-specific subjects with or without CVD at the end of the Framingham Study. A summary of these four subjects is listed in Table

Data summary for samples in the longitudinal validation.

Samples | Gender | CVD | Diabetes |
---|---|---|---|
Sample 1 | Male | | |
Sample 2 | Male | ✓ | ✓ |
Sample 3 | Female | | |
Sample 4 | Female | ✓ | ✓ |

Exams in the Framingham Original Cohort study data set.

Exams | Exam Date Range | Age Range | Mean Age | Attendees |
---|---|---|---|---|
Exam 1 | 1948 - 1953 | 28 - 74 | 44 | 5209 |
Exam 2 | 1950 - 1955 | 31 - 65 | 46 | 4792 |
Exam 3 | 1952 - 1956 | 32 - 67 | 48 | 4416 |
Exam 4 | 1954 - 1958 | 34 - 69 | 50 | 4541 |
Exam 5 | 1956 - 1960 | 37 - 70 | 52 | 4421 |
Exam 6 | 1958 - 1963 | 38 - 72 | 54 | 4259 |
Exam 7 | 1960 - 1964 | 40 - 74 | 55 | 4191 |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
Exam 14 | 1975 - 1978 | 55 - 88 | 68 | 2871 |
Exam 15 | 1977 - 1979 | 57 - 89 | 69 | 2632 |
Exam 16 | 1979 - 1982 | 59 - 91 | 70 | 2351 |
Exam 17 | 1981 - 1984 | 61 - 93 | 72 | 2179 |
Exam 18 | 1983 - 1985 | 63 - 94 | 74 | 1825 |
Exam 19 | 1985 - 1988 | 65 - 96 | 75 | 1541 |
Exam 20 | 1986 - 1990 | 67 - 97 | 77 | 1401 |
Exam 21 | 1988 - 1992 | 69 - 99 | 79 | 1319 |
Exam 22 | 1990 - 1994 | 72 - 101 | 80 | 1166 |
Exam 23 | 1992 - 1996 | 73 - 101 | 81 | 1026 |
Exam 24 | 1995 - 1998 | 76 - 103 | 83 | 831 |
Exam 25 | 1997 - 1999 | 78 - 104 | 84 | 703 |
Exam 26 | 1999 - 2001 | 79 - 103 | 86 | 558 |
Exam 27 | 2002 - 2003 | 82 - 104 | 87 | 414 |
Exam 28 | 2004 - 2005 | 84 - 104 | 89 | 303 |
Exam 29 | 2006 - 2007 | 85 - 102 | 91 | 218 |
Exam 30 | 2008 - 2010 | 88 - 102 | 92 | 141 |
Exam 31 | 2010 - 2011 | 90 - 99 | 92 | 91 |
Exam 32 | 2012 - 2014 | 93 - 106 | 96 | 40 |

For each sample, data at fixed time intervals (approximately two years) were extracted from the longitudinal follow-up. Data from five exams (Exam 8, Exam 9, Exam 10, Exam 11, and Exam 12) were used for comparison. Data summaries for sample 1, sample 2, sample 3, and sample 4 are listed in Appendix

Longitudinal validation.

It is widely accepted that CVD has become one of the most significant public health issues globally [

Motivated by the objective of early detection and risk estimation of CVD, the present study was designed to identify novel CVD risk factors, determine the effect of these factors, and then develop a risk prediction model based on the identified factors. Although risk factors could vary from one specific CVD component to another, there is sufficient evidence that different types of CVD have commonalities of risk factors. We developed and validated a 10-year risk equation for CVD risk using follow-up data rigorously measured by the Framingham Heart Study.

This investigation extends the set of risk factors used in previous general CVD risk formulations by incorporating heart rate to estimate absolute CVD risk. The approach used in this research is based on advanced statistical techniques that help reduce bias in the assessment of true CVD risk. The whole data analysis process strictly follows the guidelines of regression modelling strategies and survival analysis [

We use continuous variables (age, BMI, SBP, and pulse rate) to generate the model that performs better than other similar models developed using categorical variables. Compared with simpler approaches that try to make inferences of 5-year and 10-year risk models such as the model based on logistic regression analysis [

The old version Framingham general CVD risk function [

Risk models formulated by using machine learning or data mining techniques have incorporated heart rate as a risk factor but tools that can predict CVD absolute risk are fewer. For example, a prediction tool [

Some equations only focused on specific CVD outcomes. The Europe SCORE project equations were developed for the fatal cardiovascular event [

Moreover, compared with laboratory-based algorithms, the present research proposes a more straightforward way to estimate 10-year CVD risk from non-laboratory risk factors. Individuals can assess their CVD risk during an office visit, or by monitoring the risk factors in the model themselves, either manually or with devices such as wearable sensors.

The CVD risk prediction model could be implemented in primary care for population analysis and for identifying high-risk individuals. This would transform the healthcare management of CVD at both the individual and the population level. However, because the number of diabetes events is small, caution must be applied when using this risk model in practice. Although we used multiple imputation to fill in the missing values for diabetes, the original data are imbalanced, so the imputed values for “diabetes” may remain imbalanced as well. More advanced imputation methods should be considered in the future to avoid unexpected outcomes caused by this imbalance.

Our research aims to provide a CVD prediction model based on key risk factors so that it can be used at the point of care for better-informed decision making. Risk factors that require a clinical test, such as total cholesterol and HDL cholesterol, were therefore not included, although some of them have a substantial effect on the development of CVD. We have provided a valid framework for creating a risk model using Cox regression; future work should consider adding the risk factors not currently included in our model.

The proposed study devised a risk prediction model based on multivariable predictors. A novel risk factor, “heart rate”, was incorporated into the risk equation alongside conventional risk factors. A satisfactory predictive ability, with a C-index (AUROC) of 0.71, was obtained, which supports the accuracy of the estimated risk scores. In contrast to studies focusing on specific diseases, the proposed algorithm can be applied to measure the 10-year risk of CVD. Health care professionals, public health physicians, practice managers, and individuals can use the proposed model to quantify risk at the population level or during patient consultation, and to identify high-risk individuals across an entire practice for further preventive health care.

See Table

See Tables

Exam data for Sample 1: male without CVD.

Exams | age | bmi | bps | pr | cgrpd | trt | hyp | dia | smk |
---|---|---|---|---|---|---|---|---|---|
Exam 8 | 44 | 26.386894 | 120 | 82 | 40 | 0 | 0 | 0 | 1 |
Exam 9 | 45 | 26.826676 | 120 | 80 | 0 | 0 | 0 | 0 | 0 |
Exam 10 | 47 | 27.467643 | 118 | 70 | 20 | 0 | 0 | 0 | 1 |
Exam 11 | 49 | 28.222249 | 110 | 76 | 44 | 0 | 0 | 0 | 1 |
Exam 12 | 52 | 28.675012 | 110 | 80 | 50 | 0 | 0 | 0 | 1 |

Exam data for Sample 2: male with CVD and diabetes.

Exams | age | bmi | bps | pr | cgrpd | trt | hyp | dia | smk |
---|---|---|---|---|---|---|---|---|---|
Exam 8 | 45 | 27.74258 | 132 | 83 | 20 | 0 | 0 | 0 | 1 |
Exam 9 | 47 | 26.26118 | 124 | 80 | 20 | 0 | 0 | 0 | 1 |
Exam 10 | 49 | 27.664352 | 130 | 78 | 20 | 0 | 1 | 0 | 1 |
Exam 11 | 51 | 27.121914 | 130 | 90 | 20 | 0 | 1 | 0 | 1 |
Exam 12 | 53 | 24.816551 | 122 | 82 | 20 | 0 | 0 | 1 | 1 |

Exam data for Sample 3: female without CVD.

Exams | age | bmi | bps | pr | cgrpd | trt | hyp | dia | smk |
---|---|---|---|---|---|---|---|---|---|
Exam 8 | 44 | 20.776333 | 110 | 70 | 20 | 0 | 0 | 0 | 1 |
Exam 9 | 46 | 20.265439 | 120 | 70 | 20 | 0 | 0 | 0 | 1 |
Exam 10 | 48 | 22.312012 | 118 | 73 | 20 | 0 | 0 | 0 | 1 |
Exam 11 | 50 | 21.797119 | 114 | 82 | 20 | 0 | 0 | 0 | 1 |
Exam 12 | 52 | 21.797119 | 130 | 76 | 20 | 0 | 0 | 0 | 1 |

Exam data for Sample 4: female with CVD and diabetes.

Exams | age | bmi | bps | pr | cgrpd | trt | hyp | dia | smk |
---|---|---|---|---|---|---|---|---|---|
Exam 8 | 46 | 21.793044 | 130 | 65 | 3 | 0 | 1 | 0 | 1 |
Exam 9 | 48 | 21.967388 | 170 | 75 | 16 | 0 | 1 | 0 | 1 |
Exam 10 | 50 | 22.494583 | 140 | 60 | 8 | 0 | 1 | 0 | 1 |
Exam 11 | 53 | 22.31746 | 140 | 63 | 8 | 0 | 1 | 0 | 1 |
Exam 12 | 54 | 23.380197 | 160 | 58 | 2 | 1 | 1 | 1 | 1 |

Here, we take a specific subject to illustrate the process of risk score calculation. The subject is a 44-year-old man without diabetes or hypertension. He has a systolic blood pressure of 120 mm Hg, a pulse rate of 82 per minute, and a BMI of 26.38689413 kg/

Data summary for the subject 15018644.

PREDICTORS | VALUES | UNITS |
---|---|---|
AGE | 44 | YEARS |
SEX | 1 | MALE |
BMI | 26.38689413 | KG/M2 |
HYPERTENSION | 0 | NO |
TREATMENT OF HYPERTENSION | 0 | NO |
BLOOD PRESSURE SYSTOLIC | 120 | MM HG |
CIGARETTES PER DAY | 40 | LAPSE |
SMOKING | 1 | YES |
PULSE RATE | 82 | PER MINUTE |
DIABETES | 0 | NO |

The risk estimate based on the Cox model is calculated as follows:
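A rough sketch of this calculation in Python is given here, using the coefficients from the regression table, the baseline survival at mean covariate values (0.9027267), and the subject's values from the table above. Two assumptions should be noted: the "log of" terms are taken to be natural logarithms, and the reference linear predictor is approximated from the reported raw means (the log of the mean rather than the mean of the logs, which is not reported). The resulting figure is therefore illustrative and may differ from the value obtained with the original equation.

```python
import math

# Coefficients copied from the regression table.
COEF = {
    "age": 2.083643, "sex": -0.469719, "bmi": 0.608864, "hyp": 0.241461,
    "bps": 1.682571, "cgrpd": 0.009669, "pr": -0.30209, "dia": 1.087501,
}
# Predictors that enter the model on the log scale (assumed natural log).
LOG_TERMS = {"age", "bmi", "bps", "pr"}
# Cohort means copied from the summary statistics table.
MEANS = {
    "age": 44.15, "sex": 1.548, "bmi": 25.61, "hyp": 0.147,
    "bps": 138.6, "cgrpd": 16.26, "pr": 75.61, "dia": 0.0197,
}
# 10-year baseline survival with covariates at their mean values.
S0_MEAN = 0.9027267


def linear_predictor(x):
    """Sum of coefficient times (transformed) predictor value."""
    return sum(
        COEF[k] * (math.log(x[k]) if k in LOG_TERMS else x[k]) for k in COEF
    )


def ten_year_risk(x):
    """Mean-centred Cox risk estimate: 1 - S0 ** exp(lp - lp_mean).

    lp_mean is approximated from raw means, which is an assumption.
    """
    centred = linear_predictor(x) - linear_predictor(MEANS)
    return 1.0 - S0_MEAN ** math.exp(centred)


# Subject values from the data summary table (sex: 1 = male).
subject = {
    "age": 44, "sex": 1, "bmi": 26.38689413, "hyp": 0,
    "bps": 120, "cgrpd": 40, "pr": 82, "dia": 0,
}
risk = ten_year_risk(subject)
print(f"Estimated 10-year CVD risk: {risk:.1%}")
```

By construction, a subject whose predictors all equal the cohort means receives a risk of 1 - 0.9027267, which is a convenient sanity check for any implementation of the centred form.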

The cardiovascular disease (CVD) data used to support the findings of this study were supplied by the Framingham Heart Study-Cohort (FHS-Cohort) under license and so cannot be made freely available. Requests for access to these data should be made to the Open BioLINCC Studies Group through this website

The main contribution of the present study is the development of a risk prediction model for early detection of CVD. More specifically, the contribution can be summarized in four respects: firstly, a novel risk factor, “heart rate”, was identified as significant for the development of CVD; secondly, a CVD risk prediction model aimed at early detection of CVD was developed based on various risk factors; thirdly, an absolute 10-year CVD risk score can be calculated using this risk model; lastly, multiple forms of CVD risk estimation, namely a risk equation and a nomogram, were also developed.

The authors declare no conflicts of interest.

All authors contributed equally.