Development and Validation of a Novel Model for Predicting the 5-Year Risk of Type 2 Diabetes in Patients with Hypertension: A Retrospective Cohort Study

Background. Hypertension is now common in China. Patients with both hypertension and type 2 diabetes (T2D) are prone to severe cardiovascular complications and poor prognosis. This study therefore aimed to establish an effective risk prediction model for the early prediction of new-onset diabetes in patients with a history of hypertension.
Methods. A LASSO regression model was used to select potentially relevant features. Univariate and multivariate Cox regression analyses were used to determine independent predictors. Based on the results of the multivariate analysis, a nomogram of the 5-year incidence of T2D in patients with hypertension in mainland China was established. Model performance was assessed by Harrell's C-index, the area under the ROC curve (AUC), calibration plots, and decision curve analysis.
Results. After random sampling, 1273 and 415 patients with hypertension were included in the derivation and validation cohorts, respectively. The prediction model included age, body mass index (BMI), fasting plasma glucose (FPG), and total cholesterol (TC) as predictors. In the derivation cohort, the AUC value and C-index of the prediction model were 0.878 (95% CI, 0.861-0.895) and 0.862 (95% CI, 0.830-0.894), respectively. In the validation cohort, the AUC value and C-index of the prediction model were 0.855 (95% CI, 0.836-0.874) and 0.841 (95% CI, 0.817-0.865), respectively. The calibration plots demonstrated good agreement between the estimated probability and the actual observation. Decision curve analysis showed that the nomogram was clinically useful.
Conclusion. Our nomogram can be used as a simple, affordable, reasonable, and widely implementable tool to predict the 5-year T2D risk of patients with hypertension in mainland China. Its application can support timely intervention to reduce the incidence of T2D in patients with hypertension in mainland China.


Introduction
Type 2 diabetes (T2D) is a form of hyperglycemia caused by insulin resistance and relative insulin deficiency, accounting for 90%-95% of all diabetes [1]. The growing burden of T2D has become a focus of global public health attention in the 21st century [2]. Previously, T2D was most common in rich "western" developed countries [3]. At present, however, T2D has become an epidemic in developing countries [4]. China is among the most populous countries in the world and has the most diabetes cases of any country, with an estimated 120.9 million adults suffering from diabetes [5]. T2D patients bear high medical expenses every year. It is estimated that the global medical expenditure for T2D in 2017 was approximately $850 billion [6]. In addition, T2D can cause a variety of complications, such as diabetic nephropathy, cardiovascular and cerebrovascular diseases, and diabetic retinopathy [3]. T2D has numerous causes. Although the specific causes are not clear, a large number of studies have pointed out that T2D may be related to lifestyle, genetic susceptibility, and environmental factors [4].
High blood pressure is the leading single risk factor for the global burden of disease, and 9.4 million people worldwide die of hypertension every year [7]. Hypertension and T2D often coexist, and about 70% of T2D patients also have hypertension [8]. Patients with a history of hypertension are nearly 2.5 times more likely to develop T2D than people without hypertension [9]. Compared with patients with hypertension or T2D alone, patients with both hypertension and T2D have an increased risk of all-cause and cardiovascular death [10]. Analysis of the Framingham cohort showed that patients with elevated blood pressure at the time of T2D diagnosis had higher rates of cardiovascular disease (CVD) and all-cause mortality than patients without T2D [11].
The greatly increased prevalence of diabetes in patients with a history of hypertension has caused huge economic costs and serious complications. Therefore, the purpose of this study is to establish an effective risk prediction model that provides early prediction of the risk of new-onset diabetes in patients with a history of hypertension, so that timely intervention measures can prevent or delay the occurrence of T2D and ultimately reduce adverse cardiovascular outcomes in such patients.

Study Population and Follow-Up Evaluation.
The data for this study can be downloaded from a shared database called the Dryad Digital Repository (http://www.datadryad.org/), with an identifier of 10.5061/dryad.ft8750v. The database is funded by the National Science Foundation of the United States. It stores high-quality data resources so that the data behind scientific publications can be discovered, reused, and cited. Its goal is to form an academic exchange system with academic groups, publishing, research and education institutions, funding agencies, and other stakeholders to coordinate, maintain, and promote the protection and reuse of basic data in academic literature. The original data were provided by Chen et al. [12]. The raw data were extracted from a computer database established by Rich Healthcare Group, which includes all medical records of participants who underwent medical examinations in 32 locations and 11 cities in China from 2010 to 2016. All participants were older than 20 years of age and had at least two visits between 2010 and 2016 (n = 685277). Participants were excluded at baseline if any of the following conditions was met: no available weight and height measurements; no available gender information; no available blood pressure values; missing questionnaire data (including smoking status, drinking status, and family history of diabetes); extreme body mass index (BMI) values (<15 kg/m² or >55 kg/m²); or no available value for fasting plasma glucose (FPG), serum triglycerides (TG), total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), alanine aminotransferase (ALT), aspartate aminotransferase (AST), blood urea nitrogen (BUN), creatinine clearance (CCR), or high-density lipoprotein cholesterol (HDL-C). We further excluded subjects who were followed up for less than 2 years; subjects diagnosed at baseline with diabetes, coronary heart disease, heart failure, or valvular heart disease; and subjects with uncertain diabetes status at follow-up.
A total of 1688 baseline patients with complete information and a history of hypertension but no diabetes were obtained. All subjects underwent at least two follow-up visits at intervals of ≥2 years between 2010 and 2016. We downloaded the raw data and performed a secondary analysis. To establish and validate the prediction model, simple random sampling was used to assign 75% of the patients to the derivation cohort and the remaining 25% to the validation cohort.
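The random split described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name `split_cohort`, the seed, and the use of integer patient IDs are assumptions. An exact 75% cut of 1688 patients gives 1266 and 422, which differs slightly from the 1273 and 415 reported, so the sketch does not reproduce the published cohorts.

```python
import random

def split_cohort(patient_ids, derivation_fraction=0.75, seed=2020):
    """Randomly partition patient IDs into derivation and validation cohorts."""
    rng = random.Random(seed)  # fixed (hypothetical) seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_derivation = round(len(shuffled) * derivation_fraction)
    return shuffled[:n_derivation], shuffled[n_derivation:]

# Hypothetical IDs 0..1687 stand in for the 1688 eligible patients.
derivation, validation = split_cohort(range(1688))
print(len(derivation), len(validation))
```

Fixing the seed keeps the cohorts stable across reruns, which matters because every downstream estimate depends on the split.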

Variable Measurement.
Trained staff obtained detailed demographic characteristics, lifestyle, past medical history, and family history of chronic diseases by administering a detailed questionnaire to each subject. Height, weight, and blood pressure were measured by trained staff. Height was measured to the nearest 0.1 cm and weight to the nearest 0.1 kg using a height and weight scale. The scale was calibrated before use, and participants wore light clothing and no shoes. Body mass index (BMI) was calculated as weight (kg)/height² (m²). After at least 5 minutes of rest, blood pressure was measured with a standard mercury sphygmomanometer. Prior to each visit, the respondents fasted for at least 10 hours overnight. Venous blood samples were collected after fasting and processed within 2 hours. Serum TG, TC, LDL-C, HDL-C, ALT, AST, BUN, and CCR were measured on a Beckman 5800. Plasma glucose levels were measured on an automatic analyzer (Beckman 5800) by the glucose oxidase method.
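As a small worked example of the BMI calculation and the 15-55 kg/m² eligibility range stated in the exclusion criteria, the following sketch may help. The function names and the example measurements are hypothetical.

```python
def body_mass_index(weight_kg, height_cm):
    """BMI = weight (kg) / height^2 (m^2); height is recorded in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def within_bmi_range(bmi, lower=15.0, upper=55.0):
    """True when the BMI is not 'extreme' (<15 or >55 kg/m^2) per the exclusion criteria."""
    return lower <= bmi <= upper

# Hypothetical participant: 74.0 kg, 170.0 cm.
bmi = body_mass_index(weight_kg=74.0, height_cm=170.0)
print(round(bmi, 1), within_bmi_range(bmi))
```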

Data Collection.
The variables for each case were extracted from the raw data as follows: age, gender, fasting plasma glucose (FPG), BMI, systolic blood pressure (SBP), diastolic blood pressure (DBP), TC, TG, HDL-C, LDL-C, ALT, AST, BUN, CCR, smoking status, drinking status, family history of diabetes, years of follow-up, and eventual diagnosis of diabetes. Incomplete records were excluded.

Definitions.
Hypertension was defined as SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg. Smoking status was divided into three categories: never smoker, ever smoker, or current smoker. Drinking status was divided into three categories: never drinker, ever drinker, or current drinker. Family history of diabetes was classified as positive or negative. Incident diabetes was defined as FPG ≥ 7.00 mmol/L and/or self-reported diabetes during the follow-up period. Patients were censored at the time of diagnosis of diabetes or the last visit, whichever came first.
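The definitions above translate into simple rules. The sketch below is illustrative only: the function names and the visit tuple layout (years since baseline, FPG in mmol/L, self-reported diabetes) are assumptions, not the structure of the Dryad dataset.

```python
def has_hypertension(sbp_mmhg, dbp_mmhg):
    """SBP >= 140 mmHg and/or DBP >= 90 mmHg."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90

def incident_diabetes(fpg_mmol_l, self_reported):
    """FPG >= 7.00 mmol/L and/or self-reported diabetes at a follow-up visit."""
    return fpg_mmol_l >= 7.0 or self_reported

def follow_up_outcome(visits):
    """Return (time, event) for one patient.

    visits: list of (years_since_baseline, fpg_mmol_l, self_reported),
    sorted by time. Follow-up ends at the first visit meeting the diabetes
    definition, otherwise at the last visit (event = False).
    """
    for years, fpg, self_reported in visits:
        if incident_diabetes(fpg, self_reported):
            return years, True
    return visits[-1][0], False

# Hypothetical patient who crosses the FPG threshold at the second visit.
print(follow_up_outcome([(1.0, 5.4, False), (2.1, 7.2, False), (3.0, 6.0, False)]))
```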

Feature Selection.
The least absolute shrinkage and selection operator (LASSO) logistic regression algorithm is a penalized regression method. It estimates the regression coefficients by maximizing the log-likelihood function while constraining the sum of the absolute values of the regression coefficients [13]. The coefficient vector estimated by LASSO is sparse, with many components exactly 0 [14]. Therefore, the LASSO automatically removes unnecessary covariates. The LASSO logistic regression algorithm can be used for regression analysis of high-dimensional data [15]. In this study, the LASSO logistic regression algorithm was used to select the most important prediction features in the derivation data set [16]. All categorical variables were converted to dummy variables. T2D status was used as the dependent variable. Cross-validation was used to determine the appropriate tuning parameter (λ) for LASSO logistic regression [13,15].
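For reference, the penalized form of the LASSO logistic regression objective can be written as below. The notation is an assumption introduced here for illustration: y_i is the T2D status of patient i, x_i the vector of p candidate features, n the number of patients in the derivation data set, and λ the tuning parameter chosen by cross-validation.

```latex
\[
(\hat{\beta}_0, \hat{\beta})
= \arg\min_{\beta_0,\,\beta}
\left\{
-\frac{1}{n}\sum_{i=1}^{n}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
- \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
\]
```

The first term is the negative average log-likelihood of the logistic model, and the second is the L1 penalty. Larger values of λ shrink more coefficients to exactly zero, which is why the method doubles as a feature selector.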

Statistical Analysis.
Continuous variables with a normal distribution are expressed as mean ± standard deviation, and categorical variables are expressed as frequency (percentage). In the derivation cohort, multivariate Cox proportional hazards regression analysis was used to further evaluate the factors that were statistically significant in the univariate analysis. The features selected in the derivation data set using the LASSO algorithm were used to construct a nomogram through a multivariate Cox regression analysis. The nomogram presents the risk prediction model of new-onset T2D in patients with hypertension. The prediction model was evaluated from three aspects: discrimination, calibration, and clinical usefulness. The nomogram was verified internally in the derivation cohort and externally in the validation cohort. Harrell's concordance index (C-index) was used to evaluate the discrimination of the nomogram [17]. The C-index ranges from 0.5 to 1.0, where 0.5 represents random chance and 1.0 represents perfect discrimination. In general, a C-index > 0.7 was considered to indicate excellent discrimination [18]. The area under the ROC curve (AUC) was also used to evaluate the discrimination of the nomogram. In the regression model, the AUC value was similar to the C-index, and an AUC value > 0.7 was considered to indicate good discrimination. Calibration was evaluated with the calibration plot and the Hosmer-Lemeshow test [19]. The bootstrap method with 1000 resamples was applied to estimate the C-index, AUC value, and calibration curve [20]. Decision curve analysis (DCA) was used to evaluate the clinical usefulness of the nomogram based on its net benefit at different threshold probabilities in the validation cohort. All tests were two-tailed, and a P value of < 0.05 was considered statistically significant.
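To make the C-index concrete, the following is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is not the authors' R code; the function name and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    A pair (i, j) is comparable when patient i had the event and patient j
    was followed for longer. It is concordant when i also has the higher
    predicted risk; tied risk scores count as half. Tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on a patient who had the event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: follow-up in years, event indicator, predicted risk.
times = [2.0, 3.1, 4.0, 5.0, 5.5]
events = [True, False, True, False, False]
risks = [0.9, 0.2, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)
print(round(c_index, 3))
```

In the toy data, five of the six comparable pairs are ordered correctly, so the C-index is 5/6. A bootstrap estimate, as used in the study, would repeat this calculation on resampled patients.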
All statistical analyses were carried out using R software (http://www.r-project.org/) with the R base package.
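The net benefit used in decision curve analysis can be illustrated with a short sketch. It treats the outcome as binary (T2D by the end of follow-up, yes or no), which is a simplification relative to a time-to-event analysis; the function names and toy data are hypothetical.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(outcomes)
    tp = sum(1 for y, p in zip(outcomes, predicted_risks) if p >= threshold and y)
    fp = sum(1 for y, p in zip(outcomes, predicted_risks) if p >= threshold and not y)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * weight

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene on every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy data: 1 = developed T2D, 0 = did not.
outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
risks = [0.6, 0.1, 0.3, 0.4, 0.05, 0.2, 0.1, 0.15, 0.25, 0.05]
nb_model = net_benefit(outcomes, risks, threshold=0.2)
nb_all = net_benefit_treat_all(outcomes, threshold=0.2)
print(round(nb_model, 3), round(nb_all, 3))
```

A decision curve plots these quantities over a range of thresholds; a model is considered useful at thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).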

Characteristics of Study Population.
A total of 1688 patients participated in our study, and 103 of them developed diabetes. The median follow-up time for all participants in this study was 3.0 years (range: 2.0-5.7 years). Eligible participants were randomly split into a derivation cohort (n = 1273) and a validation cohort (n = 415). The ages of the derivation group and the validation group were 48.09 ± 12.91 years and 48.97 ± 13.54 years, respectively. The BMI of the derivation group and the validation group were 25.57 ± 3.29 kg/m² and 25.55 ± 3.36 kg/m², respectively. The FPG of the derivation group and the validation group were 5.23 ± 0.67 mmol/L and 5.25 ± 0.68 mmol/L, respectively. The average follow-up time of the derivation and the validation groups was 1078 days and 1087 days, respectively. There were 81 and 22 T2D cases in the derivation and validation cohorts, respectively. There were no statistical differences between the two groups in FPG, follow-up time, incidence of T2D, age, gender, BMI, TC, TG, LDL-C, ALT, AST, BUN, CCR, smoking status, drinking status, and family history. The baseline characteristics of the derivation and validation cohorts are set out in Table 1.

Feature Selection.
The LASSO logistic regression method was employed to select the most significant prediction features in the prediction model. In this study, feature selection was carried out based on the derivation dataset. In total, 20 features were used in LASSO logistic regression. Moreover, eight features with nonzero coefficients were selected by the LASSO logistic regression algorithm with an optimal λ of 0.0212 (Figures 1(a) and 1(b)). These eight features include age, BMI, FPG, TC, HDL-C, smoking status, drinking status, and family history.

Independent Risk Factors in the Derivation Cohort.
In patients with a history of hypertension, the variables identified as predictors of incident T2D are listed in Table 2. In the derivation cohort, univariate Cox regression analysis showed that age, BMI, FPG, TC, TG, HDL-C, CCR, smoking status, drinking status, and family history were significant risk factors for T2D in addition to gender, LDL-C, ALT, AST, and BUN. Multivariate Cox regression analysis showed that age, BMI, FPG, and TC were independent risk predictors for the development of T2D in patients with a history of hypertension and could therefore be used to establish a nomogram.
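As a reminder of how a multivariate Cox model with these four predictors yields a 5-year risk, the general form is sketched below. The coefficients β and the baseline survival S_0(5) are symbols only; their fitted values are not reported in this section, and the notation is introduced here for illustration.

```latex
\[
h(t \mid x) = h_0(t)\,\exp(\mathrm{LP}),
\qquad
\mathrm{LP} = \beta_{\mathrm{age}}\,\mathrm{age}
            + \beta_{\mathrm{BMI}}\,\mathrm{BMI}
            + \beta_{\mathrm{FPG}}\,\mathrm{FPG}
            + \beta_{\mathrm{TC}}\,\mathrm{TC}
\]
\[
\text{5-year risk of T2D} = 1 - S_0(5)^{\exp(\mathrm{LP})}
\]
```

Here h_0(t) is the baseline hazard and S_0(5) the baseline probability of remaining free of T2D at 5 years. A nomogram is a graphical version of this calculation: each predictor's contribution to the linear predictor LP is rescaled to points, and the total points are mapped to the risk.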

Nomogram Construction and Performance Assessment.
As shown in Figure 2, a nomogram was constructed in the derivation cohort to provide a quantitative and convenient tool for predicting the 5-year incidence of T2D in patients with hypertension from age, BMI, FPG, and TC. To assess the 5-year risk of T2D in a hypertensive patient, the patient's value is located on each variable axis. A vertical line is drawn from that value to the points scale to determine the number of points assigned to that variable value. The points for all variables are then summed. The sum is located on the total points axis and projected vertically onto the bottom axis to obtain the individualized risk of

Discussion
With economic development and rising living standards, T2D is on the rise worldwide and is thought to be the third largest threat to human health, after cancer and cardiovascular disease [21]. T2D and its complications are among the major economic burdens of the present era [22]. China has the largest number of patients with diabetes in the world. According to the International Diabetes Federation, the annual cost of diabetes in China is $25 billion [23]. It is estimated that these costs will continue to increase and will exceed $47 billion by 2030 [24]. A large number of studies have shown that hypertension is associated with T2D [25]. T2D is more common in patients with hypertension than in patients without a history of hypertension. The coexistence of diabetes mellitus (DM) and hypertension significantly increases the risk of coronary heart disease, left ventricular hypertrophy, congestive heart failure, and stroke compared to either condition alone [10]. In addition, both hypertension and DM are present in all prediction models for the occurrence of stroke in patients with atrial fibrillation [26]. Microvascular complications are also more common in patients with coexistent hypertension and DM, and both retinopathy and nephropathy are more prevalent in patients with DM and hypertension [10]. Therefore, primary prevention and timely intervention are the keys to preventing or delaying the onset of T2D in patients with hypertension [11]. Early detection of people at high risk for diabetes is critical to reducing morbidity, which led us to this study.
In this community-based cohort study, we developed a simple, quantifiable nomogram to predict the 5-year T2D risk of patients with hypertension in mainland China. In both the derivation and validation cohorts, our model showed high prediction accuracy, a relatively high C-index, and excellent agreement in the calibration curves. As far as we know, this is the first nomogram to estimate the risk of type 2 diabetes in hypertensive patients in mainland China using continuous rather than categorized values. In addition, the nomogram should be of great practical value because its parameters are easily obtained. In this study, we developed a nomogram based on a large-scale multicenter population in China. Although more than 40 T2D risk prediction models have reportedly been established in different populations, few are based on East Asian populations, especially the Chinese population [27][28][29][30]. Because of genetic and environmental differences (i.e., economic level, diet, climate, and lifestyle), the intensity and distribution of T2D risk factors vary among populations. In addition, complex mathematical formulas and the lack of simple, intuitive tools hinder the use of these predictive risk models [14]. Therefore, few models are currently used in clinical practice. Our study is the first nomogram to predict the 5-year incidence of T2D in hypertensive patients in mainland China. A nomogram is a graphical representation of a complex mathematical formula and is widely used in the study of tumor prognosis [31,32]. However, few studies have focused on developing such an easy-to-use T2D risk prediction tool. The nomogram can generate an individual probability of T2D by integrating various risk predictors, which meets the need for visualization tools and supports progress towards personalized prevention [33].
Compared with traditional mathematical formulas, a nomogram offers a user-friendly digital interface, higher accuracy, and easier-to-understand risk prediction, and its rapid calculation allows risk assessment to be integrated seamlessly into clinical decision-making [34]. Using these simple, fast, cheap, noninvasive tools, we hope to be able to effectively identify individuals at high 5-year risk of T2D among patients with hypertension. Medical interventions, lifestyle changes, diagnostic management, and treatment can then be initiated, and ultimately the patient's prognosis can be improved. Our prediction model includes age, BMI, FPG, and TC. The identification of these variables as risk factors for T2D is consistent with previous studies. Multiple studies have found that dyslipidemia, obesity, and T2D usually coexist in individuals and share common pathological mechanisms (metabolic disorders, insulin resistance, inflammation, and changes in the intestinal flora) [35][36][37]. Therefore, the application of these parameters in the model is well founded. T2D usually occurs in adults and is more common in the elderly. Numerous studies have shown that advanced age is a nonmodifiable risk factor for the manifestation of diabetes [3,38]. Aging β-cells may exhibit lower glucose responsiveness and glucose sensitivity, leading to hyperglycemia and T2D [39]. The epigenetic changes caused by aging may affect islet gene expression and insulin secretion [40]. Davegårdh et al. [41] found that age-related changes in pancreatic islet DNA methylation can increase insulin resistance, cause impaired β-cell function, and induce T2D. Impaired fasting blood glucose (FBG) is one of the diagnostic criteria for diabetes. Studies have shown that hemoglobin A1c, FBG, and 2hPG can predict diabetes, but the detection reliability of hemoglobin A1c and FBG is better than that of 2hPG [42].
In addition, compared with hemoglobin A1c, FBG testing is more feasible and applicable in low-resource settings [43]. BMI, one of the main components of diabetes risk factor scores, was included in our predictive model. It is well known that T2D is usually associated with overweight and obesity [44,45]. Obesity, especially long-lasting and visceral obesity, is the cornerstone of T2D pathogenesis [46]. Depending on gender and ethnicity, the proportion of patients with T2D who have a BMI over 25 kg/m² ranges from 50% to 90%, and the incidence of T2D is higher in elderly obese patients [47]. It is worth noting that even when BMI is less than 25 kg/m², the relative risk of diabetes in adults seems to increase, and it increases exponentially with BMI [47]. The pathophysiological pathways behind this association are complex and progressive, leading to the development of insulin resistance and secondary impairment of β-cell function [44,46,47]. Obesity-induced metabolic disorders, adipose organ dysfunction, and changes in fat metabolic processes play a fundamental role in insulin resistance [44]. Excess energy induces insulin resistance by inhibiting adenosine monophosphate-activated protein kinase signaling pathways in obese patients [44,48]. According to previous studies, dyslipidemia is a well-known risk factor for T2D [49]. Consistent with those reports, patients with TC abnormalities had higher T2D risk scores on the nomogram. The underlying pathophysiology by which dyslipidemia leads to insulin resistance is complex and not well understood [36]. At present, some studies have pointed out that TC itself may directly lead to disorders of glucose metabolism [37,50]. TC can also increase insulin secretion by protecting β-cells from cholesterol-induced β-cell dysfunction, stress-induced apoptosis, and islet inflammation [46,47]. A higher level of TC may exacerbate abnormal glucose homeostasis.
In contrast, oxidized low-density lipoprotein can inhibit molecular insulin secretion and even cause β-cell apoptosis [39,40].
In this study, we established a nomogram model of the 5-year incidence of T2D in patients with hypertension in mainland China using parameters that can be collected in general health care settings. This could have a major impact on clinical practice and society, especially for residents of mainland China, where OGTT is not easily accessible. Our nomogram provides a quantitative way to identify the group at high risk of T2D among patients with hypertension, who must focus on their own physical condition and follow advanced
Although our nomogram performed well in both the derivation and validation cohorts, this study still has some limitations. First, all participants are from China. Therefore, the results may not apply to other countries. Second, although our analysis includes a wide range of potential predictors, other factors, such as insulin secretion, could not be measured, which limits the predictive power of the model. Third, the nomogram is based on a retrospective cohort study in which individuals with incomplete data were excluded, which may lead to selection bias. Therefore, prospective studies are needed to further validate our results. Fourth, although genomic classifiers are considered promising predictive tools, this study did not consider genomic characteristics. Fifth, treatment data for the participants were lacking, and treatment may have influenced the development of diabetes. Sixth, the lack of HbA1c and oral glucose tolerance tests may have masked diabetes events at baseline or during follow-up. Despite these limitations, the study was the first large cohort study in mainland China to predict the 5-year incidence of T2D in patients with hypertension.

Conclusion
In summary, we have established a nomogram based on four risk factors (FPG, age, BMI, and TC) to identify those at high risk of T2D among patients with hypertension in mainland China. Our nomogram can be used as a simple, affordable, reasonable, and widely implementable tool to predict the 5-year T2D risk of patients with hypertension in mainland China. Its application can support timely intervention to reduce the incidence of T2D in patients with hypertension in mainland China.

Data Availability
All datasets generated and/or analyzed during the present study are included in this published article and available in Dryad Digital Repository (http://www.datadryad.org/).

Conflicts of Interest
The authors declare that they have no competing interests.