Development and Validation of a Nomogram to Predict Type 2 Diabetes Mellitus in Overweight and Obese Adults: A Prospective Cohort Study from 82938 Adults in China

Background The twin epidemic of overweight/obesity and type 2 diabetes mellitus (T2DM) is a major public health problem globally, especially in China. Overweight/obesity commonly coexists with T2DM, and the combination is closely related to adverse health outcomes. Therefore, this study aimed to develop a risk nomogram for T2DM in Chinese adults with overweight/obesity. Methods We used prospective cohort study data for 82938 individuals aged ≥20 years and free of T2DM, collected between 2010 and 2016, and divided them into a training set (n = 58056) and a validation set (n = 24882). Using the least absolute shrinkage and selection operator (LASSO) regression model in the training set, we identified optimized risk factors for T2DM and then established a T2DM prediction nomogram. The discriminative ability, calibration, and clinical usefulness of the nomogram were assessed. The results were then checked by internal validation in the validation set. Results Six independent risk factors for T2DM were identified and entered into the nomogram: age, body mass index, fasting plasma glucose, total cholesterol, triglycerides, and family history. The nomogram incorporating these six risk factors showed good discrimination in the training set, with a Harrell's concordance index (C-index) of 0.859 [95% confidence interval (CI): 0.850–0.868] and an area under the receiver operating characteristic curve of 0.862 (95% CI: 0.853–0.871). The calibration curves indicated good agreement between the probability predicted by the nomogram and the actual probability. Decision curve analysis demonstrated that the prediction nomogram was clinically useful. The consistency of these findings was confirmed using the validation set. Conclusions The nomogram showed accurate prediction of T2DM among the Chinese population with overweight and obesity and might aid in assessing the risk of T2DM.


Introduction
Globally, type 2 diabetes mellitus (T2DM) is a common public health problem that affected 422 million adults and caused 1.6 million deaths in 2016 [1,2]. Furthermore, T2DM causes a huge financial burden. The health expenditure on diabetes alone was 673 billion dollars in 2015, accounting for 12% of total expenditure [3]. Moreover, the Global Burden of Disease Study and epidemiological studies have confirmed that the prevalence of T2DM has increased rapidly worldwide in the last three decades, especially in developing countries including China [4][5][6]. China is the world's most populous nation and the largest developing country. Almost one in four patients with diabetes worldwide lives in China, which makes China the country with the largest T2DM population in the world [5].
Simultaneously, the prevalence of overweight and obesity worldwide has been increasing steadily over the past several decades [7]. In 2016, the World Health Organization (WHO) estimated that 39% and 13% of adults (≥18 years) in the world were overweight and obese, respectively [2]. Accumulating surveys indicate that overweight/obesity is a major risk factor for T2DM [8][9][10]. Previous cohort studies indicate that overweight and obese adults are 2.5 times more likely to develop T2DM than normal-weight individuals [11]. Additionally, compared with individuals who have overweight/obesity alone or T2DM alone, patients with both T2DM and overweight/obesity have an increased risk of cardiovascular-related mortality [12]. The twin epidemic and parallel escalation of overweight/obesity and T2DM is a major health crisis globally. Approximately 63% of patients with T2DM in China are overweight or obese [13]. Therefore, it is of great significance to distinguish individuals at high risk of T2DM from those at low risk and to follow up the high-risk subjects closely for early detection and prevention of T2DM.
Though several prediction models have been established for diabetes [14,15], traditional risk factors related to T2DM might play a different role in overweight and obese adults. In addition, most of the predictive score models were built in European and American populations [16,17] and may not be suitable for prediction in the Chinese population. Moreover, some prediction models built in the general population might underestimate T2DM risk in overweight and obese adults. To our knowledge, no prediction model has been developed specifically to predict T2DM in the overweight and obese population.
Accordingly, our study aimed to establish and validate a comprehensive visual predictive model for T2DM in Chinese adults with overweight and obesity. The proposed nomogram can help healthcare workers and individuals assess the risk of T2DM, thus promoting early detection of and intervention for T2DM.

Methods and Materials
2.1. Setting and Participants. Data for this study were obtained from a prospective cohort study established by the Rich Healthcare Group in China from 2010 to 2016. It is a computerized database including all medical records for participants who received a health check. This cohort study was conducted in 43 sites across 11 provinces and involved 685277 participants. The number of subjects with a follow-up duration of more than two years was 225575. Finally, a total of 211833 participants free of diabetes at baseline were included in the cohort study. This study is a secondary analysis of the cohort data, which were downloaded from a database shared by Chen et al. [18,19] in the Dryad Digital Repository (http://www.datadryad.org). We analyzed the data for 82938 overweight and obese participants in the current paper.

Data Collection.
Trained staff used standardized electronic questionnaires to collect data on demographic characteristics (age, gender) and health-related behaviors (alcohol consumption and cigarette smoking) at each visit to the health check center. The blood pressure (BP) of each participant was measured using a uniform sphygmomanometer. Height and weight were measured to the nearest 0.1 cm and 0.1 kg, respectively, by trained staff. Body mass index (BMI) was derived as weight divided by the square of height (kg/m²).
Venous blood samples were collected from all participants after fasting for ≥10 h. Then, fasting plasma glucose (FPG), total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), alanine aminotransferase (ALT), aspartate aminotransferase (AST), blood urea nitrogen (BUN), and concentration of creatinine (CCR) were tested at the local health check center using an autoanalyzer. The variables for each case were extracted from the raw data as follows: age, gender, sites, height, weight, BMI, FPG, systolic and diastolic BP (SBP, DBP), TC, TG, LDL-C, HDL-C, ALT, AST, BUN, CCR, smoking and alcohol consumption status, family history of diabetes, years of follow-up, and eventual diagnosis of diabetes.

Follow-Up Data Collection.
The annual health check of the participants was considered a follow-up examination. The primary outcome measure was the first (incident) diagnosis of T2DM, which was recorded in the general practice computer records. FPG, lipid profiles, and the presence of T2DM were evaluated as at baseline. Diabetes was defined as FPG ≥7.0 mmol/L or a self-reported presence of T2DM. If a participant developed diabetes during follow-up, the participant was asked in detail when the diabetes occurred, down to the exact date.

Definition.
Overweight and obesity were defined as a BMI of 24.0 to 27.9 kg/m² and ≥28.0 kg/m², respectively [20]. Current cigarette smoking was coded as yes/no. Current alcohol drinking was coded as yes/no. Family history of diabetes was categorized as yes/no. Hypertension was defined as SBP ≥140 mmHg and/or DBP ≥90 mmHg. Dyslipidemia was defined as the presence of one or more of the following: TC ≥ 6.22 mmol/L, LDL-C ≥ 4.14 mmol/L, HDL-C < 1.04 mmol/L, and TG ≥ 2.26 mmol/L, according to the criteria recommended by the Chinese guidelines for the prevention and treatment of dyslipidemia in adults [21].
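The cutoffs above can be written as a small set of helper functions. The following is a minimal Python sketch rather than the authors' code: the function names are hypothetical, and treating a BMI strictly between 27.9 and 28.0 kg/m² as overweight is an assumption, since the text does not say how such values were handled.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def weight_category(bmi_value: float) -> str:
    """Classify BMI using the cutoffs quoted in the text (24.0 and 28.0 kg/m^2).

    Assumption: values below 28.0 but above 27.9 are counted as overweight.
    """
    if bmi_value >= 28.0:
        return "obese"
    if bmi_value >= 24.0:
        return "overweight"
    return "not overweight"


def has_hypertension(sbp_mmhg: float, dbp_mmhg: float) -> bool:
    """SBP >= 140 mmHg and/or DBP >= 90 mmHg."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90


def has_dyslipidemia(tc: float, ldl_c: float, hdl_c: float, tg: float) -> bool:
    """Any one of the four lipid criteria (all values in mmol/L)."""
    return tc >= 6.22 or ldl_c >= 4.14 or hdl_c < 1.04 or tg >= 2.26
```

For example, a participant weighing 80 kg at 175 cm has a BMI of about 26.1 kg/m² and would fall in the overweight group under these cutoffs.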

Statistical Analysis.
Descriptive analyses were conducted for the 82938 participants using SPSS 20.0 for Windows (SPSS Inc., Chicago, IL). Continuous variables were summarized as means ± standard deviations (M ± SD) and compared using Student's t-test; categorical variables were expressed as frequencies (n) and proportions (%) and compared using the chi-square test. The development and assessment of the nomogram were divided into four steps. First, we randomly selected 70% of the participants (n = 58056) as the training set to construct the model and reserved the remaining 30% (n = 24882) as the validation set. Second, we identified independent predictive features as those with nonzero coefficients in the least absolute shrinkage and selection operator (LASSO) regression model [22,23]. Third, a Cox proportional hazards model was applied to construct a predictive nomogram based on the features selected by the LASSO regression model [24], with results presented as hazard ratios (HRs) with associated 95% confidence intervals (95% CIs) and corresponding p values. Fourth, the discrimination of the nomogram was assessed by Harrell's concordance index (C-index) and the area under the receiver operating characteristic curve (AUC), and its calibration was assessed by calibration curve plots [25,26]. Finally, to quantify the net benefits at different threshold probabilities, decision curve analysis (DCA) was conducted to determine the usefulness of the nomogram in the validation cohort [27]. The nomogram and the bootstrap analysis were performed using the "rms" package in R version 3.5.1. A p value < 0.05 was considered to indicate significance.
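The first step, the random 70/30 split, can be illustrated with a short Python sketch. This is not the authors' procedure (the paper does not report the software or seed used for the split); the function name and the seed value are hypothetical, and the training-set size is assumed to be rounded down.

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=2016):
    """Randomly split participant IDs into training and validation sets.

    The seed is an arbitrary, hypothetical choice used only to make the
    split reproducible; the training-set size is rounded down.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, validation_ids = split_train_validation(range(82938))
print(len(train_ids), len(validation_ids))  # 58056 24882
```

With 82938 participants, rounding 70% down gives set sizes that match those reported in the text.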

Baseline Characteristics.
In total, 82938 overweight and obese subjects with a mean age of 44.99 ± 12.98 years were enrolled, with men accounting for 72.3%. The median follow-up time for all participants in this study was 2.98 years (range: 2.15-3.93 years). During the follow-up period, the overall incidence of T2DM was 3.7% (n = 3069).
There were no significant differences between the training set and the validation set in baseline characteristics except for gender and smoking status (p range: 0.056 to 0.943) (Table 1).

Predicted Feature Selection.
We used the LASSO regression model to screen independent predictive features of T2DM in the training set. Six potential predictors were selected from the 19 candidate factors (∼3 : 1 ratio; Figures 1(a) and 1(b)); these had nonzero coefficients (min lambda of 0.02238) in the LASSO regression model. The selected factors were age, BMI, FPG, TC, TG, and family history, and are presented in Table 2.
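For readers unfamiliar with LASSO, the selection step can be summarized by its penalized objective. The expression below is the generic textbook form, not a formula taken from the paper: ℓ(β) stands for the model log-likelihood (the source describes the deviance as both "partial likelihood" and "binomial", so the exact likelihood is left unspecified here), and software implementations may scale the likelihood term by the sample size.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}
    \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
  \qquad p = 19
```

Because the penalty is the sum of absolute coefficient values, increasing λ shrinks some coefficients exactly to zero; the predictors whose coefficients remain nonzero at the cross-validated λ are the ones retained, which in this study were six of the 19 candidates.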

Construction and Assessment of Nomogram.
The predictive nomogram that integrated all the significant features for the type 2 diabetes-free survival (T2DFS) probability was then developed (Figure 2). The C-index and AUC for the predictive nomogram were 0.859 (95% CI: 0.850-0.868) and 0.862 (95% CI: 0.853-0.871), respectively, which indicated the model's good discrimination (Figure 3(a)). The calibration curve plots for the T2DFS probability at 3 and 5 years demonstrated good agreement between predicted and observed probabilities (Figures 4(a) and 4(c)).
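To make the C-index concrete, the following Python sketch computes Harrell's concordance index for right-censored data by direct pair counting. It is an illustration only and not the routine used in the study (the authors used the "rms" package in R); the function name is hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index by exhaustive pair counting.

    times: follow-up time per subject
    events: 1 if T2DM was observed, 0 if censored
    risk_scores: model output, higher = higher predicted risk

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up after that time. Pairs with tied
    times are ignored (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported 0.859 means that in roughly 86% of comparable pairs the participant who developed T2DM earlier had the higher predicted risk.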

Internal Validation of the Nomogram.
The nomogram showed good discrimination with a C-index of 0.848 (95% CI: 0.833-0.863) and AUC of 0.851 (95% CI: 0.837-0.865) through internal validation in the validation set. Additionally, the good calibration of the prediction nomogram was confirmed in the validation set (Figures 4(b) and 4(d)).
Thus, this prediction nomogram performed well using both the training and validation sets.

Clinical Use of Nomogram.
The DCA for the nomogram showed that when the threshold probability for T2DFS in overweight and obese adults ranged between 3.9% and 73.5% at 3 years and between 5.1% and 82.3% at 5 years, using this nomogram to predict the T2DFS probability yielded more net benefit than the scheme, which showed the nomogram to be clinically useful (Figure 5).
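The quantity plotted in a decision curve is the net benefit at each threshold probability. The Python sketch below shows the standard binary-outcome formula, net benefit = TP/n − (FP/n) × pt/(1 − pt). It is a simplified illustration, not the authors' analysis: the function names are hypothetical, and censoring at 3 and 5 years, which a survival-based DCA has to handle, is ignored.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold.

    predicted_risk: predicted probability of the event per subject
    outcomes: 1 if the event occurred, 0 otherwise (no censoring assumed)
    threshold: threshold probability pt, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: classify every subject as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

In a typical decision curve, the model's net benefit is compared across thresholds with the "treat all" reference above and with a "treat none" reference whose net benefit is zero; a model is considered useful over the threshold range in which its curve lies above both.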

Discussion
Evidence is mounting that high BMI contributes to incident T2DM [8][9][10]. Coexistence of obesity/overweight and T2DM is associated with an increased risk of stroke, angina, and coronary heart disease and constitutes a significant cardiovascular health burden [12]. Primary prevention and timely intervention are at the core of preventing or postponing the onset of T2DM. Therefore, early identification of overweight and obese adults at high risk of developing diabetes is vital for reducing its incidence. Accordingly, we attempted to develop and validate a nomogram to predict the T2DFS probability at 3 and 5 years in Chinese adults with overweight and obesity. The nomogram developed is simple (it consists of only six factors; during selection of variables for each block, many were eliminated because they were not associated with T2DM or because they showed strong collinearity with other variables) and shows good standardization and ability to discriminate. It is worth mentioning its high sensitivity (approximately 90%), indicating that the factors included are capable, as a whole, of properly predicting the risk of developing T2DM in overweight and obese adults.
T2DM is the ninth leading cause of disease burden worldwide [4]. Therefore, several researchers have constructed T2DM risk prediction scores [14][15][16][17]. However, there are racial and ethnic differences in the predictive factors of T2DM, since environmental and genetic characteristics differ among racial/ethnic populations [28]. Consequently, T2DM risk assessment models developed in white populations are not suitable for the Chinese population [14,15,29]. Moreover, several prediction models might not accurately predict the future risk of T2DM because they are based on participants from a single study site, on cross-sectional studies, or on relatively small sample sizes [30][31][32]. In addition, though there are several models based on Asian or Chinese populations, they did not include other significant risk factors such as blood lipid levels and family history of diabetes, which might result in insufficient accuracy and a small model AUC [32,33]. Furthermore, T2DM risk prediction scores developed in the general population cannot accurately predict the risk of T2DM in overweight and obese adults. To our knowledge, the current study is the first to develop and validate a nomogram for predicting the 3-year and 5-year probability of T2DFS in a Chinese population with overweight and obesity based on a multicenter cohort study. Our model shows good accuracy and excellent agreement in the training and validation sets, which suggests that it has good transportability and generalizability.
Results of the current study show that the risk factors related to T2DM in overweight and obese adults include age, BMI, FPG, TC, TG, and family history of diabetes. This is consistent with previous studies reporting the risk factors of T2DM [8,[34][35][36]. Currently, the tendency of older people to develop T2DM is attributed to aging β-cells with lower glucose responsiveness and glucose sensitivity and to age-related islet cell DNA methylation, which affects insulin secretion and causes T2DM [35,37]. High BMI is widely known as one of the major risk factors for T2DM and commonly coexists with it. Our study shows that BMI also plays an important role in the incidence of T2DM, which is related to insulin resistance arising from the adipose metabolic derangements and mild chronic inflammatory state induced by high BMI [38]. Clinical studies have indicated that increased TC and TG lead to deterioration of glucose tolerance and disorders of glucose metabolism and that a high level of TC can predict T2DM, consistent with our study [36]. In addition, our findings showed that family history of T2DM was a predictive factor for new-onset T2DM, which is associated with the clear genetic predisposition for T2DM mentioned in previous studies [39]. We built a nomogram combining these risk factors to assess the probability of T2DM. Healthcare workers can make a preliminary judgment on the risk of T2DM in overweight/obese individuals and follow up high-risk populations closely. The high-risk individuals might represent a subset of those who might benefit the most from more frequent evaluations (with FPG and blood lipid detection and weight monitoring). Furthermore, moderate exercise, healthy diet, lipid-lowering therapies, and excess weight loss might be pursued more aggressively for high-risk individuals, which may play a vital role in delaying the onset of diabetes and related complications.

Figure 1: Variable selection using the LASSO binary regression model. Notes: (a) optimal parameter (lambda) selection in the LASSO model used tenfold cross-validation via minimum criteria. The partial likelihood deviance (binomial deviance) curve was plotted versus log (lambda). Dotted vertical lines were drawn at the optimal values by using the minimum criteria and the 1-SE of the minimum criteria (the 1-SE criteria). (b) LASSO coefficient profiles of the 19 features. A coefficient profile plot was produced against the log (lambda) sequence. A vertical line was drawn at the value selected using tenfold cross-validation, where the optimal lambda resulted in six features with nonzero coefficients. LASSO, least absolute shrinkage and selection operator; SE, standard error.
The current study has several strengths. First, we established a nomogram to predict T2DM for the Chinese population with overweight and obesity, making individualized screening possible. Second, our study has a large sample, multiple study sites, and a wide age range, which may strengthen data quality and generalizability. This makes the report a valuable source of information for public health sectors and clinical settings. Inevitably, this study has some limitations. First, the study sample was selected from China, which may limit the representativeness of the results. However, one-fourth of all people with diabetes live in China, which makes the nomogram significantly useful. Second, our research database is derived from a health check database, so it may introduce some bias into the selection of the study population. For example, the current smoking rate of the sample population in this study is significantly lower than the national average. Third, drug treatments for hypertension and dyslipidemia have been associated with an increased risk of new-onset diabetes [40]. However, this study did not include treatment data on hypertension and dyslipidemia. Fourth, although the robustness of our nomogram was examined extensively with internal validation, external validation could not be conducted. Therefore, further study of its generalizability to overweight and obese populations in other cohort studies is warranted.

Conclusions
We developed the nomogram as a potentially useful tool to predict T2DM in Chinese adults with overweight and obesity based on a multicenter database. It includes six predictors: age, BMI, FPG, TC, TG, and family history. The nomogram shows good discriminative and calibrative ability and could help healthcare workers and individuals assess the risk of T2DM in overweight and obese populations; its external evaluation in wider overweight and obese populations is warranted.

Data Availability
The materials included in the manuscript, including all relevant raw data, will be made freely available to any researchers who wish to use them for noncommercial purposes, while preserving any necessary confidentiality and anonymity.

Ethical Approval
This study was approved by the Rich Healthcare Group Review Board, and the information was retrieved retrospectively.