Construction of a Prediction Model for the Mortality of Elderly Patients with Diabetic Nephropathy

To construct a prediction model for all-cause mortality in elderly patients with diabetic nephropathy (DN), this cohort study collected the data of 511 DN patients aged ≥65 years and divided the participants into a training set (n = 358) and a testing set (n = 153). The median survival time of all participants was 2 years. The data in the training set were grouped into the survival group (n = 203) and the death group (n = 155). Variables with P ≤ 0.1 between the two groups were selected as preliminary predictors and entered into the multivariable logistic regression model, and the covariables were gradually adjusted. The receiver operating characteristic (ROC), Kolmogorov-Smirnov (KS), and calibration curves were plotted to evaluate the predictive performance of the model. Internal validation of the model was performed in the testing set. The predictive value of the model was also evaluated in subgroups defined by gender, age, and the presence of chronic kidney disease (CKD) or cardiovascular diseases (CVD). In total, 216 (42.27%) elderly DN patients died within 2 years. The prediction model for the 2-year mortality of elderly patients with DN was established based on length of stay (LOS), temperature, heart rate, peripheral oxygen saturation (SpO2), serum creatinine (Scr), red cell distribution width (RDW), the simplified acute physiology score-II (SAPS-II), hyperlipidemia, and the Chronic Kidney Disease Epidemiology Collaboration equation for estimated glomerular filtration rate (eGFR-CKD-EPI). The AUC of the model was 0.78 (95% CI: 0.73–0.83) in the training set and 0.72 (95% CI: 0.63–0.80) in the testing set. The AUC of the model was 0.78 (95% CI: 0.65–0.91) in females and 0.78 (95% CI: 0.68–0.88) in patients ≤75 years. The AUC of the model was 0.74 (95% CI: 0.64–0.84) in patients with CKD.
The model had good predictive value for the mortality of elderly patients with DN within 2 years. In addition, the model showed good predictive value for female DN patients, DN patients ≤75 years, and DN patients with CKD.


Introduction
Diabetic nephropathy (DN) is a common microvascular complication of diabetes mellitus (DM) [1]. Approximately 30% of DM patients are diagnosed with renal complications including DN [2]. DN in patients can lead to end-stage renal failure and disability, which is associated with high mortality all over the world [3]. DN patients tend to be elderly and may be associated with various complications, such as cerebrovascular, cardiovascular, peripheral vascular, connective tissue, liver, and chronic pulmonary diseases and tumors [4,5]. DN is associated with higher mortality rates and worse clinical outcomes, which were largely due to the serious complications [6].
Therefore, predicting all-cause mortality in DN patients is of great value for providing timely interventions and improving the outcomes of these patients.
Previously, various studies have explored the risk factors for mortality in DN patients [7][8][9], but the risk of mortality could not be estimated from their findings, as they did not form a prediction model. Currently, models for predicting the mortality of DN patients are rare. In 2017, Sato et al. [10] established a prediction model for all-cause mortality in DN patients. The model had an area under the curve (AUC) of 0.791, indicating good predictive ability for the mortality of DN patients. Multiple studies have indicated that prediction models based on combined variables might perform better than those including only one variable [11]. The prediction model by Sato et al. [10] focused on the predialysis neutrophil-lymphocyte ratio, and validation was not performed to verify the performance of the model. Given the poor prognosis of DN patients at old age [12], a suitable prediction model for all-cause mortality in elderly DN patients is required to quickly identify those at high risk of mortality and provide timely treatment.
The purpose of this study was to construct a prediction model for all-cause mortality in elderly DN patients. The predictors were screened and included in the model. Internal validation was performed to evaluate the predictive value of the model. Subgroup analysis was also conducted in terms of gender, age, and complication with chronic kidney disease (CKD) or cardiovascular diseases (CVD).

Study Population.
In this cohort study, the data of 522 DN patients aged ≥65 years were derived from the Medical Information Mart for Intensive Care (MIMIC-III) database. MIMIC-III is an extensive single-center database approved by the Institutional Review Boards (IRB) of the Massachusetts Institute of Technology (Cambridge, MA, USA) and Beth Israel Deaconess Medical Center. It contains the data of over 50000 hospital patients admitted to intensive care units (ICUs) between 2001 and 2012, including demographic details, admission and discharge times, dates of death, procedures such as dialysis, imaging studies, blood chemistry, hematology, urine analysis, microbiology test results, administration records of intravenous medications, medication orders, free text notes such as provider progress notes and hospital discharge summaries, and nurse-verified vital signs [13]. After excluding participants without data on the Sequential Organ Failure Assessment (SOFA) score, the simplified acute physiology score-II (SAPS-II), or temperature, 511 patients were finally included in our study.

Outcome Variables.
The outcome variable was the death of elderly DN patients within 2 years. The follow-up time was 10 years and the median survival time was 2 years.

Definitions of Variables
In the eGFR-CKD-EPI equation, κ is 0.7 for females and 0.9 for males, α is −0.329 for females and −0.411 for males, min indicates the minimum of Scr/κ or 1, and max indicates the maximum of Scr/κ or 1. LOS is the length of stay in the ICUs.
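The κ and α values given here match the 2009 CKD-EPI creatinine equation, so the definition can be sketched as below. This is a minimal illustration that assumes the 2009 form of the equation (constants 141, −1.209, 0.993, 1.018, and 1.159, with Scr in mg/dL); the function name `egfr_ckd_epi` and its parameters are hypothetical, and whether the authors applied the race coefficient is not stated in the text.

```python
def egfr_ckd_epi(scr_mg_dl: float, age_years: float, female: bool,
                 black: bool = False) -> float:
    """Estimated GFR (mL/min/1.73 m^2), assuming the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9          # sex-specific kappa, as defined in the text
    alpha = -0.329 if female else -0.411    # sex-specific alpha, as defined in the text
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha      # term active when Scr/kappa is below 1
            * max(ratio, 1.0) ** -1.209     # term active when Scr/kappa is above 1
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018                       # assumption: 2009 equation sex coefficient
    if black:
        egfr *= 1.159                       # assumption: 2009 equation race coefficient
    return egfr


# Example: a 70-year-old woman; higher serum creatinine yields a lower estimate.
print(round(egfr_ckd_epi(0.7, 70, female=True), 1))
print(round(egfr_ckd_epi(1.5, 70, female=True), 1))
```

Because min and max each clamp Scr/κ at 1, only one of the two power terms differs from 1 for any given patient.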

Logistic Regression Model.
Logistic regression is a classification method applied to binary outcome variables; multinomial logistic regression generalizes it to multiclass problems with multinomial outcome variables. It evaluates the associations between a dependent categorical outcome and one or more independent predictor variables and provides predicted probabilities for each category [14]. The detailed formula of the logistic regression model is as follows:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \quad (1)$$

where P is the predicted probability of the outcome, x_1, ..., x_n are the predictor variables, and β_0, ..., β_n are the regression coefficients.

Statistical Analysis.
Normally distributed measurement data were expressed as mean ± standard deviation (mean ± SD), and comparisons between groups were made with the independent-sample t-test. Nonnormally distributed data were described as M (Q1, Q3), and the Mann-Whitney U rank-sum test was used for comparing differences between groups. Enumeration data were displayed as n (%), and comparisons between groups were performed by the χ² test or Fisher's exact probability method [15]. All the data were divided into the training set (n = 358) and the testing set (n = 153) at a ratio of 7:3 [16]. The prediction model was constructed in the training set and verified in the testing set. The data in the training set were grouped into the survival group (n = 203) and the death group (n = 155), and comparisons between the two groups were performed. Variables with P ≤ 0.1 were selected as preliminary predictors. The preliminarily screened predictors were then entered into the multivariable logistic regression model and the covariables were gradually adjusted. Subgroup analysis was conducted in the male and female groups, the CKD and non-CKD groups, the CVD and non-CVD groups, and the age ≤75 years and age >75 years groups, respectively. The area under the curve (AUC), Kolmogorov-Smirnov (KS) statistic, calibration curve, sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy were employed for evaluating the predictive performance of the model.
A nomogram was also plotted to estimate the probability of mortality of elderly patients with DN. The significance level (α) was 0.05 and Python 3 was used for statistical analysis.
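The evaluation metrics named in this section can be computed directly from observed outcomes and predicted probabilities. The sketch below is illustrative only: it uses the Python standard library, the function names and the toy data are hypothetical, and it does not reproduce the authors' code or results. It assumes the outcome is coded 1 for death and 0 for survival, and that a patient is predicted to die when the probability is at or above the cut-off.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random death case scores above a random survivor."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """KS statistic: largest gap between the TPR and FPR over all thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in set(y_prob):
        tpr = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t) / n_pos
        fpr = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


def confusion_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and accuracy at a probability cut-off."""
    pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(1 for y, q in zip(y_true, pred) if y == 1 and q == 1)
    tn = sum(1 for y, q in zip(y_true, pred) if y == 0 and q == 0)
    fp = sum(1 for y, q in zip(y_true, pred) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(y_true, pred) if y == 1 and q == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy data (hypothetical): 1 = died within 2 years, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1, 0.3, 0.25]
print(auc(y_true, y_prob))           # 0.875
print(ks_statistic(y_true, y_prob))  # 0.75
print(confusion_metrics(y_true, y_prob, cutoff=0.33))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be preferable for much larger data.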

Missing Value Manipulation and Sensitivity Analysis.
The missing values of variables are shown in Supplementary Table 1. The missing data were handled via multiple imputation using the R package mice. Sensitivity analysis was performed on the data before and after imputation. The results showed no statistically significant difference between the data before and after imputation, indicating that the imputed data could be used for further analysis.

Baseline Characteristics of Participants.
In total, 522 DN patients aged ≥65 years from MIMIC-III were identified for our study. Participants without data on the SOFA score and SAPS-II (n = 9) and those without data on temperature (n = 2) were excluded, and 511 patients were finally included. The detailed screening process is shown in Figure 1. The balance test revealed no significant difference between the data of participants in the training set and the testing set (Table 2).
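The participant counts reported here can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: a simple seeded random shuffle stands in for whatever splitting procedure the authors used, and `split_indices` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_fraction: float = 0.7,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into training and testing subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    n_train = round(n * train_fraction)    # 511 * 0.7 = 357.7, rounded to 358
    return indices[:n_train], indices[n_train:]


n_included = 522 - 9 - 2                   # exclusions reported in the text
train_idx, test_idx = split_indices(n_included)
print(n_included, len(train_idx), len(test_idx))  # 511 358 153
```

A stratified split on the outcome would keep the death rate similar in both subsets; the text does not say whether stratification was used.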
LOS: length of stay, SBP: systolic blood pressure, DBP: diastolic blood pressure, MAP: mean arterial pressure, SpO2: peripheral oxygen saturation, WBC: white blood cells, RBC: red blood cells, INR: international normalized ratio, MCV: mean corpuscular volume, MCHC: mean corpuscular hemoglobin concentration, RDW: red cell distribution width, COPD: chronic obstructive pulmonary disease, AF: atrial fibrillation, eGFR-CKD-EPI: the Chronic Kidney Disease Epidemiology Collaboration equation for estimated glomerular filtration rate, eGFR-MDRD: the Modification of Diet in Renal Disease equation for estimated glomerular filtration rate, CKD: chronic kidney disease, CVD: cardiovascular diseases, SOFA: Sequential Organ Failure Assessment, SAPS-II: the simplified acute physiology score-II.
Journal of Healthcare Engineering
[…] (Table 3).

Predictors for Mortality of Elderly Patients with DN.
Variables with P ≤ 0.1 between the survival group and the death group were included in the multivariable logistic analysis.
Stepwise regression was applied to identify the predictors for mortality of elderly patients with DN within 2 years. As depicted in Table 4 […] Figure 3. The nomogram was plotted and a sample patient was selected, whose total score was 284 and whose predicted mortality probability was 0.155, lower than the cut-off of 0.33 (Figure 4). The predicted outcome of the patient was survival, which was consistent with the actual outcome. […] (Figure 5).
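The worked nomogram example reduces to comparing a predicted probability with the cut-off. The sketch below illustrates that decision rule and the logistic transform behind it; the function names are hypothetical, the model coefficients are not reproduced, and treating a probability exactly equal to the cut-off as predicted death is an assumption.

```python
import math

CUTOFF = 0.33  # cut-off reported in the text


def inverse_logit(linear_predictor: float) -> float:
    """Map a logistic-regression linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def logit(probability: float) -> float:
    """Log-odds corresponding to a probability."""
    return math.log(probability / (1.0 - probability))


def predict_outcome(probability: float, cutoff: float = CUTOFF) -> str:
    """Classify a patient from the predicted 2-year mortality probability."""
    return "death" if probability >= cutoff else "survival"


sample_probability = 0.155                 # sample patient from the text
print(predict_outcome(sample_probability)) # survival
```

Here `logit` and `inverse_logit` are inverses of each other, so a nomogram total score can be read as a rescaled linear predictor that maps to a single probability.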

Discussion
This study extracted the data of 511 DN patients aged ≥65 years and screened the predictors to establish a prediction model for the mortality of DN patients within 2 years. The results revealed that the model had good predictive ability for the mortality of DN patients within 2 years. Additionally, the model showed good predictive value in female DN patients, DN patients ≤75 years, DN patients with CKD, and patients with or without CVD. The findings of our study might offer a tool for identifying DN patients at high risk of death within 2 years, so that clinicians can provide timely interventions to improve their outcomes.
This study established a prediction model for the mortality of elderly DN patients within 2 years. Among previous prediction models for DN patients, many focused on evaluating renal survival [9,16]. Our study constructed a model and evaluated its predictive value for all-cause mortality in DN patients. DN is associated with various complications, and the all-cause mortality of DN patients is high and deserves attention [17]. Sato et al. [10] established a prediction model for all-cause mortality in DN patients, but this model was based on only one laboratory index (predialysis neutrophil-lymphocyte ratio), the sample size was small (n = 78), and internal validation was not performed to verify the performance of the model. In our study, the prediction model was constructed based on predictors including LOS, temperature, heart rate, SpO2, Scr, RDW, the simplified acute physiology score-II (SAPS-II), hyperlipidemia, and eGFR-CKD-EPI, which presented a better predictive ability compared to the model involving one predictor. The sample size in this study was larger than that in the previous study. Additionally, internal validation was performed and the predictive value of the model for the mortality of DN patients within 2 years was found to be good. The prediction model in our study might provide clinicians with a tool for quickly identifying DN patients at high risk of death, so that timely interventions can be provided to improve their outcomes. We also plotted a nomogram of the prediction model based on the results from the logistic regression. The nomogram allows the probability of mortality of each patient to be obtained quickly and intuitively. Impaired glomerular filtration rate (GFR) is regarded as a marker of DN in DM patients [18].
A previous meta-analysis revealed that impaired GFR was an independent risk factor for progressive CKD, end-stage renal failure, and all-cause mortality in the general population [19]. The eGFR-CKD-EPI is an extensively used equation for estimating GFR [20]. The decline of eGFR-CKD-EPI was associated with renal hyperfiltration and impaired GFR in DM patients [21]. These findings supported the results of our study, which revealed that the eGFR-CKD-EPI was a predictor for the mortality of DN patients within 2 years. Patients with a rapid decline of eGFR-CKD-EPI should receive priority attention, and special treatments should be provided to prevent the mortality of DN patients. DN was associated with higher Scr levels in patients, and high Scr levels indicated declining renal function [22,23]. This aligned with the results of this study, which indicated that the Scr level was an important predictor for the mortality of elderly DN patients within 2 years. Clinicians should pay special attention to DN patients with high levels of Scr. SpO2 is an index of oxygenation status, and tissue hypoxia is an important contributor to diabetic complications [24]. Frequent abnormal blood oxygen was reported to be associated with elevated inflammation in patients [25]. Herein, SpO2 was a predictor for the mortality of elderly DN patients within 2 years. In this study, RDW was another predictor for the mortality of elderly DN patients within 2 years. This was supported by several previous studies. Zhang et al. [26] found that patients with DN had high levels of RDW and that RDW was associated with an increased risk of progression to ESRD in patients with DN. Another study also demonstrated that a high level of RDW was an indicator of prognosis in DN patients and that a high level of RDW in T2D patients indicated a poor prognosis for DN [27]. SAPS-II is an indicator for evaluating the outcomes of patients in ICUs and estimating their risk of mortality [28].
SAPS-II has good power to predict deaths in the ICU and has been recommended for the identification and mortality prognostication of patients in ICUs [29]. In our study, SAPS-II was found to be a predictor for mortality in ICU patients with DN. High-risk patients were associated with longer LOS in ICUs and with higher hospital mortality [30]. Prolonged LOS in ICUs has been reported to be a risk factor for infections, which might also increase the risk of death in patients [31]. These reports supported the findings of this study, which showed that LOS in ICUs was a predictor for the mortality of DN patients in ICUs.
Several limitations existed in our study. Firstly, this study extracted the data from the MIMIC-III database, which lacked several important variables, including the medications of DN patients and the control of blood glucose, and these were closely associated with the outcomes of these patients. Secondly, external validation of the predictive value of the model was not performed. In the future, studies with large sample sizes are required to validate the findings of our study. Currently, there are numerous machine learning algorithms that can be used for predicting the mortality of elderly patients with DN. Some recent studies have also used principal component analysis (PCA)-firefly based deep learning models for predicting the occurrence of, or detecting, diabetic retinopathy [32][33][34]. The predictive accuracy was evidently improved using these methods. Diabetic retinopathy and DN are both common microvascular complications of diabetes mellitus. In our study, we only used a logistic regression model; in the future, a PCA-firefly based deep learning model might be applied to improve the predictive ability for the mortality of DN patients and provide clinicians with a better tool to quickly and accurately identify those at high risk of death.

Conclusion
This study established a prediction model for the mortality of DN patients within 2 years based on LOS, temperature, heart rate, SpO2, Scr, RDW, SAPS-II, hyperlipidemia, and eGFR-CKD-EPI. The model had good predictive value for the mortality of elderly patients with DN within 2 years. In addition, the model showed good predictive value for female DN patients, DN patients ≤75 years, and DN patients with CKD.

Data Availability
The data used to support the findings of this study can be obtained from the corresponding author upon request.