Chronic kidney disease (CKD) is a growing health problem worldwide (about 10 to 15 percent of the adult population in the USA, 11.2% in Australia, 10.1% in Singapore, 18.7% in Japan, and 8.3% to 18.9% in Iran) [
In recent years, early diagnosis of the disease, and especially determining the appropriate time to apply medical treatments for CKD, has received great attention among clinicians and researchers [
In recent years, fuzzy intelligent methods, especially fuzzy expert systems, have been increasingly used to predict different diseases. Employing these methods alongside clinical tools for the diagnosis of different diseases and conditions may substantially reduce diagnostic errors. Fuzzy expert techniques can be more accurate than conventional machine learning techniques.
The adaptive neurofuzzy inference system (ANFIS) is a learning-based system built on neural network concepts. The ANFIS network used in the present study is based on the model proposed by Jang et al. [
If we can model and predict the worsening of renal function, we can manage this disorder effectively. To this end, an appropriate parameter should be chosen as the marker of disease worsening. The glomerular filtration rate (GFR) is the only reliable parameter of the renal function and progression of CKD [
All the data of the present study were the clinical records of a cohort study of newly diagnosed CKD patients who were serially admitted to the Clinic of Nephrology, Imam Khomeini Hospital (Tehran, Iran), during October 2002–October 2011. The inclusion criteria for CKD were small kidney size on ultrasound images or a GFR of less than 60 mL/min/1.73 m2 for more than 3 months. The datasets were built from the clinical and laboratory data of different parameters. All study procedures were approved by the ethics committee of Tehran University of Medical Sciences and complied with the Declaration of Helsinki Ethical Principles for Medical Research Involving Human Subjects. Written consent to participate in the study was obtained from each patient.
A total of 465 CKD patients were enrolled in the study. The patients were divided into two groups according to their adherence to the follow-up schedule in the clinic. The test group consisted of 389 patients who were seen in the clinic regularly (at least every six months). The control group consisted of 76 patients who did not regularly follow their visit schedule in the clinic but whose visits had continued for at least one year. The details of demographic data and clinical measurements of the patients are presented in Table
Baseline characteristics of the study population (
Variable | Mean ± standard deviation |
---|---|
GFR (mL/min/1.73 m2) | 34.8 ± 11.0 |
Age (years) | 61.3 ± 14.9 |
BMI (kg/m2) | 26.5 ± 4.2 |
Systolic blood pressure (cmHg) | 14.0 ± 2.7 |
Diastolic blood pressure (cmHg) | 7.8 ± 1.3 |
Hemoglobin (g/dL) | 12.6 ± 2.1 |
Phosphorus (mg/dL) | 4.10 ± 0.8 |
Uric acid (mg/dL) | 8.1 ± 1.9 |
Cholesterol (mg/dL) | 198.5 ± 58.9 |
Triglyceride (mg/dL) | 205.6 ± 134.5 |
The present study used ANFIS neural networks to predict GFR values and assessed the accuracy of the method. At each visit, the patient’s demographic data, weight, blood pressure, and blood test variables, including serum creatinine, fasting plasma glucose, lipid profile, calcium, phosphorus, hemoglobin, and other parameters, were recorded. Appropriate treatments for blood pressure, bone mineral metabolism indices, and hemoglobin control were administered to each patient based on the clinical evaluations. The GFR was calculated using the MDRD equation. The end point of data collection for each patient was a GFR value of less than 15 mL/min/1.73 m2, the start of renal replacement therapy (RRT), or patient death. All quantitative variables were treated as continuous to improve model training. Based on the nephrologists’ opinions and on kidney function, ten variables were initially selected as parameters influencing GFR variations: age, sex, weight, underlying diseases, diastolic blood pressure (dbp), creatinine (Cr), calcium (Ca), phosphorus (P), uric acid, and GFR. They were considered as the inputs of the predicting model. Some of these variables did not necessarily show strong correlation with the output (
The correlation coefficients between all inputs and target output (
Input | Input number | Correlation coefficient between input and target output |
---|---|---|
Underlying disease | 1 | 0.2505 |
Sex | 2 | 0.0706 |
Age | 3 | −0.1043 |
dbp | 4 | 0.7145 |
Cr | 5 | 0.0322 |
Ca | 6 | −0.2224 |
P | 7 | −0.1444 |
Uric acid | 8 | 0.1089 |
Weight | 9 | 0.8120 |
GFR | 10 | 0.5196 |
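The choice of four inputs can be checked directly against the coefficients in the table above. The following is a minimal Python sketch, not the authors' procedure: it assumes that inputs are ranked by the absolute value of the correlation coefficient, and the function name `top_k_by_abs_correlation` is hypothetical.

```python
# Correlation coefficients copied from the table above (input name -> r).
correlations = {
    "underlying disease": 0.2505,
    "sex": 0.0706,
    "age": -0.1043,
    "dbp": 0.7145,
    "Cr": 0.0322,
    "Ca": -0.2224,
    "P": -0.1444,
    "uric acid": 0.1089,
    "weight": 0.8120,
    "GFR": 0.5196,
}


def top_k_by_abs_correlation(corr, k):
    """Return the k input names with the largest |r|, strongest first.

    Ranking by absolute value is an assumption; the source only says that
    the inputs with the highest correlation were kept.
    """
    ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
    return ranked[:k]


selected = top_k_by_abs_correlation(correlations, k=4)
print(selected)  # ['weight', 'dbp', 'GFR', 'underlying disease']
```

With these values, ranking by the signed coefficient would return the same four inputs, so the result does not depend on that assumption here.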
The first step in training the networks and building an accurate model was to divide the data into training and test datasets. The training data were used to optimize the weights and other parameters of the model. The test data were used to evaluate the quality of the estimates and forecasts. In all further processing and modeling, the test dataset was never used for training. The test data thus realistically simulated the situation in which no information about the future is available.
The test data were selected randomly, so that every record had an equal chance of being selected. Test datasets usually comprise 30 to 40% of the available data. In this study, 30% of the data were selected as the test dataset. The remaining 70% were used as the training dataset to estimate and train the models.
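The random 70/30 split described above can be sketched as follows. This is an illustration in Python with NumPy rather than the MATLAB code used in the study; the function name `random_split`, the fixed seed, and the record count of 200 are assumptions made for the example.

```python
import numpy as np


def random_split(n_samples, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx) from a uniform random permutation.

    Every record has the same chance of landing in the test set, and the
    two index sets never overlap.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]


# Illustrative record count only; the number of records actually split
# in the study is not stated here.
train_idx, test_idx = random_split(200)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, so the same held-out records are used each time the model is retrained.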
Genfis3 code in MATLAB was used to fuzzify input variables and to establish the rule base. Genfis3 uses fuzzy
Once the variables have been fuzzified with the fuzzy c-means (FCM) clustering technique in Genfis3, establishing a fuzzy rule base for ANFIS is straightforward. The number of fuzzy rules is equal to the number of membership functions of the input variables (11 functions). Thus, 11 fuzzy rules were created in the rule base and used to estimate GFR values. Figures
Schematic diagram of predicting model and input of variables.
The ANFIS network structure used in the study to predict GFR values.
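To make the FCM clustering step concrete, the sketch below implements the standard fuzzy c-means update equations in Python with NumPy. It is not the Genfis3 implementation: the function name `fuzzy_c_means`, the fuzzifier m = 2, the stopping tolerance, and the synthetic two-cluster data are all assumptions chosen for illustration (the study reports 11 membership functions).

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns (centers, U).

    U[i, j] is the membership of sample i in cluster j; rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every sample to every center (guard against zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return centers, U


# Synthetic demo data: two well-separated groups in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(np.round(centers, 2))
```

Each cluster center can then seed one membership function per input and one rule, which is why the number of rules equals the number of clusters.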
A total of 465 CKD patients were evaluated in the study, of whom 277 were male. Diabetes mellitus was the underlying disease in 153 patients (33%). The GFR values ranged from 45 to 60 mL/min/1.73 m2 in 154 patients (33.1%), from 30 to 45 mL/min/1.73 m2 in 215 patients (46.2%), and from 15 to 30 mL/min/1.73 m2 in 96 patients (20.7%).
According to the clinical dataset recorded from patients, fuzzy clustering of the four significant input variables is shown for the period of 6 months in Figures
The fuzzy functions selected for input 1, underlying disease (diabetes), for the 6-month period.
The fuzzy functions selected for input 2, diastolic blood pressure (dbp), for the 6-month period.
The fuzzy functions selected for input 3, weight, for the 6-month period.
The fuzzy functions selected for input 4,
The rules in the rule base constitute a fuzzy inference system. After training, this system becomes the adaptive fuzzy inference system referred to as ANFIS. ANFIS training was performed in MATLAB.
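How a rule base of this kind turns inputs into a prediction can be illustrated with a first-order Sugeno inference step. The Python sketch below is an assumption about the general form of such a system (Gaussian input membership functions, product firing strengths, linear rule outputs); it is not the trained model from the study, and every parameter value in the demo is made up.

```python
import numpy as np


def sugeno_predict(x, centers, sigmas, coeffs, biases):
    """First-order Sugeno output for one input vector x.

    centers, sigmas, coeffs: arrays of shape (n_rules, n_inputs)
    biases: array of shape (n_rules,)
    """
    x = np.asarray(x, dtype=float)
    # Gaussian membership of each input in each rule's fuzzy set.
    mu = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)
    # Firing strength of each rule: product over its input memberships.
    w = mu.prod(axis=1)
    # Linear consequent of each rule.
    f = coeffs @ x + biases
    # Output: firing-strength-weighted average of the rule outputs.
    return float((w * f).sum() / w.sum())


# Toy rule base with two rules and one input (hypothetical values).
centers = np.array([[0.0], [10.0]])
sigmas = np.array([[1.0], [1.0]])
coeffs = np.array([[0.0], [0.0]])
biases = np.array([1.0, 3.0])

print(sugeno_predict([5.0], centers, sigmas, coeffs, biases))  # 2.0
```

Training in ANFIS adjusts the membership parameters (`centers`, `sigmas`) and the consequent parameters (`coeffs`, `biases`); the sketch covers inference only. An input far from every center would drive all firing strengths toward zero, so a production version would need to guard the final division.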
Considering GFR modeling at one, two, or three future periods with the selected input variables including underlying disease, diastolic blood pressure, weight, and
Comparison of the ANFIS prediction and real
Comparison of the ANFIS prediction and real
In the next step, the GFR function was estimated for the 6-month period according to the patients’ records. The results are presented as fitted surfaces for the output variable,
In the following step, the model was used to predict the GFR values at sequential 12- and 18-month intervals (Figures
Comparison of the ANFIS prediction and real
Comparison of the ANFIS prediction and real
Comparison of the ANFIS prediction and real
Comparison of the ANFIS prediction and real
For further assessment and comparison of the models, the results were examined using error criteria. Given the variation in the results, appropriate error criteria are needed to evaluate the accuracy and efficiency of the predicting model. Three criteria were selected to evaluate and compare the accuracy of the fuzzy model: Mean Square Error (MSE), Mean Absolute Error (MAE), and Normalized MSE (NMSE). Of these, the NMSE is preferred because it expresses the error on a normalized scale, reported here as a percentage. The formulas for error criteria are expressed by (
Comparison of error criteria for the training/test datasets for 6-, 12-, and 18-month periods. MSE: Mean Square Error; MAE: Mean Absolute Error; NMSE: Normalized MSE.
Period | Criterion | Training dataset ANFIS | Test dataset ANFIS |
---|---|---|---|
6 months | MSE | 48.7053 | 58.6253 |
6 months | MAE | 4.9960 | 4.7654 |
6 months | NMSE | 3.7428% | 4.7676% |
12 months | MSE | 53.1676 | 54.885 |
12 months | MAE | 5.1170 | 5.5010 |
12 months | NMSE | 4.1714% | 4.3019% |
18 months | MSE | 62.5255 | 64.0022 |
18 months | MAE | 5.5640 | 5.9302 |
18 months | NMSE | 4.8709% | 4.8787% |
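The three error criteria in the table above can be computed as in the following Python sketch. MSE and MAE follow their standard definitions. The NMSE normalization is an assumption: the squared error is divided by the sum of squared observed values and reported as a percentage, which may differ from the formula used in the study. The function names and the toy values are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def nmse_percent(y_true, y_pred):
    """Normalized MSE in percent.

    Assumed normalization: sum of squared errors divided by the sum of
    squared observed values.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2))


# Toy observed and predicted GFR values (made up for the example).
observed = [30.0, 40.0, 50.0]
predicted = [32.0, 38.0, 50.0]

print(mse(observed, predicted))           # 8/3, about 2.667
print(mae(observed, predicted))           # 4/3, about 1.333
print(nmse_percent(observed, predicted))  # about 0.16
```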
As the number of input variables increases, the modeling error decreases. However, as the ratio of observations to variables increases, reliability increases and the estimation error of the model parameters decreases. The modeler therefore prefers fewer variables, provided that the model error remains low. After reviewing the relationships between the independent and dependent variables, four of the 10 independent variables, those with the highest correlation with the dependent variable (GFR), were selected. Eliminating the other variables enhanced the accuracy of the models.
The present study addressed a specific clinical situation (predicting CKD progression) by integrated fuzzy modeling, which can be helpful in expediting medical applications. However, the question arises whether the number of time periods, or lags, considered for prediction affects the modeling and the error rate. A modeler always prefers to forecast over long periods, provided that the forecast error is not very high. Therefore, the time delay (
The model provides the ability to model fuzzy variables, so that uncertainty can be appropriately modeled. The model is able to estimate and predict GFR with an error rate reduced to about 4% in some cases. In addition to GFR prediction, the model produces a fuzzy database that expresses the complex relationships between experimental inputs and GFR as simple linear models in different modeling environments. The transparency of the GFR membership functions is an advantage of ANFIS over other models, such as statistical models (e.g., regression) or neural networks (e.g., the multilayer perceptron).
The proposed model can be used as a core computational component in a prospective medical decision support system (MDSS) to help physicians make appropriate decisions about the timing of renal replacement therapy. Such a system can take physiological parameters as input and predict the GFR values at future intervals. Using an appropriate GFR threshold, nephrologists can manage CKD patients effectively. To build such an MDSS, the predicting model must be complemented by evaluating other parameters that may influence kidney function.
The results of modeling and forecasting by ANFIS networks show that the models have reliable accuracy for all periods of 6, 12, and 18 months. Therefore, the fuzzy model could be used to predict GFR with high reliability. The main challenge in modeling the progression of renal failure is the high uncertainty of the human body as the environment, together with the highly dynamic nature of the disease. Statistical and machine learning based prediction models cannot effectively overcome these problems. However, the proposed model could substantially control these issues using the neurofuzzy approach. Furthermore, our model can accurately predict GFR values over long future periods: increasing the forecasting period to 12 and 18 months does not markedly reduce the accuracy of the prediction model (test NMSE of 4.88% at 18 months). This model can be used in clinical practice.
One of the main concerns of clinicians has been the efficacy of supervision and regular follow-up of CKD patients in slowing disease progression and preventing its inevitable complications. Therefore, using the efficient predictive model proposed in this study in a decision support system for kidney diseases, and managing CKD in a more quantitative manner, may be an important strategy for reducing its burden. Using this model, it is possible to monitor the impact of each variable routinely measured in CKD patients. In their retrospective study, Rucci et al. found that patients with proteinuria and a baseline estimated GFR (eGFR) of >33 mL/min/1.73 m2 had a faster decline of GFR (2.8 mL/min/1.73 m2) than those with a baseline eGFR of <33 mL/min/1.73 m2 and a baseline serum phosphorus of >4.3 mg/dL. Among patients without proteinuria, those younger than 67 years exhibited a significantly faster progression, which was even faster for the subgroup with diabetes. Among patients older than 67 years, females had a steadier eGFR than males [
Our results in comparison with the results of Tian et al. [
An ANFIS-based model was developed for modeling the progression of renal failure and predicting the time of renal failure. The model could accurately (>95%) predict the GFR for sequential 6-, 12-, and 18-month intervals. From the clinical point of view, the main limitation of this study was that urine protein was not among the variables evaluated for the prediction of GFR over 6 to 18 months.
The authors of the present study have no conflicts of interest.
This study was financially supported by Ahvaz Jundishapur University of Medical Sciences (Grant no. u-93185).