An Efficient Prediction System for Diabetes Disease Based on Deep Neural Network



Introduction
Diabetes is a noncommunicable chronic disease that disrupts the body's natural blood glucose concentration management with disorders of carbohydrate, fats, and protein metabolism due to imperfections in insulin secretion, insulin action, or both of them [1][2][3][4][5].
The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of different organs, especially the eyes, kidneys, nerves, heart, and blood vessels [1,6,7]. According to the World Health Organization, an estimated 422 million people worldwide have diabetes, this number is expected to grow to 693 million by 2045, and 1.6 million deaths are directly attributed to diabetes each year [8]. In addition, the worldwide economic expenditure for diabetes was estimated at approximately USD 760 billion and is expected to exceed USD 802 billion in 2040 [9]. Both the number of cases and the prevalence of diabetes have been steadily increasing over the past few decades, especially in second- and third-world countries [2].
Medical diabetes diagnosis is one of the most challenging and important tasks in medicine [1]. To predict the disease, several parameters must be collected, such as plasma glucose concentration, diastolic blood pressure, triceps skinfold thickness, serum insulin, body mass, and age [2,4], and analyzing them to reach a final decision may take a long time [1]. Therefore, advanced computer and information technologies such as machine learning algorithms are used rather than traditional approaches [6]. These can help physicians make critical medical decisions more accurately, in less time, and with less effort and cost [1].
The RF algorithm produces the highest accuracy compared to other algorithms. In [21], the authors developed a prediction model using a DT approach to identify individuals at low risk of type 2 diabetes in the Tehran Lipid and Glucose Study (TLGS) database. Moreover, different classification algorithms, such as Support Vector Machine (SVM), Multilayer Perceptron (MLP), Logistic Regression (LR), RF, and DT, have been compared in [22]. K-fold cross-validation was used, and the MLP classifier achieved the highest accuracy. Jakka and Vakula [23] evaluated diabetes prediction performance using several classification algorithms such as K-Nearest Neighbor (KNN), DT, NB, SVM, LR, and RF; the best accuracy was achieved with the LR algorithm. Similarly, the authors in [24] used many machine learning classification techniques such as DT, SVM, NB, RF, KNN, and LR to predict the disease, where the LR and SVM algorithms performed well compared to the other techniques. In [25], the authors presented a comparative study on the disease diagnosis using Levenberg-Marquardt (LM) and probabilistic MLP techniques, where the former gave the highest classification accuracy. In [26], T. Roopesh et al. assessed the performance of diabetes prediction using different machine learning algorithms for classification, regression, and clustering; both SVM and linear regression obtained the highest accuracy in comparison with the other techniques. Besides, Zou et al. [27] made a comparative study of three classifiers (Neural Network, RF, and DT), where the latter performed best. In [28], a comprehensive comparative study was conducted on various machine learning algorithms such as SVM, KNN, DT, NB, and LR for the disease classification, where LR gave the most accurate results.
Likewise, Mujumdar and Vaidehi [29] implemented many machine learning algorithms for diabetes prediction, such as SVM, RF, DT, Extra Tree Classifier, AdaBoost, Perceptron, Linear Discriminant Analysis (LDA), LR, KNN, Gaussian NB, Bagging, and Gradient Boost. LR gave the highest accuracy with 96%. Finally, the authors in [30] used several machine learning algorithms including SVM, KNN, LR, DT, RF, and NB to predict diabetes disease. Both the SVM and KNN algorithms provided the highest accuracy rate compared to the other algorithms.
However, machine learning techniques present some limitations in terms of precision and feature selection [1].
This drawback has been addressed by Deep Learning (DL) algorithms, which are widely used in many forms in medical fields [31][32][33][34][35][36][37]. Numerous studies show that DL techniques give better results than other techniques by minimizing the error rate, increasing the precision, and better resisting noise [1,3]. DL techniques can handle massive datasets and deal with complex problems with ease [1], which makes them well suited to our diabetes disease prediction system [6].
In this paper, we propose a diabetes prediction system for better diagnosis. Our work focuses on the following points:
(1) Setting up a system architecture for diabetes prediction based on the DNN algorithm in order to support an efficient diabetes diagnosis decision, including an evaluation of four different DNN architectures to find the best model.
(2) A comparison of the best DNN model's results against those of well-known ML classifiers such as LR, SVM, XGBoost, DT, and RF.
(3) A comparison of our proposed method with the state-of-the-art methods that used the same datasets, the same experimental protocol, and the same performance measurements.
The rest of the paper is organized as follows. The second section provides an overview of the proposed system. The section that follows presents results and analyses. Then, we show the comparison with the state-of-the-art techniques. Finally, Section 5 concludes the paper.

Proposed System
The proposed diabetes disease prediction system consists of several steps that are linked to each other to obtain the desired results. The first step consists of splitting the dataset into two subsets, training and testing data. Then, we applied two categories of methods (ML and DL) to carry out the training phase using the training samples with the best parameters. Finally, the trained models are used to predict the testing samples. The overall flowchart of the proposed system is shown in Figure 1.

Dataset Description.
To evaluate the performance of this work, we used the well-known diabetes dataset taken from Frankfurt Hospital, Germany [38]. It contains 2000 records with 9 attributes each. A brief overview of the attributes can be found in Table 1; the 9th attribute is the target, which indicates the absence or presence of the disease (value of 0 or 1, respectively). In this dataset, 34.2% of the records have a value of 1 and the rest (65.8%) have a value of 0, which matches the 684 diabetic and 1316 nondiabetic records used in the experiments. All the patients are females aged between 21 and 81. The first attribute, Pregnancies, gives the number of pregnancies and ranges from 0 to 17. The Glucose attribute is the result of the Glucose Tolerance Test, which examines how the body moves sugar from the blood into tissues such as muscle and fat; it has values ranging from 0 to 199. BloodPressure is the pressure in the arteries when the heart rests between beats; it has been recorded with a range of values from 0 to 122. Insulin is a hormone that aids in the movement of glucose (blood sugar) from the bloodstream into the cells, and its values range from 0 to 864. The SkinThickness attribute provides information about the fat reserves of the body; it has values from 0 to 99. The BMI attribute offers a quick and accurate way to determine whether a patient is overweight or underweight; it has been recorded with a range of values from 0 to 67.1. Finally, DiabetesPedigreeFunction provides a synthesis of the diabetes mellitus history in relatives and the genetic relationship of those relatives to the subject; it takes float values from 0.078 to 2.42.

Dataset Preprocessing.
Data preprocessing is a crucial stage that transforms the data into a usable and efficient format so that it can serve as input to the machine learning algorithm. In our system, only one preprocessing technique has been used, namely data normalization, which is generally considered a process of data structuring. Specifically, we apply StandardScaler normalization, which rescales each attribute to zero mean and unit variance. The StandardScaler formula is shown below in equation (1), where $X$ represents the input columns of the dataset to transform, $\mu$ and $\sigma$ denote the mean and standard deviation of each column, and $X_{STS}$ represents the transformed columns [39].

$$X_{STS} = \frac{X - \mu}{\sigma} \qquad (1)$$
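The standardization step described above can be sketched in a few lines of plain Python. This is only an illustration under the usual StandardScaler definition (column mean and population standard deviation); the function name `standardize` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale one attribute column to zero mean and unit variance."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation (divides by N)
    if sigma == 0:
        # Constant column: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical glucose readings, for illustration only.
glucose = [85.0, 110.0, 148.0, 183.0, 199.0]
scaled = standardize(glucose)
```

In a real pipeline the mean and standard deviation would normally be computed on the training subset only and then reused for the testing subset, so that no information from the test data leaks into training.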

Prediction Methods.
In this subsection, we briefly describe the different machine learning methods as well as the Deep Neural Networks that we used for evaluating the proposed system.

Logistic Regression.
Logistic Regression (LR) is a generalized linear model for the analysis of binary data, which seeks the best-fitting model for describing the relationship between a dependent variable and independent predictors [40,41]. The LR model is most commonly used for predicting sickness or health status [42,43]. Based on the given risk factors, the LR model can calculate the likelihood of an individual acquiring diabetes disease [43]. If a person suffers from diabetes disease, the value of the target is 1; otherwise, the target is 0. We denote the probability of an individual developing diabetes disease by $P(X)$. The LR model's formula is defined as follows:

$$\ln\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

After exponentiating both sides, we obtain

$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}$$

where $X_1, X_2, X_3, \ldots, X_k$ represent the risk factors, $\beta_1, \beta_2, \beta_3, \ldots, \beta_k$ are regression coefficients, and $\beta_0$ is the intercept.
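As a rough illustration of how the LR model turns risk factors into a probability, the sketch below solves the log-odds relation for P(X). The coefficient values, the feature values, and the 0.5 decision threshold are all assumptions chosen for the example; they are not fitted values from this study.

```python
import math


def diabetes_probability(features, coefficients, intercept=0.0):
    """Return P(X) given a linear log-odds model (logistic regression)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Solving ln(P / (1 - P)) = log_odds for P gives the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical standardized risk factors and coefficients.
x = [0.8, -0.2, 1.1]
beta = [1.2, 0.3, 0.7]
p = diabetes_probability(x, beta, intercept=-0.5)

# Assumed decision rule: predict target = 1 when P(X) is at least 0.5.
predicted_target = 1 if p >= 0.5 else 0
```

A log-odds of zero corresponds to a probability of exactly 0.5, which is why 0.5 is the conventional cut-off for a binary target.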

Support Vector Machine.
SVM is a nonprobabilistic classifier formally defined by a separating hyperplane. Based on the available training data (supervised learning), the technique creates an optimal hyperplane with the greatest distance from the support vectors. In two-dimensional space, this hyperplane is a line that divides the plane into two classes. The epsilon ε, regularization, and kernel parameters are the SVM classifier's tuning parameters [6,44]. The principle of SVM is shown below in Figure 2.
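To make the hyperplane idea concrete, the following sketch classifies points with an already-trained linear SVM and computes the width of its margin. Training is not shown, and the hyperplane parameters `w` and `b` are hypothetical values picked so the arithmetic is easy to follow.

```python
import math


def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class 1 on the non-negative side of the hyperplane, else 0."""
    return 1 if decision_value(w, b, x) >= 0 else 0


def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane in two-dimensional space.
w, b = [3.0, 4.0], -5.0
label_a = classify(w, b, [2.0, 1.0])
label_b = classify(w, b, [0.0, 0.0])
width = margin_width(w)
```

Maximizing the margin is equivalent to minimizing the norm of `w`, which is why a smaller weight vector corresponds to a wider separation between the two classes.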

Extreme Gradient Boosting (XGBoost).
Extreme Gradient Boosting is an improved supervised algorithm proposed by Chen and Guestrin [45] based on the Gradient Boosting Decision Tree algorithm [46]. XGBoost can be used to solve regression and classification problems and is popular among data scientists because of its high execution speed and high accuracy [47].
The XGBoost objective function consists of a loss function and a regularization term, which helps prevent overfitting by smoothing the final learned weights [48]. The loss function $l(\hat{y}_i, y_i)$ measures the deviation between the predicted label $\hat{y}_i$ and the actual label $y_i$. The regularization term $\Omega(f_k)$ controls the complexity of the model and also helps handle the overfitting issue [48,49]. XGBoost optimizes the loss function using first-order and second-order gradient statistics. The objective function for XGBoost is defined as follows [49]:

$$Obj = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)$$

The predicted label $\hat{y}_i$ of the tree boosting model can be expressed as the sum of the prediction scores of all the trees, where $K$ is the number of trees in the XGBoost model, $x_i$ is the $i$-th sample of the dataset, and $\mathcal{F}$ is the space of classification and regression trees (also referred to as CART) [46][47][48]:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

The regularization term for penalizing the complexity of each tree is shown in equation (7), where $T$ denotes the number of leaves in the tree, $\lambda$ is a regularization hyperparameter controlling the L2-norm of the leaf weights $W$, and $\gamma$ is a regularization hyperparameter that sets the complexity cost of introducing an additional leaf, depending on each dataset [49,50].

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert W \rVert^2 \qquad (7)$$

The main concept behind boosting is to create a more accurate model by combining many simple trees with low accuracy, with a new tree created at each iteration. There are many different methods for creating a new tree [50]. The most common one is Gradient Tree Boosting, an improved version of tree boosting that trains the tree model using gradient descent to generate the new tree based on all previous trees. Therefore, $\hat{y}_i^{(t)}$ can be represented by $\hat{y}_i^{(t-1)} + f_t(x_i)$, and the objective function at step $t$, $Obj^{(t)}$, is as follows [48]:

$$Obj^{(t)} = \sum_{i} l\left(\hat{y}_i^{(t-1)} + f_t(x_i),\, y_i\right) + \Omega(f_t)$$

The first-order and second-order gradient statistics of the loss function are shown below in the two following equations, respectively:

$$g_i = \frac{\partial\, l\left(\hat{y}_i^{(t-1)}, y_i\right)}{\partial\, \hat{y}_i^{(t-1)}}$$

$$h_i = \frac{\partial^2 l\left(\hat{y}_i^{(t-1)}, y_i\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$

It is worth noting that $g_i$ and $h_i$ help to find the optimal weights $W$. Hence, after a second-order Taylor expansion of the loss and removal of the constant terms, the objective function becomes [47,49]

$$Obj^{(t)} \simeq \sum_{i} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
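The role of the first-order and second-order gradient statistics can be illustrated with a small sketch. It assumes the binary log loss, for which the gradients with respect to the raw score are p - y and p(1 - p), and it uses the closed-form leaf weight from the standard XGBoost derivation, which this section does not state explicitly. The names `gamma` and `lam` stand for the two regularization hyperparameters of the tree penalty; all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_gradients(raw_scores, labels):
    """First- and second-order gradients of the log loss w.r.t. the raw score."""
    g, h = [], []
    for score, y in zip(raw_scores, labels):
        p = sigmoid(score)
        g.append(p - y)          # first-order statistic g_i
        h.append(p * (1.0 - p))  # second-order statistic h_i
    return g, h


def optimal_leaf_weight(g, h, lam):
    """Weight minimizing the second-order objective for samples in one leaf."""
    return -sum(g) / (sum(h) + lam)


def tree_penalty(leaf_weights, gamma, lam):
    """Complexity penalty: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


# Four samples that all start from a raw score of 0 (probability 0.5).
g, h = logistic_gradients([0.0, 0.0, 0.0, 0.0], [1, 1, 1, 0])
w_star = optimal_leaf_weight(g, h, lam=1.0)
penalty = tree_penalty([0.5, -0.5], gamma=1.0, lam=2.0)
```

Because three of the four samples are positive, the leaf weight comes out positive, pushing the raw score of that leaf toward the positive class; a larger `lam` shrinks the weight toward zero.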

Decision Tree.
DT is a nonparametric supervised learning algorithm for regression and classification tasks. A DT (Figure 3) can be seen as a model made up of a root node, branches, and leaf nodes. Each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. The first node in the tree is the root node. First, an attribute is selected and placed at the root node. Then, a branch is made for each possible value. This splits the dataset into subgroups, one for every value of the attribute. The process is repeated recursively for each branch using only those cases that reach the branch. When all cases at a node have the same classification, the growth of the tree can be stopped. Usually, entropy or classification error is used to determine the best split [51,52].
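A minimal sketch of how entropy can be used to compare candidate splits is given below. It computes the information gain (entropy of the parent node minus the weighted entropy of the child nodes); information gain is one common criterion and is assumed here for illustration, and the label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [1, 1, 1, 0, 0, 0]
# A split that separates the classes completely.
perfect = information_gain(parent, [[1, 1, 1], [0, 0, 0]])
# A split whose subgroups are as mixed as the parent node.
useless = information_gain(parent, [[1, 0], [1, 0], [1, 0]])
```

The attribute whose split yields the largest gain would be placed at the node, and the same computation is repeated on each resulting subgroup.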

Random Forest.
RF is one of the most common uses of classifier integration. As shown in Figure 4, RF is made up of numerous separate Decision Tree classifiers that vote on test samples according to a set of criteria [53,54]. The steps are as follows:
(i) Extracting some samples from the training set as a training subset using the bootstrap method, which is a self-help sampling approach.

Deep Neural Network.
DNNs have the same basic architecture as ANNs [56], with the exception that DNNs may have several hidden layers, hence the term "deep." A Deep Neural Network can have up to about 150 hidden layers [1], and each layer can have several neurons, as shown in Figure 5. The input of each layer depends on the output of the previous layer, and so on until the prediction of the model is obtained in the output layer [57]. The output value of the first neuron of hidden layer (1) is $Z_1$, which is the sum of the products of the weights and inputs plus the bias, as shown in equation (12), where $x_i$ are the $n$ inputs, $w_i$ the corresponding weights, and $b$ the bias:

$$Z_1 = \sum_{i=1}^{n} w_i x_i + b \qquad (12)$$

$Z_1$ can take any value from $-\infty$ to $+\infty$, so on its own it cannot determine whether the neuron fires. The activation function $F$ decides whether the neuron fires and computes $A_1 = F(Z_1)$, which is the input to the next layer, and so on [57]. The two activation functions used in the proposed model are ReLU for the hidden layers and Sigmoid for the output layer (binary classification).
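The forward computation of a neuron described above (weighted sum plus bias, followed by an activation) can be sketched as follows. The layer sizes, weights, and inputs are hypothetical and far smaller than the architectures evaluated in this work; only the ReLU and Sigmoid activations are taken from the text.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One dense layer: each neuron computes activation(sum(w * x) + b)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical standardized inputs and weights, for illustration only.
x = [0.5, -1.0, 2.0]
hidden = dense(x, [[0.2, 0.4, 0.5], [-0.5, 0.3, -0.2]], [0.1, 0.0], relu)
# Single sigmoid output neuron for binary classification.
output = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)[0]
```

Here the second hidden neuron receives a negative weighted sum, so ReLU sets its output to zero, while the sigmoid output can be read as the predicted probability of the positive class.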

Experimental Results
In this section, we evaluate the performance of the DNN algorithm on the testing data to assess the effectiveness of our system based on several evaluation metrics. In addition, our proposed model is compared with the machine learning algorithms described in Section 2.3 in order to demonstrate the superiority of our model. The dataset was split into two subsets: the first, for training, contains 80% of the whole data (547 diabetics/1053 nondiabetics), and the second, for testing, contains 20% of the whole data (137 diabetics/263 nondiabetics).
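The class counts reported above are consistent with a split that preserves the class ratio in both subsets. The sketch below shows one way such a stratified 80/20 split could be produced; the use of stratification is an assumption inferred from the counts, and the function name, seed, and rounding rule are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both subsets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# Class totals implied by the reported split: 684 diabetics, 1316 nondiabetics.
labels = [1] * 684 + [0] * 1316
train_idx, test_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
test_pos = sum(labels[i] for i in test_idx)
```

With these totals the sketch yields 547 diabetic and 1053 nondiabetic training records and 137 diabetic and 263 nondiabetic testing records, which agrees with the counts given in the text.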

Evaluation Metrics.
The confusion matrix (Figure 6) is a useful tool for summarizing the results of a classification model [1,56]. In classification, each prediction falls into one of four cases, as follows.
If the actual value of the target in the dataset is True and the classifier predicts it as such, then the prediction is a True Positive (TP). On the contrary, if the classifier predicts it as False, then the prediction is a False Negative (FN). Similarly, if the actual value of the target in the dataset is False and the classifier predicts it as such, then the prediction is True Negative (TN). On the contrary, if the classifier predicts it as True, then the prediction is False Positive (FP) [58].
The confusion matrix, shown in Figure 6, makes it easy to determine how the developed predictive model performs. The following metrics are used to evaluate the proposed model [49,56,57,58,59].
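Accuracy (Acc) is the percentage of correct predictions that a classifier has made compared with the actual values of the target in the testing phase:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

Specificity (Spec) is the proportion of actual negatives that are correctly classified as True Negatives during the test:

$$Spec = \frac{TN}{TN + FP}$$

Sensitivity (Sen), also called recall, is the proportion of actual positives that are correctly classified as True Positives during the test:

$$Sen = \frac{TP}{TP + FN}$$

Precision (Pre) is the percentage of instances that a classifier has labelled as positive that are actually positive, with respect to the total predicted positives (the exactness of a classifier):

$$Pre = \frac{TP}{TP + FP}$$

F1-score is the harmonic mean of precision and recall:

$$F1 = \frac{2 \times Pre \times Sen}{Pre + Sen}$$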
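The metrics above follow directly from the four confusion matrix counts. The sketch below computes them, including sensitivity (recall), which is needed for the F1-score; the counts passed in are hypothetical and are not the results reported in this paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation metrics from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall: share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives found
    precision = tp / (tp + fp)     # share of predicted positives that are right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 400-sample test set.
m = classification_metrics(tp=130, fn=7, fp=5, tn=258)
```

Reporting all five values matters on an imbalanced dataset such as this one, since a classifier can reach a high accuracy while still missing many positive cases.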

Prediction with ML Methods.
A comparative analysis of the conventional machine learning algorithms for diabetes prediction is carried out in this section, in order to compare and analyze their accuracies.

Hyperparameter Optimization.
Hyperparameter optimization (i.e., tuning) is important because it directly controls the behavior of the training process of the algorithm and has a significant impact on the performance of the model. There are four common methods of hyperparameter optimization: Manual search, Random search, Bayesian optimization, and Grid search [56,58]. In this work, we applied the Grid search method to each algorithm; it systematically builds and evaluates a model for each combination of parameters in a specified grid. We implemented five machine learning classifiers for binary classification, that is, determining whether or not the patient has diabetes. Each classifier has many hyperparameters, most of which do not need to be changed, but the main ones must be tuned to obtain a good model. Thus, the parameters tuned to achieve better results and their default values for each algorithm are shown in Table 2. To show the impact of hyperparameter optimization on the overall system results, we compare the performance of the selected ML algorithms with and without this process. Table 3 presents the average score obtained by each classifier using five metrics. We clearly see that all prediction methods give better results with optimization than without it, and that RF gives the highest performance among them.
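The Grid search procedure described above can be sketched generically: every combination of values in the grid is evaluated and the best-scoring one is kept. In this sketch the scoring function is a hypothetical stand-in for "train the classifier with these parameters and return its validation accuracy", and the parameter names and values are illustrative rather than the grid from Table 2.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at max_depth=6 and n_estimators=200."""
    return (
        1.0
        - abs(params["max_depth"] - 6) * 0.05
        - abs(params["n_estimators"] - 200) / 1000
    )


grid = {"max_depth": [2, 4, 6, 8], "n_estimators": [100, 200, 300]}
best_params, best_score = grid_search(grid, toy_score)
```

The number of models trained grows as the product of the grid sizes (12 here), which is the main cost of Grid search compared with Random search or Bayesian optimization.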

Evaluation of the DNN Method.
There are different types of layers in a DNN. In this work, three types of layers were implemented: a dense layer, which consists of a matrix of weights and a bias; a dropout layer, which helps prevent overfitting by dropping out a certain fraction of the layer's input units at each stage of training [1,60]; and a batch normalization layer, which rescales the layer's inputs. We also used the Early Stopping technique, which monitors the improvement of our model during training [61]. We carried out many experiments by changing the number of layers, the number of neurons in each layer, and the types of layers, as shown in Table 4.
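The logic behind Early Stopping can be sketched as follows: training halts once the monitored validation loss has failed to improve for a given number of consecutive epochs. The `patience` and `min_delta` settings and the loss values are hypothetical; the source does not report the settings that were actually used.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the loss has not improved by
    more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Ran through all epochs without triggering the stopping rule.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses per epoch.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

In this example training stops at epoch 5, before the later improvement at epoch 6 is seen, which illustrates the trade-off in choosing the patience: a small value saves time but can stop too early.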
As shown in Table 5, DNN model number 4 is the best one, with the following parameters: Epochs = 500, Batch_size = 200, and Random_state = 0. Therefore, this model is used for the rest of this study. The confusion matrix of the DNN prediction results is shown in Figure 7. The performance of the model can be easily obtained from this confusion matrix by computing the metrics summarized in Table 5. The behavior of the accuracy is shown in Figure 8, where the blue line represents the training phase and the orange one represents the testing phase, with best accuracy values of 99.0% and 99.75%, respectively.

Performance Comparison.
To show how the proposed DNN achieves superior performance, we compared it with the other prediction methods evaluated above. In the following, we discuss the performance obtained by each classifier using boxplot diagrams.

Accuracy.
The accuracy performance of the proposed DNN in comparison with the five ML methods is shown in Figure 9. DNN achieved the highest ACC with 99.75%, and the implemented ML methods also performed excellently. Only LR performed relatively poorly, with an ACC of less than 80%.

Specificity.
Figure 10 shows the specificity performance of the proposed DNN in comparison with the other ML methods, which performed excellently with more than 96%, except LR, which showed the lowest specificity. The highest value of specificity is 99.60%, and it was achieved by the DNN method.

Sensitivity.
The sensitivity performance of the proposed DNN and ML methods is shown in Figure 11. The proposed DNN achieved the highest sensitivity with 100.0%. The other ML methods performed excellently with more than 95%, except the LR method, which performed very poorly.

Precision.
The precision performance of the proposed DNN and ML methods is presented in Figure 12. The highest precision (99.32%) was obtained with the DNN method. In addition, the ML methods achieved good precision of more than 93%, except the LR method, which gave the worst precision.

F1-Score.
The F1-score performance of the proposed DNN and the other ML methods is shown in Figure 13. Except for the LR technique, all methods performed excellently with an F1-score greater than 94%. The highest value of F1-score (99.66%) was achieved using DNN.
Based on these statistics, the proposed DNN is the best prediction model among the implemented methods.

Comparison with the State-of-the-Art Methods
To show how well our diabetes prediction system performs, we compared it with other works that used the same dataset and the same performance measures. It is worth noting that this comparison was based only on the accuracy metric because the other evaluation metrics are not available. As observed from Table 6, the proposed DNN prediction outperforms the works reported in the literature.

Table 2 (fragment): Kernel | Specifies the kernel type to be used in the algorithm. | "rbf" | ["linear", "poly", "rbf", "sigmoid"] | "poly"
XGBoost | learning_rate | Step

Conclusion
In this study, we proposed an efficient diabetes prediction system based on the Deep Neural Network (DNN) algorithm to identify whether or not a person has diabetes. We presented a comparative study between the DNN and several machine learning techniques. The evaluation of these models on various performance metrics such as accuracy, specificity, sensitivity, precision, and F1-score demonstrated the superiority of the proposed DNN method. Furthermore, we compared our system with the state-of-the-art methods. This comparison showed that a diabetes prediction system based on the DNN algorithm can provide promising and better performance compared to the state-of-the-art techniques. Applying this method can have a direct impact on, and yield economic savings in, the design and development of diabetes disease prediction systems in healthcare.

Data Availability
The data used to support the findings of this study are freely available.

Conflicts of Interest
The authors declare that they have no conflicts of interest.