An Improved Artificial Neural Network Model for Effective Diabetes Prediction

Data analytics, machine intelligence, and other cognitive algorithms have been employed in predicting various types of diseases in health care. The revolution of artificial neural networks (ANNs) in the medical discipline emerged for data-driven applications, particularly in the healthcare domain. It ranges from diagnosis of various diseases, medical image processing, decision support system (DSS)


Introduction
The revolution in artificial neural networks (ANNs) within medical research emerged in data-driven applications, particularly in the healthcare sector. Its applications range from the diagnosis of various diseases and medical image processing to decision support systems (DSSs) and disease prediction. Many contemporary techniques, such as deep learning models, recurrent neural networks, and genetic algorithms, are significant in artificial intelligence, robotics, image processing, and several other cutting-edge technologies, especially in the health sector. ANN is a well-suited tool for identification, analysis, and prediction in both general and health-related domains, including thromboembolic stroke disease, bone densitometry, hepatitis B, and breast cancer [1].
This paper presents a predictive framework for the successful diagnosis of diabetes using a dataset of female patients and their corresponding attributes. Typical regression models have previously been employed to solve this problem. These models usually rest on assumptions about the statistical independence and interdependence of their input and output variables, their continuity, and the presence of external variables.
In most applications, however, these assumptions are often violated or disregarded because of complicated physiological characteristics. Research on designing various models to enhance patients' diabetes self-management is in progress [2]. Although several models and programs have been developed to benefit medical research, rigorous testing of these models is partial and inadequate, which makes them hard to apply and difficult to manage. In this research, a well-organized and refined regression model, the artificial backpropagation scaled conjugate gradient neural network (ABP-SCGNN), is proposed to predict the underlying disease, since ANNs have already been employed in medical research and analytical studies [3]. Backpropagation is a supervised learning method for ANNs that uses gradient descent. The algorithm computes the gradient of the error function with respect to the weights.
Diabetes is a disease associated with problems in the hormone insulin. It may appear when the human body responds to insulin in a negative manner. Unfortunately, no cure has been found so far, but people with diabetes can take certain precautions to manage their disease and enjoy a healthy life.
This paper focuses on successful prediction using an ANN in the diagnostic approach by choosing attributes carefully. Upon a positive predicted result, people may consult their doctors about precautionary treatment. Patients suffering from diabetes typically fall victim to other serious conditions, ranging from heart attack to kidney failure, if the disease is not appropriately controlled [4]. In many countries, diabetes is a significant cause of cardiovascular disease, blindness, kidney failure, and lower limb amputation.
The number of patients with diabetes has been increasing worldwide: nearly 200 million people are affected, and more than half of them are female. Females between the ages of 25 and 44 are at particularly high risk of developing the disease. Several research studies are in progress to predict diabetes for early diagnosis and treatment [4,5].
Such models can be designed and implemented using ANNs, which have proved helpful, efficient, and effective in several fields of medicine, such as analysis, diagnosis, and prediction, and assist not only professionals but also ordinary people [5]. An ANN is a mathematical representation of the human neural system that demonstrates the capacity for training and generalization. Most ANN techniques are based on nonlinear functions in which the association among input features is either complex or unknown. An ANN is formed by a series of nodes, also called neurons, organized in different layers. In a typical ANN, each neuron is directly connected to the neurons of other layers through weighted values that represent the strength of the connection between them [6,7]. Each neuron's input is a weighted combination of several input signals, which in turn determines the resulting output. These neurons apply a transfer function to the weighted inputs to evaluate them against a threshold value. Using the activation function, the signal is sent to the next connected neuron if the threshold value is exceeded.
Given the functions of ANNs, it is important to understand how they perform prediction, perception, classification, and pattern recognition after appropriate training [7]. Significant work has been carried out on ANN development for applications in medical fields, for example, classification, clustering, data optimization, and prediction based on a given set of inputs. Any ANN model consists of several components: an input layer, one or more hidden layers, an output layer, the neurons within them, and their interconnections. Choosing these components is a sensitive task, as too few result in slow training and, on the contrary, too many decrease the overall network processing power.
There are different ways to determine these components, typically the pruning method and the growing method. The pruning method starts with many components and reduces the network size by removing those that are not important [8]. The growing method, which was used in the proposed work, adds the desired components gradually. Several algorithms have been applied to the dataset used in this paper to select the attributes, process the data, and predict diabetes accordingly. The same diabetes dataset has been used for medical diagnosis with general regression neural networks (GRNNs), multilayer perceptron (MLP) neural networks, radial basis function (RBF) networks, and feed-forward neural networks; to compare performance, the MLP was tested with different types of backpropagation training algorithms [9]. An adaptive learning routine has been employed in [10]. The rest of the paper is organized as follows. Section 2 presents previous related work on diabetes prediction. Section 3 explains the proposed methodology, in which the ABP-SCGNN-based diabetes prediction framework is given in detail. Experiments and results are presented in Section 4, and the conclusion of the study with future work is given in Section 5.

Literature Review
In 2011, Sapon et al. [1] used data from 250 diabetes patients, both male and female and aged 25-78 years, with 27 input variables to train a network to identify the disease pattern. Among the three algorithms compared, Bayesian regulation produced the best result in the prediction of diabetes, ahead of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) Quasi-Newton and Levenberg-Marquardt algorithms. BFGS Quasi-Newton achieved a correlation coefficient of 0.86714 in 578 epochs, whereas Bayesian regulation achieved 0.99579 in only 37 epochs and Levenberg-Marquardt achieved 0.6051 in only five epochs. In [1], the Bayesian regulation algorithm showed a good correlation between estimated targets and actual outputs (i.e., 0.99579) with 88.8% prediction accuracy, which confirms the suitability of this algorithm for successful diabetes prediction.

Complexity
For dataset classification, in 2012, Choubey et al. [8] applied a naïve Bayes (NB) classifier, as well as a genetic algorithm (GA) combined with NB, to predict diabetes in females aged 21-78 years. The total number of instances was 768. First, classification was performed on PIDD using naïve Bayes, and a genetic algorithm was used to add and remove attributes from the dataset. This reduced the computational cost and time and increased the ROC and classification accuracy. A comparison of results on PIDD in terms of accuracy and ROC shows that GA combined with NB gives the most accurate result and a better ROC than the other methods.
The Levenberg-Marquardt training algorithm demonstrated the best result on the training data. The accuracy of RBF was not better than that of MLP, even when all underlying values were used. The GRNN attained the best result on the test data (i.e., 80.21%).
This algorithm proves to be an excellent and pragmatic choice for the successful classification of diabetes data.
In 2016, Florez et al. [11] used 700 instances, selected randomly using the R software, as training data to predict the intensity of diabetes. The MSE was 2.952. Next, the higher-risk subgroups of characteristics that lead to diabetes were obtained. The unmodified model comprises the number of times pregnant (PRG), plasma concentration in saliva (PLASMA), BP, body mass index (BODY), and diabetes pedigree function (PEDIGREE). PRG, PLASMA, BODY, and PEDIGREE have certain effects on predicting diabetes according to their coefficients. The larger set of variables with a higher probability of indicating diabetes has an MSE of 3.21068. Therefore, the final variables are PRG, PLASMA, BODY, and PEDIGREE.
Iqbal et al. [12] proposed an intelligent system for smart cities that uses clusters and fuzzy inference systems to predict traffic congestion. The solution for traffic congestion is presented using smart mobile technologies. It motivates the idea of predicting diabetes using computational intelligence technologies.
Since the proposed framework is based on the ABP-SCGNN algorithm, the experiment can proceed with a small number of components, and a decision can be made based on the performance of the learning curve. The components of the network can be fixed once training completes with a small error rate.

Proposed Methodology
People of any age group may fall victim to diabetes. The causes may vary with age group, gender, lifestyle, glucose and insulin levels, BP, and so on. At present, numerous algorithms, such as ANNs, SVM, and naïve Bayes (NB) with fuzzy logic (FL), are used for predicting the diagnosis of diabetes. These algorithms have their own trade-offs regarding processing time, accuracy rate, and the ability to obtain hidden information from the given data. In complex problem domains, finding an optimum solution may require a nontrivial approach [13][14][15][16]; the proposed framework demonstrates the working process of ANN algorithms for predicting, validating, and systematically testing the network for the intended purpose, to improve reliability and certainty. Operations and parameters are obtained during ANN training, and a careful comparison of predicted and obtained values is used for ANN optimization. The whole course of action is shown schematically in Figure 1.
According to the framework, the network receives a dataset that consists of female diabetes patients' details for predicting the disease diagnosis. Since the dataset contains different attributes/features (e.g., glucose, insulin, and BP) relevant to the patients' details, these features must be selected carefully, as some may mislead the result because of noise or null data. Sometimes the value ranges differ widely among attributes. A formalized process is recommended to minimize erroneous results and obtain a relatively improved result [9]. Data used for training must be preprocessed before the evaluation process. The network architecture varies from classifier to classifier, and the underlying algorithm parameters depend on the classifier used to train the network. Other diabetes patients can use the trained network to predict the diagnosis of their disease. Finally, a diabetes physician or doctor of medicine evaluates and appraises the predicted diagnosis. A detailed explanation of the proposed framework is provided as follows.

Data Collection and Representation.
The dataset used in the study is obtained from the National Institute of Diabetes and Digestive and Kidney Diseases [17]. The purpose is to predict, from selected key diagnostic attributes included in the dataset, whether a person is a diabetes patient or not. The dataset contains data only for female patients aged at least 21 years who are residents of Arizona, USA. The response variable is binary: 0 (negative test for diabetes) or 1 (positive test for diabetes). Approximately 35% (268) of the patients belong to class 1, and the remaining 65% (500) belong to class 0 [9]. Researchers have mentioned many vital factors that may vary from patient to patient and indicate diabetes currently or after a predefined period [18].

Feature Selection/Reduction.
To diagnose diabetes correctly, key attributes should normally be independent of each other. Medical experts examine the attributes carefully and decide whether each is required for identifying a viable diagnosis. These attributes, called features in ANNs, may be diabetes symptoms or other relevant pieces of information that assist in predicting diabetes. Choubey et al. [8] used a GA for selecting attributes (features) and NB for classification on the PID dataset.
Since feature selection is the process of identifying the most relevant features from the complete set, it increases the success ratio of predicting the diagnosis of diabetes patients [8]. As noted in [18], these features must be robust and noise-free; hence, the features are chosen carefully from the entire dataset. Women may develop diabetes from the sixth month of pregnancy, and in addition, glucose and insulin levels play a vital role in the diagnosis of diabetes. The selected dataset includes eight explanatory input variables and one output response variable. Each variable individually plays a vital role in diagnosing diabetes. Taken together, their values enable the network to be trained effectively and to perform the diabetes prediction. The details of both types of variables are given in Table 1.

Preprocessing.
Many methods are available to preprocess the data in the dataset before model evaluation [19,20]. The preprocessed result must support the anticipated output. The learning rate, momentum, and time taken are all obtained in response to preprocessing. Data must be transformed so that they fulfil the acceptance criterion. The values of features such as blood pressure, skin thickness, insulin, and BMI must map to the outcome value. The diabetes dataset contains 768 instances with 8 input variables, which are enough to predict the diabetes diagnosis using the simulation model. Noisy data had already been eliminated before the dataset was received from the concerned authority [19].
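As a minimal sketch of one possible preprocessing step, the following example rescales each feature to the range [0, 1] so that attributes with large value ranges do not dominate the others. Min-max normalization is an assumption made here for illustration; the paper does not state which transformation was applied. The function name `min_max_scale` and the sample values are hypothetical and are not taken from the dataset.

```python
import numpy as np


def min_max_scale(features: np.ndarray) -> np.ndarray:
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    # Replace zero ranges by 1 to avoid division by zero for constant columns.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (features - col_min) / safe_range


# Illustrative rows only (columns: glucose, insulin, BMI).
sample = np.array([
    [148.0, 0.0, 33.6],
    [85.0, 94.0, 26.6],
    [183.0, 168.0, 23.3],
])
scaled = min_max_scale(sample)
print(scaled.round(3))
```

After scaling, every column spans the same interval, which is one common way of handling attributes whose raw ranges differ widely.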

Neural Network Fitting Process.
The objective of gathering the abovementioned information is to design a neural network model that can predict precisely whether a particular patient has diabetes or not. Since an ANN can be trained with many algorithms, several are used in this paper to train the network, obtain the prediction ratio, and compare the results. After choosing the patients' dataset, the next step is to partition the data according to the requirements. In our case, the data are divided into three segments: training, validation, and test data. Training data are presented to the network during training, and the network is adjusted according to its error. The diabetes dataset consists of 768 patients, and 70% (approximately 537) of the instances are used for training. Repeated training runs may generate different results each time, depending on the conditions and the data. Validation data are used to measure network generalization; when it reaches its optimum level, the process stops. Approximately 115 instances are selected for validation to evaluate the training impact pragmatically. Test data have no effect on training, during or after the training process, and therefore provide an independent measurement of the performance of the network. A set of 115 instances is chosen for testing, and each algorithm's overall performance is observed and recorded accordingly.
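The partitioning described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the 70/15/15 fractions are inferred from the reported counts, the shuffle seed is arbitrary, and the helper name `split_indices` is hypothetical. Because 768 does not divide evenly, the remainder is assigned to the test set here, which gives 116 test instances instead of the approximately 115 reported.

```python
import numpy as np


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the row indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # whatever is left over
    return train, val, test


train_idx, val_idx, test_idx = split_indices(768)
print(len(train_idx), len(val_idx), len(test_idx))  # prints 537 115 116
```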

Proposed System Model.
The ABP-SCGNN architecture is divided into four distinct parts, as depicted in Figure 2: initialization of weights, forward propagation, backward propagation of error, and updating of weights and bias. The hidden layer consists of a number of neurons, and every neuron has the activation function $f(x) = \mathrm{Sigmoid}(x)$. The activation function takes the weighted sum of the inputs plus the bias, $w_{11}x_1 + w_{21}x_2 + w_{31}x_3 + \cdots + w_{81}x_8 + 1 \cdot b$, as its argument. The activation function provides a nonlinear transformation, which allows the network to represent nonlinear relationships and to approximate complex functions. Its input is calculated by multiplying each input by its weight and then adding the bias value. The following section explains how the proposed neural network performs its intended operations. To begin with, forward propagation is performed. Next, backward propagation is performed after comparing the predicted output $\hat{Y}$ with the actual output $Y$, during the calculation of the error gradient at the output layer.
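The computation performed by a single hidden neuron, as described above, can be sketched as follows. The input, weight, and bias values are hypothetical placeholders chosen only to make the example runnable, and the function names are illustrative.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, the activation function f(x) used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))


def neuron_output(inputs, weights, bias):
    """Apply the sigmoid to w_11*x_1 + ... + w_81*x_8 + 1*b."""
    weighted_sum = np.dot(weights, inputs) + 1.0 * bias
    return sigmoid(weighted_sum)


# Hypothetical, already-scaled values for the eight input features.
x = np.array([0.35, 0.74, 0.59, 0.35, 0.0, 0.50, 0.23, 0.48])
w = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 0.4, 0.2, -0.3])
b = 0.1
print(neuron_output(x, w, b))
```

Because the sigmoid maps any real argument into the interval (0, 1), the neuron output can be passed directly to the next layer.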
A neural network is trained with the diabetes dataset by using the following algorithms:
(1) Multilayer perceptron (MLP)
(2) Bayesian regularization
(3) Scaled conjugate gradient
The proposed framework contains one input layer with eight input features (neurons), a single hidden layer, and one output layer that produces one of the two outputs mentioned in the proposed research methodology. The dataset comprises input-output pairs $(X_i, Y_i)$, where $X_i$ is the input and $Y_i$ is the predicted output. If $n$ is the total size of the entire set, then $X_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.
The parameters of the feed-forward network are denoted collectively as $\theta$. Training a neural network involves the gradient of the error function $E(X, \theta)$ with respect to the weights $w_{ij}^k$ (the weight for node $j$ in layer $l_k$ for incoming node $i$) and the biases $b_i^k$ (the bias at node $i$ in layer $l_k$). According to the learning rate, gradient descent updates the parameters at each iteration by moving in the direction of the negative gradient, where $\theta^t$ denotes the ANN parameters at iteration $t$. The MSE in backpropagation is computed from $\hat{y}_i$, the predicted output, and $y_i$, the actual output, for the input $x_i$. The derivative of $f(x)$ is written $f'(x)$, and the derivative of the sigmoid function is $\sigma'(x)$. The bias $b_i^k$ at node $i$ in layer $k$ is incorporated into the weights as $w_{0i}^k$, with a fixed output $o_0^{k-1} = 1$ at node 0 in layer $k - 1$; hence, $w_{0i}^k = b_i^k$.
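The update rule and the error measure described above can be written out explicitly. The following is a standard formulation that is consistent with the surrounding definitions; it is offered as a sketch, and the factor $1/2$ in the error and the symbols $\alpha$ (learning rate) and $N$ (number of input-output pairs) are assumed conventions.

```latex
% Gradient-descent update of the parameters (alpha is the learning rate)
\theta^{t+1} = \theta^{t} - \alpha \, \frac{\partial E(X, \theta^{t})}{\partial \theta}

% Mean squared error over N input-output pairs (the factor 1/2 is an assumed convention)
E(X, \theta) = \frac{1}{2N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
```

With the factor $1/2$, the derivative of each squared term with respect to $\hat{y}_i$ is simply $\hat{y}_i - y_i$, which keeps the later error terms free of extra constants.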
Weight and bias values are initialized with random values once; the updated weight values are used in subsequent iterations. Equation (6) can be rewritten as Equation (7), in which $\vec{W}_{IH}$ contains the weights from the input layer to the hidden layer, whereas $\vec{W}_{HO}$ contains the weights from the hidden layer to the output layer.
In our case, only one output neuron is used. Equation (9) can also be written as
$$\vec{W}_{HO} = \begin{bmatrix} w_{n+1,0} & w_{n+1,1} & \cdots & w_{n+1,m} \end{bmatrix}. \quad (10)$$
Corresponding to the original formulation $T_{k-1}$, the error can then be calculated, and the derivative of Equation (12) is taken. The backpropagation algorithm is concerned with one I/O pair at a time, and the gradient for all I/O pairs in $X$ is obtained by merging the individual gradients. For the derivation, the error function is therefore considered for a single I/O pair.

Error Function Derivatives.
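After applying the chain rule,
$$\frac{\partial E}{\partial w_{ij}^k} = \frac{\partial E}{\partial a_j^k} \, \frac{\partial a_j^k}{\partial w_{ij}^k},$$
where $a_j^k$ is the activation of node $j$ in layer $k$. Here, the first term is the error term,
$$\varphi_j^k \equiv \frac{\partial E}{\partial a_j^k}.$$
The second term follows from the equation for $a_j^k$:
$$\frac{\partial a_j^k}{\partial w_{ij}^k} = o_i^{k-1}.$$
The partial derivative of the error function $E$ with respect to the weight $w_{ij}^k$ is therefore
$$\frac{\partial E}{\partial w_{ij}^k} = \varphi_j^k \, o_i^{k-1}.$$
So, the partial derivative with respect to a weight is the product of the error term $\varphi_j^k$ at node $j$ in layer $k$ and the output $o_i^{k-1}$ of node $i$ in layer $k - 1$.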
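The product form of the weight gradient stated above can be checked numerically. The sketch below considers a single sigmoid output node and assumes the half squared error $E = \frac{1}{2}(\hat{y} - y)^2$ for one input-output pair; both choices, as well as the function names and the random test values, are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def error(weights, prev_outputs, target):
    """Half squared error of one sigmoid node for a single input-output pair."""
    a = np.dot(weights, prev_outputs)  # activation a_j
    return 0.5 * (sigmoid(a) - target) ** 2


def analytic_gradient(weights, prev_outputs, target):
    """dE/dw_i = phi * o_i, with phi = dE/da = (y_hat - y) * sigmoid'(a)."""
    y_hat = sigmoid(np.dot(weights, prev_outputs))
    phi = (y_hat - target) * y_hat * (1.0 - y_hat)
    return phi * prev_outputs


def numeric_gradient(weights, prev_outputs, target, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    grad = np.zeros_like(weights)
    for i in range(weights.size):
        step = np.zeros_like(weights)
        step[i] = eps
        e_plus = error(weights + step, prev_outputs, target)
        e_minus = error(weights - step, prev_outputs, target)
        grad[i] = (e_plus - e_minus) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
o_prev = rng.normal(size=4)  # outputs o_i of the previous layer
w = rng.normal(size=4)       # weights w_ij into the node
g_analytic = analytic_gradient(w, o_prev, 1.0)
g_numeric = numeric_gradient(w, o_prev, 1.0)
print(np.max(np.abs(g_analytic - g_numeric)))
```

The two gradients agree up to finite-difference error, which illustrates that each weight derivative is the error term of the receiving node multiplied by the output of the sending node.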

The Output Layer.
Backpropagation starts from the value $\varphi_1^m$, where $m$ is the final layer. For example, in a four-layer neural network, layer 3 is the final layer ($m = 3$) and layer 2 is the second-to-last layer. The error function $E$ is expressed in terms of the value $a_1^m$ (since $\varphi_1^m$ is a partial derivative with respect to $a_1^m$), and the partial derivative is then applied. The partial derivatives are then combined and used to update the weights.

Algorithm of ABP-SCGNN Model.
Let $\alpha$ be the learning rate and let the parameters $w_{ij}^k$ be initialized; the following steps are then used in the proposed algorithm.

Calculate the Forward Phase.
For each input-output pair $(\vec{x}_d, y_d)$, store the results $\hat{y}_d$, $a_j^k$, and $o_j^k$ for each node $j$ in layer $k$ by proceeding from layer 0, the input layer, to layer $m$, the output layer.

Calculate the Backward Phase.
For each input-output pair $(\vec{x}_d, y_d)$, store the results $\partial E_d / \partial w_{ij}^k$ for each weight $w_{ij}^k$ connecting node $i$ in layer $k - 1$ to node $j$ in layer $k$ by proceeding from layer $m$, the output layer, to layer 1, the first layer after the input layer.
Evaluate the error term for the final layer using equation (2).
Backpropagate the error terms $\varphi_j^k$ for the hidden layers, working backward from layer $k = m - 1$, by repeatedly using equation (3).
Evaluate the partial derivatives of the individual error $E_d$ with respect to $w_{ij}^k$ by using equation (1).

Combine the Individual Gradients.
Combine the individual gradients $\partial E_d / \partial w_{ij}^k$ for each input-output pair to get the total gradient $\partial E(X, \theta) / \partial w_{ij}^k$ for the entire set of input-output pairs $X = \{(\vec{x}_1, y_1), \ldots, (\vec{x}_N, y_N)\}$ by using equation (4) (a simple average of the individual gradients).

Update the Weights.
According to the learning rate $\alpha$ and the total gradient $\partial E(X, \theta) / \partial w_{ij}^k$, and by using equation (5) (moving in the direction of the negative gradient), the weights are updated using equation (27).
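The four phases above (forward, backward, combining the gradients, and updating the weights) can be sketched for a network with one sigmoid hidden layer and one sigmoid output neuron. This is a simplified illustration, not the authors' implementation: it uses plain batch gradient descent with a fixed learning rate and does not reproduce the step-size logic of the scaled conjugate gradient method. The function names, the toy AND-gate data, and the half squared error per pair are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train(X, y, n_hidden=5, alpha=0.25, epochs=2000, seed=0):
    """Batch gradient descent for a one-hidden-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    y = y.reshape(-1, 1)
    # Row 0 of each weight matrix holds the bias, paired with a constant output of 1.
    W_ih = rng.normal(scale=0.5, size=(n_in + 1, n_hidden))
    W_ho = rng.normal(scale=0.5, size=(n_hidden + 1, 1))
    ones = np.ones((n_samples, 1))
    o0 = np.hstack([ones, X])
    history = []
    for _ in range(epochs):
        # Forward phase: store the outputs of every layer.
        o1 = np.hstack([ones, sigmoid(o0 @ W_ih)])
        y_hat = sigmoid(o1 @ W_ho)
        history.append(float(np.mean((y_hat - y) ** 2)))
        # Backward phase: error terms for the output and hidden layers.
        phi_out = (y_hat - y) * y_hat * (1.0 - y_hat)
        hidden = o1[:, 1:]
        phi_hid = (phi_out @ W_ho[1:].T) * hidden * (1.0 - hidden)
        # Combine: average the per-pair gradients (error term times previous output).
        grad_ho = o1.T @ phi_out / n_samples
        grad_ih = o0.T @ phi_hid / n_samples
        # Update: move against the gradient, scaled by the learning rate.
        W_ho -= alpha * grad_ho
        W_ih -= alpha * grad_ih
    return W_ih, W_ho, history


def predict(X, W_ih, W_ho):
    """Forward pass only; returns the output-neuron value for each row."""
    ones = np.ones((X.shape[0], 1))
    o1 = np.hstack([ones, sigmoid(np.hstack([ones, X]) @ W_ih)])
    return sigmoid(o1 @ W_ho).ravel()


# Toy data (logical AND), used only to exercise the code.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_toy = np.array([0.0, 0.0, 0.0, 1.0])
W_ih, W_ho, history = train(X_toy, y_toy)
print(history[0], history[-1])
```

For the diabetes data, `X` would hold the eight preprocessed input features per patient and `y` the binary outcome; the recorded `history` gives the MSE per epoch.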

Experimental and Analytical Verification
A single training iteration is unlikely to achieve the desired result; the model often needs to be trained several times until it approaches the predicted outcome. After thousands of iterations, the measurement and evaluation results of predicting the diagnosis of diabetes are described in Table 2 for analysing the actual performance. With the framework optimized and the network trained for 5000 epochs on 768 instances, the experiment demonstrates the potential benefit of neural network algorithms. For all algorithms used in training, the learning rate was 0.25, and the momentum coefficient was 0.5. During the simulation, the number of neurons in the hidden layer was varied. Mean squared error (MSE) is calculated for all algorithms. The prediction accuracy is calculated as the ratio of the number of correct predictions to the total number of predictions. The proposed algorithm (ABP-SCGNN) produces more correct predictions, with 93% prediction accuracy. Figure 3 presents the MSE during the learning process for different numbers of hidden layer neurons in the range 5 to 50, computed for all algorithms. Table 3 shows that the MLP algorithm has an MSE of 0.0026 with five neurons in the hidden layer; increasing the number of neurons to 20 and 40 lowers the MSE further but also increases network complexity. The Bayesian regularization algorithm has a comparatively high MSE, which stays the same for different numbers of hidden layer neurons. In contrast, the proposed algorithm, ABP-SCGNN, attains a relatively low MSE with five neurons in the hidden layer, and the MSE decreases further as the number of neurons increases, as shown in Figure 4. This confirms that using 20 neurons gives the lowest MSE and high accuracy. Moreover, the model achieves a regression accuracy value of 93%, as presented in Figure 3.
Consequently, 20 hidden neurons offer a trade-off between network complexity and model performance in both MSE and accuracy measures. The regression accuracy plot in Figure 3 depicts a significant correlation between the targets and predicted values of the proposed algorithm, identified by the dashed line, which exhibits the highest accuracy.
Three different numbers of neurons were used in the hidden layer, as mentioned in Table 2. The prediction accuracy of MLP remains the same for all numbers of neurons used. Similarly, for the Bayesian regularization algorithm, the regression accuracy ranges from 70 to 78 using the same numbers of hidden neurons as for the other two algorithms.
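The two performance indicators used in this section, MSE and prediction accuracy (correct predictions divided by all predictions), can be computed as in the following sketch. The 0.5 decision threshold, the function names, and the four example values are assumptions made for illustration; they are not results from the experiments.

```python
import numpy as np


def mean_squared_error(y_true, y_prob):
    """Average squared difference between network outputs and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def prediction_accuracy(y_true, y_prob, threshold=0.5):
    """Ratio of correct predictions to the total number of predictions."""
    y_pred = (np.asarray(y_prob, dtype=float) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


# Hypothetical targets and network outputs for four patients.
targets = [1, 0, 1, 0]
outputs = [0.9, 0.2, 0.4, 0.1]
print(mean_squared_error(targets, outputs))
print(prediction_accuracy(targets, outputs))
```

In this example three of the four thresholded outputs match their targets, so the accuracy is 0.75.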
Forward propagation constitutes the inference phase of a feed-forward neural network, whereas backpropagation constitutes the learning phase; the learning phase is relatively slower than the inference phase because gradient descent must be repeated many times during the whole process. Running the underlying algorithms in parallel can make them more efficient. The computational complexity of the feed-forward neural network is calculated by splitting the computation into the training and inference phases. As shown in Figure 5, the overall time complexity increases with the number of epochs. However, the efficiency of the algorithms remains stable. The factor of complexity ($F_c$) shown in Figure 5 is calculated from $i$, the number of input layer neurons, $j$, the number of hidden layer neurons, and the number of iterations.
Likewise, the MSE time complexity $T_c$ can be calculated for the same algorithms from $i$, the number of input layer neurons, $j$, the number of hidden layer neurons, and $k$, the number of iterations. Figure 5 demonstrates the time complexity. Indeed, as the number of neurons increases, the success ratio increases to some extent. However, beyond 20 neurons, the success ratio remains the same while the time complexity increases. Table 4 compares various neural network algorithms with the proposed ABP-SCGNN algorithm, which achieves the highest success ratio, 93%.
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm has an 88.8% success ratio, the highest previously reported for the underlying problem. On the other hand, the proposed algorithm shows the best result regarding success ratio and regression accuracy.

Conclusion
Using an artificial backpropagation neural network (ABPNN), a pragmatic framework has been proposed to predict the diagnosis of diabetes.
The results presented in this paper demonstrate that it is viable to model an adaptive framework using an ABPNN. Since ANN applications are organized differently from the classical approach, the features are provided to the ANN as input variables. The proposed ABP-SCGNN framework is effective and efficient, with a 93% success ratio when simulated with the test PIDD dataset [17]. To check the performance of the ANNs and for computational analysis, many performance indicators were calculated. Among these indicators, mean squared error (MSE) and regression accuracy were examined closely. All neural networks were trained using 5 to 50 hidden layer neurons with the same dataset. The best result was obtained with a hidden layer containing 20 neurons, specifically with the proposed ABP-SCGNN. The other two algorithms retain a high MSE at the same 20 hidden layer neurons, whereas the presented ABP-SCGNN gives attractive results for the same dataset. The experimental results demonstrated that ABP-SCGNN was the best algorithm, exhibiting the highest prediction result among all algorithms. Since ANNs can offer new approaches, strategies, and methodologies to remove uncertainties and reduce the potential instability of correlation, the network performance achieved using the proposed framework verifies that ANNs are functionally beneficial for successfully predicting the diagnosis of diabetes, given suitable disease input features and a corresponding set of instances.

Table 4: Comparison of neural network algorithms with the proposed ABP-SCGNN algorithm.
Algorithm                Accuracy (%)
BFGS [1]                 88.8
Genetic algorithm [8]    87
GRNN [9]                 80.21
ABP-SCGNN (proposed)     93

Data Availability
The dataset used in this study is obtained from the National Institute of Diabetes and Digestive and Kidney Diseases, as cited in [17] (http://archive.ics.uci.edu/ml).

Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.