Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms

Chronic diseases represent a serious threat to public health across the world. They are estimated to account for about 60% of all deaths worldwide and approximately 43% of the global burden of disease. The analysis of healthcare data has helped health officials, patients, and healthcare communities to detect these diseases early. Extracting patterns from healthcare data has helped healthcare communities to obtain complete medical data for the purpose of diagnosis. The objective of the present research work is to improve the surveillance and detection system for chronic diseases, which is used to protect people's lives. For this purpose, the proposed system has been developed to enhance the detection of chronic diseases by using machine learning algorithms. Standard data related to chronic diseases have been collected from various worldwide resources. In healthcare data, some chronic disease records are ambiguous objects with respect to their class. Such ambiguous objects show traits of two or more classes, which reduces the accuracy of machine learning algorithms. The novelty of the current research work lies in using noncrisp Rough K-means (RKM) clustering to identify the ambiguity in chronic disease datasets and thereby improve the performance of the system. The RKM algorithm clusters the data into two sets, namely, the lower approximation and the upper approximation. The objects belonging to the lower approximation are favourable objects, whereas the ones belonging only to the upper approximation are identified as ambiguous. These ambiguous objects are excluded to improve the machine learning algorithms. The machine learning algorithms, namely, naïve Bayes (NB), support vector machine (SVM), K-nearest neighbors (KNN), and random forest tree, are presented and compared.
The chronic disease data are obtained from the machine learning repository and Kaggle to test and evaluate the proposed model. The experimental results demonstrate that the proposed system is successfully employed for the diagnosis of chronic diseases. With respect to the accuracy metric, the proposed model achieved its best results with naïve Bayes with RKM for the classification of diabetic disease (80.55%), with SVM with RKM for the classification of kidney disease (100%), and with SVM with RKM for the classification of cancer disease (97.53%). Performance measures such as accuracy, sensitivity, specificity, precision, and F-score are employed to evaluate the performance of the proposed system. Furthermore, an evaluation and comparison of the proposed system with the existing machine learning algorithms are presented. Finally, the proposed system has enhanced the performance of machine learning algorithms.


Introduction
Chronic diseases are serious diseases because they pose a serious threat to people's lives and persist over long periods. They can impede the freedom and health of people who have physical disabilities. Thus, they cause further frustration for people who suffer from various health disabilities. The available vaccines and medicines cannot completely prevent chronic diseases because these diseases may show no indications. With aging, chronic diseases become more common. Hence, there is a need to identify the factors causing them and to take the required corrective measures accordingly. Factors such as smoking, physical inactivity, diet, and insufficient or excessive alcohol consumption could largely contribute to chronic diseases. Previous studies have identified chronic diseases as the seventh leading cause of death. In the United States, they accounted for 65.8% of deaths among males and 67.2% of deaths among females in 2010 [1]. Heart disease, cancer, diabetes, asthma, and kidney disease are identified as chronic diseases. In addition, chronic diseases are regarded as noncommunicable diseases: they do not transfer from one person to another, and they slowly end people's lives over a long period. In the United States, chronic diseases drive up the cost of medical services and considerably disrupt healthcare services. They account for a substantial part of the economy and degrade people's quality of health.
This study addresses the classification of chronic disease conditions, namely, cancer, kidney disease, asthma, and diabetes. The World Health Organization reported in 2002 that mortality, morbidity, and disability were attributed to the major chronic diseases. Currently, records show that 60% of all deaths and 43% of the global burden of disease are attributed to chronic diseases. By 2020, it is expected that these figures will reach 73% of total deaths and 60% of the global burden of disease [2]. With the help of machine learning algorithms, predicting chronic diseases has become an easier task. Therefore, the aim here is to develop a surveillance system that predicts and diagnoses from healthcare data in order to help the health communities. Machine learning algorithms increase the level of individual prediction, and improved diagnosis will ultimately strengthen prevention efforts. The availability of good prevention measures will not only improve people's health but also reduce healthcare spending. Machine learning techniques have been used widely in the healthcare domain. They have now become crucial tools for healthcare management. They have also assisted in improving health care by providing prediction measures for epidemic diseases faced by people around the world. The World Health Organization (WHO) has significantly benefited from the employment of machine learning applications that improve the quality of health care.
Machine learning algorithms are used for classification, clustering, and prediction to solve various problems in real-time applications. They offer stable and reliable performance in classification and prediction. Based on machine learning algorithms, a few researchers have developed successful healthcare systems. The algorithms include statistical methods, SVM, decision trees, clustering, optimization algorithms, and others. Machine learning applications rely largely on datasets, which are analyzed to discover the patterns that are used to solve specific tasks. A healthcare system has the potential to extract and discover the hidden patterns in a health database [3]. However, the available healthcare data are widely scattered and ambiguous. They may also contain insufficient and insignificant information, which affects the consistency of prediction and classification. One of the biggest challenges of healthcare data is accurate diagnosis based on the significant information. To predict and analyze chronic diseases such as kidney, diabetic, cancer, and heart diseases, several machine learning algorithms have been proposed. These algorithms include the decision tree (DT), SVM, ANN, linear regression (LR), KNN, NB, and time series prediction models. Because of rapid innovation and continuous changes in software engineering, a huge volume of information can be generated. With the development of healthcare database management systems, there will be more opportunities for the enhancement of healthcare systems. Extracting patterns from these datasets and managing large amounts of high-dimensional data have become a major field of machine learning. In this work, machine learning algorithms are used to classify healthcare datasets to obtain useful knowledge that can help health officials and communities.
To apply machine learning algorithms in a way that enhances the performance of the classification process, preprocessing with a soft clustering algorithm is required. The remaining parts of the article are organized as follows. The introduction is given in Section 1, related studies are given in Section 2, data and methods are shown in Section 3, and results and discussion are shown in Section 4. Lastly, the conclusion is presented in Section 5.

Related Studies
There is a considerable number of research works on the classification and prediction of healthcare data. Solanki [4] applied most of the classifier algorithms available in the Weka tool to predict the prevalent sickle cell disease. The results obtained with the classifiers available in the Weka data mining tool were compared, and it was observed that the random tree approach is the better algorithm for classifying sickle cell disease. Similarly, Joshi et al. [5] used a number of machine learning approaches such as Bayes net, logistic model tree (LMT), multilayer perceptron, stochastic gradient descent, and sequential minimal optimization techniques. These researchers suggested using the LMT algorithm for diagnosing breast cancer because of its high performance and accuracy. Furthermore, David et al. [6] applied the KNN algorithm, Bayesian network, decision tree algorithm (namely, the J48 tree), and random tree method to predict leukemia disease. It was found that the decision tree algorithm showed better accuracy. In one more study, conducted by Vijayarani and Sudha [7], LMT, sequential minimal optimization, and multilayer perceptron algorithms were employed to predict heart diseases. Furthermore, the study conducted by Sugandhi et al. [8] proposed a random tree algorithm for the classification of heart diseases. The outcome of that research showed that the random tree classifier gives a better performance than the other classification algorithms considered. The study of Yasodha and Kannan [9] reported that Weka classification algorithms were used for analyzing and predicting from a database of diabetic patients. Likewise, Bin Othman and Yau [10] compared various classification approaches with the Weka data mining tool for predicting breast cancer.
Israa [11] has applied NB, decision tree (DT), random forest, and support vector machine techniques to improve the classification of heart diseases. On the other hand, D. Sisodia and Sisodia [12] have proposed three machine learning algorithms; that is, DT, SVM, and naive Bayes (NB) to detect diabetes.
Their experiments used standard data from the UCI machine learning repository. It was observed that the NB approach outperformed the other algorithms, with an accuracy rate of 76.30%. The study of Syed et al. [13] employed SVM, Bayesian network, and decision tree algorithms to predict obesity in schoolchildren. Sandeep et al. [14] proposed linear discriminant analysis (LDA), NB, random forest, LR, and quadratic discriminant analysis (QDA) for the analysis and classification of chronic kidney diseases. The study of Sahana and Minavathi [15] also focused on predicting kidney disease using classification algorithms, namely, ANN and C4.5. It concentrated on accurate prediction and time performance. The ANN and C4.5 algorithms are used to help the medical practitioner give proper medication and medical treatment. K. Polaraju and Prasad [16] proposed a multiple regression model to classify chronic heart disease and showed that the multiple linear regression model is favourable for predicting heart diseases.
In that study, the training dataset consists of 3000 values with 13 different features. The experimental results show that the regression algorithm performs better than the other algorithms. Kim et al. [17] proposed the character-recurrent neural network (Char-RNN) model to predict chronic diseases. They collected data from the Korean National Health and Nutrition Examination Survey (KNHANES). It was observed that the Char-RNN model obtained higher accuracy than the conventional multilayer perceptron model. Ng et al. [18] used machine learning algorithms to detect heart failure, with electronic health record data used to predict events and the onset of diseases. Zhang et al. [19] proposed a convolutional neural network (CNN) architecture named Group Net to predict chronic diseases.
Their experimental analysis was conducted using data from local medical centers, and they noted that the CNN achieved the best accuracy. Kriplani et al. [20] used deep learning to predict chronic kidney disease. The proposed models were tested by using standard disease datasets available from the UCI repository, and the analysis gave better results under cross-validation. Liu et al. [21] proposed CNN, LSTM, and hierarchical models to predict chronic diseases. Brisimi et al. [22] applied four machine learning algorithms, namely, SVM, kernelized SVM, sparse logistic regression, and random forests, to predict chronic heart and diabetic diseases. They gathered standard data from electronic health records (EHRs). Chen et al. [23] used streamlined machine learning techniques to predict chronic disease epidemics. Their experiment proposed prediction models using real-life hospital data gathered from central China in 2013-2015. A convolutional neural network-based multimodal disease risk prediction (CNN-MDRP) algorithm was implemented using structured and unstructured data from hospitals. Patel et al. [24] developed a system using classifiers such as KStar, SMO, J48, Bayes net, and multilayer perceptron neural network algorithms with the help of the Weka software to classify heart diseases. It was observed that the Bayes net achieved the best performance as compared with the other classification algorithms, namely, KStar, multilayer perceptron, and J48, by using the k-fold cross-validation method. Deepika and Seema [25] also designed a system to predict chronic diseases via machine learning algorithms such as naïve Bayes, decision tree, SVM, and ANN. A comparative analysis of the performances of the algorithms is presented. It was observed that the support vector machine and naïve Bayes provide the highest accuracy rate when predicting diabetic disease. Ul Haq et al. [26] suggested different machine learning algorithms such as naïve Bayes, classification tree, KNN, logistic regression, SVM, and ANN to predict heart diseases. Three feature selection methods were applied to improve the classification algorithms. It was concluded that feature selection increases the performance of the classifiers for predicting heart diseases. Ahmed et al. [27] proposed a fuzzy logic algorithm to classify kidney diseases. Coacci et al. [28] used two classification approaches, namely, logistic regression and ANN. Xun et al. [29] presented ANN and naïve Bayes classifiers for predicting chronic diseases. Some researchers used the UCI machine learning repository for testing proposed models on chronic diseases [30,31], diabetic disease [32,33], and breast cancer [34]. Other studies used deep learning algorithms to predict chronic diseases [17], proposed nature-inspired computing algorithms for the diagnosis of chronic diseases [35], and employed machine learning algorithms to develop E-health for the diagnosis of chronic diseases [36].
In the current research article, traditional machine learning algorithms are employed for predicting chronic diseases. The results of the existing classification algorithms need to be improved to make the healthcare system more reliable. Therefore, the soft clustering algorithm is applied to increase the accuracy of the classification algorithms.

Materials and Methods
The proposed model is designed explicitly to classify chronic diseases using machine learning algorithms. Figure 1 displays the proposed system, which combines the existing machine learning algorithms with the rough k-means clustering technique. The noncrisp rough k-means algorithm is used to handle the ambiguous objects, which obstruct the performance of machine learning algorithms. The RKM clustering groups the data into two clusters and is used to measure the roughness of the objects. Moreover, a threshold value is used to maximize the roughness of objects for reducing the ambiguous objects. The threshold parameter plays a very significant role in the noncrisp algorithm; it was determined experimentally, and the value found is 1.4. The RKM algorithm is used to deal with ambiguous objects for improving the classification algorithm. It clusters the data into a lower approximation and an upper approximation: the objects clustered in the lower approximation are retained, whereas the objects clustered only in the upper approximation are excluded. The novelty of the proposed model lies in using rough k-means to determine the ambiguous objects explicitly, so that only the objects belonging to the lower approximation are processed by the machine learning algorithms. The results show that the proposed system outperformed all the alternative models used for measuring the performance. The detailed description of the proposed system is discussed in the following subsections.
Journal of Healthcare Engineering

Datasets.
The chronic disease datasets have been collected from different resources as follows. The diabetes data collected from the machine learning repository contain nine attributes, eight features, and one class. This dataset has been gathered from an automatic electronic recording device and paper records [37]. Table 1 shows the features of the data.

Breast Cancer Disease Dataset.
The cancer data collected from Kaggle contain nine attributes, 30 features, and 1 class. These features are obtained from digitized images of breast cancer [38]. Table 2 shows the features of the data.

Kidney Disease Dataset.
The kidney data collected from Kaggle contain 26 attributes, 24 features, and 1 class [38]. Table 3 shows the features of the data.

Handling Ambiguity.
Machine learning algorithms have succeeded in a number of real-time applications such as image recognition, video recognition, marketing prediction, weather forecasting, and network security. Conventional machine learning algorithms identify each object as belonging to exactly one class. In data analysis, however, an object may show the characteristics of different classes [39]. In that event, the object should belong to more than one class, and as a result, class boundaries should necessarily overlap. Because the machine learning algorithms assign an object to precisely one class, this requirement is found to be too restrictive in a number of real-time applications. Figure 2 shows the ambiguous data.
Figure 2 gives a basic example of ambiguous objects. It clearly shows three separate classes; however, five objects cannot be assigned to any single class. Consequently, these five objects decrease the performance of machine learning algorithms.
Thus, it is required to determine such ambiguous objects and deal with them before applying machine learning algorithms. To address this issue, the present research work applies the RKM technique to recognize ambiguous objects in chronic disease datasets. The detailed description of the RKM algorithm employed for identifying the ambiguous objects is presented in the subsequent subsections.

Rough K-Means Clustering Algorithm.
The proposed RKM clustering approach is based on simple K-means clustering [40][41][42]. Peters [43] enhanced the original algorithm of [40] by calculating rough centroids and using ratios of distances to differentiate between similar distances. Joshi and Lingras [44] used the RKM and ECM clustering algorithms to handle high-dimensional data. Aldhyani and Joshi [39] used the rough K-means and ECM clustering algorithms to handle ambiguous objects in intrusion detection. The rough K-means approach is designed to determine the ambiguous objects that belong to the upper boundary of clusters. It clusters the data into lower approximations and upper approximations; that is, rough K-means represents each cluster by a lower approximation and an upper approximation. The approximations satisfy the following properties:
(P1) An object x can be part of, at most, one lower approximation.
(P2) An object x that is part of the lower approximation of a cluster is also part of the upper approximation of that cluster.
(P3) An object x that is not part of any lower approximation belongs to two or more upper approximations.
Overall, ideas of soft clustering are more appropriate to deal with ambiguous objects. When the algorithm is processed, the objects in the lower approximations and in the boundary regions are weighted by w_lower and w_upper, respectively, when the centroids are computed. For each object x, the assignment criterion determines whether x is part of a lower approximation; if x is not part of any lower approximation, it is placed in the upper approximations of two or more clusters. The above criterion guarantees that property (P3) is satisfied. The rough k-means algorithm is stable and reliable for handling ambiguity. It clusters objects into a lower bound and an upper bound. The objects that lie only in the upper bound are ambiguous objects, whereas the objects in the lower bound are correct objects. The upper bound should not be empty, and an ambiguous object can belong to the upper bounds of several clusters. Figure 3 shows a snapshot of the output obtained from the RKM algorithm to determine the ambiguous objects for improving the performance of machine learning algorithms. In this output, the objects in the lower bound are correct objects, whereas the objects in the boundary region are ambiguous objects.
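The assignment step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a ratio-based criterion in the spirit of Peters [43], uses the threshold of 1.4 reported in this paper, and takes `w_lower = 0.7` and `w_upper = 0.3` as purely illustrative weights. The function name `rough_kmeans` and the `init` argument are hypothetical.

```python
import math
import random


def _mean(points, indices):
    """Component-wise mean of the selected points."""
    dim = len(points[0])
    return [sum(points[i][d] for i in indices) / len(indices) for d in range(dim)]


def rough_kmeans(points, k=2, threshold=1.4, w_lower=0.7, w_upper=0.3,
                 iterations=20, init=None, seed=0):
    """Return (lower, boundary): per-cluster lists of point indices.

    An object whose distance ratio to another centroid is within the
    threshold is treated as ambiguous and placed in the boundary regions
    of all close clusters (assumed criterion, not taken from the paper).
    """
    centroids = [list(c) for c in (init or random.Random(seed).sample(points, k))]
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for _ in range(iterations):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(k), key=lambda j: dists[j])
            d_min = max(dists[nearest], 1e-12)  # avoid division by zero
            close = [j for j in range(k)
                     if j != nearest and dists[j] / d_min <= threshold]
            if close:  # ambiguous: only in upper approximations
                for j in [nearest] + close:
                    boundary[j].append(idx)
            else:      # unambiguous: lower approximation of one cluster
                lower[nearest].append(idx)
        for j in range(k):  # weighted centroid update
            if lower[j] and boundary[j]:
                lo, bd = _mean(points, lower[j]), _mean(points, boundary[j])
                centroids[j] = [w_lower * a + w_upper * b for a, b in zip(lo, bd)]
            elif lower[j]:
                centroids[j] = _mean(points, lower[j])
            elif boundary[j]:
                centroids[j] = _mean(points, boundary[j])
    return lower, boundary
```

In this sketch, the ambiguous objects are the indices that appear in any boundary list; the remaining indices form the lower approximations and would be the ones passed on to a classifier.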

Classification Algorithms.
In this section, conventional machine learning algorithms are discussed. The classification algorithms, namely, naïve Bayes (NB), support vector machine (SVM), K-nearest neighbor (KNN), and random forest tree, are presented to predict chronic diseases for enhancing healthcare systems.

Support Vector Machine Algorithm.
The support vector machine is used for classification and regression. In the SVM algorithm, each data item is considered a point in n-dimensional space, where n is the number of features and the value of each feature is the value of a specific coordinate. Classification is achieved by finding the hyperplane that best separates the classes of data. A support vector machine classifies data by learning a separating hyperplane from labelled training data, and it obtains a lower error when the margin is large. In the present research work, two classes of chronic diseases are used. Different kernel functions were applied to classify the chronic disease datasets, among which the radial basis function (RBF) kernel obtained high accuracy. It was observed that the RBF kernel function is appropriate for use with the RKM algorithm to obtain good accuracy. The RBF kernel is defined as
$$K(x, x') = \exp\left(-\frac{\|x - x'\|^{2}}{2\sigma^{2}}\right),$$
where $\|x - x'\|^{2}$ is the squared Euclidean distance between two feature vectors and $\sigma$ is a parameter.
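As a small illustration of the kernel described here, the following sketch evaluates the RBF kernel in its common form exp(-||x - x'||^2 / (2 sigma^2)) for two feature vectors. The function name `rbf_kernel` and the default `sigma=1.0` are assumptions made for the example, not values taken from the paper.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel value for two equal-length feature vectors."""
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart; a larger `sigma` makes the decay slower.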

Naïve Bayes Algorithm.
The naïve Bayes algorithm is a probabilistic method used to classify a dataset based on the well-known Bayes theorem of probability. The naïve Bayes classification algorithm works with the prior probability, posterior probability, likelihood, and evidence, and it normally uses probability distributions. The Bayesian algorithm works as follows. Assume that $A = (A_1, A_2, A_3, \ldots, A_n)$ is the feature vector of a chronic disease record, where n is the number of features in the dataset, and that C indicates the class of the chronic data, normal or abnormal. The Bayes equation is as follows:
$$P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)}.$$
Conditional probability: it is assumed that the value of a predictor $A_i$ given the class C is independent of the values of the other predictors; this is known as class conditional independence. $P(C \mid A)$ is the posterior probability of class C given the predictors (features). Under this assumption, the theorem is as follows:
$$P(C \mid A) \propto P(C) \prod_{i=1}^{n} P(A_i \mid C).$$
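The following minimal sketch shows how the prior, likelihood, and evidence combine into a posterior under the class conditional independence assumption. It works on categorical features and uses add-one smoothing, which is an assumption of this example rather than something stated in the paper; the names `train_nb` and `posterior` are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)              # feature index -> observed values
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(c, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def posterior(model, row):
    """Return P(C | A) for every class C, normalised by the evidence."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, v in enumerate(row):
            # likelihood P(A_i | C) with add-one smoothing (assumption)
            score *= (feature_counts[(c, i)][v] + 1) / (n_c + len(values[i]))
        scores[c] = score
    evidence = sum(scores.values())  # P(A)
    return {c: s / evidence for c, s in scores.items()}
```

The predicted class is the one with the largest posterior; because the evidence is the same for every class, it only rescales the scores and does not change the ranking.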

K-Nearest Neighbors Algorithm.
The K-nearest neighbors algorithm is a simple machine learning algorithm that uses the entire dataset in its training phase. The KNN algorithm has low complexity in programming and implementation. The basic idea is that if the nearest neighbours of a sample in the feature space belong to a category, then the sample belongs to the same category.
The KNN classification algorithm can be used with either a single-feature or a multidimensional feature dataset to find the closest samples. It employs the Euclidean distance to find the closest points among the features.
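The neighbour vote described above can be sketched in a few lines. This is a generic illustration with Euclidean distance and majority voting; the function name `knn_predict` and the default `k=3` are assumptions of the example, since the paper does not report the value of k.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Sort the training samples by Euclidean distance to the query point.
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

An odd `k` is a common choice for two-class problems such as the ones in this study, because it avoids tied votes.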

Random Forest Algorithm.
A decision tree is a powerful decision-making tool and is the building block of a random forest. At each step, it selects the best split of the objects in the dataset. To reduce the high variance of a single tree, multiple trees can be created from different samples of the dataset and combined; this operation is known as bootstrap aggregation or bagging. A disadvantage of bootstrap aggregation is that the same procedure is used to split the values in each tree, which creates a problem in decision-making: the predictions of the trees become similar, which mitigates the variance reduction originally sought. The random forest algorithm addresses this problem, and it can be used for classification and regression problems and for reducing the overfitting of data as well.
The selected attributes are measured by employing the information gain method to discover the value of the information in the entire dataset. The information gain is calculated for each splitting attribute, and the attributes with high gain are selected. It is assumed that D is the dataset. The information of D is
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i,$$
where D is the dataset, $i = 1, 2, \ldots, m$ indexes the classes of dataset D, and $p_i$ is the probability of class i. Let B be an attribute in dataset D, and let $b_1, b_2, b_3, \ldots, b_n$ be the values of B. The attribute B partitions D into subsets $D_1, D_2, \ldots, D_n$, and the amount of information remaining after this partition is
$$\mathrm{Info}_B(D) = \sum_{j=1}^{n} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j).$$
The attribute that shows the highest information gain is selected, where the gain is as follows:
$$\mathrm{Gain}(B) = \mathrm{Info}(D) - \mathrm{Info}_B(D).$$
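The information gain computation can be made concrete with a short sketch. It assumes the standard entropy-based definition (information of the whole dataset minus the weighted information of the partitions induced by an attribute); the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Information of a label list: minus the sum of p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Gain of splitting `labels` by the attribute values in `values`."""
    total = len(labels)
    partitions = {}
    for v, c in zip(values, labels):
        partitions.setdefault(v, []).append(c)
    # Weighted information remaining after the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly has a gain equal to the information of the dataset, whereas an attribute that is unrelated to the class has a gain close to zero.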

Performance Measurement.
The performance measures are used to test and evaluate the proposed system. The accuracy, specificity, sensitivity (recall), precision, and F-score evaluation metrics have been employed to test the proposed model. The evaluation metrics are computed by using equations (9)-(13) as described below, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy.
Accuracy is the proportion of correct predictions among all predictions made by the model. It is calculated as the total number of correct labels (TP + TN) divided by the total number of instances in the chronic disease dataset (P + N):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%. \tag{9}$$

Specificity.
Specificity (also called the true negative rate) is the percentage of patients who do not suffer from chronic diseases and who are predicted by the model as not having chronic diseases:
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100\%. \tag{10}$$

Sensitivity.
Sensitivity (also called the true positive rate, the recall, or the probability of detection) is the percentage of patients who actually suffer from chronic diseases and who are diagnosed by the classification algorithms as having chronic diseases:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100\%. \tag{11}$$

Precision.
Precision is the proportion of patients diagnosed as having chronic diseases who actually had chronic diseases. It is also known as the positive predictive value (PPV):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%. \tag{12}$$

F1-Score.
The F1-score (also called the F-score or F-measure) is the harmonic mean of the precision and the recall (sensitivity):
$$F_1\text{-score} = \frac{2 \times \mathrm{precision} \times \mathrm{sensitivity}}{\mathrm{precision} + \mathrm{sensitivity}} \times 100\%. \tag{13}$$
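The five measures can be computed directly from the confusion-matrix counts, as in the following sketch. The function name `evaluate` is hypothetical, and the sketch assumes that none of the denominators is zero.

```python
def evaluate(tp, tn, fp, fn):
    """Return the five evaluation measures as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate (recall)
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    precision = 100.0 * tp / (tp + fp)     # positive predictive value
    # Harmonic mean of precision and sensitivity (already in percent).
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }
```

For example, with hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, `evaluate(40, 45, 5, 10)` gives an accuracy of 85%, a sensitivity of 80%, and a specificity of 90%.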

Experimental Results and Discussion
The rough K-means algorithm is applied for improving the classification of chronic diseases. It is used to determine the ambiguous objects that obstruct the classification algorithms, and it has been tested on various standard chronic disease datasets. The aim is to improve the diagnosis of chronic diseases. In the beginning, the conventional classification algorithms were applied to predict chronic diseases. However, the obtained results were not satisfactory. From these results, it was noted that there are ambiguous objects that decrease the accuracy of machine learning algorithms. One of the biggest challenges that we faced in the implementation of the proposed system is the ambiguity embedded in the variables of the standard datasets. For this reason, the RKM algorithm is used to handle these ambiguous objects so that the accuracy of the classification algorithms can be improved. The RKM algorithm is appropriately designed for detecting the ambiguity in the chronic disease datasets. The experimental results have shown that the performance of the proposed system is better than that of the conventional models. To measure and evaluate the performance of the proposed system, the standard evaluation metrics, namely, accuracy, specificity, sensitivity, precision, and F-score, are used to test the proposed system against the existing machine learning techniques. Moreover, for validating the proposed system, the datasets are divided into 70% for training and 30% for testing. Numerous experiments have been conducted to evaluate the proposed system. The results of the machine learning algorithms and of the enhanced model on the various datasets are presented as follows.

Classification Results of Diabetic Disease.
In this section, different experiments of classification algorithms with the enhanced proposed system have been conducted.
The soft computing rough K-means algorithm is used to handle ambiguous objects, which reduce the performance of machine learning algorithms on chronic disease datasets. When the classification algorithms are applied to the original diabetes data, the results are not favourable, because the data contain ambiguous objects that hinder the classification algorithms. The diabetes data contain 768 instances and two classes. The ambiguous objects are examined by RKM clustering to assist in determining the exact class of ambiguous cases or the closest one. The dataset has been clustered into two clusters corresponding to the two classes given by the label variable in the dataset. The RKM algorithm has clustered the objects into upper approximation and lower approximation. Those objects that belong only to the upper approximation, which may belong to more than one cluster, are excluded. Among the 768 instances, 718 instances are clustered in the lower approximation. The remaining objects are clustered in the upper approximation and are considered ambiguous objects. The ambiguous objects have been removed from the data. The classification algorithms are then applied to the data in the lower approximation for diagnosing the diseases. Table 4 shows the results obtained from the RKM algorithm used for discovering the ambiguous objects. Table 5 shows the results of the classification algorithms, namely, naïve Bayes, SVM, random forest tree, and KNN. It is observed that these results need to be improved. Rough K-means is therefore applied to enhance the existing machine learning algorithms. Table 6 shows the results of the machine learning techniques with the RKM algorithm. It is observed that the RKM algorithm has improved the results of the classification algorithms.
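The preprocessing described in this section, removing the ambiguous objects and then dividing the retained data into 70% training and 30% testing, can be sketched as follows. The helper names `drop_ambiguous` and `holdout_split`, the random shuffling, and the rounding up of the test size are assumptions of this example; the paper does not state how the split was drawn.

```python
import math
import random


def drop_ambiguous(rows, ambiguous_indices):
    """Keep only the rows whose index was not flagged as ambiguous."""
    ambiguous = set(ambiguous_indices)
    return [row for i, row in enumerate(rows) if i not in ambiguous]


def holdout_split(rows, test_share=0.3, seed=0):
    """Shuffle the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_share * len(shuffled))  # rounding up is an assumption
    return shuffled[n_test:], shuffled[:n_test]
```

With the counts reported here (768 instances, of which 718 fall in the lower approximation), 50 objects would be dropped, and a 30% test share of the 718 retained objects corresponds to about 215 or 216 objects depending on the rounding rule.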
The results of naïve Bayes with RKM are 80.55%, 80.14%, 80.14%, 90%, and 84.78% in terms of accuracy, sensitivity, specificity, precision, and F-score, respectively. Similarly, the results of SVM with RKM are 77.78%, 77.24%, 78.87%, 88.19%, and 82.35% in terms of accuracy, sensitivity, specificity, precision, and F-score, in that order. The results of random forest with RKM are 77.20, 56.09, 69.05, and 62.0. Furthermore, the results obtained by using KNN with RKM are 71.30%, 79.29%, 56.58%, 77.08%, 77.08%, and 78.70%. Finally, the obtained results show that the classification algorithms are improved by using the RKM algorithm. Figures 4-7 show the performance of the classification algorithms with the RKM algorithm.
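As a consistency check, and assuming that the F-score is the harmonic mean of precision and sensitivity as defined in the performance measurement section, the F-scores reported for naïve Bayes with RKM and for SVM with RKM can be recomputed from the reported precision and sensitivity values:

```latex
F_{\mathrm{NB}} = \frac{2 \times 90 \times 80.14}{90 + 80.14}
               = \frac{14425.2}{170.14} \approx 84.78\%,
\qquad
F_{\mathrm{SVM}} = \frac{2 \times 88.19 \times 77.24}{88.19 + 77.24}
                = \frac{13623.59}{165.43} \approx 82.35\%.
```

Both values agree with the reported F-scores of 84.78% and 82.35%.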

Classification Results of Kidney Disease.
This section presents the classification of kidney disease with the existing machine learning algorithms and with the proposed enhanced system. Table 7 shows the results obtained from the RKM algorithm for identifying the ambiguous objects. The kidney disease data contain 400 instances and are clustered into two clusters. Rough K-means assigns the objects to the upper approximation and the lower approximation. The lower approximation contains 174 instances; these are regarded as approved objects because each belongs to a single cluster. The remaining 226 objects are clustered in the upper approximation and are considered ambiguous. Table 8 shows the performance of the existing machine learning algorithms, namely, naïve Bayes, SVM, random forest tree, and KNN. These results suggest that the classification algorithms can be improved if the ambiguous objects are handled. Table 9 shows the results of the machine learning techniques with the RKM algorithm, which improves the results of the classification algorithms. The results of naïve Bayes with RKM are 98.11%, 96.43%, 96.15%, 96.15%, and 98.04.78% with respect to the evaluation metrics. Similarly, the results of SVM with RKM are 100%, 100%, 100%, 100%, and 100% in terms of accuracy, sensitivity, specificity, precision, and F-score, respectively. The results of random forest with RKM are 100%, 100%, 100%, 100%, and 98.02%. Furthermore, the results obtained by using KNN with RKM are 84.91%, 80.65%, 90.91%, 92.59%, and 86.21%. Lastly, the obtained results indicate that the classification algorithms are improved by using the RKM algorithm. Figures 8-11 display the performance of the classification algorithms with the RKM algorithm for predicting kidney diseases.
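The two-stage procedure used in these experiments (drop the objects flagged as ambiguous, then classify the rest) can be sketched as below. The sketch assumes the ambiguous indices are already available from the clustering step, and it uses a plain k-nearest-neighbour vote as a stand-in for any of the four classifiers. The helper names, `k=3`, and the toy data are illustrative assumptions, not the configuration used in the study.

```python
import math
from collections import Counter

def drop_ambiguous(X, y, ambiguous_idx):
    """Keep only the rows whose index is not flagged as ambiguous."""
    keep = [i for i in range(len(X)) if i not in ambiguous_idx]
    return [X[i] for i in keep], [y[i] for i in keep]

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the last point lies between the two classes.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5.2, 5.2)]
y = [0, 0, 0, 1, 1, 1, 0]
ambiguous_idx = {6}  # assumed output of the clustering step

X_clean, y_clean = drop_ambiguous(X, y, ambiguous_idx)
pred_high = knn_predict(X_clean, y_clean, (9, 9))
pred_low = knn_predict(X_clean, y_clean, (1, 1))
```

Note that in this scheme the classifier is both trained and evaluated on the retained objects only, so the reported measures describe performance on the lower-approximation subset.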

Classification Results of Cancer Disease.
This section presents the classification of cancer disease using the existing machine learning algorithms and the proposed system using the RKM algorithm. The cancer data contain 569 instances that are classified into two classes, benign and malignant. The soft computing RKM clustering algorithm is used to handle ambiguous objects. It clusters the data into two clusters and assigns the objects to the lower approximation and the upper approximation. The objects belonging to the lower approximation are appropriate objects and are processed by the machine learning algorithms, whereas the objects clustered into the upper approximation are considered ambiguous. The RKM algorithm places 539 objects in the lower approximation and the remaining objects in the upper approximation. Table 10 shows the results of the RKM algorithm. Subsequently, machine learning is applied to diagnose cancer as benign or malignant. Table 11 shows the results of conventional machine learning for the classification of cancer disease. These results leave room for improvement, so the RKM algorithm is applied to enhance the existing machine learning algorithms.
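The instance counts quoted in the three result sections determine how much of each dataset remains after the ambiguous objects are removed. The short sketch below derives the number of excluded objects and the retained share from those counts; the totals and lower-approximation sizes are taken from the text, while the variable and function names are illustrative.

```python
# (total instances, instances in the lower approximation), as quoted in the text
counts = {
    "diabetes": (768, 718),
    "kidney": (400, 174),
    "cancer": (569, 539),
}

def retention_summary(counts):
    """Number of excluded objects and percentage retained per dataset."""
    summary = {}
    for name, (total, lower) in counts.items():
        summary[name] = {
            "ambiguous": total - lower,
            "retained_pct": round(100.0 * lower / total, 1),
        }
    return summary

summary = retention_summary(counts)
for name, row in summary.items():
    print(f"{name}: {row['ambiguous']} ambiguous, {row['retained_pct']}% retained")
```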

Comparative Analysis
This section presents a comparative analysis between the proposed model and other state-of-the-art work on the same datasets. The comparison is important because it puts the results of the proposed model in context. The accuracy metric is used to compare the proposed model with the existing classification algorithms. Table 13 shows the results of the proposed system and the existing neural network approach. The results of the proposed system are better than those of the existing neural network approach.

Conclusion
The performance of existing machine learning algorithms in diagnosing chronic diseases is hindered by the presence of ambiguous objects, which show traits of more than one class. To identify and process the ambiguous objects explicitly, we have demonstrated noncrisp RKM clustering, which handles these objects to improve the accuracy of classification algorithms. The proposed system is built on a soft clustering algorithm, namely, rough K-means, that can be employed for modelling ambiguity. Rough K-means clustering can assist in determining the exact class of the ambiguous objects or an approximate one. The RKM algorithm clusters the data into lower and upper approximations. The objects clustered in the lower approximation are considered appropriate objects, whereas the objects that belong to the upper approximation are considered ambiguous and are excluded from the chronic disease datasets. The objects that belong to the lower approximation are processed by the machine learning algorithms to predict chronic diseases. It is observed that the RKM algorithm has increased the performance of the conventional machine learning algorithms in predicting chronic diseases. The experimental results demonstrate that the proposed system is successfully employed for the diagnosis of chronic diseases. Comparative analysis results between existing machine learning algorithms and the proposed system are presented, and the results of the proposed system are superior in terms of the accuracy, specificity, sensitivity, precision, recall, and F-score performance measures. Using machine learning algorithms to identify common web search behaviour as a proxy for chronic disease risk factors can be considered in future work.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.