An Online Weighted Bayesian Fuzzy Clustering Method for Large Medical Data Sets

With the rapid development of artificial intelligence, a wide range of medical and wearable devices has emerged, enabling people to collect health data in hospitals and elsewhere. The scale of medical data has therefore grown substantially, and such data sets can no longer be imported into memory at one time, which raises hardware requirements and increases time consumption. This paper introduces an online clustering framework that divides a large data set into several small data blocks, processes each block by weighted clustering, and obtains the cluster centers and corresponding weights of each block. The final cluster centers are then obtained by processing these per-block centers and weights, which accelerates clustering and reduces memory consumption. Extensive experiments are performed on UCI standard data sets, real cancer data sets, and a brain CT image data set. The experimental results show that the proposed method outperforms previous methods, achieving good clustering performance with less time consumption.


Introduction
In recent years, smart medical care has emerged alongside the vigorous development of artificial intelligence (AI) technology. At present, AI is applied in the medical field in many areas, such as disease prediction, intervention and consultation, disease diagnosis and treatment, drug research and development, and health service management [1]. The fusion of AI and healthcare services can help clinicians reduce reading time, aid early detection, and improve diagnostic accuracy. Clustering techniques in particular play a wide-ranging role in medical data analysis.
As a typical unsupervised learning method, clustering mines the internal relationships among data samples and places samples with the same or similar attributes in the same cluster, which avoids dependence on labeled data and saves considerable manpower and material resources [2]. Fuzzy clustering is a typical representative of clustering methods, and its most classic instance is the fuzzy C-means (FCM) algorithm. Fuzzy clustering improves on traditional hard partitioning by allowing each sample a graded degree of membership in every cluster.
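For reference, the standard FCM objective that these methods build on (not displayed in the text above) can be written as:

```latex
% Standard FCM objective: fuzzified within-cluster squared distances,
% with memberships constrained to sum to one for each sample
J_{\mathrm{FCM}}(U, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} u_{kn}^{m}\, \lVert x_k - c_n \rVert^{2},
\qquad u_{kn} \ge 0,\quad \sum_{n=1}^{N} u_{kn} = 1,\quad m > 1.
```

Hard clustering corresponds to forcing each membership u_kn into {0, 1}; the fuzzy index m > 1 smooths this assignment into a soft partition.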
There are a large number of derivative algorithms based on FCM, including the possibilistic C-means (PCM) algorithm, which uses possibilistic memberships to relax the limitations of fuzzy membership [3]. Recently, many researchers have improved the traditional FCM method from multiple perspectives and applied it in various scenarios [4]. Hua et al. [5] developed multiview fuzzy clustering within the FCM framework. Gu et al. [6] proposed a probabilistic FCM method for antecedent parameter learning in Takagi-Sugeno-Kang fuzzy systems. Zhou et al. [7] proposed a new membership-scaling FCM method that selects unchanged cluster centers through the triangle inequality, addressing the slow convergence and heavy computation of FCM on large data sets. Mishro et al. [8] proposed a type-2 adaptive weighted spatial FCM clustering algorithm to address noise misclassification and inaccurate cluster centers obtained by FCM in MR brain image segmentation. Wang et al. [9] proposed an FCM algorithm for irregular image segmentation with higher robustness and less computational effort than traditional segmentation algorithms. Based on hyperplane partitioning, Shen et al. [10] developed a feasible and efficient FCM algorithm for large data sets. Jha et al. [11] designed and implemented a kernelized fuzzy clustering algorithm using in-memory cluster computing. Liu et al. [12] proposed an FCM algorithm based on multiple-surface approximate interval membership for processing artifacts in brain MRI images. Wang et al. [13] proposed an FCM algorithm based on wavelet frames, which effectively removes image noise while preserving image details and offers a new way to segment images on irregular domains. Li et al. [14] proposed a domain-qualified adaptive FCM method for processing MRI brain images with noise and intensity inhomogeneity. Zhang and Huang [15] studied the generalization error of the FCM algorithm theoretically and bounded it from the perspective of convergence, providing guidance for applying sampling-based FCM methods. Wu et al. [16] proposed an online clustering algorithm by combining the FCM algorithm with an online framework to address the inability of batch learning to handle large-scale data sets. Zhang et al. [17] combined FCM with a nonlinear genetic algorithm and proposed an apple defect detection method to improve fruit defect detection. Shen et al. [18] proposed a hyperplane-partition method based on FCM for big data clustering. Recently, the Bayesian fuzzy clustering (BFC) algorithm [19] was proposed to embed the fuzzy method in a probabilistic model. BFC reinterprets the fuzzy method from the perspective of probability, expands the value range of the fuzzy index, and mitigates the tendency of fuzzy methods to fall into local optima. These characteristics make the BFC algorithm widely applicable to medical data processing. However, the high complexity of the BFC method limits its efficiency and thus its practical applications. Inspired by the above ideas, this paper proposes the online weighted Bayesian fuzzy clustering (OWBFC) method, which adopts an online clustering framework: it retains the advantages of the Bayesian fuzzy clustering algorithm while improving its efficiency. We verify the OWBFC method on a series of real-world data sets.
Compared with existing Bayesian fuzzy clustering algorithms, the contributions of our study are summarized as follows: (1) OWBFC combines the probabilistic method with the fuzzy method and realizes fuzzy clustering through a probabilistic model, inheriting the advantages of both. (2) In solving for the parameters, the Markov chain Monte Carlo (MCMC) method is used for sampling instead of a closed-form solution, so OWBFC can approach the globally optimal parameter values. (3) An online clustering framework is used in OWBFC to handle data sets too large to fit in memory, and a weighting mechanism is used to improve clustering efficiency.

Related Work
The BFC algorithm combines the probabilistic method with the fuzzy method. Starting from prior knowledge and Bayesian theory, it expands the range of the fuzzy index used in traditional fuzzy methods. The BFC algorithm uses an MCMC strategy [20] and a particle filter method [21, 22] to solve the optimization problem. Maximum a posteriori (MAP) inference is used for fuzzy clustering, and a normal distribution is further used to predict the number of clusters. Therefore, the BFC method is superior to previous fuzzy or probabilistic methods in many respects. However, the algorithmic complexity of BFC is relatively high.
This shortcoming makes the BFC method unsuitable for large-scale data and greatly limits its range of application, which does not meet current practical needs. The BFC algorithm solves fuzzy clustering from the perspective of probability. The probability model of BFC consists of three parts: the fuzzy data likelihood (FDL), the fuzzy membership prior (FCP), and the cluster center prior. In the fuzzy data likelihood (equation (1)), X, U, and C are the matrices of training data, fuzzy memberships, and cluster centers, respectively. K and N denote the numbers of samples and clusters, respectively, and u_kn is the membership of data point x_k in cluster n. The parameters m, c_n, and I represent the fuzzy index, the cluster centers, and the identity matrix, respectively. Z(u_k, m, C) is a normalization constant; since it is eliminated by equation (2), it does not need to be computed. The prior of fuzzy membership, p(U|C), consists of three factors: F_1 eliminates the normalizing constant in equation (1), and F_3 is a Dirichlet distribution defined over x_n ≥ 0, n = 1, ..., N, with the constraint that the x_n sum to 1. The parameter α is the Dirichlet prior parameter, which controls the membership degrees of the samples. Through the Dirichlet distribution, the BFC algorithm breaks the FCM constraint that the fuzzy index must be greater than 1, so that the fuzzy index in BFC can take any value.
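The displayed equations did not survive extraction; as a sketch consistent with the definitions above (the exact forms are given in [19]), the fuzzy data likelihood and the Dirichlet factor can be written as:

```latex
% Fuzzy data likelihood: each sample contributes a product of Gaussians
% raised to the fuzzified memberships, divided by the normalizer Z(u_k, m, C)
p(X \mid U, C) = \prod_{k=1}^{K} \frac{1}{Z(u_k, m, C)}
  \prod_{n=1}^{N} \mathcal{N}\left(x_k \mid c_n, \mathbf{I}\right)^{u_{kn}^{m}}

% Dirichlet factor F_3 of the membership prior, defined on the simplex
\mathrm{Dir}(u_k \mid \alpha) = \frac{\Gamma\left(\sum_{n=1}^{N} \alpha_n\right)}
  {\prod_{n=1}^{N} \Gamma(\alpha_n)} \prod_{n=1}^{N} u_{kn}^{\alpha_n - 1},
\qquad u_{kn} \ge 0,\quad \sum_{n=1}^{N} u_{kn} = 1
```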
The cluster center prior p(C) is chosen to match the high membership degrees produced by equation (4). Here μ_c and Σ_c are the mean and covariance of all samples, and c is a user-set parameter that controls the strength of the prior; we use c = 3 in our study. The joint likelihood of X, U, and C is obtained by multiplying equations (1), (2), and (4).
Following MAP theory, the objective is the negative logarithm of the joint likelihood in equation (7), multiplied by a factor of 2 for simplification. Finally, BFC performs MAP inference, using sampling to filter memberships and cluster centers and obtain their optimal values.
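As a sketch of the lost displays, consistent with the description above, the center prior is a Gaussian around the sample statistics, and the MAP objective is the scaled negative log of the joint likelihood:

```latex
% Cluster center prior around the sample mean \mu_c and covariance \Sigma_c
p(C) = \prod_{n=1}^{N} \mathcal{N}\left(c_n \mid \mu_c, \Sigma_c\right)

% Joint likelihood and the MAP objective (its negative log, scaled by 2)
p(X, U, C) = p(X \mid U, C)\, p(U \mid C)\, p(C),
\qquad J(U, C) = -2 \log p(X, U, C)
```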
From the above introduction, we can see that the BFC algorithm removes the fuzzy-index constraint of traditional fuzzy clustering and can approach the globally optimal solution, but its time complexity is too high to handle large data sets.

Weighted Bayesian Fuzzy Clustering.
For large data sets, the data often cannot be imported into the computer at one time. This paper adopts an online clustering framework: the large data set is divided into several easy-to-handle small data blocks, and the cluster centers of each data block are taken as its representative points. As the blocks are processed, the representative points of each block and their corresponding weights are accumulated into two new sets, which are then processed to obtain the cluster centers of the whole data set, thereby accelerating clustering. Since the OWBFC method uses this block-and-weight mechanism to assign weights to the cluster centers of each data block, we first introduce the weighted Bayesian fuzzy clustering (WBFC) algorithm and then extend WBFC to its online version.
To further judge the contribution of each sample point to the clusters during clustering, this paper introduces the WBFC algorithm, which adaptively weights different sample points to select representative ones. In the objective function of WBFC (equation (9)), w_k > 0 represents the contribution of the kth sample to the final cluster partition; how to set w_k is described in detail in the next section. Following [19], the MCMC parameter optimization strategy is used in the WBFC algorithm. First, we initialize the parameters u_k and c_n from a Dirichlet distribution and a normal distribution, respectively. We sample the U matrix according to U ~ p(U|X, C) ∝ p(X, U, C) using Gibbs sampling and judge whether the new membership sample is accepted; if it is, u_k is set to the new membership sample u_k^Ψ. The acceptance rate A_u is computed by equation (10). Then, we sample C according to C ~ p(C|X, U) ∝ p(X, U, C) and judge whether the new cluster center sample is accepted; if it is, c_n is set to c_n^Ψ. The acceptance rate A_c is computed by equation (12). If p(X, c_n^Ψ | U*) > p(X, c_n* | U*), we set the current c_n* to c_n^Ψ, the new cluster center, where p(X, c_n | U) is the joint likelihood of the data and the single center c_n given the memberships. Finally, we check the maximum likelihood over all samples using equation (9). The whole training process iterates until the model converges. The training procedure of WBFC is shown in Algorithm 1.
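To make the sampling loop concrete, the following minimal Python sketch (illustrative only, not the authors' implementation; the helper names and the simplified objective, which keeps only the weighted data term of equation (9), are our own) shows the weighted fuzzy data term and a generic Metropolis-style acceptance test of the kind used in Algorithm 1:

```python
import math
import random

def weighted_fuzzy_objective(X, U, C, w, m):
    """Weighted fuzzy data term: sum_k w_k * sum_n u_kn^m * ||x_k - c_n||^2.

    This keeps only the likelihood part of the WBFC objective; the prior
    terms are omitted for brevity.
    """
    total = 0.0
    for k, x in enumerate(X):
        for n, c in enumerate(C):
            dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += w[k] * (U[k][n] ** m) * dist_sq
    return total

def metropolis_accept(neg_log_new, neg_log_old, rng=random):
    """Accept a proposed sample with probability min(1, exp(old - new)),
    where both arguments are negative log-likelihoods."""
    if neg_log_new <= neg_log_old:
        return True  # the proposal is at least as good: always accept
    return rng.random() < math.exp(neg_log_old - neg_log_new)
```

A proposal that lowers the negative log-likelihood is always accepted; worse proposals are accepted with exponentially decaying probability, which is what allows the sampler to escape local optima rather than converging greedily.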

Online Weighted Bayesian Fuzzy Clustering.
The WBFC algorithm introduces object weights into the BFC algorithm so that more representative sample points can be selected during clustering. Building on WBFC, we further propose its online version, the OWBFC algorithm. Owing to the advantages of online processing, OWBFC can handle large data sets: it divides the large-scale data into several easy-to-process blocks, runs the WBFC algorithm on each block, merges the cluster centers of the blocks into a new set, computes the weight of each cluster center, and merges the weights accordingly. Finally, the new cluster center set and the corresponding weight set are processed to obtain the final cluster centers. In the weight factor w_n of OWBFC, w_q represents the weight of the representative points of each data block, K_l is the number of sample points in the lth block, u_kq is the membership of x_k in cluster q, and Q is the number of clusters. The training procedure of OWBFC is given in Algorithm 2. First, we divide the training data X into d blocks X = {X_1, ..., X_d}, where block X_l has K_l sample points, l = 1, 2, ..., d. U_l and C_l denote the fuzzy membership and cluster center matrices of block l, respectively. We run the WBFC algorithm on the first block X_1 to obtain its fuzzy membership and cluster center matrices. Then, we run the WBFC algorithm on each remaining block, initialized with the cluster center matrix C_{l-1} of the previous block.
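The block-processing flow described above can be sketched generically in Python (a simplified illustration; `block_cluster` and `final_cluster` stand in for WBFC runs, and the membership-sum weighting is our reading of the weight factor w_n):

```python
def split_blocks(X, d):
    """Split the data set X into d roughly equal contiguous blocks."""
    size = -(-len(X) // d)  # ceiling division
    return [X[i:i + size] for i in range(0, len(X), size)]

def online_cluster(X, d, block_cluster, final_cluster):
    """Online framework: cluster each block, keep its centers as weighted
    representative points, then cluster the pooled representatives.

    block_cluster(block) -> (centers, U) plays the role of a WBFC run;
    final_cluster(reps, weights) produces the final cluster centers.
    """
    reps, weights = [], []
    for block in split_blocks(X, d):
        centers, U = block_cluster(block)
        reps.extend(centers)
        # assumed weighting: each representative point carries the total
        # membership mass assigned to it within its block
        for q in range(len(centers)):
            weights.append(sum(U[k][q] for k in range(len(block))))
    return final_cluster(reps, weights)
```

Only the per-block center sets and their weights ever need to be held together, so memory scales with the block size rather than with the full data set.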

Data Sets and Experiment Settings.
In the experiments, we use several medical data sets: two cancer data sets, Armstrong-2002-v2 and Bhattacharjee-2001 [23]; three medical data sets from the UCI repository [24]; and brain images. Armstrong-2002-v2 is a data set for distinguishing the expression of leukemia genes; it contains three categories and 72 samples in total. Bhattacharjee-2001 is a lung cancer classification data set with five categories and 203 samples in total. Because of the small sample sizes of these two data sets, they are not split into blocks. The heart disease data set, the diabetic retinopathy Debrecen (DRD) data set, and the hepatitis C virus (HCV) for Egyptian patients data set are the three UCI medical data sets. The heart disease data set contains 303 samples, of which only 14 attributes are used in this article. The DRD data set contains 1151 samples, and the HCV data set contains 1385 samples. To facilitate division into blocks, this study takes 1000 samples from the DRD data set and 1200 samples from the HCV data set. Three brain CT images, CT1, CT2, and CT3, were selected, with sizes of 275 × 273, 273 × 277, and 264 × 271 pixels. To facilitate segmentation, the three images are resized to 272 × 272, 272 × 272, and 264 × 264 pixels, respectively. The comparison algorithms are OFCM [25] and SPFCM [25], both of which can process large-scale data clustering. The two cancer data sets and one UCI medical data set are used to compare the clustering performance of the OWBFC, OFCM, and SPFCM algorithms without splitting; the remaining two UCI medical data sets are used to compare clustering performance and running time at different split ratios; and the brain images are used to compare the running time of OWBFC and BFC. The algorithms involve two parameters, the fuzzy index m and the prior parameter α; in this study, we set m = 1.7 and α = 1.
To display clustering performance intuitively, we use four indicators: accuracy, entropy, F-measure, and purity. R = full_t / block_t denotes the ratio between the running time of processing the whole data set and that of processing it in blocks, where full_t is the running time on the whole data set and block_t is the sum of the running times on the individual blocks. Although in this part the data set is loaded into memory at one time, we consider R representative of the case where the data cannot be loaded at once, because the total amount of data processed is the same whether it is handled in blocks or all at once. Our experimental platform is an AMD R5-5600X with six cores and 16 GB of memory, running Windows 10 and MATLAB 2016a.
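Of the four indicators, purity is the simplest to state; a minimal Python implementation (our own sketch, not the paper's evaluation code) counts the majority true class within each predicted cluster:

```python
from collections import Counter

def purity(labels_true, labels_pred):
    """Purity: fraction of samples that belong to the majority true
    class of the cluster they were assigned to."""
    clusters = {}
    for true_label, pred_label in zip(labels_true, labels_pred):
        clusters.setdefault(pred_label, []).append(true_label)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)
```

For example, purity([0, 0, 1, 1], [0, 0, 0, 1]) returns 0.75: the first predicted cluster holds true labels {0, 0, 1} (majority count 2) and the second holds {1} (count 1), giving (2 + 1) / 4.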

Experimental Results on the Armstrong-2002-v2, Bhattacharjee-2001, and Heart Disease Data Set.
To allow the SPFCM and OFCM algorithms to run well, we follow the suggestions of Havens et al. [25] and set the fuzzy index m = 1.7. For the Armstrong-2002-v2, Bhattacharjee-2001, and heart disease data sets, the number of clusters is set to 3, 5, and 5, respectively. Because the sample sizes of these three data sets are small, they are not processed in blocks. Table 1 shows the experimental results of the OFCM, SPFCM, and OWBFC algorithms. The OFCM and SPFCM algorithms have similar clustering performance on these three data sets, and it is difficult to rank them. Comparing the OFCM, SPFCM, and OWBFC algorithms, however, it is easy to see that the OWBFC algorithm achieves the best clustering results except in a few special cases.

Clustering Performance on DRD and HCV Data Set.
With the same parameter settings as in Section 4.2, the DRD and HCV data sets are divided into blocks of 5%, 10%, and 50% of the whole data set. The last column of the HCV data set is taken as the basis for the number of clusters; the fuzzy index is m = 1.7, and the number of clusters is set to 4. The OFCM, SPFCM, and OWBFC algorithms are run independently 10 times with random initialization to compute the maximum, minimum, and average values of accuracy, entropy, F-measure, and purity. The clustering results on the two data sets are shown in Tables 2-5; for example, 74.78/74.90/74.66 denotes the mean, maximum, and minimum accuracy values, respectively. From Table 2, the accuracy of OFCM is slightly lower than that of SPFCM when the number of data blocks is large and higher when the number of blocks is small; overall, the two are similar, and the accuracy of OWBFC is the best. Compared with the OFCM and SPFCM algorithms, OWBFC also achieves the best results in entropy, F-measure, and purity. Because the OWBFC algorithm uses MCMC sampling to solve for the parameters, it can approach their global optimum and therefore obtain better clustering performance. From Tables 3-5 alone, the gap between the three algorithms is not obvious; combined with Table 8, however, it is clear that the OWBFC algorithm achieves good clustering performance while greatly reducing time consumption. Table 8 shows the running times at different division ratios. Because the data set names are long, abbreviations are used in the experiments.

Brain Images.
Three brain images are shown in Figure 1. We use them to verify the clustering performance of OWBFC for large-scale image segmentation, comparing the OWBFC and BFC algorithms. Following the recommendations of [19, 25], the parameter α is set to 1 and the fuzzy index m to 1.7. We split the brain images at a ratio of 25% and set the number of classes to 3 for all images. Figures 2 and 3 show the clustering results of the three brain images by the BFC and OWBFC algorithms, respectively. Table 6 shows the experimental results on the three brain images, and Table 7 shows the corresponding running times of BFC and OWBFC. From Table 6, the clustering performance of OWBFC is better than that of BFC, and from Table 7, OWBFC clearly consumes less time than BFC. In summary, compared with BFC, the OWBFC algorithm not only maintains a good clustering effect but also consumes less time.

Input: training data X, fuzzy index m, number of clusters N, number of sampling iterations N_iter, weights w
Output: fuzzy membership U* and cluster prototypes C*
Step 1. Initialize the parameters μ_c and Σ_c
Step 2. Initialize u_k ~ Dirichlet(α = 1_N), k = 1, ..., K
Step 3. Initialize c_n ~ N(μ_c, Σ_c), n = 1, ..., N
Step 4. u_k* = u_k, c_n* = c_n // assign the MAP sample to the current sample
Step 5. for iter = 1, ..., N_iter
Step 6. // sample U according to U ~ p(U|X, C) ∝ p(X, U, C)
Step 7. Sample a new membership u_k^Ψ by equation (3)
Step 8. Judge whether the new membership sample is accepted by equation (10); if accepted, u_k = u_k^Ψ
Step 9. Sample a new cluster center c_n^Ψ from N(c_n, Σ_c/δ)
Step 10. Judge whether the new cluster center sample is accepted by equation (12); if accepted, c_n = c_n^Ψ
Step 11. end
ALGORITHM 1: Weighted Bayesian Fuzzy Clustering (WBFC) algorithm.

Input: training data X, fuzzy index m, number of clusters Q, number of sampling iterations N_iter, weights w_n, number of data blocks d
Output: cluster prototypes C*
Step 1. Divide X into d blocks X = {X_1, ..., X_d} // each block has K_l sample points, 1 ≤ l ≤ d
Step 2. Initialize w = 1_{K_l}
Step 3. Obtain [U_1, C_1] = WBFC(X_1, Q, m) with w
Step 4. for l = 2 to d
Step 5. Obtain [U_l, C_l] = WBFC(X_l, Q, m) with C_{l-1}
Step 6. end
Step 7. Merge the block cluster centers and their weights, and process the merged weighted set to obtain the final centers C*
ALGORITHM 2: Online Weighted Bayesian Fuzzy Clustering (OWBFC) algorithm.

Conclusion
With the advancement of science and technology, the collection of various medical data has become more frequent and easier, making the scale of medical data larger and larger. Such data can no longer be imported into memory at one time, so the hardware requirements for processing them rise and the time consumption increases. This paper proposes the OWBFC method, which reduces the computer's memory consumption and the algorithm's time consumption by introducing an online clustering framework to process the data set in blocks. The experimental results show that block processing can effectively reduce the time consumption of the algorithm. However, the online clustering framework adopted in this paper needs to merge and store the cluster centers of each data block during processing, which raises the space consumption of the algorithm. Therefore, how to avoid excessive space consumption while ensuring low time consumption is a problem worth further study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Cong Zhang and Jing Xue contributed equally to this work.