AN OPTIMIZED K-MEANS ALGORITHM

Data mining is a young technology that has developed alongside databases and artificial intelligence. It is the process of extracting credible, novel, effective, and understandable patterns from databases. Cluster analysis, one of the main analytical methods in data mining, is used to find data segmentation and pattern information: by clustering the data, one can obtain the data distribution, observe the characteristics of each cluster, and study particular clusters further. The clustering method chosen directly influences the quality of the result. This paper reviews standard k-means clustering and analyzes its shortcomings: the algorithm computes the distance from every data point to every cluster center in each iteration, which makes it inefficient. We introduce an optimized algorithm that addresses this problem by using a simple data structure to store information from each iteration and reuse it in the next.


Introduction
The face detection problem is to determine whether a face exists in an input image and, if so, to return its location. In the face localization problem, the input image is known to contain a face and the goal is to determine its location. An automatic recognition system, a special case of face detection, locates in its earliest phase the region of interest (ROI) that contains the face. Face localization in an input image is a challenging task due to possible variations in location, scale, pose, occlusion, illumination, facial expression, and background clutter. Various methods have been proposed to detect and/or localize faces in an input image; however, there is still a need to improve the performance of localization and detection methods. A survey of face recognition and detection techniques can be found in [1]; a more recent survey, mainly on face detection, was written by Yang et al. [2].
One trend in ROI detection is to segment the image pixels into a number of groups or regions based on similarity properties. This idea has gained attention in recognition technologies that group the feature vectors to distinguish between image regions and then concentrate on the particular region containing the face. One of the earliest surveys on image segmentation techniques was by Fu and Mui [3], who classified the techniques into three classes: feature thresholding (clustering), edge detection, and region extraction. A later survey by N. R. Pal and S. K. Pal [4] concentrated only on fuzzy, nonfuzzy, and colour image techniques. Much effort has gone into ROI detection, which may be divided into two types: region growing and region clustering. The difference between the two is that region clustering searches for clusters without prior information, while region growing needs initial points, called seeds, from which to detect the clusters. The main problem in the region growing approach is finding suitable seeds, since the clusters grow from the neighbouring pixels of these seeds based on a specified deviation. For seed selection, Wan and Higgins [5] defined a number of criteria to select insensitive initial points for a subclass of region growing methods. To reduce region growing time, Chang and Li [6] proposed a fast region growing method using parallel segmentation.
As mentioned before, the region clustering approach searches for clusters without prior information. Pappas and Jayant [7] generalized the K-means algorithm for image segmentation; improving the clustering algorithm itself is one direction of this work. Guha et al. [16] proposed the CURE method, which uses multiple representative points per cluster to capture the shape of the "natural" clusters. Outliers and noise in the data can also reduce the performance of clustering algorithms [17], especially prototype-based algorithms such as K-means; one way to address this is to apply an outlier removal technique before conducting K-means clustering, for example, a simple distance-based outlier detection method [9]. On the other hand, many modified K-means clustering algorithms that work well for small and medium-size datasets are unable to deal with large datasets; Bradley et al. [18] introduced a discussion of scaling K-means clustering to large datasets.
Arthur and Vassilvitskii implemented a preliminary version, k-means++, in C++ to evaluate the performance of k-means. K-means++ [19] uses a careful seeding method to optimize the speed and accuracy of k-means. Experiments on four datasets showed that the algorithm is O(log k)-competitive with the optimal clustering, about 70% faster than k-means, and more accurate (the potential value obtained is better by factors of 10 to 1000). Kanungo et al. [20] also proposed a local search algorithm for the k-means problem; it is (9 + ε)-competitive but quite slow in practice. Xiong et al. investigated which measures best reflect the performance of K-means clustering [14]. An organized study was performed to understand the impact of data distributions on the performance of K-means clustering, and the traditional algorithm was improved so that it can handle datasets with large variation in cluster sizes. This formal study illustrates that entropy sometimes provides misleading information on clustering performance; based on this, the coefficient of variation (CV) is proposed to validate the clustering outcome. Experimental results showed that, for datasets with great variation in "true" cluster sizes, K-means reduces the variation in the resultant cluster sizes (to a CV of less than 1.0), while for datasets with small variation in "true" cluster sizes it increases the variation (to a CV of greater than 0.3).
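The careful seeding in k-means++ chooses each new center with probability proportional to its squared distance from the nearest center chosen so far. A minimal 1-D sketch of this idea follows; the function name and data are illustrative and not taken from the original C++ implementation:

```python
import random

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to squared distance to its nearest center."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance of every point to its nearest chosen center
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

seeds = kmeans_pp_seeds([0.0, 0.1, 10.0, 10.1], 2)
```

Because far-away points carry most of the probability mass, the seeds tend to land in different natural clusters, which is what drives the O(log k) competitiveness result.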
Scalability of k-means to large datasets is one of its major limitations. Chiang et al. [21] proposed a simple and easy-to-implement algorithm for reducing the computation time of k-means. This pattern reduction algorithm compresses and/or removes iteration patterns that are not likely to change their membership. A group of patterns is compressed by selecting one pattern D[r] to represent those removed and setting its value to the average of all removed patterns:

D[r] = (1/|R|) Σ_{i∈R} D[i],

where R is the index set of the removed patterns. Bradley et al. proposed scalable k-means, which uses buffering and a two-stage compression scheme to compress or remove patterns in order to improve the performance of k-means; the algorithm is slightly faster than k-means but does not always give the same result, and its performance is degraded by the compression processes and the compression ratio. Farnstrom et al. also proposed a simple single-pass k-means algorithm [23], which reduces the computation time of k-means. Ordonez and Omiecinski proposed relational k-means [24], which uses a block and incremental concept to improve the stability of scalable k-means. The computation time of k-means can also be reduced using parallel bisecting k-means [25] and triangle inequality [26] methods.
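The compression step of the pattern reduction algorithm, replacing a group of patterns that are unlikely to change membership by a single representative set to their average, can be sketched as follows; the function name and the flat-list representation are illustrative assumptions:

```python
def compress_patterns(patterns, stable_idx):
    """Replace the patterns at indices stable_idx with one representative
    whose value is their average; return (reduced data, representative weight)."""
    removed = [patterns[i] for i in stable_idx]
    rep = sum(removed) / len(removed)      # the averaged pattern D[r]
    kept = [p for i, p in enumerate(patterns) if i not in set(stable_idx)]
    return kept + [rep], len(removed)

reduced, weight = compress_patterns([1.0, 2.0, 3.0, 10.0], [0, 1, 2])
# reduced == [10.0, 2.0], weight == 3
```

Keeping the weight lets later iterations treat the representative as `weight` identical points when recomputing centroids.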
Although K-means is the most popular and simplest clustering algorithm, its major difficulty is that it cannot ensure globally optimal results, because the initial cluster centers are selected randomly. Reference [27] presents a technique that selects the initial cluster centers by using a Voronoi diagram, automating their selection. To validate the performance, experiments were performed on a range of artificial and real-world datasets, with quality assessed by an error rate (ER) measure: the lower the error rate, the better the clustering. The proposed algorithm produces better clustering results than the traditional K-means, with a reported 60% reduction in the iterations needed for defining the centroids.
All previous efforts to increase the performance of the K-means algorithm still need more investigation, since they all search for a local minimum; searching for the global minimum would certainly improve the performance of the algorithm. The rest of the paper is organized as follows. In Section 2, the proposed method is introduced, which finds the optimal centers for each cluster, corresponding to the global minimum of the k-means objective. In Section 3 the results are discussed and analyzed using image sets from the MIT, BioID, and Caltech datasets. Finally, conclusions are drawn in Section 4.

K-Means Clustering
K-means, an unsupervised learning algorithm first proposed by MacQueen in 1967 [28], is a popular method of cluster analysis. It organizes the given objects into homogeneous groups, called clusters, based on similarities among the objects under certain criteria, and solves the well-known clustering problem through an iterative alternating fitting process: the algorithm partitions a dataset X into K disjoint clusters such that each observation belongs to the cluster with the nearest mean. Let μ_k be the centroid of cluster c_k and let d(x_i, μ_k) be the distance between an observation x_i and that centroid. K-means clustering is utilized in a vast number of applications, including machine learning, fault detection, pattern recognition, image processing, statistics, and artificial intelligence [11, 29, 30]. It is considered one of the fastest clustering algorithms, with a number of variants that address its sensitivity to the selection of initial points and other issues, such as estimating the number of clusters [31], initializing the cluster centroids [32], and speeding up the algorithm [33]. Reference [34] proposes to improve the global search properties of k-means and its performance on very large datasets by computing clusters successively. The main weakness of k-means clustering, its sensitivity to the initial positions of the cluster centers, is overcome by the global k-means clustering algorithm, which consists of a deterministic global optimization technique. It does not select initial values randomly; instead, an incremental approach is applied to optimally add one new cluster center at each stage. Global k-means also offers a method to reduce the computational load while maintaining the solution quality. Experiments comparing the performance of k-means, global k-means, min k-means, and fast global k-means are shown in Figure 1. Numerical results show that the global k-means algorithm considerably outperforms the other k-means algorithms.
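For reference, the basic alternating iteration that all of these variants modify (assign points to the nearest mean, then recompute each mean) can be written in a few lines; this 1-D sketch uses illustrative data:

```python
def kmeans(points, centers, iters=100):
    """Plain Lloyd-style k-means in 1-D: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, centers)]
        if new == centers:      # converged: assignments can no longer change
            break
        centers = new
    return centers

final = kmeans([1.0, 2.0, 9.0, 10.0], [0.0, 5.0])
# final == [1.5, 9.5]
```

The sensitivity to the initial `centers` argument is exactly the weakness that the global and optimized variants discussed here try to remove.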

Modifications in K-Means Clustering: The Global k-Means Algorithm
Bagirov [35] proposed a new version of the global k-means algorithm for minimum sum-of-squares clustering problems, comparing three different versions of the k-means algorithm to arrive at the modified version. The proposed algorithm computes clusters incrementally: the k − 1 cluster centers from the previous iteration are used to compute a k-partition of the dataset, and the starting point of the k-th cluster center is computed by minimizing an auxiliary cluster function. Given a finite set of points A in n-dimensional space, global k-means computes the centroid x₁ of the set A as

x₁ = (1/|A|) Σ_{a∈A} a.

Numerical experiments on 14 datasets demonstrate the efficiency of the modified global k-means algorithm in comparison with the multistart k-means (MS k-means) and GKM algorithms when the number of clusters k > 5; the modified global k-means does, however, require more computational time than the global k-means algorithm. Xie and Jiang [36] also proposed a novel version of the global k-means algorithm, modifying the way the next cluster center is created; this gave better execution time without affecting the performance of the global k-means algorithm. Another extension of standard k-means clustering is global kernel k-means [37], which optimizes the clustering error in feature space by employing kernel k-means as a local search procedure. A kernel function is used to solve the M-clustering problem, and near-optimal solutions are used to optimize the clustering error in the feature space. This incremental algorithm can identify nonlinearly separable clusters in input space and has no dependence on cluster initialization; it can also handle weighted data points, making it suitable for graph partitioning applications. Two major modifications were made to reduce the computational cost with no major effect on the solution quality.
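The incremental strategy of global k-means, growing the k-th solution from the (k − 1)-th by auditioning every data point as the new center, can be sketched as follows. This 1-D sketch uses illustrative names and omits the auxiliary-function and fast-global speed-ups discussed above, but it conveys the deterministic, initialization-free character of the method:

```python
def lloyd(points, centers, iters=50):
    """Plain k-means refinement used as the local search inside global k-means."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)].append(p)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def sse(points, centers):
    """Sum of squared distances of each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def global_kmeans(points, K):
    centers = [sum(points) / len(points)]          # k = 1: the overall centroid
    for _ in range(2, K + 1):
        best = None
        for cand in points:                        # try each point as the new center
            trial = lloyd(points, centers + [cand])
            if best is None or sse(points, trial) < sse(points, best):
                best = trial
        centers = best
    return centers
```

The cost of trying every point as a candidate is exactly what the fast global k-means variant and Bagirov's auxiliary cluster function are designed to reduce.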
Video imaging and image segmentation are important applications of k-means clustering. Hung et al. [38] modified the k-means algorithm for color image segmentation, proposing a weight selection procedure in the W-k-means algorithm. An evaluation function F(I) is used for comparison, of the form

F(I) = √R · Σ_{i=1}^{R} e_i² / √A_i,

where I is the segmented image, R is the number of regions in the segmented image, A_i is the area (number of pixels) of the i-th region, and e_i is the color error of region i, the sum of the Euclidean distances between the color vectors of the original image and the segmented image. Smaller values of F(I) indicate better segmentation. Results on color image segmentation show that the proposed procedure produces better segmentations than random initialization. Maliatski and Yadid-Pecht [39] also propose an adaptive, hardware-driven k-means clustering approach that uses both pixel intensity and pixel distance in the clustering process. In [40], an FPGA implementation of real-time k-means clustering for color images is presented, using a filtering algorithm suited to hardware. Suliman and Isa [11] presented a clustering algorithm called adaptive fuzzy-K-means for image segmentation. It can be used on different types of images, including medical images, microscopic images, and CCD camera images. The concepts of fuzziness and belongingness are applied to produce clustering that is more adaptive than conventional clustering algorithms, and the proposed adaptive fuzzy-K-means algorithm provides better segmentation and visual quality for a range of images.
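Assuming F(I) takes the common form √R · Σᵢ eᵢ²/√Aᵢ implied by the definitions above (number of regions R, region areas Aᵢ, color errors eᵢ), it is straightforward to compute; the function name and example regions are illustrative:

```python
import math

def evaluation_F(regions):
    """Segmentation quality measure: regions is a list of (area, color_error)
    pairs for the segmented image; smaller F means better segmentation."""
    R = len(regions)
    return math.sqrt(R) * sum(e ** 2 / math.sqrt(A) for A, e in regions)

score = evaluation_F([(4, 1.0), (9, 2.0)])   # two regions of area 4 and 9
```

The 1/√Aᵢ factor penalizes many small regions with large color error, discouraging over-segmentation.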

The Proposed Methodology.
In this method, the face location is determined automatically using the optimized K-means algorithm. First, the input image is reshaped into a vector of pixel values; the pixels are then clustered into two classes by applying a threshold determined by the algorithm. Pixels with values below the threshold are put in the nonface class, and the rest of the pixels are put in the face class. However, some unwanted parts may be clustered into this face class, so the algorithm is applied again to the face class to obtain only the face part.
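The two-pass flow just described can be sketched as follows. The real separator is the optimized threshold derived later in the paper; here a simple mean threshold stands in for it (an assumption for illustration), and the image is a flat list of pixel values:

```python
def two_pass_face_mask(pixels, find_threshold):
    """First pass drops the dark nonface pixels; second pass drops the
    bright illuminated background, keeping only the face class."""
    t1 = find_threshold(pixels)
    survivors = [p if p >= t1 else 0 for p in pixels]     # nonface class -> 0
    t2 = find_threshold([p for p in survivors if p > 0])
    return [p if p >= t2 else 0 for p in survivors]       # keep face class only

def mean_threshold(values):              # stand-in separator for illustration
    return sum(values) / len(values)

mask = two_pass_face_mask([10, 20, 120, 130, 200, 210], mean_threshold)
# mask == [0, 0, 0, 0, 200, 210]: dark pixels and lit background removed
```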

Optimized K-Means Algorithm.
The proposed modified K-means algorithm is intended to overcome the limitations of the standard version by using differential equations to determine the optimum separation point. The algorithm finds the optimal centers for each cluster, corresponding to the global minimum of the k-means objective. As an example, say we would like to cluster an image into two subclasses, face and nonface; we look for a separator that divides the image into two different clusters in two dimensions, after which the means of the clusters can be found easily. Figure 2 shows the separation points and the mean of each class. To find these points, we propose the following formulation in the continuous and discrete cases.
When the size of the data is very large, the central question is the cost of minimizing the total variance

Var = Σ_{k=1}^{K} Σ_{x∈c_k} ‖x − μ_k‖²,

where K is the number of clusters and μ_k is the mean of cluster c_k. Since the squared Euclidean distance decomposes across coordinates,

‖x − μ‖² = Σ_{l=1}^{d} (x_l − μ_l)², with x = (x_1, ..., x_d) and μ = (μ_1, ..., μ_d),

where d is the dimension, it is enough to minimize the one-dimensional variance Var 1 in each coordinate without losing generality.
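The reduction to one dimension rests on the fact that the squared Euclidean distance is a sum of per-coordinate squared differences, so the within-cluster variance is a sum of one-dimensional variances. A small numeric check with made-up points:

```python
def sq_dist(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))

cluster = [(1.0, 4.0), (3.0, 8.0)]
# centroid of the cluster, coordinate by coordinate
mu = tuple(sum(c) / len(cluster) for c in zip(*cluster))      # (2.0, 6.0)

total = sum(sq_dist(x, mu) for x in cluster)                  # full variance
per_dim = sum(sum((x[l] - mu[l]) ** 2 for x in cluster) for l in range(2))
assert total == per_dim == 10.0
```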

Continuous Case.
For the continuous case, f(x) plays the role of the probability density function (PDF). For two classes, we need to minimize the two-center variance

min Var 2 = ∫_{−∞}^{t} (x − μ₁)² f(x) dx + ∫_{t}^{∞} (x − μ₂)² f(x) dx,

and for K ≥ 2 the same construction applies to each adjacent pair of classes. Let X ∈ ℝ be a random variable with probability density function f(x); we want to find the separator t and the centers μ₁ and μ₂ that minimize this expression. Since P(X ≥ t) = 1 − P(X < t), μ₁ and μ₂ can be expressed in terms of t. Define

E⁻ = E[X · 1_{X<t}] and E⁺ = E[X · 1_{X≥t}].

Partially differentiating Var 1 and Var 2 with respect to μ₁ and μ₂, respectively, and simplifying, we conclude that the minimization occurs when

μ₁ = E⁻ / P(X < t) and μ₂ = E⁺ / P(X ≥ t),

that is, when each center is the conditional mean of its side of the separator.
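The conditional-mean condition suggests a simple fixed-point iteration for the separator; this sketch works on sample data standing in for the density f(x), and the function name is illustrative:

```python
def optimal_separator(xs, t0, iters=100):
    """Iterate t <- (mean below t + mean above t) / 2 until it stabilizes."""
    t = t0
    for _ in range(iters):
        left = [x for x in xs if x < t]
        right = [x for x in xs if x >= t]
        if not left or not right:        # separator fell off the data range
            break
        t = 0.5 * (sum(left) / len(left) + sum(right) / len(right))
    return t

t = optimal_separator([1.0, 2.0, 9.0, 10.0], 3.0)
# t == 5.5, the midpoint of the two cluster means 1.5 and 9.5
```

Like k-means itself, the iteration can stop at a local solution, which is why the paper characterizes all roots of the optimality condition rather than a single fixed point.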

Mathematical Problems in Engineering
To find the minimum of Var 2 with respect to the separator t, we need dVar 2/dt. We find the derivatives of the two parts separately and add them. After simplification, the sum of the two differentials factors, and equating it with 0 gives

[E⁻/P(X < t) − E⁺/P(X ≥ t)] · [2t − E⁻/P(X < t) − E⁺/P(X ≥ t)] = 0,

where E⁻ = E[X · 1_{X<t}] and E⁺ = E[X · 1_{X≥t}]. But E⁻/P(X < t) − E⁺/P(X ≥ t) ≠ 0, since in that case μ₁ = μ₂, which contradicts the initial assumption. Therefore

t = (1/2) (E⁻/P(X < t) + E⁺/P(X ≥ t)) = (μ₁ + μ₂)/2,   (26)

and we have to find all values of t that satisfy (26), which is easier than minimizing Var 2 directly. Expressing E⁻, E⁺, and the tail probabilities for a specific distribution and substituting them into this condition gives the working equation, referred to below as (28).

Finding the Centers of Some Common Probability Density Functions
(1) Uniform Distribution. The probability density function of the continuous uniform distribution is

f(x) = 1/(b − a) for a ≤ x ≤ b (and 0 otherwise),

where a and b are the two parameters of the distribution. Computing the expected values E[X · 1_{X≥t}] and E[X · 1_{X<t}], putting all these in (28), and solving, we get t = (a + b)/2.

(2) Log-Normal Distribution. The probability density function of a log-normal distribution is

f(x) = (1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²)), x > 0.

The probabilities to the left and right of t are, respectively, Φ((ln t − μ)/σ) and 1 − Φ((ln t − μ)/σ), where Φ is the standard normal CDF. Computing the corresponding expected values E[X · 1_{X≥t}] and E[X · 1_{X<t}], putting all these in (28), assuming μ = 0 and σ = 1, and solving gives the separator numerically.

(3) Normal Distribution. The probability density function of a normal distribution is

f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).

The probabilities to the left and right of t are, respectively, Φ((t − μ)/σ) and 1 − Φ((t − μ)/σ), and the expected values E[X · 1_{X≥t}] and E[X · 1_{X<t}] follow from the Gaussian tail formulas. Putting all these in (28) and solving, we get t = μ; assuming μ = 0 and σ = 1, this gives t = 0.

2.2.4. Discrete Case. Let X be a discrete random variable and assume we have a set of n observations from X. Let f_i be the mass density function for an observation x_i. Let x_{i−1} < t_{i−1} < x_i and x_i < t_i < x_{i+1}. We define μ_{≤i} as the mean of all x_j ≤ x_i, and μ_{≥i} as the mean of all x_j ≥ x_i. Define Var 2(t_{i−1}) as the variance of the two centers forced by t_{i−1} as a separator.
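The closed forms above can be sanity-checked numerically. This sketch brute-forces the two-class variance over a fine grid standing in for samples of a uniform distribution on [a, b] and confirms the separator lands near (a + b)/2; all names are illustrative:

```python
def var2(xs, t):
    """Two-class variance induced by separator t (infinite if a side is empty)."""
    left = [x for x in xs if x < t]
    right = [x for x in xs if x >= t]
    if not left or not right:
        return float("inf")
    m1, m2 = sum(left) / len(left), sum(right) / len(right)
    return (sum((x - m1) ** 2 for x in left)
            + sum((x - m2) ** 2 for x in right))

a, b = 2.0, 6.0
xs = [a + (b - a) * i / 200 for i in range(201)]      # grid over [a, b]
best_t = min(xs, key=lambda t: var2(xs, t))
assert abs(best_t - (a + b) / 2) < 0.05               # matches t* = (a + b) / 2
```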

The first part of Var 2(t_{i−1}) simplifies to a sum of squared deviations of the observations x_j ≤ x_{i−1} around μ_{≤i−1}, and the second part simplifies similarly for the observations above the separator. Now, in order to simplify the calculation, we rearrange the difference ΔVar 2 = Var 2(t_i) − Var 2(t_{i−1}); after simplification, its first and second parts each reduce to closed-form expressions in the partial sums of the observations. In general, these types of optimization arise when the dataset is very big.
To find the minimum of the total variation, it is enough to find a good separator t_i between x_{i−1} and x_i such that ΔVar 2(t_i) · ΔVar 2(t_{i+1}) < 0 and ΔVar 2(t_i) < 0, that is, a sign change of the difference between consecutive separators (see Figure 3).
After finding a few local minima, we can get the global minimum by choosing the smallest among them.
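In the discrete case this search amounts to scanning the gaps between consecutive sorted observations, evaluating the two-class variance at each candidate separator, and keeping the smallest; a direct brute-force sketch (illustrative names):

```python
def best_two_means(xs):
    """Try every gap between consecutive sorted points as a separator and
    return (best separator, its two-class variance)."""
    xs = sorted(xs)
    best_t, best_v = None, float("inf")
    for i in range(1, len(xs)):
        t = 0.5 * (xs[i - 1] + xs[i])      # candidate between x_{i-1} and x_i
        left, right = xs[:i], xs[i:]
        m1, m2 = sum(left) / len(left), sum(right) / len(right)
        v = (sum((x - m1) ** 2 for x in left)
             + sum((x - m2) ** 2 for x in right))
        if v < best_v:
            best_t, best_v = t, v
    return best_t, best_v

sep, var = best_two_means([1.0, 2.0, 9.0, 10.0])
# sep == 5.5, var == 1.0
```

Since there are only n − 1 candidate gaps, the global minimum is found in O(n²) time as written (or O(n) with running prefix sums), avoiding iterative reassignment entirely.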

Face Localization Using Proposed Method
The superiority of the modified K-means algorithm over the standard algorithm in finding the separators now becomes clear. All pixels with values less than the threshold belong to the first class and are assigned the value 0. This keeps the pixels with values above the threshold; these belong both to the face part and to the illuminated part of the background, as shown in Figure 5.
In order to remove the illuminated background part, another threshold is applied to the remaining pixels, using the algorithm to separate the face part from the illuminated background part. New classes are obtained, from t₁ to t₂ and from t₂ to 255; the pixels with values less than the new threshold are assigned the value 0, and the nonzero pixels left represent the face part. Let [t₁, ..., t₂] be class 1, representing the illuminated background part, and let [t₂, ..., 255] be class 2, representing the face part.
The result of removing the illuminated background part is shown in Figure 6. One can see that some face pixels were assigned 0 due to the effects of the illumination. Therefore, there is a need to return to the original image in Figure 4 to fully extract the image window corresponding to the face. A filtering process is then performed to reduce the noise, and the result is shown in Figure 7.
Figure 8 shows the final result which is a window containing the face extracted from the original image.

Results
In this section, the experimental results of the proposed algorithm are analyzed and discussed, starting with a description of the datasets used in this work, followed by the results. Three popular databases were used to test the proposed face localization method. From the MIT-CBCL database (image size 115 × 115), 300 images were selected covering different conditions such as light variation, pose, and background variation. The second dataset is the BioID dataset (image size 384 × 286), from which 300 images were selected covering different conditions such as indoor/outdoor settings. The last dataset is the Caltech dataset (image size 896 × 592), from which 300 images were likewise selected covering different conditions such as indoor/outdoor settings.

BioID Dataset.
This dataset [42] consists of 1521 gray-level images with a resolution of 384 × 286 pixels. Each image shows the frontal view of a face of one of 23 different test persons. The images were taken under different lighting conditions with complex backgrounds and contain tilted and rotated faces. This dataset is considered the most difficult one for eye detection (see Figure 10).

Caltech Dataset.

This database [43] was collected by Markus Weber at the California Institute of Technology. It contains 450 face images of 27 subjects under different lighting conditions, with different facial expressions and complex backgrounds (see Figure 11).

Proposed Method Result.
The proposed method was evaluated on these datasets; the result was 100% localization with a localization time of 4 s. Figure 12 shows some examples of face localization on the MIT-CBCL dataset.
In Figure 13(a), an original image from the BioID dataset is shown, and the localization position is shown in Figure 13(b). A filtering operation is then needed to remove the unwanted parts, as shown in Figure 13(c); the area of the ROI is then determined.
Figure 14 shows the face position on an image from the Caltech dataset.

Conclusion
In this paper, we focus on improving ROI detection by suggesting a new method for face localization. In this method, the input image is reshaped into a vector and the optimized k-means algorithm is applied twice to cluster the image pixels into classes. At the end, the block window corresponding to the class of pixels containing only the face is extracted from the input image. Three sets of images taken from the MIT, BioID, and Caltech datasets, differing in illumination, background, and indoor/outdoor setting, were used to evaluate the proposed method. The results show that the proposed method achieves significant localization accuracy. Our future research includes finding the optimal centers for higher-dimensional data, which we believe is a generalization of our approach (8) to higher dimensions and to more subclusters.

Figure 2: The separation points in two dimensions.

Figure 5: The face with shade only.

MIT-CBCL Dataset.
This dataset was established by the Center for Biological and Computational Learning (CBCL).

Figure 8: The corresponding window in the original image.

Figure 10: Sample of faces from BioID dataset for localization.

Figure 11: Sample of faces from Caltech dataset for localization.

Table 1: Face localization using the modified K-means algorithm on the MIT-CBCL, BioID, and Caltech datasets; the sets contain faces with different poses and cluttered backgrounds as well as indoor/outdoor settings, with a localization time of 4 s.
Table 1 shows the proposed method results on the MIT-CBCL, Caltech, and BioID databases. In this test we focused on images of faces viewed from different angles, from 30 degrees left to 30 degrees right, with variation in the background as well as indoor and outdoor settings. Two sets of images were selected from the MIT-CBCL dataset and two other sets from the other databases to evaluate the proposed method. The first MIT-CBCL set contains images with different poses and the second contains images with different backgrounds; each set contains 150 images. The BioID and Caltech sets contain images with indoor and outdoor settings. The results show the efficiency of the proposed modified K-means algorithm in locating faces in images with varied backgrounds and poses. In addition, localization time was considered in this test: the proposed method can locate the face in a very short time because of its low complexity, due to the use of differential equations.

Figure 12: Examples of face localization using k-mean modified algorithm on MIT-CBCL dataset.

Figure 14: Example of face localization using k-mean modified algorithm on Caltech dataset: (a) the original image, (b) face position, and (c) filtered image.