A Clustering Method for Data in Cylindrical Coordinates

Copyright © 2017 Kazuhisa Fujita. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a new clustering method for data in cylindrical coordinates based on the k-means. The goal of the k-means family is to maximize an objective function, which requires a similarity measure. Thus, to obtain a clustering method for data in cylindrical coordinates, we need a new similarity. In this study, we first derive a new similarity for the new clustering method by assuming a particular probabilistic model. A data point in cylindrical coordinates has a radius, an azimuth, and a height. We assume that the azimuth is sampled from a von Mises distribution and that the radius and the height are independently generated from isotropic Gaussian distributions. We derive the new similarity from the log likelihood of the assumed probability distribution. Our experiments demonstrate that the proposed method using the new similarity can appropriately partition synthetic data defined in cylindrical coordinates. Furthermore, we apply the proposed methods to color image quantization and show that they successfully quantize a color image with respect to the hue element.


Introduction
Clustering is an important technique in many areas such as data analysis, data visualization, image processing, and pattern recognition. The most popular and useful clustering method is the k-means. The k-means uses the Euclidean distance as its dissimilarity measure and partitions data into $K$ clusters. The Euclidean distance is a reasonable measure for data sampled from an isotropic Gaussian distribution. However, not all data distributions are isotropic Gaussian distributions, so the k-means does not always yield a good clustering result.
The present study focuses on data in cylindrical coordinates. Data in cylindrical coordinates have a periodic element, so clustering methods using the Euclidean distance will lead to an improper analysis of the data. Furthermore, a clustering method using the Euclidean distance may fail to extract meaningful centroids. For example, if a distribution in cylindrical coordinates has a strongly curved crescent shape, the centroid calculated by the k-means may not lie on the data distribution. However, no clustering method optimized for data in cylindrical coordinates has been proposed.
Cylindrical data are found in many fields such as image processing, meteorology, and biology. Movements of plants and animals, and wind direction paired with another environmental measurement, are typical examples of cylindrical data [1]. The most popular example of data in cylindrical coordinates is color defined in the HSV color model. An HSV color has three attributes: hue (direction), saturation (radius), and value, which represents brightness (height). The HSV color model can represent hue information and corresponds more naturally to human vision than the RGB color model [2]. A clustering method for cylindrical coordinates would therefore be useful in many fields, especially image processing.
The purpose of this study is to develop a new clustering method for data in cylindrical coordinates based on the k-means. We first derive a new similarity for clustering data in cylindrical coordinates, assuming that the data are sampled from a probabilistic model that is the product of a von Mises distribution and Gaussian distributions. We then propose a new clustering method that uses this similarity. Using numerical experiments, we demonstrate that the proposed method can partition synthetic data. Furthermore, we evaluate the performance of the proposed method on real world data. Finally, we apply the proposed methods to color image quantization.

Related Works
The most commonly used clustering method is the k-means [3], which is one of the top 10 algorithms in data mining [4]. The k-means has been applied in various fields because it is fast, simple, and easy to understand. It uses the Euclidean distance as a clustering criterion and assumes that the data are sampled from a mixture of isotropic Gaussian distributions. Thus, the k-means is suitable for data sampled from a mixture of isotropic Gaussian distributions, but it is not appropriate for data generated from other distributions. Data in cylindrical coordinates have periodic characteristics, so the k-means is likely to be inappropriate for clustering them.
Periodic data distributed on the surface of a hypersphere can be clustered using the spherical k-means (sk-means). Dhillon and Modha [5] and Banerjee et al. [6, 7] developed the sk-means for clustering high-dimensional text data. It is a k-means-based method that uses the cosine similarity as the clustering criterion. The sk-means assumes that the data are sampled from a mixture of von Mises-Fisher distributions with the same concentration parameters and the same mixture weights. However, the sk-means cannot be applied to data that have a direction, a radius, and a height. To appropriately partition such data, we need a different nonlinear separation method.
There are many methods for achieving nonlinear separation. One method is the kernel k-means [8], which maps the data points into a higher-dimensional feature space using a nonlinear function and then partitions them in that space. Spectral clustering [9] is another popular nonlinear clustering method, which uses the eigenvectors of a similarity (kernel) matrix to partition data points. Support vector clustering [10] is inspired by the support vector machine [11]. These kernel-based nonlinear clustering methods can provide reasonable clustering results for non-Gaussian data. However, because they perform the clustering in a feature space, they can hardly provide meaningful statistics, such as cluster centroids, in the original data space. This is a problem when we also want to determine features of the data, as in color image quantization. Furthermore, the kernel function and its parameters must be selected experimentally.
Clustering methods are frequently used for color image quantization. Color image quantization reduces the number of colors in an image and plays an important role in applications such as image segmentation [12], image compression [13], and color feature extraction [14]. A color quantization technique consists of two stages: the palette design stage and the pixel mapping stage. These stages can be regarded, respectively, as calculating the centroids and assigning each data point to a cluster. Many color quantization methods have been developed, including median cut [15], the k-means [16], the fuzzy c-means [17, 18], self-organizing maps [19–21], and particle swarm optimization [22]. However, color quantization is generally performed in the RGB color space, and the HSV color space is rarely adopted.

Assumed Probabilistic Distribution.
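A data point in cylindrical coordinates, $\mathbf{x}$, is represented by $\mathbf{x} = (r, \mathbf{u}, z)$ with $\|\mathbf{u}\| = 1$ and $r \geq 0$, where $r$, $\mathbf{u}$, and $z$ are called the radius, azimuth, and height, respectively. In this study, we represent the azimuth as a unit vector $\mathbf{u} = (\cos\theta, \sin\theta)$, where $\theta$ is the azimuth angle, so that the cosine similarity can be calculated simply. Here, the radius, azimuth, and height are assumed to be mutually independent. Let a data point in cylindrical coordinates, $\mathbf{x} = (r, \mathbf{u}, z)$, be generated by a probability density function (pdf) of the form

$$p(\mathbf{x} \mid \Theta) = \mathcal{N}(r \mid \mu_r, \sigma_r^2)\, \mathrm{vM}(\mathbf{u} \mid \boldsymbol{\mu}_u, \kappa)\, \mathcal{N}(z \mid \mu_z, \sigma_z^2),$$

where $\Theta = \{\mu_r, \boldsymbol{\mu}_u, \mu_z, \kappa, \sigma_r^2, \sigma_z^2\}$, and $\mathrm{vM}(\cdot)$ and $\mathcal{N}(\cdot)$ are the pdfs of a von Mises distribution and an isotropic Gaussian distribution, respectively. The pdf of a von Mises distribution has the form

$$\mathrm{vM}(\mathbf{u} \mid \boldsymbol{\mu}_u, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\left(\kappa \boldsymbol{\mu}_u^{\top} \mathbf{u}\right),$$

where $\boldsymbol{\mu}_u$ is the mean of the azimuth with $\|\boldsymbol{\mu}_u\| = 1$, $\kappa$ is the concentration parameter, and $I_0(\cdot)$ is the modified Bessel function of the first kind of order 0. The pdf of an isotropic Gaussian distribution has the form

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

where $\mu$ is the mean and $\sigma^2$ is the variance. $\mu_r$ and $\mu_z$ are the means of the radius and height, respectively, and $\sigma_r^2$ and $\sigma_z^2$ are the variances of the radius and height, respectively. Thus, the density $p(\mathbf{x} \mid \Theta)$ can be written as

$$p(\mathbf{x} \mid \Theta) = \frac{1}{(2\pi)^2 \sigma_r \sigma_z I_0(\kappa)} \exp\left(\kappa \boldsymbol{\mu}_u^{\top} \mathbf{u} - \frac{(r-\mu_r)^2}{2\sigma_r^2} - \frac{(z-\mu_z)^2}{2\sigma_z^2}\right). \tag{4}$$

We can estimate the parameters of $p(\mathbf{x} \mid \Theta)$ using maximum likelihood estimation. Let the data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be generated from $p(\mathbf{x} \mid \Theta)$. The log likelihood function of $\Theta$ is

$$\ln p(X \mid \Theta) = \sum_{n=1}^{N} \left( \kappa \boldsymbol{\mu}_u^{\top} \mathbf{u}_n - \frac{(r_n-\mu_r)^2}{2\sigma_r^2} - \frac{(z_n-\mu_z)^2}{2\sigma_z^2} \right) - N \ln\left( (2\pi)^2 \sigma_r \sigma_z I_0(\kappa) \right).$$

Maximizing this equation subject to $\boldsymbol{\mu}_u^{\top} \boldsymbol{\mu}_u = 1$, we obtain the maximum likelihood estimates $\hat{\mu}_r$, $\hat{\boldsymbol{\mu}}_u$, $\hat{\mu}_z$, $A(\hat{\kappa})$, $\hat{\sigma}_r$, and $\hat{\sigma}_z$:

$$\begin{aligned}
\hat{\mu}_r &= \frac{1}{N}\sum_{n=1}^{N} r_n, \quad
\hat{\boldsymbol{\mu}}_u = \frac{\sum_{n=1}^{N} \mathbf{u}_n}{\left\| \sum_{n=1}^{N} \mathbf{u}_n \right\|}, \quad
\hat{\mu}_z = \frac{1}{N}\sum_{n=1}^{N} z_n, \\
A(\hat{\kappa}) &= \frac{1}{N}\left\| \sum_{n=1}^{N} \mathbf{u}_n \right\|, \quad
\hat{\sigma}_r^2 = \frac{1}{N}\sum_{n=1}^{N} (r_n - \hat{\mu}_r)^2, \quad
\hat{\sigma}_z^2 = \frac{1}{N}\sum_{n=1}^{N} (z_n - \hat{\mu}_z)^2,
\end{aligned} \tag{6}$$

where $A(\kappa)$ is

$$A(\kappa) = \frac{I_1(\kappa)}{I_0(\kappa)}.$$

It is difficult to estimate the concentration parameter $\kappa$, because maximum likelihood estimation yields no analytic solution for $\kappa$ and we can only calculate the ratio of the Bessel functions. We approximate $\kappa$ using the numerical method proposed by Sra [23], because it produces the most accurate estimates of $\kappa$ compared with other methods.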
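The estimates in (6) are plain sample statistics, so they are easy to compute. The following NumPy sketch computes them for a single cluster. It assumes the azimuth is supplied as an angle $\theta$ in radians and converted to $\mathbf{u} = (\cos\theta, \sin\theta)$; the function name `ml_estimates` is hypothetical, not taken from the paper.

```python
import numpy as np


def ml_estimates(r, theta, z):
    """Maximum likelihood estimates (6) for one cylindrical cluster.

    r, theta, z: 1-D arrays of radius, azimuth angle (radians), and height.
    Returns mu_r, mu_u (unit vector), mu_z, R_bar = A(kappa_hat), sigma_r, sigma_z.
    """
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.column_stack([np.cos(theta), np.sin(theta)])  # azimuths as unit vectors
    n = len(r)
    mu_r = r.mean()
    mu_z = z.mean()
    s = u.sum(axis=0)                  # resultant vector of the azimuths
    norm_s = np.linalg.norm(s)
    mu_u = s / norm_s                  # mean direction (unit vector)
    r_bar = norm_s / n                 # equals A(kappa_hat) = I1(kappa)/I0(kappa)
    sigma_r = np.sqrt(np.mean((r - mu_r) ** 2))  # ML (1/N) standard deviations
    sigma_z = np.sqrt(np.mean((z - mu_z) ** 2))
    return mu_r, mu_u, mu_z, r_bar, sigma_r, sigma_z
```

Note that $\kappa$ itself is not returned: only $A(\hat{\kappa})$ is available in closed form, and recovering $\kappa$ from it requires the numerical step discussed next.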
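We estimate $\kappa$ using the recursive function

$$\kappa_{t} = \kappa_{t-1} - \frac{A(\kappa_{t-1}) - \bar{R}}{1 - A(\kappa_{t-1})^2 - A(\kappa_{t-1})/\kappa_{t-1}},$$

where $t$ is the iteration number and $\bar{R} = \left\| \sum_{n=1}^{N} \mathbf{u}_n \right\| / N$. The recursive calculation terminates when $|\kappa_t - \kappa_{t-1}| < \epsilon_\kappa$; in this study, $\epsilon_\kappa = 0.001$. We calculate $\kappa_0$ using the method proposed by Banerjee et al. [6].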
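A minimal sketch of this estimation, assuming the circle case ($p = 2$) of Sra's Newton update and Banerjee et al.'s initial approximation $\kappa_0 = \bar{R}(2 - \bar{R}^2)/(1 - \bar{R}^2)$. It uses SciPy's exponentially scaled Bessel functions `scipy.special.i0e` and `scipy.special.i1e`, whose scaling cancels in the ratio, so $A(\kappa)$ does not overflow for large $\kappa$. The function names are hypothetical.

```python
from scipy.special import i0e, i1e


def bessel_ratio(kappa):
    """A(kappa) = I1(kappa) / I0(kappa); the exp(-kappa) scaling cancels."""
    return i1e(kappa) / i0e(kappa)


def estimate_kappa(r_bar, eps=1e-3, max_iter=100):
    """Solve A(kappa) = r_bar for kappa (von Mises on the circle).

    r_bar is the mean resultant length ||sum u_n|| / N, assumed in [0, 1).
    """
    if r_bar <= 0.0:
        return 0.0  # directions cancel out: no concentration
    # Initial value: Banerjee et al. approximation with dimension p = 2
    kappa = r_bar * (2.0 - r_bar**2) / (1.0 - r_bar**2)
    for _ in range(max_iter):
        a = bessel_ratio(kappa)
        # Newton step using A'(kappa) = 1 - A(kappa)^2 - A(kappa) / kappa
        new_kappa = kappa - (a - r_bar) / (1.0 - a**2 - a / kappa)
        if abs(new_kappa - kappa) < eps:  # |kappa_t - kappa_{t-1}| < eps
            return new_kappa
        kappa = new_kappa
    return kappa
```

For example, `estimate_kappa(bessel_ratio(10.0))` recovers a value close to 10. The Banerjee initial value is already close to the root, so only a few Newton steps are usually needed.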
Cylindrical k-Means.
The k-means family uses a particular similarity to decide whether a data point belongs to a cluster. The Euclidean distance (a dissimilarity) is the measure most frequently used by the k-means family, and it can be derived from the log likelihood of an isotropic Gaussian distribution. Therefore, the k-means using the Euclidean distance can appropriately partition data sampled from isotropic Gaussian distributions but not data from other distributions. Because the k-means family clusters data by maximizing the sum of similarities between the centroid of each cluster and the data points that belong to it, we must develop a new similarity for data in cylindrical coordinates. In this study, we obtain the optimal similarity for partitioning data in cylindrical coordinates from an assumed pdf.
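First, to develop a k-means-based method for data in cylindrical coordinates (cylindrical k-means; cyk-means), we obtain a new similarity measure by assuming a probability distribution. Assume that a data point $\mathbf{x}$ in a cluster with centroid $\mathbf{m} = (\mu_r, \boldsymbol{\mu}_u, \mu_z)$ is sampled from the probability distribution $p(\mathbf{x} \mid \Theta)$ given by (4), where $\Theta = (\mathbf{m}, \kappa, \sigma_r, \sigma_z)$. The natural logarithm of $p(\mathbf{x} \mid \Theta)$ is

$$\ln p(\mathbf{x} \mid \Theta) = \kappa \boldsymbol{\mu}_u^{\top} \mathbf{u} - \frac{(r-\mu_r)^2}{2\sigma_r^2} - \frac{(z-\mu_z)^2}{2\sigma_z^2} - \ln C,$$

where $C$ is a normalizing constant given by

$$C = (2\pi)^2 \sigma_r \sigma_z I_0(\kappa).$$

Here, we ignore the normalizing term $\ln C$ to obtain

$$S(\mathbf{x}, \mathbf{m}) = \kappa \boldsymbol{\mu}_u^{\top} \mathbf{u} - \frac{(r-\mu_r)^2}{2\sigma_r^2} - \frac{(z-\mu_z)^2}{2\sigma_z^2}. \tag{12}$$

In this study, this equation is used as the similarity for the cyk-means; $S(\mathbf{x}, \mathbf{m})$ denotes the similarity between the data point $\mathbf{x}$ and the centroid $\mathbf{m}$. The terms in (12) consist of the cosine similarity and the Euclidean similarities (negative squared Euclidean distances), and the new similarity is a weighted sum of these terms. The weights indicate the concentrations of the distributions. This similarity can also be considered a simplified log likelihood.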
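Equation (12) can be evaluated directly. The NumPy sketch below assumes the azimuth is given as an angle $\theta$ in radians and works element-wise on arrays of points; `cyl_similarity` is a hypothetical helper name.

```python
import numpy as np


def cyl_similarity(r, theta, z, mu_r, mu_u, mu_z, kappa, sigma_r, sigma_z):
    """S(x, m) from (12): weighted cosine term minus two squared-distance terms."""
    theta = np.asarray(theta, dtype=float)
    u = np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # azimuth as unit vector(s)
    cosine = u @ np.asarray(mu_u, dtype=float)              # mu_u^T u
    radial = (np.asarray(r, dtype=float) - mu_r) ** 2 / (2.0 * sigma_r**2)
    height = (np.asarray(z, dtype=float) - mu_z) ** 2 / (2.0 * sigma_z**2)
    return kappa * cosine - radial - height
```

A point that coincides with the centroid scores exactly $\kappa$, and a point with the opposite azimuth scores $-\kappa$. Away from the centroid, the radius and height terms are always non-positive penalties.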
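The cyk-means partitions data points in cylindrical coordinates into $K$ clusters using the same procedure as the k-means. Let $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be a set of data points in cylindrical coordinates, and let $\mathbf{m}_k = (\mu_{rk}, \boldsymbol{\mu}_{uk}, \mu_{zk})$ be the centroid of the $k$th cluster. Using the similarity $S(\mathbf{x}_n, \mathbf{m}_k)$, the objective function $J$ is

$$J = \sum_{k=1}^{K} \sum_{n=1}^{N} \gamma_{nk}\, S(\mathbf{x}_n, \mathbf{m}_k),$$

where $\gamma_{nk}$ is a binary indicator value: if the $n$th data point belongs to the $k$th cluster, $\gamma_{nk} = 1$; otherwise, $\gamma_{nk} = 0$. The aim of the cyk-means is to maximize the objective function $J$.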
The procedure for maximizing the objective function is the same as that of the k-means and is described as follows.
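(1) Fix $K$ and initialize $\mathbf{m}_k$ and the other cluster parameters.
(2) Assign each data point to the cluster with the largest similarity: $\gamma_{nk} = 1$ if $k = \arg\max_j S(\mathbf{x}_n, \mathbf{m}_j)$, and $\gamma_{nk} = 0$ otherwise.
(3) Estimate the parameters of each cluster from the data points assigned to it.
(4) Repeat steps (2) and (3) until $|J_t - J_{t-1}| < \epsilon$.

In this study, we use $\epsilon = |0.001 \times J_t|$, where $J_t$ is the value of the objective function at the $t$th iteration. Algorithm 1 shows the details of the cyk-means algorithm. From (6), the elements of the centroid vector $\mathbf{m}_k = (\mu_{rk}, \boldsymbol{\mu}_{uk}, \mu_{zk})$ of the $k$th cluster are

$$\mu_{rk} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk} r_n, \quad \boldsymbol{\mu}_{uk} = \frac{\sum_{n=1}^{N} \gamma_{nk} \mathbf{u}_n}{\left\|\sum_{n=1}^{N} \gamma_{nk} \mathbf{u}_n\right\|}, \quad \mu_{zk} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk} z_n,$$

where $N_k = \sum_{n=1}^{N} \gamma_{nk}$ is the number of data points in the $k$th cluster. The other values used to calculate the objective function are

$$\sigma_{rk}^2 = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk} (r_n - \mu_{rk})^2, \quad \sigma_{zk}^2 = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk} (z_n - \mu_{zk})^2, \quad A(\kappa_k) = \frac{1}{N_k}\left\|\sum_{n=1}^{N} \gamma_{nk} \mathbf{u}_n\right\|,$$

and $\kappa_k$ is approximated by Sra's method from the ratio of the Bessel functions $A(\kappa_k)$.

The cyk-means has many parameters. The k-means for data in three-dimensional Cartesian coordinates has only $3K$ parameters, the product of the number of centroid vectors and the number of dimensions. In contrast, the cyk-means has $7K$ parameters, the product of the number of clusters and the number of parameters per cluster. The parameters of the $k$th cluster are $\mu_{rk}$, $\boldsymbol{\mu}_{uk}$ (two dimensions), $\mu_{zk}$, $\kappa_k$, $\sigma_{rk}$, and $\sigma_{zk}$. Because the cyk-means has more degrees of freedom, the dead unit problem (i.e., empty clusters) frequently occurs if the initial parameters are poorly chosen.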
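The cyk-means iteration (assign every point to the cluster with the largest similarity (12), then re-estimate each cluster's parameters with (6) and Sra's method) can be sketched in NumPy/SciPy as follows. This is not the author's Algorithm 1. Several choices are our own assumptions:

- initializing centroids at randomly chosen data points;
- starting $\kappa$ at 1 and the $\sigma$s at the global standard deviations;
- skipping the update of clusters with fewer than two members;
- flooring the $\sigma$s at `1e-6` and clipping $\bar{R}$ inside $(0, 1)$.

The stopping rule uses $\epsilon = |0.001 \times J_t|$ as in the text.

```python
import numpy as np
from scipy.special import i0e, i1e


def _estimate_kappa(r_bar, eps=1e-3, max_iter=100):
    """Banerjee et al. initial value (p = 2) refined by Sra-style Newton steps."""
    r_bar = min(max(r_bar, 1e-6), 1.0 - 1e-6)  # keep R_bar strictly inside (0, 1)
    kappa = r_bar * (2.0 - r_bar**2) / (1.0 - r_bar**2)
    for _ in range(max_iter):
        a = i1e(kappa) / i0e(kappa)                     # A(kappa) = I1 / I0
        step = (a - r_bar) / (1.0 - a**2 - a / kappa)   # f(kappa) / f'(kappa)
        if not np.isfinite(step):
            break
        kappa = max(kappa - step, 1e-6)
        if abs(step) < eps:
            break
    return kappa


def cyk_means(r, theta, z, K, max_iter=100, rel_tol=1e-3, seed=0):
    """Hard-assignment cylindrical k-means (sketch). Returns labels and params."""
    rng = np.random.default_rng(seed)
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.column_stack([np.cos(theta), np.sin(theta)])
    N = len(r)
    # (1) Initialize centroids at K distinct random data points (assumption).
    idx = rng.choice(N, size=K, replace=False)
    mu_r, mu_u, mu_z = r[idx].copy(), u[idx].copy(), z[idx].copy()
    kappa = np.ones(K)
    sig_r = np.full(K, max(r.std(), 1e-6))
    sig_z = np.full(K, max(z.std(), 1e-6))
    J_prev = -np.inf
    for _ in range(max_iter):
        # (2) Similarity (12) between every point and every cluster: shape (N, K).
        S = (kappa * (u @ mu_u.T)
             - (r[:, None] - mu_r) ** 2 / (2.0 * sig_r**2)
             - (z[:, None] - mu_z) ** 2 / (2.0 * sig_z**2))
        labels = S.argmax(axis=1)
        J = S[np.arange(N), labels].sum()   # objective J for this assignment
        # (3) Re-estimate the parameters of each cluster from its members.
        for k in range(K):
            members = labels == k
            n_k = members.sum()
            if n_k < 2:   # dead or near-dead unit: keep previous parameters
                continue
            s = u[members].sum(axis=0)
            s_norm = max(np.linalg.norm(s), 1e-12)
            mu_u[k] = s / s_norm
            mu_r[k] = r[members].mean()
            mu_z[k] = z[members].mean()
            sig_r[k] = max(r[members].std(), 1e-6)
            sig_z[k] = max(z[members].std(), 1e-6)
            kappa[k] = _estimate_kappa(s_norm / n_k)
        # (4) Stop when |J_t - J_{t-1}| < |0.001 * J_t|.
        if abs(J - J_prev) < rel_tol * abs(J):
            break
        J_prev = J
    params = dict(mu_r=mu_r, mu_u=mu_u, mu_z=mu_z,
                  kappa=kappa, sigma_r=sig_r, sigma_z=sig_z)
    return labels, params
```

Because $\kappa_k$ and the $\sigma$s change between iterations, $J$ is not guaranteed to increase monotonically, so the loop is also capped at `max_iter`. Keeping the old parameters of a near-empty cluster is one simple way to avoid division by zero. It does not prevent the dead unit problem described above.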

Fixed cyk-Means.
Model-based clustering methods have various problems, such as dead units and sensitivity to initial values. One reason is that the log likelihood function can have many local optima [9]. These problems tend to occur more frequently when a model has more parameters. In the fixed cyk-means, the concentration parameter $\kappa$ and the variances $\sigma_r^2$ and $\sigma_z^2$ are fixed at particular values. As a consequence, the fixed cyk-means has only $4K$ parameters ($\mu_{rk}$, the two components of $\boldsymbol{\mu}_{uk}$, and $\mu_{zk}$ for each cluster). Fixing these parameters decreases the complexity of the model and makes these problems less frequent. Algorithm 2 shows the fixed cyk-means algorithm.
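Fixing $\kappa$ and the $\sigma$s removes the concentration and variance updates, leaving a k-means-like loop over the centroids only. The sketch below is an illustration under assumptions, not the author's Algorithm 2. Initial centroids are random data points, the default parameter values are taken from the experiments in this paper, and the function name is hypothetical.

```python
import numpy as np


def fixed_cyk_means(r, theta, z, K, kappa=15.0, sigma_r=0.1, sigma_z=0.1,
                    max_iter=100, rel_tol=1e-3, seed=0):
    """Fixed cyk-means sketch: kappa, sigma_r, sigma_z are shared constants."""
    rng = np.random.default_rng(seed)
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.column_stack([np.cos(theta), np.sin(theta)])
    N = len(r)
    idx = rng.choice(N, size=K, replace=False)   # random data points as centroids
    mu_r, mu_u, mu_z = r[idx].copy(), u[idx].copy(), z[idx].copy()
    J_prev = -np.inf
    for _ in range(max_iter):
        # Similarity (12) with fixed weights: only the centroids change.
        S = (kappa * (u @ mu_u.T)
             - (r[:, None] - mu_r) ** 2 / (2.0 * sigma_r**2)
             - (z[:, None] - mu_z) ** 2 / (2.0 * sigma_z**2))
        labels = S.argmax(axis=1)
        J = S[np.arange(N), labels].sum()
        for k in range(K):
            members = labels == k
            if not members.any():      # empty cluster (dead unit): leave it as is
                continue
            s = u[members].sum(axis=0)
            mu_u[k] = s / max(np.linalg.norm(s), 1e-12)   # mean direction
            mu_r[k] = r[members].mean()
            mu_z[k] = z[members].mean()
        if abs(J - J_prev) < rel_tol * abs(J):
            break
        J_prev = J
    return labels, mu_r, mu_u, mu_z
```

With the weights fixed, the assignment step and the centroid update each maximize $J$ with respect to their own variables. $J$ therefore cannot decrease from one iteration to the next, as in the standard k-means.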

Computational Complexity.
Assigning data points to clusters has a complexity of $O(NK)$ per iteration. We must estimate six kinds of parameters. We obtain the three means ($\mu_r$, $\boldsymbol{\mu}_u$, and $\mu_z$), the two standard deviations, and $\kappa$ in $O(6NK + K t_{\max})$ time per iteration, where $t_{\max}$ is the number of iterations needed for $\kappa$ to converge. Therefore, the total computational complexity per iteration is $O(7NK + K t_{\max})$. The complexity of the fixed cyk-means is $O(5NK)$ per iteration, so the cyk-means is approximately 1.5 times as complex as the fixed cyk-means.

Experimental Results
In our experiments, we use Python and its libraries (NumPy, SciPy, and scikit-learn) to implement the proposed method.

Synthetic Data.
In this subsection, we demonstrate that the cyk-means and the fixed cyk-means can partition synthetic data defined in cylindrical coordinates. The dataset used in this experiment has three clusters, as shown in Figure 1(a). The data points in each cluster are generated from the probability distribution given by (4), with the parameters shown in Table 1. Figures 1(b), 1(c), and 1(d) show the clustering results of the cyk-means, the fixed cyk-means with $\kappa = 25$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$, and the k-means, respectively. The cyk-means and the fixed cyk-means properly partition the dataset into its clusters. In contrast, the k-means regards the two upper right clusters as one cluster and fails to partition the dataset correctly. Table 2 shows the parameters estimated by the cyk-means, the fixed cyk-means, and the k-means. Only the cyk-means estimates the concentration parameters and the variances, and its estimates are close to the true values. The cyk-means most accurately estimates the number of data points in each cluster. The fixed cyk-means gives the most accurate estimates of all the means, and the cyk-means also estimates all the means closely. These results show that both the cyk-means and the fixed cyk-means estimate all the means with sufficient accuracy.
In the next experiment, we examine the effectiveness of the proposed methods (the cyk-means and the fixed cyk-means with $\kappa = 15$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$) compared with the k-means and the kernel k-means with a radial basis function.
The parameter of the radial basis function is 0.1. The synthetic data have $K$ clusters and are defined in cylindrical coordinates. Each cluster contains 200 data points. The mean azimuth $\theta_k$ of the $k$th cluster is a random number in $[0, 2\pi)$. The concentration parameter $\kappa_k$ is a random number in $[5, 30)$. The mean radius $\mu_{rk}$ of the $k$th cluster is a random number in $[1, 4)$, and the mean height $\mu_{zk}$ is a random number in $[-1.5, 1.5)$. The standard deviations $\sigma_{rk}$ and $\sigma_{zk}$ are random numbers in $[0.05, 0.2)$.
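This sampling protocol can be approximated with NumPy's `Generator.uniform`, `Generator.vonmises`, and `Generator.normal`. It is a sketch only: the paper does not state which random number generator it used or how the von Mises samples were drawn, and `make_cylindrical_clusters` is a hypothetical name. The returned ground-truth labels can be compared with a clustering result using the ARI.

```python
import numpy as np


def make_cylindrical_clusters(K, n_per_cluster=200, seed=0):
    """Sample K clusters from the model (4) with randomly drawn parameters."""
    rng = np.random.default_rng(seed)
    r_all, th_all, z_all, y_all = [], [], [], []
    for k in range(K):
        theta_k = rng.uniform(0.0, 2.0 * np.pi)   # mean azimuth in [0, 2*pi)
        kappa_k = rng.uniform(5.0, 30.0)          # concentration in [5, 30)
        mu_r = rng.uniform(1.0, 4.0)              # mean radius in [1, 4)
        mu_z = rng.uniform(-1.5, 1.5)             # mean height in [-1.5, 1.5)
        sd_r, sd_z = rng.uniform(0.05, 0.2, size=2)  # standard deviations
        # vonmises returns angles in [-pi, pi]; wrap them into [0, 2*pi)
        th_all.append(rng.vonmises(theta_k, kappa_k, n_per_cluster) % (2.0 * np.pi))
        r_all.append(rng.normal(mu_r, sd_r, n_per_cluster))
        z_all.append(rng.normal(mu_z, sd_z, n_per_cluster))
        y_all.append(np.full(n_per_cluster, k))
    return (np.concatenate(r_all), np.concatenate(th_all),
            np.concatenate(z_all), np.concatenate(y_all))
```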
Figure 2 shows the relationship between the number of clusters and the adjusted Rand index (ARI), which evaluates the performance of clustering algorithms [24]. ARI = 1 means that the clustering result agrees perfectly with the true clusters. The cyk-means has the largest ARI in almost all cases. The fixed cyk-means performs better than the kernel k-means and the k-means, and the k-means performs the worst. In conclusion, the cyk-means most accurately partitions synthetic data defined in cylindrical coordinates, and the fixed cyk-means also performs well.
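In scikit-learn, the ARI is available as `sklearn.metrics.adjusted_rand_score`. To show what the score measures, the dependency-free sketch below computes the same quantity from the contingency table of true and predicted labels, following Hubert and Arabie's formula. The function name is hypothetical.

```python
from math import comb

import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two labelings."""
    _, a = np.unique(labels_true, return_inverse=True)
    _, b = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)                   # count co-occurring labels
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a), 2)  # index expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:   # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The ARI does not depend on how clusters are numbered. For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, and `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` is 4/7.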

Real World Data.
We evaluate the performance of the proposed methods on the iris dataset (http://mlearn.ics.uci.edu/databases/iris/) and the segmentation benchmark dataset (http://www.ntu.edu.sg/home/asjfcai/BenchmarkWebsite/ benchmark index.html) [25]. The iris dataset has 150 data points from three classes of irises. Each data point consists of four attributes: sepal length, sepal width, petal length, and petal width, all in cm. The segmentation benchmark dataset consists of 100 images from the Berkeley segmentation database [26] and ground truths generated by manual labeling.
Table 3 shows the ARI scores of the cyk-means, the fixed cyk-means, the k-means, and the kernel k-means for the iris dataset. The parameters of the fixed cyk-means are $\kappa = 15$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$, and the parameter of the radial basis function of the kernel k-means is 0.01. In this experiment, we use only three attributes of the iris dataset because the proposed methods are specialized for three-dimensional data. Furthermore, we shift the three-attribute dataset to zero mean. In all cases, the cyk-means performs worse than the other methods. Conversely, the fixed cyk-means performs best in almost all cases. However, the differences in performance among the fixed cyk-means, the k-means, and the kernel k-means are small.
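The text does not say how the three zero-mean iris attributes are mapped to cylindrical coordinates. One plausible mapping, which is purely our assumption, treats the first two selected attributes as Cartesian $x$ and $y$ and the third as the height $z$. The sketch below applies the centering described above and then this conversion; the function name is hypothetical.

```python
import numpy as np


def to_cylindrical(X):
    """Center a 3-column array, then map columns (x, y, z) to (r, theta, z).

    ASSUMPTION: columns 0 and 1 span the plane and column 2 is the height;
    the paper does not state the mapping it used.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                                # zero-mean dataset
    r = np.hypot(Xc[:, 0], Xc[:, 1])                       # radius
    theta = np.arctan2(Xc[:, 1], Xc[:, 0]) % (2 * np.pi)   # azimuth in [0, 2*pi)
    return r, theta, Xc[:, 2]
```

Centering matters for this mapping: the azimuth is measured around the origin, so without centering all iris points would fall in a narrow angular range of the first quadrant.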
Table 4 shows the ARI scores of the cyk-means, the fixed cyk-means, and the k-means for seven images in the segmentation benchmark dataset. The parameters of the fixed cyk-means are $\kappa = 15$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$. For the cyk-means and the fixed cyk-means, we convert the images from RGB color to HSV color. For the k-means, we use images represented in both RGB color and HSV color. We compare each clustering result with its ground truth using the ARI score, and we set the number of clusters $K$ to the number of segments in the ground truth. In all cases, the fixed cyk-means shows stably good performance. The cyk-means performs either much better or much worse than the other methods; that is, its performance is unstable. This instability is likely caused by the cyk-means becoming trapped in local optima more easily because it has more parameters.
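Following the HSV interpretation given in the Introduction (saturation as radius, hue as direction, value as height), each pixel can be mapped to cylindrical coordinates as follows. The sketch uses the Python standard library's `colorsys.rgb_to_hsv`, which expects RGB values in [0, 1] and returns hue in [0, 1). Scaling hue by $2\pi$ to obtain an angle is our assumption about the conversion, and the function name is hypothetical.

```python
import colorsys

import numpy as np


def rgb_image_to_cylindrical(img):
    """img: (H, W, 3) RGB array in [0, 1] -> per-pixel (r, theta, z).

    r = saturation, theta = 2*pi*hue (radians), z = value (brightness).
    """
    pixels = np.asarray(img, dtype=float).reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    return hsv[:, 1], 2.0 * np.pi * hsv[:, 0], hsv[:, 2]
```

The per-pixel loop is simple but slow for 481 × 321 images. A vectorized conversion would be preferable in practice.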

Application to Color Image Quantization.
We apply the cyk-means and the fixed cyk-means to color image quantization and compare the results with those of the k-means. For the proposed methods, we convert images from the RGB color space to the HSV color space before quantization, whereas the images processed by the k-means are represented in RGB. Figure 3 contains four test images from the Berkeley segmentation database [26] and their quantization results. The original color images have sizes of 481 × 321 or 321 × 481 and are quantized into three colors. The quantization results are generated by the cyk-means, the fixed cyk-means with $\kappa = 25$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$, and the k-means. The color of a pixel in a quantized image is the value of the centroid of the cluster that contains the pixel. In image 118035 in Figure 3, the colors of the background, the wall, and the roof are clearly different from each other. The cyk-means and the fixed cyk-means successfully segment this image, whereas the k-means separates the shaded part of the wall and cannot render the wall as one color. Furthermore, the quantization results of the cyk-means and the fixed cyk-means are very similar.
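Because each quantized pixel takes the value of its cluster centroid, the palette can be obtained by converting each centroid $(\mu_r, \boldsymbol{\mu}_u, \mu_z)$ back to RGB. A minimal sketch follows. It assumes that saturation is the radius, that hue is the azimuth angle divided by $2\pi$, and that value is the height. It uses `colorsys.hsv_to_rgb` from the Python standard library. Clipping saturation and value to [0, 1] is our safeguard, and both function names are hypothetical.

```python
import colorsys

import numpy as np


def centroid_to_rgb(mu_r, mu_u, mu_z):
    """Convert a cylindrical centroid (saturation, hue direction, value) to RGB."""
    hue = (np.arctan2(mu_u[1], mu_u[0]) / (2.0 * np.pi)) % 1.0  # angle -> [0, 1)
    s = float(np.clip(mu_r, 0.0, 1.0))
    v = float(np.clip(mu_z, 0.0, 1.0))
    return colorsys.hsv_to_rgb(float(hue), s, v)


def quantize(labels, centroids, shape):
    """Pixel mapping stage: paint each pixel with its cluster's centroid color."""
    palette = np.array([centroid_to_rgb(*c) for c in centroids])
    return palette[np.asarray(labels)].reshape(shape)
```

Here `centroids` is a list of `(mu_r, mu_u, mu_z)` tuples and `shape` is the original `(H, W, 3)` image shape. The palette design stage corresponds to `centroid_to_rgb`, and the pixel mapping stage corresponds to the indexing in `quantize`.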
Image 26098 shows red and green peppers on a display table. The cyk-means merges the red peppers and the planks of the display table into one color and divides the dark area into two colors. The fixed cyk-means successfully extracts the red peppers. The k-means assigns red to the planks and to part of the green peppers.
Image 299091 shows sky with clouds, an ocher pyramid, and ocher ground. The cyk-means groups the ocher pyramid and the white clouds into the same color, whereas the fixed cyk-means correctly segments the pyramid and the sky. The k-means is unsuccessful; it divides the pyramid into three regions (an ocher region, a highlight region, and a shaded region).
The cyk-means does not perform well for image 295087. It segments the image into only two colors even though we set the number of clusters to three; that is, it produces a dead unit. This occurs because, when the data points visually appear to consist of only a few clusters, the estimated concentration parameters become small and the variances become large. A few clusters then absorb all the data points, and dead units (empty clusters) appear even if the number of clusters is set to a large value. In contrast, the fixed cyk-means (which has fixed concentration and variance values) appropriately partitions the ground and the blue and deep blue regions of the sky. The k-means extracts shaded regions from the ground; that is, it cannot group the ground into one region.
Furthermore, the fixed parameters of the fixed cyk-means, $\kappa$ and the $\sigma$s, can control the quantization results. Figure 4 shows the quantization results generated by the fixed cyk-means using different parameters. The original image in Figure 4 contains two objects: a red fish and the arms of an anemone. The fixed cyk-means with $\kappa = 25$, $\sigma_r = 0.1$, and $\sigma_z = 0.1$ cannot extract the red fish, as shown in the middle image of Figure 4. However, the fixed cyk-means with $\kappa = 50$, $\sigma_r = 0.5$, and $\sigma_z = 0.5$ extracts the red fish, as shown in the right image of Figure 4. This is because a large $\kappa$ and/or large variances relatively increase the weight of the cosine similarity term in (12), so the clustering focuses more on the hue element.
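The effect of the fixed parameters can be made concrete by comparing the coefficients in (12). There, $\kappa$ multiplies the cosine term, and $1/(2\sigma^2)$ multiplies each squared-distance term. The small arithmetic check below covers the two settings used in Figure 4; the helper name is hypothetical.

```python
def term_weights(kappa, sigma_r, sigma_z):
    """Coefficients of the cosine, radius, and height terms in (12)."""
    return kappa, 1.0 / (2.0 * sigma_r**2), 1.0 / (2.0 * sigma_z**2)


for kappa, sigma in [(25.0, 0.1), (50.0, 0.5)]:
    w_cos, w_r, w_z = term_weights(kappa, sigma, sigma)
    # ratio > 1: a unit change in mu_u^T u outweighs a unit squared distance
    print(f"kappa={kappa}, sigma={sigma}: cosine/radius weight ratio = {w_cos / w_r:.1f}")
```

The ratio rises from 0.5 for the first setting to 25.0 for the second, so hue differences dominate the similarity in the second case. This is consistent with the second setting separating the red fish by hue.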
In conclusion, the fixed cyk-means is more suitable for color image quantization than the cyk-means. The fixed cyk-means quantizes color images with respect to the hue. Its quantization results differ from those generated by the k-means because the Euclidean metric cannot account for the hue.

Conclusion and Discussion
In this study, we develop the cyk-means and the fixed cyk-means, which are new clustering methods for data in cylindrical coordinates. Because the Euclidean distance cannot properly represent dissimilarities between data points along periodic axes, we derive a new similarity for the cyk-means from a probability distribution that is the product of a von Mises distribution and two Gaussian distributions (see (4)). Our experiments demonstrate that the cyk-means and the fixed cyk-means can properly partition synthetic data in cylindrical coordinates. Furthermore, the experimental results on real world data show that the fixed cyk-means performs as well as or better than the k-means and the kernel k-means. In the final experiment, the proposed methods are applied to color image quantization and successfully quantize color images with respect to the hue element.
The experiments on synthetic data demonstrate the effectiveness of the cyk-means. In the first experiment, the cyk-means produces good parameter estimates and clustering results. The results of the second experiment show that the cyk-means performs best on synthetic data. However, in the experiments on real world data, the cyk-means does not provide good clustering results. Furthermore, the color image quantization results suggest that the flexibility of the cyk-means often produces dead units or small clusters containing few data points. Thus, the cyk-means may not be appropriate for practical applications.
The fixed cyk-means is expected to be effective in practical applications. It is stable and performs well in clustering synthetic data and real world data, as well as in color image quantization. Furthermore, the fixed cyk-means rarely produces dead units because it has fewer parameters than the cyk-means. The fixed cyk-means also requires less computational time than the cyk-means while producing similar results.
In future work, we will improve the performance of the proposed methods. Like the k-means, the proposed methods suffer from the ill-initialization problem and/or the dead unit problem caused by incorrect initialization. The k-means++ method proposed by Arthur and Vassilvitskii [27] addresses the ill-initialization problem of the k-means and improves clustering performance by choosing an initial set of cluster centers that is close to the optimal solution. The conscience mechanism improves the performance of competitive learning and clustering algorithms [28–30]. It inserts a bias into the competition process so that each unit can win the competition with equal probability. Xu et al. [31] proposed rival penalized competitive learning [2, 32], an algorithm based on competitive learning that determines the appropriate number of clusters and solves the dead unit problem. The strategy of rival penalized competitive learning is to adapt the weights of the winning unit to the input and to unlearn the weights of the second winner. By incorporating these approaches into the proposed methods, we aim to improve their performance and reduce the effect of these intrinsic problems.

Figure 1: (a) Scatter plot of the synthetic dataset including three clusters. (b), (c), (d) Clustering results of the cyk-means, the fixed cyk-means, and the k-means, respectively. The three clusters are shown with circles, triangles, and crosses.

Figure 2: Relationship between the number of clusters $K$ and the adjusted Rand index (ARI). The vertical and horizontal axes indicate the ARI and the number of clusters $K$, respectively. The results are the means of 200 runs on randomly generated synthetic data for each $K$.

Figure 3: Quantization results. The first column contains the original images. The second, third, and fourth columns contain the quantization results generated by the cyk-means, the fixed cyk-means, and the k-means, respectively. All original images are clustered with $K = 3$.

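Figure 4: Quantization results of the fixed cyk-means with different parameters. The left image is the original. The middle and right images are quantized with $K = 3$ (middle: $\kappa = 25$, $\sigma_r = \sigma_z = 0.1$; right: $\kappa = 50$, $\sigma_r = \sigma_z = 0.5$).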

Table 1: Parameters of the dataset. $k$ is the cluster number, and $\theta_k$ is the azimuth of the centroid of the $k$th cluster, that is, the angle of the unit vector $\boldsymbol{\mu}_{uk}$.

Table 2: Parameters estimated by the cyk-means, the fixed cyk-means, and the k-means. The results are the means of twenty runs with randomly generated initial values. $k$ is the cluster number, and $\theta_k$ is the azimuth of the centroid of the $k$th cluster, that is, the angle of the unit vector $\boldsymbol{\mu}_{uk}$. The best estimates are in bold.

Table 3: Comparison of the performance (ARI) of the four methods on the iris dataset. The results are the means of 200 runs with random initial values. The best results are in bold. "Attributes" indicates the three attributes used in clustering.

Table 4: Comparison of the performance (ARI) of the four methods on the segmentation benchmark dataset. The results are the means of 200 runs with random initial values. The best results are in bold. "Number" indicates the image number. "k-means (HSV)" and "k-means (RGB)" indicate that the HSV and RGB color images, respectively, are clustered by the k-means.