Improved Unsupervised Color Segmentation Using a Modified HSV Color Model and a Bagging Procedure in the K-Means++ Algorithm

Accurate color image segmentation has remained a relevant topic in the research community due to its wide range of application areas, such as medicine and agriculture. A major issue is the presence of illumination variations that obstruct precise segmentation. At the same time, unsupervised machine learning techniques have become attractive, principally for their easy implementation. However, there is no easy way to verify or ensure the accuracy of unsupervised techniques, so they can lead to unpredictable results. This paper proposes an algorithm and a modification to the HSV color model in order to improve the accuracy of the results obtained from color segmentation using the K-means++ algorithm. The proposal gives better segmentation and fewer erroneous color detections under varying illumination conditions. This is achieved by shifting the hue and rearranging the H equation in order to avoid undefined conditions and increase the robustness of the color model.


Introduction
The application areas of machine learning grow every day, so it is possible to find applications using machine learning in health [1][2][3][4][5], system behavior prediction [6][7][8][9][10], image and video analysis [11][12][13][14][15], and speech and writing recognition [16][17][18][19][20], to mention only some of the most notable recent applications. Some of the results of these advances can be appreciated in applications that are widely and freely available. As a result, machine learning has become a common component of daily modern life.

Despite the huge improvements made in machine learning in recent years, it still requires more work and research. This can be stated from the fact that total accuracy has not yet been achieved, and sometimes the results are still not usable. This is the motivation for the present proposal, which is developed in the spirit of improving a common process performed in image analysis using machine learning.

This paper gives an overview of the color models, stating the advantages and problems found when they are implemented or used as input to other processes. A second topic discussed in the color model overview is the separation of chromatic and achromatic information by excluding or mitigating the illumination component. The latter topic is particularly relevant, since many of the issues found in color detection and segmentation come from the fact that illumination can change the perception of a color, ranging from bright white to black through several tones of the same base color. The overview sets the ground on which some changes are made to the HSV perceptual color model.

A section regarding machine learning algorithms is also included, where the relevance of unsupervised learning is explained. This section also covers the issues found in the popular K-means algorithm, as well as some existing techniques used to improve the classification results obtained from this algorithm (like K-means++). It supports some minor additional changes applied in combination with commonly used techniques that can improve the outcome of the algorithm.

Mathematical Problems in Engineering
The changes applied to the color model and to the classification algorithm are implemented in a way that aids the resulting classification process. The resulting process is compared against other color models and some variants. As a testing set, the Berkeley Segmentation Dataset and Benchmark (BSDS500) is mainly used [21], which provides several testing images and ground truth segmentations made by different subjects. The BSDS500 dataset has been used as a testing environment by other segmentation works [22][23][24][25][26][27]. Using the BSDS500 dataset gives a more reliable ground for the testing cases.

Commonly Used Color Models
Color is a property that is usually addressed in computer vision, because it is useful for distinguishing and recognizing objects or characteristics in an image. Due to its importance, several ways to describe and explain color hue have been developed. All these methods can be grouped into color models and color spaces. A color model is a set of equations and procedures used to calculate a specific color, while a color space is the set of all the possible colors generated by a color model.
The additive, subtractive, perceptual, and CIE models are among the most common color models. Other models were made especially for video and image transmission (like television broadcasting).
The additive color model consists of the mixture of two or more colors known as primary colors. The most representative model of this type is the Red-Green-Blue model, or RGB. This model is widespread, since it is the basis of many electronic devices that display color (televisions, computers, mobile phones, etc.). It is also the simplest to implement, since it only requires an amount of each primary color added over a black surface in order to obtain a given color hue [28].
The subtractive color model is similar to the additive color model; it uses a set of primary colors to obtain a color hue. The difference is that the subtractive model subtracts or blocks a certain amount of the primary colors over a white surface instead of adding an amount of the primary colors over a black surface. The most representative subtractive color model is the Cyan-Magenta-Yellow model, or CMY. This model is mostly used in printing processes.
The idea behind the perceptual color models is to mimic the process that occurs when the brain interprets an image; they are also referred to as psychological interpretations of color. Basically, this type of model splits the color into a hue component, a saturation component, and a light component. The most common models in this category are HSV, HSL, and HSI. These models have the characteristic of being represented by geometric figures, usually a cone, bicone, or cylinder. This type of geometric representation allows easy manipulation of the color [29].
The CIE color models are those created by the International Commission on Illumination (CIE). The CIE is a global nonprofit organization that gathers and shares information related to the science and art of light, color, vision, photobiology, and image technology [30].
This organization was the first to propose the creation of standardized color models; the most famous are CIE-XYZ, CIE-LUV, and CIE-Lab, or L*a*b*. The L*a*b* color model is used in several image editing software tools, since it offers a robust gamut. It has an illumination component "L" and two chromatic components "a" and "b." In video and image transmission, different color models are used. These models do not belong to a specific type and are related to the L*a*b* color model: they also split the color into an illumination component and two chromatic components, but they differ from the L*a*b* model in the way each component is calculated. The main purpose of these models is to adapt the color image to the transmission process (television broadcasting), and most of the component calculations are meant to be used directly in analog television sets or cameras. The most common models in this category are YCbCr, YUV, and YDbDr.

Problems with the Color Models.
A reason for the existence of so many color models is that none of them is perfect: every color model has failure points or is sometimes hard to manipulate. Due to these advantages and disadvantages, each color model has its own niche.
The additive and subtractive color models are easy to implement and understand, but they do not have a linear behavior, and they are highly susceptible to illumination changes.
The CIE color models have a robust gamut, and the illumination component is separated from the chromatic components. The problems with the CIE color models are related to their nonlinear behavior and the difficulty of implementing them.
The models used for video and image transmission are designed for digital and analog transmission, so implementing them for other purposes is complex.
The perceptual color models have the illumination component isolated and have a linear behavior. They do not offer a gamut as robust as the CIE models, since they have only one component for the chromatic information. Another issue with the perceptual color models comes from the equations used to calculate the hue and saturation components: for a white, black, or gray color, these two components can become undefined.
Equations (1), (2), and (3) are used for the HSV color model:

V = max(R, G, B) (1)

S = (max(R, G, B) − min(R, G, B)) / max(R, G, B) (2)

H = 60 × (G − B) / (max(R, G, B) − min(R, G, B)), if max(R, G, B) = R
H = 60 × (2 + (B − R) / (max(R, G, B) − min(R, G, B))), if max(R, G, B) = G
H = 60 × (4 + (R − G) / (max(R, G, B) − min(R, G, B))), if max(R, G, B) = B (3)

Using these equations as an example, it can be seen that for white, black, or a gray tone the maximum and the minimum have the same value, and the H component (see (3)) becomes undefined. A usual workaround implemented in the most popular image processing libraries is to assign the value zero when H is undefined. In the HSV color model, the red hue has an H value of zero, so this workaround produces erroneous detections by assigning the same hue to red, black, and gray tones.
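Equations (1)-(3) and the usual zero workaround can be sketched as follows (a minimal sketch assuming normalized RGB inputs in [0, 1]; `rgb_to_hsv` is an illustrative name, and the fallback mirrors the behavior of popular image processing libraries):

```python
def rgb_to_hsv(r, g, b):
    """HSV from normalized RGB using Eqs. (1)-(3). When max == min
    (white, black, or gray), H in Eq. (3) is undefined; this sketch
    falls back to H = 0, which collides with the red hue."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                    # Eq. (1)
    s = 0.0 if mx == 0 else (mx - mn) / mx    # Eq. (2)
    if mx == mn:                              # achromatic: H undefined
        h = 0.0                               # common library workaround
    elif mx == r:                             # Eq. (3), red is the maximum
        h = (60 * (g - b) / (mx - mn)) % 360
    elif mx == g:                             # green is the maximum
        h = 60 * (2 + (b - r) / (mx - mn))
    else:                                     # blue is the maximum
        h = 60 * (4 + (r - g) / (mx - mn))
    return h, s, v
```

Note that a pure red pixel and a mid-gray pixel both end up with H = 0, which is exactly the erroneous collision described above.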

Alternative Color Models.
Due to the issues present in color models, some proposals have emerged to alleviate these problems. Some of these alternative color models are variants of existing color models, created to address a specific issue or to provide an easier implementation of the model. The normalized RGB, or n-RGB, was created specifically to help the RGB color model deal with illumination changes. Illumination is one of the most serious issues when color detection is performed, so the main idea behind the normalization is to use a percentage of each primary color instead of an amount. Theoretically, illumination modifies each color component proportionally, so the RGB color (50, 100, 150) should have the same color hue as the color (5, 10, 15) but with a different illumination.
Equations (4) are used to calculate the n-RGB color space:

r = R / (R + G + B), g = G / (R + G + B), b = B / (R + G + B) (4)

The n-RGB space mitigates the effect of shadows and shine, but it can also reduce the detection precision [31].
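The normalization in (4) can be sketched as follows (`normalize_rgb` is an illustrative name; the even split for pure black is an assumption of this sketch to avoid division by zero):

```python
def normalize_rgb(r, g, b):
    """n-RGB (Eq. (4)): each channel as a fraction of the pixel's total
    intensity, so a proportional illumination change cancels out."""
    total = r + g + b
    if total == 0:                    # pure black: assumed even split
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)
```

With this normalization, the example above holds: (50, 100, 150) and (5, 10, 15) map to the same n-RGB triple (1/6, 1/3, 1/2).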
Another technique to improve detection processes and avoid the effect of illumination is to ignore the illumination component. This is usually done in perceptual color models and L*a*b*-like color models, where the illumination component can be split off. By applying this partial selection of components in color segmentation, interference coming from unnecessary data, like illumination or saturation, can be avoided. Also, as the information input from the color model is reduced, the segmentation and identification process is accelerated in the classification algorithms.
The most common cases come from the perceptual models, where the H and S components [32][33][34], the H and V components [35], only the H component [36,37], or a mixture of components [38] are used.
Another case is the partial usage of the L*a*b* color model, where the L component is excluded, using only the chromatic components to perform the color detection [39].

Adapting the HSV Color Model to K-Means
K-means is an algorithm classified under the unsupervised learning category. Unsupervised learning algorithms are capable of discovering structures and relationships by themselves, using only the input data [40].
The K-means algorithm is commonly used in clustering processes. The algorithm was introduced by MacQueen in 1967 [41], even though the idea was conceived in 1957; the public disclosure of the algorithm did not occur until 1982 [42]. K-means is an iterative method that selects K random cluster centroids. In every iteration, the centroids are adjusted using the data points closest to each centroid. The algorithm ends when a defined number of iterations has been executed or a desired minimum data-centroid distance has been reached. This behavior makes K-means a variant of the expectation maximization algorithm.
The K-means algorithm has some variations that are meant to improve the quality of the resulting segmentation; some popular examples are fuzzy K-means and K-means++.
The K-means algorithm does not always generate good results, mostly due to the random cluster centroid initialization. This random initialization generates problems such as two cluster centroids being defined too close to each other, which results in one group of related items being misclassified into two different clusters. Another case is when a cluster centroid is defined far from the centroid of the real data group: the randomly defined centroid may never reach the real centroid within the defined number of iterations.
These kinds of problems motivated researchers to propose improvements to the original K-means algorithm. Arthur and Vassilvitskii proposed an improvement focusing on the initialization process, which they called K-means++ [43]. Basically, K-means++ uses a simple probabilistic approach to calculate the initial cluster centroids by estimating how well a given point would perform as a possible centroid.
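The K-means++ seeding idea can be sketched in one dimension, matching the single hue component used later (`kmeans_pp_init` is an illustrative name; a fixed seed keeps the sketch reproducible):

```python
import random

def kmeans_pp_init(points, k, rng=random.Random(0)):
    """K-means++ seeding: pick the first centroid uniformly at random,
    then pick each next centroid with probability proportional to the
    squared distance to the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min((p - c) ** 2 for c in centroids) for p in points]
        total = sum(d2)
        if total == 0:                  # all points already covered
            centroids.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)       # weighted draw over d2
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(p)
                break
    return centroids
```

Points far from the already chosen centroids receive much more probability mass, which is what reduces the chance of two initial centroids landing inside the same data group.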
Due to its advantages and ease of implementation, this paper uses K-means++ in order to produce more accurate results in the clustering process.
As mentioned, HSV produces undefined values when a black, white, or gray tone is present in the image. This discourages the usage of this model, or forces its use under the premise that color detection will sometimes fail under the previously mentioned circumstances.
An additional issue comes into account when a distance-based algorithm like K-means is used. It arises from the fact that the H component is measured as the angle of a circumference, which implies that the next H value after 359 is 0. An algorithm like K-means detects that 359 and 0 are far from each other and classifies them into different clusters.
The previous issue could be solved by adding logic to the distance measurement method, with rules that avoid the miscalculation of the distance in the H component. However, this approach could produce an excessive increase in the computational work.
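The problem, and the extra per-comparison rule needed to fix it inside the distance measurement, can be illustrated with a short sketch (the function names are illustrative):

```python
def naive_hue_dist(h1, h2):
    """Plain distance, as K-means uses it: 359 and 0 look far apart."""
    return abs(h1 - h2)

def circular_hue_dist(h1, h2, period=360):
    """Extra wrap-around rule so 359 and 0 are recognized as neighbors;
    applying this to every comparison is the overhead described above."""
    d = abs(h1 - h2) % period
    return min(d, period - d)
```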
The implementation of K-means++ does not guarantee the correct classification of the input items. K-means++ improves the general outcome by providing a better start, which helps to find the best solution faster and/or to reduce the number of erroneously defined clusters.
The present paper proposes an adaptation of the HSV model in order to overcome the previously mentioned issues while providing a basis to improve the results of the K-means++ algorithm.
Most image libraries use a 1-byte-per-component representation, which implies that the value of the H component of the HSV color model must be adapted to fit in the given space. The preferred approach is to take half of the H component (divide it by two), so the H component ranges from 0 to 179, while the S and V components each range from 0 to 255. This approach is preferred because a 2-byte representation significantly increases the amount of memory required to process the image. The proposed change consists in modifying the way H is defined, especially when it becomes undefined. The idea is to take advantage of the unassigned values in the H byte: the H byte covers a range only from 0 to 179, so the values from 180 to 255 are unassigned. Basically, instead of assigning H to zero when white, black, and grayscale colors are detected, these colors are assigned to a range within the empty values.
The range selected in this work for the black, white, and gray tones goes from 200 to 255. The starting point was selected so that the separation from the last H value (179) is easily detected by K-means. Lower starting points could be chosen, but 200 was selected in order to emphasize the separation between the possible clusters. The two areas defined in the H component match the definition of chromatic and achromatic regions: in this case, the chromatic region is defined from 0 to 179 and the achromatic region from 200 to 255. Using the achromatic and chromatic definitions [55] adapted for the HSV color space [Figure 1], the following can be stated:
(1) The color hue (H) is meaningless when the illumination (V) is very low (the color turns to black).
(2) The color hue (H) is unstable when the saturation (S) is very low (the color turns to gray).
(3) When the saturation (S) is low and the illumination (V) is high, the color hue (H) is meaningless (the color turns to white).
In all the cases in which H becomes unstable or meaningless, the achromatic zone is used; otherwise the chromatic zone is used. The procedure marks an H value as achromatic when the saturation or the illumination is low. This requires the definition of a threshold (th) indicating when an H value is achromatic. This threshold is applied to the S and V components, and it indicates when the S or V values are low enough to consider H meaningless.
Using the previous concepts in the H equation [see (3)] results in the following equation:

H = the chromatic hue computed as in (3), mapped to the range 0-179, if S > th and V > th
H = an achromatic value in the range 200-255, otherwise (5)

As explained previously, H is the angle of a circumference, so the next value after 359 is zero; in the 1-byte-per-component representation, the next value after 179 is zero. The color hue corresponding to this discontinuity area is the red tone. This issue produces two separate clusters for the red color, even if the hues are almost the same. Some approaches can be implemented in order to correct this possibly erroneous creation of clusters; for instance, rules can be added to the distance measurement of the K-means++ algorithm when the H value is close to the discontinuity region, but this generates an important computational load.
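A minimal sketch of the two branches of (5) in the 1-byte representation follows (`modified_h` is an illustrative name; the 30% thresholds match the value used later in the tests, and the linear placement of achromatic pixels inside 200-255 by their V value is an assumption of this sketch, not the paper's exact formula):

```python
def modified_h(h, s, v, th_s=0.3 * 255, th_v=0.3 * 255):
    """Modified 1-byte H: chromatic hues keep the 0-179 range, while
    achromatic pixels (low S or low V) go into the spare 200-255 range
    instead of colliding with the red hue at H = 0."""
    if s > th_s and v > th_v:             # first part of Eq. (5)
        return h                          # chromatic region: 0-179
    # second part of Eq. (5): achromatic region, assumed here to spread
    # black (V = 0) to white (V = 255) linearly across 200-255
    return 200 + round(55 * v / 255)
```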
This paper proposes the usage of shifted angles for the H component, so that the discontinuity can be placed in another color hue. The creation of two shifted representations of H is proposed, so that they can be combined to eliminate the discontinuity issue. The original H has the discontinuity in the red hue, the first shifted component (H120) has the discontinuity in the green hue (120°), and the second shifted component (H240) has the discontinuity in the blue hue (240°). As can be seen, the shift operation is done in evenly spaced amounts (120° between each H component) [Figure 2].
In the case of the 1-byte representation, the shift amount is 60 for H120 and 120 for H240 (half of the original values).
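The two shifted components can be sketched as follows for the 1-byte representation (`shifted_hues` is an illustrative name; the branch logic follows Pseudocode 1, including the pass-through of achromatic values):

```python
def shifted_hues(h):
    """Build H120 and H240 from a 1-byte H value by rotating the hue
    wheel by 60 and 120 (120 and 240 degrees at full scale), so each
    component places the discontinuity on a different hue."""
    if h >= 200:                               # achromatic: keep as is
        return h, h
    h120 = h - 120 if h >= 120 else h + 60     # (h + 60) mod 180
    h240 = h - 60 if h >= 60 else h + 120      # (h + 120) mod 180
    return h120, h240
```

A red pixel (H = 0) sits on the discontinuity in the original component but maps to 60 in H120 and 120 in H240, safely inside their continuous ranges.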
The original and the two shifted H components are all meant to be processed by K-means++. This would seem to generate significant extra computational work, but the process both solves the discontinuity issue in H and improves the classification performed by the K-means++ algorithm. This is explained in detail in the next section, where the complete improvement process is exposed.
The work done in this paper uses the partial-model approach to eliminate the effect of illumination, carried by component V, from the color process. It also excludes the S component, since the main purpose is segmentation or classification by color hue. The selection of only one component speeds up the process done by K-means++ by reducing the complexity of the input. The complete process to create the input for K-means++ is given in Pseudocode 1, which calculates H, H120, and H240 for each pixel of the RGB input image.
Since the pseudocode operates on a 1-byte-per-channel model, the H values are in the range 0-179. After obtaining the K-means++ clusters, a matching and grouping operation is performed in order to detect similar clusters and group them together. The idea is that if a cluster group has two or more members, it has a higher probability of being a real cluster. This approach makes the cluster groups affected by the discontinuity easy to detect and ignore, as they usually contain only one member. The shift operation forces the discontinuity to affect a different hue, so the other tones are not affected.
For instance, the original H component is affected by the discontinuity in the red hue, so the K-means++ algorithm would produce a split cluster in this hue area. But H120 and H240 are not affected in the red hue by the discontinuity, so the K-means++ algorithm would produce the correct cluster for a red hue.
Another reason to apply K-means++ to three versions of the same information is to improve the cluster quality. Even though K-means++ is an improvement over K-means, a certain part of the process still relies on randomness, sometimes producing a less accurate initial centroid. Performing the same classification several times helps to reinforce the results by taking the groups of similar clusters with more members as the most probable real clusters. The process for the K-means++ clustering and grouping is described in Pseudocode 2.
The purpose of the matching and grouping is to take clusters with a high similarity and group them together. This might seem a trivial task, but its implications make it a complex procedure. The simplest approach is to use only the Euclidean distance between the cluster centroids and group the clusters with the lowest distance [54].
The previous approach does not always produce the best result, due to the variance of the elements in the clusters, the case of missing clusters, or the case where a cluster is divided. An algorithm proposed to match clusters while alleviating these possible issues is the Mixed Edge Cover (MEC) [56]. The MEC algorithm calculates the similarities and dissimilarities between the clusters using a distance measurement that eliminates the variance issues. The Mahalanobis distance between the cluster elements can be used for this purpose [57], so this paper uses the Mahalanobis distance as the similarity measurement between the clusters.
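The Mahalanobis measurement can be sketched with NumPy as follows (a minimal sketch; the paper's MEC-based matching adds its own logic on top of this distance):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from a cluster described by its
    mean and covariance; unlike the Euclidean distance, directions in
    which the cluster itself varies a lot are discounted."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

With an identity covariance it reduces to the Euclidean distance; with a variance of 4 along the first axis, a displacement of 2 along that axis counts only as a distance of 1.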
Bagging is a technique used in machine learning where several versions of a predictor or classifier algorithm are used to generate a new predictor or classifier. Usually this is done by averaging the results of predictors and, in the case of classifiers, by performing a voting process [58]. The proposed procedure creates groups of similar clusters and then eliminates the groups with fewer members using a voting system.
After the voting of the first bagging process is finished, a second bagging process is executed in order to create a unified cluster from each selected cluster group. The voting in the second bagging process creates a cluster from the cluster items common to two or more clusters; if a cluster item appears in just one cluster, it is considered noise data or a misclassified pixel.
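The two voting steps can be sketched as follows, representing each cluster as a set of pixel indices (the function names and the two-member minimum are illustrative renderings of the description above):

```python
from collections import Counter

def keep_supported_groups(groups, min_members=2):
    """First bagging vote: keep only the cluster groups backed by at
    least min_members of the three runs (H, H120, H240)."""
    return [g for g in groups if len(g) >= min_members]

def merge_group(group):
    """Second bagging vote: a pixel joins the unified cluster only if
    it appears in two or more member clusters of the group; pixels
    seen only once are treated as noise or misclassifications."""
    counts = Counter(px for cluster in group for px in cluster)
    return {px for px, n in counts.items() if n >= 2}
```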

Proposed Method's Theoretical Ground.
The issues found in color spaces are related to discontinuities and nonlinear behaviors. Classification methods based on distances, like K-means, cannot handle these issues correctly when a color classification is required. The HSV model has a linear behavior in the color hue component H but suffers from a discontinuity when it changes from 359 to 0.
The proposed change moves the discontinuity to different values. It creates two additional versions of the H component (H120 and H240), where the discontinuity occurs in different color hues. Performing a clustering operation on one of the components H, H120, or H240 produces clusters in which the discontinuity can manifest as a real cluster divided into two clusters; this issue does not exist in the clusters coming from the other two components.
Performing cluster matching and grouping over all the clusters coming from all the components generates groups; a group containing two or more elements can be considered a real cluster, while the remaining groups can be ignored. This process alleviates the discontinuity issue found in the H component.
Additionally, splitting the chromatic and achromatic values reduces the effect of shine and reflections that can lead to incorrect classifications. It also avoids the issues occurring in HSV when the pixel is a shade of gray (H becomes undefined using (3)). Instead of setting the H value to 0 in this case, the proposed improvement uses an unassigned value range in the H component. This facilitates the clustering process by having a specific region for the chromatic tones and a separate region for the achromatic ones.
All the changes performed in the proposed improvement eliminate the discontinuity and provide more linear input data for the K-means algorithm. Additionally, the changes mitigate shadows, shine, and reflections, which alter the perception of color tones. This has a positive effect on the classification performed by K-means compared to classifications using other color models, producing more accurate results.

Testing and Experimentation
In the testing process, the proposed HSV model is tested against other color models and the original HSV. The testing dataset comes from two sources: mainly the BSDS500, plus a couple of images from the Free Images website [59]. For the first dataset, the ground truth is taken from the files inside the dataset, while for the second source some ground truth images were created. All the color models are processed by the K-means++ algorithm, which is set to find 4 or 5 clusters (usually the number of segmented objects found in the BSDS500 dataset).
Once the clusters are obtained for each tested color model, they are evaluated using statistical measurements. Measurements like specificity [see (6)], sensitivity [see (7)], and accuracy [see (8)] are usually used in segmentation tests. These measurements use parameters like True Positive (TP, the number of pixels included in the segmented object that are correctly classified), True Negative (TN, the number of pixels not included in the segmented object that are correctly classified), False Positive (FP, the number of pixels included in the segmented object that are incorrectly classified), and False Negative (FN, the number of pixels not included in the segmented object that are incorrectly classified). This work uses balanced accuracy [see (9)] [60] as an overall measurement, in which the specificity and sensitivity are combined in a certain proportion by applying the adjustment parameters α and β (usually both set to 0.5).
The balanced accuracy measurement should give an overview of how well the test is performing, but unfortunately this is not always the case. Since the parameters depend on the number of pixels inside or outside the segmented object, a high value in either specificity or sensitivity alone can lead to a high balanced accuracy. In order to avoid this case, the parameters α [see (10)] and β [see (11)] are calculated considering the number of pixels in the segmented object (VP) and the pixels outside the segmented object, or background (BP) [61].
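A sketch of the balanced-accuracy computation: the weighted form follows (9), while deriving α and β from the object (VP) and background (BP) pixel counts as simple proportions is an assumption of this sketch standing in for (10) and (11):

```python
def balanced_accuracy(tp, tn, fp, fn, alpha=None, beta=None):
    """Weighted sum of sensitivity (Eq. (7)) and specificity (Eq. (6)).
    With alpha = beta = 0.5 this is the usual balanced accuracy; when
    the weights are omitted, they are derived from the object (VP) and
    background (BP) pixel counts (an assumption of this sketch)."""
    sens = tp / (tp + fn)          # Eq. (7)
    spec = tn / (tn + fp)          # Eq. (6)
    if alpha is None or beta is None:
        vp, bp = tp + fn, tn + fp  # object vs. background pixel counts
        alpha, beta = vp / (vp + bp), bp / (vp + bp)
    return alpha * sens + beta * spec
```

The second branch illustrates why the weighting matters: a segmentation that labels everything as background scores spec = 1 but sens = 0, and the pixel-count weights keep that from inflating the overall score.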
The color models selected for the comparison are those that appear most commonly in the literature. The test for each color model is executed 20 times, taking the best result and the average, so that a more reliable statistical comparison can be made among the color models using the K-means++ algorithm.
In order to apply the modified HSV model, it is necessary to define a threshold value, th [see (5)], so that the chromatic and achromatic regions can be placed in the H component. After performing some tests over a group of images, it was observed that setting the threshold to around 30% of the range of the S and V components produced the best segmentation results, so this threshold is used in the tests.
Also, a set of images from the selected dataset sources was chosen to perform the comparison. The BSDS500 dataset is intended mainly for object segmentation, but color segmentation algorithms can solve the segmentation task in some of the scenarios proposed in the dataset. Taking that into account, a subset of images whose ground truth is close to a color segmentation was selected.
In order to provide more comparison data regarding the behavior of the proposed improvement, another clustering algorithm is used in the tests. The Gaussian mixture model (GMM) performs in a way similar to K-means, so an implementation of the GMM using the expectation maximization (EM) method is used to provide a comparison with a different algorithm.
The conditions are similar for both algorithms: both perform 200 iterations. Regarding the starting point for the Gaussians in the GMM, the K-means++ initialization algorithm is used to set the initial mean and standard deviation. This creates a scenario in which a fair comparison can be made.

Test Results.
A few images from the selected test dataset are shown in order to demonstrate how every color model performs in the segmentation done by K-means++.
The images in Figure 3 were selected to visually show the segmentation done by each of the selected color models, along with some metrics showing the performance.
In Tables 1-12, the first row shows the ground truth clusters for each of the images in Figure 3. The following rows contain the results of the segmentations produced using each of the selected color models. The last columns give metrics measuring the performance:

(i) Mean BAcc: the average balanced accuracy using all the data from all the clusters and all the iterations.

(ii) Best BAcc: the best individual balanced accuracy for one cluster occurring in the iterations.

(iii) Mean sen.: the average sensitivity.

(iv) Mean spe.: the average specificity.

(v) Avg. time: the running time for the algorithm, given in seconds, used to measure the CPU time needed. For the proposal, the time measurement is divided into two phases: one for the clustering time and another for the bagging time.

From the results in Tables 1-12, it can be seen that the proposed improvement takes first place most of the time. Other color models occasionally obtain better worst BAcc and best BAcc measurements, but in the average (mean BAcc) the proposed method has a better score.
In order to correctly validate the experimental results, a statistical test is performed over the balanced accuracy observed in the comparison results. In this case, the Wilcoxon test is conducted. The Wilcoxon test is a nonparametric test used when a normal distribution cannot be guaranteed in the data. Its null hypothesis, over two different result sets, considers that the two compared populations come from the same distribution [62]. The Wilcoxon method has commonly been used to compare algorithm behaviors in order to verify which one has a better performance using normalized values (from 0 to 1) [63]. The Wilcoxon signed-rank sum is set to use the right tail; under such conditions, the alternative hypothesis is that the first population has a higher median than the second. Therefore, the first population in each comparison corresponds to the results of the proposed method.
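With SciPy, the right-tailed test reads as below; the balanced-accuracy samples here are purely illustrative placeholders, not the paper's measurements:

```python
from scipy.stats import wilcoxon

# Illustrative per-image balanced accuracies for the proposed model
# and one baseline color model (made-up numbers for this sketch).
proposed = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94]
baseline = [0.85, 0.86, 0.88, 0.84, 0.83, 0.87, 0.82, 0.90]

# Right-tailed signed-rank test: the alternative hypothesis is that
# the first sample's median is higher than the second's.
stat, p = wilcoxon(proposed, baseline, alternative="greater")
```

Rejecting the null hypothesis (a small p) supports the claim that the first population, the proposed model, performs better.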

3.1. Modified HSV Color Model Calculation.
The proposed adaptation applied to the HSV color model addresses two important issues: the undefined values produced by the H component equation [see (3)] and the discontinuity in this component when it changes from 359 to 0.

Figure 1: Achromatic and chromatic areas for the HSV color space.

Figure 2: The original H and the two shifted H representations.
set threshold_V, threshold_S    // predefined thresholds
set H_Entry, H120_Entry, H240_Entry
for each pixel in the RGB image do
    set V with Eq. (1)
    set S with Eq. (2)
    if V > threshold_V and S > threshold_S then
        set H with the first part of Eq. (5)
        if H >= 120 then
            set H120 equals H - 120
        else
            set H120 equals H + 60
        if H >= 60 then
            set H240 equals H - 60
        else
            set H240 equals H + 120
    else
        set H with the second part of Eq. (5)
        set H120 equals H
        set H240 equals H
    add H, H120, H240 in H_Entry, H120_Entry, H240_Entry
for each entry in [H_Entry, H120_Entry, H240_Entry] do
    execute K-means++ with entry

Pseudocode 1: Modified HSV color model pseudocode for K-means++.

Table 1: Segmentation performance results for image "a" using K-means++.