Evolutionary Algorithms for Robust Density-Based Data Clustering

Density-based clustering methods are known to be robust against outliers in data; however, they are sensitive to user-specified parameters, the selection of which is not trivial. Moreover, relational data clustering has received considerably less attention than object data clustering. In this paper, two evolutionary-computation approaches to robust density-based clustering of relational data are investigated.


Introduction
Clustering, an integral machine learning activity, involves the unsupervised classification of data into self-similar clusters: entities within a cluster are alike, and entities across clusters are not. A cluster is defined in terms of internal homogeneity and external separation; in density-related terms, clusters are dense regions in feature space with sparser regions separating them from one another. Datasets themselves can be divided into two groups, object data and relational data, a distinction described later in the paper. While considerable effort and research have gone into developing clustering algorithms for object data, clustering methods for relational data have received less attention. Yet in application domains such as the social sciences and bioinformatics, relational datasets are more common than object data. Prototype-based clustering algorithms are popular for clustering object data: each cluster is represented by a cluster prototype, and the algorithms are built around optimizing the parameters of the prototypes. Most such optimization proceeds iteratively from a randomly chosen initial state, and, as it turns out, prototype-based object clustering is very sensitive to this initialization. Evolutionary algorithms and other approaches that operate on a population of potential solutions have lately been used as a remedy to this curse of initialization. Real-life data is also inherently noisy, and prototype-based clustering methods have been shown to be adversely affected by noise in the data. Unless guarded against, the presence of outliers influences the calculation of prototype parameters. Density-based clustering algorithms are resistant to outliers, provided it can be assumed that outliers occupy the less dense regions of the feature space. Density-based spatial clustering of applications with noise (DBSCAN) is the most popular density-based clustering algorithm [1]. Although resistant to outliers and easily adapted to large-scale data clustering, DBSCAN and its variants still suffer from the need to prespecify two important parameters, described later, which in practice is not always straightforward or trivial. In this paper, two evolutionary approaches to relational data clustering are presented: one that converts relational data to equivalent object data and simultaneously partitions the data, and another that implements a relational, noise-resistant version of DBSCAN.

Relational Data Clustering
A set of entities $O = \{o_1, o_2, \ldots, o_n\}$ can be numerically specified by an object dataset $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$, where each of the $n$ entities is described by $p$ real-valued features. In particular, discrete features are included in this definition; however, object data can also consist of nominal (categorical) or ordinal values, interval or ratio values, or can be purely descriptive or consist of a combination of feature types. Each object is represented as a point in the $p$-dimensional feature space, so an object dataset can be visualized by plotting it in the feature space. The same data can also be numerically specified by a relational dataset $R = [r_{ij}]_{n \times n}$, where each pair of objects is numerically described by a real-valued relationship. Each entry $r_{ij}$ $(1 \leq i, j \leq n)$ is a quantification of the relationship between objects $o_i$ and $o_j$. Dissimilarity and similarity are common relationships; when objects can be numerically specified by $X$, distance or a function thereof can be used as a measure of dissimilarity or similarity. Positive relations, for which $r_{ij} \geq 0$, and symmetric relations, for which $r_{ij} = r_{ji}$, are often considered. Similarity relations are additionally required to be reflexive, that is, $r_{ii} = 1$, and dissimilarity relations are required to be irreflexive, that is, $r_{ii} = 0$, for all $1 \leq i \leq n$.
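To make the object/relational distinction concrete, the short sketch below (our illustration; the helper name object_to_relational is not from the paper) converts an object dataset into an equivalent dissimilarity relation using the Euclidean distance, the construction used for the experiments reported later.

import numpy as np

def object_to_relational(X):
    """Convert object data X (n x p) into a positive, symmetric,
    irreflexive dissimilarity relation R (n x n) via Euclidean distance."""
    diff = X[:, None, :] - X[None, :, :]    # pairwise feature differences
    R = np.sqrt((diff ** 2).sum(axis=-1))   # r_ij = ||x_i - x_j||
    return R

# Five objects in 2-D: R is nonnegative, symmetric, with a zero diagonal.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])
R = object_to_relational(X)
assert np.allclose(R, R.T) and np.allclose(np.diag(R), 0.0)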
More oen than not, relational data do not have an object basis and are just a quanti�cation of how two entities are related to each other.Prototype-based clustering models that operate on object data oen iteratively re�ne cluster assignments starting from an initial state by minimizing some distance measure of a cluster representative (called a prototype) from entities in that cluster.Examples of such methods include -means clustering and fuzzy clustering.However, cluster prototypes are not well-de�ned in relational data, and hence prototype-based clustering models that have been reformulated for relational data compute the equivalent object data either implicitly or explicitly.Such methods include relational -medoids, relational fuzzy means, fuzzy relational clustering and relational alternating cluster estimation [2][3][4].In the absence of cluster prototypes, the methods are pseudoalternating at the very best, and cluster assignment is therefore computationally expensive.Another approach is data transformation which involves the linear or the nonlinear projection of the relational data into a lower dimensional object space, followed by prototypebased clustering in the transformed feature space.Since data clustering is the primary focus, simple transformations which roughly preserve the original interrelations between entities in a smaller dimensional feature space are preferred.Fuzzy nonlinear projection is a clustering method that simultaneously transforms and partitions the data.e method and its shortcomings are presented in detail in the next section.

Evolutionary Clustering
Clustering can be considered a particular class of NP-hard problems [5], which has led to the development of efficient metaheuristics that provide near-optimal solutions to the clustering problem in reasonable time. The propensity of simple local search techniques, such as hill climbing and $c$-means, to get trapped in local minima has also led to the development of algorithms that search for multiple solutions, such as those based on evolutionary algorithms. The aim of evolutionary clustering algorithms is to evolve a population of (random) suboptimal partitions into a population that contains potential near-optimal partition(s). For more details, the reader is referred to the review paper by Hruschka et al. [6] on clustering using evolutionary algorithms.
One of the reasons why relational data clustering has seen fewer evolutionary algorithm-based implementations than object data clustering is the difficulty of devising a succinct yet meaningful genetic representation of the partition. Object data clustering using prototype methods is easier to implement with evolutionary algorithms, and most implementations seem to prefer the prototype-based string representation; that is, a $c$-partition of $n$ data objects in $p$ dimensions can be represented by a string of $c \times p$ prototype coordinates. This representation is preferred over the $n$-bit-long cluster-label representation since $c, p \ll n$. In the absence of explicit prototypes, certain judiciously chosen encodable parameters that are representative of the partition can be used to represent it, in addition to the $n$-bit cluster-label (assignment) representation. Clustering algorithms such as DBSCAN produce uniquely different partitions that are a direct function of certain clustering parameters; prototype-based clustering methods, by contrast, are more sensitive to initialization and to the number of clusters, which cannot be considered clustering parameters. The drawback of using an $n$-bit cluster-label representation for relational data is that it cannot be directly and efficiently mapped onto the clustering criterion. In this paper, modified versions of both representations will be used.
For robust clustering, the challenge is to devise a suitable fitness function that quantifies the quality of a partition in the presence of outliers in the data, one that reaches its maximum when all the outliers have been correctly identified. One of the first robust evolutionary clustering algorithms to be proposed used a function of the least median of squares (LMS) criterion as the fitness function to drive the evolution [7]. In fact, most evolutionary clustering algorithms use some kind of partition-dependent measure of homogeneity and separation as the fitness function.
Another strategy is to cast the robust clustering problem as a multiobjective optimization problem and solve it using multiobjective evolutionary algorithms. The fitness function has to be designed as a pair (or more) of complementary functions in such a way that Pareto-optimal fronts in the objective plane (or space, if there are more than two objectives) can be extracted. The optimal solution to the robust clustering problem will then be found on the extreme Pareto-optimal front. The challenge lies in designing a suitable fitness function and devising a representation that encodes information about the objectives separately. The binary representation (0 for noise points, 1 for good points) concatenated with prototype locations is easy to manipulate using canonical genetic operators, yet powerful enough to encode partition information in the non-noisy subset of the data [8, 9]. Complementary objectives, such as minimizing the number of clusters while uncovering dense regions of the data (the non-noisy subset) and maximizing intercluster distances, have been used successfully.

Relational Clustering Using Sammon Mapping
An indirect approach to clustering relational data is to apply Sammon mapping to obtain object-equivalent data in lower dimensions and then subject it to prototype-based clustering. Since the primary focus is the clustering of the data and not mapping accuracy, one can in principle map the relational data into 1-D data in a very limited range. This indirect approach is pursued in this paper. Sammon mapping produces an object dataset $Y = \{y_1, y_2, \ldots, y_n\} \subset \mathbb{R}^q$, $q < p$, such that the distances $d_{ij} = \|y_i - y_j\|$, $1 \leq i, j \leq n$, are as close as possible to the corresponding relational distances $r_{ij}$, that is, $|d_{ij} - r_{ij}| \approx 0$. If $R$ is computed from an object dataset $X \subset \mathbb{R}^q$, then $Y \approx X$. The mapping can be optimized by minimizing the error functional

$$E_1 = \frac{1}{\sum_{i<j} r_{ij}} \sum_{i<j} \frac{\left(d_{ij} - r_{ij}\right)^2}{r_{ij}}. \quad (1)$$

The minimization is performed using numerical optimization schemes such as gradient descent or Newton's method. Sammon mapping has been applied to relational data, followed by fuzzy clustering on the resulting $Y$ [10], and the Sammon mapping error functional has been explicitly incorporated into a clustering criterion (a more direct approach to clustering based on Sammon mapping) in the fuzzy nonlinear projection algorithm [11]. In this paper, an optimization approach is proposed in which the mapping error functional and a separate clustering criterion are optimized simultaneously.
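A minimal gradient-descent sketch of the mapping step is given below, assuming the stress in (1) and strictly positive off-diagonal dissimilarities; the function name sammon_map and the parameter defaults are ours, not the paper's.

import numpy as np

def sammon_map(R, q=1, iters=500, lr=0.1, eps=1e-12, seed=0):
    """Map a dissimilarity matrix R (n x n, zero diagonal) to an
    (n, q) configuration Y by gradient descent on the stress E1 of (1)."""
    n = R.shape[0]
    Y = np.random.default_rng(seed).uniform(size=(n, q))  # random start
    c = R[np.triu_indices(n, 1)].sum()                    # normalizer in (1)
    Rs = R + np.eye(n)                                    # guard the zero diagonal
    for _ in range(iters):
        diff = Y[:, None, :] - Y[None, :, :]              # y_i - y_j, shape (n, n, q)
        D = np.sqrt((diff ** 2).sum(-1)) + eps            # mapped distances d_ij
        np.fill_diagonal(D, 1.0)
        W = (D - Rs) / (Rs * D)                           # (d_ij - r_ij)/(r_ij * d_ij)
        np.fill_diagonal(W, 0.0)
        Y -= lr * (2.0 / c) * (W[:, :, None] * diff).sum(axis=1)  # dE1/dy_i
    return Y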
Clustering criteria are used to evaluate individual partitions and are in essence measures that reward both homogeneity within clusters and separation between clusters. Such measures, like the Davies-Bouldin index [12], are easily defined in the object space. For each cluster, a centroid is defined in $q$ dimensions as

$$v_i = \frac{1}{n_i} \sum_{y_j \in C_i} y_j, \quad (2)$$

where $C_i$ is the $i$th cluster, comprising $n_i$ entities. Intracluster scatter can be quantified as

$$S_i = \frac{1}{n_i} \sum_{y_j \in C_i} \left\| y_j - v_i \right\|. \quad (3)$$

The distance between two clusters $C_i$ and $C_j$ can be measured in terms of the distance between their centroids as

$$d_{ij} = \left\| v_i - v_j \right\|. \quad (4)$$

The Davies-Bouldin (DB) index in the feature space is defined as

$$E_2 = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} \frac{S_i + S_j}{d_{ij}}, \quad (5)$$

which takes nonnegative values; the smaller the value, the better the partition. The aim of relational clustering using Sammon mapping is to simultaneously map the relational data into an equivalent lower-dimensional object space and optimize the mapped data into clusters. For brevity, this algorithm will be referred to as SMC-R, for Sammon mapping and clustering of relational data. In all of the experiments described here, the lower dimension is $q = 1$. Each mapped object $y_i$ is represented by a binary string of length $B$; that is, it lies between 0 and $2^B - 1$, and $B$ can be chosen such that $2^B \gg n$, where $n$ is the size of the dataset. A modified cluster-label representation is used in which each mapped object is appended with its cluster label. Again, in the experiments described here, it was assumed that the number of clusters is fixed and at most eight in any given dataset; therefore, three bits suffice to represent the cluster label of a mapped object. A partition can thus be represented by a binary string of size $n(B + 3)$. For large values of $n$, this representation is not computationally attractive, and therefore the clustering methodology described here is implemented only on small-to-medium datasets ($n \leq 1000$). The binary bits representing the cluster labels follow a restricted growth function (RGF) scheme. The RGF is a remedy for the degeneracy problem inherent in cluster labels [13]. Consider a two-cluster dataset of five entities where entities 1, 3, and 4 belong to one cluster and entities 2 and 5 belong to the other. A cluster-label representation of this partition could be {1, 2, 1, 1, 2}; however, {2, 1, 2, 2, 1} is the same partition. In other words, there is a many-to-one mapping from representations to the phenotype (partition) space. The RGF scheme reorders the labels such that entity 1 is always assigned to the cluster labeled 1, entity 2 is assigned either to the cluster labeled 1 or 2 (but not 3 or later), and so on. For $n$ entities grouped into $c$ clusters, the cluster-label chromosome is a function $g: \{1, \ldots, n\} \rightarrow \{1, \ldots, c\}$ such that

$$g(1) = 1, \qquad g(i + 1) \leq \max \left\{ g(1), \ldots, g(i) \right\} + 1, \quad 1 \leq i \leq n - 1. \quad (6)$$

A particular chromosome is decoded as follows: the first $B$ bits give the position of the first entity on a 1-D line in the range $[0, 2^B - 1]$, the next three bits decode to its cluster label, and so on. After recombination, an additional step that reorders the cluster labels according to the RGF scheme is required. Single-point crossover and random bit-flip mutation with predefined probabilities are used as genetic recombination operators. If, after reordering, an individual in the offspring population decodes to a partition with fewer than $c$ clusters, clusters are split in two (one random cluster at a time) until a $c$-partition is obtained. If, on the other hand, an individual represents a partition with more than $c$ clusters, two clusters are picked at random and merged until a $c$-partition is obtained. This step is combined with the RGF reordering step and is carried out after the entire offspring population is generated.
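As an illustration of the RGF reordering applied after recombination, the sketch below (the helper name rgf_relabel is ours) maps an arbitrary cluster-label string to the canonical form satisfying (6).

def rgf_relabel(labels):
    """Relabel a cluster-label string into canonical RGF form: the first
    entity gets label 1, and each newly encountered cluster receives the
    smallest label not yet used."""
    mapping, out, nxt = {}, [], 1
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = nxt
            nxt += 1
        out.append(mapping[lab])
    return out

# {1,2,1,1,2} and {2,1,2,2,1} encode the same partition; both map to the
# canonical form [1, 2, 1, 1, 2].
assert rgf_relabel([1, 2, 1, 1, 2]) == rgf_relabel([2, 1, 2, 2, 1]) == [1, 2, 1, 1, 2]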
The algorithm is described as follows.
(1) A random population of $N$ individuals is initialized; each individual is a binary string of size $n(B + 3)$. Parameters such as the number of generations $G$ and the probabilities of mutation and crossover, $p_m$ and $p_c$, respectively, are fixed. The generation counter is initialized to 0.
(2) A mating pool is created using a tournament selection operator of size 2: two individuals are picked at random from the parent population, and the one with the higher fitness is inserted into the mating pool. The fitness of an individual is defined as

$$f = \frac{1}{w_1 E_1 + w_2 E_2}, \quad (7)$$

where $w_1$ and $w_2$ are constants that weight the relative importance of Sammon mapping accuracy against cluster assignment accuracy, and $E_1$ and $E_2$ are given in (1) and (5), respectively. The size of the mating pool is the same as the population size.
(3) Pairs of individuals are then picked sequentially from the mating pool, and two offspring individuals are created from each pair using the recombination operators with probabilities $p_c$ and $p_m$.
(4) The offspring population is merged with the parent population, and the combined population is ranked according to individual fitness. The top half of the population is retained as the new generation. (A compact sketch of this loop follows the steps.)
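The elitist loop of steps (1)-(4) can be sketched as follows, assuming hypothetical callables fitness, crossover, and mutate that implement (7), single-point crossover, and bit-flip mutation; this is a skeleton under those assumptions, not the authors' code.

import random

def smc_r_ga(init_pop, fitness, crossover, mutate, generations, pc, pm):
    """Skeleton of the SMC-R loop: binary tournaments, recombination,
    then truncation of the merged parent-plus-offspring pool."""
    pop = list(init_pop)
    for _ in range(generations):
        # Step (2): binary tournaments fill a mating pool of equal size.
        pool = [max(random.sample(pop, 2), key=fitness) for _ in pop]
        # Step (3): sequential pairing with crossover and mutation.
        offspring = []
        for a, b in zip(pool[0::2], pool[1::2]):
            c1, c2 = crossover(a, b) if random.random() < pc else (a, b)
            offspring += [mutate(c1, pm), mutate(c2, pm)]
        # Step (4): merge both populations and keep the fitter half.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:len(pop)]
    return pop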

Density-Based Partitional Clustering
The attractiveness of DBSCAN lies in its simple framework and its robustness to outliers; however, its algorithmic implementation has issues that this paper deals with. This is conceptually similar to partitional clustering using the minimum sum-of-squares error criterion, or least squares (LS) measure, which states that the best partition is the one that minimizes intracluster distances, although the criterion does not lend itself well to direct implementation. Approximate iterative schemes such as $c$-means have been developed to deal with the implementation of the LS measure. DBSCAN divides the entities to be clustered into three groups, core entities, border entities, and noise, based on a measure of how centrally (or noncentrally) located the points are in dense regions. A few definitions follow.
Definition 1 ($\epsilon$-neighborhood of an entity). The $\epsilon$-neighborhood of an object $x$, denoted $N_\epsilon(x)$, is the collection of entities whose distance from $x$ is no more than a predefined threshold $\epsilon$, which is the first of the two user-defined parameters.

Definition 2 (core entities). An object $x_i$ is a core object if the size of the $\epsilon$-neighborhood of $x_i$ is larger than a predefined threshold MinPts, which is the second of the two user-defined parameters.

Definition 3 (directly density-reachable and border entities). An object $x_j$ is directly density-reachable from a core object $x_i$ if $x_j$ belongs to the $\epsilon$-neighborhood of $x_i$, that is, $x_j \in N_\epsilon(x_i)$. If $x_j$ is not itself a core object (according to Definition 2), then such directly density-reachable objects are termed border objects.

Definition 4 (density-reachable). An object $x_j$ is density-reachable from another object $x_i$ if there exists a chain of objects from $x_i$ to $x_j$ such that each object in the chain is directly density-reachable from the previous one.

Definition 5 (density-connected). An object $x_i$ is density-connected to an object $x_j$ if there exists another object $x_k$ that is density-reachable from both $x_i$ and $x_j$.

Definition 6 (cluster). A cluster is seeded as soon as a core object is located. The two requirements for expanding a cluster are that all directly density-reachable objects be part of that cluster and that all objects in a cluster be at least density-connected to each other. If two core objects are in each other's $\epsilon$-neighborhoods, they belong to the same cluster.

Definition 7 (noise). Noise objects are those that are neither core nor border objects and do not belong to any cluster.
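The definitions translate directly into an expansion-style procedure. The sketch below (our illustration, not the original implementation) classifies entities and grows clusters from core objects, and it operates on a dissimilarity matrix so that it applies in the relational setting used later.

import numpy as np
from collections import deque

def dbscan_relational(R, eps, min_pts):
    """DBSCAN over a dissimilarity matrix R, following Definitions 1-7.
    Returns labels: -1 for noise, 0..k-1 for cluster membership."""
    n = R.shape[0]
    neigh = [np.flatnonzero(R[i] <= eps) for i in range(n)]  # Definition 1
    core = np.array([len(nb) > min_pts for nb in neigh])     # Definition 2
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cid                 # a core object seeds a cluster (Def. 6)
        queue = deque([i])
        while queue:                    # expand by density-reachability (Def. 4)
            j = queue.popleft()
            if not core[j]:
                continue                # border objects (Def. 3) do not expand
            for k in neigh[j]:
                if labels[k] == -1:
                    labels[k] = cid
                    queue.append(k)
        cid += 1
    return labels                       # unreached objects remain noise (Def. 7)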
As can be seen from the definitions, reasonable estimates for $\epsilon$ and MinPts are critical to the success of DBSCAN. When different regions of the data have different densities, a single choice of $\epsilon$ and MinPts may not always be prudent. Modifications of DBSCAN include variants such as OPTICS [14] and DBCLASD [15]. The former uses additional definitions of core distance and reachability distance to cluster spatial databases that have regions of differing densities, while the latter assumes that entities inside a cluster follow a uniform distribution, which reduces the dependence on user-specified parameters. In the next section, a new algorithm called DBSCAN-RC (relational clustering) is proposed that evolves the original DBSCAN estimates of $\epsilon$ and MinPts.
The two user-defined parameters in DBSCAN, $\epsilon$ and MinPts, are evolved using a genetic algorithm, and the resulting algorithm is simply called DBSCAN-RC. The original DBSCAN is recast here as a relational clustering problem. The relational data is modified as

$$\hat{r}_{ij} = \frac{r_{ij}}{\max_{k,l} r_{kl}}, \quad (8)$$

so that all entries lie between 0 and 1. A multiparameter, mapped, fixed-point coding scheme is used for representing potential solutions to the clustering problem. The first parameter encodes the value of $\epsilon$, and the second parameter encodes MinPts, with the result that all encodings have the same length. In the implementation presented later, both parameters use eight bits each.
In the relational space, the neighborhood parameter $\epsilon$ is defined as a proximity parameter ranging between 0 and 1: a value of 0 implies that no entity is a core entity, a value of 1 means that all entities are core entities, and for any other $\epsilon$, an entity $x_i$ is a core entity with respect to MinPts, where MinPts is the minimum number of entities that must lie within the $\epsilon$ distance for an entity to be classified as a core entity. MinPts can theoretically range from 0 to $n$: a value of 0 means that any entity is a core entity as long as it has at least one other entity within the proximity threshold, and a value of $n$ means that, unless the proximity threshold is unity, no entity is ever a core entity. For any other value in the range 0 to $n$, the encoded parameter is redefined as $(1 - \mathrm{MinPts}/n)$. This redefinition scales the second parameter such that it always lies between 0 and 1. For example, with $n = 200$, if the first parameter of an individual (partition) in the population decodes to 0.35 and the second parameter to 0.45 (i.e., MinPts $= 110$), an entity is classified as a core entity only if there are at least 110 other entities at scaled proximities of no more than 0.35 from it.
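A sketch of the fixed-point decoding just described (ours; the paper gives no code): two 8-bit fields map linearly onto $\epsilon \in [0, 1]$ and onto the rescaled second parameter, which is then converted back to an entity count.

def decode_chromosome(bits, n):
    """Decode a 16-bit chromosome (8 bits per parameter) into (eps, MinPts).
    The second field m in [0, 1] is mapped back to an entity count by
    undoing the (1 - MinPts/n) rescaling."""
    assert len(bits) == 16
    eps = int(bits[:8], 2) / 255.0      # first parameter: eps in [0, 1]
    m = int(bits[8:], 2) / 255.0        # second parameter in [0, 1]
    return eps, round(n * (1.0 - m))

# With n = 200, fields decoding to roughly 0.35 and 0.45 give MinPts = 110,
# matching the worked example above.
eps, min_pts = decode_chromosome('01011001' + '01110011', n=200)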
In the absence of cluster prototypes in the relational space, cluster assignment functionals can be used as measures of partition quality. The relational fuzzy $c$-means minimization functional is given by

$$J_m = \sum_{i=1}^{c} \frac{\sum_{j=1}^{n} \sum_{k=1}^{n} u_{ij}^{m} u_{ik}^{m} \hat{r}_{jk}}{2 \sum_{j=1}^{n} u_{ij}^{m}}, \quad (9)$$

where $c$ is the number of clusters, $u_{ij}$ ($1 \leq i \leq c$, $1 \leq j \leq n$) is a membership function that quantifies the degree of belongingness of entity $x_j$ to cluster $C_i$, and $m$ is a parameter called the fuzzifier. Individual memberships assume values between 0 and 1. The membership function is constrained such that the total membership of an entity across all clusters is unity and the total membership of all entities in any cluster is greater than zero (which ensures that there are no empty clusters):

$$\sum_{i=1}^{c} u_{ij} = 1 \quad \forall j, \qquad \sum_{j=1}^{n} u_{ij} > 0 \quad \forall i. \quad (10)$$

The preceding functional is the general case, of which the crisp (nonfuzzy) case is a particular one: each entity is part of only one cluster (with a membership of one) and has zero membership in all other clusters. The relational hard $c$-means minimization functional, averaged per cluster, is

$$J_3 = \frac{1}{c} \sum_{i=1}^{c} \frac{1}{n_i^2} \sum_{x_j \in C_i} \sum_{x_k \in C_i} \hat{r}_{jk}. \quad (11)$$

For fixed values of $c$, a small value of this functional denotes a good partition. The scaling of $r_{ij}$ to $\hat{r}_{ij}$ and the fact that the functional value is divided by the total number of clusters ensure that $J_3$ always lies between 0 and 1.
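Assuming the reconstruction in (11), the functional can be computed directly from the scaled relational matrix and a label vector; the helper name hard_j3 is ours.

import numpy as np

def hard_j3(R_hat, labels):
    """Per-cluster-averaged hard c-means functional J3 of (11).
    R_hat: relational data scaled into [0, 1] as in (8); labels: one
    cluster label per entity, with -1 (noise) entities excluded."""
    ids = [c for c in np.unique(labels) if c != -1]
    if not ids:
        return 1.0                       # no clusters: worst possible value
    total = 0.0
    for c in ids:
        idx = np.flatnonzero(labels == c)
        # mean pairwise scaled dissimilarity within the cluster, in [0, 1]
        total += R_hat[np.ix_(idx, idx)].sum() / (len(idx) ** 2)
    return total / len(ids)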
In addition, a penalty is imposed on a partition that classifies an unusually large number of entities as noise. This requires an assumption about the possible contamination (noise) in the data; one reasonable assumption is that at least half the data will form meaningful clusters, so a partition is not considered unless the size of the clustered data exceeds $0.5n$. In the experiments reported in this paper, a linearly weighted function of the inverse of $J_3$ and the number of non-noise entities covered by the partition is used as the fitness function:

$$f = \alpha \frac{1}{J_3} + \beta \frac{n_{\mathrm{good}}}{n}, \quad (12)$$

where $\alpha$ and $\beta$ are weights such that $\alpha + \beta = 1$ and $n_{\mathrm{good}}$ is the number of entities classified as core or border entities. While the first term rewards partitions with smaller $J_3$ values, the second term rewards those that classify more entities as either core or border entities (less noise), on the assumption that a vast majority of the entities belong to good clusters. More discussion follows in the next section. The clustering algorithm is the same as the one proposed in Section 4, with the fitness function given by (12) replacing the one in (7).
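Putting the pieces together, a hedged sketch of the fitness evaluation in (12), reusing the hard_j3 and dbscan_relational helpers above; the rejection rule encodes the assumption that at least half the data forms meaningful clusters.

import numpy as np

def dbscan_rc_fitness(R_hat, labels, alpha=0.5, beta=0.5):
    """Fitness (12): reward a small J3 and a large non-noise fraction.
    labels is the NumPy array returned by dbscan_relational."""
    n = len(labels)
    n_good = int((labels != -1).sum())
    if n_good <= n / 2:                 # contamination assumption: at least
        return 0.0                      # half the data must be clustered
    return alpha / max(hard_j3(R_hat, labels), 1e-12) + beta * n_good / n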

Computational Results
The basic framework of the two algorithms presented here is the same; the differences lie in the representations used and in the way the fitness function is calculated. In addition, SMC-R is noise-sensitive and is designed specifically for smaller datasets in higher dimensions, while DBSCAN-RC is robust against noise, and the sequential nature of DBSCAN has been shown to scale very well to large datasets. In this section, results are presented for two three-cluster synthetic datasets, one noisy and one free of noise. Both datasets are generated from Gaussian functions in eight dimensions. This is followed by DBSCAN-RC clustering results for two moderately large datasets from the literature. The fitness functions in (7) and (12) require the user to predefine weights ($w_1$ and $w_2$ in SMC-R and $\alpha$ and $\beta$ in DBSCAN-RC), the choice of which is not trivial either. In this paper, values that worked after limited experimentation were chosen; they are described later in this section.
6.1.Synthetic ree-Cluster Data.ree Gaussian clusters each comprised of 300 eight-dimensional vectors are generated.e three clusters are centered around 1, 4, and 7, respectively, in all eight dimensions.e data has intradimensional variance of 0.5 and interdimensional variance of 0.05.Each cluster contributes 300 objects to the dataset for a total of 900 objects.e object labels are randomized aer generation.e object data so generated was then converted to relational (proximity) data using Euclidean distance measure.Clustering is performed using SMC-R and DBSCAN-RC with two different population sizes (  1, 200) and total number of generations   3.Parameters of the algorithms are shown in Table 1.Simulations were run on the 64-bit PyScripter IDE on a dual core 3.16 GHz processor with 8150 MB RAM.Results are presented in Tables 2 and 3 for two speci�c combinations of  and .Ten runs were conducted with identical initial populations in both cases.While data about optimal partition found (yes or no) and number of generations till optimal partition was �rst uncovered is presented only for the best among the 10 runs, the ratio of average �tness of the terminating generation to the initial generation is averaged over 10 runs.e dataset was then corrupted by 100 uniformly distributed noise vectors in the range (1,7) in all eight dimensions.In the eight-dimensional feature space, the Gaussian clusters are comprised of entities that are in dense regions, and the uniformly distributed noisy entities are in regions that are less dense.e ratio of the mean �tness of the terminating population to the initial population is measured and averaged over 10 random runs from the same initial population.e population size was varied from 50 to 200 in increments of 10 to track the effect of population size.Other parameters are listed in Table 4. e variation of the average �tness ratio with population size for four different values of maximum generations is shown in Figure 1.
No significant gains are achieved as the maximum number of generations is increased between the two largest settings shown in Figure 1. Likewise, moderate population sizes are significantly better choices than the smallest ones; however, the average fitness ratio does not markedly improve beyond them. These choices are not generalizable, but the plots provide an empirical basis for judiciously selecting $N$ and $G$ for other, similar datasets. In all runs except those with the smallest population size and number of generations, DBSCAN-RC identifies the optimal partition, defined as the one that identifies the three Gaussian clusters and the 100 uniformly distributed noise entities (within a 2% misclassification error). SMC-R does not correctly identify the optimal partition in any of the runs. The results show that while SMC-R is more efficient than DBSCAN-RC for smaller, noise-free data (and possibly faster, although CPU usage time is not reported), the opposite is true for clustering noisy data, as expected. The results provide a justification for the two algorithms and a verification of the proof of concept.

Wisconsin Breast Cancer Data. e Wisconsin breast cancer database is available at the UCI Machine Learning
Repository [16] and has been analyzed as a two-class problem [17] and as a multiobjective optimization problem with unknown number of clusters in [8].�ach entity is de�ned by nine attributes (ranks in the range of 1-10) and an associated class label which are ignored during clustering.ere are 699 total cases out of which 16 have a single missing feature.ese missing features are randomly given "unusually large" values, these 16 cases can therefore be treated as noisy entities.Hamming distance was considered as the distance measure because the attributes take rank values.e maximum hamming distance between two entities is 81 when all nine of their attributes take maximal ranging values; the minimum distance being zero.e problem was analyzed using a population size of 300 running for 300 generations with binary-bit representations, and the initial population was randomly generated using biased generators [8].In this paper, two simulations are run using    and   5.e algorithm is stopped when the average �tness of the top half of the individuals converged within a prede�ned threshold ( −4 in here).e simulations with    converged within 30-50 generations with near-optimal solutions appearing as early as 25th generation in the best case.In this case, near-optimal solutions are de�ned as partitions that correctly classify the noisy entities and have the small percentage misclassi�cation in the nonnoisy part of the data.Due to the nature of the data, 95-96% classi�cation accuracy has been reported in the literature, which was also found to be the maximum limit of accuracy in all the simulations using DBSCAN-RC.e simulations with   5 not only converged quicker than those with    (as is to be expected), but the populations also lost diversity considerably quickly.Although most of the individuals by the 25th generation encoded for two clusters and were near-optimal, almost a fourth (125-140 individuals) of the population were also near-optimal but encoded for �ve clusters.Similar results have been reported in the literature [18,19].However, this smaller subset did not constantly appear in simulations with   .For almost similar results (and trends), the present implementation involves much less computational effort than previous studies.
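The distance just described (called the Hamming distance in the paper) is simple to state exactly; the sketch below (helper name ours) reproduces the quoted bounds of 0 and 81 for nine rank-valued attributes in the range 1-10.

import numpy as np

def rank_distance(a, b):
    """Distance used for the breast cancer data: the sum of absolute rank
    differences over the nine attributes (0 to 9 * 9 = 81)."""
    return int(np.abs(np.asarray(a) - np.asarray(b)).sum())

assert rank_distance([1] * 9, [10] * 9) == 81
assert rank_distance([5] * 9, [5] * 9) == 0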

6.3. Iris Data Set with Simulated Noise. The Iris dataset has been a benchmark for machine learning tasks. The same set of four Iris datasets, the original and the three contaminated versions from [8, 9], is revisited here. The original Iris dataset consists of 150 flower samples from the Iris species Setosa, Versicolor, and Virginica; 50 samples from each species are characterized by four features: sepal length, sepal width, petal length, and petal width. Setosa is linearly separable from the other two, but Versicolor and Virginica are not, which very often results in clustering algorithms uncovering two clusters in the Iris dataset instead of three. The Iris dataset and its variants were partitioned using a substantially large population evolving over a large number of generations in [8, 9]. There was also the added issue of the long binary-bit representations used in the previous studies; the constant-length two-parameter representation used here is computationally much more attractive.
In this implementation of DBSCAN-RC, Euclidean distance is used as the basis for creating the proximity matrix. The four features are first scaled before being used in the distance metric. For a population size of 50, the results of DBSCAN-RC are compared with those of the genetic algorithm-based clustering method that used a two-part chromosome and a two-tier fitness evaluation [9], abbreviated 2GA for convenience, and are presented in Table 5. The second-tier fitness function used in 2GA is the fuzzy silhouette width, which is similar to the DB index in that it is also a measure of intracluster compactness and intercluster distance. The convergence criterion in 2GA is met when the average second-tier fitness of a mating pool converges; a mating pool consists of 25 individuals in a population of size 50. DBSCAN-RC terminates when the average fitness of the top 12 individuals in the best population at any generation converges below a threshold of $10^{-4}$. The final population of best individuals is decoded, and the phenotypes (partitions) are evaluated. The partitions are seen to correspond to either two or three clusters, with percentages reported in Table 5. Also reported are specificity and sensitivity values. Specificity, or the true negative rate, measures how well the artificially added noise is identified (if all noise entities have been identified, the specificity is 100), and sensitivity, or the true positive rate, measures how well the true data is identified (sensitivity is 100 if all the real Iris data is identified as core or border entities). Unlike in the two previous studies, the discrimination power of DBSCAN-RC does not deteriorate rapidly as contamination increases. In fact, the specificity and sensitivity of both solution families (those encoding $c = 2$ and those encoding $c = 3$) are considerably better than those obtained with 2GA and reported in [9]. The fittest individual in the converged best population encodes $c = 2$ more often than $c = 3$, a fact reported in [9] as well.
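For clarity, the two rates can be computed as follows (our sketch; is_noise_true marks the artificially added noise, and is_noise_pred marks the entities a partition labels as noise):

import numpy as np

def specificity_sensitivity(is_noise_true, is_noise_pred):
    """Specificity: % of added noise correctly flagged as noise.
    Sensitivity: % of real entities retained as core or border points."""
    t = np.asarray(is_noise_true, dtype=bool)
    p = np.asarray(is_noise_pred, dtype=bool)
    specificity = 100.0 * (t & p).sum() / t.sum()
    sensitivity = 100.0 * (~t & ~p).sum() / (~t).sum()
    return specificity, sensitivity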

Discussion
The primary contribution of this paper is the presentation of a framework for two relational clustering methodologies using evolutionary algorithms. A simplified constant-length representation of a partition is also proposed that can potentially encode a wide range of $c$ values. Genetic representations used for clustering problems where the number of clusters is unknown have typically been of variable length. The constant-length chromosome is made possible by encoding two density-related parameters instead of prototype locations or cluster labels. The density-based clustering method DBSCAN is also robust against outliers in the data, which makes it suitable for real-world applications, where data are almost always contaminated. Moreover, DBSCAN and its variants were originally defined in feature (vector, or entity) space; in this paper, a new relational version of the algorithm is presented.

F 1 :
Ratio of the average �tness of the terminating population (at generation ) to the average �tness of the initial population plotted as a function of the population size .

T 5 :% individuals with c = 2 25 10 %
Clustering results for contaminated Iris datasets.Iris dataset with n = 165, individuals with c = 3 67 88 Average speci�city (family of c = 2) 83.33 90.00 Average speci�city (family of c = 3) 90.00 93.33 Average sensitivity (family of c = 2) 86.67 90.66 Average sensitivity (family of c = 3individuals in a population of size 50.DBSCAN-RC terminates when the average �tness of the top 12 individuals in the best population at any generation converges below a threshold of  −4 .e �nal population of best individuals is decoded, and phenotypes (partitions) are evaluated.e partitions are seen to correspond either to two or three clusters, with percentages reported in Table