A Greedy Clustering Algorithm Based on Interval Pattern Concepts and the Problem of Optimal Box Positioning

We consider a clustering approach based on interval pattern concepts. Exact algorithms developed within the framework of this approach are unable to produce a solution for high-dimensional data in a reasonable time, so we propose a fast greedy algorithm which solves the problem in geometrical reformulation and shows a good rate of convergence and adequate accuracy for experimental high-dimensional data. Particularly, the algorithm provided high-quality clustering of tactile frames registered by Medical Tactile Endosurgical Complex.


Introduction
We consider the problem of clustering, that is, splitting a finite set  ⊂ R  into disjoint subsets (called clusters) in such a way that points from the same cluster are similar (with respect to some criterion) and points from different clusters are dissimilar (see, e.g., [1]).It is convenient to present the input data in the form of a numerical context (table) whose rows correspond to objects and columns correspond to attributes of the objects.
Formal concept analysis (FCA) is a data analysis method based on applied lattice theory and order theory.The objectattribute binary relation is visualized with the use of the line diagram of the concept lattice.Within the framework of this theory a formal concept is defined as a pair (extent, intent) obeying a Galois connection (for exact definitions see the monograph [2]

by Ganter and Wille).
There exist several generalizations of FCA to fuzzy and numerical contexts.One of them is known as the theory of pattern structures introduced by Ganter and Kuznetsov in [3].An important particular case of pattern concepts, which are the key object in the theory of pattern structures, is interval pattern concepts with the operation of interval intersection.Interval pattern concepts allow one to apply cluster analysis to rows of formal numerical contexts.In this case the criterion of similarity consists in belonging of all the differences between the values of the corresponding attributes to given intervals.
It can be easily seen that the problem of finding an interval pattern concept of maximum extent size (i.e., cardinality) can be reformulated as the problem of the optimal positioning of a -dimensional box with given edge lengths for the given set , that is, finding a position of the box that maximizes the number of points of the set  enclosed by the box (the details are given below in Section 2.2).
The existing algorithms that solve the problem of finding the optimal position of a box do not allow one to obtain an exact or at least approximate solution for high-dimensional data within a reasonable time (see a detailed survey in Section 2.2).The main goal of this paper is to propose a greedy algorithm which gives an approximate solution to this problem and a clustering algorithm based on the optimal positioning problem.We propose a clustering algorithm with  (( log () +  3  1−1/  min  (, ))   min ) worst-case time and () space complexity, where (, ) denotes the number of iterations of the main stage of the

Main Definitions and Statement of the Problem
In this section we start with the main definitions from the theory of formal concepts and then present a geometrical reformulation of the problem of finding the interval pattern concept of maximum extent size (we call it simply the maximum interval pattern concept).

Formal Concepts.
Let us recall the main definitions which we need to formalize our clustering method based on interval pattern concepts.Additional details can be found in [2,3].
Definition 1.An upper (lower) semilattice is a partially ordered set (, ≤) such that for any elements ,  ∈  there exists a unique least upper bound (greatest lower bound, resp.).

Definition 2.
A semilattice operation on the set  is a binary operation ⊓:  ×  that features the following properties for a certain  ∈  and any elements , ,  ∈ : (i)  ⊓  =  (idempotency).
(iv)  ⊓  = .Definition 3. A lattice is an ordered set (, ≤) which is at the same time an upper and a lower semilattice.
A Galois connection between these sets is a pair of maps :  →  and :  →  (each of them is referred to as a Galois operator) such that the following relations hold for any  1 ,  2 ∈  and  1 ,  2 ∈ : Applying the Galois operator twice, namely, (()) and (()), defines a closure operator.Definition 5. A closure operator (⋅) on  is a map that assigns a closure  ⊆  to each subset  ⊆  under the following conditions: (i)  ≤  ⇒  ≤  (monotony).
(iii)  =  (idempotency).Definition 6.A pattern structure is a triple (, (, ⊓), ), where  is a set of objects, (, ⊓) is a meet-semilattice of potential object descriptions, and :  →  is a function that associates descriptions with objects.
The Galois connection between the subsets of the set of objects and the set of descriptions for the pattern structure (, (, ⊓), ) is defined as follows: (2) Definition 7. A pattern concept of the pattern structure (, (, ⊓), ) is a pair (, ), where  ⊆  is a subset of the set of objects and  ∈  is one of the descriptions in the semilattice, such that  ◻ =  and  ◻ = ;  is called the pattern extent of the concept and  is the pattern intent.
A particular case of a pattern concept is the interval pattern concept.The set  consists of the rows of a numerical context, which are treated as tuples of intervals of zero length.An interval pattern concept is a pair (, ), where  is a subset of the set of objects and  is a tuple of intervals with ends determined by the smallest and the largest values of the corresponding component in the descriptions of all objects in .
Interval pattern concepts are convenient to use in the analysis of numerical contexts, when there is a need to divide all data into clusters that comprise objects in which the numerical data is similarly "distributed" in the rows.
For each component of an interval pattern concept we introduce the width : the difference between the largest and the smallest values of the component.Then a clustering procedure can be defined using a standard greedy approach.Specifically, at each step the maximum interval pattern concept is identified, that is, an interval pattern concept with the maximum number of objects, whose width with respect to each component does not exceed a predefined .The objects of the identified interval pattern concept are combined into a cluster and excluded from the set of objects analyzed at subsequent steps.
In Example 1 presented in Table 1 the objects are pupils and the numerical data of the context consist of the grades they got at exams in various disciplines.
We need to divide the set of pupils into clusters in such a way that the grades of pupils in the same cluster differ by at most 1 for each of the disciplines.Such a setting corresponds to  = 1; in this case we obtain 6 clusters (interval pattern concepts whose width is not greater than 1), each containing one pupil.In the case  = 2 we arrive at the same 6 clusters.

Geometry.
Let  be a set of  points in R  ( ∈ N) and  1 ,  2 , . . .,   be a set of positive real numbers.Definition 8.A -orthotope (also called a box) with center  = ( 1 , . . .,   ) ∈ R  and edge lengths  1 ,  2 , . . .,   is the Cartesian product of the intervals It can be easily seen that the problem of identification of maximum interval pattern concept can be reformulated in terms of finding the optimal position of the box with the edge lengths  1 ,  2 , . . .,   , that is, maximizing the number of points of set  enclosed by the box.This formulation can be generalized to the problem of finding the optimal position of a ball in an arbitrary metric space, since any box can be treated as a ball in the stretched  ∞ metric in which the distance (, ) between the points  = ( 1 , . . .,   ) and  = ( 1 , . . .,   ) is defined as The problem of optimal positioning has been well studied for  = 2: some lower and sharp upper bounds on complexity are known (see, e.g., [6,7]).However, to the best of our knowledge for the case of an arbitrary dimension  no lower bounds and no efficient exact algorithms are available so far.de Figueiredo and da Fonseca noted [8] that the problem can be solved exactly in roughly ( +1/ ) time by projecting the points onto a ( + 1)-dimensional paraboloid and using half-space range searching data structures [9].In the same paper for the case of weighted points under certain additional restrictions they also obtained a lower bound Ω(  ) for exact algorithms and indicated that existing algorithms for the unweighted version of the problem do not beat this lower bound in the worst case.Eckstein et al. showed that a generalization of the problem of optimal positioning whose input also includes a set of prohibited points is NP-hard [10].
Known approximate algorithms for optimal positioning also have time complexity which depends on  exponentially.For example, de Figueiredo and da Fonseca suggested an approximate algorithm [8] which solves the problem in worst-case time (3  / −1 ), where 0 <  < 1 is a given approximation parameter.Due to exponential dependence on  these approximate algorithms are also practically inapplicable in the case of high dimension, and there is a need to develop an algorithm which can produce an approximate solution in reasonable time.

A Greedy Algorithm for Finding an Approximately Optimal Position of a Box
In this section we present a greedy algorithm for finding an approximately optimal position of a box with edge lengths  1 ,  2 , . . .,   for a set  = {  }  =1 ⊂ R  (the order in which points are listed in  is insignificant).This algorithm is auxiliary for the clustering method described in Section 4.
The algorithm has several input parameters: positive real numbers ,  min ,  < 1, and a function : N × N → N. The parameters ,  min , and  regulate the duration of one iteration.The function  takes the values  and  as inputs and returns the number of iterations at the main stage of the algorithm.Greater number of iterations and greater duration of each iteration provide better approximation.
The algorithm includes two basic stages: the preprocessing stage and the main stage.

Preprocessing
(1) At the first stage of our algorithm the box with the edge lengths  1 ,  2 , . . .,   is transformed into the unit cube (we call it simply the cube) by means of dividing the th coordinate of each point by   ,  = 1, . . ., .This stage can be performed in () operations.
(2) We consider the integer lattice with edges of length 1, compute the number of points of  in each cell, and denote the cell that contains the maximum number of points by  0 .The cell  0 is called the base cube.Let  0 ∈ R  denote the center of  0 .This stage requires () operations as well.
(3) At the final step of the preprocessing stage we build a - tree data structure (which is used at the main stage to organize the fast range search) in ( log()) operations with the space complexity of () (see [11,12]).

The Main Stage.
Let : 2 R  → Z + denote the function which counts number of points of the set  in an arbitrary subset of R  .The main idea of our algorithm consists in constructing a finite sequence of cubes that starts from a random point  in the base cube and satisfies the condition that the next cube contains more points than the previous one.Let  , and take  +1 =   .In other words, if there exists a cube in the   -neighborhood of the current cube which contains more points of , then we move the current cube to this position.
(2) If there are no such cubes (i.e., all cubes in the  neighborhood of the current cube contain at most the same number of points), then we set   +1 =    ,   +1 =    , and  +1 =   (i.e., decrease the current step size).If  +1 <  min (the step size threshold is reached), then the procedure is ended and    is returned as the procedure result.
In order to obtain acceptable time complexity we impose additional restrictions on the selection of the next cube.These assumptions are necessary to avoid the situation where the length of the sequence grows exponentially with .Validation on experimental data confirmed that these restrictions do not essentially affect the clustering results.
Restriction 1.All cubes in the sequence must have common points with the base cube  0 .
In Figure 1 we present an example of a set  for which this requirement causes a significant difference between the exact solution and the solution produced by the algorithm.However, this difference is essentially reduced at further steps of the clustering algorithm as generally it affects only the order in which clusters are constructed.There is no way to move from the red cube to the blue one without losing touch with the base cube.
Restriction 2. For each individual coordinate it is not allowed to translate the cube in the opposite directions at different steps of the procedure described above.
The above restrictions lead to the following lemma.

Lemma 10. The main stage of the algorithm has
worst-case time complexity.
Proof.First we get an upper estimate for the length () of the sequence of cubes (for an arbitrary  ∈  0 ).Due to Restrictions 1 and 2 we have
Thus, Restrictions 1 and 2 avoid the situation where the length of the sequence grows exponentially with .Each step of the procedure of constructing the sequence of cubes requires 2 evaluations of the function  for the cubes (i.e., 2 range searches).With the use of a - tree the range search can be performed with ( 1−1/ ) worst-case time complexity (see [13]).The procedure of constructing the sequence of cubes involves (, ) iterations, so the above complexity bound holds.
Note that we also have a trivial estimate () ≤ , as (   ) grows, and hence () ≤ (  () ) ≤ .Thus, without the imposed restrictions the worst-case complexity estimate holds, and hence the restrictions can be omitted without violation of practical feasibility in case if the number of objects  has the same order as the dimensionality .
Journal of Applied Mathematics 5

Precision and Complexity of the Algorithm
Theorem 11.Let  0 be a center of  0 ,  g =   0 ( 0 ) be a cube produced by an algorithm iteration which was initialized with  0 (and so for this iteration   0 1 =  0 ), and   be an optimal cube (i.e.,   ∈ arg max (), where maximum is taken over all unit cubes in R  ).Then and this estimate is sharp.
Proof.The upper estimate is trivial.The lower estimate follows from the fact that  opt is covered by at most 2  cells of the integer lattice with edges of length 1, and hence An example that shows that the estimate is sharp is similar to the example from Figure 1.For example, we can locate the center of  opt at the integer lattice node and put 2  points in  opt in such a way that each cell of the integer lattice contains at most one of these points.Then, we select an arbitrary cell of the integer lattice that is distant from  opt and put one point to this cell, which becomes  0 .
Proof.Combining the estimates for the time and space complexity of the preprocessing stage and the main stage of the algorithm gives the bounds mentioned above.

Clustering Algorithm
Now let us consider the clustering problem, that is, the problem of splitting the given set  = {  }  =1 ⊂ R  into mutually disjoint subsets  1 , . . .,   .Following interval pattern concept approach, we construct clusters with controlled interval pattern concept width.We propose a clustering algorithm based on the greedy approach and the procedure for finding an approximately optimal position of a box described in Section 3. The algorithm is not sensitive to the order in which points  are given.The parameters of the algorithm include positive real numbers  1 ,  2 , . . .,   and all parameters of the positioning algorithm, namely, ,  min , , and (, ).
First, we put  1 =  and find an approximately optimal position  1 of the box with the edge lengths  1 , . . .,   for the set  1 .Now suppose that the sets  1 , . . .,   and  1 , . . .,   have been already constructed and let  +1 =   \   .If  +1 = ⌀ then the procedure is ended.Else we find an approximately optimal position  +1 of the box for the set  +1 .The output of this procedure is a set of clusters   =   ∩   .
In order to avoid producing a lot of small clusters consisting of outliers we impose one more restriction.
Restriction 3. The resulting clusters must include at least  min objects.
With this restriction if the size of  +1 ∩  +1 is less than  min then the procedure ends (and points belonging to  \ ( 1 ∪ ⋅ ⋅ ⋅ ∪   ) are considered unclustered and referred to as outliers).
Restriction 3 together with Theorem 12 immediately leads to the following theorem.

Theorem 13. The clustering algorithm has
worst-case time complexity and () space complexity.

Validation
Validation of the clustering algorithm developed in this study was performed on a dataset of tactile images registered by the Medical Tactile Endosurgical Complex (MTEC) during examination of artificial samples.MTEC allows intraoperative mechanoreceptor tactile examination of tissues and is already used in endoscopic surgery [14][15][16].As methods for automated analysis of medical tactile images are still insufficient, validation results in particular and the developed clustering algorithm in general provide new opportunities for the medical domain applications.
The key component of MTEC is a tactile mechanoreceptor [17, Fig. 1].Its operating head is equipped with 19 pressure sensors that perform synchronous measurements 100 times per second.Each measurement result (called "a tactile frame" and consisting of 19 values) is wirelessly transmitted to a computer that performs preprocessing and visualization.The sensors are located at the operating head surface which is a circle with diameter 20 mm.
In order to create a dataset of tactile images we utilized MTEC for tactile examinations of three types of artificial samples.The samples were similar to the L-samples utilized in the study [17]-they were made using a soft silicone (Ecoflex 00-10, Shore hardness 00-10A) according to manufacturer's instructions and had a shape of a rectangular block with length, width, and height of 40 mm, 35 mm, and 11 mm, respectively.The difference was in sizes and shapes of hard inclusions enclosed in the samples.For the first sample type (ST1) the inclusion had a form of a spherical cap with base diameter 8 mm and height 2.4 mm oriented for palpation from the convex side.For the second sample type (ST2) the inclusion had a form of a spherical cap with a base diameter 4.7 mm and height 1.7 mm also oriented for palpation from the convex side.For the third sample type (ST3) the inclusions were the same as for ST2, but they were oriented for palpation from the flat side.For all sample types the inclusions were located in the center at height of approximately 3 mm.Thus, sample types were similar with a difference in either size or convexity of the inclusion.These samples simulated tissue with malignant neoplasms.
Totally 55 tactile examinations of the described samples were performed using MTEC.The contact angle was kept approximately equal to 90 ∘ , and inclusions were located close to the center of the operating head surface.We performed twenty-two, seventeen, and sixteen examinations for samples of ST1, ST2, and ST3 types, respectively.For each examination one tactile frame was selected, namely, the one with the largest standard deviation (SD) of values, and other tactile frames were disregarded.Visualization of tactile frames for each sample type is presented in Figures 2(a)-2(c).
Thus, each examination was associated with a point in R 19 , and the total number of points was 55.This set of points was clustered using the developed clustering algorithm, and the results were compared with the results of -means clustering ( = 3, Euclidean distance; see, e.g., [1]), which was used as a reference.Scikit-learn implementation [18] of -means algorithm was utilized.Adjusted and raw Rand indexes (clustering result versus original classes; see, e.g., [1]) were used as compared characteristics of the clustering quality.Note that both clustering algorithms use random initialization, so multiple runs were performed for clustering quality estimation (specifically, 100 runs were performed to estimate Rand index for each algorithm with given parameters).
The results produced by both the proposed algorithm and by the -means algorithm were unsatisfactory.However, the poor quality of the resulting clustering was predictable as examining of a single sample can result in tactile frames that are essentially different with respect to representation by a point in R 19 due to rotation and slight shifts of a tactile mechanoreceptor.
To get better results we mapped the data to the new 9dimensional space of attributes.The new attributes included (i) SD of all values in a tactile frame; (ii) mean and SD of the values corresponding to 7 middle sensors; (iii) mean and SD of the values corresponding to 12 outer sensors; (iv) mean and SD of the values corresponding to sensors that belong to the main diagonals (3 diagonals each consisting of 5 sensors, 13 sensors in total; see  Transition to the new attribute space essentially improved the clustering quality, but our algorithm left 10-14 points as outliers ( min was set equal to 8; the values of ,  min , and  were set equal to 0.9, 0.3, and 0.8 respectively, and  was set equal to 0.27 for all attributes).A representative result of one run is presented in Table 2. Then we placed outliers points to the obtained clusters by the -nearest neighbors algorithm ( = 8, unweighted; see, e.g., [19]).A representative result for one run is presented in Table 3.
Table 4 contains mean values and SDs for Rand indices and timing information.
As one can see, the proposed algorithm has an acceptable running time, and both our and -means algorithm reach mean quality plateau already at 20 iterations.
The advantage of the proposed algorithm over the means algorithm with respect to the clustering quality was statistically significant.For example, for 20 iterations and adjusted Rand index the comparison of our algorithm with outliers and the -means on 100 runs resulted in Mann-Whitney -test two-tailed  value equal to 1.0 ⋅ 10 −10 .As outliers are the points that are the most difficult for clustering, the advantage of our algorithm complemented by kNN-attributing of outliers to clusters over the -means was lower but still firmly significant with Mann-Whitney -test two-tailed  value equal to 9.5 ⋅ 10 −4 .Interestingly, the transition to the new attribute space improved the quality of our algorithm more than the quality of the -means clustering.For example, for 20 iterations, adjusted Rand index, and 100 runs, the comparison of the clustering quality for the initial attribute space and the new attribute space resulted in Mann-Whitney -test two-tailed  values not exceeding 10 −12 for both "with outliers" and "no outliers" versions of our algorithm, while for -means the  value was 0.43.

Conclusions
In this paper we proposed a greedy clustering algorithm based on interval pattern concepts.The obtained theoretical estimate on algorithm complexity proved computational feasibility for high-dimensional spaces, and the validation on experimental data demonstrated high quality of the resulting clustering in comparison with conventional clustering algorithms such as -means.

𝑦 1 ,
. . .,   () denote these cubes with centers   1 , . . .,   () , respectively.In our notation we have   1 =  and (   ) < (  +1 ) for all  ∈ {1, . . ., () − 1}.After (, ) iterations the algorithm returns a locally optimal cube .Definition 9.The -neighborhood of a cube  with center  = ( 1 , . . .,   ) is the set consisting of all cubes with centers at points of the form ( 1 , . . .,  −1 ,   ± ,  +1 , . . .,   ) for all  ∈ {1, . . ., }, that is, all cubes obtained through translation of  along one of the axes by the distance ±.Now we describe the procedure of constructing the sequence of cubes.Let  be an arbitrary point in the base cube  0 and   1 = ,   1 be the cube with center at   1 ,  1 = .In order to get a definite estimate on the precision of the algorithm (see Theorem 11) we initialize the first iteration deterministically by taking the center of  0 as .Other iterations are initialized randomly.Suppose that the cubes   1 , . . .,    with centers   1 , . . .,    , respectively, and the numbers  1 , . . .,   have been already constructed.There are two possible cases.(1) If there exists a cube  in the   -neighborhood of    such that () > (   ), then we set   +1 = , take the center of  as   +1

Figure 1 :
Figure 1: The base cube is colored red; the global optimum is blue.There is no way to move from the red cube to the blue one without losing touch with the base cube.
Figure 2(d) for details); (v) mean and SD of the values corresponding to sensors that belong to the secondary diagonals (6 diagonals

Figure 2 :
Figure 2: (a-c) Examples of tactile frames for examinations of ST1 (a), ST2 (b), and ST3 (c) type samples.Pressure values are scaled to [0, 255] segment and color-coded.(d) Correspondence between sensors and attributes from the new attribute space.Each hexagon represents one sensor.Middle sensors are colored in light-gray, outer sensors are colored in dark-gray.The main diagonals are shown by orange lines; the secondary diagonals are shown by blue lines.Centers of the hexagons that represent sensors belonging to both main and secondary diagonals are colored in red; belonging only to main diagonals, in orange; belonging only to secondary diagonals, in blue.

Table 1 :
A fuzzy formal context, where the objects are pupils and the attributes are disciplines.

Table 2 :
Correspondence between the original classes and the clusters constructed by the proposed algorithm (with outliers).

Table 3 :
Correspondence between the original classes and the clusters constructed by the proposed algorithm (no outliers).

Table 4 :
Dependency of Rand index values and the running time for our and -means clustering methods on number of iterations performed (100 program runs for each value).Values of Rand index are presented in terms of medians and interquartile ranges (IQR).Particular results obtained during validation, such as a new attribute space for tactile frames registered by the Medical Tactile Endosurgical Complex, have individual significance as they provide new opportunities for the medical domain applications aimed at automated analysis of tactile images.