We consider a clustering approach based on interval pattern concepts. Exact algorithms developed within this framework cannot produce a solution for high-dimensional data in reasonable time, so we propose a fast greedy algorithm that solves the problem in a geometrical reformulation and shows a good rate of convergence and adequate accuracy on experimental high-dimensional data. In particular, the algorithm provided high-quality clustering of tactile frames registered by the Medical Tactile Endosurgical Complex.
We consider the problem of clustering, that is, splitting a finite set
Formal concept analysis (FCA) is a data analysis method based on applied lattice theory and order theory. The object-attribute binary relation is visualized using the line diagram of the concept lattice. Within this theory, a formal concept is defined as a pair (extent, intent) obeying a Galois connection (for exact definitions see the monograph [
There exist several generalizations of FCA to fuzzy and numerical contexts. One of them is known as the theory of pattern structures introduced by Ganter and Kuznetsov in [
It can be easily seen that the problem of finding an interval pattern concept of maximum extent size (i.e., cardinality) can be reformulated as the problem of the optimal positioning of a
The existing algorithms for finding the optimal position of a box do not yield an exact or even an approximate solution for high-dimensional data within a reasonable time (see a detailed survey in Section
The rest of the paper is organized as follows. In Section
In this section we start with the main definitions from the theory of formal concepts and then present a geometrical reformulation of the problem of finding the interval pattern concept of maximum extent size (we call it simply the
Let us recall the main definitions which we need to formalize our clustering method based on interval pattern concepts. Additional details can be found in [
An
A
A
Let
Applying the Galois operator twice, namely,
A
A
The Galois connection between the subsets of the set of objects and the set of descriptions for the pattern structure
A
A particular case of a pattern concept is the interval pattern concept. The set
Interval pattern concepts are convenient to use in the analysis of numerical contexts, when there is a need to divide all data into clusters that comprise objects in which the numerical data is similarly “distributed” in the rows.
For each component of an interval pattern concept we introduce the width
In Example
A fuzzy formal context, where the objects are pupils and the attributes are disciplines.
Arts  Mathematics  Computer science  Sports  

A  9  9  10  9 
B  8  2  6  5 
C  6  5  10  7 
D  8  9  9  6 
E  8  4  6  9 
F  6  5  2  10 
We need to divide the set of pupils into clusters in such a way that the grades of pupils in the same cluster differ by at most 1 for each of the disciplines. Such a setting corresponds to
When
Clustering methods based on interval pattern concepts find applications in the analysis of experimental data. For instance, applications of such methods to gene expression analysis were discussed in [
Let
A
It is easy to see that the problem of identifying a maximum interval pattern concept can be reformulated in terms of finding the
The problem of optimal positioning has been well studied for
Known approximate algorithms for optimal positioning also have time complexity which depends on
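To make the optimal positioning problem concrete, the following is a minimal brute-force sketch, not one of the algorithms surveyed above. It assumes a closed axis-parallel box and relies on the observation that an optimal box can always be shifted until each of its lower faces passes through the corresponding coordinate of some data point, so it suffices to try lower corners built from point coordinates. The function names `count_in_box` and `best_box_bruteforce` are illustrative only. The number of candidate corners grows as the number of points raised to the power of the dimension, which shows why exhaustive search is impractical for high-dimensional data.

```python
from itertools import product


def count_in_box(points, lower, edges):
    """Count points inside the closed box [lower, lower + edges]."""
    return sum(
        all(lo <= x <= lo + e for x, lo, e in zip(p, lower, edges))
        for p in points
    )


def best_box_bruteforce(points, edges):
    """Exhaustive search over lower corners built from point coordinates.

    Illustrative baseline only: the number of candidates is n**d.
    """
    # Per-dimension candidate coordinates for the lower corner.
    axes = [sorted({p[j] for p in points}) for j in range(len(edges))]
    best_lower, best_count = None, -1
    for lower in product(*axes):
        c = count_in_box(points, lower, edges)
        if c > best_count:
            best_lower, best_count = lower, c
    return best_lower, best_count


points = [(0, 0), (0.5, 0.5), (1, 1), (3, 3), (3.2, 0)]
lower, count = best_box_bruteforce(points, edges=(1, 1))
print(lower, count)  # (0, 0) 3
```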
In this section we present a greedy algorithm for finding an approximately optimal position of a box with edge lengths
The algorithm has several input parameters: positive real numbers
The algorithm includes two basic stages: the preprocessing stage and the main stage.
At the first stage of our algorithm the box with the edge lengths
We consider the integer lattice with edges of length 1, compute the number of points of
At the final step of the preprocessing stage we build a
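The counting step of the preprocessing stage can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: it assumes that the box is normalized to a unit cube by dividing each coordinate by the corresponding edge length, and that the lattice is anchored at the origin. The function name `lattice_counts` is hypothetical. Only non-empty cells are stored, so memory use is bounded by the number of points rather than by the number of lattice cells.

```python
from collections import Counter
from math import floor


def lattice_counts(points, edges):
    """Count points per cell of the unit integer lattice after rescaling.

    Assumption: dividing coordinate j by edges[j] turns the box into a
    unit cube, so each point falls into the cell given by the floor of
    its rescaled coordinates.
    """
    counts = Counter()
    for p in points:
        cell = tuple(floor(x / e) for x, e in zip(p, edges))
        counts[cell] += 1
    return counts


points = [(0.2, 0.3), (0.9, 0.1), (1.5, 0.5), (2.5, 4.1)]
counts = lattice_counts(points, edges=(1.0, 2.0))
print(counts.most_common(1))  # [((0, 0), 2)]
```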
Let
The
Now we describe the procedure of constructing the sequence of cubes. Let
Suppose that the cubes
If there exists a cube
If there are no such cubes (i.e., all cubes in the
In order to obtain acceptable time complexity we impose additional restrictions on the selection of the next cube. These assumptions are necessary to avoid the situation where the length of the sequence grows exponentially with
In Figure
The base cube is colored red; the global optimum is blue. There is no way to move from the red cube to the blue one without losing touch with the base cube.
The above restrictions lead to the following lemma.
The main stage of the algorithm has
First we get an upper estimate for the length
Note that we also have a trivial estimate
Let
The upper estimate is trivial. The lower estimate follows from the fact that
The algorithm for finding an approximately optimal position of the box has
Combining the estimates for the time and space complexity of the preprocessing stage and the main stage of the algorithm gives the bounds mentioned above.
Note that omitting Restrictions
Now let us consider the clustering problem, that is, the problem of splitting the given set
First, we put
In order to avoid producing a lot of small clusters consisting of outliers we impose one more restriction.
With this restriction if the size of
Restriction
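The overall clustering scheme can be illustrated by the sketch below. It rests on two assumptions that are not taken from the text: clusters are extracted one at a time as the contents of the best box found among the remaining points, and the process stops as soon as the best box holds fewer than `min_size` points, with all remaining points reported as outliers. The box search `best_box` is a simple exhaustive stand-in (exponential in the dimension), not the greedy algorithm proposed here; all names are hypothetical.

```python
from itertools import product


def best_box(points, edges):
    """Stand-in exhaustive box search; NOT the greedy algorithm of the paper."""
    axes = [sorted({p[j] for p in points}) for j in range(len(edges))]

    def inside(p, lower):
        return all(lo <= x <= lo + e for x, lo, e in zip(p, lower, edges))

    # Return the largest set of points covered by a candidate box.
    return max(
        ([p for p in points if inside(p, lower)] for lower in product(*axes)),
        key=len,
    )


def peel_clusters(points, edges, min_size):
    """Extract clusters one by one; leftovers are treated as outliers."""
    remaining = list(points)
    clusters = []
    while remaining:
        cluster = best_box(remaining, edges)
        if len(cluster) < min_size:  # assumed stopping rule
            break
        clusters.append(cluster)
        remaining = [p for p in remaining if p not in cluster]
    return clusters, remaining


points = [(0, 0), (0.2, 0.1), (0.4, 0.3), (5, 5), (5.1, 5.2), (9, 0)]
clusters, outliers = peel_clusters(points, edges=(1, 1), min_size=2)
print([len(c) for c in clusters], outliers)  # [3, 2] [(9, 0)]
```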
The clustering algorithm has
If Restrictions
Validation of the clustering algorithm developed in this study was performed on a dataset of tactile images registered by the Medical Tactile Endosurgical Complex (MTEC) during examination of artificial samples. MTEC allows intraoperative mechanoreceptor tactile examination of tissues and is already used in endoscopic surgery [
The key component of MTEC is a tactile mechanoreceptor [
In order to create a dataset of tactile images we utilized MTEC for tactile examinations of three types of artificial samples. The samples were similar to the Lsamples utilized in the study [
In total, 55 tactile examinations of the described samples were performed using MTEC. The contact angle was kept approximately equal to
(a–c) Examples of tactile frames for examinations of ST1 (a), ST2 (b), and ST3 (c) type samples. Pressure values are scaled to
Thus, each examination was associated with a point in
The results produced by both the proposed algorithm and by the
To get better results we mapped the data to a new 9-dimensional space of attributes. The new attributes included:
SD of all values in a tactile frame;
mean and SD of the values corresponding to 7 middle sensors;
mean and SD of the values corresponding to 12 outer sensors;
mean and SD of the values corresponding to sensors that belong to the main diagonals (3 diagonals each consisting of 5 sensors, 13 sensors in total; see Figure
mean and SD of the values corresponding to sensors that belong to the secondary diagonals (6 diagonals each consisting of 4 sensors, 12 sensors in total; see Figure
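The mapping to the 9-dimensional attribute space can be sketched as follows. The sensor geometry used here is an assumption: 19 sensors placed on a hexagonal grid of radius 2, which reproduces the group sizes quoted above (7 middle, 12 outer, 13 on main diagonals, 12 on secondary diagonals) but is not confirmed by the text. The ordering of sensors within a frame, the use of the population SD, and the names `SENSORS`, `GROUPS` and `frame_features` are likewise hypothetical.

```python
from statistics import mean, pstdev

# Assumed geometry: hexagonal grid of radius 2 in cube coordinates
# (q, r, s) with q + r + s = 0. The sensor order is hypothetical.
SENSORS = [
    (q, r, -q - r)
    for q in range(-2, 3)
    for r in range(-2, 3)
    if max(abs(q), abs(r), abs(q + r)) <= 2
]


def ring(cell):
    """Distance of a cell from the central sensor."""
    return max(abs(c) for c in cell)


GROUPS = {
    "middle": [i for i, c in enumerate(SENSORS) if ring(c) <= 1],
    "outer": [i for i, c in enumerate(SENSORS) if ring(c) == 2],
    # Lines through the centre: one cube coordinate equals 0.
    "main_diagonals": [i for i, c in enumerate(SENSORS) if 0 in c],
    # Lines adjacent to the centre lines: one cube coordinate equals +-1.
    "secondary_diagonals": [
        i for i, c in enumerate(SENSORS) if 1 in map(abs, c)
    ],
}


def frame_features(frame):
    """Map a 19-value tactile frame to the 9 attributes listed above."""
    if len(frame) != len(SENSORS):
        raise ValueError("expected one value per sensor")
    features = [pstdev(frame)]  # SD of all values
    for name in ("middle", "outer", "main_diagonals", "secondary_diagonals"):
        values = [frame[i] for i in GROUPS[name]]
        features += [mean(values), pstdev(values)]
    return features


print({name: len(idx) for name, idx in GROUPS.items()})
print(len(frame_features([float(i) for i in range(19)])))  # 9
```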
The transition to the new attribute space substantially improved the clustering quality, but our algorithm left 10–14 points as outliers (
Correspondence between the original classes and the clusters constructed by the proposed algorithm (with outliers).
1st cluster  2nd cluster  3rd cluster  Unclustered

ST1  9 points  1 point  5 points  7 points
ST2  0 points  12 points  3 points  2 points
ST3  0 points  0 points  14 points  2 points
Correspondence between the original classes and the clusters constructed by the proposed algorithm (no outliers).
1st cluster  2nd cluster  3rd cluster

ST1  11 points  3 points  8 points
ST2  0 points  14 points  3 points
ST3  0 points  0 points  16 points
Table
Dependency of Rand index values and the running time for our and
Number of iterations  Clustering method  Rand index median  Rand index IQR  Average running time (in seconds)

20  Our method  0.43/0.73  0.12/0.05  0.8
    Our method  0.39/0.73  0.10/0.04  0.8
                0.32/0.70  0.21/0.09  0.02

50  Our method  0.43/0.73  0.08/0.05  2.4
    Our method  0.39/0.73  0.06/0.03  2.5
                0.27/0.68  0.20/0.09  0.05

100  Our method  0.42/0.74  0.08/0.04  4.2
     Our method  0.39/0.73  0.06/0.03  4.3
                 0.31/0.70  0.20/0.09  0.09
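For reference, the Rand index is the fraction of point pairs on which two partitions agree, that is, pairs placed together in both partitions or separated in both. The sketch below computes it from a class-by-cluster contingency table; as input it uses the counts of the correspondence table without outliers given above. The function name `rand_index` is illustrative, and the resulting value describes that single table only, not the medians reported here.

```python
from math import comb


def rand_index(table):
    """Rand index from a contingency table (rows: classes, columns: clusters)."""
    n = sum(sum(row) for row in table)
    total_pairs = comb(n, 2)
    # Pairs that share both the class and the cluster.
    same_both = sum(comb(x, 2) for row in table for x in row)
    same_class = sum(comb(sum(row), 2) for row in table)
    same_cluster = sum(comb(sum(col), 2) for col in zip(*table))
    # Agreements = pairs together in both + pairs separated in both.
    agreements = total_pairs + 2 * same_both - same_class - same_cluster
    return agreements / total_pairs


# Counts from the correspondence table without outliers (ST1, ST2, ST3).
table = [
    [11, 3, 8],
    [0, 14, 3],
    [0, 0, 16],
]
print(round(rand_index(table), 3))  # 0.711
```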
As one can see, the proposed algorithm has an acceptable running time, and both our and
The advantage of the proposed algorithm over the
Interestingly, the transition to the new attribute space improved the quality of our algorithm more than the quality of the
In this paper we proposed a greedy clustering algorithm based on interval pattern concepts. The theoretical complexity estimate obtained for the algorithm shows that it is computationally feasible in high-dimensional spaces, and the validation on experimental data demonstrated high quality of the resulting clustering in comparison with conventional clustering algorithms such as
Particular results obtained during validation, such as a new attribute space for tactile frames registered by the Medical Tactile Endosurgical Complex, have individual significance as they provide new opportunities for the medical domain applications aimed at automated analysis of tactile images.
The dataset of tactile frames used for the validation and the Python script that implements the developed clustering algorithm are available upon request from the authors.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors thank Dr. Alexey V. Galatenko and Dr. Vladimir V. Galatenko for valuable comments and discussions. The research was supported by the Russian Science Foundation (Project 16-11-00058 “The Development of Methods and Algorithms for Automated Analysis of Medical Tactile Information and Classification of Tactile Images”).