Mathematical Problems in Engineering (MPE)

ISSN: 1563-5147, 1024-123X

Hindawi Publishing Corporation

DOI: 10.1155/2016/5427923

Article ID: 5427923

Editorial

Theory and Applications of Data Clustering

Costas Panagiotakis¹, Emmanuel Ramasso², Paraskevi Fragopoulou³, Daniel Aloise⁴

¹ Department of Business Administration, TEI of Crete, 72100 Agios Nikolaos, Greece (teicrete.gr)

² Department of Automatic Control & Micro-Mechatronic Systems and Applied Mechanics Department, FEMTO-ST Institute, UMR CNRS 6174-UBFC/ENSMM/UTBM, 25000 Besançon, France (femto-st.fr)

³ Department of Informatics Engineering, TEI of Crete, 71004 Heraklion, Greece (teicrete.gr)

⁴ Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte, 59072-970 Natal, Brazil (ufrn.br)

2016

222016

2016 17 01 2016 17 01 2016

2016

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This special issue focuses on fundamental and practical issues in data clustering [1–6]. Data clustering aims at organizing a set of records into groups so that the similarity between records within a group is maximized while the similarity to records in other groups is minimized. It is an active research problem with an increasing number of applications. For decades, data clustering problems have been identified in many domains, such as computer vision and pattern recognition (e.g., video and image analysis for information retrieval, object recognition, image segmentation, and point clustering), networks (e.g., identification of web communities), databases and computing (e.g., privacy protection in databases), and statistical physics and mechanics (e.g., understanding phase transitions, vibration control, and fracture identification using acoustic emission data). In addition, several definitions and validation measures [3, 7] of the data clustering problem have been used in different engineering applications. For instance, the goal of the classical clustering problem is to find the clusters that optimize a predefined criterion, while the goal of the microaggregation problem [8] is to determine the clusters under the constraint of a given minimum cluster size in order to mask microdata.
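The contrast between optimizing a criterion and respecting a minimum cluster size can be made concrete with a small sketch. It assumes, purely for illustration, that the predefined criterion is the within-cluster sum of squared Euclidean distances (one common choice, not one prescribed by this editorial); the function names `within_cluster_sse` and `satisfies_min_size` and the toy data are hypothetical.

```python
from collections import Counter


def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    Assumption: this stands in for the 'predefined criterion' (lower is better).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total


def satisfies_min_size(labels, k):
    """Microaggregation-style constraint: every cluster holds at least k records."""
    return all(count >= k for count in Counter(labels).values())


# Toy data: two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # groups distant points together

print(within_cluster_sse(points, good))    # 4.0
print(within_cluster_sse(points, bad))     # 100.0
print(satisfies_min_size(good, 2))         # True
print(satisfies_min_size([0, 0, 0, 1], 2)) # False
```

In this sketch the classical problem would prefer the `good` labelling because its criterion value is lower, whereas a microaggregation method would additionally reject any labelling for which `satisfies_min_size` is false.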

In this special issue, the selected papers address both the theory and the applications of data clustering. They propose new methods that have been successfully applied to several clustering problems, including image segmentation [9, 10], time series clustering [4], graph clustering (community detection) [11, 12], and (stock) recommendation systems [13, 14]. Image segmentation is a key step in many image analysis and interpretation tasks. Finding semantic regions is the ultimate goal of segmentation for image understanding, and it has become a necessity for many applications, such as content-based image retrieval and object recognition. The goal of time series clustering is to partition time series into clusters based on similarity or distance criteria, so that time series in the same group are similar to each other and dissimilar to the time series in other groups. Concerning the community detection problem, networks are usually composed of subgroups whose internal connections are dense and whose connections to other subgroups are sparse; this organization is called community structure. Detecting the community structure of a network is a fundamental problem in complex networks, and it has many variations. Community detection is often an NP-hard problem, and traditional methods for detecting communities in networks fall into two categories: graph partitioning and hierarchical clustering. A recommender system tries to predict the behavior of a complex system by producing a list of recommendations. In stock recommendation, which has become a hot topic, most methods try to integrate multiple technologies, such as data mining, machine learning, herd psychology, and other nontraditional technologies.
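The notion of community structure (dense connections inside groups, sparse connections between them) can be illustrated by counting edges. The following is a minimal sketch on a made-up graph; the function name `edge_counts` and the two candidate assignments are hypothetical, and the sketch only evaluates a given assignment rather than detecting communities.

```python
def edge_counts(edges, community):
    """Return (intra, inter): edges inside a community vs. edges between communities."""
    intra = sum(1 for u, v in edges if community[u] == community[v])
    return intra, len(edges) - intra


# Toy undirected graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Assignment that follows the two triangles.
natural = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# Arbitrary assignment that ignores the graph structure.
mixed = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}

print(edge_counts(edges, natural))  # (6, 1)
print(edge_counts(edges, mixed))    # (2, 5)
```

The assignment that follows the triangles keeps six of the seven edges inside communities, which matches the description of community structure, while the arbitrary assignment cuts most edges.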

During the last decades, thousands of clustering algorithms have been published [1]. Clustering methods can be classified into five major categories [2]: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. A partitioning method constructs a (crisp or fuzzy) partition of the data, where each part represents a cluster. The partition is called crisp if each object belongs to exactly one cluster and fuzzy if an object is allowed to belong to more than one cluster at the same time. Hierarchical clustering algorithms recursively find nested clusters in either agglomerative (bottom-up) or divisive (top-down) mode. Agglomerative algorithms start with each point as a separate cluster and successively merge the most similar pair of clusters. In contrast, divisive algorithms start with all the data points in one cluster and recursively divide each cluster into smaller clusters. In both cases, the result is a hierarchical structure (e.g., a dendrogram) that represents the merging or dividing steps of the method. Density-based methods continue growing a cluster as long as its density (the number of data objects in the “neighborhood”) exceeds a threshold. Grid-based methods quantize the object space into a finite number of cells that form a grid structure. They then summarize the data objects located in each cell with statistical attributes, and clustering is performed on the grid cells instead of on the data objects themselves. Model-based methods assume a model for each of the clusters and attempt to best fit the data to the assumed model.
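The agglomerative (bottom-up) mode described above can be sketched in a few lines. This is a naive illustration, not an implementation from any of the cited papers: it assumes Euclidean distance and single linkage (cluster distance equals the closest pair of points) as the similarity rule, and the names `single_linkage`, `points`, and `num_clusters` are hypothetical.

```python
import math


def single_linkage(points, num_clusters):
    """Naive agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until num_clusters remain. Returns the clusters (as lists of
    point indices) and the merge history, which plays the role of a dendrogram.
    """
    num_clusters = max(1, num_clusters)
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((list(clusters[x]), list(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = single_linkage(points, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

The sketch recomputes all pairwise cluster distances at every step, so it is only suitable for small inputs; it is meant to show the merge loop, and stopping at a different `num_clusters` corresponds to cutting the dendrogram at a different level.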

Defining a metric that can validate clusters of different densities and/or sizes is an open problem. In the literature, several clustering validity measures have been proposed to assess the quality of a clustering [3, 7, 15]. In addition, clustering validity measures make it possible to compare the performance of clustering algorithms and to improve their results by searching for a local optimum of the chosen measure.
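To illustrate how a validity measure can be used to compare two clusterings, the sketch below computes the mean silhouette width. The silhouette is a widely used measure chosen here only as an example; it is not one of the measures proposed in [3, 7, 15], and the function name `mean_silhouette` and the toy data are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to the members of any other
    cluster; the silhouette is (b - a) / max(a, b). Values near 1 suggest
    compact, well-separated clusters; negative values suggest misassignment.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # The distance from p to itself is 0, so dividing by len(own) - 1
        # gives the mean distance to the *other* members.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]  # nearby points share a cluster
bad = [0, 1, 0, 1]   # distant points share a cluster

print(mean_silhouette(points, good) > mean_silhouette(points, bad))  # True
```

Under this measure the labelling that groups nearby points scores higher, which is the kind of comparison between algorithm outputs that validity measures enable.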

The papers published in this special issue are novel and present interesting methods and applications of data clustering. We believe that they will motivate further research in the field.

Acknowledgments

The guest editors wish to express their sincere gratitude to the authors and reviewers who contributed greatly to the success of this special issue. We would also like to thank the editorial board members of this journal for their support and help throughout the preparation of this special issue.

Costas Panagiotakis
Emmanuel Ramasso
Paraskevi Fragopoulou
Daniel Aloise

References

[1] Jain, A. K. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010. DOI: 10.1016/j.patrec.2009.09.011

[2] Liao, T. W. Clustering of time series data—a survey. Pattern Recognition, vol. 38, no. 11, pp. 1857–1874, 2005. DOI: 10.1016/j.patcog.2005.01.025

[3] Panagiotakis. Point clustering via voting maximization. Journal of Classification, vol. 32, no. 2, pp. 212–240, 2015. DOI: 10.1007/s00357-015-9182-2

[4] Ramasso; Placet; Boubakar, M. L. Unsupervised consensus clustering of acoustic emission time-series for robust damage sequence estimation in composites. IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 12, pp. 3297–3307, 2015. DOI: 10.1109/tim.2015.2450354

[5] Aloise; Deshpande; Hansen; Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, vol. 75, no. 2, pp. 245–248, 2009. DOI: 10.1007/s10994-009-5103-0

[6] Hruschka, E. R.; Campello, R. J. G. B.; Freitas, A. A.; de Carvalho, A. C. P. L. F. A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 39, no. 2, pp. 133–155, 2009. DOI: 10.1109/tsmcc.2008.2007252

[7] Chou, C.-H.; M.-C.; Lai. A new cluster validity measure and its application to image compression. Pattern Analysis and Applications, vol. 7, no. 2, pp. 205–220, 2004. DOI: 10.1007/s10044-004-0218-1

[8] Panagiotakis; Tziritas. Successive group selection for microaggregation. IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1191–1195, 2013. DOI: 10.1109/tkde.2011.242

[9] Panagiotakis; Papadakis; Grinias; Komodakis; Fragopoulou; Tziritas. Interactive image segmentation based on synthetic graph coordinates. Pattern Recognition, vol. 46, no. 11, pp. 2940–2952, 2013. DOI: 10.1016/j.patcog.2013.04.004

[10] Panagiotakis; Grinias; Tziritas. Natural image segmentation based on tree equipartition, bayesian flooding and region merging. IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2276–2287, 2011. DOI: 10.1109/tip.2011.2114893

[11] Papadakis; Panagiotakis; Fragopoulou. Distributed detection of communities in complex networks using synthetic coordinates. Journal of Statistical Mechanics: Theory and Experiment, vol. 2014, no. 3, Article P03013, 2014. DOI: 10.1088/1742-5468/2014/03/p03013

[12] Aloise; Caporossi; Hansen; Liberti; Perron; Ruiz. Modularity maximization in networks by variable neighborhood search. In Graph Partitioning and Graph Clustering, vol. 588, pp. 113–128, American Mathematical Society, 2013.

[13] Adomavicius; Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005. DOI: 10.1109/tkde.2005.99

[14] de Fortuny, E. J.; De Smedt; Martens; Daelemans. Evaluating and understanding text-based stock price prediction models. Information Processing & Management, vol. 50, no. 2, pp. 426–441, 2014. DOI: 10.1016/j.ipm.2013.12.002

[15] Sedlmair; Tatu; Munzner; Tory. A taxonomy of visual cluster separation factors. Computer Graphics Forum, vol. 31, no. 3, part 4, pp. 1335–1344, 2012. DOI: 10.1111/j.1467-8659.2012.03125.x