Related papers: A framework for benchmarking clustering algorithms
Note: A revised version of this is now published. Please cite and read (it's open access): Van Mechelen, I., Boulesteix, A.-L., Dangl, R., Dean, N., Hennig, C., Leisch, F., Steinley, D., Warrens, M. J. (2023). A white paper on good research…
Comprehensive benchmarking of clustering algorithms is rendered difficult by two key factors: (i)~the elusiveness of a unique mathematical definition of this unsupervised learning approach and (ii)~dependencies between the generating models…
Experimental evaluation is a major research methodology for investigating clustering algorithms and many other machine learning algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature and their…
Performance of clustering algorithms is evaluated with the help of accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, there exist only few…
With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…
Quality assessments of models in unsupervised learning and clustering verification in particular have been a long-standing problem in the machine learning research. The lack of robust and universally applicable cluster validity scores often…
Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure.…
Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation…
Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in…
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark…
Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the…
Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…
We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task?…
There is a great diversity of clustering and community detection algorithms, which are key components of many data analysis and exploration systems. To the best of our knowledge, however, there does not exist yet any uniform benchmarking…
A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating multiple analysis algorithms. In many practical applications, analytical findings are obtained only after data pass…
Graph clustering is widely used in analysis of biological networks, social networks and etc. For over a decade many graph clustering algorithms have been published, however a comprehensive and consistent performance comparison is not…
Recommender systems are one of the most applied methods in machine learning and find applications in many areas, ranging from economics to the Internet of things. This article provides a general overview of modern approaches to recommender…
We consider clustering in group decision making where the opinions are given by pairwise comparison matrices. In particular, the k-medoids model is suggested to classify the matrices since it has a linear programming problem formulation…
This paper is a chapter in the forthcoming Handbook of Cluster Analysis, Hennig et al. (2015). For definitions of basic clustering methods and some further methodology, other chapters of the Handbook are referred to. To read this version of…