Related papers: Accuracy Evaluation of Overlapping and Multi-resol…

A framework for benchmarking clustering algorithms

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate…

Machine Learning · Computer Science 2023-10-27 Marek Gagolewski

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Benchmarking of Clustering Validity Measures Revisited

Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different…

Machine Learning · Statistics 2026-02-23 Connor Simpson, Ricardo J. G. B. Campello, Elizabeth Stojanovski

Normalised clustering accuracy: An asymmetric external cluster validity measure

There is no, nor will there ever be, single best clustering algorithm. Nevertheless, we would still like to be able to distinguish between methods that work well on certain task types and those that systematically underperform. Clustering…

Machine Learning · Computer Science 2025-10-16 Marek Gagolewski

Overlapping clustering based on kernel similarity metric

Producing overlapping schemes is a major issue in clustering. Recently proposed overlapping methods rely on the search for an optimal covering and are based on different metrics, such as Euclidean distance and I-Divergence, used to measure…

Machine Learning · Statistics 2012-11-30 Chiheb-Eddine Ben N'Cir, Nadia Essoussi, Patrice Bertrand

On Hyperparameter Search in Cluster Ensembles

Quality assessment of models in unsupervised learning, and clustering verification in particular, has been a long-standing problem in machine learning research. The lack of robust and universally applicable cluster validity scores often…

Machine Learning · Statistics 2018-03-30 Luzie Helfmann, Johannes von Lindheim, Mattes Mollenhauer, Ralf Banisch

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice…

Methodology · Statistics 2020-06-24 Serhat Emre Akhanli, Christian Hennig

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

Algorithms for Internal Validation Clustering Measures in the Post Genomic Era

Inferring cluster structure in microarray datasets is a fundamental task for the -omic sciences. A fundamental question in Statistics, Data Analysis and Classification, is the prediction of the number of clusters in a dataset, usually…

Data Structures and Algorithms · Computer Science 2011-02-16 Filippo Utro

Robust Task Clustering for Deep Many-Task Learning

We investigate task clustering for deep-learning based multi-task and few-shot learning in a many-task setting. We propose a new method to measure task similarities with cross-task transfer performance matrix for the deep learning scenario.…

Machine Learning · Computer Science 2018-05-21 Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Gerald Tesauro, Haoyu Wang, Bowen Zhou

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly…

Machine Learning · Computer Science 2024-04-03 Andrew Draganov, David Saulpic, Chris Schwiegelshohn

Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures

Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the…

Discrete Mathematics · Computer Science 2021-08-27 Martijn Gösgens, Alexey Tikhonov, Liudmila Prokhorenkova

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin, Mimi Zhang

Validating Clustering Frameworks for Electric Load Demand Profiles

Large-scale deployment of smart meters has made it possible to collect sufficient and high-resolution data of residential electric demand profiles. Clustering analysis of these profiles is important to further analyze and comment on…

Signal Processing · Electrical Eng. & Systems 2021-03-02 Mayank Jain, Tarek AlSkaif, Soumyabrata Dev

A Rapid Review of Clustering Algorithms

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao

Multi-level algorithms for modularity clustering

Modularity is one of the most widely used quality measures for graph clusterings. Maximizing modularity is NP-hard, and the runtime of exact algorithms is prohibitive for large graphs. A simple and effective class of heuristics coarsens the…

Data Structures and Algorithms · Computer Science 2009-09-22 Andreas Noack, Randolf Rotta

Performance Comparison for Scientific Computations on the Edge via Relative Performance

In a typical Internet-of-Things setting that involves scientific applications, a target computation can be evaluated in many different ways depending on the split of computations among various devices. On the one hand, different…

Performance · Computer Science 2022-08-09 Aravind Sankaran, Paolo Bientinesi