Related papers: Persistent Clustering and a Theorem of J. Kleinber…

Classifying Clustering Schemes

Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose…

Machine Learning · Statistics 2010-12-01 Gunnar Carlsson, Facundo Memoli

On the Persistence of Clustering Solutions and True Number of Clusters in a Dataset

Typically clustering algorithms provide clustering solutions with prespecified number of clusters. The lack of a priori knowledge on the true number of underlying clusters in the dataset makes it important to have a metric to compare the…

Machine Learning · Computer Science 2018-11-20 Amber Srivastava, Mayank Baranwal, Srinivasa Salapaka

A Uniqueness Theorem for Clustering

Despite the widespread use of Clustering, there is distressingly little general theory of clustering available. Questions like "What distinguishes a clustering of data from other data partitioning?", "Are there any principles governing all…

Machine Learning · Computer Science 2012-05-14 Reza Bosagh Zadeh, Shai Ben-David

Evaluation Metrics for Unsupervised Learning Algorithms

Determining the quality of the results obtained by clustering techniques is a key issue in unsupervised machine learning. Many authors have discussed the desirable features of good clustering algorithms. However, Jon Kleinberg established…

Machine Learning · Computer Science 2019-05-24 Julio-Omar Palacio-Niño, Fernando Berzal

Possibility results for graph clustering: A novel consistency axiom

Kleinberg introduced three natural clustering properties, or axioms, and showed they cannot be simultaneously satisfied by any clustering algorithm. We present a new clustering property, Monotonic Consistency, which avoids the well-known…

Machine Learning · Computer Science 2022-04-05 Fabio Strazzeri, Rubén J. Sánchez-García

Limitations of Clustering Using Quantum Persistent Homology

Different algorithms can be used for clustering purposes with data sets. One of these algorithms uses topological features extracted from the data set to base the clusters on. The complexity of this algorithm is, however, exponential in the…

Quantum Physics · Physics 2019-11-26 Niels Neumann, Sterre den Breeijen

A Computational Theory and Semi-Supervised Algorithm for Clustering

A computational theory for clustering and a semi-supervised clustering algorithm are presented. Clustering is defined to be the obtainment of groupings of data such that each group contains no anomalies with respect to a chosen grouping…

Machine Learning · Computer Science 2025-07-17 Nassir Mohammad

Clustering processes

The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist,…

Machine Learning · Computer Science 2013-05-01 Daniil Ryabko

Clustering processes

The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist,…

Machine Learning · Computer Science 2010-05-31 Daniil Ryabko

Learning with Clustering Structure

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…

Machine Learning · Computer Science 2016-09-20 Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach

Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion

Model selection is a major challenge in non-parametric clustering. There is no universally admitted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty to find a universal evaluation…

Machine Learning · Computer Science 2023-05-18 Alex Mourer, Florent Forest, Mustapha Lebbah, Hanane Azzag, Jérôme Lacaille

On the Structural Theorem of Persistent Homology

We study the categorical framework for the computation of persistent homology, without reliance on a particular computational algorithm. The computation of persistent homology is commonly summarized as a matrix theorem, which we call the…

Algebraic Topology · Mathematics 2018-10-02 Killian Meehan, Andrei Pavlichenko, Jan Segert

A Rapid Review of Clustering Algorithms

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…

Machine Learning · Computer Science 2024-01-17 Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao

On the Discrepancy Between Kleinberg's Clustering Axioms and $k$-Means Clustering Algorithm Behavior

This paper investigates the validity of Kleinberg's axioms for clustering functions with respect to the quite popular clustering algorithm called $k$-means. While Kleinberg's axioms have been discussed heavily in the past, we concentrate…

Machine Learning · Computer Science 2017-04-25 Robert Kłopotek, Mieczysław Kłopotek

A Clustering Preserving Transformation for k-Means Algorithm Output

This note introduces a novel clustering preserving transformation of cluster sets obtained from $k$-means algorithm. This transformation may be used to generate new labeled datasets from existent ones. It is more flexible than Kleinberg…

Machine Learning · Computer Science 2022-07-26 Mieczysław A. Kłopotek

When is Clustering Perturbation Robust?

Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Generally, intuition about clustering reflects the ideal case -- exact data sets endowed with flawless dissimilarity between individual…

Machine Learning · Computer Science 2016-01-25 Margareta Ackerman, Jarrod Moore

A Fixed point view: A Model-Based Clustering Framework

With the inflation of the data, clustering analysis, as a branch of unsupervised learning, lacks unified understanding and application of its mathematical law. Based on the view of fixed point, this paper restates the model-based clustering…

Machine Learning · Computer Science 2020-02-20 Jianhao Ding, Lansheng Han

A framework for benchmarking clustering algorithms

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate…

Machine Learning · Computer Science 2023-10-27 Marek Gagolewski

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

Statistical Parameter Selection for Clustering Persistence Diagrams

In urgent decision making applications, ensemble simulations are an important way to determine different outcome scenarios based on currently available data. In this paper, we will analyze the output of ensemble simulations by considering…

Graphics · Computer Science 2019-10-21 Max Kontak, Jules Vidal, Julien Tierny