Related papers: Information theoretic model validation for cluster…

Information based clustering

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim , Gurinder Singh Atwal , Gasper Tkacik , William Bialek

Demystifying Information-Theoretic Clustering

We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based…

Machine Learning · Computer Science 2014-02-07 Greg Ver Steeg , Aram Galstyan , Fei Sha , Simon DeDeo

How many clusters? An information theoretic perspective

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Susanne Still , William Bialek

Selective inference for clustering with unknown variance

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test, and to test these hypotheses-that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for…

Methodology · Statistics 2023-07-24 Youngjoo Yun , Rina Foygel Barber

Informed Random Partition Models with Temporal Dependence

Model-based clustering is a powerful tool that is often used to discover hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that permit…

Methodology · Statistics 2025-06-24 Sally Paganin , Garritt L. Page , Fernando Andrés Quintana

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets {\em that can be clustered well}. More generally, despite the ubiquity and the…

Machine Learning · Computer Science 2012-05-23 Amit Daniely , Nati Linial , Michael Saks

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry , Margarida Cardoso , Gilles Celeux , Maria José Amorim , Ana Sousa Ferreira

Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion

Model selection is a major challenge in non-parametric clustering. There is no universally admitted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty to find a universal evaluation…

Machine Learning · Computer Science 2023-05-18 Alex Mourer , Florent Forest , Mustapha Lebbah , Hanane Azzag , Jérôme Lacaille

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into…

Machine Learning · Statistics 2025-11-06 Alexander J. Gates

Uncertain Centroid based Partitional Clustering of Uncertain Data

Clustering uncertain data has emerged as a challenging task in uncertain data management and mining. Thanks to a computational complexity advantage over other clustering paradigms, partitional clustering has been particularly studied and a…

Databases · Computer Science 2012-03-30 Francesco Gullo , Andrea Tagarelli

A Tutorial on Discriminative Clustering and Mutual Information

To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as ``cohesive…

Machine Learning · Statistics 2025-05-08 Louis Ohl , Pierre-Alexandre Mattei , Frédéric Precioso

An Information-Theoretic External Cluster-Validity Measure

In this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by somehow comparing the clusters it produces with ``ground truth' consisting of…

Machine Learning · Computer Science 2013-01-07 Byron E Dom

When is Clustering Perturbation Robust?

Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Generally, intuition about clustering reflects the ideal case -- exact data sets endowed with flawless dissimilarity between individual…

Machine Learning · Computer Science 2016-01-25 Margareta Ackerman , Jarrod Moore

Consensus clustering in complex networks

The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on…

Physics and Society · Physics 2012-03-29 Andrea Lancichinetti , Santo Fortunato

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh , Farzad Didehvar , Zahra Shojaei

Information vs. Uncertainty as the Foundation for a Science of Environmental Modeling

Information accounting provides a better foundation for hypothesis testing than does uncertainty quantification. A quantitative account of science is derived under this perspective that alleviates the need for epistemic bridge principles,…

Other Statistics · Statistics 2017-04-26 Grey Nearing , Hoshin Gupta

Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures

Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the…

Discrete Mathematics · Computer Science 2021-08-27 Martijn Gösgens , Alexey Tikhonov , Liudmila Prokhorenkova

Partial Information Framework: Model-Based Aggregation of Estimates from Diverse Information Sources

Prediction polling is an increasingly popular form of crowdsourcing in which multiple participants estimate the probability or magnitude of some future event. These estimates are then aggregated into a single forecast. Historically,…

Methodology · Statistics 2016-04-25 Ville A. Satopää , Shane T. Jensen , Robin Pemantle , Lyle H. Ungar

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson , M. Ackerman , N. C. Brownstein

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius , Kai Puolamäki , Henrik Boström , Panagiotis Papapetrou