Related papers: Random Partition Models for Microclustering Tasks

200 papers

Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in model-based clustering, population…

Methodology · Statistics 2017-11-21 Giuseppe Di Benedetto, François Caron, Yee Whye Teh

Although exchangeable processes from Bayesian nonparametrics have been used as a generating mechanism for random partition models, we deviate from this paradigm to explicitly incorporate clustering information in the formulation of our…

Methodology · Statistics 2024-10-28 David B. Dahl, Richard L. Warr, Thomas P. Jensen

Clustering is a crucial task in various domains of knowledge, including medicine, epidemiology, genomics, environmental science, economics, and visual sciences, among others. Methodologies for inferring the number of clusters have often…

Methodology · Statistics 2025-05-26 Clara Grazian

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process…

Methodology · Statistics 2015-12-03 Jeffrey Miller, Brenda Betancourt, Abbas Zaidi, Hanna Wallach, Rebecca C. Steorts

We introduce the microclustering Ewens–Pitman model for random partitions, obtained by scaling the strength parameter of the Ewens–Pitman model linearly with the sample size. The resulting random partition is shown to have the…

Methodology · Statistics 2025-07-25 Mario Beraha, Stefano Favaro

A Bayesian approach to the classification problem is proposed in which random partitions play a central role. It is argued that the partitioning approach has the capacity to take advantage of a variety of large-scale spatial structures, if…

Statistics Theory · Mathematics 2007-06-13 Marc A. Coram

Monte-Carlo techniques are standard numerical tools for exploring non-Gaussian and multivariate likelihoods. Many variants of the original Metropolis-Hastings algorithm have been proposed to increase the sampling efficiency. Motivated by…

Cosmology and Nongalactic Astrophysics · Physics 2024-10-31 Maximilian Philipp Herzog, Heinrich von Campe, Rebecca Maria Kuntz, Lennart Röver, Björn Malte Schäfer

Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the…

Statistics Theory · Mathematics 2022-09-08 Keith Levin, Brenda Betancourt

We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature…

Computation · Statistics 2020-02-26 Yang Ni, Yuan Ji, Peter Mueller

In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior…

Methodology · Statistics 2017-03-23 Riccardo Rastelli, Nial Friel

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that…

Machine Learning · Computer Science 2013-01-18 Shivakumar Vaithyanathan, Byron E Dom

The paper introduces the concept of a cluster structure to define a joint distribution of the sample size and its exchangeable random partitions. The cluster structure allows the probability distribution of the random partitions of a subset…

Methodology · Statistics 2013-10-08 Mingyuan Zhou

Nonparametric Bayesian approaches provide a flexible framework for clustering without pre-specifying the number of groups, yet they are well known to overestimate the number of clusters, especially for functional data. We show that a…

Methodology · Statistics 2025-10-21 Fumiya Iwashige, Tomoya Wakayama, Shonosuke Sugasawa, Shintaro Hashimoto

Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…

Methodology · Statistics 2025-12-18 Matteo Mori, Laura Anderlucci

We consider the task of modeling a dependent sequence of random partitions. It is well-known that a random measure in Bayesian nonparametrics induces a distribution over random partitions. The community has therefore assumed that the best…

Methodology · Statistics 2021-08-03 Garritt L. Page, Fernando A. Quintana, David B. Dahl

Model-based clustering is a powerful tool that is often used to discover hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that permit…

Methodology · Statistics 2025-06-24 Sally Paganin, Garritt L. Page, Fernando Andrés Quintana

Motivated by the fundamental problem of measuring species diversity, this paper introduces the concept of a cluster structure to define an exchangeable cluster probability function that governs the joint distribution of a random count and…

Methodology · Statistics 2014-10-14 Mingyuan Zhou, Stephen G Walker

Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models…

Methodology · Statistics 2014-10-20 Tamara Broderick, Rebecca C. Steorts

Bayesian clustering methods have the widely touted advantage of providing a probabilistic characterization of uncertainty in clustering through the posterior distribution. An amazing variety of priors and likelihoods have been proposed for…

Methodology · Statistics 2025-11-21 Garritt L. Page, Andrés F. Barrientos, David B. Dahl, David B. Dunson