Related papers: Random Partition Models for Microclustering Tasks
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process…
Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in model-based clustering, population…
Although exchangeable processes from Bayesian nonparametrics have been used as a generating mechanism for random partition models, we deviate from this paradigm to explicitly incorporate clustering information in the formulation of our…
Clustering is a crucial task in various domains of knowledge, including medicine, epidemiology, genomics, environmental science, economics, and visual sciences, among others. Methodologies for inferring the number of clusters have often…
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process…
We introduce the microclustering Ewens–Pitman model for random partitions, obtained by scaling the strength parameter of the Ewens–Pitman model linearly with the sample size. The resulting random partition is shown to have the…
A Bayesian approach to the classification problem is proposed in which random partitions play a central role. It is argued that the partitioning approach has the capacity to take advantage of a variety of large-scale spatial structures, if…
Monte Carlo techniques are standard numerical tools for exploring non-Gaussian and multivariate likelihoods. Many variants of the original Metropolis-Hastings algorithm have been proposed to increase the sampling efficiency. Motivated by…
Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the…
We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature…
In cluster analysis, interest lies in probabilistically capturing partitions of individuals, items, or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior…
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that…
The paper introduces the concept of a cluster structure to define a joint distribution of the sample size and its exchangeable random partitions. The cluster structure allows the probability distribution of the random partitions of a subset…
Nonparametric Bayesian approaches provide a flexible framework for clustering without pre-specifying the number of groups, yet they are well known to overestimate the number of clusters, especially for functional data. We show that a…
Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…
We consider the task of modeling a dependent sequence of random partitions. It is well known that a random measure in Bayesian nonparametrics induces a distribution over random partitions. The community has therefore assumed that the best…
Model-based clustering is a powerful tool that is often used to discover hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that permit…
Motivated by the fundamental problem of measuring species diversity, this paper introduces the concept of a cluster structure to define an exchangeable cluster probability function that governs the joint distribution of a random count and…
Bayesian entity resolution merges multiple noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models…
Bayesian clustering methods have the widely touted advantage of providing a probabilistic characterization of uncertainty in clustering through the posterior distribution. An amazing variety of priors and likelihoods have been proposed for…