Bayesian Level Set Clustering

Methodology 2025-12-12 v2

Abstract

Classically, Bayesian clustering interprets each component of a mixture model as a cluster. The inferred clustering posterior is highly sensitive to any inaccuracies in the kernel within each component. As this kernel is made more flexible, problems arise in identifying the underlying clusters in the data. To address this pitfall, this article proposes a fundamentally different approach to Bayesian clustering that decouples the problems of clustering and flexible modeling of the data density f. Starting with an arbitrary Bayesian model for f and a loss function for defining clusters based on f, we develop a Bayesian decision-theoretic framework for density-based clustering. Within this framework, we develop a Bayesian level set clustering method to cluster data into connected components of a level set of f. We provide theoretical support, including clustering consistency, and highlight performance in a variety of simulated examples. An application to astronomical data illustrates improvements over the popular DBSCAN algorithm in terms of accuracy, insensitivity to tuning parameters, and uncertainty quantification.
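
To make the idea of clustering by connected components of a level set of f concrete, the following is a minimal, non-Bayesian sketch. It is not the method proposed in the article: a plain Gaussian kernel density estimate stands in for the Bayesian model of f, and there is no loss function or posterior uncertainty. The function name `level_set_clusters` and the parameters `bandwidth`, `level`, and `radius` are assumptions introduced here for illustration only. Points whose estimated density falls below `level` are left unclustered, and the remaining points are grouped by linking any two that lie within `radius` of each other.

```python
import numpy as np


def level_set_clusters(x, bandwidth, level, radius):
    """Plug-in level set clustering sketch (illustrative, not the paper's procedure).

    x         : (n, d) array of observations
    bandwidth : Gaussian KDE bandwidth (hypothetical tuning parameter)
    level     : density threshold defining the level set {f >= level}
    radius    : two above-threshold points are linked if within this distance

    Returns (labels, density); labels are -1 for points below the level.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape

    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)

    # Gaussian kernel density estimate evaluated at each data point.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    density = np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

    # Keep only points inside the estimated level set.
    active = density >= level

    # Graph on active points: an edge joins points within `radius`.
    adjacent = (sq <= radius ** 2) & active[:, None] & active[None, :]

    # Label connected components with a depth-first search.
    labels = np.full(n, -1)
    current = 0
    for start in np.flatnonzero(active):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels, density
```

Calling `level_set_clusters(points, bandwidth=0.5, level=0.05, radius=0.3)` on two well-separated groups of points plus one isolated point would be expected to give each group its own label and mark the isolated point as -1. The result of this plug-in version depends directly on the three tuning parameters and carries no uncertainty quantification.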

Cite

@article{arxiv.2403.04912,
  title  = {Bayesian Level Set Clustering},
  author = {David Buch and Miheer Dewaskar and David B. Dunson},
  journal= {arXiv preprint arXiv:2403.04912},
  year   = {2025}
}

Comments

25 pages, 6 figures