An Alternative Prior Process for Nonparametric Bayesian Clustering

Methodology 2010-10-18 v2 Statistics Theory

Abstract

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
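
To make the contrast between the three priors concrete, here is a minimal simulation sketch. It is not taken from this page: it assumes the usual sequential predictive rules, in which the Dirichlet process joins an existing cluster with weight equal to its size, the Pitman-Yor process uses the size minus a discount, and the uniform process gives every existing cluster weight 1, with a new cluster weighted by a concentration parameter in each case. The function name `sample_partition` and the parameters `theta` and `discount` are hypothetical names chosen for illustration.

```python
import random


def sample_partition(n, theta, process="dirichlet", discount=0.0, seed=0):
    """Sequentially assign n items to clusters and return the cluster sizes.

    Assumed (unnormalised) predictive weights for the next item, given
    k existing clusters with sizes n_1..n_k:
      dirichlet:  existing cluster j -> n_j,            new -> theta
      pitman_yor: existing cluster j -> n_j - discount, new -> theta + k * discount
      uniform:    existing cluster j -> 1,              new -> theta
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")

    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        k = len(sizes)
        if process == "dirichlet":
            weights = [float(s) for s in sizes] + [theta]
        elif process == "pitman_yor":
            weights = [s - discount for s in sizes] + [theta + k * discount]
        elif process == "uniform":
            weights = [1.0] * k + [theta]
        else:
            raise ValueError(f"unknown process: {process}")

        # Index k (the last weight) means "open a new cluster".
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes


if __name__ == "__main__":
    for name in ("dirichlet", "pitman_yor", "uniform"):
        sizes = sample_partition(1000, theta=5.0, process=name, discount=0.5)
        print(name, "clusters:", len(sizes), "largest:", max(sizes))
```

Under the first two rules a large cluster attracts new items in proportion to its size, which is the "rich-get-richer" effect; under the uniform rule the weight of an existing cluster does not depend on its size. Because the uniform rule depends on the order in which items arrive, repeated runs with shuffled orderings are one way to explore the lack of exchangeability mentioned above.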

Cite

@article{arxiv.0801.0461,
  title  = {An Alternative Prior Process for Nonparametric Bayesian Clustering},
  author = {Hanna M. Wallach and Shane T. Jensen and Lee Dicker and Katherine A. Heller},
  journal= {arXiv preprint arXiv:0801.0461},
  year   = {2010}
}