English

Nested Hierarchical Dirichlet Processes

Machine Learning 2016-11-17 v4 Machine Learning

Abstract

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.

Cite

@article{arxiv.1210.6738,
  title  = {Nested Hierarchical Dirichlet Processes},
  author = {John Paisley and Chong Wang and David M. Blei and Michael I. Jordan},
  journal= {arXiv preprint arXiv:1210.6738},
  year   = {2016}
}

Comments

To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Bayesian Nonparametrics

R2 v1 2026-06-21T22:27:30.757Z