
Content Modeling Using Latent Permutations

Subjects: Information Retrieval; Computation and Language; Machine Learning. v1: 2014-01-16

Abstract

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
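To make the Generalized Mallows Model (GMM) concrete, here is a minimal sketch, not the authors' implementation. It assumes the common inversion-count parameterization over K topics with canonical ordering 0, 1, ..., K-1. In that parameterization, v_j counts how many topics larger than j appear before j, each v_j lies in {0, ..., K-1-j}, and v_j has probability proportional to exp(-rho_j * v_j), with one dispersion parameter rho_j per position. All function names below are hypothetical and chosen for illustration.

```python
import math
import random


def factor_probs(n_max, rho):
    """Distribution of one inversion count v in {0, ..., n_max}: P(v) = exp(-rho*v) / psi."""
    weights = [math.exp(-rho * v) for v in range(n_max + 1)]
    psi = sum(weights)  # per-position normalizer psi_j(rho_j)
    return [w / psi for w in weights]


def sample_inversions(rhos, rng):
    """Draw v_0..v_{K-2} independently (hypothetical helper); v_j ranges over 0..K-1-j."""
    K = len(rhos) + 1
    return [
        rng.choices(range(K - j), weights=factor_probs(K - 1 - j, rho))[0]
        for j, rho in enumerate(rhos)
    ]


def inversions_to_permutation(v):
    """Rebuild the ordering: v[j] = number of items larger than j placed before j."""
    K = len(v) + 1
    perm = [K - 1]  # the last canonical item has no larger items, so its count is always 0
    for j in range(K - 2, -1, -1):
        perm.insert(v[j], j)  # exactly v[j] larger items now precede j
    return perm


def permutation_to_inversions(perm):
    """Inverse map: for each item j, count the larger items that precede it."""
    pos = {item: i for i, item in enumerate(perm)}
    return [sum(1 for x in perm[:pos[j]] if x > j) for j in range(len(perm) - 1)]


def log_prob(v, rhos):
    """Log-probability of an inversion vector: sum_j [-rho_j * v_j - log psi_j(rho_j)]."""
    K = len(rhos) + 1
    total = 0.0
    for j, (vj, rho) in enumerate(zip(v, rhos)):
        psi = sum(math.exp(-rho * u) for u in range(K - j))
        total += -rho * vj - math.log(psi)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    v = sample_inversions([2.0, 2.0, 2.0], rng)  # K = 4 topics
    print(v, inversions_to_permutation(v), log_prob(v, [2.0, 2.0, 2.0]))
```

Because the inversion counts are independent and map one-to-one onto orderings, sampling and scoring reduce to K-1 small categorical factors. Large rho_j values concentrate mass near the canonical ordering, and setting every rho_j to 0 gives the uniform distribution over all K! orderings.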


Cite

@article{arxiv.1401.3488,
  title  = {Content Modeling Using Latent Permutations},
  author = {Harr Chen and S. R. K. Branavan and Regina Barzilay and David R. Karger},
  journal= {arXiv preprint arXiv:1401.3488},
  year   = {2014}
}