
Zero-shot topic generation

Computation and Language 2020-04-30 v1

Abstract

We present an approach to generating topics using a model trained only for document title generation, with zero examples of topics given during training. We leverage features that capture the relevance of a candidate span in a document for generating a title for that document. The output is a weighted collection of the phrases most relevant for describing the document and distinguishing it within a corpus, without requiring access to the rest of the corpus. We conducted a double-blind trial in which human annotators scored the quality of our machine-generated topics alongside the original human-written topics associated with news articles from The Guardian and The Huffington Post. As judged by these annotators, our zero-shot model generates topic labels for news documents that are, on average, of equal or higher quality than those written by humans.
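The abstract does not spell out the relevance features, so the following is only a minimal Python sketch of the final step it describes: turning per-span relevance scores into a weighted collection of phrases. It assumes the scores already exist as hypothetical `(phrase, score)` pairs. The function name `weighted_topics`, the `top_k` parameter, and the sum-then-normalize aggregation are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def weighted_topics(span_scores, top_k=5):
    """Aggregate (phrase, relevance) pairs into a ranked, weighted topic list.

    Hypothetical illustration: span_scores is an iterable of (phrase, score)
    pairs from some assumed per-span relevance estimate. The paper's actual
    features are not reproduced here.
    """
    totals = defaultdict(float)
    for phrase, score in span_scores:
        # Merge variants that differ only in case or whitespace.
        key = " ".join(phrase.lower().split())
        if score > 0:  # ignore spans with no positive relevance
            totals[key] += score

    total = sum(totals.values())
    if total == 0:
        return []

    # Rank by accumulated relevance (ties broken alphabetically), then express
    # each weight as a share of the total relevance over all phrases.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(phrase, weight / total) for phrase, weight in ranked[:top_k]]


if __name__ == "__main__":
    spans = [
        ("Climate change", 0.9),
        ("climate  change", 0.3),
        ("Paris agreement", 0.6),
        ("the", 0.0),
    ]
    for phrase, weight in weighted_topics(spans):
        print(f"{phrase}: {weight:.2f}")
```

Merging case and whitespace variants before normalizing lets a phrase proposed by several candidate spans accumulate weight, which is one simple way to favor recurring phrases. Because the weights are shares of the total relevance over all phrases, truncating to `top_k` can leave them summing to less than 1.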

Cite

@article{arxiv.2004.13956,
  title  = {Zero-shot topic generation},
  author = {Oleg Vasilyev and Kathryn Evans and Anna Venancio-Marques and John Bohannon},
  journal= {arXiv preprint arXiv:2004.13956},
  year   = {2020}
}

Comments

12 pages, 9 figures, 3 tables
