
Topic Modeling Using Distributed Word Embeddings

Computation and Language 2016-03-16 v1

Abstract

We propose a new topic modeling algorithm, Vec2Topic, that identifies the main topics in a corpus using semantic information captured by high-dimensional distributed word embeddings. The technique is unsupervised and produces a list of topics ranked by importance. We find that it outperforms existing topic modeling techniques, such as Latent Dirichlet Allocation, at identifying key topics in user-generated content (e.g., emails and chats), where topics are diffused across the corpus. We also find that Vec2Topic works equally well for non-user-generated content, such as papers and reports, and for small corpora, down to a single document.
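The abstract describes Vec2Topic only at a high level. To illustrate the general idea of grouping word embeddings into topics and ranking those topics, the sketch below clusters toy word vectors with k-means and ranks each cluster by the total corpus frequency of its words. This is an assumption-laden sketch and not the authors' algorithm. The toy 2-D vectors stand in for trained high-dimensional embeddings. The names `rank_topics` and `kmeans`, the farthest-point initialisation, and the frequency-based importance score are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so Euclidean distance tracks cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation (hypothetical helper)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from every seed chosen so far.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_topics(embeddings, tokens, k):
    """Cluster word vectors into k topics, then rank topics by corpus frequency.

    The frequency-based score is a hypothetical stand-in, not Vec2Topic's own measure.
    """
    words = sorted(embeddings)
    labels = kmeans([normalize(embeddings[w]) for w in words], k)
    freq = Counter(tokens)
    clusters = {}
    for word, label in zip(words, labels):
        clusters.setdefault(label, []).append(word)
    # Score each topic by the total frequency of its words in the corpus.
    scored = [(sum(freq[w] for w in ws), ws) for ws in clusters.values()]
    scored.sort(key=lambda pair: -pair[0])
    # Within a topic, list the most frequent words first.
    return [sorted(ws, key=lambda w: -freq[w]) for _, ws in scored]


# Toy data: hand-made 2-D "embeddings" standing in for trained word vectors.
toy_vectors = {
    "budget": [0.9, 0.1], "invoice": [0.8, 0.2], "payment": [0.85, 0.15],
    "meeting": [0.1, 0.9], "schedule": [0.2, 0.8],
}
corpus = "budget invoice payment budget meeting schedule invoice budget".split()
print(rank_topics(toy_vectors, corpus, k=2))
# → [['budget', 'invoice', 'payment'], ['meeting', 'schedule']]
```

With this toy data, the finance-like words form the higher-ranked topic simply because they occur more often in the corpus. In a realistic setting the vectors would come from an embedding model trained on the corpus, and a more informative importance score would replace raw frequency.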

Cite

@article{arxiv.1603.04747,
  title  = {Topic Modeling Using Distributed Word Embeddings},
  author = {Ramandeep S Randhawa and Parag Jain and Gagan Madan},
  journal= {arXiv preprint arXiv:1603.04747},
  year   = {2016}
}