Coordinated Topic Modeling

Computation and Language 2022-10-25 v2 Information Retrieval

Abstract

We propose a new problem, coordinated topic modeling, that imitates how humans describe a text corpus. It treats a set of well-defined topics with a reference representation as the axes of a semantic space, then uses those axes to model a corpus so that the resulting representation is easy to understand. The task makes corpus representations more interpretable by reusing existing knowledge, and it benefits corpus comparison. We design ECTM, an embedding-based coordinated topic model that uses the reference representation to capture aspects specific to the target corpus while maintaining each topic's global semantics. ECTM introduces topic- and document-level supervision with a self-training mechanism to solve the problem. Extensive experiments on multiple domains show that our model outperforms the baselines.

Cite

@article{arxiv.2210.08559,
  title  = {Coordinated Topic Modeling},
  author = {Pritom Saha Akash and Jie Huang and Kevin Chen-Chuan Chang},
  journal = {arXiv preprint arXiv:2210.08559},
  year   = {2022}
}