DiffETM: Diffusion Process Enhanced Embedded Topic Model

Computation and Language; Artificial Intelligence; Information Retrieval; Machine Learning

2025-01-03 (v1)

Abstract

The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution follows a logistic normal distribution, which simplifies optimization. However, this assumption oversimplifies the true document-topic distribution and limits the model's performance. To address this, we propose a novel method that introduces a diffusion process into the sampling of the document-topic distribution, overcoming this limitation while keeping optimization easy. We validate our method through extensive experiments on two mainstream datasets, demonstrating its effectiveness in improving topic modeling performance.
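To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (1) how a document-topic vector is typically drawn from a logistic normal distribution via the reparameterization trick, and (2) a generic DDPM-style forward noising step. The function names, the linear noise schedule, and the choice to noise a hypothetical document representation `h` are all assumptions for illustration; the abstract does not specify how DiffETM applies the diffusion process.

```python
import numpy as np


def sample_logistic_normal(mu, log_sigma, rng):
    """Draw theta = softmax(mu + sigma * eps) with eps ~ N(0, I).

    Each row of the result lies on the probability simplex.
    """
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_sigma) * eps
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)


def forward_diffuse(x0, t, betas, rng):
    """Generic DDPM forward step (an assumption, not the paper's formulation):

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


rng = np.random.default_rng(0)
num_docs, num_topics = 3, 5

# Logistic normal sampling, as assumed by ETM-style models.
mu = np.zeros((num_docs, num_topics))
log_sigma = np.zeros((num_docs, num_topics))
theta = sample_logistic_normal(mu, log_sigma, rng)

# Hypothetical document representation, noised at step t = 50 of 100.
h = rng.standard_normal((num_docs, num_topics))
betas = np.linspace(1e-4, 0.02, 100)  # assumed linear noise schedule
h_t = forward_diffuse(h, 50, betas, rng)
```

The first function produces a distribution with a single simple shape per document, which is the restriction the abstract criticizes. The second only illustrates how a diffusion process gradually perturbs a representation with Gaussian noise.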

Cite

@article{arxiv.2501.00862,
  title  = {DiffETM: Diffusion Process Enhanced Embedded Topic Model},
  author = {Wei Shao and Mingyang Liu and Linqi Song},
  journal= {arXiv preprint arXiv:2501.00862},
  year   = {2025}
}

Comments

5 pages, 2 figures, Accepted by ICASSP 2025