Related papers: Coordinated Topic Modeling

Interactive Topic Models with Optimal Transport

Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content…

Computation and Language · Computer Science 2024-07-01 Garima Dhanania, Sheshera Mysore, Chau Minh Pham, Mohit Iyyer, Hamed Zamani, Andrew McCallum

Effective Neural Topic Modeling with Embedding Clustering Regularization

Topic models have been prevalent for decades with various applications. However, existing topic models commonly suffer from the notorious topic collapsing: discovered topics semantically collapse towards each other, leading to highly…

Computation and Language · Computer Science 2023-06-08 Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Anh Tuan Luu

TopicAdapt- An Inter-Corpora Topics Adaptation Approach

Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including…

Computation and Language · Computer Science 2023-10-10 Pritom Saha Akash, Trisha Das, Kevin Chen-Chuan Chang

Topic Modeling Using Distributed Word Embeddings

We propose a new algorithm for topic modeling, Vec2Topic, that identifies the main topics in a corpus using semantic information captured via high-dimensional distributed word embeddings. Our technique is unsupervised and generates a list…

Computation and Language · Computer Science 2016-03-16 Ramandeep S Randhawa, Parag Jain, Gagan Madan

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

A Query-Driven Topic Model

Topic modeling is an unsupervised method for revealing the hidden semantic structure of a corpus. It has been increasingly widely adopted as a tool in the social sciences, including political science, digital humanities and sociological…

Information Retrieval · Computer Science 2022-01-12 Zheng Fang, Yulan He, Rob Procter

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Topic modelling is a pivotal unsupervised machine learning technique for extracting valuable insights from large document collections. Existing neural topic modelling methods often encode contextual information of documents, while ignoring…

Computation and Language · Computer Science 2025-02-07 Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic

Topic Modeling in Embedding Spaces

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

A modified model for topic detection from a corpus and a new metric evaluating the understandability of topics

This paper presents a modified neural model for topic detection from a corpus and proposes a new metric to evaluate the detected topics. The new model builds upon the embedded topic model incorporating some modifications such as document…

Computation and Language · Computer Science 2023-06-09 Tomoya Kitano, Yuto Miyatake, Daisuke Furihata

Topic Modeling over Short Texts by Incorporating Word Embeddings

Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such…

Computation and Language · Computer Science 2016-09-28 Jipeng Qiang, Ping Chen, Tong Wang, Xindong Wu

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Topic models have been widely used in discovering latent topics which are shared across documents in text mining. Vector representations, word embeddings and topic embeddings, map words and topics into a low-dimensional and dense real-value…

Computation and Language · Computer Science 2017-02-24 Jarvan Law, Hankz Hankui Zhuo, Junhua He, Erhu Rong

Efficient Correlated Topic Modeling with Topic Embedding

Correlated topic modeling has been limited to small model and problem sizes due to their high computational cost and poor scaling. In this paper, we propose a new model which learns compact topic embeddings and captures topic correlations…

Machine Learning · Computer Science 2017-07-04 Junxian He, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, Eric P. Xing

Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks

Topic models aim to reveal latent structures within a corpus of text, typically through the use of term-frequency statistics over bag-of-words representations from documents. In recent years, conceptual entities -- interpretable,…

Computation and Language · Computer Science 2024-08-27 Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya

Bidirectional Topic Matching: Quantifying Thematic Overlap Between Corpora Through Topic Modelling

This study introduces Bidirectional Topic Matching (BTM), a novel method for cross-corpus topic modeling that quantifies thematic overlap and divergence between corpora. BTM is a flexible framework that can incorporate various topic…

Computation and Language · Computer Science 2024-12-25 Raven Adam, Marie Lisa Kogler

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret.…

Computation and Language · Computer Science 2021-06-18 Federico Bianchi, Silvia Terragni, Dirk Hovy

CEMTM: Contextual Embedding-based Multimodal Topic Modeling

We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language…

Computation and Language · Computer Science 2025-10-07 Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini

Towards Generalising Neural Topical Representations

Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across…

Computation and Language · Computer Science 2024-06-14 Xiaohao Yang, He Zhao, Dinh Phung, Lan Du

Neural Topic Modeling by Incorporating Document Relationship Graph

Graph Neural Networks (GNNs) that capture the relationships between graph nodes via message passing have been a hot research direction in the natural language processing community. In this paper, we propose Graph Topic Model (GTM), a GNN…

Computation and Language · Computer Science 2020-09-30 Deyu Zhou, Xuemeng Hu, Rui Wang

Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms

Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual…

Computation and Language · Computer Science 2024-10-04 Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita

Knowledge-Aware Bayesian Deep Topic Model

We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and its variants have gained promising performance in text analysis, they mainly focus…

Computation and Language · Computer Science 2022-09-29 Dongsheng Wang, Yishi Xu, Miaoge Li, Zhibin Duan, Chaojie Wang, Bo Chen, Mingyuan Zhou