Related papers: Improving Contextualized Topic Models with Negativ…

Evaluating Negative Sampling Approaches for Neural Topic Models

Negative sampling has emerged as an effective technique that enables deep learning models to learn better representations by introducing the paradigm of learn-to-compare. The goal of this approach is to add robustness to deep learning…

Computation and Language · Computer Science 2025-03-26 Suman Adhya, Avishek Lahiri, Debarshi Kumar Sanyal, Partha Pratim Das

Prompting Large Language Models for Topic Modeling

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science 2023-12-18 Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret.…

Computation and Language · Computer Science 2021-06-18 Federico Bianchi, Silvia Terragni, Dirk Hovy

Investigating the Impact of Text Summarization on Topic Modeling

Topic models are used to identify and group similar themes in a set of documents. Recent advancements in deep-learning-based neural topic models have received significant research interest. In this paper, an approach is proposed that further…

Computation and Language · Computer Science 2024-10-15 Trishia Khandelwal

Improving Topic Models with Latent Feature Word Representations

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two…

Computation and Language · Computer Science 2018-10-16 Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson

A Large Language Model Guided Topic Refinement Mechanism for Short Text Modeling

Modeling topics effectively in short texts, such as tweets and news snippets, is crucial to capturing rapidly evolving social trends. Existing topic models often struggle to accurately capture the underlying semantic patterns of short…

Computation and Language · Computer Science 2025-02-18 Shuyu Chang, Rui Wang, Peng Ren, Qi Wang, Haiping Huang

Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs

Topic modeling is a powerful technique for uncovering hidden themes within a collection of documents. However, the effectiveness of traditional topic models often relies on sufficient word co-occurrence, which is lacking in short texts.…

Computation and Language · Computer Science 2024-10-22 Pritom Saha Akash, Kevin Chen-Chuan Chang

Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions

Traditional neural topic models are typically optimized by reconstructing the document's Bag-of-Words (BoW) representations, overlooking contextual information and struggling with data sparsity. In this work, we propose a novel approach to…

Computation and Language · Computer Science 2026-02-23 Raymond Li, Amirhossein Abaskohi, Chuyuan Li, Gabriel Murray, Giuseppe Carenini

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss

Accurately modeling idiomatic or non-compositional language has been a longstanding challenge in Natural Language Processing (NLP). This is partly because these expressions do not derive their meanings solely from their constituent words,…

Computation and Language · Computer Science 2024-09-06 Wei He, Marco Idiart, Carolina Scarton, Aline Villavicencio

Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling

Topic models are among the compelling methods for discovering latent semantics in a document collection. However, they assume that a document has sufficient co-occurrence information to be effective. In short texts, co-occurrence…

Computation and Language · Computer Science 2023-10-25 Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang

On Negative Sampling for Contrastive Audio-Text Retrieval

This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. The strategy for negative sampling refers to selecting negatives (either audio clips or textual descriptions) from a pool of…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-20 Huang Xie, Okko Räsänen, Tuomas Virtanen

Topic Modeling as Multi-Objective Contrastive Optimization

Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of…

Computation and Language · Computer Science 2025-07-15 Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

Contrastive Learning for Neural Topic Model

Recent empirical studies show that adversarial topic models (ATM) can successfully capture semantic patterns of the document by differentiating a document with another dissimilar sample. However, utilizing that discriminative-generative…

Computation and Language · Computer Science 2021-10-26 Thong Nguyen, Anh Tuan Luu

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Topic modelling is a pivotal unsupervised machine learning technique for extracting valuable insights from large document collections. Existing neural topic modelling methods often encode contextual information of documents, while ignoring…

Computation and Language · Computer Science 2025-02-07 Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic

Improving Neural Topic Models using Knowledge Distillation

Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our…

Computation and Language · Computer Science 2020-10-07 Alexander Hoyle, Pranav Goel, Philip Resnik

TopicAdapt- An Inter-Corpora Topics Adaptation Approach

Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including…

Computation and Language · Computer Science 2023-10-10 Pritom Saha Akash, Trisha Das, Kevin Chen-Chuan Chang

Testing Hypotheses of Covariate Effects on Topics of Discourse

We introduce an approach to topic modelling with document-level covariates that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model,…

Methodology · Statistics 2025-11-05 Gabriel Phelan, David A. Campbell

Neural Dynamic Focused Topic Model

Topic models and all their variants analyse text by learning meaningful representations through word co-occurrences. As pointed out by Williamson et al. (2010), such models implicitly assume that the probability of a topic to be active and…

Computation and Language · Computer Science 2023-01-27 Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms

Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual…

Computation and Language · Computer Science 2024-10-04 Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita