Related papers: TopicModel4J: A Java Package for Topic Models

TopicGPT: A Prompt-based Topic Modeling Framework

Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users…

Computation and Language · Computer Science 2024-04-03 Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

jLDADMM: A Java package for the LDA and DMM topic models

In this technical report, we present jLDADMM---an easy-to-use Java toolkit for conventional topic models. jLDADMM is released to provide alternatives for topic modeling on normal or short texts. It provides implementations of the Latent…

Information Retrieval · Computer Science 2018-08-14 Dat Quoc Nguyen

STTM: A Tool for Short Text Topic Modeling

Along with the emergence and popularity of social communications on the Internet, topic discovery from short texts becomes fundamental to many applications that require semantic understanding of textual content. As a rising research field,…

Information Retrieval · Computer Science 2018-08-08 Jipeng Qiang, Yun Li, Yunhao Yuan, Wei Liu, Xindong Wu

Prompting Large Language Models for Topic Modeling

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science 2023-12-18 Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

GPTopic: Dynamic and Interactive Topic Representations

Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such list of individual terms can require substantial expertise and experience,…

Computation and Language · Computer Science 2025-11-21 Arik Reuter, Bishnu Khadka, Anton Thielmann, Christoph Weisser, Sebastian Fischer, Benjamin Säfken

Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions

Traditional neural topic models are typically optimized by reconstructing the document's Bag-of-Words (BoW) representations, overlooking contextual information and struggling with data sparsity. In this work, we propose a novel approach to…

Computation and Language · Computer Science 2026-02-23 Raymond Li, Amirhossein Abaskohi, Chuyuan Li, Gabriel Murray, Giuseppe Carenini

Revisiting Topic-Guided Language Models

A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover…

Computation and Language · Computer Science 2023-12-06 Carolina Zheng, Keyon Vafa, David M. Blei

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic…

Computation and Language · Computer Science 2023-03-31 Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken

Short Text Topic Modeling Techniques, Applications, and Performance: A Survey

Analyzing short texts infers discriminative and coherent latent topics that is a critical and fundamental task since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms…

Information Retrieval · Computer Science 2019-04-17 Qiang Jipeng, Qian Zhenyu, Li Yun, Yuan Yunhao, Wu Xindong

Topic Modelling Meets Deep Neural Networks: A Survey

Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred…

Machine Learning · Computer Science 2021-03-02 He Zhao, Dinh Phung, Viet Huynh, Yuan Jin, Lan Du, Wray Buntine

Procedural Text Mining with Large Language Models

Recent advancements in the field of Natural Language Processing, particularly the development of large-scale language models that are pretrained on vast amounts of knowledge, are creating novel opportunities within the realm of Knowledge…

Computation and Language · Computer Science 2023-10-06 Anisa Rula, Jennifer D'Souza

Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling

Topic models are one of the compelling methods for discovering latent semantics in a document collection. However, it assumes that a document has sufficient co-occurrence information to be effective. However, in short texts, co-occurrence…

Computation and Language · Computer Science 2023-10-25 Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang

Nonparametric Relational Topic Models through Dependent Gamma Processes

Traditional Relational Topic Models provide a way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensional reduction, document clustering, link prediction, benefit from this revealed…

Machine Learning · Statistics 2015-03-31 Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Yi Da Xu, Xiangfeng Luo

LLM-Assisted Topic Reduction for BERTopic on Social Media Data

The BERTopic framework leverages transformer embeddings and hierarchical clustering to extract latent topics from unstructured text corpora. While effective, it often struggles with social media data, which tends to be noisy and sparse,…

Computation and Language · Computer Science 2025-09-25 Wannes Janssens, Matthias Bogaert, Dirk Van den Poel

NLOMJ--Natural Language Object Model in Java

In this paper we present NLOMJ--a natural language object model in Java with English as the experiment language. This model describes the grammar elements of any permissible expression in a natural language and their complicated relations…

Computation and Language · Computer Science 2007-05-23 Jiyou Jia

Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs

Topic modeling is a powerful technique for uncovering hidden themes within a collection of documents. However, the effectiveness of traditional topic models often relies on sufficient word co-occurrence, which is lacking in short texts.…

Computation and Language · Computer Science 2024-10-22 Pritom Saha Akash, Kevin Chen-Chuan Chang

Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain…

Computation and Language · Computer Science 2024-03-27 Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

Large Language Models as Data Preprocessors

Large Language Models (LLMs), typified by OpenAI's GPT, have marked a significant advancement in artificial intelligence. Trained on vast amounts of text data, LLMs are capable of understanding and generating human-like text across a…

Artificial Intelligence · Computer Science 2024-10-29 Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

Towards the TopMost: A Topic Modeling System Toolkit

Topic models have a rich history with various applications and have recently been reinvigorated by neural topic modeling. However, these numerous topic models adopt totally distinct datasets, implementations, and evaluations. This impedes…

Computation and Language · Computer Science 2024-06-17 Xiaobao Wu, Fengjun Pan, Anh Tuan Luu

A Language Model of Java Methods with Train/Test Deduplication

This tool demonstration presents a research toolkit for a language model of Java source code. The target audience includes researchers studying problems at the granularity level of subroutines, statements, or variables in Java. In contrast…

Software Engineering · Computer Science 2023-05-16 Chia-Yi Su, Aakash Bansal, Vijayanta Jain, Sepideh Ghanavati, Collin McMillan