Related papers: Combinatorial Topic Models using Small-Variance As…

Quantifying consistency and accuracy of Latent Dirichlet Allocation

Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines.…

Computation and Language · Computer Science · 2025-11-18 · Saranzaya Magsarjav, Melissa Humphries, Jonathan Tuke, Lewis Mitchell

Analyzing Political Text at Scale with Online Tensor LDA

This paper proposes a topic modeling method that scales linearly to billions of documents. We make three core contributions: i) we present a topic modeling method, Tensor Latent Dirichlet Allocation (TLDA), that has identifiable and…

Machine Learning · Computer Science · 2026-01-14 · Sara Kangaslahti, Danny Ebanks, Jean Kossaifi, Anqi Liu, R. Michael Alvarez, Animashree Anandkumar

Uncovering hidden patterns in collider events with Bayesian probabilistic models

Individual events at high-energy colliders like the LHC can be represented by a sequence of measurements, or 'point patterns' in an observable space. Starting from this data representation, we build a simple Bayesian probabilistic model for…

High Energy Physics - Phenomenology · Physics · 2020-12-17 · Darius A. Faroughy

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet allocation. When run on different datasets, LDA suffers from "order effects" i.e. different topics are…

Software Engineering · Computer Science · 2018-03-16 · Amritanshu Agrawal, Wei Fu, Tim Menzies

Identifying Patterns of Associated-Conditions through Topic Models of Electronic Medical Records

Multiple adverse health conditions co-occurring in a patient are typically associated with poor prognosis and increased office or hospital visits. Developing methods to identify patterns of co-occurring conditions can assist in diagnosis.…

Computation and Language · Computer Science · 2017-11-30 · Moumita Bhattacharya, Claudine Jurkovitz, Hagit Shatkay

Topic Modeling Analysis of Aviation Accident Reports: A Comparative Study between LDA and NMF Models

Aviation safety is paramount in the modern world, with a continuous commitment to reducing accidents and improving safety standards. Central to this endeavor is the analysis of aviation accident reports, rich textual resources that hold…

Computation and Language · Computer Science · 2024-03-11 · Aziida Nanyonga, Hassan Wasswa, Graham Wild

Prompting Large Language Models for Topic Modeling

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science · 2023-12-18 · Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models

Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without…

Computation and Language · Computer Science · 2021-08-25 · Jin Cheevaprawatdomrong, Alexandra Schofield, Attapol T. Rutherford

Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly…

Machine Learning · Statistics · 2017-11-15 · Michael C. Hughes, Leah Weiner, Gabriel Hope, Thomas H. McCoy, Roy H. Perlis, Erik B. Sudderth, Finale Doshi-Velez

A Discrete Variational Recurrent Topic Model without the Reparametrization Trick

We show how to learn a neural topic model with discrete random variables (one that explicitly models each word's assigned topic) using neural variational inference that does not rely on stochastic backpropagation to handle the discrete…

Machine Learning · Computer Science · 2020-10-26 · Mehdi Rezaee, Francis Ferraro

Neural Dynamic Focused Topic Model

Topic models and all their variants analyse text by learning meaningful representations through word co-occurrences. As pointed out by Williamson et al. (2010), such models implicitly assume that the probability of a topic to be active and…

Computation and Language · Computer Science · 2023-01-27 · Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

TMT: A Simple Way to Translate Topic Models Using Dictionaries

The training of topic models for a multilingual environment is a challenging task, requiring the use of sophisticated algorithms, topic-aligned corpora, and manual evaluation. These difficulties are further exacerbated when the developer…

Computation and Language · Computer Science · 2025-09-03 · Felix Engl, Andreas Henrich

Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We…

Computation and Language · Computer Science · 2025-06-10 · Pritom Saha Akash, Kevin Chen-Chuan Chang

Probabilistic Topic Modelling with Transformer Representations

Topic modelling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in Natural Language Processing, however, several successful models that rely on straightforward clustering approaches in…

Machine Learning · Computer Science · 2024-03-07 · Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin Säfken, Thomas Kneib

DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for…

Machine Learning · Statistics · 2016-10-21 · Måns Magnusson, Leif Jonsson, Mattias Villani

The Exact Asymptotic Form of Bayesian Generalization Error in Latent Dirichlet Allocation

Latent Dirichlet allocation (LDA) obtains essential information from data by using Bayesian inference. It is applied to knowledge discovery via dimension reduction and clustering in many fields. However, its generalization error had not been…

Machine Learning · Statistics · 2021-01-26 · Naoki Hayashi

On Privacy Protection of Latent Dirichlet Allocation Model Training

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine…

Machine Learning · Computer Science · 2019-07-02 · Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang

Incremental Variational Inference for Latent Dirichlet Allocation

We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA). Incremental variational inference is inspired by incremental EM and provides an alternative to stochastic variational inference. Incremental…

Machine Learning · Statistics · 2015-07-23 · Cedric Archambeau, Beyza Ermis

Topic Analysis for Text with Side Data

Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with…

Machine Learning · Computer Science · 2022-03-03 · Biyi Fang, Kripa Rajshekhar, Diego Klabjan

Analysing the Impact of Removing Infrequent Words on Topic Quality in LDA Models

An initial procedure in text-as-data applications is text preprocessing. One of the typical steps, which can substantially facilitate computations, consists in removing infrequent words believed to provide limited information about the…

Computation and Language · Computer Science · 2023-11-27 · Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker