Related papers: Combinatorial Topic Models using Small-Variance As…

200 papers

Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines.…

Computation and Language · Computer Science 2025-11-18 Saranzaya Magsarjav, Melissa Humphries, Jonathan Tuke, Lewis Mitchell

This paper proposes a topic modeling method that scales linearly to billions of documents. We make three core contributions: i) we present a topic modeling method, Tensor Latent Dirichlet Allocation (TLDA), that has identifiable and…

Machine Learning · Computer Science 2026-01-14 Sara Kangaslahti, Danny Ebanks, Jean Kossaifi, Anqi Liu, R. Michael Alvarez, Animashree Anandkumar

Individual events at high-energy colliders like the LHC can be represented by a sequence of measurements, or 'point patterns' in an observable space. Starting from this data representation, we build a simple Bayesian probabilistic model for…

High Energy Physics - Phenomenology · Physics 2020-12-17 Darius A. Faroughy

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet allocation. When run on different datasets, LDA suffers from "order effects" i.e. different topics are…

Software Engineering · Computer Science 2018-03-16 Amritanshu Agrawal, Wei Fu, Tim Menzies

Multiple adverse health conditions co-occurring in a patient are typically associated with poor prognosis and increased office or hospital visits. Developing methods to identify patterns of co-occurring conditions can assist in diagnosis.…

Computation and Language · Computer Science 2017-11-30 Moumita Bhattacharya, Claudine Jurkovitz, Hagit Shatkay

Aviation safety is paramount in the modern world, with a continuous commitment to reducing accidents and improving safety standards. Central to this endeavor is the analysis of aviation accident reports, rich textual resources that hold…

Computation and Language · Computer Science 2024-03-11 Aziida Nanyonga, Hassan Wasswa, Graham Wild

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science 2023-12-18 Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without…

Computation and Language · Computer Science 2021-08-25 Jin Cheevaprawatdomrong, Alexandra Schofield, Attapol T. Rutherford

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly…

We show how to learn a neural topic model with discrete random variables---one that explicitly models each word's assigned topic---using neural variational inference that does not rely on stochastic backpropagation to handle the discrete…

Machine Learning · Computer Science 2020-10-26 Mehdi Rezaee, Francis Ferraro

Topic models and all their variants analyse text by learning meaningful representations through word co-occurrences. As pointed out by Williamson et al. (2010), such models implicitly assume that the probability of a topic to be active and…

Computation and Language · Computer Science 2023-01-27 Kostadin Cvejoski, Ramsés J. Sánchez, César Ojeda

The training of topic models for a multilingual environment is a challenging task, requiring the use of sophisticated algorithms, topic-aligned corpora, and manual evaluation. These difficulties are further exacerbated when the developer…

Computation and Language · Computer Science 2025-09-03 Felix Engl, Andreas Henrich

Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We…

Computation and Language · Computer Science 2025-06-10 Pritom Saha Akash, Kevin Chen-Chuan Chang

Topic modelling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in Natural Language Processing, however, several successful models that rely on straightforward clustering approaches in…

Machine Learning · Computer Science 2024-03-07 Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin Säfken, Thomas Kneib

Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for…

Machine Learning · Statistics 2016-10-21 Måns Magnusson, Leif Jonsson, Mattias Villani

Latent Dirichlet allocation (LDA) obtains essential information from data by using Bayesian inference. It is applied to knowledge discovery via dimension reducing and clustering in many fields. However, its generalization error had not been…

Machine Learning · Statistics 2021-01-26 Naoki Hayashi

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine…

Machine Learning · Computer Science 2019-07-02 Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang

We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA). Incremental variational inference is inspired by incremental EM and provides an alternative to stochastic variational inference. Incremental…

Machine Learning · Statistics 2015-07-23 Cedric Archambeau, Beyza Ermis

Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with…

Machine Learning · Computer Science 2022-03-03 Biyi Fang, Kripa Rajshekhar, Diego Klabjan

An initial procedure in text-as-data applications is text preprocessing. One of the typical steps, which can substantially facilitate computations, consists in removing infrequent words believed to provide limited information about the…

Computation and Language · Computer Science 2023-11-27 Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker