Related papers: Kernel Topic Models

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in the form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo, Jhareswar Maiti

Gaussian mixture models in Hilbert spaces via kernel methods

Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings,…

Machine Learning · Statistics 2026-05-08 Daniel López-Montero, Antonio Álvarez-López, Marcos Matabuena

Gaussian Hierarchical Latent Dirichlet Allocation: Bringing Polysemy Back

Topic models are widely used to discover the latent representation of a set of documents. The two canonical models are latent Dirichlet allocation, and Gaussian latent Dirichlet allocation, where the former uses multinomial distributions…

Machine Learning · Statistics 2023-06-08 Takahiro Yoshida, Ryohei Hisano, Takaaki Ohnishi

On Smoothing and Inference for Topic Models

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling,…

Machine Learning · Computer Science 2012-05-14 Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Gaussian Process Topic Models

We introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic…

Machine Learning · Computer Science 2012-03-19 Amrudin Agovic, Arindam Banerjee

Model Selection for Topic Models via Spectral Decomposition

Topic models have achieved significant successes in analyzing large-scale text corpora. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following…

Machine Learning · Statistics 2015-02-18 Dehua Cheng, Xinran He, Yan Liu

A correlated topic model of Science

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics,…

Applications · Statistics 2009-09-29 David M. Blei, John D. Lafferty

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

Dirichlet moment tensors and the correspondence between admixture and mixture of product models

Understanding posterior contraction behavior in Bayesian hierarchical models is of fundamental importance, but progress in this question is relatively sparse in comparison to the theory of density estimation. In this paper, we study two…

Statistics Theory · Mathematics 2025-12-22 Dat Do, Sunrit Chakraborty, Jonathan Terhorst, XuanLong Nguyen

Geometric Dirichlet Means algorithm for topic inference

We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the…

Machine Learning · Statistics 2016-10-31 Mikhail Yurochkin, XuanLong Nguyen

Topic Modeling in Embedding Spaces

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

Ordering-sensitive and Semantic-aware Topic Modeling

Topic modeling of textual corpora is an important and challenging problem. In most previous work, the "bag-of-words" assumption is usually made which ignores the ordering of words. This assumption simplifies the computation, but it…

Machine Learning · Computer Science 2015-02-13 Min Yang, Tianyi Cui, Wenting Tu

Topic Modeling of Hierarchical Corpora

We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy. We explore a parametric approach to this problem, assuming that the number of topics is known or can be estimated by…

Machine Learning · Statistics 2015-04-14 Do-kyum Kim, Geoffrey M. Voelker, Lawrence K. Saul

Improving Topic Models with Latent Feature Word Representations

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two…

Computation and Language · Computer Science 2018-10-16 Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson

Keyword Assisted Embedded Topic Model

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA),…

Information Retrieval · Computer Science 2021-12-07 Bahareh Harandizadeh, J. Hunter Priniski, Fred Morstatter

An Adaptation of Topic Modeling to Sentences

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this…

Computation and Language · Computer Science 2016-07-21 Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and…

Information Retrieval · Computer Science 2018-12-07 Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, Liang Zhao

Towards a Kernel based Uncertainty Decomposition Framework for Data and Models

This paper introduces a new framework for quantifying predictive uncertainty for both data and models that relies on projecting the data into a Gaussian reproducing kernel Hilbert space (RKHS) and transforming the data probability density…

Machine Learning · Computer Science 2021-09-24 Rishabh Singh, Jose C. Principe

The Dynamic Embedded Topic Model

Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative…

Computation and Language · Computer Science 2019-10-14 Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

A Gamma-Poisson Mixture Topic Model for Short Text

Most topic models are constructed under the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative distribution to describe the probability of count data. For topic modelling, the Poisson…

Computation and Language · Computer Science 2020-04-27 Jocelyn Mazarura, Alta de Waal, Pieter de Villiers