Related papers: Dirichlet-vMF Mixture Model

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

On the Replicability of Combining Word Embeddings and Retrieval Models

We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these…

Computation and Language · Computer Science 2020-01-15 Luca Papariello, Alexandros Bampoulidis, Mihai Lupu

Nonparametric Spherical Topic Modeling with Word Embeddings

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor…

Computation and Language · Computer Science 2016-04-04 Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman

Learning Topic Models - Going beyond SVD

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational…

Machine Learning · Computer Science 2012-04-13 Sanjeev Arora, Rong Ge, Ankur Moitra

Variable Selection for Latent Dirichlet Allocation

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method…

Machine Learning · Computer Science 2012-05-08 Dongwoo Kim, Yeonseung Chung, Alice Oh

Using Variational Inference and MapReduce to Scale Topic Modeling

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference of LDA. In this…

Artificial Intelligence · Computer Science 2011-07-20 Ke Zhai, Jordan Boyd-Graber, Nima Asadi

On PyTorch Implementation of Density Estimators for von Mises-Fisher and Its Mixture

The von Mises-Fisher (vMF) is a well-known density model for directional random variables. The recent surge of the deep embedding methodologies for high-dimensional structured data such as images or texts, aimed at extracting salient…

Machine Learning · Computer Science 2021-02-11 Minyoung Kim

Indexing by Latent Dirichlet Allocation and Ensemble Model

The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a…

Information Retrieval · Computer Science 2014-12-12 Yanshan Wang, Jae-Sung Lee, In-Chan Choi

Statistical Speech Model Description with VMF Mixture Model

In this paper, we represent the LSF parameters in unit-vector form, which has directional characteristics. The underlying distribution of this unit vector variable is modeled by a von Mises-Fisher mixture model (VMM). With the high rate…

Sound · Computer Science 2020-02-04 Zhanyu Ma, Arne Leijon

The Dynamic Embedded Topic Model

Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative…

Computation and Language · Computer Science 2019-10-14 Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

Kernel Topic Models

Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced…

Machine Learning · Computer Science 2011-10-24 Philipp Hennig, David Stern, Ralf Herbrich, Thore Graepel

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu, Xin-Yu Dai

A derivation of variational message passing (VMP) for latent Dirichlet allocation (LDA)

Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior…

Machine Learning · Computer Science 2022-08-26 Rebecca M. C. Taylor, Dirko Coetsee, Johan A. du Preez

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in the form of discovering semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo, Jhareswar Maiti

A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Another popular approach to model the multimodal data is through deep neural networks,…

Computer Vision and Pattern Recognition · Computer Science 2016-01-01 Yin Zheng, Yu-Jin Zhang, Hugo Larochelle

von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification

A number of pattern recognition tasks, e.g., face verification, can be boiled down to classification or clustering of unit length directional feature vectors whose distance can be simply computed by their angle. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2018-01-03 Md. Abul Hasnat, Julien Bohné, Jonathan Milgram, Stéphane Gentric, Liming Chen

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

Spherical Latent Spaces for Stable Variational Autoencoders

A hallmark of variational autoencoders (VAEs) for text processing is their combination of powerful encoder-decoder models, such as LSTMs, with simple latent distributions, typically multivariate Gaussians. These models pose a difficult…

Computation and Language · Computer Science 2018-10-15 Jiacheng Xu, Greg Durrett

Fast Online EM for Big Topic Modeling

The expectation-maximization (EM) algorithm can compute the maximum-likelihood (ML) or maximum a posteriori (MAP) point estimate of mixture models or latent variable models such as latent Dirichlet allocation (LDA), which has been one of…

Machine Learning · Computer Science 2015-12-08 Jia Zeng, Zhi-Qiang Liu, Xiao-Qin Cao

Autoencoding Variational Inference For Topic Models

Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address…

Machine Learning · Statistics 2017-03-07 Akash Srivastava, Charles Sutton