Related papers: Model-Parallel Inference for Big Topic Models

200 papers

Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We…

Machine Learning · Statistics 2017-08-16 Måns Magnusson, Leif Jonsson, Mattias Villani, David Broman

This paper presents our recent efforts, zenLDA, an efficient and scalable Collapsed Gibbs Sampling system for Latent Dirichlet Allocation training, which is thought to be challenging that both data parallelism and model parallelism are…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-23 Bo Zhao, Hucheng Zhou, Guoqiang Li, Yihua Huang

When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for…

Machine Learning · Statistics 2014-12-05 Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric P. Xing, Tie-Yan Liu, Wei-Ying Ma

The increasing scale of model size and continuous improvement of performance herald the arrival of the Big Model era. In this report, we explore what and how the big model training works by diving into training objectives and training…

Machine Learning · Computer Science 2022-07-26 Qinghua Liu, Yuxiang Jiang

To solve the big topic modeling problem, we need to reduce both time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space…

Machine Learning · Computer Science 2013-11-19 Jian-Feng Yan, Jia Zeng, Zhi-Qiang Liu, Yang Gao

Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before…

Machine Learning · Statistics 2016-02-22 Arnab Bhadury, Jianfei Chen, Jun Zhu, Shixia Liu

Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating…

Computation and Language · Computer Science 2017-09-20 He Zhao, Lan Du, Wray Buntine, Gang Liu

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that…

Machine Learning · Statistics 2020-10-23 Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

In the internet era there has been an explosion in the amount of digital text information available, leading to difficulties of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference…

Machine Learning · Computer Science 2013-05-14 James Foulds, Levi Boyles, Christopher Dubois, Padhraic Smyth, Max Welling

In this paper, we provide the first practical algorithms with provable guarantees for the problem of inferring the topics assigned to each document in an LDA topic model. This is the primary inference problem for many applications of topic…

Machine Learning · Computer Science 2025-06-10 Adam Breuer

As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be…

Machine Learning · Computer Science 2012-06-11 Jia Zeng, Zhi-Qiang Liu, Xiao-Qin Cao

Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain…

Computation and Language · Computer Science 2024-03-27 Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

We present an LDA approach to entity disambiguation. Each topic is associated with a Wikipedia article and topics generate either content words or entity mentions. Training such models is challenging because of the topic and vocabulary…

Machine Learning · Statistics 2013-09-03 Neil Houlsby, Massimiliano Ciaramita

Latent Dirichlet Allocation (LDA) is a prominent generative probabilistic model used for uncovering abstract topics within document collections. In this paper, we explore the effectiveness of augmenting topic models with Large Language…

Computation and Language · Computer Science 2025-07-14 Mengze Hong, Chen Jason Zhang, Di Jiang

The abundant sequential documents such as online archival, social media and news feeds are streamingly updated, where each chunk of documents is incorporated with smoothly evolving yet dependent topics. Such digital texts have attracted…

Information Retrieval · Computer Science 2021-06-28 Jinjin Guo, Longbing Cao, Zhiguo Gong

Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word co-occurrence patterns, with Latent Dirichlet Allocation (LDA) remaining one of the most…

Computation and Language · Computer Science 2026-05-29 Alex Ding, Tarun Rapaka, Willy Rodriguez, Jason Yang

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-25 Haowei Yang, Yu Tian, Zhongheng Yang, Zhao Wang, Chengrui Zhou, Dannier Li

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling,…

Machine Learning · Computer Science 2012-05-14 Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Current topic models often suffer from discovering topics not matching human intuition, unnatural switching of topics within documents and high computational demands. We address these concerns by proposing a topic model and an inference…

Computation and Language · Computer Science 2018-02-06 Johannes Schneider

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen