Related papers: Combinatorial Topic Models using Small-Variance As…

200 papers

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a…

Machine Learning · Statistics 2018-07-20 Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that…

Machine Learning · Statistics 2020-10-23 Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

Inferring topics from the overwhelming amount of short texts has become a critical but challenging task for many content analysis applications, such as content characterization, user interest profiling, and emerging topic detection. Existing methods such…

Computation and Language · Computer Science 2016-09-28 Jipeng Qiang, Ping Chen, Tong Wang, Xindong Wu

Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word co-occurrence patterns, with Latent Dirichlet Allocation (LDA) remaining one of the most…

Computation and Language · Computer Science 2026-05-29 Alex Ding, Tarun Rapaka, Willy Rodriguez, Jason Yang

Latent Dirichlet allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engine and online advertising systems. A main underlying reason is that the…

Information Retrieval · Computer Science 2015-12-08 Yi Wang, Xuemin Zhao, Zhenlong Sun, Hao Yan, Lifeng Wang, Zhihui Jin, Liubin Wang, Yang Gao, Ching Law, Jia Zeng

As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be…

Machine Learning · Computer Science 2012-06-11 Jia Zeng, Zhi-Qiang Liu, Xiao-Qin Cao

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two…

Computation and Language · Computer Science 2018-10-16 Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures is NP-hard. Assuming…

Machine Learning · Statistics 2014-11-05 Trapit Bansal, Chiranjib Bhattacharyya, Ravindran Kannan

A text mining approach is proposed based on latent Dirichlet allocation (LDA) to analyze the Consumer Financial Protection Bureau (CFPB) consumer complaints. The proposed approach aims to extract latent topics in the CFPB complaint…

Information Retrieval · Computer Science 2018-07-20 Kaveh Bastani, Hamed Namavari, Jeffry Shaffer

Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including…

Machine Learning · Statistics 2013-05-27 Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas, Massih-Reza Amini, Marianne Clausel

Variational inference is a very efficient and popular heuristic used in various forms in the context of latent variable models. It's closely related to Expectation Maximization (EM), and is applied when exact EM is computationally…

Machine Learning · Computer Science 2015-08-25 Pranjal Awasthi, Andrej Risteski

The Latent Dirichlet Allocation (LDA) model is a popular method for creating mixed-membership clusters. Despite having been originally developed for text analysis, LDA has been used for a wide range of other applications. We propose a new…

Information Retrieval · Computer Science 2022-02-24 Gilson Shimizu, Rafael Izbicki, Denis Valle

Topic modeling is widely used for analytically evaluating large collections of textual data. One of the most popular topic techniques is Latent Dirichlet Allocation (LDA), which is flexible and adaptive, but not optimal for e.g. short texts…

Computation and Language · Computer Science 2022-12-19 Muriël de Groot, Mohammad Aliannejadi, Marcel R. Haas

We propose a parsimonious topic model for text corpora. In related models such as Latent Dirichlet Allocation (LDA), all words are modeled topic-specifically, even though many words occur with similar frequencies across different topics.…

Machine Learning · Computer Science 2016-05-16 Hossein Soleimani, David J. Miller

A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these…

Computation and Language · Computer Science 2017-05-19 Justin Wood, Patrick Tan, Wei Wang, Corey Arnold

Topic modeling has found wide application in many problems where latent structures of the data are crucial for typical inference tasks. When applying a topic model, a relatively standard pre-processing step is to first build a vocabulary of…

Computer Vision and Pattern Recognition · Computer Science 2020-01-17 Yuzhen Ding, Baoxin Li

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic…

Computation and Language · Computer Science 2023-03-31 Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken

Choosing the number of topics $T$ in Latent Dirichlet Allocation (LDA) is a key design decision that strongly affects both the statistical fit and interpretability of topic models. In this work, we formulate the selection of $T$ as a…

Machine Learning · Computer Science 2025-12-19 Roman Akramov, Artem Khamatullin, Svetlana Glazyrina, Maksim Kryzhanovskiy, Roman Ischenko

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other…

Machine Learning · Statistics 2017-09-19 Yannis Papanikolaou, Grigorios Tsoumakas