
Related papers: Combinatorial Topic Models using Small-Variance As…

200 papers

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational…

Machine Learning · Computer Science 2012-04-13 Sanjeev Arora , Rong Ge , Ankur Moitra

Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

In this work, an automatic analysis of themes contained in a large corpus of judgments from the public procurement domain is performed. The employed technique is unsupervised latent Dirichlet allocation (LDA). In addition, it is proposed to use…

Computation and Language · Computer Science 2014-12-18 Michał Łopuszyński

Nowadays, data analysis has become a problem as the amount of data is constantly increasing. In order to overcome this problem in textual data, many models and methods are used in natural language processing. The topic modeling field is one…

Computation and Language · Computer Science 2021-10-22 Zekeriya Anil Guven , Banu Diri , Tolgahan Cakaloglu

With the advent and popularity of big data mining and large-scale text analysis in modern times, automated text summarization has become prominent for extracting and retrieving important information from documents. This research investigates aspects…

Information Retrieval · Computer Science 2023-05-31 Daniel F. O. Onah , Elaine L. L. Pang , Mahmoud El-Haj

In this paper, we provide the first practical algorithms with provable guarantees for the problem of inferring the topics assigned to each document in an LDA topic model. This is the primary inference problem for many applications of topic…

Machine Learning · Computer Science 2025-06-10 Adam Breuer

Latent Dirichlet Allocation (LDA) is a three-level hierarchical Bayesian model for topic inference. In spite of its great success, inferring the latent topic distribution with LDA is time-consuming. Motivated by the transfer learning…

Machine Learning · Computer Science 2015-08-06 Dongxu Zhang , Tianyi Luo , Dong Wang , Rong Liu

Most of the information on the Internet is represented in the form of microtexts, which are short text snippets such as news headlines or tweets. These sources of information are abundant, and mining these data could uncover meaningful…

Computation and Language · Computer Science 2019-09-17 Trung Trinh , Tho Quan , Trung Mai

This paper presents an algorithm for the unsupervised learning of latent variable models from unlabeled sets of data. We base our technique on spectral decomposition, providing a technique that proves to be robust both in theory and in…

Machine Learning · Statistics 2017-04-05 Matteo Ruffini , Marta Casanellas , Ricard Gavaldà

We study a parametric family of latent variable models, namely topic models, equipped with a hierarchical structure among the topic variables. Such models may be viewed as a finite mixture of the latent Dirichlet allocation (LDA) induced…

Statistics Theory · Mathematics 2024-08-27 Sunrit Chakraborty , Rayleigh Lei , XuanLong Nguyen

Social scientists employ latent Dirichlet allocation (LDA) to find highly specific topics in large corpora, but they often struggle in this task because (1) LDA, in general, takes a significant amount of time to fit on large corpora; (2)…

Methodology · Statistics 2025-12-23 Kohei Watanabe

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying…

Machine Learning · Computer Science 2022-08-22 Rebecca M. C. Taylor , Johan A. du Preez

Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision and computational…

Machine Learning · Computer Science 2012-08-14 Jia Zeng

Topic modeling is admittedly a convenient way to monitor market trends. Conventionally, Latent Dirichlet Allocation (LDA) is considered a must-do model for gaining this type of information. Given its merit of deducing keywords with token…

Computation and Language · Computer Science 2023-09-19 Ching-Hsun Tseng , Shin-Jye Lee , Po-Wei Cheng , Chien Lee , Chih-Chieh Hung

Latent Dirichlet Allocation (LDA) models trained without stopword removal often produce topics with high posterior probabilities on uninformative words, obscuring the underlying corpus content. Even when canonical stopwords are manually…

Computation and Language · Computer Science 2017-10-17 Angela Fan , Finale Doshi-Velez , Luke Miratrix

Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long-text topic modeling algorithms…

Information Retrieval · Computer Science 2019-04-17 Qiang Jipeng , Qian Zhenyu , Li Yun , Yuan Yunhao , Wu Xindong

We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate…

Information Retrieval · Computer Science 2015-07-24 Ashwinkumar Ganesan , Kiante Brantley , Shimei Pan , Jian Chen

Current topic models often suffer from discovering topics not matching human intuition, unnatural switching of topics within documents and high computational demands. We address these concerns by proposing a topic model and an inference…

Computation and Language · Computer Science 2018-02-06 Johannes Schneider

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which…

Machine Learning · Statistics 2010-03-04 David M. Blei , Jon D. McAuliffe

Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of…

Information Retrieval · Computer Science 2019-10-07 Chris Gropp , Alexander Herzog , Ilya Safro , Paul W. Wilson , Amy W. Apon