Related papers: Supervised topic models for clinical interpretabil…

Prediction-Constrained Topic Models for Antidepressant Recommendation

Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals:…

Machine Learning · Computer Science 2017-12-05 Michael C. Hughes, Gabriel Hope, Leah Weiner, Thomas H. McCoy, Roy H. Perlis, Erik B. Sudderth, Finale Doshi-Velez

Supervised Topic Models

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which…

Machine Learning · Statistics 2010-03-04 David M. Blei, Jon D. McAuliffe

MedLDA: A General Framework of Maximum Margin Supervised Topic Models

Supervised topic models utilize documents' side information for discovering predictive low-dimensional representations of documents. Existing models apply likelihood-based estimation. In this paper, we present a general framework of…

Machine Learning · Statistics 2013-04-09 Jun Zhu, Amr Ahmed, Eric P. Xing

Combinatorial Topic Models using Small-Variance Asymptotics

Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its variants. In…

Machine Learning · Computer Science 2016-05-30 Ke Jiang, Suvrit Sra, Brian Kulis

Topic Analysis with Side Information: A Neural-Augmented LDA Approach

Traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely used to uncover latent structures in text corpora, but they often struggle to integrate auxiliary information such as metadata, user attributes, or document…

Machine Learning · Computer Science 2025-11-04 Biyi Fang, Truong Vo, Kripa Rajshekhar, Diego Klabjan

Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly…

Machine Learning · Statistics 2017-11-15 Michael C. Hughes, Leah Weiner, Gabriel Hope, Thomas H. McCoy, Roy H. Perlis, Erik B. Sudderth, Finale Doshi-Velez

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles in the field of topic modeling and…

Information Retrieval · Computer Science 2018-12-07 Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, Liang Zhao

Modeling Word Relatedness in Latent Dirichlet Allocation

The standard LDA model suffers from the problem that the topic assignment of each word is independent, so word correlation is neglected. To address this problem, in this paper, we propose a model called Word Related Latent Dirichlet Allocation…

Computation and Language · Computer Science 2014-11-11 Xun Wang

Discriminative Topic Modeling with Logistic LDA

Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging. Yet many problems with much richer data share a similar structure and could benefit from the…

Machine Learning · Statistics 2020-01-08 Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis

Graph-Sparse LDA: A Topic Model with Structured Sparsity

Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the…

Machine Learning · Statistics 2014-11-24 Finale Doshi-Velez, Byron Wallace, Ryan Adams

Survival-supervised latent Dirichlet allocation models for genomic analysis of time-to-event outcomes

Two challenging problems in the clinical study of cancer are the characterization of cancer subtypes and the classification of individual patients according to those subtypes. Statistical approaches addressing these problems are hampered by…

Methodology · Statistics 2012-02-28 John A. Dawson, Christina Kendziorski

A Spectral Algorithm for Latent Dirichlet Allocation

The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of…

Machine Learning · Computer Science 2013-01-21 Animashree Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Yi-Kai Liu

E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time

In this paper, we provide the first practical algorithms with provable guarantees for the problem of inferring the topics assigned to each document in an LDA topic model. This is the primary inference problem for many applications of topic…

Machine Learning · Computer Science 2025-06-10 Adam Breuer

A Supervised Neural Autoregressive Topic Model for Simultaneous Image Classification and Annotation

Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to perform scene recognition and annotation. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator…

Computer Vision and Pattern Recognition · Computer Science 2013-05-24 Yin Zheng, Yu-Jin Zhang, Hugo Larochelle

Prior matters: simple and general methods for evaluating and improving topic quality in topic modeling

Latent Dirichlet Allocation (LDA) models trained without stopword removal often produce topics with high posterior probabilities on uninformative words, obscuring the underlying corpus content. Even when canonical stopwords are manually…

Computation and Language · Computer Science 2017-10-17 Angela Fan, Finale Doshi-Velez, Luke Miratrix

A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling

With the advent and popularity of big data mining and huge text analysis in modern times, automated text summarization became prominent for extracting and retrieving important information from documents. This research investigates aspects…

Information Retrieval · Computer Science 2023-05-31 Daniel F. O. Onah, Elaine L. L. Pang, Mahmoud El-Haj

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas, Massih-Reza Amini, Marianne Clausel

Variable Selection for Latent Dirichlet Allocation

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method…

Machine Learning · Computer Science 2012-05-08 Dongwoo Kim, Yeonseung Chung, Alice Oh

Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of…

Information Retrieval · Computer Science 2019-10-07 Chris Gropp, Alexander Herzog, Ilya Safro, Paul W. Wilson, Amy W. Apon

Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to…

Computation and Language · Computer Science 2020-01-10 Xiaofeng Zhu, Diego Klabjan, Patrick Bless