Related papers: Topic Analysis for Text with Side Data

Topic Analysis with Side Information: A Neural-Augmented LDA Approach

Traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely used to uncover latent structures in text corpora, but they often struggle to integrate auxiliary information such as metadata, user attributes, or document…

Machine Learning · Computer Science 2025-11-04 Biyi Fang , Truong Vo , Kripa Rajshekhar , Diego Klabjan

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas , Massih-Reza Amini , Marianne Clausel

Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these…

Computation and Language · Computer Science 2017-05-19 Justin Wood , Patrick Tan , Wei Wang , Corey Arnold

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo , Jhareswar Maiti

Evaluating Topic Quality with Posterior Variability

Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA…

Computation and Language · Computer Science 2019-09-17 Linzi Xing , Michael J. Paul , Giuseppe Carenini

Graph Topic Modeling for Documents with Spatial or Covariate Dependencies

We address the challenge of incorporating document-level metadata into topic modeling to improve topic mixture estimation. To overcome the computational complexity and lack of theoretical guarantees in existing Bayesian methods, we extend…

Machine Learning · Computer Science 2025-03-18 Yeo Jin Jung , Claire Donnat

The Author-Topic Model for Authors and Documents

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over…

Information Retrieval · Computer Science 2012-07-19 Michal Rosen-Zvi , Thomas Griffiths , Mark Steyvers , Padhraic Smyth

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic…

Computation and Language · Computer Science 2023-03-31 Anton Thielmann , Quentin Seifert , Arik Reuter , Elisabeth Bergherr , Benjamin Säfken

Hierarchical Topic Presence Models

Topic models analyze text from a set of documents. Documents are modeled as a mixture of topics, with topics defined as probability distributions on words. Inferences of interest include the most probable topics and characterization of a…

Information Retrieval · Computer Science 2021-04-19 Jason Wang , Robert E. Weiss

Latent Topic Models for Hypertext

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collection such as the Internet, there has also been great interest…

Information Retrieval · Computer Science 2012-06-18 Amit Gruber , Michal Rosen-Zvi , Yair Weiss

A Novel Document Generation Process for Topic Detection based on Hierarchical Latent Tree Models

We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each…

Computation and Language · Computer Science 2019-07-01 Peixian Chen , Zhourong Chen , Nevin L. Zhang

Latent Tree Models for Hierarchical Topic Detection

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree…

Computation and Language · Computer Science 2016-12-22 Peixian Chen , Nevin L. Zhang , Tengfei Liu , Leonard K. M. Poon , Zhourong Chen , Farhan Khawar

Hierarchical Latent Semantic Mapping for Automated Topic Generation

Much of information sits in an unprecedented amount of text data. Managing allocation of these large scale text data is an important problem for many areas. Topic modeling performs well in this problem. The traditional generative models…

Machine Learning · Computer Science 2015-11-30 Guorui Zhou , Guang Chen

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and…

Information Retrieval · Computer Science 2018-12-07 Hamed Jelodar , Yongli Wang , Chi Yuan , Xia Feng , Xiahui Jiang , Yanchao Li , Liang Zhao

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from…

Machine Learning · Computer Science 2013-01-30 Thomas Hofmann

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set,…

Artificial Intelligence · Computer Science 2008-08-08 Chaitanya Chemudugunta , Padhraic Smyth , Mark Steyvers

Keyword Assisted Embedded Topic Model

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA),…

Information Retrieval · Computer Science 2021-12-07 Bahareh Harandizadeh , J. Hunter Priniski , Fred Morstatter

Nested Variational Autoencoder for Topic Modeling on Microtexts with Word Vectors

Most of the information on the Internet is represented in the form of microtexts, which are short text snippets such as news headlines or tweets. These sources of information are abundant, and mining these data could uncover meaningful…

Computation and Language · Computer Science 2019-09-17 Trung Trinh , Tho Quan , Trung Mai

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in nature language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

Learning Topic Models by Belief Propagation

Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interests and touches on many important applications in text mining, computer vision and computational…

Machine Learning · Computer Science 2015-03-19 Jia Zeng , William K. Cheung , Jiming Liu