Related papers: Visualizing Topic Uncertainty in Topic Modelling

Uncertainty-aware Topic Modeling Visualization

Topic modeling is a state-of-the-art technique for analyzing text corpora. It uses a statistical model, most commonly Latent Dirichlet Allocation (LDA), to discover abstract topics that occur in the document collection. However, the…

Human-Computer Interaction · Computer Science 2021-10-19 Valerie Müller , Christian Sieg , Lars Linsen

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and…

Information Retrieval · Computer Science 2018-12-07 Hamed Jelodar , Yongli Wang , Chi Yuan , Xia Feng , Xiahui Jiang , Yanchao Li , Liang Zhao

LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation

We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate…

Information Retrieval · Computer Science 2015-07-24 Ashwinkumar Ganesan , Kiante Brantley , Shimei Pan , Jian Chen

A Spectral Algorithm for Latent Dirichlet Allocation

The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of…

Machine Learning · Computer Science 2013-01-21 Animashree Anandkumar , Dean P. Foster , Daniel Hsu , Sham M. Kakade , Yi-Kai Liu

Variable Selection for Latent Dirichlet Allocation

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method…

Machine Learning · Computer Science 2012-05-08 Dongwoo Kim , Yeonseung Chung , Alice Oh

More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models

Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without…

Computation and Language · Computer Science 2021-08-25 Jin Cheevaprawatdomrong , Alexandra Schofield , Attapol T. Rutherford

A correlated topic model of Science

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics,…

Applications · Statistics 2009-09-29 David M. Blei , John D. Lafferty

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

Topic modeling is admittedly a convenient way to monitor markets trend. Conventionally, Latent Dirichlet Allocation, LDA, is considered a must-do model to gain this type of information. By given the merit of deducing keyword with token…

Computation and Language · Computer Science 2023-09-19 Ching-Hsun Tseng , Shin-Jye Lee , Po-Wei Cheng , Chien Lee , Chih-Chieh Hung

Distributed Asymmetric Allocation: A Topic Model for Large Imbalanced Corpora in Social Sciences

Social scientists employ latent Dirichlet allocation (LDA) to find highly specific topics in large corpora, but they often struggle in this task because (1) LDA, in general, takes a significant amount of time to fit on large corpora; (2)…

Methodology · Statistics 2025-12-23 Kohei Watanabe

Quantifying consistency and accuracy of Latent Dirichlet Allocation

Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines.…

Computation and Language · Computer Science 2025-11-18 Saranzaya Magsarjav , Melissa Humphries , Jonathan Tuke , Lewis Mitchell

Word Clouds as Common Voices: LLM-Assisted Visualization of Participant-Weighted Themes in Qualitative Interviews

Word clouds are a common way to summarize qualitative interviews, yet traditional frequency-based methods often fail in conversational contexts: they surface filler words, ignore paraphrase, and fragment semantically related ideas. This…

Computation and Language · Computer Science 2025-08-12 Joseph T. Colonel , Baihan Lin

Improving Topic Models with Latent Feature Word Representations

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two…

Computation and Language · Computer Science 2018-10-16 Dat Quoc Nguyen , Richard Billingsley , Lan Du , Mark Johnson

Modeling Word Relatedness in Latent Dirichlet Allocation

Standard LDA model suffers the problem that the topic assignment of each word is independent and word correlation hence is neglected. To address this problem, in this paper, we propose a model called Word Related Latent Dirichlet Allocation…

Computation and Language · Computer Science 2014-11-11 Xun Wang

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas , Massih-Reza Amini , Marianne Clausel

A network approach to topic models

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a…

Machine Learning · Statistics 2018-07-20 Martin Gerlach , Tiago P. Peixoto , Eduardo G. Altmann

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in nature language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

n-stage Latent Dirichlet Allocation: A Novel Approach for LDA

Nowadays, data analysis has become a problem as the amount of data is constantly increasing. In order to overcome this problem in textual data, many models and methods are used in natural language processing. The topic modeling field is one…

Computation and Language · Computer Science 2021-10-22 Zekeriya Anil Guven , Banu Diri , Tolgahan Cakaloglu

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic…

Computation and Language · Computer Science 2023-03-31 Anton Thielmann , Quentin Seifert , Arik Reuter , Elisabeth Bergherr , Benjamin Säfken

Prior matters: simple and general methods for evaluating and improving topic quality in topic modeling

Latent Dirichlet Allocation (LDA) models trained without stopword removal often produce topics with high posterior probabilities on uninformative words, obscuring the underlying corpus content. Even when canonical stopwords are manually…

Computation and Language · Computer Science 2017-10-17 Angela Fan , Finale Doshi-Velez , Luke Miratrix