Related papers: Concentrated Document Topic Model
Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of…
We propose an Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify the field-specific keywords though less frequently appeared and 2) deliver well-structured topics with exclusive words. In…
We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over…
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics,…
An important aspect of text mining involves information retrieval in form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…
In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a "topic" is a pattern of…
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use…
Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative…
In text classification tasks, fine tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods possess the…
The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real…
As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text…
Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However,…
Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational…
Supervised topic models can help clinical researchers find interpretable cooccurence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems.…
We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language…
We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate…
The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a…
Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…
In this paper we present a model for unsupervised topic discovery in texts corpora. The proposed model uses documents, words, and topics lookup table embedding as neural network model parameters to build probabilities of words given topics,…
Social scientists analyze citation networks to study how documents influence subsequent work across various domains such as judicial politics and international relations. However, conventional approaches that summarize document attributes…