Related papers: Knowledge-Aware Bayesian Deep Topic Model

Topic Modeling in Embedding Spaces

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng , Francisco J. R. Ruiz , David M. Blei

Keyword Assisted Embedded Topic Model

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA),…

Information Retrieval · Computer Science 2021-12-07 Bahareh Harandizadeh , J. Hunter Priniski , Fred Morstatter

TopicNet: Semantic Graph-Guided Topic Discovery

Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner and automatically organize them into a topic hierarchy. However, it is unclear how to incorporate prior…

Machine Learning · Computer Science 2021-10-28 Zhibin Duan , Yishi Xu , Bo Chen , Dongsheng Wang , Chaojie Wang , Mingyuan Zhou

Neural Attention-Aware Hierarchical Topic Model

Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more…

Computation and Language · Computer Science 2021-10-15 Yuan Jin , He Zhao , Ming Liu , Lan Du , Wray Buntine

Topic Modeling over Short Texts by Incorporating Word Embeddings

Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such…

Computation and Language · Computer Science 2016-09-28 Jipeng Qiang , Ping Chen , Tong Wang , Xindong Wu

The Dynamic Embedded Topic Model

Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative…

Computation and Language · Computer Science 2019-10-14 Adji B. Dieng , Francisco J. R. Ruiz , David M. Blei

A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings

We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents, a word is generated…

Computation and Language · Computer Science 2020-08-12 Lixing Zhu , Yulan He , Deyu Zhou

Knowledge-Base Enriched Word Embeddings for Biomedical Domain

Word embeddings have been shown adept at capturing the semantic and syntactic regularities of the natural language text, as a result of which these representations have found their utility in a wide variety of downstream content analysis…

Computation and Language · Computer Science 2021-03-02 Kishlay Jha

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo , Jhareswar Maiti

Learning document embeddings along with their uncertainties

Majority of the text modelling techniques yield only point-estimates of document embeddings and lack in capturing the uncertainty of the estimates. These uncertainties give a notion of how well the embeddings represent a document. We…

Computation and Language · Computer Science 2020-08-03 Santosh Kesiraju , Oldřich Plchot , Lukáš Burget , Suryakanth V Gangashetty

Latent Topic Models for Hypertext

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collection such as the Internet, there has also been great interest…

Information Retrieval · Computer Science 2012-06-18 Amit Gruber , Michal Rosen-Zvi , Yair Weiss

Conceptualization Topic Modeling

Recently, topic modeling has been widely used to discover the abstract topics in text corpora. Most of the existing topic models are based on the assumption of three-layer hierarchical Bayesian structure, i.e. each document is modeled as a…

Computation and Language · Computer Science 2017-04-10 Yi-Kun Tang , Xian-Ling Mao , Heyan Huang , Guihua Wen

Category Enhanced Word Embedding

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words…

Computation and Language · Computer Science 2015-12-01 Chunting Zhou , Chonglin Sun , Zhiyuan Liu , Francis C. M. Lau

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding

Embedded topic models are able to learn interpretable topics even with large and heavy-tailed vocabularies. However, they generally hold the Euclidean embedding space assumption, leading to a basic limitation in capturing hierarchical…

Information Retrieval · Computer Science 2022-10-20 Yishi Xu , Dongsheng Wang , Bo Chen , Ruiying Lu , Zhibin Duan , Mingyuan Zhou

Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms

Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual…

Computation and Language · Computer Science 2024-10-04 Melkamu Abay Mersha , Mesay Gemeda yigezu , Jugal Kalita

Enriching Medcial Terminology Knowledge Bases via Pre-trained Language Model and Graph Convolutional Network

Enriching existing medical terminology knowledge bases (KBs) is an important and never-ending work for clinical research because new terminology alias may be continually added and standard terminologies may be newly renamed. In this paper,…

Computation and Language · Computer Science 2019-09-04 Jiaying Zhang , Zhixing Zhang , Huanhuan Zhang , Zhiyuan Ma , Yangming Zhou , Ping He

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret.…

Computation and Language · Computer Science 2021-06-18 Federico Bianchi , Silvia Terragni , Dirk Hovy

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li , Tat-Seng Chua , Jun Zhu , Chunyan Miao

Embedded Topic Models Enhanced by Wikification

Topic modeling analyzes a collection of documents to learn meaningful patterns of words. However, previous topic models consider only the spelling of words and do not take into consideration the homography of words. In this study, we…

Computation and Language · Computer Science 2024-10-04 Takashi Shibuya , Takehito Utsuro

Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings

Though word embeddings and topics are complementary representations, several past works have only used pre-trained word embeddings in (neural) topic modeling to address data sparsity problem in short text or small collection of documents.…

Computation and Language · Computer Science 2019-09-18 Pankaj Gupta , Yatin Chaudhary , Hinrich Schütze