Related papers: Topic Modeling Using Distributed Word Embeddings

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis.…

Computation and Language · Computer Science 2020-08-24 Dimo Angelov

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in nature language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

Topic Modeling in Embedding Spaces

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng , Francisco J. R. Ruiz , David M. Blei

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang , Dandan Guo , He Zhao , Huangjie Zheng , Korawat Tanwisuth , Bo Chen , Mingyuan Zhou

Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way…

Computation and Language · Computer Science 2020-10-08 Suzanna Sia , Ayush Dalmia , Sabrina J. Mielke

Coordinated Topic Modeling

We propose a new problem called coordinated topic modeling that imitates human behavior while describing a text corpus. It considers a set of well-defined topics like the axes of a semantic space with a reference representation. It then…

Computation and Language · Computer Science 2022-10-25 Pritom Saha Akash , Jie Huang , Kevin Chen-Chuan Chang

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Topic models have been widely used in discovering latent topics which are shared across documents in text mining. Vector representations, word embeddings and topic embeddings, map words and topics into a low-dimensional and dense real-value…

Computation and Language · Computer Science 2017-02-24 Jarvan Law , Hankz Hankui Zhuo , Junhua He , Erhu Rong

Graph2topic: an opensource topic modeling framework based on sentence embedding and community detection

It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these…

Computation and Language · Computer Science 2023-06-07 Leihang Zhang , Jiapeng Liu , Qiang Yan

A Process for Topic Modelling Via Word Embeddings

This work combines algorithms based on word embeddings, dimensionality reduction, and clustering. The objective is to obtain topics from a set of unclassified texts. The algorithm to obtain the word embeddings is the BERT model, a neural…

Computation and Language · Computer Science 2023-12-08 Diego Saldaña Ulloa

Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify…

Computation and Language · Computer Science 2020-10-29 M. Tarik Altuncu , Sophia N. Yaliraki , Mauricio Barahona

Topic Modeling over Short Texts by Incorporating Word Embeddings

Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such…

Computation and Language · Computer Science 2016-09-28 Jipeng Qiang , Ping Chen , Tong Wang , Xindong Wu

Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms

Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual…

Computation and Language · Computer Science 2024-10-04 Melkamu Abay Mersha , Mesay Gemeda yigezu , Jugal Kalita

TopicNet: Semantic Graph-Guided Topic Discovery

Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner and automatically organize them into a topic hierarchy. However, it is unclear how to incorporate prior…

Machine Learning · Computer Science 2021-10-28 Zhibin Duan , Yishi Xu , Bo Chen , Dongsheng Wang , Chaojie Wang , Mingyuan Zhou

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li , Tat-Seng Chua , Jun Zhu , Chunyan Miao

Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks

Topic models aim to reveal latent structures within a corpus of text, typically through the use of term-frequency statistics over bag-of-words representations from documents. In recent years, conceptual entities -- interpretable,…

Computation and Language · Computer Science 2024-08-27 Manuel V. Loureiro , Steven Derby , Tri Kurniawan Wijaya

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling

Topic modelling is a pivotal unsupervised machine learning technique for extracting valuable insights from large document collections. Existing neural topic modelling methods often encode contextual information of documents, while ignoring…

Computation and Language · Computer Science 2025-02-07 Yanan Ma , Chenghao Xiao , Chenhan Yuan , Sabine N van der Veer , Lamiece Hassan , Chenghua Lin , Goran Nenadic

Automatic Labelling of Topics with Neural Embeddings

Topics generated by topic models are typically represented as list of terms. To reduce the cognitive overhead of interpreting these topics for end-users, we propose labelling a topic with a succinct phrase that summarises its theme or idea.…

Computation and Language · Computer Science 2016-12-26 Shraey Bhatia , Jey Han Lau , Timothy Baldwin

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for…

Computation and Language · Computer Science 2023-10-26 Daniel Atzberger , Tim Cech , Willy Scheibel , Matthias Trapp , Rico Richter , Jürgen Döllner , Tobias Schreck

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

Enhancing BERTopic with Intermediate Layer Representations

BERTopic is a topic modeling algorithm that leverages transformer-based embeddings to create dense clusters, enabling the estimation of topic structures and the extraction of valuable insights from a corpus of documents. This approach…

Computation and Language · Computer Science 2025-05-13 Dominik Koterwa , Maciej Świtała