English
Related papers

Related papers: Multi-environment Topic Models

200 papers

An important aspect of text mining involves information retrieval in form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo , Jhareswar Maiti

Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural…

Computation and Language · Computer Science 2025-08-01 Carolina Zheng , Nicolas Beltran-Velez , Sweta Karlekar , Claudia Shi , Achille Nazaret , Asif Mallik , Amir Feder , David M. Blei

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science 2023-12-18 Han Wang , Nirmalendu Prakash , Nguyen Khoi Hoang , Ming Shan Hee , Usman Naseem , Roy Ka-Wei Lee

Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users…

Computation and Language · Computer Science 2024-04-03 Chau Minh Pham , Alexander Hoyle , Simeng Sun , Philip Resnik , Mohit Iyyer

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng , Francisco J. R. Ruiz , David M. Blei

The proliferation of social media has given rise to a new form of communication: memes. Memes are multimodal and often contain a combination of text and visual elements that convey meaning, humor, and cultural significance. While meme…

Computation and Language · Computer Science 2023-12-12 Nirmalendu Prakash , Han Wang , Nguyen Khoi Hoang , Ming Shan Hee , Roy Ka-Wei Lee

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In standard topic models, a topic (such as sports, business, or politics) is viewed as a probability distribution $\vec a_i$…

Machine Learning · Computer Science 2016-11-07 Avrim Blum , Nika Haghtalab

Extracting topics from text has become an essential task, especially with the rapid growth of unstructured textual data. Most existing works rely on highly computational methods to address this challenge. In this paper, we argue that…

Computation and Language · Computer Science 2025-11-07 Salma Mekaoui , Hiba Sofyan , Imane Amaaz , Imane Benchrif , Arsalane Zarghili , Ilham Chaker , Nikola S. Nikolov

Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often…

Computation and Language · Computer Science 2024-04-26 Lowri Williams , Eirini Anthi , Laura Arman , Pete Burnap

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge…

Information Retrieval · Computer Science 2015-07-20 Samuel Rönnqvist

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set,…

Artificial Intelligence · Computer Science 2008-08-08 Chaitanya Chemudugunta , Padhraic Smyth , Mark Steyvers

Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models that learn thematic…

Computation and Language · Computer Science 2023-01-12 Mozhgan Talebpour , Alba Garcia Seco de Herrera , Shoaib Jameel

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational…

Information Retrieval · Computer Science 2021-01-07 Ye Tian

We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously…

Computation and Language · Computer Science 2012-05-14 Jordan Boyd-Graber , David Blei

Automated predictions require explanations to be interpretable by humans. Past work used attention and rationale mechanisms to find words that predict the target variable of a document. Often though, they result in a tradeoff between noisy…

Computation and Language · Computer Science 2020-12-22 Diego Antognini , Claudiu Musat , Boi Faltings

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations including the inability of modeling word ordering information in…

Computation and Language · Computer Science 2022-02-10 Yu Meng , Yunyi Zhang , Jiaxin Huang , Yu Zhang , Jiawei Han

In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of their scalability. The unsupervised nature of the models makes them suitable for exploring topics in…

Computation and Language · Computer Science 2023-02-06 Shusei Eshima , Kosuke Imai , Tomoya Sasaki

We introduce an approach to topic modelling with document-level covariates that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model,…

Methodology · Statistics 2025-11-05 Gabriel Phelan , David A. Campbell

Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous…

Machine Learning · Statistics 2018-03-22 Patrick Jähnichen , Florian Wenzel , Marius Kloft , Stephan Mandt

The growing societal dependence on social media and user generated content for news and information has increased the influence of unreliable sources and fake content, which muddles public discourse and lessens trust in the media.…

Computation and Language · Computer Science 2022-09-07 Marjan Hosseini , Alireza Javadian Sabet , Suining He , Derek Aguiar
‹ Prev 1 2 3 10 Next ›