Related papers: Topic Model Supervised by Understanding Map

Cross-topic distributional semantic representations via unsupervised mappings

In traditional Distributional Semantic Models (DSMs) the multiple senses of a polysemous word are conflated into a single vector space representation. In this work, we propose a DSM that learns multiple distributional representations of a…

Computation and Language · Computer Science 2019-04-12 Eleftheria Briakou, Nikos Athanasiou, Alexandros Potamianos

Topic Modeling for Free-Response Text Data from a Complex Survey

Topic Modeling is a popular statistical tool commonly used on textual data to identify the hidden thematic structure in a document collection based on the distribution of words. Additionally, it can be used to cluster the documents, with…

Applications · Statistics 2025-01-24 Namitha V. Pais, Scott H. Holan, Paul A. Parker

Unveiling the semantic structure of text documents using paragraph-aware Topic Models

Classic Topic Models are built under the Bag Of Words assumption, in which word position is ignored for simplicity. Besides, symmetric priors are typically used in most applications. In order to easily learn topics with different properties…

Computation and Language · Computer Science 2018-06-27 Simón Roca-Sotelo, Jerónimo Arenas-García

Exploratory topic modeling with distributional semantics

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge…

Information Retrieval · Computer Science 2015-07-20 Samuel Rönnqvist

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in the form of discovering semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo, Jhareswar Maiti

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

To date, there have been massive Semi-Structured Documents (SSDs) during the evolution of the Internet. These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the…

Computation and Language · Computer Science 2015-07-31 Shuangyin Li, Jiefei Li, Guan Huang, Ruiyang Tan, Rong Pan

A new generation of science overlay maps with an application to the history of biosystematics

The paper proposes a text-mining based analytical framework aiming at the cognitive organization of complex scientific discourses. The approach is based on models recently developed in science mapping, being a generalization of the…

Digital Libraries · Computer Science 2015-04-23 Sandor Soos

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. Unlike…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun

A network approach to topic models

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a…

Machine Learning · Statistics 2018-07-20 Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

A Multilayer Correlated Topic Model

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational…

Information Retrieval · Computer Science 2021-01-07 Ye Tian

Explainable and Discourse Topic-aware Neural Language Understanding

Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate…

Computation and Language · Computer Science 2023-06-28 Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

Topic models have been widely used to learn text representations and gain insight into document corpora. To perform topic discovery, most existing neural models either take document bag-of-words (BoW) or sequence of tokens as input followed…

Computation and Language · Computer Science 2021-07-12 Madhur Panwar, Shashank Shailabh, Milan Aggarwal, Balaji Krishnamurthy

CEMTM: Contextual Embedding-based Multimodal Topic Modeling

We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language…

Computation and Language · Computer Science 2025-10-07 Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini

Exploring term-document matrices from matrix models in text mining

We explore a matrix-space model that is a natural extension of the vector space model for Information Retrieval. Each document can be represented by a matrix that is based on document extracts (e.g. sentences, paragraphs, sections). We…

Information Retrieval · Computer Science 2007-05-23 Ioannis Antonellis, Efstratios Gallopoulos

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Syntactic Topic Models

The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where…

Computation and Language · Computer Science 2010-03-04 Jordan Boyd-Graber, David M. Blei

Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an…

Computation and Language · Computer Science 2024-08-01 Charles Jin, Martin Rinard

Topic Compositional Neural Language Model

We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic…

Machine Learning · Computer Science 2018-02-27 Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set,…

Artificial Intelligence · Computer Science 2008-08-08 Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Topic Modeling in Embedding Spaces

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei