Related papers: Content Modeling Using Latent Permutations

Jointly Modeling Topics and Intents with Global Order Structure

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical…

Computation and Language · Computer Science 2015-12-08 Bei Chen, Jun Zhu, Nan Yang, Tian Tian, Ming Zhou, Bo Zhang

Unveiling the semantic structure of text documents using paragraph-aware Topic Models

Classic Topic Models are built under the Bag Of Words assumption, in which word position is ignored for simplicity. Besides, symmetric priors are typically used in most applications. In order to easily learn topics with different properties…

Computation and Language · Computer Science 2018-06-27 Simón Roca-Sotelo, Jerónimo Arenas-García

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for…

Computation and Language · Computer Science 2007-05-23 Regina Barzilay, Lillian Lee

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations including the inability of modeling word ordering information in…

Computation and Language · Computer Science 2022-02-10 Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Topic Scaling: A Joint Document Scaling -- Topic Model Approach To Learn Time-Specific Topics

This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions and introduces the concept of Topic Scaling which ranks…

Information Retrieval · Computer Science 2021-04-05 Sami Diaf, Ulrich Fritsche

An Adaptation of Topic Modeling to Sentences

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this…

Computation and Language · Computer Science 2016-07-21 Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Conceptualization Topic Modeling

Recently, topic modeling has been widely used to discover the abstract topics in text corpora. Most of the existing topic models are based on the assumption of three-layer hierarchical Bayesian structure, i.e. each document is modeled as a…

Computation and Language · Computer Science 2017-04-10 Yi-Kun Tang, Xian-Ling Mao, Heyan Huang, Guihua Wen

Joint Modeling of Content and Discourse Relations in Dialogues

We present a joint modeling approach to identify salient discussion points in spoken meetings as well as to label the discourse relations between speaker turns. A variation of our model is also discussed when discourse relations are treated…

Computation and Language · Computer Science 2017-05-16 Kechen Qin, Lu Wang, Joseph Kim

Explainable and Discourse Topic-aware Neural Language Understanding

Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate…

Computation and Language · Computer Science 2023-06-28 Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta

Knowledge-Aware Bayesian Deep Topic Model

We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have gained promising performance in text analysis, they mainly focus…

Computation and Language · Computer Science 2022-09-29 Dongsheng Wang, Yishi Xu, Miaoge Li, Zhibin Duan, Chaojie Wang, Bo Chen, Mingyuan Zhou

Latent Relation Language Models

In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This…

Computation and Language · Computer Science 2019-08-22 Hiroaki Hayashi, Zecong Hu, Chenyan Xiong, Graham Neubig

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including…

Machine Learning · Statistics 2013-05-27 Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade

On Smoothing and Inference for Topic Models

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling,…

Machine Learning · Computer Science 2012-05-14 Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Latent Topic Models for Hypertext

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest…

Information Retrieval · Computer Science 2012-06-18 Amit Gruber, Michal Rosen-Zvi, Yair Weiss

Learning document embeddings along with their uncertainties

Majority of the text modelling techniques yield only point-estimates of document embeddings and lack in capturing the uncertainty of the estimates. These uncertainties give a notion of how well the embeddings represent a document. We…

Computation and Language · Computer Science 2020-08-03 Santosh Kesiraju, Oldřich Plchot, Lukáš Burget, Suryakanth V Gangashetty

A Topic Modeling Approach to Ranking

We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population…

Machine Learning · Computer Science 2015-01-27 Weicong Ding, Prakash Ishwar, Venkatesh Saligrama

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

A Joint Model of Conversational Discourse and Latent Topics on Microblogs

Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To…

Computation and Language · Computer Science 2018-09-12 Jing Li, Yan Song, Zhongyu Wei, Kam-Fai Wong

Graph Topic Modeling for Documents with Spatial or Covariate Dependencies

We address the challenge of incorporating document-level metadata into topic modeling to improve topic mixture estimation. To overcome the computational complexity and lack of theoretical guarantees in existing Bayesian methods, we extend…

Machine Learning · Computer Science 2025-03-18 Yeo Jin Jung, Claire Donnat

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis.…

Computation and Language · Computer Science 2020-08-24 Dimo Angelov