Related papers: Topically Driven Neural Language Model

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas, Massih-Reza Amini, Marianne Clausel

Explainable and Discourse Topic-aware Neural Language Understanding

Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate…

Computation and Language · Computer Science 2023-06-28 Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta

Document Context Language Models

Text documents are structured on multiple levels of detail: individual words are related by syntax, but larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but…

Computation and Language · Computer Science 2016-02-23 Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein

Towards Making the Most of Context in Neural Machine Translation

Document-level machine translation manages to outperform sentence level models by a small margin, but have failed to be widely adopted. We argue that previous research did not make a clear use of the global context, and propose a new…

Computation and Language · Computer Science 2020-09-10 Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, Alexandra Birch

The Polylingual Labeled Topic Model

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions…

Computation and Language · Computer Science 2017-05-03 Lisa Posch, Arnim Bleier, Philipp Schaer, Markus Strohmaier

Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain…

Computation and Language · Computer Science 2024-03-27 Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

Document Informed Neural Autoregressive Topic Models

Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks. Generative topic models infer topic-word distributions, taking…

Information Retrieval · Computer Science 2018-08-14 Pankaj Gupta, Florian Buettner, Hinrich Schütze

Lifelong Neural Topic Learning in Contextualized Autoregressive Topic Models of Language via Informative Transfers

Topic models such as LDA, DocNADE, iDocNADEe have been popular in document analysis. However, the traditional topic models have several limitations including: (1) Bag-of-words (BoW) assumption, where they ignore word ordering, (2) Data…

Information Retrieval · Computer Science 2019-10-01 Yatin Chaudhary, Pankaj Gupta, Thomas Runkler

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret.…

Computation and Language · Computer Science 2021-06-18 Federico Bianchi, Silvia Terragni, Dirk Hovy

Revisiting Topic-Guided Language Models

A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover…

Computation and Language · Computer Science 2023-12-06 Carolina Zheng, Keyon Vafa, David M. Blei

TopicGPT: A Prompt-based Topic Modeling Framework

Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users…

Computation and Language · Computer Science 2024-04-03 Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

Document-Level Language Models for Machine Translation

Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information…

Computation and Language · Computer Science 2023-10-20 Frithjof Petrick, Christian Herold, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

Prompting Large Language Models for Topic Modeling

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words.…

Artificial Intelligence · Computer Science 2023-12-18 Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

Language Model Pre-training for Hierarchical Document Representations

Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such…

Computation and Language · Computer Science 2019-01-29 Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

Language Models Are Implicitly Continuous

Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators. In this work, we show that Transformer-based…

Computation and Language · Computer Science 2025-04-08 Samuele Marro, Davide Evangelista, X. Angelo Huang, Emanuele La Malfa, Michele Lombardi, Michael Wooldridge

TMT: A Simple Way to Translate Topic Models Using Dictionaries

The training of topic models for a multilingual environment is a challenging task, requiring the use of sophisticated algorithms, topic-aligned corpora, and manual evaluation. These difficulties are further exacerbated when the developer…

Computation and Language · Computer Science 2025-09-03 Felix Engl, Andreas Henrich

Learning Dynamic Author Representations with Temporal Language Models

Language models are at the heart of numerous works, notably in the text mining and information retrieval communities. These statistical models aim at extracting word distributions, from simple unigram models to recurrent approaches with…

Computation and Language · Computer Science 2020-02-25 Edouard Delasalles, Sylvain Lamprier, Ludovic Denoyer

Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks

Topic models aim to reveal latent structures within a corpus of text, typically through the use of term-frequency statistics over bag-of-words representations from documents. In recent years, conceptual entities -- interpretable,…

Computation and Language · Computer Science 2024-08-27 Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya

An Adaptation of Topic Modeling to Sentences

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this…

Computation and Language · Computer Science 2016-07-21 Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Keyword Assisted Embedded Topic Model

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA),…

Information Retrieval · Computer Science 2021-12-07 Bahareh Harandizadeh, J. Hunter Priniski, Fred Morstatter