Related papers: Multilingual Topic Models

Multilingual Factor Analysis

In this work we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. We model equivalent words in different languages as different…

Machine Learning · Computer Science 2019-10-25 Francisco Vargas, Kamen Brestnichki, Alex Papadopoulos-Korfiatis, Nils Hammerla

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It is focused on capturing the word co-occurrences in a document…

Machine Learning · Computer Science 2022-03-16 Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou

Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching

Long-form document matching aims to judge the relevance between two documents and has been applied to various scenarios. Most existing works utilize hierarchical or long context models to process documents, which achieve coarse…

Information Retrieval · Computer Science 2024-12-25 Youchao Zhou, Heyan Huang, Zhijing Wu, Yuhang Liu, Xinglin Wang

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations including the inability of modeling word ordering information in…

Computation and Language · Computer Science 2022-02-10 Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Multilingual Models for Compositional Distributed Semantics

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of…

Computation and Language · Computer Science 2014-04-21 Karl Moritz Hermann, Phil Blunsom

Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

The number of documents available into Internet moves each day up. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual…

Information Retrieval · Computer Science 2017-03-21 Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès

Learning Multilingual Topics from Incomparable Corpus

Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first…

Computation and Language · Computer Science 2018-06-13 Shudong Hao, Michael J. Paul

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Explainable and Discourse Topic-aware Neural Language Understanding

Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate…

Computation and Language · Computer Science 2023-06-28 Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta

Bilingual Topic Models for Comparable Corpora

Probabilistic topic models like Latent Dirichlet Allocation (LDA) have been previously extended to the bilingual setting. A fundamental modeling assumption in several of these extensions is that the input corpora are in the form of document…

Computation and Language · Computer Science 2021-12-01 Georgios Balikas, Massih-Reza Amini, Marianne Clausel

Improving Topic Models with Latent Feature Word Representations

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two…

Computation and Language · Computer Science 2018-10-16 Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Topic models have been widely used in discovering latent topics which are shared across documents in text mining. Vector representations, word embeddings and topic embeddings, map words and topics into a low-dimensional and dense real-value…

Computation and Language · Computer Science 2017-02-24 Jarvan Law, Hankz Hankui Zhuo, Junhua He, Erhu Rong

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

Sparse language vectors from linguistic typology databases and learned embeddings from tasks like multilingual machine translation have been investigated in isolation, without analysing how they could benefit from each other's language…

Computation and Language · Computer Science 2020-10-27 Arturo Oncevay, Barry Haddow, Alexandra Birch

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

An important aspect of text mining involves information retrieval in form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) or Latent Semantic…

Machine Learning · Computer Science 2025-11-04 Satyajeet Sahoo, Jhareswar Maiti

Dynamic Topic Modeling with a Higher-Order Hypergraphical Representation

Dynamic topic modeling is widely used to analyze evolving trends in scientific literature, medical records, and social media. Traditional topic models represent each topic through a single probability vector on the multinomial simplex and…

Machine Learning · Computer Science 2026-05-28 Hanjia Gao, Hanwen Ye, Qing Nie, Annie Qu

Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables

Recently, discrete latent variable models have received a surge of interest in both Natural Language Processing (NLP) and Computer Vision (CV), attributed to their comparable performance to the continuous counterparts in representation…

Computation and Language · Computer Science 2022-11-08 Erxin Yu, Lan Du, Yuan Jin, Zhepei Wei, Yi Chang

The Polylingual Labeled Topic Model

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions…

Computation and Language · Computer Science 2017-05-03 Lisa Posch, Arnim Bleier, Philipp Schaer, Markus Strohmaier

Legal document retrieval across languages: topic hierarchies based on synsets

Cross-lingual annotations of legislative texts enable us to explore major themes covered in multilingual legal data and are a key facilitator of semantic similarity when searching for similar documents. Multilingual probabilistic topic…

Information Retrieval · Computer Science 2019-12-02 Carlos Badenes-Olmedo, Jose-Luis Redondo-Garcia, Oscar Corcho

A Joint Model of Conversational Discourse and Latent Topics on Microblogs

Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To…

Computation and Language · Computer Science 2018-09-12 Jing Li, Yan Song, Zhongyu Wei, Kam-Fai Wong

LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g.,…

Computation and Language · Computer Science 2025-01-15 Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du