Related papers: Multilingual Factor Analysis

Multilingual Models for Compositional Distributed Semantics

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of…

Computation and Language · Computer Science 2014-04-21 Karl Moritz Hermann, Phil Blunsom

Multilingual Topic Models

Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article…

Machine Learning · Statistics 2017-12-20 Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

Deep Generative Model for Joint Alignment and Word Representation

This work exploits translation data as a source of semantically relevant learning signal for models of word representation. In particular, we exploit equivalence through translation as a form of distributed context and jointly learn how to…

Computation and Language · Computer Science 2018-04-24 Miguel Rios, Wilker Aziz, Khalil Sima'an

A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using…

Computation and Language · Computer Science 2015-08-18 Shaohua Li, Jun Zhu, Chunyan Miao

Modelling Latent Skills for Multitask Language Generation

We present a generative model for multitask conditional language generation. Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks, and that explicitly modelling these skills in a…

Computation and Language · Computer Science 2020-02-25 Kris Cao, Dani Yogatama

A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings

We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents, a word is generated…

Computation and Language · Computer Science 2020-08-12 Lixing Zhu, Yulan He, Deyu Zhou

A Generative Model of Words and Relationships from Multiple Sources

Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this…

Computation and Language · Computer Science 2015-12-04 Stephanie L. Hyland, Theofanis Karaletsos, Gunnar Rätsch

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as…

Computation and Language · Computer Science 2017-02-27 Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell

Learning to Represent Bilingual Dictionaries

Bilingual word embeddings have been widely used to capture the similarity of lexical semantics in different human languages. However, many applications, such as cross-lingual semantic search and question answering, can be largely benefited…

Computation and Language · Computer Science 2019-09-10 Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo

Multilingual Distributed Representations without Word Alignment

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not…

Computation and Language · Computer Science 2014-03-21 Karl Moritz Hermann, Phil Blunsom

Analogical Inference for Multi-Relational Embeddings

Large-scale multi-relational embedding refers to the task of learning the latent representations for entities and relations in large knowledge graphs. An effective and scalable solution for this problem is crucial for the true success of…

Machine Learning · Computer Science 2017-07-07 Hanxiao Liu, Yuexin Wu, Yiming Yang

Explaining latent representations of generative models with large multimodal models

Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence. With the rise of the large multimodal model, it can align images with text to generate answers.…

Machine Learning · Computer Science 2024-04-19 Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, Liang Zhao

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning…

Computation and Language · Computer Science 2023-06-06 John Wieting, Jonathan H. Clark, William W. Cohen, Graham Neubig, Taylor Berg-Kirkpatrick

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of…

Computation and Language · Computer Science 2016-12-15 Radu Soricut, Nan Ding

Polyglot: Distributed Word Representations for Multilingual NLP

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding…

Computation and Language · Computer Science 2014-06-30 Rami Al-Rfou, Bryan Perozzi, Steven Skiena

VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling

In this paper, we tackle the task of definition modeling, where the goal is to learn to generate definitions of words and phrases. Existing approaches for this task are discriminative, combining distributional and lexical semantics in an…

Computation and Language · Computer Science 2020-10-08 Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

Definition Modeling: Learning to define word embeddings in natural language

Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks. But these tasks only evaluate lexical semantics indirectly. In this…

Computation and Language · Computer Science 2016-12-02 Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey

Learning Joint Multilingual Sentence Representations with Neural Machine Translation

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages. Our aim is that a representation which is independent of the language, is likely to capture the…

Computation and Language · Computer Science 2017-08-09 Holger Schwenk, Matthijs Douze

A Latent Variable Model Approach to PMI-based Word Embeddings

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper…

Machine Learning · Computer Science 2019-06-21 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science 2020-11-20 John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick