Related papers: Pixel Sentence Representation Learning

An efficient framework for learning sentence representations

In this work we propose a simple and efficient framework for learning sentence representations from unlabelled data. Drawing inspiration from the distributional hypothesis and recent work on learning sentence representations, we reformulate…

Computation and Language · Computer Science 2018-03-09 Lajanugen Logeswaran, Honglak Lee

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level…

Computation and Language · Computer Science 2020-11-02 Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so…

Computation and Language · Computer Science 2018-07-10 Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes

An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information

In this paper, we focus on the problem of unsupervised image-sentence matching. Existing research explores to utilize document-level structural information to sample positive and negative instances for model training. Although the approach…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Zejun Li, Zhongyu Wei, Zhihao Fan, Haijun Shan, Xuanjing Huang

Universal Multimodal Representation for Language Understanding

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science 2023-01-10 Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Multilingual Distributed Representations without Word Alignment

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not…

Computation and Language · Computer Science 2014-03-21 Karl Moritz Hermann, Phil Blunsom

Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF

Sentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an…

Computation and Language · Computer Science 2017-10-23 Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov

Learning Representations by Predicting Bags of Visual Words

Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially…

Computer Vision and Pattern Recognition · Computer Science 2020-02-28 Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual and vision-language representations still…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig

Improving Disentangled Text Representation Learning with Information-Theoretic Guidance

Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms…

Machine Learning · Computer Science 2022-01-13 Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, Lawrence Carin

Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models

A variety of contextualised language models have been proposed in the NLP community, which are trained on diverse corpora to produce numerous Neural Language Models (NLMs). However, different NLMs have reported different levels of…

Computation and Language · Computer Science 2022-04-19 Keigo Takahashi, Danushka Bollegala

A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond

Sentence representations are a critical component in NLP applications such as retrieval, question answering, and text classification. They capture the meaning of a sentence, enabling machines to understand and reason over human language. In…

Computation and Language · Computer Science 2024-02-05 Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Viktor Schlegel, Stefan Winkler, See-Kiong Ng, Soujanya Poria

Learning Disentangled Representations for Natural Language Definitions

Disentangling the encodings of neural models is a fundamental aspect for improving interpretability, semantic control and downstream task performance in Natural Language Processing. Currently, most disentanglement methods are unsupervised…

Computation and Language · Computer Science 2023-02-17 Danilo S. Carvalho, Giangiacomo Mercatali, Yingji Zhang, Andre Freitas

Sentence transition matrix: An efficient approach that preserves sentence semantics

Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in…

Computation and Language · Computer Science 2019-01-17 Myeongjun Jang, Pilsung Kang

Universal Sentence Representation Learning with Conditional Masked Language Model

This paper presents a novel training method, Conditional Masked Language Modeling (CMLM), to effectively learn sentence representations on large scale unlabeled corpora. CMLM integrates sentence representation learning into MLM training by…

Computation and Language · Computer Science 2021-09-13 Ziyi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve

Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings

Semantic representation learning for sentences is an important and well-studied problem in NLP. The current trend for this task involves training a Transformer-based sentence encoder through a contrastive objective with text, i.e.,…

Computation and Language · Computer Science 2022-09-21 Yiren Jian, Chongyang Gao, Soroush Vosoughi

Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models

While vector-based language representations from pretrained language models have set a new standard for many NLP tasks, there is not yet a complete accounting of their inner workings. In particular, it is not entirely clear what aspects of…

Computation and Language · Computer Science 2021-04-16 Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

On the difficulty of a distributional semantics of spoken language

In the domain of unsupervised learning most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language, where there is a large body of work on…

Computation and Language · Computer Science 2018-10-29 Grzegorz Chrupała, Lieke Gelderloos, Ákos Kádár, Afra Alishahi

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general…

Computation and Language · Computer Science 2018-04-03 Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal