Related papers: Learning Visual Representations via Language-Guide…

Learning Visual Composition through Improved Semantic Guidance

Visual imagery does not consist of solitary objects, but instead reflects the composition of a multitude of fluid concepts. While there have been great advances in visual representation learning, such advances have focused on building…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-07 · Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens

Support-set bottlenecks for video-text representation learning

The dominant paradigm for learning video-text representations, noise contrastive learning, increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample,…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-15 · Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

Debiased Contrastive Learning

A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled…

Machine Learning · Computer Science · 2020-10-22 · Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, Stefanie Jegelka

Visualizing and Understanding Contrastive Learning

Contrastive learning has revolutionized the field of computer vision, learning rich representations from unlabeled data, which generalize well to diverse vision tasks. Consequently, it has become increasingly important to explain these…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Fawaz Sammani, Boris Joukovsky, Nikos Deligiannis

Contrastive Learning for Unsupervised Image-to-Image Translation

Image-to-image translation aims to learn a mapping between different groups of visually distinguishable images. While recent methods have shown impressive ability to change even intricate appearance of images, they still rely on domain…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-10 · Hanbit Lee, Jinseok Seol, Sang-goo Lee

Universal Multimodal Representation for Language Understanding

Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of…

Computation and Language · Computer Science · 2023-01-10 · Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

𝕏-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss, an objective matching related samples, underlies methods from self-supervised to multimodal learning. Contrastive losses,…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-13 · Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun

Diversifying Joint Vision-Language Tokenization Learning

Building joint representations across images and text is an essential step for tasks such as Visual Question Answering and Video Question Answering. In this work, we find that the representations must not only jointly capture features from…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-19 · Vardaan Pahuja, AJ Piergiovanni, Anelia Angelova

Sample-Specific Debiasing for Better Image-Text Models

Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar…

Machine Learning · Computer Science · 2023-08-15 · Peiqi Wang, Yingcheng Liu, Ching-Yun Ko, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland

Audiovisual representation learning typically relies on the correspondence between sight and sound. However, there are often multiple audio tracks that can correspond with a visual scene. Consider, for example, different conversations on…

Sound · Computer Science · 2024-06-11 · Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi Kalayeh

CoCon: Cooperative-Contrastive Learning

Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-03 · Nishant Rai, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose representations of images and captions. We focus on the situation when one image is associated with several captions, each caption containing both…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-02 · Maurits Bleeker, Mariya Hendriksen, Andrew Yates, Maarten de Rijke

Contrastive Learning of Global-Local Video Representations

Contrastive learning has delivered impressive results for various tasks in the self-supervised regime. However, existing approaches optimize for learning representations specific to downstream scenarios, i.e., global…

Machine Learning · Computer Science · 2021-10-29 · Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Rethinking Positive Pairs in Contrastive Learning

The training methods in AI do involve semantically distinct pairs of samples. However, their role is typically to enhance between-class separability. The actual notion of similarity is normally learned from semantically identical pairs.…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Jiantao Wu, Sara Atito, Zhenhua Feng, Shentong Mo, Josef Kittler, Muhammad Awais

Co²L: Contrastive Continual Learning

Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks than joint-training methods relying on task-specific supervision. In this paper, we found…

Machine Learning · Computer Science · 2021-06-29 · Hyuntak Cha, Jaeho Lee, Jinwoo Shin

Perceptual Grouping in Contrastive Vision-Language Models

Recent advances in zero-shot image recognition suggest that vision-language models learn generic visual representations with a high degree of semantic information that may be arbitrarily probed with natural language phrases. Understanding…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

Understanding Synonymous Referring Expressions via Contrastive Features

Referring expression comprehension aims to localize objects identified by natural language descriptions. This is a challenging task as it requires understanding of both visual and language domains. One property is that each object can be…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-21 · Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Limitations of Cross-Lingual Learning from Image Search

Cross-lingual representation learning is an important step in making NLP scale to all the world's languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on…

Computation and Language · Computer Science · 2017-09-19 · Mareike Hartmann, Anders Soegaard

Multilingual Representation Distillation with Contrastive Learning

Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks. In this paper, we integrate contrastive…

Computation and Language · Computer Science · 2023-05-02 · Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

Efficient Vision-Language Pre-training by Cluster Masking

We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-15 · Zihao Wei, Zixuan Pan, Andrew Owens