Related papers: Robust Audio-Visual Instance Discrimination

Audio-Visual Instance Discrimination with Cross-Modal Agreement

We present a self-supervised learning approach to learn audio-visual representations from video and audio. Our method uses contrastive learning for cross-modal discrimination of video from audio and vice-versa. We show that optimizing for…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Pedro Morgado, Nuno Vasconcelos, Ishan Misra

Contrastive representation learning has proven to be an effective self-supervised learning method for images and videos. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-27 · Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

Self-supervised Co-training for Video Representation Learning

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-13 · Tengda Han, Weidi Xie, Andrew Zisserman

ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency

We study self-supervised video representation learning, which is a challenging task due to 1) lack of labels for explicit supervision; 2) unstructured and noisy visual information. Existing methods mainly use contrastive loss with video…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-18 · Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding

MarginNCE: Robust Sound Localization with a Negative Margin

The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. Contrastive learning in the context of sound source localization leverages the natural correspondence between audio and visual signals…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-04 · Sooyoung Park, Arda Senocak, Joon Son Chung

Unsupervised Contrastive Learning of Sound Event Representations

Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised…

Sound · Computer Science · 2020-11-17 · Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Noise-Tolerant Learning for Audio-Visual Action Recognition

Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model. Although various multi-modal learning methods have been…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-28 · Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian, Yan Chen

Audio-Visual Contrastive Learning with Temporal Self-Supervision

We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-16 · Simon Jenni, Alexander Black, John Collomosse

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy. In such a case,…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-13 · Li Tao, Xueting Wang, Toshihiko Yamasaki

Unsupervised Feature Clustering Improves Contrastive Representation Learning for Medical Image Segmentation

Self-supervised instance discrimination is an effective contrastive pretext task to learn feature representations and address limited medical image annotations. The idea is to make features of transformed versions of the same images similar…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-17 · Yejia Zhang, Xinrong Hu, Nishchal Sapkota, Yiyu Shi, Danny Z. Chen

Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault, Stéphane Canu

Investigating the Role of Negatives in Contrastive Representation Learning

Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries…

Machine Learning · Computer Science · 2021-06-21 · Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra

Conditional Negative Sampling for Contrastive Learning of Visual Representations

Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative…

Machine Learning · Computer Science · 2020-10-06 · Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, Noah Goodman

Amortised Invariance Learning for Contrastive Self-Supervision

Contrastive self-supervised learning methods famously produce high quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-05 · Ruchika Chavhan, Henry Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy Hospedales

Contrastive Transformation for Self-supervised Correspondence Learning

In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-10 · Ning Wang, Wengang Zhou, Houqiang Li

Contrastive Video Representation Learning via Adversarial Perturbations

Adversarial perturbations are noise-like patterns that can subtly change the data, while failing an otherwise accurate classifier. In this paper, we propose to use such perturbations within a novel contrastive learning setup to build…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-17 · Jue Wang, Anoop Cherian

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes only the audio and…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

We present an approach to learn voice-face representations from the talking face videos, without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face. These…

Sound · Computer Science · 2022-05-30 · Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR). However, the robustness impact of combining the two…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-10-28 · Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai

Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-08-11 · Aswin Sivaraman, Minje Kim