Related papers: Robust Audio-Visual Instance Discrimination

200 papers

We present a self-supervised learning approach to learn audio-visual representations from video and audio. Our method uses contrastive learning for cross-modal discrimination of video from audio and vice-versa. We show that optimizing for…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Pedro Morgado, Nuno Vasconcelos, Ishan Misra

Contrastive representation learning has proven to be an effective self-supervised learning method for images and videos. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive…

Computer Vision and Pattern Recognition · Computer Science 2021-01-13 Tengda Han, Weidi Xie, Andrew Zisserman

We study self-supervised video representation learning, which is a challenging task due to 1) lack of labels for explicit supervision; 2) unstructured and noisy visual information. Existing methods mainly use contrastive loss with video…

Computer Vision and Pattern Recognition · Computer Science 2021-08-18 Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding

The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. Contrastive learning in the context of sound source localization leverages the natural correspondence between audio and visual signals…

Computer Vision and Pattern Recognition · Computer Science 2022-11-04 Sooyoung Park, Arda Senocak, Joon Son Chung

Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised…

Sound · Computer Science 2020-11-17 Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model. Although various multi-modal learning methods have been…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian, Yan Chen

We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Simon Jenni, Alexander Black, John Collomosse

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy. In such a case,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-13 Li Tao, Xueting Wang, Toshihiko Yamasaki

Self-supervised instance discrimination is an effective contrastive pretext task to learn feature representations and address limited medical image annotations. The idea is to make features of transformed versions of the same images similar…

Computer Vision and Pattern Recognition · Computer Science 2022-11-17 Yejia Zhang, Xinrong Hu, Nishchal Sapkota, Yiyu Shi, Danny Z. Chen

Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault, Stéphane Canu

Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries…

Machine Learning · Computer Science 2021-06-21 Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra

Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative…

Machine Learning · Computer Science 2020-10-06 Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, Noah Goodman

Contrastive self-supervised learning methods famously produce high quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong…

Computer Vision and Pattern Recognition · Computer Science 2023-04-05 Ruchika Chavhan, Henry Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy Hospedales

In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence…

Computer Vision and Pattern Recognition · Computer Science 2020-12-10 Ning Wang, Wengang Zhou, Houqiang Li

Adversarial perturbations are noise-like patterns that can subtly change the data, while failing an otherwise accurate classifier. In this paper, we propose to use such perturbations within a novel contrastive learning setup to build…

Computer Vision and Pattern Recognition · Computer Science 2020-04-17 Jue Wang, Anoop Cherian

Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes only the audio and…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

We present an approach to learn voice-face representations from the talking face videos, without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face. These…

Sound · Computer Science 2022-05-30 Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR). However, the robustness impact of combining the two…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai

This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-11 Aswin Sivaraman, Minje Kim