Related papers: Controllable Augmentations for Video Representatio…

200 papers

We address the problem of data augmentation for video action recognition. Standard augmentation strategies in video are hand-designed and sample the space of possible augmented data points either at random, without knowing which augmented…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara

Facial action unit (AU) detection, which aims to classify the AUs present in a facial image, has long suffered from insufficient AU annotations. In this paper, we aim to mitigate this data scarcity issue by learning AU representations from a large…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Yong Li, Shiguang Shan

A key challenge in self-supervised video representation learning is how to effectively capture motion information besides context bias. While most existing works implicitly achieve this with video-specific pretext tasks (e.g., predicting…

Computer Vision and Pattern Recognition · Computer Science 2021-04-05 Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin

There is a natural correlation between the visual and auditive elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal…

Computer Vision and Pattern Recognition · Computer Science 2018-11-13 Bruno Korbar, Du Tran, Lorenzo Torresani

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While very effective at learning holistic image and video representations, such an objective becomes sub-optimal for…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science 2021-02-01 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

Children learn to build a visual representation of the world from unsupervised exploration and we hypothesize that a key part of this learning ability is the use of self-generated navigational information as a similarity label to drive a…

Computer Vision and Pattern Recognition · Computer Science 2022-02-17 Lizhen Zhu, Brad Wyble, James Z. Wang

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of…

Computer Vision and Pattern Recognition · Computer Science 2020-05-08 Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi

Contrastive learning has recently narrowed the gap between self-supervised and supervised methods in the image and video domains. State-of-the-art video contrastive learning methods such as CVRL and $\rho$-MoCo spatiotemporally augment two clips…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 David Fan, Deyu Yang, Xinyu Li, Vimal Bhat, Rohith MV

Self-supervision has emerged as a propitious method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity based approaches. Most state-of-the-art methods enforce similarity…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Sravanti Addepalli, Kaushal Bhogale, Priyam Dey, R. Venkatesh Babu

We propose Domain-Conditioned Meta-Contrastive Learning, a framework for improving the cross-domain generalization of vision-language models. While contrastive models such as CLIP achieve strong performance through large-scale training,…

Optimization and Control · Mathematics 2026-03-31 Merham Fouladvand, Peuroly Batra

Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recent successful instance discrimination based contrastive…

Computer Vision and Pattern Recognition · Computer Science 2020-11-30 Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille

This paper addresses the problem of self-supervised video representation learning from a new perspective: video pace prediction. It stems from the observation that the human visual system is sensitive to video pace, e.g., slow motion, a…

Computer Vision and Pattern Recognition · Computer Science 2020-09-07 Jiangliu Wang, Jianbo Jiao, Yun-Hui Liu

We present a self-supervised learning method to learn audio and video representations. Prior work uses the natural correspondence between audio and video to define a standard cross-modal instance discrimination task, where a model is…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Pedro Morgado, Ishan Misra, Nuno Vasconcelos

This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to the previous methods that focus on future feature prediction, we argue that a supervisory signal arising from…

Computer Vision and Pattern Recognition · Computer Science 2020-11-13 Nadine Behrmann, Juergen Gall, Mehdi Noroozi

A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable. However, the predetermined composition of multiple data augmentations brings…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Junbo Zhang, Kaisheng Ma

Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-24 Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

Video self-supervised learning is a challenging task, which requires significant expressive power from the model to leverage rich spatial-temporal knowledge and generate effective supervisory signals from large amounts of unlabeled videos.…

Computer Vision and Pattern Recognition · Computer Science 2023-07-19 Yang Liu, Keze Wang, Lingbo Liu, Haoyuan Lan, Liang Lin

Unsupervised domain adaptation, which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain, has attracted much attention in recent years. While many domain adaptation techniques have been proposed…

Computer Vision and Pattern Recognition · Computer Science 2021-10-29 Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Most of the existing video self-supervised methods mainly leverage temporal signals of videos, ignoring that the semantics of moving objects and environmental information are all critical for video-related tasks. In this paper, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2021-07-09 Wei Li, Dezhao Luo, Bo Fang, Yu Zhou, Weiping Wang