Related papers: Controllable Augmentations for Video Representatio…

Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition

We address the problem of data augmentation for video action recognition. Standard augmentation strategies in video are hand-designed and sample the space of possible augmented data points either at random, without knowing which augmented…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-26 · Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara

Contrastive Learning of Person-independent Representations for Facial Action Unit Detection

Facial action unit (AU) detection, aiming to classify AU present in the facial image, has long suffered from insufficient AU annotations. In this paper, we aim to mitigate this data scarcity issue by learning AU representations from a large…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-07 · Yong Li, Shiguang Shan

Self-supervised Video Representation Learning by Context and Motion Decoupling

A key challenge in self-supervised video representation learning is how to effectively capture motion information besides context bias. While most existing works implicitly achieve this with video-specific pretext tasks (e.g., predicting…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-05 · Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

There is a natural correlation between the visual and auditive elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-13 · Bruno Korbar, Du Tran, Lorenzo Torresani

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While being very effective on learning holistic image and video representations, such an objective becomes sub-optimal for…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-05 · Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-01 · Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

Using Navigational Information to Learn Visual Representations

Children learn to build a visual representation of the world from unsupervised exploration and we hypothesize that a key part of this learning ability is the use of self-generated navigational information as a similarity label to drive a…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-17 · Lizhen Zhu, Brad Wyble, James Z. Wang

Watching the World Go By: Representation Learning from Unlabeled Videos

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-08 · Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi

Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos

Contrastive learning has recently narrowed the gap between self-supervised and supervised methods in image and video domain. State-of-the-art video contrastive learning methods such as CVRL and $\rho$-MoCo spatiotemporally augment two clips…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · David Fan, Deyu Yang, Xinyu Li, Vimal Bhat, Rohith MV

Towards Efficient and Effective Self-Supervised Learning of Visual Representations

Self-supervision has emerged as a propitious method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity based approaches. Most state-of-the-art methods enforce similarity…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-19 · Sravanti Addepalli, Kaushal Bhogale, Priyam Dey, R. Venkatesh Babu

Meta-Contrastive Learning for Vision-Language Models via Task-Adaptive CLIP Training

We propose Domain-Conditioned Meta-Contrastive Learning, a framework for improving the cross-domain generalization of vision-language models. While contrastive models such as CLIP achieve strong performance through large-scale training,…

Optimization and Control · Mathematics · 2026-03-31 · Merham Fouladvand, Peuroly Batra

Can Temporal Information Help with Contrastive Self-Supervised Learning?

Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recent successful instance discrimination based contrastive…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-30 · Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille

Self-supervised Video Representation Learning by Pace Prediction

This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-07 · Jiangliu Wang, Jianbo Jiao, Yun-Hui Liu

Robust Audio-Visual Instance Discrimination

We present a self-supervised learning method to learn audio and video representations. Prior work uses the natural correspondence between audio and video to define a standard cross-modal instance discrimination task, where a model is…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Pedro Morgado, Ishan Misra, Nuno Vasconcelos

Unsupervised Video Representation Learning by Bidirectional Feature Prediction

This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to the previous methods that focus on future feature prediction, we argue that a supervisory signal arising from…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-13 · Nadine Behrmann, Juergen Gall, Mehdi Noroozi

Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views

A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable. However, the predetermined composition of multiple data augmentations brings…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Junbo Zhang, Kaisheng Ma

Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos,…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-24 · Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Video self-supervised learning is a challenging task, which requires significant expressive power from the model to leverage rich spatial-temporal knowledge and generate effective supervisory signals from large amounts of unlabeled videos.…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-19 · Yang Liu, Keze Wang, Lingbo Liu, Haoyuan Lan, Liang Lin

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Unsupervised domain adaptation which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain has attracted much attention in recent years. While many domain adaptation techniques have been proposed…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-29 · Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Video 3D Sampling for Self-supervised Representation Learning

Most of the existing video self-supervised methods mainly leverage temporal signals of videos, ignoring that the semantics of moving objects and environmental information are all critical for video-related tasks. In this paper, we propose a…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-09 · Wei Li, Dezhao Luo, Bo Fang, Yu Zhou, Weiping Wang