Related papers: Controllable Augmentations for Video Representatio…

200 papers

Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have…

Computer Vision and Pattern Recognition · Computer Science 2023-09-08 Arjon Das, Xin Zhong

We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a…

Machine Learning · Computer Science 2024-03-21 Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence…

Computer Vision and Pattern Recognition · Computer Science 2020-12-10 Ning Wang, Wengang Zhou, Houqiang Li

The de facto approach in video object-centric learning maintains temporal consistency through learned dynamics modules that predict future object representations, called slots. We demonstrate that these predictors function as expensive…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Zhiyuan Li, Rongzhen Zhao, Wenyan Yang, Wenshuai Zhao, Pekka Marttinen, Joni Pajarinen

Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Temporal graph representation learning aims to generate low-dimensional dynamic node embeddings to capture temporal information as well as structural and property information. Current representation learning methods for temporal networks…

Machine Learning · Computer Science 2023-11-08 Hongjiang Chen, Pengfei Jiao, Huijun Tang, Huaming Wu

Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning. They are not suitable for exploiting the rich…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Martine Toering, Ioannis Gatopoulos, Maarten Stol, Vincent Tao Hu

Sequential recommendation addresses the issue of preference drift by predicting the next item based on the user's previous behaviors. Recently, a promising approach using contrastive learning has emerged, demonstrating its effectiveness in…

Information Retrieval · Computer Science 2023-08-08 Dongjun Lee, Donggeun Ko, Jaekwang Kim

One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

The abundance and ease of utilizing sound, along with the fact that auditory clues reveal so much about what happens in the scene, make the audio-visual space a perfectly intuitive choice for self-supervised representation learning.…

Computer Vision and Pattern Recognition · Computer Science 2021-06-17 Mahdi M. Kalayeh, Nagendra Kamath, Lingyi Liu, Ashok Chandrashekar

Learning rich visual representations using contrastive self-supervised learning has been extremely successful. However, it is still a major question whether we could use a similar approach to learn superior auditory representations. In this…

Sound · Computer Science 2020-10-20 Haider Al-Tahan, Yalda Mohsenzadeh

Video super-resolution aims at generating a high-resolution video from its low-resolution counterpart. With the rapid rise of deep learning, many recently proposed video super-resolution methods use convolutional neural networks in…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Xiaohong Liu, Lingshi Kong, Yang Zhou, Jiying Zhao, Jun Chen

Robotic manipulation requires anticipating how the environment evolves in response to actions, yet most existing systems lack this predictive capability, often resulting in errors and inefficiency. While Vision-Language Models (VLMs)…

Robotics · Computer Science 2026-02-12 Songen Gu, Yunuo Cai, Tianyu Wang, Simo Wu, Yanwei Fu

Contrastive learning-based recommendation algorithms have significantly advanced the field of self-supervised recommendation, particularly with BPR as a representative ranking prediction task that dominates implicit collaborative filtering.…

Information Retrieval · Computer Science 2024-03-13 Shipeng Song, Bin Liu, Fei Teng, Tianrui Li

Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models is achieved via a web-scale data collection process to ensure broad concept…

Computer Vision and Pattern Recognition · Computer Science 2023-01-18 Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li

Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-11 Salah Zaiem, Titouan Parcollet, Slim Essid

Video understanding relies on perceiving the global content and modeling its internal connections (e.g., causality, movement, and spatio-temporal correspondence). To learn these interactions, we apply a mask-then-predict pre-training task…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal

Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying working mechanism is yet unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption…

Machine Learning · Computer Science 2025-11-06 Qi Zhang, Yifei Wang, Yisen Wang

Recently, contrastive self-supervised learning has become a key component for learning visual representations across many computer vision tasks and benchmarks. However, contrastive learning in the context of domain adaptation remains…

Computer Vision and Pattern Recognition · Computer Science 2021-06-25 Mamatha Thota, Georgios Leontidis

We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transforms or appearance perturbations, our method…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Jinfan Zhou, Lixin Luo, Sungmin Eum, Heesung Kwon, Jeong Joon Park