Related papers: Controllable Augmentations for Video Representatio…

200 papers

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

Temporal grounding, which localizes video moments related to a natural language query, is a core problem of vision-language learning and video understanding. To encode video moments of varying lengths, recent methods employ a multi-level…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Thong Thanh Nguyen, Yi Bin, Xiaobao Wu, Zhiyuan Hu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning. CACL consists of a 3D CNN and a video transformer which are used in parallel to generate diverse…

Computer Vision and Pattern Recognition · Computer Science 2022-05-27 Sheng Guo, Zihua Xiong, Yujie Zhong, Limin Wang, Xiaobo Guo, Bing Han, Weilin Huang

Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs. However, we observe that this formulation…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Shuangrui Ding, Rui Qian, Hongkai Xiong

Contrastive pretraining can substantially increase model generalisation and downstream performance. However, the quality of the learned representations is highly dependent on the data augmentation strategy applied to generate positive…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Mélanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker

Natural videos provide rich visual contents for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited…

Computer Vision and Pattern Recognition · Computer Science 2022-04-08 Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

In low-level video analyses, effective representations are important to derive the correspondences between video frames. These representations have been learned in a self-supervised fashion from unlabeled images or videos, using carefully…

Computer Vision and Pattern Recognition · Computer Science 2023-06-23 Rui Li, Dong Liu

Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang

Recent contrastive methods show significant improvement in self-supervised learning in several domains. In particular, contrastive methods are most effective where data augmentation can be easily constructed e.g. in computer vision.…

Machine Learning · Computer Science 2021-12-09 Konstantinos Kallidromitis, Denis Gudovskiy, Kazuki Kozuka, Iku Ohama, Luca Rigazio

Deep-Learning-based video recognition has shown promising improvements along with the development of large-scale datasets and spatiotemporal network architectures. In image recognition, learning spatially invariant features is a key factor…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Taeoh Kim, Hyeongmin Lee, MyeongAh Cho, Ho Seong Lee, Dong Heon Cho, Sangyoun Lee

Unsupervised object-centric learning from videos is a promising approach to extract structured representations from large, unlabeled collections of videos. To support downstream tasks like autonomous control, these representations must be…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Anna Manasyan, Maximilian Seitzer, Filip Radovic, Georg Martius, Andrii Zadaianchuk

Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has…

Computer Vision and Pattern Recognition · Computer Science 2021-08-30 Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker

Recently, pretext-task-based methods have been proposed one after another in self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance. Usually, new methods can beat previous ones as claimed that…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Li Tao, Xueting Wang, Toshihiko Yamasaki

Attempting to fully discover the temporal diversity and chronological characteristics for self-supervised video representation learning, this work takes advantage of the temporal dependencies within videos and further proposes a novel…

Computer Vision and Pattern Recognition · Computer Science 2021-03-18 Yang Liu, Keze Wang, Haoyuan Lan, Liang Lin

We propose a self-supervised method for learning motion-focused video representations. Existing approaches minimize distances between temporally augmented videos, which maintain high spatial similarity. We instead propose to learn…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Fida Mohammad Thoker, Hazel Doughty, Cees Snoek

Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate…

Computer Vision and Pattern Recognition · Computer Science 2024-05-14 Nazim Bendib

Visual imagery does not consist of solitary objects, but instead reflects the composition of a multitude of fluid concepts. While there have been great advances in visual representation learning, such advances have focused on building…

Computer Vision and Pattern Recognition · Computer Science 2025-04-07 Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens

Recent advances in supervised deep learning methods are enabling remote measurements of photoplethysmography-based physiological signals using facial videos. The performance of these supervised methods, however, is dependent on the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-15 Hao Wang, Euijoon Ahn, Jinman Kim

This thesis explores the central question of how to leverage temporal relations among video elements to advance video understanding. Addressing the limitations of existing methods, the work presents a five-fold contribution: (1) an…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Thong Thanh Nguyen

Temporal cues in videos provide important information for recognizing actions accurately. However, temporal-discriminative features can hardly be extracted without using an annotated large-scale video action dataset for training. This paper…

Computer Vision and Pattern Recognition · Computer Science 2020-08-06 Jinpeng Wang, Yiqi Lin, Andy J. Ma, Pong C. Yuen