Related papers: Video Representation Learning by Recognizing Tempo…

Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised…

Computer Vision and Pattern Recognition · Computer Science 2016-07-27 Ishan Misra, C. Lawrence Zitnick, Martial Hebert

Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video

We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual…

Computer Vision and Pattern Recognition · Computer Science 2021-06-02 Hyeon Cho, Taehoon Kim, Hyung Jin Chang, Wonjun Hwang

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

Time-Contrastive Networks: Self-Supervised Learning from Video

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings:…

Computer Vision and Pattern Recognition · Computer Science 2018-03-21 Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Learning Representative Temporal Features for Action Recognition

In this paper, a novel video classification method is presented that aims to recognize different categories of third-person videos efficiently. Our motivation is to achieve a light model that could be trained with insufficient training…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Ali Javidani, Ahmad Mahmoudi-Aznaveh

Time-Equivariant Contrastive Video Representation Learning

We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos. Existing approaches ignore the specifics of input distortions, e.g., by learning invariance to temporal transformations.…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Simon Jenni, Hailin Jin

Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction

The success of deep neural networks generally requires a vast amount of labeled training data, which is expensive and infeasible at scale, especially for video collections. To alleviate this problem, in this paper, we propose…

Computer Vision and Pattern Recognition · Computer Science 2019-04-05 Longlong Jing, Xiaodong Yang, Jingen Liu, Yingli Tian

Self-supervised Representation Learning for Ultrasound Video

Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Watching the World Go By: Representation Learning from Unlabeled Videos

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of…

Computer Vision and Pattern Recognition · Computer Science 2020-05-08 Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi

Learning Features by Watching Objects Move

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation.…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Learning-based Video Motion Magnification

Video motion magnification techniques allow us to see small motions previously invisible to the naked eye, such as those of vibrating airplane wings or buildings swaying in the wind. Because the motion is small, the…

Computer Vision and Pattern Recognition · Computer Science 2019-02-18 Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik

The TIME Machine: On The Power of Motion for Efficient Perception

Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of training and the success of visual models trained contrastively with language. While these factors have…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Mantas Skackauskas, Xinyue Hao, Laura Sevilla-Lara

Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision

Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed as short sequences and require ground-truth annotations of each video…

Computer Vision and Pattern Recognition · Computer Science 2019-02-21 Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan

Unsupervised Learning of View-invariant Action Representations

Recent successes in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations

Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may - and to some extent already does -…

Computer Vision and Pattern Recognition · Computer Science 2023-12-04 Bonifaz Stuhr

Video Representation Learning by Dense Predictive Coding

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for…

Computer Vision and Pattern Recognition · Computer Science 2019-09-30 Tengda Han, Weidi Xie, Andrew Zisserman

Learning Video Representations without Natural Videos

We show that useful video representations can be learned from synthetic videos and natural images, without incorporating natural videos in the training. We propose a progression of video datasets synthesized by simple generative processes,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Xueyang Yu, Xinlei Chen, Yossi Gandelsman

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

We propose a self-supervised method for learning motion-focused video representations. Existing approaches minimize distances between temporally augmented videos, which maintain high spatial similarity. We instead propose to learn…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Fida Mohammad Thoker, Hazel Doughty, Cees Snoek

Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling…

Robotics · Computer Science 2018-11-20 Eric Jang, Coline Devin, Vincent Vanhoucke, Sergey Levine