Related papers: Self-supervised Motion Learning from Static Images

MOFO: MOtion FOcused Self-Supervision for Video Understanding

Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Mona Ahmadian, Frank Guerin, Andrew Gilbert

Self-supervised Sparse to Dense Motion Segmentation

Observable motion in videos can give rise to the definition of objects moving with respect to the scene. The task of segmenting such moving objects is referred to as motion segmentation and is usually tackled either by aggregating motion…

Computer Vision and Pattern Recognition · Computer Science 2020-08-19 Amirhossein Kardoost, Kalun Ho, Peter Ochs, Margret Keuper

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning

Dense self-supervised learning has shown great promise for learning pixel- and patch-level representations, but extending it to videos remains challenging due to the complexity of motion dynamics. Existing approaches struggle as they rely…

Computer Vision and Pattern Recognition · Computer Science 2025-07-11 Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion, Efstratios Gavves, Cees G. M. Snoek, Yuki M Asano

Using Motion and Internal Supervision in Object Recognition

In this thesis we address two related aspects of visual object recognition: the use of motion information, and the use of internal supervision, to help unsupervised learning. These two aspects are inter-related in the current study, since…

Computer Vision and Pattern Recognition · Computer Science 2018-12-14 Daniel Harari

Pose from Action: Unsupervised Learning of Pose Features based on Motion

Human actions are comprised of a sequence of poses. This makes videos of humans a rich and dense source of human poses. We propose an unsupervised method to learn pose features from videos that exploits a signal which is complementary to…

Computer Vision and Pattern Recognition · Computer Science 2016-09-20 Senthil Purushwalkam, Abhinav Gupta

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

Self-Supervised Learning via Conditional Motion Propagation

Intelligent agents naturally learn from motion. Various self-supervised algorithms have leveraged motion cues to learn effective visual representations. The hurdle here is that motion is both ambiguous and complex, rendering previous works…

Computer Vision and Pattern Recognition · Computer Science 2019-04-26 Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy

Self-supervised Video Representation Learning by Context and Motion Decoupling

A key challenge in self-supervised video representation learning is how to effectively capture motion information besides context bias. While most existing works implicitly achieve this with video-specific pretext tasks (e.g., predicting…

Computer Vision and Pattern Recognition · Computer Science 2021-04-05 Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

Masked autoencoders (MAEs) have emerged recently as state-of-the-art self-supervised spatiotemporal representation learners. Inheriting from the image counterparts, however, existing video MAEs still focus largely on static appearance learning whilst…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan

Learning Features by Watching Objects Move

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation.…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Learning Motion Patterns in Videos

The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved. We address this challenging task by learning motion patterns in videos. The core of our approach is a fully convolutional…

Computer Vision and Pattern Recognition · Computer Science 2017-04-11 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos

Human behavior understanding in videos is a complex, still unsolved problem that requires accurately modeling motion at both the local (pixel-wise dense prediction) and global (aggregation of motion cues) levels. Current approaches based on…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 C. Spampinato, S. Palazzo, P. D'Oro, D. Giordano, M. Shah

Learning to encode motion using spatio-temporal synchrony

We consider the task of learning to extract motion from videos. To this end, we show that the detection of spatial transformations can be viewed as the detection of synchrony between the image sequence and a sequence of features undergoing…

Computer Vision and Pattern Recognition · Computer Science 2014-02-11 Kishore Reddy Konda, Roland Memisevic, Vincent Michalski

Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition

Spatio-temporal convolution often fails to learn motion dynamics in videos and thus an effective motion representation is required for video understanding in the wild. In this paper, we propose a rich and robust motion representation based…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science 2021-02-01 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

Motion-supervised Co-Part Segmentation

Recent co-part segmentation methods mostly operate in a supervised learning setting, which requires a large amount of annotated data for training. To overcome this limitation, we propose a self-supervised deep learning method for co-part…

Computer Vision and Pattern Recognition · Computer Science 2021-04-12 Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion

Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not…

Computer Vision and Pattern Recognition · Computer Science 2022-10-17 Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

Unsupervised Learning of Object Structure and Dynamics from Videos

Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee

LocoMotion: Learning Motion-Focused Video-Language Representations

This paper strives for motion-focused video-language representations. Existing methods to learn video-language representations use spatial-focused data, where identifying the objects and scene is often enough to distinguish the relevant…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Hazel Doughty, Fida Mohammad Thoker, Cees G. M. Snoek

Self-supervised Video Object Segmentation by Motion Grouping

Animals have evolved highly functional visual systems to understand motion, assisting perception even under complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting…

Computer Vision and Pattern Recognition · Computer Science 2021-08-12 Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie