Related papers: Memory-augmented Dense Predictive Coding for Video…

Video Representation Learning by Dense Predictive Coding

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for…

Computer Vision and Pattern Recognition · Computer Science 2019-09-30 Tengda Han, Weidi Xie, Andrew Zisserman

Self-supervised Video Representation Learning by Context and Motion Decoupling

A key challenge in self-supervised video representation learning is how to effectively capture motion information besides context bias. While most existing works implicitly achieve this with video-specific pretext tasks (e.g., predicting…

Computer Vision and Pattern Recognition · Computer Science 2021-04-05 Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Meta-Representational Predictive Coding: Biomimetic Self-Supervised Learning

Self-supervised learning has become an increasingly important paradigm in the domain of machine intelligence. Furthermore, evidence for self-supervised adaptation, such as contrastive formulations, has emerged in recent computational…

Neural and Evolutionary Computing · Computer Science 2025-03-31 Alexander Ororbia, Karl Friston, Rajesh P. N. Rao

Unsupervised Learning of Video Representations using LSTMs

We use multilayer Long Short-Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or…

Machine Learning · Computer Science 2016-01-05 Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

Self-supervised Video Object Segmentation

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

Self-supervised Representation Learning with Relative Predictive Coding

This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the…

Machine Learning · Computer Science 2021-04-14 Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov

Cross-Modal Discrete Representation Learning

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work we present a self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2021-06-11 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

Recently, much progress has been made for self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for…

Computer Vision and Pattern Recognition · Computer Science 2022-04-28 Guanhong Wang, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang

Unsupervised Learning of View-invariant Action Representations

Most recent successes in human action recognition with deep learning methods adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Visually Guided Self Supervised Learning of Speech Representations

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-21 Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement

Compressed video action recognition has recently drawn growing attention, since it remarkably reduces the storage and computational cost via replacing raw videos by sparsely sampled RGB frames and compressed motion cues (e.g., motion…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Bing Li, Jiaxin Chen, Dongming Zhang, Xiuguo Bao, Di Huang

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

Deep learning models have achieved excellent recognition results on large-scale video benchmarks. However, they perform poorly when applied to videos with rare scenes or objects, primarily due to the bias of existing video datasets. We…

Computer Vision and Pattern Recognition · Computer Science 2022-09-21 Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Unsupervised Learning of Long-Term Motion Dynamics for Videos

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

Learning to Identify Physical Parameters from Video Using Differentiable Physics

Video representation learning has recently attracted attention in computer vision due to its applications for activity and scene forecasting or vision-based planning and control. Video prediction models often learn a latent representation…

Computer Vision and Pattern Recognition · Computer Science 2020-09-18 Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies…

Computation and Language · Computer Science 2020-11-03 Alexander H. Liu, Yu-An Chung, James Glass

Self-attention Multi-view Representation Learning with Diversity-promoting Complementarity

Multi-view learning attempts to generate a model with better performance by exploiting the consensus and/or complementarity among multi-view data. However, in terms of complementarity, most existing approaches can only find…

Machine Learning · Computer Science 2022-01-04 Jian-wei Liu, Xi-hao Ding, Run-kun Lu, Xionglin Luo

Visual Representation Learning with Stochastic Frame Prediction

Self-supervised learning of image representations by predicting future frames is a promising direction but remains a challenge. This is because of the under-determined nature of frame prediction; multiple potential futures can arise…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

Unsupervised Learning of Video Representations via Dense Trajectory Clustering

This paper addresses the task of unsupervised learning of representations for action recognition in videos. Previous works proposed to utilize future prediction, or other domain-specific objectives to train a network, but achieved only…

Computer Vision and Pattern Recognition · Computer Science 2020-06-30 Pavel Tokmakov, Martial Hebert, Cordelia Schmid

Self-supervised Co-training for Video Representation Learning

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive…

Computer Vision and Pattern Recognition · Computer Science 2021-01-13 Tengda Han, Weidi Xie, Andrew Zisserman