Related papers: Memory-augmented Dense Predictive Coding for Video…

200 papers

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for…

Computer Vision and Pattern Recognition · Computer Science 2019-09-30 Tengda Han, Weidi Xie, Andrew Zisserman

A key challenge in self-supervised video representation learning is how to effectively capture motion information besides context bias. While most existing works implicitly achieve this with video-specific pretext tasks (e.g., predicting…

Computer Vision and Pattern Recognition · Computer Science 2021-04-05 Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Duc Quang Vu, Ngan T. H. Le, Jia-Ching Wang

Self-supervised learning has become an increasingly important paradigm in the domain of machine intelligence. Furthermore, evidence for self-supervised adaptation, such as contrastive formulations, has emerged in recent computational…

Neural and Evolutionary Computing · Computer Science 2025-03-31 Alexander Ororbia, Karl Friston, Rajesh P. N. Rao

We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or…

Machine Learning · Computer Science 2016-01-05 Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the…

Machine Learning · Computer Science 2021-04-14 Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work we present a self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2021-06-11 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recently, much progress has been made for self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for…

Computer Vision and Pattern Recognition · Computer Science 2022-04-28 Guanhong Wang, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang

Recent successes in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Self-supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities. However, most works typically focus on a particular modality or feature alone and there has been very…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-21 Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic

Compressed video action recognition has recently drawn growing attention, since it remarkably reduces the storage and computational cost via replacing raw videos by sparsely sampled RGB frames and compressed motion cues (e.g., motion…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Bing Li, Jiaxin Chen, Dongming Zhang, Xiuguo Bao, Di Huang

Deep learning models have achieved excellent recognition results on large-scale video benchmarks. However, they perform poorly when applied to videos with rare scenes or objects, primarily due to the bias of existing video datasets. We…

Computer Vision and Pattern Recognition · Computer Science 2022-09-21 Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

Video representation learning has recently attracted attention in computer vision due to its applications for activity and scene forecasting or vision-based planning and control. Video prediction models often learn a latent representation…

Computer Vision and Pattern Recognition · Computer Science 2020-09-18 Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler

Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies…

Computation and Language · Computer Science 2020-11-03 Alexander H. Liu, Yu-An Chung, James Glass

Multi-view learning attempts to generate a model with better performance by exploiting the consensus and/or complementarity among multi-view data. However, in terms of complementarity, most existing approaches can only find…

Machine Learning · Computer Science 2022-01-04 Jian-wei Liu, Xi-hao Ding, Run-kun Lu, Xionglin Luo

Self-supervised learning of image representations by predicting future frames is a promising direction but still remains a challenge. This is because of the under-determined nature of frame prediction; multiple potential futures can arise…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

This paper addresses the task of unsupervised learning of representations for action recognition in videos. Previous works proposed to utilize future prediction, or other domain-specific objectives to train a network, but achieved only…

Computer Vision and Pattern Recognition · Computer Science 2020-06-30 Pavel Tokmakov, Martial Hebert, Cordelia Schmid

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive…

Computer Vision and Pattern Recognition · Computer Science 2021-01-13 Tengda Han, Weidi Xie, Andrew Zisserman