Related papers: Video 3D Sampling for Self-supervised Representati…

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang , Jianbo Jiao , Linchao Bao , Shengfeng He , Yunhui Liu , Wei Liu

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science 2021-02-01 Jiangliu Wang , Jianbo Jiao , Linchao Bao , Shengfeng He , Wei Liu , Yun-hui Liu

Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video

We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual…

Computer Vision and Pattern Recognition · Computer Science 2021-06-02 Hyeon Cho , Taehoon Kim , Hyung Jin Chang , Wonjun Hwang

Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

Self-supervised tasks such as colorization, inpainting and zigsaw puzzle have been utilized for visual representation learning for still images, when the number of labeled images is limited or absent at all. Recently, this worthwhile stream…

Computer Vision and Pattern Recognition · Computer Science 2018-11-27 Dahun Kim , Donghyeon Cho , In So Kweon

Self-Supervised Representation Learning for Visual Anomaly Detection

Self-supervised learning allows for better utilization of unlabelled data. The feature representation obtained by self-supervision can be used in downstream tasks such as classification, object detection, segmentation, and anomaly…

Computer Vision and Pattern Recognition · Computer Science 2020-06-18 Rabia Ali , Muhammad Umar Karim Khan , Chong Min Kyung

Time-Contrastive Networks: Self-Supervised Learning from Video

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings:…

Computer Vision and Pattern Recognition · Computer Science 2018-03-21 Pierre Sermanet , Corey Lynch , Yevgen Chebotar , Jasmine Hsu , Eric Jang , Stefan Schaal , Sergey Levine

Self-Supervised Learning via multi-Transformation Classification for Action Recognition

Self-supervised tasks have been utilized to build useful representations that can be used in downstream tasks when the annotation is unavailable. In this paper, we introduce a self-supervised video representation learning method based on…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Duc Quang Vu , Ngan T. H. Le , Jia-Ching Wang

Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos

A key challenge of learning a visual representation for the 3D high fidelity geometry of dressed humans lies in the limited availability of the ground truth data (e.g., 3D scanned models), which results in the performance degradation of 3D…

Computer Vision and Pattern Recognition · Computer Science 2022-12-29 Yasamin Jafarian , Hyun Soo Park

Self-Supervised Learning for Videos: A Survey

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos.…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Madeline C. Schiappa , Yogesh S. Rawat , Mubarak Shah

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

We propose a self-supervised method for learning motion-focused video representations. Existing approaches minimize distances between temporally augmented videos, which maintain high spatial similarity. We instead propose to learn…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Fida Mohammad Thoker , Hazel Doughty , Cees Snoek

A Large-Scale Analysis on Self-Supervised Video Representation Learning

Self-supervised learning is an effective way for label-free model pre-training, especially in the video domain where labeling is expensive. Existing self-supervised works in the video domain use varying experimental setups to demonstrate…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 Akash Kumar , Ashlesha Kumar , Vibhav Vineet , Yogesh Singh Rawat

Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-24 Mohammadreza Salehi , Efstratios Gavves , Cees G. M. Snoek , Yuki M. Asano

Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm

This paper presents a novel semi-supervised deep learning algorithm for retrieving similar 2D and 3D videos based on visual content. The proposed approach combines the power of deep convolutional and recurrent neural networks with dynamic…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Yintai Ma , Diego Klabjan

Progressive Video Summarization via Multimodal Self-supervised Learning

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep…

Computer Vision and Pattern Recognition · Computer Science 2022-10-20 Li Haopeng , Ke Qiuhong , Gong Mingming , Tom Drummond

3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous methods only extract video spatial features…

Computer Vision and Pattern Recognition · Computer Science 2022-11-11 Rui Deng , Qian Wu , Yuke Li

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

A video autoencoder is proposed for learning disentan- gled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in…

Computer Vision and Pattern Recognition · Computer Science 2021-10-07 Zihang Lai , Sifei Liu , Alexei A. Efros , Xiaolong Wang

Time-Equivariant Contrastive Video Representation Learning

We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos. Existing approaches ignore the specifics of input distortions, e.g., by learning invariance to temporal transformations.…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Simon Jenni , Hailin Jin

Learning by Aligning Videos in Time

We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. We leverage a novel combination of temporal alignment…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Sanjay Haresh , Sateesh Kumar , Huseyin Coskun , Shahram Najam Syed , Andrey Konin , Muhammad Zeeshan Zia , Quoc-Huy Tran

Audio-Visual Contrastive Learning with Temporal Self-Supervision

We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Simon Jenni , Alexander Black , John Collomosse

Self-supervised Video Representation Learning by Pace Prediction

This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction. It stems from the observation that human visual system is sensitive to video pace, e.g., slow motion, a…

Computer Vision and Pattern Recognition · Computer Science 2020-09-07 Jiangliu Wang , Jianbo Jiao , Yun-Hui Liu