Related papers: Controllable Augmentations for Video Representatio…

Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval

Content creators often use music to enhance their videos, from soundtracks in movies to background music in video blogs and social media content. However, identifying the best music for a video can be a difficult and time-consuming task. To…

Multimedia · Computer Science 2024-12-24 Shanti Stewart, Gouthaman KV, Lie Lu, Andrea Fanelli

Self-Supervised Video Representation Learning by Video Incoherence Detection

This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning. It stems from the observation that the human visual system can easily identify video incoherence based on…

Computer Vision and Pattern Recognition · Computer Science 2021-09-28 Haozhi Cao, Yuecong Xu, Jianfei Yang, Kezhi Mao, Lihua Xie, Jianxiong Yin, Simon See

Contrastive Learning from Demonstrations

This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick…

Computer Vision and Pattern Recognition · Computer Science 2023-01-30 André Correia, Luís A. Alexandre

Multimodal Self-Supervised Learning of General Audio Representations

We present a multimodal framework to learn general audio representations from videos. Existing contrastive audio representation learning methods mainly focus on using the audio modality alone during training. In this work, we show that…

Sound · Computer Science 2021-04-29 Luyu Wang, Pauline Luc, Adria Recasens, Jean-Baptiste Alayrac, Aaron van den Oord

CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

Compressing videos into binary codes can improve retrieval speed and reduce storage overhead. However, learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies between…

Computer Vision and Pattern Recognition · Computer Science 2023-11-06 Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou

Learning Temporal Dynamics from Cycles in Narrated Video

Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community. We propose a self-supervised solution to this problem using temporal cycle consistency jointly in vision and…

Computer Vision and Pattern Recognition · Computer Science 2021-09-14 Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun

CCVS: Context-aware Controllable Video Synthesis

This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: It conditions the synthesis process on contextual…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Guillaume Le Moing, Jean Ponce, Cordelia Schmid

Video Understanding as Machine Translation

With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations. Most prior work formulates the…

Computer Vision and Pattern Recognition · Computer Science 2020-09-21 Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data. Previous methods commonly conduct representation learning at the clip or frame level and cannot well capture fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan

Video alignment using unsupervised learning of local and global features

In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite…

Computer Vision and Pattern Recognition · Computer Science 2024-09-09 Niloufar Fakhfour, Mohammad ShahverdiKondori, Sajjad Hashembeiki, Mohammadjavad Norouzi, Hoda Mohammadzade

Quality-Aware Collaborative Multi-Positive Contrastive Learning for Sequential Recommendation

The effectiveness of contrastive learning in sequential recommendation hinges on the construction of contrastive views, which ideally should be both semantically consistent and diverse. However, most existing CL-based methods rely on…

Information Retrieval · Computer Science 2026-05-13 Wei Wang

TimesURL: Self-supervised Contrastive Learning for Universal Time Series Representation Learning

Learning universal time series representations applicable to various types of downstream tasks is challenging but valuable in real applications. Recently, researchers have attempted to leverage the success of self-supervised contrastive…

Machine Learning · Computer Science 2023-12-27 Jiexi Liu, Songcan Chen

Self-Supervised Video Representation Learning with Motion-Contrastive Perception

Visual-only self-supervised learning has achieved significant improvement in video representation learning. Existing related methods encourage models to learn video representations by utilizing contrastive learning or designing specific…

Computer Vision and Pattern Recognition · Computer Science 2022-04-12 Jinyu Liu, Ying Cheng, Yuejie Zhang, Rui-Wei Zhao, Rui Feng

Self-supervised and Weakly Supervised Contrastive Learning for Frame-wise Action Representations

Previous work on action representation learning focused on global representations for short video clips. In contrast, many practical applications, such as video alignment, strongly demand learning the intensive representation of long…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Minghao Chen, Renbo Tu, Chenxi Huang, Yuqi Lin, Boxi Wu, Deng Cai

Learning Visual Representations via Language-Guided Sampling

Although an object may appear in numerous contexts, we often describe it in a limited number of ways. Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an…

Computer Vision and Pattern Recognition · Computer Science 2023-03-30 Mohamed El Banani, Karan Desai, Justin Johnson

Extending Temporal Data Augmentation for Video Action Recognition

Pixel-space augmentation has grown in popularity in many deep learning areas due to its effectiveness, simplicity, and low computational cost. Data augmentation for videos, however, remains an under-explored research topic, as most…

Computer Vision and Pattern Recognition · Computer Science 2022-11-10 Artjoms Gorpincenko, Michal Mackiewicz

Long-Short Temporal Contrastive Learning of Video Transformers

Video transformers have recently emerged as a competitive alternative to 3D CNNs for video understanding. However, due to their large number of parameters and reduced inductive biases, these models require supervised pretraining on…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani

Time-Contrastive Networks: Self-Supervised Learning from Video

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings:…

Computer Vision and Pattern Recognition · Computer Science 2018-03-21 Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

We propose a method for generating a temporally remapped video that matches the desired target duration while maximally preserving natural video dynamics. Our approach trains a neural network through self-supervision to recognize and…

Computer Vision and Pattern Recognition · Computer Science 2022-05-12 Simon Jenni, Markus Woodson, Fabian Caba Heilbron

Learning Representations from Audio-Visual Spatial Alignment

We introduce a novel self-supervised pretext task for learning representations from audio-visual content. Prior work on audio-visual representation learning leverages correspondences at the video level. Approaches based on audio-visual…

Computer Vision and Pattern Recognition · Computer Science 2020-11-04 Pedro Morgado, Yi Li, Nuno Vasconcelos