Related papers: Unsupervised Keypoint Learning for Guiding Class-C…

Motion and Context-Aware Audio-Visual Conditioned Video Prediction

The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, a direct…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-21 · Yating Xu, Conghui Hu, Gim Hee Lee

Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-27 · Chengmin Gao, Bin Li

Time-Contrastive Networks: Self-Supervised Learning from Video

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings:…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-21 · Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

Object Recognition as Next Token Prediction

We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels. To ground this prediction process in…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim

Prediction-Tracking-Segmentation

We introduce a prediction driven method for visual tracking and segmentation in videos. Instead of solely relying on matching with appearance cues for tracking, we build a predictive model which guides finding more accurate tracking regions…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-09 · Jianren Wang, Yihui He, Xiaobo Wang, Xinjia Yu, Xia Chen

Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision

Distinguishing visually similar objects by their motion remains a critical challenge in computer vision. Although supervised trackers show promise, contemporary self-supervised trackers struggle when visual cues become ambiguous, limiting…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-03 · Chenshuang Zhang, Kang Zhang, Joon Son Chung, In So Kweon, Junmo Kim, Chengzhi Mao

Self-supervisory Signals for Object Discovery and Detection

In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data. In this paper we explore the use of self-supervision provided by a robot traversing an environment to…

Computer Vision and Pattern Recognition · Computer Science · 2018-06-12 · Etienne Pot, Alexander Toshev, Jana Kosecka

Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-16 · Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova

Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection

Self-supervision can dramatically cut back the amount of manually-labelled data required to train deep neural networks. While self-supervision has usually been considered for tasks such as image classification, in this paper we aim at…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-06 · David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi

Object landmark discovery through unsupervised adaptation

This paper proposes a method to ease the unsupervised learning of object landmark detectors. Similarly to previous methods, our approach is fully unsupervised in a sense that it does not require or make any use of annotated landmarks for…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-22 · Enrique Sanchez, Georgios Tzimiropoulos

Enhancing Cell Tracking with a Time-Symmetric Deep Learning Approach

The accurate tracking of live cells using video microscopy recordings remains a challenging task for popular state-of-the-art image processing based object tracking methods. In recent years, several existing and new applications have…

Image and Video Processing · Electrical Eng. & Systems · 2025-02-03 · Gergely Szabó, Paolo Bonaiuti, Andrea Ciliberto, András Horváth

Learning to See by Moving

The dominant paradigm for feature learning in computer vision relies on training neural networks for the task of object recognition using millions of hand labelled images. Is it possible to learn useful features for a diverse set of visual…

Computer Vision and Pattern Recognition · Computer Science · 2015-09-15 · Pulkit Agrawal, Joao Carreira, Jitendra Malik

Visual Pre-Training on Unlabeled Images using Reinforcement Learning

In reinforcement learning (RL), value-based algorithms learn to associate each observation with the states and rewards that are likely to be reached from it. We observe that many self-supervised image pre-training methods bear similarity to…

Machine Learning · Computer Science · 2025-06-16 · Dibya Ghosh, Sergey Levine

Motion Segmentation using Frequency Domain Transformer Networks

Self-supervised prediction is a powerful mechanism to learn representations that capture the underlying structure of the data. Despite recent progress, the self-supervised video prediction task is still challenging. One of the critical…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-21 · Hafez Farazi, Sven Behnke

Unsupervised Learning of Video Representations via Dense Trajectory Clustering

This paper addresses the task of unsupervised learning of representations for action recognition in videos. Previous works proposed to utilize future prediction, or other domain-specific objectives to train a network, but achieved only…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-30 · Pavel Tokmakov, Martial Hebert, Cordelia Schmid

Unsupervised Part Discovery by Unsupervised Disentanglement

We address the problem of discovering part segmentations of articulated objects without supervision. In contrast to keypoints, part segmentations provide information about part localizations on the level of individual pixels. Capturing both…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-11 · Sandro Braun, Patrick Esser, Björn Ommer

Decoupled Appearance and Motion Learning for Efficient Anomaly Detection in Surveillance Video

Automating the analysis of surveillance video footage is of great interest when urban environments or industrial sites are monitored by a large number of cameras. As anomalies are often context-specific, it is hard to predefine events of…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-13 · Bo Li, Sam Leroux, Pieter Simoens

Deep Keyframe Detection in Human Action Videos

Detecting representative frames in videos based on human actions is quite challenging because of the combined factors of human pose in action and the background. This paper addresses this problem and formulates the key frame detection as…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-27 · Xiang Yan, Syed Zulqarnain Gilani, Hanlin Qin, Mingtao Feng, Liang Zhang, Ajmal Mian

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. We propose a…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-29 · Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann

Unsupervised Part-Based Disentangling of Object Shape and Appearance

Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-18 · Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer