Related papers: Unsupervised Video Decomposition using Spatio-temp…

200 papers

The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieve higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either…

Machine Learning · Computer Science 2020-03-17 Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Christoph Feichtenhofer, Haoqi Fan, Bo Xiong, Ross Girshick, Kaiming He

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even…

We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long…

Machine Learning · Computer Science 2016-09-02 Viorica Patraucean, Ankur Handa, Roberto Cipolla

Video prediction is a crucial task for intelligent agents such as robots and autonomous vehicles, since it enables them to anticipate and act early on time-critical incidents. State-of-the-art video prediction methods typically model the…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Eliyas Suleyman, Paul Henderson, Nicolas Pugeault

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial…

Computer Vision and Pattern Recognition · Computer Science 2021-02-01 Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu

We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or…

Machine Learning · Computer Science 2016-01-05 Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and…

Computer Vision and Pattern Recognition · Computer Science 2019-06-18 Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer

When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Chengmin Gao, Bin Li

Intrinsic image decomposition, which is an essential task in computer vision, aims to infer the reflectance and shading of the scene. It is challenging since it needs to separate one image into two components. To tackle this, conventional…

Computer Vision and Pattern Recognition · Computer Science 2020-05-28 Yunfei Liu, Yu Li, Shaodi You, Feng Lu

Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and…

Computer Vision and Pattern Recognition · Computer Science 2020-05-25 Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li

The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects…

Machine Learning · Computer Science 2019-11-21 Eric Crawford, Joelle Pineau

Assigning consistent temporal identifiers to multiple moving objects in a video sequence is a challenging problem. A solution to that problem would have immediate ramifications in multiple object tracking and segmentation problems. We…

Computer Vision and Pattern Recognition · Computer Science 2021-11-08 Abubakar Siddique, Reza Jalil Mozhdehi, Henry Medeiros

This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to the previous methods that focus on future feature prediction, we argue that a supervisory signal arising from…

Computer Vision and Pattern Recognition · Computer Science 2020-11-13 Nadine Behrmann, Juergen Gall, Mehdi Noroozi

Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee

To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Görkay Aydemir, Weidi Xie, Fatma Güney

Visual scenes are extremely rich in diversity, not only because there are infinite combinations of objects and background, but also because the observations of the same scene may vary greatly with the change of viewpoints. When observing a…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Jinyang Yuan, Bin Li, Xiangyang Xue

We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it…

Computer Vision and Pattern Recognition · Computer Science 2021-09-02 Zehua Zhang, David Crandall