English
Related papers

Related papers: Conditional Object-Centric Learning from Video

200 papers

Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Cristian Meo , Akihiro Nakano , Mircea Lică , Aniket Didolkar , Masahiro Suzuki , Anirudh Goyal , Mengmi Zhang , Justin Dauwels , Yutaka Matsuo , Yoshua Bengio

Unsupervised object-centric learning from videos is a promising approach to extract structured representations from large, unlabeled collections of videos. To support downstream tasks like autonomous control, these representations must be…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Anna Manasyan , Maximilian Seitzer , Filip Radovic , Georg Martius , Andrii Zadaianchuk

Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Andrii Zadaianchuk , Maximilian Seitzer , Georg Martius

When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Chengmin Gao , Bin Li

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Adil Kaan Akan

We present a novel framework for compositional video synthesis that leverages temporally consistent object-centric representations, extending our previous work, SlotAdapt, from images to video. While existing object-centric approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Adil Kaan Akan , Yucel Yemez

The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called \emph{slots} or \emph{object files} that compete for local…

Computer Vision and Pattern Recognition · Computer Science 2023-06-06 Ayush Chakravarthy , Trang Nguyen , Anirudh Goyal , Yoshua Bengio , Michael C. Mozer

Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art object-centric models have shown…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Aniket Didolkar , Andrii Zadaianchuk , Rabiul Awal , Maximilian Seitzer , Efstratios Gavves , Aishwarya Agrawal

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Görkay Aydemir , Weidi Xie , Fatma Güney

Unsupervised object-centric learning aims to represent the modular, compositional, and causal structure of a scene as a set of object representations and thereby promises to resolve many critical limitations of traditional single-vector…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Gautam Singh , Yi-Fu Wu , Sungjin Ahn

Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective and generalizable perception. In this paper, we propose a method where general purpose pretrained visual models serve as…

Robotics · Computer Science 2017-09-27 Coline Devin , Pieter Abbeel , Trevor Darrell , Sergey Levine

The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning. In the recent culmination of unsupervised object-centric learning, the Slot-Attention module has…

Computer Vision and Pattern Recognition · Computer Science 2023-02-13 Baoxiong Jia , Yu Liu , Siyuan Huang

Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical…

Machine Learning · Computer Science 2023-05-24 Jack Brady , Roland S. Zimmermann , Yash Sharma , Bernhard Schölkopf , Julius von Kügelgen , Wieland Brendel

In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learn about these dynamics is through video prediction, as cameras are ubiquitous and powerful sensors. Direct…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Karl Schmeckpeper , Georgios Georgakis , Kostas Daniilidis

Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Beril Besbinar , Pascal Frossard

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung , Jaehoon Yoo , Sungjin Ahn , Seunghoon Hong

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT)…

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

This paper focuses on building object-centric representations for long-term action anticipation in videos. Our key motivation is that objects provide important cues to recognize and predict human-object interactions, especially when the…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Ce Zhang , Changcheng Fu , Shijie Wang , Nakul Agarwal , Kwonjoon Lee , Chiho Choi , Chen Sun

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

‹ Prev 1 2 3 10 Next ›