Related papers: Conditional Object-Centric Learning from Video

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Cristian Meo , Akihiro Nakano , Mircea Lică , Aniket Didolkar , Masahiro Suzuki , Anirudh Goyal , Mengmi Zhang , Justin Dauwels , Yutaka Matsuo , Yoshua Bengio

Temporally Consistent Object-Centric Learning by Contrasting Slots

Unsupervised object-centric learning from videos is a promising approach to extract structured representations from large, unlabeled collections of videos. To support downstream tasks like autonomous control, these representations must be…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Anna Manasyan , Maximilian Seitzer , Filip Radovic , Georg Martius , Andrii Zadaianchuk

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Andrii Zadaianchuk , Maximilian Seitzer , Georg Martius

Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction

When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Chengmin Gao , Bin Li

Learning Object-Centric Representations Based on Slots in Real World Scenarios

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Adil Kaan Akan

Compositional Video Synthesis by Temporal Object-Centric Learning

We present a novel framework for compositional video synthesis that leverages temporally consistent object-centric representations, extending our previous work, SlotAdapt, from images to video. While existing object-centric approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Adil Kaan Akan , Yucel Yemez

Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called \emph{slots} or \emph{object files} that compete for local…

Computer Vision and Pattern Recognition · Computer Science 2023-06-06 Ayush Chakravarthy , Trang Nguyen , Anirudh Goyal , Yoshua Bengio , Michael C. Mozer

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art object-centric models have shown…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Aniket Didolkar , Andrii Zadaianchuk , Rabiul Awal , Maximilian Seitzer , Efstratios Gavves , Aishwarya Agrawal

Self-supervised Object-Centric Learning for Videos

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Görkay Aydemir , Weidi Xie , Fatma Güney

Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos

Unsupervised object-centric learning aims to represent the modular, compositional, and causal structure of a scene as a set of object representations and thereby promises to resolve many critical limitations of traditional single-vector…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Gautam Singh , Yi-Fu Wu , Sungjin Ahn

Deep Object-Centric Representations for Generalizable Robot Learning

Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective and generalizable perception. In this paper, we propose a method where general purpose pretrained visual models serve as…

Robotics · Computer Science 2017-09-27 Coline Devin , Pieter Abbeel , Trevor Darrell , Sergey Levine

Improving Object-centric Learning with Query Optimization

The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning. In the recent culmination of unsupervised object-centric learning, the Slot-Attention module has…

Computer Vision and Pattern Recognition · Computer Science 2023-02-13 Baoxiong Jia , Yu Liu , Siyuan Huang

Provably Learning Object-Centric Representations

Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical…

Machine Learning · Computer Science 2023-05-24 Jack Brady , Roland S. Zimmermann , Yash Sharma , Bernhard Schölkopf , Julius von Kügelgen , Wieland Brendel

Object-centric Video Prediction without Annotation

In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learn about these dynamics is through video prediction, as cameras are ubiquitous and powerful sensors. Direct…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Karl Schmeckpeper , Georgios Georgakis , Kostas Daniilidis

Self-Supervision by Prediction for Object Discovery in Videos

Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Beril Besbinar , Pascal Frossard

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung , Jaehoon Yoo , Sungjin Ahn , Seunghoon Hong

Object-Centric Multiple Object Tracking

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT)…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Zixu Zhao , Jiaze Wang , Max Horn , Yizhuo Ding , Tong He , Zechen Bai , Dominik Zietlow , Carl-Johann Simon-Gabriel , Bing Shuai , Zhuowen Tu , Thomas Brox , Bernt Schiele , Yanwei Fu , Francesco Locatello , Zheng Zhang , Tianjun Xiao

Object-Centric Learning with Slot Attention

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

Machine Learning · Computer Science 2020-10-15 Francesco Locatello , Dirk Weissenborn , Thomas Unterthiner , Aravindh Mahendran , Georg Heigold , Jakob Uszkoreit , Alexey Dosovitskiy , Thomas Kipf

Object-centric Video Representation for Long-term Action Anticipation

This paper focuses on building object-centric representations for long-term action anticipation in videos. Our key motivation is that objects provide important cues to recognize and predict human-object interactions, especially when the…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Ce Zhang , Changcheng Fu , Shijie Wang , Nakul Agarwal , Kwonjoon Lee , Chiho Choi , Chen Sun

Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

Robotics · Computer Science 2025-05-28 Nikos Giannakakis , Argyris Manetas , Panagiotis P. Filntisis , Petros Maragos , George Retsinas