Related papers: CTRL-O: Language-Controllable Object-Centric Visua…

Language-Mediated, Object-Centric Representation Learning

We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object…

Machine Learning · Computer Science 2021-06-09 Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu

Object-Centric Learning with Slot Attention

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

Machine Learning · Computer Science 2020-10-15 Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

Cycle Consistency Driven Object Discovery

Developing deep learning models that effectively learn object-centric representations, akin to human cognition, remains a challenging task. Existing approaches facilitate object discovery by representing objects as fixed-size vectors,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Aniket Didolkar, Anirudh Goyal, Yoshua Bengio

Learning Global Object-Centric Representations via Disentangled Slot Attention

Humans can discern scene-independent features of objects across various environments, allowing them to swiftly identify objects amidst changing factors such as lighting, perspective, size, and position and imagine the complete images of the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Bin Li, Xiangyang Xue

Leveraging Image Augmentation for Object Manipulation: Towards Interpretable Controllability in Object-Centric Learning

The binding problem in artificial neural networks is actively explored with the goal of achieving human-level recognition skills through the comprehension of the world in terms of symbol-like entities. Especially in the field of computer…

Computer Vision and Pattern Recognition · Computer Science 2024-03-05 Jinwoo Kim, Janghyuk Choi, Jaehyun Kang, Changyeon Lee, Ho-Jin Choi, Seon Joo Kim

Are We Done with Object-Centric Learning?

Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Alexander Rubinstein, Ameya Prabhu, Matthias Bethge, Seong Joon Oh

Learning Object-Centric Representations Based on Slots in Real World Scenarios

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Adil Kaan Akan

Efficient Object-centric Representation Learning with Pre-trained Geometric Prior

This paper addresses key challenges in object-centric representation learning of video. While existing approaches struggle with complex scenes, we propose a novel weakly-supervised framework that emphasises geometric understanding and…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Phúc H. Le Khac, Graham Healy, Alan F. Smeaton

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Conditional Object-Centric Learning from Video

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models…

Computer Vision and Pattern Recognition · Computer Science 2022-03-16 Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Is an object-centric representation beneficial for robotic manipulation?

Object-centric representation (OCR) has recently become a subject of interest in the computer vision community for learning a structured representation of images and videos. It has been several times presented as a potential way to improve…

Artificial Intelligence · Computer Science 2025-06-25 Alexandre Chapin, Emmanuel Dellandrea, Liming Chen

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

Object-Centric World Model for Language-Guided Manipulation

A world model is essential for an agent to predict the future and plan in domains such as autonomous driving and robotics. To achieve this, recent advancements have focused on video generation, which has gained significant attention due to…

Artificial Intelligence · Computer Science 2025-03-13 Youngjoon Jeong, Junha Chun, Soonwoo Cha, Taesup Kim

Bootstrapping Top-down Information for Self-modulating Slot Attention

Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning. Traditional OCL methods primarily employ bottom-up…

Computer Vision and Pattern Recognition · Computer Science 2024-11-11 Dongwon Kim, Seoyeon Kim, Suha Kwak

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Cristian Meo, Akihiro Nakano, Mircea Lică, Aniket Didolkar, Masahiro Suzuki, Anirudh Goyal, Mengmi Zhang, Justin Dauwels, Yutaka Matsuo, Yoshua Bengio

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

Object-Centric Pretraining via Target Encoder Bootstrapping

Object-centric representation learning has recently been successfully applied to real-world datasets. This success can be attributed to pretrained non-object-centric foundation models, whose features serve as reconstruction targets for slot…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Nikola Đukić, Tim Lebailly, Tinne Tuytelaars

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped.…

Machine Learning · Computer Science 2024-11-12 Avinash Kori, Francesco Locatello, Ainkaran Santhirasekaram, Francesca Toni, Ben Glocker, Fabio De Sousa Ribeiro

Zero-Shot Object-Centric Representation Learning

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Aniket Didolkar, Andrii Zadaianchuk, Anirudh Goyal, Mike Mozer, Yoshua Bengio, Georg Martius, Maximilian Seitzer

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg