Related papers: Grounded Object Centric Learning

Object-Centric Learning with Slot Attention

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

Machine Learning · Computer Science · 2020-10-15 · Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

Object-Centric Learning with Slot Mixture Module

Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the…

Machine Learning · Computer Science · 2024-12-30 · Daniil Kirilenko, Vitaliy Vorobyov, Alexey K. Kovalev, Aleksandr I. Panov

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets,…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Learning Global Object-Centric Representations via Disentangled Slot Attention

Humans can discern scene-independent features of objects across various environments, allowing them to swiftly identify objects amidst changing factors such as lighting, perspective, size, and position and imagine the complete images of the…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-05 · Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Bin Li, Xiangyang Xue

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

Cycle Consistency Driven Object Discovery

Developing deep learning models that effectively learn object-centric representations, akin to human cognition, remains a challenging task. Existing approaches facilitate object discovery by representing objects as fixed-size vectors,…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-11 · Aniket Didolkar, Anirudh Goyal, Yoshua Bengio

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped.…

Machine Learning · Computer Science · 2024-11-12 · Avinash Kori, Francesco Locatello, Ainkaran Santhirasekaram, Francesca Toni, Ben Glocker, Fabio De Sousa Ribeiro

Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called slots or object files that compete for local…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-06 · Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal, Yoshua Bengio, Michael C. Mozer

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-22 · Cristian Meo, Akihiro Nakano, Mircea Lică, Aniket Didolkar, Masahiro Suzuki, Anirudh Goyal, Mengmi Zhang, Justin Dauwels, Yutaka Matsuo, Yoshua Bengio

Learning Object-Centric Representations Based on Slots in Real World Scenarios

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Adil Kaan Akan

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-25 · Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg

Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-04 · Adil Kaan Akan, Yucel Yemez

Object-centric architectures enable efficient causal representation learning

Causal representation learning has shown a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that…

Machine Learning · Computer Science · 2023-10-31 · Amin Mansouri, Jason Hartford, Yan Zhang, Yoshua Bengio

Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-24 · Ondrej Biza, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin F. Elsayed, Aravindh Mahendran, Thomas Kipf

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-09 · Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-20 · Bac Nguyen, Yuhta Takida, Naoki Murata, Chieh-Hsin Lai, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

Sensitivity of Slot-Based Object-Centric Models to their Number of Slots

Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-31 · Roland S. Zimmermann, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Thomas Kipf, Klaus Greff

CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection

Source-Free Domain Adaptive Object Detection (SF-DAOD) aims to adapt a detector trained on a labeled source domain to an unlabeled target domain without retaining any source data. Despite recent progress, most popular approaches focus on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Boyang Dai, Zeng Fan, Zihao Qi, Meng Lou, Yizhou Yu

Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence

The de facto approach in video object-centric learning maintains temporal consistency through learned dynamics modules that predict future object representations, called slots. We demonstrate that these predictors function as expensive…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Zhiyuan Li, Rongzhen Zhao, Wenyan Yang, Wenshuai Zhao, Pekka Marttinen, Joni Pajarinen

ContextFusion and Bootstrap: An Effective Approach to Improve Slot Attention-Based Object-Centric Learning

A key human ability is to decompose a scene into distinct objects and use their relationships to understand the environment. Object-centric learning aims to mimic this process in an unsupervised manner. Recently, the slot attention-based…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Pinzhuo Tian, Shengjie Yang, Hang Yu, Alex C. Kot