Related papers: Improving Object-centric Learning with Query Optim…

Object-Centric Learning with Slot Attention

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

Machine Learning · Computer Science 2020-10-15 Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

ContextFusion and Bootstrap: An Effective Approach to Improve Slot Attention-Based Object-Centric Learning

A key human ability is to decompose a scene into distinct objects and use their relationships to understand the environment. Object-centric learning aims to mimic this process in an unsupervised manner. Recently, the slot attention-based…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Pinzhuo Tian, Shengjie Yang, Hang Yu, Alex C. Kot

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Unsupervised object-centric learning aims to decompose scenes into interpretable object entities, termed slots. Slot-based auto-encoders stand out as a prominent method for this task. Within them, crucial aspects include guiding the encoder…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos, Nikos Komodakis

Learning Global Object-Centric Representations via Disentangled Slot Attention

Humans can discern scene-independent features of objects across various environments, allowing them to swiftly identify objects amidst changing factors such as lighting, perspective, size, and position and imagine the complete images of the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Bin Li, Xiangyang Xue

Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning

Object-centric learning (OCL) aspires to a general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets to apply…

Computer Vision and Pattern Recognition · Computer Science 2023-04-03 Jinwoo Kim, Janghyuk Choi, Ho-Jin Choi, Seon Joo Kim

Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called slots or object files that compete for local…

Computer Vision and Pattern Recognition · Computer Science 2023-06-06 Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal, Yoshua Bengio, Michael C. Mozer

Conditional Object-Centric Learning from Video

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models…

Computer Vision and Pattern Recognition · Computer Science 2022-03-16 Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

Zero-Shot Object-Centric Representation Learning

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Aniket Didolkar, Andrii Zadaianchuk, Anirudh Goyal, Mike Mozer, Yoshua Bengio, Georg Martius, Maximilian Seitzer

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Slot-BERT: Self-supervised Object Discovery in Surgical Video

Object-centric slot attention is a powerful framework for unsupervised learning of structured and explainable representations that can support reasoning about objects and actions, including in surgical videos. While conventional…

Image and Video Processing · Electrical Eng. & Systems 2026-03-04 Guiqiu Liao, Matjaz Jogan, Marcel Hussing, Kenta Nakahashi, Kazuhiro Yasufuku, Amin Madani, Eric Eaton, Daniel A. Hashimoto

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

Reasoning-Enhanced Object-Centric Learning for Videos

Object-centric learning aims to break down complex visual scenes into more manageable object representations, enhancing the understanding and reasoning abilities of machine learning systems toward the physical world. Recently, slot-based…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Jian Li, Pu Ren, Yang Liu, Hao Sun

Simplified priors for Object-Centric Learning

Humans excel at abstracting data and constructing reusable concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-02 Vihang Patil, Andreas Radler, Daniel Klotz, Sepp Hochreiter

Guided Slot Attention for Unsupervised Video Object Segmentation

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the existence of complex backgrounds and multiple foreground objects makes this task challenging. To address this issue, we…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Minhyeok Lee, Suhwan Cho, Dogyoon Lee, Chaewon Park, Jungho Lee, Sangyoun Lee

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped.…

Machine Learning · Computer Science 2024-11-12 Avinash Kori, Francesco Locatello, Ainkaran Santhirasekaram, Francesca Toni, Ben Glocker, Fabio De Sousa Ribeiro

Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos

Unsupervised object-centric learning aims to represent the modular, compositional, and causal structure of a scene as a set of object representations and thereby promises to resolve many critical limitations of traditional single-vector…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Gautam Singh, Yi-Fu Wu, Sungjin Ahn

Attention Normalization Impacts Cardinality Generalization in Slot Attention

Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Markus Krimmel, Jan Achterhold, Joerg Stueckler

Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress…

Computer Vision and Pattern Recognition · Computer Science 2023-07-24 Ondrej Biza, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin F. Elsayed, Aravindh Mahendran, Thomas Kipf