Related papers: Are Object-Centric Representations Better At Compo…

Provable Compositional Generalization for Object-Centric Learning

Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception. One prominent effort is learning object-centric representations, which are widely…

Machine Learning · Computer Science 2024-11-13 Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel

Object-Centric Representations Improve Policy Generalization in Robot Manipulation

Visual representations are central to the learning and generalization capabilities of robotic manipulation policies. While existing methods rely on global or dense features, such representations often entangle task-relevant and irrelevant…

Robotics · Computer Science 2025-05-20 Alexandre Chapin, Bruno Machado, Emmanuel Dellandrea, Liming Chen

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Object-centric (OC) representations, which model visual scenes as compositions of discrete objects, have the potential to be used in various downstream tasks to achieve systematic compositional generalization and facilitate reasoning.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl Henrik Johansson, Stefan Bauer, Andrea Dittadi

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding

Visual grouping -- operationalized through tasks such as instance segmentation, visual grounding, and object detection -- enables applications ranging from robotic perception to photo editing. These fundamental problems in computer vision…

Computer Vision and Pattern Recognition · Computer Science 2026-01-28 Weikai Huang, Jieyu Zhang, Taoyang Jia, Chenhao Zheng, Ziqi Gao, Jae Sung Park, Winson Han, Ranjay Krishna

Systematic Visual Reasoning through Object-Centric Relational Abstraction

Human visual reasoning is characterized by an ability to identify abstract patterns from only a small number of examples, and to systematically generalize those patterns to novel inputs. This capacity depends in large part on our ability to…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Taylor W. Webb, Shanka Subhra Mondal, Jonathan D. Cohen

Provably Learning Object-Centric Representations

Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical…

Machine Learning · Computer Science 2023-05-24 Jack Brady, Roland S. Zimmermann, Yash Sharma, Bernhard Schölkopf, Julius von Kügelgen, Wieland Brendel

Is an object-centric representation beneficial for robotic manipulation?

Object-centric representation (OCR) has recently become a subject of interest in the computer vision community for learning a structured representation of images and videos. It has been several times presented as a potential way to improve…

Artificial Intelligence · Computer Science 2025-06-25 Alexandre Chapin, Emmanuel Dellandrea, Liming Chen

Does Data Scaling Lead to Visual Compositional Generalization?

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will…

Machine Learning · Computer Science 2025-07-10 Arnas Uselis, Andrea Dittadi, Seong Joon Oh

Are We Done with Object-Centric Learning?

Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Alexander Rubinstein, Ameya Prabhu, Matthias Bethge, Seong Joon Oh

Compositional Scene Modeling with Global Object-Centric Representations

The appearance of the same object may vary in different scene images due to perspectives and occlusions between objects. Humans can easily identify the same object, even if occlusions exist, by completing the occluded parts based on its…

Computer Vision and Pattern Recognition · Computer Science 2022-11-28 Tonglin Chen, Bin Li, Zhimeng Shen, Xiangyang Xue

Interpreting the structure of multi-object representations in vision encoders

In this work, we interpret the representations of multi-object scenes in vision encoders through the lens of structured representations. Structured representations allow modeling of individual objects distinctly and their flexible use based…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Tarun Khajuria, Braian Olmiro Dias, Marharyta Domnich, Jaan Aru

VinVL: Revisiting Visual Representations in Vision-Language Models

This paper presents a detailed study of improving visual representations for vision language (VL) tasks and develops an improved object detection model to provide object-centric representations of images. Compared to the most widely used…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

Spotlighting Task-Relevant Features: Object-Centric Representations for Better Generalization in Robotic Manipulation

The generalization capabilities of robotic manipulation policies are heavily influenced by the choice of visual representations. Existing approaches typically rely on representations extracted from pre-trained encoders, using two dominant…

Robotics · Computer Science 2026-01-30 Alexandre Chapin, Bruno Machado, Emmanuel Dellandréa, Liming Chen

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of…

Computer Vision and Pattern Recognition · Computer Science 2024-02-21 Guillermo Puebla, Jeffrey S. Bowers

Evaluating Object-Centric Models beyond Object Discovery

Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models are often not evaluated regarding these goals.…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Self-supervised Visual Reinforcement Learning with Object-centric Representations

Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky…

Machine Learning · Computer Science 2021-02-09 Andrii Zadaianchuk, Maximilian Seitzer, Georg Martius

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Arnas Uselis, Andrea Dittadi, Seong Joon Oh

Successes and Limitations of Object-centric Models at Compositional Generalisation

In recent years, it has been shown empirically that standard disentangled latent variable models do not support robust compositional learning in the visual domain. Indeed, in spite of being designed with the goal of factorising datasets…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Milton L. Montero, Jeffrey S. Bowers, Gaurav Malhotra

Compositional Concept Generalization with Variational Quantum Circuits

Compositional generalization is a key facet of human cognition, but lacking in current AI tools such as vision-language models. Previous work examined whether a compositional tensor-based sentence semantics can overcome the challenge, but…

Artificial Intelligence · Computer Science 2025-09-12 Hala Hawashin, Mina Abbaszadeh, Nicholas Joseph, Beth Pearson, Martha Lewis, Mehrnoosh Sadrzadeh