Related papers: SlotDiffusion: Object-Centric Generative Modeling …

Object-Centric Slot Diffusion

The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in…

Computer Vision and Pattern Recognition · Computer Science 2023-11-06 Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Learning Object-Centric Representations Based on Slots in Real World Scenarios

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Adil Kaan Akan

Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Adil Kaan Akan, Yucel Yemez

Object-Centric Learning with Slot Attention

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not…

Machine Learning · Computer Science 2020-10-15 Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

The ability to distill object-centric abstractions from intricate visual scenes underpins human-level generalization. Despite the significant progress in object-centric learning methods, learning object-centric representations in the 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Yu Liu, Baoxiong Jia, Yixin Chen, Siyuan Huang

Reasoning-Enhanced Object-Centric Learning for Videos

Object-centric learning aims to break down complex visual scenes into more manageable object representations, enhancing the understanding and reasoning abilities of machine learning systems toward the physical world. Recently, slot-based…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Jian Li, Pu Ren, Yang Liu, Hao Sun

Object-Centric World Model for Language-Guided Manipulation

A world model is essential for an agent to predict the future and plan in domains such as autonomous driving and robotics. To achieve this, recent advancements have focused on video generation, which has gained significant attention due to…

Artificial Intelligence · Computer Science 2025-03-13 Youngjoon Jeong, Junha Chun, Soonwoo Cha, Taesup Kim

Grounded Object Centric Learning

The extraction of modular object-centric representations for downstream tasks is an emerging area of research. Learning grounded representations of objects that are guaranteed to be stable and invariant promises robust performance across…

Machine Learning · Computer Science 2024-01-26 Avinash Kori, Francesco Locatello, Fabio De Sousa Ribeiro, Francesca Toni, Ben Glocker

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

Detecting objects seamlessly blended into their surroundings represents a complex task for both human cognitive capabilities and advanced artificial intelligence algorithms. Currently, the majority of methodologies for detecting camouflaged…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Zicheng Jiao, Hong Cheng

Debiasing Diffusion Model: Enhancing Fairness through Latent Representation Learning in Stable Diffusion Model

Image generative models, particularly diffusion-based models, have surged in popularity due to their remarkable ability to synthesize highly realistic images. However, since these models are data-driven, they inherit biases from the…

Machine Learning · Computer Science 2025-03-18 Lin-Chun Huang, Ching Chieh Tsao, Fang-Yi Su, Jung-Hsien Chiang

Object-Centric Diffusion for Efficient Video Editing

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

Learning Global Object-Centric Representations via Disentangled Slot Attention

Humans can discern scene-independent features of objects across various environments, allowing them to swiftly identify objects amidst changing factors such as lighting, perspective, size, and position and imagine the complete images of the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Bin Li, Xiangyang Xue

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art object-centric models have shown…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Aniket Didolkar, Andrii Zadaianchuk, Rabiul Awal, Maximilian Seitzer, Efstratios Gavves, Aishwarya Agrawal

KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models

Understanding and representing the structure of 3D objects in an unsupervised manner remains a core challenge in computer vision and graphics. Most existing unsupervised keypoint methods are not designed for unconditional generative…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Rhys Newbury, Juyan Zhang, Tin Tran, Hanna Kurniawati, Dana Kulić

Cycle Consistency Driven Object Discovery

Developing deep learning models that effectively learn object-centric representations, akin to human cognition, remains a challenging task. Existing approaches facilitate object discovery by representing objects as fixed-size vectors,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Aniket Didolkar, Anirudh Goyal, Yoshua Bengio

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma