Related papers: StructDiffusion: Language-Guided Creation of Physi…

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange…

Robotics · Computer Science 2021-10-22 Weiyu Liu, Chris Paxton, Tucker Hermans, Dieter Fox

ReorientDiff: Diffusion Model based Reorientation for Object Manipulation

The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications. While certain goals can be achieved by picking and placing the objects of interest directly,…

Robotics · Computer Science 2023-09-18 Utkarsh A. Mishra, Yongxin Chen

InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture

Recent advancements in image synthesis are fueled by the advent of large-scale diffusion models. Yet, integrating realistic object visualizations seamlessly into new or existing backgrounds without extensive training remains a challenge.…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Phillip Mueller, Jannik Wiese, Ioan Craciun, Lars Mikelsons

Mixed Diffusion for 3D Indoor Scene Synthesis

Generating realistic 3D scenes is an area of growing interest in computer vision and robotics. However, creating high-quality, diverse synthetic 3D content often requires expert intervention, making it costly and complex. Recently, efforts…

Computer Vision and Pattern Recognition · Computer Science 2024-12-11 Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari

PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models

This paper presents PolyDiffuse, a novel structured reconstruction algorithm that transforms visual sensor data into polygonal shapes with Diffusion Models (DM), an emerging machinery amid exploding generative AI, while formulating…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Jiacheng Chen, Ruizhi Deng, Yasutaka Furukawa

SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy

Manipulating deformable objects remains a challenge within robotics due to the difficulties of state estimation, long-horizon planning, and predicting how the object will deform given an interaction. These challenges are the most pronounced…

Robotics · Computer Science 2024-03-18 Alison Bartsch, Arvind Car, Charlotte Avra, Amir Barati Farimani

SPREAD: Spatial-Physical REasoning via geometry Aware Diffusion

Automated 3D scene generation is pivotal for applications spanning virtual reality, digital content creation, and Embodied AI. While computer graphics prioritizes aesthetic layouts, vision and robotics demand scenes that mirror real-world…

Graphics · Computer Science 2026-03-31 Minzhang Li, Kuixiang Shao, Xuebing Li, Yuyang Jiao, Yinuo Bai, Hengan Zhou, Sixian Shen, Jiayuan Gu, Jingyi Yu

Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Robotic Manipulation

Learning from demonstrations faces challenges in generalizing beyond the training data and often lacks collision awareness. This paper introduces Lan-o3dp, a language-guided object-centric diffusion policy framework that can adapt to unseen…

Robotics · Computer Science 2025-03-18 Hang Li, Qian Feng, Zhi Zheng, Jianxiang Feng, Zhaopeng Chen, Alois Knoll

StackGen: Generating Stable Structures from Silhouettes via Diffusion

Humans naturally obtain intuition about the interactions between and the stability of rigid objects by observing and interacting with the world. It is this intuition that governs the way in which we regularly configure objects in our…

Robotics · Computer Science 2025-03-20 Luzhe Sun, Takuma Yoneda, Samuel W. Wheeler, Tianchong Jiang, Matthew R. Walter

Generating Stable Placements via Physics-guided Diffusion Models

Stably placing an object in a multi-object scene is a fundamental challenge in robotic manipulation, as placements must be penetration-free, establish precise surface contact, and result in a force equilibrium. To assess stability, existing…

Robotics · Computer Science 2025-09-29 Philippe Nadeau, Miguel Rogel, Ivan Bilić, Ivan Petrović, Jonathan Kelly

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple…

Robotics · Computer Science 2024-03-29 Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

Non-rigid Relative Placement through 3D Dense Diffusion

The task of "relative placement" is to predict the placement of one object in relation to another, e.g. placing a mug onto a mug rack. Through explicit object-centric geometric reasoning, recent methods for relative placement have made…

Robotics · Computer Science 2024-10-30 Eric Cai, Octavian Donca, Ben Eisner, David Held

EraseDraw: Learning to Draw Step-by-Step via Erasing Objects from Images

Creative processes such as painting often involve creating different components of an image one by one. Can we build a computational model to perform this task? Prior works often fail by making global changes to the image, inserting objects…

Computer Vision and Pattern Recognition · Computer Science 2024-12-25 Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick

PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play

Learning from unstructured and uncurated data has become the dominant paradigm for generative approaches in language and vision. Such unstructured and unguided behavior data, commonly known as play, is also easier to collect in robotics but…

Robotics · Computer Science 2023-12-08 Lili Chen, Shikhar Bahl, Deepak Pathak

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge and pre-define the output space (e.g., categories and…

Computer Vision and Pattern Recognition · Computer Science 2023-09-08 Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg

LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor

Object rearrangement, a fundamental challenge in robotics, demands versatile strategies to handle diverse objects, configurations, and functional needs. To achieve this, the AI robot needs to learn functional rearrangement priors in order…

Robotics · Computer Science 2024-03-11 Yiming Zeng, Mingdong Wu, Long Yang, Jiyao Zhang, Hao Ding, Hui Cheng, Hao Dong

SceneFoundry: Generating Interactive Infinite 3D Worlds

The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-19 ChunTeng Chen, YiChen Hsu, YiWen Liu, WeiFang Sun, TsaiChing Ni, ChunYi Lee, Min Sun, YuanFu Yang

Move Anything with Layered Scene Diffusion

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul

EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation

Acting in human environments is a crucial capability for general-purpose robots, necessitating a robust understanding of natural language and its application to physical tasks. This paper seeks to harness the capabilities of diffusion…

Robotics · Computer Science 2026-04-28 Jonas Bode, Raphael Memmesheimer, Sven Behnke