StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Weiyu Liu; Yilun Du; Tucker Hermans; Sonia Chernova; Chris Paxton

Subjects: Robotics; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning

Date: 2023-04-26 (v2)

Abstract

Robots operating in human environments must be able to rearrange objects into semantically meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion improves the success rate of assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.

Keywords

diffusion model, robotic manipulation, 3D scene understanding

Cite

@article{arxiv.2211.04604,
  title  = {StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects},
  author = {Weiyu Liu and Yilun Du and Tucker Hermans and Sonia Chernova and Chris Paxton},
  journal = {arXiv preprint arXiv:2211.04604},
  year   = {2023}
}

Comments

Accepted to Robotics: Science and Systems (RSS) 2023. A previous version appeared at the CoRL Workshop on Language and Robot Learning 2022.
