Related papers: Motion Before Action: Diffusing Object Motion as M…

200 papers

Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior…

Robotics · Computer Science · 2024-03-27 · Joao Carvalho, An T. Le, Mark Baierl, Dorothea Koert, Jan Peters

When performing tasks like laundry, humans naturally coordinate both hands to manipulate objects and anticipate how their actions will change the state of the clothes. However, achieving such coordination in robotics remains challenging due…

Robotics · Computer Science · 2025-04-01 · Haonan Chen, Jiaming Xu, Lily Sheng, Tianchen Ji, Shuijing Liu, Yunzhu Li, Katherine Driggs-Campbell

Forecasting a typical object's future motion is a critical task for interpreting and interacting with dynamic environments in computer vision. Event-based sensors, which could capture changes in the scene with exceptional temporal…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-14 · Song Wu, Zhiyu Zhu, Junhui Hou, Guangming Shi, Jinjian Wu

We present a method to generate video-action pairs that follow text instructions, starting from an initial image observation and the robot's joint states. Our approach automatically provides action labels for video diffusion models,…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-19 · Liudi Yang, Yang Bai, George Eskandar, Fengyi Shen, Mohammad Altillawi, Dong Chen, Ziyuan Liu, Abhinav Valada

Imitation can allow us to quickly gain an understanding of a new task. Through a demonstration, we can gain direct knowledge about which actions need to be performed and which goals they have. In this paper, we introduce a new approach to…

Robotics · Computer Science · 2024-06-04 · Josua Spisak, Matthias Kerzel, Stefan Wermter

Bimanual manipulation is crucial in robotics, enabling complex tasks in industrial automation and household services. However, it poses significant challenges due to the high-dimensional action space and intricate coordination requirements.…

Video generation models offer a promising imagination mechanism for robot manipulation by predicting long-horizon future observations, but effectively exploiting these imagined futures for action execution remains challenging. Existing…

Robotics · Computer Science · 2026-05-13 · Yajie Li, Bozhou Zhang, Chun Gu, Zipei Ma, Jiahui Zhang, Jiankang Deng, Xiatian Zhu, Li Zhang

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4…

Robotics · Computer Science · 2024-03-15 · Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, Shuran Song

Vision-Language-Action (VLA) models have achieved remarkable progress in robotic manipulation by mapping multimodal observations and instructions directly to actions. However, they typically mimic expert trajectories without predictive…

The performance of optimization-based robot motion planning algorithms is highly dependent on the initial solutions, commonly obtained by running a sampling-based planner to obtain a collision-free path. However, these methods can be slow…

Robotics · Computer Science · 2025-08-15 · J. Carvalho, A. Le, P. Kicki, D. Koert, J. Peters

We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two…

Robotics · Computer Science · 2026-03-10 · Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani

Diffusion models have recently gained significant attention in robotics due to their ability to generate multi-modal distributions of system states and behaviors. However, a key challenge remains: ensuring precise control over the generated…

Robotics · Computer Science · 2025-10-01 · Luobin Wang, Hongzhan Yu, Chenning Yu, Sicun Gao, Henrik Christensen

This work highlights that video world modeling, alongside vision-language pre-training, establishes a fresh and independent foundation for robot learning. Intuitively, video world models provide the ability to imagine the near future by…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Lin Li, Qihang Zhang, Yiming Luo, Shuai Yang, Ruilin Wang, Fei Han, Mingrui Yu, Zelin Gao, Nan Xue, Xing Zhu, Yujun Shen, Yinghao Xu

This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) that integrates and synchronizes skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Leo Bringer, Joey Wilson, Kira Barton, Maani Ghaffari

Diffusion models have recently been successfully applied to a wide range of robotics applications for learning complex multi-modal behaviors from data. However, prior works have mostly been confined to single-robot and small-scale…

Robotics · Computer Science · 2025-05-08 · Yorai Shaoul, Itamar Mishani, Shivam Vats, Jiaoyang Li, Maxim Likhachev

Learning human motion based on a time-dependent input signal presents a challenging yet impactful task with various applications. The goal of this task is to generate or estimate human movement that consistently reflects the temporal…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-15 · Quang Nguyen, Tri Le, Baoru Huang, Minh Nhat Vu, Ngan Le, Thieu Vo, Anh Nguyen

Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-29 · Jiaman Li, Jiajun Wu, C. Karen Liu

We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples…

Robotics · Computer Science · 2025-05-15 · Cheng-Chun Hsu, Bowen Wen, Jie Xu, Yashraj Narang, Xiaolong Wang, Yuke Zhu, Joydeep Biswas, Stan Birchfield

While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic as they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-09 · Heeseong Shin, Byeongho Heo, Dongyoon Han, Seungryong Kim, Taekyung Kim

Modeling generalized robot control policies poses ongoing challenges for language-guided robot manipulation tasks. Existing methods often struggle to efficiently utilize cross-dataset resources or rely on resource-intensive vision-language…

Robotics · Computer Science · 2024-11-05 · Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu