Related papers: Diffusion Transformer Policy

200 papers

Vision-Language-Action (VLA) models are emerging as a next-generation paradigm for robotics. We introduce dVLA, a diffusion-based VLA that leverages a multimodal chain-of-thought to unify visual perception, language reasoning, and robotic…

Robotics · Computer Science 2025-10-01 Junjie Wen, Minjie Zhu, Jiaming Liu, Zhiyuan Liu, Yicun Yang, Linfeng Zhang, Shanghang Zhang, Yichen Zhu, Yi Xu

Modeling generalized robot control policies poses ongoing challenges for language-guided robot manipulation tasks. Existing methods often struggle to efficiently utilize cross-dataset resources or rely on resource-intensive vision-language…

Robotics · Computer Science 2024-11-05 Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu

Diffusion Policy is a powerful tool for learning end-to-end visuomotor robot control. It is expected that Diffusion Policy possesses scalability, a key attribute for deep neural networks, typically suggesting that increasing model…

Scaling Transformer policies and diffusion models has advanced robotic manipulation, yet combining these techniques in lightweight, cross-embodiment learning settings remains challenging. We study design choices that most affect stability…

Robotics · Computer Science 2025-09-16 Travis Davies, Yiqi Huang, Yunxin Liu, Xiang Chen, Huxian Liu, Luhui Hu

We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two…

Robotics · Computer Science 2026-03-10 Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani

Diffusion policies are powerful visuomotor models for robotic manipulation, yet they often fail to generalize to manipulators or end-effectors unseen during training and struggle to accommodate new task requirements at inference time.…

Current robotic pick-and-place policies typically require consistent gripper configurations across training and inference. This constraint imposes high retraining or fine-tuning costs, especially for imitation learning-based approaches,…

We present a diffusion-based model recipe for real-world control of a highly dexterous humanoid robotic hand, designed for sample-efficient learning and smooth fine-motor action inference. Our system features a newly designed 16-DoF…

Learning transferable latent actions from large-scale object manipulation videos can significantly enhance generalization in downstream robotics tasks, as such representations are agnostic to different robot embodiments. Existing approaches…

Robotics · Computer Science 2025-12-01 Zuolei Li, Xingyu Gao, Xiaofan Wang, Jianlong Fu

Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions auto-regressively in a fixed left-to-right order or attach…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Tian Nian, Liuao Pei, Shunbo Zhou, Xiaokang Yang, Jiangmiao Pang, Yao Mu, Ping Luo

Learning a generalist embodied agent capable of completing multiple tasks poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In contrast, a vast amount of human videos exist, capturing intricate tasks…

Machine Learning · Computer Science 2024-10-10 Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li

A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single…

Robotics · Computer Science 2025-11-04 Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, Hongyang Li

In recent years, roboticists have achieved remarkable progress in solving increasingly general tasks on dexterous robotic hardware by leveraging high-capacity Transformer network architectures and generative diffusion models. Unfortunately,…

Robotics · Computer Science 2024-10-15 Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4…

Robotics · Computer Science 2024-03-15 Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, Shuran Song

Learning visuomotor policy for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of action space: typically, a goal can be accomplished in multiple ways,…

Robotics · Computer Science 2025-03-24 Kun Wu, Yichen Zhu, Jinming Li, Junjie Wen, Ning Liu, Zhiyuan Xu, Jian Tang

Intelligent surgical robots have the potential to revolutionize clinical practice by enabling more precise and automated surgical procedures. However, the automation of such robots for surgical tasks remains under-explored compared to recent…

Robotics · Computer Science 2026-03-10 Chonlam Ho, Jianshu Hu, Lei Song, Hesheng Wang, Qi Dou, Yutong Ban

Diffusion-based models for robotic control, including vision-language-action (VLA) and vision-action (VA) policies, have demonstrated significant capabilities. Yet their advancement is constrained by the high cost of acquiring large-scale…

In this paper, we present DiffusionVLA, a novel framework that seamlessly combines the autoregression model with the diffusion model for learning visuomotor policy. Central to our approach is a next-token prediction objective, enabling the…

Learning-based multi-robot path planning methods struggle to scale or generalize to changes, particularly variations in the number of robots during deployment. Most existing methods are trained on a fixed number of robots and may tolerate a…

Robotics · Computer Science 2026-04-09 Siddharth Singh, Soumee Guha, Qing Chang, Scott Acton

End-to-end learning is emerging as a powerful paradigm for robotic manipulation, but its effectiveness is limited by data scarcity and the heterogeneity of action spaces across robot embodiments. In particular, diverse action spaces across…

Robotics · Computer Science 2026-03-23 Erik Bauer, Elvis Nava, Robert K. Katzschmann