Related papers: DNAct: Diffusion Guided Multi-Task 3D Policy Learn…

RoLD: Robot Latent Diffusion for Multi-task Policy Modeling

Modeling generalized robot control policies poses ongoing challenges for language-guided robot manipulation tasks. Existing methods often struggle to efficiently utilize cross-dataset resources or rely on resource-intensive vision-language…

Robotics · Computer Science 2024-11-05 Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu

Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals

This work introduces the Multimodal Diffusion Transformer (MDT), a novel diffusion policy framework, that excels at learning versatile behavior from multimodal goal specifications with few language annotations. MDT leverages a…

Robotics · Computer Science 2024-07-09 Moritz Reuss, Ömer Erdinç Yağmurlu, Fabian Wenzel, Rudolf Lioutikov

A Novel Task-Driven Diffusion-Based Policy with Affordance Learning for Generalizable Manipulation of Articulated Objects

Despite recent advances in dexterous manipulations, the manipulation of articulated objects and generalization across different categories remain significant challenges. To address these issues, we introduce DART, a novel framework that…

Robotics · Computer Science 2025-09-19 Hao Zhang, Zhen Kan, Weiwei Shang, Yongduan Song

Medical Semantic Segmentation with Diffusion Pretrain

Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks, including medical image segmentation. In particular, both transformer and…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 David Li, Anvar Kurmukov, Mikhail Goncharov, Roman Sokolov, Mikhail Belyaev

Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Previously, non-autoregressive models were widely perceived as being superior in generation efficiency but inferior in generation quality due to the difficulties of modeling multiple target modalities. To enhance the multi-modality modeling…

Computation and Language · Computer Science 2023-11-30 Lihua Qian, Mingxuan Wang, Yang Liu, Hao Zhou

EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation

Acting in human environments is a crucial capability for general-purpose robots, necessitating a robust understanding of natural language and its application to physical tasks. This paper seeks to harness the capabilities of diffusion…

Robotics · Computer Science 2026-04-28 Jonas Bode, Raphael Memmesheimer, Sven Behnke

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

Recently, there has been an increased interest in the practical problem of learning multiple dense scene understanding tasks from partially annotated data, where each training sample is only labeled for a subset of the tasks. The missing of…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Hanrong Ye, Dan Xu

TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, Youjian Zhao, Ziwei Liu

In-Context Learning Unlocked for Diffusion Models

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Robust perception and dynamics modeling are fundamental to real-world robotic policy learning. Recent methods employ video diffusion models (VDMs) to enhance robotic policies, improving their understanding and modeling of the physical…

Robotics · Computer Science 2026-03-25 Yueru Jia, Jiaming Liu, Shengbang Liu, Rui Zhou, Wanhe Yu, Yuyang Yan, Xiaowei Chi, Yandong Guo, Boxin Shi, Shanghang Zhang

DADP: Domain Adaptive Diffusion Policy

Learning domain adaptive policies that can generalize to unseen transition dynamics, remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture…

Machine Learning · Computer Science 2026-03-31 Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

DRDT3: Diffusion-Refined Decision Test-Time Training Model

Decision Transformer (DT), a trajectory modelling method, has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal…

Machine Learning · Computer Science 2025-09-18 Xingshuai Huang, Di Wu, Benoit Boulet

Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control

Recent research has highlighted the powerful capabilities of imitation learning in robotics. Leveraging generative models, particularly diffusion models, these approaches offer notable advantages such as strong multi-task generalization,…

Robotics · Computer Science 2025-09-15 Xinyao Qin, Xiaoteng Ma, Yang Qi, Qihan Liu, Chuanyi Xue, Ning Gui, Qinyu Dong, Jun Yang, Bin Liang

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Learning visuomotor policy for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of action space: typically, a goal can be accomplished in multiple ways,…

Robotics · Computer Science 2025-03-24 Kun Wu, Yichen Zhu, Jinming Li, Junjie Wen, Ning Liu, Zhiyuan Xu, Jian Tang

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Yiming Xu, Hao Cheng, Monika Sester

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA…

Genomics · Quantitative Biology 2023-09-01 Daoan Zhang, Weitong Zhang, Yu Zhao, Jianguo Zhang, Bing He, Chenchen Qin, Jianhua Yao

DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis

Coherent X-ray scattering techniques are critical for investigating the fundamental structural properties of materials at the nanoscale. While advancements have made these experiments more accessible, real-time analysis remains a…

Machine Learning · Computer Science 2025-07-21 Aileen Luo, Tao Zhou, Ming Du, Martin V. Holt, Andrej Singer, Mathew J. Cherukara

Diffusion Models For Multi-Modal Generative Modeling

Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Dynamic Rank Adjustment in Diffusion Policies for Efficient and Flexible Training

Diffusion policies trained via offline behavioral cloning have recently gained traction in robotic motion generation. While effective, these policies typically require a large number of trainable parameters. This model size affords powerful…

Robotics · Computer Science 2025-04-29 Xiatao Sun, Shuo Yang, Yinxing Chen, Francis Fan, Yiyan Liang, Daniel Rakita

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

Diffusion policies are conditional diffusion models that learn robot action distributions conditioned on the robot and environment state. They have recently shown to outperform both deterministic and alternative action distribution learning…

Robotics · Computer Science 2024-07-26 Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki