Related papers: Text-Aware Diffusion for Policy Learning

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Dichotomous Diffusion Policy Optimization

Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion…

Machine Learning · Computer Science 2026-02-03 Ruiming Liang, Yinan Zheng, Kexin Zheng, Tianyi Tan, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang, Xianyuan Zhan

DADP: Domain Adaptive Diffusion Policy

Learning domain adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture…

Machine Learning · Computer Science 2026-03-31 Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks

As a robot senses and selects actions, the world keeps changing. This inference delay creates a gap of tens to hundreds of milliseconds between the observed state and the state at execution. In this work, we take the natural generalization…

Robotics · Computer Science 2026-03-25 Aileen Liao, Dong-Ki Kim, Max Olan Smith, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

Due to limited resources and public safety concerns, deep reinforcement learning (RL) agents for many cyber-physical systems (e.g., autonomous vehicles) are first trained in simulators. However, when deployed in real world environments,…

Machine Learning · Computer Science 2026-05-28 Gengyue Han, Yiheng Feng

AdaWorldPolicy: World-Model-Driven Diffusion Policy with Online Adaptive Learning for Robotic Manipulation

Effective robotic manipulation requires policies that can anticipate physical outcomes and adapt to real-world environments…

Robotics · Computer Science 2026-02-24 Ge Yuan, Qiyuan Qiao, Jing Zhang, Dong Xu

Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning

Reinforcement Learning (RL) has achieved remarkable success in various domains, yet it often relies on carefully designed programmatic reward functions to guide agent behavior. Designing such reward functions can be challenging and may not…

Machine Learning · Computer Science 2026-04-06 Qi Wang, Mian Wu, Yuyang Zhang, Mingqi Yuan, Wenyao Zhang, Haoxiang You, Yunbo Wang, Xin Jin, Xiaokang Yang, Wenjun Zeng

Beyond Imitation: Reinforcement Learning Fine-Tuning for Adaptive Diffusion Navigation Policies

Diffusion-based robot navigation policies trained on large-scale imitation learning datasets can generate multi-modal trajectories directly from the robot's visual observations, bypassing the traditional localization-mapping-planning…

Robotics · Computer Science 2026-03-16 Junhe Sheng, Ruofei Bai, Kuan Xu, Ruimeng Liu, Jie Chen, Shenghai Yuan, Wei-Yun Yau, Lihua Xie

RoLD: Robot Latent Diffusion for Multi-task Policy Modeling

Modeling generalized robot control policies poses ongoing challenges for language-guided robot manipulation tasks. Existing methods often struggle to efficiently utilize cross-dataset resources or rely on resource-intensive vision-language…

Robotics · Computer Science 2024-11-05 Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu

RLZero: Direct Policy Inference from Language Without In-Domain Supervision

The reward hypothesis states that all goals and purposes can be understood as the maximization of a received scalar reward signal. However, in practice, defining such a reward signal is notoriously difficult, as humans are often unable to…

Artificial Intelligence · Computer Science 2025-11-26 Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum

Learning Transparent Reward Models via Unsupervised Feature Selection

In complex real-world tasks such as robotic manipulation and autonomous driving, collecting expert demonstrations is often more straightforward than specifying precise learning objectives and task descriptions. Learning from expert data can…

Robotics · Computer Science 2025-05-05 Daulet Baimukashev, Gokhan Alcan, Kevin Sebastian Luck, Ville Kyrki

Exploring Conditions for Diffusion models in Robotic Control

While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic as they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Heeseong Shin, Byeongho Heo, Dongyoon Han, Seungryong Kim, Taekyung Kim

Learn Dynamic-Aware State Embedding for Transfer Learning

Transfer reinforcement learning aims to improve the sample efficiency of solving unseen new tasks by leveraging experiences obtained from previous tasks. We consider the setting where all tasks (MDPs) share the same environment dynamic…

Machine Learning · Computer Science 2021-01-08 Kaige Yang

ADPro: a Test-time Adaptive Diffusion Policy via Manifold-constrained Denoising and Task-aware Initialization for Robotic Manipulation

Diffusion policies have recently emerged as a powerful class of visuomotor controllers for robot manipulation, offering stable training and expressive multi-modal action modeling. However, existing approaches typically treat action…

Robotics · Computer Science 2025-10-01 Zezeng Li, Rui Yang, Ruochen Chen, ZhongXuan Luo, Liming Chen

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

It is a long-standing challenge to enable an intelligent agent to learn in one environment and generalize to an unseen environment without further data collection and finetuning. In this paper, we consider a zero shot generalization problem…

Machine Learning · Computer Science 2021-03-16 Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for…

Robotics · Computer Science 2026-05-13 Matthew M. Hong, Jesse Zhang, Anusha Nagabandi, Abhishek Gupta

Interpretable Reinforcement Learning with Multilevel Subgoal Discovery

We propose a novel Reinforcement Learning model for discrete environments, which is inherently interpretable and supports the discovery of deep subgoal hierarchies. In the model, an agent learns information about the environment in the form of…

Artificial Intelligence · Computer Science 2022-02-16 Alexander Demin, Denis Ponomaryov

PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior…

Machine Learning · Computer Science 2020-11-20 Prasoon Goyal, Scott Niekum, Raymond J. Mooney

Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

Dialogue policy optimization often obtains feedback until task completion in task-oriented dialogue systems. This is insufficient for training intermediate dialogue turns since supervision signals (or rewards) are only provided at the end…

Computation and Language · Computer Science 2020-05-12 Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang

PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play

Learning from unstructured and uncurated data has become the dominant paradigm for generative approaches in language and vision. Such unstructured and unguided behavior data, commonly known as play, is also easier to collect in robotics but…

Robotics · Computer Science 2023-12-08 Lili Chen, Shikhar Bahl, Deepak Pathak