Related papers: Value function estimation using conditional diffus…

Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings

Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many…

Robotics · Computer Science 2021-07-29 Sreehari Rammohan, Shangqun Yu, Bowen He, Eric Hsiung, Eric Rosen, Stefanie Tellex, George Konidaris

Model predictive control-based value estimation for efficient reinforcement learning

Reinforcement learning suffers from limitations in real practices primarily due to the number of required interactions with virtual environments. It results in a challenging problem because we are implausible to obtain a local optimal…

Machine Learning · Computer Science 2024-10-28 Qizhen Wu, Kexin Liu, Lei Chen

Non-differentiable Reward Optimization for Diffusion-based Autonomous Motion Planning

Safe and effective motion planning is crucial for autonomous robots. Diffusion models excel at capturing complex agent interactions, a fundamental aspect of decision-making in dynamic environments. Recent studies have successfully applied…

Robotics · Computer Science 2025-07-18 Giwon Lee, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

Recurrent Value Functions

Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state of the art methods perform poorly…

Machine Learning · Computer Science 2019-05-24 Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau

Deep Radial-Basis Value Functions for Continuous Control

A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We…

Machine Learning · Computer Science 2021-03-16 Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

Multimodal Diffusion Forcing for Forceful Manipulation

Given a dataset of expert trajectories, standard imitation learning approaches typically learn a direct mapping from observations (e.g., RGB images) to actions. However, such methods often overlook the rich interplay between different…

Robotics · Computer Science 2026-04-14 Zixuan Huang, Huaidian Hou, Dmitry Berenson

VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL

Diffusion models have emerged as powerful generative tools across various domains, yet tailoring pre-trained models to exhibit specific desirable properties remains challenging. While reinforcement learning (RL) offers a promising…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Fengyuan Dai, Zifeng Zhuang, Yufei Huang, Siteng Huang, Bangyan Liao, Donglin Wang, Fajie Yuan

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a…

Machine Learning · Computer Science 2024-10-17 Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng

Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning

Video diffusion alignment has been heavily relied on scalar rewards. These rewards are typically derived from learned reward models in human preference datasets, requiring additional training and extensive collection. Moreover, scalar…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Yifan Wang, Yanyu Li, Gordon Guocheng Qian, Sergey Tulyakov, Yun Fu, Anil Kag

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from…

Machine Learning · Computer Science 2024-08-12 Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu

Normality-Guided Distributional Reinforcement Learning for Continuous Control

Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value…

Machine Learning · Computer Science 2025-07-08 Ju-Seung Byun, Andrew Perrault

Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning

Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision processes (GMDPs) capture such settings via an influence…

Machine Learning · Computer Science 2026-01-19 Ahmed Rashwan, Keith Briggs, Chris Budd, Lisa Kreusser

Extracting Reward Functions from Diffusion Models

Diffusion models have achieved remarkable results in image generation, and have similarly been used to learn high-performing policies in sequential decision-making tasks. Decision-making diffusion models can be trained on lower-quality…

Machine Learning · Computer Science 2023-12-12 Felipe Nuti, Tim Franzmeyer, João F. Henriques

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4…

Robotics · Computer Science 2024-03-15 Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, Shuran Song

Diffusion Predictive Control with Constraints

Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their…

Robotics · Computer Science 2025-05-28 Ralf Römer, Alexander von Rohr, Angela P. Schoellig

CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

Diffusion Policy (DP) enables robots to learn complex behaviors by imitating expert demonstrations through action diffusion. However, in practical applications, hardware limitations often degrade data quality, while real-time constraints…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Jiahua Ma, Yiran Qin, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while…

Machine Learning · Computer Science 2024-10-28 Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara

Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning

Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task, has become essential for many applications aiming to achieve generalization and domain transfer from simulation to…

Machine Learning · Computer Science 2020-05-26 Jaskirat Singh, Liang Zheng

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be…

Machine Learning · Computer Science 2021-03-04 Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Zhaopeng Meng

Learning Transparent Reward Models via Unsupervised Feature Selection

In complex real-world tasks such as robotic manipulation and autonomous driving, collecting expert demonstrations is often more straightforward than specifying precise learning objectives and task descriptions. Learning from expert data can…

Robotics · Computer Science 2025-05-05 Daulet Baimukashev, Gokhan Alcan, Kevin Sebastian Luck, Ville Kyrki