Related papers: Bellman Error Centering

A Variance Minimization Approach to Temporal-Difference Learning

Fast-converging algorithms are a contemporary requirement in reinforcement learning. In the context of linear function approximation, the magnitude of the smallest eigenvalue of the key matrix is a major factor reflecting the convergence…

Machine Learning · Computer Science 2024-11-12 Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang

An Analysis of Categorical Distributional Reinforcement Learning

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was…

Machine Learning · Statistics 2018-02-23 Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity

Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the static CVaR of the return depends on entire…

Machine Learning · Computer Science 2026-02-04 Aneri Muni, Vincent Taboga, Esther Derman, Pierre-Luc Bacon, Erick Delage

A Distributional Perspective on Reinforcement Learning

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which…

Machine Learning · Computer Science 2017-07-24 Marc G. Bellemare, Will Dabney, Rémi Munos

Confounding Robust Continuous Control via Automatic Reward Shaping

Reward shaping has been applied widely to accelerate Reinforcement Learning (RL) agents' training. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains…

Machine Learning · Computer Science 2026-02-12 Mateo Juliani, Mingxuan Li, Elias Bareinboim

Reward Centering

We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at…

Machine Learning · Computer Science 2024-10-31 Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Reinforcement learning (RL) algorithms assume that users specify tasks by manually writing down a reward function. However, this process can be laborious and demands considerable technical expertise. Can we devise RL algorithms that instead…

Machine Learning · Computer Science 2022-01-03 Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

On the Power of (Approximate) Reward Models for Inference-Time Scaling

Inference-time scaling has recently emerged as a powerful paradigm for improving the reasoning capability of large language models. Among various approaches, Sequential Monte Carlo (SMC) has become a particularly important framework,…

Computation and Language · Computer Science 2026-02-03 Youheng Zhu, Yiping Lu

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and…

Machine Learning · Computer Science 2023-04-19 Andrew Patterson, Victor Liao, Martha White

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the…

Machine Learning · Computer Science 2022-12-29 Tim G. J. Rudner, Vitchyr H. Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Nonconvex Regularization for Feature Selection in Reinforcement Learning

This work proposes an efficient batch algorithm for feature selection in reinforcement learning (RL) with theoretical convergence guarantees. To mitigate the estimation bias inherent in conventional regularization schemes, the first…

Machine Learning · Computer Science 2025-09-22 Kyohei Suzuki, Konstantinos Slavakis

Value-Distributional Model-Based Reinforcement Learning

Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the…

Machine Learning · Computer Science 2024-09-04 Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters

To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment

When faced with a novel scenario, it can be hard to succeed on the first attempt. In these challenging situations, it is important to know how to retry quickly and meaningfully. Retrying behavior can emerge naturally in robots trained on…

Robotics · Computer Science 2024-06-25 Maximilian Du, Alexander Khazatsky, Tobias Gerstenberg, Chelsea Finn

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of…

Machine Learning · Computer Science 2020-03-30 Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work…

Machine Learning · Computer Science 2023-03-08 Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters

STARC: A General Framework For Quantifying Differences Between Reward Functions

In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never…

Machine Learning · Computer Science 2024-12-13 Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman…

Machine Learning · Statistics 2026-05-11 Lars van der Laan, Nathan Kallus

On Reward-Balancing Methods for Reinforcement Learning

This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods consist of iteratively adjusting the reward function to transform…

Optimization and Control · Mathematics 2026-04-23 Simone Baroncini, Bahman Gharesifard, Giuseppe Notarstefano

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot,…

Artificial Intelligence · Computer Science 2018-06-12 Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan

Distributional Bellman Operators over Mean Embeddings

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and…

Machine Learning · Statistics 2024-03-05 Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland