Related papers: Toward Efficient Gradient-Based Value Estimation

Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation

Recent development of Deep Reinforcement Learning (DRL) has demonstrated superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks…

Machine Learning · Computer Science 2022-03-15 Martin Gottwald, Sven Gronauer, Hao Shen, Klaus Diepold

Deep Reinforcement Learning with Gradient Eligibility Traces

Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently…

Machine Learning · Computer Science 2025-09-22 Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

A Kernel Loss for Solving the Bellman Equation

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of…

Machine Learning · Computer Science 2020-01-10 Yihao Feng, Lihong Li, Qiang Liu

A Variance Minimization Approach to Temporal-Difference Learning

Fast-converging algorithms are a contemporary requirement in reinforcement learning. In the context of linear function approximation, the magnitude of the smallest eigenvalue of the key matrix is a major factor reflecting the convergence…

Machine Learning · Computer Science 2024-11-12 Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang

An Experimental Comparison Between Temporal Difference and Residual Gradient with Neural Network Approximation

Gradient descent or its variants are popular in training neural networks. However, in deep Q-learning with neural network approximation, a type of reinforcement learning, gradient descent (also known as Residual Gradient (RG)) is barely…

Machine Learning · Computer Science 2022-11-15 Shuyu Yin, Tao Luo, Peilin Liu, Zhi-Qin John Xu

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Many reinforcement learning algorithms rely on value estimation; however, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many…

Machine Learning · Computer Science 2024-08-02 Andrew Patterson, Adam White, Martha White

Differential TD Learning for Value Function Approximation

Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A…

Systems and Control · Computer Science 2018-12-27 Adithya M. Devraj, Sean P. Meyn

Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed. This can be remedied by learning richer objects, such as a model of…

Machine Learning · Computer Science 2021-01-19 Léonard Blier, Corentin Tallec, Yann Ollivier

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and…

Machine Learning · Computer Science 2023-04-19 Andrew Patterson, Victor Liao, Martha White

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD)…

Machine Learning · Computer Science 2020-06-09 Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

Reward Learning through Ranking Mean Squared Error

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified.…

Machine Learning · Computer Science 2026-01-16 Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

Bellman Gradient Iteration for Inverse Reinforcement Learning

This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of…

Machine Learning · Computer Science 2017-07-26 Kun Li, Yanan Sui, Joel W. Burdick

Costate-focused models for reinforcement learning

Many recent algorithms for reinforcement learning are model-free and founded on the Bellman equation. Here we present a method founded on the costate equation and models of the state dynamics. We use the costate -- the gradient of cost with…

Machine Learning · Computer Science 2018-10-04 Bita Behrouzi, Xuefei Liu, Douglas Tweed

Reward-Reinforced Reinforcement Learning for Multi-agent Systems

Reinforcement learning algorithms in multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a…

Multiagent Systems · Computer Science 2021-05-18 Changgang Zheng, Shufan Yang, Juan Parra-Ullauri, Antonio Garcia-Dominguez, Nelly Bencomo

Robust Reinforcement Learning under Diffusion Models for Data with Jumps

Reinforcement Learning (RL) has proven effective in solving complex decision-making tasks across various domains, but challenges remain in continuous-time settings, particularly when state dynamics are governed by stochastic differential…

Machine Learning · Computer Science 2025-09-19 Chenyang Jiang, Donggyu Kim, Alejandra Quintos, Yazhen Wang

Distributional value gradients for stochastic environments

Gradient-regularized value learning methods improve sample efficiency by leveraging learned models of transition dynamics and rewards to estimate return gradients. However, existing approaches, such as MAGE, struggle in stochastic or noisy…

Machine Learning · Computer Science 2026-03-04 Baptiste Debes, Tinne Tuytelaars

Fast Value Tracking for Deep Reinforcement Learning

Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment. However, existing algorithms often view these problems as static, focusing on point estimates for model…

Machine Learning · Statistics 2024-03-21 Frank Shih, Faming Liang

Simplified Temporal Consistency Reinforcement Learning

Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL which interleaves…

Machine Learning · Computer Science 2023-06-19 Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen