Related papers: Parameter-Based Value Functions

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value…

Machine Learning · Computer Science 2022-07-05 Francesco Faccio , Aditya Ramesh , Vincent Herrmann , Jean Harb , Jürgen Schmidhuber

Chaining Value Functions for Off-Policy Learning

To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn `off-policy' about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or…

Machine Learning · Computer Science 2022-02-03 Simon Schmitt , John Shawe-Taylor , Hado van Hasselt

Deep Radial-Basis Value Functions for Continuous Control

A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We…

Machine Learning · Computer Science 2021-03-16 Kavosh Asadi , Neev Parikh , Ronald E. Parr , George D. Konidaris , Michael L. Littman

Fast Adaptation via Policy-Dynamics Value Functions

Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics…

Machine Learning · Computer Science 2020-07-07 Roberta Raileanu , Max Goldstein , Arthur Szlam , Rob Fergus

Recurrent Value Functions

Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state of the art methods perform poorly…

Machine Learning · Computer Science 2019-05-24 Pierre Thodoroff , Nishanth Anand , Lucas Caccia , Doina Precup , Joelle Pineau

Discovering Reinforcement Learning Algorithms

Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient…

Machine Learning · Computer Science 2021-01-06 Junhyuk Oh , Matteo Hessel , Wojciech M. Czarnecki , Zhongwen Xu , Hado van Hasselt , Satinder Singh , David Silver

Representation Learning on Graphs: A Reinforcement Learning Application

In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of…

Machine Learning · Computer Science 2019-01-18 Sephora Madjiheurem , Laura Toni

Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an…

Machine Learning · Computer Science 2019-06-13 Denis Steckelmacher , Hélène Plisnier , Diederik M. Roijers , Ann Nowé

Value function estimation using conditional diffusion models for control

A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complimentary scaling in amount of training data. As the appetite for large models increases, it is imperative…

Machine Learning · Computer Science 2023-06-14 Bogdan Mazoure , Walter Talbott , Miguel Angel Bautista , Devon Hjelm , Alexander Toshev , Josh Susskind

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An , Junhyeok Lee , Xingdong Zuo , Norio Kosaka , Kyung-Min Kim , Hyun Oh Song

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science 2021-03-17 Yannis Flet-Berliac , Reda Ouhamma , Odalric-Ambrym Maillard , Philippe Preux

Efficient Off-Policy Learning for High-Dimensional Action Spaces

Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation, which can be problematic in high-dimensional action spaces due to the curse of dimensionality. This reliance results…

Machine Learning · Computer Science 2025-02-18 Fabian Otto , Philipp Becker , Ngo Anh Vien , Gerhard Neumann

Proximal Deterministic Policy Gradient

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of…

Machine Learning · Computer Science 2020-08-04 Marco Maggipinto , Gian Antonio Susto , Pratik Chaudhari

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperpa-rameter tuning -- is an important open question. Existing approaches based…

Machine Learning · Computer Science 2021-11-04 Siyuan Zhang , Nan Jiang

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall , Edwin Hamel-De le Court , Francesco Belardinelli

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience,…

Machine Learning · Computer Science 2019-03-21 Kate Rakelly , Aurick Zhou , Deirdre Quillen , Chelsea Finn , Sergey Levine

Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the…

Machine Learning · Computer Science 2024-07-09 Ainur Zhaikhan , Ali H. Sayed

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam , Raihan Seraj , Samin Yeasar Arnob , Doina Precup

Massively Scaling Explicit Policy-conditioned Value Functions

We introduce a scaling strategy for Explicit Policy-Conditioned Value Functions (EPVFs) that significantly improves performance on challenging continuous-control tasks. EPVFs learn a value function V({\theta}) that is explicitly conditioned…

Machine Learning · Computer Science 2025-02-18 Nico Bohlinger , Jan Peters