Related papers: Direct Advantage Estimation

Skill or Luck? Return Decomposition via Advantage Functions

Learning from off-policy data is essential for sample-efficient reinforcement learning. In the present work, we build on the insight that the advantage function can be understood as the causal effect of an action on the return, and show…

Machine Learning · Computer Science 2024-02-21 Hsiao-Ru Pan, Bernhard Schölkopf

Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning

This paper proposes an advantage estimation approach based on data augmentation for policy optimization. Unlike using data augmentation on the input to learn value and policy function as existing methods use, our method uses data…

Machine Learning · Computer Science 2022-10-17 Md Masudur Rahman, Yexiang Xue

Generalized Advantage Estimation for Distributional Policy Gradients

Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimation of the advantage function to reduce the variance in policy…

Machine Learning · Computer Science 2025-07-24 Shahil Shaik, Jonathon M. Smereka, Yue Wang

ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning

Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization of these models typically relies on policy…

Machine Learning · Computer Science 2026-02-11 Qingnan Ren, Shiting Huang, Zhen Fang, Zehui Chen, Lin Chen, Lijun Li, Feng Zhao

Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations

Recent work has shown that reinforcement learning agents can develop policies that exploit spurious correlations between rewards and observations. This phenomenon, known as policy confounding, arises because the agent's policy influences…

Machine Learning · Computer Science 2025-06-16 Miguel Suau

UGAE: A Novel Approach to Non-exponential Discounting

The discounting mechanism in Reinforcement Learning determines the relative importance of future and present rewards. While exponential discounting is widely used in practice, non-exponential discounting methods that align with human…

Machine Learning · Computer Science 2023-02-14 Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

Partial advantage estimator for proximal policy optimization

Estimation of value in policy gradient methods is a fundamental problem. Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator of an advantage function similar to $\lambda$-return. It substantially reduces the…

Machine Learning · Computer Science 2023-01-27 Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas

Directly Estimating the Variance of the $\lambda$-Return Using Temporal-Difference Methods

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a…

Artificial Intelligence · Computer Science 2018-02-15 Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main…

Machine Learning · Computer Science 2018-10-23 John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

Distributional Advantage Actor-Critic

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and action…

Machine Learning · Computer Science 2018-06-20 Shangda Li, Selina Bing, Steven Yang

Biased Estimates of Advantages over Path Ensembles

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path…

Machine Learning · Computer Science 2019-09-17 Lanxin Lei, Zhizhong Li, Dahua Lin

Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization

In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is Generalized Per-Agent Advantage…

Multiagent Systems · Computer Science 2026-03-10 Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyul Han, Youngchul Sung

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed…

Machine Learning · Computer Science 2019-06-10 Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris

Matching-Based Policy Learning

The beneficial effects of treatments vary across individuals in most studies. Treatment heterogeneity motivates practitioners to search for the optimal policy based on personal characteristics. A long-standing common practice in policy…

Statistics Theory · Mathematics 2025-01-06 Xuqiao Li, Ying Yan

Directly Attention Loss Adjusted Prioritized Experience Replay

Prioritized Experience Replay (PER) enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that…

Machine Learning · Computer Science 2023-11-27 Zhuoying Chen, Huiping Li, Zhaoxu Wang

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this…

Machine Learning · Computer Science 2023-05-31 Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

Evaluation-Aware Reinforcement Learning

Policy evaluation is a core component of many reinforcement learning (RL) algorithms and a critical tool for ensuring safe deployment of RL policies. However, existing policy evaluation methods often suffer from high variance or bias. To…

Artificial Intelligence · Computer Science 2026-03-23 Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum

Preference elicitation and inverse reinforcement learning

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us…

Machine Learning · Statistics 2011-06-30 Constantin Rothkopf, Christos Dimitrakakis

DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by…

Machine Learning · Computer Science 2026-05-12 Yang Zhou, Can Jin, Zihan Dong, Zhepeng Wang, Yanting Yang, Shiyu Zhao, Lei Li, Runxue Bao, Yaochen Xie, Dimitris N. Metaxas

Toward Efficient Automated Feature Engineering

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, which has achieved great success in real-world applications. Current AFE methods mainly focus on improving the…

Machine Learning · Computer Science 2022-12-27 Kafeng Wang, Pengyang Wang, Chengzhong Xu