Related papers: Direct Advantage Estimation

200 papers

Learning from off-policy data is essential for sample-efficient reinforcement learning. In the present work, we build on the insight that the advantage function can be understood as the causal effect of an action on the return, and show…

Machine Learning · Computer Science 2024-02-21 Hsiao-Ru Pan, Bernhard Schölkopf

This paper proposes an advantage estimation approach based on data augmentation for policy optimization. Unlike existing methods, which apply data augmentation to the input when learning the value and policy functions, our method uses data…

Machine Learning · Computer Science 2022-10-17 Md Masudur Rahman, Yexiang Xue

Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimation of the advantage function to reduce the variance in policy…

Machine Learning · Computer Science 2025-07-24 Shahil Shaik, Jonathon M. Smereka, Yue Wang

Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization of these models typically relies on policy…

Machine Learning · Computer Science 2026-02-11 Qingnan Ren, Shiting Huang, Zhen Fang, Zehui Chen, Lin Chen, Lijun Li, Feng Zhao

Recent work has shown that reinforcement learning agents can develop policies that exploit spurious correlations between rewards and observations. This phenomenon, known as policy confounding, arises because the agent's policy influences…

Machine Learning · Computer Science 2025-06-16 Miguel Suau

The discounting mechanism in Reinforcement Learning determines the relative importance of future and present rewards. While exponential discounting is widely used in practice, non-exponential discounting methods that align with human…

Machine Learning · Computer Science 2023-02-14 Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

Estimation of value in policy gradient methods is a fundamental problem. Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator of an advantage function similar to $\lambda$-return. It substantially reduces the…

Machine Learning · Computer Science 2023-01-27 Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a…

Artificial Intelligence · Computer Science 2018-02-15 Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main…

Machine Learning · Computer Science 2018-10-23 John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and action…

Machine Learning · Computer Science 2018-06-20 Shangda Li, Selina Bing, Steven Yang

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path…

Machine Learning · Computer Science 2019-09-17 Lanxin Lei, Zhizhong Li, Dahua Lin

In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is Generalized Per-Agent Advantage…

Multiagent Systems · Computer Science 2026-03-10 Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyul Han, Youngchul Sung

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed…

Machine Learning · Computer Science 2019-06-10 Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris

The beneficial effects of treatments vary across individuals in most studies. Treatment heterogeneity motivates practitioners to search for the optimal policy based on personal characteristics. A long-standing common practice in policy…

Statistics Theory · Mathematics 2025-01-06 Xuqiao Li, Ying Yan

Prioritized Experience Replay (PER) enables the model to learn more about relatively important samples by artificially changing their access frequencies. However, this non-uniform sampling method shifts the state-action distribution that…

Machine Learning · Computer Science 2023-11-27 Zhuoying Chen, Huiping Li, Zhaoxu Wang

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this…

Machine Learning · Computer Science 2023-05-31 Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

Policy evaluation is a core component of many reinforcement learning (RL) algorithms and a critical tool for ensuring safe deployment of RL policies. However, existing policy evaluation methods often suffer from high variance or bias. To…

Artificial Intelligence · Computer Science 2026-03-23 Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us…

Machine Learning · Statistics 2011-06-30 Constantin Rothkopf, Christos Dimitrakakis

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by…

Machine Learning · Computer Science 2026-05-12 Yang Zhou, Can Jin, Zihan Dong, Zhepeng Wang, Yanting Yang, Shiyu Zhao, Lei Li, Runxue Bao, Yaochen Xie, Dimitris N. Metaxas

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, and it has achieved great success in real-world applications. Current AFE methods mainly focus on improving the…

Machine Learning · Computer Science 2022-12-27 Kafeng Wang, Pengyang Wang, Chengzhong Xu