Related papers: Addressing Function Approximation Error in Actor-C…

200 papers

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if…

Machine Learning · Computer Science 2021-12-28 Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-critic…

Machine Learning · Computer Science 2021-09-08 Xingen Gao, Fei Chao, Changle Zhou, Zhen Ge, Chih-Min Lin, Longzhi Yang, Xiang Chang, Changjing Shang

In this work, we propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where the overestimation is exaggerated…

Artificial Intelligence · Computer Science 2020-09-30 Xing Wang, Alexander Vinel

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an…

Machine Learning · Computer Science 2021-02-03 Rong Zhu, Mattia Rigotti

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep…

Machine Learning · Computer Science 2022-05-20 Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing…

Machine Learning · Computer Science 2022-01-17 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Many real world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus…

Machine Learning · Computer Science 2019-12-03 Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel…

Machine Learning · Computer Science 2025-04-15 Ukjo Hwang, Songnam Hong

Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces innovative methods in RL, focusing on addressing and exploiting estimation…

Machine Learning · Computer Science 2024-10-14 Niccolò Turcato, Alberto Sinigaglia, Alberto Dalla Libera, Ruggero Carli, Gian Antonio Susto

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz, Jesse Silverberg, Gary Collins

The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins.…

Machine Learning · Computer Science 2023-09-27 Arsenii Kuznetsov

How to obtain good value estimation is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the…

Machine Learning · Computer Science 2021-06-08 Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can…

Machine Learning · Computer Science 2015-12-10 Hado van Hasselt, Arthur Guez, David Silver

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang, Sen Lin, Junshan Zhang

In this paper, we propose actor-director-critic, a new framework for deep reinforcement learning. Compared with the actor-critic framework, the director role is added, and action classification and action evaluation are applied…

Machine Learning · Computer Science 2023-01-11 Zongwei Liu, Yonghong Song, Yuanlin Zhang

Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, overestimation bias arises from the maximum operator over noisy estimates, which is exaggerated using…

Machine Learning · Computer Science 2020-06-15 Gang Chen

The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, which could lead to suboptimal policies. To address this issue, TD3…

Machine Learning · Computer Science 2023-11-07 Qiang He, Xinwen Hou

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we…

Machine Learning · Computer Science 2024-06-27 Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni