Related papers: Addressing Function Approximation Error in Actor-C…

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if…

Machine Learning · Computer Science 2021-12-28 Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Error Controlled Actor-Critic

Approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-critic…

Machine Learning · Computer Science 2021-09-08 Xingen Gao, Fei Chao, Changle Zhou, Zhen Ge, Chih-Min Lin, Longzhi Yang, Xiang Chang, Changjing Shang

Cross Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where the overestimation is exaggerated…

Artificial Intelligence · Computer Science 2020-09-30 Xing Wang, Alexander Vinel

Self-correcting Q-Learning

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an…

Machine Learning · Computer Science 2021-02-03 Rong Zhu, Mattia Rigotti

Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep…

Machine Learning · Computer Science 2022-05-20 Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat

On the Estimation Bias in Double Q-Learning

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing…

Machine Learning · Computer Science 2022-01-17 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Many real-world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus…

Machine Learning · Computer Science 2019-12-03 Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss

Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel…

Machine Learning · Computer Science 2025-04-15 Ukjo Hwang, Songnam Hong

Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks

Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces innovative methods in RL, focusing on addressing and exploiting estimation…

Machine Learning · Computer Science 2024-10-14 Niccolò Turcato, Alberto Sinigaglia, Alberto Dalla Libera, Ruggero Carli, Gian Antonio Susto

Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz, Jesse Silverberg, Gary Collins

Adapting Double Q-Learning for Continuous Reinforcement Learning

The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins.…

Machine Learning · Computer Science 2023-09-27 Arsenii Kuznetsov

Efficient Continuous Control with Double Actors and Regularized Critics

How to obtain good value estimation is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the…

Machine Learning · Computer Science 2021-06-08 Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

Deep Reinforcement Learning with Double Q-learning

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can…

Machine Learning · Computer Science 2015-12-10 Hado van Hasselt, Arthur Guez, David Silver

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang, Sen Lin, Junshan Zhang

Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework

In this paper, we propose actor-director-critic, a new framework for deep reinforcement learning. Compared with the actor-critic framework, the director role is added, and action classification and action evaluation are applied…

Machine Learning · Computer Science 2023-01-11 Zongwei Liu, Yonghong Song, Yuanlin Zhang

Decorrelated Double Q-learning

Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, overestimation bias arises from the maximum operator over noisy estimates, which is exaggerated using…

Machine Learning · Computer Science 2020-06-15 Gang Chen

WD3: Taming the Estimation Bias in Deep Reinforcement Learning

The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, which could lead to suboptimal policies. To address this issue, TD3…

Machine Learning · Computer Science 2023-11-07 Qiang He, Xinwen Hou

Boosting Soft Q-Learning by Bounding

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we…

Machine Learning · Computer Science 2024-06-27 Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni