Related papers: Regularized Q-learning through Robust Averaging

200 papers

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh, Shreyas S R

Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be…

Machine Learning · Computer Science 2022-06-14 Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an…

Machine Learning · Computer Science 2021-02-03 Rong Zhu, Mattia Rigotti

Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. One of the reasons for the success of modern RL algorithms is partially a direct or indirect…

Machine Learning · Computer Science 2025-06-26 Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik

Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy. Finite dimensional working models are typically used to estimate certain nuisance parameters, and…

Methodology · Statistics 2020-03-30 Ashkan Ertefaie, James R. McKay, David Oslin, Robert L. Strawderman

While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing…

Machine Learning · Computer Science 2026-02-05 He Wang, Xingyu Xu, Yuejie Chi

The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard $Q$-learning. Such a bias fails to account for the possibility of low returns, particularly in risky…

Machine Learning · Computer Science 2021-11-05 Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh

In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double…

Machine Learning · Computer Science 2023-03-16 David Barber

$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double…

Machine Learning · Computer Science 2026-01-13 Hyunjun Na, Donghwan Lee

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing…

Machine Learning · Computer Science 2022-01-17 Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball…

Machine Learning · Computer Science 2024-06-21 Ariel Neufeld, Julian Sester

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning…

Machine Learning · Computer Science 2024-08-02 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

The goal of this paper is to propose a new Q-learning algorithm with a dummy adversarial player, which is called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the…

Machine Learning · Computer Science 2024-10-01 HyeAnn Lee, Donghwan Lee

Dynamic decision-making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment in which the data is collected can differ from that of the environment…

Machine Learning · Computer Science 2024-09-05 Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by…

Machine Learning · Computer Science 2021-04-21 Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

We give an efficient algorithm for learning a binary function in a given class C of bounded VC dimension, with training data distributed according to P and test data according to Q, where P and Q may be arbitrary distributions over X. This…

Machine Learning · Computer Science 2021-02-17 Adam Kalai, Varun Kanade

This paper studies the robustness of reinforcement learning algorithms to errors in the learning process. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open…

Optimization and Control · Mathematics 2021-03-16 Bo Pang, Zhong-Ping Jiang

We propose a novel distributionally robust $Q$-learning algorithm for the non-tabular case accounting for continuous state spaces where the state transition of the underlying Markov decision process is subject to model uncertainty. The…

Machine Learning · Computer Science 2025-05-27 Chung I Lu, Julian Sester, Aijia Zhang

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors,…

Machine Learning · Computer Science 2022-10-19 Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

"Distribution shift" is the main obstacle to the success of offline reinforcement learning. A learning policy may take actions beyond the behavior policy's knowledge, referred to as Out-of-Distribution (OOD) actions. The Q-values for…

Machine Learning · Computer Science 2025-01-14 Jing Zhang, Linjiajie Fang, Kexin Shi, Wenjia Wang, Bing-Yi Jing