Related papers: Periodic Q-Learning

200 papers

Q-learning is a popular reinforcement learning algorithm. However, it has been studied and analysed mainly in the infinite-horizon setting. There are several important applications which can be modeled in the framework of finite…

Machine Learning · Computer Science 2022-08-09 Vivek VP, Dr. Shalabh Bhatnagar

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal…

Machine Learning · Computer Science 2019-09-24 Donghwan Lee, Niao He

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this…

Machine Learning · Computer Science 2026-02-04 Hyukjun Yang, Han-Dong Lim, Donghwan Lee

This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard…

Machine Learning · Computer Science 2025-07-21 Bum Geun Park, Taeho Lee, Donghwan Lee

Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network, called Deep Q-Network (DQN), to approximate the well-known Q-function. Although wildly successful under laboratory conditions,…

Machine Learning · Computer Science 2021-04-13 Arunselvan Ramaswamy, Eyke Hüllermeier

The use of target networks in deep reinforcement learning is a widely popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay…

Machine Learning · Computer Science 2026-03-02 Théo Vincent, Yogesh Tripathi, Tim Faust, Abdullah Akgül, Yaniv Oren, Melih Kandemir, Jan Peters, Carlo D'Eramo

A dynamic treatment regime effectively incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these become more and more popular in conjunction with longitudinal data from…

Methodology · Statistics 2011-08-29 Rui Song, Weiwei Wang, Donglin Zeng, Michael R. Kosorok

In this work, we propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where the overestimation is exaggerated…

Artificial Intelligence · Computer Science 2020-09-30 Xing Wang, Alexander Vinel

Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process. Yet a complete theoretical…

Machine Learning · Computer Science 2023-08-15 Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in…

Machine Learning · Computer Science 2024-11-01 Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik

The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of…

Machine Learning · Statistics 2015-06-04 Soummya Kar, José M. F. Moura, H. Vincent Poor

In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning…

Machine Learning · Computer Science 2024-06-26 Li Meng, Anis Yazidi, Morten Goodwin, Paal Engelstad

In this paper, we formulate the adaptive learning problem, that is, the problem of how to find an individualized learning plan (called a policy) that chooses the most appropriate learning materials based on a learner's latent traits, faced in adaptive…

Machine Learning · Computer Science 2020-04-21 Xiao Li, Hanchen Xu, Jinming Zhang, Hua-hua Chang

Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original Q-learning suffers from performance and complexity challenges across very large networks. Herein,…

Machine Learning · Computer Science 2024-09-02 Talha Bozkus, Urbashi Mitra

In state-of-the-art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a…

Machine Learning · Computer Science 2020-07-16 Sabrina Hoppe, Marc Toussaint

An automatic program that generates constant profit from the financial market is lucrative for every market practitioner. Recent advances in deep reinforcement learning provide a framework toward end-to-end training of such a trading agent.…

Trading and Market Microstructure · Quantitative Finance 2018-07-10 Chien Yi Huang

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu, Quanquan Gu

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an…

Machine Learning · Computer Science 2021-02-03 Rong Zhu, Mattia Rigotti

It is well known that information loss can occur in the classic and simple Q-learning algorithm. Entropy-based policy search methods were introduced to replace Q-learning and to design algorithms that are more robust against information…

Machine Learning · Computer Science 2020-06-29 Tung D. Nguyen, Kathryn E. Kasmarik, Hussein A. Abbass

Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare…

Machine Learning · Computer Science 2025-10-21 Elynn Chen, Sai Li, Michael I. Jordan