Related papers: Gradient Q$(\sigma, \lambda)$: A Unified Algorithm…

Multi-step Reinforcement Learning: A Unifying Algorithm

Unifying seemingly disparate algorithmic ideas to produce better performing algorithms has been a longstanding goal in reinforcement learning. As a primary example, TD($\lambda$) elegantly unifies one-step TD prediction with Monte Carlo…

Artificial Intelligence · Computer Science 2018-06-13 Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Recently, a new multi-step temporal learning algorithm, called $Q(\sigma)$, unifies $n$-step Tree-Backup (when $\sigma=0$) and $n$-step Sarsa (when $\sigma=1$) by introducing a sampling parameter $\sigma$. However, similar to other…

Artificial Intelligence · Computer Science 2018-02-12 Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q($\sigma$) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the…

Artificial Intelligence · Computer Science 2017-11-07 Markus Dumke

Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of…

Machine Learning · Computer Science 2026-03-31 Han-Dong Lim, HyeAnn Lee, Donghwan Lee

TBQ($\sigma$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between target policy and behavior policy. One common approach is to measure the difference between two policies in a probabilistic way,…

Machine Learning · Computer Science 2019-05-20 Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan

A study on a Q-Learning algorithm application to a manufacturing assembly problem

The development of machine learning algorithms has been gathering relevance to address the increasing modelling complexity of manufacturing decision-making problems. Reinforcement learning is a methodology with great potential due to the…

Machine Learning · Computer Science 2023-04-18 Miguel Neves, Miguel Vieira, Pedro Neto

Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity

An improvement of Q-learning is proposed in this paper. It is different from classic Q-learning in that the similarity between different states and actions is considered in the proposed method. During the training, a new updating mechanism…

Artificial Intelligence · Computer Science 2021-06-03 Wei Liao, Xiaohui Wei, Jizhou Lai

Reinforcement Learning with Linear Function Approximation and LQ control Converges

Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function approximation based RL control algorithms. In this paper we show that TD(0) and Sarsa(0)…

Machine Learning · Computer Science 2007-05-23 Istvan Szita, Andras Lorincz

Smoothed Q-learning

In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double…

Machine Learning · Computer Science 2023-03-16 David Barber

Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration

Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small…

Machine Learning · Statistics 2026-01-28 Hwanwoo Kim, Eric Laber

Regularized Q-learning

Q-learning is a widely used algorithm in the reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable in the linear function approximation case. This…

Machine Learning · Computer Science 2025-02-11 Han-Dong Lim, Donghwan Lee

Deep Constrained Q-learning

In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of…

Machine Learning · Computer Science 2020-09-15 Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma…

Machine Learning · Computer Science 2020-07-09 Adithya M. Devraj, Sean P. Meyn

Reinforcement Learning by Comparing Immediate Reward

This paper introduces an approach to reinforcement learning that compares immediate rewards using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares current reward with…

Machine Learning · Computer Science 2010-09-15 Punit Pandey, Deepshikha Pandey, Shishir Kumar

Periodic Regularized Q-Learning

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this…

Machine Learning · Computer Science 2026-02-04 Hyukjun Yang, Han-Dong Lim, Donghwan Lee

GQ($\lambda$) Quick Reference and Implementation Guide

This document should serve as a quick reference for and guide to the implementation of linear GQ($\lambda$), a gradient-based off-policy temporal-difference learning algorithm. Explanation of the intuition and theory behind the algorithm…

Machine Learning · Computer Science 2017-05-12 Adam White, Richard S. Sutton

On Convergence of Gradient Expected Sarsa($\lambda$)

We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy…

Machine Learning · Computer Science 2020-12-15 Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

Augmented Q Imitation Learning (AQIL)

The study of unsupervised learning can be generally divided into two categories: imitation learning and reinforcement learning. In imitation learning the machine learns by mimicking the behavior of an expert system whereas in reinforcement…

Machine Learning · Computer Science 2020-04-07 Xiao Lei Zhang, Anish Agarwal

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. At a high level, Q-probing sits between heavier approaches such as finetuning and lighter approaches such as few shot…

Machine Learning · Computer Science 2024-06-04 Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

Two-Step Q-Learning

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh, Shreyas S R