English
Related papers

Related papers: Multi-step Reinforcement Learning: A Unifying Algo…

200 papers

Recently, a new multi-step temporal learning algorithm, called $Q(\sigma)$, unifies $n$-step Tree-Backup (when $\sigma=0$) and $n$-step Sarsa (when $\sigma=1$) by introducing a sampling parameter $\sigma$. However, similar to other…

Artificial Intelligence · Computer Science 2018-02-12 Long Yang , Minhao Shi , Qian Zheng , Wenjia Meng , Gang Pan

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q($\sigma$) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the…

Artificial Intelligence · Computer Science 2017-11-07 Markus Dumke

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach unifies them with eligibility trace…

Machine Learning · Computer Science 2019-09-09 Long Yang , Yu Zhang , Qian Zheng , Pengfei Li , Gang Pan

In the landscape of TD algorithms, the Q($\sigma$, $\lambda$) algorithm is an algorithm with the ability to perform a multistep backup in an online manner while also successfully unifying the concepts of sampling with using the expectation…

Machine Learning · Computer Science 2019-12-24 Abhishek Nan

Multi-step methods such as Retrace($\lambda$) and $n$-step $Q$-learning have become a crucial component of modern deep reinforcement learning agents. These methods are often evaluated as a part of bigger architectures and their evaluations…

Machine Learning · Computer Science 2019-02-11 J. Fernando Hernandez-Garcia , Richard S. Sutton

Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between target policy and behavior policy. One common approach is to measure the difference between two policies in a probabilistic way,…

Machine Learning · Computer Science 2019-05-20 Longxiang Shi , Shijian Li , Longbing Cao , Long Yang , Gang Pan

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address…

Machine Learning · Computer Science 2018-09-10 Kristopher De Asis , Richard S. Sutton

Reinforcement Learning (RL) can model complex behavior policies for goal-directed sequential decision making tasks. A hallmark of RL algorithms is Temporal Difference (TD) learning: value function for the current state is moved towards a…

Machine Learning · Computer Science 2017-11-07 Sahil Sharma , Girish Raguvir J , Srivatsan Ramesh , Balaraman Ravindran

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($\lambda$) is its sensitivity…

Machine Learning · Statistics 2014-12-23 Aviv Tamar , Panos Toulis , Shie Mannor , Edoardo M. Airoldi

Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step…

Artificial Intelligence · Computer Science 2016-08-19 Harm van Seijen

In numerous episodic reinforcement learning (RL) environments, SARSA-based methodologies are employed to enhance policies aimed at maximizing returns over long horizons. Traditional SARSA algorithms face challenges in achieving an optimal…

Machine Learning · Computer Science 2025-09-05 Mahammad Humayoo

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that has the objective of approximating action-value functions in order to learn optimal policies. Nonetheless, it has difficulties in reconciling bias with…

Machine Learning · Computer Science 2024-11-22 Mahammad Humayoo

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($\lambda$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step…

Machine Learning · Computer Science 2021-11-24 Rohan Deb , Meet Gandhi , Shalabh Bhatnagar

Q($\sigma$) is a recently proposed temporal-difference learning method that interpolates between learning from expected backups and sampled backups. It has been shown that intermediate values for the interpolation parameter $\sigma \in…

Machine Learning · Computer Science 2022-06-07 Brett Daley , Isaac Chan

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g, $n$-step and trace-based returns, have been…

Artificial Intelligence · Computer Science 2018-08-01 Yonathan Efroni , Gal Dalal , Bruno Scherrer , Shie Mannor

The hallmark feature of temporal-difference (TD) learning is bootstrapping: using value predictions to generate new value predictions. The vast majority of TD methods for control learn a policy by bootstrapping from a single action-value…

Machine Learning · Computer Science 2025-09-05 Brett Daley , Prabhat Nagarajan , Martha White , Marlos C. Machado

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with…

Multiagent Systems · Computer Science 2020-08-11 Xinghu Yao , Chao Wen , Yuhui Wang , Xiaoyang Tan

Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small…

Machine Learning · Statistics 2026-01-28 Hwanwoo Kim , Eric Laber

Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy…

Machine Learning · Computer Science 2021-11-23 Rohan Deb , Shalabh Bhatnagar

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of…

Machine Learning · Computer Science 2020-03-30 Philip Amortila , Doina Precup , Prakash Panangaden , Marc G. Bellemare
‹ Prev 1 2 3 10 Next ›