English
Related papers

Related papers: Temporal-Difference Learning Using Distributed Err…

200 papers

In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In…

Machine Learning · Computer Science 2023-06-21 Semih Cayci , Atilla Eryilmaz

We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the…

Machine Learning · Computer Science 2022-07-18 Yunhao Tang , Mark Rowland , Rémi Munos , Bernardo Ávila Pires , Will Dabney , Marc G. Bellemare

Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This…

Machine Learning · Computer Science 2026-05-07 Kris De Asis , Mohamed Elsayed , Jiamin He

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still remains considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham , Hyo-Sung Ahn

Biological learning achieves temporal credit assignment despite sparse and imprecise feedback, often relying on neuromodulatory signals acting over space and time. Here, we introduce a learning mechanism in which error information diffuses…

Neurons and Cognition · Quantitative Biology 2026-03-11 João Barretto-Bittar , Anna Levina , Emmanouil Giannakakis , Roxana Zeraati

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang , Tom Zahavy , Zhongwen Xu , Adam White , Matteo Hessel , Charles Blundell , Hado van Hasselt

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal…

Machine Learning · Computer Science 2019-09-24 Donghwan Lee , Niao He

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning -- a model-free RL method -- is a leading…

Machine Learning · Computer Science 2019-09-05 Jacob Rafati , David C. Noelle

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to…

Machine Learning · Computer Science 2020-04-16 Qi Cai , Zhuoran Yang , Jason D. Lee , Zhaoran Wang

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address…

Machine Learning · Computer Science 2018-09-10 Kristopher De Asis , Richard S. Sutton

Big Data works perfectly along with Deep learning to extract knowledge from a huge amount of data. However, this processing could take a lot of training time. Genomics is a Big Data science with high dimensionality. It relies on deep…

Neural and Evolutionary Computing · Computer Science 2024-05-28 Tasnim Assali , Zayneb Trabelsi Ayoub , Sofiane Ouni

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that has the objective of approximating action-value functions in order to learn optimal policies. Nonetheless, it has difficulties in reconciling bias with…

Machine Learning · Computer Science 2024-11-22 Mahammad Humayoo

It is still common to use Q-learning and temporal difference (TD) learning-even though they have divergence issues and sound Gradient TD alternatives exist-because divergence seems rare and they typically perform well. However, recent work…

Machine Learning · Computer Science 2020-09-21 Sina Ghiassian , Andrew Patterson , Shivam Garg , Dhawal Gupta , Adam White , Martha White

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine…

Neurons and Cognition · Quantitative Biology 2014-11-04 Timothy P. Lillicrap , Daniel Cownden , Douglas B. Tweed , Colin J. Akerman

The Reward Prediction Error hypothesis proposes that phasic activity in the midbrain dopaminergic system reflects prediction errors needed for learning in reinforcement learning. Besides the well-documented association between dopamine and…

Neurons and Cognition · Quantitative Biology 2022-07-26 William H. Alexander , Samuel J. Gershman

In mMTC mode, with thousands of devices trying to access network resources sporadically, the problem of random access (RA) and collisions between devices that select the same resources becomes crucial. A promising approach to solve such an…

Machine Learning · Computer Science 2021-11-02 Giovanni Maciel Ferreira Silva , Taufik Abrao

Solving the synaptic Credit Assignment Problem(CAP) is central to learning in both biological and artificial neural systems. Finding an optimal solution for synaptic CAP means setting the synaptic weights that assign credit to each neuron…

Artificial Intelligence · Computer Science 2025-10-28 Saranraj Nambusubramaniyan , Shervin Safavi , Raja Guru , Andreas Knoblauch
‹ Prev 1 2 3 10 Next ›