Related papers: Temporal-Difference Learning Using Distributed Err…

Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In…

Machine Learning · Computer Science 2023-06-21 Semih Cayci, Atilla Eryilmaz
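Heavy-tailed rewards can occasionally make a single TD error enormous. As a hedged illustration only, here is a tabular TD(0) step whose TD error is clipped to a fixed threshold `clip`. The fixed threshold and the function name are our assumptions, and the truncated abstract does not say whether the paper's robust scheme takes this form.

```python
def clipped_td0_update(values, state, reward, next_state, alpha, gamma, clip):
    """One tabular TD(0) step with the TD error clipped to [-clip, clip].

    Hypothetical illustration: the fixed threshold `clip` is an assumption,
    not necessarily the clipping rule analyzed in the paper.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    # Bound the step so that one extreme reward cannot blow up the estimate.
    clipped = max(-clip, min(clip, td_error))
    values[state] += alpha * clipped
    return clipped
```

With `clip=10.0`, a reward of 1000 moves `values[state]` by at most `alpha * 10`, so a single outlier cannot dominate the value table.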

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the…

Machine Learning · Computer Science 2022-07-18 Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

Extending Differential Temporal Difference Methods for Episodic Problems

Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This…

Machine Learning · Computer Science 2026-05-07 Kris De Asis, Mohamed Elsayed, Jiamin He
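The reward-centering idea described in this abstract can be sketched as a tabular differential TD(0) step for the continuing (infinite-horizon) setting: each reward is centered by a running average-reward estimate `avg_reward`, which is itself driven by the TD error. This follows the general form of differential TD as we understand it; the step sizes `alpha` and `eta` and the function name are illustrative, and the episodic extension proposed in the paper is not shown.

```python
def differential_td0_update(values, avg_reward, state, reward, next_state, alpha, eta):
    """One tabular differential TD(0) step (continuing setting, no discounting).

    `values` is updated in place; the new average-reward estimate is returned.
    """
    # Center the reward by the current average-reward estimate.
    td_error = (reward - avg_reward) + values[next_state] - values[state]
    values[state] += alpha * td_error
    # The average-reward estimate is updated from the same TD error.
    avg_reward += eta * alpha * td_error
    return avg_reward
```

On a single self-looping state that always pays reward 1, repeated calls drive `avg_reward` toward 1, so the centered reward, and with it the TD error, shrinks toward 0.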

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz
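The role of the recency factor λ can be illustrated with the forward-view λ-return for a finished episode, computed by the standard backward recursion $G_t^\lambda = R_{t+1} + \gamma[(1-\lambda)V(S_{t+1}) + \lambda G_{t+1}^\lambda]$. This is a generic sketch, not the truncated (TTD) procedure the paper proposes; the function and argument names are our own. Setting λ = 0 recovers the one-step TD target and λ = 1 the Monte Carlo return.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Forward-view lambda-returns for one complete episode.

    rewards[t] is R_{t+1}; next_values[t] is the current estimate V(S_{t+1}),
    which should be 0.0 when S_{t+1} is terminal.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # lambda-return beyond the end of the episode
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer return, weighted by lambda.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

For rewards `[1.0, 1.0, 1.0]` with `next_values=[0.5, 0.5, 0.0]` and γ = 1, λ = 0 gives `[1.5, 1.5, 1.0]` and λ = 1 gives the Monte Carlo returns `[3.0, 2.0, 1.0]`.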

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

Diffusion of Neuromodulators for Temporal Credit Assignment

Biological learning achieves temporal credit assignment despite sparse and imprecise feedback, often relying on neuromodulatory signals acting over space and time. Here, we introduce a learning mechanism in which error information diffuses…

Neurons and Cognition · Quantitative Biology 2026-03-11 João Barretto-Bittar, Anna Levina, Emmanouil Giannakakis, Roxana Zeraati

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma
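The memory trace mentioned above is usually an eligibility trace. Below is a tabular sketch of one backward-view TD(λ) step with accumulating traces; it illustrates standard TD(λ), not the discerning variant the paper introduces, and the dictionary-based layout is our own choice.

```python
def td_lambda_step(values, traces, state, reward, next_state, done, alpha, gamma, lam):
    """One backward-view TD(lambda) update with accumulating eligibility traces.

    `values` and `traces` are dicts keyed by state and are updated in place.
    Pass done=True (next_state may then be None) on the final transition.
    """
    next_value = 0.0 if done else values[next_state]
    td_error = reward + gamma * next_value - values[state]
    # Mark the current state as eligible for credit.
    traces[state] = traces.get(state, 0.0) + 1.0
    for s, e in list(traces.items()):
        # Every recently visited state receives a share of the TD error.
        values[s] += alpha * td_error * e
        # Traces decay by gamma * lambda after each step.
        traces[s] = gamma * lam * e
    if done:
        traces.clear()  # traces do not carry over between episodes
    return td_error
```

In a two-step episode A → B → terminal with a final reward of 1, λ = 1 passes the terminal TD error back to A as well as B, whereas λ = 0 updates only B.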

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Target-Based Temporal Difference Learning

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal…

Machine Learning · Computer Science 2019-09-24 Donghwan Lee, Niao He
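The target-network pattern this abstract refers to can be sketched as linear TD(0) in which the bootstrap value comes from a frozen copy of the weights that is refreshed every `sync_every` steps. The hard periodic sync and all names here are assumptions for illustration; the excerpt does not give the update rules of the paper's target-based family.

```python
def target_td0(transitions, num_features, alpha, gamma, sync_every):
    """Linear TD(0) whose bootstrap target uses a periodically synced weight copy.

    Each transition is (features, reward, next_features, done); features are
    lists of floats. Returns (online_weights, target_weights).
    """
    w = [0.0] * num_features   # online weights being learned
    w_target = list(w)         # frozen copy used only in the bootstrap target
    for step, (x, r, x_next, done) in enumerate(transitions, start=1):
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = 0.0 if done else sum(ti * xi for ti, xi in zip(w_target, x_next))
        td_error = r + gamma * v_next - v
        for i in range(num_features):
            w[i] += alpha * td_error * x[i]
        if step % sync_every == 0:
            w_target = list(w)  # copy the online weights into the target
    return w, w_target
```

Between syncs the online weights chase a fixed target, which is the decoupling that target networks rely on; on a single self-looping state with reward 1 and γ = 0.5, the weights still settle at the true value 2.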

Learning sparse representations in reinforcement learning

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning -- a model-free RL method -- is a leading…

Machine Learning · Computer Science 2019-09-05 Jacob Rafati, David C. Noelle

Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to…

Machine Learning · Computer Science 2020-04-16 Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Per-decision Multi-step Temporal Difference Learning with Control Variates

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address…

Machine Learning · Computer Science 2018-09-10 Kristopher De Asis, Richard S. Sutton
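The control-variate idea can be sketched with the per-decision importance-sampling return in its standard recursive form, $G_{t:h} = \rho_t (R_{t+1} + \gamma G_{t+1:h}) + (1 - \rho_t) V(S_t)$, where $\rho_t$ is the ratio of target-policy to behavior-policy action probabilities at step t. This is a generic sketch rather than the paper's exact algorithm, and the argument layout is our own.

```python
def per_decision_cv_return(rewards, rhos, values, bootstrap, gamma):
    """Off-policy n-step return with per-decision importance sampling and a
    control variate, computed backwards over one segment.

    rewards[k] = R_{t+k+1}; rhos[k] = pi(A|S) / b(A|S) at step t+k;
    values[k] = V(S_{t+k}); bootstrap = V(S_{t+n}) (0.0 if terminal).
    """
    g = bootstrap
    for k in reversed(range(len(rewards))):
        # When rho is 0 the return falls back to V(S_{t+k}) instead of collapsing to 0.
        g = rhos[k] * (rewards[k] + gamma * g) + (1.0 - rhos[k]) * values[k]
    return g
```

With every ratio equal to 1 this reduces to the ordinary n-step return; a zero ratio at the first step returns the current estimate `values[0]` rather than 0, which is what keeps the variance down.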

Apply Distributed CNN on Genomics to accelerate Transcription-Factor TAL1 Motif Prediction

Big Data works perfectly along with Deep learning to extract knowledge from a huge amount of data. However, this processing could take a lot of training time. Genomics is a Big Data science with high dimensionality. It relies on deep…

Neural and Evolutionary Computing · Computer Science 2024-05-28 Tasnim Assali, Zayneb Trabelsi Ayoub, Sofiane Ouni

Time-Scale Separation in Q-Learning: Extending TD($\triangle$) for Action-Value Function Decomposition

Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that has the objective of approximating action-value functions in order to learn optimal policies. Nonetheless, it has difficulties in reconciling bias with…

Machine Learning · Computer Science 2024-11-22 Mahammad Humayoo

Gradient Temporal-Difference Learning with Regularized Corrections

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well. However, recent work…

Machine Learning · Computer Science 2020-09-21 Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
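A Gradient TD method with regularized corrections can be sketched as a linear TDC-style update whose secondary weights `h` are shrunk toward zero by an L2 penalty of strength `beta`. This reflects our understanding of the TDRC form (shared step size `alpha` for both weight vectors); treat it as an illustration rather than the paper's reference implementation.

```python
def _dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def tdrc_step(w, h, x, reward, x_next, alpha, gamma, beta):
    """One linear TDC-style update with an L2 penalty on the secondary weights.

    w, h, x, x_next are equal-length lists of floats; w and h are updated in place.
    Returns the TD error computed with the pre-update weights.
    """
    td_error = reward + gamma * _dot(w, x_next) - _dot(w, x)
    h_dot_x = _dot(h, x)  # h's current estimate of the expected TD error at x
    for i in range(len(w)):
        # Primary weights: TD update plus the gradient-correction term.
        w[i] += alpha * (td_error * x[i] - gamma * h_dot_x * x_next[i])
    for i in range(len(h)):
        # Secondary weights: LMS step toward the TD error, shrunk toward 0 by beta.
        h[i] += alpha * ((td_error - h_dot_x) * x[i] - beta * h[i])
    return td_error
```

Setting `beta=0` recovers a plain TDC update; a positive `beta` keeps the correction weights small, which is what the regularization is meant to achieve.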

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim, Panos Toulis, Eric Laber
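One way to see how an implicit recursion stabilizes TD: if the current-state value inside linear TD(0) is evaluated at the new weights, $w' = w + \alpha(r + \gamma w^\top x' - w'^\top x)x$, solving for $w'$ gives the ordinary TD(0) step with $\alpha$ replaced by $\alpha / (1 + \alpha\lVert x\rVert^2)$. This is our own derivation under that assumption and may differ in detail from the paper's algorithms.

```python
def implicit_td0_step(w, x, reward, x_next, alpha, gamma):
    """One implicit linear TD(0) step, returning the new weight list.

    Solving w_new = w + alpha * (reward + gamma * w.x_next - w_new.x) * x for
    w_new gives the plain TD(0) step with alpha scaled by 1 / (1 + alpha * ||x||^2).
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    td_error = reward + gamma * v_next - v
    sq_norm = sum(xi * xi for xi in x)
    effective_step = alpha / (1.0 + alpha * sq_norm)  # shrinks as alpha grows
    return [wi + effective_step * td_error * xi for wi, xi in zip(w, x)]
```

On a single self-looping state with reward 1 and γ = 0.5, `alpha=100.0` makes plain TD(0) overshoot and diverge, while repeated implicit steps still settle at the true value 2.0.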

Random feedback weights support learning in deep neural networks

The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine…

Neurons and Cognition · Quantitative Biology 2014-11-04 Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, Colin J. Akerman
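The idea behind random feedback weights (often called feedback alignment) is that the hidden-layer error is computed with a fixed random matrix B instead of the transpose of the forward output weights. A minimal pure-Python sketch follows, assuming a linear 3-4-2 network, a random linear target, and illustrative sizes, step size, and seed; these choices are ours, not the paper's experimental setup.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def train_feedback_alignment(steps=2000, lr=0.05, seed=0):
    """Train a linear 3-4-2 network with feedback alignment.

    Returns (initial_loss, final_loss) on a fixed evaluation set.
    """
    rng = random.Random(seed)
    n_in, n_hid, n_out = 3, 4, 2

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    target = rand_matrix(n_out, n_in)  # linear map the network should learn
    w1 = rand_matrix(n_hid, n_in)      # input -> hidden weights
    w2 = rand_matrix(n_out, n_hid)     # hidden -> output weights
    b = rand_matrix(n_hid, n_out)      # fixed random feedback weights, never trained
    eval_set = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(50)]

    def loss():
        total = 0.0
        for x in eval_set:
            y = matvec(w2, matvec(w1, x))
            t = matvec(target, x)
            total += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))
        return total / len(eval_set)

    initial = loss()
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in range(n_in)]
        h = matvec(w1, x)
        e = [yi - ti for yi, ti in zip(matvec(w2, h), matvec(target, x))]
        # Feedback alignment: route the output error back through B, not W2 transposed.
        delta_h = matvec(b, e)
        for i in range(n_out):
            for j in range(n_hid):
                w2[i][j] -= lr * e[i] * h[j]
        for i in range(n_hid):
            for j in range(n_in):
                w1[i][j] -= lr * delta_h[i] * x[j]
    return initial, loss()
```

Replacing `matvec(b, e)` with the transpose of `w2` applied to `e` would turn this back into ordinary backpropagation; the point of the sketch is that learning can proceed without that symmetric weight transport.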

Representation learning with reward prediction errors

The Reward Prediction Error hypothesis proposes that phasic activity in the midbrain dopaminergic system reflects prediction errors needed for learning in reinforcement learning. Besides the well-documented association between dopamine and…

Neurons and Cognition · Quantitative Biology 2022-07-26 William H. Alexander, Samuel J. Gershman

Throughput and Latency in the Distributed Q-Learning Random Access mMTC Networks

In mMTC mode, with thousands of devices trying to access network resources sporadically, the problem of random access (RA) and collisions between devices that select the same resources becomes crucial. A promising approach to solve such an…

Machine Learning · Computer Science 2021-11-02 Giovanni Maciel Ferreira Silva, Taufik Abrao

Dopamine-driven synaptic credit assignment in neural networks

Solving the synaptic Credit Assignment Problem (CAP) is central to learning in both biological and artificial neural systems. Finding an optimal solution for synaptic CAP means setting the synaptic weights that assign credit to each neuron…

Artificial Intelligence · Computer Science 2025-10-28 Saranraj Nambusubramaniyan, Shervin Safavi, Raja Guru, Andreas Knoblauch