Related papers: Pseudo-Quantized Actor-Critic Algorithm for Robust…

200 papers

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science · 2025-06-24 · Hwanwoo Kim, Panos Toulis, Eric Laber

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well. However, recent work…

Machine Learning · Computer Science · 2020-09-21 · Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm…

Machine Learning · Computer Science · 2025-02-10 · Sreejeet Maity, Aritra Mitra

In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic…

Machine Learning · Computer Science · 2024-06-05 · Aritra Mitra, George J. Pappas, Hamed Hassani

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning, a model-free RL method, is a leading…

Machine Learning · Computer Science · 2019-09-05 · Jacob Rafati, David C. Noelle

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science · 2024-02-13 · Jianfei Ma

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science · 2026-05-15 · Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo

In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In…

Machine Learning · Computer Science · 2023-06-21 · Semih Cayci, Atilla Eryilmaz

To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with…

Machine Learning · Computer Science · 2024-10-01 · Haohui Chen, Zhiyong Chen, Aoxiang Liu, Wentuo Fang

Test-time reinforcement learning (TTRL) always adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, we observe that responses with medium…

Machine Learning · Computer Science · 2026-04-24 · Yongcan Yu, Lingxiao He, Jian Liang, Kuangpu Guo, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence…

Machine Learning · Computer Science · 2025-04-21 · Han-Dong Lim, Donghwan Lee

In real scenarios, state observations that an agent observes may contain measurement errors or adversarial noises, misleading the agent to take suboptimal actions or even collapse while training. In this paper, we study the training…

Machine Learning · Computer Science · 2023-06-23 · Ke Sun, Yingnan Zhao, Shangling Jui, Linglong Kong

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science · 2008-02-03 · P. Cichosz

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn a critic. However, TD-learning updates can be high variance. Here, we introduce a model-based RL framework, Taylor TD, which reduces this variance in…

Machine Learning · Computer Science · 2023-10-19 · Michele Garibbo, Maxime Robeyns, Laurence Aitchison

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science · 2018-11-07 · Jalaj Bhandari, Daniel Russo, Raghav Singal

We address the issue of estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new, twin TD-regularized actor-critic (TDR) method. It aims at reducing both over- and underestimation errors.…

Machine Learning · Computer Science · 2023-11-08 · Junmin Zhong, Ruofan Wu, Jennie Si

In reinforcement learning (RL), temporal difference (TD) error is known to be related to the firing rate of dopamine neurons. It has been observed that each dopamine neuron does not behave uniformly, but each responds to the TD error in an…

Machine Learning · Computer Science · 2026-04-09 · Taisuke Kobayashi

The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior…

Optimization and Control · Mathematics · 2021-08-17 · Georgios Kotsalis, Guanghui Lan, Tianjiao Li

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied…

Machine Learning · Computer Science · 2026-04-08 · Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn

The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science · 2023-09-12 · Donghwan Lee, Do Wan Kim
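Several abstracts in this list describe TD learning as a simple iterative algorithm for estimating the value function of a fixed policy, and TD(λ) as a variant that uses a memory trace to distribute the prediction error over earlier states. The following is a minimal sketch of textbook tabular TD(λ) with accumulating eligibility traces. It is not the method of any listed paper. The 5-state random-walk environment, the function name `td_lambda`, and all parameter values are illustrative assumptions. Setting `lam=0` reduces the update to one-step TD(0).

```python
import random


def td_lambda(num_episodes=10000, alpha=0.01, gamma=1.0, lam=0.0, seed=0):
    """Tabular TD(lambda) policy evaluation with accumulating eligibility traces.

    Assumed toy environment (illustrative only): states 0..6, where 0 and 6
    are terminal. Each episode starts in state 3, the policy moves left or
    right with equal probability, and the reward is +1 only on entering state 6.
    Returns value estimates for the non-terminal states 1..5.
    """
    rng = random.Random(seed)
    n = 7
    terminal = (0, n - 1)
    V = [0.0] * n  # terminal values stay 0 because their traces are never set

    for _ in range(num_episodes):
        e = [0.0] * n  # eligibility traces are reset at the start of each episode
        s = 3
        while s not in terminal:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n - 1 else 0.0
            v_next = 0.0 if s_next in terminal else V[s_next]
            delta = r + gamma * v_next - V[s]  # one-step TD error
            e[s] += 1.0                        # accumulating trace for the visited state
            for i in range(n):
                # Every state is updated in proportion to its trace, then the trace decays.
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam
            s = s_next

    return V[1:n - 1]
```

Under these assumptions the true values of states 1 to 5 are 1/6 through 5/6. Both `td_lambda(lam=0.0)` and `td_lambda(lam=0.5)` should therefore return estimates close to those numbers. Because the step size is constant, some residual noise remains in the estimates.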