Related papers: Finite-Sample Analysis of Decentralized Temporal-D…

200 papers

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD)…

Machine Learning · Computer Science 2022-04-22 Xingang Guo, Bin Hu

This paper considers the policy evaluation problem in a multi-agent reinforcement learning (MARL) environment over decentralized and directed networks. The focus is on decentralized temporal difference (TD) learning with linear function…

Optimization and Control · Mathematics 2021-09-01 Zhaoxian Wu, Han Shen, Tianyi Chen, Qing Ling

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted…

Machine Learning · Statistics 2025-05-14 Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear…

Multiagent Systems · Computer Science 2024-07-31 Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee, Do Wan Kim

Despite the increasing interest in multi-agent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this work, we address this problem by…

Machine Learning · Computer Science 2020-12-15 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

We consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the…

Machine Learning · Computer Science 2018-02-28 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

Temporal difference (TD) learning is a cornerstone reinforcement learning (RL) method for policy evaluation, where the goal is to estimate the value function of a Markov decision process under a fixed policy. While a substantial body of…

Machine Learning · Computer Science 2026-02-02 Donghwan Lee, Do Wan Kim

One of the challenges for multi-agent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. While exciting progress has…

Machine Learning · Computer Science 2022-02-22 Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu

Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, it has only been recently that researchers have begun to actively study its finite time…

Machine Learning · Computer Science 2025-04-16 Han-Dong Lim, Donghwan Lee

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these…

Machine Learning · Computer Science 2024-05-08 Zhifa Ke, Zaiwen Wen, Junyu Zhang

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun, Han Shen, Tianyi Chen, Dongsheng Li

The finite-time convergence of off-policy TD learning has been comprehensively studied recently. However, such a type of convergence has not been well established for off-policy TD learning in the multi-agent setting, which covers broader…

Machine Learning · Computer Science 2021-03-25 Ziyi Chen, Yi Zhou, Rongrong Chen

[Zhang, ICML 2018] provided the first decentralized actor-critic algorithm for multi-agent reinforcement learning (MARL) that offers convergence guarantees. In that work, policies are stochastic and are defined on finite action spaces. We…

Machine Learning · Computer Science 2021-02-22 Antoine Grosnit, Desmond Cai, Laura Wynter

In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes. We show that a simple…

Machine Learning · Statistics 2024-06-18 Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines