
Related papers: One-Shot Averaging for Distributed TD($\lambda$) U…

200 papers

We provide a new non-asymptotic analysis of distributed TD(0) with linear function approximation. Our approach relies on "one-shot averaging," where $N$ agents run local copies of TD(0) and average the outcomes only once at the very end. We…

Machine Learning · Computer Science · 2022-01-31 · Rui Liu, Alex Olshevsky
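The one-shot averaging scheme in the abstract above can be illustrated with a minimal Python sketch. It assumes that every agent samples its own trajectory from the same Markov reward process (the MDP under the fixed policy), runs TD(0) with linear features and a constant step size, and shares its parameter vector only once, at the end. The function names `td0_linear` and `one_shot_average` and the two-state example are hypothetical illustrations, not the authors' code or experimental setup.

```python
import random


def td0_linear(P, R, phi, gamma, alpha, steps, seed):
    """One agent: TD(0) with linear features on its own sampled trajectory.
    (Hypothetical helper; P is a row-stochastic transition matrix, R[s] the
    reward in state s, phi[s] the feature vector of state s.)"""
    rng = random.Random(seed)  # independent randomness per agent
    n_states = len(P)
    d = len(phi[0])
    theta = [0.0] * d
    s = rng.randrange(n_states)
    for _ in range(steps):
        # Sample the next state from row P[s] of the transition matrix.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        v_s = sum(f * t for f, t in zip(phi[s], theta))
        v_next = sum(f * t for f, t in zip(phi[s_next], theta))
        delta = R[s] + gamma * v_next - v_s  # TD error
        theta = [t + alpha * delta * f for t, f in zip(theta, phi[s])]
        s = s_next
    return theta


def one_shot_average(P, R, phi, gamma, alpha, steps, n_agents):
    """Agents run TD(0) without communicating; parameters are averaged once at the end."""
    local = [td0_linear(P, R, phi, gamma, alpha, steps, seed=k) for k in range(n_agents)]
    d = len(phi[0])
    return [sum(theta[i] for theta in local) / n_agents for i in range(d)]


# Usage: two-state chain with one-hot (tabular) features, so TD(0) targets the
# true values V = R + gamma * P V, which here are [1.5, 0.5].
P = [[0.5, 0.5], [0.5, 0.5]]
R = [1.0, 0.0]
phi = [[1.0, 0.0], [0.0, 1.0]]
theta_avg = one_shot_average(P, R, phi, gamma=0.5, alpha=0.05, steps=5000, n_agents=10)
print(theta_avg)  # should be close to [1.5, 0.5]
```

Because the agents never communicate during training, the only communication in this sketch is a single vector exchange at the end; averaging $N$ independent runs mainly reduces the variance of the final estimate.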

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics · 2020-01-13 · Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation. Our approach relies on "one-shot averaging," where $N$ agents run identical local copies of the TD(0) method and…

Machine Learning · Computer Science · 2023-05-26 · Rui Liu, Alex Olshevsky

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central…

Machine Learning · Computer Science · 2024-10-22 · Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri

We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space…

Machine Learning · Computer Science · 2024-07-02 · Han Wang, Aritra Mitra, Hamed Hassani, George J. Pappas, James Anderson

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics · 2025-11-18 · Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable,…

The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of…

Machine Learning · Statistics · 2015-06-04 · Soummya Kar, José M. F. Moura, H. Vincent Poor

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear…

Multiagent Systems · Computer Science · 2024-07-31 · Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted…

Machine Learning · Statistics · 2025-05-14 · Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang

Distributed learning is essential to train machine learning algorithms across heterogeneous agents while maintaining data privacy. We conduct an asymptotic analysis of Unified Distributed SGD (UD-SGD), exploring a variety of communication…

Machine Learning · Computer Science · 2024-10-30 · Jie Hu, Yi-Ting Ma, Do Young Eun

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science · 2018-11-07 · Jalaj Bhandari, Daniel Russo, Raghav Singal

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of…

Machine Learning · Computer Science · 2020-03-30 · Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

TD($\lambda$) in value-based MARL algorithms or the Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms synergistically integrate elements from Monte-Carlo simulation and Q function bootstrapping via dynamic…

Machine Learning · Computer Science · 2026-05-13 · Yue Deng, Zirui Wang, Yin Zhang

Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation…

Machine Learning · Computer Science · 2021-07-02 · Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood

We present an algorithm for the problem of linear distributed estimation of a parameter in a network where a set of agents are successively taking measurements. The approach considers a roaming token in a network that carries the estimate,…

Systems and Control · Computer Science · 2018-07-05 · Lucas Balthazar, João Xavier, Bruno Sinopoli

We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Applications include, for example, the control of a large number of robots…

Optimization and Control · Mathematics · 2025-04-30 · René Carmona, Mathieu Laurière, Zongjun Tan

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($\lambda$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step…

Machine Learning · Computer Science · 2021-11-24 · Rohan Deb, Meet Gandhi, Shalabh Bhatnagar
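The abstract above refers to the weights TD($\lambda$) assigns to different $n$-step returns. As a point of reference only, the Python sketch below computes the conventional textbook forward-view $\lambda$-return for one finite episode, $G_t^\lambda = (1-\lambda)\sum_{n=1}^{T-t-1}\lambda^{n-1}G_t^{(n)} + \lambda^{T-t-1}G_t$. Each bootstrapped $n$-step return gets weight $(1-\lambda)\lambda^{n-1}$, and the remaining mass goes to the full Monte Carlo return $G_t$. It does not implement the alternative weighting the paper studies, which the truncated abstract does not state. The function names and the indexing convention are assumptions: `rewards[k]` is the reward received after leaving state $S_k$, `values[k]` estimates $V(S_k)$, and the terminal value is 0.

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n): up to n discounted rewards from time t, then bootstrap from
    values[t + n] if the episode has not terminated (terminal value is 0)."""
    T = len(rewards)
    horizon = min(n, T - t)
    g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g


def lambda_return(rewards, values, t, gamma, lam):
    """Forward-view lambda-return: weight (1 - lam) * lam**(n - 1) on each
    bootstrapped n-step return, and lam**(T - t - 1) on the Monte Carlo return."""
    T = len(rewards)
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
            for n in range(1, T - t))
    mc = n_step_return(rewards, values, t, T - t, gamma)  # full return, no bootstrap
    return g + lam ** (T - t - 1) * mc


# Usage on a hypothetical 3-step episode.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.2, 0.8]
print(lambda_return(rewards, values, 0, gamma=0.9, lam=0.0))  # one-step TD(0) target
print(lambda_return(rewards, values, 0, gamma=0.9, lam=1.0))  # Monte Carlo return
print(lambda_return(rewards, values, 0, gamma=0.9, lam=0.5))  # geometric mixture
```

The two endpoints show why $\lambda$ interpolates between the methods: $\lambda = 0$ puts all weight on the one-step bootstrapped target, while $\lambda = 1$ puts all weight on the full return.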

Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and…

Machine Learning · Computer Science · 2024-06-04 · Bohao Qu, Xiaofeng Cao, Jielong Yang, Hechang Chen, Chang Yi, Ivor W. Tsang, Yew-Soon Ong

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized…

Machine Learning · Computer Science · 2020-01-31 · Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang