Related papers: Sample Complexity Bounds for Two Timescale Value-b…

200 papers

Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our…

Machine Learning · Computer Science 2020-05-21 Yue Wang, Shaofeng Zou

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to…

Machine Learning · Computer Science 2019-09-27 Tengyu Xu, Shaofeng Zou, Yingbin Liang

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. This algorithm was initially proposed with linear function approximation, and was later extended to…

Machine Learning · Computer Science 2021-10-29 Yue Wang, Shaofeng Zou, Yi Zhou

Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two timescale structure with the…

Machine Learning · Computer Science 2024-05-03 Yue Wang, Yi Zhou, Shaofeng Zou

Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, the existing work applied variance reduction to either the less…

Machine Learning · Computer Science 2023-05-23 Shaocong Ma, Yi Zhou, Shaofeng Zou

Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, the finite-time analysis of Greedy-GQ has been developed under linear function approximation and Markovian sampling, and the algorithm is shown…

Machine Learning · Computer Science 2021-03-31 Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been demonstrated to outperform the regular one in the recent study \citep{bellemare2017distributional}. In the…

Machine Learning · Computer Science 2019-04-04 Chao Qu, Shie Mannor, Huan Xu

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample…

Artificial Intelligence · Computer Science 2018-06-06 Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor

Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator using noisy samples and widely used in optimization and Reinforcement Learning (RL). The noise in RL exhibits a Markovian structure, and in…

Machine Learning · Computer Science 2025-05-13 Shaan Ul Haque, Sajad Khodadadian, Siva Theja Maguluri

Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate…

Machine Learning · Computer Science 2019-12-05 Gal Dalal, Balazs Szorenyi, Gugan Thoppe

As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first nested-loop design, actor's one update of…

Machine Learning · Computer Science 2020-05-11 Tengyu Xu, Zhe Wang, Yingbin Liang

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms…

Machine Learning · Computer Science 2020-08-18 Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise…

Dynamical Systems · Mathematics 2017-02-28 Prasenjit Karmakar, Shalabh Bhatnagar

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD)…

Machine Learning · Computer Science 2020-06-09 Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings…

Machine Learning · Computer Science 2025-09-30 Siddharth Chandak, Shaan Ul Haque, Nicholas Bambos

Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy evaluation has achieved great success in modern reinforcement learning. It is shown that such a problem can be reformulated as a stochastic…

Machine Learning · Computer Science 2020-08-25 Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang

Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been…

Machine Learning · Statistics 2020-02-05 Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several…

Machine Learning · Computer Science 2018-03-30 Huizhen Yu

Motivated by broad applications in machine learning, we study the popular accelerated stochastic gradient descent (ASGD) algorithm for solving (possibly nonconvex) optimization problems. We characterize the finite-time performance of this…

Optimization and Control · Mathematics 2020-10-20 Thinh T. Doan, Lam M. Nguyen, Nhan H. Pham, Justin Romberg

Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise, and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show…

Optimization and Control · Mathematics 2022-01-27 Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri