Related papers: Sample Complexity Bounds for Two Timescale Value-b…

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our…

Machine Learning · Computer Science 2020-05-21 Yue Wang , Shaofeng Zou

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to…

Machine Learning · Computer Science 2019-09-27 Tengyu Xu , Shaofeng Zou , Yingbin Liang

Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. This algorithm was initially proposed with linear function approximation, and was later extended to…

Machine Learning · Computer Science 2021-10-29 Yue Wang , Shaofeng Zou , Yi Zhou

Finite-Time Error Bounds for Greedy-GQ

Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two timescale structure with the…

Machine Learning · Computer Science 2024-05-03 Yue Wang , Yi Zhou , Shaofeng Zou

Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, the existing work applied variance reduction to either the less…

Machine Learning · Computer Science 2023-05-23 Shaocong Ma , Yi Zhou , Shaofeng Zou

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, the finite-time analysis of Greedy-GQ has been developed under linear function approximation and Markovian sampling, and the algorithm is shown…

Machine Learning · Computer Science 2021-03-31 Shaocong Ma , Ziyi Chen , Yi Zhou , Shaofeng Zou

Nonlinear Distributional Gradient Temporal-Difference Learning

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been demonstrated to outperform the regular one in the recent study \citep{bellemare2017distributional}. In the…

Machine Learning · Computer Science 2019-04-04 Chao Qu , Shie Mannor , Huan Xu

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample…

Artificial Intelligence · Computer Science 2018-06-06 Gal Dalal , Balazs Szorenyi , Gugan Thoppe , Shie Mannor

Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise

Stochastic approximation (SA) is an iterative algorithm for finding the fixed point of an operator using noisy samples and widely used in optimization and Reinforcement Learning (RL). The noise in RL exhibits a Markovian structure, and in…

Machine Learning · Computer Science 2025-05-13 Shaan Ul Haque , Sajad Khodadadian , Siva Theja Maguluri

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate…

Machine Learning · Computer Science 2019-12-05 Gal Dalal , Balazs Szorenyi , Gugan Thoppe

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first nested-loop design, actor's one update of…

Machine Learning · Computer Science 2020-05-11 Tengyu Xu , Zhe Wang , Yingbin Liang

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms…

Machine Learning · Computer Science 2020-08-18 Huaqing Xiong , Tengyu Xu , Yingbin Liang , Wei Zhang

Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise…

Dynamical Systems · Mathematics 2017-02-28 Prasenjit Karmakar , Shalabh Bhatnagar

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD)…

Machine Learning · Computer Science 2020-06-09 Bo Liu , Ian Gemp , Mohammad Ghavamzadeh , Ji Liu , Sridhar Mahadevan , Marek Petrik

Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise

Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings…

Machine Learning · Computer Science 2025-09-30 Siddharth Chandak , Shaan Ul Haque , Nicholas Bambos

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy evaluation has achieved great success in modern reinforcement learning. It is shown that such a problem can be reformulated as a stochastic…

Machine Learning · Computer Science 2020-08-25 Shuang Qiu , Zhuoran Yang , Xiaohan Wei , Jieping Ye , Zhaoran Wang

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been…

Machine Learning · Statistics 2020-02-05 Maxim Kaledin , Eric Moulines , Alexey Naumov , Vladislav Tadic , Hoi-To Wai

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several…

Machine Learning · Computer Science 2018-03-30 Huizhen Yu

Convergence Rates of Accelerated Markov Gradient Descent with Applications in Reinforcement Learning

Motivated by broad applications in machine learning, we study the popular accelerated stochastic gradient descent (ASGD) algorithm for solving (possibly nonconvex) optimization problems. We characterize the finite-time performance of this…

Optimization and Control · Mathematics 2020-10-20 Thinh T. Doan , Lam M. Nguyen , Nhan H. Pham , Justin Romberg

Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise, and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show…

Optimization and Control · Mathematics 2022-01-27 Zaiwei Chen , Sheng Zhang , Thinh T. Doan , John-Paul Clarke , Siva Theja Maguluri