Related papers: Gaussian Approximation for Asynchronous Q-learning

On Gaussian approximation for entropy-regularized Q-learning with function approximation

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial…

Machine Learning · Statistics 2026-05-19 Artemy Rubtsov, Rahul Singh, Eric Moulines, Alexey Naumov, Sergey Samsonov

Central Limit Theorems for Asynchronous Averaged Q-Learning

This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the…

Machine Learning · Computer Science 2026-04-21 Xingtu Liu

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Algorithms for solving nonlinear fixed-point equations -- such as average-reward $Q$-learning and TD-learning -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these…

Machine Learning · Computer Science 2026-03-24 Ankur Naskar, Gugan Thoppe, Vijay Gupta

Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation

In this paper, we refine the Berry-Esseen bounds for the multivariate normal approximation of Polyak-Ruppert averaged iterates arising from the linear stochastic approximation (LSA) algorithm with decreasing step size. We consider the…

Machine Learning · Statistics 2025-10-15 Bogdan Butyrin, Eric Moulines, Alexey Naumov, Sergey Samsonov, Qi-Man Shao, Zhuo-Song Zhang

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this…

Machine Learning · Statistics 2024-01-26 Yixuan Zhang, Qiaomin Xie

A Statistical Analysis of Polyak-Ruppert Averaged Q-learning

We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings. Under a Lipschitz condition, we establish a functional central limit theorem for the averaged iteration…

Machine Learning · Statistics 2023-02-21 Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan

Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic Approximation

This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares…

Machine Learning · Computer Science 2026-02-03 Seo Taek Kong, R. Srikant

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

This work presents the first finite-time analysis for the last-iterate convergence of average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as…

Machine Learning · Computer Science 2026-04-07 Zaiwei Chen, Phalguni Nanda

Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation

We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation…

Machine Learning · Computer Science 2025-12-12 Seo Taek Kong, Sihan Zeng, Thinh T. Doan, R. Srikant

A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this…

Machine Learning · Statistics 2025-05-29 Sajad Khodadadian, Martin Zubeldia

Zap Q-Learning for Optimal Stopping Time Problems

The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact…

Systems and Control · Computer Science 2019-10-01 Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn

General multilevel adaptations for stochastic approximation algorithms

In this article we establish central limit theorems for multilevel Polyak-Ruppert averaged stochastic approximation schemes. We work under very mild technical assumptions and consider the slow regime in which typical errors decay like…

Probability · Mathematics 2019-12-18 Steffen Dereich

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} \theta = \bar{b}$. When the matrix $\bar{A}$ is Hurwitz,…

Machine Learning · Statistics 2020-04-10 Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu, Quanquan Gu

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains. We then show that these results can be applied to…

Probability · Mathematics 2026-02-10 R. Srikant

On the Rate of Gaussian Approximation for Linear Regression Problems

In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate…

Machine Learning · Statistics 2025-09-18 Marat Khusainov, Marina Sheshukova, Alain Durmus, Sergey Samsonov

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first reformulating the RL algorithms as…

Machine Learning · Computer Science 2023-09-06 Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Gaussian Approximation for Two-Timescale Linear Stochastic Approximation

In this paper, we establish non-asymptotic bounds for accuracy of normal approximation for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale difference or Markov noise. Focusing on both the last iterate…

Machine Learning · Statistics 2025-12-10 Bogdan Butyrin, Artemy Rubtsov, Alexey Naumov, Vladimir Ulyanov, Sergey Samsonov

Stochastic Approximation for Risk-aware Markov Decision Processes

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point…

Optimization and Control · Mathematics 2019-12-05 Wenjie Huang, William B. Haskell

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the…

Machine Learning · Statistics 2026-02-25 Weichen Wu, Gen Li, Yuting Wei, Alessandro Rinaldo