Related papers: Gaussian Approximation for Asynchronous Q-learning

200 papers

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial…

Machine Learning · Statistics 2026-05-19 Artemy Rubtsov , Rahul Singh , Eric Moulines , Alexey Naumov , Sergey Samsonov

This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the…

Machine Learning · Computer Science 2026-04-21 Xingtu Liu

Algorithms for solving nonlinear fixed-point equations -- such as average-reward $Q$-learning and TD-learning -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these…

Machine Learning · Computer Science 2026-03-24 Ankur Naskar , Gugan Thoppe , Vijay Gupta

In this paper, we refine the Berry-Esseen bounds for the multivariate normal approximation of Polyak-Ruppert averaged iterates arising from the linear stochastic approximation (LSA) algorithm with decreasing step size. We consider the…

Machine Learning · Statistics 2025-10-15 Bogdan Butyrin , Eric Moulines , Alexey Naumov , Sergey Samsonov , Qi-Man Shao , Zhuo-Song Zhang

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this…

Machine Learning · Statistics 2024-01-26 Yixuan Zhang , Qiaomin Xie

We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings. Under a Lipschitz condition, we establish a functional central limit theorem for the averaged iteration…

Machine Learning · Statistics 2023-02-21 Xiang Li , Wenhao Yang , Jiadong Liang , Zhihua Zhang , Michael I. Jordan

This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares…

Machine Learning · Computer Science 2026-02-03 Seo Taek Kong , R. Srikant

This work presents the first finite-time analysis for the last-iterate convergence of average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as…

Machine Learning · Computer Science 2026-04-07 Zaiwei Chen , Phalguni Nanda

We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation…

Machine Learning · Computer Science 2025-12-12 Seo Taek Kong , Sihan Zeng , Thinh T. Doan , R. Srikant

Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this…

Machine Learning · Statistics 2025-05-29 Sajad Khodadadian , Martin Zubeldia

The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact…

Systems and Control · Computer Science 2019-10-01 Shuhang Chen , Adithya M. Devraj , Ana Bušić , Sean P. Meyn

In this article we establish central limit theorems for multilevel Polyak-Ruppert averaged stochastic approximation schemes. We work under very mild technical assumptions and consider the slow regime in which typical errors decay like…

Probability · Mathematics 2019-12-18 Steffen Dereich

We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} \theta = \bar{b}$. When the matrix $\bar{A}$ is Hurwitz,…

Machine Learning · Statistics 2020-04-10 Wenlong Mou , Chris Junchi Li , Martin J. Wainwright , Peter L. Bartlett , Michael I. Jordan

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning…

Machine Learning · Computer Science 2020-03-05 Pan Xu , Quanquan Gu

We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains. We then show that these results can be applied to…

Probability · Mathematics 2026-02-10 R. Srikant

In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate…

Machine Learning · Statistics 2025-09-18 Marat Khusainov , Marina Sheshukova , Alain Durmus , Sergey Samsonov

This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first reformulating the RL algorithms as…

Machine Learning · Computer Science 2023-09-06 Zaiwei Chen , Siva Theja Maguluri , Sanjay Shakkottai , Karthikeyan Shanmugam

In this paper, we establish non-asymptotic bounds for accuracy of normal approximation for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale difference or Markov noise. Focusing on both the last iterate…

Machine Learning · Statistics 2025-12-10 Bogdan Butyrin , Artemy Rubtsov , Alexey Naumov , Vladimir Ulyanov , Sergey Samsonov

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point…

Optimization and Control · Mathematics 2019-12-05 Wenjie Huang , William B. Haskell

We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the…

Machine Learning · Statistics 2026-02-25 Weichen Wu , Gen Li , Yuting Wei , Alessandro Rinaldo