Related papers: Accelerated and instance-optimal policy evaluation…

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of…

Machine Learning · Statistics · 2020-03-17 · Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity

We study the oracle complexity of gradient-based methods for stochastic approximation problems. Though in many settings optimal algorithms and tight lower bounds are known for such problems, these optimal algorithms do not achieve the best…

Optimization and Control · Mathematics · 2022-06-20 · Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

Optimal oracle inequalities for solving projected fixed-point equations

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random…

Machine Learning · Computer Science · 2020-12-11 · Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science · 2025-05-20 · Jiashuo Jiang, Yiming Zong, Yinyu Ye

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$…

Machine Learning · Computer Science · 2023-01-30 · Thanh Nguyen-Tang, Ming Yin, Sunil Gupta, Svetha Venkatesh, Raman Arora

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the…

Optimization and Control · Mathematics · 2024-05-14 · Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

High-probability sample complexities for policy evaluation with linear function approximation

This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation…

Machine Learning · Statistics · 2024-05-03 · Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei

Instance-optimal stochastic convex optimization: Can we improve upon sample-average and robust stochastic approximation?

We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise; this is a canonical and widely-studied setting that arises…

Optimization and Control · Mathematics · 2026-03-27 · Liwei Jiang, Ashwin Pananjady

Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variability in their convergence rates and ultimate accuracy as a function of the problem structure. Such instance-specific behavior is not captured by existing global minimax…

Machine Learning · Statistics · 2021-06-29 · Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

Robust, Accurate Stochastic Optimization for Variational Inference

We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior…

Machine Learning · Computer Science · 2025-09-22 · Akash Kumar Dhaka, Alejandro Catalina, Michael Riis Andersen, Måns Magnusson, Jonathan H. Huggins, Aki Vehtari

Adaptive Oracle-Efficient Online Learning

The classical algorithms for online learning and decision-making have the benefit of achieving the optimal performance guarantees, but suffer from computational complexity limitations when implemented at scale. More recent sophisticated…

Machine Learning · Computer Science · 2022-10-19 · Guanghui Wang, Zihao Hu, Vidya Muthukumar, Jacob Abernethy

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true…

Machine Learning · Computer Science · 2023-07-21 · Andrew Wagenmaker, Kevin Jamieson

A Complete Characterization of Linear Estimators for Offline Policy Evaluation

Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle…

Machine Learning · Computer Science · 2022-12-20 · Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade

Incremental Truncated LSTD

Balancing computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a linear time complexity in the…

Machine Learning · Computer Science · 2016-11-21 · Clement Gehring, Yangchen Pan, Martha White

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired…

Machine Learning · Statistics · 2022-01-24 · Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency

We study optimal procedures for estimating a linear functional based on observational data. In many problems of this kind, a widely used assumption is strict overlap, i.e., uniform boundedness of the importance ratio, which measures how…

Statistics Theory · Mathematics · 2023-01-18 · Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior…

Optimization and Control · Mathematics · 2021-08-17 · Georgios Kotsalis, Guanghui Lan, Tianjiao Li

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decouples learning from data collection; it is useful for dealing with the exploration-exploitation tradeoff and enables data reuse in many applications. In this work, we study two offline…

Machine Learning · Computer Science · 2022-02-08 · Jing Dong, Xin T. Tong

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history…

Machine Learning · Computer Science · 2020-02-25 · Yaqi Duan, Mengdi Wang

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an…

Machine Learning · Computer Science · 2022-06-23 · Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson