Related papers: Accelerated and instance-optimal policy evaluation…

200 papers

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of…

Machine Learning · Statistics 2020-03-17 Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

We study oracle complexity of gradient based methods for stochastic approximation problems. Though in many settings optimal algorithms and tight lower bounds are known for such problems, these optimal algorithms do not achieve the best…

Optimization and Control · Mathematics 2022-06-20 Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random…

Machine Learning · Computer Science 2020-12-11 Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$…

Machine Learning · Computer Science 2023-01-30 Thanh Nguyen-Tang, Ming Yin, Sunil Gupta, Svetha Venkatesh, Raman Arora

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the…

Optimization and Control · Mathematics 2024-05-14 Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation…

Machine Learning · Statistics 2024-05-03 Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei

We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise; this is a canonical and widely-studied setting that arises…

Optimization and Control · Mathematics 2026-03-27 Liwei Jiang, Ashwin Pananjady

Various algorithms in reinforcement learning exhibit dramatic variability in their convergence rates and ultimate accuracy as a function of the problem structure. Such instance-specific behavior is not captured by existing global minimax…

Machine Learning · Statistics 2021-06-29 Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior…

The classical algorithms for online learning and decision-making have the benefit of achieving the optimal performance guarantees, but suffer from computational complexity limitations when implemented at scale. More recent sophisticated…

Machine Learning · Computer Science 2022-10-19 Guanghui Wang, Zihao Hu, Vidya Muthukumar, Jacob Abernethy

While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true…

Machine Learning · Computer Science 2023-07-21 Andrew Wagenmaker, Kevin Jamieson

Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle…

Machine Learning · Computer Science 2022-12-20 Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a linear time complexity in the…

Machine Learning · Computer Science 2016-11-21 Clement Gehring, Yangchen Pan, Martha White

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired…

Machine Learning · Statistics 2022-01-24 Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

We study optimal procedures for estimating a linear functional based on observational data. In many problems of this kind, a widely used assumption is strict overlap, i.e., uniform boundedness of the importance ratio, which measures how…

Statistics Theory · Mathematics 2023-01-18 Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett

The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior…

Optimization and Control · Mathematics 2021-08-17 Georgios Kotsalis, Guanghui Lan, Tianjiao Li

In reinforcement learning (RL), offline learning decouples learning from data collection, is useful in dealing with the exploration-exploitation tradeoff, and enables data reuse in many applications. In this work, we study two offline…

Machine Learning · Computer Science 2022-02-08 Jing Dong, Xin T. Tong

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history…

Machine Learning · Computer Science 2020-02-25 Yaqi Duan, Mengdi Wang

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an…

Machine Learning · Computer Science 2022-06-23 Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson