Related papers: Adaptive Approximate Policy Iteration

Reinforcement Learning in MDPs with Information-Ordered Policies

We propose an epoch-based reinforcement learning algorithm for infinite-horizon average-cost Markov decision processes (MDPs) that leverages a partial order over a policy class. In this structure, $\pi' \leq \pi$ if data collected under…

Machine Learning · Statistics 2025-08-07 Zhongjun Zhang, Shipra Agrawal, Ilan Lobel, Sean R. Sinclair, Christina Lee Yu

Approximate Modified Policy Iteration

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form…

Artificial Intelligence · Computer Science 2012-05-21 Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov…

Machine Learning · Computer Science 2020-02-26 Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

Learning Markov decision processes (MDPs) in the presence of an adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the…

Machine Learning · Computer Science 2022-04-21 Jiafan He, Dongruo Zhou, Quanquan Gu

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy…

Machine Learning · Computer Science 2021-02-26 Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs

We study the problem of infinite-horizon average-reward reinforcement learning with linear Markov decision processes (MDPs). The associated Bellman operator of the problem not being a contraction makes the algorithm design challenging.…

Machine Learning · Statistics 2025-03-12 Kihyuk Hong, Woojin Chae, Yufan Zhang, Dabeen Lee, Ambuj Tewari

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is…

Machine Learning · Statistics 2024-05-06 Sang Bin Moon, Abolfazl Hashemi

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman, Tomer Koren, Yishay Mansour

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem in hand. Most current methods are geared towards exploiting the…

Machine Learning · Computer Science 2014-07-03 Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are both performed approximately. In applications where the…

Machine Learning · Computer Science 2023-06-29 Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Policy Learning with Adaptively Collected Data

Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education. The growing policy learning literature focuses on settings where…

Machine Learning · Statistics 2022-11-17 Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret…

Machine Learning · Computer Science 2023-12-13 Xiang Ji, Gen Li

Exploration-Enhanced POLITEX

We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration…

Machine Learning · Computer Science 2019-08-29 Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

We consider the problem of learning in adversarial Markov decision processes (MDPs) with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$…

Machine Learning · Computer Science 2025-03-06 Daniil Tiapkin, Evgenii Chzhen, Gilles Stoltz

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy…

Artificial Intelligence · Computer Science 2011-09-13 A. Fern, R. Givan, S. Yoon

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the…

Machine Learning · Computer Science 2021-10-27 Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2024-05-30 Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

An Approximate Dynamic Programming Approach to Adversarial Online Learning

We describe an approximate dynamic programming (ADP) approach to compute approximations of the optimal strategies and of the minimal losses that can be guaranteed in discounted repeated games with vector-valued losses. Such games…

Computer Science and Game Theory · Computer Science 2020-10-27 Vijay Kamble, Patrick Loiseau, Jean Walrand