Related papers: Representation Policy Iteration

Geometric Policy Iteration for Markov Decision Processes

Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning. We investigate the value function polytope…

Machine Learning · Computer Science 2022-06-27 Yue Wu, Jesús A. De Loera

Learning Efficient Representations for Reinforcement Learning

Markov decision processes (MDPs) are a well-studied framework for solving sequential decision-making problems under uncertainty. Exact methods for solving MDPs based on dynamic programming such as policy iteration and value iteration are…

Artificial Intelligence · Computer Science 2015-09-09 Yanping Huang

Representation Learning on Graphs: A Reinforcement Learning Application

In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of…

Machine Learning · Computer Science 2019-01-18 Sephora Madjiheurem, Laura Toni

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Policy Iteration for Relational MDPs

Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…

Artificial Intelligence · Computer Science 2012-06-26 Chenggang Wang, Roni Khardon

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties…

Machine Learning · Computer Science 2021-02-15 Kishan Panaganti, Dileep Kalathil

Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

Policy Iteration for Factored MDPs

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has shown that value functions in factored MDPs can often…

Artificial Intelligence · Computer Science 2013-01-18 Daphne Koller, Ron Parr

Geometric Re-Analysis of Classical MDP Solving Algorithms

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…

Machine Learning · Computer Science 2025-03-07 Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not…

Machine Learning · Computer Science 2026-05-14 Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

Learning Policy Representations for Steerable Behavior Synthesis

Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time. As policies of an MDP are uniquely determined by their occupancy measures, we propose modeling…

Machine Learning · Computer Science 2026-02-02 Beiming Li, Sergio Rozada, Alejandro Ribeiro

Stochastic convex optimization for provably efficient apprenticeship learning

We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of…

Machine Learning · Computer Science 2022-01-04 Angeliki Kamoutsi, Goran Banjac, John Lygeros

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…

Machine Learning · Computer Science 2022-10-31 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

On the Complexity of Policy Iteration

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the…

Artificial Intelligence · Computer Science 2013-01-30 Yishay Mansour, Satinder Singh

Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity

Reinforcement Learning (RL) encompasses diverse paradigms, including model-based RL, policy-based RL, and value-based RL, each tailored to approximate the model, optimal policy, and optimal value function, respectively. This work…

Machine Learning · Computer Science 2024-12-10 Guhao Feng, Han Zhong

Bridging State and History Representations: Understanding Self-Predictive RL

Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical…

Machine Learning · Computer Science 2024-04-23 Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning…

Machine Learning · Computer Science 2018-08-23 Dimitri P. Bertsekas

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy…

Artificial Intelligence · Computer Science 2011-09-13 A. Fern, R. Givan, S. Yoon

Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space -- Fundamental Theory and Methods

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the fundamental for…

Artificial Intelligence · Computer Science 2021-04-06 Jaeyoung Lee, Richard S. Sutton

Proper Laplacian Representation Learning

The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising…

Machine Learning · Computer Science 2024-04-04 Diego Gomez, Michael Bowling, Marlos C. Machado