Related papers: Optimistic Initialization and Greediness Lead to P…

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based…

Machine Learning · Computer Science 2020-06-25 Yi Tian, Jian Qian, Suvrit Sra

Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions

Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first…

Machine Learning · Computer Science 2022-03-08 Zihao Deng, Siddartha Devic, Brendan Juba

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Incentive Decision Processes

We consider Incentive Decision Processes, where a principal seeks to reduce its costs due to another agent's behavior, by offering incentives to the agent for alternate behavior. We focus on the case where a principal interacts with a…

Computer Science and Game Theory · Computer Science 2012-10-19 Sashank J. Reddi, Emma Brunskill

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…

Artificial Intelligence · Computer Science 2008-08-13 Istvan Szita, Andras Lorincz

Planning in Observable POMDPs in Quasipolynomial Time

Partially Observable Markov Decision Processes (POMDPs) are a natural and general model in reinforcement learning that take into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume…

Machine Learning · Computer Science 2022-03-24 Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Thompson Sampling for Learning Parameterized Markov Decision Processes

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, observing a particular state transition might…

Machine Learning · Statistics 2015-04-01 Aditya Gopalan, Shie Mannor

Learning in Observable POMDPs, without Computationally Intractable Oracles

Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms…

Machine Learning · Computer Science 2022-06-08 Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Infinite-Horizon Policy-Gradient Estimation

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in…

Artificial Intelligence · Computer Science 2019-11-18 Jonathan Baxter, Peter L. Bartlett

On Optimistic versus Randomized Exploration in Reinforcement Learning

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and…

Machine Learning · Statistics 2017-06-15 Ian Osband, Benjamin Van Roy

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factorization structure of FMDP. The regret of FMDP-BF is shown to be exponentially…

Machine Learning · Computer Science 2021-03-11 Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…

Artificial Intelligence · Computer Science 2011-06-02 N. L. Zhang, W. Zhang

Learning Optimal Admission Control in Partially Observable Queueing Networks

We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and…

Machine Learning · Computer Science 2023-08-07 Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret…

Machine Learning · Computer Science 2023-12-13 Xiang Ji, Gen Li

Greedy Algorithm for Inference of Decision Trees from Decision Rule Systems

Decision trees and decision rule systems play important roles as classifiers, knowledge representation tools, and algorithms. They are easily interpretable models for data analysis, making them widely used and studied in computer science.…

Artificial Intelligence · Computer Science 2024-01-17 Kerven Durdymyradov, Mikhail Moshkov

Stochastic convex optimization for provably efficient apprenticeship learning

We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of…

Machine Learning · Computer Science 2022-01-04 Angeliki Kamoutsi, Goran Banjac, John Lygeros

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…

Artificial Intelligence · Computer Science 2017-06-20 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to…

Multiagent Systems · Computer Science 2026-05-05 Ruoning Zhang, Siying Wang, Wenyu Chen, Yang Zhou, Zhitong Zhao, Zixuan Zhang, Ruijie Zhang, Stefano V. Albrecht

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is…

Machine Learning · Computer Science 2023-06-09 Han Zhong, Tong Zhang

A Unifying View of Optimism in Episodic Reinforcement Learning

The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the…

Machine Learning · Computer Science 2020-07-07 Gergely Neu, Ciara Pike-Burke