English
Related papers

Related papers: Exploration-Enhanced POLITEX

200 papers

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy…

Machine Learning · Computer Science 2021-02-26 Nevena Lazic , Dong Yin , Yasin Abbasi-Yadkori , Csaba Szepesvari

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman , Tomer Koren , Yishay Mansour

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao , Nevena Lazic , Yasin Abbasi-Yadkori , Pooria Joulani , Csaba Szepesvari

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward. Although the agent will…

Machine Learning · Computer Science 2020-07-16 Evrard Garcelon , Mohammad Ghavamzadeh , Alessandro Lazaric , Matteo Pirotta

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin , Gergely Neu , Luca Viano

We study the problem of infinite-horizon average-reward reinforcement learning with linear Markov decision processes (MDPs). The associated Bellman operator of the problem not being a contraction makes the algorithm design challenging.…

Machine Learning · Statistics 2025-03-12 Kihyuk Hong , Woojin Chae , Yufan Zhang , Dabeen Lee , Ambuj Tewari

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an…

Machine Learning · Computer Science 2022-06-23 Andrew Wagenmaker , Max Simchowitz , Kevin Jamieson

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with…

Machine Learning · Computer Science 2024-12-30 Hippolyte Bourel , Anders Jonsson , Odalric-Ambrym Maillard , Chenxiao Ma , Mohammad Sadegh Talebi

We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in…

Machine Learning · Computer Science 2024-05-07 Mengfan Xu , Diego Klabjan

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world,…

Machine Learning · Computer Science 2024-07-23 Dilip Arumugam , Saurabh Kumar , Ramki Gummadi , Benjamin Van Roy

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new…

Machine Learning · Computer Science 2020-02-10 Chi Jin , Akshay Krishnamurthy , Max Simchowitz , Tiancheng Yu

In model-based solution approaches to the problem of learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. The optimal performance with respect to regret or PAC bounds is achievable, if the…

Machine Learning · Computer Science 2015-10-13 P. Prasanna , Sarath Chandar , Balaraman Ravindran

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.…

Machine Learning · Computer Science 2020-08-14 Alekh Agarwal , Mikael Henaff , Sham Kakade , Wen Sun

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional…

Machine Learning · Computer Science 2023-02-21 Max Simchowitz , Aleksandrs Slivkins

This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that randomized certainty equivalent policy addresses the…

Machine Learning · Computer Science 2022-08-23 Mohamad Kazem Shirani Faradonbeh

We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over…

Machine Learning · Computer Science 2022-07-19 Yi Xiong , Ningyuan Chen , Xuefeng Gao , Xiang Zhou

Policy optimization is a widely-used method in reinforcement learning. Due to its local-search nature, however, theoretical guarantees on global optimality often rely on extra assumptions on the Markov Decision Processes (MDPs) that bypass…

Machine Learning · Computer Science 2021-07-20 Haipeng Luo , Chen-Yu Wei , Chung-Wei Lee

We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs. For…

Machine Learning · Computer Science 2018-11-30 Jungseul Ok , Alexandre Proutiere , Damianos Tranos

We tackle average-reward infinite-horizon POMDPs with an unknown transition model but a known observation model, a setting that has been previously addressed in two limiting ways: (i) frequentist methods relying on suboptimal stochastic…

Machine Learning · Computer Science 2025-09-09 Alessio Russo , Alberto Maria Metelli , Marcello Restelli

The specification of aMarkov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work…

Artificial Intelligence · Computer Science 2012-05-14 Kevin Regan , Craig Boutilier
‹ Prev 1 2 3 10 Next ›