Related papers: Improved Regret for Efficient Online Reinforcement…

200 papers

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge of transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science 2023-10-19 Haolin Liu, Chen-Yu Wei, Julian Zimmert

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in…

Machine Learning · Computer Science 2023-06-05 Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert

Motivated by the strategic participation of electricity producers in the electricity day-ahead market, we study the problem of online learning in repeated multi-unit uniform price auctions, focusing on the adversarial opposing-bid setting. The…

Computer Science and Game Theory · Computer Science 2025-01-20 Marius Potfer, Dorian Baudry, Hugo Richard, Vianney Perchet, Cheng Wan

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle that learns efficiently in reinforcement learning (RL) problems modeled by a Markov decision process (MDP) with a finite state-action space. By…

Machine Learning · Computer Science 2020-01-01 Zihan Zhang, Xiangyang Ji

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}( \sqrt{HSAT} + H^2S^2A+H\sqrt{T})$…

Machine Learning · Statistics 2017-07-04 Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of…

Machine Learning · Computer Science 2020-06-19 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

We study how representation learning can improve the efficiency of bandit problems. We study the setting where we play $T$ linear bandits with dimension $d$ concurrently, and these $T$ bandit tasks share a common $k (\ll d)$ dimensional…

Machine Learning · Computer Science 2021-05-06 Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm…

Machine Learning · Computer Science 2024-12-31 Chen-Yu Wei, Christoph Dann, Julian Zimmert

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Policy optimization is a widely used method in reinforcement learning. Due to its local-search nature, however, theoretical guarantees on global optimality often rely on extra assumptions on the Markov Decision Processes (MDPs) that bypass…

Machine Learning · Computer Science 2021-07-20 Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Learning Markov decision processes (MDPs) in an adversarial environment has been a challenging problem. The problem becomes even more challenging with function approximation, since the underlying structure of the loss function and transition…

Machine Learning · Computer Science 2023-02-15 Fang Kong, Xiangcheng Zhang, Baoxiang Wang, Shuai Li

We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time. Under the assumption that the…

Machine Learning · Computer Science 2022-05-25 Gergely Neu, Julia Olkhovskaya

Modern tasks in reinforcement learning have large state and action spaces. To deal with them efficiently, one often uses predefined feature mapping to represent states and actions in a low-dimensional space. In this paper, we study…

Machine Learning · Computer Science 2021-02-24 Dongruo Zhou, Jiafan He, Quanquan Gu

We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration…

Machine Learning · Computer Science 2023-07-13 Yichun Hu, Nathan Kallus, Masatoshi Uehara

Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making. While such bounds…

Machine Learning · Computer Science 2022-10-24 Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However…

Machine Learning · Computer Science 2019-06-14 Lin F. Yang, Mengdi Wang

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets…

Machine Learning · Computer Science 2021-06-15 Gergely Neu, Julia Olkhovskaya

We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex…

Machine Learning · Statistics 2023-10-04 Gergely Neu, Julia Olkhovskaya, Sattar Vakili

In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret…

Machine Learning · Statistics 2024-08-21 Xiaofeng Lin, Xuezhou Zhang

We consider online reinforcement learning in an episodic Markov decision process (MDP) with an unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami