English
Related papers

Related papers: Online Learning in Kernelized Markov Decision Proc…

200 papers

We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling…

Machine Learning · Computer Science 2019-11-06 Sayak Ray Chowdhury , Aditya Gopalan

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon , S. Rasoul Etesami

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and…

Machine Learning · Computer Science 2015-10-16 Yao Ma , Hao Zhang , Masashi Sugiyama

We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we…

Machine Learning · Computer Science 2025-02-10 Francesco Emanuele Stradi , Matteo Castiglioni , Alberto Marchesi , Nicola Gatti

We consider online learning for episodic stochastically constrained Markov decision processes (CMDPs), which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the…

Machine Learning · Computer Science 2021-10-19 Shuang Qiu , Xiaohan Wei , Zhuoran Yang , Jieping Ye , Zhaoran Wang

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…

Machine Learning · Computer Science 2024-09-27 Francesco Emanuele Stradi , Anna Lunghi , Matteo Castiglioni , Alberto Marchesi , Nicola Gatti

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well…

Optimization and Control · Mathematics 2017-09-12 Xiaohan Wei , Hao Yu , Michael J. Neely

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show…

Machine Learning · Computer Science 2019-05-21 Aviv Rosenberg , Yishay Mansour

We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any…

Machine Learning · Computer Science 2022-08-29 Peng Zhao , Long-Fei Li , Zhi-Hua Zhou

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets…

Machine Learning · Computer Science 2021-06-15 Gergely Neu , Julia Olkhovskaya

We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific…

Machine Learning · Computer Science 2021-06-29 Damianos Tranos , Alexandre Proutiere

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state…

Machine Learning · Computer Science 2017-06-07 Gergely Neu , Vicenç Gómez

Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all…

Machine Learning · Computer Science 2024-07-22 Adrian Müller , Pragnya Alatur , Volkan Cevher , Giorgia Ramponi , Niao He

We propose novel classical and quantum online algorithms for learning finite-horizon and infinite-horizon average-reward Markov Decision Processes (MDPs). Our algorithms are based on a hybrid exploration-generative reinforcement learning…

Machine Learning · Computer Science 2025-08-12 Andris Ambainis , Joao F. Doriguello , Debbie Lim

We study learning in periodic Markov Decision Process(MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We…

Machine Learning · Computer Science 2022-07-26 Ayush Aniket , Arpan Chattopadhyay

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online…

Machine Learning · Computer Science 2017-04-18 Ronan Fruit , Alessandro Lazaric

We study learning in periodic Markov Decision Process (MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We…

Machine Learning · Computer Science 2023-03-20 Ayush Aniket , Arpan Chattopadhyay

We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over…

Machine Learning · Computer Science 2022-07-19 Yi Xiong , Ningyuan Chen , Xuefeng Gao , Xiang Zhou

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with…

Machine Learning · Computer Science 2022-03-25 Omar Darwiche Domingues , Pierre Ménard , Matteo Pirotta , Emilie Kaufmann , Michal Valko

We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an…

Machine Learning · Computer Science 2013-03-14 Yasin Abbasi-Yadkori , Peter L. Bartlett , Csaba Szepesvari
‹ Prev 1 2 3 10 Next ›