Related papers: Model-based Reinforcement Learning for Continuous …

200 papers

This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation; and refine…

Machine Learning · Statistics 2024-03-19 Yingru Li, Zhi-Quan Luo

Most provably-efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning…

Machine Learning · Statistics 2013-12-30 Ian Osband, Daniel Russo, Benjamin Van Roy

We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop…

Machine Learning · Statistics 2026-03-10 Hamish Flynn, Joe Watson, Ingmar Posner, Jan Peters

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach,…

Machine Learning · Computer Science 2025-10-15 Wanqiao Xu, Shi Dong, Benjamin Van Roy

Constrained Markov decision processes (CMDPs) model scenarios of sequential decision making with multiple objectives that are increasingly important in many applications. However, the model is often unknown and must be learned online while…

Machine Learning · Computer Science 2023-01-30 Krishna C Kalagarla, Rahul Jain, Pierluigi Nuzzo

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when…

Machine Learning · Computer Science 2020-06-02 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL-SSP, a simple posterior sampling-based reinforcement learning…

Machine Learning · Computer Science 2021-06-11 Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after…

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2024-05-30 Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is…

Machine Learning · Computer Science 2018-10-24 Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms…

Machine Learning · Computer Science 2023-11-07 Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma

Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an…

Machine Learning · Computer Science 2023-05-19 Remo Sasso, Michelangelo Conserva, Paulo Rauber

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the…

Machine Learning · Statistics 2017-06-14 Ian Osband, Benjamin Van Roy

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics 2014-11-04 Ian Osband, Benjamin Van Roy

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2023-09-28 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend…

Machine Learning · Computer Science 2025-07-25 Yilie Huang, Yanwei Jia, Xun Yu Zhou

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However…

Machine Learning · Computer Science 2019-06-14 Lin F. Yang, Mengdi Wang

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh, Xingyu Zhou, Ness Shroff

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action space is…

Machine Learning · Computer Science 2019-05-07 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice…

Machine Learning · Computer Science 2017-06-08 Sudeep Raja Putta, Theja Tulabandhula