Related papers: Randomized Exploration for Reinforcement Learning …

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory.…

Machine Learning · Computer Science 2023-09-11 Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

Generalization and Exploration via Randomized Value Functions

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value…

Machine Learning · Statistics 2016-02-16 Ian Osband, Benjamin Van Roy, Zheng Wen

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning. These randomized value functions are generated by injecting random noise into the training data, making the approach…

Machine Learning · Computer Science 2024-09-23 Daniel Russo

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

We study algorithms using randomized value functions for exploration in reinforcement learning. This type of algorithm enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a…

Machine Learning · Computer Science 2022-10-14 Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh, Xingyu Zhou, Ness Shroff

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm,…

Machine Learning · Computer Science 2021-11-10 Priyank Agrawal, Jinglin Chen, Nan Jiang

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility. We propose two provably…

Machine Learning · Computer Science 2020-06-25 Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

Deep Exploration via Randomized Value Functions

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to…

Machine Learning · Statistics 2019-09-25 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is…

Machine Learning · Statistics 2019-12-10 Yining Wang, Ruosong Wang, Simon S. Du, Akshay Krishnamurthy

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic…

Machine Learning · Computer Science 2023-01-31 Pihe Hu, Yu Chen, Longbo Huang

Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning

Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning (RL). The agent acts greedily with respect to an estimate of the value plus what can be seen as a value bonus. The value bonus can be…

Machine Learning · Computer Science 2026-02-16 Abdul Wahab, Raksha Kumaraswamy, Martha White

Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on developing theory for RL with linear function approximation, the understanding…

Machine Learning · Computer Science 2020-06-22 Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang

Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback

Reinforcement learning with human feedback (RLHF), which learns a reward model from human preference data and then optimizes a policy to favor preferred responses, has emerged as a central paradigm for aligning large language models (LLMs)…

Machine Learning · Statistics 2025-09-29 Gen Li, Yuling Yan

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural…

Machine Learning · Computer Science 2021-01-01 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Provably Efficient Exploration in Reward Machines with Low Regret

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with…

Machine Learning · Computer Science 2024-12-30 Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Chenxiao Ma, Mohammad Sadegh Talebi

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation…

Machine Learning · Computer Science 2024-10-10 Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

Provably Efficient Reinforcement Learning via Surprise Bound

Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large. Despite the importance and wide applicability of value function approximation, its theoretical…

Machine Learning · Computer Science 2023-02-24 Hanlin Zhu, Ruosong Wang, Jason D. Lee

Efficient Learning in Non-Stationary Linear Markov Decision Processes

We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov Decision Processes (MDPs), i.e., both the reward and transition kernel are linear with respect to a given feature map and are allowed to evolve either…

Machine Learning · Computer Science 2021-12-28 Ahmed Touati, Pascal Vincent