Related papers: Batch Value-function Approximation with Only Reali…

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…

Machine Learning · Computer Science 2019-08-09 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural…

Machine Learning · Computer Science 2021-01-01 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on developing theory for RL with linear function approximation, the understanding…

Machine Learning · Computer Science 2020-06-22 Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang

Information-Theoretic Considerations in Batch Reinforcement Learning

Value-function approximation methods that operate in batch mode have foundational importance to reinforcement learning (RL). Finite sample guarantees for these methods often crucially rely on two types of assumptions: (1) mild distribution…

Machine Learning · Computer Science 2019-05-02 Jinglin Chen, Nan Jiang

A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model

Bayesian Reinforcement Learning (RL) is capable of not only incorporating domain knowledge, but also solving the exploration-exploitation dilemma in a natural way. As Bayesian RL is intractable except for special cases, previous work has…

Artificial Intelligence · Computer Science 2013-06-14 Kenji Kawaguchi, Mauricio Araya

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Reinforcement Learning with Verifiable Rewards (RLVR) has achieved great success in developing Large Language Models (LLMs) with chain-of-thought rollouts for many tasks such as math and coding. Nevertheless, RLVR struggles with sample…

Machine Learning · Computer Science 2026-05-15 Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

Context Bootstrapped Reinforcement Learning

Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that…

Machine Learning · Computer Science 2026-03-20 Saaket Agashe, Jayanth Srinivasa, Gaowen Liu, Ramana Kompella, Xin Eric Wang

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions…

Machine Learning · Computer Science 2020-07-23 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Reinforcement Learning with Probabilistically Complete Exploration

Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to…

Machine Learning · Computer Science 2020-01-22 Philippe Morere, Gilad Francis, Tom Blau, Fabio Ramos

Provably Efficient Reinforcement Learning via Surprise Bound

Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large. Despite the importance and wide applicability of value function approximation, its theoretical…

Machine Learning · Computer Science 2023-02-24 Hanlin Zhu, Ruosong Wang, Jason D. Lee

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising…

Machine Learning · Computer Science 2025-12-02 Md Tanvirul Alam, Nidhi Rastogi

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning. Compared to classical iterative methods such as Fitted Q-Iteration---whose performance loss incurs quadratic dependence on…

Machine Learning · Computer Science 2020-08-25 Tengyang Xie, Nan Jiang

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science 2026-03-10 Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li

Continuous Doubly Constrained Batch Reinforcement Learning

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL,…

Machine Learning · Computer Science 2021-12-07 Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

Practical Kernel-Based Reinforcement Learning

Kernel-based reinforcement learning (KBRL) stands out among reinforcement learning algorithms for its strong theoretical guarantees. By casting the learning problem as a local kernel approximation, KBRL provides a way of computing a…

Machine Learning · Computer Science 2014-07-22 André M. S. Barreto, Doina Precup, Joelle Pineau

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates the…

Machine Learning · Computer Science 2021-10-19 Gen Li, Yuxin Chen, Yuejie Chi, Yuantao Gu, Yuting Wei

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or…

Machine Learning · Computer Science 2022-08-31 Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability

This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^{\pi}$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal…

Artificial Intelligence · Computer Science 2025-10-31 Shayan Karimi, Xiaoqi Tan

On Reward-Free Reinforcement Learning with Linear Function Approximation

Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using…

Machine Learning · Computer Science 2020-06-22 Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

Provably Efficient Algorithms for Multi-Objective Competitive RL

We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector. In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a…

Machine Learning · Computer Science 2021-02-08 Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra