Related papers: Provably Efficient $Q$-learning with Function Appr…

Diagnosing Bottlenecks in Deep Q-learning Algorithms

Q-learning methods represent a commonly used class of algorithms in reinforcement learning: they are generally efficient and simple, and can be combined readily with function approximators for deep reinforcement learning (RL). However, the…

Machine Learning · Computer Science 2019-02-28 Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-Of-The-Art algorithms usually leverage powerful function…

Machine Learning · Computer Science 2022-11-28 Ming Yin, Mengdi Wang, Yu-Xiang Wang

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce…

Machine Learning · Computer Science 2025-10-21 Jin Peng Zhou, Kaiwen Wang, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kilian Q. Weinberger, Kianté Brantley, Wen Sun

Convergence of Distributionally Robust Q-Learning with Linear Function Approximation

Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. The goal is to maximize the worst-case long-term discounted reward, where the data for RL comes…

Machine Learning · Computer Science 2026-03-17 Saptarshi Mandal, Yashaswini Murthy, R. Srikant

Distributional reinforcement learning with linear function approximation

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér…

Machine Learning · Computer Science 2019-02-11 Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with…

Machine Learning · Computer Science 2025-10-06 Heyang Zhao, Jiafan He, Quanquan Gu

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting. This is a well-motivated problem because deploying new policies is costly in…

Machine Learning · Computer Science 2023-02-23 Dan Qiao, Yu-Xiang Wang

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge…

Machine Learning · Computer Science 2020-02-18 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. When using standard…

Machine Learning · Computer Science 2020-03-17 Aviral Kumar, Abhishek Gupta, Sergey Levine

Variance Control for Distributional Reinforcement Learning

Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the…

Machine Learning · Computer Science 2023-08-01 Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou

The Impact of Data Distribution on Q-learning with Function Approximation

We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the…

Machine Learning · Computer Science 2023-02-13 Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, Francisco S. Melo

Risk-Sensitive Policy with Distributional Reinforcement Learning

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the…

Machine Learning · Computer Science 2023-01-02 Thibaut Théate, Damien Ernst

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

Using deep neural nets as function approximator for reinforcement learning tasks have recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role…

Machine Learning · Computer Science 2016-01-21 Vincent François-Lavet, Raphael Fonteneau, Damien Ernst

Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs

Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical results are pessimistic about Q-learning. For example, (Baird,…

Machine Learning · Computer Science 2021-10-20 Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

In reinforcement learning (RL), function approximation errors are known to easily lead to the Q-value overestimations, thus greatly reducing policy performance. This paper presents a distributional soft actor-critic (DSAC) algorithm, which…

Machine Learning · Computer Science 2021-06-14 Jingliang Duan, Yang Guan, Shengbo Eben Li, Yangang Ren, Bo Cheng

Scalable Reinforcement Learning for Linear-Quadratic Control of Networks

Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems where distributed state feedback controllers can give…

Systems and Control · Electrical Eng. & Systems 2024-03-14 Johan Olsson, Runyu Zhang, Emma Tegling, Na Li

Regularized Q-learning

Q-learning is widely used algorithm in reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable with the linear function approximation case. This…

Machine Learning · Computer Science 2025-02-11 Han-Dong Lim, Donghwan Lee

Replicable Reinforcement Learning with Linear Function Approximation

Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized replicability as the demand that an…

Machine Learning · Computer Science 2026-04-15 Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…

Machine Learning · Computer Science 2019-08-09 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed…

Machine Learning · Computer Science 2022-06-03 Andrea Zanette, Martin J. Wainwright