Related papers: Manifold Regularization for Kernelized LSTD

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives…

Machine Learning · Computer Science 2022-02-17 Jihao Long, Jiequn Han, Weinan E

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three types of approaches have been widely adopted: The first relies on a deep neural network…

Machine Learning · Computer Science 2026-05-19 Shijin Gong, Kai Ye, Jin Zhu, Xinyu Zhang, Hongyi Zhou, Chengchun Shi

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an…

Machine Learning · Computer Science 2026-03-05 Shengbo Wang

Sample Complexity of Kernel-Based Q-Learning

Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive…

Machine Learning · Computer Science 2023-02-03 Sing-Yuan Yeh, Fu-Chieh Chang, Chang-Wei Yueh, Pei-Yuan Wu, Alberto Bernacchia, Sattar Vakili

Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computation efficiency and quality of…

Machine Learning · Computer Science 2018-05-28 Haifang Li, Yingce Xia, Wensheng Zhang

Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning

We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum…

Machine Learning · Computer Science 2025-06-10 Kalyan Cherukuri, Aarav Lala, Yash Yardi

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value…

Machine Learning · Computer Science 2022-07-05 Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of…

Machine Learning · Computer Science 2024-03-15 Sattar Vakili, Julia Olkhovskaya

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce…

Machine Learning · Computer Science 2025-10-21 Jin Peng Zhou, Kaiwen Wang, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kilian Q. Weinberger, Kianté Brantley, Wen Sun

On Value Functions and the Agent-Environment Boundary

When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as a part of the environment or as part of the agent. As a consequence,…

Machine Learning · Computer Science 2020-06-02 Nan Jiang

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy, when the data…

Machine Learning · Computer Science 2022-10-05 Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

Bridging the gap between QP-based and MPC-based RL

Reinforcement learning methods typically use Deep Neural Networks to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting…

Systems and Control · Electrical Eng. & Systems 2022-05-19 Shambhuraj Sawant, Sebastien Gros

Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning

Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state, with the goal of maximizing a cumulative reward function. Predominantly, there are two families of algorithms to solve RL problems: value-based…

Machine Learning · Computer Science 2025-01-10 Sergio Rozada, Hoi-To Wai, Antonio G. Marques

Offline Reinforcement Learning with On-Policy Q-Function Regularization

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work…

Machine Learning · Computer Science 2023-07-27 Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

On the Model-Misspecification in Reinforcement Learning

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function…

Machine Learning · Computer Science 2024-01-09 Yunfan Li, Lin Yang

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Feature Selection for Value Function Approximation Using Bayesian Model Selection

Feature selection in reinforcement learning (RL), i.e. choosing basis functions such that useful approximations of the unknown value function can be obtained, is one of the main challenges in scaling RL to real-world applications. Here we…

Artificial Intelligence · Computer Science 2012-02-01 Tobias Jung, Peter Stone

Periodic Regularized Q-Learning

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this…

Machine Learning · Computer Science 2026-02-04 Hyukjun Yang, Han-Dong Lim, Donghwan Lee

Policy Optimization over General State and Action Spaces

Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policies for each state. This…

Machine Learning · Computer Science 2026-03-24 Caleb Ju, Guanghui Lan

Reinforcement Learning for Infinite-Dimensional Systems

Interest in reinforcement learning (RL) for large-scale systems, comprising extensive populations of intelligent agents interacting with heterogeneous environments, has surged significantly across diverse scientific domains in recent years.…

Systems and Control · Electrical Eng. & Systems 2025-09-16 Wei Zhang, Jr-Shin Li