Related papers: Value Function Approximation in Noisy Environments…

An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy

Recent interest in the use of $L_1$ regularization in the use of value function approximation includes Petrik et al.'s introduction of $L_1$-Regularized Approximate Linear Programming (RALP). RALP is unique among $L_1$-regularized…

Artificial Intelligence · Computer Science 2014-04-25 Gavin Taylor, Connor Geer, David Piekut

Approximation of Functions on Manifolds in High Dimension from Noisy Scattered Data

In this paper, we consider the fundamental problem of approximation of functions on a low-dimensional manifold embedded in a high-dimensional space, with noise affecting both in the data and values of the functions. Due to the curse of…

Numerical Analysis · Mathematics 2020-12-29 Shira Faigenbaum-Golovin, David Levin

An Adaptive Sampling Algorithm for Level-set Approximation

We propose a new numerical scheme for approximating level-sets of Lipschitz multivariate functions which is robust to stochastic noise. The algorithm's main feature is an adaptive grid-based stochastic approximation strategy which…

Numerical Analysis · Mathematics 2025-09-19 Matteo Croci, Abdul-Lateef Haji-Ali, Ian C. J. Powell

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

A Linearly Relaxed Approximate Linear Program for Markov Decision Processes

Approximate linear programming (ALP) and its variants have been widely applied to Markov Decision Processes (MDPs) with a large number of states. A serious limitation of ALP is that it has an intractable number of constraints, as a result…

Systems and Control · Computer Science 2017-04-11 Chandrashekar Lakshminarayanan, Shalabh Bhatnagar, Csaba Szepesvari

Estimation of sparse polynomial approximation error to continuous function

The sparse polynomial approximation of continuous functions has emerged as a prominent area of interest in function approximation theory in recent years. A key challenge within this domain is the accurate estimation of approximation errors.…

Numerical Analysis · Mathematics 2025-06-10 Renzhong Feng, Bowen Zhang

Practical Linear Value-approximation Techniques for First-order MDPs

Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to…

Artificial Intelligence · Computer Science 2012-07-02 Scott Sanner, Craig Boutilier

Exploration by Random Reward Perturbation

We introduce Random Reward Perturbation (RRP), a novel exploration strategy for reinforcement learning (RL). Our theoretical analyses demonstrate that adding zero-mean noise to environmental rewards effectively enhances policy diversity…

Machine Learning · Computer Science 2025-06-11 Haozhe Ma, Guoji Fu, Zhengding Luo, Jiele Wu, Tze-Yun Leong

Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing…

Artificial Intelligence · Computer Science 2015-03-17 Marek Petrik, Gavin Taylor, Ron Parr, Shlomo Zilberstein

Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards

Reinforcement Learning with Verifiable Rewards (RLVR) improves LLM reasoning, yet growing evidence indicates an exploration ceiling: it often reweights existing solution traces rather than discovering new strategies, limiting gains under…

Machine Learning · Computer Science 2026-03-03 Bizhe Bai, Xinyue Wang, Peng Ye, Tao Chen

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…

Machine Learning · Computer Science 2019-08-09 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models

Recent advancements in flow-matching have enabled high-quality text-to-image generation. However, the deterministic nature of flow-matching models makes them poorly suited for reinforcement learning, a key tool for improving image quality…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Benjamin Yu, Jackie Liu, Justin Cui

Noisy Low-rank Matrix Optimization: Geometry of Local Minima and Convergence Rate

This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted…

Optimization and Control · Mathematics 2023-03-17 Ziye Ma, Somayeh Sojoudi

L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning

This paper proposes a new regularization technique for reinforcement learning (RL) towards making policy and value functions smooth and stable. RL is known for the instability of the learning process and the sensitivity of the acquired…

Robotics · Computer Science 2023-07-04 Taisuke Kobayashi

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Entropy regularized algorithms such as Soft Q-learning and Soft Actor-Critic, recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova, Elvis Dohmatob

Self-Paced Absolute Learning Progress as a Regularized Approach to Curriculum Learning

The usability of Reinforcement Learning is restricted by the large computation times it requires. Curriculum Reinforcement Learning speeds up learning by defining a helpful order in which an agent encounters tasks, i.e. from simple to hard.…

Machine Learning · Computer Science 2023-06-12 Tobias Niehues, Ulla Scheler, Pascal Klink

Partitioned Linear Programming Approximations for MDPs

Approximate linear programming (ALP) is an efficient approach to solving large factored Markov decision processes (MDPs). The main idea of the method is to approximate the optimal value function by a set of basis functions and optimize…

Artificial Intelligence · Computer Science 2012-06-18 Branislav Kveton, Milos Hauskrecht

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients

Reinforcement learning from human feedback (RLHF) or verifiable rewards (RLVR), the standard paradigm for aligning LLMs or building recent SOTA reasoning models, is highly sensitive to noise from inconsistent or erroneous rewards. Yet, the…

Machine Learning · Computer Science 2026-05-20 Omar El Mansouri, Fathinah Asma Izzati, Mohamed El Amine Seddik, Salem Lahlou

Approximate Dynamic Programming via a Smoothed Linear Program

We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural `projection' of a…

Optimization and Control · Mathematics 2009-10-05 V. V. Desai, V. F. Farias, C. C. Moallemi

Reinforcement Learning with Perturbed Rewards

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors),…

Machine Learning · Computer Science 2020-02-04 Jingkang Wang, Yang Liu, Bo Li