Related papers: Error Controlled Actor-Critic

200 papers

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting…

Artificial Intelligence · Computer Science 2018-10-23 Scott Fujimoto, Herke van Hoof, David Meger

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if…

Machine Learning · Computer Science 2021-12-28 Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel…

Machine Learning · Computer Science 2025-04-15 Ukjo Hwang, Songnam Hong

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for…

Machine Learning · Computer Science 2025-09-09 Gaspard Lambrechts, Damien Ernst, Aditya Mahajan

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological…

Machine Learning · Computer Science 2009-09-17 D. Di Castro, R. Meir

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence…

Machine Learning · Computer Science 2019-07-16 Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

In this paper, we investigate the infinite-horizon risk-constrained linear quadratic regulator problem (RC-QR), which augments the classical LQR formulation with a statistical constraint on the variability of the system state to incorporate…

Optimization and Control · Mathematics 2025-10-28 Weijian Li, Andreas A. Malikopoulos

We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy…

Machine Learning · Computer Science 2024-06-21 Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating…

Machine Learning · Computer Science 2025-10-23 Yigit Korkmaz, Urvi Bhuwania, Ayush Jain, Erdem Bıyık

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam, Dionysis Kalogerias

Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in…

Machine Learning · Statistics 2019-10-29 Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang, Sen Lin, Junshan Zhang

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi, Abdullah Akgül, Manuel Haussmann, Kenny Kazimirzak Brink, Melih Kandemir

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work…

Machine Learning · Statistics 2025-05-07 Kevin Tan, Wei Fan, Yuting Wei

Optimal control problems with free terminal time present many challenges including nonsmooth and discontinuous control laws, irregular value functions, many local optima, and the curse of dimensionality. To overcome these issues, we propose…

Optimization and Control · Mathematics 2022-08-08 Evan Burton, Tenavi Nakamura-Zimmerer, Qi Gong, Wei Kang

We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least square temporal difference method is applied to compute the value function for the critic. The policy gradient method is…

Optimization and Control · Mathematics 2025-01-27 Mo Zhou, Jianfeng Lu

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro