Related papers

Related papers: Sample Efficient Policy Gradient Methods with Recu…

200 papers

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate…

Machine Learning · Computer Science 2019-05-30 Pan Xu, Felicia Gao, Quanquan Gu

Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods such as REINFORCE using variance reduction techniques. However,…

Machine Learning · Computer Science 2021-05-31 Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science 2019-11-27 Kaixiang Lin, Jiayu Zhou

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods…

Machine Learning · Computer Science 2018-06-15 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring…

Machine Learning · Computer Science 2026-02-03 Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel…

Machine Learning · Computer Science 2022-02-10 Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to…

Machine Learning · Computer Science 2023-01-26 Naman Agarwal, Brian Bullins, Karan Singh

Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate…

Machine Learning · Computer Science 2025-10-01 Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser

Effective reinforcement learning (RL) for complex stochastic systems requires leveraging historical data collected in previous iterations to accelerate policy optimization. Classical experience replay treats all past observations uniformly…

Machine Learning · Statistics 2026-02-06 Hua Zheng, Wei Xie, M. Ben Feng, Keilung Choy

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with…

Machine Learning · Computer Science 2020-10-20 Dongruo Zhou, Pan Xu, Quanquan Gu

We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan, Prashanth L. A

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the…

Machine Learning · Computer Science 2023-11-09 Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He

Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is…

Machine Learning · Computer Science 2018-03-30 Tianbing Xu, Qiang Liu, Jian Peng

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy…

Machine Learning · Computer Science 2020-09-23 Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of reinforcement learning research in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced…

Machine Learning · Computer Science 2023-11-28 Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran

The goal of policy-based reinforcement learning (RL) is to search for the maximal point of its objective. However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) cannot guarantee the…

Machine Learning · Computer Science 2020-12-04 Long Yang, Qian Zheng, Gang Pan

We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure. Beyond the standard cumulative reward RL setting, this problem includes as particular…

Machine Learning · Computer Science 2023-06-06 Anas Barakat, Ilyas Fatkhullin, Niao He

In reinforcement learning (RL), offline learning decouples learning from data collection, is useful in dealing with the exploration-exploitation tradeoff, and enables data reuse in many applications. In this work, we study two offline…

Machine Learning · Computer Science 2022-02-08 Jing Dong, Xin T. Tong

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works…

Machine Learning · Computer Science 2020-12-25 Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd