Related papers: Sample Efficient Policy Gradient Methods with Recu…

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate…

Machine Learning · Computer Science 2019-05-30 Pan Xu, Felicia Gao, Quanquan Gu

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate the existing PG methods such as REINFORCE by the variance reduction techniques. However,…

Machine Learning · Computer Science 2021-05-31 Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science 2019-11-27 Kaixiang Lin, Jiayu Zhou

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods…

Machine Learning · Computer Science 2018-06-15 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

Reusing Trajectories in Policy Gradients Enables Fast Convergence

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring…

Machine Learning · Computer Science 2026-02-03 Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Sequential Policy Gradient for Adaptive Hyperparameter Optimization

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Policy Optimization with Stochastic Mirror Descent

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel…

Machine Learning · Computer Science 2022-02-10 Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan

Variance-Reduced Conservative Policy Iteration

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to…

Machine Learning · Computer Science 2023-01-26 Naman Agarwal, Brian Bullins, Karan Singh

Efficiently Escaping Saddle Points for Policy Optimization

Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate…

Machine Learning · Computer Science 2025-10-01 Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser

Variance Reduction Based Experience Replay for Policy Optimization

Effective reinforcement learning (RL) for complex stochastic systems requires leveraging historical data collected in previous iterations to accelerate policy optimization. Classical experience replay treats all past observations uniformly…

Machine Learning · Statistics 2026-02-06 Hua Zheng, Wei Xie, M. Ben Feng, Keilung Choy

Stochastic Nested Variance Reduction for Nonconvex Optimization

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with…

Machine Learning · Computer Science 2020-10-20 Dongruo Zhou, Pan Xu, Quanquan Gu

Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan, Prashanth L. A

Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the…

Machine Learning · Computer Science 2023-11-09 Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He

Stochastic Variance Reduction for Policy Gradient Estimation

Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is…

Machine Learning · Computer Science 2018-03-30 Tianbing Xu, Qiang Liu, Jian Peng

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy…

Machine Learning · Computer Science 2020-09-23 Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

Momentum-Based Policy Gradient with Second-Order Information

Variance-reduced gradient estimators for policy gradient methods have been one of the main focus of research in the reinforcement learning in recent years as they allow acceleration of the estimation process. We propose a variance-reduced…

Machine Learning · Computer Science 2023-11-28 Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

The goal of policy-based reinforcement learning (RL) is to search the maximal point of its objective. However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) can not guarantee the…

Machine Learning · Computer Science 2020-12-04 Long Yang, Qian Zheng, Gang Pan

Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space

We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure. Beyond the standard cumulative reward RL setting, this problem includes as particular…

Machine Learning · Computer Science 2023-06-06 Anas Barakat, Ilyas Fatkhullin, Niao He

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decoupled learning from data collection and is useful in dealing with exploration-exploitation tradeoff and enables data reuse in many applications. In this work, we study two offline…

Machine Learning · Computer Science 2022-02-08 Jing Dong, Xin T. Tong

Sample Efficient Reinforcement Learning with REINFORCE

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works…

Machine Learning · Computer Science 2020-12-25 Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd