Related papers: Expected Sarsa($\lambda$) with Control Variate for…

On Convergence of Gradient Expected Sarsa($\lambda$)

We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy…

Machine Learning · Computer Science 2020-12-15 Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization

In this paper, we proposed a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of gradient by…

Machine Learning · Computer Science 2021-02-22 Jia Bi, Steve R. Gunn

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile, Matteo Raviola, Nathan Schaeffer

Reward-estimation variance elimination in sequential decision processes

Policy gradient methods are very attractive in reinforcement learning due to their model-free nature and convergence guarantees. These methods, however, suffer from high variance in gradient estimation, resulting in poor sample efficiency.…

Machine Learning · Computer Science 2018-11-16 Sergey Pankov

Variance Reduction for Evolution Strategies via Structured Control Variates

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method…

Neural and Evolutionary Computing · Computer Science 2020-03-16 Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate…

Machine Learning · Computer Science 2019-05-30 Pan Xu, Felicia Gao, Quanquan Gu

Safe and Efficient Off-Policy Reinforcement Learning

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($\lambda$), with three desired properties: (1) it…

Machine Learning · Computer Science 2016-11-09 Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using…

Artificial Intelligence · Computer Science 2015-12-15 Lucas Lehnert, Doina Precup

Extreme Value Policy Optimization for Safe Reinforcement Learning

Ensuring safety is a critical challenge in applying Reinforcement Learning (RL) to real-world scenarios. Constrained Reinforcement Learning (CRL) addresses this by maximizing returns under predefined constraints, typically formulated as the…

Machine Learning · Computer Science 2026-01-21 Shiqing Gao, Yihang Zhou, Shuai Shao, Haoyu Luo, Yiheng Bing, Jiaxin Ding, Luoyi Fu, Xinbing Wang

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti, Julien Mairal

Reducing Conservativeness Oriented Offline Reinforcement Learning

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science 2021-03-02 Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

Global Linear Convergence of Evolution Strategies on More Than Smooth Strongly Convex Functions

Evolution strategies (ESs) are zeroth-order stochastic black-box optimization heuristics invariant to monotonic transformations of the objective function. They evolve a multivariate normal distribution, from which candidate solutions are…

Numerical Analysis · Mathematics 2022-02-09 Youhei Akimoto, Anne Auger, Tobias Glasmachers, Daiki Morinaga

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be…

Machine Learning · Computer Science 2019-08-12 Ching-An Cheng, Xinyan Yan, Byron Boots

Statistical Learning with Conditional Value at Risk

We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic…

Machine Learning · Computer Science 2020-02-17 Tasuku Soma, Yuichi Yoshida

Stochastic Variance Reduction Methods for Policy Evaluation

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function…

Machine Learning · Computer Science 2017-06-12 Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization

Policy-gradient methods in Reinforcement Learning (RL) are very universal and widely applied in practice but their performance suffers from the high variance of the gradient estimate. Several procedures were proposed to reduce it including…

Machine Learning · Computer Science 2022-06-16 Maxim Kaledin, Alexander Golubev, Denis Belomestny

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science 2020-09-15 Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

Coordinate-wise Control Variates for Deep Policy Gradients

The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value…

Machine Learning · Computer Science 2021-08-12 Yuanyi Zhong, Yuan Zhou, Jian Peng

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality…

Machine Learning · Computer Science 2022-06-20 Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

Combining Value-at-Risk and Expected Shortfall forecasts via the Model Confidence Set

To comply with increasingly stringent international standards in risk management and regulation, several approaches have been developed in the literature for forecasting tail-risk measures such as Value-at-Risk (VaR) and Expected Shortfall…

Risk Management · Quantitative Finance 2026-03-02 Alessandra Amendola, Vincenzo Candila, Antonio Naimoli, Giuseppe Storti