Related papers: Expected Sarsa($\lambda$) with Control Variate for…

200 papers

We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy…

Machine Learning · Computer Science · 2020-12-15 · Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

In this paper, we proposed a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of gradient by…

Machine Learning · Computer Science · 2021-02-22 · Jia Bi, Steve R. Gunn

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics · 2025-11-21 · Fabio Nobile, Matteo Raviola, Nathan Schaeffer

Policy gradient methods are very attractive in reinforcement learning due to their model-free nature and convergence guarantees. These methods, however, suffer from high variance in gradient estimation, resulting in poor sample efficiency.…

Machine Learning · Computer Science · 2018-11-16 · Sergey Pankov

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method…

Neural and Evolutionary Computing · Computer Science · 2020-03-16 · Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate…

Machine Learning · Computer Science · 2019-05-30 · Pan Xu, Felicia Gao, Quanquan Gu

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($\lambda$), with three desired properties: (1) it…

Machine Learning · Computer Science · 2016-11-09 · Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using…

Artificial Intelligence · Computer Science · 2015-12-15 · Lucas Lehnert, Doina Precup

Ensuring safety is a critical challenge in applying Reinforcement Learning (RL) to real-world scenarios. Constrained Reinforcement Learning (CRL) addresses this by maximizing returns under predefined constraints, typically formulated as the…

Machine Learning · Computer Science · 2026-01-21 · Shiqing Gao, Yihang Zhou, Shuai Shao, Haoyu Luo, Yiheng Bing, Jiaxin Ding, Luoyi Fu, Xinbing Wang

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics · 2017-11-16 · Alberto Bietti, Julien Mairal

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science · 2021-03-02 · Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

Evolution strategies (ESs) are zeroth-order stochastic black-box optimization heuristics invariant to monotonic transformations of the objective function. They evolve a multivariate normal distribution, from which candidate solutions are…

Numerical Analysis · Mathematics · 2022-02-09 · Youhei Akimoto, Anne Auger, Tobias Glasmachers, Daiki Morinaga

Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be…

Machine Learning · Computer Science · 2019-08-12 · Ching-An Cheng, Xinyan Yan, Byron Boots

We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic…

Machine Learning · Computer Science · 2020-02-17 · Tasuku Soma, Yuichi Yoshida

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function…

Machine Learning · Computer Science · 2017-06-12 · Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Policy-gradient methods in Reinforcement Learning (RL) are very universal and widely applied in practice, but their performance suffers from the high variance of the gradient estimate. Several procedures were proposed to reduce it including…

Machine Learning · Computer Science · 2022-06-16 · Maxim Kaledin, Alexander Golubev, Denis Belomestny

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science · 2020-09-15 · Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value…

Machine Learning · Computer Science · 2021-08-12 · Yuanyi Zhong, Yuan Zhou, Jian Peng

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality…

Machine Learning · Computer Science · 2022-06-20 · Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

To comply with increasingly stringent international standards in risk management and regulation, several approaches have been developed in the literature for forecasting tail-risk measures such as Value-at-Risk (VaR) and Expected Shortfall…

Risk Management · Quantitative Finance · 2026-03-02 · Alessandra Amendola, Vincenzo Candila, Antonio Naimoli, Giuseppe Storti