Related papers: Policy Optimization via Importance Sampling

Policy Optimization Through Approximate Importance Sampling

Recent policy optimization approaches (Schulman et al., 2015a; 2017) have achieved substantial empirical successes by constructing new proxy optimization objectives. These proxy objectives allow stable and low variance policy learning, but…

Machine Learning · Computer Science 2020-02-24 Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim

Policy Optimization as Online Learning with Mediator Feedback

Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available…

Machine Learning · Computer Science 2020-12-16 Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for…

Artificial Intelligence · Computer Science 2017-12-07 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

Policy Gradient with Active Importance Sampling

Importance sampling (IS) represents a fundamental technique for a large surge of off-policy reinforcement learning approaches. Policy gradient (PG) methods, in particular, significantly benefit from IS, enabling the effective reuse of…

Machine Learning · Computer Science 2024-05-10 Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…

Machine Learning · Computer Science 2022-07-05 Yao Liu, Yannis Flet-Berliac, Emma Brunskill

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each…

Machine Learning · Computer Science 2020-12-07 Wangshu Zhu, Andre Rosendo

Generalized Proximal Policy Optimization with Sample Reuse

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while…

Machine Learning · Computer Science 2021-11-02 James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Offline Policy Optimization with Posterior Sampling

A fundamental challenge in model-based offline reinforcement learning (RL) lies in the trade-off between generalization and robustness against exploitation errors in out-of-distribution (OOD) regions. While OOD samples may capture valid…

Artificial Intelligence · Computer Science 2026-05-11 Hongqiang Lin, Dongxu Zhang, Yiding Sun, Mingzhe Li, Ning Yang, Haijun Zhang

Constrained Policy Optimization

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Reinforcement Learning Using Expectation Maximization Based Guided Policy Search for Stochastic Dynamics

Guided policy search algorithms have been proven to work with incredible accuracy for not only controlling a complicated dynamical system, but also learning optimal policies from various unseen instances. One assumes true nature of the…

Systems and Control · Electrical Eng. & Systems 2020-10-02 Prakash Mallick, Zhiyong Chen, Mohsen Zamani

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

Policy optimization methods are popular reinforcement learning algorithms, because their incremental and on-policy nature makes them more stable than the value-based counterparts. However, the same properties also make them slow to converge…

Machine Learning · Computer Science 2021-07-01 Andrea Zanette, Ching-An Cheng, Alekh Agarwal

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

Policy optimization methods are powerful algorithms in Reinforcement Learning (RL) for their flexibility to deal with policy parameterization and ability to handle model misspecification. However, these methods usually suffer from slow…

Machine Learning · Computer Science 2023-06-19 Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Local policy search with Bayesian optimization

Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of…

Machine Learning · Computer Science 2021-11-23 Sarah Müller, Alexander von Rohr, Sebastian Trimpe

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

On-policy reinforcement learning (RL) algorithms are typically characterized as algorithms that perform policy updates using i.i.d. trajectories collected by the agent's current policy. However, after observing only a finite number of…

Machine Learning · Computer Science 2026-02-11 Nicholas E. Corrado, Josiah P. Hanna

Dimensionality Reduction and Prioritized Exploration for Policy Search

Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level. This class of algorithms is widely applied in robotics with movement primitives or…

Machine Learning · Computer Science 2022-03-22 Marius Memmel, Puze Liu, Davide Tateo, Jan Peters

Policy Optimization Prefers The Path of Least Resistance

Policy optimization (PO) algorithms are used to refine Large Language Models for complex, multi-step reasoning. Current state-of-the-art pipelines enforce a strict think-then-answer format to elicit chain-of-thought (CoT); however, the…

Computation and Language · Computer Science 2025-10-28 Debdeep Sanyal, Aakash Sen Sharma, Dhruv Kumar, Saurabh Deshpande, Murari Mandal

Doubly Optimal Policy Evaluation for Reinforcement Learning

Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate. Due to the sequential nature of reinforcement learning, any improper data-collecting…

Machine Learning · Computer Science 2025-03-21 Shuze Daniel Liu, Claire Chen, Shangtong Zhang