Related papers: Rollout Sampling Approximate Policy Iteration

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem, have been proposed recently. Finding good policies…

Machine Learning · Statistics 2009-12-30 Christos Dimitrakakis, Michail G. Lagoudakis

Beyond the One Step Greedy Approach in Reinforcement Learning

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been…

Artificial Intelligence · Computer Science 2018-08-01 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may…

Machine Learning · Computer Science 2021-07-13 Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Reinforcement Learning in Economics and Finance

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal…

Theoretical Economics · Economics 2020-03-24 Arthur Charpentier, Romuald Elie, Carl Remlinger

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neural network to balance a physically accurate simulation of a single inverted pendulum. The trained weights and biases can then be transferred to a physical agent, where…

Machine Learning · Computer Science 2021-02-17 Dylan Bates

Reinforcement Learning in Education: A Multi-Armed Bandit Approach

Advances in reinforcement learning research have demonstrated the ways in which different agent-based models can learn how to optimally perform a task within a given environment. Reinforcement learning solves unsupervised problems where…

Machine Learning · Computer Science 2022-11-03 Herkulaas Combrink, Vukosi Marivate, Benjamin Rosman

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed…

Machine Learning · Computer Science 2019-06-10 Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris

Reinforcement Learning for Pivoting Task

In this work we propose an approach to learn a robust policy for solving the pivoting task. Recently, several model-free continuous control algorithms were shown to learn successful policies without prior knowledge of the dynamics of the…

Robotics · Computer Science 2017-03-03 Rika Antonova, Silvia Cruciani, Christian Smith, Danica Kragic

Reinforcement Learning with an Abrupt Model Change

The problem of reinforcement learning is considered where the environment or the model undergoes a change. An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward. The algorithm…

Systems and Control · Electrical Eng. & Systems 2023-04-25 Wuxia Chen, Taposh Banerjee, Jemin George, Carl Busart

Imitation-Projected Programmatic Reinforcement Learning

We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification…

Machine Learning · Computer Science 2021-01-21 Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri

Adversarial Imitation via Variational Inverse Reinforcement Learning

We consider a problem of learning the reward and policy from expert examples under unknown dynamics. Our proposed method builds on the framework of generative adversarial networks and introduces the empowerment-regularized maximum-entropy…

Machine Learning · Computer Science 2019-02-26 Ahmed H. Qureshi, Byron Boots, Michael C. Yip

Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems

Reinforcement learning is a promising approach to learning robotics controllers. It has recently been shown that algorithms based on finite-difference estimates of the policy gradient are competitive with algorithms based on the policy…

Machine Learning · Computer Science 2021-10-12 Osbert Bastani

Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving

Reinforcement learning has received high research interest for developing planning approaches in automated driving. Most prior works consider the end-to-end planning task that yields direct control commands and rarely deploy their algorithm…

Robotics · Computer Science 2023-07-31 Marvin Klimke, Benjamin Völz, Michael Buchholz

Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system

Reinforcement learning has emerged as a promising methodology for training robot controllers. However, most results have been limited to simulation due to the need for a large number of samples and the lack of automated-yet-safe data…

Robotics · Computer Science 2018-03-29 Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, Emanuel Todorov

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle,…

Machine Learning · Computer Science 2020-01-01 Aviral Kumar, Xue Bin Peng, Sergey Levine

A Benchmark Comparison of Imitation Learning-based Control Policies for Autonomous Racing

Autonomous racing with scaled race cars has gained increasing attention as an effective approach for developing perception, planning and control algorithms for safe autonomous driving at the limits of the vehicle's handling. To train agile…

Robotics · Computer Science 2023-05-30 Xiatao Sun, Mingyan Zhou, Zhijun Zhuang, Shuo Yang, Johannes Betz, Rahul Mangharam

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value…

Machine Learning · Computer Science 2023-03-01 Anna Winnicki, R. Srikant

Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping

Imitation learning has enabled robots to perform complex, long-horizon tasks in challenging dexterous manipulation settings. As new methods are developed, they must be rigorously evaluated and compared against corresponding baselines…

Robotics · Computer Science 2025-06-09 David Snyder, Asher James Hancock, Apurva Badithela, Emma Dixon, Patrick Miller, Rares Andrei Ambrus, Anirudha Majumdar, Masha Itkina, Haruki Nishimura

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards:…

Machine Learning · Computer Science 2026-03-09 Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai