Related papers: Gradient-Aware Model-based Policy Search

Policy-Aware Model Learning for Policy Gradient Methods

This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way…

Artificial Intelligence · Computer Science 2021-01-05 Romina Abachi , Mohammad Ghavamzadeh , Amir-massoud Farahmand

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples.…

Machine Learning · Statistics 2013-07-22 Syogo Mori , Voot Tangkaratt , Tingting Zhao , Jun Morimoto , Masashi Sugiyama

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can…

Machine Learning · Computer Science 2017-10-18 Sayna Ebrahimi , Anna Rohrbach , Trevor Darrell

Gradient-based Reinforcement Planning in Policy-Search Methods

We introduce a learning method called ``gradient-based reinforcement planning'' (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy…

Artificial Intelligence · Computer Science 2007-05-23 Ivo Kwee , Marcus Hutter , Juergen Schmidhuber

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao , Hirotaka Hachiya , Voot Tangkaratt , Jun Morimoto , Masashi Sugiyama

Uncertainty-aware Model-based Policy Optimization

Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance…

Machine Learning · Computer Science 2019-06-27 Tung-Long Vuong , Kenneth Tran

Fast Model-based Policy Search for Universal Policy Networks

Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics based reinforcement learning. Although recent approaches such as universal policy networks partially address this issue by enabling the…

Machine Learning · Computer Science 2022-02-15 Buddhika Laknath Semage , Thommen George Karimpanal , Santu Rana , Svetha Venkatesh

Model-based Policy Optimization with Unsupervised Model Adaptation

Model-based reinforcement learning methods learn a dynamics model with real data sampled from the environment and leverage it to generate simulated data to derive an agent. However, due to the potential distribution mismatch between…

Machine Learning · Computer Science 2020-10-29 Jian Shen , Han Zhao , Weinan Zhang , Yong Yu

Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning

Model-based reinforcement learning algorithms make decisions by building and utilizing a model of the environment. However, none of the existing algorithms attempts to infer the dynamics of any state-action pair from known state-action…

Machine Learning · Computer Science 2020-02-25 Yanchao Sun , Furong Huang

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively…

Machine Learning · Computer Science 2021-06-15 Dong-Ki Kim , Miao Liu , Matthew Riemer , Chuangchuang Sun , Marwa Abdulhai , Golnaz Habibi , Sebastian Lopez-Cot , Gerald Tesauro , Jonathan P. How

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan , Jost Tobias Springenberg , Abbas Abdolmaleki , Roland Hafner , Michael Neunert , Thomas Lampe , Noah Siegel , Nicolas Heess , Martin Riedmiller

On the model-based stochastic value gradient for continuous reinforcement learning

For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing,…

Machine Learning · Computer Science 2021-05-28 Brandon Amos , Samuel Stanton , Denis Yarats , Andrew Gordon Wilson

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The…

Machine Learning · Computer Science 2012-06-26 Gergely Neu , Csaba Szepesvari

Model-free Policy Learning with Reward Gradients

Despite the increasing popularity of policy gradient methods, they are yet to be widely utilized in sample-scarce applications, such as robotics. The sample efficiency could be improved by making best usage of available information. As a…

Machine Learning · Computer Science 2023-11-03 Qingfeng Lan , Samuele Tosatto , Homayoon Farrahi , A. Rupam Mahmood

Gradient Informed Proximal Policy Optimization

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we…

Machine Learning · Computer Science 2023-12-15 Sanghyun Son , Laura Yu Zheng , Ryan Sullivan , Yi-Ling Qiao , Ming C. Lin

SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task…

Artificial Intelligence · Computer Science 2024-05-06 Qian Long , Fangwei Zhong , Mingdong Wu , Yizhou Wang , Song-Chun Zhu

Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ undifferentiated exploration strategies,…

Artificial Intelligence · Computer Science 2026-05-13 Xingyuan Hua , Sheng Yue , Ju Ren

Improving Gradient Estimation by Incorporating Sensor Data

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing…

Artificial Intelligence · Computer Science 2012-06-18 Gregory Lawrence , Stuart Russell

Low-Variance Policy Gradient Estimation with World Models

In this paper, we propose World Model Policy Gradient (WMPG), an approach to reduce the variance of policy gradient estimates using learned world models (WM's). In WMPG, a WM is trained online and used to imagine trajectories. The imagined…

Machine Learning · Statistics 2020-10-30 Michal Nauman , Floris Den Hengst

Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning

Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. However in practice, the algorithms often require large amounts of training experience to…

Machine Learning · Computer Science 2020-05-29 Parth Chadha