Related papers: Policy Learning Using Weak Supervision

Coherent Soft Imitation Learning

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are…

Machine Learning · Computer Science · 2023-12-07 · Joe Watson, Sandy H. Huang, Nicolas Heess

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle,…

Machine Learning · Computer Science · 2020-01-01 · Aviral Kumar, Xue Bin Peng, Sergey Levine

Residual Off-Policy RL for Finetuning Behavior Cloning Policies

Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort required for data collection, and the diminishing…

Robotics · Computer Science · 2025-09-29 · Lars Ankile, Zhenyu Jiang, Rocky Duan, Guanya Shi, Pieter Abbeel, Anusha Nagabandi

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making…

Machine Learning · Computer Science · 2023-12-12 · Jake Grigsby, Yanjun Qi

Weakly-Supervised Reinforcement Learning for Controllable Behavior

Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks. However, in many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is…

Machine Learning · Computer Science · 2020-11-19 · Lisa Lee, Benjamin Eysenbach, Ruslan Salakhutdinov, Shixiang Shane Gu, Chelsea Finn

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets

Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial…

Machine Learning · Computer Science · 2026-05-19 · Shriram Karpoora Sundara Pandian, Ali Baheri

Efficient Offline Reinforcement Learning: First Imitate, then Improve

Supervised imitation-based approaches are often favored over off-policy reinforcement learning approaches for learning policies offline, since their straightforward optimization objective makes them computationally efficient and stable to…

Machine Learning · Computer Science · 2025-12-30 · Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction. It is widely understood that offline RL is able to extract good policies even from…

Machine Learning · Computer Science · 2022-04-13 · Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward…

Machine Learning · Computer Science · 2024-06-04 · Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science · 2022-02-09 · Chidubem Arachie, Bert Huang

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of this pre-training phase, the arguably equally important role played…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao

Reinforcement Learning with Supervision from Noisy Demonstrations

Reinforcement learning has achieved great success in various applications. To learn an effective policy for the agent, it usually requires a huge amount of data by interacting with the environment, which could be computationally costly and…

Machine Learning · Computer Science · 2020-06-16 · Kun-Peng Ning, Sheng-Jun Huang

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science · 2025-03-18 · Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the potential to transfer the successes of reinforcement learning (RL) to domains where data collection is acutely problematic. In this offline setting, a key challenge is…

Machine Learning · Computer Science · 2022-11-23 · Alex Beeson, Giovanni Montana

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science · 2026-01-06 · Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value…

Machine Learning · Computer Science · 2022-07-05 · Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

Guided Meta-Policy Search

Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples since they learn from scratch. Meta-RL aims to address this challenge by leveraging experience…

Machine Learning · Computer Science · 2020-10-28 · Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn

Policy Learning for Off-Dynamics RL with Deficient Support

Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due…

Machine Learning · Computer Science · 2024-02-19 · Linh Le Pham Van, Hung The Tran, Sunil Gupta

Self-Supervised Adversarial Imitation Learning

Behavioural cloning is an imitation learning technique that teaches an agent how to behave via expert demonstrations. Recent approaches use self-supervision of fully-observable unlabelled snapshots of the states to decode state pairs into…

Machine Learning · Computer Science · 2023-04-24 · Juarez Monteiro, Nathan Gavenski, Felipe Meneguzzi, Rodrigo C. Barros

Weaker Than You Think: A Critical Look at Weakly Supervised Learning

Weakly supervised learning is a popular approach for training machine learning models in low-resource settings. Instead of requesting high-quality yet costly human annotations, it allows training models with noisy annotations obtained from…

Computation and Language · Computer Science · 2023-09-19 · Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow