Related papers: Optimistic Task Inference for Behavior Foundation …

Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics

Behavioral Foundation Models (BFMs) proved successful in producing policies for arbitrary tasks in a zero-shot manner, requiring no test-time training or task-specific fine-tuning. Among the most promising BFMs are the ones that estimate…

Machine Learning · Computer Science 2026-05-05 Maksim Bobrin, Ilya Zisman, Alexander Nikulin, Vladislav Kurenkov, Dmitry Dylov

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

Behavior Foundation Models (BFMs) enable scalable imitation learning (IL) by pretraining task-agnostic representations that can be rapidly adapted to new tasks. However, existing BFMs assume fixed environment dynamics, limiting their…

Machine Learning · Computer Science 2026-05-19 Rishabh Agrawal, Rahul Jain, Ashutosh Nayyar

Fast Adaptation with Behavioral Foundation Models

Unsupervised zero-shot reinforcement learning (RL) has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs), enabling agents to solve a wide range of downstream tasks specified via reward functions in a…

Machine Learning · Computer Science 2025-04-11 Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the…

Machine Learning · Computer Science 2022-12-29 Tim G. J. Rudner, Vitchyr H. Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Improving Zero-Shot Offline RL via Behavioral Task Sampling

Offline zero-shot reinforcement learning (RL) aims to learn agents that optimize unseen reward functions without additional environment interaction. The standard approach to this problem trains task-conditioned policies by sampling task…

Artificial Intelligence · Computer Science 2026-04-29 Nazim Bendib, Nicolas Perrin-Gilbert, Olivier Sigaud

Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making

Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding…

Machine Learning · Computer Science 2025-12-30 Wenzhang Du

Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

Behavioral Foundation Models (BFMs) produce agents with the capability to adapt to any unknown reward or task. These methods, however, are only able to produce near-optimal policies for the reward functions that are in the span of some…

Artificial Intelligence · Computer Science 2026-03-18 Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that…

Machine Learning · Computer Science 2026-02-11 Akshay Mete, Shahid Aamir Sheikh, Tzu-Hsiang Lin, Dileep Kalathil, P. R. Kumar

BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning

Building Behavioral Foundation Models (BFMs) for humanoid robots has the potential to unify diverse control tasks under a single, promptable generalist policy. However, existing approaches are either exclusively deployed on simulated…

Robotics · Computer Science 2025-11-07 Yitang Li, Zhengyi Luo, Tonghe Zhang, Cunxi Dai, Anssi Kanervisto, Andrea Tirinzoni, Haoyang Weng, Kris Kitani, Mateusz Guzek, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta, Guanya Shi

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

In many real-world applications, reinforcement learning (RL) agents might have to solve multiple tasks, each one typically modeled via a reward function. If reward functions are expressed linearly, and the agent has previously learned a set…

Machine Learning · Computer Science 2022-06-24 Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs…

Machine Learning · Computer Science 2026-03-30 Ron Vainshtein, Zohar Rimon, Shie Mannor, Chen Tessler

Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting

The forward-backward representation (FB) is a recently proposed framework (Touati et al., 2023; Touati & Ollivier, 2021) to train behavior foundation models (BFMs) that aim at providing zero-shot efficient policies for any new task…

Machine Learning · Computer Science 2024-12-06 Edoardo Cetin, Ahmed Touati, Yann Ollivier

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

Reward models (RMs) play a crucial role in reinforcement learning from human feedback (RLHF), aligning model behavior with human preferences. However, existing benchmarks for reward models show a weak correlation with the performance of…

Machine Learning · Computer Science 2025-05-20 Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, Jinyoung Yeo

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science 2012-02-20 Ananda Narayanan B, Balaraman Ravindran

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL…

Machine Learning · Computer Science 2024-05-15 Asaf Cassel, Haipeng Luo, Aviv Rosenberg, Dmitry Sotnikov

Optimistic Proximal Policy Optimization

Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where…

Machine Learning · Computer Science 2019-06-27 Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models

In robot manipulation, Reinforcement Learning (RL) often suffers from low sample efficiency and uncertain convergence, especially in large observation and action spaces. Foundation Models (FMs) offer an alternative, demonstrating promise in…

Robotics · Computer Science 2025-04-18 Runyu Ma, Jelle Luijkx, Zlatan Ajanovic, Jens Kober

Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities

Recent advancements in zero-shot reinforcement learning (RL) have facilitated the extraction of diverse behaviors from unlabeled, offline data sources. In particular, forward-backward algorithms (FB) can retrieve a family of policies that…

Machine Learning · Computer Science 2026-02-09 Marco Bagatella, Thomas Rupf, Georg Martius, Andreas Krause