Related papers: Offline Behavior Distillation

State Diversity Matters in Offline Behavior Distillation

Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In…

Machine Learning · Computer Science · 2025-12-09 · Shiye Lei, Zhihao Cheng, Dacheng Tao

Dataset Distillation for Offline Reinforcement Learning

Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment…

Machine Learning · Computer Science · 2025-11-04 · Jonathan Light, Yuanzhe Liu, Ziniu Hu

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and restrains the…

Robotics · Computer Science · 2023-10-16 · Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan

Boosting Offline Reinforcement Learning via Data Rebalancing

Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. To address this problem, existing works mainly focus on designing sophisticated algorithms to explicitly or implicitly…

Machine Learning · Computer Science · 2022-10-18 · Yang Yue, Bingyi Kang, Xiao Ma, Zhongwen Xu, Gao Huang, Shuicheng Yan

BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning

Online interactions with the environment to collect data samples for training a Reinforcement Learning (RL) agent are not always feasible due to economic and safety concerns. The goal of Offline Reinforcement Learning is to address this…

Machine Learning · Computer Science · 2021-10-05 · Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

We consider the problem of offline reinforcement learning (RL) -- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as…

Machine Learning · Computer Science · 2021-02-04 · Ming Yin, Yu Bai, Yu-Xiang Wang

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science · 2026-01-06 · Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally…

Machine Learning · Computer Science · 2024-05-30 · Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

Offline Reinforcement Learning with Behavioral Supervisor Tuning

Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one…

Machine Learning · Computer Science · 2024-07-30 · Padmanaba Srinivasan, William Knottenbelt

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the potential to transfer the successes of reinforcement learning (RL) to domains where data collection is acutely problematic. In this offline setting, a key challenge is…

Machine Learning · Computer Science · 2022-11-23 · Alex Beeson, Giovanni Montana

IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning

Decision transformer based sequential policies have emerged as a powerful paradigm in offline reinforcement learning (RL), yet their efficacy remains constrained by the quality of static datasets and inherent architectural limitations.…

Machine Learning · Computer Science · 2026-03-05 · Yihao Qin, Yuanfei Wang, Hang Zhou, Peiran Liu, Hao Dong, Yiding Ji

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of…

Machine Learning · Computer Science · 2021-06-22 · Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

Policy Constraint by Only Support Constraint for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to optimize a policy by using pre-collected datasets, to maximize cumulative rewards. However, offline reinforcement learning suffers challenges due to the distributional shift between the learned…

Machine Learning · Computer Science · 2025-03-10 · Yunkai Gao, Jiaming Guo, Fan Wu, Rui Zhang

Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data

Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robots should perform well in all of the…

Robotics · Computer Science · 2022-08-19 · Wenxuan Zhou, Steven Bohez, Jan Humplik, Abbas Abdolmaleki, Dushyant Rao, Markus Wulfmeier, Tuomas Haarnoja, Nicolas Heess

Reducing Conservativeness Oriented Offline Reinforcement Learning

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards a conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science · 2021-03-02 · Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

OVD: On-policy Verbal Distillation

Knowledge distillation offers a promising path to transfer reasoning capabilities from large teacher models to efficient student models; however, existing token-level on-policy distillation methods require token-level alignment between the…

Computation and Language · Computer Science · 2026-01-30 · Jing Xiong, Hui Shen, Shansan Gong, Yuxin Cheng, Jianghan Shen, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Ngai Wong

ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation

Offline reinforcement learning (RL) aims to learn the optimal policy from a fixed dataset generated by behavior policies without additional environment interactions. One common challenge that arises in this setting is the…

Machine Learning · Computer Science · 2026-02-06 · Songyuan Zhang, Oswin So, H. M. Sabbir Ahmad, Eric Yang Yu, Matthew Cleaveland, Mitchell Black, Chuchu Fan

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science · 2024-05-30 · Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward…

Machine Learning · Computer Science · 2025-12-23 · Gaurav Chaudhary, Laxmidhar Behera

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning

Behavior regularization, which constrains the policy to stay close to some behavior policy, is widely used in offline reinforcement learning (RL) to manage the risk of hazardous exploitation of unseen actions. Nevertheless, existing…

Machine Learning · Computer Science · 2025-05-30 · Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang