Related papers: Deployment-Efficient Reinforcement Learning via Mo…

Boosting Offline Reinforcement Learning via Data Rebalancing

Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. To address this problem, existing works mainly focus on designing sophisticated algorithms to explicitly or implicitly…

Machine Learning · Computer Science 2022-10-18 Yang Yue , Bingyi Kang , Xiao Ma , Zhongwen Xu , Gao Huang , Shuicheng Yan

Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets

The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be…

Information Theory · Computer Science 2023-11-21 Kun Yang , Cong Shen , Jing Yang , Shu-ping Yeh , Jerry Sydir

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall , Edwin Hamel-De le Court , Francesco Belardinelli

RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models

Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount…

Machine Learning · Computer Science 2025-10-07 Lianghuan Huang , Sagnik Anupam , Insup Lee , Shuo Li , Osbert Bastani

Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments

In modern ML Ops environments, model deployment is a critical process that traditionally relies on static heuristics such as validation error comparisons and A/B testing. However, these methods require human intervention to adapt to…

Machine Learning · Computer Science 2025-03-31 S. Aaron McClendon , Vishaal Venkatesh , Juan Morinelli

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

Sample efficiency is one of the most critical issues for online reinforcement learning (RL). Existing methods achieve higher sample efficiency by adopting model-based methods, Q-ensemble, or better exploration mechanisms. We, instead,…

Machine Learning · Computer Science 2023-05-31 Jiafei Lyu , Le Wan , Zongqing Lu , Xiu Li

Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning

This paper deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered:…

Machine Learning · Computer Science 2021-04-21 Tianyi Chen , Kaiqing Zhang , Georgios B. Giannakis , Tamer Başar

Efficient Online Reinforcement Learning with Offline Data

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human…

Machine Learning · Computer Science 2023-06-01 Philip J. Ball , Laura Smith , Ilya Kostrikov , Sergey Levine

A Unified Framework for Alternating Offline Model Training and Policy Learning

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without further interacting with the…

Machine Learning · Computer Science 2022-10-13 Shentao Yang , Shujian Zhang , Yihao Feng , Mingyuan Zhou

Safe Reinforcement Learning with Minimal Supervision

Reinforcement learning (RL) in the real world necessitates the development of procedures that enable agents to explore without causing harm to themselves or others. The most successful solutions to the problem of safe RL leverage offline…

Machine Learning · Computer Science 2025-01-09 Alexander Quessy , Thomas Richardson , Sebastian East

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu , Garrett Thomas , Lantao Yu , Stefano Ermon , James Zou , Sergey Levine , Chelsea Finn , Tengyu Ma

Deploying Offline Reinforcement Learning with Human Feedback

Reinforcement learning (RL) has shown promise for decision-making tasks in real-world applications. One practical framework involves training parameterized policy models from an offline dataset and subsequently deploying them in an online…

Machine Learning · Computer Science 2023-03-14 Ziniu Li , Ke Xu , Liu Liu , Lanqing Li , Deheng Ye , Peilin Zhao

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most of methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi , Zhengling Qi , Jianing Wang , Fan Zhou

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning…

Machine Learning · Computer Science 2023-10-13 Zhang-Wei Hong , Aviral Kumar , Sathwik Karnik , Abhishek Bhandwaldar , Akash Srivastava , Joni Pajarinen , Romain Laroche , Abhishek Gupta , Pulkit Agrawal

Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most of existing policy optimization algorithms in the computer science literature are…

Machine Learning · Statistics 2022-07-28 Chengchun Shi , Shikai Luo , Yuan Le , Hongtu Zhu , Rui Song

A Nonparametric Off-Policy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient…

Machine Learning · Computer Science 2020-08-04 Samuele Tosatto , Joao Carvalho , Hany Abdulsamad , Jan Peters

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected…

Machine Learning · Computer Science 2023-03-15 Han Zheng , Xufang Luo , Pengfei Wei , Xuan Song , Dongsheng Li , Jing Jiang

Online Robust Reinforcement Learning with General Function Approximation

In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL)…

Machine Learning · Computer Science 2026-03-05 Debamita Ghosh , George K. Atia , Yue Wang

A Workflow for Offline Model-Free Robotic Reinforcement Learning

Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. This can allow robots to acquire generalizable skills from large and diverse datasets, without any…

Machine Learning · Computer Science 2021-09-24 Aviral Kumar , Anikait Singh , Stephen Tian , Chelsea Finn , Sergey Levine

Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions

One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…

Machine Learning · Computer Science 2024-06-19 Noah Golowich , Ankur Moitra