Related papers: WALL-E: An Efficient Reinforcement Learning Resear…

EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm so that LLMs operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-08 · Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao

Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks

Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where the rewards are sparse. Some recent approaches have proposed to specify reward functions as manually designed or learned reward…

Machine Learning · Computer Science · 2024-01-26 · Shuai Han, Mehdi Dastani, Shihan Wang

Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review

Reinforcement learning (RL) is a sub-domain of machine learning, mainly concerned with solving sequential decision-making problems by a learning agent that interacts with the decision environment to improve its behavior through the reward…

Machine Learning · Computer Science · 2025-09-23 · Hossein Hassani, Ehsan Hallaji, Roozbeh Razavi-Far, Mehrdad Saif, Liang Lin

Spreeze: High-Throughput Parallel Reinforcement Learning Framework

The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively…

Machine Learning · Computer Science · 2023-12-12 · Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation. However, RL training efficiency is often…

Machine Learning · Computer Science · 2026-03-25 · Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li, Yuqing Yang, Lili Qiu, Yang You

Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling

Test-Time Scaling enhances the reasoning capabilities of Large Language Models by allocating additional inference compute to broaden the exploration of the solution space. However, existing search strategies typically treat rollouts as…

Computation and Language · Computer Science · 2026-05-06 · Xinglin Wang, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Yiwei Li, Yueqi Zhang, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

Meta-Reinforcement Learning via Exploratory Task Clustering

Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. However, previous studies often assume a single-mode homogeneous task distribution, ignoring possible structured heterogeneity…

Machine Learning · Computer Science · 2023-02-17 · Zhendong Chu, Hongning Wang

Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL

Group-relative RL training (GRPO) samples a small group of parallel rollouts for every training prompt and uses their within-group reward spread to compute per-trajectory advantages. In agentic environments each rollout is a long multi-turn…

Machine Learning · Computer Science · 2026-05-08 · Zhiyuan Zhai, Xin Wang

Hierarchical Reinforcement Learning with Hindsight

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse. We introduce a solution that enables agents to learn temporally extended actions at multiple levels of abstraction in a…

Machine Learning · Computer Science · 2019-03-11 · Andrew Levy, Robert Platt, Kate Saenko

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective,…

Machine Learning · Computer Science · 2025-06-09 · Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, Tianyuan Wu, Yanan Wu, Yuheng Zhao, Shuaibing Zhao, Jin Yang, Siran Yang, Yingshui Tan, Huimin Yi, Yuchi Xu, Yujin Yuan, Xingyao Zhang, Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng

ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers…

Machine Learning · Computer Science · 2026-05-27 · Jingwei Song, Meng Chen, Jie Xiao, Qingnan Ren, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Zhisheng Chen, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Lynn Ai, Eric Yang, Tianyu Shi

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, current RL pipelines often suffer from training inefficiencies…

Machine Learning · Computer Science · 2026-03-04 · Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai

Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies

Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination…

Machine Learning · Computer Science · 2025-12-19 · Felix Chalumeau, Daniel Rajaonarivonivelomanantsoa, Ruan de Kock, Claude Formanek, Sasha Abramowitz, Oumayma Mahjoub, Wiem Khlifi, Simon Du Toit, Louay Ben Nessir, Refiloe Shabe, Noah De Nicola, Arnol Fokam, Siddarth Singh, Ulrich Mbou Sob, Arnu Pretorius

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains…

Computation and Language · Computer Science · 2025-09-15 · Tong Zheng, Hongming Zhang, Wenhao Yu, Xiaoyang Wang, Runpeng Dai, Rui Liu, Huiwen Bao, Chengsong Huang, Heng Huang, Dong Yu

Reinforcement Learning for Machine Learning Engineering Agents

Existing agents for solving tasks such as ML engineering rely on prompting powerful language models. As a result, these agents do not improve with more experience. In this paper, we show that agents backed by weaker models that improve via…

Machine Learning · Computer Science · 2025-09-04 · Sherry Yang, Joy He-Yueya, Percy Liang

A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control

Reinforcement Learning (RL) is a method for learning decision-making tasks that could enable robots to learn and adapt to their situation on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few…

Artificial Intelligence · Computer Science · 2015-03-19 · Todd Hester, Michael Quinlan, Peter Stone

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring…

Machine Learning · Computer Science · 2024-06-03 · Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, Ming Jin

Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks

This paper considers a class of reinforcement learning problems, which involve systems with two types of states: stochastic and pseudo-stochastic. In such systems, stochastic states follow a stochastic transition kernel while the…

Machine Learning · Computer Science · 2023-11-09 · Honghao Wei, Xin Liu, Weina Wang, Lei Ying

Human-Inspired Framework to Accelerate Reinforcement Learning

Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency, particularly in real-world scenarios with costly physical interactions. This paper introduces a novel human-inspired framework to…

Machine Learning · Computer Science · 2024-03-13 · Ali Beikmohammadi, Sindri Magnússon

Analysis of Reinforcement Learning for determining task replication in workflows

Executing workflows on volunteer computing resources where individual tasks may be forced to relinquish their resource for the resource's primary use leads to unpredictability and often significantly increases execution time. Task…

Performance · Computer Science · 2022-09-28 · Andrew Stephen McGough, Matthew Forshaw