Related papers: AsyncFlow: An Asynchronous Streaming RL Framework …

DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training

Reinforcement learning (RL) has become the pivotal post-training technique for large language models (LLMs). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-10 Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Yinhui Lu, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi

Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination

Reinforcement learning (RL) post-training has become pivotal for enhancing the capabilities of modern large models. A recent trend is to develop RL systems with a fully disaggregated architecture, which decouples the three RL phases…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-21 Haoyang Li, Sheng Lin, Fangcheng Fu, Yuming Zhou, Xiaodong Ji, Yanfeng Zhao, Lefeng Wang, Jie Jiang, Bin Cui

Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training…

Machine Learning · Computer Science 2026-05-05 Jian Lu

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires supporting complex…

Machine Learning · Computer Science 2026-05-18 Haizhong Zheng, Yizhuo Di, Jiahui Wang, Shuowei Jin, Xueshen Liu, Yongji Wu, Z. Morley Mao, Ion Stoica, Jiawei Zhao, Beidi Chen

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation

Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). RL for LLMs involves two stages: generation and training. The LLM first generates samples online, which are then used to derive…

Machine Learning · Computer Science 2025-04-23 Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, Bingyang Wu, Nuo Chen, Yukun Chen, Yu Zhou, Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang

Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low…

Machine Learning · Computer Science 2025-10-14 Han Lu, Zichen Liu, Shaopan Xiong, Yancheng He, Wei Gao, Yanan Wu, Weixun Wang, Jiashun Liu, Yang Li, Haizhou Zhao, Ju Huang, Siran Yang, Xiaoyang Li, Yijia Luo, Zihe Liu, Ling Pan, Junchi Yan, Wei Wang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

LineFlow: A Framework to Learn Active Control of Production Lines

Many production lines require active control mechanisms, such as adaptive routing, worker reallocation, and rescheduling, to maintain optimal performance. However, designing these control systems is challenging for various reasons, and…

Machine Learning · Computer Science 2025-05-13 Kai Müller, Martin Wenzel, Tobias Windisch

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

Reinforcement Learning (RL) has become the most effective post-training approach for improving the capabilities of Large Language Models (LLMs). In practice, because of the high demands on latency and memory, it is particularly challenging…

Machine Learning · Computer Science 2025-06-03 Bo Wu, Sid Wang, Yunhao Tang, Jia Ding, Eryk Helenowski, Liang Tan, Tengyu Xu, Tushar Gowda, Zhengxing Chen, Chen Zhu, Xiaocheng Tang, Yundi Qian, Beibei Zhu, Rui Hou

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents…

Machine Learning · Computer Science 2025-07-28 Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang

HybridFlow: A Flexible and Efficient RLHF Framework

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes…

Machine Learning · Computer Science 2024-10-03 Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to…

Machine Learning · Computer Science 2025-12-30 Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Jiakun Du, Xiangyuan Wang, Xu Fu, Letong Shi, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang

Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics

Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant…

Machine Learning · Computer Science 2024-09-27 Wang Jia, Hang Xu

RL-VLA$^3$: A Flexible and Asynchronous Reinforcement Learning Framework for VLA Training

Reinforcement learning (RL) has emerged as a critical paradigm for post-training Vision-Language-Action (VLA) models, enabling embodied agents to adapt and improve through environmental interaction. However, existing RL frameworks for VLAs…

Artificial Intelligence · Computer Science 2026-04-08 Haoran Sun, Yongjian Guo, Zhong Guan, Shuai Di, Xiaodong Bai, Jing Long, Tianyun Zhao, Mingxi Luo, Hongke Zhao, Likang Wu, Xiaotie Deng, Xu Chu, Xi Xiao, Sheng Wen, Yicheng Gong, Junwu Xiong

Laminar: A Scalable Asynchronous RL Post-Training Framework

Reinforcement learning (RL) post-training for Large Language Models (LLMs) is now scaling to large clusters and running for extended durations to enhance model reasoning performance. However, the scalability of existing RL frameworks is…

Machine Learning · Computer Science 2025-10-15 Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, current RL pipelines often suffer from training inefficiencies…

Machine Learning · Computer Science 2026-03-04 Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai

SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling

We introduce SeamlessFlow, a server-based reinforcement learning (RL) framework that addresses two core challenges in industrial-scale RL: (1) decoupling RL training from the complex execution flow of agents; (2) maximizing GPU utilization…

Machine Learning · Computer Science 2025-08-18 Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Xiaojiang Zhang, Minglei Zhang, Jiarong Zhang, Wenhao Zhuang, Yuchen Cao, Wankang Bao, Haimo Li, Zheng Lin, Huiming Wang, Haoyang Huang, Zongxian Feng, Zizheng Zhan, Ken Deng, Wen Xiang, Huaixi Tang, Kun Wu, Mengtong Li, Mengfei Xie, Junyi Peng, Haotian Zhang, Bin Chen, Bing Yu

Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games and health care. However, training RL agents is very time consuming. Current implementations exhibit poor performance due to…

Machine Learning · Computer Science 2021-12-24 Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

Efficient Parallel Reinforcement Learning Framework using the Reactor Model

Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster generation of samples, estimation of values, and policy improvement. These computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-06 Jacky Kwok, Marten Lohstroh, Edward A. Lee

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Machine Learning · Computer Science 2025-12-19 Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, Meiyi Qiang, Yalin Feng, Tianyi Bai, Zewei Pan, Ziyi Guo, Yizhen Jiang, Jingwen Deng, Qijie You, Peichao Lai, Tianyu Guo, Chi Hsu Tsai, Hengyi Feng, Rui Hu, Wenkai Yu, Junbo Niu, Bohan Zeng, Ruichuan An, Lu Ma, Jihao Huang, Yaowei Zheng, Conghui He, Linpeng Tang, Bin Cui, Weinan E, Wentao Zhang

Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models

Vision-Language-Action (VLA) models based on flow matching have shown excellent performance in general-purpose robotic manipulation tasks. However, the action accuracy of these models on complex downstream tasks is unsatisfactory. One…

Robotics · Computer Science 2025-09-05 Hongyin Zhang, Shiyuan Zhang, Junxi Jin, Qixin Zeng, Yifan Qiao, Hongchao Lu, Donglin Wang