Related papers: AstraFlow: Dataflow-Oriented Reinforcement Learnin…

DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training

Reinforcement learning (RL) has become the pivotal post-training technique for large language models (LLMs). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-09-10 · Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Yinhui Lu, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi

AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents

Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised…

Artificial Intelligence · Computer Science · 2025-07-22 · Renxi Wang, Rifo Ahmad Genadi, Bilal El Bouardi, Yongxin Wang, Fajri Koto, Zhengzhong Liu, Timothy Baldwin, Haonan Li

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on…

Computation and Language · Computer Science · 2026-02-02 · Xiaoyu Tian, Haotian Wang, Shuaiting Chen, Hao Zhou, Kaichi Yu, Yudian Zhang, Jade Ouyang, Junxi Yin, Jiong Chen, Baoyan Guo, Lei Zhang, Junjie Tao, Yuansheng Song, Ming Cui, Chengwei Liu

AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework

Recent advances in large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions. However, applying reinforcement learning (RL) to train LLM agents in multi-turn,…

Artificial Intelligence · Computer Science · 2025-10-07 · Hanchen Zhang, Xiao Liu, Bowen Lv, Xueqiao Sun, Bohao Jing, Iat Long Iong, Zhenyu Hou, Zehan Qi, Hanyu Lai, Yifan Xu, Rui Lu, Hongning Wang, Jie Tang, Yuxiao Dong

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this…

Artificial Intelligence · Computer Science · 2025-10-08 · Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL…

Machine Learning · Computer Science · 2025-07-03 · Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models

Recent advancements in reinforcement learning (RL) demonstrate significant potential in autonomous driving. Despite this promise, challenges such as the manual design of reward functions and low sample efficiency in complex environments…

Robotics · Computer Science · 2025-01-10 · Zengqi Peng, Yubin Wang, Xu Han, Lei Zheng, Jun Ma

Rethinking Agentic Reinforcement Learning In Large Language Models

Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly…

Artificial Intelligence · Computer Science · 2026-05-18 · Fangming Cui, Ruixiao Zhu, Cheng Fang, Sunan Li, Jiahong Li

EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm to operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-08 · Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

Agentic reinforcement learning (RL) has emerged as a transformative workload in cloud clusters, enabling large language models (LLMs) to solve complex problems through interactions with the real world. However, unlike traditional RL, agentic RL…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-16 · Bangjun Xiao, Yihao Zhao, Xiangwei Deng, Shihua Yu, Yuxing Xiang, Huaqiu Liu, Qiying Wang, Liang Zhao, Hailin Zhang, Xuanzhe Liu, Xin Jin, Fuli Luo

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges…

Machine Learning · Computer Science · 2021-11-01 · Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization. Process reward modeling offers an alternative but incurs high computational costs, reward hacking…

Artificial Intelligence · Computer Science · 2026-05-29 · Xiao Feng, Bo Han, Zhanke Zhou, Jiaqi Fan, Jiangchao Yao, Ka Ho Li, Dahai Yu, Michael Kwok-Po Ng

Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low…

Machine Learning · Computer Science · 2025-10-14 · Han Lu, Zichen Liu, Shaopan Xiong, Yancheng He, Wei Gao, Yanan Wu, Weixun Wang, Jiashun Liu, Yang Li, Haizhou Zhao, Ju Huang, Siran Yang, Xiaoyang Li, Yijia Luo, Zihe Liu, Ling Pan, Junchi Yan, Wei Wang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents…

Machine Learning · Computer Science · 2025-07-28 · Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang

HybridFlow: A Flexible and Efficient RLHF Framework

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes…

Machine Learning · Computer Science · 2024-10-03 · Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to…

Machine Learning · Computer Science · 2025-12-30 · Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Jiakun Du, Xiangyuan Wang, Xu Fu, Letong Shi, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang

Agentic Reinforced Policy Optimization

Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can…

Machine Learning · Computer Science · 2025-07-29 · Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou

CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting

Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future…

Machine Learning · Computer Science · 2026-05-05 · Bokai Pan, Mingyue Cheng, Zhiding Liu, Shuo Yu, Xiaoyu Tao, Yuchong Wu, Qi Liu, Defu Lian, Enhong Chen

Demystifying Reinforcement Learning in Agentic Reasoning

Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive…

Computation and Language · Computer Science · 2025-10-14 · Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang

AAFLOW: Scalable Patterns for Agentic AI Workflows

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-05 · Arup Kumar Sarker, Mills Staylor, Aymen Alsaadi, Gregor von Laszewski, Shantenu Jha, Geoffrey Fox