
Related papers: AstraFlow: Dataflow-Oriented Reinforcement Learnin…

200 papers

Reinforcement learning (RL) has become the pivotal post-training technique for large language models (LLMs). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-10 Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Yinhui Lu, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi

Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised…

Artificial Intelligence · Computer Science 2025-07-22 Renxi Wang, Rifo Ahmad Genadi, Bilal El Bouardi, Yongxin Wang, Fajri Koto, Zhengzhong Liu, Timothy Baldwin, Haonan Li

Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on…

Recent advances in large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions. However, applying reinforcement learning (RL) to train LLM agents in multi-turn,…

Artificial Intelligence · Computer Science 2025-10-07 Hanchen Zhang, Xiao Liu, Bowen Lv, Xueqiao Sun, Bohao Jing, Iat Long Iong, Zhenyu Hou, Zehan Qi, Hanyu Lai, Yifan Xu, Rui Lu, Hongning Wang, Jie Tang, Yuxiao Dong

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this…

Artificial Intelligence · Computer Science 2025-10-08 Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL…

Recent advancements in reinforcement learning (RL) demonstrate the significant potential in autonomous driving. Despite this promise, challenges such as the manual design of reward functions and low sample efficiency in complex environments…

Robotics · Computer Science 2025-01-10 Zengqi Peng, Yubin Wang, Xu Han, Lei Zheng, Jun Ma

Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly…

Artificial Intelligence · Computer Science 2026-05-18 Fangming Cui, Ruixiao Zhu, Cheng Fang, Sunan Li, Jiahong Li

Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm to LLMs that operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-08 Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao

Agentic reinforcement learning (RL) has emerged as a transformative workload in cloud clusters, enabling large language models (LLMs) to solve complex problems through interactions with the real world. However, unlike traditional RL, agentic RL…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-16 Bangjun Xiao, Yihao Zhao, Xiangwei Deng, Shihua Yu, Yuxing Xiang, Huaqiu Liu, Qiying Wang, Liang Zhao, Hailin Zhang, Xuanzhe Liu, Xin Jin, Fuli Luo

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges…

Machine Learning · Computer Science 2021-11-01 Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization. Process reward modeling offers an alternative but incurs high computational costs, reward hacking…

Artificial Intelligence · Computer Science 2026-05-29 Xiao Feng, Bo Han, Zhanke Zhou, Jiaqi Fan, Jiangchao Yao, Ka Ho Li, Dahai Yu, Michael Kwok-Po Ng

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low…

Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents…

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes…

Machine Learning · Computer Science 2024-10-03 Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to…

Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can…

Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future…

Machine Learning · Computer Science 2026-05-05 Bokai Pan, Mingyue Cheng, Zhiding Liu, Shuo Yu, Xiaoyu Tao, Yuchong Wu, Qi Liu, Defu Lian, Enhong Chen

Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive…

Computation and Language · Computer Science 2025-10-14 Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Arup Kumar Sarker, Mills Staylor, Aymen Alsaadi, Gregor von Laszewski, Shantenu Jha, Geoffrey Fox