Related papers: AsyncFlow: An Asynchronous Streaming RL Framework …

200 papers

Reinforcement learning (RL) has become the pivotal post-training technique for large language models (LLMs). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-10 Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Yinhui Lu, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi

Reinforcement learning (RL) post-training has become pivotal for enhancing the capabilities of modern large models. A recent trend is to develop RL systems with a fully disaggregated architecture, which decouples the three RL phases…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-21 Haoyang Li, Sheng Lin, Fangcheng Fu, Yuming Zhou, Xiaodong Ji, Yanfeng Zhao, Lefeng Wang, Jie Jiang, Bin Cui

Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training…

Machine Learning · Computer Science 2026-05-05 Jian Lu

Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires supporting complex…

Machine Learning · Computer Science 2026-05-18 Haizhong Zheng, Yizhuo Di, Jiahui Wang, Shuowei Jin, Xueshen Liu, Yongji Wu, Z. Morley Mao, Ion Stoica, Jiawei Zhao, Beidi Chen

Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). RL for LLMs involves two stages: generation and training. The LLM first generates samples online, which are then used to derive…

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low…

Many production lines require active control mechanisms, such as adaptive routing, worker reallocation, and rescheduling, to maintain optimal performance. However, designing these control systems is challenging for various reasons, and…

Machine Learning · Computer Science 2025-05-13 Kai Müller, Martin Wenzel, Tobias Windisch

Reinforcement Learning (RL) has become the most effective post-training approach for improving the capabilities of Large Language Models (LLMs). In practice, because of the high demands on latency and memory, it is particularly challenging…

Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents…

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes…

Machine Learning · Computer Science 2024-10-03 Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to…

Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant…

Machine Learning · Computer Science 2024-09-27 Wang Jia, Hang Xu

Reinforcement learning (RL) has emerged as a critical paradigm for post-training Vision-Language-Action (VLA) models, enabling embodied agents to adapt and improve through environmental interaction. However, existing RL frameworks for VLAs…

Reinforcement learning (RL) post-training for Large Language Models (LLMs) is now scaling to large clusters and running for extended durations to enhance model reasoning performance. However, the scalability of existing RL frameworks is…

Machine Learning · Computer Science 2025-10-15 Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, current RL pipelines often suffer from training inefficiencies…

Machine Learning · Computer Science 2026-03-04 Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai

We introduce SeamlessFlow, a server-based reinforcement learning (RL) framework that addresses two core challenges in industrial-scale RL: (1) decoupling RL training from the complex execution flow of agents; (2) maximizing GPU utilization…

Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, and health care. However, training RL agents is very time-consuming. Current implementations exhibit poor performance due to…

Machine Learning · Computer Science 2021-12-24 Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster generation of samples, estimation of values, and policy improvement. These computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-06 Jacky Kwok, Marten Lohstroh, Edward A. Lee

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Vision-Language-Action (VLA) models based on flow matching have shown excellent performance in general-purpose robotic manipulation tasks. However, the action accuracy of these models on complex downstream tasks is unsatisfactory. One…

Robotics · Computer Science 2025-09-05 Hongyin Zhang, Shiyuan Zhang, Junxi Jin, Qixin Zeng, Yifan Qiao, Hongchao Lu, Donglin Wang