Related papers: RAPID: An Efficient Reinforcement Learning Algorit…

Effective Reinforcement Learning for Reasoning in Language Models

Reinforcement learning (RL) has emerged as a promising strategy for improving the reasoning capabilities of language models (LMs) in domains such as mathematics and coding. However, most modern RL algorithms were designed to target robotics…

Artificial Intelligence · Computer Science · 2025-05-26 · Lianghuan Huang, Shuo Li, Sagnik Anupam, Insup Lee, Osbert Bastani

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science · 2025-11-05 · Daman Arora, Andrea Zanette

SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce…

Machine Learning · Computer Science · 2026-03-06 · Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

A Minimalist Approach to Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing…

Machine Learning · Computer Science · 2021-12-06 · Scott Fujimoto, Shixiang Shane Gu

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing…

Machine Learning · Statistics · 2023-01-06 · Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require…

Robotics · Computer Science · 2024-10-22 · Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Xingyu Zhao, Meng Fang, Xinping Yi, Xiaowei Huang

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

We present the LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm in order to encode high-level knowledge into reinforcement learning using an automaton to expedite the reinforcement…

Machine Learning · Computer Science · 2024-02-13 · Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning

Test-time scaling methods have seen a rapid increase in popularity for their computational efficiency and parameter-independent training to improve reasoning performance on Large Language Models. One such method is called budget forcing, a…

Artificial Intelligence · Computer Science · 2025-10-27 · Ravindra Aribowo Tarunokusumo, Rafael Fernandes Cunha

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

Offline reinforcement learning (RL) is a variant of RL where the policy is learned from a previously collected dataset of trajectories and rewards. In our work, we propose a practical approach to offline RL with large language models…

Computation and Language · Computer Science · 2026-02-17 · Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, Jayakumar Subramanian, Branislav Kveton

RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning

Reinforcement learning (RL) is central to improving reasoning in large language models (LLMs) but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but relies…

Machine Learning · Computer Science · 2025-10-06 · Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile

Improving RL Exploration for LLM Reasoning through Retrospective Replay

Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex…

Machine Learning · Computer Science · 2025-07-08 · Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, limiting accessibility for resource-constrained settings. Our study investigates the potential…

Machine Learning · Computer Science · 2026-01-21 · Quy-Anh Dang, Chris Ngo

Searching for Plannable Domains can Speed up Reinforcement Learning

Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal…

Artificial Intelligence · Computer Science · 2007-05-23 · Istvan Szita, Balint Takacs, Andras Lorincz

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science · 2023-05-29 · Cevahir Koprulu, Ufuk Topcu

SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning

Large language models are increasingly used for complex reasoning tasks where high-quality offline data such as expert-annotated solutions and distilled reasoning traces are often available. However, in environments with sparse rewards,…

Artificial Intelligence · Computer Science · 2025-08-11 · Yihao Liu, Shuocheng Li, Lang Cao, Yuhang Xie, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires…

Computation and Language · Computer Science · 2019-07-31 · Yang Gao, Christian M. Meyer, Mohsen Mesgar, Iryna Gurevych

Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks

Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where the rewards are sparse. Some recent approaches have proposed to specify reward functions as manually designed or learned reward…

Machine Learning · Computer Science · 2024-01-26 · Shuai Han, Mehdi Dastani, Shihan Wang

LLMs Can Learn to Reason Via Off-Policy RL

Reinforcement learning (RL) approaches for Large Language Models (LLMs) frequently use on-policy algorithms, such as PPO or GRPO. However, policy lag from distributed training architectures and differences between the training and inference…

Machine Learning · Computer Science · 2026-03-03 · Daniel Ritter, Owen Oertell, Bradley Guo, Jonathan Chang, Kianté Brantley, Wen Sun

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy…

Machine Learning · Computer Science · 2025-12-04 · Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin

Boosting Offline Reinforcement Learning via Data Rebalancing

Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. To address this problem, existing works mainly focus on designing sophisticated algorithms to explicitly or implicitly…

Machine Learning · Computer Science · 2022-10-18 · Yang Yue, Bingyi Kang, Xiao Ma, Zhongwen Xu, Gao Huang, Shuicheng Yan