Related papers: RAPID: An Efficient Reinforcement Learning Algorit…

200 papers

Reinforcement learning (RL) has emerged as a promising strategy for improving the reasoning capabilities of language models (LMs) in domains such as mathematics and coding. However, most modern RL algorithms were designed to target robotics…

Artificial Intelligence · Computer Science 2025-05-26 Lianghuan Huang, Shuo Li, Sagnik Anupam, Insup Lee, Osbert Bastani

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science 2025-11-05 Daman Arora, Andrea Zanette

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce…

Machine Learning · Computer Science 2026-03-06 Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing…

Machine Learning · Computer Science 2021-12-06 Scott Fujimoto, Shixiang Shane Gu

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require…

Robotics · Computer Science 2024-10-22 Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Xingyu Zhao, Meng Fang, Xinping Yi, Xiaowei Huang

We present the LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm, which encodes high-level knowledge into reinforcement learning using an automaton to expedite the reinforcement…

Machine Learning · Computer Science 2024-02-13 Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

Test-time scaling methods have seen a rapid increase in popularity for their computational efficiency and parameter-independent training to improve the reasoning performance of Large Language Models. One such method is called budget forcing, a…

Artificial Intelligence · Computer Science 2025-10-27 Ravindra Aribowo Tarunokusumo, Rafael Fernandes Cunha

Offline reinforcement learning (RL) is a variant of RL where the policy is learned from a previously collected dataset of trajectories and rewards. In our work, we propose a practical approach to offline RL with large language models…

Reinforcement learning (RL) is central to improving reasoning in large language models (LLMs) but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but relies…

Machine Learning · Computer Science 2025-10-06 Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile

Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex…

Machine Learning · Computer Science 2025-07-08 Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, limiting accessibility for resource-constrained settings. Our study investigates the potential…

Machine Learning · Computer Science 2026-01-21 Quy-Anh Dang, Chris Ngo

Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal…

Artificial Intelligence · Computer Science 2007-05-23 Istvan Szita, Balint Takacs, Andras Lorincz

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science 2023-05-29 Cevahir Koprulu, Ufuk Topcu

Large language models are increasingly used for complex reasoning tasks where high-quality offline data such as expert-annotated solutions and distilled reasoning traces are often available. However, in environments with sparse rewards,…

Artificial Intelligence · Computer Science 2025-08-11 Yihao Liu, Shuocheng Li, Lang Cao, Yuhang Xie, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires…

Computation and Language · Computer Science 2019-07-31 Yang Gao, Christian M. Meyer, Mohsen Mesgar, Iryna Gurevych

Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where the rewards are sparse. Some recent approaches have proposed to specify reward functions as manually designed or learned reward…

Machine Learning · Computer Science 2024-01-26 Shuai Han, Mehdi Dastani, Shihan Wang

Reinforcement learning (RL) approaches for Large Language Models (LLMs) frequently use on-policy algorithms, such as PPO or GRPO. However, policy lag from distributed training architectures and differences between the training and inference…

Machine Learning · Computer Science 2026-03-03 Daniel Ritter, Owen Oertell, Bradley Guo, Jonathan Chang, Kianté Brantley, Wen Sun

This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy…

Machine Learning · Computer Science 2025-12-04 Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin

Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. To address this problem, existing works mainly focus on designing sophisticated algorithms to explicitly or implicitly…

Machine Learning · Computer Science 2022-10-18 Yang Yue, Bingyi Kang, Xiao Ma, Zhongwen Xu, Gao Huang, Shuicheng Yan