Related papers: Context Bootstrapped Reinforcement Learning

200 papers

While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods address this limitation by utilizing the probabilities…

Computation and Language · Computer Science 2026-05-26 Xueru Wen, Jie Lou, Yanjiang Liu, Hongyu Lin, Ben He, Xianpei Han, Le Sun, Yaojie Lu, Debing Zhang

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) by optimizing them against factual outcomes. However, this paradigm falters in long-context…

Computation and Language · Computer Science 2026-03-03 Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing

With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly important. Existing RAG-oriented reinforcement learning (RL) methods rely…

Computation and Language · Computer Science 2026-03-06 Zhehao Tan, Yihan Jiao, Dan Yang, Junjie Wang, Duolin Sun, Jie Feng, Xidong Wang, Lei Liu, Yue Shen, Jian Wang, Jinjie Gu

Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process interacting with the environment. While RL achieves outstanding success in playing complex video games that allow huge trial-and-error,…

Machine Learning · Computer Science 2022-06-22 Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu

While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a…

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment…

Machine Learning · Computer Science 2026-05-22 Xitai Jiang, Zihan Tang, Wenze Lin, Yang Yue, Shenzhi Wang, Gao Huang

We propose ContextRL, a novel framework that leverages context augmentation to overcome these bottlenecks. Specifically, to enhance Identifiability, we provide the reward model with full reference solutions as context, enabling fine-grained…

Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations…

Machine Learning · Computer Science 2024-05-22 Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh

Reinforcement Learning with Verifiable Rewards (RLVR) improves reasoning in large language models but treats all correct solutions equally, potentially reinforcing flawed traces that get correct answers by chance. We observe that better…

Machine Learning · Computer Science 2026-03-11 Tiehua Mei, Minxuan Lv, Leiyu Pan, Zhenpeng Su, Hongru Hou, Hengrui Chen, Ao Xu, Deqing Yang

Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods towards forecasting future real-world…

Machine Learning · Computer Science 2025-12-02 Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for enhancing the reasoning capabilities of Large Language Models (LLMs). Despite its efficacy, RLVR faces a meta-learning bottleneck: it lacks…

Machine Learning · Computer Science 2026-02-12 Shiting Huang, Zecheng Li, Yu Zeng, Qingnan Ren, Zhen Fang, Qisheng Su, Kou Shi, Lin Chen, Zehui Chen, Feng Zhao

This paper proposes an exploration-efficient Deep Reinforcement Learning with Reference policy (DRLR) framework for learning robotics tasks that incorporates demonstrations. The DRLR framework is developed based on an algorithm called…

Robotics · Computer Science 2026-01-09 Chengyandan Shen, Christoffer Sloth

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and programming tasks. Similar to how…

Artificial Intelligence · Computer Science 2025-11-25 Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner:…

Machine Learning · Computer Science 2026-05-26 Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful learn-to-reason paradigm for large reasoning models to tackle complex tasks. However, the current RLVR paradigm is still not efficient enough, as it works in a…

Computation and Language · Computer Science 2026-03-10 Junjie Zhang, Guozheng Ma, Shunyu Liu, Haoyu Wang, Jiaxing Huang, Ting-En Lin, Fei Huang, Yongbin Li, Dacheng Tao

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key ingredient for unlocking complex reasoning capabilities in large language models. Recent work ProRL has shown promise in scaling RL by increasing the number of…

Machine Learning · Computer Science 2025-10-02 Jian Hu, Mingjie Liu, Ximing Lu, Fang Wu, Zaid Harchaoui, Shizhe Diao, Yejin Choi, Pavlo Molchanov, Jun Yang, Jan Kautz, Yi Dong

Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final answer correctness leaves a critical gap: it does not ensure the robustness of the reasoning process itself. We adopt a…

Machine Learning · Computer Science 2026-02-10 Hyunseok Lee, Soheil Abbasloo, Jihoon Tack, Jinwoo Shin

Reinforcement learning (RL) has become a standard paradigm for refining large language models (LLMs) beyond pre-training and instruction tuning. A prominent line of work is RL with verifiable rewards (RLVR), which leverages automatically…

Machine Learning · Computer Science 2025-09-23 Bonan Zhang, Zhongqi Chen, Bowen Song, Qinya Li, Fan Wu, Guihai Chen

Recent advancements in long chain-of-thought (CoT) reasoning, particularly through the Group Relative Policy Optimization algorithm used by DeepSeek-R1, have led to significant interest in the potential of Reinforcement Learning with…

Artificial Intelligence · Computer Science 2025-10-03 Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

Reinforcement learning with verifiable rewards (RLVR) has become central to post-training reasoning models, yet a key limitation of existing studies is their narrow view of the reasoning space: difficulty is treated as reasoning depth…

Computation and Language · Computer Science 2026-05-27 Yihua Zhu, Qianying Liu, Fei Cheng, Jiaxin Wang, Akiko Aizawa, Sadao Kurohashi, Hidetoshi Shimodaira