Related papers: Context Bootstrapped Reinforcement Learning

Coupled Variational Reinforcement Learning for Language Model General Reasoning

While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods address this limitation by utilizing the probabilities…

Computation and Language · Computer Science 2026-05-26 Xueru Wen, Jie Lou, Yanjiang Liu, Hongyu Lin, Ben He, Xianpei Han, Le Sun, Yaojie Lu, Debing Zhang

LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) by optimizing them against factual outcomes. However, this paradigm falters in long-context…

Computation and Language · Computer Science 2026-03-03 Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing

CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly important. Existing RAG-oriented reinforcement learning (RL) methods rely…

Computation and Language · Computer Science 2026-03-06 Zhehao Tan, Yihan Jiao, Dan Yang, Junjie Wang, Duolin Sun, Jie Feng, Xidong Wang, Lei Liu, Yue Shen, Jian Wang, Jinjie Gu

A Survey on Model-based Reinforcement Learning

Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process interacting with the environment. While RL achieves outstanding success in playing complex video games that allow huge trial-and-error,…

Machine Learning · Computer Science 2022-06-22 Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu

Contextualize Me -- The Case for Context in Reinforcement Learning

While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a…

Machine Learning · Computer Science 2023-06-05 Carolin Benjamins, Theresa Eimer, Frederik Schubert, Aditya Mohan, Sebastian Döhler, André Biedenkapp, Bodo Rosenhahn, Frank Hutter, Marius Lindauer

From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment…

Machine Learning · Computer Science 2026-05-22 Xitai Jiang, Zihan Tang, Wenze Lin, Yang Yue, Shenzhi Wang, Gao Huang

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL

We propose ContextRL, a novel framework that leverages context augmentation to overcome these bottlenecks. Specifically, to enhance Identifiability, we provide the reward model with full reference solutions as context, enabling fine-grained…

Machine Learning · Computer Science 2026-02-27 Xingyu Lu, Jinpeng Wang, YiFan Zhang, Shijie Ma, Xiao Hu, Tianke Zhang, Haonan fan, Kaiyu Jiang, Changyi Liu, Kaiyu Tang, Bin Wen, Fan Yang, Tingting Gao, Han Li, Chun Yuan

Imitation Bootstrapped Reinforcement Learning

Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations…

Machine Learning · Computer Science 2024-05-22 Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh

Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) improves reasoning in large language models but treats all correct solutions equally, potentially reinforcing flawed traces that get correct answers by chance. We observe that better…

Machine Learning · Computer Science 2026-03-11 Tiehua Mei, Minxuan Lv, Leiyu Pan, Zhenpeng Su, Hongru Hou, Hengrui Chen, Ao Xu, Deqing Yang

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods towards forecasting future real-world…

Machine Learning · Computer Science 2025-12-02 Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger

Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for enhancing the reasoning capabilities of Large Language Models (LLMs). Despite its efficacy, RLVR faces a meta-learning bottleneck: it lacks…

Machine Learning · Computer Science 2026-02-12 Shiting Huang, Zecheng Li, Yu Zeng, Qingnan Ren, Zhen Fang, Qisheng Su, Kou Shi, Lin Chen, Zehui Chen, Feng Zhao

Solving Robotics Tasks with Prior Demonstration via Exploration-Efficient Deep Reinforcement Learning

This paper proposes an exploration-efficient Deep Reinforcement Learning with Reference policy (DRLR) framework for learning robotics tasks that incorporates demonstrations. The DRLR framework is developed based on an algorithm called…

Robotics · Computer Science 2026-01-09 Chengyandan Shen, Christoffer Sloth

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and programming tasks. Similar to how…

Artificial Intelligence · Computer Science 2025-11-25 Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner:…

Machine Learning · Computer Science 2026-05-26 Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful learn-to-reason paradigm for large reasoning models to tackle complex tasks. However, the current RLVR paradigm is still not efficient enough, as it works in a…

Computation and Language · Computer Science 2026-03-10 Junjie Zhang, Guozheng Ma, Shunyu Liu, Haoyu Wang, Jiaxing Huang, Ting-En Lin, Fei Huang, Yongbin Li, Dacheng Tao

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key ingredient for unlocking complex reasoning capabilities in large language models. Recent work ProRL has shown promise in scaling RL by increasing the number of…

Machine Learning · Computer Science 2025-10-02 Jian Hu, Mingjie Liu, Ximing Lu, Fang Wu, Zaid Harchaoui, Shizhe Diao, Yejin Choi, Pavlo Molchanov, Jun Yang, Jan Kautz, Yi Dong

Beyond Correctness: Learning Robust Reasoning via Transfer

Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final answer correctness leaves a critical gap: it does not ensure the robustness of the reasoning process itself. We adopt a…

Machine Learning · Computer Science 2026-02-10 Hyunseok Lee, Soheil Abbasloo, Jihoon Tack, Jinwoo Shin

ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs

Reinforcement learning (RL) has become a standard paradigm for refining large language models (LLMs) beyond pre-training and instruction tuning. A prominent line of work is RL with verifiable rewards (RLVR), which leverages automatically…

Machine Learning · Computer Science 2025-09-23 Bonan Zhang, Zhongqi Chen, Bowen Song, Qinya Li, Fan Wu, Guihai Chen

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Recent advancements in long chain-of-thought (CoT) reasoning, particularly through the Group Relative Policy Optimization algorithm used by DeepSeek-R1, have led to significant interest in the potential of Reinforcement Learning with…

Artificial Intelligence · Computer Science 2025-10-03 Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

Reasoning Depth and Environment Complexity: A Controlled Study of RLVR Data Allocation across Logical Reasoning Tasks

Reinforcement learning with verifiable rewards (RLVR) has become central to post-training reasoning models, yet a key limitation of existing studies is their narrow view of the reasoning space: difficulty is treated as reasoning depth…

Computation and Language · Computer Science 2026-05-27 Yihua Zhu, Qianying Liu, Fei Cheng, Jiaxin Wang, Akiko Aizawa, Sadao Kurohashi, Hidetoshi Shimodaira