Related papers: Execution-Verified Reinforcement Learning for Opti…

200 papers

Vision-Language-Action (VLA) models have become a prominent paradigm for embodied intelligence, yet further performance improvements typically rely on scaling up training data and model size -- an approach that is prohibitively expensive…

Robotics · Computer Science 2025-10-15 Mingtong Dai, Lingbo Liu, Yongjie Bai, Yang Liu, Zhouxia Wang, Rui SU, Chunjie Chen, Liang Lin, Xinyu Wu

Recent work on reinforcement learning with verifiable rewards (RLVR) has shown that large language models (LLMs) can be substantially improved using outcome-level verification signals, such as unit tests for code or exact-match checks for…

Computation and Language · Computer Science 2026-01-27 Massimiliano Pronesti, Anya Belz, Yufang Hou

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data. Most existing offline RL algorithms use regularization or constraints to suppress extrapolation…

Machine Learning · Computer Science 2021-10-20 Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

Reinforcement learning with verifiable reward (RLVR) has become a promising paradigm for post-training large language models (LLMs) to improve their reasoning capability. However, when the rollout accuracy is low on hard problems, the…

Machine Learning · Computer Science 2026-04-21 Huanyu Liu, Jia Li, Yihong Dong, Chang Yu, Taozhi Chen, Lecheng Wang, Yongding Tao, Bin Gu, Ge Li

We propose Reinforcement Learning with Explicit Human Values (RLEV), a method that aligns Large Language Model (LLM) optimization directly with quantifiable human value signals. While Reinforcement Learning with Verifiable Rewards (RLVR)…

Machine Learning · Computer Science 2025-10-24 Dian Yu, Yulai Zhao, Kishan Panaganti, Linfeng Song, Haitao Mi, Dong Yu

Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy…

Artificial Intelligence · Computer Science 2026-01-19 Hongye Cao, Zhixin Bai, Ziyue Peng, Boyan Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao

Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. However, existing methods rely heavily on costly human-annotated training…

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code…

Computation and Language · Computer Science 2026-04-07 Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

Code LLMs still struggle with code execution reasoning, especially in smaller models. Existing methods rely on supervised fine-tuning (SFT) with teacher-generated explanations, primarily in two forms: (1) input-output (I/O) prediction…

Software Engineering · Computer Science 2026-03-13 Lingxiao Tang, He Ye, Zhaoyang Chu, Muyang Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao

Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Omkar Thawakar, Shravan Venkatraman, Ritesh Thawkar, Abdelrahman Shaker, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Khan

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated great potential in enhancing the reasoning capabilities of large language models (LLMs). However, its success has thus far been largely confined to the mathematical and…

Artificial Intelligence · Computer Science 2026-02-05 Mengyu Zhang, Siyu Ding, Weichong Yin, Yu Sun, Hua Wu

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box judges: a low step score may reflect a…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Junxin Wang, Dai Guan, Weijie Qiu, Zhihang Li, Yongbo Gai, Zhengyi Yang, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models) to produce reward signals. Each…

Artificial Intelligence · Computer Science 2026-05-06 Shuyue Stella Li, Rui Xin, Teng Xiao, Yike Wang, Rulin Shao, Zoey Hao, Melanie Sclar, Sewoong Oh, Faeze Brahman, Pang Wei Koh, Yulia Tsvetkov

While Large Language Models (LLMs) have demonstrated strong math reasoning abilities through Reinforcement Learning with *Verifiable Rewards* (RLVR), many advanced mathematical problems are proof-based, with no guaranteed way to determine…

Computation and Language · Computer Science 2026-02-20 Haotong Yang, Zitong Wang, Shijia Kang, Siqi Yang, Wenkai Yu, Xu Niu, Yike Sun, Yi Hu, Zhouchen Lin, Muhan Zhang

Reinforcement learning with verifiable rewards (RLVR) has improved the reasoning ability of large language models, yet training remains costly because many rollouts contribute little to optimization, considering the amount of computation…

Machine Learning · Computer Science 2026-02-20 Yan Sun, Jia Guo, Stanley Kok, Zihao Wang, Zujie Wen, Zhiqiang Zhang

Self-evolution of multimodal large language models (MLLMs) remains a critical challenge: pseudo-label-based methods suffer from progressive quality degradation as model predictions drift, while template-based methods are confined to a…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Yongrui Heng, Chaoya Jiang, Han Yang, Shikun Zhang, Wei Ye

Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. While reinforcement learning from verifiable rewards (RLVR) has advanced in other…

Optimization modeling is fundamental to decision-making across diverse domains. Despite progress in automating optimization formulation from natural language descriptions, Large Language Models (LLMs) often struggle to generate formally…

Artificial Intelligence · Computer Science 2025-12-23 Yitian Chen, Jingfan Xia, Siyu Shao, Dongdong Ge, Yinyu Ye

Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is…

Computation and Language · Computer Science 2026-01-22 Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candès, Diyi Yang, Tatsunori Hashimoto

Recent advances in large reasoning models have leveraged reinforcement learning with verifiable rewards (RLVR) to improve reasoning capabilities. However, scaling these methods typically requires extensive rollout computation and large…

Machine Learning · Computer Science 2025-09-03 Xinyu Tang, Zhenduo Zhang, Yurou Liu, Wayne Xin Zhao, Zujie Wen, Zhiqiang Zhang, Jun Zhou