Related papers: CVeDRL: An Efficient Code Verifier via Difficulty-…

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by…

Software Engineering · Computer Science 2026-04-23 Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li

VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning

Recent advancements in code generation have shown remarkable success across software domains, yet hardware description languages (HDLs) such as Verilog remain underexplored due to their concurrency semantics, syntactic rigidity, and…

Machine Learning · Computer Science 2025-08-27 Fu Teng, Miao Pan, Xuhong Zhang, Zhezhi He, Yiyao Yang, Xinyi Chai, Mengnan Qi, Liqiang Lu, Jianwei Yin

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and…

Machine Learning · Computer Science 2026-05-19 Xiao Zhu, Xinyu Zhou, Boyu Zhu, Hanxu Hu, Mingzhe Du, Haotian Zhang, Huiming Wang, Zhijiang Guo

ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs

Reinforcement learning (RL) has become a standard paradigm for refining large language models (LLMs) beyond pre-training and instruction tuning. A prominent line of work is RL with verifiable rewards (RLVR), which leverages automatically…

Machine Learning · Computer Science 2025-09-23 Bonan Zhang, Zhongqi Chen, Bowen Song, Qinya Li, Fan Wu, Guihai Chen

Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

Code generation, which aims to automatically generate source code from given programming requirements, has the potential to substantially improve software development efficiency. With the rapid advancement of large language models (LLMs),…

Software Engineering · Computer Science 2026-05-04 Shouyu Yin, Zhao Tian, Junjie Chen, Shikai Guo

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs), with verification engineering playing a central role. However, best practices for RL in instruction following…

Computation and Language · Computer Science 2025-06-12 Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for improving code generation in large language models, but its effectiveness is limited by weak and static verification signals in existing coding RL datasets.…

Computation and Language · Computer Science 2026-03-16 Chi Ruan, Dongfu Jiang, Huaye Zeng, Ping Nie, Wenhu Chen

From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment…

Machine Learning · Computer Science 2026-05-22 Xitai Jiang, Zihan Tang, Wenze Lin, Yang Yue, Shenzhi Wang, Gao Huang

CRScore++: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review

Reinforcement learning (RL) to improve code review comment generation requires handling unstructured outputs, making RL feedback challenging. The two main RL approaches, namely RL with Verifiable Feedback (RLVR) and…

Software Engineering · Computer Science 2025-06-03 Manav Nitin Kapadnis, Atharva Naik, Carolyn Rose

Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards

Reinforcement Learning from Verifiable Rewards (RLVR) has improved language models in domains such as mathematics and code, where correctness can be checked automatically. However, many important tasks are only partially verifiable: prompts…

Computation and Language · Computer Science 2026-05-28 Saurabh Dash, Pierre Clavier, John Dang, Matthias Galle, Marzieh Fadaee, Ahmet Üstün, Beyza Ermis

ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning

Code LLMs still struggle with code execution reasoning, especially in smaller models. Existing methods rely on supervised fine-tuning (SFT) with teacher-generated explanations, primarily in two forms: (1) input-output (I/O) prediction…

Software Engineering · Computer Science 2026-03-13 Lingxiao Tang, He Ye, Zhaoyang Chu, Muyang Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

Reinforcement learning with verifiable rewards (RLVR) succeeds in reasoning tasks (e.g., math and code) by checking the final verifiable answer (i.e., a verifiable dot signal). However, extending this paradigm to open-ended generation is…

Computation and Language · Computer Science 2026-01-27 Yuxin Jiang, Yufei Wang, Qiyuan Zhang, Xingshan Zeng, Liangyou Li, Jierun Chen, Chaofan Tao, Haoli Bai, Lifeng Shang

Code as Reward: Empowering Reinforcement Learning with VLMs

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support…

Machine Learning · Computer Science 2024-02-08 David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand

Efficient Stimuli Generation using Reinforcement Learning in Design Verification

The increasing design complexity of System-on-Chips (SoCs) has led to significant verification challenges, particularly in meeting coverage targets within a timely manner. At present, coverage closure is heavily dependent on constrained…

Artificial Intelligence · Computer Science 2025-12-09 Deepak Narayan Gadde, Thomas Nalapat, Aman Kumar, Djones Lettnin, Wolfgang Kunz, Sebastian Simon

Coupled Variational Reinforcement Learning for Language Model General Reasoning

While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods address this limitation by utilizing the probabilities…

Computation and Language · Computer Science 2026-05-26 Xueru Wen, Jie Lou, Yanjiang Liu, Hongyu Lin, Ben He, Xianpei Han, Le Sun, Yaojie Lu, Debing Zhang

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are…

Computation and Language · Computer Science 2025-04-02 Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu

Context Bootstrapped Reinforcement Learning

Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that…

Machine Learning · Computer Science 2026-03-20 Saaket Agashe, Jayanth Srinivasa, Gaowen Liu, Ramana Kompella, Xin Eric Wang

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science 2026-03-10 Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li

SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards

Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns.…

Cryptography and Security · Computer Science 2026-01-06 Suryansh Singh Sijwali, Suman Saha

Efficient Reasoning via Reward Model

Reinforcement learning with verifiable rewards (RLVR) has been shown to enhance the reasoning capabilities of large language models (LLMs), enabling the development of large reasoning models (LRMs). However, LRMs such as DeepSeek-R1 and…

Artificial Intelligence · Computer Science 2025-11-13 Yuhao Wang, Xiaopeng Li, Cheng Gong, Ziru Liu, Suiyun Zhang, Rui Liu, Xiangyu Zhao