Related papers: EvolveCoder: Evolving Test Cases via Adversarial V…

ReVeal: Self-Evolving Code Agents via Reliable Self-Verification

Reinforcement learning with verifiable rewards (RLVR) has advanced the reasoning capabilities of large language models. However, existing methods rely solely on outcome rewards, without explicitly optimizing verification or leveraging…

Software Engineering · Computer Science · 2025-10-22 · Yiyang Jin, Kunzhao Xu, Hang Li, Xueting Han, Yanmin Zhou, Cheng Li, Jing Bai

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and…

Machine Learning · Computer Science · 2026-05-19 · Xiao Zhu, Xinyu Zhou, Boyu Zhu, Hanxu Hu, Mingzhe Du, Haotian Zhang, Huiming Wang, Zhijiang Guo

CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning

Code verifiers play a critical role in post-verification for LLM-based code generation, yet existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. While reinforcement learning…

Artificial Intelligence · Computer Science · 2026-02-02 · Ji Shi, Peiming Guo, Meishan Zhang, Miao Zhang, Xuebo Liu, Min Zhang, Weili Guan

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

Reinforcement learning with verifiable rewards (RLVR) succeeds in reasoning tasks (e.g., math and code) by checking the final verifiable answer (i.e., a verifiable dot signal). However, extending this paradigm to open-ended generation is…

Computation and Language · Computer Science · 2026-01-27 · Yuxin Jiang, Yufei Wang, Qiyuan Zhang, Xingshan Zeng, Liangyou Li, Jierun Chen, Chaofan Tao, Haoli Bai, Lifeng Shang

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are…

Computation and Language · Computer Science · 2025-04-02 · Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu

Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models

Reinforcement learning algorithms are defined by their learning update rules, which are typically hand-designed and fixed. We present an evolutionary framework for discovering reinforcement learning algorithms by searching directly over…

Machine Learning · Computer Science · 2026-03-31 · Alkis Sygkounas, Amy Loutfi, Andreas Persson

VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning

Recent advancements in code generation have shown remarkable success across software domains, yet hardware description languages (HDLs) such as Verilog remain underexplored due to their concurrency semantics, syntactic rigidity, and…

Machine Learning · Computer Science · 2025-08-27 · Fu Teng, Miao Pan, Xuhong Zhang, Zhezhi He, Yiyao Yang, Xinyi Chai, Mengnan Qi, Liqiang Lu, Jianwei Yin

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent…

Computation and Language · Computer Science · 2026-03-17 · Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
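The Code-A1 abstract notes that RL for code generation takes its verifiable reward from unit-test pass rates. A minimal Python sketch of such a reward follows. It assumes each test is a hypothetical `(args, expected)` pair and that the candidate solution is an ordinary callable. Real pipelines run untrusted generated code in a sandbox with timeouts, which this sketch omits.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def pass_rate_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return the fraction of unit tests the candidate passes (0.0 if there are no tests)."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crash on one test counts as a failure instead of aborting the whole reward.
            pass
    return passed / len(tests)
```

Because the reward is a fraction rather than a single pass/fail bit, a partially correct solution still earns partial credit. For example, `pass_rate_reward(lambda x: x, [((3,), 3), ((-2,), 2)])` returns `0.5`.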

EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2…

Hardware Architecture · Computer Science · 2026-02-27 · Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by…

Software Engineering · Computer Science · 2026-04-23 · Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training language models (LMs) on reasoning tasks that elicit emergent long chains of thought (CoTs). Unlike supervised learning, it updates the model using…

Computation and Language · Computer Science · 2025-10-28 · Xinyu Zhu, Mengzhou Xia, Zhepei Wei, Wei-Lin Chen, Danqi Chen, Yu Meng

Extending RLVR to Open-Ended Tasks via Verifiable Multiple-Choice Reformulation

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated great potential in enhancing the reasoning capabilities of large language models (LLMs). However, its success has thus far been largely confined to the mathematical and…

Artificial Intelligence · Computer Science · 2026-02-05 · Mengyu Zhang, Siyu Ding, Weichong Yin, Yu Sun, Hua Wu

Reinforcement Learning with Rubric Anchors

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing Large Language Models (LLMs), exemplified by the success of OpenAI's o-series. In RLVR, rewards are derived from verifiable signals, such…

Artificial Intelligence · Computer Science · 2025-08-19 · Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Xijun Gu, Peiyi Tu, Jiaxin Liu, Wenyu Chen, Yuzhuo Fu, Zhiting Fan, Yanmei Gu, Yuanyuan Wang, Zhengkai Yang, Jianguo Li, Junbo Zhao

Re²: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning performance of large language models (LLMs) by increasing test-time compute. However, even after extensive RLVR training, such models still…

Artificial Intelligence · Computer Science · 2026-03-10 · Pinzheng Wang, Shuli Xu, Juntao Li, Yu Luo, Dong Li, Jianye Hao, Min Zhang

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data/model in the code domain.…

Software Engineering · Computer Science · 2025-05-27 · Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs), with verification engineering playing a central role. However, best practices for RL in instruction following…

Computation and Language · Computer Science · 2025-06-12 · Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its applications on…

Computation and Language · Computer Science · 2026-05-19 · Zhonghang Yuan, Zhefan Wang, Fang Hu, Zihong Chen, Jinzhe Li, Gang Li, Jie Ying, Huanjun Kong, Songyang Zhang, Nanqing Dong

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Recent advancements in long chain-of-thought (CoT) reasoning, particularly through the Group Relative Policy Optimization algorithm used by DeepSeek-R1, have led to significant interest in the potential of Reinforcement Learning with…

Artificial Intelligence · Computer Science · 2025-10-03 · Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang
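The abstract above refers to Group Relative Policy Optimization (GRPO), the algorithm used by DeepSeek-R1. In its commonly described form, GRPO scores each of several responses sampled for the same prompt relative to the rest of its group, rather than against a learned value function. The following is a minimal sketch of that group-relative normalization, not this paper's method. It assumes one scalar verifiable reward per response and uses the population standard deviation. Implementations differ on this detail and on the `eps` constant.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each response's reward by its group's mean and standard deviation.

    `rewards` holds one scalar verifiable reward per response sampled for the same
    prompt. `eps` avoids division by zero when every reward in the group is equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

For rewards `[1.0, 0.0, 0.0, 1.0]` the advantages are approximately `[1, -1, -1, 1]`. When every response in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.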

ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

Large Language Models (LLMs) excel at code generation but remain heavily reliant on large-scale annotated solutions and verification-based supervision, which constrains scalability and hinders sustained self-improvement. Recent…

Software Engineering · Computer Science · 2026-05-22 · Yixu Huang, Xinglei Yu, Zhongyu Wei

Evolutionary Reinforcement Learning: A Survey

Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements…

Neural and Evolutionary Computing · Computer Science · 2023-08-31 · Hui Bai, Ran Cheng, Yaochu Jin