Related papers

Related papers: EvolveCoder: Evolving Test Cases via Adversarial V…

200 papers

Reinforcement learning with verifiable rewards (RLVR) has advanced the reasoning capabilities of large language models. However, existing methods rely solely on outcome rewards, without explicitly optimizing verification or leveraging…

Software Engineering · Computer Science · 2025-10-22 · Yiyang Jin, Kunzhao Xu, Hang Li, Xueting Han, Yanmin Zhou, Cheng Li, Jing Bai

Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and…

Machine Learning · Computer Science · 2026-05-19 · Xiao Zhu, Xinyu Zhou, Boyu Zhu, Hanxu Hu, Mingzhe Du, Haotian Zhang, Huiming Wang, Zhijiang Guo

Code verifiers play a critical role in post-verification for LLM-based code generation, yet existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. While reinforcement learning…

Artificial Intelligence · Computer Science · 2026-02-02 · Ji Shi, Peiming Guo, Meishan Zhang, Miao Zhang, Xuebo Liu, Min Zhang, Weili Guan

Reinforcement learning with verifiable rewards (RLVR) succeeds in reasoning tasks (e.g., math and code) by checking the final verifiable answer (i.e., a verifiable dot signal). However, extending this paradigm to open-ended generation is…

Computation and Language · Computer Science · 2026-01-27 · Yuxin Jiang, Yufei Wang, Qiyuan Zhang, Xingshan Zeng, Liangyou Li, Jierun Chen, Chaofan Tao, Haoli Bai, Lifeng Shang

Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are…

Computation and Language · Computer Science · 2025-04-02 · Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu

Reinforcement learning algorithms are defined by their learning update rules, which are typically hand-designed and fixed. We present an evolutionary framework for discovering reinforcement learning algorithms by searching directly over…

Machine Learning · Computer Science · 2026-03-31 · Alkis Sygkounas, Amy Loutfi, Andreas Persson

Recent advancements in code generation have shown remarkable success across software domains, yet hardware description languages (HDLs) such as Verilog remain underexplored due to their concurrency semantics, syntactic rigidity, and…

Machine Learning · Computer Science · 2025-08-27 · Fu Teng, Miao Pan, Xuhong Zhang, Zhezhi He, Yiyao Yang, Xinyi Chai, Mengnan Qi, Liqiang Lu, Jianwei Yin

Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent…

Computation and Language · Computer Science · 2026-03-17 · Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2…

Hardware Architecture · Computer Science · 2026-02-27 · Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang

While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by…

Software Engineering · Computer Science · 2026-04-23 · Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training language models (LMs) on reasoning tasks that elicit emergent long chains of thought (CoTs). Unlike supervised learning, it updates the model using…

Computation and Language · Computer Science · 2025-10-28 · Xinyu Zhu, Mengzhou Xia, Zhepei Wei, Wei-Lin Chen, Danqi Chen, Yu Meng

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated great potential in enhancing the reasoning capabilities of large language models (LLMs). However, its success has thus far been largely confined to the mathematical and…

Artificial Intelligence · Computer Science · 2026-02-05 · Mengyu Zhang, Siyu Ding, Weichong Yin, Yu Sun, Hua Wu

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing Large Language Models (LLMs), exemplified by the success of OpenAI's o-series. In RLVR, rewards are derived from verifiable signals, such…

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning performance of large language models (LLMs) by increasing test-time compute. However, even after extensive RLVR training, such models still…

Artificial Intelligence · Computer Science · 2026-03-10 · Pinzheng Wang, Shuli Xu, Juntao Li, Yu Luo, Dong Li, Jianye Hao, Min Zhang

Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data/model in the code domain.…

Software Engineering · Computer Science · 2025-05-27 · Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs), with verification engineering playing a central role. However, best practices for RL in instruction following…

Computation and Language · Computer Science · 2025-06-12 · Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its applications on…

Computation and Language · Computer Science · 2026-05-19 · Zhonghang Yuan, Zhefan Wang, Fang Hu, Zihong Chen, Jinzhe Li, Gang Li, Jie Ying, Huanjun Kong, Songyang Zhang, Nanqing Dong

Recent advancements in long chain-of-thought (CoT) reasoning, particularly through the Group Relative Policy Optimization algorithm used by DeepSeek-R1, have led to significant interest in the potential of Reinforcement Learning with…

Artificial Intelligence · Computer Science · 2025-10-03 · Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

Large Language Models (LLMs) excel at code generation but remain heavily reliant on large-scale annotated solutions and verification-based supervision, which constrains scalability and hinders sustained self-improvement. Recent…

Software Engineering · Computer Science · 2026-05-22 · Yixu Huang, Xinglei Yu, Zhongyu Wei

Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements…

Neural and Evolutionary Computing · Computer Science · 2023-08-31 · Hui Bai, Ran Cheng, Yaochu Jin