Related papers: Process-Supervised Reinforcement Learning for Code…

Process Supervision-Guided Policy Optimization for Code Generation

Reinforcement learning (RL) with unit test feedback has enhanced large language models' (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental…

Artificial Intelligence · Computer Science 2025-02-05 Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei, Wenlei Shi, Xing Jin, Guanlin Liu, Chen Dun, Liang Huang, Lin Yan

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation

Large Language Models excel at code generation yet struggle with complex programming tasks that demand sophisticated reasoning. To bridge this gap, traditional process supervision relies on learned reward models requiring costly training…

Computation and Language · Computer Science 2025-06-09 Zhuohao Yu, Weizheng Gu, Yidong Wang, Xingru Jiang, Zhengran Zeng, Jindong Wang, Wei Ye, Shikun Zhang

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in how to transform feedback provided only at the end of a sequence into fine-grained learning…

Machine Learning · Computer Science 2026-05-26 Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Sibo Wang, Huiming Yang

Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning

Recent work on reinforcement learning with verifiable rewards (RLVR) has shown that large language models (LLMs) can be substantially improved using outcome-level verification signals, such as unit tests for code or exact-match checks for…

Computation and Language · Computer Science 2026-01-27 Massimiliano Pronesti, Anya Belz, Yufang Hou

Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective

As large language models have evolved, it has become crucial to distinguish between process supervision and outcome supervision -- two key reinforcement learning approaches to complex reasoning tasks. While process supervision offers…

Machine Learning · Computer Science 2025-03-28 Zeyu Jia, Alexander Rakhlin, Tengyang Xie

ReCode: Reinforcing Code Generation with Reasoning-Process Rewards

In practice, rigorous reasoning is often a key driver of correct code, while Reinforcement Learning (RL) for code generation often neglects optimizing reasoning quality. Bringing process-level supervision into RL is appealing, but it faces…

Software Engineering · Computer Science 2026-05-06 Lishui Fan, Yu Zhang, Mouxiang Chen, Zhongxin Liu

Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the…

Machine Learning · Computer Science 2026-02-13 Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in…

Machine Learning · Computer Science 2025-08-01 Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin

Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs

Code-generating Large Language Models (LLMs) have become essential tools in modern software development, enhancing productivity and accelerating development. This paper aims to investigate the fine-tuning of code-generating LLMs using…

Software Engineering · Computer Science 2025-05-06 Marina Sakharova, Abhinav Anand, Mira Mezini

Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often…

Artificial Intelligence · Computer Science 2025-11-18 Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, Igor Gitman

Execution-based Code Generation using Deep Reinforcement Learning

The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such…

Machine Learning · Computer Science 2023-07-21 Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy

Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

With the rapid evolution of large language models (LLM), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of…

Software Engineering · Computer Science 2025-08-08 Junqiao Wang, Zeng Zhang, Yangfan He, Zihao Zhang, Xinyuan Song, Yuyang Song, Tianyu Shi, Yuchen Li, Hengyuan Xu, Kunyu Wu, Xin Yi, Zhongwei Wan, Xinhang Yuan, Zijun Wang, Kuan Lu, Menghao Huo, Tang Jingqun, Guangwu Qian, Keqin Li, Qiuwu Chen, Lewei He

ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation

Reinforcement learning (RL) has become a promising paradigm for optimizing Retrieval-Augmented Generation (RAG) in complex reasoning tasks. However, traditional outcome-based RL approaches often suffer from reward sparsity and inefficient…

Artificial Intelligence · Computer Science 2026-01-30 Zhao Wang, Ziliang Zhao, Zhicheng Dou

Performance-Aligned LLMs for Generating Fast Code

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele

Reinforcement Learning for Sequence Design Leveraging Protein Language Models

Protein sequence design, determined by amino acid sequences, is essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but often…

Machine Learning · Computer Science 2024-11-19 Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Riashat Islam, Derek Nowrouzezahrai, Samira Ebrahimi Kahou

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. However, its progress is hindered by the misalignment…

Machine Learning · Computer Science 2025-12-09 Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by…

Software Engineering · Computer Science 2026-04-23 Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by…

Computation and Language · Computer Science 2026-04-30 Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The…

Artificial Intelligence · Computer Science 2024-03-19 Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

Code generation, which aims to automatically generate source code from given programming requirements, has the potential to substantially improve software development efficiency. With the rapid advancement of large language models (LLMs),…

Software Engineering · Computer Science 2026-05-04 Shouyu Yin, Zhao Tian, Junjie Chen, Shikai Guo