Related papers: Multi-Turn Code Generation Through Single-Step Rew…

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text…

Computation and Language · Computer Science 2024-01-03 Haikang Deng, Colin Raffel

FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation

Code generation is a core application of large language models (LLMs), yet LLMs still frequently fail on complex programming tasks. Given its success in mathematical reasoning, test-time scaling approaches such as Process Reward Model…

Machine Learning · Computer Science 2026-02-02 Ruiyi Zhang, Peijia Qin, Qi Cao, Eric Xue, Pengtao Xie

Self-Correcting Code Generation Using Small Language Models

Self-correction has demonstrated potential in code generation by allowing language models to revise and improve their outputs through successive refinement. Recent studies have explored prompting-based strategies that incorporate…

Computation and Language · Computer Science 2025-08-26 Jeonghun Cho, Deokhyung Kang, Hyounghun Kim, Gary Geunbae Lee

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation

Tree search-based methods have made significant progress in enhancing the code generation capabilities of large language models. However, due to the difficulty in effectively evaluating intermediate algorithmic steps and the inability to…

Artificial Intelligence · Computer Science 2025-12-18 Yuanyuan Lin, Xiangyu Ouyang, Teng Zhang, Kaixin Sui

Process Supervision-Guided Policy Optimization for Code Generation

Reinforcement learning (RL) with unit test feedback has enhanced large language models' (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental…

Artificial Intelligence · Computer Science 2025-02-05 Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei, Wenlei Shi, Xing Jin, Guanlin Liu, Chen Dun, Liang Huang, Lin Yan

TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition

Tool use enables large language models to solve complex tasks through sequences of API calls, yet existing reinforcement learning approaches fail to scale to multi-step composition settings. Outcome-based rewards provide only sparse…

Machine Learning · Computer Science 2026-05-19 Anay Kulkarni, ChiaEn Lu, Dheeraj Mekala, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang

MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard recipe for post-training LLMs on reasoning tasks, with Group Relative Policy Optimization (GRPO) emerging as a leading approach. However, GRPO and its variants are…

Machine Learning · Computer Science 2026-05-12 Chanakya Ekbote, Vijay Lingam, Sujay Sanghavi, Jun Huan, Behrooz Omidvar-Tehrani, Anoop Deoras, Stefano Soatto

Process-Supervised Reinforcement Learning for Code Generation

Existing reinforcement learning strategies based on outcome supervision have proven effective in enhancing the performance of large language models (LLMs) for code generation. While reinforcement learning based on process supervision has…

Software Engineering · Computer Science 2025-02-05 Yufan Ye, Ting Zhang, Wenbin Jiang, Hua Huang

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such as performant CUDA kernels and competition-level…

Machine Learning · Computer Science 2026-01-12 Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or…

Machine Learning · Computer Science 2024-10-21 Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

GenX: Mastering Code and Test Generation with Execution Feedback

Recent advancements in language modeling have enabled the translation of natural language into code, and the use of execution feedback to improve code generation. However, these methods often rely heavily on pre-existing test cases, which…

Software Engineering · Computer Science 2024-12-19 Nan Wang, Yafei Liu, Chen Chen, Haonan Lu

Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the…

Machine Learning · Computer Science 2026-02-13 Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

Iterative Self-Training for Code Generation via Reinforced Re-Ranking

Generating high-quality code that solves complex programming tasks is challenging, especially with current decoder-based models that produce highly stochastic outputs. In code generation, even minor errors can easily break the entire…

Computation and Language · Computer Science 2025-04-15 Nikita Sorokin, Ivan Sedykh, Valentin Malykh

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts - a Python-based, parametric CAD language. This representation enables direct geometric validation, a richer…

Graphics · Computer Science 2026-05-15 Yandong Guan, Xilin Wang, Ximing Xing, Jing Zhang, Dong Xu, Qian Yu

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning

Recent years have seen considerable advancements in multi-step reasoning with Large Language Models (LLMs). The previous studies have elucidated the merits of integrating feedback or search mechanisms during model inference to improve the…

Computation and Language · Computer Science 2023-10-17 Qianli Ma, Haotian Zhou, Tingkai Liu, Jianbo Yuan, Pengfei Liu, Yang You, Hongxia Yang

Towards Better Correctness and Efficiency in Code Generation

While code large language models have demonstrated remarkable progress in code generation, the generated code often exhibits poor runtime efficiency, limiting its practical application in performance-sensitive scenarios. To address this…

Software Engineering · Computer Science 2025-08-29 Yunlong Feng, Yang Xu, Xiao Xu, Binyuan Hui, Junyang Lin

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the…

Computation and Language · Computer Science 2025-05-30 Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science 2018-08-31 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision

Tool invocation significantly enhances the capabilities of Large Language Models (LLMs), yet challenges persist, particularly in complex task scenarios. Current methods, such as instruction-enhanced reasoning and supervised fine-tuning,…

Artificial Intelligence · Computer Science 2025-08-07 Yifei Lu, Fanghua Ye, Jian Li, Qiang Gao, Cheng Liu, Haibo Luo, Nan Du, Xiaolong Li, Feiliang Ren

Coded Retransmission in Wireless Networks Via Abstract MDPs: Theory and Algorithms

Consider a transmission scheme with a single transmitter and multiple receivers over a faulty broadcast channel. For each receiver, the transmitter has a unique infinite stream of packets, and its goal is to deliver them at the highest…

Information Theory · Computer Science 2015-10-27 Mark Shifrin, Asaf Cohen, Omer Gurewitz, Olga Weisman