Related papers: StepCoder: Improve Code Generation with Reinforcem…

Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

With the rapid evolution of large language models (LLM), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of…

Software Engineering · Computer Science 2025-08-08 Junqiao Wang , Zeng Zhang , Yangfan He , Zihao Zhang , Xinyuan Song , Yuyang Song , Tianyu Shi , Yuchen Li , Hengyuan Xu , Kunyu Wu , Xin Yi , Zhongwei Wan , Xinhang Yuan , Zijun Wang , Kuan Lu , Menghao Huo , Tang Jingqun , Guangwu Qian , Keqin Li , Qiuwu Chen , Lewei He

Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

Training next-generation code generation models requires high-quality datasets, yet existing datasets face difficulty imbalance, format inconsistency, and data quality problems. We address these challenges through systematic data processing…

Computation and Language · Computer Science 2026-03-10 Zongqian Li , Tengchao Lv , Shaohan Huang , Yixuan Su , Qinzheng Sun , Qiufeng Yin , Ying Xin , Scarlett Li , Lei Cui , Nigel Collier , Furu Wei

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma , Yunfan Shao , Peiji Li , Demin Song , Qipeng Guo , Linyang Li , Xipeng Qiu , Kai Chen

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has…

Software Engineering · Computer Science 2026-01-15 Sicong Liu , Yanxian Huang , Mingwei Liu , Jiachi Chen , Ensheng Shi , Yuchi Ma , Hongyu Zhang , Yin Zhang , Yanlin Wang

Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the…

Machine Learning · Computer Science 2026-02-13 Ryo Mikasa , Shun-ichiro Hayashi , Daichi Mukunoki , Tetsuya Hoshino , Takahiro Katagiri

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis , Arindam Sharma , Cristina David

R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Practical guidance on training Large Language Models (LLMs) to leverage Code Interpreter across diverse tasks remains lacking. We present R1-Code-Interpreter, an extension of a text-only LLM trained via multi-turn supervised fine-tuning…

Artificial Intelligence · Computer Science 2026-03-05 Yongchao Chen , Yueying Liu , Junwei Zhou , Yilun Hao , Jingquan Wang , Yang Zhang , Na Li , Chuchu Fan

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models

Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on…

Computation and Language · Computer Science 2025-06-02 Hanxing Ding , Shuchang Tao , Liang Pang , Zihao Wei , Jinyang Gao , Bolin Ding , Huawei Shen , Xueqi Cheng

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

While Large Language Models (LLMs) have revolutionized code generation, standard ``System 1'' approaches that generate solutions in a single forward pass often hit a performance ceiling on complex algorithmic tasks. Existing iterative…

Computation and Language · Computer Science 2026-04-21 Juyong Jiang , Jiasi Shen , Sunghun Kim , Kang Min Yoo , Jeonghoon Kim , Sungju Kim

JumpCoder: Go Beyond Autoregressive Coder via Online Modification

While existing code large language models (code LLMs) exhibit impressive capabilities in code generation, their autoregressive sequential generation inherently lacks reversibility. This limitation hinders them from timely correcting…

Computation and Language · Computer Science 2024-09-26 Mouxiang Chen , Hao Tian , Zhongxin Liu , Xiaoxue Ren , Jianling Sun

RLTF: Reinforcement Learning from Unit Test Feedback

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of…

Artificial Intelligence · Computer Science 2023-11-14 Jiate Liu , Yiqin Zhu , Kaiwen Xiao , Qiang Fu , Xiao Han , Wei Yang , Deheng Ye

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to…

Computation and Language · Computer Science 2025-02-19 Jonas Gehring , Kunhao Zheng , Jade Copet , Vegard Mella , Quentin Carbonneaux , Taco Cohen , Gabriel Synnaeve

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to…

Software Engineering · Computer Science 2025-06-04 Mingzhe Du , Luu Anh Tuan , Yue Liu , Yuhao Qing , Dong Huang , Xinyi He , Qian Liu , Zejun Ma , See-kiong Ng

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests,…

Computation and Language · Computer Science 2024-05-21 Md. Ashraful Islam , Mohammed Eunus Ali , Md Rizwan Parvez

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the…

Computation and Language · Computer Science 2025-05-30 Houxing Ren , Mingjie Zhan , Zhongyuan Wu , Aojun Zhou , Junting Pan , Hongsheng Li

ReCode: Updating Code API Knowledge with Reinforcement Learning

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their…

Computation and Language · Computer Science 2025-11-25 Haoze Wu , Yunzhi Yao , Wenhao Yu , Ningyu Zhang

CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation

Code generation aims to produce code that fulfills requirements written in natural languages automatically. Large language Models (LLMs) like ChatGPT have demonstrated promising effectiveness in this area. Nonetheless, these LLMs often fail…

Software Engineering · Computer Science 2025-01-15 Ruwei Pan , Hongyu Zhang , Chao Liu

CodeGrad: Integrating Multi-Step Verification with Gradient-Based LLM Refinement

While Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, they often produce solutions that lack guarantees of correctness, robustness, and efficiency. This limitation is particularly acute in domains…

Software Engineering · Computer Science 2025-09-04 Yueke Zhang , Yifan Zhang , Kevin Leach , Yu Huang

Process-Supervised Reinforcement Learning for Code Generation

Existing reinforcement learning strategies based on outcome supervision have proven effective in enhancing the performance of large language models(LLMs) for code generation. While reinforcement learning based on process supervision has…

Software Engineering · Computer Science 2025-02-05 Yufan Ye , Ting Zhang , Wenbin Jiang , Hua Huang

A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement

Large language models (LLMs) have achieved impressive performance on code generation. Although prior studies enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems due to rigid…

Software Engineering · Computer Science 2024-09-10 Huan Zhang , Wei Cheng , Yuhan Wu , Wei Hu