Related papers: Persistent Cross-Attempt State Optimization for Re…

CATCODER: Repository-Level Code Generation with Relevant Code and Type Context

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, repository-level code generation presents unique challenges, particularly due to the need to utilize information spread across…

Software Engineering · Computer Science 2025-11-24 Zhiyuan Pan, Xing Hu, Xin Xia, Xiaohu Yang

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

A Review of Repository Level Prompting for LLMs

As coding challenges become more complex, recent advancements in Large Language Models (LLMs) have led to notable successes, such as achieving a 94.6% solve rate on the HumanEval benchmark. Concurrently, there is an increasing commercial…

Software Engineering · Computer Science 2023-12-19 Douglas Schonholtz

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input…

Software Engineering · Computer Science 2024-07-31 Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has…

Software Engineering · Computer Science 2026-01-15 Sicong Liu, Yanxian Huang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Yuchi Ma, Hongyu Zhang, Yin Zhang, Yanlin Wang

LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues

Reproducing buggy code is the first and crucially important step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve the problem. While numerous approaches have been proposed…

Software Engineering · Computer Science 2024-11-22 Yalan Lin, Yingwei Ma, Rongyu Cao, Binhua Li, Fei Huang, Xiaodong Gu, Yongbin Li

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code…

Software Engineering · Computer Science 2024-02-06 Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui

Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement

Code generation is crucial in software engineering for automating the coding process efficiently. While test-time computation methods show promise, they suffer from high latency due to multiple computation rounds. To overcome this, we…

Software Engineering · Computer Science 2025-05-28 Xiaoqing Zhang, Yuhan Liu, Flood Sung, Xiuying Chen, Shuo Shang, Rui Yan

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning…

Machine Learning · Computer Science 2026-02-04 Jingwen Xu, Yiyang Lu, Zisu Huang, Changze Lv, Xiaohua Wang, Shizheng Li, Zhibo Xu, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing Zheng

FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification

Code generation is a latency-sensitive task that demands high timeliness. However, with the growing interest and inherent difficulty in repository-level code generation, most existing code generation studies focus on improving the…

Artificial Intelligence · Computer Science 2025-10-01 Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Jia Li, Lin Shi

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted,…

Software Engineering · Computer Science 2025-03-27 Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

Repoformer: Selective Retrieval for Repository-Level Code Completion

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science 2024-06-05 Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models

Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular…

Machine Learning · Computer Science 2024-05-22 Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein

JumpCoder: Go Beyond Autoregressive Coder via Online Modification

While existing code large language models (code LLMs) exhibit impressive capabilities in code generation, their autoregressive sequential generation inherently lacks reversibility. This limitation hinders them from timely correcting…

Computation and Language · Computer Science 2024-09-26 Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, Jianling Sun

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science 2024-09-16 Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and…

Software Engineering · Computer Science 2026-04-27 Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong

MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation…

Software Engineering · Computer Science 2026-04-28 Mofei Li, Taozhi Chen, Guowei Yang, Jia Li

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang