Related papers: CATCODER: Repository-Level Code Generation with Re…

A Review of Repository Level Prompting for LLMs

As coding challenges become more complex, recent advancements in Large Language Models (LLMs) have led to notable successes, such as achieving a 94.6% solve rate on the HumanEval benchmark. Concurrently, there is an increasing commercial…

Software Engineering · Computer Science 2023-12-19 Douglas Schonholtz

In Line with Context: Repository-Level Code Generation via Context Inlining

Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository, reasoning over complex dependencies across functions,…

Software Engineering · Computer Science 2026-05-07 Chao Hu, Wenhao Zeng, Yuling Shi, Beijun Shen, Xiaodong Gu

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

Persistent Cross-Attempt State Optimization for Repository-Level Code Generation

Large language models (LLMs) have achieved substantial progress in repository-level code generation. However, solving the same repository-level task often requires multiple attempts, while existing methods still optimize each attempt in…

Software Engineering · Computer Science 2026-04-07 Ruwei Pan, Jiangshuai Wang, Qisheng Zhang, Yueheng Zhu, Linhao Wu, Zixiong Yang, Yakun Zhang, Lu Zhang, Hongyu Zhang

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science 2024-09-16 Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted,…

Software Engineering · Computer Science 2025-03-27 Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification

Code generation is a latency-sensitive task that demands high timeliness. However, with the growing interest and inherent difficulty in repository-level code generation, most existing code generation studies focus on improving the…

Artificial Intelligence · Computer Science 2025-10-01 Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Jia Li, Lin Shi

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

On the Impacts of Contexts on Repository-Level Code Generation

CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of…

Software Engineering · Computer Science 2025-02-11 Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui

BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models

Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular…

Machine Learning · Computer Science 2024-05-22 Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository

LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes,…

Software Engineering · Computer Science 2024-06-06 Ajinkya Deshpande, Anmol Agarwal, Shashank Shet, Arun Iyer, Aditya Kanade, Ramakrishna Bairi, Suresh Parthasarathy

Repoformer: Selective Retrieval for Repository-Level Code Completion

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science 2024-06-05 Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input…

Software Engineering · Computer Science 2024-07-31 Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation

As an essential part of modern hardware design, manually writing Register Transfer Level (RTL) code such as Verilog is often labor-intensive. Following the tremendous success of large language models (LLMs), researchers have begun to…

Software Engineering · Computer Science 2025-04-15 Peiyang Wu, Nan Guo, Junliang Lv, Xiao Xiao, Xiaochun Ye

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires…

Software Engineering · Computer Science 2026-05-21 Yicheng Tao, Yuante Li, Yao Qin, Yepang Liu

ContextModule: Improving Code Completion via Repository-level Contextual Information

Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real-time. However, existing LLM-based code completion systems primarily…

Software Engineering · Computer Science 2024-12-12 Zhanming Guan, Junlin Liu, Jierui Liu, Chao Peng, Dexin Liu, Ningyuan Sun, Bo Jiang, Wenchao Li, Jie Liu, Hang Zhu

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests,…

Computation and Language · Computer Science 2024-05-21 Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

RepoScope: Leveraging Call Chain-Aware Multi-View Context for Repository-Level Code Generation

Repository-level code generation aims to generate code within the context of a specified repository. Existing approaches typically employ retrieval-augmented generation (RAG) techniques to provide LLMs with relevant contextual information…

Software Engineering · Computer Science 2025-11-04 Yang Liu, Li Zhang, Fang Liu, Zhuohang Wang, Donglin Wei, Zhishuo Yang, Kechi Zhang, Jia Li, Lin Shi

Repository-Level Prompt Generation for Large Language Models of Code

With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work,…

Machine Learning · Computer Science 2023-06-21 Disha Shrivastava, Hugo Larochelle, Daniel Tarlow