Related papers: CodeRAG: Finding Relevant and Necessary Knowledge …

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang , Yanli Wang , Yanlin Wang , Daya Guo , Ensheng Shi , Yuchi Ma , Jiachi Chen , Zibin Zheng

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires…

Software Engineering · Computer Science 2026-05-21 Yicheng Tao , Yuante Li , Yao Qin , Yepang Liu

Repoformer: Selective Retrieval for Repository-Level Code Completion

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science 2024-06-05 Di Wu , Wasi Uddin Ahmad , Dejiao Zhang , Murali Krishna Ramanathan , Xiaofei Ma

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science 2024-09-16 Wei Liu , Ailun Yu , Daoguang Zan , Bo Shen , Wei Zhang , Haiyan Zhao , Zhi Jin , Qianxiang Wang

GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion

Repository-level code completion remains challenging for large language models (LLMs) due to cross-file dependencies and limited context windows. Prior work addresses this challenge using Retrieval-Augmented Generation (RAG) frameworks…

Software Engineering · Computer Science 2026-02-10 Baoyi Wang , Xingliang Wang , Guochang Li , Chen Zhi , Junxiao Han , Xinkui Zhao , Nan Wang , Shuiguang Deng , Jianwei Yin

A Deep Dive into Retrieval-Augmented Generation for Code Completion: Experience on WeChat

Code completion, a crucial task in software engineering that enhances developer productivity, has seen substantial improvements with the rapid advancement of large language models (LLMs). In recent years, retrieval-augmented generation…

Software Engineering · Computer Science 2025-07-25 Zezhou Yang , Ting Peng , Cuiyun Gao , Chaozheng Wang , Hailiang Huang , Yuetang Deng

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang , Bei Chen , Yue Zhang , Jacky Keung , Jin Liu , Daoguang Zan , Yi Mao , Jian-Guang Lou , Weizhu Chen

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence…

Software Engineering · Computer Science 2024-05-14 Hanzhuo Tan , Qi Luo , Ling Jiang , Zizheng Zhan , Jing Li , Haotian Zhang , Yuqun Zhang

CodePlan: Repository-level Coding using LLMs and Planning

Software engineering activities such as package migration, fixing errors reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code.…

Software Engineering · Computer Science 2023-09-25 Ramakrishna Bairi , Atharv Sonwane , Aditya Kanade , Vageesh D C , Arun Iyer , Suresh Parthasarathy , Sriram Rajamani , B. Ashok , Shashank Shet

CodeRAG-Bench: Can Retrieval Augment Code Generation?

While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate…

Software Engineering · Computer Science 2025-02-28 Zora Zhiruo Wang , Akari Asai , Xinyan Velocity Yu , Frank F. Xu , Yiqing Xie , Graham Neubig , Daniel Fried

RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of…

Software Engineering · Computer Science 2024-08-15 Huy N. Phan , Hoang N. Phan , Tien N. Nguyen , Nghi D. Q. Bui

In Line with Context: Repository-Level Code Generation via Context Inlining

Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository, reasoning over complex dependencies across functions,…

Software Engineering · Computer Science 2026-05-07 Chao Hu , Wenhao Zeng , Yuling Shi , Beijun Shen , Xiaodong Gu

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science 2024-09-25 Jicheng Wang , Yifeng He , Hao Chen

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation

Existing retrieval-augmented code generation (RACG) methods typically use an external retrieval module to fetch semantically similar code snippets used for generating subsequent fragments. However, even for consecutive code fragments, the…

Information Retrieval · Computer Science 2025-10-10 Qian Dong , Jia Chen , Qingyao Ai , Hongning Wang , Haitao Li , Yi Wu , Yao Hu , Yiqun Liu , Shaoping Ma

Enhancing Project-Specific Code Completion by Inferring Internal API Information

Project-specific code completion is a critical task that leverages context from a project to generate accurate code. State-of-the-art methods use retrieval-augmented generation (RAG) with large language models (LLMs) and project information…

Software Engineering · Computer Science 2025-07-29 Le Deng , Xiaoxue Ren , Chao Ni , Ming Liang , David Lo , Zhongxin Liu

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science 2022-03-16 Shuai Lu , Nan Duan , Hojae Han , Daya Guo , Seung-won Hwang , Alexey Svyatkovskiy

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input…

Software Engineering · Computer Science 2024-07-31 Yanlin Wang , Yanli Wang , Daya Guo , Jiachi Chen , Ruikai Zhang , Yuchi Ma , Zibin Zheng

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories.…

Software Engineering · Computer Science 2024-05-31 Wei Cheng , Yuhan Wu , Wei Hu

Repository-level Code Search with Neural Retrieval Methods

This paper presents a multi-stage reranking system for repository-level code search, which leverages the vastly available commit histories of large open-source repositories to aid in bug fixing. We define the task of repository-level code…

Information Retrieval · Computer Science 2025-02-12 Siddharth Gandhi , Luyu Gao , Jamie Callan

ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation

Recent advances in large language models (LLMs) have demonstrated impressive capabilities in code-related tasks, such as code generation and automated program repair. Despite their promising performance, most existing approaches for code…

Software Engineering · Computer Science 2025-09-03 Yicong Zhao , Shisong Chen , Jiacheng Zhang , Zhixu Li