Related papers: Dataflow-Guided Retrieval Augmentation for Reposit…

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science 2024-09-16 Wei Liu , Ailun Yu , Daoguang Zan , Bo Shen , Wei Zhang , Haiyan Zhao , Zhi Jin , Qianxiang Wang

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science 2022-03-16 Shuai Lu , Nan Duan , Hojae Han , Daya Guo , Seung-won Hwang , Alexey Svyatkovskiy

Retrieval-augmented code completion for local projects using large language models

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on…

Software Engineering · Computer Science 2025-06-17 Marko Hostnik , Marko Robnik-Šikonja

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang , Yanli Wang , Yanlin Wang , Daya Guo , Ensheng Shi , Yuchi Ma , Jiachi Chen , Zibin Zheng

CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion

Repository-level code completion automatically predicts the unfinished code based on the broader information from the repository. Recent strides in Code Large Language Models (code LLMs) have spurred the development of repository-level code…

Computation and Language · Computer Science 2025-09-22 Sheng Zhang , Yifan Ding , Shuquan Lian , Shun Song , Hui Li

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code…

Computation and Language · Computer Science 2024-06-28 Lei Zhang , Yunshui Li , Jiaming Li , Xiaobo Xia , Jiaxi Yang , Run Luo , Minzheng Wang , Longze Chen , Junhao Liu , Min Yang

Repoformer: Selective Retrieval for Repository-Level Code Completion

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science 2024-06-05 Di Wu , Wasi Uddin Ahmad , Dejiao Zhang , Murali Krishna Ramanathan , Xiaofei Ma

Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

As code completion task from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science 2025-12-05 Xinkui Zhao , Rongkai Liu , Yifan Zhang , Chen Zhi , Lufei Zhang , Guanjie Cheng , Yueshen Xu , Shuiguang Deng , Jianwei Yin

IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion

Code completion aims to enhance programming productivity by predicting potential code based on the current programming context. Recently, pretrained language models (LMs) have become prominent in this field. Various approaches have been…

Software Engineering · Computer Science 2024-02-23 Bolun Li , Zhihong Sun , Tao Huang , Hongyu Zhang , Yao Wan , Ge Li , Zhi Jin , Chen Lyu

SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion

Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized…

Software Engineering · Computer Science 2025-10-14 Xiaohan Chen , Zhongying Pan , Quan Feng , Yu Tian , Shuqun Yang , Mengru Wang , Lina Gong , Yuxia Geng , Piji Li , Xiang Chen

RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of…

Software Engineering · Computer Science 2024-08-15 Huy N. Phan , Hoang N. Phan , Tien N. Nguyen , Nghi D. Q. Bui

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires…

Software Engineering · Computer Science 2026-05-21 Yicheng Tao , Yuante Li , Yao Qin , Yepang Liu

Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

Large language models (LLMs) struggle with relation completion (RC), both with and without retrieval-augmented generation (RAG), particularly when the required information is rare or sparsely represented. To address this, we propose a novel…

Computation and Language · Computer Science 2026-04-27 Fahmida Alam , Mihai Surdeanu , Ellen Riloff

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input…

Software Engineering · Computer Science 2024-07-31 Yanlin Wang , Yanli Wang , Daya Guo , Jiachi Chen , Ruikai Zhang , Yuchi Ma , Zibin Zheng

ContextModule: Improving Code Completion via Repository-level Contextual Information

Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real-time. However, existing LLM-based code completion systems primarily…

Software Engineering · Computer Science 2024-12-12 Zhanming Guan , Junlin Liu , Jierui Liu , Chao Peng , Dexin Liu , Ningyuan Sun , Bo Jiang , Wenchao Li , Jie Liu , Hang Zhu

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science 2024-09-25 Jicheng Wang , Yifeng He , Hao Chen

GraphCodeAgent: Dual Graph-Guided LLM Agent for Retrieval-Augmented Repo-Level Code Generation

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress for code generation. Recently, large language models (LLMs) have demonstrated remarkable…

Software Engineering · Computer Science 2025-11-19 Jia Li , Xianjie Shi , Kechi Zhang , Ge Li , Zhi Jin , Lei Li , Huangzhao Zhang , Jia Li , Fang Liu , Yuwei Zhang , Zhengwei Tao , Yihong Dong , Yuqi Zhu , Chongyang Tao

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang , Bei Chen , Yue Zhang , Jacky Keung , Jin Liu , Daoguang Zan , Yi Mao , Jian-Guang Lou , Weizhu Chen

Impact-driven Context Filtering For Cross-file Code Completion

Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation.…

Software Engineering · Computer Science 2025-08-11 Yanzhou Li , Shangqing Liu , Kangjie Chen , Tianwei Zhang , Yang Liu

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed.…

Computation and Language · Computer Science 2025-09-05 Ken Deng , Jiaheng Liu , He Zhu , Congnan Liu , Jingxin Li , Jiakai Wang , Peng Zhao , Chenchen Zhang , Yanan Wu , Xueqiao Yin , Yuanxing Zhang , Zizheng Zhan , Wenbo Su , Bangyu Xiang , Tiezheng Ge , Bo Zheng