Related papers: ReACC: A Retrieval-Augmented Code Completion Frame…

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets in a specific programming language from natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science · 2025-01-24 · Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science · 2026-01-28 · Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories.…

Software Engineering · Computer Science · 2024-05-31 · Wei Cheng, Yuhan Wu, Wei Hu

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Automated code completion, which aims to generate subsequent tokens from unfinished code, has benefited significantly from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence…

Software Engineering · Computer Science · 2024-05-14 · Hanzhuo Tan, Qi Luo, Ling Jiang, Zizheng Zhan, Jing Li, Haotian Zhang, Yuqun Zhang

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science · 2024-09-25 · Jicheng Wang, Yifeng He, Hao Chen

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing unfinished code based on the broader context of the repository. For automated code completion tools, however, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science · 2023-10-23 · Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

Pretrained code language models have enabled great progress towards program synthesis. However, common approaches only consider in-file local context and thus miss information and constraints imposed by other parts of the codebase and its…

Software Engineering · Computer Science · 2023-06-02 · Hengzhi Pei, Jinman Zhao, Leonard Lausen, Sheng Zha, George Karypis

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in many software engineering tasks, including automatic code completion, which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science · 2023-04-25 · Tim van Dam, Maliheh Izadi, Arie van Deursen

CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion

Repository-level code completion automatically predicts the unfinished code based on the broader information from the repository. Recent strides in Code Large Language Models (code LLMs) have spurred the development of repository-level code…

Computation and Language · Computer Science · 2025-09-22 · Sheng Zhang, Yifan Ding, Shuquan Lian, Shun Song, Hui Li

CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code…

Software Engineering · Computer Science · 2019-04-02 · Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun

Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

As the code completion task moves from the function level to the repository level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science · 2025-12-05 · Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

Impact-driven Context Filtering For Cross-file Code Completion

Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation.…

Software Engineering · Computer Science · 2025-08-11 · Yanzhou Li, Shangqing Liu, Kangjie Chen, Tianwei Zhang, Yang Liu

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted,…

Software Engineering · Computer Science · 2025-03-27 · Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification

The escalating complexity of modern codebases has intensified the need for retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conventional function-level search paradigms. While…

Software Engineering · Computer Science · 2025-10-30 · Aofan Liu, Shiyuan Song, Haoxuan Li, Cehao Yang, Yiyan Qi

Repoformer: Selective Retrieval for Repository-Level Code Completion

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science · 2024-06-05 · Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval allows software engineers to search code through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from…

Software Engineering · Computer Science · 2021-10-19 · Mehdi Bahrami, N. C. Shrikanth, Yuji Mizobuchi, Lei Liu, Masahiro Fukuyori, Wei-Peng Chen, Kazuki Munakata

A Retrieve-and-Edit Framework for Predicting Structured Outputs

For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based…

Machine Learning · Statistics · 2018-12-05 · Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

REPOFUSE: Repository-Level Code Completion with Fused Dual Context

The success of language models in code assistance has spurred the proposal of repository-level code completion as a means to enhance prediction accuracy, utilizing the context from the entire codebase. However, this amplified context can…

Software Engineering · Computer Science · 2024-02-26 · Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, wei jiang, Hongwei Chen, Chengpeng Wang, Gang Fan

Towards Full-line Code Completion with Neural Language Models

A code completion system suggests future code elements to developers given a partially-complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion…

Software Engineering · Computer Science · 2020-09-21 · Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin

Retrieval-augmented code completion for local projects using large language models

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on…

Software Engineering · Computer Science · 2025-06-17 · Marko Hostnik, Marko Robnik-Šikonja