English
Related papers

Related papers: Dataflow-Guided Retrieval Augmentation for Reposit…

200 papers

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science 2024-09-16 Wei Liu , Ailun Yu , Daoguang Zan , Bo Shen , Wei Zhang , Haiyan Zhao , Zhi Jin , Qianxiang Wang

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science 2022-03-16 Shuai Lu , Nan Duan , Hojae Han , Daya Guo , Seung-won Hwang , Alexey Svyatkovskiy

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on…

Software Engineering · Computer Science 2025-06-17 Marko Hostnik , Marko Robnik-Šikonja

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang , Yanli Wang , Yanlin Wang , Daya Guo , Ensheng Shi , Yuchi Ma , Jiachi Chen , Zibin Zheng

Repository-level code completion automatically predicts the unfinished code based on the broader information from the repository. Recent strides in Code Large Language Models (code LLMs) have spurred the development of repository-level code…

Computation and Language · Computer Science 2025-09-22 Sheng Zhang , Yifan Ding , Shuquan Lian , Shun Song , Hui Li

Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code…

Computation and Language · Computer Science 2024-06-28 Lei Zhang , Yunshui Li , Jiaming Li , Xiaobo Xia , Jiaxi Yang , Run Luo , Minzheng Wang , Longze Chen , Junhao Liu , Min Yang

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a…

Software Engineering · Computer Science 2024-06-05 Di Wu , Wasi Uddin Ahmad , Dejiao Zhang , Murali Krishna Ramanathan , Xiaofei Ma

As code completion task from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science 2025-12-05 Xinkui Zhao , Rongkai Liu , Yifan Zhang , Chen Zhi , Lufei Zhang , Guanjie Cheng , Yueshen Xu , Shuiguang Deng , Jianwei Yin

Code completion aims to enhance programming productivity by predicting potential code based on the current programming context. Recently, pretrained language models (LMs) have become prominent in this field. Various approaches have been…

Software Engineering · Computer Science 2024-02-23 Bolun Li , Zhihong Sun , Tao Huang , Hongyu Zhang , Yao Wan , Ge Li , Zhi Jin , Chen Lyu

Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized…

Software Engineering · Computer Science 2025-10-14 Xiaohan Chen , Zhongying Pan , Quan Feng , Yu Tian , Shuqun Yang , Mengru Wang , Lina Gong , Yuxia Geng , Piji Li , Xiang Chen

Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of…

Software Engineering · Computer Science 2024-08-15 Huy N. Phan , Hoang N. Phan , Tien N. Nguyen , Nghi D. Q. Bui

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires…

Software Engineering · Computer Science 2026-05-21 Yicheng Tao , Yuante Li , Yao Qin , Yepang Liu

Large language models (LLMs) struggle with relation completion (RC), both with and without retrieval-augmented generation (RAG), particularly when the required information is rare or sparsely represented. To address this, we propose a novel…

Computation and Language · Computer Science 2026-04-27 Fahmida Alam , Mihai Surdeanu , Ellen Riloff

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input…

Software Engineering · Computer Science 2024-07-31 Yanlin Wang , Yanli Wang , Daya Guo , Jiachi Chen , Ruikai Zhang , Yuchi Ma , Zibin Zheng

Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real-time. However, existing LLM-based code completion systems primarily…

Software Engineering · Computer Science 2024-12-12 Zhanming Guan , Junlin Liu , Jierui Liu , Chao Peng , Dexin Liu , Ningyuan Sun , Bo Jiang , Wenchao Li , Jie Liu , Hang Zhu

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science 2024-09-25 Jicheng Wang , Yifeng He , Hao Chen

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress for code generation. Recently, large language models (LLMs) have demonstrated remarkable…

Software Engineering · Computer Science 2025-11-19 Jia Li , Xianjie Shi , Kechi Zhang , Ge Li , Zhi Jin , Lei Li , Huangzhao Zhang , Jia Li , Fang Liu , Yuwei Zhang , Zhengwei Tao , Yihong Dong , Yuqi Zhu , Chongyang Tao

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang , Bei Chen , Yue Zhang , Jacky Keung , Jin Liu , Daoguang Zan , Yi Mao , Jian-Guang Lou , Weizhu Chen

Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation.…

Software Engineering · Computer Science 2025-08-11 Yanzhou Li , Shangqing Liu , Kangjie Chen , Tianwei Zhang , Yang Liu

Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed.…

‹ Prev 1 2 3 10 Next ›