Related papers: Retrieval Augmented Code Generation and Summarizat…

RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization

Existing summarization systems mostly generate summaries purely relying on the content of the source document. However, even for humans, we usually need some references or exemplars to help us fully understand the source document and write…

Computation and Language · Computer Science · 2021-12-14 · Chenxin An, Ming Zhong, Zhichao Geng, Jianqiang Yang, Xipeng Qiu

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted,…

Software Engineering · Computer Science · 2025-03-27 · Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from…

Software Engineering · Computer Science · 2021-10-19 · Mehdi Bahrami, N. C. Shrikanth, Yuji Mizobuchi, Lei Liu, Masahiro Fukuyori, Wei-Peng Chen, Kazuki Munakata

Retrieval-Augmented Generation for Code Summarization via Hybrid GNN

Source code summarization aims to generate natural language summaries from structured code snippets for better understanding code functionalities. However, automatic code summarization is challenging due to the complexity of the source code…

Machine Learning · Computer Science · 2021-05-14 · Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, Yang Liu

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science · 2018-08-31 · Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

EditSum: A Retrieve-and-Edit Framework for Source Code Summarization

Existing studies show that code summaries help developers understand and maintain source code. Unfortunately, these summaries are often missing or outdated in software projects. Code summarization aims to generate natural language…

Software Engineering · Computer Science · 2023-09-08 · Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, Zhi Jin

RepoSummary: Feature-Oriented Summarization and Documentation Generation for Code Repositories

Repository summarization is a crucial research question in development and maintenance for software engineering. Existing repository summarization techniques primarily focus on summarizing code according to the directory tree, which is…

Software Engineering · Computer Science · 2025-10-14 · Yifeng Zhu, Xianlin Zhao, Xutian Li, Yanzhen Zou, Haizhuo Yuan, Yue Wang, Bing Xie

SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion

Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized…

Software Engineering · Computer Science · 2025-10-14 · Xiaohan Chen, Zhongying Pan, Quan Feng, Yu Tian, Shuqun Yang, Mengru Wang, Lina Gong, Yuxia Geng, Piji Li, Xiang Chen

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science · 2025-01-24 · Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science · 2022-12-22 · Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

Improving Retrieval-Augmented Code Comment Generation by Retrieving for Generation

Code comment generation aims to generate high-quality comments from source code automatically and has been studied for years. Recent studies proposed to integrate information retrieval techniques with neural generation models to tackle this…

Software Engineering · Computer Science · 2024-08-08 · Hanzhen Lu, Zhongxin Liu

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science · 2022-03-16 · Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…

Information Retrieval · Computer Science · 2020-02-26 · Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, Shikun Zhang

An Extractive-and-Abstractive Framework for Source Code Summarization

(Source) Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code…

Software Engineering · Computer Science · 2023-11-07 · Weisong Sun, Chunrong Fang, Yuchen Chen, Quanjun Zhang, Guanhong Tao, Tingxu Han, Yifei Ge, Yudu You, Bin Luo

ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation

Recent advances in large language models (LLMs) have demonstrated impressive capabilities in code-related tasks, such as code generation and automated program repair. Despite their promising performance, most existing approaches for code…

Software Engineering · Computer Science · 2025-09-03 · Yicong Zhao, Shisong Chen, Jiacheng Zhang, Zhixu Li

When Retriever Meets Generator: A Joint Model for Code Comment Generation

Automatically generating concise, informative comments for source code can lighten documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then…

Software Engineering · Computer Science · 2025-07-25 · Tien P. T. Le, Anh M. T. Bui, Huy N. D. Pham, Alessio Bucaioni, Phuong T. Nguyen

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although neural language models achieve significant performance in this field, they are limited by their…

Artificial Intelligence · Computer Science · 2024-04-02 · Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science · 2026-01-28 · Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

Ensemble Models for Neural Source Code Summarization of Subroutines

A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of…

Software Engineering · Computer Science · 2021-07-27 · Alexander LeClair, Aakash Bansal, Collin McMillan

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic…

Computation and Language · Computer Science · 2024-02-01 · Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning