Related papers: Retrieval-Based Neural Code Generation

RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation

As text and code resources have expanded, large-scale pre-trained models have shown promising capabilities in code generation tasks, typically employing supervised fine-tuning with problem statement-program pairs. However, increasing model…

Computation and Language · Computer Science 2025-04-10 Nathanaël Beau, Benoît Crabbé

ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation

Recent advances in large language models (LLMs) have demonstrated impressive capabilities in code-related tasks, such as code generation and automated program repair. Despite their promising performance, most existing approaches for code…

Software Engineering · Computer Science 2025-09-03 Yicong Zhao, Shisong Chen, Jiacheng Zhang, Zhixu Li

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets in a specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates a brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…

Information Retrieval · Computer Science 2020-02-26 Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, Shikun Zhang

Retrieval Augmented Code Generation and Summarization

Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or…

Software Engineering · Computer Science 2021-09-13 Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although neural language models achieve significant performance in this field, they are limited by their…

Artificial Intelligence · Computer Science 2024-04-02 Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent

In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries. On three benchmark datasets, we investigate how code retrieval systems…

Information Retrieval · Computer Science 2020-08-28 Geert Heyman, Tom Van Cutsem

In Tree Structure Should Sentence Be Generated

Generative models reliant on sequential autoregression have been at the forefront of language generation for an extensive period, particularly following the introduction of widely acclaimed transformers. Despite its excellent performance,…

Computation and Language · Computer Science 2024-06-21 Yaguang Li, Xin Chen

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science 2022-03-16 Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

Retrieve and Refine: Exemplar-based Neural Comment Generation

Code comment generation, which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually-crafted…

Software Engineering · Computer Science 2020-10-12 Bolin Wei, Yongmin Li, Ge Li, Xin Xia, Zhi Jin

Retrieval-Augmented Generation for Code Summarization via Hybrid GNN

Source code summarization aims to generate natural language summaries from structured code snippets to help developers better understand code functionality. However, automatic code summarization is challenging due to the complexity of the source code…

Machine Learning · Computer Science 2021-05-14 Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, Yang Liu

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted,…

Software Engineering · Computer Science 2025-03-27 Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

Improving Retrieval-Augmented Code Comment Generation by Retrieving for Generation

Code comment generation aims to generate high-quality comments from source code automatically and has been studied for years. Recent studies proposed to integrate information retrieval techniques with neural generation models to tackle this…

Software Engineering · Computer Science 2024-08-08 Hanzhen Lu, Zhongxin Liu

A Grammar-Based Structural CNN Decoder for Code Generation

Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more…

Machine Learning · Computer Science 2018-11-19 Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, Lu Zhang

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval allows software engineers to search for code through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval from…

Software Engineering · Computer Science 2021-10-19 Mehdi Bahrami, N. C. Shrikanth, Yuji Mizobuchi, Lei Liu, Masahiro Fukuyori, Wei-Peng Chen, Kazuki Munakata

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine…

Machine Learning · Computer Science 2019-02-22 Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning

Code generation aims to automatically generate a piece of code given an input natural language utterance. Currently, among dominant models, it is treated as a sequence-to-tree task, where a decoder outputs a sequence of actions…

Artificial Intelligence · Computer Science 2021-06-01 Binbin Xie, Jinsong Su, Yubin Ge, Xiang Li, Jianwei Cui, Junfeng Yao, Bin Wang

Retrieve and Refine: Exemplar-based Neural Comment Generation

Code comment generation is a crucial task in the field of automatic software development. Most previous neural comment generation systems used an encoder-decoder neural network and encoded only information from source code as input.…

Software Engineering · Computer Science 2019-10-24 Bolin Wei

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

CodeDSI: Differentiable Code Search

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science 2022-10-04 Usama Nadeem, Noah Ziems, Shaoen Wu