Related papers: CoaCor: Code Annotation for Code Retrieval with Re…

Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent

In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries. On three benchmark datasets, we investigate how code retrieval systems…

Information Retrieval · Computer Science 2020-08-28 Geert Heyman, Tom Van Cutsem

OCoR: An Overlapping-Aware Code Retriever

Code retrieval helps developers reuse the code snippet in the open-source projects. Given a natural language description, code retrieval aims to search for the most relevant code among a set of code. Existing state-of-the-art approaches…

Computation and Language · Computer Science 2020-08-21 Qihao Zhu, Zeyu Sun, Xiran Liang, Yingfei Xiong, Lu Zhang

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science 2022-03-16 Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from…

Software Engineering · Computer Science 2021-10-19 Mehdi Bahrami, N. C. Shrikanth, Yuji Mizobuchi, Lei Liu, Masahiro Fukuyori, Wei-Peng Chen, Kazuki Munakata

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Clone-Seeker: Effective Code Clone Search Using Annotations

Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is…

Software Engineering · Computer Science 2021-06-08 Muhammad Hammad, Önder Babur, Hamid Abdul Basit, Mark van den Brand

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…

Information Retrieval · Computer Science 2020-02-26 Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, Shikun Zhang

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

Adversarial Training for Code Retrieval with Question-Description Relevance Regularization

Code retrieval is a key task aiming to match natural and programming languages. In this work, we propose adversarial learning for code retrieval, that is regularized by question-description relevance. First, we adapt a simple adversarial…

Computation and Language · Computer Science 2020-11-11 Jie Zhao, Huan Sun

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science 2018-08-31 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Finding codes given natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we…

Computation and Language · Computer Science 2021-05-28 Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

Retrieval Augmented Code Generation and Summarization

Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or…

Software Engineering · Computer Science 2021-09-13 Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science 2026-01-28 Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such as performant CUDA kernels and competition-level…

Machine Learning · Computer Science 2026-01-12 Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

CoNCRA: A Convolutional Neural Network Code Retrieval Approach

Software developers routinely search for code using general-purpose search engines. However, these search engines cannot find code semantically unless it has an accompanying description. We propose a technique for semantic code search: A…

Machine Learning · Computer Science 2024-01-24 Marcelo de Rezende Martins, Marco A. Gerosa

RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation

As text and code resources have expanded, large-scale pre-trained models have shown promising capabilities in code generation tasks, typically employing supervised fine-tuning with problem statement-program pairs. However, increasing model…

Computation and Language · Computer Science 2025-04-10 Nathanaël Beau, Benoît Crabbé

Building A Coding Assistant via the Retrieval-Augmented Language Model

Pretrained language models have shown strong effectiveness in code-related tasks, such as code retrieval, code generation, code summarization, and code completion tasks. In this paper, we propose COde assistaNt viA retrieval-augmeNted…

Computation and Language · Computer Science 2024-11-05 Xinze Li, Hanbin Wang, Zhenghao Liu, Shi Yu, Shuo Wang, Yukun Yan, Yukai Fu, Yu Gu, Ge Yu

REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models

This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs) by including both static and dynamic features as well as utilizing both similar and dissimilar examples during…

Software Engineering · Computer Science 2024-04-17 Anthony Saieva, Saikat Chakraborty, Gail Kaiser

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

A prerequisite for coding agents to perform tasks on large repositories is code localization - the identification of relevant files, classes, and functions to work on. While repository-level code localization has been performed using…

Software Engineering · Computer Science 2026-03-19 Lintang Sutawika, Aditya Bharat Soni, Bharath Sriraam R R, Apurva Gandhi, Taha Yassine, Sanidhya Vijayvargiya, Yuchen Li, Xuhui Zhou, Yilin Zhang, Leander Melroy Maben, Graham Neubig

Reinforced Context Order Recovery for Adaptive Reasoning and Planning

Modern causal language models, followed by rapid developments in discrete diffusion models, can now produce a wide variety of interesting and useful content. However, these families of models are predominantly trained to output tokens with…

Computation and Language · Computer Science 2025-08-19 Long Ma, Fangwei Zhong, Yizhou Wang