Related papers: CodeDSI: Differentiable Code Search

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

CoDesc: A Large Code-Description Parallel Dataset

Translation between natural language and source code can help software development by enabling developers to comprehend, ideate, search, and write computer programs in natural language. Despite growing interest from the industry and the…

Computation and Language · Computer Science 2021-06-01 Masum Hasan, Tanveer Muttaqueen, Abdullah Al Ishtiaq, Kazi Sajeed Mehrab, Md. Mahim Anjum Haque, Tahmid Hasan, Wasi Uddin Ahmad, Anindya Iqbal, Rifat Shahriyar

Code Search Debiasing: Improve Search Results beyond Overall Ranking Performance

Code search engine is an essential tool in software development. Many code search methods have sprung up, focusing on the overall ranking performance of code search. In this paper, we study code search from another perspective by analyzing…

Computation and Language · Computer Science 2024-02-20 Sheng Zhang, Hui Li, Yanlin Wang, Zhao Wei, Yong Xiu, Juhong Wang, Rongrong Ji

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…

Computation and Language · Computer Science 2023-06-28 Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

A Neural-based Program Decompiler

Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the…

Programming Languages · Computer Science 2019-07-01 Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao

Deep Learning Based Code Generation Methods: Literature Review

This paper focuses on Code Generation task that aims at generating relevant code fragments according to given natural language descriptions. In the process of software development, developers often encounter two scenarios. One is requested…

Software Engineering · Computer Science 2024-04-19 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Ge Li, Michael Lyu

Opportunities and Challenges in Code Search Tools

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged…

Software Engineering · Computer Science 2021-10-12 Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, John Grundy

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science 2018-08-31 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding

Over the past few years, deep neural networks (DNNs) have been continuously expanding their real-world applications for source code processing tasks across the software engineering domain, e.g., clone detection, code search, comment…

Software Engineering · Computer Science 2021-01-21 Maryam Vahdat Pour, Zhuo Li, Lei Ma, Hadi Hemmati

CodeFusion: A Pre-trained Diffusion Model for Code Generation

Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar…

Software Engineering · Computer Science 2023-11-02 Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Searching a Database of Source Codes Using Contextualized Code Search

Consider the case where a programmer has written some part of a program, but has left part of the program (such as a method or a function body) incomplete. The goal is to use the context surrounding the missing code to automatically 'figure…

Software Engineering · Computer Science 2020-07-28 Rohan Mukherjee, Swarat Chaudhuri, Chris Jermaine

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…

Machine Learning · Computer Science 2025-07-15 Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

DocPrompting: Generating Code by Retrieving the Docs

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…

Computation and Language · Computer Science 2023-02-21 Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig

Code Search: A Survey of Techniques for Finding Code

The immense amounts of source code provide ample challenges and opportunities during software development. To handle the size of code bases, developers commonly search for code, e.g., when trying to find where a particular feature is…

Software Engineering · Computer Science 2022-10-06 Luca Di Grazia, Michael Pradel

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…

Information Retrieval · Computer Science 2020-02-26 Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, Shikun Zhang

Survey of Code Search Based on Deep Learning

Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given query by effectively capturing the semantic…

Software Engineering · Computer Science 2023-12-14 Yutao Xie, Jiayi Lin, Hande Dong, Lei Zhang, Zhonghai Wu

CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning

Context: Automated code generation using Foundation Models (FMs) offers promising solutions for enhancing software development efficiency. However, challenges remain in ensuring domain specificity, cost-effectiveness, and security -…

Software Engineering · Computer Science 2025-09-19 Huy Le, Phong Nguyen, Hao Do, Tuan Nguyen, Thien Pham, Anh Nguyen-Duc, Tho Quan

Neural Code Search Evaluation Dataset

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation…

Software Engineering · Computer Science 2019-10-03 Hongyu Li, Seohyun Kim, Satish Chandra

CodeT: Code Generation with Generated Tests

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science 2022-11-24 Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen