Related papers: CRaDLe: Deep Code Retrieval Based on Semantic Depe…

Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning

Selecting the right knowledge is critical when using large language models (LLMs) to solve domain-specific data analysis tasks. However, most retrieval-augmented approaches rely primarily on lexical or embedding similarity, which is often a…

Computation and Language · Computer Science 2026-04-28 Xinyi Huang

Deep Graph Matching and Searching for Semantic Code Retrieval

Code retrieval is to find the code snippet from a large corpus of source code repositories that highly matches the query of natural language description. Recent work mainly uses natural language processing techniques to process both query…

Artificial Intelligence · Computer Science 2021-06-23 Xiang Ling, Lingfei Wu, Saizhuo Wang, Gaoning Pan, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

Source Code Retrieval Using Sequence Based Similarity

Duplicated code has a negative impact on the quality of software systems and should be detected at least. In this paper, we discuss an approach that improves source code retrieval using the structural information about the programs. We…

Software Engineering · Computer Science 2013-08-19 Yoshihisa Udagawa

Automating the Detection of Requirement Dependencies Using Large Language Models

Requirements are inherently interconnected through various types of dependencies. Identifying these dependencies is essential, as they underpin critical decisions and influence a range of activities throughout software development. However,…

Software Engineering · Computer Science 2026-02-27 Ikram Darif, Feifei Niu, Manel Abdellatif, Lionel C. Briand, Ramesh S., Arun Adiththan

On the Challenges and Opportunities of Learned Sparse Retrieval for Code

Retrieval over large codebases is a key component of modern LLM-based software engineering systems. Existing approaches predominantly rely on dense embedding models, while learned sparse retrieval (LSR) remains largely unexplored for code.…

Information Retrieval · Computer Science 2026-03-24 Simon Lupart, Maxime Louis, Thibault Formal, Hervé Déjean, Stéphane Clinchant

CSRS: Code Search with Relevance Matching and Semantic Matching

Developers often search and reuse existing code snippets in the process of software development. Code search aims to retrieve relevant code snippets from a codebase according to natural language queries entered by the developer. Up to now,…

Software Engineering · Computer Science 2022-04-28 Yi Cheng, Li Kuang

Structural Code Search using Natural Language Queries

Searching code is a common task that developers perform to understand APIs, learn common code patterns, and navigate code. Currently, developers most commonly search using keywords and regular expressions that are easy to use and widely…

Software Engineering · Computer Science 2025-07-04 Ben Limpanukorn, Yanjun Wang, Zach Patterson, Pranav Garg, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras, Miryung Kim

Clone-Seeker: Effective Code Clone Search Using Annotations

Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is…

Software Engineering · Computer Science 2021-06-08 Muhammad Hammad, Önder Babur, Hamid Abdul Basit, Mark van den Brand

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science 2018-08-31 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

Source Code Comments: Overlooked in the Realm of Code Clone Detection

Reusing code can produce duplicate or near-duplicate code clones in code repositories. Current code clone detection techniques, like Program Dependence Graphs, rely on code structure and their dependencies to detect clones. These techniques…

Software Engineering · Computer Science 2020-06-26 Sandeep Kaur Kuttal, Akash Ghosh

Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch…

Software Engineering · Computer Science 2018-07-25 Mohammad Masudur Rahman, Chanchal K. Roy

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Semantic code search is the task of retrieving relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and…

Computation and Language · Computer Science 2022-01-28 Chen Wu, Ming Yan

SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing

Code retrieval, which retrieves code snippets based on users' natural language descriptions, is widely used by developers and plays a pivotal role in real-world software development. The advent of deep learning has shifted the retrieval…

Software Engineering · Computer Science 2024-12-17 Wenchao Gu, Ensheng Shi, Yanlin Wang, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Michael R. Lyu

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly…

Machine Learning · Computer Science 2020-06-09 Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, Marc Brockschmidt

Do Not Treat Code as Natural Language: Implications for Repository-Level Code Generation and Beyond

Large language models for code (CodeLLMs) have demonstrated remarkable success in standalone code completion and generation, sometimes even surpassing human performance, yet their effectiveness diminishes in repository-level settings where…

Software Engineering · Computer Science 2026-02-13 Minh Le-Anh, Huyen Nguyen, Khanh An Tran, Nam Le Hai, Linh Ngo Van, Nghi D. Q. Bui, Bach Le

CoNCRA: A Convolutional Neural Network Code Retrieval Approach

Software developers routinely search for code using general-purpose search engines. However, these search engines cannot find code semantically unless it has an accompanying description. We propose a technique for semantic code search: A…

Machine Learning · Computer Science 2024-01-24 Marcelo de Rezende Martins, Marco A. Gerosa

Dependency Parsing with the Structuralized Prompt Template

Dependency parsing is a fundamental task in natural language processing (NLP), aiming to identify syntactic dependencies and construct a syntactic tree for a given sentence. Traditional dependency parsing models typically construct…

Computation and Language · Computer Science 2025-02-25 Keunha Kim, Youngjoong Ko

Instructive Code Retriever: Learn from Large Language Model's Feedback for Code Intelligence Tasks

Recent studies proposed to leverage large language models (LLMs) with In-Context Learning (ICL) to handle code intelligence tasks without fine-tuning. ICL employs task instructions and a set of examples as demonstrations to guide the model…

Software Engineering · Computer Science 2024-10-16 Jiawei Lu, Haoye Wang, Zhongxin Liu, Keyu Liang, Lingfeng Bao, Xiaohu Yang

Crowd Sourced Data Analysis: Mapping of Programming Concepts to Syntactical Patterns

Since programming concepts do not match their syntactic representations, code search is a very tedious task. For instance in Java or C, array doesn't match [], so using "array" as a query, one cannot find what they are looking for. Often…

Information Retrieval · Computer Science 2019-04-01 Deepak Thukral, Darvesh Punia

Code Search based on Context-aware Code Translation

Code search is a widely used technique by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning…

Software Engineering · Computer Science 2022-02-17 Weisong Sun, Chunrong Fang, Yuchen Chen, Guanhong Tao, Tingxu Han, Quanjun Zhang