Related papers: DiffSearch: A Scalable and Precise Search Engine f…
Developers often refactor source code to improve its quality during software development. A challenge in refactoring is to determine if it can be applied or not. To help with this decision-making process, we aim to search for past…
The immense amounts of source code provide ample challenges and opportunities during software development. To handle the size of code bases, developers commonly search for code, e.g., when trying to find where a particular feature is…
Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query.…
A source code difference (diff) indicates changes made by comparing new and old source codes, and it can be utilized in code reviews to help developers understand the changes made to the code. Although many diff generation methods have been…
A crucial activity in software maintenance and evolution is the comprehension of the changes performed by developers, when they submit a pull request and/or perform a commit on the repository. Typically, code changes are represented in the…
(Source) code search is widely concerned by software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement usually described in a natural language sentence, a…
Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged…
While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs,…
Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus require carefully designed queries containing information about programming APIs for code…
Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given query by effectively capturing the semantic…
Code search engine is an essential tool in software development. Many code search methods have sprung up, focusing on the overall ranking performance of code search. In this paper, we study code search from another perspective by analyzing…
Developers spend a significant amount of time searching for code: e.g., to understand how to complete, correct, or adapt their own code for a new context. Unfortunately, the state of the art in code search has not evolved much beyond text…
Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…
Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch…
Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. Despit strong…
This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs) by including both static and dynamic features as well as utilizing both similar and dissimilar examples during…
Comments within source code are essential for developers to comprehend the code's purpose and ensure its correct usage. However, as codebases evolve, maintaining an accurate alignment between the comments and the code becomes increasingly…
Duplicated code has a negative impact on the quality of software systems and should be detected at least. In this paper, we discuss an approach that improves source code retrieval using the structural information about the programs. We…
Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is a valuable information to…
Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is…