Related papers: BERT2Code: Can Pretrained Language Models be Lever…

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Semantic code search is the task of retrieving relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and…

Computation and Language · Computer Science 2022-01-28 Chen Wu , Ming Yan

A Literature Study of Embeddings on Source Code

Natural language processing has improved tremendously after the success of word embedding techniques such as word2vec. Recently, the same idea has been applied on source code with encouraging results. In this survey, we aim to collect and…

Machine Learning · Computer Science 2019-04-08 Zimin Chen , Martin Monperrus

Semantic Source Code Search: A Study of the Past and a Glimpse at the Future

With the recent explosion in the size and complexity of source codebases and software projects, the need for efficient source code search engines has increased dramatically. Unfortunately, existing information retrieval-based methods fail…

Software Engineering · Computer Science 2021-09-27 Muhammad Khalifa

Probing Pretrained Models of Source Code

Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task.…

Software Engineering · Computer Science 2022-11-18 Sergey Troshin , Nadezhda Chirkova

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…

Computation and Language · Computer Science 2023-06-28 Ryo Sekizawa , Nan Duan , Shuai Lu , Hitomi Yanaka

Learning and Evaluating Contextual Embedding of Source Code

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…

Software Engineering · Computer Science 2020-08-19 Aditya Kanade , Petros Maniatis , Gogul Balakrishnan , Kensen Shi

Neural Code Search Evaluation Dataset

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation…

Software Engineering · Computer Science 2019-10-03 Hongyu Li , Seohyun Kim , Satish Chandra

VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search

Traditional retrieval methods have been essential for assessing document similarity but struggle with capturing semantic nuances. Despite advancements in latent semantic analysis (LSA) and deep learning, achieving comprehensive semantic…

Information Retrieval · Computer Science 2024-09-27 Solmaz Seyed Monir , Irene Lau , Shubing Yang , Dongfang Zhao

Semantic Certainty Assessment in Vector Retrieval Systems: A Novel Framework for Embedding Quality Evaluation

Vector retrieval systems exhibit significant performance variance across queries due to heterogeneous embedding quality. We propose a lightweight framework for predicting retrieval performance at the query level by combining quantization…

Information Retrieval · Computer Science 2025-07-09 Y. Du

Extracting Sentence Embeddings from Pretrained Transformer Models

Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in…

Computation and Language · Computer Science 2025-02-21 Lukas Stankevičius , Mantas Lukoševičius

Survey of Code Search Based on Deep Learning

Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given query by effectively capturing the semantic…

Software Engineering · Computer Science 2023-12-14 Yutao Xie , Jiayi Lin , Hande Dong , Lei Zhang , Zhonghai Wu

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li , Yelong Shen , Ruoming Jin , Yi Mao , Kuan Wang , Weizhu Chen

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza , Christoph Schwizer , Jian Gu , Harald C. Gall

Text Embeddings for Retrieval From a Large Knowledge Base

Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a…

Information Retrieval · Computer Science 2019-05-03 Tolgahan Cakaloglu , Christian Szegedy , Xiaowei Xu

Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal Learning

The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approaches based on state-of-the-art model architectures. Nevertheless, we believe that the current…

Software Engineering · Computer Science 2022-02-22 Martin Weyssow , Houari Sahraoui , Bang Liu

Code Search based on Context-aware Code Translation

Code search is a widely used technique by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning…

Software Engineering · Computer Science 2022-02-17 Weisong Sun , Chunrong Fang , Yuchen Chen , Guanhong Tao , Tingxu Han , Quanjun Zhang

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models…

Software Engineering · Computer Science 2022-08-15 Ahmed Khanfir , Matthieu Jimenez , Mike Papadakis , Yves Le Traon

Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent

In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries. On three benchmark datasets, we investigate how code retrieval systems…

Information Retrieval · Computer Science 2020-08-28 Geert Heyman , Tom Van Cutsem

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun , Alice Shoshana Jakobovits , Torsten Hoefler

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from…

Software Engineering · Computer Science 2021-10-19 Mehdi Bahrami , N. C. Shrikanth , Yuji Mizobuchi , Lei Liu , Masahiro Fukuyori , Wei-Peng Chen , Kazuki Munakata