Related papers: Bug Prediction Using Source Code Embedding Based o…

Using Distributed Representation of Code for Bug Detection

Recent advances in neural modeling for bug detection have been very promising. More specifically, using snippets of code to create continuous vectors or embeddings has been shown to be very good at method name prediction and…

Software Engineering · Computer Science 2020-05-14 Jón Arnar Briem, Jordi Smit, Hendrig Sellik, Pavel Rapoport

Towards Demystifying Dimensions of Source Code Embeddings

Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represents programs with…

Machine Learning · Computer Science 2022-06-17 Md Rafiqul Islam Rabin, Arjun Mukherjee, Omprakash Gnawali, Mohammad Amin Alipour

A Literature Study of Embeddings on Source Code

Natural language processing has improved tremendously after the success of word embedding techniques such as word2vec. Recently, the same idea has been applied on source code with encouraging results. In this survey, we aim to collect and…

Machine Learning · Computer Science 2019-04-08 Zimin Chen, Martin Monperrus

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization

Bug localization refers to the identification of source code files which is in a programming language and also responsible for the unexpected behavior of software using the bug report, which is a natural language. As bug localization is…

Software Engineering · Computer Science 2024-06-26 Partha Chakraborty, Venkatraman Arumugam, Meiyappan Nagappan

Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks

One of the most significant challenges in the field of software code auditing is the presence of vulnerabilities in software source code. Every year, more and more software flaws are discovered, either internally in proprietary code or…

Cryptography and Security · Computer Science 2023-06-16 Mst Shapna Akter, Hossain Shahriar, Juan Rodriguez Cardenas, Sheikh Iqbal Ahamed, Alfredo Cuzzocrea

A Controlled Experiment of Different Code Representations for Learning-Based Bug Repair

Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches…

Software Engineering · Computer Science 2022-07-18 Marjane Namavar, Noor Nashid, Ali Mesbah

GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses

Code embedding is a keystone in the application of machine learning on several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is…

Software Engineering · Computer Science 2022-01-24 Wei Ma, Mengjie Zhao, Ezekiel Soremekun, Qiang Hu, Jie Zhang, Mike Papadakis, Maxime Cordy, Xiaofei Xie, Yves Le Traon

Embedding Java Classes with code2vec: Improvements from Variable Obfuscation

Automatic source code analysis in key areas of software engineering, such as code security, can benefit from Machine Learning (ML). However, many standard ML approaches require a numeric representation of data and cannot be applied directly…

Machine Learning · Computer Science 2020-04-08 Rhys Compton, Eibe Frank, Panos Patros, Abigail Koay

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han, DongXia Wang, Wanting Li, Xuesong Lu

DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is coined as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and…

Computation and Language · Computer Science 2017-07-18 Sheng Chen, Akshay Soni, Aasish Pappu, Yashar Mehdad

Learning and Evaluating Contextual Embedding of Source Code

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…

Software Engineering · Computer Science 2020-08-19 Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This…

Computation and Language · Computer Science 2016-12-19 Jey Han Lau, Timothy Baldwin

Efficient Vector Representation for Documents through Corruption

We present an efficient document representation learning framework, Document Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average of word embeddings. It ensures a representation generated as such…

Computation and Language · Computer Science 2017-07-11 Minmin Chen

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science 2021-11-18 Rocìo Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Citation Recommendations Considering Content and Structural Context Embedding

The number of academic papers being published is increasing exponentially in recent years, and recommending adequate citations to assist researchers in writing papers is a non-trivial task. Conventional approaches may not be optimal, as the…

Information Retrieval · Computer Science 2020-01-09 Yang Zhang, Qiang Ma

CC2Vec: Distributed Representations of Code Changes

Existing work on software patches often use features specific to a single task. These works often rely on manually identified features, and human effort is required to identify these features for each task. In this work, we propose CC2Vec,…

Software Engineering · Computer Science 2020-03-13 Thong Hoang, Hong Jin Kang, Julia Lawall, David Lo

Method-Level Bug Severity Prediction using Source Code Metrics and LLMs

In the past couple of decades, significant research efforts are devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect…

Software Engineering · Computer Science 2023-09-07 Ehsan Mashhadi, Hossein Ahmadvand, Hadi Hemmati