Related papers: CC2Vec: Distributed Representations of Code Change…

CCBERT: Self-Supervised Code Change Representation Learning

Numerous code changes are made by developers in their daily work, and a superior representation of code changes is desired for effective code change analysis. Recently, Hoang et al. proposed CC2Vec, a neural network-based approach that…

Software Engineering · Computer Science 2023-09-28 Xin Zhou , Bowen Xu , DongGyun Han , Zhou Yang , Junda He , David Lo

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon , Meital Zilberstein , Omer Levy , Eran Yahav

node2vec: Scalable Feature Learning for Networks

Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating…

Social and Information Networks · Computer Science 2016-07-05 Aditya Grover , Jure Leskovec

CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back

Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step to automate many software engineering tasks related to code changes, e.g., commit message generation and just-in-time…

Software Engineering · Computer Science 2023-02-09 Zhongxin Liu , Zhijie Tang , Xin Xia , Xiaohu Yang

Using Distributed Representation of Code for Bug Detection

Recent advances in neural modeling for bug detection have been very promising. More specifically, using snippets of code to create continuous vectors or embeddings has been shown to be very good at method name prediction and…

Software Engineering · Computer Science 2020-05-14 Jón Arnar Briem , Jordi Smit , Hendrig Sellik , Pavel Rapoport

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection

With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement).…

Software Engineering · Computer Science 2024-05-02 Shihan Dou , Yueming Wu , Haoxiang Jia , Yuhao Zhou , Yan Liu , Yang Liu

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science 2021-11-18 Rocío Cabrera Lozoya , Arnaud Baumann , Antonino Sabetta , Michele Bezzi

eye2vec: Learning Distributed Representations of Eye Movement for Program Comprehension Analysis

This paper presents eye2vec, an infrastructure for analyzing software developers' eye movements while reading source code. In common eye-tracking studies in program comprehension, researchers must preselect analysis targets such as control…

Software Engineering · Computer Science 2025-10-16 Haruhiko Yoshioka , Kazumasa Shimari , Hidetake Uwano , Kenichi Matsumoto

Bug Prediction Using Source Code Embedding Based on Doc2Vec

Bug prediction is a resource demanding task that is hard to automate using static source code analysis. In many fields of computer science, machine learning has proven to be extremely useful in tasks like this, however, for it to work we…

Software Engineering · Computer Science 2021-10-12 Tamás Aladics , Judit Jász , Rudolf Ferenc

Distributed Representation of Subgraphs

Network embeddings have become very popular in learning effective feature representations of networks. Motivated by the recent successes of embeddings in natural language processing, researchers have tried to find network embeddings in…

Social and Information Networks · Computer Science 2017-02-23 Bijaya Adhikari , Yao Zhang , Naren Ramakrishnan , B. Aditya Prakash

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha , Shafiq Joty , Naeemul Hassan , Mohammad Al Hasan

graph2vec: Learning Distributed Representations of Graphs

Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph…

Artificial Intelligence · Computer Science 2017-07-18 Annamalai Narayanan , Mahinthan Chandramohan , Rajasekar Venkatesan , Lihui Chen , Yang Liu , Shantanu Jaiswal

SaC2Vec: Information Network Representation with Structure and Content

Network representation learning (also known as information network embedding) has been the central piece of research in social and information network analysis for the last couple of years. An information network can be viewed as a linked…

Social and Information Networks · Computer Science 2018-07-05 Sambaran Bandyopadhyay , Harsh Kara , Anirban Biswas , M N Murty

On using distributed representations of source code for the detection of C security vulnerabilities

This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the…

Cryptography and Security · Computer Science 2021-06-04 David Coimbra , Sofia Reis , Rui Abreu , Corina Păsăreanu , Hakan Erdogmus

KeyVec: Key-semantics Preserving Document Representations

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms…

Computation and Language · Computer Science 2017-09-29 Bin Bi , Hao Ma

motif2vec: Motif Aware Node Representation Learning for Heterogeneous Networks

Recent years have witnessed a surge of interest in machine learning on graphs and networks with applications ranging from vehicular network design to IoT traffic management to social network recommendations. Supervised machine learning…

Social and Information Networks · Computer Science 2019-08-23 Manoj Reddy Dareddy , Mahashweta Das , Hao Yang

Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair

A large body of the literature of automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although…

Software Engineering · Computer Science 2020-08-10 Haoye Tian , Kui Liu , Abdoul Kader Kaboré , Anil Koyuncu , Li Li , Jacques Klein , Tegawendé F. Bissyandé

cf2vec: Collaborative Filtering algorithm selection using graph distributed representations

Algorithm selection using Metalearning aims to find mappings between problem characteristics (i.e. metafeatures) with relative algorithm performance to predict the best algorithm(s) for new datasets. Therefore, it is of the utmost…

Information Retrieval · Computer Science 2018-09-18 Tiago Cunha , Carlos Soares , André C. P. L. F. de Carvalho

Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks

In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts to individual neural network filter responses, where interesting…

Computer Vision and Pattern Recognition · Computer Science 2018-03-30 Ruth Fong , Andrea Vedaldi

A Controlled Experiment of Different Code Representations for Learning-Based Bug Repair

Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches…

Software Engineering · Computer Science 2022-07-18 Marjane Namavar , Noor Nashid , Ali Mesbah