Related papers: Evaluating Representation Learning of Code Changes…

The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches

A large body of the literature on automated program repair develops approaches where patches are automatically generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches,…

Software Engineering · Computer Science 2022-11-15 Haoye Tian, Kui Liu, Yinghua Li, Abdoul Kader Kaboré, Anil Koyuncu, Andrew Habib, Li Li, Junhao Wen, Jacques Klein, Tegawendé F. Bissyandé

A Controlled Experiment of Different Code Representations for Learning-Based Bug Repair

Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches…

Software Engineering · Computer Science 2022-07-18 Marjane Namavar, Noor Nashid, Ali Mesbah

On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

Automated program repair (APR) attempts to generate correct patches and has drawn wide attention from both academia and industry in the past decades. However, APR is continuously struggling with the patch overfitting issue due to the weak…

Software Engineering · Computer Science 2026-04-07 Quanjun Zhang, Haichuan Hu, Chunrong Fang, Ye Shang, Tao Zheng, Zhenyu Chen, Yun Yang, Liang Xiao

Learning Blended, Precise Semantic Program Embeddings

Learning neural program embeddings is key to utilizing deep neural networks in programming languages research: precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks.…

Software Engineering · Computer Science 2019-07-12 Ke Wang, Zhendong Su

Predicting Patch Correctness Based on the Similarity of Failing Test Cases

Towards predicting patch correctness in APR, we propose a simple, but novel hypothesis on how the link between the patch behaviour and failing test specifications can be drawn: similar failing test cases should require similar patches. We…

Software Engineering · Computer Science 2022-03-17 Haoye Tian, Yinghua Li, Weiguo Pian, Abdoul Kader Kaboré, Kui Liu, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé

Dynamic Neural Program Embedding for Program Repair

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, fault localization, etc. However, most existing program embeddings are based on syntactic…

Artificial Intelligence · Computer Science 2018-07-03 Ke Wang, Rishabh Singh, Zhendong Su

Exploring Plausible Patches Using Source Code Embeddings in JavaScript

Despite the immense popularity of the Automated Program Repair (APR) field, the question of patch validation is still open. Most of the present-day approaches follow the so-called Generate-and-Validate approach, where first a candidate…

Software Engineering · Computer Science 2021-04-01 Viktor Csuvik, Dániel Horváth, Márk Lajkó, László Vidács

Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction

Learning effective numerical representations, or embeddings, of programs is a fundamental prerequisite for applying machine learning to automate and enhance compiler optimization. Prevailing paradigms, however, present a dilemma. Static…

Machine Learning · Computer Science 2025-10-16 Haolin Pan, Jinyuan Dong, Hongbin Zhang, Hongyu Lin, Mingjie Xing, Yanjun Wu

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han, DongXia Wang, Wanting Li, Xuesong Lu

CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back

Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step to automate many software engineering tasks related to code changes, e.g., commit message generation and just-in-time…

Software Engineering · Computer Science 2023-02-09 Zhongxin Liu, Zhijie Tang, Xin Xia, Xiaohu Yang

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers

Pretrained language models (LMs) are prone to arithmetic errors. Existing work showed limited success in probing numeric values from models' representations, indicating that these errors can be attributed to the inherent unreliability of…

Computation and Language · Computer Science 2025-10-27 Marek Kadlčík, Michal Štefánik, Timothee Mickus, Michal Spiegel, Josef Kuchař

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

APPT: Boosting Automated Patch Correctness Prediction via Fine-tuning Pre-trained Models

Automated program repair (APR) aims to fix software bugs automatically without human debugging efforts and plays a crucial role in software development and maintenance. Despite its promise, APR is still challenged by a long-standing…

Software Engineering · Computer Science 2024-01-17 Quanjun Zhang, Chunrong Fang, Weisong Sun, Yan Liu, Tieke He, Xiaodong Hao, Zhenyu Chen

Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning…

Software Engineering · Computer Science 2018-08-21 Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

Learning to Represent Edits

We introduce the problem of learning distributed representations of edits. By combining a "neural editor" with an "edit encoder", our models learn to represent the salient information of an edit and can be used to apply edits to new inputs.…

Machine Learning · Computer Science 2019-02-25 Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt

Learning to Represent Patches

Patch representation is crucial in automating various software engineering tasks, like determining patch accuracy or summarizing code changes. While recent research has employed deep learning for patch representation, focusing on token…

Software Engineering · Computer Science 2023-10-05 Xunzhu Tang, Haoye Tian, Zhenghan Chen, Weiguo Pian, Saad Ezzini, Abdoul Kader Kaboré, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé

Learning Edge Representations via Low-Rank Asymmetric Projections

We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information…

Machine Learning · Computer Science 2017-09-15 Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou

Improved Representation Learning for Predicting Commonsense Ontologies

Recent work in learning ontologies (hierarchical and partially-ordered structures) has leveraged the intrinsic geometry of spaces of learned representations to make predictions that automatically obey complex structural constraints. We…

Computation and Language · Computer Science 2017-08-03 Xiang Li, Luke Vilnis, Andrew McCallum

An Exploratory Study on Code Attention in BERT

Many recent models in software engineering introduced deep neural models based on the Transformer architecture or use transformer-based Pre-trained Language Models (PLM) trained on code. Although these models achieve the state of the art…

Software Engineering · Computer Science 2022-04-22 Rishab Sharma, Fuxiang Chen, Fatemeh Fard, David Lo