Related papers: CCRep: Learning Code Change Representations via Pr…

SemRep: Generative Code Representation Learning with Code Transformations

Code transformation is a foundational capability in the software development process, where its effectiveness relies on constructing a high-quality code representation to characterize the input code semantics and guide the transformation.…

Machine Learning · Computer Science 2026-03-17 Weichen Li , Jiamin Song , Bogdan Alexandru Stoica , Arav Dhoot , Gabriel Ryan , Shengyu Fu , Kexin Pei

CCBERT: Self-Supervised Code Change Representation Learning

Numerous code changes are made by developers in their daily work, and a superior representation of code changes is desired for effective code change analysis. Recently, Hoang et al. proposed CC2Vec, a neural network-based approach that…

Software Engineering · Computer Science 2023-09-28 Xin Zhou , Bowen Xu , DongGyun Han , Zhou Yang , Junda He , David Lo

Unsupervised Learning of General-Purpose Embeddings for Code Changes

Applying machine learning to tasks that operate with code changes requires their numerical representation. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two different…

Software Engineering · Computer Science 2021-07-12 Mikhail Pravilov , Egor Bogomolov , Yaroslav Golubev , Timofey Bryksin

CC2Vec: Distributed Representations of Code Changes

Existing work on software patches often use features specific to a single task. These works often rely on manually identified features, and human effort is required to identify these features for each task. In this work, we propose CC2Vec,…

Software Engineering · Computer Science 2020-03-13 Thong Hoang , Hong Jin Kang , Julia Lawall , David Lo

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science 2021-11-18 Rocìo Cabrera Lozoya , Arnaud Baumann , Antonino Sabetta , Michele Bezzi

A Controlled Experiment of Different Code Representations for Learning-Based Bug Repair

Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches…

Software Engineering · Computer Science 2022-07-18 Marjane Namavar , Noor Nashid , Ali Mesbah

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred…

Computation and Language · Computer Science 2024-02-06 Dejiao Zhang , Wasi Ahmad , Ming Tan , Hantian Ding , Ramesh Nallapati , Dan Roth , Xiaofei Ma , Bing Xiang

CCT5: A Code-Change-Oriented Pre-Trained Model

Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as writing a description for the intention of the code change, or identifying the defect-prone code changes. Considering that…

Software Engineering · Computer Science 2023-05-19 Bo Lin , Shangwen Wang , Zhongxin Liu , Yepang Liu , Xin Xia , Xiaoguang Mao

TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to process various software intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method that deploys…

Software Engineering · Computer Science 2024-05-10 Qiushi Sun , Nuo Chen , Jianing Wang , Xiang Li , Ming Gao

Context-Encoded Code Change Representation for Automated Commit Message Generation

Changes in source code are an inevitable part of software development. They are the results of indispensable activities such as fixing bugs or improving functionality. Descriptions for code changes (commit messages) help people better…

Software Engineering · Computer Science 2023-06-27 Thanh Trong Vu , Thanh-Dat Do , Hieu Dinh Vo

Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair

A large body of the literature of automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although…

Software Engineering · Computer Science 2020-08-10 Haoye Tian , Kui Liu , Abdoul Kader Kaboreé , Anil Koyuncu , Li Li , Jacques Klein , Tegawendé F. Bissyandé

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal , Yikang Shen , Bailin Wang , Yoon Kim , Jie Chen

Learning the Relation between Code Features and Code Transforms with Structured Prediction

To effectively guide the exploration of the code transform space for automated code evolution techniques, we present in this paper the first approach for structurally predicting code transforms at the level of AST nodes using conditional…

Software Engineering · Computer Science 2023-06-06 Zhongxing Yu , Matias Martinez , Zimin Chen , Tegawendé F. Bissyandé , Martin Monperrus

An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

The presence of software vulnerabilities is an ever-growing issue in software development. In most cases, it is desirable to detect vulnerabilities as early as possible, preferably in a just-in-time manner, when the vulnerable piece is…

Software Engineering · Computer Science 2023-03-30 Tamás Aladics , Péter Hegedűs , Rudolf Ferenc

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length $\textit{code vector}$, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon , Meital Zilberstein , Omer Levy , Eran Yahav

Robust Graph Representation Learning via Predictive Coding

Predictive coding is a message-passing framework initially developed to model information processing in the brain, and now also topic of research in machine learning due to some interesting properties. One of such properties is the natural…

Machine Learning · Computer Science 2022-12-12 Billy Byiringiro , Tommaso Salvatori , Thomas Lukasiewicz

Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase

Code revert prediction, a specialized form of software defect detection, aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development. This task is very important in practice because by…

Software Engineering · Computer Science 2024-03-15 Yulong Pei , Salwa Alamir , Rares Dolga , Sameena Shah

Contrastive Code Representation Learning

Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like summarizing code in English, these representations should ideally capture program…

Machine Learning · Computer Science 2022-01-10 Paras Jain , Ajay Jain , Tianjun Zhang , Pieter Abbeel , Joseph E. Gonzalez , Ion Stoica

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' works and coordination. However, the semantic gap between source code and natural language poses a major challenge for…

Computation and Language · Computer Science 2021-06-22 Lun Yiu Nie , Cuiyun Gao , Zhicong Zhong , Wai Lam , Yang Liu , Zenglin Xu

Automating Code Review Activities by Large-Scale Pre-training

Code review is an essential part to software development lifecycle since it aims at guaranteeing the quality of codes. Modern code review activities necessitate developers viewing, understanding and even running the programs to assess…

Software Engineering · Computer Science 2022-10-12 Zhiyu Li , Shuai Lu , Daya Guo , Nan Duan , Shailesh Jannu , Grant Jenks , Deep Majumder , Jared Green , Alexey Svyatkovskiy , Shengyu Fu , Neel Sundaresan