CC2Vec: Distributed Representations of Code Changes

Software Engineering 2020-03-13 v1

Abstract

Existing work on software patches often uses features specific to a single task. Such work typically relies on manually identified features, so human effort is required to identify these features for each task. In this work, we propose CC2Vec, a neural network model that learns a representation of code changes guided by their accompanying log messages, which represent the semantic intent of the code changes. CC2Vec models the hierarchical structure of a code change with the help of the attention mechanism and uses multiple comparison functions to identify the differences between the removed and added code. To evaluate whether CC2Vec can produce a distributed representation of code changes that is general and useful for multiple tasks on software patches, we use the vectors produced by CC2Vec for three tasks: log message generation, bug-fixing patch identification, and just-in-time defect prediction. In all three tasks, the models using CC2Vec outperform the state-of-the-art techniques.
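The abstract names two architectural ingredients: attention over the hierarchical structure of a code change, and multiple comparison functions between the removed and added code. The following is a minimal, non-authoritative sketch of both ideas in plain Python, under loudly stated assumptions: token embeddings are given as plain lists of floats, a single fixed `query` vector stands in for the learned attention parameters at every level, and the comparison set (element-wise difference, element-wise product, cosine similarity, Euclidean distance) is an illustrative choice that may not match the exact functions CC2Vec uses. The names `attention_pool`, `encode`, and `compare` are hypothetical and do not come from the paper.

```python
import math
from typing import List

# Hypothetical sketch only: not the CC2Vec implementation.
Vector = List[float]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Softmax attention: weight each vector by exp(dot(v, query)), then average."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total for i in range(dim)]


def encode(lines: List[List[Vector]], query: Vector) -> Vector:
    """Two-level (hierarchical) pooling: tokens -> line vectors -> one vector.

    Assumption: the same query is reused at both levels; a trained model
    would learn separate attention parameters per level.
    """
    line_vectors = [attention_pool(tokens, query) for tokens in lines]
    return attention_pool(line_vectors, query)


def compare(removed: Vector, added: Vector) -> Vector:
    """Concatenate several comparison features of the removed and added vectors."""
    diff = [a - r for r, a in zip(removed, added)]  # added minus removed
    prod = [r * a for r, a in zip(removed, added)]
    norm_r = math.sqrt(sum(r * r for r in removed))
    norm_a = math.sqrt(sum(a * a for a in added))
    cosine = sum(prod) / (norm_r * norm_a) if norm_r and norm_a else 0.0
    euclid = math.sqrt(sum(d * d for d in diff))
    return diff + prod + [cosine, euclid]


# Toy input: one removed line with two tokens, two added lines with one token each.
removed_vec = encode([[[1.0, 0.0], [0.0, 1.0]]], query=[0.0, 0.0])
added_vec = encode([[[1.0, 1.0]], [[1.0, 1.0]]], query=[1.0, 0.0])
features = compare(removed_vec, added_vec)
print(len(features))  # 6
```

Concatenating several comparison results, rather than relying on a single distance, lets a downstream layer pick up both direction-like signals (the difference) and overlap-like signals (the product and cosine similarity); in a full model this feature vector would feed further learned layers.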

Cite

@article{arxiv.2003.05620,
  title  = {CC2Vec: Distributed Representations of Code Changes},
  author = {Thong Hoang and Hong Jin Kang and Julia Lawall and David Lo},
  journal= {arXiv preprint arXiv:2003.05620},
  year   = {2020}
}