
Source Code Clone Detection Using Unsupervised Similarity Measures

Software Engineering; Information Retrieval (v3, 2024-08-13)

Abstract

Assessing similarity in source code has gained significant attention in recent years because of its importance in software engineering tasks such as clone detection, code search, and code recommendation. This work presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to provide an overview of current state-of-the-art techniques and their strengths and weaknesses. To do so, we compile the existing unsupervised strategies and evaluate their performance on a benchmark dataset, with the aim of guiding software engineers in selecting appropriate methods for their specific use cases. The source code of this study is available at https://github.com/jorge-martinez-gil/codesim
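The abstract does not commit to a single measure, so the following is only an illustration of what an unsupervised (training-free) similarity measure for clone detection can look like. It is a minimal sketch in Python, assuming a simple regex lexer: it compares two code fragments with token-set Jaccard similarity and with the order-sensitive ratio from the standard library's `difflib.SequenceMatcher`. The tokenizer, the optional identifier normalization, the function names (`tokenize`, `jaccard_similarity`, `sequence_similarity`, `is_clone`) and the 0.5 threshold are hypothetical choices for illustration, not the method, settings, or code from the paper or its repository.

```python
import keyword
import re
from difflib import SequenceMatcher

# Assumed lexer: identifiers, integer literals, and any other single
# non-whitespace character (operators, brackets, quotes) as its own token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(code: str, normalize: bool = False) -> list[str]:
    """Split code into tokens; optionally replace non-keyword identifiers with 'ID'."""
    tokens = TOKEN_RE.findall(code)
    if normalize:
        # Abstracting identifier names makes renamed copies (Type-2 clones) look identical.
        tokens = [
            "ID" if IDENT_RE.fullmatch(t) and not keyword.iskeyword(t) else t
            for t in tokens
        ]
    return tokens


def jaccard_similarity(a: str, b: str, normalize: bool = False) -> float:
    """Order-insensitive overlap of the two token sets, in [0, 1]."""
    ta, tb = set(tokenize(a, normalize)), set(tokenize(b, normalize))
    if not ta and not tb:
        return 1.0  # two empty fragments are trivially identical
    return len(ta & tb) / len(ta | tb)


def sequence_similarity(a: str, b: str, normalize: bool = False) -> float:
    """Order-sensitive similarity: difflib's 2*M/T ratio over the token sequences."""
    return SequenceMatcher(None, tokenize(a, normalize), tokenize(b, normalize)).ratio()


def is_clone(a: str, b: str, threshold: float = 0.5, normalize: bool = False) -> bool:
    """Hypothetical decision rule: flag a pair as a clone if Jaccard >= threshold."""
    return jaccard_similarity(a, b, normalize) >= threshold


if __name__ == "__main__":
    f1 = "def add(a, b):\n    return a + b"
    f2 = "def sum2(x, y):\n    return x + y"  # same logic, renamed identifiers
    print(round(jaccard_similarity(f1, f2), 3))        # 0.538
    print(round(sequence_similarity(f1, f2), 3))       # 0.583
    print(jaccard_similarity(f1, f2, normalize=True))  # 1.0
    print(is_clone(f1, "print('hello')"))              # False
```

The example pair shows one trade-off such a comparison has to weigh: purely lexical measures drop sharply when identifiers are renamed (about 0.54 Jaccard here), while a normalization step recovers the match at the cost of also conflating fragments that merely share structure. The choice of threshold is likewise a dataset-dependent assumption.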


Cite

@article{arxiv.2401.09885,
  title  = {Source Code Clone Detection Using Unsupervised Similarity Measures},
  author = {Jorge Martinez-Gil},
  journal= {arXiv preprint arXiv:2401.09885},
  year   = {2024}
}

Comments

Accepted for publication as a full paper at Software Quality Days 2024, Vienna, Austria.