Related papers: On Tracking Java Methods with Git Mechanisms

On the Use of Information Retrieval to Automate the Detection of Third-Party Java Library Migration at the Method Level

The migration process between different third-party libraries is hard, complex and error-prone. Typically, during a library migration, developers need to find methods in the new library that are most adequate in replacing the old methods of…

Software Engineering · Computer Science 2019-06-07 Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni

A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree of Interest

Logging is a significant programming practice. Due to the highly transactional nature of modern software applications, massive amounts of logs are generated every day, which may overwhelm developers. Logging information overload can be…

Software Engineering · Computer Science 2022-07-20 Yiming Tang, Allan Spektor, Raffi Khatchadourian, Mehdi Bagherzadeh

How Different Are Different diff Algorithms in Git?

Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility and users can select algorithms of…

Software Engineering · Computer Science 2019-10-18 Yusuf Sulistyo Nugroho, Hideaki Hata, Kenichi Matsumoto

Automated Evolution of Feature Logging Statement Levels Using Git Histories and Degree of Interest

Logging -- used for system events and security breaches to describe more informational yet essential aspects of software features -- is pervasive. Given the high transactionality of today's software, logging effectiveness can be reduced by…

Software Engineering · Computer Science 2022-07-20 Yiming Tang, Allan Spektor, Raffi Khatchadourian, Mehdi Bagherzadeh

HistoryFinder: Advancing Method-Level Source Code History Generation with Accurate Oracles and Enhanced Algorithm

Reconstructing a method's change history efficiently and accurately is critical for many software engineering tasks, including maintenance, refactoring, and comprehension. Despite the availability of method history generation tools such as…

Software Engineering · Computer Science 2025-10-16 Shahidul Islam, Ashik Aowal, Md Sharif Uddin, Shaiful Chowdhury

LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs

Software repository hosting services contain large amounts of open-source software, with GitHub hosting more than 100 million repositories, from new to established ones. Given this vast amount of projects, there is a pressing need for a…

Software Engineering · Computer Science 2021-03-17 Cezar Sas, Andrea Capiluppi

GitBug-Java: A Reproducible Benchmark of Recent Java Bugs

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with…

Software Engineering · Computer Science 2024-11-04 André Silva, Nuno Saavedra, Martin Monperrus

RefDiff: Detecting Refactorings in Version Histories

Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is valuable information to…

Software Engineering · Computer Science 2018-08-07 Danilo Silva, Marco Tulio Valente

The Need for a Fine-grained approach in Just-in-Time Defect Prediction

With software system complexity leading to the rise of software defects, research efforts have been done on techniques towards predicting software defects and Just-in-time (JIT) defect prediction which predicts whether a code change is…

Software Engineering · Computer Science 2021-10-05 Giuseppe Ng, Charibeth Cheng

Tooling for Time- and Space-efficient git Repository Mining

Software projects under version control grow with each commit, accumulating up to hundreds of thousands of commits per repository. Especially for such large projects, the traversal of a repository and data extraction for static source code…

Software Engineering · Computer Science 2022-05-04 Fabian Heseding, Willy Scheibel, Jürgen Döllner

Coming: a Tool for Mining Change Pattern Instances from Git Commits

Software repositories such as Git have become a relevant source of information for software engineering researchers. For instance, the detection of commits that fulfill a given criterion (e.g., bugfixing commits) is one of the most frequent…

Software Engineering · Computer Science 2019-06-11 Matias Martinez, Martin Monperrus

Adoption and Evolution of Code Style and Best Programming Practices in Open-Source Projects

Following code style conventions in software projects is essential for maintaining overall code quality. Adhering to these conventions improves maintainability, understandability, and extensibility. Additionally, following best practices…

Software Engineering · Computer Science 2026-01-16 Alvari Kupari, Nasser Giacaman, Valerio Terragni

Public Git Archive: a Big Code dataset for all

The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research…

Software Engineering · Computer Science 2018-03-28 Vadim Markovtsev, Waren Long

A Two-phase Recommendation Framework for Consistent Java Method Names

In software engineering (SE) tasks, the naming approach is so important that it attracts many scholars from all over the world to study how to improve the quality of method names. To accurately recommend method names, we employ a novel…

Software Engineering · Computer Science 2022-01-25 Weidong Wang, Dian Li, Yujian Kang

More Effective Software Repository Mining

Background: Data mining and analyzing of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the…

Software Engineering · Computer Science 2020-08-18 Adam Tutko, Austin Henley, Audris Mockus

Automatic Traceability Maintenance via Machine Learning Classification

Previous studies have shown that software traceability, the ability to link together related artifacts from different sources within a project (e.g., source code, use cases, documentation, etc.), improves project outcomes by assisting…

Software Engineering · Computer Science 2018-07-19 Chris Mills, Javier Escobar-Avila, Sonia Haiduc

An Alternative Issue Tracking Dataset of Public Jira Repositories

Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can be independently improved, commented on,…

Software Engineering · Computer Science 2022-03-28 Lloyd Montgomery, Clara Lüders, Walid Maalej

An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects

Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of…

Software Engineering · Computer Science 2025-08-12 Kaveh Shahedi, Nana Gyambrah, Heng Li, Maxime Lamothe, Foutse Khomh

Methods2Test: A dataset of focal methods mapped to test cases

Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages of development and prevent regressions. Machine learning has emerged as viable approach to help software…

Software Engineering · Computer Science 2022-03-25 Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy

JUGE: An Infrastructure for Benchmarking Java Unit Test Generators

Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms…

Software Engineering · Computer Science 2022-10-31 Xavier Devroey, Alessio Gambi, Juan Pablo Galeotti, René Just, Fitsum Kifetew, Annibale Panichella, Sebastiano Panichella