Related papers: How Different Are Different diff Algorithms in Git…
Despite being widely used, the algorithms that enable collaboration with Git are not well understood. The diff and merge algorithms are particularly interesting, as they could be applied in other contexts. In this thesis, I document the…
Software repositories such as Git have become a relevant source of information for software engineering researchers. For instance, the detection of commits that fulfill a given criterion (e.g., bugfixing commits) is one of the most frequent…
Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is valuable information to…
Software development is inherently incremental. Nowadays, many software companies adopt an agile process and a shorter release cycle, where software needs to be delivered faster with quality assurances. On the other hand, the majority of…
A source code difference (diff) indicates changes made by comparing new and old source code, and it can be utilized in code reviews to help developers understand the changes made to the code. Although many diff generation methods have been…
Just-in-time (JIT) compilers are key components for many popular programming languages with managed runtimes (e.g., Java and JavaScript). JIT compilers perform optimizations and generate native code at runtime based on dynamic profiling…
Method-level historical information is useful in research on mining software repositories such as fault-prone module detection or evolutionary coupling identification. An existing technique named Historage converts a Git repository of a…
A version control system, such as Git, requires a way to integrate changes from different developers or branches. Given a merge scenario, a merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual…
This paper presents Megadiff, a dataset of source code diffs. It focuses on Java, with strict inclusion criteria based on commit message and diff size. Megadiff contains 663 029 Java diffs that can be used for research on commit…
A version control system records changes to a file or set of files over time so that changes can be tracked and specific versions of a file can be recalled later. As such, it is an essential element of a reproducible workflow that deserves…
With software system complexity leading to a rise in software defects, research efforts have been made on techniques for predicting software defects and on Just-in-time (JIT) defect prediction, which predicts whether a code change is…
Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created through copy-and-paste practices and modified during development and maintenance activities. Since a…
This paper presents a large-scale study that investigates the bug resolution characteristics among popular GitHub projects written in different programming languages. We explore correlations but, of course, we cannot infer causation.…
Advancements in Artificial Intelligence, particularly with ChatGPT, have significantly impacted software development. Utilizing novel data from GitHub Innovation Graph, we hypothesize that ChatGPT enhances software production efficiency.…
Commit messages aid developers in their understanding of a continuously evolving codebase. However, developers do not always document code changes properly. Automatically generating commit messages would relieve this burden on developers.…
Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs, a…
Inspection of code changes is a time-consuming task that constitutes a big part of the everyday work of software engineers. Existing IDEs provide little information about the semantics of code changes within the file editor view. Therefore…
Understanding the changes made by developers when they submit a pull request and/or perform a commit on a repository is a crucial activity in software maintenance and evolution. The common way to review changes relies on examining code…
GitHub's issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engineering tasks like submitting bugs,…
Mutation testing is used extensively to support the experimentation of software engineering studies. Its application to real-world projects is possible thanks to modern tools that automate the whole mutation analysis process. However,…