Related papers: Pitfalls and Guidelines for Using Time-Based Git D…
Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data however, time-based data is often dirty. To date, there are no…
Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g.…
In open-source software development environments; textual, numerical and relationship-based data generated are of interest to researchers. Various data sets are available for this data, which is frequently used in areas such as software…
With the advent of open source software, a veritable treasure trove of previously proprietary software development data was made available. This opened the field of empirical software engineering research to anyone in academia. Data that is…
GitHub's issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engineering tasks like submitting bugs,…
Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g.…
Large project overruns and overtime work have been reported in the software industry, resulting in additional expense for companies and personal issues for developers. The present work aims to provide an overview of studies related to time…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
Context: The establishment of the Mining Software Repositories (MSR) data showcase conference track has encouraged researchers to provide data sets as a basis for further empirical studies. Objective: Examine the usage of data papers…
Background: Data mining and analyzing of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the…
Researchers often delve into the connections between different factors derived from the historical data of software projects. For example, scholars have devoted their endeavors to the exploration of associations among these factors.…
Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par…
Without sufficient information about research data practices occurring in a particular research organisation, there is a risk of mismatching research data service efforts with the needs of its researchers. This study describes how data…
Energy efficiency has become a growing concern in software development, leading to the need for tools designed to measure energy consumption. While several energy measurement tools are available as open-source projects, their…
Modern programming languages like Java require runtime systems to support the implementation and deployment of software applications in diverse computing platforms and operating systems. These runtime systems are normally developed in…
Millions of developers share their code on open-source platforms like GitHub, which offer social coding opportunities such as distributed collaboration and popularity-based ranking. Software engineering researchers have joined in as well,…
Almost every Mining Software Repositories (MSR) study requires, as first step, the selection of the subject software repositories. These repositories are usually collected from hosting services like GitHub using specific selection criteria…
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. Since LLMs train on wide swaths of the internet, this practice raises concerns of data…
Logs are widely used to record runtime information of software systems, such as the timestamp and the importance of an event, the unique ID of the source of the log, and a part of the state of a task's execution. The rich information of…
Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location…