Related papers: Sampling Projects in GitHub for MSR Studies

200 papers

GitHub repositories contain a variety of detailed information about a project: its contributors, the number of commits and contributors, releases, pull requests, programming languages, and issues. However, no systematic dataset of open…

Software Engineering · Computer Science · 2020-12-08 · Shreyansh Surana, Smit Detroja, Saurabh Tiwari

GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are unlabeled or inadequately labeled, making it harder for users to find relevant projects. There have been various…

Software Engineering · Computer Science · 2023-11-21 · Cezar Sas, Andrea Capiluppi, Claudio Di Sipio, Juri Di Rocco, Davide Di Ruscio

Hosting over 10 million software projects, GitHub is one of the most important data sources for studying the behavior of developers and software projects. However, as open source datasets grow in size, the potential threats to…

Software Engineering · Computer Science · 2018-05-09 · Can Cheng, Bing Li, Zengyang Li, Peng Liang

Background: Data mining and analysis of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the…

Software Engineering · Computer Science · 2020-08-18 · Adam Tutko, Austin Henley, Audris Mockus

Mining Software Repositories (MSR) has recently become a popular research area. MSR analyzes different sources of data, such as version control systems, code repositories, defect tracking systems, archived communication, deployment logs,…

Software Engineering · Computer Science · 2025-01-06 · Zadia Codabux, Fatemeh Fard, Roberto Verdecchia, Fabio Palomba, Dario Di Nucci, Gilberto Recupito

Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users. Repository descriptions serve as one of the first points of contact for users who are accessing a…

Software Engineering · Computer Science · 2021-10-27 · Jazlyn Hellman, Eunbee Jang, Christoph Treude, Chenzhun Huang, Jin L. C. Guo

GitHub's issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engineering tasks like submitting bugs,…

Software Engineering · Computer Science · 2023-03-22 · Nafiseh Nikeghbal, Amir Hossein Kargaran, Abbas Heydarnoori, Hinrich Schütze

GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly…

Software Engineering · Computer Science · 2022-08-02 · Francisco Zanartu, Christoph Treude, Bruno Cartaxo, Hudson Silva Borges, Pedro Moura, Markus Wagner, Gustavo Pinto

Energy efficiency has become a growing concern in software development, leading to the need for tools designed to measure energy consumption. While several energy measurement tools are available as open-source projects, their…

Software Engineering · Computer Science · 2026-03-24 · Manuela Bechara Cannizza, Michel Albonico

Git is the distributed version control system used by many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…

Software Engineering · Computer Science · 2021-01-22 · Abdulkadir Şeker, Banu Diri, Halil Arslan, Mehmet Fatih Amasyalı

The SmartSHARK repository mining data is a collection of rich and detailed information about the evolution of software projects. The data is unique in its diversity and contains detailed information about each change, issue tracking data,…

Software Engineering · Computer Science · 2021-08-05 · Alexander Trautsch, Fabian Trautsch, Steffen Herbold

Large-scale code datasets have acquired an increasingly central role in software engineering (SE) research. This is the result of (i) the success of the mining software repositories (MSR) community, which pushed the standards of empirical…

Software Engineering · Computer Science · 2024-09-30 · Ozren Dabić, Rosalia Tufano, Gabriele Bavota

The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS,…

Digital Libraries · Computer Science · 2022-08-10 · Emily Escamilla, Martin Klein, Talya Cooper, Vicky Rampin, Michele C. Weigle, Michael L. Nelson

We propose a novel software service recommendation model to help users find suitable repositories on GitHub. Our model first designs a novel context-induced repository graph embedding method to leverage rich contextual information of…

Information Retrieval · Computer Science · 2021-12-21 · Mingwei Zhang, Jiayuan Liu, Weipu Zhang, Ke Deng, Hai Dong, Ying Liu

Developers use and contribute to repositories on GitHub. Documentation present in the repositories serves as an important source by helping developers to understand, maintain and contribute to the project. Currently, documentation in a…

Software Engineering · Computer Science · 2021-03-02 · Akhila Sri Manasa Venigalla, Sridhar Chimalakonda

Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems to solve problems in software engineering. Open-source repositories could be of varying quality…

Software Engineering · Computer Science · 2022-05-06 · Niranjan Hasabnis

Software repository hosting services contain large amounts of open-source software, with GitHub hosting more than 100 million repositories, from new to established ones. Given this vast number of projects, there is a pressing need for a…

Software Engineering · Computer Science · 2021-03-17 · Cezar Sas, Andrea Capiluppi

Software repositories are one of the sources of data in Empirical Software Engineering, primarily in the Mining Software Repositories field, which aims to extract knowledge from the dynamics and practice of software projects. With the…

Software Engineering · Computer Science · 2024-10-03 · June Gorostidi, Adem Ait, Jordi Cabot, Javier Luis Cánovas Izquierdo

Besides a Git-based version control system, GitHub integrates several social coding features. In particular, GitHub users can star a repository, presumably to manifest interest or satisfaction with an open source project. However, the real…

Software Engineering · Computer Science · 2019-03-20 · Hudson Borges, Marco Tulio Valente

The Mining Software Repositories (MSR) field focuses on analysing the rich data contained in software repositories to derive actionable insights into software processes and products. Mining repositories at scale requires techniques capable…

Software Engineering · Computer Science · 2026-04-02 · Miguel Romero-Arjona, Saman Barakat, Ana B. Sánchez, Sergio Segura