Related papers: A Tool to Extract Structured Data from GitHub
Developers use and contribute to repositories on GitHub. Documentation present in the repositories serves as an important source by helping developers to understand, maintain and contribute to the project. Currently, documentation in a…
The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research…
GitHub is the most popular social coding platform and is widely used by developers and organizations around the world to host their open-source projects. In addition, the platform has a web API that allows developers to collect information from…
GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various…
Almost every Mining Software Repositories (MSR) study requires, as first step, the selection of the subject software repositories. These repositories are usually collected from hosting services like GitHub using specific selection criteria…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
Software is often developed using version control software, such as Git, and hosted on centralized web hosts, such as GitHub and GitLab. These web-hosted software repositories are made available to users in the form of traditional HTML…
GitHub is the largest source code repository in the world. It provides a git-based source code management platform and also many features inspired by social networks. For example, GitHub users can show appreciation to projects by adding…
GitHub is the largest code hosting platform, with millions of repositories spanning multiple technologies. Despite this, little is known about the actual contents of GitHub's repositories in the wild. This paper presents an initial…
Software repository hosting services contain large amounts of open-source software, with GitHub hosting more than 100 million repositories, from new to established ones. Given this vast number of projects, there is a pressing need for a…
Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI)-based systems to solve problems in software engineering. Open-source repositories could be of varying quality…
GitHub hosts millions of software repositories, enabling developers to contribute to many projects in multiple ways. Most of the information about the repositories is text-based in the form of stars, forks, commits, and so on. However,…
Besides a git-based version control system, GitHub integrates several social coding features. Particularly, GitHub users can star a repository, presumably to manifest interest or satisfaction with an open source project. However, the real…
GitHub is the world's largest platform for collaborative software development, with over 100 million users. GitHub is also used extensively for open data collaboration, hosting more than 800 million open data files, totaling 142 terabytes…
Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users. Repository descriptions serve as one of the first points of contact for users who are accessing a…
GitHub projects can be easily replicated through the site's fork process or through a Git clone-push sequence. This is a problem for empirical software engineering, because it can lead to skewed results or mistrained machine learning…
GitHub has become the central online platform for much of open source, hosting most open source code repositories. With this popularity, the public digital traces of GitHub are now a valuable means to study teamwork and collaboration. In…
GitHub's issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engineering tasks like submitting bugs,…
In open-source software development environments, the textual, numerical and relationship-based data generated are of interest to researchers. Various data sets are available for this data, which is frequently used in areas such as software…
[Background] In large open-source software projects, development knowledge is often fragmented across multiple artefacts and contributors such that individual stakeholders are generally unaware of the full breadth of the product features.…