Related papers: LabelGit: A Dataset for Software Repositories Clas…
Many platforms exploit collaborative tagging to provide their users with faster and more accurate results while searching or navigating. Tags can communicate different concepts such as the main features, technologies, functionality, and the…
GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been…
GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various…
GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly…
Communities on GitHub often use issue labels as a way of triaging issues by assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and not standardised…
README files play an essential role in shaping a developer's first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical…
GitHub is the most widely used platform for software maintenance in the open-source community. Developers report issues on GitHub from time to time while facing difficulties. Having labels on those issues can help developers easily address…
GitHub repositories consist of various detailed information about the project contributors, the number of commits and its contributors, releases, pull requests, programming languages, and issues. However, no systematic dataset of open…
App reviews reflect various user requirements that can aid in planning maintenance tasks. Recently, proposed approaches for automatically classifying user reviews rely on machine learning algorithms. A previous study demonstrated that…
Developers collaboratively discuss, implement, use, and share software entities hosted on software repositories. Proper documentation plays an important role in successful software management and maintenance. Users exploit Issue Tracking…
In a wave of growth, open-source projects need to modernize and change how they deal with processes, methods, and communication with their contributors. We could observe that open-source projects are constantly evolving to improve their…
The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
GitHub hosts millions of software repositories, facilitating developers to contribute to many projects in multiple ways. Most of the information about the repositories is text-based in the form of stars, forks, commits, and so on. However,…
Open-source repositories provide wealth of information and are increasingly being used to build artificial intelligence (AI) based systems to solve problems in software engineering. Open-source repositories could be of varying quality…
We present stack graphs, an extension of Visser et al.'s scope graphs framework. Stack graphs power Precise Code Navigation at GitHub, allowing users to navigate name binding references both within and across repositories. Like scope…
We introduce GraphSL, a new library for studying the graph source localization problem. graph diffusion and graph source localization are inverse problems in nature: graph diffusion predicts information diffusions from information sources,…
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving…
In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utilize Git, which is a distributed version…
Modern open source software development heavily relies on the issue tracking systems to manage their feature requests, bug reports, tasks, and other similar artifacts. Together, those "issues" form a complex network with links to each…