Related papers: AskCI Server: Collaborative and version controlled…
Wikis provide a new way of collaboration and knowledge sharing. Wikis are software that allows users to work collectively on a web-based knowledge base. Wikis are characterised by a sense of anarchism, collaboration, connectivity, organic…
Git is used as the distributed version control system for many open-source software projects. One Git-based service, GitHub, is the most common code hosting and repository service for open-source software projects. For researchers that…
GitHub is the most popular social coding platform and widely used by developers and organizations to host their open-source projects around the world. Besides that, the platform has a web API that allow developers collect information from…
The ability to cite software and give credit to its authors and contributors is increasingly important. While the number of online open-source software repositories has grown rapidly over the past few years, few are being properly cited…
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving…
DevOps is a combination of methodologies and tools that improves the software development, build, deployment, and monitoring processes by shortening its lifecycle and improving software quality. Part of this process is CI/CD, which embodies…
Empirical Software Engineering (ESE) researchers study (open-source) project issues and the comments and threads within to discover -- among others -- challenges developers face when incorporating new technologies, platforms, and…
Developers and data scientists often struggle to write command-line inputs, even though graphical interfaces or tools like ChatGPT can assist. The solution? "ai-cli," an open-source system inspired by GitHub Copilot that converts natural…
We introduce Version 2 of SPECI, a system for predictive simulation modeling of large-scale data-centres, i.e. warehouse-sized facilities containing hundreds of thousands of servers, as used to provide cloud services.
Cross-community collaboration can exploit the expertise and knowledges of crowds in different communities. Recently increasing users in open source software (OSS) community like GitHub attempt to gather software requirements from question…
Open data refers to data that is freely available for reuse. Although there has been rapid increase in availability of open data to public in the last decade, this has not translated into better decision-support tools for them. We propose…
GitHub is the most widely used social, distributed version control system. It has around 10 million registered users and hosts over 16 million public repositories. Its user base is also very active as GitHub ranks in the top 100 Alexa most…
We introduce DataCI, a comprehensive open-source platform designed specifically for data-centric AI in dynamic streaming data settings. DataCI provides 1) an infrastructure with rich APIs for seamless streaming dataset management,…
We present SciServer, a science platform built and supported by the Institute for Data Intensive Engineering and Science at the Johns Hopkins University. SciServer builds upon and extends the SkyServer system of server-side tools that…
Artificial Intelligence systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech…
A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using…
Well established libraries typically have API documentation. However, they frequently lack examples and explanations, possibly making difficult their effective reuse. Stack Overflow is a question-and-answer website oriented to issues…
With over 28 million developers, success of the GitHub collaborative platform is highlighted through an abundance of communication channels among contemporary software projects. Knowledge is broken into two forms and its sharing (through…
The Platform for Content-Structure Inference (PCSI, pronounced "pixie") facilitates the sharing of information about the process of converting Web resources into structured content objects that conform to a predefined format. PCSI records…
Discussions is a new feature of GitHub for asking questions or discussing topics outside of specific Issues or Pull Requests. Before being available to all projects in December 2020, it had been tested on selected open source software…