Related papers: Binary R Packages for Linux: Past, Present and Fut…
One of the most powerful features of R is its infrastructure for contributed code. The built-in package manager and complementary repositories provide a great system for development and exchange of code, and have played an important role in…
The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing…
The increasing availability of cloud computing and scientific super computers brings great potential for making R accessible through public or shared resources. This allows us to efficiently run code requiring lots of cycles and memory, or…
Recent developments in the commercial open source community have catalysed the use of Linux containers for scalable deployment of web-based applications to the cloud. Scientific software can be containerized with dependencies, configuration…
R packages are the fundamental units of reproducible code in R, providing a mechanism for distributing user-developed code, documentation, and data. Docker is a virtualization technology that allows applications and their dependencies to be…
Reusing third-party software packages is a common practice in software development. As the scale and complexity of open-source software (OSS) projects continue to grow (e.g., Linux distributions), the number of reused third-party packages…
Science depends on collaboration, result reproduction, and the development of supporting software tools. Each of these requires careful management of software versions. We present a unified model for installing, managing, and publishing…
To harness the full benefit of new computing platforms, it is necessary to develop software with parallel computing capabilities. This is no less true for statisticians than for astrophysicists. The R programming language, which is perhaps…
Containers are an emerging technology that hold promise for improving productivity and code portability in scientific computing. We examine Linux container technology for the distribution of a non-trivial scientific computing software stack…
Binary package managers install software quickly but they limit configurability due to rigid ABI requirements that ensure compatibility between binaries. Source package managers provide flexibility in building software, but compilation can…
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling…
High Performance Computing~(HPC) software stacks have become complex, with the dependencies of some applications numbering in the hundreds. Packaging, distributing, and administering software stacks of that scale is a complex undertaking…
In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, including not just CRAN, but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files…
Linux containers have gained high popularity in recent times. This popularity is significantly due to various advantages of containers over Virtual Machines (VM). The containers are lightweight, occupy lesser storage, have fast boot-up…
Linux container technologies such as Docker and Singularity offer encapsulated environments for easy execution of software. In high performance computing, this is especially important for evolving and complex software stacks with…
Snap is an alternative software packaging system developed by Canonical and provided by default in the Ubuntu Linux distribution. Given the heterogeneity of various Linux distributions and their various releases, Snap allows an…
As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though…
Transparency is crucial in security-critical applications that rely on authoritative information, as it provides a robust mechanism for holding these authorities accountable for their actions. A number of solutions have emerged in recent…
An increasing volume of studies utilize geocomputation methods in large spatial data. There is a bottleneck in scalable computation for general scientific use as the existing solutions require high-performance computing domain knowledge and…
Package managers are a very important part of Linux distributions but we have noticed two weaknesses in them: They use pre-built packages that are not optimised for specific hardware and often they are too heavy for a specific need, or…