Related papers: Installing, Running and Maintaining Large Linux Cl…
The rapid development of large clusters built with commodity hardware has highlighted scalability issues with deploying and effectively running system software in large clusters. We describe here our experiences with monitoring, image…
Blockchain, like any other complex technology, needs a strong testing methodology to support its evolution in both research and development contexts. Setting up meaningful tests for permissionless blockchain technology is a notoriously…
HEP Cluster is designed and implemented in Scientific Linux Cern 5.5 to grant High Energy Physics researchers one place where they can go to undertake a particular task or to provide a parallel processing architecture in which CPU resources…
In this paper we report on the first two years of running the CERN testbed site for the EU DataGRID project. The site consists of about 120 dual-processor PCs distributed over several testbeds used for different purposes: software…
With the growing use of embedded systems in various industries, the need for automated platforms for the development and deployment of customized Linux-based operating systems has become more important. This research was conducted with the…
The CMS experiment heavily relies on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and…
Configuring the Linux kernel to meet specific requirements, such as binary size, is highly challenging due to its immense complexity-with over 15,000 interdependent options evolving rapidly across different versions. Although several…
Open-source software is increasingly reused, complicating the process of patching to repair bugs. In the case of Linux, a distinct ecosystem has formed, with Linux mainline serving as the upstream, stable or long-term-support (LTS) systems…
Linux containers have gained high popularity in recent times. This popularity is significantly due to various advantages of containers over Virtual Machines (VM). The containers are lightweight, occupy lesser storage, have fast boot-up…
Containers are an emerging technology that hold promise for improving productivity and code portability in scientific computing. We examine Linux container technology for the distribution of a non-trivial scientific computing software stack…
The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general use open-source code.…
Amid the rapid advancements in large machine learning (ML) models, universities worldwide are investing substantial funds and efforts into GPU clusters. However, managing a shared GPU cluster poses a pyramid of challenges, from hardware…
The globally distributed computing infrastructure required to cope with the multi-petabytes datasets produced by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN comprises several subsystems, such as…
Kubernetes, an open-source platform for automating the deployment, scaling, and management of containerized applications, is widely used for its efficiency and scalability. However, its complexity and extensive configuration options often…
Real-time operating systems employ spatial and temporal isolation to guarantee predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a…
As tools for designing multiple processor systems-on-chips (MPSoCs) continue to evolve to meet the demands of developers, there exist systematic gaps that must be bridged to provide a more cohesive hardware/software development environment.…
Highly configurable systems are highly complex systems, with the Linux kernel arguably being one of the most well-known ones. Since 2007, it has been a frequent target of the research community, conducting empirical studies and building…
Reusing third-party software packages is a common practice in software development. As the scale and complexity of open-source software (OSS) projects continue to grow (e.g., Linux distributions), the number of reused third-party packages…
Containers, enabling lightweight environment and performance isolation, fast and flexible deployment, and fine-grained resource sharing, have gained popularity in better application management and deployment in addition to hardware…
Edge computing is the practice of placing computing resources at the edges of the Internet in close proximity to devices and information sources. This, much like a cache on a CPU, increases bandwidth and reduces latency for applications but…