Related papers: Optimizing CMS build infrastructure via Apache Mes…
The globally distributed computing infrastructure required to cope with the multi-petabytes datasets produced by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN comprises several subsystems, such as…
Open source cloud technologies provide a wide range of support for creating customized compute node clusters to schedule tasks and managing resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific…
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its…
The CERN IT provides a set of Hadoop clusters featuring more than 5 PBytes of raw storage with different open-source, user-level tools available for analytical purposes. The CMS experiment started collecting a large set of computing…
We discuss initial results and our planned approach for incorporating Apache Mesos based resource management that will enable design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple…
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is one of the largest data producers in the scientific world, with standard data products centrally produced, and then used by often competing teams within the…
The CMS experiment heavily relies on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and…
Apache Mesos, a cluster-wide resource manager, is widely deployed in massive scale at several Clouds and Data Centers. Mesos aims to provide high cluster utilization via fine grained resource co-scheduling and resource fairness among…
The efficient exploitation of worldwide distributed storage and computing resources available in the grids require a robust, transparent and fast deployment of experiment specific software. The approach followed by the CMS experiment at…
The CMS offline software and computing system has successfully met the challenge of LHC Run 2. In this presentation, we will discuss how the entire system was improved in anticipation of increased trigger output rate, increased rate of…
As resource estimation for jobs is difficult, users often overestimate their requirements. Both commercial clouds and academic campus clusters suffer from low resource utilization and long wait times as the resource estimates for jobs,…
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling…
The NEMO High Performance Computing Cluster at the University of Freiburg has been made available to researchers of the ATLAS and CMS experiments. Users access the cluster from external machines connected to the World-wide LHC Computing…
The CMS (Compact Muon Solenoid) experiment is one of the two large general-purpose particle physics detectors built at the LHC (Large Hadron Collider) at CERN in Geneva, Switzerland. The diverse collaboration combined with a highly…
The Compact Muon Solenoid (CMS) is a large and complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient operation of the detector requires…
Edge computing enables latency-critical applications to process data close to end devices, yet task heterogeneity and limited resources pose significant challenges to efficient orchestration. This paper presents a measurement-driven,…
The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource,…
While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will be increasingly employing HPC and Cloud facilities…
In Run 1 of the Large Hadron Collider, software and computing was a strategic strength of the Compact Muon Solenoid experiment. The timely processing of data and simulation samples and the excellent performance of the reconstruction…
Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different…