Related papers: Optimizing CMS build infrastructure via Apache Mes…

The CMS monitoring infrastructure and applications

The globally distributed computing infrastructure required to cope with the multi-petabyte datasets produced by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN comprises several subsystems, such as…

Software Engineering · Computer Science 2020-07-08 Christian Ariza-Porras, Valentin Kuznetsov, Federica Legger

Scylla: A Mesos Framework for Container Based MPI Jobs

Open source cloud technologies provide a wide range of support for creating customized compute node clusters to schedule tasks and manage resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific…

Performance · Computer Science 2019-05-23 Pankaj Saha, Angel Beltre, Madhusudhan Govindaraju

The archive solution for distributed workflow management agents of the CMS experiment at LHC

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its…

High Energy Physics - Experiment · Physics 2018-01-12 Valentin Kuznetsov, Nils Leif Fischer, Yuyi Guo

Exploiting Apache Spark platform for CMS computing analytics

CERN IT provides a set of Hadoop clusters featuring more than 5 PBytes of raw storage with different open-source, user-level tools available for analytical purposes. The CMS experiment started collecting a large set of computing…

Data Analysis, Statistics and Probability · Physics 2017-11-03 Marco Meoni, Valentin Kuznetsov, Luca Menichetti, Justinas Rumševičius, Tommaso Boccali, Daniele Bonacorsi

MultiCloud Resource Management using Apache Mesos for Planned Integration with Apache Airavata

We discuss initial results and our planned approach for incorporating Apache Mesos based resource management that will enable design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple…

Performance · Computer Science 2020-10-07 Pankaj Saha, Madhusudhan Govindaraju, Suresh Marru, Marlon Pierce

Data intensive physics analysis in Azure cloud

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is one of the largest data producers in the scientific world, with standard data products centrally produced, and then used by often competing teams within the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-20 Igor Sfiligoi, Frank Würthwein, Diego Davila

Migration of CMSWEB Cluster at CERN to Kubernetes

The CMS experiment heavily relies on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-31 Muhammad Imran, Valentin Kuznetsov, Lina Marcella, Katarzyna Maria Dziedziniewicz-Wojcik, Andreas Pfeiffer, Panos Paparrigopoulos

Exploring the Fairness and Resource Distribution in an Apache Mesos Environment

Apache Mesos, a cluster-wide resource manager, is widely deployed at massive scale in several clouds and data centers. Mesos aims to provide high cluster utilization via fine-grained resource co-scheduling and resource fairness among…

Performance · Computer Science 2019-05-22 Pankaj Saha, Angel Beltre, Madhusudhan Govindaraju

CMS Software Distribution on the LCG and OSG Grids

The efficient exploitation of worldwide distributed storage and computing resources available in the grids requires a robust, transparent and fast deployment of experiment-specific software. The approach followed by the CMS experiment at…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-15 K. Rabbertz, M. Thomas, S. Ashby, M. Corvo, S. Argirò, N. Darmenov, R. Darwish, D. Evans, B. Holzman, N. Ratnikova, S. Muzaffar, A. Nowack, T. Wildish, B. Kim, J. Weng, V. Büge

CMS software and computing for LHC Run 2

The CMS offline software and computing system has successfully met the challenge of LHC Run 2. In this presentation, we will discuss how the entire system was improved in anticipation of increased trigger output rate, increased rate of…

Instrumentation and Detectors · Physics 2016-11-11 Kenneth Bloom

Two stage cluster for resource optimization with Apache Mesos

As resource estimation for jobs is difficult, users often overestimate their requirements. Both commercial clouds and academic campus clusters suffer from low resource utilization and long wait times as the resource estimates for jobs,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-27 Gourav Rattihalli, Pankaj Saha, Madhusudhan Govindaraju, Devesh Tiwari

Installing, Running and Maintaining Large Linux Clusters at CERN

Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim Smith

Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster

The NEMO High Performance Computing Cluster at the University of Freiburg has been made available to researchers of the ATLAS and CMS experiments. Users access the cluster from external machines connected to the World-wide LHC Computing…

Computational Physics · Physics 2018-12-31 Felix Bührer, Frank Fischer, Georg Fleig, Anton Gamel, Manuel Giffels, Thomas Hauth, Michael Janczyk, Konrad Meier, Günter Quast, Benoît Roland, Ulrike Schnoor, Markus Schumacher, Dirk von Suchodoletz, Bernd Wiebelt

An outlook of the user support model to educate the users community at the CMS Experiment

The CMS (Compact Muon Solenoid) experiment is one of the two large general-purpose particle physics detectors built at the LHC (Large Hadron Collider) at CERN in Geneva, Switzerland. The diverse collaboration combined with a highly…

Popular Physics · Physics 2011-10-04 Sudhir Malik, Kati Lassila-Perini

Web Based Monitoring in the CMS Experiment at CERN

The Compact Muon Solenoid (CMS) is a large and complex general purpose experiment at the CERN Large Hadron Collider (LHC), built and maintained by many collaborators from around the world. Efficient operation of the detector requires…

Instrumentation and Detectors · Physics 2014-09-04 William Badgett, Laura Borrello, Irakli Chakaberia, Dominique Gigi, Young-Kwon Jo, Juan Antonio Lopez-Perez, Kaori Maeshima, Sho Maruyama, James Patrick, Valdas Rapsevicius, Aron Soha, Balys Sulmanas, Zongru Wan

Squeezing Edge Performance: A Sensitivity-Aware Container Management for Heterogeneous Tasks

Edge computing enables latency-critical applications to process data close to end devices, yet task heterogeneity and limited resources pose significant challenges to efficient orchestration. This paper presents a measurement-driven,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-01 Yongmin Zhang, Pengyu Huang, Mingyi Dong, Jing Yao

Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing

The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-24 Marco Mascheroni, Antonio Perez-Calero Yzquierdo, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem, Damiele Spiga, Christoph Wissing, Frank Wurthwein

The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond

While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will be increasingly employing HPC and Cloud facilities…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-24 Antonio Perez-Calero Yzquierdo, Marco Mascheroni, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem

CMS Software and Computing: Ready for Run 2

In Run 1 of the Large Hadron Collider, software and computing was a strategic strength of the Compact Muon Solenoid experiment. The timely processing of data and simulation samples and the excellent performance of the reconstruction…

Computational Physics · Physics 2015-10-05 Kenneth Bloom

Deep Learning with Apache SystemML

Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different…

Machine Learning · Computer Science 2018-02-14 Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen