Related papers: High-Throughput Computing on High-Performance Plat…
ATLAS, a general-purpose experiment at the Large Hadron Collider (LHC), makes use of a large internationally distributed computing infrastructure, including over $10^6$ TB of managed data on disk and tape and almost one million…
The ATLAS experiment has developed extensive software and distributed computing systems for Run 3 of the LHC. These systems are described in detail, including software infrastructure and workflows, distributed data and workload management,…
This paper describes the use of a distributed cloud computing system for high-throughput computing (HTC) scientific applications. The distributed cloud computing system is composed of a number of separate Infrastructure-as-a-Service (IaaS)…
The ATLAS experiment at CERN relies on a worldwide distributed computing Grid infrastructure to support its physics program at the Large Hadron Collider. ATLAS has integrated cloud computing resources to complement its Grid infrastructure…
Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing…
This presentation covered the experiences of the LHC experiments with grid computing, with a focus on distributed analysis. After many years of development, preparation, exercises, and validation, the LHC (Large…
The advent of experimental science facilities (instruments and observatories such as the Large Hadron Collider, the Laser Interferometer Gravitational Wave Observatory, and the upcoming Large Synoptic Survey Telescope) has brought about…
Large High Energy Physics (HEP) experiments adopted a distributed computing model more than a decade ago. WLCG, the global computing infrastructure for LHC, in partnership with the US Open Science Grid, has achieved data management at the…
As particle physics experiments push their limits on both the energy and the intensity frontiers, the amount and complexity of the produced data are also expected to increase accordingly. With such large data volumes, next-generation…
In 2002 the ATLAS experiment started a series of Data Challenges (DC) whose goals are to validate the Computing Model, the complete software suite, and the data model, and to ensure the correctness of the technical choices to…
In April 2023, HEPScore23, the new benchmark based on HEP-specific applications, was adopted by WLCG, replacing HEP-SPEC06. As part of the transition to the new benchmark, the CPU corepower published by the sites needed to be compared with…
In this chapter we will argue that studying such multi-scale multi-science systems gives rise to inherently hybrid models containing many different algorithms best serviced by different types of computing environments (ranging from…
The aggregate power use of computing hardware is an important cost factor in scientific cluster and distributed computing systems. The Worldwide LHC Computing Grid (WLCG) is a major example of such a distributed computing system, used…
For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing…
Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often…
High energy physics (HEP) experiments at the LHC generate data at a rate of $\mathcal{O}(10)$ Terabits per second. This data rate is expected to increase exponentially as the experiments are upgraded to achieve higher…
Optimal use of computing resources requires extensive coding, tuning and benchmarking. To boost developer productivity in these time-consuming tasks, we introduce the Experimental Linear Algebra Performance Studies framework (ELAPS), a…
Artificial Intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL)…
The Latin American Giant Observatory (LAGO) project utilizes extensive High-Performance Computing (HPC) resources for complex astroparticle physics simulations, making resource efficiency critical for scientific productivity and…
Each LHC experiment will produce datasets with sizes of order one petabyte per year. All of this data must be stored, processed, transferred, simulated and analyzed, which requires a computing system of a larger scale than ever mounted for…