Related papers: The A4 project: physics data processing using the …
In this article, we present the High-Performance Output (HiPO) data format developed at Jefferson Laboratory for storing and analyzing data from Nuclear Physics experiments. The format was designed to efficiently store large amounts of…
We propose a data format for Monte Carlo (MC) events, or any structural data, including experimental data, in a compact binary form using variable-size integer encoding as implemented in Google's Protocol Buffers package. This approach…
Modern large-scale physics experiments create datasets with sizes and streaming rates that can exceed those from industry leaders such as Google Cloud and Netflix. Fully processing these datasets requires both sufficient compute power and…
The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts ("branches") that are really used in a given analysis need to be read from…
The development of a package for the management of physics data is described: its design, implementation and computational benchmarks. This package improves the data management tools originally developed for Geant4 physics models based on…
In High Energy Physics (HEP), experimentalists generate large volumes of data that, when analyzed, help us better understand the fundamental particles and their interactions. This data is often captured in many files of small size,…
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant…
The data processing model for the CDF experiment is described. Data processing reconstructs events from parallel data streams taken with different combinations of physics event triggers and further splits the events into datasets of…
In recent years, digital object management practices to support findability, accessibility, interoperability, and reusability (FAIR) have begun to be adopted across a number of data-intensive scientific disciplines. These digital objects…
Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations…
With the increasing physical event rate and number of electronic channels, traditional readout schemes struggle to improve readout speed because of the limited bandwidth of the crate backplane. In this paper, a high-speed data…
Open-source process mining provides many algorithms for the analysis of event data which could be used to analyze mainstream processes (e.g., O2C, P2P, CRM). However, compared to commercial tools, they lack the performance and struggle to…
Owing to its universality, flexibility and high performance, fast Ethernet is widely used in the readout system design of modern particle physics experiments. However, Ethernet is usually used together with the TCP/IP protocol stack,…
The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially in a scenario where the end-users, as well as the resources that they can access, are worldwide…
There has been considerable research into improving Fast Fourier Transform (FFT) performance through parallelization and optimization for specialized hardware. However, even with those advancements, processing of very large files, over 1TB…
General-purpose computing on graphics processing units (GPUs) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at…
The ATLAS experiment at CERN relies on a worldwide distributed computing Grid infrastructure to support its physics program at the Large Hadron Collider. ATLAS has integrated cloud computing resources to complement its Grid infrastructure…
ARTUS is an event-based data-processing framework for high energy physics experiments. It is designed for large-scale data analysis in a collaborative environment. The architecture design choices take into account typical challenges and are…
The POOL project is the common persistency framework for the LHC experiments to store petabytes of experiment data and metadata in a distributed and grid enabled way. POOL is a hybrid event store consisting of a data streaming layer and a…