Related papers: The HSF Conditions Database Reference Implementati…
Conditions data in high energy physics experiments are frequently understood as all data needed for reconstruction besides the event data itself. This includes all sorts of slowly evolving data like detector alignment, calibration and…
To produce the best physics results, high energy physics experiments require access to calibration and other non-event data during event data processing. These conditions data are typically stored in databases that provide versioning…
Apache HBase, a mainstay of the emerging Hadoop ecosystem, is a NoSQL key-value and column family hybrid database which, unlike a traditional RDBMS, is intentionally designed to scalably host large, semistructured, and heterogeneous data.…
This report evaluates the new analytical capabilities of DataStax Enterprise (DSE) [1] through the use of standard Hadoop workloads. In particular, we run experiments with CPU and I/O bound micro-benchmarks as well as OLAP-style analytical…
Across many domains, large swaths of digital assets are being stored across distributed data repositories, e.g., the DANDI Archive [8]. The distribution and diversity of these repositories impede researchers from formally defining…
ATLAS event data processing requires access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are crucial for the event data reconstruction processing steps and often…
With the use of object-oriented languages for HEP, many experiments have designed their data objects to contain direct references to other objects in the event (e.g., tracks and electromagnetic showers have references to each other to…
HRDBMS is a novel distributed relational database that uses a hybrid model combining the best of traditional distributed relational databases and Big Data analytics platforms such as Hive. This allows HRDBMS to leverage years' worth of…
To extract physics results from the recorded data, the LHC experiments are using Grid computing infrastructure. The event data processing on the Grid requires scalable access to non-event data (detector conditions, calibrations, etc.)…
Research has become dependent on processing power and storage, one crucial aspect being data sharing. The Open Science Data Federation (OSDF) project aims to create a scientific global data distribution network based on the Pelican…
In the era of big data, conventional RDBMS models have become impractical for handling colossal workloads. Consequently, NoSQL databases have emerged as the preferred storage solutions for executing processing-intensive Online Analytical…
As storage systems become increasingly heterogeneous and complex, they place a growing burden on DBAs, and performance remains suboptimal even after considerable human effort. In addition, existing monitoring-based storage management by access…
Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…
Standardizing terminology to annotate electrophysiological events can improve both computational research and clinical care. Sharing data enriched with standard terms can facilitate data exploration, from case studies to mega-analyses. The…
We present a multi-process application called HISTEX (HISTory EXerciser), which executes input histories in a generic transactional notation on commercial DBMS platforms. HISTEX could be used to discover potential errors in the…
In the CMS experiment, the non-event data needed to set up the detector, or produced by it, and needed to calibrate the physical responses of the detector itself are stored in ORACLE databases. The large amount of data to be stored,…
Memory-to-memory data streaming is essential for modern scientific workflows that require near real-time data analysis, experimental steering, and informed decision-making during experiment execution. It eliminates the latency bottlenecks…
In a data warehousing process, mastering the data preparation phase allows substantial gains in terms of time and performance when performing multidimensional analysis or using data mining algorithms. Furthermore, a data warehouse can…
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve…
This paper examines how a "Distributed Heterogeneous Relational Data Warehouse" can be integrated in a Grid environment that will provide physicists with efficient access to large and small object collections drawn from databases at…