Related papers: ArrayBridge: Interweaving declarative array proces…

Parallel netCDF: A Scientific High-Performance I/O Interface

Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, Rob Latham

Efficient Iterative Processing in the SciDB Parallel Array Engine

Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native…

Databases · Computer Science 2015-06-02 Emad Soroush, Magdalena Balazinska, Simon Krughoff, Andrew Connolly

hep_tables: Heterogeneous Array Programming for HEP

Array operations are one of the most concise ways of expressing the common filtering and simple aggregation operations that are the hallmark of the first step of a particle physics analysis: selection, filtering, basic vector operations, and…

Databases · Computer Science 2021-09-08 Gordon Watts

Multi-Dimensional Data Compression and Query Processing in Array Databases

In recent times, the production of multidimensional data in various domains and their storage in array databases has witnessed a sharp increase; this rapid growth in data volumes necessitates compression in array databases. However,…

Databases · Computer Science 2022-11-14 Minsoo Kim, Hyubjin Lee, Yon Dohn Chung

Multi-Terabyte EIDE Disk Arrays running Linux RAID5

High-energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible.…

Data Analysis, Statistics and Probability · Physics 2007-05-23 D. A. Sanders, L. M. Cremaldi, V. Eschenburg, R. Godang, M. D. Joy, D. J. Summers, D. L. Petravick

Design and optimisation of an efficient HDF5 I/O kernel for massive parallel fluid flow simulations

More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of…

Performance · Computer Science 2018-07-18 Christoph Ertl, Jérôme Frisch, Ralf-Peter Mundani

RawArray: A Simple, Fast, and Extensible Archival Format for Numeric Data

Raw data sizes are growing and proliferating in scientific research, driven by the success of data-hungry computational methods, such as machine learning. The preponderance of proprietary and shoehorned data formats make computations slower…

Databases · Computer Science 2022-01-02 David S. Smith

MIRGE: An Array-Based Computational Framework for Scientific Computing

MIRGE is a computational approach for scientific computing based on NumPy-like array computation, but using lazy evaluation to recast computation as data-flow graphs, where nodes represent immutable, multi-dimensional arrays. Evaluation of…

Mathematical Software · Computer Science 2025-12-22 Matthias Diener, Matthew J. Smith, Michael T. Campbell, Kaushik Kulkarni, Michael J. Anderson, Andreas Klöckner, William Gropp, Jonathan B. Freund, Luke N. Olson

Data-parallel programming with Intel Array Building Blocks (ArBB)

Intel Array Building Blocks is a high-level data-parallel programming environment designed to produce scalable and portable results on existing and upcoming multi- and many-core platforms. We have chosen several mathematical kernels - a…

Performance · Computer Science 2012-11-08 Volker Weinberg

Distributed Caching for Complex Querying of Raw Arrays

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate…

Databases · Computer Science 2018-03-19 Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna Y. Q. Ho, Peter Nugent

An array-oriented Python interface for FastJet

Analysis on HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with…

High Energy Physics - Experiment · Physics 2023-02-21 Aryan Roy, Jim Pivarski, Chad Wells Freer

ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining

Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of…

Hardware Architecture · Computer Science 2023-06-07 C. Peltekis, D. Filippas, G. Dimitrakopoulos, C. Nicopoulos, D. Pnevmatikatos

Reproducible Cross-border High Performance Computing for Scientific Portals

To reproduce eScience, several challenges need to be solved: scientific workflows need to be automated; the involved software versions need to be provided in an unambiguous way; input data needs to be easily accessible; High-Performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-30 Kessy Abarenkov, Anne Fouilloux, Helmut Neukirchen, Abdulrahman Azab

A Hardware Co-design Workflow for Scientific Instruments at the Edge

As spatial and temporal resolutions of scientific instruments improve, the explosion in the volume of data produced is becoming a key challenge. It can be a critical bottleneck for integration between scientific instruments at the edge and…

Instrumentation and Detectors · Physics 2021-11-03 Kazutomo Yoshii, Rajesh Sankaran, Sebastian Strempfer, Maksim Levental, Mike Hammer, Antonino Miceli

Supporting High-Performance and High-Throughput Computing for Experimental Science

The advent of experimental science facilities - instruments and observatories, such as the Large Hadron Collider, the Laser Interferometer Gravitational Wave Observatory, and the upcoming Large Synoptic Survey Telescope - has brought about…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-12 E. A. Huerta, Roland Haas, Shantenu Jha, Mark Neubauer, Daniel S. Katz

A Survey on Array Storage, Query Languages, and Systems

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that…

Databases · Computer Science 2013-02-20 Florin Rusu, Yu Cheng

ds-array: A Distributed Data Structure for Large Scale Machine Learning

Machine learning has proved to be a useful tool for extracting knowledge from scientific data in numerous research fields, including astrophysics, genomics, and molecular dynamics. Often, data sets from these research areas need to be…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-21 Javier Álvarez Cid-Fuentes, Pol Álvarez, Salvi Solà, Kuninori Ishii, Rafael K. Morizawa, Rosa M. Badia

LifeRaft: Data-Driven, Batch Processing for the Exploration of Scientific Databases

Workloads that comb through vast amounts of data are gaining importance in the sciences. These workloads consist of "needle in a haystack" queries that are long running and data intensive so that query throughput limits performance. To…

Databases · Computer Science 2009-09-15 Xiaodan Wang, Randal Burns, Tanu Malik

Making Array-Based Translation Practical for Modern, High-Performance Buffer Management

Modern buffer pools must now support a broader workload mix than classic OLTP alone. In addition to B-tree lookups, database systems increasingly serve scan-heavy analytics and vector-search indexes with irregular high-fan-out graph…

Databases · Computer Science 2026-04-02 Xinjing Zhou, Jinming Hu, Andrew Pavlo, Michael Stonebraker

DeepBridge: A Unified and Production-Ready Framework for Multi-Dimensional Machine Learning Validation

We present DeepBridge, an 80K-line Python library that unifies multi-dimensional validation, automatic compliance verification, knowledge distillation, and synthetic data generation. DeepBridge offers: (i) 5 validation suites (fairness with…

Machine Learning · Computer Science 2025-12-24 Gustavo Coelho Haase, Paulo Henrique Dourado da Silva