Related papers: The NumPy array: a structure for efficient numeric…
Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python…
Frameworks like Numpy are a popular choice for application developers from varied fields such as image processing to bio-informatics to machine learning. Numpy is often used to develop prototypes or for deployment since it provides…
PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…
Linked lists have long served as a valuable teaching tool in programming. However, the question arises: Are they truly practical for everyday program use? In most cases, it appears that array-based data structures offer distinct advantages,…
We introduce PrivPy, a practical privacy-preserving collaborative computation framework, especially optimized for machine learning tasks. PrivPy provides an easy-to-use and highly compatible Python programming front-end which supports…
Software that processes real-world data or that models a physical system must have some way of managing units. While simple approaches like the understood convention that all data are in a unit system (such as the MKS SI unit system) do…
Array-like collection data structures are widely established in Python's scientific computing-ecosystem for high-performance computations. The structure maps well to regular, gridded lattice structures that are common to computational…
Scientists increasingly rely on Python tools to perform scalable distributed memory array operations using rich, NumPy-like expressions. However, many of these tools rely on dynamic schedulers optimized for abstract task graphs, which often…
Numpy and SciPy are program libraries for the Python scripting language, which apply to a large spectrum of numerical and scientific computing tasks. The Sage project provides a multiplatform software environment which enables one to use,…
To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU, which utilize GPUs on the client side…
Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating…
Matrices and more generally multidimensional arrays, form the backbone of computational studies. In this paper we demonstrate increases in computational efficiency by performing partial-tracing/tensor-contractions on sparse-arrays. It was…
Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and…
We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the…
Effective provenance tracking enhances reproducibility, governance, and data quality in array workflows. However, significant challenges arise in capturing this provenance, including: (1) rapidly evolving APIs, (2) diverse operation types,…
Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as Numpy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on…
There are numerous examples of problems in symbolic algebra in which the required storage grows far beyond the limitations even of the distributed RAM of a cluster. Often this limitation determines how large a problem one can solve in…
Scipp is heavily inspired by the Python library xarray. It enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. Multiple arrays are combined into datasets. On top of this, scipp…
Existing Python libraries and tools lack the ability to efficiently compute statistical test results for large datasets in the presence of missing values. This presents an issue as soon as constraints on runtime and memory availability…
pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…