Related papers: Data-parallel programming with Intel Array Buildin…

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-01 Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks

We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests…

Programming Languages · Computer Science 2026-03-11 Uday Bondhugula, Akshay Baviskar, Navdeep Katel, Vimal Patel, Anoop JS, Arnab Dutta

Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This…

Hardware Architecture · Computer Science 2018-03-13 Junzhong Shen, Yuran Qiao, You Huang, Mei Wen, Chunyuan Zhang

Parallel Computing With R: A Brief Review

Parallel computing has established itself as another standard method for applied research and data analysis. The R system, being internally constrained to mostly singly-threaded operations, can nevertheless be used along with different…

Computation · Statistics 2020-04-07 Dirk Eddelbuettel

ArrayBridge: Interweaving declarative array processing with high-performance computing

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and…

Databases · Computer Science 2017-02-28 Haoyuan Xing, Sofoklis Floratos, Spyros Blanas, Suren Byna, Prabhat, Kesheng Wu, Paul Brown

Hybrid programming-model strategies for GPU offloading of electronic structure calculation kernels

To address the challenge of performance portability, and facilitate the implementation of electronic structure solvers, we developed the Basic Matrix Library (BML) and Parallel, Rapid O(N) and Graph-based Recursive Electronic Structure…

Computational Physics · Physics 2024-01-26 Jean-Luc Fattebert, Christian F. A. Negre, Joshua Finkelstein, Jamaludin Mohd-Yusof, Daniel Osei-Kuffuor, Michael E. Wall, Yu Zhang, Nicolas Bock, Susan M. Mniszewski

Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a…

Programming Languages · Computer Science 2021-07-02 Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, Ryan R. Newton

Overview of the IBM Neural Computer Architecture

The IBM Neural Computer (INC) is a highly flexible, re-configurable parallel processing system that is intended as a research and development platform for emerging machine intelligence algorithms and computational neuroscience. It consists…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-26 Pritish Narayanan, Charles E. Cox, Alexis Asseman, Nicolas Antoine, Harald Huels, Winfried W. Wilcke, Ahmet S. Ozcan

PGAbB: A Block-Based Graph Processing Framework for Heterogeneous Platforms

Designing flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-13 Abdurrahman Yasar, Sivasankaran Rajamanickam, Jonathan W. Berry, Umit V. Catalyurek

Parallel Programming Models for Heterogeneous Many-Cores: A Survey

Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems

Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-19 Jieyang Chen, Chenhao Xie, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li

Conformal Computing: Algebraically connecting the hardware/software boundary using a uniform approach to high-performance computation for software and hardware applications

We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the…

Mathematical Software · Computer Science 2008-03-18 Lenore R. Mullin, James E. Raynolds

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

A Parallelizable Acceleration Framework for Packing Linear Programs

This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be…

Optimization and Control · Mathematics 2017-11-20 Palma London, Shai Vardi, Adam Wierman, Hanling Yi

Extreme-Scale Block-Structured Adaptive Mesh Refinement

In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the meta data in each distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-24 Florian Schornbaum, Ulrich Rüde

Coarse-Grain Performance Estimator for Heterogeneous Parallel Computing Architectures like Zynq All-Programmable SoC

Heterogeneous computing is emerging as a mandatory requirement for power-efficient system design. With this aim, modern heterogeneous platforms like Zynq All-Programmable SoC, that integrates ARM-based SMP and programmable logic, have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-08-28 Daniel Jiménez-González, Carlos Álvarez, Antonio Filgueras, Xavier Martorell, Jan Langer, Juanjo Noguera, Kees Vissers

MIRGE: An Array-Based Computational Framework for Scientific Computing

MIRGE is a computational approach for scientific computing based on NumPy-like array computation, but using lazy evaluation to recast computation as data-flow graphs, where nodes represent immutable, multi-dimensional arrays. Evaluation of…

Mathematical Software · Computer Science 2025-12-22 Matthias Diener, Matthew J. Smith, Michael T. Campbell, Kaushik Kulkarni, Michael J. Anderson, Andreas Klöckner, William Gropp, Jonathan B. Freund, Luke N. Olson

AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory…

Programming Languages · Computer Science 2018-10-29 Cristian Ramon-Cortes, Ramon Amela, Jorge Ejarque, Philippe Clauss, Rosa M. Badia

High level programming abstractions for leveraging hierarchical memories with micro-core architectures

Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Maurice Jamieson, Nick Brown

Data Generation for Stability Studies of Power Systems with High Penetration of Inverter-Based Resources

The increasing penetration of inverter-based resources (IBRs) is fundamentally reshaping power system dynamics and creating new challenges for stability assessment. Data-driven approaches, and in particular machine learning models, require…

Systems and Control · Electrical Eng. & Systems 2026-01-26 Francesca Rossi, Mauro Garcia Lorenzo, Eduardo Iraola de Acevedo, Elia Mateu Barriendos, Vinicius Albernaz Lacerda, Francesc Lordan-Gomis, Rosa Badia, Eduardo Prieto-Araujo