Related papers: Data-parallel programming with Intel Array Buildin…
This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…
We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests…
Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work only focus on accelerating matrix multiplication on FPGA by adopting a linear systolic array. This…
Parallel computing has established itself as another standard method for applied research and data analysis. The R system, being internally constrained to mostly singly-threaded operations, can nevertheless be used along with different…
Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and…
To address the challenge of performance portability, and facilitate the implementation of electronic structure solvers, we developed the Basic Matrix Library (BML) and Parallel, Rapid O(N) and Graph-based Recursive Electronic Structure…
Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a…
The IBM Neural Computer (INC) is a highly flexible, re-configurable parallel processing system that is intended as a research and development platform for emerging machine intelligence algorithms and computational neuroscience. It consists…
Designing flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel…
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…
Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…
We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the…
Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…
This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be…
In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the meta data in each distributed…
Heterogeneous computing is emerging as a mandatory requirement for power-efficient system design. With this aim, modern heterogeneous platforms like Zynq All-Programmable SoC, that integrates ARM-based SMP and programmable logic, have been…
MIRGE is a computational approach for scientific computing based on NumPy-like array computation, but using lazy evaluation to recast computation as data-flow graphs, where nodes represent immutable, multi-dimensional arrays. Evaluation of…
The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory…
Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in…
The increasing penetration of inverter-based resources (IBRs) is fundamentally reshaping power system dynamics and creating new challenges for stability assessment. Data-driven approaches, and in particular machine learning models, require…