English
Related papers

Related papers: Data-parallel programming with Intel Array Buildin…

200 papers

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-01 Héctor Martínez , Sandra Catalán , Francisco D. Igual , José R. Herrero , Rafael Rodríguez-Sánchez , Enrique S. Quintana-Ortí

We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests…

Programming Languages · Computer Science 2026-03-11 Uday Bondhugula , Akshay Baviskar , Navdeep Katel , Vimal Patel , Anoop JS , Arnab Dutta

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work only focus on accelerating matrix multiplication on FPGA by adopting a linear systolic array. This…

Hardware Architecture · Computer Science 2018-03-13 Junzhong Shen , Yuran Qiao , You Huang , Mei Wen , Chunyuan Zhang

Parallel computing has established itself as another standard method for applied research and data analysis. The R system, being internally constrained to mostly singly-threaded operations, can nevertheless be used along with different…

Computation · Statistics 2020-04-07 Dirk Eddelbuettel

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and…

Databases · Computer Science 2017-02-28 Haoyuan Xing , Sofoklis Floratos , Spyros Blanas , Suren Byna , Prabhat , Kesheng Wu , Paul Brown

To address the challenge of performance portability, and facilitate the implementation of electronic structure solvers, we developed the Basic Matrix Library (BML) and Parallel, Rapid O(N) and Graph-based Recursive Electronic Structure…

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a…

Programming Languages · Computer Science 2021-07-02 Chaitanya Koparkar , Mike Rainey , Michael Vollmer , Milind Kulkarni , Ryan R. Newton

The IBM Neural Computer (INC) is a highly flexible, re-configurable parallel processing system that is intended as a research and development platform for emerging machine intelligence algorithms and computational neuroscience. It consists…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-26 Pritish Narayanan , Charles E. Cox , Alexis Asseman , Nicolas Antoine , Harald Huels , Winfried W. Wilcke , Ahmet S. Ozcan

Designing flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-13 Abdurrahman Yasar , Sivasankaran Rajamanickam , Jonathan W. Berry , Umit V. Catalyurek

Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Jianbin Fang , Chun Huang , Tao Tang , Zheng Wang

Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-19 Jieyang Chen , Chenhao Xie , Jesun S Firoz , Jiajia Li , Shuaiwen Leon Song , Kevin Barker , Mark Raugas , Ang Li

We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the…

Mathematical Software · Computer Science 2008-03-18 Lenore R. Mullin , James E. Raynolds

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad , Wim Vanderbauwhede

This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be…

Optimization and Control · Mathematics 2017-11-20 Palma London , Shai Vardi , Adam Wierman , Hanling Yi

In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the meta data in each distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-24 Florian Schornbaum , Ulrich Rüde

Heterogeneous computing is emerging as a mandatory requirement for power-efficient system design. With this aim, modern heterogeneous platforms like Zynq All-Programmable SoC, that integrates ARM-based SMP and programmable logic, have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-08-28 Daniel Jiménez-González , Carlos Álvarez , Antonio Filgueras , Xavier Martorell , Jan Langer , Juanjo Noguera , Kees Vissers

MIRGE is a computational approach for scientific computing based on NumPy-like array computation, but using lazy evaluation to recast computation as data-flow graphs, where nodes represent immutable, multi-dimensional arrays. Evaluation of…

The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory…

Programming Languages · Computer Science 2018-10-29 Cristian Ramon-Cortes , Ramon Amela , Jorge Ejarque , Philippe Clauss , Rosa M. Badia

Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Maurice Jamieson , Nick Brown

The increasing penetration of inverter-based resources (IBRs) is fundamentally reshaping power system dynamics and creating new challenges for stability assessment. Data-driven approaches, and in particular machine learning models, require…

‹ Prev 1 2 3 10 Next ›