Related papers: HONEI: A collection of libraries for numerical com…

A Generic Library for Stencil Computations

In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…

Mathematical Software · Computer Science 2012-07-10 Mauro Bianco, Ugo Varetto

Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption

In this work, we propose an open-source, first-of-its-kind, arithmetic hardware library with a focus on accelerating the arithmetic operations involved in Ring Learning with Error (RLWE)-based somewhat homomorphic encryption (SHE). We…

Cryptography and Security · Computer Science 2020-07-06 Rashmi Agrawal, Lake Bu, Alan Ehret, Michel A. Kinsy

CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver

This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-30 Weicheng Xue, Hongyu Wang, Christopher J. Roy

Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads

Developing efficient hardware accelerators for mathematical kernels used in scientific applications and machine learning has traditionally been a labor-intensive task. These accelerators typically require low-level programming in Verilog or…

Hardware Architecture · Computer Science 2025-09-15 Doru Thom Popovici, Mario Vega, Angelos Ioannou, Fabien Chaix, Dania Mosuli, Blair Reasoner, Tan Nguyen, Xiaokun Yang, John Shalf

votess: A multi-target, GPU-capable, parallel Voronoi tessellator

votess is a library for computing parallel 3D Voronoi tessellations on heterogeneous platforms, from CPUs and GPUs, to future accelerator architectures. To do so, it leverages the SYCL abstraction layer to achieve portability and…

Instrumentation and Methods for Astrophysics · Physics 2024-12-13 Samridh Dev Singh, Chris Byrohl, Dylan Nelson

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software for electronic structure calculations…

Computational Engineering, Finance, and Science · Computer Science 2016-11-03 Diego Fabregat-Traver, Davor Davidović, Markus Höhnerbach, Edoardo Di Napoli

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

Existing GPU libraries often struggle to fully exploit the parallel resources and on-chip memory (SRAM) of GPUs when chaining multiple GPU functions as individual kernels. While Kernel Fusion (KF) techniques like Horizontal Fusion (HF) and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-09 Oscar Amoros, Albert Andaluz, Johnny Nunez, Antonio J. Pena

Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor

The introduction of Intel(R) Xeon Phi(TM) coprocessors opened up new possibilities in development of highly parallel applications. The familiarity and flexibility of the architecture together with compiler support integrated into the Intel…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-26 Jiri Dokulil, Enes Bajrovic, Siegfried Benkner, Sabri Pllana, Martin Sandrieser, Beverly Bachmayer

HEP-BNN: A Framework for Finding Low-Latency Execution Configurations of BNNs on Heterogeneous Multiprocessor Platforms

Binarized Neural Networks (BNNs) significantly reduce the computation and memory demands with binarized weights and activations compared to full-precision NNs. Executing a layer in a BNN on different devices of a heterogeneous…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Leonard David Bereholschi, Ching-Chi Lin, Mikail Yayla, Jian-Jia Chen

Lifting to tensors when compiling scientific computing workloads for AI Engines

It has been demonstrated that specialised architectures, such as FPGAs and AMD's AI Engines (AIEs), have the potential to deliver energy and performance advantages for scientific computing. Given the integration of AIEs into AMD's CPUs,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-06 Nick Brown, Gabriel Rodriguez-Canal

Linnea: Automatic Generation of Efficient Linear Algebra Programs

The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for…

Mathematical Software · Computer Science 2020-01-01 Henrik Barthels, Christos Psarras, Paolo Bientinesi

Optimization of Lattice Boltzmann Simulations on Heterogeneous Computers

High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach in which hosts offload almost all compute-intensive sections of the code…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-15 E. Calore, A. Gabbana, S. F. Schifano, R. Tripiccione

Sea: A lightweight data-placement library for Big Data scientific computing

The recent influx of open scientific data has contributed to the transitioning of scientific computing from compute intensive to data intensive. Whereas many Big Data frameworks exist that minimize the cost of data transfers, few scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-06 Valérie Hayot-Sasson, Mathieu Dugré, Tristan Glatard

Ripple: Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout

GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-20 Robert Clucas, Philip Blakely, Nikolaos Nikiforakis

Portability: A Necessary Approach for Future Scientific Software

Today's world of scientific software for High Energy Physics (HEP) is powered by x86 code, while the future will be much more reliant on accelerators like GPUs and FPGAs. The portable parallelization strategies (PPS) project of the High…

Computational Physics · Physics 2022-03-21 Meghna Bhattacharya, Paolo Calafiura, Taylor Childers, Mark Dewing, Zhihua Dong, Oliver Gutsche, Salman Habib, Xiangyang Ju, Michael Kirby, Kyle Knoepfel, Matti Kortelainen, Martin Kwok, Charles Leggett, Meifeng Lin, Vincent R. Pascuzzi, Alexei Strelchenko, Brett Viren, Beomki Yeo, Haiwang Yu

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-10 Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang

Hybrid quantum programming with PennyLane Lightning on HPC platforms

We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are…

Quantum Physics · Physics 2024-03-06 Ali Asadi, Amintor Dusko, Chae-Yeun Park, Vincent Michaud-Rioux, Isidor Schoch, Shuli Shu, Trevor Vincent, Lee James O'Riordan

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations to be applied directly on encrypted data without requiring a secret key. This enables novel application scenarios where a client can safely…

Machine Learning · Computer Science 2018-10-02 Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

Hawkeye: Reproducing GPU-Level Non-Determinism

We present Hawkeye, a system for analyzing and reproducing GPU-level arithmetic operations. Using our framework, anyone can re-execute on a CPU the exact matrix multiplication operations underlying a machine learning model training or…

Cryptography and Security · Computer Science 2026-05-19 Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Honey: A dataflow programming language for the processing, featurization and analysis of multivariate, asynchronous and non-uniformly sampled scalar symbolic time sequences

We introduce HONEY, a new specialized programming language designed to facilitate the processing of multivariate, asynchronous and non-uniformly sampled symbolic and scalar time sequences. When compiled, a Honey program is transformed into…

Programming Languages · Computer Science 2016-09-13 Mathieu Guillame-Bert