Related papers: HONEI: A collection of libraries for numerical com…
In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step…
In this work, we propose an open-source, first-of-its-kind, arithmetic hardware library with a focus on accelerating the arithmetic operations involved in Ring Learning with Error (RLWE)-based somewhat homomorphic encryption (SHE). We…
This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI,…
Developing efficient hardware accelerators for mathematical kernels used in scientific applications and machine learning has traditionally been a labor-intensive task. These accelerators typically require low-level programming in Verilog or…
votess is a library for computing parallel 3D Voronoi tessellations on heterogeneous platforms, from CPUs and GPUs, to future accelerator architectures. To do so, it leverages the SYCL abstraction layer to achieve portability and…
In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software for electronic structure calculations…
Existing GPU libraries often struggle to fully exploit the parallel resources and on-chip memory (SRAM) of GPUs when chaining multiple GPU functions as individual kernels. While Kernel Fusion (KF) techniques like Horizontal Fusion (HF) and…
The introduction of Intel(R) Xeon Phi(TM) coprocessors opened up new possibilities in development of highly parallel applications. The familiarity and flexibility of the architecture together with compiler support integrated into the Intel…
Binarized Neural Networks (BNNs) significantly reduce the computation and memory demands with binarized weights and activations compared to full-precision NNs. Executing a layer in a BNN on different devices of a heterogeneous…
It has been demonstrated that specialised architectures, such as FPGAs and AMD's AI Engines (AIEs), have the potential to deliver energy and performance advantages for scientific computing. Given the integration of AIEs into AMD's CPUs,…
The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for…
High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach in which hosts offload almost all compute-intensive sections of the code…
The recent influx of open scientific data has contributed to the transitioning of scientific computing from compute intensive to data intensive. Whereas many Big Data frameworks exist that minimize the cost of data transfers, few scientific…
GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…
Today's world of scientific software for High Energy Physics (HEP) is powered by x86 code, while the future will be much more reliant on accelerators like GPUs and FPGAs. The portable parallelization strategies (PPS) project of the High…
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…
We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are…
Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations to be applied directly on encrypted data without requiring a secret key. This enables novel application scenarios where a client can safely…
We present Hawkeye, a system for analyzing and reproducing GPU-level arithmetic operations. Using our framework, anyone can re-execute on a CPU the exact matrix multiplication operations underlying a machine learning model training or…
We introduce HONEY; a new specialized programming language designed to facilitate the processing of multivariate, asynchronous and non-uniformly sampled symbolic and scalar time sequences. When compiled, a Honey program is transformed into…