Related papers: A portable coding strategy to exploit vectorizatio…
A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the…
Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over…
Recent trends in the HPC field have introduced new CPU architectures with improved vectorization capabilities that require optimization to achieve peak performance and thus pose challenges for performance portability. The deployment of…
Particle-In-Cell (PIC) codes are broadly applied to the kinetic simulation of plasmas, from laser-matter interaction to astrophysics. Their heavy simulation cost can be mitigated by using the Single Instruction Multiple Data (SIMD)…
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…
Hardware/Software (HW/SW) co-designed processors provide a promising solution to the power and complexity problems of the modern microprocessors by keeping their hardware simple. Moreover, they employ several runtime optimizations to…
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (10pJ/word on-die to 10,000pJ/word on the network). To increase memory locality at the hardware level and reduce…
Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces…
Advances in quantum simulator technology is increasingly required because research on quantum algorithms is becoming more sophisticated and complex. State vector simulation utilizes CPU and memory resources in computing nodes exponentially…
The present study addresses the challenge of enhancing computational efficiency without compromising accuracy in numerical simulations of vacuum gas dynamics using the direct simulation Monte Carlo (DSMC) method. A technique termed "fixed…
In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…
The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As…
QMCPACK has enabled cutting-edge materials research on supercomputers for over a decade. It scales nearly ideally but has low single-node efficiency due to the physics-based abstractions using array-of-structures objects, causing…
High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…
The supercomputing platforms available for high performance computing based research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptations and optimizations of the existing codes for…
We present the Continuous Empirical Cubature Method (CECM), a novel algorithm for empirically devising efficient integration rules. The CECM aims to improve existing cubature methods by producing rules that are close to the optimal,…
Auto-vectorization is a fundamental optimization for modern compilers to exploit SIMD parallelism. However, state-of-the-art approaches still struggle to handle intricate code patterns, often requiring manual hints or domain-specific…
Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase…
Increasing complexity of scientific simulations and HPC architectures are driving the need for adaptive workflows, where the composition and execution of computational and data manipulation steps dynamically depend on the evolutionary state…
Molecular Dynamics simulations can help scientists to gather valuable insights for physical processes on an atomic scale. This work explores various techniques for SIMD vectorization to improve the pairwise force calculation between…