Related papers: A portable coding strategy to exploit vectorizatio…

200 papers

A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-11-05 · Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani

Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over…

Mathematical Software · Computer Science · 2020-08-26 · Tianjiao Sun, Lawrence Mitchell, Kaushik Kulkarni, Andreas Klöckner, David A. Ham, Paul H. J. Kelly

Recent trends in the HPC field have introduced new CPU architectures with improved vectorization capabilities that require optimization to achieve peak performance and thus pose challenges for performance portability. The deployment of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-09-17 · Gianmarco Accordi, Jens Domke, Theresa Pollinger, Davide Gadioli, Gianluca Palermo

Particle-In-Cell (PIC) codes are broadly applied to the kinetic simulation of plasmas, from laser-matter interaction to astrophysics. Their heavy simulation cost can be mitigated by using the Single Instruction Multiple Data (SIMD)…

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-11-14 · Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, Filippo Mantovani

Hardware/Software (HW/SW) co-designed processors provide a promising solution to the power and complexity problems of the modern microprocessors by keeping their hardware simple. Moreover, they employ several runtime optimizations to…

Hardware Architecture · Computer Science · 2021-03-01 · Rakesh Kumar, Alejandro Martinez, Antonio Gonzalez

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (10pJ/word on-die to 10,000pJ/word on the network). To increase memory locality at the hardware level and reduce…

Computational Physics · Physics · 2018-01-17 · H. Vincenti, R. Lehe, R. Sasanka, J-L. Vay

Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces…

Advances in quantum simulator technology are increasingly required because research on quantum algorithms is becoming more sophisticated and complex. State vector simulation utilizes CPU and memory resources in computing nodes exponentially…

Quantum Physics · Physics · 2024-09-04 · Mikio Morita, Yoshinori Tomita, Junpei Koyama, Koichi Kimura

The present study addresses the challenge of enhancing computational efficiency without compromising accuracy in numerical simulations of vacuum gas dynamics using the direct simulation Monte Carlo (DSMC) method. A technique termed "fixed…

Fluid Dynamics · Physics · 2024-07-03 · Moslem Sabouri, Ramin Zakeri, Amin Ebrahimi

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science · 2021-02-02 · Daniel Lemire, Leonid Boytsov

The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As…

High Energy Physics - Experiment · Physics · 2022-03-17 · Zhihua Dong, Heather Gray, Charles Leggett, Meifeng Lin, Vincent R. Pascuzzi, Kwangmin Yu

QMCPACK has enabled cutting-edge materials research on supercomputers for over a decade. It scales nearly ideally but has low single-node efficiency due to the physics-based abstractions using array-of-structures objects, causing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-10 · Amrita Mathuriya, Ye Luo, Raymond C. Clay, Anouar Benali, Luke Shulenburger, Jeongnim Kim

High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…

Hardware Architecture · Computer Science · 2015-04-20 · Syed Waqar Nabi, Saji N. Hameed, Wim Vanderbauwhede

The supercomputing platforms available for high performance computing based research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptations and optimizations of the existing codes for…

High Energy Physics - Lattice · Physics · 2017-02-23 · Marina Krstic Marinkovic, Luka Stanisic

We present the Continuous Empirical Cubature Method (CECM), a novel algorithm for empirically devising efficient integration rules. The CECM aims to improve existing cubature methods by producing rules that are close to the optimal,…

Numerical Analysis · Mathematics · 2023-11-03 · J. A. Hernandez, J. R. Bravo, S. Ares de Parga

Auto-vectorization is a fundamental optimization for modern compilers to exploit SIMD parallelism. However, state-of-the-art approaches still struggle to handle intricate code patterns, often requiring manual hints or domain-specific…

Software Engineering · Computer Science · 2025-06-05 · Zhongchun Zheng, Kan Wu, Long Cheng, Lu Li, Rodrigo C. O. Rocha, Tianyi Liu, Wei Wei, Jianjiang Zeng, Xianwei Zhang, Yaoqing Gao

Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-16 · James Vance, Zhen-Hao Xu, Nikita Tretyakov, Torsten Stuehn, Markus Rampp, Sebastian Eibl, Christoph Junghans, André Brinkmann

The increasing complexity of scientific simulations and HPC architectures is driving the need for adaptive workflows, where the composition and execution of computational and data manipulation steps dynamically depend on the evolutionary state…

Computational Engineering, Finance, and Science · Computer Science · 2015-06-30 · Janine C. Bennett, Ankit Bhagatwala, Jacqueline H. Chen, C. Seshadhri, Ali Pinar, Maher Salloum

Molecular Dynamics simulations can help scientists to gather valuable insights for physical processes on an atomic scale. This work explores various techniques for SIMD vectorization to improve the pairwise force calculation between…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-04 · Luis Gall, Samuel James Newcome, Fabio Alexander Gratl, Markus Mühlhäußer, Manish Kumar Mishra, Hans-Joachim Bungartz