Related papers: Casper: Accelerating Stencil Computation using Nea…

Stencil Computations on Cerebras Wafer-Scale Engine

Stencil computations are a fundamental kernel in scientific computing, critical for simulations in domains such as fluid dynamics and climate modeling. However, these computations are often memory-bound on traditional High-Performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Elia Belli, Daniele De Sensi

A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures

Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-09 Ryuichi Sai, John Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo

Do We Need Tensor Cores for Stencil Computations?

Stencil computation constitutes a cornerstone of scientific computing, serving as a critical kernel in domains ranging from fluid dynamics to weather simulation. While stencil computations are conventionally regarded as memory-bound and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-03 Qiqi Gu, Chenpeng Wu, Heng Shi, Jianguo Yao, Haibing Guan

SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation

Sparse Tensor Cores offer exceptional performance gains for AI workloads by exploiting structured 2:4 sparsity. However, their potential remains untapped for core scientific workloads such as stencil computations, which exhibit irregular…

Computational Engineering, Finance, and Science · Computer Science 2025-07-01 Qi Li, Kun Li, Haozhi Han, Liang Yuan, Junshi Chen, Yunquan Zhang, Yifeng Chen, Hong An, Ting Cao, Mao Yang

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä, Oskar Lappi, Fredrik Robertsén, Maarit J. Korpi-Lagg

Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). The out-of-core approach helps run large-scale stencil codes that process data with sizes larger than the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Jingcheng Shen, Yifan Wu, Masao Okita, Fumihiko Ino

SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping

Recent research has focused on accelerating stencil computations by exploiting emerging hardware like Tensor Cores. To leverage these accelerators, the stencil operation must be transformed to matrix multiplications. However, this…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Qiqi Gu, Chenpeng Wu, Heng Shi, Jianguo Yao

Fast Stencil Computations using Fast Fourier Transforms

Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping…

Data Structures and Algorithms · Computer Science 2021-05-17 Zafar Ahmad, Rezaul Chowdhury, Rathish Das, Pramod Ganapathi, Aaron Gregory, Yimin Zhu

Beyond 16GB: Out-of-Core Stencil Computations

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly, Gihan R Mudalige, Michael B Giles

Compression-Based Optimizations for Out-of-Core GPU Stencil Computation

An out-of-core stencil computation code handles large data whose size is beyond the capacity of GPU memory. However, such a code requires streaming data to and from the GPU frequently. As a result, data movement between the CPU and GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-26 Jingcheng Shen, Xin Deng, Yifan Wu, Masao Okita, Fumihiko Ino

Accelerating High-Order Stencils on GPUs

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-16 Ryuichi Sai, John Mellor-Crummey, Xiaozhu Meng, Mauricio Araya-Polo, Jie Meng

A Synergy between On- and Off-Chip Data Reuse for GPU-based Out-of-Core Stencil Computation

Stencil computation is an extensively utilized class of scientific-computing applications that can be efficiently accelerated by graphics processing units (GPUs). Out-of-core approaches enable a GPU to handle large stencil codes whose data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-19 Jingcheng Shen, Linbo Long, Jun Zhang, Weiqi Shen, Masao Okita, Fumihiko Ino

An MLIR Lowering Pipeline for Stencils at Wafer-Scale

The Cerebras Wafer-Scale Engine (WSE) delivers performance at an unprecedented scale of over 900,000 compute units, all connected via a single-wafer on-chip interconnect. Initially designed for AI, the WSE architecture is also well-suited…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Nicolai Stawinoga, David Katz, Anton Lydike, Justs Zarins, Nick Brown, George Bisbas, Tobias Grosser

Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture

Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-24 Jesmin Jahan Tithi, Fabrizio Petrini, Hongbo Rong, Andrei Valentin, Carl Ebeling

High Performance Computing with FPGAs and OpenCL

In this work we evaluate the potential of FPGAs for accelerating HPC workloads as a more power-efficient alternative to GPUs. Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-17 Hamid Reza Zohouri

Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures

Accelerated computing is widely used in high-performance computing. Therefore, it is crucial to experiment and discover how to better utilize the latest generations of GPGPUs on relevant applications. In this paper, we present results and share…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-13 Baodi Shan, Mauricio Araya-Polo

Stencil Computations on Tenstorrent Wormhole

As investment in AI-focused accelerators grows and their deployment in supercomputing facilities expands, understanding whether these architectures can efficiently support traditional scientific kernels is critical for the future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Lorenzo Piarulli, Daniele De Sensi

QPU Micro-Kernels for Stencil Computation

We introduce QPU micro-kernels: shallow quantum circuits that perform a stencil node update and return a Monte Carlo estimate from repeated measurements. We show how to use them to solve Partial Differential Equations (PDEs) explicitly…

Emerging Technologies · Computer Science 2025-11-18 Stefano Markidis, Luca Pennati, Marco Pasquale, Gilbert Netzer, Ivy Peng

Tight Bounds for Low Dimensional Star Stencils in the Parallel External Memory Model

Stencil computations on low dimensional grids are kernels of many scientific applications including finite difference methods used to solve partial differential equations. On typical modern computer architectures, such stencil computations…

Computational Complexity · Computer Science 2015-01-23 Philipp Hupp, Riko Jacob

Massively scalable stencil algorithm

Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with cache-based memory hierarchies, due to low reuse of memory accesses. This work shows that…

Mathematical Software · Computer Science 2022-04-11 Mathias Jacquelin, Mauricio Araya-Polo, Jie Meng