English
Related papers

Related papers: Temporal Vectorization for Stencils

200 papers

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Kun Li , Liang Yuan , Yunquan Zhang , Yue Yue , Hang Cao , Pengqi Lu

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization techniques, aiming at exploiting the in-core data parallelism. Briefly, they either…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-18 Kun Li , Liang Yuan , Yunquan Zhang , Yue Yue , Hang Cao , Pengqi Lu

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel…

Performance · Computer Science 2012-03-01 Jan Treibig , Gerhard Wellein , Georg Hager

Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-26 George Bisbas , Fabio Luporini , Mathias Louboutin , Rhodri Nelson , Gerard Gorman , Paul H. J. Kelly

Current architectures are now equipped with matrix computation units designed to enhance AI and high-performance computing applications. Within these architectures, two fundamental instruction types are matrix multiplication and vector…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-04 Wenxuan Zhao , Liang Yuan , Baicheng Yan , Penghao Ma , Yunquan Zhang , Long Wang , Zhe Wang

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Lingqi Zhang , Mohamed Wahib , Peng Chen , Jintao Meng , Xiao Wang , Toshio Endo , Satoshi Matsuoka

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…

Performance · Computer Science 2012-03-01 Markus Wittmann , Georg Hager , Gerhard Wellein

Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping…

Data Structures and Algorithms · Computer Science 2021-05-17 Zafar Ahmad , Rezaul Chowdhury , Rathish Das , Pramod Ganapathi , Aaron Gregory , Yimin Zhu

Stencil computation constitutes a cornerstone of scientific computing, serving as a critical kernel in domains ranging from fluid dynamics to weather simulation. While stencil computations are conventionally regarded as memory-bound and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-03 Qiqi Gu , Chenpeng Wu , Heng Shi , Jianguo Yao , Haibing Guan

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-12 Johannes de Fine Licht , Andreas Kuster , Tiziano De Matteis , Tal Ben-Nun , Dominic Hofer , Torsten Hoefler

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä , Oskar Lappi , Fredrik Robertsén , Maarit J. Korpi-Lagg

Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-24 Jesmin Jahan Tithi , Fabrizio Petrini , Hongbo Rong , Andrei Valentin , Carl Ebeling

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-06-17 Markus Wittmann , Georg Hager , Jan Treibig , Gerhard Wellein

Sparse Tensor Cores offer exceptional performance gains for AI workloads by exploiting structured 2:4 sparsity. However, their potential remains untapped for core scientific workloads such as stencil computations, which exhibit irregular…

Computational Engineering, Finance, and Science · Computer Science 2025-07-01 Qi Li , Kun Li , Haozhi Han , Liang Yuan , Junshi Chen , Yunquan Zhang , Yifeng Chen , Hong An , Ting Cao , Mao Yang

Graph embedding aims at learning a vector-based representation of vertices that incorporates the structure of the graph. This representation then enables inference of graph properties. Existing graph embedding techniques, however, do not…

Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential…

Numerical Analysis · Mathematics 2023-10-16 Mark C. Messner , Tianchen Hu , Tianju Chen

Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient algorithms suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-16 Manuel Birke , Bobby Philip , Zhen Wang , Mark Berrill

Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-04 Kazuaki Matsumura , Hamid Reza Zohouri , Mohamed Wahib , Toshio Endo , Satoshi Matsuoka

Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-31 Alexandre Sena , Aline Nascimento , Cristina Boeres , Vinod E. F. Rebello , André Bulcão

Temporal Graph Clustering (TGC) is a new task with little attention, focusing on node clustering in temporal graphs. Compared with existing static graph clustering, it can find the balance between time requirement and space requirement…

Machine Learning · Computer Science 2026-01-21 Meng Liu , Ke Liang , Siwei Wang , Xingchen Hu , Sihang Zhou , Xinwang Liu
‹ Prev 1 2 3 10 Next ›