English
Related papers

Related papers: An Efficient Vectorization Scheme for Stencil Comp…

200 papers

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization techniques, aiming at exploiting the in-core data parallelism. Briefly, they either…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-18 Kun Li , Liang Yuan , Yunquan Zhang , Yue Yue , Hang Cao , Pengqi Lu

Stencil computations represent a very common class of nested loops in scientific and engineering applications. Exploiting vector units in modern CPUs is crucial to achieving peak performance. Previous vectorization approaches often consider…

Mathematical Software · Computer Science 2020-10-13 Liang Yuan , Hang Cao , Yunquan Zhang , Kun Li , Pengqi Lu , Yue Yue

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel…

Performance · Computer Science 2012-03-01 Jan Treibig , Gerhard Wellein , Georg Hager

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä , Oskar Lappi , Fredrik Robertsén , Maarit J. Korpi-Lagg

Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-26 George Bisbas , Fabio Luporini , Mathias Louboutin , Rhodri Nelson , Gerard Gorman , Paul H. J. Kelly

Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-19 Tareq Malas , Georg Hager , Hatem Ltaief , David Keyes

Current architectures are now equipped with matrix computation units designed to enhance AI and high-performance computing applications. Within these architectures, two fundamental instruction types are matrix multiplication and vector…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-04 Wenxuan Zhao , Liang Yuan , Baicheng Yan , Penghao Ma , Yunquan Zhang , Long Wang , Zhe Wang

As investment in AI-focused accelerators grows and their deployment in supercomputing facilities expands, understanding whether these architectures can efficiently support traditional scientific kernels is critical for the future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Lorenzo Piarulli , Daniele De Sensi

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly , Gihan R Mudalige , Michael B Giles

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-28 Tareq Malas , Georg Hager , Hatem Ltaief , Holger Stengel , Gerhard Wellein , David Keyes

Stencil computation is essential in high-performance computing, especially for large-scale tasks like liquid simulation and weather forecasting. Optimizing its performance can reduce both energy consumption and computation time, which is…

Performance · Computer Science 2025-03-04 Hongguang Chen

Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-31 Alexandre Sena , Aline Nascimento , Cristina Boeres , Vinod E. F. Rebello , André Bulcão

Stencil computation is an extensively-utilized class of scientific-computing applications that can be efficiently accelerated by graphics processing units (GPUs). Out-of-core approaches enable a GPU to handle large stencil codes whose data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-19 Jingcheng Shen , Linbo Long , Jun Zhang , Weiqi Shen , Masao Okita , Fumihiko Ino

Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-09 Ryuichi Sai , John Mellor-Crummey , Jinfan Xu , Mauricio Araya-Polo

Deep learning implementations on CPUs (Central Processing Units) are gaining more traction. Enhanced AI capabilities on commodity x86 architectures are commercially appealing due to the reuse of existing hardware and virtualization ease. A…

Machine Learning · Computer Science 2021-03-22 Shabnam Daghaghi , Nicholas Meisburger , Mengnan Zhao , Yong Wu , Sameh Gobriel , Charlie Tai , Anshumali Shrivastava

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…

Performance · Computer Science 2012-03-01 Markus Wittmann , Georg Hager , Gerhard Wellein

Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-24 Jesmin Jahan Tithi , Fabrizio Petrini , Hongbo Rong , Andrei Valentin , Carl Ebeling

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-12 Johannes de Fine Licht , Andreas Kuster , Tiziano De Matteis , Tal Ben-Nun , Dominic Hofer , Torsten Hoefler

Stencil computation constitutes a cornerstone of scientific computing, serving as a critical kernel in domains ranging from fluid dynamics to weather simulation. While stencil computations are conventionally regarded as memory-bound and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-03 Qiqi Gu , Chenpeng Wu , Heng Shi , Jianguo Yao , Haibing Guan

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Lingqi Zhang , Mohamed Wahib , Peng Chen , Jintao Meng , Xiao Wang , Toshio Endo , Satoshi Matsuoka
‹ Prev 1 2 3 10 Next ›