English
Related papers

Related papers: Efficient multicore-aware parallelization strategi…

200 papers

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the…

Performance · Computer Science 2012-03-01 Markus Wittmann , Georg Hager , Gerhard Wellein

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-06-17 Markus Wittmann , Georg Hager , Jan Treibig , Gerhard Wellein

Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient algorithms suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-16 Manuel Birke , Bobby Philip , Zhen Wang , Mark Berrill

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Lingqi Zhang , Mohamed Wahib , Peng Chen , Jintao Meng , Xiao Wang , Toshio Endo , Satoshi Matsuoka

Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-19 Tareq Malas , Georg Hager , Hatem Ltaief , David Keyes

We present the design, implementation, and evaluation of optimized matrix-free stencil kernels for multigrid smoothing in the incompressible Stokes equations with variable viscosity, motivated by geophysical flow problems. We investigate…

Computational Physics · Physics 2025-09-24 Marcel Ferrari , Cyrill Püntener , Alexander Sotoudeh , Niklas Viebig

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-28 Tareq Malas , Georg Hager , Hatem Ltaief , Holger Stengel , Gerhard Wellein , David Keyes

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Kun Li , Liang Yuan , Yunquan Zhang , Yue Yue , Hang Cao , Pengqi Lu

Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-26 George Bisbas , Fabio Luporini , Mathias Louboutin , Rhodri Nelson , Gerard Gorman , Paul H. J. Kelly

Stencil computations represent a very common class of nested loops in scientific and engineering applications. Exploiting vector units in modern CPUs is crucial to achieving peak performance. Previous vectorization approaches often consider…

Mathematical Software · Computer Science 2020-10-13 Liang Yuan , Hang Cao , Yunquan Zhang , Kun Li , Pengqi Lu , Yue Yue

Although modern supercomputers are composed of multicore machines, one can find scientists that still execute their legacy applications which were developed to monocore cluster where memory hierarchy is dedicated to a sole core. The main…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-31 Alexandre Sena , Aline Nascimento , Cristina Boeres , Vinod E. F. Rebello , André Bulcão

Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to increase at a faster rate relative to memory and network bandwidths,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-11 Johannes Pekkilä , Miikka S. Väisälä , Maarit J. Käpylä , Matthias Rheinhardt , Oskar Lappi

General Purpose Graphics Processing Units (GPGPU) are used in most of the top systems in HPC. The total capacity of scratchpad memory has increased by more than 40 times in the last decade. However, existing optimizations for stencil…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-07 Lingqi Zhang , Mohamed Wahib , Peng Chen , Jintao Meng , Xiao Wang , Toshio Endo , Satoshi Matsuoka

This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low order tetrahedral…

Computational Engineering, Finance, and Science · Computer Science 2015-11-09 Björn Gmeiner , Markus Huber , Lorenz John , Ulrich Rüde , Barbara Wohlmuth

Matrix-accelerated stencil computation is a hot research topic, yet its application to three-dimensional (3D) high-order stencils and HPC remains underexplored. With the emergence of matrix units on multicore CPUs, we analyze matrix-based…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-16 Yinuo Wang , Tianqi Mao , Lin Gan , Wubing Wan , Zeyu Song , Jiayu Fu , Lanke He , Wenqiang Wang , Zekun Yin , Wei Xue , Guangwen Yang

Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-04 Kazuaki Matsumura , Hamid Reza Zohouri , Mohamed Wahib , Toshio Endo , Satoshi Matsuoka

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-27 Istvan Z Reguly , Gihan R Mudalige , Michael B Giles

Understanding and optimizing the properties of solar cells is becoming a key issue in the search for alternatives to nuclear and fossil energy sources. A theoretical analysis via numerical simulations involves solving Maxwell's Equations in…

Computational Engineering, Finance, and Science · Computer Science 2015-10-20 Tareq M. Malas , Julian Hornich , Georg Hager , Hatem Ltaief , Christoph Pflaum , David E. Keyes

Smoothing filter is the method of choice for image preprocessing and pattern recognition. We present a new concurrent method for smoothing 2D object in binary case. Proposed method provides a parallel computation while preserving the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Ramzi Mahmoudi , Mohamed Akil

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä , Oskar Lappi , Fredrik Robertsén , Maarit J. Korpi-Lagg
‹ Prev 1 2 3 10 Next ›